Using the QR Decomposition to orthogonalize data

This is going to be a very short post, illustrating one idea with one example (yes, one, not five).

It turns out that there is another way to have Mathematica® orthogonalize a matrix: it’s called the QR decomposition. The matrix Q will contain the orthogonalized data… and the matrix R will specify the relationship between the original data and the orthogonalized.

That means we do not have to do the laborious computations described in this post. Understand: if we do not care about the relationship between the original data and the orthogonalized data, then I see no advantage in Mathematica to using the QR over using the Orthogonalize command.
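
Here is a minimal sketch of what I mean, using a made-up 5×3 data matrix rather than the Hald data. Note Mathematica’s convention: QRDecomposition returns {q, r} with the orthonormal vectors as the rows of q, so it is Transpose[q].r that reproduces the original matrix.

    X = N[{{1, 7, 26}, {1, 1, 29}, {1, 11, 56}, {1, 11, 31}, {1, 7, 52}}];  (* made-up data: a constant column and two variables *)
    {q, r} = QRDecomposition[X];
    Chop[Transpose[q].r - X]               (* a matrix of zeros: q and r recover X *)
    Chop[Orthogonalize[Transpose[X]] - q]  (* rows of zeros where the two agree; the two can differ by signs *)

The columns of Transpose[q] are the orthonormalized data, and the upper-triangular r is precisely the bookkeeping that relates them back to X.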

the relationship between the raw and the orthogonalized data

OK, so we orthogonalized the Hald data, including the constant (the column of 1s).

What’s the relationship between the new variables and the old? We might someday get a new observation, and if we were using the fit to the orthogonalized data, we would want to see what it predicts for that new observation.

(In all honesty, I would use the original fit – but I still want to know what the relationship is.)

My notation is a little awkward. I’m going to stay with the notation used in this post, in which I first showed how to find….

Let me start fresh. If we have two typical data matrices (i.e. taller than wide), and they are supposed to be the same data, how do we find the relationship?
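
Here, as a preview, is one way to phrase the computation in Mathematica: a hedged sketch with made-up matrices rather than the real data, treating it as a least-squares problem for a square matrix t such that new == old.t.

    old = N[{{1, 2}, {1, 3}, {1, 5}, {1, 7}}];       (* made-up raw data, taller than wide *)
    new = Transpose[Orthogonalize[Transpose[old]]];  (* its orthonormalized columns *)
    t = PseudoInverse[old].new;                      (* least-squares solution of old.t == new *)
    Chop[old.t - new]                                (* a matrix of zeros: the relationship is exact here *)

Because the new columns really are combinations of the old ones, the least-squares solution is not merely a best fit; it is the exact relationship.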

Andrews Curves

introduction

A long time ago, in a post about PCA (principal component analysis), I said that I did not know what Andrews curves were. (The suggestion was made that Andrews curves might help us decide how many principal components to keep. I did not understand how they were to be computed.)

Now I know. Let me show you. I will compute Andrews curves for what is called “the iris data”… for both the full data set (150 observations) and a reduced data set (30 observations). I will also show you a possible variant.

In addition, we know that there are in fact three kinds of irises in the data – so we can assess how well the Andrews curves did. In practice, of course, we would be trying to figure out how many kinds of observations we have.

The data is here. The paper which explained Andrews curves to me is here. Andrews’ original paper is: Andrews, D. F., “Plots of High-Dimensional Data”, Biometrics 28 (1972), 125–136… but I haven’t found it freely available anywhere online.

In addition, there is a short and sweet webpage by one of the authors of the paper I read.
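
Briefly, the recipe is this: each 4-dimensional observation x = (x1, x2, x3, x4) is mapped to the function f_x(t) = x1/\sqrt{2} + x2 \sin(t) + x3 \cos(t) + x4 \sin(2t), plotted for t from -\pi to \pi (higher-dimensional data continue the pattern with \cos(2t), \sin(3t), and so on). Here is a hedged sketch in Mathematica; I’m assuming the iris data is available in ExampleData under the name below – otherwise read it from the link above.

    iris = ExampleData[{"Statistics", "FisherIris"}];  (* assumed built-in name for the 150 iris observations *)
    (* each observation: {sepal length, sepal width, petal length, petal width, species} *)
    andrews[{x1_, x2_, x3_, x4_}, t_] := x1/Sqrt[2] + x2 Sin[t] + x3 Cos[t] + x4 Sin[2 t]
    curves = andrews[Most[#], t] & /@ iris;            (* drop the species label, keep the 4 measurements *)
    Plot[Evaluate[curves], {t, -Pi, Pi}, PlotRange -> All]

If the curves fall into visibly separate bundles, that suggests distinct groups of observations.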

Regression 1: You inverted what matrix?

Edit: 2011 Nov 25: In “How the Hell” I have some negative signs. Find “edit”.

I want to show you something about LinearModelFit that shocked me. I will use the Toyota data. Let me point out that I am using version 7.0.1.0 of Mathematica®. Furthermore, this post is in two categories, OLS and linear algebra; the example comes from regression, but the key concept is the inverse of a matrix.
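
For context, the textbook route to the coefficients is the normal equations, \beta = (X'X)^{-1}X'y\ , which explicitly inverts X'X. A hedged sketch with made-up data (not the Toyota data), just to fix notation; whatever LinearModelFit does internally, it should agree numerically:

    data = {{1., 2.3}, {2., 4.1}, {3., 6.2}, {4., 7.9}};  (* hypothetical {x, y} pairs *)
    lm = LinearModelFit[data, x, x];
    X = {1, #} & /@ data[[All, 1]];                       (* design matrix: a column of 1s and the x column *)
    y = data[[All, 2]];
    beta = Inverse[Transpose[X].X].Transpose[X].y;        (* (X'X)^-1 X'y *)
    {lm["BestFitParameters"], beta}                       (* the two sets of coefficients should match *)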

Norms and Condition Numbers

We have had a few occasions to talk about norms of vectors. I want to take a closer look at them, then at norms of matrices, and finally at condition numbers of matrices.

Vector Norms

Let me jump right in. Here’s a 2-dimensional vector: v = {3, 4}.

Taking its norm, we get the Euclidean length of the vector, 5 – it’s the length of the hypotenuse of a 3-4-5 right triangle.
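
In Mathematica, for example (the matrix below is made up, and this particular condition number is one common choice, built from the 2-norm):

    v = {3, 4};
    Norm[v]                          (* 5: the Euclidean (2-) norm *)
    {Norm[v, 1], Norm[v, Infinity]}  (* 7 and 4: the 1-norm and the infinity-norm *)
    m = {{1., 2.}, {3., 4.}};        (* a made-up 2x2 matrix *)
    Norm[m]                          (* the matrix 2-norm, i.e. the largest singular value *)
    Norm[m] Norm[Inverse[m]]         (* one common definition of the condition number of m *)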



Compressed Sensing: the L1 norm finds sparse solutions

I’ve just finished watching three hours of lectures on something called compressed sensing. I had fully expected to put out a post showing how to compute time from position on an elliptical orbit. But I really want to show you a key piece of mathematics. This was utterly new to me.

I want to show you, via a small example, that minimizing the L1 norm often returns sparse solutions.

Hey, what?

Imagine you have a one megapixel photograph. More likely, the raw file created by your digital camera is 8 to 15 megapixels… but if you save it as JPEG, you may compress it by a factor of – what? – 20? (I’m too busy to work it out.)

The key is that there is some basis in which the photograph has a sparse representation… the data can be represented with good accuracy by fewer numbers than we started with. We see something similar when we represent a sound recording by its Fourier coefficients, or a computed tomography image by its Radon transform.
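
(Before the post’s example, here is an even smaller illustration of my own, a hedged sketch: one equation in two unknowns, x + 2y = 4. The minimum-L2 solution spreads the answer over both components; the minimum-L1 solution concentrates it in one.)

    PseudoInverse[{{1, 2}}].{4}                        (* {4/5, 8/5}: the minimum-L2 solution, both components nonzero *)
    Minimize[{Abs[x] + Abs[y], x + 2 y == 4}, {x, y}]  (* {2, {x -> 0, y -> 2}}: the minimum-L1 solution is sparse *)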

Let me get to the simple example.

Regression 1 – Isolating & Identifying Linear Dependence

I almost named this post “Nailing Linear Dependence”. It’s so easy….

Introduction

This post will show just how easy it is to isolate and identify linear dependence. Looking at subsets, as we did in this previous post, works – but what I’m about to show you is easier and faster. (Henceforth I will refer to that link as “the previous post” or once as “that post”.)

On the other hand, we will see a case where looking at subsets gives an equivalent answer that might be more appealing.

Now, I’m going to solve the same five examples as in the previous post. I am not going to duplicate the introductory material in the previous post, so if you need more context, please read that post. You might or might not want to spend much time looking at the details of examining subsets of the columns of X – that examination is what I’m about to replace by something more incisive.

As usual, I will use either XT or X’ to denote the transpose of X; and v’ or vt to denote the transpose of v.
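
To give the flavor of the idea before the examples (a hedged sketch with a made-up matrix, whether or not it is exactly the computation used in the examples below): a null space vector of X names exactly which columns are linearly dependent, and with what coefficients.

    X = {{1, 2, 3}, {4, 5, 9}, {7, 8, 15}, {1, 0, 1}};  (* made up so that the 3rd column is the sum of the first two *)
    NullSpace[X]                                        (* {{-1, -1, 1}}: -col1 - col2 + col3 == 0, i.e. col3 = col1 + col2 *)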

Example 1


Rotations: figuring out what a rotation does

I want to write a short emphatic post making one major point. I’ve made it before, but I think it needs to be hammered home.

In addition to the links in this post, you could use the Tag Cloud to find related posts.

I mentioned in this happenings post, near the end, that I had found a nice introductory book on the control of aircraft (Thomas Yechout et al., “Introduction to Aircraft Flight Mechanics…”, AIAA 2003, ISBN 1-56347-434-4).

I looked at it yesterday morning. I was planning to spend the day working on the next regression post … and I did … but I figured I’d turn my kid loose first, and he wanted to play in that book.

I still think it’s a nice book… but two of their drawings for rotations are mighty confusing… at first glance, they appear to be wrong. It turns out they are right, but the display is unusual.

I’ve talked about this before, but let me put it in a post all by itself. (This post seemed essential when I thought they had made a mistake; it became merely desirable once I saw that the reader just has to take a careful look at their drawings.)

Suppose we are given the following rotation matrix…

Rz[\theta] = \left(\begin{array}{ccc} \cos (\theta ) & \sin (\theta ) & 0 \\ -\sin (\theta ) & \cos (\theta ) & 0 \\ 0 & 0 & 1\end{array}\right)
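
To figure out quickly what that matrix does, apply it to a convenient basis vector at a convenient angle. A quick check in Mathematica (the key fact: this matrix is the transpose of what Mathematica’s RotationMatrix returns):

    Rz[theta_] := {{Cos[theta], Sin[theta], 0}, {-Sin[theta], Cos[theta], 0}, {0, 0, 1}}
    Rz[Pi/2].{1, 0, 0}                         (* {0, -1, 0}: the x basis vector lands on -y *)
    RotationMatrix[Pi/2, {0, 0, 1}].{1, 0, 0}  (* {0, 1, 0}: Mathematica's convention is the transpose of Rz *)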

Color: Cohen’s intriguing F matrix again

Minor edits, 4 Jan 2010.

Let me run thru the derivation of Figure 20 again, still with the CIE 1964 tables, but at 20 nm intervals. Then I will show two alternative calculations… and one of them will show us why Cohen switched the sign on f2.

There is interesting linear algebra and vector algebra in this post, even if you’re not interested in where these problems came from.

Review:


Example: Is it a transition matrix? Part 2

We had three matrices from Jolliffe, P, V, and Q. They were allegedly a set of principal components P, a varimax rotation V of P, and a quartimin “oblique rotation” Q.

I’ll remind you that when they say “oblique rotation” they mean a general change of basis. A rotation takes an orthonormal basis to another orthonormal basis; it cannot transform an orthonormal basis into a non-orthonormal one. But that is exactly what they mean: a transformation from an orthonormal basis to a non-orthonormal basis, or possibly from a merely orthogonal basis to a non-orthogonal one. In either case, the transformation cannot be a rotation.

(It isn’t that complicated! If you change the lengths of basis vectors, it isn’t a rotation; if you change the angles between the basis vectors, it isn’t a rotation.)

Anyway, we showed in Part 1 that V and Q spanned the same 4D subspace of R^{10}.
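
(For the record, here is one quick way to check such a claim, a hedged sketch with tiny made-up matrices rather than Jolliffe’s: if the two matrices have the same rank, and joining their columns side by side still gives that same rank, then their column spaces coincide.)

    v = {{1, 0}, {0, 1}, {1, 1}};   (* made-up 3x2 matrices, not the V and Q of the post *)
    q = {{1, 1}, {1, -1}, {2, 0}};  (* its columns are the sum and difference of v's columns *)
    {MatrixRank[v], MatrixRank[q], MatrixRank[Join[v, q, 2]]}  (* {2, 2, 2}: no new directions, so the spans coincide *)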

Now, what about V and P? Let me recall them: