This example comes from PCA / FA (principal component analysis, factor analysis), namely from Jolliffe (see the bibliography). But it illustrates some very nice linear algebra.

More precisely, the source of this example is:

Yule, W., Berger, M., Butler, S., Newham, V. and Tizard, J. (1969). The WPPSI: An empirical evaluation with a British sample. Brit. J. Educ. Psychol., 39, 1-13.

I have not been able to find the original paper. There is a problem here, and I do not know whether the problem lies in the original paper or in Jolliffe’s version of it. If anyone out there can let me know, I’d be grateful. (I will present 3 matrices, taken from Jolliffe; my question is, does the original paper contain the same 3 matrices?)

Like the previous post on this topic, this one is self-contained. In fact, it has almost nothing to do with PCA, and everything to do with finding — or failing to find! — a transition matrix relating two matrices.

On p. 163, Jolliffe provides a matrix of “principal components”:

On the same page, he provides two matrices of the “rotated factor loadings”. One is “Varimax”…

… and the other is “direct quartimin”:

I presume — not knowing how to compute a varimax rotation! — that each of these matrices represents the same data, i.e. the same 4D subspace of R^{10} (since we have 4 column vectors of length 10 in each matrix). It would seem silly if a varimax rotation of PCs was not supposed to represent rotated PCs within the same subspace.

That presumption is wrong: they do not all represent the same subspace.

How do I know this?

Because given two matrices of the same shape, I know how to find the transition matrix if it exists. We did this here.

Let me elaborate on what I’m doing. Imagine that our matrices had 2 columns and 3 rows. Those 3D column vectors lie in a 2D subspace — a plane — in R^3. By doing 2D transformations within that plane, I might find a particularly nice basis for representing that data.

If, instead, I apply a 3D transformation, I would be moving that data into a different plane. What I am doing here is looking at 4D transformations within the 4D subspace. Everything is predicated on my belief that the PCs in P, and the varimax rotation of them in V, and the direct quartimin data in Q are all supposed to be in the same subspace.

Not knowing how they were supposed to be computed, I could be seriously wrong here. OTOH, since we will find that V and Q are the same subspace, and the first 3 of the 4 columns of P also lie in that common subspace, I really believe that the 4th column of P should too. That it does not strikes me as an error.

Let me make one simplification up front. Are the matrices P, V, Q of full rank (4)?

Yes. The Mathematica command MatrixRank says that each of those matrices is of rank 4.
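(Since the three matrices from Jolliffe are not reproduced here, let me sketch the rank check in Python with numpy — an illustration only, using a random 10×4 stand-in rather than the actual P, V, or Q. It parallels the Mathematica MatrixRank call.)

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.normal(size=(10, 4))  # stand-in for one of the 10x4 matrices P, V, Q

# Full column rank means rank 4: the four columns span a 4D subspace of R^10
print(np.linalg.matrix_rank(Q))  # 4
```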

Let us consider V and Q. If the observations in each are related by a transition matrix T, and if we arbitrarily take V to be the “old” data, then each column of V’ (the transpose of V) is found by applying the transition matrix to the corresponding column of Q’.

V’ = T Q’

i.e.

V = Q T’.

Then, as I showed before, I imagine briefly that I can “solve” for T’…

T’ = Q^{-1} V,

but quickly replace the non-existent inverse of Q by the appropriate pseudo-inverse:

T’ = (Q’Q)^{-1} Q’ V.

(It’s worth repeating that the inappropriate pseudo-inverse would use QQ’, but that can’t work because QQ’ is 10×10 but of rank 4 at most, hence not invertible.)
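(Here is that recipe as a numpy sketch — again with toy data, not the actual matrices. I construct V from Q and a hidden 4×4 matrix M, so that V lies exactly in Q’s column space, and then the pseudo-inverse recipe recovers M exactly.)

```python
import numpy as np

rng = np.random.default_rng(1)
Q = rng.normal(size=(10, 4))   # "old" data, full column rank
M = rng.normal(size=(4, 4))    # a hidden transition matrix
V = Q @ M.T                    # so V lies exactly in the column space of Q

# The pseudo-inverse recipe: T' = (Q'Q)^{-1} Q' V
Tt = np.linalg.solve(Q.T @ Q, Q.T @ V)

print(np.allclose(Tt, M.T))    # True: we recovered the transition matrix
print(np.allclose(V, Q @ Tt))  # True: V = Q T' holds exactly in this toy case
```

With the real V and Q the second check holds only approximately, which is exactly what the post goes on to measure.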

If Q’Q is invertible, that recipe always gives me a matrix T, but T need not be a transition matrix. The resulting T is a transition matrix if and only if we actually have

V = Q T’,

(given that V and Q are of the same full rank).

Do we have V = Q T’ ?

By direct computation I get T’ (a 4×4 matrix of coefficients), and then I look at V – Q T’…. You know, it’s easy enough to put them side by side.

Even the difference of the rounded-off matrices is satisfyingly small. That is,

**Round[V,.01]-Round[Q.TT,.01]//Chop//MatrixForm**

is essentially the zero matrix.

Judging that the difference is negligible, I conclude that the columns of V and Q do span the same 4D subspace of R^{10}, described with two different coordinate systems (two sets of basis vectors).

While we’re here, let’s do this another way. After all, **the fundamental question is: can the columns of V be written as linear combinations of the columns of Q?** Alternatively, can each column of Q be written as a linear combination of the columns of V?

Bear in mind that we have already established that Q and V are of the same full rank. In that case, the alternatives are equivalent.

We can answer the fundamental question by trying to regress each column of V on Q. Just remember to tell the regression to drop the constant!
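(A numpy sketch of that regression, with the same toy setup as before — least squares with no column of ones in the design matrix, which is the analogue of dropping the constant. The coefficients come out equal to the first column of T’.)

```python
import numpy as np

rng = np.random.default_rng(2)
Q = rng.normal(size=(10, 4))
M = rng.normal(size=(4, 4))
V = Q @ M.T

# Regress the first column of V on the columns of Q, with NO constant term:
# we simply omit the column of ones from the design matrix.
beta, *_ = np.linalg.lstsq(Q, V[:, 0], rcond=None)

# The regression coefficients are the first column of T' (= first row of T)
Tt = np.linalg.solve(Q.T @ Q, Q.T @ V)
print(np.allclose(beta, Tt[:, 0]))  # True
```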

To be specific, here’s the first column of V as a linear combination of the 4 columns of Q. Here’s the parameter table and R^2 — not the adjusted R^2: I really want to know how good this fit is, never mind how many variables it has.

Here’s a snapshot of the Mathematica commands (version 7). (Oh, darn. The last option in the LinearModelFit command, which replaced “Regress”, is “IncludeConstantBasis->False”. And I print the names just to confirm that none of those symbols has a left-over numerical value!)

(Let me just remind you that R^2 is a measure of how close the fit is. When we add another independent variable, however, R^2 cannot decrease; it is not a good measure of the value — i.e. the worth — of that latest independent variable. The adjusted R^2 can decrease; in fact, in my experience, if the t-statistic of that latest variable is greater than 1 (in magnitude), then the adjusted R^2 goes up. As I said, in this case, I really want R^2 rather than the adjusted R^2 because I know exactly how many independent variables I am supposed to use. That said, there isn’t a whole lot of difference between the two in this case.)

That’s a really good fit. And it should be. And that column of coefficients, i.e. “Estimate”? It’s the first column of T’, i.e. the first row of T:

{0.9273, 0.0921888, 0.15161, 0.101824}

We could run 3 more regressions, but they’d only confirm what we believe: T’ could, in fact, be computed as regression coefficients. If the two subspaces coincide, then T is also a transition matrix between two bases in one and the same space.

There is yet another way to look at that. We know that in regression, yhat is the projection of y onto the subspace spanned by the columns of X. If y itself is in that subspace, then yhat = y, and the relationship is given by a transition matrix.

So we could do this a third way. We know how to compute a projection operator onto the subspace spanned by the columns of X. In this case, X is Q, so we find its SVD (Singular Value Decomposition)…

Then the 4 leftmost columns (u1) of u are an orthonormal basis for the range of Q….

That is, u1′ u1 = I, if the column space of X is of full rank. Check it:

But the reversed matrix product

R = u1 u1′

gives us a projection operator R onto the range of Q:

We can check it. (I won’t print it: it’s 10×10.) One, R is a projection operator if and only if it is idempotent, i.e. R R = R. And by computation, I see that is true.

Two, applied to each column of Q, it reproduces that column of Q; i.e. applied to Q as a whole, it reproduces Q, i.e. R Q = Q. Again, computation shows that is true, too.
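(Those two checks, sketched in numpy with a random 10×4 stand-in for Q: build R from the SVD, then verify orthonormality, idempotence, and R Q = Q.)

```python
import numpy as np

rng = np.random.default_rng(3)
Q = rng.normal(size=(10, 4))

# SVD: the left singular vectors span the range of Q
u, s, vt = np.linalg.svd(Q, full_matrices=False)
u1 = u[:, :4]                  # 10x4 with orthonormal columns

print(np.allclose(u1.T @ u1, np.eye(4)))  # True: u1' u1 = I

R = u1 @ u1.T                  # 10x10 projection onto the range of Q
print(np.allclose(R @ R, R))   # True: R is idempotent
print(np.allclose(R @ Q, Q))   # True: R reproduces Q
```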

Having checked that R is a projection onto the column space of Q, let’s use it: apply it to V. If V lies in the subspace spanned by the columns of Q, then R V will reproduce V….

I find that the largest difference between the elements of R V and V is 0.00720726 .

That’s as close as the other two calculations. Not exactly zero, but then, our regression didn’t fit exactly: like the RGB and XYZ tables in the earlier post, V and Q are almost but not exactly related by a transition matrix.

So we have seen three ways of looking at this. All three tell the same story: the column spaces of V and Q are the same.

Let me say a little more about two of them, the projection operator and the explicit regression.

The projection operator R came from the SVD. The regression can also be written using a projection operator, namely the hat matrix H. I keep getting mixed up about the relationship between T and R and H, so let me write it out.

The normal equations for

y = X β

are

X’X β = X’y.

(I showed you a quick and dirty way to recover that equation in the earlier post.)

Then

β = (X’X)^{-1} X’ y,

and so we may write

yhat = H y

with the hat matrix H defined as

H = X (X’X)^{-1} X’.

For our regression, we took y to be the first column of V, and X to be the four columns of Q, so our hat matrix would be

H = Q (Q’Q)^{-1} Q’.

It is a projection operator onto the range of Q. But that’s exactly what R was. Go ahead and check: you will find that

H = R.

This means that we could compute R as the hat matrix, instead of using the SVD. Personally, I know how to use the SVD, whereas I would have to work out the definition of the hat matrix every time. But that’s a preference; you may choose to compute the hat matrix instead. (And that is how Malinowski was doing it!)
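(The check that H = R, in the same numpy toy setup: compute the hat matrix from the normal equations and the projection operator from the SVD, and compare.)

```python
import numpy as np

rng = np.random.default_rng(4)
Q = rng.normal(size=(10, 4))

# Hat matrix from the normal equations: H = Q (Q'Q)^{-1} Q'
H = Q @ np.linalg.solve(Q.T @ Q, Q.T)

# Projection operator from the SVD: R = u1 u1'
u, _, _ = np.linalg.svd(Q, full_matrices=False)
R = u @ u.T

print(np.allclose(H, R))  # True: two routes to the same projection
```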

Going a little further, T’ was

T’ = (Q’Q)^{-1} Q’ V,

so we have

Q T’ = Q (Q’Q)^{-1} Q’ V = H V.

We’ll look at the relationship between P and V in Part 2.

March 30, 2010 at 8:28 am

You give me link or documment about “Yule, W., Berger, M., Butler, S., Newham, V. and Tizard, J. (1969). The WPPSL: An empirical evaluation with a British sample. Brit. J. Educ. Psychol., 39, 1-13.”.

Thanks

March 31, 2010 at 3:31 pm

Hi,

I guess you are _asking_ me to provide a link. (I also guess that English is not your native language.)

But, as I said in the post itself, “I have not been able to find the original paper. There is a problem here, and I do not know whether the problem lies in the original paper or in Jolliffe’s version of it.” That is, I don’t have a link or a copy.

If you find a link to it, could you please give it to me? Thanks, and good luck.

(And you might note how I asked the question. No offense intended, but it took me a while to realize you were asking a question.)