I would like to revisit the work we did in Davis (example 4). For one thing, I did a lot of calculations with that example, and despite the compare-and-contrast posts towards the end, I fear it may be difficult to sort out what I finally came to.
In addition, my notation has settled down a bit since then, and I would like to recast the work using my current notation.
The original (“raw”) data for example 4 was (p. 502, and columns are variables):
It would be prudent to compute the means…
and variances of the data…
It would also be prudent to calculate the row sums.
They are all the same (we have “constant row sums”).
Now, Davis chooses to analyze centered data. Let’s get it, by subtracting its mean from each column.
That is what he has on page 503. I remind us that, because the raw row sums were constant, we expect (from an earlier post) that the row sums of the centered data are now zero. That in turn means that we have lost rank: the centered data should have two nonzero eigenvalues or singular values, instead of 3.
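For readers who want to follow along numerically, here is a minimal NumPy sketch of the steps so far: column means, column variances, row sums, centering, and the resulting loss of rank. The array `raw` is an assumed stand-in chosen to have the properties described in this post (constant row sums, total variance 32); it is not a verified transcription of the table on Davis p. 502, and all variable names are illustrative.

```python
import numpy as np

# Assumed stand-in data (rows = observations, columns = variables).
# Check against Davis p. 502 before relying on these numbers.
raw = np.array([[ 4.0, 27.0, 18.0],
                [12.0, 25.0, 12.0],
                [10.0, 23.0, 16.0],
                [14.0, 21.0, 14.0]])

col_means = raw.mean(axis=0)
col_vars = raw.var(axis=0, ddof=1)   # sample variances (divide by N-1)
row_sums = raw.sum(axis=1)

# Center the data by subtracting each column's mean.
X = raw - col_means

print("column means:      ", col_means)
print("column variances:  ", col_vars, "sum:", col_vars.sum())
print("raw row sums:      ", row_sums)
print("centered row sums: ", X.sum(axis=1))
print("rank of centered X:", np.linalg.matrix_rank(X))
```

Constant row sums in the raw data force the centered rows to sum to zero, which is one linear constraint on three columns; that is why the rank drops from 3 to 2.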
Next, Davis chooses to analyze $X^T X$ instead of the covariance matrix $X^T X/(N-1)$. We recall that these two have the same eigenvectors, but their eigenvalues differ by a factor of N-1. In fact, Davis also uses the singular value decomposition (SVD); that is a good reason for doing an eigendecomposition of $X^T X$.
Here is the SVD of X, $X = u\,w\,v^T$. (Always remember that my u and v correspond to Davis’ V and U respectively, if you’re looking at both of us.)
Sure enough, there are only two nonzero singular values in the w matrix.
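A hedged NumPy sketch of this step: it builds the full SVD $X = u\,w\,v^T$ with a rectangular w, counts the nonzero singular values, and checks that the eigenvalues of $X^T X$ are the squared singular values and are N-1 times those of the covariance matrix. The array `raw` is an assumed stand-in data set, not a verified copy of Davis's table, and the tolerance `1e-10` is an arbitrary choice.

```python
import numpy as np

# Assumed stand-in data; not a verified copy of Davis's table.
raw = np.array([[ 4.0, 27.0, 18.0],
                [12.0, 25.0, 12.0],
                [10.0, 23.0, 16.0],
                [14.0, 21.0, 14.0]])
X = raw - raw.mean(axis=0)
N = X.shape[0]

# Full SVD: u is N x N, v is 3 x 3, and w is the N x 3 "diagonal" matrix.
u, s, vT = np.linalg.svd(X, full_matrices=True)
v = vT.T
w = np.zeros(X.shape)
w[:len(s), :len(s)] = np.diag(s)

n_nonzero = int(np.sum(s > 1e-10))

# Eigenvalues (descending) of X^T X and of the covariance matrix.
eig_XtX = np.linalg.eigvalsh(X.T @ X)[::-1]
eig_cov = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

print("singular values:", s)
print("nonzero singular values:", n_nonzero)
print("eigenvalues of X^T X:", eig_XtX)
print("(N-1) * eigenvalues of covariance:", (N - 1) * eig_cov)
```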
Let us now compute Davis’ loadings and scores. Using my notation for the SVD, he has
, but use the definition for computation: one less matrix product.)
(I should remark that we see that we could have done all of those calculations using the data X, the singular values w, and the matrix v: we do not need u.)
and we get
Davis does not have any of these extra columns of zeros, but all of our nonzero numbers agree (to within a sign for each column). At this point, I think the most significant thing to say is that these scores and loadings do not reproduce the data, in the sense that the matrix product of corresponding scores S and loadings A is not X.
Let us see what Jolliffe would have done.
He writes his model as
Z = X A,
with Z as the new data, and A as an orthogonal eigenvector matrix; but I would write that as
Y = X v,
since I reserve A for the weighted eigenvector matrix.
(I have to point out that, as we saw in previous posts, this Y is what Davis calls his Q-mode loadings.)
Now, Jolliffe would usually do an eigendecomposition of the covariance matrix; but we have enough experience to understand that we can do an eigendecomposition of $X^T X$ instead. Here it is:
Here are its eigenvalues:
Here is an orthogonal eigenvector matrix (now I use my standard notation, V for an orthogonal matrix computed from an eigendecomposition, versus v from the SVD):
We can check that . (It does.)
We know that V should differ from v (from the SVD) by at most the signs of columns. Recall v from the SVD:
In fact, they agree completely.
Jolliffe would project the data X onto the eigenvector matrix V to get new data Y = XV:
Now we should recall Davis’ Q-mode loadings, to confirm the equivalence:
We recall that Y = X v = X V is equivalent to the change of basis formula
x = v y,
where x is an old observation and y is the corresponding new one.
We learned – from Harman! – that the new data Y has maximally redistributed variance. It is crucial that we used an orthogonal eigenvector matrix v or V. Let’s confirm that by computing the variances of the columns of Y…
and their sum is 32.
Let’s recall the variances of the columns of X…
and their sum is also 32.
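The projection and the variance bookkeeping can be sketched as follows. This is an illustration under assumptions, not the computation used for the post: `raw` is an assumed stand-in data set, and the eigenvectors come from `numpy.linalg.eigh`, reordered so the largest eigenvalue is first.

```python
import numpy as np

# Assumed stand-in data; not a verified copy of Davis's table.
raw = np.array([[ 4.0, 27.0, 18.0],
                [12.0, 25.0, 12.0],
                [10.0, 23.0, 16.0],
                [14.0, 21.0, 14.0]])
X = raw - raw.mean(axis=0)

# Orthogonal eigenvector matrix V of X^T X, largest eigenvalue first.
lam, V = np.linalg.eigh(X.T @ X)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

# v from the SVD, for comparison (columns may differ from V by sign).
v = np.linalg.svd(X)[2].T

# New data: project X onto the orthogonal eigenvector matrix.
Y = X @ V

var_X = X.var(axis=0, ddof=1)
var_Y = Y.var(axis=0, ddof=1)   # approximately [28, 4, 0] for this stand-in data

print("variances of X:", var_X, "sum:", var_X.sum())
print("variances of Y:", var_Y, "sum:", var_Y.sum())
```

The total variance is unchanged because V is orthogonal; only its distribution over the columns changes. The same orthogonality gives the change of basis back to the old data, `X = Y @ V.T`, which is the row-wise form of x = v y.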
Now, what about Harman? He writes his model as
Z = A F,
(i.e. his variables are in rows, not in columns) and A is a $\sqrt{\lambda}$-weighted eigenvector matrix. I generally favor his notation for Z, but I will follow Jolliffe for a few seconds later on.
Let us compute. From the eigenvalues of c,
we construct a diagonal matrix of their square roots:
We should expect that these are the nonzero singular values in w. Sure enough:
Note, however, that w is not square, hence it is not equal to the square diagonal matrix of square roots of the eigenvalues, which I will write as $\Lambda$. This will lead to an irrelevant difference between A and Davis’ loadings, etc. In fact, I often ignore the distinction between $\Lambda$ and w conceptually; in practice, of course, whatever matrices we’re using have to be “the right size” for whatever matrix products we compute (conformable).
We should recall that the nonzero eigenvalues are the nonzero values of both $w^T w$ and $w\,w^T$:
The nonzero eigenvalues were…
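The size bookkeeping here is easy to check numerically. The sketch below is an illustration under assumptions: `raw` is an assumed stand-in data set, `Lam` is a hypothetical name for the square diagonal matrix of square roots of the eigenvalues, and `w` is the rectangular matrix from the full SVD.

```python
import numpy as np

# Assumed stand-in data; not a verified copy of Davis's table.
raw = np.array([[ 4.0, 27.0, 18.0],
                [12.0, 25.0, 12.0],
                [10.0, 23.0, 16.0],
                [14.0, 21.0, 14.0]])
X = raw - raw.mean(axis=0)

u, s, vT = np.linalg.svd(X, full_matrices=True)

# Rectangular w (N x 3) versus square Lam (3 x 3): same nonzero entries.
w = np.zeros(X.shape)
w[:len(s), :len(s)] = np.diag(s)
Lam = np.diag(s)

wtw = w.T @ w    # 3 x 3, diagonal
wwt = w @ w.T    # N x N, diagonal with one more zero

eigs = np.linalg.eigvalsh(X.T @ X)[::-1]

print("w shape:", w.shape, " Lam shape:", Lam.shape)
print("diag(w^T w):", np.diag(wtw))
print("diag(w w^T):", np.diag(wwt))
print("eigenvalues of X^T X:", eigs)
```

Both products are diagonal and carry the same nonzero entries; they differ only in how many zeros pad the diagonal.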
Resuming our thread, we compute the weighted eigenvector matrix A
and that should be Davis’ R-mode loadings; and it is, except for an extra – to my mind, irrelevant – column of zeros.
Viewing A as a transition matrix, we know that Harman’s scores are the new data $F^T$. We also know (from an earlier post) that we may compute them directly from the SVD:
that is, the “first few” columns of u, not all of u.
Ah, life gets interesting because A is of rank 2 instead of 3. The new data should be the first two columns of u, but if we want to see (Harman’s) Z = A F, then we must either cut down A or keep the first 3 columns of u, to have conformable matrices. I will cut down A.
We check that by computing A F and comparing it to $X^T$; we expect Z = AF. And we have it; here’s Z:
And here’s A F (using the two nonzero columns of A!):
I remind us that this new data does not have redistributed variance; in fact, this new data has constant variance:
There are a couple of (intertwined) ways to understand why the variances are not 1. Fundamentally, we did an eigendecomposition of $X^T X$ rather than of the covariance matrix $X^T X/(N-1)$. Alternatively, our new data was given by columns of u rather than by columns of $\sqrt{N-1}\,u$. (And that’s a consequence of the “fundamental” reason, but it may make sense by itself.)
That constant variance is in marked contrast to the maximally redistributed variances of the new data Y constructed from the orthogonal matrix v instead of from A.
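A minimal sketch of Harman's model for this example, under stated assumptions: `raw` is an assumed stand-in data set, `A` is the weighted eigenvector matrix cut down to its nonzero columns, and `Ft` (a hypothetical name) holds the new data as the first few columns of u.

```python
import numpy as np

# Assumed stand-in data; not a verified copy of Davis's table.
raw = np.array([[ 4.0, 27.0, 18.0],
                [12.0, 25.0, 12.0],
                [10.0, 23.0, 16.0],
                [14.0, 21.0, 14.0]])
X = raw - raw.mean(axis=0)
N = X.shape[0]

u, s, vT = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10))       # number of nonzero singular values

# Weighted eigenvector matrix, cut down to its r nonzero columns:
# column j of v is multiplied by the j-th singular value.
A = vT.T[:, :r] * s[:r]

# New data: the first r columns of u (variables in columns).
Ft = u[:, :r]

# Harman's layout has variables in rows, so Z = X^T and Z = A F.
Z = X.T
AF = A @ Ft.T

var_F = Ft.var(axis=0, ddof=1)   # each equals 1/(N-1), not 1

print("max |Z - A F|:", np.abs(Z - AF).max())
print("variances of the new data:", var_F)
```

Multiplying `Ft` by `np.sqrt(N - 1)` would give unit variances, which matches the remark above about working with $X^T X$ rather than the covariance matrix.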
It is my strong preference to compute the new data from the first few columns of u. To make sure I have the scaling correct, I then check the product A F.
Any number of people, including Davis himself in a subsequent example, show us how to compute the new data using a reciprocal basis instead; that is, one which weights the v matrix by $1/\sqrt{\lambda}$ instead of $\sqrt{\lambda}$. (Whether or not an author uses the phrase reciprocal basis or dual basis, if he divides each column of v by $\sqrt{\lambda}$, he is computing a reciprocal basis.)
Life is made interesting, but not difficult, by the presence of a zero eigenvalue. Frankly, the simplest way to deal with a zero eigenvalue – for finding a reciprocal basis – is to set the zero eigenvalue to 1. Instead of the diagonal matrix of square roots of the eigenvalues, I use the same matrix with its zero entry replaced by 1;
and instead of the weighted matrix A, I compute the reciprocal basis B (i.e. I divide each column of v by $\sqrt{\lambda}$ instead of multiplying by $\sqrt{\lambda}$).
Then the new data with respect to A can be found by projecting X onto B:
That had better be … and it is:
(Okay, that was the simplest way to get a reciprocal basis when we have a zero eigenvalue; but the simplest way to get the new data is to avoid the reciprocal basis entirely by taking the first few columns of u instead – assuming that you have the SVD available to you.)
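The reciprocal-basis route can be sketched like this. It is an illustration under assumptions: `raw` is an assumed stand-in data set, `s_safe` is a hypothetical name for the singular values with the zero replaced by 1, and the tolerance `1e-10` is arbitrary.

```python
import numpy as np

# Assumed stand-in data; not a verified copy of Davis's table.
raw = np.array([[ 4.0, 27.0, 18.0],
                [12.0, 25.0, 12.0],
                [10.0, 23.0, 16.0],
                [14.0, 21.0, 14.0]])
X = raw - raw.mean(axis=0)

u, s, vT = np.linalg.svd(X, full_matrices=False)
v = vT.T

# Replace the zero singular value by 1 so that we can divide by it.
s_safe = np.where(s > 1e-10, s, 1.0)

# Reciprocal basis: divide column j of v by s_safe[j].
B = v / s_safe

# New data with respect to A, found by projecting X onto B.
new_data = X @ B

print("projection X B:\n", new_data)
print("first two columns of u:\n", u[:, :2])
```

The last column of the projection comes out as zeros, because X has no component along the eigenvector with zero eigenvalue; the choice of 1 only avoids a division by zero.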
For this example, Davis did not compute the new data with respect to A (i.e. wrt his loadings). As I have said before, I have seen him do it in another example; but his scores are not the new data $F^T$, which is, as far as I have seen in textbooks, what other people use for the scores corresponding to loadings A.
Moving on, we learned from Basilevsky (in an earlier post) that the weighted eigenvector matrix A was also the dispersion matrix between the new data $F^T$ and the old data X. I say “dispersion matrix” because A itself was not computed from a covariance matrix but from a more general “dispersion” matrix, $X^T X$.
Let’s confirm that. For just a moment I want to denote the new data by (Jolliffe’s) Z instead of $F^T$, so as not to get confused by transposes. A quick computation (on the side) shows me that I want to compare A with $X^T Z$, i.e. that I want to compute $X^T F^T$
(finally reverting to my original notation). Fine, I compute that product, and I get
and I compare that to A:
Good. Our matrix A is, as we expect, a cross-dispersion matrix between old and new data. (Perhaps I should be less meticulous, and refer to A as a cross-covariance matrix; that term may be more familiar and hence clearer in sense, even though it is not the covariance matrix.)
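A short numerical check of that cross-dispersion claim, as a sketch under assumptions: `raw` is an assumed stand-in data set, `Ft` is a hypothetical name for the new data (the first few columns of u), and no 1/(N-1) factor is applied, in keeping with the use of $X^T X$ rather than the covariance matrix.

```python
import numpy as np

# Assumed stand-in data; not a verified copy of Davis's table.
raw = np.array([[ 4.0, 27.0, 18.0],
                [12.0, 25.0, 12.0],
                [10.0, 23.0, 16.0],
                [14.0, 21.0, 14.0]])
X = raw - raw.mean(axis=0)

u, s, vT = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10))

A = vT.T[:, :r] * s[:r]   # weighted eigenvector matrix (nonzero columns)
Ft = u[:, :r]             # new data with respect to A

# Cross-dispersion between old data X and new data Ft.
cross = X.T @ Ft

print("X^T F^T:\n", cross)
print("A:\n", A)
```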
Now let me show you something new. We had also discovered from Basilevsky (in an earlier post) that the orthogonal eigenvector matrix v was not a dispersion matrix D between its old data X and its new data Y. The v matrix does not have this property of the A matrix. We had seen that v and D differed by eigenvalue column-weights, $D = v\,\Lambda^2$, where $\Lambda$ is the diagonal matrix of square roots of the eigenvalues.
But… we’ve just seen that weight here. Okay, it wasn’t $\Lambda^2$, but it was either $\Lambda$ or $\Lambda^{-1}$. Well, it must have been $\Lambda$, i.e. the one that multiplied v to give A, so that $D = A\,\Lambda$.
Of course, I should check that computationally. Here is :
and here is our :
Hang on. We need to explicitly see $X^T Y$, the cross-dispersion matrix analogous to $X^T F^T$: we compute it…
I don’t know if it’s useful, but there it is anyway. I can hear my favorite Vulcan whisper, “Fascinating.”
Don’t misunderstand me: I am sure that the Q-mode scores and loadings are interesting in their own right. (I don’t understand the goal of a Q-mode analysis, but that’s another question.)
But I do find it… fascinating… that Davis’ Q-mode scores are the cross-dispersion matrix between the old data X and the new data Y (in addition to our earlier finding that his Q-mode loadings are the new data Y with respect to the transition matrix v).
To put that another way…. The orthogonal eigenvector matrix v could have been computed from an eigendecomposition of the dispersion matrix $X^T X$, and defines new data Y; and Davis’ Q-mode scores matrix is the cross-dispersion between old data X and new data Y.
I had asked a question: is the orthogonal matrix v the dispersion matrix D between the old data X and the new data Y? The answer was “No” (in an earlier post).
Now we know the rest of the story: not v, but the Q-mode scores are that dispersion matrix D.
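The relation between v and D can be checked with a few lines of NumPy. This sketch only verifies the linear algebra, namely that $D = X^T Y$ equals v with each column scaled by its eigenvalue; it does not reproduce Davis's own Q-mode computation. The array `raw` is an assumed stand-in data set, and no 1/(N-1) factor is applied.

```python
import numpy as np

# Assumed stand-in data; not a verified copy of Davis's table.
raw = np.array([[ 4.0, 27.0, 18.0],
                [12.0, 25.0, 12.0],
                [10.0, 23.0, 16.0],
                [14.0, 21.0, 14.0]])
X = raw - raw.mean(axis=0)

u, s, vT = np.linalg.svd(X, full_matrices=False)
v = vT.T

# New data with respect to the orthogonal matrix v.
Y = X @ v

# Cross-dispersion between old data X and new data Y.
D = X.T @ Y

# v with column j scaled by the j-th eigenvalue of X^T X (s[j] squared).
v_weighted = v * s**2

print("D = X^T Y:\n", D)
print("v scaled by eigenvalues:\n", v_weighted)
print("D equals v itself?", np.allclose(D, v))
```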