It’s taking me a while to summarize Davis, partly because it isn’t just a summary. There are several interesting details to discuss, and then I need to ask: what did we see Harman and Jolliffe do?
While I work on my answers, let me show you an outline of the interesting details and briefly refresh your memory of Harman and Jolliffe.
In addition, I will show you one answer that blew me away. Maybe it shouldn’t have, but it did.
Part 1 outline
First off, I would summarize what we found in Davis. We computed 4 matrices, , , , . Just as importantly, we computed them 3 different ways and got, strictly speaking, 3 different answers, because the sizes were different. The 3 different ways were the SVD, the full eigendecompositions of and , and Davis’ cut-down versions, which used an invertible 2×2 matrix of .
For 2 of the alternatives, I showed that Q-mode and R-mode were related by equations of the form
where the exact form of depends on which of the 2 alternatives we used.
- Is this true for the 3rd alternative?
- Should I work it out?
- Does it affect my recommendation that we use either the SVD or the full eigendecompositions in preference to Davis’ cut-down version?
- We should take a careful look at the relationship between the eigendecompositions and the SVD.
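To make the relationship between the eigendecompositions and the SVD concrete, here is a minimal sketch assuming NumPy and a made-up 4×3 matrix (not Davis’ data). The names U, w, V, lam_R, lam_Q are mine. It checks that the eigenvalues of X^T X and X X^T are the squared singular values, that the eigenvectors agree with the singular vectors up to the sign of each column, and that X V = U diag(w) and X^T U = V diag(w), which tie the R-mode vectors to the Q-mode vectors. I am not claiming these are exactly the equations referred to above.

```python
import numpy as np

# Made-up 4x3 data matrix: 4 observations (rows), 3 variables (columns)
X = np.array([[2.0, 1.0, 0.0],
              [4.0, 3.0, 1.0],
              [3.0, 7.0, 2.0],
              [1.0, 2.0, 6.0]])
X = X - X.mean(axis=0)  # column-center

# Alternative 1: economy-size SVD, X = U diag(w) V^T (U is 4x3, V is 3x3)
U, w, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# Alternative 2: full eigendecompositions of X^T X (3x3) and X X^T (4x4)
lam_R, V_eig = np.linalg.eigh(X.T @ X)
lam_Q, U_eig = np.linalg.eigh(X @ X.T)
# eigh returns eigenvalues in ascending order; flip to match the SVD's descending order
lam_R, V_eig = lam_R[::-1], V_eig[:, ::-1]
lam_Q, U_eig = lam_Q[::-1], U_eig[:, ::-1]

# The nonzero eigenvalues of both products are the squared singular values;
# X X^T has one extra eigenvalue, which is zero for this column-centered X
print(np.allclose(lam_R, w**2), np.allclose(lam_Q[:3], w**2), abs(lam_Q[3]) < 1e-8)

# Eigenvectors agree with singular vectors up to the sign of each column
print(np.allclose(np.abs(V_eig.T @ V), np.eye(3)))
print(np.allclose(np.abs(U_eig[:, :3].T @ U), np.eye(3)))

# R-mode and Q-mode vectors are tied together by the singular values
print(np.allclose(X @ V, U * w), np.allclose(X.T @ U, V * w))
```

The sign ambiguity matters in practice: eigenvectors from two separate eigendecompositions need not have matching signs, whereas U and V from one SVD always do.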
Second, it was conceptually convenient that Davis’ example turned out to be row-centered as well as column-centered: the mean of each row turned out to be zero after we made the mean of each column zero. Had that not been the case, we would have had a number of possibilities. One: if X is column-centered but not row-centered, then our R-mode analysis is of centered data but our Q-mode analysis is not. Two: we could row-center the data before doing the Q-mode analysis, but that would change X, and we would lose the relationship between R-mode and Q-mode.
- Is there any chance that we can always center both the columns and the rows?
- Is there any chance that we can always standardize both the columns and the rows?
- If we can’t have our cake and eat it too, should we give up the duality between R-mode and Q-mode?
- What if the rows of the data matrix add up to 1 or 100 (i.e., the variables are percentages)?
If we can center both, then Davis’ example was not a special case; he merely didn’t show us that we could always have it that way. Further, if we can center both, then we are effectively using covariance matrices in both the R-mode and the Q-mode analyses. If we could standardize both, then we could effectively use correlation matrices in both. But if we cannot, then we end up with data that are centered one way and uncentered the other, or standardized one way but not the other.
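A small experiment on the centering questions, assuming NumPy and made-up data; the helper names col_center and double_center are hypothetical. It shows three things. First, column-centering alone generally leaves nonzero row means. Second, for percentage data where every row sums to the same constant, the column means also sum to that constant, so column-centering zeros every row mean automatically. Third, double centering (subtract row means and column means, add back the grand mean) zeros both, but, as the text warns, it produces a different matrix from the column-centered X. This addresses centering only, not standardizing.

```python
import numpy as np

def col_center(M):
    """Subtract each column's mean."""
    return M - M.mean(axis=0)

def double_center(M):
    """Subtract row means and column means, then add back the grand mean."""
    return M - M.mean(axis=1, keepdims=True) - M.mean(axis=0) + M.mean()

# Made-up data whose rows have different sums
X = np.array([[2.0, 1.0, 0.0],
              [4.0, 3.0, 1.0],
              [3.0, 7.0, 2.0],
              [1.0, 2.0, 6.0]])
Xc = col_center(X)
print(Xc.mean(axis=1))          # row means are not zero

# Made-up "closed" data: every row is a set of percentages summing to 100
P = np.array([[20.0, 30.0, 50.0],
              [10.0, 60.0, 30.0],
              [40.0, 40.0, 20.0],
              [25.0, 25.0, 50.0]])
Pc = col_center(P)
print(Pc.mean(axis=1))          # zero up to rounding: the column means also sum to 100

# Double centering zeros every row mean and every column mean, but it alters X
Xd = double_center(X)
print(Xd.mean(axis=0), Xd.mean(axis=1))
print(np.allclose(Xd, Xc))      # not the same matrix as the column-centered Xc
```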
Third, we haven’t seen a serious example of Q-mode analysis. Davis shows that it can be viewed as a consequence of R-mode analysis, offering nothing new. The general tenor of remarks about it in most of my books is negative. I’d like to see a serious example, but I’ll just have to keep my eyes open.
Fourth, how do we relate Davis to Harman? We saw three key things in Harman:
- a plot that was not intuitively obvious to me at first.
- we could redistribute the variance of the original variables.
- his model, written Z = A F, was (old data, variables in rows) = (some eigenvector matrix) x (new data).
We should try the first two, and contrast Z = A F with Davis’ model.
- Hmm, just what is Davis’ model?
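Here is a sketch of the two Harman items that can be tried without his data. It assumes NumPy, a made-up 4×3 matrix, and, as my own assumption about the convention, a matrix A whose columns are the covariance eigenvectors scaled by the square roots of their eigenvalues. The rows of A² sum to the variances of the original variables, and the columns sum to the variances of the new variables: the same total variance, redistributed. With F defined as the new data rescaled to unit variance, with new variables in rows, Z = A F gives back the old data with variables in rows.

```python
import numpy as np

X = np.array([[2.0, 1.0, 0.0],
              [4.0, 3.0, 1.0],
              [3.0, 7.0, 2.0],
              [1.0, 2.0, 6.0]])
X = X - X.mean(axis=0)                 # column-center
n = X.shape[0]

C = X.T @ X / (n - 1)                  # sample covariance of the 3 variables
lam, V = np.linalg.eigh(C)
lam, V = lam[::-1], V[:, ::-1]         # largest eigenvalue first

# Assumed convention: scale each eigenvector by the square root of its eigenvalue
A = V * np.sqrt(lam)

# Redistribution of variance: same total, split differently
orig_var = (A**2).sum(axis=1)          # equals diag(C): variances of the original variables
new_var = (A**2).sum(axis=0)           # equals lam: variances of the new variables
print(orig_var, new_var, orig_var.sum(), new_var.sum())

# Z = A F: old data (variables in rows) = A x new data with unit variance
Y = X @ V                              # new data, observations in rows
F = (Y / np.sqrt(lam)).T               # new variables in rows, each scaled to unit variance
Z = X.T                                # old data, variables in rows
print(np.allclose(Z, A @ F))
```

The identity holds because A F = V diag(√lam) diag(1/√lam) V^T X^T = X^T, which requires every eigenvalue to be nonzero.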
Fifth, how do we relate Davis to Jolliffe? Well, what did Jolliffe show us?
- serious rounding of numbers, to see structure.
- a lot of suggestions that we couldn’t carry out without the data.
- four criteria for deciding how many new variables to keep.
- his model, written Z = X A, was (new data) = (old data) x (orthogonal eigenvector matrix).
We should look at all of those, with the possible exception of Jolliffe’s unillustrated suggestions, and even those I may consider.
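A sketch of Jolliffe’s form, assuming NumPy, a made-up 4×3 matrix, and eigenvectors of the sample covariance. It checks that A is orthogonal, so Z = X A can be undone as X = Z A^T. Transposing gives X^T = A Z^T, which has the shape (old data, variables in rows) = (eigenvector matrix) x (new data) of Harman’s Z = A F, apart from how the new data are scaled. The last lines apply one commonly cited retention rule, a cumulative share of total variance. The 0.80 threshold is a hypothetical choice, and this rule may or may not match one of the four criteria referred to above.

```python
import numpy as np

X = np.array([[2.0, 1.0, 0.0],
              [4.0, 3.0, 1.0],
              [3.0, 7.0, 2.0],
              [1.0, 2.0, 6.0]])
X = X - X.mean(axis=0)

# Eigenvectors of the covariance matrix, largest eigenvalue first
lam, A = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(lam)[::-1]
lam, A = lam[order], A[:, order]

Z = X @ A                                  # new data = old data x orthogonal eigenvector matrix
print(np.allclose(A.T @ A, np.eye(3)))     # A is orthogonal...
print(np.allclose(Z @ A.T, X))             # ...so the old data come back as Z A^T
print(np.allclose(X.T, A @ Z.T))           # transposed: old (variables in rows) = A x new

# Hypothetical retention rule: keep the fewest new variables whose
# cumulative share of the total variance reaches the threshold
threshold = 0.80
cum_share = np.cumsum(lam) / lam.sum()
k = int(np.searchsorted(cum_share, threshold)) + 1
print(cum_share, k)
```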
Sixth, can I interpret the , , , in any or all of the 3 alternative definitions?
I’m still mucking about with this.
Part 2 oh, wow
Now, let me hit you with a thunderbolt. Well, it was a thunderbolt for me; maybe it won’t be for you.
This is the PCA breakthrough that I mentioned about a week ago in “happenings 24 march”. It’s a little embarrassing that I didn’t know this; that’s why it was a breakthrough. And it taught me a lesson:
If you have the design matrix X, always compute the new data with respect to the appropriate orthonormal eigenvector basis (in this case my v ~ Davis’ U).
Mucking about with the algebra didn’t prepare me for this. Yes, I had seen the answer, and thereby hangs my head. But algebra that just falls out neatly is wrong more often than right, for me anyway; I make mistakes that end up with neat answers way too often. Besides, I was mucking about with several matrices, looking for anything interesting.
It was too big a pot of stew. I’m still tasting that stew.
Then I asked myself a specific question: what is the new data, i.e., what are the components of the data with respect to the orthonormal eigenvector matrix v?
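For each data vector we have old components z = v y (cf. Z = A F in Harman), where z and y are column vectors. For row vectors, z^T = y v^T, where I have just redefined y to be a row vector. The corresponding matrix equation, with the data vectors as the rows of X and the new row vectors as the rows of Y, is X = Y v^T, or equivalently, since v is orthogonal, Y = X v.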
Oh, of course. We project (each row of) X onto the orthogonal eigenvector matrix v to get the new components of that row.
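A quick check of the row-vector bookkeeping, assuming NumPy and a made-up 4×3 matrix. Here v is taken as the eigenvectors of X^T X returned by numpy.linalg.eigh, whose column order and signs may differ from a hand computation. Solving z = v y for a single column vector gives the same numbers as the corresponding row of X v, and multiplying by v^T recovers the old data.

```python
import numpy as np

X = np.array([[2.0, 1.0, 0.0],
              [4.0, 3.0, 1.0],
              [3.0, 7.0, 2.0],
              [1.0, 2.0, 6.0]])
X = X - X.mean(axis=0)

# v: orthonormal eigenvectors of X^T X, one per column
_, v = np.linalg.eigh(X.T @ X)

# One data vector as a column: old components z = v y, so new components y = v^T z
z = X[0].reshape(-1, 1)
y = v.T @ z

# All rows at once: z^T = y v^T stacks into X = Y v^T, i.e. Y = X v
Y = X @ v
print(np.allclose(Y[0], y.ravel()))   # row 0 of Y holds the new components of row 0 of X
print(np.allclose(Y @ v.T, X))        # and the old data come back
```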
That looks familiar. Is any one of my four matrices 4×3? Yes, exactly one of them, A^Q. No, don’t tell me they’re the same.
god god damn. god damn. oh, damn. (That’s what I wrote then, and I see no reason to hide it.) They look the same.
Then I write out that
and I’m content. They really are the same. There was at least one large piece of steak hidden in that pot of stew.
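Here is a numerical sketch of the claim, under stated assumptions. It uses NumPy and a made-up 4×3 matrix that is double-centered, so that, like Davis’ example, every row and every column has mean zero. It also takes the Q-mode loadings to be the left singular vectors scaled by the singular values, which is my assumption about the definition in use here. The columns of X V then match U diag(w) exactly, provided V and U come from the same SVD; eigenvectors computed separately can differ in the sign of each column. Because all row and column sums are zero, the rank is at most 2, so the third column of X V is zero. That may be why a cut-down version can get by with two columns.

```python
import numpy as np

raw = np.array([[2.0, 1.0, 0.0],
                [4.0, 3.0, 1.0],
                [3.0, 7.0, 2.0],
                [1.0, 2.0, 6.0]])
# Double-center: remove row means and column means, add back the grand mean
X = raw - raw.mean(axis=1, keepdims=True) - raw.mean(axis=0) + raw.mean()

U, w, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

new_data = X @ V        # new components of each row with respect to the eigenvector basis V
qmode_loadings = U * w  # ASSUMED definition: U scaled column by column by the singular values

print(np.allclose(new_data, qmode_loadings))
print(np.linalg.matrix_rank(X))   # at most 2: every row and every column sums to zero
print(np.round(new_data, 4))      # third column is (numerically) zero
```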
It seems strange. The generally deprecated “Q-mode loadings” are the new data under the change-of-basis v, which is the one we find from an R-mode analysis. (Something similar is true of , but at least one matrix has to be transposed.)
To me, this makes A^Q the most important matrix in the bunch. I understand “new data” under a change-of-basis.
It shocks me that Davis didn’t say this. On the other hand, his A^Q had only two columns; it is not the same size as the design matrix. To be specific, it was
Maybe now is a good time to dig out a quotation from John von Neumann. Darn, I can’t find it. It was something like
One doesn’t understand mathematics. One gets used to it.
PCA / FA seems to focus on “scores” instead of the new data.
That’s OK. Because I know what A^Q is, I can cope with “scores”.
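Books differ on what they call "scores", so this sketch assumes two conventions rather than any particular author’s definition. Under the first, the scores are the new data X V themselves. Under the second, the new data are rescaled so that each column has unit sample variance. It uses NumPy and made-up random data. Under the second convention the scores are just sqrt(n - 1) times U, a column-by-column rescaling of the new data, so knowing the new data is enough to recover either kind.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))       # made-up data: 6 observations, 3 variables
X -= X.mean(axis=0)               # column-center
n = X.shape[0]

U, w, Vt = np.linalg.svd(X, full_matrices=False)
new_data = X @ Vt.T               # new components with respect to the eigenvector basis

# Convention 1 (assumed): scores are the new data themselves
scores_raw = new_data

# Convention 2 (assumed): scores rescaled so each column has unit sample variance
scores_unit = new_data / new_data.std(axis=0, ddof=1)

# Under convention 2 the scores are sqrt(n - 1) times U
print(np.allclose(scores_unit, np.sqrt(n - 1) * U))
```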