## PCA / FA Example 4: Davis, review (1)

It’s taking me a while to summarize Davis, partly because it isn’t just a summary. There are several interesting details to discuss, and then I need to ask: what did we see Harman and Jolliffe do?

While I work on my answers, let me show you the outline of interesting details, and briefly refresh your memory of Harman & Jolliffe.

In addition, I will show you one answer that blew me away. Maybe it shouldn’t have, but it did.

### Part 1: outline

First off, let me summarize what we found in Davis. We computed four matrices, $A^R$, $S^R$, $A^Q$, $S^Q$. Just as importantly, we computed them three different ways, and got, strictly speaking, three different answers, because the sizes were different. The three different ways were the SVD; the full eigendecompositions of $X\ X^T$ and $X^T\ X$; and Davis’ cut-down versions, which used an invertible 2×2 matrix of $\sqrt{\text{eigenvalues}}$.

For two of the alternatives, I showed that Q-mode and R-mode are related by equations of the form

$S^Q = A^R\ \sqrt{\text{eigenvalues}}$

and

$S^R = A^Q\ \sqrt{\text{eigenvalues}}$,

where the exact form of $\sqrt{\text{eigenvalues}}$ depends on which of the two alternatives we used.

• Is this true for the third alternative?
• Should I work it out?
• Does it affect my recommendation that we use either the SVD or the full eigendecompositions in preference to Davis’ cut-down version?
• We should take a careful look at the relationship between the eigendecompositions and the SVD.
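That last relationship can be checked numerically. Here is a minimal NumPy sketch, using the data matrix X that appears later in this post; the variable names are my own, not Davis’. The nonzero eigenvalues of both $X\ X^T$ and $X^T\ X$ should be the squared singular values of X.

```python
import numpy as np

# Davis' column-centered data matrix, as used later in this post.
X = np.array([[-6.0,  3.0,  3.0],
              [ 2.0,  1.0, -3.0],
              [ 0.0, -1.0,  1.0],
              [ 4.0, -3.0, -1.0]])

# Full SVD: X = u W v^T, with w the singular values.
u, w, vt = np.linalg.svd(X)           # u is 4x4, w has 3 entries, vt is 3x3

# Full eigendecompositions of the two symmetric products.
evals_R, evecs_R = np.linalg.eigh(X.T @ X)   # R-mode side, 3x3
evals_Q, evecs_Q = np.linalg.eigh(X @ X.T)   # Q-mode side, 4x4

# Both sets of nonzero eigenvalues are the squared singular values of X.
print(np.sort(w**2)[::-1])           # squared singular values, descending
print(np.sort(evals_R)[::-1])        # eigenvalues of X^T X
print(np.sort(evals_Q)[::-1][:3])    # top 3 eigenvalues of X X^T
```

The eigenvectors agree too (up to sign, and up to the extra null-space vectors in the 4×4 Q-mode problem), which is exactly why the three computations give answers of different sizes but the same content.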

Second, it was conceptually convenient that Davis’ example turned out to be row-centered as well as column-centered: the mean of each row turned out to be zero after we made the mean of each column zero. Had that not been the case, we would have had a number of possibilities. One, if X is column-centered but $X^T$ is not, then our R-mode analysis is of centered data but our Q-mode analysis is not. Two, we could have row-centered the data before doing the Q-mode analysis; but that would change X, and we would lose the relationship between R-mode and Q-mode.

• Is there any chance that we can always center both the columns and the rows?
• Is there any chance that we can always standardize both the columns and the rows?
• If we can’t have our cake and eat it too, should we give up the duality between R-mode and Q-mode?
• What if the rows of the data matrix add up to 1 or 100 (i.e. the variables are percentages)?

If we can center both, then Davis’ example was not a special case; he merely didn’t show us that we could always have it that way. Further, if we can center both, then we are effectively using covariance matrices for both X and $X^T$. If we could standardize both, then we would effectively be using correlation matrices for both X and $X^T$. But if we cannot, then we end up with data centered one way and uncentered the other, or standardized one way but not the other.
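On the first bullet, a quick numerical check (NumPy, with my own made-up data) suggests centering both is always possible: row-centering a column-centered matrix does not spoil the column-centering, because the row means of a column-centered matrix themselves average to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                 # arbitrary data, no special structure

Xc = X - X.mean(axis=0)                     # column-center: each column mean -> 0
Xcc = Xc - Xc.mean(axis=1, keepdims=True)   # then row-center: each row mean -> 0

# Both sets of means are now zero (to roundoff):
print(np.abs(Xcc.mean(axis=0)).max())       # column means, ~0
print(np.abs(Xcc.mean(axis=1)).max())       # row means, ~0
```

Note this argument is specific to centering; the analogous claim for standardizing both rows and columns does not follow from it, which is why I keep the second bullet as an open question.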

Third, we haven’t seen a serious example of Q-mode analysis. Davis shows that it can be viewed as a consequence of R-mode, nothing new. The general tenor of remarks about it in most of my books is negative. I’d like to see a serious example, but I’ll just have to keep my eyes open.

Fourth, how do we relate Davis to Harman? We saw three key things in Harman:

• A plot that was not intuitively obvious to me at first.
• We could redistribute the variance of the original variables.
• His model, written Z = A F, was (old data, variables in rows) = (some eigenvector matrix) × (new data).

We should try the first two, and contrast Z = A F with Davis’ model.

• Hmm, just what is Davis’ model?

Fifth, how do we relate Davis to Jolliffe? Well, what did Jolliffe show us?

• Serious rounding of numbers, to see structure.
• A lot of suggestions that we couldn’t carry out without the data.
• Four criteria for deciding how many new variables to keep.
• His model, written Z = X A, was (new data) = (old data) × (orthogonal eigenvector matrix).

We should look at all of those, with the possible exception of Jolliffe’s un-illustrated suggestions, and even those I may consider.
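Since both models are now on the table, a few lines of NumPy can line them up. The data and names here are my own illustration, not Harman’s or Jolliffe’s, and I treat A as the full orthogonal eigenvector matrix so it can be inverted by a transpose:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
X = X - X.mean(axis=0)          # column-centered data, observations in rows

_, _, vt = np.linalg.svd(X)
A = vt.T                        # orthonormal eigenvectors of X^T X, in columns

# Jolliffe: Z = X A, i.e. (new data) = (old data)(eigenvector matrix).
Z_jolliffe = X @ A

# Harman: Z = A F, with variables in rows, i.e. (old) = (eigenvectors)(new).
# Since A is orthogonal, the new data is F = A^T Z, with Z = X^T here.
F_harman = A.T @ X.T

# The two models produce the same new data, one transposed vs the other:
print(np.allclose(F_harman, Z_jolliffe.T))  # True
```

So the apparent disagreement between Z = A F and Z = X A is just the row-vector vs column-vector convention.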

Sixth, can I interpret the $A^R$, $S^R$, $A^Q$, $S^Q$ in any or all of the three alternative definitions?

I’m still mucking about in this.

### Part 2: oh, wow

Now, let me hit you with a thunderbolt. Well, it was a thunderbolt for me; maybe it won’t be for you.

This is the PCA breakthrough that I mentioned about a week ago in “happenings 24 march”. It’s a little embarrassing that I didn’t know this. That’s why it was a breakthrough. And it taught me a lesson:

if you have the design matrix X, always compute the new data wrt the appropriate orthonormal eigenvector basis (in this case my v ~ Davis’ U).

Always.

Mucking about with the algebra didn’t prepare me for this. Yes, I had seen the answer, and thereby hangs my head. But algebra that just falls out neatly is wrong more often than right, for me anyway. I make mistakes that end up with neat answers way too often. Besides, I was mucking about with several matrices, looking for anything interesting.

It was too big a pot of stew. I’m still tasting that stew.

Then I asked myself a specific question: what is the new data, i.e. what are the components of the data wrt the orthonormal eigenvector matrix v?

For each data vector we have the old components z = v y (cf. Z = A F in Harman), where z and y are column vectors. For row vectors, $x = y\ v^T$, where I have just redefined y to be a row vector. The corresponding matrix equation is

$Y = X v^{-T} = X \ v$

oh, of course. We project (each row of) X onto the orthogonal eigenvector matrix v to get new components of the row.

$X\ v = \left(\begin{array}{ccc} -6&3&3\\ 2&1&-3\\ 0&-1&1\\ 4&-3&-1\end{array}\right) \left(\begin{array}{ccc} 0.816497&0&0.57735\\ -0.408248&-0.707107&0.57735\\ -0.408248&0.707107&0.57735\end{array}\right)$

=$\left(\begin{array}{ccc} -7.34847&0&0\\ 2.44949&-2.82843&0\\ 0.&1.41421&0\\ 4.89898&1.41421&0\end{array}\right)$

That looks familiar. Is any one of my four matrices a 4×3? Yes, exactly one of them, $A^Q$. No, don’t tell me they’re the same.

$A^Q = \left(\begin{array}{ccc} -7.34847&0.&0.\\ 2.44949&-2.82843&0.\\ 0.&1.41421&0.\\ 4.89898&1.41421&0.\end{array}\right)$

god god damn. god damn. oh, damn. (That’s what I wrote then, and I see no reason to hide it.) They look the same.

Then I write out that

$X\ v = (u\ w \ v^T)\ v = u\ w = A^Q$

and i’m content. They really are the same. There was at least one large piece of steak hidden in that pot of stew.
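That identity can be checked directly in NumPy, using the X from above. Here W is the rectangular (4×3) diagonal matrix of singular values, so `u @ W` is the $u\ w$ of the equation; the variable names are mine.

```python
import numpy as np

X = np.array([[-6.0,  3.0,  3.0],
              [ 2.0,  1.0, -3.0],
              [ 0.0, -1.0,  1.0],
              [ 4.0, -3.0, -1.0]])

u, w, vt = np.linalg.svd(X)        # X = u W v^T
v = vt.T

W = np.zeros_like(X)               # rectangular diagonal matrix of singular values
np.fill_diagonal(W, w)

new_data = X @ v                   # project each row of X onto the eigenvector basis
print(np.allclose(new_data, u @ W))  # True: X v = u w = A^Q
```

The check holds regardless of the sign conventions NumPy chooses for the columns of u and v, since X v = u w v^T v = u w is an algebraic identity.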

It seems strange. The generally deprecated “Q-mode loadings” $A^Q$ are the new data under the change of basis v, which is the one we find from an R-mode analysis. (Something similar is true of $A^R$, but at least one matrix has to be transposed.)

To me, this makes $A^Q$ the most important matrix in the bunch. I understand “new data” under a change-of-basis.

It shocks me that Davis didn’t say this. OTOH, his $A^Q$ had only two columns; it is not the same size as the design matrix. To be specific, it was

$\left(\begin{array}{cc} -7.34847&0.\\ 2.44949&2.82843\\ 0.&-1.41421\\ 4.89898&-1.41421\end{array}\right)$

Maybe now is a good time to dig out a quotation from John von Neumann. Darn, I can’t find it. It was something like

One doesn’t understand mathematics. One gets used to it.

PCA / FA seems to focus on “scores” instead of the new data.

That’s ok. Because I know what $A^Q$ is, I can cope with “scores”.