## PCA / FA Example 1: Harman. discussion 1.

harman gave us two outputs as a result of his analysis: a table and a picture.

let’s consider the picture first. it clearly shows z2 and z5 similar, z1 and z3 similar, and z4 roughly in the middle between them.

i don’t know about you, but this surprised me. should it have? well, let’s take a look at the correlation matrix, from which we got our results. i’m going to round it off just a little bit, so i can refer to 3-digit numbers instead of 5.

$\left(\begin{array}{ccccc} 1.&0.01&0.972&0.439&0.022\\ 0.01&1.&0.154&0.691&0.863\\ 0.972&0.154&1.&0.515&0.122\\ 0.439&0.691&0.515&1.&0.778\\ 0.022&0.863&0.122&0.778&1.\end{array}\right)$

we confirm that there is a high correlation (.972) between z1 and z3, and one almost as large (.863) between z2 and z5. z4 is about equally related to z2 (.691) and to z5 (.778). the same information as the picture seems to be there, but i didn’t see it.

let’s learn from that blindness. how might we have expected it? this time, round off the correlation matrix severely (to the nearest .5).

$\left(\begin{array}{ccccc} 1.&0&1.&0.5&0 \\ 0&1.&0&0.5&1. \\ 1.&0&1.&0.5&0 \\ 0.5&0.5&0.5&1.&1. \\ 0&1.&0&1.&1.\end{array}\right)$

there we are: on the off-diagonal, we have 1s for z1 & z3, z2 & z5 (good), and also for z4 & z5 (not so good). at the other extreme, we have 0s for z1 & z2 and z1 & z5, so z2 & z5 are far from z1. finally, we have either .5 or 1 for z4 vs everything else.

there was nothing sacred about .5. if we round to the nearest .25, instead, we get …

$\left(\begin{array}{ccccc} 1.&0&1.&0.5&0\\ 0&1.&0.25&0.75&0.75\\ 1.&0.25&1.&0.5&0\\ 0.5&0.75&0.5&1.&0.75\\ 0&0.75&0&0.75&1.\end{array}\right)$

we see similar relationships, but now the only off-diagonal 1 is between z1 & z3; z2 seems to be equally related to z4 & z5. but z4 is still the only variable related to everything else. bear in mind that we – i at least – still think the third variable, P3 (cf. F3) might be important.

conclusion: the picture we got showed us some strong relationships that are not quite so clearly seen by playing with the correlation matrix.

i find it reassuring that the relationships can be somewhat seen in the correlation matrix; i begin to trust that the relationships were not produced by our computations.

conclusion: severe rounding can be informative.
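both roundings are easy to reproduce. here is a minimal sketch, assuming Python with NumPy (not the tool used for the numbers in this post); `round_to_step` is a made-up helper name, and the input is the 3-digit correlation matrix shown above.

```python
import numpy as np

# Correlation matrix, rounded to 3 digits as in the text above.
r = np.array([
    [1.000, 0.010, 0.972, 0.439, 0.022],
    [0.010, 1.000, 0.154, 0.691, 0.863],
    [0.972, 0.154, 1.000, 0.515, 0.122],
    [0.439, 0.691, 0.515, 1.000, 0.778],
    [0.022, 0.863, 0.122, 0.778, 1.000],
])

def round_to_step(m, step):
    """Round every entry of m to the nearest multiple of step (hypothetical helper)."""
    return np.round(m / step) * step

r_half = round_to_step(r, 0.5)      # severe rounding, nearest .5
r_quarter = round_to_step(r, 0.25)  # gentler rounding, nearest .25

print(r_half)
print(r_quarter)
```

nothing here is specific to .5 or .25; any step can be passed in.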

before we leave the picture for his table, let’s go to 3D. we take the first three columns of the weighted eigenvector matrix A…

$\left(\begin{array}{ccc} 0.580958&0.806421&0.0275932\\ 0.767036&-0.544759&0.319267\\ 0.672433&0.726044&0.114925\\ 0.932392&-0.104306&-0.307804\\ 0.79116&-0.558179&-0.0647204\end{array}\right)$

… and we plot them as points in 3D:

that’s disappointing: z1 & z3 are still close, but z2 & z5 are not. ok, we have something to look at down the road. what we’re nibbling on the edges of is reduction of dimensionality: can we replace 5 variables by fewer? we are not done with this, not by a long shot.
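we can put rough numbers on “close” by computing distances between the points, first using only two columns and then all three. a minimal sketch, assuming Python with NumPy; `pairwise_distances` is a hypothetical helper, and euclidean distance is just one convenient yardstick, not something taken from harman.

```python
import numpy as np

# First three columns of A, one row per variable z1..z5 (copied from the text).
A3 = np.array([
    [0.580958,  0.806421,  0.0275932],
    [0.767036, -0.544759,  0.319267],
    [0.672433,  0.726044,  0.114925],
    [0.932392, -0.104306, -0.307804],
    [0.79116,  -0.558179, -0.0647204],
])

def pairwise_distances(points):
    """Euclidean distance between every pair of rows (hypothetical helper)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

d2 = pairwise_distances(A3[:, :2])  # the 2D picture
d3 = pairwise_distances(A3)         # the 3D picture

# Indices are zero-based: z1 is row 0, z5 is row 4.
print("z1-z3: 2D %.3f, 3D %.3f" % (d2[0, 2], d3[0, 2]))
print("z2-z5: 2D %.3f, 3D %.3f" % (d2[1, 4], d3[1, 4]))
```

in the plane of the first two columns, z2 & z5 sit about 0.03 apart; the third column pushes them to about 0.38, which is farther apart than z1 & z3 (about 0.15).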

open: investigate reduction of dimensionality.

now let’s go look at his table. recall it:

$\left(\begin{array}{ccccccc} Variable&P1&P2&P3&P4&P5&Variance\\ 1&0.581&0.8064&0.0276&-0.0645&-0.0852&1.\\ 2&0.767&-0.5448&0.3193&0.1118&-0.0216&1.0002\\ 3&0.6724&0.726&0.1149&-0.0073&0.0862&0.9999 \\ 4&0.9324&-0.1043&-0.3078&0.1582&0&1.\\ 5&0.7912&-0.5582&-0.0647&-0.2413&0.0102&0.9999\\ Variance&2.8733&1.7967&0.2148&0.0999&0.0153&5.\\Percent&57.5&35.9&4.3&2.&0.3&100.\end{array}\right)$

there are a few things which i find confusing at best, misleading at worst. (in fact, i think the table is inconsistent.)

i do not count his use of $P_i$ instead of $F_i$, although you might. maybe i should not have used his notation. he used $P_i$ instead of $F_i$ simply because his is a principal components example, but he will later do a factor analysis of the same data. eventually, he will obtain some other coefficients which he will multiply by $F_i$.

this is just like my distinguishing the orthogonal eigenvector matrix P from the weighted eigenvector matrix A. you haven’t seen me use P since we got A, but i want both available; we won’t see him compute the other coefficients, but he wanted both labels available. i do plan to discuss his factor analysis; unfortunately, the example i need to use is not based on this data.

the first thing i object to is the column headings. whether $P_i$ or $F_i$, they are misleading. the column headed P1 contains the first column of the weighted eigenvector matrix A. P1 (cf. F1) should denote the first column of the F matrix. instead, he is using the column heading as a reminder that in the equation…

$z_2 = .767 P_1 - .545 P_2 + .319 P_3 + .112 P_4 - .022 P_5$

we will multiply, for example, the 3rd column number .319 by P3. numerically, that just says that the second row of Z is the matrix product of the second row of A with (each column F1, F2, … of) the F matrix. his headings really stand for the columns of F.

his fundamental model is

$Z = A \ F$

and for the second row of Z, that equation would be written with $F_i$ instead of $P_i$:

$z_2 = .767 F_1 - .545 F_2 + .319 F_3 + .112 F_4 - .022 F_5$

as i said, maybe i made a mistake in switching to his notation; i’ll keep it in mind for the future. and no, he did not write that model as Z = A P; although he changed $F_i$ to $P_i$, he did not change the matrix F to P. good thing, or i’d have had to use a symbol different from P for the orthogonal eigenvector matrix.

the second thing i object to is both of his “variance” labels. in fact, the numbers in the second-to-last row are the squared lengths of the columns of the weighted eigenvector matrix A. and the numbers in the last column are the squared lengths of the rows of the weighted eigenvector matrix A.
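that claim about squared lengths is quick to check. a minimal sketch, assuming Python with NumPy and taking the 4-digit entries of the table as A (so agreement is only good to about three decimals):

```python
import numpy as np

# Weighted eigenvector matrix A, to four digits, from the table above.
A = np.array([
    [0.5810,  0.8064,  0.0276, -0.0645, -0.0852],
    [0.7670, -0.5448,  0.3193,  0.1118, -0.0216],
    [0.6724,  0.7260,  0.1149, -0.0073,  0.0862],
    [0.9324, -0.1043, -0.3078,  0.1582,  0.0000],
    [0.7912, -0.5582, -0.0647, -0.2413,  0.0102],
])

col_sq = (A ** 2).sum(axis=0)  # squared length of each column of A
row_sq = (A ** 2).sum(axis=1)  # squared length of each row of A

print(col_sq)  # compare with the table's second-to-last row
print(row_sq)  # compare with the table's last column
```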

i shouldn’t be hasty. maybe those squared lengths are the variances.

ok, let’s compute F. hmm. we need Z. harman never computed Z. as i said, what he wanted was that drawing showing the relationship between the Z variables and the F variables.

ok, let’s figure out what Z might be. we can certainly construct a Z matrix with variances equal to 1, which is what he says they are.

(the only reason i found this confusing is that for his theoretical work he has taken Z to be “small standard deviates” – where you divide each datum by $\sqrt{N-1} = \sqrt{11}$ – instead of “standardized”.)

in order for the variances of the Z variables to be 1, all we need to do is standardize the data. we recall our data matrix D…

$D = \left(\begin{array}{ccccc} 5700&12.8&2500&270&25000\\ 1000&10.9&600&10&10000\\ 3400&8.8&1000&10&9000\\ 3800&13.6&1700&140&25000\\ 4000&12.8&1600&140&25000\\ 8200&8.3&2600&60&12000\\ 1200&11.4&400&10&16000\\ 9100&11.5&3300&60&14000\\ 9900&12.5&3400&180&18000\\ 9600&13.7&3600&390&25000\\ 9600&9.6&3300&80&12000\\ 9400&11.4&4000&100&13000\end{array}\right)$

we standardize D and call it X:

$X = \left(\begin{array}{ccccc} -0.157462&0.760313&0.134277&1.29792&1.25637\\ -1.52374&-0.303192&-1.39649&-0.964376&-1.09933\\ -0.826067&-1.47865&-1.07422&-0.964376&-1.25637\\ -0.709788&1.2081&-0.510254&0.166772&1.25637\\ -0.651648&0.760313&-0.590821&0.166772&1.25637\\ 0.569284&-1.75852&0.214844&-0.529319&-0.785234\\ -1.4656&-0.0233225&-1.55762&-0.964376&-0.157047\\ 0.830912&0.0326515&0.778809&-0.529319&-0.47114\\ 1.06347&0.592391&0.859375&0.514817&0.157047\\ 0.976261&1.26408&1.02051&2.34206&1.25637\\ 0.976261&-1.03085&0.778809&-0.355296&-0.785234\\ 0.918122&-0.0233225&1.34277&-0.181274&-0.628187\end{array}\right)$

i have changed the name; this lets me distinguish the data matrix D from a design matrix X. further, we will let

$Z = X^T$
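for the record, “standardize” here means: subtract each column’s mean and divide by its sample standard deviation (the one with N-1 in the denominator). a minimal sketch, assuming Python with NumPy rather than Mathematica; the N-1 convention is what reproduces the entries of X shown above.

```python
import numpy as np

# Data matrix D from the text: 12 observations (rows) of 5 variables (columns).
D = np.array([
    [5700, 12.8, 2500, 270, 25000],
    [1000, 10.9,  600,  10, 10000],
    [3400,  8.8, 1000,  10,  9000],
    [3800, 13.6, 1700, 140, 25000],
    [4000, 12.8, 1600, 140, 25000],
    [8200,  8.3, 2600,  60, 12000],
    [1200, 11.4,  400,  10, 16000],
    [9100, 11.5, 3300,  60, 14000],
    [9900, 12.5, 3400, 180, 18000],
    [9600, 13.7, 3600, 390, 25000],
    [9600,  9.6, 3300,  80, 12000],
    [9400, 11.4, 4000, 100, 13000],
], dtype=float)

# Standardize each column: zero mean, unit sample variance (ddof=1 means N-1).
X = (D - D.mean(axis=0)) / D.std(axis=0, ddof=1)

# Harman's Z has variables in rows, so it is just the transpose of X.
Z = X.T

print(X[0])     # first row of X
print(Z.shape)  # (5, 12)
```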

that may seem like a silly step to you, but i have enough trouble sorting out what’s going on without having to translate things in my head: i really want both X and Z. oh, why do i want both? because Z in harman’s model has variables in rows, not in columns. back to our model:

$Z = A \ F$

$A^{-1}\ Z = F.$

(BTW, A inverse exists because none of the eigenvalues were zero. A is an eigenvector matrix weighted by the (square roots of the) eigenvalues. it came from an orthogonal matrix P of unit eigenvectors. P is trivial to invert; just take the transpose: $P^T = P^{-1}.$ A will be invertible so long as the weights are nonzero, because the corresponding column of A will be zero if and only if a weight is zero.)

next, bizarre as you may think me, i’m going to take the transpose of the last equation. and i’m going to abuse notation (so i’ve been told), writing $A^{-T}$ for the inverse transpose of A…

$Z^T \ A^{-T} = F^T$

but that’s

$X \ A^{-T} = F^T$

why am i computing $F^T$, the transpose of F? for one thing, both Z and F have 12 columns and don’t fit in the page! for another, Mathematica wants variables in columns: its Mean, Variance, Standardize, Covariance, and Correlation commands each want $F^T$, not F. (did you think i was computing the correlation matrix step-by-step?)

here is $F^T= X \ A^{-T} =$

$\left(\begin{array}{ccccc} 0.96967&-0.712621&-1.05659&-0.0364685&1.40114\\ -1.33148&-0.758802&0.319568&1.87313&0.308985\\ -1.47203&0.0897711&-1.11798&0.463894&-0.205381\\ 0.459637&-1.29109&0.813821&-0.922761&0.206882\\ 0.332999&-1.16178&0.112718&-1.45529&0.0608755\\ -0.692033&1.15022&-1.43036&-1.292&0.00102472\\ -1.02327&-1.17542&0.372886&-0.114468&-0.693806\\ 0.0574899&0.854874&1.4722&-0.256639&-0.598284\\ 0.784582&0.566317&0.691755&0.349624&-1.81338\\ 1.8796&-0.0589858&-1.18423&1.38369&-0.631389\\ -0.227041&1.33005&-0.24434&-0.506254&-0.113582\\ 0.261862&1.16748&1.25055&0.513541&2.07692\end{array}\right)$

ok, like the Z variables, the F variables have zero means:

$\{2.0 \times 10^{-16},\ 2.0 \times 10^{-16},\ 1.0 \times 10^{-15},\ 9.0 \times 10^{-16},\ 1.1 \times 10^{-15}\}$

that is to say,

$\{0,\ 0,\ 0,\ 0,\ 0\}$

and, like the Z variables, the F variables have unit variances:

$\{1.,\ 0.999999,\ 1.00002,\ 0.999961,\ 1.00002\}$

oops. this is not ok. the table is wrong: harman said the variances were the eigenvalues. (and if you happen to know, vaguely or precisely, that PCA was supposed to give us new variables that redistribute the variance, you were really, really expecting to see the eigenvalues. welcome to the club.)
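the whole computation of $F^T$, its means, and its variances can be sketched in a few lines. this assumes Python with NumPy (the numbers above came from Mathematica), and it builds A from an eigendecomposition of the correlation matrix. one caveat: the signs of eigenvectors are arbitrary, so individual columns of $F^T$ may come out with the opposite sign from the matrix shown above; the means and variances are unaffected.

```python
import numpy as np

# Data matrix D from the text: 12 observations (rows) of 5 variables (columns).
D = np.array([
    [5700, 12.8, 2500, 270, 25000],
    [1000, 10.9,  600,  10, 10000],
    [3400,  8.8, 1000,  10,  9000],
    [3800, 13.6, 1700, 140, 25000],
    [4000, 12.8, 1600, 140, 25000],
    [8200,  8.3, 2600,  60, 12000],
    [1200, 11.4,  400,  10, 16000],
    [9100, 11.5, 3300,  60, 14000],
    [9900, 12.5, 3400, 180, 18000],
    [9600, 13.7, 3600, 390, 25000],
    [9600,  9.6, 3300,  80, 12000],
    [9400, 11.4, 4000, 100, 13000],
], dtype=float)

X = (D - D.mean(axis=0)) / D.std(axis=0, ddof=1)  # standardized, variables in columns
N = X.shape[0]
r = X.T @ X / (N - 1)                             # correlation matrix

# eigh returns eigenvalues in ascending order; put the largest first.
lam, P = np.linalg.eigh(r)
order = np.argsort(lam)[::-1]
lam, P = lam[order], P[:, order]

A = P * np.sqrt(lam)         # scales column j of P by sqrt(lam[j]), i.e. P @ diag(sqrt(lam))
FT = X @ np.linalg.inv(A).T  # F^T = X A^{-T}

print(lam)                     # the eigenvalues
print(FT.mean(axis=0))         # means of the F variables: zero up to rounding
print(FT.var(axis=0, ddof=1))  # variances of the F variables: all 1, not the eigenvalues
```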

what’s going on? it’s time to do a little algebra.

i need one key relationship. if we have a matrix M with variables in columns, with N rows, and with each variable having zero mean (called centered data), then the covariance matrix of M is given by

$c = \frac{1}{N-1}\ M^T M.$

further, if the variance of each variable is 1, then the covariance matrix c is also the correlation matrix r.
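that key relationship can be checked against a library routine. a minimal sketch, assuming Python with NumPy and made-up random numbers standing in for data (nothing here is harman’s):

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(12, 5))  # made-up data: N = 12 rows, 5 variables in columns
N = raw.shape[0]

# Centered data: the covariance matrix is M^T M / (N - 1).
M = raw - raw.mean(axis=0)
c = M.T @ M / (N - 1)
print(np.allclose(c, np.cov(raw, rowvar=False)))

# Standardized data: the same formula now gives the correlation matrix.
S = M / M.std(axis=0, ddof=1)
r = S.T @ S / (N - 1)
print(np.allclose(r, np.corrcoef(raw, rowvar=False)))
```

both comparisons should print `True`; `np.cov` uses the N-1 denominator by default, which is why it matches.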

let’s lay out all the equations we have.

one, X contains columns of standardized variables, so the correlation matrix r (= the covariance matrix) of X (or of Z) is given by

$r = \frac{1}{N-1} X^T X = \frac{1}{N-1} Z Z^T \quad \text{or} \quad Z Z^T = (N-1)\, r.$

two, the covariance (not necessarily correlation) matrix of F is given by

$c = \frac{1}{N-1} F F^T.$

(i’m going to take it for granted that the means of the F values are zero; after all, the Fs are linear combinations of the Xs, which do have zero mean.)

three, the eigendecomposition was

$\Lambda = P^{-1}\ r \ P = P^T\ r\ P, \quad \text{or} \quad r = P \Lambda P^{-1}$

with P orthogonal and $\Lambda$ diagonal.

four, the weighted eigenvector matrix A is

$A = P \sqrt{\Lambda } = P \Lambda ^{1/2} \quad \text{or} \quad A^{-1} = \Lambda ^{-1/2} P^{-1}$

where $\sqrt{\Lambda } = \Lambda ^{1/2}$ denotes a diagonal matrix whose entries are square roots of the entries of $\Lambda$. (all of the eigenvalues are non-negative in general; and in particular, my changing all the signs of columns of A amounts to choosing the negative square roots! eigenvectors are not unique.)

perhaps i should remind you that multiplying diagonal matrices is almost trivial: we multiply corresponding diagonal elements. that means, for example, that $\Lambda ^{-1/2}$ and $\Lambda ^{1/2}$ have diagonal elements which are the inverses of each other. that’s also why it makes sense to take square roots of the elements of $\Lambda$ and call the result $\Lambda ^{1/2}$.
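a small numerical illustration of those remarks about diagonal matrices. this is a minimal sketch, assuming Python with NumPy and using the eigenvalues from harman’s table as the diagonal of $\Lambda$:

```python
import numpy as np

# Eigenvalues as listed in the table (to four digits).
lam = np.array([2.8733, 1.7967, 0.2148, 0.0999, 0.0153])

Lam = np.diag(lam)
Lam_half = np.diag(np.sqrt(lam))            # Lambda^(1/2)
Lam_neg_half = np.diag(1.0 / np.sqrt(lam))  # Lambda^(-1/2)

# Multiplying diagonal matrices multiplies corresponding diagonal entries.
print(np.allclose(Lam_half @ Lam_half, Lam))                       # sqrt times sqrt gives Lambda
print(np.allclose(Lam_neg_half @ Lam_half, np.eye(5)))             # they are inverses of each other
print(np.allclose(Lam_neg_half @ Lam @ Lam_neg_half, np.eye(5)))   # this product is the identity
```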

finally, five, the fundamental model is

$Z = A \ F, \quad \text{or} \quad F = A^{-1} Z$

assuming that A is invertible (i.e. assuming all eigenvalues of r were positive, not just non-negative).

let’s compute the covariance (not correlation) matrix c of F. we have

$c = \frac{1}{N-1} F F^T$.

first we substitute $F = A^{-1} Z$ and $Z Z^T = (N-1) r$ :

$c = \frac{1}{N-1} (A^{-1} Z) (A^{-1}Z)^T = \frac{1}{N-1}A^{-1}Z Z^T A^{-T} = \frac{N-1}{N-1}A^{-1}\ r \ A^{-T} = A^{-1}\ r \ A^{-T}$;

then we substitute $A^{-1} = \Lambda ^{-1/2} P^{-1}$:

$c = (\Lambda ^{-1/2} P^{-1})\ r \ {(\Lambda ^{-1/2} P^{-1})}^T = \Lambda ^{-1/2} P^{-1} r \ P^{-T} \Lambda ^{-1/2}= \Lambda ^{-1/2} P^{-1} r \ P \Lambda ^{-1/2}$;

then we substitute $r = P \Lambda P^{-1}$:

$c = \Lambda ^{-1/2} P^{-1} (P \Lambda P^{-1}) P \Lambda ^{-1/2} = \Lambda ^{-1/2} (P^{-1}P) \Lambda (P^{-1} P) \Lambda ^{-1/2} = \Lambda ^{-1/2} \Lambda \Lambda ^{-1/2} = I.$

whoa! an identity matrix? boy, do we need to compute the covariance matrix of F for this data!

here it is, truncated so i don’t have to reformat a bunch of numbers of the form a x 10^-b:

$c = \left(\begin{array}{ccccc} 1.&0&0&0&0\\ 0&0.999999&0&0&0\\ 0&0&1.00002&0&0\\ 0&0&0&0.999961&0\\ 0&0&0&0&1.00002\end{array}\right)$

an identity matrix, conceptually if not numerically. yes, the algebra was right. not only are the F variables of unit variance, they are completely uncorrelated with each other. yessssss. now i remember seeing that statement somewhere.
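the algebra never used anything special about harman’s data, so the same result should hold for any standardized data whose correlation matrix has no zero eigenvalues. a minimal sketch, assuming Python with NumPy and made-up random data purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
raw = rng.normal(size=(12, 5))  # made-up data, not harman's
X = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
N = X.shape[0]

Z = X.T                # variables in rows, as in the model Z = A F
r = Z @ Z.T / (N - 1)  # correlation matrix

lam, P = np.linalg.eigh(r)  # r = P diag(lam) P^T, with P orthogonal
A = P * np.sqrt(lam)        # weighted eigenvector matrix, P @ diag(sqrt(lam))

F = np.linalg.inv(A) @ Z    # F = A^{-1} Z
c = F @ F.T / (N - 1)       # covariance matrix of F

print(np.allclose(A @ F, Z))       # the model Z = A F is reproduced
print(np.allclose(c, np.eye(5)))   # covariance of F is the identity
```

the order of the eigenvalues does not matter for this check, so the sketch leaves them in the ascending order that `eigh` returns.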

conclusion: the F variables are uncorrelated variables of unit variance.

conclusion: the eigenvalues are not the variances of the F variables. not for Z = A F, with A a weighted eigenvector matrix. tsk, tsk.

conclusion: harman’s table can be fixed most simply by changing the row label “Variance” (the one in the first column) to “Eigenvalue”.

nevertheless, his “variance” label suggests that we should be able to construct combinations of the Zs which have the eigenvalues for variances.

open: how do we find new variables whose variances are the eigenvalues?

but we’ve done enough for one post.

ok, ok, i won’t quite leave you hanging. what about using the orthogonal eigenvector matrix P instead of the weighted eigenvector matrix A?