## PCA / FA example 2: jolliffe. discussion 2: what might we have ended up with?

back to the table. here’s what jolliffe showed for the “principal components based on the correlation matrix…” with a subheading of “coefficients” over the columns of the eigenvectors.
$\left(\begin{array}{cccc} 0.2&-0.4&0.4&0.6\\ 0.4&-0.2&0.2&0.\\ 0.4&0.&0.2&-0.2\\ 0.4&0.4&-0.2&0.2\\ -0.4&-0.4&0.&-0.2\\ -0.4&0.4&-0.2&0.6\\ -0.2&0.6&0.4&-0.2\\ -0.2&0.2&0.8&0.\end{array}\right)$
under each column, he also showed “percentage of total variation explained”. those numbers were derived from the eigenvalues. we saw this with harman:
• we have standardized data;
• we find an orthogonal eigenvector matrix of the correlation matrix;
• which we use as a change-of-basis to get data wrt new variables;
• the variances of the new data are given by the eigenvalues of the correlation matrix.
the most important detail is that the eigenvalues are the variances of the new data if and only if the change-of-basis matrix is an orthogonal eigenvector matrix.
and that is what jolliffe has: the full eigenvector matrix P is orthogonal. OTOH, we don’t actually know that the data was standardized, but the derivation made it clear that if we want the transformed data to have variances = eigenvalues, then the original data needs to be standardized.
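we don’t need jolliffe’s data to check that property, though. here’s a minimal numpy sketch with made-up data (the data is invented purely for illustration): standardize, eigendecompose the correlation matrix, change basis with the orthogonal eigenvector matrix, and confirm the variances of the new variables are the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # made-up correlated data

# standardize: mean 0, sample variance 1 in every column
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

R = np.corrcoef(Z, rowvar=False)       # correlation matrix
evals, P = np.linalg.eigh(R)           # P is an orthogonal eigenvector matrix
evals, P = evals[::-1], P[:, ::-1]     # eigh sorts ascending; flip to descending

Y = Z @ P                              # change of basis: data wrt new variables
print(np.allclose(Y.var(axis=0, ddof=1), evals))   # variances = eigenvalues
print(np.isclose(evals.sum(), 5.0))                # total variance = # of variables
```

drop either ingredient – the standardization or the orthogonality of P – and the first check fails, which is the point of the “if and only if” above.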
again, since jolliffe never supplies the data, we can’t very well transform it.
but first, instead of listing the eigenvalues, jolliffe displayed percentages. recall the eigenvalues…
${2.79227,\ 1.53162,\ 1.24928,\ 0.778408,\ 0.621567,\ 0.488844,\ 0.435632,\ 0.102376}$
we compute their sum: it’s 8.
(of course. each original standardized variable has variance 1, so the eight of them have total variance 8. to put it another way, the correlation matrix implicitly standardizes the variables.)
now divide each eigenvalue by the total, 8, multiply by 100 to get a percentage, and round off…
${34.9,\ 19.1,\ 15.6,\ 9.7,\ 7.8,\ 6.1,\ 5.4,\ 1.3}$
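that arithmetic is a one-liner to reproduce, using the eigenvalues as listed above:

```python
import numpy as np

# eigenvalues of the 8x8 correlation matrix, as listed above
evals = np.array([2.79227, 1.53162, 1.24928, 0.778408,
                  0.621567, 0.488844, 0.435632, 0.102376])

total = evals.sum()                      # 8, to within roundoff
pct = np.round(100 * evals / total, 1)   # percentage of total variation
print(pct.tolist())   # [34.9, 19.1, 15.6, 9.7, 7.8, 6.1, 5.4, 1.3]
```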
the first 4 numbers match jolliffe. he printed only those 4: for whatever reason, he looked at only the first 4 eigenvectors, hence the first 4 eigenvalues.
looking at all of them, i wonder why he chose 4. from the eigenvalues themselves, some would argue that all components with eigenvalue less than 1 – i.e. with less variance than any single one of the original standardized variables – should be discarded. but once he keeps the 4th one (.778), i see no cause to discard the 5th one (.622).
take a deep breath. i’m about to short-change jolliffe a little. he has an entire chapter on graphical methods – but most of them cannot be reproduced without data, which he didn’t provide.
let me be more precise. he has several plots of data with respect to their first two PCs, but he doesn’t provide the data, so we’ll make a note to do this kind of plot when we have some. he has biplots; these, again, appear to require data which he didn’t provide. make a note to grab a biplot in the first book that has one. he shows a graph which is an example of correspondence analysis. data? no.
finally, he has graphs of something called andrews’ curves, which appear to be trigonometric polynomials whose coefficients are the first 7 PCs out of 20 (i.e. the 7 new data variables z1, …, z7). these look intriguing, except they’ve been clustered somehow, the text leading up to them talks about andrews’ curves of the residuals, and the trig polynomial comes out of nowhere.
like i said, intriguing. boy do i wish i had the data! well, we will have data from other sources. make a note: try an andrews’ curve, too, someday.
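for the record, the standard andrews’ curve maps each observation x = (x1, …, xk) to f(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + … on [−π, π]. here’s a minimal sketch; the “scores” below are made-up stand-ins for the PC scores we don’t have:

```python
import numpy as np

def andrews_curve(x, t):
    """andrews' curve for one observation x = (x1, ..., xk):
    f(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + ..."""
    f = np.full_like(t, x[0] / np.sqrt(2))
    for j, xj in enumerate(x[1:], start=1):
        k = (j + 1) // 2                  # harmonic: 1, 1, 2, 2, 3, 3, ...
        f += xj * (np.sin(k * t) if j % 2 == 1 else np.cos(k * t))
    return f

t = np.linspace(-np.pi, np.pi, 200)
scores = np.array([1.2, -0.5, 0.8, 0.3, -1.1, 0.6, 0.2])  # made-up PC scores
curve = andrews_curve(scores, t)          # one curve per observation
```

plotting one such curve per observation (and eyeballing which curves bunch together) is what makes the display useful for spotting clusters.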
BTW, harman did not do any of these graphs, although we worked an example with data.
nevertheless, jolliffe’s discussions of all these graphs, and the variations on them, may be indispensable if you do this professionally. me? i’ll settle for finding examples down the road.
in addition, he also has an entire chapter on deciding how many variables to keep, either transformed variables or original variables. we’ll continue this example later, looking at how many transformed variables to keep.