PCA / FA example 2: jolliffe. discussion 3: how many PCs to keep?

from jolliffe’s keeping only 4 eigenvectors, i understand that he’s interested in reducing the dimensionality of his data. in this case, he wants to replace the 8 original variables by some smaller number of new variables. that he has no data, only a correlation matrix, suggests that he’s interested in the definitions of the new variables, as opposed to their numerical values.
there are 4 ad hoc rules he will use on the example we’ve worked. he mentions a 5th which i want to try.
from the correlation matrix, we got the following eigenvalues.
${2.79227, \ 1.53162, \ 1.24928, \ 0.778408, \ 0.621567, \ 0.488844, \ 0.435632, \ 0.102376}$
we can compute the cumulative % variation. recall the eigenvalues as percentages…
${34.9034, \ 19.1452, \ 15.6161, \ 9.7301, \ 7.76958, \ 6.11054, \ 5.4454, \ 1.2797}$
now we want cumulative sums, rounded….
${34.9, \ 54.0, \ 69.7, \ 79.4, \ 87.2, \ 93.3, \ 98.7, \ 100.0}$
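for anyone following along without mathematica, here is a minimal python sketch of those two steps (percentages, then running totals). the variable names are mine, and i divide by the sum of the eigenvalues, which is 8 up to rounding.

```python
# eigenvalues of the correlation matrix, copied from the text above
eigenvalues = [2.79227, 1.53162, 1.24928, 0.778408,
               0.621567, 0.488844, 0.435632, 0.102376]

total = sum(eigenvalues)  # about 8, the number of standardized variables

# each eigenvalue as a percentage of the total variation
percents = [100 * ev / total for ev in eigenvalues]

# running totals of those percentages
cumulative = []
running = 0.0
for pct in percents:
    running += pct
    cumulative.append(running)

print([round(p, 4) for p in percents])
print([round(c, 1) for c in cumulative])
```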
his 1st rule is to keep 70-90% of the total variation. read literally, we would keep 4 or 5 eigenvectors, getting either 79.4% or 87.2% of the variation; we would not keep just 3, because 69.7% is less than 70%. but it seems silly to interpret an ad hoc rule literally: interpreted loosely we could keep just 3 eigenvectors, rounding the third cumulative to 70%. and i suppose we could keep the sixth one, even though it pushes our total to 93%. anyway, this is a pretty vague answer: keep 3 to 5 eigenvectors, maybe even 6 of them.
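the literal and loose readings of the 70-90% rule can be sketched in python as follows. the `slack` of 3.5 percentage points is a number i made up purely to reproduce the "3 to 6" reading; it is not from jolliffe.

```python
eigenvalues = [2.79227, 1.53162, 1.24928, 0.778408,
               0.621567, 0.488844, 0.435632, 0.102376]

def cumulative_percent(values):
    """return running totals of values as percentages of their sum."""
    total = sum(values)
    out, running = [], 0.0
    for v in values:
        running += v
        out.append(100 * running / total)
    return out

cum = cumulative_percent(eigenvalues)

# literal reading: numbers of components whose cumulative % lies in [70, 90]
literal = [m for m, c in enumerate(cum, start=1) if 70 <= c <= 90]

# loose reading: widen the band by a hypothetical slack on either side
slack = 3.5
loose = [m for m, c in enumerate(cum, start=1)
         if 70 - slack <= c <= 90 + slack]

print(literal)
print(loose)
```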
that 1st rule based the cutoff on the cumulative sum of the eigenvalues.
his 2nd rule is based on the individual eigenvalues: keep all eigenvalues above a lower bound. the classical recommendation is 1 – because the original variables each have variance 1, so anything less is not as significant as an original variable – but jolliffe himself recommends about .7: he argues that a variable which was completely independent of all the others must lead to an eigenvalue less than 1. (i don’t really care to see why: this is, after all, a rule of thumb.)
so let’s round off the eigenvalues…
${2.8, \ 1.5, \ 1.2, \ 0.8, \ 0.6, \ 0.5, \ 0.4, \ 0.1}$
to better see what we would keep.
his 2nd rule would have us keep up to the 4th eigenvalue (.8), but not the 5th (.6). this is why he displayed only 4 eigenvectors earlier. so this gives a single recommendation: keep the first 4. (having seen this rule with the classical cutoff of 1, i was expecting him to keep only 3.)
he mentions a 3rd rule, but says that it amounts to a cutoff between .1 and .2 (instead of his .7), which he finds keeps too many eigenvectors. in this case it would keep at least 7 of them.
incidentally, for a covariance matrix instead of a correlation matrix, we would modify the cutoffs to use 1 or .7 or about .15 times the average variance. (for the correlation matrix, the average variance is 1.)
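the cutoff rules are easy to sketch in python. the function name `keep_above` is mine; it scales the cutoff by the average eigenvalue (which equals the average variance), so the same code would serve for a covariance matrix.

```python
from statistics import mean

eigenvalues = [2.79227, 1.53162, 1.24928, 0.778408,
               0.621567, 0.488844, 0.435632, 0.102376]

def keep_above(values, factor):
    """count eigenvalues exceeding factor times the average eigenvalue."""
    cutoff = factor * mean(values)
    return sum(1 for v in values if v > cutoff)

# classical cutoff, jolliffe's cutoff, and the two ends of the 3rd rule's range
for factor in (1.0, 0.7, 0.2, 0.1):
    print(factor, keep_above(eigenvalues, factor))
```

for these eigenvalues the counts come out as 3, 4, 7 and 8, which matches the discussion above: the 3rd rule keeps at least 7.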
his 4th method is called a $\pmb{ scree}$ graph: it’s just a plot of the eigenvalues in decreasing order.
we’re supposed to look for an “elbow”, a flattening out of the curve. well, maybe after the 4th point. he agrees with me that the (magnitude of the) slope has actually increased between 3 and 4, and he says he views the graph with skepticism. in this case the scree graph gives us not much more than a hint.
alternatively, there is something called an $\pmb{ LEV diagram}$, which replaces the eigenvalue by its logarithm. this is used in meteorology and oceanography, he says. it also seems more useful applied to the covariance matrix.
in this case, applied to the correlation matrix…
he agrees that the diagram shows maximum (magnitude) slope between 7 and 8. so much for a flattening out. the LEV diagram gives us nothing.
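since both diagrams are read by eye, it may help to put numbers on the slopes. this is a rough python sketch that lists the drop between consecutive points for the scree graph and for the LEV diagram; i am assuming natural logs, though the base does not change which drop is largest.

```python
import math

eigenvalues = [2.79227, 1.53162, 1.24928, 0.778408,
               0.621567, 0.488844, 0.435632, 0.102376]

pairs = list(zip(eigenvalues, eigenvalues[1:]))

# drop between consecutive eigenvalues: magnitude of the scree slope
scree_drops = [a - b for a, b in pairs]

# the same drops for log eigenvalues: magnitude of the LEV slope
lev_drops = [math.log(a) - math.log(b) for a, b in pairs]

for k, (s, l) in enumerate(zip(scree_drops, lev_drops), start=1):
    print(f"{k} to {k + 1}: scree {s:.3f}, LEV {l:.3f}")

# position (1-based) of the steepest LEV segment
steepest_lev = lev_drops.index(max(lev_drops)) + 1
print(steepest_lev)
```

the scree drop from 3 to 4 is larger than the drop from 2 to 3, and the steepest LEV segment is the last one, from 7 to 8.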
the 5th rule is to compare our eigenvalues with “broken sticks”. that is, if we have a stick of some length and we break it into random-sized pieces, what are the expected sizes? to be specific, if we have a stick of unit length and we break it at random into p segments, then, he says, the expected length of the kth longest segment is
$L(p,k) = l_{k}^* = \frac{1}{p}\sum _{ j = k}^{ p} \frac{1}{j}$.
having converted our eigenvalues to % variance explained, we retain those for which the % variation explained exceeds $l_{k}^*$. (in fact, we will use fraction explained, and fractions of 1, rather than % explained and fractions of a stick of length 100.)
he does not provide a table for $l_{k}^*$, hence does not apply it in this case, but we can create one easily enough. i define that function… and i check it. for p = k = 1: L(1,1) = 1.
good: if we left the stick in one piece, that piece is of length 1.
if we break the stick into two pieces (p = 2), we get: L(2,1) = 3/4 and L(2,2) = 1/4, where L(2,1) is the expected length of the longest piece (i.e. the longer piece of 2), and L(2,2) is the expected length of the second longest piece (i.e. the shorter piece).
let’s see. the larger piece cannot be less than 1/2, and it’s equally likely (random = uniform distribution) to be anywhere between 1/2 and 1, hence its expected value is 3/4. then the shorter piece has expected length 1/4.
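a quick simulation is one way to sanity-check that reasoning. this python sketch assumes that "break at random" means cutting the unit stick at p - 1 independent uniform points; the function name, trial count and seed are my own choices.

```python
import random

def simulate_broken_stick(p, trials=100_000, seed=0):
    """estimate expected piece lengths, longest first, for a unit stick
    cut at p - 1 independent uniform random points."""
    rng = random.Random(seed)
    totals = [0.0] * p
    for _ in range(trials):
        cuts = sorted(rng.random() for _ in range(p - 1))
        edges = [0.0] + cuts + [1.0]
        # piece lengths, sorted so the longest comes first
        pieces = sorted((b - a for a, b in zip(edges, edges[1:])),
                        reverse=True)
        for i, piece in enumerate(pieces):
            totals[i] += piece
    return [t / trials for t in totals]

print(simulate_broken_stick(2))  # should be near 3/4 and 1/4
```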
i actually want to get a list of the values for a fixed p. then mathematica says the expected lengths of a unit stick broken into 2 pieces are:
${\frac{3}{4}, \ \frac{1}{4}}$
and the expected lengths for a unit stick broken into 3 pieces are:
${\frac{11}{18}, \ \frac{5}{18}, \ \frac{1}{9}}$
or
${0.611111, \ 0.277778, \ 0.111111}$
in our example, we have 8 variables ~ 8 eigenvalues, so we want the expected lengths for a unit stick broken into 8 pieces:
${\frac{761}{2240}, \ \frac{481}{2240}, \ \frac{341}{2240}, \ \frac{743}{6720}, \ \frac{533}{6720}, \ \frac{73}{1344}, \ \frac{15}{448}, \ \frac{1}{64}}$
or
${0.339732, \ 0.214732, \ 0.152232, \ 0.110565, \ 0.0793155, \ 0.0543155, \ 0.0334821, \ 0.015625}$
we could check that they add up to 1 (they do).
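for readers without mathematica, the same table can be built in python with exact fractions. this is only a sketch of the formula for L(p,k) given above; the name `broken_stick` is mine.

```python
from fractions import Fraction

def broken_stick(p):
    """expected lengths, longest first: L(p, k) = (1/p) * sum_{j=k}^{p} 1/j."""
    return [sum(Fraction(1, j) for j in range(k, p + 1)) / p
            for k in range(1, p + 1)]

for p in (1, 2, 3, 8):
    lengths = broken_stick(p)
    print(p, [str(x) for x in lengths], "sum =", sum(lengths))

# decimal versions for p = 8
print([round(float(x), 6) for x in broken_stick(8)])
```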
now recall the eigenvalues as a fraction of variation:
${0.349034, \ 0.191452, \ 0.156161, \ 0.097301, \ 0.0776958, \ 0.0611054, \ 0.054454, \ 0.012797}$
and we are to keep those whose variation exceeds the broken stick number. take each eigenvalue and subtract the corresponding broken stick value; we get:
${0.0093, \ -0.023, \ 0.003928, \ -0.01326, \ -0.001619, \ 0.00679, \ 0.02097, \ -0.002828}$
well, i can’t say i like the alternating signs: keep the 1st, 3rd, 6th and 7th? he does say, “principal components for which the proportion exceeds [the expected length of the kth largest segment] are then retained, and all other PCs deleted.” hmm. did he really mean that, or would he tell us to keep all of them thru the 7th eigenvector (the last one greater than the broken stick number)?
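here is a python sketch of the three readings of the rule for this example. the literal reading and the "through the last success" reading are the two raised in the text; the sequential reading (stop at the first component that fails) is my own addition for comparison, not a quote from jolliffe.

```python
eigenvalues = [2.79227, 1.53162, 1.24928, 0.778408,
               0.621567, 0.488844, 0.435632, 0.102376]

p = len(eigenvalues)
total = sum(eigenvalues)

# fraction of variation explained by each component
explained = [ev / total for ev in eigenvalues]

# broken stick expected lengths, longest first
stick = [sum(1.0 / j for j in range(k, p + 1)) / p for k in range(1, p + 1)]

diffs = [e - s for e, s in zip(explained, stick)]

# literal reading: every component whose share beats its stick length
literal = [k for k, d in enumerate(diffs, start=1) if d > 0]

# sequential reading (an assumption): count leading successes only
sequential = 0
for d in diffs:
    if d <= 0:
        break
    sequential += 1

# keep everything through the last component that beats its stick length
through_last = max(literal) if literal else 0

print(literal, sequential, through_last)
```

so the three readings give very different answers here: components 1, 3, 6 and 7; just the first component; or the first 7.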
i’m a little more impressed – and not favorably! – by how close to random, i.e. how close to the broken stick numbers, the eigenvalues seem to be.
maybe we should use cumulatives? we have cumulative % for the eigenvalues, so recall them:
${34.9, \ 54.0, \ 69.7, \ 79.4, \ 87.2, \ 93.3, \ 98.7, \ 100.0}$
let’s get cumulative % for the stick broken into 8 pieces:
${34.0, \ 55.4, \ 70.7, \ 81.7, \ 89.7, \ 95.1, \ 98.4, \ 100.0}$
subtract the stick cumulatives from the eigenvalue cumulatives. does the sign change at some point?
${0.9, \ -1.4, \ -1.0, \ -2.3, \ -2.5, \ -1.8, \ 0.3, \ 0.0}$
yes, on the second number! and the cumulative sum of % differences is nothing like zero. well, of course it’s zero for all 8 pieces, just not for any smaller number of pieces. i wonder if there’s an overall test we can do on this. in my notes, i write “open”. i am intrigued.
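the cumulative comparison can be sketched in python too. this version works from unrounded values and rounds only at the end, which for these numbers gives the same gaps as subtracting the rounded lists; the variable names are mine.

```python
from itertools import accumulate

eigenvalues = [2.79227, 1.53162, 1.24928, 0.778408,
               0.621567, 0.488844, 0.435632, 0.102376]

p = len(eigenvalues)
total = sum(eigenvalues)

# cumulative % of variation for the eigenvalues
eig_cum = [100 * c / total for c in accumulate(eigenvalues)]

# cumulative % for the expected broken stick lengths, longest first
stick = [sum(1.0 / j for j in range(k, p + 1)) / p for k in range(1, p + 1)]
stick_cum = [100 * c for c in accumulate(stick)]

# eigenvalue cumulative minus stick cumulative, in percentage points
gaps = [round(e - s, 1) for e, s in zip(eig_cum, stick_cum)]
print(gaps)

# positions (1-based) where the gap is negative
behind = [k for k, g in enumerate(gaps, start=1) if g < 0]
print(behind)
```

the eigenvalue cumulatives fall behind the stick cumulatives from the 2nd through the 6th component.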
he then cites one study which shows the broken stick gives very good results; another showing that it’s good but no rule is consistently good; another that it’s bad. no help here, except perhaps the conclusion that none of these ad hoc rules is consistently good!
well, the broken stick was an interesting idea, but i can’t say it floats my boat, not for this problem anyway. but there’s something fascinating about it….