## PCA / FA example 2: jolliffe. correlation matrix

what we got from harman’s “factor analysis” was his only example of principal component analysis. now let’s see what jolliffe’s “principal component analysis” has for us.
i’m going to work one example, but he did it two ways, and i’ll confirm both sets of calculations.
first, he gives us a correlation matrix. he notes who supplied the data, but gives no other detail, so don’t plan on ever finding the data behind this. here is the correlation matrix.
$\left(\begin{array}{cccccccc} 1&0.29&0.202&-0.055&-0.105&-0.252&-0.229&0.058\\ 0.29&1&0.415&0.285&-0.376&-0.349&-0.164&-0.129\\ 0.202&0.415&1&0.419&-0.521&-0.441&-0.145&-0.076\\ -0.055&0.285&0.419&1&-0.877&-0.076&0.023&-0.131\\ -0.105&-0.376&-0.521&-0.877&1&0.206&0.034&0.151\\ -0.252&-0.349&-0.441&-0.076&0.206&1&0.192&0.077\\ -0.229&-0.164&-0.145&0.023&0.034&0.192&1&0.423\\ 0.058&-0.129&-0.076&-0.131&0.151&0.077&0.423&1\end{array}\right)$
we know what to do: get the eigenstructure.
here are the eigenvalues:
${2.79227,\ 1.53162,\ 1.24928,\ 0.778408,\ 0.621567,\ 0.488844,\ 0.435632,\ 0.102376}$
here is an eigenvector matrix; it is orthogonal:
$\left(\begin{array}{cccccccc} -0.19422&-0.417123&-0.399761&0.651593&0.175206&0.362816&0.176312&-0.102404\\ -0.400362&-0.153929&-0.167729&0.06372&-0.8476&-0.230413&-0.110465&-0.0101724\\ -0.458879&0.000298497&-0.167775&-0.2738&0.251231&-0.402953&0.676969&-0.0503862\\ -0.430336&0.472442&0.171282&0.169149&0.117723&0.0645932&-0.236745&-0.677924\\ 0.493775&-0.36045&-0.0871641&-0.180372&-0.138999&-0.135721&0.157246&-0.723646\\ 0.319455&0.320166&0.276619&0.633314&-0.161544&-0.383745&0.376513&0.0521431\\ 0.176886&0.535273&-0.410277&-0.163143&-0.298896&0.512799&0.367054&-0.014846\\ 0.170516&0.245283&-0.708611&0.0869052&0.197851&-0.469135&-0.375711&0.0262083\end{array}\right)$
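the eigenstructure computation is easy to sketch in code. i work in Mathematica, but here is the same calculation in python/numpy (the variable names are mine); `eigh` is the symmetric-matrix routine, and it returns eigenvalues in ascending order, so we reverse them to put the largest first:

```python
import numpy as np

# jolliffe's 8x8 correlation matrix, as printed above
R = np.array([
    [ 1,      0.29,   0.202, -0.055, -0.105, -0.252, -0.229,  0.058],
    [ 0.29,   1,      0.415,  0.285, -0.376, -0.349, -0.164, -0.129],
    [ 0.202,  0.415,  1,      0.419, -0.521, -0.441, -0.145, -0.076],
    [-0.055,  0.285,  0.419,  1,     -0.877, -0.076,  0.023, -0.131],
    [-0.105, -0.376, -0.521, -0.877,  1,      0.206,  0.034,  0.151],
    [-0.252, -0.349, -0.441, -0.076,  0.206,  1,      0.192,  0.077],
    [-0.229, -0.164, -0.145,  0.023,  0.034,  0.192,  1,      0.423],
    [ 0.058, -0.129, -0.076, -0.131,  0.151,  0.077,  0.423,  1],
])

# eigh is for symmetric matrices; it returns eigenvalues ascending,
# so reverse both eigenvalues and eigenvector columns
vals, vecs = np.linalg.eigh(R)
vals, vecs = vals[::-1], vecs[:, ::-1]

# sanity check: the eigenvalues sum to the trace, which is 8
# for an 8x8 correlation matrix
print(vals)

# the eigenvector matrix is orthogonal: P^T P = I
assert np.allclose(vecs.T @ vecs, np.eye(8))

# keeping the eigenvectors of the 4 largest eigenvalues means
# keeping the first 4 columns
P4 = vecs[:, :4]
```

note that each column of `vecs` may come out with either overall sign, as discussed below; that is a property of eigenvectors, not of the software.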
here are just the 4 largest eigenvectors – of course, i mean the eigenvectors having the 4 largest eigenvalues (all of these eigenvectors are the same size, length 1):
$\left(\begin{array}{cccc} -0.19422&-0.417123&-0.399761&0.651593\\ -0.400362&-0.153929&-0.167729&0.06372\\ -0.458879&0.000298497&-0.167775&-0.2738\\ -0.430336&0.472442&0.171282&0.169149\\ 0.493775&-0.36045&-0.0871641&-0.180372\\ 0.319455&0.320166&0.276619&0.633314\\ 0.176886&0.535273&-0.410277&-0.163143\\ 0.170516&0.245283&-0.708611&0.0869052\end{array}\right)$
he rounded things to the nearest .2 (this is where i saw this device, which i applied in harman). when i do it, i get:
$\left(\begin{array}{cccc} -0.2&-0.4&-0.4&0.6\\ -0.4&-0.2&-0.2&0.\\ -0.4&0.&-0.2&-0.2\\ -0.4&0.4&0.2&0.2\\ 0.4&-0.4&0.&-0.2\\ 0.4&0.4&0.2&0.6\\ 0.2&0.6&-0.4&-0.2\\ 0.2&0.2&-0.8&0.\end{array}\right)$
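rounding to the nearest .2 is just: divide by the step, round to an integer, multiply back. a quick python/numpy sketch (the helper name `round_to` is mine), applied to the first row of the loading matrix above:

```python
import numpy as np

def round_to(x, step=0.2):
    """round every entry of x to the nearest multiple of step."""
    return np.round(np.asarray(x) / step) * step

# the first row of the 8x4 loading matrix above
row = [-0.19422, -0.417123, -0.399761, 0.651593]
print(round_to(row))  # -0.2, -0.4, -0.4, 0.6 -- the first row of the rounded matrix
```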
this is not exactly what he had. the signs of eigenvectors are not unique; to match him, i need to take the negatives of the 1st and 3rd columns. it is very interesting that the absolute signs of the eigenvectors are indeterminate, because in the course of my reading i’ve seen many comments along the lines of “all or most of the coefficients of the first eigenvector are positive”. that’s not quite right. what is true is that all or most of the coefficients share the same sign, but that sign itself is indeterminate.
we could instruct Mathematica to change signs directly, but we can also do this using matrices. we want to change the signs of the 1st and 3rd columns, but not of the 2nd and 4th. we create a diagonal matrix $\Gamma$ whose diagonal entries are
${-1,\ 1,\ -1,\ 1}$
i.e. -1 in the 1st and 3rd columns, +1 in the 2nd and 4th – and we compute
$P \ \Gamma$
right? right. we get
$\left(\begin{array}{cccc} 0.2&-0.4&0.4&0.6\\ 0.4&-0.2&0.2&0.\\ 0.4&0.&0.2&-0.2\\ 0.4&0.4&-0.2&0.2\\ -0.4&-0.4&0.&-0.2\\ -0.4&0.4&-0.2&0.6\\ -0.2&0.6&0.4&-0.2\\ -0.2&0.2&0.8&0.\end{array}\right)$
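in code, the sign flip is a single matrix product. a python/numpy sketch (again, my variable names), with `P` the rounded 8×4 matrix from above:

```python
import numpy as np

# the rounded loadings, with the signs as i first got them
P = np.array([
    [-0.2, -0.4, -0.4,  0.6],
    [-0.4, -0.2, -0.2,  0.0],
    [-0.4,  0.0, -0.2, -0.2],
    [-0.4,  0.4,  0.2,  0.2],
    [ 0.4, -0.4,  0.0, -0.2],
    [ 0.4,  0.4,  0.2,  0.6],
    [ 0.2,  0.6, -0.4, -0.2],
    [ 0.2,  0.2, -0.8,  0.0],
])

# diagonal matrix: -1 in the 1st and 3rd slots, +1 in the 2nd and 4th
Gamma = np.diag([-1.0, 1.0, -1.0, 1.0])

# right-multiplying by Gamma flips the signs of columns 1 and 3
# and leaves columns 2 and 4 alone
flipped = P @ Gamma
print(flipped)
```

right-multiplication by a diagonal matrix scales columns; left-multiplication would scale rows instead, which is not what we want here.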
that’s what he got. it’s worth mentioning that each of these 4 eigenvectors is a mixture of most of the original variables: there are very few zeroes in that matrix.
in keeping with “less is more”, we’ll pick this up next time.