PCA / FA example 4: Davis. R-mode FA, eigenvectors

so much for the eigenvalues. now for the eigenvectors.
it is usual in mathematica that for exact (integer or rational) results, the eigenvector matrix is not orthogonal; for real (numerical) results, by contrast, the eigenvector matrix returned is orthogonal. we got that the eigenvector matrix was (but let's call it something else now):
B = \left(\begin{array}{ccc} -2&0.&1\\ 1&-1&1\\ 1&1&1\end{array}\right)
by computing B^T B, we see that the dot products among all the vectors are:
\left(\begin{array}{ccc} 6&0.&0.\\ 0.&2&0.\\ 0.&0.&3\end{array}\right)
which, as i expected, says the eigenvectors are mutually orthogonal (they have to be, because c is symmetric and the eigenvalues are distinct) but not orthonormal. (in fact, their squared lengths are 6, 2, 3, respectively.) so we scale the columns of B, getting…
\left(\begin{array}{ccc} -\sqrt{\frac{2}{3}}&0&\frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{6}}&-\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{6}}&\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{3}}\end{array}\right)
\left(\begin{array}{ccc} -0.816497&0.&0.57735\\ 0.408248&-0.707107&0.57735\\ 0.408248&0.707107&0.57735\end{array}\right)
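the orthogonality check and the column scaling can be sketched in a few lines of numpy (numpy is my substitute here; the original computations were done in mathematica):

```python
import numpy as np

# eigenvector matrix B (columns are eigenvectors), as in the text
B = np.array([[-2.0,  0.0, 1.0],
              [ 1.0, -1.0, 1.0],
              [ 1.0,  1.0, 1.0]])

# B^T B collects all the dot products among the columns; it comes out
# diagonal, so the eigenvectors are orthogonal but not orthonormal
G = B.T @ B
print(np.round(G, 6))          # diag(6, 2, 3)

# divide each column by its length to orthonormalize
Bn = B / np.sqrt(np.diag(G))
print(np.round(Bn, 6))
```

the division broadcasts over columns, so each column ends up with unit length.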
as usual, i didn’t get his signs. to match him, i need to change the signs of the first two columns, so i post-multiply by a diagonal matrix with entries {(-1,\ -1,\ 1)}, getting an orthogonal matrix which we will now call P:
P = \left(\begin{array}{ccc} 0.816497&0.&0.57735\\ -0.408248&0.707107&0.57735\\ -0.408248&-0.707107&0.57735\end{array}\right)
if you need to, you can confirm that P is orthogonal by showing P^T P = I.
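that sign flip and the orthogonality check, as a numpy sketch (again, numpy rather than mathematica):

```python
import numpy as np

# the orthonormalized eigenvector matrix, as computed above
Bn = np.array([[-0.816497,  0.0,      0.57735],
               [ 0.408248, -0.707107, 0.57735],
               [ 0.408248,  0.707107, 0.57735]])

# post-multiply by diag(-1, -1, 1) to flip the signs of the first two columns
P = Bn @ np.diag([-1.0, -1.0, 1.0])

# confirm orthogonality: P^T P should be the identity
print(np.round(P.T @ P, 4))
```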
recall that one of the eigenvalues was 0. instead of keeping that orthogonal (hence trivially invertible) matrix P, davis (along with almost every other non-mathematician) throws away the 3rd column, because it corresponds to the 0 eigenvalue. i will not do that: there’s nothing simpler than an orthogonal matrix, except the identity.
(i know, i let harman and jolliffe throw them away, but this is different. for harman and jolliffe, presenting the first few eigenvectors was the end of their analysis. for davis, the eigenvector matrix is the beginning.)
i also think it’s a shame to lose the eigendecomposition. i have
\Sigma = P^{-1}\ c\ P = P^T \ c \ P
and we could rewrite it in the form
c = P \ \Sigma \ P^{T},
which we will need in just a moment.
let me confirm the first form: i compute P^{T}\ c\ P =
\left(\begin{array}{ccc} 0.816497&-0.408248&-0.408248\\ 0.&0.707107&-0.707107\\ 0.57735&0.57735&0.57735\end{array}\right) \times \left(\begin{array}{ccc} 56&-28&-28\\ -28&20&8\\ -28&8&20\end{array}\right) \times \left(\begin{array}{ccc} 0.816497&0.&0.57735\\ -0.408248&0.707107&0.57735\\ -0.408248&-0.707107&0.57735\end{array}\right)
and get \Sigma, of course:
\left(\begin{array}{ccc} 84.&0.&0.\\ 0.&12.&0.\\ 0.&0.&0\end{array}\right)
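that confirmation takes only a few lines in numpy (a sketch using the rounded values above, hence the loose tolerance):

```python
import numpy as np

c = np.array([[ 56.0, -28.0, -28.0],
              [-28.0,  20.0,   8.0],
              [-28.0,   8.0,  20.0]])

P = np.array([[ 0.816497,  0.0,      0.57735],
              [-0.408248,  0.707107, 0.57735],
              [-0.408248, -0.707107, 0.57735]])

# diagonalize: Sigma = P^T c P should be diag(84, 12, 0)
Sigma = P.T @ c @ P
print(np.round(Sigma, 3))

# and the other direction: c = P Sigma P^T recovers c
print(np.round(P @ Sigma @ P.T, 3))
```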
let’s check something else. davis’ cut down eigenvector matrix is…
\left(\begin{array}{cc} 0.816497&0.\\ -0.408248&0.707107\\ -0.408248&-0.707107\end{array}\right)
and i’m hoping that the equation
c = P \ \Sigma \ P^T
(which i said we would need) still makes sense when we use the smaller matrices \Sigma_2 and U instead of \Sigma and P. i compute
U \ \Sigma_2 \ U^T =
\left(\begin{array}{cc} 0.816497&0.\\ -0.408248&0.707107\\ -0.408248&-0.707107\end{array}\right) \times \left(\begin{array}{cc} 84&0.\\ 0.&12\end{array}\right) \times \left(\begin{array}{ccc} 0.816497&-0.408248&-0.408248\\ 0.&0.707107&-0.707107\end{array}\right)
and get
\left(\begin{array}{ccc} 56.&-28.&-28.\\ -28.&20.&8.\\ -28.&8.&20.\end{array}\right)
which is precisely c. of course. when i used the full-size matrices, the 0 eigenvalue took out any effect of its eigenvector. we can recover the original matrix c = X^T X from the product of two smaller matrices.
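a numpy sketch of that reduced-rank reconstruction (again using the rounded values above):

```python
import numpy as np

c = np.array([[ 56.0, -28.0, -28.0],
              [-28.0,  20.0,   8.0],
              [-28.0,   8.0,  20.0]])

# davis' cut-down eigenvector matrix U: the first two columns of P
U = np.array([[ 0.816497,  0.0     ],
              [-0.408248,  0.707107],
              [-0.408248, -0.707107]])

Sigma2 = np.diag([84.0, 12.0])

# the (3x2) x (2x2) x (2x3) product recovers c
c_rebuilt = U @ Sigma2 @ U.T
print(np.round(c_rebuilt, 3))

# rank is at most 2 by construction, and exactly 2 here
print(np.linalg.matrix_rank(c_rebuilt))
```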
on the one hand, we have lost something: P was invertible and U is not. we have U^T but we do not have U^{-1}. so we have not actually lost the eigendecomposition, only the inverse.
on the other hand, we have found something: the matrix c is of rank 2, not of rank 3. no matrix of rank 3 can be factored as the product of a 3×2 and a 2×3 (each of rank at most 2), and that's just what we did. the 2×2 eigenvalue matrix \Sigma_2 in between is irrelevant, because we could absorb it into either of the outside matrices; what matters is the (3×2) × (2×3).
so if we see a zero eigenvalue, or if we choose to set an eigenvalue to zero, we are dealing with a matrix of reduced rank.
let me elaborate on setting an eigenvalue to zero (although this may make more sense down the road). it could also be a singular value that we set to zero. i can do either, and i can do it without changing the eigenvector matrices, because i'm not changing the size of either the w matrix in the SVD or the diagonal matrix in the eigendecomposition. sure, i'll be carrying around a column of zeroes, but that's a cheap price to pay for looking at alternative models without having to explicitly throw away columns of matrices. (this example was constructed to have a 0 eigenvalue; in the real world, we would set "small" eigenvalues to zero.)
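here is a minimal numpy sketch of that idea: zero out eigenvalues without resizing anything. `eigh` is numpy's symmetric eigensolver, my substitute for the mathematica computations in the text; as an illustrative alternative model, i keep only the largest eigenvalue (84) and zero the rest:

```python
import numpy as np

c = np.array([[ 56.0, -28.0, -28.0],
              [-28.0,  20.0,   8.0],
              [-28.0,   8.0,  20.0]])

# full-size eigendecomposition; eigh returns eigenvalues in ascending order
# (the eigenvector signs may differ from the text's, which doesn't matter here)
vals, P = np.linalg.eigh(c)

# zero every eigenvalue but the largest -- a rank-1 alternative model --
# without touching P or shrinking the diagonal matrix
vals_cut = vals.copy()
vals_cut[:-1] = 0.0

c_cut = P @ np.diag(vals_cut) @ P.T
print(np.round(c_cut, 3))
```

the matrices all stay full size; the zeroed eigenvalues just carry columns of zeroes through the product.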
