PCA / FA example 4: davis. Davis & Jolliffe.

What tools did jolliffe give us (that i could confirm)?

  1. Z = X A, A orthogonal, X old data variables in columns
  2. serious rounding
  3. plot data wrt first two PCs
  4. how many variables to keep?

but jolliffe would have used the correlation matrix. Before we do anything else, let’s get the correlation matrix for davis’ example. Recall the centered data X:

X = \left(\begin{array}{lll} -6 & 3 & 3 \\ 2 & 1 & -3 \\ 0 & -1 & 1 \\ 4 & -3 & -1\end{array}\right)

i compute its correlation matrix:

\left(\begin{array}{lll} 1. & -0.83666 & -0.83666 \\ -0.83666 & 1. & 0.4 \\ -0.83666 & 0.4 & 1.\end{array}\right)
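If you want to reproduce that, here is a minimal numpy sketch (my own tooling, of course, not davis' or jolliffe's); it takes the centered X above and asks numpy for the correlation matrix of its columns.

import numpy as np

# centered data from davis' example; columns are the variables
X = np.array([[-6.,  3.,  3.],
              [ 2.,  1., -3.],
              [ 0., -1.,  1.],
              [ 4., -3., -1.]])

# rowvar=False: treat columns as variables, rows as observations
R = np.corrcoef(X, rowvar=False)
print(R.round(5))
# [[ 1.      -0.83666 -0.83666]
#  [-0.83666  1.       0.4    ]
#  [-0.83666  0.4      1.     ]]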

Now get an eigendecomposition of the correlation matrix; i called the eigenvalues \lambda and the eigenvector matrix A. Here’s A:

A = \left(\begin{array}{lll} -0.645497 & 0 & -0.763763 \\ 0.540062 & -0.707107 & -0.456435 \\ 0.540062 & 0.707107 & -0.456435\end{array}\right)

If we compute A^T\ A and get an identity matrix, then A is orthogonal. (it is.)

the eigenvalues are:

\lambda = \{2.4,\ 0.6,\ 0\}

That the third eigenvalue is zero would tell us, if we didn’t already know it from davis, that the data, X, is 2D.
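To check the eigendecomposition numerically, here is a continuation of the sketch above (R as computed there). Note that numpy's eigh returns eigenvalues in ascending order, and each eigenvector is determined only up to sign, so the columns may need reordering and sign flips to match the A printed above.

lam, A = np.linalg.eigh(R)          # eigh is for symmetric matrices
lam, A = lam[::-1], A[:, ::-1]      # reorder to descending eigenvalues

print(lam.round(5))                 # [2.4 0.6 0.] up to tiny rounding error in R
print((A.T @ A).round(10))          # the identity matrix, so A is orthogonal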

now we can get (1) the new data Z wrt the orthogonal eigenvector matrix A; it is given by Z = X A, the projection of X onto the eigenvectors (legitimate because A is orthogonal, i.e. the basis vectors are orthonormal):

Z = \left(\begin{array}{lll} 7.11335 & 0 & 1.84396 \\ -2.37112 & -2.82843 & -0.614654 \\ 0 & 1.41421 & 0 \\ -4.74224 & 1.41421 & -1.22931\end{array}\right)
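As code, continuing the numpy sketch (X from the first block, A from the eigendecomposition, so the column signs may differ from mine), the whole computation is one matrix product:

Z = X @ A            # new data: components of each observation along the eigenvectors
print(Z.round(5))
# [[ 7.11335  0.       1.84396]
#  [-2.37112 -2.82843 -0.61465]
#  [ 0.       1.41421  0.     ]
#  [-4.74224  1.41421 -1.22931]]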

Fascinating.

An understatement. There are 3 nonzero columns.

in sharp contrast to A^Q from the centered data X, this new data does not look 2D wrt the orthogonal eigenvector basis. It is still 2D (nothing changed about that), but that isn't at all clear from the new components of the data: our 3 columns are not linearly independent, since Z = X A has the same rank, 2, as X.

this is a beautiful lesson: it can be very worthwhile to look at both the covariance matrix (or X^T\ X of the centered data) and the correlation matrix.

BTW, since this is new data, get the variances…

\{26.2369,\ 4.,\ 1.76307\}

and the total is… 32. we remember that number from davis.

Sure enough, the orthogonal eigenvector matrix A has preserved the total variance of 32, but it has redistributed it over 3 variables instead of 2.
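Here is that check as a self-contained sketch (numpy; i use ddof=1, the n-1 divisor, which is the convention that makes the total come out to 32):

import numpy as np

# Z as printed above
Z = np.array([[ 7.11335,  0.     ,  1.84396],
              [-2.37112, -2.82843, -0.61465],
              [ 0.     ,  1.41421,  0.     ],
              [-4.74224,  1.41421, -1.22931]])

variances = Z.var(axis=0, ddof=1)    # sample variance of each new variable
print(variances.round(4))            # roughly [26.2369  4.      1.7631]
print(round(variances.sum(), 3))     # roughly 32.0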

Now we are ready to follow jolliffe. (BTW, i will refer to things like "jolliffe's Z" although jolliffe, of course, never worked out davis' example!) We could try (2) some serious rounding of everything in sight. The correlation matrix: original…

\left(\begin{array}{lll} 1. & -0.83666 & -0.83666 \\ -0.83666 & 1. & 0.4 \\ -0.83666 & 0.4 & 1.\end{array}\right)

rounded to the nearest .5…

\left(\begin{array}{lll} 1. & -1. & -1. \\ -1. & 1. & 0.5 \\ -1. & 0.5 & 1.\end{array}\right)

The eigenvector matrix? original…

\left(\begin{array}{lll} -0.645497 & 0 & -0.763763 \\ 0.540062 & -0.707107 & -0.456435 \\ 0.540062 & 0.707107 & -0.456435\end{array}\right)

rounded to the nearest .5…

\left(\begin{array}{lll} -0.5 & 0 & -1. \\ 0.5 & -0.5 & -0.5 \\ 0.5 & 0.5 & -0.5\end{array}\right)

The new data? original…

\left(\begin{array}{lll} 7.11335 & 0 & 1.84396 \\ -2.37112 & -2.82843 & -0.614654 \\ 0 & 1.41421 & 0 \\ -4.74224 & 1.41421 & -1.22931\end{array}\right)

Rounded to the nearest .5 …

\left(\begin{array}{lll} 7. & 0 & 2. \\ -2.5 & -3. & -0.5 \\ 0 & 1.5 & 0 \\ -4.5 & 1.5 & -1.\end{array}\right)

Rounded to the nearest 1…

\left(\begin{array}{lll} 7. & 0 & 2. \\ -2. & -3. & -1. \\ 0 & 1. & 0 \\ -5. & 1. & -1.\end{array}\right)

i don’t see anything exciting in all that. (but if you’re thinking that i haven’t looked at everything in sight, you’re right. My bad. I’ll make it right. Soon. Very soon.)
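For the record, all the rounding above is just "round each entry to the nearest multiple of q"; here is a small helper (numpy), with the correlation matrix as a usage example. The same call with q = 1, .5 or .2 reproduces the other rounded matrices in this post.

import numpy as np

def round_to(M, q):
    # round every entry of M to the nearest multiple of q
    return np.round(np.asarray(M) / q) * q

R = np.array([[ 1.     , -0.83666, -0.83666],
              [-0.83666,  1.     ,  0.4    ],
              [-0.83666,  0.4    ,  1.     ]])

print(round_to(R, 0.5))     # the correlation matrix to the nearest .5, as above
# [[ 1.  -1.  -1. ]
#  [-1.   1.   0.5]
#  [-1.   0.5  1. ]]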

As for (3) a plot of the data wrt the first two PCs, that should be: take the first two columns of the new data…

\left(\begin{array}{ll} 7.11335 & 0 \\ -2.37112 & -2.82843 \\ 0 & 1.41421 \\ -4.74224 & 1.41421\end{array}\right)

and plot those pairs of points. (this corresponds to davis’ plotting A^Q.)
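A minimal matplotlib sketch of that plot (again my tooling; the numbers are just the two columns shown above):

import numpy as np
import matplotlib.pyplot as plt

# scores on the first two PCs: first two columns of the new data Z
scores = np.array([[ 7.11335,  0.     ],
                   [-2.37112, -2.82843],
                   [ 0.     ,  1.41421],
                   [-4.74224,  1.41421]])

plt.scatter(scores[:, 0], scores[:, 1])
for i, (x, y) in enumerate(scores, start=1):
    plt.annotate(str(i), (x, y))        # label the four observations 1..4
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.axhline(0, linewidth=0.5, color="gray")
plt.axvline(0, linewidth=0.5, color="gray")
plt.show()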

That looks remarkably like a plot of my A^Q (mine or davis' doesn't matter much; there are sign differences, but the geometry was memorable). here's a plot of my A^Q:

Clearly we need to look at rounded A^Q and Z, and A and v. (i said i'd make it right.) here is jolliffe's Z, rounded to the nearest .2 …

\left(\begin{array}{lll} 7.2 & 0 & 1.8 \\ -2.4 & -2.8 & -0.6 \\ 0 & 1.4 & 0 \\ -4.8 & 1.4 & -1.2\end{array}\right)

and my A^Q, rounded to the nearest .2 …

\left(\begin{array}{lll} -7.4 & 0 & 0 \\ 2.4 & -2.8 & 0 \\ 0 & 1.4 & 0 \\ 4.8 & 1.4 & 0\end{array}\right)

The first columns are not identical but awfully close, and the second columns are identical, to within the usual sign ambiguity. (The third columns differ, of course: my A^Q has only two nonzero columns, while jolliffe's Z has three.)

here are jolliffe’s A …

\left(\begin{array}{lll} -0.5 & 0 & -1. \\ 0.5 & -0.5 & -0.5 \\ 0.5 & 0.5 & -0.5\end{array}\right)

and my v…

\left(\begin{array}{lll} 1. & 0 & 0.5 \\ -0.5 & -0.5 & 0.5 \\ -0.5 & 0.5 & 0.5\end{array}\right)

i am surprised at how similar these are. This will not always happen. After all, we saw in jolliffe, when he worked an example using both the correlation and the covariance matrices, that the eigenvectors need not be that similar.

Moving on to (4), how many new variables would jolliffe tell us to keep? recall the eigenvalues of the correlation matrix…

\lambda = \{2.4,\ 0.6,\ 0\}

If we keep only those greater than 1 or even those greater than .7, we would keep only one.

Write them as percentages of the total…

\{80., 20., 0\}

If we keep enough of them to account for 70 to 90% of the total variance, we would keep only one: the first PC alone already accounts for 80%.
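A quick sketch of those two rules applied to the eigenvalues above (numpy, nothing fancy):

import numpy as np

lam = np.array([2.4, 0.6, 0.0])   # eigenvalues of the correlation matrix

print(np.sum(lam > 1.0))          # 1 -- keep eigenvalues greater than 1
print(np.sum(lam > 0.7))          # 1 -- still 1 with the more lenient .7 cutoff

pct = 100 * lam / lam.sum()
print(pct)                        # [80. 20.  0.]
print(np.cumsum(pct))             # [ 80. 100. 100.] -- the first PC alone sits at 80%,
                                  #   inside the 70-90% band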

a scree graph won't help at all: we have only two nonzero eigenvalues, so the graph is essentially a single line segment, and no “elbow” is possible.

What about the broken stick model? recall that if a stick of length 1 is broken at random into p pieces, then the expected length of the kth largest piece is given by

L(p,k) = \frac{1}{p}\sum_{j=k}^{p} \frac{1}{j}

The expected lengths for a unit stick broken into 3 pieces are…

\{\frac{11}{18},\frac{5}{18},\frac{1}{9}\}

i.e.

\{0.611111,\ 0.277778,\ 0.111111\}

and our fractional eigenvalues were

\{0.8,\ 0.2,\ 0\}

So, once more, jolliffe would keep only one eigenvector (.8 > .61 but .2 < .278).
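Here is the broken stick comparison as a small sketch (plain Python, exact fractions via the fractions module):

from fractions import Fraction

def broken_stick(p):
    # expected length of the k-th largest of p pieces, for k = 1..p
    return [sum(Fraction(1, j) for j in range(k, p + 1)) / p for k in range(1, p + 1)]

lengths = broken_stick(3)
print(lengths)                         # [Fraction(11, 18), Fraction(5, 18), Fraction(1, 9)]
print([float(x) for x in lengths])     # [0.6111..., 0.2777..., 0.1111...]

frac_eig = [0.8, 0.2, 0.0]             # eigenvalues as fractions of the total
print([e > l for e, l in zip(frac_eig, lengths)])
# [True, False, False] -- only the first PC beats its broken stick length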

Jolliffe notwithstanding, i would keep both PCs with nonzero eigenvalues. Keeping two variables out of three just doesn't seem excessive.



