PCA / FA Example 7: Bartholomew et al. Calculations

The familiar part

This is the second post about Example 7. We confirm their analysis, but we work with the computed correlation matrix rather than their published rounded-off correlation matrix.

I am pretty well standardizing my notation. V is an orthogonal eigenvector matrix from an eigendecomposition; \lambda denotes the associated eigenvalues, written either as a list or as a square diagonal matrix. \Lambda denotes the square roots of \lambda\ , written as a list, as a square diagonal matrix, or as a matrix with rows of zeroes appended to make it the same shape as w (below).

Ah, X is a data matrix with observations in rows. Its transpose is Z = X^T\ .

The (full) singular value decomposition (SVD) is the product u\ w\ v^T\ , with u and v orthogonal; w is generally rectangular, since it must be the same shape as X.

The cut-down SVD makes a square matrix wo out of w, and I write uo\ wo\ vo^T\ . I do it only to make an invertible form of w. For the product to work, we must drop some columns of u, and possibly columns of v.

By the same token, if any eigenvalues \lambda are zero, we would drop them, and make the diagonal matrices \lambda and \Lambda invertible. And, again, conformability requires that we drop columns of V. (There is no U, unless I also do an eigendecomposition of X\ X^T\ , which is not all that likely.)

A is a weighted matrix derived from V. It is usually defined as

A = V\ \Lambda\ ,

which means that each column of V has been weighted by its own \sqrt{\text{eigenvalue}}.

You will note that I am being very sloppy about the sizes of \lambda\ , \Lambda\ , and V. I do distinguish things derived from the eigendecomposition (\lambda\ , \Lambda\ , V, A) from things derived from the SVD (u, v, w); in addition, I usually distinguish the cut-down SVD (uo, vo, wo) from the full one.

Here are the eigenvalues of the computed correlation matrix:

\begin{array}{l} 3.48715 \\ 2.13017 \\ 1.09896 \\ 0.994483 \\ 0.543218 \\ 0.383428 \\ 0.225754 \\ 0.13679 \\ 0.0000456251\end{array}
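For what it's worth, here's how that computation looks as a NumPy sketch. (The variable raw, holding the raw data with observations in rows, is my name, not theirs.)

```python
# Eigendecomposition of the computed correlation matrix, sorted so the
# largest eigenvalue comes first. `raw` is assumed to hold the raw data.
import numpy as np

R = np.corrcoef(raw, rowvar=False)    # the computed correlation matrix
lam, V = np.linalg.eigh(R)            # eigenvalues ascending, eigenvectors in columns
order = np.argsort(lam)[::-1]         # re-sort descending, to match the list above
lam, V = lam[order], V[:, order]
print(lam)
```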

I can get a scree plot and confirm their fig. 5.6. That is, we simply plot each eigenvalue against the order in which it occurred. Personally, I would have used percentages rather than the actual eigenvalues. Regardless, we would probably keep 3 or 4 of the eigenvalues. (We discussed both of these way back in the Jolliffe posts, which followed Harman, at the beginning of the PCA / FA category.)
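A minimal sketch of such a plot, using lam from the sketch above:

```python
# Scree plot: each eigenvalue against its order.
import matplotlib.pyplot as plt

plt.plot(range(1, len(lam) + 1), lam, 'o-')
# or, as percentages of the total variance:
# plt.plot(range(1, len(lam) + 1), 100 * lam / lam.sum(), 'o-')
plt.xlabel('component')
plt.ylabel('eigenvalue')
plt.show()
```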

I start to say that their definition of loadings is, in fact, Davis’ A^R = v\ w^T\ , but it’s not so: w \ne \Lambda = \sqrt{\lambda}\ .

Harman and Jolliffe both computed eigendecompositions of a correlation matrix; Davis and Malinowski both computed eigendecompositions of X^T\ X\ . There’s quite a difference. Do you recall it? (I will return to it later, in some detail.)

Their definition of loadings is the weighted eigenvector matrix: best to write it as A = V\ \Lambda.

First we get \Lambda as a diagonal matrix with the following diagonal elements, which are \sqrt{\text{eigenvalues}}:

\begin{array}{l} 1.86739 \\ 1.45951 \\ 1.04831 \\ 0.997238 \\ 0.737033 \\ 0.619215 \\ 0.475136 \\ 0.369851 \\ 0.00675464\end{array}

For reference, here are the first three columns of the orthogonal eigenvector matrix V:

\left(\begin{array}{lll} -0.523791 & -0.0535939 & -0.0486744 \\ -0.00132346 & -0.617807 & 0.2011 \\ 0.347495 & -0.355054 & 0.150463 \\ 0.255716 & -0.261096 & 0.561083 \\ 0.325179 & -0.0512884 & -0.153321 \\ 0.37892 & 0.350172 & 0.115096 \\ 0.0743736 & 0.453698 & 0.587361 \\ 0.387409 & 0.221521 & -0.311904 \\ 0.366823 & -0.202592 & -0.375106\end{array}\right)

We get the loadings as A = V\ \Lambda and display only the first 3 columns, rounded:

A = \left(\begin{array}{lll} -0.98 & -0.08 & -0.05 \\ 0 & -0.9 & 0.21 \\ 0.65 & -0.52 & 0.16 \\ 0.48 & -0.38 & 0.59 \\ 0.61 & -0.07 & -0.16 \\ 0.71 & 0.51 & 0.12 \\ 0.14 & 0.66 & 0.62 \\ 0.72 & 0.32 & -0.33 \\ 0.69 & -0.3 & -0.39\end{array}\right)

That’s their table 5.7. (Except for the sign of the 2nd column, but we know that’s ok: any eigenvector can be multiplied by any nonzero number and remain an eigenvector, and any unit eigenvector can be multiplied by –1 and remain a unit eigenvector.)
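In code, that table is just the following sketch, using lam and V from the eigendecomposition sketch above.

```python
# Loadings: the sqrt(eigenvalue)-weighted eigenvector matrix A = V Lambda.
import numpy as np

Lam = np.diag(np.sqrt(lam))
A = V @ Lam
print(np.round(A[:, :3], 2))   # compare with their table 5.7, up to column signs
```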

Now plot the first two columns of A, but change the sign of the second (only to match their drawing).

That’s their fig. 5.7.

Next, we get “component score coefficients”. They define them as eigenvectors weighted by \frac{1}{\sqrt{\text{eigenvalues}}}\ . (Doesn’t that division remind you of a reciprocal basis, namely the reciprocal basis of A? That’s exactly what it is.) For now, anyway, I simply compute what they tell me, and get, for the first two columns:

\left(\begin{array}{ll} -0.28 & 0.04 \\ 0 & 0.42 \\ 0.19 & 0.24 \\ 0.14 & 0.18 \\ 0.17 & 0.04 \\ 0.2 & -0.24 \\ 0.04 & -0.31 \\ 0.21 & -0.15 \\ 0.2 & 0.14\end{array}\right)

That’s their table 5.9, and those two columns are two of the reciprocal basis vectors of A.
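Again as a sketch, using V and lam from before:

```python
# Component score coefficients: eigenvectors weighted by 1/sqrt(eigenvalue),
# i.e. V Lambda^{-1}, the reciprocal basis of A. The smallest eigenvalue is
# tiny but nonzero, so Lambda is still invertible.
import numpy as np

cs = V @ np.diag(1.0 / np.sqrt(lam))
print(np.round(cs[:, :2], 2))   # compare with their table 5.9, again up to column signs
```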

The unfamiliar part

To compute scores, we need to introduce the data.

Well, what is the data?

Please don’t curse me out. That wasn’t as stupid a question as it sounded.

First off, we started with “data”. We had to: we couldn’t use the published correlation matrix because it was not positive definite. But what we started with was the “raw data”, and what we did to it was compute its correlation matrix.

So what’s the appropriate data for the analysis? After all, over the past several months we’ve discussed four possible forms of the data:

  1. raw
  2. centered
  3. standardized
  4. small standard deviates

Every one of them has exactly the same correlation matrix. How do we choose?

There are only two sensible possibilities, but the point is that there are two, not just one: we can use standardized data, or we can use small standard deviates.

(In fact, we could use any multiple of the standardized data, but why would we? Small standard deviates are one particular multiple: the standardized data divided by \sqrt{N-1}\ , where N is the number of observations.)

But why must we use some multiple (possibly 1) of the standardized data? Because then the eigenvector matrix V from the eigendecomposition of the correlation matrix matches the v matrix from the singular value decomposition u\ w\ v^T of the standardized data (to within signs of columns).

Multiplying the standardized data matrix by a constant simply scales the w’s (and hence the \lambda‘s); it has no effect on the orthogonal eigenvector matrices u, v, V.
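Here is a sketch checking both claims, again assuming the raw data sit in raw:

```python
# The right singular vectors of the standardized data match V up to column
# signs, and rescaling the data changes only the singular values.
import numpy as np

Xs = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)   # standardized data
u, w, vT = np.linalg.svd(Xs, full_matrices=False)         # cut-down SVD
print(np.allclose(np.abs(vT.T), np.abs(V)))               # True, to within column signs

N = Xs.shape[0]
u2, w2, vT2 = np.linalg.svd(Xs / np.sqrt(N - 1), full_matrices=False)   # small standard deviates
print(np.allclose(w2, np.sqrt(lam)))                      # now the singular values are Lambda
print(np.allclose(np.abs(vT2.T), np.abs(vT.T)))           # and v is unchanged
```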

Oh, if we had done an eigendecomposition of the covariance matrix instead of the correlation matrix, we would now have to choose our “data” as the centered data, not the standardized. The reasoning is the same: the eigenvector matrix V from the eigendecomposition of the covariance matrix matches the v matrix from the singular value decomposition of the centered data (or any constant multiple of it).

So now we do what most people take completely for granted, and we standardize the data, because we worked from the correlation matrix. But we could choose small standard deviates. (More to follow.)

I check the variances; they are each 1, as they should be.

Henceforth, X is the matrix of standardized data. Here are its first two columns:

\left(\begin{array}{ll} -1.01828 & -0.364773 \\ -0.638776 & -1.18948 \\ -0.535859 & -0.467862 \\ -0.799583 & 0.0475791 \\ 0.261745 & -0.261685 \\ -0.207812 & -0.674038 \\ -0.73526 & 1.90317 \\ -0.825312 & -1.18948 \\ -1.05687 & 0.150667 \\ -0.413646 & -0.158597 \\ -0.394349 & -0.880214 \\ 1.43242 & -0.674038 \\ -0.65164 & -0.777126 \\ 0.55763 & -0.983302 \\ 0.242448 & -0.467862 \\ -0.838177 & -0.880214 \\ -0.73526 & -1.08639 \\ 3.06622 & -0.57095 \\ 0.287474 & 0.666108 \\ -0.169219 & 1.69699 \\ -0.96039 & 1.69699 \\ 0.16526 & 1.90317 \\ 0.769895 & 1.28464 \\ 1.00146 & 0.872284 \\ 0.293906 & 0.150667 \\ 1.90198 & 0.253755\end{array}\right)
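That step, and the variance check, as a sketch (raw is still my name for the raw data):

```python
# Standardize: subtract the mean and divide by the standard deviation
# (ddof=1, since the correlation matrix used N-1).
import numpy as np

X = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
print(X.var(axis=0, ddof=1))   # each variance should be 1
print(X[:, :2])                # compare with the two columns above
```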

Finally, having decided upon the standardized data X, they compute the scores by projecting X onto the “component score coefficients”,

scores = X\ cs

Hey, that’s why they computed the reciprocal basis: so they could get the components wrt the A basis by doing a projection. Here are the first two columns of the scores.

\left(\begin{array}{ll} 0.898195 & -0.820867 \\ 0.500376 & -1.42956 \\ 0.396261 & -0.753289 \\ 0.447681 & -0.00764342 \\ -0.0543495 & -0.27814 \\ 0.197129 & -0.517025 \\ 0.556321 & 0.507807 \\ 0.886495 & -1.34697 \\ 0.85616 & -0.250689 \\ 0.617761 & 0.0961429 \\ 0.520875 & -0.502753 \\ -1.10976 & -0.237475 \\ 0.885591 & -0.722726 \\ -0.533138 & -0.508625 \\ -0.228874 & -0.41409 \\ 0.569114 & -1.06054 \\ 0.555631 & -0.49878 \\ -3.33314 & -0.716295 \\ -0.381018 & 1.00561 \\ 0.223669 & 1.7902 \\ 0.935125 & 1.89129 \\ 0.303693 & 2.11194 \\ -0.584818 & 1.28005 \\ -1.07726 & 1.07835 \\ 0.0264838 & 0.850927 \\ -2.0742 & -0.546846\end{array}\right)
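As a sketch, with X and cs from the sketches above:

```python
# Scores: project the standardized data onto the component score
# coefficients, i.e. onto the reciprocal basis of A.
scores = X @ cs
print(scores[:, :2])   # compare with the two columns above (to within the sign of the 2nd column)
```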

Let’s plot those numbers, as xy-coordinates.
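Something like this, as a sketch:

```python
# Plot the first two score columns as xy-coordinates.
import matplotlib.pyplot as plt

plt.scatter(scores[:, 0], scores[:, 1])
plt.xlabel('component 1')
plt.ylabel('component 2')
plt.show()
```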

That matches their figure 5.8.

To summarize: the familiar part was the eigendecomposition of the correlation matrix, yielding eigenvalues \lambda and their square roots \Lambda\ , and the orthogonal eigenvector matrix V and the \sqrt{\text{eigenvalue}}-weighted eigenvector matrix A = V\ \Lambda\ . We got a scree plot of the eigenvalues, and plotted the first two columns of A.

For the unfamiliar part, we computed the scores by computing the reciprocal basis (“component score coefficients”, i.e. weighted by the inverse square roots, or V\ \Lambda^{-1}), and then projecting the standardized data onto the reciprocal basis.

There are a couple of ways to get the scores without the reciprocal basis, but there is one wrinkle. Let’s look at it next time.

Review: reciprocal basis

If the reciprocal basis confuses you, look for it in the linear algebra category and in the PCA / FA category… or you could pick a month and search for “reciprocal”.

Or, for this special case, when we change only the sizes but not the directions of an orthonormal basis, think of it as follows.

If we have old components of data, as a column vector z, and an orthogonal transition matrix V, then the new components of the data, as a column vector y, are described by

z = V\ y\ .

Transpose:

z^T = y^T\ V^T

Let x = z^T\ , since Z = X^T\ :

x = y^T\ V^T

Post-multiply by V:

x\ V = y^T\ V^T\ V = y^T

since V is orthogonal, i.e. V^T\ V = I\ .

What we’ve just seen is that the usual change-of-basis equation

z = V\ y

or

Z = V\ Y

which represents old and new data components as columns is consistent with the projection

Y^T = X\ V\ ,

which represents old and new data components as rows, if V is orthogonal.
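If you want to see that numerically, here is a small self-contained toy check, with random matrices (nothing here comes from the book; Q plays the role of V):

```python
# The column form Z = V Y and the row form Y^T = X V give the same
# new components when V is orthogonal.
import numpy as np

rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.normal(size=(4, 4)))[0]   # a random orthogonal matrix, playing the role of V
Xold = rng.normal(size=(10, 4))                # old components, observations in rows
Z = Xold.T                                     # old components as columns
Y = np.linalg.solve(Q, Z)                      # new components, from Z = V Y
print(np.allclose(Y.T, Xold @ Q))              # True: Y^T = X V
```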

Now suppose we were to make all the vectors in V larger, say by a factor of 2; then the new components y would have to be smaller by a factor of 2:

z = \left(2\ V\right)\  \left(y\ /\ 2\right)\ .

If we were to project X onto the new basis (2 V), however, we’d get numbers that were larger by a factor of 2 instead of smaller. That’s just wrong.

This is what the reciprocal basis takes care of. If the new basis were 2 V, then the reciprocal basis would be V / 2. If the new basis has columns A = V\ \Lambda\ , then the reciprocal basis has columns V\ \Lambda^{-1}\ .

And then we can get new components of the data by projecting onto the reciprocal basis. If you insist on projecting, then you need the reciprocal basis. (That is, if you want the scores to be the new components of the data wrt the A basis. Not everyone does, apparently: Harman’s and Jolliffe’s definitions of the scores are not universal; Davis is an exception. More to follow.)
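Here is a sketch of that check for this example, using X, V, and lam from the earlier sketches:

```python
# Projecting onto the reciprocal basis V Lambda^{-1} gives the components
# of the data wrt the A = V Lambda basis: X is recovered as scores A^T.
import numpy as np

Lam = np.diag(np.sqrt(lam))
A = V @ Lam
recip = V @ np.linalg.inv(Lam)        # the reciprocal basis of A
scores = X @ recip
print(np.allclose(X, scores @ A.T))   # True: the rows of X are combinations of the columns of A
```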

As I said, however, there are a couple of ways to get the scores without the reciprocal basis, but there is one wrinkle.

Let’s look at it next time.
