PCA / FA Example 7: Bartholomew et al.: the scores

Edited 17 Oct 2008 to round off the first two columns of 5 u.

This is the 3rd post about the Bartholomew et al. book.

Introduction

It would be convenient if Bartholomew’s model were one we had seen before.

It is.

We got their scores and loadings just by following their instructions. Although they didn’t use matrix notation, their equations amounted to

loadings = the \sqrt{\text{eigenvalue}}-weighted eigenvector matrix, A = V\ \Lambda

“component score coefficients” = reciprocal basis vectors, cs = V\ \Lambda^{-1}

scores = X cs.

where V is an orthogonal eigenvector matrix of the correlation matrix, \Lambda is the diagonal matrix of the (nonzero) \sqrt{\text{eigenvalues}}\ , and X is the standardized data.
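For concreteness, here is a minimal NumPy sketch of that recipe. The post's actual computations were done in Mathematica, and I don't have the data here, so a random 26×9 matrix stands in for Bartholomew's standardized employment table; also, the signs that an eigendecomposition returns are arbitrary, which is exactly the issue taken up below.

```python
import numpy as np

# A sketch of Bartholomew's recipe (random data stands in for the real
# 26 x 9 employment table; all eigenvalues are assumed nonzero).
rng = np.random.default_rng(0)
raw = rng.normal(size=(26, 9))
X = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)   # standardized data

c = np.corrcoef(X, rowvar=False)           # 9 x 9 correlation matrix
eigvals, V = np.linalg.eigh(c)             # V: orthogonal eigenvectors (signs arbitrary)
eigvals, V = eigvals[::-1], V[:, ::-1]     # eigh sorts ascending; reverse to descending

Lam = np.diag(np.sqrt(eigvals))            # Lambda: sqrt-eigenvalue diagonal matrix
A = V @ Lam                                # loadings
cs = V @ np.linalg.inv(Lam)                # "component score coefficients"
scores = X @ cs                            # scores
```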

Let’s recall Harman’s model. He had

Z = A\ F

where Z was the transpose of the standardized data X:

Z = X^T

and A is the \sqrt{\text{eigenvalue}}-weighted eigenvector matrix, so we have

A = V\ \Lambda\ .

Transposing, we get

X = F^T\ A^T\ .

There are two crucial facts:

  • Harman has factored the data matrix X as scores times loadings.
  • The right factor is not orthogonal: A^T rather than V^T.

Terminology: Harman says the elements of F are the factor values, and the “coefficients of the factors” are loadings. I take it, therefore, that A is the loadings.

So can we compute F or F^T? In this case, yes: all the eigenvalues are nonzero, so A is invertible, and then

F^T = X\ A^{-T}\ .

So let’s compute F^T. (You may have recognized A^{-T}\ . It is in fact the reciprocal basis to A, and that equation says that F^T is found by projecting the data onto the reciprocal basis. That’s exactly what Bartholomew had us do, without telling us that we were computing A^{-T}\ along the way.)

Before we do that, I want to be very careful about the signs of columns. In the previous post, for example, the A matrix which I computed differed from the A matrix which Bartholomew et al. computed. We know that’s ok, because eigenvectors of length 1 can be multiplied by -1 and still be eigenvectors of length 1. But I was not consistent about how I changed signs to confirm that our graphs matched: in fact, while I had a different A matrix from them, I defined my “component score coefficients” to match theirs. That was not consistent.

But now, if I want to check my computations by multiplying the scores times the loadings, I need a consistent set of signs for the columns.

Rewriting Bartholomew’s scores

My personal preference for achieving that is to start with the orthogonal v matrix from the SVD (singular value decomposition) and construct A from it. You’ll see why below.

What follows seems to be my adopted recipe for doing PCA / FA, although it’s cluttered here by comparisons to Bartholomew’s calculations. You’ll see it again.

First, decide what “the data” is to be: raw, centered, standardized, etc. We know that Bartholomew used standardized data, so let X be the standardized data, and let

X = u\ w\ v^T

be its SVD.

I remind us that the dimensions are

X: 26 x 9

u: 26 x 26

w: 26 x 9

v: 9 x 9

That is, u and v are square (as they must be: they are orthogonal); and w is the same shape as the data X.
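Here is the same full SVD in the NumPy sketch (again, a random 26×9 matrix standing in for the standardized data), just to pin down the shapes:

```python
import numpy as np

# Full SVD of the (stand-in) standardized data, to check the shapes.
rng = np.random.default_rng(0)
raw = rng.normal(size=(26, 9))
X = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

u, s, vT = np.linalg.svd(X, full_matrices=True)   # s holds the 9 singular values
w = np.zeros_like(X)                              # rebuild w as a 26 x 9 matrix
np.fill_diagonal(w, s)
v = vT.T

print(u.shape, w.shape, v.shape)                  # (26, 26) (26, 9) (9, 9)
print(np.allclose(X, u @ w @ v.T))                # X = u w v^T -> True
```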

Right now, I wish to use Harman’s & Bartholomew’s model, i.e. the correlation matrix. Rather than do algebra in my head, I’m going to compute the correlation matrix of X… get just its eigenvalues …

\begin{array}{l} 3.48715 \\ 2.13017 \\ 1.09896 \\ 0.994483 \\ 0.543218 \\ 0.383428 \\ 0.225754 \\ 0.13679 \\ 0.0000456251\end{array}

That’s all I want the correlation matrix for: its eigenvalues. Instead of its eigenvector matrix V, with whatever signs Mathematica® returns, I use the v matrix from the SVD.

Then create a diagonal matrix whose entries are the square roots of those eigenvalues; that is, the diagonal entries are

\sqrt{\lambda} = \begin{array}{l} 1.86739 \\ 1.45951 \\ 1.04831 \\ 0.997238 \\ 0.737033 \\ 0.619215 \\ 0.475136 \\ 0.369851 \\ 0.00675464\end{array}

Let me remind us that w0, the cut-down w, is the same shape as \Lambda\ . That is, let w0 be the square 9×9 submatrix of w:

w0 = \left(\begin{array}{lllllllll} 9.33696 & 0. & 0. & 0. & 0. & 0. & 0. & 0. & 0. \\ 0. & 7.29756 & 0. & 0. & 0. & 0. & 0. & 0. & 0. \\ 0. & 0. & 5.24156 & 0. & 0. & 0. & 0. & 0. & 0. \\ 0. & 0. & 0. & 4.98619 & 0. & 0. & 0. & 0. & 0. \\ 0. & 0. & 0. & 0. & 3.68517 & 0. & 0. & 0. & 0. \\ 0. & 0. & 0. & 0. & 0. & 3.09608 & 0. & 0. & 0. \\ 0. & 0. & 0. & 0. & 0. & 0. & 2.37568 & 0. & 0. \\ 0. & 0. & 0. & 0. & 0. & 0. & 0. & 1.84926 & 0. \\ 0. & 0. & 0. & 0. & 0. & 0. & 0. & 0. & 0.0337732\end{array}\right)

I promised in the previous post that there would be one caveat, and here it is: w0 is proportional to \Lambda\ , but not equal to it.

\Lambda = \left(\begin{array}{lllllllll} 1.86739 & 0. & 0. & 0. & 0. & 0. & 0. & 0. & 0. \\ 0. & 1.45951 & 0. & 0. & 0. & 0. & 0. & 0. & 0. \\ 0. & 0. & 1.04831 & 0. & 0. & 0. & 0. & 0. & 0. \\ 0. & 0. & 0. & 0.997238 & 0. & 0. & 0. & 0. & 0. \\ 0. & 0. & 0. & 0. & 0.737033 & 0. & 0. & 0. & 0. \\ 0. & 0. & 0. & 0. & 0. & 0.619215 & 0. & 0. & 0. \\ 0. & 0. & 0. & 0. & 0. & 0. & 0.475136 & 0. & 0. \\ 0. & 0. & 0. & 0. & 0. & 0. & 0. & 0.369851 & 0. \\ 0. & 0. & 0. & 0. & 0. & 0. & 0. & 0. & 0.00675464\end{array}\right)

The proportionality constant is \sqrt{N-1}\ ; in this case it is 5, because we have 26 observations. We have

w0 = 5\ \Lambda

It’s that proportionality that I know I will mess up unless I explicitly start with the eigenvalues of the correlation matrix.
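Continuing the NumPy sketch (still stand-in data, reusing X and s from the SVD above), that proportionality is easy to check. Note it only comes out exactly because the columns of X were standardized with the sample (ddof = 1) standard deviation.

```python
# Continuing the sketch: the square piece of w versus Lambda.
N = X.shape[0]                                          # 26 observations
eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
Lam = np.diag(np.sqrt(eigvals))                         # Lambda, from the correlation matrix
w0 = np.diag(s)                                         # cut-down (square 9 x 9) w

print(np.allclose(w0, np.sqrt(N - 1) * Lam))            # w0 = 5 Lambda -> True
```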

Then the \sqrt{\text{eigenvalue}}-weighted matrix A is

A = v\ \Lambda

and its first 2 columns, rounded off, are…

\left(\begin{array}{ll} -0.98 & 0.08 \\ 0 & 0.9 \\ 0.65 & 0.52 \\ 0.48 & 0.38 \\ 0.61 & 0.07 \\ 0.71 & -0.51 \\ 0.14 & -0.66 \\ 0.72 & -0.32 \\ 0.69 & 0.3\end{array}\right)

Those are loadings. They happen to match Bartholomew’s choice of signs.
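In the running sketch, building A from the SVD's v (rather than from whatever signs an eigendecomposition happens to return) is one line. On the real data the two printed columns would be the ones displayed above; on the stand-in data they will of course differ.

```python
# Continuing the sketch: loadings built from the SVD's v, so the signs of A
# are automatically consistent with the signs of u.
A = v @ Lam                     # the sqrt-eigenvalue-weighted matrix
print(np.round(A[:, :2], 2))    # first two columns: the loadings
```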

The “component score coefficients” are computed as the \frac{1}{\sqrt{\text{eigenvalue}}}-weighted matrix v, i.e. cs = v\ \Lambda^{-1}… that inverse square root tells me, and you, I hope, that the “component score coefficients” are the reciprocal basis to A.

The first two columns of the “component score coefficients” are…

cs = \left(\begin{array}{ll} -0.28 & 0.04 \\ 0 & 0.42 \\ 0.19 & 0.24 \\ 0.14 & 0.18 \\ 0.17 & 0.04 \\ 0.2 & -0.24 \\ 0.04 & -0.31 \\ 0.21 & -0.15 \\ 0.2 & 0.14\end{array}\right)

They, too, match Bartholomew et al., as we would expect, since the first two columns of A matched. (I have labeled those “cs” but they are only the first two columns of the cs matrix, which is 9×9. I will consistently abuse that notation, because I’m going to be displaying matrices which are equal but computed differently. I really am displaying the results of computations, not just copying the thing that I’m supposed to get.)

The scores are computed by projecting the data onto the “component score coefficients”. That told me that Bartholomew’s scores were the new components of the data wrt the A basis. Here they are, as Bartholomew would compute them:

scores = \left(\begin{array}{ll} 0.9 & -0.82 \\ 0.5 & -1.43 \\ 0.4 & -0.75 \\ 0.45 & -0.01 \\ -0.05 & -0.28 \\ 0.2 & -0.52 \\ 0.56 & 0.51 \\ 0.89 & -1.35 \\ 0.86 & -0.25 \\ 0.62 & 0.1 \\ 0.52 & -0.5 \\ -1.11 & -0.24 \\ 0.89 & -0.72 \\ -0.53 & -0.51 \\ -0.23 & -0.41 \\ 0.57 & -1.06 \\ 0.56 & -0.5 \\ -3.33 & -0.72 \\ -0.38 & 1.01 \\ 0.22 & 1.79 \\ 0.94 & 1.89 \\ 0.3 & 2.11 \\ -0.58 & 1.28 \\ -1.08 & 1.08 \\ 0.03 & 0.85 \\ -2.07 & -0.55\end{array}\right)

I need to plot these, to confirm that they are the same as Bartholomew’s.

Yes.

We have reproduced the calculations in the previous post, but whereas I was changing signs all over the place in the previous post – in order to match their pictures – this time I know that I have a consistent set of signs for whatever calculations I choose to do.
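In the running NumPy sketch (stand-in data, with v and Λ as built above, so the signs stay consistent), those two steps are:

```python
# Continuing the sketch: reciprocal basis and scores, Bartholomew-style,
# but rebuilt from the SVD's v rather than from an eigendecomposition's V.
cs = v @ np.linalg.inv(Lam)      # "component score coefficients"
scores = X @ cs                  # project the data onto the reciprocal basis
print(np.round(scores[:, :2], 2))
```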

Computing F transpose instead

Now I’m ready to get F^T = X\ A^{-T}\ . Let’s compute A^{-T} and look at its first two columns: they should be the “component score coefficients”.

A^{-T} = \left(\begin{array}{ll} -0.28 & 0.04 \\ 0 & 0.42 \\ 0.19 & 0.24 \\ 0.14 & 0.18 \\ 0.17 & 0.04 \\ 0.2 & -0.24 \\ 0.04 & -0.31 \\ 0.21 & -0.15 \\ 0.2 & 0.14\end{array}\right)

They are. Recall the “component score coefficients”, as computed according to Bartholomew:

cs = \left(\begin{array}{ll} -0.28 & 0.04 \\ 0 & 0.42 \\ 0.19 & 0.24 \\ 0.14 & 0.18 \\ 0.17 & 0.04 \\ 0.2 & -0.24 \\ 0.04 & -0.31 \\ 0.21 & -0.15 \\ 0.2 & 0.14\end{array}\right)

They match.

Then F^T is the projection of X onto the reciprocal basis, and must agree with the scores as we already computed them:

F^T = \left(\begin{array}{ll} 0.9 & -0.82 \\ 0.5 & -1.43 \\ 0.4 & -0.75 \\ 0.45 & -0.01 \\ -0.05 & -0.28 \\ 0.2 & -0.52 \\ 0.56 & 0.51 \\ 0.89 & -1.35 \\ 0.86 & -0.25 \\ 0.62 & 0.1 \\ 0.52 & -0.5 \\ -1.11 & -0.24 \\ 0.89 & -0.72 \\ -0.53 & -0.51 \\ -0.23 & -0.41 \\ 0.57 & -1.06 \\ 0.56 & -0.5 \\ -3.33 & -0.72 \\ -0.38 & 1.01 \\ 0.22 & 1.79 \\ 0.94 & 1.89 \\ 0.3 & 2.11 \\ -0.58 & 1.28 \\ -1.08 & 1.08 \\ 0.03 & 0.85 \\ -2.07 & -0.55\end{array}\right)

Yes, the same.

Let’s check things. We can compute A\ F and compare it with Z = X^T. They should be equal… I get differences of about 10^{-15} or less.

Good.
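Continuing the sketch, here are those checks: A^{-T} against the component score coefficients, F^T against the scores, and A F against Z = X^T.

```python
# Continuing the sketch: F^T computed directly from A, plus the checks above.
A_invT = np.linalg.inv(A).T          # the reciprocal basis A^{-T}
FT = X @ A_invT                      # F^T = X A^{-T}

print(np.allclose(A_invT, cs))       # columns match the "component score coefficients"
print(np.allclose(FT, scores))       # matches the scores already computed
print(np.abs(A @ FT.T - X.T).max())  # A F - Z: about 1e-15 or smaller
```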

Bartholomew’s “component score coefficients” are the columns of the reciprocal basis A^{-T}\ , and his scores are the F^T matrix. Bartholomew et al. have factored the data matrix the same way Harman did. (Good; it’s nice that they agree with a classic text.)

There is another way

But there is another approach to all this. After all, we know how to factor a data matrix. In fact, I started this process by factoring it!

There’s just that one little caveat: the singular values w are not quite the square roots of the eigenvalues – because there’s a pesky factor of \sqrt{N-1}.

To be quite explicit, suppose we do an SVD of the standardized data X:

X = u\ w\ v^T\ .

Now, compute the correlation matrix c = \frac{1}{N-1}\ X^T\ X\ :

c = \frac{1}{N-1} \left(v\ w^T\ u^T\right)\ \left(u\ w\ v^T\right) = \frac{1}{N-1} \left(v\ w^T\ \ w\ v^T\right)

but the eigendecomposition gave us

c = V\ \lambda\ V^T = V \Lambda^2\ V^T

and we may choose, as I did, to set V = v. (That’s pretty major: we are requiring that u and V have consistent signs.) Then we see that

\Lambda^2 = \frac{w^T\ w}{N-1}

Hence (if we cut down w to w0 so that it is square)

\Lambda = \frac{w0}{\sqrt{N-1}}

or

w0 = \sqrt{N-1}\ \Lambda\ .

(as we confirmed earlier).

Now go back and rewrite the SVD (cutting down u to u0 so that it is conformable, i.e. the right size):

X = u0\ w0\ v^T = u0\ \sqrt{N-1}\ \Lambda\ v^T\ .

Collect terms:

X = \left(u0\ \sqrt{N-1}\right)\  \left(\Lambda\ v^T\right)

Now recognize A^T = \Lambda\ v^T (since A = v\ \Lambda) and we have

X = \left(u0\ \sqrt{N-1}\right)\ A^T\ .

For A invertible, we may then infer that

X\ A^{-T} = F^T = u0\ \sqrt{N-1}\ .

In this case, with N = 26, we have \sqrt{N-1} = 5\ .

In other words, the scores are 5 u0.
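In the running sketch (with u from the SVD and F^T from the check above), that claim is a one-liner:

```python
# Continuing the sketch: the scores are sqrt(N-1) u0, i.e. 5 u0 here.
u0 = u[:, :9]                                  # cut-down u, 26 x 9
print(np.allclose(np.sqrt(N - 1) * u0, FT))    # True
```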

(I have to tell you that I found that his scores were 5 u0 before I understood why. I was a little annoyed that they were so much simpler than projections onto a reciprocal basis.)

So let’s compute 5 u, and take its first two columns. (In fact, we could compute the full matrix 5u rather than 5 u0; I would never take more than 9 columns of it, because that’s all the columns I have of data X, and the scores are the new components of the data.)

Here are the first two columns of 5 u:

5u  = \left(\begin{array}{ll} 0.898195 & -0.820867 \\ 0.500376 & -1.42956 \\ 0.396261 & -0.753289 \\ 0.447681 & -0.00764342 \\ -0.0543495 & -0.27814 \\ 0.197129 & -0.517025 \\ 0.556321 & 0.507807 \\ 0.886495 & -1.34697 \\ 0.85616 & -0.250689 \\ 0.617761 & 0.0961429 \\ 0.520875 & -0.502753 \\ -1.10976 & -0.237475 \\ 0.885591 & -0.722726 \\ -0.533138 & -0.508625 \\ -0.228874 & -0.41409 \\ 0.569114 & -1.06054 \\ 0.555631 & -0.49878 \\ -3.33314 & -0.716295 \\ -0.381018 & 1.00561 \\ 0.223669 & 1.7902 \\ 0.935125 & 1.89129 \\ 0.303693 & 2.11194 \\ -0.584818 & 1.28005 \\ -1.07726 & 1.07835 \\ 0.0264838 & 0.850927 \\ -2.0742 & -0.546846\end{array}\right)

Edit 17 Oct 2008: while it isn’t essential, it would help if I rounded those off and showed F^T:

5 u = 5 u0 = \left(\begin{array}{ll} 0.9 & -0.82 \\ 0.5 & -1.43 \\ 0.4 & -0.75 \\ 0.45 & -0.01 \\ -0.05 & -0.28 \\ 0.2 & -0.52 \\ 0.56 & 0.51 \\ 0.89 & -1.35 \\ 0.86 & -0.25 \\ 0.62 & 0.1 \\ 0.52 & -0.5 \\ -1.11 & -0.24 \\ 0.89 & -0.72 \\ -0.53 & -0.51 \\ -0.23 & -0.41 \\ 0.57 & -1.06 \\ 0.56 & -0.5 \\ -3.33 & -0.72 \\ -0.38 & 1.01 \\ 0.22 & 1.79 \\ 0.94 & 1.89 \\ 0.3 & 2.11 \\ -0.58 & 1.28 \\ -1.08 & 1.08 \\ 0.03 & 0.85 \\ -2.07 & -0.55\end{array}\right) F^T = \left(\begin{array}{ll} 0.9 & -0.82 \\ 0.5 & -1.43 \\ 0.4 & -0.75 \\ 0.45 & -0.01 \\ -0.05 & -0.28 \\ 0.2 & -0.52 \\ 0.56 & 0.51 \\ 0.89 & -1.35 \\ 0.86 & -0.25 \\ 0.62 & 0.1 \\ 0.52 & -0.5 \\ -1.11 & -0.24 \\ 0.89 & -0.72 \\ -0.53 & -0.51 \\ -0.23 & -0.41 \\ 0.57 & -1.06 \\ 0.56 & -0.5 \\ -3.33 & -0.72 \\ -0.38 & 1.01 \\ 0.22 & 1.79 \\ 0.94 & 1.89 \\ 0.3 & 2.11 \\ -0.58 & 1.28 \\ -1.08 & 1.08 \\ 0.03 & 0.85 \\ -2.07 & -0.55\end{array}\right)

Those are, indeed, the first two columns of the scores, and the first two columns of F^T.

It’s the computation of 5u that requires the signs of A to be consistent with the signs of u. That’s why I will use v from the SVD instead of V from the eigendecomposition of a correlation or covariance matrix, even though I will calculate their eigenvalues. (If, and it’s a big if, I choose to use the correlation or covariance matrices at all.)

I want to confirm the factoring of the data, that the product of the scores and the loadings is the standardized data. Let’s compute (5 u0) A^T. For this I need the cut-down u. Since A is 9×9, u0 must have 9 columns.

So I compute it. The difference between the product and the data, X - 5\ u0\ A^T\ , is the zero matrix (to within 10^{-15} on each entry).
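In the sketch, the same check (on the stand-in data) is:

```python
# Continuing the sketch: scores times loadings reproduces the data.
diff = X - (np.sqrt(N - 1) * u0) @ A.T         # X - 5 u0 A^T
print(np.abs(diff).max())                      # ~1e-15, i.e. zero to roundoff
```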

Let me emphasize that the \sqrt{N-1} showed up because we used \Lambda instead of w. If we’d never seen the eigenvalues \lambda of the correlation matrix, computing scores would have been simpler, because we’d have used w instead of the square roots \Lambda\ .

But, of course, the scores and loadings would have been different: we would have gotten a different matrix A. Instead of

X = \left(5\ u0\right)\ \left(\Lambda\ v^T\right)

we would have had

X = u\ \left(w\ v^T\right)\

i.e. A^T = w\ v^T\ , so that A = v\ w0\ .

Anyway, we know three equivalent ways to compute the scores for Harman’s or Bartholomew’s model (all three are collected in a sketch just after the list):

  • project onto the reciprocal basis: either as X\ \left(V\  \Lambda^{-1}\right) or as F^T= X\ A^{-T}\ .
  • compute \sqrt{N-1}\ u0\ .
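Here, continuing the NumPy sketch (still on stand-in data, with V taken equal to the SVD's v), are all three side by side:

```python
# Continuing the sketch: three equivalent computations of the scores,
# all assuming A is invertible, i.e. no zero eigenvalues.
scores_1 = X @ (v @ np.linalg.inv(Lam))        # project onto V Lambda^{-1} (V = v here)
scores_2 = X @ np.linalg.inv(A).T              # project onto the reciprocal basis A^{-T}
scores_3 = np.sqrt(N - 1) * u0                 # straight from the SVD
print(np.allclose(scores_1, scores_2), np.allclose(scores_2, scores_3))
```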

Ah, they’re not quite equivalent: what we just worked through was a case where A is invertible.

What if A is not invertible? Equivalently, what if there’s a zero eigenvalue?

That is a very real possibility, even for real data: if the rows sum to exactly 1, then, as we saw, both the correlation matrix and the covariance matrix have a zero eigenvalue.

In such a case, \sqrt{N-1}\ u0\ is probably the way to go, although we can make the other computations work. Let’s look at such a case next time.
