PCA / FA example 9: standardized data, 3 models

Introduction

edited 16 Jan 2009: I found a place where I called F^T the loadings instead of the scores. That’s all.

I want to run thru what is admittedly a toy case, but this seems to be where I stand on the computation of PCA / FA.

Recall the raw data of example 9:

\text{raw} = \left(\begin{array}{lll} 2.09653 & -0.793484 & -7.33899 \\ -1.75252 & 13.0576 & 0.103549 \\ 3.63702 & 29.0064 & 8.52945 \\ 0.0338101 & 46.912 & 19.8517 \\ 5.91502 & 70.9696 & 36.0372\end{array}\right)

Get the mean and variance of each column. The means are

\{1.98597,\ 31.8304,\ 11.4366\}

and the variances are

\{8.99072,\ 796.011,\ 291.354\}

We see that the raw data will differ from the centered data, and that will differ from the standardized data. Let’s do the standardized data first, because that’s what we’ve been doing most recently.

Here’s what I’m going to do. For a data matrix X

  1. get the SVD, X = u\ w\ v^T
  2. get the eigenvalues \lambda\ \text{of } X^T\ X/\left(N-1\right) (for the centered and the standardized data, that's the covariance matrix and the correlation matrix, respectively)
  3. form the diagonal matrix \Lambda\ \text{of } \sqrt{\lambda}
  4. form the weighted eigenvector matrix A = v\ \Lambda
  5. form the scores F^T= \sqrt{N-1}\ u
  6. form the new data Y wrt v, Y = u w
  7. form Davis’ loadings A^R = v\ w^T
  8. form Davis’ scores S^R = X\ A^R\ .


I will also keep the largest “few” singular values w
and form the reconstituted data X_0\ and the residual data X_1\ , where:

X = X_0 + X_1\ .
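If you'd like to follow along in code, here is a minimal numpy sketch of steps 1 through 8. It is just my transcription of the list above, not the calculation I actually ran, and numpy may flip the signs of some columns of u and v relative to the matrices displayed below.

```python
import numpy as np

def pca_fa_quantities(X):
    """Steps 1-8 for a data matrix X (rows = observations). A sketch only."""
    N = X.shape[0]

    # 1. SVD: X = u w v^T  (full u, rectangular diagonal w)
    u, s, vT = np.linalg.svd(X, full_matrices=True)
    v = vT.T
    w = np.zeros(X.shape)
    np.fill_diagonal(w, s)

    # 2. eigenvalues of X^T X / (N-1); equivalently s^2 / (N-1)
    lam = s**2 / (N - 1)

    # 3. diagonal matrix Lambda of sqrt(lambda)
    Lam = np.diag(np.sqrt(lam))

    # 4. weighted eigenvector matrix A = v Lambda
    A = v @ Lam

    # 5. scores F^T = sqrt(N-1) u  (first few columns of u)
    FT = np.sqrt(N - 1) * u[:, :len(s)]

    # 6. new data wrt v:  Y = u w = X v
    Y = X @ v

    # 7. Davis' R-mode loadings A^R = v w^T
    AR = v @ w.T

    # 8. Davis' R-mode scores S^R = X A^R
    SR = X @ AR

    return u, w, v, lam, Lam, A, FT, Y, AR, SR
```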

From given raw data, I’m going to do all that for the raw data, the centered data, and the standardized data. Chances are only one of those is really appropriate for any given problem, but I don’t know how to decide, so I’m going to use my really cheap computing power to see it all.

That lays out everyone’s models. For Harman and Bartholomew, we’re getting

X = F^T\ A^T (i.e. Z = A F)

for Jolliffe and Malinowski we’re getting

X = Y\ v^T (i.e. X = R C and Y = X v)

and I’ve included Davis’s R-mode scores and loadings.

This is complicated by my doing it three times, but that encourages me to check things along the way. Two of the obvious checks are

  • X = F^T\ A^T
  • X = Y\ v^T\

but we can also check that

  • Y has redistributed the variance
  • F^T is uncorrelated: covariance matrix = identity

but we have to be very careful about the last two when we work with the raw data.

Finally, I will look across the loadings v, A, A^R and across the corresponding scores Y, F^T, S^R.

Gentlemen, start your engines.

Standardized

I standardize the data and call it Xs:

Xs = \left(\begin{array}{lll} 0.0368717 & -1.15632 & -1.09998 \\ -1.24681 & -0.665379 & -0.663951 \\ 0.550632 & -0.100095 & -0.170316 \\ -0.651057 & 0.534548 & 0.493006 \\ 1.31036 & 1.38724 & 1.44124\end{array}\right)
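For reference, here is the standardization in the numpy sketch: subtract each column's mean and divide by its sample standard deviation (ddof=1 gives the N-1 divisor used for the variances above), then hand the result to the sketch from the introduction.

```python
import numpy as np

raw = np.array([
    [ 2.09653,   -0.793484, -7.33899 ],
    [-1.75252,   13.0576,    0.103549],
    [ 3.63702,   29.0064,    8.52945 ],
    [ 0.0338101, 46.912,    19.8517  ],
    [ 5.91502,   70.9696,   36.0372  ],
])

# standardize: center each column and scale by its sample standard deviation
Xs = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

# all the quantities below come from the sketch in the introduction
u, w, vs, lam, Lam, As, Fs_T, Ys, AsR, SsR = pca_fa_quantities(Xs)
```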

Although we saw most of these calculations in the previous post, my focus is a little different this time. Among other things, we will see just how few things need to be computed.

Get the SVD X = u\ w\ v^T\ , labeled by “s”. For now, look only at w…

ws = \left(\begin{array}{lll} 3.11948 & 0. & 0. \\ 0. & 1.50437 & 0. \\ 0. & 0. & 0.0757068 \\ 0. & 0. & 0. \\ 0. & 0. & 0.\end{array}\right)

Get the eigenvalues \lambda of the correlation matrix:

\lambda s = \{2.43279,\ 0.565781,\ 0.00143288\}

If I simply do a pie chart of the eigenvalues, I don't have to explicitly compute percentages.

Form the diagonal matrix of square roots \Lambda\

\Lambda s = \left(\begin{array}{lll} 1.55974 & 0. & 0. \\ 0. & 0.752184 & 0. \\ 0. & 0. & 0.0378534\end{array}\right)

Form the weighted eigenvector matrix A

As = \left(\begin{array}{lll} -0.752315 & 0.658804 & -0.000578824 \\ -0.963827 & -0.265197 & -0.0266011 \\ -0.968424 & -0.247849 & 0.0269245\end{array}\right)

Get F^T as the first three columns of 2u (where 2 = \sqrt{N-1}\ , N=5). This is as close to the u matrix itself as I need to get.

Fs^T = \left(\begin{array}{lll} 0.88458 & 1.06679 & 0.782819 \\ 0.913474 & -0.849064 & 0.380336 \\ -0.0628237 & 0.762691 & -1.56451 \\ -0.206697 & -1.22463 & -0.396946 \\ -1.52853 & 0.244206 & 0.798301\end{array}\right)

Check it by confirming that we have X = F^T\ A^T\ . We do.

We compute the new data Y wrt the v basis (Y = u w = Xv):

Ys = \left(\begin{array}{lll} 1.37971 & 0.802425 & 0.0296324 \\ 1.42478 & -0.638652 & 0.014397 \\ -0.0979886 & 0.573684 & -0.059222 \\ -0.322394 & -0.921145 & -0.0150258 \\ -2.38411 & 0.183688 & 0.0302184\end{array}\right)

And we confirm the product Xs = Ys\ vs^T\ – which amounts to confirming the SVD. (That is, from Y = u w, the SVD becomes X = Y v^T\ .) Not a bad thing to do considering how many u’s and w’s I have floating around.
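In the numpy sketch, those two checks are one-liners, using the names unpacked above:

```python
assert np.allclose(Xs, Fs_T @ As.T)   # X = F^T A^T
assert np.allclose(Xs, Ys @ vs.T)     # X = Y v^T, i.e. the SVD restated
```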

Finally, Davis defines R-mode loadings AR and scores SR:

A^R = v\ w^T

S^R = X\ A^R\ ,

so we can compute them:

As^R = \left(\begin{array}{lllll} -1.50463 & 1.31761 & -0.00115765 & 0. & 0. \\ -1.92765 & -0.530395 & -0.0532022 & 0. & 0. \\ -1.93685 & -0.495698 & 0.0538489 & 0. & 0.\end{array}\right)

Ss^R = \left(\begin{array}{lllll} 4.30399 & 1.20714 & 0.00224337 & 0. & 0. \\ 4.44458 & -0.960768 & 0.00108995 & 0. & 0. \\ -0.305673 & 0.863033 & -0.00448351 & 0. & 0. \\ -1.0057 & -1.38574 & -0.00113755 & 0. & 0. \\ -7.43719 & 0.276335 & 0.00228774 & 0. & 0.\end{array}\right)

OK, he would of course drop the zero columns:

As^R = \left(\begin{array}{lll} -1.50463 & 1.31761 & -0.00115765 \\ -1.92765 & -0.530395 & -0.0532022 \\ -1.93685 & -0.495698 & 0.0538489\end{array}\right)

Ss^R = \left(\begin{array}{lll} 4.30399 & 1.20714 & 0.00224337 \\ 4.44458 & -0.960768 & 0.00108995 \\ -0.305673 & 0.863033 & -0.00448351 \\ -1.0057 & -1.38574 & -0.00113755 \\ -7.43719 & 0.276335 & 0.00228774\end{array}\right)

In case I’ve never pointed it out before, although Davis’ scores S^R are strange, his loadings A^R = v\ w^T are exactly analogous to A = v\ \Lambda\ . The difference is in using w for weights instead of \Lambda\ .
We should display the three definitions of loadings A, v, A^R (and this is when I choose to see v from the SVD computed way back at the beginning):

As = \left(\begin{array}{lll} -0.752315 & 0.658804 & -0.000578824 \\ -0.963827 & -0.265197 & -0.0266011 \\ -0.968424 & -0.247849 & 0.0269245\end{array}\right)

vs = \left(\begin{array}{lll} -0.482334 & 0.875854 & -0.0152912 \\ -0.617941 & -0.35257 & -0.70274 \\ -0.620889 & -0.329506 & 0.711283\end{array}\right)

As^R = \left(\begin{array}{lll} -1.50463 & 1.31761 & -0.00115765 \\ -1.92765 & -0.530395 & -0.0532022 \\ -1.93685 & -0.495698 & 0.0538489\end{array}\right)

Since we have no zeroes, we can display the ratios (no, I wouldn’t usually do this):

As / vs = \left(\begin{array}{lll} 1.55974 & 0.752184 & 0.0378534 \\ 1.55974 & 0.752184 & 0.0378534 \\ 1.55974 & 0.752184 & 0.0378534\end{array}\right)

As^R / vs = \left(\begin{array}{lll} 3.11948 & 1.50437 & 0.0757068 \\ 3.11948 & 1.50437 & 0.0757068 \\ 3.11948 & 1.50437 & 0.0757068\end{array}\right)

As^R / As = \left(\begin{array}{lll} 2. & 2. & 2. \\ 2. & 2. & 2. \\ 2. & 2. & 2.\end{array}\right)

I hope there were no surprises there. Each column of A or A^R is a multiple of the corresponding column of v.

We should display the corresponding scores (“new data”) F^T, Y, S^R (where F^T and Y are new components of X wrt A and v resp., but S^R is new components wrt the reciprocal basis of A.)

Fs^T = \left(\begin{array}{lll} 0.88458 & 1.06679 & 0.782819 \\ 0.913474 & -0.849064 & 0.380336 \\ -0.0628237 & 0.762691 & -1.56451 \\ -0.206697 & -1.22463 & -0.396946 \\ -1.52853 & 0.244206 & 0.798301\end{array}\right)

Ys = \left(\begin{array}{lll} 1.37971 & 0.802425 & 0.0296324 \\ 1.42478 & -0.638652 & 0.014397 \\ -0.0979886 & 0.573684 & -0.059222 \\ -0.322394 & -0.921145 & -0.0150258 \\ -2.38411 & 0.183688 & 0.0302184\end{array}\right)

Ss^R = \left(\begin{array}{lll} 4.30399 & 1.20714 & 0.00224337 \\ 4.44458 & -0.960768 & 0.00108995 \\ -0.305673 & 0.863033 & -0.00448351 \\ -1.0057 & -1.38574 & -0.00113755 \\ -7.43719 & 0.276335 & 0.00228774\end{array}\right)

Since we have no zeroes, we can display the ratios:

Fs^T / Ys = \left(\begin{array}{lll} 0.641133 & 1.32946 & 26.4177 \\ 0.641133 & 1.32946 & 26.4177 \\ 0.641133 & 1.32946 & 26.4177 \\ 0.641133 & 1.32946 & 26.4177 \\ 0.641133 & 1.32946 & 26.4177\end{array}\right)

Ss^R / Ys = \left(\begin{array}{lll} 3.11948 & 1.50437 & 0.0757068 \\ 3.11948 & 1.50437 & 0.0757068 \\ 3.11948 & 1.50437 & 0.0757068 \\ 3.11948 & 1.50437 & 0.0757068 \\ 3.11948 & 1.50437 & 0.0757068\end{array}\right)

Again, one set of scores is a column weighted multiple of any other.

We could confirm that the new data Y are centered and have redistributed variances equal to the eigenvalues: each column of Y does have mean 0, and the variances are

\{2.43279,\ 0.565781,\ 0.00143288\}

and those are indeed the eigenvalues

\lambda = \{2.43279,\ 0.565781,\ 0.00143288\}

And we could confirm that F^T is standardized: each column does in fact have mean 0 and variance 1. In fact, far more is true: F^T is uncorrelated; its covariance matrix is the identity.
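In the numpy sketch these checks are also short; np.cov and var with ddof=1 both use the N-1 divisor.

```python
# columns of Y have the eigenvalues as variances...
assert np.allclose(Ys.var(axis=0, ddof=1), lam)
# ...and F^T has the identity as its covariance matrix
assert np.allclose(np.cov(Fs_T, rowvar=False), np.eye(3))
```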

What about reconstituted data? I would look at w in preference to \lambda\ :

ws = \left(\begin{array}{lll} 3.11948 & 0. & 0. \\ 0. & 1.50437 & 0. \\ 0. & 0. & 0.0757068 \\ 0. & 0. & 0. \\ 0. & 0. & 0.\end{array}\right)

We know by now that setting w_{33} to zero will have a very small effect on the data: the reconstituted matrix would differ from the original by a root sum of squares = \sqrt{w_{33}^2}  = .076\ . So I’m going to set all but the first w equal to zero, just to get reconstituted data that differs nontrivially from the original.

There are many equivalent ways to do this, as we saw last time. What come to mind are

  1. take 1 column of F and one column of A.
  2. set two columns of F and two columns of A to zero.
  3. reduce w to a 1×1 matrix and take skinny forms of u and v.
  4. set two columns of Y to zero.
  5. leave sizes alone, but reset two w’s to zero.

I'll do the last one, (5). And we didn't actually see the second-to-last one, (4), but I'll show it to you after we get the reconstituted data. I zero out two entries in w:

ws0 = \left(\begin{array}{lll} 3.11948 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0\end{array}\right)

and I compute Xs0 = u\ ws0\ v^T\ :

Xs0 = \left(\begin{array}{lll} -0.665482 & -0.852582 & -0.856648 \\ -0.68722 & -0.880431 & -0.884631 \\ 0.0472632 & 0.0605512 & 0.06084 \\ 0.155502 & 0.199221 & 0.200171 \\ 1.14994 & 1.47324 & 1.48027\end{array}\right)
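In the numpy sketch, variant (5) is just a copy of w with two diagonal entries zeroed, then the same triple product:

```python
ws0 = w.copy()
ws0[1, 1] = ws0[2, 2] = 0.0   # keep only the largest singular value
Xs0 = u @ ws0 @ vs.T          # reconstituted data
Xs1 = Xs - Xs0                # residual data
```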

Now recall our initial Xs:

Xs = \left(\begin{array}{lll} 0.0368717 & -1.15632 & -1.09998 \\ -1.24681 & -0.665379 & -0.663951 \\ 0.550632 & -0.100095 & -0.170316 \\ -0.651057 & 0.534548 & 0.493006 \\ 1.31036 & 1.38724 & 1.44124\end{array}\right)

and look at their difference:

Xs1=Xs - Xs0 = \left(\begin{array}{lll} 0.702354 & -0.303734 & -0.243327 \\ -0.559586 & 0.215052 & 0.22068 \\ 0.503369 & -0.160646 & -0.231156 \\ -0.806559 & 0.335327 & 0.292835 \\ 0.160422 & -0.0859985 & -0.0390325\end{array}\right)

Incidentally, we could square each element in Xs1

\left(\begin{array}{lll} 0.493301 & 0.0922546 & 0.0592079 \\ 0.313137 & 0.0462474 & 0.0486997 \\ 0.253381 & 0.0258071 & 0.0534331 \\ 0.650537 & 0.112444 & 0.0857524 \\ 0.0257352 & 0.00739575 & 0.00152354\end{array}\right)

and add up all those squares: the sum is 2.26886 . That, as we’ve seen before, is also the sum of squares of the omitted w’s; that is,

w_{22}^2 + w_{33}^2 = 2.26886\ .
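In the sketch, that check is one line:

```python
# residual sum of squares = sum of squares of the dropped singular values
assert np.isclose((Xs1**2).sum(), w[1, 1]**2 + w[2, 2]**2)   # both about 2.26886
```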

Is it plausible to ask about reconstituted Y? Yes, that's just part of the previous calculation. We had computed Xs0 = u\ ws0\ v^T\ ; the reconstituted Y is Ys0 = u\ ws0\ :

Ys0 = \left(\begin{array}{lll} 1.37971 & 0. & 0. \\ 1.42478 & 0. & 0. \\ -0.0979886 & 0. & 0. \\ -0.322394 & 0. & 0. \\ -2.38411 & 0. & 0.\end{array}\right)

Ah ha! And sure enough, the difference between Ys and Ys0 is extremely simple:

Ys - Ys0 = \left(\begin{array}{lll} 0. & 0.802425 & 0.0296324 \\ 0. & -0.638652 & 0.014397 \\ 0. & 0.573684 & -0.059222 \\ 0. & -0.921145 & -0.0150258 \\ 0. & 0.183688 & 0.0302184\end{array}\right)

So the reconstituted Y was the first column of Ys and the other two columns set to zero. We could have just taken it and post-multiplied by vs^T to get the reconstituted X: this was variant (4) of ways to do the calculation.
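For completeness, variant (4) in the sketch: zero two columns of Y and post-multiply by v^T, which reproduces the same reconstituted X.

```python
Ys0 = Ys.copy()
Ys0[:, 1:] = 0.0                      # keep only the first column of Y
assert np.allclose(Ys0 @ vs.T, Xs0)   # same reconstituted data as before
```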

OK, we've seen how to run thru the calculations for the standardized data. There's a lot of redundancy in all that I've computed, more than I need for any analysis of my own. Still, if I'm checking someone else's work, I should have routinely gotten anything they came up with.

That’s enough for now; next time, centered data and raw data.
