PCA / FA example 9: centered and raw, 3 models

What follows is simple computation, solely to show us exactly what happens. It continues the work of the previous post, which did my default calculations for standardized data. Here I do the same calculations for centered and raw data.

Centered

The raw data is still

\text{raw} = \left(\begin{array}{lll} 2.09653 & -0.793484 & -7.33899 \\ -1.75252 & 13.0576 & 0.103549 \\ 3.63702 & 29.0064 & 8.52945 \\ 0.0338101 & 46.912 & 19.8517 \\ 5.91502 & 70.9696 & 36.0372\end{array}\right)

I will center the data and call it Xc. Get the column means…

\{1.98597,\ 31.8304,\ 11.4366\}

and subtract them from each column to get Xc:

Xc = \left(\begin{array}{lll} 0.110558 & -32.6239 & -18.7756 \\ -3.73849 & -18.7728 & -11.333 \\ 1.65105 & -2.82404 & -2.90714 \\ -1.95216 & 15.0815 & 8.41516 \\ 3.92905 & 39.1392 & 24.6006\end{array}\right)
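If you want to follow along in a Mathematica notebook, here is a minimal sketch of the centering step (the names raw, means, and Xc are just mine):

raw = {{2.09653, -0.793484, -7.33899}, {-1.75252, 13.0576, 0.103549},
   {3.63702, 29.0064, 8.52945}, {0.0338101, 46.912, 19.8517},
   {5.91502, 70.9696, 36.0372}};
means = Mean[raw];         (* column means *)
Xc = # - means & /@ raw;   (* subtract the column means from every row *)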

Proceeding as before, we get the SVD and display w…

wc = \left(\begin{array}{lll} 66.0141 & 0. & 0. \\ 0. & 5.01572 & 0. \\ 0. & 0. & 1.54942 \\ 0. & 0. & 0. \\ 0. & 0. & 0.\end{array}\right)
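Sketched in the notebook, continuing with the Xc above (Mathematica returns the full SVD, so wc is 5×3 with two rows of zeros):

{uc, wc, vc} = SingularValueDecomposition[Xc];
Chop[Xc - uc.wc.Transpose[vc]]   (* confirm Xc == u.w.Transpose[v], up to rounding *)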

Get the eigenvalues \lambda of the covariance matrix (and that’s the only change we make after switching to centered data):

\lambda c = \{1089.47,\ 6.28937,\ 0.600172\}
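In the notebook that is one line (a sketch; lambdac is just my name):

lambdac = Eigenvalues[Covariance[Xc]]
(* {1089.47, 6.28937, 0.600172}; same as Eigenvalues[Transpose[Xc].Xc/(Length[Xc] - 1)] *)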

I’m getting fond of just doing pie charts:

Get the diagonal matrix \Lambda of square roots of those eigenvalues of the covariance matrix…

\Lambda c = \left(\begin{array}{lll} 33.0071 & 0. & 0. \\ 0. & 2.50786 & 0. \\ 0. & 0. & 0.774708\end{array}\right)

Get the weighted eigenvector matrix A

Ac = \left(\begin{array}{lll} -1.67235 & 2.48708 & -0.0914469 \\ -28.2097 & -0.261331 & -0.394039 \\ -17.0553 & 0.188375 & 0.660714\end{array}\right)
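A sketch of those two steps (Lambdac and Ac are my names):

Lambdac = DiagonalMatrix[Sqrt[lambdac]];   (* diagonal matrix of square roots of the eigenvalues *)
Ac = vc.Lambdac                            (* weighted eigenvector matrix A = v Λ *)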

Get the scores F^T as the first three columns of 2u (where 2 = \sqrt{N-1}\ , N=5).

Fc^T = \left(\begin{array}{lll} 1.13849 & 0.83693 & 0.73261 \\ 0.669241 & -1.03777 & 0.41852 \\ 0.116099 & 0.683164 & -1.59786 \\ -0.519249 & -1.14658 & -0.340197 \\ -1.40458 & 0.664253 & 0.786923\end{array}\right)

Confirm that we have X = F^T\ A^T\ . (Yes.)
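Sketched in the notebook:

n = Length[Xc];                        (* N = 5 observations *)
Fct = Sqrt[n - 1] uc[[All, 1 ;; 3]];   (* scores F^T = √(N-1) times the first 3 columns of u *)
Chop[Xc - Fct.Transpose[Ac]]           (* confirm Xc == F^T A^T: all zeros *)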

We compute the new data (Y = Xv = uw)

Yc = \left(\begin{array}{lll} 37.5783 & 2.0989 & 0.567558 \\ 22.0897 & -2.60257 & 0.324231 \\ 3.8321 & 1.71328 & -1.23787 \\ -17.1389 & -2.87546 & -0.263554 \\ -46.3612 & 1.66585 & 0.609635\end{array}\right)

And we confirm the product Xc = Yc\ vc^T\ – which amounts to confirming the SVD. Not a bad thing to do considering how many u’s and w’s I have floating around in my Mathematica® notebook.
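Sketched:

Yc = Xc.vc;                      (* new data Y = X v *)
Chop[Yc - uc.wc]                 (* same thing as u w *)
Chop[Xc - Yc.Transpose[vc]]      (* confirms the SVD: all zeros *)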

Then, on second thought, I’m going to forget about computing Davis’ loadings and scores.

So we should display the two forms of loadings A, v together:

Ac = \left(\begin{array}{lll} -1.67235 & 2.48708 & -0.0914469 \\ -28.2097 & -0.261331 & -0.394039 \\ -17.0553 & 0.188375 & 0.660714\end{array}\right)

vc = \left(\begin{array}{lll} -0.0506666 & 0.991715 & -0.118041 \\ -0.854657 & -0.104205 & -0.508629 \\ -0.516715 & 0.0751137 & 0.852856\end{array}\right)

We should display the corresponding scores F^T, Y:

{Fc}^T = \left(\begin{array}{lll} 1.13849 & 0.83693 & 0.73261 \\ 0.669241 & -1.03777 & 0.41852 \\ 0.116099 & 0.683164 & -1.59786 \\ -0.519249 & -1.14658 & -0.340197 \\ -1.40458 & 0.664253 & 0.786923\end{array}\right)

Yc = \left(\begin{array}{lll} 37.5783 & 2.0989 & 0.567558 \\ 22.0897 & -2.60257 & 0.324231 \\ 3.8321 & 1.71328 & -1.23787 \\ -17.1389 & -2.87546 & -0.263554 \\ -46.3612 & 1.66585 & 0.609635\end{array}\right)

And we could confirm that the new data Y have mean 0 (yes) and redistributed variances equal to the eigenvalues:

\text{variances} = \{1089.47,\ 6.28937,\ 0.600172\}

(yes, the same as the eigenvalues \lambda c\ .)

And we could confirm that F^T is standardized: the means and variances of the columns are 0 and 1 respectively. In fact F^T is uncorrelated: its covariance matrix is the identity.
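Those confirmations are quick in the notebook (a sketch, using the names from the sketches above):

Chop[Mean[Yc]]          (* {0, 0, 0} *)
Variance[Yc]            (* {1089.47, 6.28937, 0.600172}, the eigenvalues lambdac *)
Chop[Mean[Fct]]         (* {0, 0, 0} *)
Variance[Fct]           (* {1., 1., 1.} *)
Chop[Covariance[Fct]]   (* the 3×3 identity *)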

What about reconstituted data? I would look at w in preference to \lambda c\ since it’s the w’s I will be setting to zero.

wc = \left(\begin{array}{lll} 66.0141 & 0. & 0. \\ 0. & 5.01572 & 0. \\ 0. & 0. & 1.54942 \\ 0. & 0. & 0. \\ 0. & 0. & 0.\end{array}\right)

As before, setting w_{33} to zero will have a very small effect on the data: the reconstituted matrix would differ from the original by a root sum of squares of only 1.55 (namely w_{33} itself). So I’m going to set all but the first w equal to zero, just to get reconstituted data that differs nontrivially from the original.

wc0 = \left(\begin{array}{lll} 66.0141 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0\end{array}\right)

The reconstituted data is X0 = u\ w0\ v^T\ , i.e. Xc0 = uc\ wc0\ {vc}^T:

Xc0 = \left(\begin{array}{lll} -1.90396 & -32.1165 & -19.4173 \\ -1.11921 & -18.8791 & -11.4141 \\ -0.194159 & -3.27513 & -1.9801 \\ 0.868368 & 14.6479 & 8.85592 \\ 2.34896 & 39.6229 & 23.9555\end{array}\right)

and the leftover X1 = X - X0 is

Xc1 = Xc - Xc0 = \left(\begin{array}{lll} 2.01452 & -0.507392 & 0.641702 \\ -2.61928 & 0.106287 & 0.0810332 \\ 1.8452 & 0.451085 & -0.927034 \\ -2.82053 & 0.433688 & -0.44076 \\ 1.58009 & -0.483668 & 0.645059\end{array}\right)

I’ll show you again that the difference between X and X0 is the square root of the sum of squares of the w’s we omitted. The squares of the entries in Xc1 are

\left(\begin{array}{lll} 4.05829 & 0.257447 & 0.411781 \\ 6.86065 & 0.0112969 & 0.00656638 \\ 3.40478 & 0.203478 & 0.859393 \\ 7.9554 & 0.188085 & 0.194269 \\ 2.49669 & 0.233934 & 0.416101\end{array}\right)

Their sum is 27.5582 and the square root is 5.24959 . And the squares of w_{22} and w_{33} are 25.1575 and 2.40069, with matching sum 27.5582 .
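Here is a sketch of the reconstitution and that check in the notebook:

wc0 = wc; wc0[[2, 2]] = 0.; wc0[[3, 3]] = 0.;   (* keep only the first singular value *)
Xc0 = uc.wc0.Transpose[vc];                     (* reconstituted data *)
Xc1 = Xc - Xc0;                                 (* the leftover *)
Norm[Xc1, "Frobenius"]                          (* 5.24959 *)
Sqrt[wc[[2, 2]]^2 + wc[[3, 3]]^2]               (* also 5.24959 *)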

We could compute the reconstituted new data Y, but as before, it would just be the first column of Yc.

Raw

I use the raw data and call it Xr.

Xr = \left(\begin{array}{lll} 2.09653 & -0.793484 & -7.33899 \\ -1.75252 & 13.0576 & 0.103549 \\ 3.63702 & 29.0064 & 8.52945 \\ 0.0338101 & 46.912 & 19.8517 \\ 5.91502 & 70.9696 & 36.0372\end{array}\right)

Get the SVD and display w…

wr = \left(\begin{array}{lll} 100. & 0. & 0. \\ 0. & 10. & 0. \\ 0. & 0. & 5. \\ 0. & 0. & 0. \\ 0. & 0. & 0.\end{array}\right)

Neither the correlation matrix nor covariance matrix is appropriate, so we look at the singular values. But we want to honor

w = \sqrt{N-1}\ \Lambda\ ,

so I could compute the \Lambda matrix from the w matrix. Given the sizes of the matrices, I might actually compute

\Lambda^2 = \frac{w^T w}{N-1}

and take the square root. But conceptually it is cleaner to compute the eigenvalues of \frac{X^T X}{N-1}\ , which is exactly the correlation matrix for standardized data and the covariance matrix for centered data. (Let me put that another way: if I did not have the built-in Mathematica® commands Covariance and Correlation, I would be computing \frac{X^T X}{N-1}\ in every case.) Note, however, that for the raw data it is not remotely a covariance matrix. I get

\lambda r = \{2500.,\ 25.,\ 6.25\}

and here’s the pie chart of them:

and then I form the diagonal matrix \Lambda of square roots:

\Lambda r = \left(\begin{array}{lll} 50. & 0. & 0. \\ 0. & 5. & 0. \\ 0. & 0. & 2.5\end{array}\right)
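A sketch of those two routes to \Lambda for the raw data (Xr as above; element-wise Sqrt of the diagonal matrix is fine here):

{ur, wr, vr} = SingularValueDecomposition[Xr];
n = Length[Xr];
lambdar = Eigenvalues[Transpose[Xr].Xr/(n - 1)];   (* {2500., 25., 6.25} *)
Lambdar = DiagonalMatrix[Sqrt[lambdar]];
Chop[Lambdar - Sqrt[Transpose[wr].wr/(n - 1)]]     (* confirms Λ² = w^T w / (N-1) *)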

Get the weighted eigenvector matrix A = v\ \Lambda

Ar = \left(\begin{array}{lll} -2.77207 & 0.0869692 & 2.49578 \\ -45.3666 & 2.08223 & -0.144112 \\ -20.8372 & -4.54497 & -0.0182659\end{array}\right)

Get the scores F^T as the first three columns of 2u (where 2 = \sqrt{N-1}\ , N=5).

{Fr}^T = \left(\begin{array}{lll} 0.0732441 & 1.27542 & 0.87694 \\ -0.235872 & 1.06264 & -1.00121 \\ -0.601493 & 0.877928 & 0.758596 \\ -1.01679 & 0.298354 & -1.12621 \\ -1.59478 & -0.619934 & 0.620283\end{array}\right)

Check by confirming that X = F^T\ A^T\ for Xr, Fr, Ar. (Yes.)

We compute the new data (Y = u w):

Yr = \left(\begin{array}{lll} 3.66221 & 6.37712 & 2.19235 \\ -11.7936 & 5.31319 & -2.50302 \\ -30.0746 & 4.38964 & 1.89649 \\ -50.8397 & 1.49177 & -2.81552 \\ -79.7392 & -3.09967 & 1.55071\end{array}\right)

And we confirm the product X = Y\ v^T\ – which amounts to confirming the SVD. Not a bad thing to do considering how many u’s and w’s I have floating around.

Now let’s look at the new data Y. Here are its means…

\{-33.757,\ 2.89441,\ 0.064203\}

and its variances…

\{1075.58,\ 14.528,\ 6.24485\}

The sum of the variances is 1096.36, and that is, in fact, the sum of the variances of the raw data. We have redistributed the variance.
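Sketched (Yr = ur.wr from the SVD of the raw data):

Yr = ur.wr;
Mean[Yr]                (* {-33.757, 2.89441, 0.064203}: not zero *)
Variance[Yr]            (* {1075.58, 14.528, 6.24485} *)
Total[Variance[Yr]]     (* 1096.36 *)
Total[Variance[Xr]]     (* also 1096.36: the total variance is preserved *)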

But.

Recall the eigenvalues:

\lambda r = \{2500.,\ 25.,\ 6.25\}

The first “but” is that the redistributed variances have nothing to do with the eigenvalues. Hey, Xr is not centered, so \frac{X^T X}{N-1}\ has nothing to do with the variance of the raw data.

The second “but” is that the redistributed variances of the centered data were

\{1089.47,\ 6.28937,\ 0.600172\}.

and the first one is larger than what I got for Yr. That’s actually good, since the first variance of Yc was supposed to be the largest I could get. The redistribution of variance of the raw data is not maximal. It’s not that far off, but it’s not maximal.

This has implications for regression analysis. That is, if I want to redistribute the variance of a bunch of variables for regression, to do it properly I have to center the data, which is not something I like doing under the circumstances. The uncentered data seems more appropriate.

The change to Y is far more significant than the change to F^T. In this case, it is still true that

\frac{F\ F^T}{N-1} = I\ ,

but it is no longer appropriate to say that the F^T are uncorrelated. In fact, the means are nonzero

\{-0.67514,\ 0.578882,\ 0.0256812\}

and the variances are not 1:

\{0.430233,\ 0.581119,\ 0.999176\}

The matrix F^T is almost orthonormal, except that instead of

F\ F^T = I

we have

F\ F^T = \left(N-1\right)\ I
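Here is a sketch of those checks on the raw scores (Frt is just my name for the Fr^T matrix; n = 5 as before):

Frt = Sqrt[n - 1] ur[[All, 1 ;; 3]];   (* raw scores F^T *)
Mean[Frt]                              (* {-0.67514, 0.578882, 0.0256812}: nonzero *)
Variance[Frt]                          (* {0.430233, 0.581119, 0.999176}: not all 1 *)
Chop[Transpose[Frt].Frt]               (* F F^T = (N-1) I = 4 I *)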

What about reconstituted data? I would look at w in preference to \lambda r\ :

wr = \left(\begin{array}{lll} 100. & 0. & 0. \\ 0. & 10. & 0. \\ 0. & 0. & 5. \\ 0. & 0. & 0. \\ 0. & 0. & 0.\end{array}\right)

As before, setting w_{33} to zero will have a very small effect on the data: the reconstituted matrix would differ from the original by a root sum of squares of just 5 (a total sum of squares of 25). So I’m going to set all but the first w equal to zero, just to get reconstituted data that differs nontrivially from the original.

Here’s w0:

wr0 = \left(\begin{array}{lll} 100. & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right)

Here’s the reconstituted data:

Xr0 = \left(\begin{array}{lll} -0.203038 & -3.32283 & -1.52621 \\ 0.653854 & 10.7007 & 4.91492 \\ 1.66738 & 27.2877 & 12.5335 \\ 2.81862 & 46.1284 & 21.1872 \\ 4.42085 & 72.3499 & 33.2309\end{array}\right)

It is still true that the raw data X and the reconstituted data X0 differ by the square root of the sum of squares of the two omitted w’s.
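Sketched:

wr0 = wr; wr0[[2, 2]] = 0.; wr0[[3, 3]] = 0.;   (* keep only the first singular value *)
Xr0 = ur.wr0.Transpose[vr];                     (* reconstituted raw data *)
Norm[Xr - Xr0, "Frobenius"]                     (* 11.1803 = Sqrt[10^2 + 5^2] *)
Sqrt[wr[[2, 2]]^2 + wr[[3, 3]]^2]               (* the same *)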

Finally, if we were to compute the new components of the reconstituted data we would get the first column of the Yr matrix.
