I want to look at reconstituting the data. Equivalently, I want to look at setting successive singular values to zero.

This example was actually built on the previous one. Before I set the row sums to 1, I had started with

$t1 = \left(\begin{array}{lll} 1 & 1 & -3 \\ -1 & 2 & -2 \\ 1 & 3 & -1 \\ -1 & 4 & 1 \\ 1 & 5 & 4\end{array}\right)$

I’m going to continue with Harman’s and Bartholomew’s model: Z = A F, where Z = X^T, X is the standardized data, and A is an eigenvector matrix whose columns are weighted by the square roots of the eigenvalues of the correlation matrix of X.

I want data with one eigenvalue so large that we could sensibly retain only that one. Let me show you how I got that.

Get the SVD (Singular Value Decomposition) of t1… and look at w:

$w = \left(\begin{array}{lll} 7.84944 & 0. & 0. \\ 0. & 4.9565 & 0. \\ 0. & 0. & 2.19533 \\ 0. & 0. & 0. \\ 0. & 0. & 0.\end{array}\right)$

OK, keep u and v – just because they’re handy – but redefine w, and compute a new data matrix using this w:

$w = \left(\begin{array}{lll} 100 & 0. & 0. \\ 0. & 10 & 0. \\ 0. & 0. & 5 \\ 0. & 0. & 0. \\ 0. & 0. & 0.\end{array}\right)$

Let $t2 = u\ w\ v^T$:

$t2 = \left(\begin{array}{lll} 2.09653 & -0.793484 & -7.33899 \\ -1.75252 & 13.0576 & 0.103549 \\ 3.63702 & 29.0064 & 8.52945 \\ 0.0338101 & 46.912 & 19.8517 \\ 5.91502 & 70.9696 & 36.0372\end{array}\right)$
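The step of keeping u and v while swapping in hand-picked singular values can be sketched in a few lines. This is a minimal NumPy sketch, not the code used for the post; the names `s_new` and `vt` are my own, and NumPy may choose different signs for the columns of u and v than the ones shown here (the product u w v^T is unaffected by that).

```python
import numpy as np

t1 = np.array([[ 1, 1, -3],
               [-1, 2, -2],
               [ 1, 3, -1],
               [-1, 4,  1],
               [ 1, 5,  4]], dtype=float)

# Thin SVD: u is 5x3, s holds the 3 singular values, vt is v transposed.
u, s, vt = np.linalg.svd(t1, full_matrices=False)

# Keep u and v, but replace the singular values with ones chosen by hand.
s_new = np.array([100.0, 10.0, 5.0])
t2 = u @ np.diag(s_new) @ vt

print(np.round(s, 5))                                    # singular values of t1
print(np.round(np.linalg.svd(t2, compute_uv=False), 5))  # singular values of t2
```

Because u and v are orthogonal, the singular values of the rebuilt matrix are exactly the ones that were put in.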

Standardize t2 to get data X (“the data”):

$X = \left(\begin{array}{lll} 0.0368717 & -1.15632 & -1.09998 \\ -1.24681 & -0.665379 & -0.663951 \\ 0.550632 & -0.100095 & -0.170316 \\ -0.651057 & 0.534548 & 0.493006 \\ 1.31036 & 1.38724 & 1.44124\end{array}\right)$
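For readers following along numerically, here is a minimal sketch of the standardization, assuming it means subtracting each column mean and dividing by the sample standard deviation (N - 1 divisor), which is what reproduces the X above. The helper name `standardize` is hypothetical.

```python
import numpy as np

t2 = np.array([[ 2.09653,   -0.793484, -7.33899 ],
               [-1.75252,   13.0576,    0.103549],
               [ 3.63702,   29.0064,    8.52945 ],
               [ 0.0338101, 46.912,    19.8517  ],
               [ 5.91502,   70.9696,   36.0372  ]])

def standardize(data):
    """Center each column and scale it to unit sample variance (N - 1 divisor)."""
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

X = standardize(t2)
print(np.round(X, 4))
```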

Get the SVD of X, $X = u\ w\ v^T$, but look at w… (we’ll see u and v later)

$w = \left(\begin{array}{lll} 3.11948 & 0. & 0. \\ 0. & 1.50437 & 0. \\ 0. & 0. & 0.0757068 \\ 0. & 0. & 0. \\ 0. & 0. & 0.\end{array}\right)$

Get the eigenvalues of the correlation matrix and look at the percentages:

$\lambda = \{2.43279, 0.565781, 0.00143288\}$

$\text{percent} = \{81.0929, 18.8594, 0.0477627\}$

$\text{cumulative percent} = \{81.0929, 99.9522, 100.\}$

So our first eigenvalue is 81% of the sum, and the second, at 19%, is nearly all the rest.
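The eigenvalues and percentages can be checked with a short NumPy sketch. It also shows the link to the singular values: for standardized data with N rows, each eigenvalue of the correlation matrix is the squared singular value divided by N - 1. The variable names are my own, and X is typed in from the rounded values above, so agreement is only to about five decimals.

```python
import numpy as np

X = np.array([[ 0.0368717, -1.15632,  -1.09998 ],
              [-1.24681,   -0.665379, -0.663951],
              [ 0.550632,  -0.100095, -0.170316],
              [-0.651057,   0.534548,  0.493006],
              [ 1.31036,    1.38724,   1.44124 ]])
N = X.shape[0]

# Correlation matrix of the columns, and its eigenvalues, largest first.
R = np.corrcoef(X, rowvar=False)
eigvals = np.linalg.eigvalsh(R)[::-1]

percent = 100 * eigvals / eigvals.sum()
cumulative = np.cumsum(percent)

# Same numbers from the singular values of X.
s = np.linalg.svd(X, compute_uv=False)
from_svd = s**2 / (N - 1)

print(eigvals, percent, cumulative, from_svd, sep="\n")
```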

Get the diagonal matrix of square roots of the eigenvalues, $\Lambda$:

$\Lambda = \left(\begin{array}{lll} 1.55974 & 0. & 0. \\ 0. & 0.752184 & 0. \\ 0. & 0. & 0.0378534\end{array}\right)$

Get $A = v\ \Lambda$ and the scores $F^T = 2\ u$ (the 2 is $\sqrt{N-1}$, with N = 5 observations), except that I never want more than 3 columns. (The first 3 columns of 2 u are the components of the new data wrt the A basis.)

$A = \left(\begin{array}{lll} -0.752315 & 0.658804 & -0.000578824 \\ -0.963827 & -0.265197 & -0.0266011 \\ -0.968424 & -0.247849 & 0.0269245\end{array}\right)$

$F^T = 2 u = \left(\begin{array}{lll} 0.88458 & 1.06679 & 0.782819 \\ 0.913474 & -0.849064 & 0.380336 \\ -0.0628237 & 0.762691 & -1.56451 \\ -0.206697 & -1.22463 & -0.396946 \\ -1.52853 & 0.244206 & 0.798301\end{array}\right)$

Check by confirming that $F^T\ A^T = X$, i.e. that we have factored the data matrix.
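That check is easy to reproduce. The sketch below assumes A is v times the diagonal matrix of square-rooted eigenvalues and that the scores are the first three columns of u scaled by the square root of N - 1, which is consistent with the numbers shown above. The names `Lam` and `Ft` are my own, and NumPy may flip the sign of a column of A together with the matching column of the scores, which leaves the product unchanged.

```python
import numpy as np

X = np.array([[ 0.0368717, -1.15632,  -1.09998 ],
              [-1.24681,   -0.665379, -0.663951],
              [ 0.550632,  -0.100095, -0.170316],
              [-0.651057,   0.534548,  0.493006],
              [ 1.31036,    1.38724,   1.44124 ]])
N = X.shape[0]

# Thin SVD already limits u to 3 columns.
u, s, vt = np.linalg.svd(X, full_matrices=False)

Lam = np.diag(s / np.sqrt(N - 1))   # square roots of the eigenvalues
A = vt.T @ Lam                      # loadings: weighted eigenvectors
Ft = np.sqrt(N - 1) * u             # scores, one column per factor

print(np.allclose(Ft @ A.T, X))     # the data matrix has been factored
```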

Let’s quickly compute the reconstituted data from 1 and 2 singular values. I could use square forms of w (w1 and w2) with u and v cut down to u1 and v1 (or u2, v2) to be conformable. To be specific,

$w1 = \left(\begin{array}{l} 3.11948\end{array}\right)$

and compute $X1 = u1\ w1\ v1^T$:

$X1 = \left(\begin{array}{lll} -0.665482 & -0.852582 & -0.856648 \\ -0.68722 & -0.880431 & -0.884631 \\ 0.0472632 & 0.0605512 & 0.06084 \\ 0.155502 & 0.199221 & 0.200171 \\ 1.14994 & 1.47324 & 1.48027\end{array}\right)$

or take

$w2 = \left(\begin{array}{ll} 3.11948 & 0. \\ 0. & 1.50437\end{array}\right)$

and then $X2 = u2\ w2\ v2^T$:

$X2 = \left(\begin{array}{lll} 0.0373248 & -1.13549 & -1.12105 \\ -1.24659 & -0.655262 & -0.674191 \\ 0.549727 & -0.141712 & -0.128192 \\ -0.651287 & 0.523988 & 0.503694 \\ 1.31082 & 1.40848 & 1.41974\end{array}\right)$

You might note that my “1” and “2” indicate how many singular values I retained.
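Here is a minimal sketch of the cut-down computation, assuming NumPy; the helper `reconstitute` is a hypothetical name, and k is the number of singular values retained.

```python
import numpy as np

X = np.array([[ 0.0368717, -1.15632,  -1.09998 ],
              [-1.24681,   -0.665379, -0.663951],
              [ 0.550632,  -0.100095, -0.170316],
              [-0.651057,   0.534548,  0.493006],
              [ 1.31036,    1.38724,   1.44124 ]])

u, s, vt = np.linalg.svd(X, full_matrices=False)

def reconstitute(u, s, vt, k):
    """Rebuild the data from the first k singular values, with u and v cut down to match."""
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

X1 = reconstitute(u, s, vt, 1)
X2 = reconstitute(u, s, vt, 2)
print(np.round(X1, 4))
print(np.round(X2, 4))
```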

But what I was saying about leaving u and v untouched is that I can keep u and v full-size

$u = \left(\begin{array}{lllll} 0.44229 & 0.533396 & 0.391409 & 0.605413 & -0.0119286 \\ 0.456737 & -0.424532 & 0.190168 & -0.0677073 & 0.755259 \\ -0.0314119 & 0.381346 & -0.782255 & 0.201541 & 0.448384 \\ -0.103349 & -0.612313 & -0.198473 & 0.740037 & -0.165366 \\ -0.764266 & 0.122103 & 0.39915 & 0.201541 & 0.448384\end{array}\right)$

$v = \left(\begin{array}{lll} -0.482334 & 0.875854 & -0.0152912 \\ -0.617941 & -0.35257 & -0.70274 \\ -0.620889 & -0.329506 & 0.711283\end{array}\right)$

and use the following matrices for w, all the same size as X. The full-size w is:

$w = \left(\begin{array}{lll} 3.11948 & 0. & 0. \\ 0. & 1.50437 & 0. \\ 0. & 0. & 0.0757068 \\ 0. & 0. & 0. \\ 0. & 0. & 0.\end{array}\right)$

and the two others are:

$w2 = \left(\begin{array}{lll} 3.11948 & 0 & 0 \\ 0 & 1.50437 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0\end{array}\right)$

$w1 = \left(\begin{array}{lll} 3.11948 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0\end{array}\right)$

It’s easy enough to reconstitute the data that way,

$X1 = u\ w1\ v^T$

$X2 = u\ w2\ v^T$

and we do get the same answers for X1 and X2.
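A small sketch of that equivalence, assuming NumPy: the full-size u (5x5) and v (3x3) are used with a zero-padded w the same shape as X, and the result is compared with the cut-down version. The helper `padded_w` is a hypothetical name.

```python
import numpy as np

X = np.array([[ 0.0368717, -1.15632,  -1.09998 ],
              [-1.24681,   -0.665379, -0.663951],
              [ 0.550632,  -0.100095, -0.170316],
              [-0.651057,   0.534548,  0.493006],
              [ 1.31036,    1.38724,   1.44124 ]])

# full_matrices=True returns the square 5x5 u and 3x3 v^T.
u, s, vt = np.linalg.svd(X, full_matrices=True)

def padded_w(s, shape, k):
    """w with the same shape as X, keeping only the first k singular values."""
    w = np.zeros(shape)
    w[:k, :k] = np.diag(s[:k])
    return w

for k in (1, 2):
    full_size = u @ padded_w(s, X.shape, k) @ vt
    cut_down = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]
    print(k, np.allclose(full_size, cut_down))
```

The extra columns of u and the zeroed entries of w only ever multiply zeros, which is why the two routes agree.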

Now, what if I take just the first two columns of F^T, and the first two columns of A (i.e. the first two rows of A^T)? (We saw this last time, too.) That is, I cut F^T down to:

$F^T = \left(\begin{array}{ll} 0.88458 & 1.06679 \\ 0.913474 & -0.849064 \\ -0.0628237 & 0.762691 \\ -0.206697 & -1.22463 \\ -1.52853 & 0.244206\end{array}\right)$

and I cut A^T down to

$A^T = \left(\begin{array}{lll} -0.752315 & -0.963827 & -0.968424 \\ 0.658804 & -0.265197 & -0.247849\end{array}\right)$

Do we get X2? Yes.

What if I take just the first column of F^T, and the first row of A^T? That is, I take F^T to be:

$F^T = \left(\begin{array}{l} 0.88458 \\ 0.913474 \\ -0.0628237 \\ -0.206697 \\ -1.52853\end{array}\right)$

and A^T to be

$A^T = \left(\begin{array}{lll} -0.752315 & -0.963827 & -0.968424\end{array}\right)$

Is their product equal to X1? Yes.

What we just saw is obvious in retrospect, but worth stating explicitly. When we throw away small singular values or eigenvalues, we do not change the scores and loadings. What we change, instead, is the number of scores and loadings used.

When we throw away a nonzero singular value or eigenvalue, we are throwing away a scores-and-loadings pair. We don’t change any of the other scores and loadings.

We have 3 pairs of “scores & loadings”, and then for each pair we compute a product. The individual pairs, and the 3 products, are not affected by our decision to “drop a factor”. Instead of adding all three products, we may choose to add only the first two products, or to keep only the first product.
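The "pairs and products" picture can be made concrete: each product is the outer product of one column of scores with the matching column of loadings, and the reconstituted data is a partial sum of those products. This is a minimal NumPy sketch under the same assumptions as before (scores are columns of u scaled by the square root of N - 1, loadings are v times the square-rooted eigenvalues); the name `products` is my own.

```python
import numpy as np

X = np.array([[ 0.0368717, -1.15632,  -1.09998 ],
              [-1.24681,   -0.665379, -0.663951],
              [ 0.550632,  -0.100095, -0.170316],
              [-0.651057,   0.534548,  0.493006],
              [ 1.31036,    1.38724,   1.44124 ]])
N = X.shape[0]

u, s, vt = np.linalg.svd(X, full_matrices=False)
Ft = np.sqrt(N - 1) * u            # scores, one column per factor
A = vt.T * (s / np.sqrt(N - 1))    # loadings, one column per factor

# One 5x3 product per scores-and-loadings pair; none depends on the others.
products = [np.outer(Ft[:, i], A[:, i]) for i in range(3)]

X1 = products[0]                   # keep one pair
X2 = products[0] + products[1]     # keep two pairs
X3 = sum(products)                 # keep all three: the original data

print(np.allclose(X3, X))
```

Dropping a factor only changes which products are added; the pairs themselves are computed once and never touched.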

It is probably customary to say that we are retaining 1 or 2 factors when we retain 1 or 2 scores-loadings pairs.

That “the scores” are selected columns of $\sqrt{N-1}\ u$ tells us that the individual scores & loadings cannot change no matter how many factors we keep.

Throwing away – choosing not to use – a nonzero scores-loadings pair does affect the reconstituted data. X2 is close to X, but not the same, and X1 is quite different.

$X2 - X = \left(\begin{array}{lll} 0.000453114 & 0.0208238 & -0.021077 \\ 0.000220147 & 0.0101173 & -0.0102403 \\ -0.000905575 & -0.0416176 & 0.0421236 \\ -0.000229762 & -0.0105592 & 0.0106876 \\ 0.000462075 & 0.0212357 & -0.0214938\end{array}\right)$

$X1 - X = \left(\begin{array}{lll} -0.702354 & 0.303734 & 0.243327 \\ 0.559586 & -0.215052 & -0.22068 \\ -0.503369 & 0.160646 & 0.231156 \\ 0.806559 & -0.335327 & -0.292835 \\ -0.160422 & 0.0859985 & 0.0390325\end{array}\right)$
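One way to put a single number on each difference is its Frobenius norm (the square root of the sum of squared entries). A standard property of the SVD is that this norm equals the square root of the sum of squares of the discarded singular values. The sketch below, assuming NumPy and my own variable names, checks that for both truncations.

```python
import numpy as np

X = np.array([[ 0.0368717, -1.15632,  -1.09998 ],
              [-1.24681,   -0.665379, -0.663951],
              [ 0.550632,  -0.100095, -0.170316],
              [-0.651057,   0.534548,  0.493006],
              [ 1.31036,    1.38724,   1.44124 ]])

u, s, vt = np.linalg.svd(X, full_matrices=False)

residual_norm = {}
discarded_norm = {}
for k in (1, 2):
    Xk = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]
    residual_norm[k] = np.linalg.norm(Xk - X)         # Frobenius norm of Xk - X
    discarded_norm[k] = np.sqrt(np.sum(s[k:] ** 2))   # from the dropped singular values
    print(k, residual_norm[k], discarded_norm[k])
```

So the size of X2 - X is just the third singular value, while X1 - X is dominated by the second.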

To look at that another way, the means of X2 are zero, and the variances of X2 (2 factors retained) are:

$\{1., 0.999292, 0.999275\}$

“Not far from 1” is an understatement.

The means of X1 are also zero, but the variances of X1 (1 factor retained) are:

$\{0.565977, 0.928963, 0.937846\}$

You might note that the X1 data is still centered, but it is no longer standardized. That was lost when we threw away a fairly large singular value.
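Those variances can be read straight off the loadings: with k factors retained, the variance of each reconstituted variable is the sum of the squares of its first k loadings (its row of A). Here is a minimal NumPy sketch of that check, under the same assumptions about A as above; the names `var_from_data` and `var_from_loadings` are my own.

```python
import numpy as np

X = np.array([[ 0.0368717, -1.15632,  -1.09998 ],
              [-1.24681,   -0.665379, -0.663951],
              [ 0.550632,  -0.100095, -0.170316],
              [-0.651057,   0.534548,  0.493006],
              [ 1.31036,    1.38724,   1.44124 ]])
N = X.shape[0]

u, s, vt = np.linalg.svd(X, full_matrices=False)
A = vt.T * (s / np.sqrt(N - 1))    # loadings, one column per factor

var_from_data = {}
var_from_loadings = {}
col_means = {}
for k in (1, 2):
    Xk = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]
    col_means[k] = Xk.mean(axis=0)
    var_from_data[k] = Xk.var(axis=0, ddof=1)             # sample variances of Xk
    var_from_loadings[k] = (A[:, :k] ** 2).sum(axis=1)    # squared loadings, summed per variable
    print(k, np.round(var_from_data[k], 6), np.round(var_from_loadings[k], 6))
```

The first variable loads heavily on the second factor (0.658804), which is why its variance drops the most when only one factor is kept.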

Next, I think I will show you what I reckon I would do, today, for PCA / FA.