Malinowski’s work is considerably different from everything else we’ve seen before.
First of all, he expects that in most cases one will neither standardize nor even center the data X. We can do his computations as an SVD of X, or as an eigendecomposition of $X^T X$ or of $X X^T$; but because the data isn’t even centered, $X^T X$ and $X X^T$ are not remotely covariance matrices. For this reason, I assert that preprocessing is a separate issue.
Nevertheless, the underlying mathematics is the same: get either an SVD of X or an eigendecomposition of $X^T X$ and/or of $X X^T$. But what do we do to X first? That’s a separate question.
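To make “the underlying mathematics is the same” concrete, here is a minimal NumPy sketch. The data is random and hypothetical, and it is deliberately left uncentered. It checks that the eigenvalues of $X^T X$ are the squares of the singular values of X, that its eigenvectors match the right singular vectors v up to sign, and that $X X^T$ has the same nonzero eigenvalues.

```python
import numpy as np

# Hypothetical uncentered data: 6 observations (rows) of 3 variables (columns).
# Nothing is subtracted or scaled; X is used exactly as given.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=1.0, size=(6, 3))

# Route 1: SVD of X itself. w holds the singular values (descending).
u, w, vt = np.linalg.svd(X, full_matrices=False)

# Route 2: eigendecomposition of X^T X (3x3). It is symmetric, so use eigh,
# which returns eigenvalues in ascending order; flip them to descending.
evals, evecs = np.linalg.eigh(X.T @ X)
evals, evecs = evals[::-1], evecs[:, ::-1]

# Eigenvalues of X^T X are the squares of the singular values of X ...
print(np.allclose(evals, w**2))
# ... and its eigenvectors match the columns of v (rows of vt) up to sign.
print(np.allclose(np.abs(evecs), np.abs(vt.T)))

# Route 3: X X^T (6x6) has the same nonzero eigenvalues; the other three are ~0.
evals_r = np.linalg.eigvalsh(X @ X.T)[::-1]
print(np.allclose(evals_r[:3], w**2), np.allclose(evals_r[3:], 0.0, atol=1e-8))
```

Note that `np.linalg.svd` returns w as a 1-D array of singular values rather than as a diagonal matrix, and that `eigh` sorts eigenvalues in ascending order, which is why they are flipped.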
Second, he is not primarily interested in eliminating “small” eigenvalues or singular values: he is interested in eliminating “experimental error”, i.e. “noise”. Although I worked thru his chapter 4 on estimating noise, I have not discussed it: his main interest there is in deciding when x and $\hat{x}$ are “close enough”, and without a real-world application, and without a more rigorous treatment, I’d rather pass for now. I’ll come back to his error stuff, however, if I ever find myself looking at error estimation in PCA / FA anywhere else.
In addition to omitting chapter 4, I stopped after chapter 5. What he has in common with Harman and Jolliffe is a lot of references to the literature, and I didn’t see anything after chapter 5 whose computations I could confirm.
Third, he doesn’t much use the usual vocabulary of scores and loadings, although he does use the subscript “load” for his in contrast to the subscript “basic” for his matrix X of successful test vectors x. (I sometimes think that he uses subscripts for emphasis rather than to distinguish entities, but I’m probably exaggerating.) In any case, I decided that the customary vocabulary was secondary to the mathematics: find the eigenvector matrix or matrices.
Fourth, he has no graphical techniques; he provides none of the graphs we came to expect from Harman, Jolliffe, and Davis. Such graphs do, in fact, have a place in chemistry: Brereton’s “Chemometrics” (which I have not yet discussed) has a few. We will not do much with that book, though: its online data is available only to owners of the book, so I can compute to my heart’s content, but I can’t very well publish the data, and you can’t very well follow along without it. Still, I will see if there’s anything I need to say about it.
Fifth, his notion of target testing (including using it to fill in missing values) is a whole new world, a brave new world, and I like it. I think I did it more simply than he did, but I was just cleaning up the math.
I do wonder. Can target testing be used in PCA “rotations”, i.e. change of basis? For handling multicollinearity in OLS? To tell us we should have centered the data? To tell us to subtract a constant? I don’t know yet. I’ll keep my eyes open.
I learned a lot from Malinowski about using the available tools. Davis taught me to use the SVD; Malinowski got me comfortable with using both the full and the cut-down SVDs, not to mention using the u and v bases and constructing the hat matrix.
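Here is a minimal sketch of the u basis and the hat matrix at work. It assumes, as a simplification and not as Malinowski’s own procedure, that a target test amounts to projecting a test vector x onto the span of the k retained left singular vectors and looking at the residual $x - \hat{x}$. The matrix, the choice k = 2, and the function `target_test` are all hypothetical illustrations.

```python
import numpy as np

# Hypothetical data matrix of exact rank 2 (8 rows, 4 columns),
# built from two made-up "factors" so we know the answer in advance.
rng = np.random.default_rng(1)
D = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 4))

# Cut-down SVD; keep only the k = 2 leading left singular vectors.
k = 2
u, w, vt = np.linalg.svd(D, full_matrices=False)
u1 = u[:, :k]

# Hat matrix: the orthogonal projector onto the space spanned by u1.
H = u1 @ u1.T

def target_test(x):
    """Hypothetical helper: project x onto span(u1) and return the
    prediction xhat and the Euclidean length of the residual x - xhat."""
    xhat = H @ x
    return xhat, np.linalg.norm(x - xhat)

# A combination of D's columns lies in the space, so it should pass ...
x_good = D @ np.array([1.0, -2.0, 0.5, 3.0])
# ... while an arbitrary random vector generally should not.
x_bad = rng.normal(size=8)

_, r_good = target_test(x_good)
_, r_bad = target_test(x_bad)
print(r_good, r_bad)
```

With noisy data the residual of a good target is small rather than zero, and deciding how small is small enough is the error-estimation question from his chapter 4.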
OTOH, there is at least one thing we did along with Malinowski that the other authors did not do. They might have, but they did not: reconstitute the data matrix using the reduced set of eigenvalues or singular values.
The classical techniques generally just list the eigenvectors v1, possibly weighted by the w0 (equivalently, by the square roots of the eigenvalues). In principle, they could have computed D1; in practice, they usually got no closer than describing the new correlation matrix or variances.
When we start with the SVD
$$D = u\,w\,v^T$$
and replace the smallest singular value in w by 0 (and call the new matrix w0), we can reconstitute the data D as
$$D_1 = u\,w_0\,v^T.$$
(“Reconstitute” is intended to convey the impression that D1 is not the real data: it’s reconstituted orange juice, not fresh.)
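Here is a minimal NumPy sketch of the reconstitution $D_1 = u\,w_0\,v^T$. The example-5 data isn’t reproduced here, so the matrix below is a hypothetical stand-in: an exactly rank-2 matrix plus a little random noise, which makes it technically rank 3.

```python
import numpy as np

# Hypothetical stand-in for a noisy data matrix: rank 2 plus small noise.
rng = np.random.default_rng(2)
D = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(5, 3))

# Cut-down SVD: D = u @ diag(w) @ vt, singular values in descending order.
u, w, vt = np.linalg.svd(D, full_matrices=False)

# w0: a copy of w with the smallest singular value replaced by 0.
w0 = w.copy()
w0[-1] = 0.0

# Reconstituted data D1 = u w0 v^T.
D1 = u @ np.diag(w0) @ vt

# An explicit tolerance keeps round-off in D1 from being counted as rank.
print(np.linalg.matrix_rank(D, tol=1e-10), np.linalg.matrix_rank(D1, tol=1e-10))
```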
I want to show you the difference between D and D1 in more detail; I’ve been a little vague about it.
Recall example 5 with noise. Here’s the data matrix:
The singular values were
We interpret 3 nonzero singular values to mean that the matrix D is technically of rank 3; we can go further, however, and say that D differs from the nearest matrix of rank 2 by its smallest singular value, namely 0.238533.
(I could go so far as to say the matrix D is of rank 2.238533, but perhaps it’s better to leave it at “differs from a matrix of rank 2 by 0.238533.” After all, if the smallest singular value were 10, the matrix would differ from a matrix of rank 2 by 10, and I don’t want to say that it’s of rank 12 = 2+10.)
We replaced the smallest singular value (0.238533) by 0, and reconstituted the data, calling it D1.
What is the difference between D and D1? D1 is of rank 2, and the difference is supposed to be 0.238533.
Just how do we compute that difference? That’s the question.
The appropriate “norm” – appropriate because it gives this answer – is called the Frobenius norm, and it’s pretty simple: pretend the matrix is a vector, and compute its Euclidean norm (2-norm), namely the square root of the sum of squares.
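A quick check of “pretend the matrix is a vector” in NumPy, on a made-up 2x3 matrix whose sum of squares is 16.5: the square root of the sum of squares, the 2-norm of the flattened array, and NumPy’s built-in `'fro'` norm all agree.

```python
import numpy as np

# A made-up matrix, for illustration only.
A = np.array([[1.0, -2.0, 0.5],
              [3.0,  0.0, -1.5]])

# Frobenius norm three ways.
by_definition = np.sqrt(np.sum(A**2))    # square root of the sum of squares
as_vector = np.linalg.norm(A.ravel())    # pretend the matrix is a vector
builtin = np.linalg.norm(A, 'fro')       # NumPy's Frobenius norm
print(by_definition, as_vector, builtin)
```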
Here we are. The element-by-element differences between D and D1 are:
Now take the square root of the sum of the squares of the “components”.
In case you’re working thru this with me, the squares of those numbers are
and the sum of them is 0.0568979, and the square root of that is 0.238533.
That number should seem familiar: it’s exactly the singular value that we set to zero; it’s exactly what it ought to be.
By computing the difference e1 = D - D1, and then the square root of its sum of squares, I explicitly computed the difference between D and D1 and confirmed that it equals the smallest singular value of D. In practice, of course, there is no need to compute the sum of squares, because we already know it: it is the square of the singular value we set to zero.
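The same explicit computation as a minimal NumPy sketch, on a hypothetical noisy matrix (again standing in for the example-5 data, which isn’t reproduced here): form e1 = D - D1, sum its squared entries, take the square root, and compare with the singular value that was zeroed.

```python
import numpy as np

# Hypothetical noisy matrix: rank 2 plus noise, so technically rank 3.
rng = np.random.default_rng(3)
D = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(6, 3))

u, w, vt = np.linalg.svd(D, full_matrices=False)
w0 = w.copy()
w0[-1] = 0.0                       # zero the smallest singular value
D1 = u @ np.diag(w0) @ vt          # reconstituted, rank-2 data

e1 = D - D1                        # element-by-element differences
sum_sq = np.sum(e1**2)             # sum of squares of the "components"
print(sum_sq, np.sqrt(sum_sq), w[-1])  # the last two should agree
```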
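I’ll close that by saying that if we set two singular values to zero, the difference between the original (D) and the reconstituted (D1) is the square root of the sum of squares of those two singular values. What’s going on is that the norm of D is the same as the norm of w, and the norm of D1 is the same as the norm of w0, because multiplying by u and v^T (whose columns are orthonormal) doesn’t change the Frobenius norm. For the same reason, since D - D1 = u (w - w0) v^T, the norm of D - D1 is the norm of w - w0: the square root of the sum of squares of the singular values we zeroed.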
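A minimal NumPy sketch of the two-singular-value case, on a hypothetical random 6x4 matrix. Here w and w0 are NumPy’s 1-D arrays of singular values (not diagonal matrices), so their ordinary 2-norms play the role of “the norm of w” and “the norm of w0”.

```python
import numpy as np

# Hypothetical 6x4 data matrix (rank 4); the values are illustrative only.
rng = np.random.default_rng(4)
D = rng.normal(size=(6, 4))

u, w, vt = np.linalg.svd(D, full_matrices=False)

# Zero the two smallest singular values this time.
w0 = w.copy()
w0[-2:] = 0.0
D1 = u @ np.diag(w0) @ vt

# Frobenius norms of the matrices equal the 2-norms of their singular values.
print(np.isclose(np.linalg.norm(D, 'fro'), np.linalg.norm(w)))
print(np.isclose(np.linalg.norm(D1, 'fro'), np.linalg.norm(w0)))

# The difference is itself u (w - w0) v^T, so its norm is that of w - w0.
print(np.allclose(D - D1, u @ np.diag(w - w0) @ vt))
diff = np.linalg.norm(D - D1, 'fro')
print(np.isclose(diff, np.sqrt(w[-2]**2 + w[-1]**2)))
```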
I think we’re done here.
Oh, let me be clear. I’m glad I own Malinowski (ah, the book). If you’re going to be doing PCA / FA in the physical sciences, you probably want it, too. If you’re going to be doing PCA / FA in chemistry, don’t even think about not buying it. Just my opinion.