PCA / FA Malinowski: example 5.

(June 10: I have made 4 edits, all cosmetic. You may search on “edit:”)

Malinowski (edit: “Factor Analysis in Chemistry”, 3rd ed.) does a lot of things differently from what we’ve seen. Fortunately, his model is simple enough, although his notation is… different. His model is

X = R C,

and he calls R and C the row and column matrices respectively. He wants X to have more rows than columns, so he transposes if necessary; then he chooses C to have more columns than rows, and R will have more rows than columns. For starters, then, his X matrix looks like the usual design matrix for regression. (Incidentally, he didn’t call it X.)

He chooses C = {v_1}^T, from the cut-down SVD. That is, I write the SVD of X as

X = u\ w\ v^T\ ,

where u and v are orthogonal and w is the same shape as X. But we know from the derivation and our experience with Davis that we may also write

X = u_1\ w_1\ {v_1}^T\ ,

where w_1 is square, diagonal, and invertible (it is a cut-down w), and u_1 and v_1 are the submatrices of u and v which are conformable with w_1. (We’ll see all this shortly.) We have dropped the parts of u, w, and v which are not required for reproducing X. (I remind you that what we’ve lost is the orthogonality of the matrices u and v.)

Malinowski’s model is

X = \left(u_1\ w_1\right)\ {v_1}^T\ ,

i.e. he has chosen

C = {v_1}^T

R = u_1\ w_1\ .

In particular, C is (part of) an eigenvector matrix. I would argue that whether we use those choices or

C = {v}^T

R = u\ w

is purely a matter of convenience, of no real significance.

Have we seen this before? Yes. This is effectively Harman’s model. That was

Z = A F

with eigenvector matrix A and Z = X^T\ , i.e.

X = F^T\ A^T\ .

In addition, when we looked at Harman, we found F by computing

F^T = X\ A^{-T}\ .

Malinowski does the same thing. Having chosen C = {v_1}^T\ , he computes

R = X\ v_1\ .

That’s ok. We know that u_1 and v_1 are not orthogonal matrices (they are not square), but their columns are orthonormal:

{u_1}^T\ u_1 = I = {v_1}^T\ v_1\ .

Then from

X = R\ {v_1}^T

we post-multiply by v1 and get

X\  v_1 = R\ {v_1}^T\ v_1 = R\ .

I should probably recall that v could be found as an eigenvector matrix of X^T\ X\ , and v_1 is the subset of eigenvectors with nonzero eigenvalues. What’s important about this is that we could have gotten R and C without ever knowing u or u_1\ .
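Here is a minimal NumPy sketch of that remark, under my own assumptions: the 5×3 matrix is sample data (it is the matrix used in the worked example further down), and the relative tolerance for calling an eigenvalue "nonzero" is an arbitrary choice. It gets R and C from the eigenvectors of X^T X alone, never forming u. The signs of the eigenvectors are not unique, so individual columns may come out negated relative to any printed values.

```python
import numpy as np

# Sample data: 5 observations (rows) of 3 variables (columns).
X = np.array([[2, 3, 4],
              [1, 0, -1],
              [4, 5, 6],
              [3, 2, 1],
              [6, 7, 8]], dtype=float)

# Eigen-decomposition of the symmetric matrix X^T X.
# eigh returns eigenvalues in ascending order.
evals, evecs = np.linalg.eigh(X.T @ X)

# Reorder to descending and keep only numerically nonzero eigenvalues.
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
tol = 1e-8 * evals[0]          # relative tolerance: an assumption of this sketch
keep = evals > tol
v1 = evecs[:, keep]

C = v1.T                       # column matrix
R = X @ v1                     # row matrix, computed without u or u_1

print(R.shape, C.shape)
print(np.allclose(R @ C, X))   # does R C reproduce X?
print(np.sqrt(evals[keep]))    # square roots of the kept eigenvalues
```

The square roots of the retained eigenvalues are the nonzero singular values of X, which is why the eigenvector route and the SVD route agree.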

He knows about the SVD, but he seems not to really use it. I say that because I’m going to get a lot of mileage from u and u_1\ . I should probably also recall that (edit: add R to the following)

R = X\ v = u\ w\ = A^Q

(see Davis) gives the new components of the data with respect to the v basis. So we understand R.

He prefers the term “factor analysis” to “principal component analysis” because “component” has special meaning in chemistry; otherwise, he considers the terms to be synonyms, along with “singular value decomposition”. He also prefers “row matrix” and “column matrix” to “scores” and “factors”. He says, in fact (p.29), “… if we focus attention on R, we would call R the score matrix and C the loading matrix. If we focus attention on C, we would call C the score matrix and R the loading matrix.” Maybe that’s when I stopped caring about the usual terminology in PCA / FA.

Far more important than his breaking free of the conventional names is his determination that in chemistry, the data should not be centered. Do take logarithms, for example, if it seems you should, but do not subtract out the means of the variables. “By using covariance or correlation about the mean, we lose information concerning the zero point of the experimental scale.” This is what broke me free of covariance and correlation matrices.
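To see what centering removes, here is a small NumPy sketch. It is my illustration, not Malinowski's, and the 4×2 data matrix is made up. It compares the cross-product matrix of the raw data with that of the mean-centered data. The difference is exactly n times the outer product of the column means with themselves, which is the "zero point" information.

```python
import numpy as np

# Hypothetical data: 4 observations (rows) of 2 variables (columns).
X = np.array([[1.0, 2.0],
              [2.0, 4.1],
              [3.0, 5.9],
              [4.0, 8.0]])

n = X.shape[0]
mean = X.mean(axis=0)
Xc = X - mean                    # mean-centered data

raw_cross = X.T @ X              # what an uncentered analysis diagonalizes
centered_cross = Xc.T @ Xc       # (n - 1) times the sample covariance matrix

# Centering subtracts the term n * mean mean^T from X^T X.
print(np.allclose(raw_cross - centered_cross, n * np.outer(mean, mean)))

# The singular values of the raw and centered data, for comparison.
print(np.linalg.svd(X, compute_uv=False))
print(np.linalg.svd(Xc, compute_uv=False))
```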

His entire chapter 5 is devoted to one manufactured example. I do not wish to reproduce an entire chapter, so I am going to build a similar but different example.

Here is my data: 3 variables, 5 observations each.

X = \left(\begin{array}{lll} 2 & 3 & 4 \\ 1 & 0 & -1 \\ 4 & 5 & 6 \\ 3 & 2 & 1 \\ 6 & 7 & 8\end{array}\right)

We get the SVD of X, i.e. X = u\ w\ v^T\ , without any preprocessing at all. We look at w to see the rank of X. Both w and X are of rank 2 because there are 2 nonzero singular values.

w = \left(\begin{array}{lll} 16.2781 & 0. & 0. \\ 0. & 2.45421 & 0. \\ 0. & 0. & 0. \\ 0. & 0. & 0. \\ 0. & 0. & 0.\end{array}\right)
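For anyone following along numerically, a minimal NumPy sketch of this step might look like the following. NumPy returns the singular values as a vector, so the sketch embeds them in a matrix of the same shape as X to match w. The tolerance used to decide which singular values count as nonzero is my assumption.

```python
import numpy as np

X = np.array([[2, 3, 4],
              [1, 0, -1],
              [4, 5, 6],
              [3, 2, 1],
              [6, 7, 8]], dtype=float)

# Full SVD: u is 5x5, vT is 3x3, s holds the singular values in descending order.
u, s, vT = np.linalg.svd(X, full_matrices=True)

# Embed the singular values in a matrix w with the same shape as X.
w = np.zeros_like(X)
w[:len(s), :len(s)] = np.diag(s)

# Numerical rank: count singular values above a tolerance (assumed here).
tol = 1e-10
rank = int(np.sum(s > tol))

print(np.round(w, 4))
print(rank)
```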

In this case, for example, with only two nonzero singular values in w, if we drop three rows and one column of w, we get w1, an invertible diagonal 2×2 matrix:

w_1 = \left(\begin{array}{ll} 16.2781 & 0. \\ 0. & 2.45421\end{array}\right)

In order to have X = u_1\ w_1\ {v_1}^T\ , we drop three columns of u and one column of v. (We are partially reversing the derivation of the SVD. In that, we had v, split it into v1 and v2, then we cut down w to w1 and computed u1; then we extended u1 to an orthonormal basis u.)

Let me be explicit. Here are u and u1:

u = \left(\begin{array}{lllll} 0.327517 & 0.309419 & -0.813733 & 0.257097 & -0.262167 \\ -0.0107664 & -0.571797 & -0.464991 & -0.668451 & 0.0994427 \\ 0.538684 & 0.134501 & 0. & 0. & 0.831703 \\ 0.200401 & -0.746715 & 0. & 0.634172 & -0.00904025 \\ 0.749851 & -0.0404178 & 0.348743 & -0.291376 & -0.479133\end{array}\right)

u_1 = \left(\begin{array}{ll} 0.327517 & 0.309419 \\ -0.0107664 & -0.571797 \\ 0.538684 & 0.134501 \\ 0.200401 & -0.746715 \\ 0.749851 & -0.0404178\end{array}\right)

u1 is the first two columns of u. Here are v and v1:

v = \left(\begin{array}{lll} 0.485272 & -0.773204 & 0.408248 \\ 0.5729 & -0.0715479 & -0.816497 \\ 0.660528 & 0.630108 & 0.408248\end{array}\right)

v_1 = \left(\begin{array}{ll} 0.485272 & -0.773204 \\ 0.5729 & -0.0715479 \\ 0.660528 & 0.630108\end{array}\right)

v1 is the first two columns of v. You can confirm that we really do have

X = u_1\ w_1\ {v_1}^T\ .
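One way to do that confirmation is the NumPy sketch below. The variable names mirror the text, and the rank tolerance is my assumption. A different SVD routine may flip the signs of matching columns of u and v, or pick different vectors for the dropped columns of u, so the entries need not match the printed matrices exactly. The checks themselves do not depend on those choices. The last two lines illustrate the earlier remark about lost orthogonality: the columns of u1 are orthonormal, but u1 is not an orthogonal matrix.

```python
import numpy as np

X = np.array([[2, 3, 4],
              [1, 0, -1],
              [4, 5, 6],
              [3, 2, 1],
              [6, 7, 8]], dtype=float)

u, s, vT = np.linalg.svd(X, full_matrices=True)
v = vT.T

# Keep only the parts associated with nonzero singular values.
r = int(np.sum(s > 1e-10))     # tolerance is an assumption of this sketch
u1 = u[:, :r]                  # first r columns of u  (5x2)
w1 = np.diag(s[:r])            # square, diagonal, invertible (2x2)
v1 = v[:, :r]                  # first r columns of v  (3x2)

print(np.allclose(u1 @ w1 @ v1.T, X))      # cut-down product versus X
print(np.allclose(u1.T @ u1, np.eye(r)))   # u1^T u1 versus the 2x2 identity
print(np.allclose(u1 @ u1.T, np.eye(5)))   # u1 u1^T versus the 5x5 identity
```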

For Malinowski’s model, we choose C = {v_1}^T\ and R = u_1\ w_1\ . We get

R = \left(\begin{array}{ll} 5.33135 & 0.759381 \\ -0.175256 & -1.40331 \\ 8.76875 & 0.330094 \\ 3.26214 & -1.8326 \\ 12.2062 & -0.0991938\end{array}\right)

C = \left(\begin{array}{lll} 0.485272 & 0.5729 & 0.660528 \\ -0.773204 & -0.0715479 & 0.630108\end{array}\right)

If we had used the full SVD, we would have had

R = \left(\begin{array}{lll} 5.33135 & 0.759381 & 0. \\ -0.175256 & -1.40331 & 0. \\ 8.76875 & 0.330094 & 0. \\ 3.26214 & -1.8326 & 0. \\ 12.2062 & -0.0991938 & 0.\end{array}\right)

which simply adds a column of zeroes to Malinowski’s R. I no longer care about that as much as I did when I started Davis. Either form of R is explicitly of rank 2.
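The comparison between the two forms of R can be sketched in NumPy as follows. This is my illustration, with an assumed rank tolerance. Column signs may differ from the matrices printed above, since the SVD fixes each pair of singular vectors only up to a common sign.

```python
import numpy as np

X = np.array([[2, 3, 4],
              [1, 0, -1],
              [4, 5, 6],
              [3, 2, 1],
              [6, 7, 8]], dtype=float)

u, s, vT = np.linalg.svd(X, full_matrices=True)
v = vT.T
r = 2                              # number of nonzero singular values

# Malinowski's choice: cut-down R and C.
R = u[:, :r] @ np.diag(s[:r])      # 5x2, same as X @ v[:, :r]
C = vT[:r, :]                      # 2x3

# Full-SVD choice: R = X v is 5x3.
R_full = X @ v

print(np.round(R, 4))
print(np.round(R_full, 4))

# Both forms have the same numerical rank (explicit tolerance, assumed).
print(np.linalg.matrix_rank(R, tol=1e-10),
      np.linalg.matrix_rank(R_full, tol=1e-10))
```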

I should point out that Malinowski does not describe his decomposition as an SVD. But this is his model. He also says that from

X = PQ

and

X = RC

where P and Q formed a purely hypothetical decomposition of X, we may conclude R = P and C = Q.

No, not quite. From X = RC we may conclude that we have a decomposition, but the decomposition of X into a product of matrices is not unique, and he knows it. (Which is important. Read him carefully, just as you must read me carefully. He knows what he’s doing, and is well worth reading, though his language is sometimes imprecise.) After asserting R = P and C = Q on p. 31, he shows us on p. 33 that the decomposition is not unique. We can even write a recipe for creating an infinite number of decompositions from a given one. Take any change-of-basis transition matrix T (edit: replace P, which was already in use, by T), insert I = T\ T^{-1} into X = R C, and get

X = \left(R\ T\right)\ \left(T^{-1}\ C\right)\ .

That is, we have a new R and a new C. edit: Even if the original R and C had been equal to P and Q, the new ones are not.
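As a quick numerical illustration of the recipe, here is a NumPy sketch. The particular T is an arbitrary invertible 2×2 matrix I made up; any other invertible T would do.

```python
import numpy as np

X = np.array([[2, 3, 4],
              [1, 0, -1],
              [4, 5, 6],
              [3, 2, 1],
              [6, 7, 8]], dtype=float)

# Start from the SVD-based decomposition X = R C.
u, s, vT = np.linalg.svd(X, full_matrices=False)
R = u[:, :2] @ np.diag(s[:2])      # 5x2
C = vT[:2, :]                      # 2x3

# A hypothetical invertible transition matrix (determinant 2).
T = np.array([[1.0, 2.0],
              [0.5, 3.0]])

R_new = R @ T
C_new = np.linalg.solve(T, C)      # T^{-1} C, without forming the inverse

print(np.allclose(R_new @ C_new, X))   # new pair versus X
print(np.allclose(R_new, R))           # new R versus old R
```

Using `np.linalg.solve` rather than an explicit inverse is just a numerical habit; `np.linalg.inv(T) @ C` expresses the same thing.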

Next, he moves on to something rather fascinating, which he calls target testing.

