## PCA / FA: “Preprocessing”

When I moved beyond the first couple of books, I was bewildered by the huge number of alternatives for PCA / FA. I think my final count was that there were 288 different ways of doing it. It was so bad that I put together a form so that whenever I read a new example I could check off boxes for the choices made. As I’ve gotten more experience, I no longer need that checklist.

A lot of those choices pertained to the starting point. If the analysis began with an eigendecomposition, well, it could have been applied to the correlation matrix, or to the covariance matrix – oh, and one text used N instead of N-1 to compute sample variances.

Or an eigendecomposition could have been applied to $X^T\ X$ or to $X\ X^T$ or to both… but X itself could have been raw data, centered data, doubly-centered data, standardized data, or small standard deviates. Oh, and X could have observations in rows (Jolliffe, Davis, and I) or in columns (Harman). Oh boy.

Or we could have done an SVD of X, where X itself could have been raw data, centered data, doubly-centered data, standardized data, or small standard deviates, with observations in rows or columns. Yikes!

(Not to mention that the data could very likely have been manipulated even before these possible transformations.)

Then I decided that all those choices were pre-processing, not really part of PCA / FA. Actually, I decided that I must have been careless, and missed the point when everyone made it, that one had to decide what pre-processing to do before starting PCA / FA. I was kicking myself a bit.

When I sat down to summarize the material, I discovered that most people seem to prefer a specific form of pre-processing, although most do, in fact, describe alternatives. (It felt like I had to look hard to see their alternatives.) I think a description of PCA / FA should not restrict itself to a specific choice of pre-processing. I’m not ready to summarize PCA / FA for my own purposes – we have to look at two more examples, at the very least – but I am prepared to say that the construction of X, pre-processing the data, should be viewed as a separate activity, possibly requiring its own justification in one’s report. I’d certainly like to see more discussion of it in the texts.

The social sciences, geology, chemistry, and oceanography all specify different constructions for X. The social sciences (and Jolliffe and Harman) seem to favor the correlation matrix (which is tantamount to using standardized data). Geology favors centered data (which is tantamount to using the covariance matrix). Chemistry, however, says we should use the raw data (which takes us into uncharted waters). And oceanography says we should remove a linear trend (or more) from the data!
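The two "tantamount" claims above can be checked numerically. Here is a minimal sketch in Python with NumPy, assuming observations in rows and N-1 in the sample variance; the random data and every variable name are made up for illustration.

```python
import numpy as np

# Hypothetical data: 10 observations (rows) of 3 variables (columns),
# with deliberately different scales per column.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3)) * np.array([1.0, 5.0, 0.2])
N = X.shape[0]

Xc = X - X.mean(axis=0)              # centered data
Xs = Xc / X.std(axis=0, ddof=1)      # standardized data (N-1 in the variance)

# X^T X / (N-1) of centered data is the covariance matrix ...
cov_matches = np.allclose(Xc.T @ Xc / (N - 1), np.cov(X, rowvar=False))
# ... and X^T X / (N-1) of standardized data is the correlation matrix.
corr_matches = np.allclose(Xs.T @ Xs / (N - 1), np.corrcoef(X, rowvar=False))

print(cov_matches, corr_matches)
```

Dropping the division by N-1 changes only the eigenvalues' common scale, not the eigenvectors.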

These are serious decisions to make. Computing an SVD or an eigendecomposition is pretty easy nowadays, a lot easier than it was when PCA / FA were invented. Now that the computations are so easy, we can afford to pre-process the data in more than one way. But I’d still like to see more discussion of one choice over another.

And I can’t settle it. See what is customary in your own field. Talk to your colleagues. Try more than one form of preprocessing.

We’ve already seen that for Davis’ (admittedly artificial) example, an eigendecomposition of the correlation matrix did not lead to new data that was manifestly 2D. (here) In a sense the correlation matrix was not as effective at showing the 2D dimensionality of the data as the covariance matrix was (strictly speaking, as the SVD of the centered data was and as the covariance matrix would have been).

I see four major possibilities for preprocessing, but these are not the only possibilities:

1. raw data,
2. centered data,
3. doubly-centered data,
4. and standardized data.
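To make the four choices concrete, here is a small sketch in Python with NumPy. It assumes observations in rows, N-1 in the standard deviation, and the usual definition of double centering (subtract row means and column means, add back the grand mean); the function names and the little data matrix are hypothetical.

```python
import numpy as np

def center(X):
    """Subtract each column's mean (observations in rows)."""
    return X - X.mean(axis=0)

def double_center(X):
    """Subtract row means and column means, then add back the grand mean."""
    return X - X.mean(axis=1, keepdims=True) - X.mean(axis=0) + X.mean()

def standardize(X, ddof=1):
    """Center each column and divide by its sample standard deviation."""
    return center(X) / X.std(axis=0, ddof=ddof)

# Made-up 4x3 data matrix, purely for illustration.
X = np.array([[1.0, 2.0, 4.0],
              [2.0, 1.0, 8.0],
              [3.0, 5.0, 6.0],
              [6.0, 4.0, 2.0]])

candidates = {
    "raw": X,
    "centered": center(X),
    "doubly-centered": double_center(X),
    "standardized": standardize(X),
}
for name, Z in candidates.items():
    print(name)
    print("  column means:", Z.mean(axis=0).round(12))
    print("  row means:   ", Z.mean(axis=1).round(12))
```

Any of the four matrices could then be handed to the same SVD or eigendecomposition, which is the sense in which the construction of X is a separate step.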

Raw data and doubly-centered data are symmetric choices: we could view a Q-mode analysis of X as an R-mode analysis of $X^T$.

Maybe I should be more precise there. My working definition of an R-mode analysis of X is: an eigendecomposition of $X^T\ X$; my working definition of a Q-mode analysis of X is: an eigendecomposition of $X\ X^T$. We accomplish them simultaneously by looking at the SVD of X.
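A quick numerical check of that last sentence, as a sketch in Python with NumPy. The random X is hypothetical, and eigenvectors are compared only up to the sign of each column, since neither routine fixes the sign.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))          # hypothetical: 6 observations, 3 variables

# SVD: X = u @ diag(w) @ vt, with w sorted in decreasing order.
u, w, vt = np.linalg.svd(X, full_matrices=False)
v = vt.T

# R-mode: eigendecomposition of X^T X (3x3). eigh returns ascending order,
# so reverse to match the SVD's ordering.
r_vals, r_vecs = np.linalg.eigh(X.T @ X)
r_vals, r_vecs = r_vals[::-1], r_vecs[:, ::-1]

# Q-mode: eigendecomposition of X X^T (6x6); only 3 eigenvalues are nonzero.
q_vals, q_vecs = np.linalg.eigh(X @ X.T)
q_vals, q_vecs = q_vals[::-1], q_vecs[:, ::-1]

# Eigenvalues of both square matrices are the squared singular values.
same_r_vals = np.allclose(r_vals, w**2)
same_q_vals = np.allclose(q_vals[:3], w**2) and np.allclose(q_vals[3:], 0.0)

# Eigenvectors agree with v and u up to the sign of each column.
same_r_vecs = np.allclose(np.abs(r_vecs.T @ v), np.eye(3))
same_q_vecs = np.allclose(np.abs(q_vecs[:, :3].T @ u), np.eye(3))

print(same_r_vals, same_q_vals, same_r_vecs, same_q_vecs)
```

So v carries the R-mode eigenvectors and u carries the Q-mode eigenvectors that belong to nonzero eigenvalues.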

But they would be two distinct analyses if we required, for example, standardized columns for $X^T\ X$ and standardized rows for $X\ X^T$. The X’s themselves would be different. This troubles me at present. OTOH, I have not seen Q-mode analysis carried out by its proponents, so maybe I’m troubled by shadows rather than substance.

One of my criteria for categorizing choices as major is: when do we affect the eigenvectors? The eigenvalues, by contrast, are all too easy to change: $X^T\ X$ of centered data has the same eigenvectors as the covariance matrix of X, but the eigenvalues differ by a common scale factor. I don’t see that as a major difference. More and more, I am inclined to convert the eigenvalues to percentages just so I don’t have to worry about scale factors between slightly different computational choices.
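Here is a sketch of that scale-factor point in Python with NumPy, assuming the covariance matrix uses N-1 (with N the factor would simply be N instead). The data and the helper name `as_percent` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(12, 4))         # hypothetical: 12 observations, 4 variables
N = X.shape[0]
Xc = X - X.mean(axis=0)

A = Xc.T @ Xc                        # X^T X of centered data
C = A / (N - 1)                      # covariance matrix

a_vals, a_vecs = np.linalg.eigh(A)
c_vals, c_vecs = np.linalg.eigh(C)

# Every eigenvalue differs by the same factor, N - 1.
scale = a_vals / c_vals

# The eigenvectors agree, up to the sign of each column.
same_vecs = np.allclose(np.abs(a_vecs.T @ c_vecs), np.eye(4))

def as_percent(vals):
    """Express eigenvalues as percentages of their total."""
    return 100.0 * vals / vals.sum()

# As percentages, the common scale factor disappears.
same_pct = np.allclose(as_percent(a_vals), as_percent(c_vals))

print(scale, same_vecs, same_pct)
```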

Then we have two major choices for executing the PCA / FA:

1. an eigendecomposition of some square matrix or matrices,
2. or an SVD of the data matrix X.

There may be some relationships between the SVD and the eigendecomposition, depending on what square matrix or matrices we choose. We have a lot of freedom, perhaps too much, in our choice of square matrices.

Whichever we choose, however, we will end up with some eigenvalues and eigenvectors, or principal values and (technically) left and right principal vectors, to give the columns of u and v of the SVD their proper names. For our purposes, where $X^T\ X$ and $X\ X^T$ are lurking about if not standing in full sunlight, it is convenient to call the u and v eigenvector matrices even when they come from the SVD.

Frankly, when I’m reading a new example, the first thing I want to find is the eigenvector matrix; I’ll go so far as to say, if there isn’t one somewhere, it isn’t PCA / FA. “There must be a pony.”

As a computational consideration, the eigendecomposition may or may not yield an orthogonal eigenvector matrix; and the SVD may or may not yield orthogonal u and v. The eigendecomposition should, however, yield orthogonal eigenvectors even if they need to be normalized to length 1; conversely, the SVD should yield sets of orthonormal vectors, even if there aren’t enough of them to make square (orthogonal) matrices.

To rephrase that, the eigendecomposition should always yield a square matrix, whose columns may need to be normalized; the SVD should always yield orthonormal vectors, in matrices that may not be square.
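How this plays out depends on the software. As one illustration, here is a sketch in Python with NumPy, whose `np.linalg.eigh` already returns unit-length columns; I rescale them on purpose to mimic a routine that does not. The data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 3))          # hypothetical: 5 observations, 3 variables

# Eigendecomposition of the square matrix X^T X: a square eigenvector matrix.
vals, vecs = np.linalg.eigh(X.T @ X)

# Pretend the routine returned arbitrarily scaled columns, then normalize them.
unnormalized = vecs * np.array([2.0, -0.5, 10.0])
normalized = unnormalized / np.linalg.norm(unnormalized, axis=0)
eig_is_orthogonal = np.allclose(normalized.T @ normalized, np.eye(3))

# "Economy" SVD: u is 5x3, so its columns are orthonormal but u is not square.
u, w, vt = np.linalg.svd(X, full_matrices=False)
u_cols_orthonormal = np.allclose(u.T @ u, np.eye(3))
u_is_orthogonal = u.shape[0] == u.shape[1] and np.allclose(u @ u.T, np.eye(5))

# Full SVD: u is padded out to a 5x5 orthogonal matrix.
u_full, _, _ = np.linalg.svd(X, full_matrices=True)
u_full_is_orthogonal = np.allclose(u_full @ u_full.T, np.eye(5))

print(eig_is_orthogonal, u_cols_orthonormal, u_is_orthogonal, u_full_is_orthogonal)
```

In NumPy the `full_matrices` flag is what switches between the two shapes of u; other packages have their own way of asking for the square version.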

Nevertheless, in principle we could have orthogonal matrices of eigenvectors from either the eigendecomposition or the SVD, and in practice, I do. And I keep them close to hand, in addition to doing whatever people want me to do to them.

Oh, we do some post-processing, too.

1. Leave the orthonormal eigenvectors alone?
2. Weight them by the square roots of their eigenvalues?
3. How about by the inverse square roots of their eigenvalues?
4. I may even be able to find an example where they were weighted by the eigenvalues!

(And of course, if we use the SVD (me raises hand) the principal values correspond to the square roots of the eigenvalues.)
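The four weightings can be written down in a few lines. This is a sketch in Python with NumPy, starting from the SVD of a centered X (one choice among those discussed above); the data and the dictionary labels are hypothetical, and it weights only v, though u could be treated the same way.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(8, 3))          # hypothetical: 8 observations, 3 variables
X = X - X.mean(axis=0)               # centered, as one possible choice

u, w, vt = np.linalg.svd(X, full_matrices=False)
v = vt.T
eigenvalues = w**2                   # eigenvalues of X^T X

weighted = {
    "orthonormal": v,
    "sqrt(eigenvalue)": v * w,       # each column times its principal value
    "1/sqrt(eigenvalue)": v / w,
    "eigenvalue": v * eigenvalues,
}

# Two consequences that make some weightings convenient:
# the sqrt-weighted vectors reproduce X^T X ...
a = weighted["sqrt(eigenvalue)"]
reproduces_xtx = np.allclose(a @ a.T, X.T @ X)
# ... and the inverse-sqrt-weighted vectors map X onto the orthonormal u.
maps_to_u = np.allclose(X @ weighted["1/sqrt(eigenvalue)"], u)

print(reproduces_xtx, maps_to_u)
```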

Or, if we did want unique eigenvectors – at least from real data – we could scale each eigenvector so its largest component was +1. For artificial examples, that may not be unique, but for real data, it seems unlikely that any eigenvector would have two identical largest components of opposite sign. (That’s when this prescription would fail to work.) In any case, it is a prescription I have encountered.
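A sketch of that prescription in Python with NumPy, assuming "largest component" means largest in absolute value. The function name and the 2x2 example are hypothetical. If two components tie in magnitude with opposite signs, `np.argmax` just picks the first, which is exactly the failure case described above.

```python
import numpy as np

def scale_largest_to_one(vecs):
    """Rescale each column so its largest-magnitude component equals +1."""
    rows = np.argmax(np.abs(vecs), axis=0)
    pivots = vecs[rows, np.arange(vecs.shape[1])]
    return vecs / pivots

# A made-up orthogonal matrix of eigenvectors (columns).
v = np.array([[0.6, -0.8],
              [-0.8, -0.6]])

scaled = scale_largest_to_one(v)
scaled_flipped = scale_largest_to_one(-v)   # same vectors, opposite signs

print(scaled)
print(np.allclose(scaled, scaled_flipped))
```

The point of the second call is that the sign ambiguity goes away: v and -v scale to the same columns.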

Now I think I’m ready for chemistry. I hope you are, too.
