PCA / FA tricky preprocessing

Introduction

I have stumbled across a tricky point in the preprocessing of data. The most relevant earlier post is probably the one of April 7. Rather than lecture, let me ask and answer some questions. The fundamental question is:
Can I inadvertently reduce the rank (the dimensionality) of the data matrix?
The answer is yes.
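One familiar way this can happen, offered as an illustration rather than as the answer worked out in the full post: centering the columns of a data matrix makes its rows sum to zero, so when there are no more observations than variables the rank drops by one. The 3 × 3 matrix below is made up for the purpose, and the variable names are mine.

```python
import numpy as np

# Hypothetical data: 3 observations (rows) of 3 variables (columns).
X = np.array([[1.0, 2.0, 4.0],
              [3.0, 1.0, 2.0],
              [2.0, 5.0, 1.0]])

# Center each column by subtracting that variable's mean.
Xc = X - X.mean(axis=0)

# After centering, the rows add up to the zero vector, so they are linearly dependent.
row_sum = Xc.sum(axis=0)

rank_raw = np.linalg.matrix_rank(X)        # det(X) = 45, so X is nonsingular
rank_centered = np.linalg.matrix_rank(Xc)  # one dimension lost to centering

print(rank_raw, rank_centered, np.allclose(row_sum, 0.0))
```

With more observations than variables, column centering usually leaves the rank alone; the loss is guaranteed when there are no more observations than variables.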


PCA / FA. “Preprocessing”

When I moved beyond the first couple of books, I was bewildered by the huge number of alternatives for PCA / FA. I think my final count was that there were 288 different ways of doing it. It was so bad that I put together a form so that whenever I read a new example I could check off boxes for the choices made. As I’ve gotten more experience, I no longer need that checklist.

A lot of those choices pertained to the starting point. If the analysis began with an eigendecomposition, well, it could have been applied to the correlation matrix, or to the covariance matrix – oh, and one text used N instead of N-1 to compute sample variances.
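The N versus N-1 choice turns out to be the harmless one, while covariance versus correlation is not. A small sketch with made-up data (the names and the random seed are my own assumptions) shows that switching the divisor only rescales the eigenvalues of the covariance matrix by (N-1)/N, leaving the eigenvectors untouched; the correlation matrix, by contrast, generally has different eigenvectors altogether.

```python
import numpy as np

# Hypothetical data: 5 observations (rows) of 3 variables on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3)) * np.array([1.0, 10.0, 100.0])
N = X.shape[0]

cov_n1 = np.cov(X, rowvar=False)         # sample covariance, divisor N-1 (default)
cov_n = np.cov(X, rowvar=False, ddof=0)  # same, but divisor N
corr = np.corrcoef(X, rowvar=False)      # correlation matrix

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
vals_n1, vecs_n1 = np.linalg.eigh(cov_n1)
vals_n, vecs_n = np.linalg.eigh(cov_n)
vals_r = np.linalg.eigvalsh(corr)

# Using N instead of N-1 rescales every eigenvalue by (N-1)/N ...
same_scale = np.allclose(vals_n, vals_n1 * (N - 1) / N)
# ... but leaves the eigenvectors unchanged, up to sign.
same_vectors = np.allclose(np.abs(vecs_n), np.abs(vecs_n1))
# The correlation eigenvalues sum to the number of variables (the trace of corr).
trace_ok = np.isclose(vals_r.sum(), X.shape[1])

print(same_scale, same_vectors, trace_ok)
```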

Or an eigendecomposition could have been applied to $X^T X$, to $X X^T$, or to both… but X itself could have been raw data, centered data, doubly-centered data, standardized data, or small standard deviates. Oh, and X could have observations in rows (Jolliffe, Davis, and I) or in columns (Harman). Oh boy.
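The $X^T X$ versus $X X^T$ choice is less drastic than it looks: the two matrices share their nonzero eigenvalues, and an eigenvector $v$ of $X^T X$ with eigenvalue $\lambda$ maps to the unit eigenvector $Xv/\sqrt{\lambda}$ of $X X^T$. The sketch below checks this on random centered data with observations in rows; the data and names are assumptions for illustration. With observations in columns (Harman's layout), the roles of the two products swap.

```python
import numpy as np

# Hypothetical centered data: N = 4 observations (rows), M = 3 variables (columns).
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))
X = X - X.mean(axis=0)

small = X.T @ X   # M x M, proportional to the covariance matrix
big = X @ X.T     # N x N

vals_small = np.linalg.eigvalsh(small)   # ascending order
vals_big = np.linalg.eigvalsh(big)

# The N x N matrix has the same nonzero eigenvalues, plus extra (near-)zero ones.
tol = 1e-10 * vals_big.max()
nonzero_big = vals_big[vals_big > tol]
nonzero_small = vals_small[vals_small > tol]
print(np.allclose(nonzero_small, nonzero_big))

# Map the leading eigenvector of X^T X to the corresponding one of X X^T.
lam, V = np.linalg.eigh(small)
v, l = V[:, -1], lam[-1]       # largest eigenpair of X^T X
u = X @ v / np.sqrt(l)         # unit eigenvector of X X^T for the same eigenvalue
print(np.allclose(big @ u, l * u), np.isclose(np.linalg.norm(u), 1.0))
```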

Or we could have done an SVD of X, where X itself could have been raw data, centered data, doubly-centered data, standardized data, or small standard deviates, with observations in rows or columns. Yikes!
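For centered data the SVD route and the eigendecomposition route agree: the right singular vectors of the centered X are the eigenvectors of the covariance matrix, and the eigenvalues are the squared singular values divided by N-1. A sketch under my own assumptions (made-up data, observations in rows); an SVD of the raw, uncentered X would generally give different directions.

```python
import numpy as np

# Hypothetical data: 6 observations (rows) of 3 correlated variables (columns).
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3)) @ np.array([[2.0, 0.0, 0.0],
                                        [1.0, 1.0, 0.0],
                                        [0.5, 0.3, 0.2]])
N = X.shape[0]
Xc = X - X.mean(axis=0)          # centered data

# SVD route: singular values come back in descending order.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigendecomposition route on the covariance matrix (divisor N-1).
vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
vals, vecs = vals[::-1], vecs[:, ::-1]   # reorder to descending

# Covariance eigenvalues equal s**2 / (N-1) ...
eig_match = np.allclose(vals, s**2 / (N - 1))
# ... and the eigenvectors are the right singular vectors, up to sign.
vec_match = np.allclose(np.abs(vecs), np.abs(Vt.T))

print(eig_match, vec_match)
```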

(Not to mention that the data may well have been manipulated even before any of these transformations.)

Then I decided that all those choices were preprocessing, not really part of PCA / FA. Actually, I decided that I must have been careless and missed it when everyone else made the point that one had to decide what preprocessing to do before starting PCA / FA. I was kicking myself a bit.
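One way to make that decision explicit is to treat the preprocessing as a named argument and check the rank of what comes out. The helper `preprocess` below is hypothetical and covers only four of the choices listed above (raw, centered, doubly-centered, standardized); the 4 × 3 data matrix is made up. Running it on the same numbers with observations in rows and then in columns shows that the orientation alone can change whether centering costs a dimension.

```python
import numpy as np

def preprocess(X, how):
    """Hypothetical helper: apply one named preprocessing choice to X.

    Treats rows as observations and columns as variables.
    """
    if how == "raw":
        return X.copy()
    Xc = X - X.mean(axis=0)                          # center each column
    if how == "centered":
        return Xc
    if how == "doubly-centered":
        return Xc - Xc.mean(axis=1, keepdims=True)   # then center each row
    if how == "standardized":
        return Xc / X.std(axis=0, ddof=1)            # unit sample variance per column
    raise ValueError(f"unknown preprocessing: {how}")

# Made-up data: 4 observations of 3 variables, full column rank.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [2.0, 1.0, 3.0]])

ranks = {}
for layout, data in [("rows", X), ("columns", X.T)]:
    for how in ["raw", "centered", "doubly-centered", "standardized"]:
        ranks[(layout, how)] = np.linalg.matrix_rank(preprocess(data, how))
        print(f"observations in {layout:7s} {how:16s} rank {ranks[(layout, how)]}")
```

Keeping the choice as an explicit argument, instead of something done silently upstream, is just one way to keep a rank change from slipping by unnoticed.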