PCA / FA malinowski: example 5. target testing

Recall that we computed the SVD $X = u\ w\ v^T\$ of this matrix:

$X = \left(\begin{array}{lll} 2 & 3 & 4 \\ 1 & 0 & -1 \\ 4 & 5 & 6 \\ 3 & 2 & 1 \\ 6 & 7 & 8\end{array}\right)$

and we found that the w matrix was

$w = \left(\begin{array}{lll} 16.2781 & 0. & 0. \\ 0. & 2.45421 & 0. \\ 0. & 0. & 0. \\ 0. & 0. & 0. \\ 0. & 0. & 0.\end{array}\right)$

Because w has only two nonzero entries, we know that X is of rank 2. Its three columns only span a 2D space.

Given a column of data x (a variable, in this example, of length 5), Malinowski wants to know if it is in that 2D space. As he puts it, “if the suspected test vector [x] is a real factor, then the regeneration $\hat{x} = R\ t$ will be successful.” He gives us a formula for computing t; by a successful regeneration, he means that $\hat{x}\$ is close to x.

He says that this is how it is done in practice: given a “suspected test vector” x, compute t …, compute R t, and compare R t to x.

We will see that he has some use for t apart from this testing of x. Otherwise, if he doesn’t need t, what he’s doing is unnecessarily complicated. I want to focus on his “regeneration” computation. He says that

$t = {\left(R^T\ R\right)}^{-1}\ R^T\ x\$.

You might recognize that as the normal equations

$\beta = {\left(X^T\ X\right)}^{-1}\ X^T\ y\$

of ordinary least squares regression. (You might look at my “math OLS” category.) He counts on this, and says (p. 57) that “A least-squares procedure is employed to generate a transformation vector [t] that will yield a predicted vector… most closely matching [x]….”

(I would argue that’s a little misleading. The predicted vector is restricted to lie in the 2D subspace.)

$y = X\ \beta + e = \hat{y} + e$

his implicit regression model is

$x = R\ t + e = \hat{x} + e\$.

The analog of $\hat{y} = X\ \beta$ is $\hat{x} = R\ t\$: his t’s are the $\beta\$‘s, his xhat is the yhat. In the customary notation, he’s computing yhat and comparing it to a “suspected test vector” y.

In his notation, he wants to compute R t and compare it to x.

Fine, let’s do it. We know enough to label his R t as
$\hat{x}\$. We have

$\hat{x} = R\ t = R\ {\left(R^T\ R\right)}^{-1}\ R^T\ x =: H x\$,

so let’s find the matrix

$H = R\ {\left(R^T\ R\right)}^{-1}\ R^T\$.

I have called it H because it has a name, the hat matrix. It is ubiquitous in regression diagnostics. Note that we are working with the R matrix, not with the full X matrix. Boy, does that matter!

we have R ( = u w)…

$R = \left(\begin{array}{ll} 5.33135 & 0.759381 \\ -0.175256 & -1.40331 \\ 8.76875 & 0.330094 \\ 3.26214 & -1.8326 \\ 12.2062 & -0.0991938\end{array}\right)$

and

$R^T\ R = \left(\begin{array}{ll} 264.977 & 0 \\ 0 & 6.02317\end{array}\right)$

and its inverse…

${\left(R^T\ R\right)}^{-1} = \left(\begin{array}{ll} 0.00377391 & 0. \\ 0. & 0.166026\end{array}\right)$

We really want to observe that $R^T\ R\$ is the squared nonzero singular values, w1^2:

$w1 = \left(\begin{array}{ll} 16.2781 & 0. \\ 0. & 2.45421\end{array}\right)$

${w_1}^2 = \left(\begin{array}{ll} 264.977 & 0. \\ 0. & 6.02317\end{array}\right)$

(Believe me, we will pursue that later.)

We premultiply by R and post-multiply by $R^T\$, getting

$H = \left(\begin{array}{lllll} 0.203008 & -0.180451 & 0.218045 & -0.165414 & 0.233083 \\ -0.180451 & 0.327068 & -0.0827068 & 0.424812 & 0.0150376 \\ 0.218045 & -0.0827068 & 0.308271 & 0.0075188 & 0.398496 \\ -0.165414 & 0.424812 & 0.0075188 & 0.597744 & 0.180451 \\ 0.233083 & 0.0150376 & 0.398496 & 0.180451 & 0.56391\end{array}\right)$

Suppose we now try this on one of the columns of the X matrix. Here’s the first column:

$x_1 = \left(\begin{array}{l} 2 \\ 1 \\ 4 \\ 3 \\ 6\end{array}\right)$

And here’s H applied to it:

$H\ x_1 = \left(\begin{array}{l} 2. \\ 1. \\ 4. \\ 3. \\ 6.\end{array}\right)$

He would say, we regenerated the input. I would say, our regression has been perfect. (My suspect was known to lie in the 2D space. because it was a column of X.) Let’s try his favorite suspect in general: all 1’s. Here’s our test vector…

$x_2 = \left(\begin{array}{l} 1 \\ 1 \\ 1 \\ 1 \\ 1\end{array}\right)$

and here’s H applied to it:

$H\ x_2 = \left(\begin{array}{l} 0.308271 \\ 0.503759 \\ 0.849624 \\ 1.04511 \\ 1.39098\end{array}\right)$

Now that’s quite a different outcome. This $\hat{x}\$ is not close to x = x2. We did not regenerate the input. This is not a good regression. That vector x2 is not in the 2D subspace, not by a long shot.

His next favorite test is the basis vectors. We could apply H to each of the basis vectors and see what we get. But we know that the images under H of the basis vectors are the columns of H, so what we would get is the columns of H. Let’s do some serious rounding and just see if any of the columns of H look like the appropriate basis vectors. (That is, for example, does the second column look like the second basis vector, i.e. 1 on the diagonal and zero elsewhere?)

$H \approx\ \left(\begin{array}{lllll} 0 & 0 & 0 & 0 & 0 \\ 0 & 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0.5 & 0 & 0.5 \\ 0 & 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0.5 & 0 & 0.5\end{array}\right)$

I would say not: never mind the diagonals, the columns have either zero or two large entries, instead of one large entry.

In summary, we can compute $\hat{x}\$. But what’s going on? As we see this more clearly, we will discover three things.

1. we can make a significant computational simplification.
2. we can make a significant conceptual breakthrough.
3. we can dispense with the hat matrix H completely.

But all of that depends on having u and $u_1\$, that is, all of it depends on using the SVD in both forms:

$X = u\ w\ v^T = u_1\ w_1\ {v_1}^T$

We must know that $R = u_1\ w_1\$, not just that $R = X\ v_1\$.