## introduction

For a good reason which I have not yet discussed, Malinowski wants to find x vectors which are close to $\hat{x} = H\ x$ vectors. (His x and $\hat{x}$ are usually written y and $\hat{y}$ for a least-squares fit). He finds a possible x and tests it to see if xhat is close. He recommended computing an intermediate t vector which was the $\beta$ for his least-squares fit to x.

Since he seems to care about t only when $\hat{x}$ is close to x, and since $\hat{x}$ is incredibly easy to compute directly, I prefer to delay the computation of t. Find t after we’ve found a good x.

It will also turn out that he wants a collection of t vectors in order to pick a nicer basis than u or u1. And I’m not going to follow him there, because all of that is what practitioners call “non-orthogonal rotations”. (That strikes me as an oxymoron.) It’s what Harman spends most of his book doing, and that’s where I’ll look if I ever I want to. It’s important, but I’m not going to look at it this time around.

Anyway, we factored the data matrix

$X = \left(\begin{array}{lll} 2 & 3 & 4 \\ 1 & 0 & -1 \\ 4 & 5 & 6 \\ 3 & 2 & 1 \\ 6 & 7 & 8\end{array}\right)$

as X = R C, with R = u1 w1 and $C = v1^T\$.

In addition, I’m keeping the full SVD handy: $X = u\ w\ v^T\$.

For partial reference, we had

$R = \left(\begin{array}{ll} 5.33135 & 0.759381 \\ -0.175256 & -1.40331 \\ 8.76875 & 0.330094 \\ 3.26214 & -1.8326 \\ 12.2062 & -0.0991938\end{array}\right)$

$u = \left(\begin{array}{lllll} 0.327517 & 0.309419 & -0.813733 & 0.257097 & -0.262167 \\ -0.0107664 & -0.571797 & -0.464991 & -0.668451 & 0.0994427 \\ 0.538684 & 0.134501 & 0. & 0. & 0.831703 \\ 0.200401 & -0.746715 & 0. & 0.634172 & -0.00904025 \\ 0.749851 & -0.0404178 & 0.348743 & -0.291376 & -0.479133\end{array}\right)$

$u_1 = \left(\begin{array}{ll} 0.327517 & 0.309419 \\ -0.0107664 & -0.571797 \\ 0.538684 & 0.134501 \\ 0.200401 & -0.746715 \\ 0.749851 & -0.0404178\end{array}\right)$

(u1 is the first two columns of u.)

We computed the hat matrix H directly as

$H = R\ \left(R^T\ R \right)^{-1}\ R^T\$,

and we saw a few examples of computing $\hat{x} = H\ x\$. In particular, we applied H to a vector of all 1s and got

$\hat{x} = \left(\begin{array}{l} 0.308271 \\ 0.503759 \\ 0.849624 \\ 1.04511 \\ 1.39098\end{array}\right)$

=

$\left(\begin{array}{lllll} 0.203008 & -0.180451 & 0.218045 & -0.165414 & 0.233083 \\ -0.180451 & 0.327068 & -0.0827068 & 0.424812 & 0.0150376 \\ 0.218045 & -0.0827068 & 0.308271 & 0.0075188 & 0.398496 \\ -0.165414 & 0.424812 & 0.0075188 & 0.597744 & 0.180451 \\ 0.233083 & 0.0150376 & 0.398496 & 0.180451 & 0.56391\end{array}\right)$ $\left(\begin{array}{l} 1 \\ 1 \\ 1 \\ 1 \\ 1\end{array}\right)$

I closed by saying that I could simplify the computation, clarify the concept, and even dispense with H if I chose.

Let’s see all that.

## computational simplification

We saw that $R^T\ R$ was quite simple (diagonal, just the square of w1) when we computed it. First, let’s confirm the simplicity of $R^T\ R$. Since

$R = u_1\ w_1\$,

$R^T\ R = \left(w_1\ {u_1}^T\right)\ \left(u_1 w_1\right) = {w_1}^2\$,

because ${u_1}^T\ u_1 = I$ (u1 is orthonormal although not orthogonal).

Let’s take a look at the H matrix. We have

${\left(R^T\ R\ \right)}^{-1} = w_1^{-2}$ because w1 is diagonal.

Now we can expand the definition of H:

$H = R\ \left(R^T\ R\right)^{-1}\ R^T\$

$= \left(u_1\ w_1 \right) w_1^{-2}\ \left(w_1\ {u_1}^T \right) = u_1\ {u_1}^T\$.

In other words, if we want the matrix H, we just compute $u_1 {u_1}^T\$. That’s a lot easier than using the normal equations. (I should probably say that for regression, the hat matrix is defined as $H = X\ \left(X^T\ X \right)^{-1}\ X^T\$; it simplified to $u_1 {u_1}^T\$ here because R was special.)

## conceptual simplification

Fine, we have simplified the computation of H. This is about the time that I decided to find the matrix of H wrt the new basis u (not u1). I will confess that the hat matrix is a new friend of mine, not an old one; it took me a while to recognize him. The following computation helped a lot.

To transform H to the u basis, we would compute

$P = u^{-1}\ H\ u$

i.e.

$P = u^T\ H\ u$

because u, unlike u1, is orthogonal. We get

$P = \left(\begin{array}{lllll} 1. & 0 & 0 & 0 & 0 \\ 0 & 1. & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0\end{array}\right)$

OMG. Now that is a projection operator, in all its glory. It’s the identity on a 2D subspace, and the zero operator elsewhere.

It projects any vector onto the 2D subspace spanned by the data. All components other than the first two are mapped to zero. (If w1 were 3×3, we would have three 1’s, and be projecting onto a 3D subspace, and so on.)

It turns out that the hat matrix H is always a projection operator; it doesn’t require the special form $H = u_1\ {u_1}^T\$.

Incidentally, there is a very simple test for whether a matrix A represents a projection operator: a matrix is idempotent, $A^2 = A\$, if and only if the matrix represents a projection operator. And if A is idempotent then so is any similar matrix; i.e. if the linear operator is a projection, then all of its matrix representations are idempotent. It’s easy to show that: if B is similar to an idempotent matrix A,

$B = Q^{-1}\ A\ Q$

then B is idempotent too:

$B^2 = \left(Q^{-1}\ A\ Q\ \right) \left(Q^{-1}\ A\ Q \right) = Q^{-1}\ A^2\ Q = Q^{-1}\ A\ Q = B\$.

We can show that the hat matrix is always a projection operator.

I personally find it far more revealing, for this application, that $\hat{x}$ is the projection of x into the 2D subspace than that it is the result of a least-squares fit. It is both, of course; I just prefer seeing H as a projection operator. That speaks to me.

So, although I computed $H = R\ \left(R^T\ R\right)^{-1}\ R^T\$, I could (and should!) have computed

$H = u_1\ {u_1}^T\$.

or, we could have computed H from that nice simple projection operator P:

$H = u\ P\ u^T\$.

Yes, I found P from H, but now that we know that H has that form wrt the u basis, we could define P and compute H from it. Nevertheless, although the projection P is visually stunning, it’s not an easier way to compute H. I would still compute H from u1.

But only if I had some use for H.

## H is convenient, not essential.

Tell me again: what is xhat? It’s the projection of x into the 2D subspace spanned by u1.

So what would happen if we took x, found its components wrt the u basis, and zeroed out the components outside the subspace?

We would have the new components of xhat.

We would also have seen explicitly how far from the subspace x was. Then we could get the old components of xhat. (This works because u1 is a basis for the 2D subspace, and u1 is a subset of u.)

For an example, let’s take his all-1s vector. Get its new components by applying the inverse transition matrix $u^{-1} = u^T\$, and we get

$\left(\begin{array}{lllll} 0.327517 & -0.0107664 & 0.538684 & 0.200401 & 0.749851 \\ 0.309419 & -0.571797 & 0.134501 & -0.746715 & -0.0404178 \\ -0.813733 & -0.464991 & 0. & 0. & 0.348743 \\ 0.257097 & -0.668451 & 0. & 0.634172 & -0.291376 \\ -0.262167 & 0.0994427 & 0.831703 & -0.00904025 & -0.479133\end{array}\right)$ $\left(\begin{array}{l} 1 \\ 1 \\ 1 \\ 1 \\ 1\end{array}\right)$

=

$\left(\begin{array}{l} 1.80569 \\ -0.91501 \\ -0.929981 \\ -0.0685591 \\ 0.180805\end{array}\right)$

These are the new components of x; we see that the 4th component is very small, the 5th is small, but the 3rd component is as large as the second. This vector isn’t close to being in the 2D subspace. Now zero out all but the first two components and we have the new components of xhat

$s2 = \left(\begin{array}{l} 1.80569 \\ -0.91501 \\ 0 \\ 0 \\ 0\end{array}\right)$

– but we want the old components of xhat. Fine, the old components of xhat are found by applying the transition matrix u …

$\left(\begin{array}{lllll} 0.327517 & 0.309419 & -0.813733 & 0.257097 & -0.262167 \\ -0.0107664 & -0.571797 & -0.464991 & -0.668451 & 0.0994427 \\ 0.538684 & 0.134501 & 0. & 0. & 0.831703 \\ 0.200401 & -0.746715 & 0. & 0.634172 & -0.00904025 \\ 0.749851 & -0.0404178 & 0.348743 & -0.291376 & -0.479133\end{array}\right)$

to s2, and we get exactly what we computed in the previous post by applying H to x:

$\hat{x} = \left(\begin{array}{l} 0.308271 \\ 0.503759 \\ 0.849624 \\ 1.04511 \\ 1.39098\end{array}\right)$

Because we understand that xhat is the projection of x onto the 2D subspace, and because u was designed to include a subbasis for that 2D subspace, we can compute xhat without using H.

As it happens, I find it easier to compute and use H for finding xhat, but I also really like knowing that I can get additional insight by looking at the new components of x, especially the ones outside the 2D subspace.

In summary then, if I must compare x with $\hat{x}\$, I would compute $H = u_1\ {u_1}^T\ \text{and } \hat{x} = H\ x\$. In the case that x and xhat are not close, that’s when I would use $u^T$ to look at the new components of x, to see the components outside the subspace.