OK, so we orthogonalized the Hald data, including the constant (the column of 1s).
What’s the relationship between the new variables and the old? We might someday get a new observation, and if we were using the fit to the orthogonalized data, we might want to see what it predicts for a new data point.
(In all honesty, I would use the original fit – but I still want to know what the relationship is.)
My notation is a little awkward. I’m going to stay with what is used for this post, in which I first showed how to find….
Let me start fresh. If we have two typical data matrices (i.e. taller than wide), and they are supposed to be the same data, how do we find the relationship?
And let me rephrase that. If each row of one matrix is related to the corresponding row of another matrix by a change-of-basis (i.e. by a transition matrix), how do we find the transition matrix?
In addition, I claim that if there is a transition matrix relating the rows, then the columns of the two matrices span the same subspace. If we had, for example, two matrices with 4 columns and 13 rows, and if there were one transition matrix relating all the rows, then the 4 columns of the one matrix would span the same 4D-subspace of R^13 as the 4 columns of the other matrix.
I used that fact in this post about such a transition matrix, but I never actually proved it.
Let me say even more. The existence of a transition matrix relating each row of one matrix to the corresponding row of the other matrix… that’s a fairly simple relationship… each observation in one matrix is just a change-of-basis of the corresponding observation in the other matrix. That’s one way of saying they’re the same data: row by row, they are related by one change of basis.
But when we look at columns, we’re interested in a different relationship. The four columns of the Hald data span a 4D subspace of R^13 (since there are 13 rows). Do the four columns of the orthogonalized data span the same 4D subspace? That, in turn, is equivalent to asking if each column of one matrix is a linear combination of the columns of the other matrix. That’s another way of saying they’re the same data.
One of the things I will prove in this post is that the two relationships are equivalent: if they are the same data in one sense then they are the same data in the other sense. The equation saying that corresponding rows are related by one transition matrix can be interpreted as saying that the columns of one matrix are a linear combination of the columns of the other.
On the other hand, I will also show in this post that the Hald data and its orthogonalization are not so related: orthogonalizing the data is not, in general, a change of basis. But they almost are, and we will – miraculously – find the relationship.
Let me emphasize this: I am going to prove something about the linear relationship
K’ = T J’ or equivalently K = J T’
which does not apply to the affine relationship
K = J T’ + M
which relates the raw and orthogonalized data.
Back to what I was saying: my notation is a little awkward, but I want to retain the original notation from this post.
We suppose we have two data matrices, J and K, and that each row of K is gotten from the corresponding row of J by a transition matrix T. Then we have
K’ = T J’
Recall that a transition matrix maps from new to old, so J is new and K is old. (It would have been better that J be old, but… too late.)
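Transposing, we get
K = J T’
We would like to solve for the transpose of T, but J is not square, hence not invertible. So we use the appropriate pseudo-inverse (the left inverse of a tall matrix with independent columns) and write
T’ = (J’J)^-1 J’ K
But then, for the Hald data (K = X) and the orthogonalized data (J = Z without its constant column), we get
T’ = J’ K because J’J = I.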
Here are K and J:
Now we compute T’ = J’K and then transpose to get T.
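Here is a minimal sketch of that computation in Python with NumPy (not the Mathematica I used for the post). It assumes a random 13×4 stand-in for K, not the actual Hald values, and it uses a QR decomposition to orthonormalize the design matrix, which agrees with Gram-Schmidt up to the signs of the columns. All the variable names are mine.

```python
import numpy as np

# Stand-in data: 13 rows, 4 columns (same shape as the Hald data, NOT its values).
rng = np.random.default_rng(0)
K = rng.uniform(1.0, 70.0, size=(13, 4))

# Design matrix: a column of 1s followed by the data.
D = np.column_stack([np.ones(13), K])

# Orthonormalize the columns of the design matrix (reduced QR gives a 13x5 Z).
Z, _ = np.linalg.qr(D)

# Drop the first column (the orthonormalized constant) to get J.
J = Z[:, 1:]

# Because J'J = I, the pseudo-inverse of J is just J', so T' = J'K.
T_prime = J.T @ K
T = T_prime.T

print(np.allclose(np.linalg.pinv(J), J.T))  # is the pseudo-inverse really J'?
print(T)
```

The point of the check on the pseudo-inverse is only that the general formula T’ = (J’J)^-1 J’ K collapses to J’K when the columns of J are orthonormal.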
Now, computing T’ by that formula is all well and good – but we do not know that J and K are in fact related by K = J T’. To put that another way, T’ = J’K does not imply that K = JT’.
In fact, they’re not equal: K ≠ J T’.
Well, well, well. The difference is constant within each column.
So it turns out that the relationship is
K = J T’ + M
or equivalently
K’ = T J’ + M’.
You may not recognize the constants in M, but I did: they’re the means of the columns, i.e. what was subtracted to center the data.
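The same thing can be checked numerically. This is a sketch in Python with NumPy under the same assumptions as before: a random 13×4 stand-in for K (not the Hald values) and QR in place of Gram-Schmidt. It computes what the linear formula misses, K – J T’, and compares it to a matrix whose rows are all the vector of column means.

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.uniform(1.0, 70.0, size=(13, 4))     # stand-in data, not the Hald values

# Orthonormalize the design matrix (1s column first), then drop the constant column.
Z, _ = np.linalg.qr(np.column_stack([np.ones(13), K]))
J = Z[:, 1:]
T_prime = J.T @ K

# What the purely linear formula K = J T' leaves out.
residual = K - J @ T_prime

# M: every row is the vector of column means of K.
M = np.tile(K.mean(axis=0), (13, 1))

print(np.allclose(residual, M))
```

If the affine relationship K = J T’ + M holds with M made of column means, the comparison should come out true for any data matrix of full column rank, not just this one.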
So how did we get T right?
We lucked out, in a manner of speaking.
Remember that we created Z by orthogonalizing everything in sight, including a column of 1s. J was cut down from Z by dropping the first, constant, column – but each column of J is still orthogonal to any constant column… and so, each column of J is orthogonal to each column of M:
That is, the true relationship between J and K is
K = J T’ + M
so let’s solve for T’ again.
K – M = J T’
We use the pseudo-inverse again and write
T’ = (J’J)^-1 J’ (K – M)
= J’K – J’M because J’J = I (the columns of J are mutually orthonormal)
= J’K because J’M = 0.
So we computed T’ and T correctly even though the relationship was affine rather than linear (J = 0 does not imply K = 0; the origin is not preserved.) But it would not have worked for a general affine transformation; it required J’M = 0. It would not have worked if we had only orthogonalized the data matrix rather than the design matrix and it would not work for PCA (principal component analysis) – unless, I’m pretty sure, the original data were in fact centered.
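To see that J’M = 0 really is what matters, here is a sketch in Python with NumPy using made-up matrices (nothing here is the Hald data, and T_prime_true, J_bad and J_good are hypothetical names). It builds K = J T’ + M twice: once with a J whose columns are orthonormal but not orthogonal to a constant column, and once with a J that is.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 13

T_prime_true = rng.normal(size=(4, 4))     # hypothetical T'
M = np.tile(rng.normal(size=4), (n, 1))    # constant within each column

# Case 1: orthonormal columns, but NOT orthogonal to a constant column.
J_bad, _ = np.linalg.qr(rng.normal(size=(n, 4)))
K_bad = J_bad @ T_prime_true + M
error_bad = J_bad.T @ K_bad - T_prime_true   # equals J'M, which is not zero

# Case 2: orthonormalize with a leading column of 1s, then drop it.
Z, _ = np.linalg.qr(np.column_stack([np.ones(n), rng.normal(size=(n, 4))]))
J_good = Z[:, 1:]
K_good = J_good @ T_prime_true + M
error_good = J_good.T @ K_good - T_prime_true  # J'M = 0, so this vanishes

print(np.abs(error_bad).max())
print(np.abs(error_good).max())
```

In the first case the formula J’K returns T’ + J’M rather than T’; in the second the extra term drops out, up to rounding.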
We got T correctly because J’M = 0 because each column of J was orthogonal to a constant column.
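For the record, here is that argument written out, as a sketch. The symbols \mathbf{1} (the column of thirteen 1s) and \mu (the column vector of column means) are my additions; everything else is the notation above, with a prime for transpose.

```latex
% Every column of M is a multiple of the column of 1s:
M = \mathbf{1}\,\mu'

% Each column of J is orthogonal to the constant column:
J'\mathbf{1} = 0
\quad\Longrightarrow\quad
J'M = (J'\mathbf{1})\,\mu' = 0

% So the pseudo-inverse solution reduces to the linear formula:
T' = (J'J)^{-1} J'(K - M) = J'K - J'M = J'K
```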
Now, can I show that the columns of J and of K span the same subspace if the rows are related by a transition matrix?
So we’re asking if the columns of K are linear combinations of the columns of J. Treat the 4 columns of J as a basis for a 4D subspace of R^13… then the columns of K span the same 4D subspace if each column of K is a linear combination of the basis (and the columns of K are themselves independent, which they are because a transition matrix T is invertible).
But isn’t that what we have when we write
K = J T’ ?
Write it out with indices (summing over the repeated index p):
K_mn = J_mp t’_pn
Specialize to the first column of K:
K_m1 = J_mp t’_p1
Write it out explicitly:
K_m1 = J_m1 t’_11 + J_m2 t’_21 + J_m3 t’_31 + J_m4 t’_41
Now let uppercase J1, J2, J3, J4 stand for the columns of J and let K1 be the first column of K (so we’re getting rid of the subscript m):
K1 = t’_11 J1 + t’_21 J2 + t’_31 J3 + t’_41 J4
The first column of K is a linear combination of the columns of J. Further, the coefficient of the 2nd column J2 of J is t’_21, the (2,1) entry of T’… the coefficient of the 3rd column J3 of J is the (3,1) entry of T’. That is, the coefficients on the RHS are the 1st column of T’, i.e. the 1st row of T.
So, yes: the equation K = J T’ can be interpreted as saying that the columns of K are a linear combination of the columns of J, the coefficients for each column of K being the corresponding row of T.
Let’s confirm that.
Unfortunately, we can’t use the J and K we have, because they don’t satisfy K = J T’.
No problem. Redefine K:
(Don’t panic. I’m not messing with the orthogonalized data Z; I’m just creating an example of the equation K = J T’, using the T and J I’ve got at hand.)
Hmm. Maybe I shouldn’t have been so clever. KT[[1]], for example, is literally the first row of K’… and, equivalently, the first column of K. Similarly, T[[1]] is literally the first row of T – hence the first column of T’. So, although I didn’t literally write that J times the first column of T’ was equal to the first column of K… I wrote something equivalent. (Mathematica is happier dealing with rows than with columns. Or maybe it’s just that I’m happier with what Mathematica does.)
Alternatively, we could have tried to solve for coefficients that make K1 a linear combination of the columns of J. That’s the straightforward thing to do: prove it’s a linear combination by finding the coefficients of the combination. I clear a-d… write a set of equations in those four unknowns… solve… and compare the solution to the first row of T. Got it!
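That last check is easy to sketch in Python with NumPy. The J and T below are made-up stand-ins (not the matrices from the Hald data), K is redefined as J T’ exactly as above, and the four unknowns a-d become the vector coeffs, found by least squares.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in J with orthonormal columns and a hypothetical transition matrix T.
J, _ = np.linalg.qr(rng.normal(size=(13, 4)))
T = rng.normal(size=(4, 4))

# Redefine K so that K = J T' holds exactly.
K = J @ T.T

# Solve J @ coeffs = K1 for the four unknown coefficients (a, b, c, d in the text).
K1 = K[:, 0]
coeffs, *_ = np.linalg.lstsq(J, K1, rcond=None)

print(coeffs)
print(T[0])   # first row of T, i.e. first column of T'
```

The two printed vectors should agree: the coefficients that build the first column of K out of the columns of J are the first row of T.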
The orthogonalized data J is related to the original Hald data by an affine (origin-shifting but otherwise linear) relationship:
K = J T’ + M.
From that relationship, we compute T’ = J’K and verify the relationship. It was crucial that J’M = 0, i.e. that the columns of J were orthogonal to a constant column – and we had imposed that condition by orthogonalizing the design matrix with its column of 1s.
And we showed that the purely linear relationship
K = J T’,
which says that each row of K is related to the corresponding row of J by the transition matrix T (describing a change of basis)… can be interpreted as meaning that each column of K is a linear combination of the columns of J, with the corresponding row of T (i.e. column of T’) being the coefficients in the combination. Just remember that this linear case is not the affine relationship between the raw data and the orthogonalized data. I was just clearing up some old business for the linear case while looking at the affine case:
K = J T’ + M,
because T is the same.