## Transition matrix: to be or not to be

Cohen (“Visual Color & Color Mixture”, see the bibliography) did something very interesting. In fact, he did something useful which I had never seen before.

Although this post uses some matrices which we saw in the color posts, I think this can stand on its own: you need not have read the color posts. But if you are specifically interested in color, or in Cohen’s work, this post is very relevant.

He was trying to describe how to find a transition matrix between two given data matrices. This will come in handy — very handy! — whenever people give the alleged result of an unspecified linear transformation of a data matrix.

On p. 93, he wrote

D T = C in general

F T = A in particular

with

$T = (D'D)^{-1} D'\ C\$.

Because D’D (that is, $D^T\ D\$) is invertible, we infer that D and C are taller than wide, like the A matrix… like our usual data or design matrix X.

We also infer that T is a small matrix: its size is cxc, where c is the number of columns of D. D and C are the same size.

I do have to point out that in his equation (and his notation)

D T = C,

T is an attitude matrix! I will change that. (Trust me: you want me to change that.)

Since we have both XYZ (i.e. 1931 xyz bar) and RGB (1931 rgb bar) tables, and since they are supposed to be related by the transformation matrix T31…

$T31 = \left(\begin{array}{ccc} 2.76888 & 1.75175 & 1.13016 \\ 1 & 4.5907 & 0.0601 \\ 0 & 0.05651 & 5.59427\end{array}\right)$

… we should try the idea embodied in that equation for T. Assuming that there is a transition matrix T, let J and K stand for the XYZ and RGB matrices respectively.

As usual, let me use 20 nm intervals instead of the published 5 nm intervals. We have

$J = \left(\begin{array}{ccc} 0.00003 & -0.00001 & 0.00117 \\ 0.0003 & -0.00014 & 0.01214 \\ 0.00211 & -0.0011 & 0.11541 \\ -0.00261 & 0.00149 & 0.31228 \\ -0.02608 & 0.01485 & 0.29821 \\ -0.04939 & 0.03914 & 0.14494 \\ -0.07173 & 0.08536 & 0.04776 \\ -0.09264 & 0.17468 & 0.01221 \\ -0.03152 & 0.21466 & 0.00146 \\ 0.0906 & 0.19702 & -0.0013 \\ 0.24526 & 0.1361 & -0.00108 \\ 0.34429 & 0.06246 & -0.00049 \\ 0.29708 & 0.01828 & -0.00015 \\ 0.15968 & 0.00334 & -0.00003 \\ 0.05932 & 0.00037 & 0 \\ 0.01687 & 0.00003 & 0 \\ 0.0041 & 0 & 0 \\ 0.00105 & 0 & 0 \\ 0.00025 & 0 & 0 \\ 0.00006 & 0 & 0 \\ 0 & 0 & 0\end{array}\right)$

$K = \left(\begin{array}{ccc} 0.0014 & 0. & 0.0065 \\ 0.0143 & 0.0004 & 0.0679 \\ 0.1344 & 0.004 & 0.6456 \\ 0.3483 & 0.023 & 1.7471 \\ 0.2908 & 0.06 & 1.6692 \\ 0.0956 & 0.139 & 0.813 \\ 0.0049 & 0.323 & 0.272 \\ 0.0633 & 0.71 & 0.0782 \\ 0.2904 & 0.954 & 0.0203 \\ 0.5945 & 0.995 & 0.0039 \\ 0.9163 & 0.87 & 0.0017 \\ 1.0622 & 0.631 & 0.0008 \\ 0.8544 & 0.381 & 0.0002 \\ 0.4479 & 0.175 & 0. \\ 0.1649 & 0.061 & 0. \\ 0.0468 & 0.017 & 0. \\ 0.0114 & 0.0041 & 0. \\ 0.0029 & 0.001 & 0. \\ 0.0007 & 0.0003 & 0. \\ 0.0002 & 0.0001 & 0. \\ 0. & 0. & 0.\end{array}\right)$

In case you have not read the color posts, all I am saying is that J and K and T31 are given matrices, and T31 is supposed to be a transition matrix between J and K.

We do know — or take it as given — that for the transition matrix T31, old is XYZ ~ K, so let me write

K’ = T J’

That says T is a transition matrix which, applied to each column vector of J’, delivers the corresponding column of K’.

Transpose:

K = J T’

and then pretend $J^{-1}$ exists (even though J is rectangular)

$T' = J^{-1} K \$

and then substitute the appropriate pseudo-inverse:

$T' = (J'J)^{-1} J'\ K\$.

(The inappropriate pseudo-inverse would use JJ’ instead — which cannot be of full rank, given the assumed shape of J, hence cannot be invertible.)

That equation defined the transpose T’, but here’s T:

$T = \left(\begin{array}{ccc} 2.76887 & 1.75176 & 1.13013 \\ 1.00002 & 4.59072 & 0.06008 \\ 0.00004 & 0.05661 & 5.59443\end{array}\right)$

Recall the given T31:

$T31 = \left(\begin{array}{ccc} 2.76888 & 1.75175 & 1.13016 \\ 1 & 4.5907 & 0.0601 \\ 0 & 0.05651 & 5.59427\end{array}\right)$

By doing it my way, I constructed the counterpart to T31, which is the published transition matrix between XYZ and RGB (1931) with XYZ considered the “old” basis.

T and T31 are very close, but not the same.

Even if I used the full tables, at 5 nm intervals, we would not get T31.

First, I am certain that if there were a transition matrix T, then we must have computed it. Since this computation did not yield T31, I infer that T31 is not the transition matrix.

Second, is T the transition matrix? We should apply T to the appropriate table and see what we get. We had assumed

K’ = T J’, or equivalently

K = J T’

so let’s compute a counterpart to K, namely Q = J T’:

$Q = \left(\begin{array}{ccc} 0.00139 & 0.00005 & 0.00654 \\ 0.01431 & 0.00039 & 0.06791 \\ 0.13434 & 0.00399 & 0.64559 \\ 0.3483 & 0.02299 & 1.74711 \\ 0.29082 & 0.06001 & 1.66915 \\ 0.09561 & 0.139 & 0.81307 \\ 0.00489 & 0.323 & 0.27202 \\ 0.06329 & 0.71 & 0.07819 \\ 0.29041 & 0.95401 & 0.02032 \\ 0.59452 & 0.99499 & 0.00388 \\ 0.91629 & 0.87 & 0.00167 \\ 1.06216 & 0.631 & 0.00081 \\ 0.85443 & 0.38099 & 0.00021 \\ 0.44795 & 0.17501 & 0.00003 \\ 0.1649 & 0.06102 & 0.00002 \\ 0.04676 & 0.01701 & 0 \\ 0.01135 & 0.0041 & 0 \\ 0.00291 & 0.00105 & 0 \\ 0.00069 & 0.00025 & 0 \\ 0.00017 & 0.00006 & 0 \\ 0 & 0 & 0\end{array}\right)$

and compare it to K, via Q – K:

(I keep using pictures for things like that, namely scientific notation, just because it’s easier than editing the LaTeX.)

The largest entry in absolute value is… 0.000069765 . The three columns of Q look like this:

Pretty small, and pretty random, but still not really zero. Well, maybe it wouldn’t be. Let’s try our magic formula on J and Q, since they are related by the transition matrix T. From

$T' = (J'J)^{-1} J'\ K\$,

I compute what I call t’:

$t' = (J'J)^{-1} J'\ Q\$,

and then display the transpose of the transpose, t” = t:

$t = \left(\begin{array}{ccc} 2.76887 & 1.75176 & 1.13013 \\ 1.00002 & 4.59072 & 0.0600843 \\ 0.0000430892 & 0.0566067 & 5.59443\end{array}\right)$

Recall T:

$T = \left(\begin{array}{ccc} 2.76887 & 1.75176 & 1.13013 \\ 1.00002 & 4.59072 & 0.0600843 \\ 0.0000430892 & 0.0566067 & 5.59443\end{array}\right)$

So. That equation for T’ does recover the transition matrix if there is one. The discrepancy between T and T31 is real.

Maybe I’m making too big a deal of this. I infer that the XYZ and RGB tables do not span exactly the same subspace. They almost do, but not quite.

To be more explicit: the 1931 XYZ (xbar, ybar, zbar) and RGB (rbar, gbar, bbar) tables are not exactly related by any transition matrix whatsoever, although the published T31 is close to being one for them.

I have found a marvelous example — disappointing as a published result but marvelous as a bad example — where the discrepancy is significant. I expect to show it in another post, soon.

We’ve been looking at the computation in one way: if there is a transition matrix T such that

K’ = T J’

then

$T' = (J'J)^{-1} J'\ K\$.

What if there is no such transition matrix T? We can still define T by that equation, and compute $(J'J)^{-1} J'\ K\$ (still assuming J is of full rank, with more rows than columns), but what are we getting, if T itself does not exist as a transition matrix?

That is, what if we compute T, and find that

$K' \ne T\ J'\$?

Well, this will be clearer in the next post, but…. That equation for T’ looks an awful lot like the normal equations for a least-squares fit.

What I will illustrate in the next post is that $(J'J)^{-1} J'\ K\$ generates a transition matrix to the subspace spanned by J. That is, T is a transition matrix — any invertible matrix is! — but it relates J to a subspace other than (the one spanned by) K.

(In our case, we had 21 observations; the 3 columns of J and of K define 3D subspaces of $R^{21}\$. Those two 3D subspaces are almost the same, but not exactly.)

Let me try to explain here it without a good (I mean, bad) example. If J and K span the same subspace, then any one column of K (think of it as the dependent variable y) can be written as a linear combination of the columns of J (think of it as the X matrix).

But if J and K do not span the same space, then the best we can do is to find that linear combination (call it yhat !) of columns of X which is as close to y as possible.

They really are examples of y and yhat and X.

And what is T’ ? Well, if we write it with y and X…

$(X'X)^{-1} X'\ y\$

we might recognize that as $\beta$ in ordinary least squares.

Let me show you that, while providing another example of the utility of the pseudo-inverse.

The derivation of the normal equations for ordinary least squares is fairly complicated, if only because it involves the derivative of a matrix. But the pseudo-inverse would let us recover the equations themselves.

Deriving them is one thing; recovering the form of them once we know they exist is another.

We write our model (this is a quick & dirty recollection of the answer, not a derivation):

$y = X \ \beta\$.

(I guess I have to talk about that. True is $yhat = X\ \beta\$. But I write y instead of yhat, as though we have equality rather than a projection onto a subspace.)

We pretend that X is invertible, and write

$\beta = X^{-1} y\$.

Since X is not invertible, but X’X is (by assumption X is of full rank with more rows than columns), we replace the non-existent inverse by the pseudo-inverse and write

$\beta = (X'X)^{-1} X'\ y\$.

The real derivation gave us that very answer. Once I know that, I never need to compare them again. I have a plausibility argument that gives me the known right answer.

This is analagous to confirming in the previous color post, among others, that if I compute

$E' = (A'A)^{-1} A'\$

then I have a dual (or reciprocal or biorthogonal) basis E such that

E’A = I.

In that case, I prefer to verify it computationally every time.

Incidentally, this trickery with the pseudo-inverse would also generate a true result if I had started with yhat instead of y — it is true that $\beta = (X'X)^{-1} X'\ \hat{y}\$, but it’s not very useful because we don’t know yhat!

You might have noticed that $\beta$ corresponds, algebraically, to a column of T’. Asking if the first column of K is a linear combination of all the columns of J gives us the first column of T’; asking if the ith column of K is a linear combination of all the columns of J gives us the ith column of T’. This, too, will be in the next related post.

## Summary

Given two data matrices J and K, tall and thin, we can define and compute

$T' = (J'J)^{-1} J'\ K\$.

If the individual observations are in fact related by a transition matrix M, so that

K’ = M J’,

then our computed T is M:

T = M.

This is pretty slick. Any time someone hands us some data and the alleged result of applying an unspecified linear transformation to it — we can find that linear transformation if it exists, or demonstrate that it does not exist (in which case, they goofed).

And if there is no such matrix M? Well, it’s simple enough: if there is no such M — and J and K are both of full rank — then our computed T must be something else, so we do not have equality:

$K' \ne T\ J'\$.

That’s all it takes for us to know that there is no transition matrix M between J’ and K’. If there were, it would be T, but that didn’t work.

(That’s if J and K are both of full rank; if only one is not, then one of J’J or K’K is actually not invertible — as JJ’ and KK’ are not invertible — and T can only be computed in one direction. I expect that T is not invertible if J and K are not both of full rank.)

And when there is no transition matrix M, we nevertheless get a transition matrix T, but it’s between J’ and Q’ = T J’.

I will freely confess that I did not know any part of this before I read Cohen: neither that we could find the transition matrix so easily; nor what we were finding if it were not a transition matrix.

(Cohen implicitly assumes that there exists a transition M; he does not discuss what happens if there isn’t one.)

Next post on this topic, a numerical illustration.