Color: Cohen Figure 20, “an intriguing F matrix”

I struggled to make sense of pages 94-101 of Cohen’s “Visual Color and Color Mixture”.

I understand them now. Mostly.

The challenge was to derive a drawing of three basis vectors (Figure 20, p. 95), shown here :

He calls these three vectors both “an intriguing F matrix” and “This canonical orthonormal configuration….” But as far as I can tell, he never said where he got this particular one. He made it sound important, so I had to derive it. Figuring it out, as usual, was very educational.

What follows is a detective story.

I eventually decided that the most likely candidate was a “transformation matrix” on p. 101. There are five transformation matrices on that page, but the key is that exactly one of them is associated with an equal-energy illuminant — and that’s related to “EE” in the figure.

Let us proceed somewhat as I did, but without all the wrong turns and blind alleys. If you want a chance to wander those byways, then close this page, open a copy of Cohen, and go to work. OTOH, I’m about to take you down one, just one, wrong turn. Hey, it’s interesting and valuable.

First, let’s get that particular transition matrix of his. He denotes it T_E but I will just refer to it as “TE” (outside of LaTeX).

T_E = \left(\begin{array}{ccc} 0.09947 & -0.15392 & -0.55609 \\ 0.20105 & 0.33414 & 0.41574 \\ 0.11061 & -0.18021 & 0.14036\end{array}\right)

He refers to the five T’s as “direct transformation matrices from the 1964 CIE X,Y,Z color-matching functions to orthonormal F matrices.” TE is specifically related to the equal-energy illumninant; the other four are related — in the same but as yet unexplained fashion — to the standard illuminants A, B, C, and D_65. (I have done the work to confirm that.)

OK. TE relates an A matrix to an F matrix. But how? “Transformation matrix” just doesn’t tell me enough.

A has 3 columns, so we could compute A TE, i.e. post-multiply A by TE. Since T_E^T\ , T_E^{-1}\ , and T_E^{-T} are all the same size, we could use any of them in place of TE.

Well, A TE doesn’t look right — I dismissed it immediately when I first saw it — but eventually I saw the light. A TE does not produce an orthonormal matrix — but it’s only off by a factor of \sqrt{2}\ (and some change).

Let’s compute A TE and call it F….

Oh, first I had better get the A matrix.

Cohen uses the 1964 CIE tables for this, so I take the A matrix to be the 1964 XYZ (i.e. xyz bar) tables at 5 nm intervals. Not at 20 nm. My only target is a drawing, and I want a lot of detail. I plan to repeat part of these calculations, and to do some alternatives, at 20 nm in the next post. But for this one, I need detail.

I have found a set of these tables online. Let me show you, for checking purposes, a tiny subset of the XYZ tables as I store them:

\left(\begin{array}{cc} 400 & \{0.0191,0.002,0.086\} \\ 405 & \{0.0434,0.0045,0.1971\} \\ 410 & \{0.0847,0.0088,0.3894\} \\ 415 & \{0.1406,0.0145,0.6568\} \\ 420 & \{0.2045,0.0214,0.9725\} \\ 425 & \{0.2647,0.0295,1.2825\}\end{array}\right)

My A matrix has 3 columns and 61 rows. The six rows of A corresponding to that part of the XYZ64 table are:

\left(\begin{array}{ccc} 0.0191 & 0.002 & 0.086 \\ 0.0434 & 0.0045 & 0.1971 \\ 0.0847 & 0.0088 & 0.3894 \\ 0.1406 & 0.0145 & 0.6568 \\ 0.2045 & 0.0214 & 0.9725 \\ 0.2647 & 0.0295 & 1.2825\end{array}\right)

Mathematica aside.
There, I’ve finally shown you how I’m getting different A matrices whenever I change things. The table XYZ64 — like my other three tables — has wavelength for the first argument, and the Interpolation command lets me select any subsets, or even to interpolate. (It always gets the actual points right. In fact, I am usually simply getting subsets of the data, not truly interpolating. If I ever ask for 1 nm intervals, then I would be interpolating from the 5 nm table.)

For example, the following command gives me the entries for an A matrix from 400 to 500 in 20 nm intervals (and displays it nicely):

Table[Interpolation[XYZ64,i],{i,400,500,20}]//MatrixForm

\left(\begin{array}{ccc} 0.0191 & 0.002 & 0.086 \\ 0.2045 & 0.0214 & 0.9725 \\ 0.3837 & 0.0621 & 1.9673 \\ 0.3023 & 0.1282 & 1.7454 \\ 0.0805 & 0.2536 & 0.7721 \\ 0.0038 & 0.4608 & 0.2185\end{array}\right)

So, once having set up my XYZ64 table, and the others, I can get any plausible A matrix I want.

end Mathematica aside.

For this problem, my A matrix is the entire 1964 CIE XYZ table at 5 nm intervals. Now we compute

F = A TE

and plot the results. Let me put the two drawings side-by-side (mine on the left):

That black curve could be his “EE” on a different scale, but my other two are utterly different from “I” and “II”.

Let me show you the first six rows of F:

\left(\begin{array}{ccc} 0.0118144 & -0.0177697 & 0.00228112 \\ 0.027023 & -0.0406959 & 0.00540148 \\ 0.0532659 & -0.0802704 & 0.0112139 \\ 0.0895494 & -0.135158 & 0.0200304 \\ 0.132212 & -0.19958 & 0.0316765 \\ 0.174118 & -0.262005 & 0.045079\end{array}\right)

Cohen said that F would be orthonormal. It is not. We would conclude that F was orthonormal if F’F = I. So we compute F’F:

F^TF = \left(\begin{array}{ccc} 2.05018 & 0.00324299 & -0.0144556 \\ 0.00324299 & 2.00107 & 0.000405886 \\ -0.0144556 & 0.000405886 & 2.0003\end{array}\right)

Round it off:

F^TF = \left(\begin{array}{ccc} 2.05 & 0 & -0.01 \\ 0 & 2. & 0 \\ -0.01 & 0 & 2.\end{array}\right)

I have no idea why his F — equivalently, his TE — is too large by \sqrt{2}\ . (The off-diagonal terms are almost zero, so these vectors are almost orthogonal. But their squared lengths are 2, so their lengths are \sqrt{2}\ . We could have fixed that by dividing TE by \sqrt{2}\ .)

Since the book was published posthumously, perhaps the editor was editing without computing. Perhaps he took what he found in Cohen’s notes or drafts and hoped that everything was consistent.

I assure you that the other possibilities (T_E^T\ , T_E^{-1}\ , and T_E^{-T}) do not come close to delivering an orthonormal F. Feel free to try them.

We conclude that there is a minor error in TE; it turns out that the other four transformation matrices on p. 100 also need to be divided by \sqrt{2}\ . Whatever he had in mind, he was consistent. Under the circumstances — posthumous publication, and the ease of computing unit vectors — this is rather unimportant — conceptually and computationally — once we know it.

More importantly, I think, we also conclude that the F produced by TE is not the F matrix in Figure 20. They seem to have — and actually do have — proportional first columns, but that’s all.

That is a little more troubling. I could have searched for a derivation of TE; instead, I searched for a derivation of Figure 20. I don’t care to do both. Let me be explicit: I don’t know how he got his TE matrix, nor any of the other 4 transformation matrices on p. 100. I do know how he got Figure 20.

It will be more than enough work to get Figure 20. I don’ t know where TE came from. I have some ideas, and I expect that you will too by the time we finish, but I’ m done with TE. Maybe I shouldn’t be, but enough is enough.

I should emphasize, however, that I am keeping the first column of F computed from TE, at least for a little while.

Oh, let me do one last thing with TE before I abandon it.

We know the relationship between A and TE: F = A TE. How would we express that in my usual terminology? Using T for TE, we have

F = A T

and we transpose, getting

F’ = T’ A’.

That says that each column of F’ is obtained by applying the matrix T’ to the corresponding column of A’. That tells us two or three things.

First and second, T’ is the transition matrix, and therefore his “transformation matrix” TE is actually an attitude matrix; third, he takes A as the new components and F as the old.

Just so I know.

Now let’s find a way to match Figure 20.

First, where did the first column — the black curve or “EE” — come from? (Yes, we showed that F produced from TE has a first column which is proportional to the “EE” curve. But where did this common vector come from?)

He tells us that it is proportional to the fundamental of the illuminant.

Ok. To get the fundamental, we need the projection operator. To get that, we need the SVD (Singular Value Decomposition) of the color-matching functions, i.e. of the matrix A^T\ .

A’ = u w v’.

More to the point, we need the first three columns of the matrix v. (We need three columns because A’, like A, is of rank 3.)

Let me show you the first six (of 61) rows of v1:

\left(\begin{array}{ccc} -0.0126669 & 0.00772016 & 0.00299387 \\ -0.0289985 & 0.0177432 & 0.00674605 \\ -0.0572094 & 0.0351751 & 0.0129521 \\ -0.0963098 & 0.0596099 & 0.0210865 \\ -0.142331 & 0.0886581 & 0.0298013 \\ -0.187415 & 0.117288 & 0.0368556\end{array}\right)

Those 3 column vectors are an orthonormal basis for the preimage of the range of A’ (“the fundamental color space”). We form a projection operator by

R = v1 v1′

Now to get compute the fundamental of an equal-energy illuminant ee, we need to define the illuminant (I used all 100’s):

ee = {100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100}

Its fundamental f is the projection under R:

f = R ee =

{5.53525, 12.6614, 24.9595, 41.9659, 61.9674, 81.6212, 98.8882, 114.647, 125.821, 130.294, 129.298, 124.753, 116.918, 107.167, 95.2578, 80.7352, 68.1596, 60.0607, 54.8153, 53.7962, 55.5973, 59.7538, 65.4578, 72.9488, 81.1471, 88.5547, 95.6709, 102.8, 109.309, 114.255, 118.44, 122.816, 126.542, 129.052, 130.251, 129.621, 127.996, 126.529, 123.888, 119.146, 112.759, 105.059, 96.1651, 86.301, 76.0275, 65.9399, 55.8152, 45.5545, 36.3286, 28.6452, 22.185, 16.823, 12.5322, 9.19186, 6.64563, 4.72835, 3.33391, 2.32954, 1.61867, 1.12825, 0.779473}

Let me graph a scaled version of f along with the first column of F. (Scaled? I have to divide f by 470. Trust me.)

As I have done before, the black dots are one vector — in this case, the fundamental f — and the red curve is the other vector, in this case the first column of F.

We see that they are the same. Where we used to know that his TE gave us a column of F proportional to the EE curve in Figure 20, now we know what the curve is. Now I can dispense with F computed from TE, because I can use the fundamental f of the equal-energy illuminant instead.

The horizontal axis is off — I didn’t bother showing the wavelength.

That scaling by 470? Ha! The 1964 XYZ table at 1 nm intervals is over the range 360 to 830 nm. And 830-360 = 470.

(Probably “the length of the interval of integration”. Not that an integral is appropriate, in my opinion. And Cohen has pointedly been doing linear algebra, not calculus. But the literature is littered with integrals, even though the data is discrete. This is the mathematics I am skeptical of in all such works: they use calculus for their theory but linear algebra for their computations. (The example from Wyszecki & Stiles was closer to linear algebra, and the example they did two pages later is clearly not done by integration — no, I didn’t work that example.) I suspect it’s because a long time ago we pretty much restricted ourselves to continuous systems…. But I keep coming back to the fact that their data — the color matching functions — are discrete, and linear algebra is the appropriate tool. That’s my opinion.)

That’s where the 470 seems to be from, but I don’t really care. It would have made more sense for him to have used 400-700 instead of 360-830, and that’s one reason I don’t care about the scaling. Further, and more importantly, we will ultimately make a unit vector out of the fundamental f. Its scaling along the way just isn’t all that important.

And I didn’t scale the first column of F, so the \sqrt{2}\ in F is consistent with the 470 in f.

Sheesh! And I thought Fourier transforms had too many different scalings!

What interests me is that the first column of F is proprtional to f. And both appear to be proportional to the curve “EE” in figure 20. This is progress.

Now let’s get the dual basis E’. (I think I like idea of calling the rows of E’ a dual basis for the columns of A… and the columns of E a reciprocal basis for the columns of A… and, finally, the columns of E are a dual basis for the rows of A’. Row vectors and column vectors live in vector spaces which are dual to each other.)

Oh, say it compactly! The columns of E are simultaneously a reciprocal basis for the columns of Acolumns of A and a dual basis for the rows of A’. Numerically the same, conceptually different.

I recall the required computation as

E’A = I,

E' = A^{-1} = (A'A)^{-1} A'\ ,

where A^{-1} doesn’t exist, but led me to the appropriate pseudo-inverse.

Like everything else in this example, E is too big to display, but here are its first six rows:

\left(\begin{array}{ccc} 0.00128533 & -0.00131508 & 0.00240808 \\ 0.00289282 & -0.00297823 & 0.00552559 \\ 0.00554886 & -0.00576118 & 0.0109365 \\ 0.00901767 & -0.00947485 & 0.0184882 \\ 0.0127322 & -0.0135554 & 0.0274475 \\ 0.0157701 & -0.0170116 & 0.0363094\end{array}\right)

And here is a graph of the three rows of E’ (equivalently, the three columns of E):

NOTE that these are the “CIE 1964 XYZ fundamentals” on p. 163 of Cohen. Some of those resemble Figure 20… wait, E is not orthonormal, so none of them should be curves in Figure 20 (which are orthogonal, although not unit, vectors!) The resemblance is just that: not much more than vague similarity.

OK, let’s try getting an orthonormal basis, somehow.

I don’t know about you, but I want something I understand. Let’s apply Gram-Schmidt to a non-orthonormal basis. (This is almost certainly not what Cohen did. I’ll show you what I think he did in the next post. We’ll get the same answer.)

The first two non-orthonormal bases I started with were

f, E1, E2

and

f, E1, E3.

In retrospect, my third choice

f, E2, E3

seems more obvious. OTOH, more obvious or not, what matters is that it does, in fact, generate Figure 20!

As is my custom, I am going to describe Gram-Schmidt for this 3D case. The general method just extends it from 3D to n-dimensions.

We have our first vector f. Normalize it, by

f1 = \frac{f}{\sqrt{f \cdot f}}\ .

This gives us our first unit vector, f1, for the new basis.

f1 = {0.00827079, 0.0189187, 0.0372946, 0.0627057, 0.092592, 0.121959, 0.147759, 0.171307, 0.188002, 0.194686, 0.193198, 0.186406, 0.174699, 0.16013, 0.142335, 0.120635, 0.101844, 0.0897429, 0.0819053, 0.0803825, 0.0830737, 0.0892844, 0.0978074, 0.109, 0.12125, 0.132319, 0.142952, 0.153603, 0.163331, 0.170721, 0.176973, 0.183512, 0.189079, 0.192829, 0.194622, 0.19368, 0.191252, 0.18906, 0.185114, 0.178029, 0.168484, 0.15698, 0.14369, 0.128951, 0.113601, 0.0985278, 0.0833993, 0.0680677, 0.0542824, 0.0428017, 0.033149, 0.025137, 0.0187257, 0.0137345, 0.00992992, 0.00706512, 0.00498154, 0.00348081, 0.00241862, 0.00168584, 0.00116469}

Normalize E2 and E3, as we did f, just to start with vectors of the same size. Call the unit vectors e2 and e3.

e2 = {-0.00326355, -0.00739089, -0.0142972, -0.0235132, -0.0336395, -0.0422167, -0.0480022, -0.0514717, -0.0506656, -0.0450514, -0.0346783, -0.0212322, -0.00302856, 0.0178248, 0.0440051, 0.0714143, 0.0954016, 0.120945, 0.141852, 0.165312, 0.189969, 0.213125, 0.235048, 0.255043, 0.270026, 0.276282, 0.276115, 0.273313, 0.264794, 0.24833, 0.226531, 0.201183, 0.171297, 0.136324, 0.0977252, 0.0577811, 0.0185254, -0.0185732, -0.0524476, -0.0805823, -0.102238, -0.116778, -0.124173, -0.124757, -0.11956, -0.109979, -0.0974696, -0.0829623, -0.0686652, -0.0558823, -0.0443658, -0.0340783, -0.0255961, -0.0189148, -0.013768, -0.00982428, -0.00696574, -0.0048783, -0.00340378, -0.00233611, -0.0016479}

e3 = {0.0134212, 0.0307963, 0.0609534, 0.103042, 0.152976, 0.202366, 0.245956, 0.285754, 0.313861, 0.325009, 0.321797, 0.308947, 0.28685, 0.259425, 0.22519, 0.182631, 0.143994, 0.114635, 0.0919794, 0.0761547, 0.0649611, 0.0570429, 0.0503958, 0.0457693, 0.041089, 0.0352787, 0.0289087, 0.0223688, 0.0152318, 0.00702492, -0.00182114, -0.0115187, -0.0217216, -0.0324157, -0.0433804, -0.0537281, -0.0633608, -0.0725008, -0.0801518, -0.0851463, -0.0875296, -0.0873093, -0.08454, -0.0794315, -0.0725582, -0.0646114, -0.0558614, -0.0465045, -0.0377562, -0.0302361, -0.0237077, -0.0180942, -0.0135352, -0.00996527, -0.00722962, -0.00515145, -0.0036426, -0.00254819, -0.00177437, -0.00122704, -0.000856806}

Now apply Gram-Schmidt to the pair f1 and e2. The vector e2 is not orthogonal to f1, but it can be written as the sum of something parallel to f1 plus something perpendicular to f1. We will find the parallel part using the dot product, and get the perpendicular part by subtraction.

The component of e2 wrt f1 is… e2 \cdot\ f1 = 0.354107 (since f1 is a unit vector), so we make a vector of that length, parallel to f1, and subtract it from e2. The result of that vector subtraction is perpendicular to f1. We normalize it and call it f2.

f2 = f1 - (e2 \cdot\ f1)\ f1\ .

Then

f1, f2

is an orthonormal basis that spans the same subspace as

f1, e2.

f2 = {-0.00662133, -0.0150664, -0.029409, -0.0488852, -0.0710293, -0.0913202, -0.107276, -0.119901, -0.125361, -0.121889, -0.110234, -0.0932842, -0.0693866, -0.041572, -0.00683967, 0.0306849, 0.063449, 0.0953437, 0.120668, 0.14633, 0.171675, 0.194084, 0.214299, 0.231442, 0.242825, 0.245323, 0.241118, 0.234089, 0.221296, 0.200893, 0.175217, 0.145636, 0.111572, 0.072756, 0.0308042, -0.0115508, -0.052607, -0.0914459, -0.126173, -0.153574, -0.173117, -0.184308, -0.187183, -0.182227, -0.170858, -0.154906, -0.135801, -0.114483, -0.0939761, -0.0759605, -0.0599912, -0.0459572, -0.0344598, -0.0254257, -0.0184818, -0.0131801, -0.00933457, -0.00653426, -0.00455539, -0.00313629, -0.00220307}

Now what do we do with e3? We want to break off the piece which lies in the plane spanned by f1 and f2. That’s just the vector

f1\ (e3 \cdot f1) + f2\ (e3 \cdot f2)\ .

When we subtract that from e3, what remains is perpendicular to the plane spanned by f1 and f2; we will normalize it to length 1 and then call it f3. (It’s crucial that f1 and f2 are an orthonormal basis for that plane; otherwise we would need an explicit reciprocal basis to find components of e3 wrt f1 and f2.)

f3 = {0.0108058, 0.0248259, 0.0492157, 0.0833774, 0.124048, 0.164396, 0.200091, 0.232718, 0.255782, 0.264988, 0.262322, 0.251597, 0.233086, 0.2101, 0.18122, 0.145023, 0.111701, 0.085438, 0.064457, 0.0480153, 0.034675, 0.0232252, 0.0120345, 0.00174899, -0.00919785, -0.0209678, -0.0332539, -0.0457821, -0.0586168, -0.071629, -0.0848832, -0.0992835, -0.113845, -0.128148, -0.141854, -0.153559, -0.163729, -0.173431, -0.180567, -0.183118, -0.181455, -0.175821, -0.166361, -0.153478, -0.138238, -0.121869, -0.10453, -0.0863848, -0.0696761, -0.0554858, -0.0433135, -0.0329815, -0.0246352, -0.0181132, -0.0131248, -0.00934719, -0.00660276, -0.00461709, -0.00321259, -0.00222783, -0.00154979}

(Maybe I shouldn’t have printed all those, but if you have the 1964 xyz bar tables and you’re trying to verify my calculations….)

So, it’s time to look at what we got. Here are f1, f2, f3. They are an orthonormal basis, and they span the same space as E1, E2, E3 — the fundamental color space, the pre-image of the range of A’.

The brown looks like Cohen’s “I”. But the mauve curve is off. Still….

Do I need the negative of the mauve curve?

Yes!

And multiply by 44 ! (No, I have no idea why! I just kept changing the multiplier until I matched his scale on Figure 20.)

Having found them, make them pretty — i.e. set the horizontal scale.

Here, let me put them side-by-side. They look very close, but not exact. Nevertheless, I am content: his numbers tend to be off a little, so what’s the point of trying to match them perfectly?

So. Start with the non-orthonormal basis f, E2, E3 and apply Gram-Schmidt to get the orthonormal basis f1, f2, f3. Then take the negative of f2.

I must emphasize that his scale (i.e. multiply by 44) does not represent orthonormal vectors. Pairwise orthogonal, yes — but the f1, f2, and f3 I derived were specifically unit vectors. The scale on his figure requires that I multiply by 44 (or something very close to it). Once we have unit vectors, we can scale them any way that is convenient. Here it was convenient to match his — frankly, incomprehensible — vertical scale.

Oh, let’s close with one last calculation, to compute the “transformation matrix” for Figure 20. We have found the F matrix for Figure 20, and we have the A matrix. What is the transition matrix T between them? (Actually, we know we want the attitude matrix instead.) Recall that he takes A as the new, and F as the old, components.

So, as usual, we compute each column of F’ (old) by applying T to each column of A’ (new). That is,

F’ = T A’

F = A T’

and we write the false but suggestive

T' = A^{-1} F

which gives us the true

T' = (A'A)^{-1} A' F

using the appropriate pseudo-inverse.

So we compute and display T’, and remember that, as the transpose of a transition matrix, it is an attitude matrix…

T^T = \left(\begin{array}{ccc} 0.0663267 & -0.37473 & -0.161437 \\ 0.142691 & 0.37685 & 0 \\ 0.0781229 & -0.00253108 & 0.161503\end{array}\right)

We can compare that to the transformation matrix TE we tried at the beginning:

T_E = \left(\begin{array}{ccc} 0.09947 & -0.15392 & -0.55609 \\ 0.20105 & 0.33414 & 0.41574 \\ 0.11061 & -0.18021 & 0.14036\end{array}\right)

As we would expect — since the F produced by TE did not match Figure 20 — these are different.

Real Short Summary

I matched his Figure 20 by applying the Gram-Schmidt orthogonalization procedure to the vectors f, E2, E3… where f was the fundamental of an equal-energy illuminant, and E2 and E3 were two of the basis vectors dual to the rows of A’.

Summary

We really started with two separate items: a “transformation matrix” TE — one of five on p. 100 — and Figure 20 on p. 95 of Cohen. The transformation matrix TE turned out to be an attitude matrix describing a new A matrix in terms of an old F matrix. (Yes, that’s backwards, but so long as we knew it, it didn’t matter. This is why I carefully distinguish attitude and transition matrices; so long as I know what I’ve been handed, I can use it properly and confidently.)

We saw that F = A TE reproduced only the “EE” curve of his Figure 20. And his F matrix failed to be orthonormal by a factor of \sqrt{2}\ .

We then saw that the “EE” curve was (proportional to) the fundamental f of an equal-energy illuminant. In order to find the fundamental of a spectrum, we had to find a projection operator R onto the pre-image of the range of A^T. The fundamental f is then the result of applying the projection operator R to a constant vector (“equal energy”).

By trial and error, I found that the F matrix for Figure 20 could be (very nearly) reproduced by applying the Gram-Schmidt orthogonalization procedure to the vectors f, E2, E3, where E1, E2, E3 are a basis dual to the rows of A’. We found the dual basis as the rows of E' = (A'A)^{-1} A' and then transposed E’ to get E, a matrix of 3 columns.

In the next post, I will repeat these calculations quickly using 20 nm intervals, and I will show you a few alternatives. In particular, I will show you that his method (a Cholesky decomposition) gives the same answer — even to having to switch the sign of the second column!

Example: Is it a transition matrix? Part 2

We had three matrices from Jolliffe, P, V, and Q. They were allegedly a set of principal components P, a varimax rotation V of P, and a quartimin “oblique rotation” Q.

I’ll remind you that when they say “oblique rotation” they mean a general change-of-basis. A rotation preserves an orthonormal basis; a rotation cannot transform an orthonormal basis to a non-orthonormal basis, and that’s what they mean — a transformation from an orthonormal basis to a non-orthonormal basis, or possibly a transformation from a merely orthogonal basis to a non-orthogonal one. In either case, the transformation cannot be a rotation.

(It isn’t that complicated! If you change the lengths of basis vectors, it isn’t a rotation; if you change the angles between the basis vectors, it isn’t a rotation.)

Anyway, we showed in Part 1 that V and Q spanned the same 4D subspace of R^{10}\ .

Now, what about V and P? Let me recall them:

P = \left(\begin{array}{cccc} 0.34 & -0.39 & 0.09 & -0.08 \\ 0.34 & -0.37 & -0.08 & -0.23 \\ 0.35 & -0.1 & 0.05 & 0.03 \\ 0.3 & -0.24 & -0.2 & 0.63 \\ 0.34 & -0.32 & 0.19 & -0.28 \\ 0.27 & 0.24 & -0.57 & 0.3 \\ 0.32 & 0.27 & -0.27 & -0.34 \\ 0.3 & 0.51 & 0.19 & 0.27 \\ 0.23 & 0.22 & 0.69 & 0.43 \\ 0.36 & 0.33 & -0.03 & 0.02\end{array}\right)

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

As before, I suppose there is a transition matrix T relating them; each column of V^T = V' can be computed by applying a transition matrix T to the corresponding column of P^T\ :

V’ = T P’

i.e.

V = P T’.

Then I imagine briefly that I can “solve” for T’…

T' = P^{-1}\ V\ ,

but then quickly replace the non-existent inverse of Q by the appropriate pseudo-inverse

T' = (P'P)^{-1}\ P'\ V\ .

I compute T’, and then display its transpose T:

T = \left(\begin{array}{cccc} 0.905145 & -0.386677 & 0.0882059 & -0.141151 \\ 0.853806 & 0.568131 & -0.0875601 & -0.220572 \\ 0.64498 & -0.155196 & -0.506397 & 0.490735 \\ 0.507437 & 0.131036 & 0.67664 & 0.347268\end{array}\right)

and then I test whether

V = P T’ (?).

The largest difference (element by element) between the matrices V and P T’ is 0.170627, which is rather larger (by a factor of 23) than the differences we saw in Part 1. Here’s the full blown matrix — where I chose to round off first, then subtract (i.e. what we’d compute if we were subtracting published, already rounded-off, matrices):

V - P\ T' = \left(\begin{array}{cccc} 0. & 0.01 & -0.02 & -0.01 \\ 0.01 & 0.01 & -0.02 & -0.02 \\ -0.01 & -0.01 & 0.01 & 0.01 \\ 0. & 0 & 0. & 0. \\ 0. & 0.01 & -0.02 & -0.01 \\ -0.01 & -0.01 & 0.03 & 0.02 \\ -0.01 & -0.02 & 0.05 & 0.03 \\ 0.05 & 0.07 & -0.17 & -0.12 \\ -0.02 & -0.04 & 0.09 & 0.07 \\ -0.02 & -0.03 & 0.07 & 0.05\end{array}\right)

I am inclined to think there is a probem. But we don’t know much at all about it.

Let’s try one of our alternatives. We construct a projection operator onto the subspace spanned by P.

Get the SVD (Singular Value Decomposition) of P…

P = u\ w\ v^T\ .

Then, since u is 10×10 and V is of rank 4, the leftmost 4 columns (u1) of u….

u1 = \left(\begin{array}{cccc} 0.132041 & 0.280276 & -0.423196 & 0.0366383 \\ 0.24536 & 0.379423 & -0.288143 & -0.0619886 \\ -0.0781059 & 0.252968 & -0.245411 & -0.0642461 \\ -0.310642 & 0.402272 & -0.122076 & 0.641009 \\ 0.19934 & 0.202925 & -0.456543 & -0.212204 \\ -0.275595 & 0.504975 & 0.417433 & 0.130869 \\ 0.0279239 & 0.347505 & 0.199748 & -0.545498 \\ -0.547166 & 0.00466106 & -0.00216808 & -0.298924 \\ -0.571058 & -0.293548 & -0.488675 & -0.061545 \\ -0.278923 & 0.220314 & 0.0368869 & -0.356268\end{array}\right)

are an orthonormal basis for the range of P (i.e. the column space of P). We construct a projection operator, as we have before, by

R = u1\ u1^T\ .

We check it first by confirming that R R = R, i.e. it is idempotent, hence a projection operator. We check it further by applying it to the column vectors in P: we expect that R P = P; i.e. it actually projects onto the column space of P (the space spanned by the columns of P).

Now, having checked that R is a projection operator onto P, we apply it to V, and compute V – R V:

V - R\ V = \left(\begin{array}{cccc} 0.00221622 & 0.0115115 & -0.0249851 & -0.0145408 \\ 0.0137721 & 0.012178 & -0.0243583 & -0.0200425 \\ -0.00564412 & -0.011024 & 0.00933521 & 0.0112507 \\ 0.00222059 & 0.00165751 & -0.00118369 & -0.00423324 \\ 0.00223276 & 0.00638406 & -0.0153346 & -0.0119237 \\ -0.00896396 & -0.0106169 & 0.0272358 & 0.0230479 \\ -0.00941945 & -0.0152489 & 0.0456323 & 0.0330044 \\ 0.0470134 & 0.0703021 & -0.170627 & -0.121383 \\ -0.0232815 & -0.036102 & 0.0941953 & 0.068255 \\ -0.0227796 & -0.0330688 & 0.0740154 & 0.0474348\end{array}\right)

Let me round that to the nearest .01…

V - R V = \left(\begin{array}{cccc} 0 & 0.01 & -0.02 & -0.01 \\ 0.01 & 0.01 & -0.02 & -0.02 \\ -0.01 & -0.01 & 0.01 & 0.01 \\ 0 & 0 & 0 & 0 \\ 0 & 0.01 & -0.02 & -0.01 \\ -0.01 & -0.01 & 0.03 & 0.02 \\ -0.01 & -0.02 & 0.05 & 0.03 \\ 0.05 & 0.07 & -0.17 & -0.12 \\ -0.02 & -0.04 & 0.09 & 0.07 \\ -0.02 & -0.03 & 0.07 & 0.05\end{array}\right)

I should put that array of differences into perspective. Recall V:

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

This reinforces my belief that we have a problem. Some of the differences we found are larger than some of the values of V itself. I can see that V is not in the column space of P, and while it’s not extremely far away, it’s not close either. (This seems more definitive than “the hypothetical transition matrix T doesn’t quite work”.)

As before, however, we are tip-toeing around the real question: Is every column vector in V a linear combination of the columns of P?

Easy enough. Do a regression. Actually 4 of them, one for each column of V.

Let me name the individual columns of the P matrix, and let me set the first column of V (v1) as “y”, and the columns of V as independent variables.

(Some smaller pictures follow, in which you can see that the final option is “IncludeConstantBasis -> False”.) The results are:

Not bad at all. In fact, rather close to perfect.

What was the first row of T?

{0.905145, -0.386677, 0.0882059, -0.141151}.

Yes, those are the \beta s (“Estimates”) for that regression; first row of T is first column of T’ is regression coefficients for the first column of V as the dependent variable.

Let me emphasize another relationship. For the first regression, yhat should be the projection of v1 (the 1st column of V) onto the column space of P. But we have that projection directly as R v1:

R v1 = {0.477784,0.476228,0.355644,0.257779,0.487767,
0.058964,0.209419,0.0529866,0.123281,0.19278}

… and we have the yhat from the regression:

yhat = {0.477784,0.476228,0.355644,0.257779,0.487767,
0.058964,0.209419,0.0529866,0.123281,0.19278}

Good, they are the same. So if regression works better for you than the projection operator, then use regression. Personally, I prefer the projection operator because I can apply it to all of V at once. (And because sometimes I have a use for the orthonormal basis u1.)

Largely to display the R^2, let me run the other three regressions. (Oh, I also reduced the magnification in my Mathematica notebook, so the pictures are smaller and the right side of the regression command isn’t cut off, at least on my screen.)

Here’s v2 as the dependent variable:

Here’s v3 as the dependent variable:

Here’s v4 as the dependent variable:

We see that the R^2 is over 99% for v2, falls to 95% for v3, and is 97.6% for v4. Those last two are just not large enough, not for this question.

I’m certain that V and P do not span the same subspace, but that’s about all I know.

We could, however, have made the dual choice, projecting P onto V instead of V onto P. Are they the same?

No.

We already know — from the T matrix, from the projection operator, or from the regressions — that the P and V matrices do not represent the same data in two different coordinate systems. But there might be — and in this case, there is! — more information available by making the other choice.

Let’s get the projection onto V instead of onto P. We need the SVD of V…

V = u\ w\ v^T\ .

We need the 4 leftmost columns (u1) of u…

u1 = \left(\begin{array}{cccc} -0.292636 & 0.0434068 & -0.33927 & 0.269896 \\ -0.274013 & 0.218205 & -0.238496 & 0.365998 \\ -0.337357 & -0.00723082 & -0.119568 & 0.0745101 \\ -0.355459 & 0.224109 & -0.415783 & -0.490416 \\ -0.2739 & -0.0500253 & -0.225463 & 0.452829 \\ -0.308498 & 0.463936 & 0.232078 & -0.416388 \\ -0.287334 & 0.227908 & 0.427185 & 0.212631 \\ -0.303344 & -0.281983 & 0.503807 & 0.152613 \\ -0.329068 & -0.741253 & -0.129874 & -0.304713 \\ -0.382183 & -0.0396735 & 0.288053 & -0.0857624\end{array}\right)

We construct the projection operator…

R = u1\ u1^T

Now we apply it to P, and compute the difference P – R P:

P - R P = \left(\begin{array}{cccc} 0.00186923 & -0.00646007 & -0.00165699 & 0.0293708 \\ -0.00396322 & 0.00598219 & 0.00143136 & 0.0162366 \\ 0.0035053 & 0.00255499 & -0.00525352 & -0.0232869 \\ 0.000382829 & 0.00307153 & 0.00267228 & 0.124774 \\ 0.000923133 & -0.00219591 & 0.00142733 & -0.0228012 \\ 0.0000472374 & -0.00300758 & -0.00291228 & -0.00287352 \\ -0.00462961 & -0.00220958 & -0.0000400809 & -0.149176 \\ 0.00305495 & 0.00376593 & -0.000439512 & 0.29618 \\ -0.00367312 & -0.00153518 & -0.000838857 & -0.106747 \\ 0.00147883 & -0.000459218 & 0.00482354 & -0.141981\end{array}\right)

So what?

Round it off:

P - R P = \left(\begin{array}{cccc} 0 & -0.01 & 0 & 0.03 \\ 0 & 0.01 & 0 & 0.02 \\ 0 & 0 & -0.01 & -0.02 \\ 0 & 0 & 0 & 0.12 \\ 0 & 0 & 0 & -0.02 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -0.15 \\ 0 & 0 & 0 & 0.3 \\ 0 & 0 & 0 & -0.11 \\ 0 & 0 & 0 & -0.14\end{array}\right)

Ah ha! It’s time for Robin to squeak out, “Holy moly, Batman!”

The fourth column of P does not lie in the subspace spanned by the 4 columns of V, while the first three do. That is the problem. I don’t why or where it happened, but I know exactly what’s wrong: the 4th column of P is the only problem.

Somebody blew it somewhere. (OK, since I don’t know how a varimax rotation is computed, I have to entertain the possibility that V is not supposed to be P under a change of basis. But I can’t believe that, because V and Q have a common column space, and 3 of the 4 columns of P lie in that common column space. If that’s an accident, what does a deliberate act look like?)

Is the mistake in the orignal paper? Were P and V computed from the same correlation matrix but by independent algorithms? Or was V computed incorectly from P (and then Q computed from V)?

Or did someone copy P incorrectly? (P should be primary, but then how the heck does the 4th column of P fail to be in the column space of V? I would sooner have expected an error in V, given P.)

Or did they rotate the first 3 PCs, and then apply that transformation to the 4th column? (No. I’ve checked. You can too.)

Let me emphasize that we would get the same result from four regressions each with one column of P as the dependent variable and all the columns of V as the independent variables. The first three regressions would be extremely good fits, but the fourth would not be. We would, again, conclude that the fourth column of P was not exactly — or even to a fair approximation — a linear combination of the columns of V.

Summary: theory

Given two matrices, say P and V, which are alleged to be the same data in different coordinate systems, we may check that assertion very quickly (if both are of full rank):

Define

T' = (P'P)^{-1} P'\ V\ .

If we find it true that

V’ = T P’,

then T is a transition matrix between two bases, and P and V are indeed the same data in different coordinate systems. We’re done.

If, however, the equality fails, V' \ne T\ P'\ ,

then T is not a transition matrix, and P and V are not the same data in different coordinate systems. Unless we want more information, we’re done.

Finally, we have seen that, numerically, we may have T “close to” a transition matrix. (In Part 1, V and Q are very nearly the same data; in the color posts, the xyx bar and rgb bar tables are very nearly the same data.)

Summary: practice

Computing T tells us if something is wrong, but not much more. It’s quick, easy, and gives a simple answer: “good” or “no good”.

Computing both projection operators P onto V and V onto P (not just one of them) would likewise tell us whether or not something is wrong, and, in addition, the projection operators might isolate the problem.

We could use regression instead of computing explicit projection operators.

Summary: this example

We found that V and Q are the same data in two different coordinate systems, but that P is not. In addition, we actually know that the problem is limited to the 4th column of P: it does not lie in the space spanned by the columns of V (i.e. it does not lie in the column space of V, which is also the column space of Q).

Question:
Can someone tell me if these P, V and Q match the original paper by Yule et al.?

And there’s a question you might have.
How many examples did I have to search through to find such a good (because it was such a bad) example?

One. This was the very first case I checked, outside of the 1931 CIE tables.

That’s scary.

Example: Is it a transition matrix? Part 1

This example comes from PCA / FA (principal component analysis, factor analysis), namely from Jolliffe (see the bibliography). But it illustrates some very nice linear algebra.

More precisely, the source of this example is:
Yule, W., Berger, M., Butler, S., Newham, V. and Tizard, J. (1969). The WPPSL: An empirical evaluation with a British sample. Brit. J. Educ. Psychol., 39, 1-13.

I have not been able to find the original paper. There is a problem here, and I do not know whether the problem lies in the original paper or in Jolliffe’s version of it. If anyone out there can let me know, I’d be grateful. (I will present 3 matrices, taken from Jolliffe; my question is, does the original paper contain the same 3 matrices?)

Like the previous post on this topic, this one is self-contained. In fact, it has almost nothing to do with PCA, and everything to do with finding — or failing to find! — a transition matrix relating two matrices.

On p. 163, Jolliffe provides a matrix of “principal components”:

P = \left(\begin{array}{cccc} 0.34 & -0.39 & 0.09 & -0.08 \\ 0.34 & -0.37 & -0.08 & -0.23 \\ 0.35 & -0.1 & 0.05 & 0.03 \\ 0.3 & -0.24 & -0.2 & 0.63 \\ 0.34 & -0.32 & 0.19 & -0.28 \\ 0.27 & 0.24 & -0.57 & 0.3 \\ 0.32 & 0.27 & -0.27 & -0.34 \\ 0.3 & 0.51 & 0.19 & 0.27 \\ 0.23 & 0.22 & 0.69 & 0.43 \\ 0.36 & 0.33 & -0.03 & 0.02\end{array}\right)

On the same page, he provides two matrices of the “rotated factor loadings”. One is “Varimax”…

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

… and the other is “direct quartimin”:

Q = \left(\begin{array}{cccc} 0.51 & -0.05 & 0.05 & 0.05 \\ 0.53 & 0.04 & 0.05 & -0.14 \\ 0.32 & 0.13 & 0.16 & 0.15 \\ 0.17 & -0.19 & 0.65 & 0.2 \\ 0.54 & 0.06 & -0.13 & 0.05 \\ -0.07 & 0.28 & 0.67 & -0.12 \\ 0.16 & 0.53 & 0.13 & -0.17 \\ 0.03 & 0.62 & -0.09 & 0.26 \\ 0. & 0.09 & 0.02 & 0.87 \\ 0.08 & 0.45 & 0.24 & 0.21\end{array}\right)

I presume — not knowing how to comute a varimax rotation! — that each of these matrices represents the same data, i.e. the same 4D subspace of R^{10} (since we have 4 column vectors of length 10 in each matrix). It would seem silly if a varimax rotation of PCs was not supposed to represent rotated PCs within the same subspace.

That presumption is wrong: they do not all represent the same subspaces.

How do I know this?

Because given two matrices of the same shape, I know how to find the transition matrix if it exists. We did this here.

Let me elaborate on what I’m doing. Imagine that our matrices had 2 columns and 3 rows. Those 3D column vectors lie in a 2D subspace — a plane — in R^3\ . By doing 2D transformations within that plane, I might find a particularly nice basis for representing that data.

If, instead, I apply a 3D transformation, I would be moving that data into a different plane. What I am doing here is looking at 4D transformations within the 4D subspace. Everything is predicated on my belief that the PCs in P, and the varimax rotation of them in V, and the direct quartimin data in Q are all supposed to be in the same subspace.

Not knowing how they were supposed to be computed, I could be seriously wrong here. OTOH, since we will find that V and Q are the same subspace, and the first 3 of the 4 columns of P also lie in that common subspace, I really believe that the 4th column of P should too. That it does not strikes me as an error.

Let me make one simplification up front. Are the matrices P, V, Q of full rank (4)?

Yes. The Mathematica command MatrixRank says that each of those matrices is of rank 4.

Let us consider V and Q. If the observations in each are related by a transition matrix T, and if we arbitrarily take V to be the “old” data, then each column of V’ (the transpose of V) is found by applying the transition matrix to the corresponding column of Q’.

V’ = T Q’

i.e.

V = Q T’.

Then, as I showed before, I imagine briefly that I can “solve” for T’…

T' = Q^{-1} V\ ,

but quickly replace the non-existent inverse of Q by the appropriate pseudo-inverse

T' = (Q'Q)^{-1} Q'\ V\ .

(It’s worth repeating that the inappropriate pseudo-inverse would use QQ’, but that can’t work because QQ’ is 10×10 but of rank 4 at most, hence not invertible.)

If Q’Q is invertible, that recipe always gives me a matrix T, but T need not be a transition matrix. The resulting T is a transition matrix if and only if we actually have

V = Q T’,

(given that V and Q are of the same full rank).

Do we have V = Q T’ ?

By direct computation I get

T = \left(\begin{array}{cccc} 0.9273 & 0.0921888 & 0.15161 & 0.101824 \\ 0.22944 & 0.864017 & 0.176744 & 0.0556721 \\ 0.251353 & 0.0599739 & 0.913081 & 0.0549228 \\ 0.185642 & 0.11154 & 0.00492819 & 0.941728\end{array}\right)

and then I look at V – Q T’…. You know, it’s easy enough to put them side by side.

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right) Q\ T' = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.16 & 0.17 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0 & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

Even the difference of the rounded-off matrices is satisfying small. That is,
Round[V,.01]-Round[Q.TT,.01]//Chop//MatrixForm is

\left(\begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & -0.01 & 0.01 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0\end{array}\right)

Judging that the difference is negligible, I conclude that the matrices V and Q do span the same subspace of R^{10}\ ; they are the same 4D subspace with two different coordinate systems (two sets of basis vectors).

While we’re here, let’s do this another way. After all, the fundamental question is: can the columns of V be written as linear combinations of the columns of Q? Alternatively, can each column of Q be written as a linear combination of the columns of V?

Bear in mind that we have already established that Q and V are of the same full rank. In that case, the alternatives are equivalent.

We can answer the fundamental question by trying to regress each column of V on Q. Just remember to tell the regression to drop the constant!

To be specific, here’s the first column of V as a linear combination of the 4 columns of Q. Here’s the parameter table and R^2 — not the adjusted R^2\ : I really want to know how good this fit is, never mind how many variables it has.

Here’s a snapshot of the Mathematica commands (version 7). (Oh, darn. The last option in the LinearModelFit command, which replaced “Regress” is “IncludeConstant->False”. And I print the names just to confirm that none of those symbols has a left-over numerical value!)

(Let me just remind you that the R^2 is a measure of a how close the fit is. When we add another independent variable, however, the R^2 cannot decrease; it is not a good measure of the value — i.e. the worth — of that latest independent variable. The adjusted R^2 can decrease; in fact, in my experience, if the t-statistic of that latest variable is greater than 1 (in magnitude), then the adjusted R^2 goes up. As I said, in this case, I really want the R^2 rather than adjusted R^2 because I know exactly how many independent variables I am supposed to use. That said, there isn’t a whole lot of difference between the two in this case.)

v1 on Q

That’s a really good fit. And it should be. And that column of \beta\ , i.e. “Estimate”? It’s the first column of T’, i.e. the first row of T:

{0.9273, 0.0921888, 0.15161, 0.101824}

We could run 3 more regressions, but they’d only confirm what we believe: T’ could, in fact, be computed as regression coefficients. If the two subspaces coincide, then T is also a transition matrix between two bases in one and the same space.

There is yet another way to look at that. We know that in regression, yhat is the projection of y onto the subspace spanned by the columns of X. If y itself is in that subspace, then yhat = y, and the relatonship is given by a transition matrix.

So we could do this a third way. We know how to compute a projection operator onto the subspace spanned by the columns of X. In this case, X is Q, so we find its SVD (Singular Value Decomposition)…

Q = u\ w\ v^T

Then the 4 leftmost columns (u1) of u are an orthonormal basis for the range of Q….

u1 = \left(\begin{array}{cccc} -0.203473 & 0.27677 & 0.353491 & 0.188167 \\ -0.188235 & 0.423093 & 0.310857 & 0.0177813 \\ -0.30583 & 0.10025 & 0.140569 & 0.094808 \\ -0.313324 & 0.177504 & -0.285416 & 0.609666 \\ -0.194156 & 0.220796 & 0.497051 & 0.0101643 \\ -0.354903 & 0.228756 & -0.604256 & -0.00173818 \\ -0.320761 & 0.191153 & -0.0758258 & -0.463325 \\ -0.379019 & -0.300366 & 0.0912622 & -0.459411 \\ -0.365508 & -0.681435 & 0.196437 & 0.360653 \\ -0.43321 & -0.103153 & -0.111086 & -0.166404\end{array}\right)

That is, u1′ u1 = I, if the column space of X is of full rank. Check it:

u1^T u1 = \left(\begin{array}{cccc} 1. & 0 & 0 & 0 \\ 0 & 1. & 0 & 0 \\ 0 & 0 & 1. & 0 \\ 0 & 0 & 0 & 1.\end{array}\right)

Bur the reversed matrix product

R = u1 u1′

gives us a projection operator R onto the range of Q:

We can check it. (I won’t print it: it’s 10×10.) One, R is a projection operator if and only if it is idempotent, i.e. R R = R. And by computation, I see that is true.

Two, applied to each column of Q, it reproduces that column of Q; i.e. applied to Q as a whole, it reproduces Q, i.e. R Q = Q. Again, computation shows that is true, too.

Having checked that R is a projection onto the column space of Q, let’s use it: apply it to V. If V lies in the subspace spanned by the columns of Q, then R V will reproduce V….

I find that the largest difference between the elements of R V and V is 0.00720726 .

That’s as close as the other two calculations. Not exactly zero, but then, our regression didn’t fit exactly: like the RGB and XYZ tables in the earlier post, V and Q are almost but not exactly related by a transition matrix.

So we have seen three ways of looking at this. All three tell the same story: the column spaces of V and Q are the same.

Let me say a little more about two of them, the projection operator and the explicit regression.

The projection operator R came from the SVD. The regression can also be written using a projection operator, namely the hat matrix H. I keep getting mixed up about the relationship between T and R and H, so let me write it out.

The normal equations for

\hat{y} = X\ \beta

are

\beta = (X'X)^{-1}\ X'\ y\ .

(I showed you a quick and dirty way to recover that equation in the earlier post.)

Then

\hat{y} = X\ \beta = X\ (X'X)^{-1}\ X'\ y

and so we may write

yhat = H y

with the hat matrix H defined as

H = X\ (X'X)^{-1}\ X'\ .

For our regression, we took y to be the first column of V, and X to be the four colums of Q, so our hat matrix would be

H = Q\ (Q'Q)^{-1}\ Q'\ .

It is a projection operator onto the range of Q. But that’s exactly what R was. Go ahead and check: you will find that

H = R.

This means that we could compute R as the hat matrix, instead of using the SVD. Personally, I know how to use the SVD, whereas I would have to work out the definition of the hat matrix every time. But that’s a preference; you may choose to compute the hat matrix instead. (And that is how Malinowski was doing it!)

Going a little further, T’ was

T' = (Q'Q)^{-1}\ Q'\ V

so we have

Q T’ = H V.

We’ll look at the relationship between P and V in Part 2.

Transition matrix: to be or not to be

Cohen (“Visual Color & Color Mixture”, see the bibliography) did something very interesting. In fact, he did something useful which I had never seen before.

Although this post uses some matrices which we saw in the color posts, I think this can stand on its own: you need not have read the color posts. But if you are specifically interested in color, or in Cohen’s work, this post is very relevant.

He was trying to describe how to find a transition matrix between two given data matrices. This will come in handy — very handy! — whenever people give the alleged result of an unspecified linear transformation of a data matrix.

On p. 93, he wrote

D T = C in general

F T = A in particular

with

T = (D'D)^{-1} D'\ C\ .

Because D’D (that is, D^T\ D\ ) is invertible, we infer that D and C are taller than wide, like the A matrix… like our usual data or design matrix X.

We also infer that T is a small matrix: its size is cxc, where c is the number of columns of D. D and C are the same size.

I do have to point out that in his equation (and his notation)

D T = C,

T is an attitude matrix! I will change that. (Trust me: you want me to change that.)

Since we have both XYZ (i.e. 1931 xyz bar) and RGB (1931 rgb bar) tables, and since they are supposed to be related by the transformation matrix T31…

T31 = \left(\begin{array}{ccc} 2.76888 & 1.75175 & 1.13016 \\ 1 & 4.5907 & 0.0601 \\ 0 & 0.05651 & 5.59427\end{array}\right)

… we should try the idea embodied in that equation for T. Assuming that there is a transition matrix T, let J and K stand for the XYZ and RGB matrices respectively.

As usual, let me use 20 nm intervals instead of the published 5 nm intervals. We have

J = \left(\begin{array}{ccc} 0.00003 & -0.00001 & 0.00117 \\ 0.0003 & -0.00014 & 0.01214 \\ 0.00211 & -0.0011 & 0.11541 \\ -0.00261 & 0.00149 & 0.31228 \\ -0.02608 & 0.01485 & 0.29821 \\ -0.04939 & 0.03914 & 0.14494 \\ -0.07173 & 0.08536 & 0.04776 \\ -0.09264 & 0.17468 & 0.01221 \\ -0.03152 & 0.21466 & 0.00146 \\ 0.0906 & 0.19702 & -0.0013 \\ 0.24526 & 0.1361 & -0.00108 \\ 0.34429 & 0.06246 & -0.00049 \\ 0.29708 & 0.01828 & -0.00015 \\ 0.15968 & 0.00334 & -0.00003 \\ 0.05932 & 0.00037 & 0 \\ 0.01687 & 0.00003 & 0 \\ 0.0041 & 0 & 0 \\ 0.00105 & 0 & 0 \\ 0.00025 & 0 & 0 \\ 0.00006 & 0 & 0 \\ 0 & 0 & 0\end{array}\right)

K = \left(\begin{array}{ccc} 0.0014 & 0. & 0.0065 \\ 0.0143 & 0.0004 & 0.0679 \\ 0.1344 & 0.004 & 0.6456 \\ 0.3483 & 0.023 & 1.7471 \\ 0.2908 & 0.06 & 1.6692 \\ 0.0956 & 0.139 & 0.813 \\ 0.0049 & 0.323 & 0.272 \\ 0.0633 & 0.71 & 0.0782 \\ 0.2904 & 0.954 & 0.0203 \\ 0.5945 & 0.995 & 0.0039 \\ 0.9163 & 0.87 & 0.0017 \\ 1.0622 & 0.631 & 0.0008 \\ 0.8544 & 0.381 & 0.0002 \\ 0.4479 & 0.175 & 0. \\ 0.1649 & 0.061 & 0. \\ 0.0468 & 0.017 & 0. \\ 0.0114 & 0.0041 & 0. \\ 0.0029 & 0.001 & 0. \\ 0.0007 & 0.0003 & 0. \\ 0.0002 & 0.0001 & 0. \\ 0. & 0. & 0.\end{array}\right)

In case you have not read the color posts, all I am saying is that J and K and T31 are given matrices, and T31 is supposed to be a transition matrix between J and K.

We do know — or take it as given — that for the transition matrix T31, old is XYZ ~ K, so let me write

K’ = T J’

That says T is a transition matrix which, applied to each column vector of J’, delivers the corresponding column of K’.

Transpose:

K = J T’

and then pretend J^{-1} exists (even though J is rectangular)

T' 	= J^{-1} K \

and then substitute the appropriate pseudo-inverse:

T' = (J'J)^{-1} J'\ K\ .

(The inappropriate pseudo-inverse would use JJ’ instead — which cannot be of full rank, given the assumed shape of J, hence cannot be invertible.)

That equation defined the transpose T’, but here’s T:

T = \left(\begin{array}{ccc} 2.76887 & 1.75176 & 1.13013 \\ 1.00002 & 4.59072 & 0.06008 \\ 0.00004 & 0.05661 & 5.59443\end{array}\right)

Recall the given T31:

T31 = \left(\begin{array}{ccc} 2.76888 & 1.75175 & 1.13016 \\ 1 & 4.5907 & 0.0601 \\ 0 & 0.05651 & 5.59427\end{array}\right)

By doing it my way, I constructed the counterpart to T31, which is the published transition matrix between XYZ and RGB (1931) with XYZ considered the “old” basis.

T and T31 are very close, but not the same.

Even if I used the full tables, at 5 nm intervals, we would not get T31.

First, I am certain that if there were a transition matrix T, then we must have computed it. Since this computation did not yield T31, I infer that T31 is not the transition matrix.

Second, is T the transition matrix? We should apply T to the appropriate table and see what we get. We had assumed

K’ = T J’, or equivalently

K = J T’

so let’s compute a counterpart to K, namely Q = J T’:

Q = \left(\begin{array}{ccc} 0.00139 & 0.00005 & 0.00654 \\ 0.01431 & 0.00039 & 0.06791 \\ 0.13434 & 0.00399 & 0.64559 \\ 0.3483 & 0.02299 & 1.74711 \\ 0.29082 & 0.06001 & 1.66915 \\ 0.09561 & 0.139 & 0.81307 \\ 0.00489 & 0.323 & 0.27202 \\ 0.06329 & 0.71 & 0.07819 \\ 0.29041 & 0.95401 & 0.02032 \\ 0.59452 & 0.99499 & 0.00388 \\ 0.91629 & 0.87 & 0.00167 \\ 1.06216 & 0.631 & 0.00081 \\ 0.85443 & 0.38099 & 0.00021 \\ 0.44795 & 0.17501 & 0.00003 \\ 0.1649 & 0.06102 & 0.00002 \\ 0.04676 & 0.01701 & 0 \\ 0.01135 & 0.0041 & 0 \\ 0.00291 & 0.00105 & 0 \\ 0.00069 & 0.00025 & 0 \\ 0.00017 & 0.00006 & 0 \\ 0 & 0 & 0\end{array}\right)

and compare it to K, via Q – K:

Picture 6

(I keep using pictures for things like that, namely scientific notation, just because it’s easier than editing the LaTeX.)

The largest entry in absolute value is… 0.000069765 . The three columns of Q look like this:

Picture 5

Pretty small, and pretty random, but still not really zero. Well, maybe it wouldn’t be. Let’s try our magic formula on J and Q, since they are related by the transition matrix T. From

T' = (J'J)^{-1} J'\ K\ ,

I compute what I call t’:

t' = (J'J)^{-1} J'\ Q\ ,

and then display the transpose of the transpose, t” = t:

t = \left(\begin{array}{ccc} 2.76887 & 1.75176 & 1.13013 \\ 1.00002 & 4.59072 & 0.0600843 \\ 0.0000430892 & 0.0566067 & 5.59443\end{array}\right)

Recall T:

T = \left(\begin{array}{ccc} 2.76887 & 1.75176 & 1.13013 \\ 1.00002 & 4.59072 & 0.0600843 \\ 0.0000430892 & 0.0566067 & 5.59443\end{array}\right)

So. That equation for T’ does recover the transition matrix if there is one. The discrepancy between T and T31 is real.

Maybe I’m making too big a deal of this. I infer that the XYZ and RGB tables do not span exactly the same subspace. They almost do, but not quite.

To be more explicit: the 1931 XYZ (xbar, ybar, zbar) and RGB (rbar, gbar, bbar) tables are not exactly related by any transition matrix whatsoever, although the published T31 is close to being one for them.

I have found a marvelous example — disappointing as a published result but marvelous as a bad example — where the discrepancy is significant. I expect to show it in another post, soon.

We’ve been looking at the computation in one way: if there is a transition matrix T such that

K’ = T J’

then

T' = (J'J)^{-1} J'\ K\ .

What if there is no such transition matrix T? We can still define T by that equation, and compute (J'J)^{-1} J'\ K\ (still assuming J is of full rank, with more rows than columns), but what are we getting, if T itself does not exist as a transition matrix?

That is, what if we compute T, and find that

K' \ne T\ J'\ ?

Well, this will be clearer in the next post, but…. That equation for T’ looks an awful lot like the normal equations for a least-squares fit.

What I will illustrate in the next post is that (J'J)^{-1} J'\ K\ generates a transition matrix to the subspace spanned by J. That is, T is a transition matrix — any invertible matrix is! — but it relates J to a subspace other than (the one spanned by) K.

(In our case, we had 21 observations; the 3 columns of J and of K define 3D subspaces of R^{21}\ . Those two 3D subspaces are almost the same, but not exactly.)

Let me try to explain here it without a good (I mean, bad) example. If J and K span the same subspace, then any one column of K (think of it as the dependent variable y) can be written as a linear combination of the columns of J (think of it as the X matrix).

But if J and K do not span the same space, then the best we can do is to find that linear combination (call it yhat !) of columns of X which is as close to y as possible.

They really are examples of y and yhat and X.

And what is T’ ? Well, if we write it with y and X…

(X'X)^{-1} X'\ y\

we might recognize that as \beta in ordinary least squares.

Let me show you that, while providing another example of the utility of the pseudo-inverse.

The derivation of the normal equations for ordinary least squares is fairly complicated, if only because it involves the derivative of a matrix. But the pseudo-inverse would let us recover the equations themselves.

Deriving them is one thing; recovering the form of them once we know they exist is another.

We write our model (this is a quick & dirty recollection of the answer, not a derivation):

y = X \ \beta\ .

(I guess I have to talk about that. True is yhat = X\ \beta\ . But I write y instead of yhat, as though we have equality rather than a projection onto a subspace.)

We pretend that X is invertible, and write

\beta =  X^{-1} y\ .

Since X is not invertible, but X’X is (by assumption X is of full rank with more rows than columns), we replace the non-existent inverse by the pseudo-inverse and write

\beta = (X'X)^{-1} X'\ y\ .

The real derivation gave us that very answer. Once I know that, I never need to compare them again. I have a plausibility argument that gives me the known right answer.

This is analagous to confirming in the previous color post, among others, that if I compute

E' = (A'A)^{-1} A'\

then I have a dual (or reciprocal or biorthogonal) basis E such that

E’A = I.

In that case, I prefer to verify it computationally every time.

Incidentally, this trickery with the pseudo-inverse would also generate a true result if I had started with yhat instead of y — it is true that \beta = (X'X)^{-1} X'\ \hat{y}\ , but it’s not very useful because we don’t know yhat!

You might have noticed that \beta corresponds, algebraically, to a column of T’. Asking if the first column of K is a linear combination of all the columns of J gives us the first column of T’; asking if the ith column of K is a linear combination of all the columns of J gives us the ith column of T’. This, too, will be in the next related post.

Summary

Given two data matrices J and K, tall and thin, we can define and compute

T' = (J'J)^{-1} J'\ K\ .

If the individual observations are in fact related by a transition matrix M, so that

K’ = M J’,

then our computed T is M:

T = M.

This is pretty slick. Any time someone hands us some data and the alleged result of applying an unspecified linear transformation to it — we can find that linear transformation if it exists, or demonstrate that it does not exist (in which case, they goofed).

And if there is no such matrix M? Well, it’s simple enough: if there is no such M — and J and K are both of full rank — then our computed T must be something else, so we do not have equality:

K' \ne T\ J'\ .

That’s all it takes for us to know that there is no transition matrix M between J’ and K’. If there were, it would be T, but that didn’t work.

(That’s if J and K are both of full rank; if only one is not, then one of J’J or K’K is actually not invertible — as JJ’ and KK’ are not invertible — and T can only be computed in one direction. I expect that T is not invertible if J and K are not both of full rank.)

And when there is no transition matrix M, we nevertheless get a transition matrix T, but it’s between J’ and Q’ = T J’.

I will freely confess that I did not know any part of this before I read Cohen: neither that we could find the transition matrix so easily; nor what we were finding if it were not a transition matrix.

(Cohen implicitly assumes that there exists a transition M; he does not discuss what happens if there isn’t one.)

Next post on this topic, a numerical illustration.

attitude & transition matrices etc. – corrected 5-18, 6-13

i’ve made two changes in total, and you can search on “correction”.

“etc” is the inverse transition matrix, but I didn’t want a longer title.

I know of no natural case in which we specify a linear coordinate transformation by giving its inverse attitude matrix, as such, but i’ll keep my eyes open.

The key relationship among them is this: to say that P is a transition matrix is equivalent to saying that P^T is an attitude matrix. (The inverse transition matrix, of course, is P^{-1}\ .) in the special case that P is orthogonal, then the inverse is the transpose, P^T = P^{-1}\ , so the inverse transition matrix is the attitude matrix.

(OK, did you catch that? If our coordinate transformation is orthogonal, then the inverse attitude matrix is the transition matrix, so any time we specify a rotation by its transition matrix, we have just specified it by its inverse attitude matrix. This doesn’t count. I’m interested in the specification conceptually, and for that i know no case where we specify what should be understood primarily as the inverse attitude matrix.)

transition matrices

I can think of essentially two ways in which transition matrices are the natural expression of a linear coordinate transformation.

Before we get to that, just what is a transition matrix? it’s any invertible square matrix, interpreted as a change-of-basis, whose columns are interpreted as the old components of the new basis vectors.

That came out too convoluted. Any invertible square matrix may be considered a transition matrix. Then its columns are the old components of the new basis vectors.

That means that the effect of a transition matrix is to map new components of a vector to old ones. If x and y are the old and new components of some vector, and P is the transition matrix, then

x = P y.

(The columns of P are the old components of the new basis vectors; the new components of the new basis vectors look like (1,0,0), (0,1,0), etc.)

Now, back to our question: when might we find that a coordinate transformation has been specified by a transition matrix?

One very specific case is the diagonalization of a matrix: if A is diagonable to D, then

D = P^{-1}\ A\ P

where P is a transition matrix (specifically, one whose columns are eigenvectors of A; the eigenvectors are the new basis, so the columns of A (correction: the columns of P) are the old components of the new basis vectors). Let x be the old components of a vector, and let y be its new components. Then

x = P\ y

D is a matrix wrt to new components, so v = D y is a plausible thing to compute; but then

v = D\ y = P^{-1}\ A\ P\ y = P^{-1}\ A\ x

and then if we let u = P v (i.e. let u be the old components of v), we get

P v = A x = u.

That is (viewing A as an active transformation, rather than as a change of basis itself – we have one of those already!), we have two vectors. In old components they are x and u, and u = A x. in new components they are y and v, and v = D y.

Let me emphasize that for a similarity transformation, P need not be orthogonal. It can be, and we often choose it to be, but it doesn’t have to be. If P is not orthogonal, then we cannot replace  P^{-1}\ by P^T\ .

More generally (and I wouldn’t count this as an additional case, but you might), the general change of basis equation for matrices A and B which represent the same linear operator is

B = Q^{-1}\ A\ P\ ,

where both P and Q are transition matrices. A and B need not be square, but they are of the same dimensions, and P and Q must be square and invertible. If A and B are not square, then P and Q are different sizes. The SVD provides an example of that: if X = u\ w\ v^T is 5×2, for example, then u is 5×5, w is 5×2, and v is 2×2. (Yes, the SVD is an example of a change-of-basis.)

The second way in which a transition matrix is the natural expression of the coordinate transformation is: when we want to make a change of variable, a substitution.

Suppose, for example, we are given a second degree polynomial equation in two variables. The general case is

A\ x^2 + B\ x\ y + C\ y^2 + D\ x + E\ y + F = 0\ .

I hope you know that’s a conic section: parabola, ellipse, hyperbola, or possibly a line or two. Its shape is anything we could get by intersecting an infinite (& double) cone with a plane.

Can we figure out what it is without simply plotting it? (And it’s not all that simple to plot, either.)

Yes. Rotate the coordinate system to get rid of the x y term. We can recognize such equations when we have no cross term.

Rotate the coordinate system? Change the variables to diagonalize the quadratic terms. Changing notation completely, we write

\left(\begin{array}{l} x \\ y\end{array}\right) = P\ \left(\begin{array}{l} u \\ v\end{array}\right)

where (x,y) are the old coordinates of a point (components of the position vector), and (u,v) are the new components of the position vector. That P maps new components to old tells us that P is a transition matrix.

Incidentally, we are going to write both row vectors (x,y) and (u,v) as well as column vectors. The equation for row vectors in the transpose

\left(x,\ y\right) = \left(u,\ v\right)\ P^T

Now define a matrix M of quadratic coefficients

M = \left(\begin{array}{ll} A & \frac{B}{2} \\ \frac{B}{2} & C\end{array}\right)

from the given equation, and figure out what P must be to diagonalize M. that’s easy: P must have eigenvectors of M for columns, and since P is to be a rotation, we simply make the columns of unit length.

If you’ve never seen that in a calculus & analytic geometry book, play around with it. The given quadratic terms are

\left(x,\ y\right)\ \left(\begin{array}{ll} A & \frac{B}{2} \\ \frac{B}{2} & C\end{array}\right)\ \left(\begin{array}{l} x \\ y\end{array}\right)

=

A\ x^2 + B\ x\ y + C\ y^2

After we diagonalize M, we haveM = P\ D\ P^-{1 } (note that I “solved for” M, not D). Then

(x,y)^T\ M (x,y) = (x,y)^T P\ D\ P^{-1} (x,y)

=

(u,v)^T P^T P\ D\ P^{-1} P\ (u,v) =  (u,v)^T D\ (u,v)\ ,

because P was made orthogonal. because D is diagonal, there are no cross terms in u v. We will have an expression in u^2 and v^2 (possibly with one or both coefficients zero!) and we can name that conic. If, for example, u^2 and v^2 have nonzero coefficients with the same sign, then the conic is an ellipse. Further, the transition matrix P tells us what the new coordinate system is, in which the conic has a simple form: the major and minor axes of an ellipse would lie on the u and v axes. But P only specifies a rotation, so we would still have to complete the square in the u,v coordinates to figure out where the center of the ellipse was.

BTW, if the only thing I needed to figure out was the type of conic, I would be done as soon as I had the eigenvalues: they are the diagonal of D, and they tell me what kind of conic I have.

So much for transition matrices. Well, the key is that it’s the transition matrix we need for change-of-basis equations for matrices, especially for diagonalizing matrices; then we just understand that the transition matrix maps new components to old.

inverse transition matrices

In a sense, they are the most obvious way to specify a coordinate transformation. But i’ve already shown you why we don’t always do it “the most obvious way”. In this case, I tell you that the new components are such-and-such functions of the old ones. If those functions are linear, as for example

\begin{array}{c} X=x \cos (\theta )+y \sin (\theta ) \\ Y=y \cos (\theta )-x \sin (\theta ) \\ Z=z\end{array}

then they could be written

\left(\begin{array}{l} X \\ Y \\ Z\end{array}\right) = \left(\begin{array}{lll} \cos (\theta ) & \sin (\theta ) & 0 \\ -\sin (\theta ) & \cos (\theta ) & 0 \\ 0 & 0 & 1\end{array}\right)\ \left(\begin{array}{l} x \\ y \\ z\end{array}\right)

and that matrix

Rz(\theta) = \left(\begin{array}{lll} \cos (\theta ) & \sin (\theta ) & 0 \\ -\sin (\theta ) & \cos (\theta ) & 0 \\ 0 & 0 & 1\end{array}\right)

is the inverse transition matrix. It must be: instead of mapping new components to old, it maps old to new. Letting w and W be old and new components respectively, we know

w = P\ W

P^{-1}w = W = Rz(\theta)\ w\ .

If it weren’t that transition matrices were so important, we would probably have named the inverse transition matrix instead. But that’s not how it goes.

Actually, the reason we use the transition matrix is that it contains old components of the new basis vectors. If we were to let Q = P^{-1}\ , then our diagonalization equation would become

D = Q\ A\ Q^{-1}

and what’s the big deal? Nothing. Couldn’t we just define Q to have eigenvectors as rows….

No, NO, Noooo! “Oh, no, Mr. Bill!”

That would be just fine if we made Q orthogonal, but if we don’t want an orthogonal transformation, then our eigendecomposition gets us P, and we would have to compute P^{-1} to get Q. Bear in mind that we almost never compute P^{-1} A\ P, so we almost never need to compute P^{-1}\ , because the eigendecompsition gives us D (from the eigenvalues); we don’t have to compute D. the transition matrix P is fundamental to a similarity transformation.

So, if I give you the matrix which maps old components to new, then I have given you the inverse transition matrix.

If the matrix is orthogonal, the inverse transition matrix is also the transpose. If the matrix is unitary, the inverse transition matrix is the conjugate transpose.

attitude matrices

I know of two ways in which an attitude matrix is the natural way to specify a coordinate transformation.

First, as i’ve said before, is that we are given the new basis vectors; that is, we are given the old components of the new basis vectors. If I tell you the new x axis is the vector (1,1,0), I have given you its old components. (its new ones, of course, are (1,0,0).)

Now lay out the old components of the new vectors as rows of a matrix. That matrix is called the attitude matrix. We do this because it gives us is a matrix-like equation. If the old basis vectors are named e_1\ , e_2\ , e_3\ and the new ones are named f_1\ , f_2\ , f_3\ , and if A is the attitude matrix, then we could write

\left(\begin{array}{l} f_1 \\ f_2 \\ f_3\end{array}\right) = A\ \left(\begin{array}{l} e_1 \\ e_2 \\ e_3\end{array}\right)

OK, that might be convenient. And that’s why we might specify the attitude matrix, if we’re going to write equations involving basis vectors by name.

The key property, however, it that the attitude matrix is the transpose of the transition matrix. Come on, it’s got the old components of the new basis vectors laid out in rows; the transition matrix has them laid out in columns: they are transposes of each other by definition.

The second way I know of is geometrical, usually for rotations but it need not be limited to them correction: for rotations only. If someone says you can get the new coordinate axes by rotating the old ones in such-and-such a way, he has just given you the attitude matrix.

For 3D, visualize a set of three orthonormal vectors at the origin, visualize it as a physical object; someone has told us how to rotate that object in space.

This arises, for example, with orbits, whether satellite orbits around the earth, or planetary orbits around the sun, or any others. Consider planetary orbits, only because i’ve played with these. Suppose we’re interested in the orbit of mars.

Our old coordinate system uses the plane of the earth’s orbit as the xy-plane; z is normal to earth’s orbit and the earth is moving CCW around the z-axis. The x-axis points toward the vernal equinox. (I think they specify a particular date, called the epoch, since the vernal equinox does move, albeit very slowly.) The y-axis is chosen to give a right-handed coordinate system. Oh, the origin is at the sun. This is a coordinate system for the solar system.

Now, voyager, to Mars. (Oh, those went to jupiter and beyond.) what we start with is the equation of a conic in the form

r=\frac{p}{e \cos (\theta )+1}

where r is the distance from the sun, and \theta\ is measured from perihelion (closest approach, called periapse in general).

If we want an associated cartesian coordinate system, it is very natural to take

  • x = r cos \theta\
  • y = r sin \theta\
  • z normal to the plane of the orbit, to give a right-handed coordinate system.

And then what we have is, for Mars anyway, an ellipse whose major axis lies on the x-axis, minor axis on the y, origin at the sun. Call these Mars coordinates.

Cool. Give me \theta\ , I can tell you r, i.e. where Mars is. On its orbit. Or give me time t, and I can get \theta\ . (The challenge is to get time t given the angle \theta\ .)

Yeah, but i’d like to know where it is wrt my old coordinate system: where in the solar system is Mars? strictly speaking, what i’m going to tell you is, where in the solar system is Mars’ orbit?

What people often give us, for the coordinate transformation, are three angles: inclination i, longitude of the ascending node \Omega\ , and argument of periapse \omega\ . What do they mean? How do we use them?

The orbit of Mars and the orbit of earth lie in distinct planes through the sun, which therefore intersect in a line, called the line of nodes. That line, of course, lies in both planes, and in particular, it lies in our old xy-plane. The longitude of the ascending node \Omega\ specifies the angle of that line wrt the x-axis, measured in whatever direction has Mars ascending (moving from negative z to positive z).

So if we rotate about the z-axis thru \Omega\ , we move our old x-axis to the line of nodes.

Now look down that line: we have two planes making an angle, and that’s i, the inclination. If we do a rotation about the new x-axis (i.e. about the line of nodes), we can move our old z-axis so that it’s lined up with the new one, Mars z-axis. Alternatively, we are moving our old xy-plane so that it lies in Mars’ xy-plane.

The only thing left is that our current x-axis is still the line of nodes. We need to rotate it to Mars’ x-axis; that’s a rotation about the new (Mars) z-axis, and the angle required is, you got it, the argument of periapse, \omega\ .

To get from the solar system coordinatesystem (old) to Mars’ coordinate system (new), we start with the old axes, rotate about z thru \Omega\ , then rotate about x thru i, and then rotate thru z by \omega\ .

What makes this so simple is that after we do, say, the first rotation about z thru \Omega\ , the next rotation is written as a rotation about x thru i. It is about the intermediate x-axis, the line of nodes, but the matrix is our standard regly “rotation about the x-axis”. (I’ll show you these in the next post.)

Any such sequence of rotations about coordinate axes is called an euler-angle sequence. They always exist. They are not unique – we used ZXZ, but there are 11 other sequences we could have used. And, in some cases, it can be difficult to automatically find the angles that work. But this kind of transformation is a typical application.

Oh, I need to repeat that the end result of that sequence of rotations is the attitude matrix between solar system and Mars, with the solar system as old coordinate system.

there is a picture of the angles here. at the bottom of the page.