Color: Cohen’s intriguing F matrix again

Let me run thru the derivation of Figure 20 again, still with the CIE 1964 tables, but at 20 nm intervals. Then I show two alternative calculations. And one of them will show us why Cohen switched the sign on f2.

There is interesting linear algebra and vector algebra in this post, even if you’re not interested in where these problems came from.

Review:

What we are aiming for is an orthonormal basis whose three vectors are the columns of an F matrix. In particular, the first column — the first basis vector — is the fundamental of an equal-energy illuminant. The challenge was to find two other columns of F which matched Cohen’s Figure 20.

There were scaling issues all over the place. In particular, Cohen’s final three vectors were orthogonal but not orthonormal, because they were not of unit length. Having matched Figure 20, I will no longer concern myself with his scaling; unit vectors, I want unit vectors.

Let’s grab things as we need them.

Since I’ve shown you pictures derived from the 1964 xyz bar tables, I’ll stay with 1964.

What range of wavelengths? The CIE 1964 xyz bar tables run from 380 to 780. What the heck, just use 400 to 700 as before.

What interval? 20 nm, so I can display things.

Then my A matrix is the following subset of the CIE 1964 xyz bar tables:

A = \left(\begin{array}{ccc} 0.0191 & 0.002 & 0.086 \\ 0.2045 & 0.0214 & 0.9725 \\ 0.3837 & 0.0621 & 1.9673 \\ 0.3023 & 0.1282 & 1.7454 \\ 0.0805 & 0.2536 & 0.7721 \\ 0.0038 & 0.4608 & 0.2185 \\ 0.1177 & 0.7618 & 0.0607 \\ 0.3768 & 0.962 & 0.0137 \\ 0.7052 & 0.9973 & 0. \\ 1.0142 & 0.8689 & 0. \\ 1.124 & 0.6583 & 0. \\ 0.8563 & 0.3981 & 0. \\ 0.4316 & 0.1798 & 0. \\ 0.1526 & 0.0603 & 0. \\ 0.0409 & 0.0159 & 0. \\ 0.0096 & 0.0037 & 0.\end{array}\right)

There are 16 distinct wavelengths. I take the equal-energy illuminant to have the value 100 for 16 values:

ee= {100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100}

To get its fundamental, I need a projection operator onto the fundamental color space (the pre-image of the range of A’, the transpose of A). To get that, I need an orthonormal basis for the fundamental color space. To get that, I need the Singular Value Decomposition (SVD) of A’:

A’ = u w v’

We know that the first 3 columns (because A only had 3 columns) of v are an orthonormal basis for the fundamental color space; the remaining 13 (= 16 – 3) columns of v are an orthonormal basis for the null space of A’. Here are the first 3 columns of v, and I name them v1:

v1 = \left(\begin{array}{ccc} -0.0253683 & 0.0153138 & 0.00589786 \\ -0.285062 & 0.175889 & 0.0585877 \\ -0.574584 & 0.357498 & 0.0807567 \\ -0.513035 & 0.307203 & -0.0134903 \\ -0.242873 & 0.0986244 & -0.196418 \\ -0.112652 & -0.0638886 & -0.364177 \\ -0.124169 & -0.198387 & -0.487962 \\ -0.173395 & -0.313102 & -0.431359 \\ -0.221517 & -0.394131 & -0.198767 \\ -0.250764 & -0.427605 & 0.140947 \\ -0.241369 & -0.398876 & 0.386199 \\ -0.171479 & -0.278411 & 0.372286 \\ -0.0839289 & -0.135192 & 0.203385 \\ -0.0292822 & -0.046994 & 0.0743802 \\ -0.00781685 & -0.012531 & 0.020133 \\ -0.00183092 & -0.00293336 & 0.00474977\end{array}\right)

We know that we construct a projection operator R onto the column space of v1 by

R = v1 v1′.

We can quickly check that R is, at least, a projection operator, since it is idempotent: i.e. R^2 = R.

And we can check that it is a projection operator onto the column space of v1, since it preserves each of the columns of v1: i.e. R v1 = v1.

If we wanted, we could confirm that it projects all of the other columns of v to the zero vector: i.e. R v2 = 0, where v2 is the last 13 columns of v.

Enough already, we came here to get the fundamental of the equal energy illuminant…

f = R ee = {5.53946, 62.01, 125.887, 116.934, 68.0773, 55.4377, 80.9502, 109.171, 126.547, 128.189, 113.072, 76.3005, 36.473, 12.5844, 3.34797, 0.782783}

Make it a unit vector f1…

f1 = {0.0165427, 0.185182, 0.375939, 0.349205, 0.203302, 0.165555, 0.241744, 0.326021, 0.377912, 0.382816, 0.337672, 0.227859, 0.108921, 0.0375811, 0.00999815, 0.00233765}

For the other two vectors, we found that we could match Cohen by taking the three vectors {f1, E2, E3} and applying the Gram-Schmidt orthogonalization procedure… which is nothing more than subtracting off the part of E2 which is parallel to f1 (and normalizing, to get a unit vector f2), and then subtracting off the part of E3 which is in the span of f1 and f2.

First, we need the E basis; they are the dual basis to A’; or E’ is the dual basis to A. As always, I find the solution by writing the definition…

E’A = I

then writing the false (because A’ is not invertible)…

E' = A^{-1}\ ,

which leads me to write the true

E' = (A'A)^{-1} A'\ .

As usual, that formula is for the transpose E’ but I write E = E”, the transpose of the transpose:

E = \left(\begin{array}{ccc} 0.00507367 & -0.00518039 & 0.00960814 \\ 0.050171 & -0.0533474 & 0.109524 \\ 0.0711671 & -0.0800759 & 0.224744 \\ -0.00118297 & -0.00392724 & 0.205467 \\ -0.140011 & 0.15312 & 0.103269 \\ -0.256404 & 0.304158 & 0.0467504 \\ -0.323667 & 0.432104 & 0.0296721 \\ -0.247916 & 0.423527 & 0.0110232 \\ -0.0411424 & 0.27367 & -0.0156973 \\ 0.237054 & 0.0289735 & -0.0458278 \\ 0.422342 & -0.164367 & -0.0633192 \\ 0.375936 & -0.19191 & -0.0524917 \\ 0.200409 & -0.110177 & -0.027315 \\ 0.0725721 & -0.0410648 & -0.00979225 \\ 0.0195879 & -0.011175 & -0.00263529 \\ 0.00461444 & -0.00264363 & -0.000619869\end{array}\right)

Just for my sanity, let me make unit vectors e2 and e3 out of E2 and E3 (which are, in case you have any doubt, the 2nd and 3rd columns of E):

e2 = {-0.00645091, -0.0664311, -0.0997151, -0.00489041, 0.190674, 0.378755, 0.53808, 0.5274, 0.34079, 0.0360794, -0.204679, -0.238978, -0.137198, -0.0511362, -0.0139157, -0.003292}

e3 = {0.02679, 0.305381, 0.626647, 0.572896, 0.28794, 0.130352, 0.0827338, 0.0307357, -0.0437682, -0.12778, -0.176551, -0.146361, -0.0761614, -0.0273034, -0.00734788, -0.00172836}

The part of e2 which is parallel to f1 is…

f1 (e2.f1)
= {0.00585458, 0.0655375, 0.133048, 0.123587, 0.0719501, 0.0585914, 0.0855552, 0.115382, 0.133746, 0.135482, 0.119505, 0.080641, 0.0385479, 0.0133002, 0.00353843, 0.000827313}

so the part of e2 which is orthogonal to f1 is found by subtracting the parallel part from e2…

t1 = e2-f1 (e2.f1)
= {-0.0123055, -0.131969, -0.232763, -0.128477, 0.118724, 0.320164, 0.452525, 0.412018, 0.207044, -0.0994022, -0.324183, -0.319619, -0.175746, -0.0644364, -0.0174541, -0.00411931}

Do a quick check that t1 is othogonal to f1…

t1.f1 = -5.15775*10^-17

But t1 is not a unit vector: it’s squared length is t1.t1 = 0.874749 .

So we normalize it, and call the result f2:

f2 = {-0.013157, -0.141101, -0.24887, -0.137367, 0.126939, 0.342318, 0.483839, 0.440529, 0.221371, -0.106281, -0.346616, -0.341736, -0.187908, -0.0688953, -0.0186619, -0.00440436}

To get f3, we find the part of e3 which lies in the plane spanned by f1 and f2…

f1 (f1.e3)+f2 (f2.e3)
= {0.00735195, 0.0822258, 0.166479, 0.153523, 0.0869147, 0.0679319, 0.0993851, 0.136557, 0.161737, 0.167772, 0.150999, 0.103177, 0.0496131, 0.0171664, 0.00457096, 0.00106922}

then subtract that from e3, and the result will be orthogonal to the plane spanned by f1 and f2…

t2 = e3-(f1 (f1.e3)+f2 (f2.e3))
= {0.0194381, 0.223156, 0.460167, 0.419372, 0.201026, 0.0624205, -0.0166512, -0.105822, -0.205505, -0.295552, -0.32755, -0.249538, -0.125774, -0.0444698, -0.0119188, -0.00279758}

Like t1, t2 is not a unit vector, but we make it so, and call the result f3:

f3 = {0.0215889, 0.247847, 0.511084, 0.465775, 0.223269, 0.0693272, -0.0184937, -0.117531, -0.228244, -0.328255, -0.363793, -0.277149, -0.139691, -0.0493903, -0.0132376, -0.00310713}

If we arrange f1, f2, f3 as the columns of F…

F = \left(\begin{array}{ccc} 0.0165427 & -0.013157 & 0.0215889 \\ 0.185182 & -0.141101 & 0.247847 \\ 0.375939 & -0.24887 & 0.511084 \\ 0.349205 & -0.137367 & 0.465775 \\ 0.203302 & 0.126939 & 0.223269 \\ 0.165555 & 0.342318 & 0.0693272 \\ 0.241744 & 0.483839 & -0.0184937 \\ 0.326021 & 0.440529 & -0.117531 \\ 0.377912 & 0.221371 & -0.228244 \\ 0.382816 & -0.106281 & -0.328255 \\ 0.337672 & -0.346616 & -0.363793 \\ 0.227859 & -0.341736 & -0.277149 \\ 0.108921 & -0.187908 & -0.139691 \\ 0.0375811 & -0.0688953 & -0.0493903 \\ 0.00999815 & -0.0186619 & -0.0132376 \\ 0.00233765 & -0.00440436 & -0.00310713\end{array}\right)

we can compute F’F. Since F is supposed to be orthonormal, we expect to get a 3×3 identity matrix. Sure enough…

F'F = \left(\begin{array}{ccc} 1. & 0 & 0 \\ 0 & 1. & 0 \\ 0 & 0 & 1.\end{array}\right)

The resulting F, of course, are far from smooth…

but they still have the same kind of shapes we saw in Figure 20.

Oh, except that to match his figure, I had to take the negative of f2.

But I know of no good reason to do that, so I’m not going to. Yet. (Ok, Ok — the obvious guess is that f1, f2, f3 is a left-handed basis. If we change the sign of f2, we will get a right-handed basis. I’ll show it to you later in this post. That is, there is a good reason to do that, but nothing we’ve seen yet would lead us to it. )

What have we done? On a coarser scale, we have reproduced the calculations that lead to a specific orthonormal F matrix. It is one of many — infinitely many — bases which have f1 as the first basis vector. That really is the only special vector; the other two span the plane perpendicular to f1.

Cohen’s Cholesky Decomposition

Now let’s try it the way I think Cohen did it.

There is one potentially confusing problem: notation, as so often happens.

In the following discussion, “A” is no longer a matrix of color-matching functions. We will take “A” to have columns f, E2, E3; that was my starting point for the Gram-Schmidt orthogonalization. He describes such a starting point, but I believe he never shows the computations for such a case.

Here then is our new A matrix (which I call A1 in practice):

A1 = \left(\begin{array}{ccc} 5.53946 & -0.00518039 & 0.00960814 \\ 62.01 & -0.0533474 & 0.109524 \\ 125.887 & -0.0800759 & 0.224744 \\ 116.934 & -0.00392724 & 0.205467 \\ 68.0773 & 0.15312 & 0.103269 \\ 55.4377 & 0.304158 & 0.0467504 \\ 80.9502 & 0.432104 & 0.0296721 \\ 109.171 & 0.423527 & 0.0110232 \\ 126.547 & 0.27367 & -0.0156973 \\ 128.189 & 0.0289735 & -0.0458278 \\ 113.072 & -0.164367 & -0.0633192 \\ 76.3005 & -0.19191 & -0.0524917 \\ 36.473 & -0.110177 & -0.027315 \\ 12.5844 & -0.0410648 & -0.00979225 \\ 3.34797 & -0.011175 & -0.00263529 \\ 0.782783 & -0.00264363 & -0.000619869\end{array}\right)

Cohen computed two F matrices (F1 and F2) using two Cholesky decompositions, but we only need one of them. We recall that the first column of F2 = E Ga is proportional to the first column of A — and that’s what we want — so we will need E1, i.e. the dual basis to A1.

That is crucial. We need the dual basis to A1, not to A. We’ve already used the dual basis to A, to get two columns of A1, but now we need the analog of E for A1 instead of for A.

By definition, (dropping the 1’s for a moment)

E’A = I

and so

E' = (A'A)^{-1} A' = Me\ A'\ .

I am using Cohen’s

Ma = A’A

Me = E’E

and it turns out that Ma and Me are inverses. I will, of course, apply that equation for A and E to A1 and E1.

Here is Ma and its inverse Me:

Ma = \left(\begin{array}{ccc} 112130. & 95.1685 & 52.2359 \\ 95.1685 & 0.644885 & 0.0411263 \\ 52.2359 & 0.0411263 & 0.128627\end{array}\right)

Me = \left(\begin{array}{ccc} 0.0000123197 & -0.00153021 & -0.00451381 \\ -0.00153021 & 1.77301 & 0.054536 \\ -0.00451381 & 0.054536 & 9.59006\end{array}\right)

Here is E1:

Now we do a Cholesky decomposition of Ma:

Ga = \left(\begin{array}{ccc} 334.859 & 0. & 0. \\ 0.284205 & 0.751074 & 0. \\ 0.155994 & -0.00427115 & 0.322916\end{array}\right)

That is, the Cholesky decomposition has given us Ga such that Ga Ga’ = Ma.

Cohen had showed us that F2 = E Ga was an orthonormal matrix:

F2 = E Ga = \left(\begin{array}{ccc} 0.0165427 & -0.013157 & 0.0215889 \\ 0.185182 & -0.141101 & 0.247847 \\ 0.375939 & -0.24887 & 0.511084 \\ 0.349205 & -0.137367 & 0.465775 \\ 0.203302 & 0.126939 & 0.223269 \\ 0.165555 & 0.342318 & 0.0693272 \\ 0.241744 & 0.483839 & -0.0184937 \\ 0.326021 & 0.440529 & -0.117531 \\ 0.377912 & 0.221371 & -0.228244 \\ 0.382816 & -0.106281 & -0.328255 \\ 0.337672 & -0.346616 & -0.363793 \\ 0.227859 & -0.341736 & -0.277149 \\ 0.108921 & -0.187908 & -0.139691 \\ 0.0375811 & -0.0688953 & -0.0493903 \\ 0.00999815 & -0.0186619 & -0.0132376 \\ 0.00233765 & -0.00440436 & -0.00310713\end{array}\right)

I want to point out now, that the lower-triangular structure of Ga says that the 3rd column of F2 is proportional to the third column of E1: in fact, I claim we have (although it’s difficult to make it clear)

F2_3 = .322916 E1_3

(i.e. the 3rd column of F2 is .322916 times the 3rd column of E1).

Cohen also told us (I’ll confess that I haven’t figured out why) that the 1st column of F2 is proportional to the first column of A1: in fact, we have

F2_1 = .00298633 A1_1

Anyway, we have just applied his method to (f, E2, E3). And we have confirmed the realtionships between F2 and A1 and E1.

Fine, but what did we get? Did we get the same answers as from Gram-Schmidt?

Yes. The first column of F2 is equal to f1, the second column to f2, and the third column of F2 is equal to f3. (I could put up a drawing… but it looks exactly like the previous one.)

Lovely! And I’ll point out that we matched my f2 — not the negative of it. Even his method would require that we take the negative of his answer in order to get Figure 20.

The cross product

There is one other alternative computation I want to do.

We just saw that f1 was proportional to the first column of A1 — that’s how we chose it, a unit vector proportional to the fundamental f.

But we also saw that f3 was proportional to the third column of E1 — and so we could have computed f3 as soon as we had E1, just by normalizing the third column of E1. Cohen seems to have gone to an awful lot of work to get the matrix F2, given that both its first and third columns are known as soon as we compute E1 from A1. (This would work in all of his cases: for F2, normalize the first column of A and the third column of E; or — for F1 — normalize the first column of E and the third column of A.)

Anyway, that leaves f2, the second column of F2, to be determined.

Gee, it’s too bad we can’t compute a cross product for vectors of length 16.

Gee, who needs to?

Those vectors lie in a 3D subspace, so there is a change-of-basis transition matrix which leaves them with 3 nonzero components. In fact, the transition matrix is precisely v, and we know that new components y are related to old components x by

x = v y

so

y = v^{-1} x = v'x\

since v is orthogonal. For old components f1, we get

t1 = v’ f1
= {-0.913557, -0.393456, -0.102984, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

so we take y1 to be the three nonzero components…

y1 = {-0.913557, -0.393456, -0.102984}

For old components f3, we get

t3 = v’ f3
= {-0.361692, 0.901754, -0.236681, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

and, again, we take y3 to be the three nonzero components…

y3 = {-0.361692, 0.901754, -0.236681}

Now, what we have are the i and k unit vectors (first and third columns), we want the j unit vector (second column). The right-hand-rule shows me — you can’t do this with your right hand on the mouse! — that

j = k x i,

so I compute

y2 = y3 \times y1

y2 = {-0.18599, 0.178973, 0.966114}

(which is a unit vector) and use

x = v y

to get the corresponding f. Oh, I need to pad y2 out with zeroes.

t4 = {-0.18599, 0.178973, 0.966114, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

t2= v t4
= {0.013157, 0.141101, 0.24887, 0.137367, -0.126939, -0.342318, -0.483839, -0.440529, -0.221371, 0.106281, 0.346616, 0.341736, 0.187908, 0.0688953, 0.0186619, 0.00440436}

Quickly recall f2

f2 = {-0.013157, -0.141101, -0.24887, -0.137367, 0.126939, 0.342318, 0.483839, 0.440529, 0.221371, -0.106281, -0.346616, -0.34,1736, -0.187908, -0.0688953, -0.0186619, -0.00440436}

and we can confirm that t2 = – f2.

and that — presumably — is why he took the negative of f2: both his method and my Gram-Schmidt delivered left-handed coordinate systems; the cross product let us construct a right-handed coordinate system.

I think that’s pretty slick. Sure, it’s just a cross-product, but it was in a 3D subspace of R^{16}\ .

Color: Cohen Figure 20, “an intriguing F matrix”

I struggled to make sense of pages 94-101 of Cohen’s “Visual Color and Color Mixture”.

I understand them now. Mostly.

The challenge was to derive a drawing of three basis vectors (Figure 20, p. 95), shown here :

He calls these three vectors both “an intriguing F matrix” and “This canonical orthonormal configuration….” But as far as I can tell, he never said where he got this particular one. He made it sound important, so I had to derive it. Figuring it out, as usual, was very educational.

What follows is a detective story.

I eventually decided that the most likely candidate was a “transformation matrix” on p. 101. There are five transformation matrices on that page, but the key is that exactly one of them is associated with an equal-energy illuminant — and that’s related to “EE” in the figure.

Let us proceed somewhat as I did, but without all the wrong turns and blind alleys. If you want a chance to wander those byways, then close this page, open a copy of Cohen, and go to work. OTOH, I’m about to take you down one, just one, wrong turn. Hey, it’s interesting and valuable.

First, let’s get that particular transition matrix of his. He denotes it T_E but I will just refer to it as “TE” (outside of LaTeX).

T_E = \left(\begin{array}{ccc} 0.09947 & -0.15392 & -0.55609 \\ 0.20105 & 0.33414 & 0.41574 \\ 0.11061 & -0.18021 & 0.14036\end{array}\right)

He refers to the five T’s as “direct transformation matrices from the 1964 CIE X,Y,Z color-matching functions to orthonormal F matrices.” TE is specifically related to the equal-energy illumninant; the other four are related — in the same but as yet unexplained fashion — to the standard illuminants A, B, C, and D_65. (I have done the work to confirm that.)

OK. TE relates an A matrix to an F matrix. But how? “Transformation matrix” just doesn’t tell me enough.

A has 3 columns, so we could compute A TE, i.e. post-multiply A by TE. Since T_E^T\ , T_E^{-1}\ , and T_E^{-T} are all the same size, we could use any of them in place of TE.

Well, A TE doesn’t look right — I dismissed it immediately when I first saw it — but eventually I saw the light. A TE does not produce an orthonormal matrix — but it’s only off by a factor of \sqrt{2}\ (and some change).

Let’s compute A TE and call it F….

Oh, first I had better get the A matrix.

Cohen uses the 1964 CIE tables for this, so I take the A matrix to be the 1964 XYZ (i.e. xyz bar) tables at 5 nm intervals. Not at 20 nm. My only target is a drawing, and I want a lot of detail. I plan to repeat part of these calculations, and to do some alternatives, at 20 nm in the next post. But for this one, I need detail.

I have found a set of these tables online. Let me show you, for checking purposes, a tiny subset of the XYZ tables as I store them:

\left(\begin{array}{cc} 400 & \{0.0191,0.002,0.086\} \\ 405 & \{0.0434,0.0045,0.1971\} \\ 410 & \{0.0847,0.0088,0.3894\} \\ 415 & \{0.1406,0.0145,0.6568\} \\ 420 & \{0.2045,0.0214,0.9725\} \\ 425 & \{0.2647,0.0295,1.2825\}\end{array}\right)

My A matrix has 3 columns and 61 rows. The six rows of A corresponding to that part of the XYZ64 table are:

\left(\begin{array}{ccc} 0.0191 & 0.002 & 0.086 \\ 0.0434 & 0.0045 & 0.1971 \\ 0.0847 & 0.0088 & 0.3894 \\ 0.1406 & 0.0145 & 0.6568 \\ 0.2045 & 0.0214 & 0.9725 \\ 0.2647 & 0.0295 & 1.2825\end{array}\right)

Mathematica aside.
There, I’ve finally shown you how I’m getting different A matrices whenever I change things. The table XYZ64 — like my other three tables — has wavelength for the first argument, and the Interpolation command lets me select any subsets, or even to interpolate. (It always gets the actual points right. In fact, I am usually simply getting subsets of the data, not truly interpolating. If I ever ask for 1 nm intervals, then I would be interpolating from the 5 nm table.)

For example, the following command gives me the entries for an A matrix from 400 to 500 in 20 nm intervals (and displays it nicely):

Table[Interpolation[XYZ64,i],{i,400,500,20}]//MatrixForm

\left(\begin{array}{ccc} 0.0191 & 0.002 & 0.086 \\ 0.2045 & 0.0214 & 0.9725 \\ 0.3837 & 0.0621 & 1.9673 \\ 0.3023 & 0.1282 & 1.7454 \\ 0.0805 & 0.2536 & 0.7721 \\ 0.0038 & 0.4608 & 0.2185\end{array}\right)

So, once having set up my XYZ64 table, and the others, I can get any plausible A matrix I want.

end Mathematica aside.

For this problem, my A matrix is the entire 1964 CIE XYZ table at 5 nm intervals. Now we compute

F = A TE

and plot the results. Let me put the two drawings side-by-side (mine on the left):

That black curve could be his “EE” on a different scale, but my other two are utterly different from “I” and “II”.

Let me show you the first six rows of F:

\left(\begin{array}{ccc} 0.0118144 & -0.0177697 & 0.00228112 \\ 0.027023 & -0.0406959 & 0.00540148 \\ 0.0532659 & -0.0802704 & 0.0112139 \\ 0.0895494 & -0.135158 & 0.0200304 \\ 0.132212 & -0.19958 & 0.0316765 \\ 0.174118 & -0.262005 & 0.045079\end{array}\right)

Cohen said that F would be orthonormal. It is not. We would conclude that F was orthonormal if F’F = I. So we compute F’F:

F^TF = \left(\begin{array}{ccc} 2.05018 & 0.00324299 & -0.0144556 \\ 0.00324299 & 2.00107 & 0.000405886 \\ -0.0144556 & 0.000405886 & 2.0003\end{array}\right)

Round it off:

F^TF = \left(\begin{array}{ccc} 2.05 & 0 & -0.01 \\ 0 & 2. & 0 \\ -0.01 & 0 & 2.\end{array}\right)

I have no idea why his F — equivalently, his TE — is too large by \sqrt{2}\ . (The off-diagonal terms are almost zero, so these vectors are almost orthogonal. But their squared lengths are 2, so their lengths are \sqrt{2}\ . We could have fixed that by dividing TE by \sqrt{2}\ .)

Since the book was published posthumously, perhaps the editor was editing without computing. Perhaps he took what he found in Cohen’s notes or drafts and hoped that everything was consistent.

I assure you that the other possibilities (T_E^T\ , T_E^{-1}\ , and T_E^{-T}) do not come close to delivering an orthonormal F. Feel free to try them.

We conclude that there is a minor error in TE; it turns out that the other four transformation matrices on p. 100 also need to be divided by \sqrt{2}\ . Whatever he had in mind, he was consistent. Under the circumstances — posthumous publication, and the ease of computing unit vectors — this is rather unimportant — conceptually and computationally — once we know it.

More importantly, I think, we also conclude that the F produced by TE is not the F matrix in Figure 20. They seem to have — and actually do have — proportional first columns, but that’s all.

That is a little more troubling. I could have searched for a derivation of TE; instead, I searched for a derivation of Figure 20. I don’t care to do both. Let me be explicit: I don’t know how he got his TE matrix, nor any of the other 4 transformation matrices on p. 100. I do know how he got Figure 20.

It will be more than enough work to get Figure 20. I don’ t know where TE came from. I have some ideas, and I expect that you will too by the time we finish, but I’ m done with TE. Maybe I shouldn’t be, but enough is enough.

I should emphasize, however, that I am keeping the first column of F computed from TE, at least for a little while.

Oh, let me do one last thing with TE before I abandon it.

We know the relationship between A and TE: F = A TE. How would we express that in my usual terminology? Using T for TE, we have

F = A T

and we transpose, getting

F’ = T’ A’.

That says that each column of F’ is obtained by applying the matrix T’ to the corresponding column of A’. That tells us two or three things.

First and second, T’ is the transition matrix, and therefore his “transformation matrix” TE is actually an attitude matrix; third, he takes A as the new components and F as the old.

Just so I know.

Now let’s find a way to match Figure 20.

First, where did the first column — the black curve or “EE” — come from? (Yes, we showed that F produced from TE has a first column which is proportional to the “EE” curve. But where did this common vector come from?)

He tells us that it is proportional to the fundamental of the illuminant.

Ok. To get the fundamental, we need the projection operator. To get that, we need the SVD (Singular Value Decomposition) of the color-matching functions, i.e. of the matrix A^T\ .

A’ = u w v’.

More to the point, we need the first three columns of the matrix v. (We need three columns because A’, like A, is of rank 3.)

Let me show you the first six (of 61) rows of v1:

\left(\begin{array}{ccc} -0.0126669 & 0.00772016 & 0.00299387 \\ -0.0289985 & 0.0177432 & 0.00674605 \\ -0.0572094 & 0.0351751 & 0.0129521 \\ -0.0963098 & 0.0596099 & 0.0210865 \\ -0.142331 & 0.0886581 & 0.0298013 \\ -0.187415 & 0.117288 & 0.0368556\end{array}\right)

Those 3 column vectors are an orthonormal basis for the preimage of the range of A’ (“the fundamental color space”). We form a projection operator by

R = v1 v1′

Now to get compute the fundamental of an equal-energy illuminant ee, we need to define the illuminant (I used all 100’s):

ee = {100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100}

Its fundamental f is the projection under R:

f = R ee =

{5.53525, 12.6614, 24.9595, 41.9659, 61.9674, 81.6212, 98.8882, 114.647, 125.821, 130.294, 129.298, 124.753, 116.918, 107.167, 95.2578, 80.7352, 68.1596, 60.0607, 54.8153, 53.7962, 55.5973, 59.7538, 65.4578, 72.9488, 81.1471, 88.5547, 95.6709, 102.8, 109.309, 114.255, 118.44, 122.816, 126.542, 129.052, 130.251, 129.621, 127.996, 126.529, 123.888, 119.146, 112.759, 105.059, 96.1651, 86.301, 76.0275, 65.9399, 55.8152, 45.5545, 36.3286, 28.6452, 22.185, 16.823, 12.5322, 9.19186, 6.64563, 4.72835, 3.33391, 2.32954, 1.61867, 1.12825, 0.779473}

Let me graph a scaled version of f along with the first column of F. (Scaled? I have to divide f by 470. Trust me.)

As I have done before, the black dots are one vector — in this case, the fundamental f — and the red curve is the other vector, in this case the first column of F.

We see that they are the same. Where we used to know that his TE gave us a column of F proportional to the EE curve in Figure 20, now we know what the curve is. Now I can dispense with F computed from TE, because I can use the fundamental f of the equal-energy illuminant instead.

The horizontal axis is off — I didn’t bother showing the wavelength.

That scaling by 470? Ha! The 1964 XYZ table at 1 nm intervals is over the range 360 to 830 nm. And 830-360 = 470.

(Probably “the length of the interval of integration”. Not that an integral is appropriate, in my opinion. And Cohen has pointedly been doing linear algebra, not calculus. But the literature is littered with integrals, even though the data is discrete. This is the mathematics I am skeptical of in all such works: they use calculus for their theory but linear algebra for their computations. (The example from Wyszecki & Stiles was closer to linear algebra, and the example they did two pages later is clearly not done by integration — no, I didn’t work that example.) I suspect it’s because a long time ago we pretty much restricted ourselves to continuous systems…. But I keep coming back to the fact that their data — the color matching functions — are discrete, and linear algebra is the appropriate tool. That’s my opinion.)

That’s where the 470 seems to be from, but I don’t really care. It would have made more sense for him to have used 400-700 instead of 360-830, and that’s one reason I don’t care about the scaling. Further, and more importantly, we will ultimately make a unit vector out of the fundamental f. Its scaling along the way just isn’t all that important.

And I didn’t scale the first column of F, so the \sqrt{2}\ in F is consistent with the 470 in f.

Sheesh! And I thought Fourier transforms had too many different scalings!

What interests me is that the first column of F is proprtional to f. And both appear to be proportional to the curve “EE” in figure 20. This is progress.

Now let’s get the dual basis E’. (I think I like idea of calling the rows of E’ a dual basis for the columns of A… and the columns of E a reciprocal basis for the columns of A… and, finally, the columns of E are a dual basis for the rows of A’. Row vectors and column vectors live in vector spaces which are dual to each other.)

Oh, say it compactly! The columns of E are simultaneously a reciprocal basis for the columns of Acolumns of A and a dual basis for the rows of A’. Numerically the same, conceptually different.

I recall the required computation as

E’A = I,

E' = A^{-1} = (A'A)^{-1} A'\ ,

where A^{-1} doesn’t exist, but led me to the appropriate pseudo-inverse.

Like everything else in this example, E is too big to display, but here are its first six rows:

\left(\begin{array}{ccc} 0.00128533 & -0.00131508 & 0.00240808 \\ 0.00289282 & -0.00297823 & 0.00552559 \\ 0.00554886 & -0.00576118 & 0.0109365 \\ 0.00901767 & -0.00947485 & 0.0184882 \\ 0.0127322 & -0.0135554 & 0.0274475 \\ 0.0157701 & -0.0170116 & 0.0363094\end{array}\right)

And here is a graph of the three rows of E’ (equivalently, the three columns of E):

NOTE that these are the “CIE 1964 XYZ fundamentals” on p. 163 of Cohen. Some of those resemble Figure 20… wait, E is not orthonormal, so none of them should be curves in Figure 20 (which are orthogonal, although not unit, vectors!) The resemblance is just that: not much more than vague similarity.

OK, let’s try getting an orthonormal basis, somehow.

I don’t know about you, but I want something I understand. Let’s apply Gram-Schmidt to a non-orthonormal basis. (This is almost certainly not what Cohen did. I’ll show you what I think he did in the next post. We’ll get the same answer.)

The first two non-orthonormal bases I started with were

f, E1, E2

and

f, E1, E3.

In retrospect, my third choice

f, E2, E3

seems more obvious. OTOH, more obvious or not, what matters is that it does, in fact, generate Figure 20!

As is my custom, I am going to describe Gram-Schmidt for this 3D case. The general method just extends it from 3D to n-dimensions.

We have our first vector f. Normalize it, by

f1 = \frac{f}{\sqrt{f \cdot f}}\ .

This gives us our first unit vector, f1, for the new basis.

f1 = {0.00827079, 0.0189187, 0.0372946, 0.0627057, 0.092592, 0.121959, 0.147759, 0.171307, 0.188002, 0.194686, 0.193198, 0.186406, 0.174699, 0.16013, 0.142335, 0.120635, 0.101844, 0.0897429, 0.0819053, 0.0803825, 0.0830737, 0.0892844, 0.0978074, 0.109, 0.12125, 0.132319, 0.142952, 0.153603, 0.163331, 0.170721, 0.176973, 0.183512, 0.189079, 0.192829, 0.194622, 0.19368, 0.191252, 0.18906, 0.185114, 0.178029, 0.168484, 0.15698, 0.14369, 0.128951, 0.113601, 0.0985278, 0.0833993, 0.0680677, 0.0542824, 0.0428017, 0.033149, 0.025137, 0.0187257, 0.0137345, 0.00992992, 0.00706512, 0.00498154, 0.00348081, 0.00241862, 0.00168584, 0.00116469}

Normalize E2 and E3, as we did f, just to start with vectors of the same size. Call the unit vectors e2 and e3.

e2 = {-0.00326355, -0.00739089, -0.0142972, -0.0235132, -0.0336395, -0.0422167, -0.0480022, -0.0514717, -0.0506656, -0.0450514, -0.0346783, -0.0212322, -0.00302856, 0.0178248, 0.0440051, 0.0714143, 0.0954016, 0.120945, 0.141852, 0.165312, 0.189969, 0.213125, 0.235048, 0.255043, 0.270026, 0.276282, 0.276115, 0.273313, 0.264794, 0.24833, 0.226531, 0.201183, 0.171297, 0.136324, 0.0977252, 0.0577811, 0.0185254, -0.0185732, -0.0524476, -0.0805823, -0.102238, -0.116778, -0.124173, -0.124757, -0.11956, -0.109979, -0.0974696, -0.0829623, -0.0686652, -0.0558823, -0.0443658, -0.0340783, -0.0255961, -0.0189148, -0.013768, -0.00982428, -0.00696574, -0.0048783, -0.00340378, -0.00233611, -0.0016479}

e3 = {0.0134212, 0.0307963, 0.0609534, 0.103042, 0.152976, 0.202366, 0.245956, 0.285754, 0.313861, 0.325009, 0.321797, 0.308947, 0.28685, 0.259425, 0.22519, 0.182631, 0.143994, 0.114635, 0.0919794, 0.0761547, 0.0649611, 0.0570429, 0.0503958, 0.0457693, 0.041089, 0.0352787, 0.0289087, 0.0223688, 0.0152318, 0.00702492, -0.00182114, -0.0115187, -0.0217216, -0.0324157, -0.0433804, -0.0537281, -0.0633608, -0.0725008, -0.0801518, -0.0851463, -0.0875296, -0.0873093, -0.08454, -0.0794315, -0.0725582, -0.0646114, -0.0558614, -0.0465045, -0.0377562, -0.0302361, -0.0237077, -0.0180942, -0.0135352, -0.00996527, -0.00722962, -0.00515145, -0.0036426, -0.00254819, -0.00177437, -0.00122704, -0.000856806}

Now apply Gram-Schmidt to the pair f1 and e2. The vector e2 is not orthogonal to f1, but it can be written as the sum of something parallel to f1 plus something perpendicular to f1. We will find the parallel part using the dot product, and get the perpendicular part by subtraction.

The component of e2 wrt f1 is… e2 \cdot\ f1 = 0.354107 (since f1 is a unit vector), so we make a vector of that length, parallel to f1, and subtract it from e2. The result of that vector subtraction is perpendicular to f1. We normalize it and call it f2.

f2 = f1 - (e2 \cdot\ f1)\ f1\ .

Then

f1, f2

is an orthonormal basis that spans the same subspace as

f1, e2.

f2 = {-0.00662133, -0.0150664, -0.029409, -0.0488852, -0.0710293, -0.0913202, -0.107276, -0.119901, -0.125361, -0.121889, -0.110234, -0.0932842, -0.0693866, -0.041572, -0.00683967, 0.0306849, 0.063449, 0.0953437, 0.120668, 0.14633, 0.171675, 0.194084, 0.214299, 0.231442, 0.242825, 0.245323, 0.241118, 0.234089, 0.221296, 0.200893, 0.175217, 0.145636, 0.111572, 0.072756, 0.0308042, -0.0115508, -0.052607, -0.0914459, -0.126173, -0.153574, -0.173117, -0.184308, -0.187183, -0.182227, -0.170858, -0.154906, -0.135801, -0.114483, -0.0939761, -0.0759605, -0.0599912, -0.0459572, -0.0344598, -0.0254257, -0.0184818, -0.0131801, -0.00933457, -0.00653426, -0.00455539, -0.00313629, -0.00220307}

Now what do we do with e3? We want to break off the piece which lies in the plane spanned by f1 and f2. That’s just the vector

f1\ (e3 \cdot f1) + f2\ (e3 \cdot f2)\ .

When we subtract that from e3, what remains is perpendicular to the plane spanned by f1 and f2; we will normalize it to length 1 and then call it f3. (It’s crucial that f1 and f2 are an orthonormal basis for that plane; otherwise we would need an explicit reciprocal basis to find components of e3 wrt f1 and f2.)

f3 = {0.0108058, 0.0248259, 0.0492157, 0.0833774, 0.124048, 0.164396, 0.200091, 0.232718, 0.255782, 0.264988, 0.262322, 0.251597, 0.233086, 0.2101, 0.18122, 0.145023, 0.111701, 0.085438, 0.064457, 0.0480153, 0.034675, 0.0232252, 0.0120345, 0.00174899, -0.00919785, -0.0209678, -0.0332539, -0.0457821, -0.0586168, -0.071629, -0.0848832, -0.0992835, -0.113845, -0.128148, -0.141854, -0.153559, -0.163729, -0.173431, -0.180567, -0.183118, -0.181455, -0.175821, -0.166361, -0.153478, -0.138238, -0.121869, -0.10453, -0.0863848, -0.0696761, -0.0554858, -0.0433135, -0.0329815, -0.0246352, -0.0181132, -0.0131248, -0.00934719, -0.00660276, -0.00461709, -0.00321259, -0.00222783, -0.00154979}

(Maybe I shouldn’t have printed all those, but if you have the 1964 xyz bar tables and you’re trying to verify my calculations….)

So, it’s time to look at what we got. Here are f1, f2, f3. They are an orthonormal basis, and they span the same space as E1, E2, E3 — the fundamental color space, the pre-image of the range of A’.

The brown looks like Cohen’s “I”. But the mauve curve is off. Still….

Do I need the negative of the mauve curve?

Yes!

And multiply by 44 ! (No, I have no idea why! I just kept changing the multiplier until I matched his scale on Figure 20.)

Having found them, make them pretty — i.e. set the horizontal scale.

Here, let me put them side-by-side. They look very close, but not exact. Nevertheless, I am content: his numbers tend to be off a little, so what’s the point of trying to match them perfectly?

So. Start with the non-orthonormal basis f, E2, E3 and apply Gram-Schmidt to get the orthonormal basis f1, f2, f3. Then take the negative of f2.

I must emphasize that his scale (i.e. multiply by 44) does not represent orthonormal vectors. Pairwise orthogonal, yes — but the f1, f2, and f3 I derived were specifically unit vectors. The scale on his figure requires that I multiply by 44 (or something very close to it). Once we have unit vectors, we can scale them any way that is convenient. Here it was convenient to match his — frankly, incomprehensible — vertical scale.

Oh, let’s close with one last calculation, to compute the “transformation matrix” for Figure 20. We have found the F matrix for Figure 20, and we have the A matrix. What is the transition matrix T between them? (Actually, we know we want the attitude matrix instead.) Recall that he takes A as the new, and F as the old, components.

So, as usual, we compute each column of F’ (old) by applying T to each column of A’ (new). That is,

F’ = T A’

F = A T’

and we write the false but suggestive

T' = A^{-1} F

which gives us the true

T' = (A'A)^{-1} A' F

using the appropriate pseudo-inverse.

So we compute and display T’, and remember that, as the transpose of a transition matrix, it is an attitude matrix…

T^T = \left(\begin{array}{ccc} 0.0663267 & -0.37473 & -0.161437 \\ 0.142691 & 0.37685 & 0 \\ 0.0781229 & -0.00253108 & 0.161503\end{array}\right)

We can compare that to the transformation matrix TE we tried at the beginning:

T_E = \left(\begin{array}{ccc} 0.09947 & -0.15392 & -0.55609 \\ 0.20105 & 0.33414 & 0.41574 \\ 0.11061 & -0.18021 & 0.14036\end{array}\right)

As we would expect — since the F produced by TE did not match Figure 20 — these are different.

Real Short Summary

I matched his Figure 20 by applying the Gram-Schmidt orthogonalization procedure to the vectors f, E2, E3… where f was the fundamental of an equal-energy illuminant, and E2 and E3 were two of the basis vectors dual to the rows of A’.

Summary

We really started with two separate items: a “transformation matrix” TE — one of five on p. 100 — and Figure 20 on p. 95 of Cohen. The transformation matrix TE turned out to be an attitude matrix describing a new A matrix in terms of an old F matrix. (Yes, that’s backwards, but so long as we knew it, it didn’t matter. This is why I carefully distinguish attitude and transition matrices; so long as I know what I’ve been handed, I can use it properly and confidently.)

We saw that F = A TE reproduced only the “EE” curve of his Figure 20. And his F matrix failed to be orthonormal by a factor of \sqrt{2}\ .

We then saw that the “EE” curve was (proportional to) the fundamental f of an equal-energy illuminant. In order to find the fundamental of a spectrum, we had to find a projection operator R onto the pre-image of the range of A^T. The fundamental f is then the result of applying the projection operator R to a constant vector (“equal energy”).

By trial and error, I found that the F matrix for Figure 20 could be (very nearly) reproduced by applying the Gram-Schmidt orthogonalization procedure to the vectors f, E2, E3, where E1, E2, E3 are a basis dual to the rows of A’. We found the dual basis as the rows of E' = (A'A)^{-1} A' and then transposed E’ to get E, a matrix of 3 columns.

In the next post, I will repeat these calculations quickly using 20 nm intervals, and I will show you a few alternatives. In particular, I will show you that his method (a Cholesky decomposition) gives the same answer — even to having to switch the sign of the second column!

Example: Is it a transition matrix? Part 2

We had three matrices from Jolliffe, P, V, and Q. They were allegedly a set of principal components P, a varimax rotation V of P, and a quartimin “oblique rotation” Q.

I’ll remind you that when they say “oblique rotation” they mean a general change-of-basis. A rotation preserves an orthonormal basis; a rotation cannot transform an orthonormal basis to a non-orthonormal basis, and that’s what they mean — a transformation from an orthonormal basis to a non-orthonormal basis, or possibly a transformation from a merely orthogonal basis to a non-orthogonal one. In either case, the transformation cannot be a rotation.

(It isn’t that complicated! If you change the lengths of basis vectors, it isn’t a rotation; if you change the angles between the basis vectors, it isn’t a rotation.)

Anyway, we showed in Part 1 that V and Q spanned the same 4D subspace of R^{10}\ .

Now, what about V and P? Let me recall them:

P = \left(\begin{array}{cccc} 0.34 & -0.39 & 0.09 & -0.08 \\ 0.34 & -0.37 & -0.08 & -0.23 \\ 0.35 & -0.1 & 0.05 & 0.03 \\ 0.3 & -0.24 & -0.2 & 0.63 \\ 0.34 & -0.32 & 0.19 & -0.28 \\ 0.27 & 0.24 & -0.57 & 0.3 \\ 0.32 & 0.27 & -0.27 & -0.34 \\ 0.3 & 0.51 & 0.19 & 0.27 \\ 0.23 & 0.22 & 0.69 & 0.43 \\ 0.36 & 0.33 & -0.03 & 0.02\end{array}\right)

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

As before, I suppose there is a transition matrix T relating them; each column of V^T = V' can be computed by applying a transition matrix T to the corresponding column of P^T\ :

V’ = T P’

i.e.

V = P T’.

Then I imagine briefly that I can “solve” for T’…

T' = P^{-1}\ V\ ,

but then quickly replace the non-existent inverse of Q by the appropriate pseudo-inverse

T' = (P'P)^{-1}\ P'\ V\ .

I compute T’, and then display its transpose T:

T = \left(\begin{array}{cccc} 0.905145 & -0.386677 & 0.0882059 & -0.141151 \\ 0.853806 & 0.568131 & -0.0875601 & -0.220572 \\ 0.64498 & -0.155196 & -0.506397 & 0.490735 \\ 0.507437 & 0.131036 & 0.67664 & 0.347268\end{array}\right)

and then I test whether

V = P T’ (?).

The largest difference (element by element) between the matrices V and P T’ is 0.170627, which is rather larger (by a factor of 23) than the differences we saw in Part 1. Here’s the full blown matrix — where I chose to round off first, then subtract (i.e. what we’d compute if we were subtracting published, already rounded-off, matrices):

V - P\ T' = \left(\begin{array}{cccc} 0. & 0.01 & -0.02 & -0.01 \\ 0.01 & 0.01 & -0.02 & -0.02 \\ -0.01 & -0.01 & 0.01 & 0.01 \\ 0. & 0 & 0. & 0. \\ 0. & 0.01 & -0.02 & -0.01 \\ -0.01 & -0.01 & 0.03 & 0.02 \\ -0.01 & -0.02 & 0.05 & 0.03 \\ 0.05 & 0.07 & -0.17 & -0.12 \\ -0.02 & -0.04 & 0.09 & 0.07 \\ -0.02 & -0.03 & 0.07 & 0.05\end{array}\right)

I am inclined to think there is a probem. But we don’t know much at all about it.

Let’s try one of our alternatives. We construct a projection operator onto the subspace spanned by P.

Get the SVD (Singular Value Decomposition) of P…

P = u\ w\ v^T\ .

Then, since u is 10×10 and V is of rank 4, the leftmost 4 columns (u1) of u….

u1 = \left(\begin{array}{cccc} 0.132041 & 0.280276 & -0.423196 & 0.0366383 \\ 0.24536 & 0.379423 & -0.288143 & -0.0619886 \\ -0.0781059 & 0.252968 & -0.245411 & -0.0642461 \\ -0.310642 & 0.402272 & -0.122076 & 0.641009 \\ 0.19934 & 0.202925 & -0.456543 & -0.212204 \\ -0.275595 & 0.504975 & 0.417433 & 0.130869 \\ 0.0279239 & 0.347505 & 0.199748 & -0.545498 \\ -0.547166 & 0.00466106 & -0.00216808 & -0.298924 \\ -0.571058 & -0.293548 & -0.488675 & -0.061545 \\ -0.278923 & 0.220314 & 0.0368869 & -0.356268\end{array}\right)

are an orthonormal basis for the range of P (i.e. the column space of P). We construct a projection operator, as we have before, by

R = u1\ u1^T\ .

We check it first by confirming that R R = R, i.e. it is idempotent, hence a projection operator. We check it further by applying it to the column vectors in P: we expect that R P = P; i.e. it actually projects onto the column space of P (the space spanned by the columns of P).

Now, having checked that R is a projection operator onto P, we apply it to V, and compute V – R V:

V - R\ V = \left(\begin{array}{cccc} 0.00221622 & 0.0115115 & -0.0249851 & -0.0145408 \\ 0.0137721 & 0.012178 & -0.0243583 & -0.0200425 \\ -0.00564412 & -0.011024 & 0.00933521 & 0.0112507 \\ 0.00222059 & 0.00165751 & -0.00118369 & -0.00423324 \\ 0.00223276 & 0.00638406 & -0.0153346 & -0.0119237 \\ -0.00896396 & -0.0106169 & 0.0272358 & 0.0230479 \\ -0.00941945 & -0.0152489 & 0.0456323 & 0.0330044 \\ 0.0470134 & 0.0703021 & -0.170627 & -0.121383 \\ -0.0232815 & -0.036102 & 0.0941953 & 0.068255 \\ -0.0227796 & -0.0330688 & 0.0740154 & 0.0474348\end{array}\right)

Let me round that to the nearest .01…

V - R V = \left(\begin{array}{cccc} 0 & 0.01 & -0.02 & -0.01 \\ 0.01 & 0.01 & -0.02 & -0.02 \\ -0.01 & -0.01 & 0.01 & 0.01 \\ 0 & 0 & 0 & 0 \\ 0 & 0.01 & -0.02 & -0.01 \\ -0.01 & -0.01 & 0.03 & 0.02 \\ -0.01 & -0.02 & 0.05 & 0.03 \\ 0.05 & 0.07 & -0.17 & -0.12 \\ -0.02 & -0.04 & 0.09 & 0.07 \\ -0.02 & -0.03 & 0.07 & 0.05\end{array}\right)

I should put that array of differences into perspective. Recall V:

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

This reinforces my belief that we have a problem. Some of the differences we found are larger than some of the values of V itself. I can see that V is not in the column space of P, and while it’s not extremely far away, it’s not close either. (This seems more definitive than “the hypothetical transition matrix T doesn’t quite work”.)

As before, however, we are tip-toeing around the real question: Is every column vector in V a linear combination of the columns of P?

Easy enough. Do a regression. Actually 4 of them, one for each column of V.

Let me name the individual columns of the P matrix, and let me set the first column of V (v1) as “y”, and the columns of V as independent variables.

(Some smaller pictures follow, in which you can see that the final option is “IncludeConstantBasis -> False”.) The results are:

Not bad at all. In fact, rather close to perfect.

What was the first row of T?

{0.905145, -0.386677, 0.0882059, -0.141151}.

Yes, those are the \beta s (“Estimates”) for that regression; first row of T is first column of T’ is regression coefficients for the first column of V as the dependent variable.

Let me emphasize another relationship. For the first regression, yhat should be the projection of v1 (the 1st column of V) onto the column space of P. But we have that projection directly as R v1:

R v1 = {0.477784,0.476228,0.355644,0.257779,0.487767,
0.058964,0.209419,0.0529866,0.123281,0.19278}

… and we have the yhat from the regression:

yhat = {0.477784,0.476228,0.355644,0.257779,0.487767,
0.058964,0.209419,0.0529866,0.123281,0.19278}

Good, they are the same. So if regression works better for you than the projection operator, then use regression. Personally, I prefer the projection operator because I can apply it to all of V at once. (And because sometimes I have a use for the orthonormal basis u1.)

Largely to display the R^2, let me run the other three regressions. (Oh, I also reduced the magnification in my Mathematica notebook, so the pictures are smaller and the right side of the regression command isn’t cut off, at least on my screen.)

Here’s v2 as the dependent variable:

Here’s v3 as the dependent variable:

Here’s v4 as the dependent variable:

We see that the R^2 is over 99% for v2, falls to 95% for v3, and is 97.6% for v4. Those last two are just not large enough, not for this question.

I’m certain that V and P do not span the same subspace, but that’s about all I know.

We could, however, have made the dual choice, projecting P onto V instead of V onto P. Are they the same?

No.

We already know — from the T matrix, from the projection operator, or from the regressions — that the P and V matrices do not represent the same data in two different coordinate systems. But there might be — and in this case, there is! — more information available by making the other choice.

Let’s get the projection onto V instead of onto P. We need the SVD of V…

V = u\ w\ v^T\ .

We need the 4 leftmost columns (u1) of u…

u1 = \left(\begin{array}{cccc} -0.292636 & 0.0434068 & -0.33927 & 0.269896 \\ -0.274013 & 0.218205 & -0.238496 & 0.365998 \\ -0.337357 & -0.00723082 & -0.119568 & 0.0745101 \\ -0.355459 & 0.224109 & -0.415783 & -0.490416 \\ -0.2739 & -0.0500253 & -0.225463 & 0.452829 \\ -0.308498 & 0.463936 & 0.232078 & -0.416388 \\ -0.287334 & 0.227908 & 0.427185 & 0.212631 \\ -0.303344 & -0.281983 & 0.503807 & 0.152613 \\ -0.329068 & -0.741253 & -0.129874 & -0.304713 \\ -0.382183 & -0.0396735 & 0.288053 & -0.0857624\end{array}\right)

We construct the projection operator…

R = u1\ u1^T

Now we apply it to P, and compute the difference P – R P:

P - R P = \left(\begin{array}{cccc} 0.00186923 & -0.00646007 & -0.00165699 & 0.0293708 \\ -0.00396322 & 0.00598219 & 0.00143136 & 0.0162366 \\ 0.0035053 & 0.00255499 & -0.00525352 & -0.0232869 \\ 0.000382829 & 0.00307153 & 0.00267228 & 0.124774 \\ 0.000923133 & -0.00219591 & 0.00142733 & -0.0228012 \\ 0.0000472374 & -0.00300758 & -0.00291228 & -0.00287352 \\ -0.00462961 & -0.00220958 & -0.0000400809 & -0.149176 \\ 0.00305495 & 0.00376593 & -0.000439512 & 0.29618 \\ -0.00367312 & -0.00153518 & -0.000838857 & -0.106747 \\ 0.00147883 & -0.000459218 & 0.00482354 & -0.141981\end{array}\right)

So what?

Round it off:

P - R P = \left(\begin{array}{cccc} 0 & -0.01 & 0 & 0.03 \\ 0 & 0.01 & 0 & 0.02 \\ 0 & 0 & -0.01 & -0.02 \\ 0 & 0 & 0 & 0.12 \\ 0 & 0 & 0 & -0.02 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -0.15 \\ 0 & 0 & 0 & 0.3 \\ 0 & 0 & 0 & -0.11 \\ 0 & 0 & 0 & -0.14\end{array}\right)

Ah ha! It’s time for Robin to squeak out, “Holy moly, Batman!”

The fourth column of P does not lie in the subspace spanned by the 4 columns of V, while the first three do. That is the problem. I don’t why or where it happened, but I know exactly what’s wrong: the 4th column of P is the only problem.

Somebody blew it somewhere. (OK, since I don’t know how a varimax rotation is computed, I have to entertain the possibility that V is not supposed to be P under a change of basis. But I can’t believe that, because V and Q have a common column space, and 3 of the 4 columns of P lie in that common column space. If that’s an accident, what does a deliberate act look like?)

Is the mistake in the orignal paper? Were P and V computed from the same correlation matrix but by independent algorithms? Or was V computed incorectly from P (and then Q computed from V)?

Or did someone copy P incorrectly? (P should be primary, but then how the heck does the 4th column of P fail to be in the column space of V? I would sooner have expected an error in V, given P.)

Or did they rotate the first 3 PCs, and then apply that transformation to the 4th column? (No. I’ve checked. You can too.)

Let me emphasize that we would get the same result from four regressions each with one column of P as the dependent variable and all the columns of V as the independent variables. The first three regressions would be extremely good fits, but the fourth would not be. We would, again, conclude that the fourth column of P was not exactly — or even to a fair approximation — a linear combination of the columns of V.

Summary: theory

Given two matrices, say P and V, which are alleged to be the same data in different coordinate systems, we may check that assertion very quickly (if both are of full rank):

Define

T' = (P'P)^{-1} P'\ V\ .

If we find it true that

V’ = T P’,

then T is a transition matrix between two bases, and P and V are indeed the same data in different coordinate systems. We’re done.

If, however, the equality fails, V' \ne T\ P'\ ,

then T is not a transition matrix, and P and V are not the same data in different coordinate systems. Unless we want more information, we’re done.

Finally, we have seen that, numerically, we may have T “close to” a transition matrix. (In Part 1, V and Q are very nearly the same data; in the color posts, the xyx bar and rgb bar tables are very nearly the same data.)

Summary: practice

Computing T tells us if something is wrong, but not much more. It’s quick, easy, and gives a simple answer: “good” or “no good”.

Computing both projection operators P onto V and V onto P (not just one of them) would likewise tell us whether or not something is wrong, and, in addition, the projection operators might isolate the problem.

We could use regression instead of computing explicit projection operators.

Summary: this example

We found that V and Q are the same data in two different coordinate systems, but that P is not. In addition, we actually know that the problem is limited to the 4th column of P: it does not lie in the space spanned by the columns of V (i.e. it does not lie in the column space of V, which is also the column space of Q).

Question:
Can someone tell me if these P, V and Q match the original paper by Yule et al.?

And there’s a question you might have.
How many examples did I have to search through to find such a good (because it was such a bad) example?

One. This was the very first case I checked, outside of the 1931 CIE tables.

That’s scary.

Transition matrix: to be or not to be

Cohen (“Visual Color & Color Mixture”, see the bibliography) did something very interesting. In fact, he did something useful which I had never seen before.

Although this post uses some matrices which we saw in the color posts, I think this can stand on its own: you need not have read the color posts. But if you are specifically interested in color, or in Cohen’s work, this post is very relevant.

He was trying to describe how to find a transition matrix between two given data matrices. This will come in handy — very handy! — whenever people give the alleged result of an unspecified linear transformation of a data matrix.

On p. 93, he wrote

D T = C in general

F T = A in particular

with

T = (D'D)^{-1} D'\ C\ .

Because D’D (that is, D^T\ D\ ) is invertible, we infer that D and C are taller than wide, like the A matrix… like our usual data or design matrix X.

We also infer that T is a small matrix: its size is cxc, where c is the number of columns of D. D and C are the same size.

I do have to point out that in his equation (and his notation)

D T = C,

T is an attitude matrix! I will change that. (Trust me: you want me to change that.)

Since we have both XYZ (i.e. 1931 xyz bar) and RGB (1931 rgb bar) tables, and since they are supposed to be related by the transformation matrix T31…

T31 = \left(\begin{array}{ccc} 2.76888 & 1.75175 & 1.13016 \\ 1 & 4.5907 & 0.0601 \\ 0 & 0.05651 & 5.59427\end{array}\right)

… we should try the idea embodied in that equation for T. Assuming that there is a transition matrix T, let J and K stand for the XYZ and RGB matrices respectively.

As usual, let me use 20 nm intervals instead of the published 5 nm intervals. We have

J = \left(\begin{array}{ccc} 0.00003 & -0.00001 & 0.00117 \\ 0.0003 & -0.00014 & 0.01214 \\ 0.00211 & -0.0011 & 0.11541 \\ -0.00261 & 0.00149 & 0.31228 \\ -0.02608 & 0.01485 & 0.29821 \\ -0.04939 & 0.03914 & 0.14494 \\ -0.07173 & 0.08536 & 0.04776 \\ -0.09264 & 0.17468 & 0.01221 \\ -0.03152 & 0.21466 & 0.00146 \\ 0.0906 & 0.19702 & -0.0013 \\ 0.24526 & 0.1361 & -0.00108 \\ 0.34429 & 0.06246 & -0.00049 \\ 0.29708 & 0.01828 & -0.00015 \\ 0.15968 & 0.00334 & -0.00003 \\ 0.05932 & 0.00037 & 0 \\ 0.01687 & 0.00003 & 0 \\ 0.0041 & 0 & 0 \\ 0.00105 & 0 & 0 \\ 0.00025 & 0 & 0 \\ 0.00006 & 0 & 0 \\ 0 & 0 & 0\end{array}\right)

K = \left(\begin{array}{ccc} 0.0014 & 0. & 0.0065 \\ 0.0143 & 0.0004 & 0.0679 \\ 0.1344 & 0.004 & 0.6456 \\ 0.3483 & 0.023 & 1.7471 \\ 0.2908 & 0.06 & 1.6692 \\ 0.0956 & 0.139 & 0.813 \\ 0.0049 & 0.323 & 0.272 \\ 0.0633 & 0.71 & 0.0782 \\ 0.2904 & 0.954 & 0.0203 \\ 0.5945 & 0.995 & 0.0039 \\ 0.9163 & 0.87 & 0.0017 \\ 1.0622 & 0.631 & 0.0008 \\ 0.8544 & 0.381 & 0.0002 \\ 0.4479 & 0.175 & 0. \\ 0.1649 & 0.061 & 0. \\ 0.0468 & 0.017 & 0. \\ 0.0114 & 0.0041 & 0. \\ 0.0029 & 0.001 & 0. \\ 0.0007 & 0.0003 & 0. \\ 0.0002 & 0.0001 & 0. \\ 0. & 0. & 0.\end{array}\right)

In case you have not read the color posts, all I am saying is that J and K and T31 are given matrices, and T31 is supposed to be a transition matrix between J and K.

We do know — or take it as given — that for the transition matrix T31, old is XYZ ~ K, so let me write

K’ = T J’

That says T is a transition matrix which, applied to each column vector of J’, delivers the corresponding column of K’.

Transpose:

K = J T’

and then pretend J^{-1} exists (even though J is rectangular)

T' 	= J^{-1} K \

and then substitute the appropriate pseudo-inverse:

T' = (J'J)^{-1} J'\ K\ .

(The inappropriate pseudo-inverse would use JJ’ instead — which cannot be of full rank, given the assumed shape of J, hence cannot be invertible.)

That equation defined the transpose T’, but here’s T:

T = \left(\begin{array}{ccc} 2.76887 & 1.75176 & 1.13013 \\ 1.00002 & 4.59072 & 0.06008 \\ 0.00004 & 0.05661 & 5.59443\end{array}\right)

Recall the given T31:

T31 = \left(\begin{array}{ccc} 2.76888 & 1.75175 & 1.13016 \\ 1 & 4.5907 & 0.0601 \\ 0 & 0.05651 & 5.59427\end{array}\right)

By doing it my way, I constructed the counterpart to T31, which is the published transition matrix between XYZ and RGB (1931) with XYZ considered the “old” basis.

T and T31 are very close, but not the same.

Even if I used the full tables, at 5 nm intervals, we would not get T31.

First, I am certain that if there were a transition matrix T, then we must have computed it. Since this computation did not yield T31, I infer that T31 is not the transition matrix.

Second, is T the transition matrix? We should apply T to the appropriate table and see what we get. We had assumed

K’ = T J’, or equivalently

K = J T’

so let’s compute a counterpart to K, namely Q = J T’:

Q = \left(\begin{array}{ccc} 0.00139 & 0.00005 & 0.00654 \\ 0.01431 & 0.00039 & 0.06791 \\ 0.13434 & 0.00399 & 0.64559 \\ 0.3483 & 0.02299 & 1.74711 \\ 0.29082 & 0.06001 & 1.66915 \\ 0.09561 & 0.139 & 0.81307 \\ 0.00489 & 0.323 & 0.27202 \\ 0.06329 & 0.71 & 0.07819 \\ 0.29041 & 0.95401 & 0.02032 \\ 0.59452 & 0.99499 & 0.00388 \\ 0.91629 & 0.87 & 0.00167 \\ 1.06216 & 0.631 & 0.00081 \\ 0.85443 & 0.38099 & 0.00021 \\ 0.44795 & 0.17501 & 0.00003 \\ 0.1649 & 0.06102 & 0.00002 \\ 0.04676 & 0.01701 & 0 \\ 0.01135 & 0.0041 & 0 \\ 0.00291 & 0.00105 & 0 \\ 0.00069 & 0.00025 & 0 \\ 0.00017 & 0.00006 & 0 \\ 0 & 0 & 0\end{array}\right)

and compare it to K, via Q – K:

Picture 6

(I keep using pictures for things like that, namely scientific notation, just because it’s easier than editing the LaTeX.)

The largest entry in absolute value is… 0.000069765 . The three columns of Q look like this:

Picture 5

Pretty small, and pretty random, but still not really zero. Well, maybe it wouldn’t be. Let’s try our magic formula on J and Q, since they are related by the transition matrix T. From

T' = (J'J)^{-1} J'\ K\ ,

I compute what I call t’:

t' = (J'J)^{-1} J'\ Q\ ,

and then display the transpose of the transpose, t” = t:

t = \left(\begin{array}{ccc} 2.76887 & 1.75176 & 1.13013 \\ 1.00002 & 4.59072 & 0.0600843 \\ 0.0000430892 & 0.0566067 & 5.59443\end{array}\right)

Recall T:

T = \left(\begin{array}{ccc} 2.76887 & 1.75176 & 1.13013 \\ 1.00002 & 4.59072 & 0.0600843 \\ 0.0000430892 & 0.0566067 & 5.59443\end{array}\right)

So. That equation for T’ does recover the transition matrix if there is one. The discrepancy between T and T31 is real.

Maybe I’m making too big a deal of this. I infer that the XYZ and RGB tables do not span exactly the same subspace. They almost do, but not quite.

To be more explicit: the 1931 XYZ (xbar, ybar, zbar) and RGB (rbar, gbar, bbar) tables are not exactly related by any transition matrix whatsoever, although the published T31 is close to being one for them.

I have found a marvelous example — disappointing as a published result but marvelous as a bad example — where the discrepancy is significant. I expect to show it in another post, soon.

We’ve been looking at the computation in one way: if there is a transition matrix T such that

K’ = T J’

then

T' = (J'J)^{-1} J'\ K\ .

What if there is no such transition matrix T? We can still define T by that equation, and compute (J'J)^{-1} J'\ K\ (still assuming J is of full rank, with more rows than columns), but what are we getting, if T itself does not exist as a transition matrix?

That is, what if we compute T, and find that

K' \ne T\ J'\ ?

Well, this will be clearer in the next post, but…. That equation for T’ looks an awful lot like the normal equations for a least-squares fit.

What I will illustrate in the next post is that (J'J)^{-1} J'\ K\ generates a transition matrix to the subspace spanned by J. That is, T is a transition matrix — any invertible matrix is! — but it relates J to a subspace other than (the one spanned by) K.

(In our case, we had 21 observations; the 3 columns of J and of K define 3D subspaces of R^{21}\ . Those two 3D subspaces are almost the same, but not exactly.)

Let me try to explain here it without a good (I mean, bad) example. If J and K span the same subspace, then any one column of K (think of it as the dependent variable y) can be written as a linear combination of the columns of J (think of it as the X matrix).

But if J and K do not span the same space, then the best we can do is to find that linear combination (call it yhat !) of columns of X which is as close to y as possible.

They really are examples of y and yhat and X.

And what is T’ ? Well, if we write it with y and X…

(X'X)^{-1} X'\ y\

we might recognize that as \beta in ordinary least squares.

Let me show you that, while providing another example of the utility of the pseudo-inverse.

The derivation of the normal equations for ordinary least squares is fairly complicated, if only because it involves the derivative of a matrix. But the pseudo-inverse would let us recover the equations themselves.

Deriving them is one thing; recovering the form of them once we know they exist is another.

We write our model (this is a quick & dirty recollection of the answer, not a derivation):

y = X \ \beta\ .

(I guess I have to talk about that. True is yhat = X\ \beta\ . But I write y instead of yhat, as though we have equality rather than a projection onto a subspace.)

We pretend that X is invertible, and write

\beta =  X^{-1} y\ .

Since X is not invertible, but X’X is (by assumption X is of full rank with more rows than columns), we replace the non-existent inverse by the pseudo-inverse and write

\beta = (X'X)^{-1} X'\ y\ .

The real derivation gave us that very answer. Once I know that, I never need to compare them again. I have a plausibility argument that gives me the known right answer.

This is analagous to confirming in the previous color post, among others, that if I compute

E' = (A'A)^{-1} A'\

then I have a dual (or reciprocal or biorthogonal) basis E such that

E’A = I.

In that case, I prefer to verify it computationally every time.

Incidentally, this trickery with the pseudo-inverse would also generate a true result if I had started with yhat instead of y — it is true that \beta = (X'X)^{-1} X'\ \hat{y}\ , but it’s not very useful because we don’t know yhat!

You might have noticed that \beta corresponds, algebraically, to a column of T’. Asking if the first column of K is a linear combination of all the columns of J gives us the first column of T’; asking if the ith column of K is a linear combination of all the columns of J gives us the ith column of T’. This, too, will be in the next related post.

Summary

Given two data matrices J and K, tall and thin, we can define and compute

T' = (J'J)^{-1} J'\ K\ .

If the individual observations are in fact related by a transition matrix M, so that

K’ = M J’,

then our computed T is M:

T = M.

This is pretty slick. Any time someone hands us some data and the alleged result of applying an unspecified linear transformation to it — we can find that linear transformation if it exists, or demonstrate that it does not exist (in which case, they goofed).

And if there is no such matrix M? Well, it’s simple enough: if there is no such M — and J and K are both of full rank — then our computed T must be something else, so we do not have equality:

K' \ne T\ J'\ .

That’s all it takes for us to know that there is no transition matrix M between J’ and K’. If there were, it would be T, but that didn’t work.

(That’s if J and K are both of full rank; if only one is not, then one of J’J or K’K is actually not invertible — as JJ’ and KK’ are not invertible — and T can only be computed in one direction. I expect that T is not invertible if J and K are not both of full rank.)

And when there is no transition matrix M, we nevertheless get a transition matrix T, but it’s between J’ and Q’ = T J’.

I will freely confess that I did not know any part of this before I read Cohen: neither that we could find the transition matrix so easily; nor what we were finding if it were not a transition matrix.

(Cohen implicitly assumes that there exists a transition M; he does not discuss what happens if there isn’t one.)

Next post on this topic, a numerical illustration.

Transpose & Adjoint: review & example

introduction

Let me pick up an old problem. The mathematics comes from the two posts “transpose matrix and adjoint operator” part 1 & part 2. The problem itself comes from the schur’s lemma post.

Upon further reflection, I am going to change the problem a little bit. Do not expect to see the same answers as before.

I am also going to work it twice, assuming that we are given different information as our starting point, but I’ll do it for the very same problem.

As I have said, the appropriate question for an introduction to ABO blood groups is: Can your mother donate blood to you? Until you can answer that question, you’re missing something about blood groups.

This example is not that good, but it does tie together the following concepts and computations:

  • the transition matrix;
  • the attitude matrix;
  • the change of basis formula;
  • the reciprocal basis;
  • the transpose of a matrix;
  • the adjoint of a linear operator;
  • the matrix of the adjoint operator.

This example is intended as a guide to and a review of my relevant posts, and will not repeat very much of the explanations in them.

Well, I have found that the explanations are growing.

What is the problem? Given a matrix and a non-orthonormal basis, find the matrix of the adjoint operator with respect to the non-orthonormal basis.

What is the key? We only know one way to find the matrix of the adjoint operator: take the transpose of some matrix.

Here is the general case: given a matrix X with respect to a basis E, we can transpose it, getting X^T\ . But that is the matrix of the adjoint with respect to the dual basis F. To get the matrix with respect to the basis E, we must do a change-of-basis to get from X^T\ with respect to to F to, say, Y^T\ with respect to E.

There is, however, a very special case: if E is an orthonormal basis, then it is its own reciprocal basis, F = E, and we were done as soon as we computed the transpose, becuse X^T\ is the matrix of the adjoint with respect to E.

The general problem, then, is: given a matrix, find the matrix of its adjoint with respect to a non-orthonormal basis. We will consider two possibiities: first, that the original basis is orthonormal; second, that it is not.

I would like to make one final point before we dive in. There is a fundamental truth here, regardless of the precise definition of the adjoint operator. The fact is that two matrices A and A^T\ represent two different but related linear operators (with respect to an orthonormal basis); but if we transform those matrices to a non-orthonormal basis, the two results will not be the transposes of each other.

In other words, the relationship between two linear operators which is represented by a matrix and its transpose, holds only in an orthonormal basis.

first form of the question

First, let us suppose we are given a matrix A with respect to an orthonormal basis; and a description of a non-orthonormal basis.

Here is the given matrix A:

A = \left(\begin{array}{cc} 1 & 1 \\ 0 & 2\end{array}\right)

(This much is the same as before; this is the same matrix as in the previous posts.)

Here is the given non-orthonormal basis \{f_1, f_2\}\ specified by an attitude matrix…

att = \left(\begin{array}{cc} 1 & -1 \\ 1 & 0\end{array}\right)

Find the matrix, which I will call C^T\ , of the adjoint operator with respect to the non-orthonormal basis.

Easy enough, once you’re used to it: we have A with respect to an orthonormal basis, so we form its transpose A^T\ . This is the matrix of the adjoint with respect to the orthonormal basis. Now we just need to transform A^T\ to the non-orthonormal basis.

We want the transition matrix P, which is the transpose of the attitude matrix: P = {att}^T\ .

P = \left(\begin{array}{cc} 1 & 1 \\ -1 & 0\end{array}\right)

One form of the change-of-basis formula tells us how to find the matrix B with respect to the new basis corresponding to A with respect to the original:

B = P^{-1} A P\ ,

so we get

B = \left(\begin{array}{cc} 2 & 0 \\ -2 & 1\end{array}\right)

That is, A and B are matrix representations, with respect to two bases, of a single linear operator. We don’t actually need the matrix B, but it serves for displaying the general change-of-basis formula for from A to B. And we will want to contrast it with C^T\ .

We have been asked to find the matrix C^T\ corresponding to A^T\ with respect to a new basis. But that uses the change-of-basis formula, with A^T\ and C^T\ in place of A and B respectively:

C^T = P^{-1} A^T P\ ,

so we get

C^T = \left(\begin{array}{cc} 1 & -1 \\ 0 & 2\end{array}\right)

What we see (as we should expect) is that B and C^T\ are not the transposes of each other.

We have, then, 4 matrices. A and A^T\ represent a linear operator and its adjoint with respect to an orthornormal basis. B and C^T\ represent the same linear operator and its adjoint with respect to the non-orthonormal basis \{f_1, f_2\}\ .

(In principle, of course, we only needed three of those matrices: A, A^T\ , and C^T\ ; but it may be informative to compute B.)

second form of the question
first of three solutions

We are given the matrix B, representing a linear operator with respect to a non-orthonormal basis…

B = \left(\begin{array}{cc} 2 & 0 \\ -2 & 1\end{array}\right)

and we were given the attitude matrix describing that non-orthonormal basis…

att = \left(\begin{array}{cc} 1 & -1 \\ 1 & 0\end{array}\right)

We want to find the matrix C^T\ of the adjoint operator with respect to the non-orthonormal basis.

The quickest way to do it is to transform B to the orthonormal basis, which gives us A. We still use the transition matrix P = {att}^T \

P = \left(\begin{array}{cc} 1 & 1 \\ -1 & 0\end{array}\right)

and the change of basis formula (but note that we’re going in the other direction, to the orthonormal basis instead of from it)…

A = P B P^{-1}\ .

so we get

A = \left(\begin{array}{cc} 1 & 1 \\ 0 & 2\end{array}\right)

We have just reduced this question to the previous problem: now we can get C^T\ by transforming A^T\ to the non-orthonormal basis…

C^T = P^{-1} A^T P\ .

We’re done. We could stop here. Maybe we should. But it’s nice to know more than one way to do something, because then we can check our work.

second of three solutions

Alternatively, we could use the reciprocal basis.

This gets a little messy only because we’re introducing a 3rd basis, so we get 2 more matrices.

We transpose B, and we get the matrix of the adjoint operator with respect to the reciprocal basis:

B^T = \left(\begin{array}{cc} 2 & -2 \\ 0 & 1\end{array}\right)\ .

But we’ve only just begun. We don’t want the adjoint with respect to the reciprocal basis; we want it with respect to the given non-orthonormal basis. It’s just that, ultimately, we have to take the transpose with respect to some basis; it’s the only way to compute the matrix of the adjoint. Then the question

how do we transform from the reciprocal basis (where we have B^T\ ) to the non-orthonormal basis where we want C^T\ ?

becomes

what is the transition matrix from the reciprocal basis to the basis?

And that leads to the question: exactly what is our reciprocal basis?

I could simply say (!) that the attitude matrix for the reciprocal basis is the inverse of the transition matrix P for the non-orthonormal basis; that is, the transition matrix Q for the reciprocal basis is the inverse transpose of P:

Q = P^{-T}\ ,

Q = \left(\begin{array}{cc} 0 & 1 \\ -1 & 1\end{array}\right)

But I think it’s only fair to admit that I don’t remember the description of the attitude matrix for the reciprocal basis. What I have in my head is a matrix equation, which says that the product of the attitude matrix Q for the reciprocal basis with the transition matrix P for the given basis, is the identity:

Q^T P = I\ .

(And that matrix equation simply says that the dot products of the reciprocal basis vectors – the rows of Q^T\ – with the given non-orthonormal basis vectors – the columns of P – are either 0 or 1.)

Then I solve for Q:

Q = P^{-T}\ .

Of course Q is what I said it was, but the point is that I remember the derivation rather than the answer. Fortunately it’s a very short and easy derivation, so I see the answer almost immediately….

Having Q, we can transform B^T\ to (not from) the orthonormal basis…

D^T = Q B^T Q^{-1}\ ,

so we get

D^T = \left(\begin{array}{cc} 1 & 0 \\ 1 & 2\end{array}\right)

and then we could transform D^T\ to the non-orthonormal basis:

C^T = P^{-1} D^T P\ ,

C^T = \left(\begin{array}{cc} 1 & -1 \\ 0 & 2\end{array}\right)

That is, as it should be, the same answer.

third of three solutions

And the second solution shows us another way to have done it. From

C^T = P^{-1} D^T P\

and

D^T = Q B^T Q^{-1}\

we get

C^T = P^{-1} Q B^T Q^{-1} P\ ,

which we write as

C^T	= R^{-1} B^T R\ ,

with R defined as

R = Q^{-1} P\ .

By writing out the algebra first, we can dispense with explicit the computation and transformation of D^T\ .

In other words, the transition matrix R from the reciprocal basis to the non-orthonormal basis is given by R = Q^{-1} P\ . Equivalently, the transition matrix from the non-orthonormal basis to its reciprocal basis is given by P^{-1} Q\ .

(But even if we use R, we are implicitly going through the original orthonormal basis.)

angle and axis of rotation 2: the two correct answers

Back in what has turned out to be my most popular post, axis and angle of rotation, I showed how to switch between the angle and axis of rotation and the rotation matrix.

Given the axis of rotation as a unit vector (a, b, c), and the angle of rotation, we construct the matrix N

N = \left(\begin{array}{lll} 0 & c & -b \\ -c & 0 & a \\ b & -a & 0\end{array}\right)

and then we construct

A = I + N sin \theta + N^2 \left(1-cos \theta \right)

and that’s a rotation. To be specific, it is an attitude matrix for a rotation of coordinate axes, about the axis (a, b, c) through the angle \theta\ .

Conversely, given a rotation matrix, we find the axis of rotation as the eigenvector whose eigenvalue is 1, and then we use the formula for the trace of a rotation matrix

trace[A] = 1 + 2\ cos \theta

to get the angle \theta\ .

The problem is that the eigenvector and the angle are each determined only to within a sign. There are 4 possible pairings, only 2 of which are correct. Given a candidate pair, we construct the rotation matrix from it, and compare it to the given rotation matrix.

I now know how to uniquely determine the two correct pairs. We need no longer test the possibilities.

The key was a sidebar comment on p. 74 in Kuipers’ “Quaternions and Rotation Sequences”. What he said was that given an arbitrary orthogonal 3×3 matrix A = a_{ij}\ , the axis of rotation was given by

i \left(a_{23} - a_{32}\right) +  j \left(a_{31} - a_{13}\right) +  k \left(a_{12} - a_{21}\right)

where i, j, k are the usual unit coordinate-axis vectors. That is, the components are

\left(\left(a_{23} - a_{32}\right),\   \left(a_{31} - a_{13}\right),\ \left(a_{12} - a_{21}\right)\right)\ .

I struggled with that for a while, silly me, before I realized that instead of trying to prove it for an arbitrary orthogonal 3×3, I should just use the magic formula for the rotation matrix in terms of the axis and angle.

Once I realized that, the obvious calculation is A - A^T\ . Let

A = I + N sin \theta + N^2 \left(1-cos \theta \right)

and we get

A = \left(\begin{array}{lll} \left(-b^2-c^2\right) (1-\cos (\theta ))+1 & a b (1-\cos (\theta ))+c \sin (\theta ) & a c   (1-\cos (\theta ))-b \sin (\theta ) \\ a b (1-\cos (\theta ))-c \sin (\theta ) & \left(-a^2-c^2\right) (1-\cos (\theta ))+1 & b c   (1-\cos (\theta ))+a \sin (\theta ) \\ a c (1-\cos (\theta ))+b \sin (\theta ) & b c (1-\cos (\theta ))-a \sin (\theta ) &   \left(-a^2-b^2\right) (1-\cos (\theta ))+1\end{array}\right)

and then A - A^T\ is

A - A^T = \left(\begin{array}{lll} 0 & 2 c \sin (\theta ) & -2 b \sin (\theta ) \\ -2 c \sin (\theta ) & 0 & 2 a \sin (\theta ) \\ 2 b \sin (\theta ) & -2 a \sin (\theta ) & 0\end{array}\right)

Just look at those off-diagonal terms! They each involve one of a, b, c, and a common factor of sin \theta\ . Your first reaction might be that we don’t know sin \theta\ , but that doesn’t matter. We can write the vector 2\ sin \theta\ (a,\ b,\ c) from the off-diagonal terms, and then we can get (a, b, c) just by making a unit vector from it. And once we have any one of a, b, c, we can get sin \theta\ .

So Kuipers was right, and we can get the axis of rotation from the skew part of the rotation matrix. What he never said was that we could get the sine of the angle, too. And that, combined with the cosine, determines the quadrant.

Given a rotation matrix (representing a rotation of the coordinate axes), there are two ways to proceed. One is to enhance the existing algorithm:

  • Get the rotation axis (a, b, c) as the unit eigenvector with eigenvalue 1,
  • Get cos \theta\ from the trace of the rotation matrix,
  • Get sin \theta\ from a (or b or c) and the appropriate off-diagonal term in A - A^T\ .

The other way to go is to forget about the eigenvector calculation.

  • Get the vector 2\ sin \theta\ (a, b, c) from the off-diagonal terms,
  • Get the vector (a, b, c) as a unit vector,
  • Get cos \theta\ from the trace of the rotation matrix.
  • Get sin \theta\ from a (or b or c) and the appropriate off-diagonal term in A - A^T\ ,

In either case, having both the cosine and the sine of an angle determines it (mod 2\ \pi\ , of course, but that’s not the issue). We know which quadrant we’re in.

Oh, we get one or the other of the two correct pairs of axis and angle by the two possible choices of the sign of (a, b, c). Pick one sign for the vector, get sin \theta\ ; pick the other sign for the vector, get -sin \theta\ . These are the two correct pairs.

Let me emphasize that we work with two matrices: the rotation matrix A and the skew matrix A - A^T\ .

Let’s do it both ways. Here’s the attitude matrix to get to Mars’ coordinate frame. (It rotates the solar system coordinate axes to Mars coordinate axes.)

A = \left(\begin{array}{lll} 0.90956 & -0.414415 & -0.0310051 \\ 0.414851 & 0.909845 & 0.00899314 \\ 0.0244829 & -0.0210423 & 0.999479\end{array}\right)

I ask Mathematica® for the eigendecomposition, and select the eigenvector with eigenvalue 1. It is:

\{-0.0361149,\ -0.0667194,\ 0.997118\}

and that is a unit vector, whose components we call a, b, c, respectively.

We compute that the trace of the rotation matrix is 2.81888 and that’s 1 + 2\ cos \theta\ , so we get the cosine of the angle:

\cos \left(\theta \right)\to 0.909442

Next we compute A - A^T\ :

\left(\begin{array}{lll} 0. & -0.829266 & -0.055488 \\ 0.829266 & 0. & 0.0300354 \\ 0.055488 & -0.0300354 & 0.\end{array}\right)

We know that the (2,3) location is 2\ a\ sin \theta\ , so we have…

0.0300354=2 a \sin (\theta )

Plug in a…

0.0300354=-0.0722298 \sin (\theta )

which gives us

\{\sin (\theta )\to -0.415831\}

Now, we want the angle whose tangent is \frac{\sin \theta}{\cos \theta}\ , with the correct signs of the sine and cosine. I think that most software, nowadays, has a two-argument arctangent function which does exactly that, and Mathematica® certainly does. I get that the angle is (in radians)

\theta = -0.428857

And that is what we found originally: a valid answer was the eigenvector returned by Mathematica® and the negative angle.

Let’s try it the other way. From the general skew matrix

\left(\begin{array}{lll} 0 & 2 c \sin (\theta ) & -2 b \sin (\theta ) \\ -2 c \sin (\theta ) & 0 & 2 a \sin (\theta ) \\ 2 b \sin (\theta ) & -2 a \sin (\theta ) & 0\end{array}\right)

and the specific skew matrix

\left(\begin{array}{lll} 0. & -0.829266 & -0.055488 \\ 0.829266 & 0. & 0.0300354 \\ 0.055488 & -0.0300354 & 0.\end{array}\right)

we construct the vector

2\ sin \theta\ (a,\ b,\ c) = \{0.0300354,\ 0.055488,\ -0.829266\}

which happens to be the negative of the eigenvector we had found. Make a unit vector out of it, which gives us values for (a, b, c):

(a,\ b,\ c) = \{0.0361149,\ 0.0667194,\ -0.997118\}

That’s pedagogically very nice: we just happen to have made the other sign choice for the axis.

As before, we know that the (2,3) location is 2\ a\ sin \theta\ , so we have…

0.0300354=2 a\ \sin (\theta )

Plug in the value of a…

0.0300354=0.0722298 \sin (\theta )

which gives us

\{\sin (\theta )\to 0.415831\}

which, as it must be, is the negative of the previous value for the sine.

The cos \theta is still the same, so we find the new two-argument arctangent:

\theta = 0.428857

This time we got the positive angle. Of course, because we changed the sign on the rotation axis.

We have both of the two correct answers, and only the two correct answers. Having one, of course, we could write the other. I didn’t work the problem twice in order to get two solutions; I worked it twice to illustrate the two possible algorithms. I was very glad that they just happened to give the two different correct answers.

We no longer have to pick one of two unit vectors from column A and one of two angles from column B, and test pairs to see if they reproduce the rotation matrix.

As for the two algorithms, take your pick. My preference is to get the eigenvector, but Mathematica® makes that almost trivial for me to do. If it’s easier for you to get (a, b, c) from A - A^T\ , go for it.

Why not orthogonal?

Consider the diagonalization of a matrix:

D = P^{-1}\ A\ P\ ,

or

A = P\ D\ P^{-1}\ .

A friend asked me why I couldn’t always choose P to be an orthogonal (more generally, a unitary) matrix. Actually, it was more like: remind me why I can’t do that. I fumbled for the answer. Sad dog.

We recall that an orthogonal matrix Q is real, square and satisfies

Q^T\ Q = I\ ,

where Q^T is the transpose of Q. That is to say, the inverse of Q is its transpose: Q^T = Q^{-1}\ . If our eigenvector matrix P is orthogonal, we may replace

A = P\ D\ P^{-1}\ .

by

A = P\ D\ P^{T}\ .

A unitary matrix is complex (possibly real), square and satisfies

Q^{\dagger}\ Q = I\ ,

whereQ^{\dagger} is the conjugate transpose of Q. That is to say, the inverse is the conjugate transpose: Q^{\dagger} = Q^{-1}\ .

Finally, we recall that a matrix is normal if it commutes with its complex conjugate:

A^{\dagger}\ A  = A\ A^{\dagger}\ .

Now we have to go all the way back to the schur’s lemma post here and recall that

  1. Any matrix A can be brought to upper triangular form by some unitary matrix.
  2. A matrix A can be diagonalized by some unitary matrix if and only if A is normal.

That is to say, our guaranteed upper triangular matrix is in fact diagonal if and only if A is normal.

Return to our diagonalization: we have

D = P^{-1}\ A\ P\ ,

By (2), if A is not normal, then P cannot be unitary; if, in addition, P is real, then P cannot be orthogonal.

I suppose I should remind us all that the matrices X\ X^T and X^T\ X (cf. covariance and correlation matrices, R-mode and Q-mode) are normal, so that if we diagonalize them, our eigenvector matrices are orthogonal.

As for the SVD, X = u\ w\ v^T\ , we are using two matrices u and v, not one matrix P, and u and v are guaranteed unitary.

As for my fumbling the answer…. A old friend said that you understand something when it’s intuitively obvious. Well, it just is not intuitively obvious to me that normality of the one gives me orthogonality of the other. I should probably take another look at the proof…. In the meantime, I’ll try to just remember the answer.

the matrix exponential: 3 of 3

discussion

There was a lot going on in the example of the SN decomposition (2 of 3). First off, we found eigenvalues of a non-diagonable matrix A, and constructed a diagonal matrix D from them. Then we found 2 eigenvectors and 1 generalized eigenvector of A, and used them to construct a transition matrix P. We used that transition matrix to go from our diagonal D back to the original basis, and find S similar to D.

So S is diagonable while A is not. And A and S have the same eigenvalues; and the columns of P should be eigenvectors of S. They are. The generalized eigenvector that we found for A is an (ordinary) eigenvector of S, but we had to get a generalized eigenvector of A in order to construct S from D.

I wonder. Can I understand the distinction between eigenvectors and generalized eigenvectors by studying S and A? We’ll see.

I would also remark that A was special in one sense: it was lower triangular. It is not an accident that the eigenvalues of A are its diagonal elements. We could have written the diagonal matrix D by inspection. Instead, I got it together with the eigenvectors.

A question: what does P do to A? We compute

P^{-1}\ A\ P = \left(\begin{array}{lll} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1\end{array}\right)

It brings A to Jordan Canonical Form. (Did I make an especially good choice for the generalized eigenvector? I don’t know. It may be that any choice would have led to the JCF; or maybe not. My vague recollection from examples years ago is that I lucked out, that in general I have to use some of that considerable freedom I found in v to get JCF.)

If finding generalized eigenvectors is relatively painless for you, you may be happy with the N+S decomposition. (Of course, if you have a “matrix exponential” command available, you’re done.) If not, another possibility is to use the Cayley-Hamilton theorem: any matrix satisfies it’s own characteristic equation. In this example, the characteristic equation is

\lambda ^3-5 \lambda ^2+8 \lambda -4 = 0

(The roots of that are the eigenvalues, of course.) The Cayley-Hamilton says that

A^3\ - 5\ A^2 + 8\ A - 4 = 0

where the RHS must be the 3×3 zero matrix, and the –4 on the LHS must multiply the 3×3 identity matrix. And, indeed, A and its powers satisfy that equation.

I used to wonder what the Cayley-Hamilton theorem was good for. One thing it’s good for is turning higher powers of A into lower ones. Use it to express A^3 in terms of I, A, and A^2. then reduce A^4, and keep going. For our example, we could replace the infinite series in A by 3 infinite series of scalars: one series multiplies I, another multiplies A, and the third multiplies A^2. Maybe we could see the patterns for the three scalar series more easily than a pattern for the series in A itself.

There is yet another way to so this; we’ll see it when I look at the spectral decomposition theorem out of Halmos.

Ah, there’s one last point I want to make. Schur’s lemma told us that any matrix could be brought to upper triangular form. I didn’t say this, but any nilpotent matrix is similar to a strictly upper triangular matrix (i.e. upper triangular with zero diagonal). It’s also similar to a strictly lower triangular form; I think upper or lower should just depend on an ordering of the basis.

So why not have just split our triangular matrix into diagonal plus nilpotent? Because we need S and N to commute. Our example is a perfect illustration: we have a lower triangular matrix

A = \left(\begin{array}{lll} 1 & 0 & 0 \\ -1 & 2 & 0 \\ 1 & 1 & 2\end{array}\right)

so we can write it as the sum of diagonal \Sigma

\Sigma = \left(\begin{array}{lll} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2\end{array}\right)

and nilpotent B

B = \left(\begin{array}{lll} 0 & 0 & 0 \\ -1 & 0 & 0 \\ 1 & 1 & 0\end{array}\right)

While B is, indeed, nilpotent, it turns out that it’s B^3 which is equal to zero instead of B^2.

Do B and \Sigma commute? We compute one

B\ \Sigma = \left(\begin{array}{lll} 0 & 0 & 0 \\ -1 & 0 & 0 \\ 1 & 2 & 0\end{array}\right)

and the other

\Sigma \ B = \left(\begin{array}{lll} 0 & 0 & 0 \\ -2 & 0 & 0 \\ 2 & 2 & 0\end{array}\right)

No, they do not commute. It is nontrivial that we can find N and S which commute. We needed to go from D back to S, and get N = A – S.

Incidentally, the online reference linked back at the beginning of 1 of 3 shows that schur’s lemma itself can be used to compute the matrix exponential numerically, although the triangular form which schur’s lemma gives us cannot be used to get N and S. (I.e. we can use schur’s lemma as the basis for a different algorithm.)

the matrix exponential: 2 of 3

let’s look at an example of the SN decomposition.

This comes from Perko p. 35, example 3. We will compute exp(A) using the SN decomposition described in the previous post (“1 of 3″).

We take the following matrix:

A = \left(\begin{array}{lll} 1 & 0 & 0 \\ -1 & 2 & 0 \\ 1 & 1 & 2\end{array}\right)

I ask Mathematica® to find its eigenvalues and eigenvectors. For eigenvalues, I get

\lambda = \{2,\ 2,\ 1\}

For the eigenvector matrix, I get

\left(\begin{array}{lll} 0 & 0 & -1 \\ 0 & 0 & -1 \\ 1 & 0 & 2\end{array}\right)

We have three eigenvalues, one repeated. More importantly, we have only two eigenvectors. By that column of zeroes, Mathematica® is telling us that the repeated eigenvalue 2 has only one eigenvector

\{0,\ 0,\ 1\}

associated to it.

Let’s look for a generalized eigenvector for \lambda = 2. We solve {\left(A - 2I\right)}^2\ v = 0. A – 2 I is

A - 2 I = \left(\begin{array}{lll} -1 & 0 & 0 \\ -1 & 0 & 0 \\ 1 & 1 & 0\end{array}\right)

and {\left(A - 2I\right)}^2 is

{\left(A - 2I\right)}^2 = \left(\begin{array}{lll} 1 & 0 & 0 \\ 1 & 0 & 0 \\ -2 & 0 & 0\end{array}\right)

so we apply {\left(A - 2I\right)}^2 to an arbitrary vector and set the result to 0…

0 = \left(\begin{array}{lll} 1 & 0 & 0 \\ 1 & 0 & 0 \\ -2 & 0 & 0\end{array}\right) \left(\begin{array}{l} v1 \\ v2 \\ v3\end{array}\right)

and find that

\{\text{v1} = 0,\text{v1} = 0,-2 \text{v1} = 0\}

Mother Nature may not scream her secrets at us very often, but she’s sure about one thing: v1 = 0. That’s fine, because just as clearly, v2 and v3 are arbitrary. So one simple choice for a generalized eigenvector is {0, 1, 0}, and then my P matrix becomes

P = \left(\begin{array}{lll} 0 & 0 & -1 \\ 0 & 1 & -1 \\ 1 & 0 & 2\end{array}\right)

Our diagonal matrix of eigenvalues is

D = \left(\begin{array}{lll} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1\end{array}\right)

We undo the similarity transform to get S:

S = P\ D\ P^{-1} = \left(\begin{array}{lll} 1 & 0 & 0 \\ -1 & 2 & 0 \\ 2 & 0 & 2\end{array}\right)

We compute N = A – S

N = A - S = \left(\begin{array}{lll} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -1 & 1 & 0\end{array}\right)

Now we start computing powers of N. We don’t have to go far at all, because we compute and find that N^2 = 0. That is, in the expansion of exp(N), the only nonzero terms are I + N, so we compute it:

exp(N) = I + N = \left(\begin{array}{lll} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 1 & 1\end{array}\right)

and the matrix exp of D is, of course,

exp(D) = \left(\begin{array}{lll} e^2 & 0 & 0 \\ 0 & e^2 & 0 \\ 0 & 0 & e\end{array}\right)

Finally,

exp(A) = P\ exp(D)\ P^{-1}\ exp(N) \ ,

i.e.

exp(A) = \left(\begin{array}{lll} e & 0 & 0 \\ e-e^2 & e^2 & 0 \\ -2 e+e^2 & e^2 & e^2\end{array}\right)

I think this is just so slick. Harry Potter, eat your heart out: I know some true magic. (Don’t misunderstand, I do like the Harry Potter books. Right now I’m struggling thru the first one in German.)

the matrix exponential: 1 of 3

discussion

one correction 30 Aug 2008, per Brian Hall’s comment and counterexample here. see “edit” in this post.

Let me emphasize something I take for granted: i’m writing about theory as opposed to computations by computer. every time I say, for example, that two matrices A and B are similar via

B = P^{-1}\ A\ P\ ,

I am asserting that P is a transition matrix for a change-of-basis, and by definition it is invertible. In practice, the matrix P can be ill-conditioned, and computing  P^{-1} on a computer may be extremely hazardous.

An updated version of a classic paper may be found here . Having just looked at it, I may pepper my theoretical discussion with some vaguely computational comments.

Back to my ivory tower. Well, it’s hardly that, but it’s still far removed from numerical algorithms.

The definition of the matrix exponential is easy enough to write: as the exponential of a number x has the expansion

e^x = 1 + x + x^2\ /\ 2+ x^3\ /\ 3!\ +\ ...

we define

exp(A) = I + A + A^2\ /\ 2 + A^3\ /\ 3!\ +\ ....

But how do we actually compute it (in theory)? Well, we could just try computing the powers of A and look for a pattern, but there are easier ways. Far easier ways. (and just doing it numerically can be a bad idea: we can encounter disastrous cancellation between one power of A and the next.)

Even though I have a Mathematica® command which computes matrix exponentials for me, it’s nice to have some idea how I could check a computation. (I suspect that if the difficulties are numerical, anything else I try will be wrecked on the same rocks. OTOH, at least once a Mathematica® update broke some perfectly reliable code. It is plausible to try alternate methods.)

First off, if the matrix A is diagonal, with diagonal elements \left(d_1,\ ...,\ d_n\right) then its square A^2 is diagonal with diagonal elements \left({d_1}^2,\ ...,\ {d_n}^2\right) and its kth power A^k is diagonal with diagonal elements \left({d_1}^k,\ ...,\ {d_n}^k\right)\ . That is, powers of a diagonal matrix are exactly powers of the diagonal elements. It follows that if A is diagonal, exp(A) is just the exponentials of the diagonal elements: exp(A) is diagonal with elements \left(e^{d_1},\ ...,\ e^{d_n}\right)\ . We saw such an example here in Jz

Jz = \left(\begin{array}{lll} \hbar  & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -\hbar \end{array}\right)

when we computed the matrix exponential (note the negative sign among all the multipliers)

\text{MatrixExp}\left[-\frac{i\ \text{Jz}\ \theta   }{\hbar }\right]

and got

\left(\begin{array}{lll} e^{-i \theta } & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{i \theta }\end{array}\right)

(All those multipliers are a minor issue: as we have seen before, multiplying a matrix by a scalar multiplies all the eigenvalues but has no effect on the eigenvectors of the matrix.)

Second, if a matrix A is not diagonal but is diagonable to D, then we have

D = P^{-1}\ A\ P\ ,

i.e.

A = P\ D\ P^{-1}\ .

then

A^2 = P\ D\ P^{-1}\ P\ D\ P^{-1} = P\ D^2\ P^{-1}

and, in general,

A^k = P\ D^k\ P^{-1}\ .

That is, all the interior P’s and P-inverses cancel each other, leaving us with the same similarity transform. It follows that if A is diagonable to D, then
exp(A) = P\ exp(D)\ P^{-1}\ . So we just do a similarity transform of A to get D, take the exponential of D, and then reverse the similarity transform for A^k as for A. (As I said, P^{-1} may challenging in practice, but in theory either it exists or it doesn’t.)

(Although we need the transformation to diagonal form for getting exp(D), there was nothing special about D in the cancellation of terms: if A and B are similar, with

B = P^{-1}\ A\ P\ ,

then powers of A and B are similar, with

B^k= P^{-1}\ A^k\ P\ .)

If a matrix is not diagonable, then there are a number of different ways to handle it. If we could transform it to Jordan Canonical Form (JCF) , then we would just need to get the exponential of a Jordan matrix. But transforming a matrix to JCF is not all that easy, and there’s a lot of different kinds of them to find the exponential of.

While we’re on the subject…. If a matrix A can be diagonalized, the matrix P which accomplishes that consists of linearly independent eigenvectors. In fact, we see how diagonalization must fail if it fails. If we have enough linearly independent eigenvectors to form P, then P must diagonalize A. if A cannot be diagonalized, we must be unable to form P: we must not have enough linearly independent eigenvectors. (An nxn matrix which does not have n linearly independent matrices is called defective.)

But, if A cannot be diagonalized, we must still be able to bring it to JCF – or another non-diagonal canonical form – by a similarity transform, i.e. by a change of basis. I wish I had thought to ask a simple question years ago: what is the matrix that accomplishes that? Answer: we add columns of “generalized eigenvectors” to the columns of eigenvectors. There must be a similarity transformation; it just can’t have eigenvectors for all its columns.

One way to describe generalized eigenvectors g of A is this: let \lambda be an eigenvalue of the nxn matrix A of multiplicity m <= n; then for k = 1, … m, (i.e. for any power up to the multiplicity of \lambda\ ) any nonzero solution v of

(A - \lambda\ I)^k\ v = 0

is called a generalized eigenvector of A. An eigenvector satisfies that equation with k = 1.

If we find enough eigenvectors, we never go beyond k = 1. if we have an eigenvalue of multiplicity 2 that has only one eigenvector, then we look at (A - \lambda\ I)^2\ v = 0, and stop. Et cetera.

It turns out that we don’t actually have to go to JCF. There is another kind of matrix for which it is extremely easy to compute the matrix exponential: a matrix N is called nilpotent if some (positive) power k of N is zero: N^k= 0. Then all higher powers of N are also zero, and the infinite series for the matrix exponential of N terminates at the last nonzero power: it becomes a (finite) polynomial. (Yes, i’m taking k to be the smallest power of N which is zero.)

Returning to non-diagonable matrices, we construct a matrix P whose columns are eigenvectors and generalized eigenvectors. Along the way, we got all the eigenvalues of A. it turns out that we can write the SN decomposition

A = S + N

where

  • S is diagonable,
  • N is nilpotent,
  • and (very important) N and S commute, SN = NS.

Furthermore, our matrix P diagonalizes S:

D = P^{-1}\ S\ P\ .

but we come at that backwards: we get the eigenvalues, we construct the diagonal matrix D from them, and then we use P (which requires finding generalized eigenvectors) to get S as

S = P\ D\ P^{-1}\ ,

and then we get N as

N = A – S.

Finally, all of that is useful for the matrix exponential because if matrices B and C commute, then

exp(B+C) = exp(B) exp(C)

just as e^{x+y} = e^x\ e^y\ . Edit: (In fact, the converse holds, too: B and C commute if and only if exp(B+C) = exp(B) exp(C) ).

Correct is:

e^{(B+C)\ t} = e^{B\ t}\ e^{C\ t}\ \text{for all t if and only if BC = CB}

(Golub & van Loan, “Matrix Computations”, 2nd ed. p. 559)

end edit

Because S and N commute, we can compute the matrix exponential of a non-diagonable A as the product of two very simple matrix exponentials:

exp(A) = exp(S+N) = exp(S)\ exp(N) = P\ exp(D)\ P^{-1}\ exp(N)\ ,

i.e. as the product of: a similarity transform of a diagonal matrix D, and a polynomial in N.