Color: Cohen’s intriguing F matrix again

Let me run thru the derivation of Figure 20 again, still with the CIE 1964 tables, but at 20 nm intervals. Then I show two alternative calculations. And one of them will show us why Cohen switched the sign on f2.

There is interesting linear algebra and vector algebra in this post, even if you’re not interested in where these problems came from.

Review:

What we are aiming for is an orthonormal basis whose three vectors are the columns of an F matrix. In particular, the first column — the first basis vector — is the fundamental of an equal-energy illuminant. The challenge was to find two other columns of F which matched Cohen’s Figure 20.

There were scaling issues all over the place. In particular, Cohen’s final three vectors were orthogonal but not orthonormal, because they were not of unit length. Having matched Figure 20, I will no longer concern myself with his scaling; unit vectors, I want unit vectors.

Let’s grab things as we need them.

Since I’ve shown you pictures derived from the 1964 xyz bar tables, I’ll stay with 1964.

What range of wavelengths? The CIE 1964 xyz bar tables run from 380 to 780. What the heck, just use 400 to 700 as before.

What interval? 20 nm, so I can display things.

Then my A matrix is the following subset of the CIE 1964 xyz bar tables:

A = \left(\begin{array}{ccc} 0.0191 & 0.002 & 0.086 \\ 0.2045 & 0.0214 & 0.9725 \\ 0.3837 & 0.0621 & 1.9673 \\ 0.3023 & 0.1282 & 1.7454 \\ 0.0805 & 0.2536 & 0.7721 \\ 0.0038 & 0.4608 & 0.2185 \\ 0.1177 & 0.7618 & 0.0607 \\ 0.3768 & 0.962 & 0.0137 \\ 0.7052 & 0.9973 & 0. \\ 1.0142 & 0.8689 & 0. \\ 1.124 & 0.6583 & 0. \\ 0.8563 & 0.3981 & 0. \\ 0.4316 & 0.1798 & 0. \\ 0.1526 & 0.0603 & 0. \\ 0.0409 & 0.0159 & 0. \\ 0.0096 & 0.0037 & 0.\end{array}\right)

There are 16 distinct wavelengths. I take the equal-energy illuminant to have the value 100 for 16 values:

ee= {100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100}

To get its fundamental, I need a projection operator onto the fundamental color space (the pre-image of the range of A’, the transpose of A). To get that, I need an orthonormal basis for the fundamental color space. To get that, I need the Singular Value Decomposition (SVD) of A’:

A’ = u w v’

We know that the first 3 columns (because A only had 3 columns) of v are an orthonormal basis for the fundamental color space; the remaining 13 (= 16 – 3) columns of v are an orthonormal basis for the null space of A’. Here are the first 3 columns of v, and I name them v1:

v1 = \left(\begin{array}{ccc} -0.0253683 & 0.0153138 & 0.00589786 \\ -0.285062 & 0.175889 & 0.0585877 \\ -0.574584 & 0.357498 & 0.0807567 \\ -0.513035 & 0.307203 & -0.0134903 \\ -0.242873 & 0.0986244 & -0.196418 \\ -0.112652 & -0.0638886 & -0.364177 \\ -0.124169 & -0.198387 & -0.487962 \\ -0.173395 & -0.313102 & -0.431359 \\ -0.221517 & -0.394131 & -0.198767 \\ -0.250764 & -0.427605 & 0.140947 \\ -0.241369 & -0.398876 & 0.386199 \\ -0.171479 & -0.278411 & 0.372286 \\ -0.0839289 & -0.135192 & 0.203385 \\ -0.0292822 & -0.046994 & 0.0743802 \\ -0.00781685 & -0.012531 & 0.020133 \\ -0.00183092 & -0.00293336 & 0.00474977\end{array}\right)

We know that we construct a projection operator R onto the column space of v1 by

R = v1 v1′.

We can quickly check that R is, at least, a projection operator, since it is idempotent: i.e. R^2 = R.

And we can check that it is a projection operator onto the column space of v1, since it preserves each of the columns of v1: i.e. R v1 = v1.

If we wanted, we could confirm that it projects all of the other columns of v to the zero vector: i.e. R v2 = 0, where v2 is the last 13 columns of v.

Enough already, we came here to get the fundamental of the equal energy illuminant…

f = R ee = {5.53946, 62.01, 125.887, 116.934, 68.0773, 55.4377, 80.9502, 109.171, 126.547, 128.189, 113.072, 76.3005, 36.473, 12.5844, 3.34797, 0.782783}

Make it a unit vector f1…

f1 = {0.0165427, 0.185182, 0.375939, 0.349205, 0.203302, 0.165555, 0.241744, 0.326021, 0.377912, 0.382816, 0.337672, 0.227859, 0.108921, 0.0375811, 0.00999815, 0.00233765}

For the other two vectors, we found that we could match Cohen by taking the three vectors {f1, E2, E3} and applying the Gram-Schmidt orthogonalization procedure… which is nothing more than subtracting off the part of E2 which is parallel to f1 (and normalizing, to get a unit vector f2), and then subtracting off the part of E3 which is in the span of f1 and f2.

First, we need the E basis; they are the dual basis to A’; or E’ is the dual basis to A. As always, I find the solution by writing the definition…

E’A = I

then writing the false (because A’ is not invertible)…

E' = A^{-1}\ ,

which leads me to write the true

E' = (A'A)^{-1} A'\ .

As usual, that formula is for the transpose E’ but I write E = E”, the transpose of the transpose:

E = \left(\begin{array}{ccc} 0.00507367 & -0.00518039 & 0.00960814 \\ 0.050171 & -0.0533474 & 0.109524 \\ 0.0711671 & -0.0800759 & 0.224744 \\ -0.00118297 & -0.00392724 & 0.205467 \\ -0.140011 & 0.15312 & 0.103269 \\ -0.256404 & 0.304158 & 0.0467504 \\ -0.323667 & 0.432104 & 0.0296721 \\ -0.247916 & 0.423527 & 0.0110232 \\ -0.0411424 & 0.27367 & -0.0156973 \\ 0.237054 & 0.0289735 & -0.0458278 \\ 0.422342 & -0.164367 & -0.0633192 \\ 0.375936 & -0.19191 & -0.0524917 \\ 0.200409 & -0.110177 & -0.027315 \\ 0.0725721 & -0.0410648 & -0.00979225 \\ 0.0195879 & -0.011175 & -0.00263529 \\ 0.00461444 & -0.00264363 & -0.000619869\end{array}\right)

Just for my sanity, let me make unit vectors e2 and e3 out of E2 and E3 (which are, in case you have any doubt, the 2nd and 3rd columns of E):

e2 = {-0.00645091, -0.0664311, -0.0997151, -0.00489041, 0.190674, 0.378755, 0.53808, 0.5274, 0.34079, 0.0360794, -0.204679, -0.238978, -0.137198, -0.0511362, -0.0139157, -0.003292}

e3 = {0.02679, 0.305381, 0.626647, 0.572896, 0.28794, 0.130352, 0.0827338, 0.0307357, -0.0437682, -0.12778, -0.176551, -0.146361, -0.0761614, -0.0273034, -0.00734788, -0.00172836}

The part of e2 which is parallel to f1 is…

f1 (e2.f1)
= {0.00585458, 0.0655375, 0.133048, 0.123587, 0.0719501, 0.0585914, 0.0855552, 0.115382, 0.133746, 0.135482, 0.119505, 0.080641, 0.0385479, 0.0133002, 0.00353843, 0.000827313}

so the part of e2 which is orthogonal to f1 is found by subtracting the parallel part from e2…

t1 = e2-f1 (e2.f1)
= {-0.0123055, -0.131969, -0.232763, -0.128477, 0.118724, 0.320164, 0.452525, 0.412018, 0.207044, -0.0994022, -0.324183, -0.319619, -0.175746, -0.0644364, -0.0174541, -0.00411931}

Do a quick check that t1 is othogonal to f1…

t1.f1 = -5.15775*10^-17

But t1 is not a unit vector: it’s squared length is t1.t1 = 0.874749 .

So we normalize it, and call the result f2:

f2 = {-0.013157, -0.141101, -0.24887, -0.137367, 0.126939, 0.342318, 0.483839, 0.440529, 0.221371, -0.106281, -0.346616, -0.341736, -0.187908, -0.0688953, -0.0186619, -0.00440436}

To get f3, we find the part of e3 which lies in the plane spanned by f1 and f2…

f1 (f1.e3)+f2 (f2.e3)
= {0.00735195, 0.0822258, 0.166479, 0.153523, 0.0869147, 0.0679319, 0.0993851, 0.136557, 0.161737, 0.167772, 0.150999, 0.103177, 0.0496131, 0.0171664, 0.00457096, 0.00106922}

then subtract that from e3, and the result will be orthogonal to the plane spanned by f1 and f2…

t2 = e3-(f1 (f1.e3)+f2 (f2.e3))
= {0.0194381, 0.223156, 0.460167, 0.419372, 0.201026, 0.0624205, -0.0166512, -0.105822, -0.205505, -0.295552, -0.32755, -0.249538, -0.125774, -0.0444698, -0.0119188, -0.00279758}

Like t1, t2 is not a unit vector, but we make it so, and call the result f3:

f3 = {0.0215889, 0.247847, 0.511084, 0.465775, 0.223269, 0.0693272, -0.0184937, -0.117531, -0.228244, -0.328255, -0.363793, -0.277149, -0.139691, -0.0493903, -0.0132376, -0.00310713}

If we arrange f1, f2, f3 as the columns of F…

F = \left(\begin{array}{ccc} 0.0165427 & -0.013157 & 0.0215889 \\ 0.185182 & -0.141101 & 0.247847 \\ 0.375939 & -0.24887 & 0.511084 \\ 0.349205 & -0.137367 & 0.465775 \\ 0.203302 & 0.126939 & 0.223269 \\ 0.165555 & 0.342318 & 0.0693272 \\ 0.241744 & 0.483839 & -0.0184937 \\ 0.326021 & 0.440529 & -0.117531 \\ 0.377912 & 0.221371 & -0.228244 \\ 0.382816 & -0.106281 & -0.328255 \\ 0.337672 & -0.346616 & -0.363793 \\ 0.227859 & -0.341736 & -0.277149 \\ 0.108921 & -0.187908 & -0.139691 \\ 0.0375811 & -0.0688953 & -0.0493903 \\ 0.00999815 & -0.0186619 & -0.0132376 \\ 0.00233765 & -0.00440436 & -0.00310713\end{array}\right)

we can compute F’F. Since F is supposed to be orthonormal, we expect to get a 3×3 identity matrix. Sure enough…

F'F = \left(\begin{array}{ccc} 1. & 0 & 0 \\ 0 & 1. & 0 \\ 0 & 0 & 1.\end{array}\right)

The resulting F, of course, are far from smooth…

but they still have the same kind of shapes we saw in Figure 20.

Oh, except that to match his figure, I had to take the negative of f2.

But I know of no good reason to do that, so I’m not going to. Yet. (Ok, Ok — the obvious guess is that f1, f2, f3 is a left-handed basis. If we change the sign of f2, we will get a right-handed basis. I’ll show it to you later in this post. That is, there is a good reason to do that, but nothing we’ve seen yet would lead us to it. )

What have we done? On a coarser scale, we have reproduced the calculations that lead to a specific orthonormal F matrix. It is one of many — infinitely many — bases which have f1 as the first basis vector. That really is the only special vector; the other two span the plane perpendicular to f1.

Cohen’s Cholesky Decomposition

Now let’s try it the way I think Cohen did it.

There is one potentially confusing problem: notation, as so often happens.

In the following discussion, “A” is no longer a matrix of color-matching functions. We will take “A” to have columns f, E2, E3; that was my starting point for the Gram-Schmidt orthogonalization. He describes such a starting point, but I believe he never shows the computations for such a case.

Here then is our new A matrix (which I call A1 in practice):

A1 = \left(\begin{array}{ccc} 5.53946 & -0.00518039 & 0.00960814 \\ 62.01 & -0.0533474 & 0.109524 \\ 125.887 & -0.0800759 & 0.224744 \\ 116.934 & -0.00392724 & 0.205467 \\ 68.0773 & 0.15312 & 0.103269 \\ 55.4377 & 0.304158 & 0.0467504 \\ 80.9502 & 0.432104 & 0.0296721 \\ 109.171 & 0.423527 & 0.0110232 \\ 126.547 & 0.27367 & -0.0156973 \\ 128.189 & 0.0289735 & -0.0458278 \\ 113.072 & -0.164367 & -0.0633192 \\ 76.3005 & -0.19191 & -0.0524917 \\ 36.473 & -0.110177 & -0.027315 \\ 12.5844 & -0.0410648 & -0.00979225 \\ 3.34797 & -0.011175 & -0.00263529 \\ 0.782783 & -0.00264363 & -0.000619869\end{array}\right)

Cohen computed two F matrices (F1 and F2) using two Cholesky decompositions, but we only need one of them. We recall that the first column of F2 = E Ga is proportional to the first column of A — and that’s what we want — so we will need E1, i.e. the dual basis to A1.

That is crucial. We need the dual basis to A1, not to A. We’ve already used the dual basis to A, to get two columns of A1, but now we need the analog of E for A1 instead of for A.

By definition, (dropping the 1’s for a moment)

E’A = I

and so

E' = (A'A)^{-1} A' = Me\ A'\ .

I am using Cohen’s

Ma = A’A

Me = E’E

and it turns out that Ma and Me are inverses. I will, of course, apply that equation for A and E to A1 and E1.

Here is Ma and its inverse Me:

Ma = \left(\begin{array}{ccc} 112130. & 95.1685 & 52.2359 \\ 95.1685 & 0.644885 & 0.0411263 \\ 52.2359 & 0.0411263 & 0.128627\end{array}\right)

Me = \left(\begin{array}{ccc} 0.0000123197 & -0.00153021 & -0.00451381 \\ -0.00153021 & 1.77301 & 0.054536 \\ -0.00451381 & 0.054536 & 9.59006\end{array}\right)

Here is E1:

Now we do a Cholesky decomposition of Ma:

Ga = \left(\begin{array}{ccc} 334.859 & 0. & 0. \\ 0.284205 & 0.751074 & 0. \\ 0.155994 & -0.00427115 & 0.322916\end{array}\right)

That is, the Cholesky decomposition has given us Ga such that Ga Ga’ = Ma.

Cohen had showed us that F2 = E Ga was an orthonormal matrix:

F2 = E Ga = \left(\begin{array}{ccc} 0.0165427 & -0.013157 & 0.0215889 \\ 0.185182 & -0.141101 & 0.247847 \\ 0.375939 & -0.24887 & 0.511084 \\ 0.349205 & -0.137367 & 0.465775 \\ 0.203302 & 0.126939 & 0.223269 \\ 0.165555 & 0.342318 & 0.0693272 \\ 0.241744 & 0.483839 & -0.0184937 \\ 0.326021 & 0.440529 & -0.117531 \\ 0.377912 & 0.221371 & -0.228244 \\ 0.382816 & -0.106281 & -0.328255 \\ 0.337672 & -0.346616 & -0.363793 \\ 0.227859 & -0.341736 & -0.277149 \\ 0.108921 & -0.187908 & -0.139691 \\ 0.0375811 & -0.0688953 & -0.0493903 \\ 0.00999815 & -0.0186619 & -0.0132376 \\ 0.00233765 & -0.00440436 & -0.00310713\end{array}\right)

I want to point out now, that the lower-triangular structure of Ga says that the 3rd column of F2 is proportional to the third column of E1: in fact, I claim we have (although it’s difficult to make it clear)

F2_3 = .322916 E1_3

(i.e. the 3rd column of F2 is .322916 times the 3rd column of E1).

Cohen also told us (I’ll confess that I haven’t figured out why) that the 1st column of F2 is proportional to the first column of A1: in fact, we have

F2_1 = .00298633 A1_1

Anyway, we have just applied his method to (f, E2, E3). And we have confirmed the realtionships between F2 and A1 and E1.

Fine, but what did we get? Did we get the same answers as from Gram-Schmidt?

Yes. The first column of F2 is equal to f1, the second column to f2, and the third column of F2 is equal to f3. (I could put up a drawing… but it looks exactly like the previous one.)

Lovely! And I’ll point out that we matched my f2 — not the negative of it. Even his method would require that we take the negative of his answer in order to get Figure 20.

The cross product

There is one other alternative computation I want to do.

We just saw that f1 was proportional to the first column of A1 — that’s how we chose it, a unit vector proportional to the fundamental f.

But we also saw that f3 was proportional to the third column of E1 — and so we could have computed f3 as soon as we had E1, just by normalizing the third column of E1. Cohen seems to have gone to an awful lot of work to get the matrix F2, given that both its first and third columns are known as soon as we compute E1 from A1. (This would work in all of his cases: for F2, normalize the first column of A and the third column of E; or — for F1 — normalize the first column of E and the third column of A.)

Anyway, that leaves f2, the second column of F2, to be determined.

Gee, it’s too bad we can’t compute a cross product for vectors of length 16.

Gee, who needs to?

Those vectors lie in a 3D subspace, so there is a change-of-basis transition matrix which leaves them with 3 nonzero components. In fact, the transition matrix is precisely v, and we know that new components y are related to old components x by

x = v y

so

y = v^{-1} x = v'x\

since v is orthogonal. For old components f1, we get

t1 = v’ f1
= {-0.913557, -0.393456, -0.102984, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

so we take y1 to be the three nonzero components…

y1 = {-0.913557, -0.393456, -0.102984}

For old components f3, we get

t3 = v’ f3
= {-0.361692, 0.901754, -0.236681, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

and, again, we take y3 to be the three nonzero components…

y3 = {-0.361692, 0.901754, -0.236681}

Now, what we have are the i and k unit vectors (first and third columns), we want the j unit vector (second column). The right-hand-rule shows me — you can’t do this with your right hand on the mouse! — that

j = k x i,

so I compute

y2 = y3 \times y1

y2 = {-0.18599, 0.178973, 0.966114}

(which is a unit vector) and use

x = v y

to get the corresponding f. Oh, I need to pad y2 out with zeroes.

t4 = {-0.18599, 0.178973, 0.966114, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

t2= v t4
= {0.013157, 0.141101, 0.24887, 0.137367, -0.126939, -0.342318, -0.483839, -0.440529, -0.221371, 0.106281, 0.346616, 0.341736, 0.187908, 0.0688953, 0.0186619, 0.00440436}

Quickly recall f2

f2 = {-0.013157, -0.141101, -0.24887, -0.137367, 0.126939, 0.342318, 0.483839, 0.440529, 0.221371, -0.106281, -0.346616, -0.34,1736, -0.187908, -0.0688953, -0.0186619, -0.00440436}

and we can confirm that t2 = – f2.

and that — presumably — is why he took the negative of f2: both his method and my Gram-Schmidt delivered left-handed coordinate systems; the cross product let us construct a right-handed coordinate system.

I think that’s pretty slick. Sure, it’s just a cross-product, but it was in a 3D subspace of R^{16}\ .

Color: Cohen Figure 20, “an intriguing F matrix”

I struggled to make sense of pages 94-101 of Cohen’s “Visual Color and Color Mixture”.

I understand them now. Mostly.

The challenge was to derive a drawing of three basis vectors (Figure 20, p. 95), shown here :

He calls these three vectors both “an intriguing F matrix” and “This canonical orthonormal configuration….” But as far as I can tell, he never said where he got this particular one. He made it sound important, so I had to derive it. Figuring it out, as usual, was very educational.

What follows is a detective story.

I eventually decided that the most likely candidate was a “transformation matrix” on p. 101. There are five transformation matrices on that page, but the key is that exactly one of them is associated with an equal-energy illuminant — and that’s related to “EE” in the figure.

Let us proceed somewhat as I did, but without all the wrong turns and blind alleys. If you want a chance to wander those byways, then close this page, open a copy of Cohen, and go to work. OTOH, I’m about to take you down one, just one, wrong turn. Hey, it’s interesting and valuable.

First, let’s get that particular transition matrix of his. He denotes it T_E but I will just refer to it as “TE” (outside of LaTeX).

T_E = \left(\begin{array}{ccc} 0.09947 & -0.15392 & -0.55609 \\ 0.20105 & 0.33414 & 0.41574 \\ 0.11061 & -0.18021 & 0.14036\end{array}\right)

He refers to the five T’s as “direct transformation matrices from the 1964 CIE X,Y,Z color-matching functions to orthonormal F matrices.” TE is specifically related to the equal-energy illumninant; the other four are related — in the same but as yet unexplained fashion — to the standard illuminants A, B, C, and D_65. (I have done the work to confirm that.)

OK. TE relates an A matrix to an F matrix. But how? “Transformation matrix” just doesn’t tell me enough.

A has 3 columns, so we could compute A TE, i.e. post-multiply A by TE. Since T_E^T\ , T_E^{-1}\ , and T_E^{-T} are all the same size, we could use any of them in place of TE.

Well, A TE doesn’t look right — I dismissed it immediately when I first saw it — but eventually I saw the light. A TE does not produce an orthonormal matrix — but it’s only off by a factor of \sqrt{2}\ (and some change).

Let’s compute A TE and call it F….

Oh, first I had better get the A matrix.

Cohen uses the 1964 CIE tables for this, so I take the A matrix to be the 1964 XYZ (i.e. xyz bar) tables at 5 nm intervals. Not at 20 nm. My only target is a drawing, and I want a lot of detail. I plan to repeat part of these calculations, and to do some alternatives, at 20 nm in the next post. But for this one, I need detail.

I have found a set of these tables online. Let me show you, for checking purposes, a tiny subset of the XYZ tables as I store them:

\left(\begin{array}{cc} 400 & \{0.0191,0.002,0.086\} \\ 405 & \{0.0434,0.0045,0.1971\} \\ 410 & \{0.0847,0.0088,0.3894\} \\ 415 & \{0.1406,0.0145,0.6568\} \\ 420 & \{0.2045,0.0214,0.9725\} \\ 425 & \{0.2647,0.0295,1.2825\}\end{array}\right)

My A matrix has 3 columns and 61 rows. The six rows of A corresponding to that part of the XYZ64 table are:

\left(\begin{array}{ccc} 0.0191 & 0.002 & 0.086 \\ 0.0434 & 0.0045 & 0.1971 \\ 0.0847 & 0.0088 & 0.3894 \\ 0.1406 & 0.0145 & 0.6568 \\ 0.2045 & 0.0214 & 0.9725 \\ 0.2647 & 0.0295 & 1.2825\end{array}\right)

Mathematica aside.
There, I’ve finally shown you how I’m getting different A matrices whenever I change things. The table XYZ64 — like my other three tables — has wavelength for the first argument, and the Interpolation command lets me select any subsets, or even to interpolate. (It always gets the actual points right. In fact, I am usually simply getting subsets of the data, not truly interpolating. If I ever ask for 1 nm intervals, then I would be interpolating from the 5 nm table.)

For example, the following command gives me the entries for an A matrix from 400 to 500 in 20 nm intervals (and displays it nicely):

Table[Interpolation[XYZ64,i],{i,400,500,20}]//MatrixForm

\left(\begin{array}{ccc} 0.0191 & 0.002 & 0.086 \\ 0.2045 & 0.0214 & 0.9725 \\ 0.3837 & 0.0621 & 1.9673 \\ 0.3023 & 0.1282 & 1.7454 \\ 0.0805 & 0.2536 & 0.7721 \\ 0.0038 & 0.4608 & 0.2185\end{array}\right)

So, once having set up my XYZ64 table, and the others, I can get any plausible A matrix I want.

end Mathematica aside.

For this problem, my A matrix is the entire 1964 CIE XYZ table at 5 nm intervals. Now we compute

F = A TE

and plot the results. Let me put the two drawings side-by-side (mine on the left):

That black curve could be his “EE” on a different scale, but my other two are utterly different from “I” and “II”.

Let me show you the first six rows of F:

\left(\begin{array}{ccc} 0.0118144 & -0.0177697 & 0.00228112 \\ 0.027023 & -0.0406959 & 0.00540148 \\ 0.0532659 & -0.0802704 & 0.0112139 \\ 0.0895494 & -0.135158 & 0.0200304 \\ 0.132212 & -0.19958 & 0.0316765 \\ 0.174118 & -0.262005 & 0.045079\end{array}\right)

Cohen said that F would be orthonormal. It is not. We would conclude that F was orthonormal if F’F = I. So we compute F’F:

F^TF = \left(\begin{array}{ccc} 2.05018 & 0.00324299 & -0.0144556 \\ 0.00324299 & 2.00107 & 0.000405886 \\ -0.0144556 & 0.000405886 & 2.0003\end{array}\right)

Round it off:

F^TF = \left(\begin{array}{ccc} 2.05 & 0 & -0.01 \\ 0 & 2. & 0 \\ -0.01 & 0 & 2.\end{array}\right)

I have no idea why his F — equivalently, his TE — is too large by \sqrt{2}\ . (The off-diagonal terms are almost zero, so these vectors are almost orthogonal. But their squared lengths are 2, so their lengths are \sqrt{2}\ . We could have fixed that by dividing TE by \sqrt{2}\ .)

Since the book was published posthumously, perhaps the editor was editing without computing. Perhaps he took what he found in Cohen’s notes or drafts and hoped that everything was consistent.

I assure you that the other possibilities (T_E^T\ , T_E^{-1}\ , and T_E^{-T}) do not come close to delivering an orthonormal F. Feel free to try them.

We conclude that there is a minor error in TE; it turns out that the other four transformation matrices on p. 100 also need to be divided by \sqrt{2}\ . Whatever he had in mind, he was consistent. Under the circumstances — posthumous publication, and the ease of computing unit vectors — this is rather unimportant — conceptually and computationally — once we know it.

More importantly, I think, we also conclude that the F produced by TE is not the F matrix in Figure 20. They seem to have — and actually do have — proportional first columns, but that’s all.

That is a little more troubling. I could have searched for a derivation of TE; instead, I searched for a derivation of Figure 20. I don’t care to do both. Let me be explicit: I don’t know how he got his TE matrix, nor any of the other 4 transformation matrices on p. 100. I do know how he got Figure 20.

It will be more than enough work to get Figure 20. I don’ t know where TE came from. I have some ideas, and I expect that you will too by the time we finish, but I’ m done with TE. Maybe I shouldn’t be, but enough is enough.

I should emphasize, however, that I am keeping the first column of F computed from TE, at least for a little while.

Oh, let me do one last thing with TE before I abandon it.

We know the relationship between A and TE: F = A TE. How would we express that in my usual terminology? Using T for TE, we have

F = A T

and we transpose, getting

F’ = T’ A’.

That says that each column of F’ is obtained by applying the matrix T’ to the corresponding column of A’. That tells us two or three things.

First and second, T’ is the transition matrix, and therefore his “transformation matrix” TE is actually an attitude matrix; third, he takes A as the new components and F as the old.

Just so I know.

Now let’s find a way to match Figure 20.

First, where did the first column — the black curve or “EE” — come from? (Yes, we showed that F produced from TE has a first column which is proportional to the “EE” curve. But where did this common vector come from?)

He tells us that it is proportional to the fundamental of the illuminant.

Ok. To get the fundamental, we need the projection operator. To get that, we need the SVD (Singular Value Decomposition) of the color-matching functions, i.e. of the matrix A^T\ .

A’ = u w v’.

More to the point, we need the first three columns of the matrix v. (We need three columns because A’, like A, is of rank 3.)

Let me show you the first six (of 61) rows of v1:

\left(\begin{array}{ccc} -0.0126669 & 0.00772016 & 0.00299387 \\ -0.0289985 & 0.0177432 & 0.00674605 \\ -0.0572094 & 0.0351751 & 0.0129521 \\ -0.0963098 & 0.0596099 & 0.0210865 \\ -0.142331 & 0.0886581 & 0.0298013 \\ -0.187415 & 0.117288 & 0.0368556\end{array}\right)

Those 3 column vectors are an orthonormal basis for the preimage of the range of A’ (“the fundamental color space”). We form a projection operator by

R = v1 v1′

Now to get compute the fundamental of an equal-energy illuminant ee, we need to define the illuminant (I used all 100’s):

ee = {100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100}

Its fundamental f is the projection under R:

f = R ee =

{5.53525, 12.6614, 24.9595, 41.9659, 61.9674, 81.6212, 98.8882, 114.647, 125.821, 130.294, 129.298, 124.753, 116.918, 107.167, 95.2578, 80.7352, 68.1596, 60.0607, 54.8153, 53.7962, 55.5973, 59.7538, 65.4578, 72.9488, 81.1471, 88.5547, 95.6709, 102.8, 109.309, 114.255, 118.44, 122.816, 126.542, 129.052, 130.251, 129.621, 127.996, 126.529, 123.888, 119.146, 112.759, 105.059, 96.1651, 86.301, 76.0275, 65.9399, 55.8152, 45.5545, 36.3286, 28.6452, 22.185, 16.823, 12.5322, 9.19186, 6.64563, 4.72835, 3.33391, 2.32954, 1.61867, 1.12825, 0.779473}

Let me graph a scaled version of f along with the first column of F. (Scaled? I have to divide f by 470. Trust me.)

As I have done before, the black dots are one vector — in this case, the fundamental f — and the red curve is the other vector, in this case the first column of F.

We see that they are the same. Where we used to know that his TE gave us a column of F proportional to the EE curve in Figure 20, now we know what the curve is. Now I can dispense with F computed from TE, because I can use the fundamental f of the equal-energy illuminant instead.

The horizontal axis is off — I didn’t bother showing the wavelength.

That scaling by 470? Ha! The 1964 XYZ table at 1 nm intervals is over the range 360 to 830 nm. And 830-360 = 470.

(Probably “the length of the interval of integration”. Not that an integral is appropriate, in my opinion. And Cohen has pointedly been doing linear algebra, not calculus. But the literature is littered with integrals, even though the data is discrete. This is the mathematics I am skeptical of in all such works: they use calculus for their theory but linear algebra for their computations. (The example from Wyszecki & Stiles was closer to linear algebra, and the example they did two pages later is clearly not done by integration — no, I didn’t work that example.) I suspect it’s because a long time ago we pretty much restricted ourselves to continuous systems…. But I keep coming back to the fact that their data — the color matching functions — are discrete, and linear algebra is the appropriate tool. That’s my opinion.)

That’s where the 470 seems to be from, but I don’t really care. It would have made more sense for him to have used 400-700 instead of 360-830, and that’s one reason I don’t care about the scaling. Further, and more importantly, we will ultimately make a unit vector out of the fundamental f. Its scaling along the way just isn’t all that important.

And I didn’t scale the first column of F, so the \sqrt{2}\ in F is consistent with the 470 in f.

Sheesh! And I thought Fourier transforms had too many different scalings!

What interests me is that the first column of F is proprtional to f. And both appear to be proportional to the curve “EE” in figure 20. This is progress.

Now let’s get the dual basis E’. (I think I like idea of calling the rows of E’ a dual basis for the columns of A… and the columns of E a reciprocal basis for the columns of A… and, finally, the columns of E are a dual basis for the rows of A’. Row vectors and column vectors live in vector spaces which are dual to each other.)

Oh, say it compactly! The columns of E are simultaneously a reciprocal basis for the columns of Acolumns of A and a dual basis for the rows of A’. Numerically the same, conceptually different.

I recall the required computation as

E’A = I,

E' = A^{-1} = (A'A)^{-1} A'\ ,

where A^{-1} doesn’t exist, but led me to the appropriate pseudo-inverse.

Like everything else in this example, E is too big to display, but here are its first six rows:

\left(\begin{array}{ccc} 0.00128533 & -0.00131508 & 0.00240808 \\ 0.00289282 & -0.00297823 & 0.00552559 \\ 0.00554886 & -0.00576118 & 0.0109365 \\ 0.00901767 & -0.00947485 & 0.0184882 \\ 0.0127322 & -0.0135554 & 0.0274475 \\ 0.0157701 & -0.0170116 & 0.0363094\end{array}\right)

And here is a graph of the three rows of E’ (equivalently, the three columns of E):

NOTE that these are the “CIE 1964 XYZ fundamentals” on p. 163 of Cohen. Some of those resemble Figure 20… wait, E is not orthonormal, so none of them should be curves in Figure 20 (which are orthogonal, although not unit, vectors!) The resemblance is just that: not much more than vague similarity.

OK, let’s try getting an orthonormal basis, somehow.

I don’t know about you, but I want something I understand. Let’s apply Gram-Schmidt to a non-orthonormal basis. (This is almost certainly not what Cohen did. I’ll show you what I think he did in the next post. We’ll get the same answer.)

The first two non-orthonormal bases I started with were

f, E1, E2

and

f, E1, E3.

In retrospect, my third choice

f, E2, E3

seems more obvious. OTOH, more obvious or not, what matters is that it does, in fact, generate Figure 20!

As is my custom, I am going to describe Gram-Schmidt for this 3D case. The general method just extends it from 3D to n-dimensions.

We have our first vector f. Normalize it, by

f1 = \frac{f}{\sqrt{f \cdot f}}\ .

This gives us our first unit vector, f1, for the new basis.

f1 = {0.00827079, 0.0189187, 0.0372946, 0.0627057, 0.092592, 0.121959, 0.147759, 0.171307, 0.188002, 0.194686, 0.193198, 0.186406, 0.174699, 0.16013, 0.142335, 0.120635, 0.101844, 0.0897429, 0.0819053, 0.0803825, 0.0830737, 0.0892844, 0.0978074, 0.109, 0.12125, 0.132319, 0.142952, 0.153603, 0.163331, 0.170721, 0.176973, 0.183512, 0.189079, 0.192829, 0.194622, 0.19368, 0.191252, 0.18906, 0.185114, 0.178029, 0.168484, 0.15698, 0.14369, 0.128951, 0.113601, 0.0985278, 0.0833993, 0.0680677, 0.0542824, 0.0428017, 0.033149, 0.025137, 0.0187257, 0.0137345, 0.00992992, 0.00706512, 0.00498154, 0.00348081, 0.00241862, 0.00168584, 0.00116469}

Normalize E2 and E3, as we did f, just to start with vectors of the same size. Call the unit vectors e2 and e3.

e2 = {-0.00326355, -0.00739089, -0.0142972, -0.0235132, -0.0336395, -0.0422167, -0.0480022, -0.0514717, -0.0506656, -0.0450514, -0.0346783, -0.0212322, -0.00302856, 0.0178248, 0.0440051, 0.0714143, 0.0954016, 0.120945, 0.141852, 0.165312, 0.189969, 0.213125, 0.235048, 0.255043, 0.270026, 0.276282, 0.276115, 0.273313, 0.264794, 0.24833, 0.226531, 0.201183, 0.171297, 0.136324, 0.0977252, 0.0577811, 0.0185254, -0.0185732, -0.0524476, -0.0805823, -0.102238, -0.116778, -0.124173, -0.124757, -0.11956, -0.109979, -0.0974696, -0.0829623, -0.0686652, -0.0558823, -0.0443658, -0.0340783, -0.0255961, -0.0189148, -0.013768, -0.00982428, -0.00696574, -0.0048783, -0.00340378, -0.00233611, -0.0016479}

e3 = {0.0134212, 0.0307963, 0.0609534, 0.103042, 0.152976, 0.202366, 0.245956, 0.285754, 0.313861, 0.325009, 0.321797, 0.308947, 0.28685, 0.259425, 0.22519, 0.182631, 0.143994, 0.114635, 0.0919794, 0.0761547, 0.0649611, 0.0570429, 0.0503958, 0.0457693, 0.041089, 0.0352787, 0.0289087, 0.0223688, 0.0152318, 0.00702492, -0.00182114, -0.0115187, -0.0217216, -0.0324157, -0.0433804, -0.0537281, -0.0633608, -0.0725008, -0.0801518, -0.0851463, -0.0875296, -0.0873093, -0.08454, -0.0794315, -0.0725582, -0.0646114, -0.0558614, -0.0465045, -0.0377562, -0.0302361, -0.0237077, -0.0180942, -0.0135352, -0.00996527, -0.00722962, -0.00515145, -0.0036426, -0.00254819, -0.00177437, -0.00122704, -0.000856806}

Now apply Gram-Schmidt to the pair f1 and e2. The vector e2 is not orthogonal to f1, but it can be written as the sum of something parallel to f1 plus something perpendicular to f1. We will find the parallel part using the dot product, and get the perpendicular part by subtraction.

The component of e2 wrt f1 is… e2 \cdot\ f1 = 0.354107 (since f1 is a unit vector), so we make a vector of that length, parallel to f1, and subtract it from e2. The result of that vector subtraction is perpendicular to f1. We normalize it and call it f2.

f2 = f1 - (e2 \cdot\ f1)\ f1\ .

Then

f1, f2

is an orthonormal basis that spans the same subspace as

f1, e2.

f2 = {-0.00662133, -0.0150664, -0.029409, -0.0488852, -0.0710293, -0.0913202, -0.107276, -0.119901, -0.125361, -0.121889, -0.110234, -0.0932842, -0.0693866, -0.041572, -0.00683967, 0.0306849, 0.063449, 0.0953437, 0.120668, 0.14633, 0.171675, 0.194084, 0.214299, 0.231442, 0.242825, 0.245323, 0.241118, 0.234089, 0.221296, 0.200893, 0.175217, 0.145636, 0.111572, 0.072756, 0.0308042, -0.0115508, -0.052607, -0.0914459, -0.126173, -0.153574, -0.173117, -0.184308, -0.187183, -0.182227, -0.170858, -0.154906, -0.135801, -0.114483, -0.0939761, -0.0759605, -0.0599912, -0.0459572, -0.0344598, -0.0254257, -0.0184818, -0.0131801, -0.00933457, -0.00653426, -0.00455539, -0.00313629, -0.00220307}

Now what do we do with e3? We want to break off the piece which lies in the plane spanned by f1 and f2. That’s just the vector

f1\ (e3 \cdot f1) + f2\ (e3 \cdot f2)\ .

When we subtract that from e3, what remains is perpendicular to the plane spanned by f1 and f2; we will normalize it to length 1 and then call it f3. (It’s crucial that f1 and f2 are an orthonormal basis for that plane; otherwise we would need an explicit reciprocal basis to find components of e3 wrt f1 and f2.)

f3 = {0.0108058, 0.0248259, 0.0492157, 0.0833774, 0.124048, 0.164396, 0.200091, 0.232718, 0.255782, 0.264988, 0.262322, 0.251597, 0.233086, 0.2101, 0.18122, 0.145023, 0.111701, 0.085438, 0.064457, 0.0480153, 0.034675, 0.0232252, 0.0120345, 0.00174899, -0.00919785, -0.0209678, -0.0332539, -0.0457821, -0.0586168, -0.071629, -0.0848832, -0.0992835, -0.113845, -0.128148, -0.141854, -0.153559, -0.163729, -0.173431, -0.180567, -0.183118, -0.181455, -0.175821, -0.166361, -0.153478, -0.138238, -0.121869, -0.10453, -0.0863848, -0.0696761, -0.0554858, -0.0433135, -0.0329815, -0.0246352, -0.0181132, -0.0131248, -0.00934719, -0.00660276, -0.00461709, -0.00321259, -0.00222783, -0.00154979}

(Maybe I shouldn’t have printed all those, but if you have the 1964 xyz bar tables and you’re trying to verify my calculations….)

So, it’s time to look at what we got. Here are f1, f2, f3. They are an orthonormal basis, and they span the same space as E1, E2, E3 — the fundamental color space, the pre-image of the range of A’.

The brown looks like Cohen’s “I”. But the mauve curve is off. Still….

Do I need the negative of the mauve curve?

Yes!

And multiply by 44 ! (No, I have no idea why! I just kept changing the multiplier until I matched his scale on Figure 20.)

Having found them, make them pretty — i.e. set the horizontal scale.

Here, let me put them side-by-side. They look very close, but not exact. Nevertheless, I am content: his numbers tend to be off a little, so what’s the point of trying to match them perfectly?

So. Start with the non-orthonormal basis f, E2, E3 and apply Gram-Schmidt to get the orthonormal basis f1, f2, f3. Then take the negative of f2.

I must emphasize that his scale (i.e. multiply by 44) does not represent orthonormal vectors. Pairwise orthogonal, yes — but the f1, f2, and f3 I derived were specifically unit vectors. The scale on his figure requires that I multiply by 44 (or something very close to it). Once we have unit vectors, we can scale them any way that is convenient. Here it was convenient to match his — frankly, incomprehensible — vertical scale.

Oh, let’s close with one last calculation, to compute the “transformation matrix” for Figure 20. We have found the F matrix for Figure 20, and we have the A matrix. What is the transition matrix T between them? (Actually, we know we want the attitude matrix instead.) Recall that he takes A as the new, and F as the old, components.

So, as usual, we compute each column of F’ (old) by applying T to each column of A’ (new). That is,

F’ = T A’

F = A T’

and we write the false but suggestive

T' = A^{-1} F

which gives us the true

T' = (A'A)^{-1} A' F

using the appropriate pseudo-inverse.

So we compute and display T’, and remember that, as the transpose of a transition matrix, it is an attitude matrix…

T^T = \left(\begin{array}{ccc} 0.0663267 & -0.37473 & -0.161437 \\ 0.142691 & 0.37685 & 0 \\ 0.0781229 & -0.00253108 & 0.161503\end{array}\right)

We can compare that to the transformation matrix TE we tried at the beginning:

T_E = \left(\begin{array}{ccc} 0.09947 & -0.15392 & -0.55609 \\ 0.20105 & 0.33414 & 0.41574 \\ 0.11061 & -0.18021 & 0.14036\end{array}\right)

As we would expect — since the F produced by TE did not match Figure 20 — these are different.

Real Short Summary

I matched his Figure 20 by applying the Gram-Schmidt orthogonalization procedure to the vectors f, E2, E3… where f was the fundamental of an equal-energy illuminant, and E2 and E3 were two of the basis vectors dual to the rows of A’.

Summary

We really started with two separate items: a “transformation matrix” TE — one of five on p. 100 — and Figure 20 on p. 95 of Cohen. The transformation matrix TE turned out to be an attitude matrix describing a new A matrix in terms of an old F matrix. (Yes, that’s backwards, but so long as we knew it, it didn’t matter. This is why I carefully distinguish attitude and transition matrices; so long as I know what I’ve been handed, I can use it properly and confidently.)

We saw that F = A TE reproduced only the “EE” curve of his Figure 20. And his F matrix failed to be orthonormal by a factor of \sqrt{2}\ .

We then saw that the “EE” curve was (proportional to) the fundamental f of an equal-energy illuminant. In order to find the fundamental of a spectrum, we had to find a projection operator R onto the pre-image of the range of A^T. The fundamental f is then the result of applying the projection operator R to a constant vector (“equal energy”).

By trial and error, I found that the F matrix for Figure 20 could be (very nearly) reproduced by applying the Gram-Schmidt orthogonalization procedure to the vectors f, E2, E3, where E1, E2, E3 are a basis dual to the rows of A’. We found the dual basis as the rows of E' = (A'A)^{-1} A' and then transposed E’ to get E, a matrix of 3 columns.

In the next post, I will repeat these calculations quickly using 20 nm intervals, and I will show you a few alternatives. In particular, I will show you that his method (a Cholesky decomposition) gives the same answer — even to having to switch the sign of the second column!

Transition matrix: to be or not to be

Cohen (“Visual Color & Color Mixture”, see the bibliography) did something very interesting. In fact, he did something useful which I had never seen before.

Although this post uses some matrices which we saw in the color posts, I think this can stand on its own: you need not have read the color posts. But if you are specifically interested in color, or in Cohen’s work, this post is very relevant.

He was trying to describe how to find a transition matrix between two given data matrices. This will come in handy — very handy! — whenever people give the alleged result of an unspecified linear transformation of a data matrix.

On p. 93, he wrote

D T = C in general

F T = A in particular

with

T = (D'D)^{-1} D'\ C\ .

Because D’D (that is, D^T\ D\ ) is invertible, we infer that D and C are taller than wide, like the A matrix… like our usual data or design matrix X.

We also infer that T is a small matrix: its size is cxc, where c is the number of columns of D. D and C are the same size.

I do have to point out that in his equation (and his notation)

D T = C,

T is an attitude matrix! I will change that. (Trust me: you want me to change that.)

Since we have both XYZ (i.e. 1931 xyz bar) and RGB (1931 rgb bar) tables, and since they are supposed to be related by the transformation matrix T31…

T31 = \left(\begin{array}{ccc} 2.76888 & 1.75175 & 1.13016 \\ 1 & 4.5907 & 0.0601 \\ 0 & 0.05651 & 5.59427\end{array}\right)

… we should try the idea embodied in that equation for T. Assuming that there is a transition matrix T, let J and K stand for the XYZ and RGB matrices respectively.

As usual, let me use 20 nm intervals instead of the published 5 nm intervals. We have

J = \left(\begin{array}{ccc} 0.00003 & -0.00001 & 0.00117 \\ 0.0003 & -0.00014 & 0.01214 \\ 0.00211 & -0.0011 & 0.11541 \\ -0.00261 & 0.00149 & 0.31228 \\ -0.02608 & 0.01485 & 0.29821 \\ -0.04939 & 0.03914 & 0.14494 \\ -0.07173 & 0.08536 & 0.04776 \\ -0.09264 & 0.17468 & 0.01221 \\ -0.03152 & 0.21466 & 0.00146 \\ 0.0906 & 0.19702 & -0.0013 \\ 0.24526 & 0.1361 & -0.00108 \\ 0.34429 & 0.06246 & -0.00049 \\ 0.29708 & 0.01828 & -0.00015 \\ 0.15968 & 0.00334 & -0.00003 \\ 0.05932 & 0.00037 & 0 \\ 0.01687 & 0.00003 & 0 \\ 0.0041 & 0 & 0 \\ 0.00105 & 0 & 0 \\ 0.00025 & 0 & 0 \\ 0.00006 & 0 & 0 \\ 0 & 0 & 0\end{array}\right)

K = \left(\begin{array}{ccc} 0.0014 & 0. & 0.0065 \\ 0.0143 & 0.0004 & 0.0679 \\ 0.1344 & 0.004 & 0.6456 \\ 0.3483 & 0.023 & 1.7471 \\ 0.2908 & 0.06 & 1.6692 \\ 0.0956 & 0.139 & 0.813 \\ 0.0049 & 0.323 & 0.272 \\ 0.0633 & 0.71 & 0.0782 \\ 0.2904 & 0.954 & 0.0203 \\ 0.5945 & 0.995 & 0.0039 \\ 0.9163 & 0.87 & 0.0017 \\ 1.0622 & 0.631 & 0.0008 \\ 0.8544 & 0.381 & 0.0002 \\ 0.4479 & 0.175 & 0. \\ 0.1649 & 0.061 & 0. \\ 0.0468 & 0.017 & 0. \\ 0.0114 & 0.0041 & 0. \\ 0.0029 & 0.001 & 0. \\ 0.0007 & 0.0003 & 0. \\ 0.0002 & 0.0001 & 0. \\ 0. & 0. & 0.\end{array}\right)

In case you have not read the color posts, all I am saying is that J and K and T31 are given matrices, and T31 is supposed to be a transition matrix between J and K.

We do know — or take it as given — that for the transition matrix T31, old is XYZ ~ K, so let me write

K’ = T J’

That says T is a transition matrix which, applied to each column vector of J’, delivers the corresponding column of K’.

Transpose:

K = J T’

and then pretend J^{-1} exists (even though J is rectangular)

T' 	= J^{-1} K \

and then substitute the appropriate pseudo-inverse:

T' = (J'J)^{-1} J'\ K\ .

(The inappropriate pseudo-inverse would use JJ’ instead — which cannot be of full rank, given the assumed shape of J, hence cannot be invertible.)

That equation defined the transpose T’, but here’s T:

T = \left(\begin{array}{ccc} 2.76887 & 1.75176 & 1.13013 \\ 1.00002 & 4.59072 & 0.06008 \\ 0.00004 & 0.05661 & 5.59443\end{array}\right)

Recall the given T31:

T31 = \left(\begin{array}{ccc} 2.76888 & 1.75175 & 1.13016 \\ 1 & 4.5907 & 0.0601 \\ 0 & 0.05651 & 5.59427\end{array}\right)

By doing it my way, I constructed the counterpart to T31, which is the published transition matrix between XYZ and RGB (1931) with XYZ considered the “old” basis.

T and T31 are very close, but not the same.

Even if I used the full tables, at 5 nm intervals, we would not get T31.

First, I am certain that if there were a transition matrix T, then we must have computed it. Since this computation did not yield T31, I infer that T31 is not the transition matrix.

Second, is T the transition matrix? We should apply T to the appropriate table and see what we get. We had assumed

K’ = T J’, or equivalently

K = J T’

so let’s compute a counterpart to K, namely Q = J T’:

Q = \left(\begin{array}{ccc} 0.00139 & 0.00005 & 0.00654 \\ 0.01431 & 0.00039 & 0.06791 \\ 0.13434 & 0.00399 & 0.64559 \\ 0.3483 & 0.02299 & 1.74711 \\ 0.29082 & 0.06001 & 1.66915 \\ 0.09561 & 0.139 & 0.81307 \\ 0.00489 & 0.323 & 0.27202 \\ 0.06329 & 0.71 & 0.07819 \\ 0.29041 & 0.95401 & 0.02032 \\ 0.59452 & 0.99499 & 0.00388 \\ 0.91629 & 0.87 & 0.00167 \\ 1.06216 & 0.631 & 0.00081 \\ 0.85443 & 0.38099 & 0.00021 \\ 0.44795 & 0.17501 & 0.00003 \\ 0.1649 & 0.06102 & 0.00002 \\ 0.04676 & 0.01701 & 0 \\ 0.01135 & 0.0041 & 0 \\ 0.00291 & 0.00105 & 0 \\ 0.00069 & 0.00025 & 0 \\ 0.00017 & 0.00006 & 0 \\ 0 & 0 & 0\end{array}\right)

and compare it to K, via Q – K:

Picture 6

(I keep using pictures for things like that, namely scientific notation, just because it’s easier than editing the LaTeX.)

The largest entry in absolute value is… 0.000069765 . The three columns of Q look like this:

Picture 5

Pretty small, and pretty random, but still not really zero. Well, maybe it wouldn’t be. Let’s try our magic formula on J and Q, since they are related by the transition matrix T. From

T' = (J'J)^{-1} J'\ K\ ,

I compute what I call t’:

t' = (J'J)^{-1} J'\ Q\ ,

and then display the transpose of the transpose, t” = t:

t = \left(\begin{array}{ccc} 2.76887 & 1.75176 & 1.13013 \\ 1.00002 & 4.59072 & 0.0600843 \\ 0.0000430892 & 0.0566067 & 5.59443\end{array}\right)

Recall T:

T = \left(\begin{array}{ccc} 2.76887 & 1.75176 & 1.13013 \\ 1.00002 & 4.59072 & 0.0600843 \\ 0.0000430892 & 0.0566067 & 5.59443\end{array}\right)

So. That equation for T’ does recover the transition matrix if there is one. The discrepancy between T and T31 is real.

Maybe I’m making too big a deal of this. I infer that the XYZ and RGB tables do not span exactly the same subspace. They almost do, but not quite.

To be more explicit: the 1931 XYZ (xbar, ybar, zbar) and RGB (rbar, gbar, bbar) tables are not exactly related by any transition matrix whatsoever, although the published T31 is close to being one for them.

I have found a marvelous example — disappointing as a published result but marvelous as a bad example — where the discrepancy is significant. I expect to show it in another post, soon.

We’ve been looking at the computation in one way: if there is a transition matrix T such that

K’ = T J’

then

T' = (J'J)^{-1} J'\ K\ .

What if there is no such transition matrix T? We can still define T by that equation, and compute (J'J)^{-1} J'\ K\ (still assuming J is of full rank, with more rows than columns), but what are we getting, if T itself does not exist as a transition matrix?

That is, what if we compute T, and find that

K' \ne T\ J'\ ?

Well, this will be clearer in the next post, but…. That equation for T’ looks an awful lot like the normal equations for a least-squares fit.

What I will illustrate in the next post is that (J'J)^{-1} J'\ K\ generates a transition matrix to the subspace spanned by J. That is, T is a transition matrix — any invertible matrix is! — but it relates J to a subspace other than (the one spanned by) K.

(In our case, we had 21 observations; the 3 columns of J and of K define 3D subspaces of R^{21}\ . Those two 3D subspaces are almost the same, but not exactly.)

Let me try to explain here it without a good (I mean, bad) example. If J and K span the same subspace, then any one column of K (think of it as the dependent variable y) can be written as a linear combination of the columns of J (think of it as the X matrix).

But if J and K do not span the same space, then the best we can do is to find that linear combination (call it yhat !) of columns of X which is as close to y as possible.

They really are examples of y and yhat and X.

And what is T’ ? Well, if we write it with y and X…

(X'X)^{-1} X'\ y\

we might recognize that as \beta in ordinary least squares.

Let me show you that, while providing another example of the utility of the pseudo-inverse.

The derivation of the normal equations for ordinary least squares is fairly complicated, if only because it involves the derivative of a matrix. But the pseudo-inverse would let us recover the equations themselves.

Deriving them is one thing; recovering the form of them once we know they exist is another.

We write our model (this is a quick & dirty recollection of the answer, not a derivation):

y = X \ \beta\ .

(I guess I have to talk about that. True is yhat = X\ \beta\ . But I write y instead of yhat, as though we have equality rather than a projection onto a subspace.)

We pretend that X is invertible, and write

\beta =  X^{-1} y\ .

Since X is not invertible, but X’X is (by assumption X is of full rank with more rows than columns), we replace the non-existent inverse by the pseudo-inverse and write

\beta = (X'X)^{-1} X'\ y\ .

The real derivation gave us that very answer. Once I know that, I never need to compare them again. I have a plausibility argument that gives me the known right answer.

This is analagous to confirming in the previous color post, among others, that if I compute

E' = (A'A)^{-1} A'\

then I have a dual (or reciprocal or biorthogonal) basis E such that

E’A = I.

In that case, I prefer to verify it computationally every time.

Incidentally, this trickery with the pseudo-inverse would also generate a true result if I had started with yhat instead of y — it is true that \beta = (X'X)^{-1} X'\ \hat{y}\ , but it’s not very useful because we don’t know yhat!

You might have noticed that \beta corresponds, algebraically, to a column of T’. Asking if the first column of K is a linear combination of all the columns of J gives us the first column of T’; asking if the ith column of K is a linear combination of all the columns of J gives us the ith column of T’. This, too, will be in the next related post.

Summary

Given two data matrices J and K, tall and thin, we can define and compute

T' = (J'J)^{-1} J'\ K\ .

If the individual observations are in fact related by a transition matrix M, so that

K’ = M J’,

then our computed T is M:

T = M.

This is pretty slick. Any time someone hands us some data and the alleged result of applying an unspecified linear transformation to it — we can find that linear transformation if it exists, or demonstrate that it does not exist (in which case, they goofed).

And if there is no such matrix M? Well, it’s simple enough: if there is no such M — and J and K are both of full rank — then our computed T must be something else, so we do not have equality:

K' \ne T\ J'\ .

That’s all it takes for us to know that there is no transition matrix M between J’ and K’. If there were, it would be T, but that didn’t work.

(That’s if J and K are both of full rank; if only one is not, then one of J’J or K’K is actually not invertible — as JJ’ and KK’ are not invertible — and T can only be computed in one direction. I expect that T is not invertible if J and K are not both of full rank.)

And when there is no transition matrix M, we nevertheless get a transition matrix T, but it’s between J’ and Q’ = T J’.

I will freely confess that I did not know any part of this before I read Cohen: neither that we could find the transition matrix so easily; nor what we were finding if it were not a transition matrix.

(Cohen implicitly assumes that there exists a transition M; he does not discuss what happens if there isn’t one.)

Next post on this topic, a numerical illustration.

Color: re-doing Cohen’s example

Cohen’s example again

Let me now show you how I would do Cohen’s example. (His computations, pretty much, were the previous post.) I cannot over-emphasize that he deserves a lot of credit for getting the mathematics right, even if he didn’t name it correctly or do it beautifully.

I start with the A matrix:

A = \left(\begin{array}{ccc} 0.121934 & 0.0339986 & 0.219431 \\ 0.105204 & -0.0428189 & 0.139047 \\ 0.20452 & -0.36676 & -0.108477 \\ -0.449607 & 0.101795 & 0.292825 \\ -0.161109 & -0.0945546 & -0.0522694 \\ -0.554469 & 0.305203 & 0.229803 \\ -0.542936 & 0.493257 & 0.288467\end{array}\right)

This, as I have said, is a toy example of real color matching functions. Each column corresponds to one of the tri-stimulus values; each row represents a wavelength.

My summary of Cohen began with a different matrix — the E matrix — only because it was easier to enter into the computer. A is primary.

The next issue we have to cope with is that we really care about the transpose A' = A^T\ rather than about A. A’ will be applied to a spectrum, and will deliver three tri-stimulus values for it. I’m not convinced that any notational choice I make now will be without problems.

(As I said in the previous post, I use A’ for the transpose A^T\ whenever it is convenient.)

I am going to let

B = A^T\ .

B represents the linear operator I wish to study. Thus, B is

B = \left(\begin{array}{ccccccc} 0.121934 & 0.105204 & 0.20452 & -0.449607 &   -0.161109 & -0.554469 & -0.542936 \\ 0.0339986 & -0.0428189 & -0.36676 & 0.101795 &   -0.0945546 & 0.305203 & 0.493257 \\ 0.219431 & 0.139047 & -0.108477 & 0.292825 &   -0.0522694 & 0.229803 & 0.288467\end{array}\right)

It maps R^7\ to R^3\ . At most, it can be of rank 3; and then it would have a four dimensional null space.

What shall we do with it? All together now, class: find its singular value decomposition (SVD).

B = u\ w\ v'\ ,

where u and v are orthogonal (hence square) matrices, and w is as close to diagonal as it can be, given that it is the same shape as B.

The Mathematica command is

{u,w,v}=SingularValueDecomposition[B];

where the final semi-colon suppresses printing.

It is convenient to look first at the singular values themselves, the nonzero “diagonal” elements of the matrix w:

\{1.21434,0.354727,0.307628\}\ .

That all three are “significantly” larger than zero tells us that the matrix B (and therefore its transpose, A) is, indeed, of rank 3.

In case this is new to you, the w matrix is

w = \left(\begin{array}{ccccccc} 1.21434 & 0. & 0. & 0. & 0. & 0. & 0. \\ 0. & 0.354727 & 0. & 0. & 0. & 0. & 0. \\ 0. & 0. & 0.307628 & 0. & 0. & 0. & 0.\end{array}\right)

(And if this is new to you, check the tag cloud and the categories. There’s a lot examples of the Singular Value Decomposition. I could almost rename this blog “Applications of the SVD”.)

What about v?

v = \left(\begin{array}{ccccccc} 0.00783002 & 0.594295 & -0.455574 & 0.145535 &   0.183004 & 0.43387 & 0.443047 \\ -0.0405824 & 0.32624 & -0.416473 & -0.400691 &   0.350383 & -0.293626 & -0.590706 \\ -0.321854 & -0.360317 & -0.448269 & -0.580591 &   -0.296401 & -0.0331037 & 0.373633 \\ 0.416686 & -0.241837 & -0.606932 & 0.490082 &   -0.256011 & -0.298065 & -0.0704081 \\ 0.042639 & -0.520505 & -0.0951579 & 0.073689 &   0.821135 & 0.0465338 & 0.19064 \\ 0.551142 & -0.204115 & -0.0485665 & -0.317617 &   -0.104278 & 0.672485 & -0.297034 \\ 0.644592 & 0.198695 & 0.195648 & -0.367 & 0.0721767   & -0.425676 & 0.430864\end{array}\right)

It is 7×7. We know from the discussion of the four fundamental subspaces (look at the blue text) that the 4 rightmost columns of v are an orthonormal basis for the null space of B, and the 3 leftmost columns are an orthonormal basis for the pre-image of the range of B.

That is, writing a partitioned matrix [v] = [v1 v2], we have

v1 = \left(\begin{array}{ccc} 0.00783002 & 0.594295 & -0.455574 \\ -0.0405824 & 0.32624 & -0.416473 \\ -0.321854 & -0.360317 & -0.448269 \\ 0.416686 & -0.241837 & -0.606932 \\ 0.042639 & -0.520505 & -0.0951579 \\ 0.551142 & -0.204115 & -0.0485665 \\ 0.644592 & 0.198695 & 0.195648\end{array}\right)

v2 = \left(\begin{array}{cccc} 0.145535 & 0.183004 & 0.43387 & 0.443047 \\ -0.400691 & 0.350383 & -0.293626 & -0.590706 \\ -0.580591 & -0.296401 & -0.0331037 & 0.373633 \\ 0.490082 & -0.256011 & -0.298065 & -0.0704081 \\ 0.073689 & 0.821135 & 0.0465338 & 0.19064 \\ -0.317617 & -0.104278 & 0.672485 & -0.297034 \\ -0.367 & 0.0721767 & -0.425676 & 0.430864\end{array}\right)

We should confirm that B.v2 = 0, i.e. every column of v2 is in the null space of B:

B\ v2 = \left(\begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0\end{array}\right)

Once having a basis, v1, for the pre-image of the range, we know that we can construct a projection operator as v1 v1′:

P = v1\ v1' = \left(\begin{array}{ccccccc} 0.560795 & 0.383299 & -0.012435 & 0.136042 &   -0.265648 & -0.0948635 & 0.0339986 \\ 0.383299 & 0.281529 & 0.0822036 & 0.156964 &   -0.131909 & -0.0687305 & -0.0428189 \\ -0.012435 & 0.0822036 & 0.434363 & 0.225095 &   0.216479 & -0.0820701 & -0.36676 \\ 0.136042 & 0.156964 & 0.225095 & 0.600478 & 0.201399   & 0.308492 & 0.101795 \\ -0.265648 & -0.131909 & 0.216479 & 0.201399 &   0.281798 & 0.134365 & -0.0945546 \\ -0.0948635 & -0.0687305 & -0.0820701 & 0.308492 &   0.134365 & 0.34778 & 0.305203 \\ 0.0339986 & -0.0428189 & -0.36676 & 0.101795 &   -0.0945546 & 0.305203 & 0.493257\end{array}\right)

Because I didn’t call it R, I can recall Cohen’s R matrix, computed using a pseudo-inverse. I assure you that it is the same, we have P = R.

On the other side, since B is of full rank, its range is equal to its codomain, both being R^3\ , so the entire matrix u is an orthonormal basis for the range of B.

u = \left(\begin{array}{ccc} -0.756721 & 0.651151 & 0.0580932 \\ 0.530816 & 0.560136 & 0.635989 \\ 0.381585 & 0.512103 & -0.769509\end{array}\right)

Now the thing to do is, as before, split some given spectrum into its fundamental and its residual (metameric black). We only really have one good test, an equal energy spectrum.

We get the fundamental of our test spectrum…

e = \{1,1,1,1,1,1,1\}

…by applying our projection operator to it:

f = P\ e = \{0.741189,0.660536,0.496875,1.73027,0.34193,0.850176,0.43012\}

and that is exactly what we got before (come on, P = R; we have the same projection operator.)

Then we get the residual n as the difference…

n = e - f = \{0.258811,0.339464,0.503125,-0.730266,0.65807,0.149824,0.56988\}

and that, too, is the same as before.

As a refesher, let’s compute the components of the fundamental wrt the v1 basis. Since v1 is orthonormal, we may find the components of the fundamental by taking dot products, the fundamental dotted into each basis vector in v1. But that’s just

f\ v1 = \{1.30045,-0.207545,-1.87532\}

which says that f can be written as the sum

\{0.741,0.66,0.497,1.73,0.341,0.85,0.43\}

of the following three multiples of the basis vectors– I’m going to round off for clarity –

\{0.01,-0.053,-0.419,0.542,0.055,0.717,0.838\}

\{-0.123,-0.068,0.075,0.05,0.108,0.042,-0.041\}

\{0.854,0.781,0.841,1.138,0.178,0.091,-0.367\}.

The sum does indeed (almost) match our fundamental. (The discrepancy comes from my rounding off.)

\{0.741,0.661,0.497,1.73,0.342,0.85,0.43\}\ .

The last thing we need to get is the dual basis E for the rows of B. I just go look it up in the previous post…

E = A\ (A'A)^{-1} = B'\ (BB')^{-1}\ :

E = \left(\begin{array}{ccc} 1. & 0 & 2. \\ 0.5455 & -0.3636 & 1.5 \\ -0.5455 & -1.6364 & 0.5 \\ -0.8182 & -1.4545 & 1.3 \\ -1. & -1. & -0.5 \\ -0.7273 & -0.1818 & 0 \\ 0 & 1. & 0\end{array}\right)

Check that B E = I; it does.

And why do we care about the dual basis, i.e. the columns of E? Well, if we apply B to our test spectrum, we get

B\ e = \{-1.27646,0.43012,1.00883\}\ .

What are those three numbers? They are components of our test spectrum wrt the dual basis E!

That is, the sum of the following three vectors (which are multiples of the basis vectors in E)…

\{-1.276,-0.696,0.696,1.044,1.276,0.928,0\}

\{0,-0.156,-0.704,-0.626,-0.43,-0.078,0.43\}

\{2.018,1.513,0.504,1.311,-0.504,0,0\}

is

\{0.742,0.661,0.496,1.729,0.342,0.85,0.43\}

… which, once again, differs from our fundamental

f = \{0.741,0.661,0.497,1.73,0.342,0.85,0.43\}

only because of the rounding used to make the numbers readable.

To say that again: the dual basis E gives us a geometric interpretation of the tri-stimulus values we computed by applying B (i.e. A’) to a spectrum. The tri-stimulus values are components of the spectrum (equivalently, of its fundamental) wrt the E basis.

So. What did Cohen and I have that was the same? The matrices A and E, and the projection operator R. But note that where he computed R as

R = E’A,

and that’s short for R = A\ (A'A)^{-1}\ A'\ ,

I would use

R = v1\ v1^T\ .

I’ll admit that’s not much of a savings if we have both A and E.)

What did we have that was related? Where he had two orthonormal bases F1 and F2, I have one, v1. All three of these bases span the same subspace of R^7\ .

Digression: ways to show that F1, F2, and v1 span the same subspace of R7

One way to show that is to show that the columns of F1 and F2 are preserved by the projection operator cobstructed from v1. We have

F1 = \left(\begin{array}{ccc} 0.513425 & -0.372389 & 0.398141 \\ 0.280073 & -0.373412 & 0.252292 \\ -0.280073 & -0.563189 & -0.196825 \\ -0.420084 & -0.376455 & 0.53131 \\ -0.513425 & -0.0959116 & -0.0948392 \\ -0.373414 & 0.185702 & 0.416961 \\ 0. & 0.468301 & 0.523404\end{array}\right)

and we compute

R\ F1 = \left(\begin{array}{ccc} 0.513425 & -0.372389 & 0.398141 \\ 0.280073 & -0.373412 & 0.252292 \\ -0.280073 & -0.563189 & -0.196825 \\ -0.420084 & -0.376455 & 0.53131 \\ -0.513425 & -0.0959116 & -0.0948392 \\ -0.373414 & 0.185702 & 0.416961 \\ 0. & 0.468301 & 0.523404\end{array}\right)

And yes, their difference is the zero matrix.

Similarly for F2.

Alternatively, F1 and F2 should generate the same projection operator as v1 did.

F1\ F1^T = \left(\begin{array}{ccccccc} 0.560795 & 0.383299 & -0.012435 & 0.136042 &   -0.265648 & -0.0948635 & 0.0339986 \\ 0.383299 & 0.281529 & 0.0822036 & 0.156964 &   -0.131909 & -0.0687305 & -0.0428189 \\ -0.012435 & 0.0822036 & 0.434363 & 0.225095 &   0.216479 & -0.0820701 & -0.36676 \\ 0.136042 & 0.156964 & 0.225095 & 0.600478 & 0.201399   & 0.308492 & 0.101795 \\ -0.265648 & -0.131909 & 0.216479 & 0.201399 &   0.281798 & 0.134365 & -0.0945546 \\ -0.0948635 & -0.0687305 & -0.0820701 & 0.308492 &   0.134365 & 0.34778 & 0.305203 \\ 0.0339986 & -0.0428189 & -0.36676 & 0.101795 &   -0.0945546 & 0.305203 & 0.493257\end{array}\right)

I assure you that is the same as P and R. And similarly for F2.

Alternatively, again, we could simply write the columns of F1 (then F2) as a linear combination of the columns of v1, and solve for the components.

Alternatively, yet again, we could try to find a transition matrix relating F1 and v1, then F2 and v2. That’s awkward because they’re not square matrices. Guess what? Use a pseudo-inverse! (Very likely the next math post.)

I have no idea what is special about F1 and F2. They are orthonormal bases, like v1, for the pre-image of the range of B. Since they span the same space, they are related by rotations (and possibly a reflection).

end digression, return to main thread

Finally, what did I do that Cohen did not? The singular value decomposition, which got me a basis, v2, for the null space of B — which he did not have. And I daresay that v1 was easier to get than either F1 or F2.

I think having v2 is incredibly useful. I get a basis for metameric blacks.

(Of course, Cohen had computed the matrices Ma and Me, and Ga and Ge, but I don’t need them because I wasn’t computing F1 and F2.)

Let me summarize that.

Cohen

  • A and E;
  • Ma = A’A, Me = E’E, where Me = Ma^-1;
  • Ga and Ge, Cholesky decompositions of Ma and Me;
  • orthonormal bases F1 = A Ge and F2 = E Ga;
  • the projection operator R = E'\ A (= A\ (A'A)^{-1}\ A')\ .

Rip

  • A;
  • A’ = B = u w v’;
  • [v] = [v1] [v2] where both v1 and v2 are orthonormal bases;
  • the projection operator R = v1 v1′;
  • the dual basis E.

(Strictly speaking, A is the dual basis to E, but the relationship is symmetric. Nevertheless, the rows of A’ live in the dual space V*, while the columns of E live in the original space V.)

I dispensed with everything leading to F1 and F2. I use v1 in place of F1 and/or F2. I get a basis, v2, for the null space.

I think the key point is using the projection operator to split a spectrum into fundamental and null vectors. It’s a unique orthogonal decomposition into a vector which has the same tri-stimulus values as the original spectrum, and a vector in the null space (metameric black, with tri-stimulus values = {0,0,0}.)

(The decomposition is unique to A, but A itself is not unique. More to follow.)

Cohen: “Visual Color and Color Mixture”

Introduction

Oct 10 edit: the third heading has been changed to “Computing E from A”

I want to work an example from Cohen, “Visual Color and Color Mixture” (see the bibliography, and this “books added” post). I am not, however, going to do it exactly the way he did. Nevertheless, I will show you everything he calculated.

Because I want to get everything of his into one post, I will break this into small sections. I expect that my next post will show you what I would have done instead.

All of his matrices can be found on p. 70 of his book.

What we have here is an extremely small example to illustrate “color matching functions” applied to light spectra, resulting in three real numbers which we call R,G,B or X,Y,Z. This is a prelude to using real color matching functions on real spectra. I will refer to the 3 numbers I get during the course of this example as “R,G,B”, in quotes because this is a toy example, and because down the road I’ll be computing “XYZ tristimulus values” in preference to RGB.

The A and E matrices: computing A from E

He has two matrices denoted A and E. They are closely related, and the typing is simpler if I start with the E matrix:

E = \left(\begin{array}{ccc} 1 & 0 & 2 \\ 0.5455 & -0.3636 & 1.5 \\ -0.5455 & -1.6364 & 0.5 \\ -0.8182 & -1.4545 & 1.3 \\ -1 & -1 & -0.5 \\ -0.7273 & -0.1818 & 0 \\ 0 & 1 & 0\end{array}\right)

Let me emphasize that the A matrix is primary, and we should have computed the E matrix from it.

Let me answer a question at the outset. What is the point of the A matrix? Applied to a spectrum — careful! it’s actually the transpose A’ that is applied to a spectrum — it generates R,G,B or X,Y,Z values. This will get us from light spectra to colors! And the matrix E contains spectra; more to follow about that, I assure you.

Given E, he would compute the A matrix as

A = E\ (E'E)^{-1}\

where I will frequently use A’ for the transpose A^T\ , simply because it is sometimes typographically more convenient — and it is an extremely common notation. But note that I will write A^{-T}\ for the inverse transpose. In general, I will use whichever notation is more convenient for any particular expression, and I may even mix them shamelessly.

We get (with 4 places, as he showed it)

A = \left(\begin{array}{ccc} 0.1219 & 0.034 & 0.2194 \\ 0.1052 & -0.0428 & 0.139 \\ 0.2045 & -0.3668 & -0.1085 \\ -0.4496 & 0.1018 & 0.2928 \\ -0.1611 & -0.0946 & -0.0523 \\ -0.5545 & 0.3052 & 0.2298 \\ -0.5429 & 0.4933 & 0.2885\end{array}\right)

Hey what? What did he accomplish with that? First, we may say that he found an A such that both

A’E = I,

and E’A = I,

i.e.

A^T\ E = E^T\ A = \left(\begin{array}{ccc} 1. & 0 & 0 \\ 0 & 1. & 0 \\ 0 & 0 & 1.\end{array}\right)

A straightforward way of stating those relationships is: A’ is a left inverse of E; A is a right inverse of E’. We should also observe that by transposing

A’ E = I

we get

E’ A = I,

i.e. the second of those relationships. They are equivalent.

Now, we computed A from E as

A = E (E'E)^{-1}\ ,

Please note that the existence of (E'E)^{-1}\ depends on E having more rows than columns, and on E being of full rank. The matrix E is of rank 3, and the 3×3 matrix E’E is invertible. The 7×7 matrix EE’ is not, and can never be (not with E having 3 columns). And, of course, the matrices A and E themselves cannot possibly be invertible, since they are not square.

The A and E matrices: computing E from A

Had we been given the matrix A, we might guess that we could have computed E from A by some similar formula. Let’s try the obvious,

A\ (A'A)^{-1}\

and see what we get. We have

A' = (E'E)^{-T}\ E'\

A'A = (E'E)^{-T}E'\ E (E'E)^{-1} = (E'E)^{-T}\

(A'A)^{-1} = (E'E)^T

A\ (A'A)^{-1} = E\ (E'E)^{-1}(E'E)^T\ .

That doesn’t look very good — but wait a minute. We have spent a lot of time computing “dispersion matrices”, things of the form E’E — and they are symmetric. That is,

(E'E)^T = E'E\ .

So we continue, getting

A\ (A'A)^{-1} = E\ (E'E)^{-1}(E'E) = E\ .

That’s what we were hoping for, that E is computed from A exactly the same way as A was computed from E. Given the matrix A we would compute the matrix E as

E = A\ (A'A)^{-1}\ .

By the way, we saw, in that process, that

A'A = (E'E)^{-T} = (E'E)^{-1}\ .

This matters later when we define Ma and Me. That latest equation will tell us that Ma and Me are inverses.

The “real” relationship between A and E: dual (reciprocal) bases

All of Cohen’s work in this example treats A and E as matrices. There is another interpretation, a crucial interpretation. I will return to this in another post, but I have to tell you what he “really” did.

We have seen this computation before. Some of you may be sick of it. Well, it isn’t quite what we have seen before.

What we have seen before is a square matrix — specifically a transition matrix defining a basis — and we have wanted to compute the reciprocal basis. I even emphasized that I do not remember the answer, but I remember how to get it. If P is the given transition matrix, and R is the transition matrix for the reciprocal basis, then I want to compute all the dot products of the columns of R with the columns of P, and I want the answers to be an identity matrix. That is, I want an identity matrix to be the matrix product of rows of R’ with columns of P,

R’P = I

so R is the inverse transpose of P:

R = P^{-T}\ .

Wait just a minute. For a reciprocal basis, we just wrote

R’P = I.

For A and E, we had

E’A = I.

Guess what? Each column of E is a reciprocal basis vector for the columns of A. What we have are three columns of A, 7-dimensional but spanning a 3-D subspace. And the three columns of E are a reciprocal basis.

For reasons that will be far more obvious later, here is a place where I would choose to call the columns of E a dual basis rather than a reciprocal basis. Computationally — in finite dimensional spaces — there is no difference, but conceptually it will help a lot even there. In fact, I would say that we need to distinquish them conceptually precisely because we can’t tell the difference numerically.

It also turns out that it’s better to think of the columns of E (column vectors) as the original basis, and the rows of A’ (row vectors) as the dual basis. It might be best to think of the pair, columns of E and row of A’, as a set of dual bases, plural. Each row of A’ represents a so-called linear functional, a linear operator mapping a spectrum (a column vector) to a real number (the suffix -al in functional says the output is 1D). If this is new to you, relax. We’ll see it again, with real matrices A and E and real spectra.

I have barely discussed the dual space (that’s where dual vectors live, while reciprocal vectors live in the same space as the original basis) in these posts. We will get to it. If you know about it, good: for color theory, think dual basis rather than reciprocal basis. If you don’t know about the dual space, it’s okay to think reciprocal basis, for a while.

Okay, I’ll tell you that if you’ve ever seen differential forms, you’ve seen dual bases:

Picture 46

I should remark that Cohen never uses either term, reciprocal basis or dual basis. He knows that the relationship between A and E is crucial, he just doesn’t use the mathematical name for the relationship.

Now is a good time to remark, as I have before for other authors, that Cohen was almost certainly breaking new ground. I will not fault him for not saying it “correctly”; he deserves far too much credit for getting the mathematics right in the first place.

So, we still have exactly what we started with, the A and E matrices. In fact, we start with either one of them and get the other.

His dispersion matrices Me and Ma, and their Cholesky (LU) decompositions

I am going to show you the rest of what he did. We will later do almost all of this differently, but I had to know what he was doing. And if you are reading Cohen, then I have to show you a better way to do what he did. He computed (and named) the two intermediate dispersion matrices:

Me = E’E

Ma = A’A

and then he focuses on the inverses Me^{-1}\ and Ma^{-1}\ . Oh, he also knows that Me and Ma are themselves inverses, as I noted earlier when we worked out E in terms of A:

Me^{-1} = E'E^{-1} = A'A = Ma\ .

(That is, there are only two distinct matrices among these 4 names.)

Anyway, here are Me, and its inverse…

Me = \left(\begin{array}{ccc} 3.7936 & 3.0166 & 1.9818 \\ 3.0166 & 6.9586 & -2.7544 \\ 1.9818 & -2.7544 & 8.44\end{array}\right)

Me^{-1} = \left(\begin{array}{ccc} 0.8981 & -0.5429 & -0.3881 \\ -0.5429 & 0.4933 & 0.2885 \\ -0.3881 & 0.2885 & 0.3038\end{array}\right)

and, for example, Ma…

Ma = A^T A= \left(\begin{array}{ccc} 0.8981 & -0.5429 & -0.3881 \\ -0.5429 & 0.4933 & 0.2885 \\ -0.3881 & 0.2885 & 0.3038\end{array}\right)

which is, as it should be, Me^{-1}\ .

Although he did not describe it this way, he next computed two matrices by doing Cholesky decompositions of Ma and Me. I think it is safe to describe the Cholseky decomposition as a special case of the LU decomposition. It is the result of an LU decomposition applied to a positive define symmetric (Hermitian in general) matrix, and the resulting L and U matrices are transposes, so there’s really only one of L and U to be found.

If you are following along with Cohen, let me warn you that I am going to change notation. Why? Because there’s a little magic going on here and a slightly different notation will focus on it. Besides, his notation seems silly. He has A and E, ends up with F1 and F2 — and along the way he uses G and Gbar.

I’ll keep his F1 and F2, since they are among his final results, but I am going to use Ga and Ge for the intermediate G’s. Trust me: you want me to do that.

Let me just do them. For Ma, we get…

Ga = \left(\begin{array}{ccc} 0.9477 & 0 & 0 \\ -0.5729 & 0.4062 & 0 \\ -0.4095 & 0.1326 & 0.3442\end{array}\right)

What he has found is a lower triangular “square root” of Ma. That is, we have Ga such that

Ga Ga’ = Ma.

(This is his Gbar.)

We do it for Me, too…

Ge = \left(\begin{array}{ccc} 1.9477 & 0 & 0 \\ 1.5488 & 2.1354 & 0 \\ 1.0175 & -2.0279 & 1.8144\end{array}\right)

As before, we have gotten Ge such that

Ge Ge’ = Me.

Note that I have used Ga and Ge to denote what we took the Cholesky decomposition of: Ga came from A’A, and Ge came from E’E, which is the inverse of A’A.

His orthonormal bases F1 and F2

He then defines

F1 = A Ge

F2 = E Ga.

That is what I want you to see: he uses Ge with A, and Ga with E. This guarantees that F1 and F2 are orthonormal matrices. (See below.) And if I view Ge and Ga as transition matrices, then F1 is apparently a basis for the column space of A, F2 for the column space of E. I have to confess that I don’t really care about F1 and F2 — I can do far better.

But, here are Cohen’s F1 and F2:

F1 = \left(\begin{array}{ccc} 0.5134 & -0.3724 & 0.3981 \\ 0.2801 & -0.3734 & 0.2523 \\ -0.2801 & -0.5632 & -0.1968 \\ -0.4201 & -0.3765 & 0.5313 \\ -0.5134 & -0.0959 & -0.0948 \\ -0.3734 & 0.1857 & 0.417 \\ 0 & 0.4683 & 0.5234\end{array}\right)

F2 = \left(\begin{array}{ccc} 0.1287 & 0.2652 & 0.6884 \\ 0.111 & 0.0512 & 0.5163 \\ 0.2158 & -0.5985 & 0.1721 \\ -0.4744 & -0.4185 & 0.4475 \\ -0.17 & -0.4725 & -0.1721 \\ -0.5851 & -0.0739 & 0 \\ -0.5729 & 0.4062 & 0\end{array}\right)

Okay, just why did F1 and F2 turn out to be orthonormal? That is, they each satisfy F’ F = I:

 F1^TF1 = F2^T F2 = \left(\begin{array}{ccc} 1. & 0 & 0 \\ 0 & 1. & 0 \\ 0 & 0 & 1.\end{array}\right)

Let’s see why. Let’s compute F1′ F1. We have

F1 = A Ge.

Then

F1′ F1 = Ge’ A’A Ge = Ge’ Ma Ge

= Ge' Me^{-1} Ge = Ge' (Ge Ge')^{-1} Ge

= Ge^T (Ge^{-T} Ge^{-1}) Ge = I\ I = I\ .

It works out because Ma and Me are inverses, and Ge is used with A to get F1. A similar calculation works for F2.

I want to point out that F1 and F2 are not dual to each other; we have, for example, F1′ F2 != I:

F1^T F2 = \left(\begin{array}{ccc} 0.541775 & 0.764073 & 0.350247 \\ -0.392952 & 0.598605 & -0.698041 \\ -0.743014 & 0.240551 & 0.624553\end{array}\right)

Of course not. Just as an orthonormal basis is its own reciprocal basis, so it “is” its own dual basis, at least in a finite dimensional space. I waffle and use quoted “is”, because the two bases which are dual to each other live in different vector spaces — but if one is orthonormal, the dual has the same numerical components. We will see this explicitly. In this case, it means that F1 “is” its own dual basis, and F2 “is” its own dual basis.

Let me say that again, approximately but simply: F1 is dual to itself, and F2 is dual to itself; they are not dual to each other.

But today we know how to get such bases without the Cholesky decomposition. We also know that we can use the same orthonormal basis for both of those spaces.

I will do that in the next post.

Using the A matrix: applied to the columns of E

I ask again, what is the point of the A matrix? Applied to a spectrum — careful! it’s actually the transpose A’ that is applied to a spectrum — it generates “R,G,B” values. Of course, this A matrix is a toy, but better to start with 3×7 than 3×81 or even larger. We’ll get there.

There are two interesting possible spectra to which we might apply A’, even from a toy matrix A. The obvious possibility, to me at least, it to apply A’ to E. (That is, we are applying A to 3 spectra at once.) Now, we already know the answer, because E was chosen as the dual basis. That is, we have already seen that

A’E = I.

That is, if we let the columns of E be E1, E2, E3, then we have

A’ E1 = (1,0,0)

A’ E2 = (0,1,0)

A’ E3 = (0,0,1).

In words, E1 is a spectrum which leads pure red; E2 generates pure green; and E3 generates pure blue. More precisely, E1 (etc.) is a spectrum which generates our chosen red. I’ll make more sense of that when we have real A matrices. (Oh, yes, there are more than one.)

Using the A matrix: applied to an equal energy (constant) spectrum

The second possibility is an equal energy spectrum:

ee= \{1,1,1,1,1,1,1\}

A\ ee = \{-1.27646,0.43012,1.00883\}

Let’s not worry too much about that negative value; what it means is that we cannot actually match the equal energy spectrum (with whatever our toy “R,G,B” light sources are); instead, we have to add red light to the equal energy spectrum, and match that resulting mixture.

Perhaps now is a good time to say that the key point here is that we get 3 values; our perception of color vision is specified by 3 numbers. In practice, “CIE tristimulus values X,Y,Z” will be our objective.

The projection matrix R

There is one final crucial question. A’ maps from 7D to 3D. It is of full rank 3, so it has a nullspace of dimension 4. (If you need, now would be a good time time to look at fundamental subspaces.)

There are a whole lot of spectra out there whose R,G,B values are {0,0,0}. More importantly, any given spectrum can be split into two vectors, one in the nullspace of A’ and the other in the preimage of its range (i.e. in the range of A).

How would we find that decomposition?

Any direct sum decomposition has projection operators associated with it, so let’s find the projection operator onto the preimage of the range.

Malinowski did that. I did it differently. Cohen does it the same way Malinowski did, but it looks more compact because he defines both E and A. Cohen writes

R = E A’.

If we use

E = A\ (A'A)^{-1}\

to expand Cohen’s equation, we get

R = A\ (A'A)^{-1}\ A^T\ .

We have also seen that this is the “hat matrix”, which projects an observation onto the subspace spanned by the least-squares fit. That is, we have

yhat = X\ \beta\ ,

and

\beta = (X'X)^{-1}\ X'\ y\ ,

so

yhat = X\ (X'X)^{-1}X'\ y\

and if we define the hat matrix H by

H = X\ (X'X)^{-1}\ X'\ ,

we have

yhat = H y.

The key is that H is to X as R is to A.

Just for completeness, I compute R as Cohen did:

R = E^TA = \left(\begin{array}{ccccccc} 0.5608 & 0.3833 & -0.0124 & 0.136 & -0.2656 & -0.0949 & 0.034   \\ 0.3833 & 0.2815 & 0.0822 & 0.157 & -0.1319 & -0.0687 & -0.0428   \\ -0.0124 & 0.0822 & 0.4344 & 0.2251 & 0.2165 & -0.0821 &   -0.3668 \\ 0.136 & 0.157 & 0.2251 & 0.6005 & 0.2014 & 0.3085 & 0.1018 \\ -0.2656 & -0.1319 & 0.2165 & 0.2014 & 0.2818 & 0.1344 &   -0.0946 \\ -0.0949 & -0.0687 & -0.0821 & 0.3085 & 0.1344 & 0.3478 &   0.3052 \\ 0.034 & -0.0428 & -0.3668 & 0.1018 & -0.0946 & 0.3052 & 0.4933\end{array}\right)

Since we know that projection operators are idempotent, we can check that R^2 = R\ . (It is.)

The projection of the columns of E

Having gotten the projection operator R, let’s use it. First, apply it to E, the dual basis to the rows of A’.

A^T E= \left(\begin{array}{ccc} 1. & 0 & 2. \\ 0.5455 & -0.3636 & 1.5 \\ -0.5455 & -1.6364 & 0.5 \\ -0.8182 & -1.4545 & 1.3 \\ -1. & -1. & -0.5 \\ -0.7273 & -0.1818 & 0 \\ 0 & 1. & 0\end{array}\right)

I hope it is not a surprise that we just got E back: the projection of E is E.

But what does that mean? No part of any column of E is in the nullspace of A’.

The projection of the equal energy spectrum

Recall the equal energy spectrum:

ee = \{1,\ 1,\ 1,\ 1,\ 1,\ 1,\ 1\}\

I will denote the projection as f, for fundamental.

f = R\ ee = \{0.741189,0.660536,0.496875,1.73027,0.34193,0.850176,0.43012\}

Well, that has a nontrivial component in the nullspace, call it n. (We know that, as soon as we see f != ee.) We have

ee = f + n,

where n is simply the difference, n = ee – f, namely

n = \{0.258811,\ 0.339464,\ 0.503125,\ -0.730266,\ 0.65807,\ 0.149824,\ 0.56988 \}

Direct computation confirms that n is in the nullspace of A’:

A^T\ n = \{0,\ 0,\ 0\}

Hang on just one second. What would you call a color whose “R,G,B” values were all zero?

I would call it black.

Because n itself, however, is a nontrivial spectrum — in the sense that there’s light there at each of those 7 frequencies! — let us refer to that spectrum as metameric black. (Two different spectra are said to be metamers or metameric if they appear to be the same color under the same illumination.)

It’s black to our eyes. (Although I could guess that if it were sufficiently intense, we might find ourselves with sore eyes from the radiation that is hitting them.)

I would emphasize that n depends more on A’, in a sense, than on the original equal energy signal ee. Of course, it came from ee and its fundamental, but once I have this n, I could add it to any other spectrum (of length 7!) without affecting the observed color of that spectrum. (In fact, it will depend on the illumination, too, but I’ll show you that. For now, pretend the illuminant is equal energy, too.)

The point is that in the real world, with real spectra — from pine needles or from some Munsell color chip or anything — every “metameric black” we compute could be added to any other spectrum we have, without affecting the color we perceive.

This, if northing else, reminds us that color is our perception of a light spectrum. And there are physical spectra in the visble region that appear black to us.

We can confirm that the fundamental f has precisely the same “R,G,B” values as the equal energy signal, by computing A’ f

A^T\ f = \{-1.27646,\ 0.43012,\ 1.00883\}

Recall, or recompute A’ ee

A^T\ ee = \{-1.27646,\ 0.43012,\ 1.00883\}

The same. We have indeed split ee into two parts

ee = f + n,

one of which (n) contributes nothing to the final R,G,B values, the other of which (f) contributes everything. This is why f is called the fundamental.

And, our observation that the columns of E are projected onto themselves says that the columns of E are also fundamental spectra.

And they are a basis, so every fundamental — everything we see — could be written as a linear combination of the columns of E. They’re not an orthonormal basis, so we would have to use the dual basis to compute coefficients of the linear combinations. Well, that’s exactly what we accomplish by applying A’ (the dual basis) to the fundamental.

Those 3 numbers (A’f = A’ ee) we get describe the fundamental f in terms of the basis vectors E1, E2, E3:

f = R E1 + G E2 + B E3.

They are simply the components of f wrt the basis E1, E2, E3.

They are not, however, the coefficients of the original signal ee. That lies not entirely in the space spanned by E — that’s what the metameric black n is, the part of the original signal that is not in the space spanned by E.

(They also are not necessarily the RGB values we would use in RGB color space. After all, the components of f could be any real numbers, while RGB color space has restricted values, whether 0 to 1, 0 to 100, or 0 to 255. Don’t worry, we’ll get there.)

Summary

Cohen has a matrix A of “color matching functions”; it has 3 columns and lots more rows.

We construct a dual basis E via

E = A\ (A'A)^{-1}\ .

Since “dual basis” is a dual relationship, we could have — and did! — compute A from a given E:

A = E\ (E'E)^{-1}\

We construct two dispersion matrices

Me = E’E,

Ma = A’A,

and we saw that they were inverses.

We did Cholesky decompositions (LU decompositions of positive definite symmetric matrices) of Ma and Me, getting

Ga Ga’ = Ma,

Ge Ge’ = Me.

Then we used Ge and Ga to construct two orthonormal bases F1 and F2

F1 = A Ge,

F2 = E Ga.

Finally, Cohen computed the projection operator R onto the non-nullspace part of the domain of A’ (the preimage of the range of A’, equivalently the image of A).

Most importantly, at the end, we saw that an equal-energy (constant) spectrum could be split into two parts (f and n), one of which (f) contained all the “R,G,B” nformation, the other of which (n) was metameric black, invisible.

Next, I will show what I would do differently.

Posted in color. Tags: , . 2 Comments »