Color: Cohen’s intriguing F matrix again

Let me run thru the derivation of Figure 20 again, still with the CIE 1964 tables, but at 20 nm intervals. Then I show two alternative calculations. And one of them will show us why Cohen switched the sign on f2.

There is interesting linear algebra and vector algebra in this post, even if you’re not interested in where these problems came from.

Review:

What we are aiming for is an orthonormal basis whose three vectors are the columns of an F matrix. In particular, the first column — the first basis vector — is the fundamental of an equal-energy illuminant. The challenge was to find two other columns of F which matched Cohen’s Figure 20.

There were scaling issues all over the place. In particular, Cohen’s final three vectors were orthogonal but not orthonormal, because they were not of unit length. Having matched Figure 20, I will no longer concern myself with his scaling; unit vectors, I want unit vectors.

Let’s grab things as we need them.

Since I’ve shown you pictures derived from the 1964 xyz bar tables, I’ll stay with 1964.

What range of wavelengths? The CIE 1964 xyz bar tables run from 380 to 780. What the heck, just use 400 to 700 as before.

What interval? 20 nm, so I can display things.

Then my A matrix is the following subset of the CIE 1964 xyz bar tables:

A = \left(\begin{array}{ccc} 0.0191 & 0.002 & 0.086 \\ 0.2045 & 0.0214 & 0.9725 \\ 0.3837 & 0.0621 & 1.9673 \\ 0.3023 & 0.1282 & 1.7454 \\ 0.0805 & 0.2536 & 0.7721 \\ 0.0038 & 0.4608 & 0.2185 \\ 0.1177 & 0.7618 & 0.0607 \\ 0.3768 & 0.962 & 0.0137 \\ 0.7052 & 0.9973 & 0. \\ 1.0142 & 0.8689 & 0. \\ 1.124 & 0.6583 & 0. \\ 0.8563 & 0.3981 & 0. \\ 0.4316 & 0.1798 & 0. \\ 0.1526 & 0.0603 & 0. \\ 0.0409 & 0.0159 & 0. \\ 0.0096 & 0.0037 & 0.\end{array}\right)

There are 16 distinct wavelengths. I take the equal-energy illuminant to have the value 100 for 16 values:

ee= {100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100}

To get its fundamental, I need a projection operator onto the fundamental color space (the pre-image of the range of A’, the transpose of A). To get that, I need an orthonormal basis for the fundamental color space. To get that, I need the Singular Value Decomposition (SVD) of A’:

A’ = u w v’

We know that the first 3 columns (because A only had 3 columns) of v are an orthonormal basis for the fundamental color space; the remaining 13 (= 16 – 3) columns of v are an orthonormal basis for the null space of A’. Here are the first 3 columns of v, and I name them v1:

v1 = \left(\begin{array}{ccc} -0.0253683 & 0.0153138 & 0.00589786 \\ -0.285062 & 0.175889 & 0.0585877 \\ -0.574584 & 0.357498 & 0.0807567 \\ -0.513035 & 0.307203 & -0.0134903 \\ -0.242873 & 0.0986244 & -0.196418 \\ -0.112652 & -0.0638886 & -0.364177 \\ -0.124169 & -0.198387 & -0.487962 \\ -0.173395 & -0.313102 & -0.431359 \\ -0.221517 & -0.394131 & -0.198767 \\ -0.250764 & -0.427605 & 0.140947 \\ -0.241369 & -0.398876 & 0.386199 \\ -0.171479 & -0.278411 & 0.372286 \\ -0.0839289 & -0.135192 & 0.203385 \\ -0.0292822 & -0.046994 & 0.0743802 \\ -0.00781685 & -0.012531 & 0.020133 \\ -0.00183092 & -0.00293336 & 0.00474977\end{array}\right)

We know that we construct a projection operator R onto the column space of v1 by

R = v1 v1′.

We can quickly check that R is, at least, a projection operator, since it is idempotent: i.e. R^2 = R.

And we can check that it is a projection operator onto the column space of v1, since it preserves each of the columns of v1: i.e. R v1 = v1.

If we wanted, we could confirm that it projects all of the other columns of v to the zero vector: i.e. R v2 = 0, where v2 is the last 13 columns of v.

Enough already, we came here to get the fundamental of the equal energy illuminant…

f = R ee = {5.53946, 62.01, 125.887, 116.934, 68.0773, 55.4377, 80.9502, 109.171, 126.547, 128.189, 113.072, 76.3005, 36.473, 12.5844, 3.34797, 0.782783}

Make it a unit vector f1…

f1 = {0.0165427, 0.185182, 0.375939, 0.349205, 0.203302, 0.165555, 0.241744, 0.326021, 0.377912, 0.382816, 0.337672, 0.227859, 0.108921, 0.0375811, 0.00999815, 0.00233765}

For the other two vectors, we found that we could match Cohen by taking the three vectors {f1, E2, E3} and applying the Gram-Schmidt orthogonalization procedure… which is nothing more than subtracting off the part of E2 which is parallel to f1 (and normalizing, to get a unit vector f2), and then subtracting off the part of E3 which is in the span of f1 and f2.

First, we need the E basis; they are the dual basis to A’; or E’ is the dual basis to A. As always, I find the solution by writing the definition…

E’A = I

then writing the false (because A’ is not invertible)…

E' = A^{-1}\ ,

which leads me to write the true

E' = (A'A)^{-1} A'\ .

As usual, that formula is for the transpose E’ but I write E = E”, the transpose of the transpose:

E = \left(\begin{array}{ccc} 0.00507367 & -0.00518039 & 0.00960814 \\ 0.050171 & -0.0533474 & 0.109524 \\ 0.0711671 & -0.0800759 & 0.224744 \\ -0.00118297 & -0.00392724 & 0.205467 \\ -0.140011 & 0.15312 & 0.103269 \\ -0.256404 & 0.304158 & 0.0467504 \\ -0.323667 & 0.432104 & 0.0296721 \\ -0.247916 & 0.423527 & 0.0110232 \\ -0.0411424 & 0.27367 & -0.0156973 \\ 0.237054 & 0.0289735 & -0.0458278 \\ 0.422342 & -0.164367 & -0.0633192 \\ 0.375936 & -0.19191 & -0.0524917 \\ 0.200409 & -0.110177 & -0.027315 \\ 0.0725721 & -0.0410648 & -0.00979225 \\ 0.0195879 & -0.011175 & -0.00263529 \\ 0.00461444 & -0.00264363 & -0.000619869\end{array}\right)

Just for my sanity, let me make unit vectors e2 and e3 out of E2 and E3 (which are, in case you have any doubt, the 2nd and 3rd columns of E):

e2 = {-0.00645091, -0.0664311, -0.0997151, -0.00489041, 0.190674, 0.378755, 0.53808, 0.5274, 0.34079, 0.0360794, -0.204679, -0.238978, -0.137198, -0.0511362, -0.0139157, -0.003292}

e3 = {0.02679, 0.305381, 0.626647, 0.572896, 0.28794, 0.130352, 0.0827338, 0.0307357, -0.0437682, -0.12778, -0.176551, -0.146361, -0.0761614, -0.0273034, -0.00734788, -0.00172836}

The part of e2 which is parallel to f1 is…

f1 (e2.f1)
= {0.00585458, 0.0655375, 0.133048, 0.123587, 0.0719501, 0.0585914, 0.0855552, 0.115382, 0.133746, 0.135482, 0.119505, 0.080641, 0.0385479, 0.0133002, 0.00353843, 0.000827313}

so the part of e2 which is orthogonal to f1 is found by subtracting the parallel part from e2…

t1 = e2-f1 (e2.f1)
= {-0.0123055, -0.131969, -0.232763, -0.128477, 0.118724, 0.320164, 0.452525, 0.412018, 0.207044, -0.0994022, -0.324183, -0.319619, -0.175746, -0.0644364, -0.0174541, -0.00411931}

Do a quick check that t1 is othogonal to f1…

t1.f1 = -5.15775*10^-17

But t1 is not a unit vector: it’s squared length is t1.t1 = 0.874749 .

So we normalize it, and call the result f2:

f2 = {-0.013157, -0.141101, -0.24887, -0.137367, 0.126939, 0.342318, 0.483839, 0.440529, 0.221371, -0.106281, -0.346616, -0.341736, -0.187908, -0.0688953, -0.0186619, -0.00440436}

To get f3, we find the part of e3 which lies in the plane spanned by f1 and f2…

f1 (f1.e3)+f2 (f2.e3)
= {0.00735195, 0.0822258, 0.166479, 0.153523, 0.0869147, 0.0679319, 0.0993851, 0.136557, 0.161737, 0.167772, 0.150999, 0.103177, 0.0496131, 0.0171664, 0.00457096, 0.00106922}

then subtract that from e3, and the result will be orthogonal to the plane spanned by f1 and f2…

t2 = e3-(f1 (f1.e3)+f2 (f2.e3))
= {0.0194381, 0.223156, 0.460167, 0.419372, 0.201026, 0.0624205, -0.0166512, -0.105822, -0.205505, -0.295552, -0.32755, -0.249538, -0.125774, -0.0444698, -0.0119188, -0.00279758}

Like t1, t2 is not a unit vector, but we make it so, and call the result f3:

f3 = {0.0215889, 0.247847, 0.511084, 0.465775, 0.223269, 0.0693272, -0.0184937, -0.117531, -0.228244, -0.328255, -0.363793, -0.277149, -0.139691, -0.0493903, -0.0132376, -0.00310713}

If we arrange f1, f2, f3 as the columns of F…

F = \left(\begin{array}{ccc} 0.0165427 & -0.013157 & 0.0215889 \\ 0.185182 & -0.141101 & 0.247847 \\ 0.375939 & -0.24887 & 0.511084 \\ 0.349205 & -0.137367 & 0.465775 \\ 0.203302 & 0.126939 & 0.223269 \\ 0.165555 & 0.342318 & 0.0693272 \\ 0.241744 & 0.483839 & -0.0184937 \\ 0.326021 & 0.440529 & -0.117531 \\ 0.377912 & 0.221371 & -0.228244 \\ 0.382816 & -0.106281 & -0.328255 \\ 0.337672 & -0.346616 & -0.363793 \\ 0.227859 & -0.341736 & -0.277149 \\ 0.108921 & -0.187908 & -0.139691 \\ 0.0375811 & -0.0688953 & -0.0493903 \\ 0.00999815 & -0.0186619 & -0.0132376 \\ 0.00233765 & -0.00440436 & -0.00310713\end{array}\right)

we can compute F’F. Since F is supposed to be orthonormal, we expect to get a 3×3 identity matrix. Sure enough…

F'F = \left(\begin{array}{ccc} 1. & 0 & 0 \\ 0 & 1. & 0 \\ 0 & 0 & 1.\end{array}\right)

The resulting F, of course, are far from smooth…

but they still have the same kind of shapes we saw in Figure 20.

Oh, except that to match his figure, I had to take the negative of f2.

But I know of no good reason to do that, so I’m not going to. Yet. (Ok, Ok — the obvious guess is that f1, f2, f3 is a left-handed basis. If we change the sign of f2, we will get a right-handed basis. I’ll show it to you later in this post. That is, there is a good reason to do that, but nothing we’ve seen yet would lead us to it. )

What have we done? On a coarser scale, we have reproduced the calculations that lead to a specific orthonormal F matrix. It is one of many — infinitely many — bases which have f1 as the first basis vector. That really is the only special vector; the other two span the plane perpendicular to f1.

Cohen’s Cholesky Decomposition

Now let’s try it the way I think Cohen did it.

There is one potentially confusing problem: notation, as so often happens.

In the following discussion, “A” is no longer a matrix of color-matching functions. We will take “A” to have columns f, E2, E3; that was my starting point for the Gram-Schmidt orthogonalization. He describes such a starting point, but I believe he never shows the computations for such a case.

Here then is our new A matrix (which I call A1 in practice):

A1 = \left(\begin{array}{ccc} 5.53946 & -0.00518039 & 0.00960814 \\ 62.01 & -0.0533474 & 0.109524 \\ 125.887 & -0.0800759 & 0.224744 \\ 116.934 & -0.00392724 & 0.205467 \\ 68.0773 & 0.15312 & 0.103269 \\ 55.4377 & 0.304158 & 0.0467504 \\ 80.9502 & 0.432104 & 0.0296721 \\ 109.171 & 0.423527 & 0.0110232 \\ 126.547 & 0.27367 & -0.0156973 \\ 128.189 & 0.0289735 & -0.0458278 \\ 113.072 & -0.164367 & -0.0633192 \\ 76.3005 & -0.19191 & -0.0524917 \\ 36.473 & -0.110177 & -0.027315 \\ 12.5844 & -0.0410648 & -0.00979225 \\ 3.34797 & -0.011175 & -0.00263529 \\ 0.782783 & -0.00264363 & -0.000619869\end{array}\right)

Cohen computed two F matrices (F1 and F2) using two Cholesky decompositions, but we only need one of them. We recall that the first column of F2 = E Ga is proportional to the first column of A — and that’s what we want — so we will need E1, i.e. the dual basis to A1.

That is crucial. We need the dual basis to A1, not to A. We’ve already used the dual basis to A, to get two columns of A1, but now we need the analog of E for A1 instead of for A.

By definition, (dropping the 1’s for a moment)

E’A = I

and so

E' = (A'A)^{-1} A' = Me\ A'\ .

I am using Cohen’s

Ma = A’A

Me = E’E

and it turns out that Ma and Me are inverses. I will, of course, apply that equation for A and E to A1 and E1.

Here is Ma and its inverse Me:

Ma = \left(\begin{array}{ccc} 112130. & 95.1685 & 52.2359 \\ 95.1685 & 0.644885 & 0.0411263 \\ 52.2359 & 0.0411263 & 0.128627\end{array}\right)

Me = \left(\begin{array}{ccc} 0.0000123197 & -0.00153021 & -0.00451381 \\ -0.00153021 & 1.77301 & 0.054536 \\ -0.00451381 & 0.054536 & 9.59006\end{array}\right)

Here is E1:

Now we do a Cholesky decomposition of Ma:

Ga = \left(\begin{array}{ccc} 334.859 & 0. & 0. \\ 0.284205 & 0.751074 & 0. \\ 0.155994 & -0.00427115 & 0.322916\end{array}\right)

That is, the Cholesky decomposition has given us Ga such that Ga Ga’ = Ma.

Cohen had showed us that F2 = E Ga was an orthonormal matrix:

F2 = E Ga = \left(\begin{array}{ccc} 0.0165427 & -0.013157 & 0.0215889 \\ 0.185182 & -0.141101 & 0.247847 \\ 0.375939 & -0.24887 & 0.511084 \\ 0.349205 & -0.137367 & 0.465775 \\ 0.203302 & 0.126939 & 0.223269 \\ 0.165555 & 0.342318 & 0.0693272 \\ 0.241744 & 0.483839 & -0.0184937 \\ 0.326021 & 0.440529 & -0.117531 \\ 0.377912 & 0.221371 & -0.228244 \\ 0.382816 & -0.106281 & -0.328255 \\ 0.337672 & -0.346616 & -0.363793 \\ 0.227859 & -0.341736 & -0.277149 \\ 0.108921 & -0.187908 & -0.139691 \\ 0.0375811 & -0.0688953 & -0.0493903 \\ 0.00999815 & -0.0186619 & -0.0132376 \\ 0.00233765 & -0.00440436 & -0.00310713\end{array}\right)

I want to point out now, that the lower-triangular structure of Ga says that the 3rd column of F2 is proportional to the third column of E1: in fact, I claim we have (although it’s difficult to make it clear)

F2_3 = .322916 E1_3

(i.e. the 3rd column of F2 is .322916 times the 3rd column of E1).

Cohen also told us (I’ll confess that I haven’t figured out why) that the 1st column of F2 is proportional to the first column of A1: in fact, we have

F2_1 = .00298633 A1_1

Anyway, we have just applied his method to (f, E2, E3). And we have confirmed the realtionships between F2 and A1 and E1.

Fine, but what did we get? Did we get the same answers as from Gram-Schmidt?

Yes. The first column of F2 is equal to f1, the second column to f2, and the third column of F2 is equal to f3. (I could put up a drawing… but it looks exactly like the previous one.)

Lovely! And I’ll point out that we matched my f2 — not the negative of it. Even his method would require that we take the negative of his answer in order to get Figure 20.

The cross product

There is one other alternative computation I want to do.

We just saw that f1 was proportional to the first column of A1 — that’s how we chose it, a unit vector proportional to the fundamental f.

But we also saw that f3 was proportional to the third column of E1 — and so we could have computed f3 as soon as we had E1, just by normalizing the third column of E1. Cohen seems to have gone to an awful lot of work to get the matrix F2, given that both its first and third columns are known as soon as we compute E1 from A1. (This would work in all of his cases: for F2, normalize the first column of A and the third column of E; or — for F1 — normalize the first column of E and the third column of A.)

Anyway, that leaves f2, the second column of F2, to be determined.

Gee, it’s too bad we can’t compute a cross product for vectors of length 16.

Gee, who needs to?

Those vectors lie in a 3D subspace, so there is a change-of-basis transition matrix which leaves them with 3 nonzero components. In fact, the transition matrix is precisely v, and we know that new components y are related to old components x by

x = v y

so

y = v^{-1} x = v'x\

since v is orthogonal. For old components f1, we get

t1 = v’ f1
= {-0.913557, -0.393456, -0.102984, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

so we take y1 to be the three nonzero components…

y1 = {-0.913557, -0.393456, -0.102984}

For old components f3, we get

t3 = v’ f3
= {-0.361692, 0.901754, -0.236681, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

and, again, we take y3 to be the three nonzero components…

y3 = {-0.361692, 0.901754, -0.236681}

Now, what we have are the i and k unit vectors (first and third columns), we want the j unit vector (second column). The right-hand-rule shows me — you can’t do this with your right hand on the mouse! — that

j = k x i,

so I compute

y2 = y3 \times y1

y2 = {-0.18599, 0.178973, 0.966114}

(which is a unit vector) and use

x = v y

to get the corresponding f. Oh, I need to pad y2 out with zeroes.

t4 = {-0.18599, 0.178973, 0.966114, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

t2= v t4
= {0.013157, 0.141101, 0.24887, 0.137367, -0.126939, -0.342318, -0.483839, -0.440529, -0.221371, 0.106281, 0.346616, 0.341736, 0.187908, 0.0688953, 0.0186619, 0.00440436}

Quickly recall f2

f2 = {-0.013157, -0.141101, -0.24887, -0.137367, 0.126939, 0.342318, 0.483839, 0.440529, 0.221371, -0.106281, -0.346616, -0.34,1736, -0.187908, -0.0688953, -0.0186619, -0.00440436}

and we can confirm that t2 = – f2.

and that — presumably — is why he took the negative of f2: both his method and my Gram-Schmidt delivered left-handed coordinate systems; the cross product let us construct a right-handed coordinate system.

I think that’s pretty slick. Sure, it’s just a cross-product, but it was in a 3D subspace of R^{16}\ .

Color: Cohen Figure 20, “an intriguing F matrix”

I struggled to make sense of pages 94-101 of Cohen’s “Visual Color and Color Mixture”.

I understand them now. Mostly.

The challenge was to derive a drawing of three basis vectors (Figure 20, p. 95), shown here :

He calls these three vectors both “an intriguing F matrix” and “This canonical orthonormal configuration….” But as far as I can tell, he never said where he got this particular one. He made it sound important, so I had to derive it. Figuring it out, as usual, was very educational.

What follows is a detective story.

I eventually decided that the most likely candidate was a “transformation matrix” on p. 101. There are five transformation matrices on that page, but the key is that exactly one of them is associated with an equal-energy illuminant — and that’s related to “EE” in the figure.

Let us proceed somewhat as I did, but without all the wrong turns and blind alleys. If you want a chance to wander those byways, then close this page, open a copy of Cohen, and go to work. OTOH, I’m about to take you down one, just one, wrong turn. Hey, it’s interesting and valuable.

First, let’s get that particular transition matrix of his. He denotes it T_E but I will just refer to it as “TE” (outside of LaTeX).

T_E = \left(\begin{array}{ccc} 0.09947 & -0.15392 & -0.55609 \\ 0.20105 & 0.33414 & 0.41574 \\ 0.11061 & -0.18021 & 0.14036\end{array}\right)

He refers to the five T’s as “direct transformation matrices from the 1964 CIE X,Y,Z color-matching functions to orthonormal F matrices.” TE is specifically related to the equal-energy illumninant; the other four are related — in the same but as yet unexplained fashion — to the standard illuminants A, B, C, and D_65. (I have done the work to confirm that.)

OK. TE relates an A matrix to an F matrix. But how? “Transformation matrix” just doesn’t tell me enough.

A has 3 columns, so we could compute A TE, i.e. post-multiply A by TE. Since T_E^T\ , T_E^{-1}\ , and T_E^{-T} are all the same size, we could use any of them in place of TE.

Well, A TE doesn’t look right — I dismissed it immediately when I first saw it — but eventually I saw the light. A TE does not produce an orthonormal matrix — but it’s only off by a factor of \sqrt{2}\ (and some change).

Let’s compute A TE and call it F….

Oh, first I had better get the A matrix.

Cohen uses the 1964 CIE tables for this, so I take the A matrix to be the 1964 XYZ (i.e. xyz bar) tables at 5 nm intervals. Not at 20 nm. My only target is a drawing, and I want a lot of detail. I plan to repeat part of these calculations, and to do some alternatives, at 20 nm in the next post. But for this one, I need detail.

I have found a set of these tables online. Let me show you, for checking purposes, a tiny subset of the XYZ tables as I store them:

\left(\begin{array}{cc} 400 & \{0.0191,0.002,0.086\} \\ 405 & \{0.0434,0.0045,0.1971\} \\ 410 & \{0.0847,0.0088,0.3894\} \\ 415 & \{0.1406,0.0145,0.6568\} \\ 420 & \{0.2045,0.0214,0.9725\} \\ 425 & \{0.2647,0.0295,1.2825\}\end{array}\right)

My A matrix has 3 columns and 61 rows. The six rows of A corresponding to that part of the XYZ64 table are:

\left(\begin{array}{ccc} 0.0191 & 0.002 & 0.086 \\ 0.0434 & 0.0045 & 0.1971 \\ 0.0847 & 0.0088 & 0.3894 \\ 0.1406 & 0.0145 & 0.6568 \\ 0.2045 & 0.0214 & 0.9725 \\ 0.2647 & 0.0295 & 1.2825\end{array}\right)

Mathematica aside.
There, I’ve finally shown you how I’m getting different A matrices whenever I change things. The table XYZ64 — like my other three tables — has wavelength for the first argument, and the Interpolation command lets me select any subsets, or even to interpolate. (It always gets the actual points right. In fact, I am usually simply getting subsets of the data, not truly interpolating. If I ever ask for 1 nm intervals, then I would be interpolating from the 5 nm table.)

For example, the following command gives me the entries for an A matrix from 400 to 500 in 20 nm intervals (and displays it nicely):

Table[Interpolation[XYZ64,i],{i,400,500,20}]//MatrixForm

\left(\begin{array}{ccc} 0.0191 & 0.002 & 0.086 \\ 0.2045 & 0.0214 & 0.9725 \\ 0.3837 & 0.0621 & 1.9673 \\ 0.3023 & 0.1282 & 1.7454 \\ 0.0805 & 0.2536 & 0.7721 \\ 0.0038 & 0.4608 & 0.2185\end{array}\right)

So, once having set up my XYZ64 table, and the others, I can get any plausible A matrix I want.

end Mathematica aside.

For this problem, my A matrix is the entire 1964 CIE XYZ table at 5 nm intervals. Now we compute

F = A TE

and plot the results. Let me put the two drawings side-by-side (mine on the left):

That black curve could be his “EE” on a different scale, but my other two are utterly different from “I” and “II”.

Let me show you the first six rows of F:

\left(\begin{array}{ccc} 0.0118144 & -0.0177697 & 0.00228112 \\ 0.027023 & -0.0406959 & 0.00540148 \\ 0.0532659 & -0.0802704 & 0.0112139 \\ 0.0895494 & -0.135158 & 0.0200304 \\ 0.132212 & -0.19958 & 0.0316765 \\ 0.174118 & -0.262005 & 0.045079\end{array}\right)

Cohen said that F would be orthonormal. It is not. We would conclude that F was orthonormal if F’F = I. So we compute F’F:

F^TF = \left(\begin{array}{ccc} 2.05018 & 0.00324299 & -0.0144556 \\ 0.00324299 & 2.00107 & 0.000405886 \\ -0.0144556 & 0.000405886 & 2.0003\end{array}\right)

Round it off:

F^TF = \left(\begin{array}{ccc} 2.05 & 0 & -0.01 \\ 0 & 2. & 0 \\ -0.01 & 0 & 2.\end{array}\right)

I have no idea why his F — equivalently, his TE — is too large by \sqrt{2}\ . (The off-diagonal terms are almost zero, so these vectors are almost orthogonal. But their squared lengths are 2, so their lengths are \sqrt{2}\ . We could have fixed that by dividing TE by \sqrt{2}\ .)

Since the book was published posthumously, perhaps the editor was editing without computing. Perhaps he took what he found in Cohen’s notes or drafts and hoped that everything was consistent.

I assure you that the other possibilities (T_E^T\ , T_E^{-1}\ , and T_E^{-T}) do not come close to delivering an orthonormal F. Feel free to try them.

We conclude that there is a minor error in TE; it turns out that the other four transformation matrices on p. 100 also need to be divided by \sqrt{2}\ . Whatever he had in mind, he was consistent. Under the circumstances — posthumous publication, and the ease of computing unit vectors — this is rather unimportant — conceptually and computationally — once we know it.

More importantly, I think, we also conclude that the F produced by TE is not the F matrix in Figure 20. They seem to have — and actually do have — proportional first columns, but that’s all.

That is a little more troubling. I could have searched for a derivation of TE; instead, I searched for a derivation of Figure 20. I don’t care to do both. Let me be explicit: I don’t know how he got his TE matrix, nor any of the other 4 transformation matrices on p. 100. I do know how he got Figure 20.

It will be more than enough work to get Figure 20. I don’ t know where TE came from. I have some ideas, and I expect that you will too by the time we finish, but I’ m done with TE. Maybe I shouldn’t be, but enough is enough.

I should emphasize, however, that I am keeping the first column of F computed from TE, at least for a little while.

Oh, let me do one last thing with TE before I abandon it.

We know the relationship between A and TE: F = A TE. How would we express that in my usual terminology? Using T for TE, we have

F = A T

and we transpose, getting

F’ = T’ A’.

That says that each column of F’ is obtained by applying the matrix T’ to the corresponding column of A’. That tells us two or three things.

First and second, T’ is the transition matrix, and therefore his “transformation matrix” TE is actually an attitude matrix; third, he takes A as the new components and F as the old.

Just so I know.

Now let’s find a way to match Figure 20.

First, where did the first column — the black curve or “EE” — come from? (Yes, we showed that F produced from TE has a first column which is proportional to the “EE” curve. But where did this common vector come from?)

He tells us that it is proportional to the fundamental of the illuminant.

Ok. To get the fundamental, we need the projection operator. To get that, we need the SVD (Singular Value Decomposition) of the color-matching functions, i.e. of the matrix A^T\ .

A’ = u w v’.

More to the point, we need the first three columns of the matrix v. (We need three columns because A’, like A, is of rank 3.)

Let me show you the first six (of 61) rows of v1:

\left(\begin{array}{ccc} -0.0126669 & 0.00772016 & 0.00299387 \\ -0.0289985 & 0.0177432 & 0.00674605 \\ -0.0572094 & 0.0351751 & 0.0129521 \\ -0.0963098 & 0.0596099 & 0.0210865 \\ -0.142331 & 0.0886581 & 0.0298013 \\ -0.187415 & 0.117288 & 0.0368556\end{array}\right)

Those 3 column vectors are an orthonormal basis for the preimage of the range of A’ (“the fundamental color space”). We form a projection operator by

R = v1 v1′

Now to get compute the fundamental of an equal-energy illuminant ee, we need to define the illuminant (I used all 100’s):

ee = {100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100}

Its fundamental f is the projection under R:

f = R ee =

{5.53525, 12.6614, 24.9595, 41.9659, 61.9674, 81.6212, 98.8882, 114.647, 125.821, 130.294, 129.298, 124.753, 116.918, 107.167, 95.2578, 80.7352, 68.1596, 60.0607, 54.8153, 53.7962, 55.5973, 59.7538, 65.4578, 72.9488, 81.1471, 88.5547, 95.6709, 102.8, 109.309, 114.255, 118.44, 122.816, 126.542, 129.052, 130.251, 129.621, 127.996, 126.529, 123.888, 119.146, 112.759, 105.059, 96.1651, 86.301, 76.0275, 65.9399, 55.8152, 45.5545, 36.3286, 28.6452, 22.185, 16.823, 12.5322, 9.19186, 6.64563, 4.72835, 3.33391, 2.32954, 1.61867, 1.12825, 0.779473}

Let me graph a scaled version of f along with the first column of F. (Scaled? I have to divide f by 470. Trust me.)

As I have done before, the black dots are one vector — in this case, the fundamental f — and the red curve is the other vector, in this case the first column of F.

We see that they are the same. Where we used to know that his TE gave us a column of F proportional to the EE curve in Figure 20, now we know what the curve is. Now I can dispense with F computed from TE, because I can use the fundamental f of the equal-energy illuminant instead.

The horizontal axis is off — I didn’t bother showing the wavelength.

That scaling by 470? Ha! The 1964 XYZ table at 1 nm intervals is over the range 360 to 830 nm. And 830-360 = 470.

(Probably “the length of the interval of integration”. Not that an integral is appropriate, in my opinion. And Cohen has pointedly been doing linear algebra, not calculus. But the literature is littered with integrals, even though the data is discrete. This is the mathematics I am skeptical of in all such works: they use calculus for their theory but linear algebra for their computations. (The example from Wyszecki & Stiles was closer to linear algebra, and the example they did two pages later is clearly not done by integration — no, I didn’t work that example.) I suspect it’s because a long time ago we pretty much restricted ourselves to continuous systems…. But I keep coming back to the fact that their data — the color matching functions — are discrete, and linear algebra is the appropriate tool. That’s my opinion.)

That’s where the 470 seems to be from, but I don’t really care. It would have made more sense for him to have used 400-700 instead of 360-830, and that’s one reason I don’t care about the scaling. Further, and more importantly, we will ultimately make a unit vector out of the fundamental f. Its scaling along the way just isn’t all that important.

And I didn’t scale the first column of F, so the \sqrt{2}\ in F is consistent with the 470 in f.

Sheesh! And I thought Fourier transforms had too many different scalings!

What interests me is that the first column of F is proprtional to f. And both appear to be proportional to the curve “EE” in figure 20. This is progress.

Now let’s get the dual basis E’. (I think I like idea of calling the rows of E’ a dual basis for the columns of A… and the columns of E a reciprocal basis for the columns of A… and, finally, the columns of E are a dual basis for the rows of A’. Row vectors and column vectors live in vector spaces which are dual to each other.)

Oh, say it compactly! The columns of E are simultaneously a reciprocal basis for the columns of Acolumns of A and a dual basis for the rows of A’. Numerically the same, conceptually different.

I recall the required computation as

E’A = I,

E' = A^{-1} = (A'A)^{-1} A'\ ,

where A^{-1} doesn’t exist, but led me to the appropriate pseudo-inverse.

Like everything else in this example, E is too big to display, but here are its first six rows:

\left(\begin{array}{ccc} 0.00128533 & -0.00131508 & 0.00240808 \\ 0.00289282 & -0.00297823 & 0.00552559 \\ 0.00554886 & -0.00576118 & 0.0109365 \\ 0.00901767 & -0.00947485 & 0.0184882 \\ 0.0127322 & -0.0135554 & 0.0274475 \\ 0.0157701 & -0.0170116 & 0.0363094\end{array}\right)

And here is a graph of the three rows of E’ (equivalently, the three columns of E):

NOTE that these are the “CIE 1964 XYZ fundamentals” on p. 163 of Cohen. Some of those resemble Figure 20… wait, E is not orthonormal, so none of them should be curves in Figure 20 (which are orthogonal, although not unit, vectors!) The resemblance is just that: not much more than vague similarity.

OK, let’s try getting an orthonormal basis, somehow.

I don’t know about you, but I want something I understand. Let’s apply Gram-Schmidt to a non-orthonormal basis. (This is almost certainly not what Cohen did. I’ll show you what I think he did in the next post. We’ll get the same answer.)

The first two non-orthonormal bases I started with were

f, E1, E2

and

f, E1, E3.

In retrospect, my third choice

f, E2, E3

seems more obvious. OTOH, more obvious or not, what matters is that it does, in fact, generate Figure 20!

As is my custom, I am going to describe Gram-Schmidt for this 3D case. The general method just extends it from 3D to n-dimensions.

We have our first vector f. Normalize it, by

f1 = \frac{f}{\sqrt{f \cdot f}}\ .

This gives us our first unit vector, f1, for the new basis.

f1 = {0.00827079, 0.0189187, 0.0372946, 0.0627057, 0.092592, 0.121959, 0.147759, 0.171307, 0.188002, 0.194686, 0.193198, 0.186406, 0.174699, 0.16013, 0.142335, 0.120635, 0.101844, 0.0897429, 0.0819053, 0.0803825, 0.0830737, 0.0892844, 0.0978074, 0.109, 0.12125, 0.132319, 0.142952, 0.153603, 0.163331, 0.170721, 0.176973, 0.183512, 0.189079, 0.192829, 0.194622, 0.19368, 0.191252, 0.18906, 0.185114, 0.178029, 0.168484, 0.15698, 0.14369, 0.128951, 0.113601, 0.0985278, 0.0833993, 0.0680677, 0.0542824, 0.0428017, 0.033149, 0.025137, 0.0187257, 0.0137345, 0.00992992, 0.00706512, 0.00498154, 0.00348081, 0.00241862, 0.00168584, 0.00116469}

Normalize E2 and E3, as we did f, just to start with vectors of the same size. Call the unit vectors e2 and e3.

e2 = {-0.00326355, -0.00739089, -0.0142972, -0.0235132, -0.0336395, -0.0422167, -0.0480022, -0.0514717, -0.0506656, -0.0450514, -0.0346783, -0.0212322, -0.00302856, 0.0178248, 0.0440051, 0.0714143, 0.0954016, 0.120945, 0.141852, 0.165312, 0.189969, 0.213125, 0.235048, 0.255043, 0.270026, 0.276282, 0.276115, 0.273313, 0.264794, 0.24833, 0.226531, 0.201183, 0.171297, 0.136324, 0.0977252, 0.0577811, 0.0185254, -0.0185732, -0.0524476, -0.0805823, -0.102238, -0.116778, -0.124173, -0.124757, -0.11956, -0.109979, -0.0974696, -0.0829623, -0.0686652, -0.0558823, -0.0443658, -0.0340783, -0.0255961, -0.0189148, -0.013768, -0.00982428, -0.00696574, -0.0048783, -0.00340378, -0.00233611, -0.0016479}

e3 = {0.0134212, 0.0307963, 0.0609534, 0.103042, 0.152976, 0.202366, 0.245956, 0.285754, 0.313861, 0.325009, 0.321797, 0.308947, 0.28685, 0.259425, 0.22519, 0.182631, 0.143994, 0.114635, 0.0919794, 0.0761547, 0.0649611, 0.0570429, 0.0503958, 0.0457693, 0.041089, 0.0352787, 0.0289087, 0.0223688, 0.0152318, 0.00702492, -0.00182114, -0.0115187, -0.0217216, -0.0324157, -0.0433804, -0.0537281, -0.0633608, -0.0725008, -0.0801518, -0.0851463, -0.0875296, -0.0873093, -0.08454, -0.0794315, -0.0725582, -0.0646114, -0.0558614, -0.0465045, -0.0377562, -0.0302361, -0.0237077, -0.0180942, -0.0135352, -0.00996527, -0.00722962, -0.00515145, -0.0036426, -0.00254819, -0.00177437, -0.00122704, -0.000856806}

Now apply Gram-Schmidt to the pair f1 and e2. The vector e2 is not orthogonal to f1, but it can be written as the sum of something parallel to f1 plus something perpendicular to f1. We will find the parallel part using the dot product, and get the perpendicular part by subtraction.

The component of e2 wrt f1 is… e2 \cdot\ f1 = 0.354107 (since f1 is a unit vector), so we make a vector of that length, parallel to f1, and subtract it from e2. The result of that vector subtraction is perpendicular to f1. We normalize it and call it f2.

f2 = f1 - (e2 \cdot\ f1)\ f1\ .

Then

f1, f2

is an orthonormal basis that spans the same subspace as

f1, e2.

f2 = {-0.00662133, -0.0150664, -0.029409, -0.0488852, -0.0710293, -0.0913202, -0.107276, -0.119901, -0.125361, -0.121889, -0.110234, -0.0932842, -0.0693866, -0.041572, -0.00683967, 0.0306849, 0.063449, 0.0953437, 0.120668, 0.14633, 0.171675, 0.194084, 0.214299, 0.231442, 0.242825, 0.245323, 0.241118, 0.234089, 0.221296, 0.200893, 0.175217, 0.145636, 0.111572, 0.072756, 0.0308042, -0.0115508, -0.052607, -0.0914459, -0.126173, -0.153574, -0.173117, -0.184308, -0.187183, -0.182227, -0.170858, -0.154906, -0.135801, -0.114483, -0.0939761, -0.0759605, -0.0599912, -0.0459572, -0.0344598, -0.0254257, -0.0184818, -0.0131801, -0.00933457, -0.00653426, -0.00455539, -0.00313629, -0.00220307}

Now what do we do with e3? We want to break off the piece which lies in the plane spanned by f1 and f2. That’s just the vector

f1\ (e3 \cdot f1) + f2\ (e3 \cdot f2)\ .

When we subtract that from e3, what remains is perpendicular to the plane spanned by f1 and f2; we will normalize it to length 1 and then call it f3. (It’s crucial that f1 and f2 are an orthonormal basis for that plane; otherwise we would need an explicit reciprocal basis to find components of e3 wrt f1 and f2.)

f3 = {0.0108058, 0.0248259, 0.0492157, 0.0833774, 0.124048, 0.164396, 0.200091, 0.232718, 0.255782, 0.264988, 0.262322, 0.251597, 0.233086, 0.2101, 0.18122, 0.145023, 0.111701, 0.085438, 0.064457, 0.0480153, 0.034675, 0.0232252, 0.0120345, 0.00174899, -0.00919785, -0.0209678, -0.0332539, -0.0457821, -0.0586168, -0.071629, -0.0848832, -0.0992835, -0.113845, -0.128148, -0.141854, -0.153559, -0.163729, -0.173431, -0.180567, -0.183118, -0.181455, -0.175821, -0.166361, -0.153478, -0.138238, -0.121869, -0.10453, -0.0863848, -0.0696761, -0.0554858, -0.0433135, -0.0329815, -0.0246352, -0.0181132, -0.0131248, -0.00934719, -0.00660276, -0.00461709, -0.00321259, -0.00222783, -0.00154979}

(Maybe I shouldn’t have printed all those, but if you have the 1964 xyz bar tables and you’re trying to verify my calculations….)

So, it’s time to look at what we got. Here are f1, f2, f3. They are an orthonormal basis, and they span the same space as E1, E2, E3 — the fundamental color space, the pre-image of the range of A’.

The brown looks like Cohen’s “I”. But the mauve curve is off. Still….

Do I need the negative of the mauve curve?

Yes!

And multiply by 44 ! (No, I have no idea why! I just kept changing the multiplier until I matched his scale on Figure 20.)

Having found them, make them pretty — i.e. set the horizontal scale.

Here, let me put them side-by-side. They look very close, but not exact. Nevertheless, I am content: his numbers tend to be off a little, so what’s the point of trying to match them perfectly?

So. Start with the non-orthonormal basis f, E2, E3 and apply Gram-Schmidt to get the orthonormal basis f1, f2, f3. Then take the negative of f2.

I must emphasize that his scale (i.e. multiply by 44) does not represent orthonormal vectors. Pairwise orthogonal, yes — but the f1, f2, and f3 I derived were specifically unit vectors. The scale on his figure requires that I multiply by 44 (or something very close to it). Once we have unit vectors, we can scale them any way that is convenient. Here it was convenient to match his — frankly, incomprehensible — vertical scale.

Oh, let’s close with one last calculation, to compute the “transformation matrix” for Figure 20. We have found the F matrix for Figure 20, and we have the A matrix. What is the transition matrix T between them? (Actually, we know we want the attitude matrix instead.) Recall that he takes A as the new, and F as the old, components.

So, as usual, we compute each column of F’ (old) by applying T to each column of A’ (new). That is,

F’ = T A’

F = A T’

and we write the false but suggestive

T' = A^{-1} F

which gives us the true

T' = (A'A)^{-1} A' F

using the appropriate pseudo-inverse.

So we compute and display T’, and remember that, as the transpose of a transition matrix, it is an attitude matrix…

T^T = \left(\begin{array}{ccc} 0.0663267 & -0.37473 & -0.161437 \\ 0.142691 & 0.37685 & 0 \\ 0.0781229 & -0.00253108 & 0.161503\end{array}\right)

We can compare that to the transformation matrix TE we tried at the beginning:

T_E = \left(\begin{array}{ccc} 0.09947 & -0.15392 & -0.55609 \\ 0.20105 & 0.33414 & 0.41574 \\ 0.11061 & -0.18021 & 0.14036\end{array}\right)

As we would expect — since the F produced by TE did not match Figure 20 — these are different.

Real Short Summary

I matched his Figure 20 by applying the Gram-Schmidt orthogonalization procedure to the vectors f, E2, E3… where f was the fundamental of an equal-energy illuminant, and E2 and E3 were two of the basis vectors dual to the rows of A’.

Summary

We really started with two separate items: a “transformation matrix” TE — one of five on p. 100 — and Figure 20 on p. 95 of Cohen. The transformation matrix TE turned out to be an attitude matrix describing a new A matrix in terms of an old F matrix. (Yes, that’s backwards, but so long as we knew it, it didn’t matter. This is why I carefully distinguish attitude and transition matrices; so long as I know what I’ve been handed, I can use it properly and confidently.)

We saw that F = A TE reproduced only the “EE” curve of his Figure 20. And his F matrix failed to be orthonormal by a factor of \sqrt{2}\ .

We then saw that the “EE” curve was (proportional to) the fundamental f of an equal-energy illuminant. In order to find the fundamental of a spectrum, we had to find a projection operator R onto the pre-image of the range of A^T. The fundamental f is then the result of applying the projection operator R to a constant vector (“equal energy”).

By trial and error, I found that the F matrix for Figure 20 could be (very nearly) reproduced by applying the Gram-Schmidt orthogonalization procedure to the vectors f, E2, E3, where E1, E2, E3 are a basis dual to the rows of A’. We found the dual basis as the rows of E' = (A'A)^{-1} A' and then transposed E’ to get E, a matrix of 3 columns.

In the next post, I will repeat these calculations quickly using 20 nm intervals, and I will show you a few alternatives. In particular, I will show you that his method (a Cholesky decomposition) gives the same answer — even to having to switch the sign of the second column!

Example: Is it a transition matrix? Part 2

We had three matrices from Jolliffe, P, V, and Q. They were allegedly a set of principal components P, a varimax rotation V of P, and a quartimin “oblique rotation” Q.

I’ll remind you that when they say “oblique rotation” they mean a general change-of-basis. A rotation preserves an orthonormal basis; a rotation cannot transform an orthonormal basis to a non-orthonormal basis, and that’s what they mean — a transformation from an orthonormal basis to a non-orthonormal basis, or possibly a transformation from a merely orthogonal basis to a non-orthogonal one. In either case, the transformation cannot be a rotation.

(It isn’t that complicated! If you change the lengths of basis vectors, it isn’t a rotation; if you change the angles between the basis vectors, it isn’t a rotation.)

Anyway, we showed in Part 1 that V and Q spanned the same 4D subspace of R^{10}\ .

Now, what about V and P? Let me recall them:

P = \left(\begin{array}{cccc} 0.34 & -0.39 & 0.09 & -0.08 \\ 0.34 & -0.37 & -0.08 & -0.23 \\ 0.35 & -0.1 & 0.05 & 0.03 \\ 0.3 & -0.24 & -0.2 & 0.63 \\ 0.34 & -0.32 & 0.19 & -0.28 \\ 0.27 & 0.24 & -0.57 & 0.3 \\ 0.32 & 0.27 & -0.27 & -0.34 \\ 0.3 & 0.51 & 0.19 & 0.27 \\ 0.23 & 0.22 & 0.69 & 0.43 \\ 0.36 & 0.33 & -0.03 & 0.02\end{array}\right)

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

As before, I suppose there is a transition matrix T relating them; each column of V^T = V' can be computed by applying a transition matrix T to the corresponding column of P^T\ :

V’ = T P’

i.e.

V = P T’.

Then I imagine briefly that I can “solve” for T’…

T' = P^{-1}\ V\ ,

but then quickly replace the non-existent inverse of Q by the appropriate pseudo-inverse

T' = (P'P)^{-1}\ P'\ V\ .

I compute T’, and then display its transpose T:

T = \left(\begin{array}{cccc} 0.905145 & -0.386677 & 0.0882059 & -0.141151 \\ 0.853806 & 0.568131 & -0.0875601 & -0.220572 \\ 0.64498 & -0.155196 & -0.506397 & 0.490735 \\ 0.507437 & 0.131036 & 0.67664 & 0.347268\end{array}\right)

and then I test whether

V = P T’ (?).

The largest difference (element by element) between the matrices V and P T’ is 0.170627, which is rather larger (by a factor of 23) than the differences we saw in Part 1. Here’s the full blown matrix — where I chose to round off first, then subtract (i.e. what we’d compute if we were subtracting published, already rounded-off, matrices):

V - P\ T' = \left(\begin{array}{cccc} 0. & 0.01 & -0.02 & -0.01 \\ 0.01 & 0.01 & -0.02 & -0.02 \\ -0.01 & -0.01 & 0.01 & 0.01 \\ 0. & 0 & 0. & 0. \\ 0. & 0.01 & -0.02 & -0.01 \\ -0.01 & -0.01 & 0.03 & 0.02 \\ -0.01 & -0.02 & 0.05 & 0.03 \\ 0.05 & 0.07 & -0.17 & -0.12 \\ -0.02 & -0.04 & 0.09 & 0.07 \\ -0.02 & -0.03 & 0.07 & 0.05\end{array}\right)

I am inclined to think there is a probem. But we don’t know much at all about it.

Let’s try one of our alternatives. We construct a projection operator onto the subspace spanned by P.

Get the SVD (Singular Value Decomposition) of P…

P = u\ w\ v^T\ .

Then, since u is 10×10 and V is of rank 4, the leftmost 4 columns (u1) of u….

u1 = \left(\begin{array}{cccc} 0.132041 & 0.280276 & -0.423196 & 0.0366383 \\ 0.24536 & 0.379423 & -0.288143 & -0.0619886 \\ -0.0781059 & 0.252968 & -0.245411 & -0.0642461 \\ -0.310642 & 0.402272 & -0.122076 & 0.641009 \\ 0.19934 & 0.202925 & -0.456543 & -0.212204 \\ -0.275595 & 0.504975 & 0.417433 & 0.130869 \\ 0.0279239 & 0.347505 & 0.199748 & -0.545498 \\ -0.547166 & 0.00466106 & -0.00216808 & -0.298924 \\ -0.571058 & -0.293548 & -0.488675 & -0.061545 \\ -0.278923 & 0.220314 & 0.0368869 & -0.356268\end{array}\right)

are an orthonormal basis for the range of P (i.e. the column space of P). We construct a projection operator, as we have before, by

R = u1\ u1^T\ .

We check it first by confirming that R R = R, i.e. it is idempotent, hence a projection operator. We check it further by applying it to the column vectors in P: we expect that R P = P; i.e. it actually projects onto the column space of P (the space spanned by the columns of P).

Now, having checked that R is a projection operator onto P, we apply it to V, and compute V – R V:

V - R\ V = \left(\begin{array}{cccc} 0.00221622 & 0.0115115 & -0.0249851 & -0.0145408 \\ 0.0137721 & 0.012178 & -0.0243583 & -0.0200425 \\ -0.00564412 & -0.011024 & 0.00933521 & 0.0112507 \\ 0.00222059 & 0.00165751 & -0.00118369 & -0.00423324 \\ 0.00223276 & 0.00638406 & -0.0153346 & -0.0119237 \\ -0.00896396 & -0.0106169 & 0.0272358 & 0.0230479 \\ -0.00941945 & -0.0152489 & 0.0456323 & 0.0330044 \\ 0.0470134 & 0.0703021 & -0.170627 & -0.121383 \\ -0.0232815 & -0.036102 & 0.0941953 & 0.068255 \\ -0.0227796 & -0.0330688 & 0.0740154 & 0.0474348\end{array}\right)

Let me round that to the nearest .01…

V - R V = \left(\begin{array}{cccc} 0 & 0.01 & -0.02 & -0.01 \\ 0.01 & 0.01 & -0.02 & -0.02 \\ -0.01 & -0.01 & 0.01 & 0.01 \\ 0 & 0 & 0 & 0 \\ 0 & 0.01 & -0.02 & -0.01 \\ -0.01 & -0.01 & 0.03 & 0.02 \\ -0.01 & -0.02 & 0.05 & 0.03 \\ 0.05 & 0.07 & -0.17 & -0.12 \\ -0.02 & -0.04 & 0.09 & 0.07 \\ -0.02 & -0.03 & 0.07 & 0.05\end{array}\right)

I should put that array of differences into perspective. Recall V:

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

This reinforces my belief that we have a problem. Some of the differences we found are larger than some of the values of V itself. I can see that V is not in the column space of P, and while it’s not extremely far away, it’s not close either. (This seems more definitive than “the hypothetical transition matrix T doesn’t quite work”.)

As before, however, we are tip-toeing around the real question: Is every column vector in V a linear combination of the columns of P?

Easy enough. Do a regression. Actually 4 of them, one for each column of V.

Let me name the individual columns of the P matrix, and let me set the first column of V (v1) as “y”, and the columns of V as independent variables.

(Some smaller pictures follow, in which you can see that the final option is “IncludeConstantBasis -> False”.) The results are:

Not bad at all. In fact, rather close to perfect.

What was the first row of T?

{0.905145, -0.386677, 0.0882059, -0.141151}.

Yes, those are the \beta s (“Estimates”) for that regression; first row of T is first column of T’ is regression coefficients for the first column of V as the dependent variable.

Let me emphasize another relationship. For the first regression, yhat should be the projection of v1 (the 1st column of V) onto the column space of P. But we have that projection directly as R v1:

R v1 = {0.477784,0.476228,0.355644,0.257779,0.487767,
0.058964,0.209419,0.0529866,0.123281,0.19278}

… and we have the yhat from the regression:

yhat = {0.477784,0.476228,0.355644,0.257779,0.487767,
0.058964,0.209419,0.0529866,0.123281,0.19278}

Good, they are the same. So if regression works better for you than the projection operator, then use regression. Personally, I prefer the projection operator because I can apply it to all of V at once. (And because sometimes I have a use for the orthonormal basis u1.)

Largely to display the R^2, let me run the other three regressions. (Oh, I also reduced the magnification in my Mathematica notebook, so the pictures are smaller and the right side of the regression command isn’t cut off, at least on my screen.)

Here’s v2 as the dependent variable:

Here’s v3 as the dependent variable:

Here’s v4 as the dependent variable:

We see that the R^2 is over 99% for v2, falls to 95% for v3, and is 97.6% for v4. Those last two are just not large enough, not for this question.

I’m certain that V and P do not span the same subspace, but that’s about all I know.

We could, however, have made the dual choice, projecting P onto V instead of V onto P. Are they the same?

No.

We already know — from the T matrix, from the projection operator, or from the regressions — that the P and V matrices do not represent the same data in two different coordinate systems. But there might be — and in this case, there is! — more information available by making the other choice.

Let’s get the projection onto V instead of onto P. We need the SVD of V…

V = u\ w\ v^T\ .

We need the 4 leftmost columns (u1) of u…

u1 = \left(\begin{array}{cccc} -0.292636 & 0.0434068 & -0.33927 & 0.269896 \\ -0.274013 & 0.218205 & -0.238496 & 0.365998 \\ -0.337357 & -0.00723082 & -0.119568 & 0.0745101 \\ -0.355459 & 0.224109 & -0.415783 & -0.490416 \\ -0.2739 & -0.0500253 & -0.225463 & 0.452829 \\ -0.308498 & 0.463936 & 0.232078 & -0.416388 \\ -0.287334 & 0.227908 & 0.427185 & 0.212631 \\ -0.303344 & -0.281983 & 0.503807 & 0.152613 \\ -0.329068 & -0.741253 & -0.129874 & -0.304713 \\ -0.382183 & -0.0396735 & 0.288053 & -0.0857624\end{array}\right)

We construct the projection operator…

R = u1\ u1^T

Now we apply it to P, and compute the difference P – R P:

P - R P = \left(\begin{array}{cccc} 0.00186923 & -0.00646007 & -0.00165699 & 0.0293708 \\ -0.00396322 & 0.00598219 & 0.00143136 & 0.0162366 \\ 0.0035053 & 0.00255499 & -0.00525352 & -0.0232869 \\ 0.000382829 & 0.00307153 & 0.00267228 & 0.124774 \\ 0.000923133 & -0.00219591 & 0.00142733 & -0.0228012 \\ 0.0000472374 & -0.00300758 & -0.00291228 & -0.00287352 \\ -0.00462961 & -0.00220958 & -0.0000400809 & -0.149176 \\ 0.00305495 & 0.00376593 & -0.000439512 & 0.29618 \\ -0.00367312 & -0.00153518 & -0.000838857 & -0.106747 \\ 0.00147883 & -0.000459218 & 0.00482354 & -0.141981\end{array}\right)

So what?

Round it off:

P - R P = \left(\begin{array}{cccc} 0 & -0.01 & 0 & 0.03 \\ 0 & 0.01 & 0 & 0.02 \\ 0 & 0 & -0.01 & -0.02 \\ 0 & 0 & 0 & 0.12 \\ 0 & 0 & 0 & -0.02 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -0.15 \\ 0 & 0 & 0 & 0.3 \\ 0 & 0 & 0 & -0.11 \\ 0 & 0 & 0 & -0.14\end{array}\right)

Ah ha! It’s time for Robin to squeak out, “Holy moly, Batman!”

The fourth column of P does not lie in the subspace spanned by the 4 columns of V, while the first three do. That is the problem. I don’t why or where it happened, but I know exactly what’s wrong: the 4th column of P is the only problem.

Somebody blew it somewhere. (OK, since I don’t know how a varimax rotation is computed, I have to entertain the possibility that V is not supposed to be P under a change of basis. But I can’t believe that, because V and Q have a common column space, and 3 of the 4 columns of P lie in that common column space. If that’s an accident, what does a deliberate act look like?)

Is the mistake in the orignal paper? Were P and V computed from the same correlation matrix but by independent algorithms? Or was V computed incorectly from P (and then Q computed from V)?

Or did someone copy P incorrectly? (P should be primary, but then how the heck does the 4th column of P fail to be in the column space of V? I would sooner have expected an error in V, given P.)

Or did they rotate the first 3 PCs, and then apply that transformation to the 4th column? (No. I’ve checked. You can too.)

Let me emphasize that we would get the same result from four regressions each with one column of P as the dependent variable and all the columns of V as the independent variables. The first three regressions would be extremely good fits, but the fourth would not be. We would, again, conclude that the fourth column of P was not exactly — or even to a fair approximation — a linear combination of the columns of V.

Summary: theory

Given two matrices, say P and V, which are alleged to be the same data in different coordinate systems, we may check that assertion very quickly (if both are of full rank):

Define

T' = (P'P)^{-1} P'\ V\ .

If we find it true that

V’ = T P’,

then T is a transition matrix between two bases, and P and V are indeed the same data in different coordinate systems. We’re done.

If, however, the equality fails, V' \ne T\ P'\ ,

then T is not a transition matrix, and P and V are not the same data in different coordinate systems. Unless we want more information, we’re done.

Finally, we have seen that, numerically, we may have T “close to” a transition matrix. (In Part 1, V and Q are very nearly the same data; in the color posts, the xyx bar and rgb bar tables are very nearly the same data.)

Summary: practice

Computing T tells us if something is wrong, but not much more. It’s quick, easy, and gives a simple answer: “good” or “no good”.

Computing both projection operators P onto V and V onto P (not just one of them) would likewise tell us whether or not something is wrong, and, in addition, the projection operators might isolate the problem.

We could use regression instead of computing explicit projection operators.

Summary: this example

We found that V and Q are the same data in two different coordinate systems, but that P is not. In addition, we actually know that the problem is limited to the 4th column of P: it does not lie in the space spanned by the columns of V (i.e. it does not lie in the column space of V, which is also the column space of Q).

Question:
Can someone tell me if these P, V and Q match the original paper by Yule et al.?

And there’s a question you might have.
How many examples did I have to search through to find such a good (because it was such a bad) example?

One. This was the very first case I checked, outside of the 1931 CIE tables.

That’s scary.

Example: Is it a transition matrix? Part 1

This example comes from PCA / FA (principal component analysis, factor analysis), namely from Jolliffe (see the bibliography). But it illustrates some very nice linear algebra.

More precisely, the source of this example is:
Yule, W., Berger, M., Butler, S., Newham, V. and Tizard, J. (1969). The WPPSL: An empirical evaluation with a British sample. Brit. J. Educ. Psychol., 39, 1-13.

I have not been able to find the original paper. There is a problem here, and I do not know whether the problem lies in the original paper or in Jolliffe’s version of it. If anyone out there can let me know, I’d be grateful. (I will present 3 matrices, taken from Jolliffe; my question is, does the original paper contain the same 3 matrices?)

Like the previous post on this topic, this one is self-contained. In fact, it has almost nothing to do with PCA, and everything to do with finding — or failing to find! — a transition matrix relating two matrices.

On p. 163, Jolliffe provides a matrix of “principal components”:

P = \left(\begin{array}{cccc} 0.34 & -0.39 & 0.09 & -0.08 \\ 0.34 & -0.37 & -0.08 & -0.23 \\ 0.35 & -0.1 & 0.05 & 0.03 \\ 0.3 & -0.24 & -0.2 & 0.63 \\ 0.34 & -0.32 & 0.19 & -0.28 \\ 0.27 & 0.24 & -0.57 & 0.3 \\ 0.32 & 0.27 & -0.27 & -0.34 \\ 0.3 & 0.51 & 0.19 & 0.27 \\ 0.23 & 0.22 & 0.69 & 0.43 \\ 0.36 & 0.33 & -0.03 & 0.02\end{array}\right)

On the same page, he provides two matrices of the “rotated factor loadings”. One is “Varimax”…

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

… and the other is “direct quartimin”:

Q = \left(\begin{array}{cccc} 0.51 & -0.05 & 0.05 & 0.05 \\ 0.53 & 0.04 & 0.05 & -0.14 \\ 0.32 & 0.13 & 0.16 & 0.15 \\ 0.17 & -0.19 & 0.65 & 0.2 \\ 0.54 & 0.06 & -0.13 & 0.05 \\ -0.07 & 0.28 & 0.67 & -0.12 \\ 0.16 & 0.53 & 0.13 & -0.17 \\ 0.03 & 0.62 & -0.09 & 0.26 \\ 0. & 0.09 & 0.02 & 0.87 \\ 0.08 & 0.45 & 0.24 & 0.21\end{array}\right)

I presume — not knowing how to comute a varimax rotation! — that each of these matrices represents the same data, i.e. the same 4D subspace of R^{10} (since we have 4 column vectors of length 10 in each matrix). It would seem silly if a varimax rotation of PCs was not supposed to represent rotated PCs within the same subspace.

That presumption is wrong: they do not all represent the same subspaces.

How do I know this?

Because given two matrices of the same shape, I know how to find the transition matrix if it exists. We did this here.

Let me elaborate on what I’m doing. Imagine that our matrices had 2 columns and 3 rows. Those 3D column vectors lie in a 2D subspace — a plane — in R^3\ . By doing 2D transformations within that plane, I might find a particularly nice basis for representing that data.

If, instead, I apply a 3D transformation, I would be moving that data into a different plane. What I am doing here is looking at 4D transformations within the 4D subspace. Everything is predicated on my belief that the PCs in P, and the varimax rotation of them in V, and the direct quartimin data in Q are all supposed to be in the same subspace.

Not knowing how they were supposed to be computed, I could be seriously wrong here. OTOH, since we will find that V and Q are the same subspace, and the first 3 of the 4 columns of P also lie in that common subspace, I really believe that the 4th column of P should too. That it does not strikes me as an error.

Let me make one simplification up front. Are the matrices P, V, Q of full rank (4)?

Yes. The Mathematica command MatrixRank says that each of those matrices is of rank 4.

Let us consider V and Q. If the observations in each are related by a transition matrix T, and if we arbitrarily take V to be the “old” data, then each column of V’ (the transpose of V) is found by applying the transition matrix to the corresponding column of Q’.

V’ = T Q’

i.e.

V = Q T’.

Then, as I showed before, I imagine briefly that I can “solve” for T’…

T' = Q^{-1} V\ ,

but quickly replace the non-existent inverse of Q by the appropriate pseudo-inverse

T' = (Q'Q)^{-1} Q'\ V\ .

(It’s worth repeating that the inappropriate pseudo-inverse would use QQ’, but that can’t work because QQ’ is 10×10 but of rank 4 at most, hence not invertible.)

If Q’Q is invertible, that recipe always gives me a matrix T, but T need not be a transition matrix. The resulting T is a transition matrix if and only if we actually have

V = Q T’,

(given that V and Q are of the same full rank).

Do we have V = Q T’ ?

By direct computation I get

T = \left(\begin{array}{cccc} 0.9273 & 0.0921888 & 0.15161 & 0.101824 \\ 0.22944 & 0.864017 & 0.176744 & 0.0556721 \\ 0.251353 & 0.0599739 & 0.913081 & 0.0549228 \\ 0.185642 & 0.11154 & 0.00492819 & 0.941728\end{array}\right)

and then I look at V – Q T’…. You know, it’s easy enough to put them side by side.

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right) Q\ T' = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.16 & 0.17 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0 & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

Even the difference of the rounded-off matrices is satisfying small. That is,
Round[V,.01]-Round[Q.TT,.01]//Chop//MatrixForm is

\left(\begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & -0.01 & 0.01 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0\end{array}\right)

Judging that the difference is negligible, I conclude that the matrices V and Q do span the same subspace of R^{10}\ ; they are the same 4D subspace with two different coordinate systems (two sets of basis vectors).

While we’re here, let’s do this another way. After all, the fundamental question is: can the columns of V be written as linear combinations of the columns of Q? Alternatively, can each column of Q be written as a linear combination of the columns of V?

Bear in mind that we have already established that Q and V are of the same full rank. In that case, the alternatives are equivalent.

We can answer the fundamental question by trying to regress each column of V on Q. Just remember to tell the regression to drop the constant!

To be specific, here’s the first column of V as a linear combination of the 4 columns of Q. Here’s the parameter table and R^2 — not the adjusted R^2\ : I really want to know how good this fit is, never mind how many variables it has.

Here’s a snapshot of the Mathematica commands (version 7). (Oh, darn. The last option in the LinearModelFit command, which replaced “Regress” is “IncludeConstant->False”. And I print the names just to confirm that none of those symbols has a left-over numerical value!)

(Let me just remind you that the R^2 is a measure of a how close the fit is. When we add another independent variable, however, the R^2 cannot decrease; it is not a good measure of the value — i.e. the worth — of that latest independent variable. The adjusted R^2 can decrease; in fact, in my experience, if the t-statistic of that latest variable is greater than 1 (in magnitude), then the adjusted R^2 goes up. As I said, in this case, I really want the R^2 rather than adjusted R^2 because I know exactly how many independent variables I am supposed to use. That said, there isn’t a whole lot of difference between the two in this case.)

v1 on Q

That’s a really good fit. And it should be. And that column of \beta\ , i.e. “Estimate”? It’s the first column of T’, i.e. the first row of T:

{0.9273, 0.0921888, 0.15161, 0.101824}

We could run 3 more regressions, but they’d only confirm what we believe: T’ could, in fact, be computed as regression coefficients. If the two subspaces coincide, then T is also a transition matrix between two bases in one and the same space.

There is yet another way to look at that. We know that in regression, yhat is the projection of y onto the subspace spanned by the columns of X. If y itself is in that subspace, then yhat = y, and the relatonship is given by a transition matrix.

So we could do this a third way. We know how to compute a projection operator onto the subspace spanned by the columns of X. In this case, X is Q, so we find its SVD (Singular Value Decomposition)…

Q = u\ w\ v^T

Then the 4 leftmost columns (u1) of u are an orthonormal basis for the range of Q….

u1 = \left(\begin{array}{cccc} -0.203473 & 0.27677 & 0.353491 & 0.188167 \\ -0.188235 & 0.423093 & 0.310857 & 0.0177813 \\ -0.30583 & 0.10025 & 0.140569 & 0.094808 \\ -0.313324 & 0.177504 & -0.285416 & 0.609666 \\ -0.194156 & 0.220796 & 0.497051 & 0.0101643 \\ -0.354903 & 0.228756 & -0.604256 & -0.00173818 \\ -0.320761 & 0.191153 & -0.0758258 & -0.463325 \\ -0.379019 & -0.300366 & 0.0912622 & -0.459411 \\ -0.365508 & -0.681435 & 0.196437 & 0.360653 \\ -0.43321 & -0.103153 & -0.111086 & -0.166404\end{array}\right)

That is, u1′ u1 = I, if the column space of X is of full rank. Check it:

u1^T u1 = \left(\begin{array}{cccc} 1. & 0 & 0 & 0 \\ 0 & 1. & 0 & 0 \\ 0 & 0 & 1. & 0 \\ 0 & 0 & 0 & 1.\end{array}\right)

Bur the reversed matrix product

R = u1 u1′

gives us a projection operator R onto the range of Q:

We can check it. (I won’t print it: it’s 10×10.) One, R is a projection operator if and only if it is idempotent, i.e. R R = R. And by computation, I see that is true.

Two, applied to each column of Q, it reproduces that column of Q; i.e. applied to Q as a whole, it reproduces Q, i.e. R Q = Q. Again, computation shows that is true, too.

Having checked that R is a projection onto the column space of Q, let’s use it: apply it to V. If V lies in the subspace spanned by the columns of Q, then R V will reproduce V….

I find that the largest difference between the elements of R V and V is 0.00720726 .

That’s as close as the other two calculations. Not exactly zero, but then, our regression didn’t fit exactly: like the RGB and XYZ tables in the earlier post, V and Q are almost but not exactly related by a transition matrix.

So we have seen three ways of looking at this. All three tell the same story: the column spaces of V and Q are the same.

Let me say a little more about two of them, the projection operator and the explicit regression.

The projection operator R came from the SVD. The regression can also be written using a projection operator, namely the hat matrix H. I keep getting mixed up about the relationship between T and R and H, so let me write it out.

The normal equations for

\hat{y} = X\ \beta

are

\beta = (X'X)^{-1}\ X'\ y\ .

(I showed you a quick and dirty way to recover that equation in the earlier post.)

Then

\hat{y} = X\ \beta = X\ (X'X)^{-1}\ X'\ y

and so we may write

yhat = H y

with the hat matrix H defined as

H = X\ (X'X)^{-1}\ X'\ .

For our regression, we took y to be the first column of V, and X to be the four colums of Q, so our hat matrix would be

H = Q\ (Q'Q)^{-1}\ Q'\ .

It is a projection operator onto the range of Q. But that’s exactly what R was. Go ahead and check: you will find that

H = R.

This means that we could compute R as the hat matrix, instead of using the SVD. Personally, I know how to use the SVD, whereas I would have to work out the definition of the hat matrix every time. But that’s a preference; you may choose to compute the hat matrix instead. (And that is how Malinowski was doing it!)

Going a little further, T’ was

T' = (Q'Q)^{-1}\ Q'\ V

so we have

Q T’ = H V.

We’ll look at the relationship between P and V in Part 2.

Fourier Series and Fourier Transform

I want to show you something that I think is really neat. (Whether you think it is neat is your decision, not mine. I don’t like it when people tell me what I will think about something they’re about to show me.)

I do not know why this calculation is not readily available. All I know is that I had to search high and low to find an example with sufficient detail that I could duplicate it. In fact, it is difficult to find a text which actually states the result at all, never mind one which works it out for a particular case.

  • I intend to take a function which is zero outside the interval [-1/2, 1/2]…
  • I will compute several terms in the associated Fourier series…
  • I will compute its Fourier transform…
  • and I will hit you between the eyes with the relationship between them.

I won’t even keep it a secret:

  • the coefficients in the Fourier series are samples of the Fourier transform.
  • alternatively, the Fourier transform of the Fourier series is discrete samples of the original continuous Fourier transform.

I think that’s marvelous. But even knowing that it’s true, I don’t see it stated very often.

There are several things I will not do in this post.

  • I will use but not discuss Fourier series or the Fourier transform;
  • I will work the example for a very specific set of parameters;
  • that is, I will not yet try changing the interval to, say, [0, 2 \pi\ ];
  • I won’t even prove the relationship;
  • I will not show you the relationship to the discrete Fourier transform (i.e. the FFT);

I should say that the proof really just amounts to showing that the Fourier transform of the cosine exists and is a pair of delta functions.

It may be that the FFT is so important that most people do not care about the relationship between a Fourier series and a Fourier transform. Well, I do.

Having told you all the things I will not do in this post, let me emphasize that I think this example — or, frankly, any similar example — is marvelous and worthwhile. There are a lot of things that have to be done right to get the details to come out right. And given one good example, I can make a whole lot of sense out of theory.

This example is based on one in Bracewell (Bracewell, Ronald N.; The Fourier Transformation and Its Applications. McGraw Hill, 2000 (3rd ed). ISBN 0 07 303938 1), around p. 244, slightly modified. It is a most excellent book for insight; the mathematics is there, and right, but in the background. The text is marred by typos, but is extraordinary despite them.

I need some pictures. To get them, I need four things. I start with a Fourier transform, namely the square of a sinc function, which itself is some form of sin(x) / x:

F(s) = \frac{1}{5} \text{sinc}\left(\frac{\pi\ s}{5}\right)^2

Picture 45

We find the inverse Fourier transform:

f(t) = \frac{5}{2} t   \text{sgn}\left(t-\frac{1}{5}\right)-\frac{1}{2}   \text{sgn}\left(t-\frac{1}{5}\right)-5 t   \text{sgn}(t)+\frac{5}{2} t   \text{sgn}\left(t+\frac{1}{5}\right)+\frac{1}{2}   \text{sgn}\left(t+\frac{1}{5}\right)

Picture 46

Hey, it’s our favorite counterexample for wavelets: the linear spline scaling function. Ok, it’s skinnier, not of width 2, but it is a triangle. This one does not even fill up the interval [-1/2, 1/2].

Next, I need the Fourier Series of that triangle. Well, I don’t really want an infinite series; I just want a finite approximation. Let’s get 7 terms icluding the constant:

0.350056 \cos (2 \pi  t)+0.229115 \cos (4 \pi    t)+0.101829 \cos (6 \pi  t)+0.0218785 \cos (8 \pi    t)+0.00972378 \cos (12 \pi  t)+0.2

Mathematica complains about convergence, but it turns out, it’s good enough. We see that there is no term in  Cos[10 \pi t]\ , so it looks like we only got 6 terms.

Picture 47

I have shown both the Fourier Series in red and the original single triangle in blue. Strictly speaking, and we need to be strict on this point, that is not the Fourier series of the triangle, but of its periodic extension. We might call it the Fourier series associated with the single triangular pulse.

Finally, just for the drawings, I want to sample the Fourier Transform at the integers:

\left(\begin{array}{cc} -6. & 0.00486189 \\ -5. & 0. \\ -4. & 0.0109393 \\ -3. & 0.0509144 \\ -2. & 0.114557 \\ -1. & 0.175028 \\ 0. & 0.2 \\ 1. & 0.175028 \\ 2. & 0.114557 \\ 3. & 0.0509144 \\ 4. & 0.0109393 \\ 5. & 0. \\ 6. & 0.00486189\end{array}\right)\ .

Picture 48

Let me put it all together.

Picture 49

The top left is a single triangular pulse in the time domain, of height 1, centered at the origin, of width 0.4. We might say it is band-limited in time, or that it is of compact support.

The top right is the Fourier transform of that triangular pulse. It is continuous, and not of compact support.

The bottom left shows the triangular pulse and (the first 7 terms of) its fourier series. Although the triangular pulse is not periodic, the Fourier series is.

The bottom right shows two things. The continuous Fourier transform and red dots which are, by definition, samples of that transform.

Drum roll, please. They are also the coefficients of the fourier series.

How can that be? For crying out loud, there are 13 samples but 7 coefficients! Worse, although the constant term in the Fourier series is .2, and that is equal to the sample F[0], the first cosine term has coefficient 0.350056 while the sample F[1] = 0.175028 .

Hey, if we multiply the sample value by 2, we get: 0.350056 . Which is our leading cosine coefficient.

Conceptually, some of our samples are for negative frequencies, but we wrote the Fourier series with positive frequencies only. If we use complex exponentials instead of cosines, then we get negative frequencies too, and the series coefficients will be cut in half (except the constant term).

Alternatively (but equivalently), since Cos[-2 \pi t] = Cos[2 \pi t]\ , an individual series term such as

0.350056 \cos (2 \pi  t)\

could be wriiten as

0.175028 Cos[2 \pi t] + 0.175028 Cos[-2 \pi t]\ ,

and there are our sample values equal to 0.175028 at -1 and 1.

Let me take a longer look.

We started with the Fourier transform:

F(s) = \frac{1}{5} \text{sinc}\left(\frac{\pi    s}{5}\right)^2\ .

If we remember that the sinc function is the Fourier transform of a rectangle, then the sinc^2 function is the Fourier transform of the convolution of two rectangles.

But the convolution of two rectangles is a triangle. That the inverse Fourier transform is a triangle should seem more than plausible.

Specifically, it turns out to be:

f(t) = \frac{5}{2} t\    \text{sgn}\left(t-\frac{1}{5}\right)-\frac{1}{2}   \text{sgn}\left(t-\frac{1}{5}\right)-5 t\   \text{sgn}(t)+\frac{5}{2} t\    \text{sgn}\left(t+\frac{1}{5}\right)+\frac{1}{2}   \text{sgn}\left(t+\frac{1}{5}\right)\ .

I can’t say I like the way Mathematica writes it, but the graph showed that we do in fact have a single triangular pulse.

Then I asked Mathematica for the first 7 terms of the Fourier series of the triangle:

fs(t) = 0.350056 \cos (2 \pi  t)+0.229115 \cos (4 \pi    t)+0.101829 \cos (6 \pi  t)+0.0218785 \cos (8 \pi    t)+0.00972378 \cos (12 \pi  t)+0.2\ .

As I said, it turns out there are in fact only 6 terms: we are missing one in Cos[10 \pi t]\ . (Strictly speaking, my command probably asked Mathematica for the first 6 non-zero terms. Whatever.)

Finally, I computed and plotted samples of the fourier transform. What were the samples? The plotted points were

\left(\begin{array}{cc} -6. & 0.00486189 \\ -5. & 0. \\ -4. & 0.0109393 \\ -3. & 0.0509144 \\ -2. & 0.114557 \\ -1. & 0.175028 \\ 0. & 0.2 \\ 1. & 0.175028 \\ 2. & 0.114557 \\ 3. & 0.0509144 \\ 4. & 0.0109393 \\ 5. & 0. \\ 6. & 0.00486189\end{array}\right)\

so the y-values were the right column of that array.

Let me explicitly convert the fourier series to complex exponentials, using

(e^{i t} + e^{i t})/2 = Cos (t)\

fs(t) = 0.2+0.175028 e^{-2 i \pi  t}+0.175028 e^{2 i \pi    t}+0.114557 e^{-4 i \pi  t}+0.114557 e^{4 i \pi    t}+0.0509144 e^{-6 i \pi  t}+0.0509144 e^{6 i \pi    t}+0.0109393 e^{-8 i \pi  t}+0.0109393 e^{8 i \pi    t}+0.00486189 e^{-12 i \pi  t}+0.00486189 e^{12 i   \pi  t}\ .

It takes me a little work to get Mathematica to display those coefficients in order (i.e. the coefficient of E^{-12 i \pi t} \ followed by the coefficient of E^{-8 i \pi t}\ , and so on). but when I have them, the ordered list is:

{0.00486189, 0, 0.0109393, 0.0509144, 0.114557, 0.175028, 0.2,
0.175028, 0.114557, 0.0509144, 0.0109393, 0, 0.00486189}

so there we have the 13 fourier series coefficients for the first several powers of e^{2 i n \pi t}\ , where “several” is n = -6 to 6.

I observe that the missing term Cos[10 \pi t]\ in the Fourier series corresponds to a sample value of… 0.

Hit that drum again, maestro, while I repeat myself. Those 13 coefficients are exactly the 13 samples of the fourier transform.

I am not quite done. What we have is a fascinating observation. The Fourier series coefficients are samples of the Fourier transform.

But why?

Consider that we have two different functions of time: the triangle and the (truncated) Fourier series.

Maybe we should find the Fourier transform of the Fourier series instead of the Fourier transform of the single triangle.

Recall the truncated fourier series

fs(t) = 0.350056 \cos (2 \pi  t)+0.229115 \cos (4 \pi    t)+0.101829 \cos (6 \pi  t)+0.0218785 \cos (8 \pi    t)+0.00972378 \cos (12 \pi  t)+0.2\ .

Ask for its fourier transform and get

FS(s) = 0.00486189\ \delta (s-6)+0.0109393\ \delta (s-4)+0.0509144\ \delta (s-3)+0.114557\ \delta (s-2)+0.175028\ \delta (s-1)+ 0.2\ \delta   (s)+0.175028\ \delta (s+1)+0.114557\ \delta (s+2)+0.0509144\ \delta (s+3)+0.0109393\ \delta(s+4)+0.00486189\ \delta (s+6)\ .

We see two things. One, Dirac deltas at the integers. This tells us that the Fourier transform of the Fourier series is discrete not continuous. Two, the coefficients in front of the Dirac deltas are precisely the coefficients in the Fourier series. This is why the coefficients of the Fourier series match the samples — because the samples are the Fourier transform of the Fourier series.

Blow your trumpet, Gabriel:

  • we have a continuous Fourier transform for the triangular pulse,
  • we have a discrete Fourier transform for the (periodic) Fourier series associated with the triangular pulse,
  • and that discrete Fourier transform consists of samples of the continuous Fourier transform.

I’m not sure that a drawing of the Fourier transform of the Fourier series will be convincing. After all, the easiest way to draw it is to plot the samples of the continuous Fourier transform! Those red dots on the picture, by themselves, are the Fourier transform of the Fourier series.

Maybe it’s okay to ask and emphasize: why are the two Fourier transforms not the same? Because we’ve taken Fourier transforms of two different functions. The (infinite) Fourier series is not equal to the triangular pulse (nor is the finite approximation); the series and the finite approximation correspond to a periodic extension of the triangular pulse, and an approximation of that periodic extension. In our first math class about Fourier series we ignored that difference, and focused on one period of the periodic extension; the Fourier transform shows us that we sometimes need to honor the difference between a non-periodic function and a periodic extension of it.

That it works out exactly depends on the Fourier transform of a cosine being precisely a (pair of) Dirac delta(s). If there’s a proportionality constant there, then we’ll be off by that constant factor. Similarly, that the samples were at the integers just required that we be taking the fourier transform of, e.g. Cos[2 \pi t]\ rather than of Cos[t].

This may be why this example is so hard to find. You will discover that most of your books use a different form of the Fourier transform (different choices of the FourierParameters).

Let me state my Fourier transform pair explicitly. I used the Mathematica option FourierParameters -> \{0, -2 \pi\}\ . Then the Fourier transform F(s) of a function f(t) is

Picture 52

and the inverse Fourier transform recovers f(t) from F(s):

Picture 53

Picture 54

Oh, there’s one other convention to be aware of, the definition of the sinc function.

Mathematica is using

sinc(x) = (sin x)/x,

and Bracewell is using

sinc(x) = (sin\ (\pi\ x))/(\pi\ x)\ .

Where Bracewell wrote (for the transform of a triangle of half-width 1/10)

\frac{1}{10}\ sinc^2(s/10)\ ,

I would have needed to write

1/10 sinc^2((\pi s)/10) = 1/10 sin^2((\pi s)/10) / (\pi s/10)^2\ .

(In fact, I arranged to use a triangle of half-width 1/5, instead.)

Let me close with a Mathematica detail. If you attempt to confirm either of those transform integrals directly, you should define a function using “set-delayed”, i.e. “:=”, instead of “=”. There’s a parameter inside each of those integrals, and you want to evaluate the integral first, then the parameter. That is, for example, if we write the inverse transform as… then we get…

Picture 50

and in particular, f1[0] = 1/2, which is wrong.

Instead,

Picture 51

Further, if you want to plot that inverse transform, you will probably find that you will need to compute points explicitly and plot them; the “//Evaluate” that would speed up the computation within the plot command appears to override the “set-delayed”.

It would probably be very instructive for me to look at all this for a triangular pulse of width greater than 1. Maybe I will. But this post was delayed while I fought with the explicit integrals, and I’ve returned to color theory….

Oh, I should at least point out that this example gives us a very useful insight:
because we recognize the Fourier series coefficients as samples of the Fourier transform, we know a heck of a lot about the sizes of the Fourier series coefficients, even without computing them. In particular, looking at the continuous transform at integers 7 and 8, we see that the next coefficient in the series is larger than the last one we computed, and the one after that is about the same size as the last one computed, and all the ones after that are very small. We know this without computing them, just by looking at the graph. (OK, it helps that we understand that the graph does get smaller and smaller as the x^2 — make that s^2 — in the denominator gets larger.)

Linear Algebra: the four fundamental subspaces

Edit 18 Sep 2009. Purely cosmetic. I have highlighted in blue the result describing the spans of the parts of the u and v matrices. I have found that I still need to refer to it rather than reconstruct it when I want it. In other words, it’s what I want to pick out quickly from this post.

Given a linear operator A, we are told that there are four fundamental subspaces associated with it. They are

  • the nullspace of A
  • the column space of A
  • the nullspace of A^T\ (i.e. of the transpose of A)
  • the column space of A^T\ .

Okay, they are easy enough to list.

Next, we need to realize that the column space of A is the range of A, and the column space of A^T\ is the range of A^T\ . That’s what we really care about — the ranges.

But why is A^T\ involved? In particular, why do we care about the nullspace and the range of A^T\ ?

Let me show you what they “really” are. (That is, this is how I understand them.)

I’m going to use an exercise from Strang’s “Linear Algebra and Its Applications”; see the bibliography.

This is exercise 2.4.2.. “Find the dimension and construct a basis for the four subspaces associated with…”

A = \left(\begin{array}{cccc} 0 & 1 & 4 & 0 \\ 0 & 2 & 8 & 0\end{array}\right)

(Of course, since the second row is twice the first, we already know that the matrix is of rank 1.)

Now, Strang presents the four fundamental subspaces by using the LDU (LU) decomposition; I am going to use — what else? — the Singular Value Decomposition (SVD).

Here is the Mathematica command…

Picture 42

We know all about the SVD. In particular, we know that the SVD decomposes the matrix A as

A = u\ w\ v^T\ ,

where u and v are orthogonal matrices; and w is the same shape as A and, loosely speaking, as diagonal as it can be.

Here is the w matrix…

w = \left(\begin{array}{cccc} \sqrt{85} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0\end{array}\right)

That we have only one nonzero value tells us, if we didn’t already know it, that the matrix A is of rank 1. ( This is a representation of the A matrix with respect to different bases, specified by u and v. As such, it is the same shape and of the same rank. Since w is of rank 1, so is A.)

Here is the v matrix…

v = \left(\begin{array}{cccc} 0 & 0 & 0 & 1 \\ \frac{1}{\sqrt{17}} & 0 & -\frac{4}{\sqrt{17}} & 0 \\ \frac{4}{\sqrt{17}} & 0 & \frac{1}{\sqrt{17}} & 0 \\ 0 & 1 & 0 & 0\end{array}\right)

The columns of v are an orthonormal basis for the domain of A. Let’s apply A to it, that is to each of these basis vectors.

A\ v = \left(\begin{array}{cccc} \sqrt{17} & 0 & 0 & 0 \\ 2 \sqrt{17} & 0 & 0 & 0\end{array}\right)

Well! The three rightmost columns of v were mapped to zero by A; we conclude that they are a basis for the nullspace of A.

What about the leftmost (remaining) column of v? Its image, by definition, is in the range of A. So the first column of v is a basis for the pre-image of the range of A.

(Once we know what A does to a basis for its domain, we know what it does to any vector in its domain.)

That’s the pre-image of the range. What about the range of A? Well, let’s look at the matrix u:

u = \left(\begin{array}{cc} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}}\end{array}\right)

We see that the first column of u is proportional to the image of the first column of v. They span the same space — which is the range of A. Look at the algebra instead of the numbers:

A\ v = u\ w\ v^T\ v = u\ w\ ,

and the almost-diagonal form of w says that the columns of u w (equivalently, the columns of A v) are multiples of the columns of u. OK, some of them are zero multiples, but the real point is that the nonzero columns of A v are multiples of corresponding columns of u.

What about the other column of u? The matrix A maps from R^4 to R^2; we say that its codomain is R^2. And its codomain is split into two subspaces, each 1-dimensional:

  • the range of A
  • the part of the codomain that is not the range (“the part we cannot reach with A”).

The rightmost column of u is a basis for the rest of the codomain.

Okay…. We seem to have four fundamental subspaces, and we have orthonormal bases for them, which means we have orthogonal direct sums:

  • domain of A = nullspace of A \oplus\ preimage of the range of A
  • codomain of A = range of A \oplus\ the part we cannot reach with A.

The columns of v split naturally into bases for the nullspace and for the pre-image of the range of A; the columns of u split naturally into bases for the range of A and for the part we cannot reach with A. That is,

  • nullspace of A is spanned by rightmost columns of v
  • pre-image of range of A is spanned by leftmost columns of v
  • range of A is spanned by leftmost columns of u
  • the part we cannot reach is spanned by rightmost columns of u

But what about the range and nullspace of A^T\ ? Now would be a good time to guess that

  • the preimage of the range of A is the range of A^T\
  • the part we cannot reach with A is the nullspace of A^T\ ,

and that’s why A^T\ shows up: it provides more convenient (and in a sense, symmetric) names for two of the four spaces.

Maybe we should write the SVD of A^T\ ? From

A = u\ w\ v^T\ ,

we get

A^T = v\ w^T\ u^T\ .

The roles of u and v have been swapped. (The point is that there are no new bases involved, still just the columns of u and v.)

Then our description of the spaces and bases of A translates to the following description of the spaces and bases of A^T\ :

  • nullspace of A^T\ is spanned by rightmost columns of u
  • pre-image of range of A^T\ is spanned by leftmost columns of u
  • range of A^T\ is spanned by leftmost columns of v
  • the part we cannot reach with A^T\ is spanned by rightmost columns of v.

Our guess was right. The bases tell us that we have two names for the spaces spanned by each of the 4 partial bases.

  • the rightmost columns of v span the nullspace of A, and equivalently the part we cannot reach with A^T\
  • the leftmost columns of v span the pre-image of the range of A, and equivalently the range of A^T\
  • the rightmost columns of u span the nullspace of A^T\ , and equivalently the part we cannot reach with A
  • the leftmost columns of u span the pre-image of the range of A^T\ , and equivalently the range of A.

Therefore, instead of naming the orthogonal direct sums as

  • domain of A = nullspace of A \oplus\ preimage of the range of A
  • codomain of A = range of A \oplus\ the part we cannot reach with A

we call them

  • domain of A = nullspace of A \oplus\ range of A^T\
  • codomain of A = range of A \oplus\ the nullspace of A^T\

(If you think about it, those are the only two combinations of “range” and “nullspace” that make sense, i.e. that are subspaces of the domain and codomain respectively.)

BTW, we should actually solve the exercise (!) and write down the bases.

Recall v.

v = \left(\begin{array}{cccc} 0 & 0 & 0 & 1 \\ \frac{1}{\sqrt{17}} & 0 & -\frac{4}{\sqrt{17}} & 0 \\ \frac{4}{\sqrt{17}} & 0 & \frac{1}{\sqrt{17}} & 0 \\ 0 & 1 & 0 & 0\end{array}\right)

For the nullspace of A, the 3 rightmost columns of v:

v2 = \left(\begin{array}{ccc} 0 & 0 & 1 \\ 0 & -\frac{4}{\sqrt{17}} & 0 \\ 0 & \frac{1}{\sqrt{17}} & 0 \\ 1 & 0 & 0\end{array}\right)

Strang’s basis is equivalent:

\left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{array}\right)

Check mine by applying A to each column…

A\ v2 = \left(\begin{array}{ccc} 0 & 0 & 0 \\ 0 & 0 & 0\end{array}\right)

For the nullspace of A^T\ , recall u:

u = \left(\begin{array}{cc} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}}\end{array}\right)

then the rightmost columns of u are:

u2 = \left(\begin{array}{c} -\frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}}\end{array}\right)

Strang’s basis is equivalent: \{-2,1\}\ .

Check mine by applying A^T\ to it:

A^T\ u2 = \left(\begin{array}{c} 0 \\ 0 \\ 0 \\ 0\end{array}\right)

For the range of A, the first column of u:

u1 = \left(\begin{array}{c} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}}\end{array}\right)

Strang’s basis is equivalent: \{1,2\}\ .

For the range of A^T\ , the leftmost column of v:

v1 = \left(\begin{array}{c} 0 \\ \frac{1}{\sqrt{17}} \\ \frac{4}{\sqrt{17}} \\ 0\end{array}\right)

Again, his basis is equivalent: \{0,1,4,0\}\ .

I should point out that Strang’s bases for the domain and codomain (found from the LDU decomposition) are orthogonal but not orthonormal. Mine are orthonormal.

I should also point out that I really, really like orthonormal bases. Sometimes there’s a good reason not to use them, but barring such a reason – orthonormal. And we will be using this material for light spectra.

So, as I see it, we have two principles here.

First, the 4 fundamental subspaces “really” are

  • an orthogonal decomposition of the domain of A:
    • nullspace of A
    • preimage of the range of A
  • an orthogonal decomposition of the codomain of A:
    • the range of A
    • the subspace we cannot reach with A

Second, we can describe two of those subspaces in terms of A^T\ ,

  • the range of A^T\ = preimage of the range of A
  • the nullspace of A^T\ = the subspace we cannot reach with A.

Thus, more compactly, we have the 4 fundamental subspaces as

  • an orthogonal decomposition of the domain of A:
    • nullspace of A
    • range of A^T\
  • an orthogonal decomposition of the codomain of A:
    • the range of A
    • nullspace of A^T\

As Paul Harvey used to say, “And now you know the rest of the story.”

Transpose & Adjoint: review & example

introduction

Let me pick up an old problem. The mathematics comes from the two posts “transpose matrix and adjoint operator” part 1 & part 2. The problem itself comes from the schur’s lemma post.

Upon further reflection, I am going to change the problem a little bit. Do not expect to see the same answers as before.

I am also going to work it twice, assuming that we are given different information as our starting point, but I’ll do it for the very same problem.

As I have said, the appropriate question for an introduction to ABO blood groups is: Can your mother donate blood to you? Until you can answer that question, you’re missing something about blood groups.

This example is not that good, but it does tie together the following concepts and computations:

  • the transition matrix;
  • the attitude matrix;
  • the change of basis formula;
  • the reciprocal basis;
  • the transpose of a matrix;
  • the adjoint of a linear operator;
  • the matrix of the adjoint operator.

This example is intended as a guide to and a review of my relevant posts, and will not repeat very much of the explanations in them.

Well, I have found that the explanations are growing.

What is the problem? Given a matrix and a non-orthonormal basis, find the matrix of the adjoint operator with respect to the non-orthonormal basis.

What is the key? We only know one way to find the matrix of the adjoint operator: take the transpose of some matrix.

Here is the general case: given a matrix X with respect to a basis E, we can transpose it, getting X^T\ . But that is the matrix of the adjoint with respect to the dual basis F. To get the matrix with respect to the basis E, we must do a change-of-basis to get from X^T\ with respect to to F to, say, Y^T\ with respect to E.

There is, however, a very special case: if E is an orthonormal basis, then it is its own reciprocal basis, F = E, and we were done as soon as we computed the transpose, becuse X^T\ is the matrix of the adjoint with respect to E.

The general problem, then, is: given a matrix, find the matrix of its adjoint with respect to a non-orthonormal basis. We will consider two possibiities: first, that the original basis is orthonormal; second, that it is not.

I would like to make one final point before we dive in. There is a fundamental truth here, regardless of the precise definition of the adjoint operator. The fact is that two matrices A and A^T\ represent two different but related linear operators (with respect to an orthonormal basis); but if we transform those matrices to a non-orthonormal basis, the two results will not be the transposes of each other.

In other words, the relationship between two linear operators which is represented by a matrix and its transpose, holds only in an orthonormal basis.

first form of the question

First, let us suppose we are given a matrix A with respect to an orthonormal basis; and a description of a non-orthonormal basis.

Here is the given matrix A:

A = \left(\begin{array}{cc} 1 & 1 \\ 0 & 2\end{array}\right)

(This much is the same as before; this is the same matrix as in the previous posts.)

Here is the given non-orthonormal basis \{f_1, f_2\}\ specified by an attitude matrix…

att = \left(\begin{array}{cc} 1 & -1 \\ 1 & 0\end{array}\right)

Find the matrix, which I will call C^T\ , of the adjoint operator with respect to the non-orthonormal basis.

Easy enough, once you’re used to it: we have A with respect to an orthonormal basis, so we form its transpose A^T\ . This is the matrix of the adjoint with respect to the orthonormal basis. Now we just need to transform A^T\ to the non-orthonormal basis.

We want the transition matrix P, which is the transpose of the attitude matrix: P = {att}^T\ .

P = \left(\begin{array}{cc} 1 & 1 \\ -1 & 0\end{array}\right)

One form of the change-of-basis formula tells us how to find the matrix B with respect to the new basis corresponding to A with respect to the original:

B = P^{-1} A P\ ,

so we get

B = \left(\begin{array}{cc} 2 & 0 \\ -2 & 1\end{array}\right)

That is, A and B are matrix representations, with respect to two bases, of a single linear operator. We don’t actually need the matrix B, but it serves for displaying the general change-of-basis formula for from A to B. And we will want to contrast it with C^T\ .

We have been asked to find the matrix C^T\ corresponding to A^T\ with respect to a new basis. But that uses the change-of-basis formula, with A^T\ and C^T\ in place of A and B respectively:

C^T = P^{-1} A^T P\ ,

so we get

C^T = \left(\begin{array}{cc} 1 & -1 \\ 0 & 2\end{array}\right)

What we see (as we should expect) is that B and C^T\ are not the transposes of each other.

We have, then, 4 matrices. A and A^T\ represent a linear operator and its adjoint with respect to an orthornormal basis. B and C^T\ represent the same linear operator and its adjoint with respect to the non-orthonormal basis \{f_1, f_2\}\ .

(In principle, of course, we only needed three of those matrices: A, A^T\ , and C^T\ ; but it may be informative to compute B.)

second form of the question
first of three solutions

We are given the matrix B, representing a linear operator with respect to a non-orthonormal basis…

B = \left(\begin{array}{cc} 2 & 0 \\ -2 & 1\end{array}\right)

and we were given the attitude matrix describing that non-orthonormal basis…

att = \left(\begin{array}{cc} 1 & -1 \\ 1 & 0\end{array}\right)

We want to find the matrix C^T\ of the adjoint operator with respect to the non-orthonormal basis.

The quickest way to do it is to transform B to the orthonormal basis, which gives us A. We still use the transition matrix P = {att}^T \

P = \left(\begin{array}{cc} 1 & 1 \\ -1 & 0\end{array}\right)

and the change of basis formula (but note that we’re going in the other direction, to the orthonormal basis instead of from it)…

A = P B P^{-1}\ .

so we get

A = \left(\begin{array}{cc} 1 & 1 \\ 0 & 2\end{array}\right)

We have just reduced this question to the previous problem: now we can get C^T\ by transforming A^T\ to the non-orthonormal basis…

C^T = P^{-1} A^T P\ .

We’re done. We could stop here. Maybe we should. But it’s nice to know more than one way to do something, because then we can check our work.

second of three solutions

Alternatively, we could use the reciprocal basis.

This gets a little messy only because we’re introducing a 3rd basis, so we get 2 more matrices.

We transpose B, and we get the matrix of the adjoint operator with respect to the reciprocal basis:

B^T = \left(\begin{array}{cc} 2 & -2 \\ 0 & 1\end{array}\right)\ .

But we’ve only just begun. We don’t want the adjoint with respect to the reciprocal basis; we want it with respect to the given non-orthonormal basis. It’s just that, ultimately, we have to take the transpose with respect to some basis; it’s the only way to compute the matrix of the adjoint. Then the question

how do we transform from the reciprocal basis (where we have B^T\ ) to the non-orthonormal basis where we want C^T\ ?

becomes

what is the transition matrix from the reciprocal basis to the basis?

And that leads to the question: exactly what is our reciprocal basis?

I could simply say (!) that the attitude matrix for the reciprocal basis is the inverse of the transition matrix P for the non-orthonormal basis; that is, the transition matrix Q for the reciprocal basis is the inverse transpose of P:

Q = P^{-T}\ ,

Q = \left(\begin{array}{cc} 0 & 1 \\ -1 & 1\end{array}\right)

But I think it’s only fair to admit that I don’t remember the description of the attitude matrix for the reciprocal basis. What I have in my head is a matrix equation, which says that the product of the attitude matrix Q for the reciprocal basis with the transition matrix P for the given basis, is the identity:

Q^T P = I\ .

(And that matrix equation simply says that the dot products of the reciprocal basis vectors – the rows of Q^T\ – with the given non-orthonormal basis vectors – the columns of P – are either 0 or 1.)

Then I solve for Q:

Q = P^{-T}\ .

Of course Q is what I said it was, but the point is that I remember the derivation rather than the answer. Fortunately it’s a very short and easy derivation, so I see the answer almost immediately….

Having Q, we can transform B^T\ to (not from) the orthonormal basis…

D^T = Q B^T Q^{-1}\ ,

so we get

D^T = \left(\begin{array}{cc} 1 & 0 \\ 1 & 2\end{array}\right)

and then we could transform D^T\ to the non-orthonormal basis:

C^T = P^{-1} D^T P\ ,

C^T = \left(\begin{array}{cc} 1 & -1 \\ 0 & 2\end{array}\right)

That is, as it should be, the same answer.

third of three solutions

And the second solution shows us another way to have done it. From

C^T = P^{-1} D^T P\

and

D^T = Q B^T Q^{-1}\

we get

C^T = P^{-1} Q B^T Q^{-1} P\ ,

which we write as

C^T	= R^{-1} B^T R\ ,

with R defined as

R = Q^{-1} P\ .

By writing out the algebra first, we can dispense with explicit the computation and transformation of D^T\ .

In other words, the transition matrix R from the reciprocal basis to the non-orthonormal basis is given by R = Q^{-1} P\ . Equivalently, the transition matrix from the non-orthonormal basis to its reciprocal basis is given by P^{-1} Q\ .

(But even if we use R, we are implicitly going through the original orthonormal basis.)

12 pentagons!

I’ve been reading Sternberg’s “Group Theory and Physics”, Cambridge University, reprinted 1999. On pages 43 to 44 he says, “… every fullerene has exactly 12 pentagons. This is not an accident.”

The stable structure of carbon which has 60 carbon atoms arranged like the vertices of a soccer ball is called a buckyball. It turns out that, in similar structures, we can have any even number, greater than 18, of carbon atoms except for 22. This is equivalent to polyhedra having 12 pentagons and any number of hexagons except 1.

This family of structures consists of polyhedra whose faces are either pentagons or hexagons. In chemistry they are labeled by the number of carbon atoms, so they talk about C_{20},  C_{22}, ...  C_{60}, ...  C_{72}, ....\

I find it unforgettable and marvelous that the number of pentagons is always exactly 12. And I can prove it.

Sternberg’s proof is somewhat different from mine, because I choose to use two general equations which we have seen before. One equation depends on the fact that every edge separates exactly 2 faces; the other depends on the fact that every edge joins exactly 2 vertices.

Imagine that we have a polyhedron. Choose one face, and count the number of edges it has. Do this for all the faces.

We will have counted each edge twice. Let f_k\ be the number of faces with k edges; then, for example, f_5\ is the number of pentagons. We have the general formula

2 e = 3 f_3 + 4 f_4 + 5 f_5 + 6 f_6 + ....\ .

(I think I first showed you that here.)

For example, the term 3 f_3\ says that we counted 3 edges for each triangular face we touched. But the 2e on the LHS says that we will have counted each of those edges as part of another face, too. And yes, an edge can join two different kinds of polygons.

Similarly, if we count the number of edges by counting the number at each vertex, we will have counted each edge twice. Let v_k\ be the number of vertices with k edges; we have the formula

2 e = 3 v_3 + 4 v_4 + 5 v_5 + ....\

(I believe I have not shown you that equation, except in the very special case 2 e = 3 v, for triangulations.)

We also have Euler’s formula in general

\chi = v - e + f\

and

\chi = 2\

in particular.

Let’s see what the smallest case looks like: 12 pentagons.

Picture 38

This is the dodecahedron, one of the five platonic solids. (Now is a good time to remark that people used to think of them as 3D solids, but for quite some time mathematicians have treated them as 2D surfaces bounding 3D volumes. Similarly, the sphere is the surface which bounds the ball.) We see that there are three edges at every vertex.

It has \{v, e, f\} = \{20,30,12\}\ .

I note that it has 20 vertices. It corresponds to C_{20}\ . That’s why the list starts at 20.

What about a soccer ball (buckyball)? Mathematica® knows this as the TruncatedIcosahedron.

Picture 39

It has \{v,e,f\} = \{60,90,32\}\ . This corresponds to carbon 60. Or to the most common soccer ball. (Not all soccer balls have 32 faces, apparently.)

Note also that every vertex has 3 edges.

Not every polyhedron — not even every regular polyhedron — has 3 edges at every vertex. Here is the regular 20-sided Platonic solid, the icosahedron:

Picture 40

Incidentally, it has 5 edges at every vertex. We are going to assume 3 edges at every vertex.

Here is the question. If

  • every face of a polyhedron is either a pentagon or a hexagon,
  • and every vertex has three edges,

what are the possible polyhedra?

The answer has two parts:

  • any such polyhedron must have 12 pentagons;
  • it can have any number of hexagons other than 1.

What I will actually prove is that no number of pentagons is possible, except 12.

Our general formula for edges and faces

2 e = 3 f_3 + 4 f_4 + 5 f_5 + 6 f_6 + ....\

becomes

2 e = 5 f_5 + 6 f_6\ ,

because  f_k = 0\ for every other k.

Our general formula for edges and vertices becomes

2 e = 3 v_3\

because v_k = 0\ for every k ≠ 3.

The total number of faces f is

f = f_5 + f_6\ ,

and the total number of vertices v is

v = v_3\ .

Thus, we wish to investigate solutions of the following 4 equations:

2 e = 5 f_5 + 6 f_6\

f = f_5 + f_6\

2 e = 3 v\

2 = v - e + f\ .

It turns out that Mathematica® is extraordinarily cooperative if I write separate equations for

\chi = v - e + f\

\chi = 2\

and consider a set of 5 equations:

\begin{array}{l} 2 e=5 f_5+6 f_6 \\ 2 e=3 v \\ \chi =-e+f+v \\ f=f_5+f_6 \\ \chi =2\end{array}\

When I tell it to eliminate \chi\ , Mathematica® tells me that f_5 = 12\ :

2 e=3 v\land f=f_5+f_6\land v=2 (f-2)\land f_5=12\

It’s not too difficult to work this out by hand; you can do it without Mathematica.

This algebra places no restrictions on the number of hexagons. An it’s easy enough to exhibit solutions for any number of hexagons. Even for just one hexagon! Ruling out that case is a challenge; the proof dates only from 1963.

We know a couple of cases. We know that f_6 = 0\ (no hexagons) is a dodecahedron. And f_6 = 20\ (32 faces total) is a soccer ball or buckyball.

Whether or not we can actually build a polygon with any value of f_6\ is another question. It turns out that there is only one nonnegative value of f_6\ which does not work: we can’t have just one hexagon.

We can have a polyhedron with any number except n=1; that is, only C_{22}\ cannot exist. But i haven’t proved either that C_{22}\ cannot exist, or that anything does exist, other than C_{20}\ and C_{32}\ , which I have drawn.

I am not about to try to prove either that f_6 = 1\ cannot be constructed, or that every other value can be. Maybe someday, but not today. (Among other things, I have no idea if all the hexagonal faces can be chosen the same size, nor even if they are all regular hexagons. My geometric intuition is nil — but the algebra is clear. Well, maybe it isn’t: the algebra says that simply by counting vertices, edges, and faces we can rule out anything that doesn’t have 12 pentagons. It says northing about the shape or even about the existence.)

Exactly 12 pentagons. I think that is amazing. And the equations I used are nicely general, and can be used for other things – such as showing that there can be no regular polyhedra other than the 5 Platonic solids.

For more about buckyballs, you can try this Wikipedia article.

Here is a drawing that suggests why we can’t add just one hexagon to a dodecahedron.

Within my personal library, the best reference on polyhedra is Hartshorne’s “Geometry: Euclid and Beyond”. (See my bibliographies page.)

Ah, a book newly acquired since I drafted this is: Richeson, “Euler’s Gem”, Princeton University 2008; most of it should be accessible to a high school student.

My reference for the two “2 e” equations is Firby & Gardiner, also on my bibliographies page.

A proof that f_6 = 1\ cannot exist can be found in Branko Grunbaum’s “Convex Polytopes”, Springer, 2nd Edition, 2003. (Also newly acquired. It’s a graduate text.)

Wavelets: Consequences of Orthogonality and Review II

digression: eigenvalue 1

Before I proceed with consequences of orthogonality, I need to mention an omission. For one thing, I have gotten so caught up in the properties I’ve been looking at, that I have forgotten one of the crucial ones we used earlier. For another thing, the consequence which I have forgotten is still a little strange to me.

The consequence (or consequences)?

  • That the dilation equation could be written as an eigenvalue equation,
  • that the existence of a scaling function seems to be equivalent to an eigenvalue = 1,
  • and that the values of the scaling function at the integers are given by the corresponding eigenvector.

This has been crucial to some of our work: the recursion which I use for computing the scaling function is initialized by setting the values of the scaling function at the integers — that is, by finding the eigenvector with eigenvalue 1. Recursion — especially when combined with a lookup table — is very easy and very powerful; but we absolutely had to have initial values, and that eigenvector provided them. (I did this for the Daubechies D4, and for four wavelet systems with 6 nonzero h’s.)

The strangeness?

If I had never seen it fail, it wouldn’t seem strange. I expect that I will show you an example some day. Furthermore, understanding this seems to require that I move into the frequency domain. And I’m not ready to do that yet. (Actually, I’ve begun doing it, but unfortunately I’m still at the stage where some of my calculations result in what I know to be wrong answers. Details, details. When I get this all ironed out, it should be a great example…. And during the time it has taken to finish another draft of this post, I seem to have it. Yes!)

This is probably a good time to remind you that I am not yet past the stage of laying things out so that I recognize them when I see them in my reading. Oh, they make a lot more sense than they did when I started, but my understanding is still rather fragmented.

The eigenvalue properties are still mysterious, as is the assertion that the sum of the odd h’s is equal to the sum of the even h’s.

Now, let’s return to consequences of orthogonality and orthonormality.

Okay, we have one powerful theorem and one example (linear splines) which the theorem does not cover. Let’s look at what they have in common.

For both of them, we have five of six conditions in common… nested spaces V_j\ which are closed under translation and which have a scaling property. For the theorem, we have the sixth condition — that we have an orthonormal basis for the space V_0\ . For the example — the linear splines — we have a basis for V_0 but it is not even orthogonal.

They both, however, have an orthogonal direct sum decomposition, which we describe by saying that W_j\ is the orthogonal complement of V_j\ in V_{j+1}\ ; for example,:

V_1 = V_0 \oplus W_0\ .

Continuing on, we write such things as

V_2 = V_1 \oplus W_1 = V_0 \oplus W_0 \oplus W_1\

V_3 = V_2 \oplus W_2 = V_1 \oplus W_1 \oplus W_2 = V_0 \oplus W_0 \oplus W_1 \oplus W_2\ .

The decomposition

V_3 = V_1 \oplus W_1 \oplus W_2\

is interesting because it shows us that there is nothing special about V_0\ . Furthermore, we could go in the other direction: there is nothing special about non-negative indices.

V_0 = V_{-1} \oplus W_{-1} = V_{-2} \oplus W_{-2} \oplus W_{-1}\ .

The first important thing to realize is that any two W spaces are orthogonal to each other. This means that wavelets are orthogonal across scales.

The challenge is knowing if a W space is closed under translation and has the scaling property. The orthogonal direct sum decomposition guarantees that every function in one W space is orthogonal to every function in every other W space. For our linear spline example, where I can exhibit the mother wavelet, it seems clear — and dangerous as that assumption may be, I’m going to make it — it seems clear that W_0 is closed under translation and that every W space has the scaling property.

Our theorem guarantees that, if we have an orthonormal basis for V_0\ ; but our theorem does not apply to the linear splines.

The second important thing to realize is that the space V_0\ is orthogonal to W_0\ , and in general V_j\ is orthogonal to W_j\ . But more, the nesting of spaces tells us that V_j\ is orthogonal to W_k\ whenever k is greater than or equal to j. This means that scaling functions are orthogonal to some wavelets but not necessarily all.

To be specific, the direct sum decomposition

V_3 = V_0 \oplus W_0 \oplus W_1 \oplus W_2\

tells us that V_0\ is orthogonal to W_k\ for some of the k’s greater than or equal to 0; and the generalization to arbitrary j…

V_j = V_0 \oplus W_0 \oplus ... \oplus W_j\

tells us that in fact V_0\ is orthogonal to W_k\ for all k greater than or equal to 0. And we know there’s nothing special about the 0 subscript: for any m < j we may write

V_j = V_m \oplus W_m \oplus ... \oplus W_j\ .

So. An orthogonal direct sum decomposition tells us a lot about wavelets and scaling functions across scales.

Let me remind us that our powerful theorem told us more: we have orthonormal bases for both V_0\ and W_0\ — and hence orthonormal bases for every V_j\ and W_j\ .

Now, what are the additional consequences specifically of having an orthonormal basis for V_0\ ?

One of them is developed in the proof of the theorem (for which see Daubechies “Ten Lectures on Wavelets”), wherein we explicitly constructed the mother wavelet. We have that the mother wavelet \psi satisfies the equation

\psi(t) = \sum_{n} g(n)\ \sqrt{2}\ \varphi(2t-n)\ .

and that \psi \in W_0\ if and only if the g’s are given by a magic recipe:

g(n) = \pm (-1)^n h(M-n)\ , for M any odd integer.

The -n in M-n says that we reverse the h’s; the (-1)^n\ says that we alternate the signs; the \pm\ says that we may choose to change the first sign or the second; the M says that we may shift the h’s; but in fact we may only shift by an odd integer. Since zero is even, we must shift by at least 1. Oh, we also have that the number of (non-zero) g’s is the same as the number of (non-zero) h’s.

The h’s, you recall, come from the dilation equation for the scaling function \varphi\ ,

\varphi(t) = \sum_n {h(n)\ \sqrt{2}\ \ \varphi(2\ t - n)}

Our example of the linear splines, whose translates are not all orthogonal, shows us that the magic recipe does not work if the translates are not orthogonal. To be specific, our h’s were

\left\{\frac{1}{2 \sqrt{2}},\frac{1}{\sqrt{2}},\frac{1}{2 \sqrt{2}}\right\}\

and our g’s were
\left\{\frac{1}{24},-\frac{1}{4},\frac{5}{12},-\frac{1}{4},\frac{1}{24}\right\}\ .

This example also shows us that the number of g’s can be different from the number of h’s.

By contrast, the Haar, the D4, and the D6 etc. scaling functions and wavelets satisfy our powerful theorem, and their mother wavelets were given by that magic formula for the g’s.

Two other results which I discussed in the previous post were:

  • the magic recipe guarantees that the sum of the g’s is zero;
  • and if the of the g’s is zero, we expect the integral of the mother wavelet to be zero.

For the Haar and Daubechies D4, D6 etc., both of those conditions hold. For the linear spline example, however, only the second condition holds. The magic recipe implies that the integral of the mother wavelet is zero, but the integral can be zero even if we do not use the magic recipe.

Next, there is an extremely important consequence of having an orthonormal basis for V_0\ . We can derive the following quadratic condition:

\sum_n h(n) h(n-2k) = \delta(k)\ ,

where \delta(0) = 1\ and \delta(k) = 0\ for k ≠ 0.

Perhaps surprisingly, it is independant of the norm (\sqrt{P}\ ) of the scaling function, but it does depend on A (our customary \sqrt{2}\ ) in the dilation equation. You should recall that I use P to denote the value of the integral of the scaling function squared:

P := \int \varphi(x)^2 \, dx\ .

I should point out that any consequence of orthonormality which is independent of P is, in fact, a consequence of mere orthogonality. (By orthonormality — or orthogonality –, in this context, I mean that the translates of the scaling function are orthonormal — or orthogonal.)

The quadratic condition has one immediate corollary, for k = 0: the sum of the squared h’s is 1,

\sum_n h(n) h(n) = 1\ .

Some quick algebra indicates that the general formula would be

\sum_n h(n) h(n-2k) = \frac{2}{A^2} \delta(k)\ .

I’ll show you later just how important that corollary is.

There is another crucial consequence of orthonormality. I believe it is a corollary of the quadratic condition, but I have usually seen it associated with the frequency domain.

I believe the quadratic condition implies that the number of non-zero h’s is even (if it is finite). In fact, having decided that, I find it in Strang & Nguyen (p. 148), which changes my belief to virtual certainty. Good. Incidentally, they call it “double shift orthogonality”.

Here’s how I got it.

Whenever I write it out for an odd number of h’s, I can show that the product of the first and last must be zero — so if one of them is zero, we just ended up with an even number of nonzero h’s. To be specific, if we assume that we have 3 h’s, h0, h1, h2, then we get the condition

h0 h2 = 0,

which requires that either h0 = 0 or h2 = 0, and then there are only 2 nonzero h’s.

This is why we have D2 (i.e. Haar), D4, D6, etc. but we do not have, for example, D3. That is, there is no orthonormal Daubechies’ wavelet family with three nonzero h’s. (The notations are not universal; Daubechies herself wrote D2 for what I am used to seeing as D4 today.)

Of course, the linear splines have an odd number of h’s — but the point is, they are not orthogonal — they’re allowed to have an odd number because the quadratic condition does not hold.

I do not know how useful the following are, but let me note that we have a quadratic condition relating h’s and g’s:

\sum_n h(n) g(n-2k) = 0\ .

And we have one for just the g’s, too; I think that for general A, it looks just like the condition on the h’s:

\sum_n g(n) g(n-2k) = \frac{2}{A^2} \delta(k)\ .

(If we are using the magic recipe to get the g’s from the h’s, do we need either of those conditions? I suppose we could use them to catch a typo in a given set of g’s or h’s….)

There is one last consequence I wish to show you. Again, I am not sure how useful this is, but it is unusual enough to be worth noticing, and we may very well find a use for it. This consequence is that

(\int \varphi(x) \, dx) ^2 = \int   (\varphi(x)^2)\, dx\ ,

which I would write as

E^2 = P,

E = \sqrt{P} = || \varphi ||\ .

That is, the L^2 norm \sqrt{P} of the scaling function is equal to its integral E. This is unusual. It requires both orthogonality and the partition of unity, and might also require — as usual — an interchange of integration and an infinite series. Let me show you.

If the translates of the scaling function are orthogonal, then we have

\int \varphi(x) \varphi(x-k) \, dx = P \delta(k)\ .

(That P is my notation, not Burrus et al. As I said before, you have every right to curse me out if you are reading Burrus et al. I am sorry that I did not understand their notation, but I’m not going to change mine now.)

Take the sum — or infinite series — over all k, and get

\sum_k \int \varphi(x) \varphi(x-k) \, dx = P\ .

Interchange, assuming we may, and write

\int \varphi(x) \sum_k \varphi(x-k) \, dx = P\ .

But one of the consequences of the dilation equation was a so-called partition of unity — the sum of all integer translates of the scaling function is equal to the integral E of the scaling function…

\sum_k \varphi(x-k) = E\

so by replacing the summation we get

E \int  \varphi(x)\,  dx = P\ ,

and then by replacing the integral we get

E^2 = P,

QED.

Summary

So. Here is what we have. Six “consequences” of the dilation equation:

  • The sum of the h’s = 2/A.
  • \varphi(t) = \sum_n {h(n)\ A\ \ \varphi(2\ t - n)}\ .
  • E = \int\ \varphi(t)\ dt \ .
  • The sum of the even h’s = the sum of the odd h’s.
  • \sum_k { \varphi(\frac{k}{2^j})} =E\  2^j
  • \sum_k { \varphi(t+k)} = E \

A wavelet equation like the dilation equation serves to define coefficients g(n).

  • \psi(t) = \sum_{n} g(n)\ A\ \varphi(2t-n)\ .

We have the utterly crucial property that if the translates of the scaling function are orthonormal, then the g’s are given by a magic recipe:

  • g(n) = \pm (-1)^n h(M-n)\ , for M any odd integer.

We have two observations about the sum of the g’s.

  • the magic recipe guarantees that the sum of the g’s is zero;
  • and if the of the g’s is zero, we expect the integral of the mother wavelet to be zero.

If the translates of the scaling function are merely orthogonal, so that the squared-norm is not 1…

P := \int \varphi(x)^2 \, dx ={ || \varphi ||}^2 \ne 1\ ,

then we get the following six — or seven — results:

  • the quadratic condition \sum_n h(n) h(n-2k) =\frac{2}{A^2} \delta(k)\ ;
  • its corollary \sum_n h(n) h(n) = \frac{2}{A^2}\ ;
  • its second corollary: the number of nonzero h’s is even, if finite;
  • it’s relatives \sum_n h(n) g(n-2k) = 0\ and \sum g(n) g(n-2k) = \frac{2}{A^2} \delta(k)\ ;
  • the L^2\ norm of the scaling function is equal to its integral: E = \sqrt{P} = || \varphi ||\ ;
  • its corollary E = 1 if and only if P = 1.

Now you might understand why I confused E and P: if the translates of the scaling functions are at least orthogonal, then either P and E are both equal to one, or neither is equal to one. If the translates of the scaling functions are orthonormal, then E = P = 1 — they look like the same thing. And I mixed them up when I first started working through Burrus et al. because on the first pass I was focused on the case E = P without realizing it. My bad.

I don’t know about you, but I need to be careful. If we have orthogonality but not orthonormality, then we need both E and P. If we do not have even orthogonality, we may not care about P but we will still need E.

To put it another way, any consequence of orthogonality which is independent of P is also independent of E.

Two examples

Let me close by giving you two examples of how important the quadratic corollary is.

For the first example, let us look for a scaling function with two coefficients h0 and h1, whose integer translates are orthonormal. Write the dilation equation as usual with A = \sqrt{2}\ , which gives us the sum of the h’s (a linear condition):

\sum_n h(n) = \sqrt{2}\ .

Then write the corollary of the quadratic condition also for A = \sqrt{2}\ :

\sum_n h(n) h(n) = 2/2 = 1\ .

That is, we have two equations in two unknowns,

\begin{array}{l} \text{h0}+\text{h1}=\sqrt{2} \\ \text{h0}^2+\text{h1}^2=1\end{array}

and their solution is unique: h0 = h1 = 1/\sqrt{2}\ . The Haar system. Don’t bother looking for another orthonormal set.

So. From the sum of the h’s and the quadratic condition, we can derive the Haar scaling function and show that it is the only one which leads to an orthonormal basis with two h’s.

For the second example, let us look for a scaling function with 4 coefficients, whose integer translates form an orthonormal basis for V_0\ . Sound familiar?

Again, we have the linear condition on the sum of the h’s…

h0 + h1 + h2 + h3 = \sqrt{2}\ ;

and the quadratic corollary…

h0^2 + h1^2 + h2^2 + h3^2 = 1\ .

But we also have the general quadratic condition itself, for N=4,

h(1) h(1-2 k)+h(2) h(2-2 k)+h(3) h(3-2 k)+h(0) h(-2 k)=\delta(k)\

and then for k = 1 it reduces to…

h(0) h(2)+h(1) h(3)=0\ .

(For other values of k, we get 0 = 0.)

We have three equations in four unknowns:

\begin{array}{l} h(0)+h(1)+h(2)+h(3)=\sqrt{2} \\ h(0)^2+h(1)^2+h(2)^2+h(3)^2=1 \\ h(0) h(2)+h(1) h(3)=0\end{array}

We do not get a unique solution but one of the solutions is, indeed, the Daubechies D4 scaling function.

I could write out the family of solutions in terms of h[0] as a parameter, but they are not simple. I’m not going to write them out.

I’m just going to recall the family of solutions as given by Burrus et al (and which I’ve already shown you),

h(0)=\frac{-\cos (a)+\sin (a)+1}{2 \sqrt{2}}

h(1)=\frac{\cos (a)+\sin (a)+1}{2 \sqrt{2}}

h(2)=\frac{+\cos (a)-\sin (a)+1}{2 \sqrt{2}}

h(3)=\frac{-\cos (a)-\sin (a)+1}{2 \sqrt{2}}

(Of course, we could actually verify that these solutions do satisfy our original 3 equations. I’m pretty sure I did that for myself. In fact, I think that’s how I find the mistake in the original post!)

The Daubechies h’s are found by setting a = \pi/3\ .

I still haven’t shown why the Daubechies are special, but we’re closer: now, at least, we know that they are one of an infinite number of solutions which satisfy 3 linear and quadratic conditions on the 4 h’s.

I also haven’t shown how we would get the solutions in terms of the angle a. Well, I don’t know yet. There is a chance that another derivation of the Daubechies D4 will proceed via trig polynomials and give us this compact family of solutions. There is also a chance that getting the solutions in terms of an angle is simply black magic.

I’ll try to find out, as we proceed.

Wavelets: Review I and Going Forward a Little

Let us recall what we have.

We have a collection of nested spaces…

\dotsm\ V_{-3} \subset\  V_{-2} \subset\  V_{-1} \subset\  V_{0} \subset\  V_{1} \subset\  V_{2}\ \dotsm\ \

… whose intersection is the trivial space and whose union is all square integrable functions on the real line:

\cap\ V_i = \{0\}\ and \cup\ V_i = L^2(R)\ .

We assume that the spaceV_0\ is translation invariant and has the scaling property:

f(x) \in V_0\ \text{ if and only if } f(x-k) \in V_0\ for all integers k;

f(x) \in V_j \text{ if and only if } f(2^{-j} x) \in V_0\ .

Finally, the only real theorem I have shown you says that if we also have an orthonormal basis for V_0\ , then we can get an orthonormal basis for L^2(R)\ :

\psi_{j,k} (x) := 2^{j/2)} \psi(2^j\ x - k)\

The proof is constructive. In addition to giving us that orthonormal basis, it gives us spaces W_j\ , each of which is the orthogonal complement of V_j\ in V_{j+1}\ . Furthermore, for fixed j, we have an orthonormal basis for W_j\ :

\psi_{j,k}(x)\ will be an orthonormal basis of W_j\ .

We have asserted two kinds of orthogonality:

  • that we have orthonormal bases (for V_0\ and for L^2(R)\ ;
  • that we have orthogonal direct sums.

Can we relax these two conditions separately? Yes.

I have shown you an explicit example — for the linear spline (semi-orthogonal wavelets — Strang & Nguyen call them that on the inside front cover, so I will too; they are also called pre-wavelets) — where the bases for V_j\ and W_j\ were not each orthogonal, but the spaces themselves were orthogonal.

I did not provide any existence theorem for that example, nor did I even tell you how I found the mother wavelet for that example. We’ll get there. But the example itself shows that we can have an orthogonal direct sum without having orthonormal bases for V_j\ and W_j\ .

Let me empahsize that, at present, I have no theorem associated with the mother wavelet for the linear splines.

  • The mother wavelet is supposed to be orthogonal to all the translates of the linear spline scaling function, and computation supports that.
  • I infer that all the translates of the mother wavelet are also orthogonal to all the translates of the scaling function.
  • Therefore all the translates of the mother wavelet are in W_0 (because it’s the orthogonal complement of V_0\ ).
  • I say it seems clear that the same is true for scaled versions of the mother wavelet, so, for example, \psi(2t-k) \in W_1\ .

But I don’t actually know some of the most important things I need:

  • do the W_j\ spaces have the scaling property: f(x) \in W_j \text{ if and only if }f(2^{-j}) \in W_0\ ? (I believe it for the \psi\ , but is it true for the spaces?)
  • do the \psi(t-k) \ in W_0\ span W_0\ ?
  • are they a basis for W_0\ ?

(Remember that W_j\ is defined as the orthogonal complement of V_j\ , while the V_j\ are defied as the spans of the scaled and translated scalings functions. While we have finally gotten our hands on elements of W_j\ , we don’t know very much about them. I do expect (but do not know) that once I can explain where the g coefficients came from, we’ll know that we do have bases and the scaling property.)

With the caveat that there are some some crucial concerns about the wavelets for the linear splines, let’s just keep using the linear splines as an example of a non-orthonormal basis. (They are, but are the wavelets?)

So, let me recast the dilation equation and its consequences in this framework. For a few moments let us forget that we have an orthonormal basis for V_0\ .

Suppose we do have a basis, which may not be orthonormal. The translates of the linear splines are such a basis – for the space they span.

If we have a scaling function whose integer translates are a basis for V_0\ , then I am quite sure (I didn’t say I had proven) that the translated and scaled functions are a basis for V_1\ . Maybe I should say it the other way, that if the translated and scaled functions are a basis for V_1\ , then we get the dilation equation:

\varphi(t) = \sum_n {h(n)\ A\ \ \varphi(2\ t - n)}.

That just says that the scaling function is an element of V_1\ and may be written in terms of this basis for V_1\ . Of course, this equation serves as the definition of the h’s.

From the dilation equation, we concluded that the sum of the h’s = 2/A.

We also decided that the scaling function, as a solution of the dilation equation, was determined only to within a multiple. One way to characterize those various multiples is by the integral of the function:

E = \int\ \varphi(t)\ dt \ .

I believe that people have shown that the sum of the even h’s is equal to the sum of the odd h’s. I do not yet know when this is true or how to prove it, so I treat it as a condition to be checked whenever I have a collection of h’s in my hand.

In particular, we saw that it was true for the linear splines – our example of a non-orthonormal basis – even though there were an odd number, specifically 3, nonzero h’s.

If the scaling function has been normalized so that its integral is one (E=1), then we had two further consequences:

  • \sum_k { \varphi(\frac{k}{2^j})} = 2^j \text{(for E = 1)}
  • \sum_k { \varphi(t+k)} = 1 \text{(for E = 1)}\

With the passage of time, I have decided (Hey, sometimes I’m slow! More like, overwhelmed.) that the general forms of those two equations are:

  • \sum_k { \varphi(\frac{k}{2^j})} = 2^j\ E\
  • \sum_k { \varphi(t+k)} = E\

We can confirm those by working thru the algebra in the general case; we can also just realize that going from E = 1 to E ≠ 1 effectively replaces the scaling function \varphi\ by E\ \varphi\ . They appear to be independent of A.

(I have totally messed up the notation vis a vis Burrus et al., but I’m not going to change it now. If you are reading Burrus et al., you have every right to curse me out; my E is his A_0\ , and his E is the integral of the square of the scaling function. I will remain consistent to the notation I have adopted, even though it is not that of Burrus et al. When I write “E”, its the integral of the scaling function, not of its square.)

These, then, are the six consequences I listed earlier. Actually, two of them really serve only to identify the constants A and E; but if someone just hands me a scaling function, I will compute A and E in order to see what conventions they’re using.

Let me write all 6 consequences together:

  • The sum of the h’s = 2/A.
  • \varphi(t) = \sum_n {h(n)\ A\ \ \varphi(2\ t - n)}\ .
  • E = \int\ \varphi(t)\ dt \ .
  • The sum of the even h’s = the sum of the odd h’s.
  • \sum_k { \varphi(\frac{k}{2^j})} = 2^j\ E\
  • \sum_k { \varphi(t+k)} = E\

I was very happy to have an example of a non-orthogonal basis – the linear splines – for which I have already verified those properties.

So much for consequences of the dilation equation. What about the mother wavelet?

If we have singled out a particular function in W_0\ , hence in V_1\ , then we get an equation, similar to the dilation equation for the scaling function:

\psi(t) = \sum_n {g(n)\ A\ \ \varphi(2\ t - n)}\ .

(I can’t imagine that that particular function is anything but the mother wavelet, and then this equation says that the coefficients called g are the ones associated with the mother wavelet.)

There is one particular consequence of having an orthonormal basis for V_0\ — the recipe for construction of the orthonormal basis of wavelets leads to the property that the sum of the g’s is zero.

That, in turn, implies that the integral of the mother wavelet — hence the integral of every wavelet — is zero.

But the proof that the integral of the mother wavelet is zero depends only on two properties:

  • that the sum of the g’s is zero;
  • that we may interchange a possibly infinite series and an integral.

The point I am trying to make is that:
if the sum of the g’s is zero — whether our bases are orthonormal or not — then we should expect the integral of the mother wavelet to be zero. If it is not, then there must have been a problem interchanging the integral and an infinite series (and I might be able to provide such an example).

To put that another way, I want to compute the sum of the g’s in any case.

And to be quite explicit, the proof goes by integrating both sides of the wavelet equation. From

\psi(t) = \sum_n {g(n)\ A\ \ \varphi(2\ t - n)}\ .

we get

\int \psi(t) \, dt = \int (\sum_n {g(n)\ A\ \ \varphi(2\ t - n)}) \, dt\ .

If we can interchange the integral and the summation – whic may be an infinite series rather than a finite sum – then we can extract the sum of the g’s:

\int \psi(t) \, dt = \sum_n {g(n)\ \int (\ A\ \ \varphi(2\ t - n)}) \, dt\ ,

and so long as the integral on the RHS is finite, we have that the integral of the mother wavelet is proportional to the sum of the g’s.

So if the sum of the g’s is zero, then the integral of the mother wavelet is zero, unless something went wrong interchanging the limit operations.

As I said before, if the sum of the g’s is zero, expect that the integral of the mother wavelet is zero – and if it isn’t, then realize that we’re looking at a pathological case.

To summarize, I have discussed only two properties of the mother wavelet: its sort-of dilation equation and that if the g’s add up to 0, then the integral of the mother wavelet ought to be zero:

  • \psi(t) = \sum_n {g(n)\ A\ \ \varphi(2\ t - n)}\ .
  • \sum_n {g(n)} = 0 \text{ implies } \int \psi(t) \, dt = 0

I have not checked the sum of the g’s for the mother wavelet for our only non-orthonormal basis, the linear splines.

Why wait?

In the semi-orthogonal post, I handed you the g’s for the mother wavelet orthogonal to the translates of the linear spline scaling function:

g = \left\{\frac{1}{24 \sqrt{2}},-\frac{1}{4 \sqrt{2}},\frac{5}{12 \sqrt{2}},-\frac{1}{4 \sqrt{2}},\frac{1}{24 \sqrt{2}}\right\}\

They do indeed add up to zero. I expect that the integral of the mother wavelet is zero, and we can probably prove it by writing it in terms of the scaling function. Not now.

I should point out that even though we know so little about this mother wavelet, we know that it is in V_1\ because we were given the g’s that describe it wrt our basis \{\varphi(2t-k)\}\ of V_1\ ! And I was assured that it was, in fact, in W_0\ .

I had intended to march right on into the consequences of our two kinds of orthogonality, but this post is probably long enough, and this is a good stopping point.