Example: Is it a transition matrix? Part 1

This example comes from PCA / FA (principal component analysis, factor analysis), namely from Jolliffe (see the bibliography). But it illustrates some very nice linear algebra.

More precisely, the source of this example is:
Yule, W., Berger, M., Butler, S., Newham, V. and Tizard, J. (1969). The WPPSL: An empirical evaluation with a British sample. Brit. J. Educ. Psychol., 39, 1-13.

I have not been able to find the original paper. There is a problem here, and I do not know whether the problem lies in the original paper or in Jolliffe’s version of it. If anyone out there can let me know, I’d be grateful. (I will present 3 matrices, taken from Jolliffe; my question is, does the original paper contain the same 3 matrices?)

Like the previous post on this topic, this one is self-contained. In fact, it has almost nothing to do with PCA, and everything to do with finding — or failing to find! — a transition matrix relating two matrices.

On p. 163, Jolliffe provides a matrix of “principal components”:

P = \left(\begin{array}{cccc} 0.34 & -0.39 & 0.09 & -0.08 \\ 0.34 & -0.37 & -0.08 & -0.23 \\ 0.35 & -0.1 & 0.05 & 0.03 \\ 0.3 & -0.24 & -0.2 & 0.63 \\ 0.34 & -0.32 & 0.19 & -0.28 \\ 0.27 & 0.24 & -0.57 & 0.3 \\ 0.32 & 0.27 & -0.27 & -0.34 \\ 0.3 & 0.51 & 0.19 & 0.27 \\ 0.23 & 0.22 & 0.69 & 0.43 \\ 0.36 & 0.33 & -0.03 & 0.02\end{array}\right)

On the same page, he provides two matrices of the “rotated factor loadings”. One is “Varimax”…

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

… and the other is “direct quartimin”:

Q = \left(\begin{array}{cccc} 0.51 & -0.05 & 0.05 & 0.05 \\ 0.53 & 0.04 & 0.05 & -0.14 \\ 0.32 & 0.13 & 0.16 & 0.15 \\ 0.17 & -0.19 & 0.65 & 0.2 \\ 0.54 & 0.06 & -0.13 & 0.05 \\ -0.07 & 0.28 & 0.67 & -0.12 \\ 0.16 & 0.53 & 0.13 & -0.17 \\ 0.03 & 0.62 & -0.09 & 0.26 \\ 0. & 0.09 & 0.02 & 0.87 \\ 0.08 & 0.45 & 0.24 & 0.21\end{array}\right)

I presume — not knowing how to comute a varimax rotation! — that each of these matrices represents the same data, i.e. the same 4D subspace of R^{10} (since we have 4 column vectors of length 10 in each matrix). It would seem silly if a varimax rotation of PCs was not supposed to represent rotated PCs within the same subspace.

That presumption is wrong: they do not all represent the same subspaces.

How do I know this?

Because given two matrices of the same shape, I know how to find the transition matrix if it exists. We did this here.

Let me elaborate on what I’m doing. Imagine that our matrices had 2 columns and 3 rows. Those 3D column vectors lie in a 2D subspace — a plane — in R^3\ . By doing 2D transformations within that plane, I might find a particularly nice basis for representing that data.

If, instead, I apply a 3D transformation, I would be moving that data into a different plane. What I am doing here is looking at 4D transformations within the 4D subspace. Everything is predicated on my belief that the PCs in P, and the varimax rotation of them in V, and the direct quartimin data in Q are all supposed to be in the same subspace.

Not knowing how they were supposed to be computed, I could be seriously wrong here. OTOH, since we will find that V and Q are the same subspace, and the first 3 of the 4 columns of P also lie in that common subspace, I really believe that the 4th column of P should too. That it does not strikes me as an error.

Let me make one simplification up front. Are the matrices P, V, Q of full rank (4)?

Yes. The Mathematica command MatrixRank says that each of those matrices is of rank 4.

Let us consider V and Q. If the observations in each are related by a transition matrix T, and if we arbitrarily take V to be the “old” data, then each column of V’ (the transpose of V) is found by applying the transition matrix to the corresponding column of Q’.

V’ = T Q’

i.e.

V = Q T’.

Then, as I showed before, I imagine briefly that I can “solve” for T’…

T' = Q^{-1} V\ ,

but quickly replace the non-existent inverse of Q by the appropriate pseudo-inverse

T' = (Q'Q)^{-1} Q'\ V\ .

(It’s worth repeating that the inappropriate pseudo-inverse would use QQ’, but that can’t work because QQ’ is 10×10 but of rank 4 at most, hence not invertible.)

If Q’Q is invertible, that recipe always gives me a matrix T, but T need not be a transition matrix. The resulting T is a transition matrix if and only if we actually have

V = Q T’,

(given that V and Q are of the same full rank).

Do we have V = Q T’ ?

By direct computation I get

T = \left(\begin{array}{cccc} 0.9273 & 0.0921888 & 0.15161 & 0.101824 \\ 0.22944 & 0.864017 & 0.176744 & 0.0556721 \\ 0.251353 & 0.0599739 & 0.913081 & 0.0549228 \\ 0.185642 & 0.11154 & 0.00492819 & 0.941728\end{array}\right)

and then I look at V – Q T’…. You know, it’s easy enough to put them side by side.

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right) Q\ T' = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.16 & 0.17 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0 & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

Even the difference of the rounded-off matrices is satisfying small. That is,
Round[V,.01]-Round[Q.TT,.01]//Chop//MatrixForm is

\left(\begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & -0.01 & 0.01 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0\end{array}\right)

Judging that the difference is negligible, I conclude that the matrices V and Q do span the same subspace of R^{10}\ ; they are the same 4D subspace with two different coordinate systems (two sets of basis vectors).

While we’re here, let’s do this another way. After all, the fundamental question is: can the columns of V be written as linear combinations of the columns of Q? Alternatively, can each column of Q be written as a linear combination of the columns of V?

Bear in mind that we have already established that Q and V are of the same full rank. In that case, the alternatives are equivalent.

We can answer the fundamental question by trying to regress each column of V on Q. Just remember to tell the regression to drop the constant!

To be specific, here’s the first column of V as a linear combination of the 4 columns of Q. Here’s the parameter table and R^2 — not the adjusted R^2\ : I really want to know how good this fit is, never mind how many variables it has.

Here’s a snapshot of the Mathematica commands (version 7). (Oh, darn. The last option in the LinearModelFit command, which replaced “Regress” is “IncludeConstant->False”. And I print the names just to confirm that none of those symbols has a left-over numerical value!)

(Let me just remind you that the R^2 is a measure of a how close the fit is. When we add another independent variable, however, the R^2 cannot decrease; it is not a good measure of the value — i.e. the worth — of that latest independent variable. The adjusted R^2 can decrease; in fact, in my experience, if the t-statistic of that latest variable is greater than 1 (in magnitude), then the adjusted R^2 goes up. As I said, in this case, I really want the R^2 rather than adjusted R^2 because I know exactly how many independent variables I am supposed to use. That said, there isn’t a whole lot of difference between the two in this case.)

v1 on Q

That’s a really good fit. And it should be. And that column of \beta\ , i.e. “Estimate”? It’s the first column of T’, i.e. the first row of T:

{0.9273, 0.0921888, 0.15161, 0.101824}

We could run 3 more regressions, but they’d only confirm what we believe: T’ could, in fact, be computed as regression coefficients. If the two subspaces coincide, then T is also a transition matrix between two bases in one and the same space.

There is yet another way to look at that. We know that in regression, yhat is the projection of y onto the subspace spanned by the columns of X. If y itself is in that subspace, then yhat = y, and the relatonship is given by a transition matrix.

So we could do this a third way. We know how to compute a projection operator onto the subspace spanned by the columns of X. In this case, X is Q, so we find its SVD (Singular Value Decomposition)…

Q = u\ w\ v^T

Then the 4 leftmost columns (u1) of u are an orthonormal basis for the range of Q….

u1 = \left(\begin{array}{cccc} -0.203473 & 0.27677 & 0.353491 & 0.188167 \\ -0.188235 & 0.423093 & 0.310857 & 0.0177813 \\ -0.30583 & 0.10025 & 0.140569 & 0.094808 \\ -0.313324 & 0.177504 & -0.285416 & 0.609666 \\ -0.194156 & 0.220796 & 0.497051 & 0.0101643 \\ -0.354903 & 0.228756 & -0.604256 & -0.00173818 \\ -0.320761 & 0.191153 & -0.0758258 & -0.463325 \\ -0.379019 & -0.300366 & 0.0912622 & -0.459411 \\ -0.365508 & -0.681435 & 0.196437 & 0.360653 \\ -0.43321 & -0.103153 & -0.111086 & -0.166404\end{array}\right)

That is, u1′ u1 = I, if the column space of X is of full rank. Check it:

u1^T u1 = \left(\begin{array}{cccc} 1. & 0 & 0 & 0 \\ 0 & 1. & 0 & 0 \\ 0 & 0 & 1. & 0 \\ 0 & 0 & 0 & 1.\end{array}\right)

Bur the reversed matrix product

R = u1 u1′

gives us a projection operator R onto the range of Q:

We can check it. (I won’t print it: it’s 10×10.) One, R is a projection operator if and only if it is idempotent, i.e. R R = R. And by computation, I see that is true.

Two, applied to each column of Q, it reproduces that column of Q; i.e. applied to Q as a whole, it reproduces Q, i.e. R Q = Q. Again, computation shows that is true, too.

Having checked that R is a projection onto the column space of Q, let’s use it: apply it to V. If V lies in the subspace spanned by the columns of Q, then R V will reproduce V….

I find that the largest difference between the elements of R V and V is 0.00720726 .

That’s as close as the other two calculations. Not exactly zero, but then, our regression didn’t fit exactly: like the RGB and XYZ tables in the earlier post, V and Q are almost but not exactly related by a transition matrix.

So we have seen three ways of looking at this. All three tell the same story: the column spaces of V and Q are the same.

Let me say a little more about two of them, the projection operator and the explicit regression.

The projection operator R came from the SVD. The regression can also be written using a projection operator, namely the hat matrix H. I keep getting mixed up about the relationship between T and R and H, so let me write it out.

The normal equations for

\hat{y} = X\ \beta

are

\beta = (X'X)^{-1}\ X'\ y\ .

(I showed you a quick and dirty way to recover that equation in the earlier post.)

Then

\hat{y} = X\ \beta = X\ (X'X)^{-1}\ X'\ y

and so we may write

yhat = H y

with the hat matrix H defined as

H = X\ (X'X)^{-1}\ X'\ .

For our regression, we took y to be the first column of V, and X to be the four colums of Q, so our hat matrix would be

H = Q\ (Q'Q)^{-1}\ Q'\ .

It is a projection operator onto the range of Q. But that’s exactly what R was. Go ahead and check: you will find that

H = R.

This means that we could compute R as the hat matrix, instead of using the SVD. Personally, I know how to use the SVD, whereas I would have to work out the definition of the hat matrix every time. But that’s a preference; you may choose to compute the hat matrix instead. (And that is how Malinowski was doing it!)

Going a little further, T’ was

T' = (Q'Q)^{-1}\ Q'\ V

so we have

Q T’ = H V.

We’ll look at the relationship between P and V in Part 2.

Color: decomposing the RGB A transpose matrix

(abuse of) terminology

Sometimes I get tired of writing “xbar, ybar, zbar tables” — and I just write “xyz bar tables” or even “XYZ tables”. Similarly for rbar, gbar, bbar tables — rgb bar. I’m not talking about anything new, just abbreviating the names.

Introduction

This is the third post about the example on p. 160 of W&S. Once again, I am going to decompose the reflected spectrum into its fundamental and its residual.

This time, however, I’m going to use the rbar, gbar, bbar tables (RGB) instead of the xbar, ybar, zbar (XYZ) tables. They did not do this.

I’ll tell you now there is one little twist in these calculations. We will need the ybar table, because we still need to use it to scale our results.

In addition to showing, as I did previously, the dual basis spectra and the orthonormal basis spectra for the non-nullspace, I will display the orthonormal basis for the nullspace.

As usual, I am using the CIE 1931 tables, and I am working at 20 nm intervals.

Our ingredients were: a reflectance spectrum, an illiminant spectrum, and some choice of color matching functions. For this post, I am choosing a different set of color matching functions.

Our results in the previous post were a few sets of 3 numbers. The final result was called xyz — meaning the triple (x, y, z) — for plotting a point on the CIE chromaticity chart. We will not do this again.

That result came from a triple denoted XYZ, which are values in XYZ color-space. That space has the advantage of being device-independent. We could — and did — convert those values to an RGB color space, specifically the one defined by the rgb bar (i.e. rbar, gbar, bbar) color-matching tables.

This post will compute the RGB values directly, and we will find that we get almost exactly the same RGB values. (We really ought to.)

The XYZ result in turn came from a triple for which I have no convenient notation, but I keep saying without proof, and confirming by computation, that that triple is the components of the fundamental of the reflected spectrum wrt the basis dual to the chosen color-matching functions.

Let us go see that, this time for the rbar, gbar, bbar tables instead of xbar, ybar, zbar.

The same old calculations, but with a twist

But first, let me review the calculations. Feel free to skip to the next section.

Our first computation is to get the reflected spectrum, which is the pointwise product of the illuminant spectrum and the reflectance spectrum. For this problem, the reflected spectrum is the perceived spectrum, the input to the eye of “the standard observer”. The data itself is in the fourth case of the first post about this example.

Here is what the three of those spectra looked like:

g1107a

As before, I’ve decided to avoid red, green, and blue — except where they are appropriate. For this post, they will be appropriate in some cases. The illuminant is shown in white, the reflectance spectrum is shown in purple, and the reflected spectrum is shown in yellow.

Having said that, I am going to use red, green, blue colors for the three columns of our A matrix, because these entires are rbar, gbar, bbar values. These are the 1931 color matching functions (i.e. a subset, at 20 nm intervals)

g1107b

BTW, we can tell that we are using RGB (i.e. rbar, gbar, bbar tables) because one of these functions is clearly negative over part of its range. In fact all of them are negative somewhere, but it’s only obvious for the red one.

Since I’ve just introduced new data, let me display it:

A = \left(\begin{array}{ccc} 0.00003 & -0.00001 & 0.00117 \\ 0.0003 & -0.00014 & 0.01214 \\ 0.00211 & -0.0011 & 0.11541 \\ -0.00261 & 0.00149 & 0.31228 \\ -0.02608 & 0.01485 & 0.29821 \\ -0.04939 & 0.03914 & 0.14494 \\ -0.07173 & 0.08536 & 0.04776 \\ -0.09264 & 0.17468 & 0.01221 \\ -0.03152 & 0.21466 & 0.00146 \\ 0.0906 & 0.19702 & -0.0013 \\ 0.24526 & 0.1361 & -0.00108 \\ 0.34429 & 0.06246 & -0.00049 \\ 0.29708 & 0.01828 & -0.00015 \\ 0.15968 & 0.00334 & -0.00003 \\ 0.05932 & 0.00037 & 0 \\ 0.01687 & 0.00003 & 0 \\ 0.0041 & 0 & 0 \\ 0.00105 & 0 & 0 \\ 0.00025 & 0 & 0 \\ 0.00006 & 0 & 0 \\ 0 & 0 & 0\end{array}\right)

Our second computation (or third, this order doesn’t matter) should be to apply the matrix A^T — the (transpose of) the color matching functions A — to the reflected spectrum S:

A^T\ S = \{70.5889,\ 61.249,\ 47.8412\}\ .

As before these numbers are components of (the fundamental of) the reflected spectrum wrt the basis dual to A^T\ . In Cohen’s notation, these 3 numbers are the components wrt the columns of the E matrix.

The other computation, whether we do it second or third, is to find the constant for normalizing things, using the illuminant spectrum; that is, to set the white point (1,1,1) in our color space.

Here’s the twist. We still have to use the second column of the XYZ (xyz bar) tables; not one of the columns of our new A is the total cone response. (Nor is the sum of them, and I’ll confess that I don’t know why. My WAG (my guess) is that the three color matching functions each have to be weighted differently in order to sum to the total cone response.)

The number we got by applying the second column of the XYZ tables to the illuminant was 529.764 .

Now we move on to the fourth computation: we divide our 3 values A^T\ S by that to get RGB.

RGB = {0.133246, 0.115616, 0.0903067}.

It is encouraging to see that these RGB values are in [0, 1]. These are tristimulus values in an RGB color space, namely the one I keep showing on the chromaticity chart.

Let me be clear. The red, green, and blue dots on this diagram are the projections of (1,0,0), (0,1,0) and (0,0,1) onto xy-space. The values I just got are in this RGB space.
CIE with orig RGB

Oh, in the previous post we transformed the computed XYZ values into RGB; what did we get?

{0.133246, 0.115616, 0.0903103}

So. Almost exactly the same answers: only the B value differs. That’s good. I said I would confirm this equivalence.

The blue value is off a tiny bit. (The simple fact is that the two sets of tables — xyz bar and rgb bar — and the transition matrix between them are not consistent. In fact, it appears that the columns of the two tables do not quite span the same 3D space. If I’m right, I could fix this by using one table and the transition matrix to construct a very slightly different version of the second table. But bear in mind that I am printing far more digits than W&S showed.)

the dual basis

Now, let’s get the dual basis E. We recall that instead of

E^T = A^{-1}\ ,

we use the pseudo inverse and write

E^T = (A'A)^{-1}\ A'\

It is easier to work with E^T but easier to display E:

E = \left(\begin{array}{ccc} 0.000561675 & -0.000849316 & 0.00535311 \\ 0.00583075 & -0.00906171 & 0.0555629 \\ 0.0527362 & -0.0836627 & 0.52778 \\ 0.110419 & -0.184984 & 1.42203 \\ 0.0130902 & -0.0537785 & 1.34031 \\ -0.149518 & 0.244672 & 0.616947 \\ -0.311401 & 0.664664 & 0.135783 \\ -0.492205 & 1.35497 & -0.0912093 \\ -0.339342 & 1.5831 & -0.142044 \\ 0.0806418 & 1.32314 & -0.0962321 \\ 0.656427 & 0.713092 & 0.00326589 \\ 1.0634 & 0.072223 & 0.0910496 \\ 0.956881 & -0.194913 & 0.10246 \\ 0.521479 & -0.151607 & 0.0594277 \\ 0.194688 & -0.0626132 & 0.0226816 \\ 0.0554502 & -0.0183495 & 0.00649827 \\ 0.0134844 & -0.00451219 & 0.00158394 \\ 0.00345331 & -0.00115556 & 0.000405644 \\ 0.000822217 & -0.000275134 & 0.0000965819 \\ 0.000197332 & -0.0000660321 & 0.0000231797 \\ 0. & 0. & 0.\end{array}\right)

Check that we do have E^T\ A = I\ . I didn’t say it last time, so let me emphasize: I didn’t prove that the pseudo-inverse would give me the dual basis. This easy “check” is the real confirmation that my recipe actually works. This check is crucial, not merely convenient.

Let me call the basis vectors E1, E2, E3, and let’s look at them. In fact, let me juxtapose the dual basis (right) with the color matching functions (left).

g1107c

If we multiply each of those dual basis vectors (E1, E2, E3) by our original three numbers…

{70.5889, 61.249, 47.8412}

and add the results.. we get numbers…

{0.243727,2.51476,23.848,64.4959,61.7521,33.9471,25.2246,
43.8826,66.214,82.1295,90.1689,83.8434,60.5088,30.3679,
10.993,3.10116,0.751257,0.192395,0.0458083,0.010994,0.}

and a picture…

g1107e

where I have shown the sum in white.

Let’s look quickly at the reflected spectrum (in yellow) and that sum (still in white):

g1107f

Yes, we’ve seen that before. And as before, that white curve is the fundamental. It differs from the yellow curve by a metameric black, a spectrum in the null space of A^T\ .

As before, we could stop here. We have gotten RGB values from the reflected spectrum… we have found the dual basis… we have found the fundamental of the reflected spectrum… we know that the first three numbers we computed (A^T\ S\ ) are the components of the fundamental wrt the dual basis.

The SVD, orthonormal basis, and projection operator

Now let’s do the SVD (Singular Value Decomposition), because we can get a nice orthonormal basis for the entire domain of A^T\ including its null space, i.e. for spectra including the metameric blacks.

As before, we will get an orthonormal basis v1 for the pre-image of A^T\ , then a projection operator R onto that pre-image, then recompute the fundamental by applying the projection operator to the reflected spectrum.

First, let

A^T = u\ w\ v^T\ .

(The first time I did this, I let B = A^T\ . Now we’re more experienced, but if you want to use B instead of A^T\ , feel free.)

As always, our vectors have length 21 because we used 20 nm intervals from 380 to 780 nm. Then, the leftmost 3 columns of v are an orthonormal basis for the pre-image of A^T\ , and the rightmost 18 (i.e. 21-3) are an orthonormal basis for the nullspace of A^T\ .

Here are the leftmost 3 columns of v, denoted v1 as is my habit:

v1 = \left(\begin{array}{ccc} 0.000242117 & 0.00237304 & 0.000773655 \\ 0.00254521 & 0.0246016 & 0.00811386 \\ 0.0253371 & 0.23387 & 0.0760077 \\ 0.0806111 & 0.633731 & 0.188135 \\ 0.110853 & 0.608184 & 0.12797 \\ 0.102268 & 0.306133 & -0.04696 \\ 0.0972532 & 0.129667 & -0.243487 \\ 0.0880827 & 0.101349 & -0.512971 \\ -0.0317461 & 0.111471 & -0.580417 \\ -0.228205 & 0.117862 & -0.449608 \\ -0.460923 & 0.112998 & -0.181177 \\ -0.596203 & 0.0923988 & 0.0811638 \\ -0.500375 & 0.0613641 & 0.162946 \\ -0.266387 & 0.0296408 & 0.104503 \\ -0.0986158 & 0.0105714 & 0.0410965 \\ -0.0280157 & 0.00296645 & 0.0118833 \\ -0.00680593 & 0.000717077 & 0.00290704 \\ -0.00174298 & 0.000183642 & 0.000744487 \\ -0.000414996 & 0.0000437242 & 0.000177259 \\ -0.000099599 & 0.0000104938 & 0.0000425421 \\ 0. & 0. & 0.\end{array}\right)

OK, I have to know: did v1 change? Yes. I’ll show you, soon.

Let me juxtapose the orthonormal basis v1 (right) with the dual basis E vectors. Both bases span the same space, the pre-image of A^T\ , but they are different.

g1107g

And in this case, I can’t see that any vectors in one basis look similar to vectors in the other basis.

Let me recall an image from the previous post: here are the dual basis E and the o/n basis v1 from the previous post. This is the old counterpart to the preceding drawing.

g1107h

And the right hand pieces of those two drawings are the new v1 and the old v1. We can see that they are quite different.

Having an orthonormal basis for the preimage of A^T\ , we know that we can construct a projection operator R from the domain of A^T\ onto the preimage; we just take the product of v1 and v1^T\ in the other order:

R = v1\ v1^T\ .

It’s 21×21, as before, so I’m not going to show it to you. But, for checking, I’ll show you the first column:

g1107i

And I’ll tell you that the last column, as before, is identically 0.

the fundamental and the residual

So. Apply the projection operator R to the reflected spectrum S to get the fundamental f…

f = {0.243727,2.51476,23.848,64.4959,61.7521,33.9471,25.2246,
43.8826,66.214,82.1295,90.1689,83.8434,60.5088,30.3679,
10.993,3.10116,0.751257,0.192395,0.0458083,0.010994,0.}

and subtract to get the residual n = S – f…

n = {4.85627,26.2996,19.116,-14.6684,-7.32845,18.6715,27.0686,
15.2246,3.00319,-13.0295,-23.7795,-12.7434,13.5977,41.0282,
59.1236,63.6887,60.2519,52.106,59.2832,33.0258,39.625}

Did we get the same fundamental? They are very close. Again, I think the real problem is that the two tables are not exactly consistent. Here’s

old f minus new f =

{-0.00362232,0.000221807,0.00281357,-0.00101889,-0.00138667,
-0.00346164,-0.00173991,-0.00126245,-0.00325203,-0.000486578,
0.00271479,0.00422385,0.00120929,-0.00248099,-0.00107542,
0.0013427,0.00223839,-0.00306751,0.00309787,0.00376061,0.}

We already know that the fundamental is the same whether we compute it using the dual basis or using the projection operator. Let me illustrate that the fundamental computed using RGB is the same as the fundamental computed using XYZ.

I have plotted points using the previous fundamental, and I have plotted a piecewise linear curve using the new fundamental:

g1107j

I should probably emphasize that although this drawing looks like one from the previous post, it illustrates something else. The apparently identical drawing in the previous post showed the equivalence of the fundamental computed two ways using XYZ; this drawing shows the equivalence of the fundamental from the previous post with the fundamental from this post.

And the null? Close, too. We have

old n minus new n =

{0.00362232,-0.000221807,-0.00281357,0.00101889,0.00138667,
0.00346164,0.00173991,0.00126245,0.00325203,0.000486578,
-0.00271479,-0.00422385,-0.00120929,0.00248099,0.00107542,
-0.0013427,-0.00223839,0.00306751,-0.00309787,-0.00376061,0.}

We can check that n is in the nullspace of A^T\ by computing A^T\ n\ :

{-6.77791*10^-14, 6.72509*10^-15, -3.50254*10^-14}.

We should also check that the fundamental has the same 3 components as the entire reflected spectrum. That is, we had

A^T\ S = \{70.5889,\ 61.249,\ 47.8412\}\ .

for A^T\ applied to the reflected spectrum. If we apply A^T\ to the fundamental, we get

A^T\ f = \{70.5889,\ 61.249,\ 47.8412\}.

The same. As they should be.

You know what? There’s another pair of calculations I can do, to make a point.

What are the components of the fundamental wrt the orthonormal basis v1? We just dot f into the v1 matrix, because its columns are orthonormal basis vectors:

f \cdot v1 =\
{-129.491,141.916,-79.7421,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}.

That confirms that no part of the fundamental lies in the nullspace. That was the whole point of the projection operator.

And what are the components of the residual? I hope you’ve got it: exactly the first three are zero:

n \cdot v1 =\
{0,0,0,-24.6623,-15.3786,17.8955,33.3365,29.6755,19.4414,
-0.412828,-18.9652,-15.4907,8.59491,37.8598,57.8821,63.3301,
60.1642,52.0835,59.2778,33.0245,39.625}.

The decomposition looks the same as in the previous post. The reflected spectrum is in yellow, the fundamental in white, and their difference n, the residual, is in black.

g1107j2

NOTE that although reflected and fundamental are non-negative, the residual is not. The good news is that, in principle, we could generate the fundamental spectrum, as well as the reflected. But I can’t imagine how to generate a spectrum with negative values.

We face this problem not just for the residual, but for just about every basis vector we’ve computed!

(The middle vector of the orthonormal basis, i.e. the middle column of v1, is the only exception so far; it is everywhere non-negative. But that’s only for the orthonormal basis for the RGB color matching functions; for the previous post, every column of v1 has some negative values.)

Because the basis vectors are, frankly, weird spectra, don’t expect anything nice if you try to find their RGB, XYZ, or xyz values. Most of them lie outside the CIE chromaticity chart! Incidentally, I have computed Cohen’s F1 and F2 bases for this case, and some of them are still negative, too. Well, they have to be: things have to cancel out so that the dot products among the orthonormal vectors are actually zero.

But while we’re looking at the linear algebra, let’s take a look at the orthonormal basis for the null space.

the null space basis

Now for something new.

We have 18 basis vectors (metameric black spectra) for the nullspace. It’s rather confusing to look at all of them… it’s not much better to look at a few of them close together. In the top left, we have the 1st 5th, 9th, and 13th (1,5,9,13)… the top right is 2,6,10,14… bottom left is 3,7,11,15… bottom right is 4,8,12,16:

g1107j3

There are two more to show (17 & 18) and I thought I show a few more adjacent pairs, too. I have added 1 & 2 and 9 & 10:

g1107k

While I’m playing around…. What about four spectra centered around 500? And add them up, to get another metameric black.

g1107m

Let me do one last thing.

Graphs at 5 nm

What would the bases look like if I used the full 5 nm tables? The fundamental depended on the reflected spctrum, but the bases depend only on my choice of color matching functions. So, let me show them for the 1931 CIE rgb bar tables at 5nm intervals.

Here are the orthonormal basis vectors “v1″ — at 20 nm on the left, 5 nm on the right:

g1107n

And the dual basis vectors “E”:

g1107o

And the dual basis “E” juxtaposed with the orthonormal basis “v1″:

g1107p

I might as well do this for the XYZ color matching functions, too. The more detailed curves are from the 1931 xyz bar tables, at 5 nm intervals.

Here are the orthonormal basis vectors “v1″:

g1107p2

The dual basis vectors “E”:

g1107q

And the dual basis “E” juxtaposed with the orthonormal basis “v1″:

g1107r

Color: decomposing the A transpose matrix and the reflected spectrum

This is the second post about the example on p. 160 of W&S. Here’s the first. (Incidentally, I am going to decompose the reflected spectrum into its fundamental and its residual. I do not think that they did this. Oh, this is what I did for Cohen’s toy example, but now it’s for real. Approximate, but for real.)

review the XYZ for this

Let’s review one case from the previous post: 1931 CIE at 20 nm intervals. As is my custom, I took a subset of their values — I did not average the numbers to account for the data I omitted. If this approximation isn’t good enough, simply go to 5 nm or 1 nm intervals.

Our ingredients were: a reflectance spectrum, an illiminant spectrum, and some choice of color matching functions.

Our results were a few sets of 3 numbers. The final result was called xyz — meaning the triple (x, y, z) — for plotting a point on the CIE chromaticity chart.

That result came from a triple denoted XYZ, which are values in XYZ color-space. That space has the advantage of being device-independent. We could — and will shortly — convert those values to an RGB color space, specifically the one defined by the CIE 1931 rgb bar (i.e. rbar, gbar, bbar) color-matching tables.

That XYZ result in turn came from a triple for which I have no convenient notation, but I keep saying without demonstration that that triple is the components of the fundamental of the reflected spectrum wrt the basis dual to the chosen color-matching functions. (Try saying that with a mouthfull of rocks!)

Let us go see that.

But first, let me review the calculations. Feel free to skip to the next section.

Our first computation is to get the reflected spectrum, which is the pointwise product of the illuminant spectrum and the reflectance spectrum. For this problem, the reflected spectrum is the perceived spectrum, the input to the eye of “the standard observer”.

Here is what the three of those spectra looked like:

g1102a

For this post, I’ve decided to avoid red, green, and blue, except where it is really appropriate. The illuminant is shown in white, the reflectance spectrum is shown in purple, and the reflected spectrum is shown in yellow.

For this example, I had chosen the 1931 xyzbar tables for our color functions, and they looked like this:

g1102b

Once again, I’ve changed the colors. X is orange, Y is white, and Z is yellow. It might help if you remember that the Y column of this A is nothing like green, being instead the “photopic response” — the total energy received by the cones — of the standard observer; white seems a sensible color for it.

BTW, we can tell that we are using XYZ (i.e. xbar, ybar, zbar tables) because these functions are never negative; we recall that each of the rbar, gbar, bbar color matching functions has some negative values.

Our second computation (or third, it doesn’t matter) should be to apply the matrix A^T\ — the (transpose of) the color matching functions A — to the reflected spectrum S, getting:

A^T\ S = \{356.817,\ 354.641,\ 271.109\}

I claim that these numbers are components of the fundamental of the reflected spectrum wrt the basis dual to A^T\ . In Cohen’s notation, these 3 numbers are the components wrt the columns of the E matrix.

Please be patient. We’ll get there real soon now.

The other computation, whether we do it second or third, is to normalize things using the illuminant spectrum; that is, to set the white point (1,1,1) in our XYZ space.

We did this easily because the second column of A — the second row of A^T\ — is the total cone response — the photopic response — of the standard observer.

Here’s the second row of A^T\ applied to D65: 529.764

Now we move on to the fourth computation: we divide our 3 values by that to get X,Y,Z.

XYZ = {0.673539, 0.669432, 0.511754}

NOTE that the XYZ values are in [0, 1]. These are tristimulus values in a sort-of color space. (Let’s see. It is not HSI, even though that middle number is an intensity. In fact, it does not directly specify a color; that’s done by a change-of-basis. Maybe I should call it a pre-color space? Maybe I’m getting carried away here.)

And then our fifth computation is to get x,y,z. Here’s the sum X+Y+Z: 1.85472

and we divide XYZ by it to get xyz values: xyz = {0.363148, 0.360933, 0.275919}.

Where was that on the chromaticity chart? The yellow dot.

g1102c

All that was done in the previous post. Now let’s move on.

Moving on.

Get the RGB

Let’s do one last thing before putting this down. I want to get RGB values from these XYZ. I have in my hands, and have shown you before, the transition matrix T31 (having once needed the 1964 CIE, I have renamed some things)…

T31 = \left(\begin{array}{ccc} 2.76888 & 1.75175 & 1.13016 \\ 1 & 4.5907 & 0.0601 \\ 0 & 0.05651 & 5.59427\end{array}\right)

It treats XYZ as old, RGB as new, so we have

XYZ = T31 RGB,

so

RGB = T31^{-1}\ XYZ = \{0.133246,\ 0.115616,\ 0.0903103\}\ .

We’ll come back to these in the next post. These should be RGB color-space values using the 3 sources shown on the CIE chromaticity diagram; alternatively, these should be the RGB values computed directly from the rbar, gbar, bbar tables. That’s what I’ll confirm in the next color post.

Moving on, finally, there are two things we need to do. Actually, we might get away with only one of them, and that’s the dual basis, so let’s go get it first.

getting the dual basis E.

As I have said before, given a basis with transition matrix A we can find a basis with transition matrix E dual to it by requiring that the dot products of the vectors satisfy

E^T A = I\ .

That is, by laying out the E vectors as rows and the A vectors as columns, we compute all of the possible dot products as E^T A\ .

A digression on dual bases

I’ll remind you that we are working in finite dimensional vector spaces, and the dual basis and reciprocal basis look exactly the same. (So if you need to review, look at my posts on reciprocal bases. See the tag cloud.) But I think it’s worthwhile to think of reciprocal bases as living in the same vector space, while dual bases live in a vector space V and its dual V*.

Oh, the dual space V* is the vector space of all linear functionals whose domain is V, i.e. all real-valued linear operators from V to R.

I’ll also remind you that although one usually starts with a basis in V and finds “the dual basis” in V*, we are going backwards. The rows of A^T\ are a basis for V*, and we wish to find a matrix E whose columns are a basis for V, and more, a dual basis to the rows of A^T\ .

What a tangled knot! In my head I was going backwards, but I computed it the other way! What I really computed was a basis E^T dual to A — but that is the same as finding a basis E dual to A^T\ . The two equations

E^T A = I

and

A^T E = I

are equivalent. I chose to solve the first instead of the second, even though I was thinking of the second. (Aside: it’s a very good thing that the dual of the dual space is the original space, for finite dimensional vector spaces.)

Finally, I suppose I should say again that although we often compute the dot product of two vectors (u and v, say) by writing one as a row (say u) and the other (v) as a column, we may view that matrix product as the effect of applying the linear functional u to the vector v. And what I did was the equivalent of talking about u’v but computing the equivalent, v’u.

Why might we view it that way? Among other things, we can apply u to v without even having a dot product!

Let’s return to finding the dual basis E.
end digression

If A is square and of full rank — i.e. we really do have a basis, hence a linearly independent set of vectors — then we can invert A and write

E^T A = I

E^T = A^{-1}\ .

In this case, however — with A our color matching functions –as we first saw in the first Cohen post, A is not square and therefore cannot possibly be invertible. The 3×3 matrix A’A (A^T A\ ), however, is invertible, and we construct the pseudo-inverse

(A'A)^{-1} A^T

and substitute it for A^{-1}\ , getting

E^T = A (A'A)^{-1}\ .

(We remember that A’A is symmetric, i.e. its own transpose.)

BTW, it is very convenient to write the equation that way, because to check that we have a dual basis we want to compute E^T A\ . We might as well have found E^T first.

When we check it, we see that E^T A = I\ .

Let me call the three dual basis vectors E1, E2, E3, and let’s look at them. Here they are in columns…

E = \left(\begin{array}{ccc} 0.000358806 & -0.000360371 & 0.000884809 \\ 0.00330737 & -0.00334373 & 0.00929769 \\ 0.0302172 & -0.0308581 & 0.0885709 \\ 0.0643748 & -0.0678399 & 0.241909 \\ 0.0116003 & -0.0190679 & 0.237446 \\ -0.0843071 & 0.0839105 & 0.126398 \\ -0.190772 & 0.216831 & 0.060478 \\ -0.329572 & 0.420346 & 0.0457603 \\ -0.286455 & 0.453811 & 0.0276004 \\ -0.0869933 & 0.321465 & -0.00307957 \\ 0.209686 & 0.0758441 & -0.0425853 \\ 0.438522 & -0.150742 & -0.0706912 \\ 0.418257 & -0.20126 & -0.0640185 \\ 0.232068 & -0.121142 & -0.0349616 \\ 0.0872125 & -0.0467666 & -0.0130635 \\ 0.0249132 & -0.0134701 & -0.00372503 \\ 0.00608984 & -0.00330711 & -0.000909677 \\ 0.00157141 & -0.000868448 & -0.000233816 \\ 0.000348975 & -0.000172577 & -0.0000531557 \\ 0.0000923158 & -0.0000402793 & -0.0000143874 \\ 0. & 0. & 0.\end{array}\right)

… and here is what they look like (yellow, violet, orange in order):

g1102d

If we multiply each of those spectra by our original numbers {356.817,354.641,271.109}… and add the results.. we get

Vdb = {0.240105,2.51498,23.8508,64.4948,61.7507,33.9437,25.2229,43.8813,
66.2108,82.129,90.1716,83.8477,60.51,30.3654,10.9919,3.10251,0.753495,
0.189327,0.0489062,0.0147546,0.}

which looks like this (you can see that the orange one shrank a little compared to the other two):

g1102e

where I have shown the scaled basis vectors (i.e. the scalar component times each vector), and added their sum in white.

Let’s look quickly at the reflected spectrum (in yellow) and that sum (in white):

g1102f

They are quite different. What’s going on?

What does the dual basis describe?

Well, look at the long wavelengths of the white curve. We — i.e. the standard observer — can’t see that part of the reflected spectrum at all. Parts of the reflected spectrum are invisible to us.

Duh. Been there, done that. “Metameric black”.

What we have — a basis of 3 vectors — is a basis for the visible-to-us part of the reflected spectrum — i.e. a basis for what we called the fundamental. We have in fact found that part of the reflected spectrum which the standard observer actually sees; and by subtraction we could find what part of it is– let’s call it the residual — what we have called a metameric black. (“Metameric” because it is a nonzero spectrum; it is perceptually black, but it is not physically the absence of light.)

We have the fundamental; it’s that white curve. But I haven’t really proved that.

It’s not hard. By definition, each of the 3 vectors in the dual basis is in the pre-image of A^T\ . And, since A^T\ has 3 rows, it is of rank at most 3, so its pre-image is of dimension at most 3. In fact, we know that this A^T\ is of full rank, hence of rank 3, and therefore its pre-image is of dimension 3.

If the 3 dual basis vectors are linearly independent (i.e. the E matrix is of full rank 3), they span a 3D space which is a subspace of the 3D pre-image — so they span the entire pre-image.

And I’m certain that A^T\ being of full rank guarantees that the dual basis is of full rank.

But exactly how do I prove that? Oh, the pre-image of A^T\ is the range of A, hence of dimension 3.

Do you find it convincing?

Let’s look at it another way. But remember that we really did just get the fundamental f of the reflected spectrum S, and we could compute the residual n by subtraction; the residual is the reflected spectrum minus the fundamental.

n = S – f.

An orthonormal basis for fundamental spectra.

For the other way, we need a really nice basis for the entire domain of the A^T\ matrix, one which decomposes its domain into nullspace and not-nullspace (i.e. nullspace and preimage of the range). But we know how to do that.

The SVD (Singular Value Decomposition).

We will get an orthonormal basis v1 for the pre-image of A^T\ , then a projection operator R onto that pre-image, then get the fundamental by applying the projection operator to the reflected spectrum.

First, let

A^T\ = u\ w\ v^T\ .

(The first time I did this, I let B = A^T\ . Now we’re more experienced, but if you want to use B instead of A^T\ , feel free.)

Now, v is 21×21. Then the leftmost 3 (= rank of A^T\ ) columns of v are an orthonormal basis for the preimage of A^T\ , and the rightmost 18 (= 21-3) are an orthonormal basis for the nullspace of A^T\ .

Here are the leftmost 3 columns of v:

v1 = \left(\begin{array}{ccc} -0.00201177 & 0.00143737 & 0.000374614 \\ -0.0210175 & 0.0149756 & 0.00335666 \\ -0.199597 & 0.142652 & 0.0305073 \\ -0.539263 & 0.385948 & 0.0611556 \\ -0.513526 & 0.367224 & -0.00344907 \\ -0.258102 & 0.159433 & -0.11971 \\ -0.121682 & -0.0133979 & -0.265642 \\ -0.134798 & -0.177324 & -0.481195 \\ -0.193185 & -0.30181 & -0.471062 \\ -0.246728 & -0.377774 & -0.254046 \\ -0.283178 & -0.410206 & 0.0979204 \\ -0.273373 & -0.377298 & 0.391838 \\ -0.201477 & -0.270573 & 0.408489 \\ -0.102016 & -0.135415 & 0.232377 \\ -0.0370629 & -0.0489598 & 0.0880787 \\ -0.0104736 & -0.0138137 & 0.0252276 \\ -0.00254533 & -0.00335417 & 0.00617542 \\ -0.000641283 & -0.000842035 & 0.00160261 \\ -0.000163267 & -0.00021855 & 0.000343642 \\ -0.0000487128 & -0.0000661713 & 0.0000876566 \\ 0. & 0. & 0.\end{array}\right)

We should look at those. First alone:

g1102g

… and then juxtaposed with the dual basis:
g1102g2

OK, the orthonormal basis is different from the dual basis. (Actually, one pair look similar. I’ll take a look in a moment.) These vectors (spectra) are orthonormal, the dual basis is not. (To check that, just compute v1^T v1\ ; it’s a 3×3 identity matrix.)

Let me look at the two basis vectors that were similar. From the dual basis, the first vector; from the orthonormal basis, the 3rd vector.

g1102h

Interesting. I do not know that this result has any significance, but what’s the point of drawing pictures and not looking at them?

Let’s continue.

The projection operator R.

Having an orthonormal basis for the preimage of A^T, we know that we can construct a projection operator R from the domain of A^T\ onto the preimage; we just take the product of v1 and v1^T in the other order:

R = v1\ v1^T

It’s 21×21, so I’m not going to show it to you. But, for checking, I’ll show you the first column:

g1102 R

And I’ll tell you that the last column is identically 0.

The fundamental, again? and more.

So. Apply the projection operator R to the reflected spectrum S to get the fundamental f…

R S = f = {0.240105,2.51498,23.8508,64.4948,61.7507,33.9437,25.2229,43.8813,
66.2108,82.129,90.1716,83.8477,60.51,30.3654,10.9919,3.10251,0.753495,
0.189327,0.0489062,0.0147546,0.}

and subtract to get the residual n = S – f:

S – f = n = {4.85989,26.2994,19.1132,-14.6673,-7.32707,18.6749,27.0703,15.2259,
3.00644,-13.029,-23.7822,-12.7477,13.5965,41.0307,59.1247,63.6874,
60.2497,52.1091,59.2801,33.022,39.625}

We can check that n is in the nullspace of A^T\ by computing A^T\ n\ :

A^T\ n = \{-1.01618*10^-13,\ -1.09023*10^-13,\ 4.93919*10^-15\}

We should also check that the fundamental has the same 3 values as the entire reflected spectrum. That is, we had

A^T\ S = \{356.817,\ 354.641,\ 271.109\}

for A^T\ applied to the reflected spectrum. If we now apply A^T\ to the fundamental, we get

A^T\ f = \{356.817,\ 354.641,\ 271.109\}

The same. Of course. (I hope you think so.)

Let’s look at the decomposition. The reflected spectrum is in yellow, the fundamental in white, and their difference n, the residual, is in black.

g1102i

Yes, where the fundamental is zero, the reflected spectrum and the residual coincide.

At the risk of annoying you, let me emphasize: the standard observer (at 20 nm intervals!) would perceive the yellow spectrum and the white spectrum as being exactly the same color. We would say they are metameric spectra. He is actually perceiving the yellow spectrum, but if we somehow generated the white one, it would look the same to him.

And now we should compare the fundamental with the vector Vdb constructed from the dual basis. We could look at the numbers, of course, but an easy way to see that they coincide is to plot only points for Vdb and only a piecewise-linear curve for the fundamental.

g1102j

They are the same.

Thus we confirm that the 3 values A^T S are the components of the fundamental (not of the entire reflected spectrum) wrt the basis E dual to A.

The dual basis tells us what those first 3 numbers mean, and would let us construct the fundamental, and thence the residual.

The orthonormal basis, on the other hand, let us compute a projection operator, which gets us the fundamental by applying the projection operator to the reflected spectrum.

If i could only do one of these, it would be to construct the dual basis rather than the projection operator… but computing power is cheap and there’s no reason to do only one of these.

Incidentally, we could have simply appled the Gram-Schmidt orthogonalization procedure to the dual basis… but I really like having a basis for the null space, and that’s what the rest of the v matrix from the SVD got me.

I’m going to wait, however, before looking at the basis vectors for the null space.

Linear Algebra: the four fundamental subspaces

Edit 18 Sep 2009. Purely cosmetic. I have highlighted in blue the result describing the spans of the parts of the u and v matrices. I have found that I still need to refer to it rather than reconstruct it when I want it. In other words, it’s what I want to pick out quickly from this post.

Given a linear operator A, we are told that there are four fundamental subspaces associated with it. They are

  • the nullspace of A
  • the column space of A
  • the nullspace of A^T\ (i.e. of the transpose of A)
  • the column space of A^T\ .

Okay, they are easy enough to list.

Next, we need to realize that the column space of A is the range of A, and the column space of A^T\ is the range of A^T\ . That’s what we really care about — the ranges.

But why is A^T\ involved? In particular, why do we care about the nullspace and the range of A^T\ ?

Let me show you what they “really” are. (That is, this is how I understand them.)

I’m going to use an exercise from Strang’s “Linear Algebra and Its Applications”; see the bibliography.

This is exercise 2.4.2.. “Find the dimension and construct a basis for the four subspaces associated with…”

A = \left(\begin{array}{cccc} 0 & 1 & 4 & 0 \\ 0 & 2 & 8 & 0\end{array}\right)

(Of course, since the second row is twice the first, we already know that the matrix is of rank 1.)

Now, Strang presents the four fundamental subspaces by using the LDU (LU) decomposition; I am going to use — what else? — the Singular Value Decomposition (SVD).

Here is the Mathematica command…

Picture 42

We know all about the SVD. In particular, we know that the SVD decomposes the matrix A as

A = u\ w\ v^T\ ,

where u and v are orthogonal matrices; and w is the same shape as A and, loosely speaking, as diagonal as it can be.

Here is the w matrix…

w = \left(\begin{array}{cccc} \sqrt{85} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0\end{array}\right)

That we have only one nonzero value tells us, if we didn’t already know it, that the matrix A is of rank 1. ( This is a representation of the A matrix with respect to different bases, specified by u and v. As such, it is the same shape and of the same rank. Since w is of rank 1, so is A.)

Here is the v matrix…

v = \left(\begin{array}{cccc} 0 & 0 & 0 & 1 \\ \frac{1}{\sqrt{17}} & 0 & -\frac{4}{\sqrt{17}} & 0 \\ \frac{4}{\sqrt{17}} & 0 & \frac{1}{\sqrt{17}} & 0 \\ 0 & 1 & 0 & 0\end{array}\right)

The columns of v are an orthonormal basis for the domain of A. Let’s apply A to it, that is to each of these basis vectors.

A\ v = \left(\begin{array}{cccc} \sqrt{17} & 0 & 0 & 0 \\ 2 \sqrt{17} & 0 & 0 & 0\end{array}\right)

Well! The three rightmost columns of v were mapped to zero by A; we conclude that they are a basis for the nullspace of A.

What about the leftmost (remaining) column of v? Its image, by definition, is in the range of A. So the first column of v is a basis for the pre-image of the range of A.

(Once we know what A does to a basis for its domain, we know what it does to any vector in its domain.)

That’s the pre-image of the range. What about the range of A? Well, let’s look at the matrix u:

u = \left(\begin{array}{cc} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}}\end{array}\right)

We see that the first column of u is proportional to the image of the first column of v. They span the same space — which is the range of A. Look at the algebra instead of the numbers:

A\ v = u\ w\ v^T\ v = u\ w\ ,

and the almost-diagonal form of w says that the columns of u w (equivalently, the columns of A v) are multiples of the columns of u. OK, some of them are zero multiples, but the real point is that the nonzero columns of A v are multiples of corresponding columns of u.

What about the other column of u? The matrix A maps from R^4 to R^2; we say that its codomain is R^2. And its codomain is split into two subspaces, each 1-dimensional:

  • the range of A
  • the part of the codomain that is not the range (“the part we cannot reach with A”).

The rightmost column of u is a basis for the rest of the codomain.

Okay…. We seem to have four fundamental subspaces, and we have orthonormal bases for them, which means we have orthogonal direct sums:

  • domain of A = nullspace of A \oplus\ preimage of the range of A
  • codomain of A = range of A \oplus\ the part we cannot reach with A.

The columns of v split naturally into bases for the nullspace and for the pre-image of the range of A; the columns of u split naturally into bases for the range of A and for the part we cannot reach with A. That is,

  • nullspace of A is spanned by rightmost columns of v
  • pre-image of range of A is spanned by leftmost columns of v
  • range of A is spanned by leftmost columns of u
  • the part we cannot reach is spanned by rightmost columns of u

But what about the range and nullspace of A^T\ ? Now would be a good time to guess that

  • the preimage of the range of A is the range of A^T\
  • the part we cannot reach with A is the nullspace of A^T\ ,

and that’s why A^T\ shows up: it provides more convenient (and in a sense, symmetric) names for two of the four spaces.

Maybe we should write the SVD of A^T\ ? From

A = u\ w\ v^T\ ,

we get

A^T = v\ w^T\ u^T\ .

The roles of u and v have been swapped. (The point is that there are no new bases involved, still just the columns of u and v.)

Then our description of the spaces and bases of A translates to the following description of the spaces and bases of A^T\ :

  • nullspace of A^T\ is spanned by rightmost columns of u
  • pre-image of range of A^T\ is spanned by leftmost columns of u
  • range of A^T\ is spanned by leftmost columns of v
  • the part we cannot reach with A^T\ is spanned by rightmost columns of v.

Our guess was right. The bases tell us that we have two names for the spaces spanned by each of the 4 partial bases.

  • the rightmost columns of v span the nullspace of A, and equivalently the part we cannot reach with A^T\
  • the leftmost columns of v span the pre-image of the range of A, and equivalently the range of A^T\
  • the rightmost columns of u span the nullspace of A^T\ , and equivalently the part we cannot reach with A
  • the leftmost columns of u span the pre-image of the range of A^T\ , and equivalently the range of A.

Therefore, instead of naming the orthogonal direct sums as

  • domain of A = nullspace of A \oplus\ preimage of the range of A
  • codomain of A = range of A \oplus\ the part we cannot reach with A

we call them

  • domain of A = nullspace of A \oplus\ range of A^T\
  • codomain of A = range of A \oplus\ the nullspace of A^T\

(If you think about it, those are the only two combinations of “range” and “nullspace” that make sense, i.e. that are subspaces of the domain and codomain respectively.)

BTW, we should actually solve the exercise (!) and write down the bases.

Recall v.

v = \left(\begin{array}{cccc} 0 & 0 & 0 & 1 \\ \frac{1}{\sqrt{17}} & 0 & -\frac{4}{\sqrt{17}} & 0 \\ \frac{4}{\sqrt{17}} & 0 & \frac{1}{\sqrt{17}} & 0 \\ 0 & 1 & 0 & 0\end{array}\right)

For the nullspace of A, the 3 rightmost columns of v:

v2 = \left(\begin{array}{ccc} 0 & 0 & 1 \\ 0 & -\frac{4}{\sqrt{17}} & 0 \\ 0 & \frac{1}{\sqrt{17}} & 0 \\ 1 & 0 & 0\end{array}\right)

Strang’s basis is equivalent:

\left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{array}\right)

Check mine by applying A to each column…

A\ v2 = \left(\begin{array}{ccc} 0 & 0 & 0 \\ 0 & 0 & 0\end{array}\right)

For the nullspace of A^T\ , recall u:

u = \left(\begin{array}{cc} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}}\end{array}\right)

then the rightmost columns of u are:

u2 = \left(\begin{array}{c} -\frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}}\end{array}\right)

Strang’s basis is equivalent: \{-2,1\}\ .

Check mine by applying A^T\ to it:

A^T\ u2 = \left(\begin{array}{c} 0 \\ 0 \\ 0 \\ 0\end{array}\right)

For the range of A, the first column of u:

u1 = \left(\begin{array}{c} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}}\end{array}\right)

Strang’s basis is equivalent: \{1,2\}\ .

For the range of A^T\ , the leftmost column of v:

v1 = \left(\begin{array}{c} 0 \\ \frac{1}{\sqrt{17}} \\ \frac{4}{\sqrt{17}} \\ 0\end{array}\right)

Again, his basis is equivalent: \{0,1,4,0\}\ .

I should point out that Strang’s bases for the domain and codomain (found from the LDU decomposition) are orthogonal but not orthonormal. Mine are orthonormal.

I should also point out that I really, really like orthonormal bases. Sometimes there’s a good reason not to use them, but barring such a reason – orthonormal. And we will be using this material for light spectra.

So, as I see it, we have two principles here.

First, the 4 fundamental subspaces “really” are

  • an orthogonal decomposition of the domain of A:
    • nullspace of A
    • preimage of the range of A
  • an orthogonal decomposition of the codomain of A:
    • the range of A
    • the subspace we cannot reach with A

Second, we can describe two of those subspaces in terms of A^T\ ,

  • the range of A^T\ = preimage of the range of A
  • the nullspace of A^T\ = the subspace we cannot reach with A.

Thus, more compactly, we have the 4 fundamental subspaces as

  • an orthogonal decomposition of the domain of A:
    • nullspace of A
    • range of A^T\
  • an orthogonal decomposition of the codomain of A:
    • the range of A
    • nullspace of A^T\

As Paul Harvey used to say, “And now you know the rest of the story.”