Example: Is it a transition matrix? Part 2

We had three matrices from Jolliffe, P, V, and Q. They were allegedly a set of principal components P, a varimax rotation V of P, and a quartimin “oblique rotation” Q.

I’ll remind you that when they say “oblique rotation” they mean a general change-of-basis. A rotation preserves an orthonormal basis; a rotation cannot transform an orthonormal basis to a non-orthonormal basis, and that’s what they mean — a transformation from an orthonormal basis to a non-orthonormal basis, or possibly a transformation from a merely orthogonal basis to a non-orthogonal one. In either case, the transformation cannot be a rotation.

(It isn’t that complicated! If you change the lengths of basis vectors, it isn’t a rotation; if you change the angles between the basis vectors, it isn’t a rotation.)

Anyway, we showed in Part 1 that V and Q spanned the same 4D subspace of R^{10}\ .

Now, what about V and P? Let me recall them:

P = \left(\begin{array}{cccc} 0.34 & -0.39 & 0.09 & -0.08 \\ 0.34 & -0.37 & -0.08 & -0.23 \\ 0.35 & -0.1 & 0.05 & 0.03 \\ 0.3 & -0.24 & -0.2 & 0.63 \\ 0.34 & -0.32 & 0.19 & -0.28 \\ 0.27 & 0.24 & -0.57 & 0.3 \\ 0.32 & 0.27 & -0.27 & -0.34 \\ 0.3 & 0.51 & 0.19 & 0.27 \\ 0.23 & 0.22 & 0.69 & 0.43 \\ 0.36 & 0.33 & -0.03 & 0.02\end{array}\right)

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

As before, I suppose there is a transition matrix T relating them; each column of V^T = V' can be computed by applying a transition matrix T to the corresponding column of P^T\ :

V’ = T P’

i.e.

V = P T’.

Then I imagine briefly that I can “solve” for T’…

T' = P^{-1}\ V\ ,

but then quickly replace the non-existent inverse of Q by the appropriate pseudo-inverse

T' = (P'P)^{-1}\ P'\ V\ .

I compute T’, and then display its transpose T:

T = \left(\begin{array}{cccc} 0.905145 & -0.386677 & 0.0882059 & -0.141151 \\ 0.853806 & 0.568131 & -0.0875601 & -0.220572 \\ 0.64498 & -0.155196 & -0.506397 & 0.490735 \\ 0.507437 & 0.131036 & 0.67664 & 0.347268\end{array}\right)

and then I test whether

V = P T’ (?).

The largest difference (element by element) between the matrices V and P T’ is 0.170627, which is rather larger (by a factor of 23) than the differences we saw in Part 1. Here’s the full blown matrix — where I chose to round off first, then subtract (i.e. what we’d compute if we were subtracting published, already rounded-off, matrices):

V - P\ T' = \left(\begin{array}{cccc} 0. & 0.01 & -0.02 & -0.01 \\ 0.01 & 0.01 & -0.02 & -0.02 \\ -0.01 & -0.01 & 0.01 & 0.01 \\ 0. & 0 & 0. & 0. \\ 0. & 0.01 & -0.02 & -0.01 \\ -0.01 & -0.01 & 0.03 & 0.02 \\ -0.01 & -0.02 & 0.05 & 0.03 \\ 0.05 & 0.07 & -0.17 & -0.12 \\ -0.02 & -0.04 & 0.09 & 0.07 \\ -0.02 & -0.03 & 0.07 & 0.05\end{array}\right)

I am inclined to think there is a probem. But we don’t know much at all about it.

Let’s try one of our alternatives. We construct a projection operator onto the subspace spanned by P.

Get the SVD (Singular Value Decomposition) of P…

P = u\ w\ v^T\ .

Then, since u is 10×10 and V is of rank 4, the leftmost 4 columns (u1) of u….

u1 = \left(\begin{array}{cccc} 0.132041 & 0.280276 & -0.423196 & 0.0366383 \\ 0.24536 & 0.379423 & -0.288143 & -0.0619886 \\ -0.0781059 & 0.252968 & -0.245411 & -0.0642461 \\ -0.310642 & 0.402272 & -0.122076 & 0.641009 \\ 0.19934 & 0.202925 & -0.456543 & -0.212204 \\ -0.275595 & 0.504975 & 0.417433 & 0.130869 \\ 0.0279239 & 0.347505 & 0.199748 & -0.545498 \\ -0.547166 & 0.00466106 & -0.00216808 & -0.298924 \\ -0.571058 & -0.293548 & -0.488675 & -0.061545 \\ -0.278923 & 0.220314 & 0.0368869 & -0.356268\end{array}\right)

are an orthonormal basis for the range of P (i.e. the column space of P). We construct a projection operator, as we have before, by

R = u1\ u1^T\ .

We check it first by confirming that R R = R, i.e. it is idempotent, hence a projection operator. We check it further by applying it to the column vectors in P: we expect that R P = P; i.e. it actually projects onto the column space of P (the space spanned by the columns of P).

Now, having checked that R is a projection operator onto P, we apply it to V, and compute V – R V:

V - R\ V = \left(\begin{array}{cccc} 0.00221622 & 0.0115115 & -0.0249851 & -0.0145408 \\ 0.0137721 & 0.012178 & -0.0243583 & -0.0200425 \\ -0.00564412 & -0.011024 & 0.00933521 & 0.0112507 \\ 0.00222059 & 0.00165751 & -0.00118369 & -0.00423324 \\ 0.00223276 & 0.00638406 & -0.0153346 & -0.0119237 \\ -0.00896396 & -0.0106169 & 0.0272358 & 0.0230479 \\ -0.00941945 & -0.0152489 & 0.0456323 & 0.0330044 \\ 0.0470134 & 0.0703021 & -0.170627 & -0.121383 \\ -0.0232815 & -0.036102 & 0.0941953 & 0.068255 \\ -0.0227796 & -0.0330688 & 0.0740154 & 0.0474348\end{array}\right)

Let me round that to the nearest .01…

V - R V = \left(\begin{array}{cccc} 0 & 0.01 & -0.02 & -0.01 \\ 0.01 & 0.01 & -0.02 & -0.02 \\ -0.01 & -0.01 & 0.01 & 0.01 \\ 0 & 0 & 0 & 0 \\ 0 & 0.01 & -0.02 & -0.01 \\ -0.01 & -0.01 & 0.03 & 0.02 \\ -0.01 & -0.02 & 0.05 & 0.03 \\ 0.05 & 0.07 & -0.17 & -0.12 \\ -0.02 & -0.04 & 0.09 & 0.07 \\ -0.02 & -0.03 & 0.07 & 0.05\end{array}\right)

I should put that array of differences into perspective. Recall V:

V = \left(\begin{array}{cccc} 0.48 & 0.09 & 0.17 & 0.14 \\ 0.49 & 0.15 & 0.18 & -0.03 \\ 0.35 & 0.22 & 0.24 & 0.22 \\ 0.26 & 0. & 0.64 & 0.2 \\ 0.49 & 0.16 & 0.02 & 0.15 \\ 0.05 & 0.34 & 0.6 & -0.09 \\ 0.2 & 0.51 & 0.18 & -0.07 \\ 0.1 & 0.54 & -0.02 & 0.32 \\ 0.1 & 0.13 & 0.07 & 0.83 \\ 0.17 & 0.46 & 0.28 & 0.26\end{array}\right)

This reinforces my belief that we have a problem. Some of the differences we found are larger than some of the values of V itself. I can see that V is not in the column space of P, and while it’s not extremely far away, it’s not close either. (This seems more definitive than “the hypothetical transition matrix T doesn’t quite work”.)

As before, however, we are tip-toeing around the real question: Is every column vector in V a linear combination of the columns of P?

Easy enough. Do a regression. Actually 4 of them, one for each column of V.

Let me name the individual columns of the P matrix, and let me set the first column of V (v1) as “y”, and the columns of V as independent variables.

(Some smaller pictures follow, in which you can see that the final option is “IncludeConstantBasis -> False”.) The results are:

Not bad at all. In fact, rather close to perfect.

What was the first row of T?

{0.905145, -0.386677, 0.0882059, -0.141151}.

Yes, those are the \beta s (“Estimates”) for that regression; first row of T is first column of T’ is regression coefficients for the first column of V as the dependent variable.

Let me emphasize another relationship. For the first regression, yhat should be the projection of v1 (the 1st column of V) onto the column space of P. But we have that projection directly as R v1:

R v1 = {0.477784,0.476228,0.355644,0.257779,0.487767,
0.058964,0.209419,0.0529866,0.123281,0.19278}

… and we have the yhat from the regression:

yhat = {0.477784,0.476228,0.355644,0.257779,0.487767,
0.058964,0.209419,0.0529866,0.123281,0.19278}

Good, they are the same. So if regression works better for you than the projection operator, then use regression. Personally, I prefer the projection operator because I can apply it to all of V at once. (And because sometimes I have a use for the orthonormal basis u1.)

Largely to display the R^2, let me run the other three regressions. (Oh, I also reduced the magnification in my Mathematica notebook, so the pictures are smaller and the right side of the regression command isn’t cut off, at least on my screen.)

Here’s v2 as the dependent variable:

Here’s v3 as the dependent variable:

Here’s v4 as the dependent variable:

We see that the R^2 is over 99% for v2, falls to 95% for v3, and is 97.6% for v4. Those last two are just not large enough, not for this question.

I’m certain that V and P do not span the same subspace, but that’s about all I know.

We could, however, have made the dual choice, projecting P onto V instead of V onto P. Are they the same?

No.

We already know — from the T matrix, from the projection operator, or from the regressions — that the P and V matrices do not represent the same data in two different coordinate systems. But there might be — and in this case, there is! — more information available by making the other choice.

Let’s get the projection onto V instead of onto P. We need the SVD of V…

V = u\ w\ v^T\ .

We need the 4 leftmost columns (u1) of u…

u1 = \left(\begin{array}{cccc} -0.292636 & 0.0434068 & -0.33927 & 0.269896 \\ -0.274013 & 0.218205 & -0.238496 & 0.365998 \\ -0.337357 & -0.00723082 & -0.119568 & 0.0745101 \\ -0.355459 & 0.224109 & -0.415783 & -0.490416 \\ -0.2739 & -0.0500253 & -0.225463 & 0.452829 \\ -0.308498 & 0.463936 & 0.232078 & -0.416388 \\ -0.287334 & 0.227908 & 0.427185 & 0.212631 \\ -0.303344 & -0.281983 & 0.503807 & 0.152613 \\ -0.329068 & -0.741253 & -0.129874 & -0.304713 \\ -0.382183 & -0.0396735 & 0.288053 & -0.0857624\end{array}\right)

We construct the projection operator…

R = u1\ u1^T

Now we apply it to P, and compute the difference P – R P:

P - R P = \left(\begin{array}{cccc} 0.00186923 & -0.00646007 & -0.00165699 & 0.0293708 \\ -0.00396322 & 0.00598219 & 0.00143136 & 0.0162366 \\ 0.0035053 & 0.00255499 & -0.00525352 & -0.0232869 \\ 0.000382829 & 0.00307153 & 0.00267228 & 0.124774 \\ 0.000923133 & -0.00219591 & 0.00142733 & -0.0228012 \\ 0.0000472374 & -0.00300758 & -0.00291228 & -0.00287352 \\ -0.00462961 & -0.00220958 & -0.0000400809 & -0.149176 \\ 0.00305495 & 0.00376593 & -0.000439512 & 0.29618 \\ -0.00367312 & -0.00153518 & -0.000838857 & -0.106747 \\ 0.00147883 & -0.000459218 & 0.00482354 & -0.141981\end{array}\right)

So what?

Round it off:

P - R P = \left(\begin{array}{cccc} 0 & -0.01 & 0 & 0.03 \\ 0 & 0.01 & 0 & 0.02 \\ 0 & 0 & -0.01 & -0.02 \\ 0 & 0 & 0 & 0.12 \\ 0 & 0 & 0 & -0.02 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -0.15 \\ 0 & 0 & 0 & 0.3 \\ 0 & 0 & 0 & -0.11 \\ 0 & 0 & 0 & -0.14\end{array}\right)

Ah ha! It’s time for Robin to squeak out, “Holy moly, Batman!”

The fourth column of P does not lie in the subspace spanned by the 4 columns of V, while the first three do. That is the problem. I don’t why or where it happened, but I know exactly what’s wrong: the 4th column of P is the only problem.

Somebody blew it somewhere. (OK, since I don’t know how a varimax rotation is computed, I have to entertain the possibility that V is not supposed to be P under a change of basis. But I can’t believe that, because V and Q have a common column space, and 3 of the 4 columns of P lie in that common column space. If that’s an accident, what does a deliberate act look like?)

Is the mistake in the orignal paper? Were P and V computed from the same correlation matrix but by independent algorithms? Or was V computed incorectly from P (and then Q computed from V)?

Or did someone copy P incorrectly? (P should be primary, but then how the heck does the 4th column of P fail to be in the column space of V? I would sooner have expected an error in V, given P.)

Or did they rotate the first 3 PCs, and then apply that transformation to the 4th column? (No. I’ve checked. You can too.)

Let me emphasize that we would get the same result from four regressions each with one column of P as the dependent variable and all the columns of V as the independent variables. The first three regressions would be extremely good fits, but the fourth would not be. We would, again, conclude that the fourth column of P was not exactly — or even to a fair approximation — a linear combination of the columns of V.

Summary: theory

Given two matrices, say P and V, which are alleged to be the same data in different coordinate systems, we may check that assertion very quickly (if both are of full rank):

Define

T' = (P'P)^{-1} P'\ V\ .

If we find it true that

V’ = T P’,

then T is a transition matrix between two bases, and P and V are indeed the same data in different coordinate systems. We’re done.

If, however, the equality fails, V' \ne T\ P'\ ,

then T is not a transition matrix, and P and V are not the same data in different coordinate systems. Unless we want more information, we’re done.

Finally, we have seen that, numerically, we may have T “close to” a transition matrix. (In Part 1, V and Q are very nearly the same data; in the color posts, the xyx bar and rgb bar tables are very nearly the same data.)

Summary: practice

Computing T tells us if something is wrong, but not much more. It’s quick, easy, and gives a simple answer: “good” or “no good”.

Computing both projection operators P onto V and V onto P (not just one of them) would likewise tell us whether or not something is wrong, and, in addition, the projection operators might isolate the problem.

We could use regression instead of computing explicit projection operators.

Summary: this example

We found that V and Q are the same data in two different coordinate systems, but that P is not. In addition, we actually know that the problem is limited to the 4th column of P: it does not lie in the space spanned by the columns of V (i.e. it does not lie in the column space of V, which is also the column space of Q).

Question:
Can someone tell me if these P, V and Q match the original paper by Yule et al.?

And there’s a question you might have.
How many examples did I have to search through to find such a good (because it was such a bad) example?

One. This was the very first case I checked, outside of the 1931 CIE tables.

That’s scary.

Color: re-doing Cohen’s example

Cohen’s example again

Let me now show you how I would do Cohen’s example. (His computations, pretty much, were the previous post.) I cannot over-emphasize that he deserves a lot of credit for getting the mathematics right, even if he didn’t name it correctly or do it beautifully.

I start with the A matrix:

A = \left(\begin{array}{ccc} 0.121934 & 0.0339986 & 0.219431 \\ 0.105204 & -0.0428189 & 0.139047 \\ 0.20452 & -0.36676 & -0.108477 \\ -0.449607 & 0.101795 & 0.292825 \\ -0.161109 & -0.0945546 & -0.0522694 \\ -0.554469 & 0.305203 & 0.229803 \\ -0.542936 & 0.493257 & 0.288467\end{array}\right)

This, as I have said, is a toy example of real color matching functions. Each column corresponds to one of the tri-stimulus values; each row represents a wavelength.

My summary of Cohen began with a different matrix — the E matrix — only because it was easier to enter into the computer. A is primary.

The next issue we have to cope with is that we really care about the transpose A' = A^T\ rather than about A. A’ will be applied to a spectrum, and will deliver three tri-stimulus values for it. I’m not convinced that any notational choice I make now will be without problems.

(As I said in the previous post, I use A’ for the transpose A^T\ whenever it is convenient.)

I am going to let

B = A^T\ .

B represents the linear operator I wish to study. Thus, B is

B = \left(\begin{array}{ccccccc} 0.121934 & 0.105204 & 0.20452 & -0.449607 &   -0.161109 & -0.554469 & -0.542936 \\ 0.0339986 & -0.0428189 & -0.36676 & 0.101795 &   -0.0945546 & 0.305203 & 0.493257 \\ 0.219431 & 0.139047 & -0.108477 & 0.292825 &   -0.0522694 & 0.229803 & 0.288467\end{array}\right)

It maps R^7\ to R^3\ . At most, it can be of rank 3; and then it would have a four dimensional null space.

What shall we do with it? All together now, class: find its singular value decomposition (SVD).

B = u\ w\ v'\ ,

where u and v are orthogonal (hence square) matrices, and w is as close to diagonal as it can be, given that it is the same shape as B.

The Mathematica command is

{u,w,v}=SingularValueDecomposition[B];

where the final semi-colon suppresses printing.

It is convenient to look first at the singular values themselves, the nonzero “diagonal” elements of the matrix w:

\{1.21434,0.354727,0.307628\}\ .

That all three are “significantly” larger than zero tells us that the matrix B (and therefore its transpose, A) is, indeed, of rank 3.

In case this is new to you, the w matrix is

w = \left(\begin{array}{ccccccc} 1.21434 & 0. & 0. & 0. & 0. & 0. & 0. \\ 0. & 0.354727 & 0. & 0. & 0. & 0. & 0. \\ 0. & 0. & 0.307628 & 0. & 0. & 0. & 0.\end{array}\right)

(And if this is new to you, check the tag cloud and the categories. There’s a lot examples of the Singular Value Decomposition. I could almost rename this blog “Applications of the SVD”.)

What about v?

v = \left(\begin{array}{ccccccc} 0.00783002 & 0.594295 & -0.455574 & 0.145535 &   0.183004 & 0.43387 & 0.443047 \\ -0.0405824 & 0.32624 & -0.416473 & -0.400691 &   0.350383 & -0.293626 & -0.590706 \\ -0.321854 & -0.360317 & -0.448269 & -0.580591 &   -0.296401 & -0.0331037 & 0.373633 \\ 0.416686 & -0.241837 & -0.606932 & 0.490082 &   -0.256011 & -0.298065 & -0.0704081 \\ 0.042639 & -0.520505 & -0.0951579 & 0.073689 &   0.821135 & 0.0465338 & 0.19064 \\ 0.551142 & -0.204115 & -0.0485665 & -0.317617 &   -0.104278 & 0.672485 & -0.297034 \\ 0.644592 & 0.198695 & 0.195648 & -0.367 & 0.0721767   & -0.425676 & 0.430864\end{array}\right)

It is 7×7. We know from the discussion of the four fundamental subspaces (look at the blue text) that the 4 rightmost columns of v are an orthonormal basis for the null space of B, and the 3 leftmost columns are an orthonormal basis for the pre-image of the range of B.

That is, writing a partitioned matrix [v] = [v1 v2], we have

v1 = \left(\begin{array}{ccc} 0.00783002 & 0.594295 & -0.455574 \\ -0.0405824 & 0.32624 & -0.416473 \\ -0.321854 & -0.360317 & -0.448269 \\ 0.416686 & -0.241837 & -0.606932 \\ 0.042639 & -0.520505 & -0.0951579 \\ 0.551142 & -0.204115 & -0.0485665 \\ 0.644592 & 0.198695 & 0.195648\end{array}\right)

v2 = \left(\begin{array}{cccc} 0.145535 & 0.183004 & 0.43387 & 0.443047 \\ -0.400691 & 0.350383 & -0.293626 & -0.590706 \\ -0.580591 & -0.296401 & -0.0331037 & 0.373633 \\ 0.490082 & -0.256011 & -0.298065 & -0.0704081 \\ 0.073689 & 0.821135 & 0.0465338 & 0.19064 \\ -0.317617 & -0.104278 & 0.672485 & -0.297034 \\ -0.367 & 0.0721767 & -0.425676 & 0.430864\end{array}\right)

We should confirm that B.v2 = 0, i.e. every column of v2 is in the null space of B:

B\ v2 = \left(\begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0\end{array}\right)

Once having a basis, v1, for the pre-image of the range, we know that we can construct a projection operator as v1 v1′:

P = v1\ v1' = \left(\begin{array}{ccccccc} 0.560795 & 0.383299 & -0.012435 & 0.136042 &   -0.265648 & -0.0948635 & 0.0339986 \\ 0.383299 & 0.281529 & 0.0822036 & 0.156964 &   -0.131909 & -0.0687305 & -0.0428189 \\ -0.012435 & 0.0822036 & 0.434363 & 0.225095 &   0.216479 & -0.0820701 & -0.36676 \\ 0.136042 & 0.156964 & 0.225095 & 0.600478 & 0.201399   & 0.308492 & 0.101795 \\ -0.265648 & -0.131909 & 0.216479 & 0.201399 &   0.281798 & 0.134365 & -0.0945546 \\ -0.0948635 & -0.0687305 & -0.0820701 & 0.308492 &   0.134365 & 0.34778 & 0.305203 \\ 0.0339986 & -0.0428189 & -0.36676 & 0.101795 &   -0.0945546 & 0.305203 & 0.493257\end{array}\right)

Because I didn’t call it R, I can recall Cohen’s R matrix, computed using a pseudo-inverse. I assure you that it is the same, we have P = R.

On the other side, since B is of full rank, its range is equal to its codomain, both being R^3\ , so the entire matrix u is an orthonormal basis for the range of B.

u = \left(\begin{array}{ccc} -0.756721 & 0.651151 & 0.0580932 \\ 0.530816 & 0.560136 & 0.635989 \\ 0.381585 & 0.512103 & -0.769509\end{array}\right)

Now the thing to do is, as before, split some given spectrum into its fundamental and its residual (metameric black). We only really have one good test, an equal energy spectrum.

We get the fundamental of our test spectrum…

e = \{1,1,1,1,1,1,1\}

…by applying our projection operator to it:

f = P\ e = \{0.741189,0.660536,0.496875,1.73027,0.34193,0.850176,0.43012\}

and that is exactly what we got before (come on, P = R; we have the same projection operator.)

Then we get the residual n as the difference…

n = e - f = \{0.258811,0.339464,0.503125,-0.730266,0.65807,0.149824,0.56988\}

and that, too, is the same as before.

As a refesher, let’s compute the components of the fundamental wrt the v1 basis. Since v1 is orthonormal, we may find the components of the fundamental by taking dot products, the fundamental dotted into each basis vector in v1. But that’s just

f\ v1 = \{1.30045,-0.207545,-1.87532\}

which says that f can be written as the sum

\{0.741,0.66,0.497,1.73,0.341,0.85,0.43\}

of the following three multiples of the basis vectors– I’m going to round off for clarity –

\{0.01,-0.053,-0.419,0.542,0.055,0.717,0.838\}

\{-0.123,-0.068,0.075,0.05,0.108,0.042,-0.041\}

\{0.854,0.781,0.841,1.138,0.178,0.091,-0.367\}.

The sum does indeed (almost) match our fundamental. (The discrepancy comes from my rounding off.)

\{0.741,0.661,0.497,1.73,0.342,0.85,0.43\}\ .

The last thing we need to get is the dual basis E for the rows of B. I just go look it up in the previous post…

E = A\ (A'A)^{-1} = B'\ (BB')^{-1}\ :

E = \left(\begin{array}{ccc} 1. & 0 & 2. \\ 0.5455 & -0.3636 & 1.5 \\ -0.5455 & -1.6364 & 0.5 \\ -0.8182 & -1.4545 & 1.3 \\ -1. & -1. & -0.5 \\ -0.7273 & -0.1818 & 0 \\ 0 & 1. & 0\end{array}\right)

Check that B E = I; it does.

And why do we care about the dual basis, i.e. the columns of E? Well, if we apply B to our test spectrum, we get

B\ e = \{-1.27646,0.43012,1.00883\}\ .

What are those three numbers? They are components of our test spectrum wrt the dual basis E!

That is, the sum of the following three vectors (which are multiples of the basis vectors in E)…

\{-1.276,-0.696,0.696,1.044,1.276,0.928,0\}

\{0,-0.156,-0.704,-0.626,-0.43,-0.078,0.43\}

\{2.018,1.513,0.504,1.311,-0.504,0,0\}

is

\{0.742,0.661,0.496,1.729,0.342,0.85,0.43\}

… which, once again, differs from our fundamental

f = \{0.741,0.661,0.497,1.73,0.342,0.85,0.43\}

only because of the rounding used to make the numbers readable.

To say that again: the dual basis E gives us a geometric interpretation of the tri-stimulus values we computed by applying B (i.e. A’) to a spectrum. The tri-stimulus values are components of the spectrum (equivalently, of its fundamental) wrt the E basis.

So. What did Cohen and I have that was the same? The matrices A and E, and the projection operator R. But note that where he computed R as

R = E’A,

and that’s short for R = A\ (A'A)^{-1}\ A'\ ,

I would use

R = v1\ v1^T\ .

I’ll admit that’s not much of a savings if we have both A and E.)

What did we have that was related? Where he had two orthonormal bases F1 and F2, I have one, v1. All three of these bases span the same subspace of R^7\ .

Digression: ways to show that F1, F2, and v1 span the same subspace of R7

One way to show that is to show that the columns of F1 and F2 are preserved by the projection operator cobstructed from v1. We have

F1 = \left(\begin{array}{ccc} 0.513425 & -0.372389 & 0.398141 \\ 0.280073 & -0.373412 & 0.252292 \\ -0.280073 & -0.563189 & -0.196825 \\ -0.420084 & -0.376455 & 0.53131 \\ -0.513425 & -0.0959116 & -0.0948392 \\ -0.373414 & 0.185702 & 0.416961 \\ 0. & 0.468301 & 0.523404\end{array}\right)

and we compute

R\ F1 = \left(\begin{array}{ccc} 0.513425 & -0.372389 & 0.398141 \\ 0.280073 & -0.373412 & 0.252292 \\ -0.280073 & -0.563189 & -0.196825 \\ -0.420084 & -0.376455 & 0.53131 \\ -0.513425 & -0.0959116 & -0.0948392 \\ -0.373414 & 0.185702 & 0.416961 \\ 0. & 0.468301 & 0.523404\end{array}\right)

And yes, their difference is the zero matrix.

Similarly for F2.

Alternatively, F1 and F2 should generate the same projection operator as v1 did.

F1\ F1^T = \left(\begin{array}{ccccccc} 0.560795 & 0.383299 & -0.012435 & 0.136042 &   -0.265648 & -0.0948635 & 0.0339986 \\ 0.383299 & 0.281529 & 0.0822036 & 0.156964 &   -0.131909 & -0.0687305 & -0.0428189 \\ -0.012435 & 0.0822036 & 0.434363 & 0.225095 &   0.216479 & -0.0820701 & -0.36676 \\ 0.136042 & 0.156964 & 0.225095 & 0.600478 & 0.201399   & 0.308492 & 0.101795 \\ -0.265648 & -0.131909 & 0.216479 & 0.201399 &   0.281798 & 0.134365 & -0.0945546 \\ -0.0948635 & -0.0687305 & -0.0820701 & 0.308492 &   0.134365 & 0.34778 & 0.305203 \\ 0.0339986 & -0.0428189 & -0.36676 & 0.101795 &   -0.0945546 & 0.305203 & 0.493257\end{array}\right)

I assure you that is the same as P and R. And similarly for F2.

Alternatively, again, we could simply write the columns of F1 (then F2) as a linear combination of the columns of v1, and solve for the components.

Alternatively, yet again, we could try to find a transition matrix relating F1 and v1, then F2 and v2. That’s awkward because they’re not square matrices. Guess what? Use a pseudo-inverse! (Very likely the next math post.)

I have no idea what is special about F1 and F2. They are orthonormal bases, like v1, for the pre-image of the range of B. Since they span the same space, they are related by rotations (and possibly a reflection).

end digression, return to main thread

Finally, what did I do that Cohen did not? The singular value decomposition, which got me a basis, v2, for the null space of B — which he did not have. And I daresay that v1 was easier to get than either F1 or F2.

I think having v2 is incredibly useful. I get a basis for metameric blacks.

(Of course, Cohen had computed the matrices Ma and Me, and Ga and Ge, but I don’t need them because I wasn’t computing F1 and F2.)

Let me summarize that.

Cohen

  • A and E;
  • Ma = A’A, Me = E’E, where Me = Ma^-1;
  • Ga and Ge, Cholesky decompositions of Ma and Me;
  • orthonormal bases F1 = A Ge and F2 = E Ga;
  • the projection operator R = E'\ A (= A\ (A'A)^{-1}\ A')\ .

Rip

  • A;
  • A’ = B = u w v’;
  • [v] = [v1] [v2] where both v1 and v2 are orthonormal bases;
  • the projection operator R = v1 v1′;
  • the dual basis E.

(Strictly speaking, A is the dual basis to E, but the relationship is symmetric. Nevertheless, the rows of A’ live in the dual space V*, while the columns of E live in the original space V.)

I dispensed with everything leading to F1 and F2. I use v1 in place of F1 and/or F2. I get a basis, v2, for the null space.

I think the key point is using the projection operator to split a spectrum into fundamental and null vectors. It’s a unique orthogonal decomposition into a vector which has the same tri-stimulus values as the original spectrum, and a vector in the null space (metameric black, with tri-stimulus values = {0,0,0}.)

(The decomposition is unique to A, but A itself is not unique. More to follow.)

PCA / FA Example 4: Davis. R-mode & Q-mode via the SVD.

let’s finally do this using the SVD. i need to do one terrible thing: where davis writes U and V, we need v and u, resp. (look, i couldn’t reliably keep translating davis’ equations in my head, so i had to use his notation; by the same token, i can’t reliably translate the SVD over and over again. thank god he used UC, upper case.)
if the correspondence had been u ~ U and v ~ V, the translation would have been trivial. unfortunately, the correspondence is
u ~ V
v ~ U.
(from the SVD posts, you recall that given X = u \ w \ v^T we conclude that v is an eigenvector matrix for X^T\ X, and u is an eigenvector matrix for XX^T. and, just as important, the nonzero values of w are the nonzero \sqrt{\text{eigenvalues}}.)
from the SVD of X, X = u \ w \ v^T
and the definition of A^R = U \ \sqrt{\text{eigenvalues}}
i conjecture that we could define A^R = v \ w^T.
from the definition of A^Q = V \ \sqrt{\text{eigenvalues}}
i conjecture that we could define A^Q = u \ w .
then from S^R = X \ A^R
we get S^R = u\ w\ v^T\ v \ w^T = u \ w \ w^T ,
since v is orthogonal.
and from S^Q = X^T\ A^Q
we get S^Q = v \ w^T \ u^T \ u \ w = v \ w^T \ w,
since u is orthogonal.
(almost) finally, from u \ w = \ A^Q \ and S^R = u \ w \ w^T
we get S^R = A^Q\ w^T\ ,
which is the first dual relationship between Q-mode and R-mode.
finally, from v \ w^T = A^R and S^Q =v \ w^T\ w
we get S^Q = A^R\ w,
which is the other dual relationship.
so we see that davis’ duality delationships make sense. 
let me summarize the SVD equations for the As and Qs:
A^R = v \ w^T
A^Q = u \ w
S^R = u \ w \ w^T = A^Q\ w^T
S^Q = v \ w^T\ w = A^R\ w.
BTW, i’ll compute using S^R = A^Q\ w^T and S^Q = A^R\ w because they involve two matrices instead of three.
it’s time to actually compute the SVD.
here’s u, v, and w:
u = \left(\begin{array}{cccc} -0.801784&0&0.597511&0.0110879\\ 0.267261&-0.816497&0.351732&0.371738\\ 0.&0.408248&-0.016937&0.912714\\ 0.534522&0.408248&0.720401&-0.169237\end{array}\right)
v = \left(\begin{array}{ccc} 0.816497&0&0.57735\\ -0.408248&-0.707107&0.57735\\ -0.408248&0.707107&0.57735\end{array}\right)
w = \left(\begin{array}{ccc} 9.16515&0&0\\ 0.&3.4641&0\\ 0.&0&0\\ 0.&0&0\end{array}\right)
(we could check that u and v are orthogonal, and in fact i did check that. OTOH, in contrast to eigenvector matrices, mathematica says u and v will always be orthogonal.)
we should compare u with V; here’s V:
V = \left(\begin{array}{cccc} 0.801784&0&-0.597614&0\\ -0.267261&0.816497&-0.358569&-0.365148\\ 0.&-0.408248&0&-0.912871\\ -0.534522&-0.408248&-0.717137&0.182574\end{array}\right)
most interesting. the 3rd and 4th eigenvectors are different although the 3rd one isn’t very different. what’s going on? well, they are two eigenvectors associated with the same eigenvalue (0), and in fact they span the null space of XX^T. (i think this is true for any repeated eigenvalue, that we could select any orthonormal basis for the multi-dimensional eigenspace, but i need to confrm that. in any case, it’s certainly ok here because the eigenvalue is zero and we’re dealing with the null space.)
what is significant, however, is that u and V have different signs on the two columns with nonzero eigenvalues, the 1st and 2nd.
now compare v and U:
v = \left(\begin{array}{ccc} 0.816497&0&0.57735\\ -0.408248&-0.707107&0.57735\\ -0.408248&0.707107&0.57735\end{array}\right)
U = \left(\begin{array}{ccc} 0.816497&0&-0.57735\\ -0.408248&-0.707107&-0.57735\\ -0.408248&0.707107&-0.57735\end{array}\right)
they differ only in the sign of one column that matters (because it has nonzero eigenvalue), the 1st. and that’s the problem: U and V are incompatible in the sign of the 2nd column. we don’t have to match davis; we just have to change the sign of one column if we want the expected correspondence between U ~ v and V ~ u.
let me just show the incompatibiity. from the SVD X = u \ w \ v^T,
i can always write w = u^T\ X\ v.
the question is: given U and V computed separately by eigendecompositions, can we use them in the SVD? let’s see what happens if we substitute U for v and V for u. that is, we compute
V^T\ X\ U.
V^T\ X\ U = \left(\begin{array}{cccc} 0.801784&-0.267261&0&-0.534522\\ 0.&0.816497&-0.408248&-0.408248\\ -0.597614&-0.358569&0&-0.717137\\ 0.&-0.365148&-0.912871&0.182574\end{array}\right) 
x \left(\begin{array}{ccc} -6&3&3\\ 2&1&-3\\ 0.&-1&1\\ 4&-3&-1\end{array}\right) x \left(\begin{array}{ccc} 0.816497&0&-0.57735\\ -0.408248&-0.707107&-0.57735\\ -0.408248&0.707107&-0.57735\end{array}\right)
= \left(\begin{array}{ccc} -9.16515&0&0\\ 0.&-3.4641&0\\ 0.&0&0\\ 0.&0&0\end{array}\right)
close. there’s just one little problem: the nonzero elements of w are supposed to be positive. this is what i mean when i say that U and V need not be compatible in general, and in this case they are not. after all, U and V came from independent eigendecompositions.
moving on, we compute A^R = v \ w^T
v \ w^T = \left(\begin{array}{ccc} 0.816497&0&0.57735\\ -0.408248&-0.707107&0.57735\\ -0.408248&0.707107&0.57735\end{array}\right) x \left(\begin{array}{cccc} 9.16515&0&0&0\\ 0&3.4641&0&0\\ 0&0&0&0\end{array}\right)
= \left(\begin{array}{cccc} 7.48331&0.&0.&0.\\ -3.74166&-2.44949&0.&0.\\ -3.74166&2.44949&0.&0.\end{array}\right)
we should compare that to what i got before.
\left(\begin{array}{ccc} 7.48331&0.&0.\\ -3.74166&-2.44949&0.\\ -3.74166&2.44949&0.\end{array}\right)
oh, dear. this emphasizes that i changed the definition of A^R when i went from an eigendecomposition to the SVD. OTOH, so did davis. 
if A^R is defined as the \sqrt{\text{eigenvalue}} weighted eigenvector matrix U, then it must be the same size as U. we would get it by post-multiplying U by a 3×3 diagonal matrix of eigenvalues. instead, mine is bigger than 3×3, while davis’ is smaller.
but if it is defined as A^R = v \ w^T, then A^R is not the same size as v (or U) because w is not square. 
i think this is just a matter of terminolgy. conceptually, i can best see the duality between Q-mode and R-mode using the SVD. 
OTOH, conceptually, if i want the \sqrt{\text{eigenvalue}}-weighted eigenvector matrix, i just post-multiply v by a diagonal matrix. in practice, i would just cut down A^R to get something the same size as v.
to put that another way, if we use davis’ square matrix of \sqrt{\text{eigenvalues}}, then we still see the duality and his A^R is the nonzero \sqrt{\text{eigenvalue}}-weighted eigenvector matrix.
(BTW, we’ve seen that square matrix of \sqrt{\text{eigenvalues}} before, in the derivation of the SVD. we needed to invert it. maybe i will start using it, too.)
if this is getting confusing, fall back on: what we have are u, w, and v; everything else follows from them.
moving on again, we compute A^Q = u \ w :
u \ w = \left(\begin{array}{cccc} -0.801784&0&0.597511&0.0110879\\ 0.267261&-0.816497&0.351732&0.371738\\ 0.&0.408248&-0.016937&0.912714\\ 0.534522&0.408248&0.720401&-0.169237\end{array}\right) 
x \left(\begin{array}{ccc} 9.16515&0&0\\ 0.&3.4641&0\\ 0.&0&0\\ 0.&0&0\end{array}\right)
= \left(\begin{array}{ccc} -7.34847&0.&0.\\ 2.44949&-2.82843&0.\\ 0.&1.41421&0.\\ 4.89898&1.41421&0.\end{array}\right)
then we compute S^R = A^Q\ w^T:
A^Q\ w^T = \left(\begin{array}{ccc} -7.34847&0.&0.\\ 2.44949&-2.82843&0.\\ 0.&1.41421&0.\\ 4.89898&1.41421&0.\end{array}\right) x \left(\begin{array}{cccc} 9.16515&0&0&0\\ 0.&3.4641&0&0\\ 0.&0&0&0\end{array}\right)
= \left(\begin{array}{cccc} -67.3498&0.&0.&0.\\ 22.4499&-9.79796&0.&0.\\ 0.&4.89898&0.&0.\\ 44.8999&4.89898&0.&0.\end{array}\right)
(i see no point to confirming that S^R = X \ A^R ; that was our definition.)
now we compute S^Q = A^R\ w:
A^R\ w = \left(\begin{array}{cccc} 7.48331&0.&0.&0.\\ -3.74166&-2.44949&0.&0.\\ -3.74166&2.44949&0.&0.\end{array}\right) x \left(\begin{array}{ccc} 9.16515&0&0\\ 0.&3.4641&0\\ 0.&0&0\\ 0.&0&0\end{array}\right)
= \left(\begin{array}{ccc} 68.5857&0.&0.\\ -34.2929&-8.48528&0.\\ -34.2929&8.48528&0.\end{array}\right)
we’ve done all the computations. (that’s an understatement: we done them three or four different ways!) 
next we will look at the interpretations.

PCA / FA example 4: Davis. R-mode FA, eigenvalues

from davis’ “statistics and data analysis in geology”, we take the following extremely simple example (p. 502). his data matrix is
D = \left(\begin{array}{ccc} 4&27&18\\ 12&25&12\\ 10&23&16\\ 14&21&14\end{array}\right)
where each column is a variable. he now centers the data, by subtracting each column mean from the values in the column.
let me do that. i compute the column means…
{10,\ 24,\ 15}
and subtract each one from the appropriate column, getting
X = \left(\begin{array}{ccc} -6&3&3\\ 2&1&-3\\ 0.&-1&1\\ 4&-3&-1\end{array}\right)
now let’s just follow him. he introduces the SVD (the singular value decomposition), by way of explanation, and we will use it, but not quite yet. he performs what’s called an R-mode factor analysis as follows.
compute X^T X:
X^T X = \left(\begin{array}{ccc} 56&-28&-28\\ -28&20&8\\ -28&8&20\end{array}\right)
then he computes the eigendecomposition. i get that the eigenvector matrix is… 
P = \left(\begin{array}{ccc} -2&0.&1\\ 1&-1&1\\ 1&1&1\end{array}\right)
and that the diagonal matrix of eigenvalues is …
\Sigma = \left(\begin{array}{ccc} 84&0.&0.\\ 0.&12&0.\\ 0.&0.&0.\end{array}\right)
we run into our first notational quagmire. he has shown us the SVD (X = u\ w\ v^{T}) , and what we have here is w^T w, the eigenvalues of X^T X (see “an SVD of X ~ eigenstructures of XTX and XXT”).
what he wants are the singular values w. no problem, you think: just take the square roots. sheesh, that’s what harman did, and maybe we’re beginning to see why.
unfortunately, davis is also using a 2×2 diagonal matrix instead of the 3×3 i get. to be more specific, i get w as a 4×3 and w^T w as a 3×3. to be completely specific, i get w is… 
w = \left(\begin{array}{ccc} 2 \sqrt{21}&0.&0.\\ 0.&2 \sqrt{3}&0.\\ 0.&0.&0.\\ 0.&0.&0.\end{array}\right)
and therefore w^T w is…
w^T w = \left(\begin{array}{ccc} 84&0.&0.\\ 0.&12&0.\\ 0.&0.&0.\end{array}\right)
which is, as it should be, the 3×3 matrix of eigenvalues.
well, having said that he wants the square roots, i’ll give them to him, but i will modify his notation. i take \Lambda to be a 3×3 diagonal matrix of square roots of eigenvalues, one of which is zero…
\Lambda = \left(\begin{array}{ccc} 9.16515&0.&0.\\ 0.&3.4641&0.\\ 0.&0.&0.\end{array}\right)
and i will use \Lambda 2 for his 2×2 matrix of nonzero square roots (which he called \Lambda).
\Lambda 2 = \left(\begin{array}{cc} 9.16515&0.\\ 0.&3.4641\end{array}\right)
and, i will need the 2×2 submatrix of eigenvalues:
\Sigma 2 = \left(\begin{array}{cc} 84&0.\\ 0.&12\end{array}\right)
perhaps i should point out that i have no intention of throwing away any parts of these matrices in my own work. but i need to make sure that i am following davis correctly, so i am carrying out his computations in parallel with my own.
besides, you might be trying to read davis even as i speak, another reason for me to display his work, too.
at the end of all that, all we have gotten is 2 positive eigenvalues of X^T X (84 and 12), and we have their two positive square roots; we have these two pairs of numbers laid out in both 2×2 and 3×3 diagonal matrices. we have far more notation that mathematics.
so much for the eigenvalues. 
next we’ll dig into the eigenvectors.

the SVD: preface

Rather than edit the following SVD posts – unless I have to – let me put some prefatory material here. I am also going to answer a couple of email questions in comments, partly just to check out their mechanics.

The following posts in the math SVD category should be read in the order displayed, not starting from the back.

That may have been a bad idea on my part, but let’s see how it works out. The difficulty is that most of what I do out here will be available in the usual blog-order, latest first. Eventually it may be awkward if one category is in reverse order wrt everything else.

I would like to add two pieces of vocabulary. For the SVD X = u\ w\ v^T the columns of u and v are called the left singular vectors of X and the right singular vectors of X, respectively.

I have also discovered that the latex interpreter works in comments. Cool! My understanding is that it does not work in forums; they would be the appropriate place to post code – and have it display as code! After all, that’s where I found the key starting points for latex.

the SVD (singular value decomposition) stuff

ok, i’ve published everything i planned to put out there for the SVD. not to say i won’t need to add things, if anyone ever has any questions about what i meant….  

yes, i need to put out a bibliography, too. but i’ve been at this all day, and it’s time to stop. incidentally, the SVD stuff is a category of its own – not a page as i originally set up the first (last!) installment.

i should probably emphasize that the SVD category is posted in the order i intend it to be read. 

it looks like quite a pile of stuff, and it’s scary to think that it’s all only background for principal components analysis!

one advantage to doing as much work as i did today is that i’ve managed to streamline the process. take a piece of the latex output from mathematica®, drop it into TextWrangler, edit it massively there, then copy a preliminary version into the wordpress editor, and finally edit there until it works.

well, i’m thrilled to have found a way to play with the SVD, after all the years i’ve meant to. i need to actually write down the list of all the other things i’ve been meaning to look at.

the Singular Value Decomposition (SVD)

The singular value decomposition (SVD) is marvelous. It says that any real matrix X can be decomposed as

X = u\ w\ v^T [singular value decomposition]

where u and v are orthogonal (i.e. rotations) and w is as close to diagonal as possible.To be more precise,

w = \left(\begin{array}{cc}\Sigma&O\\O&O\end{array}\right)

where in fact \Sigma is not only a diagonal matrix but it has only positive elements. Finally, we order the elements of w from largest to smallest(don’t sweat it if some are equal).
(the theorem is true for any complex matrix, provided the transpose is replaced by the conjugate transpose, and u and v are unitary instead of orthogonal. i just want to stick with the transpose. Or you could simply interpret v^T as the conjugate transpose of v in everything i do.)
My main reference for this is Stewart: short and sweet, with some history and the derivation. Strang has a gentler discussion, but no derivation.

the SVD generalizes eigenstructure – one correction 5/22/08

I’ve seen two things like this. First, the diagonalization of a square matrix A:

D = P^{-1}\ A\ P [diagonalization]


where 

D is diagonal

correction: if A is symmetric, P may be chosen orthogonal.


We recall that not every matrix can be diagonalized; instead they can be brought to Jordan Canonical form. OTOH the set of matrices which cannot be diagonalized is a subset of measure zero. Nevertheless, for some applications (e.g. differential equations), matrices which cannot be diagonalized play a significant role. I’m sure i’ll talk about this someday.   

Looks a little different from the SVD, you say? we always have to pay attention to whether we decomposed a matrix or transformed it. Rewrite the transformation as a decomposition of A: 

P\ D\ P^{-1} = A

and then as

A = P\ D\ P^T [eigenvector decomposition]

since P was chosen orthogonal.

Now that resembles the SVD:

X = u\ w\ v^T .

A is square, X is not (in general); D is diagonal most of the time, while w is as close to diagonal as a rectangular matrix can get; and instead of one orthogonal matrix P, we have two orthogonal matrices u and v.

The SVD is a generalization of the eigenvector decomposition, from square matrices to matrices of arbitrary shape.      

Among other things, this means that any square matrix can be diagonalized by u and v of its SVD; it just can’t always be done with u = v = P.

To put that another way, if we can change the bases independently in the domain and codomain, then we can diagonalize any square matrix; but if werequire the same one change of basis in both the domain and codomain, then we may not be able to diagonalize the matrix.

the SVD specializes change-of-bases

the general formula relating two matrix representations of a single linear operator L is:

B = Q^{-1}\ A\ P [change of bases]


where P and Q are invertible (hence square) matrices (of the appropriate sizes). the best reference i’ve ever seen for this is — are you ready?

the Schaum’s outline for linear algebra, by Lipschutz.

that equation is written as a transformation. We have a linear operator

L: V –>W,

and we have its matrix A WRT bases in each space V and W. then we do a change of basis P on the domain V, and a change of basis Q on the codomain W, and we find that the matrix B of L WRT the new bases is given by the change of bases equation.

we may rewrite it as a decomposition:

A = Q\ B\ P^{-1} .

we can’t replace P^{-1} by P^T because P is an arbitrary invertible matrix: we may have made any legal change-of-basis, not the special case of a rotation.
The SVD 

X = u\ w\ v^T 

is a special case of a change of basis: X and w represent the same linear operator WRT different bases. in particular, since v is orthogonal, we can replace v^T by v^{-1} :

  • X ~ A
  • w ~ B
  • u ~ Q
  • v ~ P 

So the SVD tells us that any matrix can be represented by a nearly diagonal matrix.
Why don’t we end up with a possible jordan canonical form for the SVD? or to put it another way, why does the SVD of a jordan canonical form still turn out to have nonzero terms only on the diagonal?
Because we have two matrices u and v instead of one P, to effect the decomposition.
(Of course, the eigenvector diagonalization is also a special case of the change-of-bases equation, with only one basis to be changed.)

the SVD provides real-valued rank

Suppose that X has a singular value decomposition 

X = u\ w\ v^T ;

n nonzero singular values (nonzero elements of w);

the smallest nonzero singular value is \epsilon > 0 .

Then,

X is of rank n

X differs from a matrix Y of rank n-1 by \epsilon ;

no matrix of rank n-1 differs from X by less than \epsilon

we may compute one such matrix Y as follows: 

the matrix

Y = u\ w2\ v^T

is of rank n-1 and differs from X by \epsilon  , where w2 is w with \epsilon  replaced by 0.

If, for example, X had 5 nonzero singular values and the smallest one was .15, then we could say that the rank of X was 4.15 . and if we replace the .15 in w by 0 in w2 and compute Y, we get a matrix Y of rank 4 which differs from X by as little as possible.
i remark that i do this by changing a value in the matrix w, leaving u and v alone; hence Y is the same shape as X. There are those who would drop a row and column from w, then drop a column from each of u and v. (see below under alternative SVD.)