the SVD: preface

Rather than edit the following SVD posts – unless I have to – let me put some prefatory material here. I am also going to answer a couple of email questions in comments, partly just to check out their mechanics.

The following posts in the math SVD category should be read in the order displayed, not starting from the back.

That may have been a bad idea on my part, but let’s see how it works out. The difficulty is that most of what I do out here will be available in the usual blog-order, latest first. Eventually it may be awkward if one category is in reverse order wrt everything else.

I would like to add two pieces of vocabulary. For the SVD X = u\ w\ v^T, the columns of u and v are called the left singular vectors of X and the right singular vectors of X, respectively.

I have also discovered that the latex interpreter works in comments. Cool! My understanding is that it does not work in forums; they would be the appropriate place to post code – and have it display as code! After all, that’s where I found the key starting points for latex.

the SVD (singular value decomposition) stuff

ok, i’ve published everything i planned to put out there for the SVD. not to say i won’t need to add things, if anyone ever has any questions about what i meant….  

yes, i need to put out a bibliography, too. but i’ve been at this all day, and it’s time to stop. incidentally, the SVD stuff is a category of its own – not a page as i originally set up the first (last!) installment.

i should probably emphasize that the SVD category is posted in the order i intend it to be read. 

it looks like quite a pile of stuff, and it’s scary to think that it’s all only background for principal components analysis!

one advantage to doing as much work as i did today is that i’ve managed to streamline the process. take a piece of the latex output from mathematica®, drop it into TextWrangler, edit it massively there, then copy a preliminary version into the wordpress editor, and finally edit there until it works.

well, i’m thrilled to have found a way to play with the SVD, after all the years i’ve meant to. i need to actually write down the list of all the other things i’ve been meaning to look at.

the Singular Value Decomposition (SVD)

The singular value decomposition (SVD) is marvelous. It says that any real matrix X can be decomposed as

X = u\ w\ v^T [singular value decomposition]

where u and v are orthogonal (i.e. rotations) and w is as close to diagonal as possible. To be more precise,

w = \left(\begin{array}{cc}\Sigma&O\\O&O\end{array}\right)

where in fact \Sigma is not only a diagonal matrix but has only positive elements. Finally, we order the elements of w from largest to smallest (don't sweat it if some are equal).
(the theorem is true for any complex matrix, provided the transpose is replaced by the conjugate transpose, and u and v are unitary instead of orthogonal. i just want to stick with the transpose. Or you could simply interpret v^T as the conjugate transpose of v in everything i do.)
My main reference for this is Stewart: short and sweet, with some history and the derivation. Strang has a gentler discussion, but no derivation.
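Here's a small numerical check of all that, a sketch of my own in Python/numpy rather than Mathematica (the 4x3 matrix is just something I made up): u and v come back orthogonal, their columns are the left and right singular vectors, and w is the same shape as X with the ordered singular values down its diagonal.

import numpy as np

# an arbitrary 4x3 real matrix, purely for illustration
X = np.array([[2., 0., 1.],
              [0., 3., 1.],
              [1., 1., 0.],
              [4., 1., 2.]])

# numpy returns u, the singular values s, and v^T directly
u, s, vT = np.linalg.svd(X, full_matrices=True)

# rebuild w: same shape as X, singular values (largest first) on the diagonal
w = np.zeros_like(X)
w[:len(s), :len(s)] = np.diag(s)

print(np.allclose(X, u @ w @ vT))        # X = u w v^T
print(np.allclose(u.T @ u, np.eye(4)))   # u is orthogonal
print(np.allclose(vT @ vT.T, np.eye(3))) # v is orthogonal
print(s)                                 # singular values, ordered largest to smallest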

the SVD generalizes eigenstructure – one correction 5/22/08

I’ve seen two things like this. First, the diagonalization of a square matrix A:

D = P^{-1}\ A\ P [diagonalization]


where D is diagonal.

correction: if A is symmetric, P may be chosen orthogonal.


We recall that not every matrix can be diagonalized; those that cannot can still be brought to Jordan canonical form. OTOH the set of matrices which cannot be diagonalized is a set of measure zero. Nevertheless, for some applications (e.g. differential equations), matrices which cannot be diagonalized play a significant role. I'm sure i'll talk about this someday.

Looks a little different from the SVD, you say? We always have to pay attention to whether we decomposed a matrix or transformed it. Rewrite the transformation as a decomposition of A:

P\ D\ P^{-1} = A

and then as

A = P\ D\ P^T [eigenvector decomposition]

since P was chosen orthogonal (which, per the correction above, requires A to be symmetric).

Now that resembles the SVD:

X = u\ w\ v^T .

A is square, X is not (in general); D is diagonal most of the time, while w is as close to diagonal as a rectangular matrix can get; and instead of one orthogonal matrix P, we have two orthogonal matrices u and v.

The SVD is a generalization of the eigenvector decomposition, from square matrices to matrices of arbitrary shape.      

Among other things, this means that any square matrix can be diagonalized by u and v of its SVD; it just can’t always be done with u = v = P.

To put that another way, if we can change the bases independently in the domain and codomain, then we can diagonalize any square matrix; but if we require the same change of basis in both the domain and codomain, then we may not be able to diagonalize the matrix.
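To make that concrete, here is a quick sketch (my own example, again in numpy) using the classic matrix that cannot be diagonalized, a 2x2 Jordan block: no single P diagonalizes it, but u^T A v built from its SVD is diagonal.

import numpy as np

# a Jordan block: defective, so no single change of basis P can diagonalize it
A = np.array([[1., 1.],
              [0., 1.]])

u, s, vT = np.linalg.svd(A)

# with two orthogonal matrices we do get a diagonal matrix: u^T A v = w
w = u.T @ A @ vT.T
print(np.round(w, 10))   # diagonal, with the singular values on the diagonal
print(s)                 # approximately 1.618 and 0.618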

the SVD specializes change-of-bases

the general formula relating two matrix representations of a single linear operator L is:

B = Q^{-1}\ A\ P [change of bases]


where P and Q are invertible (hence square) matrices (of the appropriate sizes). the best reference i’ve ever seen for this is — are you ready?

the Schaum’s outline for linear algebra, by Lipschutz.

that equation is written as a transformation. We have a linear operator

L: V \rightarrow W,

and we have its matrix A WRT bases in each space V and W. then we do a change of basis P on the domain V, and a change of basis Q on the codomain W, and we find that the matrix B of L WRT the new bases is given by the change of bases equation.

we may rewrite it as a decomposition:

A = Q\ B\ P^{-1} .

we can’t replace P^{-1} by P^T because P is an arbitrary invertible matrix: we may have made any legal change-of-basis, not the special case of a rotation.
The SVD 

X = u\ w\ v^T 

is a special case of a change of basis: X and w represent the same linear operator WRT different bases. in particular, since v is orthogonal, we can replace v^T by v^{-1} :

  • X ~ A
  • w ~ B
  • u ~ Q
  • v ~ P 

So the SVD tells us that any matrix can be represented by a nearly diagonal matrix.

Why don't we end up with a Jordan canonical form from the SVD? Or to put it another way, why does the SVD of a Jordan canonical form still turn out to have nonzero terms only on the diagonal? Because we have two matrices u and v, instead of one P, to effect the decomposition.

(Of course, the eigenvector diagonalization is also a special case of the change-of-bases equation, with only one basis to be changed.)
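and here is the SVD as exactly that equation, with Q = u and P = v orthogonal (my own random X; any real matrix would do): the matrix of the operator in the new bases, u^T X v, is the nearly diagonal w.

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 3))               # any real matrix

u, s, vT = np.linalg.svd(X, full_matrices=True)

# change of bases with Q = u and P = v: w = u^{-1} X v = u^T X v
w = u.T @ X @ vT.T
print(np.round(w, 10))   # nearly diagonal: diag(s) on top, a row of zeros below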

the SVD provides real-valued rank

Suppose that X has a singular value decomposition 

X = u\ w\ v^T ;

n nonzero singular values (nonzero elements of w);

the smallest nonzero singular value is \epsilon > 0 .

Then,

X is of rank n

X differs from a matrix Y of rank n-1 by \epsilon (measuring the difference in the 2-norm);

no matrix of rank n-1 differs from X by less than \epsilon .

we may compute one such matrix Y as follows: 

the matrix

Y = u\ w2\ v^T

is of rank n-1 and differs from X by \epsilon  , where w2 is w with \epsilon  replaced by 0.

If, for example, X had 5 nonzero singular values and the smallest one was .15, then we could say that the rank of X was 4.15. And if we replace the .15 in w by 0 in w2 and compute Y, we get a matrix Y of rank 4 which differs from X by as little as possible.
i remark that i do this by changing a value in the matrix w, leaving u and v alone; hence Y is the same shape as X. There are those who would drop a row and column from w, then drop a column from each of u and v. (see below under alternative SVD.)
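here's a numerical version of that (my own 4x3 example; i'm measuring the difference in the 2-norm, as above): zero out the smallest singular value in w to get w2, rebuild Y, and the distance from X to Y is exactly that smallest singular value.

import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 3))        # generically of full rank n = 3

u, s, vT = np.linalg.svd(X, full_matrices=True)
print(s)                               # 3 nonzero singular values; the smallest is epsilon

w2 = np.zeros_like(X)
w2[:3, :3] = np.diag(s)
w2[2, 2] = 0.0                         # replace the smallest singular value by 0

Y = u @ w2 @ vT                        # same shape as X
print(np.linalg.matrix_rank(Y))        # rank n-1 = 2
print(np.linalg.norm(X - Y, 2), s[-1]) # the 2-norm distance equals epsilon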

an SVD of X ~ eigenstructures of X^T X and X X^T

OK, back to the drawing board. If we have a singular value decomposition of X, 

X = u\ w\ v^T

with u and v orthogonal, then

X^T X = (u\ w\ v^T)^T\ u\ w\ v^T

= v\ w^T u^T\ u\ w\ v^T

= v\ w^T w\ v^T

=: v\ D\ v^T

= v\ D\ v^{-1}

where we have defined

D = w^T w .

That is, if we have an SVD of X, then we can construct an eigendecomposition of X^T X .

(conceptually, the derivation of the SVD is nothing more than reversing that implication: given an eigendecomposition of X^T X , get an SVD for X. the challenge is to construct u.)

Note that i did not say the SVD of X, nor the eigendecomposition of X^T X . neither is unique. sometimes i may forget, so if i do refer to the SVD of a matrix or the eigendecomposition of a matrix, please forgive me. of course, “the SVD” as such means any decomposition of the form

X = u\ w\ v^T 

We can also construct an eigendecomposition of XX^T . you should expect that it uses u instead of v.

X\ X^T = (u\ w\ v^T)\ (u\ w\ v^T)^T

= (u\ w\ v^T)v\ w^T u^T

= u\ w\ w^T u^T

=: u\ D_2\ u^T

= u\ D_2\ u^{-1} .

where we have defined

D_2 = w\ w^T .

since w is not square in general, w^T \ w \ne w\ w^T . That is, D \ne D_2 (they’re not even the same size), but both are diagonal, and both have the same nonzero entries. In fact, and we want to know this,

the nonzero entries of D or D_2 are the squares of the nonzero entries in w.
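a quick numerical check of that last fact (my own small example): the nonzero eigenvalues of X^T X and of X X^T agree, and they are the squares of the singular values of X.

import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 3))

s = np.linalg.svd(X, compute_uv=False)   # the singular values of X

D  = np.linalg.eigvalsh(X.T @ X)         # 3 eigenvalues of X^T X (ascending)
D2 = np.linalg.eigvalsh(X @ X.T)         # 4 eigenvalues of X X^T (one is essentially 0)

print(np.sort(s**2))                     # squares of the singular values
print(D)                                 # the same three numbers
print(D2)                                # a (numerical) zero, then the same three numbers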

the SVD: derivation

Here’s where i get to be honest about what i stumbled over. I’ve read the proof in stewart. this is just an expanded version of it, although i want to change the notation. Some of what i need to do is obvious; some of it didn’t register. i had to go back and peek at his proof again.
Let X be a real matrix. We wish to show that    

X = u\ w\ v^T     

with u and v orthogonal, and     

w=\left(\begin{array}{cc}\Sigma&O\\O&O\end{array}\right), and

\Sigma  diagonal, all elements positive.


the SVD: nonzero singular values?

i should babble something about singular values and \Sigma . the issue is whether “nonzero singular values” is redundant because singular values are defined to be nonzero.
For technical purposes – the proof we just did! – it is extremely useful, even essential, to let \Sigma hold precisely the nonzero elements; we write the decomposition of w as

w=\left(\begin{array}{cc}\Sigma&O\\O&O\end{array}\right)  

so that we can invert the diagonal submatrix  \Sigma .

in practice, however, it is useful and conceptually convenient (for me at least) to imagine that the diagonal of \Sigma is extended to whichever boundary it hits first. with that convention, w is of one of the following forms:

\left(\begin{array}{c}\Sigma\\O\end{array}\right)  or  \left(\begin{array}{cc}\Sigma&O\end{array}\right)  

then we can say

all the entries of \Sigma  are nonzero iff w is of full rank.

we can’t phrase it that way if \Sigma  is automatically made smaller to contain only nonzero entries.


the SVD: overview of the proof

sometimes a proof is of comparable importance to the theorem itself. Sometimes, at the other extreme, the proof is almost trivial. In this case, the proof falls between those limits. to be specific, the decompositions of

u = (u_1\ u_2)

and

v = (v_1\ v_2)

are worth knowing in their own right.  

let me outline the proof; a numerical sketch follows the outline.

  •  get the eigendecomposition of  X^T X = v\ D^2\ v^{-1}  
  •  … with orthogonal basis of eigenvectors v  
  •  … and the diagonal eigenvalue matrix written as  D^2 
  •  split the matrix v = (v_1\ v_2), with all the zero eigenvalues associated with v_2
  •  let \Sigma  be the diagonal matrix of square roots of the nonzero eigenvalues  
  •  define u_1 = X\ v_1\ {\Sigma}^{-1}  by analogy with the impossible (because w is not invertible) u = X\ v\ w^{-1} 
  •  extend u_1 by Gram-Schmidt to u.
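here is a numerical sketch of that outline (my own code, not Stewart's; np.linalg.qr stands in for Gram-Schmidt, and a small tolerance decides which eigenvalues count as zero):

import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))   # 5x3, rank 2, so w has a zero block

# eigendecomposition of X^T X = v D^2 v^{-1}, with orthogonal v (eigh, since X^T X is symmetric)
d2, v = np.linalg.eigh(X.T @ X)
order = np.argsort(d2)[::-1]           # largest eigenvalues first
d2, v = d2[order], v[:, order]

# split v = (v_1 v_2), with the zero eigenvalues associated with v_2
r = int(np.sum(d2 > 1e-10))
v1 = v[:, :r]
Sigma = np.diag(np.sqrt(d2[:r]))       # square roots of the nonzero eigenvalues

u1 = X @ v1 @ np.linalg.inv(Sigma)     # u_1 = X v_1 Sigma^{-1}

# extend u_1 to a full orthogonal u (QR playing the role of Gram-Schmidt)
u, _ = np.linalg.qr(u1, mode='complete')
u[:, :r] = u1                          # keep u_1 itself as the first r columns

w = np.zeros_like(X)
w[:r, :r] = Sigma                      # w = [[Sigma, O], [O, O]]

print(np.allclose(X, u @ w @ v.T))     # X = u w v^T
print(np.allclose(u.T @ u, np.eye(5))) # u is orthogonal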