the SVD: derivation

Here’s where i get to be honest about what i stumbled over. I’ve read the proof in Stewart; this is just an expanded version of it, although i want to change the notation. Some of what i need to do is obvious; some of it didn’t register, and i had to go back and peek at his proof again.
Let X be a real matrix. We wish to show that    

X = u\ w\ v^T     

with u and v orthogonal, and     

w=\left(\begin{array}{cc}\Sigma&O\\O&O\end{array}\right), and

\Sigma  diagonal, all elements positive.

OK, form A=X^T X. It is square and symmetric, hence diagonalizable by an orthogonal matrix, and all of its eigenvalues are real; in fact they are non-negative. Even more, the positive eigenvalues are the squares of the entries of \Sigma . We would usually write

D= v^{-1}\ A\ v  ,

with v orthogonal, but we know that we’re going to get our w’s from the square roots of these eigenvalues.
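Here is a quick numerical sketch of that claim, using numpy and a small random matrix (the matrix and its dimensions are hypothetical, just something to experiment with): the eigenvalues of X^T X are non-negative, and their square roots are the singular values of X.

```python
import numpy as np

# a small random matrix to experiment with (hypothetical example)
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

A = X.T @ X                       # A = X^T X is symmetric
evals = np.linalg.eigvalsh(A)     # real eigenvalues of a symmetric matrix

# the eigenvalues are non-negative (up to roundoff)...
assert np.all(evals > -1e-12)

# ...and their square roots are the singular values of X
sv = np.linalg.svd(X, compute_uv=False)
assert np.allclose(np.sort(np.sqrt(np.abs(evals))), np.sort(sv))
```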

So let’s change our notation and write    

D^2 = v^{-1}\ A\ v .     

This was the first thing i forgot when i tried to work this out for myself. It turns out there’s another good reason for using the square, and it will present itself later if we don’t do it now.

Now,  D^2 may have some zero eigenvalues. WLOG all of the eigenvalues may be ordered, getting smaller (more precisely, not getting larger) as the indices increase:  

\sigma_1^2\ge \sigma_2^2 \ge ... \ge \sigma_r^2 > 0 = ... = 0 

(and yes, the ordering of the columns of v depends on the order of the eigenvalues.)

Let  \Sigma^2 be the diagonal submatrix of  D^2 consisting of the positive eigenvalues (it may be all of  D^2 ). Split v = (v1 \ v2) so that v1 corresponds to the nonzero eigenvalues and v2 to the zero eigenvalues, if any. Why? Because a diagonal matrix can be inverted, trivially, iff all the diagonal entries are nonzero. We’ve just split off the invertible part of  D^2 .

We have split    

D^2 = v^T\ A\ v


\Sigma^2 = v1^T\ A\ v1


 0 = v2^T\ A\ v2


A\ v2 = \lambda \ v2 = 0

(By definition, all the vectors in v2 have eigenvalue \lambda=0 . The second equation, hence v2, may be vacuous.)

In fact, what we have is a matrix equation


D^2 =\left(\begin{array}{c}v1^T\\v2^T\end{array}\right) \ A \ (v1 \ v2)

=\left(\begin{array}{c}v1^T\\v2^T\end{array}\right) \ (Av1 \ Av2)

=\left(\begin{array}{cc}{v1^T Av1}&{v1^T Av2}\\{v2^T A v1}&{v2^T Av2}\end{array}\right) 

=\left(\begin{array}{cc}{\Sigma^2}&O\\O&O\end{array}\right) .


A v2 = 0 , which zeroes out the second block column;

and while

A \ v1 \ne 0 ,

we do have

A \ v1 = v1 \ \Sigma^2 , i.e. each column of A\ v1 is proportional to the corresponding column of v1 ;

but then  

v2^T v1 = 0 since all the vectors in v are mutually orthogonal, and this zeroes out the southwest block of the matrix.
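The block structure is easy to check numerically. A minimal sketch, using a hypothetical rank-deficient matrix so that D^2 really has some zero eigenvalues:

```python
import numpy as np

# a rank-deficient example so that D^2 really has zero eigenvalues:
# X is 5x4 but only rank 2 (hypothetical test matrix)
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
A = X.T @ X

evals, v = np.linalg.eigh(A)          # eigh returns ascending order
order = np.argsort(evals)[::-1]       # re-sort descending, as in the text
evals, v = evals[order], v[:, order]

r = int(np.sum(evals > 1e-10))        # number of positive eigenvalues (2 here)
v1, v2 = v[:, :r], v[:, r:]

# D^2 = v^T A v is the block matrix with Sigma^2 in the northwest corner
D2 = v.T @ A @ v
assert np.allclose(D2, np.diag(evals))     # diagonal
assert np.allclose(v1.T @ A @ v2, 0)       # northeast block is O
assert np.allclose(v2.T @ A @ v2, 0)       # southeast block is O
```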

All of that is wonderful, but it applies to A = X^T X . What about X itself? This is the part that didn’t register when i was reading it. Sheesh! Just expand A. (Ouch!) From

\Sigma^2 = v1^T\ A\ v1

we get  

\Sigma^2 = v1^T\ X^T \ X\ v1

OK, we’ve gotten v, and with it v1 and v2. How do we get u? Well, what do we want?


 X = u\ w\ v^T

we might suspect that we want something of the form  

u= X\ v\ w^{-1}

but of course w isn’t invertible. Ah, but \Sigma is. Let’s try  

u1 = X \ v1 \ \Sigma^{-1}

we know that u is to be orthogonal, so the columns of u1 need to be orthonormal vectors. How do we show that? (i need to re-work this, but here it is.) The ingredients of u1, namely X and v1, already appear in

  \Sigma^2 = v1^T \ X^T X \ v1   

so clearly we want to post-multiply by  \Sigma^{-1} :

\Sigma = v1^T \ X^T X \ v1 \ \Sigma^{-1}  

and then we need to realize that the diagonal matrix \Sigma is its own transpose:

\Sigma = \Sigma^T

so that, transposing the definition u1 = X \ v1 \ \Sigma^{-1} , we get

 u1^T = \Sigma^{-1} \ v1^T \ X^T

and that, recognizing u1 = X \ v1 \ \Sigma^{-1} at the tail of the previous equation,

\Sigma = v1^T \ X^T \ u1 ;

pre-multiplying by \Sigma^{-1} , this is in fact

I = \Sigma^{-1} \ v1^T \ X^T \ u1 = u1^T \ u1 .

That looks like u1 is orthogonal. It isn’t, actually, because u1 isn’t square: we have u1^T \ u1 = I , but u1 \ u1^T \ne I . That combination would be called an orthonormal matrix instead. Each column of u1 is a unit vector, and all the columns are mutually orthogonal. There just aren’t enough of them.

(BTW, i found the term “orthonormal matrix” defined in applications of the SVD. i had come across the term before, but never seen it defined in a math book. And no, i didn’t say it wasn’t in any of them, just that i had never seen it defined before.)

So there just aren’t enough of them?

Good golly, Miss Molly! We know what to do when we don’t have enough orthonormal vectors. Just extend the set of column vectors of u1 by Gram-Schmidt, and let u2 be the matrix whose columns are the new vectors. That is, we let

u = (u1 u2). 
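Continuing the hypothetical rank-2 example, here is a sketch of building u1 and completing it to a full orthogonal u. (For convenience i let numpy’s SVD of u1 hand over the orthogonal complement; classical Gram-Schmidt on appended vectors would do the same job.)

```python
import numpy as np

# same hypothetical rank-2 example as before
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
A = X.T @ X

evals, v = np.linalg.eigh(A)
order = np.argsort(evals)[::-1]
evals, v = evals[order], v[:, order]
r = int(np.sum(evals > 1e-10))
v1 = v[:, :r]
Sigma = np.diag(np.sqrt(evals[:r]))

# u1 = X v1 Sigma^{-1}: orthonormal columns, but only r of them
u1 = X @ v1 @ np.linalg.inv(Sigma)
assert np.allclose(u1.T @ u1, np.eye(r))

# extend to a full orthonormal basis of R^5; the SVD of u1 gives the
# orthogonal complement of its column space (a stand-in for Gram-Schmidt)
u2 = np.linalg.svd(u1)[0][:, r:]
u = np.hstack([u1, u2])
assert np.allclose(u.T @ u, np.eye(5))    # u is orthogonal
```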

There’s nothing left to do but to see if it worked. Compute

u^T\ X\ v 

=\left(\begin{array}{c}u1^T\\u2^T\end{array}\right) \ X \ (v1 \ v2)

=\left(\begin{array}{c}u1^T\\u2^T\end{array}\right) \ (Xv1 \ Xv2)

=\left(\begin{array}{cc}{u1^T Xv1}&{u1^T Xv2}\\{u2^T X v1}&{u2^T Xv2}\end{array}\right) 

Now what? u1 was defined in terms of v1, so use it:

u1 = X \ v1 \ \Sigma^{-1}


X \ v1 = u1 \ \Sigma 

we will also need to realize that the eigenvectors v2 in the null space of A are also in the null space of X itself: from

 A \ v2 = X^T X \ v2 = 0

we get v2^T \ X^T X \ v2 = \| X \ v2 \|^2 = 0 , and a vector of zero length is the zero vector, so

X \ v2 = 0 .
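That null-space fact is worth a quick check on the same hypothetical rank-2 example:

```python
import numpy as np

# same hypothetical rank-2 example: A = X^T X has two zero eigenvalues
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
A = X.T @ X

evals, v = np.linalg.eigh(A)
v2 = v[:, evals < 1e-10]          # eigenvectors for the zero eigenvalues

# v2 is in the null space of A, and therefore of X itself:
# 0 = v2^T X^T X v2 = ||X v2||^2 forces X v2 = 0
assert np.allclose(A @ v2, 0)
assert np.allclose(X @ v2, 0)
```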

we have 

u^T\ X\ v  

=\left(\begin{array}{cc}{u1^T Xv1}&{u1^T Xv2}\\{u2^T X v1}&{u2^T Xv2}\end{array}\right)  

 =\left(\begin{array}{cc}{u1^T Xv1}&O\\{u2^T X v1}&O\end{array}\right)

=\left(\begin{array}{cc}{u1^T u1 \Sigma}&O\\{u2^T u1 \Sigma}&O\end{array}\right)

=\left(\begin{array}{cc}\Sigma&O\\O&O\end{array}\right) = w ,

since u1^T \ u1 = I and u2^T \ u1 = O (the columns of u are mutually orthogonal).

Finally, since u^T \ X \ v = w with u and v orthogonal, we rearrange that into the usual form:

X= u \ \left(\begin{array}{cc}\Sigma&O\\O&O\end{array}\right) \ v^T 

X = u \ w \ v^T 
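And here is the whole construction assembled end to end on the hypothetical rank-2 example, checking numerically that it really did work:

```python
import numpy as np

# assemble the whole factorization for the hypothetical rank-2 example
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
A = X.T @ X

evals, v = np.linalg.eigh(A)
order = np.argsort(evals)[::-1]
evals, v = evals[order], v[:, order]
r = int(np.sum(evals > 1e-10))

Sigma = np.diag(np.sqrt(evals[:r]))
u1 = X @ v[:, :r] @ np.linalg.inv(Sigma)
u2 = np.linalg.svd(u1)[0][:, r:]      # complete the basis (stand-in for Gram-Schmidt)
u = np.hstack([u1, u2])

# w has Sigma in the northwest block and zeros elsewhere
w = np.zeros((5, 4))
w[:r, :r] = Sigma

assert np.allclose(u @ w @ v.T, X)    # it worked: X = u w v^T
```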


(Whew! It takes too long to do all that, and i may have defeated the purpose of explaining it. Dang!)


