the matrix exponential: 3 of 3

discussion

There was a lot going on in the example of the SN decomposition (2 of 3). First off, we found eigenvalues of a non-diagonable matrix A, and constructed a diagonal matrix D from them. Then we found 2 eigenvectors and 1 generalized eigenvector of A, and used them to construct a transition matrix P. We used that transition matrix to go from our diagonal D back to the original basis, and find S similar to D.

So S is diagonable while A is not. And A and S have the same eigenvalues; and the columns of P should be eigenvectors of S. They are. The generalized eigenvector that we found for A is an (ordinary) eigenvector of S, but we had to get a generalized eigenvector of A in order to construct S from D.

I wonder. Can I understand the distinction between eigenvectors and generalized eigenvectors by studying S and A? We’ll see.

I would also remark that A was special in one sense: it was lower triangular. It is not an accident that the eigenvalues of A are its diagonal elements. We could have written the diagonal matrix D by inspection. Instead, I got it together with the eigenvectors.

A question: what does P do to A? We compute

P^{-1}\ A\ P = \left(\begin{array}{lll} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1\end{array}\right)

It brings A to Jordan Canonical Form. (Did I make an especially good choice for the generalized eigenvector? I don’t know. It may be that any choice would have led to the JCF; or maybe not. My vague recollection from examples years ago is that I lucked out, that in general I have to use some of that considerable freedom I found in v to get JCF.)

If finding generalized eigenvectors is relatively painless for you, you may be happy with the N+S decomposition. (Of course, if you have a “matrix exponential” command available, you’re done.) If not, another possibility is to use the Cayley-Hamilton theorem: any matrix satisfies it’s own characteristic equation. In this example, the characteristic equation is

\lambda ^3-5 \lambda ^2+8 \lambda -4 = 0

(The roots of that are the eigenvalues, of course.) The Cayley-Hamilton says that

A^3\ - 5\ A^2 + 8\ A - 4 = 0

where the RHS must be the 3×3 zero matrix, and the –4 on the LHS must multiply the 3×3 identity matrix. And, indeed, A and its powers satisfy that equation.

I used to wonder what the Cayley-Hamilton theorem was good for. One thing it’s good for is turning higher powers of A into lower ones. Use it to express A^3 in terms of I, A, and A^2. then reduce A^4, and keep going. For our example, we could replace the infinite series in A by 3 infinite series of scalars: one series multiplies I, another multiplies A, and the third multiplies A^2. Maybe we could see the patterns for the three scalar series more easily than a pattern for the series in A itself.

There is yet another way to so this; we’ll see it when I look at the spectral decomposition theorem out of Halmos.

Ah, there’s one last point I want to make. Schur’s lemma told us that any matrix could be brought to upper triangular form. I didn’t say this, but any nilpotent matrix is similar to a strictly upper triangular matrix (i.e. upper triangular with zero diagonal). It’s also similar to a strictly lower triangular form; I think upper or lower should just depend on an ordering of the basis.

So why not have just split our triangular matrix into diagonal plus nilpotent? Because we need S and N to commute. Our example is a perfect illustration: we have a lower triangular matrix

A = \left(\begin{array}{lll} 1 & 0 & 0 \\ -1 & 2 & 0 \\ 1 & 1 & 2\end{array}\right)

so we can write it as the sum of diagonal \Sigma

\Sigma = \left(\begin{array}{lll} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2\end{array}\right)

and nilpotent B

B = \left(\begin{array}{lll} 0 & 0 & 0 \\ -1 & 0 & 0 \\ 1 & 1 & 0\end{array}\right)

While B is, indeed, nilpotent, it turns out that it’s B^3 which is equal to zero instead of B^2.

Do B and \Sigma commute? We compute one

B\ \Sigma = \left(\begin{array}{lll} 0 & 0 & 0 \\ -1 & 0 & 0 \\ 1 & 2 & 0\end{array}\right)

and the other

\Sigma \ B = \left(\begin{array}{lll} 0 & 0 & 0 \\ -2 & 0 & 0 \\ 2 & 2 & 0\end{array}\right)

No, they do not commute. It is nontrivial that we can find N and S which commute. We needed to go from D back to S, and get N = A – S.

Incidentally, the online reference linked back at the beginning of 1 of 3 shows that schur’s lemma itself can be used to compute the matrix exponential numerically, although the triangular form which schur’s lemma gives us cannot be used to get N and S. (I.e. we can use schur’s lemma as the basis for a different algorithm.)

the matrix exponential: 2 of 3

let’s look at an example of the SN decomposition.

This comes from Perko p. 35, example 3. We will compute exp(A) using the SN decomposition described in the previous post (“1 of 3″).

We take the following matrix:

A = \left(\begin{array}{lll} 1 & 0 & 0 \\ -1 & 2 & 0 \\ 1 & 1 & 2\end{array}\right)

I ask Mathematica® to find its eigenvalues and eigenvectors. For eigenvalues, I get

\lambda = \{2,\ 2,\ 1\}

For the eigenvector matrix, I get

\left(\begin{array}{lll} 0 & 0 & -1 \\ 0 & 0 & -1 \\ 1 & 0 & 2\end{array}\right)

We have three eigenvalues, one repeated. More importantly, we have only two eigenvectors. By that column of zeroes, Mathematica® is telling us that the repeated eigenvalue 2 has only one eigenvector

\{0,\ 0,\ 1\}

associated to it.

Let’s look for a generalized eigenvector for \lambda = 2. We solve {\left(A - 2I\right)}^2\ v = 0. A – 2 I is

A - 2 I = \left(\begin{array}{lll} -1 & 0 & 0 \\ -1 & 0 & 0 \\ 1 & 1 & 0\end{array}\right)

and {\left(A - 2I\right)}^2 is

{\left(A - 2I\right)}^2 = \left(\begin{array}{lll} 1 & 0 & 0 \\ 1 & 0 & 0 \\ -2 & 0 & 0\end{array}\right)

so we apply {\left(A - 2I\right)}^2 to an arbitrary vector and set the result to 0…

0 = \left(\begin{array}{lll} 1 & 0 & 0 \\ 1 & 0 & 0 \\ -2 & 0 & 0\end{array}\right) \left(\begin{array}{l} v1 \\ v2 \\ v3\end{array}\right)

and find that

\{\text{v1} = 0,\text{v1} = 0,-2 \text{v1} = 0\}

Mother Nature may not scream her secrets at us very often, but she’s sure about one thing: v1 = 0. That’s fine, because just as clearly, v2 and v3 are arbitrary. So one simple choice for a generalized eigenvector is {0, 1, 0}, and then my P matrix becomes

P = \left(\begin{array}{lll} 0 & 0 & -1 \\ 0 & 1 & -1 \\ 1 & 0 & 2\end{array}\right)

Our diagonal matrix of eigenvalues is

D = \left(\begin{array}{lll} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1\end{array}\right)

We undo the similarity transform to get S:

S = P\ D\ P^{-1} = \left(\begin{array}{lll} 1 & 0 & 0 \\ -1 & 2 & 0 \\ 2 & 0 & 2\end{array}\right)

We compute N = A – S

N = A - S = \left(\begin{array}{lll} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -1 & 1 & 0\end{array}\right)

Now we start computing powers of N. We don’t have to go far at all, because we compute and find that N^2 = 0. That is, in the expansion of exp(N), the only nonzero terms are I + N, so we compute it:

exp(N) = I + N = \left(\begin{array}{lll} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 1 & 1\end{array}\right)

and the matrix exp of D is, of course,

exp(D) = \left(\begin{array}{lll} e^2 & 0 & 0 \\ 0 & e^2 & 0 \\ 0 & 0 & e\end{array}\right)

Finally,

exp(A) = P\ exp(D)\ P^{-1}\ exp(N) \ ,

i.e.

exp(A) = \left(\begin{array}{lll} e & 0 & 0 \\ e-e^2 & e^2 & 0 \\ -2 e+e^2 & e^2 & e^2\end{array}\right)

I think this is just so slick. Harry Potter, eat your heart out: I know some true magic. (Don’t misunderstand, I do like the Harry Potter books. Right now I’m struggling thru the first one in German.)

the matrix exponential: 1 of 3

discussion

one correction 30 Aug 2008, per Brian Hall’s comment and counterexample here. see “edit” in this post.

Let me emphasize something I take for granted: i’m writing about theory as opposed to computations by computer. every time I say, for example, that two matrices A and B are similar via

B = P^{-1}\ A\ P\ ,

I am asserting that P is a transition matrix for a change-of-basis, and by definition it is invertible. In practice, the matrix P can be ill-conditioned, and computing  P^{-1} on a computer may be extremely hazardous.

An updated version of a classic paper may be found here . Having just looked at it, I may pepper my theoretical discussion with some vaguely computational comments.

Back to my ivory tower. Well, it’s hardly that, but it’s still far removed from numerical algorithms.

The definition of the matrix exponential is easy enough to write: as the exponential of a number x has the expansion

e^x = 1 + x + x^2\ /\ 2+ x^3\ /\ 3!\ +\ ...

we define

exp(A) = I + A + A^2\ /\ 2 + A^3\ /\ 3!\ +\ ....

But how do we actually compute it (in theory)? Well, we could just try computing the powers of A and look for a pattern, but there are easier ways. Far easier ways. (and just doing it numerically can be a bad idea: we can encounter disastrous cancellation between one power of A and the next.)

Even though I have a Mathematica® command which computes matrix exponentials for me, it’s nice to have some idea how I could check a computation. (I suspect that if the difficulties are numerical, anything else I try will be wrecked on the same rocks. OTOH, at least once a Mathematica® update broke some perfectly reliable code. It is plausible to try alternate methods.)

First off, if the matrix A is diagonal, with diagonal elements \left(d_1,\ ...,\ d_n\right) then its square A^2 is diagonal with diagonal elements \left({d_1}^2,\ ...,\ {d_n}^2\right) and its kth power A^k is diagonal with diagonal elements \left({d_1}^k,\ ...,\ {d_n}^k\right)\ . That is, powers of a diagonal matrix are exactly powers of the diagonal elements. It follows that if A is diagonal, exp(A) is just the exponentials of the diagonal elements: exp(A) is diagonal with elements \left(e^{d_1},\ ...,\ e^{d_n}\right)\ . We saw such an example here in Jz

Jz = \left(\begin{array}{lll} \hbar  & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -\hbar \end{array}\right)

when we computed the matrix exponential (note the negative sign among all the multipliers)

\text{MatrixExp}\left[-\frac{i\ \text{Jz}\ \theta   }{\hbar }\right]

and got

\left(\begin{array}{lll} e^{-i \theta } & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{i \theta }\end{array}\right)

(All those multipliers are a minor issue: as we have seen before, multiplying a matrix by a scalar multiplies all the eigenvalues but has no effect on the eigenvectors of the matrix.)

Second, if a matrix A is not diagonal but is diagonable to D, then we have

D = P^{-1}\ A\ P\ ,

i.e.

A = P\ D\ P^{-1}\ .

then

A^2 = P\ D\ P^{-1}\ P\ D\ P^{-1} = P\ D^2\ P^{-1}

and, in general,

A^k = P\ D^k\ P^{-1}\ .

That is, all the interior P’s and P-inverses cancel each other, leaving us with the same similarity transform. It follows that if A is diagonable to D, then
exp(A) = P\ exp(D)\ P^{-1}\ . So we just do a similarity transform of A to get D, take the exponential of D, and then reverse the similarity transform for A^k as for A. (As I said, P^{-1} may challenging in practice, but in theory either it exists or it doesn’t.)

(Although we need the transformation to diagonal form for getting exp(D), there was nothing special about D in the cancellation of terms: if A and B are similar, with

B = P^{-1}\ A\ P\ ,

then powers of A and B are similar, with

B^k= P^{-1}\ A^k\ P\ .)

If a matrix is not diagonable, then there are a number of different ways to handle it. If we could transform it to Jordan Canonical Form (JCF) , then we would just need to get the exponential of a Jordan matrix. But transforming a matrix to JCF is not all that easy, and there’s a lot of different kinds of them to find the exponential of.

While we’re on the subject…. If a matrix A can be diagonalized, the matrix P which accomplishes that consists of linearly independent eigenvectors. In fact, we see how diagonalization must fail if it fails. If we have enough linearly independent eigenvectors to form P, then P must diagonalize A. if A cannot be diagonalized, we must be unable to form P: we must not have enough linearly independent eigenvectors. (An nxn matrix which does not have n linearly independent matrices is called defective.)

But, if A cannot be diagonalized, we must still be able to bring it to JCF – or another non-diagonal canonical form – by a similarity transform, i.e. by a change of basis. I wish I had thought to ask a simple question years ago: what is the matrix that accomplishes that? Answer: we add columns of “generalized eigenvectors” to the columns of eigenvectors. There must be a similarity transformation; it just can’t have eigenvectors for all its columns.

One way to describe generalized eigenvectors g of A is this: let \lambda be an eigenvalue of the nxn matrix A of multiplicity m <= n; then for k = 1, … m, (i.e. for any power up to the multiplicity of \lambda\ ) any nonzero solution v of

(A - \lambda\ I)^k\ v = 0

is called a generalized eigenvector of A. An eigenvector satisfies that equation with k = 1.

If we find enough eigenvectors, we never go beyond k = 1. if we have an eigenvalue of multiplicity 2 that has only one eigenvector, then we look at (A - \lambda\ I)^2\ v = 0, and stop. Et cetera.

It turns out that we don’t actually have to go to JCF. There is another kind of matrix for which it is extremely easy to compute the matrix exponential: a matrix N is called nilpotent if some (positive) power k of N is zero: N^k= 0. Then all higher powers of N are also zero, and the infinite series for the matrix exponential of N terminates at the last nonzero power: it becomes a (finite) polynomial. (Yes, i’m taking k to be the smallest power of N which is zero.)

Returning to non-diagonable matrices, we construct a matrix P whose columns are eigenvectors and generalized eigenvectors. Along the way, we got all the eigenvalues of A. it turns out that we can write the SN decomposition

A = S + N

where

  • S is diagonable,
  • N is nilpotent,
  • and (very important) N and S commute, SN = NS.

Furthermore, our matrix P diagonalizes S:

D = P^{-1}\ S\ P\ .

but we come at that backwards: we get the eigenvalues, we construct the diagonal matrix D from them, and then we use P (which requires finding generalized eigenvectors) to get S as

S = P\ D\ P^{-1}\ ,

and then we get N as

N = A – S.

Finally, all of that is useful for the matrix exponential because if matrices B and C commute, then

exp(B+C) = exp(B) exp(C)

just as e^{x+y} = e^x\ e^y\ . Edit: (In fact, the converse holds, too: B and C commute if and only if exp(B+C) = exp(B) exp(C) ).

Correct is:

e^{(B+C)\ t} = e^{B\ t}\ e^{C\ t}\ \text{for all t if and only if BC = CB}

(Golub & van Loan, “Matrix Computations”, 2nd ed. p. 559)

end edit

Because S and N commute, we can compute the matrix exponential of a non-diagonable A as the product of two very simple matrix exponentials:

exp(A) = exp(S+N) = exp(S)\ exp(N) = P\ exp(D)\ P^{-1}\ exp(N)\ ,

i.e. as the product of: a similarity transform of a diagonal matrix D, and a polynomial in N.

books added – 25 May

References for generalized eigenvectors, the matrix exponential and for the SN decomposition.

one correction 30 Aug 2008, per Brian Hall’s comment below. see “edit” in this post.

First, let’s talk about books which are already in the bibliography. It’s pretty easy to find statements that for matrix exponentials,

exp(A+B) = exp(A) exp(B)

if A and B commute; edit: it’s a little rarer to find the full (and true) “if and only if” A and B commute.

Correct is:

e^{(A+B)\ t} = e^{A\ t}\ e^{B\ t}\ \text{for all t if and only if AB = BA}

(Golub & van Loan, “Matrix Computations”, 2nd ed. p. 559)

end edit

What’s very rare is the SN decomposition,

A = N + S,

where N is nilpotent, S is diagonable, and N and S commute.

Halmos’ “Finite Dimensional Vector Spaces” has very little about the matrix exponential, but it includes existence and uniqueness of the SN decomposition as an exercise! More generally, Halmos has 3 consecutive sections on triangular form, nilpotence, and jordan form, i.e. on the underlying nature of eigenspaces.

Strang’s “Linear Algebra and Its Applications” has an appendix on jordan form, but I don’t believe he mentions generalized eigenvectors. More generally, Strang illustrates the matrix exponential for solving matrix differential equations.

Stewart’s “Introduction to Matrix Computations” has a different definition of generalized eigenvectors.

I’ll be looking in all three of those to finally make sense of generalized eigenvectors. (It would be so easy if repeated eigenvalues always led to generalized eigenvectors, but they don’t; and I will plead that it all looks more reasonable after I know that generalized eigenvectors exist: I was long out of graduate school before I heard of them.)

In addition to linear algebra texts, we may find the matrix exponential in dynamical systems texts and in lie algebra / lie groups texts.

The following books have been added to the bibliography.

The classic text by Hirsch & Smale, “Differential Equations, Dynamical Systems, and Linear Algebra”, has the NS decomposition. This is one of “the” books. I’ll confess that their examples were a lot easier to follow than their theorems. Unfortunately, it’s out of print, so you need to seek a used copy; or go with Perko, below.

Be careful. There is a 2nd edition, with an additional author (i.e. Hirsch, Smale, Devaney) and a changed title: “Differential Equations, Dynamical Systems, & An Introduction to Chaos, Second Edition.” The back cover advertises “Simplified treatment of linear algebra”. It does not contain the SN decomposition.

Fortunately, there is another book on dynamical systems which has the SN decomposition, and it’s a book I like a lot: Perko’s “Differential Equations and Dynamical Systems”. In addition to having the NS decomposition, this has plenty of problems involving generalized eigenvectors, and I recommend it specifically for that, too.

Finally, one of my introductory texts on Lie stuff, Hall’s “Lie Groups, Lie Algebras, and Representations”, also has the NS decomposition. Any book on matrix Lie groups has to have the matrix exponential, but this is the only one in which I have found the SN decomposition.

Hall, Brian C. Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. Springer, 2003.
ISBN 0 387 40122 9.
[lie groups, lie algebras, representations; 25 May 2008]
This is a math book (for this subject, there are a lot of physics books). It is readable: it discusses its theorems and definiitons. It is primarily about matrix groups rather than lie groups, but it has lie groups in the appendices. As its title promises, it treats lie algebras, and representations. This is one of my favorite introductions to the subject.

Hirsch, Morris W., Smale, Stephen, and Devaney, Robert L. Differential Equations, Dynamical Systems & An Introduction to Chaos. Elsevier Acadameic Press, 2004 (2nd ed).
ISBN 0 12 349703 5.
[applied linear algebra, differential equations, dynamical systems, discrete systems, chaos; 25 May 2008]
This is a far different book from the 1st edition. It looks like a rather fine introduction to the first edition, and a good supply of examples.

Hirsch, Morris W., Smale, Stephen. Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press, 1974.
ISBN 0 12 349550 4.
[applied linear algebra, differential equations, dynamical systems; 25 May 2008]
This is one of “the” books, even if it is out of print! (And Smale is a Fields medallist.) And yet, I find their examples a lot more informative than their theorems.

Perko, Lawrence. Differential Equations and Dynamical Systems. Springer, 2001 (3rd ed).
ISBN 0 387 95116 4
[applied linear algebra, differential equations, dynamical systems; 25 May 2008]
I really like this book: it’s readable, it’s got lots of examples and pictures, and it covers a lot of ground. It is an upper division / beginning graduate math book.