one correction 30 Aug 2008, per Brian Hall’s comment and counterexample here. see “edit” in this post.
Let me emphasize something I take for granted: i’m writing about theory as opposed to computations by computer. every time I say, for example, that two matrices A and B are similar via
I am asserting that P is a transition matrix for a change-of-basis, and by definition it is invertible. In practice, the matrix P can be ill-conditioned, and computing on a computer may be extremely hazardous.
An updated version of a classic paper may be found here . Having just looked at it, I may pepper my theoretical discussion with some vaguely computational comments.
Back to my ivory tower. Well, it’s hardly that, but it’s still far removed from numerical algorithms.
The definition of the matrix exponential is easy enough to write: as the exponential of a number x has the expansion
But how do we actually compute it (in theory)? Well, we could just try computing the powers of A and look for a pattern, but there are easier ways. Far easier ways. (and just doing it numerically can be a bad idea: we can encounter disastrous cancellation between one power of A and the next.)
Even though I have a Mathematica® command which computes matrix exponentials for me, it’s nice to have some idea how I could check a computation. (I suspect that if the difficulties are numerical, anything else I try will be wrecked on the same rocks. OTOH, at least once a Mathematica® update broke some perfectly reliable code. It is plausible to try alternate methods.)
First off, if the matrix A is diagonal, with diagonal elements then its square is diagonal with diagonal elements and its kth power is diagonal with diagonal elements . That is, powers of a diagonal matrix are exactly powers of the diagonal elements. It follows that if A is diagonal, exp(A) is just the exponentials of the diagonal elements: exp(A) is diagonal with elements . We saw such an example here in Jz
when we computed the matrix exponential (note the negative sign among all the multipliers)
(All those multipliers are a minor issue: as we have seen before, multiplying a matrix by a scalar multiplies all the eigenvalues but has no effect on the eigenvectors of the matrix.)
Second, if a matrix A is not diagonal but is diagonable to D, then we have
and, in general,
That is, all the interior P’s and P-inverses cancel each other, leaving us with the same similarity transform. It follows that if A is diagonable to D, then
. So we just do a similarity transform of A to get D, take the exponential of D, and then reverse the similarity transform for A^k as for A. (As I said, may challenging in practice, but in theory either it exists or it doesn’t.)
(Although we need the transformation to diagonal form for getting exp(D), there was nothing special about D in the cancellation of terms: if A and B are similar, with
then powers of A and B are similar, with
If a matrix is not diagonable, then there are a number of different ways to handle it. If we could transform it to Jordan Canonical Form (JCF) , then we would just need to get the exponential of a Jordan matrix. But transforming a matrix to JCF is not all that easy, and there’s a lot of different kinds of them to find the exponential of.
While we’re on the subject…. If a matrix A can be diagonalized, the matrix P which accomplishes that consists of linearly independent eigenvectors. In fact, we see how diagonalization must fail if it fails. If we have enough linearly independent eigenvectors to form P, then P must diagonalize A. if A cannot be diagonalized, we must be unable to form P: we must not have enough linearly independent eigenvectors. (An nxn matrix which does not have n linearly independent matrices is called defective.)
But, if A cannot be diagonalized, we must still be able to bring it to JCF – or another non-diagonal canonical form – by a similarity transform, i.e. by a change of basis. I wish I had thought to ask a simple question years ago: what is the matrix that accomplishes that? Answer: we add columns of “generalized eigenvectors” to the columns of eigenvectors. There must be a similarity transformation; it just can’t have eigenvectors for all its columns.
One way to describe generalized eigenvectors g of A is this: let be an eigenvalue of the nxn matrix A of multiplicity m <= n; then for k = 1, … m, (i.e. for any power up to the multiplicity of ) any nonzero solution v of
is called a generalized eigenvector of A. An eigenvector satisfies that equation with k = 1.
If we find enough eigenvectors, we never go beyond k = 1. if we have an eigenvalue of multiplicity 2 that has only one eigenvector, then we look at , and stop. Et cetera.
It turns out that we don’t actually have to go to JCF. There is another kind of matrix for which it is extremely easy to compute the matrix exponential: a matrix N is called nilpotent if some (positive) power k of N is zero: . Then all higher powers of N are also zero, and the infinite series for the matrix exponential of N terminates at the last nonzero power: it becomes a (finite) polynomial. (Yes, i’m taking k to be the smallest power of N which is zero.)
Returning to non-diagonable matrices, we construct a matrix P whose columns are eigenvectors and generalized eigenvectors. Along the way, we got all the eigenvalues of A. it turns out that we can write the SN decomposition
A = S + N
- S is diagonable,
- N is nilpotent,
- and (very important) N and S commute, SN = NS.
Furthermore, our matrix P diagonalizes S:
but we come at that backwards: we get the eigenvalues, we construct the diagonal matrix D from them, and then we use P (which requires finding generalized eigenvectors) to get S as
and then we get N as
N = A – S.
Finally, all of that is useful for the matrix exponential because if matrices B and C commute, then
exp(B+C) = exp(B) exp(C)
just as . Edit: (In fact, the converse holds, too: B and C commute if and only if exp(B+C) = exp(B) exp(C) ).
(Golub & van Loan, “Matrix Computations”, 2nd ed. p. 559)
Because S and N commute, we can compute the matrix exponential of a non-diagonable A as the product of two very simple matrix exponentials:
i.e. as the product of: a similarity transform of a diagonal matrix D, and a polynomial in N.