attitude & transition matrices etc. – corrected 5-18, 6-13

I’ve made two changes in total, and you can search on “correction”.

“etc” is the inverse transition matrix, but I didn’t want a longer title.

I know of no natural case in which we specify a linear coordinate transformation by giving its inverse attitude matrix, as such, but I’ll keep my eyes open.

The key relationship among them is this: to say that P is a transition matrix is equivalent to saying that P^T is an attitude matrix. (The inverse transition matrix, of course, is P^{-1}.) In the special case that P is orthogonal, the inverse is the transpose, P^T = P^{-1}, so the inverse transition matrix is the attitude matrix.

(OK, did you catch that? If our coordinate transformation is orthogonal, then the inverse attitude matrix is the transition matrix, so any time we specify a rotation by its transition matrix, we have just specified it by its inverse attitude matrix. This doesn’t count. I’m interested in the specification conceptually, and for that I know of no case where we specify what should be understood primarily as the inverse attitude matrix.)
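A quick numerical sanity check of that chain of identities, as a sketch in numpy (the 30° rotation is an arbitrary choice of mine):

```python
import numpy as np

# A made-up orthogonal transition matrix P: rotate the basis by 30 degrees.
t = np.radians(30.0)
P = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

attitude = P.T                     # attitude matrix = transpose of transition matrix
inv_transition = np.linalg.inv(P)  # inverse transition matrix

# For orthogonal P, the inverse transition matrix IS the attitude matrix.
print(np.allclose(inv_transition, attitude))   # True
```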

transition matrices

I can think of essentially two ways in which transition matrices are the natural expression of a linear coordinate transformation.

Before we get to that, just what is a transition matrix? It’s any invertible square matrix, interpreted as a change-of-basis, whose columns are interpreted as the old components of the new basis vectors.

That came out too convoluted. Any invertible square matrix may be considered a transition matrix. Then its columns are the old components of the new basis vectors.

That means that the effect of a transition matrix is to map new components of a vector to old ones. If x and y are the old and new components of some vector, and P is the transition matrix, then

x = P y.

(The columns of P are the old components of the new basis vectors; the new components of the new basis vectors look like (1,0,0), (0,1,0), etc.)
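Here’s a minimal numpy sketch of that mapping; the matrix and components are made-up numbers, not from anything above:

```python
import numpy as np

# Hypothetical transition matrix: its columns are the old components
# of the new basis vectors.
P = np.array([[1.0, 1.0],
              [0.0, 2.0]])

y = np.array([3.0, 4.0])   # new components of some vector
x = P @ y                  # old components of the same vector: x = P y
print(x)                   # [7. 8.]

# Sanity check: the new components of the first new basis vector are (1, 0),
# and P @ (1, 0) recovers its old components, i.e. the first column of P.
print(P @ np.array([1.0, 0.0]))   # [1. 0.]
```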

Now, back to our question: when might we find that a coordinate transformation has been specified by a transition matrix?

One very specific case is the diagonalization of a matrix: if A is diagonable to D, then

D = P^{-1}\ A\ P

where P is a transition matrix (specifically, one whose columns are eigenvectors of A; the eigenvectors are the new basis, so the columns of A (correction: the columns of P) are the old components of the new basis vectors). Let x be the old components of a vector, and let y be its new components. Then

x = P\ y

D is a matrix wrt new components, so v = D y is a plausible thing to compute; but then

v = D\ y = P^{-1}\ A\ P\ y = P^{-1}\ A\ x

and then if we let u = P v (i.e. let u be the old components of v), we get

u = P\ v = A\ x\ .

That is (viewing A as an active transformation, rather than as a change of basis itself – we have one of those already!), we have two vectors. In old components they are x and u, and u = A x. In new components they are y and v, and v = D y.
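A small numpy sketch of that whole round trip, with a made-up diagonalizable A (note that numpy’s eigenvector matrix P is generally not orthogonal, which is fine here):

```python
import numpy as np

# A made-up diagonalizable matrix A (the active transformation).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

evals, P = np.linalg.eig(A)   # columns of P are eigenvectors: P is the transition matrix
D = np.diag(evals)

x = np.array([1.0, 5.0])      # old components of a vector, made up
y = np.linalg.solve(P, x)     # its new components: y = P^{-1} x

u = A @ x                     # old components of the transformed vector
v = D @ y                     # new components of the transformed vector

# Same transformed vector, two coordinate systems: u = P v.
print(np.allclose(u, P @ v))  # True
```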

Let me emphasize that for a similarity transformation, P need not be orthogonal. It can be, and we often choose it to be, but it doesn’t have to be. If P is not orthogonal, then we cannot replace P^{-1}\ by P^T\ .

More generally (and I wouldn’t count this as an additional case, but you might), the general change of basis equation for matrices A and B which represent the same linear operator is

B = Q^{-1}\ A\ P\ ,

where both P and Q are transition matrices. A and B need not be square, but they are of the same dimensions, and P and Q must be square and invertible. If A and B are not square, then P and Q are different sizes. The SVD provides an example of that: if X = u\ w\ v^T is 5×2, for example, then u is 5×5, w is 5×2, and v is 2×2. (Yes, the SVD is an example of a change-of-basis.)
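A quick check of those sizes in numpy (random made-up data; numpy calls the factors u, s, and vh, and returns the singular values as a vector):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(5, 2))   # a made-up 5x2 matrix
u, w, vT = np.linalg.svd(X)                        # full SVD

print(u.shape, vT.shape)   # (5, 5) (2, 2)

# Rebuild the 5x2 middle factor from the vector of singular values:
W = np.zeros((5, 2))
np.fill_diagonal(W, w)
print(np.allclose(X, u @ W @ vT))   # True: X = u w v^T
```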

The second way in which a transition matrix is the natural expression of the coordinate transformation is: when we want to make a change of variable, a substitution.

Suppose, for example, we are given a second degree polynomial equation in two variables. The general case is

A\ x^2 + B\ x\ y + C\ y^2 + D\ x + E\ y + F = 0\ .

I hope you know that’s a conic section: parabola, ellipse, hyperbola, or possibly a line or two. Its shape is anything we could get by intersecting an infinite (& double) cone with a plane.

Can we figure out what it is without simply plotting it? (And it’s not all that simple to plot, either.)

Yes. Rotate the coordinate system to get rid of the x y term. We can recognize such equations when we have no cross term.

Rotate the coordinate system? Change the variables to diagonalize the quadratic terms. Changing notation completely, we write

\left(\begin{array}{l} x \\ y\end{array}\right) = P\ \left(\begin{array}{l} u \\ v\end{array}\right)

where (x,y) are the old coordinates of a point (components of the position vector), and (u,v) are the new components of the position vector. That P maps new components to old tells us that P is a transition matrix.

Incidentally, we are going to write both row vectors (x,y) and (u,v) as well as column vectors. The equation for row vectors is the transpose:

\left(x,\ y\right) = \left(u,\ v\right)\ P^T

Now define a matrix M of quadratic coefficients

M = \left(\begin{array}{ll} A & \frac{B}{2} \\ \frac{B}{2} & C\end{array}\right)

from the given equation, and figure out what P must be to diagonalize M. That’s easy: P must have eigenvectors of M for columns; since M is symmetric, eigenvectors belonging to distinct eigenvalues are automatically orthogonal, and since P is to be a rotation, we simply make the columns of unit length.

If you’ve never seen that in a calculus & analytic geometry book, play around with it. The given quadratic terms are

\left(x,\ y\right)\ \left(\begin{array}{ll} A & \frac{B}{2} \\ \frac{B}{2} & C\end{array}\right)\ \left(\begin{array}{l} x \\ y\end{array}\right)

=

A\ x^2 + B\ x\ y + C\ y^2

After we diagonalize M, we have M = P\ D\ P^{-1} (note that I “solved for” M, not D). Then

(x,y)^T\ M (x,y) = (x,y)^T P\ D\ P^{-1} (x,y)

=

(u,v)^T P^T P\ D\ P^{-1} P\ (u,v) =  (u,v)^T D\ (u,v)\ ,

because P was made orthogonal. Because D is diagonal, there are no cross terms in u v. We will have an expression in u^2 and v^2 (possibly with one or both coefficients zero!) and we can name that conic. If, for example, u^2 and v^2 have nonzero coefficients with the same sign, then the conic is an ellipse. Further, the transition matrix P tells us what the new coordinate system is, in which the conic has a simple form: the major and minor axes of an ellipse would lie on the u and v axes. But P only specifies a rotation, so we would still have to complete the square in the u,v coordinates to figure out where the center of the ellipse was.

BTW, if the only thing I needed to figure out was the type of conic, I would be done as soon as I had the eigenvalues: they are the diagonal of D, and they tell me what kind of conic I have.
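Here’s a sketch of that classification in numpy, with made-up coefficients A = 5, B = 4, C = 2 (eigh, for symmetric matrices, returns orthonormal eigenvectors, so P is orthogonal as required):

```python
import numpy as np

# Made-up quadratic coefficients: 5x^2 + 4xy + 2y^2 + (linear terms) = 0.
A, B, C = 5.0, 4.0, 2.0
M = np.array([[A, B / 2.0],
              [B / 2.0, C]])

evals, P = np.linalg.eigh(M)  # eigh gives orthonormal eigenvectors for symmetric M
print(evals)                  # [1. 6.]: both nonzero, same sign -> an ellipse

# With P orthogonal, P^{-1} = P^T, so P^T M P is the diagonal D: no cross term left.
print(np.round(P.T @ M @ P, 10))
```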

So much for transition matrices. Well, the key is that it’s the transition matrix we need for change-of-basis equations for matrices, especially for diagonalizing matrices; then we just understand that the transition matrix maps new components to old.

inverse transition matrices

In a sense, they are the most obvious way to specify a coordinate transformation. But I’ve already shown you why we don’t always do it “the most obvious way”. In this case, I tell you that the new components are such-and-such functions of the old ones. If those functions are linear, as for example

\begin{array}{c} X=x \cos (\theta )+y \sin (\theta ) \\ Y=y \cos (\theta )-x \sin (\theta ) \\ Z=z\end{array}

then they could be written

\left(\begin{array}{l} X \\ Y \\ Z\end{array}\right) = \left(\begin{array}{lll} \cos (\theta ) & \sin (\theta ) & 0 \\ -\sin (\theta ) & \cos (\theta ) & 0 \\ 0 & 0 & 1\end{array}\right)\ \left(\begin{array}{l} x \\ y \\ z\end{array}\right)

and that matrix

Rz(\theta) = \left(\begin{array}{lll} \cos (\theta ) & \sin (\theta ) & 0 \\ -\sin (\theta ) & \cos (\theta ) & 0 \\ 0 & 0 & 1\end{array}\right)

is the inverse transition matrix. It must be: instead of mapping new components to old, it maps old to new. Letting w and W be old and new components respectively, we know

w = P\ W

P^{-1}w = W = Rz(\theta)\ w\ .
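A numpy sketch of that, using the Rz above (the 40° angle and the components are arbitrary choices of mine):

```python
import numpy as np

def Rz(t):
    # The matrix from the text: it maps OLD components to NEW, so it is the
    # inverse transition matrix for a rotation of the axes about z thru t.
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

t = np.radians(40.0)            # arbitrary angle
w = np.array([1.0, 2.0, 3.0])   # old components, made up

W = Rz(t) @ w                   # new components: W = P^{-1} w

# Rz(t) is orthogonal, so the transition matrix P is simply its transpose:
P = Rz(t).T
print(np.allclose(w, P @ W))    # True: P maps new components back to old
```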

If it weren’t that transition matrices were so important, we would probably have named the inverse transition matrix instead. But that’s not how it goes.

Actually, the reason we use the transition matrix is that it contains old components of the new basis vectors. If we were to let Q = P^{-1}\ , then our diagonalization equation would become

D = Q\ A\ Q^{-1}

and what’s the big deal? Nothing. Couldn’t we just define Q to have eigenvectors as rows….

No, NO, Noooo! “Oh, no, Mr. Bill!”

That would be just fine if we made Q orthogonal, but if we don’t want an orthogonal transformation, then our eigendecomposition gets us P, and we would have to compute P^{-1} to get Q. Bear in mind that we almost never compute P^{-1}\ A\ P, so we almost never need to compute P^{-1}\ , because the eigendecomposition gives us D (from the eigenvalues); we don’t have to compute D. The transition matrix P is fundamental to a similarity transformation.

So, if I give you the matrix which maps old components to new, then I have given you the inverse transition matrix.

If the matrix is orthogonal, the inverse transition matrix is also the transpose. If the matrix is unitary, the inverse transition matrix is the conjugate transpose.

attitude matrices

I know of two ways in which an attitude matrix is the natural way to specify a coordinate transformation.

First, as I’ve said before, is that we are given the new basis vectors; that is, we are given the old components of the new basis vectors. If I tell you the new x axis is the vector (1,1,0), I have given you its old components. (Its new ones, of course, are (1,0,0).)

Now lay out the old components of the new vectors as rows of a matrix. That matrix is called the attitude matrix. We do this because it gives us a matrix-like equation. If the old basis vectors are named e_1\ , e_2\ , e_3\ and the new ones are named f_1\ , f_2\ , f_3\ , and if A is the attitude matrix, then we could write

\left(\begin{array}{l} f_1 \\ f_2 \\ f_3\end{array}\right) = A\ \left(\begin{array}{l} e_1 \\ e_2 \\ e_3\end{array}\right)

OK, that might be convenient. And that’s why we might specify the attitude matrix, if we’re going to write equations involving basis vectors by name.

The key property, however, is that the attitude matrix is the transpose of the transition matrix. Come on, it’s got the old components of the new basis vectors laid out in rows; the transition matrix has them laid out in columns: they are transposes of each other by definition.
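A tiny numpy illustration, reusing the (1,1,0) example plus two more made-up new basis vectors:

```python
import numpy as np

# Old components of three made-up new basis vectors; the first is the (1,1,0) example.
attitude = np.array([[ 1.0, 1.0, 0.0],   # f_1, laid out as a ROW
                     [-1.0, 1.0, 0.0],   # f_2
                     [ 0.0, 0.0, 1.0]])  # f_3

transition = attitude.T   # the same vectors laid out as COLUMNS

# The transition matrix maps new components to old: the new components (1,0,0)
# of f_1 should come back as its old components (1,1,0).
print(transition @ np.array([1.0, 0.0, 0.0]))   # [ 1.  1.  0.]
```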

The second way I know of is geometrical, usually for rotations but it need not be limited to them (correction: for rotations only). If someone says you can get the new coordinate axes by rotating the old ones in such-and-such a way, he has just given you the attitude matrix.

For 3D, visualize a set of three orthonormal vectors at the origin, visualize it as a physical object; someone has told us how to rotate that object in space.

This arises, for example, with orbits, whether satellite orbits around the earth, or planetary orbits around the sun, or any others. Consider planetary orbits, only because I’ve played with these. Suppose we’re interested in the orbit of Mars.

Our old coordinate system uses the plane of the earth’s orbit as the xy-plane; z is normal to earth’s orbit and the earth is moving CCW around the z-axis. The x-axis points toward the vernal equinox. (I think they specify a particular date, called the epoch, since the vernal equinox does move, albeit very slowly.) The y-axis is chosen to give a right-handed coordinate system. Oh, the origin is at the sun. This is a coordinate system for the solar system.

Now, voyager, to Mars. (Oh, those went to Jupiter and beyond.) What we start with is the equation of a conic in the form

r=\frac{p}{e \cos (\theta )+1}

where r is the distance from the sun, and \theta\ is measured from perihelion (closest approach, called periapse in general).

If we want an associated cartesian coordinate system, it is very natural to take

  • x = r cos \theta
  • y = r sin \theta
  • z normal to the plane of the orbit, to give a right-handed coordinate system.

And then what we have is, for Mars anyway, an ellipse whose major axis lies on the x-axis, minor axis on the y, origin at the sun. Call these Mars coordinates.

Cool. Give me \theta\ , and I can tell you r, i.e. where Mars is. On its orbit. Or give me time t, and I can get \theta\ . (The challenge is that direction: getting the angle \theta\ from time t means solving Kepler’s equation.)
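Here’s a sketch of that computation; the function name and the values of p and e are made up for illustration, not real Mars data:

```python
import numpy as np

def orbit_position(p, e, theta):
    # Position in the orbit's own ("Mars") coordinates: r from the conic
    # r = p / (1 + e cos theta), x toward perihelion, theta measured from
    # perihelion, z out of the orbit plane.
    r = p / (1.0 + e * np.cos(theta))
    return np.array([r * np.cos(theta), r * np.sin(theta), 0.0])

# Made-up illustrative values for p and e:
print(orbit_position(1.5, 0.09, np.radians(60.0)))
```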

Yeah, but I’d like to know where it is wrt my old coordinate system: where in the solar system is Mars? Strictly speaking, what I’m going to tell you is, where in the solar system is Mars’ orbit?

What people often give us, for the coordinate transformation, are three angles: inclination i, longitude of the ascending node \Omega\ , and argument of periapse \omega\ . What do they mean? How do we use them?

The orbit of Mars and the orbit of earth lie in distinct planes through the sun, which therefore intersect in a line, called the line of nodes. That line, of course, lies in both planes, and in particular, it lies in our old xy-plane. The longitude of the ascending node \Omega\ specifies the angle of that line wrt the x-axis, measured in whatever direction has Mars ascending (moving from negative z to positive z).

So if we rotate about the z-axis thru \Omega\ , we move our old x-axis to the line of nodes.

Now look down that line: we have two planes making an angle, and that’s i, the inclination. If we do a rotation about the new x-axis (i.e. about the line of nodes), we can move our old z-axis so that it’s lined up with the new one, Mars’ z-axis. Alternatively, we are moving our old xy-plane so that it lies in Mars’ xy-plane.

The only thing left is that our current x-axis is still the line of nodes. We need to rotate it to Mars’ x-axis; that’s a rotation about the new (Mars) z-axis, and the angle required is, you got it, the argument of periapse, \omega\ .

To get from the solar system coordinate system (old) to Mars’ coordinate system (new), we start with the old axes, rotate about z thru \Omega\ , then rotate about x thru i, and then rotate about z thru \omega\ .

What makes this so simple is that after we do, say, the first rotation about z thru \Omega\ , the next rotation is written as a rotation about x thru i. It is about the intermediate x-axis, the line of nodes, but the matrix is our standard “rotation about the x-axis”. (I’ll show you these in the next post.)

Any such sequence of rotations about coordinate axes is called an Euler-angle sequence. They always exist. They are not unique – we used ZXZ, but there are 11 other sequences we could have used. And, in some cases, it can be difficult to automatically find the angles that work. But this kind of transformation is a typical application.

Oh, I need to repeat that the end result of that sequence of rotations is the attitude matrix between solar system and Mars, with the solar system as old coordinate system.
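Here’s a numpy sketch of building that attitude matrix from the three angles, using the old-to-new convention of Rz above and an analogous Rx; the angles are arbitrary illustrative values, not Mars’ actual elements:

```python
import numpy as np

def Rz(t):
    # Old-to-new coordinate rotation about the z-axis through angle t.
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def Rx(t):
    # Old-to-new coordinate rotation about the x-axis through angle t.
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

# Arbitrary illustrative angles, NOT Mars' real orbital elements:
Omega, inc, omega = np.radians([30.0, 10.0, 60.0])

# ZXZ sequence: about z thru Omega, then x thru i, then z thru omega.
# Old-to-new matrices compose by left multiplication, so Rz(Omega) acts first:
attitude = Rz(omega) @ Rx(inc) @ Rz(Omega)

# Its rows are the solar-system (old) components of the Mars (new) basis
# vectors; for a rotation it is also the inverse transition matrix.
print(np.round(attitude, 4))
print(np.allclose(attitude @ attitude.T, np.eye(3)))  # orthogonal, as a rotation must be
```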

There is a picture of the angles here, at the bottom of the page.
