Quaternions: introduction

Introduction

Let me start by listing three references, none of which has everything I would want.

  1. Kuipers’ “Quaternions and Rotation Sequences” is on my bibliographies page.
  2. Kantor and Solodovnikov’s “Hypercomplex Numbers: An Elementary Introduction to Algebras” (Springer 1989, 0 387 96980 2) puts quaternions in the context of number systems.
  3. Pertti Lounesto’s “Clifford Algebras and Spinors” (Cambridge 1997, 0 521 59916 4) had one result I wanted, namely a complex matrix representation. I wouldn’t recommend this book for quaternions, but I find it indispensable for Clifford algebras.

Quaternions were invented by William Rowan Hamilton on October 16, 1843. He may have considered them the greatest achievement of his rather stellar mathematical career. He had been looking for a 3D analog of the complex numbers; we know today that the properties he was hoping to find can only hold in dimensions 1,2,4,8. (They give us the real numbers, the complex numbers, the quaternions, and the octonions.)

The simplest way to define a quaternion is to write

q = a + b I + c J + d K,

where a,b,c,d are real numbers, and the symbols I, J, K have the following properties:

I^2 = J^2 = K^2 = I\ J\ K = -1\ .

That should remind you of the complex numbers: z = a + b I, with I^2 = -1\ . We have two additional imaginary units, and there is one additional property relating all three of them.

We add two quaternions the way we add complex numbers, by adding corresponding terms together: the two real terms, the two I-terms, the two J-terms, the two K-terms.

What about multiplying them?

To be precise, I should say that we assume that whatever the multiplication rule is, it is associative and that it is distributive over addition.

By writing I J K = -1 without parentheses I have already implicitly assumed associativity: namely, that (I J) K = I (J K). Let us play with that equation. We start with …

I J K = -1

Multiply by I on the left…

I I J K = -I

-J K = -I

and multiply by -1…

J K = I.

Similarly, we could multiply that equation by J on the left, and get

J J K = J I

-K = J I

Multiply that equation on the left by K, getting

– K K = K J I

1 = K J I.

Among other things, we have just shown that I J K = – K J I. (Order matters; the multiplication is not commutative.) We could continue doing things like this, and we would end up with

I J = K, K I = J, J K = I

and

I J = – J I, I K = – K I, K J = – J K.

These six equations are often taken as the defining relations among I, J, K, but I prefer to start with one, I J K = -1, as Hamilton did, because I can derive the six from the one.
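To make "continue doing things like this" concrete, here is one way the remaining relations can be worked out. It is only a sketch: it uses nothing beyond associativity, I^2 = J^2 = K^2 = -1, and the two results already obtained above (J K = I and J I = -K).

```latex
\begin{aligned}
I J K = -1 \;&\Rightarrow\; I J K K = -K \;\Rightarrow\; -I J = -K \;\Rightarrow\; I J = K,\\
I J = K \;&\Rightarrow\; I (I J) = I K \;\Rightarrow\; I K = -J,\\
K I \;&=\; (I J) I = I (J I) = I (-K) = -I K = J,\\
K J \;&=\; (I J) J = I (J J) = -I = -J K.
\end{aligned}
```

Together with J I = -K = -I J, that accounts for all six equations.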

The key is that multiplication among the I J K symbols is not commutative: order matters. That should remind you of matrices. If you’ve seen matrix multiplication, you’ve seen a non-commutative multiplication.

Oh, not that I need to derive those six from the one. Those six equations correspond to the cross products of the unit vectors i, j, k:

i \times j = k, k \times i = j, j \times k = i

and

i \times j = - j \times i, i \times k = - k \times i, k \times j = - j \times k\ .

I can work them out using the right-hand rule for cross products. That’s no accident. The vector cross product was derived from the quaternion product. The quarrel (and it was a quarrel) between those who championed quaternions and those who championed vectors was heated because a vector was originally part of a quaternion. In fact, it was Hamilton who first used the term vector in the modern sense, specifically for the IJK components of a quaternion. While Gibbs and Heaviside thought, as we do today, that the three-dimensional IJK parts of quaternions were particularly well-suited to three-dimensional calculations, the champions of quaternions (Tait comes to mind) viewed a separate vector analysis as an act of destruction, ripping quaternions apart.

We do not, after all, spend a lot of time on purely imaginary numbers — we pretty much require complex numbers. In contrast, vector analysis is tantamount to working with the imaginary part of quaternions, throwing away the real part.

Back to multiplication. Associativity and distributivity mean that we multiply quaternions the way we multiply complex numbers: just do it, but then collect similar terms (in I, J, K, and otherwise). Of course, we have to remember that products involving I,J,K do not commute.

I can say that more cleanly: expand the product using the distributive law, paying attention to the order of I,J,K, and collect terms.
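Here is a minimal sketch of that recipe in Python (an assumption on my part, for readers without Mathematica). A quaternion is stored as a tuple (a, b, c, d); the names UNIT_PRODUCT and qmul_expand are hypothetical, chosen only for illustration. Each of the 16 products of units is looked up in a small table, in order, and like terms are collected.

```python
# UNIT_PRODUCT[m][n] = (sign, index) for the product of unit m with unit n,
# where the units 1, I, J, K are numbered 0, 1, 2, 3.
UNIT_PRODUCT = [
    [(1, 0), (1, 1), (1, 2), (1, 3)],     # 1*1 = 1,  1*I = I,   1*J = J,   1*K = K
    [(1, 1), (-1, 0), (1, 3), (-1, 2)],   # I*1 = I,  I*I = -1,  I*J = K,   I*K = -J
    [(1, 2), (-1, 3), (-1, 0), (1, 1)],   # J*1 = J,  J*I = -K,  J*J = -1,  J*K = I
    [(1, 3), (1, 2), (-1, 1), (-1, 0)],   # K*1 = K,  K*I = J,   K*J = -I,  K*K = -1
]

def qmul_expand(p, q):
    """Multiply quaternions (a, b, c, d) by expanding all 16 terms in order."""
    out = [0, 0, 0, 0]
    for m, pm in enumerate(p):
        for n, qn in enumerate(q):
            sign, index = UNIT_PRODUCT[m][n]
            out[index] += sign * pm * qn   # collect like terms
    return tuple(out)

p = (1, 2, 0, 0)   # 1 + 2 I
q = (3, 0, 4, 0)   # 3 + 4 J
print(qmul_expand(p, q))   # (3, 6, 4, 8):  3 + 6 I + 4 J + 8 K
print(qmul_expand(q, p))   # (3, 6, 4, -8): the K term flips sign
```

The only difference between the two products is the sign of the K term, which comes from I J = K versus J I = -K.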

Let me collect the terminology I’ve been using, and add some more.

For the quaternion

q = a + b I + c J + d K

a is called the real part;
b I + c J + d K is called both the imaginary part and the vector part;
if a = 0, the quaternion is said to be pure;
the conjugate q* is defined by negating the vector part, while leaving the real part untouched: q* = a – b I – c J – d K.

In addition, Mathematica® can deal with quaternions. Let’s take a couple. (Integers are convenient, not essential.)

Where I would write

q1 = 1 + 2 I + 3 J + 4 K
q2 = 1 – I + J – 2 K,

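I would hand Mathematica these inputs as Quaternion[1, 2, 3, 4] and Quaternion[1, -1, 1, -2] and get the same outputs back (we need to explicitly load the Quaternions package first). Their product q1 ** q2 is Quaternion[8, -9, 4, 7].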

Let’s look at that. In particular, where did that 8 come from? It came from 4 terms. (And remember that I^2 = -1, etc.)

  1. the product of the two real parts: 1 ** 1 = 1;
  2. the product of the two I parts: 2 I ** -I = 2;
  3. the product of the two J parts: 3 J ** J = -3;
  4. the product of the two K parts: 4 K ** -2 K = 8

and then 1 + 2 – 3 + 8 = 8.

Where did that -9 come from? It came from 4 terms, too.

  1. the product of one real and one I term: 1 ** -I = – I;
  2. the product of the other real and the other I term: 1 ** 2 I = 2 I;
  3. the product of one J and one K term: 3 J ** -2 K = -6 J K = -6 I;
  4. the product of the other K term and the other J term: 4 K ** 1 J = 4 K J = -4 I;

and then -I + 2I – 6I – 4 I = -9 I.

Similar calculations lead to the 4J and the 7K.

We could write that all out once and for all in general, by letting Mathematica do the work:

If you don’t have Mathematica and need to code that up, go ahead.
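For instance, a minimal Python sketch of the general product might look like the following. The tuple layout (a, b, c, d) and the function name qmul are assumptions made for illustration, not Mathematica's API; the four expressions are what you get by expanding the product and collecting terms as in the worked example above.

```python
def qmul(p, q):
    """Hamilton product p ** q for quaternions stored as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,   # real part
        a1*b2 + b1*a2 + c1*d2 - d1*c2,   # I component
        a1*c2 - b1*d2 + c1*a2 + d1*b2,   # J component
        a1*d2 + b1*c2 - c1*b2 + d1*a2,   # K component
    )

q1 = (1, 2, 3, 4)
q2 = (1, -1, 1, -2)
print(qmul(q1, q2))   # (8, -9, 4, 7)
print(qmul(q2, q1))   # (8, 11, 4, -3): same real part, different vector part
```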

Quaternions and vectors

We can also write that out another way. In addition to the functions which Mathematica provides, I wanted a few more of my own.

Mathematica can take the real part of a quaternion. For our two quaternions q1 and q2, the real part is 1 in each case.

I want a function that extracts the vector (imaginary) part of a quaternion: given a quaternion, it delivers a vector. (I’ve been doing this a lot.) Applied to the product q1 ** q2 = Quaternion[8, -9, 4, 7], it delivers the vector {-9, 4, 7}.

Given that vectors are the imaginary parts of quaternions, we should expect that the scalar and vector products of vectors are contained in the product of two quaternions.

Let us look at the quaternion product of two pure quaternions. Now I want a function that extracts a pure quaternion: given any quaternion, it delivers a pure quaternion by setting the real part to zero. (I’ve been doing this, too, quite a bit.) From q1 we get the pure quaternion Quaternion[0, 2, 3, 4], and from q2 we get Quaternion[0, -1, 1, -2]. Their quaternion product is Quaternion[7, -10, 0, 5]; the dot product of the vectors {2, 3, 4} and {-1, 1, -2} is -7, and their cross product is {-10, 0, 5}.

We see that the real component of the quaternion product is the negative dot product of the extracted vectors; the vector part of the quaternion product is the cross product of the extracted vectors.

That’s where the vector cross product came from, historically!
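A small Python sketch can confirm this for the pure parts of q1 and q2. The helper names (vec, pure, dot, cross, qmul) are hypothetical stand-ins for the Mathematica functions described above, and quaternions are assumed to be stored as tuples (a, b, c, d).

```python
def qmul(p, q):
    """Hamilton product for quaternions stored as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def vec(q):
    """Vector (imaginary) part of a quaternion, as a 3-tuple."""
    return tuple(q[1:])

def pure(q):
    """Pure quaternion with the same vector part (real part set to zero)."""
    return (0,) + tuple(q[1:])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

q1 = (1, 2, 3, 4)
q2 = (1, -1, 1, -2)
product = qmul(pure(q1), pure(q2))
print(product)                                           # (7, -10, 0, 5)
print(-dot(vec(q1), vec(q2)), cross(vec(q1), vec(q2)))   # 7 (-10, 0, 5)
```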

Now, that was the product of two pure quaternions. What about the general case?

It becomes convenient to compress the vector part, symbolically writing

P = {po, p}
Q = {qo, q}

although the expressions for components barely change, acquiring a second set of braces around the vector parts:
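In this compressed notation the general product has real part po qo - p.q and vector part po q + qo p + p×q, which reduces to the pure-quaternion result when po = qo = 0. Here is a hedged Python sketch of that form; the name qmul_sv and the nested-tuple layout (real, (x, y, z)) are assumptions made for illustration.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def qmul_sv(P, Q):
    """Product of P = (p0, p) and Q = (q0, q), where p and q are 3-vectors."""
    p0, p = P
    q0, q = Q
    real = p0*q0 - dot(p, q)
    pxq = cross(p, q)
    vector = tuple(p0*qi + q0*pi + ci for pi, qi, ci in zip(p, q, pxq))
    return (real, vector)

P = (1, (2, 3, 4))      # q1 in scalar/vector form
Q = (1, (-1, 1, -2))    # q2 in scalar/vector form
print(qmul_sv(P, Q))    # (8, (-9, 4, 7))
```

The extra terms po q + qo p are exactly what the pure-quaternion product lacks.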

(For those of you who’ve seen the Clifford product… the Clifford product of two vectors p and q would combine the dot product and the cross product as p.q + p*q, apparently adding apples and oranges. The quaternion product has additional terms, except in the case of pure quaternions.)

Quaternions as 2×2 matrices

There is yet another representation of quaternions — as 2×2 complex matrices. We write

w + I x + J y + K z = \left(\begin{array}{cc} w+i z & i x+y \\ i x-y & w-i z\end{array}\right)

Let me write functions to do that translation. Going from a quaternion to a matrix is easy, because we have w, x, y, z.

Going the other way just requires the usual tricks when we want the real components of a complex number. Given the LHS matrix below, to find the RHS entries…

\left(\begin{array}{cc} a & b \\ c & d\end{array}\right)=\left(\begin{array}{cc} w+i z & y+i x \\ -y+i x & w-i z\end{array}\right)

we find w, for example, by adding a and d, and dividing by 2.
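A minimal Python sketch of the two translations, assuming quaternions are stored as tuples (w, x, y, z) and matrices as nested tuples of Python complex numbers; the names to_matrix and from_matrix are hypothetical. The other three components come out the same way as w: by adding or subtracting two entries, halving, and taking a real or imaginary part.

```python
def to_matrix(q):
    """Quaternion (w, x, y, z) -> 2x2 complex matrix ((a, b), (c, d))."""
    w, x, y, z = q
    return ((complex(w, z), complex(y, x)),
            (complex(-y, x), complex(w, -z)))

def from_matrix(m):
    """Inverse translation: recover (w, x, y, z) from ((a, b), (c, d))."""
    (a, b), (c, d) = m
    w = ((a + d) / 2).real   # a + d = 2w
    z = ((a - d) / 2).imag   # a - d = 2iz
    y = ((b - c) / 2).real   # b - c = 2y
    x = ((b + c) / 2).imag   # b + c = 2ix
    return (w, x, y, z)

q1 = (1, 2, 3, 4)
c1 = to_matrix(q1)
print(c1)                # ((1+4j), (3+2j)), ((-3+2j), (1-4j))
print(from_matrix(c1))   # (1.0, 2.0, 3.0, 4.0)
```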

Let’s try it. Here are our two general quaternions P and Q converted to matrices c1 and c2 respectively, and their matrix product; note the order c2 times c1, even though I’m going to compare the matrix product with P**Q.

c1 = \left(\begin{array}{cc} a+i d & i b+c \\ i b-c & a-i d\end{array}\right)

c2 = \left(\begin{array}{cc} \alpha +i \delta  & i \beta +\gamma  \\ i \beta -\gamma  & \alpha -i \delta \end{array}\right)

c3 = \left(\begin{array}{cc} (i b-c) (i \beta +\gamma )+(a+i d) (\alpha +i \delta ) & (a-i d) (i   \beta +\gamma )+(i b+c) (\alpha +i \delta ) \\ (a+i d) (i \beta -\gamma )+(i b-c) (\alpha -i \delta ) & (i b+c) (i   \beta -\gamma )+(a-i d) (\alpha -i \delta )\end{array}\right)

We can confirm that the inverse works:

But why reverse the matrix product?

There is a reason for that. I am certain that I can find a matrix representation which does not reverse the product (yes, I’m sure you can, too), but in the application of quaternions to rotations, we will find that we must reverse the matrix product. Since we have to reverse the matrix product when we deal with rotation matrices, we might as well be consistent and do it here, too.

(I do wonder if this choice of complex matrix has anything to do with a 2D representation of 3D rotations. If so, such a connection might mandate this choice of complex matrix. I’ll keep my eyes open.)

Having gotten the matrix c3, let’s get the corresponding quaternion, and compare it to P**Q; note that P ~ c1 and Q ~ c2 — that’s why I say the orders are reversed. Here is c3 converted to a quaternion… a re-computation of the quaternion product… and a test of equality…

In a specific example, here are our two familiar quaternions and their quaternion product:

Here are the two quaternions converted to matrices, q1 to c1 and q2 to c2…

c1 = \left(\begin{array}{cc} 1+4 i & 3+2 i \\ -3+2 i & 1-4 i\end{array}\right)

c2 = \left(\begin{array}{cc} 1-2 i & 1-i \\ -1-i & 1+2 i\end{array}\right)

Here is the matrix product c3 = c2 c1:

c3 = \left(\begin{array}{cc} 8+7 i & 4-9 i \\ -4-9 i & 8-7 i\end{array}\right)

And here is the product matrix c3 converted to a quaternion:

c3 -> Quaternion[8,-9,4,7]

which is exactly what we got for q1 ** q2 directly.
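The same check can be sketched in Python without Mathematica. The helper names (to_matrix, from_matrix, matmul) are hypothetical, and the tuple layouts are assumptions; the point is only that the reversed matrix product c2 c1 corresponds to q1 ** q2, while c1 c2 corresponds to q2 ** q1.

```python
def to_matrix(q):
    """Quaternion (w, x, y, z) -> 2x2 complex matrix ((a, b), (c, d))."""
    w, x, y, z = q
    return ((complex(w, z), complex(y, x)),
            (complex(-y, x), complex(w, -z)))

def from_matrix(m):
    """Recover (w, x, y, z) from the matrix ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return (((a + d) / 2).real, ((b + c) / 2).imag,
            ((b - c) / 2).real, ((a - d) / 2).imag)

def matmul(m, n):
    """Ordinary product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

c1 = to_matrix((1, 2, 3, 4))      # q1
c2 = to_matrix((1, -1, 1, -2))    # q2
c3 = matmul(c2, c1)               # note the reversed order
print(from_matrix(c3))                # (8.0, -9.0, 4.0, 7.0), i.e. q1 ** q2
print(from_matrix(matmul(c1, c2)))    # (8.0, 11.0, 4.0, -3.0), i.e. q2 ** q1
```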

Absolute value, inverse, and division

What about magnitude? We recall that the squared magnitude of a complex number z = a + b I was

|z|^2 = a^2 + b^2

and, more importantly, that it could be computed as

|z|^2 = z z*,

i.e. by multiplying z by its conjugate. Strictly speaking, z z* is a complex number, and written as an ordered pair it’s (zz*, 0). But we are used to switching seamlessly between the complex number x + 0 i and the real number x; it just looks a little funny when we switch between the complex number written as an ordered pair (zz*, 0) and the real number zz*… or between the quaternion (qq*, 0, 0, 0) and the real number qq*, as we will below.

The same is true for quaternions; and Mathematica can compute the conjugate… and then we compute qq*, and we would identify the quaternion having zero imaginary part with a real number.

Since the result is real, the order doesn’t matter: qq* = q*q.

We can also get the square root just by asking Mathematica for the absolute value:
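As a rough Python equivalent (a sketch under the assumption that quaternions are tuples (a, b, c, d); conj, norm, and qabs are hypothetical names, not the Mathematica functions), we can check that q q* and q* q are the same real quaternion and that the absolute value of q1 is the square root of 30.

```python
import math

def qmul(p, q):
    """Hamilton product for quaternions stored as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def conj(q):
    """Negate the vector part, leave the real part untouched."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def norm(q):
    """The real number q q* (the sum of the squares of the components)."""
    return qmul(q, conj(q))[0]

def qabs(q):
    """Absolute value: the square root of the norm."""
    return math.sqrt(norm(q))

q1 = (1, 2, 3, 4)
print(qmul(q1, conj(q1)))   # (30, 0, 0, 0)
print(qmul(conj(q1), q1))   # (30, 0, 0, 0)
print(qabs(q1))             # sqrt(30), about 5.477
```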

Once we had the sum of squares of a complex number, we were able to compute an inverse for any nonzero complex number. From z = a + b I, we got

z^{-1}\ = z* / (z z*).

All we did was multiply the undefined symbol 1/z by an appropriate form of 1, namely z* / z*, and realize that we now had something we could compute: z* / (z z*).

The same trick works for any nonzero quaternion:

q^{-1}\ = q* / (q q*).

And the inverse is unique, because the conjugate is unique, and we’re dividing by a real number.

Once we had the inverse of a complex number, we could divide two numbers by multiplying by the inverse of the denominator, in either order, because complex multiplication is commutative…

z1 / z2 := z1\ z2^{-1} = z2^{-1}\ z1

The same thing works for quaternions. We can multiply by the inverse.

But.

We can do it in two different ways, and we will get two different answers. Division of quaternions is not unique, and we have to think of it as multiplication by an inverse… in either of two distinct ways, pre-multiplying or post-multiplying by the inverse.

Although the inverse is unique, the two products are not equal. Just as we might gleefully compute the matrix products A^{-1}\ B and B\ A^{-1}\ — but never write a matrix division, B/A — so we can write the two quaternion products

q2^{-1}\ q1 \text{ and } q1\ q2^{-1}

but never write the quaternion division q1 / q2 — because we can’t tell what order to use.

The inverse of q2 is not only unique; it is both a left inverse and a right inverse for q2. But there are two ways to “divide” q1 by q2, and they are different.

As an example, here’s q2 and its inverse q2i:
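In Python, a hedged sketch of the same example could use exact fractions. It assumes integer components and the tuple layout (a, b, c, d); the names inverse and qmul are hypothetical. It computes q2i = q2* / (q2 q2*), confirms that it is both a left and a right inverse, and then forms the two distinct "quotients" of q1 by q2.

```python
from fractions import Fraction

def qmul(p, q):
    """Hamilton product for quaternions stored as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def inverse(q):
    """q* / (q q*) for a nonzero quaternion with integer components."""
    a, b, c, d = q
    n = a*a + b*b + c*c + d*d   # the real number q q*
    return (Fraction(a, n), Fraction(-b, n), Fraction(-c, n), Fraction(-d, n))

q1 = (1, 2, 3, 4)
q2 = (1, -1, 1, -2)
q2i = inverse(q2)               # components 1/7, 1/7, -1/7, 2/7
print(qmul(q2, q2i) == (1, 0, 0, 0), qmul(q2i, q2) == (1, 0, 0, 0))   # True True

post = qmul(q1, q2i)            # q1 q2^{-1}: (-6, 13, 2, 1) / 7
pre = qmul(q2i, q1)             # q2^{-1} q1: (-6, -7, 2, 11) / 7
print(post == pre)              # False
```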

Now let’s look at unit quaternions. Just take a quaternion and divide by its absolute value. Mathematica can do that. The name, “Sign”, may seem weird. (It seems weird to me.) As with some other functions, this works for numerical quaternions. Applied to the symbolic quaternion P, it gives nothing; applied to q1, it gives a unit quaternion:

Quaternions in polar form

Now for one last representation: the polar form, magnitude and angle.

We know we can write a complex number z = x + i y as r\ e^{i\ \theta}\ , where r is the absolute value of z and e^{i\ \theta}\ is a complex number of length 1. Let me rephrase that:

  • we can write any complex number as the product of its absolute value times a unit complex number;
  • we can write any unit complex number as a complex exponential.

More, we can write any unit complex number as cos \theta + i\ sin \theta\ … and if we were to write that as an ordered pair, it’s (cos \theta, sin \theta)\ .

We can do almost the same thing for any quaternion. Step one is easy: given an arbitrary quaternion, we write it as its absolute value times a unit quaternion. We start with…

P = Quaternion[a,b,c,d]

Let M = \sqrt{a^2+b^2+c^2 + d^2}\ , and write

P = M (P/M),

and u = P/M is a unit quaternion, say

u = Quaternion[A,B,C,D].

For step two… for our unit quaternion we have

A^2 + B^2 + C^2 + D^2 = 1

= A^2 + (B^2 + C^2 + D^2)

=: cos^2 \theta + sin^2 \theta

which suggests that we rewrite our unit quaternion as

u = (cos \theta, (\beta,\gamma,\delta) sin \theta)\ ,

where \beta^2 + \gamma^2 + \delta^2 = 1\ . That is, \beta\ sin\theta = B\ , etc.

Let me show you. We start with P in general, q1 in particular:

P = Quaternion[a,b,c,d]

q1 = Quaternion[1,2,3,4]

and we have seen that q1 is not a unit quaternion:

Abs[q1] = \sqrt{30}\ .

(That’s radians followed by degrees.)

That last line is the unit quaternion computed directly; the second-to-last line is our polar representation. They are the same.
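The two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: quaternions are tuples (a, b, c, d), the vector part is nonzero (otherwise the axis is undefined), and the names polar and from_polar are hypothetical.

```python
import math

def polar(q):
    """Return (M, theta, axis) with q = M (cos theta, axis sin theta)."""
    a, b, c, d = q
    M = math.sqrt(a*a + b*b + c*c + d*d)        # absolute value of q
    v = math.sqrt(b*b + c*c + d*d)              # length of the vector part
    theta = math.atan2(v, a)                    # cos theta = a/M, sin theta = v/M
    axis = (b / v, c / v, d / v)                # unit vector (beta, gamma, delta)
    return M, theta, axis

def from_polar(M, theta, axis):
    """Rebuild the quaternion from its polar form."""
    s = math.sin(theta)
    return (M * math.cos(theta),) + tuple(M * s * x for x in axis)

q1 = (1, 2, 3, 4)
M, theta, axis = polar(q1)
print(M)                               # sqrt(30)
print(theta, math.degrees(theta))      # the angle in radians, then degrees
print(from_polar(1.0, theta, axis))    # the unit quaternion q1 / sqrt(30)
print(tuple(x / M for x in q1))        # the same, computed directly
```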

And that’s the beginning of the connection between quaternions and rotations. The vector part of the quaternion may be taken as the axis of rotation… but we will see that twice the angle of the quaternion may be taken as the angle of rotation. We’ll see this in a later post.

Summary

We may write a quaternion as

  • q = a + b I + c J + d K
  • an ordered quadruple (a,b,c,d)
  • a real and vector part (qo, v) = (a, (b,c,d) )
  • a complex 2×2 matrix
  • its magnitude times a unit quaternion, q = M u
  • and we may write a unit quaternion u using an angle \theta\ and a unit vector (\beta,\gamma,\delta\ ), i.e. u = (cos \theta, (\beta,\gamma,\delta) sin \theta)\ .

For the quaternion q = a + b I + c J + d K,
a is called the real part;
b I + c J + d K is called the vector part or the imaginary part.

The products of the symbols I, J, K satisfy the equations: I^2 = J^2 = K^2 = I\ J\ K = -1\ . This implies that I J = – J I, etc. That is, multiplication of the symbols is not commutative.

We add quaternions componentwise.
We multiply quaternions using a sometimes messy-looking rule which amounts to applying the distributive laws, using the rules for multiplying the symbols I, J, K, and collecting the four real and imaginary terms.
We could write out the product of two quaternions as:
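For reference, here is the component form, written for p = a + b I + c J + d K and q = \alpha + \beta I + \gamma J + \delta K (the symbols used for the general quaternions P and Q in the matrix section). It is a reconstruction obtained by expanding and collecting terms, and it reproduces q1 ** q2 = Quaternion[8,-9,4,7].

```latex
\begin{aligned}
p\,q ={}& (a\alpha - b\beta - c\gamma - d\delta) \\
       &+ (a\beta + b\alpha + c\delta - d\gamma)\, I \\
       &+ (a\gamma - b\delta + c\alpha + d\beta)\, J \\
       &+ (a\delta + b\gamma - c\beta + d\alpha)\, K
\end{aligned}
```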

We could also write that product out as
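In the scalar and vector notation P = {po, p}, Q = {qo, q}, with po written as p_0 below, a reconstruction of that second form is:

```latex
\{p_0, \mathbf{p}\}\,\{q_0, \mathbf{q}\}
  = \{\, p_0 q_0 - \mathbf{p}\cdot\mathbf{q},\;
        p_0\,\mathbf{q} + q_0\,\mathbf{p} + \mathbf{p}\times\mathbf{q} \,\}
```

For pure quaternions (p_0 = q_0 = 0) this reduces to the negative dot product and the cross product.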

The conjugate of q is defined as q* = a – b I – c J – d K.
The norm of q is defined as qq*.
The absolute value of q is the square root of its norm: |q| = Sqrt[qq*].
The inverse of a nonzero q is defined as q^{-1}\ = q* / (qq*).
We can construct a unit quaternion u from a nonzero q by dividing q by its absolute value: u = q / Sqrt[qq*].

Mathematica permits us to define quaternions… multiply them… compute their conjugate, norm, and absolute value. In addition, Mathematica can compute a few other things, and I found it useful to augment the functions. We’ll see them all down the road.


5 Responses to “Quaternions: introduction”

  1. Chris Says:

    This is really cool. I never knew the historical context for vectors and cross products, or the connection with rotations. Excellent post.

  2. rip Says:

    Thanks, Chris. Welcome.

    Rip

  3. Richard Fowell Says:

    Excellent introduction!

    Kuipers’ material on quaternions was released as a government report in 1994 (five years before the book version). The report can be freely downloaded from the US Government Defense Technical Information Center, here:
    http://handle.dtic.mil/100.2/ADA322836

    You might add that link to the bibliography page, for those interested in a free electronic copy. (Myself, I prefer the hardcopy, of which I own two)

    My favorite text on introducing quaternions is the chapter in:
    Brand, Louis, “Vector and Tensor Analysis”, John Wiley & Sons, (c) 1947. { Chapter X, Quaternions, pp. 403-429 }.

    Used copies are available: try http://www.bookfinder.com

    Brand provides a rigorous introduction – proofs for everything. In 26 pages it provided me a solid foundation that was lacking in the engineering papers on quaternions I’d been relying on.

    One of the problems from the problem set at the end of that chapter: “Prove the famous theorem of Euler (1776): Any displacement of a rigid body which leaves the point O fixed is equivalent to a rotation about an axis through O.”

  4. rip Says:

    Hi Richard,

    Thank you for all this info. I might look for the Brand book myself, and maybe others will be interested in that or the earlier Kuipers.

    rip

  5. Snow Says:

    Thank you for this info. Amazingly simple explanation that no one else seemed to be able to so eloquently break down.

