Regression 1: ADM polynomials – 3 (Odds and Ends)

Edit Jan 29: a reference to the diary post of Jan 21 has been corrected to refer to Jan 14.

There are several things I want to show you, all related to our orthogonal polynomial fits.

  • Can we fit a 7th degree polynomial to our 8 data points? Yes.
  • We can do it using regression.
  • We can do it using Lagrange Interpolation.
  • Did Draper & Smith use the same orthogonalized data? Yes, but not normalized.
  • How did Draper & Smith get their values? They looked them up.
  • Were their values samples of Lagrange polynomials? No.

The bottom line is that, starting with half-integral values of x, all I need is the Orthogonalize command to apply Gram-Schmidt to the powers of x; that's what I did here. I don't need to look up a set of equations or a pre-computed table of orthogonal vectors. Furthermore, I can handle arbitrary data which is not equally spaced.
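For the record, here is a minimal sketch of that one command at work (the variable names are mine):

```mathematica
(* the 8 centered x values: half-integers *)
xs = Range[-7/2, 7/2];
(* rows are the powers x^0 .. x^7; Orthogonalize applies Gram-Schmidt to the rows *)
zo = Transpose[Orthogonalize[Table[xs^p, {p, 0, 7}]]];
(* the columns are orthonormal: zo'zo is the identity matrix *)
Transpose[zo].zo // MatrixForm
```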

Can we fit a 7th degree polynomial to our 8 data points? Yes.
We can do it using regression.

Let me finish by showing you what happens if I add x^7. Here’s the result of centering the 8 years 1986..1993:

Here’s the dependent variable again:

Now compute powers of x, up through x^7, and append the y values… and set names:
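In outline, the setup looks like this (a sketch; I use symbolic stand-ins for the y values, which are in the earlier posts):

```mathematica
xs = Range[1986, 1993] - 3979/2;           (* centered years: -(7/2) .. 7/2 *)
ys = Array[Subscript[y, #] &, 8];          (* stand-ins for the dependent variable *)
(* powers x^1 .. x^7 as rows, y appended, then transposed so each is a column *)
data = Transpose[Append[Table[xs^p, {p, 1, 7}], ys]];
names = Append[Table["X" <> ToString[p], {p, 1, 7}], "Y"]
```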

Run stepwise… (I’ll remark that I almost always run stepwise rather than a single specific model. It gives me context for the fit of interest, in general.)

Oh, come on! That should work.

It did. We still get an error on the last regression, but we already know there is no problem with the first six fits (other than multicollinearity). Let’s check the inversion on the last fit: no warnings.

What’s going on?

Let’s try to look at the details of this regression.

Ah ha! That 1/0 occurs because n – k = 0: we have 8 observations and 8 fitted parameters (the constant plus seven powers of x). Notice the “1” sitting between two “Missing”? That’s the RSquared… it was computed just fine, thank you.

Since R^2 = 1 – SSE/SST, an R^2 of 1 means the sum of the squared errors (SSE) must be zero, and if a sum of non-negative numbers is zero, then each one of them – each one of the squared errors – must be zero. We have a fitted equation with zero errors – but we also have n – k = 0 messing up many of our usual output results.

(I had originally run the previous posts using powers thru x^7… but the error message caused by n = k has nothing to do with the error messages that came from extreme multicollinearity, and I didn’t want to proliferate issues.)

We did indeed put a 7th degree polynomial exactly thru 8 points. Here’s the fitted equation… and replace Xi by X^i (and name the resulting polynomial “g”)…
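The shape of that computation, as a sketch with hypothetical stand-in y values (the real series is in the earlier posts):

```mathematica
xs = Range[-7/2, 7/2];
ys = {1., 3., 2., 5., 4., 6., 8., 7.};     (* hypothetical stand-in data *)
fit = LinearModelFit[Transpose[{xs, ys}], Table[t^p, {p, 1, 7}], t];
g[t_] = Normal[fit]    (* the degree-7 polynomial, with t^i in place of the Xi names *)
```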

Let’s take a look at it… and the data points, too:

I wouldn’t really forecast using this… but let’s go four years out anyway. That is, instead of drawing the polynomial between -7/2 and +7/2, I go out to +15/2.
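With the toy fit above, the picture comes from something like this (the blow-up is generic to a degree-7 interpolant, though the details of my graph come from the real data):

```mathematica
(* the fitted polynomial drawn well past the data, out to +15/2 *)
Show[Plot[g[t], {t, -7/2, 15/2}],
 ListPlot[Transpose[{xs, ys}], PlotStyle -> PointSize[Large]]]
```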

Yikes! That’s what x^7 will do to you. The original data is completely blown away by the absurdly large (negative) growth of the high-degree polynomial.

In summary,

  • we can use LinearModelFit to exactly fit the data;
  • n = k will mess up some of the usual outputs.

Oh, I could have just called for the one fit:

We are spared the error messages… except that they’ll show up as soon as we call for anything involving n-k:
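For instance, with the toy fit above (AdjustedRSquared is one property that divides by n – k):

```mathematica
fit["RSquared"]           (* fine: essentially 1, since the fit is exact *)
fit["AdjustedRSquared"]   (* 1 - (1 - R^2)(n - 1)/(n - k): division by n - k = 0 *)
```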

(So it’s my fault they showed up at all: stepwise asks for information involving n-k, and that’s what generated the messages.)

We can do it using Lagrange Interpolation.

You may recall something called Lagrange Interpolation. It’s a formula for a polynomial that goes exactly thru a set of points. (It is crucial for Lagrange Interpolation that the x values be distinct. Similarly, the least-squares fit cannot go through all 8 points if the x values are not distinct.)

Mathematica® will do a Lagrange Interpolation for us. Here are the (x,y) pairs:

And here’s the command and the output for a fitted polynomial… and the answer rewritten… and, for comparison, the answer from the OLS fit (that’s why I named it – so I could refer to it here):

They are the same, as they should be.
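For completeness, the built-in command is InterpolatingPolynomial; with the toy data above, a sketch of the comparison:

```mathematica
(* the interpolating (Lagrange) polynomial through the same 8 points *)
g2 = Expand[InterpolatingPolynomial[Transpose[{xs, ys}], t]];
Chop[g2 - Normal[fit]]    (* 0: it agrees with the exact least-squares fit *)
```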

Here is a video showing Lagrange Interpolation. (As I said in the diary post of Jan 14, I have found an error in a different video lecture by this speaker, so keep your eyes open.)

Did Draper & Smith use the same orthogonalized data? Yes, but not normalized.

Let me show you what D&S used. The following matrix is one of several tables; it’s specifically for 8 observations. It is accompanied by equations for “orthogonal polynomials”. Basically, someone sat down and worked out equations rather than simply orthogonalizing the powers of x numerically.

It has mutually orthogonal – but not orthonormal – columns; computing Z’Z gets me all the dot products of the columns with each other… since all off-diagonal terms are zero, the columns are orthogonal to each other.

We understand that the square roots of the diagonal entries are the lengths of the vectors. Its second column, for example, has squared length 168.

Is there any significant difference between their data matrix and mine?

Only the scaling. Take theirs and divide by mine, and square. Here is my orthogonal matrix from the previous post; its starting point was x = {-(7/2),-(5/2),-(3/2),-(1/2),1/2,3/2,5/2,7/2}…

The Orthogonalize command (applied to the transpose) gave us…

Divide their matrix by my matrix, term by term, and then square the entries:

Those values are precisely the diagonal elements we saw when we computed Z’Z for their matrix: {8,168,168,264,616,2184,264}.

In other words, if I normalize their columns to unit vectors, I will get precisely the same matrix as if I orthogonalize my powers of x – provided I started with the centered data.

(There were any number of ways to show that relationship. The most straightforward might have been to normalize each of their columns and show that the result matched my data… Instead I chose to simply divide their data by my data, element by element.)
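That straightforward check is easy to write down. Here is a sketch; I am transcribing the classical table for n = 8, which should match the scan, since its column sums of squares reproduce the {8,168,168,264,616,2184,264} diagonal quoted above:

```mathematica
(* the n = 8 table: columns are degrees 0 .. 6 *)
ds = Transpose[{
   {1, 1, 1, 1, 1, 1, 1, 1},
   {-7, -5, -3, -1, 1, 3, 5, 7},
   {7, 1, -3, -5, -5, -3, 1, 7},
   {-7, 5, 7, 3, -3, -7, -5, 7},
   {7, -13, -3, 9, 9, -3, -13, 7},
   {-7, 23, -17, -15, 15, 17, -23, 7},
   {1, -5, 9, -5, -5, 9, -5, 1}}];
Diagonal[Transpose[ds].ds]     (* {8, 168, 168, 264, 616, 2184, 264} *)
(* normalize each column, then compare with Orthogonalize applied to powers of x *)
dsUnit = Transpose[Normalize /@ Transpose[ds]];
xs = Range[-7/2, 7/2];
zo = Transpose[Orthogonalize[Table[xs^p, {p, 0, 6}]]];
N[dsUnit] == N[zo]             (* True: the same matrix *)
```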

How did Draper & Smith get their values? They looked them up.

They have gone to a lot of trouble to get equations. It turns out they are general in one respect – they work for any number of observations. But they suffer from two defects. One, we have only been given equations through degree 6: if we want higher degrees, we’ll have to work them out or look them up. Two, the equations require that the x values be evenly spaced. But orthogonalizing the data will work whether or not the values are evenly spaced.

Here are the equations they wrote. You will see that each has a multiplicative constant λ. That’s part of what makes the equations general. (They cite many references for the following information, including a source of Fortran code; they might have computed it all themselves. In any case, I have scanned these equations and two subsequent tables from Draper & Smith.)

Here is the prescription for n = 8:

The numbers on the last row are the λs. That “2” under their first column says that the column of integers was obtained from x (half-integers!) by multiplying by 2.

Here’s the code I wrote to implement the equations for n = 8. (It includes the λs, so don’t worry about reading them from the image.)

(I chose to include the constant term.)
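In their standard form, the first few equations look like this; a sketch through degree 3 only, with the n = 8 λs (which begin {2, 1, 2/3}) folded in:

```mathematica
n = 8; lam = {2, 1, 2/3};    (* the first three lambdas for n = 8 *)
p1[x_] := lam[[1]] x
p2[x_] := lam[[2]] (x^2 - (n^2 - 1)/12)
p3[x_] := lam[[3]] (x^3 - (3 n^2 - 7) x/20)
xs = Range[-7/2, 7/2];
z = Transpose[{ConstantArray[1, n], p1 /@ xs, p2 /@ xs, p3 /@ xs}];
Transpose[z].z // MatrixForm   (* diagonal {8, 168, 168, 264}: orthogonal columns *)
```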

Here’s my output. First, each new power is a new row… then I transpose that answer so that each power is a column… then I compute Z’Z to confirm orthogonality of columns:

Let me show you something very important. Although the equations are general in that they contain “n”, the number of observations, they are implicitly specific in that they require that the entries in X be separated by 1.

Huh?

What happens if we start with 2x (integers) instead of x (half-integers)? The resulting matrix does not have orthogonal columns.
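Continuing the sketch above, we can see the failure directly:

```mathematica
(* the same equations evaluated on the integers 2x: orthogonality fails *)
xi = Range[-7, 7, 2];
z2 = Transpose[{ConstantArray[1, n], p1 /@ xi, p2 /@ xi, p3 /@ xi}];
Transpose[z2].z2 // MatrixForm   (* nonzero off-diagonal entries *)
```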

So, not only do we have that list of λs to pre-multiply the equations, but we have to start with integers for n odd, half-integers for n even.

I have a slightly better idea. I can get rid of the λs – because I can figure out what they must be. (And I could let the code decide whether I need integers or half-integers to start… and I could even let the code decide whether I already had half-integers! But I’m only going to show you how to figure out the λs.)

For 7 observations, I need centered integers:

Here’s the output: first with powers in rows, then transposed. In between, I multiplied by an identity matrix… because I’m going to change some of the diagonal entries:

Well, column 2 has a common factor of 2… as do columns 4 and 6 – whoa! column 4 has a common factor of 4… and column 7 needs to be multiplied by 7… so I change entries in the diagonal matrix, and get:

In other words, I have scaled down the answers as far as possible while still having integers.
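A generic helper for that scaling step (my function, not part of the original post): it takes any rational column to the smallest integer multiples.

```mathematica
(* clear the denominators, then divide out the common factor *)
toSmallestIntegers[col_] :=
 With[{c = col*(LCM @@ Denominator[col])}, c/(GCD @@ c)]

toSmallestIntegers[{7/2, 1/2, -3/2, -5/2, -5/2, -3/2, 1/2, 7/2}]
(* {7, 1, -3, -5, -5, -3, 1, 7} *)
```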

Did I get what they would have? Here’s their table for n = 7:


Finally, I had already multiplied column 2 by 2 in the code for n = 8… after I divide by 2, I’ve got that λ = 1. I don’t need their table of λs… and I could even rewrite the code to remove the λs from it.

But I’m not going to bother.

As I said, what they so laboriously construct could be accomplished on a case-by-case basis simply by orthogonalizing the data matrix (starting with integers or half-integers for equally-spaced data). And such a numerical orthogonalization will work even if the data is not evenly spaced. (I would try for approximately integral separation, but I can’t predict how amenable the data will be.)

Were their values samples of Lagrange polynomials? No.

Here, for example, is the Draper & Smith data for the quadratic term:

and they look like this:

Here is a graph of the second-degree Legendre polynomial:

Its integral from -1 to 1 is zero; this means it is orthogonal to the constant function = 1.

Let’s change the horizontal and vertical scales on the polynomial… and show the data points we actually have:

So: our data points are not samples of the corresponding Legendre polynomial.

Now let’s sample the Legendre polynomial. Here is a graph of the Legendre polynomial and 8 equally-spaced samples.

Fine. But what I want to know is: can I scale these samples to be orthogonal to the constant? NO: their sum would have to be zero, and if it isn’t already, I can’t make it zero by multiplying by any nonzero scale factor.

Oh, Rip, don’t be silly. You don’t need to scale it – you need to shift it up or down! With a sum of 8 for the eight samples, I need to subtract 1 from each sample:

Now scale the result so that I have integers – without common factors:

OK, I just reproduced the Draper & Smith values.
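In code, one reconstruction of that sequence of steps (the sampling grid, 8 equally spaced points from -1 to 1, is my assumption):

```mathematica
(* sample the degree-2 Legendre polynomial at 8 equally spaced points *)
samples = Table[LegendreP[2, x], {x, -1, 1, 2/7}];
shifted = samples - Mean[samples];             (* shift so the sum is zero *)
scaled = shifted*(LCM @@ Denominator[shifted]);
scaled/(GCD @@ scaled)
(* {7, 1, -3, -5, -5, -3, 1, 7}: the Draper & Smith quadratic column *)
```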

Unfortunately, this process will not work for the quartic data, so, as far as I can tell, there’s no easy way to get the discrete orthogonal polynomials from the continuous (Legendre) polynomials.

I’ll repeat the bottom line. If we start with integral or half-integral data, for an odd or even number of observations respectively, the Gram-Schmidt orthogonalization algorithm (the Orthogonalize command) will generate effectively the same answers as the equations or tables.

And the Orthogonalize command will work even if the x-values are not equally-spaced.
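One last sketch to drive that home, with deliberately uneven (and hypothetical) x values:

```mathematica
(* unevenly spaced x, powers 0 .. 3; Gram-Schmidt still yields orthonormal columns *)
xs = {1., 2., 3., 5., 8.};
zo = Transpose[Orthogonalize[Table[xs^p, {p, 0, 3}]]];
Transpose[zo].zo // Chop // MatrixForm   (* the 4x4 identity *)
```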

So just orthogonalize. Don’t go looking for pre-computed tables.
