Regression 1 – Multicollinearity in Review

As I draft this, I plan to do four things in this post.

  1. Summarize the methods I’ve used to analyze multicollinearity.
  2. Suggest that multicollinearity is a continuum with no clear-cut boundaries.
  3. Summarize the conventional wisdom on its diagnosis and treatment.
  4. Flag significant points made in my posts.

Let me say up front that there is one more thing I know of that I want to learn about multicollinearity – but it won’t happen this time around. I would like to know what economists did to get around the multicollinearity involved in estimating production functions, such as the Cobb-Douglas.

Regression 1: ADM polynomials – 3 (Odds and Ends)

Edit Jan 29: a reference to the diary post of Jan 21 has been corrected to refer to Jan 14.

There are several things I want to show you, all related to our orthogonal polynomial fits.

  • Can we fit a 7th degree polynomial to our 8 data points? Yes.
  • We can do it using regression.
  • We can do it using Lagrange Interpolation.
  • Did Draper & Smith use the same orthogonalized data? Yes, but not normalized.
  • How did Draper & Smith get their values? They looked them up.
  • Were their values samples of Lagrange polynomials? No.
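Here is a minimal Python/NumPy sketch of the first three bullets. It is not the author's code, it uses made-up y values rather than the ADM data, and it takes x to be the half-integers -3.5 to 3.5. With 8 points and 8 coefficients, least squares reproduces the data exactly, and the result agrees with the Lagrange interpolating polynomial at any other x. The helper names `poly_eval` and `lagrange_eval` are hypothetical.

```python
import numpy as np

# Illustrative data only: half-integral x and made-up y (NOT the ADM values).
x = np.arange(8) - 3.5                      # -3.5, -2.5, ..., 3.5
y = np.array([3.1, 2.4, 2.9, 3.8, 4.0, 4.6, 5.9, 6.2])

# Regression on 1, x, ..., x^7: 8 coefficients for 8 points, so the
# least-squares fit passes through every point.
V = np.vander(x, 8, increasing=True)
coef, *_ = np.linalg.lstsq(V, y, rcond=None)


def poly_eval(c, t):
    """Evaluate c[0] + c[1] t + ... + c[k] t^k."""
    return sum(ck * t**k for k, ck in enumerate(c))


def lagrange_eval(xs, ys, t):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at t."""
    total = 0.0
    for j in range(len(xs)):
        basis = 1.0
        for m in range(len(xs)):
            if m != j:
                basis *= (t - xs[m]) / (xs[j] - xs[m])
        total += ys[j] * basis
    return total


max_residual = np.max(np.abs(y - V @ coef))   # essentially zero
t_new = 0.25                                  # a point that is not a data point
print(max_residual)
print(poly_eval(coef, t_new), lagrange_eval(x, y, t_new))   # the two values agree
```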

The bottom line is that, starting with half-integral values of x, all I need is the Orthogonalize command to apply Gram-Schmidt to the powers of x. I did that here. I don’t need to look up a set of equations or a pre-computed table of orthogonal vectors. Furthermore, I can handle arbitrary data that is not equally spaced.
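For readers without Mathematica, here is a rough NumPy equivalent of orthogonalizing the powers of x. It swaps in `np.linalg.qr` (Householder QR) for Gram-Schmidt; for linearly independent columns the two give the same orthonormal columns once each column's sign is fixed so that the diagonal of R is positive. The unequally spaced x values are made up for illustration, and `orthonormal_powers` is a hypothetical helper name.

```python
import numpy as np


def orthonormal_powers(x, degree):
    """Orthonormalize the columns 1, x, ..., x^degree.

    Uses Householder QR in place of Gram-Schmidt; flipping signs so that
    diag(R) > 0 makes the columns match what Gram-Schmidt would produce.
    """
    V = np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)
    Q, R = np.linalg.qr(V)            # reduced QR: Q is n x (degree + 1)
    signs = np.sign(np.diag(R))
    return Q * signs                  # multiply column j by signs[j]


x_equal = np.arange(8) - 3.5                                      # half-integral, equally spaced
x_unequal = np.array([0.0, 0.3, 1.1, 1.7, 2.0, 3.6, 4.2, 5.9])    # made up, unequally spaced

P_equal = orthonormal_powers(x_equal, 3)
P_unequal = orthonormal_powers(x_unequal, 3)
print(np.round(P_equal.T @ P_equal, 12))      # identity matrix
print(np.round(P_unequal.T @ P_unequal, 12))  # identity matrix as well
```

For the half-integers, the linear and quadratic columns come out proportional to the integer vectors (-7, -5, -3, -1, 1, 3, 5, 7) and (7, 1, -3, -5, -5, -3, 1, 7), i.e. the unnormalized form of the same orthogonal polynomials.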


Regression 1: ADM polynomials – 2

Let’s look again at a polynomial fit for our small set of annual data. We started this in the previous technical post.

What we used last time was

That is, I had divided the years by 1000 because, messy as our results were, they would have been a little worse had we used the years themselves.

But there’s a simple transformation that we ought to try – and it will have a nice side effect.

Just center the data. Start with the years themselves, and subtract the mean:

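Because the eight years are consecutive, their mean falls halfway between two of them, so the centered values are half-integers. If we wanted to work with integers, we could just multiply by 2. In either case, our new x is not a unit vector.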

Oh, the nice side effect? Our centered data is orthogonal to a constant vector.

Let’s see what happens.
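A small NumPy sketch of the centering step, assuming eight consecutive years; 1981 to 1988 are placeholders, not the actual ADM years. It checks that the centered x is orthogonal to the constant vector, that doubling it gives odd integers, and it compares the condition numbers of a cubic design matrix built from years/1000 and from the centered years.

```python
import numpy as np

# Placeholder years: eight consecutive years (NOT necessarily the ADM years).
years = np.arange(1981, 1989, dtype=float)
ones = np.ones_like(years)

x_scaled = years / 1000.0            # the earlier choice
x_centered = years - years.mean()    # half-integers -3.5, ..., 3.5
x_int = 2 * x_centered               # odd integers -7, -5, ..., 7


def cubic_design(x):
    """Design matrix with columns 1, x, x^2, x^3."""
    return np.vander(x, 4, increasing=True)


print(x_int)
print(ones @ x_centered)                          # zero: orthogonal to the constant vector
print(np.linalg.norm(x_centered))                 # sqrt(42), so not a unit vector
print(np.linalg.cond(cubic_design(x_scaled)))     # very large
print(np.linalg.cond(cubic_design(x_centered)))   # much smaller
```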

Regression 1: Archer Daniels Midland (polynomials) – 1

Now I want to illustrate another problem, this time with the powers of x. The following comes from Draper & Smith, p. 463, Archer Daniels Midland data; it may be in a file, but – with only 8 observations – it was easier to type the data in. Heck, I didn’t even look to see if it was all in some file somewhere.

raw data

I have chosen to divide the years by 1000; in the next post I will do something else.

The output of the following command is the given y values. I typed integers and then divided by 100 once, rather than typing decimal points.



Regression 1: Example 8, Fitting a polynomial

I want to revisit my old 2nd regression example of May 2008. I have more tools available to me today than I did when I first created it – and it was originally done before Regress was replaced by LinearModelFit.

Recap: fitting a quadratic and a cubic

What I had was five observations x, five disturbances u – and an equation defining the true model: y = 2 + x^2 + u. Here they are:

Construct a full data matrix with x, x^2, and y:

Run forward selection… and backward selection…
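The forward and backward selection code itself is not shown here, so the following is only a generic Python/NumPy sketch of forward selection with a partial-F entry rule, not the author's procedure. The x and u values are hypothetical (not the original five observations), and the F-to-enter threshold of 4 is an arbitrary illustrative choice.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def forward_selection(candidates, y, f_to_enter=4.0):
    """Greedy forward selection with a partial-F entry rule.

    candidates: dict mapping a name to a data column; the constant is always in.
    f_to_enter: hypothetical threshold chosen only for illustration.
    Returns the candidate names in the order they entered the model.
    """
    n = len(y)
    chosen = []
    X = np.ones((n, 1))
    while len(chosen) < len(candidates):
        df_resid = n - X.shape[1] - 1          # residual df after adding one more column
        if df_resid <= 0:
            break
        rss_old = rss(X, y)
        trials = {name: rss(np.column_stack([X, col]), y)
                  for name, col in candidates.items() if name not in chosen}
        best = min(trials, key=trials.get)     # largest drop in RSS
        rss_new = trials[best]
        f = np.inf if rss_new == 0 else (rss_old - rss_new) / (rss_new / df_resid)
        if f < f_to_enter:
            break
        chosen.append(best)
        X = np.column_stack([X, candidates[best]])
    return chosen


# Hypothetical x and disturbances u (NOT the values from the original example).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
u = np.array([0.3, -0.2, 0.1, -0.4, 0.2])
y = 2 + x**2 + u                                # true model: y = 2 + x^2 + u
order = forward_selection({"x": x, "x^2": x**2}, y)
print(order)                                    # x^2 enters first
```

Backward selection would run the same bookkeeping in reverse, starting from the full model and dropping the column with the smallest partial F.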



The relationship between the raw and the orthogonalized data

OK, so we orthogonalized the Hald data, including the constant (the column of 1s).

What’s the relationship between the new variables and the old? We might someday get a new observation, and if we were using the fit to the orthogonalized data, we might want to see what it predicts for a new data point.

(In all honesty, I would use the original fit – but I still want to know what the relationship is.)

My notation is a little awkward. I’m going to stay with what is used for this post, in which I first showed how to find….

Let me start fresh. If we have two typical data matrices (i.e. taller than wide), and they are supposed to be the same data, how do we find the relationship?
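One way to answer that question, sketched in Python/NumPy on synthetic data (not the Hald data, and not necessarily the route the post takes): if X and Q have full column rank and span the same column space, solve X T = Q by least squares. The fit is exact, and when Q comes from a QR factorization X = QR, T is just R⁻¹. The matrix T then carries a new raw observation into orthogonalized coordinates, and both fits give the same prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a design matrix (constant first); NOT the Hald data.
n = 13
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

Q, R = np.linalg.qr(X)   # orthogonalized design: X = Q R

# Find T with X T = Q by least squares; since both span the same columns,
# the fit is exact and T equals R^{-1}.
T, *_ = np.linalg.lstsq(X, Q, rcond=None)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # fit to the raw data
gamma = Q.T @ y                                # fit to the orthogonalized data

x_new = np.array([1.0, 0.2, -1.0, 0.5])        # hypothetical new observation (leading 1)
q_new = x_new @ T                              # the same observation, orthogonalized
print(x_new @ beta, q_new @ gamma)             # the two predictions agree
```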

Regression 1: eliminating multicollinearity from the Toyota data

We have seen that we can eliminate the multicollinearity from the Hald data if we orthogonalize the design matrix – thereby guaranteeing that the new data vectors will be orthogonal to a column of 1s. That, in turn, centers the new data, so that it is uncorrelated as well as orthogonal.
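Orthogonal is not the same as uncorrelated: correlation is computed after centering. This small NumPy sketch, with toy vectors chosen for illustration, shows two orthogonal vectors with correlation -1/3, and then shows that orthogonalizing them together with the column of 1s yields columns that are centered and therefore uncorrelated.

```python
import numpy as np


def corr(a, b):
    """Pearson correlation of two vectors."""
    return float(np.corrcoef(a, b)[0, 1])


# Orthogonal, but not centered: these two vectors are correlated.
a = np.array([1.0, 0.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0, 0.0])
print(a @ b, corr(a, b))          # dot product 0, correlation -1/3

# Orthogonalize together with the column of 1s: the new columns are
# orthogonal to 1s (hence centered) and to each other (hence uncorrelated).
Q, _ = np.linalg.qr(np.column_stack([np.ones(4), a, b]))
qa, qb = Q[:, 1], Q[:, 2]
print(qa.mean(), qb.mean(), corr(qa, qb))   # all essentially zero
```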

Doing that to the Toyota data will seem strange… because we have to do it to the dummy variables, too! But it will eliminate the multicollinearity.

I’m not sure it’s worthwhile to eliminate it… but we can… so let’s do it.

Regression 1: eliminating multicollinearity from the Hald data

I can eliminate the multicollinearity from the Hald dataset. I’ve seen it said that this is impossible. Nevertheless, I conjecture that we can always do this, provided the columns of the data are not linearly dependent. (I expect orthogonalization to fail precisely when X’X is not invertible, and to be numerically unreliable when X’X is nearly singular.)
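To make "eliminate the multicollinearity" concrete, here is a Python/NumPy sketch using variance inflation factors, VIF_j = 1/(1 - R_j²), on synthetic data built to be nearly collinear (not the Hald values). Before orthogonalization one VIF is large; after orthogonalizing the whole design matrix, constant included, every VIF is 1. `vifs` is a hypothetical helper, and QR stands in for Gram-Schmidt.

```python
import numpy as np


def vifs(X):
    """Variance inflation factors for the non-constant columns of X (column 0 = constant).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all the other columns, including the constant.
    """
    out = []
    for j in range(1, X.shape[1]):
        others = np.delete(X, j, axis=1)
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        centered = X[:, j] - X[:, j].mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


rng = np.random.default_rng(1)
n = 13
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.05, size=n)   # nearly a linear combination
X = np.column_stack([np.ones(n), x1, x2, x3])

Q, _ = np.linalg.qr(X)     # orthogonalize the whole design matrix, 1s included
print(vifs(X))             # at least one very large value
print(vifs(Q))             # all 1, up to rounding
```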

The challenge of multicollinearity is that it is a continuum, not usually a yes/no condition. Even exact linear dependence – which is yes/no in theory – can be ambiguous on a computer. In theory we either have linear dependence or linear independence. In practice, we may have approximate linear dependence, i.e. multicollinearity – but in theory approximate linear dependence is still linear independence.

But if approximate linear dependence is a continuum then it is also a continuum of linear independence.

So what’s the extreme form of linear independence?

Orthogonal.
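The continuum can be seen numerically. In this NumPy sketch on synthetic data, a third column slides toward the second as eps shrinks, and the condition number of the design matrix climbs steadily with no sharp boundary; at the other extreme, orthonormal columns have condition number 1. Using the condition number as the yardstick is one common choice, assumed here for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
x = rng.normal(size=n)
z = rng.normal(size=n)

# The third column approaches x as eps shrinks: from independent toward dependent.
eps_values = [1.0, 1e-1, 1e-2, 1e-3, 1e-4]
conds = []
for eps in eps_values:
    X = np.column_stack([np.ones(n), x, x + eps * z])
    conds.append(np.linalg.cond(X))

# The orthogonal extreme: orthonormal columns have condition number 1.
Q, _ = np.linalg.qr(np.column_stack([np.ones(n), x, x + z]))
print(conds)                 # increasing as eps decreases
print(np.linalg.cond(Q))     # 1, up to rounding
```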

What happens if we orthogonalize our data?

The procedure isn’t complicated: apply the Gram-Schmidt algorithm to the design matrix. Let me emphasize that: use the design matrix, which includes the column of 1s. (We will also, in a separate calculation, see what happens if we do not include the vector of 1s.)

Here we go….
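A compact Python/NumPy sketch of both calculations, using modified Gram-Schmidt on synthetic positive-valued data (not the Hald values). With the column of 1s included, every other orthogonalized column has mean zero; without it, the columns are orthonormal but not centered, so they need not be uncorrelated.

```python
import numpy as np


def gram_schmidt(X):
    """Modified Gram-Schmidt on the columns of X (assumed linearly independent).

    Returns Q with orthonormal columns; column j of Q spans the same space
    as the first j+1 columns of X.
    """
    X = np.asarray(X, dtype=float)
    Q = np.zeros_like(X)
    for j in range(X.shape[1]):
        v = X[:, j].copy()
        for k in range(j):
            v -= (Q[:, k] @ v) * Q[:, k]      # remove the component along q_k
        Q[:, j] = v / np.linalg.norm(v)
    return Q


rng = np.random.default_rng(3)
n = 13
raw = rng.uniform(1.0, 60.0, size=(n, 4))       # synthetic positive data, NOT the Hald values

with_ones = gram_schmidt(np.column_stack([np.ones(n), raw]))
without_ones = gram_schmidt(raw)

print(with_ones[:, 1:].mean(axis=0))    # essentially zero: centered
print(without_ones.mean(axis=0))        # not zero: not centered
```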

Regression 1 – Example 7: Nylon Yarn

I have found some data which illustrates a few points I want to make when I summarize what I’ve shown you about ordinary least squares regression – I should be publishing a summary soon. Let me provide some evidence for part of my summary.

This example comes from Atkinson, "Plots, Transformations and Regressions", Oxford Science Publications, reprinted 1988, ISBN 0198533594. It is Example 8, on p. 106, and it deals with the properties of nylon yarn.

He, in turn, took it from John, "Outliers in Factorial Experiments", Appl. Statist. 27, 111–119, 1978.

I do not know of an online source for this data.

Here’s what we have:

Regression 1 – Example 6 Revisited (Housing Starts)

There was more I had intended to say about the housing starts data, i.e. example 6. Here it is.

Review and Introduction

Recall that we had 5 independent variables

As usual, I want to have a list available which includes the name of the constant:

You might also recall that we have only 23 observations.

We had, also as usual, run both forward selection and backward selection:


