Regression 1 – Mathematica’s Eigenstructure Table for the Hald data

The data and its eigenstructure table

Let me show you how Mathematica®’s eigenstructure table is computed. The short answer is: use an algorithm proposed by Belsley, Kuh, and Welsch (henceforth BKW; see the bibliography page) – but applied to the data correlation matrix rather than to “column-equilibrated” data.

Yes, of course I’ll show all that to you.

And, as usual, although I used Mathematica for the computations, and we are looking at a property provided by Mathematica for regressions, the following calculations could be carried out in other computer languages.

Let’s use the Hald data. It turns out that Draper & Smith do exactly what I think BKW recommended, so at the very least, Draper & Smith and I read BKW the same way.

We need the Hald data, in a matrix named d1…

We need to name the independent variables…

n1 = {X1, X2, X3, X4}

and to run a regression using all four independent variables:
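Here's a minimal sketch of those two steps; it assumes d1 holds the 13-observation Hald data with the dependent variable in the last column, which is the layout LinearModelFit expects:

reg = LinearModelFit[d1, n1, n1];
reg["EigenstructureTable"]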

What are we looking at? What’s to see?

Those four numbers close to 1 on the last row. Like the Variance Inflation Factors, they tell us – somehow! – that there is multicollinearity.

Just for comparison, let me recall the R Squares computed from the Variance Inflation Factors for that regression:
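A sketch of how I'd get those from the fit (the property name is genuine; whether the constant term contributes an entry to the list is worth checking in your own session):

vif = reg["VarianceInflationFactors"];   (* one VIF per term; ignore any entry for the constant *)
1 - 1/vif                                (* the R Square of each Xi regressed on the other X's *)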

No, they are not the same as the four numbers on the bottom row of the eigenstructure table.

I’ve known for some time where part of the eigenstructure table came from: it comes from the correlation matrix of the data.

Drop the dependent variable, to get a data matrix with just the independent variables:

Get the correlation matrix of the data…

Get the eigensystem of the correlation matrix:

In case you’re checking, here is the eigenvector matrix V with eigenvectors as columns… the eigenvalues will be along very soon:
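Here's a sketch of those steps; d2, cor, lambda, and V are just my names for the pieces:

d2 = Drop[d1, None, -1];             (* just the independent variables; the response is assumed to be the last column *)
cor = Correlation[N[d2]];            (* the data correlation matrix *)
{lambda, vecs} = Eigensystem[cor];   (* Eigensystem returns eigenvectors as rows *)
V = Transpose[vecs]                  (* eigenvectors as columns *)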

By the way, it is crucial to distinguish the “CorrelationMatrix” of the regression from the correlation matrix of the data. The former is the “parameter correlation matrix”, i.e. the correlation matrix of the coefficients \beta\ .

In text that follows, “correlation matrix” refers to the data correlation matrix, rather than to the regression “CorrelationMatrix” (one word with two UC letters), namely this:
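Just so it's concrete, here's the one-liner, using the fit object reg from earlier:

reg["CorrelationMatrix"]   (* the parameter correlation matrix, i.e. correlations of the coefficient estimates *)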

(It might be telling us something that the parameter correlation matrix is so highly correlated. Let me investigate this, but not here and not now.)

the eigensystem and the eigenstructure table
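A minimal way to make that comparison is simply to display both:

lambda                       (* eigenvalues of the data correlation matrix *)
reg["EigenstructureTable"]   (* its first column should be the same list *)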

Good, they’re the same. That was easy.

Now divide each of those into the largest… and take the square roots:
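In one line:

Sqrt[Max[lambda]/lambda]   (* largest eigenvalue over each eigenvalue, then square roots *)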

So, we know what the first two columns of the eigenstructure table are: the first is the eigenvalues of the correlation matrix; the second is the square root of the largest eigenvalue divided by each eigenvalue – that is, the condition indices.

But what is the rest of the table, the so-called variance partitions, under the columns labeled by the variable names?

BKW’s recommendation

Let’s see what BKW recommended. (Mathematica’s documentation gives no hint, that I could see, that we should be doing this, but let’s try it anyway.)

BKW did not provide data for these calculations, except for one case — the modified Bauer matrix. Draper & Smith, however, did these calculations for the Hald data (and for the modified Bauer matrix, and for their “steam data”).

First, rescale the data so that each column has length 1; this is called column-equilibrated data. We know that in general we need to start from the design matrix rather than the data matrix – and we know in particular that the multicollinearity of the Hald data involves the constant term, so the column of 1s must be included.
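Here's a sketch of building that design matrix (I'll call it X, using d2 from above):

X = Prepend[#, 1] & /@ d2   (* a column of 1s, then X1 through X4 *)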

… and get the square roots of the diagonal elements of X’X. (The diagonal elements of X’X are the sums of squares of each column, so the square roots are the lengths of the column vectors). As a check, I want to see the sums of squares because I know the answer for the column of 13 1s – namely, 13.

(And, as usual, X’ symbolizes the transpose of X.)

Here are the diagonal elements and their square roots.
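A sketch of that computation:

ss = Diagonal[Transpose[X].X];   (* sums of squares of the columns of the design matrix *)
len = N[Sqrt[ss]];               (* the column lengths, as machine numbers *)
{ss, len}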

(And, as we expect, we see a “13” for the first diagonal element.)

Now divide each matrix entry by the length of the column it’s in; i.e. divide the (i,j) entry of X by len(j).

Just to be sure, check the lengths of the columns of M by looking at the diagonal of M’M:
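Both steps in a sketch:

M = X . DiagonalMatrix[1/len];   (* divide column j of X by len[[j]] *)
Diagonal[Transpose[M].M]         (* should be a list of 1s *)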

OK. At this point I have a matrix of column-equilibrated data. Now I want to try a sequence of calculations described on pp. 106-107 of BKW.

Get the SVD of M, and for convenience get the list W of singular values as well (as opposed to using the matrix w – it’s not as though I have to work to get it!):
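A sketch:

{u, w, v} = SingularValueDecomposition[N[M]];
W = Diagonal[w]   (* the singular values as a plain list *)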

Square the entries of v, and divide each column by the square of the corresponding singular value (these are the terms in BKW's decomposition of each coefficient's variance):

Sum each row:

Divide each row by its sum, so that each row sums to 1:

Finally, take the transpose:
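Here's the whole sequence as a sketch (note that BKW's variance-decomposition formula uses the squared entries of v):

phi = v^2 . DiagonalMatrix[1/W^2];   (* squared entries of v, column j divided by W[[j]]^2 *)
rowSums = Total[phi, {2}];           (* sum each row *)
pi = Transpose[phi/rowSums]          (* scale each row to sum to 1, then transpose *)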

For comparison, here’s the core of the eigenstructure table again.

Hmm. Although the numbers are different, and the size is different, this 5×5 table we have computed has something in common with the eigenstructure table: all those numbers close to 1 on the bottom row.

Close but no cigar.

Furthermore, we’ve completely lost any connection to the eigenvalues of the correlation matrix.

And that suggests we try this algorithm on the matrix – namely the data correlation matrix – having those eigenvalues.

But hold up a moment. As I said, Draper & Smith do this calculation, and they present an answer on p. 380 (3rd ed.). Let me round off my answer:
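For instance, rounding to three decimal places (the number of places is just my choice for display):

Round[pi, 0.001]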

That is exactly what Draper & Smith show in their Table 16.3.

Oh, they also show condition indices: just divide each singular value into the maximum singular value, and round to the nearest integer:
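In one line:

Round[Max[W]/W]   (* condition indices, rounded to the nearest integer *)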

That’s exactly what they have.

So, we know what BKW suggest. (I actually used their equations and then compared my results to Draper & Smith, rather than use Draper & Smith’s equations.)

I suppose, to be pedantic, I should say that Draper & Smith and I agree on what BKW intended. I’d be even more confident if I matched a calculation by BKW themselves.

But we know that what BKW suggest is not how the eigenstructure table is computed. Fine – we know where the eigenvalues come from, so let’s try…

the BKW algorithm on the correlation matrix

Well, exactly what did BKW have us do?

  • create a data matrix M with columns of length 1
  • get the SVD of M, with orthogonal matrix v and singular value list W.
  • square the entries of v, and divide each column by the corresponding W^2
  • sum each row
  • divide each row by that sum to get row sums = 1
  • transpose

Instead of doing all that to the data, we want to work with the correlation matrix. To be more precise, instead of doing all that to the v matrix from the SVD of the column-equilibrated data, let’s do it to the eigenvector matrix V of the correlation matrix.

They correspond.

Let’s review the relationship between the SVD of a matrix X and the eigendecomposition of the dispersion matrix X’X. If we have an SVD of X

X = u w v’

then we get an eigendecomposition of the square matrix X’X

X’X = (v w’ u’) (u w v’) = v w’w v’ = v \Lambda\ v’,

where \Lambda\ is a diagonal matrix whose entries are the eigenvalues of X’X.

(We also get one for XX’, but in this case it is X’X that corresponds to the correlation matrix.)

The key points are that the SVD matrix v is also the eigenvector matrix of X’X, and that the eigenvalues are the squares of the singular values (because \Lambda = w'w\ ). (Alternatively, the singular values are the square roots of the eigenvalues.) And that’s why the eigenstructure table divides \sqrt{\text{eigenvalues}}\ into the largest \sqrt{\text{eigenvalue}}\ , while Draper & Smith (and Belsley et al.) divided singular values into the largest singular value.

Of course, the correlation matrix isn’t actually of the form X’X – it has a scalar factor, too: we divide by N-1, where N is the number of observations. That factor of N-1 affects the eigenvalues but not the eigenvectors.

Anyway, the point is that instead of computing the SVD of M, we are going to compute the eigendecomposition of the correlation matrix; instead of using the squares of the singular values, we will use the eigenvalues.

And you might be comforted by remembering that we’re just trying to figure out how the eigenstructure table was computed. I don’t need to rigorously justify using the square roots of the eigenvalues instead of the singular values – in fact, the only important justification will be if I can compute the eigenstructure table as follows.

(Having said that, I really could make it all perfect by using “small standard deviates” rather than standardized variables, in order to have a data matrix Y such that Y’Y was exactly the correlation matrix of X; that’s what “small standard deviates” are for!)

We will replace

  • get the SVD of column-equilibrated data M, with orthogonal matrix v and singular value list W
  • square the entries of v, and divide each column by the corresponding W^2

by

  • get the eigendecomposition of the correlation matrix, with orthogonal matrix V and eigenvalue list \lambda\
  • square the entries of V, and divide each column by the corresponding eigenvalue

And yes, we already have the eigendecomposition of the correlation matrix: we have V and \lambda\ . Here’s the division by the eigenvalues:
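As a sketch, with V and lambda from earlier:

phi2 = V^2 . DiagonalMatrix[1/lambda]   (* squared entries of V, column j divided by lambda[[j]] *)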

Next we want to

  • sum each row
  • divide each row by that sum to get row sums = 1
  • transpose

OK, sum each row, divide each row by its sum so that each row sums to 1, and transpose…
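A sketch of those steps:

rowSums2 = Total[phi2, {2}];     (* sum each row *)
pi2 = Transpose[phi2/rowSums2]   (* scale each row to sum to 1, then transpose *)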

and compare that to the core of the eigenstructure table:
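Displaying both:

pi2
reg["EigenstructureTable"]   (* its body should be the same 4×4 array of variance partitions *)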

Yes! I love it! Just what I was hoping to see: they’re the same.

We have just computed the variance partition of the eigenstructure table; and we got the eigenvalues and condition indices at the beginning of the post. We know how to compute the entire eigenstructure table.

Summary

BKW provide an algorithm for computing a variance partition. They apply it to a column-equilibrated design matrix and get three things: a list of singular values, a list of condition indices, and a square table of variance partitions. They do not carry out this calculation for the Hald data, but Draper and Smith do, and I got the same answers they did.

But our common answers do not match Mathematica’s eigenstructure table for a regression using all of the Hald data.

A modified version of the BKW algorithm, however, turns out to generate the eigenstructure table. We get a list of eigenvalues, a list of condition indices, and a square table of variance partitions.

But…

  • Instead of BKW’s list of singular values, we get a list of eigenvalues.
  • Instead of BKW’s condition indices as the ratios of singular values, we get the ratios of square roots of eigenvalues.
  • Where BKW get a variance partition for the constant term, we do not.
  • Where BKW did an SVD of the column-equilibrated data, we did an eigendecomposition of the data correlation matrix.
  • I emphasize that where BKW modified the data, we simply computed the correlation matrix of the data.

Of course, now that we know how the eigenstructure table is computed, we don’t ever have to compute it again — just ask Mathematica for it.

(For the record, if we were to try all this on the parameter correlation matrix, we would not even get the right eigenvalues, not to mention the variance partitions.)

Perhaps now is the appropriate time to remind you of the modified Bauer matrix:

X = \left(\begin{array}{ccccc} -74 & 80 & 18 & -56 & -112 \\ 14 & -69 & 21 & 52 & 104 \\ 66 & -72 & -5 & 764 & 1528 \\ -12 & 66 & -30 & 4096 & 8192 \\ 3 & 8 & -7 & -13276 & -26552 \\ 4 & -12 & 4 & 8421 & 16842\end{array}\right)

The key fact, as we saw here, is that column 5 is exactly two times column 4.

This is the only example of this algorithm for which BKW provided the data. I showed you that the fifth singular value was extremely small, with a value down around 10^-12. I also told you that that number is zero in principle, because the fifth column is an exact multiple of the fourth; and it is zero in practice, too: the computer cannot invert X’X.
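As a sketch of that check, typing the matrix in from the display above (Tolerance -> 0 keeps the tiny value rather than discarding it):

bauer = {{-74, 80, 18, -56, -112}, {14, -69, 21, 52, 104}, {66, -72, -5, 764, 1528},
  {-12, 66, -30, 4096, 8192}, {3, 8, -7, -13276, -26552}, {4, -12, 4, 8421, 16842}};
SingularValueList[N[bauer], Tolerance -> 0]   (* the fifth singular value is at rounding-error level *)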

BKW showed that fifth singular value as 1.3 x 10^-12. Then they constructed a column-equilibrated matrix from the modified Bauer matrix, and so on, and they computed its fifth condition index as 2 x 10^16.

No. That’s a poor computer approximation for infinity. I refuse to do that.

I can see what they accomplished by using a finite number for 1 over the smallest singular value. They got a fifth line for the variance partition, whereas using the true value of the fifth singular value (namely zero), we would get only four lines – and we would not see multicollinearity on the fifth line.

I have to ask: why would we want to?

Four nonzero singular values instead of five tell us that the matrix is of rank 4 instead of rank 5, and we have seen that it is rather straightforward to isolate the linear dependence. (Heck, if it were that easy to isolate multicollinearity, I wouldn’t even glance at variance inflation factors and the eigenstructure table!)

So, if I find linear dependence, as in the modified Bauer matrix, I’m going to deal with it as linear dependence. You can do what you like, of course.

Oh, I still owe us one thing. We know how the eigenstructure table is computed. But I have only vague ideas as to why, and what it really means.

When I figure it out, I’ll let you know. That might be next week, or it might be ten years from now. I’ll spend some time on it and see how it goes.

Let me close with one additional comment. Until we know exactly what this algorithm is doing, we can’t very well tell whether it is better done with the correlation matrix (which gave us a 4×4 variance partition) or with column-equilibrated data (which gave us a 5×5 variance partition, i.e. one that included the constant term), or with some other form of the data.

2 Responses to “Regression 1 – Mathematica’s Eigenstructure Table for the Hald data”

  1. James Says:

    I am a novice. Would you post the notebook you used in the article? So I can evaluate it by myself. Thanks a lot.

  2. rip Says:

    Hi James,

    Sorry, but no.

    It’s just not feasible. I can’t actually upload a text file the way I do an image… I can’t simply drop a notebook into a post – well, I can, but if you copy it into a notebook, it will take a lot of work to make it executable. More work than it would take to just type the commands yourself.

    Further, my notebooks are drafts, and I simply don’t want to review and edit all that text to bring it up to par. And that’s why I’m not willing to email a notebook….

    You’ll just have to type in the commands. Oh, I will add a comment to the first Hald data post explaining how I set the path to my data file.

