Regression 1: You inverted what matrix?

Edit: 2011 Nov 25: In “How the Hell” I have some negative signs. Find “edit”.

I want to show you something about LinearModelFit that shocked me. I will use the Toyota data. Let me point out that I am using version 7.0.1.0 of Mathematica@. Furthermore, this post is in two categories, OLS and linear algebra; the example comes from regression, but the key concept is the inverse of a matrix.

setup

Here it is again:

There are 3 columns of data:

We want to introduce three dummy variables, as before.

Here’s the dummy variable for the second region, 21-42 inclusive…

Finally the dummy variable for the third region, from 43 on.

I told you before that we should not even try to use all three dummy variables: they add up to 1 – and that’s the constant term in the design matrix.

(I keep checking that to make sure that there isn’t one zero sitting in there, or a two. You’ll see why I wish I had made a mistake in there.)

I want to rerun the regressions using dum2 and dum 3. Here’s the data matrix d2, the names list n2 for the regressions, and the longer names list N2 for some of my analysis:

Here’s my forward selection…

These are nothing new; we saw them in the original post about this data.

Let’s just look at the largest regression, using bac2 because it has the variables in their original order:

We saw in the first post that we have severe multicollinearity – but the fit is not particularly sensitive to the data, and all my selection criteria would choose this regression, with both AGE and MILES, over the regression with only AGE (and two dummies).

There is a problem, however, if we add the third dummy variable. Then the design matrix X will be of rank 5 rather than of rank 6, and X’X will not be invertible. We’re not talking severe multicollinearity here, we’re talking exact linear dependence.

What shocked me, when I prepared to demonstrate this, is that Mathematica® went right ahead and, in essence, inverted the matrix. Well, it got a ridiculous, wrong answer. If you want to see it right away, jump to the section “What the hell?”.

First let me show you what I think LinearModelFit computed.

two forms of a little trick

Get the design matrix and call it X. In fact, get the transpose X’ and the product X’X. We’re working with only two of the dummy variables; everything’s fine.

Get the eigenvalues and eigenvectors of X’X… print the eigenvalues… get V and the diagonal matrix of eigenvectors, such that

$\Lambda = V'\ (X'X)\ V\$,

or, equivalently, that

$X'X = V\ \Lambda\ V'\$ :

Check that we get $\Lambda\$:

Here’s one form of the trick. Given the diagonal matrix $\Lambda\$ of eigenvalues of any square matrix A, I could invert A by inverting the diagonal elements of $\Lambda\$. Don’t misunderstand: we did an awful lot of work to get the eigendecomposition; it’s just that having done all that work, inverting A is very easy.

Let’s do it. Here, A = X’X. $\Lambda i\$ is a diagonal matrix whose elements are $1/\lambda\$ instead of $\lambda\$. Then, in place of $V \Lambda V'\$, we compute $V \Lambda i\ V'\$, and call it $X'X^{-1}\$, in the justifiable expectation that it is the inverse…

Check it:

There was no need to do all that, of course; the inverse can be obtained directly by asking for it:

Let me consider an alternative, using the SVD of X instead of the eigendecomposition of X’X. We have

X = u w v’

Transpose:
X’ = v w’ u’

X’X simplifies because u is orthogonal (u’u = I):
X’X = v w’ u’u w v’ = v w’w v’

Invert (noting that $v^{-1} = v'\$):
$(X'X)^{-1} = v^{-T}\ \ (w'w)^{-1}\ v^{-1} = v\ (w'w)^{-1}\ v'$

and then compute the regression coefficients $\beta\$:
$\beta = (X'X)^{-1} X'y = v (w'w)^{-1} v' v w' u'y = v (w'w)^{-1} w' u'y = v W u' y\$.

That is,
$\beta = v\ W\ u'\ y\$,

with
$W = (w'w)^{-1}\ w'$.

I will use that later.

OK, just as we had $(X'X)^{-1}\ X'\$, we get $(w'w)^{-1}\ w'\$. Let the nonzero entries of w be $\sigma\$. The nonzero entries of $w'w^{-1}\$ are the $\sigma^{-2}\$, then $w'w^{-1}\ w'\$ has us multiply by $\sigma\$, so the net effect is just $\sigma^{-1}\$.

And what does that work out to be?

w is the same shape as x, so w’ is short and wide. OK, take w’ but invert the singular values $\sigma\$. So, if we do a singular value decomposition of a design matrix X, we can compute the regression coefficients without, strictly speaking, doing a matrix inversion. We just invert some real numbers.

Now let me show you an application of that! A bad application of that.

what the hell?

Let me show you why that trick may be relevant.

Before I put out the Toyota post, I had expected to demonstrate that we could not use all three dummy variables. Let’s see what happens when we do.

Here is a data matrix with all three dummies…

Here are two sets of names…

Now I call for a backward selection – and the very first one, which should fail utterly, works. (I have confirmed that a direct call to LinearModelFit also “works”, without any error or warning message.)

Consider the first regression in the list, the one with all the variables. What the hell? How did Mathematica manage to do that? He should not have been able to invert X’X.

Here’s the resulting parameter table:

O…kay. Four coefficients about 10^15. That’s nice. We’re looking at nonsense, as we should be.

Let’s look at the design matrix for that regression.

As was true for the Bauer matrix, in a long-ago post, so for this one: that last singular value is really zero. The matrix is – as we know in theory – of rank 5 instead of rank 6. (Without the Tolerance parameter, the singular value list would have had only five entries.)

We could look at the X.v numerically – I assure you, every entry in the last column is about $10^{-14}\$.

We could identify the vector that spans the null space:

Ok, every term has .5 instead of 1, but that vector is parallel to

– CON + D1 + D2 + D3

and that, in turn, says

CON = D1 + D2 + D3.

The tools we’ve used in the past work perfectly well to isolate and identify this exact linear dependence.

But what did Mathematica do?

I think it did something like the trick I showed you. To be specific, I think it inverted the singular values of X. I can’t literally reproduce the coefficients in the fit, but I get close enough that I’m satisfied that this is what went wrong.

By the way, if we ask for the inverse of X’X?

That error message is nice to see. What scares me is that if Wolfram changes the algorithm for “Inverse”, we may get nonsense answers – like that one – without an error message!

FYI, that inverse doesn’t work correctly; if it’s the inverse of X’X, then this product…

should have been an identity matrix.

So. Be very careful: LinearModelFit can deliver bullshit without even a “by your leave”.

How the hell

Here’s what I think LinearModelFit did.

Hang on. I don’t want to actually multiply out $W = (w'w)^{-1}\ w'\$, I just want to construct W by inverting the singular values individually. Let get the matrix w’, and a copy which I will modify. (I show only part of w; it’s got 57 columns, the rightmost 51 of which are identically zero.

Now I invert the six singular values, including that $10^{-15}\$ garbage:

Now I compute the coefficients for a regression as derived earlier:

$\beta = v\ W\ u'\ y\$.

By doing it my way, I got -8.11396 10^15 instead of -8.35193 10^15, 8.68853 instead of 6.75663, and -24.5693 instead of -23.9602.

Edit: oops. Although my coefficients for AGE and MILES have the right signs, all of my coefficients on the order of 10^15 have the opposite signs. They’re very close in magnitude – and that’s what I jumped on. I am no longer content. Let me think about this. The following paragraph is no longer valid.

I’m content. I don’t know _exactly_ how Mathematica did that, but my calculation is close enough to strongly suggest that they inverted the singular values of X – including the 10^-15 entry. (Since the matrix product is associative, maybe I can match their answers if I change the precedence in my product. This isn’t worth worrying about.)

So. Do not relax as soon as you get answers from LinearModelFit. I may make it a personal rule to call for the inverse of X’X, with X the design matrix. On the other hand, the singular values of X are a powerful indicator: 10^-15 is really zero.

Grrrr.

For the record, I repeat that I am using version 7.0.1.0. My personal regression tools notebook does not run under version 8 – and I have better things to do with my time than deal with the lack of backwards compatibility in Mathematica. Version 8 can do stuff with wavelets – and that’s when I’ll use it. Besides, if I wait long enough, they’ll bring out version 9… or even 10… and I can completely avoid updating god knows how many notebooks to version 8!

Again, grrrr.