the relationship between the raw and the orthogonalized data

OK, so we orthogonalized the Hald data, including the constant (the column of 1s).

What’s the relationship between the new variables and the old? We might someday get a new observation, and if we were using the fit to the orthogonalized data, we might want to see what it predicts for a new data point.

(In all honesty, I would use the original fit – but I still want to know what the relationship is.)

My notation is a little awkward. I’m going to stay with what is used for this post, in which I first showed how to find….

Let me start fresh. If we have two typical data matrices (i.e. taller than wide), and they are supposed to be the same data, how do we find the relationship?
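Here is a minimal sketch of one way to answer that question numerically. It is not the calculation from the post: it uses Python with NumPy, made-up data in place of the Hald data, and NumPy's QR factorization as a stand-in for the orthogonalization. The names X, Q, T, x_new and z_new are all hypothetical. The idea is that if the columns of Q span the same space as the columns of X, then least squares recovers a square matrix T with X T = Q, and the same T maps a new raw observation to the new variables.

```python
import numpy as np

# Made-up raw design matrix (NOT the Hald data): a column of 1s plus 3 variables.
rng = np.random.default_rng(0)
n = 13
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = rng.normal(size=n)

# Stand-in for "the orthogonalized data": Q has orthonormal columns, X = Q R.
Q, R = np.linalg.qr(X)

# Find T with X @ T = Q by least squares; here T should equal inv(R).
T, *_ = np.linalg.lstsq(X, Q, rcond=None)

# Fit y against the raw data and against the orthogonalized data.
beta_raw, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_new, *_ = np.linalg.lstsq(Q, y, rcond=None)

# A new raw observation (leading 1 for the constant), mapped by the same T.
x_new = np.array([1.0, 0.2, -1.0, 0.5])
z_new = x_new @ T

pred_raw = x_new @ beta_raw   # prediction from the original fit
pred_new = z_new @ beta_new   # prediction from the orthogonalized fit
print(pred_raw, pred_new)
```

Under these assumptions the two predictions agree, because the orthogonalized fit is the original fit expressed in different coordinates.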

Happenings – 2011 Nov 26

It’s not yet 10 AM as I begin to draft this, and it’s already been a productive day. I woke up at 3:30 AM… after failing to fall back asleep in an hour, I started correcting a mistake in one of the regression posts… I updated the post at 5:30 AM and went back to bed.

The mistake, in case you’re curious, is in the Toyota post of August 22: I meant to have Mathematica® display the correlation matrix of the data… but I actually asked for the correlation matrix of the parameters… thinking that I was computing the correlation matrix of the design matrix. Sheesh! I know better. I just wasn’t paying attention to the details.

Actually, I feel pretty good about that… over the Thanksgiving holiday I reread all 30 regression posts… and that was the only mistake I found. Of course, that doesn’t mean I made no other mistakes. I’d be very surprised if there were none.

I came across some interesting numbers this week or last. Frankly, I don’t actually believe them – having no idea where they came from, but… the IRS (the United States tax collecting agency) has an approval rating of 36%… Paris Hilton has an approval rating of 15%… more surprisingly, British Petroleum had an approval rating of 16% during the Gulf of Mexico oil leak… and –

Regression 1: eliminating multicollinearity from the Toyota data

We have seen that we can eliminate the multicollinearity from the Hald data if we orthogonalize the design matrix – thereby guaranteeing that the new data vectors will be orthogonal to a column of 1s. That, in turn, centers the new data, so that it is uncorrelated as well as orthogonal.
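The step from "orthogonal to a column of 1s" to "uncorrelated" can be checked numerically. The sketch below is an illustration only: it uses Python with NumPy, made-up correlated data rather than the Hald data, and NumPy's QR factorization (Householder-based, not Gram-Schmidt) to do the orthogonalizing. A vector orthogonal to the 1s has entries summing to zero, so its mean is zero; for mean-zero vectors, a zero dot product is a zero covariance.

```python
import numpy as np

# Made-up data (NOT the Hald data): two strongly correlated variables.
rng = np.random.default_rng(1)
n = 13
raw = rng.normal(size=(n, 2))
raw[:, 1] += 0.9 * raw[:, 0]

# Design matrix with the column of 1s first, then orthogonalize it.
X = np.column_stack([np.ones(n), raw])
Q, _ = np.linalg.qr(X)

# Drop the (rescaled) constant column; keep the new data vectors.
new = Q[:, 1:]

# Dot product with the 1s is just the column sum, so the new data is centered.
col_sums = new.sum(axis=0)

# Correlation matrices before and after.
corr_raw = np.corrcoef(raw, rowvar=False)
corr_new = np.corrcoef(new, rowvar=False)

print(col_sums)   # both entries approximately 0
print(corr_raw)   # noticeably nonzero off-diagonal entry
print(corr_new)   # off-diagonal entry approximately 0
```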

Doing that to the Toyota data will seem strange… because we have to do it to the dummy variables, too! But it will eliminate the multicollinearity.

I’m not sure it’s worthwhile to eliminate it… but we can… so let’s do it.

Happenings – 2011 Nov 19

Mathematically speaking, it’s been a slow week and a slow day. My alter ego the kid has been playing with the very beginning of differential geometry – namely, curves in 3-D space – and with the very beginning of “Excursions in Modern Mathematics” by Tannenbaum and Arnold – namely, different ways of deciding an election.

It’s a wonderful book in general and I highly recommend it.

In fact, the kid has been having so much fun with voting methods that I had trouble stopping him so I could write this.

In addition, I spent some time searching for Arrow’s impossibility theorem – which says that there is no voting method satisfying a particular set of desirable properties, so we can stop looking for a voting scheme that is “fair” in that specific sense. (Arrow himself shared the 1972 Nobel Prize in economics.)

Here, let me offer you some links.

Regression 1: eliminating multicollinearity from the Hald data

I can eliminate the multicollinearity from the Hald dataset. I’ve seen it said that this is impossible. Nevertheless I conjecture that we can always do this – provided the data is not linearly dependent. (I expect orthogonalization to fail precisely when X’X is not invertible, and to be uncertain when X’X is on the edge of being not invertible.)

The challenge of multicollinearity is that it is a continuum, not usually a yes/no condition. Even exact linear dependence – which is yes/no in theory – can be ambiguous on a computer. In theory we either have linear dependence or linear independence. In practice, we may have approximate linear dependence, i.e. multicollinearity – but in theory approximate linear dependence is still linear independence.

But if approximate linear dependence is a continuum then it is also a continuum of linear independence.
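One way to see the continuum is to watch a numerical measure of near-dependence as two columns are pushed together. The sketch below is an assumption-laden illustration, not something from the post: it uses Python with NumPy, made-up data, and the condition number of the design matrix as the measure (my choice). It also shows how exact dependence becomes a judgment call on a computer, since the reported rank depends on a tolerance.

```python
import numpy as np

# Made-up data: column b is column a plus noise of size eps.
rng = np.random.default_rng(3)
n = 13
a = rng.normal(size=n)

results = []
for eps in (1e-1, 1e-4, 1e-8, 0.0):
    b = a + eps * rng.normal(size=n)
    X = np.column_stack([np.ones(n), a, b])
    cond = np.linalg.cond(X)          # grows as the columns approach dependence
    rank = np.linalg.matrix_rank(X)   # rank as judged against a default tolerance
    results.append((eps, cond, rank))

for eps, cond, rank in results:
    print(f"eps={eps:g}  cond={cond:.3g}  rank={rank}")
```

In theory every case with eps > 0 is linearly independent and only eps = 0 is dependent; the condition number, by contrast, changes gradually.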

So what’s the extreme form of linear independence?


What happens if we orthogonalize our data?

The procedure isn’t complicated: use the Gram-Schmidt algorithm – on the design matrix. Let me emphasize that: use the design matrix, which includes the column of 1s. (We will also, in a separate calculation, see what happens if we do not include the vector of 1s.)

Here we go….
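For readers who want to try the procedure outside Mathematica, here is a minimal sketch in Python with NumPy. It is not the code from the post. The function name gram_schmidt and its normalize flag are hypothetical, the data is made up rather than the Hald data, and I have used the modified form of Gram-Schmidt, processing the columns left to right so that the column of 1s is kept as the first vector. Whether the post normalizes the resulting vectors is not stated here, so normalization is off by default.

```python
import numpy as np

def gram_schmidt(X, normalize=False):
    """Orthogonalize the columns of X, left to right (modified Gram-Schmidt)."""
    Z = np.array(X, dtype=float)   # work on a copy
    for j in range(Z.shape[1]):
        for i in range(j):
            qi = Z[:, i]
            # Remove from column j its component along earlier column i.
            Z[:, j] -= (qi @ Z[:, j]) / (qi @ qi) * qi
        if normalize:
            Z[:, j] /= np.linalg.norm(Z[:, j])
    return Z

# Made-up design matrix (NOT the Hald data): 1s, a, and a nearly collinear b.
rng = np.random.default_rng(2)
n = 13
a = rng.normal(size=n)
b = a + 0.01 * rng.normal(size=n)
X = np.column_stack([np.ones(n), a, b])

Z = gram_schmidt(X)
G = Z.T @ Z   # Gram matrix: off-diagonal entries should be approximately 0

print(np.round(G, 10))
print(np.corrcoef(Z[:, 1:], rowvar=False))   # new variables are uncorrelated
```

To see what changes without the constant, pass X[:, 1:] instead: the new vectors are still mutually orthogonal, but nothing forces them to be centered.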

Happenings – 2011 Nov 12

This post is late going out – but at least it will go out Saturday night rather than Sunday. I had to do some homeowner things late this morning, and it took me all day to get back to this.

First off, the good news: I have a post for this coming Monday – how to eliminate multicollinearity from the Hald data. It awaits only final edits – barring an emergency, it should go out Monday evening. Finally, after 5 consecutive diary posts, I’m putting out a technical post again.

Let me add one more piece of mathematics which I think a math geek should know: there must be 12 pentagons. That is, we can build a surface from hexagons and pentagons subject to 2 constraints: there can be any number, including zero, of hexagons except just one; and there must be exactly 12 pentagons. We can make a soccer ball almost any way we like – but it must have exactly 12 pentagons and it cannot have only 1 hexagon.
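The count of 12 follows from Euler's formula, under two assumptions that the statement above leaves implicit: the surface is a closed one shaped like a sphere (so V - E + F = 2), and exactly three faces meet at every vertex, as on a soccer ball. With P pentagons and H hexagons (symbols introduced here for illustration), the derivation can be sketched as:

```latex
\begin{align*}
F &= P + H, \qquad
E = \frac{5P + 6H}{2}, \qquad
V = \frac{5P + 6H}{3}, \\
2 &= V - E + F
   = \frac{5P + 6H}{3} - \frac{5P + 6H}{2} + P + H, \\
12 &= 2(5P + 6H) - 3(5P + 6H) + 6P + 6H = P .
\end{align*}
```

Each edge is shared by two faces and each vertex by three, which gives the expressions for E and V. The hexagons cancel, so this count puts no restriction on H; ruling out exactly one hexagon is a separate result that this argument does not show.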

Now, that’s not important mathematics – it’s just weird mathematics – one might say it’s more geeky than it is mathematical.

While I’m at it, let me add another topic for our well-rounded math geek: Fermat’s Last Theorem.

Happenings – 2011 Nov 5

This weekend – for large values of “weekend”, namely 4 days – I have a friend staying with me. I’m not really on vacation anymore – but he is.

I have made progress on regression. I now know how to completely eliminate multicollinearity from a data set – but I will not have time to write this up this weekend. In other words, expect no technical post this coming Monday. Still, I have the material for one.

My visitor pointed out another list of 9 essentials for geeks: 9 Equations True Geeks Should (at Least Pretend to) Know.

If nothing else, this is worth looking at for all of the comments from readers that followed.

Generally speaking, if I were to propose such a list, I would focus more on relationships than on single equations. Let me give you a few examples… although I will omit almost all of the details.
