Regression 1 – Mathematica’s Eigenstructure Table for the Hald data

The data and its eigenstructure table

Let me show you how Mathematica®’s eigenstructure table is computed. The short answer is: use an algorithm proposed by Belsley, Kuh, and Welsch (henceforth BKW; see the bibliography page) – but applied to the data correlation matrix rather than to “column-equilibrated” data.

Yes, of course I’ll show all that to you.

And, as usual, although I used Mathematica for the computations, and we are looking at a property provided by Mathematica for regressions, the following calculations could be carried out in other computer languages.

Let’s use the Hald data. It turns out that Draper & Smith do exactly what I think BKW recommended, so at the very least, Draper & Smith and I read BKW the same way.
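Since these calculations can be carried out in other languages, here is a minimal Python sketch of BKW-style diagnostics computed from a data correlation matrix: condition indices and variance-decomposition proportions. The data below are illustrative random columns, not the Hald values, and the column names are made up for the example.

```python
import numpy as np

# Illustrative data, not the Hald values: two independent columns and a
# third that is nearly their sum, so a large condition index should appear.
rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Eigendecomposition of the data correlation matrix (rather than of
# column-equilibrated data).
R = np.corrcoef(X, rowvar=False)
eigvals, V = np.linalg.eigh(R)            # eigh returns ascending order
eigvals, V = eigvals[::-1], V[:, ::-1]    # re-sort descending

# Condition indices: sqrt(largest eigenvalue / each eigenvalue).
cond_indices = np.sqrt(eigvals[0] / eigvals)

# Variance-decomposition proportions: phi[j, k] is the share of the variance
# of coefficient j associated with eigenvalue k; each row sums to 1.
phi = V**2 / eigvals
phi = phi / phi.sum(axis=1, keepdims=True)
```

The usual reading of such a table: coefficients whose variance proportions pile up in the column belonging to a large condition index are the ones entangled in a near-dependence.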

Happenings – 2011 Jan 29

Two mathematical events during this past week were deeply satisfying.

One of them occurred Tuesday morning as I was waking up. I do not use an alarm clock, so I wake up gradually. I was… mulling over is a good verb for it… mulling over Monday’s post. By the time I was fully awake, I knew that I had made one mistake in terminology, and that I wanted to point out the distinction between a vector norm and a matrix norm.

This is a far cry from waking up knowing new mathematics (as Ramanujan did)… and, were I perfect, I would never need to wake up knowing I had made a mathematical mistake. But I’m not perfect, and I do make mistakes… and I was glad that my subconscious had pointed it out to my conscious mind.

So, although I did not deliberately set out to “sleep on” Monday’s post, I did in fact do so and it was worthwhile. I made an edit and added two comments to the post on Tuesday morning before work.

The other event occurred Wednesday evening: I now know how the eigenstructure table is computed in Mathematica®.


Regression 1: Multicollinearity in the Hald data – 1

Edited 2011 Jan 25: one “edit” and two “aside” comments, all pertaining to vector and matrix norms.

Introduction and Review

Let me say up front that this post closes by explaining and using the “Variance Inflation Factors” from Mathematica’s LinearModelFit. If that’s what you’re looking for, do a find on “VIF”. (I didn’t know how they were computed when I looked at some of Mathematica’s properties of a linear regression. Now I do.)
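For reference, here is the standard textbook computation of Variance Inflation Factors, sketched in Python with made-up (non-Hald) predictors; presumably this is what the VIF property reports.

```python
import numpy as np

# Illustrative predictors (not the Hald data): x4 is nearly a linear
# combination of x1 and x2, so its VIF should come out large.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x4 = 2.0 * x1 - x2 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])

# VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing
# predictor j on the other predictors.  Equivalently, the VIFs are the
# diagonal entries of the inverse of the predictor correlation matrix.
R = np.corrcoef(X, rowvar=False)
vifs = np.diag(np.linalg.inv(R))
```

A VIF can never fall below 1, and a common rule of thumb treats values above 10 as a sign of serious multicollinearity.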

Before we begin investigating multicollinearity of the Hald data, let’s review what we have learned about the data.

I have shown you three or four ways of selecting the best regression from a specified set of variables – subject to the very significant caveat that we haven’t constructed new variables from them. That is, we have not transformed the variables, taken powers and products of the variables, created lagged variables, or done other such things. Had I done any of those things, I would have included the newly constructed variables in the specified set.

One of the ways was simply to run all possible regressions. When we did that for the Hald data, we found that our selection criteria were divided between two of the regressions.

Happenings – 2011 Jan 22

Last weekend was dominated by two things… a long technical post, and a football game.

I know better than to put five examples into one technical post… on the other hand, I think each one of them was a simple example. (Of course I think that; after all, I both planned and executed them.) It just took a long time to put it all together, and I didn’t get much else done. That’s hardly surprising.

As for the football game, if you follow these things you know that the Pittsburgh Steelers won last weekend and will be playing the New York Jets for the AFC championship this Sunday afternoon. Well, I won’t be doing mathematics Sunday afternoon.

Yes, I plan to put out another post for Monday. But I guarantee it will be shorter than the last one… I will simply split it into pieces, one of which can be finished today or early tomorrow. Whatever gets done will constitute the post.

I did make a little time in the evenings to make progress in Dummit & Foote’s “Abstract Algebra”. I think it’s going to be a challenge to get through the entire book in 10 months. That saddens me, because there are so many math books I want to get through… I simply don’t have enough time for all of them. (It would help if I stopped buying new books, but that’s inconceivable. So my reading list keeps growing instead of shrinking.) In the meantime, I’ll keep trying to get through this book in less than a year.

I also found something new and interesting on the Internet. (Can you imagine that?)

Regression 1: Linear Dependence, or Exact Multicollinearity

You want to read my comment of Feb 7 at the end of this post: there is an additional post you will want to read after you read this one.

Introduction

(Let me say up front that this was a long post to assemble, and I’m not at all sure that my editing got all of the “dependent” and “independent” right – so feel free to point out any mismatches.)

I had thought that I would begin this discussion by looking at the Hald data. It turns out, however, that we need to take a wider view before we specialize.

I have decided that it is a very good idea to begin an investigation of multicollinearity by investigating linear dependence. And I thought I didn’t like the term “exact multicollinearity” – except that it’s perfect for why I’m doing this post.

For one thing, “multicollinearity” has a vague definition: the term is used to describe a set of vectors that are “close to” being linearly dependent… multicollinearity, then, is “approximate linear dependence”. But how close must the vectors be before it’s an issue?

So let’s look at exact linear dependence first.

This is of more than pedagogical interest: what I am about to show you can be applied to multicollinearity.
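As a concrete illustration of exact linear dependence – and of how one might isolate which columns are involved – here is a Python sketch using the singular value decomposition. The data are random and hypothetical, with a dependence built in on purpose.

```python
import numpy as np

# Build an exact dependence deliberately:
# column 3 = 3*(column 1) - 2*(column 2).
# The columns are random illustrative data, not the Hald variables.
rng = np.random.default_rng(2)
A = rng.normal(size=(10, 2))
A = np.column_stack([A, 3.0 * A[:, 0] - 2.0 * A[:, 1]])

# A (numerically) zero singular value signals the dependence; the matching
# right singular vector holds the coefficients of the vanishing combination
# of columns.
U, s, Vt = np.linalg.svd(A)
null_vec = Vt[-1]          # A @ null_vec is (numerically) the zero vector
```

Reading the entries of `null_vec` – here proportional to (-3, 2, 1) – tells us exactly which columns participate in the dependence and with what weights, which is the same question we will be asking, approximately, of multicollinear data.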

Happenings – 2011 Jan 15

Well, I’ve managed to put off this post until almost the last minute; but since I did not put out a technical post last Monday, I certainly have to put out a diary post today.

Why is it almost the last minute? Because I have to drive to a friend’s house to watch the Pittsburgh Steelers play the Baltimore Ravens. No, I’m not a football fan – but I most certainly am a Steelers fan. So I won’t be doing any math or blogging for the rest of the day.

(My alter-egos the kid and the grad student have already put in their time on chemical reaction design and abstract algebra respectively.)

As for last weekend… I got distracted by… mathematics. Can you imagine that?

Instead of writing about multicollinearity for the next post, I decided to write about linear dependence. Multicollinearity, after all, is “approximate linear dependence”. So let’s look at “exact linear dependence” first.

My own approach to isolating multicollinearity – figuring out which of the variables are involved – is exactly the approach one would use to isolate linear dependence. It seems to make perfect sense to introduce the subject by looking at linear dependence itself. And the more I look, the more convinced I am of that.

Anyway, that post is somewhere in stage IV: all the mathematics is done, and a fair bit of the discussion has been written. And that was probably all done last Saturday afternoon.

So what happened? Well, I decided to keep looking at multicollinearity itself. To put that another way, I just couldn’t tear myself away from the math. If you will, I was just looking ahead a few posts.

I really, really expect to put out the post about isolating linear dependence this Monday. (It’s already been four consecutive Happenings posts… let’s not make it five.)

As for the blog itself, it passed 90,000 hits last Sunday.

Happenings – 2011 Jan 8

There is a disadvantage to going so long without a diary post: it takes a while to figure out what I had been doing in the meantime, because there’s so long a time period to be searched.

I had started to put out a happenings post on Christmas morning, but I got distracted. Yes, distracted by Christmas presents… but also by the fact that I wanted to work on regression, multicollinearity in particular.

On the other hand, that may have been the last time I touched regression during these holidays. No, the Mathematica notebook shows that I put it down at noon the day after Christmas.

Whatever mathematics I’ve done since then, was done by two of my less grown-up alter egos… the kid and the grad student.