Regression 1: Multicollinearity in the Hald data – 2


The Hald data turns out to have been an excellent choice for investigating multicollinearity: it has at least four “near linear dependencies”. I’m about to show the details of three of them. (And in a subsequent post, I think I can eliminate all but three of them – but not the same three!)

We have already seen one of them: we know that the four independent variables have a nearly constant sum just under 100. (These four variables are, in fact, a subset of a larger set of variables – whose sum was 100%.)

We have seen two approaches to finding (exact) linear dependence of a set of variables:

We used the singular value decomposition of the design matrix for all four variables (that is, our subset was the entire set) to discover that all four variables, together with a constant, were multicollinear. But I did not continue looking at all the subsets.

I think I will return to that approach, looking at all subsets… but not today. Instead, I want to look at an orthonormal basis for the closest thing we have to a null space. And I want to do it for four regressions which I decided were worth investigating.
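To make "the closest thing we have to a null space" concrete, here is a minimal sketch in Python with NumPy. It does not use the Hald data: the matrix below is synthetic (an assumption made purely for illustration), built so that four columns sum to roughly 98, and the helper name `smallest_right_singular_vector` is hypothetical. The right singular vector belonging to the smallest singular value is the unit vector v that makes ||Xv|| as small as possible, so its components spell out the near linear dependence.

```python
import numpy as np

def smallest_right_singular_vector(X):
    """Return (singular values, v), where v is the unit right singular
    vector belonging to the smallest singular value of X."""
    # NumPy returns the singular values in descending order, so the
    # last row of Vt goes with the smallest one.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return s, Vt[-1]

# Synthetic stand-in for a design matrix (NOT the Hald data):
# a constant column plus four variables whose sum is nearly 98.
rng = np.random.default_rng(0)
n = 13
x1 = rng.uniform(1, 20, n)
x2 = rng.uniform(25, 50, n)
x3 = rng.uniform(4, 23, n)
x4 = 98.0 - x1 - x2 - x3 + rng.normal(0.0, 0.1, n)
X = np.column_stack([np.ones(n), x1, x2, x3, x4])

s, v = smallest_right_singular_vector(X)

print("singular values:", s)
print("||X v|| =", np.linalg.norm(X @ v))  # equals the smallest singular value
print("v rescaled so the x1 coefficient is 1:", v / v[1])
```

By construction, the rescaled vector should be roughly proportional to (-98, 1, 1, 1, 1), which is just the statement that x1 + x2 + x3 + x4 is nearly 98. With more than one small singular value, the corresponding rows of `Vt` would together give an orthonormal basis for the near null space.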

I’m going to use one additional tool, the variance inflation factors – indirectly. You will see that I view them as one possible explanation for multicollinearity detected by the SVD. (And yes, that should be surprising.)
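For readers who have not met variance inflation factors, here is a minimal sketch of the textbook definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on a constant and the other variables. This is not necessarily how the post computes them; the function name `variance_inflation_factors` is hypothetical, and the data are again synthetic rather than the Hald data.

```python
import numpy as np

def variance_inflation_factors(Z):
    """VIF for each column of Z (n x p, with no constant column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from the least-squares fit
    of column j on a constant plus the remaining columns."""
    n, p = Z.shape
    vifs = np.empty(p)
    for j in range(p):
        y = Z[:, j]
        A = np.column_stack([np.ones(n), np.delete(Z, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        vifs[j] = ss_tot / ss_res  # algebraically the same as 1 / (1 - R^2)
    return vifs

# Synthetic data (NOT the Hald data): four variables with a nearly constant sum.
rng = np.random.default_rng(1)
n = 13
z1 = rng.uniform(1, 20, n)
z2 = rng.uniform(25, 50, n)
z3 = rng.uniform(4, 23, n)
z4 = 98.0 - z1 - z2 - z3 + rng.normal(0.0, 0.1, n)
Z = np.column_stack([z1, z2, z3, z4])

vif = variance_inflation_factors(Z)
print("VIFs:", vif)

# Cross-check: the VIFs are also the diagonal of the inverse correlation matrix.
vif_from_corr = np.diag(np.linalg.inv(np.corrcoef(Z, rowvar=False)))
print("from inverse correlation matrix:", vif_from_corr)
```

A large VIF says that one variable is almost a linear combination of the others plus a constant, which is one way a small singular value of the design matrix can arise.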

Happenings – 2011 Feb 26

There is no way I should take the time to write this post… but I’m going to anyway. I want to talk about Freeman Dyson and problem solving and look briefly at some interesting mathematics… but I really should be working on two technical blog posts.

I have long known of Freeman Dyson as the theoretical physicist who reconciled the disparate theories of quantum electrodynamics for which Feynman, Schwinger, and Tomonaga won the Nobel Prize.

What I hadn’t realized was that he started out as a mathematician… and more, he was that most esoteric of mathematicians (at least for his day), a number theorist.

He tells a fascinating tale of compartmentalization, and I want to share it with you.

As for what I should be doing instead…. I found myself wishing, last Sunday, that I had a reserve post in my pocket… so I put some time into one during the week, but it’s not ready yet. (There’s one question left to answer: exactly what are the odds against improving a pair by drawing three cards?)

And, of course, I have a multicollinearity post to write. Three diary posts in a row, without a technical post in between. What kind of lazy dog have I become?

Happenings – 2011 Feb 19

It’s been a slow week.

I’m still watching Babylon 5, but I’ve managed a little abstract algebra in the evenings. The kid looked at two-person zero-sum games this morning.

Oh, no post went out last Monday… I was distracted by another obligation on Sunday… and I needed to think about just what I can do concerning multicollinearity. I had no idea that I was opening such a can of worms; just identifying all the multicollinearity in the Hald data will take one or two or three more posts – and assessing its severity may be more than I can cope with at this stage. I may end up just laying out rough suggestions instead of clearly defined mathematical tools. I do believe I can clarify things… I’m just far from convinced that I can solve the things.

I’ve ordered a book on alternatives to ordinary least squares; it simply looked tempting while I was looking for books on multicollinearity.

I found something, and I’ve ordered it – but I don’t have any idea what it is, beyond the title, “Multicollinearity”. It’s “print-on-demand”, with no ISBN, so I conjecture that it is self-published. With any luck it’s a graduate level exhaustive and informative study – but why couldn’t it be published via normal channels? Maybe it’s just a Master’s thesis overview. Maybe it’s an undergraduate thesis. Maybe it’s 300 pages… maybe it’s 15 pages. And in any case, maybe it’s a pile of crap. For $38 including shipping, it seemed a worthwhile gamble. (Boy, I hope it’s not hand-written!)

I’ll let you know.

Meanwhile, the blog had passed 95,000 hits when I took a quick look last Monday morning.

As usual, my alter ego the grad student gets to do abstract algebra now; then I’ll pick up multicollinearity….

Happenings – 2011 Feb 12

Mathematically speaking, last weekend was wonderful. I saw the simple solution for simultaneously isolating and identifying linear dependence in a matrix. And I posted the simple solution last Monday.

I am, as you might expect, disappointed that I didn’t see the simple solution immediately… but there you have it: I’m a human being. I gladly give myself credit for seeing it at all.

I am, on the other hand, extremely optimistic that this simple solution can be used to isolate and identify multicollinearity… defined specifically as “near linear dependence”. (Many people seem to treat multicollinearity as though it were defined by its symptoms. That would be fine… if those symptoms were always caused by multicollinearity, but they’re not.)

In addition, I have decided that dealing with multicollinearity has two phases: isolation/identification, and assessing the severity. What do we have? And how serious is it?

Regression 1 – Isolating & Identifying Linear Dependence

I almost named this post “Nailing Linear Dependence”. It’s so easy….


This post will show just how easy it is to isolate and identify linear dependence. Looking at subsets, as we did in this previous post, works – but what I’m about to show you is easier and faster. (Henceforth I will refer to that link as “the previous post” or once as “that post”.)

On the other hand, we will see a case where looking at subsets gives an equivalent answer that might be more appealing.

Now, I’m going to solve the same five examples as in the previous post. I am not going to duplicate the introductory material in the previous post, so if you need more context, please read that post. You might or might not want to spend much time looking at the details of examining subsets of the columns of X – that examination is what I’m about to replace by something more incisive.

As usual, I will use either X^T or X’ to denote the transpose of X, and v’ or v^T to denote the transpose of v.

Example 1


Happenings – 2011 Feb 5

WordPress has been running extremely slowly for the past half day, so I have no idea when I will actually be able to put out this post. But I will write it up this morning – Saturday – as per my usual schedule, and I will put it out when WordPress permits.

Let me lead with a picture: this is called Heighway’s Dragon:

Let me start, however, with a different subject.
