Regression 1 – Example 7: Nylon Yarn

I have found some data that illustrates a few points I want to make when I summarize what I’ve shown you about ordinary least squares regression – the summary should be published soon. Let me provide some evidence for part of it.

This example comes from Atkinson, “Plots, Transformations and Regression”, Oxford Science Publications, reprinted 1988, ISBN 0198533594. It was example 8, on p. 106, and it deals with the properties of nylon yarn.

He, in turn, took it from John, “Outliers in Factorial Experiments”, Appl. Statist. 27, 111–119, 1978.

I do not know of an online source for this data.

Here’s what we have:


Happenings – 2011 Sep 24

(Edited 1 Oct to replace an equals sign by approximation. Find “Edit”.)

There are so many things I could talk about today but I’m not sure where to start or how many of them to cover.

So let me start at the beginning.

When I woke up this morning, I realized that I had answered a fairly long-standing question pertaining to multicollinearity. (Yes, I wake up dreaming about math.) Assume for the moment that we have exact linear dependence among the columns of a matrix X. That is, we have an equation

X\ C = 0\ .

Note that we can multiply that equation by any scalar \alpha , so we really have

X (\alpha\ C) = 0\ .

Assume further that we have fitted the equation

\hat{y} = X\ \beta\ .

We could combine them as

\hat{y} = X\ (\beta + \alpha\ C)\ .

In other words, even if we could find the beta vector, we could add any multiple of the C vector to it without changing the fitted values – the coefficients are not uniquely determined.
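That algebra is easy to check numerically. Here is a minimal numpy sketch (the blog itself works in Mathematica); the matrix, the C vector, and the coefficients are invented for illustration, with the third column of X built as the sum of the first two so that X C = 0 exactly:

```python
import numpy as np

# Build X with an exact linear dependence: column 3 = column 1 + column 2,
# so X @ C = 0 for C = (1, 1, -1).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))
X = np.column_stack([A, A[:, 0] + A[:, 1]])
C = np.array([1.0, 1.0, -1.0])
assert np.allclose(X @ C, 0)           # exact linear dependence

# Any multiple of C can be added to the coefficients
# without changing the fitted values.
beta = np.array([2.0, -1.0, 0.5])
alpha = 3.7                            # any scalar works
yhat1 = X @ beta
yhat2 = X @ (beta + alpha * C)
assert np.allclose(yhat1, yhat2)       # identical fitted values
```

So any least-squares routine that claims to return “the” coefficient vector for such an X has made an arbitrary choice from a whole line of solutions.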

Regression 1 – Example 6 Revisited (Housing Starts)

There was more I had intended to say about the housing starts data, i.e. example 6. Here it is.

Review and Introduction

Recall that we had 5 independent variables.

As usual, I want to have a list available which includes the name of the constant:

You might also recall that we have only 23 observations.

We had, also as usual, run both forward selection and backward selection:
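Since the Mathematica output didn’t survive here, a rough Python sketch of what forward selection does may help: starting from just the constant, greedily add whichever regressor most reduces the residual sum of squares. The function name and the synthetic data are my own inventions; backward selection is the mirror image, starting from the full model and deleting variables.

```python
import numpy as np

def forward_select(X, y, k):
    """Greedily pick k columns of X: at each step add the column that
    most reduces the residual sum of squares of an OLS fit (with constant)."""
    n, p = X.shape
    chosen = []
    for _ in range(k):
        best, best_rss = None, np.inf
        for j in range(p):
            if j in chosen:
                continue
            cols = np.column_stack([np.ones(n), X[:, chosen + [j]]])
            coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
            rss = np.sum((y - cols @ coef) ** 2)
            if rss < best_rss:
                best, best_rss = j, rss
        chosen.append(best)
    return chosen

rng = np.random.default_rng(1)
X = rng.normal(size=(23, 5))   # 23 observations, 5 candidate regressors
y = X[:, 2]                    # y depends only on column 2
picked = forward_select(X, y, 2)   # column 2 is picked first
```

Real implementations use t-statistics or an information criterion as the stopping rule rather than a fixed k, but the greedy structure is the same.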



Happenings – 2011 Sep 17

I can’t say that it’s been a slow week for mathematics – but it hasn’t been very productive either.

I did write up one of the dozen or so small things for which I’ve done the mathematics; namely, Andrews curves. They were mentioned by Jolliffe as a possible tool for deciding how many principal components to keep in a principal component analysis. I couldn’t figure out what they were from his description. But I’ve known for a while now, and I finally wrote them up.
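For the record, an Andrews curve maps each observation x = (x1, x2, …, xp) to the function f_x(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + …, plotted over −π ≤ t ≤ π. That formula is standard (it goes back to Andrews, 1972); the little Python sketch below is mine, not the write-up mentioned above:

```python
import numpy as np

def andrews_curve(x, t):
    """Evaluate the Andrews curve of observation x at the points t:
    f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + ..."""
    f = np.full_like(t, x[0] / np.sqrt(2.0), dtype=float)
    for i, xi in enumerate(x[1:]):
        k = i // 2 + 1                 # harmonic index runs 1, 1, 2, 2, 3, ...
        f += xi * (np.sin(k * t) if i % 2 == 0 else np.cos(k * t))
    return f

t = np.linspace(-np.pi, np.pi, 201)
curve = andrews_curve(np.array([1.0, 2.0, 3.0]), t)   # 1/sqrt(2) + 2 sin t + 3 cos t
```

Plotting one such curve per observation turns a p-dimensional data set into a bundle of one-dimensional curves that can be eyeballed for strays.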

I do hope, however, that I don’t have to publish that post this weekend. I really would like to build a reserve. But we’ll see what happens.

The real challenge is that I am torn between 3 possible posts… 3 possible data sets… for the next multicollinearity post. If I can’t settle on one to work on, I won’t get any of them finished. So we’ll see what happens.

I’ve made a little more progress in Stillwell’s “Naive Lie Theory”.

In the meantime, I’ve been browsing. In somewhat specific attempts, I’ve been looking at single deletion statistics again, taking notes on discussions of the care and feeding of multicollinearity: detection, isolation, severity, and how to cope with it. I’ve also been looking at more general regression estimators – ridge regression and Bayesian methods – and at the Box-Cox family of transformations of the dependent variable for minimizing the error sum of squares.
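For reference, the Box-Cox family is y^(λ) = (y^λ − 1)/λ for λ ≠ 0, with log y as the λ → 0 limit. A minimal sketch (my own, in Python rather than the blog’s Mathematica):

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of positive data y:
    (y**lam - 1)/lam for lam != 0, and log(y) in the lam -> 0 limit."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam
```

One fits the regression over a grid of λ and compares the fits; note that a Jacobian correction is needed when comparing error sums of squares across different λ, since the transformed y’s live on different scales.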

But that was either taking notes or just reading stuff – oh, I worked out one, just one, transformation.

More generally, I spent an evening flipping through all of my books on commutative algebra and algebraic geometry while watching reruns of Criminal Minds – it’s one of my smaller collections of books, only a dozen in fact. Here’s what I’ve learned (!):

Commutative algebra is essentially the study of commutative rings. Roughly speaking, it has developed from 2 sources: (1) algebraic geometry and (2) algebraic number theory. In (1) the prototype of the rings studied is the ring of polynomials k[x1, …, xn] in several variables over a field k; in (2) it is the ring Z of rational integers.

(Atiyah & MacDonald, “Introduction to Commutative Algebra”, ISBN 0-201-40751-5, p. vii.)

Oh, I’ve also learned that I won’t be starting with that book; I will probably use volume 1 of Zariski & Samuel, “Commutative Algebra”. My copy is old, 1958, and predates ISBNs. A quick check of Amazon shows only volume 2 available.

When I pick up algebraic geometry proper, I will resume with Cox, Little, & O’Shea’s “Ideals, Varieties, and Algorithms”, ISBN 0-387-97847-X.

In any case, however, I probably won’t get to this anytime soon.

Regression 1 – Example 6: Housing Starts

Time for another regression using Ramanathan’s data. Here’s the description of this dataset from the 4th edition. See this post for information about obtaining his datasets.

(*
DATA4-3: Annual data on new housing units and their determinants
Source: 1987 Economic Report of the President. Because the housing
series has been discontinued, this data set could not be updated.
housing = total new housing units started, in thousands (Table B-50)
(Range 1072.1 – 2378.5)
pop = U.S. population in millions (Table B-30), Range 189.242 – 239.283
gnp = gross national product in constant 1982 dollars in billions
(Table B-2), Range 1873.3 – 3585.2
unemp = unemployment rate in % among all workers (Table B-35)
(Range 3.4 – 9.5)
intrate = new home mortgage yields, FHLBB, in % (Table B-68)
(Range 5.81 – 15.14)
*)
year housing pop gnp unemp intrate ;
1 1963 1985   (i.e. annual data, 1963–1985)

It appears that the dataset matches table 4.10 of the 3rd edition. There are 6 variables. Get the data, and construct a data matrix d1 with the dependent variable (HOUSING) in the last column.
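The construction itself is just a column permutation. The blog does it in Mathematica; here is the same idea as a Python sketch, with `raw` a zero-filled placeholder standing in for the actual DATA4-3 values (23 rows, columns in the order year, housing, pop, gnp, unemp, intrate):

```python
import numpy as np

# `raw` stands in for the DATA4-3 values; columns are
# year, housing, pop, gnp, unemp, intrate.
raw = np.zeros((23, 6))                # 23 annual observations
dep = 1                                # index of the HOUSING column
order = [c for c in range(raw.shape[1]) if c != dep] + [dep]
d1 = raw[:, order]                     # dependent variable now in the last column
```

The permutation moves column 1 to the end, giving column order year, pop, gnp, unemp, intrate, housing.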

Happenings – 2011 Sep 10

It’s been another slow week for mathematics. On the other hand, I do have a post almost ready for publication Monday evening. Still, that’s writing, not doing, mathematics.

I did, for what it’s worth, use quaternions to look at symmetries of the cube and the tetrahedron. (I ended last week’s diary post by asking whether I should do that or something else.)

I’m watching a tiebreak for the 1st set of the Federer–Djokovic semifinal of the U.S. Open… and I’ll be keeping an eye on the match all morning. Ah, Federer just won the tiebreak and the first set… but we’re in a rain delay, so it appears I’m not watching in real time.

Boy, am I not in real time! What they’ve been showing is last year’s match! Or even older. Damn!

As for the blog… it passed 125,000 hits on Thursday. As for spam… Spamhaus itself has posted a summary of the case I discussed last week.

Well, having a post already written for Monday evening… I will turn my inner child loose to do whatever he wants… my alter ego the undergraduate will put his time into Stillwell’s “Naive Lie Theory”… I wonder if my blogger can write up one of the dozen or so short pieces I have in mind… and I have no idea at all what mathematics my perpetual student will tackle today.

Regression 1: You inverted what matrix?

Edit: 2011 Nov 25: In “How the Hell” I have some negative signs. Find “edit”.

I want to show you something about LinearModelFit that shocked me. I will use the Toyota data. Let me point out that I am using version 7.0.1.0 of Mathematica®. Furthermore, this post is in two categories, OLS and linear algebra; the example comes from regression, but the key concept is the inverse of a matrix.