Happenings – 2010 Oct 30

As usual, I regret that I was not able to put out a technical post last Monday.

I was working on it… the fact is, right now I’m almost obsessed with it… but a couple of things conspired to keep me from finishing it last Sunday.

This particular regression post applies the 15 selection criteria (introduced in the most recent regression post) to all possible regressions that we could run on the Hald data, using the four given variables. There are, after all, only 16 such regressions. (Yes, that count assumes that every regression includes a constant term.)
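
To make the count of 16 concrete, here is a minimal sketch that enumerates every subset of four predictors. The labels X1 through X4 are just illustrative names for the four Hald variables, and the leading "1" stands for the constant term that every regression is assumed to include.

```python
from itertools import combinations

# Illustrative labels for the four Hald predictors (names are assumed).
variables = ["X1", "X2", "X3", "X4"]

# Every subset of the predictors, from the empty set (constant only)
# up to all four; each subset defines one regression with a constant term.
subsets = [
    combo
    for size in range(len(variables) + 1)
    for combo in combinations(variables, size)
]

for combo in subsets:
    # Show the constant explicitly as "1" in front of the chosen variables.
    print(("1",) + combo)

print(len(subsets))  # 2**4 = 16
```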

It turns out that the selection criteria are relatively unambiguous: every one of the selection criteria selects one of two regressions. They weren’t unanimous, but there were only two candidates.

But I didn’t want to stop there… I wanted to learn more about the relative rankings of the regressions. That turned out to be messy but straightforward.

Then I asked a different question. These selection criteria do not directly concern themselves with the t statistics of the coefficients. For my own purposes, I usually use far more relaxed standards (a t statistic greater than one in absolute value) than an economist would use (a t statistic greater than about two).

So I took my set of 16 regressions, and found the minimum absolute value of the t statistics in each one.
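
The screening step itself is tiny. Here is a sketch of it, using made-up t statistics for two hypothetical regressions (these are not Hald results); the function name `min_abs_t` is my own.

```python
def min_abs_t(t_stats):
    """Smallest absolute t statistic among a regression's coefficients."""
    return min(abs(t) for t in t_stats)


# Hypothetical t statistics for two made-up regressions (NOT Hald results).
examples = {
    "regression A": [5.2, -1.4, 3.1],
    "regression B": [12.0, -2.6, 4.8],
}

for name, t_stats in examples.items():
    weakest = min_abs_t(t_stats)
    # Relaxed standard: |t| > 1.  Stricter standard: |t| > about 2.
    print(name, weakest, weakest > 1.0, weakest > 2.0)
```

A regression passes a given standard only if its weakest coefficient does, which is why the minimum is the number to look at.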

I was stunned by the results.

Happenings – 2010 Oct 23

The only thing out of the ordinary this past week was a lecture about neutrinos at Cal (Berkeley). The speaker was Dr. Arthur McDonald, the director of SNOLAB, an experimental research facility located deep in a mine in Sudbury, Ontario, Canada.

If you’re curious about it, you can probably find most of the information on the SNOLAB website.

At my first opportunity, instead of looking up neutrinos, I looked up impact craters in Canada (I’m virtually certain that none of my books discuss in detail the consequences of neutrinos having mass). I recalled that Sudbury was the site of a large and ancient impact crater… that’s why it’s a good place to mine for metals. (No, I don’t know precisely why, but I more than suspect it is caused by what happens under the crater as it is formed.)

There is a handy list of impact craters, by continent or by age or by size, etc.

Regression 1 – Selection Criteria


I will close my discussion of the (Mathematica®) properties of a regression with “selection criteria” or “measures of improvement”: how do we select the “best” regression?

While I will do a couple of calculations, most numerical work with these criteria will wait until the next regression post.

Mathematica provides 4 such measures: R Squared, Adjusted R Squared, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC).

In addition, I know almost a dozen more, and I’ll show them to you.

Almost all of them use the error sum of squares ESS; the R Squared and the Adjusted R Squared also use the total sum of squares TSS:

TSS = (y - \bar{y}) \cdot (y - \bar{y})

ESS = (y - \hat{y}) \cdot (y - \hat{y})

and the numbers n and k…

n = number of observations

k = number of parameters (the \beta's).
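
As a sketch of how these four ingredients combine, here are TSS, ESS, R Squared and Adjusted R Squared computed from scratch. I am assuming the usual textbook forms, R² = 1 - ESS/TSS and adjusted R² = 1 - (ESS/(n-k)) / (TSS/(n-1)), with k counting the constant; the toy data and the function name are made up for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def r_squared_measures(y, y_hat, k):
    """Return (TSS, ESS, R^2, adjusted R^2); k counts the constant too."""
    n = len(y)
    y_bar = sum(y) / n
    dev = [yi - y_bar for yi in y]                  # y - ybar
    resid = [yi - fi for yi, fi in zip(y, y_hat)]   # y - yhat
    tss = dot(dev, dev)
    ess = dot(resid, resid)
    r2 = 1 - ess / tss
    adj_r2 = 1 - (ess / (n - k)) / (tss / (n - 1))
    return tss, ess, r2, adj_r2


# Toy data (made up): y against x = 1..5.
y = [1.0, 3.0, 2.0, 5.0, 4.0]
# Fitted values from the least-squares line 0.6 + 0.8 x for that data.
y_hat = [1.4, 2.2, 3.0, 3.8, 4.6]

tss, ess, r2, adj_r2 = r_squared_measures(y, y_hat, k=2)
print(tss, ess, r2, adj_r2)
```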

Happenings – 2010 Oct 16

I find it interesting… I have already finished a regression post for this coming Monday… but I’m struggling to write a diary post for today.

Things just are not “clicking”. Maybe it has something to do with being awakened at 5:30 AM by the neighbors noisily going somewhere this morning.

In any case, I’m struggling to get anything going. The kid tried particle physics briefly, until I discovered that Wolfram no longer knows the mass of an electron or of pions. It used to.

And there is a Mathematica package for abstract algebra… but it still seems an amorphous pile of commands. But I think if I were feeling sharper, I could begin to organize them for myself. It wouldn’t be so challenging if I could find all their commands on their palettes. Who knows? Maybe the problem is my searching rather than their palettes.

Those of us who have studied foreign languages have probably, sooner or later, understood that… if we don’t believe the word we’re looking for is in the dictionary, then it isn’t.

Of course, the reality is that we simply fail to find it because we don’t expect to find it.

Regression 1: Single deletion statistics


edited 2011 Sep 12 to correct the equation that leads to s(i).

This post continues the examination of regression properties available in Mathematica®. You might want to have read the previous “regression 1” posts, starting with the “Introduction”.

The following are all related to deletion of a single observation:

  • BetaDifferences,
  • CookDistances,
  • CovarianceRatios,
  • FitDifferences,
  • FVarianceRatios, and
  • SingleDeletionVariances.

I will discuss them in this post, along with

  • StandardizedResiduals and
  • StudentizedResiduals,

which are used in single-deletion analysis, but could certainly have been discussed apart from it. I will also introduce another form of residual.

My major reference for this underlying material is Belsley, Kuh, and Welsch – which is on my bibliographies page. I’m not positive, but I more than suspect they are responsible for the discovery of many of these statistics. In addition, I have been reading and will be using Atkinson’s “Plots, Transformations and Regression”, ISBN 0 19 853359 4, to see what these statistics can actually do for me.

The primary purpose of this post, however, is to show exactly how these statistics are computed. (For me, it’s not enough to have Mathematica® provide them; I have to know how they were computed.) I will, in fact, compute them for our Hald regression, but I won’t be doing a full-fledged analysis. That will come later, after I have more experience with them.

Here’s the key: It is a rather remarkable fact that we can deduce, from a given regression, what would happen to the fit if we removed exactly one observation.
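
A small numerical check of that claim, as a sketch: for a one-variable regression with a constant, the leverage is h_i = 1/n + (x_i - xbar)² / Sxx, and the residual we would get for observation i after deleting it and refitting is e_i / (1 - h_i). The toy data and function names below are made up; the code compares the shortcut with an honest refit.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def deletion_residuals_shortcut(x, y):
    """Deletion residuals e_i / (1 - h_i), using only the full fit."""
    n = len(x)
    a, b = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    out = []
    for xi, yi in zip(x, y):
        e = yi - (a + b * xi)                   # ordinary residual
        h = 1 / n + (xi - x_bar) ** 2 / sxx     # leverage of this point
        out.append(e / (1 - h))
    return out


def deletion_residuals_refit(x, y):
    """The slow way: drop point i, refit, then predict point i."""
    out = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        out.append(y[i] - (a + b * x[i]))
    return out


# Toy data (made up).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0]

shortcut = deletion_residuals_shortcut(x, y)
refit = deletion_residuals_refit(x, y)
for s, r in zip(shortcut, refit):
    print(s, r)
```

The two columns should agree up to rounding, with one regression on the left and six on the right.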

Happenings – 2010 Oct 9

I am still working pretty much exclusively on regression analysis. I’m even working on the post I planned, the so-called single deletion statistics.

And I’ve seen some new ideas and information.

Let’s see. Although the residuals are not normal, each apparently is a singular normal… but since they don’t have the same variance, I think it doesn’t really matter: it seems silly to check the residuals for constant variance, and it seems almost as silly to check them for normality when we know they came from distributions with different variances (given our model assumptions). Conclusion: use the standardized residuals for tests of constant variance and of normality. Ah, no, I don’t yet know whether the standardized residuals are supposed to be normal if the disturbances are. But if it makes sense to test anything for normality, it would seem to be the standardized residuals.
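
For reference, here is the standard algebra behind the unequal variances, written as a sketch. The symbols H (the hat matrix), h_i (its i-th diagonal entry), \sigma^2 (the disturbance variance) and s are my notation, and naming conventions for the scaled residual r_i vary between books, so take the label "standardized" as an assumption.

```latex
H = X (X^T X)^{-1} X^T, \qquad e = y - \hat{y} = (I - H)\, y

\mathrm{Var}(e) = \sigma^2 (I - H), \qquad \mathrm{Var}(e_i) = \sigma^2 (1 - h_i)

r_i = \frac{e_i}{s \sqrt{1 - h_i}}, \qquad s^2 = \frac{ESS}{n - k}
```

Since I - H has rank n - k rather than n, the joint distribution of the residuals is singular, and dividing by \sqrt{1 - h_i} is what puts them on a common scale.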

Another idea. Someone suggested that instead of looking at all possible subsets of variables, we could simply look at the best regression with each possible number of variables, i.e. the best with j variables, as j runs from 1 to k, the maximum number. That is, find the best regression with 1 variable, then the best regression with 2 variables, etc. This is different from stepwise (or forward selection) in that it doesn’t require that one of the two variables be the one from the j=1 regression. To put that another way, at each step, having a best regression with j variables, stepwise looks for the best variable to add to that regression. The new suggestion is: don’t bother requiring any relationship between the regression with j variables and the one with j+1 variables — just get the best, for each j.

The downside is that this simply amounts to editing the list of all possible subsets, displaying the best one at each level. We still have to run every possible regression. For k variables that’s 2^k regressions; for k = 20, we’re talking more than 1 million. This idea doesn’t save time, and it’s feasible only when we can run all possible subsets.
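
Here is a sketch of the "best for each j" idea, ranking by error sum of squares within each size. Everything in it is illustrative: the toy data is made up (three variables, six observations), the function names are mine, and the least-squares fit solves the normal equations by plain Gaussian elimination, which is fine for a toy but not what I would trust on ill-conditioned data.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ess(columns, y):
    """Error sum of squares for y regressed on a constant plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[c] for row in design) for c in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def best_per_size(data, y):
    """For each number of variables j, the subset with the smallest ESS."""
    names = sorted(data)
    best = {}
    for size in range(1, len(names) + 1):
        scored = [(ess([data[v] for v in combo], y), combo)
                  for combo in combinations(names, size)]
        best[size] = min(scored)   # every subset of this size gets fitted
    return best


# Toy data (made up): y is roughly 2*X2 + X3 plus a little noise.
data = {
    "X1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "X2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "X3": [1.0, 0.0, 0.0, 1.0, 0.0, 1.0],
}
y = [5.1, 1.9, 8.2, 7.0, 11.8, 11.1]

best = best_per_size(data, y)
for size, (score, combo) in best.items():
    print(size, combo, score)

print(2 ** 20)  # number of regressions for k = 20: 1048576
```

Note that `best_per_size` fits every subset of every size, which is exactly the cost problem: nothing is saved relative to running all possible regressions.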

Given all possible regressions, this is nothing more than selecting a subset. On the other hand, if the number of variables is large, then we could run all possible subsets up to some feasible number of variables.

It won’t replace stepwise in my toolbox, but it will augment it: I can imagine using it to start a stepwise analysis with more than 1 variable instead of 1 as I usually do. In addition, it will give us a different way of looking at all possible subsets. (The Hald data only has 4 variables; there are only 16 possible regressions. Looking at all of them is not only feasible, it is almost mandatory. And we will do it.)

Outside of regression… I’ve continued looking at the Bayesian analysis book. It’s got a chapter on spectrum analysis, and I have to do more with that….

That’s about all.

And now, math calls to me….

Regression 1 – Two kinds of Prediction bands

Among the properties I discussed in the previous post, there were a few that I could not explain. In particular, I could not explain the distinction between MeanPredictionBands and SinglePredictionBands. I knew that they were computing confidence intervals for a predicted value of yhat given an observation xo = {1, X1, X2, X4}, but not much more.

Okay, I knew that the MeanPredictionConfidenceIntervalTable listed confidence intervals using the MeanPredictionBands, for yhat for each of the observations in the given data. Similarly, the SinglePredictionConfidenceIntervalTable listed confidence intervals for yhat, using the SinglePredictionBands and the given observations.

The “bands” can be used on any data, however, not just the original observations.

Well, I now know how the two “bands” are computed… and, therefore, I know the distinction between them.
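
As a sketch of the distinction, these are the standard textbook forms of the two intervals; I am assuming they are what Mathematica computes. Here x_0 is the new observation (including the leading 1), t is the appropriate Student t quantile with n - k degrees of freedom, and s^2 = ESS/(n - k) is the estimated variance; the symbols are my notation.

```latex
\text{mean prediction:} \quad
\hat{y}_0 \pm t \, s \, \sqrt{x_0^T (X^T X)^{-1} x_0}

\text{single prediction:} \quad
\hat{y}_0 \pm t \, s \, \sqrt{1 + x_0^T (X^T X)^{-1} x_0}
```

The only difference is the extra 1 under the square root: the mean band covers uncertainty in the fitted mean at x_0, while the single band also allows for the disturbance attached to one new observation, so it is always wider.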

Happenings – 2010 Oct 2

Well, I’ve managed to do a little reading without computing things… and I’ve managed to do a few simple computations that make sense.

I’ve been reading a very little bit about ridge regression and principal component regression. It appears that either or both may be used when the independent variables are multi-collinear. Principal component regression just might show up here relatively soon; it will, however, probably be quite a while before I tackle ridge regression.

(My major concern about principal component regression is to make sure that it really is exactly what it sounds like. I know perfectly well how to do principal components and how to do regression. I just need to know that the terminology really refers to the obvious computation: find the principal components of the data and use some or all of them as the independent variables, instead of the original data.)
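
Written out, the "obvious computation" would look something like the following sketch. It assumes the data is centered first and that the first m components are kept; the symbols X_c, V, \Lambda, Z, \alpha and \gamma are my notation, and whether the literature means exactly this is the thing still to be confirmed.

```latex
X_c = X - \mathbf{1}\,\bar{x}^T
\qquad \text{(center each column of the data)}

X_c^T X_c = V \Lambda V^T
\qquad \text{(eigenvectors } V \text{ define the principal components)}

Z = X_c V_m
\qquad \text{(scores on the first } m \text{ components)}

y = \alpha \mathbf{1} + Z \gamma + e
\qquad \text{(ordinary regression of } y \text{ on } Z \text{)}
```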

I also found a new – to me – definition of “deletion residuals”… and they are used to compute a new – to me – measure of the best fitting regression. I did take a little time to confirm the definitions and to verify the computations on one example. These should both show up in technical posts soon.

The next regression post, however, may not be the “single deletion properties” that I intended… because I think I know how to calculate both the “mean prediction” and the “single prediction” bands which Mathematica® provides. I have verified some calculations for a 1-variable case – that is, the properties which Mathematica prints match a numerical example – but I still need to verify them for a multiple-variable case, and then I need to work out the derivations and confirm the calculations by hand.

That is, of the three major properties which I did not understand last weekend, I think I do now understand two of those properties… and it seems appropriate to explain them right after the post in which I failed to explain them.

I’ve also found what has started out as a very readable book on Bayesian methodology in statistics. I’m not likely to abandon my habitual practice in regression, but maybe I had better post it before this book makes me an apostate.

I won’t go into details, but you might simply look up Bayes Theorem. Wikipedia has a nice statement and some nice examples.

Now I’m going to turn my kid loose to see what mathematics looks like pure fun this morning… and then I’ll probably go to work on those regression bands.