Happenings – 2010 Oct 30

As usual, I regret that I was not able to put out a technical post last Monday.

I was working on it… the fact is, right now I’m almost obsessed with it… but a couple of things conspired to keep me from finishing it last Sunday.

This particular regression post applies the 15 selection criteria (introduced in the most recent regression post) to all possible regressions that we could run on the Hald data, using the four given variables. There are, after all, only 16 such regressions. (Yes, that count assumes that every regression includes a constant term.)
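The count of 16 is simply 2^4: each of the four variables is either in or out, and the constant is always in, so the empty subset is the constant-only regression. Here is a minimal Python sketch of that enumeration. The labels X1 to X4 stand for the Hald regressors; the helper name is my own and purely illustrative.

```python
from itertools import combinations

# Illustrative labels for the four Hald regressors; the constant is implied in every regression.
VARIABLES = ["X1", "X2", "X3", "X4"]

def all_subsets(variables):
    """Return every subset of `variables`, from the empty one (constant only) to the full set."""
    subsets = []
    for size in range(len(variables) + 1):
        subsets.extend(combinations(variables, size))
    return subsets

subsets = all_subsets(VARIABLES)
print(len(subsets))  # 16
```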

It turns out that the selection criteria are relatively unambiguous: every one of the selection criteria selects one of two regressions. They weren’t unanimous, but there were only two candidates.

But I didn’t want to stop there… I wanted to learn more about the relative rankings of the regressions. That turned out to be messy but straightforward.

Then I asked a different question. These selection criteria do not directly concern themselves with the t statistics of the coefficients. For my own purposes, I usually use far more relaxed standards (a t statistic greater than one in absolute value) than an economist would use (a t statistic greater than about two).

So I took my set of 16 regressions, and found the minimum absolute value of the t statistics in each one.
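The computation itself is mechanical, so here is a sketch of it in pure Python (no numerical libraries), under stated assumptions. It fits ordinary least squares with a constant term, takes each t statistic as the coefficient divided by its standard error s·sqrt([(X'X)^-1]_jj), and then keeps the smallest absolute value per regression. The function names are hypothetical. I have assumed that the constant's own t statistic counts toward the minimum; you might prefer to exclude it. The data are passed in as a dict of columns, for example the four Hald columns keyed by "X1" to "X4".

```python
import math
from itertools import combinations

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting, adequate for small X'X matrices."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols_t_stats(columns, y):
    """Regress y on a constant plus `columns`; return the t statistic of every coefficient."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])  # number of betas, constant included
    xtx = [[sum(X[r][a] * X[r][b] for r in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[r][a] * y[r] for r in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[r] - sum(X[r][a] * beta[a] for a in range(k)) for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)  # ESS / (n - k)
    return [beta[a] / math.sqrt(s2 * inv[a][a]) for a in range(k)]

def min_abs_t_by_subset(data, y):
    """Map each subset of regressors (constant always included) to its smallest |t|."""
    names = sorted(data)
    result = {}
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            t = ols_t_stats([data[v] for v in subset], y)
            result[subset] = min(abs(v) for v in t)
    return result
```

Sorting the resulting dict by value then shows at a glance which regressions pass a |t| > 1 screen and which pass |t| > 2.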

I was stunned by the results.

Happenings – 2010 Oct 23

The only thing out of the ordinary this past week was a lecture about neutrinos at Cal (Berkeley). The speaker was Dr. Arthur McDonald, the director of SNOLAB, an experimental research facility located deep in a mine in Sudbury, Ontario, Canada.

If you’re curious about it, you can probably find most of the information on their website:

At my first opportunity, instead of looking up neutrinos, I looked up impact craters in Canada (I’m virtually certain that none of my books discuss in detail the consequences of neutrinos having mass). I recalled that Sudbury was the site of a large and ancient impact crater… that’s why it’s a good place to mine for metals. (No, I don’t know precisely why, but I more than suspect it is caused by what happens under the crater as it is formed.)

There is a handy list of impact craters, by continent or by age or by size, etc.

Regression 1 – Selection Criteria


I will close my discussion of the (Mathematica®) properties of a regression with “selection criteria” or “measures of improvement”: how do we select the “best” regression?

While I will do a couple of calculations, most numerical work with these criteria will wait until the next regression post.

Mathematica provides 4 such measures: R Squared, Adjusted R Squared, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC).

In addition, I know almost a dozen more, and I’ll show them to you.

Almost all of them use the error sum of squares ESS; the R Squared and the Adjusted R Squared also use the total sum of squares TSS:

TSS = (y - \bar{y}) \cdot (y - \bar{y})

ESS = (y - \hat{y}) \cdot (y - \hat{y})

and the numbers n and k…

n = number of observations

k = number of parameters (the \beta's).
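As a concrete sketch, here are the four criteria Mathematica provides, written in terms of ESS, TSS, n and k. R Squared and Adjusted R Squared are the standard definitions. For AIC and BIC I have assumed one common textbook form, n ln(ESS/n) plus a penalty of 2k or k ln n. I believe Mathematica's own AIC and BIC are computed from the log-likelihood and count the error variance as an extra parameter, so its numbers should differ from these by constants that depend only on n. The rankings of regressions fit to the same data should agree, but the raw values should not be compared across the two conventions. The function name is illustrative.

```python
import math

def selection_criteria(ess, tss, n, k):
    """R^2, adjusted R^2, AIC and BIC from ESS, TSS, n observations and k betas.

    AIC and BIC use the textbook form n*ln(ESS/n) + penalty (an assumption; other
    software may add constants that depend only on n). Larger R^2 / adjusted R^2
    is better; smaller AIC / BIC is better.
    """
    r2 = 1.0 - ess / tss
    adj_r2 = 1.0 - (ess / (n - k)) / (tss / (n - 1))
    aic = n * math.log(ess / n) + 2 * k
    bic = n * math.log(ess / n) + k * math.log(n)
    return {"R2": r2, "AdjR2": adj_r2, "AIC": aic, "BIC": bic}
```

The only difference between AIC and BIC in this form is the penalty per parameter: 2 versus ln n, so BIC penalizes extra variables more heavily once n exceeds about 7.4.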

Happenings – 2010 Oct 16

I find it interesting… I have already finished a regression post for this coming Monday… but I’m struggling to write a diary post for today.

Things just are not “clicking”. Maybe it has something to do with being awakened at 5:30 AM by the neighbors noisily going somewhere this morning.

In any case, I’m struggling to get anything going. The kid tried particle physics briefly, until I discovered that Wolfram no longer knows the mass of an electron or of pions. It used to.

And there is a Mathematica package for abstract algebra… but it still seems an amorphous pile of commands. But I think if I were feeling sharper, I could begin to organize them for myself. It wouldn’t be so challenging if I could find all their commands on their palettes. Who knows? Maybe the problem is my searching rather than their palettes.

Those of us who have studied foreign languages have probably, sooner or later, understood that… if we don’t believe the word we’re looking for is in the dictionary, then it isn’t.

Of course, the reality is that we simply fail to find it because we don’t expect to find it.


Regression 1: Single deletion statistics


edited 2011 Sep 12 to correct the equation that leads to s(i).

This post continues the examination of regression properties available in Mathematica®. You might want to have read the previous “regression 1” posts, starting with the “Introduction”.

The following are all related to deletion of a single observation:

  • BetaDifferences,
  • CookDistances,
  • CovarianceRatios,
  • FitDifferences,
  • FVarianceRatios, and
  • SingleDeletionVariances.

I will discuss them in this post, along with

  • StandardizedResiduals and
  • StudentizedResiduals,

which are used in single-deletion analysis, but could certainly have been discussed apart from it. I will also introduce another form of residual.

My major reference for this underlying material is Belsley, Kuh, and Welsch – which is on my bibliographies page. I’m not positive, but I more than suspect they are responsible for the discovery of many of these statistics. In addition, I have been reading and will be using Atkinson’s “Plots, Transformations and Regression”, ISBN 0 19 853359 4, to see what these statistics can actually do for me.

The primary purpose of this post, however, is to show exactly how these statistics are computed. (For me, it’s not enough to have Mathematica® provide them; I have to know how they were computed.) I will, in fact, compute them for our Hald regression, but I won’t be doing a full-fledged analysis. That will come later, after I have more experience with them.

Here’s the key: It is a rather remarkable fact that we can deduce, from a given regression, what would happen to the fit if we removed exactly one observation.
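To make that "remarkable fact" concrete, here is a sketch for the simplest case: one regressor plus a constant, so k = 2. In this case the leverage (the diagonal of the hat matrix) is h_i = 1/n + (x_i - xbar)^2 / Sxx. It uses the standard Belsley, Kuh and Welsch identity s(i)^2 = [(n - k) s^2 - e_i^2 / (1 - h_i)] / (n - k - 1), then checks it against a brute-force refit with observation i actually removed. Terminology for residuals varies between texts. Here "standardized" means e_i / (s sqrt(1 - h_i)) and "studentized" means e_i / (s(i) sqrt(1 - h_i)), which is how I understand Mathematica's StandardizedResiduals and StudentizedResiduals, though that correspondence is my assumption. The function names are hypothetical.

```python
import math

def simple_fit(x, y):
    """Least-squares line y = b0 + b1*x; returns (b0, b1, residuals)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((a - xb) ** 2 for a in x)
    b1 = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sxx
    b0 = yb - b1 * xb
    return b0, b1, [b - (b0 + b1 * a) for a, b in zip(x, y)]

def single_deletion(x, y):
    """Single-deletion statistics for every observation, computed from ONE fit."""
    n, k = len(x), 2  # k = number of betas (constant and slope)
    xb = sum(x) / n
    sxx = sum((a - xb) ** 2 for a in x)
    _, _, e = simple_fit(x, y)
    s2 = sum(r * r for r in e) / (n - k)
    rows = []
    for xi, ei in zip(x, e):
        h = 1.0 / n + (xi - xb) ** 2 / sxx                      # leverage h_i
        s2_i = ((n - k) * s2 - ei ** 2 / (1 - h)) / (n - k - 1)  # s(i)^2 without refitting
        standardized = ei / math.sqrt(s2 * (1 - h))
        studentized = ei / math.sqrt(s2_i * (1 - h))
        cook = standardized ** 2 * h / (k * (1 - h))             # Cook's distance
        rows.append({"h": h, "s2_i": s2_i, "standardized": standardized,
                     "studentized": studentized, "cook": cook})
    return rows

def s2_by_refitting(x, y, i):
    """Brute force: drop observation i, refit, and recompute the residual variance."""
    xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    _, _, e = simple_fit(xs, ys)
    return sum(r * r for r in e) / (len(xs) - 2)
```

The leverages always sum to k (the trace of the hat matrix), which makes a handy sanity check. For more than one regressor, h_i becomes the i-th diagonal element of X (X'X)^-1 X', and the rest carries over unchanged.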

Happenings – 2010 Oct 9

I am still working pretty much exclusively on regression analysis. I’m even working on the post I planned, on the so-called single deletion statistics.

And I’ve seen some new ideas and information.

Let’s see. Although the residuals are not normal, each apparently is a singular normal… but since they don’t have the same variance, I think it doesn’t really matter: it seems silly to check the residuals for constant variance, and it seems almost as silly to check them for normality when we know they came from distributions with different variances (given our model assumptions). Conclusion: use the standardized residuals for tests of constant variance and of normality. Ah, no, I don’t yet know whether the standardized residuals are supposed to be normal if the disturbances are. But if it makes sense to test anything for normality, it would seem to be the standardized residuals.
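The reason the raw residuals have unequal variances can be written out in a few lines. This is standard least-squares algebra, with H the hat matrix, h_{ii} its diagonal, and s^2 = ESS/(n-k). Because (I - H) has rank n - k, the covariance matrix of e is singular. Dividing each residual by its own estimated standard deviation gives the standardized residual r_i, which puts them all on a common scale:

```latex
H = X (X^\top X)^{-1} X^\top, \qquad
e = y - \hat{y} = (I - H)\, y = (I - H)\, \varepsilon

\operatorname{Var}(e) = \sigma^2 (I - H), \qquad
\operatorname{Var}(e_i) = \sigma^2 (1 - h_{ii})

r_i = \frac{e_i}{s \sqrt{1 - h_{ii}}}
```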

Another idea. Someone suggested that instead of looking at all possible subsets of variables, we could simply look at the best regression with each possible number of variables, i.e. the best with j variables, as j runs from 1 to k, the maximum number. That is, find the best regression with 1 variable, then the best regression with 2 variables, etc. This is different from stepwise (or forward selection) in that it doesn’t require that one of the two variables be the one from the j=1 regression. To put that another way, at each step, having a best regression with j variables, stepwise looks for the best variable to add to that regression. The new suggestion is: don’t bother requiring any relationship between the regression with j variables and the one with j+1 variables — just get the best, for each j.
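A small sketch shows how the two procedures can disagree. The scores below are made-up ESS values for three hypothetical variables A, B and C (lower is better). They are invented purely to illustrate the logic, not taken from any data. Forward selection commits to A at step 1 and can then only choose between {A, B} and {A, C}. The best-subset-per-size approach scores every subset of each size and finds {B, C}.

```python
from itertools import combinations

# Hypothetical ESS values (lower is better), keyed by the set of variables in the regression.
TOY_ESS = {
    frozenset("A"): 10.0, frozenset("B"): 12.0, frozenset("C"): 11.0,
    frozenset("AB"): 9.0, frozenset("AC"): 8.5, frozenset("BC"): 2.0,
    frozenset("ABC"): 1.9,
}

def score(subset):
    return TOY_ESS[frozenset(subset)]

def best_subset_per_size(variables, score):
    """For each size j, the lowest-scoring subset among ALL subsets of that size."""
    return {j: min(combinations(variables, j), key=score)
            for j in range(1, len(variables) + 1)}

def forward_selection(variables, score):
    """Forward selection: each step adds the single variable that most lowers the score."""
    chosen, path = (), {}
    for j in range(1, len(variables) + 1):
        candidates = [chosen + (v,) for v in variables if v not in chosen]
        chosen = min(candidates, key=score)
        path[j] = chosen
    return path

best = best_subset_per_size("ABC", score)  # best[2] is ('B', 'C')
path = forward_selection("ABC", score)     # path[2] is ('A', 'C')
```

Note that `best_subset_per_size` has to score every non-empty subset, 2^k - 1 of them, while forward selection scores only k + (k - 1) + … + 1 of them.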

The downside is that this simply amounts to editing the list of all possible subsets, displaying the best one at each level. We still have to run every possible regression. For k variables that’s 2^k regressions; for k = 20, we’re talking more than 1 million. This idea doesn’t save time, and it’s feasible only when we can run all possible subsets.

Given all possible regressions, this is nothing more than selecting a subset. On the other hand, if the number of variables is large, then we could run all possible subsets up to some feasible number of variables.

It won’t replace stepwise in my toolbox, but it will augment it: I can imagine using it to start a stepwise analysis with more than one variable, instead of with one as I usually do. In addition, it will give us a different way of looking at all possible subsets. (The Hald data has only 4 variables, so there are only 16 possible regressions. Looking at all of them is not only feasible, it is almost mandatory. And we will do it.)

Outside of regression… I’ve continued looking at the Bayesian analysis book. It’s got a chapter on spectrum analysis, and I have to do more with that….

That’s about all.

And now, math calls to me….

Regression 1 – Two kinds of Prediction bands

Among the properties I discussed in the previous post, there were a few that I could not explain. In particular, I could not explain the distinction between MeanPredictionBands and SinglePredictionBands. I knew that they were computing confidence intervals for the predicted value yhat given an observation xo = {1, X1, X2, X4}, but not much more.

Okay, I knew that the MeanPredictionConfidenceIntervalTable listed confidence intervals for yhat, computed from the MeanPredictionBands, for each of the observations in the given data. Similarly, the SinglePredictionConfidenceIntervalTable listed confidence intervals for yhat at the given observations, computed from the SinglePredictionBands.

The “bands” can be used on any data, however, not just the original observations.

Well, I now know how the two “bands” are computed… and, therefore, I know the distinction between them.
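For orientation, here are the textbook formulas that I believe underlie the two kinds of band, sketched for the one-regressor case so the code stays short. The mean-prediction interval covers the expected value of y at x0. Its half-width is t·s·sqrt(q), with q = x0'(X'X)^-1 x0. The single-prediction interval covers one new observation at x0, so it adds the variance of a fresh disturbance: t·s·sqrt(1 + q). For the design {1, x}, q reduces to 1/n + (x0 - xbar)^2 / Sxx. The t quantile is passed in because Python's standard library has no t distribution. The function name is my own.

```python
import math

def prediction_intervals(x, y, x0, t_crit):
    """Mean-prediction and single-prediction intervals at a new point x0 (one regressor).

    t_crit is the Student t quantile for n - 2 degrees of freedom, supplied by the caller.
    """
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((a - xb) ** 2 for a in x)
    b1 = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sxx
    b0 = yb - b1 * xb
    s = math.sqrt(sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2))
    yhat = b0 + b1 * x0
    q = 1.0 / n + (x0 - xb) ** 2 / sxx             # x0' (X'X)^-1 x0 for the design {1, x}
    half_mean = t_crit * s * math.sqrt(q)          # uncertainty in the fitted mean only
    half_single = t_crit * s * math.sqrt(1.0 + q)  # plus the variance of one new disturbance
    return ((yhat - half_mean, yhat + half_mean),
            (yhat - half_single, yhat + half_single))
```

With 8 observations there are 6 degrees of freedom, and a 95% interval uses t_crit of about 2.447. The single-prediction interval always contains the mean-prediction interval. The two are centered on the same yhat, and both are narrowest at x0 = xbar.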