## Regression 1 – Two kinds of Prediction bands

Among the properties I discussed in the previous post, there were a few that I could not explain. In particular, I could not explain the distinction between MeanPredictionBands and SinglePredictionBands. I knew that they were computing confidence intervals for a predicted value of yhat given an observation xo = {1, X1, X2, X4}, but not much more.

Okay, I knew that the MeanPredictionConfidenceIntervalTable listed confidence intervals for yhat, computed from the MeanPredictionBands, for each of the observations in the given data. Similarly, the SinglePredictionConfidenceIntervalTable listed confidence intervals for yhat, computed from the SinglePredictionBands, for the same observations.

The “bands” can be used on any data, however, not just the original observations.

Well, I now know how the two “bands” are computed… and, therefore, I know the distinction between them.

Let us suppose, as we did briefly once before, that the true model is

Y = X B + u,

where the u are independent normal mean zero random variables with variance $\sigma^2$.

Assume that the fitted model is

yhat = X beta.

Suppose that we have one observation xo. We can immediately compute yhat using the fitted model:

yhat0 = xo beta.

Fine, that gives us a point estimate for yhat. How would we estimate the possible dispersion, the possible variation, associated with that yhat? Well, that’s what the variance is, a measure of dispersion. Let’s start there.

Question: how would we estimate the variance of yhat0 given X and xo?

I’m going to give you the answer without proof… not because it’s difficult, but because I want to put all of these derivations in one place – in one post – later. The estimated variance of yhat0 given X and xo is

$s^2\ xo'\ (X'X)^{-1}\ xo = xo'\ s^2\ (X'X)^{-1}\ xo = xo'\ cov\ xo$,

where “cov” is the parameter covariance matrix. Let me isolate the answer in the form I use:

$xo'\ cov\ xo$

This is the variance that is used to compute “mean prediction bands”.

The key fact is that yhat0 itself estimates the mean value of Y0, because $\beta$ is an unbiased estimate of B:

Y0 = xo B + u = yhat0 + e.

The word “mean” comes from that fact: our computed value of yhat0 is an unbiased estimate of the mean value of Y0. We take yhat0 as our estimate of the mean value of Y0, and understand the variance, standard deviation sy, and confidence interval to be for that mean value.

Question: what’s the variance of Y0 given X and xo?

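Well, it is the sum of the variances of yhat0 and u – there is no cross-term, because the disturbance u for a new observation is independent of the data that produced beta. (Strictly speaking, this is the variance of the prediction error Y0 – yhat0.)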

Var(Y0) = Var(yhat0) + Var(u).

That is, we estimate the variance of Y0, as opposed to yhat0, by

$xo'\ cov\ xo + s^2$.

And that is the variance which is used to compute “single prediction bands” as opposed to “mean prediction bands”.
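Side by side, the two estimates look like this. I am writing $x_o$ for xo and $\hat y_0$ for yhat0, and I am reading the second variance as that of the prediction error Y0 – yhat0, which is my interpretation rather than something established above:

$$
\widehat{\mathrm{Var}}(\hat y_0) = x_o'\,\mathrm{cov}\,x_o = s^2\,x_o'(X'X)^{-1}x_o,
\qquad
\widehat{\mathrm{Var}}(Y_0 - \hat y_0) = x_o'\,\mathrm{cov}\,x_o + s^2 .
$$

The standard deviations sy and su used below are the square roots of the first and second expressions, respectively, so su is always the larger of the two.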

Let me illustrate these… let’s compute these two variances, and for simplicity, let’s do it numerically first. In order to confirm our answers, let’s use one of the given observations.

For no particularly good reason, I choose observation number 6:

xo = X[[6]] = {1.,11.,55.,22.}.

(So far, we are using only one example: the Hald data, with independent variables X1, X2, X4. The fitted equation was

What’s yhat, for the 6th observation?

yhat[[6]] = 105.302 .

(Let me point out that Mathematica® does not require me to transpose the vector xo; it does the only possible matrix multiplication with the vector, treating it as a row on the left and as a column on the right.)
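For anyone following along without Mathematica, here is a rough sketch of the same calculation in plain Python. It is not the code behind this post. The data values are typed in from the standard published Hald data set, which this post does not list, so treat them as an assumption. The helper names `invert`, `dot`, and `quad_form` are made up for illustration.

```python
import math

# Hald data as (X1, X2, X4, Y). ASSUMPTION: typed in from the standard
# published data set; the post itself does not list these values.
HALD = [
    (7, 26, 60, 78.5), (1, 29, 52, 74.3), (11, 56, 20, 104.3),
    (11, 31, 47, 87.6), (7, 52, 33, 95.9), (11, 55, 22, 109.2),
    (3, 71, 6, 102.7), (1, 31, 44, 72.5), (2, 54, 22, 93.1),
    (21, 47, 26, 115.9), (1, 40, 34, 83.8), (11, 66, 12, 113.3),
    (10, 68, 12, 109.4),
]

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination (partial pivoting)."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def quad_form(x, a):
    """Return x' A x for a vector x and a square matrix A."""
    return dot(x, [dot(row, x) for row in a])

X = [[1.0, x1, x2, x4] for x1, x2, x4, _ in HALD]   # design matrix with constant
y = [row[3] for row in HALD]
n, k = len(X), len(X[0])                            # 13 observations, 4 parameters

XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
XtX_inv = invert(XtX)
beta = [dot(row, Xty) for row in XtX_inv]           # least-squares coefficients

resid = [yi - dot(r, beta) for r, yi in zip(X, y)]
s2 = dot(resid, resid) / (n - k)                    # estimated error variance s^2
cov = [[s2 * v for v in row] for row in XtX_inv]    # parameter covariance matrix

xo = X[5]                                           # sixth observation (0-based index 5)
yhat0 = dot(xo, beta)
sy = math.sqrt(quad_form(xo, cov))                  # sqrt(xo' cov xo)
su = math.sqrt(quad_form(xo, cov) + s2)             # sqrt(xo' cov xo + s^2)
print(yhat0, sy, su)
```

If the data are right, `yhat0`, `sy`, and `su` should agree with the values 105.302, .790942, and 2.44047 quoted in this post.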

Now, exactly what do we do with that? Well, let me assert that yhat0, once standardized by sy, follows a Student T distribution with 13 – 4 = 9 degrees of freedom (13 observations less 4 fitted parameters). Mathematica is perfectly happy to compute a 95% confidence interval (that’s the default) given the mean, standard deviation, and degrees of freedom:

Let me be clear: here’s the entry for the sixth observation.

I’d like to point out that the tabulated standard error .790942 is exactly what we computed for sy. But now we know where it came from.

Let me “get under the hood” of that confidence interval calculation. To do that, I need to know how many standard deviations out correspond to 95% of the area under that T distribution.

I looked it up the last time I needed it, but let me calculate it this time. It is possible that I can simply ask Mathematica for it, but I didn’t see any such thing. (That doesn’t mean it isn’t there.)

But I can find it anyway. I want 2.5% of the tail on the left, and 2.5% on the right. The T distribution is symmetric, so all I need to do is set the Student’s T cumulative distribution function (“CDF”) with 9 degrees of freedom to .975 and solve:

(What I had used before, from a table, was 2.262, for the drawing with two red dots. And if I had solved for .025, I would have gotten -2.262 – I prefer the positive value.)
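The same "set the CDF to .975 and solve" idea can be sketched in plain Python. This is only an illustration, not the Mathematica calculation: the function names `t_pdf`, `t_cdf`, and `t_critical` are made up, the CDF is obtained by integrating the density with Simpson's rule, and the solve is a simple bisection.

```python
import math

def t_pdf(t, df):
    """Density of Student's T distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """CDF for t >= 0: one half (by symmetry) plus the integral of the
    density from 0 to t, computed with Simpson's rule (steps must be even)."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(p, df):
    """Solve t_cdf(t) = p for t by bisection; assumes 0.5 <= p < 1."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

tc = t_critical(0.975, 9)   # 2.5% in each tail, 9 degrees of freedom
print(tc)
```

Bisection is slow but hard to get wrong, and the CDF is increasing, so there is exactly one root to find.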

Anyway, the upshot is that a confidence interval for yhat0 can be computed as

yhat0 ± sy tc,

and for a 95% confidence interval, we take tc = 2.26216 .

Right?

So, we have verified a calculation of the mean prediction confidence interval for the sixth observation.

Now let’s do it using the larger variance – which includes the (estimated) variance of u. Actually, here is the square root “su”:

Let Mathematica do it; just supply su instead of sy:

Again, we match the entry for the 6th observation (i.e. for y = 109.2, yhat = 105.302). And, again, our computed su = 2.44047 is the standard error shown in the table for the 6th observation.

Again, let me check it “by hand”, using su instead of sy:
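As a sketch of the "by hand" arithmetic, here it is in Python, using only numbers quoted in this post (yhat0 = 105.302, sy = .790942, su = 2.44047, tc = 2.26216). The helper name `interval` is made up for illustration.

```python
def interval(center, se, tc):
    """Return (lower, upper) for center ± se * tc."""
    half_width = se * tc
    return center - half_width, center + half_width

# All four numbers are the ones quoted in the post for the 6th observation.
yhat0, sy, su, tc = 105.302, 0.790942, 2.44047, 2.26216

mean_ci = interval(yhat0, sy, tc)     # uses the variance of the mean value of Y0
single_ci = interval(yhat0, su, tc)   # adds the variance of u
print(mean_ci)
print(single_ci)
```

Both intervals are centered on the same yhat0; only the standard error changes, so the single prediction interval is always the wider one.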

Okay, now that we’ve seen computations which match, let’s actually get the “bands”.

I want an arbitrary observation…

xo = {1,X1,X2,X4}

and its yhat:

Then we get sy:

Now compare it to the MeanPredictionBands:

Okay, let’s try the other band. We get su:

Then we compare it to the SinglePredictionBands:

So.

That’s how the bands were computed.

And the distinction between them is that one is using the variance of the mean value of Y, while the other adds in the variance of Y about its mean.