Introduction to CAPM, the Capital Asset Pricing Model

introduction

It can be difficult to find a clear statement of what the Capital Asset Pricing Model (henceforth CAPM) is. I’m not trying to do much more than provide that. In particular, I did not find the Wikipedia article useful, even after acquiring a couple of recent books on the subject.

I own six references:

  • Sharpe, William F.; “Investments”, Prentice Hall, 1978; 0-13-504605-X.
  • Reilly, Frank K.; “Investments”, CBS College Publishing (The Dryden Press), 1980; 0-03-056712-2.
  • Grinold, Richard C. and Kahn, Ronald N.; “Active Portfolio Management”, McGraw-Hill, 2000; 0-07-024882-6.
  • Roman, Steven; “Introduction to the Mathematics of Finance”, Springer, 2004; 0-387-21364-3.
  • Benninga, Simon; “Financial Modeling”, 3rd ed., MIT Press, 2008; 0-262-02628-7.
  • Ruppert, David; “Statistics and Data Analysis for Financial Engineering”, Springer, 2011; 978-1-4419-7786-1.

There is more than one version of the CAPM… Roman (p. 62) tells me that “The major factor that turns Markowitz portfolio theory into capital market theory is the inclusion of a riskfree asset in the model…. generally regarded as the contribution of William Sharpe, for which he won the Nobel Prize…. the theory is sometimes referred to as the Sharpe-Lintner-Mossin (SLM) capital asset pricing model.”

Then Benninga (p. 265) told me about “Black’s zero-beta CAPM… in which the role of the risk-free asset is played by a portfolio with a zero beta with respect to the particular envelope portfolio y.” (We’ll come back to this, briefly.)

Let me begin by writing what is usually called the (security) characteristic line. (It’s actually a poor name because the first thing we write is not a line, but a line plus disturbances.) We begin by asserting a model, that

(R_i - R_f) = \alpha + \beta (R_m - R_f) + r_i\ .

R_i is a vector: the returns on an asset i (these may be daily, weekly, monthly, or perhaps even hourly)… think “IBM”.

R_f is a vector: the corresponding returns on the “risk-free investment”… think “30-year US Treasury bonds”.

R_m is a vector: the corresponding returns on the (risky) market as a whole… think “S&P 500”.

R_i – R_f is called the excess return of asset i; R_m – R_f is called the excess return of the market.

So what we’re looking at is nothing more than a linear model… to which we would want to fit data. I would not usually use r_i for the disturbances (errors) but it seems to be common in the literature. It is standard to assume that the expected value of the disturbances is zero: E(r_i) = 0. What we have is a form of the two-variable model

Y = \alpha + \beta X + e\

and we will do a least-squares fit to get a line

\hat{Y} = \hat{\alpha} + \hat{\beta} X\

and then we would be able to compute residuals \hat{e}\ from the predicted values \hat{Y}\ (all of which lie on the fitted line):

Y =  \hat{\alpha} + \hat{\beta} X + \hat{e}\ so that Y = \hat{Y} + \hat{e}\ .

Let me emphasize that X is a vector in this case, and both \alpha\ and \beta\ are scalars (numbers… in particular, \beta\ is not a vector of coefficients and X is not a matrix, as they would be in the general case of more than one independent variable.)

I am using a common notation that differs from my usual one: a hat designates an estimate of an underlying parameter. I usually use a hat only to distinguish Y from \hat{Y}\ , but this time I’m also using it to distinguish \alpha\ from \hat{\alpha}\ , \beta \ from \hat{\beta}\ , and e from \hat{e}\ . There’s a reason for this: almost none of my references clearly distinguishes the underlying parameters \alpha\ and \beta\ from their estimates \hat{\alpha}\ and \hat{\beta}\ . I think that distinction is important to make.

I think I will try to refer to \hat{\alpha}\ and \hat{\beta}\ as “coefficients”, rather than to always call them “estimates of the parameters”. And I will do my damnedest to use \hat{\alpha}\ and \hat{\beta}\ when I’m supposed to. (Please point them out if you think I’ve missed any.)

Third, I’m going to use this post as a springboard to show you a very interesting alternative method of calculation for the 2-variable model.

I propose to do five things:

  1. review and rewrite the alternative calculation of \hat{\alpha} \text{ and }\hat{\beta}\
  2. use an illustrative example from Reilly:
    • using the alternative calculation
    • using the usual regression method
  3. review the jargon
  4. consider the variance of the excess return on an asset
  5. summarize it all briefly

The alternative algorithm

In an earlier post about secrets of the correlation coefficient I showed you that it was algebraically convenient to write things in terms of centered data x and y (note: lower case) rather than the raw data X and Y (upper case), even though we were using a model and estimating parameters for X and Y. That is, for the model

Y = \alpha + \beta X + e\

and the least-squares line

\hat{Y} = \hat{\alpha} + \hat{\beta} X\ ,

it is convenient to define centered variables x and y as

y = Y - \bar{Y}\

x = X - \bar{X}\ ,

where \bar{Y}\ and \bar{X}\ are the means of Y and X.

Then I showed that we could write

\hat{\beta} = x\cdot y / x \cdot x\

and I assert that the fitted line goes thru the mean-point (\bar{X},\bar{Y})\ – that is, the point (\bar{X},\bar{Y})\ lies on the fitted line

\bar{Y} = \hat{\alpha} + \hat{\beta} \bar{X}\

– and we get \hat{\alpha}\ by rewriting the equation of the line:

\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}\ .

Then we can compute the R^2 as

R^2 = \hat{\beta}\, x\cdot y/y\cdot y\

or

R^2 = (x\cdot y)^2 / x\cdot x / y\cdot y\

or

R^2 = \hat{\beta}^2 x\cdot x / y\cdot y\

or (taking a square root)

R = x\cdot y /  \sqrt{x\cdot x} / \sqrt{y\cdot y}\
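A minimal NumPy sketch of these formulas, for readers who want to check them numerically. The data are made-up monthly percentage returns (hypothetical, not Reilly’s), and the results are compared against `np.polyfit`, NumPy’s degree-1 least-squares fit, which should agree:

```python
import numpy as np

# Hypothetical monthly percentage returns, made up for illustration only.
X = np.array([3.1, -1.2, 0.8, 4.5, -2.7, 1.9, 0.3, -0.6, 2.2, -3.4, 1.1, 0.5])
Y = np.array([2.0, -2.5, 0.1, 3.0, -4.1, 0.9, -1.0, -1.8, 1.5, -4.0, 0.2, -0.7])

# Centered data (lower case).
x = X - X.mean()
y = Y - Y.mean()

beta_hat = x.dot(y) / x.dot(x)                  # beta-hat = x.y / x.x
alpha_hat = Y.mean() - beta_hat * X.mean()      # the line passes through the mean point
r_squared = beta_hat**2 * x.dot(x) / y.dot(y)   # R^2 = beta-hat^2 x.x / y.y
r = x.dot(y) / np.sqrt(x.dot(x)) / np.sqrt(y.dot(y))

# Cross-check: np.polyfit returns [slope, intercept] for a degree-1 fit.
slope, intercept = np.polyfit(X, Y, 1)
print(beta_hat, slope)
print(alpha_hat, intercept)
print(r_squared, r**2)
```

Nothing here needs a matrix solve: three dot products and two means are enough for the two-variable model.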

So?

Now we take a step further and rewrite the expressions x\cdot y\ , x\cdot x\ , and y\cdot y\ . Because x and y are centered data, we can write

x\cdot y = (n-1) Cov(x,y)\

x\cdot x = (n-1) Var(x)\

y\cdot y = (n-1) Var(y)\ .

(I have written sample covariances and variances, but it doesn’t matter: the common factors of (n-1) will cancel, and if we had used n everywhere instead, they would all cancel, too. We just can’t use one for variance and the other for covariance!)

What we get is

\hat{\beta} = Cov(x,y) / Var(x)\

and

R = Cov(x,y) / StDev(x) / StDev(y)\

or, my preference,

R^2 = \hat{\beta}^2 Var(x) / Var(y)\ .

We retain

\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}\ .
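To see the remark about divisors in action, here is a small NumPy check on made-up data (hypothetical returns). `np.cov` and `np.var` take a `ddof` argument, so the divisor is n - ddof. Matching divisors give the same \hat{\beta}\ , while mixing them inflates it by n/(n-1):

```python
import numpy as np

# Hypothetical monthly percentage returns (made up): X = market, Y = asset.
X = np.array([3.1, -1.2, 0.8, 4.5, -2.7, 1.9, 0.3, -0.6, 2.2, -3.4, 1.1, 0.5])
Y = np.array([2.0, -2.5, 0.1, 3.0, -4.1, 0.9, -1.0, -1.8, 1.5, -4.0, 0.2, -0.7])
n = len(X)

# Divisor n-1 (ddof=1) for both covariance and variance.
beta_sample = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
# Divisor n (ddof=0) for both: the common factor cancels, same beta-hat.
beta_pop = np.cov(X, Y, ddof=0)[0, 1] / np.var(X, ddof=0)
# Mixed divisors: covariance over n-1, variance over n. Off by n/(n-1).
beta_mixed = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=0)

# R^2 = beta-hat^2 Var(x) / Var(y), again with matching divisors.
r_squared = beta_sample**2 * np.var(X, ddof=1) / np.var(Y, ddof=1)
print(beta_sample, beta_pop, beta_mixed, r_squared)
```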

An aside:
Sharpe writes the equation

\beta = Cov(x,y) / Var(x)\ ,

and yes, I meant to omit the hat from \beta\ . Sharpe does a sample calculation using an underlying probability distribution – he is not fitting a line to sampled data, but computing from the underlying population. He uses the same formulas for \alpha\ and \beta \ as I am using for \hat{\alpha}\ and \hat{\beta}\ . This actually seems plausible to me, but I’m still thinking about it.

Reilly’s Example

introduction

Let me work a small example from Reilly.

We are given 13 month-end prices for IBM from Dec 1978 thru Dec 1979 – but I’m not going to show them. We compute the following percentage monthly returns:

capm 1

Check that these returns are computed from price ratios, using the first two given prices, 96.11 and 99.93:

capm 2

Another aside:
Each of those returns was computed by taking \frac{price(t)}{price(t-1)}-1\ . An alternative would have been to compute “continuously compounded returns” by taking the natural logarithm of the ratios… e.g.

capm 3
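For the record, here is the arithmetic for the first IBM return computed both ways, as a short Python sketch using only the two prices quoted above:

```python
import math

# First two IBM month-end prices quoted in the text.
p0, p1 = 96.11, 99.93

simple_return = p1 / p0 - 1          # price(t)/price(t-1) - 1
log_return = math.log(p1 / p0)       # continuously compounded return

# As percentages, to compare with the returns above.
print(f"simple: {100 * simple_return:.2f}%")
print(f"log:    {100 * log_return:.2f}%")
```

The simple return comes to about 3.97% and the log return to about 3.90%. Since ln(1 + r) < r for any nonzero r, the log return is always the smaller of the two, though for monthly moves this size the difference is small.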

We are also given month-end prices for the S&P 500 over the same period. Again, we compute percentage monthly returns:

capm 4

We are apparently going to ignore the risk-free rate of return: we are setting R_f = 0.

the alternative calculations for the regression line

First we need the variance of Rm, and the covariance between Rm and Ri:

capm 5

From those, we can compute \hat{\beta}\ . I recall

\hat{\beta} = x\cdot y / x\cdot x\

but convert that to

\hat{\beta} = Cov(x,y) / Var(x)\

and then to

\hat{\beta} = Cov(Rm,Ri) / Var(Rm)\ :

capm 6

Next, we get the means of Ri and Rm…

capm 7

and compute \hat{\alpha}\ , writing E_i and E_m for the means of R_i and R_m:

\hat{\alpha} = \bar{Y} - \bar{X} \hat{\beta} = E_i - E_m \hat{\beta}\

capm 8

Finally, we compute the variance of Ri,

capm 9

and then compute the R^2 and R for the regression:

R^2 = \hat{\beta}^2 V(x) / V(y)\

which translates to

capm 10

(R is, in fact, the correlation coefficient between X and Y, i.e. between Rm and Ri.)
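A quick numerical check of that parenthetical, as a NumPy sketch on made-up returns (hypothetical, not Reilly’s). It builds \hat{\beta}\ , \hat{\alpha}\ , and R from the means, variances, and covariance, then compares R with `np.corrcoef`, NumPy’s Pearson correlation:

```python
import numpy as np

# Hypothetical monthly percentage returns (made up): rm = market, ri = asset.
rm = np.array([3.1, -1.2, 0.8, 4.5, -2.7, 1.9, 0.3, -0.6, 2.2, -3.4, 1.1, 0.5])
ri = np.array([2.0, -2.5, 0.1, 3.0, -4.1, 0.9, -1.0, -1.8, 1.5, -4.0, 0.2, -0.7])

E_m, E_i = rm.mean(), ri.mean()
beta_hat = np.cov(rm, ri)[0, 1] / np.var(rm, ddof=1)   # Cov(Rm,Ri) / Var(Rm)
alpha_hat = E_i - E_m * beta_hat
r_squared = beta_hat**2 * np.var(rm, ddof=1) / np.var(ri, ddof=1)

# R carries the sign of beta-hat.
r = np.sign(beta_hat) * np.sqrt(r_squared)
pearson = np.corrcoef(rm, ri)[0, 1]
print(beta_hat, alpha_hat, r_squared, r, pearson)
```

`np.cov` divides by n-1 by default, which is why the variance uses `ddof=1` to match.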

For the record, I would usually just recall

\hat{\beta} = x\cdot y / x\cdot x\

\hat{\alpha} = \bar{Y} - \bar{X} \hat{\beta}\

R^2 = \hat{\beta}^2 V(x) / V(y)\

… then compute the two variances, the covariance, and the two means, and then write:

capm 11

Reilly got .806, -1.94, and .78 (the last being R, not R^2). OK. Incidentally, the R^2 is unusually good for this kind of data: a much lower value, 0.3, is typical.

So, that’s one way to compute the coefficients of a least-squares line fitted thru our (X,Y) = (Rm, Ri) data. This is how the computation is usually presented in discussions of the CAPM.

But we know the usual way to fit a least-squares line, and we know that it gives us more information.

LinearModelFit

I need a 2-column matrix holding Rm and Ri:

capm 12

We fit a line, naming the independent variable RI…

capm 13

Let me get the parameter table, the RSquared, and let me construct a function for the fitted line:

capm 14

We get \hat{\beta} = .806\ , \hat{\alpha} = -1.98\ , and R^2 = .608223, just as we did before. Good.

I am impressed that both t-statistics are significant, considering that we only had 12 data points. (I don’t actually believe these t-statistics – but they’re all we have. In particular, it is doubtful that the disturbances are normal.)

A negative parameter \alpha\ would say the stock is overvalued; a parameter \beta\ less than 1 would say that the stock is “defensive”, because it does not move as much as the S&P 500.

Let me elaborate on that. We have \hat{\alpha}\ less than zero – which suggests that \alpha\ might be less than zero, which would suggest that the stock is overvalued, which suggests that we might want to sell it if we own it, or short it if we do not.

But \hat{\alpha}\ less than zero does not guarantee that \alpha\ is less than zero, and it’s \hat{\alpha}\ that we know.

We also have \hat{\beta}\ less than 1. Now the t-statistic, if it can be trusted, says only that \beta\ is not likely to be zero; it tests \beta = 0\ , so by itself it tells us nothing about whether \beta\ is greater than or less than 1. (That question calls for a different t-statistic: (\hat{\beta} - 1)\ divided by the standard error of \hat{\beta}\ .) Still, \hat{\beta}\ less than 1 suggests that \beta\ might be less than 1, in which case the stock tends to move less than the market, and could be termed “defensive”.
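The numbers behind a parameter table like the one above can be computed by hand. Here is a NumPy sketch on made-up returns (hypothetical data), using the textbook two-variable standard errors with s^2 = ESS/(n-2)\ . It also computes the t-statistic for \beta = 1\ , which is the test that speaks to “defensive”. As with any t-statistic, reading it against Student’s t with n-2 degrees of freedom assumes normal disturbances:

```python
import numpy as np

# Hypothetical monthly percentage returns (made up): rm = market, ri = asset.
rm = np.array([3.1, -1.2, 0.8, 4.5, -2.7, 1.9, 0.3, -0.6, 2.2, -3.4, 1.1, 0.5])
ri = np.array([2.0, -2.5, 0.1, 3.0, -4.1, 0.9, -1.0, -1.8, 1.5, -4.0, 0.2, -0.7])
n = len(rm)

x = rm - rm.mean()
beta_hat = x.dot(ri - ri.mean()) / x.dot(x)
alpha_hat = ri.mean() - beta_hat * rm.mean()

resid = ri - (alpha_hat + beta_hat * rm)
s2 = resid.dot(resid) / (n - 2)            # estimated variance of the disturbances

# Standard errors of the two coefficients in the two-variable model.
se_beta = np.sqrt(s2 / x.dot(x))
se_alpha = np.sqrt(s2 * (1.0 / n + rm.mean()**2 / x.dot(x)))

t_alpha = alpha_hat / se_alpha             # tests alpha = 0
t_beta = beta_hat / se_beta                # tests beta = 0 (the usual table entry)
t_beta_vs_1 = (beta_hat - 1.0) / se_beta   # tests beta = 1 instead
print(t_alpha, t_beta, t_beta_vs_1)
```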

Let’s look at the data and the fit (this is why I defined a function for the fit):

capm 15

jargon

Let’s return to our model:

(R_i - R_f) = \alpha + \beta (R_m - R_f) + r_i\ .

That includes the equation of a line:

(R_i - R_f) = \alpha + \beta (R_m - R_f)\

As I said at the beginning, that’s called the security characteristic line or just the characteristic line.

From Reilly I learned (p. 587) that “A risk-free asset is one for which there is no uncertainty regarding the expected rate of return; i.e. the standard deviation of returns is equal to zero ….”

He also told me (p. 588) that “The covariance between any risky asset or portfolio of risky assets and a risk-free asset is zero.” And that’s where Black’s zero-beta CAPM arises: if \hat{\beta}\ is zero then the correlation coefficient R is zero, so we could start by assuming an asset with zero beta instead of a risk-free asset. (It’s really the correlation that needs to be zero, I think. After all, \beta = 0\ does not imply \hat{\beta} = 0\ . That’s why we compute t-statistics, to see if a nonzero coefficient could be the accidental result of a zero parameter.)

Returning to our model, we can collect terms:

(R_i - R_f) =  \beta (R_m - R_f) + (\alpha + r_i)\ .

\beta (R_m - R_f)\ is called the market or systematic component of excess return; (\alpha + r_i)\ is called the nonmarket, or unique, or unsystematic component of excess return.

Sharpe says of r_i: “By convention, \alpha\ represents the expected nonmarket return, while r_i represents the deviation from this expectation. Before the fact, the best guess is that r_i will be zero. After the fact, it almost certainly will not be. Unexpected good news will cause r_i to be positive, while unexpected bad news will cause it to be negative.”

I guess I’m glad to see that he interprets the disturbances r_i as responses to information.

He also says, of \alpha\ : “A security that is priced correctly will have an ex ante alpha of zero in the eyes of well-informed analysts. And in an efficient market, all securities are priced correctly.”

Oh, I have said that Sharpe worked a numerical illustration using an underlying probability distribution, so that he was computing the parameters \alpha\ and \beta\ rather than estimates. His result for \alpha\ was a positive number, and he told us that the asset was under-valued. (I reason that its excess return is higher than it ought to be, given market conditions. Other assets with that return have higher prices than this one.)

Finally, Ruppert points out that we often want to assume that for any two assets i and j, r_i is uncorrelated with r_j. Not a likely assumption, so we might want to combine assets into sectors, and argue that the disturbances to sectors are uncorrelated between different sectors.

But speaking of the variance/covariance of r_i leads to the question of the variance of the excess return on an asset.

Variance Decomposition

If we assume that the disturbances are not correlated with the market returns,

Cov(R_m,r_i) = 0\

then from

(R_i - R_f) = \alpha + \beta (R_m - R_f) + r_i\

in the form

R_i = R_f + \alpha + \beta (R_m - R_f) + r_i\

we get

Var(R_i) = \beta^2 Var(R_m) + Var(r_i)\

because Var(R_f) = 0\ by definition of risk-free, and \alpha\ and \beta\ are constants, and Cov(R_m,r_i) = 0\ .
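A small simulation can make this population statement concrete. This is a sketch under made-up “true” values (all hypothetical): a constant risk-free return, market returns drawn at random, and disturbances drawn independently of the market, so that Cov(R_m,r_i) = 0\ holds by construction. With a large sample, the sample variance of R_i should come out close to \beta^2 Var(R_m) + Var(r_i)\ :

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 200_000

# Hypothetical "population" values, chosen only for illustration.
alpha, beta, r_f = -0.5, 0.8, 0.4      # r_f is a constant: Var(R_f) = 0
sigma_m, sigma_r = 4.0, 3.0

r_m = rng.normal(1.0, sigma_m, n)      # market returns
disturb = rng.normal(0.0, sigma_r, n)  # r_i, drawn independently of r_m
r_i = r_f + alpha + beta * (r_m - r_f) + disturb

lhs = r_i.var()
rhs = beta**2 * r_m.var() + disturb.var()
theory = beta**2 * sigma_m**2 + sigma_r**2   # 0.64 * 16 + 9 = 19.24
print(lhs, rhs, theory)
```

In any finite sample the two sides differ by 2\beta\ times the sample covariance of R_m and the disturbances. That covariance is small here because they were drawn independently, but it is not exactly zero.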

So we have split the variance of asset i into market and nonmarket components. Now I hate to say it, but no one actually believes the assumptions leading to that; people use the result anyway. Let me quote Ruppert (p. 423):

“The validity of the CAPM can only be guaranteed if all these assumptions are true, and certainly no one believes that any of them are exactly true.”

OK… the discredited assumptions are used to decide what assumptions they really want – and that decomposition of the variance is something they really want to assume.

I have to ask, however, about the estimates \hat{\alpha}\ and \hat{\beta}\ . Should we just assume that we can actually write a similar decomposition for the coefficients \hat{\alpha}\ and \hat{\beta}\ ?

No… we don’t need to. There is a perfectly valid decomposition for the sample variances. Let me write it using X and Y, because I don’t want to write r_i hat for the residuals, and I want to use my familiarity with X and Y. It is true that

Var(Y) =  \hat{\beta}^2 Var(X) + Var(\hat{e})\ .

(I confess I was glad to remember that that was true, and why. I know that \hat{\beta}\ has a variance associated with it; I was quite surprised to see that it and \hat{\alpha}\ could apparently be treated as constants.)

That decomposition is an immediate consequence of the sum-of-squares (SSQ) decomposition, which in turn is true because in a least-squares fit, the residuals \hat{e}\ are orthogonal to the fitted values \hat{Y}\ :

Total SSQ = Regression SSQ + Error SSQ.

(TSS = RSS + ESS)

Let me remind you of that. What we have is

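TSS = y\cdot y = (\hat{y} + \hat{e})\cdot (\hat{y}+\hat{e})\

= \hat{y}\cdot \hat{y} + \hat{e}\cdot \hat{e}\ (because \hat{y}\cdot \hat{e} = 0)\

= RSS + ESS

= (\hat{\beta} x)\cdot(\hat{\beta} x) + \hat{e}\cdot \hat{e}\

where \hat{y} = \hat{Y} - \bar{Y} = \hat{\beta} x\ are the centered fitted values (subtract \bar{Y} = \hat{\alpha} + \hat{\beta} \bar{X}\ from \hat{Y} = \hat{\alpha} + \hat{\beta} X\ ), so

y\cdot y = \hat{\beta}^2 x\cdot x + \hat{e}\cdot\hat{e}\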

Divide that by n-1 (where n is the number of observations) and we get the decomposition for the sample variances (DVS):

s_Y^2 = \hat{\beta}^2 s_X^2 + s_{\hat{e}}^2\

where each s^2\ stands for a sample variance computed with divisor n-1. (So, for example, s_{\hat{e}}^2 = ESS/(n-1)\ , the sample variance of the residuals.)
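The whole chain can be checked numerically. This NumPy sketch on made-up returns (hypothetical data) computes the centered fitted values \hat{\beta} x\ and confirms that their cross term with the residuals vanishes. It then confirms TSS = RSS + ESS and the DVS, with every variance using divisor n-1 (`ddof=1`):

```python
import numpy as np

# Hypothetical monthly percentage returns (made up): X = market, Y = asset.
X = np.array([3.1, -1.2, 0.8, 4.5, -2.7, 1.9, 0.3, -0.6, 2.2, -3.4, 1.1, 0.5])
Y = np.array([2.0, -2.5, 0.1, 3.0, -4.1, 0.9, -1.0, -1.8, 1.5, -4.0, 0.2, -0.7])
x, y = X - X.mean(), Y - Y.mean()

beta_hat = x.dot(y) / x.dot(x)
alpha_hat = Y.mean() - beta_hat * X.mean()
Y_hat = alpha_hat + beta_hat * X      # fitted values
e_hat = Y - Y_hat                     # residuals
y_hat = Y_hat - Y.mean()              # centered fitted values, equal to beta_hat * x

TSS = y.dot(y)
RSS = y_hat.dot(y_hat)                # regression sum of squares
ESS = e_hat.dot(e_hat)                # error sum of squares
cross = y_hat.dot(e_hat)              # zero (up to rounding) for a least-squares fit

print(TSS, RSS + ESS, cross)
print(np.var(Y, ddof=1), beta_hat**2 * np.var(X, ddof=1) + np.var(e_hat, ddof=1))
```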

I find two things interesting. One is that the DVS requires only that we do a least squares fit to a straight line. The least squares property guarantees that the residuals \hat{e}\ are orthogonal to the fitted values \hat{Y}\ – and that’s what wipes out the cross-term \hat{e}\cdot\hat{Y}\ on the right hand side.

Now I need to quote Sharpe again (p. 107). “Three key measures summarize a security’s prospects…: \alpha_i\ , \beta_i\ , S(r_i) [= \sqrt{Var(r_i)}]\ . Estimates of these variables should be the responsibility of the security analyst. He or she may utilize statistical analysis of historical data… but fundamental knowledge of the company and industry in question can usually be employed to considerable advantage.”

It seems to me that if the analyst’s estimates \hat{\alpha}\ and \hat{\beta}\ are NOT least-squares estimates, then the DVS does not hold: we just lost that lovely variance decomposition for the coefficients! (We may still have it for the underlying parameters – but we don’t know the real parameters.)
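To see what is lost, here is a NumPy sketch on made-up data, with a hypothetical analyst’s \beta\ of 1.2 picked out of the air. For any slope b, the centered residuals are u = y - b x\ , and y\cdot y = b^2 x\cdot x + u\cdot u + 2b\, x\cdot u\ . The intercept does not affect any of these sample variances. The cross term vanishes only when x\cdot u = 0\ , that is, only for the least-squares slope:

```python
import numpy as np

# Hypothetical monthly percentage returns (made up): X = market, Y = asset.
X = np.array([3.1, -1.2, 0.8, 4.5, -2.7, 1.9, 0.3, -0.6, 2.2, -3.4, 1.1, 0.5])
Y = np.array([2.0, -2.5, 0.1, 3.0, -4.1, 0.9, -1.0, -1.8, 1.5, -4.0, 0.2, -0.7])
n = len(X)
x, y = X - X.mean(), Y - Y.mean()

def dvs_gap(b):
    """s_Y^2 minus (b^2 s_X^2 + s_u^2), where u = y - b*x are the centered
    residuals for slope b."""
    u = y - b * x
    return np.var(Y, ddof=1) - (b**2 * np.var(X, ddof=1) + np.var(u, ddof=1))

beta_ls = x.dot(y) / x.dot(x)       # least-squares slope
beta_analyst = 1.2                  # hypothetical non-least-squares estimate

print(dvs_gap(beta_ls))             # essentially zero
print(dvs_gap(beta_analyst))        # equals 2 b x.u / (n-1), not zero
```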

Two… staying with X and Y… we contrast two equations, for the estimate

s_Y^2 = \hat{\beta}^2 s_X^2 + s_{\hat{e}}^2\

and the model

\sigma_Y^2 = \beta^2 \sigma_X^2 + \sigma_e^2\

Lovely?

No.

An unbiased estimate of Var(e) = \sigma_e^2 \text{ is } \frac{ESS}{n-2}\ . I can’t pair up corresponding entries, because the last two do not correspond.

Huh?

It is not true that the estimated variance of Y is equal to \hat{\beta} ^2\ times the estimated variance of X plus the estimated variance of e. It is true if we replace “the estimated variance of e” by “the sample variance of \hat{e}\ ”. They’re not the same thing.

And yet, in a sense, that’s a delightfully easy approximation. Run a regression, and take \hat{\beta}^2\ times the sample variance of the Xs and the sample variance of the residuals as the decomposition of the sample variance of Y.

That’s what I’m now expecting to see when I look at examples.

Just don’t ever tell me that the sample variance of \hat{e}\ is an unbiased estimate of the variance of e. More to the point, don’t tell me that the sample variance of the residuals is the nonmarket component of the variance. (Is that pointed enough? Jab!)
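A Monte Carlo sketch makes the point about n-2. Everything here is hypothetical: made-up “true” \alpha\ , \beta\ , and \sigma_e\ , 12 observations (as in the IBM example), and market returns held fixed across replications. Averaged over many simulated samples, ESS/(n-2) should come out near \sigma_e^2\ . ESS/(n-1), the sample variance of the residuals, should come out near \sigma_e^2 (n-2)/(n-1)\ :

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, reps = 12, 20_000                     # 12 observations, many simulated samples
alpha, beta, sigma_e = -2.0, 0.8, 1.0    # hypothetical "true" parameters

X = rng.normal(0.0, 4.0, n)              # market returns, fixed across replications
x = X - X.mean()

ess_over_n2 = np.empty(reps)
ess_over_n1 = np.empty(reps)
for k in range(reps):
    Y = alpha + beta * X + rng.normal(0.0, sigma_e, n)
    y = Y - Y.mean()
    b = x.dot(y) / x.dot(x)
    e_hat = y - b * x                    # residuals, written in centered form
    ess = e_hat.dot(e_hat)
    ess_over_n2[k] = ess / (n - 2)       # unbiased estimate of sigma_e^2
    ess_over_n1[k] = ess / (n - 1)       # sample variance of the residuals

print(ess_over_n2.mean(), ess_over_n1.mean(), sigma_e**2 * (n - 2) / (n - 1))
```

With n = 12 the sample variance of the residuals runs low by a factor of 10/11 on average.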

Let me illustrate the differences. For Reilly’s example of IBM vs. the S&P 500, we have the estimated variance s^2 of the disturbances:

capm 16

Now I need the residuals… their sum-of-squares ESS… and ESS/(n-2):

capm 17

Yes, that’s the estimated variance – that is, an unbiased estimate of the nonmarket variance.

Now let’s compute and save the sample variance of Y (i.e. R_i), with the marvelously descriptive name t1:

capm 18

Next, compute the variance of X (i.e. R_m), and multiply by \hat{\beta}^2\ , and save that product as t2:

capm 19

Finally, compute and save the sample variance of \hat{e}\ , the residuals, as t3… and recall the estimated nonmarket variance s^2:

capm 20

The sum of squares decomposition says that

t1 = t2 + t3

capm 21

That the estimated non-market variance differs from the sample variance of the \hat{e}\ says that

t1 ≠ t2 + s^2

capm 22

And, of course, the two sides are not equal.
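The same bookkeeping, sketched in NumPy on made-up returns (hypothetical data, so the numbers will not match the screenshots), using the names t1, t2, t3, and s^2 from above:

```python
import numpy as np

# Hypothetical monthly percentage returns (made up): rm = X, ri = Y.
rm = np.array([3.1, -1.2, 0.8, 4.5, -2.7, 1.9, 0.3, -0.6, 2.2, -3.4, 1.1, 0.5])
ri = np.array([2.0, -2.5, 0.1, 3.0, -4.1, 0.9, -1.0, -1.8, 1.5, -4.0, 0.2, -0.7])
n = len(rm)

x = rm - rm.mean()
beta_hat = x.dot(ri - ri.mean()) / x.dot(x)
alpha_hat = ri.mean() - beta_hat * rm.mean()
e_hat = ri - (alpha_hat + beta_hat * rm)   # residuals

t1 = np.var(ri, ddof=1)                    # sample variance of Y
t2 = beta_hat**2 * np.var(rm, ddof=1)      # beta-hat^2 times sample variance of X
t3 = np.var(e_hat, ddof=1)                 # sample variance of the residuals, ESS/(n-1)
s2 = e_hat.dot(e_hat) / (n - 2)            # estimated nonmarket variance, ESS/(n-2)

print(t1, t2 + t3, t2 + s2)
```

Because ESS/(n-2) exceeds ESS/(n-1), the sum t2 + s^2 always overshoots t1, by ESS/((n-1)(n-2)).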

And now I think I know why people are so vague about the distinction between the parameters \alpha\ and \beta\ and their estimates \hat{\alpha}\ and \hat{\beta}\ : it’s awkward, to say the least. You get a set algorithm, and you compute, and you imagine that you really computed \alpha\ and \beta\ .

Oh, and perhaps you, O Reader, are thinking, “Wait a minute! We know the mean of the residuals \hat{e}\ : it’s exactly zero. So the sample variance of the \hat{e}\ should be the error sum of squares divided by n, not n-1. We usually subtract 1 from n because we’re estimating the mean rather than using the population mean, which is unknown. Here, we know it. Divide ESS by n.”

I think I agree. But ESS/n is even further from ESS/(n-2). And the true decomposition

TSS = RSS + ESS

requires that we divide everything by the same number, whether it be n, n-1 (my preference), or n-2.

Summary

My summary of this entire thing? The simplest part of the CAPM is a standard linear model. Letting Y and X be the excess returns of an asset and of the market, we have

Y = \alpha + \beta X + e\

with disturbances e. We get a fitted line

\hat{Y} = \hat{\alpha} + \hat{\beta} X\ ,

which gives us

Y = \hat{Y} + \hat{e}\ ,

where \hat{\alpha}\ and \hat{\beta}\ are the least-squares estimates of \alpha\ and \beta\ .

We saw that the estimates could be computed using variances of X and Y, and the covariance between them.

As usual in such models, the expected value of e is zero; and the sum of the \hat{e}\ is zero. In addition, however, the CAPM tells us that if \alpha\ is nonzero, the asset is not priced correctly.

Moving on, we wish to decompose the variance of Y into the variance of X and of e, thinking of one as the market (systematic) component and the other as the unique (nonmarket, unsystematic) component:

\sigma_Y^2 = \beta^2 \sigma_X^2 + \sigma_e^2\

In practice, I expect people to decompose it using the sample variances of X and (incorrectly, but easily) of \hat{e}\ :

s_Y^2 = \hat{\beta}^2 s_X^2 + s_{\hat{e}}^2\

\frac{TSS}{n-1} = \frac{RSS}{n-1} + \frac{ESS}{n-1}\

I’ll be looking at the variance decompositions I see in practice.

Bear in mind that this post is based on a few schoolboy references… I’ve not been in the back room of a quant shop seeing what they really do! (Grinold and Kahn may give me more of that experience, but their mathematics seems a little too vague. That’s why I bought Ruppert, and Benninga, and there’s another book due in a couple of days.)
