**You want to read my comment of Feb 7 at the end of this post:** there is an additional post you will want to read after you read this one.

## Introduction

(Let me say up front that this was a long post to assemble, and I’m not at all sure that my editing got all of the “dependent” and “independent” right – so feel free to point out any mismatches.)

I had thought that I would begin this discussion by looking at the Hald data. It turns out, however, that we need to take a wider view before we specialize.

I have decided that it is a very good idea to begin an investigation of multicollinearity by investigating linear dependence. And I thought I didn’t like the term “exact multicollinearity” – except that it’s perfect for why I’m doing this post.

For one thing, “multicollinearity” has a vague definition: the term is used to describe a set of vectors that are “close to” being linearly dependent… multicollinearity, then, is “approximate linear dependence”. But how close must a set be before it’s an issue?

So let’s look at exact linear dependence first.

This is of more than pedagogical interest: what I am about to show you can be applied to multicollinearity.

Before I remind you of the definition of linear dependence, and its effect on matrices, let me summarize the issues. Conceptually, there are three issues; computationally, there is a fourth.

Given a collection of vectors, which we have assembled into a matrix, the three conceptual issues are:

- Detection: is this set of vectors linearly dependent?
- Isolation: if the set is linearly dependent, which vectors are involved?
- Identification: if the set is linearly dependent, exactly what are the dependencies?

The fourth issue arises whenever we are using a computer for the detection of linear dependence… the test for linear dependence in a square matrix, for example, is that the determinant of the matrix is zero…

- Computation: how small a number is “zero”?

I’m going to show you several examples, but there’s no reason not to give you the answers up front.

- Detection: a set of n vectors is linearly dependent if the matrix formed from them has fewer than n nonzero singular values.
- Isolation: we can locate the dependencies by looking at subsets of columns of data.
- Identification: we can use regression to compute the dependencies once we isolate them.
- Computation: the computer itself can tell you that the matrix is singular; as far as I’m concerned, that identifies a number as “zero” on the computer.

## Example 1 and a review of linear dependence

Of course there will be examples, but I’m also going to have one in hand while I remind you about linear dependence.

In what follows, I will use either XT or X’ to denote the transpose of X. X’ is more convenient in equations… but, in fact, my transpose matrix is usually named XT.

The following matrix comes from Belsley, Kuh, and Welsch. They call it the modified Bauer matrix because it is based on something from Bauer. This is probably a famous example… but when I get through with it, I hope you’ll see why it should be called infamous.

I would encourage you not to look too closely at this matrix; let our analytical tools dissect it instead.

**Detection: Is this matrix linearly dependent?**

Recall from linear algebra that a set of vectors v1, …, vn is linearly dependent if there exist scalars (for our purposes, real numbers), not all of them zero, such that

c1 v1 + … + cn vn = 0.

A set of vectors is linearly independent if it is not linearly dependent.

It’s a 6×5 matrix. If we view it as 5 column vectors, each of length 6, then we have 5 vectors in a 6-dimensional space; they could be linearly independent, and then we would say that the column rank of this matrix was 5. More generally, if only c of them, with c less than 5, were linearly independent, we would say that the column rank was c.

On the other hand, if we view the matrix as 6 row vectors each of length 5, then we have 6 vectors in a 5-dimensional space: the 6 rows cannot be linearly independent. If some subset of 5 rows is linearly independent, then we say that the row rank of the matrix is 5. More generally, if some subset of r rows is linearly independent, and any larger subset is not, then we say the row rank of the matrix is r.

You should recall a theorem from linear algebra telling us that the row rank and the column rank are always equal. This lets us speak of the rank of a matrix.

The largest possible rank of a matrix is either the number of rows or the number of columns, whichever is smaller. Whenever the rank of a matrix is less than that largest possible value, we say that the matrix is of less than full rank; for a tall matrix like ours, that is the same as saying its columns are linearly dependent.

This is rather interesting. It says, for example, that if only 4 columns are linearly independent, then not only the full set of 6 rows but any set of 5 rows must be linearly dependent.

I hope you’ve seen this in 2D and 3D. Two vectors in the xy-plane are linearly dependent if they don’t span the plane – i.e. they lie on one common line, i.e. they are either parallel or anti-parallel, i.e. either is a multiple of the other. And the equation of that line expresses their linear dependence.

Similarly, three vectors in xyz-space are linearly dependent if they don’t span the space – i.e. if they lie on one common plane. And the equation of that plane expresses their linear dependence.

OK, now that we’ve just barely reviewed the idea of the rank of a matrix, how would we find it?

[Cue the William Tell Overture.] The Singular Value Decomposition.

We don’t actually need the entire decomposition. Yes, the Singular Value Decomposition says we could factor X as

X = u w v′,

but all we need are the diagonal entries of w. In fact, all we need is the number of nonzero entries in w – because that is precisely the rank of the matrix X.

SingularValueList[X]//N = {36368.4, 170.701, 60.5332, 7.6019}

OK, those are all plausible numbers.

Oops. There are only 4 of them.

This matrix is of rank 4.

Its 5 columns are not linearly independent. Our answer is:

**Detection: yes, we have linear dependence.**
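Since the matrix itself lives in a screenshot, here is a minimal NumPy sketch of the detection step on a hypothetical stand-in: a 6×5 integer matrix that I deliberately built so that column 5 is exactly twice column 4, the same kind of dependence the modified Bauer matrix has. The matrix, the tolerance, and all the names here are mine, not the post’s data.

```python
import numpy as np

# Stand-in 6x5 matrix (hypothetical data): column 5 = 2 * column 4,
# mimicking the dependence in the modified Bauer matrix.
X = np.array([
    [1, 2, 3, 1, 2],
    [4, 5, 6, 2, 4],
    [7, 8, 10, 3, 6],
    [2, 0, 1, 5, 10],
    [3, 1, 4, 7, 14],
    [5, 9, 2, 11, 22],
], dtype=float)

# Singular values, largest first -- the analogue of SingularValueList[X].
w = np.linalg.svd(X, compute_uv=False)
print(w)

# Count the "nonzero" ones relative to the largest: that count is the rank.
tol = w[0] * max(X.shape) * np.finfo(float).eps
rank = int(np.sum(w > tol))
print(rank)   # 4, one less than the 5 columns: linear dependence detected
```

NumPy, unlike Mathematica’s SingularValueList, always returns min(m, n) singular values, so the “missing” fifth one shows up as a value at round-off level rather than simply being absent.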

A quick aside. If I do not ask for a numerical answer ( //N), I get a symbolic one that is of no use to me, because the matrix was composed of integers:

**OK, let’s locate (isolate) the dependence.** What I’m going to do is drop one column at a time, and find the singular values of each resulting 6×4 matrix. Since Mathematica is a little more comfortable with rows than columns, I’m going to actually drop a row of the transpose X’ rather than a column of X. Oh, I’m going to drop a row by specifying exactly which rows to keep – because this operation generalizes nicely.

To put that another way, I’m going to look at all possible submatrices of X’ having 4 rows.

I have 5 rows… and I want all possible subsets of 4 rows… Mathematica does that rather easily:

r=Range[5] = {1,2,3,4,5}

s=Subsets[r,{4}] = {{1,2,3,4}, {1,2,3,5}, {1,2,4,5}, {1,3,4,5}, {2,3,4,5}}

In other words… s1 drops row 5 from XT; s2 drops row 4 from XT, etc.

Here’s what we get if we just ask for the singular value lists:

Let me show case number, the rows we kept, and the singular values:

What we see (literally) is that every submatrix containing rows 4 and 5 is of rank 3. (Let me be clear: I’m not counting singular values in that table – I’m just looking at the lengths of the lines.) Alternatively, both the matrix with row 5 dropped (line 1), and the matrix with row 4 dropped (line 2), are of rank 4. Since they’re both 6×4 matrices, they are both of full rank.

Recall that if a set of vectors is linearly independent, every subset of them is also linearly independent. We don’t need to check the subsets of any line with 4 nonzero singular values.

Drop either row 4 or row 5 from XT and we’re good. That is, equivalently, drop either column 4 or column 5 from X, and we’re good.

**Isolation: Columns 4 and 5 of X are linearly dependent, and that’s the only dependence in the matrix X.**

Well, the only way for two vectors to be linearly dependent is for them to be proportional, so let’s just divide column 5 by column 4, element by element:

XT[[5]]/XT[[4]] = {2,2,2,2,2,2}

**Identification: column 5 is twice column 4.**
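Here is the same isolate-then-identify procedure sketched in NumPy, again on a hypothetical stand-in matrix whose only dependence is column 5 = 2 × column 4 (indices are 0-based, unlike Mathematica’s, and the data is made up for illustration):

```python
import numpy as np
from itertools import combinations

# Hypothetical 6x5 stand-in: column index 4 = 2 * column index 3.
X = np.array([
    [1, 2, 3, 1, 2],
    [4, 5, 6, 2, 4],
    [7, 8, 10, 3, 6],
    [2, 0, 1, 5, 10],
    [3, 1, 4, 7, 14],
    [5, 9, 2, 11, 22],
], dtype=float)

# Isolation: check every 4-column submatrix -- the analogue of dropping one
# row of the transpose at a time via Subsets[Range[5], {4}].
full_rank_subsets = [cols for cols in combinations(range(5), 4)
                     if np.linalg.matrix_rank(X[:, cols]) == 4]
print(full_rank_subsets)   # only the subsets that drop column index 3 or 4

# Identification: two dependent vectors are proportional, so divide.
print(X[:, 4] / X[:, 3])   # all 2s: column 5 is twice column 4
```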

Before I move on to the fourth issue, let me point out that we could have started with two columns instead of four:

Great! Rows 4 and 5 of XT are linearly dependent. Every other pair of rows is linearly independent. (Since we’re looking at 6×2 matrices, full rank = 2.)

We are, in fact, done at this point… but it might be more satisfying to confirm it.

Now we see explicitly that every subset of 3 columns which does not contain both 4 and 5 is linearly independent.

Finally, there’s the question of what the computer really has for numbers. (Huh? Keep reading.)

Mathematica distinguishes between exact numbers and real numbers. My matrix had integer values, and Mathematica computes its fifth singular value as zero.

But what if I tell it to treat the data as real numbers, and I ask it to show the singular values to many, many places? Having saved the original matrix as X0, I recreate X as

X=X0//N,

and that gives me:

(Those decimal points are extremely significant to Mathematica, marking these as real numbers rather than integers or rationals, or named irrationals such as Pi.)

I ask for the singular value list, and I specify a ridiculously small tolerance:

t1=SingularValueList[X,Tolerance->10^-20]

… and I get

{36368.4, 170.701, 60.5332, 7.6019, 9.23125*10^-13}

Ah ha! That fifth singular value isn’t zero!

Hmm. What’s the size of a typical atomic nucleus? 10^-13 cm, according to my handy copy of Griffiths’ “Particle Physics”.

I hope you realize we’re being silly.
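One sanity check I find useful: judge the suspect singular value relative to the largest one, not on its own. For an exactly singular matrix, the computed smallest singular value is round-off, on the order of machine epsilon times the largest. A sketch on the same kind of hypothetical stand-in matrix (column 5 = 2 × column 4; data and names are mine):

```python
import numpy as np

# Hypothetical 6x5 stand-in with an exact dependence: column 5 = 2 * column 4.
X = np.array([
    [1, 2, 3, 1, 2],
    [4, 5, 6, 2, 4],
    [7, 8, 10, 3, 6],
    [2, 0, 1, 5, 10],
    [3, 1, 4, 7, 14],
    [5, 9, 2, 11, 22],
], dtype=float)

w = np.linalg.svd(X, compute_uv=False)

# The smallest computed singular value is round-off debris; its ratio to the
# largest sits near machine epsilon (~2.2e-16), marking it as conceptual zero.
ratio = w[-1] / w[0]
print(ratio)
print(ratio < 100 * np.finfo(float).eps)   # True: zero up to round-off
```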

If the matrix X were of full rank, then we could invert X’X. (To put that another way, if X is not of full rank, we cannot invert X’X, which means we cannot solve the normal equations… which means we cannot use that data matrix for regression.)

So let’s ask for the inverse of X’X:

This is what I mean by “let the computer tell us what is close enough to be zero.”

**Computation: the fifth singular value is zero as far as the computer is concerned.**

Why am I making such a big deal of that?

Well, when Belsley, Kuh, and Welsch list the singular values of this matrix, they show a fifth singular value, with the value 1.3×10^-12. They note that zero on the computer they used is 10^-14. I don’t care; that number is conceptually zero, and anything else is round-off error.

In more pointed terms, their 1.3×10^-12 and my 9.23125×10^-13 are both garbage, literally: they are debris left in a register when the calculations were finished.

And yet, I don’t actually care that they get 1.3×10^-12.

What I do care about is that they compute the ratio of largest to smallest singular values for the column-scaled Bauer matrix, and they used 1.3×10^-12 for that smallest value!

**They treat the fifth singular value as nonzero.** They divided by it!

They treat this matrix as though it is multicollinear – having near linear dependence – instead of being of reduced rank, with exact linear dependence.

Come on, column 5 is exactly two times column 4. You probably knew this in high school: take any 5 rows of this matrix so that we get a 5×5; what’s the determinant? Zero. You know it’s zero.

And yet Draper and Smith march right along with Belsley et al., except that they either worked in single precision or they had a terrible algorithm: they actually think the ratio of first (largest) to fifth (smallest) singular value is 5799 (p. 379 of the 3rd ed.)

(To be precise, they computed the singular values of a related matrix. I will show you exactly what Belsley et al. and Draper & Smith did, in a subsequent post. **The Belsley et al. algorithm for isolating multicollinearity is fine; I just think their example is seriously flawed.**)

**To summarize Example 1:**

- Detection: the matrix X is of reduced rank, 4 instead of 5; we have linear dependence.
- Isolation: columns 4 and 5 are linearly dependent.
- Identification: column 5 = 2 times column 4.
- Computation: attempting to compute the inverse of X’X gets us “the matrix is singular”, as we would expect when the fifth singular value is zero.

Let me close this example by showing you what to do if you cannot compute singular values. **Instead, compute eigenvalues of X’X**:

Eigenvalues[XTX]//N = {1.32266*10^9, 29138.9, 3664.27, 57.7889, 0.}

Yes, it shows five of them – but the fifth one is zero.

Then we could compute eigenvalues of Xs’Xs, where Xs is a subset of the columns of X.

(And it’s not an accident that the matrix XTX has such an unusual form: it was deliberately constructed so that the fourth and fifth columns were orthogonal to each of the first three. For my purposes, that’s irrelevant.)
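The eigenvalue route is easy to check numerically: the eigenvalues of X’X are the squares of the singular values of X. A quick NumPy confirmation, once more on a hypothetical stand-in matrix with column 5 = 2 × column 4:

```python
import numpy as np

# Hypothetical 6x5 stand-in matrix: column 5 = 2 * column 4.
X = np.array([
    [1, 2, 3, 1, 2],
    [4, 5, 6, 2, 4],
    [7, 8, 10, 3, 6],
    [2, 0, 1, 5, 10],
    [3, 1, 4, 7, 14],
    [5, 9, 2, 11, 22],
], dtype=float)

w = np.linalg.svd(X, compute_uv=False)       # singular values of X, descending
lam = np.linalg.eigvalsh(X.T @ X)[::-1]      # eigenvalues of X'X, descending

# Eigenvalues of X'X equal the squared singular values of X, so a zero
# singular value shows up as a zero eigenvalue (up to round-off).
print(np.allclose(lam, w**2, atol=1e-8))     # True
```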

## Example 2

**Detection**: is this matrix of less than full rank? Find the singular values.

SingularValueList[XT]//N = {16264.4, 219.11, 75.5895, 11.6173}

Yes, there are only four nonzero singular values, so the matrix is of rank 4 instead of 5.

**Isolation**: what columns of X are involved? Equivalently: what rows of the transpose XT are involved? (Remember that this is a convenient alternative because I am using Mathematica, which is row oriented.) Look at subsets of 4 rows of XT:

Only two of those are linearly dependent. By taking the intersection of the row numbers, I find what is common to the two linearly dependent sets:

Hmm. It would appear that there is a linear dependence among x2, x3, and x5.

We’re actually done: we can tell from the linearly independent sets that x2 and x3 are not linearly dependent, and that x2 and x5 are not linearly dependent, and that x3 and x5 are not linearly dependent.

… but let’s widen our comfort zone anyway and look at all subsets of 3 rows of XT:

Exactly one of those is linearly dependent:

t2[[-3,2]] = {2,3,5}

That says the same thing as the intersection of the previous results. But let’s look at 2-element subsets of that set. I’m about to change the row numbers, but I don’t see that I have a choice. (The following screenshot shows precisely how and what I did.)

(Okay, I was a bit devious there, and that’s why the screenshot. By construction M1 is a subset of the rows of XT; I eliminated the part about taking a subset of the columns of X and then transposing that subset.)

Every one of those is just fine: the linear dependence involves all of x2, x3, and x5… because the preceding table explicitly shows that the pairs x2,x3 and x2,x5, and x3,x5 are linearly independent.

In a nutshell, we can stop as soon as we’re convinced that we’ve isolated the linear dependence… but we can go further and confirm our conclusion, just to make sure we’re not being hasty.

As for **identification**, now that we know that the dependence involves x2, x3, and x5, we could fit any one of the three as a function of the other two. Let’s do x5 as a function of x2 and x3:

The fitted equation is X5 = X2 + 2 X3. The fit is perfect, i.e. there are no errors (RSquared = 1). I could have told Mathematica to omit the constant term; instead, I let it show me that it is zero.
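Example 2’s data is only in a screenshot, so here is a hedged NumPy sketch of that fitting step on hypothetical columns built to satisfy the same relation, x5 = x2 + 2 x3 (the data and the names x2, x3, x5, A, b are mine):

```python
import numpy as np

# Hypothetical columns obeying Example 2's dependence: x5 = x2 + 2*x3.
x2 = np.array([1., 4., 2., 7., 3., 5.])
x3 = np.array([2., 1., 5., 3., 8., 6.])
x5 = x2 + 2 * x3

# Fit x5 = b0 + b1*x2 + b2*x3 by least squares, keeping the constant term
# so the data itself can show us that it is zero.
A = np.column_stack([np.ones_like(x2), x2, x3])
b, res, rank, sv = np.linalg.lstsq(A, x5, rcond=None)
print(b)   # essentially [0, 1, 2]: no constant term, and x5 = x2 + 2*x3
```

Because the dependence is exact, the residual is zero to round-off – the analogue of RSquared = 1.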

As for **computation**, let’s try inverting X’X:

The fifth singular value is zero according to the computer.

**Summary of Example 2:**

- Detection: we have linear dependence, because the matrix is of rank 4 instead of 5.
- Isolation: x2, x3, and x5 are linearly dependent.
- Identification: x5 = x2 + 2 x3
- Computation: the computer cannot invert X’X, so we really do have fewer than 5 nonzero singular values.

## Example 3

Here’s a matrix:

Here are its singular values:

SingularValueList[X]//N = {36368.4, 264.865, 70.4962}

**Detection**: There are only 3 nonzero singular values, so we have linearly dependent columns of X, because the matrix is of rank 3 instead of rank 5. And because the rank is only 3, there are only 3 linearly independent columns… so there are two linear dependencies.

Let’s look at subsets of 4 rows of the transpose matrix.

We see 3 singular values in every case: every subset of 4 rows of XT is linearly dependent. Of course. The matrix is of rank 3: there are only 3 linearly independent columns in X, equivalently, 3 linearly independent rows in XT.

So look at subsets of 3 rows of XT:

What subsets are of full rank?

s[[{2,3,4,5,7,8}]] = {{1,2,4},{1,2,5},{1,3,4},{1,3,5},{2,3,4},{2,3,5}}

That’s too much to sort out. (Okay, looking at the linearly dependent subsets instead, we can see that {1,2,3} is linearly dependent, and anything containing both 4 and 5 is linearly dependent. But let’s get more comfortable.)

Let’s look at pairs of rows of XT:

Exactly one pair is of rank 1, the last one (index -1):

s[[-1]] = {4,5}

so rows 4 and 5 of XT (i.e. columns 4 and 5 of X) are linearly dependent. But we’re not done: if that were the only linear dependence, the matrix would be of rank 4, not 3.

Remove one of those: x5 is convenient. (I just ask for subsets of the first four rows instead of subsets of five rows.) Let’s take a look at what happens now.

Now the only linearly dependent subset of 3 is the first:

s[[1]] = {1,2,3}

**Isolation**: x4 and x5 are linearly dependent; and x1, x2, and x3 are linearly dependent.

As for **identification**, x5 is still twice x4:

XT[[5]]/XT[[4]] = {2,2,2,2,2,2}

(Two linearly dependent vectors? One is a multiple of the other.)

and we can fit x3 as a function of x1 and x2:

So X3 = X1 + 2 X2.

As for **computation**, we can ask the computer to invert X’X – which will confirm, by failing, that the smallest singular value is numerically equivalent to zero.

**Summary of Example 3:**

- Detection: yes, the matrix X is of rank 3, not of full rank, and so we have only 3 independent columns.
- Isolation: x4 and x5 are linearly dependent, and x1, x2, x3 are linearly dependent.
- Identification: x5 = 2 * x4, and x3 = x1 + 2 x2.
- Computation: X’X is not invertible, so there really are fewer than 5 nonzero singular values.

(That X’X cannot be inverted only really confirms that X is of rank less than 5; it doesn’t really confirm that two singular values are zero. But we could drop x5 and proceed with the reduced matrix.)

## Example 4

Here’s the matrix:

**Detection**: Here are its singular values:

SingularValueList[X]//N = {803.167, 77.2844, 9.03734}

3 of them. We have two linear dependencies again. (Rank = 3, but full rank would be 5.)

**Isolation**. There is no reason, other than comfort, to look at subsets of 4 rows of XT – but comfort is good, so here are the singular values:

No more than 3 singular values in any of them. Of course: X is of rank 3, so it has only 3 linearly independent columns, and no subset of its columns can be of rank more than 3.

The third case is even more linearly dependent; with only two nonzero singular values, this subset has only two linearly independent rows of XT:

s[[3]] = {1,2,4,5}

… but that doesn’t tell me much. Yes, that particular set is linearly dependent… but does the dependence involve all four variables? (We will understand this case when we’re finished.)

Ho hum. Subsets of 3 rows of XT?

Let’s drop down to pairs of columns:

As in the last example, we see that exactly one pair, the last, is linearly dependent:

s[[-1]] = {4,5}

… so remove one of those: dropping 5 is convenient again, because I get to look at subsets of rows 1 thru 4. Let’s look at subsets containing 3 columns:

Now the only linearly dependent subset of 3 is the 2nd:

s[[2]] = {1,2,4}

Now, we already know that the only linear dependence between pairs of columns is 4 & 5, so no subset of {1,2,4} is linearly dependent.

OK, let’s fit x4 as a function of x1 and x2 (**identification**):

We see that x4 = x1 + 3 x2, there is no constant term, and the fit is perfect (RSquared = 1).

And, if you feel it’s necessary, confirm that X’X really is not invertible, i.e. the fourth and fifth singular values are not very small nonzero numbers.

**Summary of Example 4:**

- Detection: the 6×5 matrix X is of rank 3, so we have two linear dependencies.
- Isolation: x4 & x5 are linearly dependent; and x1, x2, x4 are linearly dependent.
- Identification: x5 = 2 x4 and x4 = x1 + 3 x2
- Computation: X’X is not invertible, so there really are fewer than 5 nonzero singular values.

(As I said in the previous example, that X’X cannot be inverted only really confirms that X is of rank less than 5; it doesn’t really confirm that two singular values are zero. But we could drop x5 and proceed with the reduced matrix.)

And back when we saw that the subset of rows {1,2,4,5} was of rank 2? Well, x4 is determined by x1 and x2; and x5 is determined by x4. Rank 2 indeed: only two of those four vectors are linearly independent; x1 and x2 are one possible choice.

In addition, this example demonstrated that the two linear dependencies could be interconnected, but that doesn’t pose a problem.

## Example 5

Here’s a matrix:

**Detection**: Here are its singular values.

SingularValueList[XT]//N = {16264.4, 219.143, 75.6439, 11.8313, 2.93115}

There are 5 of them. **This matrix is of full rank**; its columns are not linearly dependent. Everything is fine. (And you were beginning to think I couldn’t construct a matrix of full rank!)

But.

If I were to run a regression using these 5 variables as the independent variables, we would have a problem.

What would change?

We would add a column of 1s to those variables.

Instead of studying X itself, **if we’re going to use it for regression, we need to study the design matrix**. I’m going to do something a little devious: put the column of 1s (“the constant term”) on the right.

That way, column 3 of the new matrix is still x3. If I put the column of 1s on the left, then column 3 will be x2. A nuisance. The location of the column of 1s is computationally irrelevant – but I would hate having to remember that column 3 refers to x2 instead of to x3.

Here’s the new matrix:

**Detection**. Here are its singular values.

SingularValueList[M]//N = {16264.4, 219.143, 75.6439, 11.8436, 3.78017}

Now M is 6×6. But it only has 5 nonzero singular values. M is not of full rank. Houston, we have a problem.

From here, the analysis goes just like all the previous ones. The only difference was that X is of full rank, but M is not… and all we did was introduce a column of 1s.
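Here is a sketch of that phenomenon in NumPy, on hypothetical data: five columns built so that x5 = x2 + 2 x3 − 3. Because the dependence involves a constant, X alone is of full rank; append the column of 1s and the rank deficiency appears. (All the data and names here are made up for illustration.)

```python
import numpy as np

# Hypothetical 6x5 data: x5 = x2 + 2*x3 - 3, a dependence involving a constant.
x1 = np.array([3., 1., 4., 1., 5., 9.])
x2 = np.array([1., 4., 2., 7., 3., 5.])
x3 = np.array([2., 1., 5., 3., 8., 6.])
x4 = np.array([2., 7., 1., 8., 2., 8.])
x5 = x2 + 2 * x3 - 3

X = np.column_stack([x1, x2, x3, x4, x5])
print(np.linalg.matrix_rank(X))   # 5 -- X by itself is of full rank

# Append the column of 1s on the right, as in the post, to form the design matrix.
M = np.column_stack([X, np.ones(6)])
print(np.linalg.matrix_rank(M))   # still 5 -- the 6x6 design matrix is singular
```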

**Isolation**. Let me transpose it and look at subsets of 5 rows of M’ = MT.

Okay… a couple of subsets are linearly dependent:

t2[[3,2]] = {1,2,3,5,6}

t2[[6,2]] = {2,3,4,5,6}

Intersection[%,%%] = {2,3,5,6}

That intersection strongly suggests that x2, x3, x5, and x6 (the constant) are linearly dependent.

Let’s go ahead and look at subsets of 4 rows of M’:

That confirms explicitly that {2,3,5,6} is the linearly dependent set of rows of M’; it’s the only 4-row subset of MT with only 3 singular values.

Let’s go ahead and look at subsets of 3 rows of M’:

Every subset of 3 rows of M’ is of rank 3, i.e. of full rank. The only linear dependence is {2,3,5,6}.

**Identification**. Confirm it, by fitting x5 as a function of x2 and x3:

Sure enough: X5 = X2 + 2 X3 – 3. (This time, the constant term wasn’t zero.)

Note that I did not include x6 in the data matrix d1 – the whole point is that the design matrix includes a column of 1s:

**Computation**. Yes, the smallest singular value of M really is zero to within the machine tolerance:

**Summary of Example 5:**

- Detection: the matrix X is of full rank, but the design matrix constructed from it is not.
- Isolation: the column of 1s, and x2, x3, x5 are linearly dependent.
- Identification: X5 = X2 + 2 X3 – 3.
- Computation: the design matrix M leads to M’M, which is not invertible.

**Key**: if we only look at X, we will not catch a linear relationship involving a constant term. It is crucial, for example, that the earliest example was

x5 = 2 x4,

not something like x5 = 2 x4 + 10,

with a constant.

**Key**: if the observations have constant row sums (for example, they are percentages which add to 100), we may find exactly this problem: the matrix X of data may be of full rank, but the design matrix will not be, because we would have

x1 + x2 + x3 + x4 – 100 = 0.

**Key**. There’s another way to get linear dependence in regression analysis. Suppose our data was quarterly, and suppose we wanted to see if there was some effect of the individual quarters. We could define four variables Q1, Q2, Q3, and Q4, i.e Qi, as being 1 in the ith quarter and zero otherwise.

That is, if an observation were second quarter, we would set Q1 = Q3 = Q4 = 0, and Q2 = 1.

These are called dummy variables if we’re speaking informally, indicator variables otherwise (when we don’t want to seem too down-to-earth, like we get our hands dirty).

The crucial fact is that Q1 + Q2 + Q3 + Q4 = 1,

i.e. there is a linear dependence among the 4 Qs and the constant.

So omit one of the Qs. You have to. And your texts and your teachers will tell you up front, omit one of the Qs.

So you shouldn’t actually see this in practice, because you shouldn’t see all of the Qs. But sometimes accidents happen.
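A quick sketch of that accident in NumPy, on hypothetical quarterly data (the names are mine): build all four dummies, keep the constant column, and the rank deficiency is immediate.

```python
import numpy as np

# Eight hypothetical quarterly observations: the quarter of each one.
quarters = np.array([1, 2, 3, 4, 1, 2, 3, 4])

# Build all four dummy columns Q1..Q4 -- the mistake the post warns about.
Q = np.column_stack([(quarters == q).astype(float) for q in (1, 2, 3, 4)])

# Q1 + Q2 + Q3 + Q4 = 1 in every row...
print(Q.sum(axis=1))          # all ones

# ...so the design matrix [Q | 1] has only 4 independent columns, not 5.
M = np.column_stack([Q, np.ones(8)])
print(np.linalg.matrix_rank(M))   # 4

# Dropping any one dummy restores full rank.
print(np.linalg.matrix_rank(np.column_stack([Q[:, :3], np.ones(8)])))  # 4
```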

## Conclusion

Okay? We can detect, isolate, and identify linear dependence; and we can, if necessary, convince people that, for example, 10^-12 is not nonzero in our calculations.

I remind you that if you cannot compute singular values of X (or X’), then compute eigenvalues of X’X instead – you must have access to something that will compute eigenvalues. (It’s right up there with electricity and indoor plumbing.) Those eigenvalues are the squares of the singular values, and zero squared will still be zero, so zero singular values correspond to zero eigenvalues.

For handy reference, here are the examples:

- x5 = 2 x4
- x5 = x2 + 2 x3
- x3 = x1 + 2 x2.
- x5 = 2 x4, x4 = x1 + 3 x2
- x5 = x2 + 2 x3 – 3

Next up, multicollinearity, i.e. approximate linear dependence.

February 7, 2011 at 6:24 pm

I have considered completely rewriting this post, but I’ve decided it should remain. There’s nothing wrong in it (that I know of) but: the isolation and identification of linear dependence can be done way, way more easily than by looking at subsets. I can’t even say I’ve found a quick-and-dirty solution – oh, it’s quick, all right! But there’s nothing dirty about it.

Had I decided to rewrite the post, I would have retained the introductory material about linear dependence and the SVD; I would have replaced only the isolation and identification portions. But some of you may find looking at subsets to be more intuitive than the quick-and-easy solution, and that’s one reason for leaving this post essentially untouched.

So: there is an additional post which shows how appallingly easy it is to isolate and identify linear dependence in a matrix. It provides new methods of solution for the five examples in this post. We will get the same answers, except that in one case looking at subsets gave an equivalent answer that might be more appealing.

Oh, let me tell you what I remembered: if a matrix X is of less than full rank, then the appropriate number of rightmost columns of v – from the Singular Value Decomposition X = u w v’ – are a basis for the nullspace of X. And that hands us the linear dependence on a platter.
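That remark is easy to sketch in NumPy, once more on a hypothetical stand-in matrix with one exact dependence, column 5 = 2 × column 4 (data and names mine; NumPy returns v′ directly as Vt, so the rows of Vt are the columns of v):

```python
import numpy as np

# Hypothetical 6x5 stand-in: column 5 = 2 * column 4, so rank 4, nullity 1.
X = np.array([
    [1, 2, 3, 1, 2],
    [4, 5, 6, 2, 4],
    [7, 8, 10, 3, 6],
    [2, 0, 1, 5, 10],
    [3, 1, 4, 7, 14],
    [5, 9, 2, 11, 22],
], dtype=float)

# Full SVD: X = u w v'.  One zero singular value means the LAST row of Vt
# (i.e. the rightmost column of v) spans the nullspace of X.
U, w, Vt = np.linalg.svd(X)
null_vec = Vt[-1]

# Scale so the last coefficient is 1: the dependence is handed to us directly.
c = null_vec / null_vec[-1]
print(c)   # approximately [0, 0, 0, -2, 1], i.e. -2*col4 + col5 = 0
```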
