## Introduction

A long time ago, in a post about PCA (principal component analysis), I said that I did not know what Andrews curves were. (The suggestion was made that Andrews curves might help us decide how many principal components to keep. I did not understand how they were to be computed.)

Now I know. Let me show you. I will compute Andrews curves for what is called “the iris data”… for both the full data set (150 observations) and a reduced data set (30 observations). I will also show you a possible variant.
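
For readers who want the computation up front, here is a minimal sketch in Python with NumPy (my choice for illustration; it is not the tool used in this post). It uses the standard definition from Andrews' 1972 paper: an observation $(x_1, x_2, x_3, \dots)$ is mapped to the function $f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \dots$ for $-\pi < t < \pi$. The function name `andrews_curve` and the sample observation are hypothetical, and the variant mentioned above is not covered.

```python
import numpy as np

def andrews_curve(x, t):
    """Evaluate Andrews' function for one observation x at the points t."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    # First variable contributes the constant term x1 / sqrt(2).
    curve = np.full_like(t, x[0] / np.sqrt(2.0))
    for i in range(1, len(x)):
        k = (i + 1) // 2                 # harmonic number: 1, 1, 2, 2, 3, ...
        if i % 2 == 1:
            curve += x[i] * np.sin(k * t)
        else:
            curve += x[i] * np.cos(k * t)
    return curve

# Made-up observation with four measurements (illustrative values only).
obs = [5.0, 3.0, 1.5, 0.2]
t = np.linspace(-np.pi, np.pi, 201)
curve = andrews_curve(obs, t)
```

Each observation gives one curve; plotting all of them on the same axes and looking for bundles of similar curves is the usual way to read the picture.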

In addition, we know that there are in fact three kinds of irises in the data – so we can assess how well the Andrews curves do. In practice, of course, we would be trying to figure out how many kinds of observations we have.

The data is here. The paper which explained Andrews curves to me is here. Andrews’ original paper is: Andrews D. “Plots of high-dimensional data.” Biometrics 1972; 28:125-136… but I haven’t found it freely available anywhere online.

In addition, there is a short and sweet webpage by one of the authors of the paper I read.

## Introduction

The Hald data turns out to have been an excellent choice for investigating multicollinearity: it has at least four “near linear dependencies”. I’m about to show the details of three of them. (And in a subsequent post, I think I can eliminate all but three of them – but not the same three!)

We have already seen one of them: we know that the four independent variables have a nearly constant sum just under 100. (These four variables are, in fact, a subset of a larger set of variables – whose sum was 100%.)

We have seen two approaches to finding (exact) linear dependence of a set of variables:

We used the singular value decomposition of the design matrix for all four variables (that is, our subset was the entire set) to discover that all four variables (with a constant, too) were multi-collinear. But I did not continue looking at all the subsets.

I think I will return to that approach, looking at all subsets… but not today. Instead, I want to look at an orthonormal basis for the closest thing we have to a null space. And I want to do it for four regressions which I decided were worth investigating.
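
To make "the closest thing we have to a null space" concrete, here is a rough sketch in Python with NumPy (not the tool used in these posts). The data is synthetic, not the Hald data: I assume four variables whose sum is nearly a constant, and the value 98 and the noise level are invented. The right singular vector belonging to the smallest singular value is the unit vector v that makes X v as small as possible.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 13

# Synthetic stand-in for a design matrix (NOT the Hald data):
# four variables whose sum is nearly constant (assumed value: 98).
raw = rng.uniform(5, 30, size=(n, 3))
x4 = 98.0 - raw.sum(axis=1) + rng.normal(0, 0.5, size=n)
X = np.column_stack([np.ones(n), raw, x4])   # constant column first

# NumPy returns the singular values as a vector w (descending)
# and the matrix v already transposed (rows of vt are the vectors).
u, w, vt = np.linalg.svd(X, full_matrices=False)

smallest = vt[-1]                 # unit vector for the smallest singular value
residual = X @ smallest           # its length equals that singular value

print("singular values:", w)
print("near-null vector:", smallest)
print("length of X v:", np.linalg.norm(residual))
```

The entries of `smallest` are the coefficients of the near linear dependency among the columns; a tiny singular value relative to the largest one signals that the dependency is nearly exact.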

I’m going to use one additional tool, the variance inflation factors – indirectly. You will see that I view them as one possible explanation for multicollinearity detected by the SVD. (And yes, that should be surprising.)

## Regression 1 – Isolating & Identifying Linear Dependence

I almost named this post “Nailing Linear Dependence”. It’s so easy….

## Introduction

This post will show just how easy it is to isolate and identify linear dependence. Looking at subsets, as we did in this previous post, works – but what I’m about to show you is easier and faster. (Henceforth I will refer to that link as “the previous post” or once as “that post”.)

On the other hand, we will see a case where looking at subsets gives an equivalent answer that might be more appealing.

Now, I’m going to solve the same five examples as in the previous post. I am not going to duplicate the introductory material in the previous post, so if you need more context, please read that post. You might or might not want to spend much time looking at the details of examining subsets of the columns of X – that examination is what I’m about to replace by something more incisive.

As usual, I will use either $X^T$ or X’ to denote the transpose of X; and v’ or $v^T$ to denote the transpose of v.

## Regression 1: Multicollinearity in the Hald data – 1

Edited 2011 Jan 25: one “edit” and two “aside” comments, all pertaining to vector and matrix norms.

## Introduction and Review

Let me say up front that this post closes by explaining and using the “Variance Inflation Factors” from Mathematica’s Linear Model Fit. If that’s what you’re looking for, do a find on “VIF”. (I didn’t know how they were computed when I looked at some of Mathematica’s properties of a linear regression. Now I do.)
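
As a preview, here is a hedged sketch of the textbook definition, in Python with NumPy rather than Mathematica. The VIF for variable j is $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing variable j on the other variables plus a constant. I have not verified that Mathematica computes its property in exactly this way; the function name `vif` and the synthetic data are my own.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    X holds the variables only (no constant column). Each column is
    regressed on the remaining columns plus a constant.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic example: c is nearly a + b, so all three variables are entangled.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = rng.normal(size=50)
c = a + b + 0.1 * rng.normal(size=50)
X_demo = np.column_stack([a, b, c])
factors = vif(X_demo)
print(factors)
```

The same numbers appear on the diagonal of the inverse of the correlation matrix of the variables, which is a convenient cross-check.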

Before we begin investigating multicollinearity of the Hald data, let’s review what we have learned about the data.

I have shown you three or four ways of selecting the best regression from a specified set of variables – subject to the very significant caveat that we haven’t constructed new variables from them. That is, we have not transformed the variables, or taken powers and products of the variables, or taken lagged variables, and other such things. Had I done any of those things, I would have included the newly constructed variables in the specified set.

One of the ways was simply to run all possible regressions. When we did that for the Hald data, we found that our selection criteria were divided between two of the regressions:

## Example: Is it a transition matrix? Part 2

We had three matrices from Jolliffe, P, V, and Q. They were allegedly a set of principal components P, a varimax rotation V of P, and a quartimin “oblique rotation” Q.

I’ll remind you that when they say “oblique rotation” they mean a general change-of-basis. A rotation preserves an orthonormal basis; a rotation cannot transform an orthonormal basis to a non-orthonormal basis, and that’s what they mean — a transformation from an orthonormal basis to a non-orthonormal basis, or possibly a transformation from a merely orthogonal basis to a non-orthogonal one. In either case, the transformation cannot be a rotation.

(It isn’t that complicated! If you change the lengths of basis vectors, it isn’t a rotation; if you change the angles between the basis vectors, it isn’t a rotation.)
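
That test is easy to mechanize. Here is a small sketch in Python with NumPy (my choice for illustration): a transformation T preserves lengths and angles exactly when $T^T T = I$. The helper name and the three example matrices are made up. Note that $T^T T = I$ also admits reflections; a rotation proper has determinant +1 as well.

```python
import numpy as np

def preserves_lengths_and_angles(T, tol=1e-10):
    """True when T^T T is the identity, i.e. T is an orthogonal matrix."""
    T = np.asarray(T, dtype=float)
    if T.ndim != 2 or T.shape[0] != T.shape[1]:
        return False
    return bool(np.allclose(T.T @ T, np.eye(T.shape[0]), atol=tol))

theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
stretch = np.array([[2.0, 0.0],      # changes a length
                    [0.0, 1.0]])
shear = np.array([[1.0, 0.5],        # changes an angle
                  [0.0, 1.0]])

for name, T in [("rotation", rotation), ("stretch", stretch), ("shear", shear)]:
    print(name, preserves_lengths_and_angles(T), np.linalg.det(T))
```

An "oblique rotation" in the sense above is any invertible T that fails this check.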

Anyway, we showed in Part 1 that V and Q spanned the same 4D subspace of $R^{10}$.

Now, what about V and P? Let me recall them:

## Cohen’s example again

Let me now show you how I would do Cohen’s example. (His computations, pretty much, were the previous post.) I cannot over-emphasize that he deserves a lot of credit for getting the mathematics right, even if he didn’t name it correctly or do it beautifully.

(From the SVD posts, you recall that given $X = u \ w \ v^T$ we conclude that v is an eigenvector matrix for $X^T\ X$, and u is an eigenvector matrix for $XX^T$. And, just as important, the nonzero values of w are the square roots of the nonzero eigenvalues.)
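
These relations are easy to check numerically. The following is a quick sketch in Python with NumPy on a random matrix (not Cohen's data); the shape and the seed are arbitrary. Note that NumPy returns w as a vector of singular values and returns $v^T$ rather than v.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))          # arbitrary test matrix

# Reduced SVD: X = u @ diag(w) @ vt, with w in descending order.
u, w, vt = np.linalg.svd(X, full_matrices=False)
v = vt.T

# Columns of v are eigenvectors of X^T X with eigenvalues w^2.
lhs_v = X.T @ X @ v
rhs_v = v * w**2                     # scales column j of v by w[j]^2

# Columns of u are eigenvectors of X X^T with the same eigenvalues.
lhs_u = X @ X.T @ u
rhs_u = u * w**2

# Eigenvalues of X^T X, sorted descending to match w.
eigvals = np.linalg.eigvalsh(X.T @ X)[::-1]

print(np.allclose(lhs_v, rhs_v), np.allclose(lhs_u, rhs_u))
print(np.allclose(np.sqrt(eigvals), w))
```

With the reduced SVD, u has only three columns here; the remaining eigenvectors of $XX^T$ belong to the eigenvalue zero and do not appear.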