Andrews Curves

Introduction

A long time ago, in a post about PCA (principal component analysis), I said that I did not know what Andrews curves were. (The suggestion was made that Andrews curves might help us decide how many principal components to keep. I did not understand how they were to be computed.)

Now I know. Let me show you. I will compute Andrews curves for what is called “the iris data”… for both the full data set (150 observations) and a reduced data set (30 observations). I will also show you a possible variant.

In addition, we know that there are in fact three kinds of irises in the data – so we can assess how well the Andrews curves did. In practice, of course, we would be trying to figure out how many kinds of observations we have.

The data is here. The paper which explained Andrews curves to me is here. Andrews’ original paper is: Andrews, D., “Plots of High-Dimensional Data,” Biometrics 1972, 28:125–136… but I haven’t found it freely available anywhere online.

In addition, there is a short and sweet webpage by one of the authors of the paper I read.
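For reference, Andrews’ construction maps each observation x = (x_1, x_2, x_3, x_4, \dots) to the function f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + \dots on -\pi \le t \le \pi, and plots one curve per observation; observations that lie close together in the data produce curves that stay close together. Here is a minimal sketch in Python/NumPy rather than Mathematica; it only illustrates the standard formula, not necessarily the exact plots in this post, and the coloring by species is possible only because we already know the labels.

```python
# A sketch of Andrews curves for the four iris measurements; the standard
# construction, not necessarily the exact computation used in this post.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, species = iris.data, iris.target            # 150 x 4 data, labels 0/1/2

def andrews(x, t):
    # f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t)
    return x[0]/np.sqrt(2) + x[1]*np.sin(t) + x[2]*np.cos(t) + x[3]*np.sin(2*t)

t = np.linspace(-np.pi, np.pi, 200)
for x, s in zip(X, species):
    plt.plot(t, andrews(x, t), color=f"C{s}", alpha=0.4)   # one curve per observation
plt.xlabel("t")
plt.ylabel("f_x(t)")
plt.show()
```

(pandas ships a canned version of this plot, pandas.plotting.andrews_curves, if you just want the picture.)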

Regression 1: Multicollinearity in the Hald data – 2

Introduction

The Hald data turns out to have been an excellent choice for investigating multicollinearity: it has at least four “near linear dependencies”. I’m about to show the details of three of them. (And in a subsequent post, I think I can eliminate all but three of them – but not the same three!)

We have already seen one of them: we know that the four independent variables have a nearly constant sum just under 100. (These four variables are, in fact, a subset of a larger set of variables – whose sum was 100%.)

We have seen two approaches to finding (exact) linear dependence of a set of variables:

We used the singular value decomposition of the design matrix for all four variables (that is, our subset was the entire set) to discover that all four variables (with a constant, too) were multicollinear. But I did not continue looking at all the subsets.

I think I will return to that approach, looking at all subsets… but not today. Instead, I want to look at an orthonormal basis for the closest thing we have to a null space. And I want to do it for four regressions which I decided were worth investigating.

I’m going to use one additional tool, the variance inflation factors – indirectly. You will see that I view them as one possible explanation for multicollinearity detected by the SVD. (And yes, that should be surprising.)
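To give a concrete preview of the SVD diagnostic, here is a minimal sketch in Python/NumPy rather than Mathematica, using the Hald regressors as they are usually tabulated (e.g., in Draper & Smith). It shows only the flavor of the idea: a very small singular value signals a near linear dependence, and its right singular vector says which columns are involved. The computation below does no column scaling, so details may differ from what follows in this post.

```python
# SVD diagnostic for near linear dependence, sketched in NumPy.
import numpy as np

x1 = [ 7,  1, 11, 11,  7, 11,  3,  1,  2, 21,  1, 11, 10]
x2 = [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68]
x3 = [ 6, 15,  8,  8,  6,  9, 17, 22, 18,  4, 23,  9,  8]
x4 = [60, 52, 20, 47, 33, 22,  6, 44, 22, 26, 34, 12, 12]

A = np.column_stack([np.ones(13), x1, x2, x3, x4])   # design matrix, constant included
u, s, vt = np.linalg.svd(A, full_matrices=False)

print(s)        # one singular value is far smaller than the rest
print(vt[-1])   # its right singular vector: (nearly) equal small weights on x1..x4
                # against a large weight on the constant column, i.e. the
                # "x1 + x2 + x3 + x4 is nearly constant" dependence
```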

Regression 1 – Isolating & Identifying Linear Dependence

I almost named this post “Nailing Linear Dependence”. It’s so easy….

Introduction

This post will show just how easy it is to isolate and identify linear dependence. Looking at subsets, as we did in this previous post, works – but what I’m about to show you is easier and faster. (Henceforth I will refer to that link as “the previous post” or once as “that post”.)

On the other hand, we will see a case where looking at subsets gives an equivalent answer that might be more appealing.

Now, I’m going to solve the same five examples as in the previous post. I am not going to duplicate the introductory material in the previous post, so if you need more context, please read that post. You might or might not want to spend much time looking at the details of examining subsets of the columns of X – that examination is what I’m about to replace by something more incisive.

As usual, I will use either XT or X’ to denote the transpose of X; and v’ or vt to denote the transpose of v.
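As an aside before the examples: one standard way to isolate an exact linear dependence (not necessarily the approach developed below) is to read a null-space basis off the SVD of X; a right singular vector belonging to a zero singular value gives the coefficients of a dependence among the columns. A toy illustration in Python/NumPy:

```python
# Toy illustration: the third column is exactly the sum of the first two,
# and the right singular vector for the zero singular value exposes it.
import numpy as np

X = np.array([[1., 2.,  3.],
              [4., 5.,  9.],
              [7., 8., 15.],
              [2., 0.,  2.]])

u, s, vt = np.linalg.svd(X)
print(s)        # the last singular value is (numerically) zero
print(vt[-1])   # up to sign, proportional to (1, 1, -1): col1 + col2 - col3 = 0
```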

Example 1


Regression 1: Multicollinearity in the Hald data – 1

Edited 2011 Jan 25: one “edit” and two “aside” comments, all pertaining to vector and matrix norms.

Introduction and Review

Let me say up front that this post closes by explaining and using the “Variance Inflation Factors” from Mathematica’s Linear Model Fit. If that’s what you’re looking for, do a find on “VIF”. (I didn’t know how they were computed when I looked at some of Mathematica’s properties of a linear regression. Now I do.)
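For the impatient, the defining formula is short: VIF_j = 1/(1 - R_j^2), where R_j^2 is the R^2 from regressing the j-th independent variable on all the other independent variables (with a constant). Here is a minimal sketch of that formula in Python/NumPy rather than Mathematica; it is just the definition, not necessarily how Mathematica evaluates the property internally. (statsmodels has an equivalent helper, variance_inflation_factor, in statsmodels.stats.outliers_influence; and hald_X in the usage comment below is only a stand-in name for the 13×4 matrix of Hald regressors.)

```python
# Variance inflation factors from the defining formula VIF_j = 1 / (1 - R_j^2).
import numpy as np

def vifs(X):
    """X: (n, k) array whose columns are the regressors (no constant column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]                                                   # regress column j ...
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])    # ... on all the others
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# e.g. vifs(hald_X), where hald_X is the 13 x 4 matrix of Hald regressors;
# values well above the usual rules of thumb (5 or 10) flag columns that are
# caught up in near linear dependencies.
```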

Before we begin investigating multicollinearity of the Hald data, let’s review what we have learned about the data.

I have shown you three or four ways of selecting the best regression from a specified set of variables – subject to the very significant caveat that we haven’t constructed new variables from them. That is, we have not transformed the variables, taken powers and products of the variables, taken lagged variables, or done other such things. Had I done any of those things, I would have included the newly constructed variables in the specified set.

One of the ways was simply to run all possible regressions. When we did that for the Hald data, we found that our selection criteria were divided between two of the regressions:

Example: Is it a transition matrix? Part 2

We had three matrices from Jolliffe, P, V, and Q. They were allegedly a set of principal components P, a varimax rotation V of P, and a quartimin “oblique rotation” Q.

I’ll remind you that when they say “oblique rotation” they mean a general change of basis. A rotation preserves an orthonormal basis; it cannot carry an orthonormal basis to a non-orthonormal one. But that is exactly what they mean: a transformation taking an orthonormal basis to a non-orthonormal basis, or possibly taking a merely orthogonal basis to a non-orthogonal one. In either case, the transformation cannot be a rotation.

(It isn’t that complicated! If you change the lengths of basis vectors, it isn’t a rotation; if you change the angles between the basis vectors, it isn’t a rotation.)

Anyway, we showed in Part 1 that V and Q spanned the same 4D subspace of R^{10}\ .
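In case you want to check that sort of claim yourself, here is one quick generic test, sketched in Python/NumPy with V and Q as stand-in names for the 10×4 loading matrices; it is not necessarily the verification used in Part 1. Two matrices have the same column space exactly when stacking their columns side by side does not raise the rank.

```python
# Do the columns of V and Q span the same subspace? Stack them and compare ranks.
import numpy as np

def same_column_space(V, Q, tol=1e-8):
    rV  = np.linalg.matrix_rank(V, tol=tol)
    rQ  = np.linalg.matrix_rank(Q, tol=tol)
    rVQ = np.linalg.matrix_rank(np.hstack([V, Q]), tol=tol)
    return rV == rQ == rVQ

# same_column_space(V, Q) should come back True for the V and Q of Part 1,
# up to rounding in the published tables (hence the tolerance).
```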

Now, what about V and P? Let me recall them:

Color: re-doing Cohen’s example

Cohen’s example again

Let me now show you how I would do Cohen’s example. (His computations, pretty much, were the subject of the previous post.) I cannot over-emphasize that he deserves a lot of credit for getting the mathematics right, even if he didn’t name it correctly or do it beautifully.

I start with the A matrix:

PCA / FA Example 4: Davis. R-mode & Q-mode via the SVD.

Let’s finally do this using the SVD. I need to do one terrible thing: where Davis writes U and V, we need v and u, respectively. (Look, I couldn’t reliably keep translating Davis’ equations in my head, so I had to use his notation; by the same token, I can’t reliably translate the SVD over and over again. Thank God he used upper case.)
If the correspondence had been u ~ U and v ~ V, the translation would have been trivial. Unfortunately, the correspondence is
u ~ V
v ~ U.
(From the SVD posts, you recall that given X = u\ w\ v^T we conclude that v is an eigenvector matrix for X^T\ X, and u is an eigenvector matrix for XX^T. And, just as important, the nonzero entries of w are the square roots of the nonzero eigenvalues.)
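Here is a quick check of those relations in the standard notation X = u\ w\ v^T (not Davis’s), sketched in Python/NumPy on the little centered matrix X computed in the next excerpt. Note that NumPy returns v transposed, and that eigenvectors are only determined up to sign and ordering, so the easiest comparison is between squared singular values and eigenvalues.

```python
# Check: v holds eigenvectors of X'X, u holds eigenvectors of XX', and the
# nonzero singular values are the square roots of the nonzero eigenvalues.
import numpy as np

X = np.array([[-6.,  3.,  3.],
              [ 2.,  1., -3.],
              [ 0., -1.,  1.],
              [ 4., -3., -1.]])          # Davis's centered data (see the next excerpt)

u, w, vt = np.linalg.svd(X, full_matrices=False)

print(w**2)                                             # squared singular values ...
print(np.sort(np.linalg.eigvalsh(X.T @ X))[::-1])       # ... match eigenvalues of X'X
print(np.sort(np.linalg.eigvalsh(X @ X.T))[::-1][:3])   # ... and the largest ones of XX'
```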

PCA / FA example 4: Davis. R-mode FA, eigenvalues

From Davis’ “Statistics and Data Analysis in Geology”, we take the following extremely simple example (p. 502). His data matrix is
D = \left(\begin{array}{ccc} 4&27&18\\ 12&25&12\\ 10&23&16\\ 14&21&14\end{array}\right)
where each column is a variable. He now centers the data by subtracting each column mean from the values in the column.
Let me do that. I compute the column means…
{10,\ 24,\ 15}
and subtract each one from the appropriate column, getting
X = \left(\begin{array}{ccc} -6&3&3\\ 2&1&-3\\ 0&-1&1\\ 4&-3&-1\end{array}\right)
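For anyone following along in something other than Mathematica, the centering step is a one-liner in Python/NumPy: subtract each column mean from its column.

```python
# Reproduce the centering: subtract each column mean from its column.
import numpy as np

D = np.array([[ 4., 27., 18.],
              [12., 25., 12.],
              [10., 23., 16.],
              [14., 21., 14.]])    # Davis's data matrix, columns are variables

means = D.mean(axis=0)             # [10., 24., 15.]
X = D - means                      # matches the centered matrix above
print(X)
```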

the SVD: preface

Rather than edit the following SVD posts – unless I have to – let me put some prefatory material here. I am also going to answer a couple of email questions in comments, partly just to check out their mechanics.

The following posts in the math SVD category should be read in the order displayed, not starting from the back.

That may have been a bad idea on my part, but let’s see how it works out. The difficulty is that most of what I do out here will be available in the usual blog-order, latest first. Eventually it may be awkward if one category is in reverse order wrt everything else.

I would like to add two pieces of vocabulary. For the SVD X = u\ w\ v^T the columns of u and v are called the left singular vectors of X and the right singular vectors of X, respectively.
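In NumPy terms, for anyone experimenting outside Mathematica: np.linalg.svd returns u, the singular values, and v^T, so the left singular vectors are the columns of the returned u, and the right singular vectors are the columns of v, i.e. the rows of the returned v^T.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(4, 3)     # any matrix will do
u, w, vt = np.linalg.svd(X, full_matrices=False)

left_singular_vectors  = u        # one column per singular value
right_singular_vectors = vt.T     # columns of v, i.e. rows of vt
```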

I have also discovered that the latex interpreter works in comments. Cool! My understanding is that it does not work in forums; they would be the appropriate place to post code – and have it display as code! After all, that’s where I found the key starting points for latex.

the SVD (singular value decomposition) stuff

OK, I’ve published everything I planned to put out there for the SVD. Not to say I won’t need to add things, if anyone ever has any questions about what I meant….

Yes, I need to put out a bibliography, too. But I’ve been at this all day, and it’s time to stop. Incidentally, the SVD stuff is a category of its own – not a page, as I originally set up the first (last!) installment.

I should probably emphasize that the SVD category is posted in the order I intend it to be read.

It looks like quite a pile of stuff, and it’s scary to think that it’s all only background for principal components analysis!

One advantage to doing as much work as I did today is that I’ve managed to streamline the process: take a piece of the latex output from Mathematica®, drop it into TextWrangler, edit it massively there, then copy a preliminary version into the WordPress editor, and finally edit there until it works.

Well, I’m thrilled to have found a way to play with the SVD, after all the years I’ve meant to. I need to actually write down the list of all the other things I’ve been meaning to look at.