## introduction

A long time ago, in a post about PCA (principal component analysis), I said that I did not know what Andrews curves were. (The suggestion was made that Andrews curves might help us decide how many principal components to keep. I did not understand how they were to be computed.)

Now I know. Let me show you. I will compute Andrews curves for what is called “the iris data”… for both the full data set (150 observations) and a reduced data set (30 observations). I will also show you a possible variant.

In addition, we know that there are in fact three kinds of irises in the data – so we can assess how well the Andrews curves did. In practice, of course, we would be trying to figure out how many kinds of observations we have.

The data is here. The paper which explained Andrews curves to me is here. Andrews' original paper is: Andrews, D. F., "Plots of High-Dimensional Data", Biometrics 28 (1972), 125–136… but I haven't found it freely available anywhere online.

In addition, there is a short and sweet webpage by one of the authors of the paper I read.

## the data

Since the data is publicly available, let me show you part of it. (There are a total of 150 observations, which in fact are 3 groups of 50 observations.) Here are the first 10 observations from each of the 3 groups.

Get the iris data.

There are 4 columns in my dataset (I removed the names – then went back to the original and looked at them).

Looking at the original data, however, I found that there were only three distinct names, and there were 50 observations of each. So I don’t have to code the observations in order to look at any one group; I just need to pick observations 1-50, 51-100, or 101-150.

Here are the first 10 observations:

Here are observations 51-60:

And here are observations 101-110:

I assemble them into a reduced dataset, dsm. Let me emphasize that I know that the first 10 observations are one kind of iris, the next 10 are a second kind, and the last 10 observations are a third kind.
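(The post works in Mathematica; here is a Python sketch of the same setup, assuming scikit-learn's bundled copy of the iris data, which lists the three species in order – observations 1–50, 51–100, and 101–150.)

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
data = iris.data          # 150 x 4 array of measurements, names removed

# The three species occupy rows 0-49, 50-99, 100-149 in order,
# so the reduced set dsm is the first 10 rows of each group.
dsm = np.vstack([data[0:10], data[50:60], data[100:110]])
print(dsm.shape)          # (30, 4)
```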

## (Fourier) Andrews’ curves

Now, given a row, I dot it into a Fourier basis. Here’s a 4-element basis:

And I compute…
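(In Python, the computation can be sketched as follows. The exact 4-element basis isn't reproduced above, so I'll assume {sin t, cos t, sin 2t, cos 2t}; each observation becomes a function of t.)

```python
import numpy as np

def andrews_curve(row, t):
    """Dot one observation into the assumed Fourier basis at parameter t."""
    basis = np.array([np.sin(t), np.cos(t), np.sin(2 * t), np.cos(2 * t)])
    return np.dot(row, basis)

row = np.array([5.1, 3.5, 1.4, 0.2])    # the first iris observation
# At t = 0 only the cosine terms survive: 3.5 + 0.2 = 3.7
print(andrews_curve(row, 0.0))
```

Plotting andrews_curve over, say, t in [-π, π] for every row gives the 150 curves.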

We have 150 functions, and here’s what they look like:

Ok. We see two kinds of functions, hence presumably two kinds of observations.

I have played around with different orderings of the columns of the data, and with different bases, such as

{Sin[2 t], Cos[2 t], Sin[4 t], Cos[4 t]}.

While it might be worthwhile, I think I’ll settle for pointing out that we have a great deal of freedom in choosing the basis; there’s nothing sacred about the frequencies I chose.

BTW, I know of no reason to use a Fourier basis specifically, and I will show you an alternative.

But first, let me use the fact that the observations were labelled as three distinct kinds:

We see that our trick did not distinguish blue from yellow, although it did distinguish red from the other two. As I said at the beginning, we would usually be trying to determine how many kinds of observations we have. In this case, Andrews curves would lead us to conclude that there were two kinds of observations instead of three.

Just for the fun of it, let’s also do those two graphs for the reduced data set.

## Using Legendre polynomials

Instead of a Fourier basis, let me try something else that comes to mind.

The Legendre polynomials are an orthogonal (not orthonormal) basis on the interval [-1,1]. In case you’ve not seen them, here’s the linear one:

The quadratic:

The cubic:

And the quartic:

So I’m going to take those four polynomials as my basis…

and then create functions out of each observation:
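(A Python sketch of this variant: dot each observation into P1 through P4, written out explicitly from their standard closed forms, on the interval [-1, 1].)

```python
import numpy as np

def legendre_basis(x):
    # The first four Legendre polynomials, P1..P4
    return np.array([
        x,
        (3 * x**2 - 1) / 2,
        (5 * x**3 - 3 * x) / 2,
        (35 * x**4 - 30 * x**2 + 3) / 8,
    ])

def legendre_curve(row, x):
    return np.dot(row, legendre_basis(x))

row = np.array([5.1, 3.5, 1.4, 0.2])
# Every Legendre polynomial satisfies P_n(1) = 1, so at x = 1
# the curve value is just the row sum.
print(legendre_curve(row, 1.0))
```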

What do they look like?

Let’s see that in color (again, using the labels on the observations). We’re judging how well the technique works on known different observations.

Let me do it for the reduced data set:

## After PCA (Principal Components Analysis)

Now, let’s try that (Fourier basis) using PCA. I’m not going to say much about it beyond: I will do a singular value decomposition of the data (u w v’), and I will use v as a change-of-basis matrix to get new data.

We get the SVD…

{u,w,v}=SingularValueDecomposition[din];

And we create new data y by computing X.v, i.e.
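(In Python the same change of basis can be sketched with NumPy, which returns v already transposed. Mathematica's SingularValueDecomposition gives X = u.w.v', so y = X.v is the data rotated onto the principal axes; here `din` stands in for the 150 × 4 iris array.)

```python
import numpy as np
from sklearn.datasets import load_iris

din = load_iris().data
u, w, vt = np.linalg.svd(din, full_matrices=False)  # NumPy returns v transposed
v = vt.T
y = din @ v      # new data in the principal-component basis

# Sanity check: since v is orthogonal, X.v = u.w exactly.
print(np.allclose(y, u * w))
```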

Here’s our Fourier basis again:

Then we construct functions:

That looks like three kinds.

Interesting. Andrews curves applied to the transformed data suggest that there are three kinds of data – so we would keep three principal components. But applied to the original data, we would only keep two.

Ah, according to Jolliffe, Andrews recommended applying the technique to the transformed data… admittedly, because tests of significance were easier.

But they work better, too! At least in this case.

That distinguishes 2 and 3 (blue and yellow)! Nice.

## Let’s try the Legendre basis again.

That didn’t work so well. Two kinds.

Well, we can see – in color – that the blue and yellow curves cross each other… but I don’t see how we could have easily gotten that from the all-black curves.

## the reduced data set

Now, let’s try that using the reduced data set: a Fourier basis after transforming by principal components. Look at the singular values of the array dsm:

SingularValueList[dsm] = {43.3649,7.88191,1.52126,0.497326}

We get the SVD…

{u,w,v}=SingularValueDecomposition[dsm];

And we create new data y by computing X.v, i.e.

y=dsm.v;

Here’s our Fourier basis again:

Then we construct functions:

That looks like three kinds, as it did before.

Let’s try the Legendre basis again.

So. The Fourier basis and Legendre basis work about equally well on the raw data (full or reduced), but both distinguish only two kinds – unless we look at them in color when we know there are three kinds. On the other hand, the Fourier basis works better than the Legendre on the data transformed using principal components.

Instead of using black, maybe we should use many colors. And I think I can say that if we find any basis that shows 3 kinds… then we can say “3 kinds”. We don’t require that all the bases we try give the same answer. We’re looking for anything at all that differentiates the observations.

I would also consider using a wavelet basis.

There have been a lot of generalizations out there, I gather – but I’m happy enough to know what the heck Jolliffe was talking about way back when.
