## A Minimization problem

Let me work a minimization problem rather than a maximization. Like the previous problem, this one comes from Loomis & Turban, “Applied programming for management”, 003-078240-6 (pp. 97-99).

A customer asks a butcher to grind up several cuts of beef to form a blend of not less than 17.6% protein and 14.8% fat.

What he has available is listed in a table. That table gives us the A matrix and the c vector; the protein and fat requirements give us the b vector. Let the variables be x1, x2, x3.
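In symbols, the setup just described has the usual shape of a minimization problem. This is only a sketch of the general form: the entries of A, b, and c come from the table and the requirements, and I am assuming the requirements enter as "not less than" constraints, with any further constraints (such as a total weight) left out.

```latex
\begin{aligned}
\text{minimize}   \quad & c^{T}x = c_1 x_1 + c_2 x_2 + c_3 x_3 \\
\text{subject to} \quad & A x \ge b, \qquad x \ge 0,
\end{aligned}
\qquad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad
A \in \mathbb{R}^{2 \times 3}, \quad
b = \begin{pmatrix} b_{\text{protein}} \\ b_{\text{fat}} \end{pmatrix}.
```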

## Linear programming – Getting the final tableau given the answer

(That may sound strange. Bear with me.)

## Introduction

Every once in a while, I pick up and play with linear programming. This post will show you a couple of elementary ways to set up and solve a small linear programming problem….

But that is not the main purpose of this post.

For the record, Mathematica® has some special–purpose commands for solving large linear programming problems. I believe it handles them as data sets in a standardized format. I also believe its smallest example has more than 30 variables. So, if you need to do linear programming professionally, you should look at Mathematica’s linear programming command.

But if, like me, you pick up linear programming as a student who wants to work his way through a textbook, then you need something else.

Working through a textbook will almost certainly involve tabular displays of the initial problem, a sequence of tabular displays of intermediate non-optimal solutions, and a display of the final optimal solution. Each of these tables is typically called by the French “tableau”, plural tableaux.

You need to know how to get the final tableau, given Mathematica’s solution and the initial tableau (i.e. the initial data).
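For orientation, here is the standard textbook relationship between an optimal basis and the final tableau. It is a sketch under assumptions: the problem has been put in equality form (minimize c'x subject to Ax = b, x ≥ 0, slack or surplus variables included), and B, c_B are my names for the columns of A and the entries of c that belong to the basic variables in the reported solution.

```latex
\begin{aligned}
\text{body of the final tableau:} \quad & B^{-1} A \\
\text{right-hand side:}           \quad & B^{-1} b \\
\text{reduced-cost row:}          \quad & c^{T} - c_B^{T} B^{-1} A \\
\text{objective value:}           \quad & c_B^{T} B^{-1} b
\end{aligned}
```

So once the solution tells us which variables are basic, the final tableau follows from the initial data by one matrix inversion.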

## Regression 1 – Assumptions and the error sum of squares

There’s one thing I didn’t work out in the previous post: the relationship between the error sum of squares and the variance of the u. We have already computed the variance of the e, that is,

V(ee’).

What we want now is the expected value of the error sum of squares:

E(e’e).

(I should perhaps remind us that e is, by convention, a column vector… so its transpose e’ is a row vector… so e’e is a scalar, equal to the dot product of e with itself… while ee’ is a square matrix. Vectors can be pretty handy for this kind of stuff.)
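A minimal sketch of that distinction in plain Python, using a made-up three-element vector (the numbers are hypothetical, not residuals from any fit):

```python
def inner(e):
    """e'e: the dot product of e with itself, a single number."""
    return sum(x * x for x in e)


def outer(e):
    """ee': a square matrix whose (i, j) entry is e[i] * e[j]."""
    return [[a * b for b in e] for a in e]


e = [1.0, -2.0, 3.0]           # hypothetical column vector
sse = inner(e)                 # scalar: 1 + 4 + 9
ee_t = outer(e)                # 3-by-3 matrix

# The two are linked: the trace of ee' equals e'e.
trace = sum(ee_t[i][i] for i in range(len(e)))
print(sse, trace)
```

The link between the two (the trace of ee’ equals e’e) is a general fact, true for any vector.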

The expected value of the sum of squared errors is surprisingly complicated. Well, maybe I should just say it’s different from what we did in the last post… and that’s one reason I moved it to a post of its own.
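For reference, here is the standard result in outline. It is a sketch under the usual assumptions, which I am supplying: X is n-by-k with full column rank, and E(uu’) = σ²I. The symbol M is my name for the residual-maker matrix.

```latex
\begin{aligned}
e &= M u, \qquad M = I - X (X'X)^{-1} X', \qquad M' = M, \quad M^2 = M, \\
E(e'e) &= E(u' M u) = E\big(\operatorname{tr}(M u u')\big)
        = \operatorname{tr}\big(M \, E(u u')\big)
        = \sigma^2 \operatorname{tr}(M) = \sigma^2 (n - k).
\end{aligned}
```

The trace appears because e’e is a scalar, and a scalar equals its own trace; that is what lets us move u’ around and use E(uu’).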

## Using the QR Decomposition to orthogonalize data

This is going to be a very short post, illustrating one idea with one example (yes, one, not five).

It turns out that there is another way to have Mathematica® orthogonalize a matrix: it’s called the QR decomposition. The matrix Q will contain the orthogonalized data… and the matrix R will specify the relationship between the original data and the orthogonalized.

That means we do not have to do the laborious computations described in this post. Understand, if we do not care about the relationship between the original data and the orthogonalized data, then I see no advantage in Mathematica to using the QR over using the Orthogonalize command.
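To make the roles of Q and R concrete, here is a minimal sketch in plain Python using Gram-Schmidt. This is not Mathematica's algorithm or its output convention; the function name and the small data matrix (a constant column plus one variable) are assumptions for illustration.

```python
import math


def qr_gram_schmidt(A):
    """Return (Q, R) with A = Q R, where Q has orthonormal columns and R is
    upper triangular. A is a list of rows, assumed to have full column rank."""
    n_rows, n_cols = len(A), len(A[0])
    cols = [[A[i][j] for i in range(n_rows)] for j in range(n_cols)]
    q_cols = []
    R = [[0.0] * n_cols for _ in range(n_cols)]
    for j, v in enumerate(cols):
        w = list(v)
        # Remove the components along the columns already orthogonalized.
        for k, q in enumerate(q_cols):
            R[k][j] = sum(qi * wi for qi, wi in zip(q, w))
            w = [wi - R[k][j] * qi for wi, qi in zip(w, q)]
        # What is left is orthogonal to the earlier columns; normalize it.
        R[j][j] = math.sqrt(sum(wi * wi for wi in w))
        q_cols.append([wi / R[j][j] for wi in w])
    Q = [[q_cols[j][i] for j in range(n_cols)] for i in range(n_rows)]
    return Q, R


# Hypothetical data: a column of 1s and one variable.
A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
Q, R = qr_gram_schmidt(A)
```

Each column of A is a combination of the columns of Q with coefficients read from the matching column of R, which is exactly the relationship between the original and the orthogonalized data.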

## the relationship between the raw and the orthogonalized data

OK, so we orthogonalized the hald data, including the constant (the column of 1s).

What’s the relationship between the new variables and the old? We might someday get a new observation, and if we were using the fit to the orthogonalized data, we might want to see what it predicts for a new data point.

(In all honesty, I would use the original fit – but I still want to know what the relationship is.)

My notation is a little awkward. I’m going to stay with what is used for this post, in which I first showed how to find….

Let me start fresh. If we have two typical data matrices (i.e. taller than wide), and they are supposed to be the same data, how do we find the relationship?

## introduction

A long time ago, in a post about PCA (principal component analysis), I said that I did not know what Andrews curves were. (The suggestion was made that Andrews curves might help us decide how many principal components to keep. I did not understand how they were to be computed.)

Now I know. Let me show you. I will compute Andrews curves for what is called “the iris data”… for both the full data set (150 observations) and a reduced data set (30 observations). I will also show you a possible variant.
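As a preview, here is a minimal sketch of the usual definition of an Andrews curve: each observation x becomes the function f(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + …, plotted for t between −π and π. The function name is mine, and the observation used is the first row of the iris data as it is commonly listed.

```python
import math


def andrews_curve(x, t):
    """Evaluate f(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + ..."""
    total = x[0] / math.sqrt(2.0)
    for i, xi in enumerate(x[1:], start=1):
        k = (i + 1) // 2                     # frequency: 1, 1, 2, 2, 3, 3, ...
        trig = math.sin if i % 2 == 1 else math.cos
        total += xi * trig(k * t)
    return total


# One observation with four measurements (sepal length, sepal width,
# petal length, petal width).
obs = [5.1, 3.5, 1.4, 0.2]

# Sample the curve on a grid over [-pi, pi]; one such curve per observation.
ts = [-math.pi + 2.0 * math.pi * i / 100 for i in range(101)]
curve = [andrews_curve(obs, t) for t in ts]
```

Observations that are close together in the original space give curves that stay close together, which is why curves from the same kind of iris should bunch up.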

In addition, we will know that there are in fact three kinds of irises in the data – so we can assess how well the Andrews curves did. In practice, of course, we will be trying to figure out how many kinds of observations we have.

The data is here. The paper which explained Andrews curves to me is here. Andrews’ original paper is: Andrews D. Plots of high-dimensional data. Biometrics 1972; 28:125-136… but I haven’t found it freely available anywhere online.

In addition, there is a short and sweet webpage by one of the authors of the paper I read.

## Norms and Condition Numbers

We have had a few occasions to talk about norms of vectors. I want to take a look at them, and to also look at norms of matrices, and then at condition numbers of matrices.

## Vector Norms

Let me jump right in. Here’s a 2-dimensional vector. Taking its norm, we get the Euclidean length of the vector – it’s the length of the hypotenuse of a 3-4-5 right triangle, namely 5.
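A minimal sketch of that computation in plain Python. I am assuming the vector is (3, 4), which is what a 3-4-5 triangle suggests; the components are my assumption.

```python
import math

v = [3.0, 4.0]  # assumed components: the legs of a 3-4-5 right triangle

# Euclidean (2-) norm: square root of the sum of squared components.
euclidean = math.sqrt(sum(x * x for x in v))

# For comparison, two other common vector norms.
one_norm = sum(abs(x) for x in v)        # sum of absolute values
max_norm = max(abs(x) for x in v)        # largest absolute value

print(euclidean, one_norm, max_norm)
```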

## Compressed Sensing: the L1 norm finds sparse solutions

I’ve just finished watching three hours of lectures on something called compressed sensing. I had fully expected to put out a post showing how to compute time from position on an elliptical orbit. But I really want to show you a key piece of mathematics. This was utterly new to me.

I want to show you, via a small example, that minimizing the L1 norm often returns sparse solutions.

Hey, what?

Imagine you have a one megapixel photograph. More likely, the raw file created by your digital camera is 8 to 15 megapixels… but if you save it as JPEG, you may compress it by a factor of – what? – 20? (I’m too busy to work it out.)

The key is that there is some representation in which the photograph is sparse… the data can be represented with good accuracy by fewer numbers than we started with. We see something similar when we represent a sound recording by its Fourier coefficients, or computed tomography by a Radon transform.

Let me get to the simple example.
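As a warm-up, here is a tiny sketch of the idea in plain Python. It is a two-variable toy that I am assuming for illustration, not the example from the rest of the post: one equation, x + 2y = 2, so infinitely many solutions. A brute-force scan along the line finds the solution of smallest L1 norm and the solution of smallest L2 norm.

```python
def point_on_line(t):
    """Points satisfying x + 2*y = 2, parametrized by x = t."""
    return (t, (2.0 - t) / 2.0)


def l1_norm(p):
    return abs(p[0]) + abs(p[1])


def l2_norm_squared(p):
    # Minimizing the squared norm gives the same point as minimizing the norm.
    return p[0] ** 2 + p[1] ** 2


# Scan x from -3 to 3 in steps of 0.01.
grid = [i / 100.0 for i in range(-300, 301)]
best_l1 = min((point_on_line(t) for t in grid), key=l1_norm)
best_l2 = min((point_on_line(t) for t in grid), key=l2_norm_squared)

print("min L1:", best_l1)   # one component is exactly zero (sparse)
print("min L2:", best_l2)   # both components nonzero
```

The L1 minimizer lands on a coordinate axis, so one component is exactly zero; the L2 minimizer spreads the weight over both components.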

## Regression 1 – Isolating & Identifying Linear Dependence

I almost named this post “Nailing Linear Dependence”. It’s so easy….

## Introduction

This post will show just how easy it is to isolate and identify linear dependence. Looking at subsets, as we did in this previous post, works – but what I’m about to show you is easier and faster. (Henceforth I will refer to that link as “the previous post” or once as “that post”.)

On the other hand, we will see a case where looking at subsets gives an equivalent answer that might be more appealing.

Now, I’m going to solve the same five examples as in the previous post. I am not going to duplicate the introductory material in the previous post, so if you need more context, please read that post. You might or might not want to spend much time looking at the details of examining subsets of the columns of X – that examination is what I’m about to replace by something more incisive.

As usual, I will use either X^T or X’ to denote the transpose of X; and v’ or v^T to denote the transpose of v.

## Introduction.

In this post, I will go from XYZ coordinates to a spectrum.

I have to ask: what would somebody do with this? Especially, what would they do that couldn’t have been done using the XYZ tristimulus values directly? I don’t know – but let’s just solve the problem. I do not yet always have a satisfactory solution, and I will illustrate both satisfactory and unsatisfactory solutions.

Yes, we have done this before, not as our primary purpose but as part of another computation. It is worthwhile to tackle this specific problem, because there is one subtlety.

In general terms, the solution is simple: the XYZ tri-stimulus values are proportional to the components (with respect to the dual basis) of a (fundamental) spectrum. Find the dual basis, then get the linear combination defined by those components.
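In symbols, a sketch of that recipe looks like this. The notation is my assumption: A is the n-by-3 matrix whose columns are the color-matching functions sampled at n wavelengths, s is a spectrum, and any normalization constant is ignored (hence "proportional").

```latex
\begin{aligned}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} &= A^{T} s, \\
D &= A \,(A^{T} A)^{-1}, \qquad D^{T} A = I \quad \text{(columns of } D \text{ form the dual basis)}, \\
s^{*} &= D \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = A \,(A^{T} A)^{-1} A^{T} s .
\end{aligned}
```

The last expression is the projection of s onto the column space of A, so s* reproduces the same XYZ values as s.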