## PCA / FA Malinowski: example 6. Why Target Testing?

We have seen that Malinowski cares about “target testing”. That is, having reduced the dimension of his data, he goes looking for other vectors that lie in or near the “factor space”. Let me show you why. I ought to say that I really like what we do here.

This will be a long post, but I hope most of it is familiar; and much of it will be matrices being printed only for reference should you be following the actual computations.

Here’s the data for what he calls a hypothetical example, on p. 6, way before we can actually follow what he’s doing. We have a problem with notation, so I’m going to call the data matrix D instead of X.

$D = \left(\begin{array}{lllll} 0.005 & 0.031 & 0.063 & 0.091 & 0.046 \\ 0.04 & 0.172 & 0.356 & 0.444 & 0.218 \\ 0.103 & 0.283 & 0.484 & 0.471 & 0.208 \\ 0.116 & 0.323 & 0.562 & 0.548 & 0.241 \\ 0.125 & 0.318 & 0.516 & 0.45 & 0.185 \\ 0.104 & 0.267 & 0.43 & 0.376 & 0.154\end{array}\right)$

Each row measures a response to a specific frequency of ultraviolet light; each column is a different substance.

## get the SVD three ways

First I get the SVD, $D = u\ w\ v^T\$. (If I have the data, I’ll always get the SVD.)

For reference, in case you are working along with me, here is u:

$u = \left(\begin{array}{llllll} -0.0713266 & -0.218034 & -0.741018 & 0.140519 & -0.577793 & -0.211366 \\ -0.37098 & -0.808212 & 0.0928685 & 0.284078 & 0.334027 & -0.0909283 \\ -0.456438 & 0.000587014 & -0.354606 & -0.754766 & 0.274388 & 0.144769 \\ -0.528805 & -0.03171 & 0.488755 & -0.106788 & -0.652122 & 0.20931 \\ -0.466484 & 0.417828 & 0.0596303 & 0.160391 & 0.136195 & -0.74832 \\ -0.389498 & 0.351665 & -0.272185 & 0.541067 & 0.188362 & 0.567719\end{array}\right)$

and v:

$v = \left(\begin{array}{lllll} -0.132133 & 0.369931 & -0.143717 & -0.74594 & -0.518278 \\ -0.367417 & 0.506792 & -0.682616 & 0.254753 & 0.278034 \\ -0.631555 & 0.341731 & 0.674782 & 0.0355687 & 0.166621 \\ -0.613423 & -0.539825 & -0.18983 & 0.222622 & -0.496694 \\ -0.269067 & -0.445109 & -0.14837 & -0.572581 & 0.616131\end{array}\right)$

What we really want to look at is w:

$w = \left(\begin{array}{lllll} 1.68292 & 0. & 0. & 0. & 0. \\ 0. & 0.139932 & 0. & 0. & 0. \\ 0. & 0. & 0.00467999 & 0. & 0. \\ 0. & 0. & 0. & 0.00233644 & 0. \\ 0. & 0. & 0. & 0. & 0.000575221 \\ 0. & 0. & 0. & 0. & 0.\end{array}\right)$

Assume that the 3 smallest singular values are noise, or negligible. There are two ways to do this. One is to leave the size of w unchanged, but zero out the three smallest singular values. Let me call this w0:

$w0 = \left(\begin{array}{lllll} 1.68292 & 0 & 0 & 0 & 0 \\ 0 & 0.139932 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0\end{array}\right)$

Using w0 we could compute a “reconstituted” data matrix $D1 = u\ w0\ v^T\$.

$D1 = \left(\begin{array}{lllll} 0.00457424 & 0.0286415 & 0.0653838 & 0.0901035 & 0.0458782 \\ 0.0406571 & 0.172074 & 0.355651 & 0.44403 & 0.218326 \\ 0.101528 & 0.282273 & 0.485156 & 0.471156 & 0.206647 \\ 0.115948 & 0.324729 & 0.560528 & 0.548303 & 0.241428 \\ 0.12536 & 0.318073 & 0.515785 & 0.450008 & 0.185208 \\ 0.104816 & 0.265778 & 0.430797 & 0.375531 & 0.154468\end{array}\right)$

We have not changed the data very much. The rounded differences between D and D1 are:

$D - D1 = \left(\begin{array}{lllll} 0 & 0.002 & -0.002 & 0.001 & 0 \\ -0.001 & 0 & 0 & 0 & 0 \\ 0.001 & 0.001 & -0.001 & 0 & 0.001 \\ 0 & -0.002 & 0.001 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ -0.001 & 0.001 & -0.001 & 0 & 0\end{array}\right)$

Note that D1 was computed using the same u and v as D. This is advantageous if we’re considering more than one possible w0, i.e. if it’s not obvious to us how many singular values to zero-out, so we consider more than one possibility.

OTOH, we could use the cut-down SVD: not only zero-out the last three singular values, but also cut w down to an invertible 2×2 matrix; call it w1.

$w1 = \left(\begin{array}{ll} 1.68292 & 0. \\ 0. & 0.139932\end{array}\right)$

Once we change the size of w, we need to cut down u and v to the appropriate sizes. They each need to keep only the first 2 columns of u and v respectively. As usual, I call them u1 and v1.

For reference, then, here’s u1:

$u1 = \left(\begin{array}{ll} -0.132133 & 0.369931 \\ -0.367417 & 0.506792 \\ -0.631555 & 0.341731 \\ -0.613423 & -0.539825 \\ -0.269067 & -0.445109\end{array}\right)$

and v1:

$v1 = \left(\begin{array}{ll} -0.132133 & 0.369931 \\ -0.367417 & 0.506792 \\ -0.631555 & 0.341731 \\ -0.613423 & -0.539825 \\ -0.269067 & -0.445109\end{array}\right)$

then we could compute

$D1 = u\ w1\ v^T\$

and it is the same D1 we computed earlier using w0:

$D1 = u\ w0\ v^T\$.

All those zeroes in w0 wipe out any effects of the additional columns of u and v; alternatively, the additional columns of v have eigenvalue 0.

So, we have three SVDs, one for the original data D and two for the “reconstituted data” D1.

## get the product R C

We recall that Malinowski’s next step is to factor the reconstituted D1 as $D1 = \left(u1\ w1\right)\ v1^T$, naming the two pieces R and C: $R = u1\ w1$ and $C = v1^T\$.

$R = \left(\begin{array}{ll} -0.120037 & -0.0305099 \\ -0.624331 & -0.113095 \\ -0.768149 & 0.0000821421 \\ -0.889938 & -0.00443725 \\ -0.785055 & 0.0584676 \\ -0.655494 & 0.0492092\end{array}\right)$

$C = \left(\begin{array}{lllll} -0.132133 & -0.367417 & -0.631555 & -0.613423 & -0.269067 \\ 0.369931 & 0.506792 & 0.341731 & -0.539825 & -0.445109\end{array}\right)$

The crucial fact is that R and C and D1 are of rank 2: our 5 substances are mixtures of two other substances (at least, only 2 that reacted over that range of frequencies).

He is saying that the responses of the 5 mixtures are linear combinations of the responses of the chemical components of the mixtures. But what might the chemical components be? More importantly, how can we figure out what they are?

That is what target testing does in this example: it identifies chemical components in the mixtures.

He has 3 test vectors, which are the responses of o-xylene, m-xylene, and p-xylene (at those ultraviolet frequencies).

As an aside, he recommends target testing of basis vectors and a vector of all 1s. I will omit these tests in this case. I did perform them, but the most significant result was: I have to say I prefer making inferences based on the new components of x rather than on the difference between x and $\hat{x}\$. (See below.)

## target testing

i want the hat matrix $H = u1\ u1^T\$:

$H = \left(\begin{array}{llllll} 0.0526262 & 0.202678 & 0.0324282 & 0.0446318 & -0.057828 & -0.0488934 \\ 0.202678 & 0.790833 & 0.168855 & 0.221805 & -0.164638 & -0.139724 \\ 0.0324282 & 0.168855 & 0.208336 & 0.241348 & 0.213166 & 0.177988 \\ 0.0446318 & 0.221805 & 0.241348 & 0.280641 & 0.23343 & 0.194817 \\ -0.057828 & -0.164638 & 0.213166 & 0.23343 & 0.392187 & 0.32863 \\ -0.0488934 & -0.139724 & 0.177988 & 0.194817 & 0.32863 & 0.275377\end{array}\right)$

Now, here are the known responses of our 3 potential chemical components (o-, m-, and p-xylene).

$x1 = \left(\begin{array}{l} 0.015 \\ 0.143 \\ 0.4 \\ 0.452 \\ 0.495 \\ 0.416\end{array}\right)$

$x2 = \left(\begin{array}{l} 0.037 \\ 0.397 \\ 0.404 \\ 0.638 \\ 0.581 \\ 0.501\end{array}\right)$

$x3 = \left(\begin{array}{l} 0.112 \\ 0.512 \\ 0.416 \\ 0.489 \\ 0.344 \\ 0.275\end{array}\right)$

BTW, if we had some data missing on the xylenes, we could have used his tricks discussed in “missing data”. We might not have to run more experiments on the xylenes in order to test them against our unknown mixtures, if we’re missing a couple of rows.

Target test them: for each $x_i\$, compute $\hat{x}_i = H x_i\$. (For subsequent computations, be aware that I rounded these answers.)

$\hat{x}_1 = \left(\begin{array}{l} 0.014 \\ 0.144 \\ 0.397 \\ 0.452 \\ 0.497 \\ 0.416\end{array}\right)$

$\hat{x}_2 = \left(\begin{array}{l} 0.066 \\ 0.366 \\ 0.519 \\ 0.599 \\ 0.56 \\ 0.468\end{array}\right)$

$\hat{x}_3 = \left(\begin{array}{l} 0.112 \\ 0.511 \\ 0.417 \\ 0.49 \\ 0.337 \\ 0.281\end{array}\right)$

Look at the differences:

$x_1 - \hat{x}_1 = \{0.001,\ \ -0.001,\ \ 0.003,\ \ 0.,\ \ -0.002,\ \ 0.\}$

$x_2 - \hat{x}_2 = \{-0.029,\ \ 0.031,\ \ -0.115,\ \ 0.039,\ \ 0.021,\ \ 0.033\}$

$x_3 - \hat{x}_3 = \{0.,\ \ 0.001,\ \ -0.001,\ \ -0.001,\ \ 0.007,\ \ -0.006\}$

Let’s not get distracted by statistical tests or empirical measures: let’s take it as established that x1 and x3 are in the factor space, while x2 is not. Our 5 mixtures are composed of o-xylene and p-xylene. This is the cat’s meow.

BTW, here’s why I don’t compute t’s first, then the $\hat{x_i}\$ : I will need the t’s for x1 and x3, but not for x2. More importantly, if my collection of test vectors had been a few hundred chemical compounds rather than three, there would be few hundred t’s which I would never need to know or compute.

I want to look at this the other way. By computing the $\hat{x}_i$ directly, we were comparing old components of x and $\hat{x}_i\$. Let’s also look at the new components of the x’s (WRT the u basis). Going from old components to new, we apply the inverse transition matrix, which is the transpose because u is orthogonal:

$\text{new x1} =\{-0.868656,0.220175,-0.002472,-0.00296659,-0.000130152,0.00209526\}$

$\text{new x2} = \{-1.13786,0.0700214,0.0762967,0.109183,-0.0204724,-0.00223924\}$

$\text{new x3} = \{-0.913977,-0.213045,0.00170192,-0.00104814,0.000217159,-0.00895112\}$

We see that x1 and x3 have very nearly zero for all components after the first two. We attribute the discrepancies to experimental error (in either the components, or the mixtures, or both). For x2, its 2nd component is very small, while its 4th is not that small.

As I said, I am more comfortable assessing the fit of any test vector by finding its new components and seeing directly how far out of the factor space it lies. OTOH, if we know our average experimental errors, it’s the x and xhat we should be comparing; I blithely zeroed out the “small” singular values – but what if we have a trace amount of a third chemical? Ideally, “small” depends on knowing something about the experimental errors. If we have no such information, then I can justify my preference to look at new components.

Let’s make sure we understand the relationship between new components and old: what are the new components of the $\hat{x}_i\$? If you need to think about that, I should let you. Answer to follow, but not immediately.

Malinowski ended his presentation with the computation of the $\hat{x}_i\$ and the comparison of them to the x’s, concluding that the data represented 5 mixtures of 2 chemical components.

## getting the t vectors

Now I’m going to diverge from Malinowski. He explicitly says that he wants to study a different product from the one I will construct, because, he says on p. 64, “In effect, the model would be forced to fit the data.” He does, however, need the pieces of this product.

Let me show you how exactly how the model “fits the data.”

The burning question in my mind is: now that we believe that we have 5 mixtures of o-xylene and p-xylene, what are the chemical compositions of the mixtures? Let me show it to you first.

I could get two t vectors, each would be of length 2, and I could form a 2×2 invertible matrix T whose columns were the t’s. Then I could write

$D1 = R\ T\ T^{-1}\ C$

But $R\ T = \hat{X}\$, the matrix whose columns are $\hat{x}_1\ \text{and}\ \hat{x}_3\$. Then we have

$D1 = \hat{X}\ T^{-1}\ C\$,

and it seems appropriate to denote $T^{-1}\ C$ by $\hat{Y}\$, so we can write

$D1 = \hat{X}\ \hat{Y}\$.

This is one of the things I love about Malinowski’s work: everything comes together right there. $\hat{x} = R\ t$ is not only a least-squares fit (and $\hat{X} = R\ T$ a set of least-squares fits), but the matrix T built from the t vectors provides the desired change-of-basis.

So, we need to get the t’s for x1 and x3. Ah, I have never shown you how to do this. There are two ways, and they look frighteningly similar and different. One is

$t = w1^{-1}\ u1^T\ x$

and the other is

$t = w1^{-1}\ u1^T\ \hat{x}$

Yes, it makes no difference whether I use x or $\hat{x}\$ on the RHS.

There are any number of ways to see that. First, we can derive both equations. Second, roughly speaking, it’s because the hat matrix H is idempotent, so we have $H\ \hat{x} = H\ x \left( = \hat{x}\right)\$. Third, once we have both x and $\hat{x}\$, running a regression with $\hat{x}\$ gets the same coefficients t as running a regression with x. We wouldn’t usually do that, of course, but if we did….

Here’s a derivation of the formula with $\hat{x}\$:

$\hat{x} = R\ t = u1\ w1\ t$

where u1 is orthonormal $\left(u1^T\ u1 = I\text{, or} u1^T\ \text{ is a left inverse for u1}\right)$ and w1 is invertible. Pre-multiply by $u1^T\$:

$u1^T \hat{x} = w1\ t$

and then pre-multiply by $w1^{-1}\$:

$w1^{-1}\ u1^T\ \hat{x} = t$

QED.

Here’s a derivation of the formula with x:

The original definition of t was as the coefficients of a least squares fit:

$t = \left(R^T\ R\right)^{-1}\ R^T\ x$

but we worked all of that out once before. Here it is again:

$R = u1\ w1$

$R^T\ R = w1^T\ u1^T\ u1\ w1 = w1^T\ w1 = w1^2$

$\left(R^T\ R\right)^{-1} = w1^{-2}$

$\left(R^T\ R\right)^{-1}\ R^T = w1^{-2}\ w1^T\ u1^T = w1^{-1}\ u1^T\$,

so

$t = w1^{-1}\ u1^T\ x\$,

QED.

So I define

$G = w1^{-1}\ u1^T$

and compute

$t = G\ x\$.

here’s G:

$G = \left(\begin{array}{llllll} -0.0423826 & -0.220438 & -0.271218 & -0.314219 & -0.277187 & -0.231441 \\ -1.55814 & -5.77575 & 0.004195 & -0.22661 & 2.98594 & 2.51312\end{array}\right)$

Here are the two t’s:

$t_1 = \left(\begin{array}{l} -0.516159 \\ 1.57344\end{array}\right)$

$t_3 = \left(\begin{array}{l} -0.543089 \\ -1.52249\end{array}\right)$

I did also check the formula

$t = w1^{-1}\ u1^T\ \hat{x}$

by computing

$t = G\ H\ x\$,

because I rounded the $\hat{x}_i\$ earlier, so i didn’t now have accurate values to apply G to directly. I did, indeed, get the same answers.

## the physical product Xhat Yhat

i want those two t vectors as columns of a matrix T:

$T = \left(\begin{array}{ll} -0.516159 & -0.543089 \\ 1.57344 & -1.52249\end{array}\right)$

We compute

$\hat{X} = R\ T$

and

$\hat{Y} = T^{-1}\ C\$:

$\hat{X} = \left(\begin{array}{ll} 0.0139527 & 0.111642 \\ 0.144306 & 0.511253 \\ 0.396617 & 0.417049 \\ 0.452368 & 0.490071 \\ 0.497209 & 0.337339 \\ 0.415767 & 0.281071\end{array}\right)$

$\hat{Y} = \left(\begin{array}{lllll} 0.245113 & 0.508802 & 0.699311 & 0.390618 & 0.102366 \\ 0.0103388 & 0.192959 & 0.498258 & 0.758257 & 0.398147\end{array}\right)$

I remind us that we have both

$D1 = R\ C$

and

$D1 = \hat{X}\ \hat{Y}\$

It is that last equation of which Malinowski says, “In effect, the model would be forced to fit the data.” I’m sure of one of his recommendations: replace $\hat{X}$ by X itself, and compare the product

$\hat{D} = X\ \hat{Y}$

with the original data matrix D; i.e. study the differences

$D - X\ \hat{Y}\$.

That uses the data D and the data X, and encapsulates our model in $\hat{Y}\$. For now, I return to the true decomposition

$D1 = \hat{X}\ \hat{Y}\$

We set out to get the chemical compositions of the mixtures. I hope you realize that we are going to find compositions using the $\hat{x}_i$ instead of the $x_i\$. Here’s where the least-squares interpretation comes in handy. The reconstituted mixtures (i.e. the columns of the matrix D1 of reconstituted data) are not exactly linear combinations of the x’s, but they are exactly linear combinations of the $\hat{x}_i\$. We just need to find the coefficients for those linear combinations. If we view the two $\hat{x}_1\$ and $\hat{x}_3\$ as a (newer) basis, we just want the components of D1 wrt the newer basis.

But we already have the answers. (What? I’m not going to write an explicit transition matrix? No. Wonders never cease.)

We can interpret

$D1 = \hat{X}\ \hat{Y}\$

as saying that the five columns of D1 are five linear combinations of the two columns of $\hat{X}\$, and that the coefficients for the combinations are given by the five columns of $\hat{Y}\$. Let me round them:

$\hat{Y} = \left(\begin{array}{lllll} 0.245 & 0.509 & 0.699 & 0.391 & 0.102 \\ 0.01 & 0.193 & 0.498 & 0.758 & 0.398\end{array}\right)$

That is, for example, the first column of D1 represents a .245 and .01 mixture of o-xylene and p-xylene:

$0.245 (o-\text{xylene}) + 0.01 (p-\text{xylene})$

Those are not actually compositions: they add up to .255 instead of 1. Maybe Yhat has to be interpreted as ratios: the first mixture is about 25:1, the second is 50:20, then 70:50, 40:76, 10:40.

Perhaps the sum .255 means that 74.5% of the first mixture is non-reactive? Unfortunately “compositions” 3 and 4 both sum to more than 1, so that interpretation doesn’t work. The responses of the mixtures are linear combinations of the responses of the components, but the coefficients are not exactly compositions. For reference, here are the rounded column sums of Yhat:

$\{0.255,\ 0.702,\ 1.198,\ 1.149,\ 0.501\}$

This was the most attractive part of what I found in Malinowski. We have seen a physical use for target testing, and the use of the t coefficients to express the mixture data in terms of other chemical component data.

From the SVD

$D1 = R\ C\$,

and other data which gave us successful test vectors X, we have obtained a physically meaningful decomposition of the data,

$D1 = \hat{X}\ \hat{Y}\$.

$\hat{Y}$ isn’t literally compositions, but we infer that our 5 substances are mixtures of just two chemical compounds, and we can quantify the mixtures.

This seems fine to me. Malinowski said on p. 24: “… we try to model the data in a fundamental way, seeking a solution of the form $X_{basic}\ Y_{basic} = D\$…. [This equation] expresses the high point of [target factor analysis]. Solutions of this type are the crown jewels of factor analysis.”

Malinowski says we should look at two other things: in one case it’s pretty clear that he recommends studying the difference between the original data D and the product $X\ \hat{Y}$ instead of the product $D1 = \hat{X}\ \hat{Y}\$. In the other case, I believe he has a typo on the page where he describes that. It’s important enough that I will include it in the next post, which I think will be the final one – a summary – about Malinowski.

Oh, I owe you an answer. What are the new components of the $\hat{x}_i\$? Well, the $\hat{x}_i\$ lie exactly in the 2D factor space, so only their first two components are nonzero. That’s why we care if the new components – after the first two – of the x vectors are small, i.e. close to the zero new components of the $\hat{x}_i\$.

In case you want them:

$\text{new}\ \hat{x}_1 = \{-0.868656,\ 0.220175,\ 0,\ 0,\ 0,\ 0\}$

$\text{new}\ \hat{x}_2 =\{-1.13786,\ 0.0700214,\ 0,\ 0,\ 0,\ 0\}$

$\text{new}\ \hat{x}_3 = \{-0.913977,\ -0.213045,\ 0,\ 0,\ 0,\ 0\}$

(That only the first two new components of $\hat{x}_2$ are nonzero tells us nothing about the new components of $x_2\$.)