## Introduction

The previous post discussed something interesting that Basilevsky did. (Search the bibliography, he’s there.) I can’t say I like it — because it leads to a basis which is non-orthogonal, and whose vectors are not eigenvectors of either the correlation or covariance matrices.

But I had to understand it.

I don’t know how widespread it is nowadays, but even if Basilevsky is the only author who does it, it’s another example of the lack of standardization (no pun intend, I swear) in PCA / FA. This branch of applied statistics is like the mythical Wild West: everybody’s got a gun and there are bullets flying all over the place. Law and order have not arrived yet.

OTOH, it’s nice to find something different in just about every book I open.

Let me set the stage again. What we have is the following 3 models.

$Xc^T = Ac\ Zc^T$

$Xs^T = Ar\ Zc^T$

$Xs^T = As\ Zs^T\$.

where X is data (variables in columns, observations in rows, and there are N rows), Z is principal components (PCs), and A is constructed from an eigenvector matrix V using the diagonal matrix $\Lambda$ of $\sqrt{\text{eigenvalues}}$.

Now is a good time, before we need to be reminded of it, to emphasize that all three models are of the same form:

$X^T = A\ Z^T\$.

In the first model,

$Xc^T = Ac\ Zc^T\$,

Xc is centered data, and

$Ac = Vc\ \Lambda c\$,

where Vc is an orthogonal eigenvector matrix of the covariance matrix $c = \frac{Xc^T\ Xc}{N-1}\$:

$\Lambda c^2 = Vc{^{-1}}\ c\ Vc = Vc^T\ c\ Vc\$.

In the third model,

$Xs^T = As\ Zs^T\$,

Xs is standardized data, and

$As = Vs\ \Lambda s\$,

where Vs is an orthogonal eigenvector matrix of the correlation matrix $r = \frac{Xs^T Xs}{N-1}\$:

$\Lambda s^2 = Vs{^{-1}}\ r\ Vs = Vs^T\ r Vs\$.

In the second model,

$Xs^T = Ar\ Zc^T$

we have the standardized data Xs as in the third model, but the PCs Zc from the first model. The transition matrix Ar may be found by normalizing the rows of Ac:

$Ar = \sigma{^{-1}}\ Ac\$,

where $\sigma$ is a diagonal matrix of the standard deviations ($\sqrt{\text{variances}}$) of Xc.

## The example

Basilevsky gives us a covariance matrix c. (So we have no data, for this example.)

$c = \left(\begin{array}{llll} 471.51 & 324.71 & 73.24 & 4.35 \\ 324.71 & 224.84 & 50.72 & 2.81 \\ 73.24 & 50.72 & 11.99 & 1.23 \\ 4.35 & 2.81 & 1.23 & 0.98\end{array}\right)$

We will want both the correlation matrix r and the diagonal matrix $\sigma\$, so let’s get them. $\sigma\$ is to be a diagonal matrix of the square roots of the variances, hence of the square roots of the diagonal of c. (i should call it s, because it’s made up of sample standard deviations, but I don’t want “s” as well as “As”, “Vs”, etc.) Here it is:

$\sigma = \left(\begin{array}{llll} 21.7143 & 0. & 0. & 0. \\ 0. & 14.9947 & 0. & 0. \\ 0. & 0. & 3.46266 & 0. \\ 0. & 0. & 0. & 0.989949\end{array}\right)$

Once we have $\sigma\$, the fastest way for me to get the correlation matrix is

$r = \sigma^{-1}\ c\ \sigma^{-1}\$:

$r = \left(\begin{array}{llll} 1. & 0.9973 & 0.9741 & 0.2024 \\ 0.9973 & 1. & 0.9769 & 0.1893 \\ 0.9741 & 0.9769 & 1. & 0.3588 \\ 0.2024 & 0.1893 & 0.3588 & 1.\end{array}\right)$

## From the covariance matrix:

So let’s do the eigendecompositions. First from the covariance matrix, which will give us the first model. We get eigenvalues…

$\lambda c = \{706.979,\ 1.34915,\ 0.894303,\ 0.0971549\}$

an eigenvector matrix…

$V = \left(\begin{array}{llll} -0.8164 & -0.0236 & -0.5705 & 0.0862 \\ -0.5633 & -0.1048 & 0.7665 & -0.2903 \\ -0.1272 & 0.5679 & 0.2742 & 0.7656 \\ -0.0075 & 0.816 & -0.1089 & -0.5676\end{array}\right)$

These are not quite Basilevsky’s numbers. The 4th eigenvector is off, by as much as .0014 . ah, the 2nd is off too, in two terms. Not great, but not a big deal. (Actually, I want to check his numbers, but not today. I will be using my numbers for eigenvectors and eigenvalues.)

(I suspect that most packages which do PCA / FA use iterative schemes to find one eigenvector after another; I’m getting used to seeing more and more errors as we move to the right in the eigenvector matrices. I think they’re suffering from accumulated round-off errors. Of course, I generally take it for granted that my answers are the right ones, but see below.)

Now I’m going change the signs to agree with Basilevsky:

$Vc = \left(\begin{array}{llll} 0.8164 & -0.0236 & 0.5705 & 0.0862 \\ 0.5633 & -0.1048 & -0.7665 & -0.2903 \\ 0.1272 & 0.5679 & -0.2742 & 0.7656 \\ 0.0075 & 0.816 & 0.1089 & -0.5676\end{array}\right)$

We will discover that Basilevsky should have changed the sign of the second column, too, but he didn’t so I won’t either.

Let’s keep going. We get the diagonal matrix $\Lambda c$ of $\sqrt{\text{eigenvalues}}\$

$\Lambda c = \left(\begin{array}{llll} 26.5891 & 0. & 0. & 0. \\ 0. & 1.16153 & 0. & 0. \\ 0. & 0. & 0.945676 & 0. \\ 0. & 0. & 0. & 0.311697\end{array}\right)$

and $Ac = Vc\ \Lambda c$

$Ac = \left(\begin{array}{llll} 21.7075 & -0.0274391 & 0.539517 & 0.0268828 \\ 14.9764 & -0.121762 & -0.724841 & -0.0904817 \\ 3.38092 & 0.659679 & -0.259284 & 0.238627 \\ 0.199249 & 0.947838 & 0.102961 & -0.176925\end{array}\right)$

Basilevsky then displays equations for the variables, using Vc not Ac:

$\begin{array}{c} \ \text{Xc1}=0.8164 \ \text{Zu1}-0.0236 \ \text{Zu2}+0.5705 \ \text{Zu3}+0.0862 \ \text{Zu4} \\ \ \text{Xc2}=0.5633 \ \text{Zu1}-0.1048 \ \text{Zu2}-0.7665 \ \text{Zu3}-0.2903 \ \text{Zu4} \\ \ \text{Xc3}=0.1272 \ \text{Zu1}+0.5679 \ \text{Zu2}-0.2742 \ \text{Zu3}+0.7656 \ \text{Zu4} \\ \ \text{Xc4}=0.0075 \ \text{Zu1}+0.816 \ \text{Zu2}+0.1089 \ \text{Zu3}-0.5676 \ \text{Zu4}\end{array}$

What he has written is

i.e.

$Xc^T = Vc\ Zu^T\$,

where the Zu are unstandardized, in contrast to Zc and Zs which are standardized. (The Zu were computed from Vc, not from Ac.) I remark that Basilevsky did not compute Ac.

## From the correlation matrix:

Now let’s do the correlation matrix, which will give us the third model. Here are the eigenvalues…

$\lambda s = \{3.0569,\ 0.9283,\ 0.013,\ 0.0018\}$

and eigenvectors…

$Vs = \left(\begin{array}{llll} 0.5627 & 0.1712 & 0.5688 & 0.5749 \\ 0.5624 & 0.1848 & 0.1817 & -0.7852 \\ 0.5696 & -0.0008 & -0.7906 & 0.2248 \\ 0.2065 & -0.9677 & 0.136 & -0.0485\end{array}\right)$

Let’s stop there for a moment.

I want to compare the two V’s: recall Vc…

$Vc = \left(\begin{array}{llll} 0.8164 & -0.0236 & 0.5705 & 0.0862 \\ 0.5633 & -0.1048 & -0.7665 & -0.2903 \\ 0.1272 & 0.5679 & -0.2742 & 0.7656 \\ 0.0075 & 0.816 & 0.1089 & -0.5676\end{array}\right)$

He says, “… even though $\alpha_{42}$ is positive as a covariance loading, it becomes highly negative as a correlations loadings [sic].”

He correctly points out that the 2nd column of Vc and the second column of Vs would have the same signs if one were multiplied by negative 1, and correctly concludes that we can take care of the sign of $\alpha_{42}$ by changing the sign of the second column. (At least, I hope that what he said. And $\alpha$ is his notation for an element of what i’m calling V.)

Unfortunately, he missed the fact that the 3rd columns cannot be made to have the same signs. The 3rd column of Vc has 2 negative signs; the 3rd column of Vs has only 1.

That is, even though $Vs_{23}$ is positive, $Vc_{23}$ is negative, when all other signs are the same.

This serves as a reminder that Vc and Vs, eigenvectors of the covariance matrix and of the correlation matrix, do not have a simple relationship to each other.

i will also point out that both Ac and Ar will be derived from Vc by scaling by positive numbers, and As will be derived from Vs by scaling by positive numbers, so this sign discrepancy between the (2,3) elements of Vs and Vc will propagate to the (2,3) elements of Ac and Ar versus As.

Now, lets get back to doing our thing. Get $\Lambda s\$

$\Lambda s = \left(\begin{array}{llll} 1.74839 & 0. & 0. & 0. \\ 0. & 0.963503 & 0. & 0. \\ 0. & 0. & 0.114029 & 0. \\ 0. & 0. & 0. & 0.0424615\end{array}\right)$

and As…

$As = \left(\begin{array}{llll} 0.9839 & 0.1649 & 0.0649 & 0.0244 \\ 0.9832 & 0.1781 & 0.0207 & -0.0333 \\ 0.9959 & -0.0008 & -0.0901 & 0.0095 \\ 0.361 & -0.9324 & 0.0155 & -0.0021\end{array}\right)$

Yes and yes, Basilevsky and I agree. Even to all his signs.

We use As and get the equations

$\begin{array}{c} \ \text{Xs1}=0.9839 \ \text{Zs1}+0.1649 \ \text{Zs2}+0.0649 \ \text{Zs3}+0.0244 \ \text{Zs4} \\ \ \text{Xs2}=0.9832 \ \text{Zs1}+0.1781 \ \text{Zs2}+0.0207 \ \text{Zs3}-0.0333 \ \text{Zs4} \\ \ \text{Xs3}=0.9959 \ \text{Zs1}-0.0008 \ \text{Zs2}-0.0901 \ \text{Zs3}+0.0095 \ \text{Zs4} \\ \ \text{Xs4}=0.361 \ \text{Zs1}-0.9324 \ \text{Zs2}+0.0155 \ \text{Zs3}-0.0021 \ \text{Zs4}\end{array}$

These Xs are standardized, and so are the Zs, in contrast to the Xc and Zu of the first set of equations:

$\begin{array}{c} \ \text{Xc1}=0.8164 \ \text{Zu1}-0.0236 \ \text{Zu2}+0.5705 \ \text{Zu3}+0.0862 \ \text{Zu4} \\ \ \text{Xc2}=0.5633 \ \text{Zu1}-0.1048 \ \text{Zu2}-0.7665 \ \text{Zu3}-0.2903 \ \text{Zu4} \\ \ \text{Xc3}=0.1272 \ \text{Zu1}+0.5679 \ \text{Zu2}-0.2742 \ \text{Zu3}+0.7656 \ \text{Zu4} \\ \ \text{Xc4}=0.0075 \ \text{Zu1}+0.816 \ \text{Zu2}+0.1089 \ \text{Zu3}-0.5676 \ \text{Zu4}\end{array}$

## Basilevsky’s model:

Basilevsky has shown us that if we define

$Ar = \sigma^{-1}Ac\$,

then we would have the second model,

$Xs^T = Ar\ Zc^T\$.

here’s Ar…

$Ar = \left(\begin{array}{llll} 0.9997 & -0.0013 & 0.0248 & 0.0012 \\ 0.9988 & -0.0081 & -0.0483 & -0.006 \\ 0.9764 & 0.1905 & -0.0749 & 0.0689 \\ 0.2013 & 0.9575 & 0.104 & -0.1787\end{array}\right)$

And here’s the corresponding equations…

$\begin{array}{c} \ \text{Xs1}=0.9997 \ \text{Zc1}-0.0013 \ \text{Zc2}+0.0248 \ \text{Zc3}+0.0012 \ \text{Zc4} \\ \ \text{Xs2}=0.9988 \ \text{Zc1}-0.0081 \ \text{Zc2}-0.0483 \ \text{Zc3}-0.006 \ \text{Zc4} \\ \ \text{Xs3}=0.9764 \ \text{Zc1}+0.1905 \ \text{Zc2}-0.0749 \ \text{Zc3}+0.0689 \ \text{Zc4} \\ \ \text{Xs4}=0.2013 \ \text{Zc1}+0.9575 \ \text{Zc2}+0.104 \ \text{Zc3}-0.1787 \ \text{Zc4}\end{array}$

followed immediately by the second set computed earlier…

$\begin{array}{c} \ \text{Xs1}=0.9839 \ \text{Zs1}+0.1649 \ \text{Zs2}+0.0649 \ \text{Zs3}+0.0244 \ \text{Zs4} \\ \ \text{Xs2}=0.9832 \ \text{Zs1}+0.1781 \ \text{Zs2}+0.0207 \ \text{Zs3}-0.0333 \ \text{Zs4} \\ \ \text{Xs3}=0.9959 \ \text{Zs1}-0.0008 \ \text{Zs2}-0.0901 \ \text{Zs3}+0.0095 \ \text{Zs4} \\ \ \text{Xs4}=0.361 \ \text{Zs1}-0.9324 \ \text{Zs2}+0.0155 \ \text{Zs3}-0.0021 \ \text{Zs4}\end{array}$

There are some satisfying similarities. The first columns differ significantly only in the last entry, and even that is .2 vs .36 . For the second column, as we said, if we had changed its sign, all the signs would match up, and in particular the single large terms would then be +0.9324 and +0.9575 .

I should emphasize that he calls both Ar and As, “correlation loadings”, at one point or another.

We could also severely round both As and Ar to confirm their similarity:

$As = \left(\begin{array}{llll} 1. & 0 & 0 & 0 \\ 1. & 0 & 0 & 0 \\ 1. & 0 & 0 & 0 \\ 0.5 & -1. & 0 & 0\end{array}\right)$

$Ar = \left(\begin{array}{llll} 1. & 0 & 0 & 0 \\ 1. & 0 & 0 & 0 \\ 1. & 0 & 0 & 0 \\ 0 & 1. & 0 & 0\end{array}\right)$

As he said, we shoulda changed the sign of the second column of Vc.

Without data, there isn’t a whole lot more we can do here. But I do want to show you one last thing: let’s take a closer look at Ar. We computed it from

$Ar = \sigma^{-1}Ac$

and the rows of Ar are supposed to be of unit length. As usual, the fastest way to compute the lengths (more precisely, the squared lengths) of vectors in an array A, is to compute either $A\ A^T$ or $A^T\ A\$. The former gives us dot products of all the rows with each other, including with themselves, while the latter gives us all the dot products among the columns.

So if we compute $Ar Ar^T\$, the diagonal elements should be 1. We see that they are:

$Ar Ar^T = \left(\begin{array}{llll} 1. & 0.997272 & 0.974077 & 0.202363 \\ 0.997272 & 1. & 0.976861 & 0.189303 \\ 0.974077 & 0.976861 & 1. & 0.358825 \\ 0.202363 & 0.189303 & 0.358825 & 1.\end{array}\right)$

In fact, we see that $Ar Ar^T\$ is equal to our old friend the correlation matrix r:

$r = \left(\begin{array}{llll} 1. & 0.997272 & 0.974077 & 0.202363 \\ 0.997272 & 1. & 0.976861 & 0.189303 \\ 0.974077 & 0.976861 & 1. & 0.358825 \\ 0.202363 & 0.189303 & 0.358825 & 1.\end{array}\right)$

Don’t be distracted, however, by his scaling the rows. The new basis – which gives us the PCs Zc from Xs – is specified by the columns of Ar, just as the columns of Ac and As were the bases that gave us Zc from Xc, and Zs from Xs. (As I said, all three models are of the same form: $X^T = A\ Z^T\$. And as I’ve said before, we should keep our eyes on the linear algebra.)

So what does the basis Ar look like? (I’ve said some terrible things about it; let me justify them.) For this, we compute $Ar^T\ Ar\$:

$Ar^T\ Ar = \left(\begin{array}{llll} 2.9908 & 0.369352 & -0.0756215 & 0.0265267 \\ 0.369352 & 0.953093 & 0.0856776 & -0.157942 \\ -0.0756215 & 0.0856776 & 0.0193785 & -0.023426 \\ 0.0265267 & -0.157942 & -0.023426 & 0.0367285\end{array}\right)$

That the diagonals are not 1 says the columns are not of unit length; that the off-diagonal elements are not zero says the columns are not mutually orthogonal.

Ar is not a very nice basis at all.

I think it has some redeeming social value, but we’ll look at that when we have data.

## Looking at eigenvectors

Let’s check one more thing. The columns of Vc are eigenvectors of the covariance matrix: we should have

$c\ Vc = Vc\ \Lambda c^2\$.

Do we? We do. (When I compute the difference, I get zero, to within $10^{-13}\$.)

And columns of Ac are eigenvectors of the covariance matrix, too:

$c\ Ac = Ac\ \Lambda c^2\$.

Similarly, columns of Vs and As are eigenvectors of r, and we should have both

$r\ Vs = Vr\ \Lambda s^2$

and

$r\ As = As\ \Lambda s^2\$.

We do.

(This is why I’m so sure my answers are right: I’ve checked the eigendecompositions.)

What about Ar? Even supposing that its columns are eigenvectors of c, we don’t know what the eigenvalues are. No problem, just see if

r Ar

is proportional to Ar. (That is, see if columns are proportional; see below.)

We have

$r\ Ar = \left(\begin{array}{llll} 2.98756 & 0.369966 & -0.0752539 & 0.0261815 \\ 2.98765 & 0.357973 & -0.0770204 & 0.0286877 \\ 2.99806 & 0.524909 & -0.0605794 & 0.0000958891 \\ 0.942999 & 1.02403 & 0.0730146 & -0.154885\end{array}\right)$

and when we divide, element by element, by Ar, we get

$\left(\begin{array}{llll} 2.98849 & -292.778 & -3.02879 & 21.1478 \\ 2.9913 & -44.0836 & 1.59331 & -4.75413 \\ 3.07054 & 2.75525 & 0.809018 & 0.00139143 \\ 4.68519 & 1.06953 & 0.70202 & 0.866628\end{array}\right)$

so we conclude that the columns of Ar are not eigenvectors of r. (Put that the other way: if columns of Ar are eigenvectors, then applying r to them should yield columns which are proportional to the eigenvectors, i.e. proportional to Ar.)

Are the columns of Ar eigenvectors of c? here’s c Ar…

$c\ Ar = \left(\begin{array}{llll} 868.064 & 14.8855 & -9.01303 & 2.8942 \\ 599.263 & 10.1172 & -6.30661 & 2.03838 \\ 135.83 & 2.95751 & -1.40195 & 0.391071 \\ 8.55343 & 1.14433 & -0.0179306 & -0.101953\end{array}\right)$

divided by Ar…

$\left(\begin{array}{llll} 868.334 & -11779.8 & -362.753 & 2337.76 \\ 599.995 & -1245.91 & 130.464 & -337.802 \\ 139.114 & 15.524 & 18.7226 & 5.67474 \\ 42.4968 & 1.19517 & -0.172399 & 0.570459\end{array}\right)$

Again, no; the columns of Ar are not eigenvectors of c.

Oh, do you need to see how that calculation works when it comes out right? here’s c Ac…

$c\ Ac = \left(\begin{array}{llll} 15346.8 & -0.0370194 & 0.482492 & 0.0026118 \\ 10588. & -0.164275 & -0.648228 & -0.00879074 \\ 2390.24 & 0.890006 & -0.231879 & 0.0231838 \\ 140.865 & 1.27878 & 0.0920784 & -0.0171892\end{array}\right)$

and dividing by Ac gives us…

$\left(\begin{array}{llll} 706.979 & 1.34915 & 0.894303 & 0.0971549 \\ 706.979 & 1.34915 & 0.894303 & 0.0971549 \\ 706.979 & 1.34915 & 0.894303 & 0.0971549 \\ 706.979 & 1.34915 & 0.894303 & 0.0971549\end{array}\right)$

which says the 4 eigenvalues for the 4 columns are

$\{706.979,\ 1.34915,\ 0.894303,\ 0.0971549\}$

and that’s exactly what we got a long time ago:

$\lambda c = \{706.979,\ 1.34915,\ 0.894303,\ 0.0971549\}$

The purpose of this post was to demonstrate Basilevsky’s computations. Next time, I’ll do these kinds of calculations with data.