
I've been testing PCA via SVD to decompose a simple time-series data matrix, $X$. I have two signals $x_1(t)$ and $x_2(t)$ in a data matrix where each of the $M$ rows is a timepoint sample and the two columns are $x_1$ and $x_2$.

The mean signal, $\hat{x}$, is defined as the mean along the row axis (the average of $x_1$ and $x_2$ at each timepoint). I normalize each column of $X$ by subtracting its mean and dividing by its standard deviation.

When I use [U S V] = svd(Xz) in MATLAB, regardless of how the variables are distributed (whether they are correlated or uncorrelated), one of the columns of the right singular matrix $V$ always points in the same direction (up to a multiplicative constant) as the mean vector $(1/2, 1/2)$. But when I add a third time series, this is never the case (there the mean vector is $(1/3, 1/3, 1/3)$).

Because I normalize each variable by its standard deviation, it does make sense that the direction of most variance given by PCA would be the diagonal 45-degree line. But if the two variables $x_1$ and $x_2$ are independent Gaussians, couldn't the PCA direction be any direction, since the distribution is radially symmetric?

MATLAB Code:

% Fix the random seed for reproducibility
s = RandStream('mcg16807', 'Seed', 0);
RandStream.setDefaultStream(s);

% Two dependent signals: 1000 timepoints (rows) x 2 columns
G = zeros(1000,2);
G(:,1) = 40*randn(1000,1)-100;
G(:,2) = tan(G(:,1)) + randn(1000,1);

% Standardize each column (z-score)
X = G - repmat(mean(G),[size(G,1), 1]);
Xz = X./repmat(std(G),[size(G,1), 1]);

% PCA via SVD of the standardized data
[U S V] = svd(Xz);

% Mean signal across the two columns at each timepoint
G_mean = mean(Xz,2);

% Compare the mean signal with the first two left singular vectors
corrcoef(G_mean, U(:,1))
corrcoef(G_mean, U(:,2))
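
As a direct check of the claim about $V$ (a small addition, not in the original code), the columns of $V$ can be compared with the 45-degree direction:

% Hypothetical check (not in the original run): cosine between each column of V
% and the 45-degree direction -- one entry should be ~1, the other ~0
abs(V'*[1; 1]/sqrt(2))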
  • Are you sure your code is correct? I am using python/scipy, but I can't reproduce the behaviour you are seeing. (It also does not make sense, from my point of view...) – bayerj, Dec 3, 2011 at 18:18
  • I've added some MATLAB code to show the behavior I described. – Dec 3, 2011 at 19:02
  • Because you standardize the columns you are decomposing a bivariate correlation matrix. Whenever its off-diagonal element (the correlation coef.) is non-zero you always get eigenvector matrix with elements +/-.70711 which means 45 degree rotation. But when the correlation is zero there is no need to rotate at all, and so you get eigenvector matrix equal to the correlation matrix. – ttnphns, Dec 3, 2011 at 20:10
  • Could you explain the first sentence more? I'm not sure what is meant by decomposing a bivariate correlation matrix. – Dec 3, 2011 at 20:53
  • I will explain as an answer below – ttnphns, Dec 3, 2011 at 21:08

2 Answers


This answer elaborates on my comment, since you asked. The core action of PCA is the singular-value decomposition of the data matrix X. It is the same thing as the eigen-decomposition of the square symmetric matrix X'X: both leave you with the same matrix of eigenvectors (which within SVD you call the V matrix). This matrix of eigenvectors is the matrix of the orthogonal rotation of the old axes (the columns of X) into the new axes (the principal components), and its elements are the cosines between the former and the latter.

When the columns of X are standardized, X'X is the correlation matrix (multiplied by N-1; this multiplication has no effect on the matrix of eigenvectors). Thus, PCA of standardized data columns is the eigen-decomposition of the correlation matrix between the data columns.

You have 2 standardized data columns. Whenever the correlation between them is not exactly zero, you will always get an eigenvector matrix that is a 45-degree rotation. If the correlation is exactly zero, no rotation is needed and the eigenvector matrix is the identity, {1,0; 0,1}.
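
As a quick numerical check (a minimal sketch reusing the standardized matrix Xz from the question's code), the right singular vectors of Xz coincide, up to sign and ordering, with the eigenvectors of the correlation matrix, and both contain the +/-0.7071 entries whenever the sample correlation is non-zero:

% Minimal sketch: PCA via SVD vs. eigen-decomposition of the correlation matrix
[U, S, V] = svd(Xz);               % right singular vectors of the standardized data
R = corrcoef(Xz);                  % 2 x 2 correlation matrix of the columns
[E, D] = eig(R);                   % its eigenvectors and eigenvalues
norm(Xz'*Xz - (size(Xz,1)-1)*R)    % ~0: X'X is (N-1) times the correlation matrix
V, E                               % columns agree up to sign/ordering, entries +/-0.7071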

  • Thanks for elaborating on your response. Why does the angle have to be 45 degrees versus any other angle for non-zero correlations between the two columns of a standardized $X$? And why does this fall apart when this is done for higher dimensions (3 and greater)? – Dec 3, 2011 at 22:23
  • Try to eigen-decompose, in MATLAB: [V,D]=eig(R), where R is a correlation matrix of a specific (any) dimension. If all off-diagonal correlations are equal and non-zero you'll get the same V - be these correlations 0.5 or 0.1 or 0.2642 - all one. 2-dimensional case with .70711 values in V is just a particular case. – ttnphns, Dec 3, 2011 at 23:27
  • (+1) It might help to emphasize that in the case of zero correlation, every vector is an eigenvector, so the ones provided at the end of your answer are but one choice, albeit a natural one. – cardinal, Dec 4, 2011 at 16:16

Your $\mathbf{X}^T\mathbf{X}$ matrix is a (multiple of a) correlation matrix when the inputs have been standardised. So we can do this analysis for a correlation matrix.

2 dimensions

Take a $2\times 2$ correlation matrix:

$$R_2=\begin{pmatrix} 1 & r \\ r & 1\end{pmatrix}$$

The characteristic polynomial is given by:

$$p(\lambda)=det(R_2-\lambda I)=(1-\lambda)^2-r^2=(1-\lambda+r)(1-\lambda-r)$$

This means that the (ordered) eigenvalues are given by $\lambda_1=1+|r|\geq \lambda_2=1-|r|$. Note that we can also use $tr(R_2)=2=\lambda_1+\lambda_2$ and $det(R_2)=1-r^2=\lambda_1\lambda_2$ to solve for the $2$ dimensional case (this would not work in higher dimensions). Now we need the eigenvectors $e_1,e_2$, which satisfy

$$\begin{pmatrix} 1 & r \\ r & 1\end{pmatrix}e_j=\begin{pmatrix} e_{j1}+re_{j2} \\ re_{j1}+e_{j2}\end{pmatrix}=\lambda_je_j$$

Now this means that $e_{11}=\frac{r}{|r|}e_{12}=\mathrm{sgn}(r)e_{12}$ and, similarly, $e_{21}=-\mathrm{sgn}(r)e_{22}$, provided that $r\neq 0$ (note that they are orthogonal, as $e_{11}e_{21}+e_{12}e_{22}=-\mathrm{sgn}(r)^2e_{12}e_{22}+e_{12}e_{22}=0$). If $r=0$, then the above argument is invalid (because we have divided by $0$). However, the correlation matrix is already diagonal, and the eigenvectors can be taken as $e_1=(1,0)$ and $e_2=(0,1)$. The eigenvalues now become tied at $\lambda_1=\lambda_2=1$, consistent with the previous formula.

This shows that the first principal component is the $y=x$ line if $r>0$, the $y=-x$ line if $r<0$, and is not unique if $r=0$ (in that degenerate case every direction is an eigenvector). The second principal component is the "other" line.
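
As a quick numerical illustration (a sketch; the value $r=0.6$ is arbitrary), eigen-decomposing a $2\times 2$ correlation matrix reproduces the $\pm 1/\sqrt{2}\approx\pm 0.7071$ eigenvector entries and the eigenvalues $1\pm|r|$:

% Sketch: any non-zero correlation gives the 45-degree eigenvectors
r = 0.6;
R2 = [1 r; r 1];
[E, D] = eig(R2);   % columns of E are eigenvectors of R2
E                   % entries are +/-0.7071, i.e. the y = x and y = -x directions
diag(D)             % eigenvalues 1 - r and 1 + r (ordering may vary)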

3 dimensions

Now in $3$ dimensions things aren't so easy mathematically; we have a $3\times 3$ correlation matrix

$$R_3=\begin{pmatrix} 1 & q & r \\ q & 1 & s \\ r & s & 1\end{pmatrix}$$

The characteristic polynomial is given by:

$$p(\lambda)=det(R_3-\lambda I)=(1-\lambda)^3-(1-\lambda)(s^2+q^2+r^2)+2qrs$$

Using the formula for solving cubic polynomials (e.g. here), note that the characteristic polynomial is almost of the form of equation (2) in the link, but with $u=1-\lambda$. Hence we can use the solution given for $u$ in (7)-(9), with $a=-(q^2+r^2+s^2)$ and $b=2qrs$, $A$ from equation (5) and $B$ from equation (6), and we get:

$$\begin{array}{c c} \lambda=1-(A+B) \\ \lambda'=1+\frac{1}{2}(A+B)-i\sqrt{3}\frac{A-B}{2} \\ \lambda''=1+\frac{1}{2}(A+B)+i\sqrt{3}\frac{A-B}{2} \end{array}$$

where $i=\sqrt{-1}$ is the imaginary unit. Now for all three solutions to be real, we require $\Im (A+B)=0$ (else the first solution is complex) and $\Re (A-B)=0$ (else the second and third solutions are complex). Writing $A=c_{A}+d_{A}i$ and $B=c_{B}+d_{B}i$, this means that we require $c_{A}=c_{B}=c$ and $d_{A}=-d_{B}=d$. In other words, $A$ and $B$ are complex conjugates. The condition required for $A,B$ to be of this form is:

$$\frac{b^2}{4}+\frac{a^3}{27}\leq 0\implies 27q^2r^2s^2\leq (q^2+r^2+s^2)^3$$

We then get $A+B=2c$ and $A-B=2di$. Re-writing out the eigenvalues, we find that $|d|\sqrt{3}-1\leq c \leq\frac{1}{2}$ for $R_3$ to be positive semi-definite. We then get the ordered eigenvalues:

$$\begin{array}{cccc} \lambda_1=1-2c & \lambda_2=1+c+|d|\sqrt{3} & \lambda_3=1+c-|d|\sqrt{3} & \text{if } c\leq-\frac{|d|}{\sqrt{3}}\\ \lambda_1=1+c+|d|\sqrt{3} & \lambda_2=1-2c & \lambda_3=1+c-|d|\sqrt{3} & \text{if } |c| \leq\frac{|d|}{\sqrt{3}}\\ \lambda_1=1+c+|d|\sqrt{3} & \lambda_2=1+c-|d|\sqrt{3} & \lambda_3=1-2c & \text{if } c\geq\frac{|d|}{\sqrt{3}} \end{array}$$
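
A numerical sanity check of these closed forms (a sketch: the values of $q,r,s$ are arbitrary, and MATLAB's principal complex cube root is assumed to supply the conjugate pair $A,B$ when the discriminant condition above holds strictly):

% Sanity check of the closed-form eigenvalues (q, r, s chosen arbitrarily)
q = 0.3; r = 0.5; s = 0.2;
R3 = [1 q r; q 1 s; r s 1];
a = -(q^2 + r^2 + s^2);          % u^3 + a*u + b = 0, with u = 1 - lambda
b = 2*q*r*s;
disc = b^2/4 + a^3/27;           % <= 0 when all three roots are real
A = (-b/2 + sqrt(disc))^(1/3);   % principal cube roots; A and B are conjugates here
B = (-b/2 - sqrt(disc))^(1/3);
c = real(A); d = imag(A);        % A = c + d*i, B = c - d*i
lams = [1 - 2*c, 1 + c + abs(d)*sqrt(3), 1 + c - abs(d)*sqrt(3)];
sort(lams)                       % closed-form eigenvalues...
sort(eig(R3))'                   % ...agree with eig() up to rounding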

If any of the inequalities in the table above are not strict, or $d=0$, then some or all of the eigenvalues are tied and the principal components for those eigenvalues are not unique (more like a principal "hyperplane"). The first principal component follows at once by solving the equation:

$$(R_3-\lambda_1I)e_1=0\implies \begin{array}{c c} (1-\lambda_1)e_{11}+qe_{12}+re_{13}=0 \\ qe_{11}+(1-\lambda_1)e_{12}+se_{13}=0 \\ re_{11}+se_{12}+(1-\lambda_1)e_{13}=0 \end{array}$$

A general solution can be:

$$e_1=(qs-r(1-\lambda_1),\;qr-s(1-\lambda_1),\;(1-\lambda_1)^2-q^2)^T$$

This solution works as long as the entries are not all zero (otherwise we would implicitly be dividing by zero). The same form is also valid for $e_2$ and $e_3$ (e.g. $e_{23}=(1-\lambda_2)^2-q^2$, etc.). If we want the mean vector to be the solution, then we require $(1-\lambda_1+q)(s-r)=q(s-q)+(1-\lambda_1-r)(1-\lambda_1)=0$. The first condition implies $s=r$ or $\lambda_1=1+q$; the latter option is impossible, because it implies $e_{13}=0$ (not a mean vector). Plugging $s=r$ into the equations, we get $e_{11}=e_{12}=-r(1-\lambda_1-q)$. To get equality for $e_{13}$ requires that $\lambda_1=1+r+q$.
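
As a check of the general solution above (a sketch with the same arbitrary $q,r,s$ as before), the formula can be verified against MATLAB's eig directly:

% Check that the general solution is an eigenvector (arbitrary q, r, s)
q = 0.3; r = 0.5; s = 0.2;
R3 = [1 q r; q 1 s; r s 1];
lam1 = max(eig(R3));                             % largest eigenvalue
e1 = [q*s - r*(1-lam1); q*r - s*(1-lam1); (1-lam1)^2 - q^2];
norm(R3*e1 - lam1*e1)                            % ~0: e1 is an eigenvector for lam1
e1/norm(e1)                                      % unit-normalised first principal component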

Note that if all correlations are equal, $q=r=s$, we get real solutions for $A=B=(-\frac{b}{2})^{1/3}=-r$, and a similar form for the largest eigenvalue as in 2 dimensions, $\lambda_1=1+2r>\lambda_2=\lambda_3=1-r$, provided $r>0$, and $\lambda_1=\lambda_2=1-r>\lambda_3=1+2r$ if $r<0$. Taking the $r>0$ case, we then get an eigenvector of:

$$e_1=(3r^2,3r^2,3r^2)^T\propto (1/3,1/3,1/3)^{T}$$

Taking the $r<0$ case, the above solution gives the trivial $e_1=(0,0,0)^T$ (and the same for $e_2$). If we go back to the three equations, we find they are all identical, giving $r(e_{11}+e_{12}+e_{13})=0$ (and the same for $e_2$). This is due to the multiplicity of the largest eigenvalue: the first principal component is not unique when all correlations are equal and negative. The mean vector is now moved down to principal component 3, and the first two PCs, while still orthogonal to the mean vector, are otherwise arbitrary within the subspace they span.
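
A small numerical check of both equal-correlation cases (a sketch; $r=\pm 0.4$ is arbitrary): with a common positive correlation the leading eigenvector is proportional to the mean vector, while with a common negative correlation the $(1,1,1)$ direction carries the smallest eigenvalue:

% Equal positive correlations: the first PC is the mean direction
r = 0.4;
Rp = [1 r r; r 1 r; r r 1];
[Ep, Dp] = eig(Rp);
[~, k] = max(diag(Dp));
Ep(:,k)                          % proportional to (1,1,1)/sqrt(3)

% Equal negative correlations: (1,1,1) now carries the smallest eigenvalue
r = -0.4;
Rn = [1 r r; r 1 r; r r 1];
[En, Dn] = eig(Rn);
[~, k] = min(diag(Dn));
En(:,k)                          % again proportional to (1,1,1)/sqrt(3)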

