Revisions to Making sense of principal component analysis, eigenvectors & eigenvalues

edited body

Source Link

edited Jun 20, 2013 at 10:07

60.9k
8
143
221

The way I understand principleprincipal components is this: Data with multiple variables (height, weight, age, temperature, wavelength, percent survival, etc) can be presented in three dimensions to plot relatedness.

Now if you wanted to somehow make sense of "3D data", you might want to know which 2D planes (cross-sections) of this 3D data contain the most information for a given suite of variables. These 2D planes are the principleprincipal components, which contain a proportion of each variable.

Think of principal components as variables themselves, with composite characteristics from the original variables (this new variable could be described as being part weight, part height, part age, etc). When you plot one principleprincipal component (X) against another (Y), what you're doing is building a 2D map that can geometrically describe correlations between original variables. Now the useful part: since each subject (observation) being compared is associated with values for each variable, the subjects (observations) are also found somewhere on this X Y map. Their location is based on the relative contributions of each underlying variable (i.e. one observation may be heavily affected by age and temperature, while another one may be more affected by height and weight). This map graphically shows us the similarities and differences between subjects and explains these similarities/differences in terms of which variables are characterizing them the most.

The way I understand principle components is this: Data with multiple variables (height, weight, age, temperature, wavelength, percent survival, etc) can be presented in three dimensions to plot relatedness.

Now if you wanted to somehow make sense of "3D data", you might want to know which 2D planes (cross-sections) of this 3D data contain the most information for a given suite of variables. These 2D planes are the principle components, which contain a proportion of each variable.

Think of principal components as variables themselves, with composite characteristics from the original variables (this new variable could be described as being part weight, part height, part age, etc). When you plot one principle component (X) against another (Y), what you're doing is building a 2D map that can geometrically describe correlations between original variables. Now the useful part: since each subject (observation) being compared is associated with values for each variable, the subjects (observations) are also found somewhere on this X Y map. Their location is based on the relative contributions of each underlying variable (i.e. one observation may be heavily affected by age and temperature, while another one may be more affected by height and weight). This map graphically shows us the similarities and differences between subjects and explains these similarities/differences in terms of which variables are characterizing them the most.

The way I understand principal components is this: Data with multiple variables (height, weight, age, temperature, wavelength, percent survival, etc) can be presented in three dimensions to plot relatedness.

Now if you wanted to somehow make sense of "3D data", you might want to know which 2D planes (cross-sections) of this 3D data contain the most information for a given suite of variables. These 2D planes are the principal components, which contain a proportion of each variable.

Think of principal components as variables themselves, with composite characteristics from the original variables (this new variable could be described as being part weight, part height, part age, etc). When you plot one principal component (X) against another (Y), what you're doing is building a 2D map that can geometrically describe correlations between original variables. Now the useful part: since each subject (observation) being compared is associated with values for each variable, the subjects (observations) are also found somewhere on this X Y map. Their location is based on the relative contributions of each underlying variable (i.e. one observation may be heavily affected by age and temperature, while another one may be more affected by height and weight). This map graphically shows us the similarities and differences between subjects and explains these similarities/differences in terms of which variables are characterizing them the most.

Removed signature

Source Link

edit approved Dec 9, 2012 at 21:25

jonsca

1.8k
3
20
30

I'm a biologist with little math background, but I have a working knowledge of basic stats. Some of the explanations in here are very useful.

The way I understand principle components is this: Data with multiple variables (height, weight, age, temperature, wavelength, percent survival, etc) can be presented in three dimensions to plot relatedness.

Now if you wanted to somehow make sense of "3D data", you might want to know which 2D planes (cross-sections) of this 3D data contain the most information for a given suite of variables. These 2D planes are the principle components, which contain a proportion of each variable.

Think of principal components as variables themselves, with composite characteristics from the original variables (this new variable could be described as being part weight, part height, part age, etc). When you plot one principle component (X) against another (Y), what you're doing is building a 2D map that can geometrically describe correlations between original variables. Now the useful part: since each subject (observation) being compared is associated with values for each variable, the subjects (observations) are also found somewhere on this X Y map. Their location is based on the relative contributions of each underlying variable (i.e. one observation may be heavily affected by age and temperature, while another one may be more affected by height and weight). This map graphically shows us the similarities and differences between subjects and explains these similarities/differences in terms of which variables are characterizing them the most.

Cheers!

Jeremy

I'm a biologist with little math background, but I have a working knowledge of basic stats. Some of the explanations in here are very useful.

The way I understand principle components is this: Data with multiple variables (height, weight, age, temperature, wavelength, percent survival, etc) can be presented in three dimensions to plot relatedness.

Now if you wanted to somehow make sense of "3D data", you might want to know which 2D planes (cross-sections) of this 3D data contain the most information for a given suite of variables. These 2D planes are the principle components, which contain a proportion of each variable.

Think of principal components as variables themselves, with composite characteristics from the original variables (this new variable could be described as being part weight, part height, part age, etc). When you plot one principle component (X) against another (Y), what you're doing is building a 2D map that can geometrically describe correlations between original variables. Now the useful part: since each subject (observation) being compared is associated with values for each variable, the subjects (observations) are also found somewhere on this X Y map. Their location is based on the relative contributions of each underlying variable (i.e. one observation may be heavily affected by age and temperature, while another one may be more affected by height and weight). This map graphically shows us the similarities and differences between subjects and explains these similarities/differences in terms of which variables are characterizing them the most.

Cheers!

Jeremy

The way I understand principle components is this: Data with multiple variables (height, weight, age, temperature, wavelength, percent survival, etc) can be presented in three dimensions to plot relatedness.

Now if you wanted to somehow make sense of "3D data", you might want to know which 2D planes (cross-sections) of this 3D data contain the most information for a given suite of variables. These 2D planes are the principle components, which contain a proportion of each variable.

Think of principal components as variables themselves, with composite characteristics from the original variables (this new variable could be described as being part weight, part height, part age, etc). When you plot one principle component (X) against another (Y), what you're doing is building a 2D map that can geometrically describe correlations between original variables. Now the useful part: since each subject (observation) being compared is associated with values for each variable, the subjects (observations) are also found somewhere on this X Y map. Their location is based on the relative contributions of each underlying variable (i.e. one observation may be heavily affected by age and temperature, while another one may be more affected by height and weight). This map graphically shows us the similarities and differences between subjects and explains these similarities/differences in terms of which variables are characterizing them the most.

Source Link

answered Dec 9, 2012 at 20:49

Jeremias Jackson

111
1
2

I'm a biologist with little math background, but I have a working knowledge of basic stats. Some of the explanations in here are very useful.

The way I understand principle components is this: Data with multiple variables (height, weight, age, temperature, wavelength, percent survival, etc) can be presented in three dimensions to plot relatedness.

Now if you wanted to somehow make sense of "3D data", you might want to know which 2D planes (cross-sections) of this 3D data contain the most information for a given suite of variables. These 2D planes are the principle components, which contain a proportion of each variable.

Think of principal components as variables themselves, with composite characteristics from the original variables (this new variable could be described as being part weight, part height, part age, etc). When you plot one principle component (X) against another (Y), what you're doing is building a 2D map that can geometrically describe correlations between original variables. Now the useful part: since each subject (observation) being compared is associated with values for each variable, the subjects (observations) are also found somewhere on this X Y map. Their location is based on the relative contributions of each underlying variable (i.e. one observation may be heavily affected by age and temperature, while another one may be more affected by height and weight). This map graphically shows us the similarities and differences between subjects and explains these similarities/differences in terms of which variables are characterizing them the most.

Cheers!

Jeremy

Stack Exchange Network

Return to Answer