Principal components analysis to summarize microarray experiments: application to sporulation time series

S Raychaudhuri, JM Stuart, RB Altman - Biocomputing 2000, 1999 - World Scientific
Abstract
A series of microarray experiments produces observations of differential expression for thousands of genes across multiple conditions. It is often not clear whether a set of experiments is measuring fundamentally different gene expression states or similar states created through different mechanisms. It is useful, therefore, to define a core set of independent features for the expression states that allow them to be compared directly. Principal components analysis (PCA) is a statistical technique for determining the key variables in a multidimensional data set that explain the differences in the observations, and it can be used to simplify the analysis and visualization of multidimensional data sets. We show that applying PCA to expression data (where the experimental conditions are the variables and the gene expression measurements are the observations) allows us to summarize the ways in which gene responses vary under different conditions. Examination of the components also provides insight into the underlying factors that are measured in the experiments. We applied PCA to the publicly released yeast sporulation data set (Chu et al. 1998). In that work, 7 different measurements of gene expression were made over time. PCA on the time points suggests that much of the observed variability in the experiment can be summarized in just 2 components, i.e. 2 variables capture most of the information. These components appear to represent (1) overall induction level and (2) change in induction level over time. We also examined the clusters proposed in the original paper and show how they are manifested in principal component space. Our results are available on the internet at http://www.smi.stanford.edu/piojects/helix/PCArray.
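The setup described in the abstract (conditions as variables, genes as observations) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the helper `pca_conditions` is hypothetical, mean-centering each time point (with no scaling) is an assumption since the abstract does not specify preprocessing, and the matrix is simulated rather than taken from the Chu et al. data. Each simulated gene is built from an overall induction level plus a linear change over 7 time points, so two components should capture nearly all the variance. The first loading vector should come out roughly constant across time points, and the second roughly monotonic in time.

```python
import numpy as np

def pca_conditions(X):
    """Hypothetical helper: PCA with columns (time points) as variables
    and rows (genes) as observations, computed via SVD."""
    Xc = X - X.mean(axis=0)                   # center each time point (assumed preprocessing)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (X.shape[0] - 1)             # variance along each component
    explained = var / var.sum()               # fraction of total variance per component
    scores = Xc @ Vt.T                        # gene coordinates in principal component space
    return Vt, explained, scores              # rows of Vt are the loading vectors

# Simulated data (not real measurements): 500 genes x 7 time points
rng = np.random.default_rng(0)
n_genes, n_times = 500, 7
t = np.linspace(-1.0, 1.0, n_times)
level = rng.normal(0.0, 2.0, size=(n_genes, 1))   # overall induction level per gene
slope = rng.normal(0.0, 1.0, size=(n_genes, 1))   # change in induction over time
noise = rng.normal(0.0, 0.1, size=(n_genes, n_times))
X = level + slope * t + noise

loadings, explained, scores = pca_conditions(X)
print("variance explained:", np.round(explained[:3], 3))
print("PC1 loadings:", np.round(loadings[0], 2))  # roughly constant: overall level
print("PC2 loadings:", np.round(loadings[1], 2))  # roughly linear in t: change over time
```

The signs of the loading vectors are arbitrary in SVD-based PCA, so only the shape of each loading pattern, not its sign, should be interpreted. Plotting `scores[:, 0]` against `scores[:, 1]` gives the kind of two-dimensional principal component view in which gene clusters can be inspected.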