Analysis of gene expression microarrays for phenotype classification.

A Califano, G Stolovitzky, Y Tu - ISMB, 2000
Abstract
Several microarray technologies that monitor the level of expression of a large number of genes have recently emerged. Given DNA-microarray data for a set of cells characterized by a given phenotype and for a set of control cells, an important problem is to identify “patterns” of gene expression that can be used to predict cell phenotype. The potential number of such patterns is exponential in the number of genes.
In this paper, we propose a solution to this problem based on a supervised learning algorithm that differs substantially from previous schemes. It couples a complex, non-linear similarity metric, which maximizes the probability of discovering discriminative gene expression patterns, with a pattern discovery algorithm called SPLASH. The latter efficiently and deterministically discovers all statistically significant gene expression patterns in the phenotype set. Statistical significance is evaluated based on the probability that a pattern occurs by chance in the control set. Finally, a greedy set covering algorithm is used to select an optimal subset of statistically significant patterns, which form the basis for a standard likelihood ratio classification scheme.
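The greedy set covering step can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `greedy_set_cover`, the pattern labels, and the toy sample sets are hypothetical, and the sketch assumes each significant pattern is represented simply by the set of phenotype samples it matches. Note that greedy set covering in general yields an approximate rather than a provably minimal cover.

```python
from typing import Dict, FrozenSet, List, Set


def greedy_set_cover(universe: Set[str],
                     patterns: Dict[str, FrozenSet[str]]) -> List[str]:
    """Select patterns one at a time, always taking the pattern that
    matches the most samples not yet covered by earlier choices."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered and patterns:
        # Pattern covering the largest number of still-uncovered samples.
        best = max(patterns, key=lambda name: len(patterns[name] & uncovered))
        gained = patterns[best] & uncovered
        if not gained:
            break  # the remaining samples match no pattern at all
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical toy data: the phenotype samples matched by each pattern.
samples = {"s1", "s2", "s3", "s4", "s5"}
supports = {
    "p1": frozenset({"s1", "s2", "s3"}),
    "p2": frozenset({"s3", "s4"}),
    "p3": frozenset({"s4", "s5"}),
    "p4": frozenset({"s1"}),
}
selected = greedy_set_cover(samples, supports)
print(selected)
```

In this toy case the first pick is the pattern with the widest support, and the second pick is whichever pattern covers the most of what remains, so redundant patterns such as `p2` and `p4` are never selected.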