AILUN: reannotating gene expression data automatically

R Chen, L Li, AJ Butte - Nature methods, 2007 - nature.com
To the editor: Gene Expression Omnibus (GEO) [1] is a public repository for gene expression data. While the amount of data in GEO has grown exponentially, the number of publications citing GEO has grown only linearly. A key difficulty in data reuse is mapping the probes in GEO datasets to established gene identifiers, because these mappings can change as annotations for the underlying sequences change [2]. Microarray results therefore need to be reevaluated with the latest probe annotations. Several previous efforts have reannotated microarray probe identifiers [3, 4], but only for a few platforms and species.
We built a fully automated system, Array Information Library Universal Navigator (AILUN), to reannotate all types of microarrays in GEO periodically by relating every probe identifier to Entrez Gene identifiers. First, we collected all gene identifiers from Entrez Gene and UniGene and built a universal gene identifier table (UGIT). We then matched each column of every GEO platform with UGIT to find the best matching column and type of external identifier, and annotated each probe identifier with Entrez Gene identifiers.
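The column-matching step described above can be illustrated with a minimal Python sketch. It is not the AILUN implementation: the data layouts, the function names, and the scoring rule (the fraction of a column's values found in UGIT for a given identifier type) are all assumptions made for illustration, since the letter does not specify how the best match is chosen.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical UGIT layout: identifier type -> identifier -> Entrez Gene IDs.
Ugit = Dict[str, Dict[str, Set[int]]]
# Hypothetical platform layout: column name -> values, one per probe (row-aligned).
Platform = Dict[str, List[str]]


def best_matching_column(
    platform: Platform, ugit: Ugit, probe_col: str
) -> Tuple[str, str, float]:
    """Return (column, identifier type, score) for the best match.

    Assumed scoring rule: the fraction of values in a column that are
    known identifiers of a given type in UGIT.
    """
    best: Tuple[str, str, float] = ("", "", 0.0)
    for col, values in platform.items():
        if col == probe_col or not values:
            continue  # skip the probe ID column itself and empty columns
        for id_type, table in ugit.items():
            hits = sum(1 for value in values if value in table)
            score = hits / len(values)
            if score > best[2]:
                best = (col, id_type, score)
    return best


def annotate(
    platform: Platform, ugit: Ugit, probe_col: str = "ID"
) -> Dict[str, Set[int]]:
    """Map each probe identifier to Entrez Gene IDs via the best column."""
    col, id_type, _ = best_matching_column(platform, ugit, probe_col)
    if not col:
        return {}  # no column matched any identifier type
    table = ugit[id_type]
    return {
        probe: table.get(value, set())
        for probe, value in zip(platform[probe_col], platform[col])
    }


# Toy data for illustration only; a real UGIT would be built from
# Entrez Gene and UniGene.
ugit: Ugit = {
    "gene_symbol": {"TP53": {7157}, "BRCA1": {672}},
    "refseq_accession": {"NM_000546": {7157}},
}
platform: Platform = {
    "ID": ["p1", "p2", "p3"],
    "SYMBOL": ["TP53", "BRCA1", "unknown"],
    "DESCRIPTION": ["tumor protein p53", "BRCA1 DNA repair associated", ""],
}

annotation = annotate(platform, ugit)
```

In this toy example the SYMBOL column is selected because two of its three values are known gene symbols, and the unmatched probe receives an empty annotation rather than a guess.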