Frozen robust multiarray analysis (fRMA)

MN McCall, BM Bolstad, RA Irizarry - Biostatistics, 2010 - academic.oup.com
Abstract
Robust multiarray analysis (RMA) is the most widely used preprocessing algorithm for Affymetrix and Nimblegen gene expression microarrays. RMA performs background correction, normalization, and summarization in a modular way. The last 2 steps require multiple arrays to be analyzed simultaneously. This ability to borrow information across samples gives RMA several advantages. For example, the summarization step fits a parametric model that accounts for probe effects, assumed to be fixed across arrays, and improves outlier detection. Residuals obtained from the fitted model permit the creation of useful quality metrics. However, the dependence on multiple arrays has 2 drawbacks: (1) RMA cannot be used in clinical settings where samples must be processed individually or in small batches and (2) data sets preprocessed separately are not comparable. We propose a preprocessing algorithm, frozen RMA (fRMA), which allows one to analyze microarrays individually or in small batches and then combine the data for analysis. This is accomplished by utilizing information from the large publicly available microarray databases. In particular, estimates of probe-specific effects and variances are precomputed and frozen. Then, with new data sets, these frozen estimates are used in concert with information from the new arrays to normalize and summarize the data. We find that fRMA is comparable to RMA when the data are analyzed as a single batch and outperforms RMA when analyzing multiple batches. The methods described here are implemented in the R package fRMA and are currently available for download from the software section of http://rafalab.jhsph.edu.
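The idea of freezing probe-level estimates can be illustrated with a minimal sketch. It is written in Python rather than R, and it is not the fRMA package's implementation: the function names (`normalize_to_reference`, `summarize_probeset`), the toy numbers, and the choice of a plain inverse-variance weighted mean are all assumptions made for illustration, and the published method may well use a more robust estimator. The sketch shows how a single array could be normalized against a frozen reference distribution and then summarized using precomputed probe effects and variances, with no other arrays needed at analysis time.

```python
def normalize_to_reference(log_intensities, reference_quantiles):
    """Quantile-normalize ONE array against a frozen reference distribution.

    reference_quantiles is assumed (hypothetically) to be sorted ascending and
    to have the same length as log_intensities. The k-th smallest observed
    value is replaced by the k-th reference quantile.
    """
    order = sorted(range(len(log_intensities)), key=lambda k: log_intensities[k])
    normalized = [0.0] * len(log_intensities)
    for rank, idx in enumerate(order):
        normalized[idx] = reference_quantiles[rank]
    return normalized


def summarize_probeset(normalized, probe_effects, probe_variances):
    """Summarize one probe set on one array using frozen estimates.

    Each probe's frozen effect is subtracted, and the corrected values are
    combined with weights inversely proportional to the frozen variances.
    This weighted mean is a simplification chosen for the sketch.
    """
    weights = [1.0 / v for v in probe_variances]
    corrected = [y - phi for y, phi in zip(normalized, probe_effects)]
    return sum(w * c for w, c in zip(weights, corrected)) / sum(weights)


# Toy data for a single probe set with four probes (all values invented).
new_array = [8.1, 9.4, 7.6, 8.9]             # log2 intensities on the new array
frozen_reference = [7.5, 8.0, 9.0, 9.5]      # frozen reference quantiles
frozen_effects = [0.0, 1.5, -0.5, 1.0]       # frozen probe effects
frozen_variances = [0.1, 0.2, 0.1, 0.4]      # frozen probe variances

normalized = normalize_to_reference(new_array, frozen_reference)
expression = summarize_probeset(normalized, frozen_effects, frozen_variances)
```

Because the reference distribution, probe effects, and variances are fixed inputs, two arrays processed at different times pass through the same transformation, which is what makes separately preprocessed data comparable in this simplified picture.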
Oxford University Press