Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments

A McDavid, G Finak, PK Chattopadyay, M Dominguez, L Lamoreaux, SS Ma, M Roederer
Bioinformatics, 2013
Abstract
Motivation: Cell populations are never truly homogeneous; individual cells exist in biochemical states that define functional differences between them. New technology based on microfluidic arrays combined with multiplexed quantitative polymerase chain reactions now enables high-throughput single-cell gene expression measurement, allowing assessment of cellular heterogeneity. However, few analytic tools have been developed specifically for the statistical and analytical challenges of single-cell quantitative polymerase chain reaction data.
Results: We present a statistical framework for the exploration, quality control and analysis of single-cell gene expression data from microfluidic arrays. We assess accuracy and within-sample heterogeneity of single-cell expression and develop quality control criteria to filter unreliable cell measurements. We propose a statistical model accounting for the fact that genes at the single-cell level can be on (and a continuous expression measure is recorded) or dichotomously off (and the recorded expression is zero). Based on this model, we derive a combined likelihood ratio test for differential expression that incorporates both the discrete and continuous components. Using an experiment that examines treatment-specific changes in expression, we show that this combined test is more powerful than either the continuous or dichotomous component in isolation, or a t-test on the zero-inflated data. Although developed for measurements from a specific platform (Fluidigm), these tools are generalizable to other multi-parametric measures over large numbers of events.
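As a rough illustration of the combined test described above, the following Python sketch adds a binomial likelihood ratio statistic (is the gene detected or not) to a normal-theory likelihood ratio statistic computed on the positive values only, and refers the sum to a chi-square distribution. This is not the SingleCellAssay implementation. The function names, the two-group setup, the use of 0 to encode an undetected gene, the common-variance assumption for the continuous part and the fallback to one degree of freedom when the continuous part cannot be estimated are all assumptions made for this example.

```python
import math


def bernoulli_loglik(k, n):
    """Maximized Bernoulli log-likelihood for k detections among n cells."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(k / n)
    if k < n:
        ll += (n - k) * math.log((n - k) / n)
    return ll


def rss(values):
    """Residual sum of squares around the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def combined_lrt(group_a, group_b):
    """Hypothetical two-group combined LRT; a value of 0 means 'not detected'."""
    pos_a = [v for v in group_a if v > 0]
    pos_b = [v for v in group_b if v > 0]

    # Discrete component: do the detection frequencies differ between groups?
    ll_alt = (bernoulli_loglik(len(pos_a), len(group_a))
              + bernoulli_loglik(len(pos_b), len(group_b)))
    ll_null = bernoulli_loglik(len(pos_a) + len(pos_b),
                               len(group_a) + len(group_b))
    lrt_discrete = max(0.0, 2.0 * (ll_alt - ll_null))

    # Continuous component: do the means of the positive values differ?
    # Assumes a common variance; skipped if it cannot be estimated.
    lrt_continuous, df = 0.0, 1
    if pos_a and pos_b and len(pos_a) + len(pos_b) >= 3:
        rss_alt = rss(pos_a) + rss(pos_b)
        if rss_alt > 0:
            rss_null = rss(pos_a + pos_b)
            lrt_continuous = (len(pos_a) + len(pos_b)) * math.log(rss_null / rss_alt)
            df = 2

    stat = lrt_discrete + lrt_continuous
    # Closed-form chi-square upper tail for 1 or 2 degrees of freedom.
    p = math.exp(-stat / 2) if df == 2 else math.erfc(math.sqrt(stat / 2))
    return stat, p


if __name__ == "__main__":
    control = [0, 0, 0, 0, 1.0, 1.2]
    treated = [5.0, 5.5, 6.0, 6.2, 5.8, 0]
    print(combined_lrt(control, treated))
```

Summing the two statistics lets a gene register as differentially expressed when the groups differ in detection frequency, in the mean of the detected values, or in both. Under the assumptions stated here, this is why a combined test can be more sensitive than either component on its own.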
Availability: All results presented here were obtained using the SingleCellAssay R package available on GitHub (http://github.com/RGLab/SingleCellAssay).
Contact: rgottard@fhcrc.org
Supplementary information: Supplementary data are available at Bioinformatics online.
Oxford University Press