Comparative analysis of statistical methods used for detecting differential expression in label-free mass spectrometry proteomics

SR Langley, M Mayr - Journal of proteomics, 2015 - Elsevier
Abstract
Label-free LC-MS/MS proteomics has proven to be a powerful method for identifying and quantifying proteins in complex samples. For comparative proteomics, several methods have been used to detect the differential expression of proteins from such data. We have assessed seven methods used across the literature for detecting differential expression from spectral count quantification: Student's t-test, significance analysis of microarrays (SAM), normalised spectral abundance factor (NSAF), normalised spectral abundance factor-power law global error model (NSAF-PLGEM), spectral index (SpI), DESeq and QSpec. We used 2000 simulated datasets as well as publicly available data from a proteomic standards study to assess the ability of these methods to detect differential expression across varying effect sizes and proportions of differentially expressed proteins. At two false discovery rate (FDR) levels, we find that several of the methods detect differential expression within the data with reasonable precision, others detect differential expression at the expense of low precision, and still others fail to identify any differentially expressed proteins. The inability of these seven methods to fully capture the differential landscape, even at the largest effect size, illustrates some of the limitations of the existing technologies and the statistical methodologies.
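As a rough illustration of two ingredients named in the abstract, the sketch below computes NSAF values for a single sample and adjusts a list of p-values for FDR control. It is not the authors' pipeline. The NSAF formula used is the standard definition (each protein's spectral count divided by its length, normalised by the sum of those ratios over all proteins in the sample). The abstract does not say which FDR procedure was applied, so the Benjamini-Hochberg adjustment here is an assumption, and the function names `nsaf` and `benjamini_hochberg` are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalised spectral abundance factor for the proteins in one sample.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)
    """
    if len(spectral_counts) != len(lengths):
        raise ValueError("spectral_counts and lengths must have equal length")
    # Spectral abundance factor: spectral count per unit of protein length.
    saf = [count / length for count, length in zip(spectral_counts, lengths)]
    total = sum(saf)
    if total == 0:
        raise ValueError("all spectral counts are zero")
    return [value / total for value in saf]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Assumption: BH is used for illustration only; the abstract does not
    name the FDR procedure applied in the study.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A protein would then be called differentially expressed when its adjusted p-value falls below the chosen FDR level; the p-values themselves would come from whichever test is being evaluated (for example, a t-test on NSAF values across replicates).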
Significance
In label-free mass spectrometry experiments, protein identification and quantification have always been important, but there is now a growing focus on comparative proteomics. Detecting differential expression in protein levels can shed light on important biological mechanisms and provide direction for further study. Given the high cost and labour-intensive nature of validation experiments, statistical methods are important for prioritising proteins of interest. Here, we have performed a comparative analysis of the statistical methodologies for detecting differential expression, providing a reference for future experimental designs.
This article is part of a Special Issue entitled: Computational Proteomics.