Hybrid feature detection and information accumulation using high-resolution LC–MS metabolomics data

T Yu, Y Park, S Li, DP Jones - Journal of proteome research, 2013 - ACS Publications
Feature detection is a critical step in the preprocessing of liquid chromatography–mass spectrometry (LC–MS) metabolomics data. Currently, the predominant approach is to detect features using noise filters and peak shape models based on the data at hand alone. Databases of known metabolites and historical data contain information that could help boost the sensitivity of feature detection, especially for low-concentration metabolites. However, utilizing such information in targeted feature detection may cause a large number of false positives because of the high levels of noise in LC–MS data. With high-resolution mass spectrometry such as liquid chromatography–Fourier transform mass spectrometry (LC–FTMS), high-confidence matching of peaks to known features is feasible. Here we describe a computational approach that serves two purposes. First, it boosts feature detection sensitivity by using a hybrid procedure of both untargeted and targeted peak detection. New algorithms are designed to reduce the chance of false positives by nonparametric local peak detection and filtering. Second, it can accumulate information on the concentration variation of metabolites over a large number of samples, which can help find rare features and/or features with uncommon concentrations in future studies. Information can be accumulated on features that are consistently found in real data even before their identities are found. We demonstrate the value of the approach in a proof-of-concept study. The method is implemented as part of the R package apLCMS at http://www.sph.emory.edu/apLCMS/.
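
The matching of peaks to known features by accurate mass can be illustrated with a minimal sketch. This is not the apLCMS implementation, which is written in R and uses its own algorithms; the function names, the 10 ppm tolerance, and the m/z values below are all hypothetical and chosen only for illustration. The sketch shows one common way to pair detected m/z values with a table of known m/z values using a parts-per-million window.

```python
from bisect import bisect_left, bisect_right


def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a ppm tolerance around mz."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def match_to_known(detected_mz, known_mz, tol_ppm=10.0):
    """Match each known m/z to the detected peaks within tol_ppm.

    Returns a dict mapping each known m/z to the list of detected m/z
    values inside its tolerance window (the list may be empty).
    """
    detected = sorted(detected_mz)  # sorting allows binary search
    matches = {}
    for target in known_mz:
        low, high = ppm_window(target, tol_ppm)
        i = bisect_left(detected, low)    # first peak >= low
        j = bisect_right(detected, high)  # one past the last peak <= high
        matches[target] = detected[i:j]
    return matches


# Hypothetical values, for illustration only (not from the study).
known = [180.0634, 204.0899]
detected = [180.0630, 180.0700, 204.0905, 300.1234]
result = match_to_known(detected, known, tol_ppm=10.0)
```

A narrower window admits fewer spurious matches, which is why high mass resolution makes this kind of targeted lookup more reliable. A real pipeline would also need to consider retention time and peak shape, which this sketch omits.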