Abstract
Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those composed of thousands of cells. This process, however, remains challenging for smaller samples, such as the proteomes of single mammalian cells, because lower protein levels reduce the number of confidently sequenced peptides. To alleviate this reduction, we developed Data-driven Alignment of Retention Times for IDentification (DART-ID). This method implements global retention time (RT) alignment to infer peptide RTs across experiments. DART-ID then incorporates the global RT estimates within a principled Bayesian framework to increase the confidence in correct peptide-spectrum matches (PSMs) and decrease the confidence in incorrect PSMs. Applying DART-ID to hundreds of monocyte and T-cell samples prepared by the Single Cell Proteomics by Mass Spectrometry (SCoPE-MS) design increased the number of data points by 30-50% at 1% FDR, and thus decreased missing data. Quantification benchmarks indicate that peptides upgraded by DART-ID are quantified accurately and support their utility for downstream analysis, such as identifying cell types and cell-type-specific proteins. The additional data points provided by DART-ID boost the statistical power and double the number of proteins identified as differentially abundant in monocytes and T-cells. DART-ID can be applied to diverse experimental designs and is freely available at: http://github.com/SlavovLab/DART-ID.
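The Bayesian update described above can be illustrated with a minimal sketch. It is not the DART-ID implementation: the function names, the normal density for correct matches, and the uniform density for incorrect matches are all assumptions made for illustration. The idea shown is only that a match's posterior error probability (PEP) falls when the observed RT is close to the aligned RT estimate and rises when it is far from it.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update_pep(pep, observed_rt, predicted_rt, sigma, run_length):
    """Combine a spectral error probability with RT evidence via Bayes' rule.

    Assumptions (hypothetical, for illustration only):
      - correct matches elute near predicted_rt, modeled as Normal(predicted_rt, sigma)
      - incorrect matches elute uniformly over a run of length run_length
    All times are in the same unit (e.g. minutes).
    """
    like_correct = normal_pdf(observed_rt, predicted_rt, sigma)
    like_incorrect = 1.0 / run_length

    numerator = pep * like_incorrect
    denominator = numerator + (1.0 - pep) * like_correct
    if denominator == 0.0:
        # Both terms vanished (e.g. pep == 0 and the normal density
        # underflowed); RT carries no usable evidence, so keep the prior.
        return pep
    return numerator / denominator


# An ambiguous match (PEP = 0.5) observed at its predicted RT becomes
# more confident; the same match observed 10 minutes away becomes less so.
near = update_pep(0.5, observed_rt=30.0, predicted_rt=30.0, sigma=0.5, run_length=60.0)
far = update_pep(0.5, observed_rt=40.0, predicted_rt=30.0, sigma=0.5, run_length=60.0)
print(f"PEP near predicted RT: {near:.4f}")
print(f"PEP far from predicted RT: {far:.4f}")
```

In this sketch the width `sigma` controls how strongly RT evidence shifts the confidence: a tighter alignment gives a sharper density around the predicted RT and therefore a larger update in either direction.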
Footnotes
We performed additional analyses to demonstrate the quantification accuracy of peptides upgraded by DART-ID and their utility for downstream analysis, such as identifying cell types and cell-type-specific proteins. Furthermore, we demonstrated the utility of DART-ID for analyzing bulk LC-MS/MS data.