Abstract
Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those composed of thousands of cells. Identifying proteins by LC-MS/MS proteomics, however, remains challenging for low-abundance samples, such as the proteomes of single mammalian cells. To increase the identification rate of peptides in such small samples, we developed DART-ID. This method implements a data-driven, global retention time (RT) alignment process to infer peptide RTs across experiments. DART-ID then incorporates the global RT estimates within a principled Bayesian framework to increase the confidence in correct peptide-spectrum matches. Applying DART-ID to hundreds of samples prepared by the Single Cell Proteomics by Mass Spectrometry (SCoPE-MS) design increased the peptide and proteome coverage by 30–50% at 1% FDR. The newly identified peptides and proteins were further validated by demonstrating that their quantification is consistent with the quantification of peptides identified from high-quality spectra. DART-ID can be applied to a variety of experimental designs with similar sample complexities and chromatography conditions, and is freely available online.
Author Summary
Identifying and quantifying proteins in single cells gives researchers the ability to tackle complex biological problems that involve single-cell heterogeneity, such as the treatment of solid tumors. However, the mass spectra from analysis of single cells do not support sequence identification for all analyzed peptides. To improve identification rates, we utilize the retention time of peptide sequences from liquid chromatography, a process used to separate peptides before their analysis by mass spectrometry. We present both a novel method of aligning the retention times of peptides across experiments and a rigorous framework for using the estimated retention times to enhance peptide sequence identification. Incorporating the retention time as additional evidence leads to a substantial increase in the number of proteins that can be quantified and biologically analyzed by single-cell mass spectrometry.
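The idea of treating retention time as additional evidence can be illustrated with a minimal Bayesian update. The sketch below is not the DART-ID implementation. It assumes, purely for illustration, that the RT of a correct match is normally distributed around the aligned (predicted) RT, and that the RT of an incorrect match follows a broad normal distribution over the whole run. The function names and all parameter values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def update_pep(spectral_pep, observed_rt, predicted_rt, rt_sd, null_mean, null_sd):
    """Combine a spectrum-based error probability with RT evidence via Bayes' rule.

    spectral_pep : prior probability that the match is incorrect (from the spectrum alone)
    observed_rt  : RT at which the peptide was observed in this run
    predicted_rt : RT expected for this peptide after alignment (assumed given)
    rt_sd        : assumed spread of RTs for correct matches
    null_mean, null_sd : assumed broad RT distribution for incorrect matches
    """
    like_correct = normal_pdf(observed_rt, predicted_rt, rt_sd)
    like_incorrect = normal_pdf(observed_rt, null_mean, null_sd)
    numerator = spectral_pep * like_incorrect
    denominator = numerator + (1.0 - spectral_pep) * like_correct
    return numerator / denominator


# Hypothetical values: RT in minutes, a run centred at 30 min.
near = update_pep(0.05, observed_rt=30.1, predicted_rt=30.0, rt_sd=0.3,
                  null_mean=30.0, null_sd=10.0)
far = update_pep(0.05, observed_rt=31.5, predicted_rt=30.0, rt_sd=0.3,
                 null_mean=30.0, null_sd=10.0)
print(near < 0.05, far > 0.05)
```

Under these assumptions, an observation close to the predicted RT lowers the error probability of the match, while an observation far from it raises the error probability. This is the sense in which RT acts as evidence independent of the spectrum.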