Abstract
Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms (refs. 1–6). Widely used algorithms, however, do not fully exploit the intensity patterns present in mass spectra. Here, we demonstrate that modeling these intensity patterns improves peptide and protein identification from MS/MS spectra. We modeled fragment ion intensities using a machine-learning approach that estimates the likelihood of the observed intensities given peptide and fragment attributes. From 1,000,000 spectra, we selected 27,000 with high-quality, nonredundant peptide matches as training data. Using the same 27,000 spectra, we similarly modeled intensities for mismatched peptides. We used these two probabilistic models to compute the relative likelihood of an observed spectrum given that a candidate peptide is matched versus mismatched. Using a 'decoy' proteome approach to estimate the frequency of incorrect matches (ref. 7), we demonstrated that an intensity-based method reduces peptide identification error by 50–96% without any loss in sensitivity.
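The scoring idea described above compares how probable the observed fragment intensities are under a "matched" model versus a "mismatched" model, which amounts to a log-likelihood ratio. The sketch below is not the authors' implementation. It assumes a hypothetical coarse intensity binning (`bin_intensity`), hypothetical fragment attributes such as ion type paired with the residue N-terminal to the cleavage site, and simple count-based conditional probability tables with add-one smoothing in place of the paper's machine-learning model. It also treats fragments as independent. The decoy-based error estimate (`decoy_error_estimate`, a hypothetical helper) assumes a decoy proteome of the same size as the target, searched separately, so that decoy hits above a threshold approximate incorrect target hits.

```python
import math
from collections import defaultdict

# Hypothetical coarse intensity bins; the paper's actual discretization is not given here.
BINS = ("absent", "low", "medium", "high")


def bin_intensity(rel_intensity):
    """Map an intensity relative to the base peak (0 to 1) to a coarse bin."""
    if rel_intensity <= 0.0:
        return "absent"
    if rel_intensity < 0.1:
        return "low"
    if rel_intensity < 0.5:
        return "medium"
    return "high"


class IntensityModel:
    """Count-based estimate of P(intensity bin | fragment attributes).

    Attributes can be any hashable tuple, e.g. a hypothetical
    (ion type, residue N-terminal to the cleavage site) pair.
    """

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, observations):
        """observations: iterable of (attributes, relative_intensity) pairs."""
        for attrs, rel in observations:
            self.counts[attrs][bin_intensity(rel)] += 1
        return self

    def log_prob(self, attrs, rel):
        """Log probability of the observed bin, with add-one (Laplace) smoothing."""
        table = self.counts.get(attrs, {})
        total = sum(table.values())
        hits = table.get(bin_intensity(rel), 0)
        return math.log((hits + 1) / (total + len(BINS)))


def log_likelihood_ratio(fragments, matched, mismatched):
    """Score = sum over fragments of log P(I | matched) - log P(I | mismatched).

    Treats fragments as independent, a simplifying assumption of this sketch.
    """
    return sum(matched.log_prob(a, i) - mismatched.log_prob(a, i) for a, i in fragments)


def decoy_error_estimate(target_scores, decoy_scores, threshold):
    """Estimated fraction of incorrect target identifications at or above a threshold.

    Assumes the decoy proteome is the same size as the target and searched
    separately, so decoy hits above threshold approximate incorrect target hits.
    """
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0


if __name__ == "__main__":
    # Toy training data: (attributes, relative intensity) for correct and incorrect matches.
    matched = IntensityModel().fit([(("y", "P"), 0.8), (("y", "P"), 0.6), (("b", "G"), 0.05)])
    mismatched = IntensityModel().fit([(("y", "P"), 0.0), (("y", "P"), 0.02), (("b", "G"), 0.0)])
    spectrum = [(("y", "P"), 0.7), (("b", "G"), 0.03)]
    print(log_likelihood_ratio(spectrum, matched, mismatched))  # log(6), about 1.79
    print(decoy_error_estimate([3.0, 2.0, 1.0, -1.0], [1.5, -0.5, -2.0], threshold=1.0))
```

A positive score means the observed intensities are more probable under the matched model than under the mismatched one. Sweeping the threshold in `decoy_error_estimate` trades identification error against the number of accepted target matches.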
References
1. Pandey, A. & Mann, M. Proteomics to study genes and genomes. Nature 405, 837–846 (2000).
2. Mann, M., Hendrickson, R.C. & Pandey, A. Analysis of proteins and proteomes by mass spectrometry. Annu. Rev. Biochem. 70, 437–473 (2001).
3. Aebersold, R. & Goodlett, D.R. Mass spectrometry in proteomics. Chem. Rev. 101, 269–295 (2001).
4. Aebersold, R. & Mann, M. Mass spectrometry-based proteomics. Nature 422, 198–207 (2003).
5. Tyers, M. & Mann, M. From genomics to proteomics. Nature 422, 193–197 (2003).
6. Gay, S., Binz, P.A., Hochstrasser, D.F. & Appel, R.D. Peptide mass fingerprinting peak intensity prediction: extracting knowledge from spectra. Proteomics 2, 1374–1391 (2002).
7. Peng, J., Elias, J.E., Thoreen, C.C., Licklider, L.J. & Gygi, S.P. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J. Proteome Res. 2, 43–50 (2003).
8. Eng, J., McCormack, A. & Yates, J.R. 3rd. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994).
9. Jensen, F.V. Bayesian Networks and Decision Graphs (Springer, New York, 2001).
10. King, O.D., Foulger, R.E., Dwight, S.S., White, J.V. & Roth, F.P. Predicting gene function from patterns of annotation. Genome Res. 13, 896–904 (2003).
11. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948).
12. Papayannopoulos, I.A. The interpretation of collision-induced dissociation tandem mass spectra of peptides. Mass Spectrom. Rev. 14, 49–73 (1995).
13. Breci, L.A., Tabb, D.L., Yates, J.R. 3rd & Wysocki, V.H. Cleavage N-terminal to proline: analysis of a database of peptide tandem mass spectra. Anal. Chem. 75, 1963–1971 (2003).
14. Tabb, D.L. et al. Statistical characterization of ion trap tandem mass spectra from doubly charged tryptic peptides. Anal. Chem. 75, 1155–1163 (2003).
15. Keller, A., Nesvizhskii, A.I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392 (2002).
16. Florens, L. et al. A proteomic view of the Plasmodium falciparum life cycle. Nature 419, 520–526 (2002).
17. Perkins, D., Pappin, D., Creasy, D. & Cottrell, J. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567 (1999).
18. Peng, J. & Gygi, S.P. Proteomics: the move to mixtures. J. Mass Spectrom. 36, 1083–1091 (2001).
19. Harrison, A.G. The gas-phase basicities and proton affinities of amino acids and peptides. Mass Spectrom. Rev. 16, 201–217 (1997).
20. Deber, C.M. et al. TM Finder: a prediction program for transmembrane protein segments using a combination of hydrophobicity and nonpolar phase helicity scales. Protein Sci. 10, 212–219 (2001).
21. Washburn, M., Wolters, D. & Yates, J.R. 3rd. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 19, 242–247 (2001).
Acknowledgements
We thank the past and present members of the Gygi Lab and the Taplin Biological Mass Spectrometry Facility for generating the spectra used in this study. We gratefully acknowledge John Cottrell of Matrix Science for examining our test data set with the Mascot algorithm. Particular thanks go to S. Gerber, J. Jebanathirajah and H. Steen for insightful comments during manuscript preparation. This work was supported in part by National Institutes of Health (NIH) grant HG00041 (S.P.G.), NIH National Research Service Award 5T32CA86878 from the National Cancer Institute (J.E.E.), an institutional grant from the Howard Hughes Medical Institute (F.P.R., F.D.G.), and a National Research Service Award fellowship from the NIH/National Human Genome Research Institute (O.D.K.).
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Cite this article
Elias, J., Gibbons, F., King, O. et al. Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nat Biotechnol 22, 214–219 (2004). https://doi.org/10.1038/nbt930
DOI: https://doi.org/10.1038/nbt930