Probability-based validation of protein identifications using a modified SEQUEST algorithm

Anal Chem. 2002 Nov 1;74(21):5593-9. doi: 10.1021/ac025826t.

Abstract

Database-searching algorithms compatible with shotgun proteomics match a peptide tandem mass spectrum to the mass spectrum predicted for each amino acid sequence in a database. SEQUEST, one of the most commonly used algorithms for analyzing peptide tandem mass spectra, uses a cross-correlation (XCorr) scoring routine to match tandem mass spectra to model spectra derived from peptide sequences. To assess a match, SEQUEST uses the normalized difference between the XCorr values of the first- and second-ranked sequences (ΔCn). This value depends on the database size, the search parameters, and sequence homologies. In this report, we demonstrate a scoring routine (SEQUEST-NORM) that normalizes XCorr values so that they are independent of peptide size and of the database used for the search. This new scoring routine is used to objectively calculate the percent confidence of protein identifications and posttranslational modifications based solely on the XCorr value.
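The database dependence of ΔCn can be illustrated with a short Python sketch. It assumes spectra are already binned into equal-width intensity vectors. It uses a simplified XCorr: the zero-offset correlation minus the mean correlation over a window of offsets, defaulting to ±75 bins, which follows the general form of the original SEQUEST score. ΔCn is defined as (XCorr1 − XCorr2)/XCorr1. The function normalized_xcorr and its self-score denominator are hypothetical illustrations of size normalization. They are not the SEQUEST-NORM formula from this paper.

```python
from typing import Sequence


def dot_at_offset(a: Sequence[float], b: Sequence[float], tau: int) -> float:
    """Sum of a[i] * b[i + tau] over all i where both indices are valid."""
    total = 0.0
    for i, x in enumerate(a):
        j = i + tau
        if 0 <= j < len(b):
            total += x * b[j]
    return total


def xcorr(experimental: Sequence[float], theoretical: Sequence[float],
          max_offset: int = 75) -> float:
    """Simplified XCorr: zero-offset correlation minus the mean correlation
    over offsets -max_offset..+max_offset (inclusive). Assumes both spectra
    are already binned and preprocessed into intensity vectors."""
    r0 = dot_at_offset(experimental, theoretical, 0)
    offsets = range(-max_offset, max_offset + 1)
    background = sum(dot_at_offset(experimental, theoretical, t)
                     for t in offsets) / len(offsets)
    return r0 - background


def normalized_xcorr(experimental: Sequence[float],
                     theoretical: Sequence[float],
                     max_offset: int = 75) -> float:
    """HYPOTHETICAL normalization: divide by the theoretical spectrum's
    self-score so a perfect match scores 1.0 regardless of peptide length.
    This is an illustration, not necessarily the SEQUEST-NORM formula."""
    self_score = xcorr(theoretical, theoretical, max_offset)
    if self_score <= 0:
        raise ValueError("theoretical spectrum has no usable self-score")
    return xcorr(experimental, theoretical, max_offset) / self_score


def delta_cn(scores: Sequence[float]) -> float:
    """DeltaCn = (XCorr_1 - XCorr_2) / XCorr_1 for the two best candidates."""
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2 or ranked[0] <= 0:
        raise ValueError("need at least two candidates and a positive top score")
    return (ranked[0] - ranked[1]) / ranked[0]


# Same spectrum and same best hit; only the database contents differ.
small_db_scores = [3.0, 1.5]        # top hit plus an unrelated sequence
large_db_scores = [3.0, 1.5, 2.7]   # same database plus a close homolog
print(delta_cn(small_db_scores))            # 0.5
print(round(delta_cn(large_db_scores), 3))  # 0.1
```

In this example the top XCorr (3.0) is identical in both searches, yet ΔCn drops from 0.5 to 0.1 once a close homolog is added to the database. This is the database dependence that motivates a score computed from the XCorr value alone.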

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Animals
  • Databases, Factual*
  • Information Storage and Retrieval / methods*
  • Mass Spectrometry
  • Molecular Sequence Data
  • Peptides / chemistry*
  • Proteins / chemistry*
  • Sequence Homology, Amino Acid
  • Software

Substances

  • Peptides
  • Proteins