General framework for developing and evaluating database scoring algorithms using the TANDEM search engine

Bioinformatics. 2006 Nov 15;22(22):2830-2. doi: 10.1093/bioinformatics/btl379. Epub 2006 Jul 28.

Abstract

Motivation: Tandem mass spectrometry (MS/MS) identifies protein sequences using database search engines, at the core of which is a score that measures the similarity between peptide MS/MS spectra and the sequences in a protein sequence database. The TANDEM application was developed as a freely available database search engine for the proteomics research community. To extend TANDEM as a platform for further research on improved database scoring methods, we modified the software so that users can redefine the scoring function, replacing the native TANDEM score while leaving the rest of the core application intact. Redefinition is performed at run time, so multiple scoring functions can be selected and applied from a single search engine binary. We describe the implementation of the pluggable scoring algorithm and provide implementations of two TANDEM-compatible scoring functions: one previously described scoring function compatible with PeptideProphet, and one very simple scoring function that quantitative researchers may use as a starting point for their own development. This extension builds on the open-source TANDEM project and will facilitate research into, and dissemination of, novel algorithms for matching MS/MS spectra to peptide sequences. The pluggable scoring schema is also compatible with the related search applications P3 and Hunter, which are part of the X! suite of database matching algorithms. The pluggable scores and the X! suite of applications are all written in C++.
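The idea of selecting a scoring function at run time from a single binary can be illustrated with a minimal C++ sketch. This is not the TANDEM plug-in interface: the names `Scorer`, `ScorerRegistry`, `SharedPeakCount`, `MatchedIntensity` and `default_registry`, as well as the two toy scores themselves, are hypothetical and chosen only to show the general pattern of a scoring interface plus a name-to-factory registry.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; these are NOT the TANDEM class names.
struct Peak { double mz; double intensity; };
using Spectrum = std::vector<Peak>;

// Highest intensity among observed peaks within `tolerance` of `mz`, or 0 if none.
static double best_match(const Spectrum& observed, double mz, double tolerance) {
    double best = 0.0;
    for (const Peak& p : observed)
        if (std::fabs(p.mz - mz) <= tolerance && p.intensity > best) best = p.intensity;
    return best;
}

// Interface that every pluggable score implements (assumed design).
class Scorer {
public:
    virtual ~Scorer() = default;
    // Higher values mean a better match between the observed spectrum
    // and the theoretical fragment m/z values of a candidate peptide.
    virtual double score(const Spectrum& observed,
                         const std::vector<double>& theoretical,
                         double tolerance) const = 0;
};

// Very simple score: number of theoretical fragments matched by some peak.
class SharedPeakCount : public Scorer {
public:
    double score(const Spectrum& observed, const std::vector<double>& theoretical,
                 double tolerance) const override {
        double count = 0.0;
        for (double mz : theoretical)
            if (best_match(observed, mz, tolerance) > 0.0) count += 1.0;
        return count;
    }
};

// Alternative toy score: summed intensity of the matched peaks.
class MatchedIntensity : public Scorer {
public:
    double score(const Spectrum& observed, const std::vector<double>& theoretical,
                 double tolerance) const override {
        double total = 0.0;
        for (double mz : theoretical) total += best_match(observed, mz, tolerance);
        return total;
    }
};

// Maps a configuration string to a factory, so the score is chosen at run time.
class ScorerRegistry {
public:
    using Factory = std::function<std::unique_ptr<Scorer>()>;

    void add(const std::string& name, Factory factory) {
        assert(static_cast<bool>(factory));  // an empty factory is a programming error
        factories_[name] = std::move(factory);
    }

    std::unique_ptr<Scorer> create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end())
            throw std::invalid_argument("unknown scoring function: " + name);
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Registers both toy scores under hypothetical names.
inline ScorerRegistry default_registry() {
    ScorerRegistry registry;
    registry.add("shared-peaks", []() -> std::unique_ptr<Scorer> {
        return std::unique_ptr<Scorer>(new SharedPeakCount());
    });
    registry.add("matched-intensity", []() -> std::unique_ptr<Scorer> {
        return std::unique_ptr<Scorer>(new MatchedIntensity());
    });
    return registry;
}
```

In this sketch, a search loop would call `default_registry().create(name)` once, with `name` read from a configuration file, and then call `score` for every spectrum and candidate peptide pair. Adding a new score only requires a new `Scorer` subclass and one `add` call, which leaves the rest of the application untouched.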

Availability: Source code for the scoring functions is available from http://proteomics.fhcrc.org

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Computational Biology / methods*
  • Data Interpretation, Statistical
  • Databases, Factual*
  • Databases, Protein
  • Information Storage and Retrieval
  • Mass Spectrometry / methods*
  • Peptides
  • Programming Languages
  • Proteomics
  • Software

Substances

  • Peptides