Linear discriminant analysis-based estimation of the false discovery rate for phosphopeptide identifications

J Proteome Res. 2008 Jun;7(6):2195-203. doi: 10.1021/pr070510t. Epub 2008 Apr 19.

Abstract

The development of liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has made it possible to characterize phosphopeptides in an increasingly large-scale and high-throughput fashion. However, extracting confident phosphopeptide identifications from the resulting large data sets in a similar high-throughput fashion remains difficult, as does rigorously estimating the false discovery rate (FDR) of a set of phosphopeptide identifications. This article describes a data analysis pipeline designed to address these issues. The first step is to reanalyze phosphopeptide identifications that contain ambiguous assignments for the incorporated phosphate(s) to determine the most likely arrangement of the phosphate(s). The next step is to employ an expectation maximization algorithm to estimate the joint distribution of the peptide scores. A linear discriminant analysis is then performed to determine how to optimally combine peptide scores (in this case, from SEQUEST) into a discriminant score that possesses the maximum discriminating power. Based on this discriminant score, the p- and q-values for each phosphopeptide identification are calculated, and the phosphopeptide identification FDR is then estimated. This data analysis approach was applied to data from a study of irradiated human skin fibroblasts to provide a robust estimate of FDR for phosphopeptides. The Phosphopeptide FDR Estimator software is freely available for download at http://ncrr.pnl.gov/software/.
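The scoring steps named in the abstract (a linear discriminant that combines peptide scores, then p- and q-values computed from the discriminant score) can be illustrated with a minimal sketch. The abstract does not give the algorithmic details, so everything below is an assumption made for illustration: two scores per identification (labelled XCorr and DeltaCn here, which SEQUEST reports, though the abstract does not say which scores were combined), Gaussian "incorrect" and "correct" components whose parameters are treated as already estimated by expectation maximization, a shared covariance matrix, and a conservative proportion of incorrect identifications pi0 = 1. All numeric values and function names are hypothetical, and this is not the Phosphopeptide FDR Estimator implementation.

```python
import math

def lda_weights(mu_null, mu_alt, cov):
    """Fisher direction w = cov^-1 (mu_alt - mu_null) for two scores (2x2 covariance)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = mu_alt[0] - mu_null[0], mu_alt[1] - mu_null[1]
    return ((d * dx - b * dy) / det, (-c * dx + a * dy) / det)

def discriminant(w, x):
    """Project a pair of peptide scores onto the discriminant direction."""
    return w[0] * x[0] + w[1] * x[1]

def null_sd(w, cov):
    """Standard deviation of the discriminant score under the null: sqrt(w^T cov w)."""
    (a, b), (c, d) = cov
    return math.sqrt(w[0] * (a * w[0] + b * w[1]) + w[1] * (c * w[0] + d * w[1]))

def null_p_value(score, mean0, sd0):
    """Upper-tail p-value of a discriminant score under a normal null."""
    z = (score - mean0) / sd0
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def q_values(p_values, pi0=1.0):
    """Step-up q-values: q_(i) = min over j >= i of pi0 * m * p_(j) / j."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[idx] / rank)
        q[idx] = running_min
    return q

# Hypothetical component parameters, standing in for an EM fit.
mu_incorrect = (1.5, 0.05)              # (XCorr, DeltaCn) mean, incorrect IDs
mu_correct = (3.5, 0.30)                # (XCorr, DeltaCn) mean, correct IDs
shared_cov = ((0.6, 0.02), (0.02, 0.01))

# Hypothetical (XCorr, DeltaCn) pairs for four identifications.
peptides = [(4.1, 0.35), (1.4, 0.04), (2.9, 0.20), (1.9, 0.08)]

w = lda_weights(mu_incorrect, mu_correct, shared_cov)
mean0 = discriminant(w, mu_incorrect)
sd0 = null_sd(w, shared_cov)
p = [null_p_value(discriminant(w, x), mean0, sd0) for x in peptides]
q = q_values(p)

for x, p_i, q_i in zip(peptides, p, q):
    print(x, round(p_i, 4), round(q_i, 4))
```

Under these assumptions, accepting every identification with a q-value at or below a chosen threshold gives a set whose estimated FDR is at most that threshold. Setting pi0 = 1 makes the estimate conservative; a mixture-model fit would normally supply a smaller pi0.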

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Data Interpretation, Statistical
  • Discriminant Analysis
  • Fibroblasts / chemistry
  • Fibroblasts / cytology
  • Fibroblasts / radiation effects
  • Humans
  • Internet
  • Mass Spectrometry / statistics & numerical data*
  • Normal Distribution
  • Phosphopeptides / analysis*
  • Proteomics / methods*
  • ROC Curve
  • Reproducibility of Results
  • Skin / cytology
  • Software

Substances

  • Phosphopeptides