Abstract
The analysis of shotgun proteomics data often involves generating lists of inferred peptide-spectrum matches (PSMs) and/or peptides. The canonical approach to generating these discovery lists is to control the false discovery rate (FDR), most commonly through target-decoy competition (TDC). At the PSM level, TDC is implemented by competing each spectrum’s best-scoring target (real) peptide match against its best-scoring match in a decoy database. This PSM-level procedure can be adapted to the peptide level by selecting the top-scoring PSM per peptide prior to FDR estimation. Here we first highlight and empirically augment a little-known previous work by He et al., which showed that TDC-based PSM-level FDR estimates can be liberally biased. We therefore propose that researchers instead focus on peptide-level analysis. We then investigate three ways to carry out peptide-level TDC and show that the most common method (“PSM-only”) offers the lowest statistical power in practice. An alternative approach that carries out competition at both the PSM and the peptide level (“PSM-and-peptide”) is the most powerful method, yielding on average 17% more discovered peptides at a 1% FDR threshold than the PSM-only method.
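To make the procedure described above concrete, the following is a minimal Python sketch of PSM-level competition followed by selection of the top-scoring PSM per peptide and a TDC-style FDR cutoff. It is an illustration under stated assumptions, not the authors' implementation: the `PSM` record, the function names, and the FDR estimate (decoys + 1) / targets are assumptions made here for illustration, higher scores are assumed to be better, and ties are ignored.

```python
from collections import namedtuple

# Hypothetical record type for illustration; field names are assumptions.
PSM = namedtuple("PSM", ["spectrum", "peptide", "score", "is_decoy"])


def best_by(matches, key):
    """Keep the single highest-scoring match for each value of `key`."""
    best = {}
    for m in matches:
        k = getattr(m, key)
        if k not in best or m.score > best[k].score:
            best[k] = m
    return list(best.values())


def psm_level_competition(matches):
    """Per spectrum, the best target competes with the best decoy;
    only the overall winner (target or decoy) is kept."""
    return best_by(matches, "spectrum")


def top_psm_per_peptide(winners):
    """Reduce winning PSMs to one top-scoring PSM per peptide."""
    return best_by(winners, "peptide")


def tdc_discoveries(winners, alpha):
    """Return the target winners above the lowest score cutoff at which
    the estimated FDR is <= alpha.

    Assumption: FDR is estimated as (decoys + 1) / targets among the
    winners scoring at or above the cutoff.
    """
    ranked = sorted(winners, key=lambda m: m.score, reverse=True)
    decoys = targets = cutoff = 0
    for i, m in enumerate(ranked, start=1):
        if m.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and (decoys + 1) / targets <= alpha:
            cutoff = i  # largest prefix that still meets the threshold
    return [m for m in ranked[:cutoff] if not m.is_decoy]


def peptide_level_tdc(matches, alpha):
    """PSM-level competition, then one PSM per peptide, then the cutoff."""
    return tdc_discoveries(top_psm_per_peptide(psm_level_competition(matches)), alpha)
```

Calling `peptide_level_tdc(matches, alpha=0.01)` on a list of `PSM` records would return the accepted target peptides' top PSMs. Note that with this estimate at least 100 targets must outscore the first decoy before anything can be reported at a 1% threshold.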
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Section on entrapment experiment updated to clarify how entrapment experiments work; Supplemental file updated to include algorithm description; Figure 1 added