New Results

Multi-reference spectral library yields almost complete coverage of heterogeneous LC-MS/MS data sets

Constantin Ammar, Evi Berchtold, Gergely Csaba, Andreas Schmidt, Axel Imhof, Ralf Zimmer
doi: https://doi.org/10.1101/180448
Constantin Ammar
1 Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
2 Graduate School of Quantitative Biosciences, Ludwig-Maximilians-Universität München, Feodor-Lynen-Str. 25, 81337 München, Germany
Evi Berchtold
1 Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
Gergely Csaba
1 Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
Andreas Schmidt
3 Zentrallabor für Proteinanalytik (Protein Analysis Unit), Ludwig-Maximilians-Universität München, Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany
Axel Imhof
2 Graduate School of Quantitative Biosciences, Ludwig-Maximilians-Universität München, Feodor-Lynen-Str. 25, 81337 München, Germany
3 Zentrallabor für Proteinanalytik (Protein Analysis Unit), Ludwig-Maximilians-Universität München, Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany
Ralf Zimmer
1 Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
2 Graduate School of Quantitative Biosciences, Ludwig-Maximilians-Universität München, Feodor-Lynen-Str. 25, 81337 München, Germany
  • For correspondence: Ralf.Zimmer@bio.ifi.lmu.de

Summary

Spectral libraries play a central role in the analysis of data independent acquisition (DIA) proteomics experiments. DIA experiments require spectral libraries, as most current methods cannot apply traditional peptide identification via database searching on DIA data. A central assumption in current spectral library tools is that a single characteristic intensity pattern (CIP) suffices to describe the fragmentation of an unmodified peptide in a particular parent ion charge state (peptide charge pair). However, we find that this is often not the case.

We analyze a heterogeneous dataset of 440,000 MaxQuant-preprocessed peptide spectra from a QToF mass spectrometer, stemming from over 100 different LC-MS/MS runs. The dataset corresponds to 10,580 peptide charge pairs, each of which has been measured and identified at least 20 times. We demonstrate that the same charged and unmodified peptide can fragment in multiple reproducible ways, even within the same LC-MS/MS run. We integrate multiple reference CIPs (MCIPs) in our model library and observe a >99% coverage of replicate fragmentation spectra for 95% of peptide charge pairs (using up to four CIPs). Using a single CIP (as in current spectral library approaches), we find >99% coverage for only 50% of the peptide charge pairs.
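
To make the notion of coverage concrete, the following is a minimal sketch and not the authors' implementation. It assumes that replicate spectra of one peptide charge pair are given as fragment intensity vectors of equal length, that a replicate counts as covered when its cosine similarity to at least one reference CIP reaches a threshold (0.9 here, chosen arbitrarily), and that reference CIPs are picked greedily from the replicates themselves. The function names `select_cips` and `coverage`, the threshold, and the toy intensities are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two intensity vectors of equal length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def select_cips(spectra, max_cips=4, threshold=0.9):
    """Greedy selection (an assumption, not the published method):
    repeatedly pick the replicate that covers the most uncovered replicates."""
    uncovered = set(range(len(spectra)))
    cips = []
    while uncovered and len(cips) < max_cips:
        best = max(
            uncovered,
            key=lambda i: sum(
                cosine(spectra[i], spectra[j]) >= threshold for j in uncovered
            ),
        )
        cips.append(spectra[best])
        uncovered -= {
            j for j in uncovered if cosine(spectra[best], spectra[j]) >= threshold
        }
    return cips


def coverage(spectra, cips, threshold=0.9):
    """Fraction of replicates matched by at least one reference CIP."""
    covered = sum(
        any(cosine(s, c) >= threshold for c in cips) for s in spectra
    )
    return covered / len(spectra)


# Toy replicates with two distinct fragmentation patterns (made-up numbers).
replicates = [
    [10.0, 1.0, 0.5],
    [9.0, 1.2, 0.4],
    [11.0, 0.9, 0.6],
    [1.0, 8.0, 7.0],
    [1.2, 9.0, 6.5],
]

single = select_cips(replicates, max_cips=1)
multi = select_cips(replicates, max_cips=4)
print(coverage(replicates, single))  # coverage with one reference pattern
print(coverage(replicates, multi))   # coverage with up to four patterns
```

In this toy case a single reference pattern can only match the replicates of one fragmentation mode, whereas allowing several reference patterns lets each mode be represented separately, which is the effect the summary describes at dataset scale.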

Our approach achieves substantially greater sensitivity in comparison to the popular SpectraST library generation tool. Using randomized decoy spectra, we demonstrate that identification accuracy of the MCIP approach is improved by up to 12% compared to a single CIP approach. We test the MCIP approach on a SWATH data set and observe a ∼30% increase in peptide recognition. We conclude that including MCIPs in spectral library approaches would yield increased sensitivity without compromising the false discovery rate.

Abbreviations

DIA: data independent acquisition

DDA: data dependent acquisition

SRM: selected reaction monitoring

MRM: multiple reaction monitoring

PRM: parallel reaction monitoring

SWATH: sequential window acquisition of all theoretical mass spectra

CIP: characteristic intensity pattern

MCIP: multiple characteristic intensity patterns

FA: formic acid

TFA: trifluoroacetic acid

ACN: acetonitrile

PSM: peptide spectrum match

cps: counts per second

NRFV: normalized replicate fragmentation vector

TPP: Trans-Proteomic Pipeline

SVM: support vector machine

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted August 26, 2017.

Subject Area

  • Bioinformatics