RT Journal Article
SR Electronic
T1 Multi-reference spectral library yields almost complete coverage of heterogeneous LC-MS/MS data sets
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 180448
DO 10.1101/180448
A1 Constantin Ammar
A1 Evi Berchtold
A1 Gergely Csaba
A1 Andreas Schmidt
A1 Axel Imhof
A1 Ralf Zimmer
YR 2017
UL http://biorxiv.org/content/early/2017/08/26/180448.abstract
AB Spectral libraries play a central role in the analysis of data independent acquisition (DIA) proteomics experiments. DIA experiments require spectral libraries, as most current methods cannot apply traditional peptide identification via database searching on DIA data. A central assumption in current spectral library tools is that a single characteristic intensity pattern (CIP) suffices to describe the fragmentation of an unmodified peptide in a particular parent ion charge state (peptide charge pair). However, we find that this is often not the case. We analyze a heterogeneous dataset of 440,000 MaxQuant-preprocessed peptide spectra from a QToF mass spectrometer, stemming from over 100 different LC-MS/MS runs. The dataset corresponds to 10,580 peptide charge pairs, each of which has been measured and identified at least 20 times. We demonstrate that the same charged and unmodified peptide can fragment in multiple reproducible ways, even within the same LC-MS/MS run. We integrate multiple reference CIPs (MCIPs) in our model library and observe a >99% coverage of replicate fragmentation spectra for 95% of peptide charge pairs (using up to four CIPs). Using a single CIP (as in current spectral library approaches), we find >99% coverage for only 50% of the peptide charge pairs. Our approach achieves substantially greater sensitivity in comparison to the popular SpectraST library generation tool. Using randomized decoy spectra, we demonstrate that identification accuracy of the MCIP approach is improved by up to 12% compared to a single CIP approach.
We test the MCIP approach on a SWATH data set and observe a ∼30% increase in peptide recognition. We conclude that including MCIPs in spectral library approaches would yield increased sensitivity without compromising the false discovery rate. Abbreviations: DIA: data independent acquisition; DDA: data dependent acquisition; SRM: selected reaction monitoring; MRM: multiple reaction monitoring; PRM: parallel reaction monitoring; SWATH: sequential window acquisition of all theoretical mass spectra; CIP: characteristic intensity pattern; MCIP: multiple characteristic intensity patterns; FA: formic acid; TFA: trifluoroacetic acid; ACN: acetonitrile; PSM: peptide spectrum match; cps: counts per second; NRFV: normalized replicate fragmentation vector; TPP: trans proteomic pipeline; SVM: support vector machine