PT - JOURNAL ARTICLE
AU - Florian Meier
AU - Niklas D. Köhler
AU - Andreas-David Brunner
AU - Jean-Marc H. Wanka
AU - Eugenia Voytik
AU - Maximilian T. Strauss
AU - Fabian J. Theis
AU - Matthias Mann
TI - Deep learning the collisional cross sections of the peptide universe from a million training samples
AID - 10.1101/2020.05.19.102285
DP - 2020 Jan 01
TA - bioRxiv
PG - 2020.05.19.102285
4099 - http://biorxiv.org/content/early/2020/05/21/2020.05.19.102285.short
4100 - http://biorxiv.org/content/early/2020/05/21/2020.05.19.102285.full
AB - The size and shape of peptide ions in the gas phase are an under-explored dimension for mass spectrometry-based proteomics. To explore the nature and utility of the entire peptide collisional cross section (CCS) space, we measure more than a million data points from whole-proteome digests of five organisms with trapped ion mobility spectrometry (TIMS) and parallel accumulation – serial fragmentation (PASEF). The scale and precision (CV <1%) of our data is sufficient to train a deep recurrent neural network that accurately predicts CCS values solely based on the peptide sequence. Cross section predictions for the synthetic ProteomeTools library validate the model within a 1.3% median relative error (R > 0.99). Hydrophobicity, position of prolines and histidines are main determinants of the cross sections in addition to sequence-specific interactions. CCS values can now be predicted for any peptide and organism, forming a basis for advanced proteomics workflows that make full use of the additional information. Competing Interest Statement: The authors have declared no competing interest.