RT Journal Article SR Electronic T1 Spectrum graph-based de-novo sequencing algorithm MaxNovo achieves high peptide identification rates in collisional dissociation MS/MS spectra JF bioRxiv FD Cold Spring Harbor Laboratory SP 2021.09.04.458985 DO 10.1101/2021.09.04.458985 A1 Petra Gutenbrunner A1 Pelagia Kyriakidou A1 Frido Welker A1 Jürgen Cox YR 2021 UL http://biorxiv.org/content/early/2021/09/06/2021.09.04.458985.abstract AB We describe MaxNovo, a novel spectrum graph-based peptide de-novo sequencing algorithm integrated into the MaxQuant software. It identifies complete sequences of peptides as well as sequence tags that are incomplete at one or both of the peptide termini. MaxNovo searches for the highest-scoring path in a directed acyclic graph representing the MS/MS spectrum with peaks as nodes and edges as potential sequence constituents consisting of single amino acids or pairs. The raw score is a sum of node and edge weights, plus several reward scores, for instance, for complementary ions or protease compatibility. For search-engine identified peptides, it correlates well with the Andromeda search engine score. We use a particular score normalization and the score difference between the first and second-best solution to define a combined score that integrates all available information. To evaluate its performance, we use a human cell line dataset and take as ground truth all Andromeda-identified MS/MS spectra with an Andromeda score of at least 100. MaxNovo outperforms other software in particular in the high-sensitivity range of precision-coverage plots. We also identify incomplete sequence tags and study their statistical properties. Next, we apply MaxNovo to ion mobility-coupled time of flight data. Here we achieve excellent performance as well, except for potential swaps of the two amino acids closest to the C-terminus, which are not well resolved due to the low end of the mass range in MS/MS spectra in this dataset. We demonstrate the applicability of MaxNovo to palaeoproteomics samples with a Late Pleistocene hominin proteome dataset that was generated using three proteases. Interestingly, we did not use any machine learning in the construction of MaxNovo, but implemented expert domain knowledge directly in the definition of the score. Yet, it performs as good as or better than the leading deep learning-based algorithm.Competing Interest StatementThe authors have declared no competing interest.