Abstract
Motivation Predicting antimicrobial resistance with machine learning based on MALDI-TOF mass spectrometry is a fast-growing field of research. Recent machine learning methods designed specifically for MALDI-TOF mass spectra have outperformed established classification approaches. However, classification performance has been observed to vary strongly across train–test splits, with a large standard deviation. We hypothesise that this variance is caused by the underlying phylogenetic structure between microbial samples, which is implicitly reflected in their MALDI-TOF MS profiles but is not taken into account during model training.
Results In this paper, we propose to infer this structure from the dataset using agglomerative hierarchical clustering and to take it into account when splitting the dataset into training and test sets. We show that incorporating such phylogenetic structure into the antimicrobial resistance prediction scenario improves classification performance. Using a Gaussian process classifier with a MALDI-TOF MS specific kernel, average precision increased from 42.3 to 47.1 for ciprofloxacin resistance prediction in Escherichia coli and from 44.6 to 50.8 for amoxicillin-clavulanic acid resistance prediction in Staphylococcus aureus. We envision that these results will support the quick and reliable identification of antimicrobial resistances, thus increasing patient well-being and reducing healthcare costs.
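The idea of inferring structure by clustering and using it in the split can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the clusters enter the split, so the sketch assumes stratification on the pair (cluster, resistance label), which keeps cluster proportions similar in the training and test sets. The function names `agglomerative_clusters` and `stratified_split`, the average-linkage criterion, the Euclidean distance, and the toy feature vectors are all hypothetical choices made for illustration. In practice one would likely use an established implementation such as scikit-learn's `AgglomerativeClustering` rather than this naive pure-Python version.

```python
import random
from collections import defaultdict


def agglomerative_clusters(points, n_clusters):
    """Naive average-linkage agglomerative clustering (illustrative only).

    Returns one cluster id per input point.
    """
    clusters = [[i] for i in range(len(points))]  # start with singletons

    def dist(a, b):
        # Euclidean distance between two points (an assumption of this sketch)
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    labels = [0] * len(points)
    for cluster_id, members in enumerate(clusters):
        for m in members:
            labels[m] = cluster_id
    return labels


def stratified_split(strata, test_fraction=0.25, seed=0):
    """Split indices so each stratum contributes about test_fraction to the test set."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, s in enumerate(strata):
        by_stratum[s].append(idx)

    train, test = [], []
    for s in sorted(by_stratum):
        members = by_stratum[s]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Toy stand-in for spectra: two well-separated groups of feature vectors
spectra = [(0.1 * k, 1.0) for k in range(8)] + [(10.0 + 0.1 * k, 5.0) for k in range(8)]
resistant = [k % 2 for k in range(16)]  # hypothetical resistance labels

clusters = agglomerative_clusters(spectra, n_clusters=2)
strata = list(zip(clusters, resistant))  # stratify on (cluster, label)
train_idx, test_idx = stratified_split(strata, test_fraction=0.25)
```

Here every (cluster, label) combination is represented in both the training and the test indices, so a classifier is not evaluated on a cluster it has barely seen. A different reading of the abstract, such as holding out whole clusters, would require a grouped split instead.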
Availability All data are available for download, and the code is available as an easy-to-use Python package at https://github.com/BorgwardtLab/maldi_PIKE on the branch maldi_stratification.
Contact caroline.weis{at}bsse.ethz.ch, karsten.borgwardt{at}bsse.ethz.ch
Supplementary information Supplementary information is provided at the end of the document.
Competing Interest Statement
The authors have declared no competing interest.