Abstract
Matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) is a reference method for microbial identification. Machine learning techniques are currently used to predict antibiotic resistance (AR) from MALDI-TOF data. However, current solutions require costly preprocessing steps, are difficult to reproduce because their hyperparameters are tuned by cross-validation, do not provide interpretable results, and do not account for the epidemiological differences inherent to data coming from different laboratories. In this paper, we validate a multi-view heterogeneous Bayesian model (SSHIBA) for predicting AR mechanisms from MALDI-TOF MS. This novel approach exploits local epidemiological differences between data sources, removes the need for preprocessing steps, is easily reproducible because hyperparameters are optimized by Bayesian inference, and provides interpretable results. To validate the model and its advantages, we present two domains of Klebsiella pneumoniae isolates: 282 samples from the Hospital General Universitario Gregorio Marañón (GM) domain and 120 samples from the Hospital Universitario Ramón y Cajal (RyC) domain, with the task of discriminating between Wild Type (WT), Extended-Spectrum Beta-Lactamase (ESBL) producers, and ESBL + Carbapenemase (ESBL+CP) producers. Experimental results show that SSHIBA outperforms state-of-the-art (SOTA) algorithms by exploiting the multi-view approach, which allows it to distinguish between data domains and thus avoid local epidemiological problems. Moreover, the results show that there is no need to preprocess MALDI-TOF data. Implementing the model in microbiological laboratories could improve the detection of multi-drug resistant isolates, optimizing therapeutic decisions and reducing the time needed to identify the resistance mechanism. The proposed model implementation, specifically adapted to AR prediction, and the data collections are publicly available on GitHub at: github.com/alexjorguer/RMPrediction
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
This work was supported by Spanish MINECO (Agencia Estatal de Investigación) [TEC2017-92552-EXP to P. O., RTI2018-099655-B-100 to P. O., TEC2017-83838-R to A. G., C. S. and V. G., PID2020-115363RB-I00 to A. G., C. S. and V. G.]; and Comunidad de Madrid [IND2017/TIC-7618, IND2018/TIC-9649, IND2020/TIC-17372, Y2018/TCS-4705 to P. O.]; and the BBVA Foundation under the Domain Alignment and Data Wrangling with Deep Generative Models (Deep-DARWiN) project to P. O.; and the European Union (European Regional Development Fund and the European Research Council) through the European Union’s Horizon 2020 Research and Innovation Program [714161 to P. O.]; and Intramural Program of the Gregorio Marañón Health Research Institute to A. G.; and Health Research Fund (Instituto de Salud Carlos III. Plan Nacional de I+D+I 2013-2016) of the Carlos III Health Institute (ISCIII, Madrid, Spain) [PI15/01073, PI18/00997 to A. C. and B. R.] partially financed by the European Regional Development Fund (FEDER) ‘A way of making Europe’; and Health Research Fund Miguel Servet contract [CPII19/00002 to B. R.].
(e-mail: alexjorguer{at}tsc.uc3m.es, pa-martin{at}ing.uc3m.es)
(e-mail: casevill{at}pa.uc3m.es, vanes-sag{at}ing.uc3m.es)
(e-mail: acandelagon{at}gmail.com, emilia.cercenado{at}salud.madrid.org, pamunoz{at}iisgm.com, mbelen.rodriguez{at}iisgm.com)
(e-mail: martahernandez1986{at}gmail.com, rafael.canton{at}salud.madrid.org, rosacampo{at}yahoo.com)