Abstract
Enterococcus faecium is one of the leading pathogens in the world. In this study, we proposed a strategy to rapidly and accurately distinguish vancomycin-resistant Enterococcus faecium (VREfm) and vancomycin-susceptible E. faecium (VSEfm) to help doctors correctly determine the use of vancomycin by a machine learning (ML)-based algorithm. A predictive model was developed and validated to distinguish VREfm and VSEfm by analyzing MALDI-TOF MS spectra of unique E. faecium isolates from different specimen types. Firstly, 5717 mass spectra, including 2795 VREfm and 2922 VSEfm, were used to develop the algorithm. And 2280 mass spectra of isolates, namely 1222 VREfm and 1058 VSEfm, were used to externally validate the algorithm. The random forest-based algorithm demonstrated good classification performances for overall specimens, whose mean AUROC in 5-fold cross validation, time-wise validation, and external validation was all greater than 0.84. For the detection of VREfm in blood, sterile body fluid, urinary tract, and wound, the AUROC in external validation was also greater than 0.84. The predictions with algorithms were significantly more accurate than empirical antibiotic use. The accuracy of antibiotics administration could be improved by 30%. And the algorithm could provide rapid antibiotic susceptibility results at least 24 hours ahead of routine laboratory tests. The turn-around-time of antibiotic susceptibility could be reduced by 50%. In conclusion, a ML algorithm using MALDI-TOF MS spectra obtained in routine workflow accurately differentiated VREfm from VSEfm, especially in blood and sterile body fluid, which can be applied to facilitate the clinical testing process due to its accuracy, generalizability, and rapidness.
Introduction
Enterococcus spp. is one of the leading pathogens in healthcare-associated infection.1 Enterococcal infection could cause urinary tract infection, blood stream infection, and even mortality.2 Until recently, vancomycin was virtually the only drug that could be consistently relied on for treating multidrug-resistant enterococcal infections3,4. Vancomycin-resistant Enterococcus (VRE) has led to heavy burden on healthcare worldwide since its first-time isolation.5,6 Enterococcus faecalis and E. faecium are the 2 most commonly isolated Enterococcus spp. in clinical practice.1 VRE faecium (VREfm) has received considerably more attention than VRE faecalis (VREfs) because most of the clinically isolated VRE is E. faecium in the recent decades4,7 and VREfm causes more severe infection than VREfs8,9. Early detection of vancomycin resistance is essential for successfully treating VREfm infection.10 Vancomycin could be discontinued, and antimicrobial agents could be replaced with other antibiotics (eg, linezolid and daptomycin) based on the laboratory results of vancomycin resistance11,12. Patients’ prognosis could be improved and further drug resistance development could be avoided by using susceptible antibiotics.11 However, typical tests in clinical microbiology laboratories, such as the minimal inhibitory concentration test or agar-diffusion test, fail to provide results for antibiotic susceptibility rapidly. The antibiotic susceptibility test (AST) of vancomycin is time-consuming, and the Clinical and Laboratory Standards Institute recommended a full 24 hours should be held for accurate detection of vancomycin resistance in enterococci.13 This would considerably delay accurate prescription of antibiotics against E. faecium. Furthermore, prescribing antibiotics based on empirical prescription, without determining AST, would result in low effectiveness (approximately 50%), depending on the local epidemiology of VREfm.12 Thus, a new tool is needed to provide AST for VREfm rapidly and accurately.
Recently, matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) has become popular among clinical microbiology laboratories worldwide because of its reliability, rapidity, and cost-effectiveness in identifying bacterial species.14–16 In addition to species identification, MALDI-TOF MS has been promising in other applications, such as strain typing or AST.17–19 MALDI-TOF MS can generate massive data comprising hundreds of peak signals on the spectra.17,20 The complex data of MALDI-TOF spectra are overwhelming to even an experienced medical staff.19 Studies have attempted to identify the characteristic peak through visual inspection.21,22 The results of the studies have been discordant, which has limited the clinical utility.23–25
Machine learning (ML) is a good analytical method for solving classification problems through identification of implicit data patterns from complex data.26 The ML method outperforms traditional statistical methods because of its excellent ability to handle complex interactions between large amount of predictors and good performance in non-linear classification problems27 ML has been successfully applied in several clinical fields.27–36 Thus, the ML algorithm is especially appropriate for analyzing complex data such as MALDI-TOF spectra. However, to our knowledge, few studies have used ML in the analysis of MALDI-TOF spectra for rapidly reporting VREfm, and the case numbers in these studies were insufficient, and so, ML algorithm generalization has been limited.37–39 Moreover, to date, no study has validated AST prediction ML algorithms by using large real-world data.
In this study, we aimed to develop and validate a VREfm prediction ML model by using consecutively collected real-world data from 2 tertiary medical centers (Chang Gung Memorial Hospital [CGMH], Linkou branch and CGMH, Kaohsiung branch). Using the largest MALDI-TOF spectrum clinical data to date, the ML algorithm could predict VREfm accurately, rapidly, and in a ready-to-use manner based on the real-world evidence, which is more representative for clinical practice.40 Moreover, we confirmed the robustness and generalization of the ML algorithm through several validation methods, namely cross-validation, time-wise internal validations (unseen independent testing dataset classified according to time), and external validation (unseen independent testing dataset from another medical center). According to the real-world evidence-based validation, our VREfm prediction ML models are ready to be incorporated into routine workflow.
Materials and methods
Data source
We designed a novel machine learning approach which can improve accuracy of antibiotics administration and reduce the turn-around-time of antibiotics susceptibility test. We summarized the comparison between the machine learning approach and the traditional approach used in current clinical microbiology laboratory. We schematically illustrated the study design in Figure 1(b). The data used in this retrospective study was consecutively collected from the clinical microbiology laboratories of 2 tertiary medical centers in Taiwan, namely CGMH Linkou branch and CGMH Kaohsiung branch between January 1, 2013 and December 31, 2017. The clinical microbiology laboratories collected and processed all the routine specimens obtained from the hospitals. In total, 7997 E. faecium cases were identified and included in this study, whereas 5717 (VREfm: 48.89%) and 2280 (VREfm: 53.60%) cases, respectively, were obtained from Linkou and Kaohsiung branches of CGMH. The E. faecium strains were isolated from blood, urinary tract, sterile body fluids, and wound. The detailed description of specimen types is provided in eTable 1 in the Supplement. The study was approved by the Institutional Review Board of Chang Gung Medical Foundation (No. 201900767B0). We followed the Standards for Reporting of Diagnostic Accuracy 201541 and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis reporting guidelines.42
We plotted a timeline of bacterial culture test in current clinical microbiology laboratory (i.e., traditional approach) and a modified timeline when the VREfm model is incorporated (i.e., machine learning approach). In the traditional approach, specimens are collected for bacterial culture test. One day is usually needed for growth of a single colony for species identification (by MALDI-TOF MS). Antibiotics susceptibility test (AST) of vancomycin for VREfm will cost another day to report. By contrast, in the machine learning approach, the VREfm model can provide preliminary AST at the time when bacterial species is identified by MALDI-TOF MS. For treating VREfm, the machine learning approach can improve accuracy of antibiotics use by around 30% (from 50% accuracy of empirical antibiotics use in the traditional approach to 80% accuracy of preliminary AST provided by the machine learning approach). Meanwhile, the turn-around-time of bacterial culture test can be reduced to one day, which is 50% reduction.
We developed and validated a VREfm prediction model. The study included several steps, namely data collection, data preprocessing, predictor candidate extraction and important predictor selection, model training, evaluation, and testing. In data collection, data were obtained from 2 tertiary medical centers (Linkou and Kaohsiung branches of CGMH). The data included mass spectra and results of the vancomycin susceptibility test of E. faecium. Data from the CGMH Linkou branch were used for model training and validation, while data from the CGMH Kaohsiung branch served as an independent testing data. In the steps of data preprocessing and predictor candidate extraction and important predictor selection, a specific set of crucial predictors would be used for model training. K-fold, time-wise CV, and external validation were used to confirm the models’ robustness. The VREfm prediction model can detect VREfm accurately at least 1 day earlier than the current method.
Definition of E. faecium and vancomycin susceptibility
E. faecium was identified using MALDI-TOF spectra measured using a Microflex LT mass spectrometer and analyzed using Biotyper 3.1 (Bruker Daltonik GmbH, Bremen, Germany). A log score (generated through Biotyper 3.1) larger than 2 was used for confirming E. faecium.17–19 We tested vancomycin susceptibility of E. faecium by using the paper disc method. The details of E. faecium identification and AST are given in the eMethods in the Supplement.
MALDI-TOF mass spectrum data collection and preprocessing
The details were described in the Supplements.
Peak selection from MALDI-TOF mass spectra for model development
We applied the embedded feature-selection method to select the most important peaks from MALDI-TOF mass spectra.43 The peaks were ranked using the p-values of the chi-square test of homogeneity, which was employed to determine whether frequency counts were distributed identically across VREfm and vancomycin-susceptible E. faecium (VSEfm). Preliminarily, we selected top 10 important peaks to plot a heat map based on the hierarchical clustering (eMethods in the Supplement). All the ranked peaks were incorporated in the model accordingly until the performance did not increase. Consequently, we could obtain the important peaks that were highly related to differentiation of VREfm and VSEfm isolates.
For determining the number of peaks included in the ML models, we forwardly added them into the ML models and calculated the performance using accuracy as the metric. First, the predictor candidates were sorted in a descending order according to the importance score, and one predictive peak was added at a time into the ML models. On the basis of predictive peak composition, we used different algorithms, namely random forest (RF), support vector machine (SVM) with a radial basis function kernel, and k-nearest neighbor (KNN) and applied 5-fold cross validation (CV) in the data from the CGMH Linkou branch. The accuracies of the ML models were calculated to determine the adequate number of predictive peaks included in the models.
Development and validation of VREfm prediction models
We aimed to develop and validate a robust VREfm prediction model capable of detecting VREfm earlier than the AST report. Three commonly used ML algorithms, namely RF, SVM with a radial basis function kernel, and KNN, were used for developing the VREfm prediction model. These ML algorithms have demonstrated their successful applications (either classification or prediction) in clinical practice.17–19,27,28,35,36 The details of these ML algorithms and model training processes are attached in the eMethods in the Supplement.
We thoroughly evaluated the performance and robustness of the VREfm prediction models using 5-fold CV, time-wise internal validation, and external validation. Data from the CGMH Linkou branch were used for 5-fold CV and time-wise internal validation; by contrast, data from the CGMH Kaohsiung branch served as the unseen independent testing data for external validation. For 5-fold CV, data were randomly divided into 5 datasets. Each one of the 5 datasets served as the testing dataset to evaluate the performance of the model developed by the other 4 datasets. In 5-fold CV, we obtained 5 measurements of metrics for evaluating the robustness of VREfm prediction models. Moreover, to evaluate performance using prospectively collected data, we conducted time-wise internal validation: we used data collected between January 1, 2013 and December 31, 2016 as the training dataset for developing VREfm prediction models, while data from January 1, 2017 to December 31, 2017 served as the testing dataset. To test the generalizability of the models, we used data from the CGMH Linkou branch to develop VREfm prediction models and used data from the CGMH Kaohsiung branch to test the models’ performance in a different institute. Additionally, we evaluated the performance of the VREfm prediction model using different types of specimens, namely blood, urinary tract, sterile body fluid, and wound, by using data from the CGMH Kaohsiung branch. We adopted metrics including sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), receiver operating characteristic (ROC) curve, and area under the receiver operating characteristic curve (AUROC) to access and compare the performance of the VREfm prediction model.
Statistical analysis
The confidence intervals for sensitivity, specificity, and accuracy were estimated using the calculation of the confidence interval for a proportion in one sample situation. Specifically, the critical values followed the Z-score table. To compare the percentages in matched samples, Cochran’s Q test, a nonparametric approach, was implemented in this study.44 Then, we employed pairwise McNemar’s tests45 for post hoc analysis and adopted the false discovery rate proposed by Benjamini and Hochberg (1995) to adjust the P value.46 Furthermore, the confidence intervals of AUROCs were determined using the nonparametric approach, and the AUROC comparisons mainly adopted the nonparametric approach proposed by Delong et al.47
Results
Predictive peaks for detecting VREfm
We defined crucial predictive peaks when the occurrence frequency of a peak was significantly different (defined by the chi-square test) in VREfm and VSEfm. In the step of extracting predictor candidates, 876 predictor candidates were extracted. From the predictor candidates, we used the chi-square method to select important predictive peaks.
We selected 10 most critical predictive peaks and plotted a heat map to preliminarily visualize the difference between VREfm and VSEfm (Figure 2). Peaks of m/z 3172, 3302, 3645, 6342, 6356, 6603, and 6690 were found more frequently in VREfm; by contrast, m/z 3165, 3681, and 7360 occurred more frequently in VSEfm. Although these important predictive peaks were statistically significant, we found them in both VREfm and VSEfm. The full list of crucial predictive peaks is provided in eTable 2 in the Supplement.
We selected top 10 discriminative peaks by chi-square testing the occurrence frequency of peaks in VREfm and VSEfm. The heat map was plotted based on the hierarchical clustering of all the VREfm and VSEfm isolates from the CGMH Linkou branch. Rows represent the isolates, and columns represent the top 10 discriminative peaks. The values in the heat map represent the MS spectral intensity which was log10-normalized and z-score standardized. Red color indicates relatively higher peak intensity while blue color indicates lower peak intensity. The isolates are grouped into 5 clusters. VREfm and VSEfm isolates can be visually differentiated by using the top 10 discriminative peaks.
We selected several important predictive peaks from the predictor candidate list, which was ordered according to the chi-square score. eFigure 4 in the Supplement shows the change in ML models performance when the number of critical predictive peaks increased. For all the ML algorithms used in the study, a similar trend of performance was observed: the accuracies of the ML models reached a steady plateau when the included number of important predictive peaks was larger than 100 (eFigure 4 in the Supplement). Thus, the top 100 crucial predictive peaks were selected as the peak composition for the following experiments.
Performance of VREfm prediction models
We summarized the ML models’ performance in Table 1, Table 2, and Figure 3. The details of comparison between different algorithms are described in the Supplement. The RF model outperformed SVM and KNN in 5-fold CV, time-wise internal validation, and external validation (eTable 3 in the Supplement), where the AUROC ranged from 0.8463 to 0.8553 and accuracy ranged from 0.7769 to 0.7855. Moreover, performance robustness was also observed in SVM and KNN. Figure 3 shows typical ROC curves for the 3 algorithms in all the 3 validations. We used Youden’s index to select the threshold from the ROC curves in search of balanced sensitivity and specificity. In external validation, the sensitivity and specificity of RF were 0.7791 (95% confidence interval: 0.7620-0.7961) and 0.7930 (95% confidence interval: 0.7764-0.8096). On the basis of the resistance rate (VREfm: 53.60%) in the external validation dataset, the PPV was 0.8130 and the NPV was 0.7565.
Given that the RF algorithm attained the highest performance, additionally, we tested the performance of the RF-based VREfm prediction model using different types of specimens in the independent testing dataset (ie, external validation by using data of the CGMH Kaohsiung branch) (Table 2). The RF-based VREfm prediction model attained higher performance in predicting VREfm in blood and sterile body fluid specimens than the other specimen types. The AUROC of blood specimens reached 0.9103 (95% confidence interval: 0.8727-0.9480), whereas that of sterile body fluid specimens reached 0.8714 (95% confidence interval: 0.8321-0.9106). Moreover, the sensitivity (0.8870, 95% confidence interval: 0.8436-0.9303) and specificity (0.8000, 95% confidence interval: 0.7452-0.8548) of the RF-based VREfm prediction model for the blood specimen were also balanced and significantly higher than those for other specimens. By contrast, the performance of the RF-based VREfm prediction model for urinary tract specimens (0.8494, 95% confidence interval: 0.8258-0.8731) was similar to that for overall specimens (0.8553, 95% confidence interval: 0.8399-0.8706).
Discussion
We developed ML-based models for predicting VREfm rapidly and accurately based on MALDI-TOF MS data. The models were especially effective in predicting VREfm in invasive infections (ie, blood and sterile body fluid). We used the largest up-to-date real-world data to validate the robustness and generalization of the ML-based models by using k-fold CV, time-wise internal validation, and external validation. The rapid and accurate AST of vancomycin is promising for determining antibiotics against VREfm infection.
Our results suggested that AST could be predicted accurately by using ML algorithms to analyze MALDI-TOF MS data. MALDI-TOF MS is a powerful analytical tool in current clinical microbiology laboratories because of its rapidness and cost-effectiveness in identifying bacterial species.14–16 On the basis of the massive data produced by MALDI-TOF MS, moreover, some studies have demonstrated that subspecies typing could be predicted from a specific pattern of MS spectra only.17,19 Furthermore, other studies have shown a good correlation between AST and specific patterns of MS spectra.18,23–25,48 However, some issues have limited the generalization of these results. First, most of the studies have adopted an additional protein extraction step before analytical measurement of MALDI-TOF MS. The protein extraction step could enhance data quality; however, it is not routinely used in clinical practice because it is labor-intensive, time-consuming, and expensive.17,18 By contrast, we used the direct deposition method, which is recommended by the manufacturer and is used for everyday works. Thus, our models are more feasible for the existing workflow because they were trained using real-world data. Second, the data sizes in these studies were too small to be representative. We demonstrated that the ML-based models for predicting VREfm can be applied as a clinical decision support tool by using the largest up-to-date datasets collected through the direct deposition method and various validation methods.
Identifying crucial predictive peaks in VREfm classification may not be essential in clinical application; however, the specific combination of crucial predictive peaks would inspire further studies investigating the molecular mechanism of VREfm. Typically, the vanA cluster is the most common mediator of vancomycin resistance in enterococci,49 although many vancomycin resistance genes have been identified.50 In brief, many factors together attribute to antibiotic resistance. Moreover, the complex mechanisms of antibiotic resistance would evolve in response to the selective pressures of their competitive environment (eg, antibiotic use).49 Thus, identifying the important predictive peaks for VREfm could help us understand the mechanism behind resistance. In this study, for example, peaks of m/z 6603, 6631, and 6635 were found frequently for VREfm (eTable 2 in the Supplement). The finding is consistent with a previous study where Griffin et al. reported m/z 6603 is specific for vanB-positive VREfm, while m/z 6631 and 6635 are specifically found for vanA-positive VREfm.38 These peaks are worthy of further identification in future investigations. Moreover, new antibiotics against VREfm can be developed based on these predictive peaks for VREfm.
Our ML models persistently performed well in 5-fold CV, time-wise internal validation, and external validation. Moreover, all the ML algorithms used in this study exhibited good performance (AUROC > 0.8). It could be explained that discriminating VREfm from VSEfm is generally achievable after adequate feature extraction and feature selection processes. In time-wise internal validation, we intended to simulate a prospective study for a model trained by the “past data” to analyze the “future data.” Based on the performance of time-wise internal validation, we concluded that the trained ML models could also perform well on the prospectively collected data, which are unseen in the training process. Previous study results differentiating VREfm from VSEfm by using MALDI-TOF MS spectra could not be generalized.23–25,38 The inconsistent results could be because less features (<10) were used. A review article reported that peak-level reproducibility of MALDI-TOF mass is approximately 80%.51 The classification performance is compromised when essential peaks are few and happen to be absent on the mass spectra. In our study, the ML models performed stably when the included peaks were more than 100 (eFigure 4 in the Supplement). The steady and good performance of our ML models could be explained by the much more included peaks: when some of the essential peaks are not reproduced in the mass spectra, we can still use other alternative essential peaks to conduct an accurate classification. The number of essential peaks somehow compensated the insufficient reproducibility of MALDI-TOF mass. By contrast, regarding predicting VREfm for various specimens, we found that the RF-based model performed especially well in blood and sterile body fluids. The superior prediction performance could be attributed to the relatively fewer number of VREfm strains in blood and sterile body fluids. Bacterial infection in blood or sterile body fluids is typically regarded as invasive infection.52 Only a few VREfm strains (sequence type (ST)17, ST18, ST78, and ST203) cause invasive infections in blood or sterile body fluids according to the studies in Taiwan53 and Ireland.54 The nature of the classification problem would be more simple when the number of labels is fewer.
Limitations
This study has several limitations. First, although the models were evaluated using unseen external data from different medical centers, all the training data and testing data were collected from only 2 tertiary medical centers in Taiwan. Directly applying the ML models in hospitals of other areas or countries as well as in primary care institutes may not be appropriate. However, we believe that the method, but not the trained model, could be generalized. Although our ML models were validated comprehensively using 3 different approaches and the results show that the difference in MALDI-TOF mass spectra between VREfm and VSEfm can be distinguished through all the ML algorithms we used, we suggest others collecting their locally relevant data for training and validating the VREfm predicting model given that the epidemiology of VREfm could be fairly different site by site. Second, our primary goal was to develop and validate a practical and ready-to-use ML model in real-world practice. We found some crucial predictive peaks for VREfm; however, we did not confirm the identities for these peaks. It is worthy of identifying these peaks in further investigations. Third, we did not use the deep learning (DL) algorithm for predicting VREfm, although DL has been successful in the image classification or radiology field.32,33 In this study, VREfm could be accurately predicted using several classic algorithms (ie, RF, SVM, and KNN) that require less resource and time in training and using models. Moreover, DL usually requires more training samples and is financially and computationally more expensive than classical ML algorithms.55 DL utility in analyzing MS data rather than image data could be another promising issue in the bioinformatics field. Fourth, no strain typing data were included. Thus, the molecular epidemiology of VREfm used in this study is unknown.
Conclusions
We developed and validated robust ML models capable of discriminating VREfm from VSEfm based on MALDI-TOF MS spectra. These models were especially good at detecting VREfm causing invasive diseases. The accurate and rapid detection of VREfm by using the ML models would facilitate more appropriate antibiotic prescription.
Author Contributions
HYW, KPL, and CRC had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of data analysis. HYW, KPL, CRC, and YJT analyzed/interpreted the data, performed experiments, designed the study, and wrote the manuscript. HYW, CRC, YJT, JTH, TYL, THC, MHW, TPL, and JJL reviewed/edited the manuscript for important intellectual content and provided administrative, technical, or material support. JJL obtained funding and supervised the study.
Funding
This work was supported by Chang Gung Memorial Hospital (CMRPG3F1721, CMRPG3F1722, CMRPD3I0011) and the Ministry of Science and Technology, Taiwan (MOST 107-2320-B-182A-021-MY3, MOST 108-2636-E-182-001, and MOST 107-2636-E-182-001).
Competing interests
The authors have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Acknowledgments
This manuscript was edited by Wallace Academic Editing.