TY  - JOUR
T1  - Understanding and predicting ciprofloxacin minimum inhibitory concentration in Escherichia coli with machine learning
JF  - bioRxiv
DO  - 10.1101/806760
SP  - 806760
AU  - Bálint Ármin Pataki
AU  - Sébastien Matamoros
AU  - Boas C.L. van der Putten
AU  - Daniel Remondini
AU  - Enrico Giampieri
AU  - Derya Aytan-Aktug
AU  - COMPARE ML-AMR group
AU  - Rene S. Hendriksen
AU  - Ole Lund
AU  - István Csabai
AU  - Constance Schultsz
Y1  - 2020/01/01
UR  - http://biorxiv.org/content/early/2020/01/17/806760.abstract
N2  - One possible way to tackle the crisis of antimicrobial resistance (AMR) development is a strict antibiotic prescribing policy. It is therefore important that prescriptions are based on antimicrobial susceptibility data to ensure effective treatment outcomes. With the increasing availability of next-generation sequencing (NGS), bacterial whole genome sequencing (WGS) can offer a more reliable and faster alternative to traditional phenotyping for the detection and surveillance of AMR. This work proposes a machine learning approach that can predict the minimum inhibitory concentration (MIC) of a given antibiotic, here ciprofloxacin, from both genome-wide mutation profiles and profiles of acquired antimicrobial resistance genes (ARGs). We analysed 704 Escherichia coli genomes originating from different countries, together with their respective ciprofloxacin MIC measurements. The four most important predictors found by the model (mutations at gyrA residues Ser83 and Asp87, a mutation at parC residue Ser80, and the presence of any qnrS gene) have been experimentally validated previously. Using only these four predictors in a linear regression model, the MICs of 65% and 92% of the test samples were correctly predicted within a two-fold and a four-fold dilution range, respectively. The presented work goes beyond typical predictions that use machine learning as a black box.
The recent progress in WGS technology, combined with machine learning analysis approaches, indicates that in the near future WGS of bacteria might become cheaper and faster than an MIC measurement.
Impact statement
Whole genome sequencing has become the standard approach for studying the molecular epidemiology of bacteria. However, applying WGS in the clinical microbiology laboratory as part of individual patient diagnostics still requires significant steps forward, in particular with respect to predicting antibiotic susceptibility from DNA sequence. Whilst most studies of susceptibility prediction have used a binary outcome (susceptible/resistant), a quantitative prediction of susceptibility, such as the MIC, will allow earlier detection of trends towards increasing resistance, as well as the flexibility to follow potential adjustments in the definitions of susceptible (wild type) and resistant (non-wild type) categories (breakpoints/epidemiological cut-off values).
Data summary
In this study, 704 E. coli genomes with ciprofloxacin MIC measurements were analysed (24). Paired-end sequencing was performed on all isolates and the reads were stored in FASTQ format. The isolates with known origin came from five countries: Denmark, Italy, the USA, the UK and Vietnam. The MIC distribution of these isolates is shown in Table 1. Of the 704 genomes, 266 had no country metadata available and were used as an independent test set.
All data were deposited in the AMR Data Hub (24), which contains raw sequencing data, ciprofloxacin minimum inhibitory concentrations and additional metadata such as the origin of the samples.
Table 1. Data collected and used in the analysis, grouped by country and MIC value.
Publicly available sequencing data from projects PRJEB21131, PRJNA266657, PRJNA292901, PRJNA292904, PRJNA292902, PRJDB7087, PRJEB21880, PRJEB21997, PRJEB14086 and PRJEB16326 were used. Download and analysis scripts are available at https://github.com/patbaa/AMR_ciprofloxacin. An iTOL phylogenetic tree is available at https://itol.embl.de/tree/14511722611491391569485969. The authors confirm that all supporting data, code and protocols have been provided within the article or through supplementary data files.
ER  - 
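The abstract describes a linear regression on four predictors, scored by whether each predicted MIC falls within a two-fold or four-fold dilution range of the measured value. The Python sketch below illustrates that setup on synthetic data. It is not the authors' code; their scripts are in the GitHub repository cited in the record. The following assumptions are ours: log2(MIC) is the regression target, and each predictor is encoded as 0/1 under hypothetical names. "Within a two-fold range" is read as at most one doubling dilution away, and "within a four-fold range" as at most two. The toy weights and data are invented purely to exercise the code.

```python
"""Toy sketch: linear regression on log2(MIC) from four binary predictors."""

from itertools import product

# Hypothetical feature names (assumption): the four predictors named in the
# abstract, each encoded as 1 if present and 0 if absent.
FEATURES = ("gyrA_S83", "gyrA_D87", "parC_S80", "qnrS_any")


def fit_linear(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    k = len(rows[0])
    # Normal equations: (R^T R) w = R^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, k):
                    A[r][c] -= f * A[col][c]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(k)]  # [intercept, w1, ..., w4]


def predict(coef, x):
    """Predicted log2(MIC) for one isolate's 0/1 feature vector."""
    return coef[0] + sum(w * v for w, v in zip(coef[1:], x))


def within_dilutions(pred_log2, true_log2, steps):
    """Fraction of predictions within +/- `steps` doubling dilutions.

    Assumed reading: "two-fold range" -> steps=1, "four-fold range" -> steps=2.
    """
    hits = sum(abs(p - t) <= steps for p, t in zip(pred_log2, true_log2))
    return hits / len(true_log2)


# Synthetic toy data (NOT from the study): all 16 feature combinations, with
# log2(MIC) generated from made-up weights so the fit can be checked exactly.
X = [list(bits) for bits in product((0, 1), repeat=len(FEATURES))]
y = [-5 + 3 * a + 2 * b + 1 * c + 2 * d for a, b, c, d in X]

coef = fit_linear(X, y)
preds = [predict(coef, x) for x in X]
print(dict(zip(("intercept",) + FEATURES, (round(w, 6) for w in coef))))
print(within_dilutions(preds, y, 1), within_dilutions(preds, y, 2))
```

The least-squares problem is solved with plain Gauss-Jordan elimination only to keep the sketch dependency-free. With real data, a library routine such as `numpy.linalg.lstsq` would be the more usual choice.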