ADAPTIVE NANOPORE SEQUENCING ON MINIATURE FLOW CELL DETECTS EXTENSIVE ANTIMICROBIAL RESISTANCE

Rapid screening of hospital admissions to detect asymptomatic carriers of resistant bacteria can prevent pathogen outbreaks. However, the resulting isolates rarely have their genome sequenced due to cost constraints and long turnaround times to get and process the data, limiting their usefulness to the practitioner. Here we use real-time, on-device target enrichment ("adaptive") sequencing on a new type of low-cost nanopore flow cell as a highly multiplexed assay covering 1,147 antimicrobial resistance genes. Using this method, we detected three types of carbapenemase in a single isolate of Raoultella ornithinolytica (NDM, KPC, VIM). Further investigation revealed extensive horizontal gene transfer within the underlying microbial consortium, increasing the risk of resistance spreading. From a technical point of view, we identify two important variables that can increase the enrichment of target genes: higher nucleotide identity and shorter read length. Real-time sequencing could thus quickly inform how to monitor this case and its surroundings.

: Real-time sequencing reveals extensive resistance load and horizontal gene transfer. (A) Genome reconstruction of a strain of R. ornithinolytica carrying nine plasmids and four carbapenemase genes. Color-coded coverage from 90x (black, e.g., chromosome) to 250x (red, e.g., plasmid carrying OXA-1). (B) Gene transfer of VIM-1 across three strains and four loci. The carbapenemase is flanked by multiple transposases (see annotation), which likely mediate its mobilization. Vertical lines indicate 100 % sequence identity between corresponding genes. (C) Comparison of shared resistance genes between the enrichment sequencing run ("Flongle"), the R. ornithinolytica isolate, all four "isolates" combined and the "metagenome" assembly. Of all resistance genes identified in the metagenome, 79.7 % were found in the isolates. Surprisingly, several resistance genes were not identified in the metagenome, among them several carbapenemase copies. In the R. ornithinolytica isolate genome, about two-thirds of resistance genes were also found using on-device target enrichment. All plasmid-encoded genes among them were detected, including all carbapenemases. (D) Pairwise shared sequences between isolates and metagenome-assembled genomes. Putative transfers were defined as loci with a minimum length of one kilobase and 99.9 % sequence identity between each pair of loci. Extensive sequence transfer is observed between the three isolate genomes (and their corresponding bins from the metagenomic assembly). (E) Miniature, low-cost flow cell used for on-device target enrichment ("Flongle", Oxford Nanopore Technologies), with a one-cent coin placed on top as scale.
ureilytica. None of the remaining metagenomic contigs showed putative transfers. Again, metagenomics did not add important information beyond the culture isolates.

The sensitivity of metagenomic sequencing can be increased with depth, but the associated cost limits its applicability in the routine laboratory. Therefore, going in the opposite direction, a recently introduced miniature nanopore flow cell ("Flongle") aims to reduce per-run costs through reduced sequencing yield. Because the yield is reduced, however, genes. 11, 12 Here we aimed to enrich 1,147 representative antimicrobial resistance genes (ARGs, see methods), which, to our knowledge, is the first time that adaptive sequencing has been used to target a microbial gene panel. We define "enrichment" as the difference in total read count over a corresponding ARG between standard and adaptive sequencing.

In the latter condition and unless stated otherwise, we exclude rejected reads, i.e., those for which the adaptive selection algorithm has not recognized a target in the first bases of the read (Figure S1).

Read count over target genes is a commonly used metric in RNA-Seq experiments, with two caveats: First, most RNA-Seq studies use fixed-size short reads, i.e., the read length distribution is the same for all conditions. Second, the read count is normalized by target length and the overall number of mapped reads, which allows the comparison of targets within a condition ("relative expression"), although their size might differ. Here, we only use raw read count to measure enrichment, without further adjustments. We justify this on two grounds. First, a single library was used in both conditions (standard, adaptive), and the resulting read length distributions for "standard" and "unrejected adaptive" reads are near-identical (Figure S1). Second, we only compare each target across conditions, so there is no need to normalize by target length.
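As an illustration of the raw-count comparison described above, the following is a minimal sketch, not the code used in the study. It assumes that each mapped read has already been assigned to one target gene, and it reports the per-target difference in read count between conditions (the definition of "enrichment" given earlier) alongside a fold change. The function name `enrichment` and the input format are hypothetical.

```python
from collections import Counter


def enrichment(standard_hits, adaptive_hits):
    """Compare raw on-target read counts between two sequencing conditions.

    Each argument is an iterable of target gene names, one entry per mapped
    read (hypothetical input format). No normalization by target length or
    library size is applied, because each target is only compared with itself.
    """
    std = Counter(standard_hits)
    ada = Counter(adaptive_hits)
    result = {}
    for target in sorted(set(std) | set(ada)):
        s, a = std[target], ada[target]
        result[target] = {
            "standard": s,
            "adaptive": a,
            "difference": a - s,             # enrichment as a count difference
            "fold": a / s if s else None,    # undefined without standard reads
        }
    return result


if __name__ == "__main__":
    # Toy read assignments; the counts are made up for illustration only.
    standard = ["VIM-1", "VIM-1", "KPC"]
    adaptive = ["VIM-1"] * 4 + ["KPC"]
    for gene, row in enrichment(standard, adaptive).items():
        print(gene, row)
```

Because the comparison is made per target, a gene of any length is only ever measured against itself, which is why the sketch omits length normalization.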

To compare target read abundance between adaptive and "standard" nanopore sequencing, we sequenced three isolates in technical duplicates on a single MinION flow cell, periodically alternating between adaptive and standard sequencing in 16 one-hour intervals (Figure 2A). Overall, adaptive sequencing could roughly double the abundance of many targets, while others were hardly detected at all (Figure 2B). We subsequently identified two factors that substantially affected target abundance between the two conditions: nucleotide identity and read length.
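The alternating design can be made concrete with a small sketch that assigns a read to a condition from its start time. This is an assumption-laden illustration: the text gives only the number and duration of the intervals, so which condition ran first (`first="adaptive"` here) and the function name `sequencing_condition` are hypothetical.

```python
def sequencing_condition(seconds_since_start, interval_s=3600,
                         n_intervals=16, first="adaptive"):
    """Return the condition active at a given time in an alternating run.

    Assumes 16 one-hour intervals that strictly alternate between the two
    conditions. The starting condition is an assumption, not from the paper.
    """
    if first not in ("adaptive", "standard"):
        raise ValueError("first must be 'adaptive' or 'standard'")
    index = int(seconds_since_start // interval_s)
    if index < 0 or index >= n_intervals:
        raise ValueError("time lies outside the alternating schedule")
    other = "standard" if first == "adaptive" else "adaptive"
    # Even-numbered intervals use the first condition, odd-numbered the other.
    return first if index % 2 == 0 else other
```

Alternating conditions on one flow cell and one library means that pore degradation over the run affects both conditions to a similar degree.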

First, high nucleotide identity between an isolate's gene and the corresponding member of the target gene panel resulted in a higher on-target read count (Figure 2B). Surprisingly, the most similar and thus most enriched genes were located on plasmids. This likely reflects a bias in the database composition, where common resistance plasmids are well annotated while strain-specific, chromosomal gene isoforms are undersampled. As expected, target sequence similarity did not affect abundance in the standard condition, which did not use a target database. To quantify this effect, we performed Bayesian regression and modeled the effects of variables for which a contribution to abundance seemed plausible, namely sequence similarity, coverage, read length and whether the target was located on a plasmid (including interaction effects, see methods). The largest effect was observed for similarity conditional on whether adaptive sequencing was turned on (coefficient estimate 11.98, 95 % CI ± 0.15). However, it can be hard to interpret any single coefficient in an interaction model in isolation; it is more informative to plot samples from the posterior distribution for any variable of interest. All else being equal, adaptive enrichment only outperforms standard sequencing when an isolate's gene has a nucleotide identity of at least 95 % to a record in the target database, and by up to two-fold for near-identical targets (Figure 2C).

Several targets are enriched four times over the standard baseline. Other studies reported a similar enrichment of two- to four-fold for bacterial genomes, albeit partly using different real-time matching algorithms. 11, 13

Second, we observed that reads from plasmids were shorter than chromosomal ones (Figure 2D). Furthermore, the length of chromosomal reads is shorter for adaptive sequencing than the standard because if a target is not identified on a given read, sequencing is terminated prematurely. Our statistical model takes this conditionality into account. The model indicates that adaptive sequencing outperforms the standard only for reads smaller than 3 kb, all else being equal

Figure 2: Effect of two variables during adaptive sequencing on enrichment efficiency compared to a standard nanopore sequencing run. (A) The same three Citrobacter and Raoultella isolates were sequenced with and without enrichment (green and violet, respectively), alternating the conditions periodically on a single flow cell. (B) Each point corresponds to an open reading frame that has been annotated in the final isolate genome assembly as a resistance gene using a dereplicated ARG database (n=1,147). "Read count" is the number of reads from each sequencing condition that map to these genes. As the read passes through the nanopore during enrichment, it is searched in real-time against a database of target genes. If no match is found, the read is ejected prematurely. Reads that are very similar to a database entry (nucleotide identity) pass this filter, while reads with lower sequence identity are falsely rejected. Many highly similar targets reside on plasmids, likely a sampling bias in the resistance database. As expected, similarity has no effect on read count per ORF for standard sequencing because there is no database search involved.
(C) Adaptive sequencing outperforms the standard once the nucleotide identity ("similarity") between the target and its match in the panel database surpasses 95 %. For values close to identity, a two-fold enrichment can be expected. We even observed a four-fold enrichment of several targets over the standard baseline. When we fit a Bayesian multivariate regression model, the increase of target abundance with similarity becomes clear (2.5 and 97.5 % quantiles displayed). (D) Reads derived from plasmids are shorter than chromosomal ones. In turn, chromosomal reads from adaptive are shorter than those from standard sequencing because they are more frequently ejected from the nanopore before the read has been sequenced fully for lack of any match. These factors need to be accounted for in a regression model to estimate the effect of read length on target abundance accurately. (E) Adaptive sequencing outperforms the standard for shorter read lengths of 3 kb and less, all else being equal. Short reads of about 1 kb can potentially double target abundance. Therefore, library preparation protocols for adaptive sequencing could add a step to shear extracted DNA to improve the enrichment.
hours passed. This short turnaround time helped shape the public health response. For example, transposon-encoded VIM and OXA meant that associated wards could be monitored for the occurrence of these genes in other members of the Enterobacteriaceae. By comparison, generating 20 Gb of metagenomic data would also take two days, irrespective

The amount of putative horizontal gene transfer between isolate genomes and MAGs was estimated by counting the number of shared genes for each pair. First, we performed pairwise genome alignment using the nucmer command from mummer (v4.0.0rc1). 24 We then searched for "shared genes", defined as such if the alignment was 1 kb or longer and if the pairwise nucleotide identity between genes was > 99.9 %.
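The shared-gene criterion can be sketched as a filter over pairwise alignment records. This is a minimal illustration under stated assumptions: the `Alignment` record and both function names are hypothetical, and the sketch does not parse nucmer output. It only applies the two thresholds given in the text (alignment length of at least 1 kb, nucleotide identity above 99.9 %) and counts the surviving loci per genome pair.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One pairwise alignment between two genomes (hypothetical record)."""
    ref_genome: str
    query_genome: str
    length: int        # aligned length in bases
    identity: float    # nucleotide identity in percent


def putative_transfers(alignments, min_length=1000, min_identity=99.9):
    """Keep alignments that are >= 1 kb long with identity > 99.9 %."""
    return [a for a in alignments
            if a.length >= min_length and a.identity > min_identity]


def shared_loci_per_pair(alignments, min_length=1000, min_identity=99.9):
    """Count putative transfers for each unordered pair of genomes."""
    counts = Counter()
    for a in putative_transfers(alignments, min_length, min_identity):
        pair = tuple(sorted((a.ref_genome, a.query_genome)))
        counts[pair] += 1
    return counts
```

Sorting the two genome names makes the pair unordered, so an alignment of genome A against B and one of B against A contribute to the same count.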

To model the influence of several predictor variables on our outcome variable "target abundance" (total on-target read count), including plausible interactions, we fit a Bayesian regression model using brms (v2.13). 25 The outcome variable was modeled as a Poisson distribution. We conditioned the effect of nucleotide similarity on sequencing state, i.e., whether adaptive sequencing was turned on or off, by introducing an interaction term. Also, we conditioned read length on sequencing state and whether a read was derived from a plasmid or the chromosome. Finally, we included a term to model the effect of contig coverage, calculated from mapping the reads back to the isolate assemblies.

$$\text{read\_count} \sim \mathrm{Normal}(\mu_i, \sigma)$$

Sampling was performed with four chains, each with 2,000 iterations, of which the first half were discarded as warmup, for a total of 4,000 post-warmup samples. Samples were drawn using the NUTS algorithm. All chains converged (R̂ = 1.00). Coefficient estimates including confidence intervals are available in Table S1. Refer to the project repository for code on how the model was specified.

length distributions (Figure S1). We then selected parameters to model combinations of reads and targets of varying length distributions, from long (mean 8,103 bases, or log 9, variance 1.5) to short (mean 665 bases, or log 6.5, variance 0.25). We could thus, for example, assess the effects of combining long reads with short targets and short reads with long targets. Next, we generated ten thousand (read, target) sample pairs for each combination of read and target distributions.

For each pair, we randomly "placed" a target start on the read with a uniform distribution across read positions, in line with how targets are distributed across DNA fragments in realistic single-molecule sequencing experiments. We then asked if the first part of the read (as seen by the adaptive sampling algorithm) would have detected the target. Since all simulated reads contain a target, failure to detect one in the first number of bases counts as a false negative (Figure S3).
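The simulation just described might be sketched as follows. This is a simplified illustration under explicit assumptions, not the code from the project repository. Read and target lengths are drawn from log-normal distributions using the log-scale location and variance quoted in the text. The size of the decision window (`window`) and the minimum overlap required to call a detection (`min_overlap`) are hypothetical parameters, because the text does not state how many initial bases the algorithm inspects or how a detection was scored.

```python
import math
import random


def simulate_fnr(read_mu, read_var, target_mu, target_var,
                 window=400, min_overlap=100, n_pairs=10_000, seed=0):
    """Estimate the false-negative rate of a simplified adaptive decision.

    read_mu/target_mu are log-scale locations and read_var/target_var are
    log-scale variances of log-normal length distributions. `window` and
    `min_overlap` (in bases) are assumed values, not taken from the paper.
    """
    rng = random.Random(seed)
    read_sigma = math.sqrt(read_var)
    target_sigma = math.sqrt(target_var)
    misses = 0
    for _ in range(n_pairs):
        read_len = rng.lognormvariate(read_mu, read_sigma)
        target_len = rng.lognormvariate(target_mu, target_sigma)
        # Place the target start uniformly along the read.
        start = rng.uniform(0, read_len)
        # The target cannot extend past the end of the read.
        target_end = min(start + target_len, read_len)
        # Bases of the target that fall inside the initial decision window.
        overlap = max(0.0, min(window, target_end) - start)
        # Every simulated read carries a target, so a miss is a false negative.
        if overlap < min_overlap:
            misses += 1
    return misses / n_pairs


if __name__ == "__main__":
    long_reads = simulate_fnr(9.0, 1.5, 6.5, 0.25)
    short_reads = simulate_fnr(6.5, 0.25, 6.5, 0.25)
    print(f"FNR long reads:  {long_reads:.3f}")
    print(f"FNR short reads: {short_reads:.3f}")
```

Under these assumptions, a target placed uniformly on a long read rarely falls inside the initial window, which is one way to see why shorter reads would be expected to lower the false-negative rate.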

To estimate effect sizes on the false-negative rate (FNR) we fit a multivariate regression model using brms (see above)