MinION Nanopore Sequencing of Multiple Displacement Amplified Mycobacteria DNA Direct from Sputum

Sequencing of pathogen DNA directly from clinical samples offers the possibility of rapid diagnosis, faster antimicrobial resistance prediction and enhanced outbreak investigation. The approach is especially advantageous for infections caused by species which grow very slowly in culture, such as Mycobacterium tuberculosis. Since the pathogen of interest may represent as little as 0.01% of the total DNA, enrichment of the input material for target sequences by specific amplification and/or depletion of non-target DNA (human, other bacteria) is essential for success. Here, we investigated the potential of isothermal multiple displacement amplification by Phi29 polymerase. We directed the amplification reaction towards Mycobacteria DNA in sputum samples by exploiting, in our oligonucleotide primer design, its high GC content (approximately 65%) relative to human DNA. Amplified DNA was then sequenced using the Oxford Nanopore Technology MinION. In addition, a model system comprising standardised ‘mock clinical samples’ was designed. Pooled infection-negative human sputum samples were spiked with enumerated Mycobacterium bovis (BCG) Pasteur strain at concentrations spanning the typical range at which Mycobacterium tuberculosis is found in human sputum samples (10^6 to 10^1 BCG cells/ml). To assess the amount of BCG sequence enrichment achieved, sample DNA was sequenced both before and after amplification. Reads from amplified samples which mapped to a BCG reference genome comprised short repeated sequences, apparently copied multiple times from the same fragment of BCG DNA. Consequently, post-amplification samples were enriched for BCG sequences relative to unamplified samples (8,101 BCG reference-mapped reads, increasing to 28,617 for the 10^6 BCG cells/ml sample), but BCG genome coverage declined markedly (for example, from 89.4% to 4.1%).
In summary, the use of standardised mock clinical samples allowed direct comparison of data from different Mycobacteria enrichment experiments and sequencing runs. However, optimal conditions for multiple displacement amplification of minority Mycobacteria DNA remain to be identified.

[...] Samples are cultured until positive, which usually occurs within 1-2 weeks if the sample is smear positive (but up to 5-6 weeks if bacterial load is low), then total DNA is extracted and sequenced using the Illumina platform [5,7]. WGS diagnostics can be completed in a median of 9 days (IQR 6-10) [5]. Antimicrobial resistance predictions are based on nucleotide sequence data [8], and phylogenetic analyses identify transmission events and outbreaks [9,10]. The [...] concentration of target organisms. Mycobacteria DNA, for example, can represent as little as 0.01% of the total DNA extracted from sputum [11]. Small-scale studies employing direct-from-sample sequencing have reported 0.002-0.7% sequence coverage of the M. tuberculosis genome (using differential lysis and a DNA extraction kit) [12], and up to 90% [...]

Potential advantages of adopting the Nanopore sequencing platform (Oxford Nanopore Technology, ONT, Oxford, UK) include the possibility of increased read lengths [15,16] and consequent improved de novo assemblies, avoiding the need for a reference genome [16].

The accuracy of DNA sequences obtained using the Nanopore platform is constantly improving; 99.9% can be achieved when Nanopolish is used to improve consensus accuracy [17].

Enrichment of target pathogen sequences within total extracted DNA is a prerequisite for direct-from-sample sequencing. The technique of isothermal multiple displacement amplification (MDA) using Phi29 DNA polymerase [18] shows promise, since μg quantities of DNA can be generated from minimal template (1-10 ng) [19-21]. In the present study, we investigated the possibility of biasing MDA towards Mycobacteria DNA in sputum samples, prior to sequencing the DNA directly using the Oxford Nanopore Technology MinION.

[...] primers were tested; 'random' hexamers containing 65% GC, or 'most frequent 10mers' based on the most frequent 10 bp sequence repeats identified in the Mycobacteria genome (S1 Table). Incubation was at 30 °C for 16 hours. Amplified DNA was purified using AMPure XP beads, quantitated, and 1 μg was digested to remove branched structures using 1 μl T7 Endonuclease I (New England BioLabs, Hitchin, UK) in a 20 μl reaction volume at 37 °C for 1 hour, followed by a second AMPure XP bead purification step.

[...] were reported using NanoPack [24]. Then, reads from each sample were mapped to the BCG [...]

[...] sequence of an individual read could not be aligned linearly to the reference sequence. Thus, one of the linear alignments in a repeat-containing read is referred to as the "representative alignment" and the others (repeats of this sequence) are referred to as "supplementary alignment(s)". * Ratio not given when the number of mapped reads is less than 10.

The high GC content of the Mycobacteria genome (for M. tuberculosis 65.6% GC) [28] relative to most of the human genome (<50% GC for ~92% of the genome and 50-60% GC in ~7% of the genome) [29] was exploited in our experimental design. MDA was primed using 65% GC biased 'random' hexamers, or MF (most frequent) 10mer primers (S1 Table).
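The two quantities behind this primer design, GC fraction and the most frequent 10 bp sequences, can be illustrated with a short sketch. This is not the authors' primer-design procedure, which is not described in this section. The function names, the toy sequence, and the choice to count overlapping k-mers on a single strand are all assumptions made for illustration.

```python
from collections import Counter


def gc_fraction(seq: str) -> float:
    """Return the fraction of G or C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def most_frequent_kmers(seq: str, k: int = 10, top: int = 5):
    """Return the `top` most common overlapping k-mers as (kmer, count) pairs.

    Only the given strand is scanned; reverse complements are not merged.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts.most_common(top)


# Toy sequence (hypothetical, NOT taken from the BCG genome):
# a GC-rich 10-mer repeated three times, followed by an AT-only tail.
toy = "GCCGGCGACC" * 3 + "ATATATATTA"

print(f"GC fraction: {gc_fraction(toy):.3f}")
print("Most frequent 10-mer:", most_frequent_kmers(toy, k=10, top=1))
```

For a real genome, the same functions could be applied to the sequence string read from a FASTA file. Candidate primers would then be taken from the top of the k-mer list, subject to whatever additional criteria the design requires.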
Visualization of the mapping profile revealed that BCG-like reads were split into multiple small fragments which each mapped to the same region of the reference genome (Fig 1) [...] products across the sequence of the BCG reference genome also indicated that obvious amplification hot spots were absent (Fig 2).

The negative control sample (negative sputum with zero BCG cells added) contained a small number of reads (fewer than five) which mapped to the BCG reference (Table 1). This also occurred in sputum samples spiked at low BCG concentrations (10^2 and 10^1 cells/ml) in both the Phi29 amplified and unamplified control sequence data. All samples had been sequenced while 'multiplexed': the addition of a barcode sequence to each sample during library preparation allowed 'de-multiplexing' to be performed bioinformatically after sequencing.

Although we implemented stringent bioinformatic barcode removal for de-multiplexing, which successfully removed most of this cross-contamination, a low level remained. This issue was confirmed to be bioinformatics-based when samples were run on single flow cells (not multiplexed).

DISCUSSION
Multiple displacement amplification of DNA by Phi29 polymerase is an attractive choice for experiments aiming to generate large quantities of DNA (≥μg) from very small (≤ng) amounts under isothermal conditions [18]. Advantages include a low error rate due to 3'→5' exonuclease 'proofreading' activity (error rate ~9.5 x 10^-6), the capacity to synthesise DNA molecules >70 kb long and the possibility of virtually whole genome amplification [19,30-32]. Relative to PCR-based methods, more DNA is amplified by at least an order of magnitude, and good genome coverage and reduced amplification bias of genomic DNA from human cells have been reported [33]. Long DNA fragments provide ideal input for the Nanopore MinION sequencing platform, which in turn generates long reads offering the possibility of de novo, rather than reference-based, genome assemblies.
MDA has also shown promise for the accurate and unbiased amplification of whole bacterial genomes from uncultivable or slow-growing species, and even 'single cell genomics' [34,35]. MDA with random hexamers has been used to amplify Xylella fastidiosa (Gram-negative plant pathogen, 52% GC content) DNA directly from approximately 1000 target cells, yielding over 4 μg of high molecular weight DNA and achieving uniform genome coverage relative to unamplified DNA [34]. Coverage of Coxiella burnetii (fastidious obligate intracellular pathogen, 42.5% GC) was similarly representative, as assessed by PCR [36].
Work in our laboratory aiming to sequence Mycobacteria directly from sputum samples has previously used 3% NaOH (Nac-Pac Red; Alpha-Tec Systems, Vancouver, WA, USA) to deplete sputum of non-Mycobacteria, together with a 'Molysis kit' (Molzym Life Science, Bremen, Germany) to reduce human DNA contamination [11]. An important issue arising from these pre-treatments is that, for most samples, insufficient DNA remains for direct sequencing using the Nanopore MinION. Here, we aimed to investigate the possibility of eliminating the need for such pre-treatment, while amplifying microgram quantities of DNA enriched for Mycobacteria sequences.
Our experiments employed 65% GC biased hexamers to favour amplification of the BCG genome (65% GC content) relative to the human genome (GC content <50% for ~92% of the genome and 50-60% GC for ~7% of the genome [29]). This achieved two- to five-fold enrichment for BCG sequences (Table 1), but at the expense of genome coverage (for example, 89.4% genome coverage decreased to 4.1% at the 10^6 BCG spike concentration, Table 1). Post amplification, certain regions of the genome were covered at extremely high depth. The reason for the high coverage in certain regions, but not others, is unknown. It may be unrelated to the mean GC content of these sequences, because this was the same within amplified sequences as the mean for the whole genome. The absence of obvious amplification hotspots conserved between experiments (Fig 2) suggests that regions of high coverage may occur stochastically.
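As a rough check on the figures quoted for the 10^6 BCG cells/ml sample, the raw fold changes can be computed directly from the mapped read counts and coverage values. This is a minimal sketch using only numbers stated in the text. It does not normalise by the total number of reads per run, so it need not reproduce the enrichment values in Table 1, and the helper name `fold_change` is illustrative.

```python
def fold_change(before: float, after: float) -> float:
    """Return after / before; `before` must be positive."""
    if before <= 0:
        raise ValueError("'before' must be greater than zero")
    return after / before


# Values quoted in the text for the 10^6 BCG cells/ml sample.
reads_unamplified = 8_101    # BCG reference-mapped reads before MDA
reads_amplified = 28_617     # BCG reference-mapped reads after MDA
coverage_unamplified = 89.4  # % of BCG genome covered before MDA
coverage_amplified = 4.1     # % of BCG genome covered after MDA

# Increase in mapped reads after amplification (raw, not normalised).
read_fold = fold_change(reads_unamplified, reads_amplified)

# Factor by which genome coverage dropped after amplification.
coverage_fold_drop = fold_change(coverage_amplified, coverage_unamplified)

print(f"Mapped reads increased {read_fold:.2f}-fold")
print(f"Genome coverage decreased {coverage_fold_drop:.1f}-fold")
```

The raw read ratio falls within the two- to five-fold range reported above, while coverage falls by a much larger factor, which is the trade-off the paragraph describes.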
The difficulty of amplifying GC rich sequences from a complex mixture by MDA has been reported previously [37]; species with the highest GC content underwent significantly less amplification from an environmental (soil) sample compared to low GC bacteria. Yilmaz et al. [38] evaluated three different commercially available kits, including the NEB Phi29 used in our study. They also observed amplification bias against high (G+C)-content templates in bacteria amplified from sludge and compost communities. Our use of 65% GC biased hexamers (and MF 10mers; Table 1) with the NEB Phi29 polymerase was insufficient to achieve unbiased amplification of the GC rich BCG genome. Similar bias against GC rich sequences has been observed previously [39]; MDA of DNA extracted from tumour samples reproducibly distorted gene dosage representation in the amplified DNA, reflecting the GC content of different regions of the template. Also, a study of copy number variants within the human genome found that MDA created hundreds of potentially confounding artefacts that could obscure authentic copy number variants; these artefacts were reproducible and influenced by GC content [40]. There is also evidence of stochastic effects originating from the amplification of very low amounts of genomic template from a single bacterium [41]: locus representation values ranged from 0.1% to 1,211%.
The reason MDA is biased against GC rich templates is unclear, but it could reflect the higher melting temperature of GC rich DNA relative to AT rich sequences. In addition to the conditions described above, we tested reaction conditions which are known to alleviate GC melting related issues in PCR by effectively reducing the melting temperature of the DNA (PCR additives Q-solution and DMSO), as well as increasing the incubation temperature to 35 °C and 40 °C. A novel thermostable mutant of Phi29 polymerase, designated WGA-X (Thermofisher), has been described which amplifies DNA at 45 °C [42] and offers improved amplification of high GC content templates, but it was not commercially available. We also tested the Phi29 polymerase-based Qiagen REPLI-g kit (data not shown) because it uses alkaline DNA denaturation to improve the uniformity of DNA denaturation while minimising DNA fragmentation or generation of abasic sites (relative to heat denaturation), and because it has been reported to work at 40 °C [43]. This kit was also tested by Yilmaz et al. [38] and it performed best with respect to GC bias. Unfortunately, none of these modifications improved the genome coverage achieved in our study. Further experiments with shorter amplification incubation times were also performed, with the aim of potentially reducing the amplification bias, but these were unsuccessful.

The challenges of amplifying a minority, GC rich target DNA from within a complex mixture remain. Here, establishing mock clinical samples containing defined numbers of BCG cells represented a key step forward, because the data from different method development experiments could be compared. This material is proving invaluable in further work aiming to optimise 'direct from sample' Mycobacteria genome sequencing. In conclusion, optimal conditions under which Phi29 polymerase might directly amplify minority GC rich templates without bias remain to be identified.