Abstract
Whilst next generation sequencing is frequently used to sequence whole bacterial genomes from cultures, it is rarely applied directly to samples, and even more rarely within a clinically relevant time frame. To demonstrate the potential of sequencing directly from blood, a bacteraemia model was developed using defibrinated horse blood to model human whole blood infections. Sample processing comprised removal of erythrocytes and lysis of white blood cells, followed by rapid and accurate non-targeted amplification. This rapid approach to direct from sample sequencing gave greater than 92% genome coverage of the pathogens of interest whilst limiting sequencing of the host genome (less than 7% of all reads). Analysis of de novo assembled reads allowed accurate genotypic prediction of antibiotic resistance. The sample processing would be readily applicable to multiple sequencing platforms. Overall, this model provides evidence that it is currently possible to rapidly produce whole genome bacterial data from sterile site infections with low cell numbers.
Introduction
Bacterial sequencing of clinical isolates has been used in many settings, including the study of virulence determinants1–4, population structure5,6 and outbreak investigations7–10. Almost all current applications of sequencing rely on either culture or targeted amplification, imparting a diagnostic bias towards known pathogens and towards bacteria with known, relatively simple growth requirements. Bacterial whole genome sequencing (WGS) is most often performed at reference or research centres to study population structure and evolution, with results rarely available within a clinically relevant time. One exception is the application of next generation sequencing to the diagnosis of tuberculosis11,12, where WGS was shown, in some cases, to predict resistance more quickly than current routine methods; however, this process still relies on culturing the bacteria prior to sequencing. Direct from sample sequencing of pathogens using an untargeted (metagenomic) approach has the potential to shorten turnaround times and reduce diagnostic bias. Metagenomic sequencing techniques have already been applied to ecological studies13–16, gut microbiomes17–19 and the investigation and identification of viruses20–23.
In order to sequence a bacterial genome, at least 1 ng of DNA is needed, equivalent to approximately 200,000 copies of the E. coli K12 genome. To sequence directly pathogens present at low levels (as few as 1 bacterial cell per ml of blood), such as those found in sterile site infections, an unbiased, high fidelity amplification enzyme is needed. Multiple displacement amplification (MDA) using ɸ29 polymerase is an alternative to PCR for producing DNA in amounts high enough for sequencing. ɸ29 MDA has the potential to decrease reliance on specific primers and to increase the length of DNA produced through amplification, with fewer errors than conventional PCR. ɸ29 MDA has been applied to samples with very low starting DNA amounts, such as single cells (both prokaryotic and eukaryotic), and has provided DNA in amounts high enough for sequencing24. The high fidelity and 3'–5' proofreading activity of ɸ29 polymerase reduce the amplification error rate to 1 in 10⁶–10⁷ bases, compared with a reported error rate of 1 in 90,000 for conventional Taq polymerase25. A single binding event can incorporate over 70 kb26, meaning that, unlike conventional PCR, this method is not limited by the length of the initial target. The method has been shown to be suitable for single bacterial cells24 and, more recently, for malaria parasites directly from blood samples27.
Bacteraemia is a major global cause of morbidity and mortality, with a wide range of aetiologies, and is a particular problem in healthcare settings28,29. The current microbiological diagnostic process is culture based, often involving specialised equipment, with time to positivity varying with aetiology and pathogen load. Although recent advances in MALDI-TOF processing have allowed direct identification of bacteria from positive blood cultures30–32, antibiotic susceptibility results take a further 18 hours.
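The 1 ng figure and the error rates quoted above can be put on a common footing with a back-of-envelope calculation. The sketch below is illustrative only and was not part of the original analysis; it assumes an average mass of about 650 g/mol per double-stranded base pair and uses the 4,641,652 bp chromosome of E. coli K-12 MG1655, then converts the per-base error rates into expected misincorporations per copied genome.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one double-stranded base pair
ECOLI_K12_BP = 4_641_652   # E. coli K-12 MG1655 chromosome length (bp)


def genome_copies(dna_ng: float, genome_bp: int) -> float:
    """Number of genome copies contained in dna_ng nanograms of double-stranded DNA."""
    grams_per_copy = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return dna_ng * 1e-9 / grams_per_copy


def expected_errors(genome_bp: int, error_rate: float) -> float:
    """Expected misincorporations when one genome copy is synthesised once."""
    return genome_bp * error_rate


copies = genome_copies(1.0, ECOLI_K12_BP)    # roughly 2 x 10^5 copies per ng
fold_from_ten_cells = copies / 10            # amplification needed from ten input genomes
taq_errors = expected_errors(ECOLI_K12_BP, 1 / 90_000)  # ~52 per copy
phi29_errors = expected_errors(ECOLI_K12_BP, 1e-6)      # ~4.6 per copy (least accurate phi29 bound)

print(f"{copies:,.0f} copies/ng; {fold_from_ten_cells:,.0f}-fold from 10 cells")
print(f"errors per genome copy: Taq {taq_errors:.1f}, phi29 {phi29_errors:.1f}")
```

Under these assumptions, ten input genomes need roughly 20,000-fold amplification to reach 1 ng, and each round of copying a genome would introduce around 50 errors at Taq's rate against roughly 0.5–5 at ɸ29's.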
The direct application of whole genome sequencing to blood samples would allow rapid pathogen diagnosis, along with simultaneous pathogen typing and genotypic resistance prediction. Furthermore, applying an unbiased pathogen detection method would lower diagnostic bias. To allow direct from sample sequencing, host cells were removed in two stages. Firstly, red blood cells were removed using HetaSep®, which has previously been used to isolate nucleated cells, particularly granulocytes, from blood but has not previously been applied to aid pathogen isolation. Secondly, a selective white blood cell lysis was performed using saponin33 to release intracellular pathogens and aid removal of host nucleic acid.
Methods
Model process
Clinical isolates were collected from the Royal Free Hospital, Hampstead, where they had been stored at -80°C after isolation from septic patients. Phenotypic data were produced using the BD Phoenix™ system and retrieved from final hospital reports. Bacteraemia models were set up by adding an estimated ten bacterial cells (S. aureus or E. coli), calculated using serial dilutions, to 1 ml of horse blood. The workflow depicted in Figure 1 was then applied, with samples cultured at each stage to assess bacterial survival; sample processing was repeated three times. 200 μl HetaSep® was added, and the sample was vortexed and incubated at 37°C for 10 minutes. 550 μl of supernatant was removed, 200 μl of 5% saponin was added to a final 2% solution, and the sample was incubated at room temperature for 5 minutes. For a water shock, 700 μl sterile water was added and the sample incubated at room temperature for 30 seconds, before salt restoration by addition of 21 μl 5 M NaCl. The sample was centrifuged at 4000 × g for 5 minutes and the supernatant discarded before addition of 2 μl TURBO DNase and 5 μl 10x buffer (Ambion). The sample was vortexed and incubated at 37°C for 15 minutes, and EDTA was added to a final concentration of 15 mM. The sample was then centrifuged for 5 minutes at 4000 × g and the supernatant removed and discarded. The pellet was washed in decreasing volumes of PBS (200 μl, then 100 μl, then 20 μl), with each wash centrifuged at 6000 × g for 3 minutes.
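Because the inoculum of "an estimated ten bacterial cells" is derived from serial dilutions, the true number of cells in each model varies from replicate to replicate. A minimal sketch, assuming ideal ten-fold dilutions and Poisson-distributed cell counts, shows the spread around a mean of ten. The stock concentration (10⁸ CFU/ml), the six dilution steps and the 100 μl inoculum are hypothetical values chosen only to give that mean; they are not taken from the study.

```python
import math


def expected_cells(stock_cfu_per_ml: float, n_tenfold: int, inoculum_ml: float) -> float:
    """Mean number of cells delivered after n ideal ten-fold serial dilutions."""
    return stock_cfu_per_ml / 10 ** n_tenfold * inoculum_ml


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k cells when the mean is lam (Poisson sampling)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical values: 1e8 CFU/ml stock, six ten-fold dilutions, 100 µl inoculum.
lam = expected_cells(1e8, 6, 0.1)
p_zero = poisson_pmf(0, lam)                                # chance a model receives no cells
p_5_to_15 = sum(poisson_pmf(k, lam) for k in range(5, 16))  # chance of 5 to 15 cells

print(f"mean {lam:.1f} cells; P(0) = {p_zero:.1e}; P(5-15) = {p_5_to_15:.2f}")
```

Under these assumptions about 92% of models would receive between 5 and 15 cells, and the chance of a model containing no bacteria at all is below 1 in 20,000.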
Bacteria were resuspended in a total of 4 μl sterile PBS. DNA was extracted using an alkaline lysis method: briefly, cell suspensions were added to 200 mM potassium hydroxide (Qiagen) and 50 mM dithiothreitol (Qiagen) and incubated at 65°C for 10 minutes. The reaction was then neutralised using neutralisation buffer (Qiagen), and the sample was briefly vortexed and placed on ice.
Amplification was performed using ɸ29 MDA (REPLI-g Single Cell Kit, Qiagen). A master mix was prepared on ice in a total volume of 40 μl, containing 29 μl reaction buffer with exonuclease-resistant hexamer primers and 2 μl (40 U) of ɸ29 polymerase. The extracted DNA was then added to the master mix and the sample incubated at 30°C for 2 hours; the reaction was stopped by heating to 65°C for 3 minutes.
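When several samples and a negative control are amplified together, the per-reaction volumes above scale linearly. The sketch below is a hypothetical helper, not part of the described method: the buffer and polymerase volumes come from the text, whereas the remaining 9 μl of the 40 μl master mix is assumed to be water, and the 10% pipetting overage is an arbitrary choice.

```python
# Per-reaction volumes (µl) taken from the text. The water volume is NOT stated
# there; it is assumed here to be the remainder of the 40 µl master mix.
PER_REACTION_UL = {"reaction buffer": 29.0, "phi29 polymerase": 2.0}
MASTER_MIX_UL = 40.0
UNITS_PER_UL_POLYMERASE = 40 / 2  # 2 µl = 40 U, as stated


def master_mix(n_reactions: int, overage: float = 0.1) -> dict:
    """Component volumes (µl) for n reactions plus a fractional pipetting overage."""
    scale = n_reactions * (1 + overage)
    volumes = {name: ul * scale for name, ul in PER_REACTION_UL.items()}
    remainder = MASTER_MIX_UL - sum(PER_REACTION_UL.values())
    volumes["water (assumed remainder)"] = remainder * scale
    return volumes


# Hypothetical batch: three spiked samples plus a PBS negative control.
for name, ul in master_mix(4).items():
    print(f"{name}: {ul:.1f} µl")
```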
Sequencing
DNA was quantified using the Qubit BR kit, and 3 μg of ɸ29 MDA DNA was debranched using S1 nuclease in a 90 μl reaction containing 3 μl 10x buffer, 3 μl 0.5 M NaCl and 10 μl S1 nuclease (1 U/μl), with water added to a final volume of 90 μl. The digestion was left at room temperature for 30 minutes and the enzyme inactivated by adding 6 μl 0.5 M EDTA and incubating at 70°C. The DNA was then fragmented by nebulisation at 30 psi for 180 seconds and sequenced on the 454 GS Junior according to the manufacturer's recommended methods. A negative control library was produced by using sterile PBS as the ɸ29 MDA input and was sequenced using the same method.
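The DNA and water volumes in the S1 reaction depend on the Qubit concentration of the MDA product. A minimal sketch of that arithmetic, using the fixed volumes given in the text; the 100 ng/μl reading in the example is hypothetical, and the helper names are illustrative only.

```python
S1_FIXED_UL = {"10x S1 buffer": 3.0, "0.5 M NaCl": 3.0, "S1 nuclease (1 U/µl)": 10.0}


def s1_reaction(dna_ng_per_ul: float, dna_ug: float = 3.0, total_ul: float = 90.0) -> dict:
    """Volumes (µl) for the S1 debranching reaction; water makes up the remainder."""
    dna_ul = dna_ug * 1000.0 / dna_ng_per_ul  # µg -> ng, divided by the Qubit concentration
    water_ul = total_ul - sum(S1_FIXED_UL.values()) - dna_ul
    if water_ul < 0:
        raise ValueError(f"{dna_ul:.1f} µl of DNA does not fit in a {total_ul} µl reaction")
    return {**S1_FIXED_UL, "DNA": dna_ul, "water": water_ul}


def final_mM(stock_M: float, added_ul: float, starting_ul: float) -> float:
    """Final millimolar concentration after adding added_ul of a stock to starting_ul."""
    return stock_M * 1000.0 * added_ul / (starting_ul + added_ul)


print(s1_reaction(100.0))        # hypothetical reading: 30 µl DNA, 44 µl water
print(final_mM(0.5, 6.0, 90.0))  # EDTA after stopping the 90 µl digest: 31.25
```

With these volumes, only 74 μl remain for DNA and water, so the MDA product must be at least about 40.5 ng/μl for 3 μg to fit.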
Data analysis
Data analysis was performed in three stages: first, host and contaminating reads were removed and the remaining reads abundance trimmed; second, reads were classified using lowest common ancestor (LCA) analysis; and finally, classified reads were assembled and analysed. Initially, reads were mapped using Newbler with standard parameters against the host genome and a local contamination library consisting of sequenced negative controls and reads identified as contamination in previous runs. Unmapped reads were written to a new FASTQ file using a custom Python script (supplementary material) and taken forward for further analysis. These reads were abundance trimmed using khmer34 in two passes to a maximum depth of 50, to remove over-represented reads produced by ɸ29 MDA. The reads were then quality trimmed using PRINSEQ35, with ends trimmed to a Q20 cut-off. BLASTn was used to assign the reads against the nr database, after which LCA analysis was performed using MEGAN36 and reads associated with the bacterial species of interest were extracted to a new FASTQ file. These reads were de novo assembled using SPAdes37 with standard parameters, and assembly outputs were assessed using QUAST (Quality Assessment Tool for Genome Assemblies)35. Reference assemblies were performed using the closest reference sequence identified by the LCA analysis. Antibiotic resistance prediction was performed on the de novo assemblies using Mykrobe11 for S. aureus and ResFinder38 for E. coli.
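The custom script itself is in the supplementary material and is not reproduced here. As a hypothetical illustration of the first stage only, the sketch below chains three steps in plain Python: dropping reads whose IDs were reported as mapped to host or contaminant sequences (how those IDs are extracted from Newbler output is not shown), a simplified single-pass digital normalisation in the spirit of khmer's normalise-by-median (exact k-mer counts in a dictionary rather than khmer's probabilistic table, whereas the text used two passes), and Q20 end trimming similar to, but not identical with, PRINSEQ's options. All function names, k = 20 and Phred+33 quality encoding are assumptions.

```python
import statistics
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Read = Tuple[str, str, str]  # (header without '@', sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield reads from unwrapped four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header[1:], seq, qual


def drop_mapped(reads: Iterable[Read], mapped_ids: Set[str]) -> List[Read]:
    """Keep reads whose ID (first header token) was not mapped to host/contaminants."""
    return [r for r in reads if r[0].split()[0] not in mapped_ids]


def normalise_by_median(reads: Iterable[Read], k: int = 20, cutoff: int = 50) -> List[Read]:
    """Keep a read only if the median count of its k-mers, among reads kept so far,
    is below cutoff (simplified digital normalisation with exact counting)."""
    counts: Dict[str, int] = {}
    kept = []
    for read in reads:
        seq = read[1]
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        if not kmers:
            continue  # read shorter than k: discarded in this sketch
        if statistics.median(counts.get(km, 0) for km in kmers) < cutoff:
            kept.append(read)
            for km in kmers:
                counts[km] = counts.get(km, 0) + 1
    return kept


def trim_ends(read: Read, min_q: int = 20, offset: int = 33) -> Read:
    """Trim bases from both ends while their Phred score is below min_q."""
    header, seq, qual = read
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(seq)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return header, seq[start:end], qual[start:end]


def preprocess(lines: List[str], mapped_ids: Set[str]) -> List[Read]:
    """Host/contaminant removal, then normalisation, then Q20 end trimming."""
    reads = drop_mapped(parse_fastq(lines), mapped_ids)
    return [trim_ends(r) for r in normalise_by_median(reads)]
```

The order of the calls in `preprocess` mirrors the order described in the text; in practice each step would stream records from disk rather than hold them in lists.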
Results
Sample processing
When the full blood processing method was applied to E. coli and S. aureus, survival was good for both bacteria throughout all stages, with a final survival rate of 100%. Details of survival at each stage of processing can be found in Table 1.
Sequencing results S. aureus
After processing and sequencing the horse blood spiked with S. aureus, 128,500 reads passed the basic filter. Once known contaminants had been removed and error and abundance trimming completed, 124,145 reads remained. Using BLASTn and MEGAN, 62.1% of reads were identified as S. aureus, 6.76% as the genus Equus and 1.72% as Parascaris equorum. On closer examination of the S. aureus reads, 4,254 were identified to strain level (Staphylococcus aureus subsp. aureus HO 5096 0412); this strain has a complete genome available (GenBank GCA_000284535.1), which was used as the reference for reference mapping. Reference mapping produced 451 contigs covering 92% of the reference genome. When the reads identified as Staphylococcus by the LCA analysis were extracted and de novo assembled, 1,212 contigs were produced with an N50 of 3,882 bp. Compared with the reference sequence, this assembly covered 83% of the genome, with 10 misassemblies. Mykrobe analysis of the de novo assembly gave genotypic results for 12 antibiotics, and the genotypic and phenotypic (BD Phoenix™) results matched for 11 of the 12. The result for ciprofloxacin was inconclusive by genotypic testing but resistant by phenotypic methods.
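The assembly metrics reported above, N50 and the percentage of the reference genome covered, follow standard definitions that QUAST also reports. A minimal sketch of both calculations, assuming contig lengths and contig-to-reference alignment coordinates are already available; the toy inputs are hypothetical and are not data from this study.

```python
from typing import List, Tuple


def n50(contig_lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def genome_fraction(aligned: List[Tuple[int, int]], reference_length: int) -> float:
    """Fraction of reference positions covered by at least one aligned interval.
    Intervals are 0-based and half-open (start, end) alignment coordinates."""
    covered = 0
    current = None  # interval currently being merged, as [start, end]
    for start, end in sorted(aligned):
        if current is None or start > current[1]:
            if current is not None:
                covered += current[1] - current[0]
            current = [start, end]
        else:
            current[1] = max(current[1], end)
    if current is not None:
        covered += current[1] - current[0]
    return covered / reference_length


# Hypothetical toy values:
print(n50([100, 80, 50, 30, 20]))                           # 80
print(genome_fraction([(0, 50), (40, 60), (80, 90)], 100))  # 0.7
```

Merging overlapping alignments before summing avoids double-counting reference positions covered by more than one contig.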
Sequencing results E. coli
After sequencing, 173,597 reads passed the initial filter; once contaminants had been removed and the reads error and abundance trimmed, 170,243 reads remained. Overall, 73% of the reads remaining after the analysis pipeline were identified as E. coli, and 31,959 (18%) reads had no identity. Unlike the S. aureus, the E. coli could not be typed, as 150 reads were assigned to O7:K1 and 211 reads to JJ1886. Reads identified as Enterobacteriaceae, Escherichia or E. coli by BLAST and LCA analysis were extracted from the FASTQ file produced on pipeline completion. The genome of JJ1886 was available from the Integrated Microbial Genomes database (ID 2558309052), and its chromosomal sequence was used as the reference against which the extracted reads were assembled. Reference mapping assembly covered 93.5% of the genome in 548 contigs, whereas de novo assembly covered 89% of the same reference in 1,334 contigs. Several resistance markers were identified in the de novo assembly using ResFinder: dfrA, conferring trimethoprim resistance; gyrA, conferring resistance to fluoroquinolones; mdtK, an efflux pump conferring resistance to norfloxacin; and blaCMY-2, an AmpC beta-lactamase conferring resistance to beta-lactams including cephalosporins. Additionally, eight drug efflux systems were identified. These results were concordant with the phenotypic data.
Discussion
Direct diagnostics using WGS from clinical samples would, in many ways, provide the ideal diagnostic method, combining all the information from whole genome data with the speed of direct sample testing. However, there are several obstacles, including low pathogen numbers, high host background and difficulty in interpreting genotypic data. Here, a model was developed to demonstrate the potential for direct from sample sequencing from whole blood.
Fresh whole horse blood was used to model bacteraemia because it was readily available. The process was designed to remove red blood cells (RBCs) early, as they represent the largest proportion of the cellular makeup of blood (up to 96%39); this debulks the sample and prevents the release of oxidative agents. Selective lysis of white blood cells (WBCs) released any intracellular pathogens and exposed host nucleic acid to nucleases. S. aureus and E. coli were chosen because of their different cell wall types (Gram-positive and Gram-negative, respectively) and cell morphologies; both showed good survival during the developed sample processing.
The majority of sequencing reads from spiked horse blood were associated with the spiked bacteria (62.1% S. aureus and 73% E. coli). Given that the horse genome is over 500 times larger than the E. coli genome, this shows that the vast majority of host material was removed. Overall, there was good concordance between phenotypic and genotypic results, showing the potential for rapid genotypic prediction of antibiotic resistance from 10 bacterial cells in 1 ml of host blood. The inconclusive ciprofloxacin result demonstrates the need for improved understanding of resistance mechanisms. Ciprofloxacin resistance is harder to predict because it arises from chromosomal mutation rather than gene acquisition. Three mechanisms of fluoroquinolone resistance have been proposed in S. aureus: topoisomerase IV gene mutations, DNA gyrase gene mutations and an active efflux pump (NorA)40. This complexity suggests that the database may be limited in its ability to predict ciprofloxacin resistance, which is the most likely cause of the inconclusive result. Additionally, the creators of Mykrobe (Bradley et al.11) found a false-negative rate of 4.6% for ciprofloxacin resistance.
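The scale of host depletion implied by these read proportions can be estimated roughly. Mammalian erythrocytes lack nuclei, so host DNA in blood comes mainly from nucleated white cells. The sketch below assumes an approximate haploid horse genome of 2.5 × 10⁹ bp, diploid host cells and a hypothetical 7 × 10⁶ nucleated cells per ml; it ignores ɸ29 amplification bias, mitochondrial DNA and cell-free DNA, so it is an order-of-magnitude illustration only.

```python
HORSE_GENOME_BP = 2.5e9   # approximate haploid horse genome size (assumption)
ECOLI_GENOME_BP = 4.64e6  # approximate E. coli K-12 genome size
WBC_PER_ML = 7e6          # hypothetical nucleated (white) cell count per ml of blood


def bacterial_dna_fraction(n_bacteria: float, bacterial_bp: float,
                           n_host_cells: float, host_haploid_bp: float,
                           host_ploidy: int = 2) -> float:
    """Share of total DNA (in base pairs) that is bacterial, ignoring amplification bias."""
    bacterial = n_bacteria * bacterial_bp
    host = n_host_cells * host_haploid_bp * host_ploidy
    return bacterial / (bacterial + host)


size_ratio = HORSE_GENOME_BP / ECOLI_GENOME_BP  # a little over 500
undepleted = bacterial_dna_fraction(10, ECOLI_GENOME_BP, WBC_PER_ML, HORSE_GENOME_BP)
print(f"size ratio {size_ratio:.0f}; bacterial DNA without host depletion {undepleted:.1e}")
```

Under these assumptions, only about one base pair in 10⁹ would be bacterial without host depletion, so bacterial read proportions of 62–73% are consistent with removal of host DNA by many orders of magnitude.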
In addition to predicting the isolate's resistance to beta-lactams, the database identified the blaZ and mecA genes. Genotypic testing will never entirely replace phenotypic susceptibility testing, because it cannot identify novel resistance determinants and phenotypic testing is comprehensive. However, in the setting of invasive sepsis, the gain in speed from not having to culture the organism to determine susceptibility could be life-saving.
Multiple antibiotic resistance factors were identified in the E. coli, with good concordance with the phenotypic data. Using the genotypic data, it was possible to rapidly identify the beta-lactamase present as blaCMY-2. Rapid identification of specific resistance genes could help identify outbreaks by providing more information than a simple antibiogram, and could help monitor novel resistance genes or genes that are increasing in incidence. Extensive horizontal gene transfer amongst Gram-negative bacteria can cause outbreaks of resistant bacteria through the spread of genes or plasmids41, which are more complex to track. Rapid identification of the genes causing resistance in isolates could therefore inform epidemiological and outbreak studies involving several bacterial species. However, the resistance prediction output was complex to interpret, with several genes identified that were not always associated with resistance to a specific drug. This highlights a downside of generic databases, which are often time-consuming to interpret. A study of NGS data from E. coli bacteraemia isolates has shown a resistance prediction specificity of 97%2; if this were coupled with direct from sample sequencing, genotypic prediction could inform treatment more rapidly than phenotypic testing.
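Agreement, sensitivity and specificity of genotypic prediction are conventionally calculated against the phenotypic result as the reference standard; specificity, for example, is the proportion of phenotypically susceptible results that are predicted susceptible. A minimal sketch, assuming one (genotypic, phenotypic) call per antibiotic, with inconclusive genotypic calls ('I') counted as discordant for agreement and excluded from sensitivity and specificity. The function name and call encoding are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def concordance(calls: Iterable[Tuple[str, str]]) -> dict:
    """Compare (genotypic, phenotypic) calls, using the phenotype as reference.

    Genotypic calls are 'R', 'S' or 'I' (inconclusive); phenotypic calls are 'R' or 'S'.
    """
    c = Counter(calls)
    total = sum(c.values())
    if total == 0:
        raise ValueError("no calls to compare")
    tp, fn = c[("R", "R")], c[("S", "R")]  # phenotypically resistant
    tn, fp = c[("S", "S")], c[("R", "S")]  # phenotypically susceptible
    return {
        "agreement": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# 11 concordant calls plus one inconclusive, as for S. aureus; the R/S split is hypothetical.
calls = [("S", "S")] * 8 + [("R", "R")] * 3 + [("I", "R")]
print(concordance(calls))
```

Whatever the split between resistant and susceptible calls, 11 concordant results out of 12 give an agreement of about 92%.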
Limitations of this model include differences between horse and human blood, and between spiked bacteria and bacteria causing a true infection. Additionally, factors in the blood such as immune and inflammatory responses could alter the efficiency of the preparation method. However, this model provides proof of principle that unbiased sequencing directly from blood is possible and could be used to diagnose and inform treatment of bacterial bloodstream infections.
The method presented here could readily be adapted to other sequencing platforms. Overall, it allows sufficient DNA for whole genome sequencing of pathogens in blood to be produced within a single day.