Abstract
Newcastle disease (ND) outbreaks are global challenges to the poultry industry. Effective management requires rapid identification and virulence prediction of the circulating Newcastle disease viruses (NDV), the causative agent of ND. However, these diagnostics are hindered by the genetic diversity and rapid evolution of NDVs. A highly sensitive amplicon sequencing (AmpSeq) workflow for virulence and genotype prediction of NDV samples using a third-generation, real-time DNA sequencing platform is described using both egg-propagated virus and clinical samples. 1D MinION sequencing of barcoded NDV amplicons was performed on 33 egg-grown isolates, (23 unique lineages, including 15 different NDV genotypes), and from 15 clinical swab samples from field outbreaks. Assembly-based data analysis was performed in a customized, Galaxy-based AmpSeq workflow. For all egg-grown samples, NDV was detected and virulence and genotype were predicted. For clinical samples, NDV was detected in ten of eleven NDV samples. Six of the clinical samples contained two mixed genotypes, of which the MinION method detected both genotypes in four of those samples. Additionally, testing a dilution series of one NDV sample resulted in detection of NDV with a 50% egg infectious dose (EID50) as low as 101 EID50/ml. This was accomplished in as little as 7 minutes of sequencing time, with a 98.37% sequence identity compared to the expected consensus. The high sensitivity, fast sequencing capabilities, accuracy of the consensus sequences, and the low cost of multiplexing allowed for identification of NDV of different genotypes circulating worldwide. This general method will likely be applicable to other infections agents.
Introduction
Newcastle disease (ND) is one of the most important infectious diseases of poultry and is a major cause of economic losses to the global poultry industry. Newcastle disease is caused by virulent strains of avian paramyxovirus 1 (APMV-1), commonly known as Newcastle disease virus (NDV) (1), recently reclassified as avian avulavirus-1 (AAvV-1) (2). Newcastle disease viruses are a highly diverse group of viruses with two distinct classes and 19 accepted genotypes, infecting a wide range of domestic and wild bird species. In addition to the genotypic diversity of NDVs, these viruses are also diverse in their virulence. The global spread, constant evolution, varying virulence, and the wide host range of NDV are challenges to the control of ND (3).
Effective control of ND is dependent on rapid, sensitive, and specific diagnostic testing, which for ND are typically oriented towards detection, genotyping, or prediction of virulence. Virulence of NDV is best assayed through infection-based studies (4), but due to the time constraints associated with such methods, reverse transcriptase-quantitative PCR (RT-qPCR) and sequencing of the fusion (F) gene cleavage site are used to predict NDV virulence (5, 6). Genotyping of NDV is commonly achieved through Sanger sequencing of the coding sequence of the fusion gene (7), which also allows for prediction of virulence. Preliminary genotyping can be accomplished through partial fusion gene sequencing (i.e., variable region) (8). PCR-based tests aimed at rapid detection often lack applicability for virulence determination due to NDV’s genetic diversity, and the current methods that rely on Sanger sequencing lack multiplexing capability and have limited sequencing depth, which complicates detection of mixed infections. While fusion-based assays can be used for detection (9), the variability of this region, which makes it useful for genotyping, hinders the universal applicability of any single primer set. Thus, detection-focused assays are often designed towards more conserved regions, such as the matrix or polymerase genes(9–11). These assays, however, fail to provide virulence or genotype predictions. In summary, there is a need for a method that will sensitively and rapidly detect numerous genotypes of NDV and provide genotype and virulence prediction.
Rapid advances in nucleic acid sequencing, have led to different sequencing platforms (12, 13) being widely applied for identification of novel viruses (14), whole genome sequencing (15), transcriptomics, and metagenomics (16, 17). However, high capital investments and relatively long turnaround times limit the widespread use of these NGS platforms, especially in developing countries (18). Recent improvements in third-generation sequencing, including those introduced by Oxford Nanopore Technologies (ONT) (19), increase the utility of high-throughput sequencing as a useful tool for surveillance and pathogen characterization (20). Among the transformative advantages of ONT’s sequencing technology are the ability to perform real-time sequence analysis with a short turnaround time (21), the portability of the MinION device, the low startup cost compared to other high-throughput platforms, and the ability to sequence up to several thousand bases from individual RNA or DNA molecules. The MinION device has been successfully used to evaluate antibiotic resistance genes from several bacterial species (22, 23), obtain complete viral genome sequences of an influenza virus (24) and Ebola virus (25), and detect partial viral genome sequences (e.g., Zika virus (26) and poxviruses (21)) by sequencing PCR amplicons (AmpSeq). The MinION, therefore, represents an opportunity to take infectious disease diagnostics a step further and to perform rapid identification and genetic characterization of infectious agents at a lower cost.
As with any deep sequencing platform, the sequence analysis approach is integral for accurate interpretation. Primarily, two approaches for taxonomic profiling of microbial sequencing data have been employed: read-based and de novo assembly-based classifications. Read-based metagenomic classification software has been used for identification of microbial species from high-throughput sequencing data (19, 27–29). Although the sequencing accuracy of the MinION is improving, the raw single-read error rate of nearly 10% (30) may limit the accuracy of this approach for Nanopore data (27), especially when attempting to subspecies level differentiation. De novo approaches that use quality-based filtering and clustering of reads (31), or use consensus-based error correction of Nanopore sequencing reads have been reported (32); however, these are not optimized for amplicon sequencing data.
In this study, a specific, sensitive, rapid protocol, using the MinION sequencer, was developed to detect representative isolates from all current (excluding the Madagascar-limited genotype XI) genotypes of NDV. This protocol was also tested on a limited number of clinical swab samples collected from chickens during disease outbreaks. Additionally, a Galaxy-based, de novo AmpSeq workflow is presented that efficiently reduces systematic sequencing errors in Nanopore sequencing data and uses amplicon-based sequences to obtain accurate final consensus sequences.
Materials and Methods
Viruses
Thirty-three NDV isolates, representing 23 unique lineages (including 15 different genotypes) of different virulence, and 10 other avian avulaviruses (AAvV 2-10 and AAvV-13) from the Southeast Poultry Research Laboratory (SEPRL) repository, were propagated in 9–11-day-old specific pathogen free (SPF) eggs (33) and the harvested allantoic fluids were used in this study. Additionally, 15 oral and cloacal swab samples collected from chickens during disease outbreaks in Pakistan in 2015 were tested. The background information of the egg-grown isolates and the clinical samples is summarized in Table S1 and Table S2, respectively.
RNA Isolation
Total RNA from each sample was extracted from infectious allantoic fluids or directly from clinical swab media using TRIzol LS (Thermo Fisher Scientific, USA) following the manufacturer’s instructions. RNA concentrations were determined by using Qubit® RNA HS Assay Kit on a Qubit® fluorometer 3.0 (Thermo Fisher Scientific, USA).
Amplicon synthesis
Approximately 20 ng (in 5 μl) of RNA was reverse transcribed, and cDNA was amplified with target-specific primers using the SuperScript™ III One-Step RT-PCR System (Thermo Fisher Scientific, USA). Previously published primers (4331F and 5090R) (8, 34) were used in this protocol; however, primers were tailed with universal adapter sequence of 22 nucleotides (in bold font) to facilitate PCR-based barcoding: 4331F Tailed: 5′-TTTCTGTTGGTGCTGATATTGCGAGGTTACCTCYACYAAGCTRGAGA-3′; 5090R Tailed: 5′-ACTTGCCTGTCGCTCTATCTTCTCATTAACAAAYTGCTGCATCTTCCCWAC-3′). The thermocycler conditions for the reaction were as follows: 50 °C for 30 minutes; 94 °C for 2 minutes; 40 cycles of 94 °C for 15 seconds, 56 °C for 30 seconds, and 68 °C for 60 seconds, followed by 68 °C for 5 minutes. The reaction amplified a 788 base pair (bp) NDV product (832 bp including primer tails) for all genotypes, which included 173 bp of the 3′ region of the end of the M gene and 615 bp of the 5′ end of the F gene (sizes and primer locations based on the Genotype V strain).
Library preparation
Amplified DNA was purified by Agencourt AMPure XP beads (Beckman Coulter, USA) at 1.6:1 bead-to-DNA ratio and quantified using the dsDNA High Sensitivity Assay kit on a Qubit® fluorometer 3.0 (Thermo Fisher Scientific, USA). MinION-compatible DNA libraries were prepared with approximately 1 μg of barcoded DNA in a total volume of 45 μL using nuclease-free water and using the 1D PCR Barcoding Amplicon Kit (Oxford Nanopore Technologies, UK) in conjunction with the Ligation Sequencing Kit 1D (SQK-LSK108) (19) as per manufacturer’s instructions. Briefly, each of the amplicons were diluted to 0.5 nM for barcoding and amplified using LongAmp Taq 2X Master Mix (New England Biolabs, USA) with the following conditions; 95 °C for 3 min; 15 cycles of 95 °C for 15 seconds; 62 °C for 15 seconds, 65 °C for 50 seconds, followed by 65 °C for 50 seconds. The barcoded amplicons were bead purified, pooled into a single tube, end prepped, dA tailed, bead purified, and ligated to the sequencing adapters per manufacturer’s instructions. Final DNA libraries were bead purified and stored frozen until used for sequencing.
Comparison to RT-qPCR assay for Matrix gene
For comparison of this MinION-based protocol with the matrix gene reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) assay, both methods were run on a dilution series from a single isolate. NDV (LaSota strain) from the SEPRL repository was cultured in SPF 9–11-days-old eggs and the harvested allantoic fluids were diluted to titers ranging from 106 to 101 EID50/mL in brain-heart infusion broth. RNA was extracted from dilutions, and DNA libraries were prepared following the same protocols as described above. Amplicons from each of the dilutions were barcoded separately. At the pooling step, equal concentrations of barcoded amplicons from different dilutions of LaSota were pooled together in single tube. Dilutions, extractions, library construction, and sequencing were performed twice.
The same extracted RNA was also used as the input into the RT-qPCR using the AgPath-ID one-step RT-PCR Kit (Ambion, USA) on the ABI 7500 Fast Real-Time PCR system following the previously described protocols (9).
Sequencing by MinION
The libraries were sequenced with the MinION Nanopore sequencer (19). A new FLO-MIN106 R9.4 flow cell, stored at 4°C prior to use, was allowed to equilibrate to room temperature for 10 minutes before priming it for sequencing. The flow cell was primed with running buffer as per manufacturer’s instructions. The pooled DNA libraries were prepared by combining 12 μL of the libraries with 2.5 μL nuclease-free water, 35 μL RBF, and 25.5 μL library loading beads. After the MinION Platform QC run, the DNA library was loaded into the MinION flow cell via the SpotON port. The standard 48-h 1D sequencing protocol was initiated using the MinKNOW software v.5.12. Detailed information for all MinION runs in this study is provided in Table S3.
The complete steps from RNA isolation to MinION sequencing were performed twice for egg-grown viruses. One run consisted of six egg-grown isolates from different genotypes representative of vaccine and virulent NDV strains (run 3: 6-sample pool). The other run consisted of these same six viruses and an additional 27 egg-grown NDV isolates (run 4: 33-sample pool).
To determine the accuracy of consensus sequences at different sequencing time points for accurate identification of the NDV genotypes, the raw data (FAST5 files) obtained from the 10-fold serial dilution experiment (see above) were analyzed in subgroups based on time of acquisition and processed through the AmpSeq workflow as described below.
MinION data analysis workflow
To analyze the Nanopore sequencing data, a custom, assembly-based AmpSeq workflow within the Galaxy platform interface (35) was developed, as diagrammed in Figure 1. The MinION raw reads in FAST5 format were uploaded into Galaxy workflow. The reads were base-called using the Albacore v2.02 (ONT). The NanoporeQC tool v0.001 (available in the Galaxy testing toolshed) was used to visualize read quality based on the summary table produced by Albacore. Porechop v0.2.2 (https://github.com/rrwick/Porechop) was used to demultiplex reads for each of the barcodes and trim the adapters at the ends of the reads by using default settings. Short reads (cutoff = 600 bp) were filtered out and the remaining reads were used as input to the in-house LAclust v0.002. LAclust performs single-linkage clustering of noisy reads based on alignment identity and length cutoffs from DALIGNER pairwise alignments (36) (minimum alignment coverage = 0.90, maximum identity difference = 0.35; minimum number of reads to save cluster = 5; maximum reads saved per cluster = 200, minimum read length = 600 bp; rank mode = number of intracluster linkages; randomized input read order = yes). Read clusters generated by LAclust were then aligned using the in-house Amplicon aligner v0.001 to generate a consensus sequence. This tool optionally subsamples reads (target depth used = 100), re-orients them as necessary, aligns them using Multiple Alignment using Fast Fourier Transform (MAFFT) (37) with highly relaxed gap opening and extension penalties, and calls a majority consensus. Next, each consensus was used as a reference sequence for mapping the full unfiltered read clusters from LAclust with BWA-MEM and ONT2D settings (38, 39). The final consensus sequence for each sample was refined by using Nanopolish v0.8.5 (https://github.com/jts/nanopolish), which calculates an improved consensus using the read alignments and raw signal information from the original FAST5 files. After manually trimming primer sequences from both 3′ (25 bp) and 5′ (29 bp) ends, the obtained consensus sequences (734 bp) were BLAST searched against NDV customized database, which consisted NCBI’s nucleotide (nt) database and internal unpublished NDV sequences (NCBI database updated on May 23, 2018).
Sequencing by MiSeq
For comparison between nucleotide sequences obtained from MinION and MiSeq (a validated and high accuracy sequencing platform), 24 NDV isolates from the SEPRL repository that were used for MinION sequencing (representing each genotype and sub-genotype of NDV) and 15 clinical swab samples (allantoic fluid of cultured swab samples) were processed for target-independent NGS sequencing. Briefly, paired-end random sequencing was conducted from cDNA libraries prepared from total RNA using KAPA Stranded RNA-Seq kit (KAPA Biosystems, USA) as per manufacturer’s instructions and as previously described (40). All libraries for NGS were loaded into the 300-cycle MiSeq Reagent Kit v2 (Illumina, USA) and pair-end sequencing (2 × 150 bp) was performed on the Illumina MiSeq instrument (Illumina, USA). Pre-processing and de-novo assembly of the raw sequencing data was completed within the Galaxy platform using a previously described approach (15).
Phylogenetic analysis
The assembled consensus sequences from different NDV genotypes and sub-genotypes (6 sequences from MinION run 3, 32 sequences from run 4 and 24 sequences from MiSeq; a total of 62 sequences) and selected (minimum of one sequence from each genotype/subgenotype) sequences from GenBank (n = 66) were aligned using ClustalW (41) in MEGA6 (42). Determination of the best-fit substitution model was performed using MEGA6, and the goodness-of-fit for each model was measured by corrected Akaike information criterion (AICc) and Bayesian information criterion (BIC) (42). The final tree was constructed using the maximum-likelihood method based on the General Time Reversible model as implemented in MEGA6, with 500 bootstrap replicates (43). The available GenBank accession number for each sequence in the phylogenetic tree is followed by the, host name, country of isolation, strain designation, and year of isolation.
Comparison of MinION and MiSeq sequence accuracy
To assess the accuracy of the MinION AmpSeq consensus sequences, 24 samples were sequenced by both deep-sequencing methods (MinION and MiSeq) described above. Pairwise nucleotide comparison between MinION and MiSeq was conducted using the Maximum Composite Likelihood model (44). The variation rate among sites was modeled with a gamma distribution (shape parameter = 1). The analysis involved 54 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 691 positions in the final dataset. The evolutionary distances were inferred by pairwise analysis using the MEGA6 (42).
Results
Comparison to the Matrix gene RT-qPCR assay
Six, sequential, 10-fold dilutions (from 106 EID50/ml to 101 EID50/ml) from one NDV isolate (LaSota) were used to compare the ability of AmpSeq and RT-qPCR to detect low quantities of NDV. In each of the six dilutions, AmpSeq and the matrix RT-qPCR detected NDV in all dilutions. AmpSeq resulted in 99.04–100.0% sequence identity to the LaSota isolate across all six dilutions in the first experiment (R1) and 99.86–100.0% identity in the second experiment (R2) (Table 1).
Time for data acquisition and analysis
To determine the minimal sequencing time needed for acquisition of accurate full-length amplicon consensus sequences at different serial dilutions, 28,000 reads, which were obtained within the first 19 minutes of sequencing in the first serial dilution experiment (run 1), were analyzed. For all concentrations, the first read that aligned to the reference LaSota sequence was obtained within 5 minutes after the sequencing run started. To obtain consensus sequences (5 reads required to build a consensus sequence) only 5 minutes of sequencing time were required for concentrations 106–103 EID50/ml, which resulted in 99.18–100% sequence identity to the reference LaSota strain. Seven minutes were required to obtain NDV consensus sequences for the two lower concentrations: 101 EID50/ml = 8 reads, 98.77% identity and 102 EID50/ml = 5 reads, 98.37% identity (Table 2).
PCR specificity and range of reactivity for NDV genotypes
To determine the utility of the primers for the currently circulating NDV genotypes and the potential cross-reactivity for other AAvVs, which are relatively nonpathogenic in poultry but can confound diagnosis of NDV (45), total RNA from 43 AAvVs, including 23 unique AAvV-1 lineages, (15 different NDV genotypes, 8 different subgenotypes, total samples = 33), as well as AAvV-2–10 and −13 (n = 10) were tested. All AAvV-1 genotypes that are currently circulating globally were amplified with tailed primers; samples 19 and 36 had weak bands of the desired molecular weight compared to other lanes; samples 19, 20, 21, 31 and 32 had an additional, heavier band (~1000 bp). All non-AAvV-1 viruses were negative (Figure S1).
Quality metrics
The Nanopore QC tool was used to obtain quality metrics plots of all sequencing runs. For MinION runs 3 (6 multiplexed samples) and 4 (33 multiplexed samples), more than 70% of total reads had a quality score greater than ten (Q10 score = 90% accuracy) (Figure S2 A and B). The average overall mean read quality scores in both runs were comparable (run 3 = 10.7, run 4 = 11.0), and the mean quality scores of reads ≥ 10 (mean Q≥10) were similar (11.8) for both runs (Figure S2). In addition, analysis of five consecutive batches of reads (each batch = 20,000 reads) obtained at different time intervals from run 4 indicated that the overall mean read quality for each 20,000 read batch remained above 10 (Table S4). Similarly, the mean Q≥10 over time remained consistent in the clinical sample runs (runs 5–7), which had long (12 hrs) sequencing runs (Figure S3, green lines).
Sub-genotypic resolution of AAvV-1 viruses with MinION sequencing
To determine the capability to effectively detect and differentiate viruses of different genotypes, PCR amplicons from 33 egg-gown isolates, which were representative of 23 different NDV lineages including 15 different genotypes, were barcoded, pooled, and sequenced in a single 12-hour MinION run (run = 4) generating a total of 2.076 million reads. The first 100,000 reads, which were obtained in 3 hours and 10 minutes, were analyzed for identification of all 33 NDV isolates used in the study. All 33 NDV isolates were correctly identified to the sub-genotype level (Table 3). These consensus sequences had 97.82–100% sequence identity to the expected sequence in each of the samples.
While 788 bp was the expected amplicon size, genotypes III, IV, and IX (all previously untested genotypes with this primer set) yielded an unexpected electrophoresis product of ~1000 bp (see above). The sequences obtained from these NDV isolates revealed that in addition to the 788 bp product, an upstream region of NDV genome was amplified, resulting in 1050 bp consensus sequences.
Two virus isolates contained two NDV genotypes. For sample #37, MiSeq detected genotypes XIIIb and VIc, but the AmpSeq workflow only detected genotype XIIIb. Sample #16 had been previously confirmed to contain two genotypes, as determined by two genotype-specific PCRs and Sanger sequencing (data not shown); however, the MinION-based AmpSeq workflow only detected genotype VIId.
Clinical swab samples from chicken
To assess the potential utility of the protocol on clinical field samples, MinION libraries were generated directly from clinical swab samples. Also, for confirmation, these swab samples were cultured in eggs and the allantoic fluid was sequenced using a MiSeq-based workflow. Out of 11 NDV-positive samples with the culture-MiSeq method, 10 samples were NDV positive by the MinION protocol (Sample #52 being the exception) (Table 4). In the six NDV-positive samples that contained one NDV genotype, as detected by the culture-MiSeq method, the same NDV genotype was also detected with the MinION protocol. The culture-MiSeq method detected two genotypes in samples #45, #46, #47, and #49; whereas, the MinION protocol only detected dual genotypes in samples #45 and #46. In sample #48, only one NDV genotype was detected by culture-MiSeq but two NDV genotypes were detected by the MinION protocol. All 4 samples negative by culture-MiSeq were also negative by the MinION protocol.
Pairwise comparison of replicated MinION sequences and MiSeq sequences
Pairwise nucleotide distance analysis was used to compare the consensus sequences in six samples across two separate MinION runs, and to compare the MinION consensus sequence to the MiSeq consensus sequence in twenty-four isolates (one isolate representing each genotypes and sub-genotype). There was no variation in the consensus sequence between the MinION runs across those six samples. The MinION and MiSeq consensus sequences were 100% identical, except for sample #37 where the sequence identity was 99.99% to the paired MiSeq sequence. Collectively, these results demonstrate the repeatable accuracy of the MinION-AmpSeq method.
Phylogeny of NDV genotypes
To confirm the ability of the MinION-acquired partial matrix and fusion gene sequences to be used for accurate analysis of evolutionary relatedness, phylogenetic analysis using consensus sequences (734 bp) obtained from two independent MinION runs (run 3 and 4) was performed. Additionally, the 24 sequences from MiSeq were also included in the phylogenetic tree (Figure 2) to further illustrate the agreement between these two sequencing methods. In the phylogenetic tree, the isolates (n = 33; green font) grouped together with the viruses that showed highest nucleotide sequence identity to them, including those in which MiSeq sequences were available (red font). The six isolates that were sequenced twice (blue font) clustered together. Taken together, the results demonstrated that all sequences clustered to the expected genotype/sub-genotype branch of the phylogram.
Time and cost estimation
The time of sample processing and cost estimation of reagents to multiplex and sequence samples (n = 6; n = 33) from RNA extraction to obtain final consensus sequences is presented in Table S5. From RNA extraction to final consensus sequence calculations, the average time (including sequencing time) to process six samples was approximately 9–10 person-hours and for 33 samples approximately 26 person-hours. Assuming that flow cell can be used multiple times (twice when 33 samples pooled and five times when six samples pooled to prepare one cDNA library) for sequencing, cost per sequencing run and cost per sample were calculated. The cost per sample decreased from $53 (six samples multiplexed) to $31 (33 samples multiplexed).
Discussion
This study describes a sequencing protocol for rapid and accurate detection, virulence determination, and preliminary genotype identification (with sub-genotype resolution) of NDV utilizing the low-cost MinION sequencer. Additionally, an assembly-based sequence analysis workflow is described for analyzing MinION amplicon sequencing data. This MinION AmpSeq workflow was successful when using a lab repository of egg-grown viruses of varying genotypes, clinical swab samples, and samples containing mixed genotypes.
The sequence heterogeneity among AAvV-1 genomes is well known (3, 46, 47), which complicates the ability to develop a single test that sensitively detects NDV, while also predicting the genotypic classification and virulence. Currently, an RT-qPCR targeting the M gene (9) is most sensitive and is used for screening samples, but this assay only provides positive and negative results of the samples. An RT-qPCR that predicts virulence based on the fusion gene is available (9); however, the lower sensitivity of this assay and the inability of this assay to detect all genotypes (e.g., genotypes Va and VI) (9, 10, 48) complicate diagnostic interpretation when the matrix and fusion tests have conflicting results. Thus, the only truly reliable option to detect some viruses is to design genotype-specific primers and probes (6, 48, 49). For example, Miller et al reported that the primer set used in this study detected Class I and all nine of the tested class II genotypes (34); however, this primer set was not tested against other currently circulating genotypes, nor was the sensitivity well defined. The current study includes six additional genotypes, collectively representing all currently circulating genotypes (excluding the Madagascar-limited genotype XI). Additionally, while the preliminary analytical sensitivity of this protocol was determined using only one NDV genotype, the sensitivity of the MinION AmpSeq was comparable to the matrix RT-qPCR and presents an opportunity to implement detection, genotype prediction, and virulence prediction into a single test. As an example of this, in several clinical samples the MinION AmpSeq method detected and identified the LaSota vaccine strain, which is an extensively used vaccine strain for commercial poultry in this region (50). Whereas with the matrix RT-qPCR, these samples would have only been reported as NDV positive and would have required additional testing to determine that the NDV was not virulent.
While the multifaceted nature of this MinION AmpSeq protocol is an advantage, cost and time efficiency must be maintained for it to be useful outside of research laboratories. MinION is inherently rapid due to the real-time nature of the sequencing. Additionally, samples can be multiplexed into a single sequencing run, which reduces time and cost (51). Recently, multiplexing and MinION sequencing of the PCR products from a panel of 5 samples was reported (51). Here a panel of 33 samples was multiplexed while maintaining successful NDV genotyping from data collected within 3 hrs and 10 minutes of sequencing and without affecting mean read quality and percentage of high quality reads.
While Nanopore sequencing has numerous benefits useful for diagnostic testing, the high error rate poses unique challenges to data analysis. Thus, it is important to extract accurate consensus sequences from raw sequencing data (52). As previously discussed, pathogen typing from sequencing data can be done with read count-based profiling or de novo assembly approaches (27). However, there are a limited number of available tools suitable for handling the noisy reads currently produced by the MinION platform. The approach in this study takes advantage of the fact that single MinION reads often represent full-length amplicon sequences. By clustering full-length reads based on pairwise identity and subsequently performing consensus calling using standard multiple alignment software, this method quickly and reliably generates accurate do novo assemblies from amplicon datasets. Additional refinement using the Nanopolish finisher leads to reliable consensus accuracy >99% with as few as twenty reads per amplicon, and correct genotypic assignments with as few as five reads per amplicon. Thus, this method overcomes the inherently high error rate of Nanopore sequencing and sequence identification and differentiation at the sub-genotype level can be highly reliable.
Because this protocol relies on identity-based clustering prior to assembly, it maintains the ability to detect samples with mixed NDV genotypes, similar to read-based analytical methods. For example, in this study four clinical samples had two different genotypes as detected by MiSeq analysis, two of which were correctly identified by the MinION AmpSeq workflow. In a fifth case, a mixed sample was detected by MinION AmpSeq, but not by MiSeq. A potential explanation for these differences could be that the MiSeq sequencing was performed on egg-amplified samples, which may have altered the relative levels of the two genotypes, as compared to the direct clinical swab sample used for MinION sequencing. Additionally, the differences in molecular techniques (i.e., MinION: targeted; MiSeq: random) may have altered the relative abundance of the genotypes within the sequencing libraries. While further studies into the ability of this workflow to detect and differentiate NDV in samples with more than one genotype are ongoing, rapid NDV genotyping from clinical samples without culturing the virus in SPF eggs has the potential to facilitate disease diagnostics.
Taken together, this protocol reliably detected, genotyped, and predicted the virulence of NDV. A MinION AmpSeq-based approach will be beneficial worldwide and especially in developing countries where the endemicity of high-consequence diseases, such as NDV, and lack of resources are additional challenges to monitoring and studying infectious diseases (50). Furthermore, the advantages of MinION AmpSeq allow for further optimization not possible with other techniques. For example, PCR product length is less of a restriction with MinION AmpSeq as compared to RT-qPCR; thus, there will be one less restriction on primer site design when trying to create a pan-NDV primer set. Work is in progress to utilize the ability of MinION to sequence longer amplicon fragments, which will provide more complete phylogenetic information and may improve the sensitivity of this assay. Overall, MinION AmpSeq improves the depth of information obtained from PCRs and allows for more flexibility in assay design, which can be broadly applied to the detection and characterization of numerous infectious agents.
Acknowledgments
We gratefully acknowledge Tim Olivier for technical assistance. Special thanks to the Fulbright U.S. Student Program for sponsoring Dr. Salman’s PhD research work. This project was supported by USDA CRIS 6040-32000-072 and DTRA grant 58.0210.5.006 and partially funded by BAA# FRCALL 12-6-2-0015.