Long Amplicon Nanopore Sequencing for Dual-Typing RdRp and VP1 Genes of Norovirus Genogroups I and II in Wastewater

Noroviruses (NoV) are the leading cause of non-bacterial gastroenteritis across the globe, with societal costs of US$60.3 billion per annum. Development of a long amplicon nanopore-based method for dual-typing the RNA-dependent RNA polymerase (RdRp) and major structural protein (VP1) regions from a single RNA fragment could improve existing norovirus typing methods. Its application to wastewater-based epidemiology (WBE) and environmental testing could enable the discovery of novel types and improve tracking throughout the population and into aquaculture and recreational water settings. Here, we develop and optimise such a method for wastewater as the sample matrix. Reverse transcription (RT), PCR and library pooling were optimised and a consensus-based bioinformatics pipeline was developed. Inhibitor removal and LunaScript® RT gave robust amplification of the ≈1000 bp RdRp+VP1 amplicon. Platinum™ Taq polymerase showed good sensitivity and reduced levels of non-specific amplification (NSA) when compared to other polymerases. Optimised PCR annealing temperatures significantly reduced NSA (51.3% and 42.4% for GI and GII), increased yield (86.5% for GII) and increased taxa richness (57.7%) for GII. Analysis of three NoV-positive faecal samples showed 100% nucleotide similarity with Sanger sequencing. Eight GI genotypes, 11 polymerase types (p-types) and 13 combinations were detected in wastewater along with 4 GII genotypes, 4 p-types and 8 combinations, highlighting the diversity of norovirus taxa present in wastewater in England. The most common genotypes detected in clinical samples were all detected in wastewater, while we also commonly detected several GI genotypes not reported in the clinical data. Application of this method in a WBE scheme may, therefore, allow for more accurate measurement of norovirus diversity within the population.


Introduction
Noroviruses (NoV) are non-enveloped viruses in the family Caliciviridae with a single-stranded, positive-sense RNA genome. In human NoV strains the genome is ≈7.6 kb in length and comprises three open reading frames (ORFs). ORF-1 encodes the non-structural proteins P48, NTPase, P22, VPg, Protease and RNA-dependent RNA polymerase (RdRp) while ORF-2 and -3 encode the major structural capsid protein VP1 and minor structural protein VP2 (Campillay-Véliz et al., 2020). Three of the ten NoV genogroups (GI, GII and GIV) cause gastroenteritis in humans (Baldridge, Turula and Wobus, 2016). Their further classification into genotypes and polymerase-types (p-types) is based on the VP1 and RdRp regions. A dual-typing system (genotype + p-type) proposed by Chhabra et al., (2019) names both the genotype (e.g. GI.2) and p-type (e.g. GI.P2) in combination (e.g. GI.2[P2]); this combination is referred to herein as the type.
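The dual-typing nomenclature described above can be illustrated with a short sketch that splits a type name such as GI.2[P2] into its genotype and p-type. The function name `parse_dual_type` and the regular expression are illustrative assumptions and are not part of the published pipeline or of any typing tool.

```python
import re

# Hypothetical pattern for dual-type names such as "GI.2[P2]" or "GII.4[P16]":
# genogroup, a dot, the genotype label, then the p-type label in square brackets.
DUAL_TYPE = re.compile(r"^(G[IVX]+)\.(\w+)\[(P\w+)\]$")


def parse_dual_type(name):
    """Split a dual-type name into genogroup, genotype and p-type."""
    match = DUAL_TYPE.match(name.strip())
    if match is None:
        raise ValueError(f"not a dual-type name: {name!r}")
    genogroup, genotype, p_type = match.groups()
    return {
        "genogroup": genogroup,
        "genotype": f"{genogroup}.{genotype}",  # e.g. GI.2
        "p_type": f"{genogroup}.{p_type}",      # e.g. GI.P2
    }


if __name__ == "__main__":
    print(parse_dual_type("GI.2[P2]"))
    print(parse_dual_type("GII.4[P16]"))
```

Keeping the genotype and p-type as separate fields makes it straightforward to ask, for a putative recombinant, which other types in a sample share either component.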
The leading cause of epidemic and non-bacterial gastroenteritis, NoV was estimated to cost US$4.2 billion in direct health costs and US$60.3 billion in societal costs worldwide in 2016 (Baldridge, Turula and Wobus, 2016; Bartsch et al., 2016; Campillay-Véliz et al., 2020). Although severe disease and death are rare, mortality rates are higher in children <5 y old, the elderly and the immunocompromised (Baldridge, Turula and Wobus, 2016; Bartsch et al., 2016). Out of an estimated 699 million illnesses, 219,000 deaths are predicted to occur annually, 70,000 of them in children <5 y old (Baldridge, Turula and Wobus, 2016; Bartsch et al., 2016). The main transmission route is faecal-oral. An acute symptomatic phase of 1 to 4 d occurs, with faecal shedding commencing at 0.8 d and potentially continuing for months (Baldridge, Turula and Wobus, 2016; Ge et al., 2023). High viral loads (~2.0 × 10⁹ genome copies/g stool) and particle stability make NoV a suitable candidate for wastewater-based epidemiology (WBE) (Hall, 2012; Newman et al., 2016; Hassard et al., 2017). WBE became prominent during the SARS-CoV-2 pandemic, with at least 72 countries adopting it to quantify viral RNA and to track variants of concern and emerging variants (Naughton et al., 2021). WBE, however, has been used for decades as part of the poliovirus eradication programme and for monitoring other pathogens such as hepatitis viruses and norovirus (Asghar et al., 2014; Polo et al., 2020; Treagus et al., 2023). It has the potential to be a useful tool, with benefits including detection of asymptomatic cases, access to near population-scale epidemiological information without mass testing and fewer anthropogenic biases.
Numerous studies using amplicon sequencing to genotype and p-type NoV in wastewater have been performed. The majority amplify partial VP1 or RdRp regions in isolation. Kazama et al., (2017) and Cao et al., (2022) used semi-nested PCR of partial GI and GII VP1 regions (≈337 bp) using the GS-Junior (Roche, Switzerland) and NovaSeq (Illumina, USA) platforms. In both cases 15 genotypes were detected, with up to 8 genotypes per sample for the former. Fumian et al., (2019) amplified the 5' of the GII VP1 (373 bp) using MiSeq 2x150 bp (Illumina, USA). Over 1 yr, 13 genotypes were detected from 156 samples with GII.4 being the most prevalent. Mabasa et al., (2022) sequenced ≈575 bp of the ORF-1 and -2 junction using MiSeq 2x300 bp (Illumina, USA). Over 26 months of bi-weekly sampling, 81% were positive and 13 and 21 types of GI and GII were detected. Recently our group used nanopore sequencing to genotype GII (302 bp amplicon) and found 8 genotypes across 42 samples (Treagus et al., 2023). These previous studies highlight the diverse range of NoV types that can be detected in wastewater.
Most previous research, however, has relied on short-read sequencing, which often requires separate amplification and sequencing of VP1 and RdRp. This makes application of dual-typing post-sequencing extremely difficult due to high levels of recombination in the norovirus genome. The aim of this study, therefore, was to develop a long-amplicon (≈1000 bp) nanopore sequencing method optimised for wastewater that allows dual-typing of GI and GII. Development of such an approach could allow identification of emerging variants and novel recombinants from wastewater.

Materials and Methods
A step-by-step protocol for the methods developed in this study is available in the supplementary protocol or on protocols.io at dx.doi.org/10.17504/protocols.io.8epv5xpmjg1b/v1.

Sample Collection and Processing
Samples were collected and processed by the Environment Agency as part of the SARS-CoV-2 wastewater monitoring programme in England (UK) as outlined in Walker, (2024). Briefly, 1 L of untreated sewage was collected and processed (150 mL) using ammonium sulphate precipitation and nucleic acid extraction using a Kingfisher Flex™ (Thermo Scientific™, UK) and NucliSENS® (BioMérieux, France) reagents. Nucleic acids were sent on dry ice to the Centre for Environment, Fisheries and Aquaculture Science (Weymouth, UK).
To assess the performance of inhibitor removal and the wastewater optimised sequencing protocol, 209 nucleic acid extracts from wastewater collected between 22/10/2021 and 25/10/2021 were pooled by geographical region based on the location of the wastewater treatment plants using QGIS 3.16 and the "Regions (December 2021) EN BFC" (Office for National Statistics, 2021; QGIS Development Team, 2021). Regions with limited or excess nucleic acid volume were split, combined or omitted, creating 10 samples (Supplementary Table 1). Excess nucleic acids were pooled into three additional independent pools and used to assess reverse transcription, PCR and size selection.

Inhibitor Removal
To assess the impact of inhibitors on reverse transcription and PCR, nucleic acid extracts were cleaned with Mag-Bind® TotalPure NGS beads (Omega Bio-Tek, USA) following Child et al., (2023) using 25 µL of nucleic acids. To monitor inhibitor removal efficacy, a GI RT-qPCR and an external control RNA (EC RNA) method were used following ISO 15216-1:2017 (ISO, 2017). Reactions were spiked with 1 µL of EC RNA (4,000 gc/µL) and run alongside an EC RNA + water control. Reactions were run in duplicate on QuantStudio™ 3 machines. RT-qPCR standard curve slopes were between -3.6 and -3.1 with R² ≥0.99. Technical repeats were averaged prior to analysis. Inhibition values <0% were assigned a value of 0 and NoV concentration data were square root transformed (West, 2022).
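The quality checks in this paragraph can be sketched in a few lines. The efficiency formula is the standard relationship between standard-curve slope and amplification efficiency. The inhibition formula is an assumed form (one minus the ratio of EC RNA recovered in the sample to that recovered in the water control) and should be checked against ISO 15216-1:2017 before use. All function names are hypothetical.

```python
import math


def amplification_efficiency(slope):
    """Percent amplification efficiency from a standard-curve slope.

    Standard relationship: E = (10 ** (-1 / slope) - 1) * 100.
    """
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


def inhibition_percent(ec_in_sample, ec_in_water):
    """Assumed form: percent loss of EC RNA signal relative to the water control.

    Negative values are set to 0, as described in the text.
    """
    raw = (1.0 - ec_in_sample / ec_in_water) * 100.0
    return max(raw, 0.0)


def sqrt_transform(concentration):
    """Square-root transform applied to NoV concentration data."""
    return math.sqrt(concentration)


if __name__ == "__main__":
    # The slope limits quoted in the text bracket roughly 90% to 110% efficiency.
    for slope in (-3.6, -3.1):
        print(slope, round(amplification_efficiency(slope), 1))
    # Illustrative numbers only: 400 gc recovered against a 4,000 gc control.
    print(inhibition_percent(400.0, 4000.0))
```

Under this sketch, a sample recovering more EC RNA than the water control is reported as 0% inhibition rather than as a negative value.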

Reverse Transcription Optimisation
Methods using two reverse transcription (RT) kits were tested: SuperScript™ IV Reverse Transcriptase (Invitrogen™, USA) and LunaScript® RT SuperMix (New England Biolabs, USA), referred to as SuperScript™ and LunaScript®. For SuperScript™, the manufacturer's instructions were followed for a 20 µL final volume, 10 µL of nucleic acids and RT at 50°C. LunaScript® followed Child et al., (2023) with a nucleic acid volume of 10 µL. Molecular biology grade water (7.5 µL) was added after RT to equalise sample dilution.
To assess RT performance, semi-nested PCR of the RdRp+VP1 region was performed. First round and semi-nested products were 1194 and 1110 bp for GI and 1052 and 971 bp for GII, respectively (Table 1). PCR with Platinum™ Taq DNA Polymerase followed the manufacturer's instructions for a 25 µL reaction with 5 µL of cDNA or first-round PCR products. Cycling conditions were 95°C for 1 min followed by 40 cycles of 95°C for 30 s, 50°C for 30 s and 72°C for 30 s, and a final extension at 72°C for 7 min. Products were visualised by gel electrophoresis with 2% tris-borate EDTA agarose (Sigma Aldrich) with 100 bp ladder (Promega, USA) or with a TapeStation 4150 using D5000 screen tape (Agilent, USA).

PCR Optimisation and Size Selection
Two independent pooled cDNA samples processed using LunaScript® were used to optimise PCR. Six Annealing temperatures (Ta) were optimised with gradient PCR using the lowest primer melting temperature (Tm) for each polymerase as the highest Ta with two additional temperatures at ≈2.5°C and ≈5.0°C below. Cycling conditions were as recommended by the manufacturer and run for 40 cycles. Reactions were run in simplex at 1 and 10-fold dilutions on a Mastercycler® Nexus Gradient (Eppendorf, Germany). Optimised PCR Ta were compared to Ta=50°C as used by Ollivier et al. (2022).
Ampure XP (Beckman Coulter, USA) and Mag-Bind® TotalPure NGS (Omega Bio-Tek, USA) beads were trialled for size selection using 0.4x to 0.6x ratios based on the manufacturers' recommendations.
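The gradient design described above (highest Ta set to the lowest primer Tm, with two further temperatures roughly 2.5°C and 5.0°C below) can be written as a small helper. The function name and the example Tm values are hypothetical; real Tm values depend on the primer sequences in Table 1 and on each polymerase's buffer.

```python
def gradient_annealing_temps(primer_tms, offsets=(0.0, 2.5, 5.0)):
    """Return the annealing temperatures to test, highest first.

    The highest Ta is the lowest primer Tm; the others sit below it by the
    given offsets (approximate values in the text).
    """
    if not primer_tms:
        raise ValueError("at least one primer Tm is required")
    top = min(primer_tms)
    return [round(top - offset, 1) for offset in offsets]


if __name__ == "__main__":
    # Illustrative Tm values only, not taken from the study.
    print(gradient_annealing_temps([58.0, 61.5]))
```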

Library Preparation and Sequencing
To simultaneously type GI and GII under a single barcode, an amplicon pooling method was developed to increase the equality of sequencing depth. Nucleic acid extracts (158) from untreated sewage samples collected between 05/08/21 and 11/08/21 and between 25/02/22 and 07/03/2022 were processed using the wastewater optimised methods then analysed by TapeStation. The amplicon of interest percentage (AOI%) was calculated following Supplementary Equation 1. Negative and diluted samples, those with unusually high levels of non-specific amplification (NSA) and outliers (1.5 times the interquartile range) were removed and the average GI:GII AOI% ratio was determined (Supplementary Equation 2). This was used to adjust the moles of the GI and GII PCR products input into library preparation (Supplementary Equations 3 and 4).
PCR products were purified using ExoSAP-IT™ (Applied Biosystems, USA) following the manufacturer's instructions for 10 µL of product. PCR yield was determined using a Qubit™ Flex (Invitrogen™, USA) and dsDNA High Sensitivity kit with 2 µL of sample. For the wastewater optimised assay GI and GII amplicon pooling, molarities were adjusted for amplicon length and NSA with 85.3 and 114.7 fmol of GI and GII amplicons pooled together. For the unoptimised method, 62.0 and 138.0 fmol of GI and GII were pooled together based on the GI:GII AOI% ratios of the samples under investigation. Library preparation was performed using the Native Barcoding Kit 96 V14 (Oxford Nanopore Technologies, UK) following the manufacturer's instructions for sequencing of amplicons.
Sequencing was performed on a GridION (MinKNOW software release 22.10.7) using R10.4.1 flow cells at 260 bps with super accurate basecalling. Whole process (cDNA synthesis onwards) and PCR-specific no template controls were sequenced alongside the samples.
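A minimal sketch of the pooling arithmetic follows. The Supplementary Equations are not reproduced here, so the form used is an assumption: a fixed total input is split so that the GII share exceeds the GI share by the GI:GII AOI% ratio. The 200 fmol total is inferred from the sum of the two quoted inputs (85.3 + 114.7), and the 660 g/mol per base pair used to convert moles to mass is a common approximation for double-stranded DNA. Function names are hypothetical.

```python
def pool_fmol(aoi_ratio_gi_to_gii, total_fmol=200.0):
    """Split a total molar input between GI and GII.

    Assumed form: GII input = ratio * GI input, with GI + GII = total_fmol.
    """
    if aoi_ratio_gi_to_gii <= 0:
        raise ValueError("ratio must be positive")
    gi = total_fmol / (1.0 + aoi_ratio_gi_to_gii)
    gii = total_fmol - gi
    return gi, gii


def fmol_to_ng(fmol, length_bp, g_per_mol_per_bp=660.0):
    """Approximate mass of dsDNA for a given molar amount and amplicon length."""
    return fmol * length_bp * g_per_mol_per_bp * 1e-6


if __name__ == "__main__":
    gi, gii = pool_fmol(1.34)
    print(round(gi, 1), round(gii, 1))
    # Semi-nested amplicon lengths from the text: 1110 bp (GI) and 971 bp (GII).
    print(round(fmol_to_ng(gi, 1110), 1), round(fmol_to_ng(gii, 971), 1))
```

With the rounded 1.34-fold ratio this gives values close to, but not identical with, the 85.3 and 114.7 fmol quoted above, which were presumably computed from the unrounded ratio.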

Bioinformatics
Briefly, reads were split using duplex_tools 0.2.14 and trimmed with cutadapt 3.4 (Martin, 2011; Oxford Nanopore Technologies, 2022). Size filtering and random sampling (>800 bp and ≤90,000 reads per sample) was performed using SeqTK 1.3 (Hang Li, 2018). Minimap 2.24 was used to find overlaps between reads. Alignments were screened using yacrd 1.0.0 to detect chimeras or poorly supported reads where more than 20% of a given read had a depth of less than 10 (Li, 2018; Marijon, Chikhi and Varré, 2020). NGSpeciesID 0.1.3 clustered reads and formed consensus sequences supported by more than 100 reads (Sahlin, Lim and Prost, 2021).
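The two read-level filters described above can be illustrated in plain Python. This is not how SeqTK or yacrd are implemented and does not reproduce their command lines (those are in Supplementary Table 2); it only restates the stated criteria, namely reads longer than 800 bp, at most 90,000 reads per sample, and rejection of reads where more than 20% of positions have a depth below 10. Function names are hypothetical.

```python
import random


def filter_and_subsample(reads, min_len=800, max_reads=90_000, seed=0):
    """Keep reads longer than min_len, then randomly cap the count at max_reads."""
    kept = [read for read in reads if len(read) > min_len]
    if len(kept) > max_reads:
        kept = random.Random(seed).sample(kept, max_reads)
    return kept


def poorly_supported(depths, min_depth=10, max_fraction=0.20):
    """True if more than max_fraction of positions have depth below min_depth."""
    if not depths:
        return True  # a read with no coverage information has no support
    low = sum(1 for depth in depths if depth < min_depth)
    return low / len(depths) > max_fraction


if __name__ == "__main__":
    reads = ["A" * 801, "A" * 800, "A" * 1000]
    print(len(filter_and_subsample(reads)))
    print(poorly_supported([5] * 3 + [20] * 7))
```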
For each sample, consensus sequences were indexed, polished and screened for regions with poor support and reads aligned using kma 1.4.9 (Clausen, Aarestrup and Lund, 2018). All consensus sequences were concatenated, renamed using SeqTK 1.3 and then indexed with Samtools 1.13 (Hang Li, 2018; Danecek et al., 2021). SeqKit 2.3.0 identified regions at consensus termini soft masked by kma 1.4.9 due to having poor support, with any such regions being trimmed using Bedtools 2.30.0 (Quinlan and Hall, 2010; Shen et al., 2016). Consensuses were clustered at 95% (CD-HIT 4.8.1) and typed using the Centre for Disease Control's (CDC) Human Calicivirus Typing Tool (Fu et al., 2012; Tatusov et al., 2021). Full bioinformatic procedures, tools and commands can be found in Supplementary Table 2.
Untyped consensus sequences were aligned against CDC reference sequences in MEGA 11.0.11 using the ClustalW algorithm with default parameters (Tamura et al., 2021; Centre for Disease Control, 2023b). Indels causing frame-shift mutations were assessed using amino acid alignments and putative viral typing using the CDC's NoV typing tool, followed by error correction. R 4.1.2 was used to collate the number of reads mapped to each of the consensus sequences (R Core Team, 2021).
Putative PCR chimeras were removed from the dataset if they met any of the following criteria: failing to align against the CDC's reference sequence database (Centre for Disease Control, 2023b), identification as a chimera by USEARCH v11 de-novo or a child sequence with a parent breakpoint within the terminal or proximal regions of RdRp and VP1 and child-parent sequence similarities ≥95%. For USEARCH chimera detection, consensus sequences were annotated with read count (R) and screened using USEARCH v11 de-novo chimera detection (Edgar, 2016). Manual screening was performed using NCBI Multiple Sequence Alignment Viewer v1.25.0. For the method comparison study, a read-depth threshold of 0.1% of the median reads per sample was used and reads from consensus sequences of the same viral type were grouped.
Nanopore-generated sequences were assessed against those obtained by Sanger sequencing. Three norovirus-positive faecal samples, as determined by RT-qPCR following ISO 15216-1:2017 (ISO, 2017) and from presumed single-type infections, were processed following the wastewater optimised protocol. PCR products were purified using the QIAquick PCR Purification Kit (QIAGEN, Germany) and sequenced with the forward primer using Mix2Seq (Eurofins, Germany). Sequences were trimmed to 10 consecutive Q30 bases prior to alignment. The above bioinformatic approach was used without consensus sequence clustering at 95%. Read quality was estimated by aligning reads against the faecal samples' consensus sequences using minimap 2.26 and the map-ont preset. Results were filtered to include only those alignments ≥700 bp. Quality scores were calculated as described in Supplementary Equation 5.
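The grouping and read-depth threshold used for the method comparison can be sketched as below. The wording in the text admits more than one reading. This sketch assumes the threshold is 0.1% of the median of the per-sample read totals, applied to the grouped counts for each type. The original analysis was done in R; Python and the function names here are used purely for illustration.

```python
from collections import defaultdict
from statistics import median


def group_by_type(consensus_counts):
    """Sum read counts over consensus sequences assigned to the same viral type.

    consensus_counts is an iterable of (viral_type, reads) pairs.
    """
    grouped = defaultdict(int)
    for viral_type, reads in consensus_counts:
        grouped[viral_type] += reads
    return dict(grouped)


def apply_depth_threshold(counts_by_sample, fraction=0.001):
    """Drop types whose reads fall below fraction * median(per-sample totals)."""
    totals = [sum(counts.values()) for counts in counts_by_sample.values()]
    threshold = fraction * median(totals)
    return {
        sample: {t: n for t, n in counts.items() if n >= threshold}
        for sample, counts in counts_by_sample.items()
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    sample = group_by_type([("GI.2[P2]", 600), ("GI.2[P2]", 400), ("GI.3[P3]", 5)])
    print(apply_depth_threshold({"s1": sample}))
```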

Data Analysis
For all inferential statistics, paired t-tests were performed unless the assumption of normality of the difference between observations was not met, in which case a Wilcoxon signed-rank test was performed. All data analysis and visualisation was performed in RStudio build 353 (2022.12.0) and statistical significance is defined as p<0.05 (Posit team, 2022).

Inhibitor Removal and Reverse Transcription
Inhibitor removal significantly reduced average inhibition from 90.6% to 13.2% and increased NoV quantification 4.8-fold; p<0.001 (Figure 1 A and B). Reverse transcription studies showed no or weak amplification with SuperScript™ while LunaScript® showed good amplification for GI and GII with inhibitor removal (Figure 2 A and B). For LunaScript®, the number of aligned reads ≥900 bp (log10) significantly increased with inhibitor removal from 1.99 ± 0.25 (sd) to 5.22 ± 0.24 for GI and from 2.31 ± 0.38 to 4.52 ± 0.46 for GII (p<0.001), giving 1731-fold and 163-fold differences, respectively.
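The link between the log10 read counts and the fold differences can be checked with a one-line conversion. This is a sketch using the rounded means quoted above. It gives roughly 1,700-fold for GI (a difference of 3.23 log10 units) and roughly 160-fold for GII (2.21 log10 units), in line with the reported 1731-fold and 163-fold values, which were presumably calculated from unrounded data.

```python
def fold_change_from_log10(before_log10, after_log10):
    """Fold change implied by a difference between two log10-scale values."""
    return 10 ** (after_log10 - before_log10)


if __name__ == "__main__":
    print(round(fold_change_from_log10(1.99, 5.22)))  # GI
    print(round(fold_change_from_log10(2.31, 4.52)))  # GII
```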

PCR Reagents, Cycle Optimisation and Amplicon Purification
Following first round PCR, no GI amplicons were observed, indicating that the polymerases lacked the sensitivity to amplify NoV GI in a single round. For GII, however, amplicons were observed but not for all dilutions (Figure 3). Phusion™ showed signs of PCR inhibition with increased yields following dilution (Figure 3 B). Q5U® and Ultra™ II Q5® showed abundant NSA at all Ta (Figure 3 D and E). In general, Platinum™ and LongAmp® showed low levels of NSA, good yield and sensitivity, with both showing amplification with samples diluted 10-fold at their optimal Ta. For LongAmp®, yield decreased with increasing Ta, whereas for Platinum™ yield increased with Ta; SuperFi™ showed a reduction in sensitivity and NSA but an increase in yield with increasing Ta (Figure 3 A, C and F).

Development of a Multi-Target Library
Following PCR optimisation, NSA was higher for GII than GI with the AOI% at 61.5% and 89.1%, respectively (Figure 4 B and D and Figure 5 A and B). PCR optimisation significantly increased AOI% for GI and GII from 77.6% to 89.1% and from 33.4% to 61.5% (Figure 5 A and B). PCR yield significantly increased for GII (p<0.001) following optimisation from 11.84 ng/µL ± 2.71 (sd) to 22.08 ng/µL ± 3.47 while GI was not impacted (29.97 ng/µL ± 4.47 and 29.78 ng/µL ± 2.80). To reduce sequencing of NSA, Ampure XP- and Mag-Bind® TotalPure NGS-based size selection were trialled. Neither increased the AOI% (78.4 to 79.4%) but both saw reductions in amplicon concentrations (24.1 to 6.14 ng/µL), as highlighted in Figure 5 C and E. A library pooling method based on the AOI% within the total amplicon pools for GI and GII was developed with a 1.34-fold mean difference between the GI and GII AOI% being identified (Figure 5 D). Adjusting the GI and GII pooling molarities reduced the bias in median read coverage from 3.3-fold to 2.6-fold.
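The headline percentages for NSA reduction and yield increase follow from the AOI% and yield values in this paragraph if NSA is taken as the remainder of the amplicon pool (100 minus AOI%). That definition is an assumption, since Supplementary Equation 1 is not reproduced here, and the function names are hypothetical.

```python
def nsa_reduction_percent(aoi_before, aoi_after):
    """Relative reduction in NSA, taking NSA% as 100 - AOI% (assumed definition)."""
    nsa_before = 100.0 - aoi_before
    nsa_after = 100.0 - aoi_after
    return (nsa_before - nsa_after) / nsa_before * 100.0


def percent_increase(before, after):
    """Relative increase of a quantity, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    print(round(nsa_reduction_percent(77.6, 89.1), 1))  # GI
    print(round(nsa_reduction_percent(33.4, 61.5), 1))  # GII
    print(round(percent_increase(11.84, 22.08), 1))     # GII yield
```

With the rounded inputs, the GI NSA reduction and GII yield increase reproduce the reported 51.3% and 86.5%; the GII NSA reduction comes out slightly below the reported 42.4%, which is consistent with the published figure having been computed from unrounded values.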
Sequence Quality, Nanopore Sequence Validation and PCR Chimeras
During comparative analysis of the wastewater optimised methods and those using Ta=50°C, whole process and PCR negative controls were free from any sequences aligning to NoV (Supplementary Table 9). Prior to read trimming the mean and median read lengths were 1,790 and 851 bp, respectively, suggesting a positively skewed distribution with a few very long reads. Approximately 45.5% to 71.2% of reads per sample included both primers and of those 17.2% to 45.2% aligned against a consensus sequence (Supplementary Table 9). Of 94 consensus sequences, 3 could not be typed due to indels in homopolymer regions.
To validate the nanopore consensus sequences, three faecal samples from assumed single-type infections were compared to sequences obtained by Sanger sequencing. All three had 100% nucleotide similarity compared to Sanger with no indels. The Sanger sequences had median and modal quality scores of Q58 (Supplementary Table 3). Alignment of nanopore reads from these samples against the consensus sequences estimated the median quality of reads in the sequencing library as Q12.3 (Supplementary Figure 1).
During initial data processing, 12 putative novel recombinant GI types were observed on 18 occasions (Supplementary Table 4). On 16 occasions, parent-types containing the genotype or p-type were both present at ≥7.4% of the total reads with the recombinants comprising ≤2.5%, indicating their potential as PCR chimeras. On two occasions the parent types (GI.1[P1] and GI.2[P2]) of the putative chimeras (GI.1[P2] and GI.2[P1]) were not detected in the sample. Some putative chimeras contained a low number of SNPs (<5) compared to their parent types.
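To put the quality scores above in context, the standard Phred relationship between a quality score and an error probability can be applied. Supplementary Equation 5 is not reproduced here, so the use of the standard Phred definition for the read-level estimate is an assumption.

```python
import math


def phred_to_error(q):
    """Error probability for a Phred quality score: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10.0)


def error_to_phred(p):
    """Phred quality score for an error probability: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p)


if __name__ == "__main__":
    # A median read quality of Q12.3 implies an error rate of roughly 6%.
    print(round(phred_to_error(12.3) * 100, 1))
    # Q58 corresponds to fewer than two errors per million bases.
    print(phred_to_error(58.0))
```

On this scale a per-read error rate of around 6% is far too high for typing from single reads, which is why the consensus is what is compared against Sanger sequencing.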

The Impact of PCR Optimisation on Norovirus Diversity
Ten pooled wastewater samples were analysed to determine the impact of the PCR optimisation on norovirus taxa richness. The optimised PCR detected 13 GI types, of which two (GI.5[P12] and GI.7[P9]) were not detected by the Ta=50°C method. Following optimisation, taxa observations reduced from 41 to 38 along with taxa richness from 4.1 ± 1.7 to 3.8 ± 1.4 (p=0.729) (Figure 6 i and ii). Differences in detected taxa were from types whose genotypes and p-types had been detected separately in different samples, apart from GI.P5 which was not detected by the optimised assay. For GII, the total number of types detected increased from 6 to 8 after optimisation, with Ta=50°C not detecting GII.2[P31] and GII.4[P16] (Figure 7). Total taxa count increased from 26 to 41 and taxa richness significantly increased from 2.6 ± 1.07 to 4.1 ± 0.7 (p<0.001) (Figure 7B i and ii).

Noroviruses in Wastewater in England
This section only uses the data from the wastewater optimised assay. Aside from GI.8, all GI genotypes and 11 out of 14 GI p-types described by the CDC were detected from the pooled samples collected from across England (Supplementary Table 5 and 6) (Centre for Disease Control, 2023b).
Eight GII types, representing 4 of the 26 genotypes and 4 of the 37 p-types as described by the CDC were detected (Supplementary Table 7 and 8 having the highest proportion of reads in all cases (Figure 7).

Discussion
Two of the challenges faced during method development were the physicochemical and microbiological nature of wastewater, with its impact on NoV detection, and the biology of NoVs themselves. As enteric viruses, NoVs are detected frequently (82% to 100%) in wastewater, making NoV-negative samples difficult to obtain (Qiu et al., 2018; Huang et al., 2022). Performing method validation with spiked-matrix mock communities is, therefore, difficult to achieve. Additionally, difficulties obtaining clinical samples and culturing NoVs make obtaining a diverse range of reference material for artificial matrix-based experimentation challenging (Cates et al., 2020).
Furthermore, such experiments fail to account for the potentially large matrix-effect that wastewater exhibits (Scott et al., 2023).This creates issues around determination of PCR chimeras, quantitative validation and calculating sequencing error rates.

Wastewater and RT-PCR Inhibition
Wastewater's physicochemical properties are poorly characterised and are likely influenced by many geospatial and environmental factors. RT and PCR inhibitors likely present include polysaccharides, bile salts, lipid, urate, fulvic and humic acids, metal ions, algae and polyphenols (Schrader et al., 2012; Sims and Kasprzyk-Hordern, 2020). Inter-sample RT-qPCR inhibition levels have been shown to be highly variable (0% to 98%) for SARS-CoV-2 (Scott et al., 2023). Inhibition is, therefore, likely to influence the data quality of PCR-dependent WBE methods.
Here, we confirm the suitability of the inhibitor removal method of Child et al., (2023) for detecting NoVs in wastewater, reducing inhibition (85%) and increasing NoV quantification (4.8-fold) as measured by RT-qPCR (Figure 1 A and B). These data should be used with caution as inhibitor susceptibility may change between different enzymes and amplicons; quantitative performance gains for this metabarcoding approach cannot be inferred (Huggett et al., 2008; Kermekchiev et al., 2009).
Following inhibitor removal, LunaScript® showed 163- to 1731-fold increases in sequencing depth, highlighting the large impact inhibitors can have on amplicon sequencing and the importance of matrix-specific assay optimisation.

Wastewater and PCR Optimisation
The importance of matrix-specific assay optimisation is emphasised by wastewater's likely diverse nucleic acid content, with influence from humans, their food, the microbial ecology of the gastrointestinal tract and other natural and industrial sources of wastewater. This makes designing nucleic acid-based assays challenging due to an increased likelihood of NSA. Furthermore, nucleic acid degeneracy within primers is required to capture the full genomic diversity of NoV, increasing the likelihood of NSA (De Graaf et al., 2016; Ford-Siltz et al., 2021).
Several polymerases were assessed for sensitivity, NSA and yield. Platinum™ showed the best performance overall for both GI and GII PCR assays (Figure 3 & Figure 4). It is likely that Platinum™ shows increased enzymatic activity at higher temperatures, allowing good performance at higher Ta, thereby reducing NSA and increasing yields. Optimisation of the Platinum™ PCR conditions significantly increased taxa richness for GII (57.7%) while a non-significant reduction was seen for GI.
Assay optimisation increased the frequency of observation for some taxa; for example, GII.2[P16] was detected in 20% of the samples using the Ta=50°C method but in 100% with the optimised protocol. Others (GII.2[P31] & GII.4[P16]) were not detected using the Ta=50°C method (Figure 6 B). This is likely due to the increased Ta reducing primer-template interactions when mismatches are present, allowing the preferential amplification of NoV. Failure to properly optimise methods implemented in WBE is, therefore, likely to underestimate the diversity of the organisms under investigation.

Sequencing and Bioinformatics
GI and GII were sequenced under a single barcode per sample to maximise throughput.PCRs were optimised to reduce NSA but the residual NSA led to read-depth disparity for GI and GII.Size selection to remove NSA was unsuccessful, perhaps because products of the NSA were often close in size to the AOI.A weighting was therefore applied to the loading of the GI and GII amplicons into library prep to account for the differences in depth.
Given the high estimated read error rate (median ≈Q13), a consensus-based approach was used and analysis was focused mainly at the type-level. Consensus sequences were clustered at 95% prior to estimating abundance of the different sequences within the dataset. This threshold allowed for differentiation of types while accounting for the significant amount of noise in the data. GII subtypes, however, cannot be determined as heterogeneity thresholds are 98% (Tatusov et al., 2021).
Consensus sequences used to identify novel recombinants and putative PCR chimeras were supported by high-quality alignments that were ≥900 bp in length, included ≥90% of a read and had normalised read scores of ≥92%.
A large amount of sequencing data was filtered prior to alignment, which appeared to be due to NSA and the high error rate inherent in nanopore sequencing. The latter should improve with new developments in nanopore sequencing chemistry, basecalling and flow cells. Despite most of the analysis focussing on types rather than subtypes, analysis of NoV faecal samples from three presumed single-type infections showed 100% nucleotide similarity with Sanger sequencing, with the latter having a median basecalling score of Q58 (Supplementary Table 3). This indicates that this method is adequate to discriminate between subtypes if the consensus sequence clustering is performed at 98% rather than the 95% implemented here.

Figure 1. The RT-qPCR inhibition (A) and concentration (B) of norovirus GI from cDNA samples

Figure 3. Gel electrophoresis of the first-round GII PCR products from several polymerases A)

Figure 4. Gel electrophoresis of the semi-nested PCR for GI (A and B) and GII (C and D) using

Figure 5. The percentage of the amplicon of interest (AOI) within the total amplicon pool for the

Figure 6. Norovirus genogroup I types identified in ten pooled wastewater samples (1 to 10) using

Table 1. Primers used for amplification of the RdRp+VP1 region.
Subscripts represent 1) first round primers, 2) semi-nested primers and m) original primer modifications. Bold letters indicate primer modifications.