Abstract
Following the worldwide emergence of the p.Asp614Gly shift in the Spike (S) gene of SARS-CoV-2, there have been few recurring pathogenic shifts occurring during 2020, as assessed by genomic sequencing. This situation has evolved in the last several months with the emergence of several distinct variants (first identified in the United Kingdom and South Africa, respectively) that illustrate multiple changes in the S gene, particularly p.Asn501Tyr (N501Y), that likely have clinical impact. We report here the emergence in Columbus, Ohio in December 2020 of two novel SARS-CoV-2 clade 20C/G variants. One isolate, which has become the predominant virus found in nasopharyngeal swabs in the December 2020-January 2021 period, harbors S p.Gln677His (Q677H), membrane glycoprotein (M) p.Ala85Ser (A85S) and nucleocapsid (N) p.Asp377Tyr (D377Y) mutations. The other isolate contains S N501Y and ORF8 Arg52Ile (R52I), which are two markers of the UK-B.1.1.7 (clade 20I/501Y.V1) strain, but lacks all other mutations from that virus. It is also from a different clade and shares multiple mutations with the clade 20C/G viruses circulating in Ohio prior to December 2020. These two SARS-CoV-2 viruses emerging now in the United States add to the diversity of S gene shifts occurring worldwide and support multiple independent acquisitions of S N501Y (in likely contrast to the unitary S D614G shift) occurring first during this period of the pandemic.
Introduction
SARS-CoV-2 genomic sequencing has facilitated surveillance efforts to track shifts in viral isolates worldwide (Brufsky, 2020). The emergence in March-April of 2020 of the D614G mutation defining the more transmissible G-strain has been the primary shift during the first nine months of the pandemic (Korber et al, 2020). This variant has been shown to have increased cell binding and viral spread in in vitro models (Mok et al, 2020; Hu et al, 2020). Within the last several months, however, several distinct SARS-CoV-2 strains with additional likely pathogenic changes have emerged. These include the rapid spread of novel variants in the United Kingdom (UK, Technical Advisory Group, 2020; European Centre for Disease Prevention and Control, 2020) and South Africa (Tegally et al, 2020) containing several likely pathogenic but distinct mutations in the Spike (S) gene, particularly N501Y. The rapid transmissibility of these variants (Davies et al, 2020) and the sudden occurrence of multiple changes in the S gene have raised concern about shifts in the pattern of COVID-19 disease and possible variability in response to antibody therapies or vaccines.
Here, we report the results of SARS-CoV-2 genomic surveillance from April 2020 through January 2021 in Columbus, Ohio. These data reveal a parallel recent shift in the predominant 20C>20G clade that contains 3 new variants and the presence of a novel viral isolate that also harbors the S N501Y variant but is different from UK-B.1.1.7 (20I/501Y.V1) and the South African variant (20H/501Y.V2).
Methods
This study was approved by the Institutional Review Board for the utilization of residual RNA samples from routine clinical SARS-CoV-2 PCR testing for viral sequencing. Briefly, standard PCR-based detection of SARS-CoV-2 was initiated by extraction of viral RNA from nasopharyngeal (NP) swabs (KingFisher™ Flex Magnetic Particle Processor, ThermoFisher). The viral RNA was analyzed, in most cases, using the TaqPath COVID-19 Combo Kit with an Applied Biosystems 7500 Fast Dx Real-Time PCR instrument (ThermoFisher) for SARS-CoV-2 detection. The SARS-CoV-2 virus sequence was then determined by next-generation sequencing (NGS) using a validated clinical assay in the James Molecular Laboratory at The Ohio State University. Residual RNA from PCR-based testing was reverse-transcribed using SuperScript™ VILO™ cDNA Synthesis Kit (ThermoFisher). NGS was performed using primer sets that tiled the entire SARS-CoV-2 genome (Ion AmpliSeq SARS-CoV-2 Research Panel, ThermoFisher), with library preparation and sequencing performed on Ion Chef and S5, respectively (Ion Torrent, Life Technologies). This panel included primers for the co-amplification of human housekeeping genes to assess RNA quality.
Analysis was performed in the Ion Browser with COVID-19 annotation plugins that produced consensus FASTA files using the IRMA method (reference strain: NC_045512.2). For tree-building, individual COVID-19 sequence FASTA files were combined into a single multifasta file with a custom shell script. The multifasta files were aligned using MAFFT (Katoh et al, 2002) (version 7.453) with default settings. MAFFT alignment files were analyzed for maximum likelihood using RAxML (Stamatakis, 2006) (version 8.2.12) with the GTRGAMMA model and 1000 bootstraps. The tree was produced using Dendroscope (Huson and Scornavacca, 2012) (version 3.7.2) with default settings. Numbers at the tree branches represent the percentage of bootstraps supporting a branch (i.e. 30 = 300/1000 runs supporting this branch). Strain typing and clades were designated using the most recent NextStrain nomenclature (Bedford et al, 2021).
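As an illustration of the FASTA-combining step only, a minimal Python sketch is given here. It is not the custom shell script used in this study; the function name `combine_fasta`, the `*.fasta` file pattern and the directory layout are assumptions made for the example.

```python
from pathlib import Path


def combine_fasta(input_dir, output_path, pattern="*.fasta"):
    """Concatenate per-sample FASTA files into a single multifasta file.

    Hypothetical helper for illustration. Returns the number of sequence
    records (header lines starting with '>') written to the output.
    """
    output_path = Path(output_path)
    n_records = 0
    with output_path.open("w") as out:
        # Sort the inputs so the record order is reproducible between runs.
        for fasta in sorted(Path(input_dir).glob(pattern)):
            if fasta.resolve() == output_path.resolve():
                continue  # skip the output file if it sits in the input directory
            for line in fasta.read_text().splitlines():
                line = line.strip()
                if not line:
                    continue  # drop blank lines between records
                if line.startswith(">"):
                    n_records += 1
                out.write(line + "\n")
    return n_records
```

The resulting file is the kind of input that a multiple-sequence aligner such as MAFFT accepts; alignment and tree-building settings are those stated in the text above.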
The sequence of the COH.20G/501Y variant in sample D32 was confirmed by an independent SARS-CoV-2 genomic sequencing and analysis method. Briefly, RNA was reverse-transcribed using SuperScript™ VILO™ cDNA Synthesis Kit (ThermoFisher). Libraries were produced with KAPA HyperPrep and DI Adapter Kit (Roche), SARS-CoV-2 viral sequences were captured with COVID-19 Capture Panel covering the entire genome (IDT), and the products were sequenced on the NextSeq 550 (Illumina). The analysis pipeline included BaseSpace, a custom pipeline using GATK tools and DRAGEN RNA Pathogen Detection software (Illumina).
Results
Summary of sequencing results in the early and mid-pandemic period
RNA extracted from PCR-positive nasopharyngeal samples from the Columbus, OH area was sequenced in April (n = 56), May (n = 71), June (n = 21), July (n = 16), September (n = 11) and December 2020 (n = 36) and January 2021 (n = 11) for surveillance purposes. A total of 222 NP samples were sequenced. In April 2020, two samples were positive for the S-strain with the remainder representing the G-strain.
Aside from the G-strain-defining changes (Supplementary Table 1), there were very few recurrent non-conservative/non-synonymous changes observed in April and May 2020 samples. In that period, most of the G-strain positive cases represented G strain alone or an unspecified G branch (17.3%, commonly with ORF8 p.Ala51Val) or the 20C clade bearing ORF3 p.Gln57His (80.3%), with few representing clade 20B (2.4%). In June and July 2020, as apparent infection rates in Columbus decreased, there was a proportional increase in clade 20B (40.5% of samples), with fewer G/unspecified (21.7%) and clade 20C viruses (37.8%). In September, coinciding with an increase in PCR positivity rates in the area, 20C clade samples again predominated (72.7% of samples), with some showing additional variants closely matching the newly designated 20G clade (Bedford et al, 2021), and the remainder being 20A or 20B clade viruses. Samples were not obtained during the months of October and November 2020.
Rapid emergence of a clade 20G virus with shared S, N and M mutations
When sequencing resumed in late December, we noted the emergence of a distinct 20C/G clade that had acquired the following variants: S p.Gln677His, M p.Ala85Ser and N p.Asp377Tyr (Table 1A) and is designated COH.20G/677H (Figure 1, brackets). During the week of Dec 21st 2020, these 3 variants were co-detected in 1 of 10 samples (10%); the following week, they were detected in 6/20 (30%) samples, and in the week after that in 6/10 (60%) samples.
Samples labeled “COH” are nasopharyngeal swabs from patients tested in Columbus, Ohio from 12/21-12/31/20. Most are clade 20G, with one each being 20A, 20B, and a 20G variant (var) that lacks every strain-defining mutation. The S N501Y-bearing virus is marked with an arrow. The emerging 20G/677H variant is bracketed, with the adjacent 20G/377Y containing N D377Y but not Q677H or M A85S. Reference sequences (FASTA files downloaded from GISAID.org) show recent examples of the UK B.1.1.7/20I/501Y.V1 strain (EPI_ISL_792680 collected in Japan on 1/2/2021), South African 20H/501Y.V2 strain collected in Australia (EPI_ISL_775245, 1/4/2021) and South Africa (EPI_ISL_745160, 12/4/2020), and a 20C-derived isolate from Nevada with several S variants (EPI_ISL_751557, 12/4/2020) as well as the 2019-nCoV/USA-WA1/2020 SARS-CoV-2 reference strain (ATCC). See Methods for details on tree-building and interpretation.
In all cases, these three co-occurring variants arose in a 20G clade variant branch that had been present in Columbus since at least September 2020. The backbone was defined by ORF1A: p.Met2606Ile (c.7818G>A), p.Leu3352Phe (c.10054C>T), p.Thr4847Ile (c.14540C>T), p.Leu6053Leu (c.18159A>G), p.His7013His (c.21039C>T); ORF3A: p.Gly172Val (c.515G>T); ORF8: p.Ser24Leu (c.71C>T); N: p.Pro67Ser (c.199C>T) and p.Pro199Leu (c.596C>T).
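Because this report switches between three-letter protein notation (p.Gln677His) and one-letter shorthand (Q677H), a short Python sketch of the conversion may help when cross-checking variant lists. The function name `short_form` is hypothetical, and the sketch assumes simple single-residue substitutions only; it deliberately rejects compound changes such as p.ArgGly203LysArg.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# One reference residue, a position, one alternate residue; "p." is optional.
_SUBSTITUTION = re.compile(r"^(?:p\.)?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def short_form(change):
    """Convert e.g. 'p.Gln677His' to 'Q677H' (single substitutions only)."""
    match = _SUBSTITUTION.match(change)
    if match is None:
        raise ValueError(f"not a simple substitution: {change!r}")
    ref, pos, alt = match.groups()
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        raise ValueError(f"unknown amino acid code in {change!r}")
    return THREE_TO_ONE[ref] + pos + THREE_TO_ONE[alt]
```

For example, `short_form("p.Ala85Ser")` gives `"A85S"`, and synonymous changes such as p.Leu6053Leu are rendered as `"L6053L"`.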
Emergence of a clade 20G virus harboring N501Y and ORF8 R52I variants
In late December, we detected a single sample with a 20G strain backbone that had acquired S N501Y and ORF8 R52I mutations (designated COH.20G/501Y, Figure 1, arrow), which are both present in the UK-B.1.1.7 strain. In contrast to the B.1.1.7 strain, which has a 20B origin, the S N501Y and ORF8 R52I variants identified in this case were on a 20G background common to our area, as defined by ORF1A: p.Leu3352Phe (c.10054C>T), p.Leu6053Leu (c.18159A>G), p.His7013His (c.21039C>T), ORF3A: p.Gly172Val (c.515G>T), ORF8: p.Ser24Leu (c.71C>T), N: p.Pro67Ser (c.199C>T), and p.Pro199Leu (c.596C>T). In addition to lacking the characteristic N p.ArgGly203LysArg (c.608_610delGGGinsAAC) marking the 20B clade, COH.20G/501Y also lacks the other consensus changes in B.1.1.7, as summarized in Supplementary Table 1.
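The backbone markers listed for COH.20G/501Y can be compared with those listed earlier for the COH.20G/677H branch using simple set operations. The Python sketch below does this with the markers exactly as written in the text; the variable and function names are hypothetical, and a marker missing from a list here means only that the text does not list it, not that it was confirmed absent from the genome.

```python
# Backbone markers as listed in the text for the COH.20G/677H branch.
BACKBONE_677H = {
    "ORF1A:p.Met2606Ile", "ORF1A:p.Leu3352Phe", "ORF1A:p.Thr4847Ile",
    "ORF1A:p.Leu6053Leu", "ORF1A:p.His7013His", "ORF3A:p.Gly172Val",
    "ORF8:p.Ser24Leu", "N:p.Pro67Ser", "N:p.Pro199Leu",
}

# Backbone markers as listed in the text for COH.20G/501Y.
BACKBONE_501Y = {
    "ORF1A:p.Leu3352Phe", "ORF1A:p.Leu6053Leu", "ORF1A:p.His7013His",
    "ORF3A:p.Gly172Val", "ORF8:p.Ser24Leu", "N:p.Pro67Ser",
    "N:p.Pro199Leu",
}


def compare_backbones(first, second):
    """Return sorted lists of shared markers and markers unique to each set."""
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


result = compare_backbones(BACKBONE_677H, BACKBONE_501Y)
print(len(result["shared"]), "shared markers")
print("listed only for COH.20G/677H:", result["only_first"])
```

On the lists as written, all seven COH.20G/501Y backbone markers are shared, and ORF1A p.Met2606Ile and p.Thr4847Ile are listed only for the COH.20G/677H branch.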
Discussion
We report the presence of a viral isolate from late December 2020 in Columbus, Ohio that has acquired the S N501Y variant. This amino acid change was first described in a clinical sample in the United Kingdom in association with other novel S variants and a clade 20B backbone (ECDC, 2020; Davies et al, 2020), with the combination named as the B.1.1.7 strain and a NextStrain designation as 20I/501Y.V1 (Bedford et al, 2021). The same N501Y mutation was subsequently found in a clade 20C strain in South Africa, where it was associated with a different set of additional S variants (Tegally et al, 2020), with NextStrain designation as 20H/501Y.V2. In late December 2020, the incidence of detection of both of these variants began markedly increasing, implicating S 501Y (with or without other S mutations) in increased transmissibility. The virus with S N501Y identified in Columbus (COH.20G/501Y) has a 20G backbone but lacks most of the reported consensus changes in 20I/501Y.V1 as well as those in 20H/501Y.V2. This favors an independent acquisition of this variant in a 20G clade branch that has been consistently present in Ohio since at least September 2020.
The S N501Y mutation, located within the receptor binding domain, is of particular concern for two reasons. First, the S protein with the 501Y mutation displays increased affinity for ACE2 (Luan et al, 2020; Starr et al, 2020). Second, the S 501Y mutation may impact association of receptor binding neutralizing antibodies including those in the Regeneron cocktail (Weisblum et al, 2020; Starr et al, 2020). The S N501Y mutation has also been shown to emerge spontaneously with viral passaging in a mouse model of SARS-CoV-2 infection (Gu et al, 2020), supporting its role in promoting viral spread and/or transmissibility. The only other shared mutation of COH.20G/501Y with B.1.1.7 is the ORF8 R52I mutation. B.1.1.7 also has a deletion involving ORF8 that would likely inactivate its functions; such deletions have emerged in multiple strains of SARS-CoV-2 (Su et al, 2020) but were not present in COH.20G/501Y.
We also report the emergence of a predominant SARS-CoV-2 virus population with a 20C/G clade backbone that has single mutations in the S, M and N genes (p.Gln677His, p.Ala85Ser and p.Asp377Tyr, respectively) in Columbus, Ohio in December. The Q677H mutation affects a QTQTN consensus sequence (Figure 2) adjacent to the polybasic furin cleavage site spanning the S1 and S2 junction (Jacob et al, 2020). That mutation has only been rarely reported in NextStrain and in other publications (Kim et al, 2020). Deletions spanning the QTQTN motif have been reported and may influence viral properties (Liu et al, 2020). The N Asp377Tyr mutation has been uncommonly reported previously (Gupta et al, 2020). The rapid emergence of this variant merits close attention.
Signal peptide (SP), N-terminal domain (NTD), receptor-binding domain (RBD), fusion peptide (FP), heptad repeat 1 (HR1), heptad repeat 2 (HR2), and transmembrane domain (TM).