Abstract
Wide-scale SARS-CoV-2 genome sequencing is critical to monitoring and understanding viral evolution during the ongoing pandemic. Variants first detected in the United Kingdom, South Africa, and Brazil have spread to multiple countries. We have developed a software tool, Variant Database (VDB), for quickly examining the changing landscape of spike mutations. Using this tool, we detected an emerging lineage of viral isolates in the New York region that shares mutations with previously reported variants. The most common sets of spike mutations in this lineage (now designated as B.1.526) are L5F, T95I, D253G, E484K or S477N, D614G, and A701V. This lineage appeared in late November 2020, and isolates from this lineage account for ~25% of coronavirus genomes sequenced and deposited from New York during February 2021.
After the early months of the SARS-CoV-2 pandemic, the vast majority of sequenced isolates contained spike mutation D614G (along with three separate nucleotide changes) (Korber et al., 2020). Following a period of slower change, the fourth quarter of 2020 witnessed the emergence of several variants containing multiple mutations, seemingly focused on the spike protein (Rambaut et al., 2020b; Faria et al., 2021; Tegally et al., 2020; Zhang et al., 2021). Multiple lines of evidence support escape from antibody selective pressure as a driving force for the development of these variants (Cele et al., 2021; Greaney et al., 2021; Wang et al., 2021; Wibmer et al., 2021). Concerns about the potential effects of these mutations on the effectiveness of passive antibody therapies and on the ability of vaccines to prevent mild or moderate COVID-19 have driven genomic surveillance programs to monitor the evolution of SARS-CoV-2. Analysis of this wealth of genomic sequences requires a variety of bioinformatics techniques. Here we describe a simple and fast utility that permits rapid inspection of the mutational landscape revealed by these sequences. This tool uncovered several groups of recent isolates that contain mutation patterns with changes at critical antibody binding sites.
Methods
We have developed a software tool named VDB (Variant Database). This tool consists of two Unix command-line utilities: (1) vdb, a program for examining spike mutation patterns in a collection of sequenced isolates, and (2) vdbCreate, a program for generating a list of isolate spike mutations from a multiple sequence alignment for use by vdb. The design goal for the query program vdb is to provide a fast, lightweight, and natural means to examine the landscape of SARS-CoV-2 spike mutations. These programs are written in Swift and are available for macOS and Linux from the authors; they will also be placed in a GitHub repository. The vdb program implements a mutation pattern query language (see Supplementary Methods) as a command shell. The first-class objects in this environment are a collection of isolates (a “cluster”) and a group of spike mutations (a “pattern”). These objects can be assigned to variables and are the return types of various commands. Generally, clusters can be obtained from searches for patterns, and patterns can be found by examining a given cluster. Clusters can be filtered by geographical location, collection date, mutation count, or the presence or absence of a mutation pattern. The geographic or temporal distribution of clusters can be listed.
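The relationship between clusters and patterns can be illustrated with a minimal Python sketch. This is not the vdb implementation (which is written in Swift) and does not reproduce its query syntax; the `Isolate` record, the function names, and the toy isolates are all hypothetical and serve only to show how a cluster can be filtered by location, date, or pattern, and how a pattern can be derived from a cluster.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Isolate:
    """Hypothetical record for one sequenced isolate."""
    name: str
    location: str
    collected: date
    mutations: frozenset  # spike substitutions, e.g. {"D614G", "E484K"}


def with_pattern(cluster, pattern):
    """Keep isolates that carry every mutation in the pattern."""
    wanted = frozenset(pattern)
    return [iso for iso in cluster if wanted <= iso.mutations]


def without_pattern(cluster, pattern):
    """Keep isolates that lack at least one mutation of the pattern."""
    wanted = frozenset(pattern)
    return [iso for iso in cluster if not wanted <= iso.mutations]


def from_location(cluster, place):
    """Keep isolates whose location string contains `place` (case-insensitive)."""
    return [iso for iso in cluster if place.lower() in iso.location.lower()]


def collected_between(cluster, start, end):
    """Keep isolates collected within the inclusive date range."""
    return [iso for iso in cluster if start <= iso.collected <= end]


def shared_pattern(cluster):
    """Pattern made of the mutations common to every isolate in the cluster."""
    if not cluster:
        return frozenset()
    return frozenset.intersection(*(iso.mutations for iso in cluster))


# Toy isolates (invented for illustration; not GISAID records).
world = [
    Isolate("toy-1", "USA/New York", date(2020, 12, 1),
            frozenset({"T95I", "D253G", "D614G", "E484K"})),
    Isolate("toy-2", "USA/New York", date(2021, 1, 15),
            frozenset({"T95I", "D253G", "D614G", "S477N"})),
    Isolate("toy-3", "USA/California", date(2021, 1, 20),
            frozenset({"D614G"})),
]

ny = from_location(world, "new york")   # cluster from a location filter
e484k = with_pattern(ny, {"E484K"})     # cluster from a pattern search
print([iso.name for iso in e484k])
print(sorted(shared_pattern(ny)))       # pattern from a cluster
```

Representing a pattern as a set makes "contains this pattern" a subset test and "pattern of a cluster" a set intersection, which is one simple way to obtain the cluster-to-pattern and pattern-to-cluster round trip described above.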
Results presented here are based on a multiple sequence alignment from GISAID (Elbe and Buckland-Merrett, 2017; Shu and McCauley, 2017) downloaded on February 10, 2021. Additional sequences downloaded from GISAID on February 22, 2021, were aligned with MAFFT v7.464 (Katoh and Standley, 2013).
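As a rough illustration of how a list of spike substitutions could be derived from aligned sequences, the following Python sketch compares an aligned query against an aligned reference. It is not the vdbCreate algorithm: it assumes the alignment has already been translated to amino acids, it reports substitutions only (deletions and insertions are skipped), and the function name and toy sequences are hypothetical.

```python
def spike_substitutions(ref_aligned, query_aligned):
    """List substitutions such as "L5F" between two aligned protein strings.

    Positions are numbered along the reference, so alignment columns where
    the reference has a gap (an insertion in the query) do not advance the
    position counter.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")

    substitutions = []
    position = 0
    for ref_aa, query_aa in zip(ref_aligned, query_aligned):
        if ref_aa == "-":
            continue  # insertion relative to the reference
        position += 1
        if query_aa in "-X" or query_aa == ref_aa:
            continue  # deletion, ambiguous residue, or no change
        substitutions.append(f"{ref_aa}{position}{query_aa}")
    return substitutions


# Toy aligned fragments (illustrative strings, not a real alignment).
print(spike_substitutions("MFVFLVLLPL", "MFVFFVLLPL"))
print(spike_substitutions("MF-VFLVLLPL", "MFAVFFVLLPL"))
```

Numbering along the reference keeps mutation names stable when a query carries an insertion, which matters if patterns from different isolates are to be compared directly.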
Phylogenetic Analysis
Multiple sequence alignments were performed with MAFFT v7.464 (Katoh and Standley, 2013). The phylogenetic tree was calculated by IQ-TREE (Nguyen et al., 2015), and the tree diagram was generated using iTOL (Interactive Tree of Life) (Letunic and Bork, 2006). The Pango lineage nomenclature system (Rambaut et al., 2020a) provides systematic names for SARS-CoV-2 lineages. The Pango lineage designation for B.1.526 was supported by the phylogenetic tree shown in Figure 1.
Results
Using the vdb tool, we detected several clusters of isolates (unrelated to variants B.1.1.7, B.1.351, B.1.1.248, and B.1.429; Rambaut et al., 2020b; Faria et al., 2021; Tegally et al., 2020; Zhang et al., 2021) with spike mutations at sites known to be associated with resistance to antibodies against SARS-CoV-2 (Gaebler et al., 2021; Wang et al., 2021) (Table 1). This program can find clusters of isolates sharing identical sets of spike mutations, and then these patterns can be used to find potentially related isolates. One notable cluster of isolates was collected from the New York region and represents a distinct lineage, now designated as B.1.526 (Figure 1). There are two main branches of this lineage, one having E484K and the other including S477N, both located within the receptor-binding domain (RBD) of spike (Figure 2 and Supplementary Table S1). Regarding four of the mutations in isolates in this lineage: (1) E484K is known to attenuate neutralization by multiple anti-SARS-CoV-2 antibodies, particularly those found in Class 2 (Gaebler et al., 2021), and is also present in variants B.1.351 (Tegally et al., 2020) and P.1/B.1.1.248 (Faria et al., 2021), (2) D253G has been reported as an escape mutation from antibodies against the N-terminal domain (McCallum et al., 2021), (3) S477N has been identified in several earlier lineages (Hodcroft et al., 2020), is near the binding site of multiple antibodies (Barnes et al., 2020), and has been implicated in increasing viral infectivity through enhanced interactions with ACE2 (Chen et al., 2020; Ou et al., 2020), and (4) A701V sits adjacent to the S2’ cleavage site of the neighboring protomer and is shared with variant B.1.351 (Tegally et al., 2020). The overall pattern of mutations in this lineage (Figure 2) suggests that it arose in part in response to selective pressure from antibodies.
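The two-step search described at the start of this section (group isolates with identical spike mutation sets, then use a resulting pattern to look for related isolates) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the vdb code: the function names, the `min_shared` threshold, and the toy isolates are hypothetical, and "related" is approximated here simply as sharing at least a given number of mutations with the pattern.

```python
from collections import defaultdict


def group_by_identical_pattern(isolates):
    """Group isolate names by their exact set of spike mutations.

    Returns (pattern, names) pairs, largest group first.
    """
    groups = defaultdict(list)
    for name, mutations in isolates.items():
        groups[frozenset(mutations)].append(name)
    return sorted(groups.items(),
                  key=lambda item: (-len(item[1]), sorted(item[0])))


def related_isolates(isolates, pattern, min_shared):
    """Names of isolates sharing at least `min_shared` mutations with the pattern."""
    pattern = frozenset(pattern)
    return sorted(name for name, mutations in isolates.items()
                  if len(pattern & frozenset(mutations)) >= min_shared)


# Toy isolates (invented for illustration; not GISAID records).
toy = {
    "toy-a": {"T95I", "D253G", "D614G", "E484K"},
    "toy-b": {"T95I", "D253G", "D614G", "E484K"},
    "toy-c": {"T95I", "D253G", "D614G", "S477N"},
    "toy-d": {"D614G"},
}

ranked = group_by_identical_pattern(toy)
top_pattern, top_members = ranked[0]
relatives = related_isolates(toy, top_pattern, min_shared=3)
print(sorted(top_pattern), top_members)
print(relatives)
```

In this toy example the relaxed search recovers an isolate that carries S477N in place of E484K, mirroring how an exact-match cluster can lead to a sibling branch of the same lineage.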
Based on the dates of collection of these isolates, it appears that the frequency of lineage B.1.526 has increased rapidly in New York (Table 2).
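The kind of calculation behind such a frequency estimate can be sketched in Python: for each collection month, divide the number of isolates carrying a given pattern by the total number of isolates sequenced that month. The function name and the records below are hypothetical toy values chosen for illustration; they are not the counts reported in Table 2.

```python
from collections import Counter
from datetime import date


def monthly_fraction(records, pattern):
    """Fraction of isolates per (year, month) that carry every mutation in `pattern`.

    `records` is an iterable of (collection_date, mutations) pairs.
    """
    pattern = frozenset(pattern)
    totals, hits = Counter(), Counter()
    for collected, mutations in records:
        month = (collected.year, collected.month)
        totals[month] += 1
        if pattern <= frozenset(mutations):
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}


# Toy records (invented for illustration; not real surveillance data).
toy_records = [
    (date(2020, 12, 5), {"D614G"}),
    (date(2020, 12, 9), {"D614G"}),
    (date(2020, 12, 14), {"D614G"}),
    (date(2020, 12, 20), {"D614G", "D253G"}),
    (date(2021, 1, 10), {"D614G", "D253G"}),
    (date(2021, 1, 12), {"D614G", "D253G"}),
    (date(2021, 1, 25), {"D614G"}),
    (date(2021, 1, 30), {"D614G"}),
]

fractions = monthly_fraction(toy_records, {"D253G"})
for (year, month), fraction in fractions.items():
    print(f"{year}-{month:02d}: {fraction:.0%}")
```

Such fractions describe the deposited sequences only; they track prevalence in the population only to the extent that sampling for sequencing is representative.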
Supplementary Material
Methods
Commands for the program vdb, implementing a mutation pattern query language:
Acknowledgments
We thank the Global Initiative on Sharing All Influenza Data (GISAID) and the originating and submitting laboratories for sharing the SARS-CoV-2 genome sequences; see Supplementary Table S2 for a list of sequence contributors. We thank Andrew Rambaut and Áine O’Toole for lineage designation. This work was supported by the Caltech Merkin Institute for Translational Research (P.J.B.) and the Bill and Melinda Gates Foundation Collaboration for AIDS Vaccine Discovery (CAVD) (INV-002143).
Footnotes
This version has updated Table 2 to include sequencing of samples collected during February 2021.