High-Throughput, Single-Copy Sequencing Reveals SARS-CoV-2 Spike Variants Coincident with Mounting Humoral Immunity during Acute COVID-19

Tracking evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) within infected individuals will help elucidate coronavirus disease 2019 (COVID-19) pathogenesis and inform use of antiviral interventions. In this study, we developed an approach for sequencing the region encoding the SARS-CoV-2 virion surface proteins from large numbers of individual virus RNA genomes per sample. We applied this approach to the WA-1 reference clinical isolate of SARS-CoV-2 passaged in vitro and to upper respiratory samples from 7 study participants with COVID-19. SARS-CoV-2 genomes from cell culture were diverse, including 18 haplotypes with non-synonymous mutations clustered in the spike NH2-terminal domain (NTD) and furin cleavage site regions. By contrast, cross-sectional analysis of samples from participants with COVID-19 showed fewer virus variants, without structural clustering of mutations. However, longitudinal analysis in one individual revealed 4 virus haplotypes bearing 3 independent mutations in a spike NTD epitope targeted by autologous antibodies. These mutations arose coincident with a 6.2-fold rise in serum binding to spike and a transient increase in virus burden. We conclude that SARS-CoV-2 exhibits a capacity for rapid genetic adaptation that becomes detectable in vivo with the onset of humoral immunity, with the potential to contribute to delayed virologic clearance in the acute setting.

1 1 6 exclusion of unique SGS, we found that all remaining sequences exactly matched their 1 1 7 corresponding references, with quantitative recovery of the two targets from a dilution series 1 1 8 (Table I). These results support the high accuracy of our data generation and analytical approach.

HT-SGS Analysis of a Cultured Clinical Isolate of SARS-CoV-2
1 2 3 To begin evaluating intra-sample diversity of SARS-CoV-2, we applied our HT-SGS process to 1 2 4 a 4 th -passage Vero cell culture of the WA-1 reference clinical isolate. As shown in Figure 2A, sequence, consistent with the high accuracy of the method. However, data analysis at the single- individual virus genomes per haplotype, with each single-genome consensus supported by >500 1 2 9 sequence reads (Fig 2A-B). More than half (57.6%) of all SGS differed from the reference 1 3 0 consensus sequence at one or more nucleotide positions (Fig 2A). All 17 mutations detected in Structurally, mutations were clustered almost exclusively in the spike NTD and furin cleavage 1 3 3 site regions. The NTD mutations included 9 distinct single-nucleotide variants (SNVs) and 2 1 3 4 distinct insertions that added positively-charged or removed negatively-charged amino acid 1 3 5 residues at the NTD outer surface (Fig 2A and 2C), consistent with observed selection patterns in 1 3 6 other virus envelope proteins during cell culture adaptation (21,22). Mutations in the area of the 1 3 7 furin cleavage site included 3 SNVs and one deletion of 12 amino acids (Fig 2A and 2C), and 1 3 8 were consistent with mutations observed in this region after in vitro passage in other studies (23).

3 9
The remaining 2 mutations encoded a T307I substitution in spike, linked with R682L at the furin 1 4 0 cleavage site, and a T7I substitution in the M gene found both in isolation and linked with 2 1 4 1 different spike NTD mutations (Fig 2A). Overall, these results demonstrated that SARS-CoV-2 1 4 2 can accumulate considerable genetic diversity, as revealed by analysis of HT-SGS data at the We anticipated that, compared to high-quality RNA preparations from cultured virus, human 1 4 6 respiratory samples would contain variable levels of intact SARS-CoV-2 genomes, and that 1 4 7 contaminants and inhibitors of steps in the HT-SGS process might also be present. We therefore  Cross-sectional analysis of SARS-CoV-2 diversity and humoral immunity during acute 1 7 4

COVID-19
1 7 5 Because the mutations we detected in cultured virus resembled those described for SARS-CoV-2 1 7 6 and other viruses during culture adaptation, we interpreted the extensive diversity observed as 1 7 7 evidence of virus diversification in vitro rather than in the source patient. We therefore analyzed 1 7 8 the diversity of HT-SGS sequences obtained from the 7 study participants in S1 structural ORF3 and ORF6 genes, and 2 synonymous SNVs (Fig 3). Overall, therefore, cross- biolayer interferometry (BLI) to analyze antibody profiles in participants from whom 1 9 2 longitudinal serum samples were available (i.e., participants 1 and 3). In these individuals, we 1 9 3 observed a marked rise in autologous serum binding to spike protein between the earliest 1 9 4 available timepoint (participant 1, day 9 and participant 3, day 17) and later timepoints 1 9 5 (participant 1, days 16 and 19 and participant 3, day 27; Fig 4). The increase in total serum 1 9 6 binding to spike was 6.2-fold between days 12 and 16 in participant 1 and 5.75-fold between 1 9 7 days 17 and 27 in participant 3. Using monoclonal antibody (mAb) competition to map domain-1 9 8 specific responses, we detected serum binding to NTD, receptor-binding domain (RBD), and S2 1 9 9 domain in both participants (Fig 4). We also observed a continued increase in serum binding not  We next investigated the relationship between mounting spike-directed antibody responses and found that the burden of SARS-CoV-2 RNA declined substantially but irregularly between days 2 0 8 9 and 17 ( Fig 5A). Between days 9 and 13, virus RNA declined by nearly 4 orders of magnitude, 2 0 9 1 2 from 2.83 x 10 6 (N2) -3.0 x 10 6 (N1) copies/mL to 3.14 x 10 2 (N1) -3.86 x 10 2 (N2) copies/mL. deletions of either residues 141-144LGVY or residue 144Y alone. These mutations were found larger study of recurrently deleted regions (9). Therefore, we performed additional serum antibody mapping studies with this mAb and found that before the NTD mutations had emerged 2 2 4 in autologous viruses, autologous serum antibodies against NTD predominantly recognized the 2 2 5 4A8 epitope ( Fig 5D). Taken together, these results demonstrated a close temporal relationship 2 2 6 between the development of SARS-CoV-2 spike NTD-specific antibodies in serum, the 2 2 7 independent emergence of multiple mutations in a region of the NTD targeted by these 2 2 8 antibodies, and a transient delay in virus clearance. Here we developed and validated a novel method that accurately sequences the 6.1-kilobase 2 3 1 SARS-CoV-2 surface protein gene region from large numbers of individual virus genomes. Using this method, we analyzed virus genetic diversity both in vitro and in respiratory secretions slow evolution among worldwide virus sequences during the early months of the pandemic (1).

3 7
Nevertheless, our relatively homogeneous cross-sectional sequencing findings in people with shown to be a neutralizing antibody target (24), and was identified herein as a major target for spike variants by mounting antibody responses in the acute setting. 2 RNA in respiratory secretions may also favor high-quality data acquisition in very early all sequences in each sample. Finally, we cannot rule out that our distinctive findings might rapid accumulation of "total-body" virus diversity in vivo. However, we noted that our Tracking intra-individual virus evolution is of great interest in understanding SARS-CoV-2 2 6 7 pathogenesis and treatment. As our longitudinal study participant recovered clinically, spike polyclonal responses, perhaps especially in genome regions less tolerant of indel mutations (9). It will be important to examine whether this reflects a "tipping point" in early infection at which administered later in the disease course. Clinical Center who had positive tests for SARS-CoV-2 were enrolled consecutively for 2 9 0 combined virological and immunological analysis during the period of March-May 2020 (S1 2 9 1 Table). Study participants were recruited in compliance with relevant ethical regulations and 2 9 2 provided informed consent under protocols approved by the NIH Institutional Review Board.  from study participants were collected in viral transport medium and cryopreserved until 3 1 2 processing for HT-SGS or SARS-CoV-2 RNA quantification. Total RNA was extracted from oropharyngeal and nasopharyngeal swab specimens using the Pro Software (Bio-Rad) to quantify copies of N1, N2, and RP genes per well, which was then 3 2 6 normalized to mL of sample input. Nasopharygneal or oropharyngeal swab fluids were thawed and centrifuged at 1,150 x g for 15 3 2 9 min at room temperature to pellet cells and debris. Supernatants were transferred to separate 3 3 0 tubes, and supernatant and pellet fractions were processed in parallel, although SGS derived ThermoFisher Scientific) using a reverse transcription (RT) primer binding within the SARS- and S1 Table). Virus cDNA was treated with proteinase K for 25 min at 55°C with continuous RNAClean XP solid phase reverse immobilization (SPRI) beads (A63987, Beckman Coulter).

6 8
Length filtering was performed to remove reads shorter than 90% or longer than 130% of the 3 6 9 reference sequence length. Appropriately-sized reads were then binned using 8-base UMI Determining SGS in HT-SGS Data The probability that two independent UMI sequences differ by a single nucleotide substitution  We investigated the possibility of UMI collision (two distinct molecules labeled with the same  This number was observed to be small, and the probability of collision in each sample was at 4 1 6 most 4%, with an average 1.8% across all samples. We also note that, in the event of a UMI 4 1 7 collision between two distinct sequences, the clustering and consensus formation for each UMI abundance and removal of the sequence cluster with lower read abundance. We also applied a filtering criterion to remove variants in homopolymer regions with minimum 4 3 6 length of 2 and a frequency less than or equal to 20%. We did not consider quality or direction 4 3 7 and position filters typically used in analyzing paired-end, short-read data as these do not apply 4 3 8 to long-read amplicon sequencing. We then manually inspected the mutation list to remove were reverted to the reference base, where applicable, using an in-house python script. to HT-SGS. Results are described in Results and Table I.  representation with the NTD region colored in bright green. NTD mutations as well as T307I and 5 2 9 H655Y are shown in red and the furin cleavage site mutations are in brown. The molecular 5 3 0 structures were prepared with PyMOL (https://pymol.org).  Non-synonymous or synonymous mutations in each haplotype relative to the WA-1 reference with at least 1 non-synonymous mismatch to sample consensus are in pink. indicates the binding response without competition and is reported at saturating timepoint.

4 7
Stacked bars indicate proportions of binding attributable to S2 (dark blue), RBD (purple), and (right) are as in Fig 2 and 3. The haplotype matching the consensus for each sample is in red. Antibody 4A8 (PDB ID: 7c21) is shown to bind to NTD with its epitope (blue) 5 6 8 overlapping with the detected NTD mutations (right). The molecular structures were prepared 5 6 9 with PyMOL (https://pymol.org). (D) Relative contribution of NTD epitope-specific serum 5 7 0 antibodies to total NTD domain-specific binding on days 9, 12, 16, and 19. Plotted results 5 7 1 represent averages of 2-4 replicate experiments for each condition.  PCR using a forward primer that binds in ORF1, upstream of the spike gene start codon.

7 8
Amplified products are then subjected to long-read sequencing. show tasks at each step, with the tools used in the grey boxes, and the outputs in the blue bubbles. were excluded, following initial false bin removal from the sample and network adjacency. Data  Comparison of cDNA input copies from each sample with final SGS counts.