Long-term serial passaging of SARS-CoV-2 reveals signatures of convergent evolution

Measures to control the COVID-19 pandemic such as antiviral therapy and vaccination have been challenged by ongoing virus evolution under antiviral and immune pressures. Understanding viral evolutionary dynamics is crucial for responding to SARS-CoV-2, and preparing for the next pandemic, by informing prediction of virus adaptation, public health strategies, and design of broadly effective therapies. Whole-genome sequencing (WGS) of SARS-CoV-2 during the pandemic enabled fine-grained studies of virus evolution in the human population. Serial passaging in vitro offers a controlled environment to investigate the emergence and persistence of genetic variants that may confer selective advantage. Nine virus lineages, including four “variants of concern” and three former “variants under investigation” as designated by the World Health Organisation, were chosen to investigate intra- and inter-lineage evolution through long-term serial passaging in Vero E6 cells. Viruses were sampled over at least 33 passages (range 33-100) and analysed using WGS to examine evolutionary dynamics and identify key mutations with implications for virus fitness, transmissibility, and immune evasion. All passage lines continued to replicate in culture, despite regular accumulation of mutations. There was evidence of convergent acquisition of mutations both across passage lines and compared with contemporaneous SARS-CoV-2 clinical sequences from population studies. Some of these convergent mutations are hypothesised to be important in proliferation of SARS-CoV-2 lineages, such as by evading host immune responses (e.g. S:A67V, S:H655Y). Given these mutations arose in vitro, in the absence of a multicellular host immune response, this suggests virus genome mutation resulted from stochastic events, rather than immune-driven mutation.
There was a regular gain and loss of low-frequency variants during serial passaging, but some became fixed across multiple subsequent passages, suggesting either a benefit of the mutation in vitro, or at least a lack of deleterious effect. Our findings reveal valuable insights into the evolution of SARS-CoV-2 by quantitatively investigating evolutionary dynamics of the virus over the greatest number of serial passages to date. Knowledge of these evolutionary trends will be useful for public health and the development of antiviral and vaccine measures to reduce the effects of SARS-CoV-2 infection on the human population.


Introduction
The COVID-19 pandemic saw whole-genome sequencing (WGS) embraced on an unprecedented scale, with nearly 100 countries worldwide possessing SARS-CoV-2 WGS capability and contributing to publicly available sequencing repositories (Chen et al., 2022). Consequently, the number of high-quality full-genome sequences for SARS-CoV-2 (>14,000,000 as of July 2023) far exceeds that of any other pathogen. The availability of these sequencing data has allowed the population-scale evolution of SARS-CoV-2 to be characterised in fine-grained detail, facilitating real-time monitoring of the spread of SARS-CoV-2 and identification of new viral lineages, and has provided crucial insights into virus evolution. For example, the rise and fall of different SARS-CoV-2 lineages has been charted, with attempts to link the presence of key mutations with lineage success in promulgation through the human population. This wealth of genomic data has been vital in guiding public health responses, vaccine design, and antiviral treatments (Markov et al., 2023).
Global sequencing efforts have revealed that SARS-CoV-2 has continued to accumulate mutations throughout its genome since first emerging globally in 2020. The rate of mutation accumulation within SARS-CoV-2 (~1 × 10⁻⁶ to 2 × 10⁻⁶ mutations per nucleotide per replication cycle) is typical of betacoronaviruses, but below the rate typical of other RNA viruses that lack proofreading mechanisms (Markov et al., 2023). Charting the changes accruing in the SARS-CoV-2 genome in real time has allowed quantitative inference of evolutionary processes, such as the diversification of the virus into distinct phenotypes with different transmissibility, disease severity, and immune evasion properties (Markov et al., 2023). In retrospect, the COVID-19 pandemic can be divided into several 'eras': the initial phase of the pandemic, characterised by apparently limited evolution of the virus; the sudden emergence of highly divergent virus lineages with altered phenotypes (variants of concern, VOCs) (Tay et al., 2022); and periods of gradual evolution within VOC lineages (Lythgoe et al., 2023; Tay et al., 2022). Determining the evolutionary forces responsible for each of these eras is key to understanding how the pandemic has progressed over time, how it will continue to progress as SARS-CoV-2 shifts towards being an endemic virus (Lavine et al., 2021), and how future pandemics might unfold.
Analysis of the changes that have occurred in SARS-CoV-2 genomes from the global clinical population suggests the predominant driver of evolution in SARS-CoV-2 has been the accumulation of mutations resulting in increased transmissibility (Markov et al., 2023). Increased transmissibility is a multifaceted trait resulting from a given virus having an increased ability to survive within an infected individual, shed from an infected individual, establish within a new individual, or some combination of these abilities. These are all influenced by the virus evading the host immune response. A range of 'key mutations' that are partially and additively responsible for each of these components of transmissibility have been identified. For example, several mutations in the spike region have been linked to enhanced receptor binding, such as D614G, identified early in the pandemic (Yurkovetskiy et al., 2020), and N501Y, which was identified in several VOCs (Harvey et al., 2021; Leung et al., 2021; Liu et al., 2022). These conclusions, predominantly based on clinical observations albeit with further laboratory-based exploration, are valuable for understanding how the pandemic has unfolded and will continue to unfold.
Whilst genomic data derived from routine genomic sequencing for clinical or surveillance purposes (henceforth referred to as the "clinical population" of SARS-CoV-2) have enhanced understanding of the evolution of SARS-CoV-2, there are inherent limitations in relying solely on this information. The availability of WGS data is influenced by differential sequencing efforts between countries, especially when routine WGS is difficult to achieve in low- and middle-income countries (Chen et al., 2022), or when data sharing in online repositories is not consistent among jurisdictions. Additionally, the gain or loss of mutations in these populations is affected by multiple uncontrolled processes including natural selection, genetic drift, host immunity, and population dynamics. Understanding the relative impact of each process is difficult in such an uncontrolled environment. Consequently, it is challenging to understand the adaptation of SARS-CoV-2 to specific selective pressures, or to predict the potential evolutionary pathways, when relying solely on the clinical population.
Serial passaging in vitro provides complementary insights into the ongoing evolution of viruses. In these experiments, the virus is passaged through successive generations within a cell line, allowing study of virus evolution in a controlled setting as mutations accumulate over the course of passaging (Holland et al., 1982). Most commonly, in vitro serial passaging is conducted to develop virus stocks and assess therapeutics (Dimmock and Easton, 2014), or to assess virus attenuation for vaccine development (Minor, 2015). Serial passaging can also be used to study the evolutionary trajectory of viruses over long-term experiments, with strictly controlled time periods (Andino and Domingo, 2015; Bertels et al., 2019). These experiments can determine the speed at which a virus develops resistance to antiviral therapies, or show how the virus evolves over longer periods in the absence of selective constraints including immune pressure.
Few studies have investigated the evolutionary dynamics of SARS-CoV-2 over time in vitro. Based on limited passaging of ancestral lineages (<15 passages), it was reported that SARS-CoV-2 accumulates mutations readily in vitro, especially when passaged in Vero E6 cells, with an estimated in vitro spontaneous mutation rate of 1.3 × 10⁻⁶ ± 0.2 × 10⁻⁶ per base per infection cycle (Amicone et al., 2022). There is evidence for the convergent acquisition of mutations among samples taken through serial passage, such as through accumulation of mutations associated with Vero E6-cultured samples (Amicone et al., 2022), or associated with the global clinical population of SARS-CoV-2 (Chung et al., 2022). However, studies explicitly focusing on tracking the evolution of SARS-CoV-2 in vitro have generally examined a relatively small number of passages (e.g. <10), over a short time, and relatively few SARS-CoV-2 lineages. The greatest number of passages published to date is 15, with two lineages from early in the pandemic (Amicone et al., 2022). Further in vitro serial passaging studies of SARS-CoV-2 are needed to: i) more closely approach the multiple passages occurring in the human population, ii) assess sites of mutation fixation, iii) use this assessment to determine potential antiviral and vaccine resistance sites developing without antiviral or vaccine pressure, and iv) inform therapeutic and vaccine research.
In this study, we compared the accumulation of mutations in vitro in 11 different passaged viruses, corresponding to nine unique Pango lineages, throughout long-term serial passaging (33-100 passages per passaged virus line). The evolutionary dynamics of these passage lines were determined by characterising the mutations that accumulated, both in terms of their number and potential functional consequences. Additionally, we assessed mutations for evidence of convergence across passage lines and/or with key mutations from the global clinical population of SARS-CoV-2.

Rationale and study design
Since the beginning of the COVID-19 pandemic, routine whole-genome sequencing (WGS) has been conducted at the Virology Research Laboratory, Serology and Virology Division (SAViD), NSW Health Pathology East, on PCR-positive SARS-CoV-2 clinical specimens. Viruses from clinical specimens were isolated in vitro, and then serially passaged and sequenced to investigate the evolutionary dynamics of SARS-CoV-2 in culture (see additional methods below). Initially an early variant (A.2.2) was chosen for this purpose. Later, additional lineages were also grown in vitro based on their relevance within the clinical population globally, with a particular focus on 'variants of concern' (VOC) or 'variants under investigation' (VUI), as designated by the World Health Organisation. In total, 11 different passage lines were established for the purpose of this study, with the sample details outlined in Table 1. All samples were taken through a minimum of 33 passages, up to a maximum of 100 passages.

Serial passaging
All SARS-CoV-2 cultures were performed in Vero E6 cells. Vero E6 cells were maintained in Minimum Essential Media supplemented with 10% FBS and 1× PSG (MEM-10), and incubated at 37°C, 5% CO2. The day prior to infection, a 24-well plate was seeded with 120,000 cells per well and incubated overnight, for approximately 90% confluency at the time of virus inoculation. The Vero E6 cell line was chosen as it supports viral replication to high titres due to its lack of interferon production and relatively high expression of the ACE-2 receptor (Emeny and Morgan, 1979). Therefore, using the Vero E6 cell line disentangles the effects of virus replication dynamics from host-mediated selective pressures. The appearance of any immune evasion-associated mutations from the clinical population during in vitro passaging in Vero E6 cells can help to infer other fitness advantages of these mutations beyond immune evasion.
Clinical specimens (nasopharyngeal swabs in Virus Transport Media, referred to as "Passage 0") positive by diagnostic RT-qPCR were transported to a PC3 facility for virus culture. Specimens were spin-filtered (0.22 µm sterile Ultrafree-MC Centrifugal Filter) to remove cellular debris and reduce potential for bacterial and fungal contamination. Cell culture maintenance media was removed from wells, and 100 µL of virus-containing flow-through from spin-filtered specimens was added to wells. Plates were incubated for 1 h, before inoculum was removed and replaced with 500 µL maintenance media (MEM-2). Virus cultures were incubated for 3-4 days and observed for cytopathic effect (CPE). Cultured virus was then harvested (passage 1), diluted 1:1000, and used to inoculate a fresh plate of naïve Vero E6 cells (passage 2). Cultures of each SARS-CoV-2 lineage were maintained up to a maximum of passage 100. Aliquots of each virus passage were frozen at -80°C until later extraction for whole-genome sequencing.

Whole-genome sequencing
Samples were taken from each passage line and prepared for whole-genome sequencing (WGS). All wet lab methods follow those described previously (Foster et al., 2022), with the exception of the choice of amplicon scheme. In the present study, amplification was performed using the "Midnight" protocol, comprising tiled 1200 nucleotide amplicons (Freed et al., 2020), followed by 150 bp paired-end sequencing on an Illumina MiSeq. We chose not to sequence every passage of every passage line. Instead, we focused on sequencing passages 0-6, then every third passage onwards where possible.
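The sampling scheme described above can be written down as a short sketch. The function name and its parameters are illustrative rather than part of the study's code, and two assumptions are made: the every-third-passage spacing is counted from passage 6, and the final passage of a line is always included.

```python
from typing import List


def sequencing_schedule(final_passage: int, dense_until: int = 6, step: int = 3) -> List[int]:
    """Return the passage numbers targeted for sequencing.

    Assumptions (not stated in the text): spacing of `step` is counted from
    `dense_until`, and the final passage is always added to the schedule.
    """
    # Every passage from 0 up to and including `dense_until` (or the final passage).
    dense = list(range(0, min(dense_until, final_passage) + 1))
    # Every `step`-th passage after the densely sampled block.
    sparse = list(range(dense_until + step, final_passage + 1, step))
    schedule = dense + sparse
    if final_passage not in schedule:
        schedule.append(final_passage)
    return schedule


schedule_33 = sequencing_schedule(33)
schedule_100 = sequencing_schedule(100)
print(schedule_33)
```

In practice some of these target passages were not sequenced (for example where material was insufficient), so the realised schedule for a given passage line is a subset of this list.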

Tracking mutational changes during serial passaging
We assessed the changes in called variants within and among passage lines throughout serial passaging using our in-house "vartracker" pipeline (see below, available from https://github.com/charlesfoster/vartracker), using the variant call format (VCF) files from lofreq as input. For each passage sample within a passage line, these VCF files were derived by calling variants against the SARS-CoV-2 reference genome (using the CIS pipeline, as per above), not against the prior passage number. Doing so allows variants to be discussed using standard notation for SARS-CoV-2, while also allowing the gain and/or loss of variants between passages to be tracked.
The vartracker pipeline comprises a series of in-house python scripts to allow interrogation of the mutations that accumulate within a passage line during serial passaging. During the pipeline, the VCF files for each passage line were prefiltered to retain variants with a minimum single-nucleotide variant (SNV) and indel frequency of 0.03 and 0.1, respectively, using cyvcf2 (Pedersen and Quinlan, 2017) and bcftools. Functional annotation of called variants was inferred using bcftools csq (Danecek and McCarthy, 2017), followed by merging into a single file. The merged VCF was then processed using custom python code to track the change in presence/absence of a given variant throughout the trajectory of the passage line (from passage 0 to the final passage), as well as the change in allele frequency of each variant throughout passaging. Variants were classified under four broad categories: present in passage 0 but lost throughout passaging ('original_lost'), present in passage 0 and retained throughout passaging ('original_retained'), absent in passage 0 but gained and retained throughout passaging ('new_persistent'), or absent in passage 0 but gained and lost throughout passaging ('new_transient'). All variants were also subject to a suite of quality control measures incorporating the estimated variant depth, the overall site depth, the depth in a window of sites surrounding the variant site, and an overall metric.
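The prefiltering thresholds and the four variant categories can be illustrated with a minimal sketch. This is not the vartracker implementation: the function names are hypothetical, the thresholds are treated as inclusive, and "retained" is simplified to mean "present in the final sequenced passage", which is an assumption about how the categories are decided.

```python
from typing import List, Optional

SNV_MIN_FREQ = 0.03    # minimum allele frequency for SNVs (from the text)
INDEL_MIN_FREQ = 0.10  # minimum allele frequency for indels (from the text)


def passes_prefilter(ref: str, alt: str, freq: float) -> bool:
    """Apply the frequency prefilter; thresholds are assumed to be inclusive."""
    # Alleles of unequal length are treated as indels; everything else
    # (including multi-nucleotide substitutions) uses the SNV threshold.
    is_indel = len(ref) != len(alt)
    threshold = INDEL_MIN_FREQ if is_indel else SNV_MIN_FREQ
    return freq >= threshold


def classify_variant(presence: List[bool]) -> Optional[str]:
    """Classify one variant from its presence/absence across ordered passages.

    `presence[0]` is passage 0 and `presence[-1]` is the final sequenced passage.
    Returns None if the variant was never observed.
    """
    if not any(presence):
        return None
    in_first, in_last = presence[0], presence[-1]
    if in_first:
        return "original_retained" if in_last else "original_lost"
    return "new_persistent" if in_last else "new_transient"


labels = [
    classify_variant([True, True, True]),     # present throughout
    classify_variant([True, True, False]),    # lost by the final passage
    classify_variant([False, True, True]),    # gained and kept
    classify_variant([False, True, False]),   # gained, then lost
]
print(labels)
```

A stricter reading of "retained throughout passaging" would also require presence at every intermediate passage; that variant of the rule is a one-line change using `all(presence)`.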
We investigated convergent evolution of any new_persistent or new_transient variants (a) among passage lines, and (b) compared to the clinical population of SARS-CoV-2 globally by searching them against a database of SARS-CoV-2 mutations comprising information on the functional consequences of mutations with associated references to primary literature (Anwar et al., 2022). Additionally, we calculated the rate of mutation accumulation within each passage line, considering the rate among the overall genome as well as individual genes/regions within the genome, using the following formula (following the approach of Amicone et al., 2022):

$$\mu(r) = \frac{\sum f_r}{P \cdot L_r}$$

In this formula, Σf_r is the sum of the frequencies of all detected mutations in a region r, P is the number of passages, and L_r is the length of r. We calculated the rate for each applicable passage up to and including passage 33 to observe whether the inferred mutation rate varies over time.
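As a worked illustration of this rate, the sketch below divides the summed variant frequencies in a region by the product of the number of passages and the region length, as defined above. The function name and the example frequencies are hypothetical; the genome length of 29,903 nt is that of the Wuhan-Hu-1 reference.

```python
from typing import Iterable


def mutation_rate(frequencies: Iterable[float], n_passages: int, region_length: int) -> float:
    """Mutations per passage per nucleotide for one region.

    frequencies   -- allele frequencies (0-1) of all mutations detected in the region
    n_passages    -- number of passages elapsed (P)
    region_length -- length of the region in nucleotides (L_r)
    """
    if n_passages <= 0 or region_length <= 0:
        raise ValueError("n_passages and region_length must be positive")
    return sum(frequencies) / (n_passages * region_length)


GENOME_LENGTH = 29_903  # Wuhan-Hu-1 reference genome length in nucleotides

# Hypothetical frequencies for illustration only (not data from this study).
example_frequencies = [1.0, 0.5, 0.25, 0.05]
rate = mutation_rate(example_frequencies, n_passages=33, region_length=GENOME_LENGTH)
print(f"{rate:.3e} mutations per passage per nucleotide")
```

Because frequencies are summed rather than counted, a variant at 5% frequency contributes one twentieth as much as a fixed variant, so the estimate weights mutations by how far they have spread in the population.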
We investigated the evolutionary dynamics within all passage lines from passage 0, or the earliest passage successfully sequenced, until their final passage number (Table 1). However, for all comparisons among passage lines, we restricted our focus to passages 0-33, with passage 33 representing the maximum passage number in common that all passage lines had reached. We also investigated in further detail the evolutionary trajectory of passage line "POW005" (Pango lineage: B.1.319), which has been maintained for the greatest number of passages (100) across all passage lines. The full results for all other passage lines across all passages are not discussed in this manuscript, but are available within the supplementary materials (Table S2). It is also worth noting that sequencing data for some passage numbers from some passage lines are not available. In these cases, particularly with samples corresponding to VOCs, there was insufficient material from the early passages to conduct WGS and/or the material was needed for unrelated neutralization assay studies.

Phylogenetic analysis
A phylogenetic tree was estimated to investigate further the evolution of the passaged virus lines in vitro. All isolate consensus genomes were aligned against the Wuhan Hu-1 reference genome using minimap2 (Li, 2018), followed by trimming of the 5' and 3' UTRs and conversion to multifasta format using gofasta (Jackson, 2022). We then used an in-house script to mask known problematic sites from the alignment (De Maio et al., 2020), and filtered out sites with too many gaps using esl-alimask (hmmer.org). Finally, we inferred a maximum likelihood phylogeny for the samples using iqtree2 (Nguyen et al., 2015), with the best-fitting nucleotide substitution model chosen using a constrained search using ModelFinder ('mset': GTR, TIM2, GTR; 'mfreq': F; 'mrate': 'G,R') (Kalyaanamoorthy et al., 2017), and support estimated using 1000 ultrafast bootstraps (Hoang et al., 2018).
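The masking step can be pictured with a small sketch. The in-house script itself is not described in the text, so the function below is a hypothetical stand-in: it assumes 1-based reference coordinates, an alignment in which every sequence shares the reference coordinate system, and replacement of masked sites with "N".

```python
from typing import Dict, Iterable


def mask_sites(sequence: str, positions: Iterable[int], mask_char: str = "N") -> str:
    """Replace the given 1-based positions in an aligned sequence with `mask_char`."""
    chars = list(sequence)
    for pos in positions:
        # Positions outside the sequence are ignored rather than raising.
        if 1 <= pos <= len(chars):
            chars[pos - 1] = mask_char
    return "".join(chars)


def mask_alignment(alignment: Dict[str, str], positions: Iterable[int]) -> Dict[str, str]:
    """Apply the same mask to every sequence in an alignment (name -> sequence)."""
    positions = list(positions)
    return {name: mask_sites(seq, positions) for name, seq in alignment.items()}


# Toy alignment and toy positions, for illustration only.
toy_alignment = {"sample_a": "ACGTACGT", "sample_b": "ACGTTCGT"}
masked = mask_alignment(toy_alignment, positions=[1, 5])
print(masked)
```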

Sequencing metrics
The majority of samples from each passage line were successfully sequenced (Supplementary Table S1). Excluding one sample that yielded no sequence, the mean reference genome coverage at a depth of at least 10 reads was 99.19% (SD: 1.07). Each of these samples was also sequenced at a high mean depth (mean: 1334.48, SD: 753). All except one passage line were called as the same lineage as their original isolate (passage 0) after their maximum number of passages. The exception was "POW005", which was originally called as the lineage B.1.319, but from passage 5 onwards was called as B.1 by pangolin, albeit with conflicting placements by UShER (B.1 (1/3); B.1.319 (1/3); B.1.616 (1/3)). However, nextclade called the lineage of passage line POW005 as B.1.319 from passage 0 through to passage 100, despite the accumulation of many private mutations.
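The coverage metric reported above (percentage of reference positions covered by at least 10 reads) can be computed from a per-position depth vector as in the sketch below. The function name and the toy depth values are illustrative and are not taken from the study's pipeline.

```python
from typing import Sequence


def coverage_breadth(depths: Sequence[int], min_depth: int = 10) -> float:
    """Percentage of positions with read depth >= `min_depth`."""
    if len(depths) == 0:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return 100.0 * covered / len(depths)


# Toy per-position depths: two of the four positions reach a depth of 10.
toy_depths = [0, 9, 10, 500]
breadth = coverage_breadth(toy_depths)
print(f"{breadth:.2f}% of positions at >=10x")
```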

Evolutionary dynamics across passage lines in vitro
All passage lines accumulated a range of mutations throughout the breadth of the genome (Figure 1). Many of these mutations were detected at a very low variant allele frequency (VAF) within a given passage (e.g., <5%), and were then lost in subsequent passages. Conversely, some new mutations that arose during passaging persisted until the final passage, and became near-fixed at a consensus-level frequency (Supplementary Table S2, Supplementary Figure S1). Most variants (~90%) relative to the Wuhan Hu-1 reference genome present in passage 0 of a given passage line were retained during the course of serial passaging. Occasionally these 'original' variants were lost, most commonly when they were originally present at a sub-consensus level, but sometimes even when well supported (allele frequency >50%, ~2% of all variants). Variants were detected in all genes of the SARS-CoV-2 genome, but occurred at different frequencies among genes and among passage lines (Figure 2, Supplementary Figure S1). The inferred mutation rates in vitro varied among passage lines, and varied within passage lines over time (Supplementary Figure S2).
The progressive accumulation of mutations throughout the course of serial passaging was evident in the inferred phylogenetic tree (Supplementary Figure S3). All samples within a given passage line form clades exclusive of the other passage lines, demonstrating that the convergent acquisition of some mutations among passage lines (as discussed below) did not impact upon inferred phylogenetic relationships. Overall, the phylogenetic relationships among passage lines reflect the expected topology based on the relationships among SARS-CoV-2 lineages within the global phylogeny.

Mutation accumulation over 100 serial passages
In the "POW005" passage line (Pango lineage: B.1.319) there was a continuous accumulation of mutations throughout the course of serial passaging, with no signs of an asymptotic "levelling off" of mutation accumulation (Figure 3, Supplementary Table S2). The initial (passage 0) isolate possessed 10 variants relative to the Wuhan Hu-1 reference genome: one in the 5' UTR and nine occurring in coding regions, of which eight were present as the majority allele and seven were nonsynonymous variants. Throughout the course of serial passaging, POW005 accumulated 131 new variants (58 missense including two in-frame deletions, 63 synonymous, 10 in intergenic regions/UTRs), of which 97 were transient and 34 persisted until passage 100. The sole low-frequency variant detected in passage 0 was lost during passaging, although the other 10 variants present at passage 0 were retained.

Signatures of convergent evolution
Although the majority of variants arising during serial passaging were passage-line specific, there were also many cases where different passage lines independently acquired the same variants in a convergent manner. Most instances of convergence were restricted to variants arising in only a small number of passage lines, and/or were occurring only at a low allele frequency. However, other variants were acquired and went to near-fixation in several to many lineages (Supplementary Table S2, Supplementary Figure S1). At least 18 different variants that arose during serial passaging have been previously reported in the scientific literature as having important functional consequences in the global clinical population of SARS-CoV-2 (Table 2, Supplementary Table S3). The classification of these variants falls under 20 unique categories within a SARS-CoV-2 functional mutation literature compendium (Anwar et al., 2022).
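One simple way to operationalise convergence among passage lines is to look for newly arising variants shared by two or more lines. The sketch below is a minimal illustration of that idea, not the study's code: the function name and the `min_lines` threshold are assumptions. The spike deletion in the example is reported in this study as arising independently in POW003, POW005 and POW007; the other variant labels are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Set


def convergent_variants(new_variants_by_line: Dict[str, Set[str]], min_lines: int = 2) -> Dict[str, List[str]]:
    """Map each newly arising variant to the passage lines that acquired it,
    keeping only variants seen in at least `min_lines` independent lines."""
    lines_by_variant = defaultdict(set)
    for line, variants in new_variants_by_line.items():
        for variant in variants:
            lines_by_variant[variant].add(line)
    return {
        variant: sorted(lines)
        for variant, lines in lines_by_variant.items()
        if len(lines) >= min_lines
    }


new_variants = {
    "POW003": {"S:YQTQTN674Y", "placeholder_variant_1"},
    "POW005": {"S:YQTQTN674Y", "placeholder_variant_2"},
    "POW007": {"S:YQTQTN674Y"},
}
shared = convergent_variants(new_variants)
print(shared)
```

Only variants classified as new (absent at passage 0) should be passed in, since variants inherited from the original isolate shared by related lineages reflect common ancestry rather than convergence.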

Discussion
After more than three and a half years of the COVID-19 pandemic, SARS-CoV-2 continues to diversify at a remarkable rate in the global clinical population. We found that SARS-CoV-2 steadily accumulated mutations in vitro throughout most of its genome, reflecting what continues to occur in the clinical population (Figure 1, Supplementary Table S2). The accumulation of mutations over time was reflected in a gradual divergence between passage 0 and the final passage within each passage line, but without significantly affecting the phylogenetic relationships between passage lines (Figure S3). The greatest mutation rate occurred in the four main VOC lineages (Delta, Omicron BA.1, Alpha, Beta) (Figure S2). There was a more than threefold difference in mutation rate between the passage lines with the greatest (Delta; ~8.51 × 10⁻⁵ mutations per passage per nucleotide) and smallest ("POW004"; ~2.64 × 10⁻⁵ mutations per passage per nucleotide) mutation rates.
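The "more than threefold" difference follows directly from the two approximate rates quoted above (the subscripts are labels introduced here for the two passage lines):

```latex
\frac{\mu_{\text{Delta}}}{\mu_{\text{POW004}}}
  \approx \frac{8.51 \times 10^{-5}}{2.64 \times 10^{-5}}
  \approx 3.22
```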
Previous studies have concluded that the spike region of SARS-CoV-2 is a mutational hotspot. Whilst we observed many mutations arising in the spike gene, after correcting for gene length, the number of mutations in spike was comparable with that of other genes (Figure 2). Genes coding for other structural and accessory proteins (e.g. M, ORF7a, ORF7b) accumulated mutations at a greater rate than other regions, including spike, with variation between passage lines (Figure S2). While the spike region is clearly an important region of the SARS-CoV-2 genome for mutation, our results further reflect recent studies highlighting the importance of other regions of the genome for understanding the ongoing evolution of SARS-CoV-2 (Arshad et al., 2023; Foster et al., 2023; Redondo et al., 2021; Zandi et al., 2022), and the capacity of the virus to adapt to new evolutionary pressures such as new drug targets.
The success of the various lineages of SARS-CoV-2 in promulgating through the human population has been linked to functional effects of mutations that accrue over time (Markov et al., 2023). Determining why some mutations go on to be fixed, whereas others are temporary, is important to understand the evolutionary trajectory of SARS-CoV-2. In general, positive selection has played an important role in the COVID-19 pandemic, for example with mutations aiding in immune escape being selected for in a chronically infected host, particularly if that host is immune compromised. This process is hypothesised to have led to the emergence of variants of concern such as Omicron (Viana et al., 2022). Positive selection of mutations has been observed to arise during serial passaging in cell lines (Amicone et al., 2022), as well as in other animal models such as minks or domestic cats (Bashor et al., 2022; Shuai et al., 2020). We observed many variants newly arising in all passaged SARS-CoV-2 lineages (Supplementary Table S2, Supplementary Figure S1). Some of the >170 newly arising variants were transient and purged from the virus populations throughout the course of serial passaging, but some persisted until the final serial passages, occasionally at (near-)fixation within the virus populations. One signature of positive selection is a selective sweep, whereby a rare or previously non-existing allele rapidly increases in frequency within a population. Therefore, we considered the fixation of these newly arising mutations throughout the course of serial passaging to represent possible positive selection for mutations beneficial to virus survival in vitro.
Another hallmark of the COVID-19 pandemic has been high levels of homoplasy, most likely caused by convergent and parallel evolution. Even from early within the pandemic, sites within the SARS-CoV-2 genome were noted as being highly homoplastic and recommended to be masked from alignments when estimating phylogenies (De Maio et al., 2020). As the pandemic has progressed, multiple mutations speculated to impart fitness benefits on SARS-CoV-2 have been observed. For example, the spike mutations N501Y, E484K and ΔH69/V70 have arisen several times independently in lineages designated as variants of concern, and are proposed to lead to increased immune escape and/or increased ACE2 binding affinity (Harvey et al., 2021). Therefore, to understand further the impact of the mutations arising de novo during long-term passaging, we determined whether there were any signatures of convergent evolution. We were especially interested in whether mutations that are clinically significant, for example because of their noted impacts on transmissibility or disease severity, would arise throughout the course of serial passaging.
Many of the same de novo mutations arose convergently across passage lines throughout the course of serial passaging (Supplementary Table S3). Generally these mutations only appeared at a low frequency in few passage lines, but occasionally went to fixation in multiple passage lines, most notably with S:H655Y (see below). Some convergent mutations were expected to arise given that they have been noted repeatedly to arise during serial passaging in Vero E6 cells, especially in the region of the spike furin cleavage site (Ogando et al., 2020; Sasaki et al., 2021). For example, the S:YQTQTN674Y mutation, a deletion of five amino acids near the spike S1/S2 cleavage site, arose independently in POW003 (A.2.2), POW005 (B.1.319) and POW007 (A.2.2). This same mutation emerged multiple times in two background lineages in a previous study (Amicone et al., 2022).
We also considered instances of convergent evolution with the broader clinical population of SARS-CoV-2. Comparing our results with a curated database of SARS-CoV-2 mutations revealed that there were many such cases of convergent evolution (Table 2). Noted functional impacts of the newly arising mutations in our study are related to various aspects of pharmaceutical effectiveness, including vaccine efficacy and antibody neutralization (Table 2). Examples include the spike mutations L5F, A67V, T76I, A222V and others (Table 2). Likewise, the nsp14:A504V mutation that arose in our passage line of the Alpha variant has been demonstrated clinically to temporally associate with remdesivir resistance (Gandhi et al., 2022). The repeated evolution of these clinically significant mutations conclusively demonstrates that variants that are related to immune evasion can arise during in vitro serial passaging even in the absence of immune- or drug-related selective pressures. This is because (a) we passaged the viruses through a Vero E6 cell line, which lacks an innate immune response, and (b) we did not directly apply any selective agents to the samples during passaging, such as neutralising antibodies. This result raises the possibility that (at least some) mutations associated with immune escape that arose in the clinical population were not driven by intra-host selection in the presence of (e.g.) antiviral therapies (cf. Weigang et al., 2021), but instead arose de novo with intrinsic immune escape benefits. However, it is possible that there could be another benefit of these putative immune-related mutations to the survival of SARS-CoV-2 outside of escaping an immune response.
Alternatively, the proliferation of these mutations in vitro could be explained by a relaxation of negative selection, which can lead to the chance success of alleles through genetic drift (Lahti et al., 2009), rather than by positive selection. Consequently, many of the putative immune-related mutations we observed during this study could have arisen and been successful through chance alone, such as through transmission bottlenecks between successive serial passages. Analogous situations in the global clinical populations of SARS-CoV-2 have been observed, whereby mutations that were either non-beneficial, or potentially even mildly deleterious, became fixed within given regions after public health measures created transmission bottlenecks (e.g. Foster et al., 2023). Likewise, the ORF8 accessory region has remained a hotspot for mutations throughout the COVID-19 pandemic (Arduini et al., 2023; Hisner et al., 2023), possibly as a result of a relaxation of purifying selection, as was hypothesised for SARS-CoV (Zinzula, 2021).
The most striking example of convergent evolution of a clinically significant mutation across passage lines was the repeated evolution of S:H655Y. This mutation was already present in passage 0 in two passage lines, and arose newly throughout serial passaging in six of the remaining nine passage lines. The S:H655Y mutation was originally detected in early cases of the Gamma and Omicron VOCs. Impacts of the S:H655Y mutation include acting as a "gateway" mutation by increasing the fitness of lineages with complex combinations of clinically significant mutations, and increased spike cleavage and fusogenicity associated with enhancement of the endosomal cell entry pathway (Yamamoto et al., 2022; Yurkovetskiy et al., 2020). A putative switch towards preferentially using an endosomal cell entry pathway within our passage lines, rather than the ACE2-TMPRSS2 pathway typical of most variants prior to Omicron (Jackson et al., 2022), is not surprising given that Vero E6 cell lines do not express TMPRSS2. Similar changes in virus tropism have been observed in the clinical population. For example, several lineages of the Omicron BA.1 and BA.2 variants (and their descendants) used an endosomal cell entry pathway rather than the ACE2-TMPRSS2 pathway for virus entry (Aggarwal et al., 2022). This change in tropism resulted in Omicron replicating more efficiently in the bronchi, but not productively infecting human alveolar type 2 cells of the lung (Hui et al., 2022; Lamers et al., 2022). These changes, among other factors, likely contributed to the high transmissibility and decreased pathogenicity observed clinically since the emergence of Omicron and its sub-lineages (Hyams et al., 2023; Yuan et al., 2022).
In addition to virus tropism-associated changes, any mutations associated with enhanced virus replication in vitro would be expected to be beneficial for virus survival and selected for, as is the case with replication-enhancing mutations in the clinical population. For example, the emergence of the spike mutation D614G early in the COVID-19 pandemic was linked with the proliferation of virus with enhanced replication (Zhou et al., 2021). Additional changes in the spike region have been linked to enhanced virus replication at subsequent stages of the pandemic, such as the P681R mutation within the Delta VOC (Saito et al., 2022). However, the impact of mutations in other genomic regions on virus replication is also important to consider. A well-characterised pair of adjacent non-synonymous amino acid changes in the nucleocapsid gene, N:R203K and N:G204R, augments virus replication in SARS-CoV-2 (Johnson et al., 2022). One example of a replication-associated mutation that arose during our long-term passaging study is P203L within nsp14, which encodes a proofreading exonuclease. This mutation has been linked previously to altered replication (Takada et al., 2023).
Early in the pandemic, the furin cleavage site (PRRAR) within the S1/S2 domain of the SARS-CoV-2 spike protein was noted as an unusual characteristic of the virus compared to other coronaviruses in the same clade (Coutard et al., 2020). The furin cleavage site has been proposed as a key component of the pathogenesis of SARS-CoV-2, playing an important role in the affinity of the virus for human hosts (Johnson et al., 2021). Deletion of the furin cleavage site has been linked with a fitness advantage in Vero E6 cells, although this advantage is reduced when ectopic expression of TMPRSS2 is introduced to the Vero E6 system (Johnson et al., 2021). Our results further strengthen these previous findings, with the furin cleavage site being a hotspot for de novo mutations during culture in vitro (Table S2). Four different missense amino acid replacements were observed at amino acid position 682 of spike (R682G, R682W, R682L, R682Q) at varying allele frequencies, as well as a six amino acid deletion in one passage line (RARSVA683R). All but one of these mutations appeared within four serial passages. These observations indicate that SARS-CoV-2 adapts readily and rapidly to its environment to enable continued virus proliferation.
We also observed the loss of at least one clinically significant mutation throughout the course of serial passaging. One key SNP hypothesised to have contributed to the success of the Omicron BA.1 lineage is S:S375F, a mutation within the receptor-binding domain of the spike protein responsible for attenuation of spike cleavage efficiency and fusogenicity as well as decreased ACE2 binding affinity (Kimura et al., 2022). This key mutation was present in our BA.1 passage line at a VAF of ~100% from passage 0 until passage 6 (i.e., fixed as the majority allele), but was only present at a low VAF of 13% in passage 9, and was not detected in subsequent passages (Supplementary Table S2). The dominance of S:S375F in global Omicron sequences suggests the site may have been subject to initial positive selection followed by purifying selection to retain benefits imparted by the mutation. However, its loss throughout serial passaging suggests that the benefits in the clinical population do not translate to in vitro benefits, at least in Vero E6 cell lines.
Apart from revealing signatures of convergence with previously noted mutations of interest, our results provide a resource for investigating future changes occurring in SARS-CoV-2 globally. Any mutations arising convergently among passage lines that have not been noted previously as occurring commonly during serial passaging, or in the global clinical population, are worth further investigation. The repeated appearance of these mutations during in vitro culture implies they benefit SARS-CoV-2, at least in vitro, and could also add to the growing list of Vero E6-associated mutations in SARS-CoV-2. The same is true for other mutations arising de novo even in a single passage line.

Figures
Figure 1: The location of all detected variants across all samples in all passage lines. The colour of points indicates whether a variant was present in the original clinical isolate ('original_lost'), or arose new during serial passaging but was then lost ('new_transient') or persisted in the final passage number ('new_persistent'). The shape of points corresponds to the inferred amino acid consequence, either non-synonymous, synonymous, or 'other' (in non-coding regions). All points above the horizontal dashed line are the majority allele for a given sample (allele frequency ≥50%). The location of genes within the SARS-CoV-2 genome is indicated above the plot in a schematic of the complete SARS-CoV-2 genome of ~30 kb.

Figure 2: The location of new variants arising during serial passaging within the SARS-CoV-2 genome. (a) Bars represent the mean number among passage lines (error bars: 95% confidence intervals), with the shading representing the inferred amino acid consequences of the variants both with and without scaling by gene length. (b) As per panel 'a', but with each point representing a sample within a given passage line. (c) The number of unique variants detected per genomic region, separated by passage line.

Figure 3: The cumulative number of mutations detected within passage line "POW005" (Pango lineage: B.1.319) over 100 serial passages. The number of mutations at any given passage number refers to those mutations detected with respect to the SARS-CoV-2 reference genome. The total number is represented by the blue line, with the orange line representing only those mutations that newly arose throughout serial passaging (absent from the clinical isolate).

Table 1: Overview of the passage lines used within this study, including their nomenclatural assignments and the number of serial passages.