SARS-CoV-2 is well adapted for humans. What does this mean for re-emergence?

In a side-by-side comparison of evolutionary dynamics between the 2019/2020 SARS-CoV-2 and the 2003 SARS-CoV, we were surprised to find that SARS-CoV-2 resembles SARS-CoV in the late phase of the 2003 epidemic, after SARS-CoV had developed several advantageous adaptations for human transmission. Our observations suggest that by the time SARS-CoV-2 was first detected in late 2019, it was already pre-adapted to human transmission to an extent similar to late epidemic SARS-CoV. However, no precursors or branches of evolution stemming from a less human-adapted SARS-CoV-2-like virus have been detected. The sudden appearance of a highly infectious SARS-CoV-2 presents a major cause for concern that should motivate stronger international efforts to identify the source and prevent near-future re-emergence. Any existing pools of SARS-CoV-2 progenitors would be particularly dangerous if similarly well adapted for human transmission. To look for clues regarding intermediate hosts, we analyze recent key findings relating to how SARS-CoV-2 could have evolved and adapted for human transmission, and examine the environmental samples from the Wuhan Huanan seafood market. Importantly, the market samples are genetically identical to human SARS-CoV-2 isolates and were therefore most likely from human sources. We conclude by describing and advocating for measured and effective approaches, implemented in the 2002-2004 SARS outbreaks, to identify lingering population(s) of progenitor virus.


Several reports have noted that SARS-CoV-2 appears genetically stable and not under much pressure to adapt, which bodes well for diagnostics, vaccine, and therapeutics development (1)(2)(3)(4). How long a particular antiviral, antibody, or vaccine will be effective against SARS-CoV-2 depends greatly on how fast and how extensively the target gene or protein is evolving. To identify effective therapies, immense efforts have already been directed towards elucidating the precise structure of SARS-CoV-2 proteins, ideally in complex with potential drug candidates, some of which are already undergoing clinical trials (5)(6)(7)(8)(9)(10)(11). Any new SARS-CoV-2 variant that can escape or confound these highly precise approaches will subvert efforts to control the pandemic. Therefore, it is very important to ascertain whether the SARS-CoV-2 genes targeted in these efforts have stabilized and to identify novel virulent variants as soon as possible.
To gain a better understanding of how stable the SARS-CoV-2 genome is, we first performed a side-by-side comparison of evolutionary dynamics between SARS-CoV-2 and SARS-CoV. For this analysis, we curated high-quality genomes spanning ~3-month periods for the following groups: 11 genomes for early-to-mid epidemic SARS-CoV, 32 genomes for late epidemic SARS-CoV, and 46 genomes for SARS-CoV-2, comprising an early December 2019 isolate, Wuhan-Hu-1, plus 15 randomly selected genomes from each month of January through March 2020, sampled from diverse geographical regions (methods in Supplementary Materials). We were surprised to find that SARS-CoV-2 exhibits low genetic diversity, in contrast to SARS-CoV, which harbored considerable genetic diversity in its early-to-mid epidemic phase (Figure 1) (12); nucleotide diversity estimates (13,14) across all sites, non-synonymous and synonymous, for each locus examined are provided in the Supplementary Table. SARS-CoV was observed to adapt under selective pressure that was highest as it crossed from Himalayan palm civets (intermediate host species) to humans and diminished towards the end of the epidemic (15)(16)(17)(18); this series of adaptations between species and in humans culminated in a highly infectious SARS-CoV that dominated the late epidemic phase. In comparison, SARS-CoV-2 exhibits genetic diversity that is more similar to that of late epidemic SARS-CoV (Figure 1, Supplementary Table). In fact, the exceedingly high level of identity shared among SARS-CoV-2 isolates makes it impractical to model site-wise selection pressure. As more mutations occur and, ideally, when SARS-CoV-2-like viruses from an intermediate host species are identified, it will become possible to model selection pressure as was done for SARS-CoV. An examination of 43 SARS-CoV and 46 SARS-CoV-2 genomes revealed a striking difference in the number of substitutions over similar 3-month periods (Figure 1A), with more genetic polymorphism in early-to-mid epidemic SARS-CoV than in late epidemic SARS-CoV or SARS-CoV-2 (Figure 1B).
To rule out a subsampling artefact, we performed one hundred sampling experiments from all high-quality SARS-CoV-2 sequences on GISAID; IQ-TREE phylogenies were built from the one hundred 46-taxon subsampled sequence sets, i.e., 15 randomly selected samples from each month of January through March 2020 from diverse geographical regions, in addition to Wuhan-Hu-1. The maximum tip-to-tip distance (number of substitutions between two genomes) of the early-to-mid epidemic SARS-CoV tree (~85 substitutions) was greater than that of all one hundred resampled SARS-CoV-2 trees (~15-25 substitutions; p < 0.01). Even by April 28, 2020, the SARS-CoV-2 genomes available on GISAID, spanning 4 months, exhibited modest genetic diversity (Supplementary Figure 1) compared with early-to-mid epidemic SARS-CoV.
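The diversity comparison above can be illustrated with a minimal sketch. It assumes aligned, uppercase nucleotide sequences held in memory and computes Nei's nucleotide diversity π (the mean number of pairwise differences per site), a stratified subsample (a reference genome plus a fixed number of genomes per month, mirroring the 15-per-month scheme), and an empirical resampling p-value. Two simplifications are assumptions of this sketch rather than the authors' pipeline: raw pairwise difference counts stand in for tip-to-tip distances on IQ-TREE phylogenies, and the p-value uses the common (hits + 1)/(n + 1) convention. All function names are hypothetical.

```python
import random
from itertools import combinations


def pairwise_differences(a, b):
    """Count sites where both sequences carry an unambiguous base and the bases differ."""
    return sum(1 for x, y in zip(a, b) if x in "ACGT" and y in "ACGT" and x != y)


def nucleotide_diversity(seqs):
    """Nei's pi: mean pairwise differences per site.

    Simplification (assumption): the denominator is the full alignment length,
    even though ambiguous sites are skipped when counting differences.
    """
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    return total / (len(pairs) * len(seqs[0]))


def max_pairwise_distance(seqs):
    """Largest raw difference count between any two genomes.

    A tree-free proxy (assumption) for the maximum tip-to-tip distance on a phylogeny.
    """
    return max(pairwise_differences(a, b) for a, b in combinations(seqs, 2))


def stratified_subsample(by_month, per_month, reference, rng):
    """Reference genome plus `per_month` genomes drawn without replacement from each month."""
    sample = [reference]
    for month in sorted(by_month):
        sample.extend(rng.sample(by_month[month], per_month))
    return sample


def resampling_p_value(observed, by_month, per_month, reference, n_rep=100, seed=0):
    """Share of subsamples whose maximum distance reaches `observed`, with a +1 correction."""
    rng = random.Random(seed)
    hits = sum(
        max_pairwise_distance(stratified_subsample(by_month, per_month, reference, rng)) >= observed
        for _ in range(n_rep)
    )
    return (hits + 1) / (n_rep + 1)
```

With 100 resamples and no subsample reaching the observed distance, this convention gives 1/101 ≈ 0.0099, which is consistent with a reported p < 0.01.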
A caveat of this analysis is that genetic diversity can be inflated by sampling from diverse or high-traffic locations and skewed by factors such as effective population size and virus transmission rate, whereas a sampling bias towards isolates sharing a more recent common ancestor will underestimate genetic diversity (20)(21)(22). For this reason, we sampled within similar 3-month periods and ensured that there was geographic spread in the sampling. Unfortunately, due to the scarcity of 2003 SARS-CoV samples and information, we cannot ensure that the early-to-mid epidemic samples did not straddle deep splits in the tree. Nonetheless, the SARS-CoV genomes used in our analysis have been used in dozens of studies to examine SARS-CoV adaptive evolution, and the division of SARS-CoV cases into early-to-mid versus late phase epidemiologically linked clusters has been validated (17). Furthermore, the late epidemic SARS-CoV and SARS-CoV-2 samples come from more numerous, international locations than early-to-mid epidemic SARS-CoV, which consists of infections only in China. The SARS-CoV outbreak spanning November 2002 to August 2003 was estimated to result in 8,422 cases (23), compared with more than 850,000 known SARS-CoV-2 cases by the end of March 2020 (and more than 3 million cases by the end of April). A considerable portion of SARS-CoV-2 cases are asymptomatic or mild, leading to an underestimation of the virus population size. Yet early-to-mid epidemic SARS-CoV, which was sampled from a smaller population in a more limited location, exhibits the most genetic diversity.
We proceeded to compare the evolutionary dynamics of SARS-CoV and SARS-CoV-2 in terms of the non-synonymous and synonymous substitution rates (dN and dS) in each gene. Non-synonymous substitutions are generally more likely than synonymous substitutions to result in functionally distinct variants. Therefore, dN and dS have commonly been used to model selective pressure on each gene, and were used to determine that the spike (S), Orf3a, and Orf1a genes experienced strong selective pressure in the SARS-CoV epidemic (15)(16)(17)(24). Importantly, the S protein binds to host receptors and influences host specificity, while the Orf3a-encoded accessory protein facilitates the endocytosis of S (25,26). Orf3a and S have been proposed to share a co-evolutionary relationship (27,28). For this analysis, we sampled 50 SARS-CoV-2 genomes from each month of January, February, and March 2020 from diverse locations, in addition to Wuhan-Hu-1.
For the S, Orf3a, and Orf1a genes, the dN and dS values in SARS-CoV-2 are more similar to those of late epidemic than early-to-mid epidemic SARS-CoV (Figure 1C, Figure 2). In comparison, the highly conserved Orf1b (which encodes the RNA-dependent RNA polymerase, RdRp, and the helicase, Hel), which did not undergo strong positive selection in SARS-CoV (15), exhibits similarly low dN across the three CoV groups (Figure 1C, Figure 2).
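As an illustration of how pairwise dN and dS can be estimated, the following is a minimal sketch of the Nei-Gojobori counting method with a Jukes-Cantor correction. This section does not state which method or software produced Figure 2, so the choice of method is an assumption, as are two conventions in the sketch: mutations that create a stop codon count as non-synonymous when tallying sites, and mutational pathways through an intermediate stop codon are skipped. Codons containing gaps or ambiguous bases are ignored. Function names are hypothetical.

```python
import math
from itertools import permutations, product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Synonymous sites of a codon: (number of its 9 single-base changes keeping the amino acid) / 3.

    Assumption: changes that create a stop codon are counted as non-synonymous.
    """
    same = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                same += CODE[mutant] == CODE[codon]
    return same / 3


def codon_differences(c1, c2):
    """Synonymous and non-synonymous differences, averaged over all mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn = nonsyn = 0.0
    paths = 0
    for order in permutations(positions):
        current, s, n, ok = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":  # pathway passes through a stop codon: skip it
                ok = False
                break
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        if ok:
            syn, nonsyn, paths = syn + s, nonsyn + n, paths + 1
    if paths == 0:
        # Every pathway hits a stop codon (assumption): count all differences as non-synonymous.
        return 0.0, float(len(positions))
    return syn / paths, nonsyn / paths


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple hits; infinite once p >= 0.75."""
    if p <= 0:
        return 0.0
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def pairwise_dn_ds(seq1, seq2):
    """Return (dN, dS) for two aligned, in-frame coding sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    seq1, seq2 = seq1.upper(), seq2.upper()
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Skip codons with gaps or ambiguous bases, and stop codons.
        if c1 not in CODE or c2 not in CODE or CODE[c1] == "*" or CODE[c2] == "*":
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        sd, nd = codon_differences(c1, c2)
        syn_diffs += sd
        nonsyn_diffs += nd
    p_s = syn_diffs / syn_sites if syn_sites else 0.0
    p_n = nonsyn_diffs / nonsyn_sites if nonsyn_sites else 0.0
    return jukes_cantor(p_n), jukes_cantor(p_s)
```

Applied to every pair of aligned gene sequences, a function like this yields one (dS, dN) point per pair, so in a plot of this kind the number of points grows with the number of genomes rather than with the number of substitutions.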
Because several therapies and antibodies in development target the SARS-CoV-2 S, it is important to track non-synonymous substitutions and predict the evolution of resistance. We analyzed the non-synonymous substitutions that occurred in the S of SARS-CoV and SARS-CoV-2 over the course of each epidemic. Numerous adaptive mutations that evolved in the SARS-CoV S receptor-binding domain (RBD) have been experimentally demonstrated to enhance binding to the human ACE2 receptor and facilitate cross-species transmission, e.g., residues N479 and T487 (29,30), as well as K390, R426, D429, T431, I455, N473, F483, Q492, Y494, and R495 (31), or have been predicted to be positively selected, e.g., residues 239, 244, 311, 479, and 778 (17) (Figure 3). In contrast, most non-synonymous substitutions in the SARS-CoV-2 S are distributed across the gene at low frequency and have not been reported to confer an adaptive benefit (Figure 4). Yet the SARS-CoV-2 S has been demonstrated to bind more strongly to human ACE2 and to have a superior plasma membrane fusion capacity compared with the SARS-CoV S (32,33). The only site of notable entropy in the SARS-CoV-2 S, D614G, lies outside the RBD and is not predicted to impact the structure or function of the protein (34). Its prevalence in international COVID-19 cases has been attributed to the substitution occurring early in the pandemic, leading to a founder effect. There is no evidence of a more virulent strain of SARS-CoV-2 emerging despite passage through more than 3 million human hosts by the time of this analysis. The pairwise comparisons of dN and dS, alongside a dearth of signs of emerging adaptive mutations, suggest that by the time SARS-CoV-2 was first detected in late 2019, it was already well adapted for human transmission to an extent more similar to late epidemic than to early-to-mid epidemic SARS-CoV.
One possible scenario is that the SARS-CoV-2 outbreak in late 2019 resulted from a bottleneck event similar to the late epidemic SARS-CoV cases that stemmed from a single superspreader who visited the Metropole Hotel in Hong Kong, China, in late February 2003 (36). In comparison to the SARS-CoV epidemic, the SARS-CoV-2 epidemic appears to be missing an early phase during which the virus would be expected to accumulate adaptive mutations for human transmission. However, if this were the origin story of SARS-CoV-2, there is a surprising absence of precursors or branches emerging from a less recent, less adapted common ancestor among humans and animals. In the case of SARS-CoV, the less human-adapted SARS-CoV gave rise to multiple branches of evolution in both humans and animals (Figure 1, Figure 5). In contrast, SARS-CoV-2 appeared without peer in late 2019, suggesting that there was a single introduction of the human-adapted form of the virus into the human population. This has important implications regarding the risk of SARS-CoV-2 re-emergence in the near future and the severity of its consequences.
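Per-site variability, such as the entropy noted at position 614, can be quantified as Shannon entropy over the columns of a protein alignment. The sketch below assumes equal-length aligned sequences and excludes gap ('-') and unknown ('X') characters from the counts. Both of these conventions, the entropy threshold, and the function names are assumptions made for illustration. Column indices match reference residue numbers only when the reference row contains no gaps.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring '-' and 'X'."""
    counts = Counter(r for r in column if r not in "-X")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return max(0.0, h)  # report invariant columns as 0.0 rather than -0.0


def variable_sites(aligned_proteins, min_entropy=0.0):
    """List (1-based column, entropy, residue counts) for columns whose entropy exceeds min_entropy."""
    if len({len(s) for s in aligned_proteins}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for i, column in enumerate(zip(*aligned_proteins), start=1):
        h = column_entropy(column)
        if h > min_entropy:
            sites.append((i, h, dict(Counter(column))))
    return sites


if __name__ == "__main__":
    toy_alignment = ["MFVFLD", "MFVFLG", "MFVFLG", "MFVFLD"]
    print(variable_sites(toy_alignment))  # [(6, 1.0, {'D': 2, 'G': 2})]
```

A two-residue column split evenly, as in this toy alignment, gives the maximum entropy for two states (1 bit); a site where one residue dominates would score lower.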
It is important to recall that there were two SARS-CoV outbreaks in 2002-2004, each arising from a separate palm civet-to-human transmission event (Figure 5): the first emerged in late 2002 and ended in August 2003; the second arose in late 2003 from a lingering population of SARS-CoV progenitors in civets. The second outbreak was swiftly suppressed owing to diligent human and animal host tracking, informed by lessons from the first outbreak (37,38). To prevent similar consecutive outbreaks of SARS-CoV-2 today, it is vital to learn from the past and implement measures to minimize the risk of additional SARS-CoV-2-like precursors adapting to and re-emerging among humans. To do so, it is important to identify the route by which SARS-CoV-2 adapted for human transmission. However, there is presently little evidence to definitively support any particular scenario of SARS-CoV-2 adaptation. Did SARS-CoV-2 transmit across species into humans and circulate undetected for months prior to late 2019 while accumulating adaptive mutations? Or was SARS-CoV-2 already well adapted for humans while in bats or an intermediate species? More importantly, does this pool of human-adapted progenitor viruses still exist in animal populations? Even the possibility that a non-genetically-engineered precursor could have adapted to humans while being studied in a laboratory should be considered, regardless of how likely or unlikely (39).

What is known about possible intermediate hosts and SARS-CoV-2 species tropism?
Speculation that pangolins are the likely intermediate animal host stemmed from the discovery of a pangolin CoV that shares 95.4% S amino acid identity and six key RBD residues with SARS-CoV-2 (40). Since then, another closely related lineage of pangolin CoVs has been identified (41). However, the unique polybasic furin cleavage site in the SARS-CoV-2 S is not found in pangolin CoVs (42), and SARS-CoV-2 is not a recent recombinant involving any of the CoVs sampled to date (41,43,44). The CoV most closely related to SARS-CoV-2 is RaTG13, a bat CoV that was identified at the Wuhan Institute of Virology and originally collected in Yunnan Province, China (45). RaTG13 shares 96.2% genome identity with the Wuhan-Hu-1 SARS-CoV-2 isolate. In comparison, the most closely related pangolin CoV, MP789, shares only 84.1% and 84.0% genome identity with Wuhan-Hu-1 and RaTG13, respectively. No evidence as yet points to the adaptation of SARS-CoV-2 for human infection in pangolins or to the transmission of SARS-CoV-2 from pangolins to humans.
In addition, it is plausible that SARS-CoV-2 S evolved its broad species tropism naturally in bats or in a wide range of intermediate species. The SARS-CoV-2 S is predicted to bind to ACE2 from potentially more than 100 diverse species (46)(47)(48), and was demonstrated to bind more strongly than the SARS-CoV S to ACE2 from both bats and humans (33). The S of RaTG13 is also capable of binding to human ACE2, although the virus does not infect humans (49). Similarly, the S of human MERS-CoV was found to bind to receptors from humans, camels, and bats, and could adapt to semi-permissive host receptors within three passages in cell culture (50). Therefore, although no sampled bat CoVs have been found to possess a SARS-CoV-2-like S RBD, these findings collectively suggest that some CoVs in nature are evolving S proteins that can bind at an optimal level to the same receptor across diverse species (43), potentially by interfacing with highly conserved parts of the receptor. As other groups have recommended, CoV sampling from more species (to avoid bias stemming from the focused scrutiny of Malayan pangolins) will provide a better grasp of the range of species that harbor CoVs with RBDs similar to that of SARS-CoV-2, as well as of the natural diversity of bat CoVs (43).
There has been considerable debate among scientists and the public on whether SARS-CoV-2 originated from the Wuhan Huanan seafood market (2). According to the Chinese CDC's website, accessed on April 27, 2020, SARS-CoV-2 was detected in environmental samples at the Huanan seafood market, and the Chinese CDC suggested that the virus originated from animals sold there (51). However, phylogenetic tracking suggests that SARS-CoV-2 had been imported into the market by humans (52). To look for clues regarding an intermediate animal host, we turned to samples collected from the market in January 2020. In contrast to the thorough and swift animal sampling executed in response to the 2002-2004 SARS-CoV outbreaks to identify intermediate hosts (37,53), no animal sampling prior to the shutdown and sanitization of the market was reported. Details about the sampling are sparse: 515 of the 585 samples are environmental samples, and the other 70 were collected from wild animal vendors; it is unclear whether the latter samples are from animals, humans, and/or the environment. Only 4 of the samples, all of them environmental samples from the market, have passable coverage of SARS-CoV-2 genomes for analysis. Even so, these contain ambiguous bases that confound genetic clustering with human SARS-CoV-2 genomes. Nonetheless, the market samples did not form a separate cluster from the human SARS-CoV-2 genomes. We compared the market samples to the human Wuhan-Hu-1 isolate and found >99.9% genome identity, even at the S gene, which has exhibited evidence of evolution in previous CoV zoonoses. In the SARS-CoV outbreaks, >99.9% genome or S identity was observed only among isolates collected within a narrow window of time from within the same species (Figure 5) (15). The human and civet isolates of the 2003/2004 outbreak, which were collected closest in time and at the site of cross-species transmission, shared only up to 99.79% S identity (Figure 5) (37). It is therefore unlikely that the January market isolates, which all share 99.9-100% genome and S identity with a December human SARS-CoV-2 isolate, originated from an intermediate animal host, particularly if the most recent common ancestor jumped into humans as early as October 2019 (54,55). The SARS-CoV-2 genomes in the market samples most likely came from infected humans who were vendors or visitors at the market. If intermediate animal hosts were present at the market, no evidence of them remains in the genetic samples available.
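The identity comparisons above depend on how ambiguous bases are handled. A minimal sketch, assuming the two genomes (or S genes) are already aligned to equal length, computes identity only over columns where both sequences carry an unambiguous A, C, G, or T, and reports how many columns were compared. Excluding IUPAC ambiguity codes, N, and gaps is one convention among several and is an assumption here; it can overstate identity when many sites are uncalled, which is the difficulty noted for the market samples. The function name is hypothetical.

```python
UNAMBIGUOUS = frozenset("ACGT")


def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where both sequences have an unambiguous base.

    Returns (identity_percent, columns_compared); IUPAC ambiguity codes, N, and gaps are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS:
            compared += 1
            matches += a == b
    if compared == 0:
        raise ValueError("no unambiguous columns to compare")
    return 100.0 * matches / compared, compared


if __name__ == "__main__":
    reference = "A" * 1000
    sample = "A" * 998 + "C" + "N"  # one mismatch and one uncalled base
    identity, n_sites = percent_identity(reference, sample)
    print(f"{identity:.2f}% identity over {n_sites} compared columns")
```

Reporting the number of compared columns alongside the percentage makes it visible when a high identity rests on a sequence with many undetermined bases.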

Conclusion
The lack of definitive evidence to verify or rule out adaptation in an intermediate host species, in humans, or in a laboratory means that we need to take precautions against each scenario to prevent re-emergence. We advocate measured and effective approaches to identify any lingering population(s) of SARS-CoV-2 progenitor virus, particularly if these are similarly adept at human transmission. The response to the first SARS-CoV outbreak deployed the following strategies, which were key to detecting SARS-CoV adaptation to humans and cross-species transmission, and which could be re-applied in today's outbreak to swiftly eliminate progenitor pools:
(i) Sampling animals from markets, farms, and wild populations for SARS-CoV-2-like viruses (38).
(ii) Checking human samples banked months before late 2019 for SARS-CoV-2-like viruses or SARS-CoV-2-reactive antibodies to detect precursors circulating in humans (56). In addition, sequencing more SARS-CoV-2 isolates from Wuhan, particularly early isolates if they still exist, could identify branches originating from a less human-adapted progenitor, as was seen in the 2003 SARS-CoV outbreak. It would be curious if no precursors or branches of SARS-CoV-2 evolution were discovered in humans or animals.
(iii) Evaluating the over- or underrepresentation of food handlers and animal traders among the index cases to determine whether SARS-CoV-2 precursors may have been circulating in the animal trading community (57).
While these investigations are conducted, it would be safer to more extensively limit human activity that leads to frequent or prolonged contact with wild animals and their habitats.

Figure 2. Pairwise dN and dS in each indicated gene in early-to-mid epidemic SARS-CoV, late epidemic SARS-CoV, and SARS-CoV-2. Sizes of each encoded protein in SARS-CoV-2 are provided. Importantly, the number of points per plot is influenced by the number of genomes analyzed (11 early-to-mid epidemic SARS-CoV, 32 late epidemic SARS-CoV, and 151 SARS-CoV-2) and cannot be used to infer the quantity of substitutions; for that, please refer to Figures 1, 3, and 4. The dashed lines show the slopes estimated by robust regression using M-estimation (35).
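A minimal sketch of robust line fitting by M-estimation, of the kind this caption refers to, is shown below. It assumes Huber's weight function with the commonly used tuning constant k = 1.345, a scale estimated as the median absolute residual divided by 0.6745, and iteratively reweighted least squares. Whether the figure regresses dN on dS, includes an intercept, or uses these settings is not stated here, so all of these choices are assumptions. Function names are hypothetical.

```python
import statistics


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    if sxx == 0:
        raise ValueError("x values have no spread")
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line(x, y, k=1.345, max_iter=100, tol=1e-10):
    """Huber M-estimate of (intercept, slope) via iteratively reweighted least squares."""
    intercept, slope = weighted_line(x, y, [1.0] * len(x))  # start from ordinary least squares
    for _ in range(max_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        scale = statistics.median([abs(r) for r in resid]) / 0.6745  # MAD-type robust scale
        if scale == 0:
            break  # most points already lie exactly on the fitted line
        # Huber weights: 1 for small standardized residuals, k/|u| beyond the threshold k.
        weights = [1.0 if abs(r) / scale <= k else k / (abs(r) / scale) for r in resid]
        new_intercept, new_slope = weighted_line(x, y, weights)
        converged = abs(new_slope - slope) < tol and abs(new_intercept - intercept) < tol
        intercept, slope = new_intercept, new_slope
        if converged:
            break
    return intercept, slope
```

For example, with ten points on y = 2x + 1 and the last one replaced by an outlier at y = 100, ordinary least squares is pulled to a slope above 6, while the Huber fit stays close to 2. This down-weighting of a few extreme pairwise comparisons is the usual motivation for M-estimation over ordinary least squares.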

Figure 3. Spike protein evolution in SARS-CoV. Only residues that vary among the isolates are shown. The more recent variants at each position in each epidemic phase are emphasized with black outlines. Residues 75, 239, 244, 311, 778, 1148, and 1163 have been predicted to be positively selected during the interspecies epidemic (17); civet variants are not shown.

Figure 4. Spike protein evolution in SARS-CoV-2. The 151 isolates analyzed include Wuhan-Hu-1 (EPI_ISL_402125, bottom row) and 50 randomly selected genomes per month from January through March 2020. Only residues that vary among the isolates are shown.

Figure 5. Comparison of genome and S gene identity across human, animal, and environmental samples. Dotted lines indicate the samples under comparison for which genome or S gene identity is shown. Due to ambiguous bases and a lack of data regarding the quality of the Huanan seafood market environmental samples, it is difficult to assess shared identity with high accuracy. This was particularly true for sample EPI_ISL_408512, which had frequent stretches of undetermined bases in its genome sequence. Shared identity above 99.9% is bolded in the left panel depicting SARS-CoV isolates. Only isolates from within the same species in a narrow window of time shared >99.9% S gene identity, i.e., only civets sampled at the same time shared >99.9% S gene identity; the civet isolates from May 2003 (SZ3 and SZ16) and January 2004 (PC4-13, PC4-136, and PC4-227) shared only up to 99.71% genome and 99.42% S gene identity.