Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of COVID-19, which escalated into a global pandemic in 2020. SARS-CoV-2 is an enveloped, positive-sense, single-stranded RNA virus of the genus Betacoronavirus1. In addition to SARS-CoV, which was responsible for the 2002–2004 SARS epidemic and which shares 79% nucleotide sequence identity with SARS-CoV-2 (ref.1), the genus also includes human coronavirus (HCoV)-OC43, HCoV-HKU1 and Middle East respiratory syndrome coronavirus (MERS-CoV). SARS-CoV-2 relies on its obligate receptor, angiotensin-converting enzyme 2 (ACE2), to enter cells2,3,4,5,6; ACE2 was originally identified in 2003 as the receptor for SARS-CoV7. ACE2 is also the receptor for alphacoronavirus HCoV-NL63, which, together with another alphacoronavirus, HCoV-229E, and betacoronaviruses HCoV-OC43 and HCoV-HKU1, is a known causative agent of mild upper respiratory tract infections8.

The coronavirus virion is made up of the nucleocapsid (N), membrane (M), envelope (E) and spike (S) proteins, which are structural proteins (Fig. 1). The entry steps of the viral particles — encompassing attachment to the host cell membrane and fusion — are mediated by the S glycoprotein. S protein is assembled as a homotrimer and is inserted in multiple copies into the membrane of the virion giving it its crown-like appearance. Entry glycoproteins of many viruses, including HIV-1, Ebola virus and avian influenza viruses, are cleaved into two subunits — extracellular and transmembrane — in the infected cells (that is, the cleavage occurs before release of the virus from the cell that produces it). Similarly, the S protein of some coronaviruses is cleaved into S1 and S2 subunits during their biosynthesis in the infected cells, while the S protein of other coronaviruses is cleaved only when they reach the next target cell. SARS-CoV-2, like MERS-CoV, belongs to the first category: its S protein is cleaved by proprotein convertases such as furin in the virus-producer cells9,10 (Fig. 1). Therefore, the S protein on the mature virion consists of two non-covalently associated subunits: the S1 subunit binds ACE2 and the S2 subunit anchors the S protein to the membrane. The S2 subunit also includes a fusion peptide and other machinery necessary to mediate membrane fusion upon infection of a new cell11.

Fig. 1: Coronavirus structure and maturation.

Infection by a coronavirus induces in the perinuclear area the formation of new membranous structures of various sizes and shapes, which as a whole are referred to as ‘replication organelles’225,226,227,228,229. These structures — observed by electron microscopy in cells infected with mouse hepatitis virus, severe acute respiratory syndrome coronavirus or Middle East respiratory syndrome coronavirus and typically surrounded by double membranes — likely originate from the endoplasmic reticulum (ER) and house viral replication complexes, sequestering them from cellular innate immune molecules. Viral structural proteins and genomic RNA synthesized at the replication site are then translocated through an unknown mechanism to the ER–Golgi intermediate compartment (ERGIC), where virus assembly and budding occur228,230. Only four viral proteins — the spike (S), envelope (E), membrane (M) and nucleocapsid (N) proteins — are incorporated into the virion. While the N protein bound to the viral genomic RNA is packed inside the virion, the structural proteins S, E and M are incorporated in the virion membrane. The S protein, assembled as a trimer, giving the appearance of a crown (corona), mediates major entry steps, including receptor binding and membrane fusion. During biosynthesis and maturation in the infected cell, the S protein is cleaved by furin or furin-like proprotein convertase in the Golgi apparatus into the S1 and S2 subunits, which remain associated9,10. The S protein on the virus therefore consists of two non-covalently associated subunits with different functions: in the new target cell, the S1 subunit binds the receptor and the S2 subunit anchors the S protein to the virion membrane and mediates membrane fusion. The E and M proteins contribute to virus assembly and budding through the interactions with other viral proteins231,232. 
Assembled viruses bud into the ERGIC lumen and reach the plasma membrane via the secretory pathway, where they are released into the extracellular space after virus-containing vesicles fuse with the plasma membrane. FP, fusion peptide.

Receptor engagement by viral entry glycoproteins, typically with other triggers, induces dramatic conformational changes in both subunits that bring the viral and cellular membranes together, ultimately creating a fusion pore that allows the viral genome to reach the cell cytoplasm. For SARS-CoV-2, one such trigger is the cleavage of an additional site internal to the S2 subunit, termed the ‘S2′ site’. ACE2 engagement by the virus exposes the S2′ site. S2′ site cleavage — by transmembrane protease, serine 2 (TMPRSS2)12,13,14 at the cell surface or by cathepsin L15,16 in the endosomal compartment following ACE2-mediated endocytosis17,18 — releases the fusion peptide, initiating fusion pore formation (Fig. 1). Because the viral genome must access the cytoplasm and because it can do so only as this pore expands and the viral and cell membranes are seamlessly combined, every step of this process is important.

This Review provides the structural and cell biological foundations for understanding the multistep SARS-CoV-2 entry process. Much of what is known about this process was informed by prior studies of SARS-CoV, which enters cells in a similar manner, and by combined interpretation of studies of infectious SARS-CoV-2, pseudoviruses and virus-like particles and of structural analyses of the S protein. We also discuss S protein evolution and mutation supporting adaptation to the human host and provide an overview of current vaccine approaches and therapeutic strategies targeting SARS-CoV-2 entry mechanisms.

Coronavirus S protein

The monomeric S protein of SARS-CoV-2 is a type I membrane protein with 66 N-linked glycans per S protein trimer19 and belongs to the so-called class I viral fusion proteins exemplified by the influenza virus haemagglutinin protein. Accordingly, S proteins of coronaviruses and haemagglutinin share structural organization and conformational transition that promotes membrane fusion20.

Overall structure of S trimer

Structural biology of the SARS-CoV-2 S protein has advanced very rapidly since the initial outbreak of COVID-19. Structures of S protein fragments (Fig. 2a) derived from the Wuhan-Hu-1 strain, including the ectodomain stabilized in its prefusion conformation4,21,22, receptor binding domain (RBD)–ACE2 complexes2,5,6,23 and segments of the S2 subunit in the postfusion conformation24, were determined by either cryo-electron microscopy (cryo-EM) or X-ray crystallography. They were followed by structures of detergent-solubilized, full-length S proteins25,26,27 (Fig. 2b–d), as well as those of the membrane-bound, intact S proteins on the surface of the virion, which were obtained by cryo-electron tomography using chemically inactivated SARS-CoV-2 preparations of both the Wuhan-Hu-1 strain and a B.1 variant carrying the D614G mutation28,29,30,31 (one of the first known mutations of the S protein, which was shown to prevail in other variants) (Table 1). These independently obtained structures are largely in agreement with each other, revealing both the overall architecture and atomic details of the protein. In the prefusion conformation, the S1 subunit folds into four domains: the amino-terminal (N-terminal) domain (NTD), the RBD and two carboxy-terminal (C-terminal) domains (CTD1 and CTD2). It wraps around the prefusion S2 subunit, which forms a central helical bundle with heptad repeat 1 (HR1) bending back towards the viral membrane (Fig. 2a). The three RBDs of the S trimer form the apex of the S protein, which samples two distinct conformations: ‘up’ for a receptor-accessible state and ‘down’ for a receptor-inaccessible state (Fig. 2b–d). In the postfusion state, conformational changes lead to disengagement of the S1 subunit from S2, and likely its complete dissociation, while S2 undergoes a cascade of refolding events to form a stable and elongated trimer (Fig. 2e).

Fig. 2: Structures of the S protein, its subdomains and interaction between the RBD and ACE2.

a | The full-length SARS-CoV-2 spike (S) protein. Selected subdomain structures are shown in ribbon diagrams below the schematic. Three distinct receptor-binding domain (RBD) antigenic sites (AS-1, AS-2 and AS-3)25 are indicated in the RBD ribbon diagram. b–e | Cryo-electron microscopy structures of detergent-solubilized full-length S trimers from the Wuhan-Hu-1 reference strain in the conformation with three RBDs down (Protein Data Bank (PDB) ID 6XR8; part b), the D614G variant in the conformation with three RBDs down (PDB ID 7KRQ; part c), the D614G variant in the conformation with one RBD up (PDB ID 7KRR; part d) and the postfusion structure of the Wuhan-Hu-1 S2 (PDB ID 6XRA; part e). A dramatic conformational change in heptad repeat 1 (HR1) propels insertion of the fusion peptide into the target membrane. In one protomer, in the foreground, subdomains are coloured according to part a, and the other two protomers, in the background, are shown in grey or light grey. f | The interface between angiotensin-converting enzyme 2 (ACE2; cyan) and bound RBD (red) in the ribbon diagram (PDB ID 6M0J). The receptor-binding motif (RBM) is shown in orange. In the inset, 20 residues of ACE2 and 17 residues from the RBD forming networks of hydrophilic side chain interactions are shown in the stick model. Residues in the RBD that are mutated in the variants of concern (Table 1) and their interacting residues in ACE2 are highlighted. CH, central helix; CT, cytoplasmic tail; CTD1, carboxy-terminal domain 1; CTD2, carboxy-terminal domain 2; FP, fusion peptide; FPPR, fusion-peptide proximal region; HR2, heptad repeat 2; NTD, amino-terminal domain; S1/S2, furin-cleavage site; S2′, S2′ cleavage site; TM, transmembrane anchor.

Table 1 Spike protein mutations in SARS-CoV-2 variants

The NTD

The NTD is formed mainly by four stacked β-sheets and a number of connecting flexible loops, bearing several N-linked glycans (Fig. 2a). Whether the NTD plays any functional role in SARS-CoV-2 entry remains unknown. Similar NTDs appear to facilitate the binding of sugar moieties acting as attachment factors by HCoV-OC43 or bovine coronavirus, or the binding to the protein receptor by murine hepatitis virus32,33. The NTD of transmissible gastroenteritis virus and MERS-CoV may facilitate prefusion-to-postfusion transition of the S protein34,35. Notably, the NTD of SARS-CoV-2 is targeted by some potent neutralizing antibodies (exemplified by 4A8 and 4-8)36,37, suggesting that it may be functionally important or at least located in the vicinity of other functionally critical regions, such as the RBD. Interestingly, the newly emerged variants often have mutations and deletions in the NTD, rendering them resistant to these neutralizing antibodies38,39,40,41,42,43,44,45,46,47,48,49.

The RBD

The RBD has two subdomains (Fig. 2a): a core structure formed by a five-stranded antiparallel β-sheet covered with short connecting α-helices on both sides, and an extended loop, named the ‘receptor-binding motif’ (RBM), which wraps around one edge of the core structure and makes all the contacts with ACE2 (refs2,5,50). In the down conformation of the prefusion S trimer, a single RBD packs against the central helical bundle of S2 and the two other RBDs, while leaning on CTD1 from the same protomer and the NTD from a neighbouring protomer (Fig. 2b,c). This configuration partially occludes the RBM, making it inaccessible to the receptor ACE2. When the RBD flips to the up conformation and fully exposes the RBM (Fig. 2d), the adjacent CTD1 and NTD also shift away to accommodate the RBD movement. Thus, the transition of the RBD from the down conformation to the up conformation is a critical step before the receptor can fully engage, and the one-RBD-up conformation appears to be a stable intermediate state. RBDs from other sarbecoviruses can be categorized into two distinct groups: human ACE2 (hACE2) binders and non-hACE2 binders51. They are likely to fold similarly because of the high degree of sequence identity in the core subdomain, but variations at the ACE2-binding interface in the RBM of the non-hACE2 binders can explain their inability to use hACE2 as an entry receptor51.

Importantly, the RBD is also the primary target of the neutralizing antibodies elicited by natural infection or vaccination (Box 1). There are three major non-overlapping antigenic sites on the RBD52 (Fig. 2a). Most RBD-directed neutralizing antibodies, including REGN10933 (casirivimab) and C144, recognize the tip region of the RBM, and they often show remarkable potency in blocking ACE2 engagement by direct competition53,54. Another site lies on the exposed surface of the RBD when it is in the down configuration, and is targeted by antibodies such as REGN10987 (imdevimab) and S309 (refs53,55). The third site, often referred to as a ‘cryptic supersite’, targeted by CR3022 (ref.56), is on the buried side of the RBD and is fully accessible only when the domain is in the up conformation. Together, these sites comprise the epitopes of the majority of neutralizing antibodies to SARS-CoV-2, as discussed later.

The C-terminal domains

C-terminal domains are formed primarily by β-structures. The RBD appears to be an insertion between two antiparallel β-strands in CTD1, and CTD1 can also be viewed as an insertion between two antiparallel β-strands in CTD2 (Fig. 2a). Thus, a continuous strand (residues 306–330) runs through both CTD1 and CTD2, connecting the NTD and the RBD on its two ends. CTD1, packed underneath the RBD, needs to rotate outwards with the RBD in the flipping-up transition into the receptor-accessible state. A structural element in S2, named the ‘fusion-peptide proximal region’ (FPPR), abutting the opposite side of CTD1 from the RBD, appears to help clamp down the RBD and stabilize the closed conformation of the S trimer25,27. Therefore, CTD1 appears to be a structural relay between the RBD and the FPPR: it senses the changes in the RBD and the FPPR, which are located on either side of CTD1.

CTD2 is formed by two stacked β-sheets, each containing four strands, with a fifth strand in one sheet contributed by the connecting strand between the NTD and the RBD (Fig. 2a). In the other sheet, an interstrand loop contains the furin-cleavage site at the S1–S2 boundary, and one strand is the N-terminal segment of S2. A structural element, designated the ‘630 loop’27, in CTD2 is largely disordered in the S trimer of the Wuhan-Hu-1 strain (Fig. 2b) but is ordered in the presence of the D614G mutation (Fig. 2c) and plays an important role in stabilizing the S trimer, as discussed later. Thus, CTD2 is another key component for the structural rearrangements of the S protein required for membrane fusion.

The S2 subunit

In the prefusion conformation (Fig. 2a), the S2 subunit adopts a conformation with most of the polypeptide chain packed around a three-stranded coiled coil formed by the central helices4,21,25. In particular, the coiled coil and part of HR1 together with another helix formed by residues 758–784 assemble into a nine-helix bundle, which likely contributes to overall stability of the S protein. The second half of HR1 and the so-called connector domain, which links the central helix and C-terminal heptad repeat 2 (HR2)57, surround the bottom of the coiled coil, partially protecting the fusion peptide. The FPPR, directly connected to the fusion peptide, tucks underneath CTD1 from the neighbouring protomer and also makes contacts with CTD2 of the same protomer (Fig. 2b–d). The C-terminal segments, including HR2, the transmembrane domain and the cytoplasmic tail, are largely disordered in the prefusion S trimer structures, except for some low-resolution density in certain reconstructions27,28.

In the postfusion S2 structure (Fig. 2e), HR1 and the central helix form an unusually long central three-stranded coiled coil (~180 Å)25, almost identical to the S2 structures of SARS-CoV and mouse hepatitis virus57,58. The N-terminal region of HR2 adopts a one-turn helical conformation and packs against the groove of the HR1 coiled coil; the C-terminal region of HR2 forms a longer helix that makes up a six-helix bundle structure with the rest of the HR1 coiled coil, forming a stable rigid postfusion structure. The postfusion S2 unexpectedly aligns N-linked glycans along the long axis, with four of them spaced regularly on the same side of the trimer25. Comparison of the prefusion and postfusion states of S2 suggests that HR1 undergoes a dramatic refolding transition that can insert the fusion peptide into the target cell membrane, similar to a mechanism proposed for other coronaviruses57,58.

SARS-CoV-2 receptor ACE2

ACE2 is an 805-amino acid carboxypeptidase that removes a single amino acid from the C terminus of its substrates. It consists of a single metallopeptidase domain in the first half that contains the HEXXH zinc-binding motif at the catalytic site and shares homology with angiotensin-converting enzyme (ACE), and a C-terminal half including the transmembrane domain that shares homology with collectrin59,60. Integral to the renin–angiotensin–aldosterone system, the primary role of ACE2 in normal physiology is to convert angiotensin I and angiotensin II, generated by renin and ACE, into angiotensin-(1–9) and angiotensin-(1–7), respectively60,61,62. Like ACE2 for SARS-CoV-2, the obligate receptors for several other coronaviruses are also proteases (DPP4 (also known as CD26) for MERS-CoV and APN for HCoV-229E)63,64,65. Interestingly, the catalytic activity of these receptors neither is necessary for their receptor function nor overlaps with the virus-binding site. Consistently, small-molecule inhibitors of the catalytic site of ACE2 do not block infection by SARS-CoV66. Nonetheless, studies have shown that downregulation of ACE2 resulting from SARS-CoV infection contributes to disease severity by perturbing the renin–angiotensin–aldosterone system67. The same is presumed for SARS-CoV-2, given that SARS-CoV and SARS-CoV-2 engage ACE2 in the same manner.

Interaction between the S protein and the receptor

The interface between the RBD of the S protein and ACE2 is formed primarily by a gently concave outer surface of the extended RBM and the N-terminal helix of the receptor2,5 (Fig. 2f). There are 20 residues of ACE2 and 17 residues from the RBD forming networks of hydrophilic side chain interactions. Of the latter, residues Lys417, Leu452, Glu484 and Asn501 are those that have been mutated in the newly emerged SARS-CoV-2 variants of concern44,45,46,47,48,49 (Table 1), and these mutations may confer increased receptor binding, antibody resistance or both. As expected, the mode of ACE2 binding is almost identical between the SARS-CoV and SARS-CoV-2 RBDs2,5,50. Many interacting residues are identical or have only conservative substitutions between the two RBDs. Several positions with non-conservative substitutions can still maintain similar interactions. The only extra ACE2-interacting residue in SARS-CoV-2 is Lys417, which forms a salt bridge with Asp30 of ACE2 (Fig. 2f). Interestingly, despite the higher affinity of the SARS-CoV-2 RBD for ACE2 compared with the SARS-CoV RBD, the SARS-CoV-2 S protein trimer does not bind ACE2 as efficiently as does the SARS-CoV S protein trimer10,21. This apparent paradox, a consequence of the inherent instability of the original SARS-CoV-2 (Wuhan-Hu-1) S protein, is further discussed together with the D614G mutation later. Other shared features of the RBD–ACE2 interfaces between the two viruses include the role of multiple tyrosine residues forming hydrogen bonds at the interface and a common disulfide-bonded RBD core2,5,50.

The structures of the soluble SARS-CoV-2 S trimer in complex with monomeric ACE2 confirm that the receptor interacts with the RBD in its up conformation68,69, which is consistent with previous findings with ACE2 binding to the SARS-CoV S protein70. While the NTD shifts outwards slightly, the S2 subunit remains largely unchanged upon ACE2 binding.

ACE2 expression and the putative impact of comorbidities

Analysis of animal models and human transcriptome databases suggests that ACE2 expression in the lower lung is relatively limited to type II alveolar cells, but is higher in the upper bronchial epithelia and much higher in the nasal epithelium, especially in the ciliated cells71,72,73,74,75,76. This difference in ACE2 expression level in the respiratory tract is mirrored by the SARS-CoV-2 infection gradient, with nasal ciliated cells being primary targets for SARS-CoV-2 replication in the early stage of infection71,75. Despite the respiratory route being dominant in SARS-CoV-2 infection, the highest levels of ACE2 expression are found in the small intestine, testis, kidney, heart muscle, colon and thyroid gland73,77. Cardiac infection by SARS-CoV-2 was frequently found in autopsy cases78, and the presence of ACE2 in colon and kidney cells has been suggested as an explanation for gastrointestinal and renal complications of SARS-CoV-2 infection. ACE2 expression in the gastrointestinal tract is consistent with the observation that many coronaviruses, including sarbecoviruses, are transmitted via the faecal–oral route as well as the respiratory route. Inflammatory cytokines released in severe COVID-19, such as IL-1β and type I and type III interferons, can upregulate ACE2 expression, potentially establishing a positive-feedback loop for viral replication71,79,80,81. However, their effect on disease severity is unknown. Moreover, a recent report indicates that interferon-stimulated expression of ACE2 yields a truncated isoform that cannot support SARS-CoV-2 binding82.

Several health comorbidities, including hypertension, hyperlipidaemia, diabetes, chronic pulmonary diseases, old age and smoking, are risk factors for COVID-19. A number of these factors have been proposed to modulate ACE2 expression. Differences in ACE2 levels associated with age and sex are controversial; advanced age has been both positively83,84 and negatively85 correlated with ACE2 expression, while another independent study found no significant effect80. Male sex has similarly been associated with higher ACE2 expression in some studies84 but not others80,85. Widely accepted epidemiological data indicate that a history of smoking tobacco increases the risk of severe disease86,87. However, whether smoking causes upregulation of ACE2 and is associated with enhanced infection is unclear. Many biochemical studies have shown that ACE2 expression is increased in lung tissue samples from smokers and patients with chronic obstructive pulmonary disease80,88,89 and also in mouse lungs exposed to cigarette smoke80. However, other studies indicated that ACE2 expression in the airway is not affected by smoking76, and that smoking is not a significant epidemiological risk factor for COVID-19 (refs90,91). A limited number of studies showed that diabetes mellitus is associated with increased ACE2 expression in human sputum cells and in murine kidney tissue77,92 and that cystic fibrosis is associated with ACE2 (and TMPRSS2) upregulation in human lung specimens71. It was initially speculated that common antihypertensive drugs such as ACE inhibitors and angiotensin receptor blockers could upregulate ACE2 expression93, raising concern that use of these agents could increase disease severity94,95. Although this idea is consistent with the association between hypertension and more severe COVID-19 outcomes, recent studies have refuted this hypothesis76,91,96,97. Thus, to date, no comorbidity has been unambiguously associated with ACE2 expression level.

ACE2 orthologues in potential reservoir species

Similarities in virus-binding hotspots of ACE2 and its orthologues in reservoir animal species contribute to the zoonotic potential of sarbecoviruses. Horseshoe bats of the genus Rhinolophus are the presumed long-term reservoirs of SARS-CoV and SARS-CoV-2, but it remains unclear exactly which species in the genus serves as the most recent bat host of each virus98. A mammalian intermediate host is thought to facilitate transmission of both viruses to humans. Accordingly, SARS-CoV has been isolated from palm civets (Paguma larvata) and raccoon dogs (Nyctereutes procyonoides)99, and viruses close to SARS-CoV-2 were isolated from pangolins (Manis javanica)100,101,102. The significance of receptor adaptation during host-jumping from an intermediate host to humans is underscored by evidence that two independent zoonotic events gave rise to the two SARS outbreaks between 2002 and 2004 with markedly different disease severity. While the 2002–2003 SARS epidemic resulted in nearly 10% mortality, the outbreak in 2003–2004 resulted in a much milder disease103. Analysis of the S protein of the virus isolates from both outbreaks revealed that reduced disease severity of the second outbreak was in part due to insufficient adaptation of the S protein to hACE2, as shown by lower affinity66. This affinity difference was mapped to a single residue in the RBD, foreshadowing the impact of RBD mutations in SARS-CoV-2 on transmissibility and disease severity.

Viruses closely related to SARS-CoV-2 have been isolated from the horseshoe bat, Rhinolophus affinis, including SARS-like betacoronavirus RaTG13, which shares 96% nucleotide identity with SARS-CoV-2 (ref.104). Investigators evaluated the receptor function of a wide array of bat ACE2 orthologues and found that only 9 of 46 supported the entry by SARS-CoV-2 pseudoviruses105. Replacement of five residues in hACE2 with residues originating from the SARS-CoV-2-competent ACE2 orthologues of Rhinolophus spp. increased its binding affinity for the SARS-CoV-2 RBD98. Although not fully conclusive, these data are consistent with a Rhinolophus origin of SARS-CoV-2. Phylogenetic analysis and confirmatory infection studies have shown that ACE2 orthologues from a wide range of mammals, including domestic animals and livestock, support SARS-CoV-2 infection, suggesting that many animals have the potential to act as intermediate hosts as well106.
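Identity figures such as the 96% nucleotide identity quoted above for RaTG13 are computed from aligned sequences. The following is a minimal sketch of one common convention, in which identical positions are counted over alignment columns where neither sequence has a gap. The function name, the gap convention and the toy sequences are assumptions made for illustration; the cited studies may have used a different definition, and the sequences are not real viral genomes.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical positions over columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length); the alignment itself
    is outside the scope of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # assumed convention: gapped columns are excluded
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical toy alignment, for illustration only.
toy_a = "ATGGCTAGCT-A"
toy_b = "ATGACTAGCTTA"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Under a different convention (for example, counting gapped columns as mismatches) the same alignment would give a lower value, which is one reason identity percentages from different studies are not always directly comparable.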

Additional host entry factors

In addition to ACE2, several molecules have been suggested to serve as alternative receptors for SARS-CoV and SARS-CoV-2. These include C-type lectins, DC-SIGN and L-SIGN107,108,109. Lectins are involved in the recognition of a broad range of pathogens110 and mediate intercellular adhesion111. They bind a wide range of viruses by recognizing the glycans on the virion surface, often promoting viral entry by allowing the virus to attach to the target cell. Likewise, TIM1 and AXL were also suggested to be alternative SARS-CoV-2 receptors112,113. As members of phosphatidylserine receptor families, TIM and TAM, respectively, they enhance the entry of a wide range of enveloped viruses by binding to phosphatidylserine on the virion membrane114,115,116. Although lectins and phosphatidylserine receptors increase viral entry, they are non-specific and do not support efficient infection by SARS-CoV or SARS-CoV-2 in the absence of ACE2 (refs115,117), and thus ‘attachment factors’ would better describe those molecules.

Similarly, CD147, a transmembrane glycoprotein expressed ubiquitously in epithelial and immune cells, was proposed to be an alternative receptor for SARS-CoV and SARS-CoV-2 infection118,119. Although a modest increase in viral entry was observed with higher levels of CD147, and although its upregulation was observed in obesity and diabetes120, which are potential risk factors for severe COVID-19, the role of CD147 in SARS-CoV-2 infection has been disputed on the basis of the inability of CD147 to bind the S protein121. Two groups identified neuropilin 1 (NRP1) as a host factor for SARS-CoV-2 (refs122,123). Although NRP1 is expressed in olfactory and respiratory epithelial cells122, its expression is low in ciliated cells, the primary target cells for SARS-CoV-2 in the airway, while it is high in goblet cells, which are not susceptible to SARS-CoV-2 (refs71,75). Nonetheless, NRP1 was shown to enhance TMPRSS2-mediated entry (see the next section) of wild-type SARS-CoV-2 but not that of mutant virus that lacks the multibasic furin-cleavage site122. NRP1 was also shown to bind S1 through the multibasic furin-cleavage site and to promote S1 shedding and to expose the S2′ site to TMPRSS2 (ref.124). Recently, the structure of ACE2 in complex with a neutral amino acid transporter, B0AT1, was analysed by cryo-EM in the presence and absence of the SARS-CoV-2 S protein23. ACE2 was previously shown to be essential for B0AT1 expression in the small intestine125. While B0AT1 is expressed in the gastrointestinal tract and kidney, it is not present in the lung. However, a B0AT1 homologue in the lung might contribute to SARS-CoV-2 infection. Additional studies are warranted to confirm the role of NRP1 and B0AT1 in SARS-CoV-2 infection.

SARS-CoV-2 entry process

Viral entry proteins must fold into an energetically stable state, and yet must undergo a subsequent conformational transition that provides sufficient energy to overcome the natural repulsion between the viral and cellular membranes. Therefore, the S protein transitions to a so-called metastable state, a state prone to transformation to a lower-energy state, before membrane fusion. As in SARS-CoV and other coronaviruses, this S protein transition is enabled through two proteolytic cleavage steps following ACE2 engagement. The first of these is localized to the S1–S2 boundary, and the second is localized to the S2′ site in the S2 subunit. For SARS-CoV, both sites are cleaved by proteases in the target cell. In the case of SARS-CoV-2, the S1–S2 boundary is cleaved by furin in the virus-producer cell, whereas the S2′ site cleavage still requires target-cell proteases. Cell entry by both viruses is therefore dependent on the target-cell proteases, and TMPRSS2 and cathepsin L are the two major proteases involved in S protein activation. As TMPRSS2 is present at the cell surface, TMPRSS2-mediated S protein activation occurs at the plasma membrane, whereas cathepsin-mediated activation occurs in the endolysosome (Fig. 3).

Fig. 3: Two distinct SARS-CoV-2 entry pathways.

Two spike (S) protein cleavage events are typically necessary for the coronavirus entry process: one at the junction of the S1 and S2 subunits and the other at the S2′ site, internal to the S2 subunit. In the case of SARS-CoV-2, the polybasic sequence at the S1–S2 boundary is cleaved during virus maturation in an infected cell, but the S2′ site is cleaved at the target cell following angiotensin-converting enzyme 2 (ACE2) binding. Virus binding to ACE2 (step 1) induces conformational changes in the S1 subunit and exposes the S2′ cleavage site in the S2 subunit. Depending on the entry route taken by SARS-CoV-2, the S2′ site is cleaved by different proteases. Left: If the target cell expresses insufficient transmembrane protease, serine 2 (TMPRSS2) or if a virus–ACE2 complex does not encounter TMPRSS2, the virus–ACE2 complex is internalized via clathrin-mediated endocytosis (step 2) into the endolysosomes, where S2′ cleavage is performed by cathepsins, which require an acidic environment for their activity (steps 3 and 4). Right: In the presence of TMPRSS2, S2′ cleavage occurs at the cell surface (step 2). In both entry pathways, cleavage of the S2′ site exposes the fusion peptide (FP) and dissociation of S1 from S2 induces dramatic conformational changes in the S2 subunit, especially in heptad repeat 1, propelling the fusion peptide forward into the target membrane, initiating membrane fusion (step 5 on the left and step 3 on the right). Fusion between viral and cellular membranes forms a fusion pore through which viral RNA is released into the host cell cytoplasm for uncoating and replication (step 6 on the left and step 4 on the right). Several agents disrupt interaction between the S protein and ACE2: ACE2 mimetics, therapeutic monoclonal antibodies targeting the neutralizing epitopes on the S protein and antibodies elicited by vaccination block virus binding to ACE2 and thus inhibit both entry pathways. 
By contrast, strategies targeting post-receptor-binding steps differ between the two pathways. Being a serine protease inhibitor, camostat mesylate restricts the TMPRSS2-mediated entry pathway. Hydroxychloroquine and chloroquine block endosomal acidification, which is necessary for cathepsin activity, and thus restrict the cathepsin-mediated entry pathway.

Cleavage of the S protein S1–S2 boundary by furin

The presence of a multibasic site (Arg-Arg-Ala-Arg) located at the S1–S2 junction, which is cleaved by furin (Fig. 1), distinguishes SARS-CoV-2 from SARS-CoV and all other known sarbecoviruses, whose S proteins are not cleaved by furin-like proteases during virus maturation in the infected cell. Cleavage of the S1–S2 boundary is a prerequisite for the cleavage of the S2′ site126, and both cleavage events are essential to initiate the membrane-fusion process. The entry glycoproteins of many viruses, including HIV-1 and avian influenza viruses, are cleaved by furin, but such cleavage does not destabilize the entry glycoprotein; the two subunits are associated via a stable interaction. By contrast, the S1 subunit of the human SARS-CoV-2 Wuhan-Hu-1 strain is easily shed from the S2 subunit, whereupon the S2 trimer prematurely adopts its postfusion conformation, a non-functional form of the S protein (Fig. 2e). This perplexing observation suggests that the acquisition of a furin-cleavage site by SARS-CoV-2 may have been a recent event. While it would have been relatively straightforward for the virus to eliminate this furin-cleavage site and depend on the target-cell proteases, as SARS-CoV does, SARS-CoV-2 instead acquired a different mutation, D614G, to stabilize the S protein and slow S1 shedding27,127. This steadfast retention of the destabilizing furin site indicates that this site is important for virus fitness in human hosts. Indeed, the furin site was recently shown to be an important determinant for SARS-CoV-2 transmission among co-housed ferrets128.
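To make the notion of a multibasic site concrete, the following minimal Python sketch scans a peptide, written in one-letter amino acid codes, for the minimal furin recognition pattern R-X-X-R, of which Arg-Arg-Ala-Arg (RRAR) is one instance. The function name and the two toy peptides are hypothetical and are not real S protein sequences; a motif match indicates only that a sequence resembles a furin substrate, not that it is cleaved in cells.

```python
import re

# Minimal furin recognition pattern R-X-X-R (one-letter amino acid codes).
# The lookahead allows overlapping matches to be reported.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def find_furin_like_sites(peptide: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched tetrapeptide) for each R-X-X-R match."""
    return [(m.start() + 1, m.group(1))
            for m in FURIN_MINIMAL.finditer(peptide.upper())]


# Hypothetical toy peptides for illustration only, not real spike sequences.
with_site = "GSGSPRRARSVGSG"     # embeds the RRAR motif named in the text
without_site = "GSGSLLRSTGSG"    # contains a single Arg and no R-X-X-R motif

print(find_furin_like_sites(with_site))
print(find_furin_like_sites(without_site))
```

A pattern scan of this kind is only a first filter; accessibility of the site in the folded protein and the surrounding residues also influence whether cleavage occurs.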

Role of TMPRSS2 in viral entry

After the S1–S2 boundary is cleaved, the S2′ site must also be cleaved to fully activate the fusion process either by TMPRSS2 on the cell surface or by cathepsins in the endosomes (Fig. 3). TMPRSS2 is a type II transmembrane protein with serine protease activity whose major physiological role and substrate specificity are not well defined. Nonetheless, its role in respiratory virus infection, especially for influenza viruses129,130,131 and SARS coronaviruses12,13,14, is well established. TMPRSS2 is found in the gastrointestinal, respiratory and urogenital epithelium132,133. Among these tissues, three major cell types co-express TMPRSS2 and ACE2 — type II pneumocytes, ileal absorptive enterocytes and nasal goblet secretory cells81 — although other studies showed that nasal ciliated cells but not nasal goblet cells express ACE2 (refs71,76) or that both cells express ACE2 at high levels72. ACE2 expression in the lower airway is limited, but ACE2 is expressed at a higher level in the upper airway, and indeed many airway cells that express ACE2 also express TMPRSS2 (refs74,134,135,136). While TMPRSS2 is the most frequently studied protease in coronavirus entry, other serine proteases present in the lung, including human airway trypsin-like protease (HAT), TMPRSS4, TMPRSS11A, TMPRSS11E and matriptase, as well as secreted neutrophil elastase, appear to contribute to infection by many respiratory viruses137. Studies on their involvement in viral infection in vivo will provide further insight into the SARS-CoV-2 entry pathway and the means to inhibit the process.

Although both viruses utilize TMPRSS2, SARS-CoV is less dependent on TMPRSS2 than is SARS-CoV-2 (refs138,139). One determining factor might be the presence or absence of a furin site at the S1–S2 boundary140. It is possible that the sequence of the S1–S2 junction of the SARS-CoV S protein is not a suitable substrate for TMPRSS2 but is more easily cleaved by cathepsins. This explanation is consistent with the observation that replacing the SARS-CoV-2 furin site with the equivalent sequence of SARS-CoV or RaTG13 virus from horseshoe bats, containing no multibasic site, prevented efficient SARS-CoV-2 infection of TMPRSS2+ human airway cells9.

Role of cathepsins in viral entry

Although SARS-CoV-2 prefers activation by TMPRSS2, cleavage of the S2′ site can also be mediated by cathepsins, especially cathepsin L (Fig. 3). If the target cells express insufficient TMPRSS2 or if a virus–ACE2 complex does not encounter TMPRSS2, ACE2-bound virus is internalized via clathrin-mediated endocytosis17,18 into the late endolysosome, where the S2′ site is cleaved by cathepsins. Presumably owing to multiple ligations to ACE2, binding of SARS-CoV, SARS-CoV-2 or purified S protein induces ACE2 endocytosis17,18. The role of cathepsins in processing the S2′ site is supported by partial inhibition of pseudovirus entry by cathepsin L inhibitors in TMPRSS2+ cells138,141.

Cathepsins are non-specific proteases with endopeptidase and exopeptidase activities that participate in protein degradation in the late endosomes and lysosomes. They are divided into three catalytic classes: aspartic (D and E), serine (G) and cysteine (B, C, K, L, S and V) proteases. Of these, cysteine proteases (cathepsins B, L and S) contribute the most to viral entry. Current understanding of the role of cathepsins in viral entry is derived mostly from studies on reoviruses, Ebola virus and SARS-CoV142,143,144, with limited reports on SARS-CoV-2 (ref.138). Whereas cathepsin B plays an essential role in Ebola virus entry, cathepsin L plays a greater role in SARS-CoV15,16,142 and SARS-CoV-2 (refs138,141) entry. The lower dependence of SARS-CoV-2 on the endosomal pathway explains the limited effect of endosomal acidification inhibitors such as hydroxychloroquine on restricting SARS-CoV-2 infection of target cells138. In one study, while the potency of hydroxychloroquine was dramatically impaired when the target cell expressed TMPRSS2, it was partially restored in the presence of a TMPRSS2 inhibitor138, suggesting a potential benefit of combined use of a TMPRSS2 inhibitor and hydroxychloroquine.
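The logic of the two entry routes and their inhibitors can be summarized as a toy Boolean model. The Python sketch below is a deliberate simplification under stated assumptions: each route is treated as either open or closed, the function and parameter names are hypothetical, and no quantitative data are implied.

```python
def open_entry_routes(tmprss2_expressed: bool,
                      ace2_binding_blocked: bool = False,
                      serine_protease_inhibitor: bool = False,
                      acidification_inhibitor: bool = False) -> set[str]:
    """Toy Boolean model of which S2' activation routes remain available.

    Assumptions (simplifications of the text, not measurements):
    - blocking the S protein-ACE2 interaction closes both routes;
    - the cell-surface route needs TMPRSS2 and is closed by a serine
      protease inhibitor such as camostat mesylate;
    - the endosomal route needs acid-dependent cathepsins and is closed by
      an acidification inhibitor such as hydroxychloroquine.
    """
    routes: set[str] = set()
    if ace2_binding_blocked:
        return routes
    if tmprss2_expressed and not serine_protease_inhibitor:
        routes.add("cell surface (TMPRSS2)")
    if not acidification_inhibitor:
        routes.add("endosome (cathepsin)")
    return routes


# In a TMPRSS2-expressing cell, an acidification inhibitor alone leaves the
# cell-surface route open; adding a serine protease inhibitor closes both.
print(open_entry_routes(True, acidification_inhibitor=True))
print(open_entry_routes(True, serine_protease_inhibitor=True,
                        acidification_inhibitor=True))
```

In this simplified picture, an agent acting on a single post-binding step leaves the other route available in TMPRSS2-expressing cells, which is the rationale given above for considering combined inhibition.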

Membrane fusion

Recent structural studies identified key components of the S fusion machinery, including the FPPR, 630 loop and CTD2, which appear to modulate the fusogenic structural rearrangements of the S protein (Fig. 2). The FPPR and the 630 loop help maintain the RBDs in the down conformation but move out of their positions when the adjacent RBD flips up. As summarized in Fig. 4, the RBD could sample the up conformation due to intrinsic protein dynamics. If ACE2 captures the RBD-up conformation, expelling both the 630 loop and the FPPR from their positions in the closed S trimer conformation, the FPPR shift may help expose the S2′ site near the fusion peptide for proteolytic cleavage. Departure of the 630 loop from the hydrophobic surface of CTD2 can destabilize this domain and free the N-terminal segment of S2 from S1, likely releasing S1 altogether, owing to the precleavage of the S1–S2 boundary of the SARS-CoV-2 S protein by furin as discussed earlier. Dissociation of S1 would then initiate a cascade of refolding events in the metastable prefusion S2, allowing the fusogenic transition to a stable postfusion structure (Fig. 4). Accompanying these transitions, the thrust of HR1 unfolding drives fusion peptide insertion into the target-cell membrane57,58. Folding back of HR2 places the fusion peptide and transmembrane segments at the same end of the molecule; this proximity causes the membranes with which they interact to bend towards each other, effectively leading to membrane fusion. This model is also very similar to that proposed for membrane fusion catalysed by the HIV envelope protein, in which gp120 dissociation triggers refolding of gp41 to complete the fusion process145.

Fig. 4: A model for membrane fusion induced by the SARS-CoV-2 S protein.

The structural transition from the prefusion conformation to the postfusion conformation that induces membrane fusion likely proceeds stepwise as follows. The prefusion spike (S) protein trimer fluctuates between the closed conformation, in which all three receptor-binding domains (RBDs) are down, and the open conformation, in which one RBD is up. RBD binding to angiotensin-converting enzyme 2 (ACE2) enables exposure of the S2′ cleavage site immediately upstream of the adjacent fusion peptide (FP). Cleavage at the S2′ site releases the structural constraints on the FP and initiates a cascade of refolding events in S2, probably accompanied by complete dissociation of S1. Formation of the long central three-stranded coiled coil and folding back of heptad repeat 2 (HR2) lead to the postfusion structure of S2 that brings the two membranes together, facilitating fusion pore formation and viral entry. As shown in Fig. 3, these events can occur either at the plasma membrane or in the endosomal compartment. HR1, heptad repeat 1; TM, transmembrane segment.

Cellular proteins restricting viral entry

Toll-like receptors (TLRs) recognize pathogen-associated molecular patterns and induce the production of type I interferons. Of these, TLR3, TLR7, TLR8 and TLR9 mount antiviral immune responses: TLR3 recognizes double-stranded RNA viruses, TLR9 recognizes unmethylated CpG in viral DNA, and, relevant to coronaviruses, TLR7 and TLR8 bind G/U-rich single-stranded viral RNA. TLR7 and TLR8 reside in the endosomes and were shown to induce proinflammatory cytokines in response to SARS-CoV and SARS-CoV-2 RNA146,147. TLR7 and TLR8 are both expressed in lung tissue. Although TLR7 is expressed at higher levels in the brain, skin and lymphoid tissues than in the lung, TLR8 is predominantly expressed in the lung and lymphoid tissues.

Many interferon-stimulated gene products were identified as important for SARS-CoV-2 replication, but only a few of them are involved in the entry steps: interferon-induced transmembrane proteins (IFITMs)148,149 and lymphocyte antigen 6 family member E (LY6E)150,151. Four members of the human IFITM family (IFITM1, IFITM2, IFITM3 and IFITM5) are constitutively expressed at a high level but are strongly induced by type I and type II interferons and were identified as cellular antiviral proteins against influenza A viruses and flaviviruses152 and later against filoviruses and SARS-CoV153. Recently, IFITM2 was shown to restrict SARS-CoV-2 entry149. IFITM proteins prevent viruses from traversing the endosomal membrane to access cellular cytoplasm by an unclear mechanism. Such a restriction can be bypassed if SARS-CoV were directed to enter cells exclusively at the plasma membrane153. This restriction is amplified if the furin site is removed from SARS-CoV-2 (ref.149), and is compensated by TMPRSS2 overexpression148,151. In addition, it does not affect the viruses that enter the host cell solely via fusion at the plasma membrane153. These observations indicate that the site of membrane fusion is crucial for the antiviral activity of IFITM proteins.

LY6E is a glycophosphatidylinositol-anchored cell surface protein and was shown to inhibit replication of vesicular stomatitis virus and mouse hepatitis virus154, but it was also found to promote yellow fever virus replication155, HIV-1 entry156, flavivirus internalization157 and influenza A virus entry158. Proposed mechanisms for entry enhancement include formation of a microtubule-like network that likely guides endocytosed viruses along the tubules157 and promotion of uncoating steps following membrane fusion158. Recently, LY6E was shown to impair infection by SARS-CoV, SARS-CoV-2 and MERS-CoV by inhibiting S protein-mediated membrane fusion, and mice lacking LY6E expression in immune cells were highly susceptible to mouse hepatitis virus, also a coronavirus150. Unlike IFITM-mediated restriction, LY6E-mediated inhibition was not overcome by TMPRSS2 expression151. Further study is warranted to clarify the distinct roles of LY6E in regulating infection with SARS-CoV-2 and other viruses.

Natural evolution of the S protein

Comparison of the S protein sequences indicates SARS-CoV-2 may have emerged from the recombination between bat and pangolin coronaviruses. During zoonosis, SARS-CoV-2 acquired a furin-cleavage site at the boundary of the S1 and S2 domains. The virus retained this cleavage site throughout the pandemic but acquired the D614G mutation to compensate for S protein instability. More recently, as the number of infected or vaccinated people increases, SARS-CoV-2 has evolved to acquire S protein mutations to escape neutralizing antibodies.

S proteins from sarbecoviruses in reservoir species

Specimens taken from pangolins (Manis javanica) were found to be infected with coronaviruses similar in sequence to SARS-CoV-2 and capable of utilizing ACE2 for entry100,101,102. However, the pangolin is not likely the long-term reservoir species for SARS-CoV-2, because most infected pangolins exhibited severe respiratory distress and died within weeks102. In addition, because pangolin coronaviruses share lower sequence homology with SARS-CoV-2 than does RaTG13, a bat isolate, it is unlikely that a particular pangolin coronavirus is directly linked to the present SARS-CoV-2 outbreak. Interestingly, whereas pangolin isolates exhibit higher homology in the RBD with SARS-CoV-2 than does RaTG13, RaTG13 shares much greater homology with SARS-CoV-2 than do pangolin isolates outside the RBD3,100,102. This raises a possibility that the pangolin coronavirus RBD was introduced into the S gene of RaTG13 or another close ancestor of SARS-CoV-2 through a recombination event104,159. It remains unknown in which species, if any, this recombination event occurred. Other analyses suggest SARS-CoV-2 did not acquire the RBD from a pangolin coronavirus but rather that it evolved in bats and gained the ability to infect humans and pangolins160,161. Furthermore, other studies suggest SARS-CoV-2 crept into humans much earlier than 2019 through unnoticed infection and obtained its unique features, RBD and furin-cleavage site160,162.

Adaptation to humans

It was suggested that the acquisition of the furin-cleavage site in the SARS-CoV-2 S protein was essential for zoonotic transfer to humans9,163, and experimental data confirmed that SARS-CoV-2 pseudoviruses lacking this cleavage site in the S protein are incapable of entering human airway cells9. Notably, the furin site is not essential for infection of mammalian epithelial cells generally, as it is lost after a few passages of the virus in Vero cells (African green monkey kidney epithelial cells)28,164,165,166,167. However, recent preliminary studies show that virus passage in TMPRSS2-overexpressing Vero cells or in human lung cells prevents deletion or mutation of this site167,168,169. Therefore, acquisition of the furin-cleavage site appears to be one of the first human adaptation events.

For several months as SARS-CoV-2 initially spread throughout the world, only one S protein mutation, D614G, exhibited clear evidence of positive selection170,171. Although acquisition of the furin site by SARS-CoV-2 and the resulting S protein cleavage appear to be essential for human infection, they also make the virus less infectious than SARS-CoV by rendering the S protein prone to S1 shedding, as discussed earlier. To compensate for this disadvantage, the SARS-CoV-2 S protein appears to have gained stronger intermolecular association between the S1 and S2 subunits, by way of the D614G mutation. A number of studies have supported such a stabilizing effect as the mode of infectivity enhancement of the D614G variant virus27,127,172,173,174. Cryo-EM studies of full-length S trimers revealed the source of the increased S protein stability: the D614G mutation renders the 630 loop more ordered, securing the NTD and CTD1 and resulting in reduced S1 shedding25,27 (Fig. 2b,c). A biochemical study clearly showed increased functional S protein density on the G614-containing variant as a result of reduced S1 shedding127. Others have proposed that a greater proportion of the one-RBD-up conformation among G614-containing S proteins is associated with increased virus infectivity22,175,176,177. Most such studies, however, were performed with non-native S protein lacking the furin site and/or carrying the diproline mutation, which has been shown to stabilize the prefusion conformation21. Regardless of its mechanism for enhanced infectivity, faster transmission of D614G virus was demonstrated71. The fact that most currently circulating SARS-CoV-2 isolates carry the D614G mutation indicates it is undoubtedly a beneficial mutation for adaptation to humans.

Additional adaptations to the human host are ongoing. The SARS-CoV-2 variants Alpha (lineage B.1.1.7), Beta (B.1.351) and Gamma (P.1), which were first identified in the United Kingdom, South Africa and Brazil, respectively, carry a common N501Y mutation in addition to the D614G mutation (Table 1). Residue 501 is one of the key sites within the RBD involved in ACE2 binding2,178,179, and recent preliminary reports demonstrate that N501Y mutation strengthens RBD interaction with hACE2 (refs180,181) and increases the infectivity and virulence of variants containing the mutation182,183,184. Interestingly, data indicate that the N501Y substitution also lends the S protein the ability to utilize mouse and rat ACE2 orthologues182,185,186, raising concern for the potential of new rodent reservoirs. A recent variant, Delta (B.1.617.2), first identified in India, does not have this N501Y mutation, but it nonetheless exhibits significantly increased transmissibility by an unknown mechanism187. Increased cleavage at the S1–S2 boundary due to the P681R mutation in the furin-cleavage site might further contribute to increased transmissibility128,188.

Immune escape

Viruses escape immunity by mutating the residues recognized by neutralizing antibodies. As immune escape is necessary only in the presence of immune pressure, no escape variants appeared in the early days of the pandemic, with lower dissemination and in the absence of vaccination. In recent months, however, several neutralization-resistant variants have emerged. The aforementioned Alpha, Beta, Gamma and Delta variants and the Epsilon variant (B.1.429, California lineage) exhibit decreased sensitivity to neutralization by immune plasma derived from convalescent patients with COVID-19 or vaccinated individuals40,41,42,43,44,45,185,189,190,191,192,193,194. The possibility that these recent variants emerged from the infection of previously infected or vaccinated individuals is supported by studies in which escape variants were experimentally generated in the presence of neutralizing antibodies, convalescent sera or sera derived from vaccinated individuals195,196,197,198. In these studies, the same mutations, including K417N, E484K and N501Y in the RBD, which are the hallmarks of the Alpha, Beta, and Gamma variants, were identified. These variants are also fully or partially resistant to the therapeutic monoclonal antibodies bamlanivimab (Eli Lilly) and/or casirivimab (Regeneron)178,189,197. It is likely that other variants, Eta (B.1.525, United Kingdom), Iota (B.1.526, New York) and Kappa (B.1.617.1, India), are also less sensitive to neutralization, owing to the Glu484 mutation to either Lys or Gln (E484K/Q). The rapidly spreading Delta variant does not carry E484K/Q, but it does have the additional mutation L452R, which confers neutralization resistance199 and is also present in the Epsilon and Kappa variants.

Conclusions and perspective

Three zoonotic coronaviruses have emerged to cause severe disease in humans in the last two decades: SARS-CoV, MERS-CoV and SARS-CoV-2. The frequency of these zoonoses, accelerated by increased human intrusion into previously undisturbed habitats, suggests that new coronaviruses and other emerging pathogens will continue to threaten human health. Despite the warning signs from SARS-CoV and MERS-CoV outbreaks, few anticipated a pandemic of the scale that was caused by SARS-CoV-2. Fortunately, and owing in part to lessons learned from the previous outbreaks, our understanding of SARS-CoV-2 biology and the development of vaccines (Box 2; Table 2) and therapeutics have proceeded at an unprecedented pace. In spite of these achievements, there remain many outstanding questions whose answers may assist in the development of new tools to control these viruses.

Table 2 Selected products used in global vaccination campaigns

One question is when in its natural history did the virus acquire a furin site? As already mentioned, cleavage of the SARS-CoV-2 S protein into the S1 and S2 subunits results in an unstable S protein. Nonetheless, the virus has judiciously held onto the furin site throughout the pandemic without deleting or mutating it. In addition, this furin site was shown to be important for transmission in ferrets128. One logical explanation is that the cleavage of the ancestral sequence at the S1–S2 junction (the sequence before the virus acquired the furin site) by target-cell proteases such as TMPRSS2 might have been inefficient, and thus the virus opted to precleave the site at the cost of resulting S protein instability. On the other hand, studies show that SARS-CoV-2 is nonetheless more dependent on TMPRSS2 than on cathepsins138. Recent studies on IFITM proteins, type I interferon-induced endosomal virus restriction factors, provide one clue to this puzzle. SARS-CoV-2 lacking the polybasic site is more sensitive to IFITM-mediated restriction than the wild-type virus128,149, suggesting the virus chose to use TMPRSS2 to enter cells to avoid IFITM-mediated restriction in the endosome. Alternatively, this preference of SARS-CoV-2 for TMPRSS2 can also be explained if tight folding of the SARS-CoV S protein allows the precise exposure of the S2′ site to cathepsin L, a protease with low substrate specificity, while the difference in SARS-CoV-2 S protein folding may not confine cathepsin digestion to only the S2′ site, leading to overdigestion of the neighbouring sequences, which renders the fusion peptide non-functional. These situations would force SARS-CoV-2 to prefer TMPRSS2 to cathepsins. If so, the furin dependency and TMPRSS2 dependency of SARS-CoV-2 appear to reflect a choice of less bad options. Of these, S protein instability was compensated by acquisition of the D614G mutation27,127. 
If TMPRSS2 inhibitors become widely used, it is possible the virus will acquire mutations that facilitate escape from inhibitory action of IFITM proteins and enable more precise use of cathepsins, suggesting the benefit of use of combinatorial therapies targeting both pathways (camostat mesylate plus hydroxychloroquine).

Some critical questions are associated with the link between transmissibility and disease severity. As a virus better adapts to a species, the severity of the disease can diminish, but recent changes in SARS-CoV-2 have increased both transmissibility and hospitalizations. A positive relationship between transmissibility and hospitalizations may reflect a common underlying mechanism: higher affinity for ACE2 can increase both. More efficient binding to ACE2 can promote replication in the upper respiratory tract, promoting more efficient transmission, and it can also increase replication in the lower respiratory tract and systemically, causing more severe disease. One outstanding question is whether the S protein has reached the maximum affinity for hACE2 through RBD mutations such as N501Y or whether it will further mutate and continue to enhance both transmissibility and pathogenicity. Or might countervailing selection pressures result in transmission gains but with milder disease? If the differences between the upper and lower respiratory tracts allow the virus to adapt specifically to, say, nasal epithelial cells, the link between transmission and disease severity may be broken.

At the time of writing, the pandemic appears to be shifting gears, moving from an early period of adaptation of the virus to its new human host to a longer period where immune escape will shape S protein evolution. A key question here is what future vaccine antigens should look like. The S protein is rapidly diversifying, and current vaccination strategies (Box 2; Table 2) may soon become impractical, necessitating vaccine deployment against every major circulating variant. The major antibody-neutralizing epitope of the S protein is the RBD, accounting for more than 90% of all neutralizing activity, although the NTD also has some neutralizing epitopes. Importantly, the NTD is changing faster, acquiring more mutations and deletions than the RBD; understandably because the NTD plays at most a secondary role in viral entry and thus is subject to fewer constraints on its evolution. The remainder of the S protein, including the S2 subunit, does not appear to elicit significant neutralizing responses. Therefore, RBD vaccines presenting multiple RBDs derived from various escape variants may make superior antigens for future vaccines.

A related question is whether the accessible pathways of viral escape are essentially unlimited, or whether they are constrained, especially in the RBD, so that only a tractable number of mutations are likely to emerge. If the latter is true, they might be anticipated and blocked pre-emptively. So far, four major regions or residues in the S1 subunit have been identified that render the virus less sensitive to immune sera: a supersite in the NTD (residues 14–26, 141–156 and 246–260)200,201,202,203 and residues 417, 484 and 501 in the RBD40,200,204. Interestingly, the immune-escape mutations selected in several in vitro studies overlap with those that naturally emerged from human infection40,191,202,205 and those selected in the presence of monoclonal antibodies or sera derived from infected or vaccinated individuals195,196,197. These observations imply that there might indeed be only a limited number of escape pathways the virus can take and that those pathways could be blocked. Alternatively, the mutations the virus has been accumulating so far could represent the first ‘low-hanging fruit’ available to the virus. Indeed, all major mutations in the RBD to date have been accessed through a single nucleotide change, and so, in the pessimistic scenario, the virus could further evolve through more complex mutations such as several synergistic or compensatory mutations in parallel. Moreover, several recent mutations have increased RBD affinity for ACE2 (Table 1). Unfortunately, this adaptation provides the virus with some breathing room to accommodate immune-escape mutations that will likely decrease the affinity for ACE2.
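The statement that the major RBD mutations are reachable through a single nucleotide change can be checked against the standard genetic code. The Python sketch below tests whether any codon for the original residue differs at exactly one position from any codon for the substituted residue. The helper names are hypothetical, and because the check considers all synonymous codons rather than the codon actually present in the viral genome, it shows only that a one-step path is possible in principle.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (DNA alphabet).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(amino_acid: str) -> list[str]:
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def parse_substitution(label: str) -> tuple[str, int, str]:
    """Split a label such as 'E484K' into (original, position, substituted)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def single_nt_accessible(original: str, substituted: str) -> bool:
    """True if some codon pair for the two residues differs at exactly one base."""
    return any(sum(a != b for a, b in zip(c1, c2)) == 1
               for c1 in codons_for(original)
               for c2 in codons_for(substituted))


# RBD substitutions named in the text.
for label in ["K417N", "L452R", "E484K", "N501Y"]:
    original, position, substituted = parse_substitution(label)
    print(label, single_nt_accessible(original, substituted))
```

A substitution for which this check fails would require at least two nucleotide changes in the same codon, which corresponds to the more complex mutational paths raised in the pessimistic scenario above.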

Thus, unfortunately, SARS-CoV-2 variants may continue to adapt in the human population for years or decades. This raises additional questions. As coronaviruses are especially good at recombining with other coronaviruses206, will a recombination event with one of the milder coronaviruses that regularly circulate among humans generate an entirely new virus? Will SARS-CoV-2 diversify into even more distinct lineages more akin to the many forms of influenza A viruses with which we regularly contend? Will SARS-CoV-2 stably remain among our livestock or pests, perhaps creating reservoirs for new zoonoses? Perhaps the most critical question is whether we can finally learn the clearest lesson of the COVID-19 pandemic; namely, can we fully appreciate that viral infections are a major threat to all of us, but at the same time are fully addressable with our current technologies and effective implementation of well-established public health principles? If we learn this lesson well, hopefully this will be our last pandemic.