Abstract
The solution structure of SARS-CoV-2 nonstructural protein 7 (nsp7) at pH 7.0 has been determined by NMR spectroscopy. nsp7 is conserved in the coronavirinae subfamily and is an essential co-factor of the viral RNA-dependent RNA polymerase for active and processive replication. Similar to the previously deposited structures of SARS-CoV-1 nsp7 at acidic and basic conditions, SARS-CoV-2 nsp7 has a helical bundle folding at neutral pH. Remarkably, the α4 helix shows gradual dislocation from the core α2-α3 structure as pH increases from 6.5 to 7.5. The protonation state of residue H36 contributes to the change of nsp7’s intramolecular interactions, and thus, to the structural variation near-neutral pH. Spin-relaxation results revealed that all three loop regions in nsp7 possess dynamic properties associated with this structural variation.
Introduction
Since 2002, coronaviruses (CoVs) have threatened human health with three outbreaks: severe acute respiratory syndrome (SARS), Middle Eastern respiratory syndrome (MERS), and the coronavirus disease of 2019 (COVID-19).1 In just a year after COVID-19 was declared a global pandemic by the WHO in March 2020, more than 119 million individuals have been infected with SARS-CoV-2, with 2.6 million deaths across the world. To address the COVID-19 pandemic, research efforts have led to new advances in the basic understanding of SARS-CoV-2. Similar to the other highly pathogenic CoVs, SARS-CoV-2 belongs to the genus Betacoronavirus.2 The genome of SARS-CoV-2 shares high sequence identity with those of SARS-CoV-1 (∼80 %) and MERS-CoV (∼50 %).3,4 The genome consists of 14 open reading frames (ORFs), with the two largest encoding two large polyproteins, that once cleaved, comprise the 16 viral non-structural proteins or nsps.4 The remaining open reading frames encode the four structural proteins – spike (S), envelope (E), membrane (M), and nucleocapsid (N) – and nine accessory proteins.4
SARS-CoV-2 is an enveloped, positive-sense RNA virus.2 Upon entering host cells, the two largest and 5’ proximal ORFs are translated into polyproteins that are subsequently cleaved into individual nsps.1,5 The mature nsps induce rearrangement of the endoplasmic reticulum (ER) into double-membrane vesicles (DMVs) that are platforms for viral replication of genomic and subgenomic RNAs.1,6,7 Translation of the subgenomic mRNA’s produces structural and accessory proteins.8 These proteins and the newly synthesized genomic RNA assemble into a virus particle in the ER-Golgi intermediate compartment (ERGIC) that is later secreted to the intercellular space by the secretory pathway.1,9
nsp7, 8, and 12 are incorporated into DMVs and form the core structure of the viral replication-transcription complex (RTC).5,7,10 These core proteins are highly conserved in coronaviruses.11-13 While nsp12 serves as the catalytic center being the RNA-dependent RNA Polymerase, nsp7 and nsp8 are essential co-factors for processive RNA synthesis.14-16 To examine the crucial roles in viral replication and transcription of these protein in the virual life cyle, many crystallographic and electron microscopic structures have been determined. By contrast, the structural and dynamic features of nsps in solution have not been as widely investigated. Here, we determined the solution structure of SARS-CoV-2 nsp7 at pH 7.0 and compared it with structures of SARS-CoV-1 nsp7 at different pHs to better understand its structure in the biophysiological environment. We found that nsp7 shows gradual changes in its morphology as the pH changes. The pH-dependent protonation state of H36 (native number) was found to be a key element that causes this structural variation. The motional properties of the backbone of nsp7 were investigated. Our findings may provide an essential clue for understanding the structural and dynamic features of nsp7 associated with cellular pH conditions.
Results and Discussion
Solution Structure of SARS-CoV-2 nsp7 at pH 7.0
Similar to the structure of SARS-CoV-1 nsp7 at pH 7.5 and 6.5,17,18 SARS-CoV-2 nsp7 exhibited a helical bundle structure at pH 7.0 (Fig. 1a). Four helices defined by PROCHECK19 – α1 (V11-L20), α2 (K27-L41), α3 (T45-M62), and α4 (I68-R79) – are aligned with the TALOS-N prediction (green regions) reported in a previous paper.20 The statistics for the final 20 lowest energy models are shown in Table 1. Only the backbone and heavy atoms of residues in the ordered region (S10–R79) were included in the calculation of RMSDs. Except for the highly disordered N-terminal regions, the ordered regions of all calculated models were well-superimposed (Fig. 1b).
pH-dependent Polymorphism of the Structure of Coronavirus nsp7
Since there are no determined structures for SARS-CoV-2 nsp7 at different pH values, the structures of SARS-CoV-1 nsp7 at pH 7.5 (PDB ID 1YSY)17 and 6.5 (PDB ID 2KYS)18 were used to investigate the pH-dependency of the SARS-CoV-2 nsp7 structure. Given the high degree of sequence and structural homology, we compared the structures of SARS-CoV-1 at pH 6.5 and 7.5 together with our structure of SARS-CoV-2 at pH 7 to assess the pH-dependent polymorphism of nsp7. As shown in Fig. 2a, SARS-CoV-1 and SARS-CoV-2 nsp7 differ by a single residue at position 70 in the α4 helix, where SARS-CoV-2 has a Lys rather than an Arg. However, since the sidechains of both Lys and Arg in both nsp7s do not contact any residue on the other helices, they are irrelevant to the tertiary folding (Fig. 2b-d). Thus, the conformations of SARS-CoV-2 nsp7 at pH 7.5 and 6.5 are expected to be similar to those of SARS-CoV-1 nsp7 at the corresponding pHs.
The structures were superimposed based on the core structure of the protein that comprises the α2 and α3 helices (Fig. 3). As observed previously, the core structure of nsp7 is almost independent of pH changes within the 6.5 to 7.5 range.18 However, the α4 helix is gradually dislocated from the core structure as pH increases, completely losing contact with the α2 helix at pH 7.5. Since the α4 helix is known as the stabilizing factor for the overall folding,18 this dislocation may destabilize the structure of nsp7. This is compensated by the translocation of the α1 helix. As the α4 helix is displaced, the α1 helix moves to the position originally occupied by the α1 and α4 helices (Fig. 3b). By forming new interactions with the core structure, the α1 helix can act as a supplementary factor for maintaining the overall folding at pH 7.5.
Protonation State of a Histidine Residue and pH-dependent Polymorphism
Since the sidechain of histidine residues have a pKa value near neutral pH, its protonation states are often closely related to the pH-dependent variations of protein structure and oligomerization.21-26 Accordingly, the observed pH-dependent polymorphism of the nsp7 structure seems to be related to the protonation state of H36 in the α2 helix. H36 is conserved in both SARS-CoV-1 and SARS-CoV-2 nsp7. At acidic pH, the Nε2 atom of H36 prefers to be protonated. The Hε2 atom can then form a hydrogen bond with the Oδ1 atom of N37 (Fig. 4a). On the opposite side of the His ring, the Nδ1-Hδ1 bond serves as a proton donor for another hydrogen bond with the E74 from the α4 helix. Along with the hydrophobic packing between α2-α3-α4 helices, these hydrogen bonds may contribute to the compact fold of nsp7 at acidic pH, with the α4 helix tightly binding to the core structure of the protein.
As the pH increases to 7.0, the population of deprotonated H36 significantly increases, leading to the loss of the electrostatic interaction. This in turn leads to a conformational rearrangement that dislocates the α4 helix slightly outward from the core structure. The neutralized aromatic sidechain is now slightly buried within the hydrophobic packing among α1-α2-α4 helices (Fig. 4b), leading to a large decrease of solvent-accessible surface area (SASA) from 16.133 Å2 to 1.230 Å2. Nonetheless, as H36 balances between protonated and deprotonated forms, the overall structure of nsp7 still resembles the compact conformation. However, as the pH further increases above 7.0, the majority of H36 becomes deprotonated and the structure of nsp7 experiences a dramatic conformational change (Fig. 4c). The completely dislocated α4 helix no longer makes contact with the α2 helix, and only barely contacts the α3 helix by a few hydrophobic interactions. As the α1 helix translocates to occupy the position vacated by the α4 helix, the sidechain of H36 is surrounded by N37 and hydrophobic residues from the α1, α2, and α3 helices, thus losing SASA (0 Å2) completely. Consequently, H36 participates in the newly formed hydrophobic packing that may contribute to maintain the folding of the core structure at basic pH.
The structural variations of proteins in different environments often lead to different intermolecular interactions and may have a profound effect on the assembly state of protein complexes. Thus, given the observed effect of pH on the structure of nsp7, we propose that the pH-dependent polymorphism of nsp7 may play a critical role in the assembly state of the overall replication complex. During viral infection, nsp7 experiences significant pH changes as it travels between different cellular compartments. Immediately after production, nsp7 would exist as a globular protein that reversibly exchanges between two conformations, a more compact one, preferred in the acidic condition, and a loose one, preferred in the basic condition.18 At the slightly basic pH of DMV (pH 7.0 – 7.2),27 nsp7 in its loose conformation associates with its binding partners, Nsp8 and 12, to form the replication complex. The N-terminal region of nsp7 becomes fully structured, and the extended α1 helix forms a three-helix bundle with the core α2-α3 structure.11-13 In this condition, the sidechain of H36 is packed within the hydrophobic residues at the binding interfaces of nsp7 and nsp12, as illustrated in Fig. 4d. If the sidechain of H36 becomes protonated, its positive charge would make the hydrophobic interactions between nsp7 and nsp12 unfavorable. In this way, the protonation state of H36 of nsp7 would affect the assembly status of RTC, and thus, the viral replication.
Spin-relaxation Experiment of SARS-CoV-2 nsp7 at pH 7.0
Previously, the line shape analysis for the 1H-15N HSQC peaks of SARS-CoV-1 nsp7 revealed that the protein may have a dynamic loop between α3 and α4 helices at pH 6.5. Residues near the C-terminal end of α3 helix (S61 and Q63), in the loop 3 (V66), and near the N-terminal end of α4 helix (N69 and L71) exhibited a noticeable line-broadening that may be related to the conformational exchange of α4 helix at the weakly acidic pH.18 To understand the dynamic properties of SARS-CoV-2 nsp7 in detail, the longitudinal (R1) and transverse (R2) relaxations and heteronuclear Overhauser effect (hetNOE) experiments were conducted at the neutral pH (Fig. 5 and 6). Our results confirmed that all three loop regions of SARS-CoV-2 nsp7 have dynamic features at pH 7.0 that are needed for the exchange between compact and loose conformations.
As expected, S61 near the C-terminal end of α3 helix, V66 in the loop 3, and I68 at the N-terminal end of α4 helix were found to have high R2 rates, 37.96 ± 1.20 sec-1, 33.72 ± 0.63 sec-1, and 23.60 ± 0.27 sec-1, respectively. It seems that a slight difference in the nsp7 structure at pH 6.5 and 7.0 changed the motional properties at Q63, I68, N69, and L71. Despite this difference, it is still clear that the broad region in between the ends of α3 and α4 helices has sub-nanosecond timescale motions that are related to the movement of α4 helix. Interestingly, the amide proton of D67 in this dynamic region seems to have a very short transverse relaxation time (T2) since its peak was not shown in any NMR spectrum probably due to line-broadening. Similarly, the peaks for S24 and K27 in the loop 1 showed very weak intensities in most NMR spectra and were not shown in the spin-relaxation experiments, indicating their short T2 times. Therefore, the backbone regions including these two residues also have dynamic motions related to the movement of α1 helix.
Similar to the previous hetNOE results for SARS-CoV-1 nsp7 at pH 6.5,18 our results for SARS-CoV-2 nsp7 at pH 7.0 also revealed the increased structural flexibility at the highly disordered N- and C-terminal regions. In addition, the protein revealed two highly flexible loop regions, loop 2 and loop 3. In loop 2, K43 and D44 showed low hetNOE values, 0.597 ± 0.042 and 0.662 ± 0.023, respectively. Although T45, located between the end of loop 2 and the beginning of helix α3, was found to have a high hetNOE, it also showed a high R2 rate, 23.28 ± 0.41 sec-1, suggesting the presence of sub-nanosecond chemical exchange at the end of loop 2. As for loop 3, Q63 was found to have the lowest hetNOE of 0.671 ± 0.089, with the remaining three residues G64, A65, and V66 also showing low hetNOE values close to 0.7.
Remarkably, N37 and D38 near H36 showed significantly increased R2 rates, 23.65 ± 0.22 sec-1 and 25.14 ± 0.26 sec-1, respectively. Although the reason is still unclear, it seems that the structural variations by the protonation/deprotonation of H36 are related to their sub-nanosecond motions. Also, we measured an unexpectedly low hetNOE value of 0.604 ± 0.030 for F49 in the α3 helix. It was found that the sidechain of F49 is exposed toward the solvent (SASA = 114.166 Å2). This hydrophobic sidechain may tremble in the solvent, making its backbone region unexpectedly flexible.
The correlation time (τc) of the Brownian rotation diffusion of a small rigid protein in solution can be calculated as a function of the ratio of longitudinal and transverse relaxation times, T1 and T2. 28,29 Given that the τc values of globular proteins are proportional to their molecular weight (MW), one can use the calculated τc to approximate the MW of a protein using a standard curve of τc versus MW at a given temperature. 28 For SARS-CoV-2 nsp7 at pH 7.0 and 298 K, τc was calculated to 12.4 ns. Using the standard curve created with empirical data from NESG28 (Fig. 7), the MW of nsp7 was approximated to 20.6 kDa, which is about twice the value expected for the monomer (9.4 kDa). This suggests that SARS-CoV-2 nsp7 may exist as a homodimer when it is isolated in solution. Nevertheless, it is still unclear whether the protein really dimerized in the NMR sample, as we were unable to find any distinguishable intermolecular NOEs in the NOESY spectra. Accordingly, it was not possible to determine the dimeric structure because of the lack of contact information. We plan to verify whether nsp7 dimerizes in solution, and if it does, whether dimerization is a natural property of the protein or just an unintended consequence of the high concentration required for NMR studies. We also plan to verify whether the dimerization of nsp7 has any effect on its conformation.
Methods
Sample Preparation and NMR Assignments
The 13C, 15N-labeled SARS-CoV-2 nsp7 sample was expressed and purified by the same methods as previously described in our NMR assignment paper.20 The recombinant Escherichia coli Rossetta2 pLysS was first grown in a M9 minimal medium containing 1 g/L 15N4HCl (Cambridge Isotope Laboratories), 4 g/L 13C6-D-Glucose (Cambridge Isotope Laboratories) and 100 mg/L ampicillin at 37 °C. 0.5 mM of IPTG was added when OD600nm reached 0.8. The protein was expressed at 16 °C for 14-16 hours. After resuspending a cell pellet in buffer A (10 mM HEPES, pH 7.4, 30 mM imidazole and 2mM DTT), cells were lysed using a microfluidizer at 20,000 psi. The supernatant was filtered using a 0.45 μm vacuum filter before loading to a Ni-NTA agarose column (Qiagen). The protein was eluted with buffer B (buffer A with 300mM imidazole) and cleaved with 1% w/w TEV protease overnight at room temperature while dialyzing against 10 mM MOPS buffer (pH 7.0), containing 150 mM NaCl and 2 mM DTT. TEV protease and tag were removed via a second IMAC purification. The protein was further purified with the Superdex200 (GE Life Science) using the MOPS buffer used in the dialysis. The final NMR sample was prepared as 1.7 mM nsp7 containing 0.025 % NaN3 and 7 % D2O.
All NMR experiments were performed by 600 MHz Varian VNMRS and 600 MHz Bruker Avance III at 298 K. NMR data were acquired and processed using NMRPipe30 with SMILE31 for NUS reconstruction as described in the previous paper.20 Using pre-deposited chemical shift assignments (BMRB ID 50337),20 NOE data – NOESY 1H-15N HSQC, aliphatic NOESY 1H-13C HSQC, and aromatic NOESY 1H-13C HSQC – were assigned manually after automated NOE assignments with AUDANA algorithm32 in PONDEROSA-C/S software package33. NMRFAM-SPARKY34 in the virtual server of NMRbox35 was used for the assignments. The 3D NOESY spectra were aligned based on the chemical shift assignments using the Align Spectrum tool (the two-letter code al).
Structure Determination
The structure of SARS-CoV-2 nsp7 at pH 7.0 was calculated using PONDEROSA-C/S software package.33 At first, the initial crude structure was obtained after iterative cycles of AUDANA/AUDASA automation. Distance and hydrogen bond constraints were automatically generated by AUDANA algorithm using the protein sequence, chemical shift assignments, and NOE data.32 Before submitting calculations, all unused resonances were removed by the two-letter code dr, and nomenclature and resonance deviations of chemical shift assignments were thoroughly verified using the Check Protein Nomenclature tool (ca) and the Resonance Deviation Check tool (ck) in NMRFAM-SPARKY, respectively. Chemical shifts assignments were also used to generate backbone dihedral angle constraints by TALOS-N.36 After every calculation, distance constraints were manually validated using the Distance Constraint Validator tool in Poky Analyzer,37 along with the manual NOE assignments. During the validations, all spectrum data and 3D atomic coordinates were visually examined. This was repeated until the number of long-range NOEs reached about 5 times of the number of residues.
The initial crude structure was refined by Constraints-X only option that uses simulated annealing and molecular dynamics protocols in Xplor-NIH.38 Similar to the previous step, distance and angle constraints were manually validated with each validator tool in Poky Analyzer. The structure went through several refinements until its 20 lowest models showed no mean NOE violations larger than 0.5 Å nor mean angle violations greater than 5°. The final structure of SARS-CoV-2 nsp7 was determined by Final step with explicit H2O (x2) option. Among 200 calculated structures, the 20 lowest energy models were selected after the molecular dynamics simulation of water solvation.33 The quality of the final structure was evaluated with Poky Analyzer and PSVS 1.5 software suite39 (Table 1). The Ramachandran plot analysis from PROCHECK19 revealed that 95.4 % of residues are in the most favored regions, followed by 3.0 % in additionally allowed regions, 1.6 % in generously allowed regions and 0.0 % in disallowed regions.
Spin-relaxation Experiment
All spin-relaxation data were acquired on an Avance III Bruker spectrometer operating at 600MHz (1H) and equipped with a triple resonance cryogenic probe. The temperature of the sample was regulated at 298 K throughout the experiments. T1 (R1) experiment data were recorded as a series of interleaved 2D 1H-15N HSQC spectra using 12 relaxation delays, 80, 160, 240, 320, 400, 480, 560, 640, 800, 960, 1200 and 1520 ms. Similarly, T2 (R2) experiment data were acquired with 13 relaxation delays, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140 and 160 ms. The recycle delay was set to 2.0sec for both T1 and T2 experiments. The hetNOE experiment data were obtained as two 1H-15N HSQC spectra from interleaving pulse sequences with (NOE) and without (no NOE) proton saturation. A 3s recycle delay was used for the no NOE experiment. This delay was replaced by a 3s proton saturation period in the NOE experiment. For all experiments, the offset and the spectral width were 4.76 and 16.7 ppm for the t2 dimension and 116.8 and 29.9 ppm for the t1 dimension, respectively.
Poky Notepad in the Poky software suite 37 was used to run scripts for calculating T1, R1, T2, R2, τc, MW, and hetNOE. The decay curves of peak heights over relaxation delays were fitted using scipy.optimize.curve_fit 40, and the same fitting function was used to calculate T1 and T2 times, I(t) = I(0) Exp[(−1/T1,2) t], where I(t) is the peak intensity at a relaxation delay time (t). Similarly, R1 and R2 rates were calculated using the fitting function, I(t) = I(0) Exp[(-R1,2/1000) t]. The τc of SARS-CoV-2 nsp7 in the NMR sample was calculated as a function of the ratio of the average values of T1 and T2. 28,29 To minimize contributions from unfolded segments, only the peaks with 1H chemical shift higher than 8.5 ppm were considered for the τc calculation. The MW of the protein was approximated using the standard curve of τc versus MW created with empirical data from NESG28. Finally, the hetNOEs were calculated as the ratio of peak heights in the NOE spectrum with proton saturation and the reference spectrum without proton saturation. The errors of each hetNOE were estimated as [(σNOE/INOE)2 +(σRef/IRef)2]1/2, where σNOE and σRef represent the root-mean-square noises of background regions in each spectrum, and INOE and IRef are the intensities of a given peak. 41
Author contributions statement
Y.L. structure, manuscript, analysis
M.T. NMR, manuscript
M.R. computational work
T.K.A., R.N.K. sample preparation
K.H.W., W.L. design, manuscript, direction
Additional information
Accession codes
Final coordinates of the solution structure of SARS-CoV-2 nsp7 at pH 7.0 are deposited in the PDB under the entry ID 7LHQ (BMRB ID: 50337).
Competing interests
The authors declare no competing interests.
Acknowledgements
This research was funded by National Science Foundation DBI-2051595 and DBI-1902076, and the University of Colorado Denver [to W.L.]. This research was also supported by National Science Foundation EAGER MCB-2031269 [to K.H.W.] and the National Institutes for Health, National Institute for Allergy and Infectious Disease AI123498 [to R.N.K.]. This study utilized the National Magnetic Resonance Facility at Madison supported by NIH grant P41GM103399 (NIGMS), old number: P41RR002301. This study made use of NMRbox: National Center for Biomolecular NMR Data Processing and Analysis, a Biomedical Technology Research Resource (BTRR), which is supported by NIH grant P41GM111135 (NIGMS). Authors acknowledge the useful discussions in COVID19-NMR meetings.
Footnotes
↵† Co-first authors