Structure of SARS-CoV-2 M protein in lipid nanodiscs

SARS-CoV-2 encodes four structural proteins incorporated into virions: spike (S), envelope (E), nucleocapsid (N), and membrane (M). M plays an essential role in viral assembly by organizing other structural proteins through physical interactions and directing them to sites of viral budding. As the most abundant protein in the viral envelope and a target of patient antibodies, M is a compelling target for vaccines and therapeutics. Still, the structure of M and the molecular basis for its role in virion formation are unknown. Here, we present the cryo-EM structure of SARS-CoV-2 M in lipid nanodiscs to 3.5 Å resolution. M forms a 50 kDa homodimer that is structurally related to the SARS-CoV-2 ORF3a viroporin, suggesting a shared ancestral origin. Structural comparisons reveal how intersubunit gaps create a small, enclosed pocket in M and a large open cavity in ORF3a, consistent with a structural role and ion channel activity, respectively. M displays a strikingly electropositive cytosolic surface that may be important for interactions with N, S, and viral RNA. Molecular dynamics simulations show a high degree of structural rigidity in a simple lipid bilayer and support a role for M homodimers in scaffolding viral assembly. Together, these results provide insight into roles for M in coronavirus assembly and structure.


Introduction
Coronaviruses encode four structural proteins that are incorporated into mature enveloped virions: the transmembrane spike (S), membrane (M), and envelope (E) proteins and the soluble nucleocapsid (N) protein 1 . S proteins protrude from the virion, creating the eponymous corona in electron micrographs, and mediate fusion of viral and host cell membranes. E proteins form cationic viroporins that promote viral assembly and modulate the host immune response. N is an RNA-binding protein that packages the viral RNA genome. M organizes the assembly and structure of new virions and is essential for virus formation [2][3][4][5] . M is the most abundant membrane protein in the viral envelope and anti-M antibodies are found in plasma of patients infected with SARS-CoV-2 and other coronaviruses [6][7][8][9][10][11] . Based on its functional importance and immunogenicity, M has been proposed as a target for coronavirus vaccines or therapeutics.
M has further been implicated in modulating host antiviral innate immunity. M inhibits the innate immune response by interfering with MAVS-mediated signaling and interferon production 25,26 . In mouse models of infection, M expression results in lung epithelial cell apoptosis in vitro and in vivo and may contribute to lung injury and pulmonary edema found in severe disease 26 .
Despite its essential role in viral assembly and implication in pathogenesis, the molecular determinants of M function remain largely unknown. MHV M was proposed to adopt long and compact structures that differentially facilitate membrane bending and recruitment of other structural proteins based on low resolution tomographic analysis 15 . Intriguingly, a structural and evolutionary relationship between SARS-CoV-2 M and the accessory viroporin ORF3a was reported 27 based on predicted homology to our experimental ORF3a structures 28 . The manner in which distinct functional roles for M and ORF3a can be achieved in the context of a shared architecture remains to be determined. Here, we report the cryo-EM structure of SARS-CoV-2 M in lipid nanodiscs and perform molecular dynamics simulations to provide insight into M structure, function, and dynamics.

Results
We determined the structure of SARS-CoV-2 M in lipid nanodiscs. Full-length M was expressed in Spodoptera frugiperda (Sf9) cells with a cleavable C-terminal GFP tag. Gel filtration chromatography of protein extracted in DDM/CHS detergent shows M runs predominantly as a single species consistent with a 50 kDa homodimer. We do not observe evidence of specific higher order oligomerization at low concentrations by fluorescence size exclusion chromatography or at higher concentrations in large scale purifications (Fig. S1). SARS-CoV-2 ORF3a, in contrast, assembles into stable homodimers and homotetramers under similar conditions 28 .
We reconstituted homodimeric SARS-CoV-2 M in nanodiscs made from the scaffold protein MSP1E3D1 and lipids (DOPE:POPC:POPS in a 2:1:1 ratio) and determined its structure by cryo-EM (Figs. 1, S2, Table 1). The majority of M (189 of 222 amino acids per subunit) was de novo modeled in the cryo-EM map (Figs. 1, S2). The N-terminus (amino acids 1-16) and C-terminus (amino acids 205-222) are not resolved in the map and were not modeled. Loops connecting transmembrane helices (amino acids 36-42 and 71-78) are the least well resolved regions of the structure. The relatively weak density is consistent with a lack of stabilizing interactions between these and other M regions and likely indicates they adopt a range of conformations among particles used to generate the final map.
electronegative patches towards the cytoplasm. Such uniform electropositivity across the M cytosolic surface could facilitate the close juxtaposition of M present at high concentration in the viral envelope with the negatively charged viral RNA genome.
The large, complementary, and hydrophobic interface between transmembrane and cytosolic regions of subunits in the M structure suggests a structurally rigid core. However, a previous tomographic study of MHV suggesting that M adopts distinct long and compact structures 15 , M's structural homology to the viroporin ORF3a 27,28 , and the dissociation of cytosolic regions shown in predicted SARS-CoV-2 M structures 29 suggest the possibility that M is capable of undergoing large-scale structural rearrangements. Motivated by this discrepancy between the predicted dynamics of M and our experimental findings, we performed molecular dynamics (MD) simulations to gain insight into the potential for conformational changes in M.
We equilibrated M in a lipid environment and ran an all-atom MD simulation for 1.6 µs. Overall, we did not observe substantial conformational rearrangements in M during the simulation (Fig. 5A,B). Superposition of the experimental and final M structure following the simulation shows minor deviations through most of the protein (overall RMSD of 2.5 Å) (Fig. 5A). The largest difference is a shift in TM1 up towards the extracellular/lumenal side by approximately half a helical turn, enabled by rearrangement of the TM1-TM2 linker (Fig. 5A). This relatively subtle movement is consistent with weaker density for the TM1-TM2 linker in the cryo-EM map and fewer packing interactions for TM1 than TM2 or TM3. Per residue deviations ranged from ~ 1-4 Å and, aside from the movement of TM1, were similar between subunits and largest in the TM2-TM3 linker, transmembrane to cytosolic region connection, and loops connecting strands in the cytosolic domain. Minimal structural deviation was observed during the simulation within or between subunits as judged by the number of close Cα contacts, the angle between transmembrane and cytosolic regions, the distance between transmembrane regions, or the distance between cytosolic domains (Figs. 5D-H). Consistent with limited movement of the transmembrane region and a lack of evidence for lipid binding in the cryo-EM structure, no obvious enrichment of specific lipids around M was identified following simulation (Fig. S4). Finally, the internal M pocket remained similar in size and sealed from the surrounding solution throughout the simulation (Fig. 5I,J). We conclude that under these conditions M adopts a largely stable structure with minimal dynamic conformational rearrangement at physiological temperature.

Discussion
The structure of the SARS-CoV-2 M protein that we have obtained by cryo-EM reveals a homodimeric fold that is structurally homologous to the nonselective Ca2+-permeable cation channel of SARS-CoV-2, ORF3a. As with ORF3a, each subunit of M contains three transmembrane helices and a C-terminal beta sandwich domain. However, the structure differs from ORF3a in several key ways that provide insight into how these structurally similar proteins can fill drastically different apparent roles in the coronavirus life cycle.
When viewed from the plane of the membrane, M is considerably wider and flatter than ORF3a, due to differences in transmembrane helix packing and a rotation about the central axis of the cytosolic domain. One consequence of this flattening is a distinct dimer interface across the membrane: M shows a tighter dimer interface closer to the membrane outer leaflet as well as a gap between cytosolic domain subunits that forms an enclosed pocket lined by polar residues. In ORF3a, transmembrane regions are less closely apposed and a gap between subunits extends from halfway across the membrane to halfway down the cytosolic domains. The result is a larger cavity that is open to the membrane and cytoplasm. Mutations in the ORF3a cavity alter ion channel activity, consistent with the cavity forming part of the conduction path. Tight subunit association may therefore be important for the structural role of M, while loose subunit association that creates a large open cavity may be essential for the viroporin activity of ORF3a.
In further contrast to ORF3a, which was seen to form stable tetramers through electrostatic interactions between neighboring dimers, we see no evidence that M forms higher order oligomers under similar experimental conditions. Surface characteristics of the M dimer lend credence to the possibility that M exists solely as a dimer in the membrane: one striking feature of the M C-terminal beta sandwich domain is the presence of three sizable patches of positive charge that dominate its solvent-exposed surface. Molecular dynamics simulations of M show that the dimer is stable and does not readily adopt alternate conformations at physiological temperature over the 1.6 µs trajectory. Taken together, these data suggest a purely structural role for M, whereby M mediates morphological changes in host cell membranes not through forming networks of M dimer-dimer interactions or through large-scale conformational changes, but rather through interactions with other SARS-CoV-2 structural proteins and perhaps negatively charged lipid headgroups or viral RNA.
M has also been shown to play a crucial role in viral assembly through protein-protein interactions with other coronavirus structural proteins such as N and S. Spike proteins are incorporated into coronavirus virions via interactions between the cytosolic tail of S and the cytosolic domain of M; however, the precise details of this interaction are unknown 19 . In SARS-CoV-2, M and N or S are the minimal components required for forming VLPs when expressed heterologously in cells 23,24 . Several recent studies have suggested that the C-terminal domain (CTD) of N is the site of interaction between SARS-CoV-2 M and N, but, as with S, a precise binding site has not been established 32,33 . It is possible that M and N interactions are mediated by favorable electrostatic interactions between negatively charged residues of the N CTD and one or more of the basic patches identified on the surface of the cytosolic domain of M. Given the sheer abundance of M dimers in the membrane of SARS-CoV-2 virions, M and N together might facilitate VLP formation via a mechanism similar to that of the Gag precursor of HIV, in which the high concentration of M C-terminal domains at the cytoplasmic membrane surface recruits and organizes many N proteins that together physically extrude a membranous bud.
At present the World Health Organization puts the confirmed number of COVID-19 cases worldwide at nearly 530 million. Over the last two years the SARS-CoV-2 virus has undergone many mutations that have been extensively documented through sequencing efforts worldwide 34 . Despite this, the M protein sequence has remained virtually unchanged, a testament to the critical role that M plays in viral replication and assembly 35 . Furthermore, while only 20 amino acids in length, the N-terminus of M has been found to be highly immunogenic in COVID-19 patients [8][9][10][11] . M has also been shown to modulate innate immune response and could contribute to lung injury often seen in severe cases 25,26 . Given its clear importance in the coronavirus life cycle and pathogenicity, M presents an attractive target for therapeutics or vaccines. While M is well conserved across Coronaviridae (Fig. S5), it shows particularly high conservation between SARS-CoV-1 and SARS-CoV-2, with a sequence similarity of 90.54%, highlighting its potential as a therapeutic target for emergent coronaviruses in the future.

(a,b) Views of the wide and narrow faces of (a) M and (b) ORF3a colored according to electrostatic surface potential from red (electronegative, -10 kBT/ec) to blue (electropositive, +10 kBT/ec). (c) Views of three electropositive surface patches on M cytosolic domains with basic residues labeled and shown as sticks with blue nitrogen atoms.

(a) Overlay of M cryo-EM structure (colored in pink and blue) and final structure (in white) following 1.6 µs all atom molecular dynamics simulation. (b) Overall RMSD between simulated and initial structure during simulation. (c) Root mean square fluctuation of protein residues in the simulation. Orange and yellow colors correspond to individual M protein chains. (d) Number of C-alpha contacts between two monomers. (e) Structural representation of distances and angles used for calculations in (f-h). (f) A188-R107-I73 angle plot for each monomer. One monomer has slightly higher values than the other. (g) Center of mass distance between I73 residues at the top of the TM2-TM3 linker. (h) Center of mass distance between A188 residues at the base of the cytosolic domains. (i) Mean radius of the enclosed pocket in M over the simulation trajectory versus distance along the symmetry axis. At its widest positions, the pocket is wide enough to accommodate two water molecules. (j) Minimum hole radius vs. the frame number in the simulation. The lack of substantial changes in radius indicates a stable pocket size and shape that does not open to solution during the simulation.

to dark blue (highly conserved). Accession numbers are indicated. Sequences were selected from representative species from each Coronavirus subgenus. Secondary structure from SARS-CoV-2 M is drawn above the alignment.

Cloning and protein expression
The coding sequence for SARS-CoV-2 M protein (Uniprot P0DTC5) was synthesized (IDT, Newark, NJ) and cloned into a vector based on the pACEBAC1 backbone (MultiBac; Geneva Biotech, Geneva, Switzerland) with an added C-terminal PreScission protease (PPX) cleavage site, linker sequence, superfolder GFP (sfGFP) and 7xHis tag, generating a construct for expression of M-SNS-LEVLFQGP-SRGGSGAAAGSGSGS-sfGFP-GSS-7xHis 37 . MultiBac cells were used to generate a bacmid according to the manufacturer's instructions. Sf9 cells were cultured in ESF 921 medium (Expression Systems, Davis, CA) and P1 virus was generated from cells transfected with Escort IV reagent (MilliporeSigma, Burlington, MA) according to the manufacturer's instructions. P2 virus was then generated by infecting cells at 2 million cells/mL with P1 virus at an MOI of ~0.1, with infection monitored by fluorescence and virus harvested at 72 hours. P3 virus was generated in a similar manner to expand the viral stock. The P2 or P3 viral stock was then used to infect Sf9 cells at 4 million cells/mL at an MOI of ~2-5. At 72 hours, infected cells containing expressed M-sfGFP protein were harvested by centrifugation at 2500 x g for 10 minutes and frozen at -80°C.

Protein purification
Infected Sf9 cells from 1 L of culture (~15 mL of cell pellet) were thawed in 100 mL of lysis buffer containing 50 mM HEPES, 150 mM KCl, 1 mM EDTA, pH 8. Protease inhibitors (final concentrations: E64 (1 µM), pepstatin A (1 µg/mL), soy trypsin inhibitor (10 µg/mL), benzamidine (1 mM), aprotinin (1 µg/mL), leupeptin (1 µg/mL), AEBSF (1 mM), and PMSF (1 mM)) were added to the lysis buffer immediately before use. Benzonase (4 µL) was added after the cell pellet thawed. Cells were then lysed by sonication and centrifuged at 150,000 x g for 45 minutes. The supernatant was discarded, and residual nucleic acid was removed from the top of the membrane pellet using DPBS. Membrane pellets were scooped into a dounce homogenizer containing extraction buffer (50 mM HEPES, 150 mM KCl, 1 mM EDTA, 1% n-Dodecyl-β-D-Maltopyranoside (DDM, Anatrace, Maumee, OH), 0.2% cholesteryl hemisuccinate Tris salt (CHS, Anatrace, Maumee, OH), pH 8). A stock solution of 10% DDM, 2% CHS was dissolved and clarified by bath sonication in 200 mM HEPES pH 8 prior to addition to the buffer at the indicated final concentration. Membrane pellets were then homogenized in extraction buffer and this mixture (150 mL final volume) was gently stirred at 4°C for 1.5 hours. The extraction mixture was centrifuged at 33,000 x g for 45 minutes and the supernatant, containing solubilized membrane protein, was bound to 4 mL of Sepharose resin coupled to anti-GFP nanobody for 1.5 hours at 4°C. The resin was then collected in a column and washed with 10 mL of buffer 1 (20 mM HEPES, 150 mM KCl, 1 mM EDTA, 0.025% DDM, 0.005% CHS, pH 7.4), 40 mL of buffer 2 (20 mM HEPES, 500 mM KCl, 1 mM EDTA, 0.025% DDM, 0.005% CHS, pH 7.4), and 10 mL of buffer 1. The resin was then resuspended in 6 mL of buffer 1 with 0.5 mg of PPX protease and rocked gently in the capped column for 2 hours.
Cleaved M protein was then eluted with an additional 12 mL of wash buffer, spin concentrated to ~1 mL with an Amicon Ultra spin concentrator with a 10 kDa cutoff (Millipore), and loaded onto a Superose 6 Increase column (GE Healthcare, Chicago, IL) on an NGC system (Bio-Rad, Hercules, CA) equilibrated in buffer 1. Peak fractions containing M protein were then collected and spin concentrated prior to incorporation into nanodiscs.
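The detergent stock dilution described above is a simple C1·V1 = C2·V2 calculation. The following is a minimal sketch of that arithmetic using only the concentrations and volume stated in the text; the function name `stock_volume_ml` is hypothetical and is not part of the original protocol.

```python
def stock_volume_ml(final_conc, stock_conc, final_volume_ml):
    """Volume of stock (mL) needed to reach final_conc in final_volume_ml.

    Uses C1 * V1 = C2 * V2; both concentrations must share the same units.
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * final_volume_ml / stock_conc


# Values taken from the text: 10% DDM / 2% CHS stock in 200 mM HEPES,
# diluted to 1% DDM / 0.2% CHS in a 150 mL extraction volume.
EXTRACTION_VOLUME_ML = 150.0
ddm_stock_ml = stock_volume_ml(1.0, 10.0, EXTRACTION_VOLUME_ML)
chs_stock_ml = stock_volume_ml(0.2, 2.0, EXTRACTION_VOLUME_ML)

# HEPES contributed by the detergent stock alone (mM), by the same dilution.
hepes_from_stock_mm = 200.0 * ddm_stock_ml / EXTRACTION_VOLUME_ML

print(f"DDM/CHS stock to add: {ddm_stock_ml:.1f} mL")
print(f"HEPES contributed by the stock: {hepes_from_stock_mm:.1f} mM")
```

Because DDM and CHS are both at a tenfold excess in the combined stock, a single addition satisfies both final concentrations, and the HEPES carried over with the stock accounts for part of the 50 mM in the extraction buffer.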

Nanodisc Formation
Freshly purified M protein in buffer 1 was reconstituted into MSP1E3D1 nanodiscs with a mixture of lipids (DOPE:POPS:POPC at a 2:1:1 mass ratio, Avanti, Alabaster, Alabama) at a final molar ratio of 1:4:400 (M:MSP1E3D1:lipid). 20 mM solubilized lipid in lipid dilution buffer (20 mM HEPES, 150 mM KCl, pH 7.4) was mixed with additional DDM/CHS detergent and M protein at 4°C for 30 minutes before addition of purified MSP1E3D1. This addition brought the final concentrations to approximately 10 µM M protein, 40 µM MSP1E3D1, 4 mM lipid mix, 10 mM DDM, and 1.7 mM CHS. The solution with MSP1E3D1 was mixed at 4°C for 15 minutes before addition of 150 mg of Biobeads SM2. Biobeads (washed into methanol, water, and then Nanodisc Formation Buffer) were weighed after liquid was removed by pipetting (damp weight). This final mixture was then gently tumbled at 4°C overnight (~12 hours). Supernatant was cleared of beads by letting large beads settle and carefully removing liquid with a pipette. The sample was spun for 10 minutes at 21,000 x g before loading onto a Superose 6 Increase column in 20 mM HEPES, 150 mM KCl, pH 7.4. Peak fractions corresponding to M protein in MSP1E3D1 were collected, spin concentrated with a 10 kDa cutoff concentrator, and used for grid preparation. MSP1E3D1 was prepared as previously described 38 without cleavage of the His-tag.

maintained with a Nose-Hoover thermostat 52,53 and a coupling time constant of 1.0 ps in GROMACS. The pressure was set at 1 bar with a Berendsen barostat 54 during initial relaxation. For the production runs, the Parrinello-Rahman barostat was used semi-isotropically with a compressibility of 4.5 x 10^-5 and a coupling time constant of 5.0 ps 55,56 . For the non-bonded interactions, a switching function between 1.0 and 1.2 nm was used. The long-range electrostatics were computed using Particle Mesh Ewald (PME) 57 . The LINCS algorithm was used to constrain bonds involving hydrogen atoms 58 .
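As a quick arithmetic check on the reconstitution mixture described under Nanodisc Formation, the stated final concentrations can be converted to molar ratios. This is a minimal sketch, not part of the original protocol: the helper `molar_ratio` is hypothetical, and the lipids-per-disc figure assumes two MSP1E3D1 copies per nanodisc and complete lipid incorporation, neither of which is stated in the text.

```python
def molar_ratio(concentrations_um, reference):
    """Express every concentration relative to the reference component."""
    ref = concentrations_um[reference]
    if ref <= 0:
        raise ValueError("reference concentration must be positive")
    return {name: conc / ref for name, conc in concentrations_um.items()}


# Approximate final concentrations from the text, all converted to µM.
final_um = {
    "M": 10.0,
    "MSP1E3D1": 40.0,
    "lipid": 4000.0,   # 4 mM lipid mix
    "DDM": 10000.0,    # 10 mM
}

ratio = molar_ratio(final_um, "M")

# ASSUMPTION: two scaffold proteins per disc and all lipid incorporated.
MSP_PER_DISC = 2
nominal_lipids_per_disc = final_um["lipid"] / final_um["MSP1E3D1"] * MSP_PER_DISC

detergent_to_lipid = final_um["DDM"] / final_um["lipid"]

print(f"M:MSP1E3D1:lipid = 1:{ratio['MSP1E3D1']:g}:{ratio['lipid']:g}")
print(f"Nominal lipids per disc (assumed 2 MSP/disc): {nominal_lipids_per_disc:g}")
print(f"DDM:lipid molar ratio: {detergent_to_lipid:g}")
```

The computed ratio reproduces the 1:4:400 (M:MSP1E3D1:lipid) stoichiometry given in the text.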
We performed a 1.6 µs production run for the system on Frontera (TACC) and Midway2 (Research Computing Center at the University of Chicago).
The RMSD of the protein and RMSF per residue (Fig. 5B, C) were calculated using GROMACS analysis tools. The center-of-mass (COM) distances between two residues (Fig. 5G, H), number of Cα contacts between two monomers (Fig. 5D), and angles between transmembrane and cytosolic regions (Fig. 5F) were also calculated using the GROMACS package 49 . The analysis of the M pocket was performed using the HOLE program 59 implemented in MDAnalysis 60 (Fig. 5I, J). In Fig. 5J, frames were sampled every 4 ns. The lipid distribution around the M protein was calculated using the MDAnalysis Python package 60 (Fig. S4). Visual Molecular Dynamics (VMD) and PyMOL were used as visualization software.
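To make the geometric quantities above concrete, the following is a minimal, dependency-free sketch of how each could be computed for a single frame. It is not the GROMACS implementation used in this study. All function names are hypothetical, the 8 Å contact cutoff is an assumed placeholder (the cutoff is not stated in the text), and the RMSD helper assumes the two coordinate sets have already been superposed.

```python
import math


def center_of_mass(coords, masses=None):
    """Mass-weighted mean position; equal masses if none are given."""
    masses = masses or [1.0] * len(coords)
    total = sum(masses)
    return tuple(sum(m * c[i] for m, c in zip(masses, coords)) / total
                 for i in range(3))


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex (e.g. A188-R107-I73)."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    cos_t = sum(x * y for x, y in zip(ba, bc)) / (math.hypot(*ba) * math.hypot(*bc))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def count_contacts(chain_a, chain_b, cutoff=8.0):
    """Number of inter-chain atom pairs closer than cutoff (assumed value, in Å)."""
    return sum(1 for p in chain_a for q in chain_b if math.dist(p, q) < cutoff)


def rmsd(coords_x, coords_y):
    """RMSD between two equally sized, already superposed coordinate sets."""
    if len(coords_x) != len(coords_y):
        raise ValueError("coordinate sets must have the same length")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(coords_x, coords_y))
    return math.sqrt(sq / len(coords_x))


# Toy coordinates in Å, for illustration only (not taken from the M structure).
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 5.0), (10.0, 0.0, 20.0)]

com_distance = math.dist(center_of_mass(chain_a), center_of_mass(chain_b))
hinge_angle = angle_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
n_contacts = count_contacts(chain_a, chain_b)
shift_rmsd = rmsd(chain_a, [(0.0, 0.0, 2.0), (10.0, 0.0, 2.0)])
```

Applying these functions to every frame of a trajectory would give time series analogous to those plotted in Fig. 5B and 5D-H. A production analysis would also need a least-squares superposition step before the RMSD calculation.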

Data and reagent availability
All data and reagents associated with this study are publicly available. The final model is in the PDB under 8CTK, the final map is in the EMDB under EMD-26993, and micrographs (original and motion corrected) and final particle stack are deposited in EMPIAR.