Abstract
The HIV-1 protein Nef suppresses multiple immune surveillance mechanisms to promote viral pathogenesis1. Individuals infected with HIV-1 encoding defective nef genes do not develop AIDS for decades2,3. A key target of Nef is the cellular receptor CD4. Although essential for viral entry into host cells, CD4 is problematic for the virus later in its replication cycle: CD4 disrupts processing of the viral glycoprotein, Env, inhibiting infectivity4; it interferes with the release of new virions5,6; and it causes vulnerability to superinfection, causing premature cell death and limiting viral productivity7. Furthermore, binding of CD4 to Env exposes otherwise-concealed Env epitopes, rendering infected cells more susceptible to antibody-dependent cellular cytotoxicity and virus particles more susceptible to neutralizing antibodies8-10. HIV-1 has evolved strategies to mitigate these problems. Newly synthesized CD4 is targeted in the endoplasmic reticulum by the viral Vpu protein for proteasomal degradation11. Surface-expressed CD4, in contrast, is targeted by Nef for endocytosis and lysosomal degradation12-15. Nef’s effect on CD4 involves hijacking of clathrin adaptor complex 2 (AP2)-dependent endocytosis16,17. Although how Nef associates with a part of the tetrameric AP2 is understood18, a complete understanding of the interaction, especially how CD4 is sequestered by Nef into a complex with AP2, has remained elusive. Here, we present a high-resolution crystal structure that describes the underlying mechanism. An intricate combination of conformational changes occurs in both Nef and AP2 to enable CD4 binding and downregulation. Strikingly, a pocket on Nef previously identified as crucial for recruiting class I MHC is also responsible for recruiting CD4, revealing a potential approach to inhibit two of Nef’s activities and sensitize the virus to immune clearance
To pursue the structure, we assembled the protein complex in vitro. We fused the cytoplasmic domain of CD4 (CD4CD) to the C-terminus of Nef via a 36 amino acid-long, flexible linker (Fig. 1a). Instead of using the full length CD4 tail (394-433), we included only residues 394 to 419, containing all the CD4 determinants reportedly required for Nef-mediated downregulation13,19,20. Since Nef residues within the N-terminal amphipathic helix are dispensable for CD4 downregulation21, we truncated 25 amino acids from the Nef N-terminus. We engineered the tetrameric AP2 complex by removing the mobile C-terminal domain of the μ2 subunit (136-423), enabling AP2 to adopt an open conformation in which its cargo binding sites are accessible. Binding between the Nef-CD4CD fusion protein and the modified AP2 construct (AP2Δμ2-CTD) was confirmed in a GST pulldown assay (Fig. 1b).
We then solved the crystal structure of the Nef-CD4CD fusion and AP2Δμ2-CTD complex to a resolution of 3.0Å (Fig. 1c, Extended Data Table 1). All polypeptides are largely resolved except for the flexible linker between Nef and CD4CD, which is disordered as expected, and part of the N-terminal region of β2. As revealed by the structure, Nef functions as a “connector” between AP2 and CD4CD; CD4CD binds Nef but none of the subunits of AP2. As previously reported, Nef’s C-terminal loop interacts with AP2 in part via mimicry of the acidic dileucine motifs in cellular proteins that are recognized by the α-σ2 subunits (Figure 1d)18. The extensive interface here, involving several charge-charge and hydrophobic interactions, is the foundation of the Nef-AP2 association.
CD4 is recruited to a pocket on Nef that is opposite the C-terminal loop. The association is mainly hydrophobic and involves three CD4 residues: Ile410, Leu413, and Leu414 (Fig. 2a). The dileucine motif of CD4 – Leu413/414 – is within a short helix. Leu414 and Ile410 dip into a hydrophobic pocket of Nef formed by Phe121, Trp124, Met79, Thr138, and Pro78 (Fig. 2b). Leu413 of CD4 sits just outside of that pocket and is accommodated by Nef residues Phe121, Leu37, Asn52, and Cys55 (Fig. 2b). Nef residue Asp123 contributes to CD4 binding by hydrogen-bonding with the backbone nitrogen of CD4 Leu413 and supports the helix-turn of CD4 (Fig. 2b). These structural findings explain previously observed roles of CD4 residues – the dileucine motif and Ile41013,20,22 – as well as Nef residues F121, D123, and Trp12423,24. To further confirm the structural model, we tested the mutations F121D, D123R, M79D, and T138D for their effects on CD4 surface-downregulation. The Nef mutants F121D and D123R were expressed similarly to the wild type but were unable to downregulate CD4 (Fig. 2ef). The M79D and T138D Nef mutants were relatively poorly expressed, but dose-response experiments suggested that this was insufficient to explain their functional defects (Fig. 2ef and Extended Data Fig. 1).
CD4 recruitment is secured by the Nef N-terminal loop, which is highly ordered in the structure (Fig. 2c and Extended Data Fig. 2). The part of the loop immediately connected to the rigid core of Nef, residues Phe68 to Pro75, wraps around the core. N-terminally, the loop then takes a sharp “U-turn”, placing a short helix at the interface between the Nef core and the α subunit of AP2. Two residues within this helix, Trp57 and Leu58, fit into a hydrophobic pocket formed mainly by Nef residues: Leu112, Pro122, Phe121, and Tyr115 (Fig. 2d). Lys298 of α, with its head group stabilized by Nef Asp111 through charge-charge interactions, also contacts Nef Trp57 via its hydrocarbon chain (Fig. 2d). The Nef acidic cluster (Glu62-65), located in the “U-turn” region, does not contribute to CD4-binding, consistent with previous functional observations16,25,26. The Nef N-terminal loop extends to contact the CD4 tail (Fig. 2c). This part of Nef, residues Cys55 to Val33, forms a wall-like structure to support CD4-binding. Here, contacts are made by several Nef residues with the short helical turn of CD4 and flanking residues. Overall, this conformation of the Nef N-terminal loop explains the crucial roles of Trp57, Leu58, Leu112, Tyr115, and Pro122 in CD4-downregulation16,17,21,23,27 (Fig. 2ef) as well as the cooperativity in the three-way binding between CD4CD, Nef and the α/σ2 hemicomplex of AP228. Nef residues 41-46 are disordered with most of their electron density missing (Fig. 2c, dotted line). The density used for modeling Nef 34-40 is significant, but isolated and less well-defined, yielding a degree of uncertainty in the structural assignment for this region (Extended Data Fig. 2). We challenged the assignment by mutating Leu37 to Asp, as the model predicts that Leu37 directly contacts CD4 Leu413 (Fig. 2b). This mutation abolished CD4 downregulation (Extended Data Fig. 1), supporting the structural assignment and the role of Leu37 in CD4 downregulation.
Nef binding causes a large conformational change and severe destabilization in the β2-μ2 half of AP2. Overlaying the α and σ2 subunits of the “open” AP2 structure29 on the current structure reveals that the β2 and μ2 N-terminal domains (μ2-NTD) of AP2 in the current structure “move” out as one rigid body, causing a greater “opening” of the tetramer (Fig. 3a). In addition, a large portion of the β2 N-terminus, residues 1-88 encompassing the first four helices, is displaced from the rest of β2. If β2 had maintained its original fold, then its N-terminus would clash with the Nef core (Extended Data Fig. 3); the observed displacement is likely necessary to accommodate Nef-binding. While most of this displaced portion of β2 becomes disordered, the first helix preserves its helical structure and binds to the Nef core through hydrophobic interactions (Fig. 3bc).
The Nef-induced destabilization and structural changes of the β2-μ2 half of AP2 were further characterized by chemical cross-linking mass spectrometry (XL-MS) and integrative structure modeling (Extended Data Fig. 4). Here, disuccinimidyl sulfoxide (DSSO), a MS-cleavable, bifunctional amine-reactive small molecule, was used to cross-link proximal Lys residues or N-termini of the Nef-CD4CD fusion and AP2Δμ2-CTD complex. Cross-linked proteins separated by SDS-PAGE were trypsin digested and resulting peptides analyzed by specialized LC-MS3 experiments for identification of cross-linked residues (Extended Data Fig. 4a). Application of this pipeline to the complex identified intra- and inter-linked peptides corresponding to 90 unique cross-linked residues (Extended Data Table 2, Extended Data Fig. 4b). Lys residues from the displaced β2 N-terminal domain (1-88) are involved in a total of 24 cross-links (Extended Data Table 2, bold entries). Importantly, most of these cross-links are made by Lys residues flanking the first helix (i.e. Lys5, 11, 12, 26, and 27), consistent with this helix binding specifically in the complex. In contrast, Lys residues from the other dislocated β2 N-terminal helices (i.e. Lys29, 31, 35, 36, 45, 66, 67, 78) are rarely observed, consistent with a lack of fixed residence within the complex. The XL-MS data, which captures structural information from a conformationally heterogeneous population of protein complexes as they exist in solution, the crystal structure, and other structural information (methods) were used for integrative modeling30 to produce a model ensemble which describes the complex in full (Extended Data Fig. 4c). Segments that are disordered in the crystal structure are modeled and represented as helices or flexible strings of beads (Extended Data Fig. 4c). The model ensemble agrees satisfactorily with the observed cross-links (89% of cross-links are satisfied, Extended Data Fig. 4d). Furthermore, it indicates large variability in the positions and orientations of the helices in the partially unfolded β2 segment (Extended Data Fig. 4e-g), consistent with the structural heterogeneity of this region indicated by the crystallographic data.
Binding of the first helix of β2 to the Nef core in the current structure closely resembles how the N-terminal amphipathic helix of Nef (deleted from the construct used herein) binds the same location when Nef hijacks AP1 for MHC-I downregulation (Fig. 3d)31. That interaction, mediated by Nef Trp13 and Met20, is critical for MHC-I downregulation but is dispensable for CD4 downregulation21,31,32. Our current structure indicates that intramolecular association of this Nef helix with the Nef core would force the Nef N-terminal loop to deviate from the conformation associated with CD4-binding (Extended Data Fig. 5). We suspect that this is an intricate plot of the virus: Nef binding forces the N-terminal helices of β2 to destabilize and unfold. By providing a binding-site for the first β2 helix on its core, Nef provides partial compensation for this destabilization. Moreover, the N-terminal β2 helix is now used to compete the Nef amphipathic helix off the core, freeing the N-terminal loop of Nef to adopt the conformation shown in Figure 2c for CD4 downregulation. A potential caveat to this hypothesis is that mutations of Nef residues F90, L97, L100, I109, W113, and I114, which line the binding site on the Nef-core for the β2 helix, have minimal effects on the downregulation of CD4 (Fig. 2, Extended Data Fig. 1 and 6). However, since these mutations likely affect the binding of the β2 helix and the Nef N-terminal amphipathic helix similarly, they might have minimal influence on the competition between these two helices for the Nef core in vivo, consistent with their lack of effects on CD4 downregulation. In contrast, and consistent with the model above, mutations of this site affect the downregulation of class I MHC (Extended Data Fig. 7); alanine substitution of L100 together with I109 is particularly informative: this mutation yields a well-expressed Nef that cannot downregulate class I MHC yet downregulates CD4.
Comparing the current structure with our earlier structure of Nef in complex with the μ1-CTD and the MHC-I cytoplasmic domain reveals versatility and specificity in how the structurally homologous AP1 and AP2 are selectively co-opted to downregulate MHC-I and CD4 (Fig. 4a). For MHC-I downregulation, Nef interacts solely with the μ1 subunit and exploits the conserved Tyr-based motif-binding site on AP131. For CD4 downregulation, Nef exploits the acidic dileucine-binding site on the α and σ2 subunits of AP2 and contacts all subunits except μ2. Our models show that by allowing the N-terminal Nef helix and the N-terminal β2 helix to bind the same pocket on the core, Nef creates a “molecular switch” that links its use of different AP complexes with the modulation of different targets: binding to AP2 frees the N-terminus of Nef from the core to recruit CD4, whereas binding to AP1 leaves the core free to bind the N-terminus of Nef and thereby facilitates the recruitment of MHC-I.
Despite these distinct modes of binding, the cytoplasmic domains of MHC-I and CD4 share much of the same binding “pocket” on Nef (Fig. 4b). Small molecules that bind this conserved site (Extended Data Fig. 8) might block both MHC-I and CD4 surface downregulation, potentially revitalizing immune mechanisms to combat and even clear HIV-1.
METHODS
Materials
Gene of rat α adaptin is a kind gift from Dr. Juan Bonifacino (NIH). Genes of β2, μ2, and σ2 adaptins were amplified from a cDNA library of human HEK293T cells. For cross-linking mass spectrometry analysis, anhydrous dimethyl sulfoxide (DMSO), disuccinimidyl sulfoxide (DSSO), MS-grade trypsin, HPLC-grade water, formic acid, and acetonitrile were all purchased from Thermo Fisher Scientific. 4-20% TGX SDS-PAGE gels were purchased from Bio-Rad. MS-safe AcquaStain was purchased from Bulldog Bio.
Cloning, expression, and purification of proteins
The Nef-CD4CD fusion was constructed by fusing CD4 (394-419) to the C-terminus of HIV-1 Nef (26-206, NL4.3) via a flexible linker of 36 amino acids (GVDGSDEASELACPTPKEDGLAQQQTQLNLRGSGSG). The encoding gene was cloned into a pMAT9 expression vector. The fusion protein was over-expressed in E. coli NiCo21(DE3) cells, carrying a N-terminal maltose binding protein (MBP) tag. Cells were induced with 0.1 mM isopropyl β-d-thiogalactopyranoside (IPTG) at OD600 of 0.8 and grown at 16°C overnight. Cells were lysed using sonication. Expressed protein was first purified using the MBP affinity column. For GST pull-down experiments, the MBP-tagged protein was further purified by a HiTrap Q anion exchange column, followed by a final Superdex 200 size exclusion column. For crystallization, the MBP fusion tag was cleaved off by the SARS-CoV Mpro protease33,34 after the affinity purification. The tagless protein was similarly purified using HiTrap Q anioin exchange column and the Superdex 200 size exclusion column.
For AP2Δμ2-CTD, genes for each of the four subunits were cloned into two duet vectors: a pETDuet vector with human β2(1-591) and human μ2(1-135) at each of the multiple cloning sites, respectively; a pCDFDuet vector with rat α(1-621) carrying a C-terminal GST tag and human σ2(1-142). The heterotetrameric AP2Δμ2-CTD core was expressed overnight at 22°C in the NiCo21(DE3) cells in Terrific broth after induction with IPTG. For binding assays, the GST-tagged the AP2Δμ2-CTD complex was purified by a sequence of Ni-NTA gravity column, GST affinity column, and Superdex 200 size exclusion column. For crystallization, the GST tag was cleaved off by the Tev protease after the GST affinity purification, followed by a final purification using the Superdex 200 size exclusion column.
In vitro GST pulldown assay
Purified proteins AP2Δμ2-CTD-GST (0.2 mg) and MBP-Nef-CD4CD (0.4 mg) were mixed in a final volume of 100 ul and incubated at 4°C for 30 minutes. The protein solution was then loaded onto a small gravity flow column containing 0.2 ml GST resin. Flow through was collected and the resin was extensively washed with 5 × 0.9 ml GST binding buffer (50 mM Tris, pH 8, 100 mM NaCl, 0.1 mM TCEP). The bound proteins were then eluted with 5 × 0.1 ml GST elution buffer containing 10 mM reduced glutathione. The eluted proteins were analyzed by SDS-PAGE.
Crystallization and crystallographic data collection
Crystallization was carried out using the microbatch under-oil method. The purified AP2Δμ2-CTD core and the Nef-CD4CD fusion chimera were mixed at 1:5 molar ratio to a final concentration of 2.5 mg/ml (25 mM Tris, pH 8.0, 100 mM NaCl, 0.1 mM TCEP, 0.1 mM PMSF). Equal volumes of the protein solution and the precipitant solution (100 mM HEPES, pH 6.9, 200 mM KCl, 15% PEG4000, 6% 1,6-hexanediol) were mixed. The drop was sealed using a mixture of paraffin and silicon oil at a 2:1 ratio.
Crystals appeared within 24 h at room temperature and grew to full size in about a week. Crystals were cryo-protected using the precipitant solution containing 20% glycerol and then frozen in liquid nitrogen. Datasets were collected at NE-CAT (24-ID) at the Advanced Photon Source, Argonne National Laboratory, and FMX/AMX (17-ID) at the National Synchrotron Light Source II, Brookhaven National Laboratory. Diffraction data was processed using HKL200035. The crystals were in the P41 space group and diffracted to a highest resolution of 3.0 Å. The statistics are summarized in Table S1.
Structure determination and refinement
The structural solution was obtained by molecular replacement using PHASER36 in PHENIX37. Only one molecule exists in the asymmetric unit. The PDB of the open AP2 core (2XA7) was divided into two search models: the α and σ2 hemicomplex and the β2 and μ2-NTD hemicomplex. Together with the Nef structure (4EMZ), the three models were used sequentially to successfully obtain the solution. Iterative rounds of model building in COOT38 and refinement in Phenix39 were carried out. The final model has an Rwork/Rfree of 0.241/0.277. A Ramachandran plot showed that 96.4% of the residues are in the favored region, together with 3.5% in the allowed region and 0.1% as outliers. The refinement statistics are summarized in Extended Data Table 1.
CD4 downregulation assays
HeLa cells expressing CD4 (TZM-bl, obtained from Dr. John Kappes via the NIH AIDS Reagent Program) were transfected using Lipofectamine2000 (Thermo Fisher Scientific) with pCG-GFP (a gift from Dr. Jacek Skowronski) and pCI-NL, a pCI-neo-based plasmid (Promega) expressing NefNL4-3 or the indicated Nef-mutants. Unless otherwise indicated, 1.6 µg of total plasmid DNA was used in each transfection, 0.4 µg of pCG-GFP and 1.2 µg of pCI-NL or derivative-mutants. In some experiments, the amount of pCI-NL plasmid encoding wild type Nef was reduced in a dose-response format to allow comparison with relatively poorly expressed Nef-mutants; in those cases, the total amount of plasmid DNA was equalized with the empty vector pCI-neo. One day later, half the cells were stained for surface CD4 (anti-human CD4, Biolegend, conjugated directly to APC), fixed in formaldehyde, then analyzed by two-color flow cytometry using an Accuri 6 cytometer. “Live cell” gates were set using untransfected cells; gates for GFP were set using cells transfected only with pCI-neo; and gates for CD4 were set using cells stained with an APC-conjugated antibody isotype control. The other half of the cells were lysed in Laemmli buffer, and the proteins were resolved on 10% denaturing SDS-PAGE gels before transfer to polyvinylidene difluoride membranes. Nef was detected using a polyclonal antiserum raised to NL4-3 Nef in sheep (a gift from Dr. Celsa Spina, University of California San Diego). β-actin was detected using a murine monoclonal antibody (Sigma-Aldrich). Species-specific secondary antibodies conjugated to HRP were visualized using Western Clarity detection reagent (Bio-Rad). Chemiluminescence was recorded using a ChemiDoc Imager System (Bio-Rad).
Class I MHC downregulation assays
HEK293 cells, which naturally express HLA-A2, were transfected using Lipofectamine2000 with pCG-GFP (0.4 µg) and pCI-NL, the indicated Nef-mutants, or the empty vector pCIneo (1.2 µg). One day later, half the cells were stained for surface HLA-A2 (Biolegend, anti-HLA-A2 conjugated directly to APC), fixed in formaldehyde, then analyzed by two-color flow cytometry using an Accuri 6 cytometer. “Live cell” gates were set using untransfected cells; gates for GFP were set using cells transfected only with pCI-neo; and gates for HLAA2 were set using cells stained with an APC-conjugated antibody isotype control. The other half of the cells were processed for western blot as above.
DSSO-based cross-linking mass spectrometry analysis (XL-MS)
Individual preparations of Nef-CD4CD-AP2Δμ2-CTD complex (2.8mg/mL and 0.7mg/mL at 1:5 Nef-CD4CD molar excess) were cross-linked using increasing molar ratios of DSSO (Thermo Fisher Scientific), for 5, 10, or 30 minutes at 4 or 37°C. Cross-linked proteins were separated on 4-20% TGX gradient SDS-PAGE gels (Bio-Rad), stained with MS-safe AcquaStain (Bulldog Bio), and cross-linked product bands excised and submitted for in gel reduction, alkylation, and trypsin digestion. Extracted peptides were separated online by Thermo Easy nLC 1000 by reverse-phase HPLC (75 μm × 30 cm fused silica packed with 1.9-μm Reprosil-Pur C18 AQ resin (Dr. Maisch-GmbH) column), running a linear gradient of 5-30% B in 50min, 35-95% B in 5 min, and 95% B for 4 min at a flow rate of 300 nL/min (buffer A: 100% H2O/0.1% FA; buffer B: 100% CAN/0.1% FA). For each sample, XL-MS3 data was acquired on Thermo Orbitrap Elite using two similar data dependent acquisition experiments40 where a single acquisition cycle consisted of: 1) one full MS1 scan (350–1500 m/z, 120,0000 resolution, AGC target of 1×106); 2) top two data-dependent MS2 scans (15,000 resolution, AGC target of 5×104, normalized collision energy = 22%); and 3) top three (or four) MS3 scans (ion count target 104, normalized collision energy = 35%).
Precursor ions (charge state ≥4+) were dynamically excluded for 20 seconds (tolerance of 10 ppm). Charge state and dynamic exclusion were applied to MS2 but turned off for MS3 acquisition.
Raw data was extracted to MGF format using MSConvert41, with MS3 data used for protein and peptide searches. Searches were performed by batch-tag feature of a locally installed version of Protein Prospector (v. 5. 19. 1, University of California San Francisco), with DSSO remnant mass modifications set as variable modifications (e.g. Alkene, Sulfenic-acid, and Thiol) 40. Peptide reports were generated using the Search Compare feature of Protein Prospector, and dead-end, intra-linked, and inter-linked peptides identified by in house software program XL-Discoverer (part of new XLTools suite)42. Summarization and confidence assignment of inter-linked peptides was performed by in house scripts that reduce ambiguous assignments and distribute redundant counts.
Integrative structure modeling of the Nef-CD4CD-AP2Δμ2-CTD complex
We applied an integrative structural modeling approach30,43-45 to characterize the structure of the Nef-CD4CD-AP2Δμ2-CTD complex in solution, based on the crystal structure and the 90 DSSO cross-links. Integrative structure determination proceeded through the standard four stages43,44,46-49: 1) gathering data, 2) representing subunits and translating data into spatial restraints, 3) configurational sampling to produce an ensemble of structures that satisfies the restraints, and 4) analyzing and validating the ensemble structures and data. The integrative structure modeling protocol (ie, stages 2, 3, and 4) was scripted using the Python Modeling Interface (PMI) package, a library for modeling macromolecular complexes based on our open-source Integrative Modeling Platform (IMP) package44, version 2.8 (https://integrativemodeling.org). Files containing the input data, scripts, and output results are available at https://github.com/salilab/Nef_CD4_AP2.
(1) Gathering data:
Modeling was based on the crystal structure, a comparative model of the β2 subunit 24-89 region built based on the AP2 structure29 using MODELLER50,51 and the 90 DSSO cross-links.
(2) Representing subunits and translating data into spatial restraints:
To maximize computational efficiency while avoiding using too coarse a representation, we represented the Nef-CD4CD-AP2Δμ2-CTD complex using a coarse-grained one residue per bead representation. The regions absent from the crystallographic structure and the comparative model were represented by a flexible string of beads corresponding to one residue each. To explore the positions and orientations of the Nef-CD4CD-AP2Δμ2-CTD components, we defined the following rigid bodies: α-σ2, β2(89-591)-μ2, Nef-CD4CD-β2(15-23), and the 4 helices in the partially unfolded β2 segment (29-43, 49-61, 64-78, and 81-86). With this representation in hand, we next translated the input information into spatial restraints as follows.
First, the 90 DSSO cross-links were used to construct a Bayesian term that restrained the distances spanned by the cross-linked residues52,53. The cross-link restraints were applied to the one residue-per-bead representation for the X-ray structure, comparative models, and flexible strings of beads. Second, to use the crystal structure of the hexamer as a template, we imposed “structural equivalence” distance restraints between pairs of residues closer than 8.0 Å across an interface between two rigid bodies, designed to restrain the model to resemble the template as much as possible. Third, excluded volume restraints were applied to all pairs of beads46,54. Fourth, we also applied the sequence connectivity restraint, using a harmonic upper bound on the distance between two consecutive beads in a subunit, with a threshold distance equal to four times the sum of the radii of the two connected beads46,54.
(3) Configurational sampling to produce an ensemble of structures that satisfies the restraints:
The initial positions and orientations of rigid bodies and positions of the beads in the flexible strings of beads were randomized. The structural models were generated using Replica Exchange Gibbs sampling, based on the Metropolis Monte Carlo (MC) algorithm53,55. Each MC step consisted of a series of random transformations (ie, rotations and translations) of the positions of the beads and rigid bodies. The sampling produced 4,000,000 models from 80 independent runs.
(4) Analyzing and validating the ensemble structures and data:
Model validation follows four major steps30,56:
Selection of the models for validation: The ensemble of models for further analysis was objectively defined as follows. For each trajectory, the MC step at which all data likelihoods and priors are equilibrated (run equilibration step) was computed and all prior frames were discarded57. Sampling of the Nef-CD4CD-AP2Δμ2-CTD complex yielded 2,007,800 representative structures that sufficiently satisfied the input restraints.
Estimation of sampling precision: The precision at which sampling sampled the selected structures (sampling precision) was estimated56. The sampling precision must be comparable or higher than the precision of the structure ensemble consistent with the input data (model precision). As a proxy for testing the thoroughness of sampling, we performed four sampling convergence tests as described56. We performed three sampling convergence tests: 1) verify that the scores of refined structures do not continue to improve as more structures are computed. 2) Confirm that the selected structures in independent sets of sampling runs (Sample A and Sample B) satisfy the data equally well. The non-parametric Kolmogorov-Smirnov two-sample test (two sided) indicates that the difference between the two score distributions is insignificant (p-value (0.08)>0.05). In addition, the magnitude of the difference is small, as demonstrated by the Kolmogorov-Smirnov two-sample test statistic (D=0.02). 3) Cluster the structural models and determine the sampling precision at which the structural features can be interpreted. Three criteria were used for determining the sampling precision, evaluated as a function of the RMSD clustering threshold. First, the p-value is computed using the χ2-test (one-sided) for homogeneity of proportions. Second, an effect size for the χ2-test is quantified by the Cramer’s V value. Third, the population of structures in sufficiently large clusters (containing at least ten structures from each sample). Clustering is done at the RMSD threshold at which three conditions are satisfied (χ2-test p-value (1.0) > 0.05, Cramer’s V (0.0) < 0.10, and the population of clustered structures (0.96) > 0.80). The output of this protocol is a single distinct cluster containing the majority (96%) of the individual models. The sampling precision is defined as the average bead RMSD between the models within the cluster and its corresponding centroid in the finest clustering for which each sample contributes models proportionally to its size (considering both significance and magnitude of the difference) and for which a sufficient proportion of all models occur in sufficiently large clusters. The sampling precision for our integrative model is 13.5Å.
Estimation of model precision: The most explicit description of model uncertainty is provided by the set of all models that are sufficiently consistent with the input information (i.e. the ensemble). For example, if the models in the ensemble are clustered into a single cluster as in the case of the Nef-CD4CD-AP2Δμ2-CTD ensemble, the model precision is defined as the RMSD between models in the cluster. The precision for the Nef-CD4CD-AP2Δμ2-CTD model is 8.3 Å.
Quantification of the degree to which a model satisfies the information used to compute it: An accurate structure needs to satisfy the input information used to compute it. A DSSO cross-link restraint is satisfied by a cluster of models if the corresponding Cα–Cα distance in any of the models is less than 30 Å58. The ensemble satisfies 89% of the XLs used to compute it, including all of the Nef XLs (Extended Data Fig. 4). The unsatisfied crosslinks mostly span residues between CD4CD and the NTD’s of the α and β2 subunits (Extended Data Table 2). These violations can be rationalized as non-specific interactions between CD4CD and the AP2 complex when CD4CD is not bound to Nef in solution, false-positive crosslinks, sample heterogeneity, insufficient conformational sampling, and coarse-grained representation of the modeled components. The remaining restraints, including structural equivalence, excluded volume, and sequence connectivity restraints, are also satisfied within their uncertainties.
To compare the X-ray structure to the ensemble obtained using integrative modeling, we computed the distribution of the Cα root-mean-square deviation (RMSD) between the X-ray structure and each of the models in the ensemble. The mean Cα RMSD is 7.1 Å (4.3-12.2 95% CI).
To indicate the most flexible parts of the structure, we assessed the uncertainty of the position and orientation of each rigid body representing the hexamer in the model ensemble. To this end, all models were superimposed on each rigid body in turn, followed by computing the average RMSD for each of the other rigid bodies (Extended Data Fig. 4e). The model ensemble indicates large variability in the positions and orientations of the 4 helices in the partially unfolded β2 segment (Extended Data Fig. 4eg), consistent with the structural heterogeneity of this region indicated by the relative lack of electron density from crystallography.
Data availability
The coordinates and structural factors for the crystal structure have been deposited at the Protein Data Bank (PDB) with the accession code 6URI.
Author contributions
Y.K. performed protein expression, purification, binding assay, and crystallization. Y.K. and X.J. performed data collection, structure determination, model building, and refinement. J.K. and R.S. contributed to protein expression and purification. M.S., C.S., and P.R. performed CD4 and MHC-I downregulation assays and mutagenesis. R.K. performed crosslinking-MS. I.E. performed integrative modeling. Y.K., R.K., I.E., J.G., and X.J. designed the experiments. All contributed to data analysis. J.G. and X.J. supervised the project. Y.K., J.G., and X.J. wrote the manuscript.
Competing interests
Authors declare no competing interests.
Correspondence and requests for materials
should be addressed to X.J.
Acknowledgments
We thank Yong Xiong for helpful discussions and valuable inputs. We thank the beamline staff at the Advanced Photon Source beamline 24-ID and the National Synchrotron Light Source beamline 17-ID. We thank Juan Bonifacino for providing the gene of rat α adaptin. This work was supported by the University of Massachusetts Dartmouth startup fund (X.J.), US National Institutes of Health (NIH) grants AI102778 and AI129706 (J.G.). A.S. was supported by NIH grants U19AI135990, R01GM083960, P41GM109824, and S10OD021596. N.K. was supported by NIH grants P50GM082250 and U19AI135990. R.K. was supported by NIH fellowship F32AI127291.