ABSTRACT
Many disordered proteins conserve essential functions in the face of extensive sequence variation. This makes it challenging to identify the forces responsible for functional selection. Viruses are robust model systems to investigate functional selection and they take advantage of protein disorder to acquire novel traits. Here, we combine structural and computational biophysics with evolutionary analysis to determine the molecular basis for functional selection in the intrinsically disordered adenovirus early gene 1A (E1A) protein. E1A competes with host factors to bind the retinoblastoma (Rb) protein, triggering early S-phase entry and disrupting normal cellular proliferation. We show that the ability to outcompete host factors depends on the picomolar binding affinity of E1A for Rb, which is driven by two binding motifs tethered by a hypervariable disordered linker. Binding affinity is determined by the spatial dimensions of the linker, which constrain the relative position of the two binding motifs. Despite substantial sequence variation across evolution, the linker dimensions are finely optimized through compensatory changes in amino acid sequence and sequence length, leading to conserved linker dimensions and maximal affinity. We refer to the mechanism that conserves spatial dimensions despite large-scale variations in sequence as conformational buffering. Conformational buffering explains how variable disordered proteins encode functions and could be a general mechanism for functional selection within disordered protein regions.
INTRODUCTION
Intrinsically disordered proteins and protein regions (IDRs) are abundant components of proteomes that play key roles in regulating cellular processes [1,2]. IDRs often bind their cellular partners via short linear motifs (SLiMs). These SLiMs represent conserved interaction modules that play key roles in cell signalling and can be identified from multiple sequence alignments [3]. In contrast, the contributions of seemingly unconserved regions outside of SLiMs to IDR function are often less well understood. Under the classical structure-function paradigm, the low sequence conservation and high frequency of insertions and deletions of IDRs [4] are indicative of weak evolutionary restraints, leading to the view that many IDRs might play the roles of passive “spacers”, stringing together ordered domains and disordered SLiMs. However, recent progress in the quantitative description of sequence-ensemble relationships in IDR conformations [5] indicates that specific features are required to support biological activity [6,7,8,9]. The observation that IDRs with vastly different sequence characteristics have conserved sequence-ensemble relationships (SERs) [10] suggests that SERs that determine function are under natural selection.
IDRs play major roles in viral adaptation by facilitating the acquisition of novel traits [11,12,13], making viral proteins ideal models to uncover the molecular mechanisms that shape functional evolution within IDRs. Tethering is a major function encoded by IDRs [14] that allows functional coupling between domains or SLiMs and regulates key processes including enzyme catalysis [15], transcriptional regulation [16] and liquid condensate formation [17]. The intrinsically disordered adenovirus early region 1A (E1A) protein is a multifunctional signalling hub that employs multiple SLiMs [13,18,19] tethered by disordered linkers to hijack cell signalling [20]. The subversion of cell cycle regulation by E1A involves crucial interactions with the retinoblastoma (Rb) tumour suppressor. Specifically, E1A uses two SLiMs [21] tethered by a disordered linker to bind to Rb and displace Rb-bound E2F transcription factors, triggering S-phase entry and viral genome replication at the low expression levels present hours after infection [22] (Fig. 1a, b). Additional binding motifs for the CREB binding protein (CBP) TAZ2 domain [23] and the BS69 transcriptional repressor MYND domain [20] (Fig.1b), mediate the formation of ternary complexes regulated by allostery [24]. Here we test the central hypothesis regarding conserved SERs driving functions in the disordered E1A linker, demonstrating that functionally equivalent IDRs that perform a tethering function can emerge despite dramatic changes in the linear sequence.
RESULTS
Tethering is required for high affinity Rb binding and E2F displacement
To uncover the molecular mechanisms underlying E2F displacement, we selected the minimal Rb binding region from the adenovirus serotype 5 (HAdV5) E1A protein. This region harbours the E1AE2F and E1ALxCxE SLiMs tethered by a 71-residue linker (E1AWT). We tested a series of E1A constructs comprising the individual motifs or fragments where the E2F (E1AΔE) or LxCxE (E1AΔL) motifs were mutated to poly-alanine (Extended Data Fig. 1, Fig. 1b). For comparison we also tested the E2F SLiM (E2F2) taken from the host transcription factor E2F2 (Fig. 1b). Isothermal titration calorimetry (ITC) (Extended Data Fig. 2, Extended Data Table 1) and SEC-SLS experiments (Extended Data Table 3) confirmed that all E1A constructs bound to Rb with 1:1 stoichiometry. To quantify binding affinities, we performed fluorescence polarization measurements using FITC-labelled constructs (Extended Data Fig. 3, Extended Data Tables 1 and 2). While the host-derived E2F2 SLiM bound to Rb with high affinity (KD = 1 nM), the E1AE2F SLiM had a KD = 119 nM, suggesting it would be a weak competitor of E2F2 (Fig. 1c). To our surprise, the motif-linker-motif arrangement (E1AWT) enabled binding to Rb with picomolar affinity (KD = 24 pM). This represents a 4000-fold enhancement compared to E1AE2F and a 40-fold enhancement compared to E2F2.
It has been proposed that a minimal flexible linker or “spacer” is required to allow the simultaneous binding of both SLiMs that is required for the displacement of E2F [25]. To test the role of tethering in E2F displacement, we carried out competition assays. As expected, none of E1ALxCxE, E1AE2F or E1AΔL was able to outcompete E2F from Rb (Fig. 1d). However, E1AWT was a strong competitor, disrupting the [E2F2:Rb] complex at low nanomolar concentration (Fig. 1d). The agreement among ITC, direct titration and competition experiments confirmed that tethering was required for high affinity Rb binding and E2F displacement (Fig. 1e, Extended Data Table 1).
The striking affinity enhancement between the independent and linked SLiMs of E1A can be explained by three alternative models (Fig. 1f). In Model A, the E1A linker enhances affinity by establishing additional stabilizing interactions with the Rb domain. In Model B, a primary interaction by the E1AE2F or E1ALxCxE SLiMs induces an allosteric change in Rb that enables the complementary motif to bind with higher affinity. In Model C, once a primary interaction is established, the linker functions as an entropic tether that maximizes the effective concentration (Ceff) of the second motif. We tested each of these models using a combination of biophysical measurements, molecular simulations, and evolutionary analysis.
The disordered E1A linker does not make additional stabilizing interactions with Rb
We used NMR spectroscopy to determine the structural basis for the E1AWT binding to Rb. For the E1AWT in isolation, the transverse relaxation-optimized spectroscopy (TROSY) spectrum of 15N-labeled E1AWT revealed narrow chemical shift dispersion in the 1H-dimension. This is a characteristic signature of disordered regions and is consistent with previous work on E1A fragments (Fig. 2a) [23]. Further, the 13Cα secondary chemical shifts (CαΔδ) showed minimal deviation from random coil values obtained from disordered proteins (Fig. 2b I). Negative NHNOE values observed for E1AWT indicated fast backbone dynamics (Fig. 2b II). Finally, sequence analysis also predicted that E1AWT is globally disordered (Fig. 2b IV). Taken together, these results indicate that the conformational ensemble of E1AWT is characterized by high heterogeneity (disorder) and by fast interconversion between distinct conformations on the nanosecond to picosecond timescale (flexibility).
Next, we examined the impact of binding of labelled E1AWT to unlabelled Rb. The complex of E1AWT and Rb is 54.6 kDa (Extended Data Table 3) and we expect to observe global chemical shift changes and line broadening for the regions of E1AWT that interact with Rb. The TROSY spectrum of E1AWT bound to Rb does not show any large chemical shift changes or widespread peak broadening for residues in the linker, consistent with previous reports (Fig. 2a and 2b III) [23]. These results indicate that the linker region remains disordered and flexible when bound to Rb, an interpretation supported by the lack of changes in secondary structure upon binding (Fig. 2c). Based on our affinity data (SI Text Section 1) and previous reports [26], we anticipated that the regions flanking the canonical E1AE2F and E1ALxCxE motifs contribute stabilizing interactions to the complex. In agreement with this expectation, the peaks corresponding to the E1AE2F and E1ALxCxE SLiMs (L43 to Y47 and L122 to E126) and their flanking residues (E39 to T52 and V119 to E135) disappeared upon binding, yielding near-zero I/I0 ratios (Fig. 2b III). This result is consistent with persistent intermolecular interactions, and independent binding of each motif to Rb (SI Text Section 1).
The N-terminal linker region encompassing the TAZ2 binding motif showed a decrease in peak intensities (Fig. 2b III) that could be due to stabilizing interactions with Rb [23]. However, an isolated E1A linker fragment (E1A60-83) showed no detectable association to Rb (Extended Data Fig. 2i) and E1A constructs including the linker region did not show higher binding affinities than isolated E1A motifs (Fig. 2d, Extended Data Table 1). Finally, the linker region did not contribute to the change in accessible surface area upon binding (Fig. 2d, SI text Section 2, Extended Data Fig. 5h and Extended Data Table 4). Collectively, these results demonstrate that the linker does not contribute to the thermodynamics of binding through coupled folding and binding or through discernible contacts with Rb. These results appear to rule out Model A (Fig. 1f).
Affinity enhancement is not due to allosteric coupling between binding sites
To assess whether allosteric coupling between the E2F and LxCxE binding sites in Rb could explain the affinity enhancement (Fig. 1f, Model B), we performed ITC titrations where Rb was pre-saturated with E1AE2F or E1ALxCxE and titrated with the complementary motif (Extended Data Fig. 5). If a positive allosteric effect is at play, E1ALxCxE should bind more tightly to Rb when E1AE2F is already bound, and vice versa. This was measured as ΔΔG = ΔGPRE-SATURATED − ΔGNON-SATURATED, where a negative value for ΔΔG indicates positive cooperativity. However, ΔΔG values for both motifs were within ±0.25 kcal/mol (Extended Data Table 5). In E1ALxCxE binding assays, pre-saturation with E1AΔL instead of E1AE2F did not change the outcome, indicating that neither the motif nor the motif + linker arrangement behaved as an allosteric effector on the complementary site. Therefore, our results suggest that allosteric coupling is unlikely to be responsible for affinity enhancement. This rules out Model B (Fig. 1f) and points to tethering (Fig. 1f, Model C) as the mechanism underlying the ability of E1A to disrupt the E2F-Rb complex.
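The ΔΔG criterion can be made concrete with a short sketch. It assumes the usual relation ΔG = RT ln(KD), with KD in molar units relative to a 1 M standard state, and the 20 °C temperature quoted for the ITC experiments; the function names and the KD inputs are hypothetical and are not values from Extended Data Table 5.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g(kd_molar, temp_k=293.15):
    """Binding free energy (kcal/mol) from a dissociation constant in M."""
    return R_KCAL * temp_k * math.log(kd_molar)


def delta_delta_g(kd_pre_saturated, kd_non_saturated, temp_k=293.15):
    """ddG = dG(pre-saturated) - dG(non-saturated); negative means positive cooperativity."""
    return delta_g(kd_pre_saturated, temp_k) - delta_g(kd_non_saturated, temp_k)


# Hypothetical inputs: a 1.5-fold weaker KD after pre-saturation.
ddg = delta_delta_g(kd_pre_saturated=150e-9, kd_non_saturated=100e-9)
print(f"ddG = {ddg:+.3f} kcal/mol")
```

Under these assumptions, a ΔΔG of 0.25 kcal/mol at 20 °C corresponds to roughly a 1.5-fold change in KD, which gives a sense of how small the observed coupling is.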
The E1A linker acts as an entropic tether that optimizes the affinity of the E1ALxCxE and E1AE2F SLiMs for Rb
As suggested by early reports [27], tethering could allow docking through the E1ALxCxE motif, increasing the effective concentration (Ceff) of the E1AE2F motif such that it efficiently outcompetes E2F (Fig. 1f, Model C). This form of “zeroth-order” cooperativity can be described using a simple worm-like chain (WLC) model that treats the linker as an entropic tether (Fig. 3a,b) [28]. According to this model, a short linker would be unable to straddle the distance between the two binding sites (Fig. 3a,b I), an optimal linker would maximize Ceff (Fig. 3a,b II), and a longer than optimal linker would decrease Ceff (Fig. 3a,b III). Application of the WLC model yields a predicted Ceff value of 0.92 mM, with a near-optimal linker length (Fig. 3b). This is in close agreement with the estimate for Ceff (0.52 ± 0.09 mM) obtained from the measured affinities, supporting Model C (Extended Data Table 1).
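The three linker regimes can be illustrated with a minimal sketch of an entropic-tether calculation. It uses only the Gaussian (long-chain) limit of the WLC end-to-end distribution and omits the higher-order correction terms of the full model in [28]. The 3.8 Å contour length per residue and the 49 Å separation between the two Rb binding sites are assumptions made for illustration; the 3 Å persistence length is the naïve value quoted later in the text. The function name is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23   # molecules per mole
A3_PER_LITRE = 1e27        # cubic angstroms in one litre
RESIDUE_LENGTH_A = 3.8     # assumed contour length per residue (angstrom)


def ceff_molar(n_residues, r_sites_a, lp_a=3.0):
    """Effective concentration (M) of the tethered motif at distance r_sites_a.

    Gaussian limit of the worm-like chain: <r^2> = 2 * Lp * Lc.
    """
    contour = n_residues * RESIDUE_LENGTH_A
    if r_sites_a >= contour:
        return 0.0  # linker too short to span the two sites
    mean_r2 = 2.0 * lp_a * contour
    density = (3.0 / (2.0 * math.pi * mean_r2)) ** 1.5 * math.exp(
        -3.0 * r_sites_a ** 2 / (2.0 * mean_r2)
    )  # probability density per cubic angstrom
    return density * A3_PER_LITRE / AVOGADRO


# Short, HAdV5-like (71 residues) and very long linkers, assumed 49 A site separation.
for n in (10, 71, 500):
    print(n, f"{ceff_molar(n, 49.0) * 1e3:.3f} mM")
```

With these assumed parameters the 71-residue linker falls in the high sub-millimolar range, while the short linker cannot bridge the sites and the very long linker dilutes the tethered motif.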
To assess the prediction that the E1A linker behaves as an entropic tether, we performed Small Angle X-ray Scattering (SAXS) experiments on Rb, E1AWT, and the [Rb:E1AWT] complex (Fig. 3c, Extended Data Fig. 6). The experimental SAXS profile of the RbAB domain could be fitted to the theoretical SAXS profile derived from its crystal structure (χi2 = 1.3) and further refined (RMSD = 1.7 Å) using a SAXS-driven modelling approach (χi2 = 0.82) (Fig. 3c, Extended Data Fig. 6a), indicating that Rb in solution retained its folded structure. Instead, the Kratky plots of E1AWT were characteristic of an IDP. Fitting of the SAXS profiles using the Ensemble Optimization Method (EOM) [29] indicated E1AWT was highly expanded (Extended Data Fig. 6b). To analyse the conformation of the linker in the [Rb:E1AWT] complex, we applied a sampling method [30] to generate a pool of 10250 realistic conformations [31] and computed theoretical SAXS profiles that were selected using EOM analysis. The SAXS profile of the complex was best described by sub-ensembles where the linker sampled expanded conformations (Fig. 3c-e, Extended Data Fig. 6c) with Rh values (Rh EOM = 3.36 nm) matching those obtained from SEC-SLS experiments (Rh SEC = 3.20 ± 0.12 nm) (Fig. 3f-g, Extended Data Fig. 6d and Extended Data Table 3) and Rg/Rh ratios consistent with compaction upon bivalent tethering (SI Text Section 3). Collectively, the WLC modelling and the NMR, SAXS and SEC-SLS data support Model C, showing that the E1A linker behaves as an entropic tether that remains highly flexible and expanded to optimize the affinity of both binding motifs to Rb (Fig. 1f).
Conformational buffering leads to conservation of the E1A linker dimensions
Inspection of selected linker sequences representative of Mastadenoviruses that infect a variety of mammalian hosts revealed significant variations in linker length and sequence composition (Extended Data Fig. 7a). While N- and C-terminal acidic extensions and an aromatic/hydrophobic TAZ2 binding region were highly conserved (Fig. 2b V), the linker lengths and compositions vary considerably within the central region enriched predominantly with polar, hydrophobic and proline residues. To gain insight into how these variations impacted linker conformations, we performed all-atom simulations [8] and obtained conformational ensembles of 24 E1A linker sequences that exhaustively sample linker sizes ranging from 41 to 75 residues (Source Data File 1). Strikingly, the average end-to-end distance of these linkers remained roughly constant despite an almost doubling of the length (Fig. 4a), indicating the global linker dimensions were preserved through compensatory changes in sequence length and composition. We refer to this adaptive mechanism as conformational buffering. To uncover the sequence-encoded origins of conformational buffering we examined various statistical properties (Extended Data Fig. 7b). Net charge per residue (NCPR) was the strongest predictor of normalized end-to-end distance, with more expanded chains having a higher NCPR (Fig. 4b). Longer chains tend to have higher proline contents with fewer hydrophobic and charged residues (Extended Data Fig. 7b, Fig. 4b, inset). Our results suggest that conformational buffering is achieved through compensatory covariations in sequence that preserve the mean end-to-end distances across linker sequences. To test this hypothesis directly, we performed simulations for a collection of 140 random synthetic sequences of variable length that matched the amino acid composition of one of the shortest representative linkers (HF_HAdV40).
In sharp contrast to natural sequences, the synthetic sequences showed a clear monotonic increase in end-to-end distance with chain length (Extended Data Fig. 7c). This result suggests that the sequence composition and patterning within natural E1A linkers are decidedly non-random, being tuned throughout evolution to ensure that changes in composition and chain length did not significantly alter the key physical properties of the linker. Our findings underscore the functional implications of preserving sequence-ensemble-relations (SERs), which in the case of E1A is achieved by preserving the dimensions of disordered linkers in order to enable the hijacking of the eukaryotic cell cycle by the virus.
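Net charge per residue, identified above as the strongest predictor of linker dimensions, is simple to compute from sequence. The sketch below assumes the common convention of +1 for Lys/Arg, −1 for Asp/Glu and neutral histidine at pH 7; this convention and the function name are assumptions, since the exact definition used for Fig. 4b is not spelled out here. The example input is the E1ALxCxE-AC peptide listed in the Methods.

```python
POSITIVE = set("KR")   # assumed +1 at neutral pH
NEGATIVE = set("DE")   # assumed -1 at neutral pH


def ncpr(sequence):
    """Net charge per residue: (n_positive - n_negative) / length."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    n_pos = sum(1 for aa in seq if aa in POSITIVE)
    n_neg = sum(1 for aa in seq if aa in NEGATIVE)
    return (n_pos - n_neg) / len(seq)


# E1ALxCxE-AC peptide (residues 116-139) including the acidic extension.
print(ncpr("VPEVIDLTCHEAGFPPSDDEDEEG"))
```

The strongly negative value for this peptide reflects the C-terminal acidic extension described in the text.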
Evolutionary conservation of E1A tethering
If tethering is a key functional feature, then the Ceff and global Rb binding affinity should be under selection. Accordingly, the expectation is that these parameters should be conserved across E1A evolution. In agreement with this expectation, the predicted Ceff values for 110 E1A linkers covering the Mastadenovirus E1A phylogeny were distributed close to the maximum of the Ceff function (Extended Data Fig. 8a, c). Next, we calculated the global Rb binding affinity of all E1A proteins (KD,E1A) using the predicted Ceff values together with individual motif affinities predicted using energy scoring matrices (Extended Data Fig. 8d). Strikingly, the median values of 110 E1A proteins followed the trend observed in HAdV5 E1A: while the E1AE2F motif bound Rb with weaker affinity (KD,E2F = 3.9 μM) than the cellular E2F2 motif (KD,E2F2 = 1 nM) (Fig. 4c, red line), the global binding affinity (KD,E1A = 150 pM) was higher than that of E2F2, suggesting that the hijack mechanism is conserved across E1A proteins (Fig. 4c).
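How Ceff and the individual motif affinities combine into a global affinity can be sketched with the simplest bivalent tethering relation, KD,global = KD,A × KD,B / Ceff, which assumes independent sites and no contribution from the linker itself. Whether this is exactly the expression used for Fig. 4c is an assumption; the function name and the input values are hypothetical and chosen only to show orders of magnitude.

```python
def global_kd(kd_motif_a, kd_motif_b, ceff):
    """Global dissociation constant (M) of a two-motif ligand.

    All inputs in molar units. Assumes independent sites joined by a
    purely entropic tether: KD,global = KD,A * KD,B / Ceff.
    """
    if ceff <= 0:
        raise ValueError("Ceff must be positive")
    return kd_motif_a * kd_motif_b / ceff


# Hypothetical example: two 100 nM motifs tethered at Ceff = 1 mM.
kd = global_kd(100e-9, 100e-9, 1e-3)
print(f"{kd * 1e12:.1f} pM")
```

In this simplified picture, two motifs of modest, submicromolar affinity reach the picomolar range once Ceff is in the millimolar regime.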
The compensatory changes uncovered by our all-atom simulations suggested that linker-specific Ceff may confer tighter binding affinities when compared to a naïve WLC model in which chain length is the sole determinant of Ceff. The relevant parameter is the apparent chain stiffness or persistence length (Lp) (Extended Data Fig. 8b). While Lp is kept constant in the naïve WLC model, in reality the apparent stiffness varies between sequences in a composition-dependent manner [28]. In order to explore how conformational buffering affected the Ceff enforced by the linkers, we calculated sequence-specific Lp values (LpSim) from the all-atom simulations (see Methods). The median LpSim value (6.7 Å) was higher than the naïve expectation (Lp = 3 Å) (Extended Data Fig. 8e). Thus, E1A linkers were globally expanded [32] and approached an optimal linker length (Extended Data Fig. 8f,g). While the naïve WLC model predicts Ceff to increase from 0.4 mM to 1 mM from the shortest to the longest linker, chain-specific LpSim values led to all linkers displaying a Ceff of 1 mM (Extended Data Fig. 8h,i). This produced a 3-fold enhancement in the global binding affinity for short linkers (Extended Data Fig. 8j) and improved the median KD,E1A Sim value to 5.9 pM (Fig. 4c) independent of variations in sequence length (Extended Data Fig. 8k). Accordingly, we propose that the functional length of the linker [33], defined as the joint contribution of sequence length, amino acid composition and sequence patterning to the end-to-end distance, is not random. Instead, the functional length is conserved through conformational buffering to enable maximal affinity enhancement.
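One simple way to picture a sequence-specific persistence length is to invert the long-chain WLC relation ⟨r²⟩ = 2·Lp·Lc for a simulated ensemble. The sketch below is only that illustration: the procedure used to obtain LpSim is described in the Methods and may differ, the 3.8 Å per residue contour length is an assumption, and the 45 Å root-mean-square end-to-end distance is a hypothetical input.

```python
RESIDUE_LENGTH_A = 3.8  # assumed contour length per residue (angstrom)


def apparent_lp(mean_r2_a2, n_residues):
    """Apparent persistence length (angstrom) from <r^2> = 2 * Lp * Lc."""
    contour = n_residues * RESIDUE_LENGTH_A
    return mean_r2_a2 / (2.0 * contour)


def mean_r2(lp_a, n_residues):
    """Mean-squared end-to-end distance (angstrom^2) in the same limit."""
    return 2.0 * lp_a * n_residues * RESIDUE_LENGTH_A


# Hypothetical case: a 41-residue and a 75-residue linker that share the
# same root-mean-square end-to-end distance of 45 angstrom.
for n in (41, 75):
    print(n, f"Lp = {apparent_lp(45.0 ** 2, n):.2f} A")
```

In this picture, conserving the end-to-end distance across linkers of different length requires shorter linkers to behave as stiffer chains, which is the trend that conformational buffering describes.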
Phylogenetic analysis further revealed that tethering was strongly conserved within Mastadenoviruses infecting primates and the host orders Carnivora, Chiroptera and Perissodactyla (Fig. 4d). In contrast, tethering was weakened within more divergent orders (PC, OA, BA, Extended Data Fig. 7a and outlier points in Fig. 4c) due to the presence of short linkers coupled to low affinity or absent Rb binding motifs (Fig. 4d). This suggests that the motif-linker-motif arrangement is under co-evolutionary selection, such that either the linker and the motifs are jointly optimized, or neither is under selective pressure, presumably leading to a loss of the displacement mechanism.
DISCUSSION
Here, we demonstrate how E1A hijacks the eukaryotic cell cycle using two SLiMs tethered by an optimal entropic linker [34,35]. The proposed docking and displacement mechanism appears to be maintained by natural selection through conformational buffering, a mechanism that promotes robust encoding of core functions while supporting the extensive variation of sequence features that would be required for viral adaptation. Our work challenges the view that IDRs evolve with few restrictions, demonstrating that distinct features, specifically the sequence-ensemble relationships (SERs) that set IDR dimensions, are likely to be under selective pressure that can be masked by sequence variation and naïve sequence alignments.
For the E1A tethers, the main feature driving conformational buffering is net charge per residue (NCPR), in agreement with previous findings that charge valence and patterning are major determinants of IDR dimensions in natural [33,36,37,38] and synthetic [37,39,40] sequences. Additional linker features might help maintain specific conformations: the acidic extensions may generate electrostatic repulsion that ensures the linker stays extended, exposing the associated TAZ2 and MYND motifs, while favoring local solvation that prevents linker-Rb interactions. Previous studies have shown that proline can contribute more or less [40] strongly to IDR compaction. Changes in proline content were not a major determinant of E1A linker expansion, but instead appear to reflect an orthogonal selection feature such as the inhibition of helical secondary structure (Extended Data Fig. 7). This underscores the reality that multiple mechanisms might contribute to fine-tune the dimensions of E1A linkers. Our results suggest that WLC models coupled with sequence-based estimations of persistence length could be used to create more realistic representations of multivalent interactions in systems biology toolboxes [41].
We uncover how conformational buffering maintains the linker functional length through compensatory changes in sequence length, composition, and patterning that give rise to a diverse set of functionally equivalent IDRs (Fig. 4e, upper). This sequence variability could reflect adaptive changes following host switch events [42] that support rewiring of the E1A interactome through the gain and loss of SLiMs [19,43] or enhance viral immune evasion by fine tuning interactions with immune suppressors that map onto the region of interest [44] (Fig. 4e, lower). The robust encoding of functions provided by conformational buffering could be widespread among IDRs, underlying gene silencing [6], kinase inhibitor function [7], transcriptional control [16,36] and phase separation [8].
Data deposition
The EOM ensembles have been deposited in the Protein Ensemble Database (proteinensemble.org) with codes PED00174 (E1AWT:Rb) and PED00175 (E1AWT).
Author Contribution Statement
LBC, GWD, ASH and RVP designed research and conceived the study. NSGF and WMB produced reagents. NSGF and WMB performed FP, ITC and NMR experiments and WB, NGF, GWD and LBC analysed data. JG conducted bioinformatic analyses of E1A variants. AS and PB performed and analysed SAXS experiments. AE, AB and JC produced and analysed E1A conformational ensembles. SBV and ASH performed and analysed all atom simulations of E1A linkers. GFB, CBM and IES computed and analysed FOLDX matrices. NSGF, JG, AS and ASH produced figures. LBC, GWD, PB, JC, GdPG, IES, ASH and RVP Supervised research. LBC, NSGF, JG, RVP, ASH and GWD wrote the paper with critical feedback from all authors.
METHODS
Protein purification and peptide synthesis and labeling
Protein expression and purification
The human Retinoblastoma protein (Uniprot ID: P06400) AB domain with a stabilizing loop deletion (372-787Δ582-642), named Rb, was recombinantly expressed from a pRSET-A vector in E. coli BL21(DE3) following a previously described protocol [1]. The adenovirus serotype 5 (HAdV5) Early 1A protein fragment (36-146) (Uniprot ID: P03255), named E1AWT, was subcloned into BamHI/HindIII sites of a modified pMalC vector (New England Biolabs, Hitchin, UK). E1AΔE (43-LHELY-47Δ43-AAAA-46) and E1AΔL (122-LTCHE-126Δ122-AAAA-125) variants were obtained by site-directed mutagenesis of the wild type vector. E1A proteins were expressed as MBP fusion products in E. coli BL21(DE3). Unlabelled samples were obtained from 2TY medium, and single (15N) and double (15N/13C) labelled samples from M9-minimal medium supplemented with 15NH4Cl and 13C-glucose. Cultures were induced with 0.8 mM IPTG at 0.7 OD600 and grown at 37 °C overnight in 2TY medium or for 5 h after induction in M9-minimal medium. Harvested cells were lysed by sonication and proteins were isolated by amylose affinity chromatography of the soluble fraction, followed by Q-HyperD ion exchange and size exclusion (Superdex 75) chromatography. The MBP tag was cleaved with thrombin (Sigma-Aldrich, USA) at 0.4 units per mg of protein. Protein purity (> 90%) and conformation were assessed by SDS-PAGE, SEC-SLS and circular dichroism analysis (Extended Data Fig. 1).
Peptide synthesis
Peptides corresponding to individual E1A or E2F2 binding motifs were synthesized by Fmoc chemistry at >95% purity (GenScript, USA) and quantified by absorbance at 280 nm or, when tryptophan or tyrosine residues were absent, by quantitation of peptide bonds at 220 nm in HCl. The peptide sequences are:
E1AE2F 36-SHFEPPTLHELYDLDV-51
E1ALxCxE 116-VPEVIDLTCHEAGFPP-131
E1ALxCxE-AC 116-VPEVIDLTCHEAGFPPSDDEDEEG-139
E1ALxCxE-ACP 116-VPEVIDLTCHEAGFPPpSDDEDEEG-139
Human E2F2 404-SPSLDQDDYLWGLEAGEGISDLFD-427
FITC labeling
Proteins and peptides were labelled at their N-terminus with Fluorescein 5-Isothiocyanate (FITC, Sigma), purified and quantified following a described protocol [45]. F/P (FITC/Protein) ratio was above 0.8 in all cases.
Circular Dichroism
Far-UV CD spectra were measured on a Jasco J-810 (Jasco, Japan) spectropolarimeter equipped with a Peltier thermostat using 0.1 or 0.2 cm path-length quartz cuvettes (Hellma, USA). Five CD scans were averaged from 195 to 200 nm at 100 nm/min scan speed, and buffer spectra were subtracted from all measurements. All spectra were measured in 10 mM Sodium Phosphate buffer pH 7.0 and 2 mM DTT at 20 ± 1 °C and 5 μM protein concentration.
Size Exclusion Chromatography, Hydrodynamic radii calculations and Light Scattering Experiments
Analytical size exclusion chromatography (SEC) was performed on a Superdex 75 column (GE Healthcare) calibrated with globular standards: BSA (66 kDa), MBP (45 kDa) and Lysozyme (14.3 kDa). All runs were performed by injecting 100 μl protein sample (E1AWT and E1AΔL at 270 μM and E1AΔE at 540 μM) in 20 mM Sodium Phosphate buffer pH 7.0, 200 mM NaCl, 2 mM DTT. For each protein or complex a partition coefficient (Kav) was calculated and apparent molecular weights were interpolated from the –logMW vs Kav calibration curve. Experimental hydrodynamic radii (Rh) were calculated following empirical formulations developed by Uversky and colleagues [46]:
Where MW is the apparent molecular weight derived from SEC experiments. The predicted Rh for E1AWT was calculated following the formulation developed by Marsh and Forman-Kay [3].
The exponent ν was calculated from Rh = R0·N^ν using the experimental Rh values, with R0 = 2.49 Å for E1AWT and R0 = 4.92 Å for Rb, following [32]. For E1AWT, ν was also calculated from Rg = R0·N^ν using Rg obtained from SAXS measurements and R0 = 2.1 Å, following [47]. In both cases, N is the number of residues in the chain (Extended Data Table 3).
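Solving the power law R = R0·N^ν for the exponent gives ν = ln(R/R0)/ln(N). A minimal sketch follows, assuming R and R0 are expressed in the same length unit; the function name and the numerical inputs are hypothetical and are not entries from Extended Data Table 3.

```python
import math


def scaling_exponent(radius, r0, n_residues):
    """Exponent nu in radius = r0 * N**nu (radius and r0 in the same unit)."""
    if n_residues <= 1:
        raise ValueError("need more than one residue")
    return math.log(radius / r0) / math.log(n_residues)


# Illustrative inputs only: a 111-residue chain with a radius of 27.0 and a
# prefactor of 2.49, both in the same (arbitrary) length unit.
print(f"nu = {scaling_exponent(27.0, 2.49, 111):.3f}")
```

Values of ν near 0.33 are expected for compact globules and values near 0.5 to 0.6 for expanded disordered chains, which is why the exponent serves as a compaction measure.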
Static Light Scattering (SLS) coupled to SEC was carried out to determine the average molecular weight of individual protein peaks and the stoichiometry of [Rb:E1A] complexes using a PD2010 detector (Precision Detectors Inc, China) coupled in tandem to an HPLC system and an LKB 2142 differential refractometer. The 90° light scattering (LS) and refractive index (RI) signals of the eluting material were analysed with Discovery32 software (Precision Detectors).
Dynamic Light Scattering (DLS) was used to measure the hydrodynamic size distribution of E1A, using a Wyatt Dynapro Spectrometer (Wyatt Technologies, USA). Data were fitted using Dynamics 6.1 software. All measurements were performed in 20 mM Sodium Phosphate buffer pH 7.0, 200 mM NaCl, 1 mM DTT at 2 mg/ml. Samples were filtered through 0.22 μm filters (Millipore) and placed into a 96-well glass bottom black plate (In Vitro Scientific P96-1.5H-N) covered by a high performance cover glass (0.17 ± 0.005 mm) before measurements were taken.
Fluorescence Spectroscopy Experiments
Measurements were performed in a Jasco FP-6200 (Nikota, Japan) spectrofluorometer assembled in L geometry coupled to a Peltier thermostat. Excitation and emission wavelengths were 495 nm and 520 nm respectively, with a 4 nm bandwidth. All measurements were performed in 20 mM Sodium Phosphate buffer pH 7.0, 200 mM NaCl, 2 mM DTT and 0.1% Tween-20 at 20 ± 1 °C.
For direct titrations, a fixed concentration of FITC-labelled protein/peptide was titrated with increasing amounts of Rb until saturation was reached. Maximal dilution was 20% and samples equilibrated for 2 min ensuring steady state. Titrations performed at concentrations 10 times higher than the equilibrium dissociation constant (KD) allowed estimation of the stoichiometry of each reaction. Binding titrations performed at sub-stoichiometric concentrations allowed an estimation of KD, by fitting the titration curves to a bimolecular association model:
Where Y is the measured anisotropy signal, YF and YB are the free and bound labelled peptide signals, P0 is the total labelled peptide concentration, x is Rb concentration, and KD is the equilibrium dissociation constant in Molar units. The [C * x] linear term accounts for slight bleaching or aggregation. Data was fitted using the Profit 7.0 software (Quantumsoft, Switzerland), yielding a value for each parameter and its corresponding standard deviation. Titrations for each complex were performed in triplicate at least at three different concentrations of FITC-labelled sample, and parameters were obtained from fitting individual titrations or by global fitting of the KD parameter using normalized titration curves at different concentrations, obtaining an excellent agreement between individual and global fits (Extended Data Table 2).
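The symbols defined above match the standard 1:1 binding model with ligand depletion. The sketch below writes out that standard quadratic form together with the linear C·x term; it is offered as a form consistent with the symbol definitions, not as a transcription of the fitted equation, and the function names are hypothetical.

```python
import math


def bound_fraction(x, p0, kd):
    """Fraction of labelled peptide bound at total Rb concentration x.

    Solves the 1:1 mass-action quadratic. Written as 2x / (b + sqrt(...)),
    which is algebraically equal to (b - sqrt(b^2 - 4*p0*x)) / (2*p0) but
    avoids subtracting two nearly equal numbers.
    """
    b = x + p0 + kd
    return 2.0 * x / (b + math.sqrt(b * b - 4.0 * p0 * x))


def anisotropy(x, p0, kd, y_free, y_bound, c=0.0):
    """Signal Y = YF + (YB - YF) * bound fraction + C * x."""
    return y_free + (y_bound - y_free) * bound_fraction(x, p0, kd) + c * x


# Illustrative curve: 1 nM labelled peptide, KD = 1 nM, concentrations in M.
for rb in (0.0, 1e-9, 1e-8, 1e-6):
    print(f"{rb:.1e} M -> {anisotropy(rb, 1e-9, 1e-9, 0.05, 0.15):.4f}")
```

When P0 is well below KD the curve reduces to a simple hyperbola, whereas at P0 well above KD it becomes a stoichiometric titration, which is why the two concentration regimes described above report on KD and stoichiometry respectively.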
Competition experiments were carried out by titrating the pre-assembled complex [Rb:FITC-E2F2] (1:1 molar ratio, 5 nM) with increasing amounts of unlabelled competitors and following the decrease in the anisotropy signal until the value corresponding to free FITC-E2F2 was reached. IC50 values were estimated directly from the curves as the concentration where the competitor produced a decrease in 50% of the maximal anisotropy value. KD values were calculated by fitting the data considering the binding equilibrium of the labelled peptide and the unlabelled competitors according to [48], obtaining KD(comp) values that differed only slightly (2 to 4-fold) from those obtained from direct titrations. KD and KD(comp) values also displayed similar fold changes in binding affinity relative to E2F2 within each method (Extended Data Table 1). The agreement between the KD values obtained from fluorescence and ITC titrations (Extended Data Table 1) confirmed that FITC moiety did not cause significant changes in Rb binding affinity.
ITC Experiments
Direct titrations
ITC experiments were performed on MicroCal VP-ITC and MicroCal PEAQ-ITC equipment (Malvern Panalytical) in 20 mM Sodium Phosphate pH 7.0, 200 mM NaCl, 5 mM 2-mercaptoethanol at 20.0 ± 0.1 °C, unless stated otherwise. Prior to titrations, cell and titrating samples were co-dialyzed in the aforementioned buffer for 48 h at 4 ± 1 °C and then de-gassed. Measurements performed in the MicroCal VP-ITC used 28 10-μl injections at a flow rate of 0.5 μl/s and those performed in the MicroCal PEAQ-ITC used 13 3-μl injections. The concentration ranges of cell and titrating samples are detailed in Extended Data Figs. 2 and 5. Data were analysed using the Origin software.
Allosteric coupling experiments
First, a pre-assembled [Rb:E1ALxCxE] complex (1:1 molar ratio, 30 μM) was titrated with E1AE2F or E1AΔL to assess whether binding of the LxCxE motif modified the binding affinity for the E2F site. Conversely, pre-assembled [Rb:E1AE2F] or [Rb:E1AΔL] complexes were titrated with E1ALxCxE to assess whether binding of the E2F motif modified the binding affinity for the LxCxE site (Extended Data Table 5).
Calculation of ΔCp and ΔASA parameters from ITC data
A series of titrations was carried out at different temperatures (10.0, 15.0, 20.0 and 30.0 ± 0.1 °C) and the change in binding heat capacity (ΔCp) was obtained from the slope of the linear regression of ΔH against temperature (Extended Data Fig. 5). The changes in accessible surface area and the number of residues involved in the interaction were estimated by solving semi-empirical equations from protein unfolding studies applied to protein-ligand binding [49] (Extended Data Table 4).
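The ΔCp estimate described above amounts to an ordinary least-squares slope. The sketch below uses the temperatures listed in the text, but the ΔH values are invented for illustration and are not measured data; the helper name is hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Temperatures from the text (°C); the ΔH values (kcal/mol) are made up.
temperatures = [10.0, 15.0, 20.0, 30.0]
delta_h = [-10.0, -13.0, -16.0, -22.0]  # hypothetical, illustration only

# The slope of ΔH against T is ΔCp (here in kcal mol^-1 K^-1).
delta_cp, _ = linear_fit(temperatures, delta_h)
```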
NMR Experiments
NMR experiments were carried out using a Varian VNMRS 800 MHz spectrometer equipped with a triple-resonance pulsed-field Z-axis gradient cold probe. A series of two-dimensional sensitivity-enhanced 1H–15N HSQC and three-dimensional HNCACB, HNCO and CBCA(CO)NH experiments [50,51] were performed for backbone resonance assignments on uniformly 13C–15N-labelled samples of E1AWT, E1AΔE and E1AΔL at 700 μM, 975 μM and 850 μM respectively. All measurements were performed in 10% D2O, 20 mM Sodium Phosphate pH 7.0, 200 mM NaCl, 2 mM DTT at 25 °C. The HSQC used 9689.9 Hz with 1024 increments for the t1 dimension and 2106.4 Hz with 128 increments for the t2 dimension. The HNCACB used 9689.9, 14075.1, and 2106.4 Hz, with 1024, 128, and 32 increments for the t1, t2, and t3 dimensions, respectively. The HNCO used 9689.9, 2010.4, and 2106.4 Hz, with 1024, 64, and 32 increments for the t1, t2, and t3 dimensions, respectively. The CBCA(CO)NH used 9689.9, 14072.6, and 2106.4 Hz, with 1024, 128, and 32 increments for the t1, t2, and t3 dimensions, respectively. For E1AWT, 88% of non-proline backbone 1H and 15N nuclei, 75% of 13C’ nuclei and 90% of 13Cα and 13Cβ nuclei were assigned. For E1AΔE and E1AΔL, 85% of non-proline backbone 1H and 15N nuclei, 72% of 13C’ nuclei and 87% of 13Cα and 13Cβ nuclei were assigned.
NMRPipe and NMRViewJ software packages were used to process and analyse all the NMR spectra [52]. Residue-specific random coil chemical shifts were generated for the three sequences using the neighbor-corrected IDP chemical shift library [53]. Secondary chemical shifts (Δδ) were calculated by subtracting random coil chemical shifts from the experimentally obtained chemical shifts.
Two-dimensional 1H–15N TROSY experiments were performed on single 15N-labelled samples of free E1AWT, E1AΔE and E1AΔL and on each E1A protein bound stoichiometrically to Rb (1:1 molar ratio) at 525 μM (E1AWT), 300 μM (E1AΔE) and 315 μM (E1AΔL). The ratio between the peak intensity in the bound state (I) and the peak intensity in the free state (I0) was calculated for each residue and, together with additional data, was used to identify the interacting residues.
WLC modelling
The worm like chain (WLC) model
A worm like chain (WLC) model [54] was used to describe the end-to-end probability density distribution function of the E1A linker and estimate the effective concentration term (Ceff) used in the tethering model (Figure 1, Model C and Figure 3). In this model, the disordered linker behaves as a random polymer chain whose dimensions depend on two parameters: the persistence length (LP), which represents the chain stiffness, or the length over which the chain motions become uncorrelated, and the contour length (LC), which is the total length of the chain. For long peptides, LP assumes a standard value of 3 Å and LC is the product of the number of linker residues and the average unit size of one amino acid (3.8 Å) [28]. Under this model, the probability density function p(r) is defined by:
Where p(r) is a function of distance r and depends on LP and LC. The last term in the equation is expanded in [54,28]. The end-to-end probability density function can be related to the effective concentration in the bound state when the linker is restrained to a fixed distance between binding sites, r0 [54]. In this case, the effective concentration Ceff is defined by:
Where NA is Avogadro’s number and r0 is the distance separating the binding sites obtained from the X-ray structure of the complex (49 Å calculated from PDB: 2R7G [25] and 1GUX [55]). Multiplying Eq. (4) by 10³ yields Ceff in millimolar units.
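To make the relation between the end-to-end density and Ceff concrete, the following sketch computes an effective concentration in molar units. It is a simplified illustration under stated assumptions: only the leading Gaussian factor of the WLC density is used (the higher-order correction term expanded in the cited references is omitted), and Ceff is taken as the probability density per unit volume at r0 divided by Avogadro's number, with Å³ converted to litres. Function and constant names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23      # mol^-1
ANGSTROM3_PER_LITRE = 1e27    # 1 L = 1e27 cubic angstroms

def ceff_gaussian_limit(n_residues, r0=49.0, lp=3.0, b=3.8):
    """Effective concentration (M) for a linker of n_residues.

    Uses only the leading Gaussian factor of the WLC end-to-end density;
    the higher-order WLC correction term is deliberately omitted.
    Distances (r0, lp, b) are in angstroms.
    """
    lc = n_residues * b  # contour length
    # Probability per cubic angstrom of finding the chain ends r0 apart.
    density = (3.0 / (4.0 * math.pi * lp * lc)) ** 1.5 * math.exp(
        -3.0 * r0 ** 2 / (4.0 * lp * lc))
    return density * ANGSTROM3_PER_LITRE / AVOGADRO

ceff_71 = ceff_gaussian_limit(71)  # example: a 71-residue linker
```

Because the correction term is neglected, values from this sketch should be read as order-of-magnitude estimates rather than as the values reported in the study.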
Calculation of predicted Ceff value for the E1AWT:Rb interaction
The Ceff value predicted from the WLC model was obtained by applying Eq. (4) with the designated Lp parameter (standard model LP = 3 Å and b = 3.8 Å), using a linker length of 71 residues for HAdV5 E1A. The separation between binding sites, r0, was 49 Å (from PDB: 1GUX and PDB: 2R7G). The WLC model was also used to estimate the Ceff values of a collection of 110 natural linker sequences of different lengths, changing the length value for each linker and keeping the other parameters constant.
Calculation of sequence-dependent Lp values
In order to represent the extension of E1A linkers taking into account sequence composition, we derived the persistence length from all-atom simulations (Lp Sim). Lp Sim was calculated from the average end-to-end distance of each simulated ensemble using the equation ⟨r²⟩ = 2*Lp*Lc, where Lc = N*b and b takes the value 3.8 Å. This equation is an approximation for the value of ⟨r²⟩ for a worm like chain in the case where the contour length of the chain is much larger than its persistence length (Lc >> Lp) (Source Data File 2) [28]. New Ceff values were derived using the same parameters described above, but replacing the standard Lp value by the Lp Sim value.
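The conversion from simulated chain dimensions to a sequence-specific persistence length is a one-line rearrangement of the relation given above. A minimal sketch, with a hypothetical function name and with the mean squared end-to-end distance supplied in Å²:

```python
def lp_from_simulation(mean_r2, n_residues, b=3.8):
    """Persistence length (Å) from <r^2> = 2 * Lp * Lc, with Lc = N * b.

    mean_r2 is the ensemble-averaged squared end-to-end distance in Å^2.
    Valid only in the long-chain limit (Lc >> Lp) stated in the text.
    """
    contour_length = n_residues * b
    return mean_r2 / (2.0 * contour_length)
```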
Calculation of experimental Ceff value for the E1AWT:Rb interaction
We calculated the experimental Ceff value from Model C: KG = K1*K2*Ceff (Figure 1f) where KG, K1 and K2 were the equilibrium association constants (K =1/KD) for binding of the motif-linker-motif construct E1AWT (KG) or the individual motifs E1AE2F (K1) and E1ALxCxE (K2) to Rb (Extended Data Table 1). The condition K1 = K1’ and K2 = K2’ (no allosteric coupling between sites) was met, as experimentally proven (Extended Data Fig. 5 and Extended Data Table 5).
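Rearranging Model C in terms of dissociation constants gives Ceff = KD1 * KD2 / KD,global. The sketch below shows this rearrangement; the function name is hypothetical and any numbers passed to it in testing are placeholders, not the measured affinities from Extended Data Table 1.

```python
def ceff_from_affinities(kd_global, kd_motif1, kd_motif2):
    """Ceff (M) from KG = K1 * K2 * Ceff, with K = 1/KD and all KD in M.

    Substituting K = 1/KD gives Ceff = KD1 * KD2 / KD_global.
    Assumes no allosteric coupling between the two sites.
    """
    return kd_motif1 * kd_motif2 / kd_global
```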
Molecular modelling of Rb:E1A conformational ensembles
Conformations of E1AWT bound to Rb were modelled using an extended version of a recently proposed method to generate realistic conformational ensembles of IDPs [56]. This method exploits local, sequence-dependent structural information encoded in a database of three-residue fragments and builds conformations incrementally by sampling dihedral angle values from the database, while avoiding steric clashes. In order to model the doubly-bound [Rb:E1AWT] complex, the E2F and LxCxE motifs were considered to be static, preserving the conformations extracted from experimentally determined structures (2R7G and 1GUX). The 71-residue fragment between these two motifs was considered as a long protein loop that adapts its conformation in order to maintain the two ends rigidly positioned. Conformational sampling considering such loop-closure constraints was performed using a robotics-inspired algorithm [31] adapted to use dihedral angle values from the aforementioned database. For each feasible conformation of the central fragment, geometrically compatible conformations of the short N- and C-terminal tails were sampled using the basic strategy explained in [56]. For singly-bound models, only one of the two motifs was considered to be statically bound to Rb and the other motif behaved as the flexible linker.
SAXS Experiments
SAXS experiments were carried out at the European Molecular Biology Laboratory beamline P12 of DORIS and PETRAIII storage rings respectively, using the X-ray wavelengths of 1.24 Å and a sample-to-detector distance of 3.0 m. The scattering profiles measured covered a momentum transfer range of 0.0026 < s < 0.73 Å⁻¹. SAXS data were measured for Rb, E1AWT and the [Rb:E1AWT] complex at 10 °C. Concentrations used were 7.0, 5.6 and 4.2 mg/ml for E1AWT, 4.0, 2.0 and 1.0 mg/ml for Rb, and 2.7, 1.4 and 0.7 mg/ml for [Rb:E1AWT], in 20 mM Sodium Phosphate pH 7.0, 200 mM NaCl, 1 mM DTT. The scattering patterns of the buffer solution were recorded before and after the measurement of each sample. Multiple repetitive measurements were performed to detect and correct for radiation damage. Final curves at each concentration were derived after the averaged buffer scattering patterns were subtracted from the protein sample patterns. No sign of aggregation was observed in any of the curves. Final SAXS profiles for the systems were obtained by merging curves for the lowest and highest concentrations to correct for the small attractive interparticle effects observed. Raw data manipulation was performed using standard protocols with the suite of programs ATSAS [57]. The forward scattering intensity, I(0), and the radius of gyration, Rg, were evaluated using Guinier’s approximation [58], assuming that at very small angles (s < 1.3/Rg) the intensity can be well represented as I(s) = I(0) exp(−(sRg)²/3). The P(r) distribution functions were calculated by indirect Fourier Transform using GNOM [59], applying a momentum transfer range of 0.01 < s < 0.33 Å⁻¹ and 0.013 < s < 0.27 Å⁻¹ for Rb and [Rb:E1A], respectively. For E1AWT a SEC-SAXS experiment was also performed to obtain the SAXS profile from a highly monodisperse sample. This profile overlaid perfectly with the E1AWT merged curve from the three batch experiments, ruling out aggregation problems.
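Guinier's approximation implies that ln I(s) is linear in s², with slope −Rg²/3 and intercept ln I(0). The sketch below illustrates this with a plain least-squares fit on a synthetic, noise-free profile; it is not the ATSAS implementation, the function name is hypothetical, and the upper limit of the fitting range is supplied by the caller.

```python
import math

def guinier_fit(s_values, intensities, s_max):
    """Return (I0, Rg) from a linear fit of ln I(s) against s^2, s <= s_max."""
    pts = [(s * s, math.log(i))
           for s, i in zip(s_values, intensities) if s <= s_max]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in pts)
             / sum((x - mean_x) ** 2 for x, _ in pts))
    intercept = mean_y - slope * mean_x
    # slope = -Rg^2 / 3 and intercept = ln I(0)
    return math.exp(intercept), math.sqrt(-3.0 * slope)

# Synthetic profile for a hypothetical particle with Rg = 30 Å, I(0) = 100.
s_grid = [0.005 + 0.002 * k for k in range(20)]
profile = [100.0 * math.exp(-(s * 30.0) ** 2 / 3.0) for s in s_grid]
i0, rg = guinier_fit(s_grid, profile, s_max=1.3 / 30.0)
```

With real data the range s < 1.3/Rg depends on the unknown Rg, so the fit is usually repeated until the range and the fitted Rg are mutually consistent.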
The crystallographic structure of Rb (PDB: 3POM [60]) was fitted to the experimental SAXS curve with FoXS [61,62]. An optimal fit (χ² = 0.86) was obtained after modelling the missing parts (loops, N- and C-termini) and a subsequent refinement with the program AllosMod-FoXS [63]. SAXS data measured for [Rb:E1A] were analysed with the Ensemble Optimization Method (EOM) [29,64]. Briefly, theoretical SAXS profiles of the 10250 structures of the complex were computed with CRYSOL [65]. 200 different sub-ensembles of 20 or 50 conformations collectively describing the experimental curve were collected with EOM and analysed in terms of Rg distributions. The experimental SAXS data of the [Rb:E1AWT] complex are compatible with three distinct scenarios: a 100% doubly-bound ensemble where the linker is highly expanded; a 100% singly-bound ensemble where the linker is highly compact; and an ensemble combining 76% doubly-bound and 24% singly-bound species, which resulted from the linear combination of the curves representing the ensemble averages of all singly-bound and all doubly-bound conformations. However, the thermodynamic data (KD for E1AWT) strongly argue against the last two scenarios, as they indicate an extremely low expected population of the singly-bound forms at any concentration of the complex used in the SAXS experiments.
Hydrodynamic radii for generated conformations
Hydrodynamic radii were calculated using the program HYDROPRO (version 10) [66,67]. HYDROPRO was run on 1000 models selected by EOM for the doubly-bound conformations and 1000 randomly selected conformations of N- and C-terminal bound conformations. The calculations were done at temperatures of 20 and 25 °C with corresponding solvent viscosities of 0.01 and 0.009 poise, respectively. The values of atomic element radius (AER), Molecular Weight, Partial Specific Volume and Solvent Density were set to 2.9 Å, 54590 Da, 0.702 cm3/g and 1.0 g/cm3, respectively.
All-atom simulations of E1A Linker sequences
All-atom simulations were run using the CAMPARI simulation engine (V2) and ABSINTH implicit solvent model [68,69]. All simulations were run at 320 K; while this is a slightly elevated temperature compared to the experimental temperature, none of the terms in the Hamiltonian has temperature dependence, such that this slightly higher temperature serves to improve sampling quality in a uniform way across all simulations. This approach has been leveraged to great effect in previous studies and is especially convenient in the case of simulating many different sequences that span a range of sequence properties and lengths [7]. A collection of Monte Carlo moves was used to fully sample conformational space as previously described [70,71,36].
For all simulations of natural sequences, 15 independent simulations were run per sequence for a total of 90K conformations per sequence across 27 different sequences (405 independent simulations, 5.25 × 10⁸ Monte Carlo steps per sequence). Simulations were performed in 15 mM NaCl in a simulation droplet sufficiently large for each sequence, with the droplet size calibrated in a length-dependent manner. Simulations were analysed using CTraj (http://ctraj.com), an analysis suite built on the MDTraj package [72]. Sequence analysis was performed using the local CIDER software package [73] with all parameters reported in Source Data File 1. Normalized end-to-end distance was calculated as the absolute end-to-end distance divided by the end-to-end distance expected for an equivalently long Gaussian chain.
Length titration Simulations
The linker from HF_HAdV40 was used to determine the overall amino acid composition and generate random sequences across a range of lengths that recapitulated this composition. Specifically, for each length (45, 50, 55, 60, 65, 70, 75) twenty random sequences were generated for a total of 140 randomly generated sequences. Each sequence was simulated under equivalent simulation conditions for 35 × 10⁹ simulation steps, with the goal of elucidating the general relationship between sequence length and end-to-end distance for an arbitrary sequence of the composition associated with HF_HAdV40. The mean end-to-end distance for the collection of sequences at a given length was determined, such that the mean value is a double average over both conformational space and sequence space.
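One simple way to build such a sequence library is to draw each residue independently with probabilities given by the template composition. This is an assumption about the procedure: it reproduces the composition only on average, whereas the original protocol may have enforced it more strictly. The template string below is a placeholder and is not the HF_HAdV40 linker, which is not given in this text; the function name is hypothetical.

```python
import random
from collections import Counter

def random_sequences(template, lengths, per_length, seed=0):
    """Draw random sequences whose residues follow the template composition.

    Each position is sampled independently, weighted by residue counts in
    the template, so composition is matched in expectation only.
    """
    rng = random.Random(seed)
    counts = Counter(template)
    residues = sorted(counts)
    weights = [counts[r] for r in residues]
    return {
        n: ["".join(rng.choices(residues, weights=weights, k=n))
            for _ in range(per_length)]
        for n in lengths
    }

# Placeholder template: NOT the HF_HAdV40 linker sequence.
template_linker = "SEEDGSPPLPASDEEQTPVS"
# Lengths 45 to 75 in steps of 5, twenty sequences each (140 in total).
library = random_sequences(template_linker, range(45, 80, 5), per_length=20)
```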
Calculation of global binding affinity of natural Adenovirus E1A sequences
Dataset
A previously reported alignment of 116 Mastadenovirus E1A sequences [43] was used to identify the E2F and LxCxE motifs as described [43], collecting 110 sequences in which both motifs were present (Source Data File 4). For all sequences, the length of the linker region between both motifs was recorded. Individual motif binding affinities, Ceff values and E1A global affinity (KD,E1A) were calculated as explained below (Source Data File 2).
Calculation of E1A binding affinity
The global binding affinity (KD,E1A) was calculated from the predicted Ceff values and the predicted binding affinities of each motif (KD,E2F and KD,LxCxE) from Model C (Figure 1f) as KD,E1A = KD,E2F*KD,LxCxE*Ceff⁻¹. Each term was calculated as follows:
Motif binding affinity prediction
To estimate the binding affinity of individual E2F and LxCxE motifs present in each sequence, FoldX 5.0 [74] was used to build substitution matrices for all 20 amino acids at each position (Source Data File 2). Briefly, given a structural complex the FoldX algorithm assesses the change in binding free energy produced by mutating each position of the motif for each one of the 20 amino acids. For the E2F matrix, the structure of the HAdV5 E1AE2F motif in complex with Rb (PDB: 2R7G) was used as input. For the LxCxE matrix, the structure used as input was a model of the HAdV5 E1ALxCxE motif in complex with Rb (Source Data File 3), built using FlexPepDock [75] and the structure of the HPV E7 LxCxE motif bound to Rb (PDB: 1GUX). The total change in binding free energy with respect to the wild type sequence (ΔΔGFoldX) was calculated by adding up the free energy terms for each residue at each matrix position (Source Data File 2). The predicted equilibrium dissociation constant of the E2F and LxCxE motifs for each sequence (KD SEQ) was calculated as:
Where ΔΔGFoldX is the total predicted change in binding energy calculated using FoldX, RT is 0.582 kcal mol⁻¹, and KD WT is the experimentally measured binding affinity of the sequence (HAdV5 E1A) present in the model structure (KD,E2F and KD,LxCxE measured in this work, Extended Data Table 1).
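The conversion from a FoldX energy change to a predicted dissociation constant, and its use in Model C, can be sketched as follows. The exponential form KD SEQ = KD WT * exp(ΔΔG/RT) and the sign convention (positive ΔΔG weakens binding) are assumptions based on standard thermodynamics, since the equation is not reproduced in this text; function names are hypothetical.

```python
import math

RT_KCAL_PER_MOL = 0.582  # value quoted in the text

def predicted_kd(kd_wt, ddg_foldx, rt=RT_KCAL_PER_MOL):
    """KD of a variant motif, assuming KD_SEQ = KD_WT * exp(ddG / RT).

    ddg_foldx is in kcal/mol; positive values are taken to weaken binding.
    """
    return kd_wt * math.exp(ddg_foldx / rt)

def global_kd(kd_e2f, kd_lxcxe, ceff):
    """Model C: KD,E1A = KD,E2F * KD,LxCxE / Ceff (all in molar units)."""
    return kd_e2f * kd_lxcxe / ceff
```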
Ceff prediction
The Ceff value was calculated for a collection of 110 natural E1A linkers as described in the “WLC modelling” section using Equation (4) and Lp = 3 Å (Lp WLC). For the subset of 24 natural E1A linkers used in all-atom simulations, we additionally calculated Lp values obtained from the best fit of the data to a Gaussian Chain model (Lp GC) and sequence-specific Lp values from all-atom simulations (Lp Sim) as described in “WLC modelling”. All data are reported in Source Data File 2.
Statistical analysis
We used bootstrapping [76] to generate 99% confidence intervals (CI) for KD,E2F, KD,LxCxE and KD,E1A average values, and compared the lower and upper end points against the value of KD,E2F2 (1 × 10⁻⁹ M). The lower bound of the 99% CI for KD,E2F and KD,LxCxE is higher than KD,E2F2 and the upper bound of the 99% CI for all KD,E1A is lower than KD,E2F2. We also used permutation tests [76] to assess the null hypothesis that the average Ceff, Lp and KD values did not differ between pairs of groups. In order to control the false discovery rate, the p-values were corrected using the Benjamini-Hochberg [77] correction for multiple comparisons.
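The two statistical procedures named above can be illustrated with short routines. The percentile bootstrap used here is an assumption, as the exact bootstrap variant is not specified in the text; the Benjamini-Hochberg adjustment follows the standard step-up definition. Function names and default arguments are hypothetical.

```python
import random

def bootstrap_ci(values, n_resamples=10000, level=0.99, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[int((1.0 - alpha) * n_resamples) - 1]
    return lower, upper

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A comparison is then called significant at a chosen false discovery rate when its adjusted p-value falls below that rate.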
Calculations of disorder propensity and conservation
All calculations were performed on the dataset from Source Data File 4, using the methods described in [43]. For disorder propensity we recorded the mean IUPRED value ± SD per position and for residue conservation we recorded the information content (IC) per position.
Acknowledgements
This work was supported by Agencia Nacional de Promoción Científica y Tecnológica (ANPCyT) Grants PICT 2013-1895 and 2017-1924 (to LBC), 2012-2550 and PICT 2015-1213 (to IES) and 2016-4605 (to GPG), the US National Institutes of Health (grants GM115556 and CA141244 to GWD, grant 5R01NS056114 to RVP), FLDOH grant 20B17 (to GWD), and the US National Science Foundation (grant MCB-1614766 to RVP). GWD and LBC were supported by a travel award from the USF Nexus Initiative and a Creative Scholarship Grant from the USF College of Arts and Sciences. PB was supported by the Labex EpiGenMed, an «Investissements d’avenir» program (ANR-10-LABX-12-01). The CBS is a member of France-BioImaging (FBI) and the French Infrastructure for Integrated Structural Biology (FRISBI), two national infrastructures supported by the French National Research Agency (ANR-10-INBS-04-01 and ANR-10-INBS-05, respectively). This work was supported by the Spanish Ministerio de Ciencia y Universidades (MICYU-FEDER, RTI2018-097189-C2-1 to GFB). NGF was supported by a doctoral fellowship from Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET, Argentina) and a scholarship from Fulbright Visiting Scholar Program. NGF and JG are CONICET postdoctoral fellows, and LBC, IES and GdPG are CONICET researchers. SBV was supported by fellowships from Ministerio de Ciencia e Innovación, España #BES-2013-063991 and #EEBB-I-16-11670. ASH is supported by the Longer Life Foundation: A RGA/Washington University Collaboration. This work benefited from the HPC resources of the CALMIP supercomputing center under the allocation 2016-P16032 and the Cluster of Scientific Computing (http://ccc.umh.es/) of the Miguel Hernández University (UMH). We thank Kathryn Perez at the Protein Expression and Purification Core Facility at EMBL (Heidelberg) for critical help with ITC experiments.