Abstract
Many multidomain proteins contain disordered linkers that regulate inter-domain contacts, and thus the effective concentrations that govern intra-molecular reactions. Effective concentrations are rarely measured experimentally and therefore little is known about how they relate to linker architecture. We have directly measured the effective concentrations enforced by disordered protein linkers using a new fluorescent biosensor. We show that effective concentrations follow simple geometric models based on polymer physics, offering an indirect method to probe the structural properties of the linker. The compaction of the disordered linker depends not only on net charge, but also on the type of charged residues. In contrast to theoretical predictions, we found that polyampholyte linkers can contract to similar dimensions as globular proteins. Hydrophobicity has little effect in itself, but aromatic residues lead to strong compaction likely through π-interactions. Finally, we find that the individual contributors to chain compaction are not additive. This work represents perhaps the most systematic study of the relationship between sequence and structure of intrinsically disordered proteins so far. A quantitative understanding of the relationship between effective concentration and linker sequence will be crucial for understanding disorder-based allosteric regulation in multidomain proteins.
Introduction
Protein interactions are tightly regulated. The specificity is not only determined by the protein structure, but also by which molecules a protein encounters. Molecular encounters are often enhanced by co-localization or even a direct physical link between the interaction partners. Such links are created by hundreds of anchoring and scaffolding proteins that connect other molecules.1–3 Furthermore, multidomain proteins often contain long linkers that play a similar role for intra-molecular interactions.4 A physical connection often increases encounter rates by several orders of magnitude, which shifts the equilibrium position of binding reactions and the rates of biochemical reactions by a similar amount.5,6 The encounter frequency of such linked reactions is concentration independent. Instead, it depends on the architecture of the connection, and therefore linker properties directly affect protein function.
Anchoring proteins and interdomain linkers often belong to the family of intrinsically disordered proteins (IDPs).7 Disordered linkers allow the tethered domains to contact in any orientation, and therefore they represent a generic mechanism for fusing domains, which cannot be accomplished by rigid proteins. The architecture of the connection between molecules is not static, but can vary in time and space to regulate the functions of the tethered domains. Recently, it has been proposed that the concept of allostery needs to be updated to include structural disorder: here, allostery is transmitted through changes in heterogeneous ensembles rather than through structured domains.8,9 Most structural changes occurring in linker regions will affect the function of the tethered domains. Therefore, linkers provide a generic mechanism whereby an event occurring in one part of a protein can affect distant parts. This is the defining feature of allostery. To understand such allosteric effects, it is crucial to study the role of linkers quantitatively.
The functions of linkers can be understood quantitatively in terms of effective concentrations. For an intra-molecular reaction, the encounter rate between tethered domains equals the rate of the same untethered reaction at a given concentration.10 This concentration is known as the effective concentration. Formally, the effective concentration is defined as the ratio of the equilibrium constants for two equivalent binding reactions, where one occurs intra- and one inter-molecularly.11 When the linker is sufficiently long to join the binding sites without strain, the effective concentration is independent of what is linked and solely a property of the linker.12,13 Intriguingly, this suggests that effective concentrations can be measured in a convenient model system and extrapolated to other systems. Effective concentrations can be measured by competition experiments, where a free ligand displaces a tethered ligand.11 Such measurements have mostly been used in efforts to optimize multivalent drugs through avidity.14,15 In molecular biology, effective concentrations are rarely measured experimentally, but usually estimated theoretically from the volume in which the tethered ligand is free to diffuse.16–20 While such simple geometric models are commonly used, they have not been tested in complex biological systems.
The simplest linker is a fully disordered chain. Fully disordered linkers are an attractive model for understanding effective concentrations as they are well-described by theories borrowed from polymer physics. When the size R of a disordered protein is measured as e.g. the end-to-end distance or the radius of hydration or gyration, it scales with the chain length following a power law such as:
R ∝ N^v
Where N is the number of residues and v is a scaling exponent determined by chain compaction. Such scaling laws underpin theoretical calculations of effective concentrations, which depend on estimating the scaling exponent v. On average, IDPs have been found to have v values of 0.51-0.58,21–23 but the scaling exponents of disordered proteins vary from about 0.4 for disordered states of foldable proteins to about 0.72 for highly charged IDPs.24 For reference, globular proteins and rigid rods have scaling exponents of 0.33 and 1, respectively. The sequence-compaction relationship of IDPs has been studied by correlating chain size with variations in sequence. Net charge dominates chain compaction through intra-chain repulsion.21,24–26 Furthermore, compaction is weakly correlated to hydrophobicity and weakly anti-correlated to proline content.21 There is a discrepancy in the literature regarding the effect of polyampholyte strength, as it has been predicted to cause either compaction or expansion.27,28 Furthermore, the patterning of charged residues is also critical,28–30 which vastly increases the potential complexity.
Here we investigate how effective concentrations in multi-domain proteins depend on linker architecture. We directly measure effective concentrations for many disordered linkers with systematic changes in the physical properties of the linker. Our new fluorescent biosensor for measurement of effective concentrations provides a new way to probe sequence-compaction relationships in intrinsically disordered proteins and to relate these to biochemical function.
Materials and methods
Preparation of DNA constructs
DNA constructs were obtained from Genscript by insertion of synthetic genes between the NdeI and BamHI sites of a pET15b vector, and sub-cloning of new linkers using unique NheI and KpnI sites flanking the linkers. Full protein sequences are given in the supplementary materials.
Protein expression and purification
All fusion protein constructs were expressed in BL21(DE3) cells in 50 mL ZYM-5052 auto-induction medium31 supplemented with 100 μg/mL ampicillin and shaking at 120 RPM. The temperature was kept at 37 °C for 3 h, and thereafter decreased to 18 °C. The cells were harvested by centrifugation (15 min, 6,000 g) after 40-48 h, when the cultures had changed color indicating mature fluorescent proteins. Bacterial pellets were lysed using B-PER Bacterial Protein Extraction Kit (Thermo Scientific) according to the manufacturer’s protocol, and the lysate was applied to gravity flow columns packed with nickel sepharose. After washing with 20 mM NaH2PO4 pH 7.4, 0.5 M NaCl, 20 mM imidazole, fusion proteins were eluted by increasing the imidazole concentration to 0.5 M. Fusion proteins were subsequently purified using Strep-Tactin XT Superflow (IBA) columns according to the manufacturer’s instructions, and dialyzed overnight into Tris buffered saline (TBS). The MBD2 peptide was expressed overnight in BL21(DE3) cells at 37 °C in ZYM-5052 auto-induction medium with 100 μg/mL ampicillin and shaking at 120 RPM. The cells were resuspended in 20 mM NaH2PO4 pH 7.4, 0.5 M NaCl, 20 mM imidazole and lysed by heating to 80 °C for 20 min,32 and debris was pelleted by centrifugation (15 min, 14,000 g). The peptide was purified by nickel sepharose as above, except that a stepwise elution up to 0.5 M imidazole was used before dialysis into TBS. It was critical to prepare and concentrate the MBD2 peptide freshly and store it on ice until use. Protein concentrations were measured using A280.
Measurement of effective concentrations
0.1 μM of each fusion protein was titrated with the MBD2 peptide through 16 serial two-fold dilutions in TBS. The starting concentration was in the range of 1.6-2 mM for WT MBD2 and 3.3 mM for the V227A MBD2 mutant. Samples were analyzed in triplicate in black 384-well plates with 1 g/L bovine serum albumin (BSA) (Fisher Scientific) added to prevent sticking. The FRET measurements were performed in a SpectraMax I3 plate reader using donor excitation at 500 nm, and emission detected in 25 nm-wide bands centered at 535 and 600 nm. The titration data were analyzed by non-linear fitting in MATLAB to the standard fitting equation for 1:1 binding reactions with Kd replaced by ce,app:
For titration with the WT MBD2 peptide, this determines an “apparent effective concentration”, which was multiplied by the affinity ratio of the wild-type and mutant peptides to produce the true effective concentration. The correction factor was established to be 30 by titration of the fusion protein containing the GS120 linker with the V227A MBD2 peptide.
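The fitting equation itself is not reproduced in this text, so the following Python sketch only illustrates the workflow under a stated assumption: a simple hyperbolic displacement curve in which the fraction of closed fusion protein is ce,app / (ce,app + [L]). This form neglects ligand depletion, which is reasonable only because the titrant is in large excess over the 0.1 μM fusion protein. The function names, the baseline values and the synthetic ce,app of 50 μM are hypothetical; only the 16-point two-fold dilution, the 2 mM starting concentration and the correction factor of 30 are taken from the text. A log-spaced grid search stands in for the non-linear fit performed in MATLAB.

```python
def serial_dilution(start_conc, n_points=16, factor=2.0):
    """Concentrations (M) of a serial dilution, highest first."""
    return [start_conc / factor ** i for i in range(n_points)]


def proximity_ratio(ligand, ce_app, closed, opened):
    """Assumed 1:1 displacement curve: fraction closed = ce_app / (ce_app + [L])."""
    frac_closed = ce_app / (ce_app + ligand)
    return opened + (closed - opened) * frac_closed


def fit_ce_app(ligand, signal, lo=1e-7, hi=1e-1, steps=600):
    """Grid search over ce_app on a log scale.

    For each trial ce_app, the two baselines enter linearly, so they are
    solved by ordinary least squares; the ce_app with the lowest squared
    error is returned.
    """
    best_sse, best_ce = None, None
    n = len(ligand)
    for k in range(steps + 1):
        ce = lo * (hi / lo) ** (k / steps)
        x = [ce / (ce + l) for l in ligand]  # fraction closed at each point
        mx, my = sum(x) / n, sum(signal) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, signal))
        slope = sxy / sxx
        intercept = my - slope * mx
        sse = sum((intercept + slope * xi - yi) ** 2 for xi, yi in zip(x, signal))
        if best_sse is None or sse < best_sse:
            best_sse, best_ce = sse, ce
    return best_ce


def true_ce(ce_app, correction_factor=30.0):
    """Scale the apparent value by the WT/mutant affinity ratio (30 in the text)."""
    return ce_app * correction_factor


# Synthetic, noise-free illustration (not measured data).
ligand = serial_dilution(2e-3)                    # 2 mM start, 16 points
signal = [proximity_ratio(l, 5e-5, 1.2, 0.6) for l in ligand]
ce_app = fit_ce_app(ligand, signal)
ce = true_ce(ce_app)
print(f"ce,app = {ce_app * 1e6:.0f} uM, ce = {ce * 1e3:.2f} mM")
```

The grid resolution (100 points per decade) limits the precision of the recovered midpoint to a few percent; a real analysis would use a proper non-linear optimizer and the triplicate measurements to estimate uncertainty.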
Results
Reporter design
We designed a fusion protein inspired by FRET biosensors33 to measure effective concentrations for different linker architectures. An exchangeable linker joins two protein domains that form an interaction pair. These domains are flanked by the fluorescent proteins mClover3 and mRuby3, which form a FRET pair34 (Fig. 1A). When the intra-molecular complex is formed, the fluorescent proteins are brought into close contact resulting in efficient FRET. The effective concentration is measured by following the FRET efficiency in a titration,11 where a free ligand displaces the intra-molecular interaction. The ideal interaction pair is small, has a known 3D structure, is easy to produce in E. coli and binds tightly to ensure full ring-closing. Furthermore, in the bound state the N-terminus of one protein should be close to the C-terminus of the other and vice versa. This ensures a high FRET efficiency in the closed state and allows even a short linker to join the domains without strain. These constraints are ideally fulfilled by an anti-parallel heterodimeric coiled-coil such as the complex between MBD2 and p66α used here (Fig. 1B).35
The fusion proteins have purification tags at both the N- and C-termini to allow parallel purification of many constructs by sequential affinity chromatography. After purification, SDS-PAGE revealed a major band corresponding to the expected molecular weight of the fusion protein (Fig. 1C). The four minor bands corresponded to proteolytic cleavage in the mRuby3 domain as revealed by mass spectrometry. These bands could not be removed by gel filtration. This suggests that the cleaved fluorescent proteins remain in a stable complex, although it is not clear whether it is fluorescent. A fraction of inactive fluorophores will decrease the FRET amplitude, but will not affect the mid-point of the titration and the measurements of effective concentrations.
Measurement of effective concentration
Fusion proteins were initially constructed with linkers consisting of (GS)n repeats ranging from 20 to 120 residues. A 240-residue GS-linker was also tested, but resulted in insoluble protein. Titration of all constructs with free WT MBD2 peptide resulted in a sigmoidal decrease of the proximity ratio (Fig. 2A) where the titration midpoint moved to lower concentrations with increasing linker length. Simultaneously, the proximity ratio of the post-titration baseline decreased with linker length, consistent with a more expanded open form (Fig. 2A). Across the whole dataset, the pre- and post-transition proximity ratios varied unsystematically, which we believe was due to small differences in fluorophore maturation. This complicated the determination of intra-molecular distances from FRET values, and therefore we only extracted the midpoint of the titration.
Fitting of titration data requires the concentration of the free ligand to exceed the mid-point by a factor of ten. This is impractical for effective concentrations that were expected to reach the millimolar range. Therefore, the fusion proteins contained a weakened mutant of MBD2, and were titrated with WT MBD2 peptide. This shifted the midpoint by the ratio between WT and mutant affinities, which subsequently was applied as a correction factor. To determine the correction factor, we titrated the fusion protein containing the GS120 linker with both WT and mutant peptide (Fig. 2B). The titration midpoint was shifted by a factor of 30, which was applied to the titration midpoint of all variants to produce the effective concentration. As it was impractical to prepare competitor peptides for each linker composition, we used the titrant peptide with a flanking GS-segment for all other linker compositions. As the flanking linker residues may affect the stability of the complex, the correction factor may differ for other linker compositions. A mis-matched correction factor results in a constant shift of the polymer scaling law, but should not affect the scaling exponent.
The effective concentration scales with linker length following a power law as shown by the straight line in Fig. 2C. Notably, this conclusion does not require any assumptions about the distribution of the linker conformations. Relative to one binding partner, geometric considerations suggest that the volume accessible to the other partner scales with the linker length with an exponent of 3v (Fig. 2D). Accordingly, the effective concentrations should scale with an exponent of −3v. Fitting of effective concentrations from the GS-linker series gave a scaling exponent of −1.46 corresponding to a v of 0.49. The GS-linker is a polar tract in a recent systematic classification,28 and is thus expected to form relatively compact globules due to backbone interactions.36 Accordingly, the GS-linker was slightly more compact than denatured proteins37 and IDPs.21–23 The excellent agreement with a power law and scaling exponent of approximately −3v suggested that it is valid to estimate effective concentrations based on geometric considerations.
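The conversion from a fitted power law to the scaling exponent v can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the data are synthetic and noise-free, generated from the reported exponent of −1.46 with a hypothetical prefactor of 1 in arbitrary units, and the function names are not from the original analysis. It fits a straight line to log(ce) versus log(N) and then applies the geometric relation that ce scales as N^(−3v).

```python
import math


def fit_power_law(lengths, conc):
    """Least-squares line through (log N, log c_e); returns (exponent, prefactor)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(c) for c in conc]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    prefactor = math.exp(my - slope * mx)
    return slope, prefactor


def exponent_to_v(exponent):
    """Geometric model: c_e ~ 1/volume ~ N**(-3v), so v = -exponent / 3."""
    return -exponent / 3.0


# Synthetic illustration (not measured data): c_e = A * N**-1.46
lengths = [20, 40, 60, 120]      # linker lengths in residues
A = 1.0                          # hypothetical prefactor, arbitrary units
conc = [A * n ** -1.46 for n in lengths]

exponent, prefactor = fit_power_law(lengths, conc)
v = exponent_to_v(exponent)
print(f"exponent = {exponent:.2f}, v = {v:.2f}")  # exponent = -1.46, v = 0.49
```

With only three or four linker lengths per series, as used here, the uncertainty of the fitted exponent depends strongly on the measurement error of each point, so a real analysis would also report confidence intervals.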
Variation of the linker sequence
The excellent agreement with a power law suggested that effective concentrations could be used to probe the sequence-compaction relationship of IDPs. To probe how the effective concentration depends on linker sequence, we systematically varied the properties that were likely to affect linker compaction: charge, ampholyte strength, rigidity, hydrophobicity and aromaticity. We systematically increased the net charge of the linker by introduction of charged residues into a GS-linker. All linkers used here have a uniform pattern throughout the sequence (Fig. 3A), which allows linkers of different lengths to be described by a single scaling exponent. For each linker composition, we generated linkers with a total of 20, 40, 60 and 120 residues, and measured the effective concentrations through titration experiments. In a few cases near the limit of solubility, the longest linker was shortened to 80 residues or the series only contained the three shorter linkers (Table S1). Each linker composition followed a power law as illustrated for linkers containing glutamate residues (Fig. 3B, Fig. S1). Both the pre-factor and the scaling exponent from the fit varied with linker composition. The variation of the pre-factor may be due to mis-match of the correction factor, so in the following we concentrate on the scaling exponent that reports on linker compaction.
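The reason a mis-matched correction factor affects only the pre-factor can be made explicit with a short derivation. As an assumption for illustration, the error is modeled as a constant multiplicative factor k that is the same for all linker lengths within a series, and A denotes the pre-factor of the power law; neither symbol appears in the original text.

```latex
c_e = A\,N^{-3v}
\quad\Longrightarrow\quad
\log\!\left(k\,c_e\right) = \left(\log k + \log A\right) - 3v\,\log N
```

In a log-log plot, the factor k therefore shifts the intercept by log k, while the slope of −3v is unchanged. This holds only as long as k does not itself vary with linker length.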
Are all charged residues equal
To test if all charged residues affect linker compaction equally, we measured effective concentrations for linkers containing increasing amounts of the four charged residues. For each residue type, increased net charge per residue led to chain expansion (Fig. 3C). Glutamate and aspartate caused an equal expansion: The scaling exponent changed gradually from −1.46 (v=0.49) in the uncharged linker to ∼-2.1 (v=∼0.7) at a net charge per residue of 0.2. The scaling exponent did not change further with increasing net charge, suggesting an upper limit beyond which the linker does not expand. This limit corresponds to the v previously observed for highly charged IDPs.24 Lysine-containing linkers initially followed the expansion of negatively charged linkers, but continued up to a scaling exponent of ∼-2.4 (v=∼0.8). This is higher than the v-values reported for any other disordered protein. In contrast, arginine-containing linkers had the opposite effect: Residue fractions up to 0.1 had the same scaling exponent as GS-linkers. At higher fractions, the magnitude of the scaling exponent increased, but did not reach the same expansion as the other residue types even at the highest fractions permitted by solubility. This demonstrated that the charged residue types have different impacts on IDP compaction.
Polyampholyte linkers
Most IDPs are polyampholytes, which entails that chain compaction is determined by the balance between attractive and repulsive interactions. To investigate the effect of ampholyte strength, we created linkers with equal numbers of glutamate and arginine residues. At low fractions of charged residues, the scaling exponent was roughly constant, but there appeared to be a threshold after which the chain contracted dramatically (Fig. 4A). The last data point at a fraction of charged residues of 0.67 suggested a scaling exponent of −1 (v=0.33), which is the same compaction as a globular protein. This observation agrees qualitatively with the contraction caused by screening of charges in a polyampholyte,38 but contradicts the diagram-of-states representation of IDP classes.28,29
The effect of chain rigidity
The rigidity of the IDP backbone is mainly determined by the fraction of proline and glycine residues. The baseline GS-linker is highly flexible, and we therefore increased rigidity by introducing proline residues (Fig. 4B). The scaling exponent reached a plateau at ∼-1.7 (v = ∼0.57) at a proline fraction of 0.1, which is similar to the dimensions of chemically denatured proteins.37 Proline thus expanded the linker to a lesser extent than charged residues, in agreement with previous studies.21 Unlike the other series tested here, the expansion with increasing proline content appeared non-monotonic. This could suggest that different effects dominate at different fractional contents of proline, e.g. the propensity to form polyproline II helices.
The effect of hydrophobic interactions
It is unclear how strongly hydrophobicity affects compaction of disordered proteins. In IDPs, hydrophobicity is weakly anti-correlated with v,21 whereas in disordered states of foldable proteins this anti-correlation is strong.24 To assess the effect of hydrophobicity systematically, leucine residues were introduced into the linker. We chose leucine as it is among the most hydrophobic of the non-aromatic residues,39 but is not β-branched and thus less likely to perturb backbone dihedral distributions. Due to solubility, the linker sequence could only be extended up to leucine fractions of 0.2. Introduction of leucine residues led to a small decrease in scaling exponent to −1.4 (v=0.47) (Fig. 4C). This qualitatively agreed with a weak anti-correlation between size and hydrophobicity.
π-interactions between aromatic residues
Interactions between π-electrons in aromatic side chains have a strong potential to induce intra-chain interaction, most recently demonstrated by their effect on liquid-liquid phase-separation.40 We introduced tyrosine residues as it is the least hydrophobic aromatic amino acid and has the largest effect on phase separation.40 Tyrosine residues caused a noticeable reduction in the scaling exponents already at a fractional content of 0.1, where the scaling exponent was ∼-1.2 (v=0.4) (Fig. 4D). The contraction was smaller than the contraction observed for polyampholytes, although the fractional content of aromatic residues cannot be increased as far due to insolubility. However, the effect of tyrosine residues was larger than that of leucine, which suggests that π-interactions are more important to chain compaction than hydrophobicity.
Are contributions to chain compaction additive
Previous studies have identified individual factors that affect IDP compaction, but have not clarified whether they are additive. We therefore constructed linkers simultaneously increasing net charge and proline content. When probed alone, proline expanded linkers measurably already at a fraction of 0.05. However, when proline was introduced together with a charged residue, it did not lead to further expansion, but simply resulted in the same scaling exponents as glutamate-only linkers (Fig. 5A). This suggested that the expansions caused by chain rigidity and net charge are not additive. Instead, the mixed sequences were simply dominated by the strongest individual effect.
Does hydrophobicity modify charge expansion
Hydrophobicity had a surprisingly small effect when investigated alone. To further test this conclusion in the context of additivity, we created mixed linker series containing glutamate and leucine residues in equal proportion. Again, the scaling exponent followed that of the charge series alone. This series changed two parameters at once, so we designed new linker series where either the glutamate content or the leucine content was varied and the other held constant at a residue fraction of 0.1. When the charge expansion was probed in the context of a fraction of 0.1 leucine residues, the curve again followed the glutamate-only series (Fig. 5B). When the fraction of leucine residues was increased within the limits permitted by solubility, the scaling exponent decreased slightly, although within error (Fig. 5C). In total, these data suggest that hydrophobicity in itself has a vanishing effect on the compaction of IDPs, at least within the limits of protein solubility. Furthermore, the factors that affect IDP compaction are not necessarily additive.
Discussion
Linkers control many biochemical reactions via the effective concentration. Here we have shown that effective concentrations in multidomain proteins with disordered linkers follow polymer scaling laws. We have thus experimentally validated the geometric models commonly used to estimate effective concentrations, but also show that the effective concentration depends strongly on linker sequence. For a 100-residue linker, the difference between v values of 0.4 and 0.7 corresponds to a 63-fold change in effective concentration. This can be the difference between an intra-molecular interaction being saturated or hardly formed at all. Changes in the linkers following e.g. ligand-binding or post-translational modification may thus be directly transmitted into allosteric regulation of the domains tethered at the end. Linkers may thus be one of the most direct examples of how the structural properties of intrinsically disordered proteins affect biochemical function, underscoring the need to understand the relationship between IDP sequence and compaction. In the short-term, direct measurement of the effective concentrations using the system developed here may help us understand allostery in IDPs.9
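The 63-fold figure follows directly from the scaling law. As a minimal sketch, assuming that the effective concentration scales as N^(−3v) and that the two linkers share the same pre-factor (an assumption made only for this illustration), the ratio reduces to N^(3Δv). The function name below is hypothetical.

```python
def fold_change(n_residues, v_compact, v_expanded):
    """Ratio of effective concentrations (compact over expanded).

    Assumes c_e ~ N**(-3v) with identical prefactors for both linkers,
    so the ratio is N**(3 * (v_expanded - v_compact)).
    """
    return n_residues ** (3.0 * (v_expanded - v_compact))


ratio = fold_change(100, 0.4, 0.7)   # 100**0.9
print(f"{ratio:.0f}-fold")           # 63-fold
```

Because the exponent difference is multiplied by three and applied to the chain length, the same change in v has a larger effect for longer linkers.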
Sequence-structure relationships in IDPs
Polymer models have been used successfully to describe the structural properties of IDPs and are the foundation for theoretical predictions of effective concentrations. Here we show that the relationship can be reversed: Measurement of effective concentrations is an efficient way to parametrize polymer descriptions of IDPs and to describe the relationship between sequence and compaction. While we study interdomain linkers, it is likely that the conclusions can be generalized to other types of IDPs.
Net charge
Net charge is the strongest predictor of the compaction of IDPs.21,24,25 Previous studies suggest that the maximal expansion is reached at a net charge per residue of ∼0.4. In contrast, we find a value of ∼0.2 (Fig. 3C). In previous studies, the charge expansion occurred against a complex background sequence, and there may thus be compensating attractive interactions. Our value may thus represent a pure estimate of how much charge density it takes to fully expand an otherwise inert IDP, whereas the value of around 0.4 may be more relevant for complex sequences with additional attractive interactions. Another key difference is the role of arginine residues. Arginine forms attractive interactions that partially offset the charge-charge repulsion. This recapitulates the role of arginine in self-association of IDPs during liquid-liquid phase separation, and may thus be due to its capacity to form π-interactions.40 Conveniently, for most positively charged protein sequences this effect may be offset by the higher than average repulsion of lysine residues. Therefore, chain properties may be well-described in terms of the net charge density as long as lysine and arginine are equally common.
Polyampholyte sequences
The compaction of polyampholytes is determined by the balance between attractive and repulsive interactions. As controlled experiments on polyampholytic IDPs have been scarce, organic polyampholytes have been used as models to understand the impact of ampholyte strength on IDP structure. Organic polyampholytes form a range of compact structures, and at high charge densities they eventually become almost globular albeit with a liquid-like internal structure.41 It is not clear how well such models describe proteins. In one case, an increase in ionic strength led to expansion of a polyampholytic protein,38 which suggested that polyampholyte interactions are overall attractive. This conclusion is also supported by the compact state adopted by the complex between two oppositely charged IDPs.42 On the other hand, a computational study of polyampholyte sequences suggested that the compact state only arises if charges are unevenly distributed, whereas well-mixed polyampholytes were predicted to form expanded coils.29 This is summarized in the diagram-of-states description of IDPs, where an increase in the strength of a neutral polyampholyte leads to a globule-to-coil transition.28 In contrast, we find that an increase in polyampholyte strength leads to compaction (Fig. 4A), which eventually reaches the same dimensions as a globular protein. One possible explanation for this discrepancy is the difference between arginine and lysine. The computational study used lysine,29 whereas we used arginine. Whether ampholyte interactions are attractive or repulsive may thus depend on the type of positive amino acids.
Hydrophobic side chains in IDPs
Hydrophobic side chains in IDPs are mostly solvent exposed. Contraction may bring such side chains in proximity to interact and thus form a partial protection from the solvent. Therefore, hydrophobicity has been found to be anti-correlated with v in disordered proteins.21,24 By testing this finding in a variety of sequence contexts, we found that increasing the hydrophobicity did not lead to a noticeable contraction of the linker. One explanation for this is that the disorder of the chain prevents the proteins from forming even partially desolvated hydrophobic interactions. In complex sequences, hydrophobicity correlates with other factors that could cause chain compaction. Such confounders could explain the correlation of hydrophobicity with the compaction of unfolded states of foldable proteins.24 A key candidate for such a confounder is aromatic residues, which we found to induce strong compaction of the linkers. A tyrosine fraction of 0.1 was thus sufficient to contract the linker to dimensions observed for unfolded, but foldable proteins.24
The additivity of IDP compaction
Several properties affect chain compaction, but how do they add up? Both charge and proline residues expand the linker; however, the combination of the two did not expand the linker more than charge alone. This demonstrates that not all contributions to chain compaction are additive. A likely explanation is that rigidity added by proline residues can be accommodated inside the ensemble expanded by charge-charge repulsion. The reason the net charge density is such a good predictor of the compaction of IDPs may thus be that the strongest effect dominates in the absence of additivity. The most likely explanation for the different threshold for charge expansion is thus the presence of compensating attractive interactions. This suggests an additivity between some factors, but likely not all, and demonstrates that much remains to be uncovered about the sequence-compaction relationship of IDPs.
The value of synthetic IDPs
Our present understanding of the relationship between sequence and structure in IDPs is mainly based on the study of natural proteins. In contrast, we have studied synthetic linkers never seen in nature. The synthetic linkers allow tight control over the physical properties of the linker, which is crucial for hypothesis testing. Experiments on synthetic IDPs are thus a natural step for critically evaluating our understanding of IDPs. Synthetic DNA has removed the need for the ingenious cloning strategies used previously,43 leaving protein preparation as the major bottleneck. For artificial proteins spanning a range of physical properties, this can however be a major challenge. We have for example not been able to make our linkers in isolation yet. The fusion protein used here serves both as a solubility tag and as a reporter system operating at nM concentrations. These are likely the key factors that have allowed us to study a broader range of IDPs than previous studies, and suggest that such fusion proteins provide useful tools for future investigations of sequence-structure relationships in IDPs.
Acknowledgments
This work was supported by grants to M.K. from the “Young Investigator Program” of the Villum Foundation and the AIAS COFUND program (Agreement No. 609033). We wish to thank Birthe B. Kragelund, Mateusz Dyla and Xavier Warnet for critical comments on this manuscript, and Tanja Klymchuk for technical assistance.