Abstract
Determining accurate atomic resolution conformational ensembles of intrinsically disordered proteins (IDPs) is extremely challenging. Molecular dynamics (MD) computer simulations provide atomically detailed conformational ensembles of IDPs, but their accuracy is highly dependent on the quality of the underlying physical models, or force fields, used. Integrative methods that combine experimental data with computational models offer a promising approach to address force field limitations and generate accurate conformational ensembles of IDPs, shedding light on their functional mechanisms. Here, we present a simple and robust maximum entropy reweighting procedure to refine atomic resolution conformational ensembles of IDPs with large experimental datasets consisting of several different types of data. We apply this approach to refine structural ensembles obtained from long timescale MD simulations and generate IDP ensembles with substantially improved agreement with a variety of nuclear magnetic resonance (NMR) and small-angle X-ray scattering (SAXS) measurements. We ask if reweighted IDP ensembles derived from MD simulations run with different force fields converge to similar conformational distributions when extensive experimental datasets are used for refinement. We find that in favorable cases IDP ensembles derived from different force fields become highly similar after reweighting with experimental data. The maximum entropy reweighting procedure presented here enables the integration of atomic resolution MD simulations with extensive experimental datasets and can facilitate the elucidation of accurate, force field independent conformational ensembles of IDPs.
Introduction
Many proteins that perform important biological functions are completely or partially disordered under physiological conditions.1,2 These so-called intrinsically disordered proteins (IDPs) lack a well-defined tertiary structure in solution and instead populate a conformational ensemble of rapidly interconverting structures. Structurally characterizing the heterogeneous conformational ensembles adopted by IDPs can provide mechanistic insight into their physiological interactions and functions.3,4 IDPs are implicated in many human diseases and are increasingly being pursued as drug targets.5 Determining accurate conformational ensembles of IDPs, both alone and in complex with small molecules, can provide valuable insight for the rational design of IDP inhibitors.6–11
Experimentally determining atomic resolution conformational ensembles of IDPs is extremely challenging. Most experimental techniques used to structurally characterize IDPs in solution, such as nuclear magnetic resonance (NMR) spectroscopy and small angle X-ray scattering (SAXS), report on conformational properties averaged over many molecules over long periods of time.12,13 Such ensemble-averaged measurements can be consistent with a large number of conformational distributions. Typical experimental datasets used to characterize conformational ensembles of IDPs are also sparse, meaning that they report on a small subset of structural properties of IDPs. Additionally, many experimental data, such as NMR chemical shifts, are challenging to interpret and predict, as they are sensitive to a combination of many structural properties.14–18
Molecular dynamics (MD) computer simulations are a powerful approach for determining atomic resolution conformational ensembles of IDPs in silico. In principle, long timescale all-atom MD simulations of an IDP driven by an accurate physical model, or force field, can provide atomically detailed structural descriptions of the conformational states populated in solution along with their equilibrium populations. In practice, however, MD simulations are limited by the accuracy of the force fields used to describe the interactions between atoms in molecules.19–24 Recent improvements in molecular mechanics force fields and water models have dramatically improved the accuracy of MD simulations of IDPs as assessed by agreement with a large variety of experimental measurements. However, systematic and system-dependent discrepancies between simulations and experiments remain among the best performing force fields.19–24
Due to the challenges of determining conformational ensembles of IDPs from experimental or computational methods alone, integrative approaches, where experimental data are used to construct or refine computational models of IDP ensembles, have grown increasingly popular.25–37 The maximum entropy principle38,39 is the basis for a number of successful reweighting 25–33 and biasing 40–44 approaches to determine conformational ensembles of proteins. In the reweighting framework, after generating a conformational ensemble using a technique such as unbiased MD simulations, one seeks to introduce the minimal perturbation to the ensemble required to match a given set of experimental data. This is achieved by assigning a new statistical weight to each conformation of the ensemble, thus promoting or disfavoring regions of the conformational space and ultimately altering the size of the reweighted ensemble (the effective ensemble size).
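To make the reweighting framework concrete, the sketch below solves a maximum entropy problem with a Gaussian error model, in the style of the formalism of Bussi and coworkers (refs 26–28). Frame weights take the form w_i ∝ w_i^0 exp(−λ·s(x_i)), and the Lagrange multipliers λ minimize the convex function Γ(λ) = ln Z(λ) + λ·s_exp + ½ Σ_j σ_j² λ_j². At the optimum, the reweighted averages satisfy ⟨s_j⟩ = s_exp,j + σ_j² λ_j. A smaller σ_j therefore enforces restraint j more strictly. This is a minimal sketch and not the authors' implementation. The function name `maxent_reweight`, the Newton solver with backtracking, and the exact form of Γ are assumptions, and they may differ in detail from the objective given in the Theory section (Eq. 9).

```python
import numpy as np


def maxent_reweight(s, s_exp, sigma, w0=None, max_iter=100, tol=1e-10):
    """Maximum entropy reweighting with a Gaussian error model (sketch).

    s      : (n_frames, n_obs) forward-model predictions for every frame
    s_exp  : (n_obs,) experimental values
    sigma  : scalar or (n_obs,) regularization parameters (error scales)
    w0     : optional prior frame weights (uniform if None)
    Returns the reweighted frame weights and the Lagrange multipliers.
    """
    s = np.asarray(s, dtype=float)
    n_frames, n_obs = s.shape
    s_exp = np.asarray(s_exp, dtype=float)
    sig2 = np.broadcast_to(np.asarray(sigma, dtype=float) ** 2, (n_obs,))
    w0 = np.full(n_frames, 1.0 / n_frames) if w0 is None else np.asarray(w0, float) / np.sum(w0)

    def weights(lam):
        a = -s @ lam
        w = w0 * np.exp(a - a.max())  # shift the exponent for numerical stability
        return w / w.sum()

    def gamma(lam):
        a = -s @ lam
        log_z = a.max() + np.log(np.sum(w0 * np.exp(a - a.max())))
        return log_z + lam @ s_exp + 0.5 * np.sum(sig2 * lam ** 2)

    lam = np.zeros(n_obs)
    for _ in range(max_iter):
        w = weights(lam)
        avg = w @ s
        grad = s_exp - avg + sig2 * lam  # dGamma/dlambda
        if np.linalg.norm(grad) < tol:
            break
        ds = s - avg
        hess = (ds * w[:, None]).T @ ds + np.diag(sig2)  # weighted covariance + sigma^2
        step = np.linalg.solve(hess, grad)
        # Backtracking line search on the convex objective Gamma.
        t, g0, slope = 1.0, gamma(lam), grad @ step
        while gamma(lam - t * step) > g0 - 0.25 * t * slope and t > 1e-8:
            t *= 0.5
        lam = lam - t * step
    return weights(lam), lam


# Usage with hypothetical predictions for two observables.
rng = np.random.default_rng(1)
pred = rng.normal(size=(5000, 2))
w, lam = maxent_reweight(pred, s_exp=[0.3, -0.2], sigma=0.01)
print(w @ pred)  # close to [0.3, -0.2]
```

With a very large σ, the restraints are effectively switched off and the weights remain close to the prior. This is the trade-off that the regularization parameters control.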
The conformational ensembles of IDPs derived from integrative approaches are highly sensitive to the strengths of the restraints used to enforce agreement with experimental data. These strengths reflect the relative confidence in the experimental data compared to the prior simulation model (i.e., the molecular mechanics force field) and should account for errors and uncertainties in both experimental measurements and computational forward models used to predict experimental data from structures.12 When integrating experimental data from several sources, researchers are often required to make subjective decisions about the relative importance of satisfying different classes of restraints.
In maximum entropy reweighting approaches, regularization parameters are often used to balance confidence in the prior simulation model against the satisfaction of each class of experimental restraints.29–33 Choosing regularization parameters effectively requires researchers to decide what fraction of an initial ensemble they are willing to modify to satisfy a set of experimental data within a specified threshold. If many of the conformations in an initial ensemble are in poor agreement with experimental data, this choice can dramatically reduce the effective ensemble size of the reweighted ensemble. While several Bayesian and heuristic approaches have been proposed to infer optimal values of regularization parameters,29–33,35,45 these approaches remain sensitive to estimates of the errors of forward models and experimental measurements, and/or subjective decisions about the desired level of agreement with unrestrained cross-validating data. This flexibility complicates the development of automated procedures for calculating structural ensembles of IDPs and objectively comparing the accuracy of IDP ensembles obtained by reweighting ensembles derived from different force fields.
Here, we introduce a maximum entropy reweighting procedure for combining several types of experimental data with a single adjustable parameter: the desired effective ensemble size of the final reweighted ensemble. This approach does not require manually tuning regularization parameters or the relative weights of experimental restraints. We use our approach to address the following question: if one uses all available experimental data to reweight IDP conformational ensembles obtained from long timescale MD simulations run with different state-of-the-art force fields, how similar are the resulting ensembles? In other words, when extensive experimental data are available, can we determine force field independent IDP ensembles? This question is particularly timely as IDP simulations are increasingly being used to train machine learning approaches based on deep generative models to predict the conformational ensembles of IDPs.46 These methods potentially offer an efficient alternative to MD for generating conformational ensembles in the same way that methods like AlphaFold347 provide accurate structural models of folded proteins. To be most effective, training and validating these approaches will require improved examples of “ground truth” conformational ensembles of IDPs.
To address the questions above, we reweighted conformational ensembles of five well-studied IDPs obtained with unbiased long timescale MD simulations run with different force fields.19 We considered five IDPs ranging in length from 40 to 140 residues: Aβ40,48 drkN SH3,49 ACTR,50 PaaA2,51 and α-synuclein.52 We first demonstrate that our reweighting procedure simultaneously improves agreement with several types of NMR data and SAXS data while maintaining reasonable statistical sampling of the most populated conformational states observed in unbiased MD simulations. In all cases studied, we find that the ensembles of IDPs derived from different force fields become more similar to one another upon reweighting. In favorable cases, we observe that reweighted ensembles converge to very similar conformational distributions, which may be considered a reasonable approximation of the true solution ensemble. We observe that MD ensembles of longer IDPs and of IDPs with highly stable secondary structure elements derived from different force fields can have little-to-no overlap in conformational space before or after reweighting. Our analyses underscore the importance of using accurate force fields and achieving sufficient sampling in MD simulations of IDPs, even if reweighting methods are used to refine the resulting ensembles with experimental data. We anticipate that the reweighting protocol proposed here will provide a valuable tool for integrating MD simulations with extensive experimental datasets to improve the accuracy of atomic resolution conformational ensembles of IDPs and will facilitate the calculation of accurate, force field independent ensembles of IDPs.
Results
In this investigation we performed maximum entropy reweighting of IDP conformational ensembles obtained from long timescale MD simulations run with different state-of-the-art force fields and compared the resulting ensembles. We performed reweighting on 30 µs unbiased MD simulations of five IDPs that were previously used to benchmark the ability of modern force fields to describe both IDPs and structured proteins:19 Aβ4048 (40 residues), drkN SH349 (59 residues), ACTR50 (69 residues), the ParE2-associated antitoxin (PaaA2)51 (70 residues), and α-synuclein52 (140 residues). These IDPs span a range of secondary structure propensities: Aβ40 and α-synuclein contain little-to-no experimentally detectable residual secondary structure, ACTR and drkN SH3 contain regions of residual helical structure, and PaaA2 contains two stable helical elements connected by a flexible linker.
We compared MD simulations run with three different protein force field and water model combinations: a99SB-disp 19 with a99SB-disp water, Charmm22*53 with TIP3P water,54 and Charmm36m22 with TIP3P water. We henceforth refer to these protein force field and water model combinations as a99SB-disp, C22* and C36m, respectively. We selected these force field combinations as they were among the top performing force fields in a previous benchmark study considering both folded and disordered proteins.19 We used previously reported experimental datasets containing different combinations of experimental NMR and SAXS data to reweight simulations of each protein. Backbone NMR chemical shifts48–51,55 and residual dipolar couplings (RDCs)50–52,56,57 were used as restraints for all proteins. Three-bond backbone 3J scalar coupling constants were used as restraints for Aβ4058 and α-synuclein59 ensembles. Paramagnetic relaxation enhancements (PREs) were used as restraints for drkN SH3,34 ACTR,50 and α-synuclein52,60–63 ensembles. SAXS data were used as restraints for drkN SH3,64 ACTR,65 PaaA2,51 and α-synuclein37 ensembles.
Developing a maximum entropy reweighting procedure with a single free parameter
We provide a detailed description of our reweighting procedure in the Theory section. Here, we summarize the key features of our approach and provide illustrative examples that highlight the rationale behind the choices made. We build upon the formalism of Bussi and coworkers,26–28 where the weight of the restraint on each experimental data point i is determined by the value of an adjustable regularization parameter σi (Eq. 9). A key feature of our reweighting procedure is the use of effective ensemble sizes of reweighted ensembles to select the values of regularization parameters for reweighting. We quantify the effective ensemble size of a reweighted ensemble using the Kish ratio28,66 (Eq. 11). Briefly, the Kish ratio (K) measures the fraction of frames of an original ensemble that retain significant statistical weights after reweighting. A reweighted ensemble where ∼90% of the frames have statistical weights close to zero will have a Kish ratio of K ≈ 0.10. Similarly, if an unbiased MD ensemble contains 10,000 frames, a reweighted ensemble with a Kish ratio K ≈ 0.01 would contain approximately 100 frames with statistical weights substantially larger than zero.
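The Kish ratio depends only on the frame weights. The sketch below assumes the standard Kish definition, K = (Σ_i w_i)² / (N Σ_i w_i²), which we take to correspond to Eq. 11, and reproduces the 10,000-frame example above. The function name `kish_ratio` is illustrative.

```python
import numpy as np


def kish_ratio(weights):
    """Kish effective sample size divided by the number of frames N.

    Invariant to the overall scale of the weights, so they need not be normalized.
    """
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / (w.size * np.sum(w ** 2)))


n_frames = 10_000
uniform = np.ones(n_frames)            # unbiased ensemble: every frame counts equally
sparse = np.zeros(n_frames)
sparse[:100] = 1.0                     # only 100 frames keep appreciable weight
print(kish_ratio(uniform))             # 1.0
print(kish_ratio(sparse))              # 0.01
print(kish_ratio(sparse) * n_frames)   # about 100 effective frames
```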
We initially attempted to reweight IDP ensembles using a cross-validation approach to determine the relative weights of restraints for different experimental data types. We performed reweighting using a single value of the regularization parameter σ for all experimental data points of a given type (e.g., Cα chemical shifts, Cβ chemical shifts, SAXS intensities) and iteratively decreased the value of σ while monitoring: i) the root mean squared error (RMSE) between calculated and experimental data used as restraints; ii) the RMSE between calculated and experimental cross-validating data not used as restraints; and iii) the Kish ratio of the reweighted ensemble. We sought to identify an optimal value of σ for each experimental data type corresponding to the point where further reducing σ caused agreement with unrestrained cross-validating experimental data to deteriorate. We found, however, that many experimental data types were highly correlated and that the RMSE of several cross-validating data sets continued to improve as the Kish ratio approached zero (SI Figure 1).
We tested several Bayesian and heuristic approaches to infer the optimal values of regularization parameters for each type of experimental data by monitoring the agreement of restrained and unrestrained cross-validation data using empirical RMSE thresholds and L-curve or “elbow method” analyses.29,31,45 In many instances, we observed that these approaches produced reweighted ensembles with extremely small effective ensemble sizes, where only hundreds of structures from an initial MD ensemble of nearly 30,000 structures maintained appreciable statistical weights after reweighting (K < 0.01). When reweighting ensembles derived from different force fields, we observed that these approaches produced reweighted ensembles with dramatic differences in their effective ensemble size, depending on the initial agreement of unbiased MD ensembles with experimental data. This complicated global comparisons of the accuracy and structural properties of reweighted ensembles derived from different force fields.
As we desire reweighted ensemble sizes that are large enough to provide reasonable statistics for the conformational properties of IDPs, such as the overall topology of backbone conformations and the relative positions of sidechains, we opted to use the Kish ratio of reweighted ensembles to determine regularization parameters for each data type. To account for the errors of forward models used to calculate experimental data and the effects of calculating average quantities over a finite-size ensemble, we introduced an additional regularization parameter σi,MD for each experimental data point i used for reweighting (Eq. 10). We calculated σi,MD by applying the Flyvbjerg block analysis method67 to the forward model predictions obtained from the unbiased MD trajectory. With this addition, we obtain one regularization parameter σi for each experimental data point i, which combines σi,MD with a data-type-specific parameter σreg (Eq. 10). σreg is the same for all data of a given type (e.g., σreg−Cα, σreg−Cβ, σreg−SAXS).
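The block analysis step can be sketched as follows. This is a minimal version of the Flyvbjerg and Petersen blocking transformation applied to a per-frame forward-model prediction. Neighbouring frames are repeatedly averaged in pairs, and the standard error of the mean is estimated at each block size. The estimate grows and then plateaus once blocks are longer than the correlation time. The plateau rule in `block_standard_error` (the largest estimate among levels with at least 32 blocks) is a hypothetical heuristic chosen for illustration. The way the resulting σi,MD enters Eq. 10 is not reproduced here.

```python
import numpy as np


def blocking_levels(x):
    """Flyvbjerg-Petersen blocking transformation of a time series.

    Returns a list of (n_blocks, standard_error_estimate) for block sizes
    1, 2, 4, ... The estimate sqrt(c0 / (n - 1)), with c0 the (ddof=0) variance
    of the blocked series, plateaus once blocks exceed the correlation time.
    """
    x = np.asarray(x, dtype=float)
    levels = []
    while x.size >= 2:
        n = x.size
        levels.append((n, float(np.sqrt(np.var(x) / (n - 1)))))
        if n % 2:
            x = x[:-1]                    # drop one point so that pairs line up
        x = 0.5 * (x[0::2] + x[1::2])     # average neighbouring blocks
    return levels


def block_standard_error(x, min_blocks=32):
    """Hypothetical plateau heuristic: the largest estimate among levels that
    still contain at least `min_blocks` blocks (requires len(x) >= min_blocks)."""
    return max(err for n, err in blocking_levels(x) if n >= min_blocks)


# Usage on a hypothetical, time-correlated per-frame prediction (AR(1) process).
rng = np.random.default_rng(0)
phi, n = 0.9, 2 ** 14
pred = np.empty(n)
pred[0] = rng.normal()
for t in range(1, n):
    pred[t] = phi * pred[t - 1] + rng.normal()
naive = pred.std() / np.sqrt(n)            # ignores time correlation
print(block_standard_error(pred) / naive)  # well above 1 for correlated data
```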
To determine an optimal minimum Kish ratio threshold for reweighting IDP ensembles, we examined how the conformational properties of reweighted ensembles changed as a function of σreg for each data type. We sought to identify a minimum Kish ratio threshold where the conformational properties of reweighted ensembles, such as secondary structure content, radius of gyration (Rg) and populations of intramolecular contacts, remained relatively consistent as σreg was decreased and where further improvements in the agreement between calculated and experimental data were largely achieved by increasing the sparsity of the reweighted ensemble. For each data type, as the weight of experimental restraints was increased (by decreasing σreg), we generally observed an initial regime where agreement with experimental data was achieved by changing the statistical weights of the structures belonging to the most populated regions of conformational space sampled in unbiased MD simulations. As Kish ratios approached smaller values (i.e., K < 0.10), improved agreement with experimental data was often achieved by favoring a smaller number of structures from regions of conformational space with unfavorable free energies in unbiased MD simulations.
An illustrative example of this behavior is shown in Figure 1. Here, we use only two experimental backbone NMR chemical shifts of a single residue as restraints to reweight the unbiased a99SB-disp MD ensemble of PaaA2. We examine how reweighted distributions of the predicted values of these chemical shifts change as we increase the weight of these two restraints. As we decrease σreg from 16.0 to 2.0, agreement with experimental data improves because the statistical weights of conformations near the most populated free energy basins of the original unbiased MD ensemble increase. When we further increase the weight of experimental restraints by setting σreg = 1.0, the Kish ratio drops to K = 0.003, corresponding to an effective reweighted ensemble size of approximately one hundred structures from an initial ensemble of 29,976 structures. Notably, very few of the conformations that maintain non-negligible statistical weights in the reweighted ensemble are located within the free energy basins of the original unbiased MD ensemble (Figure 1). This dramatic reduction in the effective ensemble size and reshaping of the free energy surface yields only a marginal improvement in the agreement between the calculated and experimental values of these chemical shifts relative to the reweighted ensemble obtained with σreg = 2.0, which has a Kish ratio K = 0.11. This marginal improvement is likely within the prediction error of the chemical shift predictor SPARTA+.16
We performed similar analyses for each experimental data type, monitoring how the free energy surfaces of reweighted ensembles and agreement with unrestrained cross-validating data varied with σreg. Based on these analyses, we elected to select regularization parameters using a Kish ratio threshold of K = 0.10. This means we iteratively decrease σreg for each individual data type until the resulting reweighted ensemble has a Kish ratio K < 0.10, and select the smallest value of σreg where K ≥ 0.10 as the final σreg value for that data type. After establishing an initial σreg for each individual data type, we perform a global reweighting using all experimental data as restraints simultaneously. To ensure a suitable ensemble size for the final reweighted ensemble, we rescale the regularization parameter of each data type by a global scaling factor σreg−Global. We iteratively decrease the value of σreg−Global and select the minimum value of σreg−Global that produces a reweighted ensemble with Kish ratio K ≥ 0.10 as the final value for global reweighting using all experimental data as restraints. The full reweighting protocol is illustrated for the a99SB-disp MD ensemble of Aβ40 in Figure 2 and summarized in the Methods section. We note that the choice of an optimal Kish ratio threshold is ultimately a subjective decision that will depend on the initial size of unbiased ensembles, the desired level of agreement with experimental data and the desired degree of statistical sampling of properties of interest in reweighted ensembles. We found that varying the Kish ratio threshold between K = 0.05 and K = 0.25 did not appreciably change the structural properties of reweighted IDP ensembles or any of the conclusions of this study.
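The selection rule above amounts to a one-dimensional scan, sketched below. The function `select_sigma` walks a decreasing grid of σ values and returns the smallest value whose reweighted ensemble still satisfies K ≥ 0.10. `toy_reweight` is a hypothetical stand-in, not the maximum entropy solver: it uses Gaussian likelihood weights so that the example runs on its own. The σ grid, predictions and target value are also illustrative assumptions.

```python
import numpy as np


def kish_ratio(w):
    w = np.asarray(w, dtype=float)
    return float(w.sum() ** 2 / (w.size * np.sum(w ** 2)))


def select_sigma(reweight_fn, sigma_grid, k_min=0.10):
    """Scan sigma from largest to smallest and return the smallest sigma whose
    reweighted ensemble still has K >= k_min (None if even the largest fails)."""
    chosen = None
    for sigma in sorted(sigma_grid, reverse=True):
        if kish_ratio(reweight_fn(sigma)) < k_min:
            break  # stop at the first sigma that drops below the threshold
        chosen = sigma
    return chosen


# Hypothetical forward-model predictions for one observable and its target value.
rng = np.random.default_rng(0)
pred = rng.normal(size=20_000)
target = 1.0


def toy_reweight(sigma):
    # Toy stand-in, NOT maximum entropy: Gaussian likelihood weights around target.
    logw = -0.5 * ((pred - target) / sigma) ** 2
    w = np.exp(logw - logw.max())
    return w / w.sum()


grid = [16.0, 8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.0625]
sigma_reg = select_sigma(toy_reweight, grid)
print(sigma_reg, kish_ratio(toy_reweight(sigma_reg)))
```

The same scan can be reused for σreg−Global by passing a callable that multiplies every per-type σreg by the candidate scaling factor before running the global reweighting.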
Reweighting IDP ensembles improves agreement with extensive experimental data sets from NMR spectroscopy and SAXS
We sought to determine whether our proposed reweighting protocol could simultaneously improve agreement with all data types in large experimental NMR and SAXS datasets for IDPs, or whether improving agreement with some experimental data types requires sacrificing agreement with others. We treat the NMR chemical shifts of each backbone nucleus (Cα, Cβ, C, Hα, H, N), each type of 3J scalar coupling constant (i.e., 3JHNHα and 3JCC) and PREs measured from each nitroxide spin-label as separate experimental data types. The experimental datasets used for reweighting Aβ40, drkN SH3, ACTR, PaaA2 and α-synuclein ensembles therefore contain 7, 10, 11, 8, and 14 experimental data types, respectively. For the a99SB-disp, C36m and C22* MD ensembles of each protein we performed: i) reweighting using each experimental data type as the only restraint; ii) a global reweighting using all experimental data as restraints; and iii) a leave-one-out cross-validation reweighting using all but one experimental data type as restraints.
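The leave-one-out calculations in iii) follow a simple loop, sketched below. For each data type, the ensemble is reweighted with all other data types as restraints, and the RMSE of the withheld type is scored. The names `leave_one_out` and `reweight_fn` are hypothetical. `reweight_fn` stands in for whatever reweighting routine is used and is assumed to return normalized frame weights. The usage example plugs in uniform weights (no reweighting) purely to exercise the loop.

```python
import numpy as np


def rmse(calc, exp):
    diff = np.asarray(calc, dtype=float) - np.asarray(exp, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def leave_one_out(predictions, experiments, reweight_fn):
    """predictions : {data_type: (n_frames, n_obs) forward-model predictions}
    experiments : {data_type: (n_obs,) experimental values}
    reweight_fn : callable(list_of_restraint_types) -> normalized frame weights
    Returns {held_out_type: RMSE of the withheld data in the reweighted ensemble}.
    """
    results = {}
    for held_out in predictions:
        restraints = [t for t in predictions if t != held_out]
        w = np.asarray(reweight_fn(restraints), dtype=float)
        calc = w @ np.asarray(predictions[held_out], dtype=float)  # ensemble averages
        results[held_out] = rmse(calc, experiments[held_out])
    return results


# Hypothetical data for two data types; 100 frames.
rng = np.random.default_rng(2)
preds = {"CA": rng.normal(size=(100, 5)), "RDC": rng.normal(size=(100, 8))}
exps = {"CA": np.zeros(5), "RDC": np.zeros(8)}


def no_reweighting(restraints):
    # Placeholder reweighting routine: ignores the restraints, keeps uniform weights.
    return np.full(100, 1 / 100)


print(leave_one_out(preds, exps, no_reweighting))
```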
We present a comparison of the results of reweighting calculations performed on the a99SB-disp MD ensemble of Aβ40 in Figure 2. We illustrate the process of determining σreg for each experimental data type of Aβ40 in Figure 2A and the process of determining the global regularization parameter scaling factor σreg−Global in Figure 2B. In both plots, we denote the Kish ratio threshold K = 0.10 with a dashed line. In Figure 2B we observe that the RMSE between calculated and experimental data continues to improve for all data types as σreg−Global is decreased beneath the selected K = 0.10 Kish ratio threshold. With our approach, we choose to prioritize obtaining a reweighted ensemble with a desired minimum effective ensemble size over maximizing agreement with experimental data.
We compare the RMSE between calculated and unrestrained experimental data for each leave-one-out cross-validation reweighting calculation performed on the a99SB-disp Aβ40 MD ensemble in Figure 2C. To compare the relative accuracy of ensembles before and after reweighting, we introduce a normalized RMSE (RMSEN), where we normalize the value of the RMSE calculated for each experimental data type by the corresponding value in the unbiased a99SB-disp MD ensemble (RMSEN = RMSE/RMSEa99SB−disp). We note that for this Aβ40 ensemble, with the exception of backbone RDCs, the agreement of unrestrained cross-validating data with experiment continues to improve as the Kish ratio decreases below the selected K = 0.10 threshold. With our approach, we also choose to prioritize obtaining a desired minimum effective ensemble size over maximizing agreement with unrestrained cross-validating data.
We compare the agreement between calculated and experimental data for each reweighted Aβ40 ensemble in Figure 2D. When we use all available experimental NMR data (Cα, Cβ, Hα, H, and N backbone NMR chemical shifts, backbone 3JHNHα coupling constants, and backbone RDCs) as restraints, our reweighting protocol simultaneously improves agreement with all experimental data types (Figure 2D, “All Restraints”). For each experimental data type, we obtain a smaller RMSE between calculated and experimental data when we perform reweighting using only those data as restraints (Figure 2D, “Individual Restraints”) than when we perform reweighting using all available data as restraints. This demonstrates that satisfying multiple types of data requires trade-offs in the agreement with each type of restraint, and that our reweighting procedure naturally balances the weights of different types of restraints without requiring manual adjustment of regularization parameters or cross-validation thresholds.
We observe that the RMSE between calculated and experimental data for each data type withheld in leave-one-out cross-validation reweighting calculations is very similar to the RMSE obtained when reweighting using all experimental data as restraints (Figure 2D, “Cross validation”). This suggests that using a Kish ratio threshold of K ≥ 0.10 to determine regularization parameters for each data type results in minimal overfitting when all experimental data are used as restraints. We also observe that the structural properties (secondary structure populations, intramolecular contact populations and Rg distribution) of the Aβ40 conformational ensemble obtained by reweighting with all experimental data and those of the conformational ensembles obtained from leave-one-out cross-validation reweighting calculations were largely indistinguishable (data not shown). This suggests that we are in a data-rich regime where all of the reweighted ensembles of Aβ40 derived from the unbiased a99SB-disp Aβ40 MD ensemble are extremely similar.
We present a comparison of the results of reweighting calculations performed on a99SB-disp MD ensembles of drkN SH3, ACTR, PaaA2 and α-synuclein in SI Figure 6, SI Figure 12, SI Figure 16, and SI Figure 19, respectively. We see similar global improvements in the agreement between calculated and experimental data in all proteins, illustrating the robustness of the reweighting protocol proposed here on a range of IDPs. We note one exception in the case of ACTR, where the agreement between calculated and experimental SAXS data obtained using all experimental data as restraints is slightly (6%) worse than the agreement observed in the unbiased a99SB-disp MD ensemble of ACTR (SI Figure 12). We also observe that the RMSE between calculated and experimental SAXS data is 18% worse than the RMSE of the unbiased a99SB-disp MD ensemble when SAXS data are withheld as cross-validating data. This indicates some incompatibility between experimental NMR data (backbone chemical shifts, backbone RDCs and PREs) and SAXS data within the conformations sampled in the unbiased a99SB-disp MD ensemble of ACTR.
Remarkably, only three of the fifty experimental data types withheld in leave-one-out cross-validation reweighting calculations performed on a99SB-disp MD ensembles of Aβ40, drkN SH3, ACTR, PaaA2 and α-synuclein were found to have worse agreement with experiment in reweighted ensembles than in unbiased MD ensembles: ACTR SAXS data, PRE data measured with a spin-label on residue 59 of drkN SH3 (drkN SH3 PRE-59), and α-synuclein backbone 3JHNHα coupling constants (SI Figure 6, SI Figure 12, SI Figure 19). Similarly, only five data types withheld in leave-one-out cross-validation reweighting tests performed on C36m ensembles had worse agreement with experiment in reweighted ensembles than in unbiased MD ensembles (ACTR PRE-3, ACTR PRE-41, ACTR PRE-61, ACTR SAXS data and α-synuclein PRE-103), and only three data types withheld in leave-one-out cross-validation reweighting tests performed on C22* ensembles had worse agreement in reweighted ensembles (drkN SH3 PRE-59, ACTR PRE-41 and α-synuclein PRE-103). These results demonstrate that the reweighting protocol introduced in this investigation to refine IDP ensembles improves agreement with extensive experimental datasets from NMR spectroscopy and SAXS with minimal overfitting.
Comparing the accuracy of reweighted IDP ensembles derived from different force fields
We sought to compare the accuracy of reweighted ensembles of Aβ40, drkN SH3, ACTR, PaaA2 and α-synuclein derived from 30 µs unbiased a99SB-disp, C22* and C36m MD simulations based on their agreement with experimental data. To compare the accuracy of the reweighted ensembles obtained from different force fields, we define a global quality index for each system: the averaged normalized RMSE, ⟨RMSEN⟩ = (1/M) Σi RMSEN,i. First, for each force field and data type i, we compute the agreement between calculated and experimental data in the reweighted ensemble (RMSEi). We then normalize each value of RMSEi by the corresponding RMSE value in the unbiased a99SB-disp MD ensemble (RMSEN,i = RMSEi/RMSEi,a99SB−disp) and finally average across all M data types to obtain ⟨RMSEN⟩.
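The quality index is a plain average of per-type RMSE ratios, as sketched below. The dictionary keys and RMSE values in the usage example are hypothetical and are not results from this study.

```python
def averaged_normalized_rmse(rmse, rmse_reference):
    """<RMSE_N> = (1/M) * sum_i RMSE_i / RMSE_i,reference over the M data types."""
    if set(rmse) != set(rmse_reference):
        raise ValueError("both dictionaries must cover the same data types")
    ratios = [rmse[t] / rmse_reference[t] for t in rmse_reference]
    return sum(ratios) / len(ratios)


# Hypothetical RMSE values for three data types (not results from this study).
reweighted = {"CA": 0.45, "HA": 0.12, "SAXS": 1.10}
unbiased_a99sb_disp = {"CA": 0.60, "HA": 0.15, "SAXS": 1.00}
print(averaged_normalized_rmse(reweighted, unbiased_a99sb_disp))  # ≈ 0.883
```

Because every data type contributes one ratio, a data type with many data points (e.g., chemical shifts) carries the same weight in ⟨RMSEN⟩ as a data type with few points.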
We observe similar agreement between calculated and experimental data in the reweighted a99SB-disp, C22*, and C36m ensembles of ACTR (Figure 3), Aβ40 (SI Figure 2) and drkN SH3 (SI Figure 7). The averaged normalized RMSE values (⟨RMSEN⟩) of the reweighted ensembles of ACTR are 0.84, 0.84 and 0.85 for a99SB-disp, C22* and C36m, respectively (Figure 3C); the ⟨RMSEN⟩ values of the reweighted ensembles of Aβ40 are 0.74, 0.74 and 0.66 for a99SB-disp, C22* and C36m, respectively (SI Figure 2C); and the ⟨RMSEN⟩ values of the reweighted ensembles of drkN SH3 are 0.80, 0.95 and 0.91 for a99SB-disp, C22* and C36m, respectively (SI Figure 7C). All of the reweighted ensembles of these three IDPs are in exceptionally good agreement with experimental NMR and SAXS data, and are in substantially better agreement with experimental data than the most accurate unbiased MD ensembles of these proteins reported in previous benchmark studies.19,21
We observed substantially larger differences between calculated and experimental data in reweighted ensembles of PaaA2 (Figure 4) and α-synuclein (SI Figure 20). The averaged normalized RMSE values (⟨RMSEN⟩) of the reweighted ensembles of PaaA2 are 0.80, 1.39 and 1.14 for a99SB-disp, C22* and C36m, respectively (Figure 4C), while the ⟨RMSEN⟩ values of the reweighted ensembles of α-synuclein are 0.79, 2.46 and 1.97 for a99SB-disp, C22* and C36m, respectively (SI Figure 20C). Reweighted a99SB-disp, C36m and C22* ensembles of PaaA2 have similar agreement with experimental NMR chemical shift and RDC data (Figure 4C). The agreement between calculated and experimental data in reweighted C22* and C36m α-synuclein ensembles is, however, substantially worse than the agreement observed in the reweighted a99SB-disp ensemble (SI Figure 20C).
The agreement with experimental SAXS data is particularly poor in reweighted C22* and C36m ensembles of PaaA2 and α-synuclein. The C22* and C36m MD ensembles of these IDPs are substantially more compact than the a99SB-disp ensembles both before and after reweighting (Figure 4, Figure 5, SI Figure 20, SI Figure 22). Reweighting C22* and C36m ensembles of PaaA2 and α-synuclein only produces a marginal improvement in their agreement with the experimental SAXS data (Figure 4C, SI Figure 20). This results from the fact that unbiased C36m and C22* ensembles of PaaA2 and α-synuclein contain almost no conformations with a Rg as large as the experimental Rg determined from SAXS19,37,51 (Figure 5, SI Figure 22). This places an upper limit on the agreement that can be obtained from reweighting regardless of the specified Kish ratio threshold. Reweighted C36m and C22* ensembles of α-synuclein also have substantially worse agreement with experimental NMR chemical shift, RDC, and PRE data than the reweighted a99SB-disp ensemble, due to an overestimation of β-sheet content and overly compact ensemble dimensions (SI Figures 20-23).
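The upper limit mentioned above follows from the form of the reweighted averages. For any observable O that enters the restraints as a linear ensemble average (for example, the mean-square radius of gyration or a SAXS intensity at a fixed scattering angle), the reweighted average is a convex combination of the values sampled in the unbiased simulation. It therefore cannot exceed the largest sampled value, whatever weights are assigned:

```latex
\langle O \rangle_{w} \;=\; \sum_{i=1}^{N} w_i \, O(x_i)
\;\le\; \Big(\max_{j} O(x_j)\Big) \sum_{i=1}^{N} w_i
\;=\; \max_{j} O(x_j),
\qquad w_i \ge 0,\quad \sum_{i=1}^{N} w_i = 1 .
```

If almost no frames reach the experimental Rg, the reweighted ensemble can match it only by placing nearly all weight on those few frames. That outcome is precisely what the Kish ratio threshold is designed to prevent.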
Comparing the structural properties of reweighted IDP ensembles derived from different force fields
We next sought to determine whether the ensembles of Aβ40, drkN SH3, ACTR, PaaA2 and α-synuclein obtained by reweighting MD ensembles derived from different force fields converge to similar conformational distributions when extensive experimental datasets are used as restraints. To assess the similarity of unbiased and reweighted conformational ensembles of each protein, we compared the populations of α-helical and β-sheet secondary structure elements, the populations of intramolecular contacts, and the free energy surfaces of each ensemble as a function of Rg and the α-helical order parameter Sα68 (see Methods). The populations of α-helices in unbiased and reweighted ensembles of ACTR and PaaA2 are shown in Figure 3 and Figure 4, respectively. The populations of α-helices in unbiased and reweighted ensembles of Aβ40, drkN SH3 and α-synuclein are shown in SI Figure 2, SI Figure 7, and SI Figure 20, respectively. We observe highly consistent populations of moderately stable helices (populations of 20%-40%) in the reweighted ensembles of ACTR and drkN SH3 (Figure 3, SI Figure 7). In the case of PaaA2, we observe large differences in the locations and populations of helical elements in unbiased MD ensembles that become substantially more similar after reweighting (Figure 4). We note that there are strong signals for the helical elements of drkN SH3, ACTR and PaaA2 in experimental NMR chemical shift, RDC, and NOE data, and that the populations of helical conformations observed in our reweighted ensembles are in excellent agreement with helical populations directly estimated from experimental data.49–51,56
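Both comparisons use the frame weights directly. The sketch below computes per-residue populations of a structural feature (for example, a per-frame, per-residue helix assignment from a tool such as DSSP, computed elsewhere). It then builds a weighted free energy surface F = −kT ln P from a 2D histogram of two per-frame collective variables such as Rg and Sα. The function names and the default kT = 0.593 kcal/mol (about 298 K) are assumptions made for illustration. The computation of Rg and Sα themselves is not shown.

```python
import numpy as np


def weighted_populations(flags, weights):
    """Per-residue population of a structural feature in a (re)weighted ensemble.

    flags   : (n_frames, n_residues) booleans, e.g. "residue is helical"
    weights : (n_frames,) frame weights (normalized here)
    """
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ np.asarray(flags, dtype=float)


def free_energy_surface(x, y, weights, bins=30, kT=0.593):
    """F(x, y) = -kT ln P(x, y) from a weighted 2D histogram, shifted so min F = 0.

    x, y could be per-frame Rg and S_alpha values; kT = 0.593 kcal/mol (~298 K)
    is an assumed default. Empty bins are returned as +inf.
    """
    p, xedges, yedges = np.histogram2d(x, y, bins=bins, weights=weights, density=True)
    with np.errstate(divide="ignore"):
        f = -kT * np.log(p)
    f -= f[np.isfinite(f)].min()
    return f, xedges, yedges


# Usage: four hypothetical frames, two residues.
flags = np.array([[1, 0], [1, 1], [0, 1], [0, 0]], dtype=bool)
print(weighted_populations(flags, [0.5, 0.25, 0.25, 0.0]))  # populations 0.75 and 0.5
```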
The populations of β-sheets in unbiased and reweighted ensembles of Aβ40, drkN SH3, ACTR, PaaA2 and α-synuclein are shown in SI Figure 2, SI Figure 8, SI Figure 13, SI Figure 17, and SI Figure 21, respectively. We observe that all reweighted ensembles of Aβ40 converge to very similar secondary structure populations, with residues 15-20 and residues 30-35 forming β-sheets with populations of 20%-40% (SI Figure 2). We observe several highly populated β-sheets (populations of 40%-95%) in unbiased C36m and C22* MD ensembles of drkN SH3 and α-synuclein that produce notable discrepancies with experimental NMR data (SI Figure 7, SI Figure 8, SI Figure 20, SI Figure 21). The populations of these β-sheets are only partially decreased upon reweighting. The unbiased C36m MD ensemble of drkN SH3 contains β-sheets between residues 1-10 and residues 20-30 with populations of 40%-80%. These populations are reduced to 20%-40% after reweighting with all experimental data. The reweighted C36m drkN SH3 ensemble, however, has substantially worse agreement with NMR chemical shifts and 3JHNHα scalar couplings than the reweighted a99SB-disp drkN SH3 ensemble, which contains very little residual β-sheet structure. We observe similarly modest reductions in spurious populations of β-sheets in reweighted C22* and C36m ensembles of α-synuclein (SI Figure 20, SI Figure 21).
To determine whether the persistence of spurious β-sheet populations is a result of our selected Kish ratio threshold of K = 0.10, we computed reweighted C22* and C36m ensembles of drkN SH3 and α-synuclein using a Kish ratio threshold of K = 0.01. We observed similar β-sheet populations in these reweighted ensembles, suggesting that the persistence of these β-sheets is likely a result of the relatively weak effect of residual β-sheet structure on experimental secondary NMR chemical shifts36,69 and of the properties of empirical structure-based chemical shift predictors. Previous work has shown that experimental secondary NMR chemical shifts and structure-based backbone NMR chemical shift predictions from algorithms such as SHIFTX+17 and SPARTA+16 are inherently less discriminative of conformations in the β-sheet region of backbone ϕ/ψ Ramachandran space than of conformations in the α-helical region.36,69
We observe that IDP ensembles derived from MD simulations run with different force fields converge to very similar populations of residual helical structure after reweighting, but IDP ensembles with identical populations of α-helices can have large differences in the lengths of helical elements observed in individual conformations. For example, an IDP ensemble in which ten consecutive residues each have an α-helical population of 50% might consist entirely of structures containing a contiguous ten-residue helix and structures with no helical content, in equal proportions. Alternatively, this ensemble could contain conformations with short helical elements, where the helical populations of individual residues are relatively uncorrelated with those of neighboring residues. To obtain a more detailed measure of the similarity of the individual conformations within unbiased and reweighted IDP ensembles, we compared the free energy surfaces of the unbiased and reweighted ensembles of each protein as a function of Rg and the α-helical order parameter Sα. Sα is a measure of the number of seven-residue fragments in a structure that resemble an ideal helix68 (See Methods). The free energy surfaces of unbiased and reweighted ensembles of PaaA2 are shown as a function of Rg and Sα in Figure 5. The free energy surfaces of unbiased and reweighted ensembles of Aβ40, drkN SH3, ACTR and α-synuclein are shown as a function of Rg and Sα in SI Figure 3, SI Figure 9, SI Figure 14 and SI Figure 22, respectively.
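The sketch below gives a simplified Sα-like order parameter under a Cα-only representation. Each seven-residue fragment is superimposed onto an idealized α-helix, and its RMSD is passed through a rational switching function, so near-ideal fragments contribute close to 1 and distant ones close to 0. Several choices are assumptions made for illustration:

- the Cα-only geometry;
- the helix parameters (radius 2.3 Å, rise 1.5 Å, 100° per residue);
- the switching parameters (r0 = 1 Å, exponents 8 and 12).

The exact definition used in this work is given in Methods.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between two (n, 3) coordinate sets after optimal rigid superposition (Kabsch)."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, s, vt = np.linalg.svd(p.T @ q)
    # Flip the smallest singular value if the optimal orthogonal matrix would be a reflection.
    d = -1.0 if np.linalg.det(u) * np.linalg.det(vt) < 0 else 1.0
    e = np.sum(p ** 2) + np.sum(q ** 2) - 2.0 * (s[0] + s[1] + d * s[2])
    return np.sqrt(max(e, 0.0) / len(p))

def switching(d, r0=1.0, n=8, m=12):
    """Rational switching function (1 - (d/r0)^n) / (1 - (d/r0)^m): 1 at d = 0, decays toward 0."""
    x = d / r0
    if abs(x - 1.0) < 1e-9:
        return n / m                     # analytic limit at d = r0
    return (1.0 - x ** n) / (1.0 - x ** m)

def ideal_helix_ca(n_res, radius=2.3, rise=1.5, twist_deg=100.0):
    """Cα coordinates (Å) of an idealized α-helix (about 3.8 Å between consecutive Cα atoms)."""
    k = np.arange(n_res)
    t = np.deg2rad(twist_deg) * k
    return np.column_stack([radius * np.cos(t), radius * np.sin(t), rise * k])

def s_alpha(ca, window=7, r0=1.0):
    """Sum of switching(RMSD to an ideal helix) over all consecutive `window`-residue fragments."""
    ref = ideal_helix_ca(window)
    n_frag = len(ca) - window + 1
    return float(sum(switching(kabsch_rmsd(ca[i:i + window], ref), r0) for i in range(n_frag)))

helix = ideal_helix_ca(20)
extended = np.column_stack([3.8 * np.arange(20), np.zeros(20), np.zeros(20)])
print(s_alpha(helix))      # close to 14: all 14 seven-residue fragments are ideal
print(s_alpha(extended))   # near 0: no fragment resembles a helix
```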
We observe dramatic differences in the free energy surfaces of the a99SB-disp, C36m and C22* ensembles of PaaA2 both before and after reweighting (Figure 5). The reweighted a99SB-disp, C22* and C36m PaaA2 ensembles have similar per-residue populations of helical conformations but have very little overlap when projected using Rg and Sα descriptors. For comparison, we plot the Rg and Sα values of 50 conformations contained in a Protein Ensemble Database (PED)70 ensemble of PaaA2 calculated directly from NMR and SAXS restraints (PED00013)51 as white dots on each free energy surface in Figure 5. We observe that the unbiased a99SB-disp PaaA2 MD ensemble has two broad free energy minima in the Rg and Sα projection. One of these minima overlaps very well with the conformations in the PED00013 ensemble, and the depth of this minimum is substantially increased upon reweighting. The overly compact unbiased C22* and C36m MD ensembles of PaaA2 have relatively little overlap with the experimental PaaA2 ensemble in this projection. Upon reweighting, the overlap with more compact conformations in the experimental ensemble increases, but the reweighted C22* and C36m ensembles are substantially more similar to the unbiased MD ensembles from which they were derived than to the experimentally restrained PED ensemble or the reweighted a99SB-disp PaaA2 ensemble. Reweighted PaaA2 ensembles clearly do not converge to similar conformational distributions across different force fields.
Free energy surfaces of unbiased and reweighted drkN SH3 ensembles are compared to an experimental ensemble (PED00427) in SI Figure 9. All three unbiased MD ensembles of drkN SH3 have substantial overlap with the experimental ensemble in the Rg and Sα projection, and the free energy surfaces of reweighted ensembles are relatively similar. In contrast to PaaA2, reweighted drkN SH3 ensembles appear to converge to similar conformational distributions with similar lengths of helical elements and a similar coupling between helix formation and chain compaction. Free energy surfaces of reweighted and unbiased ensembles of Aβ40, ACTR and α-synuclein are shown as a function of Rg and Sα in SI Figure 3, SI Figure 14 and SI Figure 22, respectively. Reweighted ensembles of Aβ40, which have minimal helical content, are fairly similar in this projection. Unbiased ensembles of ACTR show fairly large differences in this projection and become more similar after reweighting. Both unbiased and reweighted α-synuclein ensembles derived from different force fields have very little overlap in this projection.
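Free energy surfaces such as those discussed above can be estimated from per-frame Rg and Sα values and the statistical weights of each ensemble as F = −kBT ln P(Rg, Sα). The minimal sketch below uses NumPy's weighted 2D histogram. The bin count, the kBT value (2.494 kJ/mol, about 300 K), the function name and the synthetic data are assumptions. Empty bins are assigned infinite free energy.

```python
import numpy as np

def free_energy_surface(rg, s_alpha, weights, bins=(40, 40), kT=2.494):
    """Weighted 2D free energy surface F = -kT ln P(Rg, S_alpha), in kJ/mol, shifted so min F = 0.

    Empty bins are returned as np.inf.
    """
    hist, rg_edges, sa_edges = np.histogram2d(rg, s_alpha, bins=bins, weights=weights, density=True)
    with np.errstate(divide="ignore"):   # log(0) -> -inf is expected for empty bins
        f = -kT * np.log(hist)
    f -= f[np.isfinite(f)].min()
    return f, rg_edges, sa_edges

# Hypothetical per-frame descriptors and reweighting weights.
rng = np.random.default_rng(4)
rg = rng.normal(2.0, 0.2, 50000)         # Rg in nm
sa = rng.uniform(0.0, 5.0, 50000)        # S_alpha
weights = np.exp(2.0 * (rg - rg.mean())) # a toy reweighting that favors expanded frames
f_unbiased, _, _ = free_energy_surface(rg, sa, np.ones_like(rg))
f_reweighted, _, _ = free_energy_surface(rg, sa, weights)
```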
We examine the similarity of the populations of intramolecular contacts, or contact maps, of unbiased and reweighted ensembles of Aβ40, drkN SH3, ACTR, PaaA2 and α-synuclein in SI Figure 4, SI Figure 10, SI Figure 15, SI Figure 18 and SI Figure 23, respectively. Unbiased Aβ40 ensembles have a similar pattern of intramolecular contacts, defined by contacts between residues 15-20 and residues 30-35, with notable differences in populations. The populations of these contacts, which are driven by transient β-sheet formation, become more similar after reweighting (SI Figure 4). The populations of intramolecular contacts of drkN SH3 and ACTR are quite similar in unbiased MD ensembles of C36m and C22*, while the more extended unbiased a99SB-disp MD ensembles contain fewer intramolecular contacts. Upon reweighting ensembles of drkN SH3 and ACTR with experimental datasets that contain PREs and SAXS data, the contact maps become more similar, and the contact maps of the reweighted ACTR ensembles are particularly similar (SI Figure 15).
The reweighted C36m and C22* ensembles of drkN SH3 contain several intramolecular contacts resulting from β-sheet structure not observed in the reweighted a99SB-disp ensemble (SI Figure 10). The overly compact unbiased C36m and C22* MD ensembles of PaaA2 and α-synuclein contain a large number of highly populated intramolecular contacts relative to the more realistically expanded unbiased a99SB-disp MD ensembles (SI Figure 18, SI Figure 23). The contacts observed in C36m and C22* PaaA2 and α-synuclein ensembles remain highly populated after reweighting, as there are too few extended conformations in the unbiased MD ensembles to reduce the populations of these contacts.
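Contact maps like those compared above are weighted averages of per-frame contact indicators. The minimal sketch below makes two assumptions: a contact is a Cα-Cα distance below 8 Å, and pairs separated by fewer than three residues are excluded. These choices, the function name and the toy coordinates are illustrative rather than the definitions used in this work. For long trajectories, the frames would need to be processed in chunks to limit memory use.

```python
import numpy as np

def contact_map(ca, weights=None, cutoff=8.0, min_sep=3):
    """Weighted contact populations from Cα coordinates.

    ca: (n_frames, n_res, 3) array in Å; pair (i, j) is in contact when its Cα-Cα distance < cutoff.
    Pairs with |i - j| < min_sep are set to 0 (trivial sequence neighbors).
    """
    ca = np.asarray(ca, dtype=float)
    n_frames, n_res, _ = ca.shape
    w = np.ones(n_frames) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    dist = np.linalg.norm(ca[:, :, None, :] - ca[:, None, :, :], axis=-1)  # (frames, res, res)
    contacts = (dist < cutoff).astype(float)
    pop = np.tensordot(w, contacts, axes=1)   # weighted average over frames -> (res, res)
    i, j = np.indices((n_res, n_res))
    pop[np.abs(i - j) < min_sep] = 0.0
    return pop

# Hypothetical 4-residue example: one compact frame and one extended frame.
compact = np.array([[0, 0, 0], [3.8, 0, 0], [3.8, 3.8, 0], [0, 3.8, 0]], dtype=float)
extended = np.array([[0, 0, 0], [10, 0, 0], [20, 0, 0], [30, 0, 0]], dtype=float)
print(contact_map(np.stack([compact, extended]), weights=[0.25, 0.75]))
```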
Quantifying the similarity of reweighted IDP ensembles derived from different force fields
While comparing IDP ensembles using selected reaction coordinates of interest (such as Rg and Sα) can be both intuitive and informative, we sought to obtain a more objective and quantitative description of the similarity of unbiased and reweighted ensembles of Aβ40, drkN SH3, ACTR, PaaA2 and α-synuclein. To do so, we utilized the energy landscape visualization method (ELViM) for dimensionality reduction.71–73 The ELViM approach uses the distances between Cα carbons in each conformation of a protein ensemble as an input, calculates a dissimilarity matrix between all pairs of conformations based on differences in the populations of Cα-Cα contacts and projects the information contained in the high-dimensional dissimilarity matrix onto a low-dimensional latent space. The ELViM method is conceptually similar to using t-distributed stochastic neighbor embedding (t-SNE) to cluster conformations of IDPs.8,74 Projecting IDP ensembles onto a low-dimensional latent space derived directly from atomic coordinates provides a more objective comparison of the similarity of ensembles than examining the similarity of a small number of subjectively selected structural descriptors.
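The sketch below illustrates the two ELViM ingredients described above under stated assumptions.

- Dissimilarity: a q-like similarity between the Cα-Cα distance sets of two conformations k and l is averaged over residue pairs, q_kl = mean of exp[−(r_ij^k − r_ij^l)²/(2σ_ij²)] with σ_ij = |i − j|^0.15, and δ_kl = 1 − q_kl. This functional form and its parameters are assumed here rather than taken from this work.
- Projection: classical multidimensional scaling is swapped in as a simple stand-in for the ELViM projection algorithm, which it does not reproduce.

```python
import numpy as np

def pairwise_ca_distances(ca):
    """(n_frames, n_res, 3) -> (n_frames, n_pairs) Cα-Cα distances for pairs with j - i >= 3."""
    i, j = np.triu_indices(ca.shape[1], k=3)
    return np.linalg.norm(ca[:, i, :] - ca[:, j, :], axis=-1), (j - i)

def dissimilarity_matrix(ca, eps=0.15):
    """delta_kl = 1 - q_kl, with q_kl a Gaussian similarity of Cα-Cα distance sets (assumed form)."""
    d, sep = pairwise_ca_distances(np.asarray(ca, dtype=float))
    sigma = sep.astype(float) ** eps     # assumed separation-dependent tolerance (Å)
    n = len(d)
    delta = np.zeros((n, n))
    for k in range(n):
        q = np.mean(np.exp(-(d[k] - d) ** 2 / (2.0 * sigma ** 2)), axis=1)
        delta[k] = 1.0 - q
    return delta

def classical_mds(delta, dim=2):
    """Classical MDS: embed a dissimilarity matrix in `dim` dimensions (stand-in for ELViM's projection)."""
    n = len(delta)
    j = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    b = -0.5 * j @ (delta ** 2) @ j
    vals, vecs = np.linalg.eigh(b)       # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dim]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))
```

In the test scenario below, noisy helical and extended toy conformations are expected to separate into two clusters in the 2D embedding.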
We applied the ELViM approach to compare the similarity of the conformational ensembles sampled in the unbiased a99SB-disp, C36m and C22* MD simulations of Aβ40, drkN SH3, ACTR, PaaA2 and α-synuclein. For each protein, we concatenated the three unbiased MD ensembles into a single merged ensemble, computed the dissimilarity matrix between all conformations and used the ELViM algorithm to project the conformations of the merged ensemble onto a two-dimensional (2D) latent space (See Methods). We then used kernel density estimates of each unbiased and reweighted ensemble projected onto the ELViM latent space to compare the ensembles. Comparisons of the projections of the unbiased and reweighted ensembles of ACTR and PaaA2 on their ELViM latent spaces are shown in Figure 6 and Figure 7, respectively. To obtain more insight into the nature of the ELViM latent spaces, we project the Rg and Sα values of each conformation of the merged ensembles onto their respective ELViM latent space and present snapshots of conformations from selected regions of ELViM density projections in unbiased and reweighted ensembles. Comparisons of the projections of the unbiased and reweighted ensembles of Aβ40, drkN SH3 and α-synuclein on their respective ELViM latent spaces are shown in SI Figure 5, SI Figure 11 and SI Figure 24, respectively.
We display overlays of the ELViM latent space embeddings of all ensembles of Aβ40, drkN SH3, ACTR, PaaA2 and α-synuclein in Figure 8A. To provide a quantitative measure of the similarity of the unbiased and reweighted ensembles of each protein in the ELViM latent space, we define a density overlap metric S (Eq. 13), which is analogous to the overlap integral used to quantify the overlap of electronic wave functions in quantum mechanics (See Methods). We first construct a kernel density of each ensemble in the ELViM latent space. We compare the similarity of two ensembles by computing the overlap integral S of their kernel densities, normalized such that S lies in the range [0, 1] (Eq. 14). Two densities with no overlapping points will have an overlap integral value of S = 0, while two identical densities will have an overlap integral value of S = 1. We display the values of the overlap integrals between the reweighted and unbiased ensembles of each protein, expressed as an overlap percentage (S · 100%), in Figure 8B. Values in the blue triangles reflect the overlap of unbiased MD ensembles derived from different force fields, values in red triangles reflect the overlap of reweighted ensembles derived from different force fields, and the diagonal elements reflect the overlap of the reweighted ensemble derived from each force field with the unbiased MD ensemble from which it was derived.
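One way to compute a normalized overlap of two latent-space densities is sketched below. The exact normalization of Eq. 14 is given in Methods; here the form S = ∫pq / sqrt(∫p² ∫q²) is assumed. It is 0 for non-overlapping densities and 1 for identical ones, and by the Cauchy-Schwarz inequality it never exceeds 1 for non-negative densities. The Gaussian kernel, bandwidth, grid and function names are also illustrative assumptions.

```python
import numpy as np

def kde_on_grid(points, weights, grid_x, grid_y, bandwidth):
    """Weighted isotropic Gaussian kernel density of 2D points, evaluated and normalized on a grid."""
    gx, gy = np.meshgrid(grid_x, grid_y, indexing="ij")
    dens = np.zeros_like(gx)
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    for (x, y), wi in zip(points, w):
        dens += wi * np.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2.0 * bandwidth ** 2))
    cell = (grid_x[1] - grid_x[0]) * (grid_y[1] - grid_y[0])
    return dens / (dens.sum() * cell)    # integrates to 1 on the grid

def overlap(p, q):
    """Normalized overlap S = sum(p q) / sqrt(sum(p^2) sum(q^2)); the grid cell area cancels."""
    return np.sum(p * q) / np.sqrt(np.sum(p * p) * np.sum(q * q))

# Hypothetical latent-space embeddings of two ensembles sharing the same frames.
rng = np.random.default_rng(3)
grid = np.linspace(-5.0, 13.0, 121)
emb = rng.normal(0.0, 1.0, (500, 2))
p_unbiased = kde_on_grid(emb, np.ones(500), grid, grid, 0.3)
p_reweighted = kde_on_grid(emb, rng.uniform(0.5, 1.5, 500), grid, grid, 0.3)
print(f"overlap = {100 * overlap(p_unbiased, p_reweighted):.1f}%")
```

Because reweighting changes only the weights of a fixed set of frames, the unbiased and reweighted densities share the same support. In this setting, S measures how strongly the weights have shifted.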
The ELViM projections of ACTR and Aβ40 ensembles shown in Figure 6, Figure 8 and SI Figure 5 illustrate that the unbiased a99SB-disp, C36m and C22* MD simulations of these proteins sample the same regions of conformational space with different probabilities, and that these probabilities are adjusted to produce highly similar conformational distributions after reweighting with experimental data. The reweighted ensembles of ACTR and Aβ40 share similar high density regions in the ELViM latent space, which is consistent with the similarity of the intramolecular contact maps and secondary structure propensities of the reweighted ensembles of these proteins (SI Figures 2-4, SI Figures 13-15). Unbiased drkN SH3 ensembles initially have relatively little overlap in the ELViM latent space, but their overlap is substantially increased upon reweighting (Figure 8, SI Figure 11).
The ELViM projections of PaaA2 ensembles shown in Figure 7 and Figure 8 illustrate that there is very little overlap of unbiased MD ensembles derived from different force fields in the space of Cα contacts. This is reflected by overlap percentages of less than 2% between unbiased ensembles (Figure 8B). As a result, while reweighting marginally increases the overlap of the ensembles, reweighted PaaA2 ensembles are substantially more similar to the unbiased MD ensembles from which they were derived than to one another. This results from the fact that the stable helical elements in PaaA2 ensembles can pack in distinct discrete orientations in the overly collapsed C22* and C36m ensembles, and the unbiased MD ensembles do not sample the same packing orientations. The unbiased and reweighted ensembles of α-synuclein derived from different force fields are as disjoint as the PaaA2 ensembles in the ELViM latent space, with almost no overlap before or after reweighting (Figure 8, SI Figure 24).
Discussion
In this investigation we have proposed a simple and robust maximum entropy reweighting approach to refine atomic resolution conformational ensembles of IDPs using large datasets consisting of several different types of experimental data. Our proposed reweighting procedure contains a single free parameter, the desired effective ensemble size of the reweighted ensemble, and naturally balances the weights of restraints for different types of experimental data. This approach does not require any a priori knowledge of the accuracy of an initial IDP ensemble, the magnitude of errors of experimental measurements, the correlation of experimental observables or the accuracy of forward models for predicting experimental data. We demonstrate, through extensive cross-validation, that the proposed reweighting approach simultaneously improves agreement with several types of experimental NMR data and SAXS data with minimal overfitting while maintaining a desired degree of sampling of the most populated regions of conformational space in unbiased MD ensembles.
The reweighting approach presented here builds upon several successful maximum entropy and Bayesian approaches for reweighting conformational ensembles of flexible molecules with experimental data.26–32 Refining conformational ensembles of flexible molecules with maximum entropy reweighting poses two inherent challenges: i) determining how to balance the weights of experimental restraints against one’s confidence in the initial conformational ensemble, and ii) determining the relative weights of different types of experimental restraints. Several heuristic and theoretical approaches have been proposed for determining or inferring the weights of experimental restraints when reweighting ensembles of flexible molecules.29–33,35,45 When considering experimental datasets with a small number of experimental data types, cross-validation or L-curve/elbow methods provide a practical and direct approach. When performing reweighting using a few sets of experimental data, it is reasonable and expedient to determine the weight of each set of restraints by identifying weight values where the agreement of calculated cross-validating data with experiment begins to deteriorate.
When reweighting conformational ensembles with data from many different experiments, this process becomes less straightforward, and one must attempt to normalize the relative RMSE between predicted and experimental values based on estimates of experimental measurement errors and the errors of forward models used to predict experimental data. This is frequently done through the use of χ2 values obtained by normalizing RMSE calculations with various error models. While it is possible to propose reasonable estimates of experimental measurement and forward model prediction errors for different types of data, these quantities are inherently uncertain and these estimates introduce a degree of subjectivity into reweighting protocols. Additionally, in cases where many experimental data are highly correlated, such as NMR data that report on the distribution of backbone dihedral angles of IDPs (i.e., chemical shifts, scalar coupling constants, RDCs), cross-validation may not be a practical approach to determine the relative weights of restraints.
We set out to develop a reweighting protocol to refine IDP conformational ensembles obtained from all-atom MD simulations that is minimally biased and maximally automated. We have proposed a heuristic approach for determining the relative weights of experimental restraints that directly builds upon previous work that considers the Kish ratios of reweighted ensembles.28 This approach is motivated by several practical considerations. One motivation is a desire to objectively compare the accuracy of IDP ensembles obtained from reweighting unbiased MD simulations run with force fields with different levels of accuracy. If a reweighting procedure contains adjustable parameters to determine the relative importance of different types of experimental data in a reweighting objective function, it is possible to adjust the relative “scores” or likelihoods of reweighted ensembles derived from different force fields by adjusting these parameters.
A second motivation for using the effective ensemble size of reweighted ensembles to choose the relative weights of experimental restraints is a desire to directly compare the structural properties and similarity of reweighted ensembles derived from different force fields. If we choose to use RMSE (or χ2) thresholds to determine the weights of experimental restraints, we observe that reweighted ensembles derived from different force fields can achieve similar RMSEs with experimental data with dramatically different effective ensemble sizes. Requiring an overly compact C22* or C36m ensemble to agree with SAXS data within a specified tolerance could require reducing the ensemble to a few tens of structures, while a more realistically extended a99SB-disp ensemble might contain thousands or tens of thousands of structures after reweighting. Comparing the structural properties and structural overlap of reweighted ensembles with orders-of-magnitude differences in effective ensemble sizes is less meaningful than comparing ensembles with similar effective ensemble sizes.
An additional motivation for using the effective ensemble size of reweighted ensembles to determine the relative weights of restraints is our desire to adequately represent the most populated regions of conformational space sampled in unbiased MD simulations in reweighted IDP ensembles. One application of our research efforts to determine accurate atomic resolution ensembles of IDPs is to obtain mechanistic insight into their binding mechanisms with small molecule drugs.6–11 These dynamic and heterogeneous binding mechanisms are mediated by statistical distributions of backbone conformations and positions of sidechain pharmacophores. As such, IDP ensembles with a small number of conformations that sparsely sample backbone topologies and the relative positions of sidechain atoms are unlikely to be informative for predicting small molecule binding sites or understanding subtle shifts in the conformational ensembles of IDPs that occur upon small molecule binding.6–8,10,11
We note a conceptual similarity of the reweighting procedure proposed here with the concept of gentle ensemble refinement in the recently published work of Köfinger and Hummer.32 In this work, the authors propose an elegant approach to balance the weight of experimental restraints in maximum entropy reweighting against confidence in an initial prior model. They relate the KL divergence between the weights of an initial and reweighted ensemble to an energy uncertainty of the unbiased ensemble, and propose to select the optimal value of restraint weights by relating the expected force field accuracy in the space of experimental observables to the expected energy variance upon reweighting. While motivated by theoretical considerations, this approach is similar in spirit and in practice to the heuristic approach proposed here, where we specify an acceptable limit to changes in the statistical weights of conformations in unbiased ensembles a priori.
We draw attention to the small effective ensemble sizes that our reweighting procedure would produce if agreement with restrained or unrestrained data were used as the primary criterion to select the strength of the experimental restraints. We observe that the RMSEs between calculated and experimental values of both restrained and unrestrained cross-validating data largely improve as the weight of the experimental restraints is increased, until ensemble sizes become very small (Kish ratio K < 0.01), corresponding to effective ensemble sizes of a few hundred structures. We note that several approaches to calculate IDP conformational ensembles with a minimal set of structures, sometimes referred to as maximum parsimony12 approaches, have previously found that ensembles of ∼50-500 structures are required to best reproduce NMR and SAXS data of IDPs.34–37,50,51,61 It therefore appears that reweighting long timescale MD simulations with cross-validation-based restraint selection produces ensembles of a similar size to those calculated by minimal ensemble approaches. We observe that the reweighted IDP ensembles obtained in this investigation have similar structural properties to conformational ensembles calculated from NMR and SAXS data using minimal ensemble or maximum parsimony approaches. We find that α-helical populations of reweighted drkN SH3 ensembles obtained in this work agree well with the helical populations of drkN SH3 ensembles derived from NMR and SAXS data with approaches such as ENSEMBLE,34 X-EISD35 and IDPConformerGenerator.75 We also observe that α-helical populations in PaaA2 ensembles obtained from our reweighting approach agree reasonably well with populations obtained in a previously reported PaaA2 ensemble51 derived from NMR and SAXS data using the EOM approach.76
We find that our proposed reweighting procedure only marginally reduces the populations of potentially spurious β-sheets in reweighted ensembles of drkN SH3 and α-synuclein relative to those calculated by X-EISD35 and ASTEROIDS.37 We note that these approaches, and other ensemble calculation approaches that generate conformations by statistically sampling backbone dihedral angles, may not sample any hydrogen-bonded β-sheet conformations in the initial conformational pools used for ensemble selection, but rather aim to identify elevated sampling of β-sheet Ramachandran space relative to statistical coil models. Therefore, populations of β-sheets observed in reweighted MD ensembles may not be directly comparable to β-propensities identified in these ensemble calculation approaches. Nevertheless, we caution that the presence of substantially populated β-sheets in reweighted MD ensembles produced by our reweighting procedure does not necessarily suggest strong experimental evidence for residual β-sheet populations. We encourage a direct inspection of agreement with experimental data for these β-sheet elements. We emphasize that an important difference between ensembles obtained from reweighting long timescale MD simulations and ensemble selection approaches such as ASTEROIDS,36,37,61 EOM76 or IDPConformerGenerator75 is that the positions of side chains in reweighted MD ensembles are governed by the physics of the underlying force fields, rather than by stochastic sampling of side chain rotamer libraries to place side chains and avoid steric clashes. The positions of side chains in reweighted MD ensembles may therefore have more physical meaning.
The maximum entropy reweighting procedure developed in this investigation enables us to address the following question: are modern force fields sufficiently accurate that MD simulations with adequate sampling will converge to a similar underlying conformational distribution when extensive experimental datasets are used for reweighting? This question can be alternatively viewed as: with sufficient experimental data, does the problem of determining structural ensembles of IDPs with modern force fields and maximum entropy reweighting methods become well-defined12? In the case of Aβ40 and ACTR, two shorter IDPs (40 and 69 residues, respectively) with moderate populations of residual secondary structure elements, we find substantial overlap in the unbiased a99SB-disp, C36m and C22* MD ensembles from both traditional structural descriptors and from a low dimensional projection of the ensembles onto a latent space defined by the positions of Cα carbons. Upon reweighting, we find that the ensembles become substantially more similar to one another and have similar levels of agreement with experimental data. These encouraging results suggest that the accessible conformational space of these proteins is well sampled in 30µs-long MD simulations run with different force fields, and that available experimental NMR and SAXS datasets provide sufficient information to correct the simulated ensembles into similar conformational distributions. While this is not evidence that these reweighted ensembles are perfectly faithful representations of the true solution ensembles of these proteins, it is encouraging that each force field does not produce a unique reweighted ensemble with equally good agreement with experimental restraints.
These results suggest that with modern force fields and sufficient NMR and SAXS data, the challenge of determining structural ensembles of IDPs with fewer than 70 residues may now be becoming well-defined, in that consensus descriptions of solution ensembles are beginning to emerge.
In contrast, the reweighted ensembles of PaaA2 and α-synuclein produced in this work have extremely little overlap. The unbiased MD ensembles of these proteins appear to sample almost entirely disjoint regions of conformational space. The lack of overlap of unbiased ensembles is likely the result of a combination of force field inaccuracies and insufficient sampling in unbiased 30µs MD simulations. While reweighting improves some properties of the overly compact C22* and C36m ensembles of PaaA2 and α-synuclein, such as the distributions of backbone dihedral angles, we are clearly far from identifying a consensus description of the solution ensembles of these proteins. Based on the superior agreement with experimental data, the reweighted a99SB-disp ensembles of PaaA2 and α-synuclein are likely substantially more representative of the true solution ensembles of these proteins than the reweighted C22* and C36m ensembles. In the case of drkN SH3, the unbiased ensembles appear relatively similar aside from substantial populations of β-sheets in the C36m and C22* ensembles. Reweighting reduces the populations of these β-sheets and substantially increases the overlap of the conformational ensembles, but agreement with experimental data indicates that the reweighted a99SB-disp drkN SH3 ensemble is the most accurate representation of this protein in solution.
Accurate, force field independent conformational ensembles of IDPs, like the ones determined here for Aβ40 and ACTR, are extremely valuable for a variety of reasons. First, these reweighted ensembles can be used as potential target distributions when assessing the accuracy of coarse-grained representations of IDPs,77,78 machine learning methods for predicting atomic resolution conformational ensembles,46,79 and ensemble properties of IDPs.3,80,81 Additionally, these reweighted ensembles can serve as training data for machine learning approaches to generate IDP conformational ensembles at reduced computational cost compared to standard MD simulations. Force field independent ensembles could complement the information encoded in the co-evolution of amino-acid sequences and the structures deposited in the Protein Data Bank,82 on which methods like AlphaFold347 were trained, but which provide limited information about the conformational heterogeneity of dynamic and disordered proteins. A major challenge for these efforts will be generating a sufficient number of high-quality conformational ensembles to train IDP models that generalize well to unseen data, especially when considering the vast potential conformational space of IDPs.
In conclusion, we have proposed and validated a robust approach for integrating MD simulations of IDPs with extensive experimental datasets to improve the accuracy of atomic resolution conformational ensembles of IDPs. We have provided an in-depth comparison of the reweighted IDP ensembles derived from different force fields and have shown that in favorable cases these ensembles converge to very similar conformational distributions after reweighting. We have also shown that for several proteins, IDP ensembles obtained from reweighting atomic resolution MD simulations have very similar properties to IDP ensembles calculated directly from experimental data with minimal ensemble methods. These results demonstrate substantial progress in the field of IDP ensemble modeling, and suggest that the field may be maturing from the realm of assessing the accuracy of disparate computational models to the realm of atomic resolution integrative structural biology. We anticipate that the maximum entropy reweighting protocol proposed here will provide a valuable tool for calculating accurate atomic resolution IDP ensembles, assessing the accuracy of future force field development efforts, scrutinizing IDP ensembles obtained from artificial intelligence and machine learning methods and ultimately providing high-quality data to train novel, efficient, and accurate deep learning models for IDP conformational ensemble generation.
Theory
In the following sections we provide a brief overview of the theory of maximum entropy reweighting and the treatment of different sources of error with regularization parameters. We focus on the formalism proposed by Bussi et al.,26–28 which is the basis of our approach. We build upon this formalism by proposing a heuristic approach to determine regularization parameters in an automated fashion, using the Kish ratio (Eq. 11).
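For N conformations with weights w_j, the Kish ratio is commonly written as K = (Σ w_j)² / (N Σ w_j²), so that K·N is the effective number of conformations. We assume here that this standard form corresponds to Eq. 11. The minimal sketch below uses a hypothetical ensemble of 30,000 frames to connect K to effective ensemble sizes.

```python
import numpy as np

def kish_ratio(weights):
    """Kish ratio K = (sum w)^2 / (N * sum w^2); K * N is the effective ensemble size."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (len(w) * np.sum(w ** 2))

n_frames = 30000                         # hypothetical unbiased ensemble size
uniform = np.ones(n_frames)              # unbiased MD frames carry equal weight
concentrated = np.zeros(n_frames)
concentrated[:300] = 1.0                 # all weight spread evenly over 300 frames

for name, w in (("uniform", uniform), ("concentrated", concentrated)):
    k = kish_ratio(w)
    print(f"{name}: K = {k:.4f}, effective ensemble size = {k * n_frames:.0f}")
```

Uniform weights give K = 1. Spreading all of the weight evenly over 300 of 30,000 frames gives K = 0.01 and an effective ensemble size of 300.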
Overview of maximum entropy reweighting
An MD simulation generates a prior distribution P0(x) of conformations x. In initial implementations,83–85 maximum entropy reweighting approaches were designed to determine the distribution P1(x) that is closest to P0(x) while exactly fitting each experimental data point $f_i^{\mathrm{exp}}$. To measure the statistical distance between the two distributions P0(x) and P1(x), one can use the Kullback–Leibler (KL) divergence, defined as:

$$D_{\mathrm{KL}}\left[P_1 \,\|\, P_0\right] = \int dx\, P_1(x) \ln \frac{P_1(x)}{P_0(x)} \tag{1}$$
To determine P1(x), the closest distribution to P0(x) that fits the ensemble-averaged data, the KL divergence is then minimized subject to the following two constraints:

$$\langle f_i(x) \rangle_{P_1} = \int dx\, P_1(x)\, f_i(x) = f_i^{\mathrm{exp}}, \qquad i = 1, \dots, M \tag{2}$$

$$\int dx\, P_1(x) = 1 \tag{3}$$

where fi(x) is the forward model needed to predict the value of observable i for each conformation x, and M is the number of experimental data points. In Eq. 2, each experimental observation constrains the ensemble average of the corresponding forward model fi(x), computed over the distribution P1(x), to be exactly equal to the experimental value $f_i^{\mathrm{exp}}$. The second constraint in Eq. 3 ensures that P1(x) is normalized.
The problem of finding the distribution P1(x) that best represents the state of knowledge of a system after observing a set of experimental data can be cast in the framework of information theory. In this framework, among all the possible distributions compatible with the data, the one with the highest degree of uncertainty should be selected, to ensure minimal bias and that the data have been used as conservatively as possible. This uncertainty is expressed by the Shannon entropy38 of the random variable x with distribution P1(x) relative to the reference distribution P0(x):

$$S[P_1 \,\|\, P_0] = -\int dx\, P_1(x) \ln\frac{P_1(x)}{P_0(x)} \tag{4}$$
Because the entropy in Eq. 4 is the negative of the KL divergence in Eq. 1, maximizing the Shannon entropy to determine P1(x) is equivalent to minimizing the KL divergence, hence the name maximum entropy reweighting.
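For an MD ensemble of N frames, the integrals in Eqs. 1 and 4 reduce to sums over frames, with frame weights in place of probability densities. The short Python sketch below is our own illustration (the function names are hypothetical, not taken from the paper's repository); it computes both quantities for discrete weight vectors and shows that the relative entropy is simply the negative KL divergence.

```python
import numpy as np


def kl_divergence(w1, w0):
    """Discrete KL divergence D_KL(w1 || w0) = sum_j w1_j ln(w1_j / w0_j)."""
    w1 = np.asarray(w1, dtype=float)
    w0 = np.asarray(w0, dtype=float)
    nz = w1 > 0  # frames with zero weight contribute 0 (0 ln 0 = 0)
    return float(np.sum(w1[nz] * np.log(w1[nz] / w0[nz])))


def relative_entropy(w1, w0):
    """Relative (Shannon) entropy of w1 with respect to w0, i.e. -D_KL(w1 || w0)."""
    return -kl_divergence(w1, w0)


# Uniform prior weights from an unbiased run with N = 4 frames
w0 = np.full(4, 0.25)
w1 = np.array([0.4, 0.3, 0.2, 0.1])
print(kl_divergence(w1, w0))     # positive; zero only when w1 equals w0
print(relative_entropy(w1, w0))  # the negative of the line above
```

Maximizing `relative_entropy` over candidate weights is therefore the same optimization as minimizing `kl_divergence`, which is why the two formulations are interchangeable.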
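The minimization of the KL divergence subject to the constraints in Eqs. 2 and 3 can be solved with Lagrange multipliers. We first define the following Lagrange function:

$$\mathcal{L} = \int dx\, P_1(x) \ln\frac{P_1(x)}{P_0(x)} + \sum_{i=1}^{M} \lambda_i \left( \int dx\, P_1(x)\, f_i(x) - F_i^{\mathrm{exp}} \right) + \mu \left( \int dx\, P_1(x) - 1 \right) \tag{5}$$

where λi and µ are Lagrange multipliers. There is one Lagrange multiplier λi for each of the M experimental data points used as constraints. To find stationary points of the Lagrangian, we set ∂L/∂P1 = 0 and leave the normalization factor to be fixed by Eq. 3. This leads to the following form for the desired distribution P1(x):

$$P_1(x) = \frac{1}{Z(\boldsymbol{\lambda})}\, P_0(x) \exp\left( -\sum_{i=1}^{M} \lambda_i f_i(x) \right), \qquad Z(\boldsymbol{\lambda}) = \int dx\, P_0(x) \exp\left( -\sum_{i=1}^{M} \lambda_i f_i(x) \right) \tag{6}$$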
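Gradient descent or appropriate quasi-Newton methods can be used to calculate the values of the Lagrange multipliers, which ultimately yield a new set of weights, one for each conformation xj of the ensemble:

$$w_j = \frac{w_j^{0} \exp\left( -\sum_{i=1}^{M} \lambda_i f_i(x_j) \right)}{\sum_{k=1}^{N} w_k^{0} \exp\left( -\sum_{i=1}^{M} \lambda_i f_i(x_k) \right)} \tag{7}$$

where $w_j^{0}$ is the prior weight of conformation xj ($w_j^{0} = 1/N$ for an unbiased MD simulation with N frames).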
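In the Bussi et al. formalism, as we understand it, the multipliers are obtained by minimizing the convex function Γ(λ) = ln Z(λ) + Σi λi Fi^exp, with Z(λ) = Σj wj⁰ exp(−Σi λi fi(xj)); its gradient, Fi^exp − ⟨fi⟩, vanishes exactly when Eq. 2 is satisfied. The Python sketch below assumes a discrete ensemble with uniform prior weights 1/N and uses SciPy's L-BFGS-B quasi-Newton minimizer. `maxent_weights` and the toy four-frame data are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp


def maxent_weights(f, f_exp):
    """Exact-constraint maximum entropy reweighting of an N-frame ensemble (sketch).

    f     : (N, M) array, forward-model prediction f_i(x_j) for frame j, observable i
    f_exp : (M,) array of experimental values F_i^exp
    Returns the reweighted frame weights w_j and the multipliers lambda_i.
    """
    f = np.asarray(f, dtype=float)
    f_exp = np.asarray(f_exp, dtype=float)
    log_w0 = np.full(f.shape[0], -np.log(f.shape[0]))  # uniform prior weights 1/N

    def log_weights(lam):
        log_u = log_w0 - f @ lam   # ln[w0_j exp(-sum_i lambda_i f_i(x_j))]
        log_z = logsumexp(log_u)   # ln Z(lambda), computed stably
        return log_u - log_z, log_z

    def gamma(lam):
        log_w, log_z = log_weights(lam)
        w = np.exp(log_w)
        value = log_z + lam @ f_exp  # Gamma(lambda) = ln Z + lambda . F_exp
        grad = f_exp - w @ f         # F_i^exp - <f_i>, zero when Eq. 2 holds
        return value, grad

    res = minimize(gamma, np.zeros(f.shape[1]), jac=True, method="L-BFGS-B")
    return np.exp(log_weights(res.x)[0]), res.x


# Toy ensemble: 4 frames, one observable; prior average is 1.5, "experiment" says 2.0
f = np.array([[0.0], [1.0], [2.0], [3.0]])
w, lam = maxent_weights(f, [2.0])
print(w, w @ f)  # weights shift toward frames with larger f; average is close to 2.0
```

Working in log space with `logsumexp` keeps the weights numerically stable when some λi fi(xj) terms are large, which matters for ensembles with tens of thousands of frames.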
Introducing different sources of error
By fitting the experimental observations exactly, the formalism introduced above neglects the fact that experimental data, as well as forward models, can be affected by random and systematic errors; the data should therefore not be matched exactly, to avoid overfitting to noise. To alleviate this problem, maximum entropy reweighting approaches can be modified to account for different sources of error as well as for limited confidence in the prior distribution. Different approaches have been proposed over the years.27,30–33 In the formalism of Bussi et al.,27 errors are modeled by modifying the constraints of Eq. 2 through an auxiliary variable ɛi for each data point, which represents the difference between the experimental and predicted values. The new constraints are hence defined as follows:

$$\left\langle f_i(x) + \varepsilon_i \right\rangle_{P_1(x,\boldsymbol{\varepsilon})} = F_i^{\mathrm{exp}}, \qquad i = 1, \dots, M \tag{8}$$
Errors can be modeled by choosing an appropriate prior distribution for the variables ɛi. We choose a Gaussian prior with fixed standard deviation σi for the ith observable:

$$P_0(\varepsilon_i) \propto \exp\left( -\frac{\varepsilon_i^2}{2\sigma_i^2} \right) \tag{9}$$
The value of σi corresponds to the level of confidence in the ith experimental data point and ultimately determines how closely the reweighted ensemble will fit the data. The limit σi → ∞ implies no confidence in the data and yields a reweighted ensemble identical to the unbiased simulation. The limit σi → 0 implies complete confidence in the data and yields a reweighted ensemble in which the ith data point is matched exactly, as in Eq. 2. Selecting appropriate values of the regularization parameters σi for each experimental data point is therefore essential to balance one's confidence in the prior model (i.e., the accuracy of a force field), error estimates of experimental measurements, and error estimates of the accuracy of forward models for predicting experimental data. Ultimately, the properties of the resulting conformational ensembles will be highly dependent on the relative magnitudes of the regularization parameters selected for each experimental data point.
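With a Gaussian prior of width σi on each ɛi, the function minimized over the multipliers gains a quadratic penalty, Γ(λ) = ln Z(λ) + Σi λi Fi^exp + ½ Σi σi² λi², so that at the optimum ⟨fi⟩ = Fi^exp + σi² λi rather than exactly Fi^exp. This is our reading of the Gaussian-error case of the formalism, not code from the paper. The sketch below (hypothetical function name, uniform prior weights, toy data) reproduces the two limits just described.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp


def maxent_weights_gaussian(f, f_exp, sigma):
    """Maximum entropy reweighting with Gaussian errors of width sigma_i (sketch).

    Minimizes Gamma(lambda) = ln Z + lambda . F_exp + 0.5 * sum_i sigma_i^2 lambda_i^2
    starting from uniform prior weights 1/N.
    """
    f = np.asarray(f, dtype=float)
    f_exp = np.asarray(f_exp, dtype=float)
    sigma2 = np.broadcast_to(np.asarray(sigma, dtype=float) ** 2, f_exp.shape)
    log_w0 = np.full(f.shape[0], -np.log(f.shape[0]))

    def weights(lam):
        log_u = log_w0 - f @ lam
        log_z = logsumexp(log_u)
        return np.exp(log_u - log_z), log_z

    def gamma(lam):
        w, log_z = weights(lam)
        value = log_z + lam @ f_exp + 0.5 * np.sum(sigma2 * lam**2)
        grad = f_exp - w @ f + sigma2 * lam  # zero when <f_i> = F_i^exp + sigma_i^2 lambda_i
        return value, grad

    res = minimize(gamma, np.zeros(f.shape[1]), jac=True, method="L-BFGS-B")
    return weights(res.x)[0]


f = np.array([[0.0], [1.0], [2.0], [3.0]])  # prior average 1.5; target 2.0
averages = {s: float((maxent_weights_gaussian(f, [2.0], s) @ f)[0]) for s in (100.0, 1.0, 0.01)}
print(averages)  # large sigma: stays near 1.5; small sigma: approaches the exact fit, 2.0
```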
Different strategies have been developed to identify optimal values of σi. Typically, when reweighting with multiple types of experimental data, one value of σi is assigned to each class of experimental data, such as SAXS intensities, NMR chemical shifts, RDCs, or scalar couplings. A grid search is then performed, usually independently for each class of experimental data, and the optimal value of σi for each class is identified as the one that maximizes the agreement of the reweighted ensemble with independent experimental data not used for reweighting.29 A final reweighting step using all the available experimental data is then performed.
Determining regularization parameters
In our proposed maximum entropy approach, we adopt the above formalism of Bussi et al. with the following modifications. We assign one regularization parameter σi to each data point. To make the grid search over σi tractable, we decompose this parameter into two contributions (Eq. 10). The first, σreg, is the regularization parameter (one per data type), which describes both the experimental and forward model errors of that type of experiment. The second, σi,MD (one per data point), represents both the statistical errors of forward models in predicting experimental data and the statistical errors that result from calculating average quantities over a finite-size ensemble. The σi,MD term can be calculated with the Flyvbjerg block analysis technique,67 which accounts for the correlation between adjacent frames of the MD simulation, or as the standard error of the mean for uncorrelated data. We use a single free parameter, the Kish ratio (Eq. 11), to determine the optimal values of σreg for each experimental data type. This parameter is described in the next section.
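The text does not state the functional form used to combine σreg and σi,MD. The sketch below assumes they add in quadrature, σi = (σreg² + σi,MD²)^½, as two independent Gaussian error sources would, and estimates σi,MD with a plain Flyvbjerg–Petersen blocking loop. Taking the largest standard-error estimate over blocking levels with at least `min_blocks` blocks is a simple, conservative plateau heuristic of ours, not necessarily the rule used in the paper; both function names and the toy AR(1) data are hypothetical.

```python
import numpy as np


def blocking_sem(x, min_blocks=16):
    """Standard error of the mean of a correlated series by Flyvbjerg-Petersen blocking.

    Adjacent blocks are averaged pairwise as long as at least `min_blocks` blocks
    remain afterwards; the largest SEM estimate over all visited levels is returned
    (an assumed, conservative plateau heuristic).
    """
    x = np.asarray(x, dtype=float)
    best = np.sqrt(np.var(x, ddof=1) / x.size)  # level 0: naive SEM for uncorrelated data
    while x.size // 2 >= min_blocks:
        n = x.size - x.size % 2                  # drop a trailing unpaired value
        x = 0.5 * (x[0:n:2] + x[1:n:2])          # block transformation
        best = max(best, np.sqrt(np.var(x, ddof=1) / x.size))
    return float(best)


def combined_sigma(sigma_reg, f_frames, min_blocks=16):
    """Hypothetical sigma_i = sqrt(sigma_reg**2 + sigma_i,MD**2) for every data point.

    f_frames : (N, M) per-frame forward-model predictions, one column per data point.
    """
    f_frames = np.asarray(f_frames, dtype=float)
    sigma_md = np.array([blocking_sem(col, min_blocks) for col in f_frames.T])
    return np.sqrt(np.asarray(sigma_reg, dtype=float) ** 2 + sigma_md**2)


# Toy data: white noise versus a strongly correlated AR(1) series (phi = 0.95)
rng = np.random.default_rng(0)
white = rng.normal(size=2**14)
ar = np.empty(2**14)
ar[0] = 0.0
for t in range(1, ar.size):
    ar[t] = 0.95 * ar[t - 1] + rng.normal()
naive_ar = np.sqrt(np.var(ar, ddof=1) / ar.size)
print(blocking_sem(white), blocking_sem(ar) / naive_ar)  # ratio well above 1 for AR(1)
```

For the correlated series, the naive standard error badly underestimates the uncertainty of the mean, which is exactly the situation for adjacent MD frames spaced ∼1 ns apart.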
Kish Ratio
As described in the previous sections, maximum entropy reweighting algorithms change the statistical weights of each conformation of the input ensemble in a minimal way to improve the agreement with the available experimental data. In a conformational ensemble derived from an unbiased MD simulation with N frames, each conformation has an equal statistical weight of 1/N: wMD = {1/N, 1/N, …, 1/N}. Upon reweighting, the weights of all frames are modified to w = {w1, w2, …, wN}, as defined in Eq. 7. If the initial unbiased MD ensemble is already in excellent agreement with all the experimental data used for reweighting, it may be possible to satisfy these data with very small perturbations to the weight of each frame. In this instance, the weight of most frames will remain close to 1/N. If instead a large fraction of frames of the unbiased MD ensemble are in poor agreement with the experimental data, satisfying the data within a desired threshold may require reducing the statistical weights of a significant number of frames to values close to 0, effectively discarding these frames from the reweighted ensemble.
To quantify the overall change in weights upon reweighting, we use the (normalized) Kish effective sample size,28,66 or Kish ratio K:

$$K = \frac{\left( \sum_{i=1}^{N} w_i \right)^2}{N \sum_{i=1}^{N} w_i^2} \tag{11}$$

where wi is the statistical weight of the ith frame. The Kish ratio measures the size of the reweighted ensemble as the fraction of frames of the original ensemble that retain significant statistical weight after reweighting. A reweighted ensemble in which ∼50% of the frames have statistical weights close to zero will have a Kish ratio K ≈ 0.50. Similarly, if an unbiased MD simulation has 10,000 frames, a reweighted ensemble with a Kish ratio K = 0.01 would have approximately 100 frames (1% of frames) with statistical weights substantially larger than 0. By definition, a conformational ensemble derived from an unbiased MD simulation, where all N weights are equal, has a Kish ratio K = 1.0.
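A minimal Python sketch of the Kish ratio, assuming Eq. 11 is the standard normalized Kish effective sample size K = (Σi wi)² / (N Σi wi²); the function name is hypothetical. The usage lines reproduce the worked numbers above for a 10,000-frame ensemble.

```python
import numpy as np


def kish_ratio(w):
    """Normalized Kish effective sample size, K = (sum_i w_i)^2 / (N * sum_i w_i^2)."""
    w = np.asarray(w, dtype=float)
    return float(w.sum() ** 2 / (w.size * np.sum(w**2)))


n = 10_000
uniform = np.full(n, 1.0 / n)                              # unbiased MD: all weights 1/N
half = np.r_[np.full(n // 2, 2.0 / n), np.zeros(n // 2)]   # ~50% of frames discarded
few = np.r_[np.full(100, 0.01), np.zeros(n - 100)]         # only 100 frames keep weight
print(kish_ratio(uniform), kish_ratio(half), kish_ratio(few))  # approximately 1.0, 0.5, 0.01
```

Because K depends only on the weights, it can be monitored cheaply after every trial reweighting during a scan over regularization parameters.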
As discussed in the previous section, the σi parameters express our confidence in the experimental data. As the values of σi are changed during the optimization of the regularization parameters, the weights assigned to each conformation might be substantially altered to fit the experimental data within a desired threshold. Depending on the quality of the original MD ensemble, this might lead to dramatically reducing the Kish ratio of the reweighted ensemble, thus compromising the statistical accuracy of any average property calculated over the reweighted ensemble. Reweighted ensembles with small Kish ratios may effectively discard all structures representative of free energy basins in an initial MD simulation, and satisfy experimental restraints with a small set of rarely populated conformations (Figure 1). It is therefore desirable to ensure that the reweighted ensemble preserves a sufficiently large Kish ratio in order to be able to make predictions of the properties of IDPs, such as the relative positions and orientations of sidechain residues, with a desired statistical accuracy.
Methods
A maximum entropy reweighting protocol with a single free parameter
Our approach to combine multiple types of experimental data to reweight IDP ensembles uses only one free parameter: the Kish ratio of the reweighted ensemble (Eq. 11). Our protocol proceeds as follows:
1. We calculate σi,MD for each data point from the unbiased MD simulation using a Flyvbjerg67 blocking analysis.
2. For each experimental data type, we perform reweighting over a wide range of σreg values and monitor the Kish ratio of the reweighted ensembles. For each data type, we choose a value of σreg based on a selected Kish ratio threshold. Here, we select the minimum value of σreg that produces an ensemble with Kish ratio K ≥ 0.10. This procedure establishes the relative values of σreg for each data type (i.e., σreg−Cα, σreg−JHNHα, σreg−SAXS, etc.).
3. A global reweighting is performed using all experimental data as restraints. In this step, we determine a global scaling factor σreg−Global for the values of σreg calculated for each data type in step 2. We perform reweighting over a range of σreg−Global values and monitor the Kish ratio of the reweighted ensembles. We select the minimum value of σreg−Global that produces a final reweighted ensemble with Kish ratio K ≥ 0.10.
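Steps 2 and 3 above can be sketched in Python as follows. This is a hypothetical illustration, not the code in the authors' repository, and it rests on several assumptions: Gaussian-error maximum entropy reweighting from uniform prior weights; σi combined in quadrature from σreg and σi,MD; the global step multiplying every per-type σreg by a common factor g that stands in for σreg−Global; and a Kish ratio that grows with σ, so that a sorted scan finds the minimum passing value. Step 1 is replaced by a fixed toy σi,MD, and the data types and numbers are random toy data.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp


def kish_ratio(w):
    """Normalized Kish effective sample size (Kish ratio)."""
    return float(w.sum() ** 2 / (w.size * np.sum(w**2)))


def reweight(f, f_exp, sigma):
    """Gaussian-error maximum entropy reweighting from uniform prior weights."""
    f = np.asarray(f, dtype=float)
    f_exp = np.asarray(f_exp, dtype=float)
    sigma2 = np.broadcast_to(np.asarray(sigma, dtype=float) ** 2, f_exp.shape)
    log_w0 = np.full(f.shape[0], -np.log(f.shape[0]))

    def gamma(lam):
        log_u = log_w0 - f @ lam
        log_z = logsumexp(log_u)
        w = np.exp(log_u - log_z)
        value = log_z + lam @ f_exp + 0.5 * np.sum(sigma2 * lam**2)
        return value, f_exp - w @ f + sigma2 * lam

    lam = minimize(gamma, np.zeros(f.shape[1]), jac=True, method="L-BFGS-B").x
    log_u = log_w0 - f @ lam
    return np.exp(log_u - logsumexp(log_u))


def min_sigma_reg(f, f_exp, sigma_md, grid, k_min=0.10):
    """Step 2: smallest sigma_reg on `grid` whose reweighted ensemble keeps K >= k_min."""
    for s in np.sort(grid):
        w = reweight(f, f_exp, np.sqrt(s**2 + sigma_md**2))  # quadrature (assumed)
        if kish_ratio(w) >= k_min:
            return float(s)
    return None


def protocol(data, sigma_grid, scale_grid, k_min=0.10):
    """data maps each data type to (f, f_exp, sigma_md); assumes every type reaches k_min."""
    sig_type = {k: min_sigma_reg(*v, sigma_grid, k_min) for k, v in data.items()}
    f_all = np.hstack([v[0] for v in data.values()])
    fe_all = np.concatenate([v[1] for v in data.values()])
    for g in np.sort(scale_grid):  # Step 3: common factor g plays the role of sigma_reg-Global
        sigma = np.concatenate(
            [np.sqrt((g * sig_type[k]) ** 2 + v[2] ** 2) for k, v in data.items()]
        )
        w = reweight(f_all, fe_all, sigma)
        if kish_ratio(w) >= k_min:
            return sig_type, float(g), w
    return sig_type, None, None


# Toy data: two hypothetical data types, 2000 frames, unit-variance predictions
rng = np.random.default_rng(1)
n = 2000
data = {
    "type_a": (rng.normal(size=(n, 2)), np.array([1.5, -1.2]), np.full(2, 0.02)),
    "type_b": (rng.normal(size=(n, 3)), np.array([1.0, 1.0, -1.2]), np.full(3, 0.02)),
}
sig_type, g, w = protocol(data, np.logspace(-3, 2, 31), np.logspace(-2, 2, 41))
print(sig_type, g, kish_ratio(w))
```

Scanning a logarithmic grid from small to large values returns the first (smallest) grid value that keeps K ≥ 0.10; a finer grid, or a bisection between the last failing and first passing values, would refine it.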
Calculation of experimental observations
All 30 µs MD simulations analyzed here were run on the Anton supercomputer86 and contain 29,976 frames, with a spacing of ∼1 ns per frame. Backbone scalar coupling constants were calculated using previously determined Karplus equations.87–89 RDCs were calculated with PALES90 using a local alignment window of 15 residues.56 Backbone chemical shifts were calculated using SPARTA+16 with MDTraj.91 The SAXS profile of each frame was calculated using Pepsi-SAXS92 following a previously described protocol,93 with the adjustable solvent parameters set to δρ = 3.34 e/nm³ and r0 = 1.68 Å. The SAXS profiles were normalized by dividing all intensities by the intensity at the first scattering angle, so that the initial intensity is 1. PREs were calculated using distances between Cα atoms as described previously.20
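The SAXS normalization described above is a one-line array operation; the sketch below also shows how per-frame predictions of any observable are averaged with (reweighted) frame weights. We assume that each frame's profile is normalized before averaging, which the text does not specify; the function names and toy intensities are hypothetical.

```python
import numpy as np


def normalize_saxs(profiles):
    """Divide each SAXS profile (one row per frame) by its intensity at the first q value."""
    profiles = np.atleast_2d(np.asarray(profiles, dtype=float))
    return profiles / profiles[:, :1]


def ensemble_average(per_frame, weights=None):
    """Weighted ensemble average of per-frame observables (uniform 1/N if no weights)."""
    per_frame = np.asarray(per_frame, dtype=float)
    if weights is None:
        weights = np.full(per_frame.shape[0], 1.0 / per_frame.shape[0])
    weights = np.asarray(weights, dtype=float)
    return weights @ per_frame / weights.sum()


# Toy intensities for 3 frames at 4 scattering angles (arbitrary units)
raw = np.array([[200.0, 150.0, 90.0, 40.0],
                [100.0, 80.0, 50.0, 20.0],
                [400.0, 260.0, 120.0, 50.0]])
norm = normalize_saxs(raw)
print(ensemble_average(norm), ensemble_average(norm, [0.5, 0.25, 0.25]))
```

Since every normalized profile starts at 1, any weighted average of them also starts at 1, so the comparison with a likewise-normalized experimental curve is not affected by an overall intensity scale.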
α-helical order parameter Sα
Sα describes the similarity of each seven-residue fragment of a protein to an ideal helix.68 Sα is calculated as a sum over all seven-residue fragments (Eq. 12), where RMSDαi is the root mean square deviation (RMSD) between the seven-residue fragment spanning residues i to i + 6 and an ideal seven-residue helix. Eq. 12 functions as a switching function that outputs values ranging from 0 (not helical) to 1 (perfectly helical) for each seven-residue segment. The threshold of this function is set by the parameter r0, which we set to 0.80 Å. With r0 = 0.80 Å, a seven-residue segment with RMSDαi > 2.5 Å effectively contributes a value of 0 to the Sα sum, while a seven-residue segment with RMSDαi < 0.5 Å effectively contributes a value of 1.0. The value of Sα for a protein conformation can therefore be interpreted as a proxy for the number of seven-residue fragments that resemble an ideal helix.
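The text does not reproduce the functional form of Eq. 12. A common choice for this kind of helical order parameter is the rational switching function s(r) = [1 − (r/r0)^8] / [1 − (r/r0)^12], the form commonly used with PLUMED's ALPHARMSD collective variable; we assume it here because it agrees with the values quoted above (about 0.98 at 0.5 Å and about 0.01 at 2.5 Å for r0 = 0.80 Å). The sketch takes precomputed per-fragment RMSDs in Å and omits the alignment to an ideal helix; the function name is hypothetical.

```python
import numpy as np


def s_alpha(rmsd, r0=0.80, n=8, m=12):
    """Sum of rational switching functions over seven-residue fragments (assumed form).

    rmsd : per-fragment RMSD to an ideal helix, in Angstrom.
    """
    x = np.asarray(rmsd, dtype=float) / r0
    near = np.isclose(x, 1.0)
    safe = np.where(near, 0.5, x)          # placeholder avoids 0/0 at RMSD == r0
    s = (1.0 - safe**n) / (1.0 - safe**m)
    s = np.where(near, n / m, s)           # analytic limit n/m at RMSD == r0
    return float(np.sum(s))


# Hypothetical per-fragment RMSDs for a short peptide (Angstrom)
print(s_alpha([0.3, 0.5, 0.8, 1.5, 2.5]))  # roughly 1 + 1 + 2/3 + small + ~0
```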
ELViM ensemble comparisons
We quantify the structural similarity of conformational ensembles by jointly analyzing all MD simulations of the same protein with the energy landscape visualization method (ELViM)71–73 for dimensionality reduction and computing the overlap of the resulting latent space densities. For each protein, we concatenated all unbiased trajectories into a single merged ensemble. We used ELViM to compute a dissimilarity matrix for this merged ensemble, with the hyperparameters σ0 and ɛ set to 1 and 0.15, respectively, and projected the conformations of the merged ensemble onto a 2D latent space. Individual kernel densities were constructed for each dataset (each unbiased and reweighted ensemble of each protein) and evaluated on an 80 × 80 grid spanning the global extrema of each dimension of the ELViM latent space. Gaussian kernel densities were estimated using the SciPy94 Python package, with kernel bandwidths determined by Scott's rule95 for both weighted and unbiased kernel density estimates. For a pair of kernel densities, D1 and D2, we evaluate the overlap integral S:

$$S = \frac{1}{N_1 N_2} \int d\mathbf{z}\, D_1(\mathbf{z})\, D_2(\mathbf{z}) \tag{13}$$

where z denotes coordinates in the ELViM latent space and N1 and N2 are normalization constants for the distributions D1 and D2, respectively, chosen such that S lies in the range [0, 1]. For a kernel density Di, the normalization constant is defined as:

$$N_i = \left( \int d\mathbf{z}\, D_i(\mathbf{z})^2 \right)^{1/2} \tag{14}$$
If the evaluations of the kernel density estimators on the grid are compiled into vectors D1 and D2, this computation is equivalent to the normalized dot product S = D1 · D2 / (‖D1‖ ‖D2‖).
Using the definitions in Eqs. 13 and 14, S takes values in [0, 1]. Two kernel densities with no overlapping points have an overlap of S = 0, while two identical kernel densities have an overlap of S = 1. Overlap integrals can be expressed as percentages by multiplying S by 100%. We computed the overlap integrals of the ELViM projections of the unbiased and reweighted ensembles derived from simulations of each protein with each force field, and compared the overlap for all pairs of unbiased and reweighted ensembles of each protein (Figure 8).
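A sketch of the overlap calculation with SciPy's `gaussian_kde`, which accepts optional per-point weights and uses Scott's rule when `bw_method="scott"`. Two random 2D point clouds stand in for ELViM projections (ELViM itself is not reproduced), and the function names are hypothetical. The discrete overlap is the normalized dot product of density values on a shared 80 × 80 grid spanning the merged data.

```python
import numpy as np
from scipy.stats import gaussian_kde


def make_grid(all_points, n=80):
    """n x n grid spanning the global extrema of each latent dimension; shape (2, n*n)."""
    lo, hi = all_points.min(axis=0), all_points.max(axis=0)
    gx, gy = np.meshgrid(np.linspace(lo[0], hi[0], n), np.linspace(lo[1], hi[1], n))
    return np.vstack([gx.ravel(), gy.ravel()])


def kde_on_grid(points, grid_xy, weights=None):
    """Evaluate a 2D Gaussian KDE (Scott's rule bandwidth, optional frame weights) on the grid."""
    kde = gaussian_kde(points.T, bw_method="scott", weights=weights)
    return kde(grid_xy)


def overlap(d1, d2):
    """Normalized overlap S = D1.D2 / (|D1| |D2|); lies in [0, 1] for non-negative densities."""
    return float(d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2)))


rng = np.random.default_rng(0)
a = rng.normal(size=(500, 2))            # stand-in for one projected ensemble
b = rng.normal(loc=6.0, size=(500, 2))   # a well-separated stand-in ensemble
grid = make_grid(np.vstack([a, b]))      # shared grid over the merged projection
d_a, d_b = kde_on_grid(a, grid), kde_on_grid(b, grid)
d_a_w = kde_on_grid(a, grid, weights=np.full(500, 1 / 500))  # weighted KDE, uniform weights
print(overlap(d_a, d_a), overlap(d_a, d_b), overlap(d_a, d_a_w))
```

Because both densities are evaluated on the same uniform grid, the grid-cell area cancels between the numerator and the two norms, so no explicit integration weights are needed.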
Code Availability
All code used to calculate experimental data from MD trajectories, perform reweighting, analyze ensembles, and compare energy landscape visualization method (ELViM) projections71–73 is freely available from the GitHub repository https://github.com/paulrobustelli/Borthakur_MaxEnt_IDPs_2024/. The ELViM code was adapted from the GitHub repository https://github.com/VLeiteGroup/ELViM/.
Data Availability
All experimental data used to reweight MD trajectories, the predicted values of experimental data calculated from each trajectory, and values of the structural descriptors used to compare ensembles are freely available from the GitHub repository (https://github.com/paulrobustelli/Borthakur_MaxEnt_IDPs_2024/). All previously reported19 MD trajectories analyzed in this work are available for non-commercial use by request from D.E. Shaw Research (Trajectories{at}DEShawResearch.com).
Acknowledgement
P.R., K.B., and T.R.S. acknowledge the support of NIH award R35GM142750. M.B. and F.P.P. acknowledge the support of the French Agence Nationale de la Recherche (ANR), under grant ANR-20-CE45-0002. The authors thank Vincent Schnapka and Vitor Leite for useful discussions.
References
- (1).
- (2).
- (3).
- (4).
- (5).
- (6).
- (7).
- (8).
- (9).
- (10).
- (11).
- (12).
- (13).
- (14).
- (15).
- (16).
- (17).
- (18).
- (19).
- (20).
- (21).
- (22).
- (23).
- (24).
- (25).
- (26).
- (27).
- (28).
- (29).
- (30).
- (31).
- (32).
- (33).
- (34).
- (35).
- (36).
- (37).
- (38).
- (39).
- (40).
- (41).
- (42).
- (43).
- (44).
- (45).
- (46).
- (47).
- (48).
- (49).
- (50).
- (51).
- (52).
- (53).
- (54).
- (55).
- (56).
- (57).
- (58).
- (59).
- (60).
- (61).
- (62).
- (63).
- (64).
- (65).
- (66).
- (67).
- (68).
- (69).
- (70).
- (71).
- (72).
- (73).
- (74).
- (75).
- (76).
- (77).
- (78).
- (79).
- (80).
- (81).
- (82).
- (83).
- (84).
- (85).
- (86).
- (87).
- (88).
- (89).
- (90).
- (91).
- (92).
- (93).
- (94).
- (95).