Quantitative prediction of ensemble dynamics, shapes and contact propensities of intrinsically disordered proteins

Intrinsically disordered proteins (IDPs) are highly dynamic systems that play an important role in cell signaling processes, and their malfunction often causes human disease. Proper understanding of IDP function requires not only the realistic characterization of their three-dimensional conformational ensembles at atomic-level resolution but also of the time scales of interconversion between their conformational substates. Large sets of experimental data are often used in combination with molecular modeling to restrain or bias models to improve agreement with experiment. It is shown here for the N-terminal transactivation domain of p53 (p53TAD) and Pup, which are two IDPs that fold upon binding to their targets, how the latest advances in molecular dynamics (MD) simulation methodology produce native conformational ensembles by combining replica exchange with a series of microsecond MD simulations. They closely reproduce experimental data at the global conformational ensemble level, in terms of the distribution properties of the radius of gyration tensor, and at the local level, in terms of NMR properties including 15N spin relaxation, without the need for reweighting. Further inspection revealed that 10–20% of the individual MD trajectories display the formation of secondary structures not observed in the experimental NMR data. The IDP ensembles were analyzed by graph theory to identify dominant inter-residue contact clusters and characteristic amino-acid contact propensities. These findings indicate that modern MD force fields with residue-specific backbone potentials can produce highly realistic IDP ensembles sampling a hierarchy of nano- and picosecond time scales, providing new insights into their biological function.

[…] primarily developed for ordered proteins turned out to be unsatisfactory for applications to IDPs.

With the continuing increase in computer power, the quality of sampling has reached a level that […] to produce better agreement, at least on average for those experimental parameters directly used as restraints or for reweighting, they naturally depend on large amounts of good-quality experimental data as input for each protein system studied. This amounts to a laborious experimental effort that needs to be repeated for each new protein system, as the experimental data are protein-specific and therefore not transferable between systems.

An alternative and more principled approach is to improve the MD force fields themselves, enabling them to predict experimental data increasingly accurately in a way that is fully transferable between protein systems, both ordered and disordered. This premise has led to a recent proliferation of protein force field developments(32-37) and new explicit water models(38-40) specifically geared toward the improved representation of disordered proteins. In […]

[…] instantaneous secondary structures and average contact maps were determined from the MD trajectories (Fig. 4). A contact is defined in an MD snapshot when the nearest distance between atoms from two different residues is smaller than 4 Å. Uninformative first-neighbor (i, i+1) and second-neighbor (i, i+2) contacts between residues were excluded (white band along the diagonal in Fig. 4A,B). The most frequent contacts are relatively short range, but contacts over larger […]
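The contact criterion described above (closest interatomic distance below 4 Å, with (i, i+1) and (i, i+2) pairs excluded) can be illustrated with a minimal NumPy sketch. The function names `contact_map` and `average_contact_map` are hypothetical, and the sketch assumes plain coordinate arrays in Å with a 0-based residue index per atom rather than any particular trajectory format; it is not the authors' analysis code.

```python
import numpy as np

def contact_map(coords, residue_ids, cutoff=4.0, min_separation=3):
    """Boolean residue-residue contact matrix for one MD snapshot.

    coords: (n_atoms, 3) atom positions in Angstrom (assumed input format).
    residue_ids: (n_atoms,) 0-based residue index of each atom.
    Residues i and j are in contact if any atom pair is closer than `cutoff`.
    Pairs with |i - j| < min_separation (self, i+1, i+2) are excluded.
    """
    coords = np.asarray(coords, dtype=float)
    residue_ids = np.asarray(residue_ids)
    n_res = int(residue_ids.max()) + 1
    # All atom-atom distances; adequate for small IDPs, use a neighbor list for large systems
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    ai, aj = np.nonzero(dist < cutoff)
    contacts = np.zeros((n_res, n_res), dtype=bool)
    contacts[residue_ids[ai], residue_ids[aj]] = True
    # Blank out the band along the diagonal (self, first- and second-neighbor pairs)
    idx = np.arange(n_res)
    contacts[np.abs(idx[:, None] - idx[None, :]) < min_separation] = False
    return contacts

def average_contact_map(frames, residue_ids, **kwargs):
    """Fraction of snapshots in which each residue pair is in contact."""
    return np.mean([contact_map(f, residue_ids, **kwargs) for f in frames], axis=0)
```

Averaging the Boolean maps over all snapshots gives a time-averaged contact frequency map of the kind shown in Fig. 4A,B.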

We also grouped the number of contacts per snapshot formed by each residue according to residue type and normalized them by the number of residues of the same type. The resulting value for each amino acid residue type present in p53TAD and Pup reflects its inherent contact propensity (Fig. 5C,D). These profiles display the following trends: the positively charged residues arginine and lysine are on average most prone to form contacts, followed by the hydrophobic residues isoleucine and leucine as well as the aromatic residues tryptophan and phenylalanine.
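The per-type normalization just described can be sketched as follows, assuming per-snapshot Boolean contact matrices (for example, from a helper like the hypothetical `contact_map` used for Fig. 4) and a one-letter sequence string. The name `contact_propensity` is illustrative only.

```python
import numpy as np
from collections import defaultdict

def contact_propensity(contact_maps, sequence):
    """Average contacts per snapshot per residue, grouped by amino-acid type.

    contact_maps: iterable of (n_res, n_res) Boolean contact matrices, one per snapshot.
    sequence: one-letter amino-acid string of length n_res (assumed input).
    """
    maps = np.asarray(list(contact_maps), dtype=bool)
    # Number of contacts formed by each residue, averaged over snapshots
    per_residue = maps.sum(axis=2).mean(axis=0)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for aa, n_contacts in zip(sequence, per_residue):
        totals[aa] += n_contacts
        counts[aa] += 1
    # Normalize by the number of residues of the same type
    return {aa: totals[aa] / counts[aa] for aa in totals}
```

Because each residue type is divided by its own count, types that occur only once or twice in a sequence carry larger statistical uncertainty than frequent ones.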

Negatively charged residues aspartate and glutamate, however, are least disposed to form contacts. This may also be a consequence of both IDPs being negatively charged overall (-14e for p53TAD and -12e for Pup). When acidic residues outnumber basic residues, the former tend to […]

The regions of Pup with elevated R2 values (Fig. 3D) around Arg8, Ile18, Thr22, Arg29, Arg56 are all involved in clusters B1, B4, or B3 (Fig. 4E,F). Separate clusters can involve sequentially adjacent residues, such as clusters B2 and B3 or clusters B3 and B5, and thereby […]

[…] protein sequence data. Here, we back-calculated NMR chemical shifts using PPM(72) (Fig. S4), which relies solely on a physical parametrization of chemical shifts with respect to the 3D protein structure of each snapshot,(71) achieving very good agreement.
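Contact clusters such as B1 to B5 (Fig. 4E,F) come from a graph-theoretical analysis of the per-snapshot contact maps, with residues as nodes and contacts as edges. The exact cluster definition is not given in this section, so the sketch below assumes that an instantaneous cluster is a connected component of at least two residues in one snapshot's contact graph. The names `snapshot_clusters` and `cluster_size_distribution` are hypothetical.

```python
import numpy as np
from collections import Counter

def snapshot_clusters(contacts):
    """Connected components (size >= 2) of one snapshot's residue contact graph.

    contacts: symmetric (n_res, n_res) Boolean contact matrix.
    Returns a list of (sorted residue indices, number of edges) per cluster.
    """
    contacts = np.asarray(contacts, dtype=bool)
    seen = np.zeros(contacts.shape[0], dtype=bool)
    clusters = []
    for start in range(contacts.shape[0]):
        if seen[start] or not contacts[start].any():
            continue  # already assigned, or residue without any contact
        stack, members = [start], []
        seen[start] = True
        while stack:  # depth-first traversal of the contact graph
            node = stack.pop()
            members.append(node)
            for nb in np.flatnonzero(contacts[node]):
                if not seen[nb]:
                    seen[nb] = True
                    stack.append(int(nb))
        members.sort()
        # Count each undirected edge once (upper triangle of the sub-matrix)
        n_edges = int(np.triu(contacts[np.ix_(members, members)], k=1).sum())
        clusters.append((members, n_edges))
    return clusters

def cluster_size_distribution(contact_maps):
    """Fraction of clusters of each size, pooled over all snapshots."""
    sizes = Counter(len(m) for cm in contact_maps for m, _ in snapshot_clusters(cm))
    total = sum(sizes.values())
    return {size: count / total for size, count in sorted(sizes.items())}
```

For an acyclic (linear or tree-like) cluster, the number of edges equals the number of nodes minus one. Clusters containing circular sub-graphs have more edges than that.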

The close correspondence observed between experimental and computed 15N R1 and R2 relaxation rates for both IDPs studied here (Fig. 3) […]

[…] inter-residue interactions between non-sequential amino acids are short-lived. Therefore, the time-averaged interaction maps (Fig. 4A,B) offer only partial insights, as they conceal the […]

[…] Tables S2, S3.

Snapshot-by-snapshot analysis revealed the dominance of small cluster sizes over larger ones (Fig. 6). For both p53TAD and Pup, clusters with 2 or 3 residues make up more than 50% of all clusters, and clusters with more than 10 residues occur notably rarely, although their formation could be functionally relevant during molecular recognition events. Because […]

[…] that have occupancies larger than 0.2 visualized as separate graphs (Fig. 4E,F). Instantaneous clusters can belong to such larger graphs, as exemplified by clusters A1.1, A1.2, and A1.3 for p53TAD and clusters B1.1 and B1.2 for Pup (Fig. 4E,F).

[…] The majority of clusters are linear graphs with few circular sub-graphs, leading to the linear relationship between the number of nodes and the number of edges (Fig. 6B). Acidic residues tend to have low cluster participation, whereas arginine residues have the highest participation in both proteins (Fig. 5A,B). This difference in cluster participation between cationic and anionic residues is also evident in Fig. 5C […]

[…] volume with all protein heavy atoms positionally fixed. The pressure was then coupled to 1 atm and the system was simulated for another 100 ps. The final 1-µs production run was performed in the NPT ensemble at 300 K and 1 atm. For simulation details, see Table S1.

[…]

$$C_{R_g}(\tau) = \frac{\langle (R_g(t) - \langle R_g \rangle)(R_g(t+\tau) - \langle R_g \rangle) \rangle_t}{\langle (R_g(t) - \langle R_g \rangle)^2 \rangle_t} \qquad (5)$$

as an average over all 1-µs MD trajectories.
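Eq. (5) can be evaluated directly from an Rg time series. The sketch below assumes uniformly spaced frames and takes ⟨Rg⟩ as the mean of each individual trajectory; whether the original analysis used a per-trajectory or an ensemble-wide mean is not stated here. The function names are hypothetical.

```python
import numpy as np

def rg_autocorrelation(rg, max_lag=None):
    """Normalized autocorrelation C_Rg(tau) of one Rg time series, following Eq. (5).

    rg: 1D array of Rg values at uniform time spacing.
    Returns C for lags 0..max_lag (in frames); C[0] equals 1.
    """
    rg = np.asarray(rg, dtype=float)
    n = rg.size
    if max_lag is None:
        max_lag = n // 2
    d = rg - rg.mean()    # fluctuation Rg(t) - <Rg>
    var = np.mean(d * d)  # <(Rg(t) - <Rg>)^2>_t
    # Average the product over all time origins t available at each lag
    return np.array([np.mean(d[: n - lag] * d[lag:]) / var for lag in range(max_lag + 1)])

def average_over_trajectories(rg_series, max_lag):
    """Average C_Rg(tau) over several trajectories (each longer than max_lag frames)."""
    return np.mean([rg_autocorrelation(r, max_lag) for r in rg_series], axis=0)
```

Lags are in frames. Multiplying by the saving interval converts them to time. Long lags are averaged over few time origins and are therefore noisy, which is why the default stops at half the trajectory length.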

According to polymer theory, for an unfolded polymer the ensemble-averaged Rg scales with the number of residues N as(61)

$$\langle R_g \rangle = r_0 N^{\nu}$$

where r0 is a constant reflecting the average size of a residue and the Flory exponent ν determines the overall compactness of the polymer, which serves as a reference.
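To compare a simulated ensemble with this polymer reference, one needs Rg per conformation and, where several chain lengths are available, a fit of r0 and ν. The sketch below computes Rg as the square root of the trace of the gyration tensor, with uniform weights unless masses are supplied. It fits the scaling law by linear regression in log-log space. The helper names are hypothetical, and no parameter values from this study are implied.

```python
import numpy as np

def radius_of_gyration(coords, masses=None):
    """Rg of one conformation from the trace of its gyration tensor."""
    coords = np.asarray(coords, dtype=float)
    w = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    x = coords - np.average(coords, axis=0, weights=w)
    # Gyration tensor S_ab = sum_i w_i x_ia x_ib / sum_i w_i
    S = (w[:, None, None] * x[:, :, None] * x[:, None, :]).sum(axis=0) / w.sum()
    return float(np.sqrt(np.trace(S)))

def flory_rg(n_residues, r0, nu):
    """Polymer reference value <Rg> = r0 * N**nu."""
    return r0 * n_residues ** nu

def fit_flory(n_residues, rg_values):
    """Least-squares fit of log<Rg> = log r0 + nu * log N; returns (r0, nu)."""
    slope, intercept = np.polyfit(np.log(n_residues), np.log(rg_values), 1)
    return float(np.exp(intercept)), float(slope)
```

For orientation, ν is 0.5 for an ideal (theta-solvent) chain, about 0.588 for a self-avoiding chain in good solvent, and about 1/3 for a compact globule. The eigenvalues of the same gyration tensor (for example, via `np.linalg.eigvalsh(S)`) describe the shape anisotropy of each conformation.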