To milliseconds and beyond: challenges in the simulation of protein folding
Highlights
► Millisecond simulations. ► Hardware and software advances. ► Simulation to understanding.
Introduction
Understanding protein folding via molecular simulation has been an aspiration of computational chemists ever since Anfinsen uncovered the surprising fact that proteins fold to a unique structure [1, 2, 3]. Applying simulation to folding appeals for many reasons. Folding is rapid and complex, and a complete, detailed picture requires atomic-level resolution at nanosecond timescales, which lies outside the temporal and spatial resolution limits of most experimental techniques [4]. Furthermore, the complexity of protein native states and the inherent physical heterogeneity of the folding process have frustrated the search for microscopic physical theories of folding, though some advanced phenomenological approaches have been proposed [5, 6, 7, 8, 9]. Atomic-level simulations of folding, which have intrinsically high resolution, have therefore been aggressively pursued in the hope of surmounting these difficulties.
Three main problems must be overcome to achieve useful simulations of protein folding: accurate models (forcefields), sufficient sampling, and robust data analysis. Forcefield development has received much attention from the field, and has been extensively discussed [10, 11, 12, 13, 14, 15]. Although more remains to be done to build and validate even more accurate models, forcefields capable of folding proteins in good agreement with experiment already exist (Figure 1). Instead, the major challenge in producing reliable simulations of folding has been harnessing enough computer power to produce sufficient sampling to study folding. Because classical simulations must integrate Newton's equations of motion with femtosecond timesteps (10⁻¹⁵ s), folding simulations require ∼10¹² timesteps to reach millisecond timescales. This expense is compounded by large system sizes (∼10⁵ atoms for explicit solvent simulations) and the need to witness many events for statistical confidence, making the computational effort required to study folding via simulation enormous.
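The timestep arithmetic in the preceding paragraph can be checked in a few lines of Python. This is a minimal sketch: the 1 fs timestep and ∼10⁵-atom system size come from the text, while the 2 fs variant in the last line is an assumption (a common choice when bond lengths are constrained), not something the text states.

```python
FEMTOSECOND = 1e-15  # seconds
MILLISECOND = 1e-3   # seconds

def timesteps_needed(target_time_s: float, timestep_s: float = FEMTOSECOND) -> int:
    """Integration steps required to advance a simulation by target_time_s."""
    return round(target_time_s / timestep_s)

steps = timesteps_needed(MILLISECOND)
n_atoms = 10**5  # explicit-solvent system size quoted in the text

print(f"{steps:.0e} timesteps for 1 ms")
# Every step evaluates forces on every atom, so total work grows at least as steps * atoms.
print(f"{steps * n_atoms:.0e} atom-timesteps")
# Assumption (not from the text): a 2 fs timestep halves the step count.
print(f"{timesteps_needed(MILLISECOND, 2 * FEMTOSECOND):.0e} timesteps at 2 fs")
```

The first line reproduces the ∼10¹² figure; multiplying by the system size shows why large explicit-solvent systems compound the cost.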
Very recently, advances in hardware, software, and sampling techniques have made millisecond simulations possible. In 2010, Voelz et al. used an aggregate of 1.5 ms of data, generated by the distributed computing network Folding@home, to report the first simulation describing the folding of a millisecond folder in implicit solvent [16•, 17]. Later that year, Bowman et al. reported a millisecond timescale in the folding dynamics of lambda repressor in explicit solvent [18]. More recently, using 30 ms of aggregate data, Voelz et al. studied the folding of acyl-CoA binding protein (ACBP) on the 10 ms timescale, revealing that an experimentally observed folding intermediate is in fact a complex, heterogeneous ensemble of structures [19••]. Finally, with the advent of Anton, a computer specialized for protein simulation, the first single millisecond-long trajectory was reported near the end of 2010 [20•], making it possible to predict folding times of up to 100 μs from a single trajectory.
While challenging, generating enough sampling with an accurate forcefield is not the end of the road for a folding simulation. Another major challenge is gaining scientific insight from the simulation: turning data into knowledge. Insights gained from simulations have already begun to shape the protein folding field, through connection to experiment and analysis of the simulations themselves [18, 19••, 21, 22, 23, 24, 25]. These preliminary studies have revealed that analyzing simulation data is difficult; in some cases, particular analysis techniques have led researchers to conclusions inconsistent with their raw simulation data [20•, 26•, 27]. The simulation community needs to develop general, robust, and easy-to-use data analysis tools to continue toward the goal of understanding folding.
In this review, we briefly explain how the sampling problem has been overcome, and why we can expect the future to yield even longer simulations more efficiently. As sampling becomes less of an issue, a new challenge in folding simulations arises: given the massive amount of data that extensive sampling provides, how does one make sense of it all? MD simulations produce high-dimensional time series and therefore present a ‘Big Data’ challenge [28, 29]. We expect data analysis techniques to become the new limiting factor in the quest to understand folding through molecular simulation.
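To make the "high-dimensional time series" point concrete: each saved frame of an N-atom trajectory is a point in 3N-dimensional space. The sketch below estimates raw coordinate storage for 1 ms of aggregate sampling of a ∼10⁵-atom system. The float32 coordinates, the absence of compression, and the 100 ps save interval are hypothetical assumptions chosen for illustration, not values from the text.

```python
# Rough size of an MD trajectory viewed as a time series of 3N-dimensional vectors.
# Hypothetical assumptions: 4-byte (float32) coordinates, no compression,
# and a fixed save interval chosen by the user.

BYTES_PER_COORD = 4  # float32

def frame_dimension(n_atoms: int) -> int:
    """Each frame stores x, y, z for every atom: a 3N-dimensional vector."""
    return 3 * n_atoms

def trajectory_bytes(n_atoms: int, simulated_time_s: float, save_interval_s: float) -> int:
    """Uncompressed coordinate storage when a frame is saved every save_interval_s."""
    n_frames = round(simulated_time_s / save_interval_s)
    return n_frames * frame_dimension(n_atoms) * BYTES_PER_COORD

n_atoms = 10**5         # explicit-solvent system size quoted in the text
aggregate_time = 1e-3   # 1 ms of aggregate sampling
save_every = 100e-12    # hypothetical: one frame per 100 ps

size = trajectory_bytes(n_atoms, aggregate_time, save_every)
print(frame_dimension(n_atoms))   # dimensions per frame
print(f"{size / 1e12:.1f} TB")    # raw coordinate storage
```

Under these assumptions each frame has 300,000 dimensions and the coordinates alone occupy about 12 TB, before any derived features are computed.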
Overcoming the barrier to sampling
Over the past decade, the system sizes and time scales accessible to protein simulations have grown exponentially (Figure 2). This gain has been achieved through progress on three main fronts: efficient parallelization of MD codes, specialized hardware, and statistical analysis of multiple independent trajectories. While there are many techniques designed to accelerate dynamic sampling via biasing the system in some way (e.g. replica exchange, metadynamics, aMD, and the string method [30, 31, 32
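One common form of statistical analysis of multiple independent trajectories is the Markov state model (MSM). The sketch below is a minimal illustration, not the authors' implementation: it assumes every frame has already been assigned to one of a finite set of conformational states (the clustering step is omitted), pools transition counts from many trajectories at a chosen lag time τ, row-normalizes them into a transition matrix, and converts the eigenvalues λᵢ into implied timescales tᵢ = −τ/ln λᵢ. It uses a plain, non-reversible count estimator; dedicated MSM packages generally use more careful estimators. The two-state toy trajectories are invented input.

```python
import numpy as np

def count_matrix(dtrajs, n_states, lag):
    """Pool i -> j transition counts at a lag of `lag` frames over all trajectories."""
    if lag < 1:
        raise ValueError("lag must be at least one frame")
    C = np.zeros((n_states, n_states))
    for traj in dtrajs:
        traj = np.asarray(traj, dtype=int)
        if len(traj) > lag:
            # np.add.at accumulates correctly when the same (i, j) pair repeats
            np.add.at(C, (traj[:-lag], traj[lag:]), 1)
    return C

def transition_matrix(C):
    """Row-normalize counts; states never seen as a starting point get an all-zero row."""
    rows = C.sum(axis=1, keepdims=True)
    return np.divide(C, rows, out=np.zeros_like(C), where=rows > 0)

def implied_timescales(T, lag, frame_time=1.0):
    """t_i = -lag * frame_time / ln(lambda_i) for the non-stationary eigenvalues."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    slow = eigvals[1:]                      # drop the stationary eigenvalue (= 1)
    slow = slow[(slow > 0) & (slow < 1)]    # only decaying processes have finite timescales
    return -lag * frame_time / np.log(slow)

# Toy input: two short trajectories hopping between state 0 and state 1.
dtrajs = [[0, 0, 0, 1, 1, 1, 0, 0], [1, 1, 0, 0, 0, 1, 1, 1]]
C = count_matrix(dtrajs, n_states=2, lag=1)
T = transition_matrix(C)
print(T)
print(implied_timescales(T, lag=1))
```

The key design point is that counts from many short, independent trajectories are pooled into one model, so the slowest implied timescale can exceed the length of any individual trajectory.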
Analysis: the final challenge
The advances in sampling techniques, along with the historical exponential increase in achievable sampling (Figure 2), make us hopeful that protein folding simulations will become routine calculations on commodity hardware within a decade. Indeed, one can now simulate the folding of small proteins in explicit solvent at rates of up to 100 ns/day/GPU [44, 45, 46], so a cluster of 100 GPUs can, in only three months, generate enough data to build Markov state models (MSMs) capable of predicting millisecond timescales.
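The three-month estimate for a 100-GPU cluster follows from simple throughput arithmetic. This sketch assumes "three months" means 90 days and that each GPU runs its own independent trajectory, so aggregate throughput scales linearly with the number of GPUs; the single-trajectory comparison is added for contrast.

```python
def aggregate_sampling_us(ns_per_day_per_gpu: float, n_gpus: int, days: float) -> float:
    """Total simulated time in microseconds from independent trajectories run in parallel.
    Assumes one trajectory per GPU, so throughput scales linearly with n_gpus."""
    total_ns = ns_per_day_per_gpu * n_gpus * days
    return total_ns / 1000.0  # ns -> μs

def days_for_single_trajectory(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days for one continuous trajectory at a fixed simulation rate."""
    return target_ns / ns_per_day

total_us = aggregate_sampling_us(100, 100, 90)  # 100 ns/day/GPU, 100 GPUs, ~3 months
print(f"{total_us:.0f} μs = {total_us / 1000:.1f} ms aggregate")
print(days_for_single_trajectory(1_000_000, 100) / 365, "years for one 1 ms trajectory")
```

The cluster accumulates roughly 0.9 ms of aggregate data, whereas a single 1 ms trajectory at the same per-GPU rate would take about 27 years; this gap is why statistical models built from many parallel trajectories are so useful on commodity hardware.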
What has and can be learned from simulations of folding
Given that sampling at millisecond timescales has been possible for only two years, and that analysis methodology is still immature, unambiguous scientific results learned from atomistic simulation have thus far been modest. It will be a major challenge in the next five years to turn advances in sampling and accuracy into scientific insight about how proteins fold.
Despite this relative immaturity, atomistic simulation has already begun to influence our view of protein folding. Detailed comparisons
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest
Acknowledgements
DS and VSP acknowledge support from the Simbios NIH Center for Biomedical Computation (NIH U54 Roadmap GM072970); VSP acknowledges NIH (R01GM62828). TJL was supported by an NSF Graduate Research Fellowship; KAB was supported by a Stanford Graduate Fellowship.
References (93)
- et al. Theory of protein folding. Curr Opin Struct Biol (2004).
- et al. β-Sheet folding mechanisms from perturbation energetics. Curr Opin Struct Biol (2006).
- et al. Are current molecular dynamics force fields too helical? Biophys J (2008).
- The free energy landscape analysis of protein (FIP35) folding dynamics. J Phys Chem B (2011).
- et al. Desmond Performance on a Cluster of Multicore Processors (2008).
- et al. Scalable molecular dynamics with NAMD. J Comput Chem (2005).
- et al. Anton, a special-purpose machine for molecular dynamics simulation. Commun ACM (2008).
- et al. Enhanced modeling via network theory: adaptive sampling of Markov state models. J Chem Theor Comput (2010).
- Stochastic Processes in Chemistry and Physics (2007).
- et al. Protein folding and misfolding: mechanism and principles. Quart Rev Biophys (2008).
- The Fip35 WW domain folds with structural and mechanistic heterogeneity in molecular dynamics simulations. Biophys J.
- Principles that govern the folding of protein chains. Science.
- Dynamics of folded proteins. Nature.
- Molecular dynamics simulations of biomolecules. Nat Struct Biol.
- Biomolecular simulation: a computational microscope for molecular biology. Ann Rev Biophys.
- Spin glasses and the statistical mechanics of protein folding. Proc Natl Acad Sci U S A.
- The ultimate speed limit to protein folding is conformational searching. J Am Chem Soc.
- Topology, stability, sequence, and length: defining the determinants of two-state protein folding kinetics. Biochemistry.
- Advances and continuing challenges in achieving realistic and predictive simulations of the properties of organic and biological molecules. Acc Chem Res.
- Force fields for protein simulations. Adv Protein Chem.
- Optimizing protein–solvent force fields to reproduce intrinsic conformational preferences of model peptides. J Chem Theor Comput.
- Are protein force fields getting better? A systematic benchmark on 524 diverse NMR measurements. J Chem Theor Comput.
- Systematic validation of protein force fields against experimental data. PLoS ONE.
- Molecular simulation of ab initio protein folding for a millisecond folder NTL9(1–39). J Am Chem Soc.
- Screen savers of the world unite! Science.
- Atomistic folding simulations of the five-helix bundle protein λ(6–85). J Am Chem Soc.
- Slow unfolded-state structuring in acyl-CoA binding protein folding revealed by simulation and experiment. J Am Chem Soc.
- Atomic-level characterization of the structural dynamics of proteins. Science.
- Unfolded-state dynamics and structure of protein L characterized by simulation and experiment. J Am Chem Soc.
- Modeling conformational ensembles of slow functional motions in Pin1-WW. PLoS Comput Biol.
- The fast and the slow: folding and trapping of λ 6–85. J Am Chem Soc.
- Computational design and experimental testing of the fastest-folding β-sheet protein. J Mol Biol.
- Single-molecule fluorescence experiments determine protein folding transition path times. Science.
- Markov state model reveals folding and functional dynamics in ultra-long MD trajectories. J Am Chem Soc.
- Computational solutions to large-scale data management and analysis. Nat Rev Genet.
- Immersive out-of-core visualization of large-size and long-timescale molecular dynamics trajectories. Lecture Notes in Computer Science.
- Transition-path theory and path-finding algorithms for the study of rare events. Annu Rev Phys Chem.
- Accelerated molecular dynamics: a promising and efficient simulation method for biomolecules. J Chem Phys.
- Replica-exchange molecular dynamics method for protein folding. Chem Phys Lett.
- Replica exchange simulation of reversible folding/unfolding of the Trp-cage miniprotein in explicit solvent: on the structure and possible role of internal water. J Struct Biol.
- A bias-exchange approach to protein folding. J Phys Chem B.
- GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation. J Chem Theor Comput.
- The Amber biomolecular simulation programs. J Comput Chem.
- Accelerating molecular modeling applications with graphics processors. J Comput Chem.
- Probing biomolecular machines with graphics processors. Commun ACM.
- Accelerating molecular dynamic simulation on graphics processing units. J Comput Chem.