To milliseconds and beyond: challenges in the simulation of protein folding

doi:10.1016/j.sbi.2012.11.002

Current Opinion in Structural Biology

Volume 23, Issue 1, February 2013, Pages 58-65

https://doi.org/10.1016/j.sbi.2012.11.002

Quantitatively accurate all-atom molecular dynamics (MD) simulations of protein folding have long been considered a holy grail of computational biology. Due to the large system sizes and long timescales involved, such a pursuit was for many years computationally intractable. Further, sufficiently accurate forcefields needed to be developed in order to realistically model folding. This decade, however, saw the first reports of folding simulations describing kinetics on the order of milliseconds, placing many proteins firmly within reach of these methods. Progress in sampling and forcefield accuracy, however, presents a new challenge: how to turn huge MD datasets into scientific understanding. Here, we review recent progress in MD simulation techniques and show how the vast datasets generated by such techniques present new challenges for analysis. We critically discuss the state of the art, including reaction coordinate and Markov state model (MSM) methods, and provide a perspective for the future.

Highlights

► Millisecond simulations. ► Hardware and software advances. ► Simulation to understanding.

Introduction

Understanding protein folding via molecular simulation has been an aspiration of computational chemists ever since Anfinsen uncovered the surprising fact that proteins folded to a unique structure [1, 2, 3]. Applying simulation to folding appeals for many reasons. Folding is rapid and complex, requiring atomic-level resolution at nanosecond timescales for a complete detailed picture — outside the hard limits of temporal and spatial resolution of most experimental techniques [4]. Furthermore, the complexity of protein native states and the inherent physical heterogeneity in the folding process have frustrated the search for microscopic physical theories of folding, though some advanced phenomenological approaches have been proposed [5, 6, 7, 8, 9]. Thus atomic-level simulations of folding, possessing intrinsically high resolution, have been aggressively pursued with the hope of surmounting these difficulties.

Three main problems must be overcome to achieve useful simulations of protein folding: accurate models (forcefields), sufficient sampling, and robust data analysis. Forcefield development has received much attention from the field, and has been extensively discussed [10, 11, 12, 13, 14, 15]. Although more remains to be done to build and validate even more accurate models, forcefields capable of folding proteins in good agreement with experiment already exist (Figure 1). Instead, the major challenge in producing reliable simulations of folding has been harnessing enough computer power to produce sufficient sampling to study folding. Because classical simulations must integrate Newton's equations of motion with femtosecond timesteps (10⁻¹⁵ s), folding simulations require ∼10¹² timesteps to reach millisecond timescales. This expense is compounded by large system sizes (∼10⁵ atoms for explicit solvent simulations) and the need to witness many events for statistical confidence, making the computational effort required to study folding via simulation enormous.
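The timestep count quoted above follows from dividing the target timescale by the integration timestep. A minimal sketch of that arithmetic in Python is given below; the function name `timesteps_needed` is hypothetical and chosen only for illustration, and the 1 fs timestep is the value stated in the text.

```python
# Back-of-the-envelope estimate of how many integration steps one
# trajectory needs to reach a given simulated time.
# Assumption: a fixed 1 fs timestep, as quoted in the text.

FEMTOSECOND_S = 1e-15  # integration timestep in seconds


def timesteps_needed(target_time_s, timestep_s=FEMTOSECOND_S):
    """Return the number of timesteps needed to cover target_time_s."""
    # round() absorbs the small floating-point error in the division
    return round(target_time_s / timestep_s)


if __name__ == "__main__":
    for label, seconds in [("1 microsecond", 1e-6), ("1 millisecond", 1e-3)]:
        print(f"{label}: {timesteps_needed(seconds):.0e} timesteps")
```

This counts steps for a single trajectory only. The cost of each step grows with the number of atoms, and many folding events must be observed, so the total effort is considerably larger than the step count alone suggests.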

Very recently, advances in hardware, software, and sampling techniques have made millisecond simulations possible. In 2010, using an aggregate of 1.5 ms of data, Voelz et al. reported the first simulation describing the folding of a millisecond folder in implicit solvent, from data generated by the distributed computing network Folding@home [16•, 17]. Later that year, Bowman et al. reported a millisecond timescale in the folding dynamics of lambda repressor in explicit solvent [18]. More recently, using 30 ms of aggregate data, Voelz et al. studied the folding of ACBP on the 10 ms timescale, revealing that an experimentally observed folding intermediate was in fact a complex, heterogeneous ensemble of structures [19••]. Finally, with the advent of ANTON, a computer specialized for protein simulation, the first single trajectory of millisecond length was reported near the end of 2010 [20•], making it possible to predict folding times of up to 100 μs from a single trajectory.

While challenging, generating enough sampling in an accurate forcefield does not constitute the end of the road for a folding simulation. Another major challenge is gaining scientific insight from the simulation, that is, turning data into knowledge. Insights gained from simulations have already begun to shape the protein folding field, through connection to experiment and analysis of the simulations themselves [18, 19••, 21, 22, 23, 24, 25]. These preliminary studies have revealed that the analysis of simulation data is difficult, with cases where certain techniques have led researchers to conclusions inconsistent with their raw simulation data [20•, 26•, 27]. The simulation community needs to develop general, robust, and easy-to-use data analysis tools to continue toward the goal of understanding folding.

In this review, we briefly explain how the sampling problem has been overcome, and why we can expect the future to yield even longer simulations more efficiently. As sampling becomes less of an issue, a new challenge in folding simulations raises its head — given the massive amount of data extensive sampling provides, how does one make sense of it all? MD simulations are a high-dimensional time series, and therefore present a ‘Big Data’ challenge [28, 29]. We expect the techniques of data analysis will be the new limiting factor in the quest to understand folding through molecular simulation.

Overcoming the barrier to sampling

Over the past decade, the system sizes and time scales accessible to protein simulations have grown exponentially (Figure 2). This gain has been achieved through progress on three main fronts: efficient parallelization of MD codes, specialized hardware, and statistical analysis of multiple independent trajectories. While there are many techniques designed to accelerate dynamic sampling via biasing the system in some way (e.g. replica exchange, metadynamics, aMD, and the string method [30, 31, 32

Analysis: the final challenge

The advances in sampling techniques, along with the historical exponential increase in achievable sampling (Figure 2), make us hopeful that protein folding simulations will become routine calculations for commodity hardware within a decade. Indeed, one can now simulate the folding of small proteins in explicit solvent at a rate of up to 100 ns/day/GPU [44, 45, 46], such that a cluster of 100 GPUs can produce MSMs with the ability to predict millisecond timescales in only three months.
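The throughput figures above can be checked with simple arithmetic. The sketch below is only an illustration: the function name `aggregate_sampling_ns` is hypothetical, "three months" is assumed to mean 90 days, and every GPU is assumed to sustain the quoted peak rate of 100 ns/day for the whole period.

```python
# Aggregate sampling produced by a GPU cluster running many
# independent trajectories in parallel.
# Assumptions: 90 days stands in for "three months", and each GPU
# sustains the quoted rate without downtime.


def aggregate_sampling_ns(ns_per_day_per_gpu, n_gpus, days):
    """Return the total simulated time, in nanoseconds, summed over all GPUs."""
    return ns_per_day_per_gpu * n_gpus * days


if __name__ == "__main__":
    total_ns = aggregate_sampling_ns(ns_per_day_per_gpu=100, n_gpus=100, days=90)
    print(f"aggregate sampling: {total_ns} ns = {total_ns / 1e6} ms")
```

Under these assumptions the cluster yields roughly 0.9 ms of aggregate data. This total is spread over many shorter trajectories rather than one continuous millisecond run, which is why statistical models such as MSMs are needed to extract millisecond-scale kinetics from it.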

What has and can be learned from simulations of folding

Given that the sampling at millisecond timescales has been possible for only two years, and analysis methodology is still immature, unambiguous scientific results learned from atomic simulation have thus far been modest. It will be a major challenge in the next five years to turn advances in sampling and accuracy into scientific insight about how proteins fold.

Despite this relative immaturity, atomistic simulation has already begun to influence our view of protein folding. Detailed comparisons

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

• of special interest
•• of outstanding interest

Acknowledgements

DS and VSP acknowledge support from the Simbios NIH Center for Biomedical Computation (NIH U54 Roadmap GM072970); VSP also acknowledges support from the NIH (R01GM62828). TJL was supported by an NSF GRF, and KAB was supported by a Stanford Graduate Fellowship.

References (93)

J.N. Onuchic et al. Theory of protein folding. Curr Opin Struct Biol (2004)
S. Deechongkit et al. β-Sheet folding mechanisms from perturbation energetics. Curr Opin Struct Biol (2006)
R.B. Best et al. Are current molecular dynamics force fields too helical? Biophys J (2008)
S.V. Krivov. The free energy landscape analysis of protein (FIP35) folding dynamics. J Phys Chem B (2011)
E. Chow et al. Desmond Performance on a Cluster of Multicore Processors (2008)
J.C. Phillips et al. Scalable molecular dynamics with NAMD. J Comput Chem (2005)
D.E. Shaw et al. Anton, a special-purpose machine for molecular dynamics simulation. Commun ACM (2008)
G.R. Bowman et al. Enhanced modeling via network theory: adaptive sampling of Markov state models. J Chem Theor Comput (2010)
N.G. van Kampen. Stochastic Processes in Chemistry and Physics (2007)
S.W. Englander et al. Protein folding and misfolding: mechanism and principles. Quart Rev Biophys (2008)
D.L. Ensign et al. The Fip35 WW domain folds with structural and mechanistic heterogeneity in molecular dynamics simulations. Biophys J (2009)
C. Anfinsen. Principles that govern the folding of protein chains. Science (1973)
J.A. McCammon et al. Dynamics of folded proteins. Nature (1977)
M. Karplus et al. Molecular dynamics simulations of biomolecules. Nat Struct Biol (2002)
R.O. Dror et al. Biomolecular simulation: a computational microscope for molecular biology. Ann Rev Biophys (2012)
J.D. Bryngelson et al. Spin glasses and the statistical mechanics of protein folding. Proc Natl Acad Sci U S A (1987)
K. Ghosh et al. The ultimate speed limit to protein folding is conformational searching. J Am Chem Soc (2007)
K.W. Plaxco et al. Topology, stability, sequence, and length: defining the determinants of two-state protein folding kinetics. Biochemistry (2000)
P. Kollman. Advances and continuing challenges in achieving realistic and predictive simulations of the properties of organic and biological molecules. Acc Chem Res (1996)
J. Ponder et al. Force fields for protein simulations. Adv Protein Chem (2003)
P.S. Nerenberg et al. Optimizing protein–solvent force fields to reproduce intrinsic conformational preferences of model peptides. J Chem Theor Comput (2011)
K.A. Beauchamp et al. Are protein force fields getting better? A systematic benchmark on 524 diverse NMR measurements. J Chem Theor Comput (2012)
K. Lindorff-Larsen et al. Systematic validation of protein force fields against experimental data. PLoS ONE (2012)
V.A. Voelz et al. Molecular simulation of ab initio protein folding for a millisecond folder NTL9(1–39). J Am Chem Soc (2010)
M. Shirts et al. Screen savers of the world unite! Science (2000)
G.R. Bowman et al. Atomistic folding simulations of the five-helix bundle protein λ(6–85). J Am Chem Soc (2011)
V.A. Voelz et al. Slow unfolded-state structuring in acyl-CoA binding protein folding revealed by simulation and experiment. J Am Chem Soc (2012)
D.E. Shaw et al. Atomic-level characterization of the structural dynamics of proteins. Science (2010)
V.A. Voelz et al. Unfolded-state dynamics and structure of protein L characterized by simulation and experiment. J Am Chem Soc (2010)
F. Morcos et al. Modeling conformational ensembles of slow functional motions in Pin1-WW. PLoS Comput Biol (2010)
M.B. Prigozhin et al. The fast and the slow: folding and trapping of λ 6–85. J Am Chem Soc (2011)
S. Piana et al. Computational design and experimental testing of the fastest-folding β-sheet protein. J Mol Biol (2011)
H.S. Chung et al. Single-molecule fluorescence experiments determine protein folding transition path times. Science (2012)
T.J. Lane et al. Markov state model reveals folding and functional dynamics in ultra-long MD trajectories. J Am Chem Soc (2011)
E.E. Schadt et al. Computational solutions to large-scale data management and analysis. Nat Rev Genet (2010)
J. Stone et al. Immersive out-of-core visualization of large-size and long-timescale molecular dynamics trajectories. Lecture Notes in Computer Science (2011)
E. Weinan et al. Transition-path theory and path-finding algorithms for the study of rare events. Annu Rev Phys Chem (2010)
D. Hamelberg et al. Accelerated molecular dynamics: a promising and efficient simulation method for biomolecules. J Chem Phys (2004)
Y. Sugita et al. Replica-exchange molecular dynamics method for protein folding. Chem Phys Lett (1999)
D. Paschek et al. Replica exchange simulation of reversible folding/unfolding of the Trp-cage miniprotein in explicit solvent: on the structure and possible role of internal water. J Struct Biol (2007)
S. Piana et al. A bias-exchange approach to protein folding. J Phys Chem B (2007)
B. Hess et al. GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation. J Chem Theor Comput (2008)
D.A. Case et al. The Amber biomolecular simulation programs. J Comput Chem (2005)
J.E. Stone et al. Accelerating molecular modeling applications with graphics processors. J Comput Chem (2007)
J.C. Phillips et al. Probing biomolecular machines with graphics processors. Commun ACM (2009)
M.S. Friedrichs et al. Accelerating molecular dynamic simulation on graphics processing units. J Comput Chem (2009)

