To milliseconds and beyond: challenges in the simulation of protein folding

https://doi.org/10.1016/j.sbi.2012.11.002

Quantitatively accurate all-atom molecular dynamics (MD) simulations of protein folding have long been considered a holy grail of computational biology. Because of the large system sizes and long timescales involved, this goal was for many years computationally intractable. Further, forcefields accurate enough to model folding realistically had to be developed. This decade, however, has already seen the first reports of folding simulations describing kinetics on the order of milliseconds, placing many proteins firmly within reach of these methods. This progress in sampling and forcefield accuracy presents a new challenge: how to turn huge MD datasets into scientific understanding. Here, we review recent progress in MD simulation techniques and show how the vast datasets generated by such techniques present new challenges for analysis. We critically discuss the state of the art, including reaction coordinate and Markov state model (MSM) methods, and provide a perspective for the future.

Highlights

► Millisecond simulations. ► Hardware and software advances. ► Simulation to understanding.

Introduction

Understanding protein folding via molecular simulation has been an aspiration of computational chemists ever since Anfinsen uncovered the surprising fact that proteins fold to a unique structure [1, 2, 3]. Applying simulation to folding appeals for many reasons. Folding is rapid and complex: a complete, detailed picture requires atomic-level resolution at nanosecond timescales, which lies outside the hard limits of temporal and spatial resolution of most experimental techniques [4]. Furthermore, the complexity of protein native states and the inherent physical heterogeneity of the folding process have frustrated the search for microscopic physical theories of folding, though some advanced phenomenological approaches have been proposed [5, 6, 7, 8, 9]. Thus atomic-level simulations of folding, which intrinsically offer high resolution, have been pursued aggressively in the hope of surmounting these difficulties.

Three main problems must be overcome to achieve useful simulations of protein folding: accurate models (forcefields), sufficient sampling, and robust data analysis. Forcefield development has received much attention and has been discussed extensively [10, 11, 12, 13, 14, 15]. Although more remains to be done to build and validate still more accurate models, forcefields capable of folding proteins in good agreement with experiment already exist (Figure 1). Rather, the major challenge in producing reliable simulations of folding has been harnessing enough computer power to sample folding adequately. Because classical simulations must integrate Newton's equations of motion with femtosecond timesteps (10⁻¹⁵ s), folding simulations require ∼10¹² timesteps to reach millisecond timescales. This expense is compounded by large system sizes (∼10⁵ atoms for explicit solvent simulations) and by the need to observe many folding events for statistical confidence, making the computational effort required to study folding via simulation enormous.
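The timestep arithmetic in the preceding paragraph can be checked with a few lines of Python. This is a back-of-the-envelope sketch only: the 1 fs and 2 fs timesteps, the 10⁵-atom system, and the throughput of 100 ns/day (the per-GPU rate quoted later in this review for small proteins) are illustrative assumptions, and the helper names are hypothetical.

```python
# Back-of-the-envelope cost of a millisecond folding simulation.
FS_PER_NS = 10**6   # 1 ns = 10^6 fs
NS_PER_MS = 10**6   # 1 ms = 10^6 ns


def n_timesteps(sim_ms, dt_fs):
    """Integration steps needed to cover `sim_ms` milliseconds with a `dt_fs` femtosecond timestep."""
    return sim_ms * NS_PER_MS * FS_PER_NS // dt_fs


def wallclock_days(sim_ms, ns_per_day):
    """Days of wall-clock time for a single trajectory at a given throughput (ns/day)."""
    return sim_ms * NS_PER_MS / ns_per_day


if __name__ == "__main__":
    n_atoms = 10**5  # assumed explicit-solvent system size
    for dt in (1, 2):
        steps = n_timesteps(1, dt_fs=dt)
        print(f"1 ms at {dt} fs/step: {steps:.1e} steps, {steps * n_atoms:.1e} atom-steps")
    days = wallclock_days(1, ns_per_day=100)
    print(f"1 ms at 100 ns/day: {days:,.0f} days (~{days / 365.25:.0f} years)")
```

At an assumed 100 ns/day, one continuous 1 ms trajectory would take about 10,000 days, which is why the millisecond results described in this review relied on distributed computing, specialized hardware, or the aggregation of many independent trajectories.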

Very recently, advances in hardware, software, and sampling techniques have made millisecond simulations possible. In 2010, using an aggregate of 1.5 ms of data generated by the distributed computing network Folding@home, Voelz et al. reported the first simulation describing the folding of a millisecond folder, NTL9(1–39), in implicit solvent [16•, 17]. Later that year, Bowman et al. identified a millisecond timescale in the folding dynamics of lambda repressor in explicit solvent [18]. More recently, using 30 ms of aggregate data, Voelz et al. studied the folding of ACBP (acyl-CoA binding protein) on the 10 ms timescale, revealing that an experimentally observed folding intermediate is in fact a complex, heterogeneous ensemble of structures [19••]. Finally, with the advent of ANTON, a computer specialized for protein simulation, the first single trajectory of millisecond length was reported near the end of 2010 [20]. Because a reliable estimate requires observing multiple folding events within one trajectory, trajectories of this length make it possible to estimate folding times of up to ∼100 μs from a single run.
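Why a millisecond trajectory limits reliable estimates to folding times of roughly 100 μs can be seen from a simple statistical argument. Assuming single-exponential (two-state) kinetics, the maximum-likelihood estimate of the mean folding time from n observed folding events is their sample mean, with a relative standard error of 1/√n. A 1 ms trajectory of a protein that folds in ∼100 μs contains at most on the order of ten folding events, which leaves the estimate uncertain by roughly ±30%. The sketch below uses synthetic waiting times, not simulation data, and the function name is hypothetical.

```python
import math
import random


def mean_folding_time(waiting_times):
    """MLE of the mean folding time under exponential kinetics, with its standard error.

    For n independent exponential waiting times, the MLE of the mean is the
    sample mean, and its standard error is mean / sqrt(n).
    """
    n = len(waiting_times)
    tau = sum(waiting_times) / n
    return tau, tau / math.sqrt(n)


if __name__ == "__main__":
    rng = random.Random(0)
    true_tau_us = 100.0  # hypothetical folding time, in microseconds
    for n_events in (10, 100, 1000):
        # Synthetic exponential waiting times standing in for observed folding events.
        sample = [rng.expovariate(1.0 / true_tau_us) for _ in range(n_events)]
        tau, se = mean_folding_time(sample)
        print(f"{n_events:5d} events: tau = {tau:6.1f} +/- {se:5.1f} us")
```

The error bar shrinks only as 1/√n, so tightening a folding-time estimate by a factor of ten requires about a hundred times more events, and hence far more sampling.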

While challenging, generating enough sampling with an accurate forcefield does not constitute the end of the road for a folding simulation. Another major challenge is gaining scientific insight from the simulation: turning data into knowledge. Insights gained from simulations have already begun to shape the protein folding field, both through connection to experiment and through analysis of the simulations themselves [18, 19••, 21, 22, 23, 24, 25]. These preliminary studies have revealed that analyzing simulation data is difficult; in some cases, certain analysis techniques have led researchers to conclusions inconsistent with their raw simulation data [20•, 26•, 27]. The simulation community needs to develop general, robust, and easy-to-use data analysis tools to continue toward the goal of understanding folding.

In this review, we briefly explain how the sampling problem has been overcome, and why we can expect the future to yield even longer simulations more efficiently. As sampling becomes less of an issue, a new challenge in folding simulations emerges: given the massive amount of data that extensive sampling provides, how does one make sense of it all? MD simulations produce high-dimensional time series, and therefore present a ‘Big Data’ challenge [28, 29]. We expect data analysis techniques to become the new limiting factor in the quest to understand folding through molecular simulation.
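To make the ‘Big Data’ point concrete, the sketch below estimates the raw size of a trajectory dataset. Every input is an assumption chosen for illustration: ∼10⁵ atoms (the explicit-solvent system size mentioned earlier), 1 ms of aggregate sampling, one frame saved every 100 ps, and uncompressed single-precision (4-byte) coordinates. The function name is hypothetical.

```python
def trajectory_bytes(n_atoms, sim_ps, save_every_ps, bytes_per_coord=4):
    """Uncompressed size of a coordinate trajectory: frames x atoms x 3 coordinates."""
    n_frames = sim_ps // save_every_ps
    return n_frames * n_atoms * 3 * bytes_per_coord


if __name__ == "__main__":
    PS_PER_MS = 10**9  # 1 ms = 10^9 ps
    size = trajectory_bytes(n_atoms=10**5, sim_ps=PS_PER_MS, save_every_ps=100)
    print(f"{size / 10**12:.1f} TB of raw coordinates")
    print(f"each frame is a point in {3 * 10**5:,} dimensions")
```

Under these assumptions a single millisecond of explicit-solvent data occupies about 12 TB before compression, and each frame is a point in a 3 × 10⁵-dimensional space, which is what makes dimensionality reduction and state decomposition central to the analysis problem.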


Overcoming the barrier to sampling

Over the past decade, the system sizes and time scales accessible to protein simulations have grown exponentially (Figure 2). This gain has been achieved through progress on three main fronts: efficient parallelization of MD codes, specialized hardware, and statistical analysis of multiple independent trajectories. While there are many techniques designed to accelerate dynamic sampling via biasing the system in some way (e.g. replica exchange, metadynamics, aMD, and the string method [30, 31, 32

Analysis: the final challenge

The advances in sampling techniques, along with the historical exponential growth in achievable sampling (Figure 2), make us hopeful that protein folding simulations will become routine calculations on commodity hardware within a decade. Indeed, one can now simulate the folding of small proteins in explicit solvent at a rate of up to 100 ns/day/GPU [44, 45, 46]. At that rate, a cluster of 100 GPUs generates roughly 0.9 ms of aggregate sampling in three months, enough to build MSMs able to predict millisecond timescales.
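As a rough illustration of how an MSM turns many short trajectories into long-timescale predictions, the sketch below estimates a transition matrix from a discretized trajectory and converts its eigenvalues λᵢ into implied timescales tᵢ = −τ / ln λᵢ, where τ is the lag time. It is a minimal sketch under stated assumptions: conformations are assumed to be already clustered into discrete states, every state is assumed to be visited, and detailed balance is imposed crudely by symmetrizing the counts. The function names are hypothetical, and this is not the estimator used in the studies cited here.

```python
import numpy as np


def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j between frames separated by `lag` steps."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize symmetrized counts (a crude way to impose detailed balance)."""
    sym = counts + counts.T
    return sym / sym.sum(axis=1, keepdims=True)


def implied_timescales(T, lag, dt):
    """Relaxation timescales t_i = -lag*dt / ln(lambda_i) for the non-stationary modes."""
    evals = np.sort(np.real(np.linalg.eigvals(T)))[::-1]
    slow = evals[1:]          # drop the stationary eigenvalue (lambda = 1)
    slow = slow[slow > 0]     # ln(lambda) is undefined for lambda <= 0
    return -lag * dt / np.log(slow)


def sample_dtraj(T, n_steps, start=0, seed=0):
    """Generate a synthetic discrete trajectory from a known transition matrix."""
    rng = np.random.default_rng(seed)
    traj = [start]
    for _ in range(n_steps - 1):
        traj.append(rng.choice(len(T), p=T[traj[-1]]))
    return np.array(traj)


if __name__ == "__main__":
    # Hypothetical two-state "unfolded/folded" model with per-step hopping probabilities.
    T_true = np.array([[0.99, 0.01],
                       [0.02, 0.98]])
    dtraj = sample_dtraj(T_true, 50_000)
    T_est = transition_matrix(count_matrix(dtraj, 2, lag=1))
    print("estimated:", implied_timescales(T_est, lag=1, dt=1.0))
    print("exact:    ", implied_timescales(T_true, lag=1, dt=1.0))
```

The key property is that the slowest implied timescale can far exceed the length of any individual trajectory used to count transitions, provided the states and lag time are chosen so that the dynamics between states are approximately Markovian.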

What has and can be learned from simulations of folding

Given that sampling at millisecond timescales has been possible for only two years, and that analysis methodology is still immature, the unambiguous scientific results learned from atomistic simulation have thus far been modest. It will be a major challenge in the next five years to turn advances in sampling and accuracy into scientific insight about how proteins fold.

Despite this relative immaturity, atomistic simulation has already begun to influence our view of protein folding. Detailed comparisons

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

  • of special interest

  •• of outstanding interest

Acknowledgements

DS and VSP acknowledge support from the Simbios NIH Center for Biomedical Computation (NIH U54 Roadmap GM072970); VSP acknowledges NIH (R01GM62828). TJL was supported by an NSF GRF; KAB was supported by a Stanford Graduate Fellowship.

References (93)

  • D.L. Ensign et al. The Fip35 WW domain folds with structural and mechanistic heterogeneity in molecular dynamics simulations. Biophys J (2009).
  • C. Anfinsen. Principles that govern the folding of protein chains. Science (1973).
  • J.A. McCammon et al. Dynamics of folded proteins. Nature (1977).
  • M. Karplus et al. Molecular dynamics simulations of biomolecules. Nat Struct Biol (2002).
  • R.O. Dror et al. Biomolecular simulation: a computational microscope for molecular biology. Annu Rev Biophys (2012).
  • J.D. Bryngelson et al. Spin glasses and the statistical mechanics of protein folding. Proc Natl Acad Sci U S A (1987).
  • K. Ghosh et al. The ultimate speed limit to protein folding is conformational searching. J Am Chem Soc (2007).
  • K.W. Plaxco et al. Topology, stability, sequence, and length: defining the determinants of two-state protein folding kinetics. Biochemistry (2000).
  • P. Kollman. Advances and continuing challenges in achieving realistic and predictive simulations of the properties of organic and biological molecules. Acc Chem Res (1996).
  • J. Ponder et al. Force fields for protein simulations. Adv Protein Chem (2003).
  • P.S. Nerenberg et al. Optimizing protein–solvent force fields to reproduce intrinsic conformational preferences of model peptides. J Chem Theory Comput (2011).
  • K.A. Beauchamp et al. Are protein force fields getting better? A systematic benchmark on 524 diverse NMR measurements. J Chem Theory Comput (2012).
  • K. Lindorff-Larsen et al. Systematic validation of protein force fields against experimental data. PLoS ONE (2012).
  • V.A. Voelz et al. Molecular simulation of ab initio protein folding for a millisecond folder NTL9(1–39). J Am Chem Soc (2010).
  • M. Shirts et al. Screen savers of the world unite! Science (2000).
  • G.R. Bowman et al. Atomistic folding simulations of the five-helix bundle protein λ(6–85). J Am Chem Soc (2011).
  • V.A. Voelz et al. Slow unfolded-state structuring in acyl-CoA binding protein folding revealed by simulation and experiment. J Am Chem Soc (2012).
  • D.E. Shaw et al. Atomic-level characterization of the structural dynamics of proteins. Science (2010).
  • V.A. Voelz et al. Unfolded-state dynamics and structure of protein L characterized by simulation and experiment. J Am Chem Soc (2010).
  • F. Morcos et al. Modeling conformational ensembles of slow functional motions in Pin1-WW. PLoS Comput Biol (2010).
  • M.B. Prigozhin et al. The fast and the slow: folding and trapping of λ 6–85. J Am Chem Soc (2011).
  • S. Piana et al. Computational design and experimental testing of the fastest-folding β-sheet protein. J Mol Biol (2011).
  • H.S. Chung et al. Single-molecule fluorescence experiments determine protein folding transition path times. Science (2012).
  • T.J. Lane et al. Markov state model reveals folding and functional dynamics in ultra-long MD trajectories. J Am Chem Soc (2011).
  • E.E. Schadt et al. Computational solutions to large-scale data management and analysis. Nat Rev Genet (2010).
  • J. Stone et al. Immersive out-of-core visualization of large-size and long-timescale molecular dynamics trajectories. Lecture Notes in Computer Science (2011).
  • W. E et al. Transition-path theory and path-finding algorithms for the study of rare events. Annu Rev Phys Chem (2010).
  • D. Hamelberg et al. Accelerated molecular dynamics: a promising and efficient simulation method for biomolecules. J Chem Phys (2004).
  • Y. Sugita et al. Replica-exchange molecular dynamics method for protein folding. Chem Phys Lett (1999).
  • D. Paschek et al. Replica exchange simulation of reversible folding/unfolding of the Trp-cage miniprotein in explicit solvent: on the structure and possible role of internal water. J Struct Biol (2007).
  • S. Piana et al. A bias-exchange approach to protein folding. J Phys Chem B (2007).
  • B. Hess et al. GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation. J Chem Theory Comput (2008).
  • D.A. Case et al. The Amber biomolecular simulation programs. J Comput Chem (2005).
  • J.E. Stone et al. Accelerating molecular modeling applications with graphics processors. J Comput Chem (2007).
  • J.C. Phillips et al. Probing biomolecular machines with graphics processors. Commun ACM (2009).
  • M.S. Friedrichs et al. Accelerating molecular dynamic simulation on graphics processing units. J Comput Chem (2009).