TY  - JOUR
T1  - What Markov state models can and cannot do: Correlation versus path-based observables in protein folding models
JF  - bioRxiv
DO  - 10.1101/2020.11.09.374496
SP  - 2020.11.09.374496
AU  - Ernesto Suárez
AU  - Rafal P. Wiewiora
AU  - Chris Wehmeyer
AU  - Frank Noé
AU  - John D. Chodera
AU  - Daniel M. Zuckerman
Y1  - 2020/01/01
UR  - http://biorxiv.org/content/early/2020/11/09/2020.11.09.374496.abstract
N2  - Markov state models (MSMs) have been widely applied to study the kinetics and pathways of protein conformational dynamics based on statistical analysis of molecular dynamics (MD) simulations. These MSMs coarse-grain both configuration space and time in ways that limit what kinds of observables they can reproduce with high fidelity over different spatial and temporal resolutions. Despite their popularity, there is still limited understanding of which biophysical observables can be computed from these MSMs in a robust and unbiased manner, and which suffer from the space-time coarse-graining intrinsic in the MSM model. Most theoretical arguments and practical validity tests for MSMs rely on long-time equilibrium kinetics, such as the slowest relaxation timescales and experimentally observable time-correlation functions. Here, we perform an extensive assessment of the ability of well-validated protein folding MSMs to accurately reproduce path-based observables such as mean first-passage times (MFPTs) and transition path mechanisms compared to a direct trajectory analysis. We also assess a recently proposed class of history-augmented MSMs (haMSMs) that exploit additional information not accounted for in standard MSMs. We conclude with some practical guidance on the use of MSMs to study various problems in conformational dynamics of biomolecules.
In brief, MSMs can accurately reproduce correlation functions slower than the lag time, but path-based observables can only be reliably reproduced if the lifetimes of states exceed the lag time, which is a much stricter requirement. Even in the presence of short-lived states, we find that haMSMs reproduce path-based observables more reliably. Competing Interest Statement: The authors have declared no competing interest.
ER  -