Abstract
Despite advances in sampling and scoring strategies, Monte Carlo modeling methods still struggle to accurately predict de novo the structures of large proteins, membrane proteins, or proteins with complex topologies. Previous approaches have addressed these shortcomings by leveraging sparse distance data gathered using site-directed spin labeling and electron paramagnetic resonance spectroscopy (SDSL-EPR) to improve protein structure prediction and refinement outcomes. However, existing computational implementations must choose between coarse-grained models of the spin label, which lower the resolution, and explicit models, which lead to resource-intensive simulations. Existing methods are further limited by their reliance on distance distributions, which are calculated from a primary refocused echo decay signal and may contain artifacts introduced during this processing step. Here, we address these challenges by developing RosettaDEER, a scoring method within the Rosetta software suite capable of simulating distance distributions and echo decay traces between spin labels fast enough to fold proteins de novo. We demonstrate that the accuracy of the resulting distance distributions matches or exceeds that of distributions generated by more computationally intensive methods. Moreover, decay traces generated from these distributions recapitulate intermolecular background coupling parameters, allowing RosettaDEER to discriminate between poorly folded and native-like models even when the time window of EPR data collection is truncated, which renders the data unsuitable for accurate transformation into distance distributions. Finally, we demonstrate that one decay trace per nine residues is sufficient to predict the folds of Bax and the C-terminus of ExoU, two soluble proteins with surface-exposed amphipathic structural features that prevent the Rosetta energy function from correctly identifying native-like models in the absence of experimental data.
These benchmarking results confirm that RosettaDEER can effectively leverage sparse experimental data for a wide array of modeling applications built into the Rosetta software suite.