Abstract
Conformational heterogeneity of biological macromolecules is a challenge in single particle averaging (SPA). Current standard practice is to employ classification and filtering methods, which may allow a discrete number of conformational states to be reconstructed. However, the conformational space accessible to these molecules is continuous and is therefore explored incompletely by a small number of discrete classes. Recently developed heterogeneous reconstruction algorithms (HRAs) for analysing continuous heterogeneity rely on machine learning methods that employ low-dimensional latent space representations. The non-linear nature of many of these methods poses challenges to their validation and interpretation, and to the identification of functionally relevant conformational trajectories. We believe these methods would benefit from in-depth benchmarking using high-quality synthetic data and concomitant ground-truth information. Here we present a framework for simulating cryo-EM micrographs containing conformationally heterogeneous particles, with the heterogeneity sourced from molecular dynamics (MD) simulations, and for subsequently analysing the results with respect to ground truth. This synthetic data can then be processed as if it were experimental data, allowing aspects of standard SPA workflows, as well as heterogeneous reconstruction methods, to be compared with the known ground truth using available utilities. We demonstrate the simulation and analysis of several such datasets and present an initial investigation into HRAs.
Competing Interest Statement
The authors have declared no competing interest.