Abstract
Circular dichroism (CD) spectroscopy is a highly sensitive but low-resolution technique for studying the structure of proteins. Combined with molecular modelling and other complementary techniques, CD spectroscopy can also provide essential information at higher resolution. To this aim, we introduce a new computational method, SESCA, to calculate the electronic circular dichroism spectra of proteins from a three-dimensional model structure or structural ensemble. The method determines the CD spectrum from the average secondary structure composition of the protein using a pre-calculated set of basis spectra. We derived several basis spectrum sets from the experimental CD spectra and secondary structure information of 71 reference proteins and tested the prediction accuracy of these basis spectrum sets through cross-validation. Furthermore, we investigated how prediction accuracy is affected by contributions from amino acid side chain groups and protein flexibility, by potential experimental errors in the reference protein spectra, and by the choice of the secondary structure classification algorithm and the number of basis spectra. We compared the predictive power of our method to previous spectrum prediction algorithms, such as DichroCalc and PDB2CD, and found that SESCA predicts the CD spectra with up to 50% smaller deviation. Our results indicate that SESCA basis sets are robust to experimental error in the reference spectra and to the choice of the secondary structure classification algorithm. For over 80% of the globular reference proteins, SESCA basis sets could accurately predict the experimental spectrum solely from their secondary structure composition. To improve SESCA predictions for the remaining proteins, we applied corrections to account for intensity normalization, contributions from the amino acid side chains, and conformational flexibility.
For globular proteins, only intensity scaling improved the prediction accuracy significantly, but our models indicate that side chain contributions and structural flexibility are pivotal for the prediction of shorter peptides and intrinsically disordered proteins.
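The core idea of predicting a spectrum from secondary structure composition can be illustrated with a minimal sketch. It assumes that the predicted spectrum is a weighted sum of basis spectra, with the weights given by the ensemble-averaged fraction of each secondary structure class. The function names, the three class labels, and all numerical values below are hypothetical placeholders chosen for illustration; they are not the published basis sets or the SESCA implementation.

```python
from typing import Dict, List


def average_composition(frames: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the secondary structure fractions over all ensemble members."""
    if not frames:
        raise ValueError("at least one structure is required")
    classes = frames[0].keys()
    return {c: sum(f[c] for f in frames) / len(frames) for c in classes}


def predict_cd_spectrum(
    fractions: Dict[str, float], basis: Dict[str, List[float]]
) -> List[float]:
    """Return the weighted sum of basis spectra, one weight per class."""
    n_points = len(next(iter(basis.values())))
    spectrum = [0.0] * n_points
    for cls, weight in fractions.items():
        for i, value in enumerate(basis[cls]):
            spectrum[i] += weight * value
    return spectrum


# Toy basis spectra sampled at three wavelengths (made-up intensities).
toy_basis = {
    "helix": [10.0, -20.0, -30.0],
    "sheet": [5.0, -10.0, -5.0],
    "coil": [-20.0, 0.0, 2.0],
}

# Toy two-member ensemble with made-up secondary structure fractions.
toy_ensemble = [
    {"helix": 0.6, "sheet": 0.2, "coil": 0.2},
    {"helix": 0.4, "sheet": 0.3, "coil": 0.3},
]

toy_fractions = average_composition(toy_ensemble)
toy_spectrum = predict_cd_spectrum(toy_fractions, toy_basis)
```

For a single model structure, the ensemble contains one member and the averaging step leaves its composition unchanged.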
Author summary
Proteins are biomolecules that perform almost all of the active tasks in living organisms, and how they perform these tasks is defined by their structure. By understanding the structure of proteins, we can alter and regulate their biological functions, which may lead to many medical, scientific, and technological advancements. Here we present SESCA, a new method that allows the assessment and refinement of protein model structures. SESCA predicts the expected circular dichroism (CD) spectrum of a proposed protein model and compares it to an experimentally determined CD spectrum to determine the model quality. CD spectroscopy is an experimental technique that is very sensitive to the secondary structure of the protein and is widely used as a quality control in protein chemistry.
We demonstrate that our method can accurately and robustly predict the spectrum of globular proteins from their secondary structure, which is necessary for a rigorous model assessment. The SESCA scheme can also account for protein flexibility and contributions from amino acid side chains, which further enhances the accuracy of the method. In addition, this allows SESCA predictions to target disordered proteins. For these proteins, flexibility is part of their function, but it also renders their structural characterization much more challenging.
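The comparison between a predicted and an experimental spectrum can be sketched as follows. This example assumes that the deviation is measured as a root-mean-square deviation (RMSD) over matching wavelengths, and that intensity normalization is handled by a single least-squares scaling factor applied to the experimental spectrum. Both choices, as well as the function names and the toy values, are illustrative assumptions and not necessarily the procedure used by SESCA.

```python
import math
from typing import List


def rmsd(predicted: List[float], experimental: List[float]) -> float:
    """Root-mean-square deviation between two spectra on the same wavelengths."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("spectra must be non-empty and of equal length")
    squared = sum((p - e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(squared / len(predicted))


def best_scale(predicted: List[float], experimental: List[float]) -> float:
    """Least-squares factor s that minimizes RMSD(predicted, s * experimental)."""
    denom = sum(e * e for e in experimental)
    if denom == 0.0:
        raise ValueError("experimental spectrum is all zeros")
    return sum(p * e for p, e in zip(predicted, experimental)) / denom


# Toy spectra (made-up values): same shape, different overall intensity.
toy_predicted = [2.0, -4.0, -6.0]
toy_experimental = [1.0, -2.0, -3.0]

raw_deviation = rmsd(toy_predicted, toy_experimental)
scale = best_scale(toy_predicted, toy_experimental)
scaled_deviation = rmsd(toy_predicted, [scale * e for e in toy_experimental])
```

In this toy case the two spectra differ only in intensity, so the deviation vanishes after scaling; a deviation that remains after scaling would instead point to a difference in spectral shape between model and experiment.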