Abstract
In response to the COVID-19 pandemic caused by the SARS-CoV-2 virus, structural biologists are using experimental structure determination methods to better understand the viral proteome. Our goal in this work was to help researchers use these rapidly emerging structural data to gain detailed insights into the molecular mechanisms underlying COVID-19 infection. Our analysis was based on the protein sequences defined by UniProt as comprising the viral proteome. We systematically compared each SARS-CoV-2 protein sequence against all available protein 3D structures derived from any organism (164,250 PDB entries), using pairs of hidden Markov models built with the HHblits tool. We found 872 sequence-to-structure alignments with sequence similarity significant enough (E < 10^-10) to infer structural similarity. The resulting 872 3D template models provide a wealth of new detail that is currently not available from related resources. To help make this large, complex dataset accessible and usable for other researchers, we also developed a tailored layout strategy to visually organise the 3D models by mapping them to the viral genome. The resulting graph provides an immediate and comprehensive visual overview of what is known - and not known - about the 3D structure of the viral proteome, thereby helping direct future research. The graph also clearly reveals all available structural evidence of viral mimicry or hijacking of human proteins, as well as all evidence of interactions between viral proteins. We have created PDF and online versions of the graph, in which users can click on any node to open the corresponding 3D model in the Aquaria molecular graphics system. In Aquaria, these models can then be colored to show sequence features, such as single nucleotide polymorphisms and posttranslational modifications.
Previous versions of Aquaria showed only features from UniProt; however, as part of this study, we have now added features from PredictProtein and CATH, thus providing a total of 32,717 features for SARS-CoV-2 protein sequences. In this work, we present novel insights, obtained using the above approach, into how SARS-CoV-2 mimics and hijacks host proteins, and how viral proteins self-assemble during infection. The resulting Aquaria-COVID resource is freely available online at https://aquaria.ws/covid19, and an accompanying video (https://youtu.be/J2nWQTlJNaY) explains how researchers can use the resource.
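The filtering step described above, in which only alignments below the E-value cutoff are kept as template models, can be illustrated with a minimal sketch. This is not the pipeline used in this work: the `Hit` record, the `significant_hits` function, and the example protein and template labels are hypothetical, the example values are made up, and the cutoff is read here as 10^-10.

```python
from collections import defaultdict
from typing import NamedTuple

# Cutoff from the abstract, interpreted as 10^-10 (an assumption of this sketch).
E_VALUE_CUTOFF = 1e-10


class Hit(NamedTuple):
    """One sequence-to-structure alignment (hypothetical record layout)."""
    protein: str      # viral protein label (placeholder names below)
    template: str     # identifier of the matched 3D structure (placeholder)
    e_value: float    # alignment E-value; smaller means more significant


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits with E-value below the cutoff, grouped by viral protein.

    Within each protein, hits are sorted from most to least significant.
    Proteins with no hit below the cutoff do not appear in the result,
    which corresponds to regions with no structural model.
    """
    by_protein = defaultdict(list)
    for hit in hits:
        if hit.e_value < cutoff:
            by_protein[hit.protein].append(hit)
    for protein_hits in by_protein.values():
        protein_hits.sort(key=lambda h: h.e_value)
    return dict(by_protein)


# Made-up example data, for illustration only.
example_hits = [
    Hit("protein_A", "tmpl1", 3e-42),
    Hit("protein_A", "tmpl2", 5e-3),
    Hit("protein_B", "tmpl3", 1e-15),
    Hit("protein_C", "tmpl4", 0.2),
]
result = significant_hits(example_hits)
```

In this toy input, `protein_C` has no alignment below the cutoff and is therefore absent from `result`, mirroring how the graph makes visible which parts of the proteome lack 3D structural information.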
Competing Interest Statement
The authors have declared no competing interest.