Abstract
Cancer is a complex group of diseases caused by the accumulation of mutations in tumor suppressors or oncogenes in the genome. Cancer alterations can be very heterogeneous, even in tumors from the same tissue, affecting cancer predisposition, response to treatment, and risk of relapse in different patients. The role of genomic variants in this context is still being elucidated. Thanks to advances in sequencing techniques and their introduction into the clinic, the number of genomic variants discovered is growing exponentially. Many of these variants are classified as Variants of Uncertain Significance (VUS), while others have been reported with conflicting evidence or with a ‘likely’ effect. Bioinformatics-based approaches to characterize protein variants have demonstrated their full potential thanks to advances in machine learning, comparisons between predicted effects and cellular readouts, and progress in structural biology and biomolecular simulations. Here we introduce MAVISp (Multi-layered Assessment of VarIants by Structure for proteins), a modular structure-based framework for the annotation and classification of the impact of variants that affect a protein product or its interactions, together with a Streamlit-based web application (https://github.com/ELELAB/MAVISp) where the variants and the data generated by the assessment are made available to the community for consultation or further studies. Currently, MAVISp includes information for 127 different proteins and approximately 42000 variants. New protein targets are routinely analyzed in batches by biocurators through standardized Python-based workflows and high-throughput free energy calculations and biomolecular simulations. We also illustrate the potential of the approach for each protein included in the database. New variants will be deposited on a regular basis or in connection with future publications in which the approach is applied.
We also welcome new contributors who are interested in participating in the collection in connection with their own research.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
We have updated the results with analyses of a larger set of proteins and protein variants. We currently include data from more than 100 proteins and 40000 protein variants. We have also added new analyses, including the use of evolution-based scores for pathogenicity, a new implementation of the protocol for long-range interactions, and changes in the way we store the data in the database. We have also included support for changes in free energy upon mutation in protein-DNA interactions. The newly added authors have contributed either to the development of the new tools or to data analysis and interpretation for the new proteins added to the study.