Abstract
Interpretation of genetic variation is required for understanding genotype-phenotype associations, mechanisms of inherited disease, and drivers of cancer. Millions of single nucleotide variants (SNVs) in human genomes are known, and thousands are associated with disease. An estimated 20% of disease-associated missense SNVs are located in protein sites of post-translational modifications (PTMs), chemical modifications of amino acids that extend protein function. ActiveDriverDB is a comprehensive human proteo-genomics database that annotates disease mutations and population variants using PTMs. We integrated >385,000 published PTM sites with ∼3.8 million missense SNVs from The Cancer Genome Atlas (TCGA), the ClinVar database of disease genes, and inter-individual variation from human genome sequencing projects. The database includes interaction networks of proteins, upstream enzymes such as kinases, and drugs targeting these enzymes. We also predicted the network-rewiring impact of mutations by analyzing gains and losses of kinase-bound sequence motifs. ActiveDriverDB provides detailed visualization, filtering, browsing and searching options for studying PTM-associated SNVs. Users can upload mutation datasets interactively and use our application programming interface for pipelines. Integrative analysis of SNVs and PTMs helps decipher molecular mechanisms of phenotypes and disease, as exemplified by case studies of disease genes TP53, BRCA2 and VHL. The open-source database is available at https://www.ActiveDriverDB.org.