TY - JOUR
T1 - From Chemoproteomic-Detected Amino Acids to Genomic Coordinates: Insights into Precise Multi-omic Data Integration
JF - bioRxiv
DO - 10.1101/2020.07.03.186007
SP - 2020.07.03.186007
AU - Maria F. Palafox
AU - Valerie A. Arboleda
AU - Keriann M. Backus
Y1 - 2020/01/01
UR - http://biorxiv.org/content/early/2020/07/04/2020.07.03.186007.abstract
N2 - The integration of proteomic, transcriptomic, and genetic-variant annotation data will improve our understanding of genotype-phenotype associations. Due, in part, to challenges associated with accurate inter-database mapping, such multi-omic studies have not extended to chemoproteomics, a method that measures the intrinsic reactivity and potential ‘druggability’ of nucleophilic amino acid side chains. Here, we evaluated two mapping approaches to match chemoproteomic-detected cysteine and lysine residues with their genetic coordinates. Our analysis reveals that database update cycles and reliance on stable identifiers can lead to pervasive misidentification of labeled residues. Enabled by this examination of mapping strategies, we then integrated our chemoproteomic data with in silico generated predictions of genetic variant pathogenicity, which revealed that codons of highly reactive cysteines are enriched for genetic variants that are predicted to be more deleterious.
Our study provides a roadmap for more precise inter-database comparisons and points to untapped opportunities to improve the predictive power of pathogenicity scores and to advance prioritization of putative druggable sites through integration of predictions of pathogenicity with chemoproteomic datasets. Abbreviations: UniProtKB-SP, UniProt Knowledge Base-Swiss-Prot; xref, External Reference; ENSP, Ensembl Protein; ENST, Ensembl Transcript; ENSG, Ensembl Gene; UKB, UniProt Knowledge Base; CADD, Combined Annotation Dependent Depletion; CpDAA, Chemoproteomic Detected Amino Acids; CCDS, Consensus Coding Sequence; DANN, Deleterious Annotation of genetic variants using Neural Networks; FATHMM-MKL, Functional Analysis through Hidden Markov Models; PDB, Protein Data Bank; SNV, single nucleotide variant; dbNSFP, Database for Non Synonymous Functional predictions.
ER -