Abstract
Large-scale experimental measurements from variant functional assays submitted to MaveDB could provide key information for resolving variants of uncertain significance, but results are reported relative to the assayed sequence rather than a human reference sequence, which hinders their downstream utility. The Atlas of Variant Effects Alliance mapped multiplexed assay of variant effect (MAVE) data to human reference sequences, creating a robust set of machine-readable homology mappings. The mapping method processed approximately 2.5 million protein and genomic variants in MaveDB, successfully mapping 98.61% of the examined variants, and the mapped data were disseminated to resources such as the UCSC Genome Browser and the Ensembl Variant Effect Predictor.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
In this revision, several changes were made to improve reproducibility and accessibility. Specifically, the analysis notebooks used in the study have been refactored, the downstream integrations have been updated to ensure that MAVE data are visible on the respective platforms, and a new software package (dcd-mapping) has been created so that the mapping process can be run on individual score sets. Additionally, a new figure (now Figure 2) has been added to the manuscript to illustrate the challenges of mapping MAVE experimental data to human reference sequences.
Abbreviations
- API
- Application Programming Interface
- AVE
- Atlas of Variant Effects Alliance
- BLAT
- BLAST-like Alignment Tool
- CAid
- Canonical Allele Identifier
- Cool-Seq-Tool
- Common Operations on Lots of Sequences Tool
- CURIE
- Compact Uniform Resource Identifier
- DCD
- Data Coordination and Dissemination Workstream
- DECIPHER
- Database of Genomic Variation and Phenotype in Humans using Ensembl Resources
- G2P
- Genomics 2 Proteins Portal
- GA4GH
- Global Alliance for Genomics and Health
- HGNC
- HUGO Gene Nomenclature Committee
- HGVS
- Human Genome Variation Society
- HSP
- High-Scoring Pair
- JSON
- JavaScript Object Notation
- LDH
- ClinGen Linked Data Hub
- MANE
- Matched Annotation from NCBI and EMBL-EBI
- MAVE
- Multiplexed Assay of Variant Effect
- MPRA
- Massively Parallel Reporter Assay
- NCBI
- National Center for Biotechnology Information
- PAid
- Protein Allele Identifier
- PSL
- Pattern Space Layout
- PyPI
- Python Package Index
- RefSeq
- NCBI Reference Sequence Database
- SO
- Sequence Ontology
- URN
- Uniform Resource Name
- UTA
- Universal Transcript Archive
- VEP
- Ensembl Variant Effect Predictor
- VICC
- Variant Interpretation for Cancer Consortium
- VOCA
- Variant Overprecision Correction Algorithm
- VRS
- Global Alliance for Genomics and Health Variation Representation Specification
- VUS
- Variant of Uncertain Significance