Abstract
The majority of disease-associated variants identified through genome-wide association studies (GWAS) lie outside protein-coding regions and are collectively overrepresented in sequences that regulate gene expression. Prioritizing candidate regulatory variants and potential biological mechanisms for follow-up functional experiments, such as genome editing, can be challenging, especially in regions with many variants in strong linkage disequilibrium or with multiple proximal gene targets. Improved annotation of the regulatory genome can help identify promising variants and target genes for these experiments and accelerate the translation of GWAS loci into biological insights. To advance this area, we developed FORGEdb (https://forge2.altiusinstitute.org/files/forgedb.html), a web-based tool that rapidly integrates data for individual genetic variants, providing information on associated regulatory elements, transcription factor (TF) binding sites and target genes. FORGEdb uses annotations derived from data across a wide range of biological samples to delineate the regulatory context of each variant at the cell type level. For >37 million variants, FORGEdb makes available multiple data types, including CADD scores, expression quantitative trait loci (eQTLs), activity-by-contact (ABC) interactions, Contextual Analysis of TF Occupancy (CATO) scores, TF motifs, DNase I hotspots, histone mark ChIP-seq and chromatin states. These annotations are integrated into a FORGEdb score to guide the assignment of functional importance. This breadth of genomic annotation gives researchers a comprehensive resource for prioritizing variants for functional validation. In summary, FORGEdb provides an expansive and unique resource for the analysis of genomic variants associated with complex traits and diseases.
Competing Interest Statement
The authors have declared no competing interest.