RT Journal Article SR Electronic T1 GREEN-DB: a framework for the annotation and prioritization of non-coding regulatory variants in whole-genome sequencing JF bioRxiv FD Cold Spring Harbor Laboratory SP 2020.09.17.301960 DO 10.1101/2020.09.17.301960 A1 E Giacopuzzi A1 N Popitsch A1 JC Taylor YR 2020 UL http://biorxiv.org/content/early/2020/09/19/2020.09.17.301960.abstract AB Background Non-coding variants have emerged as important contributors to the pathogenesis of human diseases, not only as common susceptibility alleles but also as rare high-impact variants. Despite recent advances in the study of regulatory elements and the availability of specialized data collections, the systematic annotation of non-coding variants from genome sequencing remains challenging. Results We integrated 24 data sources to develop a standardized collection of 2.4 million regulatory elements in the human genome, transcription factor binding sites, DNase peaks, ultra-conserved non-coding elements, and super-enhancers. Information on controlled gene(s), tissue(s) and associated phenotype(s) is provided for regulatory elements when possible. We also calculated a variation constraint metric for regulatory regions and showed that genes controlled by constrained regions are more likely to be disease-associated genes and essential genes from mouse knock-out screenings. Finally, we evaluated 16 non-coding impact prediction scores, providing suggestions for variant prioritization. The companion tool allows for annotation of VCF files with information about the regulatory regions as well as non-coding prediction scores to inform variant prioritization. 
The proposed annotation framework was able to capture previously published disease-associated non-coding variants and its integration in a routine prioritization pipeline increased the number of candidate genes, including genes potentially correlated with patient phenotype, and established clinically relevant genes. Conclusion We have developed a resource for the annotation and prioritization of regulatory variants in WGS analysis to support the discovery of candidate disease-associated variants in the non-coding genome. Competing Interest Statement The authors have declared no competing interest. OR: Odds-ratio; TPR: True positive rate (sensitivity); TNR: True negative rate (specificity); FDR: False discovery rate; ACC: Accuracy; HPO: Human Phenotype Ontology; TFBS: Transcription factor binding site; UCNE: Ultra-conserved non-coding element; AUC: Area under the curve; OPM: Overall Performance Measure