  • Technical Report

GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals

Abstract

Loci discovered by genome-wide association studies predominantly map outside protein-coding genes. The interpretation of the functional consequences of non-coding variants can be greatly enhanced by catalogs of regulatory genomic regions in cell lines and primary tissues. However, robust and readily applicable methods are still lacking by which to systematically evaluate the contribution of these regions to genetic variation implicated in diseases or quantitative traits. Here we propose a novel approach that combines findings from genome-wide association studies with regulatory or functional annotations to classify features relevant to a phenotype of interest. Within our framework, we account for major sources of confounding that current methods do not address. We further assess enrichment of genome-wide association studies for 19 traits within Encyclopedia of DNA Elements- and Roadmap-derived regulatory regions. We characterize unique enrichment patterns for traits and annotations, which drive novel biological insights. The method is implemented in standalone software and an R package to facilitate its application by the research community.
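As a rough illustration of what enrichment of association signals within an annotation means, the sketch below computes an unadjusted odds ratio from a 2×2 table: variants below or above a GWAS P-value threshold, crossed with whether they overlap an annotation. It is not the GARFIELD implementation. The function name, the input layout and the toy data are assumptions made for illustration, and the sketch deliberately omits the confounder corrections the abstract refers to (the supplementary captions mention the number of LD proxies and the distance to the nearest TSS).

```python
import math


def enrichment_odds_ratio(variants, threshold):
    """Unadjusted odds ratio of annotation overlap at a GWAS P-value threshold.

    variants: iterable of (p_value, in_annotation) pairs (hypothetical layout).
    Returns (odds_ratio, standard_error_of_log_odds_ratio).
    """
    # 2x2 table counts: a/b = below threshold (annotated / not),
    #                   c/d = at or above threshold (annotated / not)
    a = b = c = d = 0
    for p_value, in_annotation in variants:
        if p_value < threshold:
            if in_annotation:
                a += 1
            else:
                b += 1
        else:
            if in_annotation:
                c += 1
            else:
                d += 1

    # Haldane-Anscombe correction: add 0.5 to every cell so that
    # empty cells do not cause a division by zero.
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, se_log_or


# Toy data (invented): 4 associated variants, 8 non-associated variants.
toy_variants = (
    [(1e-9, True)] * 3 + [(1e-9, False)] * 1
    + [(0.5, True)] * 2 + [(0.5, False)] * 6
)
odds_ratio, se_log_or = enrichment_odds_ratio(toy_variants, threshold=1e-8)
print(f"OR = {odds_ratio:.2f}, SE(log OR) = {se_log_or:.2f}")
```

An odds ratio above 1 indicates that variants passing the threshold overlap the annotation more often than the remaining variants do. Repeating the calculation at several thresholds gives one value per threshold, as in the enrichment figures.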

Fig. 1: Outline of the GARFIELD method.
Fig. 2: Method assessment.
Fig. 3: Enrichment of GWA analysis P values in DHS (hotspots).
Fig. 4: Method comparison for 21 GWAS datasets in DHS (hotspots) and histone modifications (H3K27ac and H3K4me3) at the T < 10⁻⁸ GWAS significance threshold.
Fig. 5: Enrichment levels (log OR) and extent of sharing between traits for 25-state chromatin segmentations of the National Institutes of Health Roadmap and ENCODE projects at the T < 10⁻⁵ GWAS significance threshold.

Code availability

Custom code can be found at http://www.ebi.ac.uk/birney-srv/GARFIELD/.

Data availability

Web links for publicly available GWAS datasets and regulatory information databases are included in the URLs section. Availability is restricted for the blood cell indices GWAS from van der Harst et al. (ref. 33) and Gieger et al. (ref. 34), which were obtained from the authors of those manuscripts. Any other data that support the findings of this study are available from the corresponding authors upon reasonable request.

References

  1. Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012).

  2. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362–9367 (2009).

  3. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  4. Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).

  5. Bernstein, B. E. et al. The NIH roadmap epigenomics mapping consortium. Nat. Biotechnol. 28, 1045–1048 (2010).

  6. Adams, D. et al. BLUEPRINT to decode the epigenetic signature written in blood. Nat. Biotechnol. 30, 224–226 (2012).

  7. 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).

  8. Shen, H. et al. Comprehensive characterization of human genome variation by high coverage whole-genome sequencing of forty-four Caucasians. PLoS ONE 8, e59494 (2013).

  9. Chung, D., Yang, C., Li, C., Gelernter, J. & Zhao, H. GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation. PLoS Genet. 10, e1004787 (2014).

  10. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

  11. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).

  12. Schork, A. J. et al. All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs. PLoS Genet. 9, e1003449 (2013).

  13. Pickrell, J. K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014).

  14. Trynka, G. et al. Disentangling the effects of colocalizing genomic annotations to functionally prioritize non-coding variants within complex-trait loci. Am. J. Hum. Genet. 97, 139–152 (2015).

  15. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  16. Schmidt, E. M. et al. GREGOR: evaluating global enrichment of trait-associated variants in epigenomic features using a systematic, data-driven approach. Bioinformatics 31, 2601–2606 (2015).

  17. Galwey, N. W. A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests. Genet. Epidemiol. 33, 559–568 (2009).

  18. Dunham, I., Kulesha, E., Iotchkova, V., Morganella, S. & Birney, E. FORGE: a tool to discover cell specific enrichments of GWAS associated SNPs in regulatory regions. F1000Res. https://doi.org/10.12688/f1000research.6032.1 (2015).

  19. Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429.e19 (2016).

  20. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).

  21. Speliotes, E. K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat. Genet. 42, 937–948 (2010).

  22. Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).

  23. Heid, I. M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat. Genet. 42, 949–960 (2010).

  24. Saxena, R. et al. Genetic variation in GIPR influences the glucose and insulin responses to an oral glucose challenge. Nat. Genet. 42, 142–148 (2010).

  25. Dupuis, J. et al. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat. Genet. 42, 105–116 (2010).

  26. Strawbridge, R. J. et al. Genome-wide association identifies nine common variants associated with fasting proinsulin levels and provides new insights into the pathophysiology of type 2 diabetes. Diabetes 60, 2624–2634 (2011).

  27. Soranzo, N. et al. Common variants at 10 genomic loci influence hemoglobin A1(C) levels via glycemic and nonglycemic pathways. Diabetes 59, 3229–3239 (2010).

  28. Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).

  29. Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).

  30. International Consortium for Blood Pressure Genome-Wide Association Studies et al. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103–109 (2011).

  31. Morris, A. P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990 (2012).

  32. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).

  33. Van der Harst, P. et al. Seventy-five genetic loci influencing the human red blood cell. Nature 492, 369–375 (2012).

  34. Gieger, C. et al. New gene functions in megakaryopoiesis and platelet formation. Nature 480, 201–208 (2011).

  35. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

  36. UK10K Consortium et al. The UK10K project identifies rare variants in health and disease. Nature 526, 82–90 (2015).

Acknowledgements

This study made use of data generated by the UK10K Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.UK10K.org/. Funding for UK10K was provided by the Wellcome Trust under award no. WT091310. Research by N.S. is supported by the Wellcome Trust (grants WT098051 and WT091310). N.J.T. is supported as a Wellcome Trust Investigator (no. 202802/Z/16/Z), is supported as the principal investigator of the Avon Longitudinal Study of Parents and Children (no. MRC & WT 102215/2/13/2), is supported by the University of Bristol NIHR Biomedical Research Centre (no. BRC-1215-20011) and the MRC Integrative Epidemiology Unit (no. MC_UU_12013/3), and works within the CRUK Integrative Cancer Epidemiology Programme (no. C18281/A19169). G.R.S.R. and V.I. are supported by European Molecular Biology Laboratory-Wellcome Trust Sanger Institute postdoctoral fellowships.

Author information

Contributions

G.R.S.R., J.L.M., K.W., N.J.T., I.D. and N.S. contributed data or materials. E.B., G.R.S.R., I.D., J.L.M., N.S. and V.I. developed the method. V.I. analyzed the data. E.B., I.D., N.J.T., N.S. and V.I. provided critical interpretation of the results. M.G. and S.M. designed the tools. E.B., N.S. and V.I. wrote the manuscript. E.B., G.R.S.R., I.D., J.L.M., K.W., M.G., N.J.T., N.S., S.M. and V.I. evaluated the manuscript. E.B. and N.S. designed and managed the project.

Corresponding authors

Correspondence to Ewan Birney or Nicole Soranzo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Fig. 1 GARFIELD method assessment.

a, GARFIELD-estimated false-positive rate (FPR) from 29 real data GWAS and n = 1,000 independent simulated annotations. The black horizontal line denotes the 5% FPR threshold. Traits are shown on the x axis. Colored bars denote the GARFIELD threshold of T < 10⁻⁸ (red) and the GARFIELD threshold of T < 10⁻⁵ (gray). Error bars denote standard errors. b,c, Effect of feature correction for each of 29 GWAS at the T < 10⁻⁸ significance threshold when analyzing 424 DHS cell types. The figure shows the proportion of significant annotations with respect to feature correction, where N denotes the number of LD proxies and T the distance to the nearest TSS. The y axis shows the corresponding values when no feature correction is employed. d, Difference in the proportion of significant enrichments between a model not accounting for any feature and a model accounting for any combination of the features, applied to 424 DHS cell types.

Supplementary Fig. 2 GARFIELD enrichment wheel plots.

Enrichment of genome-wide association analysis P-values in DNase I–hypersensitive sites (hotspots) for 27 disease/quantitative traits. Radial lines show OR values at eight GWAS –log10 P-value thresholds (T) for n = 424 ENCODE and Roadmap Epigenomics DHS cell lines, sorted by tissue on the outer circle. Dots on the outer side of the circle denote significant enrichment (if present) at T < 10⁻⁵ (outermost) to T < 10⁻⁸ (innermost).

Supplementary Fig. 3 Comparison between real data results for 29 real GWAS and 424 open chromatin annotations at the T < 10⁻⁸ and T < 10⁻⁵ GWAS P-value thresholds.

Left, −log10 P-value comparison for all trait annotation pairs (n = 29 × 424 points). Horizontal and vertical lines denote the threshold for detecting enrichment after multiple-testing correction. The number in each corner denotes the number of points in that quadrant. Right, odds ratio (OR) comparison for trait annotation pairs with significant enrichment at both the T < 10⁻⁸ and T < 10⁻⁵ GWAS P-value thresholds.

Supplementary Fig. 4 Multiple-annotation enrichment of genome-wide association analysis P-values in DNase I–hypersensitive sites (hotspots) for 15 disease or quantitative GWAS traits.

Cell types/tissues remaining after a heuristic multiple-annotation approach are shown on the y axis for each trait. Odds ratios (on log scale) are represented as dots and 95% CI with lines. The multiple-annotation model estimates are represented in red and the marginal effects of analysis of each annotation on its own are represented in black. Only phenotypes with at least a single detected enrichment are shown.

Supplementary Fig. 5 Enrichment levels (log OR) and extent of sharing between traits for 25-state chromatin segmentations of the NIH Roadmap and ENCODE projects at the T < 10⁻⁸ GWAS significance threshold.

a, Distribution of significant OR values across the 29 traits considered, split by segmentation state and colored to highlight predicted functional elements by Roadmap Epigenomics (see Supplementary Table 9). Number of points n is shown on the x axis below each category. b, Distribution of the pairwise difference between ORs from all enhancer, promoter, transcriptional enhancer and transcriptional regulatory states tested (‘state 1’) and ORs from transcription states (‘state 2’), for significant enrichments only (for example, measuring OR_{c,t}^{EnhA1} − OR_{c,t}^{Tx} for all cell types c and traits t for which P-value_{c,t}^{EnhA1} and P-value_{c,t}^{Tx} are both significant). Number of points n is shown on the x axis below each category. c, Sharing of significantly enriched/depleted annotations across 27 phenotypes (excluding CD and UC). The bar plot displays the number of cell types where an annotation is uniquely enriched in a trait or shared among multiple traits.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5 and Supplementary Note

Reporting Summary

Supplementary Table 1

Summary of available enrichment analysis approaches

Supplementary Table 2

Summary of overlap of UK10K sequence variants with DNase I–hypersensitive sites

Supplementary Table 3

Summary of the number of variants per disease/quantitative trait

Supplementary Table 4

GARFIELD enrichment of 29 publicly available GWAS studies in DNase I–hypersensitive sites from 424 ENCODE and Roadmap Epigenomics cell types

Supplementary Table 5

Enrichment P values (–log10) from five methods of 21 publicly available GWAS studies in DNase I–hypersensitive sites from 424 ENCODE and Roadmap Epigenomics cell types; in H3K27ac peaks from 127 ENCODE and Roadmap Epigenomics cell types; and in H3K4me3 peaks from 127 ENCODE and Roadmap Epigenomics cell types

Supplementary Table 6

Method running time and memory usage for 21 traits and 424 DNase I–hypersensitive site annotations

Supplementary Table 7

Roadmap epigenomics and ENCODE cell lines used for segmentations

Supplementary Table 8

GARFIELD enrichment of 29 publicly available GWAS studies in 25 state genome segmentations in 127 cell types

Supplementary Table 9

Segmentation state summary

About this article

Cite this article

Iotchkova, V., Ritchie, G.R.S., Geihs, M. et al. GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals. Nat Genet 51, 343–353 (2019). https://doi.org/10.1038/s41588-018-0322-6

  • DOI: https://doi.org/10.1038/s41588-018-0322-6
