High-resolution DNA accessibility profiles increase the discovery and interpretability of genetic associations

Aviv Madar; Diana Chang; Feng Gao; Aaron J. Sams; Yedael Y. Waldman; Deborah S. Cunninghame Graham; Timothy J. Vyse; Andrew G. Clark; Alon Keinan

doi:10.1101/070268

Abstract

Genetic risk for common autoimmune diseases is influenced by hundreds of small-effect, mostly non-coding variants, enriched in regulatory regions active in adaptive-immune cell types. DNaseI hypersensitivity sites (DHSs) are a genomic mark for regulatory DNA. Here, we generated a single DHS annotation from fifteen deeply sequenced DNase-seq experiments in adaptive-immune as well as non-immune cell types. Using this annotation we quantified accessibility across cell types in a matrix format amenable to statistical analysis, deduced the subset of DHSs unique to adaptive-immune cell types, and grouped DHSs by cell-type accessibility profiles. Measuring enrichment with cell-type-specific TF binding sites as well as proximal gene expression and function, we show that accessibility profiles grouped DHSs into coherent regulatory functions. Using the adaptive-immune-specific DHSs as input (0.37% of genome), we associated DHSs to six autoimmune diseases with GWAS data. Associated loci showed higher replication rates when compared to loci identified by GWAS or by considering all DHSs, allowing the additional discovery of 327 loci (FDR<0.005) below the typical GWAS significance threshold, 52 of which are novel and replicating discoveries. Finally, we integrated DHS associations from six autoimmune diseases, using a network model (bird’s-eye view) and a regulatory Manhattan plot schema (per locus). Taken together, we described and validated a strategy to leverage finely resolved regulatory priors, enhancing the discovery, interpretability, and resolution of genetic associations, and providing actionable insights for follow-up work.

Introduction

Most common autoimmune diseases, which affect over 4% of the world’s population^1,2, have a substantial polygenic heritable component³. Genome-wide association studies (GWAS) have been successful at linking hundreds of genomic loci to autoimmune diseases⁴, but understanding the molecular mechanisms influencing disease risk remains challenging⁵. DNaseI hypersensitive site sequencing (DNase-Seq) is a high-throughput technology for genome-wide detection of DNaseI hypersensitive sites (DHSs) in a given cell type^6,7, and DHSs are an excellent mark for regulatory DNA where transcription factors bind^8,9. Recently, it has been shown that the majority of autoimmune disease risk alleles reside in DHSs active in adaptive immune cell types, such as T and B cells^10-13. As recently suggested^11,14, this presents the opportunity to focus association studies on regulatory DNA of such trait-relevant cell types. However, a large proportion of the regulatory DNA of a given cell type is shared with non-related cell types⁸. Since genes involved in adaptive-immune-specific functions (e.g. T cell receptor signaling¹⁵) are likely regulated by adaptive-immune-specific regulatory DNA, we suggest further focusing genotype-phenotype studies of autoimmune diseases on regulatory DNA specific to adaptive immune cell types.

Regulatory marks other than DHSs, such as certain histone marks, are also available for many cell types^16,17. However, compared with DHSs, these marks typically fall short on two properties we relied on: resolution (about 200-300 base pairs [bp] for a typical DHS, compared with thousands of bp for a typical histone-mark peak) and sequencing depth (over 200 million reads for each DNase-Seq dataset we used, compared with 10-30 million reads for a typical histone-mark ChIP-Seq experiment). For these two reasons, we used DHSs rather than other regulatory marks.

This work focused on developing a framework for analyzing genetic data in light of the growing functional data. Perhaps the largest difference between our approach and many other reports is that we have used the functional data, instead of the genetic data, to dictate which regions of the genome would be evaluated (Fig. 1a,b). This resulted in: 1) a reduced requirement for multiple-testing correction and thereby increased statistical power; 2) tested units that are context-specific, short regulatory regions thought to influence the studied trait, instead of all SNPs, the large majority of which have no functional impact; and 3) improved discovery and interpretation of genetic associations from current autoimmune-disease GWAS, afforded by trait-relevant, finely resolved regulatory priors.

Figure 1. Overview

a. From left to right: (1) Functional analyses of GWAS data typically follow a funnel-like strategy starting with genetically associated loci and focusing in them on subsets of functional DNA (e.g. DHSs). That is, the genetic information dictates which DNA regions are considered. (2) A further funneling to smaller subsets of functional DNA may be desired by focusing on trait-relevant cell types. (3) We propose that an additional funneling to functional DNA unique to trait-relevant cell types could be even more advantageous. (4) By reversing the information flow from genetics → function, to function → genetics, and considering only one tag SNP per DHS, we transition from a funnel-like approach, to a flashlight-like approach where the functional information dictates which DNA regions are being considered. b. Several limitations of GWAS and of the functional, funnel-like approach to GWAS are mitigated by the proposed flashlight approach. c. DNase-seq data from adaptive-immune and non-immune cell types were integrated to generate a single DHS annotation and quantify DHS accessibility in cleavages per kilobase per million (CPKM) across cell types. Adaptive-immune-specific DHSs were identified and further resolved into accessibility profiles. DHSs belonging to each accessibility profile were characterized in terms of enrichment in binding motifs, GO terms, and GWAS signal in autoimmune diseases and unrelated control traits. Using the adaptive-immune-specific DHSs as input DNA, we associated DHSs to six autoimmune diseases with GWAS data. We integrated combined results from six autoimmune diseases using regulatory Manhattan plots (one per locus) and a hierarchical network model (over all loci).

Finally, since more results were generated than are possible to report in detail, we integrated results from six autoimmune diseases in a human-accessible format, in the hope of encouraging further investigations by others. This was achieved through a network model (integrating over all results) and a regulatory Manhattan plot schema (integrating results per locus). See Fig. 1c for an overview of our approach.

Results

Generating a unified DHS annotation from multiple samples and quantifying DHS accessibility

Currently, there is no unified genome-wide annotation for DHSs as there is, for example, for genes. Instead, each sample comes with its own unique DHS annotation. Moreover, there are currently no standards for quantifying accessibility as there are for quantifying gene expression, e.g. RPKM¹⁸. This makes working with DHSs from multiple DNase-seq experiments challenging. Here, we generated a single DHS annotation from fifteen available DNase-seq samples^16,17,19: seven adaptive immune cell types (B, CD8+ T, Naïve CD4+ T, Th1, Th2, Th17, Treg), one innate immune cell type (monocytes), and seven non-immune cell types (human embryonic stem cells, hereafter hESC; fetal brain; astrocyte; myoblast; fibroblast; epithelial; and hematopoietic stem cells, hereafter hematoSC). In total, we annotated 348,527 DHSs covering 3.4% of the human genome from DHSs that were significantly accessible in at least one of the above cell types (Methods). We then quantified accessibility per DHS and cell type as the number of cleavages per kilobase per million (CPKM). We visualized the resulting accessibility matrix as a heatmap (Fig. S1). We saw that over half of all accessible DNA in adaptive-immune cell types is ubiquitously accessible across many other cell types, and that such ubiquitous DHSs tended to be more accessible than adaptive-immune-specific DHSs. If one considered trait-relevant cell types alone, or considered cell types independently from one another, the ubiquitous DHSs might have overshadowed the less accessible but more interesting adaptive-immune-specific DHSs. This demonstrates the insights that can be gained by creating one DHS annotation from trait-relevant and trait-irrelevant cell types, and continuously quantifying accessibility in a single matrix.
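To make the CPKM unit concrete, the following is a minimal Python sketch of how such a matrix could be computed. It assumes that CPKM is normalized by analogy with RPKM (cleavage count divided by DHS length in kilobases and by library size in millions); the exact definition is given in the Methods and may differ. The function names and the toy numbers are illustrative only.

```python
def cpkm(cleavages, dhs_length_bp, total_cleavages):
    """Cleavages per kilobase of DHS per million mapped cleavages.

    Assumption: RPKM-style normalization; not taken from the Methods.
    """
    if dhs_length_bp <= 0 or total_cleavages <= 0:
        raise ValueError("DHS length and library size must be positive")
    return cleavages / (dhs_length_bp / 1_000) / (total_cleavages / 1_000_000)


def cpkm_matrix(counts, lengths_bp, library_sizes):
    """Build a DHS-by-cell-type CPKM matrix from raw cleavage counts.

    counts[i][j]     : cleavages observed in DHS i for cell type j
    lengths_bp[i]    : length of DHS i in base pairs
    library_sizes[j] : total mapped cleavages for cell type j
    """
    return [
        [cpkm(c, lengths_bp[i], library_sizes[j]) for j, c in enumerate(row)]
        for i, row in enumerate(counts)
    ]


# Toy input (made-up numbers): two DHSs (rows) in two cell types (columns).
counts = [[500, 20], [100, 100]]
lengths_bp = [250, 200]
library_sizes = [200_000_000, 100_000_000]
matrix = cpkm_matrix(counts, lengths_bp, library_sizes)
```

Normalizing by both DHS length and library size is what makes values comparable across DHSs and across samples sequenced to different depths.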

Grouping DHSs by accessibility profiles

Next, we hypothesized that grouping DHSs by the subset of cell types in which they were accessible would also group DHSs by distinct regulatory functions. Using the CPKM matrix, we scored DHSs for how well they matched cell-type-specific profiles, or profiles specific to predefined subsets of cell types (Figure 2a). Out of nineteen accessibility profiles we assayed, nine were unique to adaptive-immune cell types (purple), and were designed to group DHSs in broad strokes by salient immune functions: B-cell-specific DHSs (antibody production²⁰), CD8+ T-cell-specific DHSs (cell-mediated killing²¹), Naïve CD4+ T-cell-specific DHSs (maintenance of non-activated T cells²²), CD4+ Th1-specific DHSs (cytokine-induced cell-mediated immunity²²), CD4+ Th2-specific DHSs (cytokine-induced antibody production²²), CD4+ Th17-specific DHSs (cytokine-induced mucosal immunity²²), CD4+ Treg-specific DHSs (cytokine-induced anti-inflammatory response and tolerance to self antigens²²); and for subsets of cell types: DHSs specific to all six T lineages (T cell maintenance), and DHSs specific to all four activated CD4+ T cells, namely Th1, Th2, Th17 and Treg (CD4+ T cell activation). As examples, we present genome-browser views of two uniformly selected DHSs per accessibility profile (Fig. 2b, Fig. S2). The accessibility annotation, CPKM matrix, and match scores per accessibility profile are available in Table S1.
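The scoring function itself is described in the Methods and is not reproduced here. Purely as an illustration of the idea of scoring a continuous CPKM row against a predefined profile, the sketch below uses a hypothetical score: the lowest accessibility among in-profile cell types minus the highest accessibility among all other cell types. The score, the function names, and the toy values are assumptions, not the procedure used in this work.

```python
def profile_match_score(cpkm_row, profile):
    """Hypothetical match score for one DHS against one accessibility profile.

    cpkm_row : dict mapping cell type -> CPKM for this DHS
    profile  : set of cell types that define the profile
    A positive score means every in-profile cell type is more accessible
    than every out-of-profile cell type.
    """
    inside = [v for cell, v in cpkm_row.items() if cell in profile]
    outside = [v for cell, v in cpkm_row.items() if cell not in profile]
    if not inside or not outside:
        raise ValueError("profile must cover some, but not all, cell types")
    return min(inside) - max(outside)


def top_matches(cpkm_by_dhs, profile, n):
    """Return the n DHS identifiers with the highest match score."""
    scores = {dhs: profile_match_score(row, profile)
              for dhs, row in cpkm_by_dhs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy CPKM rows (made-up values) for three DHSs across three cell types.
cpkm_by_dhs = {
    "dhs1": {"B": 12.0, "Th1": 1.0, "hESC": 0.5},
    "dhs2": {"B": 8.0, "Th1": 9.0, "hESC": 7.0},
    "dhs3": {"B": 3.0, "Th1": 0.5, "hESC": 0.2},
}
b_cell_top = top_matches(cpkm_by_dhs, {"B"}, 2)
```

Because the score is continuous, DHSs can be ranked within a profile (for example, to take the top 5000 or top 600 matches) rather than only assigned in binary terms.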

Figure 2. Grouping DHSs by accessibility profiles

a. Each DHS received a score for how well it matched one of nineteen predefined accessibility profiles. The top 5000 DHSs matching each accessibility profile (columns) were visualized as a heatmap across all fifteen cell types (rows). We noted that lineage-specific DHSs for the related T cells were captured quantitatively as having higher accessibility in one T lineage compared to the others, but not in binary terms. b. An IGV genome browser view (Broad Institute), across all fifteen cell types, of a 1kb DNA region surrounding the midpoint of DHSs uniformly selected as the 100th best match per accessibility profile. Reads per million were used as input to IGV. For each panel the track height was fixed at a single value across all tracks to allow comparisons across cell types.

We also compared the grouping of DHSs into accessibility profiles based on the CPKM matrix with the alternative of grouping DHSs into the same profiles using a binary matrix in which DHSs were annotated as accessible (1) or not accessible (0), the common practice for DHS data^{8,11,16,19,23,24}. With respect to resolving accessibility profiles among the related T lineages, the CPKM matrix increased the number of identified DHSs per profile (sensitivity; Fig. S3), the proportion of correctly classified DHSs per profile (specificity; Fig. S3), and the overall accessibility at identified DHSs per profile (signal-to-noise ratio; Fig. S4; Methods).

Underlying DHSs of different accessibility profiles are distinct combinations of DNA sequence motifs

The expected molecular basis for DNA accessibility differences between cell types is a corresponding difference in TF occupancy^9,25. Therefore, to assess whether grouping of DHSs into accessibility profiles was cell-type dependent and not strongly confounded by other factors (e.g. genetic or environmental differences between sample donors), we identified de novo the DNA sequence motifs enriched in the top 600 DHSs matching each accessibility profile (Methods). Consistent with unique regulatory functions for DHSs belonging to different accessibility profiles, but not with confounders, distinct combinations of sequence motifs were enriched in each profile, often matching binding sites of TFs known to be important in the accessible cell types (Fig. 3a). For example, binding sites for the Th2 and Th17 master regulators, GATA3²² and RORC²⁶, were solely discovered in DHSs specific to Th2 and Th17 cells, respectively. However, we did not recover the binding motifs for the Th1 and Treg master regulators, TBX21 and FOXP3, respectively, suggesting that a lower resolution was achieved for DHSs specific to these two cell types. Additionally, we found that ubiquitous DHSs were solely enriched for the CCCTC-binding factor (CTCF), a constitutively expressed DNA-binding protein involved in organizing chromatin into topological domains²⁷, suggesting that these DHSs were marking generic chromatin-to-chromatin contact points found across all cell types.

Figure 3. Functional characterization of DHSs grouped by accessibility profiles

a. A table summarizing de novo identified DNA sequence motifs (rows) enriched in the top 600 DHSs per accessibility profile (columns). The percent of DHSs containing the enriched motifs is reported. We gave motifs matching known TF binding sites the name of the matching TFs; otherwise, names were reported as the accessibility profile name followed by an index (e.g. see two novel binding sites for fetal-brain-specific DHSs). b. We assigned a gene to each DHS by the nearest transcription start site (TSS), and evaluated the top 600 unique genes per accessibility profile for GO term enrichment. The top six enriched terms for six example profiles are shown as barplots. Bar length marks the fold change (FC) between observed and expected number of genes. Black bars correspond to FC larger than four. Highly enriched GO terms were meaningful and specific to the queried profile. Barplots for all accessibility profiles are presented in Figure S5. c. We assigned a gene to each DHS by the nearest TSS, and evaluated the top 600 unique genes per accessibility profile for differential gene expression in profile-related tissues. We show boxplots of Z score distributions of gene expression values for each profile (x-axis), in brain cortex, CD19+ B cells, and CD4+ T cells. As can be seen by comparing boxplots, in each sample the most differentially expressed genes came from accessibility profiles of cell types related to the tissue. Similar boxplots for additional tissues are presented in Figure S6 (BioGPS) and Figure S7 (GTEx).

Accessibility profiles grouped DHSs by coherent regulatory functions

We next determined if DHSs of different accessibility profiles marked regulatory DNA for genes with distinct cellular functions. To this end, we associated DHSs with their nearest transcription-start-site (TSS) gene, as a proxy for regulated genes, and evaluated each profile for gene ontology (GO) term enrichment (Methods)^28,29. For each accessibility profile, the enriched GO terms were mostly in line with known functions in accessible cell types (Fig. 3b and Fig. S5). Using the BioGPS³⁰ and GTEx³¹ gene-expression compendia, we also evaluated if genes assigned to DHSs of a given accessibility profile were differentially expressed in tissues relevant to accessible cell types (Fig. 3c, Fig. S6, S7). This was indeed the case. For example, in bronchial epithelial cells, genes assigned to epithelial-specific DHSs were most differentially expressed. Additionally, we noted that upregulation of gene expression was much more common than downregulation, indicating that most DHSs marked enhancers.
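The nearest-TSS assignment used as a proxy for regulated genes can be sketched as follows. This is a minimal Python illustration, assuming one sorted list of TSS coordinates per chromosome and using the DHS midpoint as the reference position; the tie-breaking rule (prefer the upstream TSS in coordinate order) and all names and coordinates are assumptions rather than details from the Methods.

```python
import bisect


def nearest_tss_gene(dhs_midpoint, tss_positions, tss_genes):
    """Return the gene whose TSS lies closest to a DHS midpoint.

    tss_positions : TSS coordinates on one chromosome, sorted ascending
    tss_genes     : gene names in the same order as tss_positions
    Ties go to the TSS with the smaller coordinate (an arbitrary choice).
    """
    if not tss_positions:
        raise ValueError("no TSS available on this chromosome")
    i = bisect.bisect_left(tss_positions, dhs_midpoint)
    # Only the TSS just left and just right of the midpoint can be nearest.
    candidates = [k for k in (i - 1, i) if 0 <= k < len(tss_positions)]
    best = min(candidates, key=lambda k: abs(tss_positions[k] - dhs_midpoint))
    return tss_genes[best]


# Toy annotation (made-up coordinates and gene names).
tss_positions = [1_000, 5_000, 9_000]
tss_genes = ["geneA", "geneB", "geneC"]
assigned = [nearest_tss_gene(m, tss_positions, tss_genes)
            for m in (1_200, 7_500, 10_000)]
```

The binary search keeps each lookup logarithmic in the number of TSSs, which matters when hundreds of thousands of DHSs are assigned.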

Taken together, the profile-specific enrichments in TF binding sites, GO terms, and differential gene expression show that grouping DHSs by accessibility profiles resulted in grouping of regulatory DNA by coherent, cell-type-dependent regulatory functions.

GWAS signal stratified by DHS cell-type accessibility profiles

In previous steps, we grouped DHSs into accessibility profiles. We first verified that, as expected, adaptive-immune-specific DHSs were selectively enriched with risk alleles for autoimmune diseases (Fig. S8). We further resolved which subsets of adaptive-immune-specific DHSs were most relevant to each disease, using a rank-based approach and a permutation-based statistic (Methods; Fig. S9). No single accessibility profile dominated across all autoimmune diseases (Fig. S10a). However, within each disease more strongly enriched cell-type profiles emerged, e.g. DHSs specific to CD8+ T cells, B cells, and Treg cells were most enriched for systemic lupus erythematosus (SLE). Perhaps of more interest was that Alzheimer’s disease and schizophrenia, two non-autoimmune controls, clustered together with the autoimmune diseases (Fig. S10b). Specifically, we found that schizophrenia was enriched with DHSs specific to CD4+ T cells, and Alzheimer’s disease was enriched with DHSs specific to monocytes, B cells, and CD8+ T cells, revealing involvement of different immune processes, and providing further resolution to recent results suggesting immune involvement in susceptibility to Alzheimer’s disease^32,33 and schizophrenia^34,35.

A context-specific, regulatory-wide association study (csRWAS)

Next, we performed an association study between the adaptive-immune-specific DHSs and autoimmune diseases. Specifically, from each GWAS we assigned a single proxy SNP to each DHS, and used its P-value as the DHS association P-value (Methods, Table S2). Choosing the largest GWAS (by sample size) for each of six autoimmune diseases, we analyzed rheumatoid arthritis³⁶ (RA, n=80k), ulcerative colitis³⁷ (UC, n=26k), Crohn’s disease³⁸ (CD, n=21k), multiple sclerosis³⁹ (MS, n=15k), systemic lupus erythematosus⁴⁰ (SLE, n=11k), and type 1 diabetes⁴¹ (T1D, n=5k). The genome-wide significance threshold typically employed by GWAS (p=5x10^-8) is not appropriate for csRWAS, as all the adaptive-immune-specific DHSs combined constitute ~0.37% of the human genome. Leveraging the groupings of DHSs into accessibility profiles, we estimated a null P-value distribution from proxy SNPs assigned to non-immune DHSs (Methods), as we did not expect these to be specifically associated with autoimmune diseases. To the extent that non-immune DHSs were nevertheless tagged by risk alleles, this null is conservative. This allowed matching a P-value to a desired FDR threshold. For example, at an FDR of 0.001, P-values ranged from 2x10^-10 for RA to 3.1x10^-4 for MS, and at an FDR of 0.005, from 7x10^-5 for RA to 2.8x10^-3 for MS. Employing an FDR<0.005 as a significance threshold, we identified between 243 and 839 DHSs (94 to 165 independent loci) per disease. To aid in analysis of these many associated DHSs, we first filtered them based on the GWAS genome-wide significance threshold, as associations above this threshold would have likely already been reported by the corresponding GWAS. Specifically, each associated DHS at FDR<0.005 was assigned to one of four groups, in the following order of precedence:

Genome-wide significant (GW), if the associated DHS itself was identified above genome-wide significance (p<5x10^-8).
Locus GW significance (lGW), if any GWAS SNP within 0.1cM or 100kb of the associated DHS was identified above genome-wide significance.
True, if any GWAS SNP within 0.1cM or 100kb of the associated DHS, in the GWAS catalog⁴ for the same disease, was identified above genome-wide significance. Or,
Novel otherwise.
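The order of precedence above can be sketched as a short Python function. It assumes that the SNPs falling within 0.1cM or 100kb of the associated DHS have already been collected, once from the input GWAS and once from the GWAS catalog for the same disease; the function and argument names are illustrative.

```python
GENOME_WIDE_P = 5e-8  # GWAS genome-wide significance threshold


def assign_bin(dhs_p, nearby_gwas_p, nearby_catalog_p):
    """Assign an associated DHS to GW, lGW, True, or Novel.

    dhs_p            : proxy-SNP P-value of the associated DHS
    nearby_gwas_p    : P-values of input-GWAS SNPs within the locus window
    nearby_catalog_p : P-values of GWAS catalog SNPs (same disease)
                       within the locus window
    Checks are made in order of precedence; the first match wins.
    """
    if dhs_p < GENOME_WIDE_P:
        return "GW"
    if any(p < GENOME_WIDE_P for p in nearby_gwas_p):
        return "lGW"
    if any(p < GENOME_WIDE_P for p in nearby_catalog_p):
        return "True"
    return "Novel"


# Toy calls (made-up P-values), one per bin.
bins = [
    assign_bin(1e-9, [], []),
    assign_bin(1e-5, [3e-8], []),
    assign_bin(1e-5, [1e-6], [1e-10]),
    assign_bin(1e-5, [1e-6], []),
]
```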

That means that DHSs in the Novel and True bins represent associations found below genome-wide significance (with the latter reaching genome-wide significance in other studies). Similarly, after grouping DHSs into independent loci, loci were assigned into these four groups, as determined by the highest-precedence DHS found in each locus.
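The step of matching a P-value threshold to a desired FDR, which determines which DHSs enter these bins, can be illustrated with a simple empirical estimate. The sketch below assumes the FDR at a threshold is estimated as the fraction of null P-values at or below that threshold, scaled to the number of tested DHSs and divided by the number of observed hits. This estimator, the function names, and the toy P-values are assumptions; the actual procedure is described in the Methods.

```python
def empirical_fdr(threshold, observed_p, null_p):
    """Estimate the FDR at a P-value threshold from an empirical null.

    observed_p : P-values of the tested (adaptive-immune-specific) DHSs
    null_p     : P-values of the null set (e.g. non-immune DHSs)
    """
    n_hits = sum(p <= threshold for p in observed_p)
    if n_hits == 0:
        return 0.0
    null_rate = sum(p <= threshold for p in null_p) / len(null_p)
    expected_false = null_rate * len(observed_p)
    return min(1.0, expected_false / n_hits)


def p_threshold_for_fdr(target_fdr, observed_p, null_p):
    """Largest observed P-value whose estimated FDR is within the target.

    Returns None if no threshold satisfies the target.
    """
    best = None
    for t in sorted(set(observed_p)):
        if empirical_fdr(t, observed_p, null_p) <= target_fdr:
            best = t
    return best


# Toy data (made-up P-values).
observed_p = [1e-6, 1e-5, 1e-4, 0.01, 0.2, 0.5, 0.7, 0.9]
null_p = [0.005, 0.05, 0.1, 0.3, 0.4, 0.6, 0.8, 0.95, 0.99, 0.999]
threshold = p_threshold_for_fdr(0.05, observed_p, null_p)
```

Under this kind of estimate, a GWAS with a stronger signal among the tested DHSs reaches a given FDR at a more stringent P-value, which is consistent with the per-disease differences in matched thresholds reported above.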

We initially expected all GW and lGW loci to be reported in the GWAS catalog; after all, they had a GWAS SNP with p<5x10^-8. In practice, however, we found that this was not the case. For example, out of 644 DHSs assigned to the GW bin (FDR<0.005), 89 were not reported as part of any locus in the catalog. We examined these un-cataloged DHSs manually (Methods) and found that besides a single genome-wide-significant SNP supporting them, typically 0-4 other SNPs in the region were below a nominal significance of p<1x10^-3. These loci were therefore likely flagged as poor in the original GWAS and not reported. Here, we flagged such poorly associated DHSs if fewer than four SNPs with p<1x10^-3 were found around the associated DHS (Methods). Finally, we determined which DHSs and loci replicated across diseases, as this can provide further evidence that an association is genuine, and can detect key loci and regulatory DNA associated with multiple autoimmune diseases (Methods).
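The flag for poorly supported associations can be sketched as a count of nominally significant SNPs around the DHS. In the sketch below the window size (100kb on either side) and the choice to count every SNP in the window, including the lead SNP, are assumptions made for illustration; the exact definition is in the Methods.

```python
def is_poor(snps, dhs_pos, window_bp=100_000, p_cutoff=1e-3, min_support=4):
    """Flag a DHS association as poorly supported.

    snps      : iterable of (position, p_value) on the DHS chromosome
    dhs_pos   : position of the associated DHS
    window_bp : assumed half-width of the region examined around the DHS
    Returns True if fewer than min_support SNPs in the window have
    p < p_cutoff.
    """
    support = sum(
        1 for pos, p in snps
        if abs(pos - dhs_pos) <= window_bp and p < p_cutoff
    )
    return support < min_support


# Toy loci (made-up positions and P-values).
lonely_hit = [(1_000, 1e-9), (1_500, 0.2), (2_000, 5e-4)]
supported_hit = [(1_000, 1e-9), (1_500, 2e-4), (2_000, 5e-4), (2_500, 1e-5)]
flags = (is_poor(lonely_hit, 1_200), is_poor(supported_hit, 1_200))
```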

We summarized the results for all the non-poor loci, stratified by the above groupings of GW, lGW, True, and Novel, in Table 1 (see Tables S3 and S4 for details per DHS and per locus, respectively). Out of 529 loci associated with one or more of the autoimmune diseases (FDR<0.005), 322 (60.9%) had further support by either replicating here across diseases (27.2%) or reaching genome-wide significance in the GWAS catalog in the same diseases (55%). Importantly, out of the 529 loci, 327 were discovered below genome-wide significance (True or Novel) in at least one of the diseases a locus was associated with, and 153 (46.8%) of these had replication or catalog support. This shows that using informed regulatory priors allows the identification of many genuine associations below GWAS genome-wide significance, although higher significance does lead to improved validation rates (Table 1). It remains to be evaluated how GWAS compares to csRWAS with respect to replication rates at equal significance thresholds, and we will return to this question to conclude the results section. Next, we describe how polymorphisms that disrupt potential TF binding sites inside DHSs were identified, and how we prioritized among them to suggest follow-up SNPs for each associated locus.

Table 1. Associated loci with cross-replication and GWAS catalog status

Identifying polymorphisms in DHSs that disrupt predicted TF binding sites

Genetic variants that modify TF binding sites in DHSs are a major source of human gene expression variation⁴². Here, we scanned DHSs for sequences matching one of the enriched TF binding sites found in adaptive-immune-specific accessibility profiles (Fig. 3a), and identified polymorphisms disrupting such binding sites as possible sources of disease risk alleles (Methods). Since many such polymorphic TF binding sites were found, we scored binding sites using a three-letter grade to allow prioritizing among them (Methods). The first letter grade measured how well the predicted binding site matched the cognate TF binding site and cell-type context (+A being the best). The second and third letters measured the SNP minor allele frequency (MAF) in Europeans and Asians, respectively (A being common, B intermediate, and C rare) (Table S5). We will see examples of these grades in action below.
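As an illustration of how such a grade could be assembled, the sketch below builds the second and third letters from MAF values and prepends a first-letter grade supplied by the caller. The MAF cutoffs (0.05 for common, 0.01 for intermediate) are hypothetical placeholders; the cutoffs and the first-letter scoring actually used are defined in the Methods.

```python
def maf_letter(maf, common=0.05, intermediate=0.01):
    """Map a minor allele frequency to A (common), B (intermediate), C (rare).

    The two cutoffs are hypothetical and chosen for illustration only.
    """
    if maf >= common:
        return "A"
    if maf >= intermediate:
        return "B"
    return "C"


def binding_site_grade(match_grade, maf_european, maf_asian):
    """Compose the grade: match quality, then European MAF, then Asian MAF.

    match_grade : first-letter grade for the motif match and cell-type
                  context (e.g. "+A"), computed elsewhere
    """
    return match_grade + maf_letter(maf_european) + maf_letter(maf_asian)


# Toy SNPs (made-up frequencies).
grades = [
    binding_site_grade("+A", 0.30, 0.20),
    binding_site_grade("+A", 0.02, 0.001),
]
```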

We next describe how we integrated the regulatory and genetic information in a network model.

A hierarchical integrative network model

Through disease-associated DHSs our approach connected diseases, SNPs, loci, DHSs, polymorphic TF binding sites, DHS accessibility profiles, and genes by most proximal TSS. Using the accessibility profiles as a functional guide, we constructed a hierarchical network model with five levels. The network for associated DHSs at an FDR<0.005 is visualized in Figure 4 and is available for navigation in Cytoscape⁴³ (supplementary file 1). Although gene assignment to DHSs by nearest TSS is prone to false negatives and false positives, particularly in gene-dense regions, this simple procedure clearly matched many DHSs to correct regulated genes, as supported by the GO term enrichments and differential gene expression analyses, and as shown next by KEGG⁴⁴ pathway enrichment. We first examined the 528 genes in the top level of the network. We asked if the associated genes were enriched with functional pathways, above what was expected for genes most proximal to adaptive-immune-specific DHSs (Methods). The top three enriched KEGG pathways revealed canonical T-cell signaling pathways (hypergeometric test): JAK/STAT signaling pathway⁴⁵ (fold-change[observed/expected]=5.56, p=2.86x10^-12), Cytokine-cytokine receptor interaction⁴⁵ (fold-change=4.08, p=3.43x10^-11), and T-cell receptor signaling⁴⁶ (fold-change=4.17, p=1.65x10^-06). No unexpected pathway enrichments were found. We further examined whether genes found in two or more autoimmune diseases (96 such genes) were more enriched for one of those three pathways. Cytokine-cytokine receptor interaction⁴⁵ and T-cell receptor signaling showed similar fold enrichments (fold-changes of 4.7 and 4.48, respectively); however, the JAK/STAT signaling pathway almost doubled its fold enrichment (fold-change of 10.04, p=9.56x10^-07) (Fig. 4b). The JAK/STAT genes along with their DHS-derived cellular contexts were: 1. Janus kinase 2 (JAK2, Activated CD4+ T set⁴⁷), 2. Suppressor of cytokine signaling 1 (SOCS1, Activated CD4+ T set²²), 3. Signal transducer and activator of transcription 1 (STAT1, T set⁴⁸), 4. Protein tyrosine phosphatase non-receptor type 2 (PTPN2, T set⁴⁹), 5. Interleukin 23 receptor (IL23R, T set or Th17⁵⁰), 6. Interleukin 12 receptor beta 2 (IL12RB2, Th1⁵¹), 7. Leukemia inhibitory factor (LIF, Th1⁵²), 8. Sprouty-related EVH1 domain containing 2 (SPRED2, Treg), 9. Signal transducer and activator of transcription 3 (STAT3, Treg^22,53), and 10. Interleukin 21 (IL21, Th17⁵⁴), most having reported functions in the predicted cell-type context (references provided). Five of these genes form protein-protein complexes through interactions with JAK2, namely: IL12RB2, STAT3, STAT1 and SOCS1^22,55-57. This suggests that, after the highly autoimmune-relevant HLA region (which we excluded from all of our analyses), JAK/STAT signaling is the second most prevalent pathway dysregulated in autoimmunity.
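For readers who want to reproduce this kind of enrichment, the fold change and a one-sided hypergeometric P-value can be computed as in the following minimal Python sketch. It assumes the background is the set of genes most proximal to adaptive-immune-specific DHSs and uses only the standard library; the function name is illustrative and the counts are toy values, not those underlying the results above.

```python
from math import comb


def hypergeom_enrichment(n_background, n_pathway, n_selected, n_overlap):
    """Fold change and upper-tail hypergeometric P-value for a pathway.

    n_background : genes in the background set
    n_pathway    : background genes annotated to the pathway
    n_selected   : genes in the tested set (e.g. associated genes)
    n_overlap    : tested genes annotated to the pathway
    """
    expected = n_selected * n_pathway / n_background
    fold_change = n_overlap / expected
    # P(overlap >= n_overlap) when drawing n_selected genes at random.
    upper = min(n_selected, n_pathway)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    p_value = tail / comb(n_background, n_selected)
    return fold_change, p_value


# Toy counts: 20 background genes, 5 in the pathway, 4 selected, 3 overlap.
fold_change, p_value = hypergeom_enrichment(20, 5, 4, 3)
```

Restricting the background to genes near adaptive-immune-specific DHSs, rather than all genes, is what keeps generic immune pathways from appearing enriched simply because of the input DHS set.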

Figure 4: A cross disease hierarchical network model

a. We display a hierarchical network model constructed of DHSs associated with six autoimmune diseases at an FDR<0.005. Reading the gene names in the network is possible by either using the digital copy of this manuscript and zooming in, or by navigating the Cytoscape network used to produce this image (supplementary file 1). The network has five levels indicated in the left diagram. DHSs were matched to accessibility profiles (rightmost in the network) and connected to harbored polymorphic TF binding sites (leftmost in the network). Nodes from lower levels connected to nodes in the immediate upper level. DHSs, TF binding sites, and genes were colored to indicate their most well supported accessibility profile (color map on the top left). Loci associated with multiple diseases appear above the associated disease with the largest GWAS sample size. b. A disease-to-gene network of genes associated with two or more diseases (the more diseases a gene was associated with, the more central it is). These cross-disease genes were particularly enriched with the JAK/STAT signaling pathway (see main text). c. We highlight a subnetwork generated by selecting IRF5 as a gene of interest and descending down the network to diseases. Underlying the association is a stretch of B-cell-specific associated DHSs, one of which harbors a good match for SPI1, a TF binding site enriched in B-cell-specific DHSs. Furthermore, a common SNP in Europeans and Asians disrupts this binding site, providing a testable hypothesis in search of a causal SNP. We graded such good-match polymorphic TF binding sites, found in DHSs assigned to accessibility profiles in which the TF binding site was also enriched, and intersected by a common SNP in Europeans and Asians, as ‘+AAA’ to allow prioritizing regulatory polymorphisms for follow-up work (see Methods for the motivation and nomenclature of the three-letter grading system). d. We highlight the subnetwork generated by selecting BLK as a gene of interest and descending down the network to diseases. This gene was assigned to a B-cell context, as mainly B-cell-specific associated DHSs were underlying it. Three good motif matches were found for RREB1, a TF binding site enriched in B-cell-specific DHSs, one of which was intersected by a common SNP in Europeans and Asians (+AAA), and two of which were intersected by less frequent SNPs (scored as +ABC and +ACC). Together, these and other polymorphic TF binding sites provide a prioritizable search space for causal regulatory SNPs.

Note that there is a bias in which cellular contexts we can find. For example, IL21 was associated with a Th17 context, but is also important for T follicular helper cells⁵⁸, for which we had no accessibility data. Moreover, for T cells we have six lineages, allowing fine resolution into cellular contexts, whereas for B cells we have only one sample, a composite of B cells at different developmental stages, in which case cellular contexts can only be identified as an aggregate.

Next, we provide two examples of how the network can be used to extract actionable insights for two genes of interest, IRF5 and BLK (Fig. 4c,d). For both genes, we show that stretches of B-cell-specific DHSs underlie the association, with at least one of these DHSs harboring a ‘+AAA’ polymorphic TF binding site, thereby suggesting candidate causal regulatory SNPs, molecular mechanisms of action, and cellular contexts of associations. In support, both genes were previously described as important for B cell development and function^59,60, and super enhancers in B cells were previously reported in the same regions¹⁰.

Regulatory Manhattan plots (RMPs)

A regulatory Manhattan plot (RMP) highlights DHS tagging SNPs above a set significance threshold, across related traits, and visualizes their matched accessibility profile. The profiles in turn help determine the likely cellular context of the locus. For example, we show the RMPs for the IRF5 and BLK example loci discussed above (Fig. 5). Note that the signal over the IRF5 TSS region is not in LD with the signal above the TNPO3 gene body (Fig. S11). As additional examples, we present thirteen associations in three loci that replicated across diseases, IL2RA, PTPN22, and IKZF1, including associations below GWAS genome-wide significance (True and Novel) (Fig. 6). For IL2RA and PTPN22, csRWAS resolved associations to a single dominant accessibility profile: PTPN22 with T set, and IL2RA with Treg. Clear evidence that a locus is not associated is also important, so we provide RMPs for all loci across all diseases, including non-associated ones (supplementary files 2 and 3). Such RMPs reveal that csRWAS often could not associate loci with some diseases, not because an association signal was absent, but because no DHS-proximal SNPs were available. This highlights the importance of high-quality dense imputation for csRWAS, and means that we could only provide a lower bound for the true cross-replication rates across these six autoimmune diseases.

Figure 5: Regulatory Manhattan plots for the IRF5 and BLK loci

a. and b. We present regulatory Manhattan plots (RMPs) for the BLK and IRF5 loci, complementing the subnetworks in Fig. 4. The input GWAS name is indicated on the top left of each plot together with the locus association bin (GW, lGW, True, or Novel). For cross-disease associations, RMPs were ordered by locus significance (from top to bottom) and centered on the same chromosomal position. DHS tagging SNPs above an FDR threshold of 0.005, in any of the diseases, are visualized as small pie charts with one wedge per accessibility profile, in all of the diseases (as long as a tag SNP was available). Purple wedges indicate the accessibility membership of an associated DHS. A larger summary pie chart that counts the total times each accessibility profile was observed over the entire locus is shown on the top left. Often this summary pie chart can suggest the most likely cell-type context for an association. Accessibility profiles in pie charts were ordered as indicated outside the circumference of the summary pie chart (moving clockwise from 12 o’clock): B cell, CD8+ T cell, Naïve CD4+ T, Th1, Th2, Treg, Th17, T set, and Activated CD4+ T set. For example, for the IRF5 locus, the B-cell-specific accessibility profile was matched by seven associated DHSs, followed by two matches to the Activated CD4+ T set profile and a single match each to T set and Th17. Therefore, B cell is the predicted cellular context for the IRF5 locus association. Note that the subnetworks from Figures 4c and 4d connect only to the DHSs most proximal to the TSSs of IRF5 or BLK, whereas the Manhattan plots display all DHSs in a given locus. LD was calculated for all SNPs within 50kb of their nearest DHS tagging SNP. Genes were visualized from their longest RefSeq isoform and, in cross-disease associations, shown under every other RMP.
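The tally behind the summary pie chart can be expressed in a few lines of Python. This is a minimal sketch using the IRF5 counts quoted in the caption above; the function name is illustrative, and breaking ties by first occurrence is an arbitrary choice made here, not a rule stated in this work.

```python
from collections import Counter


def predicted_context(matched_profiles):
    """Tally accessibility profiles over a locus and return the most frequent.

    matched_profiles : one profile name per associated-DHS match in the locus
    Returns (most frequent profile, Counter of all profiles).
    Ties are broken by first occurrence in the input.
    """
    counts = Counter(matched_profiles)
    top_profile, _ = counts.most_common(1)[0]
    return top_profile, counts


# Profile matches for the IRF5 locus as described in the caption.
irf5_matches = (
    ["B cell"] * 7
    + ["Activated CD4+ T set"] * 2
    + ["T set", "Th17"]
)
context, counts = predicted_context(irf5_matches)
```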

Figure 6. Examples of prevalent loci associated with autoimmunity

We present three loci associated at FDR<0.005 across four or more diseases (13 associations in total). For a. and b. the cellular context could be resolved to a single most-likely accessibility profile, and five associations were detected below genome-wide significance. For c. we report two novel associations in MS and RA upstream of IKZF1 that replicated in SLE and CD.

In support of the two novel IKZF1 associations reported above with MS and RA, csRWAS linked three additional members of the Ikaros hematopoietic transcription factor family⁶¹, namely IKZF2, IKZF3, and IKZF4, to autoimmune diseases (Fig. S12), further resolving the cellular context of the RA association with IKZF4 to Treg, and of the RA-UC-CD-SLE-T1D association with IKZF3 to the T set or CD8+ T cell accessibility profiles. Note that IKZF3 was associated with five of the six autoimmune diseases, suggesting that dysregulation of IKZF3 is a prevalent cause of autoimmunity and highlighting this TF for possible drug intervention. Additional RMPs for eight loci, including the gene regions for IL12RB2, TCF7, CTLA4, CD28, CCR6, and ETS1 (Fig. S13-S14), reveal more cellular contexts and identify four novel cross-replicating associations: two for IL12RB2 with RA and SLE, one for ETS1 (upstream) with SLE, and one for TCF7 with RA, as well as one non-replicating association for ETS1 (gene body) with UC. We also note that several genome-wide significant loci identified here were not within 0.1cM of a catalog lead-SNP for the respective diseases, and were not flagged by us as poor. Specifically, two cross-replicating loci were found for RA near ZFP36L1 and FAM213B (Fig. S15); two loci for CD, the first cross-replicating with UC near BRD7 and the second non-replicating near TTC33 (Fig. S16); and one locus for MS near SOCS1 that cross-replicated with T1D and CD (Fig. S17). To summarize, we highlight the more promising novel and replicating loci in Table 2 and Figures S18-S27, providing an array of new insights for six common autoimmune diseases. Furthermore, to assist in prioritizing polymorphic TF binding sites for follow-up work, each RMP is accompanied by a table (in the same .pdf file) providing additional information similar to that found in the network, but in a tabular format and on the locus scale (supplementary files 2 and 3, see example in Fig. S28).
We next contrast the loci replication rates of GWAS and csRWAS to conclude the results section.

Table 2: Novel replicating loci, cellular contexts, and DHS proximal genes

High rate of validation for genetic associations identified by csRWAS

Often, an initial GWAS suffers from what is known as the “winner’s curse”, where the associations found are likely stronger in that GWAS sample than in the general population that is assayed⁶². This problem becomes less severe as the GWAS sample size increases. Therefore, the gold standard for validation of any genetic study is replication in an independent sample, which is recommended to be larger in order to correctly identify false positives⁶³. Following these GWAS guidelines, we examined the rate at which loci discovered in a smaller GWAS replicated in a larger and independent GWAS for the same autoimmune disease. We compared the replication rates attained by GWAS (considering all SNPs), RWAS (considering all DHS tagging SNPs), and csRWAS (considering all adaptive-immune-specific DHS tagging SNPs) (Methods). We show that the replication rates were overall higher for csRWAS when compared to RWAS or GWAS (Fig. 7a,b). The difference in replication rate was more pronounced for the smaller CD discovery data (n=4,664) than for the larger RA discovery data (n=22,515), consistent with prior information having a larger impact on underpowered GWAS. Since emerging whole-genome sequencing-based GWAS commonly have such smaller sample sizes^64–70, we propose that csRWAS is well suited for their analysis as well, as it can leverage the improved SNP resolution offered by sequencing data while avoiding the pitfalls of greatly increasing the number of variants considered.

Figure 7. Contrasting the validation rates of csRWAS and GWAS

DHSs were assigned to their nearest GWAS reported SNP. We compared loci replication rates for GWAS (all SNPs, grey), RWAS (all DHS tagging SNPs, blue), and csRWAS (all adaptive-immune-specific DHS tagging SNPs, purple). We considered a GWAS locus as replicating if any of the discovered SNPs within it reached nominal significance in the replication study (p<0.01). For csRWAS or RWAS, a locus was considered as replicating if any of its discovered DHS tagging SNPs replicated with such nominal significance. a. and b. Loci replication rates as a function of discovery P-value thresholds for CD and RA, respectively. On the lines we display the number of discovered loci that replicated at each threshold. c. and d. The inverse of a. and b., estimating the P-value thresholds required for attaining increasing replication rates. The significance threshold required for csRWAS to attain the same replication rate as GWAS was often much less stringent. As before, the entries on the lines specify the number of replicating loci. e. and f. The fraction of all csRWAS associations (total csRWAS discoveries) that were not detected by GWAS at equivalent replication rates. The entries on the lines specify the number of such discoveries unique to csRWAS.
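
The replication criterion in this caption can be written out as a short sketch. It assumes each locus is given as a list of (discovery P value, replication P value) pairs for its SNPs; the function name, the data layout, and the toy P values are illustrative assumptions, not the authors' pipeline or data.

```python
def replication_rate(loci, discovery_threshold, replication_alpha=0.01):
    """Fraction of discovered loci that replicate (illustrative sketch).

    loci: dict mapping a locus name to a list of
          (discovery_p, replication_p) pairs, one per SNP.
    A locus is discovered if any SNP has discovery_p below the threshold.
    It replicates if any of those discovered SNPs has replication_p
    below replication_alpha (p<0.01 in the caption).
    """
    discovered = 0
    replicated = 0
    for snps in loci.values():
        hits = [rep_p for disc_p, rep_p in snps if disc_p < discovery_threshold]
        if not hits:
            continue
        discovered += 1
        if any(rep_p < replication_alpha for rep_p in hits):
            replicated += 1
    rate = replicated / discovered if discovered else float("nan")
    return rate, replicated, discovered


# Made-up P values, for illustration only.
toy_loci = {
    "locus_A": [(1e-8, 1e-4), (3e-6, 0.2)],
    "locus_B": [(2e-6, 0.5)],
    "locus_C": [(4e-5, 0.003)],
    "locus_D": [(1e-3, 0.8)],
}

print(replication_rate(toy_loci, 1e-5))  # (0.5, 1, 2)
```

Evaluating the same function over a grid of discovery thresholds gives curves of the kind shown in panels a. and b.; relaxing the threshold to 1e-4 in the toy data brings in locus_C, which replicates.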

Considering the reciprocal of the results above, we show that to achieve a desired replication rate, csRWAS required a less stringent significance threshold than GWAS (Fig. 7c,d). For example, if the desired replication rate was 0.5 for RA, csRWAS required loci to reach p<7.4×10^-5, while GWAS required loci to reach p<5.6×10^-6. This means that at intermediate P values (where the significance threshold was reached for csRWAS but not for GWAS), additional loci could be discovered by csRWAS without loss of accuracy. We quantified the number of such discoveries unique to csRWAS for a range of replication rates (Fig. 7e,f). Keeping with the same example, for RA at a 0.5 replication rate, csRWAS discovered seven loci beyond those found by GWAS, accounting for ~58% of all twelve csRWAS discoveries at that replication rate. The results for CD were more striking, likely because of the higher relevance of priors for underpowered GWAS. These results remained consistent when varying parameters (Methods, Fig. S29-S30).
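
As a quick check of the RA example in this paragraph, the arithmetic can be reproduced directly. The numbers are taken from the text; the variable names are illustrative only.

```python
# Thresholds needed for a 0.5 replication rate in RA (from the text).
csrwas_threshold = 7.4e-05
gwas_threshold = 5.6e-06

# How much less stringent the csRWAS threshold is than the GWAS one.
fold_relaxation = csrwas_threshold / gwas_threshold

# Discoveries unique to csRWAS at that replication rate (from the text).
csrwas_discoveries = 12
unique_to_csrwas = 7
fraction_unique = unique_to_csrwas / csrwas_discoveries

print(f"{fold_relaxation:.1f}-fold")  # 13.2-fold
print(f"{fraction_unique:.1%}")       # 58.3%
```

The roughly 13-fold relaxation of the threshold is what opens up the intermediate P-value range in which the additional csRWAS loci are found.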

Discussion

We have demonstrated that genome-wide association testing can be made considerably more powerful and precise by making use of chromatin accessibility and other functional data, and by focusing on genomic regions relevant to the trait or disease of interest. We illustrate the method by reanalyzing existing genetic data from six autoimmune diseases, where the approach led to additional discoveries and increased interpretability. Note that even if associated DHSs did not harbor any causal allele, the regulatory information they provide can still be used to highlight likely cellular contexts, and therefore help determine candidate genes (e.g. genes with function or high expression in suggested cell types), and guide further experiments to prove causal relationships. We propose that when accessibility data is available for trait-relevant cell types, csRWAS would be a valuable complementary analysis to GWAS.

An important technical advance made here is in defining a single annotation from multiple DNase-seq samples, and in generating a continuous accessibility matrix from these data. This simplifies the analysis of large collections of DNase-seq experiments, and can be generalized to other peak-like functional data. It also suggests that a unified DHS annotation can be defined from all available DNase-seq samples, facilitating comparative studies of accessibility across a larger accessibility matrix (analogous to GTEx³¹ for gene expression).
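
To make the idea of a single annotation and an accessibility matrix concrete, the following is a minimal sketch under simplifying assumptions. It merges overlapping per-sample peaks into one set of intervals and then counts DNaseI cut positions per interval, scaled to counts per million. The actual procedure is defined in the paper's Methods and may differ; the function names, the union-merge rule, the normalization, and the toy coordinates are all assumptions made for illustration.

```python
def merge_peaks(peak_sets):
    """Union overlapping (chrom, start, end) peaks from several samples.

    Assumes half-open intervals; touching or overlapping peaks on the
    same chromosome are merged into one interval.
    """
    all_peaks = sorted(p for peaks in peak_sets for p in peaks)
    merged = []
    for chrom, start, end in all_peaks:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def accessibility_matrix(annotation, cuts_by_sample):
    """Rows are DHSs, columns are samples (sorted by name).

    Each entry is the number of cut positions inside the DHS per
    million cuts in that sample (an assumed normalization).
    """
    samples = sorted(cuts_by_sample)
    matrix = []
    for chrom, start, end in annotation:
        row = []
        for sample in samples:
            cuts = cuts_by_sample[sample]
            n = sum(1 for c, pos in cuts if c == chrom and start <= pos < end)
            row.append(1e6 * n / len(cuts) if cuts else 0.0)
        matrix.append(row)
    return samples, matrix


# Toy peaks and cut positions, made up for illustration.
peaks_b = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks_th1 = [("chr1", 150, 250), ("chr1", 900, 1000)]
cuts = {
    "B cell": [("chr1", 120), ("chr1", 180), ("chr1", 550), ("chr1", 560)],
    "Th1": [("chr1", 160), ("chr1", 950), ("chr1", 960), ("chr1", 970)],
}

annotation = merge_peaks([peaks_b, peaks_th1])
samples, matrix = accessibility_matrix(annotation, cuts)
print(annotation)
print(samples, matrix)
```

In this toy case the third interval is accessible only in the Th1 sample, which is the kind of row a cell-type-specific filter would retain.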

As used here, csRWAS had limitations. Some were data related, e.g. not having DNase-seq data for every trait-relevant cell type, or not having biological replicates to estimate the contribution of donor-to-donor variation. As a result of the latter, for example, some of the differences we attributed to cell-type specificity may instead be due to donor-specific genetic or environmental differences. However, the multitude of tests we performed to validate that DHSs grouped by accessibility profile also grouped regulatory DNA by coherent regulatory function suggests that DHS accessibility was driven primarily by cell-type differences. Other limitations were analysis related, e.g. the TF binding sites we tested here likely comprise only a fraction of the known and unknown binding motifs important for adaptive-immune cell types. In that respect, the modular design of the analysis makes it possible to incorporate additional DHS-related data in the future. Also, although a useful first approximation, the assignment of the most proximal gene’s TSS to each DHS as its likely target was error prone. As such, when examining a locus of interest we recommend carefully reviewing the RMP and information table for that locus, and applying domain knowledge before deciding on candidate genes for follow-up.

Extensions to this work are also possible. For example, the measured trait, instead of being disease status, could be gene expression in relevant cell types⁷¹. The large number of studied “traits” generally further diminishes power of such expression quantitative trait loci (eQTL) mapping. Hence, focusing on regulatory DNA specific to trait-relevant cell types is promising for greatly improving both statistical power and functional insights of eQTL studies.

To conclude, we identified 1975 DHSs (FDR<0.005) associated to one or more of six autoimmune diseases, grouped into 529 independent loci across the genome. The original GWAS of these six diseases, employing standard genome-wide significance thresholds and not integrating non-genetic data, would have missed 327 of these loci, although 153 of those (46.8%) readily replicated here or in the GWAS catalog. Also, as a result of the functional prior, this study identified and replicated 52 novel loci. But perhaps more importantly, csRWAS provides actionable insights for many associations, e.g. suggests polymorphic TF binding sites only in accessible DNA unique to trait-relevant cell types, and proposes cellular contexts for follow up experiments.

Taken together, we presented compelling evidence that trait-tailored approaches to functional and genetic data can provide a structured path to better understanding how genotypes are translated to phenotypes.

Figures

All figures appear below but are also available as two downloadable .pdf files from the following private links.

Main figures, private link:

https://drive.google.com/open?id=OB_nf7cPOLTBSSVpuajRxQlZySVk

Supplemental figures, private link:

https://drive.google.com/open?id=OB_nf7cPOLTBSQUtfZOpUV3hQLW8

Acknowledgements

We thank Leonardo Arbiza, Kaixiong Ye, and Cris Van Hout for helpful comments and discussions about this work. This work was supported in part by NIH grant R01-HG006849 (A.K.) and by an award from the Ellison Medical Foundation (A.K.). This work would not have been possible without the sharing of data through public resources. We thank the Encyclopedia of DNA Elements (ENCODE) consortium and the Roadmap Epigenomics consortium, and particularly the Stamatoyannopoulos lab, for collecting the DNaseI-seq data used in this manuscript. We thank the following consortia for sharing GWAS summary statistics: Rheumatoid Arthritis Consortium International for Immunochip (RACI), International Inflammatory Bowel Disease Genetics Consortium (IIBDGC), The Coronary Artery Disease Genetics Consortium (C4D), International Consortium for Blood Pressure (ICBP), DIAbetes Genetics Replication And Meta-analysis (DIAGRAM), The Genetic Investigation of ANthropometric Traits (GIANT), International Genomics of Alzheimer’s Project (IGAP), Psychiatric Genomics Consortium (PGC), and the Euro-Canadian systemic lupus erythematosus consortium (EC-SLE). This study makes use of data generated by the Wellcome Trust Case Control Consortium (WTCCC). A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the WTCCC project was provided by the Wellcome Trust under award 076113. Finally, we would like to thank all the participants and staff involved in the collection of the genotype-phenotype data used in this manuscript.

References

Cooper, G. S., Bynum, M. L., & Somers, E. C., Recent insights in the epidemiology of autoimmune diseases: improved prevalence estimates and understanding of clustering of diseases. Journal of autoimmunity 33, 197–207, doi:10.1016/j.jaut.2009.09.008 (2009).
Hayter, S. M., & Cook, M. C., Updated assessment of the prevalence, spectrum and case definition of autoimmune disease. Autoimmunity reviews 11, 754–765, doi:10.1016/j.autrev.2012.02.001 (2012).
Goris, A., & Liston, A., The immunogenetic architecture of autoimmune disease. Cold Spring Harbor perspectives in biology 4, doi:10.1101/cshperspect.a007260 (2012).
Welter, D., et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic acids research 42, D1001–1006, doi:10.1093/nar/gkt1229 (2014).
Spain, S. L., & Barrett, J. C., Strategies for fine-mapping complex traits. Human molecular genetics, doi:10.1093/hmg/ddv260 (2015).
John, S., et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nat Genet 43, 264–268, doi:10.1038/ng.759 (2011).
Boyle, A. P., et al. High-resolution mapping and characterization of open chromatin across the genome. Cell 132, 311–322, doi:10.1016/j.cell.2007.12.014 (2008).
Stergachis, A. B., et al. Developmental fate and cellular maturity encoded in human regulatory DNA landscapes. Cell 154, 888–903, doi:10.1016/j.cell.2013.07.020 (2013).
Thurman, R. E., et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82, doi:10.1038/nature11232 (2012).
Hnisz, D., et al. Super-enhancers in the control of cell identity and disease. Cell 155, 934–947, doi:10.1016/j.cell.2013.09.053 (2013).
Maurano, M. T., et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195, doi:10.1126/science.1222794 (2012).
Trynka, G., et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nature genetics 45, 124–130, doi:10.1038/ng.2504 (2013).
Finucane, H. K., et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature genetics, doi:10.1038/ng.3404 (2015).
Gerasimova, A., et al. Predicting cell types and genetic variations contributing to disease by combining GWAS and epigenetic data. PloS one 8, e54359, doi:10.1371/journal.pone.0054359 (2013).
Smith-Garvin, J. E., Koretzky, G. A., & Jordan, M. S., T cell activation. Annu Rev Immunol 27, 591–619, doi:10.1146/annurev.immunol.021908.132706 (2009).
Consortium, E. P., et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74, doi:10.1038/nature11247 (2012).
Roadmap Epigenomics, C., et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330, doi:10.1038/nature14248 (2015).
Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L., & Wold, B., Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature methods 5, 621–628, doi:10.1038/nmeth.1226 (2008).
Neph, S., et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90, doi:10.1038/nature11212 (2012).
Pieper, K., Grimbacher, B., & Eibel, H., B-cell biology and development. J Allergy Clin Immunol 131, 959–971, doi:10.1016/j.jaci.2013.01.046 (2013).
Zhang, N., & Bevan, M. J., CD8(+) T cells: foot soldiers of the immune system. Immunity 35, 161–168, doi:10.1016/j.immuni.2011.07.010 (2011).
Zhu, J., Yamane, H., & Paul, W. E., Differentiation of effector CD4 T cell populations (*). Annu Rev Immunol 28, 445–489, doi:10.1146/annurev-immunol-030409-101212 (2010).
Schork, A. J., et al. All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs. PLoS genetics 9, e1003449, doi:10.1371/journal.pgen.1003449 (2013).
Pickrell, J. K., Joint Analysis of Functional Genomic Data and Genome-wide Association Studies of 18 Human Traits. American journal of human genetics 94, 559–573, doi:10.1016/j.ajhg.2014.03.004 (2014).
Maston, G. A., Evans, S. K., & Green, M. R., Transcriptional regulatory elements in the human genome. Annu Rev Genomics Hum Genet 7, 29–59, doi:10.1146/annurev.genom.7.080505.115623 (2006).
Manel, N., Unutmaz, D., & Littman, D. R., The differentiation of human T(H)-17 cells requires transforming growth factor-beta and induction of the nuclear receptor RORgammat. Nature immunology 9, 641–649, doi:10.1038/ni.1610 (2008).
Ong, C. T., & Corces, V. G., CTCF: an architectural protein bridging genome topology and function. Nature reviews. Genetics 15, 234–246, doi:10.1038/nrg3663 (2014).
Ashburner, M., et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25–29, doi:10.1038/75556 (2000).
Prufer, K., et al. FUNC: a package for detecting significant associations between gene sets and ontological annotations. BMC bioinformatics 8, 41, doi:10.1186/1471-2105-8-41 (2007).
Wu, C., Macleod, I., & Su, A. I., BioGPS and MyGene.info: organizing online, gene-centric information. Nucleic acids research 41, D561–565, doi:10.1093/nar/gks1114 (2013).
Consortium, G. T., Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660, doi:10.1126/science.1262110 (2015).
Gjoneska, E., et al. Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease. Nature 518, 365–369, doi:10.1038/nature14252 (2015).
Yokoyama, J. S., et al. Association Between Genetic Traits for Immune-Mediated Diseases and Alzheimer Disease. JAMA Neurol, doi:10.1001/jamaneurol.2016.0150 (2016).
Muller, N., & Schwarz, M. J., Immune System and Schizophrenia. Curr Immunol Rev 661, 213–220 (2010).
Harrison, P. J., Recent genetic findings in schizophrenia and their therapeutic relevance. J Psychopharmacol 29, 85–96, doi:10.1177/0269881114553647 (2015).
Okada, Y., et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381, doi:10.1038/nature12873 (2014).
Consortium, U. I. G., et al. Genome-wide association study of ulcerative colitis identifies three new susceptibility loci, including the HNF4A region. Nature genetics 41, 1330–1334, doi:10.1038/ng.483 (2009).
Franke, A., et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nature genetics 42, 1118–1125, doi:10.1038/ng.717 (2010).
International Multiple Sclerosis Genetics, C., et al. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 476, 214–219, doi:10.1038/nature10251 (2011).
Bentham, J., et al. Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nature genetics, doi:10.1038/ng.3434 (2015).
Wellcome Trust Case Control C., Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678, doi:10.1038/nature05911 (2007).
Degner, J. F., et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394, doi:10.1038/nature10808 (2012).
Shannon, P., et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research 13, 2498–2504, doi:10.1101/gr.1239303 (2003).
Kanehisa, M., & Goto, S., KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research 28, 27–30 (2000).
Rawlings, J. S., Rosler, K. M., & Harrison, D. A., The JAK/STAT signaling pathway. J Cell Sci 117, 1281–1283, doi:10.1242/jcs.00963 (2004).
Brownlie, R. J., & Zamoyska, R., T cell receptor signalling networks: branched, diversified and bounded. Nature reviews. Immunology 13, 257–269, doi:10.1038/nri3403 (2013).
Villarino, A. V., Kanno, Y., Ferdinand, J. R., & O’Shea, J. J., Mechanisms of Jak/STAT signaling in immunity and disease. J Immunol 194, 21–27, doi:10.4049/jimmunol.1401867 (2015).
Quigley, M., Huang, X., & Yang, Y., STAT1 signaling in CD8 T cells is required for their clonal expansion and memory formation following viral infection in vivo. J Immunol 180, 2158–2164 (2008).
Spalinger, M. R., et al. PTPN2 controls differentiation of CD4(+) T cells and limits intestinal inflammation and intestinal dysbiosis. Mucosal Immunol 8, 918–929, doi:10.1038/mi.2014.122 (2015).
Korn, T., Bettelli, E., Oukka, M., & Kuchroo, V. K., IL-17 and Th17 Cells. Annu Rev Immunol 27, 485–517, doi:10.1146/annurev.immunol.021908.132710 (2009).
Koch, M. A., et al. T-bet(+) Treg cells undergo abortive Th1 cell differentiation due to impaired expression of IL-12 receptor beta2. Immunity 37, 501–510, doi:10.1016/j.immuni.2012.05.031 (2012).
Metcalfe, S. M., LIF in the regulation of T-cell fate and as a potential therapeutic. Genes Immun 12, 157–168, doi:10.1038/gene.2011.9 (2011).
Pallandre, J. R., et al. Role of STAT3 in CD4+CD25+FOXP3+ regulatory lymphocyte generation: implications in graft-versus-host disease and antitumor immunity. J Immunol 179, 7593–7604 (2007).
Wei, L., Laurence, A., Elias, K. M., & O’Shea, J. J., IL-21 is produced by Th17 cells and drives IL-17 production in a STAT3-dependent manner. J Biol Chem 282, 34605–34610, doi:10.1074/jbc.M705100200 (2007).
Parham, C., et al. A receptor for the heterodimeric cytokine IL-23 is composed of IL-12Rbeta1 and a novel cytokine receptor subunit, IL-23R. J Immunol 168, 5699–5708 (2002).
Yamamoto, K., et al. Physical interaction between interleukin-12 receptor beta 2 subunit and Jak2 tyrosine kinase: Jak2 associates with cytoplasmic membrane-proximal region of interleukin-12 receptor beta 2 via amino-terminus. Biochem Biophys Res Commun 257, 400–404, doi:10.1006/bbrc.1999.0479 (1999).
Ali, S., Nouhi, Z., Chughtai, N., & Ali, S., SHP-2 regulates SOCS-1-mediated Janus kinase-2 ubiquitination/degradation downstream of the prolactin receptor. J Biol Chem 278, 52021–52031, doi:10.1074/jbc.M306758200 (2003).
Ma, C. S., Deenick, E. K., Batten, M., & Tangye, S. G., The origins, function, and regulation of T follicular helper cells. J Exp Med 209, 1241–1253, doi:10.1084/jem.20120994 (2012).
Lien, C., et al. Critical role of IRF-5 in regulation of B-cell differentiation. Proceedings of the National Academy of Sciences of the United States of America 107, 4664–4668, doi:10.1073/pnas.0911193107 (2010).
Dymecki, S. M., Niederhuber, J. E., & Desiderio, S. V., Specific expression of a tyrosine kinase gene, blk, in B lymphoid cells. Science 247, 332–336 (1990).
John, L. B., & Ward, A. C., The Ikaros gene family: transcriptional regulators of hematopoiesis and immunity. Mol Immunol 48, 1272–1278, doi:10.1016/j.molimm.2011.03.006 (2011).
Zollner, S., & Pritchard, J. K., Overcoming the winner’s curse: estimating penetrance parameters from case-control data. American journal of human genetics 80, 605–615, doi:10.1086/512821 (2007).
Bush, W. S., & Moore, J. H., Chapter 11: Genome-wide association studies. PLoS computational biology 8, e1002822, doi:10.1371/journal.pcbi.1002822 (2012).
consortium, C., Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature 523, 588–591, doi:10.1038/nature14659 (2015).
Gudmundsson, J., et al. A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer. Nature genetics 44, 1326–1329, doi:10.1038/ng.2437 (2012).
Morrison, A. C., et al. Whole-genome sequence-based analysis of high-density lipoprotein cholesterol. Nature genetics 45, 899–901, doi:10.1038/ng.2671 (2013).
Palles, C., et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nature genetics 45, 136–144, doi:10.1038/ng.2503 (2013).
Sidore, C., et al. Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers. Nature genetics 47, 1272–1281, doi:10.1038/ng.3368 (2015).
Vrieze, S. I., et al. In search of rare variants: preliminary results from whole genome sequencing of 1,325 individuals with psychophysiological endophenotypes. Psychophysiology 51, 1309–1320, doi:10.1111/psyp.12350 (2014).
Zoledziewska, M., et al. Height-reducing variants and selection for short stature in Sardinia. Nature genetics 47, 1352–1356, doi:10.1038/ng.3403 (2015).
Nica, A. C., & Dermitzakis, E. T., Expression quantitative trait loci: present and future. Philosophical transactions of the Royal Society of London. Series B, Biological sciences 368, 20120362, doi:10.1098/rstb.2012.0362 (2013).
Edgar, R., Domrachev, M., & Lash, A. E., Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic acids research 30, 207–210 (2002).
Langmead, B., & Salzberg, S. L., Fast gapped-read alignment with Bowtie 2. Nature methods 9, 357–359, doi:10.1038/nmeth.1923 (2012).
Machanick, P., & Bailey, T. L., MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697, doi:10.1093/bioinformatics/btr189 (2011).
Bailey, T. L., & Elkan, C., Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2, 28–36 (1994).
Bailey, T. L., DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653–1659, doi:10.1093/bioinformatics/btr261 (2011).
Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L., & Noble, W. S., Quantifying similarity between motifs. Genome biology 8, R24, doi:10.1186/gb-2007-8-2-r24 (2007).
Mathelier, A., et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic acids research 42, D142–147, doi:10.1093/nar/gkt997 (2014).
Hume, M. A., Barrera, L. A., Gisselbrecht, S. S., & Bulyk, M. L., UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein-DNA interactions. Nucleic acids research 43, D117–122, doi:10.1093/nar/gku1045 (2015).
Su, A. I., et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proceedings of the National Academy of Sciences of the United States of America 101, 6062–6067, doi:10.1073/pnas.0400782101 (2004).
Anderson, C. A., et al. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nature genetics 43, 246–252, doi:10.1038/ng.764 (2011).
Coronary Artery Disease Genetics, C., A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nature genetics 43, 339–344, doi:10.1038/ng.782 (2011).
Schunkert, H., et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature genetics 43, 333–338, doi:10.1038/ng.784 (2011).
Morris, A. P., et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature genetics 44, 981–990, doi:10.1038/ng.2383 (2012).
Speliotes, E. K., et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nature genetics 42, 937–948, doi:10.1038/ng.686 (2010).
Lango Allen, H., et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838, doi:10.1038/nature09410 (2010).
Schizophrenia Working Group of the Psychiatric Genomics, C., Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427, doi:10.1038/nature13595 (2014).
Major Depressive Disorder Working Group of the Psychiatric, G. C., et al. A mega-analysis of genome-wide association studies for major depressive disorder. Mol Psychiatry 18, 497–511, doi:10.1038/mp.2012.21 (2013).
Purcell, S., et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. American journal of human genetics 81, 559–575, doi:10.1086/519795 (2007).
Patterson, N., Price, A. L., & Reich, D., Population structure and eigenanalysis. PLoS genetics 2, e190, doi:10.1371/journal.pgen.0020190 (2006).
Novembre, J., et al. Genes mirror geography within Europe. Nature 456, 98–101, doi:10.1038/nature07331 (2008).
Aulchenko, Y. S., Ripke, S., Isaacs, A., & van Duijn, C. M., GenABEL: an R library for genome-wide association analysis. Bioinformatics 23, 1294–1296, doi:10.1093/bioinformatics/btm108 (2007).
International HapMap, C., et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861, doi:10.1038/nature06258 (2007).
Wang, J., Duncan, D., Shi, Z., & Zhang, B., WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res 41, W77–83, doi:10.1093/nar/gkt439 (2013).
Khan, A., & Zhang, X., dbSUPER: a database of super-enhancers in mouse and human genome. Nucleic acids research 44, D164–171, doi:10.1093/nar/gkv1002 (2016).
Genomes Project, C., et al. A global reference for human genetic variation. Nature 526, 68–74, doi:10.1038/nature15393 (2015).
Stormo, G. D., DNA binding sites: representation and discovery. Bioinformatics 16, 16–23 (2000).
Grant, C. E., Bailey, T. L., & Noble, W. S., FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018, doi:10.1093/bioinformatics/btr064 (2011).
Gentleman, R. C., et al. Bioconductor: open software development for computational biology and bioinformatics. Genome biology 5, R80, doi:10.1186/gb-2004-5-10-r80 (2004).
Team, R. C., R: A Language and Environment for Statistical Computing. Vienna, Austria (2013).
Genomes Project, C., et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65, doi:10.1038/nature11632 (2012).
Danecek, P., et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158, doi:10.1093/bioinformatics/btr330 (2011).

Posted August 23, 2016.

Subject Area: Genetics

