Contribution of enhancer-driven and master-regulator genes to autoimmune disease revealed using functionally informed SNP-to-gene linking strategies

Kushal K. Dey; Steven Gazal; Bryce van de Geijn; Samuel Sungil Kim; Joseph Nasser; Jesse M. Engreitz; Alkes L. Price

doi:10.1101/2020.09.02.279059

Abstract

Gene regulation is known to play a fundamental role in human disease, but mechanisms of regulation vary greatly across genes. Here, we explore the contributions to disease of two types of genes: genes whose regulation is driven by enhancer regions as opposed to promoter regions (Enhancer-driven) and genes that regulate many other genes in trans (Master-regulator). We link these genes to SNPs using a comprehensive set of SNP-to-gene (S2G) strategies and apply stratified LD score regression to the resulting SNP annotations to draw three main conclusions about 11 autoimmune diseases and blood cell traits (average N_case=13K across 6 autoimmune diseases, average N =443K across 5 blood cell traits). First, several characterizations of Enhancer-driven genes defined in blood using functional genomics data (e.g. ATAC-seq, RNA-seq, PC-HiC) are conditionally informative for autoimmune disease heritability, after conditioning on a broad set of regulatory annotations from the baseline-LD model. Second, Master-regulator genes defined using trans-eQTL in blood are also conditionally informative for autoimmune disease heritability. Third, integrating Enhancer-driven and Master-regulator gene sets with protein-protein interaction (PPI) network information magnified their disease signal. The resulting PPI-enhancer gene score produced >2x stronger conditional signal (maximum standardized SNP annotation effect size (τ*) = 2.0 (s.e. 0.3) vs. 0.91 (s.e. 0.21)), and >2x stronger gene-level enrichment for approved autoimmune disease drug targets (5.3x vs. 2.1x), as compared to the recently proposed Enhancer Domain Score (EDS). In each case, using functionally informed S2G strategies to link genes to SNPs that may regulate them produced much stronger disease signals (4.1x-13x larger τ* values) than conventional window-based S2G strategies. We conclude that our characterizations of Enhancer-driven and Master-regulator genes identify gene sets that are important for autoimmune disease, and that combining those gene sets with functionally informed S2G strategies enables us to identify SNP annotations in which disease heritability is concentrated.

1 Introduction

Disease risk variants associated with complex traits and diseases predominantly lie in non-coding regulatory regions of the genes, motivating the need to assess the relative importance of genes for disease through the lens of gene regulation^1–6. Several recent studies have performed disease-specific gene-level prioritization by integrating GWAS summary statistics data with functional genomics data, including gene expression and gene networks^7–14. Here, we investigate the contribution to autoimmune disease of gene sets reflecting two specific aspects of gene regulation in blood—genes with strong evidence of enhancer-driven regulation (Enhancer-driven) and genes that regulate many other genes (Master-regulator); previous studies suggested that both of these characterizations are important for understanding human disease^{9, 15–24}. For example, several common non-coding variants associated with Hirschsprung disease have been identified in intronic enhancer elements of RET gene and have been shown to synergistically regulate its expression^{25, 26} and NLRC5 acts as a master regulator of MHC class genes in immune response²⁷. Our two main goals are to characterize which types of genes are important for autoimmune disease, and to construct SNP annotations derived from those genes that are conditionally informative for disease heritability, conditional on all other annotations.

A major challenge in gene-level analyses of disease is to link genes to SNPs that may regulate them, a prerequisite to integrative analyses of GWAS summary statistics. Previous studies have often employed window-based strategies such as 100kb^{8, 9, 11}, linking each gene to all SNPs within 100kb; however, this approach lacks specificity. Here, we incorporated functionally informed SNP-to-gene (S2G) linking strategies that capture both distal and proximal components of gene regulation. We evaluated the resulting SNP annotations by applying stratified LD score regression²⁸ (S-LDSC) conditional on a broad set of coding, conserved, regulatory and LD-related annotations from the baseline-LD model^{29, 30}, meta-analyzing the results across 11 autoimmune diseases and blood cell traits; we focused on autoimmune diseases and blood cell traits because the functional data underlying the gene scores and S2G strategies that we analyze is primarily measured in blood. We also assessed gene-level enrichment for disease-related gene sets, including approved drug targets for autoimmune disease¹⁰.

Results

Overview of methods

We define an annotation as an assignment of a numeric value to each SNP with minor allele count ≥5 in a 1000 Genomes Project European reference panel³¹, as in our previous work²⁸; we primarily focus on annotations with values between 0 and 1. We define a gene score as an assignment of a numeric value between 0 and 1 to each gene; gene scores predict the relevance of each gene to disease. We primarily focus on binary gene sets defined by the top 10% of genes; we made this choice to be consistent with ref.⁹, and to ensure that all resulting SNP annotations (gene scores x S2G strategies; see below) were of reasonable size (0.2% of SNPs or larger). We consider 11 gene scores prioritizing enhancer-driven genes, master-regulator genes, and genes with high network connectivity to enhancer-driven or master-regulator genes (Table 1, Supplementary Figure S1); these gene scores were only mildly correlated (average r= 0.08, Supplementary Figure S2). We considered enhancer-driven and master-regulator genes because previous studies suggested that both of these characterizations are important for understanding human disease^{9, 15–24}.

View this table:

Table 1.

List of 11 gene scores: For each gene score, including (A) 7 Enhancer-driven genes scores (red font), (B) 2 Master-regulator gene scores (blue font) and (C) 2 PPI-network informed gene scores (corresponding to Enhancer-driven and Master-regulator gene scores), we provide a brief description and report its size (average gene score across 22,020 genes; equal to % of genes for binary gene scores). Gene scores are listed alphabetically within each category. All gene scores are binary except ATAC-distal and eQTL-CTS, which are probabilistic. Density plots of the distribution of the metric underlying each gene score are provided in Figure S1. Further details are provided in the Methods section.

We define a SNP-to-gene (S2G) linking strategy as an assignment of 0, 1 or more linked genes to each SNP. We consider 10 S2G strategies capturing both distal and proximal gene regulation (see Methods, Figure 1A and Table 2); these S2G strategies aim to link SNPs to genes that they regulate. For each gene score X and S2G strategy Y, we define a corresponding combined annotation X × Y by assigning to each SNP the maximum gene score among genes linked to that SNP (or 0 for SNPs with no linked genes); this generalizes the standard approach of constructing annotations from gene scores using window-based strategies^{8, 9}. For example, EDS-binary × ABC annotates SNPs linked by Activity-By-Contact enhancer-gene links^{32, 33} to any gene from the EDS-binary gene set, whereas EDS-binary × 100kb annotates all SNPs within 100kb of any gene from the EDS-binary gene set. We considered combined annotations based on S2G strategies related to gene regulation because SNPs that regulate functionally important genes may be important for disease. For each S2G strategy, we also define a corresponding binary S2G annotation defined by SNPs linked to the set of all genes. We have publicly released all gene scores, S2G links, and annotations analyzed in this study (see URLs). We have also included annotations for 93 million Haplotype Reference Consortium (HRC) SNPs³⁴ (MAF > 0.1% in imputed UK Biobank data³⁵) and 170 million TOPMed SNPs³⁶ (Freeze 3A).

Figure 1.

Illustration of S2G strategies and gene scores: (A) SNP annotations defined by integration of genes in gene set with proximal (close to gene body) and distal S2G strategies. (B) Examples of approaches used to define Enhancer-driven genes. (C) A Trans-master gene regulates multiple distal genes, via a cis-eQTL that is a trans-eQTL of the distal genes. (D) PPI-Enhancer genes have high connectivity to Enhancer-driven genes in a PPI network.

View this table:

Table 2.

List of 10 S2G strategies: For each S2G strategy, we provide a brief description, indicate whether the S2G strategy prioritizes distal or proximal SNPs relative to the gene, and report its size (% of SNPs linked to genes). S2G strategies are listed in order of increasing size. Further details are provided in the Methods section.

We assessed the informativeness of the resulting annotations for disease heritability by applying stratified LD score regression (S-LDSC)²⁸ to 11 independent blood-related traits (6 autoimmune diseases and 5 blood cell traits; average N_case=13K for autoimmune diseases and N =443K for blood cell traits, Supplementary Table S1) and meta-analyzing S-LDSC results across traits; we also assessed results meta-analyzed across autoimmune diseases or blood cell traits only, as well as results for individual diseases/traits. We conditioned on 86 coding, conserved, regulatory and LD-related annotations from the baseline-LD model (v2.1)^{29, 30} (see URLs). S-LDSC uses two metrics to evaluate informativeness for disease heritability: enrichment and standardized effect size (τ*). Enrichment is defined as the proportion of heritability explained by SNPs in an annotation divided by the proportion of SNPs in the annotation²⁸, and generalizes to annotations with values between 0 and 1³⁷. Standardized effect size (τ*) is defined as the proportionate change in per-SNP heritability associated with a 1 standard deviation increase in the value of the annotation, conditional on other annotations included in the model²⁹; unlike enrichment, τ* quantifies effects that are conditionally informative, i.e. unique to the focal annotation conditional on other annotations included in the model. In our “marginal” analyses, we estimated τ* for each focal annotation conditional on the baseline-LD annotations. In our “joint” analyses, we merged baseline-LD annotations with focal annotations that were marginally significant after Bonferroni correction and performed forward stepwise elimination to iteratively remove focal annotations that had conditionally non-significant τ* values after Bonferroni correction, as in ref.^{11, 29, 37–42}. We did not consider other feature selection methods, as previous research determined that a LASSO-based feature selection method is computationally expensive and did not perform better in predicting off-chromosome χ² association statistics (R. Cui and H. Finucane, personal correspondence). The difference between marginal τ^* and joint τ^* is that marginal τ* assesses informativeness for disease conditional only on baseline-LD model annotations, whereas joint τ^* assesses informativeness for disease conditional on baseline-LD model annotations as well as other annotations in the joint model.

As a preliminary assessment of the potential of the 10 S2G strategies, we considered the 10 S2G annotations defined by SNPs linked to the set of all genes. The S2G annotations were only weakly positively correlated (average r = 0.09; Supplementary Figure S3). We analyzed the 10 S2G annotations via a marginal analysis, running S-LDSC²⁸ conditional on the baseline-LD model and meta-analyzing the results across the 11 blood-related traits. In the marginal analysis, all 10 S2G annotations were significantly enriched for disease heritability, with larger enrichments for smaller annotations (Figure 2A and Supplementary Table S2); values of standardized enrichment (defined as enrichment scaled by the standard deviation of the annotation¹¹) were more similar across annotations (Supplementary Figure S4 and Supplementary Table S3). 7 S2G annotations attained conditionally significant τ^* values after Bonferroni correction (p < 0.05/10) (Figure 2B and Supplementary Table S2). In the joint analysis, 3 of these 7 S2G annotations were jointly significant: TSS (joint τ* = 0.97), Roadmap (joint τ* = 0.84) and Activity-by-Contact (ABC) (joint τ* = 0.44) (Figure 2B and Supplementary Table S4). This suggests that these 3 S2G annotations are highly informative for disease. Subsequent analyses were conditioned on the baseline-LD+ model defined by 86 baseline-LD model annotations plus all S2G annotations (except Coding, TSS and Promoter, which were already part of the baseline-LD model), to ensure that conditionally significant τ^* values for (gene scores x S2G strategies) annotations are specific to the gene scores and cannot be explained by (all genes x S2G strategies) annotations. Accordingly, we confirmed that (random genes x S2G strategies) annotations did not produce conditionally significant τ* values for any S2G strategy (Supplementary Table S5).

Figure 2.

Disease informativeness of S2G annotations: We evaluated 10 S2G annotations, defined from the corresponding S2G strategies by SNPs linked to the set of all genes. (A) Heritabilty enrichment (log scale), conditional on the baseline-LD model. Horizontal line denotes no enrichment. (B) Standardized effect size (τ*), conditional on either the baseline-LD model (marginal analyses: left column, white) or the baseline-LD+ model, which includes all 10 S2G annotations (right column, dark shading). Results are meta-analyzed across 11 blood-related traits. ** denotes P < 0.05/10. Error bars denote 95% confidence intervals. Numerical results are reported in Supplementary Tables S2 and S4.

We validated the gene scores implicated in our study by investigating whether they were enriched in 5 “gold-standard” disease-related gene sets: 195 approved drug target genes for autoimmune diseases^{10, 43}; 550 Mendelian genes related to immune dysregulation⁴⁴, 390 Mendelian genes related to blood disorders⁴⁵, 146 “Bone Marrow/Immune” genes defined by the Developmental Disorders Database/Genotype-Phenotype Database (DDD/G2P)⁴⁶, and 2200 (top 10%) high-pLI genes⁴⁷ (Figure 3C and Supplementary Table S6). (We note that the high-pLI genes should not be viewed as a strict gold standard as not all of these genes are disease-related, but 30% of these genes have an established human disease phenotype⁴⁷.)

Figure 3.

Disease informativeness of Enhancer-driven and PPI-enhancer annotations: We evaluated 80 annotations constructed by combining 7 Enhancer-driven + 1 PPI-enhancer gene scores with 10 S2G strategies. (A) Standardized effect size (τ*), conditional on the baselineLD+ model. (B) Comparison of meta-analyzed standardized effect size (τ*) across 6 autoimmune diseases vs. 5 blood cell traits. (C) Enrichment of Enhancer-driven and PPI-enhancer genes in 5 “gold standard” disease-related gene sets. (D) Standardized effect size (τ*), conditional on the baseline-LD+ model plus 7 jointly significant Enhancer-driven + PPI-enhancer annotations. In panels A and D, results are meta-analyzed across 11 blood-related traits. In panels A and C, ** denotes Bonferroni-significant (P < 0.05/110 in Panel A and P < 0.05/55 in Panel C) and * denotes FDR < 0.05. In panel A, the black box in each row denotes the S2G strategy with highest τ*. In Panel B, circled dots denote annotations with significant (FDR<5%) difference in effect size between the two meta-analyses, the solid line denotes y=x, and the dashed line denotes the regression slope. We report the slope of the regression and the Pearson correlation for enhancer-driven and PPI-enhancer annotations (slope=1.3, r=0.57 for enhancer-driven annotations only). Numerical results are reported in Supplementary Tables S8, S10, S11, S6, S23 and S26.

Subsequent subsections are organized in the following order: description of gene scores for that subsection; marginal analyses using S-LDSC; joint analyses using S-LDSC; and validation of the genes scores implicated in our study using “gold-standard” disease-related gene sets.

Enhancer-driven genes are conditionally informative for autoimmune disease heritability

We assessed the disease informativeness of 7 gene scores prioritizing Enhancer-driven genes in blood. We defined these gene scores based on distal enhancer-gene connections, tissue-specific expression, or tissue-specific eQTL, all of which can characterize enhancer-driven regulation (Figure 1B, Table 1 and Methods). Some of these gene scores were derived from the same functional data that we used to define S2G strategies (e.g. ABC^{32, 33} and ATAC-seq⁴⁸; see URLs). We included two published gene scores, (binarized) blood-specific enhancer domain score (EDS)²⁴ and specifically expressed genes in GTEx whole blood⁹ (SEG-GTEx). We use the term “enhancer-driven” to broadly describe gene scores with high predicted functionality under a diverse set of metrics, notwithstanding the fact that all genes require the activation of enhancers and their promoters. 4 of our enhancer-driven gene scores (ABC-G, ATAC-distal, EDS-binary, PC-HiC) were explicitly defined based on distal enhancer-gene connections. Using the established EDS-binary (derived from the published Enhancer Domain Score (EDS)²⁴) as a point of reference, we determined that the other 3 gene scores had an average excess overlap of 1.7x with the EDS-binary score (P-values per gene score: 2e-08 to 6e-06; Supplementary Table S7), confirming that they prioritize enhancer-driven genes. 3 of our enhancer-driven scores (eQTL-CTS, Expecto-MVP, SEG-GTEx) were not explicitly defined based on distal enhancer-gene connections. We determined that these 3 gene scores had an average excess overlap of 1.5x with the EDS-binary score (P-values per gene score: 4e-07 to 1e-04; Supplementary Table S7)) confirming that they prioritize enhancer-driven genes; notably, the excess overlap of 1.5x was almost as large as the excess of overlap of 1.7x for gene scores defined based on distal enhancer-gene connections.

We combined the 7 Enhancer-driven gene scores with the 10 S2G strategies (Table 2) to define 70 annotations. In our marginal analysis using S-LDSC conditional on the baseline-LD+ model (meta-analyzing S-LDSC results across 11 autoimmune diseases and blood cell traits), all 70 enhancer-driven annotations were significantly enriched for disease heritability, with larger enrichments for smaller annotations (Supplementary Figure S5 and Supplementary Table S8); values of standardized enrichment were more similar across annotations (Supplementary Figure S6 and Supplementary Table S9). 37 of the 70 enhancer-driven annotations attained conditionally significant τ* values after Bonferroni correction (p < 0.05/110) (Figure 3A and Supplementary Table S8). We observed the strongest conditional signal for ATAC-distal ABC (τ* = 1.0 0.2). ATAC-distal is defined by the proportion of mouse gene expression variability across blood cell types that is explained by distal ATAC-seq peaks in mouse⁴⁸; the mouse genes are mapped to orthologous human genes. 4 of the 7 gene scores (ABC-G, ATAC-distal, EDS-binary and SEG-GTEx) produced strong conditional signals across many S2G strategies; however none of them attained Bonferroni-significant τ* for all 10 S2G strategies (Figure 3A). Among the S2G strategies, average conditional signals were strongest for the ABC strategy (average τ* = 0.59) and TSS strategy (average τ* = 0.52), which greatly outperformed the window-based S2G strategies (average τ* = 0.04-0.07), emphasizing the high added value of S2G strategies incorporating functional data (especially the ABC and TSS strategies).

We compared meta-analyses of S-LDSC results across 6 autoimmune diseases vs. 5 blood cell traits(Figure 3B, Supplementary Figure S7, Supplementary Table S1, Supplementary Table S10, and Supplementary Table S11). Results were broadly concordant (r = 0.57 between τ* estimates), with slightly stronger signals for autoimmune diseases (slope=1.3). We also compared meta-analyses of results across 2 granulocyte-related blood cell traits vs. 3 red blood cell or platelet-related blood cell traits (Supplementary Figure S8, Supplementary Table S12, and Supplementary Table S13). Results were broadly concordant (r = 0.65, slope = 1.1). We also examined S-LDSC results for individual disease/traits and applied a test for heterogeneity⁴⁹ (Supplementary Figure S9, Supplementary Figure S10, Supplementary Table S14, Supplementary Table S15). Results were generally underpowered (FDR<5% for 16 of 770 annotation-trait pairs), with limited evidence of heterogeneity across diseases/traits (FDR<5% for 11 of 70 annotations).

We jointly analyzed the 37 enhancer-driven annotations that were Bonferroni-significant in our marginal analysis (Figure 3A and Supplementary Table S8) by performing forward stepwise elimination to iteratively remove annotations that had conditionally non-significant τ^* values after Bonferroni correction. Of these, 6 annotations were jointly significant in the resulting Enhancer-driven joint model (Supplementary Figure S11 and Supplementary Table S16), corresponding to 4 Enhancer-driven gene scores: ABC-G, ATAC-distal, EDS-binary and SEG-GTEx.

We assessed the enrichment of the 7 Enhancer-driven gene scores (Table 1) in 5 “gold-standard” disease-related gene sets (drug target genes^{10, 43}, Mendelian genes (Freund)⁴⁴, Mendelian genes (Vuckovic)⁴⁵, immune genes⁴⁶, and high-pLI genes⁴⁷) (Figure 3C and Supplementary Table S6). 6 of the 7 gene scores were significantly enriched (after Bonferroni correction; p < 0.05/55) in the drug target genes, all 7 were significantly enriched in both Mendelian gene sets, 3 of 7 were significantly enriched in the immune genes, and 5 of 7 were significantly enriched in the high-pLI genes. The largest enrichment was observed for SEG-GTEx genes in the drug target genes (2.4x, s.e. 0.1) and Mendelian genes (Freund) (2.4x, s.e. 0.1). These findings validate the high importance to disease of enhancer-driven genes..

We performed 5 secondary analyses. First, for each of the 6 annotations from the Enhancer-driven joint model (Supplementary Figure S11), we assessed their functional enrichment for fine-mapped SNPs for blood-related traits from two previous studies^{50, 51}. We observed large and significant enrichments for all 6 annotations (Supplementary Table S17), consistent with the S-LDSC results. Second, for each of the 7 Enhancer-driven gene scores, we performed pathway enrichment analyses to assess their enrichment in pathways from the ConsensusPathDB database⁵²; all 7 gene scores were significantly enriched in immune-related and signaling pathways (Supplementary Table S18). Third, we explored other approaches to combining information across genes that are linked to a SNP using S2G strategies, by using either the mean across genes or the sum across genes of the gene scores linked to a SNP, instead of the maximum across genes. We determined that results for either the mean or the sum were very similar to the results for the maximum, with no significant difference in standardized effect sizes of the resulting SNP annotations (Supplementary Table S8, Supplementary Table S19, Supplementary Table S20). Fourth, we repeated our analyses of the 5 enhancer-driven gene scores for which the top 10% (of genes) threshold was applied, using top 5% or top 20% thresholds instead (Supplementary Table S21 and Supplementary Table S22). We observed very similar results, with largely non-significant differences in standardized effect sizes. Fifth, we confirmed that our forward stepwise elimination procedure produced identical results when applied to all 70 enhancer-driven annotations, instead of just the 37 enhancer-driven annotations that were Bonferroni-significant in our marginal analysis.

We conclude that 4 of the 7 characterizations of Enhancer-driven genes are conditionally informative for autoimmune diseases and blood-related traits when using functionally informed S2G strategies.

Genes with high network connectivity to Enhancer-driven genes are even more informative

We assessed the disease informativeness of a gene score prioritizing genes with high connectivity to Enhancer-driven genes in a protein-protein interaction (PPI) network (PPI-enhancer).We hypothesized that (i) genes that are connected to Enhancer-driven genes in biological networks are likely to be important, and that (ii) combining potentially noisy metrics defining enhancer-driven genes would increase statistical signal. We used the STRING PPI network⁵³ to quantify the network connectivity of each gene with respect to each of the 4 jointly informative Enhancer-driven gene scores from Supplementary Figure S11 (ABC-G, ATAC-distal, EDS-binary and SEG-GTEx) (Figure 1D). Network connectivity scores were computed using a random walk with restart algorithm^{10, 54} (see Methods). We defined the PPI-enhancer gene score based on genes in the top 10% of average network connectivity across the 4 Enhancer-driven gene scores (Table 1). The PPI-enhancer gene score was only moderately positively correlated with the 4 underlying Enhancer-driven gene scores (average r=0.28; Supplementary Figure S2).

We combined the PPI-enhancer gene score with the 10 S2G strategies (Table 2) to define 10 annotations. In our marginal analysis using S-LDSC (meta-analyzing S-LDSC results across 11 autoimmune diseases and blood cell traits), all 10 PPI-enhancer annotations were significantly enriched for disease heritability, with larger enrichments for smaller annotations (Supplementary Figure S5 and Supplementary Table S23); values of standardized enrichment were more similar across annotations (Supplementary Figure S6 and Supplementary Table S24). All 10 PPI-enhancer annotations attained conditionally significant τ^* values after Bonferroni correction (p < 0.05/110) (Figure 3A and Supplementary Table S23). Notably, the maximum τ^* (2.0 (s.e. 0.3) for PPI-enhancer x ABC) was >2x larger than the maximum τ* for the recently proposed Enhancer Domain Score²⁴ (EDS) (0.91 (s.e. 0.21) for EDS-binary x ABC). All 10 PPI-enhancer annotations remained significant when conditioned on the Enhancer-driven joint model from Supplementary Figure S11 (Supplementary Table S25). In a comparison of meta-analyses of S-LDSC results across 5 blood cell traits vs. 6 autoimmune diseases, results were broadly concordant (r = 0.93 between τ^* estimates), but with much stronger signals for autoimmune diseases (slope=2.2) (Figure 3B, Supplementary Figure S7, Supplementary Table S10, and Supplementary Table S11). In a comparison of meta-analyses across 2 granulocyte-related blood cell traits vs. 3 red blood cell or platelet-related blood cell traits, results were broadly concordant (r = 0.83), but with much stronger signals for granulocyte-related blood cell traits (slope = 2.1) (Supplementary Figure S8 and Supplementary Tables S12, S13). In analyses of individual traits, 62 of 110 PPI-enhancer annotation-trait pairs were significant (FDR<5%) (Supplementary Figure S9, Supplementary Figure S10, Supplementary Table S14), 8 of with evidence of heterogeneity across diseases/traits (FDR<5% for 8 of 10 PPI-enhancer annotations) (Supplementary Table S15).

We jointly analyzed the 6 Enhancer-driven annotations from the Enhancer-driven joint model (Supplementary Figure S11) and the 10 marginally significant PPI-enhancer annotations conditional on the enhancer-driven joint model in Supplementary Table S25. Of these, 3 Enhancer-driven and 4 PPI-enhancer annotations were jointly significant in the resulting PPI-enhancer-driven joint model (Figure 3D and Supplementary Table S26). The joint signal was strongest for PPI-enhancer × ABC (τ* = 1.2±0.21), highlighting the informativeness of the ABC S2G strategy. 3 of the 7 annotations attained τ* > 0.5; annotations with τ* > 0.5 are unusual, and considered to be important³⁹.

We assessed the enrichment of the PPI-enhancer gene score in the 5 “gold-standard” disease-related gene sets (drug target genes^{10, 43}, Mendelian genes (Freund)⁴⁴, Mendelian genes (Vuckovic)⁴⁵, immune genes⁴⁶, and high-pLI genes⁴⁷) (Figure 3C and Supplementary Table S6). The PPI-enhancer gene score showed significant enrichment in all 5 gene sets, with higher magnitude of enrichment compared to any of the 7 Enhancer-driven gene scores. In particular, the PPI-enhancer gene score was 5.3x (s.e. 0.1) enriched in drug target genes and 4.6x (s.e. 0.1) enriched in Mendelian genes (Freund), a ≥2x stronger enrichment in each case than the EDS-binary gene score²⁴ (2.1x (s.e. 0.1) and 2.3x (s.e. 0.1)). These findings validate the high importance to disease of the PPI-enhancer gene score..

We sought to assess whether the PPI-enhancer disease signal derives from (i) the information in the PPI network or (ii) the improved signal-to-noise of combining different Enhancer-driven gene scores (see above). To assess this, we constructed an optimally weighted linear combination of the 4 enhancer-driven scores from Figure 3D, without using PPI network information (Weighted-enhancer; see Methods). We repeated the above analyses using Weighted-enhancer instead of PPI-enhancer. We determined that marginal τ^* values were considerably lower for Weighted-enhancer vs. PPI-enhancer annotations (0.65x, Bootstrap p: 3.4e-09 for 1,000 re-samples per annotation, Supplementary Table S27 vs. Supplementary Table S23). (In addition, none of the Weighted-enhancer annotations were significant conditional on the PPI-enhancer-driven joint model from Figure 3D; see Supplementary Table S28). This confirms that the additional PPI-enhancer signal derives from the information in the PPI network. To verify that the PPI-enhancer disease signal is driven not just by the PPI network but also by the input gene scores, we defined a new gene score analogous to PPI-enhancer but using 4 randomly generated binary gene sets of size 10% as input (PPI-control). We determined that marginal τ^* values were much lower for PPI-control vs. PPI-enhancer annotations (0.52x, Bootstrap p-value: 4.4e-16 over 1,000 re-samples per annotation, Supplementary Table S29 vs. Supplementary Table S23). (In addition, none of the PPI-control annotations were significant conditional on the PPI-enhancer-driven joint model from Figure 3D; see Supplementary Table S30). This confirms that the PPI-enhancer disease signal is driven not just by the PPI network but also by the input gene scores.

We performed 4 secondary analyses. First, we defined a new gene score (RegNet-Enhancer) using the regulatory network from ref.⁵⁵ instead of the STRING PPI network, and repeated the above analyses. We determined that the STRING PPI network and the RegNet regulatory network are similarly informative (Supplementary Table S31 and Supplementary Table S32); we elected to use the STRING PPI network in our main analyses because the RegNet regulatory network uses GTEx expression data, which is also used by the SEG-GTEx gene score, complicating interpretation of the results. Second, for each of the 4 jointly significant PPI-enhancer annotations from Figure 3D, we assessed their functional enrichment for fine-mapped SNPs for blood-related traits from two previous studies^{50, 51}. We observed large and significant enrichments for all 4 annotations (Supplementary Table S17), consistent with the S-LDSC results (and with the similar analysis of Enhancer-driven annotations described above). Third, we performed a pathway enrichment analysis to assess the enrichment of the PPI-enhancer gene score in pathways from the ConsensusPathDB database⁵²; this gene score was enriched in immune-related pathways (Supplementary Table S18). Fourth, we confirmed that our forward stepwise elimination procedure produced identical results when applied to all 80 enhancer-driven and PPI-enhancer annotations, instead of just the 6 Enhancer-driven annotations from the Enhancer-driven joint model (Supplementary Figure S11) and the 10 PPI-enhancer annotations.

We conclude that genes with high network connectivity to Enhancer-driven genes are conditionally informative for autoimmune diseases and blood-related traits when using functionally informed S2G strategies.

Master-regulator genes are conditionally informative for autoimmune disease heritability

We assessed the disease informativeness of two gene scores prioritizing Master-regulator genes in blood. We defined these gene scores using whole blood eQTL data from the eQTLGen consortium⁵⁶ (Trans-master) and a published list of known transcription factors in humans⁵⁷ (TF) (Figure 1C, Table 1 and Methods). We note that TF genes do not necessarily act as master regulators, but can be viewed as candidate master regulators. Using 97 known master-regulator genes from 18 master-regulator families^58–62 as a point of reference, we determined that Trans-master and TF genes had 3.5x and 5.4x excess overlaps with the 97 master-regulator genes (P-values: 5.6e-72 and 2.2e-160; Supplementary Table S33 and Supplementary Table S34), confirming that they prioritize master-regulator genes.

In detail, Trans-master is a binary gene score defined by genes that significantly regulate 3 or more other genes in trans via SNPs that are significant cis-eQTLs of the focal gene (10% of genes); the median value of the number of genes trans-regulated by a Trans-master gene is 14. Notably, trans-eQTL data from the eQTLGen consortium⁵⁶ was only available for 10,317 previously disease-associated SNPs. It is possible that genes with significant cis-eQTL that are disease-associated SNPs may be enriched for disease heritability irrespective of trans signals. To account for this gene-level bias, we conditioned all analyses of Trans-master annotations on both (i) 10 annotations based on a gene score defined by genes with at least 1 disease-associated cis-eQTL, combined with each of the 10 S2G strategies, and (ii) 10 annotations based on a gene score defined by genes with at least 3 unlinked disease-associated cis-eQTL, combined with each of the 10 S2G strategies; we chose the number 3 to maximize the correlation between this gene score and the Trans-master gene score (r = 0.32). Thus, our primary analyses were conditioned on 93 baseline-LD+ and 20 additional annotations (113 baseline-LD+cis model annotations); additional secondary analyses are described below. We did not consider a SNP annotation defined by trans-eQTLs, because the trans-eQTLs in eQTLGen data were restricted to disease-associated SNPs, which would bias our results.

We combined the Trans-master gene score with the 10 S2G strategies (Table 2) to define 10 annotations. In our marginal analysis using S-LDSC conditional on the baseline-LD+cis model, all 10 Trans-master annotations were strongly and significantly enriched for disease heritability, with larger enrichments for smaller annotations (Supplementary Figure S5 and Supplementary Table S35); values of standardized enrichment were more similar across annotations (Supplementary Figure S6 and Supplementary Table S36). All 10 Trans-master annotations attained conditionally significant τ^* values after Bonferroni correction (p < 0.05/110) (Figure 4A and Supplementary Table S35). We observed the strongest conditional signals for Trans-master × TSS (τ* = 1.6, vs. τ* = 0.37-0.39 for Master-regulator × window-based S2G strategies). We observed similar (slightly more significant) results when conditioning on baseline-LD+ annotations only (Supplementary Table S37).

Figure 4.

Disease informativeness of Master-regulator and PPI-master annotations: We evaluated 30 annotations constructed by combining 2 Master-regulator + 1 PPI-master gene scores with 10 S2G strategies. (A) Standardized effect size (τ*), conditional on the 113 baselineLD+cis model annotations. (B) Comparison of meta-analyzed standardized effect size (τ*) across 6 autoimmune diseases vs. 5 blood cell traits. (C) Enrichment of Master-regulator and PPI-master genes in 5 “gold standard” disease-related gene sets.(D) Standardized effect size (τ*), conditional on the baseline-LD+cis model plus 5 jointly significant Master-regulator + PPI-master annotations. In panels A and D, results are meta-analyzed across 11 blood-related traits. In panels A and C, ** denotes Bonferroni-significant (P < 0.05/110 in Panel A and P < 0.05/55 in Panel C) and * denotes FDR < 0.05. In panel A, the black box in each row denotes the S2G strategy with highest τ*. In Panel B, circled dots denote annotations with significant (FDR<5%) difference in effect size between the two meta-analyses, the solid line denotes y=x, and the dashed line denotes the regression slope. We report the slope of the regression and the Pearson correlation for master-regulator and PPI-master annotations (slope=0.57, r=0.56 for master-regulator annotations only). Numerical results are reported in Supplementary Tables S35, Table S44, S45, S6, S53 and S57.

As noted above, trans-eQTL data from the eQTLGen consortium⁵⁶ was only available for 10,317 previously disease-associated SNPs, and we thus defined and conditioned on baseline-LD+cis model annotations to account for gene-level bias. We verified that conditioning on annotations derived from gene scores defined by other minimum numbers of cis-eQTL and/or unlinked cis-eQTL produced similar results (Supplementary Table S38, Supplementary Table S39, Supplementary Table S40, Supplementary Table S41, Supplementary Table S42). To verify that our results were not impacted by SNP-level bias, we adjusted each of the 10 Trans-master annotations by removing all disease-associated trans-eQTL SNPs in the eQTLGen data from the annotation, as well as any linked SNPs (Methods). We verified that these adjusted annotations produced similar results (Supplementary Table S43).

TF is a binary gene score defined by a published list of 1,639 known transcription factors in humans⁵⁷. We combined TF with the 10 S2G strategies (Table 2) to define 10 annotations. In our marginal analysis conditional on the baseline-LD+cis model, all 10 TF annotations were significantly enriched for heritability, but with smaller enrichments than the Trans-master annotations (Supplementary Table S35); see Supplementary Table S36 for standardized enrichments. 9 TF annotations attained significant τ^* values after Bonferroni correction (Figure 4A and Supplementary Table S35) (the same 9 annotations were also significant conditional on the baseline-LD+ model; Supplementary Table S37). Across all S2G strategies, τ^* values of Trans-master annotations were larger than those of TF annotations (Supplementary Table S35).

We compared meta-analyses of S-LDSC results across 6 autoimmune diseases vs. 5 blood cell traits(Figure 4B, Supplementary Figure S7, Supplementary Table S1, Supplementary Table S44, Supplementary Table S45). Results were broadly concordant (r = 0.56 between τ* estimates), with slightly stronger signals for blood cell traits (slope=0.57). We also compared meta-analyses of results across 2 granulocyte-related blood cell traits vs. 3 red blood cell or platelet-related blood cell traits (Supplementary Figure S8, Supplementary Table S46, and Supplementary Table S47). Results were broadly concordant (r = 0.94, slope = 1.12). We also examined S-LDSC results for individual disease/traits and applied a test for heterogeneity⁴⁹ (Supplementary Figure S12, Supplementary Figure S13, Supplementary Table S14, Supplementary Table S15). We observed several annotation-trait pairs with disease signal (FDR<5% for 96 of 220 annotation-trait pairs), with evidence of heterogeneity across diseases/traits (FDR<5% for 10 of 20 annotations).

We jointly analyzed the 10 Trans-master and 9 TF annotations that were Bonferroni-significant in our marginal analysis (Figure 4A and Supplementary Table S35) by performing forward stepwise elimination to iteratively remove annotations that had conditionally non-significant τ* values after Bonferroni correction. Of these, 3 Trans-master annotations and 2 TF annotations were jointly significant in the resulting Master-regulator joint model (Supplementary Figure S14 and Supplementary Table S48). The joint signal was strongest for Trans-master Roadmap (τ* = 0.81, s.e. = 0.13), emphasizing the high added value of the Roadmap S2G strategy.

We assessed the enrichment of the Trans-master and TF gene scores in the 5 “gold standard” disease-related gene sets (drug target genes^{10, 43}, Mendelian genes (Freund)⁴⁴, Mendelian genes (Vuckovic)⁴⁵, immune genes⁴⁶, and high-pLI genes⁴⁷) (Figure 4C and Supplementary Table S6). The Trans-master gene score showed higher enrichment in all 5 gene sets compared to the TF gene score. The enrichments for master-regulator genes were lower (1.4x, s.e. 0.07) for drug target genes in comparison to some Enhancer-driven genes and the PPI-enhancer gene score (Figure 3C); this can be attributed to the fact that master-regulator genes may tend to disrupt genes across several pathways, rendering them unsuitable as drug targets.

We performed 7 secondary analyses. First, for comparison purposes, we defined a binary gene score (Trans-regulated) based on genes with at least one significant trans-eQTL. We combined Trans-regulated genes with the 10 S2G strategies to define 10 annotations. In our marginal analysis using S-LDSC conditional on the baseline-LD+cis model, none of the Trans-regulated annotations attained conditionally significant τ^* values after Bonferroni correction (p < 0.05/110) (Supplementary Table S49). (In contrast, 3 of the annotations were significant when conditioning only on the baseline-LD+ model (Supplementary Table S50).) Second, a potential complexity is that trans-eQTL in whole blood may be inherently enriched for blood cell trait-associated SNPs (since SNPs that regulate the abundance of a specific blood cell type would result in trans-eQTL effects on genes that are specifically expressed in that cell type⁵⁶), potentially limiting the generalizability of our results to non-blood cell traits. To ensure that our results were robust to this complexity, we verified that analyses restricted to the 5 autoimmune diseases (Supplementary Table S1) produced similar results (Supplementary Table S51). Third, for each of the 5 annotations from the Master-regulator joint model (Supplementary Figure S14), we assessed their functional enrichment for fine-mapped SNPs for blood-related traits from two previous studies^{50, 51}. We observed large and significant enrichments for all 5 annotations (Supplementary Table S17), consistent with the S-LDSC results (and with similar analyses described above). Fourth, we performed pathway enrichment analyses to assess the enrichment of the Trans-master and TF gene scores in pathways from the ConsensusPathDB database⁵². The Trans-master gene score was significantly enriched in immune-related pathways (Supplementary Table S18). Fifth, we explored other approaches to combining information across genes that are linked to a SNP using S2G strategies, by using either the mean across genes or the sum across genes of the gene scores linked to a SNP, instead of the maximum across genes. We determined that results for either the mean or the sum were very similar to the results for the maximum, with no significant difference in standardized effect sizes of the resulting SNP annotations (Supplementary Table S35, Supplementary Table S19 and Supplementary Table S20). Sixth, we repeated our analyses of the Trans-master gene score, defined in our primary analyses based on 2,215 genes that trans-regulate ≥ 3 genes, using either 3,717 genes that trans-regulate ≥ 1 gene (most of which trans-regulate multiple genes) or 1,170 genes that trans-regulate ≥ 10 genes (Supplementary Table S52). We observed very similar results, with largely non-significant differences in standardized effect sizes. Seventh, we confirmed that our forward stepwise elimination procedure produced identical results when applied to all 20 master-regulator annotations, instead of just the 19 master-regulator annotations that were Bonferroni-significant in our marginal analysis.

We conclude that master-regulator genes are conditionally informative for autoimmune diseases and blood-related traits when using functionally informed S2G strategies.

Genes with high network connectivity to Master-regulator genes are even more informative

We assessed the disease informativeness of a gene score prioritizing genes with high connectivity to Master-regulator genes in the STRING PPI network⁵³ (PPI-master, analogous to PPI-enhancer; see Methods and Table 1). The PPI-master gene score was positively correlated with the 2 underlying master-regulator gene scores (average r = 0.43) and modestly correlated with PPI-enhancer (r=0.22) (Supplementary Figure S2). In addition, it had an excess overlap of 7.2x with the 97 known master-regulator genes^58–62 (P = 2e-214; Supplementary Table S33 and Supplementary Table S34).

We combined the PPI-master gene score with the 10 S2G strategies (Table 2) to define 10 annotations. In our marginal analysis using S-LDSC conditional on the baseline-LD+cis model, all 10 PPI-master annotations were significantly enriched for disease heritability, with larger enrichments for smaller annotations (Figure 4A and Supplementary Table S53); values of standardized enrichment were more similar across annotations (Supplementary Figure S6 and Supplementary Table S54). All 10 PPI-master annotations attained conditionally significant τ* values after Bonferroni correction (p < 0.05/110) (Figure 4B and Supplementary Table S53) (as expected, results were similar when conditioning only on the baseline-LD+ model; Supplementary Table S55). We observed the strongest conditional signals for PPI-master combined with TSS (τ*=1.7, s.e. 0.16), Coding (τ*=1.7, s.e. 0.14) and ABC (τ*=1.6, s.e. 0.17) S2G strategies, again emphasizing the high added value of S2G strategies incorporating functional data (Supplementary Table S53). 9 of the 10 PPI-master annotations remained significant when conditioning on the Master-regulator joint model from Supplementary Figure S14 (Supplementary Table S56). In a comparison of meta-analyses of S-LDSC results across 5 blood cell traits vs. 6 autoimmune diseases, results were broadly concordant (r = 0.81 between τ^* estimates, slope = 0.93) (Figure 4B, Supplementary Figure S7, Supplementary Table S44, and Supplementary Table S45). In a comparison of meta-analyses across 2 granulocyte-related blood cell traits vs. 3 red blood cell or platelet-related blood cell traits, results were broadly concordant, but with slightly signals for granulocyte-related traits (r = 0.92, slope = 1.3) (Supplementary Figure S8 and Supplementary Tables S46, S47). In the analyses of individual traits, 101 of 110 PPI-enhancer annotation-trait pairs weresignificant (FDR<5%) (Supplementary Figure S12, Supplementary Figure S13, Supplementary Table S14), with evidence of heterogeneity across diseases/traits (FDR<5% for 6 of 10 PPI-master annotations)(Supplementary Table S15).

We jointly analyzed the 5 Master-regulator annotations from the master-regulator joint model (Supplementary Figure S14 and Supplementary Table S48) and the 9 PPI-master annotations significant conditional on the master-regulator joint model in Supplementary Table S56. Of these, 2 Trans-master-regulator and 3 PPI-master annotations were jointly significant in the resulting PPI-master-regulator joint model (Figure 4D and Supplementary Table S57). The joint signal was strongest for PPI-master × Roadmap (τ* = 0.94±0.14),and 4 of the 5 annotations attained τ* > 0.5.

We assessed the enrichment of the PPI-master gene score in the 5 “gold standard” disease-related gene sets (drug target genes^{10, 43}, Mendelian genes (Freund)⁴⁴, Mendelian genes (Vuckovic)⁴⁵, immune genes⁴⁶, and high-pLI genes⁴⁷) (Figure 4C and Supplementary Table S6). The PPI-master gene score showed significant enrichment in all 5 gene sets, with higher magnitude of enrichment compared to either of the master-regulator gene scores. In particular, the PPI-master gene score was 2.7x (s.e. 0.1) enriched in drug target genes and 3.4x (s.e. 0.1) enriched in Mendelian genes (Freund)..

We performed 3 secondary analyses. First, for each of the 3 jointly significant PPI-master annotations from Figure 4D, we assessed their functional enrichment for fine-mapped SNPs for blood-related traits from two previous studies^{50, 51}. We observed large and significant enrichments for all 3 annotations (Supplementary Table S17), consistent with the S-LDSC results (and with similar analyses described above). Second, we performed a pathway enrichment analysis to assess the enrichment of the PPI-master gene score in pathways from the ConsensusPathDB database⁵² and report the top enriched pathways (Supplementary Table S18). Third, we confirmed that our forward stepwise elimination procedure produced identical results when applied to all 30 master-regulatorand PPI-master annotations, instead of just the 5 master-regulator annotations from the Master-regulator joint model (Supplementary Figure S14) and the 9 PPI-master annotations that were Bonferroni-significant in our marginal analysis.

We conclude that genes with high network connectivity to Master-regulator genes are conditionally informative for autoimmune diseases and blood-related traits when using functionally informed S2G strategies.

Combined joint model

We constructed a combined joint model containing annotations from the above analyses that were jointly significant, contributing information conditional on all other annotations. We merged the baseline-LD+cis model with annotations from the PPI-enhancer (Figure 3D) and PPI-master (Figure 4D) joint models, and performed forward stepwise elimination to iteratively remove annotations that had conditionally non-significant τ* values after Bonferroni correction (p < 0.05/110). The combined joint model contained 8 new annotations, including 2 Enhancer-driven, 2 PPI-enhancer, 2 Trans-master and 2 PPI-master annotations (Figure 5 and Supplementary Table S58). The joint signals were strongest for PPI-enhancer ABC (τ* = 0.99, s.e. 0.23) and PPI-master Roadmap (τ* = 0.91, s.e. 0.12) highlighting the importance of two distal S2G strategies, ABC and Roadmap; 5 of the 8 new annotations attained τ* > 0.5. We defined a new metric quantifying the conditional informativeness of a heritability model (combined τ*, generalizing the combined τ* metric of ref.⁶³ to more than two annotations; see Methods). As expected, the combined joint model attained a larger combined τ^* (2.5, s.e. 0.24) than the PPI-enhancer (1.5, s.e. 0.15) or PPI-master (1.9, s.e. 0.14) joint models (Supplementary Figure S15, Supplementary Table S59, Supplementary Table S60, Supplementary Table S61).

Figure 5.

Combined joint model: (A) Heritability enrichment (log scale) of the 8 jointly significant Enhancer-driven, Master-regulator, PPI-enhancer-driven and PPI-master-regulator annotations, conditional on the baseline-LD+cis model. Horizontal line denotes no enrichment. Standardized effect size (τ*) conditional on the baseline-LD+cis model plus the 8 jointly significant annotations. Significance is corrected for multiple testing by Bonferroni correction (P < 0.05/110). Errors bars denote 95% confidence intervals. Numerical results are reported in Supplementary Table S58.

We evaluated the combined joint model of Figure 5 (and other models) by computing logl_SS⁶⁴ (an approximate likelihood metric) relative to a model with no functional annotations (Δlogl_SS), averaged across a subset of 6 blood-related traits (1 autoimmune disease and 5 blood cell traits) from the UK Biobank (Supplementary Table S1). The combined joint model attained a +12.3% larger Δlogl_SS than the baseline-LD model (Supplementary Table S62); most of the improvement derived from the 7 S2G annotations (Figure 2) and the 8 Enhancer-driven and Master-regulator annotations (Figure 5). The combined joint model also attained a 27.2% larger Δlogl_SS than the baseline-LD model in a separate analysis of 24 non-blood-related traits from the UK Biobank (Supplementary Table S62; traits listed in Supplementary Table S63), implying that the value of the annotations introduced in this paper is not restricted to autoimmune diseases and blood-related traits. However, the non-blood-related traits had considerably lower absolute Δlogl_SS compared to the blood-related traits. Accordingly, in a broader analysis of 36 non-blood-related traits from UK Biobank and non-UK Biobank sources (Supplementary Table S64), meta-analyzed τ* values were considerably lower for non-blood-related traits than for blood-related traits (Supplementary Figure S16, Supplementary Table S65 and Supplementary Table S66).

We investigated the biology of individual loci by examining 1,198 SNPs that were previously confidently fine-mapped (posterior inclusion probability (PIP) > 0.90) for 1 autoimmune disease and 5 blood cell traits from the UK Biobank. As noted above, fine-mapped SNPs from ref.⁵¹ were highly enriched for all 8 annotations from the combined joint model (Supplementary Table S17); accordingly, focusing on the 4 highly enriched regulatory annotations from Figure 5 (enrichment 18; 1.5% of SNPs in total), 194 of the 1,198 SNPs belonged to one or more of these 4 annotations. A list of these 194 SNPs is provided in Supplementary Table S67. We highlight 3 notable examples. First, rs231779, a fine-mapped SNP (PIP=0.91) for “All Auto Immune Traits” (Supplementary Table S1), was linked by the ABC S2G strategy to CTLA4, a high-scoring gene for the PPI-enhancer gene score (ranked 109) (Supplementary Figure S17A). CTLA4 acts as an immune checkpoint for activation of T cells and is a key target gene for cancer immunotherapy^65–67. Second, rs6908626, a fine-mapped SNP (PIP=0.99) for ”All Auto Immune Traits” (Supplementary Table S1), was linked by the Roadmap S2G strategy to BACH2, a high-scoring gene for the Trans-master gene score (ranked 311) (Supplementary Figure S17B). BACH2 is a known master-regulator TF that functions in innate and adaptive lineages to control immune responses^{68, 69}, has been shown to control autoimmunity in mice knockout studies⁷⁰, and has been implicated in several autoimmune and allergic diseases including lupus, type 1 diabetes and asthma^71–73. Third, rs113473633, a fine-mapped SNP (PIP = 0.99 and PIP = 0.99) for white blood cell (WBC) count and eosinophil count (Supplementary Table S1), was linked by the Roadmap S2G strategy to NFKB1, a high-scoring gene for the Trans-master and PPI-master gene scores (ranked 409 and 111) (Supplementary Figure S17C for WBC count, Supplementary Figure S17D for eosinophil count). NFKB1 is a major transcription factor involved in immune response⁷⁴ and is critical for development and proliferation of lymphocytes^{75, 76}, and has previously been implicated in blood cell traits⁴⁵. In each of these examples, we we nominate both the causal gene and the SNP-gene link.

We performed 4 secondary analyses. First, we investigated whether the 8 annotations of the combined joint model still contributed unique information after including the pLI gene score⁴⁷, which has previously been shown to be conditionally informative for disease heritability^{37, 41, 77}. We confirmed that all 8 annotations from Figure 5 remained jointly significant (Supplementary Figure S18 and Supplementary Table S68). Second, we considered integrating PPI network information via a single gene score (PPI-all) instead of two separate gene scores (PPI-enhancer and PPI-master). We determined that the combined joint model derived from PPI-all attained a similar combined τ^* (2.5, s.e. 0.22; Supplementary Table S59; see Supplementary Table S69 for individual τ* values) as the combined joint model derived from PPI-enhancer and PPI-master (2.5, 0.24; Supplementary Table S59), and we believe it is less interpretable. Third, we constructed a less restrictive combined joint model by conditioning on the baseline-LD+ model instead of the baseline-LD+cis model. The less restrictive combined joint model included 1 additional annotation, SEG-GTEx Coding (Supplementary Table S70). This implies that the combined joint model is largely invariant to conditioning on the baseline-LD+ or baseline-LD+cis model. Fourth, we analyzed binarized versions of all 11 gene scores (Table 1) using MAGMA⁷⁸, an alternative gene set analysis method. 9 of the 11 gene scores produced significant signals (Supplementary Table S71), 11 marginally significant gene scores (Figure 3 and Figure 4) and 5 gene scores included in the combined joint model of Figure 5 in the S-LDSC analysis. However, MAGMA does not allow for conditioning on the baseline-LD model, does not allow for joint analysis of multiple gene scores to assess joint significance, and does not allow for incorporation of S2G strategies. Fifth, we confirmed that our forward stepwise elimination procedure produced identical results when applied to all 110 enhancer-driven, master-regulator, PPI-enhancer and PPI-master annotations, instead of just the 12 annotations from the PPI-enhancer (Figure 3D) and PPI-master (Figure 4D) joint models. Sixth, we assessed the model fit of the final joint model by correlating the residuals from stratified LD score regression with the independent variables in the regression (annotation-specific LD scores) for each of the 11 blood-related traits (Supplementary Figure S19). We observed an average squared correlation of 0.02 across annotation-specific LD scores and traits, suggesting good model fit.

We conclude that both Enhancer-driven genes and Master-regulator genes, as well as genes with high network connectivity to those genes, are jointly informative for autoimmune diseases and blood-related traits when using functionally informed S2G strategies.

Discussion

We have assessed the contribution to autoimmune disease of Enhancer-driven genes and Master-regulator genes, incorporating PPI network information and 10 functionally informed S2G strategies. We determined that our characterizations of Enhancer-driven and Master-regulator genes, informed by PPI networks, identify gene sets that are important for autoimmune disease, and that combining those gene sets with functionally informed S2G strategies enables us to identify SNP annotations in which disease heritability is concentrated. The conditional informativeness of SNP annotations derived from these characterizations implies that these gene sets and SNP annotations provide information about disease that is different from other available sources of information. The gene scores analyzed in this study were enriched for 5 gold-standard disease-related gene sets, including autoimmune disease drug targets and Mendelian genes related to immune dysregulation, providing further evidence that they expand and enhance our understanding of which genes impact autoimmune disease. In particular, our PPI-enhancer gene score produced stronger signals than the recently proposed Enhancer Domain Score²⁴ (EDS). Our primary results were meta-analyzed across 11 autoimmune diseases and blood-related traits; we determined that results of meta-analyses across 6 autoimmune diseases and meta-analyses across 5 blood cell traits were quite similar, and that analyses of individual diseases/traits were generally underpowered.

Our work has several downstream implications. First, the PPI-enhancer gene score, which attained a particularly strong enrichment for approved autoimmune disease drug targets, will aid prioritization of drug targets that share similar characteristics as previously discovered drugs, analogous to pLI⁴⁷ and LOEUF^{79, 80}. Second, our gene scores and S2G strategies help nominate specific gene-linked regions for functional validation using CRISPRi experiments. In particular, our results implicate ABC and Roadmap strategies as highly informative distal S2G strategies and TSS as a highly informative proximal S2G strategy, motivating the use of these specific S2G strategies in functional validation experiments. Third, our framework for disease heritability analysis incorporating S2G strategies (instead of conventional window-based approaches) will be broadly applicable to gene sets derived from other types of data, as in our more recent work⁸¹. Fourth, at the level of genes, our findings have immediate potential for improving gene-level probabilistic fine-mapping of transcriptome-wide association studies⁸² and gene-based association statistics⁸³. Fifth, at the level of SNPs, SNPs in regions linked to genes representing disease informative gene scores have immediate potential for improving functionally informed fine-mapping^{51, 84–86} (including experimental followup⁸⁷), polygenic localization⁵¹, and polygenic risk prediction^{88, 89}.

Our work has several limitations, representing important directions for future research. First, we restricted our analyses to Enhancer-driven and Master-regulator genes in blood, focusing on autoimmune diseases and blood-related traits; this choice was primarily motivated by the better representation of blood cell types in functional genomics assays and trans-eQTL studies. However, it will be valuable to extend our analyses to other tissues and traits as more functional data becomes available. Second, the trans-eQTL data from eQTLGen consortium⁵⁶ is restricted to 10,317 previously disease-associated SNPs; we modified our analyses to account for this bias. However, it would be valuable to extend our analyses to genome-wide trans-eQTL data at large sample sizes, if that data becomes available. Third, we investigated the 10 S2G strategies separately, instead of constructing a single optimal combined strategy. A comprehensive evaluation of S2G strategies, and a method to combine them, will be provided elsewhere (S. Gazal, unpublished data). Fourth, the forward stepwise elimination procedure that we use to identify jointly significant annotations²⁹ is a heuristic procedure whose choice of prioritized annotations may be close to arbitrary in the case of highly correlated annotations; however, the correlations between the gene scores, S2G strategies, and annotations that we analyzed were modest. Fifth, the potential of the gene scores implicated in this study to aid prioritization of future drug targets—based on observed gene-level enrichments for approved autoimmune disease drug targets—is subject to the limitation that novel drug targets that do not adhere to existing patterns may be missed; encouragingly, we also identify gene-level enrichments of the gene scores implicated in this study for 4 other “gold-standard” disease-related gene sets. Despite all these limitations, our findings expand and enhance our understanding of which gene-level characterizations of enhancer-driven and master-regulatory architecture and their corresponding gene-linked regions impact autoimmune diseases.

Methods

Genomic annotations and the baseline-LD model

We define an annotation as an assignment of a numeric value to each SNP in a predefined reference panel (e.g., 1000 Genomes Project³¹; see URLs). Binary annotations can have value 0 or 1 only. Continuous-valued annotations can have any real value; our focus is on continuous-valued annotations with values between 0 and 1. Annotations that correspond to known or predicted function are referred to as functional annotations. The baseline-LD model (v.2.1) contains 86 functional annotations (see URLs). These annotations include binary coding, conserved, and regulatory annotations (e.g., promoter, enhancer, histone marks, TFBS) and continuous-valued linkage disequilibrium (LD)-related annotations.

Gene Scores

We define a gene score as an assignment of a numeric value between 0 and 1 to each gene; we primarily focus on binary gene sets defined by the top 10% of genes. We analyze a total of 11 gene scores (Table 1): 7 Enhancer-driven gene scores, 2 Master-regulator gene scores and 2 PPI-based gene scores (PPI-master, PPI-enhancer) that aggregate information across Enhancer-driven and Master-regulator gene scores. We scored 22,020 genes on chromosomes 1-22 from ref.⁷ (see URLs). When selecting the top 10% of genes for a given score, we rounded the number of genes to 2,200. We used the top 10% of genes in our primary analyses to be consistent with previous work⁹, who also defined gene scores using the top 10% of genes for a given metric, and to ensure that all SNP annotations (gene scores x S2G strategies) analyzed were of reasonable size (0.2% of SNPs or larger).

The 7 Enhancer-driven gene scores are as follows:

ABC-G: A binary gene score denoting genes that are in top 10% of the number of ’intergenic’ and ’genic’ Activity-by-Contact (ABC) enhancer to gene links in blood cell types, with average HiC score fraction > 0.015³² (see URLs).
ATAC-distal: A probabilistic gene score denoting the proportion of gene expression variance in 86 immune cell types in mouse, that is explained by the patterns of chromatin covariance of distal enhancer OCRs (open chromatin regions) to the gene, compared to chromatin covariance of OCRs that are near TSS of the gene and unexplained variances (see Figure 2 from⁴⁸). The genes were mapped to their human orthologs using Ensembl biomaRt⁹⁰.
EDS-binary: A binary gene score denoting genes that are in top 10% of the blood-specific Activity-based Enhancer Domain Score (EDS)²⁴ that reflects the number of conserved bases in enhancers that are linked to genes in blood related cell types as per the Roadmap Epigenomics Project^{91, 92} (see URLs).
eQTL-CTS: A probabilistic gene score denoting the proportion of immune cell-type-specific eQTLs (with FDR adjusted p-value < 0.05 in one or two cell-types) across 15 different immune cell-types from the DICEdb project⁹³ (see URLs).
Expecto-MVP: A binary gene score denoting genes that are in top 10% in terms of the magnitude of variation potential (MVP) in GTEx Whole Blood, which is the sum of the absolute values of all directional mutation effects within 1kb of the TSS, as evaluated by the Expecto method⁷ (see URLs).
PC-HiC-distal: A binary gene score denoting genes that are in top 10% in terms of the total number of Promoter-capture HiC connections across 17 primary blood cell-types.
SEG-GTEx: A binary gene score denoting genes that are in top 10% in terms of the SEG t-statistic⁹ score in GTEx Whole Blood.

The 2 Master-regulator gene scores are as follows:

Trans-master: A binary gene score denoting genes with significant trait-associated cis-eQTLs in blood that also act as significant trans-eQTLs for at least 3 other genes based on data from eQTLGen Consortium⁵⁶. We used the threshold of trans-regulating ≥3 genes in our primary analyses because this results in a gene score spanning ≈10% of genes, analogous to other gene scores.
TF: A binary gene score denoting genes that act as human transcription factors⁵⁷.

The 2 PPI-based gene scores are as follows:

PPI-enhancer: A binary gene score denoting genes in top 10% in terms of closeness centrality measure to the disease informative enhancer-regulated gene scores. To get the closeness centrality metric, we first perform a Random Walk with Restart (RWR) algorithm⁵⁴ on the STRING protein-protein interaction (PPI) network^{53, 94}(see URLs) with seed nodes defined by genes in top 10% of the 4 enhancer-regulated gene scores with jointly significant disease informativeness (ABC-G, ATAC-distal, EDS-binary and SEG-GTEx). The closeness centrality score was defined as the average network connectivity of the protein products from each gene based on the RWR method.
PPI-master: A binary gene score denoting genes in top 10% in terms of closeness centrality measure to the 2 disease informative master-regulator gene scores (Trans-master and TF). The algorithm was same as that of PPI-enhancer.

S2G strategies

We define a SNP-to-gene (S2G) linking strategy as an assignment of 0, 1 or more linked genes to each SNP with minor allele count ≥ 5 in a 1000 Genomes Project European reference panel³¹. We explored 10 SNP-to-gene linking strategies, including both distal and proximal strategies (Table 2). The proximal strategies included gene body ± 5kb; gene body ± 100kb; predicted TSS (by Segway^{95, 96}) ; coding SNPs; and promoter SNPs (as defined by UCSC^{97, 98}). The distal strategies included regions predicted to be distally linked to the gene by Activity-by-Contact (ABC) score^{32, 33} > 0.015 as suggested in ref.³³ (see below); regions predicted to be enhancer-gene links based on Roadmap Epigenomics data (Roadmap)^{91, 92, 99}; regions in ATAC-seq peaks that are highly correlated (> 50% as recommended in ref.⁴⁸) to expression of a gene in mouse immune cell-types (ATAC)⁴⁸; regions distally connected through promoter-capture Hi-C links (PC-HiC)¹⁰⁰; and SNPs with fine-mapped causal posterior probability (CPP)³⁷ > 0.001 (we chose this threshold to ensure that the SNP annotations generated after combining the gene scores with the eQTL S2G strategy were of reasonable size (0.2% of SNPs or larger) for all gene scores analyzed) in GTEx whole blood (we use this thresholding on CPP to ensure adequate annotation size for annotations resulting from combining this S2G strategy with the gene scores studied in this paper).

Activity-by-Contact model predictions

We used the Activity-by-Contact (ABC) model (https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction) to predict enhancer-gene connections in each cell type, based on measurements of chromatin accessibility (ATAC-seq or DNase-seq) and histone modifications (H3K27ac ChIP-seq), as previously described^{32, 33}. In a given cell type, the ABC model reports an “ABC score” for each element-gene pair, where the element is within 5 Mb of the TSS of the gene.

For each cell type, we:

Called peaks on the chromatin accessibility data using MACS2 with a lenient p-value cutoff of 0.1.
Counted chromatin accessibility reads in each peak and retained the top 150,000 peaks with the most read counts. We then resized each of these peaks to be 500bp centered on the peak summit. To this list we added 500bp regions centered on all gene TSS’s and removed any peaks overlapping blacklisted regions^{101, 102} (https:// sites.google.com/site/anshulkundaje/projects/blacklists). Any resulting overlapping peaks were merged. We call the resulting peak set candidate elements.
Calculated element Activity as the geometric mean of quantile normalized chromatin accessibility and H3K27ac ChIP-seq counts in each candidate element region.
Calculated element-promoter Contact using the average Hi-C signal across 10 human Hi-C datasets as described below.
Computed the ABC Score for each element-gene pair as the product of Activity and Contact, normalized by the product of Activity and Contact for all other elements within 5 Mb of that gene.

To generate a genome-wide averaged Hi-C dataset, we downloaded KR normalized Hi-C matrices for 10 human cell types (GM12878, NHEK, HMEC, RPE1, THP1, IMR90, HU-VEC, HCT116, K562, KBM7). This Hi-C matrix (5 Kb) resolution is available here: ftp://ftp.broadinstitute.org/outgoing/lincRNA/average_hic/average_hic.v2.191020.tar.gz^{32, 103}. For each cell type we performed the following steps.

Transformed the Hi-C matrix for each chromosome to be doubly stochastic.
We then replaced the entries on the diagonal of the Hi-C matrix with the maximum of its four neighboring bins.
We then replaced all entries of the Hi-C matrix with a value of NaN or corresponding to Knight–Ruiz matrix balancing (KR) normalization factors ¡ 0.25 with the expected contact under the power-law distribution in the cell type.
We then scaled the Hi-C signal for each cell type using the power-law distribution in that cell type as previously described.
We then computed the “average” Hi-C matrix as the arithmetic mean of the 10 cell-type specific Hi-C matrices.

In each cell type, we assign enhancers only to genes whose promoters are “active” (i.e., where the gene is expressed and that promoter drives its expression). We defined active promoters as those in the top 60% of Activity (geometric mean of chromatin accessibility and H3K27ac ChIP-seq counts). We used the following set of TSSs (one per gene symbol) for ABC predictions: https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction/blob/v0.2.1/reference/RefSeqCurated.170308.bed. CollapsedGeneBounds.bed. We note that this approach does not account for cases where genes have multiple TSSs either in the same cell type or in different cell types.

For intersecting ABC predictions with variants, we took the predictions from the ABC Model and applied the following additional processing steps: (i) We considered all distal element-gene connections with an ABC score 0.015, and all distal or proximal promoter-gene connections with an ABC score 0.1. (ii) We shrunk the 500-bp regions by 150-bp on either side, resulting in a 200-bp region centered on the summit of the accessibility peak. This is because, while the larger region is important for counting reads in H3K27ac ChIP-seq, which occur on flanking nucleosomes, most of the DNA sequences important for enhancer function are likely located in the central nucleosome-free region. (iii) We included enhancer-gene connections spanning up to 2 Mb.

Stratified LD score regression

Stratified LD score regression (S-LDSC) is a method that assesses the contribution of a genomic annotation to disease and complex trait heritability^{28, 29}. S-LDSC assumes that the per-SNP heritability or variance of effect size (of standardized genotype on trait) of each SNP is equal to the linear contribution of each annotation where a_cj is the value of annotation c for SNP j, where a_cj may be binary (0/1), continuous or probabilistic, and τ_c is the contribution of annotation c to per-SNP heritability conditioned on other annotations. S-LDSC estimates the τ_c for each annotation using the following equation where is the stratified LD score of SNP j with respect to annotation c and r_jk is the genotypic correlation between SNPs j and k computed using data from 1000 Genomes Project³¹ (see URLs); N is the GWAS sample size.

We assess the informativeness of an annotation c using two metrics. The first metric is enrichment (E), defined as follows (for binary and probabilistic annotations only): where is the heritability explained by the SNPs in annotation c, weighted by the annotation values.

The second metric is standardized effect size (τ*) defined as follows (for binary, probabilistic, and continuous-valued annotations):

where sd_c is the standard error of annotation c, the total SNP heritability and M is the total number of SNPs on which this heritability is computed (equal to 5, 961, 159 in our analyses). represents the proportionate change in per-SNP heritability associated to a 1 standard deviation increase in the value of the annotation.

Combined τ*

We defined a new metric quantifying the conditional informativeness of a heritability model (combined τ*, generalizing the combined τ* metric of ref.⁶³ to more than two annotations. In detail, given a joint model defined by M annotations (conditional on a published set of annotations such as the baseline-LD model), we define Here r_ml is the pairwise correlation of the annotations m and l, and is expected to be positive since two positively correlated annotations typically have the same direction of effect (resp. two negatively correlated annotations typically have opposite directions of effect). We calculate standard errors for using a genomic block-jackknife with 200 blocks.

Evaluating heritability model fit using SumHer logl_SS

Given a heritability model (e.g. the baseline-LD model or the combined joint model of Figure 5), we define the Δlogl_SS of that heritability model as the logl_SS of that heritability model minus the logl_SS of a model with no functional annotations (baseline-LD-nofunct; 17 LD and MAF annotations from the baseline-LD model²⁹), where logl_SS⁶⁴ is an approximate likelihood metric that has been shown to be consistent with the exact likelihood from restricted maximum likelihood (REML). We compute p-values for Δlogl_SS using the asymptotic distribution of the Likelihood Ratio Test (LRT) statistic: −2 logl_SS follows a χ² distribution with degrees of freedom equal to the number of annotations in the focal model, so that −2Δlogl_SS follows a χ² distribution with degrees of freedom equal to the difference in number of annotations between the focal model and the baseline-LD-nofunct model. We used UK10K as the LD reference panel and analyzed 4,631,901 HRC (haplotype reference panel³⁴) well-imputed SNPs with MAF ≥ and INFO ≥ 0.99 in the reference panel; We removed SNPs in the MHC region, SNPs explaining > 1% of phenotypic variance and SNPs in LD with these SNPs.

We computed Δlogl_SS for 8 heritability models:

baseline-LD model: annotations from the baseline-LD model²⁹ (86 annotations).
baseline-LD+ model: baseline-LD model plus 7 new S2G annotations not included in the baseline-LD model (93 annotations).
baseline-LD+Enhancer model: baseline-LD model+ plus 6 jointly significant S2G annotations c corresponding to Enhancer-driven gene scores from Supplementary Figure S11 (99 annotations).
baseline-LD+PPI-enhancer model: baseline-LD model+ plus 7 jointly significant S2G annotations c corresponding to Enhancer-driven and PPI-enhancer gene scores from Figure 3D (100 annotations).
baseline-LD+cis model: baseline-LD+ plus 20 S2G annotations used to correct for confounding in evaluation of Trans-master gene score (see Results) (113 annotations).
baseline-LD+Master model: baseline-LD+cis plus 4 jointly significant Master-regulator S2G annotations from Supplementary Figure S14 (117 annotations).
baseline-LD+PPI-master model: baseline-LD+cis plus 4 jointly significant Master-regulator and PPI-master S2G annotations from Figure 4D (117 annotations).
baseline-LD+PPI-master model: baseline-LD+cis plus 8 jointly significant Enhancer-driven, Master-regulator, PPI-enhancer and PPI-master S2G annotations from the final joint model in Figure 5 (121 annotations).

Data Availability

All summary statistics used in this paper are publicly available (see URLs). This work used summary statistics from the UK Biobank study (http://www.ukbiobank.ac.uk/). The summary statistics for UK Biobank is available online (see URLs). All gene scores, S2G links and SNP annotations analyzed in this study are publicly available here:https://data.broadinstitute.org/alkesgroup/LDSCORE/Dey_Enhancer_MasterReg. Supplementary Tables S14 and S67 are provided as excel files in the above link. We have also included annotations for 93 million Haplotype Reference Consortium (HRC) SNPs and 170 million TOPMed SNPs (Freeze 3A).

Code Availability

The codes used to generate SNP annotations from gene sets, and for performing PPI-informed integration of gene sets are available on Github: https://github.com/kkdey/GSSG.

URLs

Gene scores, S2G links, annotations https://data.broadinstitute.org/alkesgroup/LDSCORE/Dey_Enhancer_MasterReg
Github code repository and data https://github.com/kkdey/GSSG
Activity-by-Contact (ABC) S2G links: https://www.engreitzlab.org/resources
1000 Genomes Project Phase 3 data: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502
UK Biobank summary statistics: https://data.broadinstitute.org/alkesgroup/UKBB/
baseline-LD model annotations: https://data.broadinstitute.org/alkesgroup/LDSCORE/
BOLT-LMM software: https://data.broadinstitute.org/alkesgroup/BOLT-LMM
S-LDSC software: https://github.com/bulik/ldsc

Supplementary Tables

View this table:

Table S1.

List of all blood-related traits: List of 11 blood-related traits (6 autoimmune diseases and 5 blood cell traits) analyzed in this paper. The disease called “All Auto Immune Traits” is based on the following codes and disease names that characterize autoimmune physiopathogenic etiology - 1222 (t1d; type 1 diabetes); 1256 (guillainBarre); 1260 (myasthenia); 1261 (ms); 1372 (vasculitis); 1378 (Wegener’s); 1381 (sle); 1382 (Sjogren); 1384 (sysSclerosis); 1437 (myasthenia); 1456 (celiac); 1464 (ra); 1522 (grave); 1661 (vitiligo)^{37, 38}. For autoimmune disease case/control GWAS-es, we report N as XXX/Y Y Y where XXX represents the number of cases and Y Y Y represents the number of controls. The average number of cases for the autoimmune diseases in this study is equal to 13,186 and the average number of samples (N) for blood cell traits is 443K.

View this table:

Table S2.

S-LDSC results for SNPs linked to all genes conditional on baseline-LD model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 SNP annotations corresponding to all genes. The analysis is conditional on 86 baseline-LD (v2.1) annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S3.

Standardized enrichment of SNP annotations linked to all genes: Standardized enrichment of SNP annotations for SNPs linked to all genes, conditional on 86 baseline-LD annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S4.

S-LDSC results of a joint analysis of all S2G annotations for SNPs linked to all genes conditional on baseline-LD annotations: Standardized Effect sizes (τ*) and Enrichment (E) of SNP annotations in a joint analysis comprising of 10 SNP annotations corresponding to all genes. The analysis is conditional on 86 baseline-LD annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S5.

S-LDSC results for SNP annotations based on (random genes × S2G strategies): Maximum standardized Effect sizes (τ*) (with significance) and maximum Enrichment (E) (with significance) of SNP annotations corresponding to 10 different negative control gene sets where each gene set is generated by taking 10% genes randomly sampled from the list of protein coding genes without replacing. All results are conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 11 blood-related traits. Size of the annotation is reported in bracket.

View this table:

Table S6.

Enrichment of gene scores for 5 “gold-standard” disease-related gene sets: Enrichment and bootstrap standard error of all probabilistic gene scores with respect to (a) 195 genes that are Phase2+ drug targets for immune related diseases^{10, 43}, (b) 550 Mendelian genes related to “immune dysregulation” as per ref.⁴⁴ (c) 390 Mendelian genes related to blood disorders curated in ref.⁴⁵, (d) 146 “Bone Marrow/Immune” genes defined by the Developmental Disorders Database/Genotype-Phenotype Database (DDD/G2P)⁴⁶ and (e) top 10 % pLI genes⁴⁷.

View this table:

Table S7.

Excess overlap of 6 of the enhancer-driven gene scores with the EDS-binary gene score: We report excess overlap of genes, as defined in ref¹¹, in each of 6 enhancer-driven gene scores (ABC-G, ATAC-distal, eQTL-CTS, Expecto-MVP, PC-HiC-binary and SEG-GTEx) as well as the PPI-enhancer gene score with respect to the EDS-binary gene score derived from the well-established Enhancer Domain Score (EDS) that characterizes enhancer-regulation in genes²⁴. The excess overlap metric assess the relative enrichment in gene scores in the genes annotated in EDS-binary, compared to all genes. We also report excess overlap for a control score defined by a gene set with randomly chosen 10% genes.

View this table:

Table S8.

S-LDSC results for SNP annotations corresponding to Enhancer-driven gene scores: Standardized Effect sizes (τ*) and Enrichment (E) of 70 SNP annotations corresponding to 7 Enhancer-driven gene scores and 10 S2G strategies, conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S9.

Standardized enrichment of SNP annotations based on Enhancer-driven gene scores: Standardized enrichment of the enhancer-driven SNP annotations, conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S10.

S-LDSC results for SNP annotations corresponding to Enhancer-driven gene scores meta-analyzed across 5 blood cell traits: Standardized Effect sizes (τ*) and Enrichment (E) of 80 SNP annotations corresponding to 7 Enhancer-driven gene scores and 1 PPI-enhancer, and 10 S2G strategies, conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 5 blood cell traits.

View this table:

Table S11.

S-LDSC results for SNP annotations corresponding to Enhancer-driven gene scores meta-analyzed across 6 autoimmune diseases: Standardized Effect sizes (τ*) and Enrichment (E) of 80 SNP annotations corresponding to 7 Enhancer-driven gene scores and 1 PPI-enhancer, and 10 S2G strategies, conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 6 autoimmune diseases.

View this table:

Table S12.

S-LDSC results for SNP annotations corresponding to Enhancer-driven gene scores meta-analyzed across 2 granulocyte related blood cell traits: Standardized Effect sizes (τ*) and Enrichment (E) of 80 SNP annotations corresponding to 7 Enhancer-driven gene scores and 1 PPI-enhancer gene score and 10 S2G strategies, conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 2 granulocyte related traits (white blood cell count and eosinophil count).

View this table:

Table S13.

S-LDSC results for SNP annotations corresponding to Enhancer-driven gene scores meta-analyzed across 3 red blood cell or platelet-related blood cell traits: Standardized Effect sizes (τ*) and Enrichment (E) of 80 SNP annotations corresponding to 7 Enhancer-driven gene scores and 1 PPI-enhancer gene score and 10 S2G strategies, conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 3 red blood cell or platelet related traits (red blood count, red blood distribution width and platelet count).

View this table:

Table S14.

S-LDSC results for SNP annotations corresponding to all 11 gene scores for each of 11 autoimmune diseases and blood cell traits: Enrichment and standardized effect size estimates (τ*) corresponding to each of 11 blood-related traits and each of 110 annotations generated by combining the 11 gene scores described in the paper (7 enhancer-driven, 1 PPI-enhancer, 2 master-regulator and 1 PPI-master) with 10 2G strategies.

View this table:

Table S15.

Results of test of heterogeneity of τ* values across traits. We report the significance (p-value) of the heterogeneity metric of standardized effect size (τ*) estimates across 11 blood-related traits, where heterogeneity is defined as per ref.⁴² (see Methods). Results are meta-analyzed across 11 blood-related traits.All FDR significant (< 5%) associations are marked by asterisk (*).

View this table:

Table S16.

S-LDSC results for joint analysis of all marginally significant SNP annotations for Enhancer-driven genes. Standardized Effect sizes (τ*) and Enrichment (E) of the jointly significant Enhancer-driven SNP annotations from Table S8, conditional on the baseline-LD+ model and all SNP annotations in the Enhancer-driven joint model. Results are meta-analyzed across 11 blood-related traits.

View this table:

Table S17.

Enrichment of SNP annotations from different joint models with respect to fine-mapped SNPs for blood-related traits: We report the enrichment (E) and Jack-knife standard error with respect to 8,741 fine-mapped SNPs in autoimmune traits (Farh⁵⁰) and 1,429 genome-wide functionally fine-mapped SNPs (PIP ¿ 0.95) in blood-related traits⁵¹. Annotations in the combined joint model in Figure 5 are marked in bold.

View this table:

Table S18.

Pathway enrichment analysis of the different gene scores: Pathway enrichment analysis of the top 10% genes for 7 Enhancer-driven gene scores (colored red), 2 Master-regulator gene scores (colored blue), PPI-enhancer and PPI-master gene scores (Methods). Pathway enrichment is performed using the ConsensusPathDB database^{52, 109}. Only the top 5 non-redundant and statistically significant (q-value < 0.05) pathways for a gene set are reported.

View this table:

Table S19.

S-LDSC results for SNP annotations corresponding to Enhancer-driven gene scores, using mean across genes of the gene scores linked to a SNP: Standardized Effect sizes (τ*) and Enrichment (E) of 70 SNP annotations corresponding to 7 Enhancer-driven gene scores and 10 S2G strategies, where SNPs linked to multiple genes are weighted by the mean of the gene scores of the genes they are linked to. The results are conditional on 93 baseline-LD+ annotations for enhancer-driven annotaions and 113 baseline-LD+cis annotations for master-regulator annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S20.

S-LDSC results for SNP annotations corresponding to Enhancer-driven gene scores, using sum across genes of the gene scores linked to a SNP: Standardized Effect sizes (τ*) and Enrichment (E) of 70 SNP annotations corresponding to 7 Enhancer-driven gene scores and 10 S2G strategies, where SNPs linked to multiple genes are weighted by the sum of the gene scores of the genes they are linked to. The results are conditional on 93 baseline-LD+ annotations for enhancer-driven annotaions and 113 baseline-LD+cis annotations for master-regulator annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S21.

S-LDSC results for SNP annotations corresponding to all gene scores based on top 5% of genes (instead of top 10%): Standardized Effect sizes (τ*) and Enrichment (E) of 80 SNP annotations corresponding to 8 gene scores with 10% genes down-sampled to top 5% genes and 10 S2G strategies, conditional on 93 baseline-LD+ annotations for enhancer-driven or PPI-enhancer gene scores and 113 baseline-LD+cis annotations for PPI-master. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S22.

S-LDSC results for SNP annotations corresponding to all gene scores based on top 20% of genes (instead of top 10%): Standardized Effect sizes (τ*) and Enrichment (E) of 80 SNP annotations corresponding to 8 gene scores with 10% genes up-sampled to top 20% genes and 10 S2G strategies, conditional on 93 baseline-LD+ annotations for enhancer-driven or PPI-enhancer gene scores and 113 baseline-LD+cis annotations for PPI-master. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S23.

S-LDSC results for SNP annotations corresponding to the PPI-enhancer gene score conditional on the baseline-LD+ model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 SNP annotations corresponding to the PPI-enhancer gene score, conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S24.

Standardized enrichment of SNP annotations corresponding to PPI-enhancer gene score conditional on the baseline-LD+ model: Standardized enrichment of the 10 SNP annotations corresponding to the PPI-enhancer gene score, conditional on 93 baseline-LD+ annotations respectively. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S25.

S-LDSC results for SNP annotations corresponding to the PPI-enhancer gene score conditional on the Enhancer-driven joint model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 SNP annotations corresponding to the PPI-enhancer gene score, conditional on the Enhancer-driven joint model from Table S16. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S26.

S-LDSC results for the joint analalysis of SNP annotations corresponding to the Enhancer-driven and PPI-enhancer gene scores. Standardized Effect sizes (τ*) and Enrichment (E) of the significant SNP annotations in a joint model comprising of marginally significant Enhancer-driven SNP annotations and PPI-enhancer SNP annotations and from Figure 3. We report results on the annotations that are significant in the joint model. Also marked in red are annotations that ere jointly Bonferroni significant in the Enhancer-driven joint model (Supplementary Figure S11) but not Bonferroni significant in this PPI-enhancer-driven joint model. All analysis are conditional on 93 baseline-LD+ annotations. Results are meta-analyzed across 11 blood-related traits.

View this table:

Table S27.

S-LDSC results for SNP annotations corresponding to the Weighted Enhancer gene score conditional on the baseline-LD+ model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 SNP annotations corresponding to the Weighted Enhancer gene score. The analysis is conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S28.

S-LDSC results for SNP annotations corresponding to the Weighted Enhancer gene score conditional on the PPI-enhancer-driven joint model: Standardized Effect sizes (τ*) and Enrichment (E) of SNP annotations corresponding to the Weighted Enhancer gene score. The analysis is conditional on 93 baseline-LD+ annotations and 7 annotations from the PPI-enhancer-driven joint model in Table S26. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S29.

S-LDSC results for SNP annotations corresponding to the PPI-control gene score conditional on the baseline-LD+ model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 SNP annotations corresponding to the PPI-control gene score, conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S30.

S-LDSC results for SNP annotations corresponding to the PPI-control gene score conditional on the PPI-enhancer-driven joint model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 SNP annotations corresponding to the PPI-control gene score, conditional on the PPI-enhancer-driven joint model from Table S26. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S31.

S-LDSC results for SNP annotations corresponding to the RegNet-Enhancer gene score conditional on the baseline-LD+ model: Standardized Effect sizes (τ*) and Enrichment (E) of SNP annotations corresponding to the RegNet-Enhancer gene score. The analysis is conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S32.

S-LDSC results for SNP annotations corresponding to the RegNet-Enhancer gene score conditional on the PPI-enhancer-driven joint model: Standardized Effect sizes (τ*) and Enrichment (E) of SNP annotations corresponding to the RegNet-Enhancer gene score. The analysis is conditional on 93 baseline-LD+ annotations and 7 annotations from the PPI-enhancer-driven joint model in Table S26. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S33.

List of 97 known master-regulator genes from 18 master-regulator families: We list 97 genes spanning 18 master-regulator families curated from existing literature^58–62 that we use as a validation of the master-regulator gene scores.

View this table:

Table S34.

Excess overlap of master-regulator gene scores with 97 known master-regulator genes: Excess overlap (with p-value) of the Trans-master, TF and PPI-master gene scores with respect to 97 genes from 18 master-regulator families are curated from known literature^58–62 (see Table S33. For the control, we averaged over 10 different control gene scores as reported in Table S5.

View this table:

Table S35.

S-LDSC results for Trans-master and Transcription Factor (TF) SNP annotations conditional on the baseline-LD+cis model: Standardized Effect sizes (τ*) and Enrichment (E) of 20 SNP annotations corresponding to Trans-master and Transcription Factor genes. The analysis is conditional on 113 baseline-LD+cis model annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S36.

Standardized Enrichment S-LDSC results for SNP annotations generated from Trans-master and Transcription Factor gene scores conditional on the baseline-LD+cis model.: Standardized Enrichment (StdE) of 20 SNP annotations corresponding to Trans-master and Transcription Factor genes. The analysis is conditional on the 113 baseline-LD+cis (93 baseline-LD+ and 10 Cis1 and 10 Cis3LD) annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S37.

S-LDSC results for Trans-master and Transcription Factor (TF) SNP annotations conditional on the baseline-LD+ model: Standardized Effect sizes (τ*) and Enrichment (E) of 20 SNP annotations corresponding to Trans-master and Transcription Factor genes. The analysis is conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S38.

S-LDSC results for Trans-master SNP annotations conditional on the baseline-LD+Cis1 model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 Trans-master SNP annotations conditional on 103 baseline-LD+Cis1 (93 baseline-LD+ and 10 Cis1) annotations where Cis1 represents S2G annotations linked to genes with at least 1 trait-associated cis-eQTL. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S39.

S-LDSC results for Trans-master SNP annotations conditional on the baseline-LD+Cis1+Cis2LD model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 Trans-master SNP annotations conditional on 113 baseline-LD+Cis1+Cis2LD (93 baseline-LD+ and 10 Cis1 and 10 Cis2LD) annotations where Cis1 represents S2G annotations linked to genes with at least 1 trait-associated cis-eQTL and Cis2LD represents S2G lined to genes with 2 unlinked trait-associated cis-eQTLs. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S40.

S-LDSC results for Trans-master SNP annotations conditional on the baseline-LD+Cis1+Cis2 model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 Trans-master SNP annotations conditional on 113 baseline-LD+Cis1+Cis2 (93 baseline-LD+ and 10 Cis1 and 10 Cis2) annotations where Cis1 represents S2G annotations linked to genes with at least 1 trait-associated cis-eQTL and Cis2 represents S2G lined to genes with 2 not LD-corrected trait-associated cis-eQTLs. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S41.

S-LDSC results for Trans-master SNP annotations conditional on the baseline-LD+Cis1+Cis3 model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 Trans-master SNP annotations conditional on 113 baseline-LD+Cis1+Cis3 (93 baseline-LD+ and 10 Cis1 + 10 Cis3) annotations where Cis1 represents S2G annotations linked to genes with at least 1 trait-associated cis-eQTL and Cis3 represents S2G lined to genes with 3 not LD-corrected trait-associated cis-eQTLs. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S42.

S-LDSC results for Trans-master SNP annotations conditional on the baseline-LD+Cis1+Cis4 model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 Trans-master SNP annotations conditional on 113 baseline-LD+Cis1+Cis4 (93 baseline-LD+ and 10 Cis1 + 10 Cis4) annotations where Cis1 represents S2G annotations linked to genes with at least 1 trait-associated cis-eQTL and Cis4 represents S2G lined to genes with 4 not LD-corrected trait-associated cis-eQTLs. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S43.

S-LDSC results for Trans-master SNP annotations conditional on the baseline-LD+cis model, but with all trait-associated trans-eQTLs removed.: Standardized Effect sizes (τ*) and Enrichment (E) of 10 Trans-master SNP annotations, but with SNPs that are among the 3,853 trait-associated trans-eQTL SNPs that were used to construct the Trans-master gene score. The analysis is conditional on 113 baseline-LD+cis annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S44.

S-LDSC results for SNP annotations corresponding to Master-regulator gene scores meta-analyzed across 5 blood cell traits: Standardized Effect sizes (τ*) and Enrichment (E) of 30 SNP annotations corresponding to 2 master-regulator gene scores and 1 PPI-master score and 10 S2G strategies, conditional on 113 baseline-LD+cis annotations. Reports are meta-analyzed across 5 blood cell traits.

View this table:

Table S45.

S-LDSC results for SNP annotations corresponding to Master-regulator gene scores meta-analyzed across 6 autoimmune diseases: Standardized Effect sizes (τ*) and Enrichment (E) of 30 SNP annotations corresponding to 2 master-regulator gene scores and 1 PPI-master and 10 S2G strategies, conditional on 113 baseline-LD+cis annotations. Reports are meta-analyzed across 6 autoimmune diseases.

View this table:

Table S46.

S-LDSC results for SNP annotations corresponding to Master-regulator gene scores meta-analyzed across 2 granulocyte-related blood cell traits: Standardized Effect sizes (τ*) and Enrichment (E) of 30 SNP annotations corresponding to 2 master-regulator gene scores and 1 PPI-master score and 10 S2G strategies, conditional on 113 baseline-LD+cis annotations. Reports are meta-analyzed across 2 granulocyte related traits (white blood cell count and eosinophil count).

View this table:

Table S47.

S-LDSC results for SNP annotations corresponding to Master-regulator gene scores meta-analyzed across across 3 red blood cell or platelet-related blood cell traits: Standardized Effect sizes (τ*) and Enrichment (E) of 30 SNP annotations corresponding to 2 master-regulator gene scores and 1 PPI-master score and 10 S2G strategies, conditional on 113 baseline-LD+cis annotations. Reports are meta-analyzed across 3 red blood cell or platelet related traits (red blood count, red blood distribution width and platelet count).

View this table:

Table S48.

S-LDSC results for jointly significant Trans-master and Transcription Factor SNP annotations: Standardized Effect sizes (τ*) and Enrichment (E) of SNP annotations that are significant in a joint analysis of all marginally significant Trans-master and Transcription Factor (TF) SNP annotations. The analysis is conditional on 113 baseline-LD+cis annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S49.

S-LDSC results for Trans-regulated SNP annotations conditional on the baseline-LD+cis model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 SNP annotations corresponding to Trans-regulated genes. The analysis is conditional on 113 baseline-LD+cis annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S50.

S-LDSC results for Trans-regulated SNP annotations conditional on the baseline-LD+ model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 SNP annotations corresponding to Trans-regulated genes. The analysis is conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S51.

S-LDSC results restricted to 6 autoimmune traits for Trans-master SNP annotations conditional on baseline-LD+cis model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 Trans-master SNP annotations conditional on 113 baseline-LD+cis annotations, with results meta-analyzed across 5 Autoimmune traits.

View this table:

Table S52.

S-LDSC results for SNP annotations corresponding to Trans-master gene score based on genes that regulate ≥ 10 genes or ≥ 1 gene (instead of ≥ 3 genes): Standardized Effect sizes (τ*) and Enrichment (E) of 20 SNP annotations corresponding to Trans-master gene score but instead of trans-regulating ≥ 3 genes by any significant cis-eQTL of the focal gene as per original definition, we define two versions corresponding to trans-regulating ≥ 10 (3717) genes or ≥ 1 (1170) genes. Results are conditional on 113 baseline-LD+cis annotations for PPI-master. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S53.

S-LDSC results for SNP annotations corresponding to the PPI-master gene score conditional on the baseline-LD+cis model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 SNP annotations corresponding to the PPI-master gene score, conditional on 113 baseline-LD+cis annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S54.

Standardized enrichment of SNP annotations corresponding to PPI-master gene score conditional on the baseline-LD+cis model: Standardized enrichment of the 10 SNP annotations corresponding to the PPI-master gene score, conditional on 113 baseline-LD+cis annotations respectively. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S55.

S-LDSC results for SNP annotations corresponding to the PPI-master gene score conditional on the baseline-LD+ model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 SNP annotations corresponding to the PPI-master gene score, conditional on 93 baseline-LD+ annotations. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S56.

S-LDSC results for SNP annotations corresponding to the PPI-master gene score conditional on the Master-regulator joint model: Standardized Effect sizes (τ*) and Enrichment (E) of 10 SNP annotations corresponding to the PPI-master gene score, conditional on the Master-regulator joint model from Table S48. Reports are meta-analyzed across 11 blood-related traits.

View this table:

Table S57.

S-LDSC results for a joint analysis of all SNP annotations corresponding to the master-regulator and PPI-master gene scores. Standardized Effect sizes (τ*) and Enrichment (E) of the significant SNP annotations in a joint analysis comprising of marginally significant Master-regulator SNP annotations from Figure 4B and PPI-master SNP annotations. Reported are the results for the jointly significant annotations only. Also marked in red are annotations that are jointly Bonferroni significant in the Master-regulator joint model in Supplementary Figure S14 but not Bonferroni significant in this model. All analyses are conditional on 113 baseline-LD+cis annotations. Results are meta-analyzed across 11 blood-related traits.

View this table:

Table S58.

S-LDSC results for the combined joint analysis of SNP annotations corresponding to Enhancer-driven, PPI-enhancer, Master-regulator and PPI-master gene scores conditional on the baseline-LD+cis model. Standardized Effect sizes (τ*) and Enrichment (E) of the significant SNP annotations in a combined joint model comprising of marginally significant Enhancer-driven, PPI-enhancer, Master-regulator and PPI-master SNP annotations. All analysis are conditional on 113 baseline-LD+cis annotations. Results are meta-analyzed across 11 blood-related traits.

View this table:

Table S59.

Combined τ* of various joint models. We report combined τ* scores (as well as standard errors, t statistic and p-value) for the several joint models studied.

View this table:

Table S60.

S-LDSC results for a joint analysis of SNP annotations corresponding to Enhancer-driven and PPI-enhancer gene scores conditional on baseline-LD+cis model. Standardized Effect sizes (τ*) and Enrichment (E) of the significant SNP annotations in the PPI-enhancer-driven joint model in Table S26 conditional on 113 baseline-LD+cis annotations. Results are meta-analyzed across 11 blood-related traits.

View this table:

Table S61.

S-LDSC results for a joint analysis of all SNP annotations corresponding to Enhancer-driven gene scores conditional on baseline-LD+cis model. Standardized Effect sizes (τ*) and Enrichment (E) of the significant SNP annotations in the Enhancer driven joint model in Table S16 conditional on 113 baseline-LD+cis annotations. Results are meta-analyzed across 11 blood-related traits.

View this table:

Table S62.

Δlogl_SS results for the different heritability models. We report Δlogl_SS derived from the logl_SS metric, proposed in ref.⁶⁴, for the different heritability models studied in this paper: baseline-LD, baseline-LD+, baseline-LD+Enhancer (86 baseline-LD annotations and the 6 jointly significant annotations from Supplementary Figure S11), baseline-LD+PPI-enhancer (86 baseline-LD annotations and 7 jointly significant annotations from Figure 3D), baseline-LD+cis, baseline-LD+master (113 baseline-LD+cis and 4 jointly significant annotations from Supplementary Figure S14), baseline-LD+PPI-master (113 baseline-LD+cis and 4 jointly significant annotations from Figure 4D) and baseline-LD+final (113 baseline-LD+cis and 8 jointly significant annotations from Figure 5). We compute Δlogl_SS as the difference in logl_SS for each model with respect to s baselineLD-no-funct model with 17 annotations that include no functional annotations^{42, 64}. We also report the percentage increase in Δlogl_SS for each model over the baseline-LD model. We do not report AIC as the number of annotations are not too different to alter conclusions based on just the logl_SS. We report three summary logl_SS results - one averaged across 30 UK Biobank traits⁴² (All), one averaged across 6 blood-related traits from UK Biobank (Blood) and one averaged across the other 24 non blood related traits from UK Biobank (Non-blood) (Table S63). Averaged across all traits, we observe a +22.4% improvement from the annotations of the final joint model (baseline-LD+final) over baseline-LD. Of this 22.4%, we observe a +5.1% larger improvement in this metric from the 7 new S2G annotations constituting baseline-LD+ and +13% larger improvement from the 7 new S2G, plus the 20 annotations in baseline-LD+cis model. The remainder of the improvement (22.4% - 13% = 9.4%) comes from the 8 jointly significant annotations in the final joint model in Figure 5. The percentage increase was higher for the non-blood-related traits case as the logl_SS values, on average, were considerably smaller in comparison to the blood related traits.

View this table:

Table S63.

List of UKBiobank traits used for logl_SS calculations. The list consists of 6 blood-related traits and 24 non blood-related traits.

View this table:

Table S64.

List of 41 diseases and complex traits analyzed. We first list the 11 blood-related traits (6 autoimmune diseases and 5 blood cell traits), followed by 10 brain traits (with 2 traits analyzed by two different datasets). Overall, for 6 traits we analyzed two different data sets, leading to a total of 47 data sets.

View this table:

Table S65.

S-LDSC results for SNP annotations corresponding to Enhancer-driven and master-regulator gene scores meta-analyzed across 8 brain-related traits: Standardized Effect sizes (τ*) and Enrichment (E) of SNP annotations corresponding to all enhancer-driven and master-regulator gene scores and 10 S2G strategies, meta-analyzed across 8 relatively independent brain-related traits. Results are conditional on 93 baseline-LD+ annotations for the 7 enhancer-driven genes and 1 PPI-enhancer gene score, and conditional on 113 baseline-LD+cis annotations for the 2 master-regulator scores and 1 PPI-master gene score.

View this table:

Table S66.

S-LDSC results for SNP annotations corresponding to Enhancer-driven and master-regulator gene scores meta-analyzed across 28 non-blood and non-brain related traits: Standardized Effect sizes (τ*) and Enrichment (E) of SNP annotations corresponding to all enhancer-driven and master-regulator gene scores and 10 S2G strategies, meta-analyzed across 28 relatively independent non-blood and non-brain related traits. Results are conditional on 93 baseline-LD+ annotations for the 7 enhancer-driven genes and 1 PPI-enhancer gene score, and conditional on 113 baseline-LD+cis annotations for the 2 master-regulator scores and 1 PPI-master gene score.

View this table:

Table S67.

List of 194 fine-mapped SNPs belonging to Trans-master × Roadmap, PPI-enhancer × ABC, PPI-master × ATAC, and/or PPI-master × Roadmap annotations: Fine-mapped SNPs with PIP > 0.90 in UKBB All autoimmune traits, or either of the 5 blood cell traits belonging to Trans-master × Roadmap, PPI-enhancer × ABC, PPI-master × ATAC, and/or PPI-master × Roadmap annotations, annotations are regulatory and highly enriched (≥18) in the combined joint model (Figure 5). The number of unique fine-mapped SNPs is 194 and the number of unique SNP-gene pairs for traits implicated by our analysis is 273.

View this table:

Table S68.

S-LDSC results for the combined joint model of all SNP annotations corresponding to Enhancer-driven, PPI-enhancer, Master-regulator, PPI-master and pLI gene scores conditional on baseline-LD+cis model. Standardized Effect sizes (τ*) and Enrichment (E) of the significant SNP annotations in a joint model comprising of Enhancer-driven, PPI-enhancer, Master-regulator, PPI-master and pLI SNP annotations. All analysis are conditional on 113 baseline-LD+cis annotations. Results are meta-analyzed across 11 blood-related traits.

View this table:

Table S69.

S-LDSC results for the combined joint model of all SNP annotations corresponding to Enhancer-driven, PPI-enhancer, Master-regulator, PPI-master and PPI-all gene scores conditional on baseline-LD+cis model. Standardized Effect sizes (τ*) and Enrichment (E) of the significant SNP annotations in a joint model comprising of Enhancer-driven, PPI-enhancer, Master-regulator, PPI-master and PPI-all SNP annotations. All analysis are conditional on 113 baseline-LD+cis annotations. Results are meta-analyzed across 11 blood-related traits.

View this table:

Table S70.

S-LDSC results for the less restrictive combined joint model of all SNP annotations corresponding to both Enhancer-driven, PPI-enhancer, Master-regulator and PPI-master gene scores conditional on the baseline-LD+ model. Standardized Effect sizes (τ*) and Enrichment (E) of the significant SNP annotations in a joint model comprising of Enhancer-driven, PPI-enhancer, Master-regulator and PPI-master SNP annotations. All analysis are conditional on 113 baseline-LD+cis annotations. Results are meta-analyzed across 11 blood-related traits.

View this table:

Table S71.

MAGMA gene-set analysis p-values for all gene scores, meta-analysed across blood-related traits: Meta-analysed p-value across 11 independent blood-related diseases and traits for binarized (taking top 10% if the actual gene score is probabilistic) Enhancer-driven, Master-regulator, PPI-enhancer and PPI-master gene scores in a MAGMA competitive gene sets analysis.

2 Supplementary Figures

Figure S1.

Density plots of the distribution of the metric underling each gene score. The density plots of the original gene scores (re-scaled between 0 and 1). The solid vertical line denotes the marker for top 10% genes for scores that are binarized in Table 1, while the dashed plots to its left and right correspond to top 5% and top 20% genes.

Figure S2.

Correlation between different gene scores. Correlation matrix of all 11 gene scores (7 enhancer-driven, 2 master-regulator, 1 PPI-enhancer-driven and 1 PPI-master) from Table 1 across all genes.

Figure S3.

Correlation between SNP-to-gene (S2G) linking strategies. Correlation matrix of all 10 SNP-to-gene (S2G) linking strategies (Table 2), as assessed by the sets of SNPs linked to genes.

Figure S4.

Standardized enrichment of SNP annotations for SNPs linked to all genes. Barplot representing standardized enrichment metric, as proposed in ref.¹¹ for 10 SNP annotations corresponding SNPs linked to all genes by 10 S2G strategies from Table 2.

Figure S5.

Heritability enrichment of SNP annotations corresponding to all gene scores. Heatmap representing heritability enrichment (log scale) metric for (A) SNP annotations corresponding to all genes, (B) SNP annotations corresponding to 7 enhancer-driven and 1 PPI-enhancer gene scores, and (C) SNP annotations corresponding to 2 master-regulator gene scores and 1 PPI-master gene score stacked sequentially from top to bottom..

Figure S6.

Standardized enrichment of SNP annotations corresponding to all 11 gene scores. Heatmap representing standardized enrichment metric, as proposed in ref.¹¹ for all SNP annotations corresponding to all 11 gene scores (7 enhancer-driven, 2 master-regulator, 1 PPI-enhancer-driven and 1 PPI-master) stacked sequentially from top to bottom.

Figure S7.

Standardized effect sizes of SNP annotations corresponding to all 11 genes scores, meta-analyzed across blood cell traits only or 5 autoimmune diseases only. Heatmap representing standardized effect size of all SNP annotations corresponding to all 11 gene scores (7 enhancer-driven, 2 master-regulator, 1 PPI-enhancer and 1 PPI-master; stacked sequentially from top to bottom) for two meta-analyses - (A) meta-analysis over 5 blood cell traits and (B) meta-analysis over 6 autoimmune diseases. The meta-analysis result for all 11 traits is reported in Figure 3B and Figure 4B. Numerical results are reported in Supplementary Tables S10, S11, S44, S45.

Figure S8.

Standardized effect sizes of SNP annotations corresponding to all 11 genes scores, meta-analyzed across 2 granulocyte-related blood cell traits only or 3 red blood cell or platelet-related blood cell traits only. Scatter plot of the meta-analyzed standardized effect size τ* between two different meta-analysis - (i) across 2 granulocyte related blood cell traits (eosinophil count, white blood cell count) and (ii) 3 other blood cell traits (red blood cell counts, red blood cell distribution width, platelet count). The annotations with FDR (¡ 0.05) significant difference in effect size between the two meta-analyses are circled. Numerical results are reported in Supplementary Tables S13, S12, S47 and S46.

Figure S9.

Heritability enrichment of SNP annotations corresponding to 7 enhancer-driven and 1 PPI-enhancer gene scores for each of 11 autoimmune diseases and blood cell traits. Heatmap representing heritability enrichment of SNP annotations for each of 11 blood-related traits separately corresponding to (A) ABC-G, (B) ATAC-distal, (C) EDS-binary, (D) eQTL-CTS, (E) Expecto-MVP, (F) PC-HiC-distal, (G) SEG-GTEx and (H) PPI-enhancer gene scores. Numerical results are reported in Supplementary Table S14.

Figure S10.

Standardized effect sizes of SNP annotations corresponding to 7 enhancer-driven and 1 PPI-enhancer gene scores for each of 11 autoimmune diseases and blood cell traits. Heatmap representing standardized effect size of SNP annotations for each of 11 blood-related traits separately corresponding to (A) ABC-G, (B) ATAC-distal, (C) EDS-binary, (D) eQTL-CTS, (E) Expecto-MVP, (F) PC-HiC-distal, (G) SEG-GTEx and (H) PPI-enhancer gene scores. Numerical results are reported in Supplementary Table S14.

Figure S11.

Disease informativeness of Enhancer-driven annotations in a joint analysis: (Panel A) Heritability enrichment (log scale), conditional on the Enhancer-driven joint model. Horizontal line denotes no enrichment. (Panel B) Standardized effect size τ* conditional on the same model. Results are shown only for the SNP annotations that are significant in the joint analysis after Bonferroni correction (0.05/110). Errors bars represent 95% confidence intervals. Numerical results are reported in Supplementary Table S16.

Figure S12.

Heritability enrichment of SNP annotations corresponding to 2 master-regulator and 1 PPI-master gene scores for each of 11 autoimmune diseases and blood cell traits. Heatmap representing heritability enrichment of SNP annotations for each of 11 blood-related traits separately corresponding to (A) Trans-master, (B) TF and (H) PPI-master gene scores. Numerical results are reported in Supplementary Table S14.

Figure S13.

Standardized effect sizes of SNP annotations corresponding to 2 master-regulator and 1 PPI-master gene scores for each of 11 autoimmune diseases and blood cell traits. Heatmap representing standardized effect size of SNP annotations for each of 11 blood-related traits separately corresponding to (A) Trans-master, (B) TF and (H) PPI-master gene scores. Numerical results are reported in Supplementary Table S14.

Figure S14.

Disease informativeness of Master-regulator annotations in a joint analysis: (Panel A) Heritability enrichment (log scale), conditional on the Master-regulator joint model. Horizontal line denotes no enrichment. (Panel B) Standardized effect size τ* conditional on the same model. Results are shown only for the SNP annotations that are significant in the joint analysis after Bonferroni correction (0.05/110). Errors bars represent 95% confidence intervals. Numerical results are reported in Supplementary Table S48.

Figure S15.

Combined τ* of different joint models studied: A plot of the combined τ* metric for different joint models. Numerical results are reported in Supplementary Table S59.

Figure S16.

Standardized effect sizes of SNP annotations corresponding to all 11 gene scores, meta-analyzed across 8 brain-related traits or 28 non-brain and non-blood related traits. Heatmap representing standardized effect size of all SNP annotations corresponding to all 11 gene scores (7 enhancer-driven, 2 master-regulator, 1 PPI-enhancer and 1 PPI-master; stacked sequentially from top to bottom) for two meta-analyses - (A) meta-analysis over 8 relatively independent brain-related traits and (B) meta-analysis over 28 relatively independent non-brain and non-blood related traits. We also present in (C) results of meta-analysis over 11 relatively independent blood-related traits for the sake of comparison. Numerical results are reported in Supplementary Tables S65, S66, S8 and S35.

Figure S17.

Illustrative examples of fine-mapped SNPs belonging to SNP annotations analyzed in this study: Illustration of four fine-mapped SNPs (PIP > 0.90), 2 in UKBB “All autoimmune traits”, and one variant that is fine-mapped for both WBC count and Eosinophil count traits (Table S1). These SNPs overlap with the region implicated by a S2G strategy linking to a gene in top 2% for some of the jointly informative S2G annotations in Figure 5. (A) rs231779 is linked to gene CTLA4 by PPI-enhancer gene score (ranked 109) using ABC S2G strategy for “All autoimmune traits”, (B) rs6908626 is linked to gene BACH2 by Trans-master gene score (ranked 311) using Roadmap S2G strategy for “All autoimmune traits”, (C, D) rs113473633 is linked to gene NFKB1 by PPI-master (ranked 111) using Roadmap S2G strategy for WBC count (C) and Eosinophil count (D). Numerical results are reported in Supplementary Table S67.

Figure S18.

Disease informativeness of Enhancer-driven, PPI-enhancer, Master-regulator, PPI-master and pLI SNP annotations in a joint analysis: (Panel A) Heritability enrichment, conditional on the combined joint model of Enhancer-driven, PPI-enhancer, Master-regulator, PPI-master and pLI SNP annotations. Horizontal line denotes no enrichment. (Panel B) Standardized effect size τ* conditional on the same model. Results are shown only for the SNP annotations that are significant in the joint model after Bonferroni correction (0.05/110). Errors bars represent 95% confidence intervals. Numerical results are reported in Table S68.

Figure S19.

Correlations of residuals from stratified LD score regression with annotation-specific LD scores: A diagnostic plot assessing the model fit of the stratified LD score regression (across 11 blood-related traits) where we evaluate the correlation of the residuals from the final joint model with LD score predictors corresponding to 121 annotations in the final joint model (113 baseline-LD+cis and 8 annotations from Figure 5). The mean correlation was observed to be 0.05 which is close to 0.

Acknowledgments

We thank Ran Cui, Hilary Finucane, Sebastian Pott, John Platig, Xinchen Wang and Soumya Raychaudhuri for helpful discussions. This research was funded by NIH grants U01 HG009379, R01 MH101244, K99HG010160, R37 MH107649, R01 MH115676 and R01 MH109978. S.S.Kim was supported by NIH award F31HG010818. This research was conducted using the UK Biobank Resource under application 16549.

Footnotes

We have updated the manuscript to include individual trait-level analysis, meta-analyses for autoimmmune disease and blood cell traits separately and highlighting how this approach can nominate causal genes and SNP-gene links for specific fine-mapped SNPs. Figures 2, 3 revised.
https://github.com/kkdey/GSSG
https://alkesgroup.broadinstitute.org/LDSCORE/Dey_Enhancer_MasterReg

References

1.↵
M.T. Maurano et al. Systematic localization of common disease-associated variation in regulatory DNA. Science, 337:1190–1195, 2012.
OpenUrl Abstract/FREE Full Text
2.
G. Trynka et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nature genetics, 45(2):124–130, 2013.
OpenUrl CrossRef PubMed
3.
J.K. Pickrell. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. The American Journal of Human Genetics, 94(4):559–573, 2014.
OpenUrl CrossRef PubMed
4.
A.L. Price, C.C. Spencer, and P. Donnelly. Progress and promise in understanding the genetic basis of common diseases. Proceedings of the Royal Society B: Biological Sciences, 282(1821):20151684, 2015.
OpenUrl CrossRef PubMed
5.
P.M. Visscher et al. 10 years of GWAS discovery: biology, function, and translation. The American Journal of Human Genetics, 101(1):5–22, 2017.
OpenUrl CrossRef PubMed
6.↵
J. Shendure, G.M. Findlay, and M.W. Snyder. Genomic medicine–progress, pitfalls, and promise. Cell, 177:45–57, 2019.
OpenUrl CrossRef PubMed
7.↵
J. Zhou, C.L. Theesfeld, K. Yao, K.M. Chen, A.K. Wong, and O.G. Troyanskaya. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nature genetics, 50:1171–1179, 2018.
OpenUrl CrossRef
8.↵
X. Zhu and M. Stephens. Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nature communications, 9(1):4361, 2018.
OpenUrl
9.↵
H.K. Finucane, Y.A. Reshef, V. Anttila, K. Slowikowski, A. Gusev, A. Byrnes, et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nature genetics, 50:621–629, 2018.
OpenUrl CrossRef PubMed
10.↵
H. Fang et al. A genetics-led approach defines the drug target landscape of 30 immune-related traits. Nature genetics, 51:1082–1091, 2019.
OpenUrl
11.↵
S.S. Kim et al. Genes with high network connectivity are enriched for disease heritability. The American Journal of Human Genetics, 104:896–913, 2019.
OpenUrl CrossRef PubMed
12.
Q. Wang et al. A Bayesian framework that integrates multi-omics data and gene networks predicts risk genes from schizophrenia GWAS data. Nature neuroscience, 22:691–699, 2019.
OpenUrl CrossRef PubMed
13.
C.S. Smillie et al. Intra-and inter-cellular rewiring of the human colon during ulcerative colitis. Cell, 178:714–730, 2019.
OpenUrl CrossRef PubMed
14.↵
M. Wainberg et al. Opportunities and challenges for transcriptome-wide association studies. Nature genetics, 51:592–599, 2019.
OpenUrl CrossRef PubMed
15.↵
A.D. Sawle et al. Identification of master regulator genes in human periodontitis. Journal of dental research, 95:1010–1017, 2016.
OpenUrl CrossRef PubMed
16.
E.A. Boyle, Y.I. Li, and J.K. Pritchard. An expanded view of complex traits: from polygenic to omnigenic. Cell, 169:1177–1186, 2017.
OpenUrl CrossRef PubMed
17.
B. Brynedal et al. Large-scale trans-eQTLs affect hundreds of transcripts and mediate patterns of transcriptional co-regulation. The American Journal of Human Genetics, 100(4):581–591, 2017.
OpenUrl CrossRef
18.
C. Yao et al. Dynamic role of trans regulation of gene expression in relation to complex traits. The American Journal of Human Genetics, 100:571–580, 2017.
OpenUrl CrossRef
19.
D.M.D. Vargas et al. Alzheimer’s disease master regulators analysis: search for potential molecular targets and drug repositioning candidates. Alzheimer’s research & therapy, 10:59, 2018.
OpenUrl
20.
L.E. Montefiori et al. A promoter interaction map for cardiovascular disease genetics. Elife, 7:e35788, 2018.
OpenUrl
21.
X. Liu, Y.I. Li, and J.K. Pritchard. Trans effects on gene expression can drive omnigenic inheritance. Cell, 177:1022–1034, 2019.
OpenUrl CrossRef PubMed
22.
A.D. Torshizi et al. Deconvolution of Transcriptional Networks Identifies TCF4 as a Master Regulator in Schizophrenia. Science Advances, 5:eaau4139, 2019.
OpenUrl FREE Full Text
23.
R. Andersson and A. Sandelin. Determinants of enhancer and promoter activities of regulatory elements. Nature Reviews Genetics, pages 1–17, 2019.
24.↵
X. Wang and D.B. Goldstein. Enhancer Domains Predict Gene Pathogenicity and Inform Gene Discovery in Complex Disease. The American Journal of Human Genetics, 106:215–233, 2020.
OpenUrl CrossRef
25.↵
E.S. Emison et al. A common sex-dependent mutation in a ret enhancer underlies hirschsprung disease risk. Nature, 434(7035):857–863, 2005.
OpenUrl CrossRef PubMed Web of Science
26.↵
S. Chatterjee et al. Enhancer variants synergistically drive dysfunction of a gene regulatory network in hirschsprung disease. Cell, 167(2):pp.355–368, 2016.
OpenUrl CrossRef PubMed
27.↵
K.S. Kobayashi and P.J. Van Den Elsen. Nlrc5: a key regulator of mhc class i-dependent immune responses. Nature Reviews Immunology, 12(12):813–820, 2012.
OpenUrl CrossRef PubMed
28.↵
H.K. Finucane, B. Bulik-Sullivan, A. Gusev, G. Trynka, Y. Reshef, P.R. Loh, V. Anttila, H. Xu, C. Zang, K. Farh, and S. Ripke. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature genetics, 47:1228–1235, 2015.
OpenUrl CrossRef PubMed
29.↵
S. Gazal et al. Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection. Nature genetics, 49(10):1421–1427, 2017.
OpenUrl CrossRef PubMed
30.↵
S. Gazal, C. Marquez-Luna, H.K. Finucane, and A.L. Price. Reconciling s-ldsc and ldak models and functional enrichment estimates. Nature genetics, 51(8):1202–1204, 2019.
OpenUrl CrossRef
31.↵
1000 Genomes Project Consortium. A global reference for human genetic variation. Molecular cell, 526(7571):68–74, 2015.
OpenUrl
32.↵
C.P. Fulco et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nature Genetics, 51:1664–1669, 2019.
OpenUrl CrossRef
33.↵
J. Nasser et al. Genome-wide maps of enhancer regulation connect risk variants to disease genes. bioRxiv, accepted in principle, Nature, 2020.
34.↵
S. McCarthy et al. A reference panel of 64,976 haplotypes for genotype imputation. Nature genetics, 48:1279–1283, 2016.
OpenUrl CrossRef PubMed
35.↵
C. Bycroft et al. The uk biobank resource with deep phenotyping and genomic data. Nature, 562(7726):203–209, 2018.
OpenUrl CrossRef PubMed
36.↵
D. Taliun et al. Sequencing of 53,831 diverse genomes from the nhlbi topmed program. Nature, 590(7845):290–299, 2021.
OpenUrl CrossRef
37.↵
F. Hormozdiari et al. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nature genetics, 50(7):1041–1047, 2018.
OpenUrl CrossRef PubMed
38.↵
S. Gazal et al. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat. Genet, 50:1600–1607, 2018.
OpenUrl
39.↵
F.I. Hormozdiari et al. Functional disease architectures reveal unique biological role of transposable elements. Nature communications, 10(1):1–8, 2019.
OpenUrl CrossRef
40.
P.F. Palamara et al. High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nature Genetics, 50(9):1311–1317, 2018.
OpenUrl CrossRef
41.↵
K.K. Dey et al. Evaluating the informativeness of deep learning annotations for human complex diseases. Nature communications, 11(1):pp.1–9, 2020.
OpenUrl
42.↵
S.S. Kim et al. Improving the informativeness of Mendelian disease-derived pathogenicity scores for common disease. Nature communications, 11(1):1–15, 2020.
OpenUrl
43.↵
A. Gaulton et al. The ChEMBL database in 2017. Nucleic acids research, 45:D945–D954, 2016.
OpenUrl
44.↵
M.K. Freund et al. Phenotype-specific enrichment of Mendelian disorder genes near GWAS regions across 62 complex traits. The American Journal of Human Genetics, 103 (4):pp.535–552, 2018.
OpenUrl PubMed
45.↵
D. Vuckovic et al. The polygenic and monogenic basis of blood traits and diseases. Cell, 182 (5):pp.1214–1231, 2020.
OpenUrl CrossRef PubMed
46.↵
C.F. Wright et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. The Lancet, 385(9975):1305–1314, 2015.
OpenUrl
47.↵
M. Lek, K.J. Karczewski, E.V. Minikel, K.E. Samocha, E. Banks, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536:285–291, 2016.
OpenUrl CrossRef PubMed Web of Science
48.↵
H. Yoshida et al. The cis-Regulatory Atlas of the Mouse Immune System. Cell, 176:897–912, 2019.
OpenUrl CrossRef
49.↵
A.P. Schoech et al. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nature communications, 10:1–10, 2019.
OpenUrl CrossRef
50.↵
K.K.H. Farh et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 518:337–343, 2015.
OpenUrl CrossRef PubMed
51.↵
O. Weissbrod et al. Functionally-informed fine-mapping and polygenic localization of complex trait heritability. Nature Genetics, 52(12):p.1355–1363, 2020.
OpenUrl
52.↵
A. Kamburov et al. The ConsensusPathDB interaction database: 2013 update. Nucleic acids research, 41(D1):D793–D800, 2012.
OpenUrl PubMed
53.↵
D. Szklarczyk et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic acids research, 45(Database issue):D362–D368, 2017.
OpenUrl CrossRef PubMed
54.↵
H. Tong, C. Faloutsos, and J.Y. Pan. Random walk with restart: fast solutions and applications. Knowledge and Information Systems, 14:327–346, 2008.
OpenUrl
55.↵
A.R. Sonawane et al. Understanding tissue-specific gene regulation. Cell reports, 21:1077–1088, 2017.
OpenUrl
56.↵
U. Vösa et al. Unraveling the polygenic architecture of complex traits using blood eQTL meta-analysis. bioRxiv, page 447367, 2018.
57.↵
S.A. Lambert et al. The human transcription factors. Cell, 172:650–665, 2018.
OpenUrl CrossRef PubMed
58.↵
W. Cai et al. Master regulator genes and their impact on major diseases. PeerJ, 8:p.e9952, 2020.
59.
M.C. Nakamura. CIITA: a master regulator of adaptive immunity shows its innate side in the bone. Journal of bone and mineral research: the official journal of the American Society for Bone and Mineral Research, 29(2):p.287, 2014.
OpenUrl
60.
L. Colomer, C.and Marruecos, A. Vert, A. Bigas, and L. Espinosa. NF-kB members left home: NF-kB-independent roles in cancer. Biomedicines, 5(2):p.26, 2017.
OpenUrl
61.
E.H. Bresnick, K.R. Katsumura, H.Y. Lee, K.D. Johnson, and A.S. Perkins. Master regulatory GATA transcription factors: mechanistic principles and emerging links to hematologic malignancies. Nucleic acids research, 40(13):pp.5819–5831, 2012.
OpenUrl CrossRef PubMed
62.↵
S. Paul, P. Home, B. Bhattacharya, and S. Ray. GATA factors: Master regulators of gene expression in trophoblast progenitors. Placenta, 60:pp.S61–S66, 2017.
OpenUrl
63.↵
B. van de Geijn, H. Finucane, S. Gazal, F. Hormozdiari, T. Amariuta, and X Liu. Annotations capturing cell-type-specific TF binding explain a large fraction of disease heritability. Human Molecular Genetics, 29:1057–1067, 2020.
OpenUrl
64.↵
D. Speed, J. Holmes, and D.J Balding. Evaluating and improving heritability models using summary statistics. Nature Genetics, 52:458–462, 2020.
OpenUrl PubMed
65.↵
S. Chikuma. Ctla-4, an essential immune-checkpoint for t-cell activation. Emerging Concepts Targeting Immune Checkpoints in Cancer and Autoimmunity, pages 99–126, 2017.
66.
Y. Zhao et al. Evolving roles for targeting ctla-4 in cancer immunotherapy. Cellular Physiology and Biochemistry, 47(2):721–734, 2018.
OpenUrl CrossRef
67.↵
F. Liu et al. Ctla-4 correlates with immune and clinical characteristics of glioma. Cancer cell international, 20(1):1–10, 2020.
OpenUrl
68.↵
M.J. Richer, et al. T cell fates zipped up: how the bach2 basic leucine zipper transcriptional repressor directs t cell differentiation and function. The Journal of Immunology, 197(4):1009–1015, 2016.
OpenUrl
69.↵
H. Zhang et al. Bach2 deficiency leads to spontaneous expansion of il-4-producing t follicular helper cells and autoimmunity. Frontiers in immunology, 10:2050, 2019.
OpenUrl
70.↵
R. Roychoudhuri et al. Bach2 represses effector programs to stabilize t reg-mediated immune homeostasis. Nature, 498(7455):506–510, 2013.
OpenUrl CrossRef PubMed Web of Science
71.↵
J.D. Cooper et al. Meta-analysis of genome-wide association study data identifies additional type 1 diabetes risk loci. Nature genetics, 40(12):1399–1401, 2008.
OpenUrl CrossRef PubMed Web of Science
72.
M.A. Ferreira et al. Identification of il6r and chromosome 11q13. 5 as risk loci for asthma. The Lancet, 378(9795):1006–1014, 2011.
OpenUrl
73.↵
D.L. Morris et al. Genome-wide association meta-analysis in chinese and european individuals identifies ten new loci associated with systemic lupus erythematosus. Nature genetics, 48(8):940, 2016.
OpenUrl CrossRef PubMed
74.↵
A. Oeckinghaus and S. Ghosh. The nf-kb family of transcription factors and its regulation. Cold Spring Harbor perspectives in biology, 1(4):p.a000034, 2009.
75.↵
R.J. Grumont et al. B lymphocytes differentially use the rel and nuclear factor kb1 (nfkb1) transcription factors to regulate cell cycle progression and apoptosis in quiescent and mitogen-activated cells. Journal of Experimental Medicine, 187(5):663–674, 1998.
OpenUrl Abstract/FREE Full Text
76.↵
S. Gerondakis and U. Siebenlist. Roles of the nf-kb pathway in lymphocyte development and function. Cold Spring Harbor perspectives in biology, 2(5):p.a000182, 2010.
77.↵
M.L. Hujoel et al. Disease heritability enrichment of regulatory elements is concentrated in elements with ancient sequence age and conserved function across species. The American Journal of Human Genetics, 104:611–624, 2019.
OpenUrl CrossRef
78.↵
C.A. de Leeuw, et al. MAGMA: generalized gene-set analysis of GWAS data. PLoS computational biology, 11(4):e1004219, 2015.
OpenUrl CrossRef
79.↵
K.J. Karczewski et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature, 581:434–443, 2020.
OpenUrl CrossRef PubMed
80.↵
E.V. Minikel et al. Evaluating drug targets through human loss-of-function genetic variation. Nature, 581:459–464, 2020.
OpenUrl CrossRef
81.↵
K.A. Jagadeesh, K.K. Dey, et al. Identifying disease-critical cell types and cellular processes across the human body by integration of single-cell profiles and human genetics. bioRxiv, 2021.
82.↵
N. Mancuso et al. Probabilistic fine-mapping of transcriptome-wide association studies. Nature genetics, 51:675–682, 2019.
OpenUrl CrossRef PubMed
83.↵
E.M. Weeks et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. medRxiv, 2020.
84.↵
G. Kichaev et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS genetics, 10(10):e1004722, 2014.
OpenUrl
85.
W. Chen et al. Incorporating functional annotations for fine-mapping causal variants in a Bayesian framework using summary statistics. Genetics, 204(3):933–958, 2016.
OpenUrl Abstract/FREE Full Text
86.↵
G. Kichaev et al. Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics, 33(2):248–255, 2017.
OpenUrl CrossRef PubMed
87.↵
J.P. Ray et al. Prioritizing disease and trait causal variants at the TNFAIP3 locus using functional and genomic features. Nature communications, 11(1):1–13, 2020.
OpenUrl
88.↵
Y. Hu et al. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLoS computational biology, 13(6):e1005589, 2017.
OpenUrl
89.↵
C. Marquez-Luna et al. LDpred-funct: incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. bioRxiv, 2020.
90.↵
R.J. Kinsella et al. Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database, 2011.
91.↵
A. Kundaje et al. Integrative analysis of 111 reference human epigenomes. Nature, 518:317–330, 2015.
OpenUrl CrossRef PubMed
92.↵
Y. Liu, A. Sarkar, and M. Kellis. Evidence of a recombination rate valley in human regulatory domains. Genome Biology, page 193, 2017.
93.↵
B.J. Schmiedel et al. Impact of genetic polymorphisms on human immune cell gene expression. Cell, 175:1701–1715, 2018.
OpenUrl CrossRef PubMed
94.↵
D. Szklarczyk et al. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic acids research, 43:D447–D452, 2014.
OpenUrl PubMed
95.↵
M.M. Hoffman, J. Ernst, S.P. Wilder, A. Kundaje, R.S. Harris, and M. Libbrecht. A method to predict the impact of regulatory variants from DNA sequence. Nucleic acids research, 41:827–841, 2012.
OpenUrl PubMed Web of Science
96.↵
M.M. Hoffman, O.J. Buske, J. Wang, Z. Weng, J.A. Bilmes, and W.S. Noble. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nature methods, 9:473–476, 2012.
OpenUrl
97.↵
W.J. Kent et al. The human genome browser at ucsc. Genome research, 12(6):996–1006, 2002.
OpenUrl Abstract/FREE Full Text
98.↵
D. Karolchik et al. The ucsc table browser data retrieval tool. Nucleic acids research, 32 (Database Issue):D493–D496, 2004.
OpenUrl CrossRef PubMed Web of Science
99.↵
J. Ernst et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473:43–49, 2011.
OpenUrl CrossRef PubMed Web of Science
100.↵
B.M. Javierre et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell, 167:1369–1384, 2016.
OpenUrl CrossRef PubMed
101.↵
H.M. Amemiya, A. Kundaje, and A.P. Boyle. The ENCODE blacklist: identification of problematic regions of the genome. Scientific reports, 9(1):1–5, 2019.
OpenUrl CrossRef
102.↵
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature, 489:57–74, 2012.
OpenUrl CrossRef PubMed Web of Science
103.↵
Jan-Renier AJ Moonen et al. KLF4 Recruits SWI/SNF to Increase Chromatin Accessibility and Reprogram the Endothelial Enhancer Landscape under Laminar Shear Stress. bioRxiv, 2020.
104.
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature, 550(7675):204–213, 2017.
OpenUrl CrossRef PubMed Web of Science
105.
L. Jostins et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature, 491:119–124, 2012.
OpenUrl CrossRef PubMed Web of Science
106.
Y. Okada et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature, 506:376–381, 2014.
OpenUrl CrossRef PubMed Web of Science
107.
J. Bentham et al. Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nature genetics, 47(12):1457–1464, 2015.
OpenUrl CrossRef PubMed
108.
P.C. Dubois et al. Multiple common variants for celiac disease influencing immune gene expression. Nature genetics, 42(4):295–302, 2010.
OpenUrl CrossRef PubMed Web of Science
109.↵
A. Kamburov et al. ConsensusPathDB: toward a more complete picture of cell biology. Nucleic acids research, 39(suppl 1):D712–D717, 2010.
OpenUrl PubMed Web of Science
110.
E.K. Speliotes et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nature genetics, 42:937–948, 2010.
OpenUrl CrossRef PubMed Web of Science
111.
A. Okbay et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature, 533:539–542, 2016.
OpenUrl CrossRef PubMed
112.
S. Ripke et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature, 511(7510):421–427, 2014.
OpenUrl CrossRef PubMed Web of Science
113.
Tobacco and . Genetics Consortium. Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nature genetics, 42:441–447, 2010.
OpenUrl CrossRef PubMed Web of Science
114.
Psychiatric GWAS Consortium Bipolar Disorder Working Group. Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near odz4. Nature genetics, 43:977–983, 2011.
OpenUrl CrossRef PubMed
115.
N. Barban et al. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nature genetics, 48(12):1462–1472, 2016.
OpenUrl CrossRef PubMed
116.
V. Boraska et al. A genome-wide association study of anorexia nervosa. Molecular psychiatry, 19(10):1085–1094, 2014.
OpenUrl CrossRef PubMed Web of Science
117.
Cross-Disorder Group of the Psychiatric Genomics Consortium. Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. The Lancet, 381(9875):1371–1379, 2013.
OpenUrl CrossRef
118.
H. Schunkert et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature genetics, 43:333–338, 2011.
OpenUrl CrossRef PubMed
119.
T.M. Teslovich et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature, 466:707–713, 2010.
OpenUrl CrossRef PubMed Web of Science
120.
H.L. Allen et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature, 467:832–838, 2010.
OpenUrl CrossRef PubMed Web of Science
121.
A.P. Morris et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature genetics, 44:981–990, 2012.
OpenUrl CrossRef PubMed

View the discussion thread.

Posted March 31, 2021.

Download PDF

Data/Code

Citation Tools

Subject Area

Genetics

Subject Areas

All Articles

Animal Behavior and Cognition (5214)
Biochemistry (11745)
Bioengineering (8751)
Bioinformatics (29194)
Biophysics (14971)
Cancer Biology (12095)
Cell Biology (17411)
Clinical Trials (138)
Developmental Biology (9421)
Ecology (14178)
Epidemiology (2067)
Evolutionary Biology (18305)
Genetics (12245)
Genomics (16801)
Immunology (11867)
Microbiology (28083)
Molecular Biology (11592)
Neuroscience (60962)
Paleontology (451)
Pathology (1870)
Pharmacology and Toxicology (3238)
Physiology (4959)
Plant Biology (10427)
Scientific Communication and Education (1683)
Synthetic Biology (2885)
Systems Biology (7339)
Zoology (1651)

[1] 1.↵
M.T. Maurano et al. Systematic localization of common disease-associated variation in regulatory DNA. Science, 337:1190–1195, 2012.
OpenUrl Abstract/FREE Full Text

[2] 2.
G. Trynka et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nature genetics, 45(2):124–130, 2013.
OpenUrl CrossRef PubMed

[3] 3.
J.K. Pickrell. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. The American Journal of Human Genetics, 94(4):559–573, 2014.
OpenUrl CrossRef PubMed

[4] 4.
A.L. Price, C.C. Spencer, and P. Donnelly. Progress and promise in understanding the genetic basis of common diseases. Proceedings of the Royal Society B: Biological Sciences, 282(1821):20151684, 2015.
OpenUrl CrossRef PubMed

[5] 5.
P.M. Visscher et al. 10 years of GWAS discovery: biology, function, and translation. The American Journal of Human Genetics, 101(1):5–22, 2017.
OpenUrl CrossRef PubMed

[6] 6.↵
J. Shendure, G.M. Findlay, and M.W. Snyder. Genomic medicine–progress, pitfalls, and promise. Cell, 177:45–57, 2019.
OpenUrl CrossRef PubMed

[7] 7.↵
J. Zhou, C.L. Theesfeld, K. Yao, K.M. Chen, A.K. Wong, and O.G. Troyanskaya. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nature genetics, 50:1171–1179, 2018.
OpenUrl CrossRef

[8] 8.↵
X. Zhu and M. Stephens. Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nature communications, 9(1):4361, 2018.
OpenUrl

[9] 9.↵
H.K. Finucane, Y.A. Reshef, V. Anttila, K. Slowikowski, A. Gusev, A. Byrnes, et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nature genetics, 50:621–629, 2018.
OpenUrl CrossRef PubMed

[10] 10.↵
H. Fang et al. A genetics-led approach defines the drug target landscape of 30 immune-related traits. Nature genetics, 51:1082–1091, 2019.
OpenUrl

[11] 11.↵
S.S. Kim et al. Genes with high network connectivity are enriched for disease heritability. The American Journal of Human Genetics, 104:896–913, 2019.
OpenUrl CrossRef PubMed

[12] 12.
Q. Wang et al. A Bayesian framework that integrates multi-omics data and gene networks predicts risk genes from schizophrenia GWAS data. Nature neuroscience, 22:691–699, 2019.
OpenUrl CrossRef PubMed

[13] 13.
C.S. Smillie et al. Intra-and inter-cellular rewiring of the human colon during ulcerative colitis. Cell, 178:714–730, 2019.
OpenUrl CrossRef PubMed

[14] 14.↵
M. Wainberg et al. Opportunities and challenges for transcriptome-wide association studies. Nature genetics, 51:592–599, 2019.
OpenUrl CrossRef PubMed

[15] 15.↵
A.D. Sawle et al. Identification of master regulator genes in human periodontitis. Journal of dental research, 95:1010–1017, 2016.
OpenUrl CrossRef PubMed

[16] 16.
E.A. Boyle, Y.I. Li, and J.K. Pritchard. An expanded view of complex traits: from polygenic to omnigenic. Cell, 169:1177–1186, 2017.
OpenUrl CrossRef PubMed

[17] 17.
B. Brynedal et al. Large-scale trans-eQTLs affect hundreds of transcripts and mediate patterns of transcriptional co-regulation. The American Journal of Human Genetics, 100(4):581–591, 2017.
OpenUrl CrossRef

[18] 18.
C. Yao et al. Dynamic role of trans regulation of gene expression in relation to complex traits. The American Journal of Human Genetics, 100:571–580, 2017.
OpenUrl CrossRef

[19] 19.
D.M.D. Vargas et al. Alzheimer’s disease master regulators analysis: search for potential molecular targets and drug repositioning candidates. Alzheimer’s research & therapy, 10:59, 2018.
OpenUrl

[20] 20.
L.E. Montefiori et al. A promoter interaction map for cardiovascular disease genetics. Elife, 7:e35788, 2018.
OpenUrl

[21] 21.
X. Liu, Y.I. Li, and J.K. Pritchard. Trans effects on gene expression can drive omnigenic inheritance. Cell, 177:1022–1034, 2019.
OpenUrl CrossRef PubMed

[22] 22.
A.D. Torshizi et al. Deconvolution of Transcriptional Networks Identifies TCF4 as a Master Regulator in Schizophrenia. Science Advances, 5:eaau4139, 2019.
OpenUrl FREE Full Text

[23] 23.
R. Andersson and A. Sandelin. Determinants of enhancer and promoter activities of regulatory elements. Nature Reviews Genetics, pages 1–17, 2019.

[24] 24.↵
X. Wang and D.B. Goldstein. Enhancer Domains Predict Gene Pathogenicity and Inform Gene Discovery in Complex Disease. The American Journal of Human Genetics, 106:215–233, 2020.
OpenUrl CrossRef

[25] 25.↵
E.S. Emison et al. A common sex-dependent mutation in a ret enhancer underlies hirschsprung disease risk. Nature, 434(7035):857–863, 2005.
OpenUrl CrossRef PubMed Web of Science

[26] 26.↵
S. Chatterjee et al. Enhancer variants synergistically drive dysfunction of a gene regulatory network in hirschsprung disease. Cell, 167(2):pp.355–368, 2016.
OpenUrl CrossRef PubMed

[27] 27.↵
K.S. Kobayashi and P.J. Van Den Elsen. Nlrc5: a key regulator of mhc class i-dependent immune responses. Nature Reviews Immunology, 12(12):813–820, 2012.
OpenUrl CrossRef PubMed

[28] 28.↵
H.K. Finucane, B. Bulik-Sullivan, A. Gusev, G. Trynka, Y. Reshef, P.R. Loh, V. Anttila, H. Xu, C. Zang, K. Farh, and S. Ripke. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature genetics, 47:1228–1235, 2015.
OpenUrl CrossRef PubMed

[29] 29.↵
S. Gazal et al. Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection. Nature genetics, 49(10):1421–1427, 2017.
OpenUrl CrossRef PubMed

[30] 30.↵
S. Gazal, C. Marquez-Luna, H.K. Finucane, and A.L. Price. Reconciling s-ldsc and ldak models and functional enrichment estimates. Nature genetics, 51(8):1202–1204, 2019.
OpenUrl CrossRef

[31] 31.↵
1000 Genomes Project Consortium. A global reference for human genetic variation. Molecular cell, 526(7571):68–74, 2015.
OpenUrl

[32] 32.↵
C.P. Fulco et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nature Genetics, 51:1664–1669, 2019.
OpenUrl CrossRef

[33] 33.↵
J. Nasser et al. Genome-wide maps of enhancer regulation connect risk variants to disease genes. bioRxiv, accepted in principle, Nature, 2020.

[34] 34.↵
S. McCarthy et al. A reference panel of 64,976 haplotypes for genotype imputation. Nature genetics, 48:1279–1283, 2016.
OpenUrl CrossRef PubMed

[35] 35.↵
C. Bycroft et al. The uk biobank resource with deep phenotyping and genomic data. Nature, 562(7726):203–209, 2018.
OpenUrl CrossRef PubMed

[36] 36.↵
D. Taliun et al. Sequencing of 53,831 diverse genomes from the nhlbi topmed program. Nature, 590(7845):290–299, 2021.
OpenUrl CrossRef

[37] 37.↵
F. Hormozdiari et al. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nature genetics, 50(7):1041–1047, 2018.
OpenUrl CrossRef PubMed

[38] 38.↵
S. Gazal et al. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat. Genet, 50:1600–1607, 2018.
OpenUrl

[39] 39.↵
F.I. Hormozdiari et al. Functional disease architectures reveal unique biological role of transposable elements. Nature communications, 10(1):1–8, 2019.
OpenUrl CrossRef

[40] 40.
P.F. Palamara et al. High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nature Genetics, 50(9):1311–1317, 2018.
OpenUrl CrossRef

[41] 41.↵
K.K. Dey et al. Evaluating the informativeness of deep learning annotations for human complex diseases. Nature communications, 11(1):pp.1–9, 2020.
OpenUrl

[42] 42.↵
S.S. Kim et al. Improving the informativeness of Mendelian disease-derived pathogenicity scores for common disease. Nature communications, 11(1):1–15, 2020.
OpenUrl

[43] 43.↵
A. Gaulton et al. The ChEMBL database in 2017. Nucleic acids research, 45:D945–D954, 2016.
OpenUrl

[44] 44.↵
M.K. Freund et al. Phenotype-specific enrichment of Mendelian disorder genes near GWAS regions across 62 complex traits. The American Journal of Human Genetics, 103 (4):pp.535–552, 2018.
OpenUrl PubMed

[45] 45.↵
D. Vuckovic et al. The polygenic and monogenic basis of blood traits and diseases. Cell, 182 (5):pp.1214–1231, 2020.
OpenUrl CrossRef PubMed

[46] 46.↵
C.F. Wright et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. The Lancet, 385(9975):1305–1314, 2015.
OpenUrl

[47] 47.↵
M. Lek, K.J. Karczewski, E.V. Minikel, K.E. Samocha, E. Banks, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536:285–291, 2016.
OpenUrl CrossRef PubMed Web of Science

[48] 48.↵
H. Yoshida et al. The cis-Regulatory Atlas of the Mouse Immune System. Cell, 176:897–912, 2019.
OpenUrl CrossRef

[49] 49.↵
A.P. Schoech et al. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nature communications, 10:1–10, 2019.
OpenUrl CrossRef

[50] 50.↵
K.K.H. Farh et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 518:337–343, 2015.
OpenUrl CrossRef PubMed

[51] 51.↵
O. Weissbrod et al. Functionally-informed fine-mapping and polygenic localization of complex trait heritability. Nature Genetics, 52(12):p.1355–1363, 2020.
OpenUrl

[52] 52.↵
A. Kamburov et al. The ConsensusPathDB interaction database: 2013 update. Nucleic acids research, 41(D1):D793–D800, 2012.
OpenUrl PubMed

[53] 53.↵
D. Szklarczyk et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic acids research, 45(Database issue):D362–D368, 2017.
OpenUrl CrossRef PubMed

[54] 54.↵
H. Tong, C. Faloutsos, and J.Y. Pan. Random walk with restart: fast solutions and applications. Knowledge and Information Systems, 14:327–346, 2008.
OpenUrl

[55] 55.↵
A.R. Sonawane et al. Understanding tissue-specific gene regulation. Cell reports, 21:1077–1088, 2017.
OpenUrl

[56] 56.↵
U. Vösa et al. Unraveling the polygenic architecture of complex traits using blood eQTL meta-analysis. bioRxiv, page 447367, 2018.

[57] 57.↵
S.A. Lambert et al. The human transcription factors. Cell, 172:650–665, 2018.
OpenUrl CrossRef PubMed

[58] 58.↵
W. Cai et al. Master regulator genes and their impact on major diseases. PeerJ, 8:p.e9952, 2020.

[59] 59.
M.C. Nakamura. CIITA: a master regulator of adaptive immunity shows its innate side in the bone. Journal of bone and mineral research: the official journal of the American Society for Bone and Mineral Research, 29(2):p.287, 2014.
OpenUrl

[60] 60.
L. Colomer, C.and Marruecos, A. Vert, A. Bigas, and L. Espinosa. NF-kB members left home: NF-kB-independent roles in cancer. Biomedicines, 5(2):p.26, 2017.
OpenUrl

[61] 61.
E.H. Bresnick, K.R. Katsumura, H.Y. Lee, K.D. Johnson, and A.S. Perkins. Master regulatory GATA transcription factors: mechanistic principles and emerging links to hematologic malignancies. Nucleic acids research, 40(13):pp.5819–5831, 2012.
OpenUrl CrossRef PubMed

[62] 62.↵
S. Paul, P. Home, B. Bhattacharya, and S. Ray. GATA factors: Master regulators of gene expression in trophoblast progenitors. Placenta, 60:pp.S61–S66, 2017.
OpenUrl

[63] 63.↵
B. van de Geijn, H. Finucane, S. Gazal, F. Hormozdiari, T. Amariuta, and X Liu. Annotations capturing cell-type-specific TF binding explain a large fraction of disease heritability. Human Molecular Genetics, 29:1057–1067, 2020.
OpenUrl

[64] 64.↵
D. Speed, J. Holmes, and D.J Balding. Evaluating and improving heritability models using summary statistics. Nature Genetics, 52:458–462, 2020.
OpenUrl PubMed

[65] 65.↵
S. Chikuma. Ctla-4, an essential immune-checkpoint for t-cell activation. Emerging Concepts Targeting Immune Checkpoints in Cancer and Autoimmunity, pages 99–126, 2017.

[66] 66.
Y. Zhao et al. Evolving roles for targeting ctla-4 in cancer immunotherapy. Cellular Physiology and Biochemistry, 47(2):721–734, 2018.
OpenUrl CrossRef

[67] 67.↵
F. Liu et al. Ctla-4 correlates with immune and clinical characteristics of glioma. Cancer cell international, 20(1):1–10, 2020.
OpenUrl

[68] 68.↵
M.J. Richer, et al. T cell fates zipped up: how the bach2 basic leucine zipper transcriptional repressor directs t cell differentiation and function. The Journal of Immunology, 197(4):1009–1015, 2016.
OpenUrl

[69] 69.↵
H. Zhang et al. Bach2 deficiency leads to spontaneous expansion of il-4-producing t follicular helper cells and autoimmunity. Frontiers in immunology, 10:2050, 2019.
OpenUrl

[70] 70.↵
R. Roychoudhuri et al. Bach2 represses effector programs to stabilize t reg-mediated immune homeostasis. Nature, 498(7455):506–510, 2013.
OpenUrl CrossRef PubMed Web of Science

[71] 71.↵
J.D. Cooper et al. Meta-analysis of genome-wide association study data identifies additional type 1 diabetes risk loci. Nature genetics, 40(12):1399–1401, 2008.
OpenUrl CrossRef PubMed Web of Science

[72] 72.
M.A. Ferreira et al. Identification of il6r and chromosome 11q13. 5 as risk loci for asthma. The Lancet, 378(9795):1006–1014, 2011.
OpenUrl

[73] 73.↵
D.L. Morris et al. Genome-wide association meta-analysis in chinese and european individuals identifies ten new loci associated with systemic lupus erythematosus. Nature genetics, 48(8):940, 2016.
OpenUrl CrossRef PubMed

[74] 74.↵
A. Oeckinghaus and S. Ghosh. The nf-kb family of transcription factors and its regulation. Cold Spring Harbor perspectives in biology, 1(4):p.a000034, 2009.

[75] 75.↵
R.J. Grumont et al. B lymphocytes differentially use the rel and nuclear factor kb1 (nfkb1) transcription factors to regulate cell cycle progression and apoptosis in quiescent and mitogen-activated cells. Journal of Experimental Medicine, 187(5):663–674, 1998.
OpenUrl Abstract/FREE Full Text

[76] 76.↵
S. Gerondakis and U. Siebenlist. Roles of the nf-kb pathway in lymphocyte development and function. Cold Spring Harbor perspectives in biology, 2(5):p.a000182, 2010.

[77] 77.↵
M.L. Hujoel et al. Disease heritability enrichment of regulatory elements is concentrated in elements with ancient sequence age and conserved function across species. The American Journal of Human Genetics, 104:611–624, 2019.
OpenUrl CrossRef

[78] 78.↵
C.A. de Leeuw, et al. MAGMA: generalized gene-set analysis of GWAS data. PLoS computational biology, 11(4):e1004219, 2015.
OpenUrl CrossRef

[79] 79.↵
K.J. Karczewski et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature, 581:434–443, 2020.
OpenUrl CrossRef PubMed

[80] 80.↵
E.V. Minikel et al. Evaluating drug targets through human loss-of-function genetic variation. Nature, 581:459–464, 2020.
OpenUrl CrossRef

[81] 81.↵
K.A. Jagadeesh, K.K. Dey, et al. Identifying disease-critical cell types and cellular processes across the human body by integration of single-cell profiles and human genetics. bioRxiv, 2021.

[82] 82.↵
N. Mancuso et al. Probabilistic fine-mapping of transcriptome-wide association studies. Nature genetics, 51:675–682, 2019.
OpenUrl CrossRef PubMed

[83] 83.↵
E.M. Weeks et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. medRxiv, 2020.

[84] 84.↵
G. Kichaev et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS genetics, 10(10):e1004722, 2014.
OpenUrl

[85] 85.
W. Chen et al. Incorporating functional annotations for fine-mapping causal variants in a Bayesian framework using summary statistics. Genetics, 204(3):933–958, 2016.
OpenUrl Abstract/FREE Full Text

[86] 86.↵
G. Kichaev et al. Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics, 33(2):248–255, 2017.
OpenUrl CrossRef PubMed

[87] 87.↵
J.P. Ray et al. Prioritizing disease and trait causal variants at the TNFAIP3 locus using functional and genomic features. Nature communications, 11(1):1–13, 2020.
OpenUrl

[88] 88.↵
Y. Hu et al. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLoS computational biology, 13(6):e1005589, 2017.
OpenUrl

[89] 89.↵
C. Marquez-Luna et al. LDpred-funct: incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. bioRxiv, 2020.

[90] 90.↵
R.J. Kinsella et al. Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database, 2011.

[91] 91.↵
A. Kundaje et al. Integrative analysis of 111 reference human epigenomes. Nature, 518:317–330, 2015.
OpenUrl CrossRef PubMed

[92] 92.↵
Y. Liu, A. Sarkar, and M. Kellis. Evidence of a recombination rate valley in human regulatory domains. Genome Biology, page 193, 2017.

[93] 93.↵
B.J. Schmiedel et al. Impact of genetic polymorphisms on human immune cell gene expression. Cell, 175:1701–1715, 2018.
OpenUrl CrossRef PubMed

[94] 94.↵
D. Szklarczyk et al. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic acids research, 43:D447–D452, 2014.
OpenUrl PubMed

[95] 95.↵
M.M. Hoffman, J. Ernst, S.P. Wilder, A. Kundaje, R.S. Harris, and M. Libbrecht. A method to predict the impact of regulatory variants from DNA sequence. Nucleic acids research, 41:827–841, 2012.
OpenUrl PubMed Web of Science

[96] 96.↵
M.M. Hoffman, O.J. Buske, J. Wang, Z. Weng, J.A. Bilmes, and W.S. Noble. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nature methods, 9:473–476, 2012.
OpenUrl

[97] 97.↵
W.J. Kent et al. The human genome browser at ucsc. Genome research, 12(6):996–1006, 2002.
OpenUrl Abstract/FREE Full Text

[98] 98.↵
D. Karolchik et al. The ucsc table browser data retrieval tool. Nucleic acids research, 32 (Database Issue):D493–D496, 2004.
OpenUrl CrossRef PubMed Web of Science

[99] 99.↵
J. Ernst et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473:43–49, 2011.
OpenUrl CrossRef PubMed Web of Science

[100] 100.↵
B.M. Javierre et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell, 167:1369–1384, 2016.
OpenUrl CrossRef PubMed

[101] 101.↵
H.M. Amemiya, A. Kundaje, and A.P. Boyle. The ENCODE blacklist: identification of problematic regions of the genome. Scientific reports, 9(1):1–5, 2019.
OpenUrl CrossRef

[102] 102.↵
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature, 489:57–74, 2012.
OpenUrl CrossRef PubMed Web of Science

[103] 103.↵
Jan-Renier AJ Moonen et al. KLF4 Recruits SWI/SNF to Increase Chromatin Accessibility and Reprogram the Endothelial Enhancer Landscape under Laminar Shear Stress. bioRxiv, 2020.

[104] 104.
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature, 550(7675):204–213, 2017.
OpenUrl CrossRef PubMed Web of Science

[105] 105.
L. Jostins et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature, 491:119–124, 2012.
OpenUrl CrossRef PubMed Web of Science

[106] 106.
Y. Okada et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature, 506:376–381, 2014.
OpenUrl CrossRef PubMed Web of Science

[107] 107.
J. Bentham et al. Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nature genetics, 47(12):1457–1464, 2015.
OpenUrl CrossRef PubMed

[108] 108.
P.C. Dubois et al. Multiple common variants for celiac disease influencing immune gene expression. Nature genetics, 42(4):295–302, 2010.
OpenUrl CrossRef PubMed Web of Science

[109] 109.↵
A. Kamburov et al. ConsensusPathDB: toward a more complete picture of cell biology. Nucleic acids research, 39(suppl 1):D712–D717, 2010.
OpenUrl PubMed Web of Science

[110] 110.
E.K. Speliotes et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nature genetics, 42:937–948, 2010.
OpenUrl CrossRef PubMed Web of Science

[111] 111.
A. Okbay et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature, 533:539–542, 2016.
OpenUrl CrossRef PubMed

[112] 112.
S. Ripke et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature, 511(7510):421–427, 2014.
OpenUrl CrossRef PubMed Web of Science

[113] 113.
Tobacco and . Genetics Consortium. Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nature genetics, 42:441–447, 2010.
OpenUrl CrossRef PubMed Web of Science

[114] 114.
Psychiatric GWAS Consortium Bipolar Disorder Working Group. Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near odz4. Nature genetics, 43:977–983, 2011.
OpenUrl CrossRef PubMed

[115] 115.
N. Barban et al. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nature genetics, 48(12):1462–1472, 2016.
OpenUrl CrossRef PubMed

[116] 116.
V. Boraska et al. A genome-wide association study of anorexia nervosa. Molecular psychiatry, 19(10):1085–1094, 2014.
OpenUrl CrossRef PubMed Web of Science

[117] 117.
Cross-Disorder Group of the Psychiatric Genomics Consortium. Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. The Lancet, 381(9875):1371–1379, 2013.
OpenUrl CrossRef

[118] 118.
H. Schunkert et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature genetics, 43:333–338, 2011.
OpenUrl CrossRef PubMed

[119] 119.
T.M. Teslovich et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature, 466:707–713, 2010.
OpenUrl CrossRef PubMed Web of Science

[120] 120.
H.L. Allen et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature, 467:832–838, 2010.
OpenUrl CrossRef PubMed Web of Science

[121] 121.
A.P. Morris et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature genetics, 44:981–990, 2012.
OpenUrl CrossRef PubMed

Contribution of enhancer-driven and master-regulator genes to autoimmune disease revealed using functionally informed SNP-to-gene linking strategies

Abstract

1 Introduction

Results

Overview of methods

Enhancer-driven genes are conditionally informative for autoimmune disease heritability

Genes with high network connectivity to Enhancer-driven genes are even more informative

Master-regulator genes are conditionally informative for autoimmune disease heritability

Genes with high network connectivity to Master-regulator genes are even more informative

Combined joint model

Discussion

Methods

Genomic annotations and the baseline-LD model

Gene Scores

S2G strategies

Activity-by-Contact model predictions

Stratified LD score regression

Combined τ*

Evaluating heritability model fit using SumHer loglSS

Data Availability

Code Availability

URLs

Supplementary Tables

2 Supplementary Figures

Acknowledgments

Footnotes

References

Citation Manager Formats

Subject Area

Evaluating heritability model fit using SumHer logl_SS