New Results

Modeling Regulatory Network Topology Improves Genome-Wide Analyses of Complex Human Traits

Xiang Zhu, Zhana Duren, Wing Hung Wong
doi: https://doi.org/10.1101/2020.03.13.990010
Xiang Zhu
1Pennsylvania State University & Stanford University
  • For correspondence: xiangzhu@psu.edu whwong@stanford.edu
Zhana Duren
1Pennsylvania State University & Stanford University
Wing Hung Wong
1Pennsylvania State University & Stanford University
  • For correspondence: xiangzhu@psu.edu whwong@stanford.edu

Abstract

Genome-wide association studies (GWAS) have cataloged many significant associations between genetic variants and complex traits. However, most of these findings have unclear biological significance, because they often have small effects and occur in non-coding regions. Integration of GWAS with gene regulatory networks addresses both issues by aggregating weak genetic signals within regulatory programs. Here we develop a Bayesian framework that integrates GWAS summary statistics with regulatory networks to infer genetic enrichments and associations simultaneously. Our method improves upon existing approaches by explicitly modeling network topology to assess enrichments, and by automatically leveraging enrichments to identify associations. Applying this method to 18 human traits and 38 regulatory networks shows that genetic signals of complex traits are often enriched in interconnections specific to trait-relevant cell types or tissues. Prioritizing variants within enriched networks identifies known and new trait-associated genes revealing novel biological and therapeutic insights.

Introduction

Genome-wide association studies (GWAS) have catalogued many significant and reproducible associations between common genetic variants, notably single-nucleotide polymorphisms (SNPs), and diverse human complex traits1. However, it remains challenging2 to translate these findings into biological mechanisms and clinical applications, because most trait-associated variants have individually small effects and map to non-coding sequences.

One interpretation is that non-coding variants cumulatively affect complex traits through cell type- or tissue-specific3 gene regulation4. To test this hypothesis, large-scale epigenomic5,6 and transcriptomic7–10 data have been made available spanning diverse human cell types and tissues. Exploiting these regulatory genomic data, many studies have shown enrichments of trait-associated SNPs in chromatin regions11–13 and genes14–16 that are active in trait-relevant cell types or tissues. These studies simply overlap regulatory maps with GWAS data and often ignore functional interactions among loci within regulatory programs.

Gene regulatory networks17–20 have proven useful in mining functional interactions of genes from genomic data. Transcriptional regulatory interactions, rather than gene expression alone, drive tissue specificity19. Further, context-specific regulatory networks have emerged as promising tools to dissect the genetics of complex traits21–23. Network-connectivity analyses in GWAS have shown that trait-associated genes are more highly interconnected than expected18 and highly interconnected genes are enriched for trait heritability24. A major limitation of these analyses, however, is that they do not leverage observed enrichments to enhance trait-associated gene discovery.

To unleash the potential of regulatory networks in GWAS, we develop a novel framework for simultaneous genome-wide network enrichment and gene prioritization analysis. Through extensive simulations on the new method, we show its flexibility in various genetic architectures, its robustness to a wide range of model mis-specification, and its improved performance over existing methods. Applying the method to 18 human traits and 38 regulatory networks, we identify strong enrichments of genetic associations in network topology specific to trait-relevant cell types or tissues. By prioritizing variants within enriched networks we identify trait-associated genes that were not implicated by the same GWAS. Many of these putatively novel genes have strong support from multiple lines of external evidence; some are further validated by follow-up GWAS of the same traits with increased sample sizes. Together, these results demonstrate the potential for our method to yield novel biological and therapeutic insights from existing data.

Results

Method overview

Figure 1 shows the method schematic. In brief, we develop a new model dissecting the total effect of a single SNP on a trait into effects of multiple (nearby and distal) genes through a regulatory network, and then we combine it with a multiple-SNP regression likelihood25 based on GWAS summary statistics to perform Bayesian inference.

Fig 1: Schematic of RSS-NET.

a Decomposition of the total effect of a common SNP on a complex trait through multiple nearby and distal genes. b Gene regulatory network defined as a weighted and directed bipartite graph linking TFs to TGs. c Given a TF-TG network, RSS-NET exploits its topology to decompose the total genetic effect into cis and trans regulatory components. Both the SNP-gene (cjg) and TF-TG (vgt) weights in the decomposition are assumed known and they are specified by existing omics data (Methods). d-e In addition to TF-TG networks, RSS-NET also requires GWAS summary statistics and ancestry-matching LD estimates as input. f-h Bayesian hierarchical models underlying RSS-NET. An in-depth description is provided in Methods. i Given a network, RSS-NET produces a BF comparing the baseline (M0) and enrichment (M1) models to summarize the evidence for network enrichment. j RSS-NET prioritizes loci within an enriched network by computing P1, the posterior probability that at least one SNP j in a locus is trait-associated (βj ≠ 0). Differences between P1 under M0 and M1 reflect the influence of a regulatory network on genetic associations, highlighting putatively novel trait-associated genes.

We start with a conceptual decomposition of the total effect of a common SNP on a complex trait into three components: a cis-regulatory component through nearby genes, a trans-regulatory component through distal genes that are regulated by genes near this SNP, and a remaining component due to other factors (Fig. 1a). Since common genetic variation contributes to complex traits primarily via gene regulation22, we find this decomposition a sensible approximation to the genetic basis of complex traits.

Despite various ways to model the regulatory components, here we use cell type- or tissue-specific regulatory networks18,20 linking transcription factors (TFs) to target genes (TGs). Specifically, we define a regulatory network as a directed bipartite graph with weighted edges from TFs to TGs (Fig. 1b). Given a TF-TG network, we use its topology to decompose the total effect of each SNP into effects of multiple interconnected genes (Fig. 1c). For example, we represent the expected total effect of SNP j shown in Figure 1c as a weighted sum of cis effects of three nearby genes (outside-network gene k, TG u and TF g) and trans effects of three TGs (n, u, t) that are directly regulated by TF g. For identifiability we assume the SNP-gene (cjg) and TF-TG (vgt) weights in the decomposition are known, specified by existing omics data (Methods).
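The weighted-sum decomposition described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the dictionary layout, the toy weights and the exact nesting of the cis and trans terms are hypothetical, and the actual model is the one specified in Methods.

```python
from typing import Dict

def expected_total_effect(
    snp_gene_weights: Dict[str, float],          # c_jg for each gene g near SNP j
    tf_tg_weights: Dict[str, Dict[str, float]],  # v_gt for each TF g and its target genes t
    cis_effects: Dict[str, float],               # hypothetical cis effect of each nearby gene
    trans_effects: Dict[str, float],             # hypothetical trans effect of each target gene
) -> float:
    """Weighted sum of cis effects of nearby genes and trans effects of their targets."""
    total = 0.0
    for gene, c_jg in snp_gene_weights.items():
        contribution = cis_effects.get(gene, 0.0)
        # Only TFs have outgoing edges; all other genes contribute cis effects only.
        for target, v_gt in tf_tg_weights.get(gene, {}).items():
            contribution += v_gt * trans_effects.get(target, 0.0)
        total += c_jg * contribution
    return total

# Toy version of Figure 1c: nearby genes k, u, g; TF g regulates TGs n, u, t.
c_j = {"k": 0.5, "u": 1.0, "g": 2.0}
v = {"g": {"n": 0.1, "u": 0.2, "t": 0.3}}
cis = {"k": 1.0, "u": 1.0, "g": 1.0}
trans = {"n": 1.0, "u": 1.0, "t": 1.0}
mu_j = expected_total_effect(c_j, v, cis, trans)
```

In this toy setup gene u enters twice, once as a nearby gene (cis) and once as a target of TF g (trans), mirroring its dual role in Figure 1c.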

To implement this regulatory decomposition (Fig. 1c) in GWAS, we formulate a network-induced prior for SNP-level effects (β), and combine it with a multiple regression likelihood25 for β based on single-SNP association statistics from a GWAS (Fig. 1d) and linkage disequilibrium (LD) estimates from a reference panel with ancestry matching the GWAS (Fig. 1e). We refer to the resulting Bayesian framework (Fig. 1f-h) as Regression with Summary Statistics exploiting NEtwork Topology (RSS-NET).
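For orientation, the multiple regression likelihood of ref. 25 is, to our understanding, commonly written in the form below. The symbols are assumptions introduced here for illustration: $\hat{\beta}$ denotes the vector of single-SNP effect estimates, $\hat{S}$ a diagonal matrix built from their standard errors, and $\hat{R}$ the LD matrix estimated from the reference panel. Methods and ref. 25 give the authoritative definition.

```latex
\hat{\beta} \mid \beta \;\sim\; \mathcal{N}\!\left( \hat{S}\hat{R}\hat{S}^{-1}\beta,\; \hat{S}\hat{R}\hat{S} \right)
```

Under this form, the network-induced prior acts only on $\beta$, while the summary statistics and LD estimates enter solely through $\hat{\beta}$, $\hat{S}$ and $\hat{R}$.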

RSS-NET accomplishes two tasks simultaneously: (1) testing whether a network is enriched for genetic associations (Fig. 1i); (2) identifying which genes within this network drive the enrichment (Fig. 1j). Specifically, RSS-NET estimates two independent enrichment parameters, θ and σ2: θ measures the extent to which SNPs near network genes and regulatory elements (REs) have increased likelihood to be associated with the trait, and σ2 measures the extent to which SNPs near network edges have larger effect sizes. To assess network enrichment, RSS-NET computes a Bayes factor (BF) comparing the “enrichment model” (M1: θ > 0 or σ2 > 0) against the “baseline model” (M0: θ = 0 and σ2 = 0). To prioritize genes within enriched networks, RSS-NET contrasts posterior distributions of β estimated under M0 and M1.

RSS-NET improves upon its predecessor RSS-E16. Specifically, RSS-NET exploits the full network topology, whereas RSS-E ignores the edge information. By explicitly modeling regulatory interconnections, RSS-NET outperforms RSS-E in both simulated and real datasets. Despite different treatments of network information, RSS-NET and RSS-E share computation schemes (Supplementary Notes), allowing us to reuse the efficient algorithm of RSS-E. Software is available at https://github.com/suwonglab/rss-net.

Method comparison through simulations

The novelty of RSS-NET resides in a unified framework that leverages network topology to infer enrichments from whole-genome association statistics and prioritizes loci in light of inferred enrichments automatically. We are not aware of any published method with the same features. However, one could ignore topology and simply annotate SNPs based on their proximity to network genes and REs (Methods). For these SNP-level annotations there are methods to assess global enrichments or local associations on GWAS summary data. Here we use Pascal26, LDSC13,27 and RSS-E16 to benchmark RSS-NET.

Given a network, we first simulated SNP effects (β) from either RSS-NET assumed or mis-specified priors, and then combined them with real genotypes to simulate phenotypes from a genome-wide multiple-SNP model. We computed the corresponding single-SNP association statistics, on which we compared RSS-NET with other methods. Since RSS-NET is a model-based approach, we designed a large array of simulation scenarios for both correctly- and mis-specified β. To reduce computation of this large-scale design, we mainly used real genotypes28 of 348,965 genome-wide common SNPs and a whole-genome regulatory network inferred for B cell (436 TFs, 3,018 TGs)20,29. We also performed simulations on real genotypes30 of 1 million common SNPs31 or different networks, and obtained similar results.

We started with simulations where RSS-NET modeling assumptions were satisfied. We considered two genetic architectures: a sparse scenario with most SNPs being null and a polygenic scenario with most SNPs being trait-associated. For each architecture, we created negative datasets by simulating SNP effects (β) from M0 and positive datasets by simulating β from three M1 patterns (only θ > 0; only σ2 > 0; both θ > 0 and σ2 > 0) of the target network, and applied the methods to detect M1 from all datasets (Fig. 2 and Supplementary Figs. 1-2). Existing methods tend to perform well only in select settings. For example, Pascal and LDSC perform poorly when genetic signals are very sparse (Fig. 2b); RSS-E performs poorly when enrichment patterns are inconsistent with its modeling assumptions (Fig. 2c). Except for datasets with weak genetic signals on the network (Fig. 2d), RSS-NET performs consistently well in all scenarios. This is expected because the flexible models underlying RSS-NET can capture various genetic architectures and enrichment patterns. In practice, one rarely knows beforehand the correct genetic or enrichment architecture. This makes the flexibility of RSS-NET appealing.

Fig 2: Flexibility of RSS-NET to identify network-based enrichments from GWAS summary statistics.

We used a B cell-specific regulatory network and real genotypes of 348,965 genome-wide SNPs to simulate negative and positive individual-level data under two genetic architectures (“sparse” and “polygenic”). We used the baseline model (M0: θ = 0 and σ2 = 0) to simulate SNP effects (β) for negative datasets. We simulated β for positive datasets from the enrichment model (M1: θ > 0 or σ2 > 0) for the target network under three scenarios: a θ > 0, σ2 = 0; b θ = 0, σ2 > 0; c θ > 0, σ2 > 0. We computed the corresponding single-SNP association statistics, on which we compared RSS-NET with RSS-E16, LDSC-baseline13, LDSC-baselineLD27 and Pascal26 using their default setups (Methods). Pascal includes two gene scoring options: maximum-of-χ2 (-max) and sum-of-χ2 (-sum), and two pathway scoring options: χ2 approximation (-chi) and empirical sampling (-emp). For each dataset, Pascal and LDSC methods produced P-values, whereas RSS-E and RSS-NET produced BFs; these statistics were used to rank the significance of enrichments. A false or true positive occurs if a method identifies enrichment of the target network in a negative or positive dataset, respectively. Each panel displays the trade-off between false and true enrichment discoveries, namely receiver operating characteristics (ROC) curves, for all methods in 200 negative and 200 positive datasets of a simulation scenario, and also reports the corresponding areas under ROC curves (AUROCs, a higher value indicating better performance). Dashed diagonal lines denote random ROC curves (AUROC=0.5). d RSS-NET, as well as other methods, does not perform well when the target network harbors weak genetic associations. Simulation details and additional results are provided in Supplementary Figures 1-2.
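As a reading aid for the AUROC values reported in these panels, the following minimal sketch computes an AUROC from enrichment scores on positive and negative datasets. It is not the evaluation code used in the study: the function name and toy scores are hypothetical, and it assumes larger scores mean stronger evidence (for example log10 BF, or −log10 P for methods that output P-values).

```python
from typing import Sequence

def auroc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """Probability that a random positive dataset outranks a random negative one.

    Ties count as half a win, which equals the area under the ROC curve.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical log10 BFs from three positive and three negative datasets.
toy_positive = [3.0, 2.0, 0.5]
toy_negative = [1.0, 0.5, 0.1]
toy_auroc = auroc(toy_positive, toy_negative)
```

A method that ranks every positive dataset above every negative dataset reaches 1.0, while a method that ranks at random hovers around 0.5, the dashed diagonal in each panel.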

Genetic associations of complex traits are enriched in regulatory regions5,6. Since a regulatory network is a set of genes linked by REs, it is important to confirm that network enrichments identified by RSS-NET are not driven by general regulatory enrichments. To this end we simulated negative datasets with enriched associations in random SNPs that are near genes (Fig. 3a; Supplementary Fig. 3) or REs (Fig. 3b; Supplementary Fig. 4). The results show that RSS-NET is unlikely to yield false discoveries due to arbitrary regulatory enrichments, and yet it is more powerful than other methods.

Fig 3: Robustness of RSS-NET to model mis-specification in enrichment analyses.

Here positive datasets were generated from a special case of M1 with θ > 0 and σ2 > 0. Negative datasets were simulated from four scenarios where genetic associations were enriched in: a a random set of near-gene SNPs; b a random set of near-RE SNPs; c SNPs with MAF- and LD-dependent effects; d a random edge-altered network. By this design, RSS-NET was mis-specified in all four scenarios. Similar to positive datasets, the simulated false enrichments in all negative datasets manifested in both association proportion (more frequent) and magnitude (larger effect). RSS-E was excluded here because of its poor performance shown in Figure 2c. The rest of this simulation study is the same as Figure 2. Simulation details and additional results are provided in Supplementary Figures 3-6.

Minor allele frequency (MAF)- and LD-dependent genetic architectures have been identified in complex traits27. To assess the impact of MAF- and LD-dependence on RSS-NET results, we simulated MAF- and LD-dependent SNP effects (β) from an additive model of 10 MAF bins and 6 LD-related annotations27, which were then used to create negative datasets (Fig. 3c; Supplementary Fig. 5). Similarly, enrichments identified by RSS-NET are unlikely to be false positives induced by MAF- and LD-dependence.

Interconnections within regulatory programs play key roles in driving context specificity19 and propagating disease risk22, but existing methods often ignore the edge information. In contrast, RSS-NET leverages the full topology of a given network. The topology-aware feature increases the potential of RSS-NET to identify the most relevant network for a trait among candidates that share many nodes but differ in edges. To illustrate this feature, we designed a scenario where a real target network and random candidates had the same nodes and edge counts, but different edges. We simulated positive and negative datasets where genetic associations were enriched in the target network and random candidates respectively, and then tested enrichment of the target network on all datasets. As expected, only RSS-NET can reliably distinguish true enrichments of the target network from enrichments of its edge-altered counterparts (Fig. 3d; Supplementary Fig. 6).
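One simple way to build such an edge-altered candidate is sketched below. This is an assumption-laden illustration rather than the procedure used in the study: the function name is hypothetical, edges are unweighted (TF, TG) pairs, every original edge is replaced, and the sketch draws from the original TF and TG pools without guaranteeing that each node keeps at least one edge.

```python
import random
from typing import Set, Tuple

Edge = Tuple[str, str]  # (TF, TG)

def edge_altered_network(edges: Set[Edge], seed: int = 0) -> Set[Edge]:
    """Random TF->TG edge set with the same TF/TG pools and edge count, sharing no edge."""
    rng = random.Random(seed)
    tfs = sorted({tf for tf, _ in edges})
    tgs = sorted({tg for _, tg in edges})
    # Candidate edges are all TF-TG pairs that are absent from the real network.
    candidates = [(tf, tg) for tf in tfs for tg in tgs if (tf, tg) not in edges]
    if len(candidates) < len(edges):
        raise ValueError("not enough unused TF-TG pairs to replace every edge")
    return set(rng.sample(candidates, len(edges)))

real_edges = {("TF_A", "tg_x"), ("TF_A", "tg_y"), ("TF_B", "tg_z")}
altered_edges = edge_altered_network(real_edges, seed=1)
```

A method that annotates SNPs only by proximity to network nodes sees nearly the same input for the real and the altered network, which is why edge information is needed to tell them apart.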

To benchmark its prioritization component, we compared RSS-NET with gene-based association modules in RSS-E16 and Pascal26 (Fig. 4; Supplementary Figs. 7-9). Consistent with previous work16, RSS methods outperform Pascal methods even without network enrichment (Fig. 4a). This is because RSS-NET and RSS-E exploit a multiple regression framework25 to learn the genetic architecture from data of all genes and assess their effects jointly, whereas Pascal only uses data of a single gene to estimate its effect. Similar to enrichment simulations (Fig. 2), RSS-NET outperforms RSS-E in prioritizing genes across different enrichment patterns (Fig. 4b-d). This again highlights the flexibility of RSS-NET.

Fig 4: Power of RSS-NET to identify gene-based associations from GWAS summary statistics.

We used a B cell-specific regulatory network and real genotypes of 348,965 genome-wide SNPs to simulate individual-level GWAS data under four scenarios: a θ = 0, σ2 = 0; b θ > 0, σ2 = 0; c θ = 0, σ2 > 0; d θ > 0, σ2 > 0. We computed the corresponding single-SNP summary statistics on which we compared RSS-NET with gene-based association components of RSS-E16 and Pascal26. RSS-E is a special case of RSS-NET assuming σ2 = 0, and RSS-E-baseline is a special case of RSS-E assuming θ = 0. Pascal includes two gene scoring options: maximum-of-χ2 (-max) and sum-of-χ2 (-sum). Given a network, Pascal and RSS-E-baseline do not leverage any network information, RSS-E ignores the edge information, and RSS-NET exploits the full topology. Each scenario contains 200 datasets and each dataset contains 16,954 autosomal protein-coding genes for testing. We defined a gene as “trait-associated” if at least one SNP j within 100 kb of the transcribed region of this gene had non-zero effect (βj ≠ 0). For each gene in each dataset, RSS methods produced posterior probabilities that the gene was trait-associated (P1), whereas Pascal methods produced association P-values; these statistics were used to rank the significance of gene-level associations. The upper half of each panel displays ROC curves and AUROCs for all methods, with dashed diagonal lines indicating random performance (AUROC=0.5). The lower half of each panel displays precision-recall (PRC) curves and areas under PRC curves (AUPRCs) for all methods, with dashed horizontal lines indicating random performance. For AUROC and AUPRC, a higher value indicates better performance. Simulation details and additional results are provided in Supplementary Figures 7-8.
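The gene-level ground truth defined in this caption can be expressed as a short check. The sketch below is illustrative only: the function name, the coordinate convention (base-pair positions on a single chromosome) and the toy values are assumptions, not the simulation code.

```python
from typing import List, Tuple

def gene_is_trait_associated(
    gene_start: int,
    gene_end: int,
    snps: List[Tuple[int, float]],  # (position, simulated effect beta_j), same chromosome
    window: int = 100_000,          # 100 kb flank around the transcribed region
) -> bool:
    """True if at least one SNP within the window has a non-zero simulated effect."""
    lower, upper = gene_start - window, gene_end + window
    return any(lower <= pos <= upper and beta != 0.0 for pos, beta in snps)

# Hypothetical gene spanning 1,000,000-1,050,000 bp with three simulated SNPs.
toy_snps = [(850_000, 0.2), (950_000, 0.0), (1_120_000, -0.05)]
toy_label = gene_is_trait_associated(1_000_000, 1_050_000, toy_snps)
```

Here the first SNP lies outside the 100 kb window and the second has zero effect, so the label is driven by the third SNP alone.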

Finally, since RSS-NET uses a regulatory network as is, and most networks to date are algorithmically inferred, we performed simulations to assess the robustness of RSS-NET under noisy networks. Specifically, we simulated datasets from a real target network, created noisy networks by randomly removing edges from this real target, and then used the noisy networks (rather than the real one) in RSS-NET analyses. By exploiting retained true nodes and edges, RSS-NET produces reliable results in identifying both network enrichments and genetic associations, and unsurprisingly, its performance drops as the noise level increases (Supplementary Fig. 10).
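The edge-removal step can be pictured as follows. This is a minimal sketch under our own assumptions: the function name is hypothetical, the noise level is expressed as a fraction of edges removed uniformly at random, and edge weights are ignored.

```python
import random
from typing import Set, Tuple

def remove_random_edges(edges: Set[Tuple[str, str]], fraction: float, seed: int = 0) -> Set[Tuple[str, str]]:
    """Return a noisy copy of the network with a given fraction of edges removed."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)
    ordered = sorted(edges)  # fixed order so that the seed gives reproducible output
    n_keep = len(ordered) - round(fraction * len(ordered))
    return set(rng.sample(ordered, n_keep))

toy_network = {(f"TF{i}", f"tg{i}") for i in range(10)}
noisy_network = remove_random_edges(toy_network, fraction=0.3, seed=42)
```

Every edge retained in the noisy network is a true edge of the target, which matches the setting described above where only removals, not spurious additions, are simulated.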

In conclusion, RSS-NET is adaptive to various genetic architectures and enrichment patterns, it is robust to a wide range of model mis-specification, and it outperforms existing related methods. To further investigate its real-world utility, we applied RSS-NET to analyze 18 complex traits and 38 regulatory networks.

Enrichment analyses of 38 networks across 18 traits

We first inferred20 whole-genome regulatory networks for 38 human cell types and tissues (Methods; Supplementary Table 1) from public data29 of paired expression and chromatin accessibility (PECA). On average each network has 431 TFs, 3,298 TGs and 93,764 TF-TG weighted edges. Clustering showed that networks recapitulated context similarity, with immune cells and brain regions grouping together as two units (Fig. 5a; Supplementary Fig. 11).

Fig 5: RSS-NET analyses of 18 complex traits and 38 regulatory networks.

a Clustering of 38 regulatory networks based on t-distributed stochastic neighbour embedding. Details are provided in Supplementary Figure 11. b Similarity between a given tissue-specific PECA-based network and 394 CAGE-based networks for various cell types and tissues (a: adult samples; c: cell lines; f: fetal samples). The similarity between a PECA- and CAGE-based network is summarized by Jaccard indices of their node sets (x-axis) and edge sets (y-axis). Additional results are provided in Supplementary Figure 12. c Ternary diagram showing, for each trait, percentages of the “best” enrichment model (showing the largest BF) as M11: θ > 0, σ2 = 0, M12: θ = 0, σ2 > 0 and M13: θ > 0, σ2 > 0 across networks. See Supplementary Table 6 for numerical values. Shown are 16 traits that had multiple networks more enriched than the near-gene control. d Comparison of context-matched PECA-based (y-axis) and CAGE-based (x-axis) network enrichments on the same GWAS data. Dashed lines have slope 1 and intercept 0. Additional results are provided in Supplementary Figure 14. e Median proportion of genes with enrichment-informed P1 higher than reference estimates (P1 under M0 or under M1 for the near-gene control), among genes with reference estimates higher than a given cutoff. Medians are evaluated among 16 traits that had multiple networks more enriched than the near-gene control. See Supplementary Table 8 for numerical values. f-g Overlap of RSS-NET prioritized genes with genes implicated in knockout mouse phenotypes38 (f) and human Mendelian diseases39,40 (g). An edge indicates that a category of knockout mouse or Mendelian genes is significantly enriched for genes prioritized for a GWAS trait (FDR ≤ 0.1). Thicker edges correspond to stronger enrichment odds ratios. To simplify visualization, only top-ranked categories are shown for each trait (f: 3; g: 2). See Supplementary Tables 12-13 for full results. Trait abbreviations are defined in Supplementary Table 2.

As a validation, we assessed the pairwise similarity between the 38 PECA-based networks and 394 human cell type- and tissue-specific regulatory networks18 reconstructed from independent cap analysis of gene expression (CAGE) data7,8. As expected, PECA- and CAGE-based networks often reached maximum overlap when derived from biosamples of matched cell or tissue types (Fig. 5b; Supplementary Fig. 12), showing that the context specificity of PECA-based networks is replicable.
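The node-set and edge-set overlaps used in this comparison (Fig. 5b) are Jaccard indices, which can be computed as in the sketch below. The toy networks and variable names are hypothetical, and edges are treated as unweighted (TF, TG) pairs for simplicity.

```python
from typing import Hashable, Set

def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical PECA-based and CAGE-based networks for a matched tissue.
peca_edges = {("TF1", "g1"), ("TF1", "g2"), ("TF2", "g3")}
cage_edges = {("TF1", "g1"), ("TF2", "g3"), ("TF3", "g4")}
peca_nodes = {node for edge in peca_edges for node in edge}
cage_nodes = {node for edge in cage_edges for node in edge}

node_similarity = jaccard(peca_nodes, cage_nodes)
edge_similarity = jaccard(peca_edges, cage_edges)
```

Two networks can share most nodes yet few edges, so reporting both indices separately, as in Fig. 5b, distinguishes node-level from topology-level agreement.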

On the 38 networks, we applied RSS-NET to analyze 1.1 million common SNPs31 for 18 traits, using GWAS summary statistics from 20,883 to 253,288 European-ancestry individuals (Supplementary Table 2) and LD estimates from the European panel of 1000 Genomes Project30. For each trait-network pair we computed a BF assessing network enrichment. Full results of 684 trait-network pairs are available online (Data Availability).

To check whether observed enrichments could be driven by general regulatory enrichments, we created a “near-gene” control network with 18,334 protein-coding autosomal genes as nodes and no edges, and then analyzed this control with RSS-NET on the same GWAS data. For most traits, the near-gene control has substantially weaker enrichment than the actual networks. In particular, 512 out of 684 trait-network pairs (one-sided binomial P = 2.2 × 10^−40) showed stronger enrichments than their near-gene counterparts (average log10 BF increase: 13.94; one-sided t P = 5.1 × 10^−15), and 16 out of 18 traits had multiple networks more enriched than the near-gene control (minimum: 5; one-sided Wilcoxon P = 1.2 × 10^−4). In contrast, LDSC and Pascal methods identified fewer trait-network pairs passing the near-gene enrichment control (LDSC maximum: 389, one-sided χ2 P = 1.7 × 10^−12; Pascal maximum: 69, P = 2.0 × 10^−129; Supplementary Table 3). Consistent with simulations (Fig. 3a-b), these results indicate that network enrichments identified by RSS-NET are unlikely to be driven by generic regulatory enrichments harbored in the vicinity of genes.
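To make the one-sided binomial test concrete, the sketch below computes an exact upper-tail probability for 512 successes out of 684 trials. The null success probability of 0.5 (a trait-network pair is equally likely to be more or less enriched than its near-gene control) is our assumption, as the text does not state the null explicitly; the function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(successes: int, trials: int) -> float:
    """Exact P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    # Sum the binomial coefficients as exact integers, then divide once,
    # which avoids underflow in the individual tail terms.
    favourable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favourable / 2 ** trials

p_value = binomial_upper_tail(512, 684)
print(f"one-sided binomial P = {p_value:.2e}")
```

Working with exact integers is a deliberate choice here: tail probabilities this small are easy to lose to floating-point underflow when each term is computed as a float.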

Among 512 trait-network pairs passing the near-gene enrichment control, we further examined whether the observed enrichments could be confounded by network properties or genomic annotations. We did not observe any correlation between BFs and three network features (proportion of SNPs in a network: Pearson R = −3.0 × 10^−2, P = 0.49; node counts: R = −5.4 × 10^−2, P = 0.23; edge counts: R = −9.2 × 10^−3, P = 0.84). To check confounding effects of genomic annotations, we computed the correlation between BFs and proportions of SNPs falling into both a network and each of 73 functional categories27, and we did not find any significant correlation (−0.13 < R < −0.01, P > 0.05/73). Similar patterns hold for all 684 trait-network pairs (Supplementary Tables 4-5). Altogether, the results suggest that observed enrichments are unlikely to be driven by generic network or genome features.

For each trait-network pair, we also computed BFs comparing the baseline (M0) against three disjoint models where enrichment (M1) was contributed by (1) network genes and REs only (M11: θ > 0, σ2 = 0); (2) TF-TG edges only (M12: θ = 0, σ2 > 0); (3) network genes, REs and TF-TG edges (M13: θ > 0, σ2 > 0). We found that M13 was the model most supported by data (with the largest BF) for 411 out of 512 trait-network pairs (one-sided binomial P = 1.2 × 10^−45), highlighting the key role of TF-TG edges in driving enrichments. To further confirm this finding, we repeated RSS-NET analyses by fixing all TF-TG edge weights as zero (vgt = 0) and we observed substantially weaker enrichments (average log10 BF decrease: 30.46; one-sided t P = 8.6 × 10^−35; Supplementary Fig. 13). Together the results corroborate the “omnigenic” model that genetic signals of complex traits are distributed via regulatory interconnections22.

When stratifying results by traits, however, we found that enrichment patterns varied considerably (Fig. 5c; Supplementary Table 6). For type 2 diabetes (T2D), two of five networks passing the near-gene enrichment control showed the strongest support for M11. Many networks showed the strongest support for M12 in breast cancer (10), body mass index (BMI, 14), waist-hip ratio (37) and schizophrenia (38). Since one rarely knows the true enrichment patterns a priori, and M1 includes {M11,M12,M13} as special cases, we used M1-based BFs throughout this study. Collectively, these results highlight the heterogeneity of network enrichments across complex traits, which can be potentially learned from data by flexible approaches like RSS-NET.

Top-ranked enrichments recapitulated many trait-context links reported in previous GWAS. Genetic associations with BMI were enriched in the networks of pancreas (BF = 2.07 × 10^13), bowel (BF = 8.02 × 10^12) and adipose (BF = 4.73 × 10^12), consistent with the roles of obesity-related genes in insulin biology and energy metabolism. Networks of immune cells showed enrichments for rheumatoid arthritis (RA, BF = 2.95 × 10^60), inflammatory bowel disease (IBD, BF = 5.07 × 10^35) and Alzheimer’s disease (BF = 8.31 × 10^26). Networks of cardiac and other muscle tissues showed enrichments for coronary artery disease (CAD, BF = 9.78 × 10^28), atrial fibrillation (AF, BF = 8.55 × 10^14), and heart rate (BF = 2.43 × 10^7). Other examples include brain network with neuroticism (BF = 2.12 × 10^19), and liver network with high- and low-density lipoprotein (HDL, BF = 2.81 × 10^21; LDL, BF = 7.66 × 10^27).

Some top-ranked enrichments were not identified in the original GWAS, but they are biologically relevant. For example, natural killer (NK) cell network showed the strongest enrichment among 38 networks for BMI (BF = 3.95 × 10^13), LDL (BF = 5.18 × 10^30) and T2D (BF = 1.49 × 10^77). This result supports a recent mouse study32 revealing the role of NK cells in obesity-induced inflammation and insulin resistance, and adds to the considerable evidence unifying metabolism and immunity in many pathological states33. Other examples include adipose network with CAD34 (BF = 1.67 × 10^29), liver network with Alzheimer’s disease16,35 (BF = 1.09 × 10^20) and monocyte network with AF36,37 (BF = 4.84 × 10^12).

Some networks show enrichments in multiple traits. To assess network co-enrichments among traits, we tested correlations for all trait pairs using their BFs of 38 networks (Supplementary Table 7). In total 29 of 153 trait pairs were significantly correlated (P < 0.05/153). Reassuringly, subtypes of the same disease showed strongly correlated enrichments, as in IBD subtypes (R = 0.96, P = 1.3 × 10^−20) and CAD subtypes (R = 0.90, P = 3.3 × 10^−14). The results also recapitulated known genetic correlations including RA with IBD (R = 0.79, P = 5.3 × 10^−9) and neuroticism with schizophrenia (R = 0.73, P = 1.6 × 10^−7). Network enrichments of CAD were correlated with enrichments of its known risk factors such as heart rate (R = 0.75, P = 5.1 × 10^−8), BMI (R = 0.71, P = 5.1 × 10^−7), AF (R = 0.65, P = 9.2 × 10^−6) and height (R = 0.64, P = 1.6 × 10^−5). Network enrichments of Alzheimer’s disease were strongly correlated with enrichments of LDL (R = 0.90, P = 2.6 × 10^−14) and IBD (R = 0.78, P = 8.3 × 10^−9), consistent with roles of lipid metabolism and inflammation in Alzheimer’s disease35. Genetic correlations among traits are not predictive of correlations based on network enrichments (R = 0.12, P = 0.18), suggesting the additional explanatory power from regulatory networks to reveal trait similarities in GWAS.
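The bookkeeping behind this co-enrichment analysis can be sketched as follows: 18 traits give 153 unordered trait pairs, which sets the Bonferroni threshold, and each pair is scored by a Pearson correlation across networks. The function name and toy vectors are hypothetical, and whether correlations were computed on raw or log-scale BFs is not stated here, so the log10 scale in the example is an assumption.

```python
from math import comb, sqrt
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

n_traits = 18
n_pairs = comb(n_traits, 2)            # number of unordered trait pairs
bonferroni_threshold = 0.05 / n_pairs  # per-pair significance threshold

# Hypothetical log10 BFs of two traits across five networks.
trait_a = [12.1, 3.4, 7.8, 0.5, 9.9]
trait_b = [10.7, 2.9, 8.1, 1.2, 9.0]
r_ab = pearson_r(trait_a, trait_b)
```

In the actual analysis each vector would have 38 entries, one per network, and the P-value attached to each correlation would be compared against the threshold above.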

To show that RSS-NET can be applied more generally, we analyzed the CAGE-based networks18 of 20 cell types and tissues that were also present in 38 PECA-based networks (Fig. 5d; Supplementary Fig. 14). PECA-based networks often produced larger BFs than their CAGE-based counterparts on the same GWAS data (average log10 BF increase: 17.36; one-sided t P = 1.4 × 10^−11), suggesting that PECA-based networks are more enriched in genetic signals. Reassuringly, PECA- and CAGE-based networks consistently highlighted known trait-context links (e.g. immune cells and autoimmune diseases, muscle tissues and heart diseases). For some traits PECA-based networks produced more informative results. For example, CAGE-based analysis of HDL showed a broad enrichment pattern across cell types and tissues (consistent with previous connectivity analysis18 of the same data), whereas PECA-based analysis identified liver as the top-enriched context by a wide margin. Although not our main focus, these results highlight the potential for RSS-NET to systematically evaluate different network inferences in GWAS.

Enrichment-informed prioritization of network genes

A key feature of RSS-NET is that inferred network enrichments automatically contribute to prioritizing associations of network genes (Methods). Specifically, for each locus RSS-NET produces $P_1^{\text{base}}$, $P_1^{\text{near}}$ and $P_1^{\text{net}}$, the posterior probability that at least one SNP in the locus is associated with the trait, assuming M0, M1 for the near-gene control network, and M1 for a given network, respectively. When multiple networks are enriched, RSS-NET produces $P_1^{\text{bma}}$ by averaging $P_1^{\text{net}}$ over all networks passing the near-gene control, weighted by their BFs. This allows us to assess genetic associations in light of enrichment without having to select a single enriched network. Differences between enrichment estimates ($P_1^{\text{net}}$, $P_1^{\text{bma}}$) and reference estimates ($P_1^{\text{base}}$, $P_1^{\text{near}}$) reflect the impact of a network on a locus.

RSS-NET enhances genetic association detection by leveraging inferred enrichments. To quantify this improvement, for each trait we calculated the proportion of genes with higher $P_1^{\text{bma}}$ than reference estimates ($P_1^{\text{base}}$ or $P_1^{\text{near}}$), among genes with reference P1 passing a given cutoff (Fig. 5e). When using $P_1^{\text{base}}$ as reference, we observed high proportions of genes with $P_1^{\text{bma}} > P_1^{\text{base}}$ (median: 82 – 98%) across a wide range of $P_1^{\text{base}}$-cutoffs (0 – 0.9), and as expected, the improvement decreased as the reference cutoff increased. When using $P_1^{\text{near}}$ as reference, we observed fewer genes with improved P1 than when using $P_1^{\text{base}}$ (one-sided Wilcoxon P = 9.8 × 10−4), suggesting the observed improvement might be partially due to general near-gene enrichments, but proportions of genes with $P_1^{\text{bma}} > P_1^{\text{near}}$ remained high (median: 74 – 94%) nonetheless. Similar patterns occurred when we repeated the analysis with $P_1^{\text{net}}$ across 512 trait-network pairs (Supplementary Table 8). Together the results demonstrate the strong influence of network enrichments on nominating additional trait-associated genes.
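The proportion described above could be computed along these lines. This is an illustrative sketch only: the function name is hypothetical, and treating "passing a given cutoff" as "greater than or equal to" is an assumption.

```python
def proportion_improved(p1_enrich, p1_ref, cutoff):
    """Fraction of genes whose enrichment-informed P1 exceeds the reference P1,
    restricted to genes whose reference P1 is at least `cutoff`.

    p1_enrich, p1_ref: per-gene posterior probabilities in the same gene order.
    Returns NaN when no gene passes the cutoff.
    """
    kept = [(e, r) for e, r in zip(p1_enrich, p1_ref) if r >= cutoff]
    if not kept:
        return float("nan")
    return sum(e > r for e, r in kept) / len(kept)
```

Sweeping `cutoff` from 0 to 0.9 and taking the median over traits would give summaries of the kind reported above.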

RSS-NET tends to promote more genes in networks with stronger enrichments. For each trait the proportion of genes with Embedded Image in a network is often positively correlated with its enrichment BF (R: 0.28 – 0.91; Supplementary Table 9). When a gene belongs to multiple networks, its highest Embedded Image often occurs in the top-enriched networks. We illustrate this coherent pattern with MT1G, a liver-active9 gene that was prioritized for HDL by RSS-NET and also implicated in a recent multi-ancestry genome-wide interaction analysis of HDL41. Although MT1G belongs to regulatory networks of 18 contexts, only the top enrichment in liver (BF = 2.81 × 1021) informs a strong association between MT1G and HDL Embedded Image, and remaining networks with weaker enrichments yield minimal improvement Embedded Image. Figure 6 shows additional examples.

RSS-NET recapitulates many genes implicated in the same GWAS. For each analyzed dataset we downloaded the GWAS-implicated genes from the GWAS Catalog1 and computed the proportion of these genes with high Embedded Image. With a stringent cutoff 0.9, we observed a significant overlap (median across traits: 69%; median Fisher exact P = 1.2 × 10−26; Supplementary Table 10). Reassuringly, many recapitulated genes are well-established for the traits (Supplementary Table 11), such as CACNA1C for schizophrenia, TCF7L2 for T2D, APOB for lipids and STAT4 for autoimmune diseases.
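An overlap test of this kind can be sketched with a hypergeometric tail. The paper reports Fisher exact P-values but does not say whether they are one- or two-sided, so the one-sided (enrichment) form below, the choice of gene universe and the function name are all assumptions.

```python
from math import comb


def overlap_pvalue(n_universe, n_set_a, n_set_b, n_overlap):
    """One-sided Fisher exact P-value for the overlap of two gene sets.

    Probability of seeing at least `n_overlap` shared genes when a set of
    size `n_set_b` is drawn at random from `n_universe` genes, of which
    `n_set_a` belong to the other set.
    """
    upper = min(n_set_a, n_set_b)
    tail = sum(
        comb(n_set_a, k) * comb(n_universe - n_set_a, n_set_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_set_b)
```

Here `n_universe` would be the number of genes analyzed, `n_set_a` the GWAS-implicated genes, `n_set_b` the RSS-NET prioritized genes, and `n_overlap` the genes in both.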

RSS-NET also uncovers putative associations that were not reported in the same GWAS. To demonstrate that many of these new associations are potentially real, we exploited 15 analyzed traits that each had an updated GWAS with larger sample size. In each case we obtained newly implicated genes from the GWAS Catalog1 and computed the proportion of these genes that were identified by RSS-NET Embedded Image. The overlap proportions remained significant (median: 12%; median Fisher exact P = 1.9 × 10−5; Supplementary Table 10), showing the potential of RSS-NET to identify trait-associated genes that can be validated by later GWAS with additional samples. Among these validated genes, many are strongly supported by multiple lines of external evidence. A particular example is NR0B2, a liver-active9 gene prioritized for HDL (BF = 2.81 × 1021, Embedded Image), which was not identified by standard GWAS43 of the same data (minimum single-SNP P = 1.4 × 10−7 within 100 kb, n = 99,900). NR0B2 was associated with mouse lipid traits44–46 and human obesity47, and identified in a later GWAS of HDL48 with doubled sample size (P = 9.7 × 10−16, n = 187,056). Table 1 lists additional examples.

Table 1

Examples of RSS-NET highlighted genes that were not reported in GWAS of the same data (P ≥ 5×10−8) but were implicated in later GWAS with increased sample sizes (P < 5×10−8). The “mouse trait” column is based on the Mouse Genome Informatics38. The “therapeutic/clinical evidence” column is based on the Online Mendelian Inheritance in Man39 and Therapeutic Target Database42. Click blue links to view details online. Drugs are highlighted in yellow. Abbreviations of GWAS traits are defined in Supplementary Table 2. GEJ: gastroesophageal junction; CS: cardiovascular system; DS: digestive/alimentary system; Metab.: metabolism; NS: nervous system.

Biological and clinical relevance of prioritized genes

Besides looking up overlaps with GWAS publications, we cross-referenced RSS-NET prioritized genes Embedded Image with multiple orthogonal databases to systematically assess their biological and therapeutic themes.

Mouse phenomics provides important resources to study genetics of human traits49. Here we evaluated overlap between RSS-NET prioritized genes and genes implicated in 27 categories of knockout mouse phenotypes38. Network-informed genes Embedded Image were significantly enriched in 128 mouse-human trait pairs (FDR ≤ 0.1; Supplementary Table 12). Fewer significant pairs were identified without network information (119 for Embedded Image; 80 for Embedded Image). For many human traits, top enrichments of network-prioritized genes occurred in closely related mouse phenotypes (Fig. 5f). Schizophrenia-associated genes were strongly enriched in nervous, neurological and growth phenotypes (OR: 1.77 – 2.04). Genes prioritized for autoimmune diseases were strongly enriched in immune and hematopoietic phenotypes (OR: 2.05 – 2.35). The cardiovascular system showed strong enrichments of genes associated with heart conditions (OR: 2.45 – 2.92). The biliary system showed strong enrichments of genes associated with lipids, BMI, CAD and T2D (OR: 2.16 – 10.78). The phenotypically matched cross-species enrichments strengthen the biological relevance of RSS-NET results.

Mendelian disease-causing genes often contribute to complex traits50. Here we quantified overlap between RSS-NET prioritized genes and genes causing 19 categories40 of Mendelian disorders39. Leveraging regulatory networks Embedded Image, we observed 47 significantly enriched Mendelian-complex trait pairs (FDR ≤ 0.1; 44 for Embedded Image; 31 for Embedded Image; Supplementary Table 13), among which the top-ranked ones were often phenotypically matched (Fig. 5g). Schizophrenia-associated genes were strongly enriched in Mendelian development and psychiatric disorders (OR: 2.22 – 2.23). Genes prioritized for AF and heart rate were strongly enriched in arrhythmia (OR: 7.16 – 8.28). Genes prioritized for autoimmune diseases were strongly enriched in monogenic immune dysregulation (OR: 3.11 – 4.32). Monogenic cardiovascular diseases showed strong enrichments of genes associated with lipids and heart conditions (OR: 2.69 – 3.70). We also identified pairs where Mendelian and complex traits seemed unrelated but were indeed linked. Examples include Alzheimer’s disease with immune dysregulation35 (OR = 7.32) and breast cancer with insulin disorders51 (OR = 9.71). The results corroborate that Mendelian and complex traits exist on a continuum.

Human genetics has proven valuable in therapeutic development52. To evaluate their potential in drug discovery, we examined whether RSS-NET prioritized genes are pharmacologically active targets with known clinical indications42. We identified genes with perfectly matched drug indications and GWAS traits. The most illustrative identical match is EDNRA, a gene that is prioritized for CAD (Embedded Image in aorta network), and is also a successful target of approved drugs for cardiovascular diseases (Table 1). We identified genes with closely related drug indications and GWAS traits. For example, TTR is prioritized for Alzheimer’s disease Embedded Image, and is also a successful target of approved drugs for amyloidosis (Table 2). For early-stage development, overlaps between drug indications and GWAS traits may provide additional genetic confidence. For example, HCAR3 is prioritized for HDL Embedded Image, and is also a clinical trial target for lipid metabolism disorders (Table 2). Other examples include CASP8 with cancer, NFKB2 with IBD, and DLG4 with stroke (Tables 1–2). We also found mismatches between drug indications and GWAS traits, which could suggest drug repurposing opportunities53. For example, CSF3 is prioritized for AF Embedded Image, and is also a successful target of an approved drug for aplastic anemia (AA). Since CSF3 is associated with various blood cell traits in mouse54 and human55, and inflammation plays a role in both AA and AF etiology36,37,56, it is tempting to assess effects of the approved AA drug on AF. Mechanistic evaluations are required to understand the prioritized therapeutic genes, but they could form a useful basis for future studies.

Table 2

Examples of RSS-NET highlighted genes that have not reached genome-wide significance in the GWAS Catalog1 (P ≥ 5×10−8) at the time of analysis. The rest is the same as Table 1.

Discussion

We present RSS-NET, a new topology-aware method for integrative analysis of regulatory networks and GWAS summary data. We demonstrate the improvement of RSS-NET over existing methods through extensive simulations, and illustrate its potential to yield novel insights via analyses of 38 networks and 18 traits. With multi-omics integration becoming routine in GWAS, we expect that researchers will find RSS-NET useful.

Compared with existing integrative approaches, RSS-NET has several key strengths. First, unlike many methods that require loci passing a significance threshold11,12,17, RSS-NET uses data from genome-wide common variants. This potentially allows RSS-NET to identify subtle enrichments even in studies with few significant hits. Second, RSS-NET models enrichments directly as increased rates (θ) and sizes (σ2) of SNP-level associations, and thus bypasses the issue of converting SNP-level summary data to gene-level statistics17,18,26. Third, RSS-NET inherits from RSS-E16 an important feature that inferred enrichments automatically highlight which network genes are most likely to be trait-associated. This prioritization component, though useful, is missing in current polygenic analyses13,15,24,27. Fourth, by making flexible modeling assumptions, RSS-NET is adaptive to unknown genetic and enrichment architectures.

RSS-NET provides a new view of complex trait genetics through the lens of regulatory topology. Complementing previous connectivity analyses17–19,24, RSS-NET highlights a consistent pattern where genetic signals of complex traits often distribute across the genome via regulatory topology. RSS-NET further leverages topology enrichments to enhance trait-associated gene discovery. The topology awareness of RSS-NET relies on a novel model that decomposes the effect size of a single SNP into effects of multiple (cis or trans) genes through a regulatory network. Other than a theoretical perspective22, we are not aware of any publication implementing the topology-aware model in practice.

RSS-NET depends critically on the quality of input regulatory networks. The more accurate the networks are, the better RSS-NET performs. Currently our understanding of regulatory networks remains incomplete, and most available networks are algorithmically inferred17–20. Artifacts in inferred networks can bias RSS-NET results; however, our simulations confirm the robustness of RSS-NET when input networks do not deviate severely from the ground truth. The modular design of RSS-NET enables systematic assessment of various networks in the same GWAS and provides interpretable performance metrics, as illustrated in our comparison of PECA- and CAGE-based networks. As more accurate networks become available in diverse cellular contexts, the performance of RSS-NET will be markedly enhanced.

Like any method, RSS-NET has several limitations in its current form. First, despite its prioritization feature, RSS-NET does not attempt to pinpoint associations to causal SNPs within prioritized loci. For this task we recommend off-the-shelf fine-mapping methods57. Second, the computation time of RSS-NET increases as the total number of analyzed SNPs increases, and thus our analyses focused on 1.1 million common SNPs31. Reducing this complexity will allow RSS-NET to analyze more SNPs jointly. Third, RSS-NET uses a simple method to derive SNP-gene relevance (cjg) from expression quantitative trait loci (eQTL). A more principled approach would be applying the RSS likelihood25 to eQTL summary data (as we did in GWAS) and using the estimated SNP effects to specify cjg. However, our initial assessments indicated that the model-based approach was limited by the small sample sizes of current eQTL studies9,10. With eQTL studies reaching large sample sizes58 comparable to current GWAS1, this approach may improve cjg specification in RSS-NET. Fourth, RSS-NET analyzes one network at a time. Since a complex disease typically manifests in various sites, multiple cellular networks are likely to mediate disease risk jointly. To extend RSS-NET to incorporate multiple networks, an intuitive idea would be representing the total effect of a SNP as an average of its effect size in each network, weighted by network relevance for a disease. Fifth, RSS-NET does not include known SNP-level13,24,27 or gene-level14–16 annotations. Although our mis-specification simulations and near-gene control analyses confirm that RSS-NET is robust to generic enrichments of known features, accounting for known annotations can help interpret observed network enrichments24. Our preliminary experiments, however, showed that incorporating additional networks or annotations in RSS-NET increased computation costs. Hence, we view developing computationally efficient multi-network, multi-annotation methods as an important area for future work.

In summary, improved understanding of human complex trait genetics requires biologically-informed models beyond the standard one employed in GWAS. By modeling tissue-specific regulatory topology, RSS-NET is a step forward in this direction.

Methods

Gene and SNP information

This study used genes and SNPs from the human genome assembly GRCh37. This study used 18,334 protein-coding autosomal genes (http://ftp.ensembl.org/pub/grch37/release-94/gtf/homo_sapiens, accessed January 3, 2019). Simulations used 348,965 genome-wide SNPs28 (https://www.wtccc.org.uk), and data analyses used 1,289,786 genome-wide HapMap331 SNPs (https://data.broadinstitute.org/alkesgroup/LDSCORE/w_hm3.snplist.bz2, accessed November 27, 2018). As discussed later, these SNP sets were chosen to reduce computation. This study also excluded SNPs on sex chromosomes, SNPs with minor allele frequency less than 1%, and SNPs in the human leukocyte antigen region.

GWAS summary statistics and LD estimates

The European-ancestry GWAS summary statistics (Supplementary Table 2) and LD estimates used in the present study were processed as in previous work16. Data sources and references are provided in Supplementary Notes.

Gene regulatory networks

In this study a regulatory network is a directed bipartite graph {VTF, VTG, ETF→TG}, where VTF denotes the node set of TFs, VTG denotes the node set of TGs, and ETF→TG denotes the set of directed TF-TG edges, summarizing how TFs regulate TGs through REs (Fig. 1b, Supplementary Notes). Each edge has a weight between 0 and 1, measuring the relative regulation strength of a TF on a TG.

Here we inferred 38 regulatory networks from context-matched high-throughput sequencing data of gene expression (e.g., RNA-seq) and chromatin accessibility (e.g., DNase-seq or ATAC-seq). We obtained these PECA data from portals of ENCODE29 (https://www.encodeproject.org, accessed December 14, 2018) and GTEx9 (https://gtexportal.org, accessed July 13, 2019); see Supplementary Table 1. The network-construction software and TF-motif information are available at https://github.com/suwonglab/PECA. The 38 networks are available at https://github.com/suwonglab/rss-net, with descriptive statistics provided in Supplementary Tables 14-16.

We first constructed an “omnibus” network from PECA data of 201 biosamples across 80 cell types and tissues, using a regression-based method20. In brief, by modeling the distribution of TG expression levels conditional on RE accessibility levels and TF expression levels, we estimated a regression coefficient for each TF-TG pair. We selected a TF-TG pair as the network edge if this estimated coefficient was significantly non-zero, and divided its estimate by the maximum of estimates for all TF-TG pairs to set a (0, 1)-scale edge weight. We also estimated a regression coefficient for each RE-TG pair, which reflected the regulating strengths of REs on TGs and was later used to construct context-specific networks (i.e., {Iit} in (1)). Here we defined REs as open chromatin peaks called from accessibility sequencing data by MACS259.

With the omnibus network in place, we then constructed context-specific networks for 5 immune cell types, 5 brain regions and 27 non-brain tissues. For each context (tissue or cell type), we computed a trans-regulation score (TRS) between TF g and TG t: Embedded Image where Rgt is the correlation of TF g and TG t expression levels across all contexts; Embedded Image are normalized context-specific expression (TF g, TG t) and accessibility (RE i) levels (Embedded Image, y: actual RE accessibility or gene expression level in the given context, ymed: median level across all contexts); Bgi reflects the motif binding strength of TF g on RE i, defined as the sum of motif position weight matrix-based log-odds probabilities of all binding sites on RE i (calculated by HOMER60); and Iit reflects the overall regulating strength of RE i on TG t, provided by the omnibus network above. TRS offers a natural way to rank and select context-specific TF-TG edges because a larger value of TRSgt indicates a stronger regulating strength of TF g on TG t in the given context. We further set (0,1)-scale TF-TG edge weights by computing log2(1 + TRSgt) / max(i,j){log2(1 + TRSij)}.
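The final scaling step can be illustrated in a few lines. This sketch assumes that TRS values are already computed and non-negative (the TRS formula itself is not reproduced here), and the dictionary layout and function name are hypothetical.

```python
from math import log2


def edge_weights(trs):
    """Scale TF-TG scores to (0, 1] via log2(1 + TRS) / max log2(1 + TRS).

    trs: dict mapping (tf, tg) -> non-negative TRS in one context
    (hypothetical layout).
    """
    logged = {edge: log2(1.0 + score) for edge, score in trs.items()}
    top = max(logged.values())
    if top <= 0:
        raise ValueError("at least one TRS must be positive")
    return {edge: value / top for edge, value in logged.items()}
```

The strongest edge in a context receives weight 1, and the log transform compresses large TRS values so that a few extreme scores do not push all other weights toward zero.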

To benchmark PECA-based networks and illustrate RSS-NET as a generally applicable tool, we analyzed 394 published human cell type- and tissue-specific TF-TG circuits18 inferred from independent CAGE data7,8 (http://regulatorycircuits.org/, accessed May 8, 2019). When evaluating the similarity between PECA- and CAGE-based networks (Fig. 5b, Supplementary Fig. 12), we used their full node and edge sets to compute Jaccard indices. When running RSS-NET on context-matched PECA- and CAGE-based networks (Fig. 5d, Supplementary Fig. 14), we selected top-ranked CAGE-based edges to match PECA-based edge counts (Supplementary Table 15) and normalized CAGE-based edge weights (Embedded Image, x: original weight) to match the scale of PECA-based edge weights (Supplementary Table 16).
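For reference, the Jaccard index of two node or edge sets is the size of their intersection divided by the size of their union. A minimal sketch follows; representing edges as (TF, TG) tuples is an assumption about data layout, not a description of the authors' pipeline.

```python
def jaccard(set_a, set_b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy example with hypothetical edges from two networks of the same context.
peca_edges = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")}
cage_edges = {("TF1", "G2"), ("TF2", "G3"), ("TF3", "G4")}
similarity = jaccard(peca_edges, cage_edges)
```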

External databases for cross-reference

To validate and interpret RSS-NET results, we used the following external databases (accessed November 28, 2019): GWAS Catalog1 (https://www.ebi.ac.uk/gwas/), Mouse Genome Informatics38 (http://www.informatics.jax.org/), phenotype-specific Mendelian gene sets40 (https://github.com/bogdanlab/gene_sets/), Online Mendelian Inheritance in Man39 (https://www.omim.org/), and Therapeutic Target Database42 (http://db.idrblab.net/ttd/).

When quantifying overlaps between RSS-NET prioritized genes and mouse or Mendelian genes, we used all genes for each GWAS trait. We repeated the overlap analysis under the same significance cutoff (FDR ≤ 0.1) after excluding genes that were implicated in the same or later GWAS. Since GWAS-implicated genes overlap significantly with phenotypically-matched mouse and Mendelian genes (median Fisher exact P = 7.1 × 10−7), we identified fewer discoveries as expected (mouse-human pairs: 26, Mendelian-complex pairs: 4; Supplementary Tables 12-13), but we obtained consistent odds ratio estimates nonetheless (mouse R = 0.78, P = 8.6 × 10−73; Mendelian R = 0.89, P = 9.0 × 10−74; Supplementary Fig. 15).

Network-induced effect size distribution

We model the total effect of SNP j on a given trait, βj, as $\beta_j \sim \pi_j \cdot \mathcal{N}(\mu_j, \sigma_0^2) + (1 - \pi_j) \cdot \delta_0$ (2), where πj denotes the probability that SNP j is associated with the trait (βj ≠ 0), $\mathcal{N}(\mu_j, \sigma_0^2)$ denotes a normal distribution with mean μj and variance $\sigma_0^2$ specifying the effect size of a trait-associated SNP j, and δ0 denotes point mass at zero (βj = 0).
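To make the mixture concrete, the sketch below draws one effect from a point-normal ("spike-and-slab") prior of the kind described above: a normal draw with probability πj, and exactly zero otherwise. The function name and argument names are hypothetical, and the slab standard deviation is passed in directly rather than derived from hyper-parameters.

```python
import random


def sample_effect(pi_j, mu_j, sd_j, rng):
    """Draw beta_j: N(mu_j, sd_j^2) with probability pi_j, else exactly 0."""
    if rng.random() < pi_j:
        return rng.gauss(mu_j, sd_j)
    return 0.0


# Toy usage: most SNPs get a zero effect when pi_j is small.
rng = random.Random(2020)
effects = [sample_effect(0.01, 0.0, 0.1, rng) for _ in range(1000)]
n_nonzero = sum(e != 0.0 for e in effects)
```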

We model the trait-association probability πj as $\pi_j = \left(1 + 10^{-(\theta_0 + a_j \theta)}\right)^{-1}$ (3), where θ0 < 0 captures the genome-wide background proportion of trait-associated SNPs, θ > 0 reflects the increase in probability, on the log10-odds scale, that a SNP near network genes and REs is trait-associated, and aj reflects the proximity of SNP j to a network. Following previous analyses15,16,24, we let aj = 1 if SNP j is within 100 kb of any member gene (TF, TG) or RE for a given network, and aj = 0 otherwise. The idea of (3) is that if a cell type or tissue plays an important role in a trait then genetic associations may occur more often in SNPs involved in the corresponding network and REs than expected by chance.
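Reading the description above as "the log10-odds of πj equal θ0 plus θ for SNPs near the network", the probability can be computed as below. This reading, the function name and the example values are assumptions made for illustration.

```python
def association_probability(theta0, theta, a_j):
    """Prior probability that a SNP is trait-associated.

    theta0: background log10-odds; theta: log10-odds increase for network
    SNPs; a_j: 1 if the SNP is near the network, 0 otherwise.
    """
    log10_odds = theta0 + a_j * theta
    return 1.0 / (1.0 + 10.0 ** (-log10_odds))


# Illustrative values only: a tenfold increase in odds for network SNPs.
background = association_probability(-2.0, 1.0, 0)
near_network = association_probability(-2.0, 1.0, 1)
```

With these illustrative values the odds rise from 1:100 to 1:10, which is what a one-unit increase on the log10-odds scale means.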

We model the mean effect size μj as $\mu_j = \sum_{g \in O_j} w_{jg} \gamma_{jg}$ (4), where Oj is the set of all nearby or distal genes contributing to the total effect of SNP j, wjg measures the relevance between SNP j and gene g, and γjg denotes the effect of SNP j on a trait due to gene g. We note that (4) provides a generic model to decompose the total effect of a SNP into effects of genes through {Oj, wjg}.

Here we use a TF-TG regulatory network to specify {Oj, wjg} in (4): Embedded Image where Gj is the set of all genes within 1 Mb window of SNP j (a standard cis-eQTL window size9,10,58), cjg measures the relative impact of a SNP j on gene g, Tg is the set of all genes directly regulated by TF g in a given network (Tg is empty if gene g is not a TF), and vgt measures the relative impact of a TF g on its TG t. Since a genome-wide analysis typically involves many SNPs and genes, we fix {Tg, vgt, cjg} to ensure the identifiability of (5). We use inferred edges and weights of a context-specific TF-TG network20,29 to specify Tg and vgt respectively. We use context-matched cis-eQTL9,10,58 to specify cjg (Supplementary Notes and Tables 17-18). The idea of (5) is that the true effect of a SNP may fan out through some regulatory network of multiple (nearby or distal) genes to affect the trait22.
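The fan-out idea can be sketched in code. The equation itself is not legible in this text, so the form used below is an assumption consistent with the definitions of Gj, cjg, Tg and vgt: each cis gene g contributes its own effect plus the effects of its target genes, weighted by vgt, and the whole contribution is scaled by cjg. All names and toy values are hypothetical.

```python
def network_mean_effect(cis_genes, c, targets, v, gamma):
    """Mean effect of one SNP under an ASSUMED fan-out form:
    sum over cis genes g of c[g] * (gamma[g] + sum over targets t of v[(g, t)] * gamma[t]).

    cis_genes: genes within the cis window of the SNP (G_j)
    c:         dict gene -> SNP-to-gene relevance (c_jg)
    targets:   dict TF -> list of its target genes (T_g); absent if not a TF
    v:         dict (tf, tg) -> edge weight (v_gt)
    gamma:     dict gene -> effect of this SNP due to that gene (gamma_jg)
    """
    total = 0.0
    for g in cis_genes:
        direct = gamma.get(g, 0.0)
        via_targets = sum(
            v[(g, t)] * gamma.get(t, 0.0) for t in targets.get(g, ())
        )
        total += c[g] * (direct + via_targets)
    return total


# Toy example: the SNP is cis to a TF (two targets) and to one non-TF gene.
mu_j = network_mean_effect(
    cis_genes=["TF1", "G2"],
    c={"TF1": 0.5, "G2": 1.0},
    targets={"TF1": ["T1", "T2"]},
    v={("TF1", "T1"): 0.8, ("TF1", "T2"): 0.2},
    gamma={"TF1": 1.0, "G2": 2.0, "T1": 1.0, "T2": -1.0},
)
```

With an empty `targets` dictionary the function reduces to a purely cis model, which mirrors a network with all genes as nodes and no edges.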

We model γjg, the random effect of SNP j due to gene g, as $\gamma_{jg} \sim \mathcal{N}(0, \sigma^2)$ (6), where the SNP-level subscript j in γjg ensures the exchangeability of βj in (2); see Supplementary Notes. We use a constant variance σ2 in (6) for computational convenience. (One could potentially improve (6) by letting σ2 depend on functional annotations13,27 of SNP j and/or context-specific expression14–16 of gene g, though possibly at higher computational cost.)

Combining (2), (4) and (6) yields a variance decomposition for SNP effect: Embedded Image
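The decomposition can be checked with the law of total variance. The derivation below is a sketch under stated assumptions: the slab variance in (2) is written as $\sigma_0^2$, the mean in (4) is a weighted sum of gene-level effects, and the gene-level effects in (6) are independent with mean zero and variance $\sigma^2$. The displayed equation (7) in the original may be written differently.

```latex
\begin{aligned}
\beta_j \mid (\beta_j \neq 0,\ \mu_j) &\sim \mathcal{N}(\mu_j,\ \sigma_0^2), \qquad
\mu_j = \sum_{g \in O_j} w_{jg}\,\gamma_{jg}, \qquad
\gamma_{jg} \overset{\text{iid}}{\sim} \mathcal{N}(0,\ \sigma^2), \\
\mathrm{Var}(\beta_j \mid \beta_j \neq 0)
&= \mathrm{E}\!\left[\mathrm{Var}(\beta_j \mid \mu_j)\right]
 + \mathrm{Var}\!\left(\mathrm{E}[\beta_j \mid \mu_j]\right)
 = \sigma_0^2 + \sigma^2 \sum_{g \in O_j} w_{jg}^2 .
\end{aligned}
```

Under these assumptions the first term is a background component shared by all trait-associated SNPs and the second is the component attributable to the network annotations.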

We hypothesize that (7) may provide an alternative approach to heritability analyses13,24,27 and we plan to investigate this idea elsewhere.

Bayesian hierarchical modeling

Consider a GWAS with n unrelated individuals measured on p SNPs. In practice we do not know the true SNP-level effects β := (β1, ..., βp)′ in (2), but we can infer them from GWAS summary statistics and LD estimates. Specifically, we perform Bayesian inference for β by combining the network-based prior (2)-(6) with the RSS likelihood25: $\hat{\beta} \sim \mathcal{N}(\hat{S}\hat{R}\hat{S}^{-1}\beta,\ \hat{S}\hat{R}\hat{S})$ (8), where $\hat{\beta} := (\hat{\beta}_1, \ldots, \hat{\beta}_p)'$, $\hat{S} := \mathrm{diag}(\hat{s})$ is a p × p diagonal matrix with diagonal elements being $\hat{s} := (\hat{s}_1, \ldots, \hat{s}_p)'$, $\hat{\beta}_j$ and $\hat{s}_j$ are the estimated single-SNP effect size of each SNP j and its standard error from the GWAS, and $\hat{R}$ is the p × p LD matrix estimated from a reference panel with ancestry matching the GWAS.

RSS-NET, defined by the hierarchical model (2)-(6) and (8), consists of four unknown hyper-parameters: Embedded Image. To specify hyper-priors, we first introduce two free parameters {η, ρ} ∈ [0,1] to re-parameterize Embedded Image: Embedded Image where, roughly, η represents the proportion of the total phenotypic variation explained by p SNPs, and ρ represents the proportion of total genetic variation explained by network annotations {Oj, wjg}. Because Embedded Image is roughly the ratio of phenotype variance to genotype variance, (9) ensures that SNP effects (β) do not rely on sample size n and have the same measurement unit as the trait. See Supplementary Notes for a rigorous derivation of (9).

We then place independent uniform grid priors on {θ0, θ, η, ρ} (Supplementary Table 19). These simple hyper-priors produce accurate posterior estimates for hyper-parameters in simulations (Supplementary Fig. 16). RSS-NET results are robust to grid choice in both simulated and real data (Supplementary Figs. 17-18). (If one had specific information about {θ0, θ, η, ρ} in a given setting then this could be incorporated in the hyper-priors.)

Network enrichment

To assess whether a regulatory network is enriched for genetic associations with a trait, we evaluate a Bayes factor (BF): $\mathrm{BF} := \frac{f(\hat{\beta} \mid \hat{S}, \hat{R}, a, O, W, M_1)}{f(\hat{\beta} \mid \hat{S}, \hat{R}, a, O, W, M_0)}$ (10), where f(·) denotes probability densities, a is defined in (3), {O, W} are defined in (4), M1 denotes the enrichment model where θ > 0 or σ2 > 0, and M0 denotes the baseline model where θ = 0 and σ2 = 0. The observed data are BF times more likely under M1 than under M0, and so the larger the BF, the stronger evidence for network enrichment. See Supplementary Notes for details of computing BF. To compute BFs used in Figure 5c, we replace M1 in (10) with three restricted enrichment models (M11, M12, M13). Unless otherwise specified, all BFs reported in this work are based on M1.

Given a BF cutoff, false positive rates vary considerably across genetic architectures and enrichment patterns in simulations (Supplementary Table 20). As the genetic basis of most complex traits remains unknown, we find it impractical to fix some significance threshold. Instead we recommend an adaptive approach. Specifically, for a given GWAS we run RSS-NET on a near-gene control network containing all genes as nodes and no edges (i.e., aj = 1 for all SNPs within 100 kb of any gene and vgt = 0 for all TF-TG pairs), and we use the resulting BF as the enrichment threshold in this GWAS. As shown in our analyses, this approach has three main advantages. First, it is adaptive to study heterogeneity such as differences in traits and sample sizes (Supplementary Table 2). Second, it accounts for generic regulatory enrichments of genetic signals residing near genes. Third, it facilitates comparisons with non-Bayesian methods based on P-values (Supplementary Table 3).

Locus association

To identify association between a locus and a trait, we compute P1, the posterior probability that at least one SNP in the locus is associated with the trait: $P_1 := 1 - \Pr(\beta_j = 0,\ \forall j \in \text{locus} \mid D)$ (11), where D is a shorthand for the input data of RSS-NET including GWAS summary statistics $(\hat{\beta}, \hat{S})$, LD estimates $(\hat{R})$ and network annotations {a, O, W}. See Supplementary Notes for details of computing P1. For a locus, $P_1^{\text{base}}$, $P_1^{\text{near}}$ and $P_1^{\text{net}}$ correspond to P1 evaluated under the baseline model M0, the enrichment model M1 for the near-gene control network, and M1 for a given TF-TG network. In this study we defined a locus as the transcribed region of a gene plus 100 kb up and downstream, and thus we used “locus” and “gene” interchangeably.

For K networks with enrichments stronger than the near-gene control, we use Bayesian model averaging (BMA) to compute $P_1^{\text{bma}}$ for each locus: $P_1^{\text{bma}} := \frac{\sum_{k=1}^{K} P_1^{\text{net}}(k) \cdot \mathrm{BF}(k)}{\sum_{k=1}^{K} \mathrm{BF}(k)}$ (12), where $P_1^{\text{net}}(k)$ and BF(k) are enrichment P1 and BF for network k. The ability to average across networks in (12) is an advantage of our Bayesian framework, because it allows us to assess associations in light of network enrichment without having to select a single enriched network.
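A BF-weighted average of this kind can be sketched as follows. Because BFs in this study span many orders of magnitude (for example, about 10^21 for liver and HDL), the sketch works with log10 BFs and rescales by the largest one before exponentiating. The function name, argument layout and the strict "greater than control" rule are assumptions.

```python
def bma_p1(p1_net, log10_bf, log10_bf_control):
    """BF-weighted average of per-network P1 for one locus.

    p1_net:           P1 of the locus under each network
    log10_bf:         log10 enrichment BF of each network (same order)
    log10_bf_control: log10 BF of the near-gene control network
    Returns None when no network exceeds the control.
    """
    kept = [(p, b) for p, b in zip(p1_net, log10_bf) if b > log10_bf_control]
    if not kept:
        return None
    top = max(b for _, b in kept)
    # Weights are proportional to BF; dividing by the largest BF avoids overflow.
    weights = [10.0 ** (b - top) for _, b in kept]
    return sum(p * w for (p, _), w in zip(kept, weights)) / sum(weights)
```

When one network has a far larger BF than the rest, the average is dominated by that network's P1, which is consistent with the behavior described for genes that belong to many networks.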

In this study we used P1 ≥ 0.9 as the significance cutoff, yielding a median false-positive rate 1.24 × 10−4 in simulations (Supplementary Table 21). We also highlighted genes with Embedded Image much larger than Embedded Image (Fig. 6 and Tables 1–2), because they showcase the influence of tissue-specific regulatory topology on prioritizing genetic associations.

Fig 6: RSS-NET gene prioritization results of select trait-network pairs.

In the left column, each dot represents a member gene of a given network. Dashed lines have slope 1 and intercept 0. In the center and right columns, each dot represents a network to which a select gene belongs. Numerical values of P1 and BF are available online (Data Availability).

Computation time

The total computational time of RSS-NET to analyze a pair of trait and network is determined by the number of genomewide SNPs analyzed, the size of hyper-parameter grid, and the number of variational iterations till convergence, all of which can vary considerably among studies. It is thus hard to make general statements about computational time. However, to give a specific example, we finished the analysis of 1,032,214 HapMap3 SNPs and liver network for HDL within 12 hours in a standard computer cluster (60 nodes, 8 CPUs and 32 Gb memory per node).

The number of genome-wide SNPs analyzed (p) affects the computation time of RSS-NET in two distinct ways. First, the per-iteration complexity of RSS-NET is linear with p (Supplementary Notes). Second, a large p defines a large optimization problem, often requiring many iterations to converge. To quantify the impact of p on computation time, we simulated datasets from different sets of genome-wide common SNPs, analyzed them with RSS-NET on identical computers, and compared the computation time (Supplementary Fig. 9). When p increased from 348,965 to 1,030,397, on average the total computation time was four times longer (one-sided Wilcoxon P = 8.0 × 10−132).

Simulation overview

To assess the new model for SNP effects (β) in RSS-NET, we simulated a large array of correctly- and mis-specified β for a given target network. Specifically, we generated “positive” datasets where the underlying β was simulated from M1 for the target network, and “negative” datasets where β was simulated from either M0 or the following scenarios: (1) random enrichments of near-gene SNPs; (2) random enrichments of near-RE SNPs; (3) MAF- and LD-dependent effect sizes; (4) M1 for edge-altered copies of the target network. For a fair comparison in each scenario, we matched positive and negative datasets by i) the number of trait-associated SNPs and ii) the proportion of phenotypic variation explained by all SNPs. Simulation details are provided in Supplementary Figures 1-9.

We combined the simulated β with genotypes of 348,965 genome-wide SNPs from 1,458 individuals28 to simulate phenotypes using an additive multiple-SNP model with Gaussian noise. For simulated individual-level data, we performed the standard single-SNP analysis to generate GWAS summary statistics, on which we compared RSS-NET with external methods.
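The two steps above (phenotype simulation, then single-SNP analysis) can be sketched with the standard library alone. This is an illustration under simplifying assumptions: genotypes are given as a small list of rows, no covariates are included, and the function names are hypothetical.

```python
import random
from math import sqrt


def simulate_phenotype(genotypes, beta, noise_sd, rng):
    """Additive multiple-SNP model: y_i = sum_j x_ij * beta_j + Gaussian noise."""
    return [
        sum(x * b for x, b in zip(row, beta)) + rng.gauss(0.0, noise_sd)
        for row in genotypes
    ]


def single_snp_stats(x, y):
    """Simple linear regression of y on one SNP.

    Returns the estimated effect size and its standard error, the two
    per-SNP summary statistics that summary-data methods take as input.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta_hat = sxy / sxx
    rss = sum((b - my - beta_hat * (a - mx)) ** 2 for a, b in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    return beta_hat, se
```

Running `single_snp_stats` on each genotype column in turn yields the vector of effect estimates and standard errors for a simulated dataset.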

External software for benchmarking

This study used the following software to benchmark RSS-NET: RSS-E (https://github.com/stephenslab/rss, accessed October 19, 2018), Pascal (https://www2.unil.ch/cbg/index.php?title=Pascal, accessed October 5, 2017) and LDSC with two sets of baseline annotations as covariates (version 1.0.0, https://github.com/bulik/ldsc; baseline model v1.1, https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_baseline_v1.1_ldscores.tgz; baselineLD model v2.1, https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_baselineLD_v2.1_ldscores.tgz; accessed November 27, 2018). Versions of all packages and files were up-to-date at the time of analysis.

Given a context-specific TF-TG network, RSS-E and LDSC methods use the same binary SNP-level annotations {aj} as defined in (3) of RSS-NET. The interface design of Pascal does not allow direct usage of {aj}. Here we supplied the Pascal program with a GMT file containing all member genes of the network and set the SNP-to-gene window sizes to 100 kb (‘-up=100000 -down=100000’). In this study all external methods were used with their default setups, and did not include the edge information of a network.
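For readers unfamiliar with the GMT gene-set format mentioned above, the sketch below writes a one-line GMT file: each line holds a set name, a description field, and then the member genes, all tab-separated. The helper name `write_gmt`, the set name `liver_network` and the gene identifiers are hypothetical placeholders; which gene identifier type Pascal expects is not stated here and should be checked against its documentation.

```python
import os
import tempfile


def write_gmt(path, gene_sets):
    """Write gene sets to a GMT file.

    gene_sets maps a set name to an iterable of gene identifiers.
    Each output line is: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    with open(path, "w") as handle:
        for name, genes in gene_sets.items():
            members = sorted(set(genes))  # drop duplicates, fix the order
            handle.write("\t".join([name, "na", *members]) + "\n")


# Hypothetical network with placeholder gene identifiers (one duplicate on purpose).
gmt_path = os.path.join(tempfile.mkdtemp(), "liver_network.gmt")
write_gmt(gmt_path, {"liver_network": ["GENE_B", "GENE_A", "GENE_C", "GENE_A"]})

with open(gmt_path) as handle:
    print(handle.read(), end="")
```

Because the file lists only member genes, it carries no TF-TG edge information, which is consistent with the statement that the external methods did not use network edges.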

RSS-E outputs the same statistics as RSS-NET (namely, BF and P1). Pascal implements two gene scoring methods, maximum-of-χ² and sum-of-χ², each producing gene-based association P-values. Given gene scores, Pascal provides two gene set scoring options, χ² approximation and empirical sampling, to produce enrichment P-values. LDSC methods output enrichment P-values and coefficient Z-scores, which produced consistent results in simulations (LDSC-baseline: R = 0.98, P = 1.2 × 10⁻⁶⁷; LDSC-baselineLD: R = 0.98, P = 9.1 × 10⁻⁶³; Supplementary Fig. 19). Because the enrichment P-values showed higher power in simulations (LDSC-baseline: average AUROC increase = 0.012, one-sided t P = 4.0 × 10⁻³; LDSC-baselineLD: average AUROC increase = 0.023, one-sided t P = 1.5 × 10⁻⁵), we used LDSC enrichment P-values throughout this study.
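The AUROC comparisons above rest on a simple idea: given an enrichment score for every positive and every negative simulated dataset, AUROC is the probability that a randomly chosen positive dataset scores higher than a randomly chosen negative one. The sketch below computes it by direct pair counting. The scores are made-up numbers for illustration (they could stand for log10 BF or -log10 P), and the function is a generic implementation, not code from any of the benchmarked packages.

```python
def auroc(pos_scores, neg_scores):
    """Area under the ROC curve by pair counting.

    Counts the fraction of (positive, negative) pairs in which the positive
    dataset has the higher score; ties count as half a win.
    """
    wins = 0.0
    for s in pos_scores:
        for t in neg_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical enrichment scores for three positive and three negative datasets.
positive = [3.2, 1.5, 0.4]
negative = [0.9, 0.1, 0.4]

print(f"AUROC = {auroc(positive, negative):.3f}")
```

Pair counting is quadratic in the number of datasets, which is fine at simulation scale; a rank-based formula gives the same value more efficiently for large inputs.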

Data availability

Network files used in this study are available at https://github.com/suwonglab/rss-net. Analysis results of this study are available at https://suwonglab.github.io/rss-net/results. Other data are specified in Methods and Supplementary Notes.

Code availability

The RSS-NET software is available at https://github.com/suwonglab/rss-net. Tutorials on installing and using RSS-NET are available at https://suwonglab.github.io/rss-net. Results of this study were generated with MATLAB version 9.3.0.713579 (R2017b), on a Linux system with Intel E5-2650V2 2.6 GHz and E5-2640V4 2.4 GHz processors. Other code is specified in Methods and Supplementary Notes.

Author contributions

X.Z. and W.H.W. conceived the study. X.Z. developed the methods and implemented the software. X.Z. conducted the simulation experiments. Z.D. provided the 38 regulatory networks. X.Z. performed the data analyses. X.Z. prepared the supplementary materials and online resources. X.Z. wrote the manuscript. X.Z. and W.H.W. revised the manuscript.

Acknowledgments

This study is supported by Stein Fellowship to X.Z. and NIH grants P50HG007735 and R01HG010359 to W.H.W. This study uses computational resources provided by the Stanford Research Computing Center. This study uses data generated by the WTCCC, 1000 Genomes, ENCODE, GTEx, DICE, eQTLGen and multiple GWAS consortia. We thank them for making their data publicly available (Supplementary Notes). We thank X. He for helpful comments on a draft manuscript.

Footnotes

  • We summarize our revisions as follows: (1) new simulation studies to illustrate the robustness of RSS-NET; (2) additional replication and comparative analyses of RSS-NET on real data; (3) expanded discussions of methodology intuition and implementation details. The new results are consistent with the conclusions in our original manuscript, and the revised text improves the readability of this work.

  • https://suwonglab.github.io/rss-net/

References

  1. Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Research 47, D1005–D1012 (2019).
  2. Tam, V. et al. Benefits and limitations of genome-wide association studies. Nature Reviews Genetics 20, 467–484 (2019).
  3. Hekselman, I. & Yeger-Lotem, E. Mechanisms of tissue and cell-type specificity in heritable traits and diseases. Nature Reviews Genetics 21, 137–150 (2020).
  4. French, J. & Edwards, S. The role of noncoding variants in heritable disease. Trends in Genetics 36, 880–891 (2020).
  5. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
  6. Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
  7. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
  8. The FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
  9. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
  10. Schmiedel, B. J. et al. Impact of genetic polymorphisms on human immune cell gene expression. Cell 175, 1701–1715 (2018).
  11. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
  12. Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nature Genetics 45, 124–130 (2013).
  13. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature Genetics 47, 1228–1235 (2015).
  14. Calderon, D. et al. Inferring relevant cell types for complex traits by using single-cell gene expression. American Journal of Human Genetics 101, 686–699 (2017).
  15. Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nature Genetics 50, 621–629 (2018).
  16. Zhu, X. & Stephens, M. Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nature Communications 9, 4361 (2018).
  17. Greene, C. S. et al. Understanding multicellular function and disease with human tissue-specific networks. Nature Genetics 47, 569–576 (2015).
  18. Marbach, D. et al. Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nature Methods 13, 366–370 (2016).
  19. Sonawane, A. R. et al. Understanding tissue-specific gene regulation. Cell Reports 21, 1077–1088 (2017).
  20. Duren, Z., Chen, X., Jiang, R., Wang, Y. & Wong, W. H. Modeling gene regulation from paired expression and chromatin accessibility data. Proceedings of the National Academy of Sciences 114, E4914–E4923 (2017).
  21. Califano, A., Butte, A. J., Friend, S., Ideker, T. & Schadt, E. Leveraging models of cell regulation and GWAS data in integrative network-based association studies. Nature Genetics 44, 841–847 (2012).
  22. Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022–1034 (2019).
  23. Li, W., Duren, Z., Jiang, R. & Wong, W. H. A method for scoring the cell type-specific impacts of noncoding variants in personal genomes. Proceedings of the National Academy of Sciences 117, 21364–21372 (2020).
  24. Kim, S. S. et al. Genes with high network connectivity are enriched for disease heritability. American Journal of Human Genetics 104, 896–913 (2019).
  25. Zhu, X. & Stephens, M. Bayesian large-scale multiple regression with summary statistics from genome-wide association studies. Annals of Applied Statistics 11, 1561–1592 (2017).
  26. Lamparter, D., Marbach, D., Rueedi, R., Kutalik, Z. & Bergmann, S. Fast and rigorous computation of gene and pathway scores from SNP-based summary statistics. PLoS Computational Biology 12, e1004714 (2016).
  27. Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nature Genetics 49, 1421–1427 (2017).
  28. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).
  29. Luo, Y. et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Research 48, D882–D889 (2020).
  30. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
  31. International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
  32. Lee, B.-C. et al. Adipose natural killer cells regulate adipose tissue macrophages to promote insulin resistance in obesity. Cell Metabolism 23, 685–698 (2016).
  33. Hotamisligil, G. S. Inflammation, metaflammation and immunometabolic disorders. Nature 542, 177–185 (2017).
  34. Oikonomou, E. K. & Antoniades, C. The role of adipose tissue in cardiovascular health and disease. Nature Reviews Cardiology 16, 83–99 (2019).
  35. Kang, J. & Rivest, S. Lipid metabolism and neuroinflammation in Alzheimer’s disease: a role for liver X receptors. Endocrine Reviews 33, 715–746 (2012).
  36. Shahid, F., Lip, G. Y. & Shantsila, E. Role of monocytes in heart failure and atrial fibrillation. Journal of the American Heart Association 7, e007849 (2018).
  37. Aviles, R. J. et al. Inflammation as a risk factor for atrial fibrillation. Circulation 108, 3006–3010 (2003).
  38. Bult, C. et al. Mouse Genome Database (MGD) 2019. Nucleic Acids Research 47, D801–D806 (2019).
  39. Amberger, J., Bocchini, C., Scott, A. & Hamosh, A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Research 47, D1038–D1043 (2019).
  40. Freund, M. K. et al. Phenotype-specific enrichment of Mendelian disorder genes near GWAS regions across 62 complex traits. The American Journal of Human Genetics 103, 535–552 (2018).
  41. Noordam, R. et al. Multi-ancestry sleep-by-SNP interaction analysis in 126,926 individuals reveals lipid loci stratified by sleep duration. Nature Communications 10, 5121 (2019).
  42. Wang, Y. et al. Therapeutic Target Database 2020: enriched resource for facilitating research and early development of targeted therapeutics. Nucleic Acids Research 48, D1031–D1041 (2020).
  43. Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).
  44. Kerr, T. A. et al. Loss of nuclear receptor SHP impairs but does not eliminate negative feedback regulation of bile acid synthesis. Developmental Cell 2, 713–720 (2002).
  45. Wang, L. et al. Redundant pathways for negative feedback regulation of bile acid production. Developmental Cell 2, 721–731 (2002).
  46. Hartman, H. B., Lai, K. & Evans, M. J. Loss of small heterodimer partner expression in the liver protects against dyslipidemia. Journal of Lipid Research 50, 193–203 (2009).
  47. Nishigori, H. et al. Mutations in the small heterodimer partner gene are associated with mild obesity in Japanese subjects. Proceedings of the National Academy of Sciences 98, 575–580 (2001).
  48. Willer, C. et al. Discovery and refinement of loci associated with lipid levels. Nature Genetics 45, 1274–1283 (2013).
  49. Brown, S. et al. High-throughput mouse phenomics for characterizing mammalian gene function. Nature Reviews Genetics 19, 357–370 (2018).
  50. Blair, D. R. et al. A nondegenerate code of deleterious variants in Mendelian loci contributes to complex disease risk. Cell 155, 70–80 (2013).
  51. Bruning, P. F. et al. Insulin resistance and breast-cancer risk. International Journal of Cancer 52, 511–516 (1992).
  52. Plenge, R., Scolnick, E. & Altshuler, D. Validating therapeutic targets through human genetics. Nature Reviews Drug Discovery 12, 581–594 (2013).
  53. Sanseau, P. et al. Use of genome-wide association studies for drug repositioning. Nature Biotechnology 30, 317–320 (2012).
  54. Lieschke, G. et al. Mice lacking granulocyte colony-stimulating factor have chronic neutropenia, granulocyte and macrophage progenitor cell deficiency, and impaired neutrophil mobilization. Blood 84, 1737–1746 (1994).
  55. Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429 (2016).
  56. Barrett, A. & Sloand, E. Autoimmune mechanisms in the pathophysiology of myelodysplastic syndromes and their clinical relevance. Haematologica 94, 449–451 (2009).
  57. Schaid, D., Chen, W. & Larson, N. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nature Reviews Genetics 19, 491–504 (2018).
  58. Võsa, U. et al. Unraveling the polygenic architecture of complex traits using blood eQTL meta-analysis. bioRxiv 447367 (2018).
  59. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biology 9, R137 (2008).
  60. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Molecular Cell 38, 576–589 (2010).
Posted December 08, 2020.
Subject Area

  • Genomics