Integrative analyses of multi-tissue Hi-C and eQTL data demonstrate close spatial proximity between eQTLs and their target genes

Gene regulation is important for cells and tissues to function. At the genomic level, it has been studied from two aspects: the identification of expression quantitative trait loci (eQTLs) and the identification of long-range chromatin interactions. It is important to understand their relationship, such as whether eQTLs regulate their target genes through physical chromatin interaction. Although previous studies have suggested enrichment of eQTLs in regions with a high chromatin interaction frequency, it is unclear whether this relationship is consistent across different tissues and cell lines and whether there are any tissue-specific patterns. Here, we performed integrative analyses of eQTL and high-throughput chromatin conformation capture (Hi-C) data from 11 human primary tissue types and 2 human cell lines. We found that chromatin interaction frequency is positively correlated with the number of genes having eQTLs, and that eQTLs and their target genes are more likely to fall in the same topologically associating domains than expected from randomly generated control datasets. These results are consistent across all tissues and cell lines we evaluated. Moreover, in dorsolateral prefrontal cortex, spleen, hippocampus, pancreas and aorta, tissue-specific eQTLs are enriched in tissue-specific frequently interacting regions. These results reveal a more detailed picture of the complicated relationship between different mechanisms of gene regulation.

Author summary

Whole-genome gene regulation has been studied in tissues and cell lines from multiple perspectives, including identification of expression quantitative trait loci (eQTLs) and identification of long-range chromatin interactions. These two complementary approaches focus on different aspects of gene regulation: one is statistical across individuals, whereas the other is physical within a sample. Integrating results from these two approaches will help us understand their relationships, such as whether eQTLs regulate their target genes through physical chromatin interaction. We performed comprehensive analyses using data from multiple human tissues and cell lines, and showed that chromatin interaction frequency is positively associated with eQTL results in all evaluated tissues and cell lines. The observed relationships also displayed tissue-specific patterns in some tissues. Our results revealed a more detailed picture of the complicated relationship between the different mechanisms of gene regulation.

between two genomic regions tends to decrease as their genomic distance increases [16]. We also stratified the data by distance and the total number of tested genes. S1

Lieberman-Aiden et al. [16] discovered the A and B compartments, which are correlated with relatively high and low gene density, respectively. We thus fitted a regression model with the absolute difference in the number of tested genes between the two genomic regions in place of the total number of tested genes (Fig 1B). As expected, for all tissues and cell lines, the difference in gene density has a significant negative effect on chromatin interaction frequency, indicating that genomic regions with a larger discrepancy in gene density tend to interact less frequently. The effects of the number of eGenes are still significantly positive, although their magnitudes are smaller than those in the previous model, except for spleen. In addition to the two models above, we repeated the analyses with the fraction of eGenes

the fraction of associations mapping in the same TADs between the simulated and real data, we found that the real data showed a significantly higher fraction of eQTL-gene pairs falling in the same TADs than the simulated data, and this is true across all tissues and cell lines. For example, 74% of real eQTL-gene associations and 70% of simulated pairs in GM12878 were inside TADs (Fisher's exact test, p-value < 2.2e-16). We further stratified the data by distance between the eQTL and its associated gene. We found that the majority of eQTL-gene associations at distances from 40Kb to 400Kb are significantly enriched in TADs (Fig 2 and S3 Fig). For example, for GM12878, the real eQTL-gene pairs had a significantly higher fraction inside TADs than the simulated data at distances from 40Kb to 280Kb (Fig 2A). Table).

The GTEx project has identified many tissue-specific eQTLs, with effects in only one or a few tissues [3]. Meanwhile, FIREs identified from Hi-C data also showed strong tissue specificity [19]. Intrigued by the high tissue specificity of both eQTLs and FIREs, we next asked whether the association between eQTL results and Hi-C data is also tissue-specific. Specifically, we first identified tissue-specific FIREs and tissue-specific eQTLs based on the data from Schmitt et al. and the GTEx project, respectively. For the 11 tissues we considered, a total of 349,311 eQTLs were tissue-specific. By design, all eQTLs detected by the GTEx study are within 1Mb of the TSS of tested genes. On average, 3,488 FIREs were identified per tissue (S1 Table) and 18% of them were tissue-specific. We found a significant enrichment for tissue-specific FIREs in regions near the TSS of genes tested in GTEx. As shown in  Table)

We then examined whether tissue-specific eQTLs are enriched in tissue-specific FIREs. Since all the eQTLs are within 1Mb of the TSS of tested genes by the design of the GTEx study, we focused on FIREs that are also within 1Mb of the genes tested in GTEx for the tissue of interest. For each tissue, we compared the fraction of tissue-specific eQTLs mapped to tissue-specific FIREs and to other FIREs. Among the 11 tissues we evaluated, five (DLPFC, spleen, hippocampus, pancreas and aorta) showed a significant positive association after Bonferroni correction (Fig 4). In these five tissues, tissue-specific eQTLs are enriched in tissue-specific FIREs, suggesting that there may be synergy between eQTLs and chromatin spatial organization for gene regulation. However, a negative association was found in lung, suggesting a more complicated relationship between eQTLs and chromatin spatial organization. The results for other tissues were not significant. We also repeated the analysis for the cell lines GM12878 and IMR90, and the results were not significant (Fig 4). Taken together, these results

Odds ratios are plotted on a log scale.

Chromatin spatial organization and eQTLs are known to be involved in gene regulation. In this work, we systematically studied the relationship between eQTL-gene association and chromatin interaction across 11 tissues and 2 cell lines. To the best of our knowledge, this is the most comprehensive study on this topic to date. We found that chromatin interaction frequency is positively correlated with the number of eGenes in all tissues and cell lines we evaluated. Moreover, we found that eQTL-gene associations are enriched in TADs. Since both eQTLs and FIREs are known to be tissue-specific, we further evaluated the tissue specificity of the relationship between eQTL-gene associations and chromatin interactions. We found that in DLPFC, spleen, hippocampus, pancreas, and aorta, tissue-specific eQTLs are significantly enriched in tissue-specific FIREs. These results highlight the tissue-specific manner of the positive relationship between eQTLs and chromatin interactions. However, lung showed a significant negative association between tissue-specific eQTLs and tissue-specific FIREs, which might be due to more complicated mechanisms or high tissue heterogeneity. Our data demonstrate the complexity of the relationship between eQTL-gene associations and chromatin interactions.
The tissue specificity of the relationship between eQTLs and chromatin interactions can be useful for identifying tissue-specific genes that are likely to be regulated by eQTLs through chromatin interactions. For example, in the brain cortex tissue DLPFC, there are 2,954 tissue-specific eQTLs identified from the GTEx data and 323 tissue-specific FIREs identified from the Hi-C data. When both factors are considered, we identified 32 DLPFC-specific eQTLs located in the tissue-specific FIREs. These eQTLs are significantly associated with 4 genes: ADGRB2 (adhesion G protein-coupled receptor B2), WASF3 (WAS protein family member 3), SPEF2 (sperm flagellar 2), and XPA (xeroderma pigmentosum complementation group A). Among these genes, ADGRB2, which encodes a transmembrane signaling receptor [22], has a brain-specific developmental expression pattern, and its expression level increases as brain development progresses [22]. The TSS of the ADGRB2 gene (chr1:32,192,718) is ~47Kb from a brain-specific FIRE (chr1:32,240,000-32,320,000).

We matched the chromatin interaction data to the eQTL data simply by tissue name. For tissues (S1 Table and see S1 File). In all our analyses, we focus on the autosomes. The reference genome is hg19.

The Hi-C data contained over 2.9 billion raw intra-chromosomal unique paired-end reads on 13 samples in total, out of which >1 billion are long-range read pairs (>15Kb). An average of 2,068 TADs per sample were identified in the original study [19].

Regression analysis of chromatin interaction frequency

We first evaluated the relationship between eQTL results and chromatin interaction frequency on the 11 tissues and 2 cell lines using regression analysis (see S1 File). Specifically, we considered autosomal chromosomes at 40Kb bin resolution, and for every bin pair (i, j), we

For each of these 10,000 associations, we randomly selected an autosomal gene from the list of genes tested in GTEx for the tissue or cell line, obtained its TSS position t, and designated t + d as the position of a simulated SNP, where d is the distance between the eQTL and the TSS in the real association, as long as the position is within the chromosome.
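The control-generation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the input layout, and the symbol `d` (the signed SNP-to-TSS distance of a real association) are assumptions introduced here, and redrawing a gene when the shifted position falls outside the chromosome is one possible reading of "as long as the position is within the chromosome".

```python
import random


def simulate_control_snps(real_pairs, gene_tss, chrom_length, rng, max_tries=100):
    """Build distance-matched control SNP-gene pairs.

    real_pairs   : list of (snp_pos, tss_pos) for sampled real associations
    gene_tss     : list of (chrom, tss_pos) for genes tested in the tissue
    chrom_length : dict mapping chrom -> chromosome length in bp
    rng          : a random.Random instance (passed in for reproducibility)

    Returns a list of (chrom, simulated_snp_pos, tss_pos) tuples.
    """
    simulated = []
    for snp_pos, tss_pos in real_pairs:
        d = snp_pos - tss_pos  # signed distance carried over from the real pair
        for _ in range(max_tries):
            chrom, t = rng.choice(gene_tss)  # random tested gene and its TSS
            pos = t + d
            # Assumption: redraw the gene if the position leaves the chromosome.
            if 1 <= pos <= chrom_length[chrom]:
                simulated.append((chrom, pos, t))
                break
    return simulated


if __name__ == "__main__":
    rng = random.Random(0)
    real = [(1_050_000, 1_000_000), (900_000, 1_000_000)]
    genes = [("chr1", 2_000_000), ("chr2", 500_000)]
    lengths = {"chr1": 3_000_000, "chr2": 1_000_000}
    for chrom, pos, t in simulate_control_snps(real, genes, lengths, rng):
        print(chrom, pos, t)
```

Because each simulated pair reuses the distance of a real pair, the real and simulated sets have the same distance distribution, so any difference in the same-TAD fraction is not driven by distance alone.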

Next, we performed Fisher's exact test on the 2x2 tables of the counts of SNP-gene pairs according to whether the pair is real or simulated and whether it is in the same TAD, and then repeated the analysis by stratifying the data by genomic distance ranging from 40Kb to 1Mb.
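As a rough illustration of this test, the sketch below checks whether two positions on one chromosome fall in the same TAD and computes a two-sided Fisher's exact p-value from a 2x2 table using only the standard library. The helper names and the representation of TADs as sorted, non-overlapping start and end lists are assumptions made for this example. In practice a library routine such as `scipy.stats.fisher_exact` would likely be preferred, and the exact enumeration here is only practical for modest counts.

```python
from bisect import bisect_right
from math import comb


def tad_index(pos, tad_starts, tad_ends):
    """Index of the TAD containing pos, or None if pos lies outside every TAD."""
    i = bisect_right(tad_starts, pos) - 1
    return i if i >= 0 and pos <= tad_ends[i] else None


def in_same_tad(pos_a, pos_b, tad_starts, tad_ends):
    """True only if both positions fall inside one and the same TAD."""
    ia = tad_index(pos_a, tad_starts, tad_ends)
    return ia is not None and ia == tad_index(pos_b, tad_starts, tad_ends)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the table [[a, b], [c, d]].

    Rows: real / simulated pairs. Columns: same TAD / not same TAD.
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    starts, ends = [0, 1_000_000], [800_000, 2_000_000]
    print(in_same_tad(100_000, 700_000, starts, ends))
    print(fisher_exact_two_sided(3, 1, 1, 3))
```

Stratifying by distance then amounts to building one such table per distance bin and testing each table separately.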

For each of the 11 tissues, we defined tissue-specific FIREs as those detected only for that tissue and not for any of the other 10 tissues. Tissue-specific eQTLs were similarly defined using the GTEx meta-analysis results (see S1 File). Cell line-specific FIREs and eQTLs were similarly defined using all 13 samples we considered.
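The "detected in this tissue and in no other" definition is a set difference, which can be sketched as below. The function name and the encoding of a FIRE as a `(chromosome, bin_start)` tuple are hypothetical choices for illustration; the same function would apply to eQTL identifiers.

```python
def tissue_specific(features_by_tissue):
    """Map each tissue to the features found in that tissue and no other.

    features_by_tissue : dict mapping tissue name -> set of hashable features
                         (for example, FIRE bins keyed by (chrom, bin_start)).
    """
    specific = {}
    for tissue, own in features_by_tissue.items():
        # Union of the features seen in every other tissue.
        others = set().union(
            *(feats for name, feats in features_by_tissue.items() if name != tissue)
        )
        specific[tissue] = set(own) - others
    return specific


if __name__ == "__main__":
    fires = {
        "DLPFC": {("chr1", 32_240_000), ("chr2", 40_000)},
        "Lung": {("chr2", 40_000)},
        "Spleen": {("chr3", 80_000)},
    }
    for tissue, feats in tissue_specific(fires).items():
        print(tissue, sorted(feats))
```

Under this definition, a feature shared by any two tissues is specific to neither, so adding more tissues to the comparison can only shrink each tissue-specific set.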

For each tissue, we evaluated whether tissue-specific FIREs tend to be close to genes. We counted the number of FIREs by whether the FIRE is within 1Mb of the TSS of genes tested in GTEx for the tissue and by whether it is tissue-specific, and performed Fisher's exact test on the 2x2 table to assess the statistical significance of the enrichment of tissue-specific FIREs within 1Mb of genes.
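The 2x2 table for this test could be assembled roughly as follows. The names, the `(chrom, start, end)` encoding of a FIRE, and the choice to measure distance from the nearest edge of the FIRE bin to the TSS are all assumptions of this sketch; the source does not state how the 1Mb distance was measured.

```python
from bisect import bisect_left

WINDOW = 1_000_000  # 1Mb, the cis window used by GTEx according to the text


def near_any_tss(start, end, sorted_tss, window=WINDOW):
    """True if some TSS lies within `window` bp of the interval [start, end]."""
    i = bisect_left(sorted_tss, start - window)  # first TSS >= start - window
    return i < len(sorted_tss) and sorted_tss[i] <= end + window


def fire_contingency_table(fires, specific, tss_by_chrom):
    """Count FIREs by tissue specificity (rows) and TSS proximity (columns).

    fires        : iterable of (chrom, start, end) FIRE bins for one tissue
    specific     : set holding the tissue-specific subset of `fires`
    tss_by_chrom : dict mapping chrom -> sorted list of TSS positions

    Returns [[specific_near, specific_far], [other_near, other_far]].
    """
    table = [[0, 0], [0, 0]]
    for fire in fires:
        chrom, start, end = fire
        near = near_any_tss(start, end, tss_by_chrom.get(chrom, []))
        row = 0 if fire in specific else 1
        col = 0 if near else 1
        table[row][col] += 1
    return table


if __name__ == "__main__":
    tss = {"chr1": [5_000_000]}
    fires = [
        ("chr1", 4_500_000, 4_540_000),
        ("chr1", 10_000_000, 10_040_000),
        ("chr1", 5_960_000, 6_000_000),
        ("chr2", 0, 40_000),
    ]
    print(fire_contingency_table(fires, {fires[0], fires[1]}, tss))
```

The resulting table can be passed to any Fisher's exact test routine; an odds ratio above 1 would indicate that tissue-specific FIREs lie near tested genes more often than other FIREs do.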

To explore the tissue specificity of the relationship between eQTLs and FIREs, we