Abstract
To understand the regulation of tissue-specific gene expression, the GTEx Consortium generated RNA-seq expression data for more than thirty distinct human tissues. These data provide an opportunity to derive shared and tissue-specific gene regulatory networks on the basis of co-expression between genes. However, only a small number of samples is available for most tissues, so statistical inference of networks in this setting is highly underpowered. To address this problem, we infer tissue-specific gene co-expression networks for 35 tissues in the GTEx dataset using a novel algorithm, GNAT, that uses a hierarchy of tissues to share data between related tissues. We show that this transfer learning approach increases the accuracy with which networks are learned. Analysis of these networks reveals that tissue-specific transcription factors are hubs that preferentially connect to genes with tissue-specific functions. Additionally, we observe that genes with tissue-specific functions lie at the peripheries of our networks. We identify numerous modules enriched for Gene Ontology functions, and show that modules conserved across tissues are especially likely to have functions common to all tissues, while modules that are upregulated in a particular tissue are often instrumental to tissue-specific function. Finally, we provide a web tool, available at mostafavilab.stat.ubc.ca/GNAT, that allows exploration of gene function and regulation in a tissue-specific manner.
Author Summary
Cells in different tissues perform very different functions despite sharing the same DNA. This requires tissue-specific gene expression and regulation, and understanding this tissue specificity is often instrumental to understanding complex diseases. Here, we use tissue-specific gene expression data to learn tissue-specific gene regulatory networks for 35 human tissues, where two genes are linked if their expression levels are correlated. Learning such networks accurately is difficult because of the large number of possible links between genes and the small number of samples. We propose a novel algorithm that combats this problem by sharing data between similar tissues, and show that this increases the accuracy with which networks are learned. We provide a web tool for exploring these networks that enables users to pose diverse queries in a gene- or tissue-centric manner and facilitates exploration of gene function and regulation.
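As a rough illustration of what "two genes are linked if their expression levels are correlated" means, the sketch below builds an unweighted co-expression network for a single tissue by thresholding the absolute Pearson correlation between every pair of genes. This is a generic correlation-threshold construction, not GNAT: it does no sharing of data across the tissue hierarchy. The function names, gene names, expression values and the 0.8 cutoff are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expr, threshold=0.8):
    """Link every gene pair whose |correlation| across samples meets the threshold.

    expr maps a gene name to its expression values, one value per sample.
    The threshold is a hypothetical choice for illustration only.
    """
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


# Hypothetical toy data: 5 samples from one tissue
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with GENE_A
    "GENE_C": [5.0, 4.0, 3.0, 1.0, 2.0],  # falls as GENE_A rises
    "GENE_D": [3.0, 1.0, 4.0, 1.0, 5.0],  # unrelated to the others
}
print(sorted(coexpression_edges(expr)))
```

With only five samples, even the unrelated GENE_D reaches a correlation of about 0.35 with GENE_A by chance. Spurious correlations of this kind are the small-sample problem that motivates pooling data across related tissues.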