Abstract
Background Normal function of specific intestinal epithelial cell types, like the mucin producing goblet cells or the anti-microbial peptide producing Paneth cells, is key to the homeostasis of the gut. Dysfunction of these cells is often associated with severe gut pathologies, such as inflammatory bowel disease (IBD), including Crohn’s disease and ulcerative colitis. Although transcriptional signatures of intestinal cells have been identified, an integrated and cell type specific network analysis that identifies the key regulators with their disease relevance has been lacking.
Method In this study, we profiled the expression of mRNAs, microRNAs and long non-coding RNAs from mouse derived 3D intestinal organoids whose differentiation was directed towards Paneth cells or goblet cells. We generated cell type specific regulatory networks by integrating expression data into multi-layered networks of regulatory interactions.
Results By mapping cell type specific marker genes to the network, we were able to identify regulators potentially contributing to cell type specific functions. Among the seven putative master regulators (those targeting many of the markers), we identified four nuclear hormone receptors with links to IBD, immunity and autophagy: Vdr, Rxra, Nr1d1 and Nr3c1. We also found common regulators relevant in both Paneth and goblet cells that regulate different sets of cell type specific markers as a result of regulatory rewiring after differentiation.
Conclusions We describe an integrative organoid study that combines-omics data with multi-layered networks to study the regulatory landscapes of Paneth cells and goblet cells. Using the developed computational workflow, we generated cell type specific regulatory networks encompassing transcription factor, microRNA and long non-coding RNA regulation at the transcriptional and post-transcriptional level. Analysis of the networks uncovered potential cell type specific master regulators. Specific investigation of four of these regulators identified links to IBD and to cellular phenotypes associated with IBD pathology. Therefore, we demonstrate that application of our workflow in a cell type specific context can be used to disentangle multifactorial mechanisms of IBD and improve our understanding of disease pathomechanisms.
Introduction
Gut barrier integrity is critically important for efficient nutrient absorption and maintenance of intestinal homeostasis (Zhang et al. 2015). Intestinal homeostasis and barrier integrity are cumulative by-products of the functions and activities of various cell types lining the intestinal epithelium. These intestinal epithelial cells serve to mediate signals between the gut microbiota and the host innate/adaptive immune systems (Gerbe and Jay 2016; Duerkop et al. 2009). Disruption of epithelial homeostasis along with dysregulated immune responses are some of the underlying reasons behind the development of inflammatory gut conditions such as Crohn’s disease (CD) and ulcerative colitis (UC) (Mokry et al. 2014). Therefore, a greater understanding of the functions of intestinal cells and their role in regulatory signalling, will further our understanding of the aetiology of inflammatory bowel diseases.
The diverse functions of the mammalian intestinal epithelium are attributed to the distinct cell types lining the epithelium (Okumura and Takeda 2017). To date, various intestinal epithelial cell types have been identified based on specific functional and gene expression signatures. Among the documented cell types, Paneth cells (which reside in the small intestinal crypts of Lieberkuhn) help to maintain the balance of the gut microbiota by secreting anti-microbial peptides, cytokines and other trophic factors (Bevins and Salzman 2011). Located further up the intestinal villi, goblet cells secrete mucin, which aggregates to form the mucus layer. The mucus layer acts as a physical barrier between the intestinal lumen and the epithelial lining (Kim and Ho 2010). Both of these intestinal cell types have documented roles in gut-related diseases (Gersemann et al. 2011; Okamoto and Watanabe 2016). Dysfunctional Paneth cells with reduced secretion of anti-microbial peptides have been shown to contribute to the pathogenesis of CD (Liu et al. 2016), while reduction in goblet cell numbers and defective goblet cell function has been associated with UC (Gersemann et al. 2009; Kim and Ho 2010).
To understand the function of intestinal epithelial cell types in a disease context, it is important to first identify their molecular signatures and regulatory pathways in a healthy context. At least three separate studies have captured RNA profiles of healthy and/or diseased intestinal cell types; capturing messenger RNA (mRNA), microRNA (miRNA) and long non coding RNA (lncRNA) signatures (Mirza et al. 2015; Haber et al. 2017; Peck et al. 2017). MiRNAs and lncRNAs, together with transcription factors (TFs) (proteins that regulate mRNA transcription), perform critical regulatory functions in maintaining intestinal homeostasis. Dysregulation of these functions has been associated with various gut pathologies (Chapman and Pekow 2015; Mirza et al. 2015). However, a comprehensive analysis of miRNAs, TFs and lncRNAs as well as their regulated genes has not yet been performed on a systems-level in a standardized manner. To identify connections between a regulator and its downstream pathological effect, a systems biology approach is often required. For example, biological interaction networks are a type of systems biology data representation, which aid the interpretation of-omics read-outs by identifying relevant signalling and regulatory pathways. In particular, the study of regulatory interactions using interaction networks, can be used to uncover how cells respond to changing environments at a transcriptional level and to investigate the downstream effects of gene mutations and knockouts.
Many recent studies, particularly those focusing on diseases, have used biopsy samples to produce-omics read-outs (Mirza et al. 2015; Balfe et al. 2018). When the expression profile of diseased tissue is greatly different from a healthy control, these experiments can yield interesting results. However, due to the cellular heterogeneity of the biopsies, the-omics readouts usually represent a combination of different cell types (including semi-differentiated cells). Current understanding of many intestinal diseases such as CD and UC, implicates specific cell types in the dysregulation of homeostasis (Adolph et al. 2013). Whilst primary intestinal epithelial cells all originate from Lgr5+ stem cells, differentiation results in differences in gene expression as well as signalling and regulatory wiring (Crosnier et al. 2006; Vanuytsel et al. 2013). These differences can result in altered phenotypic functions, responses to stress and susceptibilities to specific dysregulations. Therefore, to understand the role of these cell types in diseases, it is necessary to study their molecular networks and pathways in a cell type specific manner. Here, we reduced the heterogeneity issues often encountered with intestinal biopsies by employing the small intestinal organoid system, which recapitulates many features of the in vivo intestinal tissue (Sato et al. 2009).
In this study, we identified Paneth cell and goblet cell specific mRNAs, miRNAs and lncRNAs, and applied this data to generate cell type specific regulatory networks. To obtain the expression data, we used mouse derived 3D organoids; normally differentiated (control) and those enriched in Paneth cells or goblet cells. In two recently published reports, we show that these organoid models are useful for the investigation of normal and disease processes in these intestinal cell types (Luu et al. 2018; Jones et al. 2019). We performed RNAseq transcriptomics analysis on the organoids and compared the readouts from enriched organoids to the control organoids. To the best of our knowledge, this is the first study profiling all the three RNA regulator types from across different intestinal epithelial cell types. To contextualise the sequencing data, we integrated the expression data with directed regulatory networks from transcriptional and post-transcriptional regulatory resources. Our integrative strategy to infer regulatory network landscapes (Figure 1) helped to identify putative master regulators of Paneth cells and goblet cells, and highlighted the contribution of regulatory rewiring to the diverse phenotypes of these cells.
Materials and Methods
Animal handling
C57/Bl6 mice of both sexes were used for organoid generation. All animals were maintained in accordance with the Animals (Scientific Procedures) Act 1986 (ASPA).
Small intestinal organoid culture
Murine small intestinal organoids were generated as described previously (Sato et al. 2009; Sato and Clevers 2013). Briefly, the entire small intestine was opened longitudinally, washed in cold PBS then cut into ~5mm pieces. The intestinal fragments were incubated in 30mM EDTA/PBS for 5 minutes, transferred to PBS for shaking, then returned to EDTA for 5 minutes. This process was repeated until five fractions had been generated. The PBS supernatant fractions were inspected for released crypts. The crypt suspensions were passed through a 70μm filter to remove any villus fragments, then centrifuged at 300xg for 5 minutes. Pellets were resuspended in 200μl phenol-red free Matrigel (Corning), seeded in small domes onto coverslips and incubated at 37°C for 20 minutes to allow Matrigel to polymerise. Organoid media containing EGF, Noggin and R-spondin (ENR; (Sato et al. 2009)) was then overlaid. Organoids were generated from three separate animals for each condition, generating three biological replicates.
Chemically inducing differentiation in organoid cultures
On days two, five and seven post-crypt isolation, ENR media was changed to include additional factors for each cell type specific condition: 3μM CHIR99021 (Tocris) and 10μM DAPT (Tocris) [Paneth cells]; 2μM IWP-2 (Tocris) and 10μM DAPT [goblet and enteroendocrine cells] (Yin et al. 2014).
RNA extraction
On day eight, organoids were extracted from Matrigel using Cell Recovery Solution (BD Bioscience), rinsed in PBS and RNA was extracted using Exiqon tissue kit according to the manufacturer’s protocol.
Stranded RNA library preparation
The organoid transcriptomics libraries were constructed using the NEXTflex™ Rapid Directional RNA-Seq Kit (PN: 5138-07) using the polyA pull down beads from Illumina TruSeq RNA v2 library construction kit (PN: RS-122-2001) with the NEXTflex™ DNA Barcodes – 48 (PN: 514104) diluted to 6uM. The library preparation involved an initial QC of the RNA using Qubit DNA (Life technologies Q32854) and RNA (Life technologies Q32852) assays as well as a quality check using the PerkinElmer GX with the RNA assay (PN:CLS960010).
1µg of RNA was purified to extract mRNA with a poly-A pull down using biotin beads, fragmented and first strand cDNA was synthesised. This process reverse transcribes the cleaved RNA fragments primed with random hexamers into first strand cDNA using reverse transcriptase and random primers. The second strand synthesis process removes the RNA template and synthesizes a replacement strand to generate ds cDNA. Directionality is retained by adding dUTP during the second strand synthesis step and subsequent cleavage of the uridine containing strand using Uracil DNA Glycosylase. The ends of the samples were repaired using the 3’ to 5’ exonuclease activity to remove the 3’ overhangs and the polymerase activity to fill in the 5’ overhangs creating blunt ends. A single ‘A’ nucleotide was added to the 3’ ends of the blunt fragments to prevent them from ligating to one another during the adapter ligation reaction. A corresponding single ‘T’ nucleotide on the 3’ end of the adapter provided a complementary overhang for ligating the adapter to the fragment. This strategy ensured a low rate of chimera formation. The ligation of a number indexing adapters to the ends of the DNA fragments prepared them for hybridisation onto a flow cell. The ligated products were subjected to a bead based size selection using Beckman Coulter XP beads (PN: A63880). As well as performing a size selection this process removed the majority of unligated adapters. Prior to hybridisation to the flow cell the samples were PCR’d to enrich for DNA fragments with adapter molecules on both ends and to amplify the amount of DNA in the library. The strand that was sequenced is the cDNA strand. The insert size of the libraries was verified by running an aliquot of the DNA library on a PerkinElmer GX using the High Sensitivity DNA chip (PerkinElmer CLS760672) and the concentration was determined by using a High Sensitivity Qubit assay and q-PCR. Libraries were then equimolar pooled and checked by qPCR to ensure the libraries had the necessary sequencing adapters ligated.
Small RNA library preparation
The small RNA libraries were made using the TruSeq Small RNA Library Prep Kits, six-base indexes distinguish samples and allow multiplexed sequencing and analysis using 48 unique indexes ((Set A: indexes 1-12 (RS-200-0012), Set B: indexes 13–24(RS-200-0024), Set C: indexes 25–36 (RS-200-0036), Set D: indices 37–48 (RS-200-0048)) (TruSeq Small RNA Library Prep Kit Reference Guide (15004197 Rev.G)). The TruSeq Small RNA Library Prep Kit protocol is optimised for an input of 1µg of total RNA in 5 µl nuclease-free water or previously isolated microRNA may be used as starting material (minimum of 10–50 ng of purified small RNA). Total RNA is quantified using the Qubit RNA HS Assay kit (ThermoFisher Q32852) and quality of the RNA is established using the Bioanalyzer RNA Nano kit (Agilent Technologies 5067-1511), it is recommended that RNA with an RNA Integrity Number (RIN) value ≥ 8 is used for these libraries as samples with degraded mRNA are also likely to contain degraded small RNA.
This protocol generates small RNA libraries directly from RNA by ligating adapters to each end of the RNA molecule, the 3’ adapter is modified to target microRNAs and other small RNAs that have a 3’ hydroxyl group resulting from enzymatic cleavage by Dicer or other RNA processing enzymes, whilst most mature microRNAs have a 5’ phosphate group. Reverse transcription is used to create cDNA, and PCR amplification of the cDNA (standard protocol is 14 cycles of PCR) is used to generate libraries. Library purification combines using BluePippin cassettes (Sage Science Pippin Prep 3% Cassettes Dye-Free (CDF3010), set to collection mode range 125-160bp) to extract the library molecules followed by a concentration step (Qiagen MinElute PCR Purification (cat. no. 28004)) to produce libraries ready for sequencing. Library concentration and size are established using HS DNA Qubit and HS DNA Bioanalyser. The resulting libraries were then equimolar pooled and qPCR was performed on the pool prior to clustering.
Stranded RNA library sequencing on HiSeq 100PE
The final pool was quantified using a KAPA Library Quant Kit and found to be 2nM. 9.5µl of the pool was combined with 0.5µl 2N NaOH to make a 1.9nM dilution. This was incubated for five minutes at room temperature to denature the libraries before 940µl of HT1 was added to make a 20pM dilution. This dilution was stored at-20°C for ~2 weeks before being run. 60µl of the 20pM dilution was combined with 60µl of HT1 plus a 1% PhiX spike in three 200µl tubes of a strip of eight to make a final running concentration of 10pM. The flow-cell was clustered using HiSeq PE Cluster Kit v3 (Illumina PE-401-3001) utilising the Illumina PE HiSeq Cluster Kit V3 cBot recipe V8.0 method on the Illumina cBot. Following the clustering procedure, the flow-cell was loaded onto the Illumina HiSeq2000 instrument following the manufacturer’s instructions with a 101 cycle paired reads and a 7 cycle index read. The sequencing chemistry used was HiSeq SBS Kit v3 (Illumina FC-401-3001) with HiSeq Control Software 2.2.68 and RTA 1.18.66.3. Reads in bcl format were demultiplexed based on the 6bp Illumina index by CASAVA 1.8, allowing for a one base-pair mismatch per library, and converted to FASTQ format by bcl2fastq.
Small RNA library sequencing on HiSeq Rapid 50SE
The final pool was quantified using a KAPA Library Quant Kit and found to be 11.2nM. 1.8µl of the pool was combined with 0.5µl of 2N NaOH and 7.7µl EB to make a 2nM dilution. This was incubated for five minutes at room temperature to denature the libraries. 10µl of this was added to 990µl of HT1 to give 1ml at 20pM. 54.7µl of the 20pM dilution was combined in a single 200µl tube of a strip of eight with 7.3µl of PhiX at 20pM and 73µl of HT1. The libraries were hybridized to the flow-cell using TruSeq Rapid Duo cBot Sample Loading Kit (Illumina CT-403-2001), utilising the Illumina RR_TemplateHyb_FirstExt_VR method on the Illumina cBot. The flow-cell was loaded onto the Illumina HiSeq2500 instrument following the manufacturer’s instructions with a 51 cycle single read and a 7 cycle index read. The sequencing chemistry used was HiSeq SBS Rapid Kit v2 (Illumina FC-402-4022) with a single read cluster kit (Illumina GD-402-4002) HiSeq Control Software 2.2.68 and RTA 1.18.66.3. Reads in bcl format were demultiplexed based on the 6bp Illumina index by CASAVA 1.8, allowing for a one base-pair mismatch per library, and converted to FASTQ format by bcl2fastq.
Mapping and identification of differentially expressed transcripts
The quality of stranded reads was assessed by FastQC software (version 0.11.4) (Andrews 2010). All reads coming from technical repeats were concatenated together and aligned (in stranded mode, i.e. with ‘--rna-strandness RF’ flag) using HISAT aligner (version 2.0.5) (Kim et al. 2015). Subsequently, a reference-based de novo transcriptome assembly was carried out for each biological repeat and merged together using StringTie (version 1.3.2) with following parameters: minimum transcript length of 200 nucleotides, minimum FPKM of 0.1 and minimum TPM of 0.1 (Pertea et al. 2015; Pertea et al. 2016). Coding potential of each novel transcript was determined with CPC (version 0.9.2) and CPAT (version 1.2.2) (Kong et al. 2007; Wang et al. 2013). From the novel transcripts, only non-coding transcripts (as predicted by both tools) were included in final GTF file. Gene and transcript abundances were estimated with kallisto (version 0.43.0) (Bray et al. 2016). Sleuth (version) R library was used to perform differential gene expression (0.28.1) (Pimentel et al. 2017). mRNAs and lncRNAs with an absolute log2 fold change of 1 and q value <= 0.05 were considered to be differentially expressed.
The small RNA reads were analysed using the sRNAbench tool within the sRNAtoolbox suite of tools, which is dedicated for the mapping and identification of miRNAs from NGS data (Rueda et al. 2015). The barcodes from the 5’ end and adapter sequences (corresponding to Illumina RA3) from the 3’ end were removed respectively. Zero mismatches were allowed in detecting the adapter sequences with a minimum adapter length set at 10. Post barcode and adapter trimming, only reads that have a minimum length of 16 and a read-count of 2 were considered for further analysis. The mice miRNA collection was downloaded from miRBase version 21 (Kozomara and Griffiths-Jones 2014). The trimmed and length filtered reads were then mapped to the mature version of the miRBase miRNAs in addition to the annotated version of the mouse genome (version mm10). No mismatches were allowed for the mapping. A seed length of 20 was used for the alignment with a maximum number of multiple mappings set at 10. Read-counts normalized after multiple-mapping were calculated for all the libraries corresponding to the control and differentiated organoids. The multiple-mapping normalized read-counts from the corresponding differentiated organoids were compared against the control to identify differentially expressed miRNAs in a pair-wise manner using edgeR (Robinson et al. 2010). miRNAs with an absolute log2 fold change of 1 and false discovery rate <= 0.05 were considered to be differentially expressed.
Enrichment of marker genes in differentially expressed gene lists
Cell type specific marker gene lists were obtained from the mouse single cell sequencing survey published by Haber et al. (Haber et al. 2017). The cell type specific signature genes of the droplet based and the plate based method were obtained for Paneth, goblet and enteroendocrine cell types. Gene symbols were converted to Ensembl gene IDs using bioDBnet db2db (Mudunuri et al. 2009).
Hypergeometric distribution testing was carried out using a custom R script to measure enrichment of cell type specific marker genes in the differentially upregulated gene sets. To standardise the background dataset and thus enable hypergeometric distribution testing, only markers which are present in the output of the wald test (genes with variance greater than zero among samples) were used. Similarly to enable fair comparisons, only differentially expressed protein coding genes and documented lncRNAs were used from the DEG lists, as was surveyed in the cell type specific marker paper. Bonferroni correction was applied to the hypergeometric distribution p values to account for multiple testing and significance scores were calculated using-log10(corrected p value). For the mapping of marker genes to the interaction networks, no filters were applied.
Reconstruction of molecular networks
Mice regulatory networks containing the different regulatory layers were retrieved from multiple databases: miRNA-mRNA (ie., miRNAs regulating mRNAs) and lncRNA-miRNA (ie., miRNAs regulating lncRNAs) interactions were downloaded from TarBase v7.0 (Vlachos et al. 2015) and LncBase v2.0 (Paraskevopoulou et al. 2016), respectively. Only miRNA-mRNA and lncRNA-miRNA interactions determined using HITS-CLIP (Chi et al. 2009) experiments were considered. Regulatory interactions between transcription factors (TFs) and miRNAs (ie. TFs regulating miRNAs) were specifically retrieved from TransmiR v1.2 (Wang et al. 2010), GTRD (Yevshin et al. 2017) and TRRUST v2 (Han et al. 2015; Han et al. 2018). Co-expression based inferences were ignored. Transcriptional regulatory interactions (ie., TFs regulating target genes) were inferred using data from ORegAnno 3.0 (Lesurf et al. 2016) and GTRD or TRUSST. In cases where transcriptional regulatory interactions are derived from high-throughput datasets such as ChIP-seq, we attributed the regulatory interaction elicited by the bound transcription factor to genes which lie within a 10kb window on either side of the ChIP-seq peak (ORegAnno) or meta-cluster (in the case of GTRD). LncRNA-protein interactions (ie., lncRNAs regulating proteins) were retrieved from the NPINTER database v3.0 (Hao et al. 2016). Only those lncRNA-protein interactions were considered wherein the interactions were confirmed as physical interactions and did not involve non-specific RNA-binding proteins. Non-specific mice RNA-binding proteins were identified using the RBPDB and ATtRACT databases (Cook et al. 2011; Giudice et al. 2016). TF-lncRNA interactions (ie., TFs regulating lncRNAs) were also inferred based on the ChIP-seq binding profiles represented by meta-clusters in GTRD. We used only TF-lncRNA interactions within intergenic lncRNAs to avoid assigning false regulatory interactions due to the high number of instances where the lncRNAs overlap with protein-coding genes. In addition, no overlaps were allowed between the coordinates of the ChIP-seq peaks / meta-clusters and any gene annotation. Only if the first annotation feature within a 10kb genomic window downstream to the ChIP-seq peak / meta-cluster was designated as an intergenic lncRNA, a regulatory interaction between the TF and the lncRNA was assigned. Bedtools (Quinlan and Hall 2010) was used for the custom analyses involving looking for overlaps between coordinates. All the nodes in the collected interactions were represented by their Ensembl gene IDs for standardization. A summary of the interactions collected from each resource and the quality control criteria applied is given in Supplementary table 1.
Regulatory rewiring analysis
To calculate rewiring scores for regulators, sub-networks were extracted (from the Paneth cell and goblet cell regulatory networks) containing just the regulator of interest and its downstream targets. For each regulator of interest, the subnetworks from the Paneth cell and goblet cell datasets were compared using the Cytoscape app DyNet (Shannon et al. 2003; Goenawan et al. 2016). The degree corrected Dn score was extracted for each regulator and used to quantify rewiring of the regulator’s downstream targets between the Paneth cell and the goblet cell regulatory networks. Functional analysis was carried out on the targets of the top five most rewired regulators. For each regulator, the targets were classified based on whether they are present in only the Paneth cell regulatory network, only the goblet cell regulatory network or in both networks. Each group of targets was tested for functional enrichment (hypergeometric model, p value <= 0.1) based on Reactome and KEGG annotations using the ReactomePA R package (Ogata et al. 1999; Yu and He 2016; Kanehisa et al. 2017; Fabregat et al. 2018) following conversion from mouse to human identifiers using Inparanoid (v8) (O’Brien et al. 2005; Sonnhammer and Östlund 2015).
Results
Establishment of Paneth cell or goblet cell-enriched in vitro models
We applied small intestines from mice to generate 3D self-organising, in vitro organoid cultures. By altering the stem cell culture medium, as described in (Yin et al. 2014) we were able to generate organoids with a standard distribution of cell types (normally differentiated) as well as organoids enriched with Paneth cells and organoids enriched with goblet cells. These organoids recapitulated features of the in vivo small intestinal epithelium, including formation of an internal lumen and distinct crypt-and villus-like domains (Figure 2).
Cell type specific marker genes are enriched in cell type enriched organoids compared to control organoids
Gene expression dynamics were studied by carrying out RNA extraction and sequencing on the normally differentiated, Paneth cell enriched and goblet cell enriched organoids. One technical repeat from the Paneth cell enriched organoids was removed due to poor read quality. Following mapping and identification of reads, differential expression was calculated. The cell type enriched organoid RNAseq data was compared to the normally differentiated data to obtain cell type specific gene expression signatures. In total 4,135 genes were differentially expressed in the Paneth cell enriched data at log2 fold change of 1 or greater (positive or negative) and 2,889 in the goblet cell enriched data (Figure 3). The larger number of differentially expressed genes (DEGs) in the Paneth cell data could be attributed to the highly specialised nature of the cell type (Stappenbeck and McGovern 2017; Clevers and Bevins 2013). Some of these DEGs were identified in both the Paneth cell and the goblet cell datasets, exhibiting the same direction of change compared to the normally differentiated organoid data. In total, 1,363 genes were found upregulated in both the Paneth cell and goblet cell DEGs. 442 genes were found downregulated in both datasets. This result highlights considerable overlap between the cell types,, and can be explained by the shared differentiation history and secretory function of Paneth cells and goblet cells. The majority of the DEGs were annotated as protein coding. In addition, lncRNAs and miRNAs were identified and any gene not annotated as one of the above is classified as ‘other’, which included pseudogenes and antisense genes.
To validate the cell types present in the organoids, major cell type specific markers were observed across the organoids using transcript abundances and RNA differential expression results (Supplementary table 2). The control organoids and the cell type enriched organoids expressed markers for stem cells (Lgr5), enteroendocrine cells (ChgA), goblet cells (Muc2), Paneth cells (Lyz1) and enterocytes (Vil1). This confirms presence of all major small intestinal epithelial cell types in the organoids. In addition, we observed an upregulation of Muc2, Lyz1 and ChgA in Paneth cell and goblet cell enriched organoids compared to the control organoids. Lgr5 is downregulated in the cell type enriched organoids compared to the control organoids, confirming the more pronounced differentiated status of the organoids. Vil1 is not differentially expressed between the cell type enriched and the control organoids, indicating similar levels of enterocytes in organoids. Therefore, using primary cell type specific markers we show that the organoids contain all major cell types at quantities expected based on the culture mediums used.
To further investigate the cell type specificity of the DEGs, we measured enrichment of cell type specific marker genes in the upregulated DEG lists - given that DEGs represent the difference in cell type proportions between the cell type enriched and normally differentiated organoids. Marker gene lists were obtained from a high quality single cell sequencing survey carried out by Regev and colleagues (Haber et al. 2017). Hypergeometric distribution testing was used to generate a significance score for enrichment of Paneth cell and goblet cell type specific markers in each set of DEGs. While all tests were significant at corrected p value <= 0.05, we identified greater enrichment of Paneth cell markers in the Paneth cell DEG list and goblet markers in the goblet cell DEG list (Figure 4). This confirms that differential expression tests successfully discerned cell type specific signatures from the RNAseq data, although the data also contains noise from other cell types. Carrying out the marker enrichment method using enteroendocrine markers, we identified that both of the upregulated DEG lists are additionally enriched with enteroendocrine markers (Supplementary table 3). The organoid differentiation protocols used have been shown previously, in addition to Paneth and goblet cell enrichment, to result in enteroendocrine enrichment at the mRNA level (Yin et al. 2014). In conclusion, we have used gene expression analysis to identify expression signatures of Paneth cells and goblet cells. Application of extensive cell type specific markers data shows that the Paneth and goblet expression datasets exhibit cell type specific signatures.
Reconstruction of cell type specific regulatory networks
A mouse molecular interaction network was reconstructed through collation of experimentally verified regulatory interactions from seven open source databases (see Methods, Supplementary table 1). This background network provided a framework on which we overlaid the differential expression data. All filtered interactions are directed and comprise the following types: transcription factor (TF) - target gene (TG), TF - lncRNA, miRNA - mRNA, TF - miRNA, lncRNA - miRNA and lncRNA - protein. All molecules are represented using their Ensembl gene ID. In total 1,383,896 unique interactions were combined to create the network (Supplementary table 1). Of these, the largest two contributors are TF-TG interactions at 77% and TF-lncRNA interactions at 11%. These interactions connect a total of 23,093 unique molecules.
To generate cell type specific regulatory networks, background interactions were filtered using the differential expression data. The assumption was made that if both nodes of a particular interaction were expressed in the RNAseq data, the interaction was possible within the cell type of interest. Furthermore, to filter for the interactions of prime interest, only nodes which were differentially expressed and their associated interactions were included in the cell type specific networks. In total the Paneth network, generated using differential expression data from the Paneth cell enriched organoids compared to the normally differentiated organoids, contained 37,062 interactions, connecting 208 unique regulators with 3,023 unique targets (Figure 5). The goblet cell network, generated using differential expression data from the goblet cell enriched organoids compared to the normally differentiated organoids, contained 19,173 interactions connecting 125 unique regulators with 2,095 unique targets (Figure 5). 15.7% of all interactions (8,856 out of 56,235) were shared between the Paneth cell and goblet cell networks - although the direction of differential expression of the interacting nodes do not necessarily match between the networks. In each of these networks, some nodes act as both a regulator and a target and some were represented in more than one molecular form, for example, as a lncRNA as well as a target gene. Consequently, we have generated cell type specific regulatory interaction networks for Paneth cell and goblet cell types including transcriptional and post-transcriptional interactions.
Identification of potential cell type specific master regulators
Cell type specific markers represent genes and associated functions specific to a particular cell type. Therefore, we expect that the regulators of these marker genes have an important role in cell type specification and function. Consequently, to identify potential master regulators of the Paneth and the goblet cell types, the upstream regulators of cell type specific markers were investigated. To do this, all markers were mapped to the relevant networks and subnetworks were extracted consisting of markers and their regulators (Supplementary table 4). 33 Paneth cell markers were identified, all of which act as target genes in TF-TG interactions and 9 of which also act as mRNAs in miRNA-mRNA interactions (Figure 6). 62 regulators were identified for these markers in the Paneth cell network. In the goblet cell network 150 markers were identified with 63 regulators (Figure 6).
Observing the ratio of regulators and markers in the cell type specific marker networks (subnetworks), the Paneth cell dataset has, on average, 1.88 regulators for each marker. On the other hand, the goblet cell dataset exhibits only 0.42 regulators for each marker. The quantity of markers identified in each sub-network (33 in the Paneth network and 150 in the goblet network) correlates with the number of marker genes identified by Haber et al. (Haber et al. 2017) far fewer regulators are identified in the goblet cell subnetwork per marker than for Paneth cell. Whilst the underlying reason for this discrepancy is unknown, it could potentially be evidence of the complex regulatory environment required to integrate and respond to the arsenal of signals recognised by Paneth cells in comparison to goblet cells (Stappenbeck and McGovern 2017).
Further investigation of the distinct regulator-marker interactions highlights a gradient of regulator specificity (Figure 7). A collection of regulators (both cell type specific and shared) appear to regulate large proportions of the markers in the networks. For example, Ets1, Nr3c1 and Vdr regulate >50% of the markers in both the Paneth cell and the goblet cell subnetworks. Specific to the Paneth cell subnetwork, Cebpa, Jun, Nr1d1 and Rxra regulate >50% of the markers. Specific to the goblet cell subnetwork, Gfi1b and Myc regulate >50% of the markers. These regulators are potential master regulators of the given cell types. In contrast, regulators such as Mafk in the Paneth cell subnetwork and Spdef in the goblet cell subnetwork regulate only one marker. Klf15 is present in both subnetworks, regulating only two markers in each; Fgfrl1 and Pla2g2f in Paneth cells, Ggcx and Tor3a in goblet cells. These regulators likely have more functionally specific roles. Together, these results highlight potential cell type specific regulators which likely play key roles in specification and maintenance of Paneth and goblet cells and their functions.
Putative cell type specific regulators exhibit rewiring between Paneth cells and goblet cells
Of all the regulators of cell type specific markers, we identified in the Paneth and the goblet regulatory networks approximately 1/3rd (33/92) as present in both networks (Figure 6). Given that the markers are different between the cell types, a regulator shared between the Paneth cell and goblet cell marker sub-networks must show an altered pattern of regulatory targeting in the two cell types. This phenomenon, referred to as regulatory rewiring, often results in functional differences between regulators in different environments - for example, in this case, between the Paneth cells and goblet cells.
The Cytoscape application, DyNet, was used to quantify the levels of regulatory rewiring for each of the 33 shared marker regulators in the Paneth cell and goblet cell regulatory networks (Supplementary table 5) (Goenawan et al. 2016). Using this analysis we identified the five most rewired regulators as Etv4, mmu-let-7e-5p, mmu-miR-151-3p, Myb and Rora. Functional enrichment analysis was carried out on the targets of these regulators to test whether the targets specific to the Paneth cell and goblet cell networks have different functions (hypergeometric model, p value <= 0.1) (Supplementary table 6). Across all five regulators the general trend indicated that regulator targets specific to the Paneth cell regulatory network are associated with metabolism and regulator targets specific to the goblet cell regulatory network are associated with cell cycle and DNA repair. This suggests these functions are key features of the different cell types. Looking at the regulators in more detail, the goblet specific targets of mmu-miR-151-3p are significantly enriched in functions relating to antigen presentation, cell junction organisation, Notch signalling and the calnexin/calreticulin cycle. None of these functions are enriched in the shared or Paneth specific targets. Of particular interest is the calnexin/calreticulin cycle which is known to play an important role in ensuring proteins in the endoplasmic reticulum are correctly folded and assembled (Leach and Williams 2013). Therefore we hypothesise that mmu-miR-151-3p plays a role in the secretory pathway of goblet cells. In addition, different functional profiles were also observed for the targets of Rora in the Paneth and goblet regulatory networks. Targets present in both networks are significantly associated with mitosis, whereas those specific to the Paneth network are associated with metabolism, protein localisation, nuclear receptor transcription pathway, circadian clock and hypoxia induced signalling. Goblet cell specific targets of Rora are connected to cell cycle, signalling by Rho GTPases (associated with cell migration, adhesion and membrane trafficking), Notch signalling and interferon signalling. Altogether these observations suggest that regulatory rewiring has occurred between the Paneth and goblet cell types, resulting in functional differences in the actions of the same regulators. In particular, we see that metabolic functions are associated with Paneth cell regulators and the cell cycle is associated with goblet cell regulators.
Integrating our data with disease-related literature associations
To investigate the relevance of the identified regulators in human gastro-intestinal diseases, such as IBD, we carried out literature searches of the three groups of predicted master regulators: those specific to the Paneth marker subnetwork (Cebpa, Jun, Nr1d1 and Rxra), those specific to the goblet marker subnetwork (Gfi1b and Myc) and those which appear to regulate many of the markers in both the Paneth and the goblet subnetworks (Ets1, Nr3c1 and Vdr). Interestingly, five of these genes (Ets1, Nr1d1, Rxra, Nr3c1 and Vdr) encode transcription factors with connections to inflammation, autophagy and inflammatory bowel disease (IBD), as shown in Table 1. All five are associated with the Paneth cell subnetwork and apart from ETS1, all are nuclear hormone receptors. Paneth cell and goblet cell secretory functions, which play a major role in gut homeostasis, are highly dependent on autophagy (Cadwell et al. 2008; Stappenbeck 2010; Patel et al. 2013; Jones et al. 2018). Given that dysfunction of secretion and autophagy in Paneth and goblet cells has been associated IBD (Cadwell et al. 2009; Gersemann et al. 2009; Kim and Ho 2010; Ke et al. 2016; Stappenbeck and McGovern 2017; Jones et al. 2018), these putative master regulators highlight a link between Paneth and goblet cells, inflammation and IBD, providing an interesting avenue for further research.
Additionally, given the possible connections between identified master regulators and IBD, we tested the potential of our cell type specific regulatory networks for the study of Crohn’s disease (CD). To do this, we observed the presence of known CD susceptibility genes in the networks, using lists obtained from two autoimmune disease studies (Jostins et al. 2012; Farh et al. 2015). Of the 81 CD susceptibility genes, 22 were identified in the Paneth cell network and 12 in the goblet cell network (Supplementary table 7). Of these, one gene acts as a regulator in each network. In the Paneth network the CD associated gene Dbp, coding the D site binding protein, regulates Bik which codes the BCL2 interacting killer, a pro-apoptotic, death promoting protein. In the goblet cell network the CD associated gene Notch2 regulates Notch3 and Hes1. Interestingly, Real et al. demonstrated that through Notch signalling, HES1 can inhibit the expression of NR3C1, blocking glucocorticoid resistance in T-cell acute lymphoblastic leukaemia (Real et al. 2009). Additionally, the repression of HATH1 by HES1 via Notch signalling has been previously associated with goblet cell depletion in humans (Zheng et al. 2011). Ultimately, the observation of IBD susceptibility genes in the regulatory networks of these organoids highlights possible application of this model system to the study disease regulation at the cellular level.
Discussion
Through the integration of organoid RNAseq datasets, we have generated Paneth cell type and goblet cell type specific regulatory networks. With network analysis, we predicted regulators of cell type specific marker genes and highlighted the role of regulatory rewiring in related secretory cells, such as Paneth cells and goblet cells. Furthermore, we showed that these regulatory networks have potential applications to study cellular functions in disease contexts, such as IBD.
The study of intestinal epithelia, in particular Paneth cells, has been hampered by difficulty culturing these non-proliferating cell types. In the last decade, development of organoid culturing techniques has enabled great progress in the understanding of intestinal epithelial physiology and function. These ex vivo intestinal models, grown from Lgr5+ stem cells, maintain basic small intestinal crypt physiology, recapitulating epithelial homeostasis (Zachos et al. 2016; Sato et al. 2009). Here, we utilised small intestinal organoid cultures to measure transcriptional signatures of Paneth cells and goblet cells. To obtain cell type specific data, organoid differentiation was directed to enrich for these specific cell types (Yin et al. 2014) and differential expression carried out by comparison to normally differentiated organoids. We showed that relevant cell type specific markers (Haber et al. 2017) are enriched in the differential expression datasets. However, we also found that enteroendocrine markers were significantly enriched in both datasets, which could suggest that the organoid differentiation protocols additionally affected this cell type; a result also seen in the original protocol paper and by others (Yin et al. 2014; Mead et al. 2018). Given that enteroendocrine cell lineage specification is closely related in differentiation pathway to both Paneth and goblet cell types, this is not surprising. In addition, the presence of noise in the differential expression data, as shown by the marker enrichment results, is likely to be a side effect of close functional relationships between the Paneth cells and goblet cells. Furthermore, genes in the applied marker lists may also be expressed (at lower levels) in other cell types. The gene Itln1 provides a good example of this: it is given as a Paneth cell marker, but interrogation of the Haber et al. data shows the gene also has significant expression levels in goblet cells (Haber et al. 2017). Likely the gene, alongside many others, is important in both the Paneth cells and the goblet cells. However, despite the possible noise indicated by the cell-type specific marker analysis, our data provides far greater cell type specific resolution than using whole tissue samples. In future, we hope to significantly reduce this noise by applying fluorescence-activated cell sorting on the organoids to generate pools of cell types for subsequent sequencing - but this approach comes with its own challenges (Jung and Jung 2016). Transcriptomics of a cellular population is preferable to single cell sequencing as it mitigates effects of intracellular variation, for example due to spatio temporal asynchrony.
Through interrogation of transcriptional signatures of stranded and small RNAs, we were able to observe transcriptional (transcription factor - target gene) and post-transcriptional (e.g. miRNA-mRNA) regulation. miRNAs and lncRNAs have been shown to perform critical regulatory and mediatory functions in maintaining intestinal homeostasis (Chapman and Pekow 2015; Mirza et al. 2015). For example, a study by Farh et al. into the genetics of autoimmune diseases found that ~90% of causal variants map to non-coding regions of the genome (Farh et al. 2015), thus highlighting the importance of integrating miRNA and lncRNAs into regulatory networks. However, we believe our current study is the first to generate comprehensive transcriptional and post-transcriptional regulatory networks of Paneth cells and goblet cells. Interestingly, whilst absolute mRNA and protein copy numbers have previously been reported to correlate poorly within fibroblast samples (Schwanhäusser et al. 2011), recent publications found that within cell type enriched organoids, transcriptomic changes are well correlated to the proteome (Lindeboom et al. 2018) and with in vivo gene expression (Mead et al. 2018). Nevertheless, the addition of further ‘omics data-types to the described approach could generate a more holistic view of cellular molecular mechanisms, including the ability to observe post translational regulation.
In total we identified 4,135 differentially expressed genes (DEGs) in the Paneth data and cell 2,889 in the goblet data (Figure 3). The difference in quantity of DEGs between the datasets could be explained by the diverse selection of environmental signals integrated by Paneth cells. Additionally, as goblet cells are present in fewer numbers in the small intestine than the colon, we hypothesise that they could play a lesser role in this environment. (Stappenbeck and McGovern 2017; Clevers and Bevins 2013). These DEGs were subsequently mapped to a network of previously identified molecular interactions. This background network was collated by integrating experimentally verified mouse interactions from seven databases. To generate the cell type specific regulatory networks we extracted, from the background network, only those interactions where both nodes are differentially expressed in the relevant dataset. We cannot expect to reconstruct the exact structure of the underlying biological networks using this method, but we generate a collection of interactions likely present and important in the cell type of interest. For example, whilst regulators do not necessarily show strong co-expression with their targets, where co-expression exists there is greater chance the association is functionally interesting. Therefore, we can use these networks to represent and analyse current biological knowledge as well as to generate hypotheses and guide further research.
Identification of potential cell type specific regulators was carried out by observing upstream regulators of marker genes in the relevant networks. Using this data, we highlighted regulatory rewiring between the cell type networks. Downstream rewiring scores were calculated (using the main regulatory networks) for the 33 marker regulators observed in both the Paneth and goblet marker analyses. This investigation yielded Etv4, mmu-let-7e-5p, mmu-miR-151-3p, Myb and Rora as the most rewired regulators. Functional analysis on the targets of these regulators highlighted an overrepresentation of metabolism associated targets in the Paneth network. This result supports current understanding that Paneth cells rely on high levels of protein and lipid biosynthesis for secretory functions (Cadwell et al. 2008). In addition, recent evidence suggests that Paneth cells play an important role in metabolically supporting stem cells (Rodríguez-Colman et al. 2017). Additionally, functional analysis revealed an overrepresentation of cell cycle associated targets in the goblet network. As terminally differentiated cells do not usually undergo mitosis, the cell cycle connection is likely explained by the presence of many goblet-like transit amplifying cells, which has previously been observed (Paulus et al. 1993). This analysis highlights apparent redundancy and/or cooperation of regulators which control similar cell type specific functions. Additionally we show the potential importance of regulatory rewiring in the evolution of cell type specific pathways and functions.
Furthermore, we used the marker regulators to predict three groups of master regulators: those specific to the Paneth marker subnetwork (Cebpa, Jun, Nr1d1 and Rxra), those specific to the goblet marker subnetwork (Gfi1b and Myc) and those which appear to regulate many of the markers in both the Paneth and the goblet subnetworks (Ets1, Nr3c1 and Vdr). Literature investigation of these regulators highlighted that many of these regulators, particularly those associated with Paneth cells, have connections to autophagy, inflammation and IBD. This highlights the importance of autophagy and inflammation in Paneth cell function and suggests that dysregulation of key cell master regulators could lead to IBD. In addition, we identified 22 (out of 81) CD susceptibility genes in the Paneth cell network and 12 in the goblet cell network (Supplementary table 7). This finding highlights the possible applications of these cell type specific regulatory networks for the study of IBD at a cellular level.
Conclusion
Growing availability of RNA sequencing platforms and datasets has facilitated the identification of transcriptional signatures of tissues, disease states and, more recently, single cells. However, functional association and contextualisation of these signatures has provided challenges for researchers and clinicians. We describe a combinatorial systems biology approach to integrate organoid transcriptomics data into cell type specific interaction networks, incorporating information on transcriptional and post transcriptional regulation. We applied this approach, using Paneth cell and goblet cell enriched organoids, to reconstruct regulatory landscapes and predict master regulators of these cell types. Adaptation of this workflow to patient derived, genetic knockout and/or microbially challenged organoids, alongside appropriate wet-lab validation, will aid discovery of key regulators and signalling pathways of healthy and disease associated intestinal cell types.
Competing interests
None
Author contributions
EJ and ZM performed the experimental work with the organoids. AT, PS, TW and MO carried out the bioinformatic analysis. PP, LH, TW, FDP, WH and TK designed and supervised the experiments. AT, PS, ZM, EJ, IH and TK wrote the manuscript.
Funding
This work was supported by a fellowship to TK in computational biology at the Earlham Institute (Norwich, UK) in partnership with the Quadram Institute (Norwich, UK), and strategically supported by the Biotechnological and Biosciences Research Council, UK grants (BB/J004529/1, BB/P016774/1 and BB/CSP17270/1). ZM was supported by a PhD studentship from Norwich Medical School. PP was supported by the BBSRC grant BB/J01754X/1. AT and MO were supported by the BBSRC Norwich Research Park Biosciences Doctoral Training Partnership (grant BB/M011216/1). Next-generation sequencing and library construction was delivered via the BBSRC National Capability in Genomics and Single Cell (BB/CCG1720/1) at Earlham Institute by members of the Genomics Pipelines Group.
Supplementary
Supplementary table 1: A summary of the physical interactions compiled to generate the background network.
Supplementary table 2: Expression and differential expression values for primary cell type markers. Expression given as mean transcripts per million (TPM) for each organoid type. Log2 fold changes values given where differential expression criteria passed (p value <= 0.05 and log2 fold change >= |1|. Control = normally differentiated organoids; Paneth = Paneth cell enriched organoids; goblet = goblet cell enriched organoids.
Supplementary table 3: Hypergeometric significance testing of cell type specific marker enrichment in upregulated differentially expressed gene lists.
Supplementary table 4: Regulator-marker interactions in the Paneth and the goblet subnetworks, including regulator specificity.
Supplementary table 5: Rewiring analysis results for the marker regulators present in the Paneth and the goblet marker subnetworks. Dn score generated using Cytoscape app DyNet.
Supplementary table 6: Functional enrichment analysis of the top 5 most rewired (shared) marker regulators (p <= 0.1).
Supplementary table 7: Crohn’s disease associated genes in the cell type specific regulatory networks.
Acknowledgement
The authors are grateful for the helpful discussions to the past and present members and visitors of the Korcsmaros and Wileman groups.