ABSTRACT
Reprogramming somatic cells to induced pluripotent stem cells (iPSC) succeeds only in a small fraction of cells within the population. Reprogramming occurs in distinctive stages, each facing its own bottlenecks. It initiates with overexpression of transcription factors OCT4, SOX2, KLF4 and c-MYC (OSKM) in somatic cells such as mouse embryonic fibroblasts (MEFs). OSKM bind chromatin, silencing the somatic identity and starting the stepwise reactivation of the pluripotency program. However, inefficient suppression of the somatic lineage leads to unwanted epigenetic memory from the tissue of origin, even in successfully generated iPSCs. Thus, it is essential to shed more light on chromatin regulators and processes involved in dissolving the somatic identity. Recent work characterized the role of transcriptional co-repressors NCOR1 and NCOR2 (also known as NCoR and SMRT), showing that they cooperate with c-MYC to silence pluripotency genes during late reprogramming stages. NCOR1/NCOR2 were also proposed to be involved in silencing fibroblast identity, however it is unclear how this happens. Here, we shed light on the role of NCOR1 in early reprogramming. We show that siRNA-mediated ablation of NCOR1 and OCT4 results in very similar phenotypes, including transcriptomic changes and highly correlated high content colony phenotypes.. Both NCOR1 and OCT4 bind to promoters co-occupied by c-MYC in MEFs. During early reprogramming, downregulation of one group of somatic MEF-expressed genes requires both NCOR1 and OCT4, whereas another group of MEF-expressed genes is downregulated by NCOR1 but not OCT4. Our data suggest that NCOR1, assisted by OCT4 and c-MYC, facilitates transcriptional inactivation of genes with high expression in MEFs, which need to be suppressed to bypass an early reprogramming block. This way, NCOR1 facilitates early reprogramming progression.
INTRODUCTION
Induced pluripotent stem cells (iPSCs) are generated in vitro by overexpressing factors OCT4, SOX2, KLF4 and c-MYC (OSKM factors) in somatic cells [1]. iPSCs have been successfully used in disease modeling and cell transplantation, showcasing their relevance in regenerative medicine [2]. Despite significant improvements to the original protocol [3–5], in general, only a small percentage of somatic cells become pluripotent [6]. The reason is that, as embryonic development proceeds, cells gradually commit to a certain lineage, and acquire full stability upon differentiation [7]. Thus, in vivo, differentiated cells are not able to become other cell types, and such stability is conferred by chromatin state. Chromatin state defines the properties of regulatory elements as a function of DNA sequence, histone modifications, DNA methylation, chromatin architecture and the activity of transcription factor networks [7]. To understand reprogramming and develop new or improve existent iPS generation protocols, it is crucial to understand the interplay of chromatin modulators and transcriptional regulation.
Pluripotency acquisition involves silencing of somatic genes [8], and reactivation of the pluripotency network, which occurs in a stepwise manner [9]. The OSKM reprogramming transcription factors play different roles, but assist the two processes [10]. For instance, it has been shown that KLF4 and SOX2 bind to closed chromatin in MEFs [11,12]. These binding events are related to opening up of silenced pluripotency enhancers [12]. It has also been suggested that opening up of pluripotency enhancers requires cooperative binding of at least two of the reprogramming factors [10]. OCT4 was shown to bind to closed chromatin in somatic cells [11,13], but accumulating evidence indicates that OCT4 first binds open chromatin in MEFs and is recruited by other reprogramming factors to assist in opening-up of pluripotency enhancers [10,14,15]. OSK also are involved in transcriptional silencing of somatic genes, facilitating the relocation of somatic-specific transcription factors to loci co-occupied by OSK [10]. Thus, reprogramming factors facilitate the repression of the somatic program but also the activation of the pluripotency network. This transcriptional duality is elicited by differential interaction with transcriptional co-activators or co-repressors [16].
The transcriptional co-repressors NCOR1 (nuclear receptor co-repressor 1) and its paralogue NCOR2 (silencing mediator of retinoic acid and thyroid hormone receptor or NCoR2), interact with transcription factors to suppress their target genes [16]. NCOR1/NCOR2 are essential enzymatic co-factors of histone deacetylases, especially HDAC3 [17] and mediate transcriptional repression of their targets via deacetylation of Lysine residues of histone tails [18,19]. NCOR1/NCOR2 are essential for organism homeostasis, and have non-redundant roles in metazoans, as mutation of each gene individually leads to embryonic lethality [16]. In fact, NCOR1/NCOR2 regulate several essential metabolic and cellular processes, such as circadian rhythm [20] and mitochondrial function [21]. Recently, it was shown that NCOR1/NCOR2 interact with all four OSKM factors during reprogramming from MEFs to iPS [22]. NCOR1/NCOR2 interaction with c-MYC promotes the repression of pluripotency genes in later reprogramming stages [22], imposing a barrier for iPS generation. In addition, NCOR1/NCOR2 knockdowns induced transcriptional upregulation of somatic genes in the earliest reprogramming stages [22], arguing in favor of a dual role for these co-repressors. However, it is unknown how these early effects of NCOR1/SMART are brought about.
Previously, we performed a high-content imaging siRNA screening, combined with a secondary RNA-sequencing screen, to identify chromatin-associated regulators during early reprogramming from MEFs to iPS [23]. Using this data from two orthogonal screens, we identified strong phenotypic similarities between NCOR1 and OCT4 knockdowns. We have identified functional interactions from knockdowns showing similar phenotypes before [23,24]. Here we characterize the relationship of NCOR1 and OCT4 in early reprogramming. For such purpose, we first looked closer into the phenotypic similarities of both knockdowns, based on the high-content imaging. Then we set out to investigate the effect on the reprogramming transcriptome in Ncor1 and Oct4 knockdowns with RNA-sequencing. Finally, the analyses we compared our RNA-seq findings to published ChIP-seq and RNA-seq datasets. These analyses not only document the cooperation of OCT4 and NCOR1 in downregulating one set of somatic genes, they also show an antagonistic role in the regulation of another subset of somatic genes during early reprogramming.
RESULTS
High content screening reveals early reprogramming disruption upon Ncor1 depletion
In a previous study, we performed a high-content siRNA screen, targeting 300 chromatin-associated factors during early reprogramming (Fig. 1A) [23]. We hypothesized that the onset of reprogramming would be associated with colony-level phenotypic changes and that these changes are affected upon gene perturbation. Moreover, using computational analysis of high-content imaging data, perturbed genes can be grouped according to their phenotypic similarities in a multidimensional space, rather than focusing on a single phenotype [25]. We used early pluripotency markers CDH1 and SALL4 to detect colonies at day 6 of reprogramming. Additionally, multiple colony features were extracted, including pluripotency marker expression levels, morphological traits such as colony roundness, area and symmetry, and many other colony texture and morphological features [23]. We selected 30 hits that were subjected to a secondary RNA-seq-based screening [23] (overview in Fig. 1A). We have previously identified functional interactions from knockdowns showing similar phenotypes [23,24,26]. Using this rationale, we compared knockdown-to-knockdown correlations based on high-content phenotypes and also based on transcriptomes (Fig. 1B). We filtered for pairwise correlations showing highest scores in both the high-content and the RNA-seq screen (both R> 0.4). We visualized the resulting potential interactions between genes (nodes) as networks, considering both high-content imaging correlations (edge width) and transcriptome-based correlations (edge color) of all siRNA knockdowns. We previously uncovered a functional interaction between BRCA1, BARD1 and WDR5 [23]. The strongest correlation however is between NCOR1-OCT4, raising the possibility of a functional interaction between these factors (Fig. 1B).
We verified that, based on high-content features, siNcor1 positively correlated with siOct4, but anti-correlated with the negative (nt, non-targeting) control (Fig 1C). Colony number analysis derived from the screen showed a significant reduction of colony formation upon Ncor1 depletion (Fig. 1D, Supplemental Fig.1). High-content images from Ncor1 and Oct4 knockdowns showed very low CDH1 and SALL4 intensities and compromised colony formation, when compared to the nt control. We validated this finding using an independent In-Cell Western assay and confirmed that siNcor1 and siOct4 show similarly impaired reprogramming phenotypes (Fig 1E, Supplemental Fig. 1).
From these experiments we concluded that siRNA targeting of Ncor1 results in an early reprogramming phenotype that is similar to that caused by Oct4 knockdown.
Ncor1 depletion affects transcriptional regulation of fibroblast identity
To gain insight in the molecular events that explain early reprogramming disruption by Ncor1 depletion, we performed gene expression analyses. We first asked whether Ncor1 mRNA expression followed a dynamic pattern during early reprogramming. For this, Ncor1 mRNA expression was quantified by RT-qPCR at different reprogramming timepoints, from MEFs to reprogramming day 6. We found that Ncor1 mRNA expression was relatively stable throughout reprogramming, with slight variations (Fig. 2A). Then, we sought to understand the role of NCOR1 in transcriptional dynamics during early phases of iPSC reprogramming. For this purpose, we performed RNA-sequencing at day 3 and day 6 after expression of OSKM factors in MEFs transfected with control or Ncor1 targeting siRNAs, respectively. Differential gene expression analysis revealed 405 deregulated genes at day 6 (siControl versus siNcor1, adjusted p-value <0.05, Fig. 2B-C). The effect size was relatively small for most transcripts, suggesting that the strong defects in repogramming observed with siNcor1, are brought about by moderate changes in expression of hundreds of transcripts (mean absolute log2-fold change of 0.43). The majority of the genes (94%) were up-regulated, as expected for a transcriptional co-repressor, although the deregulation of some of these genes may be an indirect consequence of the reduction of NCOR1 rather than a reduced NCOR1-mediated repression on that particular gene.
To gain insight into which processes were affected in Ncor1-depleted cells, we performed Gene Ontology classification (GO) for upregulated genes because the downregulated were very few (Fig. 2D). This analysis showed that upregulated genes were associated with cell-cell adhesion, poly-A-RNA-binding, exosome and cytoskeleton (Fig. 2D). Using our reprogramming RNA-seq time course dataset [23], we observed that genes associated with these terms were indeed downregulated in the course of early reprogramming (Fig. 2E). To note, the cell adhesion terms featured laminins, collagens and other genes related to basement membrane and extracellular matrix. These data suggest that reduction of Ncor1 prevents downregulation of genes related to fibroblast identity during MEF to iPSC reprogramming.
NCOR1 and OCT4 co-regulate two distinct MEF-expressed groups of genes
To gain insight into the functional relationship between OCT4 and NCOR1, we examined RNA-seq profiles at reprogramming day 6 for both knockdowns. To verify Ncor1 and Oct4 mRNA-targeting by the corresponding siRNAs, we plotted the RNA-seq normalized values for each of the knockdowns, compared to the controls at day 3 (Supplemental Fig. 2). Oct4 knockdown did not show a high efficiency of knockdown at day 3 in these samples, however from time course experiments we have observed that Oct4 mRNA is targeted and efficiently silenced at day 2, but rapidly recovers within a couple of days (Supplemental Fig. 2). This is due to its high overexpression driven by the tet-OSKM cassette [27]. Even with such transient early knockdown we observe a strong phenotype (Fig. 1) and many deregulated genes (Fig. 2A), highlighting the important contribution of OCT4 in OSKM reprogramming.
When we compared the genes deregulated in each knockdown we found that approximately half of the genes deregulated in siNcor1 cells were also deregulated in siOct4 cells (3.4 fold enrichment compared to random overlap, hypergeometric p-value 8.9 x 10-54; Fig.3A). After hierarchical clustering of genes deregulated by both siNcor1 and siOct4 based on RNA-seq, we identified one small cluster (Cluster 1) and a large cluster that was further divided in two (Cluster 2A, 2B; Fig.3B). Cluster 1 genes were relatively unaffected at day 3, but at day 6 they were downregulated for both knockdowns. For cluster 2A we found that genes were also unaffected at day 3, but showed a similar upregulated pattern in both Oct4 and Ncor1 siRNAs at day 6 (Fig. 3B). Cluster 2B contained genes with high expression in MEFs, most of which failed to be downregulated at day 3, whereas by day 6 they predominantly showed opposite expression changes in the knockdowns (siNcor1 versus siOct4) (Fig. 3B). Our RNA-seq time course dataset [23] showed that genes in cluster 1 mostly remain the same in time, whereas clusters 2A and 2B tend to go down in time, but with different dynamics (Fig. 3C).
To shed some light on the difference between clusters 2A and 2B, we performed GO classification for genes of each cluster (Fig. 3D). NCOR1-OCT4 co-regulated genes (cluster 2A) showed an association with RNA metabolism and cell adhesion (Fig. 3D). The term “poly(A)-RNA binding” contained several components for pre-mRNA splicing. Genes increased by siNcor1 but not siOct4 (Cluster 2B), were associated with functions in extracellular matrix organization and other cell adhesion features. We wondered whether the “cell adhesion” terms were related to the mesenchymal-to-epithelial transition (MET) observed in early reprogramming [28,29]. We found that most mesenchymal genes exhibit a cluster 2B-type pattern, with increased expression at day 3 in both knockdowns and opposite effects of the two knockdowns at day 6. Epithelial genes on the other hand show more variable patterns (Fig. 3E). In siOct4 cells, the downregulation of mesenchymal gene expression is delayed. By day 6, it not only catches up, this decrease in expression is more severe in siOct4 cells relatively to control cells, suggesting that OCT4 moderates this decrease during normal reprogramming. In siNcor1 cells on the other hand, mesenchymal gene expression is maintained at higher levels at both time points, although some epithelial genes are also expressed at higher levels (Fig. 3E).
These results identify two distinct groups of somatic genes, the expression of which is normally reduced during reprogramming. Ncor1 depletion prevents downregulation of both groups, whereas Oct4 depletion prevents the downregulation of one group and causes a stronger decrease in expression in the other group of genes.
NCOR1 and OCT4 target promoters associated to cellular and metabolic processes in MEFs and early reprogramming cells
To gain insight into the functional relationship between NCOR1 and OCT4, we analyzed published ChIP-seq and RNA-seq data[10,22]. First, we asked whether the up-regulated genes that we identified in siNcor1 and siOct4 cells, were either MEF genes or genes dynamically changing during reprogramming. Consistent with our earlier analysis (Fig. 3C), these independent data confirmed that upregulated genes in siNcor1 cells were mostly MEF-expressed and very early reprogramming (48h) genes, but were mostly inactive in ES cells (Fig. 4A). In the heatmap several genes related to fibroblast identity are labeled (lamins, collagens, fibronectin, signaling genes; Fig. 4A). In contrast, genes upregulated in siOct4 cells showed a variety of expression patterns during reprogramming (Fig. 4B).
Then, we asked which fraction of differentially expressed genes in Ncor1- or Oct4-knockdowns were directly bound by NCOR1. We overlapped NCOR1 binding sites in reprogramming cells with siNcor1 and siOct4 differential genes (Fig.4C, left and right, respectively) and calculated the fold enrichment. In all cases, the overlap was higher than expected by random chance, as determined with a hypergeometric test (Table 1). These data confirmed that NCOR1 directly binds to a significant fraction of deregulated genes in Ncor1 and Oct4 knockdowns (Fig. 4C, Table 1).
We then asked whether NCOR1-bound genes with increased expression in siNcor1 cells, were co-bound by OCT4 or c-MYC or were decorated with the H3K4me3 promoter mark (121 genes in Fig. 4C left). We observed that these NCOR1-bound regions were also enriched with OCT4 and c-MYC at 48 hours of reprogramming, whereas OCT4 and c-MYC binding was lower in ESC (Fig.4D). Since c-MYC is bound to those regions in MEFs already (Fig. 4D) and c-MYC, OCT4 and NCOR1 colocalize in the genome during reprogramming [22], it is possible that c-MYC facilitates OCT4 and NCOR1 binding to those regions early in reprogramming. Furthermore, the enrichment of H3K4me3 in MEFs, 48 h and ESC suggests that these genomic locations represent promoters (Fig.4D).
To explore the function of genes co-regulated by NCOR1 and OCT4, we performed a Gene Ontology classification of genes analyzed in Fig. 4D (Supplemental Fig.3). This analysis confirmed once more the terms we observed before (Figs. 2D and 3D), related to cell-cell adhesion, extracellular exosome, extracellular matrix, cytoskeleton, focal adhesion and poly(A) RNA binding, among others (Supplemental Fig.3). As before, the poly(A) RNA binding term included several genes related to translation and pre-mRNA splicing.
DISCUSSION
In this study we found that depletion of Ncor1 results in disrupted colony morphology in early iPS cells. Transcriptome analyses showed that NCOR1 has a role in suppression of fibroblast identity and signaling modulation (Fig. 2-3). Our findings extend an earlier report on the role of NCOR1 in later phases of reprogramming [22]. NCOR1 function could be mediated by c-MYC, that occupies many MEF regulatory regions [10].
Using multi-dimensional readouts, we found strong phenotypic similarities between Ncor1 and Oct4 depletion. Such phenotypic similarities often reflect functional interactions [23,24,26]. Our analyses indicate that NCOR1 and OCT4 co-bind a group of promoters that are also bound by c-MYC in early reprogramming cell populations. Genes associated with these promoters are expressed in MEFs and during the first 48 hours of reprogramming, and are mostly transcriptionally silent in ESCs (Fig.4). Upon Ncor1 knockdown they are upregulated. This could mean that the repressor NCOR1 is recruited to such promoters by OCT4/c-MYC quite early in reprogramming, and transcriptional downregulation occurs via histone deacetylation. There was a surprisingly moderate effect on the transcriptome associated with siNcor1. Because of the increased proliferation of reprogramming cells [30], Ncor1 knockdown at day 3 was 50%. Perhaps the effects on the transcriptome would have been stronger if we had transfected siRNAs at day 3 instead of day 0, or alternatively with a complete loss of function model. In addition, Ncor1 and Ncor2 may have partially redundant functions [31], which may have contributed to the relatively mild effect on the transcriptome. Nevertheless, we observe a strong phenotype in colony formation, and the transcriptome data suggest that Ncor1 attenuates signaling, which might be crucial to weaken fibroblast identity. Consistent with this, NCOR1/NCOR2 are known to be connected to various signaling pathways [31].
Other genes involved are related to essential cellular functions, such as cell-cell adhesion, vesicle-mediated transport and RNA splicing. In agreement, complexes and proteins associated with these functions have been shown to be downregulated during intermediate stages of reprogramming [32]. Moreover some of these processes (e.g. cell adhesion, vesicle-mediated transport) have been shown to be barriers for reprogramming [33,34]. One apparent discrepancy in these analyses might be that the active promoter mark H3K4me3, seems abundant in MEFs, 48h reprogramming cells and ESCs. However, one possible explanation would be that target genes are downregulated in intermediate stages but then required again in late stages, as has been shown for some of the processes involved (cell adhesion, spliceosome) [32]. Moreover, some of these genes are essential for the cellular functions and continue to be expressed in ESCs, albeit at lower levels.
Recent work has shown that NCOR1/NCOR2 interact with all OSKM factors, but especially with c-MYC [22]. During late reprogramming stages, the co-repressors NCOR1/NCOR2 interact with c-MYC to silence the pluripotency network, posing a barrier for reprogramming [22]. Our study contributes with further characterization of an early facilitating function of NCOR1 in reprogramming, which in cooperation with OCT4 and c-MYC, may suppress cellular and metabolic functions sustaining fibroblast identity. This is consistent with a role of NCOR1 in overall cellular homeostasis and metabolic regulation [16,21]. It also highlights the emerging role and cooperation of reprogramming factors in transcriptional repression [10]. Future work will focus on potential functional interactions of NCOR1 with other chromatin factors, for instance with PHF1 or APEX1 (Fig.1B), and their physical interactions in early and late stages of reprogramming.
DATA AVAILABILITY
High-content screening data can be downloaded from supplemental information in [23]. Reprogramming RNA-sequencing timecourse data is available from NCBI-GEO database accession number GSE118679. Non-targeting control RNA-seq data accession number in GEO is GSE118677.
EXPERIMENTAL PROCEDURES
Lentivirus production
HEK-293T cells up to passage 30 were cultured in 15 cm dishes to reach around 90 % confluency at the transfection day. Transfections with 3rd generation lentiviruses were done with Lipofectamine 2000 following the manufacturer’s instructions. For each plate we used
MEF-to-iPS Reprogramming
Primary mouse embryonic fibroblasts (MEFs) strain C57/BL6 passage 0-1 were seeded at a density of 10 000 cells per cm2 in MEF medium consisting of 15 % FBS, 100 nM b-mercaptoethanol and 1 % non-essential aminoacids (Thermo Scientific). Next day, MEFs were transduced with the doxycycline-inducible mouse tet-STEMCCA [27] and rtTA lentiviruses (Addgene # 20342) at an MOI of 1. One day after transduction, cells were either transfected with siRNAs or induced for reprogramming with reprogramming medium. This consisted of DMEM-high glucose (Thermo Scientific) supplemented with 10% stem cell grade FBS (Hyclone), 200 nM b-mercaptoethanol (Sigma), 1 % sodium pyruvate (Thermo Scientific), 2μg·mL-1 doxycycline, 3 μM Chiron (GSK3-inhibitor), 0.25 μM Alk5i (TGF-ß inhibitor), 50 μg·mL-1 ascorbic acid and 1·103 U·mL-1 LIF. Next day after adding reprogramming medium was considered reprogramming Day 1.
siRNA transfections
Transfections were performed as described previously [23]. Before adding the cell suspension containing OSKM-transduced MEFs, multi wells were prepared with transfection mix. This consisted of a 40nM pool of 3 different siRNAs per target mRNA, diluted in Optimem (Thermo Scientific) together with RNAiMAX lipofectamine, according to the manufacturers instructions. After this incubation, cell suspensions were added to each well. Twenty four hours after transfections, reprogramming started by adding medium with doxycycline.
High-content screening
High-content screening was performed as described before[23]. Briefly, transfected cells were cultured in Cell Carrier 96-well black plates (Perkin Elmer), after six days, samples were fixed with 4 % PFA and stained for CDH1 (Cell Signaling 14472) and SALL4 (Abcam 29112) with DAPI counterstain. Plates were imaged in an Opera High-Content-Screening System (Perkin Elmer). Colony segmentation was done in multiple Z-planes using SALL4 staining. Columbus Software (Perkin Elmer) was used to extract all features in an automated fashion. Z-score normalization per plate was applied. The normalized data for high-content features were used for knockdown-to-knockdown Pearson-correlation analysis.
RNA isolation and RT-qPCR
RNA was isolated with RNA Micro-prep kit (Zymo Research). Integrity was verified with Bioanalyzer and concentrations were determined with Nanodrop. For reverse transcription we used SuperScript III Kit (Thermo Scientific) starting with 120-180 ng total RNA. For the RT-qPCR reaction 1-2 ng of cDNA were used in 20 uL reaction mix with Sybr Green mix ready to use (iQ-SYBR-Green Supermix, Biorad). Relative gene expression was calculated with the ΔΔCt method using GAPDH as reference.
RNA-sequencing library preparation and data analysis
Kapa-RNA HyperPrep kit with RiboErase (Roche, Kapa Biosystems) was used for ribosomal RNA depletion, and library preparation, with 200 ng total RNA. Library amplification was performed for 10 cycles, according to manufacturer’s instructions. Correct size distribution of 300 bp was verified with Bioanalyzer (Agilent Technologies). Quality controls were performed for selected transcripts in each sample before and after library preparation by qPCR.
NextSeq500 Illumina platform was used for paired-end library sequencing, with 43 bp read length. STAR v.2.5.b [35] was used to align reads to mouse genome assembly mm10. Data was normalized to log2-cpm with R package edgeR v.3.20.9 [36].
For differential gene expression analysis we used R Package DESeq2 v.1.18.1 [37]. Gene lists were filtered with a cutoff of p-adjusted value <0.05.
Gene Ontology classification with differential gene lists was performed with web-based DAVID [38], considering only categories with gene counts > 10 and p-adjusted value <0.05.
ChIP-seq and RNA-seq data integration
For ChIP-seq data integration, FASTQ files of ChIP-seq data were downloaded from NCBI GEO (GSE70736, GSE90893, GSE90895) and mapped to the mm10 genome assembly using BWA allowing one mismatch per read. NCoR/SMRT peaks (Suppl. Table S4;[22]) were intersected with siNcor1 day 6-upregulated genes (linux join, using gene symbols) to obtain genomic locations of interest. Heatmaps of ChIP-seq data were generated with fluff [39] with hierarchical clustering and RPKM normalization. RNA-seq data (Suppl. Table S2; [10]) was intersected with differentially expressed genes in our study (linux join, using gene symbols). Corresponding RNA-seq heatmaps were generated using seaborn clustermap with row clustering and row z-score normalization. Venn diagrams were obtained using venn3 from the matplotlib_venn library. Hypergeometric probabilities were calculated using scipy.stats.