Abstract
The baseline level of transcription is variable and seriously complicates the normalization of comparative transcriptomic data, but its biological importance remains unappreciated. We show that this parameter is in fact crucial for the interpretation of molecular biology results. It is correlated with the degree of chromatin loosening measured by DNA accessibility, and systematically leads to cellular dedifferentiation as assessed by transcriptomic signatures, irrespective of the molecular and cellular tools used. A theoretical analysis of gene circuits formally involved in differentiation reveals that the epigenetic landscapes of Waddington are restructured by the level of non-specific expression, such that the attractors of progenitor and differentiated cells can be mutually exclusive. Together, these results unveil a generic principle of epigenetic landscape remodeling in which the basal gene expression level, notoriously important in pluripotent cells, allows the maintenance of stemness by generating a specific landscape and, in turn, its reduction favors multistability and thereby differentiation. This study highlights how heterochromatin maintenance is essential for preventing pathological cellular reprogramming, age-related diseases and cancer.
Introduction
Data from the literature show that basal expression, chromatin loosening and stemness are intimately connected phenomena: (i) Stem cell chromatin is loosened compared to that of differentiated cells [1, 2, 3] and the differentiation of stem cells is accompanied by the progressive condensation of their chromatin [4]. (ii) A high level of basal expression is a hallmark of stem cells, distinguishing them from their terminally differentiated counterparts [5, 6]. (iii) The histone mark H3K9me3, associated with closed chromatin, prevents reprogramming [7, 8]. Its inhibition prevents differentiation [4] whereas its forced demethylation facilitates reprogramming [9, 10]. (iv) H3K9 acetylation characterizes pluripotency and reprogramming capacity [11]. (v) More generally, opening chromatin by inhibition of DNA methyltransferases and histone deacetylases improves the induction of pluripotent stem cells [12]. Such observations have also been reported for specialized cases of terminal differentiation. For instance, a defect of the H3K9 trimethylase Suv39h1 maintains the reprogramming capacity of CD8+ lymphocytes [13] and chromatin acetylation induces the developmental plasticity of oligodendrocyte precursors [14]. Long before these studies, it had already been shown that cellular differentiation is associated with an overall loss of DNA accessibility, measured experimentally with DNase I [15]. This impressive list of convergent observations with stem and progenitor cells can be further extended to pathological cases of dedifferentiation, notably cancer. On the one hand, cancer cell chromatin is globally decondensed, with demethylated DNA and acetylated nucleosomes, except at the level of tumor suppressors.
On the other hand, the aggressiveness of cancer is correlated with the degree of dedifferentiation and cell state plasticity, allowing for example cells originating from the mammary epithelium to forget their initial identity, escape hormonal control and acquire migratory properties [16]. These systematic correlations prompted us to look for an underlying principle rooted in the physics of genetic networks. The dedicated tool for this purpose is naturally the Waddington epigenetic landscape, long envisioned as the ideal framework for conceptualizing cell differentiation and development. An epigenetic landscape, in the sense initiated by Waddington [17], is an n-dimensional potential surface shaped by the mutual compatibility or incompatibility of the concentrations of the n cellular components. Indeed, the basic principle of cellular systems is that the different macromolecules cannot be present in arbitrary relative concentrations in the cell, because of the internal constraints of reticulated networks. The most direct interactions are mediated by transcription factors (TFs) and the most widely studied interaction networks are gene regulatory networks (GRNs). Epigenomic and transcriptomic profiles found in large datasets emerge from such underlying circuits. Under certain conditions which are fulfilled in living systems, including positive loops and nonlinear interactions, several steady states can coexist in the landscape and the system is said to be multistable. In this picture of generalized interactions, the existing cellular phenotypes correspond to the possible discrete combinations, like barcodes, of cellular components defining the bottom of the basins in the landscape. These local minima are steady states in which the different nodes of the network remain stable. All the other combinations, falling on the ridges or the sides of the mountains, are unstable and automatically pulled down by restoring forces to a basin of attraction located in the vicinity.
This view illuminated our understanding of development and cellular differentiation, conceived as emerging from GRNs and biochemical circuits [18, 19, 20]. In this framework, cellular differentiation is underlain by phase space translocation of gene regulatory systems from median attractors with generalized gene expression, to border attractors with selective gene expression. The former are presumed to be metastable and less resistant to fluctuations whereas the latter are classically considered stable [21], but experiments show that reprogramming differentiated cells generally remains possible and that conversely, multipotent cells can persist indefinitely in culture and in organisms [21, 22]. Consistent with these observations, a combination of experimental and theoretical approaches reveals a new general principle governing cellular differentiation, in which stemness states, dominated by median attractors, remain stable as long as the ratio of basal vs regulated transcription is high. In turn, the lateral attractors with selective gene expression, which characterize terminally differentiated phenotypes, progressively deepen when basal expression is lowered.
Results
To examine the relation between basal expression and chromatin compaction, we developed cellular/molecular instruments using mutant versions of the myocardin-related TF (MRTFA/MKL1).
MRTFA has been shown to participate in a transcriptional cocktail of stemness in breast cancer cells [23] and to erase the initial differentiation status of cells [24]. But as for most biological molecules, the function of MRTFA is finely regulated, for instance by its level of expression, subcellular location and interaction partners, making it difficult to manipulate. We showed that it is possible to impose clear-cut functions on MRTFA by deleting specific interaction domains. Overexpression of a dominant positive mutant version (DP-MRTFA), devoid of the cytoplasm-anchoring domain, constitutively nuclear and transcriptionally active, leads to global chromatin decondensation and induces stem cell marks such as bivalent chromatin [16]. By contrast, another mutant, dominant-negative (DN-MRTFA) and devoid of the transactivation domain, tightens chromatin and strengthens the differentiated phenotype [16]. As shown in Fig.1A, marked phenotypic changes are induced by DP-MRTFA, with disruption of pericellular E-cadherin and of the pseudo-epithelial structure of cell monolayers.
Dedifferentiation, chromatin decondensation and basal expression are coupled phenomena
Global transcriptomic studies allowed us to identify the modifications of gene expression occurring in these cells in terms of signatures (Fig.1C). DP-MRTFA caused a loss of differentiation characteristics accompanied by a clear emergence of basal cell and epithelial to mesenchymal transition (EMT) signatures, which characterize mammary stem cells and epithelial dedifferentiation respectively. A switch of energy metabolism to glycolysis, a phenomenon regularly associated with EMT, is also obtained with DP-MRTFA. For comparison, DN-MRTFA induces no such modifications of gene expression. The changes induced by DN-MRTFA are globally inverse to those of DP-MRTFA, but moderate. These less visible phenotypic and genetic changes could be due to the fairly differentiated nature of the starting MCF7 cells. DP-MRTFA expressing cells specifically contain large amounts of RNA per cell (mRNA + rRNA + small RNAs), as measured using the perchloric acid precipitation procedure, indicative of increased overall transcription (Fig.1B). This hypertranscription, which is another feature common to stem cells [25], could be due either to a strong increase in the specific expression of certain genes, or to a global increase of non-specific gene expression. It is technically challenging to decide between these two possibilities and to compare gene expression between cells, because basal expression generally goes unnoticed in current experimental approaches. It is ignored in transcriptome-type techniques where the results are expressed per unit mass of RNA, and/or are calibrated using genes whose expression is assumed invariant between the situations. Since transfected DNA is packaged into nucleosomal structures similar to native chromatin [26], transient expression assays are expected to incorporate basal expression, but they also have some pitfalls: (i) It is first necessary to ensure the equivalence of transfection efficiency between the different culture conditions or the cell types to be compared.
(ii) It is then necessary to find the most appropriate point of reference to quantify transcriptional changes. (i) The first requirement can in principle be satisfied by co-transfection of a neutral expression vector, provided its expression is not influenced by the more or less permissive nuclear context of each cell. A suitable internal control for this purpose is a strong promoter capable of abstracting itself from repressive contexts [27], such as the cytomegalovirus promoter (CMV) selected here. (ii) The second point is more subtle. Transcriptional results are generally presented as “fold induction” by arbitrarily setting the uninduced condition to 1, but this presentation could introduce a bias in the interpretation of the results, since the basal (uninduced) expression level depends on the cellular context. To highlight this phenomenon, the same results are presented in different manners in Fig.2. In these experiments, the transcriptional induction of reporter vectors is directed by either estrogen receptor (ERE) or glucocorticoid receptor response elements (GRE). ERE- and GRE-driven reporter plasmids were transfected with or without expression vectors of their respective inducers: ERα with estradiol for ERE-Luc, and GR with dexamethasone for GRE-Luc. The cells tested were either human cell lines with different degrees of differentiation (Fig.2A) or MCF7 cells expressing MRTFA constructs (Fig.2B). The left histograms in Fig.2A show the results traditionally presented in fold induction, after fixing the basal expression level to 1 for each cell type. Presented in this way, the results suggest that ERα and GR are less potent in dedifferentiated cells; but setting the induced level to 1 rather suggests that basal expression strongly increases in these cells. Deciding which conclusion is the right one is all the more difficult because the compared cellular contexts differ widely.
To bypass this problem of comparison, the cotransfection procedure was then applied to MCF7 cells only, which allowed us to verify that the dedifferentiating construct DP-MRTFA actually increases basal expression (Fig.2B). The reinterpretation of results in terms of variation in basal expression instead of modified induction is unusual in the literature, which could explain why the role of basal expression is generally overlooked. Comparison of Fig.1 and Fig.2B shows that the increase vs decrease of basal expression is correlated with the tendency towards dedifferentiation vs differentiation.
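The normalization issue described above can be illustrated with a minimal sketch. The reporter readings below are hypothetical numbers invented for this example (they are not data from Fig.2); the point is only that the same raw values read very differently depending on whether the basal or the induced level is set to 1.

```python
# Hypothetical reporter readings (arbitrary luciferase units), invented for
# illustration only: the induced level is identical in both cell types,
# only the basal (uninduced) level differs.
readings = {
    "differentiated": {"basal": 1.0, "induced": 20.0},
    "dedifferentiated": {"basal": 8.0, "induced": 20.0},
}


def normalise(condition, reference):
    """Divide basal and induced levels by the chosen reference level."""
    ref = condition[reference]
    return {key: value / ref for key, value in condition.items()}


# Classical presentation: basal level set to 1, result read as fold induction.
fold_induction = {
    name: normalise(cond, "basal")["induced"] for name, cond in readings.items()
}

# Alternative presentation: induced level set to 1, result read as relative basal level.
relative_basal = {
    name: normalise(cond, "induced")["basal"] for name, cond in readings.items()
}

print("fold induction:", fold_induction)
print("relative basal:", relative_basal)
```

With these invented values, the fold-induction view suggests a weaker inducer in the dedifferentiated cells, whereas the second view shows that only the basal level has changed.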
When looking for a mechanism possibly underlying both chromatin decondensation and increase of basal expression, the most obvious candidate is the degree of chromatin acetylation. Quantification of the antagonistic chromatin marks H3K9ac and H3K9me3 shows a marked increase in acetylation in DP-MRTFA expressing cells and inverse variations in DN-MRTFA expressing cells (Fig.3A). As a control, the histone deacetylase (HDAC) inhibitor trichostatin A (TSA) induces potent H3K9 acetylation, as expected (Fig.3B). Histone acetylation has long been shown to alleviate electrostatic interactions between nucleosomes and DNA, mechanically causing chromatin loosening, which is in turn expected to promote the accessibility of DNA to large proteins such as TFs. To test this hypothesis, we quantified the general accessibility of DNA with a large DNA binding molecule, an antibody directed against double stranded DNA. Remarkably, a significant increase of accessibility was obtained with DP-MRTFA (Fig.3). This observation is consistent with the established importance of histone acetylases in stem cells [28] and with the availability of their cosubstrate, acetyl-CoA [2]. Acetyl-CoA actually drops when energy metabolism switches from glycolysis to oxidative phosphorylation (oxphos) during differentiation [2]. An inverse switch towards glycolysis is precisely observed in DP-MRTFA-expressing cells (Fig.1C).
Chromatin hyperacetylation is sufficient to induce some dedifferentiation characteristics
To determine whether the triple relationship between (i) chromatin hyperacetylation, (ii) dedifferentiation and (iii) basal expression is fortuitous or causal, we tested whether chromatin acetylation caused by artificial drug treatment can induce some characteristics of DP-MRTFA-expressing cells. As shown in Fig.4, mechanical opening of chromatin using TSA proves sufficient by itself to reproduce certain properties of DP-MRTFA cells, including a rise in basal expression (Fig.4A). In this respect, comparison of transcriptional induction by ERα in the presence or absence of TSA clearly confirms the misleading character of representations in fold induction in this context. When setting the control to 1, TSA seems to inhibit the activity of ERα (Fig.4A, middle histograms). But when setting the induced level to 1 (right histograms), it becomes clear that this drop in fold induction may instead be due to a strong increase in basal expression. TSA treatment also induces clear phenotypic (Fig.4B) and genetic (Fig.4C) changes. A 24-hour treatment with 200 nM TSA disrupts cell-cell contacts and downregulates E-cadherin. TSA does not cause a nuclear accumulation of endogenous MRTFA as potent as that of the mutant construct DP-MRTFA, but the accumulation is nevertheless significant, particularly in larger cells, and a strong perinuclear accumulation of MRTFA is observed. A signature analysis was then conducted to determine the transcriptomic changes induced by TSA in MCF7 cells.
As shown in Fig.4C, TSA has significant effects on gene expression, although less marked than with DP-MRTFA, on the decrease in luminal signature and the increase of signatures of basal cells, EMT and glycolysis. This transcriptomic reshaping is illustrated in Fig.4D by the changes in expression of a selection of well-identified marker genes: a decrease in GATA3 and ERα, involved in mammary luminal differentiation, a strong increase in the mammary stem cell marker IL6 [29, 23], and upregulation of genes involved in the metabolic switch to glycolysis (UCP2 and PDK4) (Fig.4D). Considering the systematic correlation observed between basal expression and cellular differentiation, it is now of interest to determine whether this association is merely phenomenological or reflects a fundamental property of biological systems. To this end, we tested the influence of basal expression on particular GRNs clearly identified as regulators of cellular differentiation.
Role of basal expression on GRN multistability
The impact of basal expression on differentiation is tested in the framework of Waddington landscapes using two simple model systems, unidimensional and bidimensional.
Principle of differential GRN modeling
A fundamental property of living systems is the permanent renewal of all their constituents, through continuous cellular refuelling with matter and energy. In this highly dynamic picture, the concentration of each constituent x results from the balance between synthesis (S) and removal (R)
$$\frac{dx}{dt} = S(t) - R(t) \tag{1a}$$
Synthesis can itself be split into basal synthesis (Sb), independent of the specific regulators of the considered gene, and activated synthesis (Sa) triggered by combinations of TFs, ncRNAs and virtually all the other components of the network in an indirect manner.
$$S(t) = S_b(t) + S_a(t) \tag{1b}$$
The removal of molecules also results from a basal mechanism (Rb), generally approximated as an exponential decay, but there is also the possibility of an active degradation (Ra) by specific actors such as ubiquitin ligases for proteins.
$$R(t) = R_b(t) + R_a(t) \tag{1c}$$
Both basal and specific synthesis will be considered, but for removal, we will only retain, as in most studies, an exponential decay R(x,t) = r x(t). Basal synthesis will be reduced to a basal frequency of transcription initiation Sb(t) = b, whereas activated synthesis Sa(t) is the product of the maximal transcription initiation frequency a and a fractional promoter occupation function f, ranging from 0 to 1, saturable and generally nonlinear, of potentially all the system’s components converging to TFs. The global evolution equation of the component xj thus reduces to
$$\frac{dx_j}{dt} = b_j + a_j\, f_j(x_1, \dots, x_n) - r_j\, x_j \tag{1d}$$
The functions f, which mediate the interdependence of the different constituents of the system, impose a collective organization where only certain combinations of concentrations can remain stable. The steady states at which all constituent concentrations are mutually compatible define the possible cell types generated by the system. Starting from given ingredients, a system is multistable when multiple such steady states can coexist. In each steady state, a set of compatible relative concentrations defines a cellular state. Multistability is a preeminent feature of living systems and is synonymous with the capacity of differentiation. It can be obtained when (i) the system is open, subject to permanent constituent renewal, (ii) at least one positive circuit is included [30] and (iii) velocities of either synthesis or removal are nonlinearly dependent on constituent concentrations. To test the effect of basal expression on the structure of the epigenetic landscape, we used abundantly documented paradigms of multistable circuits, consisting of one or two genes. Such minimalist circuits may appear ridiculously small compared to complete cellular systems, but they actually underlie real cases of bipotent progenitor differentiation. In addition, they have the practical advantage of being representable in the form of 2D and 3D landscapes.
Single gene circuit: The self-regulated gene encoding a dimerizable TF
A single autoregulated gene (Fig.5A), which is certainly one of the simplest possible circuits, is nevertheless sufficient to give rise to bistability, provided the conditions listed above are fulfilled. Despite its simplicity, this minimalist circuit is actually encountered in nature, for example: (i) the autoregulated gene ComK in the context of competence development in the bacterium B. subtilis [31], which is a sort of bacterial differentiation; (ii) in vertebrates, the vitellogenesis memory effect, evidenced in all egg-laying vertebrates tested, from fishes to birds [32]. As simple as it is, this system exhibits the principle proposed here. Increasing basal expression shifts the system from a single attractor to two different states, of low and high gene expression, which can be regarded as a minimalist type of cellular differentiation. This one-dimensional circuit is important to consider because it can give by integration a genuine Waddington landscape. The potential function for this unidimensional landscape can be calculated directly by integrating the evolution function of the product (synthesis minus removal) [33, 34]. Assuming a time scale separation between the DNA/TF interactions and gene expression dynamics, the rigorous modeling of this minimalist gene circuit, which distinguishes the monomer, dimer and total concentrations of the TF [35], reads
$$\frac{d[TF]_{tot}}{dt} = b + a\,\frac{[TF_2]}{K + [TF_2]} - r\,[TF]_{tot} \tag{2a}$$
where b is the basal expression rate, a is the maximal rate of activated expression, K is the constant of dissociation from DNA, and [TF2] is the concentration of the TF dimer, related to its total concentration [TF]tot through
$$[TF_2] = \frac{[TF]_{tot}}{2} - \frac{\sqrt{1 + 8D\,[TF]_{tot}} - 1}{8D} \tag{2b}$$
where D is the homodimerisation constant ([TF2] = D [TF]^2 for the monomer concentration [TF]). When replacing [TF]tot by x, the differential equation becomes
$$\frac{dx}{dt} = b + a\,\frac{4Dx + 1 - \sqrt{1 + 8Dx}}{8DK + 4Dx + 1 - \sqrt{1 + 8Dx}} - r\,x \tag{2c}$$
The balance between synthesis and removal terms (in short, the right-hand side of the last equation Eq.(2c)) is shown in Fig.5B for the following set of parameter values: D = 1.5; K = 0.2; a = 19 and r = 15. The effect of basal expression can be understood directly by considering the evolution function dx/dt represented in Fig.5B. Depending on the relative values of the production and removal functions, dx/dt can cross the null line several times, thereby yielding several possible steady states. To obtain such curves, degradation is generally exponential, with a flux proportional to the product concentration, while synthesis follows a saturable and sigmoidal function of the TFs. This sigmoidicity can be due to a variety of reasons, including TF dimerisation [36, 33, 35, 37], or sequestration by a “poison partner”, as in the next example. The resulting landscapes obtained by integration of the evolution function are shown in Fig.5C using the same parameters as above and for various values of b. The blue curves at the bottom of the valleys stand for the attractive steady states and the green curve for the repulsive one.
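The zero crossings of dx/dt can be counted numerically with a short sketch. It uses the parameter values quoted above and assumes that the evolution function is basal synthesis plus dimer-activated synthesis minus first-order removal, with D read as an association constant ([TF2] = D [TF]^2); the function names and the grid are choices made for this illustration only.

```python
import numpy as np

# Parameter values quoted in the text for Fig.5B.
D, K, a, r = 1.5, 0.2, 19.0, 15.0


def dimer(x):
    """Dimer concentration for a total TF concentration x.

    Assumption: D is an association constant, [TF2] = D * [TF]^2,
    with mass conservation x = [TF] + 2 * [TF2].
    """
    monomer = (np.sqrt(1.0 + 8.0 * D * x) - 1.0) / (4.0 * D)
    return D * monomer**2


def evolution(x, b):
    """Assumed evolution function: basal + activated synthesis - removal."""
    d = dimer(x)
    return b + a * d / (K + d) - r * x


def count_steady_states(b, x_max=2.0, n=20001):
    """Count the sign changes of dx/dt on a fine grid of x values."""
    x = np.linspace(0.0, x_max, n)
    sign = np.sign(evolution(x, b))
    return int(np.sum(sign[:-1] * sign[1:] < 0))


for b in (0.2, 0.55, 0.8):
    print(f"b = {b}: {count_steady_states(b)} steady state(s)")
```

Under these assumptions, an odd count above one signals bistability: the outer crossings are attractive and the middle one is repulsive.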
The values of b for which such a repulsive steady state is present correspond precisely to those for which two attractors coexist. In its dependence on the parameter b, the system exhibits two connected saddle-node bifurcations. We emphasize that the parameter b is not driven by any dynamic or modeling considerations. The point is precisely to understand how its (slow) evolution may affect the stability properties of the whole (fast) system. The potential scalar function V(b, x) in Fig.5C is obtained directly by integrating the opposite of the right-hand side of the evolution equation Eq.(2c):
$$V(b, x) = -\int_0^x \left( b + a\,\frac{[TF_2](s)}{K + [TF_2](s)} - r\,s \right) ds$$
This potential provides a full characterization of attractive or repulsive steady states in the following sense: the equilibria then appear respectively as (local) minimizers and maximizers of V(b, ·). In practice, we add an artificial potential Vo(b), depending only on b, which therefore does not affect the local minimizers and maximizers in the x-direction, but only their level. This yields a readable graphical representation. The idea behind this choice is to normalize the integration constants so that the levels of the minimizers are more or less independent of b. For simplicity, we chose a hand-designed polynomial corrective term.
The influence of basal expression on the multistability of this toy system is well illustrated in Fig.5C, which shows a single attractor of near-zero expression for b = 0.2, two attractors for b = 0.55 and a single deep attractor for b = 0.8.
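The landscape construction can be sketched by integrating the evolution function numerically and counting the wells of the resulting potential. This is only an illustration under stated assumptions: the evolution function is taken as basal plus dimer-activated synthesis minus first-order removal, D is read as an association constant, the integration starts at x = 0, and the corrective term Vo(b) is omitted since it does not move the minima along x.

```python
import numpy as np

# Parameter values quoted in the text for Fig.5.
D, K, a, r = 1.5, 0.2, 19.0, 15.0


def evolution(x, b):
    """Assumed dx/dt, with the dimer computed from D = [TF2] / [TF]^2."""
    monomer = (np.sqrt(1.0 + 8.0 * D * x) - 1.0) / (4.0 * D)
    d = D * monomer**2
    return b + a * d / (K + d) - r * x


def potential(b, x):
    """V(b, x) = -integral from 0 to x of dx/dt (cumulative trapezoidal rule)."""
    f = evolution(x, b)
    steps = 0.5 * (f[1:] + f[:-1]) * np.diff(x)
    return -np.concatenate(([0.0], np.cumsum(steps)))


def count_wells(b, x_max=2.0, n=4001):
    """Count the strict interior local minima of the potential."""
    x = np.linspace(0.0, x_max, n)
    v = potential(b, x)
    inner = v[1:-1]
    return int(np.sum((inner < v[:-2]) & (inner < v[2:])))


for b in (0.2, 0.55, 0.8):
    print(f"b = {b}: {count_wells(b)} well(s)")
```

Each well corresponds to an attractive steady state; the hump separating two wells corresponds to the repulsive one.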
Two gene circuit: Re-modeling the celebrated GATA-PU system
Mutual repression has long been envisioned as a stereotyped multistability switch motif [38]. The most popular tristable two-gene landscape is generated by a circuit with mutually repressing and self-activating genes [20, 21, 34, 39, 40, 41]. Remarkably, this model corresponds to real cases of bipotent progenitor differentiation, including: (i) the balance between red and white blood cells resulting from a choice between GATA1/2 and PU.1 [46], (ii) the muscular vs vascular differentiation of somite cells determined by the Pax3:Foxc2 balance [47], (iii) the ectodermal vs mesendodermal differentiation depending on the Sox2:Oct4 circuit [48]. In all these cases, upon differentiation the system evolves from a central attractor where the antagonistic genes are coexpressed, to one of two lateral attractors of exclusive gene expression. Luminal mammary differentiation certainly obeys the resolution of a circuit of this type, but since it is still unclear whether ESR1 and GATA3 form a positive [49] or negative [50] loop, we will rather use the well established GATA1/2 and PU.1 circuit involved in blood cell differentiation. In the initial models of the GATA1/2:PU.1 circuit, self-activation and mutual repression were disconnected, as depicted in Scheme 1. The variables x and y are understood as total GATA1/2 and PU.1 protein concentrations by assuming that translation is not rate-limiting. This modeling is widespread in the literature [34, 39, 40] even if the basal activity is not explicitly written (set to 1) or interpreted as such.
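The classical formulation of this scheme is
$$\frac{dx}{dt} = a_1\,\frac{x^n}{K_1^n + x^n} + b_1\,\frac{K_1^n}{K_1^n + y^n} - r_1\,x, \qquad \frac{dy}{dt} = a_2\,\frac{y^n}{K_2^n + y^n} + b_2\,\frac{K_2^n}{K_2^n + x^n} - r_2\,y \tag{4}$$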
But in this scheme, basal expression would not be an independent parameter, as defined in Eq.(1d), but would be regulated by system constituents. In fact, based on the physical and functional interactions between GATA1/2 and PU.1 described in the seminal article of [46], which are recalled in Fig.6A, the mutual inhibition between the two genes does not proceed through reduction of some basal level (Scheme 1), but through preventing the self-stimulations of GATA1/2 and PU.1. This scheme thus includes a genuine basal expression frequency (Scheme 2). A key parameter for modeling the revised mechanism of inhibited activation, is the molecular association between GATA1/2 and PU.1 (Fig.6A) [46]. This association corresponds to a mutual sequestration preventing PU.1 from (i) stimulating its own gene and (ii) inhibiting the GATA1/2 gene, and vice versa (Scheme 2).
The set of equations corresponding to Scheme 2 reads
$$\frac{dx}{dt} = b_1 + a_1\,\frac{x_f}{K_1 + x_f} - r_1\,x, \qquad \frac{dy}{dt} = b_2 + a_2\,\frac{y_f}{K_2 + y_f} - r_2\,y \tag{5}$$
where xf and yf are the concentrations of molecules not mutually interacting in x ● y complexes. Given the time scale separation between molecular interactions (very fast) and gene expression dynamics (much slower), the free concentrations are simply given by non-differential, algebraic equations
$$x_f = x - [x \bullet y], \qquad y_f = y - [x \bullet y]$$
where the complex is given by
$$[x \bullet y] = D\,x_f\,y_f = \frac{1}{2}\left( x + y + \frac{1}{D} - \sqrt{\left( x + y + \frac{1}{D} \right)^2 - 4xy} \right)$$
where D is the equilibrium dimerisation constant between x and y. The exponents n in the classical modeling of Eq.(4) are Hill coefficients describing molecular cooperativity, whose values are generally chosen for convenience. Arbitrarily increasing Hill coefficients is indeed an easy way to accentuate the relief of epigenetic landscapes, but this twist is poorly justifiable in practice in the absence of precise quantitative data. By contrast, the simple mechanism of mutual sequestration is both biologically relevant and sufficient to provide the nonlinearity necessary for multistability, whether the TFs work as monomers or as preformed dimers. Concretely, the production functions in the differential equations are unchanged but the TF concentrations are simply replaced by their free concentrations (xf and yf). The roles of GATA1/2 and PU.1 are assumed symmetrical, with identical parameters for both genes (a1 = a2, b1 = b2, K1 = K2 and r1 = r2). Unlike the unidimensional evolution system Eq.(3), not every differential system in higher dimension can be described by a simple scalar-valued potential function. When a Lyapunov function exists, however, it directly provides a scalar characterization of the attractive behavior of steady states and thus makes it possible to draw a landscape. This is the case, for example, for any gradient-like system. More generally, one may try to take into account the Hamiltonian part of the dynamics; however, it is not clear how to use Hodge-Helmholtz-like decompositions to draw a Waddington landscape [51, 20, 52].
An alternative approach is based on the probabilistic point of view and concerns the large deviations theory for invariant measures of stochastic reaction-diffusion processes [53]. In the phase space (x, y), we consider many trajectories of the deterministic system Eq.(5) perturbed with a small Brownian motion. Roughly speaking, many of these trajectories accumulate at large times close to attractive steady states or limit cycles, and then evolve only through a fine balance between the diffusion process and the inherent deterministic dynamics. The density of trajectories at any point of the phase space becomes independent of time; this is called the invariant measure of the stochastic process. By simulating the probabilistic process or by computing the solution of a related reaction-diffusion partial differential equation, we can determine this mean equilibrium density of trajectories at any point. This quantity is a new real-valued function revealing the attractive points and/or the attractive limit cycles of the dynamical system. The probability in Fig.6B is obtained using a finite difference scheme to solve the partial differential equation over the domain [0,10] × [0,10] in (x, y) and over [0,10] in the time variable. The initial data is set to a uniform density p(t = 0, x, y) = 1. The components X(x,y), Y(x,y) of the dynamical field correspond to the respective right-hand sides in Eq.(5), and the diffusion parameter is set to ϵ = 0.025. For the biological parameters, we use the following set of values: a1 = a2 = 10; K1 = K2 = 6; r1 = r2 = 1 and D = 1. In Fig.6B, the basal expression rate is set either to a low value (b = 1) for which bistability is present, or to a higher value (b = 4) for which the system is monostable. The background color field represents the intensity of the density p and the black curves represent the deterministic trajectories, solutions of the dynamical system Eq.(5). Almost all of these trajectories converge at large times to one of the two attractive points.
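For orientation, the evolution of the density p described above is usually written as a Fokker-Planck (advection-diffusion) equation. The form below is a standard one given under the assumption that the noise is isotropic, additive and scaled so that ϵ multiplies the Laplacian directly; the exact scaling convention and the boundary conditions used for Fig.6B are not specified here.

```latex
\frac{\partial p}{\partial t}
  = -\frac{\partial \left( X\,p \right)}{\partial x}
    -\frac{\partial \left( Y\,p \right)}{\partial y}
    + \epsilon \left( \frac{\partial^2 p}{\partial x^2}
                    + \frac{\partial^2 p}{\partial y^2} \right),
\qquad p(0, x, y) = 1 .
```

In this picture, the invariant measure is the stationary solution obtained when the left-hand side vanishes, and its maxima sit near the attractive steady states of the deterministic system.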
The computational phase space domain [0,10] × [0,10] is chosen sufficiently large so that we can prove that it constitutes an invariant domain for the dynamical system: no trajectory escapes the domain. This property is useful to design convenient boundary conditions when solving Eq.(5). The same strategy is used for Fig.6C. In summary, rigorous landscape treatment of the celebrated GATA1/2:PU.1 differentiation circuit clearly shows that a simple change in basal expression level can modify the fate of the system. For b = 1, two cell types coexist (Fig.6B, left panel), whereas for b = 4, a single “undecided” cell type exists, with equivalent coexpression of GATA1/2 and PU.1. The projection plot of Fig.6C shows the reshaping of the Waddington landscape triggered by b. With the set of parameters used, the transition from bistability to monostability occurs at b = 2.54.
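The switch between bistability and monostability can be checked with a deterministic sketch of the mutual sequestration circuit. It assumes symmetric equations of the form dx/dt = b + a xf/(K + xf) − r x (and likewise for y), with the complex obeying c = D xf yf, and uses the parameter values quoted for Fig.6B. The explicit Euler integrator, the starting points and the closed-form threshold (obtained by linearizing around the symmetric steady state) are choices made for this illustration.

```python
import math

# Symmetric parameter values quoted in the text for Fig.6B.
a, K, r, D = 10.0, 6.0, 1.0, 1.0


def free(x, y):
    """Free concentrations, assuming the complex obeys c = D * xf * yf."""
    s = x + y + 1.0 / D
    c = 0.5 * (s - math.sqrt(s * s - 4.0 * x * y))
    return x - c, y - c


def simulate(b, x, y, dt=0.01, t_end=300.0):
    """Explicit Euler integration of the assumed two-gene system."""
    for _ in range(int(t_end / dt)):
        xf, yf = free(x, y)
        dx = b + a * xf / (K + xf) - r * x
        dy = b + a * yf / (K + yf) - r * y
        x, y = x + dt * dx, y + dt * dy
    return x, y


def critical_b():
    """Basal rate at which the symmetric steady state loses stability.

    The antisymmetric perturbation grows when a*K/(K+u)^2 > r, where u is
    the free concentration at the symmetric state and x = u + D*u^2.
    """
    u = math.sqrt(a * K / r) - K
    return r * (u + D * u * u) - a * u / (K + u)


print("critical b:", round(critical_b(), 2))
for b in (1.0, 4.0):
    print("b =", b, "->", simulate(b, 5.0, 1.0), simulate(b, 1.0, 5.0))
```

Under these assumptions, two mirror-image starting points end in two distinct asymmetric states at low b and in the same coexpression state at high b.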
General principle of bifurcation by resolution of conflicting circuits
The cellular differentiation tree in development proceeds through a cascade of successive bifurcations [54], each one coinciding with the resolution of a gene conflict, of which a perfect example is the battle between the GATA1/2 and PU.1 genes. The present results show that a high level of basal expression alleviates the impact of their mutual repression and allows the coexpression of both antagonistic genes in progenitor cells. Lowering basal expression increases the intensity of the fight and accelerates its resolution, achieved when one of the genes loses the fight.
Model insertion in the current picture of cellular differentiation
The present model of differentiation, centered on the role of basal expression, makes it possible to weave an integrated picture combining several known properties of cellular differentiation.
Orientation of differentiation
The mechanism of differentiation proposed here is attractive in that it is generic and valid for all cell lineages. The orientation of progenitors committed to differentiate towards a particular destination attractor is supposed to result from either (i) stochastic fluctuations, favored by the low number of certain molecules like mRNAs and allowing the cellular system to jump between adjacent attractors with a waiting time exponentially dependent on the height of the saddle point between them; or (ii) instructive exogenous inputs, like EPO vs G-CSF for the red vs white blood cells, or Spemann’s organizers during organogenesis, transiently altering the initial steady state. The strong heterogeneity detected in single cell transcriptomic analyses supports an important role for the first mechanism. In fact, entrusting developmental bifurcations to chance is not really risky, since any imbalance in the number of cells falling into the final attractors can be corrected a posteriori by selective proliferation and/or apoptosis, to restore the appropriate partitioning of cellular masses.
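The exponential dependence of the waiting time on barrier height mentioned in (i) is commonly expressed by a Kramers-type (Arrhenius-like) estimate. The expression below is a generic small-noise approximation and not a result derived in this study; ΔV denotes the height of the saddle above the starting well in a (quasi-)potential V, ϵ the noise intensity, and τ0 a prefactor that depends on the local curvatures, all three being notations introduced for this illustration.

```latex
\tau_{\text{escape}} \;\approx\; \tau_0 \, \exp\!\left( \frac{\Delta V}{\epsilon} \right),
\qquad \Delta V = V(\text{saddle}) - V(\text{well}) .
```

A shallow well (small ΔV) is therefore left quickly under fluctuations, whereas a deepened well traps the system for times that grow exponentially.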
Paradoxical decrease of regulated transcription when opening chromatin
b and a are used here as independent parameters and only b is modified in the simulations shown above, but a mechanistic link can exist between them, such that when one decreases, the other increases. In addition to the enhancers present in regulated genes, the genome contains a multitude of non-specific TF binding sites, generally unoccupied in heterochromatin. These cryptic sites are logically exposed during dedifferentiation and can trap certain TFs, thereby reducing their free concentration and their recruitment at enhancers (thus reducing a). Such a titration mechanism, which is not necessarily valid for all types of TF, has already been invoked, for example, to explain how the general TF TATA-binding protein (TBP), whose concentration is limiting [55], puts all its target genes in competition [56]. Since the chromatin of dedifferentiated cells is more accessible to proteins, as exemplified by anti-DNA antibodies (Fig.3) or DNAseI [15], it seems logical that the access of TFs is also favored. The b/a ratio is therefore expected to increase upon chromatin loosening in two ways: (i) by allowing the generalized access of transcriptional machineries to a wide variety of genes and (ii) by sequestering TFs in bulk DNA sites. This defocusing of TFs from enhancers can therefore cause at the same time an increase of b and a decrease of a, which act in concert to strongly increase the ratio b/a and erase the imprint of previous gene regulatory circuits. Simultaneous enhancer weakening and genome opening is the ideal scenario for reprogramming systems, as it erases pre-printed circuits in the face of the emergence of new actors.
Hypothetical origin of bivalent chromatin
A particularity of histone acetylation, contrary to other histone modifications, is that its role is unambiguous. For comparison, the effect of histone methylation depends on the lysine residue: methylation of H3K4 has a permissive role whereas that of H3K9 has a repressive effect on transcription. By contrast, histone acetylation is always permissive for TF binding regardless of the target lysine, including H3K4 in the promoter of active genes [57]. Note that the methylation of H3K4 in the transcribed regions of active genes is precisely stimulated by acetylated substrates [58]. A subtlety however remains to be explained: in the so-called bivalent chromatin of stem cells, acetylated H3K9, which is permissive, coexists with deacetylated and methylated enhancer H3K27 (E-H3K27), which is repressive. A simple hypothesis to explain this apparent paradox is based on the observation that acetylation of E-H3K27 results from the docking of HAT by TFs at the level of enhancers, thereby generating a positive loop in which TF binding stimulates E-H3K27 acetylation, which in turn favors enhancer accessibility to TFs. In hyperacetylated chromatin, TF-binding motifs previously cryptic in heterochromatin become exposed and can trap TFs, reducing their free concentration and consequently their availability for binding to enhancers (model formalized in the S.I.). Then, E-H3K27 methylases like polycomb could complete the system by methylating poorly occupied E-H3K27, as verified in [59], thus precluding their reacetylation.
Why some genes are repressed in the context of globally open chromatin
A long-standing enigma about cancer cell chromatin is that, while it is largely released, certain genes, for instance those encoding tumor suppressors, are closed. Similar situations are found here. In particular, TSA treatment alone is capable of both decondensing chromatin and repressing genes involved in mammary epithelial differentiation such as GATA3 and ERα. Candidate mechanisms to explain this include transcriptional repressors such as the zinc-finger proteins SNAI1 (Snail) and/or SNAI2 (Slug), known to selectively repress differentiation genes, for instance muscle genes [60]. They are strongly upregulated upon TSA treatment, by 22- and 19-fold for SNAI1 and SNAI2 respectively, and in cancer, such as during the mammary hormonal escape, where the SNAI proteins repress ERα and E-cadherin [61]. Another excellent molecular candidate for repressing differentiation genes after chromatin opening is the polycomb system mentioned above, which could simply validate the lower occupancy of enhancers by TFs [59] (model in the S.I.) and proceed to their closure.
Interplay between acetylation, metabolism and dedifferentiation
The link between chromatin loosening and basal expression is likely to be mediated by histone acetylation, itself depending on the cellular amount of acetyl-CoA, which ultimately results from the type of energetic metabolism of the cell. This relation gives concrete form to the intimate relationship between metabolism and differentiation long anticipated by Warburg [62]. Warburg noticed that glycolysis is predominant in “less structured” (that is, less differentiated) cells. The activity of acetylation enzymes is critically dependent on acetyl-CoA as a source of acetyl groups. Precisely, the concentration of acetyl-CoA has been shown to be much higher in undifferentiated cells with high glycolytic activity [2], in full agreement with the present results, including induction of glycolysis (Fig.1C) and H3K9 acetylation (Fig.3A) upon dedifferentiation.
Conclusions
Functional correlations between chromatin loosening, dedifferentiation and basal gene expression reflect a universal mechanism in which a decrease of basal expression systematically leads to differentiation and, conversely, increasing basal expression associated with H3K9 acetylation opens the way to reprogramming. The release of chromatin repression and the increase in non-specific gene expression naturally contribute to the high entropy of the less organized undifferentiated cells. This study points out the importance of the basal expression level in GRNs, which is currently neglected in both experimental and theoretical approaches. It is often omitted in theoretical studies, as shown here for the previous modeling of the GATA1/2:PU.1 circuit, and eliminated during standardization steps in transcriptomic and epigenomic studies. Even the largest datasets generated by high throughput, multiplexed or single cell approaches are unable to provide relevant information on the basal expression level. Reintroducing this overlooked parameter makes it possible to propose a unifying explanation for multiple observations including (i) the wide open chromatin of stem cells [1, 2], (ii) their generalized low level of gene expression [5, 6], and (iii) the influence of chromatin on their differentiation [7, 9, 10, 63]. These different results are not only reconciled but, in addition, make it possible to develop a model of cellular differentiation/dedifferentiation in the spirit of Waddington, that is to say based more on a physical principle than on specific genes. Waddington epigenetic landscapes are shown here to be structured by the level of basal gene expression. Two competing views of Waddington landscapes coexist: (i) a single rigid landscape specific to the genome of each organism, whose different basins correspond to the different possible cell types in the organism, or (ii) a deformable landscape whose attractors and their depth can vary with parameter adjustments. 
The new mechanism proposed here clearly belongs to the latter category, as it predicts that the landscape is shaped by the degree of basal expression, as shown in Fig.5C and Fig.6C, in such a way that the undecided progenitor attractors remain deep as long as the cells retain their basal expression and open chromatin. This mechanism is strongly consistent with the gene expression specificities of stem cells. As basal expression is progressively reduced, differentiation attractors emerge sequentially. In this gradual process, the initial commitment of totipotent cells could be triggered by a modest reduction of basal expression, while terminal differentiation of bipotent progenitors requires a strong reduction of basal expression and chromatin closure. Terminally differentiated cells with robustly imprinted circuits have a tightly packed chromatin enriched in H3K9me3, with some islands of accessibility to TFs at the level of H3K27 acetylated enhancers. The sharply partitioned chromatin of differentiated cells may ensure the persistence of well-focused specific circuits. Conversely, chromatin hyperacetylation could be responsible for a rise in basal expression and expose newly accessible binding sites for TFs, defocusing them from enhancers. A remarkable property of the present model is that differentiation and stemness attractors do not coexist at a given moment, so that stem cells cannot accidentally fall into a differentiation attractor. Conversely, the strength of established heterochromatin in differentiated cells is a powerful barrier against the risk of dedifferentiation, since stem cell attractors no longer exist in that state. However, pathological or age-related loss of H3K9me3 can unlock the system and re-open the road to dedifferentiation. Hence, mechanisms ensuring the maintenance of H3K9me3 [8, 64] are essential for longevity and cancer prevention. 
For example, the better heterochromatinized mammary cells of women who have previously been pregnant are less prone to cancerization, even after menopause [65]. Although the word epigenetics was first introduced in the context of the gene networks conceived by Waddington, this term was then hijacked by researchers working on chromatin, who restricted it to chromatin “marks” [41]. Strikingly, the present theory merges these two views by conferring on chromatin epigenetics a driver role in Waddington epigenetics.
Materials and methods
Plasmids, MCF7 subclones, antibodies
The following constructs were used in this study: pCR ERα, pSG-GR, pCR-DP-MRTFA (ΔN200), pCR DN-MRTFA (ΔC301), ERE-Luc (C3-Luc) and GRE-tk-LUC, all described in [16]. pCMV-galactosidase is from Promega. The stably transfected MCF7 control, DP-MRTFA and DN-MRTFA subclones are described in [16]. The following primary antibodies were used: anti-E-cadherin (ab15148; Abcam), anti-MKL1 (sc21558; Santa Cruz Biotechnology), anti-histone H3 (E173-58; Epitomics), anti-H3K9ac (histone H3 acetylated at Lys9; ab10812; Abcam), anti-H3K9me3 (histone H3 trimethylated at Lys9; ab8898; Abcam), anti-double stranded DNA (ab27156; Abcam). Secondary antibodies conjugated to Alexa Fluor 488 or 594 were from Invitrogen.
Cell culture, transfection and reporter assays
HepG2, HeLa, MCF7, MDA-MB231 (MDA), MCF7 control and constitutively expressing MRTFA constructs (T-Rex system, Invitrogen) were grown in DMEM (Dulbecco's modified Eagle's medium; Invitrogen) supplemented with 10% fetal bovine serum (FBS, Biowest) and antibiotics (Invitrogen) at 37 °C in a humidified 5% CO2 atmosphere. Prior to all transfections and treatments, the medium was replaced with phenol red-free DMEM (Invitrogen) containing 2.5% charcoal-stripped FCS (Biowest). Expression of the MRTFA proteins of interest was induced by a 48 h treatment of MCF7 subclones with tetracycline. When required, cells were treated for 24 h with ligands (10 nM estradiol or dexamethasone) or ethanol (vehicle control). The treatment with trichostatin A (TSA, 647925, Merck) was performed for 24 h at 100, 200 or 500 nM. Transfection experiments were carried out exactly as previously described [16]. RNA vs DNA content ratios were determined using the HClO4 hydrolysis method [42].
Immunohistochemistry
Cells were grown on 10-mm-diameter coverslips in 24-well plates in DMEM containing 2% charcoal-stripped FBS and, where indicated, treated with trichostatin A (TSA, 647925, Merck) for 24 hours at 100, 200 or 500 nM. Cells were fixed with 4% PFA (paraformaldehyde) for 10 min and permeabilized in PBS-0.3% Triton X-100 for 10 min. Incubation with the primary antibody (1:1000 dilution) was performed overnight at 4 °C. Secondary antibodies conjugated to Alexa Fluor were incubated for 1 h at room temperature. After washing in PBS, the coverslips were mounted in Vectashield® medium with DAPI (Vector) and images were obtained with an Imager.Z1 ApoTome AxioCam (Zeiss) epifluorescence microscope and processed with AxioVision Software. For each coverslip, 10 to 20 pictures were randomly taken. Pictures were visually screened under blind conditions and discarded if artefactual fluorescent aggregates were present or in case of focus problems. For each picture, fluorescence values of each nucleus were obtained automatically using a homemade plugin working on Fiji [43]. Briefly, each nucleus was identified using the DAPI labelling and, after background subtraction, the total fluorescence of each nucleus was extracted from the picture obtained with the fluorescent antibody. For each condition, the mean of the fluorescence intensities of more than a thousand cells was calculated.
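The quantification steps just described (nucleus identification on the DAPI channel, background subtraction, total antibody fluorescence per nucleus) can be sketched as follows. This is not the homemade Fiji plugin used in this study, but a minimal Python illustration relying on NumPy and SciPy. The fixed DAPI threshold, the use of the median signal outside nuclei as background, and the function name are assumptions made for the example.

```python
import numpy as np
from scipy import ndimage


def nuclear_fluorescence(dapi, antibody, dapi_threshold):
    """Return the background-subtracted total antibody signal of each nucleus.

    dapi, antibody: 2D arrays of identical shape (two channels of one picture).
    dapi_threshold: assumed fixed intensity cut-off defining nuclear pixels.
    """
    mask = dapi > dapi_threshold               # nuclear pixels
    labels, n_nuclei = ndimage.label(mask)     # connected components; 0 = outside
    background = np.median(antibody[~mask])    # assumed background estimate
    corrected = antibody.astype(float) - background
    totals = np.bincount(labels.ravel(), weights=corrected.ravel(),
                         minlength=n_nuclei + 1)
    return totals[1:]                          # drop the non-nuclear label 0


if __name__ == "__main__":
    # Synthetic picture: two nuclei on a uniform background of 10.
    dapi = np.zeros((20, 20))
    antibody = np.full((20, 20), 10.0)
    dapi[2:6, 2:6] = 200.0
    antibody[2:6, 2:6] = 110.0
    dapi[10:13, 12:15] = 200.0
    antibody[10:13, 12:15] = 60.0
    print(nuclear_fluorescence(dapi, antibody, dapi_threshold=100.0))
```

Averaging the returned values over all pictures of a condition would give the mean nuclear fluorescence reported per condition.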
Transcriptomic data
The microarray data on MRTFA cell lines have been submitted to the NCBI Gene Expression Omnibus website under accession No. GSE107924. Gene signatures were obtained from the publicly available database MsigDb from GSEA. The luminal and basal signatures were extracted from [44] (curated gene sets). A) Comparison of the transcriptional signatures of the two DP-MRTFA1 and DN-MRTFA clones with that of MCF7 control cells. B) Comparison of the four transcriptional signatures between TSA-treated and vehicle-treated (Cont.) MCF7 cells. Data were obtained by TempO-Seq targeted whole transcriptome profiling (GEO accession: GSE91395) [45].
Acknowledgments
We thank the Ligue Régionale Contre le Cancer for its sustained financial support.
Appendix
A Parsimonious model of bivalent chromatin
An intriguing specificity of the so-called bivalent chromatin of stem cells is the presence of repressive marks (low acetylation and high methylation of enhancer H3K27, E-H3K27) and permissive marks (H3K9 acetylated and H3K4 methylated). A speculative hypothesis to explain bivalent chromatin may be based on the wide range decompaction of chromatin by acetylation of H3K9, which could be indirectly responsible for the relative closure of enhancers. Enhancers are precisely characterized, when active, by acetylated H3K27. The bivalent marks would therefore reflect a relative increase in the b/a ratio of basal to regulated transcription frequencies.
Model ingredients
In this integrated model, intended to reconcile several observations with a minimum of hypotheses, stationary solutions will be obtained directly, skipping time-dependent differential equations. We will consider the existence of a single acetylation enzyme (HAT) such as CBP, capable of acetylating both H3K9 and H3K27, and a single enzyme (HDAC) capable of deacetylating them. The only difference is that in bulk chromatin, H3 lysines (H3K9, but also possibly H3K27) can be acetylated autonomously by the HAT, whereas acetylation of E-H3K27 is assisted by TFs recruiting the HAT at their target enhancers. The second postulate is that TFs have a large number of cryptic binding sites in genomic DNA that are normally not accessible in the closed chromatin of differentiated cells, but become accessible in case of generalized decompaction. Assuming that the cellular content in TF is approximately constant, this fixation will mechanically reduce its presence on enhancers and, as a consequence, decrease the maintenance of their acetylated state. For simplicity, acetylation and deacetylation of H3K9 and E-H3K27 are assumed to follow traditional Michaelis-Menten velocities with the same Michaelis constant for both lysines (KHAT for the HAT and KHDAC for the HDAC). The enzymes are supposed to bind to both lysines, but to be significantly sequestered by H3K9 only, considering that the E-H3K27 sites are restricted to enhancers, so that for a diffusing enzyme, the accessible concentration of H3K9 is much higher than that of E-H3K27.
Fraction of acetylated H3K9
H3K9 acetylation and methylation are mutually exclusive marks, but we will consider here that the dynamics of H3K9 acetylation/deacetylation is fast enough compared to that of methylation to allow considering only the unmethylated fraction of H3K9. Let us define dimensionless Michaelis constants weighted by the substrate concentrations, that is, KHAT and KHDAC each divided by the total concentration N of unmethylated H3K9.
The maximal velocity of acetylation is $V_A = c_A\,[\mathrm{HAT}]$, where $c_A$ is the catalytic rate, and the maximal velocity of deacetylation is $V_D = c_D\,[\mathrm{HDAC}]$.
Writing V the sum of maximal velocities, $V = V_A + V_D$, we define the fractional maximal velocities $\theta = V_A/V$ and $1 - \theta = V_D/V$.
Using this nomenclature, the fraction of acetylated H3K9 (written ρ) and that of deacetylated H3K9 (1 – ρ) are given by the traditional zero-order mechanism formulated in Table 1.
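As an illustration of such a zero-order mechanism, the sketch below implements the classical Goldbeter-Koshland steady state for a substrate interconverted by two saturable enzymes. It is assumed here, not taken from Table 1, that the acetylated fraction ρ follows this standard form, with θ and 1 − θ as the fractional maximal velocities of acetylation and deacetylation. The argument names j_hat and j_hdac, standing for the dimensionless Michaelis constants, are illustrative.

```python
import math


def acetylated_fraction(theta, j_hat, j_hdac):
    """Goldbeter-Koshland steady-state fraction of modified substrate.

    theta  : fractional maximal velocity of acetylation (0 to 1)
    j_hat  : dimensionless Michaelis constant of the acetylating enzyme
    j_hdac : dimensionless Michaelis constant of the deacetylating enzyme
    Solves theta*(1-rho)/(j_hat+1-rho) = (1-theta)*rho/(j_hdac+rho) for rho.
    """
    u, v = theta, 1.0 - theta
    b = v - u + v * j_hat + u * j_hdac
    disc = b * b - 4.0 * (v - u) * u * j_hdac
    return 2.0 * u * j_hdac / (b + math.sqrt(disc))


if __name__ == "__main__":
    # Small Michaelis constants (saturated enzymes) give a switch near 0.5.
    for theta in (0.3, 0.4, 0.5, 0.6, 0.7):
        print(theta, round(acetylated_fraction(theta, 0.01, 0.01), 4))
```

With small dimensionless Michaelis constants, the acetylated fraction switches sharply from nearly 0 to nearly 1 as θ crosses 0.5, which is the ultrasensitive behaviour expected of a zero-order mechanism.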
TF sequestration by H3K9-acetylated chromatin
Simple statistics and systematic sequencing have shown that consensus and near-consensus DNA binding sites for most TFs are widespread in the genome, but that only a few of them correspond to genuine regulatory elements or enhancers. ChIP-seq experiments confirmed that these cryptic putative binding sites are generally not occupied in native chromatin, suggesting that their accessibility is prevented by chromatin closure. Hence, we will postulate here that chromatin loosening by acetylation could render the cryptic sites accessible. In random sequences, cryptic sites are distributed on average every n nucleosomes. Their concentration is R = [H3K9ac]/n = ρN/n, n being about 20 for a consensus sequence of 6 base pairs in a random sequence. If the TF binds to these sites with an average dissociation constant K = kd/ka, a fraction of the TF (of constant total concentration F) will be sequestered, leaving only a residual free concentration f, given by Eq.(7b).
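The orders of magnitude involved can be checked with a short sketch. Two assumptions not stated in the text are made: a nucleosome repeat length of about 200 base pairs, used to convert the spacing of a random 6-bp motif into a number of nucleosomes, and a simple 1:1 mass-action equilibrium between the TF and the exposed cryptic sites, used to compute the free TF concentration. The latter may differ from the exact form of Eq.(7b). Function names and numerical values are illustrative.

```python
import math


def nucleosomes_per_cryptic_site(motif_length_bp, nucleosome_repeat_bp=200):
    """Average spacing, in nucleosomes, of a given motif in random DNA.

    A motif of length L occurs on average every 4**L base pairs on one strand;
    the nucleosome repeat length (assumed ~200 bp) converts this to nucleosomes.
    """
    return 4 ** motif_length_bp / nucleosome_repeat_bp


def cryptic_site_concentration(rho, total_h3k9, n):
    """R = rho * N / n, as defined in the text."""
    return rho * total_h3k9 / n


def free_tf(total_tf, sites, kd):
    """Free TF at equilibrium for 1:1 binding (assumed mass-action model).

    Positive root of f**2 + (sites + kd - total_tf) * f - kd * total_tf = 0.
    """
    s = total_tf - sites - kd
    return 0.5 * (s + math.sqrt(s * s + 4.0 * kd * total_tf))


if __name__ == "__main__":
    n = nucleosomes_per_cryptic_site(6)
    print("n =", n)
    # Hypothetical concentrations in arbitrary units.
    for rho in (0.0, 0.5, 1.0):
        sites = cryptic_site_concentration(rho, total_h3k9=1000.0, n=n)
        print(rho, round(free_tf(total_tf=1.0, sites=sites, kd=1.0), 4))
```

In this sketch the free TF concentration equals F when no cryptic site is exposed and decreases monotonically as the acetylated fraction ρ increases.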
E-H3K27 acetylation
Knowing the free concentrations of TF and enzymes finally makes it possible to predict the acetylation status of E-H3K27. Contrary to that of H3K9, E-H3K27 acetylation is supposed to require previous TF binding. In turn, TF binding is inefficient in the absence of E-H3K27ac and is greatly facilitated by it. The rates used for the different reactions are listed in Table 2.
In the absence of specific data, we will arbitrarily assume that the catalytic rate and Michaelis constant of the HAT are the same for H3K9 in the absence of TF and for E-H3K27 in the presence of TF. In turn, enzymatic sequestration is assumed to be caused by H3K9 only, considering the minor contribution of enhancers in the genome. Using the first-order or pseudo-first order rates of Table 2, the stationary probability of E-H3K27 acetylation is
Replacing in this equation f by its value given in Eq.(7b) and ρ by its value given in Table 1 allows P(H3K27ac) to be expressed as a function of the variable θ only. As represented in Fig.S1, E-H3K27 can be largely deacetylated in spite of an overall hyperacetylation in the cell. In other words, a global increase of acetylase activity can simultaneously open bulk chromatin and alter enhancers, leading to an increase of b and a decrease of a for θ > 0.5 (Fig.S1).
This model is minimalist in that it is based only on the competition between bulk histone acetylation and E-H3K27 acetylation, and resorts to very few ingredients in a field, chromatin epigenetics, which involves a lot of molecular actors. Other correlations and amplification phenomena, not incorporated here, are naturally expected to complete the picture. For instance, methylation maintains non-acetylated lysines in non-acetylable form by competition, thereby locking the system. Conversely, the histone variant H2A.Z, whose profile is parallel to that of H3K27ac at the level of enhancers, is also an important player in enhancer functions by causing nucleosomal depletion [66]. Certain marks are clearly correlated: H3K9 acetylation is associated with H3K4 methylation, and is closely related to DNA methylation in multiple ways including: (i) the presence of methylcytosine binding protein MBD1 in H3K9 methyl transferase complexes like SETDB1 and CAF1, (ii) the recruitment of HDACs by methylated DNA-bound MeCP2 and conversely (iii) the recruitment of a DNMT by H3K9me3-bound HP1. H3K9 methylation can stabilize chromatin in a non-acetylable form in post-mitotic differentiated cells. It should be noted in this respect that H3K9me3 is particularly persistent and constitutes the main lock against the risk of dedifferentiation and reprogramming [7]. From the Waddington-type view, the developmental selection of the genes to close in the course of differentiation proceeds more by absence of expression than by active repression. Repressive machineries like polycomb complexes, which suppress the expression of many genes in embryonic stem cells [25], could only ratify preexisting low transcription states, as suggested in [59], thereby selectively locking the genes which have already lost their Waddington-type fight in antagonistic genetic circuits. 
The relative contributions of basal vs regulated expression (b/a) are the main regulator of the balance between cellular dedifferentiation and differentiation in the present model. A speculative general scenario, depicted in Fig.S2, can thus be proposed around this central core. It connects several results of cellular biology, from metabolism to multistability, which is the fundamental hallmark of differentiation.
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].