Abstract
Though the sequence of the genome within each eukaryotic cell is essentially fixed, it exists in a complex and changing chromatin state. This state is determined, in part, by the dynamic binding of proteins to the DNA. These proteins—including histones, transcription factors (TFs), and polymerases—interact with one another, the genome, and other molecules to allow the chromatin to adopt one of exceedingly many possible configurations. Understanding how changing chromatin configurations associate with transcription remains a fundamental research problem. We sought to characterize at high spatiotemporal resolution the dynamic interplay between transcription and chromatin in response to cadmium stress. While gene regulatory responses to environmental stress in yeast have been studied, how the chromatin state is modified and how those modifications connect to gene regulation remain unexplored. By combining MNase-seq and RNA-seq data, we found chromatin signatures of transcriptional activation and repression involving both nucleosomal and TF-sized DNA binding factors. Using these signatures, we identified associations between chromatin dynamics and transcriptional regulation, not only for known cadmium response genes, but across the entire genome, including antisense transcripts. Those associations allowed us to develop generalizable models that can predict dynamic transcriptional responses on the basis of dynamic chromatin signatures.
Introduction
Organisms require genic transcription to produce the proteins needed for biological functions including growth, replication, repair, and response to environmental changes. Transcription is tightly regulated through the complex interplay among a myriad of DNA-binding factors (DBFs) such as the histones that make up a nucleosome, transcription factors (TFs), and polymerases. These proteins and complexes involved in transcription, and the many others involved in protein-DNA interactions, determine the chromatin landscape. How these constituents of the chromatin bind, move, evict, and interact to regulate transcription remains an open area of research.
Many studies have made major strides in characterizing the role of the proteins and complexes involved in transcription. ChIP-based studies have characterized the role of hundreds of proteins involved in transcription on a genomic scale. Among these are the interactions of factors involved in SAGA-dominated stress-related pathways and TFIID-dominated housekeeping pathways (Venters et al. 2011). Additionally, these studies have identified protein complexes involved in the formation of the pre-initiation complex required for transcription initiation (Rhee and Pugh 2012). Through gene expression analysis of deletion mutants, proteomics, and chromatin immunoprecipitation (ChIP) studies, the roles of numerous chromatin remodelers and their interactions have been characterized in detail (Krogan et al. 2006; Lenstra et al. 2011; Shivaswamy and Iyer 2008; Weiner et al. 2012, 2015). However, these studies are constrained by limitations of their methods, including the lack of antibodies for ChIP and of viable deletion strains. Analysis is further complicated by the difficulty in deconvolving direct chromatin effects and the pleiotropic action of the many factors and remodelers occurring upstream of transcription. These limitations help explain why the chromatin landscape involved in transcription is still poorly understood.
Another approach maps the chromatin in a protein-agnostic manner using nuclease digestion. Nuclease digestion methods, including micrococcal nuclease (MNase) digestion, provide a complementary perspective on the chromatin because they can profile chromatin accessibility at base-pair precision. Recent genome-wide mapping studies have characterized the dynamics of nucleosomes through the use of nucleosome-sized MNase-seq fragments under various conditions including the cell cycle (Nocetti and Whitehouse 2016), DNA damage (Tripuraneni et al. 2019), and heat shock (Teves and Henikoff 2011). Additionally, studies have attempted to understand the role of subnucleosomal MNase-seq fragments in DNA transactions (Belsky et al. 2015; Brahma and Henikoff 2019; Chereji et al. 2017; Henikoff et al. 2011; Kubik et al. 2017; Ramachandran et al. 2017; Teves and Henikoff 2011). These studies highlight the challenge of characterizing the vast heterogeneity and interactions between proteins and complexes involved in DNA transactions, including transcription.
Factor agnostic chromatin occupancy profiles from MNase provide an opportunity to link changes in chromatin occupancy at nucleotide resolution with transcriptional regulation, especially regulation induced by environmental perturbations such as cadmium. Cadmium is a toxic metal whose deleterious effects have been well-characterized in yeast. Cadmium toxicity has been shown to induce proteotoxic stress (Faller et al. 2005; Gardarin et al. 2010; Hartwig 2001; Sharma et al. 2008), oxidative stress (Brennan and Schiestl 1996), and inhibition of DNA repair systems (Jin et al. 2003). Yeast respond to cadmium exposure through the activation of stress response genes (Dormer et al. 2000; Hosiner et al. 2014; Jin et al. 2008) and the repression of ribosome biogenesis and translation-related genes (Hosiner et al. 2014; Jin et al. 2008). While these cadmium response pathways have been studied extensively through ChIP, proteomics, and transcription-related studies, the dynamics of the chromatin response have only been inferred through deletion mutants or estimated through a limited set of ChIP antibodies.
Utilizing high-resolution spatiotemporal data, we developed general strategies to analyze, genome-wide, chromatin dynamics relative to changes in transcription. We exposed yeast to cadmium and collected data on its chromatin and gene expression over a two-hour time course. This data allowed us to infer and differentiate essential cadmium stress response pathways solely from the chromatin. We also identified chromatin changes associated with pervasive and potentially regulatory antisense transcription. Our identification of unique classes of chromatin changes enabled us to develop a regression model that can predict the yeast’s transcriptional response to cadmium.
Results
Paired-end MNase-seq captures high-resolution chromatin occupancy dynamics associated with transcription during cadmium stress
We sought to characterize the dynamics of the chromatin through changes in occupancy and organizational structure of nucleosomes and transcription-related proteins. A nucleotide-resolution view of chromatin occupancy dynamics in response to cadmium stress would allow us to infer relationships between these chromatin changes and changes in transcription. Yeast cells were exposed to cadmium and samples were collected over a two-hour time course (Fig. 1A). Chromatin occupancy and positioning dynamics were profiled using paired-end MNase-seq to map DBFs at base-pair resolution (Fig. 1B). Concurrently, transcripts were interrogated using strand-specific total RNA-seq (Fig. 1C).
To evaluate our data and methods, we analyzed the chromatin local to the well-characterized stress response gene HSP26, whose role is to facilitate the disaggregation of misfolded proteins (Cashikar et al. 2005). Hsp26 has been implicated in many stress conditions including heat shock (Benesch et al. 2010; Franzmann et al. 2008), acidity (Kawahata et al. 2006), sulfur starvation (Pereira et al. 2008), and metal toxicity (Hosiner et al. 2014; Momose and Iwahashi 2001). Furthermore, several transcription factors, including Hsf1, Met4, and Met32, have been found to bind in HSP26’s well-characterized promoter (Boy-Marcotte et al. 1999; Carrillo et al. 2012; Chen and Pederson 1993; Susek and Lindquist 1990; Treger et al. 1998). This body of work makes HSP26 a valuable gene for studying local chromatin dynamics under an activating condition.
Local to HSP26’s transcription start site (TSS) (Fig. 2A), we observe significant changes to the chromatin coinciding with its increase in transcript level. Upstream, in HSP26’s promoter, nucleosome-sized fragments of length 144–174 bp are replaced by small fragments less than 100 bp. In HSP26’s gene body, nucleosome-sized fragments become “fuzzy”, increasing in positional and fragment-length variability (Fig. 2A). Nucleosomes in HSP26’s promoter region are evicted (Lee et al. 2004) and replaced by an enrichment of factors associated with transcription initiation, pushing nucleosomes downstream (Fig. 2B). Then, active transcription by RNA polymerases displaces and evicts nucleosomal histones (Kulaeva et al. 2010; Lee et al. 2004; Schwabish and Struhl 2004), which is apparent in the significant disruption in HSP26’s gene body.
Because of these complex transcription-associated chromatin dynamics, we sought to quantify our MNase-seq data by summarizing the chromatin with two scores, which would also enable us to characterize subtle chromatin changes genome-wide, not just at genes with dramatic changes like HSP26. A “promoter occupancy” score was computed by counting the number of small fragments appearing in each gene’s promoter region [–200, TSS] for each time point. We used a core promoter length of 200 bp as previously described (Lubliner et al. 2013; Smale and Kadonaga 2003). Then, because nucleosomes can be characterized by position changes and variance, in addition to occupancy, a per bp score was computed using a cross-correlation with a well-positioned, “idealized” nucleosome signal (Fig. 2C). We used the entropy of the +1, +2, and +3 nucleosome cross correlations to define the gene’s “nucleosome disorganization”. To handle variable RNA stability, we computed transcription rates using difference equations on the transcript levels and previously obtained decay rates.
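As a rough illustration of the nucleosome disorganization idea, the following sketch cross-correlates a per-bp signal of fragment midpoints with an idealized nucleosome kernel and takes the entropy of the result. It is a minimal sketch under stated assumptions, not the procedure used in this study: the triangular kernel, the treatment of the gene body as a single window, and all function names are hypothetical choices made for illustration.

```python
import math


def idealized_nucleosome(width=147):
    """Hypothetical idealized kernel: a triangular peak spanning `width` bp."""
    half = width // 2
    return [1.0 - abs(i - half) / (half + 1) for i in range(width)]


def cross_correlate(signal, kernel):
    """Valid-mode cross-correlation of a per-bp signal with a kernel."""
    n, k = len(signal), len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


def shannon_entropy(scores):
    """Entropy in bits of non-negative scores normalized to sum to 1."""
    total = sum(scores)
    if total <= 0:
        return 0.0
    probs = [s / total for s in scores if s > 0]
    return -sum(p * math.log2(p) for p in probs)


def nucleosome_disorganization(midpoint_counts, kernel):
    """Assumed summary: entropy of the cross-correlation over one window."""
    return shannon_entropy(cross_correlate(midpoint_counts, kernel))


# A single sharp midpoint peak (well positioned) versus a flat signal (fuzzy).
kernel = idealized_nucleosome(11)
sharp = [0.0] * 300
sharp[150] = 1.0
flat = [1.0] * 300
print(nucleosome_disorganization(sharp, kernel))
print(nucleosome_disorganization(flat, kernel))
```

Under these assumptions, a well-positioned nucleosome concentrates the cross-correlation in a few positions and yields low entropy, whereas a fuzzy signal spreads it out and yields high entropy.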
Using these measures, we are able to succinctly track the coordination between HSP26’s chromatin dynamics and its increased transcription rate (Fig. 2D). Both of HSP26’s chromatin scores and its transcription rate increase dramatically throughout the time course and reach their peaks at 120 minutes. In addition to HSP26, we also examined the chromatin at genes with repressed (Supplemental Fig. 1) and unchanging (Supplemental Fig. 2) transcription contexts in response to cadmium and found similar linkages between changes in chromatin and transcription.
We next wanted to determine if this coordination exists genome-wide. The dynamics of the chromatin for each gene were further summarized into a single quantity. For each gene, the scores for promoter occupancy and nucleosome disorganization were transformed into z-scores and combined into an average across the time course (Fig. 3A, B). Sorting by this “combined chromatin” score, we observed that a significant proportion of the genome exhibited coordination between changes in chromatin and changes in transcription, for both activated and repressed genes. A significant positive Pearson correlation of 0.49 was computed between the average change in promoter occupancy and the average change in transcription (Fig. 3C). Similarly, a correlation of 0.61 was found between the average change in nucleosome disorganization and the average change in transcription (Fig. 3D). When measuring the correlation between the average change in combined chromatin and the change in transcription, an even higher correlation of 0.68 was computed (Supplemental Fig. 3A). This correlation, coupled with the weak correlation of 0.17 between promoter occupancy and nucleosome disorganization themselves (Supplemental Fig. 3B), suggests that each metric provides an orthogonal explanation of the chromatin relative to changes in transcription.
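The combined score and its correlation with transcription can be sketched as follows. This is a minimal illustration, assuming that each metric is standardized across genes and that the two z-scores are averaged with equal weight; the function names are hypothetical and the exact normalization used in the study may differ.

```python
import math


def z_scores(values):
    """Standardize a list of per-gene values to zero mean and unit variance."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def combined_chromatin(promoter_occupancy, nucleosome_disorganization):
    """Assumed combination: equal-weight average of the two z-scored metrics."""
    zp = z_scores(promoter_occupancy)
    zn = z_scores(nucleosome_disorganization)
    return [(a + b) / 2 for a, b in zip(zp, zn)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy per-gene average changes (made-up numbers, for illustration only).
occupancy_change = [0.2, 1.5, -0.7, 0.9]
disorganization_change = [0.1, 1.1, -1.2, 0.3]
transcription_change = [0.0, 2.0, -1.5, 0.8]
combined = combined_chromatin(occupancy_change, disorganization_change)
print(pearson(combined, transcription_change))
```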
Changes in nucleosome and small factor occupancy at TSSs recapitulate the cell’s genome-wide transcriptional response to cadmium
We next sought to determine how well chromatin dynamics reflect the cell’s stress response to cadmium exposure. To answer this, we used the three previously defined scores for chromatin dynamics: promoter occupancy, nucleosome disorganization, and combined chromatin. For each metric, we took the 300 genes with the highest and the 300 genes with the lowest average scores, those lying outside an approximately 90% inner percentile range, and performed Gene Ontology (GO) enrichment analysis to determine the regulatory pathways implicated under cadmium exposure. For each chromatin score, distinct GO enrichment pathways were identified with varying levels of false discovery rate (FDR) significance.
A well-established response for cells undergoing stress involves shutting down ribosomal and translation-related pathways (Hosiner et al. 2014; Reja et al. 2015; Vinayachandran et al. 2018). We identified this repression through GO enrichment analysis of the genes with the greatest decreases in promoter occupancy, nucleosome disorganization, and combined chromatin scores. While the combined score identifies the translation-related pathways with the greatest significance, many of the terms are recovered with FDR less than 10^-10 by both promoter occupancy and nucleosome disorganization (Fig. 4A).
While ribosome and translation-related genes are repressed as a tightly regulated cluster, pathways activated under cadmium exposure are recovered with less significance, with FDRs less than 10^-4 (Fig. 4B). Consistent with cadmium and heavy metal stress studies, two major stress responses are activated under cadmium exposure: sulfur assimilation and protein folding (Faller et al. 2005; Fauchon et al. 2002; Hartwig 2001). Each metric was able to identify distinct pathways with varying FDR. Promoter occupancy implicated sulfur assimilation and response to stress pathways with an FDR of 10^-3.9, and nucleosome disorganization recovered protein refolding with an FDR of 10^-2.1. The combined chromatin score identified GO terms found distinctly in the promoter occupancy or nucleosome disorganization analyses and, for some terms such as sulfur amino acid metabolic process, with a better FDR.
In both sets of GO analyses, the combined chromatin score identifies implicated GO terms more effectively than either individual metric alone. And though the cell’s response to cadmium has been characterized through gene expression and ChIP-based studies, we show that elements of the chromatin alone are enough to accurately recover major cadmium response pathways.
High-resolution time course recovers cascading induction of sulfur pathways
As suggested by GO enrichment, significant chromatin changes occur at genes in the sulfur metabolic pathways. Utilizing chromatin and transcription data in our time course, we recover findings previously discovered through decades of ChIP, mutant, and transcription-related studies (Barbey et al. 2005; Blaiseau and Thomas 1998; Carrillo et al. 2012; Cormier et al. 2010; Fauchon et al. 2002; Kuras et al. 1996, 2002; McIsaac et al. 2012; Ouni et al. 2010; Patton et al. 2000; Petti et al. 2012) and reveal novel details of the cascade of events regulating these pathways. Yeast cells exposed to cadmium must metabolize sulfur for biosynthesis of the cadmium-chelating glutathione (Fauchon et al. 2002). Genes in the sulfur metabolic pathways are activated primarily through the transcription factor Met4 and its binding complex, comprised of cis-binding factors Cbf1 and Met31/Met32, and accessory factor Met28 (Blaiseau and Thomas 1998; Kuras et al. 1996). Met4 is regulated through ubiquitination by SCFMet30, which either targets it for degradation or renders it inactive but stable (Barbey et al. 2005; Kaiser et al. 2000; Kuras et al. 2002). Cadmium exposure overrides this ubiquitination, enabling Met4’s functional activation of the sulfur metabolic genes (Barbey et al. 2005) (Fig. 5A). Using our calculated transcription rates and measures of chromatin dynamics, we recover three major components of the sulfur metabolic pathways (Fig. 5B): (i) activation of the Met4 complex through its cofactors, (ii) activation of the sulfur pathways by Met4, and (iii) implicit down-regulation of Met4 activity by SCFMet30, evident in diminished transcription of Met4-regulated genes.
Upon deubiquitination, Met4 becomes functionally active and induces its own cofactors (Barbey et al. 2005; McIsaac et al. 2012). Concomitantly, MET32 is activated and, with Met4 through feedforward regulation, activates the sulfur metabolic genes (Carrillo et al. 2012; McIsaac et al. 2012). In our time course, this induction is evident not only in increased transcription within 7.5 minutes for MET32 and MET28, but also in dramatic nucleosome disorganization in MET32’s gene body (Supplemental Fig. 4) and small fragment enrichment in MET28’s promoter. Meanwhile, MET31’s chromatin behaves unexpectedly given that its transcription is repressed. While Met31 shares a binding motif and largely overlaps in function with Met32 (Blaiseau et al. 1997), Met31’s role is not as prominent as Met32’s in the activation of sulfur pathways (Carrillo et al. 2012; McIsaac et al. 2012; Petti et al. 2012). In our data, we observe that while MET31 is repressed, its nucleosomes disorganize with increased antisense transcription (Supplemental Fig. 5). Additionally, downstream of MET31’s transcription end site (TES), small fragments become enriched at a Met31/Met32 binding motif. Taken together, our data identify MET31 as a potential target for regulation through non-coding RNA (ncRNA) antisense transcription, a result we explore genome-wide in a subsequent section.
Following activation of MET32 and MET28, the Met4 complex is formed and activates genes in the sulfur pathways (Carrillo et al. 2012; McIsaac et al. 2012), which we also observe in the form of increased promoter occupancy, nucleosome disorganization, and transcription. Each of the seven sulfur assimilation genes (Fig. 5C) and many of the downstream pathways increase in promoter occupancy and nucleosome disorganization within 15 minutes.
Additionally, Met4 induces a sulfur-sparing transcriptional switch between functionally similar isoforms to indirectly contribute sulfur required for chelation. This switch includes replacing sulfur-rich Pdc1 with sulfur-lacking Pdc6, Ald4 with Ald6, and Eno1 with Eno2 (Fauchon et al. 2002). Each of these sulfur-lacking genes shows evidence of activation in both its chromatin and transcription and, consistent with known studies (Fauchon et al. 2002), the most dramatic changes are evident as small fragment occupancy changes in the promoters of PDC6 (Supplemental Fig. 6), PDC1, and ENO1.
Following the induction of the sulfur pathways, Met32 and Met4’s activating functions diminish. Because prolonged activity of Met32 and Met4 induces cell cycle arrest, regulation of Met4 and Met32 through SCFMet30 is required for long-term cell proliferation (Ouni et al. 2010; Patton et al. 2000). These events are present most clearly in the chromatin dynamics for MET30 and MET32. We observe increasing transcription and gradual disorganization of gene body nucleosomes through the 120-minute time course. Additionally, we observe evidence for Met4 facilitating regulation of MET32 (Ouni et al. 2010) through a plateau of MET32’s nucleosome disorganization and transcription from 60–120 minutes.
Taken together, we are able to detail the timing of the activation of the Met4 complex, induction of the sulfur genes, and subsequent down-regulation of Met4 activity (Fig. 5D). This analysis complements established transcriptional studies by detailing chromatin dynamics of the sulfur metabolic pathways and identifying a potentially novel regulatory mechanism for MET31 through antisense transcription.
Cadmium treatment induces chromatin dynamics as distinct temporal clusters, including those linked to antisense transcription
Our results demonstrate that the temporal order of chromatin changes was tightly associated with the transcriptional regulation of sulfur pathways. Additionally, we found examples in which chromatin dynamics may not strictly correlate with sense transcription. With these observations in mind, we sought to understand the timing of the chromatin dynamics associated with cadmium stress using hierarchical clustering. To identify generalized patterns, we grouped 832 genes into eight clusters. These genes were chosen because they were among the 500 most dynamic in either promoter occupancy or nucleosome disorganization (some genes were in both, which is why there are fewer than 1,000; Fig. 6A). GO enrichment analysis was then performed on each cluster. Clustering and GO enrichment analysis revealed three major results: (i) the sulfur and protein folding pathways can be differentiated through the timing of changes in the chromatin, (ii) increased transcription may not always accompany nucleosome disorganization, and (iii) antisense transcription can explain anti-correlated chromatin dynamics.
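The gene selection step can be illustrated with a short sketch: take the most dynamic genes under each metric and form their union, which is smaller than twice the per-metric count whenever genes appear in both lists. The definition of "dynamic" used here (largest absolute change from the first time point) and the function names are assumptions made for illustration; the clustering step itself is not shown.

```python
def most_dynamic(time_courses, n):
    """Return the n genes with the largest absolute change from time zero.

    `time_courses` maps gene name -> list of scores over the time course.
    """
    def dynamics(gene):
        series = time_courses[gene]
        return max(abs(value - series[0]) for value in series)

    ranked = sorted(time_courses, key=dynamics, reverse=True)
    return set(ranked[:n])


def select_genes(promoter_occupancy, nucleosome_disorganization, n=500):
    """Union of the n most dynamic genes under each chromatin metric."""
    return most_dynamic(promoter_occupancy, n) | most_dynamic(
        nucleosome_disorganization, n
    )


# Toy example: gene "a" is dynamic under both metrics, so the union of
# two top-2 lists holds three genes rather than four.
occupancy = {"a": [0, 5], "b": [0, 1], "c": [0, 0]}
disorganization = {"a": [0, 4], "b": [0, 0], "c": [0, 3]}
print(sorted(select_genes(occupancy, disorganization, n=2)))
```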
GO analysis of the eight clusters reveals differences in the timing of the protein folding and sulfur metabolic pathways (Fig. 6B). Cluster 1, enriched with sulfur assimilation and methionine metabolic process genes, shows an increased and steady state of promoter occupancy and nucleosome disorganization, consistent with the previous results. In cluster 2, genes relating to ATPase activity and protein refolding are activated between 7.5–15 minutes, repressed between 30–60 minutes, and reactivated by 120 minutes. This pattern is evident in the change in promoter occupancy, nucleosome disorganization, and transcription rate for the latter 10% of genes in the cluster, genes that can be differentiated further with a higher number of clusters, k. Clusters 1 and 2 show that not only does the chromatin recover activating stress response pathways, but it also differentiates their temporal changes consistent with changes in transcription rate.
While most genes presented thus far have shown a positive correlation between the change in each chromatin measure and the change in transcription, clusters 5–8 reveal an unexpected anti-correlated relationship. While the transcription of many of these genes is activated, these genes show either a decrease in promoter occupancy and an increase in nucleosome disorganization (in clusters 5 and 6), or an increase in promoter occupancy and a decrease in nucleosome disorganization (in clusters 7 and 8). This suggests more complex chromatin dynamics at play than the directly correlated measures we previously described. An example of this complexity is present in cluster 7, where the gene coding for the endoplasmic reticulum membrane protein Mcd4 exhibits chromatin with counter-intuitively organized nucleosomes despite increased transcription (Supplemental Fig. 7).
For genes in cluster 6, some of the anti-correlated phenomena can be attributed to the antisense transcription (Fig. 6C) previously identified in MET31. Antisense transcription presents itself genome-wide with varying changes in sense transcription (Fig. 7A). We observed two major phenomena consistent with existing studies with respect to antisense transcription. First, as identified in other environmental conditions (Kim et al. 2010; Till et al. 2018; Wilhelm et al. 2008), antisense transcription is induced pervasively in yeast undergoing cadmium stress. The distribution of genes with increased antisense transcripts monotonically skews towards more transcription through the 120-minute time course (Fig. 7B). Of the genes whose sense transcription is unchanging, we found that 529 exhibit a four-fold increase in antisense transcription (Fig. 7C). The second phenomenon we observed pertains to genes whose sense transcription has changed. Previous studies have found antisense transcription to be associated with either repression or activation of target genes (Kornienko et al. 2013; Swamy et al. 2014; Till et al. 2018; Vance and Ponting 2014). Under cadmium stress, we identified 92 genes whose antisense transcripts increased with decreased sense transcription, such as MET31 and UTR2, whose overexpression has been linked with endoplasmic reticulum stress (Miller et al. 2010) (Supplemental Fig. 8). We found 125 genes that increased in both sense and antisense transcription, including the gene YBR241C (Supplemental Fig. 9), which codes for a vacuole localization protein (Wiederhold et al. 2009). These phenomena indicate that changes in the chromatin may not strictly be associated with transcription, at least not solely on the sense strand.
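The three antisense categories described above can be expressed as a simple classification on log2 fold changes. The sketch below is illustrative only: the four-fold antisense threshold (log2 fold change of 2) follows the text, but the pseudocount, the cutoff used to call sense transcription changed or unchanged, and the function names are hypothetical assumptions rather than the criteria used in the study.

```python
import math


def log2_fold_change(before, after, pseudocount=1.0):
    """log2 ratio of transcript levels, with a pseudocount to avoid log(0)."""
    return math.log2((after + pseudocount) / (before + pseudocount))


def classify(sense_lfc, antisense_lfc, antisense_cutoff=2.0, sense_cutoff=1.0):
    """Assign a gene to one of the antisense categories.

    antisense_cutoff=2.0 corresponds to a four-fold increase;
    sense_cutoff is a hypothetical threshold for calling sense changes.
    """
    if antisense_lfc < antisense_cutoff:
        return "other"
    if abs(sense_lfc) < sense_cutoff:
        return "antisense up, sense unchanged"
    if sense_lfc <= -sense_cutoff:
        return "antisense up, sense down"
    return "antisense up, sense up"


# Toy transcript levels (made-up numbers): antisense 3 -> 15, sense 20 -> 4.
antisense = log2_fold_change(3, 15)
sense = log2_fold_change(20, 4)
print(classify(sense, antisense))
```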
While chromatin dynamics are able to accurately recover and differentiate the timing of known stress response pathways, new questions are raised when these dynamics are not strictly correlated with sense transcription. Complexities introduced by the heterogeneity of DBF binding dynamics and interactions, and by transcription on the antisense strand, indicate that a more complex model of the chromatin landscape is required to elucidate the relationship between chromatin behavior and gene expression.
Chromatin occupancy changes are predictive of changes in gene expression
We next sought to develop a model to quantify the relationship between our measures of the chromatin and changes in transcription. We constructed a Gaussian process regression model to predict the transcription at each time point based solely on chromatin dynamics and the preinduction transcription levels at 0 min. We constructed four models to evaluate various inclusions of measures of the chromatin, including a “full” model incorporating nucleosome positional shift calls (Supplemental Fig. 10) and measures of chromatin relative to called antisense transcripts (Supplemental Fig. 11).
We then evaluated each model using 10-fold cross-validation and the coefficient of determination (R²), the proportion of variance the model predicts (Fig. 8A). For each model except the intercept model, prediction performance becomes worse through the time course as the transcript level deviates from the 0-minute transcript feature. However, models including features of the chromatin consistently outperform the model using the 0-minute transcript level alone. Nucleosome disorganization is more informative than promoter occupancy and, consistent with previous results, combining both metrics provides more predictive power than either alone. The full model is not the best between 7.5–15 minutes because prediction is mainly driven by the 0-minute transcript level early on (Fig. 8B). It outperforms all other models between 30–120 minutes, maintaining an R² greater than 0.4 two hours after the cell’s exposure to cadmium (Fig. 8C, D).
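The evaluation scheme can be sketched independently of the model being evaluated. The example below implements k-fold cross-validation and R² in plain Python, with a single-feature least-squares line standing in for the model; this stand-in is not the Gaussian process regression used in this study, and the function names and toy data are assumptions for illustration.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares slope and intercept (stand-in for the real model)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cross_validated_r2(xs, ys, k=10, seed=0):
    """Predict every sample from a model fit on the other folds, then score."""
    predictions = [0.0] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            predictions[i] = slope * xs[i] + intercept
    return r_squared(ys, predictions)


# Toy data: a noiseless linear relationship is predicted almost perfectly.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
print(cross_validated_r2(xs, ys, k=10))
```

Pooling the held-out predictions before computing R² is one of several reasonable conventions; averaging per-fold R² values is another.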
While our metrics do not describe the full state and variability of the chromatin landscape during transcription, our regression model provides evidence that a proportion of transcription can be explained from modifications of the chromatin state. This model serves as a baseline for understanding a portion of the complex relationship between the chromatin and transcription with numerous opportunities for extension.
Discussion
In contrast to ChIP-based studies, our study surveys the occupancy of DBFs across the entire genome without explicit information on the identities of the DBFs. While nucleosomes are well-characterized by nuclease digestion studies, profiling TFs and complexes that affect gene expression is a more challenging, open problem. Studies have identified the dynamics of various promoter-binding factors including transcription factors, general transcription factors, polymerases, mediator, SAGA, TFIID, histone modifications, chromatin remodelers, and others (Chereji et al. 2017; Huisinga and Pugh 2004; Reja et al. 2015; Rhee and Pugh 2012; Shivaswamy and Iyer 2008; Venters et al. 2011; Vinayachandran et al. 2018; Weiner et al. 2012, 2015). Utilizing both literature and motif analysis of TFs, we can implicitly describe the activity evident within gene promoters, such as with HSP26. Additionally, well-characterized responses, such as the sulfur pathways, allow for additional context in determining the logical sequencing of chromatin modification events and modal changes in gene expression.
Analysis of the gene encoding the Met4 cofactor Met31 uncovered chromatin changes linked not only with gene expression, but also with antisense transcription. While pervasive and regulatory ncRNA and antisense transcription have previously been shown to be associated with environmental perturbation (Camblong et al. 2007; Nadal-Ribelles et al. 2014; Swamy et al. 2014; Toesca et al. 2011), we characterized the relationship between these transcripts and gene expression from the perspective of the chromatin. Including the chromatin measures for the 667 genes with antisense transcripts also provides a marginal benefit in predicting sense transcripts (Fig. 8A). This benefit can be explored further by narrowing in on the effect size of these antisense-related chromatin measures and by examining the individual sets of genes whose expression appears to have a relationship with antisense transcription.
Using the initial transcript level and chromatin dynamics of both sense and antisense transcription, our regression model is able to predict the level of sense transcript with an R² greater than or equal to 0.44 for all time points following cadmium exposure (Fig. 8A). There are multiple opportunities to extend this model. Further quantification of the chromatin may include additional classes of fragments and characterization of the chromatin outside our defined [–200, 0] bp promoter and [0, 500] bp gene body. Additionally, this data set enables opportunities for modeling using other statistical methods including generalized linear models, deep neural networks, or random forests. This model and its predictions serve as a baseline showing the potential modeling opportunities and richness of statistical power of the chromatin.
Materials and Methods
Yeast strain
The yeast strain used in this study has the W303 background with the genotype: MATa, leu2-3,112, trp1-1, can1-100, ura3-1, ade2-1, his3-11,15.
Cell growth
Cells were grown asynchronously in rich medium at 30°C to an OD600 of 0.8. At time 0, before the addition of CdCl2 to a final concentration of 1 mM, one sample was removed and crosslinked for MNase-seq and another was pelleted and flash frozen for RNA-seq. Samples were taken at 7.5 min, 15 min, 30 min, 60 min, and 120 min following CdCl2 addition. All samples were taken and processed in duplicate.
Chromatin preparation
Chromatin was prepared as previously described (Belsky et al. 2015).
RNA-seq
RNA was prepared using the Illumina TruSeq Stranded Total RNA Human/Mouse/Rat kit (Cat number RS-122-2201) following the protocol provided by Illumina with RiboZero.
Sequencing library preparation
Illumina sequencing libraries of MNase-treated DNA were prepared using 500 ng of DNA as previously described (Henikoff et al. 2011).
Sequencing read alignment to the genome
All reads were aligned to the sacCer3/R64 version of the S. cerevisiae genome using Bowtie 0.12.7 (Langmead et al. 2009). The recovered sequences from all paired-end MNase reads were truncated to 20 bp and aligned in paired-end mode using the following Bowtie parameters: -n 2 -l 20 -m 1 -k 1 -X 1000.
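The truncation step can be illustrated with a short sketch that trims each FASTQ record to its first 20 bases before alignment. This is a hypothetical illustration of the idea, not the tooling used in this study; the function name and the assumption of well-formed four-line FASTQ records are ours.

```python
def truncate_fastq(lines, length=20):
    """Trim the sequence and quality strings of each 4-line FASTQ record.

    `lines` is a list of text lines; incomplete trailing records are ignored.
    """
    trimmed = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        trimmed.extend([header, seq[:length], plus, qual[:length]])
    return trimmed


# Toy record with a 25-base read (made-up sequence and qualities).
record = [
    "@read1",
    "ACGTACGTACGTACGTACGTACGTA",
    "+",
    "IIIIIIIIIIIIIIIIIIIIIIIII",
]
for line in truncate_fastq(record):
    print(line)
```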
MNase-seq duplicates, A and B, were randomly subsampled and merged to reduce bias from library preparation, sequencing, and MNase digestion. Separately for each duplicate, the time point with the fewest reads determined the subsampling depth (k_A and k_B), where A_t and B_t denote the reads of each duplicate at time point t:
$$k_A = \min_t |A_t|, \qquad k_B = \min_t |B_t|$$
Each duplicate was then subsampled (uniformly at random) to its respective subsampling depth to form new sets A′ and B′:
$$A'_t \subseteq A_t,\ |A'_t| = k_A, \qquad B'_t \subseteq B_t,\ |B'_t| = k_B$$
Finally, the subsampled duplicates were merged into a superset, M, for downstream analysis:
$$M_t = A'_t \cup B'_t$$
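The subsampling and merging procedure can be sketched as follows. This is a minimal illustration, assuming each duplicate is held as a mapping from time point to a list of fragments and that both duplicates cover the same time points; the function name and the fixed seed are hypothetical.

```python
import random


def subsample_and_merge(rep_a, rep_b, seed=0):
    """Subsample each duplicate to its own minimum depth, then merge.

    rep_a, rep_b: dicts mapping time point -> list of fragments.
    Returns a dict mapping time point -> merged list of fragments.
    """
    rng = random.Random(seed)
    # The time point with the fewest reads sets each duplicate's depth.
    k_a = min(len(fragments) for fragments in rep_a.values())
    k_b = min(len(fragments) for fragments in rep_b.values())
    merged = {}
    for time_point in rep_a:
        # random.sample draws uniformly at random without replacement.
        sampled_a = rng.sample(rep_a[time_point], k_a)
        sampled_b = rng.sample(rep_b[time_point], k_b)
        merged[time_point] = sampled_a + sampled_b
    return merged


# Toy duplicates with unequal depths across two time points.
rep_a = {0: list(range(10)), 30: list(range(6))}
rep_b = {0: list(range(100, 108)), 30: list(range(100, 105))}
merged = subsample_and_merge(rep_a, rep_b)
print({t: len(fragments) for t, fragments in merged.items()})
```

After this step every time point has the same total depth, k_A + k_B, which keeps occupancy counts comparable across the time course.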
Selection of gene set
We compiled a set of 4,427 genes for analysis. Genes were chosen according to five criteria: (i) classified as either verified or uncharacterized by sacCer3/R64, (ii) contains an open reading frame (ORF) at least 500 bp long, (iii) contains an annotated TSS, (iv) has an estimated half-life value, and (v) has adequate MNase-seq coverage.
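Criterion (iv) matters because transcription rates are estimated from transcript levels together with decay rates. The sketch below shows one way such an estimate can be written, assuming first-order decay (dx/dt = r - d·x), a decay rate derived from the half-life as d = ln 2 / t½, and a forward difference between consecutive time points. The exact difference equations used in this study are not given here, so the form and the function names are assumptions.

```python
import math


def decay_rate(half_life_min):
    """First-order decay constant (per minute) from a half-life in minutes."""
    return math.log(2) / half_life_min


def transcription_rates(times, levels, half_life_min):
    """Estimate the synthesis rate over each interval of a time course.

    Rearranges dx/dt = r - d*x into r = dx/dt + d*x, with dx/dt taken as
    a forward difference between consecutive time points.
    """
    d = decay_rate(half_life_min)
    rates = []
    for i in range(len(times) - 1):
        dx_dt = (levels[i + 1] - levels[i]) / (times[i + 1] - times[i])
        rates.append(dx_dt + d * levels[i])
    return rates


# Toy example: a constant transcript level implies synthesis balances decay.
times = [0, 7.5, 15, 30, 60, 120]
levels = [10.0] * 6
print(transcription_rates(times, levels, half_life_min=20))
```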
Genes whose ORFs are shorter than 500 bp were omitted (Supplemental Fig. 12A) to ensure valid “gene body” calculations over [TSS, +500]. TSS annotations were taken from Park et al. (2014). For four important sulfur-related genes, Sul1, Sul2, Met32, and Hsp26, TSS annotations were manually adjusted to be consistent with this study’s RNA-seq data. A half-life was required for each gene in order to estimate valid transcription rates. MNase-seq coverage was computed in a 2,000 bp window centered on each gene’s TSS. A position in this window was considered “covered” when at least one read was centered on that position. MNase coverage was then defined as the number of covered positions in this window divided by the length of the window, 2,000 bp. Genes with MNase coverage below 0.85 (n=109) were excluded from further analysis (Supplemental Fig. 12B).
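The coverage criterion can be illustrated with a short sketch. The function name and the half-open window convention, [TSS − 1000, TSS + 1000), are assumptions chosen so that the window contains exactly 2,000 positions.

```python
import numpy as np

def mnase_coverage(fragment_centers, tss, window=2000):
    """Fraction of positions in a TSS-centered window hit by at least one fragment center."""
    half = window // 2
    centers = np.asarray(fragment_centers)
    # Keep only centers inside the window (half-open interval, an assumption).
    in_window = centers[(centers >= tss - half) & (centers < tss + half)]
    # A position counts once no matter how many fragments are centered on it.
    return np.unique(in_window).size / window
```

Under this definition a gene passes the filter when `mnase_coverage(...) >= 0.85`.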
Classification of MNase-seq fragments and occupancy
For each gene, two regions were defined relative to its TSS. The promoter region was defined as the 200 bp region upstream of the TSS, [–200, TSS], a length consistent with previous descriptions (Lubliner et al. 2013; Smale and Kadonaga 2003). The gene body region was defined as the 500 bp region downstream of the TSS, [TSS, +500], to include the +1, +2, and +3 nucleosomes.
To compute metrics for nucleosome and small-factor binding signals, two reference data sets were used. Nucleosome-related metrics were computed by examining the MNase-seq fragment distribution at 2,500 unique nucleosome positions mapped by a highly sensitive chemical mapping methodology (Brogaard et al. 2012). Small-factor metrics were computed using 151 Abf1 binding sites determined through phylogenetic conservation and motif discovery (MacIsaac et al. 2006). Prior studies have found clear signals of small MNase-seq fragment enrichment at Abf1 sites (Henikoff et al. 2011). The MNase-seq distribution at each reference set was examined at 0 minutes, prior to cadmium treatment.
Fragments were further classified as nucleosome-sized fragments, those 144–174 bp long, or small fragments, those less than 100 bp long. A mode length of 159 bp was computed for the MNase-seq fragments at the Brogaard et al. (2012) sites, and a ±15 bp range around this mode was chosen for nucleosome-sized fragments. A mode length of 75 bp was computed at the MacIsaac et al. (2006) sites, and fragments shorter than 100 bp were chosen to be small fragments. Occupancy was then defined as the number of fragment centers of the designated fragment length in each region. Occupancy was calculated for small fragments in the promoter, nucleosome fragments in the promoter, small fragments in the gene body, and nucleosome fragments in the gene body.
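The four occupancy measures can be sketched as simple counts. The function name, the `strand` argument, and the treatment of region endpoints as inclusive are assumptions for illustration; the length cutoffs and region boundaries are those given above.

```python
import numpy as np

def occupancy(centers, lengths, tss, strand=1):
    """Count fragment centers by region (promoter, gene body) and size class.

    strand is +1 or -1 so that positions are measured in the direction of
    transcription (an assumed convention).
    """
    centers = np.asarray(centers)
    lengths = np.asarray(lengths)
    rel = (centers - tss) * strand            # position relative to the TSS
    promoter = (rel >= -200) & (rel <= 0)     # [-200, TSS]
    gene_body = (rel >= 0) & (rel <= 500)     # [TSS, +500]
    nucleosomal = (lengths >= 144) & (lengths <= 174)
    small = lengths < 100
    return {
        "promoter_small": int(np.sum(promoter & small)),
        "promoter_nucleosome": int(np.sum(promoter & nucleosomal)),
        "gene_body_small": int(np.sum(gene_body & small)),
        "gene_body_nucleosome": int(np.sum(gene_body & nucleosomal)),
    }
```

Fragments between 100 and 143 bp, or longer than 174 bp, fall into neither size class and are ignored by all four counts.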
Signal processing of chromatin
A cross-correlation was computed in a manner similar to that described in Tripuraneni et al. (2019). Around each gene’s TSS, a per-bp cross-correlation score was computed to smooth positional variation and filter out irrelevant fragments. Three two-dimensional cross-correlation kernels were constructed: an “idealized”, well-positioned nucleosome kernel (Supplemental Fig. 13A), a clearly bound small factor kernel (Supplemental Fig. 13B), and a triple nucleosome “gene body” summarization kernel (Supplemental Fig. 13C). Each kernel was applied to the region local to each gene’s TSS for each time point to compute a per-bp cross-correlation score (Supplemental Fig. 13D).
The nucleosome and small factor kernels were constructed using a bivariate Gaussian distribution parameterized by the mean and variance of the position and length of MNase-seq fragments. The parameters for each kernel were determined using the fragment length and position distributions at the positions from Brogaard et al. (2012) and MacIsaac et al. (2006), as described in Classification of MNase-seq fragments and occupancy.
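A kernel of this kind, and its application along a gene, can be sketched as below. This is an illustration under stated assumptions: the Gaussian is taken to be axis-aligned (no covariance between position and length), fragments are binned into a length-by-position count matrix, and the standard deviations and kernel half-width are hypothetical placeholders rather than the fitted values.

```python
import numpy as np

def gaussian_kernel(lengths, len_mean, len_sd, pos_sd, half_width):
    """2D kernel: rows index fragment length, columns index position offset."""
    offsets = np.arange(-half_width, half_width + 1)
    pos_part = np.exp(-0.5 * (offsets / pos_sd) ** 2)
    len_part = np.exp(-0.5 * ((lengths - len_mean) / len_sd) ** 2)
    kernel = np.outer(len_part, pos_part)
    return kernel / kernel.sum()              # normalize to unit mass

def cross_correlate(counts, kernel):
    """Per-bp score: sum of (counts * kernel) with the kernel centered at each position.

    counts: array of shape (n_lengths, n_positions) of fragment-center counts.
    """
    width = kernel.shape[1]
    half = width // 2
    padded = np.pad(counts, ((0, 0), (half, half)))   # zero-pad positions only
    scores = np.empty(counts.shape[1])
    for i in range(counts.shape[1]):
        scores[i] = np.sum(padded[:, i:i + width] * kernel)
    return scores
```

Because the kernel weights both position and length, fragments of the wrong size contribute little to the score even when they are centered at the right place.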
To summarize the gene body chromatin as a whole, a three nucleosome, “triple” kernel was constructed to dampen the effect of the +1 nucleosome becoming more poised to be well-positioned (Nocetti and Whitehouse 2016). The triple nucleosome kernel was constructed by repeating the nucleosome kernel and increasing the variance to take into account variable linker spacing. The nucleosome kernel spacing was determined using the average peak spacing between the [+1,+2] and the [+2,+3] nucleosome cross-correlation scores (Supplemental Fig. 13E).
Quantifying nucleosome disorganization
For each gene, a random variable X was defined with n possible outcomes representing each position to evaluate relative to the gene TSS. The probability of each outcome is estimated using the triple nucleosome cross correlation scores previously defined and normalized to sum to 1.
Because the triple kernel computes a score for three approximately adjacent nucleosome positions, we set n = 150 to summarize the disorganization of the first three nucleosomes in the gene body starting with +1 within the [0, 150] window.
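The probability of each outcome was defined as

$$P(X = i) = \frac{\mathrm{cross}_{nuc}(i)}{\sum_{j=1}^{n} \mathrm{cross}_{nuc}(j)}$$

where $\mathrm{cross}_{nuc}(i)$ is the nucleosome cross-correlation score at position $i$. Using this random variable, a score was computed for each gene to define its “nucleosome disorganization” using information entropy (Supplemental Fig. 13F):

$$H(X) = -\sum_{i=1}^{n} P(X = i) \log P(X = i)$$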
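The disorganization score can be sketched as an entropy over normalized cross-correlation scores. The function name is hypothetical, and the use of a base-2 logarithm is an assumption, since the base is not stated here; a different base only rescales the score.

```python
import numpy as np

def nucleosome_disorganization(cross_scores):
    """Shannon entropy of non-negative cross-correlation scores normalized to sum to 1."""
    scores = np.asarray(cross_scores, dtype=float)
    p = scores / scores.sum()
    nonzero = p > 0                    # treat 0 * log(0) as 0
    # Base-2 logarithm is an assumption made for this sketch.
    return -np.sum(p[nonzero] * np.log2(p[nonzero]))
```

A single sharp peak gives an entropy near zero (well-positioned nucleosomes), while a flat score profile gives the maximum value, log n (fully disorganized).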
Calling +1, +2, +3 nucleosomes
Nucleosomes were called using the peak nucleosome cross-correlation scores local to each gene’s TSS. The per-bp scores in a 1,000 bp window around the TSS were sorted, and the largest peak was iteratively selected: its value and position were called as a nucleosome, and it was removed together with all positions within 150 bp around it. This procedure was repeated until all positions in the 1,000 bp window had been removed.
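The greedy peak-calling procedure can be sketched as follows. The function name is hypothetical, and “within 150 bp around each peak” is interpreted here as ±150 bp, which is an assumption; the `exclusion` parameter makes the alternative reading easy to test.

```python
import numpy as np

def call_nucleosomes(scores, exclusion=150):
    """Greedy peak calling on per-bp cross-correlation scores.

    Returns (position, score) pairs in the order called, highest score first.
    """
    scores = np.asarray(scores, dtype=float)
    available = np.ones(scores.size, dtype=bool)
    calls = []
    while available.any():
        # Highest remaining score among positions not yet removed.
        masked = np.where(available, scores, -np.inf)
        peak = int(np.argmax(masked))
        calls.append((peak, float(scores[peak])))
        # Remove the peak and its neighborhood (assumed to be +/- exclusion bp).
        lo = max(0, peak - exclusion)
        hi = min(scores.size, peak + exclusion + 1)
        available[lo:hi] = False
    return calls
```

Because the loop runs until every position is removed, low-scoring calls also appear at the end of the list; no score threshold is applied in this sketch.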
“Linked” nucleosomes are defined as nucleosomes across the time course that nominally represent the same underlying nucleosome that may have changed in position or “fuzziness”. Nucleosomes were linked across time points using a nearest neighbor approach. In a greedy manner, the nucleosome with the lowest disorganization score, that is, the most well-positioned, was considered first. The position of this nucleosome was used to identify the linked nucleosomes in previous and subsequent time points by taking, at each time point, the nearest nucleosome within 100 bp of the original nucleosome’s position.
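The nearest-neighbor linking of one anchor nucleosome can be sketched as below. The function name and data layout (time point mapped to called positions) are assumptions, and the sketch covers a single anchor only; the greedy ordering over anchors is not shown.

```python
import numpy as np

def link_nucleosome(anchor_pos, calls_by_time, max_dist=100):
    """For each time point, pick the called nucleosome nearest to the anchor.

    Returns a dict of time point -> linked position, or None when no call
    lies within max_dist bp of the anchor.
    """
    linked = {}
    for t, positions in calls_by_time.items():
        positions = np.asarray(positions)
        if positions.size == 0:
            linked[t] = None
            continue
        distances = np.abs(positions - anchor_pos)
        nearest = int(np.argmin(distances))
        linked[t] = int(positions[nearest]) if distances[nearest] <= max_dist else None
    return linked
```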
+1 nucleosomes were called by identifying linked nucleosomes closest to the TSS. +2 and +3 nucleosomes were computed as the next set of nucleosomes at least 80 bp downstream from their neighboring nucleosome.
Gene Ontology enrichment analysis
GO enrichment analysis was performed using GOATOOLS (Klopfenstein et al. 2018) with the go-basic.obo annotations from the Gene Ontology Consortium (Ashburner et al. 2000; The Gene Ontology Consortium 2019). The false discovery rate was controlled using the Benjamini-Hochberg procedure (Benjamini and Hochberg 1995).
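For reference, the Benjamini-Hochberg adjustment itself can be written in a few lines. This sketch illustrates the procedure in isolation and is not a description of how GOATOOLS implements it; the function name is hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted
```

A GO term is then reported as enriched when its adjusted p-value falls below the chosen false discovery rate.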
Identification of transcription factor binding sites
TF binding sites were identified with FIMO (Grant et al. 2011) using the motif database from MacIsaac et al. (2006) and the default p-value threshold. Selected binding sites with supporting literature were annotated on typhoon plots.
Transcription rate estimation
As previously described (Cashikar et al. 2005; Rabani et al. 2011; Yang et al. 2003), transcription rates were computed for each gene using a zero-order growth with first-order decay relationship, where $C_i$ is the total RNA concentration measured by RNA-seq for sample $i$, $k$ is the fixed decay rate, $G_i$ is the concentration of RNA affected by the zero-order growth, and $R_i$ is the unknown transcription rate. Assuming a constant rate of transcription between time points, $R_i$ and $G_i$ can be solved through pairs of difference equations. Transcription rates $R_i$, $i \in \{7.5, 15, 30, 60, 120\}$, were computed by solving these systems of difference equations between pairs of RNA-seq measurements.
Similarly, steady-state transcription rates, $R_0$ at $t_0$, were computed by setting the rate of production equal to the rate of decay:

$$R_0 = k\,C_0$$

Decay rates were computed using the average half-life values, $\tau$, across Geisberg et al. (2014), Miller et al. (2011), and Presnyak et al. (2015), with $k = 1/\tau$. Computed transcription rate values were then truncated to a minimum of 0.1 TPM/min for valid fold-change evaluation.
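The rate calculation can be illustrated with a sketch that assumes the simplest form of the model, dC/dt = R − kC with R constant between consecutive time points, and solves it in closed form. This is a stand-in for illustration and may differ from the pairs of difference equations used in the study; in particular, the intermediate quantity G_i is not modeled. All function names are hypothetical.

```python
import numpy as np

def decay_rate(half_life):
    """Decay rate k = 1 / tau, following the definition given in the text."""
    return 1.0 / half_life

def steady_state_rate(c0, k):
    """At steady state, production equals decay: R0 = k * C0."""
    return k * c0

def interval_rate(c_prev, c_next, dt, k):
    """Constant rate R over an interval of length dt, assuming dC/dt = R - k*C.

    Solving the ODE gives C_next = C_prev*exp(-k*dt) + (R/k)*(1 - exp(-k*dt)),
    which is rearranged here for R.
    """
    decay = np.exp(-k * dt)
    return k * (c_next - c_prev * decay) / (1.0 - decay)

def floor_rate(rate, floor=0.1):
    """Apply a lower bound (TPM/min) so that fold changes stay well defined."""
    return np.maximum(rate, floor)
```

As a consistency check, when the transcript level does not change over an interval, `interval_rate` reduces to the steady-state value k·C.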
Clustering of chromatin measures
Clustering was performed using hierarchical clustering through SciPy (Virtanen et al. 2020) for its flexibility in determining k. The Ward linkage was used for its efficient approximation to the minimal sum of squares objective (Ward 1963).
A total of 832 genes were chosen for clustering as the union of the 500 genes with the greatest increase in average promoter occupancy and the 500 genes with the greatest increase in average nucleosome disorganization, that is, genes outside of an approximately 75% inner percentile range for each measure.
Clustering was performed against the pair-wise Euclidean distances between the z-score normalized measures of change in promoter occupancy and nucleosome disorganization. Clustering to k=8 was chosen to balance the interpretability of fewer clusters with the significance of identified GO terms in smaller, but more numerous clusters.
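The clustering step can be sketched with SciPy as follows. The function and argument names are hypothetical; the z-score normalization, Ward linkage on Euclidean distances, and the cut at k = 8 follow the description above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore

def cluster_chromatin(delta_occupancy, delta_disorganization, k=8):
    """Ward hierarchical clustering of two z-scored chromatin measures.

    Returns integer cluster labels (1-based), cutting the tree into at most k clusters.
    """
    features = np.column_stack([
        zscore(delta_occupancy),
        zscore(delta_disorganization),
    ])
    # With an observation matrix, linkage computes Euclidean distances itself.
    tree = linkage(features, method="ward")
    return fcluster(tree, t=k, criterion="maxclust")
```

Because the full tree is built once, different values of k can be compared by calling `fcluster` again on the same linkage without reclustering.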
Computing antisense transcription metrics
Antisense transcript levels were quantified using a TPM calculation defined by (Wagner et al. 2012) for strand-specific RNA-seq reads on the antisense strand in the ORF for each gene.
TSSs and transcription end sites (TESs) for antisense transcripts were identified using RNA-seq pileup, the number of reads covering a genomic position. To increase the signal of fully transcribed transcripts, per-position pileup values were summed across all time points into a cumulative pileup and smoothed using a Gaussian kernel.
Antisense transcripts were identified starting with the highest cumulative pileup value within a gene’s ORF on the antisense strand. The antisense TSS and TES were each identified by progressively searching upstream and downstream to identify the positions in which the cumulative pileup values are minimized (Supplemental Fig. 11A). Antisense transcripts were not called if they did not meet a minimum threshold.
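One possible reading of this boundary search is sketched below. It assumes that the boundary on each side is the position nearest the peak at which the smoothed cumulative pileup reaches its minimum on that side. The function name, the smoothing width `sigma`, the threshold `min_peak`, and the tolerance `tol` are hypothetical, and assigning the two boundaries to TSS and TES by strand is left to the caller.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def antisense_bounds(pileups, sigma=10.0, min_peak=5.0, tol=1e-9):
    """Locate the boundaries of an antisense transcript within a search region.

    pileups: array of shape (n_timepoints, n_positions) of antisense pileup.
    Returns (left, peak, right) indices, or None if the peak is below threshold.
    """
    cumulative = np.asarray(pileups, dtype=float).sum(axis=0)
    smooth = gaussian_filter1d(cumulative, sigma=sigma)
    peak = int(np.argmax(smooth))
    if smooth[peak] < min_peak:
        return None                      # no transcript called
    left_flank = smooth[:peak]
    right_flank = smooth[peak + 1:]
    if left_flank.size == 0 or right_flank.size == 0:
        return None
    # Nearest position to the peak that attains the flank minimum.
    left = int(np.flatnonzero(left_flank <= left_flank.min() + tol)[-1])
    right = peak + 1 + int(np.flatnonzero(right_flank <= right_flank.min() + tol)[0])
    return left, peak, right
```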
For the 667 genes in which an antisense transcript could be called (Supplemental Fig. 11B), nucleosome disorganization and promoter occupancy measures were computed, as previously described on the sense strand, relative to the antisense TSSs.
Transcript level prediction model
Gaussian process regression models were constructed to predict the log2 transcript level for each time point using the log2 transcript level at time 0, features of the chromatin at 0 minutes, and features of the chromatin for the time being predicted.
Four models were constructed to compare various combinations of measures of the chromatin: a small fragments promoter occupancy model, a gene body nucleosome disorganization model, a combined chromatin model, and a full model incorporating all previous models’ features with the addition of +1, +2, and +3 nucleosome position shift relative to 0 min (Supplemental Fig. 10) and measures of chromatin relative to called antisense transcripts (Supplemental Fig. 11).
Each Gaussian process regression model was developed using scikit-learn (Pedregosa et al. 2011) with a radial-basis function (RBF) kernel with length scale bounded between 0.1 and 100 and a white kernel with noise level 10⁻⁴ as priors for covariance. The length scale bounds and noise parameters were determined empirically through a sensitivity analysis on a subset of the data.
Promoter occupancy and nucleosome disorganization measures were log transformed to an approximately normal distribution. Then, each chromatin measure, including nucleosome shift, was z-score normalized so that the RBF length parameter could be successfully approximated.
Performance for each model was evaluated using the coefficient of determination, R², under 10-fold cross validation.
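A model of this form and its evaluation can be sketched with scikit-learn as below. The kernel settings follow the description above; the function name, the use of a pipeline with `StandardScaler` for z-scoring, the initial length scale of 1.0, the shuffled folds, and the random seed are assumptions made for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate_gp(features, log2_levels, n_splits=10, seed=0):
    """Return per-fold R^2 for a Gaussian process with an RBF + white-noise kernel."""
    kernel = (
        RBF(length_scale=1.0, length_scale_bounds=(0.1, 100.0))
        + WhiteKernel(noise_level=1e-4)
    )
    model = make_pipeline(
        StandardScaler(),    # z-score each feature within the training folds
        GaussianProcessRegressor(kernel=kernel, random_state=seed),
    )
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, features, log2_levels, cv=folds, scoring="r2")
```

Each of the four models would be evaluated by passing a different feature matrix (for example, promoter occupancy only, or all chromatin features) to the same function, so that the R² values are directly comparable.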
Acknowledgments
The authors would like to thank Sneha Mitra, Yulong Li, and Greg Crawford for providing suggestions and critical comments for the analyses in this manuscript.