Transcription elongation is dictated by single residues in the histone core domain

The chromatin fiber is thought to suppress transcription in eukaryotes by acting as a structural barrier. However, once begun, transcription can readily proceed on chromatin, suggesting this model is insufficient. Here, we establish that the ultra-conserved core domain of the ancestral histone H2A.Z dictates transcription elongation via direct interaction of its loop 2 region with the RNAPII subunit Spt6, rather than biophysical changes to chromatin. Interrogating H2A.Z sequences representing more than a billion years of eukaryotic evolution in a single synthetic host, we show that Spt6 can distinguish even single-residue substitutions within their loop 2, driving either super-repressed or -activated transcriptional states. Our results place the histone core domain at the origin of eukaryotic gene expression, establishing its transformative power to shape transcription.

To address whether and how changes in the chromatin fiber impact transcription, we developed a synthetic complementation paradigm using the histone H2A variant H2A.Z as a model. H2A.Z is an essential feature of eukaryotic chromatin, and has been tightly linked to both transcription activation (2,3,26,32,38,47,49) and repression (11,33,48,50-53) in different species. We reasoned that we could leverage divergent observations of H2A.Z function across species as a natural toolkit to generate functional variation in the chromatin fiber. We interrogated H2A.Z sequence variation representing more than a billion years of eukaryotic evolution in a single experimentally tractable synthetic system, the fission yeast Schizosaccharomyces pombe, replacing the endogenous S. pombe H2A.Z (Pht1) with H2A.Z sequences from diverse eukaryotes. Remarkably, we find that the RNAPII subunit Spt6 directly reads the primary sequence of the loop 2 (L2) region of the H2A.Z core domain, with even single amino acid changes to its core motif being sufficient to dictate transcription elongation. Together, our findings indicate that an ancient interaction between H2A.Z and Spt6 informs RNAPII directly of the nature of the chromatin template, providing insight into its fundamental role in shaping eukaryotic gene expression, and establishing the nucleosome core as a powerful source of evolutionary innovation.

H2A.Z sequence encodes its transcriptional function
To build our toolkit of H2A.Z sequences, we first constructed a phylogenetic tree of all H2A sequences from diverse eukaryotes in the EukProt database (Fig. S1, S2A, S3; Table S1; Data File S1; Methods). As previously suggested from analyses of smaller datasets (40,57), we found that H2A.Z forms a monophyletic clade that spans nearly the entirety of known eukaryotic diversity (Fig. S2B, S3A). Almost all sequences in this clade, but very few outside of it, harbor the signature DEELD motif located in a C-terminal region of H2A.Zs referred to as the docking domain (DD) (40). A small cluster of sequences, mostly from the unicellular flagellate Carpediemonas membranifera (phylum Metamonada), branch just outside the H2A.Z clade, contain a near-perfect, but nevertheless variable, DEELD motif and might represent rare, divergent forms of H2A.Z (Fig. S2B). Other H2A variants either have a more restricted phylogenetic distribution (e.g. macroH2A and H2A.W) or cannot be traced to a single ancestral sequence, appearing paraphyletic (e.g. H2A.X) (Fig. S3). Importantly, we found H2A.Z orthologs in nearly every eukaryotic genome, even in deep-branching clades like Metamonada that were previously thought to lack H2A.Z (Fig. S2B) (58). Thus, the phylogenetic distribution points to H2A.Z being present in the last eukaryotic common ancestor (LECA), and broadly retained throughout the evolution of eukaryotes. These findings support the view that H2A.Z is the only H2A variant whose ancestry can be traced back to the LECA.
We selected nine H2A.Z sequences from across the phylogenetic tree (Fig. S2B and S4A, Data File S2). The sequences were selected to maximize the number of eukaryotic supergroups represented, and their total sequence diversity. Although they span more than a billion years of evolution, these representative H2A.Z proteins were ~65-90% identical in their core domain (excluding highly divergent N- and C-terminal tail regions; Fig. S4B), highlighting their high degree of conservation.
To test whether transcription is impacted by sequence diversity in these nine representative H2A.Z proteins, we synthesized S. pombe codon-optimized versions of the nine H2A.Z ortholog coding sequences, and generated yeast lines in which the endogenous gene encoding H2A.Z (pht1) was replaced by one of these orthologs (Fig. 1A). We then assessed whether these replacement lines phenocopied the wild-type (WT) gene, resembled an H2A.Z deletion mutant (pht1-), or differed from both.
We assessed global transcriptional changes in the replacement lines by using strand-specific mRNA-seq. Consistent with prior reports (2), pht1- cells exhibited modest transcriptional changes, affecting ~10% of detected transcripts (Fig. S4C). By contrast, many H2A.Z replacement lines showed changes in ≥20% of detected transcripts when compared with WT (Fig. S4C). To investigate this further, we examined how similar the transcriptomes of the nine H2A.Z replacement lines were to either WT or pht1- using principal component analysis.
Replacement lines differed markedly from both WT and pht1- (Fig. 1B), suggesting that H2A.Zs from different species are not simply gradations of total loss-of-function alleles, but give rise to sequence-specific gain-of-function states.
We next asked whether transcriptional variation in H2A.Z from different species had consequences at the level of phenotype. To do this, we assessed the growth of H2A.Z replacement lines using a plate-based assay in the presence of stress conditions targeting metabolism (rapamycin (59)), protein folding (low-dose guanidinium hydrochloride (60)), cell wall biogenesis (the anti-fungal drug clotrimazole (61)), fermentation (ethanol (62)), or transcription and chromatin, either directly (doxorubicin (63), formamide (64)) or indirectly (caffeine (65)) (Fig. 1C). Consistent with functional variation at the level of transcriptomes, the growth of all nine replacement lines differed substantially from WT and pht1- cells across all tested conditions (Fig. 1C), with most lines growing more slowly than pht1- cells. Notably, greater sequence divergence of replacement orthologs from S. pombe Pht1 correlated with weaker growth in several stress conditions, with the strongest correlation being for conditions known to be linked directly to transcription (doxorubicin, formamide) or stresses known to require a transcriptional response (ethanol (62)) (Fig. 1D, S4D). By contrast, we saw no strong correlation between the level of H2A.Z expression and growth (Fig. 1D, S4D), indicating that these effects were not attributable to differential expression of H2A.Z but were likely related to differences in the primary sequence. Together, these data link primary sequence variation in H2A.Z from different species to variation in its function and confirm a tight linkage between H2A.Z and transcription.

The L2 region of H2A.Z determines its function
Histone proteins are composed of a core histone-fold domain, characterized by a nearly sequence-invariant bundle of three alpha helices (α1, α2, and α3), bounded by mostly unstructured N- and C-terminal tails and separated by two unstructured loop regions, L1 and L2 (Fig. 2A) (1). To pinpoint which sequence domains of H2A.Z are responsible for changes in transcriptional output across the set of orthologs tested above, we first examined sequence conservation across the full H2A.Z dataset (Fig. S3), calculating sequence identity along the length of H2A.Z using a 3-residue sliding window average (Fig. 2A). As expected, we observed high conservation of the core alpha-helical bundle and relatively lower conservation of the tails and the L1 (Fig. 2A). Surprisingly, however, we found that parts of the L2 region, specifically around the core motif KDLKVK, were as highly conserved as the core α-helices (Fig. 2A).
This was especially striking when compared with the H2A variant H2A.X, which was notably less conserved in this region.
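The conservation profile described above can be illustrated with a minimal Python sketch. It assumes a pre-computed multiple sequence alignment, scores each column by mean pairwise identity, and smooths the result with a centered 3-residue moving average. The function names, the treatment of gaps as mismatches, the truncation of windows at the sequence ends, and the toy alignment are all illustrative assumptions, not the pipeline used in the study.

```python
from itertools import combinations


def column_identity(alignment):
    """Mean pairwise identity at each alignment column (gaps count as mismatches)."""
    n_cols = len(alignment[0])
    pairs = list(combinations(range(len(alignment)), 2))
    scores = []
    for col in range(n_cols):
        matches = sum(
            1
            for i, j in pairs
            if alignment[i][col] == alignment[j][col] and alignment[i][col] != "-"
        )
        scores.append(matches / len(pairs))
    return scores


def sliding_mean(values, window=3):
    """Centered moving average; windows are truncated at the sequence ends."""
    half = window // 2
    smoothed = []
    for k in range(len(values)):
        chunk = values[max(0, k - half): k + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Toy, made-up alignment for illustration only (not real H2A.Z orthologs).
toy_alignment = ["KDLKVK", "KDLKVK", "KDMKVR", "KDLKIK"]
per_column = column_identity(toy_alignment)
smoothed = sliding_mean(per_column, window=3)
```

Plotting `smoothed` against alignment position would give a profile of the kind summarized in Fig. 2A, with highly conserved stretches appearing as plateaus near 1.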
The unexpectedly high sequence conservation of the L2 region suggested it might play a role in determining the global changes in transcription we observed in the replacement S. pombe lines. To test this, we engineered a series of chimeric histone genes whereby the N, C, L1, L2, or docking domain (DD) (40) regions of S. pombe pht1 were substituted into histone H2A (hta1) (Fig. 2B, S5A). We expressed each of these chimeric genes by knocking them into the pht1 locus and assayed their ability to complement the pht1- null allele by measuring the growth of the chimeric lines under stress conditions selected on the basis of prior experiments with the representative full-length H2A.Z replacement lines (Fig. 1C). The chimeric histone genes fell into two groups across all phenotypes tested: those that partially complemented the pht1- mutant (those encoding H2A/Z DD, H2A/Z N, and H2A/Z L1) and those that fully complemented the pht1- mutant in all tested conditions (those encoding H2A/Z L2 and H2A/Z C), making them indistinguishable from WT (Fig. 2C). These data indicate that both the L2 region and the C-terminal tail play key roles in determining H2A.Z function.
To investigate whether sequence variation in the L2 region of H2A.Z alone is sufficient to explain the transcriptional differences in our full-length H2A.Z replacement lines, we looked for a correlation between transcriptional divergence and sequence divergence from WT S. pombe in any of the structurally defined primary sequence regions of H2A.Z. Although the short L2 affords limited resolution, we observed a strong correlation between L2 sequence divergence and transcriptional divergence (correlation coefficient = 0.63, p < 10⁻³; Fig. 2D). By contrast, the sequence divergence of the C-terminal tail, which fully complemented the pht1- mutant in growth assays, did not correlate with transcriptional divergence (correlation coefficient = 0.056, p = 0.8; Fig. 2D).
There was also a correlation between the sequence divergence of the N-terminal tail and transcriptional divergence (correlation coefficient = 0.5, p < 10⁻²; Fig. S5B). This was likely due to co-variation with the L2 sequence, as there was a tight correlation between sequence variation in these two regions (correlation coefficient = 0.77, p < 10⁻⁵; Fig. S5B). Together, these data show that the L2 region of H2A.Z is an important contributor to its function.
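As a rough illustration of the region-level comparison above, the following Python sketch computes the fraction of divergent positions in an aligned region relative to a reference and relates it to a per-line measure of transcriptional divergence using a rank correlation. The text here does not specify which correlation statistic was used, so Spearman's rank correlation is an assumption; the sequences and transcriptional divergence values are invented for illustration.

```python
def fraction_divergent(ref, seq):
    """Fraction of aligned positions at which seq differs from ref."""
    if len(ref) != len(seq):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(ref, seq)) / len(ref)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical inputs: a reference L2 core motif, made-up variants of it,
# and made-up transcriptional divergence values for the matching lines.
reference = "KDLKVK"
variants = ["KDLKVK", "KDMKVK", "KDNKVQ", "RDNKIQ"]
sequence_divergence = [fraction_divergent(reference, v) for v in variants]
transcriptional_divergence = [0.1, 0.3, 0.2, 0.9]
rho = spearman(sequence_divergence, transcriptional_divergence)
```

Running the same calculation separately for each region (N-terminal tail, L1, L2, C-terminal tail) would allow the coefficients to be compared across regions, as in Fig. 2D and S5B.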

Transcription elongation factor Spt6 binds H2A.Z L2
The L2 region of H2A.Z is largely solvent accessible (55), particularly during transcription when the DNA is unwrapped from the nucleosome (19,20). We therefore hypothesized that L2 might impact transcription by physically interacting with proteins involved in transcription, rather than by influencing histone-DNA contacts, as is the case for the L2 region of histone H3 (1,66). To identify proteins in the S. pombe nuclear proteome that might interact with the L2 region of Pht1, we used AlphaFold2 Multimer (67) to predict physical interactions. We first performed a virtual interaction screen of WT Pht1 L2 against the S. pombe annotated proteome (68,69) (Fig. 3A), identifying 247 proteins with an interface predicted template modelling (IPTM) score >0.5 (Table S2). To further refine potential interactors, we reasoned that L2 interactors responsible for driving functional variation in transcriptional output must vary in their interaction in accordance with the L2 primary sequence. We thus selected 17 unique single-residue variants of the L2 core motif from our database of H2A.Z sequences (Fig. S3, Data File S1), performing an AlphaFold2 virtual interaction screen with correspondingly mutated Pht1 L2 sequences against a subset of 500 nuclear proteins based on their IPTM score with WT Pht1 L2 (Fig. 3B, S6A, Table S3). To assess differences in predicted binding across Pht1 mutants, we then calculated the variance in IPTM score for each candidate interactor protein (Fig. 3C). In most cases, the variance was very low, suggesting that the candidate interacting proteins do not respond to variation in L2 (Fig. 3C, S6A, Table S3). Among the candidate interactors with the highest variance, the transcription elongation factor Spt6 (Fig. 3D-E, Table S3) was the only component of RNAPII (70,71). Spt6 is known to interact with nucleosomes (72-74); its models were highly robust across multiple simulations (Fig. S6B) and closely matched experimentally derived structures (Fig. S6C).
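The variance-based filter described above can be sketched in a few lines of Python. The sketch assumes that each candidate interactor has one IPTM score per L2 variant, and ranks candidates by the variance of those scores. The candidate names and scores are invented, and the use of population variance (rather than sample variance) is an assumption, since the text does not say which was used.

```python
from statistics import pvariance


def rank_by_iptm_variance(scores_by_protein):
    """Sort candidates by variance of their IPTM scores across L2 variants, highest first."""
    variances = {
        name: pvariance(scores) for name, scores in scores_by_protein.items()
    }
    return sorted(variances.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores: one IPTM value per L2 variant for each candidate.
toy_scores = {
    "candidate_A": [0.60, 0.61, 0.59, 0.60],  # barely responds to L2 variation
    "candidate_B": [0.80, 0.40, 0.70, 0.30],  # responds strongly to L2 variation
    "candidate_C": [0.55, 0.55, 0.55, 0.55],  # does not respond at all
}
ranking = rank_by_iptm_variance(toy_scores)
```

Candidates at the top of `ranking` are those whose predicted binding changes most with the L2 sequence, which is the property used to prioritize interactors in Fig. 3C.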
We first experimentally validated the AlphaFold2-predicted interaction between Pht1 and Spt6, using glutathione-S-transferase (GST) as a negative control. H2A was previously reported to bind to Spt6 and was used as a positive control (75). We expressed and purified recombinant S. pombe GST-tagged Pht1 and Hta1, and His-tagged Spt6 in Escherichia coli (Fig. 3F) and assessed binding in vitro to nickel resin (which binds the His tag). Pht1 bound Spt6 (Fig. 3G), confirming the Pht1-Spt6 interaction. Further, we noted a qualitative difference in the amount of Hta1 and Pht1 bound to Spt6, suggesting that the two histone proteins might interact differently with Spt6. To verify this finding, we repeated Spt6-Pht1 pulldown experiments under more stringent binding conditions, where the amount of bound Pht1 was considerably less than that of Hta1 (Fig. S6D), reinforcing the suggestion that Hta1 and Pht1 may interact with Spt6 differently. Together, these data establish a physical interaction between Spt6 and Pht1.
To test whether the L2 region was sufficient to mediate the Pht1-Spt6 interaction, we synthesized a biotinylated peptide corresponding to this region of S. pombe Pht1 and assayed its binding to His-tagged Spt6 (Fig. 3H). As a negative control, we used a peptide corresponding to the docking domain (DD) of S. pombe Pht1, which is adjacent to the L2 region and important for the interaction of Pht1 with H3 and H4 (55) and Swr1-C (76). Whereas the DD peptide did not bind Spt6, the peptides corresponding to the L2 region of Hta1 or Pht1 bound Spt6 (Fig. 3I), consistent with the results of our experiments with the full-length proteins. These data establish that the L2 region alone is sufficient to explain the interaction of Pht1 with Spt6.
To test whether sequence variation in the L2 region of H2A.Z alters its binding to Spt6, we synthesized L2 peptides from five of the original set of representative H2A.Z sequences and assayed their binding to S. pombe Spt6, as above. Consistent with the sequence of L2 determining its interaction with Spt6, we observed variable amounts of each peptide bound to Spt6 across species (Fig. 3I), with A. godoyi, T. globosa, and P. marinus H2A.Z L2 peptides having the greatest qualitative difference relative to S. pombe Pht1 L2. Although there were clear differences between peptides derived from different species, we did not observe a strong association between Spt6 binding and sequence similarity (Fig. 3I), suggesting that specific sequence features of the peptides, rather than overall sequence divergence, must govern their association with Spt6. Indeed, when we compared the sequences of the three species' L2 peptides that had discernible differences relative to Pht1 to those that did not, we noted that all of them had substitutions within the L2 core motif (Fig. S6F), especially at position L83.
To investigate the role of sequence variation in the core motif around L83 in determining the binding of Pht1 L2 to Spt6, we further examined AlphaFold2 models. Pht1 was predicted to bind to the Spt6 death-like domain (DLD) (Fig. 3D). This domain, whose function has remained uncharacterized, is the most highly conserved feature of the Spt6 surface (77), suggesting an important functional role. To assess this further, we gathered Spt6 sequences from the same EukProt database we used previously to identify H2A.Zs, calculating sequence conservation of the DLD across known eukaryotic diversity (Fig. 3J). In the DLD, large numbers of hydrophobic residues were highly conserved not only in absolute terms (Fig. 3J), but also relative to other hydrophobic residues in Spt6 (p < 0.005, Wilcoxon test; Fig. S6F).
These residues were organized into a hydrophobic groove, and many of them contacted the Pht1 L2 core motif in AlphaFold2 models (Fig. 3E, S6E). Further, these contacts were predicted to vary according to the H2A.Z L2 sequence, both for L2s from different species (Fig. S6E) and for Pht1 mutants at the L83 position we identified earlier (Fig. 3E), suggesting a rationale for why sequence variation at this Pht1 position might affect binding.
To test the contribution of Pht1 L2 sequence variation, particularly at position L83, to the Pht1-Spt6 interaction in the context of the DLD hydrophobic groove, we synthesized three mutated Pht1 L2 peptides with either more hydrophobic (L83F) or more hydrophilic (L83M, L83N) substitutions and tested their binding to Spt6. The more hydrophilic L83N substitution, which corresponds to Hta1 residue N75, did not recapitulate the binding of full-length Hta1 (Fig. 3D), reinforcing our earlier conclusion that numerous other primary sequence differences in Hta1 (Fig. S5A) must contribute to its binding. Whereas the L83F mutant L2 peptide bound Spt6 similarly to the WT Pht1 peptide, Spt6 bound both more hydrophilic substitutions (L83M/N) to a much lesser extent than WT Pht1 L2 (Fig. 3I). These relative binding abilities corresponded to their hydrophobicities, and are thus consistent with AlphaFold2 predictions that the L2 region of Pht1 interacts with the hydrophobic groove of the DLD of Spt6 (Fig. 3D).
Together, these findings suggest that the hydrophobic L2 region of H2A.Z determines its binding to the DLD of Spt6.

H2A.Z L2 encodes transcription elongation states
Since Spt6 is a transcription elongation factor (70,71), we hypothesized that sequence variation in the L2 region of H2A.Z might be sufficient to affect transcription elongation, thereby explaining species-specific differences in reported H2A.Z function. To test this, we first developed a growth-based reporter assay using the uracil biosynthesis gene ura4. In this assay, the amount of transcription of ura4 is directly related to the amount of growth in media lacking uracil (Fig. 4A). Assaying a growth-based reporter has the advantage that it can be used to test multiple conditions and is not easily confounded by global transcriptional changes.
We first confirmed that our reporter gene accumulates Pht1 by ChIP-qPCR (Fig. S7A). We then introduced point mutations corresponding to those found in the L2 core motif of our nine representative H2A.Z orthologs (Fig. S4A, S6F) into the endogenous pht1 (H2A.Z) gene in S. pombe (Fig. 4B). Whereas pht1- cells gave rise to a modest increase in reporter expression (Fig. 4C), supporting a repressive function of H2A.Z in S. pombe as in some other model systems (11,48,50,53), half of the L2 mutants had a significant increase (by ~1.3-1.5 fold for L83F, L83N and K86Q) or decrease (by ~2 fold for L83M) in reporter expression. These results were not confounded by general fitness defects in the mutant strains (Fig. S7B), and all the mutants were epistatic to Swc2, an essential subunit of the Swr1-C chromatin remodeling complex, which incorporates H2A.Z into nucleosomes (78,79) (Fig. S7C). Together, these results establish that reporter expression differences are driven by Pht1 presence at the reporter gene and suggest that L2 sequence variation is sufficient to impact transcription.
We next tested whether differences in reporter expression across Pht1 mutant lines were specifically attributable to transcription elongation through two orthogonal approaches. First, focusing on the four Pht1 mutant lines with the strongest differences in reporter expression relative to WT (L83M, L83N, K86Q, and L83F), we made use of the drug mycophenolic acid (MPA) to specifically interfere with RNAPII elongation by lowering cellular GTP levels (80).
In this experiment, if a mutant has increased or decreased RNAPII elongation relative to WT, then it should be resistant or sensitive to MPA, respectively. Treatment of WT cells with MPA caused a ~40% reduction in growth relative to the control (Fig. 4D). By contrast, pht1- and the Pht1 mutants L83N, K86Q, and L83F were more resistant to MPA than the WT (~25-35% reduction in growth; Fig. 4D), consistent with increased transcription elongation in these lines.
Conversely, the single mutant that had decreased reporter expression (L83M) was also the most sensitive to MPA treatment (~60% reduction, padj < 10⁻⁶; Fig. 4D), confirming its reduction in transcription elongation activity. Together, these data establish that Pht1 L2 mutants are sufficient to impact elongating RNAPII.
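The logic of the MPA assay above (growth reduction relative to an untreated control, then comparison with WT) can be made explicit with a short Python sketch. The growth values, strain labels, and the fixed 5-percentage-point tolerance are hypothetical and chosen only for illustration; the study itself relies on statistical testing (adjusted p-values) rather than a fixed cutoff.

```python
def percent_reduction(control_growth, treated_growth):
    """Percent reduction in growth under treatment relative to the untreated control."""
    if control_growth <= 0:
        raise ValueError("control growth must be positive")
    return 100.0 * (control_growth - treated_growth) / control_growth


def classify_vs_wt(strain_reduction, wt_reduction, tolerance=5.0):
    """Label a strain relative to WT using a hypothetical tolerance in percentage points."""
    if strain_reduction < wt_reduction - tolerance:
        return "resistant"  # grows better than WT under MPA
    if strain_reduction > wt_reduction + tolerance:
        return "sensitive"  # grows worse than WT under MPA
    return "similar"


# Made-up growth measurements as (untreated, MPA-treated), in arbitrary units.
toy_growth = {
    "WT": (0.80, 0.48),
    "mutant_x": (0.80, 0.56),
    "mutant_y": (0.80, 0.32),
}
reductions = {
    strain: percent_reduction(control, treated)
    for strain, (control, treated) in toy_growth.items()
}
labels = {
    strain: classify_vs_wt(value, reductions["WT"])
    for strain, value in reductions.items()
    if strain != "WT"
}
```

Under the reasoning in the text, a "resistant" label would be consistent with increased RNAPII elongation and a "sensitive" label with decreased elongation.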
Finally, as an orthogonal measure to our reporter-based assays, we probed the impact of Pht1 L2 mutants on transcription elongation by measuring nascent transcription using precision run-on sequencing (qPRO-seq) (81,82). Sequencing nascent transcripts allowed us to position RNAPII in the transcriptional unit and to distinguish between promoter-proximal paused RNAPII and gene body-located RNAPII participating in active elongation across all expressed genes, albeit without accurately quantifying specific parameters of RNAPII activity. Consistent with the results of our reporter assays, differential expression analysis of gene body-located qPRO-seq counts in the WT and pht1- lines showed a modest increase in active elongation by RNAPII at ~100 genes in pht1- cells (Fig. 4E). Similar analyses of the Pht1 point mutant lines identified three classes of mutants. In one class, comprising half of the tested mutants (L83M, L83G, A80S, and V85A), we observed essentially no change in RNAPII elongation when compared with WT at the level of differentially transcribed genes (Fig. 4E). Notably, this class included the single super-repressive L2 mutant (L83M), suggesting it is likely linked to Spt6's function as part of transcription termination (70). The second class comprised a single mutant (K84H), which had a modest increase in RNAPII engaging in elongation, very similar to that of pht1- (Fig. 4E). The third class comprised three mutants (L83N, K86Q, and L83F) in which we observed a robust increase in RNAPII engaging in elongation at ~50% more genes than in pht1- (Fig. 4E), consistent with their impact on reporter gene transcription. Together, these data establish that sequence variation in the core L2 region of H2A.Z is sufficient to transform its transcriptional impact on the genome.
For much of the past 50 years, investigations into variation in the core histone fold have focused on its interaction with DNA and the stability of the nucleosome (26,41,42,83-87).
Our findings, by contrast, highlight the importance of the histone core domain for genomic regulation through protein-protein interactions (66,88,89).Given the extreme depth of conservation of not only H2A.Z and Spt6, but also the entire chromatin and transcription apparatus across eukaryotic evolution (90), it seems likely that the nucleosome core of transcriptional regulation originated with eukaryogenesis and diversified with histone variants.
Given that sequence variation in just seven residues of the L2 core motif of H2A.Z can give rise to enormous phenotypic complexity at the level of transcriptional output and beyond, it is likely that many protein-protein interactions with the core domain of other histone variants remain to be discovered. These will not only broaden our understanding of the importance of the evolution of the nucleosome, but also, given the prevalence of transcription elongation defects in cancers (91,92), provide new targets to address human disorders.

FIGURES AND FIGURE LEGENDS

Figure 1.

Figure 2. The L2 region of H2A.Z is a key determinant of its function. A, Mean pairwise

Figure 3. The L2 region of H2A.Z binds to transcription elongation factor Spt6. A,