Transposons repressed by H3K27me3 were co-opted as cis-regulatory elements of H3K27me3 controlled protein coding genes during evolution of plants

The mobility of transposable elements (TEs) contributes to evolution of genomes 1,2. Meanwhile, their uncontrolled activity causes genomic instability and therefore expression of TEs is silenced by host genomes 3,4. TEs are marked with DNA and H3K9 methylation that are associated with silencing in flowering plants 5, animals, and fungi 6. Yet, in distantly related eukaryotes TEs are instead marked by H3K27me3 deposited by the Polycomb Repressive Complex 2 (PRC2) 7–11, an epigenetic mark associated with gene silencing in multicellular eukaryotes 12–15. It was therefore proposed that the ancestral activity of PRC2 was the deposition of H3K27me3 to silence TEs 16. To test this hypothesis we obtained mutants deprived of PRC2 activity and used genomics to analyze the role of PRC2 in extant species along the lineage of Archaeplastida. While in the red alga Cyanidioschyzon merolae more TEs than genes were repressed by PRC2, an opposite trend was observed in bryophytes Marchantia polymorpha and Anthoceros agrestis. In the red alga, TEs silenced by H3K27me3 are in subtelomeres but in bryophytes, TEs and genes marked by H3K27me3 form coregulated transcriptional units. The latter trend was also observed in the flowering plant Arabidopsis thaliana, and we identified cis-elements recognised by transcription factors in TEs flanking genes repressed by PRC2. Together with the silencing of TEs by PRC2 in ciliates that diverged early from an ancestor common with Archaeplastida, our findings support the hypothesis that PRC2 deposited H3K27me3 to silence TEs in early lineages of eukaryotes. During evolution, TE fragments marked with H3K27me3 were selected to shape transcriptional regulation that control networks of genes regulated by PRC2. Highlights H3K27me3 marks a decreasing proportion of TEs during evolution of plants The polycomb repressive complex 2 represses TEs in red algae and bryophytes H3K27me3-marked TEs in flowering plants contain transcription factor binding sites Transcription factors bind TEs and regulate networks of genes controlled by PRC2


19
, and confirmed that 5mC levels of the nuclear genome are not higher than the background levels measured on chloroplast and mitochondria DNA ( Figure S1B). We conclude that it is unlikely that DNA methylation has a role in TE silencing and, furthermore, H3K27me3 is more broadly associated with TEs than with genes in C. merolae.
To explore the role of H3K27me3 in transcriptional repression of TEs in C. merolae, we disrupted the only ortholog of E(z), CmE(z) (CMQ156C) (Figures S1C and S1D). Two independent loss of function alleles Cme(z)-1 and Cme(z)-2 exhibited a near complete loss of H3K27me3 with a concomitant reduction of H3K27me1 levels, but no decreased levels of H3K9me1 ( Figure 1B), suggesting that CmE(z) deposits H3K27me3 but not H3K9me1. The low levels of H3K27me1 in the mutant are likely explained by demethylation of H3K27me3 by JUMONJI demethylases that remain to be characterized in C. merolae. A comparison of transcriptomes of Cme(z)-1 and wild type showed that expression levels of 0.4% (19) and 0.8% (44) protein coding genes (PCGs) out of a total 4984 PCGs decreased and increased, respectively (Q.value < 0.05, |log2 fold change| > 1; Figures 1C -1E).  Figures 1E and 1F). More than 60% (122) of all TEs showing increased expression in Cme(z)-1 were covered by H3K27me3 in the wild type ( Figure 1F), supporting that PRC2 represses the expression of TEs. We defined PCGs and TEs, which exhibited both increased expression in Cme(z) and were covered by H3K27me3 in the wild type as direct targets of PRC2 (26 PCGs and 122 TEs). We observed no enrichment of these PRC2 direct targets in a specific TE family ( Figure S1A).
Most TEs repressed by PRC2 were located in subtelomeric chromosomal regions ( Figure 1G). By contrast, indirect targets of PRC2 that were misexpressed were scattered along chromosomes ( Figure   1G). We concluded that in C. merolae PRC2 deposits H3K27me3 and represses expression of TEs preferentially in subtelomeric regions.

Association of H3K27me3 marks on TEs are conserved among bryophytes
In bryophytes, that diverged from vascular plants ca. 500-460 MYA and comprise hornworts, liverworts and mosses 20 , a large fraction of TEs are covered by H3K27me3 in the liverwort Marchantia polymorpha 9 while TEs in the model moss Physcomitrium patens are covered mostly by H3K9me2 21 . To address the conservation of the association of H3K27me3 with TEs among bryophytes, we used the model hornwort Anthoceros agrestis that diverged from bryophyte ancestors before the divergence of liverworts from mosses 22 . We annotated TEs in A. agrestis Oxford strain and identified 90,809 TEs including 990 intact TEs belonging to various TE families ( Figure S1E). By contrast with C. merolae, we observed a much larger diversity of TE families and half of intact TE were retrotransposons. This enrichment of LTR TE families was also observed in TEs longer than 500 bp (total 26,070 TEs including 923 intact TEs) ( Figure S1E) and thus we analyzed separately longer TEs and shorter TEs. Using chromatin immunoprecipitation coupled with DNA sequencing (ChIP-seq), we obtained genomic profiles of five histone post translational modifications (PTMs) (H3K4me3, H3K36me3, H3K9me1, H3K27me1 and H3K27me3) and H3 from four week old vegetative tissue of A. agrestis (see 23 for a general overview of the chromatin of A. agrestis). We performed k-means clustering of chromatin marks over TEs, and defined five major clusters of TEs showing different chromatin environments (Figure 2A). Clusters 2 and 3 contained 39% and 25% of all TEs and were covered with H3K9me1 and H3K27me1 ( Figure 2A). Cluster 1 comprised 3% of all TEs that were not covered with H3K9me1, H3K27me1 and H3K36me3, but with H3K4me3 and H3K27me3. We did not find a preferential association between TEs from cluster 1 and specific TE families. More than 80% of TEs in cluster 1 are located close to genes belonging to PCG cluster 1 that are also covered with H3K27me3 ( Figure 2C). According to their chromatin environment, short TEs clustered in a fashion similar to long TEs ( Figure S2B) and the short TEs covered with H3K27me3 and H3K4me3 were also associated with PCGs sharing the same chromatin environment ( Figure   S2B). Similarly, we observed the association of 5395 TEs with PCGs covered by a chromatin landscape dominated by H3K27me3 in M. polymorpha (TE Cluster 1 and PCG Cluster 1 Figure 2B, S2B 9 ). We concluded that the association of H3K27me3 with TEs is conserved in liverworts and hornworts. Because hornworts are sister to the other two groups 22 , the association of TEs with H3K27me3 would be ancestral to the bryophyte lineage.

PRC2 represses expression of TEs in M. polymorpha
To test whether PRC2 silences TEs in bryophytes, we used M. polymorpha which is amenable to genetic manipulation 24 . To disrupt the function of PRC2, we focused on the ortholog of the PRC2 catalytic subunit Enhancer of zeste, E(z) that is encoded by three E(z) paralogs in M. polymorpha genome 25 . Only MpE(z)1 is expressed in vegetative gametophytic tissues of M. polymorpha 26 . A knock-down of MpE(z)1 showed reduced growth and necrosis of tissues that hindered further analysis 27 . We observed that genes encoding the homeodomain transcription factors KNOX and BELL which are not expressed in the vegetative tissues of the gametophyte were covered by H3K27me3 ( Figure   S3A). Since these transcription factors are essential in determining life phase transitions in plants 28-31 , we hypothesized that their mis-expression in Mpe(z)1 mutant could be responsible for the lethality observed in the knock down of MpE(z)1. Therefore, we disrupted MpKNOX2 (see STAR methods for details) to obtain null Mpknox2-1 alleles in Tak-2 female wild-type strain. In this mutant background we generated two knockout alleles of Mpe(z)1, Mpknox2-1 Mpe(z)1-1 and Mpknox2-1 Mpe(z)1-2 ( Figures S3B and C). We also used a construct expressing two guide RNAs targeting MpKNOX2 and MpE(z)1 to obtain the combination of two additional alleles Mpknox2-2 Mpe(z)1-3 in the Tak-1 male wild-type strain ( Figures S3B and S3C). While Mpknox2 single mutants exhibited no developmental defects during the vegetative growth phase, Mpknox2 Mpe(z)1 double mutants exhibited slower thallus growth but with a morphology similar to the wild type ( Figures S3D and S3E). This supported that misexpression of mpKNOX2 was responsible for the lethality observed in the Mpe(z)1 null mutant.
Protein gel blot analyses using isolated nuclei from 14-day-old thalli indicated that compared with the wild type, H3K27me3 was undetectable in Mpknox2-1 Mpe(z)1, while the levels of H3K9me1 and H3K27me1 were not reduced in Mpknox2-1 Mpe(z)1 ( Figure 3A). In addition the levels of the three post transcriptional modifications analyzed did not change in wild type and  Figure 3C). We confirmed the increased expression of several TE fragments and intact TEs using quantitative real time RT-PCR analysis ( Figure S3H). We observed that TEs exhibiting increased expression in the Mpe(z)1 mutant were primarily marked by H3K27me3 (TE Cluster 1, Figure 3D).
These TEs belonged to mutator DNA transposons and other uncategorised TE families ( Figure 3E and Fischer's test). Because TEs and PCGs are interspersed in the genome of Marchantia, we investigated whether PRC2 coregulated TEs with their closest neighbor PCGs. We observed that three quarters of TEs from the TE cluster 1 were located close to genes from the PCGs cluster 1 ( Figures 2E and S2B).
Importantly, there was a significant enrichment of pairs of up-regulated TEs and neighbor PCGs ( Figure 3F). We conclude that in M. polymorpha, PRC2 represses transcription from TEs. These are usually TEs associated with a PCG that is also repressed by PRC2.

In A. thaliana TEs covered by H3K27me3 contain transcription factors binding sites
In flowering plants TEs are primarily marked by H3K9me1/2 but in mutants devoid of DNA methylation H2K27me3 becomes associated to TEs, suggesting that a link between TEs and PRC2 is masked by the presence of marks of constitutive heterochromatin. Based on a comprehensive analysis of chromatin states in A. thaliana we established that 11% of TEs are covered by H3K27me3 mark ( Figure S4A and S4B, Cluster 6; and clustering details in 32 ). These TEs belong primarily to DNA and rolling circle (RC) transposon families in contrast to TEs from other clusters 1 -3 that are covered with H3K9me1 and H3K27me1 ( Figures S4C and S4D). Most of the TEs from cluster 6 were located nearby PCGs which are also covered by H3K27me3 ( Figure S4E), suggesting that those TEs would have been co-opted to cis-regulatory motifs for nearby PCGs. To test whether these TEs could function as cis-regulatory elements controlling the repression of contiguous PCGs, we searched for binding of transcription factors in all TEs in A. thaliana and calculated enrichment of TF binding from publicly available ChIP-seq experiments in each TE cluster ( Figure 4A). We observed a higher occupancy of TF binding in the TEs from cluster 6, which are marked with H3K27me3, than other TEs clusters ( Figure 4A). This suggested H3K27me3-marked TEs function as cis-regulatory elements.
Moreover, this bias was even clearer when we considered the functional TF binding events associated with the co-expression of the neighboring gene ( Figure 4A). By contrast TF binding events were not found in TEs from clusters 1 -4, which are covered with constitutive heterochromatin (H3K9me1/2 and H3K27me1). Altogether these observations suggested that TFs associated with TEs are important for the regulation of neighboring genes. Among TFs identified enriched in TEs from cluster 6, we found four MADS-box containing TFs (APETALA1, SEPALLATA3, AGAMOUS-LIKE15, and PISTILLATA), which are known to interact with PRC2 and control flower development ( Figure 4B) 33,34 . A high proportion of TEs marked with H3K27me3 was located near PRC2 regulated PCGs covered by the Chromatin Landscape 1, defined by enrichment in H2A.Z, H3K27me3 and H2AK121Ub, and excluded from actively transcribed genes associated with chromatin landscapes 7 -10 (defined in 32 ) ( Figures 4C and 4D). We thus concluded that TEs harboring H3K27me3 contain cis-elements bound by transcription factors involved in the regulation of neighboring genes by PRC2.
We also noticed a strong enrichment of TF binding in TEs from clusters 5 and 7 ( Figure 4A).
TEs from cluster 5 are short and marked by an hybrid chromatin state enriched in both H3K9me1 and H3K27me3 ( Figures S4A and S4D). They are bound by TFs including the MADS-box factor APETALA3, the bHLH factor GLABRA3 and the pioneer factor LEAFY 35,36 that are regulated by H3K27me3 and also control genes regulated by PRC2 and involved in flower development 33,37 .
The largest group of TFs enriched in TEs belong to the TE cluster 7 ( Figure 4A), which are short DNA and RC TEs, not occupied by marks of heterochromatin but with high accessibility at their boundaries ( Figure S4). These TEs were strongly associated either with PCGs associated with the Chromatin Landscapes 6 or 8 ( Figures 4C and S4E), suggesting that they also contain TF-binding cis-elements controling expression of neighbor PCGs. Overall our results suggest that in Arabidopsis a large number of TFs were domesticated and involved in activation (cluster 7) or PRC2-targeted repression (clusters 5 and 6) of PCGs.

Conclusions
In summary, we show that PRC2 silences TEs among diverse groups of Archaeplastida. In the red alga C. merolae, PRC2 deposits H3K27me3 on a majority of the TEs. Although the proportion of TEs covered by H3K27me3 in bryophytes and angiosperms is less prominent than in C. merolae, it is likely that the deposition of H3K27me3 on TEs by PRC2 is general in Archaeplastida. PRC2 also silences TEs in ciliates 7 and is associated with TEs in stramenopiles and fungi 16 allowing us to propose that PRC2 targeted TEs for silencing in LECA, the common ancestors of these groups. In the ciliate Paramecium aureliae the recruitment of PRC2, which deposits both H3K9me3 and H3K27me3 7 involves non-coding RNAs 8 in a manner reminiscent of the recruitment of the machinery that deposits H3K9me3 in fungi and animals 38 . In red algae H3K27me3-silenced TEs are associated with the telomeres which is reminiscent of PRC2 recruitment by TELOMERE REPEAT BINDING FACTORs (TRBs) at telobox motifs in angiosperms 39 . However we could not find homologs of TRBs in C. merolae suggesting that diverse mechanisms of recruitment of PRC2 have been selected to achieve silencing of TEs across eukaryotes.
It has been proposed that during evolution of eukaryotes remnants of TEs became

Declaration of interests
The authors declare no conflict of interest.

Annotation of TEs in C. merolae, A. agrestis and M. polymorpha
Transposable elements of C. merolae and M. polymorpha were annotated using EDTA 1.9.9 50 , which incorporates a bunch of tools including LTRharvest, LTR_FINDER, LTR_retriever, Generic Repeat Finder , TIR-Learner, MITE-Hunter, HelitronScanner, and RepeatMasker. All softwares are adjusted to EDTA with proper filters and parameters. Final non-redundant TE libraries are produced by removing nested insertions and protein-coding genes by EDTA customized scripts.

Re-analyses of DNA methylation in C. merolae
Bisulfite-seq data of C. merolae were downloaded from the sequence read archive of NCBI under the study PRJNA201680 19 . Reads were trimmed with Trim Galore (https://github.com/FelixKrueger/TrimGalore). Bisulfite converted reference genome was prepared from C. merolae ASM9120v1 genome sequence using Bismark 52 . Trimmed reads were mapped to the bisulfite genome using Bowtie2 option of Bismark. Duplicates were removed using deduplicate function in Bismark. Cytosine methylation reports were created from deduplicated reads using bismark_methylation_extractor function in Bismark.

Generation of Cme(z) mutant
To inactivate the CmE(z) (CMQ156C) gene, the chromosomal CmE(z) orf was replaced by the C. merolae URA5.3 selectable marker gene by homologous recombination as follows. All the primers used are listed in Table S1.
To obtain Cme(z)-1, the CmE(z) genomic region, which contained the CmE(z) orf and its 1-kb each of 5'-and 3'-flanking sequences, was amplified with the primers E(z)_KO_F1 and E(z)_KO_R1.
The amplified DNA was cloned into the vector pUC19 by In-Fusion Cloning Kit (Takara Bio Inc., Japan for PCGs and 1 kb for TEs with 1 kb upstream and downstream. Aggregate profile plots of matrices were plotted with plotProfile with k-means clustering. Cluster assignments can be found in ref.
Closest function in BEDTools v2.27.1 was used to define the closest TE and gene pair.

Generation of Marchantia PRC2 knockout mutant
All the primers used to generate M. polymorpha PRC2 knockout mutants are listed in Table   S1.

Transcriptome data analysis
Bam files of RNA-seq reads were sorted with SAMtools v1.9 56  were incubated with histone modification specific antibody in LiCor Odyssey blocking solution for 30 min at room temperature. Subsequently, the membrane was added and incubated overnight at 4°C.

TF analysis in Arabidopsis
Transcription factor peak BED files were obtained from ChIP-seq datasets compiled previously [72][73][74][75][76] . Each significant peak region in the genome was classified according to TE clusters 32 .
Only the center of the peak was considered to classify chromatin states. Enrichment for TF binding was performed as in 77