Uncoupled evolution of the Polycomb system and deep origin of non-canonical PRC1

Polycomb group (PcG) proteins modulate chromatin states to silence gene transcription in plants and animals. Most PcG proteins function as part of distinct multi-subunit Polycomb repressive complexes (PRCs). Gene repression by the Polycomb system involves chromatin compaction by canonical PRC1 (cPRC1), mono-ubiquitylation of histone H2A (H2Aub1) by non-canonical PRC1 (ncPRC1) and tri-methylation of histone H3K27 (H3K27me3) by PRC2. Prevalent models for Polycomb repression emphasize a tight functional coupling between PRC1 and PRC2. However, whether this paradigm indeed reflects the evolution and functioning of the Polycomb system remains unclear. Here, we examined the relationship between cPRC1, ncPRC1 and PRC2 through a comprehensive analysis of their presence and evolution across the entire eukaryotic tree of life. We show that both PRC1 and PRC2 were present in the Last Eukaryotic Common Ancestor (LECA), but that their subsequent evolution is uncoupled. The identification of orthologs for ncPRC1-defining subunits in unicellular relatives of animals and of fungi suggests that the origin of ncPRC1 predates that of cPRC1, and we develop a scenario for the evolution of cPRC1 from ncPRC1. Our results demonstrate the independent evolution and function of PRC1 and PRC2 and show that crosstalk between these complexes is a secondary development in evolution.


2022; Déléris et al. 2021). In addition, a phylogenetic screening has shown that PRC2 is conserved across major eukaryotic groups and is likely present in the last eukaryotic common ancestor (LECA) (Sharaf et al. 2022). However, PRC1 has not been studied as extensively in non-animal lineages, and therefore, whether PRC1 and PRC2 co-evolved remains unresolved.

In the present study we used highly sensitive and comprehensive phylogenetic profiling to analyze the evolutionary history of PRC1 and PRC2 across the available scope of genomes covering the eukaryotic tree of life. Our analysis suggests that both PRC1 and PRC2 were present in LECA. Importantly, we found that while their intra-complex subunits evolved cohesively, PRC1 and PRC2 evolved independently of each other. Furthermore, we discovered previously unreported ncPRC1-defining subunits in the relatives of animals and fungi, indicating that the differentiation of ncPRC1 occurred before that of cPRC1. These findings provide a foundation for future research on the biological functions of these proteins in a wider range of eukaryotes and in areas where PRC1 is understudied.

To study the evolution of PRC1 beyond animals and plants, we traced the presence of PRC1 core subunits, RING1 and PCGF, in a diverse collection of 178 eukaryotes that covers all known major eukaryotic groups (SI Data and Methods and External Dataset S1). Both RING1 and PCGF consist of an N-terminal zf-RING domain and a C-terminal RAWUL domain (Fig.1B)

This approach enabled us to identify 69 and 67 species with high-confidence orthologs for RING1 and PCGF, respectively. We considered an ortholog to be of high confidence if it clustered monophyletically with the other sequences and contained both the zf-RING and RAWUL domains, as predicted by our sequence models and AlphaFold2 (Fig.2, SI Fig.S1A,B, Fig.S2). Additionally, we detected 15 species with putative RING1 orthologs and ten species with putative PCGF orthologs that contain a zf-RING domain but lack a recognizable RAWUL domain (SI Fig.S1A,B, Fig.S2). The phylogenetic trees of RING1 and PCGF each contained multiple groups of sequences from species representing all major eukaryotic groups, and the deep duplication nodes separating these groups displayed large species overlap, which is a defining feature of orthologous groups in a gene phylogeny. The subsequent speciation node can thus be identified with high reliability as a LECA presence (Fig.2, 3A, S1A,B).
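The species-overlap criterion mentioned above can be illustrated with a minimal sketch. It is not the procedure used in this study: the tree encoding (nested tuples with `Species|gene` leaf labels), the Jaccard definition of overlap, the zero threshold, and the leaf names are all assumptions made for illustration.

```python
# Minimal sketch of species-overlap classification of gene-tree nodes.
# Assumptions (not from the study): trees are nested 2-tuples, leaves are
# strings of the form "Species|gene", and overlap is the Jaccard index of
# the species sets found under the two children of a node.

def leaf_species(node):
    """Return the set of species found in the leaves below `node`."""
    if isinstance(node, str):
        return {node.split("|", 1)[0]}
    species = set()
    for child in node:
        species |= leaf_species(child)
    return species

def species_overlap(left, right):
    """Jaccard overlap between the species sets of two sister clades."""
    a, b = leaf_species(left), leaf_species(right)
    return len(a & b) / len(a | b)

def classify_node(node, threshold=0.0):
    """Label a bifurcating node as 'duplication' if its children share species."""
    left, right = node
    if species_overlap(left, right) > threshold:
        return "duplication"
    return "speciation"

# Hypothetical toy tree: two paralogous groups, each spanning the same species.
toy_tree = (
    (("Hsap|geneA", "Mmus|geneA"), "Atha|geneA"),
    (("Hsap|geneB", "Mmus|geneB"), "Atha|geneB"),
)

root_overlap = species_overlap(toy_tree[0], toy_tree[1])
root_label = classify_node(toy_tree)
inner_label = classify_node(toy_tree[0])
```

In this toy tree the root separates two groups that contain the same three species, so it is labeled a duplication, whereas the node directly below it separates disjoint species sets and is labeled a speciation.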

The increased sensitivity of our approach is illustrated by the detection of a previously unknown RING1 ortholog in the diatom (Stramenopile) Phaeodactylum tricornutum. In addition, we were able to detect a PCGF ortholog in Arabidopsis thaliana that was missed in a recent study (Grau-Bové et

Intra-complex evolution of PRC1 and PRC2 is cohesive while inter-complex evolution is
The evolution of RING1 and PCGF is largely coupled, i.e. these core PRC1 subunits were mostly lost or retained together during eukaryotic evolution (Fig.4A). Likewise, the genes encoding the core of PRC2 were typically lost or retained together. This suggests that both PRC1 and PRC2 function and evolve as cohesive units. To obtain a quantitative measure of their cohesiveness throughout eukaryotic evolution, we calculated the average Pearson's correlation coefficient between the phylogenetic profiles of the subunits of PRC1 and PRC2 (Fig.4B). We found that the subunits of the PRC2 core (EZH, EED, and SUZ12) had a high average correlation (r = 0.73), as did the core subunits of PRC1 (RING1 and PCGF, r = 0.7) (SI Fig.S4). This confirms that these subunits tend to evolve together within their resident complexes. As expected given its participation
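As a minimal sketch of the kind of calculation described here, the following computes the average pairwise Pearson correlation between binary presence/absence profiles. The profiles are made-up toy vectors rather than data from this study, and the function names are hypothetical.

```python
# Sketch: average pairwise Pearson correlation between phylogenetic profiles.
# Each profile is a presence (1) / absence (0) vector over the same ordered
# list of species. The toy profiles below are invented for illustration.
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant profile
    return cov / sqrt(var_x * var_y)

def mean_pairwise_correlation(profiles, subunits):
    """Average Pearson r over all pairs of the named subunits."""
    pairs = list(combinations(subunits, 2))
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

# Hypothetical profiles over six species (not real data).
toy_profiles = {
    "RING1": [1, 1, 0, 1, 0, 0],
    "PCGF":  [1, 1, 0, 1, 0, 0],
    "EZH":   [0, 0, 1, 0, 1, 1],
}

within_prc1 = mean_pairwise_correlation(toy_profiles, ["RING1", "PCGF"])
between = pearson(toy_profiles["RING1"], toy_profiles["EZH"])
```

In this toy input RING1 and PCGF are always lost or retained together and EZH shows the opposite pattern, which gives the two extreme values of the coefficient; real profiles would fall in between.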

PRC2 might turn out to be even higher than presented in this study. The presence of either only PRC1 or only PRC2 was particularly noticeable in Amoebozoans, fungi, Cryptista, Haptista, Rhodophyta, Glaucophyta, Chloroplastida, and SAR, but we were unable to determine a consistent trend of either complex being more likely to be lost or retained (Fig.4A). In conclusion, while PRC1

CBX harbors a chromodomain at its N-terminus and a CBX7_C domain at its C-terminus, while RYBP contains a zf-RanBP domain at its N-terminus and a YAF2 domain at its C-terminus (Fig.5A).

in the nucleariid Parvularia atlantis, which is a close relative of fungi (Fig.6A). In contrast, we were not able to detect orthologs of CBX outside of animals (SI External Dataset S4). Thus, it appears that RYBP originated before CBX. Next, we performed a phylogenetic analysis of SAM domains, which are present in the PRC1-associated proteins PHC, SCML, SCMH, L3MBTL, SFMBT and MBTD (SI Data and Methods and External Dataset S4). The SAM domains clustered together in a monophyletic group, reflecting a shared ancestry (SI Fig.S1F), but could be separated further into two distinct groups: one that consisted mostly of proteins containing an MBT domain, and one without such a domain (Fig.6B). We therefore next performed a phylogenetic analysis of MBT domains with the aim of finding orthologs outside of animals that would imply an earlier origin for these

In summary, our collective results suggest that the ancestor of animals and fungi already harbored RYBP and a L3MBTL-associated ortholog (Fig.7). We propose a scenario in which RYBP was lost in fungi and Filasterea, but retained in Nuclearidae, Choanoflagellata and animals. Furthermore, we hypothesize that CBX arose via a gene duplication of RYBP at the root of animals, after which the zf-RanBP domain was replaced by a chromodomain (Fig.7). Similarly, our data suggest

imply that PRC1 and PRC2 were both present in LECA, but that their subsequent evolution is uncoupled. Sensitive analyses allowed us to identify highly diverged orthologs, which previously

Ever since the discovery of PRC1 and PRC2, the relationship between them has been debated. The binding of CBX subunits of cPRC1 to H3K27me3, and the recognition of H2AK119ub1 by

To study the evolutionary history of the PRC1 and PRC2 subunits, a general homolog detection and phylogenetic inference method was initially applied and iterated upon if needed.

These custom profiles were used to annotate sequences included in the respective phylogenetic trees, which enabled the detection of protein domains that were previously reported to be absent, and therefore supported our orthology inference. For some analyses, we also performed HMM searches with full-length profiles. The specific analyses for these subunits, and supporting data,

Scripts to remove splice variants from the proteome database are available upon reasonable request to the corresponding author.
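For readers without access to those scripts, one common way to collapse splice variants is to keep the longest protein isoform per gene. The sketch below illustrates that general idea only and should not be read as the scripts referred to above. It assumes, hypothetically, that each FASTA header carries a `gene=<id>` token; real proteome headers differ between databases and would need their own parsing rule.

```python
# Sketch: keep the longest isoform per gene from a protein FASTA.
# Assumption (hypothetical): each header contains a "gene=<id>" token.
# If no such token is found, the first header word is used as the gene id.

def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def gene_id(header):
    """Extract the gene identifier from a header (assumed 'gene=' token)."""
    tokens = header.split()
    for token in tokens:
        if token.startswith("gene="):
            return token[len("gene="):]
    return tokens[0] if tokens else header

def longest_isoform_per_gene(records):
    """Return one (header, sequence) per gene, choosing the longest sequence."""
    best = {}
    for header, seq in records:
        gid = gene_id(header)
        if gid not in best or len(seq) > len(best[gid][1]):
            best[gid] = (header, seq)
    return list(best.values())

# Hypothetical example input: gene g1 has two isoforms, gene g2 has one.
EXAMPLE_FASTA = """>p1.1 gene=g1
MKTAY
>p1.2 gene=g1
MKTAYIAK
QR
>p2.1 gene=g2
MSL
"""

kept = longest_isoform_per_gene(parse_fasta(EXAMPLE_FASTA))
kept_ids = [header.split()[0] for header, _ in kept]
```

Ties between isoforms of equal length are resolved in favor of the first one encountered, which is an arbitrary choice in this sketch.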