Insulator proteins contribute to expression of gene loci repositioned into heterochromatin in the course of Drosophila evolution

Pericentric heterochromatin in Drosophila is generally composed of repetitive DNA forming a transcriptionally repressive environment. Nevertheless, dozens of genes were embedded into pericentric genome regions during evolution of Drosophilidae lineage and retained functional activity. However, factors that contribute to “immunity” of these gene loci to transcriptional silencing remain unknown. Here, we investigated molecular evolution of the essential Myb and Ranbp16 genes. These protein-coding genes reside in euchromatic loci of chromosome X in D. melanogaster and related species, while in other studied Drosophila species, including evolutionary distant ones, they are located in genomic regions highly enriched with the remnants of transposable elements (TEs), suggesting their heterochromatic nature and location. The promoter region of Myb exhibits a conserved structure throughout the Drosophila phylogeny and carries motifs for binding of chromatin remodeling factors, including insulator BEAF-32, regardless of eu- or heterochromatic surroundings. Importantly, BEAF-32 occupies not only the promoter region of Myb but is also found in the vicinity of transcriptional start sites (TSS) of Ranbp16 gene as well as in a wide range of genes located in the contrasting chromatin types in D. melanogaster and D. virilis, denoting the boundary of the nucleosome-free region available for RNA polymerase II recruitment and the surrounding heterochromatin. We also find that along with BEAF-32, insulators dCTCF and GAF are enriched at the TSS of heterochromatic genes in D. melanogaster. Thus, we propose that insulator proteins contribute to gene expression in the heterochromatic environment and, hence, facilitate the evolutionary repositioning of gene loci into heterochromatin. Author summary Heterochromatin in Drosophila is generally associated with transcriptional silencing. Nevertheless, hundreds of essential genes have been identified in the pericentric heterochromatin of Drosophila melanogaster. Interestingly, genes embedded in pericentric heterochromatin of D. melanogaster may occupy different genomic loci, euchromatic or heterochromatic, due to repositioning in the course of evolution of Drosophila species. By surveying factors that contribute to the normal functioning of the relocated genes in distant Drosophila species, i.e. D. melanogaster and D. virilis, we identify certain insulator proteins (e.g.BEAF-32) that facilitate the expression of heterochromatic genes in spite of the repressive environment.

Pericentric heterochromatin in Drosophila is generally composed of repetitive DNA forming a transcriptionally repressive environment. Nevertheless, dozens of genes were embedded into pericentric genome regions during evolution of Drosophilidae lineage and retained functional activity. However, factors that contribute to "immunity" of these gene loci to transcriptional silencing remain unknown. Here, we investigated molecular evolution of the essential Myb and Ranbp16 genes. These protein-coding genes reside in euchromatic loci of chromosome X in D. melanogaster and related species, while in other studied Drosophila species, including evolutionary distant ones, they are located in genomic regions highly enriched with the remnants of transposable elements (TEs), suggesting their heterochromatic nature and location. The promoter region of Myb exhibits a conserved structure throughout the Drosophila phylogeny and carries motifs for binding of chromatin remodeling factors, including insulator BEAF-32, regardless of eu-or heterochromatic surroundings. Importantly, BEAF-32 occupies not only the promoter region of Myb but is also found in the vicinity of transcriptional start sites (TSS) of Ranbp16 gene as well as in a wide range of genes located in the contrasting chromatin types in D. melanogaster and D. virilis, denoting the boundary of the nucleosome-free region available for RNA polymerase II recruitment and the surrounding heterochromatin. We also find that along with BEAF-32, insulators dCTCF and GAF are enriched at the TSS of heterochromatic genes in D. melanogaster. Thus, we propose that insulator proteins contribute to gene expression in the heterochromatic environment and, hence, facilitate the evolutionary repositioning of gene loci into heterochromatin.

Author summary
Heterochromatin in Drosophila is generally associated with transcriptional silencing. Nevertheless, hundreds of essential genes have been identified in the pericentric heterochromatin

Introduction
Eukaryotic genomes are packaged in chromatin consisting of DNA and associated proteins. Typically, chromatin can be divided into two basic forms, euchromatin and heterochromatin (1). These types of chromatin are distinguished by several distinctive properties, including DNA sequence composition, specific histone modifications and binding proteins, nuclear and chromosomal localization, and frequency of meiotic recombination (1,2). One of the major subtypes of heterochromatin in Drosophila is marked by heterochromatin protein 1 (HP1a) and di-or trimethylated H3K9 (3,4). This subtype of heterochromatin covers large genomic segments primarily around centromeres and, in association with the protein POF (painting of fourth), the entire dot chromosome 4 in D. melanogaster (3)(4)(5). Pericentric heterochromatin is mainly composed of repetitive sequences, including remnants of various transposable elements (TEs), satellite DNAs and other repeats (6). A distinctive feature of heterochromatin is the ability to silence euchromatic genes placed within heterochromatic environment due to chromosomal inversions or transposition events, a phenomenon called position effect variegation (PEV) (7-12). Transcriptional silencing of euchromatic genes in PEV is mediated by spreading of heterochromatin-associated marks HP1a and H3K9me3 across the gene loci transferred to heterochromatin (8,10).
Despite the repressive environment, dozens of essential genes were identified in the pericentric heterochromatin of D. melanogaster (13)(14)(15)(16). Interestingly, genes embedded in pericentric heterochromatin in D. melanogaster may occupy distinct genomic loci, euchromatic and heterochromatic, in other Drosophila species (17). For instance, two adjacent genes RpL15 and Dbp80 located in the pericentric region of chromosome 3L in D. melanogaster reside in a euchromatic region in D. pseudoobscura (18). A similar pattern was demonstrated for genes light and Yeti located in pericentric regions in D. melanogaster, while in D. virilis they are found within euchromatin on the same chromosomal elements (19,20). Recently, it was shown that most of the pericentric genes found at both arms of chromosome 2 of D. melanogaster are located in euchromatic loci in the D. virilis genome (21). However, although repositioning of genes between euchromatin and heterochromatin during genome evolution is not unusual in the Drosophilidae lineage, the "immunity" of heterochromatic genes to the transcriptionally repressive environment remains paradoxical and unexplained. Thus, it is not clear whether these loci have undergone adaptation to heterochromatic environment or had some intrinsic properties permitting local adaptation.
Previously, it was shown that molecular organization of promoter regions is largely conserved between heterochromatic and euchromatic genes, indicating that adaptation to heterochromatin probably does not require major changes in regulatory sequences (20). However, expression of heterochromatic genes requires the methylated H3K9 mark (22,23), and the ability to repositioning during evolution is a characteristic feature of gene clusters that show close association with HP1a protein (21).
Chromatin insulator elements and associated proteins were originally defined by their ability to protect transgenes from PEV, exerting a block to cis spreading of a chromatin state (24,25). To date, a set of insulators have been identified in Drosophila, including BEAF-32 (Boundary element associated factor of 32 kDa), dCTCF (Drosophila homolog of CTCF), Su(Hw) (Suppressor of hairy wing), Zw5 (Zeste-white-5), GAF (GAGA factor) and recently described Pita and ZIPIC (zinc-finger protein interacting with CP190) (26,27). Numerous studies demonstrated that insulators are responsible for a vast number of genomic functions, including stimulation of gene transcription, enhancer-blocking and barrier insulation partitioning of eukaryotic genomes into independently regulated domains (27)(28)(29). Hence, one may hypothesize that gene loci capable of adaptation to heterochromatin probably share specific sites of insulation that ensure their expression in the repressive environment.
To address this issue, we investigated molecular evolution of Myb and Ranbp16 genes in the Drosophilidae lineage. Myb is an essential gene encoding a transcription factor involved in transition from G2 to M phase of the cell cycle (30, 31). Ranbp16 encodes a RanGTP-binding protein belonging to the importin-β superfamily and mediates translocation of proteins into the nucleus. Both genes are located in euchromatic loci of the D. melanogaster X-chromosome, while in other studied Drosophila species belonging to Sophophora and Drosophila subgenus they are found in genomic regions with a high density of repetitive DNA elements upstream, downstream and within introns, suggesting their location in pericentric heterochromatin.
Regardless of the euchromatic or heterochromatic surroundings, the promoter region of Myb displays high sequence homology and stable structural organization among Drosophila species studied so far. We find that the conserved motifs in the promoter sequence of Myb serve as a binding site for the chromatin insulator protein BEAF-32 and transcriptional factor DREF (The DNA replication-related element (DRE)/DNA replication-related element-binding factor).
Furthermore, we demonstrate that BEAF-32 occupies not only the promoter region of Myb in two evolutionarily distant species, D. melanogaster and D. virilis, but also, in cooperation with other insulators dCTCF and GAF, is present in the promoters of most studied heterochromatin genes described in D. melanogaster. Importantly, the evolutionary gene repositioning between euchromatin and the pericentric heterochromatin occurred with preservation of sites of insulation, keeping the binding of BEAF-32 in close proximity to the transcription start sites of these genes. Overall, we propose that insulator proteins, in particular BEAF-32, contribute to

Evolutionary repositioning of Myb and Ranbp16 genes between euchromatin and heterochromatin in Drosophila phylogeny
In order to determine whether Myb and Ranbp16 gene locations have been rearranged on the evolutionary timescale, we first mapped these genes onto genomic scaffolds of Drosophila species separated by evolutionary distances from 5 to 40 million years (32)(33)(34)(35)(36). These include species of the melanogaster group (D. melanogaster and D. yakuba) and the obscura group (D.
persimilis and D. miranda), with both groups belonging to the Sophophora subgenus, along with the virilis group (D. virilis and D. novamexicana) and the repleta group (D. mojavensis and D. hydei) that belong to the Drosophila subgenus. Next, we performed comparative analysis of Myb and Ranbp16 genes, as well as the intergenic regions between these genes. As indicated in Fig. 1, the coding sequences of Myb (red nodes) and Ranbp16 (blue nodes) genes are highly homologous between the species studied ( Fig. 1). However, the regions between Myb and virilis, obscura and repleta groups (Fig. 1). According to the integrated physical and cytogenetic map of Schaeffer et al. (37), as well as overall homology of the studied regions, Myb and Ranbp16 genes are located at the chromosome X in all Drosophila species studied.
Next, we studied in more detail the genomic loci containing Myb and Ranbp16 genes, focusing on two evolutionarily distant species, D. melanogaster and D. virilis, separated by 40 million years of evolution (32). The single copies of Myb and Ranbp16 genes map to the chromosome X of D. melanogaster at the cytogenetic loci 13F14 and 14A1 of salivary gland polytene chromosomes, respectively ( Fig. 2A). These regions are significantly less enriched with heterochromatic marks H3K9me3 and HP1a than telomeric and pericentric regions of the chromosome and, hence, represent typical euchromatin. As mentioned earlier, Myb and Ranbp16 genes in D. melanogaster are located at a distance ~ 80 Kb from each other within a large protein-coding gene cluster, which includes only a few repetitive sequences ( Fig. 2A).
To confirm that Myb and Ranbp16 reside in a heterochromatic region in D. virilis, we performed DNA in situ hybridization on polytene chromosomes using the unique sequence of Myb gene as a probe. In situ hybridization indicates that Myb is located at the base of chromosome X near the chromocenter in D. virilis (Fig. 2B). Mapping analysis of Myb and Ranbp16 genes on the genomic scaffolds of D. virilis reveals that both genes reside in one scaffold_13050 at a distance ~ 120 Kb from each other (Fig. 2C). In contrast to D. melanogaster, the region between these genes in D. virilis does not contain any other protein-coding genes and consists of remnants of TEs and other repeats diverged to varying degrees (Fig. S1). We used annotation of known genomic scaffolds of D. virilis made by Schaeffer et al. (37) to assign the proximal scaffold of D. virilis genome r1.06 to the centromeric region of chromosome X.
According to specific marker genes, we retrieved scaffold_12970 and extended it with scaffold_13050 containing Myb and Ranbp16 genes, keeping some space unassembled between these scaffolds (Fig. 2C). Enrichment profile of H3K9me3 clearly indicates that the putative euchromatin-heterochromatin border lies in the proximal 1Mb of scaffold_12970 (Fig. 2C). The To study how the structure of Myb and Ranbp16 genes has been changed in the course of evolution, resulting in their placement in different chromatin contexts, we compared the structure of these genes in eight Drosophila species.
As a starting point, we retrieved the available sequences of Myb and Ranbp16 genes from Despite prolonged evolution, the structure of Myb gene was preserved virtually unchanged in terms of overall gene length and exon-intron organization (Fig. 3A). A single intron located in the protein-coding region of this gene is conserved both in position and approximate size in the species of both subgenera (Fig. 3A). However, sequence homology of this intron is conserved only within species belonging to one subgenus, Sophophora or Drosophila. This suggests that the intronic sequence per se probably does not affect gene regulation. Therefore, the heterochromatic Myb gene tends to be stable during evolution of the In contrast to Myb, the structure of Ranbp16 is more complex and demonstrates greater flexibility in terms of extension of total gene length by increasing the size of its introns (Fig. 3B). was shortened to 100 nt for all species. Using the MEME suite (38) we were able to identify three motifs that are present in the promoter of Myb gene in all analyzed species of Drosophila ( Fig. 4A). Search through OnTheFly (39) and REDfly v5.6 (40, 41) databases of known transcription factors and their binding sites indicates that the two highest-scoring motifs contain a potential binding site for insulator protein BEAF-32 and transcriptional factor DREF (Fig. 4B).

The length of
The third motif shows limited homology to the binding site of ecdysone receptor co-activator Taiman and another undescribed factor (Fig. 4B).   (Table S2).

Insulator protein BEAF-32 is enriched in promoters of
Using these sets of genes, we performed enrichment analysis of BEAF-32, Pol II, H3K9me3 and ATAC-seq reads upstream and downstream of TSS of all these genes (Fig. 6). As indicated in Fig. 6A, the insulator protein BEAF-32 is enriched in the proximity of TSS or within 10 Kb upstream of TSS of both heterochromatic (Group 1) and euchromatic genes (Group 2) of D. melanogaster (Fig. 6A). The binding area of BEAF-32 is strongly correlated with the enrichment profile of Pol II and ATAC-seq reads (Fig. 6A). Importantly, analysis of D. virilis genes revealed a predominantly similar enrichment profile of BEAF-32 in both clusters of genes partitioned with regard to their heterochromatic or euchromatic location in the genome (Fig. 6B).  (Fig. S5A). Finally, Zw5 binding sites are completely absent from the studied region (Fig. S5A).
Next, we analyzed the enrichment profile of insulators upstream, downstream and within the bodies of heterochromatic genes located in the pericentric region of D. melanogaster chromosome 2L (Fig. S5B). We observed that BEAF-32 is not the only insulator found in the vicinity of TSS in heterochromatic genes. Although to a lesser extent, the insulators dCTCF and GAF are also enriched at the TSS of these genes (Fig. S5B). Interestingly, a few binding sites for Pita are present at the boundaries of gene loci as indicated by binding near the transcriptional start and transcriptional end sites (TES). Despite the binding of ZIPIC and Su(HW) to the pericentric DNA, the sites of their binding are not localized at the TSS and TES of the genes, suggesting that these insulators may be involved in long-range interactions of chromosomes, and hardly have any direct impact on gene expression (Fig. S5B). Overall, these data suggest that the insulator proteins are involved in the maintenance of pericentric heterochromatin conformation and normal functioning of structural genes located in pericentromeric heterochromatin.

Discussion
During speciation, the genomes of Drosophila species underwent multiple chromosome rearrangements that disrupt gene order, modifying or preserving gene function (17,42). In this study, we show that in the course of evolution of Drosophila species the essential Myb and According to studies of PEV, genes that are juxtaposed with heterochromatin by chromosomal rearrangements or transposition events exhibit a variegating phenotype resulting in silencing of genes due to heterochromatin environment (8). Given the peculiarities of heterochromatic genes, such as accumulation of TEs within their introns, one may suggest that heterochromatic genes have evolved to utilize the advantage of the heterochromatic environment and became dependent on heterochromatin specific proteins, such as HP1a (4,(57)(58)(59)(60). However, as was demonstrated previously, the regulatory regions including gene promoters apparently did not undergo drastic changes during their evolution in the heterochromatic environment (20).
Moreover, evolutionary repositioning preferentially occurred only with gene clusters exhibiting close association with HP1a, suggesting that HP1a binding to these genes was present in the progenitor (21). Taken together, these observations allow one to suggest that certain gene loci are favored for repositioning from euchromatin to heterochromatin in the course of evolution. If this is the case, gene loci relocated to heterochromatin probably should retain the transcriptionally active euchromatin-like structure of chromatin capable of efficient transcription in the heterochromatin. Indeed, the proximal regulatory regions of heterochromatic genes are not occupied by the heterochromatic mark H3K9me3, forming a nucleosome-free binding platform for transcriptional factors and RNA polymerase II (Fig. 5 and 6).
How might these observations be explained? What determines the "immunity" of regulatory regions of heterochromatic genes to the repressive surroundings? In this study, we have examined the promoter of Myb gene in thirty Drosophila species and revealed common motifs between these sequences that serve as binding sites for the insulator protein BEAF-32 and transcriptional factor DREF. Originally, insulators were defined as regulators of interaction between enhancers and promoters able to block heterochromatin spreading and PEV (24,25).
DREF is known to mediate activation of transcription of genes involved in DNA replication and cell proliferation, including Myb gene (61,62). Furthermore, cis-acting elements that exercise the transcriptional control of genes by DREF are conserved between such evolutionarily distant species as D. melanogaster and D. virilis (63). The DNA recognition motif for DREF (CGATA) is similar to that of BEAF-32 (TATCGATA), and the binding of DREF to DNA has been shown to antagonize the binding of BEAF-32 in vitro (64). Recent studies of the relationship between BEAF-32 and DREF showed that DREF co-localizes at the same genomic sites as BEAF-32 and other insulator proteins and is enriched at the boundaries of topologically associated domains (TAD) (65). Together, these data suggest that DREF function is mediated by and probably depends on insulators on an evolutionary timescale.
A growing body of evidence suggests that insulators exercise diverse roles, including barrier function, and mediate short and long-distance chromosomal contacts at the genome-wide scale (66)(67)(68)(69)(70). Our enrichment analysis of ChIP-seq data indicated that the insulator protein BEAF-32 is enriched upstream of the TSS of heterochromatic genes in D. melanogaster and D.
virilis, demarcating the euchromatin-heterochromatin border between the promoter and the surrounding heterochromatin (Fig. 6). Furthermore, using the same set of genes that reside in different types of chromatin in D. melanogaster and D. virilis, we show that BEAF-32 binding is preserved in the proximity of the promoter regions of these genes during evolution of different Drosophila species, suggesting that BEAF-32 binding is an ancestral property of these genes.
Along with BEAF-32, insulators dCTCF and GAF are also enriched at the TSS of heterochromatic genes, while the binding peaks for insulators Pita, ZIPIC and Su(Hw) are distributed across the pericentric region to a much lesser extent. The insulators dCTCF and GAF have many overlapping binding sites with BEAF-32 and DREF in the Drosophila genome (65).
Furthermore, sites containing DREF without BEAF-32 are instead enriched with the Su(Hw) recognition sequences (65). Pita also belongs to a class of insulator proteins that preferentially bind to promoters near the TSS (71). ZIPIC is described as facilitator of long-distance chromosomal interactions (26,71). demonstrated that most BEAF-associated genes are transcriptionally active or highly expressed and are associated with negative elongation factor NELF that stimulates transcription levels by inhibiting promoter-proximal nucleosome assembly (67, 75). This provides evidence that BEAF-32 facilitates high levels of gene expression.
Previously, it was shown that heterochromatic genes take advantage of repetitive surrounding and heterochromatin factors, such as HP1a, to facilitate their expression and probably long-distance interactions between enhancers and promoters (11,22,23,76). On the other hand, an analysis conducted by Yasuhara et al (20)   Ranbp16 (also known as Xpo7) of mouse and human were extracted from UniProt (https://www.uniprot.org/). Protein motifs were scanned using the PROSITE database and methodology (80,81).
Estimation of repeat content in intergenic regions and within studied genes was performed using RepeatMasker (82) and computationally predicted libraries of TEs generated  and low scaffold contiguity around these genes for D. persimilis and D. hydei.

ChIP-seq, RNA-seq and ATAC-seq analyses
Raw data of genome binding/occupancy (ChIP-seq), transcriptome (RNA-seq) and nucleosome (ATAC-seq) profiling were obtained from GEO database and used in the analyses. melanogaster.
For analysis of sequence data, we used genome sequence and annotations released in FlyBase, D. melanogaster r.6.19 and D. virilis r1.06. Prior to mapping, all libraries were subjected to adapter clipping, filtering by length (>20 nt) and quality (80% of nt must have at least 20 Phred quality) using TrimGalore (https://github.com/FelixKrueger/TrimGalore). Then, sequences were aligned to corresponding genomes using Bowtie (91) with the following settings: "-v 0 --best --strata", retaining only aligned reads with zero mismatches. Output sequence alignment map (SAM) files were converted to binary (BAM) format using SAMtools (92 For ATAC-seq analysis, reads that mapped on mitochondrial genomes were discarded, and peak calling was performed using Genrich (https://github.com/jsh58/Genrich) with the following settings: "-j -y -r -d 50", including removal of PCR duplicates.

Promoter analysis
Because sequence extraction, promoter regions of Myb and Ranbp16 were searched for common motifs using MEME (96) and identification of matches to known transcription factors was performed by Tomtom (97) using OnTheFly (39) and REDfly v5.6 (40, 41) databases implemented in MEME Suite 5.0.5 (38).

Sequence evolution and testing for selection
Analysis of nucleotide substitutions per site was conducted in MEGA X (98) using the Tamura-Nei model (99). Rate variation among sites was modeled with a gamma distribution (shape parameter = 1). All positions containing gaps and missing data were eliminated (complete deletion option).
Ratio of nonsynonymous and synonymous substitutions (dN/dS) was estimated using PAL2NAL software (100) by converting multiple sequence alignment of proteins and the corresponding nucleotide sequences into a codon alignment, and the calculation of synonymous (dS) and non-synonymous (dN) substitution rates using codeml program implemented in PAML package (101).

Cytology and DNA in situ hybridization
D. virilis larvae were grown at 18 0 C on standard medium supplemented with live yeast solution for 2 days before dissection. Salivary glands from 3 rd instar larvae were dissected in 45% acetic acid and squashed. DNA probes corresponding to D. virilis Myb (Dvir\GJ18431; FlyBase ID: FBgn0205590) were prepared by PCR using gene-specific primers (Forward_ GCAAGTGCGAGCACTGAAAA; Reverse_TGCATACTGAGGTGTGCCAG). Then, DNA probe was biotinylated by nick translation using Biotin-14-dATP (Thermo Fisher Scientific, USA) as described in (102). Localization of the probe was made using the cytological map of D.