Evolutionary Analysis of Transcriptional Regulation Mediated by Cdx2 in Rodents

Differences in gene expression, which can arise from divergence in cis-regulatory elements or alterations in transcription factor binding specificity, are one of the most important causes of phenotypic diversity during evolution. By protein sequence analysis, we observed high sequence conservation in the DNA binding domain (DBD) of the transcription factor Cdx2 across many vertebrates, whereas three amino acid changes were found exclusively in mouse Cdx2 (mCdx2), suggesting potential positive selection in the mouse lineage. Multi-omics analyses were then carried out to investigate the effects of these changes. Surprisingly, there were no significant functional differences between mCdx2 and its rat homologue (rCdx2), and none of the three amino acid changes had any impact on its function. Finally, we used rat-mouse allodiploid embryonic stem cells (RMES) to study the cis effects of Cdx2-mediated gene regulation between the two rodents. Interestingly, whereas Cdx2 binding is largely divergent between mouse and rat, the transcriptional effect induced by Cdx2 is conserved to a much larger extent.

Author summary

Our study 1) represents a first systematic analysis of species-specific adaptation in the DNA binding pattern of a transcription factor; although the mouse-specific amino acid changes did not manifest a functional impact in our system, several explanations may account for this (see the Discussion for details); 2) represents a first study of cis-regulation between two reproductively isolated species using a novel allodiploid system; 3) demonstrates a higher conservation of transcriptional output than of DNA binding, suggesting the evolvability/plasticity of the latter; and 4) provides a rich data resource for Cdx2-mediated regulation, including gene expression, chromatin accessibility and DNA binding data.


Introduction
Gene expression refers to the spatiotemporal conversion of information from DNA to functional gene products such as proteins. Knowing how gene expression is regulated is critical for the understanding of development as well as evolution [1]. Indeed, differences in gene expression are considered to be among the most important causes of phenotypic diversity across species [2]. Multiple layers are involved in the regulation of gene expression, of which transcriptional regulation is considered to be a crucial contributor to phenotypic alterations during evolution.

In this study, to find a candidate TF that might have undergone adaptive evolution in either the mouse or the rat lineage, we compared the amino acid sequences of TF DBDs among mouse, rat and human, and searched for TFs with a highly conserved DBD between human and one of the rodent species but multiple amino acid changes in the other rodent species. It turned out that the DBD of caudal-type homeobox 2 (Cdx2) contains three amino acid changes exclusively in mouse. The finding was further substantiated by including 56 species as well as 37 mouse strains in the sequence comparison. Given the established function of Cdx2 in lineage specification and trophectoderm differentiation [30][31][32][33], we investigated the potential effect of the three mouse-specific amino acid changes in mouse embryonic stem cell (mESC) systems. Unexpectedly, we did not observe any significant effects of the three changes on either DNA binding specificity or target gene expression. Then, to study the cis-regulatory changes in Cdx2-mediated transcriptional regulation between rat and mouse, we analyzed the allele-specific binding of Cdx2 as well as the allele-specific transcriptional output induced by Cdx2 in rat-mouse allodiploid embryonic stem cells (RMES) [34]. Interestingly, whereas Cdx2 binding is largely divergent between mouse and rat, the transcriptional effect induced by Cdx2 is conserved to a much larger extent.
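The screening criterion described above can be illustrated with a minimal sketch. It assumes the three DBD sequences have already been aligned to equal length. The function name `lineage_specific_changes` and the toy sequences are hypothetical; they are not the actual Cdx2 sequences, and this is not necessarily the procedure used in this study.

```python
def lineage_specific_changes(mouse, rat, human):
    """Find aligned positions where exactly one rodent differs from human.

    Assumes pre-aligned, equal-length amino acid strings; columns that
    contain a gap ('-') in any species are skipped.
    """
    if not (len(mouse) == len(rat) == len(human)):
        raise ValueError("sequences must be aligned to equal length")
    mouse_only, rat_only = [], []
    for pos, (m, r, h) in enumerate(zip(mouse, rat, human)):
        if "-" in (m, r, h):
            continue
        if r == h and m != h:
            # rat matches human, mouse differs: mouse-specific change
            mouse_only.append(pos)
        elif m == h and r != h:
            # mouse matches human, rat differs: rat-specific change
            rat_only.append(pos)
    return {"mouse": mouse_only, "rat": rat_only}


# Toy sequences (hypothetical, not real Cdx2 residues).
human_dbd = "RKDKYRVVYT"
rat_dbd = "RKDKYRVVYT"
mouse_dbd = "RKEKYRAVYS"
print(lineage_specific_changes(mouse_dbd, rat_dbd, human_dbd))
# → {'mouse': [2, 6, 9], 'rat': []}
```

A TF would be flagged as a candidate under this sketch when one list is empty and the other contains multiple positions, as in the toy example.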

Exclusive amino acid changes in the DNA binding domain of mCdx2

[...] [30,31,33]. Therefore, to investigate the function of Cdx2 during this process, we applied a doxycycline (DOX) inducible Tet-On system to induce Flag-tagged Cdx2 expression in mESCs and analyzed its function by measuring transcriptome and epigenome changes (Fig 2A). To check the suitability of this [...] with flat or square shape started to appear.

We then compared the gene expression of the mESCs before and after DOX induction using RNA-seq (Fig 2A). As shown in Fig 2D, [...] enriched for house-keeping chromatin regulators, such as Ctcf (Fig 2F).

Last, to characterize mCdx2 binding sites on a genome-wide scale, we performed

Importantly, mCdx2 binding peaks were differentially distributed among the three categories of ATAC peaks, with predominant binding at the UP group (Fig 2F). Taken together, the results [...] binding motif for mCdx2 and rCdx2, the same as previously reported for mCdx2 (Fig 3E).

Transcriptional regulation of gene expression is mediated by both cis and trans components. The results shown above indicate that the amino acid differences in the Cdx2 DBD did not lead to divergent trans-regulatory effects between rat and mouse. We then turned to the cis-regulatory part of Cdx2-mediated regulation. One common strategy to study cis-effects is to compare the gene regulation between the two alleles in F1 hybrids [3,5,12]. In F1 hybrids, both parental alleles are subject to the same trans-regulatory environment; thus, observed differences in allele-specific regulatory patterns should reflect only the impact of cis-regulatory divergence. However, this approach cannot be applied to mammalian species separated by a long evolutionary distance, such as mouse and rat, due to reproductive isolation. To [...] ChIP-seq datasets from mCdx2-OE and rCdx2-OE RMESCs as experimental replicates.
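The reasoning behind the hybrid design can be sketched numerically. Because both alleles share one trans environment, the log2 ratio of reads assigned to each allele gives a rough estimate of cis divergence at a gene or peak. The function name `allelic_log2_ratio` and the pseudocount of 1 are assumptions made for illustration; the sketch omits replicates and statistical testing and is not the analysis pipeline of this study.

```python
import math


def allelic_log2_ratio(mouse_count, rat_count, pseudocount=1.0):
    """log2(mouse / rat) of allele-assigned read counts in one hybrid cell line.

    A value near 0 suggests conserved cis-regulation; a large positive or
    negative value suggests mouse- or rat-biased cis activity. The
    pseudocount (an assumed value) avoids division by zero.
    """
    if mouse_count < 0 or rat_count < 0:
        raise ValueError("read counts must be non-negative")
    return math.log2((mouse_count + pseudocount) / (rat_count + pseudocount))


# Hypothetical counts for one feature: 7 mouse-allele reads, 1 rat-allele read.
print(allelic_log2_ratio(7, 1))
# → 2.0
```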

Then we compared the Cdx2 binding sites between the mouse and rat genomes. To check how these binding sites evolved, we classified the binding sites determined by [...] were those that could not be aligned to the other species. As shown in Fig 5B, [...] rat to mouse), respectively. The distributions of the three groups were similar between the binding sites at proximal and those at distal regions (S1 and S2 Tables).

Then we compared the signal intensities among these three peak classes. As expected, the binding affinities were the highest for conserved peaks and there were no significant [...] regulatory pattern (Fig 6B). Based on the GO analysis, we found that type 4 genes were highly enriched in functions related to "organism development and cell differentiation", consistent with the known function of Cdx2 in early development (Fig 6C). In addition, the magnitude of gene expression changes was highest for type 4 genes, further suggesting their important functions (Fig 6D).

Finally, we analyzed the relationship between differential binding and gene expression. For this purpose, we checked the distribution of peaks located in different groups of genes. As

These data together suggested that the regulatory function of Cdx2 was evolutionarily conserved and was unaffected by any of the derived mouse-specific amino acid changes, which contradicted our initial expectation. It is of course possible that these mouse-specific

We then analyzed cis-regulatory divergence in Cdx2-mediated regulation. As to transcriptional effects, Cdx2-mediated regulation appeared to be largely conserved between

Cell culture

The mESC and RMES cells were cultured as previously reported [34].

The libraries were sequenced in a 2 × 150 nt manner on the HiSeq Xten platform (Illumina

Reads were aligned to the mouse reference genome (mm10) using Bowtie2 [...] peaks were referred to as common peaks. HOMER [52] was used to perform motif enrichment analysis on the ATAC peaks. Deeptools [53] was used to plot the heatmap of peak signal

To systematically compare mCdx2 ChIP-seq and rCdx2 ChIP-seq peak signals, reads from both mCdx2 and rCdx2 ChIP-seq samples were merged as input for MACS2 to call

For RMES cells, reads were aligned to both the mm10 and rn6 reference genomes by Bowtie2 and assigned to either mm10 or rn6 according to the alignment edit distance to each reference. The subsequent analysis was the same as the single-species data analysis above.
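The assignment rule can be sketched as a comparison of per-read edit distances, which Bowtie2 reports in the `NM:i` SAM tag. The function name `assign_read`, the use of `None` for an unaligned read, and the handling of ties as "ambiguous" are assumptions for illustration; the text does not state how ties or one-sided alignments were treated.

```python
def assign_read(nm_mouse, nm_rat):
    """Assign a read to a species by its edit distance to each reference.

    nm_mouse / nm_rat: edit distance of the read's alignment to mm10 / rn6,
    or None if the read did not align to that reference.
    Ties are labeled 'ambiguous' (an assumed policy, not from the source).
    """
    if nm_mouse is None and nm_rat is None:
        return "unaligned"
    if nm_rat is None:
        return "mouse"
    if nm_mouse is None:
        return "rat"
    if nm_mouse < nm_rat:
        return "mouse"
    if nm_rat < nm_mouse:
        return "rat"
    return "ambiguous"


# Hypothetical reads: (edit distance to mm10, edit distance to rn6).
reads = {"read1": (0, 3), "read2": (4, 1), "read3": (2, 2), "read4": (None, 0)}
for name, (nm_mm10, nm_rn6) in reads.items():
    print(name, assign_read(nm_mm10, nm_rn6))
```

Reads labeled "ambiguous" carry no allele information under this policy and would be left out of allele-specific counts.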