Abstract
The RNA binding protein Dazl is essential for gametogenesis, but its direct in vivo functions, RNA targets, and the molecular basis for germ cell loss in DAZL KO mice are unknown. Here, we generated transcriptome-wide maps of Dazl-RNA interactions in adult and juvenile mouse testes. In parallel, we used transgenic mice and fluorescence activated cell sorting to isolate DAZL knockout germ cells and identify mRNAs sensitive to Dazl deletion. Integrative analyses reveal that Dazl functions as a master regulator of germ cell survival by post-transcriptionally enhancing a vast network of genes necessary for cell cycle regulation and spermatogenesis. Strikingly, Dazl displays a strong positional bias for binding near polyA tails and multimerizes on a subset of its targets. These results reveal a mechanism for Dazl recruitment to its RNA targets and delineate a Dazl-dependent mRNA regulatory program essential for postnatal mammalian germ cell survival.
Introduction
RNA binding proteins (RBPs) are potent post-transcriptional gene regulators. In the nucleus, RBPs can alter pre-mRNA processing to generate mRNAs with different coding and non-coding sequences. In the cytoplasm, RBPs influence mRNA localization, translation, and stability, typically through interactions with 3’ untranslated regions (3’UTRs). Importantly, the combined regulatory impacts of RBPs on RNA stability and translation can greatly influence protein expression. Despite widespread recognition of the importance of RBPs in development, immunity, and disease (Scotti and Swanson, 2016), our understanding of how specific RBPs modulate transcriptional programs to control cell fate is extremely limited.
Regulation of mRNA processing and translation is especially relevant during spermatogenesis, the highly ordered process of postnatal male germ cell development that yields haploid spermatozoa (Licatalosi, 2016). During embryogenesis, primordial germ cells (PGCs) commit to a male or female developmental program (McLaren, 2001). Male germ cells proliferate briefly then arrest at G1/G0 as gonocytes that remain quiescent for the remainder of embryogenesis. The first few postnatal days are critical for the establishment of spermatogonial stem cells (Manku and Culty, 2015). During postnatal day one, mitotically-arrested spermatogonial precursors located in the center of testis cords re-enter the cell cycle and begin proliferating. A few days later, these cells extend projections towards the basement membrane and, after making contact, migrate to the basement membrane. After relocation, the germ cells differentiate, acquire “stem cell” properties, and rapidly expand in number. Continued sperm production throughout life depends on proper control of spermatogonial proliferation and differentiation (Manku and Culty, 2015). Many genes critical for this regulation have been identified, including genes encoding cell cycle checkpoint factors, regulators of the DNA damage response, transcription factors, and RBPs that affect mRNA export and stability (Costoya et al., 2004; Raverot et al., 2005; Hao et al., 2008; Takubo et al., 2008; Pan et al., 2009; Saga, 2010; Song and Wilkinson, 2014). However, the mechanisms that coordinately regulate expression of genes essential for germ cell maintenance are not well understood.
The functional significance of RBPs in germ cell development is well illustrated by the DAZ family of RBPs. The DAZ (deleted in azoospermia) proteins comprise a family of germ cell-restricted RBPs (Daz, Dazl, and Boule) that are necessary for gametogenesis in worms, flies, mice, and humans (VanGompel and Xu, 2011). Their significance was first demonstrated in the 1990s, when DAZ was discovered in a region of the Y chromosome deleted in 10-15% of men with azoospermia (no sperm) (Reijo et al., 1995; Reijo et al., 1996). Deletion of DAZL in mice leads to a dramatic decrease in the number of surviving germ cells, and complete arrest as cells enter meiosis. Remarkably, transgenic expression of human DAZL or DAZ partially rescues the extensive germ cell loss in DAZL KO mice (Vogel et al., 2002), indicating functional conservation of DAZ RBPs across species. Despite the clear biological importance of DAZ proteins, many critical questions remain, including the identities of their direct RNA targets, how these RNAs are regulated, and why loss of this regulation results in germ cell defects. In this study, we provide answers to these long-standing questions.
Dazl’s cytoplasmic localization, co-sedimentation with polyribosomes, and association with polyA+ RNA in germ cells (Ruggiu et al., 1997; Tsui et al., 2000) suggest that it may regulate mRNA stability or translation. In addition, yeast two hybrid analysis of Dazl-interactors identified RBPs with cytoplasmic roles in mRNA regulation including Pum2, QK3, and the polyA-binding protein Pabpc1 (Moore et al., 2003; Fox et al., 2005; Collier et al., 2005). However, the scarcity and variable number of germ cells present in DAZL KO mice (Ruggiu et al., 1997; Schrans-Stassen et al., 2001; Saunders et al., 2003) present major challenges to investigating Dazl’s direct in vivo function(s) in the male germline. Consequently, previous Dazl studies have mostly used reconstituted systems including transfection of somatic cell lines (Maegawa et al., 2002; Xu et al., 2013), artificial tethering to in vitro synthesized RNAs injected into oocytes or zebrafish (Collier et al., 2005; Takeda et al., 2009), and in vitro-derived germ cells (Haston et al., 2009; Chen et al., 2014). These efforts have suggested diverse functions for Dazl in different cell contexts, including roles in mRNA stabilization, stress granule assembly, mRNA localization, and translation (Fu et al., 2015). However, transfection assays have demonstrated that Dazl can have opposite effects on the same reporter RNA in different somatic cell lines (Xu et al., 2013). It is not clear whether these discrepancies are due to cell context and/or non-physiological levels or recruitment of Dazl to RNA targets.
While the direct in vivo functions are not defined, in vitro binding assays and X-ray crystallography of the Dazl RRM identified GUU as a high affinity binding site (Jenkins et al., 2011). However, the frequency of GUU across the transcriptome hampers bioinformatic predictions of functional Dazl-binding sites and RNA targets in vivo. Microarray analyses of RNAs that co-immunoprecipitate (IP) with Dazl (Reynolds et al., 2005; Reynolds et al., 2007; Chen et al., 2014) have suggested potential targets. Yet, few have been examined in DAZL KO mice, and these cannot account for the dramatic loss of germ cells seen in DAZL null mice. Furthermore, different investigators have arrived at alternate conclusions about Dazl’s role as a translation repressor or activator based on immunofluorescence (IF) assays of wild type (WT) and DAZL KO germ cells (Reynolds et al., 2005; Chen et al., 2014). Moreover, neither group explored whether the observed differences in protein abundance are associated with changes in RNA levels.
Collectively, these observations underscore the need to identify Dazl’s direct in vivo RNA targets in an unbiased and transcriptome-wide manner, and to devise a new strategy for the isolation of limiting DAZL KO germ cells for transcriptome-profiling.
In this study, we used an integrative approach to elucidate the direct RNA targets and in vivo functions of Dazl in male germ cells. High-resolution, transcriptome-wide mapping of Dazl-RNA interactions in testes reveals Dazl binding to a vast set of mRNAs, predominantly through polyA-proximal interactions. Extensive analyses of the Dazl-RNA maps strongly suggest that Dazl recruitment to its RNA targets is facilitated by local Pabpc1-polyA interactions. Using transgenic mice with fluorescently-labeled germ cells and FACS, we isolated germ cells from DAZL KO testes and WT controls and used RNA-Seq to identify mRNAs that are sensitive to DAZL-deletion. Integration of the RNA-Seq and Dazl-RNA interaction datasets revealed that Dazl enhances expression of key cell cycle regulatory genes. Collectively, these data provide important new insights into the mechanism of Dazl recruitment to its RNA targets and the molecular basis for germ cell loss caused by DAZL deletion, and reveal an mRNA regulatory program that is essential for postnatal germ cell survival.
Results
Dazl binds GU-rich sequences in the testis transcriptome
To comprehensively map direct sites of Dazl-RNA interactions in vivo, HITS-CLIP libraries were generated from Dazl-RNA complexes purified from UV cross-linked adult mouse testes and sequenced using the Illumina platform (Figure 1A). The resulting CLIP reads from three biological replicates were filtered and mapped individually (Supplemental Table 1), and then intersected to reveal 11,297 genomic positions with overlapping CLIP reads in 3/3 libraries; hereafter designated as BR3 clusters (biologic reproducibility 3/3, Figure 1B). Remarkably, these interactions reveal Dazl directly binds over 3900 transcripts in adult testis.
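The definition of BR3 clusters (positions covered by CLIP reads in 3/3 libraries) can be illustrated with a minimal sketch. The function name `reproducible_regions`, the tuple layout of reads, and the half-open coordinate convention are assumptions made for illustration; the actual pipeline and its parameters are not described in this section.

```python
from collections import defaultdict

def reproducible_regions(replicates, min_support=3):
    """Return merged regions covered by reads from at least `min_support` replicates.

    replicates: list (one entry per library) of lists of reads, each read a
    (chrom, strand, start, end) tuple with half-open coordinates [start, end).
    This per-position approach is illustrative only; it is not tuned for
    genome-scale data.
    """
    # support[(chrom, strand)][pos] -> set of replicate indices covering pos
    support = defaultdict(lambda: defaultdict(set))
    for idx, reads in enumerate(replicates):
        for chrom, strand, start, end in reads:
            for pos in range(start, end):
                support[(chrom, strand)][pos].add(idx)

    regions = []
    for (chrom, strand), coverage in support.items():
        # Keep only positions supported by enough independent libraries.
        positions = sorted(p for p, reps in coverage.items() if len(reps) >= min_support)
        run_start = prev = None
        for p in positions:
            if prev is None or p != prev + 1:
                if prev is not None:
                    regions.append((chrom, strand, run_start, prev + 1))
                run_start = p
            prev = p
        if prev is not None:
            regions.append((chrom, strand, run_start, prev + 1))
    return sorted(regions)

# Hypothetical reads from three libraries on one chromosome and strand.
rep1 = [("chr1", "+", 100, 150)]
rep2 = [("chr1", "+", 120, 170)]
rep3 = [("chr1", "+", 140, 160), ("chr1", "+", 300, 320)]
br3 = reproducible_regions([rep1, rep2, rep3])
```

In this toy input only positions 140-149 are covered by all three libraries, so a single region is reported; the read at 300-320 is library-specific and is discarded.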
We next identified CLIP peaks within each cluster (Supplemental Figure 1). Consistent with X-ray crystallographic studies (Jenkins et al., 2011), these sites were enriched for GUU motifs compared to shuffled control sequences (1.8-fold, χ2=0, all peaks, Figure 1C). Using RNA-Seq data from age-matched testes, CLIP peaks were normalized to RNA levels and parsed into 10 bins based on CLIP:RNA-Seq ratios. A clear correlation was evident between CLIP:RNA-Seq read ratio, GUU-enrichment, and the proportion of peaks in each bin that contained GUU (Figure 1D). Separately, de novo motif analysis using the MEME suite (Bailey et al., 2015) identified GTT-containing motifs as the most enriched sequence elements in genomic regions corresponding to peaks with the highest CLIP:RNA-Seq ratios (Figure 1E).
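As a rough illustration of the comparison to shuffled control sequences, the sketch below computes the fraction of peak sequences containing a motif and the corresponding fraction after shuffling. The shuffling scheme (mononucleotide shuffling of each sequence), the number of shuffles, and the function names are assumptions; the text does not specify how the control sequences were generated. Sequences are written in the DNA alphabet, so GUU appears as GTT.

```python
import random

def fraction_with_motif(seqs, motif="GTT"):
    """Fraction of sequences that contain the motif at least once."""
    return sum(motif in s for s in seqs) / len(seqs)

def shuffled(seq, rng):
    """Shuffle a sequence while preserving its mononucleotide composition."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)

def motif_enrichment(seqs, motif="GTT", n_shuffles=100, seed=0):
    """Return (observed fraction, mean shuffled fraction, ratio or None)."""
    rng = random.Random(seed)
    observed = fraction_with_motif(seqs, motif)
    control = sum(
        fraction_with_motif([shuffled(s, rng) for s in seqs], motif)
        for _ in range(n_shuffles)
    ) / n_shuffles
    # The ratio is undefined when the motif never appears in the controls.
    ratio = observed / control if control > 0 else None
    return observed, control, ratio

# Hypothetical peak sequences, for illustration only.
peaks = ["AAGTTAA", "CCGTTCC", "ACACACA"]
observed, control, ratio = motif_enrichment(peaks)
```

A significance test (for example, a chi-square test on the motif-present versus motif-absent counts) would be applied on top of these fractions.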
Dazl binding sites were also examined using an independent read mapping and analysis pipeline that takes advantage of cross-link induced mutation sites (CIMS) that reflect sites of protein-RNA cross-linking (Zhang and Darnell, 2011). De novo motif analysis of the top 1000 CIMS deletion sites (±10nt) identified GUU-containing motifs as the most enriched sequence elements (Figure 1F). Examination of 6mers and 4mers associated with the top 1000 CIMS sites ±20nt relative to background sequences identified enrichment of GUU-containing sequences around deletions, particularly the GUUG motif, which was enriched 40-60 fold around CIMS deletion sites, with cross-linking occurring at U residues within GUU triplets (Figure 1G, H). Thus, two independent read mapping and bioinformatic workflows confirm that Dazl predominantly binds GUU-rich sequences in vivo.
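The k-mer enrichment around CIMS sites relative to background can be sketched as a ratio of k-mer frequencies. The pseudocount, the function names, and the toy sequences below are assumptions for illustration; the background model used in the analysis is not specified in this section. Sequences are in the DNA alphabet (GUUG appears as GTTG).

```python
from collections import Counter

def kmer_counts(seqs, k):
    """Count all overlapping k-mers across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def kmer_enrichment(foreground, background, k=4, pseudocount=1.0):
    """Rank foreground k-mers by frequency ratio (foreground / background)."""
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_kmers = 4 ** k  # number of possible k-mers; used to smooth zero counts
    scores = {}
    for kmer in fg:
        fg_freq = (fg[kmer] + pseudocount) / (fg_total + pseudocount * n_kmers)
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount * n_kmers)
        scores[kmer] = fg_freq / bg_freq
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical windows around cross-link sites and background windows.
windows = ["GTTGGTTG", "AGTTGA"]
background = ["ACACACAC", "CACACA"]
ranked = kmer_enrichment(windows, background, k=4)
```

With these toy inputs GTTG is the top-ranked 4-mer. On real data the foreground would be the ±20nt windows around the top CIMS sites, and the choice of background strongly affects the reported fold enrichment.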
Strikingly, 39% of all Dazl BR3 clusters contained multiple peaks, with a median spacing of ∼60nt (Figure 1C, 1I; Supplemental Figure 1). Notably, motif enrichment values for adjacent peak sequences were nearly identical, regardless of relative CLIP tag density (data not shown) or 5’ versus 3’ position (Figure 1J). These observations are consistent with Dazl homodimerization, which was observed in yeast two hybrid and in vitro assays (Ruggiu and Cooke, 2000). We infer that Dazl multimerizes on a subset of its natural RNA targets.
Dazl predominantly binds GUU sites in close proximity to mRNA polyA tails
We next examined the distribution of Dazl-RNA contacts across the transcriptome. Strikingly, the majority of Dazl BR3 sites mapped to 3’UTRs of thousands of protein-coding genes (Figure 2A; Supplemental Table 2). To improve the annotation of Dazl-3’UTR interactions we used PolyA-Seq to generate a quantitative global map of polyA site utilization in adult testes (Supplemental Figure 2A). Consistent with widespread transcription and alternative polyadenylation (APA) in the testis (Soumillon et al., 2013; Li et al., 2016), PolyA-Seq identified 28,032 polyA sites in RNAs from 16,431 genes (Figure 2B, Supplemental Figure 2B). Importantly, gene expression estimates from PolyA-Seq and RNA-Seq showed a strong correlation (R=0.83, Supplemental Figure 2C), as did estimates of polyA site usage from PolyA-Seq and qRT-PCR (R=0.72, 10 candidates examined; Supplemental Figure 2D).
Having defined 3’UTRs expressed in the testis, metagene analyses were performed to examine the distribution of 3’UTR Dazl-RNA interactions relative to the upstream stop codon and downstream polyA site. Compared to the expected distribution if peaks were randomly distributed (dashed line in Figure 2C), no positional preference was observed when Dazl-3’UTR interactions were measured relative to the stop codon (Figure 2C, compare red line to dashed line). In stark contrast, a prominent positional bias for Dazl-RNA contacts was observed relative to the polyA site, with strong enrichment within ∼150nt and the greatest number of interactions ∼50nt upstream of the polyA site (Figure 2C, compare blue line to dashed line). This positional preference is also evident on 3’UTRs with multiple Dazl CLIP peaks, with broadening of Dazl binding in a 3’-to-5’ direction as the number of peaks per 3’UTR increased (Figure 2D).
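The metagene analysis relative to the polyA site amounts to converting each peak into a strand-aware upstream distance and binning those distances. The sketch below assumes one polyA site per 3’UTR, a 50nt bin size, and hypothetical identifiers and coordinates; none of these choices is taken from the Methods.

```python
from collections import Counter

def distance_to_polya(peak_pos, polya_pos, strand):
    """Upstream distance (nt) from a peak to the polyA site on the transcript strand."""
    return polya_pos - peak_pos if strand == "+" else peak_pos - polya_pos

def metagene_histogram(peaks, utrs, bin_size=50, max_dist=500):
    """Bin peak-to-polyA distances.

    peaks: list of (utr_id, genomic_position)
    utrs:  dict mapping utr_id -> (strand, polyA_site_position)
    Returns a list of (bin_start, count) covering [0, max_dist).
    """
    hist = Counter()
    for utr_id, pos in peaks:
        strand, polya = utrs[utr_id]
        d = distance_to_polya(pos, polya, strand)
        if 0 <= d < max_dist:  # ignore peaks downstream of the site or too far upstream
            hist[(d // bin_size) * bin_size] += 1
    return [(start, hist.get(start, 0)) for start in range(0, max_dist, bin_size)]

# Hypothetical 3'UTRs and peak positions.
utrs = {"utrA": ("+", 1000), "utrB": ("-", 200)}
peaks = [("utrA", 950), ("utrA", 960), ("utrB", 230), ("utrA", 700), ("utrA", 100)]
profile = metagene_histogram(peaks, utrs)
```

The expected profile under random placement (the dashed line in Figure 2C) could be approximated by repeating the same binning on positions sampled uniformly along each 3’UTR.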
Further examination showed that 3’UTRs with Dazl BR3 CLIP sites had a higher proportion of uridine residues compared to a set of 3’UTRs lacking Dazl CLIP reads (Figure 2E, Supplemental Figure 3A). Moreover, the most enriched motifs in Dazl-bound versus control 3’UTRs were U-rich, with 2 of the top 7 most-enriched 4mers containing GUU (Figure 2F). Similarly, 3 of the top 4 6mers (and 7 of the top 20) contained GUU (Supplemental Figure 3B). Although both sets of 3’UTRs had enrichment of GUU upstream of polyA sites, GUU-enrichment was greater and extended over a broader region in Dazl-bound 3’UTRs (Figure 2G).
Collectively, transcriptome-wide mapping of direct, biologically-reproducible Dazl-RNA interactions in mouse testes indicates that Dazl predominantly binds U-rich 3’UTRs with preferential binding to GUU-containing sequences ∼50nt upstream of polyA sites. In addition, the data suggest that Dazl multimerizes in a 3’-to-5’ manner on a subset of its mRNA targets.
PolyA tracts specify positions of Dazl-RNA interactions
Enrichment of Dazl-RNA contacts near polyA tails, together with evidence of Dazl-Pabpc1 binding (Collier et al., 2005), suggests that Dazl recruitment to its RNA targets may be facilitated by local Pabpc1-polyA interactions. In addition to binding polyA tails, Pabpc1 interacts with genomic-encoded A-rich sequences in mRNA (Kini et al., 2016), including auto-regulatory interactions in the 5’UTR of its own mRNA (Hornstein et al., 1999). Interestingly, PABPC1 was among a small set of 48 genes with Dazl BR3 CLIP sites in the 5’UTR (57 sites in 5’UTRs of 48 genes, compared to 6,397 sites in 3’UTRs of 3,249 genes), prompting further examination of these outliers (Figure 3A; Supplemental Table 2). For the majority (32/48 genes), CLIP read number was greatest in the 3’UTR and declined in a 3’-to-5’ manner (for example ACYP1, CHIC2, and CCNI, Figure 3B-D, 5’low genes). PABPC1 was one of only 16 genes where more than 50% of the total CLIP reads were located in the 5’UTR (Figure 3E-F, 5’high genes). Marked differences were observed in the sequence features associated with these two sets of 5’UTRs. Consistent with low CLIP coverage, 5’UTRs of 5’low genes were not enriched for GUU-containing sequences. In contrast, 5’UTRs of 5’high genes resembled 3’UTRs, whereby the most enriched motifs included polyU and polyU tracts interrupted by a single G (Figure 3G). Strikingly, the second most-enriched hexamer in 5’UTRs of 5’high genes was AAAAAA (Figure 3G, Supplemental Table 3). Thus, a common feature of Dazl-RNA interactions in 5’UTRs and 3’UTRs is a local polyA tract (either genomic-encoded or added to mRNA 3’ends, respectively). These observations implicate polyA tracts (more specifically, local polyA-Pabpc1 interactions) as a key determinant in specifying positions of Dazl-RNA interactions. They also shed light on the potential mechanism of Dazl recruitment to mRNA and the basis for the widespread polyA-proximal Dazl-RNA binding patterns observed across the germ cell transcriptome.
Dazl loss remodels the germ cell transcriptome
Significant barriers to understanding Dazl’s in vivo functions include the scarcity and variable number of germ cells in DAZL KO mice (Ruggiu et al., 1997; Lin and Page, 2005). To overcome these obstacles and collect DAZL KO germ cells for molecular analyses, we used Cre-lox to label germ cells with green fluorescent protein (GFP) followed by FACS. This approach (Zagore et al., 2015) utilizes the Stra8-iCre transgene that expresses Cre recombinase in postnatal germ cells (Sadate-Ngatchou et al., 2008), and the IRG transgene that expresses GFP following Cre-mediated recombination (De Gasperi et al., 2008). To generate animals with GFP+ Dazl-deficient postnatal germ cells, we used mixed background (CD1xC57BL/6J) breeders heterozygous for the null DAZLTm1hgu allele (Ruggiu et al., 1997) (see Methods). Testes of Stra8-iCre+; IRG+; DAZL++ and Stra8-iCre+; IRG+; DAZLTm1hgu/Tm1hgu mice were indistinguishable at postnatal day 0 (P0); however, germ cells declined steadily thereafter in the latter (data not shown). Cre expression from Stra8-iCre commences ∼P3; therefore, we selected animals at P6 for tissue collection and FACS. At this age, spermatogonia are the only germ cells present. As expected, GFP+ spermatogonia were significantly reduced in Stra8-iCre+; IRG+; DAZLTm1hgu/Tm1hgu animals compared to littermate controls (Figure 4B, C). Importantly, germ cell-restricted GFP expression in Stra8-iCre+; IRG+; DAZL++ and Stra8-iCre+; IRG+; DAZLTm1hgu/Tm1hgu mice was confirmed by IF microscopy (Figure 4A), and germ cell markers were significantly enriched in GFP+ cells (data not shown).
To determine how the absence of Dazl impacts global mRNA levels, RNA-Seq analysis was performed on GFP+ cells collected by FACS. Overall, the number of expressed genes and the distribution of their RPKM values were comparable between GFP+ WT and DAZL KO cells (11,739 and 12,269 expressed genes with RPKM>1, respectively). However, 1,462 transcripts had reduced RNA levels and 1,584 increased in DAZL KO cells (minimum 2-fold, adjusted P<0.01; Figure 4D, green and red dots, respectively; Figure 4E). Notably, DDX4 and SYCP3 were among the genes with reduced RNA in DAZL KO cells (86- and 26.5-fold, respectively). Therefore, the reduced levels of Ddx4 and Sycp3 proteins previously observed in DAZL KO germ cells by IF and attributed to reduced translation (Reynolds et al., 2005; 2007) are associated with significant reductions in their corresponding mRNAs.
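The expression and differential-expression thresholds used above (RPKM>1, minimum 2-fold change, adjusted P<0.01) can be expressed compactly. The sketch below uses the standard RPKM definition; the pseudocount, the function names, and the example values are assumptions, and the adjusted P values are taken as given from an upstream differential expression tool that is not named in this section.

```python
import math

def rpkm(read_count, gene_length_nt, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_nt * total_mapped_reads)

def classify_gene(wt_rpkm, ko_rpkm, padj, min_fold=2.0, max_padj=0.01, pseudo=0.1):
    """Label a gene by its KO-versus-WT change.

    `pseudo` is a small pseudocount (an illustrative choice) that keeps the
    ratio finite for genes with no expression in one condition.
    """
    log2fc = math.log2((ko_rpkm + pseudo) / (wt_rpkm + pseudo))
    threshold = math.log2(min_fold)
    if padj < max_padj and log2fc <= -threshold:
        return "reduced_in_KO"
    if padj < max_padj and log2fc >= threshold:
        return "increased_in_KO"
    return "unchanged"

# Hypothetical values, for illustration only.
example_rpkm = rpkm(read_count=500, gene_length_nt=2000, total_mapped_reads=10_000_000)
labels = [
    classify_gene(86.0, 1.0, 1e-5),   # strongly reduced in KO
    classify_gene(0.5, 8.0, 0.001),   # low in WT, increased in KO
    classify_gene(10.0, 40.0, 0.2),   # large change but not significant
]
```

Genes labeled "increased_in_KO" with WT RPKM<1 correspond to the category of genes that are normally silent or nearly silent in WT cells.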
To further explore why DAZL deletion results in rapid postnatal germ cell loss, we more closely examined the genes with steady state mRNA differences between WT and KO cells. Notably, 41% of DAZL-repressed genes (i.e., up-regulated in KO) had little to no expression in WT cells (656 genes with RPKM<1 in WT), indicating that these genes are under negative regulation in WT cells. Gene ontology (GO) terms associated with this set were enriched for terms associated with the extracellular matrix and plasma membrane (Supplemental Figure 4A). These and other closely related GO terms were also enriched when GO analysis was performed on the complete set of 1,584 genes with increased RNA in DAZL KO cells (Supplemental Figure 4B). Thus, a disproportionate number of genes with increased expression in DAZL KO cells are normally expressed at very low levels in WT cells and encode extracellular and membrane-associated proteins.
In contrast, genes enhanced by DAZL (i.e., down-regulated in KO) were among the most abundantly expressed, with ∼50% of these among the top 20% of all genes expressed in WT cells (Figure 4E). In addition, a disproportionate number of DAZL-enhanced genes encode nuclear proteins with roles in cell cycle regulation (Supplemental Figure 4C). This includes clusters of genes and enriched GO terms associated with mitotic cell cycle control, DNA repair, regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle, meiotic cell cycle, and cell cycle checkpoints (Supplemental Figure 5). Also enriched were groups of genes with roles in RNA processing, RNA transport, transcription, DNA methylation, and a broad set of genes with the annotation ‘spermatogenesis’. Contained in this last group are several genes essential for spermatogenesis including DND1 (Yamaji et al., 2017), PLZF/ZBTB16 (Costoya et al., 2004), SOHLH2 (Hao et al., 2008) and SOX3 (Raverot et al., 2005) (Supplemental Figure 5, group G).
Altogether, these observations indicate that genes with decreased or increased RNAs in DAZL KO cells differ considerably with respect to the distribution of RPKM values in WT cells and the functions of the encoded proteins. Importantly, the data show that a large cohort of genes that are essential for cell cycle regulation and spermatogonial maintenance are normally expressed at very high levels in spermatogonia and have significantly reduced mRNA in the absence of Dazl.
Dazl directly enhances mRNA levels of a subset of in vivo targets
To investigate which RNA changes are primary or secondary consequences of DAZL loss, we mapped Dazl-RNA interactions in P6 testes using iCLIP. As with Dazl BR3 sites in adult testes, P6 BR3 regions were enriched for GUU sequences and the majority mapped to 3’UTRs (Figure 5A, B). Interestingly, when motif-enrichment analysis was performed on complete BR3 regions, AAAAAA was the second most enriched 6-mer (Figure 5B, left). However, AAAAAA enrichment dropped when motif analysis was performed on the 5’ends of BR3 sites (±5nt), where cross-link sites are enriched in iCLIP libraries (Konig et al., 2010) (Figure 5A, B). These observations provide further evidence that polyA tracts are enriched downstream of direct sites of Dazl-RNA interactions.
To accurately map positions of Dazl-RNA contacts in spermatogonia 3’UTRs, PolyA-Seq libraries were generated from GFP+ spermatogonia isolated by FACS from Stra8-iCre+;IRG+ testes. Of the 28,038 3’UTRs defined by PolyA-seq of whole adult testis (described above; Figure 2B), 16,502 were identified in the PolyA-seq data from GFP+ spermatogonia, corresponding to 10,370 genes. Intersecting the P6 iCLIP and spermatogonia PolyA-seq datasets showed that 84% of P6 Dazl BR3 sites (5,400/6,465) mapped to 3’UTRs of 2,290 genes, indicating Dazl directly binds to a vast network of RNAs in juvenile as well as adult germ cells. Furthermore, metagene analysis showed that Dazl-RNA interactions in spermatogonia-expressed 3’UTRs also had a polyA-proximal bias, with the greatest enrichment within 150nt of the polyA site (Figure 5C). Thus, PolyA-seq and transcriptome-wide mapping of Dazl-RNA interactions confirm a polyA-proximal bias for Dazl-RNA binding in both adult and juvenile testes.
Interestingly, the majority of genes with Dazl-3’UTR interactions had higher RPKM values in WT compared to KO GFP+ cells (74% versus 26%, Figure 5D, left). Comparing genes with Dazl-3’UTR interactions to those displaying RNA differences between WT and KO cells identified a common set of 583 genes (2-fold or greater, P<0.01). Strikingly, a disproportionate number of these were DAZL-enhanced (501 genes with reduced RNA in KO) rather than DAZL-repressed (82 genes with increased RNA in KO) (Figure 5D, green and red dots, respectively, Figure 5E). Metagene analyses revealed striking differences in the distribution of Dazl binding sites in these 3’UTRs. Whereas BR3 sites in 3’UTRs of DAZL-repressed genes showed no positional preference (Figure 5D, red line), Dazl binding sites in 3’UTRs of DAZL-enhanced genes were enriched near the polyA tail, with the greatest number of interactions within 100nt of the polyA site (Figure 5D, green line). Moreover, GUU motifs and Dazl BR3 interaction sites were significantly more abundant in DAZL-enhanced rather than DAZL-repressed genes (Figure 5F). Altogether, these observations indicate that DAZL-enhanced genes are directly regulated by Dazl, while the DAZL-repressed genes represent indirect targets. Furthermore, they strongly suggest that Dazl functions directly in vivo as a positive post-transcriptional regulator of gene expression through polyA-proximal interactions.
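The integration step described here is, at its core, an intersection of the set of genes with Dazl-3’UTR interactions and the sets of genes with altered RNA levels in KO cells. A minimal sketch follows; the function name and the toy gene identifiers are hypothetical.

```python
def integrate(bound_genes, reduced_in_ko, increased_in_ko):
    """Split Dazl-bound genes into DAZL-enhanced and DAZL-repressed sets.

    bound_genes:     genes with Dazl-3'UTR interactions (from CLIP)
    reduced_in_ko:   genes with lower RNA in KO cells (from RNA-Seq)
    increased_in_ko: genes with higher RNA in KO cells (from RNA-Seq)
    """
    bound = set(bound_genes)
    enhanced = bound & set(reduced_in_ko)     # bound and reduced in KO
    repressed = bound & set(increased_in_ko)  # bound and increased in KO
    n = len(enhanced) + len(repressed)
    return {
        "enhanced": enhanced,
        "repressed": repressed,
        "fraction_enhanced": len(enhanced) / n if n else None,
    }

# Toy gene sets, for illustration only.
result = integrate({"A", "B", "C", "D"}, {"A", "B", "X"}, {"C", "Y"})

# Proportions implied by the counts reported in the text.
fraction_bound_enhanced = 501 / (501 + 82)       # among bound, changed genes
fraction_all_reduced = 1462 / (1462 + 1584)      # among all changed genes
```

With the counts reported above, roughly 86% of the bound genes that change are DAZL-enhanced, compared with about 48% of all changed genes having reduced RNA in KO cells, which is the sense in which the enhanced class is disproportionate.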
We also examined 236 DAZL-insensitive genes with Dazl-3’UTR interactions but whose RNA levels did not differ by more than 20% between WT and DAZL KO cells (Figure 5D, blue dots). Remarkably, Dazl-RNA interactions in these 3’UTRs also displayed a strong polyA proximal bias (Figure 5D, blue line). Furthermore, we identified several examples where Dazl-3’UTR binding patterns were indistinguishable between DAZL-insensitive and DAZL-enhanced 3’UTRs (Supplemental Figure 6). These observations raise the possibility that a significant proportion of Dazl-3’UTR interactions may not have regulatory function and may represent opportunistic interactions reflecting the general mechanism of Dazl recruitment to its RNA targets. They also indicate that polyA-proximal binding alone is not sufficient for Dazl to enhance mRNA levels and suggest that Dazl acts in a 3’UTR specific manner.
Dazl post-transcriptionally controls a network of cell cycle regulatory genes
To understand the primary causes of germ cell depletion in DAZL KO testes, we more closely examined the 501 DAZL-enhanced genes with Dazl-3’UTR interactions. Hierarchical clustering revealed 17 groups of significantly enriched GO terms, with the majority (12/17) associated with different aspects of cell cycle regulation, including chromatin modification, condensation, or segregation; synaptonemal complex assembly; mitotic spindle checkpoint; DNA replication, repair, or packaging; and mitotic or meiotic cell cycle regulation (Figure 6, groups with asterisks). The remaining groups contained genes and GO terms associated with RNA processing, ubiquitination, spermatogenesis, RNA polymerase II (pol II) transcription, and mRNA transport. Among the DAZL-enhanced genes are several required for spermatogenesis (Figure 6, groups A, F, and M), many with essential roles in spermatogonial maintenance or proliferation including ATM (Takubo et al., 2008), DMRT1 (Zhang et al., 2016), NXF2 (Pan et al., 2009), SOHLH2 (Hao et al., 2008), SOX3 (Raverot et al., 2005), and PLZF/ZBTB16 (Costoya et al., 2004). These observations indicate that Dazl directly enhances postnatal germ cell survival by promoting high mRNA levels for a network of genes with key roles in cell cycle regulation and essential for spermatogenesis.
We identified several examples where genes encoding subunits of the same complex were among the 501 DAZL-enhanced genes (for example, gene cluster I in Figure 6). To further assess physical interactions between the protein products of Dazl’s bound and regulated mRNA targets, we used the STRING database of protein-protein interactions (PPIs) (Szklarczyk et al., 2015). Focusing on high confidence PPIs supported by experimental evidence or curated databases (minimum interaction score 0.9), we observed that a higher than expected number of the 501 genes encode proteins that interact with one another (PPI enrichment p-value = 0). This includes a broad network of interacting proteins including basal transcription factors, subunits of RNA pol II, RNA processing and export factors, chromatin binding proteins, epigenetic regulators, and several components of the proteasome and ubiquitination machineries (Figure 7). Together, GO and PPI enrichment analyses in combination with Dazl CLIP, PolyA-Seq, and RNA-Seq demonstrate that polyA-proximal Dazl-3’UTR interactions directly enhance expression of a network of genes whose protein products physically interact and have essential roles in cell cycle regulation and germ cell survival.
Discussion
The necessity of DAZ proteins for germ cell survival is well established in multiple species. However, the direct targets, regulatory roles, and biological functions of these RBPs remained unclear. Our integrative analyses combining transgenic mice, FACS, and a panel of unbiased, transcriptome-wide profiling tools provide important insights into the molecular and biological functions of this important family of RBPs.
Dazl controls a network of genes essential for germ cell survival
Our RNA-Seq analysis of WT and DAZL KO germ cells shows a critical requirement for DAZL in maintaining fidelity of the germ cell transcriptome. More specifically, we demonstrate that Dazl directly functions as a positive post-transcriptional regulator of a network of genes necessary for cell cycle regulation and germ cell survival. This includes genes with key roles in transcriptional control, chromatin modification, RNA processing, and several regulatory processes and checkpoints that control cell cycle progression. These observations provide important insights into the molecular basis for the dramatic decrease in postnatal DAZL KO germ cells during a period when WT gonocytes normally resume mitotic proliferation.
Global measurements of protein and mRNA stability have indicated that genes with short mRNA and protein half-lives are significantly enriched for the GO terms ‘transcription’, ‘cell cycle’, and ‘chromatin modification’ (Schwanhäusser et al., 2011), all of which are among the most-enriched GO terms associated with DAZL-enhanced genes with Dazl-3’UTR interactions. Notably, the DAZL-enhanced genes were among the most abundantly expressed genes in WT germ cells. Together, these observations indicate that Dazl function is required to maintain high steady state levels of mRNAs that are inherently unstable in germ cells, thus ensuring high concentrations of regulatory factors necessary to resume and/or sustain postnatal germ cell proliferation.
The enrichment of genes that encode core components of the transcription machinery and epigenetic regulators among those enhanced by DAZL indicates that some of the RNA differences between WT and DAZL KO cells may be secondary changes due to altered transcription. Consistent with this possibility, approximately 41% of genes with increased RNA in DAZL KO cells had little to no detectable expression in WT cells (RPKM<1). These observations indicate that in addition to directly regulating cytoplasmic mRNA levels, Dazl exerts indirect control over the germ cell transcriptome by regulating mRNAs for transcription factors and epigenetic regulators that, in turn, define which genes are transcribed. The presence of epigenetic regulators among the list of DAZL-enhanced genes with Dazl-3’UTR interactions (including PIWIL2, TDRD9, DNMT1, and DNMT3B) may also partially explain the defects in erasure and re-establishment of DNA methylation patterns in embryonic DAZL KO germ cells (Gill et al., 2011; Haston et al., 2009).
PolyA tracts facilitate Dazl-3’UTR interactions across the germ cell transcriptome
Transcriptome-wide mapping of direct Dazl-RNA interaction sites shows that Dazl binds mRNAs from a significant proportion of genes expressed in adult and juvenile testes (3,907 and 2,290 genes, respectively, with Dazl-3’UTR interactions). Given the widespread transcription and high RNA levels in male germ cells (Soumillon et al., 2013), it is not unexpected or uncommon for a germ cell-expressed RBP to reproducibly bind transcripts from thousands of genes (Hannigan et al., 2017). However, the Dazl CLIP maps are unusual in that the binding sites are highly concentrated near polyA sites. Our data suggest that widespread polyA-proximal binding reflects a general mechanism of Dazl recruitment and/or stabilization on mRNA. More specifically, multiple lines of evidence suggest a role for local Pabpc1-polyA interactions in specifying where Dazl-RNA interactions occur. This includes the prominent positional bias of Dazl-RNA interactions upstream of polyA sites; enrichment of polyA tracts in the rare set of 5’UTRs with high CLIP read levels; and enrichment of polyA tracts downstream of direct sites of Dazl-RNA cross-linking. Based on these observations and previous evidence of RNA-independent Dazl-Pabpc1 interactions (Collier et al., 2005), we propose that Dazl recruitment and/or stabilization on mRNA is mediated by local Pabpc1-polyA interactions. Considering the prevalence of GUU throughout the transcriptome, a requirement for local Pabpc1-polyA interactions in specifying sites of Dazl-RNA binding would increase the probability of Dazl loading on 3’UTRs, where the majority of cytoplasmic RBPs exert their regulatory functions.
Dazl enhances mRNA levels of specific targets through polyA-proximal interactions
mRNA translation and stability are intricately coupled (Bicknell and Ricci, 2017); it is therefore not clear whether Dazl functions primarily to prevent mRNA decay or as a translational enhancer that indirectly impacts mRNA levels. It is also possible that Dazl’s primary function differs on distinct mRNAs. Additional studies are necessary to determine why some mRNAs with polyA-proximal interactions have reduced RNA levels in DAZL KO cells, while others are insensitive to DAZL deletion. Notably, 3’UTRs of the former had the highest number of GUU sites per 3’UTR, despite modestly shorter lengths (Figure 5F). In addition, Dazl-RNA contacts on DAZL-insensitive genes were more concentrated in the −50nt region, with reduced upstream 3’UTR binding compared to contacts in DAZL-enhanced genes. This suggests that multiple Dazl-RNA contacts, potentially established via 3’-to-5’ multimerization, are associated with enhanced RNA levels for a subset of genes. Binding to a broader segment of the 3’UTR may displace or neutralize mRNA-specific cis- or trans-acting negative regulators that directly or indirectly promote mRNA decay. However, we also identified several examples where Dazl-RNA binding patterns on 3’UTRs of DAZL-enhanced and DAZL-insensitive genes were indistinguishable, suggesting that Dazl functions in a 3’UTR-specific manner (Supplemental Figure 6). These findings also highlight the importance of combining RBP-RNA interaction maps with RNA profiling to distinguish functional from potentially opportunistic RBP-RNA interactions.
In conclusion, our integrative analysis demonstrates that germ cell survival depends on an mRNA regulatory program controlled by Dazl. Importantly, the data also reveal the molecular basis of germ cell loss in DAZL KO mice. Given the functional conservation between mouse DAZL, human DAZL, and DAZ (Vogel et al., 2002), our study provides important insights into the molecular basis for azoospermia in 10-15% of infertile men with Y chromosome micro-deletions. Dazl’s RNA targets extend far beyond germ cell-specific genes and include many that encode core components of macromolecular complexes present in all proliferating cells. Therefore, our findings may also be relevant to other human diseases as DAZL is a susceptibility gene for human testicular cancer (Ruark et al., 2013), and is amplified or mutated in nearly 30% of breast cancer patient xenografts examined in a single study (Eirew et al., 2015). We propose a general model (Supplemental Figure 7) whereby Dazl binds a vast set of mRNAs via polyA-proximal interactions facilitated by Pabpc1-polyA binding and post-transcriptionally enhances expression of a subset of mRNAs, namely a network of genes essential for cell cycle regulation and mammalian germ cell maintenance. These observations provide new insights into molecular mechanisms by which a single RBP is recruited to its RNA targets and coordinately controls a network of mRNAs to ensure germ cell survival.
Materials and Methods
Animals and tissue collection
C57BL/6J animals bearing the DAZLTm1hgu allele were rederived at the Case Transgenic and Targeting Facility, and bred with animals bearing the Stra8-iCre or IRG transgenes. Mixed background (CD1xC57BL/6J) Stra8-iCre++; DAZLTm1hgu/+ males and IRG+; DAZLTm1hgu/+ females were crossed to generate DAZL WT and KO offspring with GFP+ male germ cells. Day of birth was considered P0. For all procedures, animals were anesthetized by isoflurane inhalation and death confirmed by decapitation or cervical dislocation. HITS-CLIP, iCLIP, adult testis RNA-Seq, and adult testis PolyA-Seq were performed using CD-1 animals purchased from Charles River labs. Spermatogonia PolyA-Seq libraries were generated from FACS-isolated cells from 8-week-old Stra8-iCre+; IRG+ C57BL/6J males as previously described (Zagore et al., 2015). All animal procedures were approved by the Institutional Animal Care and Use Committee at CWRU.
Cell isolation
Isolation of GFP+ cells was performed using dual fluorescence labeling as previously described (Zagore et al., 2015), except that P6 cells were not stained with Hoechst and were collected using either BD Biosciences Aria or iCyt Reflection cytometry instrumentation.
Microscopy
P6 testes were decapsulated in cold HBSS, fixed overnight in 4% paraformaldehyde, and paraffin-embedded by the Histology Core Facility at CWRU. Slides containing 5 µm sections were de-paraffinized with three 5-minute washes in xylene followed by a 5-minute rinse in 100% ethanol. Tissue was rehydrated with 5-minute incubations in 100%, 95%, 70% and 50% ethanol followed by tap water. Antigen retrieval was performed with citrate buffer, pH 6.0 (10 mM sodium citrate, 0.05% Tween 20), at 95°C for 20 minutes. Slides were cooled with tap water for 10 minutes followed by a 5-minute wash in 1X PBS. Tissue was permeabilized for 10 minutes in 0.25% Triton X-100 in 1X PBS and rinsed in PBST (0.1% Tween 20 in 1X PBS). Slides were blocked in 1% BSA in PBST for 1 hour. Sections were incubated with 1:500 anti-GFP (Abcam) and 1:500 anti-Ddx4 (Abcam) for 1 hour followed by three 5-minute washes in PBST. An additional one-hour incubation was performed using 1:100 anti-mouse Alexa-488 (ThermoFisher) and 1:100 anti-rabbit Cy3 (Jackson ImmunoResearch). Slides were washed 3 times in PBST before staining with 0.5 µg/mL DAPI for 5 minutes and rinsing in 1X PBS. Tissue was mounted on coverslips with Fluoromount G (Southern Biotech) and images captured using a Deltavision Deconvolution Microscope.
RNA-Seq
Adult testis RNA-Seq libraries (Tru-Seq, Illumina) were prepared from testes of two 8-week-old CD1 males. Illumina TruSeq/Clontech ultra low input library preparation with Nextera XT indexing was used to generate RNA-Seq libraries from GFP+ WT and DAZL KO cells purified by FACS. All RNA-Seq libraries were sequenced at the CWRU Sequencing Core. Read mapping and gene expression quantification were performed using Olego and Quantas (Wu et al., 2013).
PolyA-Seq
Sequencing library construction
Adult testis PolyA-Seq libraries were generated from 8-week-old CD1 mice. Spermatogonia PolyA-Seq libraries were generated from 400 ng of RNA from spermatogonia isolated by FACS from 8-week-old Stra8-iCre+; IRG+ males as previously described (Zagore et al., 2015). PolyA+ RNA was selected by oligo-dT hybridization (Dynal) and fragmented by alkaline hydrolysis. RNA fragments 50-100 nt long were gel purified from 10% PAGE/Urea gels, and used as template for reverse transcription with SuperScript III (Invitrogen). First strand cDNA was gel purified, then circularized with CircLigase (EpiCentre). Circularized DNA was used as the template for PCR. After cycle number optimization to obtain the minimal amount of PCR product detectable by Sybr Gold (Invitrogen), the PCR product was gel purified and used as template for a second PCR with primers containing Illumina adaptor sequences. Adult and spermatogonia PolyA-Seq libraries were sequenced at CWRU and UC Riverside, respectively.
Read processing, filtering, and mapping
Reads consisting of two or more consecutive N’s or all poly(A) were filtered out, then remaining reads were processed to remove adapter and poly(A) sequences. Remaining reads were mapped to mouse genome mm9 with GSNAP allowing 2 mismatches. Reads with identical genomic footprint and 4N’s were then collapsed into single reads as likely PCR duplicates. Mapped reads that were likely internal priming events rather than true poly(A) tails were discarded if they had 6 or more consecutive A’s or 7 or more A’s in the 10 nt window downstream of the read 3’ end (cleavage and polyadenylation site). We accepted only 3’ ends with 1 of 14 hexamers within 50 nt upstream of the 3’ end (as described in Martin et al., 2012).
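The internal-priming and hexamer filters described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the hexamer set passed in the test is only a two-element placeholder (the two canonical signals AATAAA and ATTAAA); the full list of 14 hexamers is given in Martin et al., 2012 and is not reproduced here.

```python
import re


def is_internal_priming(downstream_10nt: str) -> bool:
    """Flag a 3' end whose downstream genomic window looks like an A-tract.

    A site is flagged if the 10 nt downstream of the read 3' end contain
    6 or more consecutive A's, or 7 or more A's in total.
    """
    window = downstream_10nt.upper()[:10]
    has_a_run = re.search(r"A{6,}", window) is not None
    return has_a_run or window.count("A") >= 7


def has_polya_signal(upstream_50nt: str, hexamers) -> bool:
    """Return True if any accepted hexamer occurs in the 50 nt upstream."""
    window = upstream_50nt.upper()[-50:]
    return any(hexamer in window for hexamer in hexamers)


def keep_three_prime_end(upstream_50nt, downstream_10nt, hexamers) -> bool:
    """Keep a 3' end only if it passes both filters."""
    return (not is_internal_priming(downstream_10nt)
            and has_polya_signal(upstream_50nt, hexamers))
```

Both windows are assumed to be given in the orientation of the transcript, so minus-strand sequences would need to be reverse-complemented before calling these helpers.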
Clustering of reads into poly(A) sites
To focus on high confidence poly(A) sites, we accepted read 3’ ends that were > 1 tag per million reads (TPM). Due to heterogeneity in cleavage site selection, reads that fell within 10 nt of each other were clustered into poly(A) sites, and a TPM was calculated for the entire cluster (polyA site region).
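As a rough illustration of this clustering step, the sketch below filters read 3' ends by TPM and then chains ends that lie within 10 nt of the previous accepted end on the same chromosome and strand. The function name, the input layout (a dictionary keyed by chromosome, strand, and position), and the choice to filter before clustering are assumptions made for the example.

```python
from collections import defaultdict


def cluster_polya_sites(end_counts, total_reads, min_tpm=1.0, max_gap=10):
    """Cluster read 3' ends into poly(A) site regions.

    end_counts maps (chrom, strand, position) -> number of reads ending there.
    Returns a sorted list of (chrom, strand, start, end, cluster_tpm).
    """
    per_strand = defaultdict(list)
    for (chrom, strand, pos), count in end_counts.items():
        tpm = count * 1_000_000 / total_reads
        if tpm > min_tpm:  # keep only ends above 1 tag per million
            per_strand[(chrom, strand)].append((pos, tpm))

    clusters = []
    for (chrom, strand), sites in per_strand.items():
        sites.sort()
        start, end, tpm_sum = sites[0][0], sites[0][0], sites[0][1]
        for pos, tpm in sites[1:]:
            if pos - end <= max_gap:  # within 10 nt of the previous end
                end = pos
                tpm_sum += tpm
            else:
                clusters.append((chrom, strand, start, end, tpm_sum))
                start, end, tpm_sum = pos, pos, tpm
        clusters.append((chrom, strand, start, end, tpm_sum))
    return sorted(clusters)
```

Chaining neighbouring ends means a cluster can span more than 10 nt in total when cleavage positions are spread out; a fixed-width window would be an equally plausible reading of the description.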
Identification of 3’UTRs
PolyA site regions from adult PolyA-Seq were intersected with RefSeq genes. Intergenic sites within 10kb of an upstream stop codon were assigned to the upstream gene. For each polyA site region mapped to a protein coding gene, the closest upstream stop codon was identified and used to define 28,032 3’UTRs in adult testes (only polyA sites with at least 5% of the total PolyA-Seq reads in a gene were considered). To identify 3’UTRs expressed in spermatogonia, mapped reads from spermatogonia PolyA-Seq libraries were intersected with 100nt regions corresponding to the 28,032 adult testis polyA sites, identifying 16,502 sites with read counts greater than zero in both spermatogonia PolyA-Seq replicate libraries.
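The pairing of polyA sites with stop codons can be sketched as follows for a single gene. This is a simplified illustration under stated assumptions: the function name and input layout are hypothetical, coordinates are plain genomic positions, and splicing within the 3’UTR is ignored.

```python
def define_utrs(stop_codons, polya_sites, strand="+", min_fraction=0.05):
    """Pair each sufficiently used polyA site with its closest upstream stop codon.

    stop_codons: genomic positions of annotated stop codons for one gene.
    polya_sites: list of (position, read_count) for polyA sites in that gene.
    Returns a list of (stop_position, polya_position) tuples.
    """
    total = sum(reads for _, reads in polya_sites)
    utrs = []
    for site, reads in polya_sites:
        # Skip sites with less than 5% of the gene's PolyA-Seq reads.
        if total == 0 or reads / total < min_fraction:
            continue
        if strand == "+":
            upstream = [s for s in stop_codons if s < site]
            stop = max(upstream) if upstream else None
        else:  # on the minus strand, "upstream" means a larger coordinate
            upstream = [s for s in stop_codons if s > site]
            stop = min(upstream) if upstream else None
        if stop is not None:
            utrs.append((stop, site))
    return utrs
```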
qRT-PCR validation
qRT-PCR was performed as previously described (Zagore et al. 2015). Primer sequences are available in Supplemental Table 2.
Dazl HITS-CLIP from adult testes
Library construction
Dazl HITS-CLIP libraries were generated from testes from three biological replicate 8-week-old mice. Testes were detunicated in cold HBSS and seminiferous tubules UV-irradiated on ice. All steps were performed as previously described (Licatalosi et al., 2008; Licatalosi et al., 2012). Libraries were sequenced at the CWRU Sequencing Core and resulting reads mapped using Bowtie and Tophat (Trapnell et al., 2012) and filtered as previously described (Hannigan et al., 2017).
Identification of CLIP peaks
To identify peaks within BR3 HITS-CLIP clusters, the sum of all CLIP read lengths in a cluster was divided by the length of the cluster footprint to obtain an ‘expected’ density. Peaks were defined as regions at least 10 nt long where the observed CLIP read density exceeded the expected density. Of 11,297 BR3 clusters, 11,224 (99.4%) had identifiable peaks (18,364 peaks total).
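The peak definition above can be illustrated with a short Python sketch for one cluster. The function name and the half-open (start, end) read representation are assumptions made for the example, not details taken from the original pipeline.

```python
def call_peaks(reads, min_len=10):
    """Find peaks within one CLIP cluster.

    reads: list of half-open (start, end) intervals for reads in the cluster.
    Returns (expected_density, peaks), where peaks is a list of half-open
    (start, end) regions of at least min_len nt whose per-nucleotide read
    depth exceeds the expected density.
    """
    cluster_start = min(start for start, _ in reads)
    cluster_end = max(end for _, end in reads)
    footprint = cluster_end - cluster_start

    # Expected density: summed read lengths spread evenly over the footprint.
    expected = sum(end - start for start, end in reads) / footprint

    coverage = [0] * footprint
    for start, end in reads:
        for i in range(start - cluster_start, end - cluster_start):
            coverage[i] += 1

    peaks, run_start = [], None
    for i, depth in enumerate(coverage + [0]):  # trailing 0 closes the last run
        if depth > expected and run_start is None:
            run_start = i
        elif depth <= expected and run_start is not None:
            if i - run_start >= min_len:
                peaks.append((cluster_start + run_start, cluster_start + i))
            run_start = None
    return expected, peaks
```

In the test case used to check this sketch, four reads spanning a 60 nt footprint give an expected density of 101/60; the 20 nt stretch with depth 2 to 3 is reported as a peak, while a 5 nt stretch with depth 2 is discarded for being shorter than 10 nt.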
Peak normalization and binning
For each BR3 peak, the sum of all observed-expected values per nt was calculated and divided by the average RNA-Seq read density per nt of the BR3 cluster. Only peaks within clusters with an average RNA-Seq read density per nt greater than 10 were considered for binning (13,106 peaks).
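A minimal sketch of this normalization is shown below, assuming the per-nucleotide CLIP depth across the peak, the cluster's expected density, and the cluster's average RNA-Seq density are already available. The function name and the choice to return None for clusters that fail the RNA-Seq threshold are illustrative.

```python
def normalized_peak_height(peak_coverage, expected, rnaseq_density,
                           min_density=10.0):
    """Normalize a CLIP peak to the RNA abundance of its cluster.

    peak_coverage: observed CLIP read depth at each nucleotide of the peak.
    expected: expected CLIP density for the cluster containing the peak.
    rnaseq_density: average RNA-Seq read density per nt of that cluster.
    Returns None when the cluster does not exceed the RNA-Seq threshold.
    """
    if rnaseq_density <= min_density:
        return None
    excess = sum(depth - expected for depth in peak_coverage)
    return excess / rnaseq_density
```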
Genomic distribution of BR3 positions
To annotate BR3 sites, RefSeq coding regions, 5’UTRs, 3’UTRs, and introns were downloaded from the UCSC genome browser and intersected individually with BR3 coordinates.
Examination of genes with 5’UTR BR3 sites
For this analysis, only BR3 sites that mapped unambiguously to a single type of RefSeq gene fragment were examined (see Supplemental Figure 2).
Dazl iCLIP from P6 testes
Library construction and analysis
Individual Dazl iCLIP libraries were generated from three P6 testes and processed similarly to HITS-CLIP libraries, with the following changes. No 5’ linker ligation step was performed, and the RT primer contained iSp18 spacers and a phosphorylated 5’ end, permitting circularization of first strand cDNA (after gel purification) to generate a PCR template without linearization (as described by Ingolia, 2010). Libraries were sequenced at the CWRU Sequencing Core. Identical reads within each library were collapsed, and then mapped using Olego (Wu et al., 2013). Reads overlapping repetitive elements were discarded. After removing exon-spanning reads (0.06% of total), MultiIntersectBED (Quinlan and Hall, 2010) was used to identify genomic regions with overlapping CLIP reads from 3/3 libraries, with 6,465 regions of 10nt or more identified.
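The idea of requiring read coverage from all three libraries can be illustrated with a small pure-Python sweep over intervals on a single chromosome and strand. This is a stand-in for illustration only, not the MultiIntersectBED implementation used in the study; the function names and the half-open interval convention are assumptions.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals within one library."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def shared_regions(libraries, min_len=10):
    """Return regions covered by reads from every library.

    libraries: one list of (start, end) read intervals per library.
    Only regions of at least min_len nt are reported.
    """
    events = []
    for reads in libraries:
        for start, end in merge_intervals(reads):
            events.append((start, 1))   # a library starts covering here
            events.append((end, -1))    # and stops covering here
    events.sort()  # at equal positions, ends (-1) sort before starts (+1)

    regions, depth, region_start = [], 0, None
    for pos, delta in events:
        depth += delta
        if depth == len(libraries) and region_start is None:
            region_start = pos
        elif depth < len(libraries) and region_start is not None:
            if pos - region_start >= min_len:
                regions.append((region_start, pos))
            region_start = None
    return regions
```

Merging reads within each library first ensures that depth counts libraries rather than individual reads, which is what a 3/3 reproducibility requirement calls for.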
Genomic distribution of BR3 positions
BR3 regions were intersected with RefSeq genes and 3’UTRs of RefSeq protein coding genes. Only 2/6465 BR3 regions mapped to coordinates overlapping coding and non-coding RNAs annotated in RefSeq.
Examination of sequence features
Sequence analyses
Enriched motifs within CLIP regions and UTRs were identified using the EMBOSS tool Compseq. To generate z-scores, shuffled control sets were generated for each dataset analyzed using the EMBOSS tool Shuffleseq (10 shuffled versions of each sequence in each dataset). CIMS analysis was performed as previously described (Zhang and Darnell, 2011).
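The logic of scoring k-mer enrichment against shuffled controls can be sketched in plain Python as below. This is an illustrative re-implementation of the general approach rather than of the EMBOSS tools themselves: the function names are hypothetical, shuffling is a simple composition-preserving permutation of each sequence, and the z-score uses the sample standard deviation across shuffles.

```python
import random
import statistics
from collections import Counter


def count_kmers(seqs, k):
    """Count all overlapping k-mers across a set of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def shuffled_counts(seqs, k, n_shuffles=10, seed=0):
    """Count k-mers in n_shuffles shuffled versions of the whole dataset."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_shuffles):
        shuffled = ["".join(rng.sample(seq, len(seq))) for seq in seqs]
        results.append(count_kmers(shuffled, k))
    return results


def kmer_zscores(seqs, k, n_shuffles=10, seed=0):
    """z-score of each observed k-mer count relative to shuffled controls."""
    observed = count_kmers(seqs, k)
    controls = shuffled_counts(seqs, k, n_shuffles, seed)
    zscores = {}
    for kmer, obs in observed.items():
        values = [control[kmer] for control in controls]
        sd = statistics.stdev(values)
        if sd > 0:  # skip k-mers with no variation among controls
            zscores[kmer] = (obs - statistics.mean(values)) / sd
    return zscores
```

Because each shuffle preserves the nucleotide composition of its sequence, a high z-score reflects an over-represented ordering of nucleotides (for example GUU) rather than a simple base-composition bias.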
Distribution of Dazl-RNA contacts in 3’UTRs
Metagene analysis of Dazl-3’UTR interactions was performed on a subset of 3’UTRs defined by PolyA-Seq. 3’UTRs that overlapped with any intron sequence annotated in RefSeq were omitted. Only BR3 sites in genes with a single 3’UTR were analyzed. The resulting 5,284 BR3 HITS-CLIP peaks (adult testis) were examined in 2,022 3’UTRs, and 3,321 BR3 iCLIP regions (P6 testis) were examined in 1,604 3’UTRs. To calculate distances, the midpoint of each BR3 site was measured relative to the upstream stop codon and the downstream polyA site. Positions of binding were then examined in 20nt windows relative to the stop codon or polyA site. For each 3’UTR, the expected distribution was determined by counting the number of BR3 sites in the 3’UTR and the number of 20nt bins in the 3’UTR, to determine the likelihood of any 20nt bin in a given 3’UTR containing a BR3 site. The control set of 3,771 3’UTRs was identified by intersecting the 28,032 3’UTRs from adult testis (identified by PolyA-Seq) with all adult testis CLIP reads, and selecting those with zero CLIP read coverage and no overlap with RefSeq introns.
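The observed and expected metagene profiles relative to the polyA site can be sketched as follows. The sketch assumes that, for each 3’UTR, the expectation is uniform placement of its BR3 sites across its 20nt bins (number of sites divided by number of bins); the function name and input layout are hypothetical.

```python
import math
from collections import Counter


def metagene_from_polya(utrs, bin_size=20):
    """Observed and expected BR3 site counts in bins upstream of the polyA site.

    utrs: list of (utr_length, site_midpoints), where each midpoint is the
    offset of a BR3 site from the stop codon (0 <= midpoint <= utr_length).
    Bin 0 is the 20 nt window immediately upstream of the polyA site.
    """
    observed, expected = Counter(), Counter()
    for utr_length, midpoints in utrs:
        n_bins = math.ceil(utr_length / bin_size)
        # Per-bin expectation if sites were placed uniformly in this 3'UTR.
        per_bin = len(midpoints) / n_bins
        for b in range(n_bins):
            expected[b] += per_bin
        for midpoint in midpoints:
            distance = utr_length - midpoint
            observed[min(distance // bin_size, n_bins - 1)] += 1
    return observed, expected
```

Summing the per-3’UTR expectations means that short 3’UTRs contribute only to polyA-proximal bins, so the expected profile already accounts for the 3’UTR length distribution. The same function gives the stop-codon-anchored profile if distances are measured from the stop codon instead.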
GO analyses
GO term analyses were performed with the Cytoscape application BiNGO (Saito et al., 2012; Cline et al., 2007) using a hypergeometric statistical test and Benjamini & Hochberg FDR correction (significance level of 0.05) to identify enriched terms after multiple testing correction. GO Slim settings were used to process the set of 656 genes (RPKM<1 in WT), while GO Full was used for the sets of 1,462 and 1,584 genes (Dazl-enhanced and repressed, respectively). A set of 13,659 genes with RPKM>1 in either WT or DAZL KO samples was used as the background gene set for enrichment. GO analysis of 501 Dazl-bound and enhanced genes was performed using GO Miner (Zeeberg et al., 2003).
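For readers unfamiliar with the statistics named above, the following standard-library sketch shows a one-sided hypergeometric enrichment test and the Benjamini & Hochberg adjustment. It illustrates the general calculations only and is not BiNGO's code; the function names and argument order are choices made for the example.

```python
from math import comb


def hypergeom_enrichment_p(k, population, annotated, drawn):
    """P(X >= k): chance of seeing at least k annotated genes in the test set.

    population: number of background genes.
    annotated: background genes carrying the GO term.
    drawn: size of the test gene set.
    k: test-set genes carrying the GO term.
    """
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, drawn)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A term would then be called enriched when its adjusted p-value falls below the 0.05 significance level.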
Protein-protein-interactions
The STRING database (Szklarczyk et al., 2015) was used to identify PPIs between protein products of the 501 Dazl-enhanced genes with Dazl-3’UTR interactions. Selected parameters were experiments and databases (for interaction sources), and highest confidence (0.900) (for minimum required interaction score).
All deep sequencing datasets associated with this study are available at NCBI Gene Expression Omnibus accession number GSE108997.
Acknowledgements
We are grateful to Marco Conti for providing DAZLTm1hgu/+ mice, Jo Ann Wise for comments on the manuscript, and the following CWRU core facilities: Transgenic and Targeting Facility (mouse rederivation); Genomics (deep sequencing); Tissue Resources (embedding); Virology, Next Generation Sequencing, and Imaging Core (IF microscopy); Cytometry and Microscopy (FACS); Applied Functional Genomics (low input RNA-Seq libraries). This work was supported by funds from the National Institutes of Health to LLZ (T32 GM08056) and DDL (R01 GM107331).