The DNA-to-cytoplasm ratio broadly activates zygotic gene expression in Xenopus

Summary In multicellular animals, the first major event after fertilization is the switch from maternal to zygotic control of development. During this transition, zygotic gene transcription is broadly activated in an otherwise quiescent genome in a process known as zygotic genome activation (ZGA). In fast developing embryos, ZGA often overlaps with the slowing of initially synchronous cell divisions at the mid-blastula transition (MBT). Initial studies of the MBT led to the nuclear-to-cytoplasmic ratio model where MBT timing is regulated by the exponentially increasing amounts of some nuclear component ‘N’ titrated against a fixed cytoplasmic component ‘C’. However, more recent experiments have been interpreted to suggest that ZGA is independent of the N/C ratio. To determine the role of the N/C ratio in ZGA, we generated Xenopus frog embryos with ∼3-fold differences in genomic DNA (i.e., “N”) by using X. tropicalis sperm to fertilize X. laevis eggs with or without their maternal genome. Resulting embryos have otherwise identical X. tropicalis genome template amounts, embryo sizes, and X. laevis maternal environments. We used the X. tropicalis paternally derived mRNA to identify a high confidence set of exclusively zygotic transcripts. Both ZGA and the increase in cell cycle duration are delayed in embryos with ∼3-fold less DNA per cell. Thus, DNA is an important component of the N/C ratio, which is indeed a critical regulator of zygotic genome activation in Xenopus embryos.


Introduction
Animal development is initially driven by the large pool of maternal factors until the maternal-to-zygotic transition, when the developing embryo initiates transcription to begin to take control of its own development 1 . The timing of embryonic gene activation requires coordination of maternal mRNA translation and chromatin changes with cell cleavage divisions that increase cell number to prepare for gastrulation 2 . Yet our understanding for how the zygotic genome initiates transcription in the hours after fertilization remains incomplete.
Classic work in rapidly developing vertebrates produced a model where changes in the nucleo-to-cytoplasmic ratio (N/C) induced simultaneous events at a mid-blastula transition (MBT) comprising cell cycle lengthening and the activation of the zygotic genome 3,4 . The N/C ratio model proposed that the rapid cell divisions of early development that proceed with negligible cell growth change the ratio of one or more nuclear components relative to cytoplasmic volume to induce the MBT. The N/C model was initially based on experiments manipulating DNA content or mechanically separating cytoplasm and then assaying their effects on cell cycle durations. In multiple species, increasing ploidy or decreasing cytoplasm advanced the MBT, and decreasing ploidy delayed the MBT 3,5-8 .
While the classic N/C ratio likely controls cell cycle changes, it has been unclear whether transcription was similarly N/C ratio-dependent 9 . It is now apparent in several fast developing species that hundreds of genes are actively transcribed during cleavage divisions before the MBT, indicating that zygotic genome activation does not necessarily occur at the same time as other MBT events [10][11][12][13][14][15][16][17][18] . Different genes are activated at different times and are regulated by multiple maternally provided transcriptional activators [19][20][21][22] . Moreover, recent work has called into question the extent to which the N/C ratio influences transcription initiation in fish and frogs 21,23 . Although there exists evidence that MBT cell cycle lengthening permissively allows transcription of long genes in Drosophila 15,24 , it has been suggested that neither the N/C ratio or cell cycle slowing affects large-scale z ygotic g enome a ctivation (ZGA) timing in Xenopus frogs 14 . Instead, models suggest that the timing of zygotic gene expression is largely due to activator concentration changes, histone acetylation near promoters, and chromatin state changes 14,23,25 , and may therefore be unrelated to the N/C-ratio. One possibility to bridge this N/C ratio divide would be if a subset of genes were N/C-ratio regulated while others were not, as was suggested to be the case in Drosophila 26 . Thus, the extent to which the N/C ratio influences transcriptional timing remains ambiguous.
Here, we sought to determine the extent to which the N/C ratio regulates zygotic genome activation by examining frog embryos. Prior experiments in Xenopus frogs supporting N/C ratio control of ZGA were limited to assaying either exogenously injected reporter constructs, RNA Pol III mediated transcripts, or a handful of in vivo transcribed mRNA genes 4,[27][28][29] . Furthermore, interpretations of prior studies in vertebrate embryos that underwent altered ploidy in haploids are confounded because any changes in ploidy also change the DNA template for gene expression. To overcome limitations of previous studies, we exploited the viability of Xenopus laevis and Xenopus tropicalis hybrids to create embryos wherein DNA content can be manipulated in vivo , while maintaining a constant genomic template for RNA. We compare the zygotic transcriptome in embryos with a ~3-fold difference in genomic DNA and find that timing for nearly all ZGA genes at the MBT is regulated by the DNA-to-cytoplasm ratio. Increased transcription at the MBT was due to an increasing number of transcripts per genome in addition to the exponentially increasing number of genomes. Our results, combined with recently reported cell size effects -wherein a critical cell size threshold may be necessary for bulk transcriptome activation in individual embryonic cells in Xenopus 30 -support the view that DNA is a critical parameter for "N", and that cell size (or rather, cytoplasmic volume) is the critical parameter for "C". We suggest a model wherein activators and chromatin changes initiate gene expression, but expression levels and timing for ZGA genes are additionally regulated by the N/C-ratio at the MBT.

Results
Using X. laevis and X. tropicalis hybrids to investigate ZGA To assess whether the N/C ratio regulates zygotic gene expression in Xenopus embryos, we focused on the "N" numerator that relates to some parameter(s) of the nucleus. Prior work suggests that one important component of "N" is likely DNA or chromatin, as increased or decreased ploidy manipulations cause an advance or delay in cell cycle lengthening, respectively. The constant length of DNA per cell can then serve as a yardstick to "measure" the exponentially decreasing amount of cytoplasm per cell, as proposed in titration models for MBT induction 4 .
To test whether a change in the amount of DNA per cell affects transcription, we required an experimental approach in which total embryonic DNA content can be manipulated without 3 changing the maternal cytoplasm or amount of DNA template for gene expression. Prior experiments linking ploidy to MBT transcriptional timing relied on injection of exogenous plasmid DNA 4 or haploid embryos 7,29 . However, changing the amount of DNA template for transcription confounds interpretation because one cannot distinguish whether expression level changes are due to changes in timing or due to gene dosage.
To circumvent the limitations of previous experiments to determine whether or what portion of zygotic gene expression is regulated by the N/C ratio, we took advantage of the ability of related Xenopus frog species to cross-fertilize and form hybrid embryos that are competent to develop through embryogenesis and morphogenesis into adult frogs [31][32][33] . X. laevis is allotetraploid (~2.8 x 10 9 Gb DNA; 2n=36 chr) and X. tropicalis is diploid (~1.4 x 10 9 Gb; 2n=20 chr) 34,35 . Thus, fertilizing X. laevis eggs with X. tropicalis sperm results in hybrids whose cells retain one copy of each species' genome -the Xenopus short (S) and long (L) subgenomes and a haploid tropicalis genome -for 3 haploid genome equivalents (~4.2 x 10 9 Gb; N=28 chr). This chromosome complement is stable throughout development.
Using X. laevis (egg) x X. tropicalis (sperm) hybrids as a baseline, we generated embryos with ~3-fold less DNA content by irradiating the X. laevis egg before fertilization with X. tropicalis sperm ( Figure 1A). A short pulse of a ~350 nm UV laser was sufficient to crosslink the X. laevis egg genome without otherwise damaging the embryos or the ability of these embryos to be fertilized by wild-type X. tropicalis sperm ( Figure 1C-D) 32,36 . Crosslinking the maternal genome results in rapid extrusion or loss of the entire maternal genome in the embryo such that only the haploid paternal X. tropicalis genome remains 36 . We refer to these embryos as nucleocytoplasmic hybrids (hereafter, "cybrids") since they contain the maternal X. laevis cytoplasm, and the X. tropicalis nucleus and DNA. We performed karyotyping at the neurulation and tailbud stages to measure single-cell ploidy for both hybrids and cybrids. Hybrids contain the expected 28 chromosomes (18 from laevis and 10 from tropicalis ), whereas the cybrids contained only 10 chromosomes ( Figure 1C-D).
We observed normal development in hybrid embryos from the blastomere stage across gastrulation and neurulation, as previously reported (Gibeaux et al., 2018b). Swimming tadpoles were more morphologically similar to wild-type X. laevis tadpoles than were X. laevis haploids, consistent with prior work 32 . Therefore, X. laevis (egg) x X. tropicalis (sperm) hybrids appear developmentally "normal" during embryogenesis in contrast to haploids in either species which 4 are smaller and have a greater number of morphological aberrations visible at the tailbud stage 32 (data not shown). Like wild-type and hybrid embryos, cybrid embryos were fertilized with high efficiency and displayed synchronous cleavage divisions from the 1-cell stage until the MBT.
Cybrid embryos also develop normally through the MBT and early gastrulation (eventually dying post neurulation) and are thus suitable for studying early embryonic gene expression 32 . The resulting hybrid and cybrid embryos therefore provide a ~3-fold difference in genomic DNA while other aspects of embryo development are maintained.

Defining X. tropicalis ZGA genes
The first step in determining the effect of DNA content on zygotic transcription is to determine the set of genes that are transcriptionally activated during the MZT. While defining ZGA genes is typically difficult due to the large maternal pool of mRNA, it is straightforward to do with hybrids because many transcripts from the X. tropicalis genome can be distinguished from the X. laevis maternal pool. We therefore assayed gene expression in hybrids across the MBT every 30 minutes, a time equivalent to ~1 cleavage cell division, beginning at 5 hours post-fertilization (hpf) using RNA-seq of the non-ribosomal transcriptome. The transcriptomic time series data from up to 11 independent samples across 3 replicates was of high quality and reproducible with high correlations between neighboring time points and replicates at each stage ( Figure   S1A-B). Because the transcriptome composition across early embryogenesis is extremely dynamic compared to somatic cells, we added exogenous RNAs during embryo collection to normalize our samples. Spike-ins showed reproducibility across replicates and time points, and matched their expected abundance ( Figure S2A-C). This normalization approach using spike-ins allows us to perform rigorous quantitative analysis on the resulting data.
X. tropicalis zygotic gene expression in hybrids increases gradually through the first few time points and then more extensively at the MBT ( Figure S3A). Furthermore, we could accurately distinguish most X. tropicalis zygotic transcripts from their orthologous maternal X. laevis S and L transcripts despite their high abundance ( e.g., cdk9, srsf6, vegt, sox2, pou5f3.3, and ets2 ) ( Figure S3B-D). The vast majority (~99%) of sequences from early developmental stages before 7 h.p.f. were X . laevis maternal genes, as expected ( Figure S3A). A subpopulation of X.
tropicalis genes that exhibited high early expression were likely X. laevis maternal genes mis-attributed to X. tropicalis due to sequence similarity. Our filter excluded these genes from further analysis. This approach allowed us to identify a set of 595 zygotically expressed X.
tropicalis genes in hybrids that we define as "ZGA-genes", which all satisfy specific threshold criteria (see methods) ( Figure 2A-B). The expression of the remaining transcriptome, composed primarily of X. laevis maternal genes and X. tropicalis genes that did not pass our filters, remained broadly unchanged through the MBT (5.5 hpf compared to 9.5 hpf) ( Figure 2B).
Although the ZGA set did not include the first expressed transcript in Xenopus , miR-427 , due to extremely high early expression, as expected, this miRNA exhibited a clear ZGA-like trend and was 128-fold more expressed post-MBT vs pre-MBT ( Figure 2B). In addition, our defined ZGA genes are overrepresented for processes such as transcription, gastrulation, and germ layer formation GO terms, as we would expect from bona fide Xenopus ZGA genes ( Figure S4A).
The overall trend for our defined ZGA set is low or zero expression before the MBT, which is followed by an hour of gradual increase, and then a sharp increase at the MBT ( Figure 2C-F) 16,17 . This trend is consistent across replicates from independent clutches as the replicates show high reproducibility at all common time points ( Figure 2C). Our defined zygotic gene set includes many canonical ZGA genes such as gata6 , fgf8, eomes, cdk9, klf17, and gs17 ( Figure 2D-E) 17 , as well as gene families that regulate gastrulation and germ layer development ( Figure S4A). In addition, the ZGA gene set shows statistically significant overlap with previously identified X.
tropicalis zygotic genes ( Figure S4B) 16,17 . Our RNA-seq approach is sensitive enough to detect relatively lowly expressed zygotic genes ( Figure 2E-F), and the maximal expression of each gene ranges over 1000-fold. We conclude that our approach to define X. tropicalis ZGA genes in hybrids is sensitive and accurate, and provides a high-confidence ZGA gene set for rigorous analysis of gene expression timing.

Greater DNA broadly induces earlier ZGA timing
Having defined a high confidence set of X. tropicalis ZGA genes, we sought to determine whether their transcription was sensitive to cellular DNA content. The DNA-to-cytoplasm ratio model predicts that altering DNA content in cells of otherwise similar size will cause a shift in the onset of zygotic gene expression around the time the MBT initiates at the 12th-13th cleavage divisions in Xenopus 3,37 .
To test whether a decrease in per-cell DNA content can delay the ZGA, we compared high-resolution RNA-seq time series expression profiles from hybrids to that of cybrids, which contain ~3-fold less DNA per cell. Hybrid and cybrid embryos from each biological replicate were sampled in matched time points from the same clutch and fertilization event. Two individual samples, and their matching time points, were removed due to low quality from 2   6 replicates. This resulted in 3 gene expression time-courses with matched hybrids and cybrids across the MBT. Cybrids developed normally through the MBT and early gastrulation 32 . We found that the population of maternal X. laevis mRNAs in cybrids was both highly and stably expressed across the cleavage stages, and was very similar to the maternal mRNA pool in hybrids including mRNAs for several transcription factors critical for genome activation ( Figure   S3E) 21 .
A comparison of gene expression during early development in hybrids and cybrids was in broad agreement with the DNA-to-cytoplasm ratio model. We found that ZGA genes were either not expressed or expressed at low levels at early time points, and not noticeably different in hybrids and cybrids ( Figure 3A). At the MBT, however, the ZGA genes had decreased expression as a group in the cybrids relative to the hybrids, suggesting a delay in transcriptional activation in the cybrids that have 3X less DNA per cell ( Figure 3B). After the MBT at 9.5 hpf, ZGA genes were highly and similarly expressed in hybrids and cybrids, suggesting that expression levels for delayed genes in the cybrid eventually recovers to that of the hybrid ( Figure 3C). Moreover, this cybrid ZGA expression level recovery to that of hybrids indicates that the observed expression delay is not due to grossly aberrant or sick embryos. Together, these trends reveal that the expression of ZGA genes is reduced in embryos with less DNA, consistent with the N/C-ratio influencing ZGA timing directly through genomic DNA.
Having seen that higher DNA content broadly increases gene expression, we next sought to analyze how DNA content regulated the timing of transcriptional activation of individual genes.
To do this, we first analyzed how individual gene expression levels differ at each time point in the cybrid relative to the hybrid ( Figure 3F). During the cleavage stages before the MBT ( e.g. , 5.5 hpf) as well as after the MBT ( e.g. , 9.0 hpf), the distribution of per-gene fold-changes was centered around zero in both ploidy conditions ( Figure 3D). In contrast, at the MBT (8.0 hpf), the majority of ZGA genes exhibited decreased expression levels in cybrids ( Figure 3D, middle), which is consistent with more DNA advancing zygotic expression. This trend is also present when examining the mean fold-change across all time points in all replicates ( Figure 3E, S5A-C). The initial delay in gene expression is slightly dependent on the individual gene's maximum expression level ( Figure 3G), likely due to the technical issue of more highly expressed genes more rapidly reaching our detection threshold after initiating transcription.

7
The distribution of expression differences for hybrids and cybrids in the ZGA gene set roughly followed a gaussian distribution with no clearly demarcated subpopulations of genes with unchanged or increased expression ( Figure S5D). The lack of obvious subpopulations within the ZGA set suggests that the majority of ZGA genes are delayed in embryos with less DNA per cell rather than there being a clearly defined subset of genes expressed independently of the N/C-ratio, as has been reported in Drosophila 26 .
Taken together, our analysis so far shows that the vast majority of zygotically expressed X.
tropicalis genes are delayed in embryos with less DNA per cell, consistent with an influence of the N/C-ratio on ZGA timing. As an independent approach to compare ZGA dynamics in hybrids and cybrids, we assessed when a specific gene sharply increases its expression in each condition. Rather than manually annotate regions of expression increase or choosing a simple threshold to determine activation times, we instead developed an algorithm that could be applied in a uniform manner to all genes without bias. To do this, we first normalized the gene expression time series in both hybrids and cybrids, and applied a filter to systematically remove outlying data points or genes with highly aberrant expression curves ( Figure S6A), resulting in 547 ZGA genes from our original set of 595. We then estimated continuous gene expression levels across the time course using smoothing splines with a uniform smoothing parameter ( Figure S6B and 4A) 38 . This algorithm was applied to both the hybrid and cybrid time series for the two replicates containing eight or more time points. The "activation time" ( t Act ) for each gene was determined by calculating 10% of the maximum expression along the profile. This approach was superior to fitting the time series gene expression data with alternative models, such as hinge functions or exponential functions (see methods for discussion), and was consistent across replicates ( Figure S7A-B). We note that t Act is unlikely to describe the initial transcription of each zygotic mRNA, which occurs up to several hours before the MBT for hundreds of genes 14,16 . Rather, our algorithm detects activation as the moment of strong deviation from basal expression levels.
Activation times for X. tropicalis ZGA genes in hybrids were similar across replicates ( Figure   S7A-B), and were correlated with wild-type X. tropicalis activation times generated from our algorithm applied to published wild-type data 17 ( Figure S7C). These results further validate our approach and demonstrate that activation times for X. tropicalis genes in hybrids represent a meaningful biological feature of X. tropicalis zygotic gene expression timing. 8 Decreased DNA per cell led to delayed activation of ZGA genes. We found that the mean activation time in cybrids was delayed by ~21 min ( Figure 4B-D), which is near the time of 1 cleavage division cycle in hybrids (~26 min, Figure 5A). Indeed, the majority of ZGA genes (76.4%; 412/539) exhibited a delay in activation times in cybrids and only a few genes (1.3%; 7/539) advanced their activation times similarly in both replicates ( Figure S7E-G). The shift in activation times did not depend upon a gene's expression level ( Figure 4E) and there were no clear subpopulations whose activation timing was not sensitive to DNA content ( Figure S7H).
Thus, by incorporating information from the entire time-series, we show that gene expression occurs both earlier and at a higher level at a given time point in embryos with more non-template DNA.

DNA content regulates cell cycle duration at the MBT
One hallmark of early development is the transition from rapid and synchronous cell divisions into asynchronous and slower cell cycles. This cell cycle lengthening is regulated by the N/C ratio across many species (reviewed in 2 ) and is likely due to a combination of molecular Having estimated transcription per gene copy, we sought to determine if our identification of transcriptional activation times was sensitive to the number of gene templates per embryo. We applied our smoothing spline analysis to this genome-normalized transcriptomic profile for each gene, and algorithmically identified activation times ( t Act-TPG ) for hybrids and cybrids. Activation times estimated using transcripts-per-genome were in general agreement with those from our analysis without genome normalization (r = 0.80 and r = 0.70 for hybrids, 2 replicates) ( Figure   5C). We note that activation times were estimated to be roughly 30 minutes earlier on a gene-by-gene basis in the genome-normalized dataset, suggesting that this approach may more sensitively detect transcriptional activation. Finally, genome-normalized activation times showed a gene expression delay in cybrids with less DNA ( Figure 5D). Considering that cybrids will generally have more cells per embryo at a given post-MBT time point ( Figure 5B), this result strongly indicates that the observed N/C ratio phenotype is not simply due to cell number differences between hybrids and cybrids.
Our transcription per genome analysis identified an important increase in the transcription rate per gene copy through ZGA. To view this, we centered each ZGA gene's genome-number normalized transcription time series on the time points before and after the estimated original activation time ( t Act ) ( Figure 5E). The slope in expression-per-genome-per-hour of ZGA gene population expression profiles is centered around zero before the genome-normalized activation time, with the high variance likely due to much lower expression levels. We then observed a sharp increase in transcription immediately following activation as the slopes significantly increased ( Figure 5F). This is consistent with a template number-independent increase in transcript levels across the ZGA. Thus, increased transcription near the MBT is due to an N/C ratio-dependent "boost" to transcription per gene copy in addition to the exponential increase in the genome template after each cell division.

Discussion
Overall, our results demonstrate that the DNA content in embryos of constant size and maternal environment clearly affects the timing of zygotic transcription. To show this, we used hybrid frog embryos of X. tropicalis and X. laevis to manipulate embryonic DNA content by 3-fold, while maintaining a constant number of X. tropicalis genome templates from which to measure zygotic transcription. The presence of the tropicalis genome in the laevis maternal environment allowed us to accurately define zygotically activated genes. We compared hybrid embryos with cybrid embryos where the laevis genome had been destroyed prior to fertilization using mild UV irradiation so that there is ~3X less DNA per cell. Broad zygotic genome activation and cell cycle lengthening were delayed in cybrid embryos. Moreover, we did not find evidence of any specific subsets of genes whose zygotic activation did not depend on the N/C-ratio. DNA is therefore a major component of the N/C-ratio that regulates the timing of large-scale zygotic transcription and accelerates gene expression through the MBT in Xenopus .
Our experiments inform the debate around the identity of the "N" and "C" components of the N/C-ratio model. The denominator, "C", appears directly proportional to the cytoplasmic volume.
Blastomeres in the same division cycle are either larger or smaller depending on their position in the embryo, and the onset of bulk zygotic transcription occurs earlier in smaller cells and generally depends on cell size rather than the time since fertilization 30 . The specific aspects of cell size that regulate transcription are unclear, but likely relate to amounts of histones, replication factors, or other molecules whose decreasing cellular amounts can be titrated against the constant amounts of some nuclear factors, N 4,37,39,41 . Here, our work shows that DNA is a critical component of "N", while previous work also found a contribution to N from nuclear volume 28,29 .
One of the more surprising findings in this work is that we failed to find evidence of a discrete subpopulation of genes whose expression timing at the MBT did not depend on the N/C-ratio.
Instead we found that the vast majority of defined ZGA genes delayed their expression in response to 3X less DNA per cell. This result is in contrast to evidence for N/C-ratio-and non-N/C-ratio-regulated gene subsets in Drosophila 26 . Similarly, in another model vertebrate, zebrafish, it was recently reported that at ~90% of zygotic genes showed some N/C-ratio dependency and ~10% were N/C-ratio independent 23 . Taken together, our work indicates that the vast majority of ZGA genes in fast developing vertebrates is regulated by the N/C ratio.
While our work definitively shows that DNA is a crucial component of the global N/C ratio regulating ZGA, the classic N/C-ratio model is incomplete to explain ZGA initiation. For example, the expression of the first gene in zebrafish, miR-430, is unchanged in tetraploid embryos 47 . In addition, zygotic transcription can be detected prior to the MBT when zebrafish or Drosophila embryos are cleavage-arrested at low N/C-ratios 23,43 . Moreover, as zygotic genes are activated at distinct times and levels, diverse transcriptional activators must contribute to determining when, where, and how much transcriptional activation takes place for each gene. Thus, many non-mutually exclusive regulatory mechanisms such as basal activator accumulation, Brd4 activity and -mediated histone acetylation of gene promoters, maternal transcription factor translational regulation, and maternal signaling-induced co-factors all likely act in concert to precisely control the broad onset and timing of zygotic genome activation 14,[19][20][21]23,25,27,48,49 . Here, we conclusively demonstrate that the N/C ratio is also a crucial component regulating ZGA. We propose that the increasing N/C ratio and the exponentially increasing number of genomes globally boosts transcription ( Figure 6)

Declaration of Interests
The authors declare no competing interests.    (A) Schematic illustrating the experimental design. Xenopus laevis has ~2X more DNA than Xenopus tropicalis . X. laevis eggs fertilized with X. tropicalis sperm generate hybrid embryos. UV-irradiation destroys the genome in X. laevis eggs so that subsequent fertilization results in cybrid (cytoplasmic hybrid) embryos containing 3-fold less DNA per cell than hybrid embryos. The X. laevis maternal environment and the X. tropicalis genome template for RNA expression are otherwise identical in both conditions. (C) Images of representative interphase nuclei from hybrid (left) and cybrid (right) embryos, stained to visualize DNA (DAPI, blue) and centromeres using primary antibodies against the centromere-specific protein CENP-A (yellow). The number of interphase centromeres is equivalent to the number of chromosomes.
(D) Graph displaying the chromosome counts in nuclei examined from hybrids and cybrids. Each data point represents a single nucleus, and each cluster is from a different embryo. The number of chromosomes decreases from ~28 in hybrids to ~10 in cybrids. The single nucleus (arrow) with 20 chromosomes in cybrid-embryo-1 was noted to be at the mitotic stage with the expected double X. tropicalis chromosome complement. (A) Heatmap of defined zygotic X. tropicalis gene expression in hybrids across the MBT. Genes were hierarchically clustered across time points by their normalized expression. Expression counts were log2 normalized prior to clustering. Color value scale represents min (expression = 0, dark blue), mean (exp=10), and max (exp=20, dark orange) expression. All expression values in all figures are ERCC-spike-in normalized gene counts prior to additional normalization unless otherwise noted.
(B) Scatterplot showing gene expression for X. tropicalis ZGA genes (red) and remaining genes (gray). Log2-normalized counts expression values are compared before and after the MBT, at 5.5 (Y-axis) and 9.5 (X-axis) hpf, respectively. Each point represents a single gene. The first transcript expressed in X. tropicalis , miR-427 , is annotated (purple). Dashed diagonal line is X=Y.
(C) Scatterplot of the gene counts centroids (mean X and mean Y) from each time point for the 595 ZGA genes, for all common time points from two biological replicates. Each time point is represented by a separate color. Axes display log2-normalized counts. Dashed line is a linear fit to the data.
(D) Genome browser images showing gene expression signal (spike-in normalized read counts) for zygotic X. tropicalis transcripts for 5 representative ZGA genes--gata6 , znf44 , fgf8 , eomes , and eef1a1o . Signal tracks include merged data from replicates 2 and 3. Read count shown in Y-axis. Signal intensity was originally at base-pair resolution (unbinned) with mean signal over windows shown for clarity. Transcript structure at top in black (boxes = exons; line = introns).
Note the presence of intronic signal in zygotic X. tropicalis genes (black arrows) that likely represents nascent transcription. Grey asterisk indicates where the 9.5 hpf Y-axis for znf44 is scaled 6-fold higher to better display the complete intron/exon signal. Note that the 9.5 hpf Y-axis for eef1a1o is scaled to match the earlier time points, and therefore does not display the entire gene signal.
(E) Expression time-course profiles for individual ZGA genes. Gene expression is normalized using spike-in counts. Points are overlaid with a lowess fit (purple line) and standard error of the fit (purple shading).     Normalized counts (log2) Rep 3    (B) Scatter plot showing results of cross-species BLAST between the X. laevis and X. tropicalis transcriptomes (mRNA sequences). There were ~2400 interspecies gene pairs with at least one contiguous nucleotide stretch of >97% identity; each point is one gene pair; genes within a pair are not mutually exclusive, e.g., one X. laevis gene could be the top BLAST hit for multiple independent X. laevis genes. X-axis shows the length of contiguous nucleotides with >97% identity. Y-axis shows the proportion of each X. tropicalis mRNA covered by the contiguous nt region. The blue lines indicate thresholds of 0.3 (Y-axis) and 75 nt (X-axis). All genes (n=22) with at least one stretch of 75 nt with >97% interspecies identity with a length over 30% of the total mRNA were not included in the final ZGA gene set (upper right portion of plot).

hpf
(C) Genome browser images showing gene expression signal (spike-in normalized read counts) for X. laevis and X. tropicalis transcripts for cdk9 and srsf6 . Signal tracks include merged data from replicates 2 and 3. Read count shown in Y-axis. Signal intensity was originally at base-pair resolution (unbinned) with mean signal over windows shown for clarity. Transcript structure at top in black (boxes = exons; line = introns). Note the lack of intronic signal in maternal X. laevis genes that are likely mature, spliced transcripts, and the presence of intronic signal in zygotic X. tropicalis genes (black arrow) that likely reflects nascent transcription. cdk9 (top) and srsf6 (bottom) shown as representative MZT genes.
(D) Expression time-course profiles for cdk9 and srsf6 , which have both maternal ( X. laevis ) and zygotic ( X. tropicalis ) mRNA expression in hybrids. The X. laevis genome had undergone a duplication and subsequent divergence that results in two subgenomes, short (S) and long (L). The subgenomes include paralogous transcripts for most mRNAs that have high, but not identical, sequence homology. The lack of cross-alignment in the earlier time points indicates that each RNA species ( X. laevis L, X. laevis S, and X. tropicalis ) can be distinguished. Gene expression was normalized using spike-ins. Points are overlaid with a lowess fit and the associated standard error of the fit. (Gentsch et al., 2019b). Vegt , sox3 , pou5f3.3 , and ets2 have both maternal ( X. laevis ) and zygotic ( X. tropicalis ) mRNA expression in hybrids. As in panel C, the lack of cross-alignment in the earlier time points indicates that sequencing reads arising from each species can be distinguished. Gene expression was normalized using spike-ins. Points are overlaid with a lowess fit and the associated standard error of the fit.  Figure S4.

Supplemental Figure 4. Validation of defined zygotically active genes in hybrids.
(A) Gene Ontology (GO) term enrichment of the defined ZGA genes compared to the entire transcriptome. X-axis is -log(p-value), where the p-value was calculated using a hypergeometric test. Fold enrichment of each GO-term in ZGA genes vs. the transcriptome is represented by color as displayed in the legend at right.
(B) Venn diagram shows overlap between our hybrid X. tropicalis ZGA gene set and wild-type X. tropicalis ZGA genes defined in published data sets. Only genes annotated with common names were used to directly compare genes from different genome assemblies and annotations, v7 (  Act=8.213 Supplemental Figure 6. Data processing and smoothing parameter estimation for spline fits of gene expression dynamics. (A) Gene expression profiles showing examples of filtered points. Initial smoothing spline fits were generated using all data points, and any single outlier data point was removed (see methods for precise filter definitions) and a smoothing spline was re-fit to the remaining data (see methods). Black line in each expression profile indicates the re-fit spline. Dashed blue line indicates the inferred "activation time" for each gene. Y-axis expression counts were mean-normalized before applying the filter and spline fit. If 2 or more data points were removed, the gene was not included in downstream analysis. After filtering, 550/595 genes were retained, with activation times determined for 547 genes.

Lead Contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Jan Skotheim ( skotheim@stanford.edu ).

Materials Availability
This study did not generate new unique reagents.

Data and Code availability
Sequencing read data generated during this study are available at NCBI Gene Expression Omnibus, GSEXXXXX, [weblink]. Adult, sexually mature inbred Nigerian strain X. tropicalis males (7 months to 2 years old)

KEY RESOURCES
were obtained from NXR (NXR_1018). All frogs were housed and maintained in the Stanford Aquatic Facility staffed by the Veterinary Service Center. X. laevis were housed at 18°C with a 12/12 hour light/dark cycle, and frogs were fed twice weekly X. tropicalis were housed at 26-28° C. Animal work was carried out in accordance with the guidelines of the Stanford University Administrative Panel on Laboratory Animal Care (APLAC).

Fertilization of hybrid X. laevis eggs with X. tropicalis sperm
Ovulation and egg collection of female X. laevis for hybrid generation was performed as above, but with eggs collected in 10 cm plastic petri dishes. To obtain X. tropicalis testes, males were euthanized in frog tank water and 2g/L Tricaine-S (MS222) and 5 mM sodium bicarbonate prior to dissection. Testes were obtained immediately through dissection, cleaned and placed into 1X MMR at room temperature for fresh use in fertilization. Eggs used for hybrids were set aside for 15-20 minutes while the eggs used for cybrids were manually selected (see below). Both testes from a single male were masticated in a 1.5 µl Eppendorf tube with 400µl 1X MMR and added to the X. laevis eggs. The petri dish was tilted ~15% and excess testes pieces were rubbed over the eggs. After 3 minutes, the dish was flooded with ddH20 and this time was noted as the fertilization time (t=0 hpf). After 5 minutes, the ddH20 was gently exchanged for 0.1X MMR. The

UV-irradiation to generate Cybrids
To generate cybrid embryos that lack maternal DNA, a subset of the X. laevis eggs collected for hybrids were selected for UV-irradiation prior to fertilization. Embryos to be irradiated were treated with 25 µM of 8-methoxypsoralen in 1X MMR for 10-15 minutes before UV-irradiation, which has been shown to aid in complete X. laevis genome cross-linking without adversely affecting the health of the embryo or ability to develop into the tadpole stage 36 . Prior to making cybrids, we tested different UV wavelengths, crosslinking conditions, and UV-light durations on X. laevis eggs to identify the optimum conditions for which maternal egg DNA is completely absent from resulting embryos, but where the eggs remain healthy, have a fertilization efficiency similar to that of unirradiated eggs using both X. laevis and X. tropicalis testes, and match the survival rates to tadpoles in X. laevis x X. laevis crosses generated using irradiated X. laevis sperm instead of the egg (data not shown). For cybrid fertilizations, 50-60 eggs were individually transferred to a dry 6 cm plastic petri dish using a Dumont #55 forceps to gently hold the jelly coat. Individual eggs were surrounded by a small drop of 0.3X MMR, and manually adjusted with the pigmented side up ( i.e. , germinal vesicle near the top), with care taken to touch only the jelly coat and avoid direct contact with the egg. The petri dish was filled with 0.3X MMR. Eggs were irradiated by placing the 6 cm petri dish in a dark chamber for 2 minutes under a 500 Watt Hg arc lamp (Oriel Instruments) producing a downward facing focused beam with diameter of ~8 cm such that every egg in the dish was covered by the UV light. The UV light was filtered to only allow 300-400 nm wavelength light onto the sample with a peak wavelength of ~350 nm.
Following irradiation, the petri dish solution was immediately exchanged for 1X MMR. The small petri dish was then placed in a large glass dish. Fertilization was achieved as with hybrids, with both hybrid and cybrid eggs from the same clutch receiving an equal portion (~200 µl) of the testes/MMR solution from the same male. Fertilizations of hybrid and cybrids were synchronized and embryos de-jellied as described above.

Embryo time course sample collection for RNA-seq
Hybrid and Cybrid embryos from the same fertilization event were collected in stage-matched samples. 3 replicates were obtained from independent females on different days: replicate 1 consisted of 4 time points spaced 1 hour apart; replicates 2 and 3 were 9-10 time points sampled every 30 min. Embryos were assessed at all stages after fertilization and any embryos with even minor aberrant cleavages were removed (always <10% of embryos in each condition).
At each time point, 5 embryos were gently transferred to a 1.5 ml Eppendorf tube using a plastic Pasteur pipet cut with a wide bore tip such that embryos never touched the pipet tip. Embryos were allowed to settle, and excess MMR removed. Embryos were washed quickly with ~1.5 ml cold RNase-free embryo wash buffer (EWB; 10 mM K-HEPES, pH 7.7, 100mM KCl, 1 mM MgCl 2 , 0.1 mM CaCl 2 , 50mM sucrose). Immediately after removal of EWB, 500 µl of Tripure was added to each tube, which was then vortexed for 30 seconds to homogenize the embryo until no visible pieces remained. 400 µl additional Tripure was added and samples were flash frozen in liquid nitrogen and stored at -80° C until RNA extraction.

Karyotyping of hybrids and cybrids by Immunohistochemistry of Centromeres
Embryos were karyotyped using protocols from the Grainger lab (University of Virginia)  Tris-HCl, pH 7.4, 150 mM NaCl with 0.1% Triton X-100, and 2% bovine serum albumin).
Samples were rinsed twice and exposed to 10 µg/mL Hoechst 33258 and 1μg/mL rabbit-anti-xCENP-A diluted in AbDil for 1 hour, washed three times with AbDil, and then exposed to 1 μg/mL AlexaFluor 568-conjugated goat anti-rabbit secondary antibodies (Life Technologies) for 1 hour. Slides were rinsed 3 times, mounted in 100 µl 70% Glycerol in 1X PBS, sealed under a coverslip with clear nail polish, and stored at -4°C until imaging.

Centromere Imaging and Quantification
Imaging was performed on an IX70 Olympus microscope with

Imaging and quantification of cell cycles
To measure cell cycle duration during embryogenesis, we generated and analyzed movies as described in 37  for 1 hour at room temperature followed by isolation with a minElute RNA Cleanup Kit (Qiagen).
5 μg DNase-treated RNA was used as input for ribosomal depletion using the Epicentre (Illumina) Ribo-Zero ribosomal depletion kit ( Cat# SCL24H ). Ribosomal RNAs were depleted and RNA quality was assessed post-depletion using a bioanalyzer to ensure loss of 18S and 28S rRNA. rRNA-depleted RNA was then purified using Ampure SPRI beads with a 1.8X bead to sample ratio.

RNA-seq library preparation and sequencing
RNA was converted to cDNA libraries ready for sequencing using the stand-specific Script-seq V2 RNA-seq kit (Illumina). cDNAs were generated, amplified, and indexed according to the manufacturer's instructions. 14 PCR cycles were used to amplify libraries and samples were
This approach can more accurately quantify 'isoforms' arising from the same transcript 52 and with our Xenopus hybrid transcriptome we can avoid complications from substantial multi-mapping or erroneously mapping to the wrong subgenome. We find that the majority of genes detected in the early stages of the hybrid are X . laevis maternal genes, as expected, and that X. tropicalis zygotic gene expression increases gradually through the first few time points and then more rapidly at the MBT ( Figure S3A). Furthermore, we could accurately distinguish highly expressed maternal X. laevis short (S) and long (L) subgenome transcripts from their cognate X. tropicalis zygotic transcripts for the maternal mRNA of several key transcription factors, including vegt , sox3 , pou5f3.3 , and ets2 21 ( Figure S3E).

Normalization with exogenous spike-in RNA
The transcriptome composition across early embryogenesis is extremely dynamic compared to somatic cells. Maternal mRNAs are degraded while the number of expressed zygotic genes increases from zero to thousands of genes in several hours. Therefore, standard RNA-seq normalization approaches using the background transcriptome or library size are inappropriate.
We therefore use absolute normalization with exogenous spike-in RNAs added during embryo collection to normalize Xenopus transcripts across time points. ERCC Spike-ins showed reproducibility across replicates and time points, and matched their expected abundance ( Figure S2A-C were used in all analyses unless otherwise noted.

Post-processing and filtering of transcripts
Prior to TPM calculations, we removed several genes with extremely high read counts due to homology with rRNA, or mitochondrial RNA, whose reads were not removed during the filter step ( Xetrov90027392, Xetrov90027395, dnajc28_1, Xelaev18000045m, Xelaev18003735m, Xelaev18003967m , slc41a2_1, Xelaev18002981m.g, LOC108645507 ). We also removed all tRNA entries for similar reasons.
We found using BLAST and manual curation that the earliest and most highly expressed zygotic transcript, miR-427, had multiple FASTA entries in version 9.1 of the X. tropicalis transcriptome. Some entries were exact duplications and others were transcript variants.
Because miR-427 is transcribed from repetitive loci and its transcript variants are difficult to quantify, we combined all read counts from these loci into a single entry, named " Xetrov90009984m " in our data set. The following miR-427 loci counts were combined: After the above filtering procedures, we re-calculated estimated TPM values per gene from ERCC spike-in normalized kallisto-derived counts using the formula: 10 6 * ( (estimated_counts / effective_length) / (SUM (estimated_counts / effective_length))). These TPM values were used in the determination of our ZGA gene set. Count values were then used for all other analysis unless noted otherwise. We used spike-in normalized count values because counts are more directly comparable across conditions and time points than TPM.

Identification of zygotic transcripts
To analyze expression dynamics in embryos of different cellular DNA content, we first needed to define the set of X. tropicalis ZGA genes. In principle, every sequencing read mapped to X.
tropicalis should be from a zygotically expressed transcript. However, the high maternal expression of highly similar X. laevis mRNA sequences may allow a low level of cross-contamination due to kallisto using a 31-mer k-mer based approach to assign reads to transcripts. Thus, even if only 5% of the reads from a highly expressed X. laevis maternal mRNA were mis-aligned to X. tropicalis , this could result in false positive expression and mis-characterization of that gene as zygotic X. tropicalis . In addition, the expression profile of many genes with low blastula or MBT expression that express highly in gastrulation could present as noisy data and are not useful for time-series comparisons. Therefore, we applied filters to X. tropicalis genes to identify a conservative, but high-confidence zygotic gene set. We first selected for X. tropicalis genes with expression less than 6 TPM (~30-90 raw estimated counts, depending on gene length) in the first two time points, to remove genes with any potential contamination from X. laevis maternal reads. Next, we selected genes whose expression increased over the time series at least 8-fold between the sum of the first two time points and the sum of the 9.0 and 9.5 hpf time points.
We applied these thresholds to the hybrid and cybrid expression data from replicate 2 and replicate 3, which resulted in ~1200-1400 'ZGA' genes per condition. We retained 722 genes common in all 4 data sets to allow comparison between hybrids and cybrids. We then removed genes with fewer than 10 counts total in time points 9.0+9.5 hpf, removed provisionally annotated histone genes due to their repetitive loci origin, and summed counts for 5 genes that

Fold-change calculations
We reasoned that a subtle shift in time-series expression between hybrid and cybrid conditions might not be detected if replicate expression values were merged initially because the variability in MBT timing between different clutches can be as high as 1-2 cell cycles (~30-60 min). To assess differences in time-series between hybrids and cybrids, we therefore determined the log2-fold-change at each time point per replicate, using the hybrid as a reference, such that a negative fold-change indicates less expression in the cybrid. For plots in Figure 3, replicate log2-fold-changes were averaged across the 2-3 replicates at each time point.

Gene activation time detection algorithm
We reasoned that any change in the activation time between hybrids and cybrids may represent a shift in transcription dynamics near the MBT, and therefore devised an algorithm to rigorously estimate the "activation time" for each gene. We note that this activation time does not correspond to the first transcription event in the embryo for a given gene (e.g., we detect expression earlier than the activation time using qPCR; data not shown). Rather, activation time is the time of large-scale transcript level increase from background cleavage-stage transcription levels to the level that occurs near the MBT. Thus, our activation time likely represents a "boost" in total embryonic transcription levels for a gene. This could be due to an increase in transcription rate at each promoter, or an increase in the number of cells expressing the transcript (at a constant rate), or both. While such an activation time could be obtained from the observed time series using a manual threshold, this threshold would depend on the observer so that we would not be confident in reporting differences in the 30-60 minute time scale required for our analysis.
In order to identify activation times that occur between data points, we used our data to generate a continuous estimate of gene expression. To do this, we fit the observed time series data to a cubic smoothing spline using SciPy's Univariate Spline function (SciPy v1.5.2 Reference Guide; https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.UnivariateSpline.html).
Smoothing splines are well-suited for interpolation of RNA-Seq data for two reasons. First, unlike exponential or polynomial regressions, smoothing spline regression does not require a priori assumptions about the shape of the underlying function. Second, in contrast to polynomial splines, a smoothing spline allows us to control how tightly the curves fit to (potentially random) variation in the data. The smoothing parameter p varies from 0 to 1 and determines the tightness of the fit. When p=0, the function passes through each data point, while for p=1, the function is a linear fit.
Prior to fitting the time series to the smoothing spline, it was first necessary to remove data points that were clear outliers because these outliers can drive estimates of the activation time. During the experiment, we expect gene expression to either remain constant (prior to zygotic gene activation) or monotonically increase (following activation). Therefore, we "flag" any upwards or downward 'spike' in the time series for potential removal. We consider the point to be an upward spike if the mean-normalized count value is more than 0.2 greater than both the previous and the subsequent time points. Similarly, a downward spike was taken to occur when the mean-normalized count value is more than 0.2 less than the previous and the subsequent time points ( Figure S6). The above procedure may prove too stringent when a point is flagged due to its deviation from another spike and not from the trajectory of the remaining points. To avoid erroneously discarding a data point, we discard only one of every pair of adjacent flagged points according to the following procedure: First, each point is "unflagged" and fitted to a smoothing spline with p = 0.69, excluding the remaining adjacent point. Next, the point with the lowest distance to the resulting curve, is retained. Since this procedure requires fitting non-flagged points to a cubic smoothing spline, we discard any time series with greater than four flagged points before resolving adjacent flags. Of 595 original genes, 550 were retained across all four conditions (2 replicates, hybrid and cybrid). We note that special consideration must also be given to the first and last points in the time series, which cannot be flagged as spikes by virtue of lacking a previous or subsequent point, respectively. While early expressed time points did not show significant deviations from constant expression, the last point was occasionally significantly lower than the next-to-last point. In the latter case, we flag both the last and next-to-last points and discard one according to the general procedure for resolving adjacent flags described above.
We define the activation time as the first time point in the interpolated smoothing spline when the observed count level exceeds 20% of the spline's maximum value on the interval 5-10 hours. While other threshold cutoffs and metrics based on the first or second derivative of the spline fit were possible 59 ), this method best accorded with visual inspection of gene curves. Our estimate of activation time will have error associated with the experiment and with the algorithm.
We therefore sought to identify the smoothing parameter that minimized the error associated with the algorithm, which corresponds to minimizing the variation from experiment to experiment. As a way of doing this, we selected the smoothing parameter that maximized the inter-replicate activation time Pearson correlation coefficient for the top 100-highly expressed genes. The resulting optimum parameter was 0.69. We determined activation times for 547 of the 550 genes retained post-filter.

Analysis of transcription dynamics per genome
Although Many of the resulting transcripts per genome curves showed an initial decrease before activation, but this concavity was seldom consistent across replicates. Despite this caveat, we were still able to reliably identify activation times from the transcripts per genome time courses using a modified version of the algorithm described above for the total transcription time courses.
More specifically, we defined the activation time as the first interpolated time point where the spline fit attained greater than 20% of maximum expression and had both positive first and second derivatives. The resulting activation times are well-correlated with the activation times determined from the transcript count series but are on average ~30 min earlier. This indicates that the relative ordering of transcriptional activation times remains consistent whether activation is defined on an embryo-wide or per genome basis.

QUANTIFICATION AND STATISTICAL ANALYSIS
Embryo samples were collected in a non-blinded way for the hybrid and cybrid treatment conditions. Individual embryos from each treatment group were randomly selected for each time point. Tubes for RNA extraction, cDNA generation, and library prep steps were processed in time-series order, alternating between hybrid and cybrid conditions, to avoid time-point batch effects during sample processing. Data was analyzed using R, Python, and Unix. Statistical