ABSTRACT
During embryogenesis in animals, initial developmental processes are driven entirely by maternally provided gene products that are deposited into the oocyte. The zygotic genome is transcriptionally activated later, when developmental control is handed off from maternal gene products to the zygote during the maternal to zygotic transition. The maternal to zygotic transition is highly regulated and conserved across all animals, and while some details change across model systems where it has been studied, most are too evolutionarily diverged to make comparisons as to how this process evolves. There are differences in maternal gene products and their zygotic complements across Drosophila species, so here we used hybrid crosses between sister species of Drosophila (D. simulans, D. sechellia, and D. mauritiana) and transcriptomics to determine how gene regulation changes in early embryogenesis across species. We find that regulation of maternal transcript deposition and zygotic transcription evolve through different mechanisms. Changes in transcript levels occur predominantly through differences in trans regulation for maternal genes, while changes in zygotic transcription occur through a combination of regulatory changes in cis, trans, and both cis and trans. We find that patterns of transcript level inheritance in hybrids relative to parental species differ between maternal and zygotic transcripts; maternal transcript levels are more likely to be conserved, but both stages have a large proportion of transcripts showing dominance of one parental species. Differences in the underlying regulatory landscape in the mother and the zygote are likely the primary determinants for how maternal and zygotic transcripts evolve.
INTRODUCTION
Many critical early developmental processes are common across all Metazoans, including axial patterning and rapid cleavage cycles. The earliest of these conserved processes are regulated by maternally provided RNA and proteins, which lay the foundation for the rest of development (Tadros and Lipshitz 2009; Vastenhouw et al. 2019). These maternally derived gene products carry out all initial developmental events because at the time of fertilization, the zygotic genome is transcriptionally silent. Post-transcriptional mechanisms also play an important role in regulating the amount of maternal gene products present (Tadros et al. 2007; Rouget et al. 2010; Barckmann and Simonelig 2013), which is beneficial as the zygotic genome is not yet transcriptionally active. As the zygotic genome is activated, control of developmental processes is handed off from the maternally deposited factors to those derived from the zygotic genome in a process known as the maternal to zygotic transition (MZT). The MZT is a highly conserved and regulated process during early development that occurs in all animals and in some species of flowering plants (Baroux et al. 2008; Tadros and Lipshitz 2009; Vastenhouw et al. 2019).
While many aspects of early development are broadly similar across species, certain features of the MZT vary, such as the length of time that developing organisms rely solely on maternal factors and the proportion of the genome that is maternally loaded into the oocyte as mRNA (Vastenhouw et al. 2019). Previous work identified that the maternally deposited transcripts and zygotically transcribed genes differ during early development across species of Drosophila (Atallah and Lott 2018) and more broadly between zebrafish, flies, and mice (Heyn et al. 2014). How differences in gene expression can arise between species in such a highly conserved and tightly regulated early developmental process is unknown. In this study, we characterize the regulatory basis of changes in transcript representation during early development to gain insight into the evolutionary process underlying changes in gene expression, and to understand how transcription is regulated during this critical developmental period.
When considering how regulation of gene transcription evolves between species, one fundamental question is whether differences in expression occur due to regulatory changes in cis regulatory elements (such as enhancers or promoters) or trans-acting factors (such as transcription factor proteins or miRNAs). Unlike changes in cis regulation, which will only affect the allele where the change occurred (assuming the absence of transvection), trans regulators can be pleiotropic and affect the expression of many genes. Previous studies used genetic hybrids and methods of detecting allele-specific expression (Wittkopp et al. 2004; Landry et al. 2005; Graze et al. 2009; McManus et al. 2010; Coolon et al. 2014; León-Novelo et al. 2014) to address this question genome-wide. In these investigations, patterns of gene regulatory evolution are determined by comparing transcript levels in hybrids to those in parental lines. Several studies point to differences in cis regulation as the primary mechanism of change in transcript abundance within or between species, while other studies indicate that trans changes are more widespread. For example, a majority of trans regulatory changes were identified as contributing to tissue specific expression divergence in Malpighian tubules (organs that perform excretory and osmoregulation functions) of different strains of D. melanogaster (Glaser-Schmitt et al. 2018) and between whole bodies of D. melanogaster and D. sechellia females (McManus et al. 2010). In contrast to the findings in these Drosophila studies, more changes in cis than in trans regulation were identified as contributing to divergent gene expression in the testes of different species of house mice (Mack et al. 2016). Additional work in Drosophila heads (Graze et al. 2009) also indicates cis regulatory divergence as the leading contributor to regulatory change between species.
These studies indicate that the mechanisms of gene regulatory evolution clearly depend on the system and the developmental period or tissue type examined (Buchberger et al. 2019).
To study the regulatory basis of differences in transcript levels between species during early development, we focus on three closely related species of Drosophila (D. simulans, D. sechellia, and D. mauritiana). Despite having diverged relatively recently, approximately 250,000 years ago (McDermott and Kliman 2008), these sister species have differences in the pools of transcripts present in the developing embryo both before and after zygotic genome activation (ZGA) (Atallah and Lott 2018). We asked whether alterations in gene expression occur due to changes in cis, in trans, or in a combination of the two.
We find that patterns of gene regulatory changes between these species are distinct across developmental stages, when comparing hybrids and parental lines of the species D. simulans, D. sechellia, and D. mauritiana. Differences in maternal transcript deposition occur much more frequently due to trans as opposed to cis regulatory changes, while differences in zygotic gene transcription occur through a mix of cis, trans, and the combined action of cis and trans regulatory changes. Even though it may be surprising to find many differences in transcript abundance at the maternal stage due to changes in trans, as changes in trans regulators are more likely to have pleiotropic effects, our results suggest that the maternal stage may have unique features that require gene regulation to evolve via trans changes. We find more genes with conserved transcript levels between species at the maternal stage as compared to the zygotic stage. The species used in this study are very closely related and thus conservation of gene expression is expected; however, we find a larger proportion of genes at the maternal stage than at the zygotic stage that are conserved in the hybrids relative to the parental lines. Additionally, at both stages, many genes in the hybrid have a dominant mode of inheritance, where expression in the hybrid at a specific developmental stage is more similar to one parental species than the other. Overall, we find distinct patterns of gene regulatory changes at the two embryonic timepoints, before and after ZGA, indicating that changes in gene regulation differ based on the developmental context.
MATERIALS AND METHODS
Crosses and sample acquisition
Hybrid crosses were set up using virgin females from one species and males from another of each of the following three species: D. sechellia (Dsec/wild-type;14021-0248.25) and D. simulans (Dsim/w[501]; 14021-0251.011) from the 12 Genomes study (Clark et al. 2007) and D. mauritiana (Dmau/[w1];14021-0241.60). Two types of hybrid crosses were established from which embryos were collected: 1) to determine the regulatory basis of changes in zygotic expression, hybrid F1 embryos were collected; and 2) to determine the regulatory basis of changes in maternal expression, embryos produced by hybrid F1 mothers were collected. To sample a developmental timepoint after zygotic genome activation, we chose the very end of blastoderm stage, stage 5 (Bownes’ stages (Bownes 1975; Campos-Ortega and Hartenstein 2013)). We define late stage 5 by morphology; it is the point when cellularization is complete but gastrulation has not yet begun. Similar crosses were established with hybrid females from the F1 generations of the initial crosses and males that were the same species as the maternal species in the parental cross. This was done to establish consistency amongst crosses, although there are no known mRNA contributions from the sperm to the zygote, thus the male genotype is unlikely to affect our data. As is conventional in Drosophila genetics, we denote these crosses by listing the female genotype first, e.g. (mau x sim), and the male genotype second, in this case mau. We then describe the hybrid genotype as (mau x sim) x mau for this cross (also see Figure 1 for cross diagram). This second set of crosses was used to collect stage 2 embryos (Bownes’ stages (Bownes 1975; Campos-Ortega and Hartenstein 2013)), during which time only maternal gene products are present. At this point in development, the vitelline membrane has retracted from the anterior and posterior poles of the embryo but pole cells have not yet migrated to the posterior.
All flies were raised in vials on standard cornmeal media at 25°C. Flies were allowed to lay eggs for ~2 hours (for collecting stage 2 embryos) and ~3 hours (for collecting stage 5 embryos) before they were transferred to a new vial so that the eggs could be harvested. Eggs were collected from 4-14 day old females, dechorionated using 50% bleach and moved into halocarbon oil on a microscope slide for staging. Embryos were staged at the appropriate developmental time point under a microscope (Zeiss AxioImager M2), imaged, and quickly collected at stage 2 or at the end of stage 5 (Bownes’ stages) of embryonic development (Bownes 1975; Campos-Ortega and Hartenstein 2013).
Once staged, the embryos were quickly transferred with a paintbrush to Parafilm (Bemis) and rolled (to remove excess halocarbon oil) into a drop of TRIzol (Ambion). The embryos were ruptured with a needle so that the contents dissolved in the TRIzol and were transferred to a tube to be frozen at −80°C until extraction. RNA was extracted using glycogen as a carrier (as per manufacturer instructions) in a total volume of 1mL TRIzol. Approximately 80-120ng total RNA was extracted from individual embryos, measured using a Qubit 2.0 fluorimeter (Invitrogen). The quality of the RNA was validated on RNA Pico Chips using an Agilent Bioanalyzer.
Genotyping was performed to determine embryo sex for stage 5 samples, as dosage compensation is not complete and transcript levels for genes on the X chromosome may differ for males and females at this time in development (Lott et al. 2014). DNA was extracted from each sample along with the RNA as per manufacturer instructions and amplified using a whole genome amplification kit (illustra GenomePhi v2, GE Healthcare). Sex-specific primers (Table S1) designed for use with all three species, two sets for the Y chromosome (ORY and kl2) and one control set (ftz), were used to genotype the single embryos after genome amplification. For the stage 5 samples, a total of three male and three female embryos from each cross were used for sequencing.
Library preparation and sequencing
The RNA from single embryos was treated with DNase (TurboDNA-free, Life AM1907) using manufacturer instructions and RNA sequencing libraries were constructed with Illumina TruSeq v2 kits following the manufacturer low sample protocol. The Illumina protocol uses oligo (dT) beads to enrich for polyadenylated transcripts. Because poly(A) tail length is important in determining many post-transcriptional processes during early development, including translational efficiency, it is important to ensure that the method used for mRNA selection does not produce a biased set of poly(A) tail lengths. Previous datasets report poly(A) length distributions for transcripts during oogenesis and early development (Lim et al. 2016; Eichhorn et al. 2016). We could not directly compare our data to previous reports, as these studies were done using D. melanogaster, which may have a different poly(A) tail length distribution than the species used in our analysis. However, previous studies comparing distributions of poly(A) tail lengths of all genes to poly(A) tail lengths of transcripts recovered through poly(A) selection in D. melanogaster have demonstrated that poly(A) selection with commonly used methods does not bias which transcripts are recovered from the total pool of transcripts present (Eichhorn et al. 2016). This includes studies that used the same single embryo approaches utilized here (Crofton et al. 2018; Atallah and Lott 2018). Therefore, it seems unlikely that poly(A) selection heavily biases the extracted RNA relative to the RNAs present at these developmental stages. cDNA libraries were quantified using dsDNA BR Assay Kits (Qubit) and the quality of the libraries was assessed on High Sensitivity DNA chips on an Agilent Bioanalyzer. The libraries were pooled (11-12 samples per lane) and sequenced (100bp, paired-end) in four lanes on an Illumina HiSeq2500 sequencer. Sequencing was done at the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley.
Data Processing
Raw reads were processed to remove adapter sequences and gently trimmed (PHRED Q<5) (Macmanes 2014) using Cutadapt (version 1.7.1) (Martin 2011). TopHat (version 2.0.13) (Trapnell et al. 2012) was used to align reads to the D. simulans (version r2.02) and D. sechellia (version r1.3) genome assemblies (from the twelve species project, downloaded from FlyBase) and to the D. mauritiana MS17 assembly (Nolte et al. 2013). Because the D. mauritiana line used for sequencing and the line used to construct the genome assembly differed, variant sites from the lab line, called using the Genome Analysis Toolkit’s (GATK) HaplotypeCaller, were incorporated into the MS17 assembly using Pseudoref (Yang 2017). Additionally, an updated annotation file for the MS17 assembly (Torres-Oliva et al. 2016) was used during alignment and in subsequent processing steps. Annotation files for D. simulans and D. sechellia were obtained from the same versions of the genome release of each species. For read alignment, mismatches, edit distance, and gap length were all set to three in TopHat to allow for a higher rate of read alignment.
In order to differentiate reads derived from each parental species, variant sites were called between the genomes of the species used in this analysis. RNA-seq reads from parental species samples (from previous data from Atallah and Lott, 2018, GEO accession GSE112858) were aligned to every other parental genome in each pairwise comparison using TopHat (version 2.0.13) (Trapnell et al. 2012). The BAM files from the TopHat output were sorted and indexed using Samtools (version 1.2, (Li et al. 2009)). Picard tools (version 2.7.1) and GATK tools (Van der Auwera et al. 2013) were then used to identify variant sites by using the following programs: AddOrReplaceReadGroups, MarkDuplicates, ReorderSam, SplitNCigarReads, and HaplotypeCaller. Additionally, indels were excluded and sites with single variants were selected using the SelectVariants tool. The variants were ordered using a Pysam script (Python version 2.7.10) and read assignments to parental genomes were subsequently organized with custom R scripts using the variant sites that exist between the parental genomes (R version 3.4.1, R Core Team 2017) (Files S1, S2, S3 and S4). This pipeline was also used to update the D. mauritiana MS17 assembly (Nolte et al. 2013) with variants present in the line we used in the lab (Dmau/[w1];14021-0241.60). Normalization to the upper quantile was performed across all samples in each set of pairwise species comparisons. This was used to account for differences in the number of reads for each sample as a result of sequencing.
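The normalization step can be illustrated with a minimal sketch. It assumes that "upper quantile" refers to the 75th percentile of nonzero counts in each sample and that samples are rescaled to the mean of those values; neither detail is stated in the text, and the function names are hypothetical.

```python
def upper_quartile(counts):
    """75th percentile of the nonzero counts (linear interpolation).
    Assumes the sample has at least one nonzero count."""
    nonzero = sorted(c for c in counts if c > 0)
    pos = 0.75 * (len(nonzero) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(nonzero) - 1)
    return nonzero[lo] + (pos - lo) * (nonzero[hi] - nonzero[lo])


def normalize_upper_quartile(samples):
    """samples: dict mapping sample name -> list of per-gene counts.
    Each sample is rescaled so that its upper quartile equals the
    mean upper quartile across all samples (an assumed convention)."""
    uq = {name: upper_quartile(counts) for name, counts in samples.items()}
    target = sum(uq.values()) / len(uq)
    return {
        name: [c * target / uq[name] for c in counts]
        for name, counts in samples.items()
    }
```

Under these assumptions, two samples that differ only in sequencing depth end up with identical normalized counts.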
A cutoff of 5 or more reads mapped to any given gene was set to determine if a gene was expressed. Genes with read counts <5 in both species in any pairwise comparison were not considered to be expressed in either species and were removed from the analysis. This cutoff was tested empirically and was set to exclude genes with low count numbers that had a higher frequency of mapping in a biased manner to both parental genomes. Genes included in this analysis were also limited to those with annotated orthologs in both species in any pairwise comparison. An orthologs table from Atallah and Lott, 2018, was updated using the annotations available on FlyBase (v2017) and an updated set of annotations from Torres-Oliva et al. (2016). The orthologs table (Table S2) was used to compare genes between each species and in each direction of mapping.
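The expression and ortholog filters described above can be sketched as follows. This is an illustration only; the function names and the dictionary layout are hypothetical and not taken from the study's scripts.

```python
MIN_READS = 5  # expression cutoff stated in the text


def is_expressed(count_a, count_b, min_reads=MIN_READS):
    """A gene is dropped only if it falls below the cutoff in BOTH species."""
    return count_a >= min_reads or count_b >= min_reads


def filter_genes(counts, orthologs):
    """counts: dict gene -> (count in species A, count in species B).
    orthologs: set of gene ids with annotated orthologs in both species.
    Returns the genes passing both filters."""
    return {
        gene: pair
        for gene, pair in counts.items()
        if gene in orthologs and is_expressed(*pair)
    }
```

For example, a gene with counts (10, 0) is kept because it is expressed in one species, while a gene with counts (4, 3) is removed.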
Mapping bias due to differing genome quality may occur when using two different reference genomes. In order to alleviate mapping bias that may occur when mapping the parental and hybrid samples to each parental reference genome, Poisson Gamma models (León-Novelo et al. 2014) were employed to calculate mapping bias for every set of mappings, in each pairwise comparison of species. We found that between 9.6% and 10.9% of genes expressed at stage 2 (with a count > 5) mapped in a biased way to both parental genomes when compared to the total number of orthologous genes between any pair of species. Each between-species comparison has a different number of orthologous genes so the proportion of biased genes varies based on the pair of species compared in a cross. In contrast to the maternal stage, we found that between 5.0% and 6.2% of genes expressed at stage 5 mapped in a biased way to parental genomes when compared to the total number of orthologous genes between a pair of species. Overall, when looking at the total proportion of biased genes, not just those that were called “expressed” in our analysis, we found that between 24.8% and 28.3% of genes at stage 2 and between 20.0% and 21.2% of genes at stage 5 mapped in a biased manner when compared to the total number of orthologous genes in any comparison between species in a cross.
Genes used for stage 5 analysis
To focus on the gene regulation from the zygotic genome after ZGA, we removed genes with high levels of maternal transcript deposition from our analysis. We limited the pool of genes analyzed to those that are mostly-zygotic because roughly half of maternal transcripts are not entirely degraded by stage 5 (although studies are somewhat variable in the percent reported) (Tadros et al. 2007; De Renzis et al. 2007; Thomsen et al. 2010; Lott et al. 2014) and we wanted to examine only those genes that have a larger contribution to expression from the zygotic genome. For this analysis, we included genes with “zygotic-only” expression (those that are not maternally deposited) and genes that are “mostly zygotic” (those with 8-fold higher expression at stage 5 relative to stage 2, a log2 difference greater than three). We tested several cutoffs but chose the 8-fold threshold because at this conservative cutoff, most genes with high maternal transcript deposition are removed from the analysis. Additionally, for this analysis we used confidence intervals and averages generated from only female samples for genes on the X chromosome because dosage compensation is not complete at stage 5 (Lott et al. 2011).
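A minimal sketch of the stage 5 gene selection follows. It assumes that "not maternally deposited" means a stage 2 count below the 5-read expression cutoff, which the text does not state explicitly; the function name and the two labels for excluded genes are hypothetical.

```python
import math


def zygotic_class(stage2, stage5, min_reads=5.0, log2_cutoff=3.0):
    """Classify a gene from its normalized stage 2 and stage 5 levels."""
    if stage5 < min_reads:
        return "not expressed at stage 5"        # hypothetical label
    if stage2 < min_reads:
        return "zygotic-only"                    # assumed: no maternal deposition
    if math.log2(stage5 / stage2) > log2_cutoff:
        return "mostly zygotic"                  # log2 difference greater than 3
    return "maternal contribution retained"      # hypothetical label; excluded


def stage5_gene_set(levels):
    """levels: dict gene -> (stage 2 level, stage 5 level)."""
    keep = {"zygotic-only", "mostly zygotic"}
    return {g for g, (s2, s5) in levels.items() if zygotic_class(s2, s5) in keep}
```

Because the text specifies a log2 difference greater than three, a gene at exactly 8-fold is excluded in this sketch.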
Correlation analysis and PCA
We performed correlation analysis (Figure 2, Table S3) between single embryos across replicates, stages, and genotypes in R (R Core Team 2017) using the Spearman option within the cor function. PCA was also performed in R using the prcomp function (Figure S1).
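For readers unfamiliar with the statistic, Spearman's rank correlation is the Pearson correlation of the ranks, with tied values given their average rank. The analysis itself was run in R; the sketch below is a plain-Python illustration with hypothetical function names.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho between two vectors of transcript levels."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks matter, any monotonic relationship between two embryos' transcript levels gives a coefficient of 1, regardless of scale.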
Cis/Trans analysis
To address both mapping bias and allelic imbalance, we used 95% confidence intervals (CIs) from Poisson Gamma (PG) models (León-Novelo et al. 2014). We used the PG model (with fixed bias parameter, q = 0.5) to define differentially expressed genes as genes with CIs that did not overlap the range of 0.49 - 0.51 when comparing the expression levels of parental alleles in each replicate. We set a slightly more conservative standard for classifying allelic imbalance where genes with CIs below 0.49 or above 0.51 were called differentially expressed. Genes with CIs close to 0.50 did not appear differentially expressed when looking at the count data, so we used a more conservative cutoff. Genes that appeared differentially expressed in one direction of mapping but not in the other direction of mapping were removed from the analysis, as this was determined to be a result of mapping bias between the two genomes. We also removed genes that had disparate confidence intervals in the two mapping directions (i.e. one mapping direction yielded a CI that fell above 0.5 and the other direction of mapping yielded a CI that fell below 0.5).
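The calling rule and the two mapping-direction filters can be sketched as below. The sketch assumes each CI is a (lower, upper) pair on the proportion of reads assigned to the same reference allele in both mapping directions; the function names are hypothetical, and the PG model that produces the CIs is not reproduced here.

```python
def allelic_call(ci, low=0.49, high=0.51):
    """Return +1 or -1 if the CI lies entirely above or below the
    0.49 - 0.51 range, and 0 if it overlaps that range."""
    lower, upper = ci
    if lower > high:
        return 1
    if upper < low:
        return -1
    return 0


def consensus_call(ci_map_a, ci_map_b):
    """Combine the calls from the two mapping directions.
    Returns the shared call, or None when the gene should be removed:
    either it is differentially expressed in only one direction, or the
    two directions disagree on which allele is higher."""
    a = allelic_call(ci_map_a)
    b = allelic_call(ci_map_b)
    return a if a == b else None
```

A gene with CIs of (0.55, 0.62) in one mapping direction and (0.47, 0.52) in the other returns None and would be discarded as a likely mapping artifact.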
The genes retained for analysis were categorized using cis, trans, cis + trans, cis x trans, compensatory, and conserved categories as described in Coolon et al. 2014; McManus et al. 2010; and Landry et al. 2005 (Figures 3, S2 and S3). We assigned the following categories for regulatory change based on the CIs generated from PG models for individual genes (see Figure 4 for individual examples):
cis: Genes categorized as having changes in cis are those that are differentially expressed (CIs do not overlap 49% - 51%) between the parental species and in the hybrids. (CIs for parental species and hybrids overlap each other for changes purely in cis. To determine this, we used the CIs generated from mapping to the D. simulans genome for D. simulans/D. mauritiana and D. simulans/ D. sechellia comparisons and CIs generated from mapping to the D. sechellia genome for D. sechellia/D. simulans comparisons.)
trans: Genes that are differentially expressed between the parental species (CI does not overlap 49% - 51%) but are not differentially expressed in the hybrid (CI overlaps 49% - 51%).
cis + trans: Genes that are differentially expressed in the hybrids and between the parental species (CI does not overlap 49% - 51%) and the CI is in the same direction for both the parents and the hybrid (i.e. both are greater than 51%), but the CIs for the parents and hybrid do not overlap. (For this comparison, we used the CIs generated from mapping to the D. simulans genome for D. simulans/D. mauritiana and D. simulans/ D. sechellia comparisons and CIs generated from mapping to the D. sechellia genome for D. sechellia/D. simulans comparisons.)
cis x trans: Genes that are differentially expressed in the hybrids and between the parental species (CI does not overlap 49% - 51%) and the CI is in opposite directions for the parents and the hybrid (i.e. one is greater than 51%, the other is less than 49%).
compensatory: Genes that are not differentially expressed between the parental species (CI overlaps 49% - 51%) but are differentially expressed in the hybrids (CI does not overlap 49% - 51%).
conserved: Genes are not differentially expressed between the parental species or within the hybrids (CIs overlap 49% - 51%).
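The six categories above amount to a small decision procedure on the parental and hybrid CIs. The following sketch restates it in code, assuming each CI is a (lower, upper) pair on the same allelic-proportion scale and taken from a single mapping direction; the function names are hypothetical.

```python
def direction(ci, low=0.49, high=0.51):
    """+1 / -1 if the CI is entirely above / below 49% - 51%, else 0."""
    lower, upper = ci
    if lower > high:
        return 1
    if upper < low:
        return -1
    return 0


def intervals_overlap(a, b):
    """True if two (lower, upper) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


def regulatory_category(parent_ci, hybrid_ci):
    """Assign a regulatory category from the parental and hybrid CIs."""
    p = direction(parent_ci)
    h = direction(hybrid_ci)
    if p == 0 and h == 0:
        return "conserved"
    if p == 0:
        return "compensatory"   # hybrid differs, parents do not
    if h == 0:
        return "trans"          # parents differ, hybrid alleles do not
    if p != h:
        return "cis x trans"    # both differ, in opposite directions
    if intervals_overlap(parent_ci, hybrid_ci):
        return "cis"            # same direction and overlapping CIs
    return "cis + trans"        # same direction, non-overlapping CIs
```

For instance, a parental CI of (0.60, 0.70) with a hybrid CI of (0.48, 0.52) is called trans, while the same parental CI with a hybrid CI of (0.62, 0.72) is called cis.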
Inheritance Patterns
Previous studies from Gibson, et al. 2004 and McManus, et al. 2010 identified and outlined ways to classify inheritance patterns of transcript abundance in hybrids in relation to parental samples. We used these methods in our study to compare the averages of total expression levels in the hybrids relative to those of parental samples. Gene expression was considered conserved if the expression level between parental samples and the total expression in the hybrid (sum of the expression of the two species-specific alleles in the hybrid) were within 1.25-fold of one another, a log2-fold change of 0.32. Overdominant genes were expressed at least 1.25-fold more in the hybrid than in either parent while underdominant genes were expressed at least 1.25-fold lower in the hybrid than in either parent. Genes that were expressed at an intermediate level in the hybrid in comparison to the parental species samples involved in the cross were defined as additive. Dominance was determined when the hybrid had expression within 1.25-fold of one of the parental species.
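One way to express these inheritance rules in code is sketched below. It reads "conserved" as the hybrid total being within 1.25-fold of both parents, which is one interpretation of the description above; it also assumes strictly positive expression values. The function names and label strings are hypothetical.

```python
FOLD = 1.25  # threshold stated in the text (log2-fold change of about 0.32)


def within_fold(a, b, fold=FOLD):
    """True if two positive expression levels are within `fold` of each other."""
    return max(a, b) / min(a, b) < fold


def inheritance_pattern(parent1, parent2, hybrid):
    """Classify total hybrid expression relative to the two parental levels."""
    like1 = within_fold(hybrid, parent1)
    like2 = within_fold(hybrid, parent2)
    if like1 and like2:
        return "conserved"
    if like1:
        return "dominant (parent 1)"
    if like2:
        return "dominant (parent 2)"
    if hybrid > max(parent1, parent2):
        return "overdominant"
    if hybrid < min(parent1, parent2):
        return "underdominant"
    return "additive"  # intermediate and distinct from both parents
```

With parental levels of 100 and 400, a hybrid level of 200 is additive, whereas a hybrid level of 105 shows dominance of the first parent.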
Candidate transcription factor identification
We took a computational approach to identify potential transcription factors that may change in trans regulation between the species in our analysis. We used motif enrichment programs to find potential binding sites in the upstream regions of genes changing in regulation in D. sechellia and D. simulans. We omitted D. mauritiana from this analysis because the D. mauritiana genome is not as well annotated as the genomes for D. simulans and D. sechellia. We used the Differential Enrichment mode in MEME (Bailey and Elkan 1994) as well as the findMotifs.pl script in HOMER (Heinz et al. 2010) to identify overrepresented motifs in the regions 500bp upstream of genes changing in regulation or with conserved regulation between species in every set of comparisons at stage 2. In MEME, we used options to find motifs with any number of repetitions and a motif width of 8-12. We used default options for HOMER and supplied a background fasta file for enrichment analysis. The background lists supplied were 500bp upstream regions from all annotated genes in the species except for those that were in the target set (either those genes with conserved or changing regulation in any set of comparisons). The 500bp regions were extracted from fasta files (versions were the same as ones used for mapping) for each species using BEDTools (Quinlan and Hall 2010). Significantly represented motifs in the target lists relative to the background supplied were then compared against databases of known transcription factor binding sites using Tomtom (MEME suite) and HOMER. All enriched motifs that appeared in both HOMER and MEME analyses are included in Table S4. All potential targets of discovered motifs with significant E-values (MEME) or high Match Rank scores in HOMER (>0.8) are also listed in Table S4 (see Figure S4 for transcript levels of differentially maternally deposited targets in embryos of parental species).
Gene Ontology
Gene ontology (GO) analysis was done with the statistical overrepresentation test in PANTHER (Mi et al. 2019) using the default settings. We looked at the GO complete annotations for biological processes and molecular function but did not find any significant terms represented in the cellular component categories. For this analysis, we set a cutoff of Bonferroni adjusted p-value < 0.05. We searched for enrichment of GO categories amongst genes that change in trans in each cross, at each stage, compared to the background of genes that are expressed (having a count >5) in each cross, at each stage. We used REVIGO (Supek et al. 2011) to reduce the number of redundant GO categories and used the small (0.5) level of similarity as a cutoff for redundant GO terms. GO categories shared between two or more crosses at stage 5 are represented in Figure 5 and GO categories unique to a cross are shown in Figure S5. All enriched categories are listed in Table S5.
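The significance cutoff can be illustrated with a short sketch of the Bonferroni adjustment. PANTHER performs this correction internally; the code below only shows the arithmetic, assumes the number of tests equals the number of terms supplied, and uses hypothetical function names and placeholder term labels.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def significant_terms(term_pvalues, alpha=0.05):
    """term_pvalues: dict GO term -> raw p-value.
    Returns the terms whose adjusted p-value is below alpha."""
    terms = list(term_pvalues)
    adjusted = bonferroni([term_pvalues[t] for t in terms])
    return {t: a for t, a in zip(terms, adjusted) if a < alpha}
```

With three tested terms, a raw p-value of 0.02 becomes 0.06 after adjustment and no longer passes the 0.05 cutoff.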
Data Availability
All sequencing data and processed data files from this study are available at NCBI/GEO at accession number: submitted, awaiting accession number.
RESULTS
In order to determine the regulatory basis of changes in maternal transcript deposition and zygotic gene expression between species, we performed a series of crosses between closely related species followed by RNA-Seq on resulting embryos (Figure 1). We used the sister species D. simulans, D. sechellia, and D. mauritiana, all of which may be crossed reciprocally (with the exception of D. sechellia females to D. simulans males; Lachaise et al. 1986). To investigate regulatory changes in zygotic gene expression, the three species were crossed pairwise (with the noted exception), to produce F1 hybrid embryos, which were collected at a stage after zygotic genome activation (end of blastoderm stage or the end of stage 5, Bownes’ stages; Bownes 1975; Campos-Ortega and Hartenstein 2013). While the zygotic genome is fully activated at this developmental stage, maternal transcripts are not entirely degraded at this time so we limited our analysis to those genes that are expressed at a much higher level after ZGA than before the zygotic genome is activated (see Methods). To discover the regulatory basis of changes in maternal transcript deposition, the species were crossed to produce F1 females, whose embryos were collected at a stage when all the transcripts in the egg are maternal in origin (stage 2). The female F1s were produced from a hybrid cross (e.g. D. simulans females crossed to D. mauritiana males produce (sim x mau) F1 females), which were then crossed to males that were the same species as the females in the initial cross (here, D. simulans) to produce hybrid stage 2 embryos. This example cross will be denoted as (sim x mau) x sim, with the first term of the cross (sim x mau) indicating the maternal F1 genotype, and the second term of the cross (sim) indicating the paternal genotype (also see Figure 1). Three replicate samples were obtained for each cross at stage 2, and since stage 5 features incomplete X chromosomal dosage compensation (Lott et al. 2011), 6 replicates were obtained for each cross at late stage 5 (3 female and 3 male embryos). mRNA-sequencing libraries were constructed from each embryo sample using poly(A) selection. Libraries were sequenced paired-end, 100bp, on an Illumina HiSeq2500.
Reproducibility of Single Embryo RNA Sequencing Data
Previous studies have shown that single-embryo RNA-seq data can be highly reproducible, despite replicate samples representing both biological and technical replicates (Lott et al. 2011; 2014; Paris et al. 2015; Atallah and Lott 2018). Our current study extends this to include replicates of F1 and F2 crosses between closely related species, which are as reproducible as the within-species replicates. Spearman’s rank correlation coefficients are high between replicate samples of the same species or cross at the same developmental stage (Figure 2, A,B,D,E, Table S3). For example, stage 2 samples of the (mau x sim) x mau and (sim x mau) x sim hybrid crosses have correlation coefficients that range from 0.965 to 0.995 (Table S3). Stage 5 hybrids from the mau x sim cross have equally high correlation coefficients, ranging from 0.980 to 0.996 (Table S3). Similarly, correlation coefficients of D. simulans stage 5 embryos, when compared with other D. simulans stage 5 embryos, range from 0.985 to 0.990. The high correlation coefficients between replicates may be due, in part, to the removal from this analysis of genes with differential mapping to either parental genome and genes with very low transcript abundances (see Methods).
Transcript levels for embryos of the same stage but different genotypes (parental lines and hybrids) are highly similar, as indicated by their Spearman’s rank correlation coefficients (Table S3), with one notable exception. When we compare stage 5 hybrids to stage 5 embryos of the paternal species in the cross, we see more divergent patterns of gene expression than when we compare stage 5 hybrids to stage 5 embryos of the maternal species in the cross. For example, comparisons between D. simulans stage 5 embryos and stage 5 embryos of the sim x mau cross, where D. simulans is the maternal species in the cross, have high correlation coefficients, ranging from 0.955 to 0.972. In contrast, sim x mau stage 5 hybrid embryos, when compared to D. mauritiana embryos, have lower correlation coefficients, ranging from 0.863 to 0.887. In this particular comparison, the lower correlation coefficients are likely due to having D. simulans as the maternal species in the hybrid cross for the sim x mau embryos. Remaining maternal transcripts are from the D. simulans alleles and likely explain why these hybrid embryos correlate more highly with D. simulans stage 5 embryos.
In contrast to highly correlated samples within a stage, comparing samples of different stages yields strikingly lower correlation coefficients (Figure 2, C,F, Table S3), emphasizing the turnover of transcripts between these stages. When comparing stage 2 hybrids from crosses with D. mauritiana and D. simulans to stage 5 hybrids from the same cross, correlation coefficients range from 0.483 to 0.573 (Table S3). The correlation coefficients are lower between stages, indicating that the pool of transcripts present at the maternal stage is different from that at the zygotic stage of development.
While samples of the same stage but different genotypes have similar transcript abundance, they are still distinguishable by genotype. Samples of the same stage are highly correlated by their Spearman’s rank correlation coefficients, but stage-matched embryos separate by genotype along the second principal component in principal component analysis (PCA; Figure S1). The second principal component of this PCA accounts for between 6.94% and 8.44% of the variance in the three sets of comparisons (the first principal component corresponds to developmental stage and explains between 80.65% and 81.86% of the variance). Although samples of the same stage have similar transcript abundances, as evidenced by correlation coefficients, they are most similar to samples of the same genotype, as seen through PCA.
Regulatory changes at the maternal stage of development
Changes in gene expression can occur through many mechanisms: alterations in chromatin state, differences in cis or in trans regulation, or through post-transcriptional modifications. Here, we examine how cis and trans regulation evolves to differentially affect transcript levels during the maternal to zygotic transition, across species of Drosophila. Changes in cis regulation occur through changes in the DNA of regulatory regions proximal to the gene that they regulate. These types of regulatory changes have an allele-specific effect on gene expression. In contrast, changes in trans regulation occur via changes in factors that bind to the DNA, such as transcription factor proteins. Changes in trans regulation affect gene expression independent of allele-specificity.
In order to determine regulatory changes in cis and in trans that lead to differences in maternal transcript deposition between D. simulans, D. sechellia, and D. mauritiana, we used Poisson Gamma (PG) models (León-Novelo et al. 2014) to determine allelic imbalance between parental lines and within hybrid embryos. Reads from each sample were aligned to both genomes of the parental species used in the cross. In these alignments, we identified genomic sites with fixed differences between lines of species used to determine the parental species of origin for each read (see Methods). We used the results from both directions of mapping, meaning mapping to both parental genomes, to determine mapping bias and allelic imbalance. When reads differentially mapped to the two parental genomes, we identified these genes as biased in their mappings and removed them from the analysis (see Methods). Genes that were not biased in their mapping and had a read count of >5 reads were retained for analysis.
Transcript levels between each parental species were compared using PG models to identify differential maternal transcript deposition between the parental lines. Similarly, transcript levels of species-specific alleles in the stage 2 embryos produced by hybrid mothers were compared to determine differential deposition of maternal transcripts into the embryo. We compared the two sets of analyses to determine the regulatory basis of differential maternal transcript deposition between species. We used the logic of Landry, et al. 2005 to classify genes as having changed in cis or in trans regulation, based on comparisons made with confidence intervals of the bias parameters generated through the PG models (see Methods).
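The classification logic can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not the implementation used in this study: the three boolean inputs stand in for the confidence-interval comparisons of the PG model bias parameters, and the function name, argument names, and the "ambiguous" fallback label are hypothetical. The categories follow the framework of Landry et al. 2005, with cis + trans and cis x trans distinguished by whether the higher-expressed allele in the hybrid comes from the higher-expressing parental line.

```python
def classify_regulation(parent_log2, hybrid_log2, parent_sig, hybrid_sig, trans_sig):
    """Assign a regulatory category to one gene.

    parent_log2 -- log2(species A / species B) between the parental lines
    hybrid_log2 -- log2(allele A / allele B) within the hybrid
    parent_sig  -- True if the parental lines differ significantly
    hybrid_sig  -- True if the hybrid shows significant allelic imbalance
    trans_sig   -- True if the parental and hybrid ratios differ significantly
    """
    if not parent_sig and not hybrid_sig and not trans_sig:
        return "conserved"
    if parent_sig and hybrid_sig and not trans_sig:
        # The allelic imbalance in the hybrid matches the parental difference.
        return "cis"
    if parent_sig and not hybrid_sig and trans_sig:
        # Parents differ, but both alleles are expressed equally in the hybrid.
        return "trans"
    if not parent_sig and hybrid_sig and trans_sig:
        # Parents look the same, yet the alleles differ in the hybrid.
        return "compensatory"
    if parent_sig and hybrid_sig and trans_sig:
        same_direction = (parent_log2 > 0) == (hybrid_log2 > 0)
        return "cis + trans" if same_direction else "cis x trans"
    return "ambiguous"


# Hypothetical gene: parents differ 4-fold, hybrid alleles are balanced.
example = classify_regulation(2.0, 0.05, True, False, True)
```

In this sketch the hypothetical example gene is assigned to the trans category, because the parental difference disappears when both alleles share the same trans-acting environment in the hybrid.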
We find that most regulatory changes underlying differentially maternally deposited transcripts occurred in trans between each pair of species examined (Figure 3A and C, Figure S2), where a change in a transcription factor or other trans-acting regulatory element affects both alleles equally (shown in Figure 4A and B). In all pairs of comparisons, the proportion of trans changes was higher than any other category of changes. Interestingly, comparisons between D. simulans and D. mauritiana had the highest percentage of trans-only regulatory changes (between 49.4% and 50.8%) while comparisons between D. simulans and D. sechellia had a lower percentage of regulatory changes solely in trans (32.9%). The second highest proportion (between 15.0% and 26.7%) of regulatory changes between species at the maternal stage occurred only in cis regulation. Slightly fewer regulatory changes occur due to a combination of cis and trans acting factors (between 13.4% and 15.8% in all comparisons). Most genes that change in cis and trans are assigned to the cis + trans category, which indicates that the allele with higher expression in the parental lines is also preserved as the allele with higher expression in the hybrid. We also find a percentage of genes with conserved levels of maternal transcript deposition between species, between 16.4% and 25.1% in all crosses. D. simulans and D. sechellia have the highest percentage of conserved genes while D. simulans and D. mauritiana have the lowest percentage of conserved genes. We also find a small proportion of genes (between 4.2% and 4.7% in all comparisons) that have evolved compensatory mechanisms of regulation, where the genes are not differentially expressed between the parental samples but are differentially expressed in hybrids. This implies that while transcript levels are the same between species, regulatory changes have occurred, which then become visible in the environment of the hybrid. 
Genes with representative trans changes include regulators with critical functions in important processes governed by maternal gene products, such as Cdk1, a cell-cycle regulator necessary for the rapid cleavage cycles in early development (Farrell and O’Farrell 2014). Examples of individual genes with changes attributed to trans regulatory differences at the maternal stage are represented in Figure 4A and B.
As trans regulatory changes can affect numerous genetic loci, we focused on identifying trans regulators that may differentially affect the maternal transcript deposition of many genes between the species studied. In order to identify binding sites of trans factors that may differentially regulate maternally deposited transcripts, we used a computational approach to search for overrepresented motifs among genes that change in maternal transcript deposition and among those that have conserved transcript levels (see Methods). We looked at the upstream regions of all genes that change in regulation, not only those with differences in trans regulation, because genes with changes in cis regulation may have had changes that affect the binding of the same trans regulators. We used the upstream regions of genes in D. simulans and D. sechellia because the D. mauritiana genome is not as well annotated as those of the other two species in this study. Using HOMER and MEME (Bailey and Elkan 1994; Heinz et al. 2010), we find motifs associated with insulator binding in the upstream regions of genes changing in regulation (trans, cis, cis + trans, cis x trans, and compensatory) as well as in the genes expressed at this stage that do not change in regulation (conserved; see Methods; Table S4). The Dref/BEAF-32 binding site (BEAF-32 and Dref bind overlapping DNA sequences, Hart et al. 1999) is the most significantly enriched (Table S4), and these factors are annotated as insulators (Matzat and Lei 2014; Ali et al. 2016) and known to be present at topologically associated domain (TAD) boundaries (Liang et al. 2014; Ramírez et al. 2018). The binding site for M1BP also appeared significantly enriched both in the sets of genes that change in regulation and in those that are conserved in regulation across species (Table S4).
M1BP is involved in transcriptional regulation and RNA polymerase II pausing at the promoter of genes (Li and Gilmour 2013), which may also be associated with chromatin state (Ramírez et al. 2018). Transcript abundance data from our study indicate that Dref, BEAF-32, and M1BP are differentially maternally deposited in several between-species comparisons (Figure S4), although in certain crosses, hybrid reads mapped in a biased way to Dref, BEAF-32 and M1BP, and thus they were excluded from our regulatory analysis. As the motifs for these trans-acting factors are enriched in all maternal genes, they are likely important regulators of transcription during oogenesis, and therefore also likely targets of regulatory evolution between species.
Evolution of regulation for zygotically expressed genes
To determine the regulatory basis of changes in zygotic transcript abundance between species, we compared expression levels in late stage 5 parental species samples to late stage 5 hybrid samples and used PG models to identify cis and trans regulatory changes, similar to our maternal analysis (see Methods). We limited our analysis at the zygotic stage to two groups of genes: those that are zygotically expressed but not maternally deposited (zygotic-only), and those expressed at the zygotic stage at an 8-fold higher level than at the maternal stage (we will refer to these as mostly zygotic genes; see Methods).
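The gene filter described above can be expressed as a simple rule. The sketch below is illustrative only: the threshold for calling a transcript absent is defined in the Methods and is not restated here, so `detection_floor` is a placeholder value, and the function name, labels, and the choice to treat exactly 8-fold as passing are assumptions.

```python
def zygotic_class(stage2_level, stage5_level, detection_floor=1.0, fold=8.0):
    """Label a gene by its stage 2 (maternal) and stage 5 (zygotic) levels.

    detection_floor is a placeholder for the abundance at or below which a
    transcript is treated as absent; it is an assumption of this sketch.
    """
    if stage5_level <= detection_floor:
        return "excluded"  # not detectably expressed at the zygotic stage
    if stage2_level <= detection_floor:
        return "zygotic-only"  # expressed at stage 5, not maternally deposited
    if stage5_level / stage2_level >= fold:
        return "mostly zygotic"  # at least 8-fold higher at stage 5
    return "excluded"  # substantial maternal contribution remains


# Hypothetical abundances (arbitrary units) for three genes.
labels = [zygotic_class(0.0, 50.0), zygotic_class(5.0, 40.0), zygotic_class(5.0, 39.0)]
```

Only genes labeled zygotic-only or mostly zygotic would be carried into the stage 5 regulatory analysis under this rule.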
While we find that most genes change in trans regulation between D. simulans and its sister species at the maternal stage of development, we see different patterns of regulatory changes after ZGA. At the zygotic stage, differences in gene regulation between the three species examined occur mostly in both cis and trans, where changes in both types of regulatory elements affect the transcript level of single genes (Figure 3B and D, Figure S3). Using the framework outlined by previous studies (Landry et al. 2005; McManus et al. 2010; Coolon et al. 2014), we define changes in cis + trans as cases where changes in cis and in trans affect gene expression in the same direction. Here, the allele with higher expression in the hybrid comes from the parental line with the higher level of expression. Cis x trans interactions are described as cases where the changes in cis and in trans have opposing effects on gene expression and result in expression patterns where the allele with lower expression in the hybrid comes from the parental line with the higher level of expression. For zygotic genes at stage 5, changes in both cis and trans regulatory elements (either cis + trans or cis x trans interactions) account for expression differences in between 39.0% and 46.7% of genes in our between-species comparisons. We also see a higher proportion of these interactions occurring in a cis + trans pattern (between 29.4% and 35.0% of all genes) as opposed to a cis x trans pattern (between 9.1% and 11.7% of all genes) of regulatory interactions. Cis-only and trans-only changes account for a smaller number of differences in gene expression levels at this stage in development. In all comparisons, we find between 14.8% and 21.0% of genes changing only in trans regulation. There are between 16.2% and 29.9% of genes that change only in cis regulation between each pair of species compared at this stage in development. 
Compared to the maternal stage, we find a larger proportion of genes with compensatory changes (between 7.4% and 9.9% of all genes) in gene regulation and a smaller proportion of genes that are conserved (between 6.3% and 8.3% of all genes) between each pair of species comparisons. The smaller number of genes with conserved transcript levels at the zygotic stage compared to the maternal stage is consistent with earlier findings showing maternal transcripts to be more highly conserved between species than zygotic transcripts (Atallah and Lott 2018). Examples of evolved changes include regulators critical to important early zygotic processes, such as gap gene Kruppel and pair-rule gene Sloppy paired 1, which are required for segmentation along the anterior-posterior axis (Nüsslein-Volhard and Wieschaus 1980; Grossniklaus et al. 1992) (Figure 4C and D).
In contrast to what we have found for regulation at the maternal stage, where transcription may be broadly determined by chromatin boundaries, regulation at the zygotic stage can be gene or pathway specific and involve transcription only in a spatially localized subset of cells (Jäckle et al. 1986; St Johnston and Nüsslein-Volhard 1992). As such, if a trans regulator changed at the zygotic stage, it may affect genes involved in a specific developmental process. For these reasons, we wanted to ask if changes in zygotic trans regulation affected particular types or categories of genes. We used PANTHER (Mi et al. 2019) to perform gene ontology (GO) analysis on genes changing only in trans regulation in each pairwise comparison of species at stage 5 (see Methods). The significantly enriched GO categories for molecular function and biological process categories that are shared amongst two or more crosses are shown in Figure 5 and Table S5. Identifying GO categories over multiple crosses identifies the types of genes that evolve changes repeatedly over evolution. Shared categories include those related to DNA binding, positive regulation of transcription by RNA polymerase II, cell fate determination, and several developmental categories. We also find biological process categories unique to a specific cross (Figure S5 and Table S5). Again, many of these categories are related to particular developmental processes, consistent with what is known about regulation of transcription at the zygotic stage.
Modes of Inheritance in Hybrids
Misexpression in hybrid offspring has been used to examine regulatory incompatibilities that may contribute to speciation (Michalak 2003; Moehring et al. 2007; Mack et al. 2016). Total expression levels in the hybrid (the sum of the expression of the two species-specific alleles) can be similar to one parental species or the other, or they can deviate from both parental levels entirely. Genes with much higher or lower expression levels than either parental species are considered misexpressed, and can be sources of regulatory incompatibilities between two species. We use methods developed by Gibson, et al., 2004 to define conserved, misexpressed, dominant, and additive inheritance, although many studies use somewhat different metrics to categorize modes of inheritance, making it difficult to compare findings across the field. As in previous studies (Gibson et al. 2004; Landry et al. 2005; McManus et al. 2010; Coolon et al. 2014), we use a conservative fold change of 1.25 (log2-fold change of 0.32) to define those genes that do not change in transcript abundance between genotypes. Yet even with this conservative cutoff, we find a substantial proportion of genes with conserved transcript levels at each developmental timepoint. Cases where transcript levels are higher (overdominant) or lower (underdominant) in the hybrid relative to either parental species are categorized as misexpressed. In other cases, the total transcript abundance in the hybrids is more similar to one parent versus the other. Here, we categorize the parental line with expression most similar to the hybrid as dominant. Expression in the hybrid can also have a level intermediate to both parental species (additive).
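The categories described above can be sketched as a decision rule on log2 fold changes between the hybrid and each parental line. This is an illustration under stated assumptions rather than the code used in this study: the function name and labels are hypothetical, expression values are assumed to be strictly positive (real data may need a pseudocount), and values falling exactly on the 1.25-fold boundary are treated as unchanged.

```python
from math import log2

THRESHOLD = log2(1.25)  # about 0.32, the fold-change cutoff described above


def inheritance_mode(parent_a, parent_b, hybrid, threshold=THRESHOLD):
    """Classify total hybrid expression relative to the two parental lines."""
    d_a = log2(hybrid / parent_a)  # hybrid versus parent A
    d_b = log2(hybrid / parent_b)  # hybrid versus parent B
    near_a = abs(d_a) <= threshold
    near_b = abs(d_b) <= threshold
    if near_a and near_b:
        return "conserved"
    if near_a:
        return "dominant (parent A)"
    if near_b:
        return "dominant (parent B)"
    if d_a > 0 and d_b > 0:
        return "overdominant"  # misexpressed: higher than both parents
    if d_a < 0 and d_b < 0:
        return "underdominant"  # misexpressed: lower than both parents
    return "additive"  # intermediate between the two parents


# Hypothetical expression levels (arbitrary units).
mode = inheritance_mode(parent_a=100.0, parent_b=200.0, hybrid=105.0)
```

In the hypothetical example the hybrid is within 1.25-fold of parent A but not of parent B, so the gene is assigned dominance of parent A.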
Strikingly, while many genes show conservation of expression levels between parental species and in the hybrids at both developmental stages, we find a much higher percentage of conserved transcript levels between parents and hybrids for genes that are maternally deposited (Figure 6). We find a high proportion of genes with conserved transcript levels at stage 2 in all crosses, between 15.4% and 31.4% of all genes. In contrast, in stage 5 crosses we find conserved transcript abundance in between 4.3% and 7.8% of all genes that are either zygotic-only or are mostly zygotic (see Methods for definitions). While there is a large difference in the percentage of conserved genes between the two stages, our stage 5 analysis is limited to those genes with much higher expression at the zygotic stage in comparison to the maternal stage of development. There may be more genes that are mostly zygotic or zygotic-only that are misregulated at this stage in development relative to all of the genes that are expressed at stage 5.
Of the genes that do not have conserved transcript levels in the stage 2 hybrids, most genes show dominance to one parental species or the other (Figure 6). Few genes show additive patterns of inheritance, in keeping with findings from other studies (Gibson et al. 2004; McManus et al. 2010). We also report a fraction of misexpressed genes as being over- or under-dominant in the hybrids in relation to expression levels in the parental lines. Of those genes that are dominant at stage 2, a higher proportion have D. simulans-like expression (in any cross involving D. simulans) in comparison to the proportion that have expression more like the other parental line in the cross. In the D. simulans/D. mauritiana stage 2 comparison, between 39.0% and 39.7% of all genes had transcript levels that were more similar to the D. simulans stage 2 samples than the D. mauritiana stage 2 samples. In contrast, between 11.2% and 12.4% of all genes in the hybrid have transcript levels more like the D. mauritiana stage 2 samples in this comparison. We find that D. mauritiana has the least dominance in any cross involving this species. In the D. sechellia/D. mauritiana comparisons at stage 2, we find that 14.3-14.8% of transcript levels in the hybrid look more like the D. mauritiana samples while 24.0-26.4% of transcript levels in the hybrid look more like the D. sechellia samples. In the comparison with D. simulans and D. sechellia, more genes in the hybrid had transcript levels similar to the D. simulans parental species (26.3%) than the D. sechellia parental species (12.6%). Taken together, our findings indicate that D. simulans has the most dominant effect on gene expression at the maternal stage in development in these comparisons, while D. mauritiana has the least dominant effect, with dominance in D. sechellia falling between the other two species.
We find similar patterns of inheritance in comparisons of stage 5 samples with a few notable exceptions (Figure 6). Of the mostly zygotic and zygotic-only genes that are not conserved in the hybrids at stage 5, many show a similar pattern of dominant expression where the hybrid expression is more similar to one parental species or the other. As in the maternal stage (stage 2), crosses with D. simulans have a higher percentage of genes with expression in the hybrid more similar to D. simulans than to the other parental species in the cross. For example, in crosses between D. simulans and D. mauritiana, between 22.3% and 26.8% of all genes have an expression level similar to D. simulans while only between 7.9% and 12.1% of all genes have an expression level more similar to D. mauritiana. In the D. simulans/D. sechellia comparison, more genes have an expression level in the hybrid more similar to D. simulans parental samples (24.2%) than to D. sechellia (17.2%). Similar to the stage 2 comparisons, D. sechellia exerts the second most dominant effect in the hybrids, with D. simulans having the most dominant effect. In crosses with D. sechellia and D. mauritiana, between 23.7% and 26.1% of genes show D. sechellia dominance while only between 11.5% and 13.2% of genes show D. mauritiana dominance. Interestingly, in crosses where D. mauritiana is the maternal species, there is a slightly higher percentage of genes showing D. mauritiana dominance (12.1% versus 7.9% in the crosses involving D. simulans and although less striking, 13.2% versus 11.5% in the crosses involving D. sechellia). This is likely due to the interaction of residual maternal factors from D. mauritiana that have not yet degraded by stage 5 with gene products from alleles of the paternal species expressed after zygotic genome activation. 
In contrast to stage 2, we find more genes that have an additive mode of inheritance or that are misexpressed in the hybrids at stage 5 (between 27.9% and 45.7% of all genes in stage 2 crosses vs. between 50.8% and 61.3% in stage 5 crosses). Previous studies indicate that additive inheritance is associated with cis regulatory divergence (Lemos et al. 2008; McManus et al. 2010). This is in keeping with our findings that a larger proportion of genes at the zygotic stage have expression divergence due, in part or wholly, to cis regulatory changes and that more zygotic genes show an additive pattern of inheritance.
DISCUSSION
When looking at both regulatory changes and modes of inheritance, we find more genes with conserved transcript levels among those that are maternally deposited relative to those that are zygotically transcribed. This is in agreement with previous studies that identified high conservation of maternally deposited transcripts between species (Heyn et al. 2014; Atallah and Lott 2018) and indicates that the maternal stage is highly conserved. The post-ZGA stage examined here is more complex. While widespread ZGA has occurred, there is still a large proportion of maternal transcripts remaining (while prior studies have varying estimates, roughly 50% of total transcript pool at late stage 5 is maternally derived; Tadros et al. 2007; De Renzis et al. 2007; Thomsen et al. 2010; Lott et al. 2014). In order to examine how zygotic genes change in regulation, it was necessary to focus on genes without a large maternal contribution to transcript levels at the zygotic stage in development. However, sampling a subset of zygotic genes may have eliminated genes with conserved transcript levels between species. Many genes that are expressed at the zygotic stage are involved in regulating conserved processes and housekeeping functions. Transcript levels for genes involved in these processes may be less susceptible to change over the course of evolution but as many are maternally deposited, they were removed from this analysis. Further, the large proportion of genes with conserved transcript levels at the maternal stage may be unexpected considering that there is substantial post-transcriptional regulation of maternally deposited factors (Tadros et al. 2007; Rouget et al. 2010; Barckmann and Simonelig 2013), so it is not clear that a high degree of conservation at the transcript level should be necessary. Further study is needed to disentangle conservation at the transcript and protein levels of maternal factors across species.
In addition to finding many genes with conserved transcript abundance between species, we find a large proportion of genes with dominant patterns of inheritance at the maternal and the zygotic stages. At both developmental timepoints, D. simulans has the most dominance, where the highest proportion of genes with dominant inheritance in the hybrid is most similar in expression to the D. simulans parental line, in crosses involving D. simulans. The second largest category of genes with dominant patterns of inheritance in the hybrids has expression levels most similar to the D. sechellia parental line. D. mauritiana exerts the least amount of dominance in terms of inheritance patterns in the hybrids. The dominant patterns of inheritance are in keeping with the high number of trans regulatory changes we see, as previous work identified trans changes to be more likely to be associated with dominance (Lemos et al. 2008). Other work with these species also identified many genes with dominant patterns of expression in hybrids relative to parental lines (Moehring et al. 2007; McManus et al. 2010; Coolon et al. 2014). We find more genes in hybrids with transcript levels more closely matching those in D. simulans in comparison to D. mauritiana, similar to previous work (Moehring et al. 2007). When comparing D. simulans and D. sechellia, we see a higher proportion of genes in hybrids with total expression levels similar to D. simulans. This contrasts with the pattern observed in a previous study, which found equal levels of dominance from both D. simulans and D. sechellia in the hybrid (Moehring et al. 2007). Our findings and those from previous studies may differ in the D. simulans and D. sechellia comparison because different developmental stages were examined in each study.
Differences in the species compared and developmental stages or tissue types examined across studies make it difficult to draw conclusions from direct comparisons, but may indicate something fundamental about how gene expression is regulated or evolves at different developmental stages or in different tissues.
Patterns of gene regulatory changes differ greatly between the developmental stages examined, even though the two stages have similar modes of dominant inheritance. We find an overwhelming number of trans regulatory changes that result in differential maternal transcript deposition between the species examined. The biological and regulatory context differs between the two stages and may explain why patterns of gene regulatory evolution are different at the developmental timepoints we examined. Maternal transcripts are produced by support cells called nurse cells during oogenesis, and either transported by actin-dependent mechanisms or dumped into the oocyte along with the cytoplasmic contents of the nurse cells upon apoptosis of these cells (Kugler and Lasko 2009). Many aspects of maternal provisioning have been well-studied in D. melanogaster, including transport of transcripts into the oocyte (Mische et al. 2007), localization of transcripts within the oocyte (Theurkauf and Hazelrigg 1998), post-transcriptional regulation of maternal gene products (Salles et al. 1994), and subsequent degradation of maternal transcripts (Tadros et al. 2007; Bushati et al. 2008; Laver et al. 2015). Surprisingly, how transcription is regulated in the nurse cells is not well understood. Nurse cells are polyploid and transcribe at a high level to provide the oocyte with the large stock of transcripts needed (we extract >100 ng of total RNA at both stages), which is exceptional considering the oocyte begins as essentially a single cell (though a highly specialized one). We have found here, and also through another study using computational methods to look for binding motifs in maternal factors across the Drosophila genus (Omura CS and Lott SE. The conserved regulatory basis of mRNA contributions to the early embryo differs between the maternal and zygotic genomes. in prep.; submitted as a supplemental file for initial submission), that maternal transcription is associated with factors annotated to be insulators and that interact with topologically associated domain boundaries. These two studies both provide evidence that maternal transcription is being controlled broadly at the level of chromatin state. In this context, a few trans regulators are predicted to be responsible for the bulk of maternal transcription, and thus changes in the levels of trans regulators at this stage may easily be responsible for changes in transcription level for a number of genes. How changes in trans regulators affecting chromatin state can have subtle quantitative effects on transcript level, as observed, seems less clear. Functional investigations of mechanisms of regulatory control at the maternal stage are ongoing.
In contrast to the large proportion of regulatory changes in trans at the maternal stage, differences in zygotic gene transcription between these species are characterized by a combination of changes in cis, trans, cis + trans, and cis x trans. Zygotic gene transcription is spatially and temporally regulated across the embryo with enhancer regions playing a large role in where and when genes are expressed (Haines and Eisen 2018). Changes in trans regulation, which can affect the expression of many genes, may be detrimental to the developing organism at this stage, whereas changes in cis regulation are gene-specific and may only affect gene expression in a subset of the embryo. Fundamental differences in the regulatory landscape of the maternal and zygotic stages likely explain why the evolution of gene expression occurs through different mechanisms for transcripts that are maternally deposited and genes that are zygotically transcribed.
In this study, we find that differences between species in transcript levels of maternally deposited transcripts and zygotically transcribed genes evolve via different patterns of regulatory change. We find that maternal transcript abundance is more conserved and also changes more readily through trans regulation in comparison to zygotic complements. Regulatory organization and constraints that are specific to each developmental stage are likely to play a large role in determining how gene regulation can evolve at these two embryonic timepoints. Further study is needed to characterize the molecular basis of evolved changes in transcript level on a single gene basis, and more generally what is controlling the regulatory landscape at each stage in development.
ACKNOWLEDGMENTS
We would like to thank members of the Lott lab for comments on the manuscript, as well as members of the UC Davis fly community for their contributions. We would also like to thank Amanda Crofton and Anthony Le for their assistance with genotyping and RNA extraction and Graham Coop for helpful discussions. This work was supported by the National Institute of General Medical Sciences of the National Institutes of Health grant R01GM111362 and the Floyd and Mary Schwall Fellowship.
Footnotes
Data availability: The raw sequencing reads and processed data are available at GEO/NCBI at accession number (submitted, will add when available).