Abstract
Some stalk-eyed flies in the genus Teleopsis carry selfish genetic elements that induce sex ratio (SR) meiotic drive and affect the fitness of male and female carriers. Here, we present a chromosome-level genome assembly of the stalk-eyed fly T. dalmanni and use it to characterize the pattern of genomic divergence associated with the presence of drive elements. We find evidence for multiple nested inversions on the sex ratio haplotype and for widespread differentiation and divergence between the two inversion types along the entire X chromosome. In addition, the genome contains tens of thousands of transposable element (TE) insertions and hundreds of transcriptionally active TE families that have produced new insertions. Moreover, many TE families are expressed at significantly higher levels in the testes of SR males, suggesting a molecular connection between these two types of selfish genetic elements in this species. We identify T. dalmanni orthologs of genes involved in genome defense via the piRNA pathway, including the core members maelstrom, piwi and Argonaute3, that are diverging in sequence, expression or copy number between the SR and standard (ST) X chromosomes and that likely influence TE regulation in flies carrying a sex ratio X chromosome.
Introduction
The genome was once thought to be little more than a blueprint for accomplishing the biological functions and, ultimately, the reproduction of an organism. Research in past decades has demonstrated that the genomes of most organisms are colonized by selfish genetic elements (SGEs) with their own evolutionary interests (Werren et al. 1988; Burt and Trivers 2006; Werren 2011; McLaughlin and Malik 2017). The most well-studied category of SGE, and arguably the most impactful, is transposable elements (TEs). TEs comprise at least 50% of the human genome (International Human Genome Sequencing Consortium 2001), but most are inactive. In the model dipteran Drosophila melanogaster, TEs make up about 20% of the genome (Hubley et al. 2016), but up to one-third actively produce new insertions (McCullers and Steiniger 2017). Transposable elements harm their hosts in a variety of ways, such as disrupting functional elements in the genome, causing ectopic recombination because of their repetitive nature (Langley et al. 1988), repressing the expression of nearby genes (Sienski et al. 2012; Lee 2015; Lee and Karpen 2017), and increasing the costs of DNA replication and storage (Badge and Brookfield 1997). Like any type of mutation, TE insertions can occasionally be beneficial (Miller et al. 1997; Sinzelle et al. 2009; Jangam et al. 2017), but the overall negative impact of TEs on organismal fitness is large enough that hosts have repeatedly evolved diverse genomic defenses against them (Selker 1990; Aravin et al. 2001; Muckenfuss et al. 2006; Vagin et al. 2006; Wolf and Goff 2009). Genes in the piwi-interacting RNA (piRNA) pathway are a primary line of defense against TEs in the metazoan germline. The pathway adapts to target active elements by incorporating transcriptionally active repetitive sequences into piRNA loci, which are then transcribed and processed into piRNAs that silence their parent elements (reviewed in Aravin et al. 2007).
Transgenic constructs carrying novel TEs can quickly become incorporated into piRNA loci (Le Thomas et al. 2013), and during the recent spread of P elements in Drosophila, new insertions into piRNA loci have occurred many times in response to the new genomic threat (Zhang et al. 2020).
Other well-studied SGEs are meiotic drivers, variously known as segregation distorters or transmission-ratio distorters. These elements spread by manipulating gametogenesis in their favor, at a cost to the homologous locus, so that the driver is present in more than 50% of mature gametes (Lyttle 1991). If a driver is sex-linked, the sex ratio of its carrier's offspring will be skewed. Such sex ratio distortion (SRD) may cause population collapse or even extinction (Hamilton 1967), but drivers can be maintained stably (reviewed in Jaenike 2001; Lindholm et al. 2016) if their costs to carrier fitness counterbalance the transmission advantage of drive (Curtsinger and Feldman 1980). Drive elements can also persist if their action is suppressed (eliminating SRD); however, enhancers may also evolve, potentially leading to a cyclical arms race between enhancers and suppressors of drive (Hall 2004).
If segregation distortion requires the combined action of multiple loci, inversions that maintain linkage among them will be favored. Such drive-associated inversions are common, particularly for autosomal drivers, where linkage disequilibrium between the driver and its target must be maintained. Some X-linked distorters also involve large regions of recombination suppression and differentiation relative to the standard (ST) chromosome arrangement (Wu and Beckenbach 1983; Dyer et al. 2007; Paczolt et al. 2017; Fuller et al. 2020), while others are freely recombining single loci (Montchamp-Moreau et al. 2006). Like any inversion, a drive-associated inversion can recombine when homozygous, but depending on its frequency or fitness effects, recombination may occur rarely or not at all. A reduced rate of recombination slows or prevents the purging of new deleterious variation (Muller 1964) and may also reduce overall nucleotide diversity, because the hitchhiking effects of positive or background selection are stronger when recombination rates are low (Smith and Haigh 1974; Charlesworth et al. 1993). Ultimately, like TEs, meiotic drivers persist despite typically reducing the fitness of the individuals carrying them, although theoretical work suggests that populations with X-linked drive could prevail during interspecific competition because the production of excess females raises their reproductive rate (Unckless and Clark 2014).
Multiple lines of evidence raise the possibility that these two categories of selfish genetic elements do not always spread independently. Most well-characterized meiotic drive systems involve repetitive or satellite DNA, either in the target of drive (Wu et al. 1988; Tao et al. 2007a,b; Helleu et al. 2016) or in both the driver and its target (Hurst 1996; Cocquet et al. 2012). For example, Responder, the target of Segregation Distorter (SD) in Drosophila melanogaster, is composed of complex arrays of 240 bp repeats, and more copies lead to stronger segregation distortion (Wu et al. 1988; Houtchens and Lyttle 2003; Larracuente 2014). In addition to TEs, piRNAs may also target such satellite DNA for silencing: small RNAs matching the Responder repeats are associated with the piRNA pathway proteins Aubergine and Argonaute3 (Nagao et al. 2010), and knocking out Aubergine in an SD background enhanced the strength of distortion by the driver (Gell and Reenan 2013), suggesting that proper silencing of the Responder satellite may suppress drive. The first piRNA locus to be characterized, Suppressor of Stellate (Aravin et al. 2001), likely originated as a suppressor of meiotic drive (Hurst 1996) but now acts to silence TEs. Drivers and TEs, then, may share a common enemy in genome defense and could facilitate one another's propagation at the expense of the host, similar to processes that may occur among co-infecting organismal parasites (Krasnov et al. 2005; Karvonen et al. 2009).
Here, we analyze the impacts of selfish genetic elements within the genome of a stalk-eyed fly (Teleopsis dalmanni). In this species, 10-30% of X chromosomes actively drive against the Y chromosome, so that carrier males produce 90% or more daughters (Presgraves et al. 1997). This sex ratio (SR) X chromosome has an array of positive and negative effects on fitness (Wilkinson et al. 2006; Finnegan et al. 2019; Meade et al. 2019), including reduced size of the sexual ornament (eyespan) in SR males (Wilkinson et al. 1998; Johns et al. 2005; Cotton et al. 2013). Based on a three-locus comparison to an outgroup species, the SR X chromosome (XSR) is old, having originated approximately 500 kya (Paczolt et al. 2017), and hundreds of genes, mostly X-linked, are differentially expressed in SR males (Reinhardt et al. 2014). XSR contains at least one large chromosomal inversion relative to the standard arrangement (XST), and likely more, because recombination has not been detected in XSR/XST females (Johns et al. 2005; Paczolt et al. 2017). Although XSR haplotypes do recombine in homozygous females, the rate of recombination is about half that in females homozygous for XST (Paczolt et al. 2017), which has led to drastically reduced polymorphism on XSR (Christianson et al. 2011).
To provide a detailed view of how the presence of drive elements has influenced patterns of divergence and of genome evolution, such as the distribution of TEs, we combined a comprehensive genome assembly with resequencing of sex ratio and standard individuals. Using a chromosome-level assembly of a T. dalmanni genome carrying XST, together with both short- and long-read resequencing, we find that inversions span the X chromosome and that multiple TE families are expressed at higher levels in testes from sex ratio males, indicating weaker TE control in SR males. In addition, multiple genes associated with piRNA function and heterochromatin regulation are differentially expressed, have acquired novel duplicates, or are accumulating sequence differences, including the core TE-defense factors Argonaute3 (Brennecke et al. 2007) and maelstrom (Findley et al. 2003).
Results
The Teleopsis dalmanni genome contains 10x more transposable element insertions than Drosophila
The genome of Teleopsis dalmanni, a stalk-eyed fly from Southeast Asia, was assembled with MaSuRCA (Zimin et al. 2013) from hybrid sequencing data comprising Pacific Biosciences RSII long reads (P6-C4 chemistry) and Illumina PCR-free short reads (Table S1). The MaSuRCA assembly was then scaffolded using chromatin conformation information (Hi-C; Lieberman-Aiden et al. 2009). This produced three chromosome-length scaffolds with a total size of 583.1 Mb, comprising 93.9% of the MaSuRCA assembly. We validated this assembly by comparison to an independently generated Multiplexed Shotgun Genotyping (MSG; Andolfatto et al. 2011) linkage map produced from a backcross family in a QTL study that crossed two diverged populations (Wilkinson et al. 2014). The order of X-linked scaffolds mapped by both methods was strongly correlated (R² = 0.94, p < 0.0001; Figure S1), providing high confidence in the integrity of the Hi-C scaffolding. We identified a 12 Mb region that appears to be inverted between the assembly and the MSG map, which could reflect an assembly error or an inversion difference between the two populations used for the backcross. BUSCO (Seppey et al. 2019) analysis confirmed the presence of 96% of 2,799 conserved dipteran genes in the entire assembly (Table S2) and 95.8% within the three chromosomal scaffolds, with 9.3% of BUSCO genes duplicated. Consistent with previous analysis (Baker and Wilkinson 2010), comparing the positions of 1-to-1 Drosophila melanogaster orthologs reveals extensive synteny between the T. dalmanni chromosome arms (Figure 1A) and the Muller elements. Overall, 88.2% of genes are located on the same Muller element in the two species (Figure 1B). As previously reported (Baker and Wilkinson 2010; Vicoso and Bachtrog 2015), the Teleopsis X chromosome is orthologous to chromosome 2L in D. melanogaster (Muller element B).
The smaller autosome, previously referred to as “C1”, consists of Muller elements D and A (chromosomes 3L and X in D. melanogaster), while the larger autosome (“C2”) contains Muller elements C, F and E, in that order (chromosomes 2R, 4 and 3R, respectively, in D. melanogaster). We also produced a draft assembly of a closely related, undescribed cryptic species (Paczolt et al. 2017) that we refer to as Teleopsis dalmanni sp. 2 (Td2). This assembly contains 50,545 scaffolds and is less complete (BUSCO = 88.5%, N50 = 35,545) than the T. dalmanni s.s. assembly.
We identified TEs in the T. dalmanni s.s. genome using RepeatModeler (Smit and Hubley 2008) and annotated them with RepeatMasker (Smit et al. 2013) (Figure 1B). Compared to Drosophila melanogaster, putative TEs cover more of the stalk-eyed fly genome (~26.1% versus ~13.5%). About 8.4% of the genome comprises unclassified interspersed elements, while the rest comprises 351 families from 27 superfamilies: Class II (DNA) elements (4.4% of the genome) and Class I retroelements, including LINE (9.8%) and LTR (3.4%) elements, but no SINE elements. Given the size of the T. dalmanni genome, this amounts to 27-fold as many TE insertions as in D. melanogaster and 9.5-fold as many as in the malaria mosquito Anopheles gambiae. The most abundant TE superfamilies in T. dalmanni include the LOA, Jockey and RTE-BovB non-LTR (LINE) elements (86,552, 39,370 and 26,607 copies, respectively), TcMar-Mariner Class II (DNA) elements (33,668 copies) and Gypsy LTR Class I elements (19,292 copies). We aligned male and female genomic resequencing data (SRS2309195-SRS2309198) to our TE families to identify transposable elements with male-biased abundance and thus putative Y-linkage. One Penelope family element had 77-fold higher normalized sequence coverage in the male library. Because this LINE element is also found on the X and autosomes, this pattern represents an expansion of male-specific (Y-linked) copies. We also found an unclassified interspersed element that was similarly amplified on the Y (Table S5).
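The male-biased coverage screen can be sketched as a ratio of library-size-normalized read counts per TE family. This is a hypothetical illustration: the function name, the normalization by total mapped reads and the read counts are assumptions, not the pipeline used here. Under this normalization, autosomal families are expected near 1, X-linked families near 0.5 in a male:female comparison, and Y-amplified families far above 1.

```python
def male_female_ratio(male_reads, female_reads, male_total, female_total):
    """Male:female ratio of read counts normalized by total mapped reads.

    Equivalent to (male_reads / male_total) / (female_reads / female_total),
    rearranged to limit intermediate floating-point rounding.
    Returns infinity when the family has no female reads.
    """
    if female_reads == 0:
        return float("inf")
    return (male_reads * female_total) / (female_reads * male_total)


# Hypothetical read counts per TE family: (male reads, female reads).
families = {
    "autosomal_like": (100, 200),
    "y_amplified": (7700, 200),
}
MALE_TOTAL, FEMALE_TOTAL = 1_000_000, 2_000_000  # hypothetical library sizes

for name, (m, f) in families.items():
    print(name, male_female_ratio(m, f, MALE_TOTAL, FEMALE_TOTAL))
# autosomal_like 1.0
# y_amplified 77.0
```

A family that is also present on the X and autosomes, like the Penelope element described above, can still show a high ratio if male-specific copies vastly outnumber the shared ones.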
The sex ratio X is differentiating due to multiple overlapping inversions
Using PacBio long reads from sex ratio male siblings, we identified six large chromosomal inversions specific to the SR haplotype (Figure 2). These inversions (which we validated by comparison with short-read pool-seq data; Table S3) span the entire 100 Mb chromosome except for a small region at the proximal end (0-1.76 Mb) and the region between 61 Mb and 64 Mb. Many of the inversions overlap, particularly near the proximal end, which likely suppresses crossing over between XSR and XST. Alignment of the Td2 draft genome across the inversion breakpoints showed that the Td2 scaffolds were consistent with the ST arrangement in five of six cases, indicating that in these cases the derived arrangement arose and spread in the SR lineage. Only for inversion 2 (2.2 Mb to 19.8 Mb) does the SR arrangement appear to be present in Td2, implying that the ST arrangement is the derived state for this inversion (Table S3).
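The outgroup logic and the coverage statement can each be expressed in a few lines. This sketch assumes each inversion is summarized by the arrangement (ST or SR) shared by the Td2 scaffolds and by start and end coordinates in Mb. Apart from inversion 2 (2.2-19.8 Mb), the coordinates below are hypothetical placeholders chosen only so that the uncovered regions match those reported, and the function names are illustrative.

```python
def polarize(outgroup_state):
    """Infer the derived arrangement of each inversion by outgroup comparison.

    outgroup_state maps an inversion label to the arrangement ("ST" or "SR")
    that the outgroup shares. By parsimony, the shared arrangement is taken
    as ancestral and the other one as derived.
    """
    derived = {}
    for inversion, shared in outgroup_state.items():
        if shared not in ("ST", "SR"):
            raise ValueError(f"unexpected arrangement {shared!r} for {inversion}")
        derived[inversion] = "SR" if shared == "ST" else "ST"
    return derived


def uncovered(intervals, chrom_len):
    """Return the parts of [0, chrom_len] spanned by none of the intervals."""
    gaps, covered_to = [], 0.0
    for start, end in sorted(intervals):
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    if covered_to < chrom_len:
        gaps.append((covered_to, chrom_len))
    return gaps


# Hypothetical inversion coordinates (Mb); only inv2 (2.2-19.8) comes from the text.
inversions = {
    "inv1": (1.76, 30.0), "inv2": (2.2, 19.8), "inv3": (5.0, 45.0),
    "inv4": (25.0, 61.0), "inv5": (64.0, 90.0), "inv6": (80.0, 100.0),
}
td2_shares = {name: "ST" for name in inversions}
td2_shares["inv2"] = "SR"  # the one case where Td2 matches the SR arrangement

print(polarize(td2_shares))
print(uncovered(inversions.values(), 100.0))  # [(0.0, 1.76), (61.0, 64.0)]
```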
Using resequencing data from pools of field-derived sex ratio and standard males, classified by sex ratio screening (Paczolt et al. 2017), we analyzed patterns of genetic variation on the autosomes and the X chromosome. As expected, nucleotide diversity and differentiation (Figure 2, Figure S2) did not differ on the autosomes between pools of SR and ST males from the same collection site, although there was minor genome-wide differentiation between the two collection sites. Also as expected, genomic coverage in females was approximately twice that in males across the X but not the autosomes (Figure S2). All three chromosomes contain regions with reduced pairwise nucleotide diversity in all pools. On the two autosomes, these regions lie near the center of the chromosome, where the Muller elements transition, and so presumably mark the centromeres of these metacentric chromosomes. Given a similar pattern at the proximal end of the X chromosome (Figure 2, Figure S2), we infer that this region represents the centromere of the telocentric X chromosome. In other fly genomes, reduced polymorphism is associated with reduced rates of crossing over near centromeres (Begun and Aquadro 1992; Begun et al. 2007). XST has less nucleotide diversity (0.01547 ± 0.000053 CI) than the autosomes (0.01960 ± 0.000028). However, consistent with previous findings (Christianson et al. 2011), XSR has drastically lower diversity (0.00606 ± 0.000039) (Figure 2).
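As a rough illustration of how window-level nucleotide diversity can be obtained from pooled allele counts, the sketch below computes a sample-size-corrected heterozygosity at each variable site and averages it over all callable bases in a window. It is a simplification under stated assumptions: it treats read depth as the number of sampled chromosomes and omits the pool-size and coverage corrections that dedicated pool-seq tools apply. The function names and counts are hypothetical.

```python
def site_pi(allele_counts):
    """Sample-size-corrected heterozygosity at one site.

    allele_counts: read counts for each allele observed at the site.
    Simplification: treats the read depth as the number of sampled chromosomes.
    """
    n = sum(allele_counts)
    if n < 2:
        raise ValueError("need at least two reads at a site")
    sum_sq = sum((c / n) ** 2 for c in allele_counts)
    return n / (n - 1) * (1.0 - sum_sq)


def window_pi(sites, callable_bases):
    """Mean diversity per base: summed site diversity divided by callable bases.

    Monomorphic sites contribute zero, so only variable sites need to be listed,
    but callable_bases must count every base that passed filters.
    """
    return sum(site_pi(counts) for counts in sites) / callable_bases


# Hypothetical window: two variable sites within 1,000 callable bases.
print(window_pi([[5, 5], [18, 2]], 1_000))
```

Dividing by all callable bases, rather than by variable sites only, is what makes the result comparable across chromosomes with very different levels of polymorphism, such as XSR and XST.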
Genetic differentiation (FST) between sex ratio types (XSR vs. XST) is elevated across the X chromosome relative to differentiation between collection sites or between replicate pools from the same site (Figure 2). Differentiation is further elevated near the centromere, where overlapping inversions are densest. However, FST estimates are expected to be inflated where diversity is low and rare variants are common, as is the case near the centromere. Across the X, overall sequence divergence (percent difference, i.e., 100 minus %ID) between X-chromosome consensus sequences from each pool is higher between sex ratio types (XSR vs. XST comparisons, 1.78%) than within sex ratio types (XSR vs. XSR and XST vs. XST comparisons, 0.80%), though both are lower than in comparisons with the sister species T. dalmanni sp. 2 (Td2) (2.76%). Divergence (dxy with K2P correction) between the XSR and XST lineages (Paczolt et al. 2017) is elevated across the X chromosome, but unlike FST, it shows no substantial elevation near the centromere (Figure 2). In contrast to reports from other systems (Machado et al. 2007; Korunes and Noor 2019), we did not detect significantly elevated divergence near inversion breakpoints (XST vs. XSR dxy = 0.0119 within 1 Mb of breakpoints versus 0.0118 in other regions). Reduced divergence toward the center of an inversion is thought to result from gene flux between heterokaryotypes, facilitated by double crossovers. However, for overlapping and nested inversions, double crossovers that do not lose genetic material should be extremely rare. This may explain why XSR vs. XST divergence remains elevated across the T. dalmanni X.
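The K2P correction adjusts observed differences for multiple substitutions at the same site, treating transitions and transversions separately: with P the proportion of transition differences and Q the proportion of transversion differences, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). A minimal sketch for one pair of aligned consensus sequences follows. The function name is illustrative, skipping gapped or ambiguous positions is an assumption, and the dxy calculation across pools used here may differ in detail.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Positions where either sequence has a gap or an ambiguous base
    (anything other than A, C, G or T) are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if an argument is <= 0 (saturated divergence).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))  # one A->G transition in 10 sites
```

For small divergences such as those reported here (around 1%), the corrected and uncorrected distances differ only slightly; the correction matters more for comparisons involving the outgroup.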
Hundreds of genes have sequence, copy number and expression differences between XSR and XST
Using the pool-seq data from XSR and XST males described above together with published XSR and XST RNA-seq data (Reinhardt et al. 2014), we jointly analyzed patterns of genetic divergence (copy number variation and dN/dS between XSR and XST pools) and differential expression (DE) associated with sex ratio. We identified 652 DE genes (43.7% more highly expressed in XSR) and 185 genes with significantly different genomic coverage (DC genes; 63.2% with higher coverage on XSR). Among the 58 genes that were both DE and DC, the direction of DE was strongly associated with the direction of differential coverage (Table S4; FET p < 0.0001), with only 6 showing the opposite pattern (e.g., higher genomic coverage in SR but higher expression in ST). This result suggests that copy number differences affect expression levels, even though most DE genes (91.2%) show no difference in genomic coverage. We attempted to calculate dN/dS between XSR and XST for 3,525 predicted X-linked protein-coding genes. Of these, 766 were identical, and another 1,210 had only synonymous (873) or only nonsynonymous (337) differences, so dN/dS could not be calculated. For the remaining 1,549 genes, 78 had dN/dS > 1 and are putatively evolving under positive selection between XSR and XST. dN/dS between XSR and XST was significantly correlated with dN/dS between Td2 and XST (Pearson's r = 0.466, p < 2.2e-16; Figure S3). Across genes with all three measures (dN/dS, DC and DE), dN/dS was best predicted by a model including both DE and DC (nested within DE) (Table 1), with both predicting higher dN/dS (Figure S3): DE genes have higher dN/dS than non-DE genes, and genes that are also DC have higher dN/dS still. Because dN/dS was not > 1 for most of these genes, this effect could reflect reduced selective constraint, adaptive evolution or a combination of the two.
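The gene categories above follow directly from whether dN and dS are zero. The sketch below shows that bookkeeping for hypothetical per-gene estimates; the function name, category labels and values are assumptions, and it does not estimate dN or dS itself, which requires a codon substitution model.

```python
from collections import Counter


def classify_dnds(dn, ds):
    """Assign a gene to one of the categories used in the text.

    Returns (category, ratio); ratio is None when dN/dS is not reported.
    """
    if dn == 0 and ds == 0:
        return "identical", None
    if dn == 0:
        return "synonymous_only", None
    if ds == 0:
        return "nonsynonymous_only", None
    ratio = dn / ds
    category = "putative_positive_selection" if ratio > 1 else "ratio_at_most_1"
    return category, ratio


# Hypothetical (dN, dS) estimates for five genes.
genes = {
    "g1": (0, 0),
    "g2": (0, 0.004),
    "g3": (0.002, 0),
    "g4": (0.003, 0.001),
    "g5": (0.001, 0.004),
}
tally = Counter(classify_dnds(dn, ds)[0] for dn, ds in genes.values())
print(tally)
```

Genes with only nonsynonymous differences have an undefined ratio (dS = 0), and genes with only synonymous differences give a ratio of zero that carries little information for closely related sequences, which is why both groups are set aside before testing for dN/dS > 1.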
Among the genes that were both DE and DC (Table S4) were two X-linked orthologs of Jasper, a gene that regulates chromatin (Albig et al. 2019), as well as two paralogous copies of versager and of Mcm10, both of which are involved in chromosome condensation phenotypes. Jasper also appears to have undergone extensive tandem duplication on XSR (Figure S4): the two tandemly duplicated copies are truncated at the 5′ end relative to both their Drosophila ortholog and a full-length but weakly expressed copy of Jasper located between them (Figure S4). Another DE and DC gene, Tetraspanin29efb, which encodes a plasma membrane scaffolding protein, also has dN/dS > 1.
Transposable element activity is influenced by the sex ratio X
We found that TEs have inserted more frequently on the T. dalmanni X than on the two autosomes (483.1 vs. 439.8 TEs/Mb). However, X enrichment varies by element type (Figure 1B): given that the X comprises 17.5% of the genome (102 Mb / 586 Mb), it carries more DNA elements (20.6%) and LINE elements (18.3%) but fewer LTR elements (8.4%) than expected. The paucity of LTR elements on the T. dalmanni X is driven by an enrichment of LTR elements on T. dalmanni Muller element A, which is the larger arm of C1 in T. dalmanni and is the X chromosome in D. melanogaster and A. gambiae (Figure 1). Fifteen of 75 Class II (DNA) families, including 11 different TcMariner elements, have more copies on the X chromosome than expected (Figure 3A; χ² = 11.59, P < 0.0001), while six of 180 LINE element families have fewer copies than expected on the X (χ² = 11.59, P < 0.0001). Of 80 LTR element families, including 32 different Gypsy elements, 40 have fewer insertions on the X than expected (χ² = 65.10, P < 0.0001), none are more abundant, and 40 occur in expected proportions.
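A per-family version of the X-enrichment test can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test against the X's share of the assembly. This is an illustrative reconstruction, not necessarily the exact test behind the reported statistics; the function name and family counts are hypothetical, and correction for testing many families is not shown. The p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math

X_FRACTION = 102 / 586  # X share of the assembly, as given in the text


def x_enrichment(x_copies, total_copies, x_fraction=X_FRACTION):
    """One-df chi-square goodness-of-fit test of X-linked copies for one TE family.

    Compares observed X and autosomal copy numbers with the numbers expected
    if copies were distributed in proportion to sequence length.
    Returns (chi-square statistic, p-value, direction of deviation on the X).
    """
    if not 0 < x_fraction < 1:
        raise ValueError("x_fraction must lie strictly between 0 and 1")
    expected_x = total_copies * x_fraction
    expected_auto = total_copies - expected_x
    observed_auto = total_copies - x_copies
    chi2 = ((x_copies - expected_x) ** 2 / expected_x
            + (observed_auto - expected_auto) ** 2 / expected_auto)
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    if x_copies > expected_x:
        direction = "more"
    elif x_copies < expected_x:
        direction = "fewer"
    else:
        direction = "as expected"
    return chi2, p_value, direction


# Hypothetical family with 300 annotated copies, 80 of them on the X.
print(x_enrichment(80, 300))
```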
To investigate potential causes of TE accumulation on the X chromosome, we assessed the expression of 350 classified TE families (Lerat et al. 2016) using RNA-seq data from SR and ST testes (Reinhardt et al. 2014). Of these 350 TE families, 48 are differentially expressed (DE) between SR and ST testes, and 85% of these are more highly expressed in testes from sex ratio males than in testes from standard males (Figure 3). These families include a variety of TE superfamilies from both Class I and Class II elements (Table S5), as well as unclassified families.
Using the pool-seq data, we identified transposable element insertions (Kofler et al. 2016) that were not present in the genome assembly and that likely represent polymorphisms enriched for relatively recent insertions. The pools contained 986 such insertions, of which 320 were from unclassified families, leaving 666 classified insertions from 349 different transposable element families. Recent DNA element insertions were more likely to be on the X (20.2%), recent LTR insertions were less likely to be on the X (8.0%), and non-LTR retroelement insertions occurred at expected proportions (17.9%). Overall, 47.8% of all insertions on the X and autosomes are present in all six sample pools, but the number of insertions restricted to either XSR or XST differs by drive status. The SR pools contain three X-linked insertions not found in the ST pools: a P DNA element and an unclassified element, each found in one pool, and an R1 LINE element found in both SR pools. In contrast, the ST pools contain 38 X-linked insertions not found in the SR pools, including nineteen unknown, eight DNA, one Helitron, one LTR and ten non-LTR elements, with just one element found in all four ST pools. Thus, XSR has fewer recently inserted TEs than might be expected, even after accounting for the ST pools containing twice as many individuals in total. This implies that new TE insertions either arise less often on XSR or are more easily lost from XSR than from XST. Note, however, that the effective population size of XSR is much smaller than that of XST, so XSR represents a much smaller target for mobile elements.
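Comparing SR-restricted and ST-restricted insertions amounts to set operations on a presence/absence table. In this sketch, the pool labels, insertion identifiers and function names are hypothetical; only the grouping of two SR pools and four ST pools follows the text.

```python
SR_POOLS = {"SR1", "SR2"}
ST_POOLS = {"ST1", "ST2", "ST3", "ST4"}


def restricted_to(presence, focal, other):
    """Insertions seen in at least one focal pool and in no pool of the other group."""
    return {ins for ins, pools in presence.items() if pools & focal and not pools & other}


def in_all(presence, pools):
    """Insertions detected in every one of the given pools."""
    return {ins for ins, found in presence.items() if pools <= found}


# Hypothetical presence/absence calls: insertion id -> pools in which it was detected.
presence = {
    "ins_R1": {"SR1", "SR2"},
    "ins_P": {"SR1"},
    "ins_mariner": {"ST1", "ST3"},
    "ins_shared": SR_POOLS | ST_POOLS,
}
print(sorted(restricted_to(presence, SR_POOLS, ST_POOLS)))  # ['ins_P', 'ins_R1']
print(sorted(restricted_to(presence, ST_POOLS, SR_POOLS)))  # ['ins_mariner']
print(sorted(in_all(presence, SR_POOLS | ST_POOLS)))  # ['ins_shared']
```

Because the ST group has twice as many pools (and individuals) as the SR group, raw counts of restricted insertions are not directly comparable; any comparison needs to account for that difference in sampling effort.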
New TE insertions could arise through TE expression or through other mechanisms, such as unequal crossing over or errors during replication. To determine whether the number of new insertions is influenced by germline expression, genomic abundance and/or the effect of XSR on expression, we separated families by element type (Class I non-LTR retroelements, Class I LTR retroelements and Class II DNA elements) and used three predictors of the number of novel insertions in each family: the log number of elements annotated in the genome, the log2 normalized (Love et al. 2014) testis expression of each element, and a categorical variable (SR, ST, none) indicating the presence and direction of differential expression between SR and ST testes. We fit generalized regression models by maximum likelihood, assuming a negative binomial distribution, to predict the number of new insertions for each class of transposable element across all six pool-seq samples (Table 2). Genome copy number predicts the number of new Class I LTR insertions and new Class II DNA element insertions. In contrast, the number of new Class I non-LTR (LINE) insertions is predicted only by transposable element expression.
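The log2 normalized expression values used as a predictor come from the DESeq2 framework (Love et al. 2014). A stripped-down, standard-library sketch of its median-of-ratios size factors followed by a log2(normalized count + 1) transform is shown below, assuming raw read counts per TE family per testis library. The function names and counts are hypothetical, and the exact transformation used here may differ; fitting the negative binomial models themselves would normally rely on a statistics package and is not reproduced.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors in the style of DESeq2.

    counts: dict mapping library name -> list of raw counts, with features
    (here, TE families) in the same order for every library. Features with a
    zero count in any library are left out of the reference geometric means.
    """
    libraries = list(counts)
    n_features = len(counts[libraries[0]])
    reference = []  # (feature index, log geometric mean across libraries)
    for i in range(n_features):
        values = [counts[lib][i] for lib in libraries]
        if all(v > 0 for v in values):
            reference.append((i, sum(math.log(v) for v in values) / len(values)))
    if not reference:
        raise ValueError("no feature has non-zero counts in every library")
    return {
        lib: math.exp(median(math.log(counts[lib][i]) - log_gm for i, log_gm in reference))
        for lib in libraries
    }


def log2_normalized(counts, factors, pseudocount=1.0):
    """log2(count / size factor + pseudocount) for every feature in every library."""
    return {
        lib: [math.log2(c / factors[lib] + pseudocount) for c in values]
        for lib, values in counts.items()
    }


# Hypothetical counts for three TE families in two testis libraries.
counts = {"SR_testis": [10, 20, 30], "ST_testis": [20, 40, 60]}
factors = size_factors(counts)
print(factors)
print(log2_normalized(counts, factors))
```

In this toy example the second library is simply sequenced twice as deeply, so its size factor is twice as large and the normalized values of the two libraries coincide.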
piRNA related genes show a variety of effects associated with the sex ratio X
Given the observation that TE control is disrupted in XSR males, we asked whether genes associated with genome defense against TEs were likewise affected. In addition to the canonical piRNA genes piwi, maelstrom, aubergine and Argonaute3, we identified 168 further piRNA-related genes from three recent studies in D. melanogaster (Handler et al. 2013; Palmer et al. 2018; Ozata et al. 2019) for analysis. We also considered the Heterochromatin Protein 1 (HP1) family, which is expanding within Diptera in association with meiotic drive (Helleu and Levine 2018) and is associated with de novo heterochromatic silencing near novel TE insertions (Lee and Karpen 2017). In Drosophila, the HP1D family includes rhino (HP1D), a key piRNA gene (Klattenhoff et al. 2009), but rhino is a young duplicate (Helleu and Levine 2018) that is not found in T. dalmanni.
We identified the T. dalmanni ortholog(s) and any new paralogs of each piRNA gene and assessed differences between XSR and XST in gene expression, rates of nonsynonymous change and number of X-linked duplication events (Table 3). Maelstrom (mael) is present in six copies in the stalk-eyed fly genome, four of which arose within the genus (Figure S5). One paralog is X-linked and testis-expressed (mael is ovary-expressed in D. melanogaster). This X-linked paralog is evolving under positive selection between XSR and XST (dN/dS = 1.58) and has six nonsynonymous XSR-XST differences, several of them within the MAEL domain for which the protein is named (Figure S6). Argonaute3 has similarly accumulated five nonsynonymous changes between XSR and XST (Figure S7), and piwi has two (Figure S8). Six genes (Table 3) identified in screens for piRNA function were significantly differentially expressed (DE) between XSR and XST, and all but one were overexpressed in SR testes. Eight of these piRNA-associated genes (including two that are also DE) are present in multiple copies (2-8) in T. dalmanni (Table 3). Overall, about 10% of the piRNA-associated genes we analyzed, including core genes, are in some way affected by the presence of XSR in this species. We also used the McDonald-Kreitman (MK) test to assess positive selection within T. dalmanni in a group of core piRNA-associated genes previously found to be under positive selection in other species; Hen1, three of the five maelstrom paralogs and zucchini showed signs of adaptive protein evolution (Table S6).
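The McDonald-Kreitman test compares nonsynonymous and synonymous changes that are fixed between lineages (Dn, Ds) with those that are polymorphic within a species (Pn, Ps). A minimal sketch with a two-sided Fisher's exact test and the neutrality index NI = (Pn/Ps)/(Dn/Ds) follows. The counts and helper names are hypothetical, and the outgroup and polymorphism data behind Table S6 are not specified here. An NI below 1 (equivalently, alpha = 1 - NI above 0) indicates an excess of nonsynonymous fixed differences, consistent with adaptive protein evolution.

```python
from math import comb, nan


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = table_prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_prob(x)
        if p <= observed * (1 + 1e-7):  # small tolerance for floating-point ties
            p_value += p
    return min(1.0, p_value)


def mk_test(dn, ds, pn, ps):
    """McDonald-Kreitman summary: neutrality index, alpha and Fisher p-value.

    Table layout: rows are nonsynonymous / synonymous,
    columns are fixed (divergent) / polymorphic.
    """
    ni = (pn / ps) / (dn / ds) if dn and ds and ps else nan
    return {"NI": ni, "alpha": 1 - ni, "p": fisher_exact_two_sided(dn, pn, ds, ps)}


# Hypothetical counts for one gene: 20 fixed and 5 polymorphic nonsynonymous,
# 10 fixed and 10 polymorphic synonymous differences.
print(mk_test(dn=20, ds=10, pn=5, ps=10))
```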
Discussion
The discovery of genetic elements that persist despite negative effects on organismal fitness required a reassessment of the traditional idea that selection acts only at the level of the individual. Indeed, such selfish genetic elements have evolutionary interests that diverge from those of their hosts, despite an intimate reliance on them, which makes parasites their best analogs (Burt and Trivers 2006). Here, we analyzed the distribution and spread of transposable elements in the context of long-term coexistence with another selfish genetic element, the X-linked meiotic driver sex ratio (SR). We find that TEs play a large role in genome evolution in T. dalmanni, covering ~25% of the genome and totaling a staggering 261,000 copies (plus nearly as many fragments) across the 500 Mb genome. We also found that the presence of the XSR chromosome is associated with increased testis expression of many TE families.
Disruption of TE control in sex ratio males occurs in the presence of an old (~500 kya) X karyotype, which we find carries multiple overlapping inversions relative to the standard arrangement. The consequences for the evolution of the X chromosome are profound: reduced gene flux between X karyotypes caused by nested inversions (Johns et al. 2005; Paczolt et al. 2017) has led to reduced polymorphism and to differences in sequence and gene expression across the X (Figure 2), confirming previous findings (Christianson et al. 2011; Reinhardt et al. 2014). It is therefore plausible that differences in TE expression arose through genetic drift between the two chromosome types: XSR has reduced polymorphism and recombination, which makes selection less efficient and subjects the chromosome to Muller's ratchet (Muller 1964). If higher TE expression generally harms fitness, this bias might simply be one example of the accumulation of deleterious variation. Alternatively, a more active relationship between TEs and drive is plausible, given repeated connections between meiotic drivers, chromatin regulation and TE control. So far, all characterized targets of X-linked meiotic drivers involve repetitive Y-linked sequences (Soh et al. 2014; Helleu et al. 2016). Consistent with these examples, we identified interspersed elements amplified to high copy number on the Y chromosome (Table S6). Furthermore, the first piRNA locus identified, Suppressor of Stellate, is likely a relict suppressor of drive (Hurst 1996), and studies of Drosophila SD suggest that misregulation of piRNA transport disrupts chromatin condensation in Responder-bearing sperm (Nagao et al. 2010; Gell and Reenan 2013; Larracuente 2014).
Supporting the case for an active connection between sex ratio drive and the expansion of TEs, we find that several piRNA genes are accumulating nonsynonymous differences between XSR and XST, with some evolving adaptively. We also observe that eight of nine differentially expressed genes associated with piRNA function are upregulated in XSR, suggesting that carriers may be responding to TE release by upregulating piRNA functions. A missing piece here is the expression of the piRNAs themselves, as well as potential divergence between putative piRNA clusters on XST and XSR. Given that piRNA clusters are formed via new mutations (Zhang et al. 2020), any clusters on the X should accumulate different TE insertions over time owing to the lack of recombination between XST and XSR, and so target different TEs.
Prior work has shown that some piRNA pathway proteins, like immune genes, evolve rapidly (Obbard et al. 2009; Kolaczkowski et al. 2011a; Lee and Langley 2012; Simkin et al. 2013; Palmer et al. 2018), though the precise mechanism underlying this rapid evolution is unclear (reviewed in Blumenstiel et al. 2016; but see Parhad and Theurkauf 2019). We, too, find that some of these genes (zucchini, hen1 and several maelstrom paralogs) show signs of evolving under positive selection within T. dalmanni (Table S6). Furthermore, XSR alleles of the key X-linked TE control genes piwi, Argonaute3 and an X-linked copy of maelstrom have accumulated nonsynonymous differences relative to their XST alleles (Figure S5-S7), with maelstrom showing evidence of positive selection between XSR and XST (dN/dS = 1.58). In Drosophila, the evidence for ongoing positive selection on maelstrom, piwi and Ago3 is mixed, with signs of positive selection in a minority of comparisons, whereas other genes such as aubergine, spn-E and armitage are more consistently found to be under positive selection (Blumenstiel et al. 2016). Maelstrom and aubergine have also duplicated within diopsids. We previously identified five duplicates of maelstrom (Baker et al. 2016) and here found an additional near-identical copy. Four of these six copies, including the adaptively evolving X-linked copy, appear to be novel within the genus Teleopsis (Figure S8). During piRNA-mediated silencing in Drosophila, piwi is responsible for establishing heterochromatic histone marks (histone 3 lysine 9 methylation), whereas maelstrom promotes the spread of heterochromatic states to nearby genes (Sienski et al. 2012).
In Drosophila simulans, a young duplicate of heterochromatin protein 1 (HP1D) causes sex ratio meiotic drive (Montchamp-Moreau et al. 2006; Helleu et al. 2016), and HP1D (also known as rhino) is a key player in piRNA genome defense in D. melanogaster and presumably other species (Klattenhoff et al. 2009). Furthermore, recurrent amplification of HP1 copies occurs in lineages where sex ratio drive is common (Helleu and Levine 2018). We identified multiple HP1 family members in the T. dalmanni genome, and one (the HP1E ortholog) was X-linked, but it did not show signs of SR-associated divergence, differential expression or differential coverage. We did, however, identify XSR-specific amplification of a gene recently found to be associated with heterochromatin silencing. Jasper is among the few genes exhibiting both strong differential expression and differential copy number between SR and ST males (Table S4), and the PWWP domain of the protein has undergone XSR-specific tandem amplification (Figure S4B). In Drosophila, the PWWP domain binds to methylated nucleosomes during heterochromatin silencing by its partner JIL-1 (Albig et al. 2019). Given the tripartite interplay between TE control, heterochromatin silencing, and meiotic drive, Jasper is an intriguing candidate for further analysis.
Finally, meiotic drive has been theorized (Frank 1991; Hurst and Pomiankowski 1991) to be a possible “engine of speciation” by causing rapid evolution of suppressors and enhancers that influence sperm development, though empirical support is mixed (McDermott and Noor 2010; Meiklejohn et al. 2018). Multiple studies have shown that misregulation of heterochromatin is a major underlying cause of hybrid incompatibility in flies (Brideau et al. 2006; Kelleher et al. 2012), and hybrid incompatibility between D. simulans and D. melanogaster has been linked to a Dobzhansky-Muller incompatibility between interspecific alleles of the piRNA pathway gene rhino (alleles of which can also cause drive; Helleu et al. 2016) and another piRNA gene, deadlock (Parhad et al. 2017). XSR has many fitness effects, and X-linked loci in T. dalmanni have been shown to contribute both to a cryptic meiotic drive and to sperm defects (Wilkinson et al. 2014). Interspecific hybrids often show a series of defects referred to as hybrid dysgenesis, which typically include misregulation of heterochromatin thought to be caused by a mismatch between repetitive DNA and the machinery regulating it. Given the extensive divergence between XSR and XST, the TE misregulation we observe in sex ratio males could cause incompatibilities between XSR and autosomal loci. We might also expect X-linked incompatibilities in heterokaryotypic females. Consistent with this idea, XSR reduces the viability (hatch rate) of embryos produced by SR males as well as by female carriers (Finnegan et al. 2019), although adult XSR/XST females are slightly more fecund than either homokaryotype (Wilkinson et al. 2006). TE expression in females with different sex ratio karyotypes has not yet been measured but could provide valuable insights into the evolutionary dynamics of multiple selfish genetic elements.
Material and Methods
Genome assembly of Teleopsis dalmanni s.s.
A draft genome assembly for Teleopsis dalmanni, NLCU01, was created from a combination of Roche 454, Illumina and PacBio sequence data (Table S1) using MaSuRCA (Zimin et al. 2013) and is available on Ensembl Genomes (Kersey et al. 2018). This assembly contains ~25k scaffolds with an N50 of ~75 kb. All DNA sequences were obtained from an inbred population (line “2A”) of Teleopsis dalmanni. This population is derived from flies first collected near the Gombak River in peninsular Malaysia (3°12'N, 101°42'E) in 1989 and then maintained as a control line for an artificial selection study on relative eyespan (Wilkinson 1993; Wolfenbarger and Wilkinson 2001). After 50 generations of selection, full-sib mating was conducted for seven generations to establish the line, which has subsequently been maintained without additional inbreeding. This population has been used in several prior studies (Christianson et al. 2005; Wilkinson et al. 2014) and does not carry any drive-associated genetic markers. Contaminating bacterial scaffolds were identified and removed prior to submission using a modification of the Wheeler et al. (2013) DNA-based homology pipeline, as described in Poynton et al. (2018). Male and female genomic short-read resequencing data were aligned to each scaffold using NextGenMap (Sedlazeck et al. 2013) with default parameters, and the relative coverage of male and female reads was used to identify X-linked scaffolds (cf. Vicoso and Bachtrog 2015), with the expectation that the normalized ratio of female to male reads should be approximately 1:1 for autosomal scaffolds and 2:1 for X-linked scaffolds.
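The coverage logic above can be illustrated with a minimal sketch. It assumes per-scaffold read counts and total mapped reads for one female and one male library; the function name and the ±0.5 log2 cutoff are hypothetical choices for illustration, not the thresholds used in this study. On the log2 scale, autosomal scaffolds are expected near 0 and X-linked scaffolds near 1.

```python
import math


def classify_scaffolds(female_reads, male_reads, female_total, male_total,
                       cutoff=0.5):
    """Classify scaffolds from female and male read counts (hypothetical helper).

    female_reads / male_reads: dict mapping scaffold name -> mapped reads.
    female_total / male_total: total mapped reads per library, used to
    normalize for sequencing depth. The +/-0.5 log2 cutoff is an assumption.
    """
    calls = {}
    for scaffold, f_count in female_reads.items():
        m_count = male_reads.get(scaffold, 0)
        if f_count == 0 or m_count == 0:
            calls[scaffold] = "undetermined"
            continue
        # log2 of the depth-normalized F:M ratio: ~0 autosomal, ~1 X-linked
        ratio = math.log2((f_count / female_total) / (m_count / male_total))
        if ratio >= cutoff:
            calls[scaffold] = "X"
        elif ratio > -cutoff:
            calls[scaffold] = "autosomal"
        else:
            calls[scaffold] = "male-biased"
    return calls
```

For example, with equal library sizes, a scaffold with 400 female and 200 male reads gives log2(2) = 1 and is called X-linked.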
A chromosome-level assembly was then created by incorporating chromatin conformation information and validated with a high-density linkage map. Chromatin conformation capture data were generated using a Phase Genomics (Seattle, WA) Proximo Hi-C Plant Kit, a commercially available version of the Hi-C protocol (Lieberman-Aiden et al. 2009). Following the manufacturer’s instructions, intact cells from unsexed pupae of the 2A inbred line were crosslinked using a formaldehyde solution, digested using the Sau3AI restriction enzyme, and proximity-ligated with biotinylated nucleotides to create chimeric molecules composed of fragments from different regions of the genome that were physically proximal in vivo but not necessarily genomically proximal. Continuing with the manufacturer’s protocol, molecules were pulled down with streptavidin beads and processed into an Illumina-compatible sequencing library. Sequencing was performed on an Illumina HiSeq 4000, generating a total of 202,608,856 100 bp read pairs. Reads were aligned to the draft assembly (NLCU01.30_45_breaks.fasta) following the manufacturer’s recommendations. Briefly, reads were aligned using BWA-MEM (Li and Durbin 2009) with the -5SP and -t 8 options specified and all other options at their defaults. SAMBLASTER (Faust and Hall 2014) was used to flag PCR duplicates, which were later excluded from analysis. Alignments were then filtered with samtools (Li et al. 2009) using the -F 2304 filtering flag to remove secondary and supplementary alignments. Putative misjoined contigs were broken using Juicebox (Rao et al. 2014; Durand et al. 2016) based on the Hi-C alignments. A total of 113 breaks in 105 contigs were introduced, and the same alignment procedure was repeated from the beginning on the resulting corrected assembly. The Phase Genomics Proximo Hi-C genome scaffolding platform was then used to create chromosome-scale scaffolds from the corrected assembly as described (Bickhart et al. 2017).
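The -F 2304 filter can be unpacked from the bit values defined in the SAM specification: 2304 = 0x100 (secondary) + 0x800 (supplementary). The sketch below decodes a FLAG value and mimics the keep/drop decision of `samtools view -F`; the helper names are illustrative only.

```python
# Bit values of the FLAG field as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def passes_filter(flag, exclude=2304):
    """Mimic `samtools view -F exclude`: keep reads with none of these bits set."""
    return (flag & exclude) == 0
```

Note that a read flagged only as a duplicate (0x400, as set by SAMBLASTER) still passes -F 2304, which is why duplicates are excluded as a separate step.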
As in the LACHESIS method (Burton et al. 2013), this process computes a contact frequency matrix from the aligned Hi-C read pairs, normalized by the number of Sau3AI restriction sites (GATC) on each contig, and constructs scaffolds so as to optimize expected contact frequency and other statistical patterns in Hi-C data. In addition to Hi-C data, chromosomal linkage information (see below) was used as input to the scaffolding process. Linkage groups from a linkage map were used to constrain chromosome assignment during the clustering phase of Proximo by discarding any suggested clustering step that would place contigs from different linkage groups on the same chromosome, but linkage map data were not used during the subsequent ordering and orientation analyses in Proximo. Approximately 528,000 separate Proximo runs were performed to optimize the number and construction of scaffolds, making them as concordant as possible with the observed Hi-C data. This process resulted in a set of three chromosome-scale scaffolds containing a total of 581.7 Mbp of sequence (93.6% of the corrected assembly). Finally, Juicebox was again used to correct remaining scaffolding errors.
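A minimal sketch of the two ideas in this paragraph follows, under stated assumptions. The first is normalizing inter-contig contact counts by restriction-site density. Dividing by the product of the two contigs' GATC counts is an assumed, simplified form, not necessarily Proximo's exact formula. The second is rejecting any clustering step that would merge contigs assigned to different linkage groups. All function names are hypothetical.

```python
def count_sites(seq, motif="GATC"):
    """Count Sau3AI sites; GATC cannot overlap itself, so str.count is exact."""
    return seq.upper().count(motif)


def normalized_contacts(contacts, sites):
    """contacts: dict (contig_a, contig_b) -> Hi-C read pairs linking them.
    sites: dict contig -> number of GATC sites.
    Assumed normalization: raw count / (sites_a * sites_b)."""
    out = {}
    for (a, b), n in contacts.items():
        denom = sites[a] * sites[b]
        out[(a, b)] = n / denom if denom else 0.0
    return out


def can_merge(cluster_a, cluster_b, linkage_group):
    """Allow a merge only if the combined cluster spans at most one linkage
    group; contigs absent from the linkage map impose no constraint."""
    groups = {linkage_group[c] for c in cluster_a | cluster_b if c in linkage_group}
    return len(groups) <= 1
```

The design point is that the linkage map acts only as a veto during clustering, which matches the text's statement that linkage data were not used for ordering or orientation.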
Linkage groups used to constrain and validate the Hi-C assembly were created from a backcross. A male from the 2A inbred strain was first crossed to a female from a noninbred population of T. dalmanni collected near Bukit Lawang, Sumatra (3°35'N, 98°6'E), and a female hybrid offspring from this cross was then mated to a male from the 2A strain. This backcross produced 249 individuals (131 female and 118 male), which were individually genotyped using multiplexed shotgun genotyping (MSG; Andolfatto et al. 2011) and multiple STR loci (Wilkinson et al. 2014). For each individual, genotypes were determined as either heterozygous or homozygous for each scaffold by combining all loci present on the scaffold into a single “super locus”. Reads were aligned using bwa (Li and Durbin 2009), and genotypes were assessed as homozygous or heterozygous using samtools v1.9 (Li et al. 2009) (mpileup -v). Because this was a backcross, at informative autosomal markers, individuals inheriting the backcross allele (pure “2A”) should be homozygous, whereas individuals inheriting the non-backcross allele should be heterozygous, with an expected 1:1 ratio of these genotypes. This results in an overall 3:1 ratio of backcross to non-backcross alleles for autosomal markers and for X-linked markers in females, and a 1:1 ratio for X-linked markers in hemizygous males. Markers were retained as potentially informative if at least one individual carried the non-backcross Bukit Lawang allele (Wilkinson et al. 2014), defined as the less common allele across all female individuals. Markers were removed if they deviated from the expected allele ratios described above (binomial test), or if more than 20% of female individuals carried only the non-backcross allele. Finally, within each individual, all markers from a given scaffold were pooled to give an overall number of reads supporting each genotype, with a minimum coverage of 5 reads per marker.
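The allele-ratio filter can be sketched as a two-sided exact binomial test against the expected backcross-allele frequencies (0.75 for autosomes and female X, 0.5 for male X). This assumes allele counts are summed across individuals and uses a hypothetical alpha of 0.05. The text does not state the significance threshold or any multiple-testing correction. The p-value follows the common convention of summing all outcomes no more likely than the observed one.

```python
from math import comb

# Expected frequency of the backcross ("2A") allele in this design.
EXPECTED_BACKCROSS_FREQ = {"autosome": 0.75, "X_female": 0.75, "X_male": 0.5}


def binom_two_sided(k, n, p):
    """Exact two-sided binomial p-value for k successes in n trials."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


def marker_passes(backcross_alleles, total_alleles, category, alpha=0.05):
    """Keep a marker unless its allele ratio departs from expectation.
    alpha = 0.05 is an illustrative assumption."""
    p = EXPECTED_BACKCROSS_FREQ[category]
    return binom_two_sided(backcross_alleles, total_alleles, p) >= alpha
```

For example, 150 backcross alleles out of 200 at an autosomal marker matches the 3:1 expectation and passes, whereas 100 out of 200 is rejected.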
Genotypes were coded in the final matrices as “a” (2A/backcross genotype) or “b” (Bukit Lawang/foreign genotype).
Separate genotype matrices were then created for X-linked scaffolds (as determined by male and female coverage; Vicoso and Bachtrog 2015) and for autosomal scaffolds, and scaffolds were rank-ordered by the number of individuals genotyped. We then used JoinMap v4.1 (Stam 1993) to assign the top 1000 autosomal scaffolds to one of two linkage groups (chromosomes). Only scaffolds with a LOD score > 5 were assigned to a chromosome. We used a similar process for the top 250 X-linked scaffolds but required a LOD score > 10 to assign scaffolds to the chromosome. These linkage groups were used to constrain the Hi-C genome assembly as noted above. Then, independently of the Hi-C scaffolding process, we ordered scaffolds within each linkage group by regression mapping using a Haldane mapping function (Rédei 2008). We used regression mapping rather than maximum likelihood because it is less sensitive to missing genotype data (Van Ooijen 2006), which is typical of MSG studies. We removed markers from the final map if there was evidence of significant (after Bonferroni correction) lack of fit to their nearest neighbors. The resulting linkage map included 762 scaffolds spanning 147.1 Mbp. Collinearity between the Hi-C assembly and the linkage map was assessed by comparing the relative locations of scaffolds found in both. Chromosomal synteny of the assembly scaffolds with Drosophila melanogaster chromosome arms was assessed by aligning a set of 8098 previously annotated 1-to-1 Drosophila orthologs (Baker et al. 2016) to the genome assembly using GMAP (Wu and Watanabe 2005) (--npaths=1 --format=gff3_gene --min-identity=0.9). Assemblies were assessed for completeness using BUSCO (Seppey et al. 2019).
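The Haldane mapping function converts a recombination fraction r into a map distance assuming no crossover interference: d = -½ ln(1 − 2r) Morgans. The sketch below shows only this conversion and a pairwise backcross recombination fraction that skips missing genotypes. It does not reproduce JoinMap's regression-mapping fit; the helper names are illustrative.

```python
import math


def haldane_cm(r):
    """Haldane map distance in centimorgans for recombination fraction r."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def recombination_fraction(geno_a, geno_b):
    """Backcross genotypes coded 'a'/'b' per individual; None marks missing data.
    A recombinant is an individual whose codes differ between the two markers."""
    pairs = [(x, y) for x, y in zip(geno_a, geno_b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no individuals genotyped at both markers")
    return sum(x != y for x, y in pairs) / len(pairs)
```

For instance, one recombinant among four jointly genotyped individuals gives r = 0.25, or about 34.7 cM.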
Draft genome assembly of T. dalmanni species 2
We recently described a cryptic species of stalk-eyed fly (Paczolt et al. 2017), which we refer to as T. dalmanni sp 2 (Td2) and which corresponds to prior collections of T. dalmanni from several sites in peninsular Malaysia, such as Cameron Highlands (Christianson et al. 2005; Swallow et al. 2005). To produce a draft genome of this species, we extracted high-molecular-weight (HMW) DNA from a single male from a laboratory population of Td2 using the Gentra Puregene Tissue Kit (Qiagen cat. 158667). One microgram of DNA was sent to the New York Genome Center (NYGC), where it was prepared with the Chromium Genome linked-read kit (10X Genomics) and sequenced on half a lane of an Illumina HiSeq X, producing a total of 416 million reads. These reads were assembled at NYGC using Supernova (v2.0.1). The resulting draft genome contained 10,290 scaffolds longer than 10 kb, with an N50 of 45.2 kb and a total size of 355 Mb.
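N50, used above and for NLCU01, is the length L such that scaffolds of length ≥ L contain at least half of the total assembled sequence. The sketch below computes it from a list of scaffold lengths. The optional `min_length` argument is an illustrative addition showing that the statistic depends on which scaffolds are included (e.g. only those longer than 10 kb).

```python
def n50(lengths, min_length=0):
    """Return the N50 of the scaffold lengths at least min_length long."""
    kept = sorted((x for x in lengths if x >= min_length), reverse=True)
    total = sum(kept)
    running = 0
    for length in kept:
        running += length
        if 2 * running >= total:  # reached half of the total assembly size
            return length
    return 0  # empty input
```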
Short-read resequencing of sex ratio (SR) and standard (ST) males
To identify sequence and structural variation specific to XSR, we sequenced DNA from replicate pools of SR or ST males that were either collected in the field from two different sites in peninsular Malaysia or were sons from the first three generations descended from field-collected females. One SR and two ST sample pools were created from the DNA of males from each of two collection sites (Gombak and Kanching) previously phenotyped for offspring sex ratio (Paczolt et al. 2017). When more individuals were available than needed for a given collection site and sex ratio category, genotype data from 9 X-linked STR loci (Paczolt et al. 2017) were used to avoid oversampling closely related individuals (e.g. male full siblings from the same brood). Haplotype diversity within pools did not differ significantly from diversity among all candidate males for that pool (following Christianson et al. 2011; Dunnett’s t-test, p > 0.05 for all comparisons, Table S7). DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) and quantified using a PicoGreen Quant-iT dsDNA quantification kit (Thermo Fisher Q33130). Pools were then assembled using an equimolar amount of total DNA from each sample. Sample size for each pool ranged from 15 to 18 individuals (Table S7). Six barcoded libraries were prepared and multiplexed on two lanes of a HiSeq 1500 in Rapid Run mode to generate 150 bp paired-end reads. BAM-formatted alignments of these libraries to the genome were produced using NextGenMap (Sedlazeck et al. 2013) with default parameters and used in subsequent analyses. Pairwise genetic diversity for each pool was calculated from the pooled resequencing alignments in non-overlapping 5 kb windows using PoPoolation (Kofler et al. 2011), and FST was estimated following Kolaczkowski et al. (2011b).
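As a rough illustration of windowed differentiation, the sketch below bins SNPs into non-overlapping 5 kb windows and computes a simple Nei-style FST = (HT − HS)/HT from pool allele frequencies, summing heterozygosities over sites. This is an assumed, simplified estimator chosen for clarity. It is not necessarily the Kolaczkowski et al. (2011b) estimator, and it omits the corrections for pool size and sequencing depth that PoPoolation applies.

```python
def windowed_fst(positions, freqs1, freqs2, window=5000):
    """Return {window_start: FST} from allele frequencies in two pools.

    positions: SNP coordinates; freqs1 / freqs2: allele frequency of the same
    allele in each pool. Windows with no total heterozygosity map to None.
    """
    sums = {}
    for pos, p1, p2 in zip(positions, freqs1, freqs2):
        start = (pos // window) * window  # non-overlapping window start
        hs, ht = sums.get(start, (0.0, 0.0))
        pbar = (p1 + p2) / 2
        hs += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-pool H
        ht += 2 * pbar * (1 - pbar)  # heterozygosity of the combined pool
        sums[start] = (hs, ht)
    return {s: ((ht - hs) / ht if ht > 0 else None)
            for s, (hs, ht) in sorted(sums.items())}
```

A window of sites fixed for different alleles in the two pools gives FST = 1, and one with identical frequencies gives 0.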
After pool-seq was completed, we determined that four individuals in the ST pools were actually Td2 males (Table S7). Genetic markers distinguishing these species were not identified until after pooling (Paczolt et al. 2017), and previous work had suggested that Td2 would not be present at the collection sites we visited (Christianson et al. 2005; Swallow et al. 2005). Both SR pools and one ST pool (Kanching ST2) consisted entirely of T. dalmanni s.s. (Table S7). Therefore, all analyses in which excess polymorphism within a pool (actually species divergence contributed by the Td2 individuals) could affect the results were repeated using only these three samples, and the results of the reduced analysis were qualitatively similar to those of the full analysis (Figure S9 and Table S8).
Long-read resequencing of a SR haplotype
To identify inversion breakpoints between XSR and XST, we used long-read sequencing (Pacific Biosciences). A pool of full-sib males bearing a single identical-by-descent (IBD) XSR haplotype was created by mating an SR/SR female to a male from the 2A strain and then backcrossing the female progeny to males from the same strain. We then genotyped 107 sons from this backcross at three X-linked STR loci (ms125, ms395, and CRC) to distinguish XSR sons from XST sons. DNA from 46 XSR sons was extracted using the Gentra Puregene Tissue Kit (Qiagen) and pooled, followed by a phenol-chloroform extraction and ethanol precipitation. A PacBio long-insert (15 kb) library was then prepared and run on three PacBio Sequel SMRT cells. These runs yielded a total of 15.1 Gb of sequence, with a mean read length of 4.9 kb and a maximum read length of 92.8 kb. Raw long reads were aligned to the genome using ngmlr with default parameters, and structural variants were called using Sniffles (Sedlazeck et al. 2018), requiring at least 2 reads to support each variant call. The resulting output was filtered to find inversions that were fixed within the SR PacBio long reads relative to the reference genome. Each putative inversion was then validated as a fixed SR-specific inversion by inspecting read-pair orientation at the breakpoint in the SR and ST pool-seq data using IGV (Thorvaldsdottir et al. 2013). An inversion was considered validated if reads from all of the SR pool-seq samples, but none of the ST pool-seq samples, agreed with the Sniffles call at that position (Table S3). Finally, to polarize the direction of the inversion mutation, an alignment of the T. dalmanni sp 2 genome assembly was performed using BLAT (Kent 2002), and scaffold alignments near the breakpoints were examined to determine whether 1) scaffolds support the standard arrangement (span the breakpoint), 2) scaffolds support the SR arrangement (the scaffold breaks and aligns to the other end of the breakpoint), 3) a scaffold is present near the breakpoint but supports another arrangement, or 4) no scaffolds map near the breakpoint (Table S3).
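The acceptance rule for a fixed SR-specific inversion can be written as a small predicate. This sketch assumes the evidence has already been tabulated as counts per pool of breakpoint-spanning read pairs consistent with the inversion. The `InversionCall` record, the pool names, and the count structure are hypothetical; they are not Sniffles output fields.

```python
from dataclasses import dataclass


@dataclass
class InversionCall:
    """Hypothetical summary of one Sniffles inversion call."""
    chrom: str
    start: int
    end: int
    long_read_support: int  # SR PacBio reads supporting the call


def is_validated(call, sr_pool_support, st_pool_support, min_reads=2):
    """Accept the call only if it has enough long-read support, every SR pool
    has read pairs agreeing with it, and no ST pool does.

    sr_pool_support / st_pool_support: dict pool name -> number of read pairs
    whose orientation at the breakpoint agrees with the inversion.
    """
    if call.long_read_support < min_reads:
        return False
    return (all(n > 0 for n in sr_pool_support.values())
            and all(n == 0 for n in st_pool_support.values()))
```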
Transposable element annotation
Transposable elements were annotated in the T. dalmanni s.s. assembly using RepeatModeler v1.0.4 (Smit and Hubley 2008) with default parameters and the NLCU01 assembly as input. The resulting consensus TE sequences (FASTA format) were input to RepeatMasker (Smit et al. 2013) with the Hi-C assembly as the reference, producing a repeat-masked reference genome and repeat annotations. The tool “One code to find them all” (Bailly-Bechet et al. 2014) was applied to the RepeatMasker output to count the number and location of each type of insertion in the T. dalmanni genome. These counts were compared to the RepeatMasker annotations for two other dipterans (Drosophila melanogaster dm6, RepeatMasker open-4.0.6; Anopheles gambiae anoGam1, RepeatMasker open-4.0.5) analyzed using the same procedure.
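For orientation, the sketch below counts annotated insertions per repeat class/family directly from RepeatMasker `.out` lines. It collapses fragments that share the same query sequence and the same ID in the last column. This is a much simpler approximation than "One code to find them all", which uses additional rules to reassemble fragmented copies. The column positions assume the standard 15-column `.out` layout.

```python
from collections import Counter


def count_insertions(lines):
    """Count distinct RepeatMasker hits per class/family from .out lines.

    Assumed layout: field 5 = query sequence, field 11 = class/family,
    field 15 = ID. Header and blank lines are skipped.
    """
    seen = set()
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 15 or not fields[0].isdigit():
            continue  # header, blank, or malformed line
        key = (fields[4], fields[14])  # (query sequence, hit ID)
        if key not in seen:
            seen.add(key)
            counts[fields[10]] += 1
    return counts
```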
Novel, polymorphic TE insertions were called in each of the six Illumina resequencing pools using PoPoolationTE2 (Kofler et al. 2016) in the “separate” analysis mode, with the repeat-masked assembly and TE sequences as input. To compare TE insertion rates between samples, insertions of the same element located within 500 bp of each other in multiple pools were inferred to be orthologous.
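The orthology rule can be sketched as grouping insertion calls by chromosome and TE family, then chaining calls whose positions are within 500 bp of the previous call. Single-linkage chaining is an assumption here, since the text does not say how runs of nearby calls were resolved. The tuple layout and pool names are hypothetical.

```python
from collections import defaultdict


def group_orthologous(calls, max_dist=500):
    """calls: iterable of (pool, chrom, pos, family) insertion calls.
    Returns (chrom, family, first_pos, pools) groups of putative orthologs."""
    by_key = defaultdict(list)
    for pool, chrom, pos, family in calls:
        by_key[(chrom, family)].append((pos, pool))
    groups = []
    for (chrom, family), hits in sorted(by_key.items()):
        hits.sort()
        current = [hits[0]]
        for pos, pool in hits[1:]:
            if pos - current[-1][0] <= max_dist:  # chain to the previous call
                current.append((pos, pool))
            else:
                groups.append((chrom, family, current[0][0], {p for _, p in current}))
                current = [(pos, pool)]
        groups.append((chrom, family, current[0][0], {p for _, p in current}))
    return groups
```

An insertion shared by several pools then appears as a single group whose pool set has more than one member.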
The expression of RepeatModeler TE families in SR and ST males was assessed using TEtools (Lerat et al. 2016) with default settings. Within TEtools, RNAseq reads from two pools of SR male testis and two pools of ST male testis (Reinhardt et al. 2014; BioProject PRJNA240197) were aligned to the RepeatModeler TE library with bowtie2 (Langmead and Salzberg 2012). Unannotated repetitive elements (“Unknown” interspersed and simple repeats) were removed after normalization but prior to differential expression analysis with DESeq2 (Love et al. 2014). Differential TE expression between the SR and ST pools was assessed using the negative binomial Wald test on the DESeq2-normalized counts for each TE family. TEtools was also used to estimate TE family copy number within the SR and ST genomic resequencing pools and in male and female genomic libraries, with DESeq2 used to normalize read counts prior to comparisons across libraries.
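DESeq2's default normalization is the median-of-ratios method. For each feature, counts are divided by that feature's geometric mean across libraries, and each library's size factor is the median of these ratios over features with no zero counts. The sketch reimplements only this normalization step, not the negative binomial Wald test. Because "Unknown" repeats were removed after normalization, they still contribute to the size factors.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict feature -> list of raw counts, one entry per library.
    Features with a zero count in any library are skipped, as in DESeq2.
    """
    n_libs = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_libs)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_libs
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]
```

Normalized counts are then each raw count divided by its library's size factor.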
Annotation of gene duplication and expression, and differential coverage
A set of protein-coding genes annotated from a transcriptome assembly (BioProject PRJNA240197) was aligned to the three largest scaffolds using GMAP, allowing up to 10 gene alignments (“paths”) per gene (--npaths=10 --format=gff3_gene). Gene annotations with >50% alignment overlap with any RepeatMasker TE annotation were removed as potential TEs misannotated as genes. To compute this overlap, bedtools (Quinlan and Hall 2010) (intersect -wao) was used to determine the number of overlapping bases for each exon, and the proportion of overlapping bases was then calculated across the entire length of each gene alignment (“path” in GMAP terminology). RNAseq data from SR and ST male testis (Reinhardt et al. 2014) were aligned to the genome using HISAT2 v2.0.1-beta (Kim et al. 2019) (--dta -X 800), and expression of each gene feature, including newly discovered duplicates found by GMAP, was estimated in each library using featureCounts (Liao et al. 2014). Differential coverage and differential expression were each assessed on a per-feature basis using DESeq2 (Love et al. 2014) with the default Wald test on the negative binomial distribution.
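The per-path overlap fraction can be sketched from rows derived from `bedtools intersect -wao` output, whose last column reports the number of overlapping bases (0 when there is no overlap). This assumes each row has been reduced to a hypothetical (exon_id, path_id, exon_length, overlap_bp) tuple. Each exon's length is counted once even if it overlaps several TEs. If TE annotations overlap one another, the same exon bases can be counted twice, so real data may need merged TE intervals.

```python
from collections import defaultdict


def te_overlap_fraction(rows):
    """rows: (exon_id, path_id, exon_length, overlap_bp) tuples, one per
    `bedtools intersect -wao` output line. Returns {path_id: fraction}."""
    exon_len = {}
    overlap = defaultdict(int)
    for exon_id, path_id, length, bp in rows:
        exon_len[(path_id, exon_id)] = length  # same exon may appear on many rows
        overlap[path_id] += bp
    total = defaultdict(int)
    for (path_id, _), length in exon_len.items():
        total[path_id] += length
    return {p: overlap[p] / total[p] for p in total}
```

Paths with a fraction above 0.5 would then be dropped as likely TE-derived.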
Molecular evolutionary analyses
Divergence was assessed relative to the Teleopsis dalmanni sp 2 (Paczolt et al. 2017) genome described above. The Td2 draft genome scaffolds were aligned to the three largest (chromosomal) scaffolds in the Hi-C assembly for T. dalmanni s.s. using GMAP (Wu and Watanabe 2005) (--nosplicing --format=samse). A Td2 consensus was called from this alignment by sorting and indexing it with samtools v1.3.1 (Li et al. 2009) and then calling the consensus with BCFtools v1.9 (Li 2011) (bcftools call --ploidy 1 -mA). With these parameters, regions that lack an aligned scaffold or that align as gaps appear as stretches of Ns in the consensus. To create a majority-rule XSR consensus sequence for the large X-chromosomal scaffold (PGA_scaffold1), the BAM-formatted alignments from both XSR pools were combined into a single BAM file and processed with bcftools (bcftools call --ploidy 1 -c; vcfutils.pl vcf2fq). To obtain a comparable XST consensus (similar sequencing and allelic coverage), an XST BAM file was produced from one replicate from each collection site, and its consensus was called as above. The coding regions of 3,709 X-linked genes, previously assembled and annotated from a multi-tissue RNAseq experiment (Baker et al. 2016; BioProject PRJNA240197), were localized to the X-chromosomal scaffold using their best GMAP alignment (--npaths=1 --format=gff3_gene --min-identity=0.9). Gene sequences were extracted from the XSR and XST consensus X chromosomes described above using the gffread utility (Pertea and Pertea 2020) and the mRNA gff annotations from GMAP. A few hundred genes contained in-frame stop codons in one or more lineages; these genes were trimmed to the longest open reading frame present and retained if the remaining sequence was longer than 50 amino acids.
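A minimal sketch of the trimming rule follows. It assumes "longest open reading frame" means the longest stop-free run of codons in the annotated reading frame, which is an interpretation rather than a detail given in the text. In practice, the same coordinates would need to be applied to every lineage so that the sequences stay aligned; the sketch handles one sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}


def trim_to_longest_orf(cds, min_aa=50):
    """Return the longest stop-free run of codons in the annotated frame,
    or None if it is not longer than min_aa codons. A trailing partial codon
    is ignored, and ambiguous codons (e.g. 'NNN') are not treated as stops."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    best_start, best_len, start = 0, 0, 0
    for i, codon in enumerate(codons + ["TAA"]):  # sentinel closes the last run
        if codon.upper() in STOPS:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = i + 1
    if best_len <= min_aa:
        return None
    return "".join(codons[best_start:best_start + best_len])
```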
Genes were excluded if they contained only ambiguous sequence (Ns) in one or more of the consensus genomes. After these exclusions, we counted nonsynonymous divergent sites and calculated pairwise dN/dS for XSR vs XST and for XST vs Td2 for 3,191 X-linked genes using the SNAP utility (Korber et al. 2000). Pairwise dN/dS (XST vs Td2) was also calculated, and the McDonald-Kreitman test (Td1 vs Td2) was performed, for a set of 17 core piRNA genes (Table S6).
A 13-“taxon” unphased alignment was also produced from the pool-seq data. For polymorphic sites, a major and a minor allele were called in each of the 6 pools and represented as two sequences in the alignment, giving 13 sequences in total with the Td2 outgroup. This alignment was used to identify segregating vs. divergent (relative to Td2) sites in each pool across the genome. Dxy (K2P-corrected) was calculated in 10 kb windows using poly+div_sfs.pl, either between XSR and XST chromosomes (between SR types) or between the two collection sites (across SR types).
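The K2P correction can be sketched as follows. With P the proportion of transitions and Q the proportion of transversions among comparable sites, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). Between-group Dxy is then taken here as the mean pairwise distance between sequences from the two groups. This is an illustration of the quantity, not a reimplementation of poly+div_sfs.pl. Sites with gaps or ambiguity codes are skipped.

```python
import math

PURINES, PYRIMIDINES = set("AG"), set("CT")


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    transitions = transversions = sites = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # gap or ambiguity code in either sequence
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        return float("inf")  # saturated: correction undefined
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def mean_between(group1, group2):
    """Dxy-style average of pairwise K2P distances between two groups."""
    d = [k2p_distance(a, b) for a in group1 for b in group2]
    return sum(d) / len(d)
```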
Data Access
Raw data and genome assemblies used in this project are available on NCBI BioProjects PRJNA655584, PRJNA391339, PRJNA662429 and PRJNA659474. Transposable element family consensus FASTA sequences for Teleopsis dalmanni, the MSG genotype matrix, and gff3-formatted gene annotations are available on the Digital Repository at University of Maryland (DRUM), archive IDs 1903/26380, fgxn-tuaf, and gfqi-iktk.
Disclosure Declaration
Nothing to disclose
Author Contributions
JR, RB, and GW prepared the manuscript. JR, AZ, KP, CL, JW, GW, and RB analyzed the data. CH and GW provided sequencing data.
Acknowledgements
The authors thank Melanie Kirk, Nathaniel Lowe, Wyatt Shell, George Ru, and Gabriel Welsh for assistance with analysis, sample preparation, and fly rearing; Philip Johns and Max Brown for fly collections; Shawn Sullivan for Hi-C analysis; Najib El-Sayed and Suwei Zhao for HiSeq library prep and sequencing assistance; Ellen Martinson for bacterial contamination screening; and Molly Schumer, Peter Andolfatto, and Wei Wang for reagents and advice on multiplexed shotgun genotyping (MSG). Funding for this work was provided by National Science Foundation grants DEB-0951816 to R.H.B. and DEB-0952260 to G.S.W., by USDA National Institute of Food and Agriculture grant 2018-67015-28199 to A.V.Z., and by the University of Maryland and the Geneseo Foundation.