Haplotype divergence supports ancient asexuality in the oribatid mite Oppiella nova

Sex strongly impacts genome evolution via recombination and segregation. In the absence of these processes, haplotypes within lineages of diploid organisms are predicted to accumulate mutations independently of each other and diverge over time. This so-called ‘Meselson effect’ is regarded as a strong indicator of the long-term evolution under obligate asexuality. Here, we present genomic and transcriptomic data of three populations of the asexual oribatid mite species Oppiella nova and its sexual relative Oppiella subpectinata. We document strikingly different patterns of haplotype divergence between the two species, strongly supporting Meselson effect like evolution and ancient asexuality in O. nova: (I) Variation within individuals exceeds variation between populations in O. nova but vice versa in O. subpectinata. (II) Two O. nova sub-lineages feature a high proportion of heterozygous genotypes and lineage-specific haplotypes, indicating that haplotypes diverged independently within the two lineages after their split. (III) The deepest split in gene trees generally separates haplotypes in O. nova, but populations in O. subpectinata. (IV) Tree topologies of the two haplotypes match each other. Our findings provide positive evidence for the absence of sex over evolutionary time in O. nova and suggest that asexual oribatid mites can escape the dead-end fate usually associated with asexual lineages.


Introduction
Sexual reproduction is considered as a prerequisite for the long-term persistence of eukaryote species, because it reduces selective interference among loci and thus facilitates adaptation and purifying 55 selection (Hill & Robertson, 1966;Felsenstein, 1974;Bell, 1982;Rice & Friberg, 2009;Neiman et al. , 2017) . Contrary to this scientific consensus, some exceptional taxa appear to have persisted in the absence of sex over millions of years, the so-called 'ancient asexual scandals' (sensu Judson & Normark, 1996;Schoen et al. , 2009a;Schurko et al. , 2009 ). These exceptional taxa are invaluable, because by understanding how they persisted as asexuals they could help to identify the adaptive 60 value of sex (Butlin, 2002) , one of the major riddles in evolutionary biology (Maynard Smith, 1978;Bell, 1982) . However, several species originally believed to be ancient asexual scandals were later suggested to be either recently derived asexuals or to engage in some form of rare or non-canonical sex (Lunt, 2008;Schurko et al. , 2009;Signorovitch et al. , 2015;Schwander, 2016;Laine et al. , 2020) .
At least two candidates for ancient asexuality remain, the darwinulid ostracods and several 65 parthenogenetic lineages of oribatid mites. Both groups appear to have persisted for tens of millions of years (Heethoff et al. , 2009; and diversified into ecologically different species (Birky & Barraclough, 2009;Schurko et al. , 2009) . However, support for obligate asexuality in darwinulid ostracods and oribatid mites has largely been based on negative evidence, i.e. the absence of males among thousands of females and the non-functionality of rare males (Taberly, 1988;70 Palmer & Norton, 1991;Birky, 2010;Wehner et al. , 2018) . Screening these groups for positive evidence of ancient asexuality is therefore of major importance (Normark et al. , 2003) .
One of the strongest predictions for evolution without recombination and segregation is that the two haplotypes (each stemming from one homologous chromosome copy) within a diploid clonal lineage accumulate mutations independently of each other. Thus, after the loss of sex, the haplotype 75 sequences diverge over time, and levels of intra-individual heterozygosity increase ( Figure 1 ). This intra-individual haplotype divergence is commonly known as the 'Meselson effect' (Birky, 1996;Normark et al. , 2003) .

Figure 1:
Nuclear haplotype trees expected under (long-term) obligate asexual and sexual 80 reproduction. In diploid, functionally clonal organisms, homologous chromosomes accumulate mutations independently of each other and evolve as independent lineages (note that this can be restricted to specific regions sheltered from a loss of heterozygosity caused by mechanisms such as gene conversion). Accordingly, divergence between haplotypes within individuals (blue) is expected to exceed the mean divergence between haplotypes of individuals from different populations. 85 Furthermore, the haplotype tree fully separates homologous haplotypes at its deepest split (red), which results in high frequency of heterozygous SNPs shared among individuals of different populations (green). Finally, the topologies of haplotype subtrees A & B are expected to match each other (the orange line represents the mirror axis) due to their parallel divergence. In sexual organisms, haplotype divergence is expected to follow population divergence and the haplotype tree to resemble 90 that of the populations. Therefore, in sexuals, divergence between haplotypes within individuals is expected to be smaller than the divergence between populations, and the haplotype tree fully separates populations (red dashed). Figure adapted from (Schwander et al. , 2011) .
Surprisingly, there has only been equivocal empirical validation of this strong theoretical prediction 95 thus far. In several asexual lineages the Meselson effect was not found (e.g., darwinulid ostracods;  , or could be explained by mechanisms other than haplotype divergence after the transition to asexuality, such as a hybrid origin (e.g., Meloidogyne nematodes; Lunt, 2008 ) or divergence between paralogs (ohnologs) rather than between haplotypes (e.g., bdelloid rotifers; Mark Welch et al. , 2008;Flot et al. , 2013 , and Timema stick insects;Schwander et al. , 2011 ;Jaron et al. , 100 2020a;reviewed in Hoerandl et al., 2020). Potential support for the Meselson effect was found in fissiparous species of Dugesia flatworms, but with data on only two genes, alternative explanations such as divergent paralogs could not be excluded (Leria et al. , 2019) . The as yet strongest support comes from a whole-genome study of obligately asexual trypanosomes, unicellular parasitic flagellates, in which some genomic regions are highly heterozygous and show the expected parallel 105 haplotype divergence (Weir et al. , 2016) . We still lack any support from genome-wide analyses of the Meselson effect in asexual animals.
In this study, we characterize haplotype divergence patterns in the asexual oribatid mite species 120 Oppiella nova and its sexual relative O. subpectinata. A previous study, based on molecular divergence estimates, suggested that O. nova persisted in the absence of sex for millions of years, given that sub-lineages within this species split 6 to 16 myr ago, (Schaefer et al. , 2010;Von Saltzwedel et al. , 2014 ). Using de novo genomes and polymorphism data from transcriptome-resequencing, we tested for four population genomic signatures expected under obligate   Table S1) .

(I) The divergence within asexual individuals is large and exceeds the divergence between
populations. 150 We analysed within-individual and between-population divergence of individuals from three   subpectinata individuals ( Table 1 ). This is the expected pattern given that haplotype divergence under functionally mitotic asexuality should result in increased heterozygosity levels compared to closely 180 related sexual species, unless gene conversion or other homogenizing mechanisms occur at higher rates than new mutations (Birky, 1996) . subpectinata, values were in the range expected for a non-inbred panmictic sexual species (mean F is = 0.002, Table 2 ) and values for O. nova were significantly lower (mean F is = -0.328; Wilcox rank sum test, W = 0; P < 0.001).  10,197) 210 to the gain of heterozygosity via new mutations. We therefore conducted explicit tests for the Meselson effect solely on the seven of the nine individuals where it could potentially be present.
If the loss of sex in O. nova occurred prior to population separation, the observed heterozygosity excess in seven individuals is expected to result from shared heterozygous SNPs among individuals of the two different lineages (see green lines in Figure 1 ). To test this, we generated a Site Frequency 215 Spectrum (SFS). Sites with heterozygous SNPs shared among the seven individuals were 19 times more frequent than expected under Hardy Weinberg Equilibrium (HWE; yellow bar in Figure 3 ).  with a given number of non-reference variants over the seven heterozygous individuals (e.g., seven diploid individuals can display a maximum of 14 variants relative to the reference genome). Heterozygous genotypes shared among all seven individuals, or among individuals of lineages I and II privately, are colour-highlighted and their excess over HWE indicated (see legend). The SFS is 230 consistent with the accumulation of shared heterozygous variants after the loss of sex, followed by lineage separation and independent accumulation of further heterozygosity in each lineage (inset tree with numbers of shared heterozygous variants at each branch).

III) The deepest split in many haplotype phylogenies separates haplotypes.
235 A classical signature of haplotype divergence under asexuality is the full separation of haplotypes A and B at the deepest split of a haplotype tree. By contrast, haplotypes are generally expected to diverge according to population divergence in sexual organisms ( Figure 1 ). To confirm that the phased regions represent allele-haplotypes and not merged paralogs, we verified 250 that the genomic read coverage of phaseable regions was not exceeding the coverage of single-copy genes from the BUSCO arthropod database (merged paralogs should have an at least 2x higher coverage, see Methods ). This was indeed the case as single-copy BUSCO genes, phased regions and the scaffolds from which phased regions derived showed a median coverage of 126x, 101x, and 87x, respectively. By contrast, the median coverage of known duplicate BUSCO genes, which served as a 255 positive control, was about two-fold higher (207x) ( Supplementary Figure 3 ).
We next aligned the phased haplotype sequences and calculated best fitting ML trees. We then computed for each tree how distinct it was from two topology-constrained trees (Kuhner & Felsenstein, 1994) : (1) Figure 4: Haplotype trees are more consistent with asexuality in Oppiella nova (a) but with sex in Oppiella subpectinata (b) . Frequency distribution of per-region tree-distance score comparisons ( dist. asex-tree -dist. sex-tree ). The score measures the combined distance (dist) in topology and branch lengths between an unconstrained tree and one of two constrained trees (asex-tree, sex-tree; see schematic trees for each species, respectively). A negative value indicates that a phaseable region's best ML tree 280 is more similar to its asex-tree than to its sex-tree. IV) The topologies of haplotype subtrees are matching. 290 The high frequency of heterozygous genotypes shared among the seven individuals ( Figure 3 ) and the split of haplotypes A and B at the basis of a haplotype tree ( Figure 4a ) could theoretically be explained by hybridisation at the origin of asexuality, while the separation of the two lineages I and II could have been caused by the loss of heterozygosity at different sites within each lineage. Therefore, we also tested for evidence of long-term asexual evolution within lineages by assessing parallel parallel divergence. The 38 instances of parallel divergence are more than 2 times more frequent than expected by chance, meaning parallel divergence is also a significant feature of lineage II (P < 0.001).

Discussion
305 Independent mutation accumulation in haplotypes of diploid asexual organisms is considered to be strong direct support for evolution under obligate asexuality (Birky, 1996;Normark et al. , 2003) .
Surprisingly, empirical evidence for this 'Meselson effect' in parthenogenetic animals is, as yet, either lacking or equivocal (recently reviewed in Hoerandl et al. , 2020) . Here, we report population genomic signatures supporting the presence of the Meselson effect in the ancient asexual oribatid mite species Hybridization at the origin of asexuality can generate allele divergence patterns mimicking the 320 Meselson effect even in recently evolved asexual species (Taylor & Foighil, 2000;Delmotte et al. , 2003;Johnson, 2006;Lunt, 2008;Ament-Velásquez et al. , 2016) . However, a hybrid origin of asexuality is unlikely to explain our results for two reasons: first, asexual animals with a hybrid origin typically display within-individual divergence levels far exceeding the estimates in O. nova (Jaron et al. , 2020b) . Second, even if asexuality in O. nova was of hybrid origin, a hybrid origin accounts 325 neither for the high frequencies of heterozygous SNPs private to the two divergent lineages ( Figure 3 ) nor for the parallel divergence of haplotypes within lineages. Both patterns are likely generated by mutations that occurred after the origin of asexuality, which thus indicates long-term asexual evolution independently of possible hybridisation at the origin of asexuality.
Our results indicate that levels of haplotype divergence vary strongly among genomic regions in O.
330 nova individuals, as well as among different O. nova lineages (Figure 4a, Table 1 ). This is in line with previous findings of varying levels of heterozygosity loss vs retention in other asexual animal species (Jaron et al. , 2020b) and could explain why previous studies using individual genes in asexual mites and ostracods found no increased heterozygosity Schaefer et al. , 2006) . Our results thus illustrate that haplotype divergence and other genomic consequences of 335 asexuality need to be studied on whole genomes or transcriptomes rather than on a few genes (see also Neiman et al. , 2018) .
Besides haplotype divergence in O. nova , our study indicates the presence of coexisting, strongly divergent lineages with different heterozygosity levels ( Figure 2a; Table 1 ). Coexistence of strongly divergent lineages in O. nova has been shown previously based on the mitochondrial gene COI 340 (separation of lineages was estimated to have occurred 6-16 mya ago) and was considered to indicate coexistence of forest and grassland genotypes (Von Saltzwedel et al. , 2014) . O. nova occurs over a wide variety of habitat types and shows a cosmopolitan distribution (Subias, 2004) , suggesting that the extensive intra-specific polymorphism might be linked to large population sizes (Brandt et al. , 2017). Independently of the origin of the extensive polymorphism, we also observed differences in 345 heterozygosity between lineages. These could be linked to different mutation rates between lineages or to the presence of non-mutually exclusive mechanisms of heterozygosity loss, including the meiotic parthenogenesis proposed for asexual oribatid mites, lineage-specific deletions (hemizygous genome regions; Xu et al. , 2010 ), and (GC-biased) gene conversion (Marais, 2003) . GC-biased gene conversion has notably been suggested to contribute to the loss of heterozygosity in other 350 parthenogenetic animals, e.g. in the darwinulid ostracod Darwinula stevensoni and in aphids of the tribe Tramini (Normark, 1999; . Irrespective of the mechanisms underlying heterozygosity losses in O. nova , our results strongly support genome evolution in the absence of sex over evolutionary times in the asexual oribatid mite species O. nova . This is in line with previous studies which have shown that oribatid mites are able to 355 overcome some major selective disadvantages predicted for asexual lineages. Unlike other asexual animal taxa, genomes of oribatid mites show reduced accumulation of slightly deleterious mutations compared to their sexual relatives, possibly facilitated by large population sizes (Maraun et al. , 2012;Jaron et al. , 2020b;Brandt et al. , 2017;Bast et al. , 2018) . Also, similar to other asexual organisms, oribatid mites show no increased abundance and activity of transposable elements compared to 360 sexuals (Bast et al. , 2016(Bast et al. , , 2019 . These findings suggest that asexual oribatid mites indeed escape the dead-end fate usually associated with asexual lineages.  Table S6 ). Living animals were separated from the leaf litter with heat gradient extraction (Kempson et al. , 1963) and identified (Weigmann, 2006) , followed by at least one week of starving to reduce potential contaminants derived from gut contents. Afterwards, subpectinata from H, SA and KF (three individuals from each forest site for each species) from the 2017 collection batch. RNA extraction was done as described above. DNA and RNA quantity and 385 quality was measured using respectively Qubit and NanoDrop (Thermo Scientific), and Bioanalyzer (Agilent).

Reference genome assemblies and contaminant removal
For genome sequencing, extracted DNA from single individuals was amplified in two independent 390 reactions using the SYNGIS TruePrime WGA kit and then pooled. With the available read data, we tested a range of assembly strategies. The best assemblies were generated using normalized overlapped reads, because whole genome amplification introduces overrepresented genomic regions, which leads to coverage bias that is problematic for assembly. 410 Overlapped read libraries were generated by merging the paired forward and reverse reads of the 180 bp read libraries and additionally merging unpaired reads, followed by normalization using BBnorm v37.82 (Bushnell, 2014) . These normalized overlap read libraries were assembled into contigs using SPAdes v3.10.1, a multi k-mer assembler (Bankevich et al. , 2012) , with options -m 400 --careful -k 21,33,55,77,99,111,127. The resulting contigs were ordered into scaffolds using the 350 bp, 500 415 bp and 3000 bp read libraries using SSPACE v3.0 (Boetzer et al. , 2011) with default parameters. To close gaps emerging during scaffolding, GapCloser v1.12 (Luo et al. , 2012) with option -l 125 was run. For details see https://github.com/AsexGenomeEvol/HD_Oppiella: assembly and mites.

Genome annotation
The de-contaminated genome assemblies v03 were annotated using MAKER v2.31.8 (Holt & 435 Yandell, 2011) , a hybrid de novo evolution-based and transcript-based method. For this, repetitive genomic regions are first masked using RepeatMasker v4.0.7 (Smit et al. , 2013(Smit et al. , -2015 as implemented in MAKER. Protein coding genes were then predicted in a 2-iterative way described in (Campbell et al. , 2014) with minor modifications following author recommendations. For the first iteration, genes were predicted using Augustus v3.2.3 (Stanke et al. , 2006) trained with the BUSCO v3.0.2 results 440 (arthropoda_odb9 lineage with the --long option). A combination of UniProtKB/Swiss-Prot (release 2018_01) and the BUSCO arthropoda_odb9 proteome were used as protein evidence. The Trinity assembled mRNA-seq sequences (described below) were used as transcript evidence. The resulting gene models from iteration 1 were then used to retrain Augustus as well as SNAP v2013.11.29 (Korf, 2004) and a second iteration was performed. Following, predicted protein coding genes were 445 functionally annotated using Blast2GO v5.5.1 (Conesa et al. , 2005)  455 For quality filtering of the resulting transcriptomes, the trimmed RNAseq reads were mapped back against the transcriptomes using Kallisto v0.43.1 (Bray et al. , 2016) with options --bias and --rf-stranded, then transcripts with at least 1 TPM in any samples were retained. All computation for genome assembly and annotation were run on the Vital-IT cluster of the Swiss Institute of Bioinformatics.

460
Haplotype divergence: RNAseq, quality control and mapping RNA extracts were fragmented to 175 nt for strand-specific library preparation using the NEBNext ® Ultra ™ II Directional RNA Library Prep Kit. Paired-end sequencing with a read length of 100 bp was performed on a HiSeq2000 platform at the GTF (Genomics Technology Facility Lausanne, 465 Switzerland). Data processing for haplotype divergence inference was done using the high performance computing cluster of the Gesellschaft für Wissenschaftliche Datenverarbeitung Goettingen, Germany (GWDG). RNAseq reads were adapter-and quality-trimmed using TrimGalore v0.6.5 with default options (Phred quality threshold 20; adapter auto-detection; Martin, 2011;Krueger, 2012) . Contaminating sequences were removed using kraken2 (--paired; --db 470 minikraken2_v2; Wood & Salzberg, 2014) followed by mapping paired-end reads of each individual simultaneously against their respective reference genome, scaffolds flagged as contaminating sequences assembled together with the respective mite reference genomes (identified as described above), the human reference genome GRCh38.p12 (GenBank assembly accession:

MDS plots and AMOVA
For calculating MDS plots and AMOVA, first genotypes with a coverage < 10 were removed from gvcf-files using vcftools v0.1.15 (Danecek et al. , 2011) . Next, sites including at least one missing 500 genotype, monomorphic or tri-allelic variants, and indels were removed. To visualise genotype composition of populations, multidimensional scaling (MDS; two scales for two-dimensional representation) was done using plink v1.9 with the options --cluster, --mds-plot 2 eigvals and --allow-extra-chr (Purcell et al. , 2007) . Population differentiation was tested based on the filtered set To identify putative paralogs in the phased regions, these regions were tested for double coverage compared to the genomic baseline. Reads used to assemble genomes were mapped back to single 555 copy genes and duplicated genes identified by BUSCO (see above), and additionally to the phased regions and to scaffolds from which the phased regions were derived (but that were masked in the phased regions), using bowtie2 v2.3.4.1 with standard parameters (Langmead & Salzberg, 2012) . The mapped alignments were quality filtered (MAPQ score > 10) using samtools and optical and PCR duplicates removed using Picard Tools Mark Duplicates v2.22.0. Following, coverage was calculated 560 using bedtools genomecov v2.26.0 (Quinlan & Hall, 2010) .

Topology testing
To enable testing whether alignments of phaseable regions are better explained by a topology separating the haplotypes (as expected under asexuality) as compared to a topology separating 565 populations (expected under sexual reproduction) or vice versa , a constrained tree search was done.
Two constrained Maximum Likelihood (ML) trees, one complying with a fixed haplotype-separating topology (asex-tree), the other with a fixed population-separating topology (sex-tree) were calculated for each phased region using iqtree v1.6.10 with 1000 bootstrap replicates and model-testing included (Nguyen et al. , 2015) . https://github.com/AsexGenomeEvol/HD_Oppiella: Ossex.tre). Next, the distance between the two resulting trees and an unrestricted best fitting ML tree was estimated according to Kuhner & Felsenstein, 1994 with the dist.topo function implemented in the R package ape (Paradis & Schliep, 2019) . To enable the comparison between distances of different phaseable regions, the topological 585 distances of the best fitting ML tree to the haplotype-separating tree and to the population-separating tree were combined as Δ dist. HD-tree -dist. sex-tree value for each phaseable region. To test if the haplotype-separating tree was a significantly better fit to the alignment than the population-separating tree, trees were compared using RELL approximation with 10,000 RELL replicates and an approximately unbiased (AU) test with iqtree (Kishino et al. , 1990;Shimodaira, 2002) . Using the AU 590 test, we also compared both constrained trees to the unconstrained tree. For detailed information, see https://github.com/AsexGenomeEvol/HD_Oppiella: treecalcandtopotest.U and topodist.UR.

Parallel divergence testing
Phasing and haplotype reconstruction was done as described above but coverage was reduced to a 595 minimum of five to increase the number of informative sites, and thereby the number of non-polytomous trees. Calculation of best fitting ML trees was done as described above. Resulting topologies were screened by eye for being non-polytomous, for showing haplotype separation and for parallel divergence. To calculate the probability to observe parallel divergence in at least eight trees out of 31 haplotype separating eight taxa trees by chance (lineage I) we used the binomial theorem 600 with n = 31 (31 trees showing haplotype separation), k = 8 (8 trees showing · ∑ n k ( n! (n k!)k! − ) x k y n k − parallel divergence), x = 0.067 (the probability to observe parallel divergence in two rooted four taxa trees by chance) and y = 0.933 (counter event; 1-x). To calculate the probability to observe parallel divergence in at least 38 trees out of 55 haplotype separating six taxa trees by chance (lineage II) we used the binomial theorem with n = 55, k = 38, x = 0.333 and y = 0.667.

Statistical analyses
All statistical analyses were done in R v3.6.3 (R Core Team, 2013) unless mentioned otherwise.

Supplementary Figures
Supplementary Figure 1: The oribatid mite samples were collected in different forests in Germany. For detailed information on sampling, see Supplementary Table S8. Contigs that contained the phased regions but were masked for these, the phased regions, complete single-copy genes and duplicated genes as identified by BUSCO. Dotted lines represent the median of the coverage for the respective regions.