The genomics and evolution of inter-sexual mimicry and female-limited polymorphisms in damselflies

Sex-limited morphs can provide profound insights into the evolution and genomic architecture of complex phenotypes. Inter-sexual mimicry is one particular type of sex-limited polymorphism in which a novel morph resembles the opposite sex. While inter-sexual mimics are known in both sexes and a diverse range of animals, their evolutionary origin is poorly understood. Here, we investigated the genomic basis of female-limited morphs and male mimicry in the common bluetail damselfly. Differential gene expression between morphs has been documented in damselflies, but no causal locus has been previously identified. We found that male mimicry originated in an ancestrally sexually dimorphic lineage in association with multiple structural changes, probably driven by transposable element activity. These changes resulted in ~900 kb of novel genomic content that is partly shared by male mimics in a close relative, indicating that male mimicry is a trans-species polymorphism. More recently, a third morph originated following the translocation of part of the male-mimicry sequence into a genomic position ~3.5 mb apart. We provide evidence of balancing selection maintaining male mimicry, in line with previous field population studies. Our results underscore how structural variants affecting a handful of potentially regulatory genes and morph-specific genes can give rise to novel and complex phenotypic polymorphisms.


4 1
First, we investigated the divergence associated with the male-mimicking A morph. Pairwise 1 4 2 analyses revealed 568,039 and 508,031 k-mers (length = 31 bp) significantly associated with the 1 4 3 A vs O and A vs I comparisons, respectively. To determine whether the associated k-mers 1 4 4 represent differences in genomic content or sequence between the morphs, we mapped these k-1 4 5 mers to morph-specific reference genomes. If the associated k-mers are due to novel sequences 1 4 6 found in one morph but not the other, we would expect a vast majority of the significant k-mers 1 4 7 to be found in only one of the two morphs in a pairwise comparison. If the significant k-mers are 1 4 8 8 3 mer based GWAS, as implemented here, is a powerful method to identify variation in genomic 2 8 4 content and sequence, especially when the genomic architecture of the trait of interest is initially 2 8 5 unknown 35 . In this study, we did not know a priori which of the three morphs, if any, would 2 8 6 harbour unique genomic content. Had we ignored differences in genomic content between 2 8 7 morphs and based our GWAS analysis solely upon the DToL (O) reference assembly, we would 2 8 8 have failed to identify SNPs between I and O morphs (Extended Data Fig. 8), and the origin of I 2 8 9 females via a translocation of A content would have been obscured.

9 0
Second, a role for TEs in creating novel and even adaptive phenotypic variation is increasingly 2 9 1 being recognized 43,44 . Here, we found that a ~400 kb region of unique genomic content, possibly 2 9 2 driven by LINE transposition is associated with the male-mimicry phenotype in at least two 2 9 3 species of Ischnura damselflies. TE activity can contribute to phenotypic evolution by multiple 2 9 4 mechanisms. For instance, TEs may modify the regulatory environment of genes in their vicinity, 2 9 5 by altering methylation 45 and chromatin conformation patterns 46 , or by providing novel cis-2 9 6 regulatory elements 47 . The male-mimicry region in I. elegans is located between two coding 2 9 7 genes with putative DNA-binding domains, and which may thus act as transcription factors.

9 8
However, our expression data does not provide unequivocal support for differential regulation of 2 9 9 either of these genes between female morphs. Importantly, currently available expression data 3 0 0 comes from adult specimens, as female morphs are not visually discernible in aquatic nymphs.

0 1
Yet, the key developmental differences that produce the adult morphs are likely directed by 3 0 2 regulatory variation during earlier developmental stages. Now that the morph locus has been 3 0 3 identified, future work can address differential gene expression at more relevant developmental 3 0 4 stages, before colour differences between morphs become apparent.

0 5
TEs can also contribute to phenotypic evolution if they become domesticated, for example, when 3 0 6 TE-encoded proteins are remodeled through evolutionary change to perform adaptive host 3 0 7 functions 48 . We found two transcripts located in A specific or A/I specific regions that are likely 3 0 8 derived from LINE retrotransposons and are actively expressed in the genomes that harbour 3 0 9 them (Fig. 6b). It is therefore possible that these transcripts participate in the development of 3 1 0 adult colour patterns, which are initially more similar between A and I females than between 3 1 1 either of these morphs and O females 27,29 . Yet, functional work on these transcripts is required to 3 1 2 ascertain their role in morph determination. Finally, TEs can become sources of novel small 3 1 3 regulatory RNAs which play important regulatory roles 49 , including in insect sex 3 1 4 determination 50 . Thus, future work should also address non-coding RNA expression and function 3 1 5 in the morph locus.

1 6
Our results also provide molecular evidence for previous insights, gained by alternative research 3 1 7 approaches, on the micro-and macroevolution of female-limited colour polymorphisms. A 3 1 8 wealth of population data in Southern Sweden has shown that female-morph frequencies are 3 1 9 maintained by balancing selection, as they fluctuate less than expected due to genetic drift 31 .

2 0
Behavioural and field experimental studies indicate that such balancing selection on female 3 2 1 morphs is mediated by negative frequency-dependent male harassment 51,52 . We add to these sexually-antagonistic traits with a known genetic basis, in which predictions about these genomic 3 2 6 effects can be tested 53,54 . Here, the signature of balancing selection on the morph locus matches 3 2 7 the expectation of inter-sexual conflict resulting in negative-frequency dependent selection and 3 2 8 maintaining alternative morph alleles over long periods.

2 9
Similarly, a recent comparative study based on phenotypic and phylogenetic data inferred a 3 3 0 single evolutionary origin of the male-mimicking morph shared by I. elegans and I.

3
3 1 senegalensis 28 . Our present results strongly support this common origin. This is because A 3 3 2 females in both species share unique genomic content, including associated transcripts, and an   few previous examples of a novel phenotypic morph arising via recombination between two pre-3 5 0 existing morph haplotypes 10 . In pond damselflies, female polymorphisms with three or more 3 5 1 female morphs are not uncommon, and in some cases female morphs exhibit graded resemblance 3 5 2 to males 59 . It is therefore possible that recombination, even if rare, has repeatedly generated 3 5 3 diversity in damselfly female morphs.

5 4
While recombination might have had a role in generating the the novel I morph, we observe

6 4
Both historical scenarios are compatible with a morph locus reminiscent of a supergene, which is

9
De novo genome assembly 4 5 0 Bases in raw ONT reads from I. elegans were called using Guppy v 4.0.11 (Ao and Io data) and

8 9
We first investigated genomic divergence between morphs using a standard GWAS approach 4 9 0 based on SNPs (Extended Data Fig. 1). Initially, we conducted preliminary analyses using 4 9 1 different morph assemblies as mapping reference. Once the A-specific genomic region was 4 9 2 confirmed, we designated the A assembly as the mapping reference for the main analyses. Short- with mapping quality > 20, genotype quality > 30 and minor allele frequency > 0.02 (i.e. the 4 9 7 variant is present in more than one sample). To avoid highly repetitive content, we filtered 4 9 8 variants that had a combined depth across samples > 1360 (equivalent to all samples having ~ 4 9 9 50% higher than average coverage), and variants located in sites annotated as repetitive in either

0 5
We created a list of all k-mers of length 31 in the short-read data (19 females per morph,

0 6
Extended Data Fig. 1) following Voichek and Weigel 35 , and counting k-mers in each sample 5 0 7 using KMC v 3.1.0 84 . The k-mer list was filtered by the minor allele count; k-mers that appeared 5 0 8 in less than 5 individuals were excluded. k-mers were also filtered by percent canonized (i.e. the 5 0 9 percent of samples for which the reverse complement of the k-mer was also present). If at least 5 1 0 20% of the samples including a given k-mer contained its canonized form, the k-mer was kept in 5 1 1 the list. The k-mer list was then used to create a table recording the presence or absence of each 5 1 2 k-mer in each sample. A kinship matrix for all samples was calculated from this k-mer table, and 5 1 3 was converted to a PLINK 83 binary file, where the presence or absence of each k-mer is coded as 5 1 4 two homozygous variants. In this step, we further filtered the k-mers with a minor allele 5 1 5 frequency below 5%.

1 6
Because a single variant, be it a SNP or SV, will likely be captured by multiple k-mers, 5 1 7 significance testing of k-mer associations requires a method to control for the non-independence content is distributed across each genome (Extended Data Table 1).

3 1
To validate the k-mer GWAS results of unique genomic content in A-females relative to both I-

3 2
and O-females, we plotted read-depth across our region of interest (the unlocalized scaffold 2 of 5 3 3 chromosome 13, see Results) in the A assembly (Extended Data Fig. 1). Short-read data (19 5 3 4 samples per morph) were mapped to the assembly with bwa-mem v 0.7.17 78 and reads with 5 3 5 mapping score < 20 were filtered, using Samtools v 1.14 87 . Long-read data (one sample per 5 3 6 morph) were also mapped to the assembly using minimap2 v 2.22-r1110 69 , and quality filtering 5 3 7 was conducted as above. Read depth was then averaged for each sample across 500 bp, non-

4 1
To account for differences in overall coverage between samples, we conducted the same 5 4 2 procedure on a large (~15 mb) non-candidate region in chromosome 11 and calculated a 5 4 3 "background read depth" as the mean read depth across the non-repetitive windows of this 5 4 4 region. We then expressed read-depth in the candidate region as a proportion of the background 5 4 5 read depth. Values around 1 thus indicate that a sample is homozygous for the presence of the 5 4 6 sequence in a window. Values around 0.5 suggest that the sample only has one copy of this 5 4 7 sequence in its diploid genome (i.e. it is heterozygous). Finally, values of 0 imply that the 500 bp 5 4 8 reference sequence is not present in the sample (i.e. the window is part of an insertion or 5 4 9 deletion).

0
We also investigated read-depth coverage on the I assembly, specifically across the region that 5 5 1 was identified in the k-mer based GWAS as capturing content that differentiated both A and I 5 5 2 females from O females (Fig. 3b, Extended Data Fig. 1). To do so, we followed the same Evidence of a trans-species polymorphism 5 9 9 We used pool-seq data from the closely related Tropical Bluetail damselfly (Ischnura 6 0 0 senegalensis) to determine whether male-mimicry has a shared genetic basis in the two species 6 0 1 (Extended Data Fig. 1). First, we aligned the short-read data from the the two I. senegalensis 6 0 2 pools (A and O-like) to the A morph assembly of I. elegans using bwa-mem v 0.7.17 78 , and 6 0 3 filtered reads with mapping score < 20, using Samtools v 1.14 87 . We then quantified read depth 6 0 4 as for the I. elegans resequencing data (see Read depth analysis above). To confirm that the 6 0 5 higher read-depth coverage of the A pool is specific to the putative morph locus, we also plotted 6 0 6 the distribution of read-depth differences between O-like and A pools across the rest of the 6 0 7 genome and compared it to the morph locus (Supporting Text 5). Next, we determined if the ~20 6 0 8 kb SV that characterizes A and I females of I. elegans is also present in A females of I. We assembled transcripts in the A morph genome (Extended Data Fig. 1) to identify potential 6 1 7 gene models unique to the A or A and I morphs and which would therefore be absent from the I. 6 1 8 elegans reference annotation (based on the O haplotype). First, all raw RNAseq data from I. 6 1 9 elegans samples were mapped to the A assembly using HISAT2 v 2.2.1 95 and reads with 6 2 0 mapping quality < 60 were filtered using Samtools v 1.19 87 . Transcripts were then assembled in 6 2 1 StringTie v 2.1.4 96 under default options, and merged into a single gtf file. Transcript abundances 6 2 2 were quantified using this global set of transcripts as targets, and a transcript count matrix was 6 2 3 produced using the prepDE.py3 script provided with StringTie. Mapped RNAseq data from I.

2 4
senegalensis was also used to assemble transcripts (Extended Data Fig. 1), but this time the 6 2 5 HISAT2 assembly was guided by the annotation based on I. elegans data, while allowing the 6 2 6 identification of novel transcripts. Transcript abundances were quantified as for I. elegans. 6 2 7 We analyzed differential gene expression using the package edgeR v 3.36 97 in R v 4.2.2 77 .

2 8
Transcripts with fewer than one count per million in more than three samples were filtered.

2 9
Library sizes were normalized across samples using the trimmed mean of M-values method 98 , 6 3 0 and empirical Bayes tagwise dispersion 99 was estimated prior to pairwise expression analyses.

3 1
Differential expression of genes in the morph loci was tested using two-tailed exact tests 100 , 6 3 2 assuming negative-binomially distributed transcript counts and applying the Benjamini and 6 3 3 Hochberg's algorithm to control the false discovery rate (FDR) 101 .

3 4
Nucleotide sequences of all transcripts mapped to the 1.5 mb morph locus in the A assembly 6 3 5 were selected. Coding sequences (CDS) in these transcripts were predicted using Transdecoder v 6 3 6 5.5.0 (https://github.com/TransDecoder/TransDecoder). Predicted CDS and peptide sequences 6 3 7 were read from the assemblies using the genome-based coding region annotation produced with 6 3 8 Transdecoder and gffread v 0.12.7 102 . We investigated whether any inferred peptides or 6 3 9 transcripts were unique to A or A and I by comparing these sequences to the DToL reference