Abstract
Many animal species remain separate not because they fail to produce viable hybrids, but because they “choose” not to mate. However, we still know very little of the genetic mechanisms underlying changes in these mate preference behaviours. Heliconius butterflies display bright warning patterns, which they also use to recognize conspecifics. Here, we couple QTL for divergence in visual preference behaviours with population genomic and gene expression analyses of neural tissue (central brain, optic lobes and ommatidia) across development in two sympatric Heliconius species. Within a region containing 200 genes, we identify five genes that are strongly associated with divergent visual preferences. Three of these have previously been implicated in key components of neural signalling (specifically an ionotropic glutamate receptor and two regucalcins), and overall our candidates suggest shifts in behaviour involve changes in visual integration or processing. This would allow preference evolution without altering perception of the wider environment.
The evolution and maintenance of new animal species often relies on the emergence of divergent mating preferences1,2. Changes in sensory perception or other neural systems must underlie differences in innate behaviours between species, and will ultimately have a genetic basis. However, although the significance of behavioural barriers for speciation has been recognized since the Modern Synthesis3, we know almost nothing of the genes underlying changes in mating preferences, or variation in behaviours across natural populations more broadly4,5. Identifying these genes will provide an important route towards understanding how behavioural differences are generated, both during development and across evolutionary time.
Previous studies of isolating preference behaviours have largely been limited to the identification of causal genomic regions, which almost invariably contain many genes6,7,8,9. Only a handful of studies have identified likely candidate genes that contribute to species behavioural preferences. These are largely limited to chemosensory-guided mating preferences10,11,12,13, and have identified changes at chemoreceptor genes. To our knowledge, only two studies – in incipient fish species – have identified candidates for visual preference evolution, albeit indirectly, both suggesting a role for sensory perception mediated by changes in the peripheral visual system14,15. Whether or not visual preference evolution generally involves shifts at the sensory periphery, or in downstream processing, remains unknown.
The closely related species Heliconius melpomene and H. cydno differ in warning patterns, which are both under disruptive selection for mimicry16 and are important mating cues17. In central Panama, H. melpomene shares the black, red and yellow pattern of its local Heliconius erato co-mimic. In contrast, H. cydno mimics the black and white patterns of H. sapho. The two species remain separate largely due to strong assortative mating18. Visual preferences for divergent patterns are particularly apparent in males, which strongly prefer to court conspecific females17,19,20. Differences in warning pattern between melpomene and cydno are largely due to expression differences in just three genes, specifically optix21, WntA22 and cortex23.
Quantitative trait locus (QTL) mapping of H. melpomene and H. cydno has revealed three genomic regions of major effect that influence the relative time males spend courting red melpomene or white cydno females20. Notably, the best supported QTL was in the same genomic region as optix, the gene responsible for presence of the red colour pattern elements in H. melpomene21. Genetic linkage will facilitate speciation by impeding the breakdown of genetic associations between ecological and mating traits24. Nevertheless, this QTL, and its associated candidate region, contain hundreds of genes, and the exact genes responsible for differences in preference behaviour are not known.
Here, we first confirm that the behavioural QTLs identified previously are associated with variation in male courtship initiation. We then identify genes within the major QTL that are differentially expressed in the neural tissue (central brain, optic lobes and ommatidia) of H. melpomene and H. cydno, or that have protein-coding changes predicted to alter protein function. Out of 200 genes within the QTL region, we identify just five candidates likely to underlie assortative mating behaviours.
Results
Chromosome 18 is associated with differences in courtship initiation
Our previous results reveal that QTLs on chromosomes 1, 17 and 18 influence the relative time hybrid males spend courting red melpomene or white cydno females20. However, the time males spend courting a particular female might depend not only on male attraction, but on the female’s response (and in turn his response to her behaviour). To confirm that these previously reported QTLs influence male approach behaviours (as opposed to other traits that may influence courtship, for example male morphology25), we reanalysed our previous data, this time explicitly considering whether males initiated courtship towards melpomene, cydno or both types of female during choice trials. Consistent with our previous analyses20, we found that F1 and backcross-to-melpomene males prefer to court melpomene females, whereas courtship initiation behaviours segregate in the backcrosses to cydno (Figure 1). Notably, backcross-to-cydno males heterozygous at the QTL on chromosome 18 (i.e. with a melpomene allele derived from the F1 father) initiated courtship towards melpomene females more frequently than males homozygous for the cydno allele (Figure 1, bottom left; n = 139, ΔELPD = −10.9 (SE ± 5.1), i.e. a change of 2.14 SE units). Together with previous evidence that male hybrids bearing melpomene alleles at optix prefer to court artificial models of melpomene females over those of cydno26, these results suggest that the QTL on chromosome 18 harbours genes for visual attraction behaviours towards females with the red pattern. Consequently, we focused our subsequent analyses on this QTL on chromosome 18 (and also because tight linkage with optix allowed us to track the alleles at the preference-colour locus in hybrid crosses). The QTL on chromosome 1 was also retained in our model of initiation behaviours (Supplementary figure 1; n = 139, ΔELPD = −13.6 (SE ± 5.7), i.e. a change of 2.34 SE units), in contrast to the QTL on chromosome 17, which was not retained (n = 139, ΔELPD = −2.1 (SE ± 3.0)).
Results for the QTL on chromosome 1 are reported in the supplementary materials (Supplementary table 1).
27 genes within the major QTL are differentially expressed in the brains and eyes of H. cydno and H. melpomene
We hypothesized that changes in gene regulation that determine differences in visual mate preference behaviours might occur during pupal development (for instance, during visual circuit assembly) or in the imago, and must involve changes in the peripheral and/or central nervous system27. Therefore, we generated RNA-seq libraries for combined eye and brain tissue, across two pupal stages (around the time of ommochrome pigment deposit and half-way through pupal development) and one adult stage, for H. melpomene and H. cydno and compared their gene expression levels. We found considerable differential expression at the QTL on chromosome 18 (the QTL spans 2.75 Mb, and contains 200 genes). We identified 27 genes within the QTL region that show differential expression between melpomene and cydno, in at least one of the three developmental stages. These were mostly located within the QTL peak (i.e. the genomic region with strongest statistical association with male preference) or in close proximity to optix (Figure 2). The same genes were frequently differentially expressed across development (Supplementary table 1), with 11 genes being differentially expressed in more than one stage.
The genomic region between the start of chromosome 18 and optix (comprising the QTL peak) is highly divergent between melpomene and cydno28, and divergent coding sequences within this region could also introduce mapping biases of RNA-seq reads. To account for this, we repeated the analysis having mapped to both the H. melpomene reference genome29 and to a H. cydno genome30. Generally, we found similar patterns of differential expression when mapping to the H. cydno genome (Supplementary figure 2, Supplementary table 2). Nevertheless, in subsequent analyses we excluded two genes, HMEL034187g1 and HMEL034229g1, which, when mapping to the H. cydno genome, showed a reversal of the fold change or no differential expression, respectively.
A regucalcin and an ionotropic glutamate receptor are upregulated in both H. melpomene and F1 hybrid males
Our previous behavioural experiments suggest that the alleles for the melpomene behaviour are dominant over the cydno alleles20,26 (Figure 1). Given this pattern of dominance, we predicted that genes underlying variation in male preference would be up- or down-regulated in the brains of both melpomene and first generation (F1) hybrid males, with respect to cydno. Of the putative genes differentially expressed between cydno and melpomene reported above, only four, within the QTL candidate region, were differentially expressed between the F1 hybrids and cydno (Figure 2). These included two regucalcins (also called senescence marker proteins-30: HMEL013552g1, HMEL034199g1), an ionotropic glutamate receptor (HMEL009992g4), which is a putative ortholog of Grik2, and one gene with no annotated function (HMEL009992g1). We obtained the same results regardless of whether we considered both males and females together, or males alone. Further inspection of spliced mRNA-reads indicated that the two annotated regucalcins were in fact a single gene (from now on referred to as regucalcin2). This was also the case for the ionotropic glutamate receptor and the gene with no annotated function (from now on referred to as Grik2).
Differential expression of Grik2 in adults is likely due to cis-regulatory effects
To determine whether differences in gene expression levels between parental species were due to cis- or trans-regulatory changes, we conducted allele specific expression (ASE) analyses in adult F1 hybrids. In F1 hybrids, both parental alleles are exposed to the same trans-environment, and consequently trans-acting factors will act on alleles derived from each species equally (unless there is a change in the cis-regulatory regions of the respective alleles). Therefore, differences in allele specific expression indicate changes in cis-regulatory regions31. Both candidate genes (Grik2 and regucalcin2) had a very low number of SNPs that could differentiate the melpomene and cydno allele (using both gene models of the Hmel2.5 annotation and RABT annotation) and few reads mapped to these SNPs. Nevertheless, for Grik2, the melpomene allele was significantly more highly expressed relative to the cydno allele (p=0.017, Wald test), suggesting cis-regulatory effects (Figure 3). For regucalcin2, although there was a tendency towards up-regulation of the melpomene allele, consistent with cis-regulation, we did not have sufficient power to rule out trans-only regulatory effects (Figure 3; p=0.108, Wald test).
Grik2 is differentially expressed in hybrids that essentially differ only for allelic composition at the behavioural QTL region
In order to study the specific effects that melpomene derived alleles at the QTL on chromosome 18 had on gene-expression, we introgressed this region into a cydno background through multiple backcrosses (crossing design in Supplementary figure 3). We wanted to investigate whether differences at this QTL regulated expression of any specific genetic pathway during development, and more generally what changes in genome-wide transcription were observed in hybrids differing (mostly) just at this QTL region.
Notably, in these third-generation backcross hybrid comparisons, across the entire genome only 23 and 29 genes were differentially expressed (at 156h after pupal formation (APF) and at 60hAPF, respectively). Of these, 20 and 19 genes (at 156hAPF and at 60hAPF, respectively) were located on chromosome 18, indicating that gene expression differences in these comparisons were mostly restricted to the preference-colour region on chromosome 18, segregating for cyd/melp or cyd/cyd alleles. No genetic pathway was enriched for gene expression differences between these hybrids at either pupal stage (PANTHER enrichment test32), suggesting that overall this QTL harbours a few, modular changes in gene regulation in the developing brain/eyes of cydno and melpomene. Grik2 was the only gene detected as differentially expressed between species and hybrids at these pupal stages (Supplementary figure 4).
To verify that differential expression of candidate genes at the QTL region is driven by melpomene alleles on chromosome 18 and not by other melpomene alleles at trans-acting genes on other chromosomes, we compared gene expression levels between hybrids carrying cyd/melp vs. cyd/cyd regions on chromosomes 1, 4, 15 and 20 (Supplementary figure 5A). In these comparisons, there was no signal of differential expression on chromosome 18. This supports the cis-regulatory activity of the melpomene allele of candidate genes on chromosome 18. To test this further, we conducted another allele specific expression study in the BC3 hybrids, which suggested trans-regulatory effects for Grik2 at these pupal stages, but was less conclusive with regard to regucalcin2 (Supplementary figure 6). Since the causal gene/s might exert an effect on behaviour due to their action during development or in the adult form, and this action might in turn be differently (cis- vs trans-) regulated, we still considered both genes as strong candidates.
4 genes with protein-coding substitutions within the QTL candidate region have predicted effects on protein function
Because shifts in behavioural phenotypes could be due to changes in protein-coding regions, we additionally considered protein-coding substitutions between melpomene and cydno. Overall, we found 152 protein-coding substitutions, spanning 54 of the 200 genes across the entire QTL candidate region. We then studied whether these variants were predicted to have non-neutral effects on protein function with PROVEAN33. The PROVEAN algorithm predicts the functional effect of protein sequence variations based on how they affect alignments to different homologous protein sequences. We found 4 genes with such predicted effects (PROVEAN score < −2.5): specifically, a WD40-repeat domain containing protein (HMEL013551g3), a cysteine protease (HMEL009684g2), a MORN motif containing protein (HMEL006660g1), and another regucalcin (HMEL013551g4) adjacent to, but distinct from, that found to be differentially expressed above (from now on referred to as regucalcin1).
Candidate genes occur in regions with reduced gene flow
Of our six candidate genes for preference behaviours that contribute to reproductive isolation between H. cydno and H. melpomene (regucalcin2, Grik2 and the four genes with protein coding modifications), five are found within the QTL peak (Figure 4). Genetic changes causing reproductive isolation between populations are expected to reduce localized gene flow in their genomes. Therefore, we compared the position of our candidate genes to estimated levels of admixture proportions (fd)34 between H. melpomene and H. cydno across the QTL candidate region35. We found that all candidate genes were located in genomic regions with low fd values (Figure 4), suggesting localized resistance to gene flow between melpomene and cydno at these genes and their putative cis-regulatory regions.
Discussion
Behavioural isolation is frequently implicated in the formation of new species, and involves the correlated evolution of both mating cues and mating preference. Here we have analysed a genomic region in a pair of closely related sympatric butterflies, H. cydno and H. melpomene, that contains genes for divergence in both an ecologically relevant mating cue and the corresponding preference. Physical linkage between ecological and mating traits will facilitate speciation by allowing different barriers to act in concert to restrict gene flow36,37. Although the genes underlying changes in the warning pattern cue in Heliconius are well characterized21,22,23,38 (e.g. optix), those underlying the corresponding shift in behaviour have not previously been identified20,39,40. We have pinpointed a small number of genes that fall within the QTL peak, which show either expression (regucalcin2 and Grik2) or protein coding differences (HMEL013551g3, HMEL009684g2, HMEL006660g1, and regucalcin1) and fall within a region of reduced admixture, that are strong candidates for modulating mating behaviour.
Two broad neural mechanisms could underlie the evolution of divergent visual preferences, involving changes in either i) detection at the sensory periphery or ii) the processing and/or integration of visual information. Although H. melpomene and H. cydno have the same retinal mosaics/class of photoreceptors41, spectral sensitivity in the Heliconius eyes could be altered by filtering pigments42, or other physiological processes taking place at the photoreceptors/sensory periphery, eventually shifting sensitivity towards different wavelengths (and possibly colour patterns). It has previously been hypothesized that the gene regulatory networks for ommochrome deposition in the Heliconius eyes might have been co-opted in the wings43, where optix plays a central role, and therefore that optix might play a role in eye pigmentation in Heliconius. However, the protein product of optix has not been detected in pupal or adult retinas of various Heliconius species tested44, and therefore has no obvious link to ommochrome deposition in the eyes. More generally, the underlying evolutionary mechanism is unlikely to involve detection at photoreceptors, as this would probably have a broad effect on downstream processing2 and alter the visual perception of the animal’s wider environment.
The second mechanism, involving changes in the processing, and/or integration, of visual information, could act through an alteration of neuronal activity or connectivity. For instance, different levels of gene expression in conserved neural circuits between melpomene and cydno may affect overall synaptic weighting and determine whether a signal (e.g. colour and motion) elicits a motor pattern (response towards a female) or not. Consistent with this scenario, the composition of ionotropic receptors at post-synapses is a key modulator of synaptic transmission45, implicating Grik2. Interestingly, differential expression of ionotropic glutamate receptors is also associated with variation in social and aggressive behaviours in vertebrates46,47. Regucalcins are involved in calcium signalling48, which regulates synaptic excitability and plasticity49, and has an important role in axon guidance50 (albeit alongside additional roles across a broad range of biological processes), making the two regucalcins we identify strong candidates for behaviour.
Changes in the regulation of genes with pleiotropic effects are likely to be less detrimental compared to changes in their protein-coding sequences51 (although emerging evidence has begun to suggest that enhancer/repressor elements may be more pleiotropic than previously thought52,53). Furthermore, there is considerable evolutionary potential in the co-option of transcription factors/networks51 that regulate neural patterning or neuron-type activity, possibly resulting in novel adaptive expression patterns. In line with this, regucalcin2 and Grik2, which are differentially expressed in the eyes and brain in both our species and hybrid comparisons, are likely to be involved in multi-functional processes, such as calcium signalling and ion transport, and likely have pleiotropic alleles. We also found evidence of cis-regulatory effects for both genes (albeit not significant for regucalcin2), which would be required of the causal genetic change within the QTL, if it were to be in gene regulation.
Despite expectations that non-coding, regulatory loci may provide a flexible route to divergent mating preferences, we also found substitutions in coding regions at the QTL, which are predicted to have an effect on protein functioning and therefore remain strong candidates. These genes include regucalcin1, which is distinct from, but located next to, regucalcin2 (which is differentially expressed). Notably, the eye transcript of regucalcin1 was recently characterized as fast-evolving across Heliconius species54. Other candidates include a cysteine protease, which functions in protein degradation, and might be linked to behaviour for example through degradation of neurotransmitters, a MORN motif containing protein (function unknown), and a WD40 containing protein. WD-repeat containing proteins have been implicated in a wide array of functions ranging from signal transduction to apoptosis (https://www.ebi.ac.uk/interpro).
Although preference for red colouration and the optix gene are tightly linked, we find no evidence that optix is differentially expressed in the eyes or brains of our two species. It is also not located within the QTL peak (and it contains no non-synonymous changes in protein coding regions21). It seems unlikely therefore that changes in cue and preference are pleiotropic effects of the same allele. More generally, although we have pinpointed the strongest candidates yet identified for assortative mating behaviours in Heliconius, it is possible that actual causal changes in gene regulation are restricted to developmental stages other than those sampled, or restricted to a few neuronal populations not detected with transcriptomic data from eyes and whole brain tissue. Nonetheless, by sampling at two pupal stages (around the time of optix expression/ommochrome pigment deposit in the wing/eye and halfway through pupal development) and at the adult stage, we should have captured important transitions for the behavioural programming of the two species.
Work in the past decade has shown that complex innate behavioural differences between species can be encoded in relatively few genetic modules55,56, but very few studies57,58,59 have identified specific genes underlying behavioural evolution. In particular, traditional laboratory organisms continue to provide important insights into the evolution and genetics of behaviour27,58,60; however, comparative approaches are required to determine if developmental principles can be broadly applied, and also to incorporate a wider range of phenotypic variation and sensory modalities. The challenge now is to increase the resolution of studies in non-traditional systems, in order to link individual genetic elements to behaviours, and the sensory and/or neurological structures through which they are mediated. In this light, we have identified a small handful of strong candidate genes associated with the evolution of visual mate preference behaviours in Heliconius. These genes are in tight physical linkage with the locus for the corresponding shifts in an ecologically relevant mating cue, providing an important opportunity to investigate the build-up of genetic barriers crucial to speciation. The candidate genes identified seem more likely to alter visual processing or integration, rather than detection at photoreceptors, consistent with permitting changes in mate preference without altering perception of the animal’s wider environment.
Materials and Methods
Courtship initiation analyses
Butterfly rearing, crossing design and genotyping are described in detail elsewhere20. In brief, we assayed male preference behaviours for H. melpomene, H. cydno, their first generation (F1) hybrids and backcross hybrids to both parental species in standardized choice trials. Males were introduced into outdoor experimental cages (1×1×2m) with a virgin female of each species and courtship behaviours recorded. Whenever possible, trials were repeated for each male (median = 5 trials). To determine whether previously identified QTLs for courtship time contribute to variation in courtship initiation behaviours, we performed a post-hoc analysis using categorical models in a Bayesian framework with a multinomial error structure, using the R package brms. All models were run under default priors (non- or very weakly informative). In contrast to our previous analysis20, in which we considered the number of minutes (i.e. time) for which courtship was directed towards H. cydno or H. melpomene females, here the response variable was the number of trials in which male courtship was initiated towards H. cydno females only, H. melpomene females only, or both female types (hereafter referred to as “initiation”). Across males the median number of trials with a response was 3. Using backcross-to-cydno males only, we fitted initiation as a response variable to genotype (cyd/cyd or cyd/melp) at each QTL, which were included as separate fixed effects. Individual ID was fitted as a random factor. To test the effect of each QTL on male initiation, we compared the saturated model incorporating all three QTL with reduced models excluding each QTL in turn, using approximate leave-one-out (LOO) cross-validation61 as implemented in brms, and based on expected log pointwise predictive density (ELPD). Given our large sample size (n = 139), a normal distribution is a reasonable approximation for the ELPD61.
Therefore, we considered an absolute difference in ELPD greater than 1.96 units of its standard error as indicative of the reduced model being less informative than the saturated model (95% confidence). Males that did not initiate courtship to any female across trials were excluded from analyses, resulting in a dataset of 139 males, from a total of 146 backcross males for which we had genotype data. Finally, we extracted predictors and credibility intervals for backcross males with differing genotypes from the minimum adequate model. Credibility intervals for H. melpomene, H. cydno, F1 hybrid and backcross to melpomene males displayed in Figure 1 were generated following the same procedures. Raw data and analysis code are available in the following github repository: https://github.com/SpeciationBehaviour/neural_genes_heliconius.git
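The decision rule described above can be illustrated with a minimal Python sketch. It is not the authors' brms code; the function names are hypothetical, and the inputs are the rounded ΔELPD and SE values reported in the Results, so the ratios may differ slightly from the figures given in the text.

```python
def elpd_se_units(delta_elpd: float, se: float) -> float:
    """Absolute ELPD difference expressed in units of its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return abs(delta_elpd) / se


def qtl_retained(delta_elpd: float, se: float, threshold: float = 1.96) -> bool:
    """True when the reduced model (QTL dropped) is judged less informative."""
    return elpd_se_units(delta_elpd, se) > threshold


# Rounded values reported in the Results (n = 139 males in every comparison).
reported = {
    "chr18": (-10.9, 5.1),
    "chr1": (-13.6, 5.7),
    "chr17": (-2.1, 3.0),
}

for qtl, (delta, se) in reported.items():
    print(qtl, round(elpd_se_units(delta, se), 2), qtl_retained(delta, se))
```

Under this rule the QTLs on chromosomes 18 and 1 exceed the 1.96 SE threshold and the QTL on chromosome 17 does not, in line with the model selection reported in the Results.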
Butterfly collection, rearing and crossing design for expression analyses
Wild H. melpomene rosina and H. cydno chioneus individuals were caught along Pipeline Road near Gamboa, Panama, in the Soberania National Park, and used to establish stocks at the Smithsonian Tropical Research Institute insectaries in Gamboa. Butterflies were reared in common garden conditions, in 2×2×2m cages, and provided with fresh Psiguria flowers and 10% sugar solution. Larvae were reared on fresh Passiflora shoots/leaves until pupation. H. cydno, H. melpomene and hybrid individuals used for RNA-seq (see below) were reared concurrently and under the same conditions. F1 hybrids were obtained by crossing a wild-caught H. m. rosina male to an insectary-bred virgin H. c. chioneus female.
The introgression line was generated by outcrossing a hybrid male with a red forewing band (crossing design shown in Supplementary figure 3) to virgin H. cydno females, over three generations. The peak of the behavioural QTL reported previously20 on chromosome 18 (at 0cM) is in very tight linkage with the optix colour pattern locus (at 1.2cM), which controls the presence or absence of the red forewing band seen in H. melpomene rosina. Presence of the red forewing band is dominant over its absence so that segregation of the red band can be used to infer genotype at the optix locus. Specifically, hybrid individuals with a red forewing band are heterozygotes for H. melpomene/H. cydno alleles at the optix locus, whereas individuals lacking the red band are homozygous for the cydno allele. Due to the tight linkage we expected little recombination between optix and the QTL peak even after three generations of introgression, allowing us to infer genotype at the preference-optix locus (which we confirmed with genetic data, see below).
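As a rough illustration of why little recombination is expected, the sketch below converts the 1.2 cM map distance between the QTL peak and optix into a recombination fraction and compounds it over three generations. The use of Haldane's map function (no crossover interference) and the assumption that recombination can occur in each of the three meioses are simplifications introduced here, not statements about the authors' analysis; the function names are hypothetical.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Recombination fraction for a map distance in centimorgans (Haldane)."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def prob_no_recombination(distance_cm: float, generations: int) -> float:
    """Probability that two loci stay together through every meiosis."""
    r = haldane_recombination_fraction(distance_cm)
    return (1.0 - r) ** generations


# QTL peak at 0 cM and optix at 1.2 cM, three backcross generations.
p_intact = prob_no_recombination(1.2, 3)
print(f"P(no recombination over 3 generations) = {p_intact:.3f}")
```

Under these assumptions the red band and the QTL peak would be expected to remain associated in the large majority of introgressed individuals, which is why the authors additionally confirm genotype with genetic data.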
Tissue dissection, RNA extraction and mRNA sequencing
Eye (ommatidia and retinal membrane) and brain tissue (central brain and optic lobes) were dissected out of the head capsule in cold (4 °C) 0.01M PBS solution, at two pupal stages: 60 hours after pupal formation (60h APF) and 156h APF; and in adults aged 9 to 13 days. We sampled adults at around 10 days of age because by this stage males are mature and frequently court females62. Adult males and females sampled were sexually naive. We decided to sample at 60h APF because this is the developmental stage at which optix is expressed in the wing, so we hypothesized that it might also be when optix is expressed in the brain. We sampled at 156h APF as a putative stage halfway through pupal development, and at this stage most of the major neural connections have just been established in the Heliconius brain (Stephen Montgomery, unpublished data).
Tissues were stored in RNAlater at 4 °C for 24 hours, and subsequently at −20 °C, until RNA extraction. Total RNA was extracted using TRIzol Reagent (Thermo Fisher, Waltham, MA, USA) and an RNeasy Mini kit (Qiagen, Valencia, CA, USA). Samples were treated with DNase I (Ambion, Darmstadt, Germany). Integrity of total RNA was checked either on an agarose gel or using an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). RNA concentration was measured on a Nanodrop spectrophotometer. Illumina TruSeq RNA-seq libraries were prepared and sequenced at Edinburgh Genomics (Edinburgh, UK) with 100 bp paired-end reads. To avoid lane effects, the distribution of the species samples was randomized on the sequencing platform. More detailed information about individuals and sequencing yields can be found in the Supplementary dataset.
RNA-seq read mapping and differential gene expression analyses
After quality control of RNA-seq reads with FastQC, we trimmed adaptor and low-quality bases using TrimGalore v.0.4.4 (https://www.bioinformatics.babraham.ac.uk/projects/). RNA-seq reads were mapped to the H. melpomene 2.5 genome29/annotation63 using STAR v.2.4.2a64 in 2-pass mode. We only kept reads that mapped in ‘proper pairs’ using Samtools65. The number of reads mapping to each gene was estimated with HTSeq v. 0.9.166 in “union” mode, thus excluding ambiguously mapped reads. Differential gene expression analyses between species/hybrids were conducted in DESeq267. We considered only those genes showing at least a 2-fold change in expression level, and adjusted (false discovery rate 5%) p-values < 0.05, to be differentially expressed, to exclude expression differences caused by known differences in brain morphology68 (Montgomery et al., in prep).
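The two-part criterion for calling a gene differentially expressed can be sketched as follows. This is an illustration in Python rather than the DESeq2 workflow itself: the Benjamini-Hochberg adjustment shown is DESeq2's default method, but whether the default was used here is an assumption, as is reading "2-fold" as an inclusive threshold. The gene values are hypothetical and serve only to exercise the functions.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differentially_expressed(log2_fc, padj, min_fold=2.0, alpha=0.05):
    """Apply both the fold-change and the adjusted p-value criterion."""
    return abs(log2_fc) >= math.log2(min_fold) and padj < alpha


# Hypothetical log2 fold changes and raw p-values for four genes.
log2_fold_changes = [1.5, 0.4, -1.2, 3.0]
raw_pvalues = [0.001, 0.02, 0.03, 0.5]

padj = benjamini_hochberg(raw_pvalues)
calls = [is_differentially_expressed(fc, p)
         for fc, p in zip(log2_fold_changes, padj)]
print(calls)
```

The fold-change filter works on the log2 scale, so up- and down-regulation are treated symmetrically: a gene at half the expression level passes the same threshold as one at double.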
Sexing pupae
In all DESeq2 analyses, sex was included as a random factor. To sex pupae, we first marked duplicate mapped RNA-seq reads with Picard (https://broadinstitute.github.io/picard/), and used GATK 3.869 to split uniquely mapped reads into exon segments and trim sequences overhanging the intronic regions. We then used HaplotypeCaller on each individual, using calling and filtering parameters according to the GATK Best Practices for variant calling on RNA-seq data. The sex of pupal samples was inferred from the proportion of heterozygous (biallelic) SNPs using the R package SNPstats. Males (ZZ) were expected to have ≫ 0% heterozygous sites, whereas females (ZW) were expected to have 0%. Z-linked heterozygosity of the pupal samples (Supplementary table 3) was in line with expectations (either ~ 0 for females or an order of magnitude higher for males), and matched the heterozygosity of either adult males or females, for which the sex was determined from external morphology.
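The logic of sexing samples from Z-linked heterozygosity can be sketched in a few lines of Python. This is not the SNPstats-based procedure used in the study; the function names, the VCF-style genotype strings and the 1% cut-off are hypothetical choices made for illustration only.

```python
def heterozygous_fraction(genotypes):
    """Fraction of called diploid genotypes that are heterozygous.

    `genotypes` holds VCF-style strings such as "0/0", "0/1" or "1/1";
    missing calls ("./.") are skipped.
    """
    called = 0
    het = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue
        called += 1
        if alleles[0] != alleles[1]:
            het += 1
    if called == 0:
        raise ValueError("no called genotypes")
    return het / called


def infer_sex(z_genotypes, max_female_het=0.01):
    """ZW females carry one Z copy, so Z-linked sites should not look heterozygous."""
    if heterozygous_fraction(z_genotypes) <= max_female_het:
        return "female"
    return "male"


# Hypothetical Z-linked genotype calls for two pupal samples.
print(infer_sex(["0/0", "1/1", "0/1", "0/1"]))
print(infer_sex(["0/0", "1/1", "1/1", "./."]))
```

In practice the cut-off would be chosen from the data, for example from the gap between the near-zero values and the values an order of magnitude higher described above, and checked against adults of known sex.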
Inference of gene function and transcript-based annotation
Biological functions of annotated genes were inferred with InterProScan v570, using the corresponding Hmel2.5 predicted protein sequences. InterProScan draws on several databases, including InterPro, Pfam and PANTHER, to infer functional protein domains and motifs (based on homology). To study whether specific biological functions were enriched among genes showing differential expression among hybrid types, we conducted the PANTHER enrichment test38 (with Bonferroni correction for multiple testing) using Drosophila melanogaster as the reference gene function database.
Upon detailed inspection of the mapping coverage of spliced RNA-seq reads to the Hmel2.5 gene annotation, we noticed that some gene models were fragmented: a few exons that appeared to be spliced together were incorrectly annotated as distinct genes. To check that this did not introduce inaccuracies in our differential gene expression analyses, we re-annotated the melpomene genome using the Cufflinks reference annotation-based transcript (RABT) assembly tool71. We used the transcriptomic data from both melpomene and cydno to re-annotate the melpomene genome, separately for every developmental stage, and repeated the differential gene expression analyses in DESeq2 as described above. Using these new annotations (where exons were correctly considered as part of single genes), we confirmed that both regucalcin2 and Grik2 were differentially expressed in both the species and the hybrid comparisons.
Inference of BC3 hybrids genome composition
In order to perform comparative transcriptomic analyses between third-generation backcross hybrids (BC3) segregating at the QTL on chromosome 18 (crossing design in Supplementary figure 3), we first determined which genomic regions in these hybrids were heterozygous (cyd/melp) or homozygous (cyd/cyd). For this, we inferred variants from RNA-seq reads for each BC3 hybrid (individually, as above), and from the combined melpomene and cydno samples. For the species, we used HaplotypeCaller69 on RNA-seq samples from all developmental stages of either species to produce individual genomic records (gVCF), and then jointly genotyped the melpomene and cydno gVCFs (separately for the two species) using GenotypeGVCFs with default parameters. Genotype calls were filtered for quality by depth (QD) > 2, strand bias (FS) < 30 and depth (DP) > 4. For further analyses we kept biallelic genotypes only. We then used the intersect function of bcftools65 to infer variants exclusive to the cydno and to the melpomene samples.
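The filtering and set logic described above can be sketched in Python. This is not the GATK/bcftools workflow itself: variant calls are assumed to have been parsed into dictionaries, the biallelic filter is assumed to have been applied already, and all function names and example records are hypothetical.

```python
def passes_quality_filters(call, min_qd=2.0, max_fs=30.0, min_dp=4):
    """Apply the QD > 2, FS < 30 and DP > 4 thresholds given in the text."""
    return call["QD"] > min_qd and call["FS"] < max_fs and call["DP"] > min_dp


def variant_key(call):
    """Identify a variant by position and alleles."""
    return (call["chrom"], call["pos"], call["ref"], call["alt"])


def species_exclusive_variants(melp_calls, cyd_calls):
    """Return (melpomene-only, cydno-only) sets of quality-filtered variants."""
    melp = {variant_key(c) for c in melp_calls if passes_quality_filters(c)}
    cyd = {variant_key(c) for c in cyd_calls if passes_quality_filters(c)}
    # Set differences keep variants seen in one species but not the other.
    return melp - cyd, cyd - melp


# Hypothetical calls from the joint genotyping of each species.
melp_calls = [
    {"chrom": "chr18", "pos": 100, "ref": "A", "alt": "G", "QD": 12.0, "FS": 1.2, "DP": 20},
    {"chrom": "chr18", "pos": 250, "ref": "C", "alt": "T", "QD": 9.5, "FS": 0.0, "DP": 15},
    {"chrom": "chr18", "pos": 400, "ref": "G", "alt": "A", "QD": 1.5, "FS": 2.0, "DP": 30},  # fails QD
]
cyd_calls = [
    {"chrom": "chr18", "pos": 250, "ref": "C", "alt": "T", "QD": 11.0, "FS": 3.0, "DP": 18},  # shared
    {"chrom": "chr18", "pos": 900, "ref": "T", "alt": "C", "QD": 14.0, "FS": 0.5, "DP": 25},
]
melp_only, cyd_only = species_exclusive_variants(melp_calls, cyd_calls)
print(melp_only, cyd_only)
```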
We calculated the fraction of variants that each BC3 hybrid individual shared with the melpomene and with the cydno samples, in non-overlapping 100 kb windows. We compared these to the fractions of variants that an F1 hybrid and a H. cydno individual (not included in the combined genotyping of the cydno samples) shared with the same species samples, and found that they matched one or the other, indicating heterozygous (cyd/melp) or homozygous (cyd/cyd) regions (Supplementary figure 5B). In this analysis, we considered only those 100 kb windows where BC3 hybrids/F1 hybrid/H. cydno individuals shared more than 30 variants with the melpomene/cydno samples.
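One way to read the windowed comparison is sketched below in Python. The exact definition of the shared fraction is an assumption: here it is the share of a hybrid's species-informative variants in each window that match melpomene-exclusive rather than cydno-exclusive variants, and a window is kept only if more than 30 such variants are found in total. Function names and data are hypothetical.

```python
from collections import Counter


def window_index(pos, window=100_000):
    """Index of the non-overlapping window containing a 1-based position."""
    return (pos - 1) // window


def melpomene_fraction_by_window(hybrid_variants, melp_only, cyd_only,
                                 window=100_000, min_shared=30):
    """Per-window fraction of informative hybrid variants that are melpomene-like.

    hybrid_variants, melp_only, cyd_only: collections of (chrom, pos, alt) tuples.
    """
    melp_counts, cyd_counts = Counter(), Counter()
    for chrom, pos, alt in hybrid_variants:
        key = (chrom, window_index(pos, window))
        if (chrom, pos, alt) in melp_only:
            melp_counts[key] += 1
        elif (chrom, pos, alt) in cyd_only:
            cyd_counts[key] += 1
    fractions = {}
    for key in set(melp_counts) | set(cyd_counts):
        total = melp_counts[key] + cyd_counts[key]
        # Assumption: "more than 30 variants" applies to the combined count.
        if total > min_shared:
            fractions[key] = melp_counts[key] / total
    return fractions


# Hypothetical data: one well-covered window and one sparse window.
melp_only = {("chr18", p, "A") for p in range(1, 41)}
cyd_only = {("chr18", p, "T") for p in range(1001, 1041)} | {("chr18", 150_000, "T")}
hybrid = ([("chr18", p, "A") for p in range(1, 21)]
          + [("chr18", p, "T") for p in range(1001, 1021)]
          + [("chr18", 150_000, "T")])
print(melpomene_fraction_by_window(hybrid, melp_only, cyd_only))
```

Under this reading, windows with a fraction near 0.5 would correspond to heterozygous (cyd/melp) regions and windows near 0 to homozygous (cyd/cyd) regions; the comparison with the F1 hybrid and H. cydno individuals in the text serves to calibrate those expectations.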
To corroborate our findings, we repeated the same type of analysis, this time inferring species-specific variants for melpomene and cydno using 10 H. melpomene rosina and 10 H. cydno chioneus genome resequencing samples. Variant call format (VCF) files were retrieved from Martin et al.35. We considered only biallelic genotype calls that had 10 < DP < 100 and genotype quality (GQ) > 30. With this analysis we found the same heterozygous and homozygous regions in BC3 hybrids.
The size and number of the introgressed regions were in line with expectations for third-generation backcross hybrids following our crossing design: segregating at the level of chromosome 18 and at four other chromosomes. For the BC3 hybrids sampled at 156 hours after pupal formation (APF) we had 6 cyd/melp and 10 cyd/cyd individuals at the QTL region on chromosome 18 (Supplementary figure 5A); for those sampled at 60h APF, we had 8 cyd/melp and 9 cyd/cyd individuals at the same region.
Allele-specific expression (ASE) in hybrids
In order to conduct ASE analyses we first identified species-specific variants, fixed in either melpomene or cydno. For this, we took the quality-filtered variants inferred from the species genome resequencing data, and assigned those genotype calls in cydno and melpomene for which allele frequency (AF) was > 0.9 as homozygous (we did not consider indels in this analysis). We then used bcftools intersect65 to retain only those variants for which cydno and melpomene had opposite alleles.
At the same time, we called variants from RNA-seq reads of F1 hybrid individuals, again according to the GATK Best Practices (with the exception of the parameters -window 35 -cluster 3, to increase SNP density), and selected only heterozygous SNPs in F1s that matched the species-specific variants. Finally, we used GATK’s ASEReadCounter69, with default parameters, to count RNA reads in the F1 hybrids (and later on in BC3 hybrids) that mapped to either the cydno or the melpomene allele. We summed all reads mapping to either the cydno or melpomene allele/variant within the same gene (both for gene models of the Hmel2.5 gene annotation and for the Cufflinks annotation we assembled previously). To test for allele-specific expression (diffASE) we fitted the model “~0 + individual + allele” in DESeq267, setting library size factors to 1 (thus not normalizing between samples, as the test for diffASE is conducted within individuals). We considered as differentially expressed only those alleles showing at least a 2-fold change in expression and p < 0.05.
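The per-gene aggregation of allele counts can be sketched in Python. The sketch only sums reads across SNPs within a gene and computes a descriptive log2 ratio; it does not reproduce the DESeq2 Wald test used for diffASE. The input layout, the pseudocount and the function names are hypothetical.

```python
import math
from collections import defaultdict


def sum_allele_counts_by_gene(snp_counts):
    """Sum allele-specific read counts over all informative SNPs of each gene.

    snp_counts: iterable of (gene, melpomene_reads, cydno_reads) tuples,
    one per species-diagnostic SNP (hypothetical layout).
    """
    totals = defaultdict(lambda: [0, 0])
    for gene, melp_reads, cyd_reads in snp_counts:
        totals[gene][0] += melp_reads
        totals[gene][1] += cyd_reads
    return {gene: tuple(counts) for gene, counts in totals.items()}


def log2_allelic_ratio(melp_reads, cyd_reads, pseudocount=1.0):
    """Descriptive log2(melpomene / cydno) ratio; the pseudocount avoids log(0)."""
    return math.log2((melp_reads + pseudocount) / (cyd_reads + pseudocount))


# Hypothetical ASEReadCounter-style counts for one F1 hybrid.
snp_counts = [
    ("geneA", 12, 4),
    ("geneA", 18, 6),
    ("geneB", 7, 7),
]
per_gene = sum_allele_counts_by_gene(snp_counts)
for gene, (melp_reads, cyd_reads) in sorted(per_gene.items()):
    print(gene, melp_reads, cyd_reads, round(log2_allelic_ratio(melp_reads, cyd_reads), 3))
```

Because both alleles are counted within the same individual and library, no between-sample normalization is needed for this ratio, which mirrors the rationale for fixing the size factors to 1 in the model above.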
In order to check that there were no biases in allele assignment to one of the two species, we analyzed the ratios of the species alleles for every gene, and checked that they were not systematically biased towards either species. The log2 fold-changes of the species alleles were centered around 0, suggesting no obvious bias in allele assignment72 (Supplementary figure 7).
Protein-coding substitutions and predicted effects on protein-function
We inferred fixed variants in protein-coding regions from the combined melpomene and cydno RNA samples in order to include variants from genes for which we detected expression in the brain/eyes across the 3 stages. We took the quality-filtered variants called from the joint genotyping of RNA-seq data of cydno and melpomene (from all stages), and selected those genotype calls for which allele frequency (AF) > 0.8, and where the allelic variant was present in at least 7 individuals of the ~30 samples (for each species). We retained those substitutions/indels validated with the genome resequencing data: of the genotype calls found in RNA reads from brain/eyes of different stages, we kept only those that were also called in at least 8 of the 10 genome resequencing samples of each species. We considered this overlapping set of variants as being fixed in H. melpomene rosina or H. cydno chioneus. Following a similar approach to Bendesky et al.59, we then restricted this set of substitutions between cydno and melpomene to protein-coding regions, and used SnpEff73 to select those non-synonymous substitutions predicted to have a moderate or high effect on protein function. Finally, we used the PROVEAN algorithm33 to further study the effects of these substitutions on protein function. The PROVEAN algorithm predicts the functional effect of protein sequence variations based on how they affect alignments to homologous protein sequences (for this we used the PROVEAN protein database online). We selected those amino acid changes with a PROVEAN score < −2.5 (the suggested threshold), indicating a non-neutral effect on protein function.
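The successive selection criteria above can be summarised in a short Python sketch. The thresholds are those stated in the text; the record fields, function names and example values are hypothetical, and the SnpEff step is represented only by a precomputed impact label.

```python
def is_fixed_candidate(rna_allele_frequency, rna_carriers, resequencing_carriers,
                       min_af=0.8, min_rna_carriers=7, min_resequencing_carriers=8):
    """Check the RNA-seq and genome-resequencing support criteria for one variant."""
    return (rna_allele_frequency > min_af
            and rna_carriers >= min_rna_carriers
            and resequencing_carriers >= min_resequencing_carriers)


def is_predicted_non_neutral(impact, provean_score, cutoff=-2.5):
    """Keep moderate/high-impact changes with a PROVEAN score below the cut-off."""
    return impact in {"MODERATE", "HIGH"} and provean_score < cutoff


# Hypothetical variant records (values invented for illustration).
variants = [
    {"id": "var1", "af": 0.95, "rna_n": 25, "dna_n": 10, "impact": "MODERATE", "provean": -4.1},
    {"id": "var2", "af": 0.95, "rna_n": 25, "dna_n": 6, "impact": "HIGH", "provean": -6.0},
    {"id": "var3", "af": 0.90, "rna_n": 12, "dna_n": 9, "impact": "MODERATE", "provean": -0.8},
]
selected = [v["id"] for v in variants
            if is_fixed_candidate(v["af"], v["rna_n"], v["dna_n"])
            and is_predicted_non_neutral(v["impact"], v["provean"])]
print(selected)
```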
Admixture analyses
We retrieved estimated admixture proportions between H. melpomene rosina and H. cydno chioneus, for 100 kb and 20 kb windows, from Martin et al.35.
Data accessibility
RNA-seq data will be deposited on a public database (https://www.ebi.ac.uk/ena) on acceptance. Analysis scripts and behavioural data are available at: https://github.com/SpeciationBehaviour/neural_genes_heliconius.git
Author contributions
R.M.M. and M.R. conceived the study and designed the experiments, with input from W.O.M. and C.D.J; M.R. analysed expression and sequence data; A.E.H. analysed the behavioural data; T.J.T., S.H.M. and R.M.M. reared butterflies, dissected neural tissue and extracted RNA; R.M.M., C.D.J. and W.O.M. secured funding, contributed resources and provided supervision; R.P. additionally secured funding and contributed resources; M.R. and R.M.M. wrote the manuscript with contributions from all authors.
Supplementary information
Supplementary methods and results
Mapping RNA-seq reads to the Heliconius cydno genome
To determine whether the H. melpomene reference genome introduced mapping biases of RNA-seq reads, possibly affecting differential expression estimates, we also mapped to a H. cydno assembly/annotation. Generally, we found similar patterns of differential expression when mapping to the two genomes. Since i) we observed an equal decrease (~40%) of genes showing 2-fold changes in melpomene and cydno when mapping to H. cydno, at every stage (p-value=0.317 at adult stage, p-value=0.800 at 156h APF, p-value=0.897 at 60h APF, Fisher’s Exact test, Table S2), and ii) this decrease was widespread throughout the genome, we concluded that the melpomene reference genome did not bias differential gene expression analyses.
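For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2x2 table can be sketched in Python using only the standard library. The layout of the table is an assumption (rows: reference genome used for mapping; columns: genes more highly expressed in melpomene or in cydno), and the counts are hypothetical and are not those of Table S2.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties).
    p_value = sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts of genes with 2-fold changes:
# rows = mapped to H. melpomene / mapped to H. cydno,
# columns = higher in melpomene / higher in cydno.
p = fisher_exact_two_sided(500, 480, 300, 290)
print(f"p = {p:.3f}")
```

A large p-value in such a table would indicate that the drop in the number of differentially expressed genes is proportionally similar for genes up-regulated in either species, which is the pattern reported above.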
Allele-specific expression in the introgression line
BC3 hybrids had different combinations of chromosomes segregating for the melpomene alleles in a cydno background. Therefore, in principle, we could not infer cis- or trans- gene regulatory effects genome-wide from the profiles of allele specific expression (ASE) in these hybrids as for F1 hybrids, due to the diverse trans-acting environments. However, previous analyses (comparing gene expression levels between hybrids carrying cyd/melp vs. cyd/cyd regions on chromosomes other than 18) imply that differential expression of the candidate genes seems to be driven by the melpomene copy difference within the introgressed region on chromosome 18. Therefore, ASE analyses of candidate genes in BC3 hybrids carrying cyd/melp alleles on chromosome 18 should indicate whether the differences are due to cis- or trans-regulatory effects from within the introgressed region (Figure S6).
In BC3 hybrids sampled at 156h APF and 60h APF, the melpomene and cydno alleles of the ionotropic glutamate receptor (Grik2) are expressed at very similar levels (at 156h APF p-value=0.841, at 60h APF p-value=0.579, Wald test), suggesting trans-only regulatory effects at these stages for Grik2. For regucalcin1 we again had very few allele-informative read counts in hybrids at 156h APF. Although there was a tendency towards upregulation of the melpomene allele, this was not statistically significant (p=0.174, Wald test). We detected diffASE of regucalcin2 only at 60h APF (p < 0.001, Wald test), but at this stage regucalcin2 was not detected as differentially expressed between pure species. Thus, although there is tentative evidence for cis-regulatory effects for species differences in regucalcin expression during development, it is not conclusive.
Supplementary figures
Supplementary tables
Acknowledgments
We thank Liz Evans and Adriana Tapia for assistance in the insectaries. We are grateful to Ana Pinharanda for sharing the H. cydno genome assembly and annotation and to Simon Martin for sharing admixture proportions and comments on the manuscript. We thank the Smithsonian Tropical Research Institute for providing research infrastructure, the Ministerio del Ambiente for permission to collect butterflies in Panama, Edinburgh Genomics for sequencing support and Jochen Wolf, who kindly provided computational resources. MR, AEH and RMM are supported by an Emmy Noether fellowship and research grant awarded to RMM by the Deutsche Forschungsgemeinschaft (DFG) (Grant Number: GZ: ME 4845/1-1). RMM was also supported by a Junior Research Fellowship from King’s College Cambridge, an Ernst Mayr fellowship from the Smithsonian Tropical Research Institute, a Varley-Gradwell fellowship from the Oxford University Museum of Natural History, and the Balfour-Browne Fund, University of Cambridge. SHM was supported by a NERC IRF (NE/N014936/1). This project was also supported by an Institutional Development Award (IDeA) INBRE from the National Institute of General Medical Sciences (NIGMS) awarded to RP (Grant number: P20GM103475), and a European Research Council (ERC) grant awarded to CDJ (Grant number: 339873).
Footnotes
Slight change in the title and abstract.
https://github.com/SpeciationBehaviour/neural_genes_heliconius.git