Abstract
Secondary trait loss is widespread and has profound consequences, from generating diversity to driving adaptation. Sexual trait loss is particularly common1. Its genomic impact is challenging to reconstruct because most reversals occurred in the distant evolutionary past and must be inferred indirectly2, and questions remain about the extent of disruption caused by pleiotropy, altered gene expression and loss of homeostasis3. We tested the genomic signature of recent sexual signal loss in Hawaiian field crickets, Teleogryllus oceanicus. Song loss is controlled by a sex-linked Mendelian locus, flatwing, which feminises male wings by erasing sound-producing veins. This variant spread rapidly under pressure from an eavesdropping parasitoid fly. We sequenced, assembled and annotated the T. oceanicus genome, produced a high-density linkage map, and localised flatwing on the X chromosome. We characterised pleiotropic effects of flatwing, including changes in embryonic gene expression and alteration of another sexual signal, chemical pheromones. Song loss is associated with pleiotropy, hitchhiking and genome-wide regulatory disruption which feminises flatwing male pheromones. The footprint of recent adaptive trait loss illustrates R. A. Fisher’s influential prediction that variants with large mutational effect sizes can invade genomes during the earliest stages of adaptation to extreme pressures, despite having severely disruptive genomic consequences.
Male crickets sing to attract and court females and to fight with rivals, but approximately 15 years ago, silent T. oceanicus males arose and spread in populations on the Hawaiian archipelago4,5 (Fig. 1a). They were first detected in 2003 in a population on Kauai, where they rapidly spread to near-fixation from undetectable starting frequencies, under selection imposed by a lethal parasitoid fly, Ormia ochracea (Fig. 1b)4. Female flies acoustically locate male crickets by eavesdropping on their songs, but silent flatwing males have feminised wings lacking structures used to produce sound and are thus protected (Fig. 1c). The genetic mutation(s) underlying the flatwing phenotype show Mendelian segregation and X-linkage6,7, and the propagation of flatwing males to near-fixation in the Kauai population represents one of the fastest rates of evolution known in the wild, having occurred in fewer than 20 generations4. All males found in a comprehensive survey of this population in October 2018 were flatwing (38 flatwing males, no normal-wing males found or heard singing by JGR and NWB), but the continued existence of the population indicates that silent males still find mates and must compensate for their inability to sing. The selective environment promoting the rapid spread of flatwing crickets is understood, but the mechanistic basis of the phenotype remains an open question. How did such a spectacularly disruptive phenotypic change invade the genome of crickets so quickly? Foundational evolutionary theory predicts that adaptive variants which invade genomes and spread under positive selection should tend to be small in effect size and exert few pleiotropic consequences, although exceptions are predicted during the earliest stages of adaptation8,9. Empirical studies have been unable to address this in naturally-evolving systems.
a, The field cricket T. oceanicus is thought to have migrated to the Hawaiian archipelago from other islands in Oceania, and is attacked by the fatal, acoustically-orienting parasitoid fly Ormia ochracea on Kauai, Oahu and Hawaii. We studied crickets from a population in Kauai, highlighted in dark blue, where parasitoid infestation rates have historically been highest. b, Adult female fly and mature parasitoid larva. Gravid female flies locate hosts by eavesdropping on singing male crickets, then they eject larvae that burrow into the host and consume its viscera before emerging to pupate. Infestation is fatal, and the flies exert significant natural selection against male song. c, Normal-wing males (left) of this field cricket species produce advertisement, courtship and aggressive songs by elevating and rubbing together forewings that bear specialised sound-producing venation. A toothed file on the right wing engages with a thickened ridge of tissue on the opposite, causing resonators to vibrate and produce sound. Two principal resonators are highlighted on this male’s right forewing: the harp in purple and the mirror in turquoise. Flatwing males (right) have wings that are feminised and lack, or have severely reduced, resonators. They still make wing motions characteristic of singing despite the structural inability to produce sound93, but their silence protects them from the fly4. The flatwing phenotype segregates as a single-locus mutation on the X chromosome, and 100% of males from the population studied on Kauai now exhibit flatwing morphology. (Photo credits: N.W. Bailey)
The locus controlling the expression of flatwing morphology could have arisen through de novo mutation(s) coinciding with the time of the phenotype’s first observation in 2003, it could have invaded the genome of the Kauai population via migration from an unknown location elsewhere in Hawaii (flatwing morphs have not been observed outside of the Hawaiian islands), or it could have existed for much longer in the population but at extremely low levels, evading detection by researchers. Studies of insecticide resistance in insects and of melanic morphs of Lepidoptera provide some precedent. While some museum specimens collected before the invention of organophosphates have been shown to contain insecticide-resistance alleles10, in other cases, resistance alleles arose de novo, and also invaded populations and spread under selection11. In the peppered moth, a canonical example of rapid evolution in the wild, melanism had a single recent origin approximately corresponding to the start of the industrial revolution12,13, but melanic morphs are common in many insects and may persist at low frequencies due to negative pleiotropy, at least until favourable selective conditions occur14. In T. oceanicus, parasitoid pressure pre-dated the appearance of flatwing in the Kauai population4, thus the de novo or introduction scenarios are most plausible.
We studied the genomic signature of song loss in the population on Kauai where flatwing crickets were first discovered, and in which rapid spread has been most thoroughly documented4. We sequenced the T. oceanicus genome, generating an assembly of 2.045 Gb consistent with flow cytometry size estimates7, with a scaffold N50 of 62.6 kb (Extended Data Table 1). We established an F3 mapping population using crosses designed to maximise recombination on the X chromosome, which is only diploid in females (Extended Data Fig. 1). Mapping offspring and parents were sequenced using RADseq, and a map was assembled containing 19 linkage groups. T. oceanicus has a haploid chromosome number of (13+X). We identified linkage group 1 (LG1) as the X chromosome by applying coverage and heterozygosity filters and dummy coding putative X-markers prior to constructing the map. LG1 was the largest linkage group, with a female recombination length of 379 cM and a male length of 195 cM (Extended Data Fig. 2). After resolving chimeric scaffolds (Extended Data Table 2), 35.6% of the genome was anchored to a linkage map using a LOD5 cutoff (Extended Data Table 3) (Fig. 2a).
a, Circos plot providing an overview of the genome. Linkage groups (LGs) upon which genome scaffolds were anchored are shown in different colours, with unplaced scaffolds in gray. LG1 was identified as the X chromosome based on heterozygosity and coverage filters (see Main Text). Tracks (i): gene density, (ii): linkage group pseudomolecules, (iii): transposable element density, (iv): genes DE in the thoracic tissues of embryos homozygous for flatwing vs. normal-wing genotypes. Longer bars are DE genes for which log2FC > 1 between genotypes, and short grey bars are all other DE genes. Colours indicate the magnitude of upregulation (red) versus downregulation (blue) in flatwing compared to normal-wing embryos. b, Genome-wide Manhattan plot of the flatwing QTL. Alternating shades of grey and blue indicate different LGs. The horizontal dashed line indicates an FDR-corrected significance threshold of (P < 0.001), and the top 1% most significant QTL markers are plotted in red. c, Enlarged plot for LG1 (X chromosome) showing the flatwing-associated peak.
We performed gene prediction and annotation using custom pipelines incorporating ab initio, homology, and transcriptome-based approaches (Extended Data Fig. 3). Evidence from different gene prediction and annotation methods was weighted and filtered to predict a final, conservative set of 19,157 genes, 75% of which had functional annotation (Extended Data Table 4, Extended Data Fig. 4). Gene density was assessed (Fig. 2a track i), and we tested whether the putative X linkage group showed a different distribution of repeat content relative to the other linkage groups, across eight common categories of repeats. It did not (Fig. 2a track iii, Extended Data Table 5, Extended Data Fig. 5). T. oceanicus gene features were compared to 10 other insect species (Extended Data Table 6), and we contrasted transposable element classifications with three other recently published insect genomes (Extended Data Table 7). The T. oceanicus genome and metadata associated with it are curated in ChirpBase (www.chirpbase.org), a GenomeHubs Ensembl genome browser15 that we created as an openly available, community-based genomics resource for researchers working on singing insects.
Flatwing was definitively mapped to the putative X chromosome (Fig. 2b) using markers supported by a LOD10 cutoff and a mixed model, ANOVA-based approach designed to control for uneven genomic relatedness caused by family structure in the mapping crosses. To cope with the particularly high marker association on the putative X chromosome caused by the Mendelian mode of inheritance of flatwing and the different effective population size of the X compared to autosomes, we identified the QTL using only the top 1% of markers after FDR correction, yielding a prominent peak occupying approximately one third of the X chromosome (Fig. 2c). Flatwing morphology is observable in males during mid- to late-instar stages of juvenile development, so we examined early embryonic gene expression differences associated with flatwing. Females carrying the genotype cannot be visually distinguished and embryos cannot be readily sexed, so we used replicate laboratory lines homozygous for flatwing or normal-wing genotypes to detect widespread differential gene expression in the developing thoraces of embryonic crickets. We found 830 genes differentially expressed (DE), 204 of which had a log2 fold-change > 1, and a predominant pattern of down-regulation in flatwing crickets (Extended Data Table 8, Extended Data Fig. 6). DE genes associated with flatwing were widely distributed across linkage groups and unmapped scaffolds (Fig. 2a track iv).
These physically dispersed expression effects suggest that flatwing acts as a master regulatory switch during early development, with a broad cascade of downstream effects. Pathways reconstructed using differential expression data are consistent with such a mode of action. For example, adherens junction activity was enriched, which affects epithelial patterning during early development (Extended Data Tables 9 & 10). Using a stringent and redundant approach combining information from gene sets identified in the QTL study, RNA-seq experiment and a previously-published bulked segregant analysis7, we identified 51 annotated protein-coding genes located within LG1 as top flatwing-associated candidates (Extended Data Table 11). GO enrichment analysis indicated that positive regulation of developmental process was overrepresented in this candidate gene set, with three genes in particular (NBL1, GOGA4, UNC89) known to play a fundamental role in the regulation of cell differentiation (Extended Data Table 12).
In most pterygote insects, wings are derived from imaginal discs formed during development by the invagination of embryonic ectoderm16. Previous work mainly in Drosophila melanogaster has established that the developmental elaboration of wing venation patterns requires the involvement of numerous transcription factors and complex coordination across numerous signalling pathways17. Here, we found that 8 of 51 flatwing associated candidate genes have reported involvement in D. melanogaster wing development. For example, stat92E expands the proximodistal axis of the wing imaginal disc, subdividing and patterning it18. Collier encodes a transcription factor required for wing disc patterning19, and Myoglianin expression is required for normal wing disc development20. ROR1 encodes a transmembrane tyrosine-protein kinase receptor involved in phosphorylating MAP kinases21, and reduction of MAPK activity through ROR1 silencing can lead to a loss of wing venation phenotype17. The protein krasavietz is encoded by PKRA, and establishes planar cell polarity in the wing22, disruption of which can lead to wing distortion23. Knockouts and mutants in Pelle, Gen5, and Plexin-A4 show wing shape and venation alterations with features similar to flatwing24-26.
We tested the consequences of the rapid invasion of flatwing into the T. oceanicus genome by focusing on a distinct, close-range sexual signalling modality that operates alongside acoustic signalling in field crickets. Cuticular hydrocarbons (CHCs) are long-chain, waxy molecules expressed on insect cuticles. CHCs are thought to have evolved for desiccation resistance, and they tend to be expressed as a bouquet of numerous individual hydrocarbon compounds. T. oceanicus CHCs are sexually dimorphic and function as sexual signals during male and female mate choice27-29, and they have been found to vary between flatwing and normal-wing male crickets30. We characterised the CHC profiles of F3 mapping individuals, all of which were raised in a common garden environment, by extracting their CHCs and using gas chromatography – mass spectrometry (GCMS) to measure the abundance of 26 individual compounds (Fig. 3a) (Extended Data Table 13). By performing dimension reduction using principal components (PC) analysis of the CHC profiles, we first established that, in our mapping population, males carrying flatwing showed noticeable and significantly different CHC profiles from normal-wing males (Fig. 3b) (multivariate analysis of variance on 6 principal components with eigenvalues > 1 describing male CHC blends: F6,191 = 29.769, p < 0.001) (Extended Data Table 14).
a, Diagram of a T. oceanicus cuticular hydrocarbon (CHC) chromatogram, with the 26 measured peaks indicated by blue wedges. The asterisk indicates the internal standard (pentadecane). b, Space-filling scatterplot of the first three principal components describing male CHC profiles, illustrating differences between flatwing and normal-wing males (variance explained for PC1: 35.18%, PC2: 10.14%, PC3: 9.58%). c, Comparison of QTL on the putative X chromosome for CHCs (top; first principal component mapped) and flatwing (bottom, same as Fig. 2C). Grey shading indicates the extent (in cM) of the CHC peak, showing overlap with the flatwing QTL. Dashed lines indicate FDR-corrected significance of p < 0.001, red points the top 1% significant flatwing QTL markers. Note the different y-axis scales. d, Univariate analyses revealed nine individual CHC components which also co-localised with flatwing. The original flatwing QTL is plotted at the top of each column. Grey shading spans the genetic region of co-localisation. Numbers refer to compounds indicated in a, and dashed lines indicate an FDR-corrected significance threshold of p < 0.001. e, Discriminant function scores describing variation in CHC profiles among female, flatwing male and normal-wing male mapping individuals. Discriminant function 1 explained 78.8% of the variance in CHC profiles between groups. Means ± 2 s.d. are indicated by open black-and-white circles and lines, respectively.
QTL analysis was then performed on the first six CHC PCs to determine whether flatwing-associated variation in male CHC profiles mapped to identifiable genomic regions. The putative X chromosome, LG1, was of particular interest, because we hypothesized that the striking variation between CHC profiles of flatwing and normal-wing males could be a pleiotropic effect of flatwing. Genetic mapping of CHCs was performed blind to male morphotype. PC1, which explained over a third of the variance in male CHC profiles, mapped to a ca. 2.5 cM region strongly co-localised with flatwing (Fig. 3c). PCs 4 and 6 also had co-localizing peaks (Extended Data Fig. 7). As dimension reduction for CHCs can obscure phenotypic patterns in the original individual chemical compounds, we mapped each of the 26 compounds separately. Of these, 9 showed significant peaks co-localising with flatwing (Fig. 3d). We recovered no autosomal QTL peaks for PCs 1-6, and only one QTL peak for one compound on one autosome (compound 11, 7-C31ene, on LG8). However, the latter peak was weakly supported, with only a single marker showing an association at FDR-corrected p < 0.001.
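The marker thresholds used throughout the QTL analyses can be illustrated with a minimal sketch. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg adjustment below is an assumption, and the function names (benjamini_hochberg, significant_markers, top_fraction) are hypothetical helpers introduced only for illustration. The sketch adjusts per-marker p-values, keeps markers passing a corrected threshold of 0.001, and separately selects the most significant 1% of markers.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the source only says 'FDR-corrected'; BH is used here as a
    common choice, not as the authors' documented method.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_markers(pvalues, alpha=0.001):
    """Indices of markers whose adjusted p-value is below alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return [i for i, q in enumerate(adjusted) if q < alpha]


def top_fraction(pvalues, fraction=0.01):
    """Indices of the most significant `fraction` of markers (at least one)."""
    k = max(1, int(len(pvalues) * fraction))
    return sorted(range(len(pvalues)), key=lambda i: pvalues[i])[:k]
```

Under this reading, a compound such as 7-C31ene on LG8 would be reported as weakly supported when significant_markers returns a single index for that linkage group.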
We interrogated genes on scaffolds under the CHC QTL peaks following a similar procedure used to produce the flatwing candidate gene set (Extended Data Table 15). Of 55 protein-coding genes, a subset of 6 were implicated for every CHC trait with a significant QTL peak, and these 6 genes were also present in the flatwing candidate gene set. These are strong candidates for testing the pleiotropic consequence of evolved acoustic sexual signal loss on chemical sexual signals. Our final step was to explore the nature of the phenotypic shift in flatwing male CHC profiles. It is unknown how flatwing males’ profiles compare to those of females30, but given the generally feminising effect of flatwing on male wing morphology, we predicted that flatwing males’ CHC profiles would also be feminised. We compared them to the profiles of normal-wing males and females using discriminant function analysis on profiles from all three groups. Discriminant function 1 (eigenvalue = 2.526) explained 78.8 % of the variance, and indicated that flatwing male crickets’ CHC profiles are strongly feminised (Fig. 3e). Their CHCs appear to be correspondingly less attractive to females31.
The rapid emergence and spread of flatwing crickets on Kauai has been described as one of the fastest rates of evolutionary adaptation ever documented in the wild32. Nearly a century ago in 1930, R. A. Fisher8 developed a ‘geometric’ model that describes the genomic landscape of such early-stage adaptation and predicts what mutational features are associated with adaptive change. In doing so, he reconciled the prevailing, gradualist view of evolution with seemingly inconsistent units of discrete Mendelian inheritance that were being discovered and characterised at the time. Fisher’s key insight was that the process of evolutionary adaptation tends to favour mutations of small effect, with impacts narrowly limited to the phenotypic variants directly under selection33. However, he built exceptions to this general rule into his model when selection is severe, and the genomic signature of song loss in Hawaiian T. oceanicus uniquely confirms and illustrates this insight. Adaptation was recent, abrupt and proceeded rapidly in this system. Prior work on T. oceanicus has found differences in the level of phenotypic plasticity, gene expression, and other reproductive characteristics such as male testis size between male normal-wing and flatwing genotypes34-36, and our present findings reveal the genomic footprint of strong, associated effects on sexual signalling in an entirely different sensory channel. These consequences of rapid adaptive trait loss are early-acting, genome-wide, and impact a range of important fitness traits. The suite of characters affected in flatwing crickets is reminiscent of feminised alternative male morphs in ruff (Calidris pugnax) in which supergene architecture controls size, ornament and behavioural traits simultaneously37, and in feminised bulb mites38. What is surprising is that an evolved loss of function could lead to such similarly wide-ranging phenotypic impacts so quickly. The genomic signature of recent, rapid trait loss in T. oceanicus confirms Fisher’s predictions about adaptive evolution – by demonstrating the exception to his rule.
METHODS
Cricket rearing and maintenance
Laboratory stocks of Teleogryllus oceanicus were established from eggs laid by wild-caught females from a population on the Hawaiian island of Kauai in 2012, and a population near Daintree, Australia in 2011. Stocks were maintained in the laboratory within 16 L plastic containers containing cardboard egg cartons for shelter. All crickets were reared in a single, temperature-controlled chamber at 25 °C, on a 12:12 light:dark cycle. They were maintained regularly and fed ad libitum with Burgess Excel Junior and Dwarf rabbit pellets and provided water in a moist cotton pad that also served as an oviposition substrate. Throughout all experiments, all crickets were reared in a common-garden environment in the same temperature-controlled chamber.
Genome sequencing
Three Illumina sequencing libraries were prepared using genomic DNA extracted from the head capsule and muscle tissue of a single T. oceanicus female using a DNeasy Blood & Tissue kit (Qiagen). The female was sourced from the Kauai stock population. gDNA was quality-checked using Nanodrop and Qubit prior to Illumina library preparation and sequencing at Edinburgh Genomics. We prepared three standard paired-end TruSeq libraries with insert sizes of 180 bp, 300 bp, and 600 bp. We supplemented reads from the above three Illumina libraries with additional sequences from two TruSeq Nano Pippin selected libraries with insert sizes of 350 bp and 550 bp, one 8 kb Nextera gel-plus mate-pair library, and one PacBio library. For these libraries, gDNA from a separate, single female cricket from the same laboratory population was extracted using a high molecular weight Gentra Puregene Cell Kit (Qiagen). The first three TruSeq libraries were sequenced on 5 lanes of an Illumina HiSeq 2000 v3 to yield 100 bp paired-end reads. NanoPippin libraries and the Nextera mate-pair library were sequenced on 2 Illumina HiSeq 2500 lanes to yield 250 bp paired-end reads. To construct the PacBio library, we purified the extraction with 1x AMPure beads (Agencourt) and performed quality control using Nanodrop and Qubit. Average DNA size and degradation were assessed using a high sensitivity genomic kit on a fragment analyzer. Size-selected and non-size-selected libraries were made by shearing gDNA using g-TUBEs (Covaris). Size selection was performed using the BluePippin DNA Size Selection System with 0.75% cassettes and cutoffs between 7 and 20 kb. Preparation of both libraries then proceeded using the same protocol. We treated DNA for 15 min at 37 °C with Exonuclease VII. DNA ends were repaired by incubating for 20 min at 37 °C with Pacific Biosciences damage repair mix. Samples were then incubated with end repair mix for 5 min at 25 °C followed by washing with 0.5x AMPure and 70% ethanol.
DNA adapters were ligated overnight at 25 °C. Incubation at 65 °C for 10 min was used to terminate ligation reactions, and then samples were treated with exonuclease for 1 hr at 37 °C. We purified the SMRTbell library using 0.5x AMPure beads and checked quality and quantity using Qubit assays. Average fragment size was quantified using a fragment analyser. For sequencing, primers were annealed to the SMRTbell library at values determined using PacBio’s Binding Calculator. A complex was formed using DNA polymerase (P6/C4 chemistry), bound to MagBeads, and then used to set up 43 SMRT cells for sequencing to achieve 10X coverage. Sequencing was performed using 240 min movie times.
Genome assembly
Raw reads from all Illumina libraries were trimmed using cutadapt v.1.8.339 to remove adapters, primers and poor quality bases, and then error-corrected using BLESS40. PacBio reads <1,000 bp were discarded. The original fragment length of the 350 bp library was shorter than the sequenced paired read length of 500 bp, so reads from this library were merged using Vsearch v.1.10.141. Platanus v.1.2.442 was used to assemble error-corrected reads from all Illumina libraries except the mate-pair library; reads from the latter were added at the scaffolding stage. Next, we selected the contigs >1,000 bp and combined these with the PacBio data to generate a hybrid assembly using PBJelly v.15.2.2043. Pilon v.2.144 was used to improve local base accuracy, and BUSCO v.2.145 was used to assess genome quality through gene completeness.
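For readers unfamiliar with the assembly summary statistics reported for this genome (for example the scaffold N50), the following minimal sketch shows how N50 is conventionally computed from a list of sequence lengths, together with a length filter analogous to the '>1,000 bp' contig selection described above. The function names n50 and keep_long_contigs are hypothetical and are not part of any tool named in this section.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")


def keep_long_contigs(lengths, min_length=1000):
    """Keep only contigs strictly longer than min_length (the '>1,000 bp' rule)."""
    return [length for length in lengths if length > min_length]
```

In practice the lengths would be read from the assembly FASTA; that parsing step is omitted here to keep the sketch self-contained.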
Repeat annotation
We used de novo and homology-based approaches to identify repetitive regions. We first built a de novo repeat library using RepeatModeler46, with dependencies RECON and RepeatScout47. To scan and classify interspersed repeats and low complexity DNA sequences at the DNA level, we searched the cricket genome sequence against the Dfam consensus database48, RepBase49, and the de novo repeat library using RMBlast50 and RepeatMasker51. Protein-level repeats were identified by searching against the TE Protein Database using RepeatProteinMask51.
Unclassified repetitive elements were further classified by TEclass52, a programme using a support vector machine learning algorithm. Tandem repeats were also identified in the cricket genome using Tandem Repeat Finder53.
Gene prediction
Before running gene prediction pipelines, repetitive regions identified above were masked using an in-house Perl script. We built a pipeline including ab initio, homology and transcriptome-based methods to predict protein-coding genes in the cricket genome (Extended Data Fig. 3). For ab initio prediction, SNAP54, Glimmer-HMM55, GENEID56, and BRAKER157 were used to generate preliminary gene sets from the repeat-masked genome. Specifically, reads obtained from the T. oceanicus transcriptome were aligned against the repeat masked genome with TopHat258. SAMTOOLS59 was used to sort and index the resulting Binary Alignment Map (BAM) format file.
This BAM file was processed in BRAKER157, which used transcriptome data to train GENEMARK-ET60, generate initial gene structures, and then subsequently train AUGUSTUS61 and finally integrate RNA-seq information into final gene predictions. For other ab initio gene prediction programmes, gene sets from Locusta migratoria62, Acyrthosiphon pisum63, and Drosophila melanogaster64 were used for model training. For homology-based prediction, we aligned protein sequences of five insect species (L. migratoria62, Drosophila melanogaster, Anoplophora glabripennis65, Nilaparvata lugens66, and Cimex lectularius67) to the repeat-masked cricket genome using TBLASTN (E < 10⁻⁵)50. The boundaries of potential genes were further identified using BLAST2GENE68. We then ran GENEWISE269 to obtain accurate spliced alignments and generate a final, homology-based gene set. For prediction based on transcriptome data, the de novo transcriptome assembly generated by Trinity70 was filtered based on gene expression level, and then passed to Program to Assemble Spliced Alignments (PASA)71. PASA performed transcript alignments to the cricket genome, generated a new transcript assembly, and predicted gene structures. All gene sets predicted by ab initio, homology, and transcriptome-based methods were then combined into a weighted consensus gene set using EVidenceModeler (EVM)72. We removed genes likely to be spurious, those with low EVM support, partial genes with coding lengths shorter than 150 bp, and genes only supported by a minority (≤ 2) of ab initio methods73. PASA was used to further update the filtered consensus gene set to produce a finalised official gene set. The completeness of this final gene set was assessed by both BUSCO v.2.1 (using the arthropoda dataset)45 and transcriptome data.
Functional assignment
Putative gene functions were assigned based on InterPro74, SwissProt75, TrEMBL75 and RefSeq non-redundant (NR) protein and Kyoto Encyclopedia of Genes and Genomes (KEGG) gene databases. Briefly, we first obtained protein sequences from the final gene set using EVM72. Functional annotation and gene ontology terms were assigned to genes based on protein sequence, using InterProScan 576. These proteins were also blasted against SwissProt, TrEMBL and NR databases (PLASTP, E < 10⁻⁵), and assigned their best hits as functional annotations. Gene ontology (GO) terms were assigned using GO annotations downloaded from the GO Consortium77,78. BLAST2GO79 was implemented to further assign unassigned genes using NCBI databases, and KEGG Orthology (KO) terms were assigned using BlastKOALA80.
Genome anchoring
ALLMAPS81 was used to detect chimeric scaffolds, anchor the cricket genome to the linkage map (see below), and construct pseudo-molecules (reconstructed portions of chromosomal sequence). We first built a consensus genetic map based on male and female genetic distances obtained from linkage maps, in which equal weighting was applied for both sexes. Then, scaffolds for which more than four markers mapped to multiple linkage groups were designated as chimeric scaffolds, and split. After this correction was applied, scaffolds anchored to the linkage maps were oriented and ordered based on the consensus genetic map. We used a custom Perl script to order unanchored scaffolds according to their length, and liftOver82 to convert genome coordinates based on anchoring results.
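The chimeric-scaffold rule described above can be sketched as follows. This is not ALLMAPS code, and the wording of the rule admits more than one reading; the sketch assumes that a scaffold is flagged when it carries more than four mapped markers and those markers fall on at least two linkage groups. The function name find_chimeric_scaffolds and the (scaffold, linkage group) input format are hypothetical.

```python
from collections import defaultdict


def find_chimeric_scaffolds(marker_assignments, min_markers=4):
    """Flag scaffolds whose markers map to more than one linkage group.

    marker_assignments: iterable of (scaffold_id, linkage_group) pairs,
    one pair per mapped marker.
    Assumed reading of the rule: more than `min_markers` markers on the
    scaffold, spread over at least two linkage groups.
    """
    groups_by_scaffold = defaultdict(list)
    for scaffold, linkage_group in marker_assignments:
        groups_by_scaffold[scaffold].append(linkage_group)

    chimeric = {}
    for scaffold, groups in groups_by_scaffold.items():
        distinct = sorted(set(groups))
        if len(groups) > min_markers and len(distinct) > 1:
            chimeric[scaffold] = distinct
    return chimeric
```

Scaffolds returned by such a check would then be split before ordering and orientation against the consensus genetic map.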
Genome browser development (ChirpBase)
We created ChirpBase, an open-access community genomics resource for singing insects, such as field crickets and katydids. The resource can be accessed at www.chirpbase.org where users may view and download genomic data and scripts presented in this study in addition to uploading data. An index page links to an Ensembl page, where assembly statistics can be visualised using a Challis plot or compared in tabular format. A plot illustrating codon usage is presented, as well as a visualisation of BUSCO scores. Additional pages linking from this include a basic local alignment search tool (BLAST) page and a download page where raw data can be accessed. We used the GenomeHubs framework to set up ChirpBase14. Briefly, the database is hosted using a Linux container (LXC) on a remote computer, linked to a cluster via an intermediate import computer. A MySQL Docker container was started in the LXC, where a database ini file resided to guide additions to the database. An Ensembl-easy mirror Docker container was run to import the database into the MySQL container, uploading data designated in the ini file from the LXC to the database. The MySQL container links to an Ensembl EasyMirror container, BLAST container, and a download container.
Linkage and QTL mapping crosses
We constructed a linkage map for T. oceanicus using a series of crosses to maximise recombination on the X chromosome (Extended Data Fig. 1), combined with restriction-site associated DNA sequencing (RAD-seq) to identify markers. Flatwing segregates on the X chromosome in both Kauai and Oahu populations6,7, so mapping was performed with F3 offspring to increase recombination on the X. We set up two parental mapping families by crossing a flatwing sire from the Kauai population with a virgin dam from the Daintree, Australia population. Daintree females were used in the cross because flatwings do not exist in that population, and other sexually-selected traits such as song and cuticular hydrocarbons show significant divergence between Australian and Hawaiian populations83, which maximised our opportunity to genetically map segregating variation in other phenotypes. Female F1 offspring from parental crosses were heterozygous for flatwing, enabling recombination on the X. Full-sib matings were then performed with F1 males, all of which were normal-wing. The resulting F2 female offspring were a segregating mix of homozygous normal-wing genotypes on the X, or heterozygous with respect to wing morph. Recombination between flatwing and normal-wing genotypes was similarly possible in the heterozygous F2 females, but their phenotype is not externally detectable. To further increase recombination on the X, we performed another generation of crossing by mating F2 females with full-sib flatwing males from the same generation. Screening male morph types in the resulting F3 offspring enabled us to identify F2 crosses involving heterozygous females, as all male offspring of homozygous normal-wing females expressed normal-wing morphology. The crossing procedure resulted in 10 F3 mapping families from the original two parental families, for a total of 192 females, 113 normal-wing males, and 86 flatwing males.
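The expected X-linked genotypes at each generation of this crossing scheme can be enumerated with a short sketch. It ignores recombination and tracks only the flatwing locus; the allele labels ("fw", "+") and the function name offspring are hypothetical conventions chosen for illustration.

```python
FW, NW = "fw", "+"  # flatwing and normal-wing alleles at the X-linked locus


def offspring(dam, sire_x):
    """Return (daughter genotypes, son genotypes) for one cross.

    dam: tuple of the dam's two X alleles; sire_x: the sire's single X allele.
    Daughters receive the sire's X plus one maternal X; sons receive one
    maternal X only. Recombination is ignored in this sketch.
    """
    daughters = {tuple(sorted((maternal, sire_x))) for maternal in dam}
    sons = set(dam)
    return daughters, sons


# P0: flatwing Kauai sire x normal-wing Daintree dam
f1_daughters, f1_sons = offspring((NW, NW), FW)

# F1: heterozygous daughters x normal-wing full-sib males
f2_daughters, f2_sons = offspring((NW, FW), NW)

# F2: each daughter genotype x flatwing full-sib male; record F3 sons
f3_sons_by_dam = {dam: offspring(dam, FW)[1] for dam in f2_daughters}
```

The enumeration reproduces the logic used for screening: only heterozygous F2 dams yield flatwing sons in the F3, so families with any flatwing sons identify informative crosses.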
Marker identification using RAD-seq
RAD-seq was used to identify single nucleotide polymorphisms (SNPs) in F3 offspring (n = 391), P0 dams and sires (n = 4), and the F2 sires and dams (n = 19) that were used to produce mapping individuals in the F3 generation. For each individual, gDNA extraction and quality control were performed as described above prior to library preparation. gDNA was digested using SbfI (New England BioLabs). We barcoded individuals by ligating P1 adapters (8 nM), then sheared and size-selected 300-700 bp fragments. After ligating P2 adapters to sheared ends, parents were sequenced to an average coverage of 120x and offspring to 30x on an Illumina HiSeq 2000.
Construction of linkage map
Reads from all paired-end RAD libraries were demultiplexed by sample using process_radtags from Stacks84 and mapped against the reference genome assembly using BWA-MEM85, and duplicates were marked using PicardTools MarkDuplicates (http://broadinstitute.github.io/picard). Variants were called using samtools mpileup (version 1.3, parameters -d 2000 -t DP,DPR,DV,DP4,SP -Aef -gu) and bcftools call (version 1.3, parameters -vmO z -f GQ). The resulting variants were filtered using vcfutils.pl (included with bcftools) with minimum quality 50 and a minimum read depth of 150 (-Q 50 -d 150) to retain only high-quality variants. The vcf format was converted to the required lepmap2 input format using a custom script of the RADmapper pipeline (RAD_vcf_to_lepmap_with_sexmarker_conversion.py, https://github.com/EdinburghGenomics/RADmapper). During this conversion, samples that did not fit relatedness expectations, samples from family J (which lacked a genotyped father) and P0 parents were excluded from linkage map creation. Putative X-linked markers (male_het <=1, female_het > 20, het_sire <=1) were converted to biallelic markers in the relevant male offspring and sires using a dummy allele (Extended Data Table 17). The linkage map was then created using the following steps and parameters in lepmap2 (Filtering: dataTolerance=0.05 keepAlleles=1; SeparateChromosomes: lodLimit=10 sizeLimit=10 informativeMask=3; JoinSingles: lodLimit=5; OrderMarkers: filterWindow=10 polishWindow=100; OrderMarkers evaluateOrder: filterWindow=10 polishWindow=100). The resulting linkage map files were merged with the marker and sample information using a custom script from the RADmapper pipeline (LG_to_marker.py).
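The rule used to flag putative X-linked markers can be sketched as follows. This is an illustration rather than the RADmapper implementation. It assumes that male_het, female_het and het_sire are counts of heterozygous genotype calls among male offspring, female offspring and sires, respectively. The function names and the VCF-style genotype strings are hypothetical.

```python
def count_hets(genotypes):
    """Count heterozygous calls in VCF-style genotype strings such as '0/1'."""
    n_het = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        # Skip missing calls ('./.') and anything that is not diploid
        if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
            n_het += 1
    return n_het


def is_putative_x_linked(male_het, female_het, het_sire):
    """Apply the thresholds quoted in the text to one marker.

    Males carry a single X (XO), so a true X-linked marker should show
    essentially no heterozygous males or sires, while heterozygous
    females remain common.
    """
    return male_het <= 1 and female_het > 20 and het_sire <= 1


# Hypothetical genotype calls for a single marker
sons = ["0/0", "1/1", "0/0", "1/1"]
daughters = ["0/1"] * 25 + ["0/0"] * 10
sires = ["1/1", "0/0"]

flag = is_putative_x_linked(count_hets(sons), count_hets(daughters), count_hets(sires))
print("putative X-linked:", flag)
```

The thresholds tolerate a single heterozygous male or sire call, which allows for occasional genotyping error.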
QTL mapping
To identify the flatwing locus on the putative X chromosome (LG1), we performed ANOVA for each marker using the lm function in R (v. 3.1). Individual P-values were corrected for multiple testing using Bonferroni correction, and markers supported by a LOD10 cutoff were plotted. QTL for all 26 cuticular hydrocarbon (CHC) peaks, as well as the principal components from the CHC analysis, were mapped to the linkage groups using mixed linear models in ASReml 4. Mapping used a GWAS-type approach, taking into account genetic relatedness between individuals86.
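The two statistical ingredients of the single-marker scan, a one-way ANOVA per marker and a Bonferroni adjustment across markers, can be illustrated in a few lines. The analysis itself was run in R. The Python sketch below only shows the arithmetic, uses made-up numbers, and computes the F statistic without its P-value. The function names are hypothetical.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical phenotype scores grouped by genotype at one marker
f_stat = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])

# Hypothetical raw P-values from three markers
adjusted = bonferroni([0.01, 0.5, 0.0001])
print(f_stat, adjusted)
```

Bonferroni correction is conservative when markers are tightly linked, because tests at neighbouring markers are not independent.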
The markers mapped to the autosomal linkage groups 2-19 were filtered to contain only bi-allelic SNP markers with a MAF >=0.01 and <5% missing samples per marker. In addition, all grandparental, parental and female samples were removed, together with samples that clustered with the wrong family or did not have CHC data. Only male samples were selected, as our aim was to map male CHCs (not sex-related associations) on the putative X (LG1) and autosomes using principal components from the CHC analysis as well as individual compounds as traits. The remaining 21,047 markers were used to calculate pairwise genetic relatedness with the first normalisation87. The resulting inverse relatedness matrix was used as a random effect in the model: CHC trait ~ mu marker !r giv(animal). P-values for all markers were extracted from the results and corrected for multiple testing using Bonferroni correction. The same model was used to assess LG1 separately with the same set of samples, for which 6,537 markers were used after filtering.
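A marker filter of this kind can be sketched in Python as below. This is a minimal illustration, not the script used in the study. It assumes that the minor allele frequency threshold acts as a lower bound and that genotypes are coded as alternate-allele counts (0, 1 or 2) with `None` for missing calls. The function names are hypothetical.

```python
def maf_and_missing(genotypes):
    """Return (minor allele frequency, missing fraction) for one marker.

    genotypes: one entry per sample, coded as the alternate-allele count
    (0, 1 or 2) or None when the call is missing.
    """
    called = [g for g in genotypes if g is not None]
    missing = 1 - len(called) / len(genotypes)
    if not called:
        return 0.0, missing
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq), missing


def keep_marker(genotypes, min_maf=0.01, max_missing=0.05):
    """Keep markers with MAF at or above min_maf and missingness below max_missing."""
    maf, missing = maf_and_missing(genotypes)
    return maf >= min_maf and missing < max_missing


# Hypothetical markers scored in four samples
markers = {
    "m1": [0, 1, 2, 1],        # polymorphic, fully genotyped
    "m2": [0, 0, 0, 0],        # monomorphic
    "m3": [0, 1, None, None],  # half of the calls missing
}
kept = [name for name, gts in markers.items() if keep_marker(gts)]
print(kept)
```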
Pure-breeding lines and embryo sampling for RNA-seq
Lines homozygous for the flatwing and normal-wing genotypes were produced following previously described methods34. Briefly, one generation of crosses was performed, starting with the laboratory population derived from Kauai and crossing males of either wing phenotype to virgin females of unknown genotype. Because the phenotypic effects of flatwing are sex-limited, family-level screening of the resulting male offspring was performed to select homozygous flatwing and homozygous normal-wing lines, resulting in a final selection of three pure-breeding lines for each morph genotype. Developing embryos were sampled from eggs laid by females from each line. Females were maintained in laboratory culture as above, and their oviposition substrates were monitored. Eggs were removed from the substrate and immediately preserved in 500 μL of RNAlater (Qiagen) at the stage when eye pigmentation first develops, ca. 2 weeks after laying. This time point corresponds approximately to embryonic stage 13-14 in the related grylline species Gryllus bimaculatus88. After removing the outer egg chorion, the thoracic segment of each nymph was microdissected. Nymphs cannot be sexed based on external morphology until a later stage of juvenile development, and as chromosomal sex determination is XX/XO, screening for sex-specific markers is not possible. To minimise potential variation in sex ratio of samples between lines, and ensure a sufficient volume of tissue to extract RNA, thoracic tissue from n = 8 nymphs was pooled for each replicate, and 6 biological replicates were produced for each morph type (2 per line).
RNA-seq and gene expression profiling
Total RNA was extracted using the TRIzol plus RNA purification kit (Life Technologies) and DNAse treated using PureLink (Invitrogen). RNA was quantified and quality checked using Qubit assessment (Invitrogen) and Bioanalyser RNA Nano Chips (Agilent), respectively. To isolate mRNA we depleted ribosomal RNA from samples with RiboZero. After verifying depletion, cDNA libraries were constructed using the ScriptSeq protocol (Epicentre) with AMPure XP beads for purification. Following barcoding and multiplexing, final quality was checked and qPCR performed using Illumina’s Library Quantification Kit (Kapa). Sequencing was performed on an Illumina HiSeq 2000 v3, with 1% PhiX DNA spike-in controls to produce 100 base paired-end reads. CASAVA v.1.8.2 was used to demultiplex reads and produce raw fastq files, which were then processed with Cutadapt v.1.2.1 (ref. 38) and Sickle v.1.200 (ref. 89) to remove adaptor sequences and trim low-quality bases. Further quality assessment was performed in FastQC. Expression analysis of RNA-seq data was performed broadly following the protocol published by Pertea et al. (2016)90. Reads were aligned to the genome using HISAT2 with strand-specific settings, and transcripts were compiled for each sample in StringTie, using the gene annotation file as a reference. These were then merged across all samples to produce a single annotated reference transcriptome. Sample transcript abundances were estimated with the parameter -e specified to restrict abundance estimation to annotated transcripts. Differential expression analysis was performed at the gene level following normalisation of counts by trimmed mean of M-values (TMM), using a generalised linear model (GLM) with negative binomial distribution and a single predictor variable of ‘morph’ in the edgeR package91 in R v.3.4.1. Only genes with an expression level greater than 1 count per million in at least 3 samples were included in the analysis. Significance-testing was performed using likelihood ratio tests, and genes were considered significantly differentially expressed between morph genotypes if FDR-adjusted P-values were below a threshold of 0.05.
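The expression filter and the multiple-testing adjustment can be illustrated as follows. The analysis itself used edgeR in R. This Python sketch only mirrors the logic, assumes that FDR adjustment means the Benjamini-Hochberg procedure, and computes counts per million from raw library sizes rather than from TMM-scaled effective library sizes. The function names and all numbers are hypothetical.

```python
def passes_cpm_filter(gene_counts, library_sizes, min_cpm=1.0, min_samples=3):
    """True if the gene exceeds min_cpm in at least min_samples samples."""
    cpms = [count / size * 1e6 for count, size in zip(gene_counts, library_sizes)]
    return sum(value > min_cpm for value in cpms) >= min_samples


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that monotonicity is enforced
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical gene counted in 6 libraries of one million reads each
library_sizes = [1_000_000] * 6
expressed = passes_cpm_filter([2, 3, 5, 0, 0, 0], library_sizes)

# Hypothetical raw P-values for four genes
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [q < 0.05 for q in fdr]
print(expressed, fdr, significant)
```

In this toy example only the first gene falls below the 0.05 threshold after adjustment, although three raw P-values were below 0.05.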
Screening for top candidate genes associated with flatwing
We adjusted P-values for significant marker associations in the flatwing QTL mapping procedure using Bonferroni correction with a cut-off of P < 0.001. Three sources of information were used to comprehensively and robustly detect a set of top candidate genes associated with the flatwing phenotype. We detected genes (i.e. any overlapping portion of a predicted gene sequence cf. Extended Data Table 6) located in 1 kb flanking regions of all significant QTL markers after FDR correction as above, and defined these as QTL-associated candidates. We then subset these genes to retain only those located in the 1 kb flanking regions of the most significant (top 1%) of all QTL markers, and defined these as Top 1%-associated candidates. We also obtained the flatwing-associated sequences from a previously published bulk segregant analysis (BSA) of Kauai flatwings7, and defined the BSA reference sequences containing flatwing-associated SNPs as flatwing-associated BSA sequences. We mapped these BSA sequences to the T. oceanicus reference genome using BWA-MEM with default parameters85. Coordinates of mapped sequences were extracted from the resulting BAM files using SAMTOOLS59 and custom Perl scripts, and we only retained those sequences that were anchored to LG1. Genes within 1 kb of these retained sequences were defined as BSA-associated candidates. Finally, we extracted differentially expressed genes from the embryonic thoracic transcriptome analysis above, and defined these as DEG-associated candidates. To ensure a reliable final candidate gene set for flatwing, we only retained genes supported by at least two of these four gene sets. We used KEGG pathway mapping (colour pathway) to reconstruct pathways and obtain reference pathway IDs92. To characterise significantly enriched GO terms and KEGG pathways in DEGs, we implemented the hypergeometric test in enrichment analyses. P-values for each GO and KEGG map term were calculated and FDR-adjusted in R.
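The screening logic can be sketched in Python as shown below. The original work used R and custom Perl scripts, so this is an illustration under stated assumptions. A gene is treated as lying in a flanking region if any part of it overlaps the marker position plus or minus 1 kb, and enrichment is taken as the upper tail of the hypergeometric distribution. Gene names, coordinates and function names are hypothetical.

```python
from collections import Counter
from math import comb


def within_flank(gene_start, gene_end, marker_pos, flank=1000):
    """True if any part of the gene overlaps marker_pos +/- flank (in bp)."""
    return gene_start <= marker_pos + flank and gene_end >= marker_pos - flank


def supported_candidates(gene_sets, min_support=2):
    """Return genes that occur in at least min_support of the named sets."""
    counts = Counter(gene for genes in gene_sets.values() for gene in set(genes))
    return {gene for gene, n in counts.items() if n >= min_support}


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return total / comb(N, n)


# Hypothetical gene sets standing in for the four sets described above
gene_sets = {
    "QTL": {"g1", "g2"},
    "Top1%": {"g1"},
    "BSA": {"g3"},
    "DEG": {"g2", "g3", "g4"},
}
final_set = supported_candidates(gene_sets)

# Hypothetical enrichment: 2 of 2 DEGs carry a term held by 3 of 5 genes
p_enrich = hypergeom_upper_tail(2, 3, 2, 5)
print(sorted(final_set), p_enrich)
```

Because the Top 1% set is by construction a subset of the QTL set, any gene in it already meets the two-set requirement under this counting rule.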
Cuticular hydrocarbon extraction and gas chromatography-mass spectrometry
We extracted CHCs from F3 mapping individuals prior to extracting gDNA for RADseq. Extraction and analysis of CHCs followed previous methodology83, which is briefly described here. Subjects were flash-frozen for several minutes at −20 °C and then thawed. They were individually placed into 4 mL borosilicate glass vials (QMX Laboratories) and immersed for 5 minutes in 4 mL of HPLC-grade hexane (Fisher Scientific), then removed from the vials and stored for later processing. We evaporated a 100 μL aliquot of each sample overnight in a 300 μL autosampler vial (Fisher Scientific). CHC extracts were transported to the University of Exeter for gas chromatography mass spectrometry (GC/MS) using an Agilent 7890 GC linked to an Agilent 5975B MS. Extracts were reconstituted in 100 μL of hexane with a 10 ppm pentadecane internal standard, and 2 μL of this was injected into the GC/MS using a CTC PAL autosampler at 5 °C. The carrier gas was helium and we used DB-WAX columns of 30 m length with a 0.25 mm internal diameter and 0.25 μm film. Injection was performed in split-less mode. The column profile was optimised for separation of the CHC extract83 to start at 50 °C for 1 minute, followed by a temperature ramp of 20 °C per minute until finally holding at 250 °C for a total run time of 90 minutes. The inlet temperature was 250 °C and the MS transfer line was 230 °C. We recorded electron-impact mass spectra using a 70 eV ionization voltage at 230 °C, and a C7-C40 alkane mixture was run as a reference to enable the later calculation of peak retention indices.
Quantification and analysis of CHC profiles
For each individual, we used MSD CHEMSTATION software (v.E.02.00.493) to integrate the area under each of 26 CHC peaks (Extended Data Table 13) following Pascoal et al. (2016)83. Peak abundances were standardized using the internal pentadecane standard and log10 transformed prior to analysis. After accounting for samples that failed during extraction or during the GC run (n = 9), plus one normal-wing male CHC profile that was identified as an outlier and removed during analysis (Extended Data Fig. 8), we analyzed a total of n = 86 flatwing males, n = 112 normal-wing males, and n = 185 females of unknown genotype. To test whether CHC profiles differed between males of either wing morph, we first performed dimension reduction using principal components analysis (PCA) on male data only. JMP Trial 14.1.0 (SAS Institute Inc.) was used to draw a 3D scatterplot of the first three PCs. To assess statistical significance, we performed a MANOVA using all principal components with eigenvalue > 1.00 (n = 6). This indicated a highly significant difference among male morphs which formed the basis of QTL mapping described above. To visualise the difference between flatwing and normal-wing male CHC profiles with respect to female CHC profiles, we performed a discriminant function analysis (DFA) for all samples and all 26 peaks. DFA highlights the maximal difference between pre-defined groups, with maximum group differences indicated by the first DF axis. Statistical analyses of CHC data were done in SPSS (v.23).
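The peak pre-processing and the rule for choosing principal components can be illustrated briefly. This Python sketch is not the workflow used in the study, which relied on ChemStation, JMP and SPSS. It assumes that standardisation means dividing each peak area by the internal standard's peak area before taking log10. The function names and all numbers are hypothetical.

```python
from math import log10


def standardise_peaks(peak_areas, standard_area):
    """Divide each peak area by the internal standard area, then take log10."""
    return [log10(area / standard_area) for area in peak_areas]


def retained_components(eigenvalues, threshold=1.0):
    """Return 1-based indices of components whose eigenvalue exceeds threshold."""
    return [i for i, value in enumerate(eigenvalues, start=1) if value > threshold]


# Hypothetical integrated areas for three CHC peaks and the pentadecane standard
profile = standardise_peaks([100.0, 1000.0, 10.0], standard_area=10.0)

# Hypothetical PCA eigenvalues; only those above 1.00 would enter the MANOVA
kept_pcs = retained_components([5.2, 1.3, 0.9, 0.4])
print(profile, kept_pcs)
```

A peak with zero integrated area would make the log transform undefined, so such values would need handling before this step.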
Data Availability
Raw reads from Illumina and PacBio genome sequencing libraries, embryo RNAseq reads, RADseq reads used in the linkage map and QTL analyses, and CHC phenotype data will be made publicly available upon acceptance. Custom scripts are available online at http://www.chirpbase.org if not stated otherwise.
Author contributions
N.W.B. conceived and led the study. S.P., K.G., M.B., M.G.R. and N.W.B. designed experiments. S.P. led data collection. S.P. did genetic crosses and wet lab work. S.P., J.E.R., X.Z., T.C., E.L., X.L., J.H., J.G.R., B.L.S., U.T. and N.W.B. performed analyses. M.B., R.J.C., S.J., E.L. and N.W.B. designed ChirpBase. N.W.B. led manuscript writing. S.P., J.E.R., X.Z., E.L., M.B., M.G.R. and N.W.B. contributed to writing.
Competing interests
The authors declare no competing interests.
Acknowledgements
The Natural Environment Research Council provided funding to N.W.B. (NE/G014906/1, NE/L011255/1), and to N.W.B. and M.G.R. (NE/1027800/1). Sequencing support was provided by Edinburgh Genomics and the Centre for Genomic Research at the University of Liverpool. Bioinformatics resources at St Andrews were funded by a Wellcome Trust ISSF award (105621/Z/14/Z). Support from the China Scholarship Council (201703780018) to X.Z. is gratefully acknowledged. The Biotechnology and Biological Sciences Research Council provided support to M.B. which aided in the development of ChirpBase (BB/K020161/1). We thank J. Kenny for NGS sequencing advice; Y. Fang for read processing; R. Fallon for bioinformatics assistance; J.Q. Liu and K. Wang for advice regarding gene prediction pipelines; J. Bastiaansen and P. Gienapp for assistance with ASReml; W.V. Bailey, B. Gray, J.T. Rotenberry, S. Vardy and M. Zuk for assistance in the field; D. Forbes, A. Grant and T. Sneddon for assistance in the laboratory; and C. Mitchell for assistance with CHC analyses.