Hybridizing wood ants allow testing for natural selection at candidate barrier loci underlying speciation

The current aim of speciation research is to pin-point which genomic regions serve as barriers for gene flow and drive divergence during speciation. At the barrier loci, natural selection is assumed to act against gene flow. Many current approaches, however, rely on indirect measures of gene flow and natural selection. Here we present a system to test the action of natural selection at candidate barrier loci in a natural population. We utilize haplodiploidy to identify candidate barrier loci between two wood ant species and combine survival analysis with SNP genotyping to test for natural selection acting at candidate barrier loci. We find multiple candidate loci distributed over a large part of the genome displaying signatures of natural selection. Surprisingly, however, we find that a proportion of the barrier loci show leakage between data sets collected in 2004 and 2014. We also show that, on average selection favored introgression at candidate barrier loci in year 2014. We discuss reasons for barrier leakage, including environment-dependent selection, formation of compatible combinations of parental alleles and recombination breaking associations between causal and hitchhiking loci. Integrating data on survival allows us to move beyond genome scan studies, bringing additional evidence for natural selection acting in genomic islands of divergence.


Introduction 64
Multiple barrier loci throughout the genome are candidates for hybrid male 134 breakdown 135 We used pooled genomic sequencing to discover the genomic regions associated with 136 hybrid male breakdown (see Figure 1 for experimental design). We sequenced four  However, loci fixed differently in males, despite shared hybrid history and variable in 164 females are expected to experience the strongest antagonistic selection in the current 165 population and offer the best opportunity to test for selection acting on barrier loci. 166 We assumed that the allele fixed in F. aquilonia-like males represents the parental F. Next, we were interested in the functions of genes within candidate barrier regions. 252 For this we used the D. melanogaster gene IDs. The top three biological processes 253 characterizing our candidate barrier genes are "developmental process" (p = 3.5×10 -254 18 ), "system development" (p = 3.5×10 -18 ), and "single-organism developmental 255 process" (p = 4.03×10 -18 ). Similarly, the top three molecular functions are "ion 256 binding" (p = 8.55×10 -6 ), "protein binding" (p = 3.67×10 -5 ) and "calcium ion binding" 257   Figure 3). We found two small female larvae, which were genetically 282 intermediate and had unique combinations of putative parental alleles from the two 283 lineages suggesting they were early-generation hybrids between the lineages 284 (Supplementary Figure 3)  Next, we tested if the barrier loci showed significant allele or genotype frequency 318 changes during development from larva to adult compared to the random loci. This 319 would demonstrate natural selection acting on the candidate barrier loci in nature. No 320 significant allele frequency change was observed between larval and adult 321 developmental stages at candidate barrier loci compared to random loci in females of 322    Allele frequency estimates (poolseq) Allele frequency − late stage Allele frequency − early stage

338
for details on the model).

340
Our previous microsatellite-based study showed that introgressed alleles were favored 341 in females when heterozygous. This selection acted throughout the female lifetime. 342 Thus, we searched for signs of similar selection testing for increase in heterozygote 343 excess between early and late stages using F IS estimates (Table 1) Unfortunately, we were unable to test selection acting at candidate barrier loci in F.

Considering barrier loci that evolve in a network 454
Hybrid breakdown is expected to result from deleterious epistatic interactions (i.e. 455 intrinsic incompatibilities) between diverged genes of the parental species 5,6 . 456 However, two fundamental questions about incompatibilities remain to be answered. 457 First, is hybrid breakdown based on few or many loci? Second, if breakdown is based 458 on many loci do these represent multiple two-locus interactions or do incompatibility 459 loci interact in networks or pathways? Our results provide insights into both of these 460 questions. We found that candidate barrier and hitchhiking loci likely cover a large 461 proportion of the genome in two Formica ant species, which diverged within the last 462 500 000 years 16 . We also found support for epistatic interactions between genes 463 located among the candidate barrier regions. Majority of genes in these regions were 464 predicted to form a single protein-protein interaction network, with interactions for 465 homologous proteins documented or predicted in other species. We were unlikely to  dataset shows a trend of increasing heterozygosity excess at candidate barrier loci 539 during female development, but this is not statistically significant after removal of 540 two outlier individuals likely to be early generation hybrids. However, we find a 541 significant excess of heterozygosity at the candidate barrier loci compared to the 542 random loci. This excess heterozygosity can be explained by two processes or their 543 combination. First, strong allele frequency differences between mothers and fathers 544 would result in high heterozygosity in diploid females without the action of selection 545 in the sampled generation. However, in this scenario initial differentiation between 546 sexes at candidate barrier loci was likely caused by selection (see below). Second, 547 selection that favors heterozygotes may have already acted on the candidate loci 548 before the larval stage we sampled. Our earlier study documented selection during 549 development using eggs and few days old larvae and compared them to adults, whereas here we use 1-2 week old larvae and adults. Thus, in the current study we are 551 likely to miss any signal of selection acting early in development. 552 We assumed the candidate barrier loci to represent recessive incompatibilities and loci 554 linked to them, thus expecting to find evidence of negative selection in the haploid 555 males 17,18 . However, we find the opposite with introgressed alleles increasing in 556 frequency at the candidate barrier loci in the F. aquilonia-like males during 557 development. This is consistent with natural selection favoring introgression in these 558 males. The magnitude of change is comparable to that observed within a generation in 559 other systems, e.g. in stick insects for the color pattern locus 48 . Natural selection for,

624
Conclusions 625 Here we bridged the gap between genome scan studies and fitness by mapping including all the loci found using Popoolation2). Popoolation2 results were more 676 conservative (i.e, less differentially fixed SNPs) and were used for the rest of the considering the number of chromosomes sampled per pool. 680

Gene Annotation, protein-protein interactions and GO term Enrichment Analysis 682
We defined barrier loci as SNP positions where the male genomes (sampled in 2004) 683 from the F. aquilonia-like and F. polyctena-like lineages were fixed differently (i.e. 684 F ST = 1, see Supplementary Figure 2), which led to identification of 711 SNPs. These 685 fall into to 610 assembled scaffolds. We created an automated pipeline to annotate 686 genomic regions around candidate barrier SNPs, taking advantage of the recent 687 release of the closely related F. exsecta genome 22 . First, using our assembly we 688 extracted a sequence of 5,000 bp centered on each of the 711 outlier SNPs. If the 689 contig was smaller than 5,000 bp, the full contig sequence was recovered. Since some 690 SNPs located on the same contig, and sometimes less than 5 kb apart, 639 unique 691 sequences were recovered and were blasted against the F. exsecta assembly. For each 692 queried sequence, hits covering less than 60% of its length were filtered out and the 693 best hit was kept per query based on e-value (over all queries, e-value < 10 -141 ). This 694 allowed to anchor 571 genomic regions on the F. exsecta assembly. The coordinates 695 of these regions were extended if needed to reach 5,000 bp, so that sampling effort 696 was equal among genomic regions. F. exsecta genes overlapping with best hits were 697 collected using the GenomeIntervals Bioconductor package v1.38 59 . Overall, 590 698 unique genes overlapped with our 571 genomic regions. Sixty-three of the candidate 699 genomic regions did not contain any annotated gene. The mean number of genes per 700 candidate genomic region was 1.4, and the maximum was six. For each gene, its F. 701 exsecta transcript sequence was recovered and blasted against D. melanogaster r6.15 702 proteins after filtering for the longest protein per gene. Alignments below 150 bp and 703 with less than 35% identity were filtered out and the best hit was kept for each query 704 based on e-value (over all queries, e-value < 10 -20 ). This pipeline recovered 317 D. 705 melanogaster genes whose homologs reside in candidate barrier regions. These 706 candidate barrier genes were used for subsequent analyses (e.g., GO and PPI 707 enrichment analyses). For a complete list of scaffolds annotated with their gene names 708 see Supplementary Table 1. GO term enrichment analysis was performed using the