Ecological divergence in sympatry causes gene misregulation in hybrids

Ecological speciation occurs when reproductive isolation evolves as a byproduct of adaptive divergence between populations. However, it is unknown whether divergent ecological selection on gene regulation can directly cause reproductive isolation. Selection favoring regulatory divergence between species could result in gene misregulation in F1 hybrids and ultimately lower hybrid fitness. We combined 58 resequenced genomes with 124 transcriptomes to test this hypothesis in a young, sympatric radiation of Cyprinodon pupfishes endemic to San Salvador Island, Bahamas, which consists of a dietary generalist and two novel trophic specialists – a molluscivore and a scale-eater. We found more differential gene expression between closely related sympatric specialists than between allopatric generalist populations separated by 1000 km. Intriguingly, 9.6% of genes that were differentially expressed between sympatric species were also misregulated in their F1 hybrids. Consistent with divergent ecological selection causing misregulation, a subset of these genes were in highly differentiated genomic regions and enriched for functions important for trophic specialization, including head, muscle, and brain development. These regions also included genes that showed evidence of hard selective sweeps and were significantly associated with oral jaw length – the most rapidly diversifying skeletal trait in this radiation. Our results indicate that divergent ecological selection in sympatry can cause hybrid gene misregulation which may act as a primary reproductive barrier between nascent species. Significance It is unknown whether the same genes that regulate ecological traits can simultaneously contribute to reproductive barriers between species. We measured gene expression in two trophic specialist species of Cyprinodon pupfishes that rapidly diverged from a generalist ancestor. We found genes differentially expressed between species that also showed extreme expression levels in their hybrid offspring. Many of these genes showed signs of selection and have putative effects on the development of traits that are important for ecological specialization. This suggests that genetic variants contributing to adaptive trait divergence between parental species negatively interact to cause hybrid gene misregulation, potentially producing unfit hybrids. Such loci may be important barriers to gene flow during the early stages of speciation, even in sympatry.


49
Significance 50 It is unknown whether the same genes that regulate ecological traits can simultaneously 51 contribute to reproductive barriers between species. We measured gene expression in two trophic 52 specialist species of Cyprinodon pupfishes that rapidly diverged from a generalist ancestor. We 53 found genes differentially expressed between species that also showed extreme expression levels 54 in their hybrid offspring. Many of these genes showed signs of selection and have putative 55 effects on the development of traits that are important for ecological specialization. This suggests 56 that genetic variants contributing to adaptive trait divergence between parental species negatively 57 interact to cause hybrid gene misregulation, potentially producing unfit hybrids. Such loci may 58 be important barriers to gene flow during the early stages of speciation, even in sympatry. Adaptive radiations showcase dramatic instances of biological diversification resulting from 76 ecological speciation, which occurs when reproductive isolation (RI) evolves as a byproduct of 77 adaptive divergence between populations (1, 2). Ecological speciation predicts that populations 78 adapting to different niches will accumulate genetic differences due to divergent ecological 79 selection, indirectly resulting in reduced gene flow. Gene regulation is a major target of selection 80 during adaptive divergence, with many known cases of divergent gene regulation underlying 81 ecological traits (3-7). However, it is still unknown whether divergent ecological selection on 82 gene regulation contributes to reproductive barriers during speciation (8,9). 83 Hybridization between ecologically divergent populations can break up coadapted 84 genetic variation, resulting in (Bateson) Dobzhansky-Muller incompatibilities (DMIs) if 85 divergent alleles from parental populations are incompatible in hybrids and cause reduced fitness 86 (10, 11). DMIs can result in gene misregulation: transgressive expression levels that are 87 significantly higher or lower in F1 hybrids than either parental population. Because gene 88 expression is largely constrained by stabilizing selection, gene misregulation in hybrids is 89 expected to disrupt highly coordinated developmental processes and reduce fitness (12, 13). 90 Indeed, crosses between distantly related species show that misregulation is often associated with 91 reduced hybrid fitness in the form of hybrid sterility and inviability (i.e. intrinsic postzygotic 92 isolation) (14-16). DMIs causing these forms of strong intrinsic isolation evolve more slowly 93 than premating isolating barriers and are traditionally modeled as fixed genetic variation between 94 allopatric populations (11). 95 However, it is unknown whether hybrid gene misregulation also contributes to RI during 96 the early stages of speciation, particularly for populations diverging in sympatry (9, 17, 18). 97 Either segregating or fixed alleles causing gene misregulation in hybrids could disrupt 98 developmental processes resulting in genetic incompatibilities (intrinsic postzygotic isolation) or 99 reduced performance under natural conditions (extrinsic postzygotic isolation). Emerging 100 evidence suggests that weak intrinsic DMIs segregate within natural populations (19) and are 101 abundant between recently diverged species, reaching hundreds of incompatibility loci within 102 swordtail fish hybrid zones (20, 21). Furthermore, hybrid gene misregulation has been reported 103 Second, we identified genes showing parallel expression divergence in both specialist 162 species relative to generalists that were misregulated in specialist F1 hybrids (Fig. 3). This 163 pattern likely results from parallel expression in molluscivores and scale-eaters controlled by 164 different genetic mechanisms (33). Significantly more genes showed differential expression in 165 both specialist comparisons than expected by chance (Fig. 3A-D; Fisher's exact test, P < 2.7 × 166 10 -5 ). Of these, 96.6% (1,206) showed the same direction of expression in specialists relative to 167 generalists, which was more than expected under a neutral model of gene expression evolution 168  Table S1). 186 This set of 125 genes, which we refer to as ecological DMI candidate genes, was significantly 187 enriched for functional categories highly relevant to divergent specialist phenotypes, including 188 head development, brain development, muscle development, and cellular response to nitrogen 189 (FDR = 0.05; Fig. 4A, Table S5). 190 8 26 (20.8%) of these ecological DMI candidate genes showed strong evidence of a hard 191 selective sweep in specialists (negative Tajima's D < genome-wide 10 th percentile; range: -1.62 -192 -0.77; SweeD composite likelihood ratio > 90 th percentile by scaffold; Table S6 and S7) and 16 193 of these showed at least a two-fold expression difference in F1 hybrids compared to purebred F1. 194 Several ecological DMI candidate genes have known functions that are compelling targets for 195 divergent ecological selection. For example, the autophagy-related gene map1lc3c has been 196 shown to influence growth when cells are nitrogen deprived (37, 38). Given that specialists 197 occupy higher trophic levels than generalists, as shown by stable isotope ratios (δ15N; Fig. 5B), 198 expression changes in this gene may be important adaptations to nitrogen-rich diets. Similarly,199 expression changes in the ten genes annotated for effects on brain development may influence 200 divergent behavioral adaptations associated with trophic specialists, including significantly 201 increased aggression (39) and female mate preferences (40). 202 Using a genome-wide association mapping method that accounts for genetic structure 203 among populations (41), we found that nine of the 125 genes in differentiated regions were 204 significantly associated with oral jaw sizethe most rapidly diversifying skeletal trait in this 205 radiation (GEMMA PIP > 99 th percentile; Table S8 have not been previously shown to influence cranial phenotypes, but some have known functions 212 in cell types relevant to craniofacial development (Table S8). For example, the gene sema6c, 213 which shows strong signs of selection in both scale-eaters and molluscivores (Fig. S6), is known 214 to be expressed at neuromuscular junctions and is important for neuron growth and development 215 within skeletal muscle (43). Expression changes in this gene may influence the development of 216 jaw closing muscles (adductor mandibulae), which tend to be larger in specialists relative to 217 generalists (Fig. 5B). Overall, we found candidate regulatory variants under selection that likely 218 contribute to hybrid gene misregulation and demonstrate that genes near these variants are 219 strikingly enriched for developmental functions related to divergent adaptive traits. 220 from misregulation of genes that are necessary for the normal development of their adaptive 251 traits. 252 If divergent ecological selection on adaptive traits also causes gene misregulation and 253 subsequently reduced performance and survival of hybrids in the wild, then these ecological 254 DMIs may promote rapid speciation, analogous to the mechanism of magic traits (51). For 255 example, whereas magic traits contribute to RI through assortative mating as a byproduct of 256 divergent ecological selection, these ecological DMIs contribute to RI through gene 257 misregulation and reduced hybrid fitness (18). Thus, our results support a mechanism for 258 divergent ecological selection to generate RI as a byproduct since many adaptive traits are 259 expected to evolve by divergent gene regulation that may come into conflict in a hybrid genetic 260 background (9, 18). 261 Mathematical models and simulations suggest that genetic incompatibilities evolve most 262 rapidly under directional selection (52, 53), and evolve more slowly under stabilizing selection 263 when compensatory cis and trans variants have opposing effects on expression levels (52). We 264 see evidence for both types of selection driving misregulation. 22.3% of all misregulated genes 265 showed expression patterns consistent with compensatory regulation, a signature of stabilizing 266 selection (Table S2). However, 26 ecological DMI candidate genes in highly differentiated 267 genomic regions showed strong evidence of hard selective sweeps due to directional selection 268 (Table S6), and more genes may have experienced soft sweeps that were not detected by our 269 methods. Although scale-eaters from Crescent Pond and Osprey Lake form a monophyletic 270 group (Fig. S1), we found little overlap in misregulated genes between lakes (Fig. 2). This may 271 result from selection on Caribbean-wide standing genetic variation that has similar effects on 272 expression, as we showed previously (33), and could reflect polymorphic incompatibilities 273 segregating within species (19). We also see distinct intraspecific differences between lake 274 populations of trophic specialists in pigmentation, maxillary protrusion, and other traits (54) We extracted RNA from a total of 348 individuals across two early developmental stages (2 days 308 post fertilization (dpf) and 8 dpf) using RNeasy Mini Kits (Qiagen, Inc.). For 2 dpf libraries, we 309 pooled 5 embryos together and pulverized them in a 1.5 ml Eppendorf tube. We used the same 310 extraction method for samples collected at 8 dpf but did not pool larvae. Libraries were prepared 311 using TruSeq stranded mRNA kits and sequenced on 3 lanes of Illumina 150 PE Hiseq4000 at 312 the Vincent J. Coates Genomic Sequencing Center. We mapped 1,638,067,612 adaptor-trimmed 313 reads to the reference genome using the RNAseq aligner STAR with default parameters (59). We 314 did not find a difference between species or outgroup populations for standard quality control 315 measures, (Fig. S7; ANOVA, P > 0.1), except for a marginal difference in transcript integrity 316 numbers ( Fig. S8; ANOVA, P = 0.041) driven by slightly higher transcript quality in North 317 Carolina generalist samples relative to other samples (Tukey post-hoc test: P = 0.043). We found 318 no significant differences among San Salvador Island generalists, molluscivores, scale-eaters, 319 and outgroups in the proportion of reads that mapped to annotated features of the Cyprinodon 320 reference genome (Fig. S9; ANOVA, P = 0.17). 321 We used the Genome Analysis Toolkit (60) to call and refine SNP variants across 58 322 Cyprinodon genomes and across 124 Cyprinodon exomes. We filtered both SNP datasets to 323 include individuals with a genotyping rate above 90% and SNPs with minor allele frequencies 324 higher than 5%. Our final filtered genomic SNP dataset included 13,838,603 variants with a 325 mean sequencing coverage of 8.2× per individual. We further refined our transcriptomic SNP 326 dataset using the allele-specific software WASP (v. 0.3.3) to correct for potential mapping biases 327 that would influence tests of allele-specific expression (61, 62). We re-called SNPs using 328 unbiased BAMs determined by WASP for a final transcriptomic SNP dataset that included 329 413,055 variants with a mean coverage of 1,060× across features per individual. 330 331

Phylogenetic analyses 332
In order to determine the relationship between expression divergence, F1 hybrid misregulation, 333 and phylogenetic distance, we estimated a maximum likelihood tree using RAxML (63). We 334 excluded all missing sites and sites with more than one alternate allele from our genomic SNP 335 dataset, leaving 1,737,591 variants across 58 individuals for analyses. We performed ten separate 336 searches with different random starting trees under the GTRGAMMA model. Node support was 337 estimated from 1,000 bootstrap samples. We fit phylogenetic generalized least-squares (PGLS) 338 models in R with the packages ape (64) and nlme to assess whether gene expression patterns 339 were associated with geographic distance among populations after accounting for phylogenetic 340 relatedness among populations and species. We excluded Osprey Lake populations from these 341 analyses because outgroups were only crossed with Crescent Pond generalists. 342 343

Population genomics and genome-wide association mapping 344
If alleles causing gene expression divergence between species affect the development of adaptive 345 traits, and also cause gene misregulation in hybrids resulting in low fitness, we predicted that 346 genomic regions near these genes would be strongly differentiated between species, associated 347 with divergent ecological traits, and show signatures of positive selection. We measured relative 348 genetic differentiation (Fst), within population diversity (π), and between population divergence 349 (Dxy) across 58 Cyprinodon individuals using 13.8 million SNPs (Table S1 and S7). We 350 identified 20 kb genomic windows significantly associated with variation in oral jaw size across 351 all populations in our dataset (Table S8;

Hybrid misregulation and inheritance of gene expression patterns 363
We aggregated read counts with featureCounts (67) at the transcript isoform level (36,511 364 isoforms corresponding to 24,952 protein coding genes). Significant differential expression 365 between groups was determined with DESeq2 (68) using Wald tests comparing normalized 366 posterior log fold change estimates and correcting for multiple testing using the Benjamini-367 Hochberg procedure with a false discovery rate of 0.05 (69). We compared expression in F1 368 hybrids to expression in F1 purebred offspring to determine whether genes showed additive, 369 dominant, or transgressive patterns of inheritance in hybrids. To categorize hybrid inheritance for 370 F1 offspring generated from a cross between a female from population A and a male from 371 population B (F1(A×B)), we conducted four pairwise differential expression tests with DESeq2: 1) 372

Parallel changes in gene expression in specialists 381
Parallel evolution of gene expression is often associated with convergent niche specialization, 382 but parallel changes in expression may also underlie divergent specialization (33). We looked at 383 the intersection of genes differentially expressed between generalists versus molluscivores and 384 generalists versus scale-eaters to determine whether both specialists showed parallel changes in 385 expression relative to generalists. We asked whether significant parallelism at the level of gene 386 expression in specialists was mirrored by parallel regulatory mechanisms. We predicted that 387  species (100% bootstrap support indicated at nodes). B) Geographic distance separating 581 populations was associated with differential gene expression levels in embryos at 2 days post 582 fertilization (2 dpf; phylogenetic least squares P = 0.02, dotted regression line). C) In whole 583 larvae at 8 dpf differential expression was not associated with geographic distance (PGLS; P = 584 0.18) and was higher between sympatric specialists (red) than between allopatric generalists 585 separated by 300 and 1000 km (black). D and E) Hybrid misregulation for sympatric crosses at 8 586 dpf than 2 dpf. Geographic distance was not associated with hybrid misregulation at either 587 developmental stage (PGLS; 2 dpf P = 0.17; 8dpf P = 0.38). Percentages in B-E were measured 588 using Crescent Pond crosses. 589 and L), we only show genes misregulated in a single cross direction. A total of 716 genes 596 (purple) were differentially expressed between species and also misregulated in their F1 hybrids. 597 Purple Venn diagrams show overlap between lake population comparisons; 4 genes showed 598 differential expression and misregulation in both lake comparisons. 599 in highly differentiated genomic regions that showed differential expression between species and 617 misregulation in F1 hybrids. Consistent with muscle development and nitrogen metabolism 618 enrichment, B) adductor mandibulae muscle mass tends to be larger in specialists and C) stable 619 nitrogen isotope ratios (δ15N) are significantly higher in scale-eaters, indicating that they occupy 620 a higher trophic level (Tukey post-hoc test: P < 0.001***). D) The gene mpp1 is controlled by 621 cis-regulatory divergence as shown by E) allele specific expression in F1 hybrids and F) 622 differential expression between Crescent Pond generalists vs. scale-eaters and misregulation in 623 their F1 hybrids. G) The gene mpp1 (light blue band) is near 170 SNPs fixed between Crescent 624 Pond generalists vs. scale-eaters (black points), shows high absolute divergence between species 625 (Dxy), low within-species diversity (π), signatures of a hard selective sweep (Tajima's D and 626 SweeD composite likelihood ratio (CLR)), and is significantly associated with oral jaw length 627 (PIP; GEMMA genome-wide association mapping). 628

Study system and sample collection 630
We collected 51 wild-caught individuals from nine isolated hypersaline lakes on San Salvador 631 Island, Bahamas (Great Lake, Stout's Lake, Oyster Lake, Little Lake, Wild-caught parents were reared in breeding tanks at 25-27°C, 10-15 ppt salinity, pH 8.3, and 650 fed a mix of commercial pellet foods and frozen foods. All purebred F1 offspring were collected 651 from breeding tanks containing multiple F0 breeding pairs. All F1 offspring from crosses 652 between species and populations were collected from individual F0 breeding pairs that were 653 subsequently sequenced in our genomic dataset. 654 Methods for collecting and raising embryos were similar to previously outlined methods 655 (3, 4). All F1 embryos were collected from breeding mops within one hour of spawning and 656 transferred to petri dishes incubated at 27°C. Embryo water was treated with Fungus Cure (API 657 Inc.) and changed every 48 hours. Embryos were inspected for viability and sampled either 47-658 49 hours post fertilization (hereafter 2 days post fertilization (2 dpf)) or 190-194 hours (eight 659 days) post fertilization (hereafter 8 dpf). These early developmental stages are described as stage 660 23 (2 dpf) and 34 (8 dpf) in a recent embryonic staging series of C. variegatus (5). The 2 dpf 661 stage is comparable to the Early Pharyngula Period of zebrafish, when multipotent neural crest 662 cells have begun migrating to pharyngeal arches that will form the oral jaws and most other 663 craniofacial structures (6-8). Embryos usually hatch six to ten days post fertilization, with 664 similar variation in hatch times among species (3, 7). While some cranial elements are ossified 665 prior to hatching, the skull is largely cartilaginous at 8 dpf (5) (Table S9). We extracted RNA from a total of 348 individuals (whole-embryos and whole-larvae) using 707 RNeasy Mini Kits (Qiagen catalog #74104). For samples collected at 2 dpf, we pooled 5 708 embryos together and pulverized them in a 1.5 ml Eppendorf tube using a plastic pestle washed 709 with RNase Away (Molecular BioProducts). We used the same extraction method for samples 710 collected at 8 dpf but did not pool larvae and prepared a library for each individual separately. We filtered raw reads using Trim Galore (v. 4.4, Babraham Bioinformatics) to remove 717 Illumina adaptors and low-quality reads (mean Phred score < 20) and mapped 1,638,067,612 718 filtered reads to the Cyprinodon reference genome (NCBI, Cyprinodon variegatus annotation 719 release 100; 1.035 Gb; scaffold N50 = 835,301; (7)) using the RNA-seq aligner STAR with 720 default parameters (v. 2.5 (11)). We assessed mapping and read quality using MultiQC (10). We 721 quantified the number of duplicate reads produced during sequence amplification and GC 722 content of transcripts for each sample using RSeQC (12). We also used RSeQC to estimate 723 transcript integrity numbers (TINs) which is a measure of potential in vitro RNA degradation 724 within a sample. TIN is calculated by directly analyzing the uniformity of read coverage across a 725 transcript and is a more reliable measure of degradation compared to RNA integrity number 726 (RIN) which uses ribosomal RNA as a proxy for overall RNA integrity (12, 13). We performed 727 one-way ANOVA to determine whether the GC content of reads, read depth across features, total 728 normalized counts, or TINs differed between samples grouped by species and population. We 729 did not find a difference between species or generalist populations for any quality control 730 measure ( Fig. S7; ANOVA, P > 0.1), except for a marginal difference in TIN ( Fig. S8; ANOVA, 731 P = 0.041) driven by slightly higher transcript quality in North Carolina samples (Tukey multiple 732 comparisons of means; P = 0.043). We found no significant differences among San Salvador 733 Island generalists, molluscivores, scale-eaters, and outgroup generalists in the proportion of reads 734 that map to annotated features of the Cyprinodon reference genome ( Fig. S9; ANOVA, P = 735 0.17). We did find that more reads mapped to features in 2 dpf samples than 8 dpf samples (Fig. 736 S13; Student's t-test, P < 2.2 × 10 -16 ). 737 738

Variant discovery and population genetic analyses 739
We followed the best practices guide recommended by the Genome Analysis Toolkit (v. 3.5 740 (14)) in order to call and refine SNP variants across 58 Cyprinodon genomes and across 124 741 Cyprinodon exomes using the Haplotype Caller function. For both datasets, we used 742 conservative hard filtering criteria to call SNPs (14, 15): Phred-scaled variant confidence divided 743 by the depth of nonreference samples > 2.0, Phred-scaled P-value using Fisher's exact test to 744 detect strand bias > 60, Mann-Whitney rank-sum test for mapping qualities (z > 12.5), Mann-745 Whitney rank-sum test for distance from the end of a read for those with the alternate allele 746 (z > 8.0). We filtered both SNP datasets to include individuals with a genotyping rate above 90% 747 and SNPs with minor allele frequencies higher than 5%. Our final filtered genomic SNP dataset 748 included 13,838,603 variants with a mean sequencing coverage of 8.2× per individual. 749 We further refined our transcriptomic SNP dataset using the allele-specific 750 software WASP (v. 0.3.3) to correct for potential mapping biases that would influence tests of 751 allele-specific expression (ASE; (16, 17)). While we showed that mapping bias does not 752 significantly affect the proportion of reads mapped to features between species (Fig. S9), even a 753 small number of biased sites would likely account for the majority of significant ASE at an 754 exome-wide scale. WASP identified reads that overlapped sites in our original transcriptomic 755 SNP dataset and re-mapped those reads after swapping the genotype for the alternate allele. 756 Reads that failed to map to exactly the same location were discarded. We re-mapped unbiased 757 reads using methods outlined above to create our final BAM files that were used for all 758 downstream analyses. We re-called SNPs using unbiased BAMs for a final transcriptomic SNP 759 dataset that included 413,055 variants with a mean coverage of 1,060× across gene features per 760 individual. 761 We analyzed genomic SNPs to measure within-population diversity (π), between-762 population diversity (Dxy), relative genetic diversity (Fst), and Tajima's D. We measured π, Dxy, 763 and Fst in 20 kb windows using the python script popGenWindows.py created by Simon Martin 764 (available on https://github.com/simonhmartin/genomics_general; (18)). 13.8 million SNP 765 variants genotyped by whole genome resequencing of 58 Cyprinodon individuals revealed more 766 population structure between allopatric generalists than between generalists and specialists on 767 San Salvador (genome-wide mean Fst between San Salvador generalists: vs. North Carolina = 768 0.217; vs. New Providence = 0.155; vs. scale-eaters = 0.106; vs. molluscivores = 0.056). We 769 found consistent relationships across a maximum likelihood phylogeny calculated with RAxML, 770 with longer branch lengths separating allopatric populations (Fig. 1, S1).

Read count abundance and differential expression analyses 799
We used the featureCounts function of the Rsubread package (23) requiring paired-end and 800 reverse stranded options to generate read counts across 36,511 previously annotated features for 801 the Cyprinodon reference genome (7). We aggregated read counts at the transcript isoform level 802 (36,511 isoforms correspond to 24,952 protein coding genes). 803 We used DESeq2 (v. 3.5 (24)) to normalize raw read counts and perform principal 804 component analyses. DESeq2 normalizes read counts by calculating a geometric mean of counts 805 for each gene across samples, dividing individual gene counts by this mean, and then using the 806 median of these ratios as a size factor for each sample. These sample-specific size factors 807 account for differences in library size and sequencing depth among samples. Gene features 808 showing less than 10 normalized counts in every sample were discarded from analyses. We 809 constructed a DESeqDataSet object in R using a multi-factor design that accounted for variance 810 in F1 read counts influenced by parental population origin and sequencing date (design = 811 ~sequencing_date + parental_breeding_pair_populations). Next, we used a variance stabilizing 812 transformation on normalized counts and performed a principal component analysis to visualize 813 the major axes of variation in 2 dpf and 8 dpf samples (Fig. S15). We removed one 8 dpf outlier 814 so that the final count matrix used for differential expression analyses included 124 samples

Hybrid misregulation and inheritance of gene expression patterns 830
We generated F1 hybrid offspring from crosses between populations and generated purebred F1 831 offspring from crosses within populations. We compared expression in hybrids to expression in 832 purebred offspring to determine whether genes showed additive, dominant, or transgressive 833 patterns of inheritance in hybrids. To categorize hybrid inheritance for F1 offspring generated 834 from a cross between a female from population A and a male from population B (F1(A×B)), we 835 conducted four pairwise differential expression tests with DESeq2: inheritance, meaning hybrid gene expression was significantly higher (overdominant) or lower 845 (underdominant) than both parental species (Fig. S10-12). All comparisons were conducted 846 between groups sampled at the same developmental stage (2 dpf or 8 dpf). 847 848

Parallel changes in gene expression in specialists 849
Parallel evolution of gene expression is often associated with convergent niche specialization, 850 but parallel changes in expression may also underlie divergent specialization (4). We looked at 851 the intersection of genes differentially expressed between generalists versus molluscivores and 852 generalists versus scale-eaters to determine whether specialists showed parallel changes in 853 expression relative to generalists. We compared expression between generalists and each 854 specialist grouping samples by lake population and developmental stage. 855 We also examined the direction of expression divergence for each gene to evaluate the 856 significance of parallel expression evolution (Fig 3E). Specifically, we wanted to know whether 857 the fold change in expression for genes tended to show the same sign in both specialists relative 858 to generalists (either up-regulated in both specialists relative to generalists or down-regulated in In order to determine regulatory mechanisms controlling expression divergence between 910 parental species, a transcript had to be included in differential expression analyses and ASE 911 analyses. We were able to classify regulatory categories for more transcripts if breeding pairs 912 were more genetically divergent because we could analyze more heterozygous sites in their 913 hybrids (mean number of informative transcripts across crosses = 1,914; range = 812 -3,543). 914 For each hybrid sample and each transcript amenable to both types of analyses, we calculated H 915 the ratio of maternal allele counts compared to the number of paternal allele counts in F1 916 hybrids, and Pthe ratio of normalized read counts in purebred F1 offspring from the maternal 917 population compared to read counts in purebred F1 offspring from the paternal population. We 918 performed a Fisher's exact test using H and P to determine whether there was a significant trans-919 contribution to expression divergence, testing the null hypothesis that the ratio of read counts in 920 the parental populations was equal to the ratio of parental allele counts in hybrids (26, 28, 30, 921 31). 922 We classified expression divergence due to cis-regulation if a transcript showed 923 significant ASE, significant differential expression between parental populations of purebred F1 924 offspring, and no significant trans-contribution. We identified expression divergence due to 925 trans-regulation if transcripts did not show ASE, were differentially expressed between parental 926 populations of purebred F1 offspring, and showed significant trans-contribution. We found 927 compensatory regulatory divergence (cis-and trans-regulatory factors had opposing effects on 928 expression) in cases where a transcript showed ASE and was not differentially expressed 929 between parental populations of purebred F1 offspring (Fig. S2-S4). 930 931

Phylogenetic analyses 932
Gene expression evolves under the combined forces of selection and drift, and is expected to 933 diverge linearly with increasing phylogenetic distance between closely related species (32). The 934 magnitude of F1 hybrid misregulation likely also depends on phylogenetic distance between 935 parental species (33). In order to determine the relationship between expression divergence, 936 hybrid misregulation, and phylogenetic distance, we constructed a maximum likelihood tree 937 using RAxML. We excluded all missing sites and sites with more than one alternate allele from 938 our genomic SNP dataset, leaving 1,737,591 variants across 58 individuals for analyses. We 939 performed ten separate searches with different random starting trees under the GTRGAMMA 940 model. Node support was estimated from 1,000 bootstrap samples. We used branch lengths from 941 the best fitting tree as a measure of phylogenetic distance between populations. 942 We tested whether isolation by distance (kilometers separating populations) was a 943 significant predictor of gene expression divergence between populations. We also tested whether 944 isolation by distance explained patterns of misregulation in hybrids generated by inter-population 945 crosses. Gene expression levels between species cannot be considered to be independent and 946 identically distributed random variables (34) . We used phylogenetic generalized least-squares 947 (PGLS) models in R, using the packages ape (35) and nlme to assess whether gene expression 948 patterns were predicted by distance between populations (measured in kilometers) after 949 accounting for phylogenetic relatedness. We excluded Osprey Lake populations from these 950 analyses because outgroup generalist hybrid crosses only involved Crescent Pond generalists. 951 We used lake diameter as the distance between populations for sympatric comparisons.    8dpf Jul-18 Osprey Lake generalist female x Osprey Lake snail eater male OUE3 8dpf Jul-18 Osprey Lake generalist female x Osprey Lake snail eater male OUE4 8dpf Jul-18 Osprey Lake generalist female x Osprey Lake snail eater male OVE1 8dpf Jul-18 Osprey Lake generalist female x Osprey Lake scale eater male OVE4 8dpf Jul-18 Osprey Lake generalist female x Osprey Lake scale eater male OVE5 8dpf Jul-18 Osprey Lake generalist female x Osprey Lake scale eater male OXE2 8dpf Jul-18 Osprey Lake snail eater female x Osprey Lake generalist male OYE1 8dpf May-18 Osprey Lake scale eater female x Osprey Lake generalist male OYE2 8dpf May-18 Osprey Lake scale eater female x Osprey Lake generalist male OYE3 8dpf May-18 Osprey Lake scale eater female x Osprey Lake generalist male OYE4 8dpf May-18 Osprey Lake scale eater female x Osprey Lake generalist male OYE5 8dpf May-18 Osprey Lake scale eater female x Osprey Lake generalist male OZE2 8dpf Jul-18 Osprey Lake scale eater female x Osprey Lake snail eater male OZE4 8dpf Jul-18 Osprey Lake scale eater female x Osprey Lake snail eater male OZE5 8dpf Jul-18 Osprey Lake scale eater female x Osprey Lake snail eater male PAE1 8dpf 2dpf Jul-18 Osprey Lake generalist female x Osprey Lake snail eater male OUT2 2dpf Jul-18 Osprey Lake generalist female x Osprey Lake snail eater male OUT3 2dpf Jul-18 Osprey Lake generalist female x Osprey Lake snail eater male OVT1 2dpf May-18 Osprey Lake generalist female x Osprey Lake scale eater male OVT2 2dpf May-18 Osprey Lake generalist female x Osprey Lake scale eater male OVT3 2dpf May-18 Osprey Lake generalist female x Osprey Lake scale eater male OXT1 2dpf Jul-18 Osprey Lake snail eater female x Osprey Lake generalist male OXT2 2dpf Jul-18 Osprey Lake snail eater female x Osprey Lake generalist male OXT3 2dpf Jul-18 Osprey Lake snail eater female x Osprey Lake generalist male OYT1 2dpf May-18 Osprey Lake scale eater female x Osprey Lake generalist male OYT2 2dpf May-18 Osprey Lake scale eater female x Osprey Lake generalist male OYT3 2dpf May-18 Osprey Lake scale eater female x Osprey Lake generalist male OZT1 2dpf Jul-18 Osprey Lake scale eater female x Osprey Lake snail eater male OZT2 2dpf Jul-18 Osprey Lake scale eater female x Osprey Lake snail eater male OZT3 2dpf Jul-18 Osprey Lake scale eater female x Osprey Lake snail eater male PAT1 2dpf population crosses. Yellow = conserved (no difference in expression between any group or 1306 ambiguous expression patterns), red = cis (significant ASE in hybrids, significant differential 1307 expression between parental populations of purebred F1 offspring, and no significant trans-1308 contribution), green = trans (significant ASE in hybrids, significant differential expression 1309 between parental populations of purebred F1 offspring, and significant trans-contribution), 1310 black = compensatory (significant ASE in hybrids, no significant differential expression between 1311 parental populations of purebred F1 offspring), blue = misregulated (significant differential 1312 expression between purebred F1 and hybrid F1), triangle = compensatory and misregulated. within-population diversity, shows strong signs of a hard selective sweep, and is significantly 1337 associated with oral jaw length variation in a genome-wide association analysis using GEMMA 1338 (Table S8). in hybrid F1), pink = maternal dominant (differential expression between purebred F1, 1412 differential expression between paternal population purebred F1 and F1 hybrids, no differential 1413 expression between maternal population purebred F1 and F1 hybrids), green = paternal dominant 1414 (differential expression between purebred F1, differential expression between maternal 1415 population purebred F1 and F1 hybrids, no differential expression between paternal population 1416 purebred F1 and F1 hybrids), red = overdominant (F1 hybrid gene expression significantly 1417 higher than parental population purebred F1), blue = underdominant (F1 hybrid gene expression 1418 significantly lower than parental population purebred F1). 1419