Abstract
The patterns of genetic relatedness among individuals vary along the genome, representing fluctuation of local ancestry. The factors responsible for this variation have not been well studied in wild animals with ecological and behavioural relevance. Here, we characterise the genomic architecture of genetic relatedness in the Eurasian blackcap, an iconic songbird species in ecology and quantitative genetics of migratory behaviour. We identify 23 genomic regions with deviated local relatedness patterns, using a chromosome-level de novo assembly of the blackcap genome and whole-genome resequencing data of 179 individuals from nine populations with diverse migratory phenotypes. Five genomic regions show local relatedness patterns of polymorphic inversions, three of which are syntenic to polymorphic inversions known in the zebra finch. Phylogenetic analysis reveals these three polymorphic inversions evolved independently in the blackcap and zebra finch indicating convergence of polymorphic inversions. Population genetic analyses in these three inversions in the blackcap suggest balancing selection between two haplotypes in one locus and background selection in the other two loci. One genomic region with deviated local relatedness is under selection against gene flow by population-specific reduction in recombination rate. Other genomic islands including 11 pericentromeric regions consist of evolutionarily conserved and non-conserved recombination cold-spots under background selection. Two of these regions with non-conserved recombination suppression are known to be associated with population-specific migratory phenotypes, where local relatedness patterns support additional effects of population-specific selection. These results highlight how different forms of recombination suppression and selection jointly affect heterogeneous genomic landscape of local ancestries.
Introduction
The effect of population structure and selection on realised genetic relatedness can be distributed heterogeneously along a chromosome (Mathieson and Scally 2020). This heterogeneity arises through recombination events breaking linkage between two neighbouring loci, resulting in different genetic ancestries along a chromosome. Recombination cold-spots (genomic regions with suppressed recombination) can mediate changes in genomic local ancestries both directly and indirectly. As a direct effect, suppressed recombination between sequences from different populations at barrier loci results in faster sorting of lineages than in the genomic background (Wu 2001; Butlin 2005; Nachman and Payseur 2012; Hejase et al. 2020). Indeed, recombination rate variation is correlated with admixture proportion, suggesting that recombination landscapes play highly polygenic and general roles in shaping local relatedness patterns (Martin et al. 2019). As an indirect effect, linked selection can also change local relatedness patterns at recombination cold-spots. For example, species-wide long-term background selection (i.e. hitchhiking effects by purifying/negative selection against deleterious mutations) reduces genetic variation at recombination cold-spots in all populations (Roesti et al. 2013; Burri 2017; Vijay et al. 2017). On the other hand, population-specific selective sweeps (i.e. hitchhiking effects by positive selection for beneficial mutations) reduce genetic variation at recombination cold-spots in certain populations (Burri 2017; Vijay et al. 2017; Hejase et al. 2020).
Principal component analysis (PCA) has been widely used to infer population structure by summarising and visualising genetic relatedness among samples based on a genotype matrix (Patterson et al. 2006; Price et al. 2006). When applied to a whole-genome genotype matrix, results of PCA often represent biogeography and history of the populations to which the samples belong, averaging variation of local relatedness patterns along the genome (Becquet et al. 2007; Paschou et al. 2007; Gautier et al. 2009; Willing et al. 2010). To capture fluctuating patterns of local relatedness along chromosomes, Li and Ralph (2019) developed “local PCA (lostruct)”. In this method, PCA is performed in sliding genomic windows to summarise local genetic distances among individuals. Similarities among the genomic windows based on the PCA results are then summarised with multidimensional scaling (MDS), whereby genomic regions with deviated local relatedness patterns are identified. In recent studies, local PCA was applied to support discoveries of polymorphic inversions (Huang et al. 2020; Perrier et al. 2020; Todesco et al. 2020; Mérot et al. 2021) as well as other evolutionary factors that deviate local relatedness patterns (Fuller et al. 2020; Paris et al. 2021).
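The local PCA procedure described above can be illustrated with a minimal sketch. This is not the lostruct implementation: it assumes non-overlapping windows with a fixed number of SNPs, a genotype matrix coded 0/1/2 with SNPs in rows and samples in columns, and a Frobenius distance between normalised rank-k summaries of the per-window covariance matrices. All function names are hypothetical.

```python
import numpy as np


def window_low_rank(genotypes, k=2):
    """Normalised rank-k summary of the sample covariance in one window.

    genotypes: array of shape (n_snps, n_samples), coded 0/1/2.
    """
    # Centre each SNP, then build the sample-by-sample covariance matrix.
    centered = genotypes - genotypes.mean(axis=1, keepdims=True)
    cov = centered.T @ centered / max(genotypes.shape[0] - 1, 1)
    vals, vecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]
    approx = (vecs[:, top] * vals[top]) @ vecs[:, top].T
    norm = np.linalg.norm(approx)
    return approx / norm if norm > 0 else approx


def classical_mds(dist, n_coords=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:n_coords]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def local_pca(genotypes, window_size, k=2, n_coords=2):
    """Return MDS coordinates, one row per window (assumed window scheme)."""
    n_windows = genotypes.shape[0] // window_size
    summaries = [
        window_low_rank(genotypes[i * window_size:(i + 1) * window_size], k)
        for i in range(n_windows)
    ]
    # Pairwise Frobenius distance between window summaries.
    dist = np.array([[np.linalg.norm(a - b) for b in summaries]
                     for a in summaries])
    return classical_mds(dist, n_coords)
```

Windows whose MDS coordinates lie far from the bulk of the chromosome would be candidates for regions with deviated local relatedness patterns; the outlier criterion itself is not part of this sketch.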
The Eurasian blackcap (Sylvia atricapilla) is a songbird species that exhibits variation in phenotypes of seasonal migration, specifically orientation, distance, and propensity to migrate (Berthold 1988; Berthold 1991; Helbig 1991). Blackcap populations breeding in central and northern Europe migrate over medium to long distances, while some populations breeding on the Iberian Peninsula migrate over short distances. Some continental populations in northern Africa and the Iberian Peninsula as well as island populations (including the Macaronesian and Mediterranean islands) are resident, i.e. breeding and wintering at the same geographic locations (Berthold 1988; Cuadrado 1994; Pérez-Tris et al. 1999; Aymí et al. 2020). These blackcap populations have split recently (∼30,000 years ago) and have differentiated their migratory phenotypes (Delmore et al. 2020). Although the iconic blackcap has been used to demonstrate the presence of genetic basis of migration and to study evolutionary history of diverged migratory phenotypes, the genomic architecture of relatedness patterns is poorly understood.
In this study, we characterise the genomic architecture of local relatedness patterns in the blackcap. By applying local PCA to whole-genome resequencing data of 179 blackcaps from nine populations covering the full range of migratory phenotypes, we identify genomic islands of deviated relatedness patterns. Using population and comparative genomics, we characterise these genomic islands to understand different factors associated with deviated local relatedness patterns. We find that different types of selection play roles in deviating local relatedness patterns, including balancing and background selection at polymorphic inversions, selection against gene flow at a genomic region in which recombination rate is reduced in certain populations, and background selection at conserved and non-conserved recombination cold-spots, two of which are under selection specifically in certain migratory phenotypes. These results highlight how different types of selection and recombination suppression deviate local ancestries along genomes.
Results
To address our questions about the genomic architecture of local ancestries, we generated a high-quality, chromosome-level reference genome of the blackcap using the Vertebrate Genomes Project pipeline v1.5 (Rhie et al. 2021). We collected blood from a female of the non-migrant population in Tarifa, Spain; a female was chosen in order to assemble both the Z and W sex chromosomes. We generated contigs from PacBio long reads, sorted haplotypes, and scaffolded sequentially with 10X Genomics linked reads, Bionano Genomics optical mapping, and Arima Genomics Hi-C linked reads. Base call errors were polished with both PacBio long reads and Arrow short reads to achieve above Q40 accuracy (no more than 1 error every 10,000 bp). Manual curation identified 33 autosomes and Z and W chromosomes (plus 1 unlocalised W). Autosomes were named in decreasing order of size, and all had counterparts in the commonly used VGP reference zebra finch assembly. The final 1.1 Gb assembly had 99.14% assigned to chromosomes, with a contig N50 of 7.4 Mb and a scaffold N50 of 73 Mb, indicating a high-quality assembly that fulfills the VGP standard metrics. The primary and alternate haplotype assemblies can be found under NCBI BioProject PRJNA558064, accession numbers GCA_009819655.1 and GCA_009819715.1.
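The two assembly metrics quoted above follow standard definitions, which the following minimal sketch spells out. The function names are hypothetical, and the example lengths in the usage note are made up for illustration, not taken from the blackcap assembly.

```python
def phred_to_error_rate(q):
    """Per-base error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For instance, `phred_to_error_rate(40)` corresponds to one expected error per 10,000 bases, and `n50([2, 3, 4, 5, 6])` evaluates to 5 because the two longest sequences (6 + 5) already cover more than half of the total length of 20.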
Local PCA identifies genomic regions with deviated relatedness patterns
To identify genomic regions with deviated relatedness patterns, we performed local PCA (Li and Ralph 2019). We found 23 genomic islands of deviated relatedness patterns in the blackcap genome (Fig. 1A, B, Table 1). All genomic islands were located on different chromosomes. In the MDS space, windows within a genomic island deviated in the same direction compared to the rest of the same chromosome (Fig. 1C, D, Supplementary Fig. 2). This suggests that each genomic island has a distinct relatedness pattern that differs from the whole-genome population structure, rather than reflecting greater stochasticity of local genetic ancestries.
To classify genomic islands of deviated relatedness patterns, we performed PCA for each genomic island, this time using all SNPs within a given genomic island (Supplementary Fig. 3). We grouped genomic islands of deviated relatedness patterns into three classes: class-1 defined as genomic islands where samples were clustered into three groups along either PC1 or PC2 axes (Fig. 1E); class-2 defined as genomic islands where particular population(s) diverged from the other populations (Fig. 1F); and class-3 for all other genomic islands without characteristic patterns (Fig. 1G).
Polymorphic inversions with different types of selection in class-1 genomic islands
Five class-1 genomic islands were located on chromosomes 6, 12, 14, 28, and 30 (Supplementary Fig. 3G, M, O, V, W, Table 1). On the PCA, PC1 separated samples into three groups (Fig. 2A, C), except for the one on chromosome 14 in which clustering occurred along PC2 (Fig. 2B). This parallels the pattern of eigenvalues: the ratio of the eigenvalue of PC1 to that of PC2 was high in class-1 genomic islands except for chromosome 14 (Supplementary Fig. 4).
The observed pattern of the PCA with three clusters of samples is indicative of a polymorphic inversion, with the groups at the two edges being homozygous for one of the two haplotypes (the normal and inverted arrangements) and the middle being heterozygous (Ma and Amos 2012; Ruiz-Arenas et al. 2019; Mérot 2020). We inspected soft-clipped read alignments associated with PCA-based genotypes and found one putative breakpoint for chromosomes 12 and 30 (Supplementary Fig. 7A, B). To investigate whether population genetic measures fit expectations for the scenario of polymorphic inversions in class-1 genomic islands, we named major and minor haplotypes A and B (Fig. 2A-C) and calculated heterozygosity in the AA, AB and BB samples. If these three groups of samples on the PCA represent three genotypes of a polymorphic inversion, heterozygosity of the AB samples is expected to be higher than the AA and BB samples (Knief et al. 2017; Huang et al. 2018). Indeed, the AB samples had higher heterozygosity than the AA and BB samples within class-1 genomic islands, and BB had the lowest heterozygosity (Fig. 2D-F, Supplementary Fig. 6, Supplementary Table 2). To characterise ancestral (normal) and derived (inverted) haplotypes, we performed an additional PCA using blackcap samples with two of their closest sister species resequenced on the blackcap reference assembly (five samples of garden warblers Sylvia borin and three samples of African hill babblers Sylvia abyssinica). In PCA for all class-1 genomic islands, the sister species were placed either between the AA and AB (i.e. closer to the AA than BB on chromosomes 12, 28, and 30) or clustered with the blackcap AA samples (on chromosomes 6 and 14) along the PC axis separating three blackcap genotypes (Supplementary Fig. 5), indicating the haplotype B is the derived allele with inverted arrangement.
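The expectation tested above (three PCA clusters, with the middle cluster showing the highest heterozygosity) can be sketched as follows. This is an illustrative simplification, not the procedure used in the study: it assumes a simple one-dimensional k-means on the PC scores and defines heterozygosity as the fraction of heterozygous calls in a 0/1/2 genotype matrix (SNPs in rows, samples in columns). Function names are hypothetical, and the cluster labels only reflect order along the PC axis, not which homozygote is AA or BB.

```python
import numpy as np


def cluster_three_groups(pc_scores, n_iter=100):
    """1D k-means with three centres; labels 0, 1, 2 are ordered along the PC."""
    scores = np.asarray(pc_scores, dtype=float)
    centers = np.quantile(scores, [0.0, 0.5, 1.0])  # min, median, max
    labels = np.zeros(scores.size, dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(scores[:, None] - centers[None, :]), axis=1)
        updated = np.array([
            scores[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(3)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels


def heterozygosity(genotypes):
    """Fraction of heterozygous (coded 1) sites per sample."""
    return (np.asarray(genotypes) == 1).mean(axis=0)


def mean_heterozygosity_by_group(pc_scores, genotypes):
    """Mean heterozygosity of the samples in each PC cluster."""
    labels = cluster_three_groups(pc_scores)
    het = heterozygosity(genotypes)
    return {g: float(het[labels == g].mean())
            for g in range(3) if np.any(labels == g)}
```

Under the polymorphic inversion scenario, the value for the middle cluster (label 1) would be expected to exceed those of the two outer clusters.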
To investigate genetic variation in class-1 genomic islands, we calculated nucleotide diversity (π) for each genotype, as well as absolute divergence (dXY) and relative differentiation (FST) between homozygous AA and BB samples for comparisons between the two haplotypes (class-1 genomic island on chromosome 6 was not analysed as only one sample was BB genotype). On chromosomes 12 and 30, both FST and dXY between AA and BB samples were elevated within class-1 genomic islands (Fig. 3A-F, G, I, K, Supplementary Table 4), suggesting divergence between A and B haplotypes. In these two regions, π was low in the BB samples (Fig. 3D, F, H, L, Supplementary Table 3). Also in class-1 genomic island on chromosome 14, FST but not dXY between AA and BB was elevated (Fig. 3B, E, I, Supplementary Table 4). π was decreased for both AA and BB samples in this region (Fig. 3E, J, Supplementary Table 3), suggesting loss of genetic variation is responsible for increased FST in this region. Lastly, the genomic island on chromosome 28 did not show elevated FST between AA and BB (Supplementary Fig. 8, Supplementary Table 4).
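The relationship between the three statistics used above can be made explicit with a minimal sketch. It assumes phased biallelic haplotype matrices coded 0/1 (sites in rows, haplotypes in columns) and uses a Hudson-style definition of FST (one minus the ratio of mean within-group diversity to between-group divergence); the estimator actually used in the study is not stated in this section, so this choice is an assumption. Function names are hypothetical.

```python
import numpy as np


def nucleotide_diversity(haps):
    """Mean pairwise differences per site (pi) within one group."""
    haps = np.asarray(haps)
    n = haps.shape[1]  # number of haplotypes, must be >= 2
    derived = haps.sum(axis=1)
    per_site = 2.0 * derived * (n - derived) / (n * (n - 1))
    return float(per_site.mean())


def absolute_divergence(haps_x, haps_y):
    """Mean pairwise differences per site between two groups (dXY)."""
    p_x = np.asarray(haps_x).mean(axis=1)
    p_y = np.asarray(haps_y).mean(axis=1)
    return float((p_x * (1 - p_y) + p_y * (1 - p_x)).mean())


def hudson_fst(haps_x, haps_y):
    """FST as 1 - (mean within-group pi) / dXY (assumed estimator)."""
    within = 0.5 * (nucleotide_diversity(haps_x) + nucleotide_diversity(haps_y))
    between = absolute_divergence(haps_x, haps_y)
    return float("nan") if between == 0 else 1.0 - within / between
```

With this definition, FST rises either when dXY increases or when within-group π decreases, which is why elevated FST without elevated dXY (as on chromosome 14) points to loss of genetic variation rather than to deeper divergence between haplotypes.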
Coalescent times are known to distribute differently at genomic regions under selection (Charlesworth 2009; Guerrero et al. 2012; Fijarczyk and Babik 2015; Ellegren and Galtier 2016; Speidel et al. 2019; Hejase et al. 2020). To investigate the effects of different types of selection on coalescent time within an inversion, we performed forward-in-time simulations of polymorphic inversions using SLiM (Haller and Messer 2019). Specifically, we simulated a chromosome with a polymorphic inversion under nine conditions with three different fitness scenarios (neutrality, frequency-dependent selection, and overdominance) and three proportions of mutations with different fitness effects (neutral, deleterious, and mixed), and inferred coalescent times along the chromosome using one of the three genotypes at the inversion locus (normal/normal (NN), normal/inverted (NI), and inverted/inverted (II)) with MSMC2-decode (Schiffels and Durbin 2014) over multiple time points of simulations (Materials and Methods). Old polymorphic inversions maintained at low frequencies by balancing selection (both frequency-dependent and overdominance) showed longer coalescent time for NI and shorter coalescent time for II within the inversion compared to the chromosomal background (Supplementary Figs. 10, 11). Young inversions under neutrality and balancing selection exhibited similar patterns of coalescent within the inversion for NN and NI compared to the chromosomal background (Supplementary Figs. 10, 11). High proportion of deleterious mutations (i.e. purifying selection) resulted in short coalescent time regardless of the inversion genotype (Supplementary Figs. 10, 11). These simulations give us qualitative expectations of coalescent time within an inversion under scenarios with different selection pressures and ages compared to chromosomal background.
To characterise evolutionary histories with different types of selection in our blackcap inversions, we inferred the chromosomal distribution of coalescent time between pairs of sequences of the same and different haplotypes by MSMC2-decode. We used AA samples for coalescent between two sequences of the A haplotype, and BB samples for that of the B haplotype. We used AB samples for cross-haplotype coalescent between the A and B haplotypes. Consistent with our simulations, coalescent times for BB within all class-1 genomic islands except the one on chromosome 6 were shorter than the chromosomal background (Fig. 3M-O, Supplementary Fig. 12), and shorter than for AA and AB (Fig. 3M-O, Supplementary Fig. 12, details in Supplementary Tables 5, 6). On chromosomes 28 and 30, coalescent times within class-1 genomic islands for AA and AB did not differ significantly from each other or from the chromosomal background (Fig. 3O, Supplementary Fig. 12, details in Supplementary Table 6). On chromosome 12, the cross-haplotype coalescent times within the class-1 genomic island for AB were longer than the chromosomal background (Fig. 3M, Supplementary Fig. 12, details in Supplementary Table 6). Coalescent times within the genomic islands on chromosomes 6 and 14 were shorter than the background for all three genotypes (Fig. 3N, Supplementary Fig. 12, details in Supplementary Table 6). These results suggest heterogeneity among the polymorphic inversions: the inversion on chromosome 12 has been under balancing selection for a long time; the inversions on chromosomes 6 and 14 have been under background selection; and the inversions on chromosomes 28 and 30 have recent origins.
At an inversion locus, recombination is suppressed in heterozygotes (NI) but not in homozygotes (NN and II) (Wellenreuther and Bernatchez 2018). To investigate whether the presence of polymorphic inversions alone determines local recombination landscape in homo- and heterozygotes at class-1 genomic islands, we intended to infer recombination rates using AA, AB, and BB samples separately based on linkage disequilibrium (LD) patterns around the genomic islands. Before addressing this in blackcaps empirically, we first assessed how Pyrho (Spence and Song 2019), an LD-based inference of recombination landscape, performs at an inversion using samples with a certain inversion genotype. We simulated an inversion using SLiM under six scenarios listed in Supplementary Table 7, and inferred recombination rates using samples with a certain inversion genotype (NN, NI, and II) with Pyrho. The simulations revealed that inferred recombination rates using homozygotes of minor haplotype (II for models 1-3 and NN for models 4-6) were decreased in the inversion interval, even without additional genotype-specific recombination suppression (Supplementary Fig. 14A, G). The inferred recombination rates using the major haplotype homozygotes (NN for models 1-3 and II for models 4-6) were decreased only when recombination within them was explicitly suppressed (models 3 and 6, Supplementary Fig. 14E, I). Consistently, LD calculated using major haplotype homozygotes (NN for models 1-3 and II for models 4-6) was elevated only when recombination within the major haplotype was suppressed (Supplementary Fig. 13A vs I, M vs Q). 
These simulations provide guides on how to interpret recombination rates at class-1 genomic islands inferred using a certain inversion genotype: while low recombination rates inferred using minor haplotype homozygotes (BB) could happen even without recombination suppression in BB, the same pattern inferred using major haplotype homozygotes (AA) would indicate additional recombination suppression besides the presence of a polymorphic inversion.
To characterise inversion genotype-specific recombination landscape around class-1 genomic islands, we applied Pyrho to our empirical data using each inversion genotype (AA, AB, and BB) separately. Recombination rate inferred from the AB samples was low within class-1 genomic islands on chromosomes 6, 12, 14, and 30, as well as to a lesser extent on chromosome 28 (Fig. 3P-R, Supplementary Fig. 16), supporting recombination suppression between arrangements. Recombination rate inferred using AA was at moderate levels within class-1 genomic islands on chromosomes 12 (Fig. 3P), and 28 (Supplementary Fig. 16), suggesting no recombination suppression in AA at these loci. Recombination rate inferred from BB samples was decreased in class-1 genomic islands on chromosomes 12 (Fig. 3P), 28 (Supplementary Fig. 16), and 30 (Fig. 3R), consistent with the simulations, suggesting the effects of low inversion frequency on LD patterns. However, recombination rate inferred from AA samples was low within the class-1 genomic islands of chromosomes 6 (Supplementary Fig. 16) and 14 (Fig. 3Q), suggesting additional recombination suppression besides suppression in inversion heterozygotes (Note that inference of recombination rate using BB samples in class-1 genomic islands on chromosomes 6 and 14 was not performed due to an insufficient number of samples). On chromosomes 6 and 14, the elevated LD in AA and all samples extended to the outside of boundaries of the class-1 genomic islands (Supplementary Fig. 15), indicating recombination suppression in a region containing the class-1 genomic islands on these two chromosomes. These empirical results, combined with our simulations, demonstrate heterogeneity of recombination suppression at class-1 genomic islands: while all class-1 genomic islands are under recombination suppression in heterozygotes (AB), class-1 genomic islands on chromosomes 6 and 14 are nested in additional recombination suppression.
Blackcap and zebra finch have recurrent polymorphic inversions at overlapping genomic regions
To investigate the phylogenetic relevance of the polymorphic inversions, we analysed synteny between the blackcap genome and that of a distantly related species, the zebra finch Taeniopygia guttata. The zebra finch is a passerine model species in which four polymorphic inversions have been identified and characterised (Knief et al. 2016). Unexpectedly, three of the five class-1 genomic islands in the blackcap, on chromosomes 6, 12, and 14, overlapped with the inversions on zebra finch chromosomes 5, 11, and 13 (Fig. 4A, B).
For each of the three polymorphic inversions syntenic between blackcap and zebra finch, there are two possible evolutionary scenarios. The first scenario is that recurrent inversion events occurred independently in two lineages at the overlapping genomic intervals. The second scenario is that an inversion event occurred in the common ancestor between blackcaps and zebra finches, and both ancestral and inverted arrangements were maintained in the two lineages. In other words, overlapping polymorphic inversions evolved repeatedly in the two lineages under the first scenario, while an old orthologous inversion has been maintained in the two lineages under the second scenario. To distinguish between these two scenarios for each of the three loci, we constructed maximum likelihood phylogenetic trees from the consensus sequences of the two haplotypes in the blackcap and zebra finch, along with other related species. We generated consensus sequences of the A and B haplotypes in the blackcap and zebra finch using reference genome assemblies and SNP data (blackcap with our own data set and zebra finch with a published data set from Singhal et al. (2015)). We focused on class-1 genomic islands on blackcap chromosomes 6 and 12 and excluded chromosome 14 because the zebra finch data set lacked SNPs within the genomic regions syntenic to class-1 genomic island of the blackcap chromosome 14. For maximum likelihood phylogenetic inferences, we included garden warbler and Bengalese finch Lonchura striata as sister groups of blackcap and zebra finch, and rifleman Acanthisitta chloris as the outgroup for the four species. On both phylogenies for the two class-1 genomic islands, the two haplotypes of the same species were clustered next to each other (Fig. 4D, Supplementary Fig. 18), consistent with the species tree constructed from the other part of the same chromosome (Fig. 4C, Supplementary Fig. 18). 
These results suggest that recurrent inversions at the overlapping genomic regions occurred independently in the two lineages.
Selection against gene flow with potential effects of population-specific sweeps at a class-2 genomic island
One genomic island of deviated relatedness pattern on chromosome 21 was classified as class-2 (Fig. 1F, Table 1). On the PCA performed in this genomic island, samples from two populations (Azores and Cape Verde) were diverged from all the other populations (Fig. 1F, Fig. 5F). Multiple possible evolutionary processes could lead to this pattern: introgression from a distant lineage to these two populations, population-specific selective sweeps, or differentiation by selection against gene flow. We ran VolcanoFinder and found no evidence for introgression in the class-2 genomic island (Supplementary Fig. 19).
Although both population-specific selective sweeps and selection against gene flow are often associated with elevated FST between two populations, they leave different patterns in other summary statistics such as π and dXY (Hejase et al. 2020). While a population-specific sweep is expected to lower π for the population with the sweep, making Δπ (difference of π between the two populations) greater than in the chromosomal background, selection against gene flow is not expected to lower π, leaving Δπ at the same level as the chromosomal background. dXY, on the other hand, should be elevated with selection against gene flow, but not with the population-specific sweep. In addition, because variation in recombination rate is positively correlated with admixture proportion (Martin et al. 2019), low recombination rate is expected for selection against gene flow. To determine which of these scenarios better explains the deviated relatedness patterns at the class-2 genomic island, we calculated FST, dXY and π in a 10-kb sliding window between two groups (group 1: Azores and Cape Verde; and group 2: medium-long distance migrant (represented by Belgium) and continental resident (represented by Cazalla de la Sierra, Spain)), and within each group (Azores vs Cape Verde and medium-long distance migrant vs continental resident). In addition, we inferred the recombination landscape along chromosome 21 for the medium-long distance migrant, continental resident, Azores, and Cape Verde populations separately, using Pyrho. The scenario with selection against gene flow was supported by elevated FST and dXY in the class-2 genomic island in all four pairwise analyses between the two groups (Azores vs medium-long distance migrants (Fig. 5B-D), other pairs in Supplementary Fig. 20). Lower recombination rate within the class-2 genomic island in Azores and Cape Verde compared to medium-long distance migrant and continental resident populations (Fig. 5P, Q) was also consistent with the scenario with selection against gene flow in the Azores and Cape Verde populations. However, Δπ between the two groups was significantly greater than the chromosomal background (Fig. 5C, E, Supplementary Fig. 20C, E, Supplementary Table 9), supporting the population-specific sweep scenario (Note, however, that π in the class-2 genomic island was not significantly lower than chromosomal background in all populations; Supplementary Table 9). These results indicate that selection against gene flow by reduced recombination in Azores and Cape Verde populations is responsible for the deviated local relatedness pattern in the class-2 genomic island of chromosome 21, potentially with additional effects of weak population-specific sweeps.
Class-3 genomic islands experienced linked selection in conserved and non-conserved recombination cold-spots
Seventeen genomic islands of deviated relatedness patterns were classified as class-3 (Supplementary Fig. 3A-F, H-L, N, P-T, Table 1). In these genomic islands, patterns of PCA results were less clear than for class-1 and class-2 genomic islands (Supplementary Fig. 3). Consistently, the ratio between the eigenvalues of PC1 and PC2 for class-3 genomic islands was lower than that of class-1 genomic islands (Supplementary Fig. 4). The exceptionally high read depth in parts of class-3 genomic islands (Supplementary Fig. 21) prompted us to hypothesise their association with repetitive elements. Our resequencing strategy relied on Illumina short reads, which are known to be unsuited to genotyping repeats (Weissensteiner and Suh 2019). Therefore, we first focused on analysing the VGP reference genome assembly.
We investigated whether certain types of repeats are co-localised with class-3 genomic islands along the assembly, instead of characterising the variation in the repeats among samples. We found 18,671 tandem repeats (TRs) with repeat unit sizes between 10 and 500 bp. By counting the number of repeats by stepwise ranges of repeat unit size in a 10-kb sliding window, we found TRs with large (>150 bp) unit size co-localised with 11 of the 17 class-3 genomic islands or the adjacent regions, especially on long chromosomes (chromosomes 1, 2, 3, Z, 5, 7, 8, 9, 10, 13, and 16 (Fig. 6A, B, Supplementary Fig. 22)), as well as two class-1 genomic islands (chromosomes 6 and 14 (Supplementary Fig. 22)). On the other six chromosomes, most of which are short (“microchromosomes” 11, 15, 17, 18, 20, 27), such co-localisation between TRs and class-3 genomic islands was not detected (Fig. 6C, F, Supplementary Fig. 22). To determine whether there are many different TRs (with unique consensus monomer sequences), each repeated only a few times, or a few unique TRs repeated many times, we listed the six TRs with the longest unit sizes in a chromosome and mapped the chromosomal positions and repeat counts of these six TRs (Fig. 6D-F, Supplementary Fig. 23). In most cases, a few TRs with long unit size were repeated tens to hundreds of times within or next to class-3 genomic islands (Fig. 6D, E, Supplementary Fig. 23). These results suggest that the 11 class-3 genomic islands of large chromosomes are associated with relatively long TRs.
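The windowed counting of TRs by unit size can be illustrated with a minimal sketch. The input format (a list of start position and unit size pairs), the bin edges, and the use of non-overlapping rather than sliding windows are all assumptions made for illustration; only the 10-500 bp range, the 10-kb window size, and the >150 bp threshold come from the text. The function name is hypothetical.

```python
from collections import Counter


def count_repeats_by_window(repeats, window_size=10_000,
                            unit_bin_edges=(10, 50, 150, 500)):
    """Count tandem repeats per (window index, unit-size bin).

    repeats: iterable of (start_position, unit_size) pairs.
    Bins are [10, 50], (50, 150], (150, 500] with the default edges;
    repeats outside the overall range are ignored.
    """
    counts = Counter()
    for start, unit_size in repeats:
        if not unit_bin_edges[0] <= unit_size <= unit_bin_edges[-1]:
            continue
        # Number of inner edges strictly exceeded gives the bin index.
        bin_index = sum(unit_size > edge for edge in unit_bin_edges[1:-1])
        counts[(start // window_size, bin_index)] += 1
    return counts
```

Windows with high counts in the last bin (unit size above 150 bp) would correspond to the long-unit TR clusters discussed above.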
Centromeres and peri-centromeric regions are good candidates for a genomic feature underlying deviated local relatedness patterns at the 11 class-3 genomic islands with TRs (Melters et al. 2013; Hartley and O’Neill 2019; Weissensteiner and Suh 2019). In addition to the presence of a few TRs repeated many times in the 11 class-3 genomic islands, each of them is the only genomic island of deviated relatedness patterns on its chromosome, consistent with the possibility that centromeres may be involved in class-3 genomic islands. To further test this possibility, we inferred the recombination landscape along the blackcap genome, because recombination is known to be suppressed at centromeres. At most class-3 genomic islands with long TRs, we found that recombination was suppressed (Fig. 6G, H, Supplementary Fig. 25). However, recombination was also suppressed in class-3 genomic islands where we did not find TRs with long unit sizes (e.g. chromosome 18 (Fig. 6I)), indicating that suppressed recombination (including that of (peri)centromeric regions) may be the factor associated with class-3 genomic islands instead of the presence of centromeres per se.
To investigate whether the PCA results for class-3 genomic islands reflect true deviation of local relatedness patterns or are distorted by technical (i.e. bioinformatic) effects owing to the presence of TRs in the reference, we compared the PCA results with and without masking TRs. Masking TRs did not change PCA results (Supplementary Figs. 3, 24), indicating that the deviation of local relatedness patterns in class-3 genomic islands is an indirect effect of recombination suppression (such as linked selection) rather than a technical effect of the presence of TRs.
Having ruled out the possibility that local relatedness patterns are directly affected by the presence of TRs in class-3 genomic islands on the reference, we then calculated nucleotide diversity (π) for each population to decipher which type of linked selection may be able to explain the observed patterns of local relatedness. The same degree of decrease in π in all populations irrespective of the distinct demography among blackcap populations (Delmore et al. 2020) is expected for a scenario with long-term population-non-specific background selection, whereas decrease in π in only a subset of populations is expected for (population-specific) sweeps (Burri 2017). π was decreased similarly in class-3 genomic islands for all populations (Fig. 6J-L, Supplementary Fig. 26), suggesting that background selection deviates local relatedness patterns in class-3 genomic islands. As was the case for recombination suppression, π was decreased not only at putative centromeric regions with long TRs (Fig. 6J, K) but also in class-3 genomic islands without long TRs (Fig. 6L). Together, these results suggest that class-3 genomic islands of deviated local relatedness patterns are associated with population-non-specific long-term background selection where recombination is suppressed. Tandem repeats with long repeat unit sizes were found in many of them probably because centromeres are major recombination cold-spots in the genome.
To investigate whether class-3 genomic islands of the blackcap represent evolutionarily conserved recombination cold-spots, we inferred the recombination landscape of the closest sister species garden warbler using Pyrho, and compared recombination rates between the two species in 50 kb sliding windows for 16 autosomes with class-3 genomic islands. Eight class-3 genomic islands of blackcap autosomes (chromosomes 1 (Fig. 7C), 2, 11, 15, 16, 17, 20) were in apparent recombination cold-spots in garden warblers (i.e. “conserved” recombination cold-spots), while recombination suppression in all or some windows within the other nine class-3 genomic islands (chromosomes 3, 4, 5, 7, 8, 9, 10, 13, 18) was not conserved between the two species (Fig. 7A, B, Supplementary Fig. 27). This result suggests class-3 genomic islands consist of evolutionarily heterogeneous recombination cold-spots (i.e. conserved and non-conserved cold-spots). The presence of conserved recombination cold-spots is in line with population-non-specific long-term background selection at class-3 genomic islands. Meanwhile, the presence of non-conserved recombination cold-spots indicates that their relatedness patterns may be less stable and subject to other types of selection such as population-specific linked selection.
In blackcaps, Delmore et al. (2020) previously identified genomic regions associated with variation in migratory phenotypes among populations. To investigate potential roles of population-specific selection associated with migratory phenotypes in non-conserved recombination cold-spots within class-3 genomic islands, we compared the positions of the genomic islands found in this study with the results of Delmore et al. (2020). Two genomic islands of deviated local relatedness patterns (class-3 genomic islands on chromosomes 3 and 10 in non-conserved recombination cold-spots (Fig. 7A, B)) overlapped with two loci identified in Delmore et al. (2020) (Super-Scaffolds 99 and 22) as genomic regions under selection in continental residents and medium distance southeast migrants, respectively. Because these two class-3 genomic islands have local relatedness patterns with triangular spread of samples on PCA (Supplementary Fig. 3C, K), we assumed that there are three non-recombining haplotypes (triallelic model, Supplementary Fig. 28A, Ruiz-Arenas et al. (2019)) for each genomic island. As expected from Delmore et al. (2020), the continental residents and medium distance southeast migrants were distributed in a biased manner in the PC1-PC2 space (Fig. 7D, E). In the genomic island on chromosome 3, one haplotype (haplotype “B” in Fig. 7D) was more frequent in the continental residents than in populations with other migratory phenotypes, suggesting selection for this haplotype in this population. In contrast, in the genomic island on chromosome 10, two haplotypes (“B” and “C” in Fig. 7E) were equally frequent in the medium distance southeast migrants, with few individuals carrying the A haplotype (Fig. 7E), which could be explained by selection against haplotype A in the medium distance southeast migrants.
These results are in line with the potential contributions of non-conserved recombination cold-spots to deviate local relatedness patterns through migratory phenotype-specific linked selection.
Discussion
Here, we characterised the heterogeneous genomic architecture of local relatedness in the blackcap. The identified genomic islands of deviated relatedness patterns are associated with polymorphic inversions, selection against gene flow, and different types of linked selection as well as recombination suppression.
Balancing and background selection at polymorphic inversion loci
One polymorphic inversion on chromosome 12 was maintained by balancing selection over a long evolutionary time span. There are multiple cases in which polymorphic inversions are associated with large (and often discrete) physiological, morphological, and behavioural polymorphism under balancing selection, such as the social chromosome in fire ants (Pracana et al. 2017; Huang et al. 2018), mating system, aggressiveness and plumage polymorphism in the male ruff (Küpper et al. 2015; Lamichhaney et al. 2015) and white-throated sparrow (Horton et al. 2014; Tuttle et al. 2016; Merritt et al. 2020), sperm motility in the zebra finch (Kim et al. 2017; Knief et al. 2017), and local adaptation in the sunflower (Huang et al. 2020; Todesco et al. 2020). Alternatively, introgressed inversions under balancing selection can also have large divergence and long cross-haplotype coalescent time, as well as suppressed recombination between the two haplotypes. For instance, a “supergene” associated with wing pattern mimicry in the butterfly Heliconius numata arose by an introgressed inversion from a distant lineage which had diverged for more than a million years (Jay et al. 2018). However, this scenario is unlikely to explain the polymorphic inversion that we identified in the blackcap on chromosome 12, because the phylogenetic distance between the two haplotypes is not comparable to that between the blackcap and its closest sister species.
Two other inversions nested within recombination cold-spots on chromosomes 6 and 14 in the blackcap are under background selection. These two inversions co-localise with putative centromeres, suggesting they may be pericentric inversions. The effects of purifying selection on realised genetic variation depend on recombination rate (Hudson and Kaplan 1995); thus, the reduced nucleotide diversity and coalescent time in these inversions can be well explained by recombination suppression at the centromeres. Despite the effect of background selection, the presence of polymorphic inversions in these regions is the main determinant of local relatedness patterns, in which individuals are clustered according to their inversion genotypes. The interaction between background selection and a polymorphic inversion in shaping local relatedness patterns over time (i.e. how a class-3 genomic island transitions to a class-1 genomic island through a novel inversion) should be studied further by simulations.
The three inversions in the blackcap discussed above are syntenic to polymorphic inversions that have been independently identified in the zebra finch (Knief et al. 2016). To our knowledge, this is the first example of convergence of polymorphic inversions at overlapping genomic regions. There are cases of convergent evolution of functionally analogous chromosomes (or chromosomal segments) by structural variations such as sex chromosomes in animals and fungi (Fraser et al. 2004) and social chromosomes in fire ants (Purcell et al. 2014), but they involve structural variation on different chromosomes. These loci are associated with variation in several morphological traits in the zebra finch (Knief et al. 2016). Because the exact functions and fitness effects of these inversions in the blackcap are unclear, more detailed genetic and phenotypic analyses are needed, as well as more detailed characterisation of the inversion breakpoints. How general the convergence of polymorphic inversions is and its effects on local ancestries should be further studied with population genomics in other species covering a more complete phylogenetic context.
The inversions on chromosomes 28 and 30 are younger than the three loci discussed above. On chromosome 28, the absence of LD and genetic differentiation between the two arrangements around the locus seems paradoxical given the observed local relatedness pattern in which samples are clustered into three groups. One possible explanation could be subtle levels of recombination and/or gene conversion between the arrangements on chromosome 28, which might cause the observed differences in differentiation and LD patterns between the two loci on chromosomes 28 and 30. Population genetic analyses using phased data will provide more detailed insights on haplotype structure in these loci.
Selection against gene flow with potential effects of a sweep
One genomic island on chromosome 21 is under selection against gene flow by reduced recombination rates in the Azores and Cape Verde populations. Genetic variation in this genomic region is slightly lower compared to the chromosomal background, leaving the possibility of additional effects of a population-specific sweep. There are examples in which genomic regions with selection against gene flow also experience population-specific sweeps (e.g. Hejase et al. 2020; note, though, that in this example genomic patterns supporting these two scenarios were present in different populations). It is important to realise that a similar pattern can also arise without population-specific sweeps in populations under isolation and/or selection against gene flow: stochasticity in fixation time in two populations can cause transient loss of genetic variation in only one population, which can be mistaken for a signal of population-specific sweeps by local adaptation (Booker et al. 2021).
The small effect of a population-specific sweep, if any, may indicate that the sweep is partial and/or soft. Recent empirical studies suggest positive selection on standing genetic variation is common in early stages of the speciation continuum, leaving signatures of partial and soft sweeps along the genome (Delmore et al. 2018; Hejase et al. 2020). However, the possibility of population-specific sweeps should be considered with caution in the blackcap, because effects of variation in recombination landscape among populations on local genealogy have not been well studied yet. In addition, the Macaronesian island populations including Azores and Cape Verde experienced a reduction in effective population size after the population split from continental populations (Delmore et al. 2020), thus the inference of the effects of differentiation and sweeps should take population-specific demography into account. Identifying under which conditions relatedness patterns of class-2 genomic islands evolve requires not only empirical analyses but also simulations of a particular demography incorporating a recombination landscape that varies among populations, hard and soft selective sweeps, and selection against gene flow.
Linked selection at conserved and non-conserved recombination cold-spots
Recombination suppression and decreased nucleotide diversity overlapping with 17 class-3 genomic islands reflect the effect of population-non-specific long-term background selection on deviated local relatedness patterns, consistent with earlier work (Li and Ralph 2019). Eleven of these regions contain putative (peri)centromeres and the other six had exceptionally high read depth without overlap with TRs, suggesting association with either other types of repeats or the absence of a repeat variant in the reference assembly.
In population genomics, barrier loci against gene flow, and genes under local adaptation and parallel selection have been sought using summary statistics such as FST and cluster separation score (CSS) that rely on realised relatedness patterns (e.g. Irwin et al. 2018; Jones et al. 2012; Malinsky et al. 2015). However, it has also been pointed out that linked selection at recombination cold-spots can increase signals of these summary statistics by reduced absolute genetic diversity (Noor and Bennett 2009; Renaut et al. 2013; Cruickshank and Hahn 2014). In birds, the recombination landscape along the genome is relatively conserved across species (Singhal et al. 2015; Vijay et al. 2017). This facilitates long-term effects of stable recombination cold-spots on deviated relatedness patterns at the same genomic regions in multiple populations and species (Burri 2017; Vijay et al. 2017). We interpreted the reduced nucleotide diversity to similar levels in all blackcap populations at recombination cold-spots as support for background selection (hitchhiking effects by purifying/negative selection on linked neutral variations) instead of selective sweeps (hitchhiking effects by positive selection on linked neutral variations), because it is unlikely that recurrent sweeps in populations with different effective population sizes result in the same level of reduced nucleotide diversity. The fact that many of the class-3 genomic islands contain putative (peri)centromeric regions is consistent with reduced genetic diversity at (peri)centromeric regions observed in other species (Branca et al. 2011; Roesti et al. 2013; Delmore et al. 2015; Vijay et al. 2017), indicative of a general process with repeatable consequences. Our finding that about half of the class-3 genomic islands are recombination cold-spots conserved between blackcaps and garden warblers also supports the effect of stable cold-spots on deviated relatedness patterns through long-term background selection.
The lack of conservation of recombination suppression in the other nine class-3 genomic islands also indicates evolutionary heterogeneity of factors that deviate local ancestries.
Despite the relevance, we attempted neither to identify repetitive elements based on the resequence data nor to perform population genetic analyses on variation in the repetitive elements, because short-read sequencing that we used for resequencing is not suited for repetitive sequence analyses. Population genomics with repeat variants in addition to SNP-based analyses has become an option with the development of long-reads sequencing (Weissensteiner and Suh 2019; Weissensteiner et al. 2020), and should facilitate further investigations on the repetitive elements landscape in class-3 genomic islands.
Some class-3 genomic islands with deviated relatedness patterns (on chromosomes 3, 10, 17, and 18) have a triangular spread of samples on the PC1-PC2 space. This pattern could be explained by assuming presence of several non-recombining haplotypes at these regions (Ruiz-Arenas et al. 2019), which is in line with background selection at recombination cold-spots. In diploid species, genetic relatedness among samples at a site with three non-recombining haplotypes (A, B, C) can be represented on a 2D space as a triangular spread of samples (Supplementary Fig. 28). Specifically, homozygous samples (AA, BB, and CC) are clustered at the three nodes and heterozygotes (AB, BC, and AC) at the midpoints of the three edges of the triangle. If mutations and gene conversions (and possibly rare recombinations) are introduced, the spread for each genotype should widen, resulting in a triangular spread of samples as a whole on a 2D space, which is similar to the pattern we find in the four genomic islands.
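The geometry of the triallelic model can be checked with a minimal sketch. It assumes additive genotype coding (0, 1 or 2 copies of the alternate allele per site) and uses three randomly generated, hypothetical haplotypes; the number of sites and the random seed are arbitrary. Under this coding each heterozygote's genotype vector is exactly the midpoint of the two corresponding homozygote vectors, and because PCA is a linear projection of centred data, midpoints are preserved on the PC1-PC2 plane.

```python
import random
from itertools import combinations_with_replacement


def genotype_vector(hap1, hap2):
    """Additive coding: per-site count of the alternate allele (0, 1 or 2)."""
    return [a + b for a, b in zip(hap1, hap2)]


def midpoint(u, v):
    """Coordinate-wise midpoint of two genotype vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]


rng = random.Random(1)
n_sites = 200  # hypothetical number of SNPs in the region

# Three hypothetical non-recombining haplotypes, coded 0/1 per site.
haplotypes = {name: [rng.randint(0, 1) for _ in range(n_sites)] for name in "ABC"}

# The six diploid genotypes: AA, AB, AC, BB, BC, CC.
genotypes = {
    h1 + h2: genotype_vector(haplotypes[h1], haplotypes[h2])
    for h1, h2 in combinations_with_replacement("ABC", 2)
}

# Each heterozygote sits exactly halfway between its two homozygotes.
for het in ("AB", "AC", "BC"):
    hom1, hom2 = het[0] * 2, het[1] * 2
    assert genotypes[het] == midpoint(genotypes[hom1], genotypes[hom2])
```

Adding mutations, gene conversion or rare recombination to the haplotypes would blur each of the six points into a cloud, which is the widening described above.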
We applied this perspective to interpret local relatedness patterns in two class-3 genomic islands on chromosomes 3 and 10 in non-conserved recombination cold-spots that overlap with results of Delmore et al. (2020), revealing evidence for migration phenotype-specific selection for and against one haplotype in the class-3 genomic islands on chromosomes 3 and 10, respectively. These genomic islands show reduced nucleotide diversity in all populations, suggesting that migration phenotype-specific selection acts on top of general long-term background selection.
Conclusion
Overall, we revealed a heterogeneous genomic architecture of local ancestries in the Eurasian blackcap, an iconic migratory songbird. Deviated local ancestries are associated with recombination suppression, yet the mode of recombination suppression varied. In many genomic regions with deviated local ancestries, recombination is stably suppressed irrespective of genotypes (inversions at class-1 genomic islands on chromosomes 6 and 14) and across species (conserved cold-spots in class-3 genomic islands), while some others show recombination suppression/reduction only in certain genotypes (inversion heterozygotes in class-1 genomic islands on chromosomes 12, 28, 30), populations (class-2 genomic island), and species (non-conserved cold-spots in class-3 genomic islands). The evolutionary time scales and types of selection deviating local ancestries are also diverse, ranging from long-term background selection irrespective of populations or genotypes, positive and negative selection in populations with certain migratory phenotypes, and selection against gene flow, to balancing selection between inversion genotypes. In addition, not all recombination cold-spots coincide with deviated local ancestries, leaving it unclear under which conditions deviated local ancestries evolve. Simulations with diverse scenarios as well as detailed functional analyses of genes within these genomic islands will be the key to disentangling the evolution of such a genomic architecture. Our findings in the blackcap have provided insights into the genomic architecture of deviated local ancestries in a wild bird species that is a powerful system to study evolution of migratory behaviour. Such systems will be invaluable in determining general patterns of heterogeneous genomic evolution as well as their roles in evolution of complex behavioural traits that are difficult to address in classical model species.
Materials and Methods
de novo genome assembly
The de novo assembly of a chromosome-level blackcap reference genome was done with the Vertebrate Genomes Project pipeline v1.5 (Rhie et al. 2021). In brief, blood of a female from the non-migrant Tarifa, Spain population, was collected in 100% ethanol on ice and stored at −80°C (NCBI BioSample accession SAMN12369542). We chose a female in order to assemble both Z and W sex chromosomes.
The ethanol supernatant was removed and the blood pellet was resuspended in Bionano Cell Buffer in a 1:2 dilution. Ultra-long high molecular weight (HMW) DNA was isolated using Bionano agarose plug method, Bionano Frozen Whole Nucleated Blood Stored in Ethanol – DNA Isolation Guidelines (document number 30033) using the Bionano Prep Blood and Cell Culture DNA Isolation Kit. Four DNA extractions were performed yielding a total of 13.5 µg HMW DNA.
About 6 µg of DNA was sheared using a 26G blunt end needle (PacBio protocol PN 101-181-000 Version 05) to ∼40 kb fragments. A large-insert PacBio library was prepared using the Pacific Biosciences Express Template Prep Kit v1.0 following the manufacturer protocol. The library was then size selected (>15 kb) using the Sage Science BluePippin Size-Selection System. The library was sequenced in Continuous Long-Read (CLR) mode on 8 PacBio 1M v3 SMRT Cells on a PacBio Sequel I instrument with the sequencing kit 3.0 and a 10-hour movie with 2 hours of pre-extension time, yielding 77.51 Gb of data (∼66.29X coverage) with N50 read length averaging around 22,927 bp.
We used the unfragmented HMW DNA to generate a linked-reads library on the 10X Genomics Chromium (Genome Library Kit & Gel Bead Kit v2, Genome Chip Kit v2, i7 Multiplex Kit PN-120262). We sequenced this 10X library on an Illumina Novaseq S4 150bp PE lane to ∼60X coverage.
Unfragmented HMW DNA was also used for Bionano Genomics optical mapping. Briefly, DNA was labeled using the Bionano Prep Direct Label and Stain (DLS) Protocol (30206E) and run on one Saphyr instrument chip flowcell. 136.31 Gb of data was generated (N50 = 301.9 kb with a label density of 16.91 labels/100 kb). Optical maps were assembled using Bionano Access (N50 = 27.48 Mb; total length = 1.41 Gb). Hi-C libraries were generated by Arima Genomics (https://arimagenomics.com/) and Dovetail Genomics, and sequenced on HiSeq X at ∼60X coverage following the manufacturer’s protocols. Arima Hi-C proximally ligated DNA was produced using the Arima-HiC kit v1, sheared and size selected (200 – 600 bp) with SPRI beads, and fragments containing proximity-ligated DNA were enriched using streptavidin beads. A final Illumina library was prepared using the KAPA Hyper Prep kit following the manufacturer guidelines. FALCON v1.9.0 and FALCON unzip v1.0.6 were used to generate haplotype phased contigs, and purge_haplotigs v1.0.3 was used to further sort out haplotypes (Guan et al. 2020). The phased contigs were first scaffolded with 10X Genomics linked reads using scaff10X 4.1.0 software, followed with Bionano Genomics optical maps using Bionano Solve single enzyme DLS 3.2.1, and Arima Genomics in-vitro cross-linked Hi-C maps using Salsa Hi-C 2.2 software (Ghurye et al. 2019). Base call errors were fixed using Arrow software with PacBio long reads and Freebayes software with Illumina short reads. Manual curation was conducted using gEVAL software by the Sanger Institute Curation team (Howe et al. 2021). Curation identified 33 autosomes and Z and W chromosomes (plus 1 unlocalised). Autosomes were named in decreasing order of size. The total length of the primary haplotype assembly was 1,066,786,587 bp, with 99.14% assigned to chromosomes. There are a total of 601 contigs in 189 scaffolds, with a contig N50 of 7.4 Mb, and scaffold N50 of 73 Mb.
The primary and alternate haplotype assemblies can be found under NCBI BioProject PRJNA558064, accession numbers GCA_009819655.1 and GCA_009819715.1.
Whole-genome resequencing
We resequenced 179 blackcaps, five garden warblers, and three African hill babblers (Supplementary Table 1), of which 110 blackcaps, all garden warblers and African hill babblers were already published in Delmore et al. (2020) (details in Supplementary Table 1). Blood samples were collected from the brachial vein and stored in 100% ethanol. High molecular weight genomic DNA was extracted with a standard salt extraction protocol or through the Nanobind CBB Big DNA Kit – Beta following the manufacturer’s instructions. Libraries for short insert fragments between 300 and 500 bp were prepared and were then sequenced for short paired-end reads on either Illumina NextSeq 500, HiSeq 4000 or NovaSeq 5000 (Supplementary Table 1).
We performed quality control of the reads with FastQC version 0.11.8 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Reads from all samples were mapped against the reference genome following an adjusted pipeline of Genome Analysis Toolkit (GATK version 4.1.7.0, McKenna et al. (2010)) and Picard version 2.21.9 (http://broadinstitute.github.io/picard/). After resetting the base quality of adapter bases in the sequenced reads to 2 with Picard MarkIlluminaAdapters, paired-end reads were mapped to the reference using BWA mem (Li 2013). To ensure that both unmapped mates and secondary/supplementary reads were marked for duplicates, we provided the reads as query sorted when Picard MarkDuplicates was run with the default pixel distance of 100 for reads from Illumina NextSeq 500 or with a pixel distance of 2,500 for reads from HiSeq 4000 and NovaSeq 5000. Owing to low coverage, 10 samples (Supplementary Table 1) were sequenced multiple times. Alignment files for these samples (in BAM format) were merged with Picard MergeSamFiles. Per-sample quality control of BAM files was performed using QualiMap version 2.2.1 (Okonechnikov et al. 2016); Picard CollectMultipleMetrics, CollectRawWgsMetrics and CollectWgsMetrics; and MultiQC version 1.8 (Ewels et al. 2016). We called all positions per sample in gVCF format using GATK HaplotypeCaller. To save computing time and memory, the genome was split in 10 equal parts before running the next steps of the pipeline. We combined 187 gVCF files (for all resequenced samples) using GATK CombineGVCFs for each of the 10 parts. We genotyped SNPs and indels using GATK GenotypeGVCFs to create 10 VCF files. These 10 VCF files were then merged using Picard GatherVcfs into one VCF file covering the whole genome. From the VCF file, SNPs were selected (i.e. indels were excluded) using GATK SelectVariants, after which we filtered SNPs with the following criteria: QD < 2.5; FS > 45.0; SOR > 3.0; MQ < 40; MQRankSum < −12.5; ReadPosRankSum < −8.0.
We kept only blackcap samples in the VCF file with BCFtools (Danecek et al. 2021). We removed SNPs with per-site missingness greater than 10%, non-segregating sites, and non-biallelic sites with BCFtools and VCFtools (Danecek et al. 2011), yielding 102,753,802 SNPs in the 179 blackcaps.
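The hard-filter criteria listed above can be written as a simple predicate over the INFO annotations of a SNP. This is an illustrative sketch rather than the pipeline itself (the filtering was done with GATK, BCFtools and VCFtools): the function name and the dictionary layout are hypothetical, the mapping-quality threshold is written against GATK's MQ annotation, and skipping annotations that are absent from a record is an assumption.

```python
# A SNP is removed if ANY of these conditions holds (thresholds from the text).
HARD_FILTERS = {
    "QD": lambda v: v < 2.5,
    "FS": lambda v: v > 45.0,
    "SOR": lambda v: v > 3.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def failed_filters(info):
    """Return the names of annotations that trigger removal of a SNP.

    `info` maps annotation names to numeric values. Annotations missing
    from the record are skipped (an assumption of this sketch).
    """
    return [
        name
        for name, is_bad in HARD_FILTERS.items()
        if name in info and is_bad(info[name])
    ]


# Hypothetical record: passes every threshold, so nothing is returned.
good = {"QD": 20.1, "FS": 1.2, "SOR": 0.7, "MQ": 60.0,
        "MQRankSum": 0.0, "ReadPosRankSum": 0.3}
print(failed_filters(good))
```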
Local PCA
Genotype tables were generated from the filtered VCF file with BCFtools and a custom script. For each chromosome, the table was read in R (version 3.5.3). Local principal component analysis (local PCA) was performed in R using the lostruct package (Li and Ralph 2019), with a sliding window of 1,000 SNPs, npc=3, k=3. In each chromosome, windows with MDS1 or MDS2 values deviated from the mode of the distribution by greater than 0.3 were defined as outliers. This threshold was determined by visualising the distribution of MDS1 and MDS2 values in each chromosome. A genomic island of deviated relatedness patterns was defined as a genomic interval with at least 10 outlier windows, taking the two furthest positions as the boundaries.
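The outlier rule above can be sketched as follows, taking per-window MDS coordinates (as produced by lostruct) as input. Several details are assumptions rather than the authors' implementation: the mode is approximated by the centre of the most populated histogram bin (bin width 0.05 is arbitrary), all outlier windows on a chromosome are treated as one island, and the function and argument names are hypothetical.

```python
from collections import Counter


def histogram_mode(values, bin_width=0.05):
    """Centre of the most populated histogram bin (a stand-in for the mode)."""
    bins = Counter(round(v / bin_width) for v in values)
    return bins.most_common(1)[0][0] * bin_width


def outlier_windows(mds1, mds2, threshold=0.3):
    """Indices of windows whose MDS1 or MDS2 deviates from the mode by > threshold."""
    mode1, mode2 = histogram_mode(mds1), histogram_mode(mds2)
    return [
        i
        for i, (a, b) in enumerate(zip(mds1, mds2))
        if abs(a - mode1) > threshold or abs(b - mode2) > threshold
    ]


def genomic_island(window_bounds, outliers, min_windows=10):
    """Interval spanned by the outlier windows, or None if there are too few.

    `window_bounds` holds one (start, end) pair per window; the island
    runs from the smallest start to the largest end among the outliers.
    """
    if len(outliers) < min_windows:
        return None
    start = min(window_bounds[i][0] for i in outliers)
    end = max(window_bounds[i][1] for i in outliers)
    return (start, end)
```

In the study the threshold was chosen after visual inspection of the MDS distributions per chromosome, so any automated version of this rule would need the same check.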
Genetic relatedness pattern at each genomic island was analysed with PCA using PLINK (Purcell et al. 2007). All genomic islands were visually classified into the three classes based on the PCA. Genomic islands in which samples were grouped in three clusters along either PC1 or PC2 were classified as class-1. Genomic islands in which samples of certain populations were deviated from samples of all other populations on PC1-PC2 space were classified as class-2. Other genomic islands were classified as class-3. Additional PCA was performed for the same genomic regions including the outgroup samples (five garden warblers and three African hill babblers).
Population genomics
Population genomic analyses were performed for class-1 (comparing inversion genotypes), class-2 (comparing population pairs), and class-3 (calculating nucleotide diversity in each population) genomic islands. π, dXY, and FST (Hudson’s estimator) were calculated in a 10-kb sliding window with the PopGenome package in R (Pfeifer et al. 2014). We asked whether observed π in class-1 and class-2 genomic islands is smaller, dXY and FST are greater, and Δπ is different than the chromosomal background by performing permutation tests (left-sided, right-sided, and two-sided respectively) with 10,000 times of re-sampling. In the permutation test, we shuffled the positions of genomic windows of the PopGenome outputs, rather than shuffling SNP positions before calculating π, dXY, and FST for computational reasons (the latter would require re-calculation of the window-based summary statistics 10,000 times).
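The window-level permutation test can be illustrated with a short sketch. Shuffling window positions and reading off the windows that land in the island is equivalent to drawing the same number of windows at random without replacement, which is what the code does. Using the mean as the test statistic, the add-one correction of the p-value, and all names are assumptions of this sketch.

```python
import random


def permutation_p_value(values, island_idx, side="left", n_perm=10_000, seed=0):
    """Permutation p-value for a window-based statistic inside an island.

    values     : statistic (e.g. pi, dXY or FST) for every window on the chromosome
    island_idx : indices of the windows inside the genomic island
    side       : "left" (island smaller), "right" (island greater) or "two"
    """
    rng = random.Random(seed)
    k = len(island_idx)
    observed = sum(values[i] for i in island_idx) / k
    background = sum(values) / len(values)
    hits = 0
    for _ in range(n_perm):
        # Randomly placed island of the same size (window positions shuffled).
        perm_mean = sum(rng.sample(values, k)) / k
        if side == "left":
            hits += perm_mean <= observed
        elif side == "right":
            hits += perm_mean >= observed
        else:
            hits += abs(perm_mean - background) >= abs(observed - background)
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Because only precomputed window values are resampled, the test runs quickly, at the cost of ignoring correlation between neighbouring windows.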
For each inversion genotype, heterozygosity was defined as the number of segregating sites in samples with a certain inversion genotype divided by the number of segregating sites in all samples. We calculated heterozygosity with BCFtools and a custom script.
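This definition can be illustrated with a minimal sketch in which genotypes are coded as alternate-allele counts (0, 1, 2) and missing calls as `None`. The coding, the data layout and the function names are hypothetical; the custom script used in the study is not reproduced here.

```python
def is_segregating(genotypes):
    """True if both alleles occur among the called genotypes of a site."""
    called = [g for g in genotypes if g is not None]
    alt_copies = sum(called)
    return 0 < alt_copies < 2 * len(called)


def relative_heterozygosity(matrix, subset):
    """Segregating sites within `subset` divided by segregating sites overall.

    matrix : list of sites, each a list of per-sample genotypes
    subset : indices of the samples carrying a given inversion genotype
    """
    seg_all = sum(is_segregating(site) for site in matrix)
    seg_sub = sum(is_segregating([site[i] for i in subset]) for site in matrix)
    return seg_sub / seg_all


# Four hypothetical sites scored in four samples; samples 0 and 1 share a genotype.
sites = [
    [0, 0, 1, 2],
    [0, 1, 0, 0],
    [2, 2, 2, 0],
    [0, 0, 0, 0],
]
print(relative_heterozygosity(sites, [0, 1]))
```

In the example three sites segregate among all samples but only one segregates within the subset, so the ratio is one third.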
Coalescence time for each inversion genotype was estimated with MSMC2-decode. Up to four samples were selected for each genotype of each inversion based on callability using bamCaller.py (https://github.com/stschiff/msmc-tools), and in each sample coalescent time was estimated with MSMC2-decode (Schiffels and Durbin 2014; Malaspinas et al. 2016). A generalised linear mixed-effect model (GLMM) with a Poisson distribution treating individual samples as a random-effect variable and the discretised time index between 1 and 32 as the response variable was performed with the lme4 package in R (Bates et al. 2015). The model (T ∼ genotype × interval + (1|id)) was selected by checking residual deviance (for the goodness of fit) and the Akaike information criterion (AIC, for the prediction error). The distribution of the residuals was visually checked with a histogram, a QQ-plot, and residual plots using the ggResidpanel package in R (Goode and Rey 2019). Significance of the interaction term between the genotype and the interval (inside/outside of a class-1 genomic island of deviated relatedness patterns) was tested by analysis of variance on the GLMM using the car package in R (Fox and Weisberg 2019). Post hoc Z-tests with Bonferroni correction were performed using the multcomp and emmeans packages in R (Searle et al. 1980; Hothorn et al. 2008). For visualisation in Fig. 3M-O and Supplementary Fig. 12, the time index with the highest posterior probability of coalescent (chosen from 32 discretised time indices) was averaged across the four samples with the same genotype in each genomic window.
Linkage disequilibrium (LD) was calculated with VCFtools. Because we used unphased genotype data, we calculated squared correlation coefficient r2 between genotypes of a pair of loci with the --geno-r2 option.
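For unphased data, the genotype-based r2 is the squared Pearson correlation between the alternate-allele counts (0, 1, 2) at two loci. The following minimal sketch illustrates the quantity, not VCFtools itself; dropping samples with a missing call at either locus and returning NaN for monomorphic loci are assumptions.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between genotype counts at two loci.

    g1, g2 : per-sample alternate-allele counts (0, 1, 2), None if missing.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in pairs)
    var1 = sum((a - mean1) ** 2 for a, _ in pairs)
    var2 = sum((b - mean2) ** 2 for _, b in pairs)
    if var1 == 0 or var2 == 0:
        return float("nan")  # r2 is undefined if either locus is monomorphic
    return cov * cov / (var1 * var2)
```

Because it works on genotype counts rather than haplotypes, this statistic cannot distinguish coupling from repulsion phase in double heterozygotes, which is why phased data would give a more detailed view of haplotype structure.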
We investigated introgression at the class-2 genomic island on blackcap chromosome 21 using VolcanoFinder (Setter et al. 2020). First, we calculated allele frequency at SNP sites in chromosome 21 in seven representative blackcap (sub)populations (Azores, Cape Verde, Canary Islands, Belgium, Gibraltar (southern Spain), and Guadarrama (central Spain)). Second, we polarised the SNPs in blackcap populations using the genotypes of the five garden warbler samples in our initial VCF file. We defined the ancestral allele as “the allele for which all garden warbler samples are homozygous”, and calculated derived allele frequency in the blackcap populations, excluding sites where genotypes are segregated in the garden warbler samples. Third, we calculated unfolded site frequency spectra (SFS) on chromosome 21 using the derived allele frequency. Finally, VolcanoFinder was run using the derived allele frequency and unfolded SFS.
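The polarisation and SFS steps can be sketched as follows, with genotypes coded as counts of the alternate allele (0, 1, 2). The function names and data layout are hypothetical, missing genotypes are not handled, and the VolcanoFinder input formats are not reproduced.

```python
from collections import Counter


def ancestral_allele(outgroup_genotypes):
    """0 (reference) or 1 (alternate) if all outgroup samples are homozygous
    for the same allele; None if the site cannot be polarised."""
    if all(g == 0 for g in outgroup_genotypes):
        return 0
    if all(g == 2 for g in outgroup_genotypes):
        return 1
    return None  # segregating in the outgroup: site excluded


def derived_count(ingroup_genotypes, ancestral):
    """Number of derived allele copies among the ingroup samples."""
    alt_copies = sum(ingroup_genotypes)
    if ancestral == 0:
        return alt_copies
    return 2 * len(ingroup_genotypes) - alt_copies


def unfolded_sfs(sites):
    """Unfolded SFS as {derived allele count: number of sites}.

    sites : iterable of (ingroup_genotypes, outgroup_genotypes) pairs
    """
    sfs = Counter()
    for ingroup, outgroup in sites:
        ancestral = ancestral_allele(outgroup)
        if ancestral is None:
            continue
        sfs[derived_count(ingroup, ancestral)] += 1
    return dict(sfs)
```

Dividing the derived count by twice the number of ingroup samples gives the derived allele frequency at a site.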
Recombination rate
We inferred the recombination landscape along blackcap chromosomes harbouring class-1, 2, and 3 genomic islands separately using Pyrho (Spence and Song 2019). Pyrho infers demography-aware recombination rates with a composite-likelihood approach from SNP data of unrelated samples, making use of likelihood lookup tables generated by simulations based on the demography and sample size of each population. In all inferences, we used the demography of focal populations inferred in Delmore et al. (2020). Before the recombination inference, focal samples were filtered and singletons were removed. For class-1 genomic islands, we used five samples for each of the three genotypes (AA, AB, and BB) at the inversion loci. We ran Pyrho with the demography of medium-long distance migrants inferred in Delmore et al. (2020), with a mutation rate of 4.6 × 10−9 per site per generation (Smeds et al. 2016), block penalty of 20, and window size of 50 kb. For the class-2 genomic island on chromosome 21, we inferred the population-specific recombination landscape of Azores, Cape Verde, continental resident, and medium-long distance migrants (represented by medium distance south-west migrants sampled in Belgium), using the demography of each population respectively. For class-3 genomic islands, we ran Pyrho using blackcap samples from the continental resident population. To compare recombination rate in class-3 genomic islands between the blackcap and garden warbler, we inferred recombination rates in the garden warbler. We first inferred the demography of the garden warbler using the PSMC-mode of MSMC2, because our resequenced garden warblers were not phased. After confirming that MSMC2 infers consistent demography irrespective of the input sample, we ran Pyrho using demography inferred from one individual (SylBor07) as the input, keeping the same values for the other parameters as in the blackcap inferences. We calculated average recombination rate in 50 kb sliding windows in blackcap and garden warbler using a custom script.
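Window averaging of a piecewise-constant recombination map can be sketched as a length-weighted mean. The custom script used in the study is not shown, so the following is only one plausible implementation: the input is assumed to be a list of half-open (start, end, rate) intervals, windows are taken as non-overlapping 50 kb bins for simplicity, and all names are hypothetical.

```python
def windowed_mean_rate(intervals, chrom_length, window=50_000):
    """Length-weighted mean recombination rate per window.

    intervals    : list of (start, end, rate) with half-open coordinates,
                   rate in per-base units, constant within an interval
    chrom_length : chromosome length in bp
    Returns one value per window (None where no interval overlaps).
    """
    means = []
    for w_start in range(0, chrom_length, window):
        w_end = min(w_start + window, chrom_length)
        weighted_sum = 0.0
        covered = 0
        for start, end, rate in intervals:
            overlap = min(end, w_end) - max(start, w_start)
            if overlap > 0:
                weighted_sum += rate * overlap
                covered += overlap
        means.append(weighted_sum / covered if covered else None)
    return means
```

Applying the same function to the blackcap and garden warbler maps on shared window coordinates yields the paired per-window values needed for a between-species comparison.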
Simulations
Effects of selection on coalescent time at an inversion
In SLiM version 3.5 (Haller and Messer 2019), we simulated a 5 Mb long chromosome with a 3 Mb long inversion in a diploid population with 1,000 individuals. We set the mutation rate to 1 × 10−7 [per base per generation] and recombination rate to 1 × 10−6 [per base per generation]. The purpose of these simulations was to qualitatively assess the effect of balancing selection between two arrangements at an inversion and of background selection on coalescent time inference by MSMC2-decode with respect to the chromosomal background, rather than to quantitatively estimate expectations under the blackcap demography. As such, we kept the population size smaller than the blackcap effective population size and the mutation rate greater than assumed in order to minimise the time and computational resources for simulations, while allowing MSMC2-decode to run on the simulated data. We considered the following 3 × 3 = 9 conditions.
Inversion fitness
Neutral: The inversion genotype does not alter an individual’s fitness.
Frequency-dependent selection: Fitness of the inversion is maximal when the inversion frequency is 0.1.
Overdominance: Selection coefficient s = −0.05 and dominance h = −0.5 for the inverted arrangement.
Mutations
Neutral: All mutations are neutral.
Mixed: 80% of all mutations are neutral and the other 20% are deleterious (s = −0.05, h = 0.5).
Deleterious: All mutations are deleterious (s = −0.05, h = 0.5).
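To make the selection and dominance coefficients above concrete, the short Python sketch below converts a pair (s, h) into relative genotype fitnesses. It assumes the usual convention, also used by SLiM, that a heterozygote has fitness 1 + hs and a homozygote for the variant has fitness 1 + s. The function name `genotype_fitness` and the dictionary keys are hypothetical.

```python
def genotype_fitness(s, h):
    """Relative fitness of the three genotypes at one locus.

    Assumes the convention: heterozygote = 1 + h*s, variant homozygote = 1 + s.
    """
    return {
        "hom_ref": 1.0,
        "het": 1.0 + h * s,
        "hom_alt": 1.0 + s,
    }


# Overdominance setting for the inverted arrangement (s = -0.05, h = -0.5):
# the heterozygote is fitter than either homozygote.
overdominant = genotype_fitness(s=-0.05, h=-0.5)

# Deleterious point mutation (s = -0.05, h = 0.5): the heterozygote is
# intermediate between the two homozygotes.
deleterious = genotype_fitness(s=-0.05, h=0.5)
```

Under this convention a negative h combined with a negative s raises heterozygote fitness above 1, which is why the overdominance condition is specified with h = −0.5.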
We ran 4,000 generations of burn-in to let mutations reach equilibrium, then introduced one copy of the inversion. For each condition, we ran 10,000 replicate simulations, recording the inversion frequency at every generation until the inversion was lost or the simulation reached 4,000 generations after the inversion event. We recorded the mutations of all individuals in VCF format at up to five time points: 100, 500, 1,000, 2,000, and 4,000 generations after the inversion event (generations 4,100, 4,500, 5,000, 6,000, and 8,000 including burn-in). For each VCF file, four individuals of each genotype were randomly selected, and a subset VCF was generated using BCFtools. MSMC2-decode was run in the same way as in our empirical analysis.
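The per-genotype subsampling can be sketched as follows. This is an illustration only: the mapping from sample name to inversion genotype, the function name `sample_per_genotype`, and the choice to skip genotype classes with fewer than the requested number of individuals are all assumptions, since the text does not say how such cases were handled.

```python
import random


def sample_per_genotype(genotype_of, n=4, seed=None):
    """Randomly pick n samples from each inversion genotype class.

    genotype_of: dict mapping sample name -> genotype label (e.g. "AA").
    Classes with fewer than n samples are skipped (an assumption).
    """
    rng = random.Random(seed)
    by_genotype = {}
    for sample, genotype in genotype_of.items():
        by_genotype.setdefault(genotype, []).append(sample)
    chosen = {}
    for genotype, samples in by_genotype.items():
        if len(samples) < n:
            continue
        # Sort first so that a fixed seed gives a reproducible choice.
        chosen[genotype] = rng.sample(sorted(samples), n)
    return chosen
```

The selected names could then be written to a file and passed to a subsetting tool to produce the reduced VCF.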
Effects of recombination suppression model on recombination rate inference at an inversion
We simulated two 5 Mb chromosomes with a neutral mutation rate of 4.6 × 10⁻⁸ in a population of 1,000 individuals in SLiM. The purpose of these simulations was to investigate the effect of an inversion and additional recombination suppression on recombination rate inference and LD in general, rather than effects specific to the blackcap demography. We therefore kept the population size smaller than the blackcap effective population size and the mutation rate greater than that assumed for the blackcap, to minimise the time and computational resources required for the simulations. We introduced a mutation (the inversion marker) on one chromosome at the 1 Mb position at the 50th generation. We simulated an inversion on this chromosome by suppressing recombination in the interval from 1 Mb to 4 Mb whenever the inversion marker site was heterozygous. We defined additional suppression according to different scenarios (models 1-6). We applied negative frequency-dependent selection (fitness of the inversion is 1 − (p_inv − 0.2), where p_inv is the frequency of the inversion allele). At 1,000 generations after the inversion event, we recorded the mutations in all samples in a single VCF file. Although 1,000 generations is relatively short given the population size of 1,000, the haplotype structure at the inversion locus was stable in test runs of model-1 (inversion frequency of 0.2 without additional recombination suppression). Based on the genotype at the 1 Mb position, we randomly chose 10 samples for each inversion genotype. Pyrho was run to estimate recombination rates using the chosen samples, with a block penalty of 50 and a window size of 50. LD was calculated in the same way as for the empirical data described above.
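The frequency-dependent fitness rule described above implies that the inversion is favoured below a frequency of 0.2 and disfavoured above it. The Python sketch below illustrates this with a deterministic recursion. The recursion is a simplifying assumption introduced here (fitness applied to the inversion allele against a standard allele of fitness 1, no drift, no diploid genotype effects) and is not the SLiM model itself; all function names are hypothetical.

```python
def inversion_fitness(p_inv, equilibrium=0.2):
    """Fitness of the inversion allele: 1 - (p_inv - equilibrium)."""
    return 1.0 - (p_inv - equilibrium)


def next_frequency(p_inv):
    """One generation of deterministic allele-level selection.

    Assumption for illustration: the standard arrangement has fitness 1
    and selection acts on alleles rather than on diploid genotypes.
    """
    w = inversion_fitness(p_inv)
    return p_inv * w / (p_inv * w + (1.0 - p_inv))


def iterate(p0, generations):
    """Apply next_frequency repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p)
    return p


# One inversion copy among 2,000 chromosomes (1,000 diploid individuals),
# followed for 1,000 generations.
final_frequency = iterate(1 / 2000, 1000)
```

In this simplified setting the frequency approaches 0.2, in line with the stable inversion frequency of 0.2 reported for the test runs of model-1, although the stochastic simulations will fluctuate around that value and can lose the inversion while it is rare.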
Tandem repeats
The genomic distribution of read depth was analysed with SAMtools (Danecek et al. 2021) and custom scripts. TandemRepeatsFinder (Benson 1999) was run on the blackcap reference genome with the parameter set recommended in the documentation (trf </path/to/fasta> 2 7 7 80 10 50 500 -f -d -m -h). The output was formatted and summarised with a custom script.
Synteny and phylogenetic analysis
The reference genome of the zebra finch (taeGut1, also known as WUSTL v3.2.4) was obtained from https://hgdownload.cse.ucsc.edu/goldenPath/taeGut1/bigZips/. The reference genomes of the garden warbler and rifleman were obtained from GenomeArk of the Vertebrate Genomes Project (https://vgp.github.io/genomeark/). The reference genome of the Bengalese finch was obtained from GigaDB (http://gigadb.org/dataset/view/id/100398/) (Colquitt et al. 2018). The VCF file of the zebra finches was obtained from https://doi.org/10.5061/dryad.fd24j (Singhal et al. 2015).
We performed synteny analysis between the blackcap and zebra finch using Satsuma with “-l 100 -n 8” options (Grabherr et al. 2010). The synteny was visualised with the circlize package in R (Gu et al. 2014).
We focused on blackcap chromosomes 12 and 6 for phylogenetic analysis and excluded chromosome 14, because the zebra finch VCF did not have SNPs in the region syntenic to the class-1 genomic island of blackcap chromosome 14. PCA on the regions of the zebra finch genome syntenic to the blackcap inversions was performed with PLINK to determine inversion genotypes in the zebra finch in the same way as in the blackcap. Clear separation of three groups in PCA and heterozygosity appeared in regions of zebra finch chromosomes 11 and 5, which are syntenic to the blackcap inversions on chromosomes 12 and 6 (Supplementary Fig. 17).
Based on the PCA of the regions of zebra finch chromosomes 11 and 5 syntenic to the blackcap inversions, we identified zebra finch samples homozygous for either haplotype and arbitrarily labelled them AA and BB. We made consensus sequences of the A and B haplotypes of blackcap chromosomes 12 and 6 and zebra finch chromosomes 11 and 5 based on allele frequencies at SNP sites in AA and BB samples with BCFtools and a custom script. Specifically, we changed the reference base to the alternative allele at sites where the alternative allele frequency was greater than 0.5 in each sample set. We restricted this procedure to the genomic interval from 14,126,710 bp to 22,227,355 bp of blackcap chromosome 12 and from 10,390 bp to 7,293,168 bp of zebra finch chromosome 11, so that we could make one phylogeny using the consensus sequences in the class-1 genomic island and another in the background genomic region of the same chromosome using the reference sequences.
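The consensus rule can be illustrated with a short Python sketch. It is not the BCFtools workflow or the custom script used in the study: the variant representation (tuples of 1-based position, reference base, alternative base, and alternative allele frequency within one sample set), the function name `consensus_sequence`, and the restriction to biallelic SNPs are assumptions made for illustration.

```python
def consensus_sequence(reference, variants, region=None, threshold=0.5):
    """Build a haplotype consensus from a reference sequence.

    reference: reference sequence of one chromosome as a string.
    variants: iterable of (pos, ref, alt, alt_freq), with 1-based pos and
        alt_freq computed within one sample set (e.g. AA samples only).
    region: optional (start, end), 1-based inclusive; sites outside it
        keep the reference base.
    """
    seq = list(reference)
    for pos, ref, alt, alt_freq in variants:
        if region is not None and not (region[0] <= pos <= region[1]):
            continue
        if len(ref) != 1 or len(alt) != 1:
            continue  # assumption: only single-base substitutions are edited
        if seq[pos - 1].upper() != ref.upper():
            raise ValueError(f"reference mismatch at position {pos}")
        if alt_freq > threshold:
            seq[pos - 1] = alt
    return "".join(seq)
```

Running the function once with frequencies from AA samples and once with frequencies from BB samples would give the two haplotype consensus sequences for a chromosome, while bases outside `region` stay identical to the reference.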
We mapped the consensus sequences of the A and B haplotypes of blackcap chromosomes 12 and 6 and zebra finch chromosomes 11 and 5 to chromosomes 12 and 6 of the blackcap reference genome with minimap2 (Li 2018). We mapped the whole-genome assemblies of the garden warbler, Bengalese finch, and rifleman to the blackcap whole-genome reference with minimap2 and extracted chromosomes 12 and 6. This resulted in alignments of the A and B haplotypes of the blackcap and zebra finch, the garden warbler, the Bengalese finch, and the rifleman to blackcap reference chromosomes 12 and 6. We made an aligned FASTA file from these alignment files using SAMtools, BCFtools, and custom scripts. From the alignment files, we made mask files marking positions where a query sequence was not mapped as expected for a unique homologous region, that is, positions where the depth was not 1. We merged the mask files for all sequences and applied the merged mask to the aligned FASTA file with BEDTools (Quinlan and Hall 2010), so that only genomic regions where all sequences were mapped properly were used for phylogeny inference.
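The masking logic can be sketched in plain Python. The study used BEDTools on mask files; the version below works instead on hypothetical in-memory per-base depth lists, and all function names are illustrative.

```python
def mask_from_depth(depth):
    """Flag positions where a query is not covered exactly once."""
    return [d != 1 for d in depth]


def merge_masks(masks):
    """Union of masks: a position is masked if any sequence flags it."""
    return [any(column) for column in zip(*masks)]


def apply_mask(sequence, mask, fill="N"):
    """Replace masked positions of an aligned sequence with `fill`."""
    return "".join(fill if masked else base
                   for base, masked in zip(sequence, mask))


# Hypothetical per-base depths of two query sequences on the same
# five reference positions.
depths = {
    "query_1": [1, 1, 0, 1, 2],
    "query_2": [1, 0, 1, 1, 1],
}
merged = merge_masks([mask_from_depth(d) for d in depths.values()])
masked_sequence = apply_mask("ACGTA", merged)
```

Taking the union of the masks keeps an alignment column only when every sequence is covered exactly once at that position.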
We inferred the phylogenetic relationships among the sequences within and outside the class-1 genomic islands of blackcap chromosomes 6 and 12 with RAxML (version 8.2.9, Stamatakis 2006) with “-m GTRGAMMA -N 1000” options, assessing 1,000 trees with a maximum likelihood method. We evaluated the support for the nodes on the best trees with 1,000 bootstrap replicates. The phylogenetic trees were visualised with the ape package in R (Paradis and Schliep 2019).
To compare the results of local PCA with those of Delmore et al. (2020), which used a different version of the blackcap reference genome, we performed synteny analysis between the two assemblies using Satsuma with “-l 100 -n 8” options (Grabherr et al. 2010).
Acknowledgements
This work was supported by the Max Planck Society (Max Planck Research Group grant MFFALIMN0001 to ML) and DFG Research Infrastructure NGS_CC (project 407495230) as part of the Next Generation Sequencing Competence Network (project 423957469). NGS analyses were carried out at the Competence Centre for Genomic Analysis (Kiel). We thank Britta Meyer, Tianhao Zhao, Hanna Koch, Conny Burghardt, and Sven Künzel for DNA extraction, library preparation, and/or sequencing; Gernot Segelbacher, Thord Fransson, Christos Barboutis, Zura Javakhishvili, Stuart Bearhop, Olof Hellgren, Staffan Bensch, Martim Melo, Chris Perrins, Álvaro Ramírez, and Helena Batalha for providing us with samples; Kira Delmore for initial blackcap resequencing data set; Sonal Singhal and Molly Przeworski for zebra finch resequencing data; Julien Dutheil, Diethard Tautz, Linda Odenthal-Hesse, Tobias Kaiser, Gustavo Valadares Barroso, and Carolina Peralta for discussion. Permits were provided to JCI for samples collected in Morocco (Haut Commissariat aux Eaux et Forets et a la Lutte Contre la Desertification, 206/2011, 13 Jan 2011), Cape Verde (Ministerio do Ambiente - Habitacao e Ordenamento do Territorio, 18/CITES/DNA, 17 Dec 2015), Canary Islands (Ref.: 2012/0710), Madeira (Ref.: 02/2016), and the Azores (Instituto da Conservacao da Natureza e da Biodiversidade, 171/2008, 31 Mar 2009); to JP-T for samples collected in Gibraltar and Cazalla de la Sierra (Consejeria de Medio Ambiente, 50.725.548-Z, 12 May 2011), Álava (Arabako Foru Aldundia, Esp Zenb exp 11/32, 12 Apr 2011) and Guadarrama (Consejería de Medio Ambiente Vivenda y Ordenacion del Territorio, 10/160876.9/10, 12 Apr 2010). Thord Fransson received permits for samples collected in Stockholm (Stockholms djurförsöksetiska nämnd Dnr N 16/16 2016-02-25). Permits were provided to Gernot Segelbacher for samples collected in the remaining locations (Regierungspräsidium Freiburg, 55–8853.17/0).