Distinct patterns of genetic variation at low-recombining genomic regions represent haplotype structure

Genetic variation across the entire genome reflects population structure, yet individual loci can show distinct patterns. Such deviations, identified through genome scans, have often been attributed to the effects of selection rather than to randomness. This interpretation assumes that sufficiently long genomic intervals average out randomness in the underlying genealogies, which represent local genetic ancestries. However, an alternative explanation for distinct patterns has not been fully addressed: too few genealogies to average out the effect of randomness. Specifically, distinct patterns of genetic variation may be due to a reduced local recombination rate, which reduces the number of genealogies in a genomic window. Here, we associate distinct patterns of local genetic variation with reduced recombination rates in a songbird, the Eurasian blackcap (Sylvia atricapilla), using genome sequences and recombination maps. We find that distinct patterns of local genetic variation reflect haplotype structure at low-recombining regions that are either shared by most populations or found in only a few populations. At species-wide low-recombining regions, genetic variation depicts conspicuous haplotypes segregating in multiple populations. At population-specific low-recombining regions, genetic variation represents variance among cryptic haplotypes within the low-recombining populations. With simulations, we confirm that these distinct patterns of haplotype structure evolve as a result of reduced recombination rates, and that the effects of selection can be overlaid on them. Our results highlight that distinct patterns of genetic variation can emerge through the evolution of reduced local recombination rates. The recombination landscape, as an evolvable trait, therefore plays an important role in determining the heterogeneous distribution of genetic variation along the genome.

Populations were defined based on geographic location, migratory phenotype, and genome-wide population structure. B, C. Genome-wide PCA illustrating population structure. D. Distribution of outlier regions based on local PCA using lostruct. E, F. Inferred recombination rates along two example chromosomes (chromosomes 1 and 14) in two blackcap populations (cont_medlong and Azores). In D-F, purple and green shading indicates the positions of outliers that coincide with species-wide and population-specific low-recombining regions, respectively. cont_medlong: medium- and long-distance migrant population breeding on the continent; cont_short: short-distance migrant population breeding on the continent; cont_res: resident (non-migrant) population breeding on the continent. All island populations (Canary, Madeira, Azores, Cape Verde, Mallorca, and Crete) are resident.

A. A putative inversion. The three clusters correspond to the combinations of two non-recombining alleles carried by individuals, denoted AA, AB, and BB. LD calculated using AA individuals is not elevated, consistent with heterozygote-specific recombination suppression at an inversion locus (Sup. Fig. 12). B. A species-wide low-recombining region with six loose clusters of individuals. LD calculated using a subset of individuals was elevated, suggesting genotype-non-specific recombination suppression. C. A population-specific low-recombining region. The variance in genetic distances between individuals of the low-recombining populations (Azores (blue) and Cape Verde (light blue)) is greater than between other pairs of individuals (top). LD calculated using individuals of the low-recombining populations is elevated (bottom).

Second, to investigate the effects of a population-specific reduction in local recombination rate, we performed simulations under two scenarios.
In both scenarios, three populations (pop1, pop2, and pop3) and their ancestral population had 1,000 diploid individuals, and pop1 evolved a reduced local recombination rate. The two scenarios differed in the timing of the introduction of the reduced recombination rate. In the first scenario (Sup. Fig. 19), …

Selection is known to cause distinct patterns of genetic variation (Nielsen, 2005). To test whether the outlier regions identified with lostruct in the blackcap genome are also targets of selection, we measured nucleotide diversity (π) and Tajima's D in each population, as well as the ratio of non-synonymous to synonymous substitutions (dN/dS) for annotated genes. Many species-wide low-recombining regions showed reduced nucleotide diversity (Sup. …

We discuss our findings from the perspective of underlying genealogies. We first define …

We showed that some distinct patterns of genetic variation are associated with species-wide …

We also demonstrated with simulations that distinct patterns of genetic variation at population-specific low-recombining regions represent cryptic haplotype structure within the low-recombining population. The haplotype structure is only cryptic and less apparent than …

The biological implication concerns the evolution of recombination rates and genetic variation along the genome. Based on our finding of a link between the two, we predict that organisms with more conserved recombination landscapes along the genome may have more conserved genomic landscapes of distinct patterns of genetic variation (Fig. 6B). In other words, the …

… subsets (to allow parallelisation of the following variant calling step) with GATK CombineGVCFs. We genotyped SNPs and INDELs using GATK GenotypeGVCFs, and the 10 subsets were …

… Azores, Cape Verde, continental resident, and medium-long distance migrants (represented by medium distance south-west migrants).
We computed the mean recombination rate in 10 kb sliding windows for each population.
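As an illustration of the windowing step, the sketch below computes a length-weighted mean recombination rate per 10 kb window from a piecewise-constant map. It assumes the map is a list of half-open `(start, end, rate)` intervals in base pairs; this input format is an assumption for illustration, and the output of the recombination-map software may need reshaping first. The step between windows is not stated in the text, so `step` defaults to the window size (non-overlapping windows) and can be reduced for overlapping windows. The function name `windowed_mean_rate` is hypothetical.

```python
from typing import List, Tuple

# Assumed input format (hypothetical): half-open intervals [start, end)
# in bp, each with a constant per-bp recombination rate.
Interval = Tuple[int, int, float]


def windowed_mean_rate(intervals: List[Interval], chrom_len: int,
                       window: int = 10_000,
                       step: int = 10_000) -> List[Tuple[int, int, float]]:
    """Return (window_start, window_end, mean_rate) for each window.

    The mean is weighted by how many bp of each map interval fall inside
    the window; windows not covered by any interval get NaN.
    """
    results = []
    for w_start in range(0, chrom_len, step):
        w_end = min(w_start + window, chrom_len)
        covered_bp = 0
        weighted_sum = 0.0
        for start, end, rate in intervals:
            overlap = min(end, w_end) - max(start, w_start)
            if overlap > 0:  # this map interval intersects the window
                covered_bp += overlap
                weighted_sum += overlap * rate
        mean = weighted_sum / covered_bp if covered_bp else float("nan")
        results.append((w_start, w_end, mean))
    return results


if __name__ == "__main__":
    toy_map = [(0, 5_000, 1e-8), (5_000, 20_000, 3e-8)]
    for row in windowed_mean_rate(toy_map, chrom_len=20_000):
        print(row)
```

Weighting by overlap length keeps a short high-rate interval from dominating a window. The nested loop is simple but quadratic; for whole-chromosome maps, a single sweep over sorted intervals would scale better.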

To test the association between local PCA outlier regions and low-recombining regions, we …

… Table 5). To investigate whether some of these six positions represent inversion breakpoints, we asked whether the soft-clipped segments of the reads have homologous sequences at the other end of the outlier regions. We extracted the soft-clipped segments of reads mapped at the six focal positions in AB and BB individuals using a custom script, and re-mapped these segments (instead of the entire reads) to the blackcap reference using BWA mem. We computed the depth of mapped segments at each position using SAMtools (Sup. Table 5).

10x linked read
We used an independent set of blackcap individuals (hereafter "10x individuals") whose …

… Fig. 16).
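The custom extraction script is not shown. The sketch below illustrates only the extraction step under stated assumptions: it parses a read's CIGAR string (operations as defined in the SAM specification) to recover the soft-clipped bases at either end and writes them as FASTA records that could then be re-mapped, for example with BWA mem. Reading alignments from a BAM file (e.g. with pysam) is omitted to keep it self-contained; the function names and the `min_len` threshold are hypothetical.

```python
import re
from typing import List, Tuple

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_segments(cigar: str, seq: str) -> Tuple[str, str]:
    """Return the (leading, trailing) soft-clipped subsequences of a read.

    Hard clips (H) are not stored in SEQ, so they are dropped before
    locating the soft-clipped (S) bases at either end of the alignment.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    if not ops:
        return "", ""
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return seq[:left], seq[len(seq) - right:]


def clipped_fasta_records(name: str, cigar: str, seq: str,
                          min_len: int = 20) -> List[str]:
    """Format soft-clipped segments as FASTA records for re-mapping.

    `min_len` is a hypothetical filter: very short clips map ambiguously.
    """
    left, right = soft_clipped_segments(cigar, seq)
    records = []
    for side, segment in (("L", left), ("R", right)):
        if len(segment) >= min_len:
            records.append(f">{name}_{side}\n{segment}")
    return records
```

For example, a read with CIGAR `25S50M` yields one record holding its first 25 bases. If those bases re-map to the opposite end of the outlier region, the read spans a candidate breakpoint.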

Selection in blackcaps
To test for selection in different outlier regions and to compare them with the genome-wide baseline, we computed nucleotide diversity (π) and Tajima's D …

… Table 6.

… given the population size of 1,000, the haplotype structure at the inversion locus was stable in test runs of model-1 (inversion frequency of 0.2 without additional recombination suppression).
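The software used to compute π and Tajima's D is not shown in this section. As a generic sketch of both statistics for a single window, the code below assumes phased biallelic SNPs coded 0/1 in a haplotype matrix and a known number of callable bases in the window; `diversity_stats` is a hypothetical name, and population-genetics toolkits such as scikit-allel provide equivalent functions.

```python
from typing import Sequence, Tuple


def diversity_stats(haplotypes: Sequence[Sequence[int]],
                    seq_len: int) -> Tuple[float, float]:
    """Return (pi per bp, Tajima's D) for one window of biallelic sites.

    `haplotypes` is an n x L matrix of 0/1 alleles (rows = sampled
    sequences, columns = sites); `seq_len` is the number of callable bp.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0]) if n else 0
    pairwise_sum = 0.0  # sum over sites of mean pairwise differences
    seg_sites = 0
    for j in range(n_sites):
        k = sum(h[j] for h in haplotypes)  # count of allele "1" at site j
        if 0 < k < n:
            seg_sites += 1
            pairwise_sum += 2.0 * k * (n - k) / (n * (n - 1))
    pi = pairwise_sum / seq_len

    # D is undefined without segregating sites, and its variance term is
    # zero for n < 4.
    if seg_sites == 0 or n < 4:
        return pi, float("nan")

    # Standard normalising constants of Tajima's D.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = seg_sites / a1  # Watterson's estimator (per window)
    var = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    d = (pairwise_sum - theta_w) / var ** 0.5
    return pi, d
```

Negative D (an excess of rare variants) is consistent with selective sweeps or population expansion. As the text notes, reduced π at low-recombining regions can also arise from linked selection, so these statistics are compared against the genome-wide baseline rather than read in isolation.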

Based on the genotype at the marker, we randomly sampled 10 individuals for each inversion genotype. We ran pyrho to estimate recombination rates from the 10 sampled individuals, with a block penalty of 50 and a window size of 50. The inferred recombination landscape is shown in Sup. Fig. 11.

… Kolmogorov-Smirnov test, in three pairs of populations (pop1-pop2, pop1-pop3, pop2-pop3). We counted the number of significant pairs of populations (0, 1, 2, or 3) for each time point of each replicate. We then compared the number of population pairs with distinct PCA distributions between the low-recombining and normally recombining regions (Sup. Fig. 24).
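The counting step can be sketched as follows. The preceding text is truncated, so which PC scores were compared and the exact test implementation are not shown. This sketch assumes one-dimensional PC1 scores per population and uses the standard large-sample critical value c(α) = sqrt(-ln(α/2)/2) instead of an exact p-value; in practice `scipy.stats.ks_2samp` would return p-values directly. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x - F_y|."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance both empirical CDFs past every value equal to v (ties).
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_significant(x: Sequence[float], y: Sequence[float],
                   alpha: float = 0.05) -> bool:
    """Large-sample test: reject if D > c(alpha) * sqrt((n + m) / (n * m))."""
    n, m = len(x), len(y)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return ks_statistic(x, y) > c_alpha * math.sqrt((n + m) / (n * m))


def count_distinct_pairs(pc_scores: Dict[str, Sequence[float]],
                         alpha: float = 0.05) -> int:
    """Number of population pairs (0-3 for three populations) whose
    PC-score distributions differ significantly."""
    return sum(
        ks_significant(pc_scores[a], pc_scores[b], alpha)
        for a, b in combinations(sorted(pc_scores), 2)
    )
```

Applied to each time point of each replicate, this yields the 0-3 counts that are then compared between low-recombining and normally recombining regions. No correction for multiple testing is applied in this sketch.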

Population-specific reduction of recombination rate
To investigate how the evolution of low-recombining regions in one or more populations affects patterns of …

… PCAs, we identified the 5% of SNPs with the highest loadings on the first two PC axes. We analysed these mutations on the underlying genealogies using tskit. Specifically, we used a χ² test to ask whether mutations originating in the low-recombining population were enriched among the high-loading mutations (Sup. Fig. 20C, G). We also assessed whether multiple mutations that originated in the low-recombining population and occurred on the same genealogical branches (i.e. mutations on the same ancestral haplotypes) were enriched among the high-loading mutations (Sup. Fig. 20D, H). For this, we compared the number of mutations sharing …
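The first enrichment test can be framed as a 2×2 contingency table: high-loading versus other SNPs, crossed with whether each mutation arose in the low-recombining population. The sketch below assumes the origin of each mutation has already been determined from the genealogies (e.g. with tskit, not shown). It also assumes that "high-loading" means the union of the top 5% of absolute loadings on PC1 and on PC2, because how the two axes were combined is not stated. It computes a Pearson χ² statistic without continuity correction and the 1-df p-value via `math.erfc`. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def top_fraction(values: Sequence[float], frac: float = 0.05) -> Set[int]:
    """Indices of the `frac` largest absolute values (at least one)."""
    k = max(1, round(len(values) * frac))
    order = sorted(range(len(values)), key=lambda i: abs(values[i]),
                   reverse=True)
    return set(order[:k])


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no Yates correction) and 1-df p-value for
    the table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0, 1.0  # an empty row or column: nothing to test
    chi2 = n * (a * d - b * c) ** 2 / margins
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def enrichment_test(pc1: Sequence[float], pc2: Sequence[float],
                    from_lowrec: List[bool], frac: float = 0.05):
    """Test whether mutations from the low-recombining population are
    over-represented among high-loading SNPs."""
    high = top_fraction(pc1, frac) | top_fraction(pc2, frac)
    a = sum(1 for i in high if from_lowrec[i])  # high-loading, low-rec origin
    b = len(high) - a                           # high-loading, other origin
    c = sum(1 for i, f in enumerate(from_lowrec) if f and i not in high)
    d = len(from_lowrec) - len(high) - c        # other SNPs, other origin
    return (a, b, c, d), chi2_2x2(a, b, c, d)
```

A significant result with `a` larger than expected under independence indicates that the PC axes are driven by variants that arose within the low-recombining population. The second test, on mutations sharing a branch, would additionally need each mutation's branch (its node in the tree sequence) before counting.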