Abstract
Population bottlenecks have a profound impact on the genetic makeup of a species including levels of deleterious variation. How reduced selection efficacy and purging interact is known from theory but largely lacks empirical support. Here, we analyze patterns of genome-wide variation in 60 genomes of six ibex species and the domestic goat. Ibex species that suffered recent severe bottlenecks accumulated deleterious mutations compared to other species. Then, we take advantage of exceptionally well-characterized repeated bottlenecks during the restoration of the near-extinct Alpine ibex and show that experienced bottleneck strength correlates with elevated individual inbreeding. Strong bottlenecks led to the accumulation of mildly deleterious mutations and purging of highly deleterious mutations. We show in a simulation model that realistic bottleneck strengths can indeed simultaneously purge highly deleterious mutations during overall mutation accumulation. Genome-wide purging of highly deleterious mutation load over few generations in the wild has implications for species conservation efforts.
Main text
Dramatic, temporary reductions in population size – so-called bottlenecks – occur in nearly all plant and animal populations including humans 1. These demographic changes have important consequences for wildlife management and the conservation of endangered species 2, but they also have profound consequences including genetic disorders e.g. 3–7. Increased genetic drift and inbreeding due to bottlenecks lead to loss of neutral genetic variation, a reduced efficacy of natural selection, and increased expression of deleterious recessive mutations 8–10. The expression of recessive mutations under inbreeding creates the potential for selection to act against these mutations, a process known as purging. Purging reduces the frequency of deleterious mutations depending on the population size, the degree of dominance, and the magnitude of the deleterious effects 11. Unless population sizes are extremely low, bottlenecks tend to purge highly deleterious, recessive mutations 11,12. On the other hand, genetic drift during bottlenecks reduces the efficacy of selection 13. This allows mildly deleterious mutations to drift to substantially higher frequencies 6,14. The combination of purging and reduced selection efficacy generates complex dynamics of deleterious mutation frequencies following population bottlenecks 11,12,15–17.
Theory predicts how reduced selection efficacy and purging impact the mutation load through bottlenecks 11,15,16,18, but genetic evidence from wild populations is rare 19–21. Most previous research used changes in fitness to infer possible purging events 18,21–24, but changes in fitness can result from causes unrelated to purging such as adaptation to specific environmental conditions or the fixation of deleterious mutations 10,25. One study provided direct evidence for purging of the most deleterious mutations in isolated mountain gorilla populations that split off larger lowland populations ~20’000 years ago 19. However, it remains unknown how more complex or recent demographic events affect levels of deleterious mutations in the wild. Here, we take advantage of exceptionally well characterized repeated bottlenecks during the reintroduction of the near-extinct Alpine ibex to retrace the fate of deleterious mutations. Alpine ibex were reduced to ~100 individuals in the 19th century in a single population in the Gran Paradiso region of Northern Italy 26. In less than a century, a census size of ca. 50’000 individuals has been re-established across the Alps. Thus, the population bottleneck of Alpine ibex is among the most dramatic recorded for any successfully restored species. Most extant populations experienced at least three bottlenecks leaving strong footprints of low genetic diversity 27,28.
A) Geographical distribution and IUCN conservation status of ibex and wild goat species (LC: Least concern, V: Vulnerable, NT: Near threatened 60). Sample sizes: C. ibex: N=29, C. pyrenaica: N=4, C. aegagrus: N=6, C. sibirica: N=2, C. falconeri: N=1, C. nubiana: N=2. B) Maximum likelihood phylogenetic analyses, C) nucleotide diversity, D) proportion of the genome with runs of homozygosity (ROH) longer than 2.5 Mb and E) percentage of polymorphic sites within species that segregate highly deleterious mutations.
Analyzing 60 Capra genomes covering Alpine ibex (C. ibex), five additional wild goats and the domestic goat, we find exceptionally low genome-wide variation and an accumulation of deleterious mutations in Alpine ibex (Figure 1). Both nucleotide diversity and the number of heterozygous sites per kb sequenced were lowest in Alpine ibex and Iberian ibex (C. pyrenaica), the two species that experienced the strongest recent bottlenecks (Figure 1C, Figure S2, Tables S1 and S2). Genome-wide diversity was highest in Siberian ibex (C. sibirica), which have large and relatively well-connected populations 29.
Genomes of the Siberian and Nubian ibex (C. nubiana) and of some domestic goat (C. aegagrus hircus) showed the least evidence for recent inbreeding estimated by genome-wide runs of homozygosity (ROH) 30. In contrast, the genomes of some Alpine ibex, the Markhor (C. falconeri), and some domestic goat individuals contained more than 20% ROH (Figure 1D, Figures S3A and B, Table S3). Overall, there was clear genomic evidence that the near extinction and recovery of the Alpine ibex resulted in substantial genetic drift and inbreeding, opening the possibility for purging and accumulation of deleterious mutations.
We analyzed all Capra genomes for evidence of segregating deleterious mutations (Figure S6). We restricted our analyses to autosomal coding sequences with evidence for transcriptional activity in Alpine ibex organs. We further removed sites with low genomic evolutionary rate profiling (GERP) 31 scores yielding a total of 370’853 SNPs (Table S4). We functionally annotated SNP variants for the expected impact on the protein function. We found that across all seven Capra species 0.17% of these SNPs carried a highly deleterious variant with the majority incurring a stop-gain mutation (Table S4). We found that the proportion of highly deleterious mutations varied substantially among Capra species (Figure 1E, Figures S7A to D). The proportion of highly deleterious variants segregating within species was inversely correlated with nucleotide diversity (Pearson, df=5, r=-0.86, p=0.012). Hence, the Capra species with the smallest populations or the most severe population size reductions show an accumulation of deleterious mutations.
Both Alpine and Iberian ibex experienced severe bottlenecks due to overhunting and habitat fragmentation. Historic records indicate that Alpine ibex suffered a bottleneck of ~100 individuals at the end of the 19th century and Iberian ibex a bottleneck of ~1000 individuals 32 (Table S1). We first looked for evidence of purging in the allele frequency spectra of mutation classes of varying severity. We focused only on derived sites that were polymorphic in at least one of the two sister species (Figure 2A, S8). We found that frequency distributions of high and moderate impact mutations in Alpine ibex were downwards shifted indicating purifying selection (Figure 2C and GERP score analyses in Figure S9A). Short indels (≤ 10 bp) in coding sequences revealed a similar shift towards lower frequencies (Figure S9B). This is consistent with stronger selection acting against highly deleterious mutations in Alpine ibex.
A) Population sampling locations of Iberian ibex (left, grey circles) and Alpine ibex (right, colored circles). Each filled circle represents a population. Circles with a black outline indicate the first three reintroduced populations in Switzerland that were used for all subsequent population reintroductions of Alpine ibex. Colors associate founder and descendant populations (see also Figure 3A). Site frequency spectra for neutral (modifier), mildly (moderate impact) and highly deleterious (high impact) mutations for (B) Iberian and (C) Alpine ibex. D) Rxy analysis contrasting Iberian with Alpine ibex across the spectrum of impact categories. Rxy < 1 indicates a relative frequency deficit of the corresponding category in Alpine ibex compared to Iberian ibex. E) Individual homozygote counts per impact category for Iberian (light green) and Alpine ibex (dark green).
To test whether Alpine ibex indeed showed evidence for purging of deleterious mutations compared to Iberian ibex, we calculated the relative number of derived alleles Rxy 33 for each mutation impact category (Figure 2D). We used a random set of intergenic SNPs for standardization, which makes Rxy robust against sampling effects and population substructure 33. Low and moderate impact mutations (i.e. mildly deleterious mutations) showed a minor excess in Alpine ibex compared to Iberian ibex, indicating a higher load in Alpine ibex. In contrast, we found that highly deleterious mutations were strongly reduced in Alpine ibex compared to Iberian ibex (Figure 2D). Strikingly, the proportion of SNPs across the genome segregating a highly deleterious mutation is higher in Alpine ibex (Figure 1E), but Rxy shows that highly deleterious mutations have a pronounced downwards allele frequency shift in Alpine ibex compared to Iberian ibex (Figure 2C). Furthermore, the number of homozygous sites with highly deleterious mutations per individual was considerably lower in Alpine ibex than in Iberian ibex (Figure 2E). Together, this shows that highly deleterious mutations were substantially purged in Alpine ibex. We also found evidence for the accumulation of mildly deleterious mutations through genetic drift in Alpine ibex.
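The Rxy comparison can be illustrated with a minimal sketch. It assumes the commonly used formulation in which, for each site, the derived allele frequency in population X is weighted by the non-derived frequency in population Y, and the resulting ratio is divided by the same ratio computed on intergenic sites. The function names and the example frequencies are hypothetical and are not taken from the study.

```python
def rxy(freqs_x, freqs_y):
    """Ratio L_xy / L_yx for paired per-site derived allele frequencies.

    L_xy sums f_x * (1 - f_y) over sites; L_yx sums f_y * (1 - f_x).
    Assumed formulation; the exact estimator of the study may differ.
    """
    l_xy = sum(fx * (1.0 - fy) for fx, fy in zip(freqs_x, freqs_y))
    l_yx = sum(fy * (1.0 - fx) for fx, fy in zip(freqs_x, freqs_y))
    return l_xy / l_yx


def standardized_rxy(cat_x, cat_y, neutral_x, neutral_y):
    """Rxy of one impact category, standardized by Rxy at intergenic sites."""
    return rxy(cat_x, cat_y) / rxy(neutral_x, neutral_y)


# Hypothetical frequencies: X = Alpine ibex, Y = Iberian ibex.
high_alpine = [0.1, 0.0, 0.2]
high_iberian = [0.3, 0.2, 0.2]
intergenic_alpine = [0.2, 0.5, 0.4]
intergenic_iberian = [0.2, 0.5, 0.4]

value = standardized_rxy(high_alpine, high_iberian,
                         intergenic_alpine, intergenic_iberian)
print(f"standardized Rxy = {value:.3f}")  # below 1: deficit in X relative to Y
```

With these made-up inputs the high impact category is rarer in X than in Y, so the standardized ratio falls below 1, which is the direction interpreted as a relative frequency deficit.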
Consistent with the fact that all extant Alpine ibex originate from the Gran Paradiso, this population occupies the center of a principal component analysis (Figure 3A-B, Figure S11A-B; 28). The first populations re-established in the Alps were already clearly distinct from the Gran Paradiso source population and showed reduced nucleotide diversity (Figure 3A, C), having experienced 1 or 2 additional bottlenecks 27. These initial three reintroduced populations were used to establish additional populations, which underwent a total of 3-4 bottlenecks. These additional bottlenecks led to further loss of nucleotide diversity and genetic drift, as indicated by the increasing spread in the principal component analysis (Figure 3A-C). The Alpi Marittime population constitutes an exceptional case: it was established through the translocation of 25 Gran Paradiso individuals, of which only six successfully reproduced 34. As expected from such an extreme bottleneck, Alpi Marittime showed strong genetic differentiation from all other Alpine ibex populations and highly reduced nucleotide diversity (Figures 3B-C; 35). To estimate the expected strength of drift experienced by different populations, we estimated effective population sizes through the long-term harmonic mean population sizes based on demographic records spanning the near century since establishment 36,37. We found that both the nucleotide diversity and the individual number of heterozygous sites per kb decreased with smaller long-term population size (Figure 3C, Figure S12). In parallel to genetic drift, inbreeding was also higher in the populations with the lowest harmonic mean population sizes. Genomes from the Gran Paradiso source population generally showed the lowest proportions of the genome affected by ROH, while reintroduced populations of lowest effective population size had the highest proportions of the genome affected by ROH (Figure 3D and Figure S3).
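The harmonic mean is used for long-term population size because it is dominated by the smallest census counts, which is where most drift occurs. The following sketch shows the calculation on a made-up census series; the function name and the numbers are illustrative assumptions, not values from the demographic records.

```python
def harmonic_mean(census_sizes):
    """Harmonic mean of a series of census counts (all must be positive)."""
    if not census_sizes or any(n <= 0 for n in census_sizes):
        raise ValueError("census sizes must be a non-empty list of positive numbers")
    return len(census_sizes) / sum(1.0 / n for n in census_sizes)


# Hypothetical population: small at founding, then growing.
census = [10, 40, 160]
arithmetic = sum(census) / len(census)
print(f"arithmetic mean = {arithmetic:.1f}")          # 70.0
print(f"harmonic mean   = {harmonic_mean(census):.1f}")  # 22.9
```

The founding year with only 10 animals pulls the harmonic mean far below the arithmetic mean, which is why a brief founder bottleneck leaves a lasting imprint on the long-term estimate.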
A) Schematic showing the recolonization history and population pedigree of Alpine ibex. Locations include also zoos and the population Pilatus (pi), which was not sampled for this study but is known to have contributed to the population Oberbauenstock (ob). am: Alpi Marittime, gp: Gran Paradiso; ih: Zoo Interlaken Harder; al: Albris; bo: Bire Öschinen; br: Brienzer Rothorn; ob: Oberbauenstock; pl: Pleureur; rh: Rheinwald; wh: Weisshorn; pi: Pilatus; pp: Wildpark Peter and Paul. The grey circle represents a population that was founded from more than one population. Figure elements were modified from Biebach and Keller (2009) with permission. B) Principal component analysis of all Alpine ibex individuals included in the study. C) Nucleotide diversity per population. D) Proportion of the genome within runs of homozygosity (ROH) longer than 2.5 Mb. E) Rxy analysis contrasting the strongly bottlenecked Alpi Marittime population with all other Alpine ibex populations across the spectrum of impact categories. Rxy < 1 indicates a relative frequency deficit of the corresponding category in the Alpi Marittime population. Circles with a black outline indicate the first three reintroduced populations in Switzerland that were used for all subsequent population reintroductions of Alpine ibex. Colors associate founder and descendant populations.
Bottlenecks should affect deleterious mutations by randomly increasing or decreasing allele frequencies at individual loci. As predicted from theory, we find that individuals from populations that underwent stronger bottlenecks carry significantly more homozygotes for modifier, low and moderate impact mutations (i.e. nearly neutral and mildly deleterious mutations; Figure 4A). In contrast, individuals showed no meaningful difference in number of homozygotes for high impact (i.e. highly deleterious) mutations across populations. The stability in the number of homozygotes for high impact mutations through successive bottlenecks despite a step-wise increase in the number of homozygotes for weaker impact mutations, strongly suggests that purging occurred over the course of the Alpine ibex reintroductions. This finding was confirmed using an alternative categorization of deleterious mutation load based on phylogenetic conservation based GERP scores (Figure S13). Because the above findings are contingent on a model where deleterious mutations are recessive, we also analyzed the total number of derived alleles per individual. We find a consistent but less pronounced increase in total number of derived alleles per individual for nearly neutral and mildly deleterious mutations (Figure 4B). In contrast, the total number of derived alleles for highly deleterious mutations did not correlate with the strength of bottleneck and was lowest in the most severely bottlenecked Alpi Marittime population (Figure 4B), suggesting that the most deleterious mutations were purged in this population. The Rxy statistics showed a corresponding strong deficit in the Alpi Marittime population (Figure 3E). This is consistent with substantially more purging in the most bottlenecked Alpine ibex population.
(A) Homozygote counts and (B) allele counts per individual for each Alpine ibex population. The schematic between A and B indicates the harmonic mean of the census size of each population, which is inversely correlated with the strength of drift. *) Estimated numbers. Colors associate founder and descendant populations (see also Figure 3A).
We analyzed the predicted protein truncation by highly deleterious mutations using homology-based inferences. Focusing on high-impact mutations segregating in Alpine ibex, we found that nearly all mutations disrupted conserved protein family (PFAM) domains encoded by the affected genes (Figure 5).
The localization of protein family (PFAM) domains is highlighted in dark. Red dots indicate the relative position of a highly deleterious mutation segregating in Alpine ibex. The frequencies of highly deleterious mutations are summarized for Iberian ibex and three subsets of Alpine ibex. The demographic history is shown with a schematic.
To ascertain whether accumulation and purging of different mutation classes is indeed expected to occur in the demographic context of the Alpine ibex reintroductions, we parametrized an individual-based forward simulation model with the demographic record 38 (Figure 6A). The model included all populations relevant for the founding of the populations under study and was parametrized with the actual founder size (Figure 6A, S15, Table S1). We used Rxy to analyze the evolution of deleterious mutation frequencies through the reintroduction bottlenecks. The simulations showed a deficit of highly deleterious mutations after the reintroduction bottlenecks consistent with purging (Figure 6B). The most bottlenecked Alpi Marittime population also showed evidence of purging of the highly deleterious mutations in the simulated dataset but simultaneously accumulation of mildly deleterious mutations (Figures 6C, D). Consistent with evidence from Rxy, the simulations showed that the number of derived mildly deleterious homozygotes increased with the strength of drift, while no increase was found for highly deleterious mutations (Figure 6D, Figures S17-S20).
A) Demographic model used for the individual-based simulations. The model was parametrized using census data and historical records (see methods). Bold numbers represent the carrying capacities defined as the harmonic mean of the census size. Numbers not in bold represent the number of individuals released to found each population. If a population was established from two source populations, the individual numbers are separated by commas. *) Upwards adjusted harmonic means of the census size (historical records were ih=16, Zoo Interlaken Harder, and pp = 20, Wildpark Peter and Paul). The adjustment was necessary to prevent extinction of zoo populations. **) Census numbers were estimated based on historical records of the population but no long-term census data were available. B) Relative frequency comparison (Rxy) of Alpine ibex just before and after the species bottleneck and recolonization. C) Rxy analysis contrasting the strongly bottlenecked Alpi Marittime population with all other Alpine ibex populations across the spectrum of impact categories. D) Individual homozygote counts per impact category. Boxplots summarize 100 population means across simulation replicates. Colors associate founder and descendant populations (see also Figure 3A).
Ibex species with recently reduced population sizes accumulated deleterious mutations compared to closely related species. This accumulation was particularly pronounced in the Iberian ibex that experienced a severe bottleneck and Alpine ibex that went nearly extinct. We show that even though Alpine ibex carry an overall higher mutation burden than related species, the strong bottlenecks imposed by the reintroduction of populations purged highly deleterious mutations, the most bottlenecked population (Alpi Marittime) showing the most purging. However, purging was only effective against the most highly deleterious mutations. Empirical evidence for purging in the wild is scarce 18,19. Here, we show that a few dozen generations were sufficient to reduce the burden of highly deleterious mutations. This suggests that purging may occur widely in populations undergoing severe bottlenecks. It is important to note that mildly deleterious mutations actually accumulated over the course of the reintroduction, consistent with less efficient selection against mildly deleterious mutations in small populations. Hence, the overall mutation load may have increased with bottleneck strength. This is consistent with the finding that population-level inbreeding, which is a strong indicator of past bottlenecks, is correlated with lower population growth rates in Alpine ibex (Bozzuto et al. in review). Our empirical results from the Alpine ibex reintroduction are in line with theoretical predictions that populations with an effective size below 100 individuals can accumulate a substantial burden of mildly deleterious mutations within a relatively short time. The burden of deleterious mutations evident in Iberian ibex supports the notion that even population sizes of ~1000 still accumulate mildly deleterious mutations. High loads of deleterious mutations have been shown to increase the extinction risk of a species 39. 
Thus, conservation efforts aimed at keeping effective population sizes above a minimum of 1000 individuals 2 are well justified.
Methods
Genomic data acquisition
DNA samples from 29 Alpine ibex, 4 Iberian ibex, 2 Nubian ibex, 2 Siberian ibex and 1 Markhor were sequenced on an Illumina Hiseq2500 or Hiseq4000 to a depth of 15-38x (median of 17x). Table S2 specifies individual sampling locations. Libraries were produced using the TruSeq DNA Nano kit. Illumina sequencing data of 6 Bezoar and 16 domestic goat (coverage 6x – 14x, median 12x) were generated by the NextGen Consortium (https://nextgen.epfl.ch). The corresponding raw data was downloaded from the EBI Short Read Archive: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/.
Read alignment and variant calling
Trimmomatic v.0.36 40 was used for quality and adapter trimming before reads were mapped to the domestic goat reference genome (version CHIR1, 41) using Bowtie2 v.2.2.5 42. MarkDuplicates from Picard (http://broadinstitute.github.io/picard, v.1.130) was used to mark duplicates. Genotype calling was performed using HaplotypeCaller and GenotypeGVCFs (GATK, v.3.6 43,44). VariantFiltration of GATK was used to remove single nucleotide polymorphisms (SNPs) if: QD < 2.0, FS > 40.0, SOR > 5.0, MQ < 20.0, MQRankSum < −3.0 or > 3.0, ReadPosRankSum < −3.0 or > 3.0, or AN < 62 (80% of all Alpine ibex individuals). Indels up to 10 bp were also retained and filtered using the same filters and filter parameters, except that the MQRankSum filter was omitted, because this measure is more likely to be biased for indels of several base pairs. Filtering parameters were chosen based on genome-wide quality statistics distributions (see Figures S21 – S38). Variant positions were independently validated by using the SNP caller Freebayes (v1.0.2-33-gdbb6160 45) with the following settings: --no-complex --use-best-n-alleles 6 --min-base-quality 3 --min-mapping-quality 20 --no-population-priors --hwe-priors-off. To ensure high-quality SNPs, we only retained SNPs that were called and passed filtering using GATK, and that were confirmed by Freebayes. Overall, 97.5% of all high-quality GATK SNP calls were confirmed by Freebayes. This percentage was slightly lower for chromosome X (96.7%) and unplaced scaffolds (95.2%). We tested whether the independent SNP calls of GATK and Freebayes were concordant and we could validate 99.6% of the biallelic SNPs. We retained genotypes called by GATK.
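As an illustration of how the SNP hard filters listed above combine, the sketch below applies the same thresholds to a dictionary of per-site annotations. This is not the GATK implementation; the function name and the dictionary layout are assumptions, and the handling of missing rank-sum annotations (not filtered) is a choice made for this sketch only.

```python
def passes_snp_filters(site):
    """Return True if a SNP survives the hard filters described in the text.

    `site` is a hypothetical dict of annotation name -> value. Rank-sum
    annotations may be missing; in this sketch a missing value is not
    treated as a failure (an assumption, not documented behavior).
    """
    if site["QD"] < 2.0 or site["FS"] > 40.0 or site["SOR"] > 5.0:
        return False
    if site["MQ"] < 20.0 or site["AN"] < 62:
        return False
    for key in ("MQRankSum", "ReadPosRankSum"):
        value = site.get(key)
        if value is not None and (value < -3.0 or value > 3.0):
            return False
    return True


good = {"QD": 25.0, "FS": 1.2, "SOR": 0.8, "MQ": 42.0, "AN": 88,
        "MQRankSum": 0.1, "ReadPosRankSum": -0.4}
low_qd = dict(good, QD=1.5)
skewed = dict(good, MQRankSum=-3.5)

print(passes_snp_filters(good))    # True
print(passes_snp_filters(low_qd))  # False
print(passes_snp_filters(skewed))  # False
```

A site is removed as soon as any single criterion fails, which mirrors the way the thresholds are listed as alternative grounds for exclusion.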
The total number of SNPs detected was 59.5 million among all species. Per species, the number of SNPs ranged from 21.9 million in the domestic goat (N=16) to 2.0 million in Markhor (N=1, Table S2).
RNA-seq data generation
Tissue samples of a freshly harvested Alpine ibex female were immediately conserved in RNAlater (QIAGEN) in the field and stored at −80°C until extraction. The following ten organs were sampled: retina/uvea, skin, heart, lung, lymph, bladder, ovary, kidney, liver and spleen. RNA was extracted using the AllPrep DNA/RNA Mini Kit from Qiagen following the manufacturer’s protocol. Homogenization of the samples was performed using a Retsch bead beater (Retsch GmbH) in RLT plus buffer (Qiagen). RNA was enriched using a PolyA enrichment protocol implemented in the TruSeq RNA library preparation kit. Illumina sequencing libraries were produced using the Truseq RNA stranded kit. Sequencing was performed on two lanes of an Illumina Hiseq4000.
Genetic diversity and runs of homozygosity
Genetic diversity measured as individual number of heterozygous sites and nucleotide diversity were computed using vcftools 46. Runs of homozygosity were called using BCFtools/RoH 47, an extension of the software package BCFtools, v.1.3.1. BCFtools/RoH uses a hidden Markov model to detect segments of autozygosity from next generation sequencing data. Due to the lack of a detailed linkage map, we used physical distance as a proxy for recombination rates with the option -M and assuming 1.2cM/Mb following sheep recombination rates 48. Smaller values for -M led to slightly longer ROH (Figures S3–S5). Because of small per population sample size, we decided to fix the alternative allele frequency (option --AF-dflt) to 0.4. Estimates for the population with the largest sample size (Gran Paradiso, N=7) were very similar if actual population frequencies (option --AF-estimate sp) were used (Figures S4 and S5). Option --viterbi-training was used to estimate transition probabilities before running the HMM. Running the analysis without the option --viterbi-training led to fewer but longer ROH (Figures S3-S5).
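Once ROH segments have been called, the proportion of the genome in long ROH reported in the figures reduces to a simple sum. The sketch below assumes segments are given as half-open (start, end) coordinates in base pairs and that the denominator is the total callable genome length; both the function name and the example values are hypothetical.

```python
def roh_proportion(segments, genome_length_bp, min_length_bp=2_500_000):
    """Fraction of the genome covered by ROH longer than min_length_bp.

    `segments` is a list of (start, end) tuples in bp, end exclusive.
    """
    covered = sum(end - start for start, end in segments
                  if end - start > min_length_bp)
    return covered / genome_length_bp


# Hypothetical individual: one 3 Mb ROH (counted) and one 1 Mb ROH (ignored).
segments = [(0, 3_000_000), (5_000_000, 6_000_000)]
print(roh_proportion(segments, genome_length_bp=30_000_000))  # 0.1
```

Only segments exceeding the 2.5 Mb cutoff contribute, so short tracts that reflect older shared ancestry do not inflate the estimate of recent inbreeding.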
Identification of high-confidence deleterious mutations
Three lines of evidence were used to identify high-confidence deleterious mutations. First, variants leading to a functional change are candidates for deleterious mutations. We used snpEff 49 v.4.3 for the functional annotation of each variant. The annotation file ref_CHIR_1.0_top_level.gff3 was downloaded from: ftp://ftp.ncbi.nlm.nih.gov/genomes/Capra_hircus/GFF and then converted to gtf using gffread. Option -V was used to discard any mRNAs with CDS having in-frame stop codons. SnpEff predicts the effects of genetic variants (e.g. stop-gain variants) and assesses the expected impact. The following categories were retrieved: high (e.g. stop-gain or frameshift variant), moderate (e.g. missense variant, in-frame deletion), low (e.g. synonymous variant) and modifier (e.g. exon variant, downstream gene variant). In the case of overlapping transcripts for the same variant, we used the primary transcript for further analysis. A total of 49.0% of all detected SNPs were located in intergenic regions, 43.2% in introns, 6.5% down- and upstream of genes. A total of 0.7% of variants were within CDS, of which ~60% were synonymous and ~40% were missense variants. Overall, 0.002% were stop-gain mutations.
Protein sequences were annotated using InterProScan v.5.33 by identifying conserved protein family (PFAM) domains 50.
Second, we assessed the severity of a variant by its phylogenetic conservation score. A non-synonymous variant is more likely to be deleterious if it occurs in a conserved region of the genome. We used GERP conservation scores, which are calculated as the number of substitutions expected from the species tree under a neutral model minus the number of substitutions observed, so that higher scores indicate stronger conservation. We downloaded GERP scores (accessed from http://mendel.stanford.edu/SidowLab), which have been computed for the human reference genome version hg19. The alignment was based on 35 mammal species but did not include the domestic goat (see https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=allHg19RS_BW for more information). Exclusion of the focal species domestic goat is recommended for the computation of conservation scores, as the inclusion of the reference genome may lead to biases 51.
In order to remap the GERP scores associated to hg19 positions to the domestic goat reference genome positions, we used liftOver (hgdownload.cse.ucsc.edu, v.287) and the chain file downloaded from hgdownload-test.cse.ucsc.edu/goldenPath/capHir1.
Third, we ascertained support for gene models annotated in the domestic goat genome with expression analyses of Alpine ibex tissue samples. We included expression data from 10 organs of an Alpine ibex female (see RNA-seq data section above) to assess expression levels of each gene model. Quality filtering of the raw data was performed using Trimmomatic 40 v.0.36. Hisat2 52 v.2.0.5 was used to map the reads of each organ to the domestic goat reference genome. The mapping was run with option --rna-strandness RF (stranded library) and supported by including a file with known splice sites (option --known-splicesite-infile). The input file was produced using the script hisat2_extract_splice_sites.py (part of hisat2 package) from the same gtf file as the one used for the snpEff analysis (see above). For each organ, featureCounts 53 (subread-1.5.1) was used to count reads per exon using the following options: -s 2 (reverse stranded), -f (count reads at the exon level), -O (assign reads to all their overlapping features), -C (excluding read pairs mapping to different chromosomes or the same chromosome but on a different strand). The R package edgeR 54 was used to calculate FPKM (Fragments Per Kilobase Of Exon Per Million Fragments Mapped) per gene and organ. For variant sites that were included in more than one exon, the highest FPKM value was used. We found that 16’013 out of 17’998 genes showed transcriptional activity of at least one exon (FPKM > 0.3). Overall 166’973 out of 178’504 exons showed evidence for transcription. In a total of 1928 genes, one or more exons showed no evidence for transcription. Retained SNPs were found among 118’756 exons and 17’685 genes. Overall 611’711 out of 677’578 SNPs were located in genes with evidence for transcription.
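The expression criterion can be made concrete with the textbook FPKM definition. The sketch below is not the edgeR computation, which may apply additional library-size normalization; the function names and example counts are hypothetical, and only the 0.3 threshold comes from the text.

```python
def fpkm(fragment_count, feature_length_bp, total_mapped_fragments):
    """Fragments per kilobase of feature per million mapped fragments."""
    return fragment_count * 1e9 / (feature_length_bp * total_mapped_fragments)


def is_transcribed(fpkm_per_organ, threshold=0.3):
    """True if the feature exceeds the FPKM threshold in at least one organ."""
    return any(value > threshold for value in fpkm_per_organ)


# Hypothetical exon of 1 kb in two organs with 10 million mapped fragments each.
liver = fpkm(30, 1000, 10_000_000)   # 3.0
retina = fpkm(1, 1000, 10_000_000)   # 0.1
print(liver, retina, is_transcribed([liver, retina]))
```

Because a single organ above the threshold is sufficient, tissue-specific genes are retained even if they are silent in most of the ten sampled organs.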
Deleterious mutations are assumed to be overwhelmingly derived mutations. We used all ibex species except Alpine and Iberian ibex as an outgroup to define the derived state. For each biallelic site, which was observed in alternative state in Alpine ibex or Iberian ibex, the alternative state was defined as derived if its frequency was zero in all other species (a total of 44’730 autosomal SNPs). For loci with more than two alleles, the derived state was defined as unknown. For comparisons among all species, we only used the following criteria to select SNPs (370’853 biallelic SNPs retained): transcriptional activity (FPKM > 0.3 in at least one organ), GERP > −2 and a minimal distance to the next SNP of 3bp.
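The polarization rule described above can be written as a short predicate. This is a minimal sketch assuming per-species alternative allele frequencies are already available for a biallelic site; the function name, dictionary layout and example frequencies are hypothetical.

```python
def alt_is_derived(focal_alt_freqs, outgroup_alt_freqs):
    """Apply the rule from the text to one biallelic site.

    The alternative allele is called derived if it is observed in at least
    one focal species (Alpine or Iberian ibex) and has frequency zero in
    every other species.
    """
    seen_in_focal = any(freq > 0 for freq in focal_alt_freqs.values())
    absent_elsewhere = all(freq == 0 for freq in outgroup_alt_freqs.values())
    return seen_in_focal and absent_elsewhere


focal = {"C. ibex": 0.12, "C. pyrenaica": 0.0}
outgroups = {"C. sibirica": 0.0, "C. nubiana": 0.0, "C. falconeri": 0.0,
             "C. aegagrus": 0.0, "C. aegagrus hircus": 0.0}

print(alt_is_derived(focal, outgroups))                         # True
print(alt_is_derived(focal, dict(outgroups, **{"C. nubiana": 0.25})))  # False
```

Sites where the alternative allele also appears in any outgroup species are left unpolarized rather than guessed, which keeps the derived set conservative.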
Individual-based simulations with Nemo
Individual-based forward simulations were run using the software Nemo 38 v.2.3.51. A customized version of aNEMOne 55 was used to prepare input files for parameter exploration. The sim.ini file for the final set of parameters run in 100 replicates is available as Supplementary File 1. All populations relevant for the founding of the populations under study were included in the model. See Figure 6A for the simulated demography, which was modeled with the actual founder numbers (assuming a sex-ratio of 1:1), while the translocations were simplified into four phases (data from 37, DRYAD entry doi:10.5061/dryad.274b1 and 36). The harmonic mean of the population census from the founding up to the final sampling year (2007) was used to define the population carrying capacity. Mating was assumed to be random and fecundity (mean number of offspring per female) set to five. The selection coefficients of 5000 biallelic loci subject to selection were drawn from a gamma distribution with a mean of 0.01 and a shape parameter of 0.3 resulting in s < 1% for 99.2% of all loci 56 (Figure S16). Based on empirical evidence, we assumed a negative relationship between h and s 57. We used the exponential equation h = exp(−51*s)/2 with a mean h set to 0.37 following 58. We assumed hard selection acting at the offspring level. In addition to the 5000 loci under selection, we simulated 500 neutral loci. The recombination rate between any two neutral or deleterious loci was set to 0.5, which corresponds to an unlinked state. Initial allele frequencies were set to μ / (h * s) = 0.0014 (corresponding to the expected mean frequency at mutation-selection balance 59). Mutation rate μ was set to 5e-05 and deleterious mutations were allowed to back-mutate at a rate of 5e-07.
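The negative relationship between dominance and selection coefficient given by h = exp(−51*s)/2 can be tabulated directly. The sketch below only evaluates that equation; the function name and the chosen values of s are illustrative and are not parameters from the simulation input file.

```python
import math


def dominance(s, rate=51.0):
    """Dominance coefficient h = exp(-rate * s) / 2 for selection coefficient s."""
    return math.exp(-rate * s) / 2.0


for s in (0.0, 0.001, 0.01, 0.1, 0.5):
    print(f"s = {s:<5}  h = {dominance(s):.4f}")
```

Under this function a neutral allele is additive (h = 0.5), whereas strongly deleterious alleles approach complete recessivity, which is the condition under which inbreeding exposes them to purging.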
A burn-in of 3000 generations was run with one population (N = 1000) representing the entire species, allowing it to reach a quasi-equilibrium. N was reduced to N = 500 for five generations before a brief, two-generation bottleneck of N = 80. At generation 3007, the population recovered to N = 1000 and three generations later the reintroduction was started with the founding of the two zoos Interlaken Harder (ih) and Peter and Paul (pp). The founding of new populations was modeled by migration of offspring into an empty patch.
The zoo ih (Interlaken Harder) and several populations did not survive all replicates of the simulations. Extinction rates were as follows: ih (Zoo Interlaken Harder) 84%, bo (Bire Öschinen) 3%, wh (Weisshorn) 3%, ob (Oberbauenstock) 9%, am (Alpi Marittime) 14% and pi (Pilatus) 2%. The high extinction rate of the zoo Interlaken Harder did not affect the outcome of the simulations. The extinctions were a result of the strong reduction in population size during the founding and always occurred after the founding (see also Figure S15). The extinctions of the reintroduced populations did not affect the estimates of derived allele counts but reduced sample sizes and, hence, affected the variance of estimators.
Data availability
Raw whole-genome sequencing data produced for this project was deposited at the NCBI Short Read Archive under the Accession nos. SAMN10736122–SAMN10736160 (BioProject PRJNA514886). Raw RNA sequencing data produced for this project was deposited at the NCBI Short Read Archive under the Accession nos. SAMN10839218-SAMN10839227 (BioProject PRJNA517635).
Author contributions
Conception and design of study: CG, DC, LFK. Acquisition and analysis of data: CG. Interpretation of data: CG, DC, LFK, FG. Funding: CG, LFK. Wrote the manuscript with input from the other authors: CG, DC.
Acknowledgments
We thank the following organizations and colleagues who contributed samples to this project: Iris Biebach, the Swiss hunting authorities of the cantons of Bern, Nidwalden, Obwalden, Uri, Graubünden and Wallis; the Gran Paradiso National Park (Alice Brambilla) and the Alpi Marittime National Park (Laura Martinelli), Sebastien Regnaut and Richard Kock, Zoological Society of London, Christian Siegenthaler, Ruedi Kunz and Samer Angelone-Alasaad. We are thankful to Glauco Camenisch and Kasia Sluzek, who provided access to Alpine ibex RNAseq datasets. We thank Laurent Excoffier, Stephan Peischl, Kimberly Gilbert, Heidi Lischer, Stefan Wyder, Thomas Wicker, Alan Brelsford and Jessica Purcell for helpful advice and comments on a previous version of the manuscript. We are grateful for drawings by Nadine Coline of the Zoological Museum of Zürich. This work was supported by the University of Zurich through a University Research Priority Program “Evolution in Action” pilot project grant and the Swiss Federal Office for the Environment. DC and CG were supported by the Swiss National Science Foundation (grant 31003A_173265 and 31003A_182343, respectively). This study makes use of data generated by the NextGen Consortium, which was supported by grant agreement number 244356 of the European Union’s Seventh Framework Programme (FP7/2010-2014).