The ghost of selection past: evolution and conservation relevance of the kākāpō color polymorphism

The analysis of population genomic data of entire species can inform us about their evolution and conservation status. We leveraged detailed phenotypic and genomic data of nearly all living kākāpō, an endangered and culturally significant parrot endemic to Aotearoa New Zealand, to describe the genomic architecture of its feather color polymorphism. Green and olive feather coloration are at similar frequencies in the population, which currently numbers less than 250 birds. We further dissected the color phenotype, demonstrating that the two colors differ in their UV reflection due to feather structure. We use quantitative genomics methods to identify two genetic variants whose epistatic interaction can fully explain the species’ color phenotype. The associated genomic region and our phenotypic analyses suggest a potential link to the candidate gene LHX8 , involved in tissue patterning, tissue differentiation, and bone morphogenesis. Our genomic forward simulations show that balancing selection constitutes a likely evolutionary mechanism for the polymorphism to become established in an ancestrally large effective population size, and to be maintained through a continuous population decline, including a severe bottleneck of the kākāpō population. The combination of our genomic, phenotypic, and simulation analyses lets us propose the role of extinct apex predators as the likely agent of such selection, making the color polymorphism in the kākāpō a “ghost of selection past”.


Introduction
About a century ago, population geneticists began to study the evolution of polymorphisms at genetic loci 1 .We have now reached a milestone where we can examine nucleotide variation across the entire genome of every single individual of a species.This remarkable feat has been accomplished for the kākāpō (Strigops habroptilus) 2 , a critically endangered flightless parrot species endemic to Aotearoa New Zealand that is considered a taonga (treasure) of Ngāi Tahu, a Māori iwi (tribe) of Te Wai Pounamu (the South Island of Aotearoa) [3][4][5] .The genomic data that has been generated for this species now enables us to study population genomics at an unprecedented resolution, improving our understanding of the evolutionary forces acting on the genomes of this threatened species with the potential to inform their conservation and recovery in partnership with Ngāi Tahu 2,4,6,7 .
In this study, we explore the puzzling phenomenon of the kākāpō feather color polymorphism ("green" and "olive" phenotypes) and how this severely bottlenecked species managed to retain its phenotypic diversity.The kākāpō has experienced a population decline and significant genetic drift over the past 30,000 years (30 KYA), followed by a sharp, anthropogenically induced population bottleneck resulting in just 51 surviving individuals in 1995 7 .Under the close management of the Aotearoa New Zealand Department of Conservation Te Papa Atawhai Kākāpō Recovery Programme in partnership with Ngāi Tahu 4,8 , the population size has increased to 247 birds located on several predator-free islands [as of September 1 st 2023].Despite these severe bottleneck events, the population exhibits a relatively balanced distribution of green (n=91; 53.8%) and olive (n=78; 46.2%) individuals since the onset of this study [as of January 1 st 2018].To understand the evolution of this color polymorphism, we here assess the phenotype's genomic basis and how the genetic polymorphism was first established (i.e., was not randomly lost in a large ancestral population) and has since been maintained in the population (despite population decline and severe population bottleneck).
Population genetic theory stipulates that newly emerged genetic variants (i.e., mutations), are rarely expected to reach appreciable allele frequencies when they evolve neutrally since often they get lost through genetic drift 9 .Kimura's neutral theory of molecular evolution suggests that most genetic polymorphisms are neutral (or slightly deleterious) because if a mutation conferred selective advantage, it would be expected to rise to fixation, replacing the ancestral allele 10 unless the frequency rise of such beneficial variants is halted by one or more counteracting evolutionary forces.Balancing selection can theoretically balance allele frequencies and produce stable genetic polymorphisms, for example through negative frequency-dependent selection (NFDS), overdominance, antagonistic selection, or temporally and spatially varying selection 11 .To understand the establishment and maintenance of genetic polymorphisms, we therefore need to first reject the null hypothesis of neutral evolution, and then describe the evolutionary forces that help maintain the genetic polymorphism 12,13 .
Understanding the establishment and maintenance of the kākāpō polymorphism might be of importance for ongoing conservation efforts since the significant majority (nearly 70%) of the wild founder population from 1995 were of green color.Whether the color polymorphism is likely to impact fitness in the absence of current intensive conservation management is therefore of potential conservation relevanceespecially given the long-term plans of the Kākāpō Recovery Programme and Ngāi Tahu to restore this critically endangered species beyond predator-free islands.Why the kākāpō species has maintained the green and olive phenotype has so far remained elusive.Before the arrival of tūpuna Māori in AD 1280 14 , the only predators of the kākāpō were avian 15 , the vast majority of which hunt by sight.While the apex predators, the Haast's eagle (Hieraaetus moorei) and the Eyles' harrier (Circus teauteensis), went extinct approximately 600 years ago 14 , we hypothesize that the kākāpō color polymorphism could be a consequence of selective pressures through this past predation.For example, NFDS through search image formation in the avian predators can theoretically maintain such polymorphisms 16 .
To explore which hypothesis best explains the kākāpō color polymorphism, we created and analyzed detailed phenotypic data of kākāpō plumage together with high-coverage genomic data of nearly the entire kākāpō species (n=169; as of January 1 st 2018) 2,11 .We found that two epistatically interacting single-nucleotide polymorphisms (SNPs) at the end of chromosome 8 can explain the color polymorphism of all individuals.The nucleotide divergence between the two haplotypes containing a derived allele predicts that the color polymorphism evolved circa 1.93 MYA, around the time when the main kākāpō predators evolved.Using genomic forward simulations, we determined that the polymorphism's establishment is highly unlikely under neutral evolution, while any selective advantage of the novel color morphology would have likely led to its rapid fixation.We show that the effects of balancing selection would have been sufficient to establish the polymorphism in the large ancestral population.Our simulations show that the polymorphism could have subsequently been maintained despite declining population size and a severe kākāpō population bottleneck, and despite dramatic ecological changes in Aotearoa New Zealand approximately 600 years ago when several indigenous bird species including the major natural kākāpō predator species became extinct 14 .Based on all this evidence, we propose that now-extinct apex predators were the likely agent of balancing selection, making the color polymorphism in the kākāpō a "ghost of selection past".

Phenotypic data
We firstly created a catalog of the feather color polymorphism of the kākāpō species.Wherever possible, we did so through standardized photography of living birds followed by manual annotation (Figure 1A; Material and methods); in the case of deceased birds, we used historical photographs (Supplementary Table 1, Photography sheet).As green and olive individuals are not always easy to distinguish, we used Commission on Illumination L*a*b* (CIELAB) color space analysis, which projects any color into a two-dimensional color space a*b that is perpendicular to and therefore independent of luminance L to define objective criteria of green and olive colors (Material and methods).By analyzing photographs of kākāpō with distinct green or olive color, we found that the median and spread of a is predictive of the color morphology (Material and methods).We then leveraged this CIELAB analysis to annotate all kākāpō individuals for which we had any doubt in terms of their phenotypic classification (Supplementary Table 1, CIELAB sheet).Based on this phenotypic catalog, we found that the relative amount of olive individuals has significantly increased since the discovery of the last remnant kākāpō population in the wild (linear regression R 2 =0.289; p= 0.000132; Figure 1B).Figure 1.The kākāpō feather color polymorphism.A. Green (left) and olive (right) kākāpō individuals; the photographs show the kākāpō individuals Uri (left) and Bravo (right) who are full siblings.The inserts show the standardized photography that was applied for color assessment across the extant kākāpō population (Material and methods).B. Increase in the proportion of olive in the kākāpō population since discovery of the last remnant wild population, which was predominantly green (67.4%).The size of the dots reflects the number of individuals per year.Wherever known, year of hatching is indicated; otherwise, year of discovery in the wild is indicated.All discoveries in 1990 and pre-1990 are binned as "1990".Linear regression modeling (weighted for sample size) confirms a significant correlation between time and the proportion of olive individuals (R 2 =0.289; p= 0.00013).

Genomic data
Our previous population genomic analyses identified 2,102,449 SNPs in Illumina short-read sequencing data across 169 kākāpō individuals using the reference genome bStrHab1.2.pri (NCBI RefSeq assembly accession number GCF_004027225.2) 2,5,7 and the genetic variant calling tool DeepVariant 2,17 .We subsequently excluded the kākāpō individual Konini 3-4-16 since this bird died at 26 days of age, i.e., before its feather color could be determined (kākāpō only possess white down when hatching with adult plumage color becoming apparent at about 50 days of age).We filtered the SNP set according to sequencing depth, quality, minor allele frequency (MAF), genotype missingness, biallelicity, and extreme deviations from Hardy Weinberg equilibrium (HWE) to obtain a final set of 1,211,090 highquality SNPs (Material and methods).We visualized this genomic data through principal component analysis (PCA) 14 , whose first ten axes did not show any obvious genome-wide difference between green and olive individuals (Figure 2A; Material and methods).

Genome-Phenome associations
We used Bayesian multiple regression modelling as implemented in BayesR 18 to estimate the heritability of the kākāpō color polymorphism at 77.7% (95% credible interval of 59.9-89.2%;Material and methods).The chromosome partitioning analysis, which calculates heritability per chromosome, shows that nearly the entire genetic variance (> 99%) lies on chromosome 8 (Figure 2B; Material and methods).Within-population genomic prediction of color using tenfold cross-validation in BayesR resulted in an area under the receiver operating characteristic curve (AU-ROC) of 0.99 (Figure 2C; Material and methods).
We then used RepeatABEL 19 to perform a mixed-model Genome-Wide Association Study (GWAS) across all 168 kākāpō individuals while accounting for relatedness and other random as well as fixed effects (Material and methods).We identified one genomic region strongly associated with color polymorphism (Figure 2D) that contained the top-hit SNP Chr8_63055688 (p=10 -40 ).As RepeatABEL is based on linear mixed-effect models which do not directly account for binary dependent variables, we confirmed this result using PLINK 14 , which only considers fixed effects, but directly models binary phenotypes through logistic regression; we further did not identify any genome-wide significant associations with a dataset of small insertions and deletions (indels) generated by Guhlin et al. (2023)  2 (data not shown).The top SNP for both the heritability test and the GWAS, Chr8_63055688 (A|T alleles), explains the color polymorphism of nearly the entire extant kākāpō population; only the phenotypes of five individuals (Egilsay, Elliott, Percy, Dusky and Te Atapō) were misclassified when assuming a dominant effect of the alternative T allele encoding the olive color morphology.For example, Te Atapō is heterozygous but of green color.We therefore applied a genetic algorithm 20 to 1100 SNPs located in or close (within 400k nucleotides) to the significantly associated genomic region to find combinations of SNPs that can fully describe the phenotypic distribution (Material and methods).This test identified a dominant-epistatic interaction with a second SNP as the most probable genomic architecture of the color polymorphism.This SNP, Chr8_63098195 (T|G alleles), was the second-best GWAS hit (p=10 -31 ).While the phenotype of the kākāpō individuals Egilsay and Elliott could first not be explained correctly by the suggested dominant-epistatic interaction, we investigated our data further and identified a genomic sample swap between these two individuals.Taking this sample swap into account, the color polymorphism of every kākāpō individual can be explained by the genetic rule that olive individuals must carry at least one alternative T allele at Chr8_63055688 and at least one alternative G allele at Chr8_63098195.These two indicator SNPs are in strong linkage disequilibrium (LD) (r 2 =0.696) but are interspersed across genomic regions of low LD (r 2 <0.1; Materials and methods; Supplementary Figure 1A,) and therefore the entire genomic region containing both SNPs is not under strong linkage (i.e., undergoes recombination).To discard the impact of any large genomic rearrangements such as inversions or of uncalled SNPs and indels, we investigated the mean sequencing depth across individuals in this genomic region.The mean depth as captured through a slidingwindow approach remains stable across the genomic region, with a few decreases in depth at individual genetic sites (Supplementary Figure 1B; Material and methods).

Out-of-sample predictions
We next validated the epistatic effect of the two indicator SNPs on the color morphology (Material and methods).As Te Atapō was the only kākāpō whose green phenotype could be explained by reference homozygosity at Chr8_63098195 (TT) while heterozygosity at the top SNP Chr8_63055688 (AT) would have predicted an olive phenotype, we analyzed the genomic and phenotypic data of all offspring of Te Atapō to confirm our predicted epistatic effect: Te Atapō (genotypes for Chr8_63055688 and Chr8_63098195 and phenotype: AT, TT, green) had offspring with Pura (AT, TG, olive) and Tumeke (AA, TT, green).The offspring of Te Atapō and Pura were Uri (AT, TT, green; Figure 1A, left), Bravo (TT, TG, olive; Figure 1A, right), and Hanariki (TT, TG, olive); the offspring of Te Atapō and Tumeke were Tutū (AA, TT, green) and Meri (AA, TT, green).These observations support the hypothesis of the dominantepistatic effect of the indicator SNPs on the color morphology in kākāpō.
We further investigated the genomic underpinnings of a deceased yellow kākāpō individual to examine its genotype (Material and methods).We found this individual to be genetically olive (AT, TG), and the yellow color might therefore be a consequence of discoloration, which is known to occur in parrots due to malnutrition and viral diseases 21 .

Candidate gene analysis
We next identified all genes in the annotated genomic regions to find underlying candidate genes (Supplementary Table 2; Supplementary Figure 1C).Both indicator SNPs lie in predicted intergenic and intronic regions, respectively, so their molecular functional consequence is not straightforward to assess.Furthermore, the gene TNNI3K, in whose first intron the SNP Chr8_63098195 falls, has weak annotation support according to the NCBI XM track (NCBI RefSeq assembly accession number GCF_004027225.2) and the RNA-seq exon coverage (Supplementary Figure 1C).The most likely gene to be involved in color polymorphism according to its known function is LHX8, which lies approximately 200k nucleotides downstream of the indicator SNPs.The protein encoded by this gene is a transcription factor and a member of the LIM homeobox family of proteins, which contains two tandemly repeated cysteine-rich double-zinc finger motifs known as LIM domains in addition to a DNA-binding homeodomain.LHX8 is known to be involved in the patterning and differentiation of various tissue types and plays a role in tooth morphogenesis (Supplementary Table 2).This candidate gene could therefore point to an underlying structural color morphology of the kākāpō color polymorphism.

Optical Analyses
To evaluate the hypothesis of a structural color morphology in kākāpō feathers and its potential functional consequences, we used Scanning Electron Microscopy (SEM) to investigate the surface of three olive and three green feathers (Figure 3A; Material and methods).We found that the green feathers had a smoother surface, especially at a resolution of 5500x (Figure 3A).We then subjected the feather tips to near-infrared/visible photoreflectometry to assess if the difference in surface modality had an impact on light reflectance (Figure 3B); we indeed found evidence that besides differences in the visible wavelength spectrum (400-700 nm), the olive feathers reflected less light in the UV spectrum (<400 nm; Figure 3B).This might suggest potential differential fitness consequences between the color morphologies given that many avian and other predator species can see UV light.

Evolutionary origin
We identified the green allele to be the ancestral allele of our indicator SNPs of interest by leveraging the high-quality reference genome of the most closely related species, the kea (Nestor notabilis), which diverged from the kākāpō about 60-80 MYA 22 (Material and methods).Using principles of evolutionary genetic theory about haplotype divergence and assuming a generation time of 15 years, we estimated the genetic polymorphism to have arisen around 128,500 generations ago (95% confidence interval of 39,300-302,000 generations) or 1.93 MYA (0.59-4.52 MYA) (Material and methods).

Establishment and maintenance of color polymorphism
We tested competing neutral and adaptive hypotheses for the establishment and maintenance of the color polymorphism with individual-based forward simulations in SLiM3 23 (Material and methods).We found that the probability of establishment (i.e., not lost by genetic drift) of the color polymorphism in a large ancestral population (Ne=36,000 7 ) was less than one in a million if we assume neutral evolution (Supplementary Figure 2A).If we assumed a selective advantage, the probability increased modestly; for a small selective advantage of 1%, the novel olive color had an establishment probability of ~500 in a million.For a much larger selective advantage of 20%, the probability increased to nearly ~5000 in a million.While the probability of establishment is therefore overall low and requires a selective advantage (Supplementary Figure 2A-B), our simulations also show thatonce establishedthe newly evolved color morphology would have rapidly swept to fixation under pure adaptive evolution (Figure 4A).Hence, we also reject the hypothesis of positive selection.Instead, our simulations favor the hypothesis that balancing selection, modelled here as a trait under NDFS, is the most likely evolutionary scenario that can explain the establishment of the polymorphism in the ancestral population (Figure 4B; Supplementary Figure 2C).As the longest time for any establishment event in our simulations was 8000 generations (mean=5182; sd=500), our empirical estimate that the polymorphism has arisen 128,500 generations ago would have given plenty of time for the polymorphism to be established in the population (Supplementary Figure 2D; Supplementary Figure 3).
Finally, we modelled the dynamics of the maintenance of the color polymorphism during the last 1000 generations of the declining demographic history of the kākāpō, including the severe bottleneck before 1995 (Figure 4C; Material and methods).The kākāpō ecosystem has changed dramatically in the past approximately 600 years (40 generations), with several indigenous bird species including the major natural kākāpō predators going extinct within 200 years of arrival of tūpuna Māori roughly 700 years ago 14 .Given that our optical analyses pointed to a potential role of predation due to differential UV reflection between green and olive feathers, we here tested under which condition the polymorphism could have been maintained in the extant kākāpō population after being established by NFDS.We tested the following competing scenarios using computer simulations: (1) NFDS was completely removed for all 1000 generations (as negative control); (2) NFDS was removed for the last 40 generations (i.e., since the extinction of the predators); or (3) NFDS was removed for the last 40 generations, but genetic load accumulated in alternative color haplotypes to mimic the effect of associative overdominance (Figure 4D).We were able to marginally reject scenario 1 since under pure genetic drift the polymorphism is often lost (p < 0.05).Scenarios 2 and 3 led to a much higher rate of balanced polymorphisms, with associative overdominance increasing the chances of this happening (Figure 4D).The observed color polymorphism might have therefore been maintained by chance without any form of balancing selection in the past 600 years.

Discussion
By combining deep genomic and detailed phenotypic data with computer simulations and optical analyses, we have studied the origin and evolution of the kākāpō feather color polymorphism.We describe the putative genomic architecture of this phenotype and establish that a dominance epistatic interaction of two biallelic single-nucleotide genetic variants can explain the phenotype of all 168 birds, which had paired genomic and phenotypic data.Integrating this genomic architecture with genomic forward simulations, we explain the evolution and puzzling maintenance of this balanced polymorphism in the heavily bottlenecked kākāpō population.
Established genetic polymorphisms are rare and challenging to identify due to the need to rule out neutral evolution and determine the evolutionary forces maintaining them, such as balancing selection 11 .Rejecting neutral evolution requires assessing the genetic basis, fitness consequences, and potential selective agents of the phenotype.If directly measuring fitness effects is not feasible, balancing selection can still be inferred by rejecting neutral evolution and positive selection as explanations.Here we estimated that the kākāpō feather color polymorphism was established around 1.93 million years ago as an oligogenic trait, with green as the likely ancestral phenotype.Through genomic simulations, we found that neutral and positive evolution were highly unlikely, suggesting balancing selection through NFDS as a driver in establishing this polymorphism.
We hypothesize that the selective agent of such balancing selection could have been the two major natural kākāpō predators, the Haast's eagle and Eyles' harrier, which evolved at around the same time as our observed genetic polymorphism, approximately two million years ago 15 and went extinct approximately 600 years ago 14 .The Haast's eagle evolved in the late Pliocene/early Pleistocene when it diverged from the little eagle 2.22 MYA (95% credibility interval of 1.41-3.25 MYA), coinciding with the increase of open woodlands and grasslands at the onset of Pleistocene glaciations in Aotearoa Zealand about 2.5 MYA.The Eyles' harrier diverged from the spotted harrier around 2.4 million years ago 15 .Our hypothesis is that the kākāpō color phenotypes were maintained by balancing selection, i.e.NFDS, through the predator's search image formation 16 .This mechanism allows predators to detect cues associated with the prey species, and in this sense being the rarer phenotypes confers an adaptive advantage.Our simulations further show that after the extinction of both predator species circa 600 years ago, the genetic polymorphism could have been maintained due to the inherent inertia in allele frequency changes.The genomic signature of the color polymorphism and the dating of its origin and evolutionary trajectory therefore provide the first evidence that the color polymorphism in the kākāpō is likely to be a relic, or "ghost", of balancing selection by now-extinct apex predators.This predation hypothesis would fit the role of our proposed potential candidate gene, LHX8, whose known role in tissue differentiation and tooth morphogenesis might point to structural morphology as the source of the kākāpō color polymorphism.Structural colors are prevalent in birds and have evolved numerous independent times, and, different to simple pigmentation colors, are produced by nanometer-scale biological structures that differentially scatter, or reflect, wavelengths of light 21 .UV-modulating structural colors of feathers of parrots and many other birds are often produced in the feather barbs, and Dyck (1978) noted many years ago that non-iridescent green and olive colors in particular are produced by structural differences in feather barbs 26 .Our analysis of light reflectance of olive and green feathers confirmed differential reflectance patterns, both in the visible as well as in the UV spectrum.Notably, we found that green feathers reflect more light in the UV spectrum.The derived phenotype of olive color might have provided an initial evolutionary advantage, given that it is likely that the Haast's eagle and Eyles' harrier were able to see light in the UV spectrum, before balancing selection might have led to the establishment of the color polymorphisms.
We, however, emphasize that no functional experimental validation of the genomic architecture or of the potential candidate gene is possible in this critically endangered and taonga species.We, therefore, acknowledge that it remains a possibility that we have failed to identify the actual causal genetic variation.While the mean sequencing depth is stable across our genomic region of interest, a few individual genetic sites show a sudden drop in coverage and could potentially harbor undetected genetic variants.As our genetic algorithm that predicts the potential dominant-epistatic interaction between our two indicator SNPs is computationally expensive, we further only focused on our genomic region of interest (at the end of chromosome 8).This means that we ignored the potentially modulating effects of genetic variants outside of this genomic region.Because of the reliance on short-read sequencing data, we might have also missed the impact of complex structural variation, which could be assessed more robustly using long-read sequencing technology in the future.When it comes to the functional annotation of our genomic region of interest, we have further found that the gene annotation of the utilized kākāpō reference genome is of modest confidence, and, for example, lacks the annotation of many open reading frames 2 .An inadequate annotation at the end of chromosome 8 could therefore have prevented us from identifying the correct causal gene or gene pathway.As the kākāpō are doing very well in their recovery thanks to the efforts of the Kākāpō Recovery Programme and Ngāi Tahu (as of September 1 st 2023: 247 living kākāpō individuals), we are however positive that more samples and data will be available in the future to re-assess our hypothesis.
Our understanding of the evolution and maintenance of the kākāpō color polymorphism can inform ongoing conservation efforts.Assuming that balancing selection has regulated the polymorphism through now-extinct apex predators, we contend that its loss would not confer a negative fitness effect or cost to population viability in the present-day environment.Without predator-mediated NFDS or any other form of balancing selection and without any intense conservation management, we predict that the polymorphism is likely to be lost by drift within the next 33 generations (5-95% CI = 10-94 generations).In view of the aim of the kākāpō conservation management to re-establish the species on the Aotearoa New Zealand mainland, our results might suggest that the feather polymorphism is not necessarily a trait that has to be prioritized by the conservation programme for the new founding population.As this re-establishment has just taken its first step with the historic release of ten kākāpō to Sanctuary Mountain Maungatautari, Waikato (a fenced reserve on the mainland), In 2023, we are hopeful that our and others' genomic analyses will play a role in future kākāpō conservation efforts in partnership with Ngāi Tahu.

Phenotypic analyses
As part of the Department of Conservation Kākāpō Recovery Programme in partnership with Ngāi Tahu, we assessed the color polymorphisms of the extant kākāpō population.We developed a standardized protocol to be used whenever a kākāpō individual undergoes their regular health monitoring: we used a mobile phone camera (Samsung S20 FE; flash switched on) to take photographs through a short white plastic pipe while directing the focus on the feathers, ensuring that natural light was kept out to standardize luminance as much as possible.For every bird, we chose an area of undisturbed feathers on the back of its neck that contained as many feather tips as possible.We also included historical photographs of deceased birds, resulting in an extensive color catalogue of 192 individuals (Supplementary  1, CIELAB sheet).As L is therefore an independent axis perpendicular to the color axes a and b, differences in luminance will not have an impact on the a*b projection of the photograph.The color axes a and b describe the green-red and blue-yellow color components, respectively.In the case of the kākāpō feathers, a negative median of a of approximately −10 and a wide spread of a of approximately >20 defines the green phenotype (Supplementary Figure 4A), while a very sharp a peak (spread of a of <20) at a median of approximately 0 defines the olive phenotype (Supplementary Figure 4B), with the b distribution being relatively constant across both phenotypes (Supplementary Figure 4).
Based on this final color polymorphism catalogue, we then calculated the temporal change of the proportion of olive birds in the kākāpō population.We used the hatching year wherever available; for individuals that had been discovered in the wild, we used the year of discovery.We set all discovery years from before 1990 to the year 1990 to summarize the ratio of olive birds in the entire surviving wild kākāpō population.
For the genomic region downstream of chr8_63000000, we further performed the following analyses.We calculated pairwise LD (r 2 between each SNP pair) using PLINK 28 v1.9.We used SAMtools 27 v1.12 to calculate the per-base sequencing depth across all samples (samtools depth).We further used BEDTools 29 v2.29.2 to average the depth across sliding windows of length 10kb and with a step size of 1kb (bedtools makewindows and bedtools intersect).We further accessed the bStrHab1.2.pri genome assembly (NCBI RefSeq assembly accession number GCF_004027225.2) and its associated RNA-seq exon coverage data (aggregate (filtered), Annotation release 101) to annotate and visualize gene annotation.

Genome-Phenome analyses
We used a Bayesian mixture approach implemented in BayesR 18 to estimate the heritability of the color polymorphism (BayesR version: https://github.com/syntheke/bayesR.git [last access: 13/02/23, 23:00 CET]).BayesR models SNP effects as a mixture of four distributions, with one SNP category of effect size 0, and the three other SNP categories reflecting distributions around SNP effect sizes of 0.0001, 0.001 and 0.01 of the overall phenotypic variance; this is important since BayesR allows for SNPs to have zero effects on the phenotype, which is often not the case for other models that assume small but existing contributions to the phenotype from every single SNP, interfering with the heritability estimation of mono-or oligogenic phenotypes.The close relatedness between kākāpō individuals does not pose a problem for BayesR since it informs the additive genetic variance that the mixture approach tries to detect.We ran the Markov chain Monte Carlo (MCMC) for 200k cycles (after which low autocorrelation of MCMC convergence was achieved), with a burn-in of 50k cycles, and sampled every 100 th cycle.To assess polygenicity, we then conducted chromosome partitioning by partitioning the estimated heritability across all chromosomes.We also used BayesR to perform within-population phenotypic prediction using ten-fold cross-validation; we assessed the performance by one AU-ROC estimated across all ten splits.
To prevent spurious phenotype-genotype associations, we conducted GWAS using RepeatABEL 19 v1.1 (see Data and code availability) where we modeled the relatedness matrix between individuals as a random covariate.RepeatABEL has been shown to be suitable for binary data, but its power to detect causal SNPs decreases when the proportion of successes for the analyzed trait decreases 19 .To assess its robustness, we compared the results with the standard tool PLINK 28 v1.9, which can explicitly model binary phenotypes but which only takes into account fixed effects.We therefore modeled the first ten PCs of the genomic covariance matrix as fixed effects.
We applied a custom genetic algorithm to all 1100 SNPs located downstream of chr8_63000000 (i.e., starting from chr8_63000328 to the last SNP on the chromosome, chr8_63426960).The custom genetic algorithm is described in more detail in Smallbone et al.  (2021)  20 .Briefly, it takes as input a SNP matrix with SNPs in columns and sample sequences in rows.An additional column that contains the binary codes for the phenotype (0 for green, 1 for olive) acts as a target variable.The algorithm starts by generating a set of random expressions based on the SNPs, such as (S8_63055688_A_T > 0).This is then evaluated against each allele as true or false.The algorithm can also generate compound expressions that are linked with logical operators, for example ((S8_63056070_T_C > 0) AND (S8_63048210_T_G = 1)).Each of the random expressions is then scored against the target expression using a penalized phi-coefficient.The penalty reduces the phi-coefficient by (1 -p) with p as the proportion of missing alleles.The lowest n scoring rules are discardedthe remainder are preserved as the "breeding" population.The algorithm then loops through a user-defined number of "generations", typically 1000 to 5000, using the breeding population from which to mutate or recombine new predictive expressions, plus a number of entirely new random expressions.These are evaluated as described above.The populate of expressions "evolves" to give the best set of predictive expressions.The algorithm is written in Transect-SQL, and implemented in Microsoft SQL Server version 14.0.2002.14.
We further received access to the genotypes of the two indicator SNPs in the next generation of kākāpō chicks (hatched after our data collection's deadline on January 1 st 2018).This genomic data has been generated and processed according to Guhlin et al. (2023) 2 , but is not yet publicly available.

Evolutionary origin analyses
We identified the putative ancestral allele of our SNPs of interest by leveraging the high-quality reference genome of the most closely related species, the kea (Nestor notabilis) 22 .We downloaded the publicly available kea raw paired-end Illumina sequencing data from the National Center for Biotechnology Information (NCBI; accession numbers SRX341179/80/81) to map the reads to the kākāpō reference genome.Raw data processing and analysis was done as described for the yellow museum specimen (see Material and Methods, Out-ofsample analysis).
We then estimated when the polymorphism might have arisen in the kākāpō based on the number of SNPs and the total number of nucleotides in the candidate genomic region, assuming a mutation rate of 1.33×10 -8 substitutions/site/generation (note that both SNPs are putatively located in non-coding genomic regions) 7 .We estimated a length of 2081 nucleotides for the sum of the haplotypes carrying both indicator SNPs.While the two SNPs are in strong LD with each other, they are separated by SNPs that are not in LD with either of the polymorphisms (Supplementary Figure 1A).As each indicator SNP is therefore the only SNP on its haplotype, we estimated their haplotype lengths by halving the distance to the next neighboring SNP, resulting in a length of 1243 nucleotides for the Chr8_63055688 haplotype and a length of 839 nucleotides for the Chr8_63098195 haplotype.We then used a binomial cumulative distribution and an estimated generation time of 15 years to estimate the likely age of the polymorphism.

Genomic simulations
We performed individual-based, forward in time simulations with a Wright-Fisher implementation in SLiM3 23 .We simulated a genomic region of 50k nucleotides around our two indicator SNPs.We scored the color phenotype of all simulated individuals based on the putative epistatic interaction of the two indicator SNPs, using information based on the empirical data that specifies that an individual is olive if it has at least one derived allele at both genetic loci.Demographic trajectory: To simulate the demographic trajectory of the kākāpō population we first reconstructed its effective population size Ne trajectory during the last 1000 generations by combining an existing PSMC 24 estimate based on the kākāpō reference genome 7 , and a LD-based estimate with GONE 25 with our population genomic dataset of 168 individuals (recombination rate of 2cM/Mb).While GONE can reliably estimate Ne until approximately 200 generations into the past or 3 KYA, PSMC analyses are restricted to making inferences between approximately 20 KYA and 1 MYA ago.To connect the recent and ancient Ne trajectories, we assumed that the estimated trajectory of steady decline of the kākāpō population continued between 20 and 3 KYA.
Establishment simulations: We modelled the conditions required to establish a novel olive polymorphism in a large ancestral population.The size of the simulated ancestral population of 36,000 birds was determined as the largest historical population size by PSMC 7,24 , at approximately 2 MYA which was when we estimated the color polymorphism to have evolved.We assumed one variant (SNP1) was segregating at a low frequency of 0.4% in the population before the second SNP (SNP2) occurred as a de novo mutation.We chose the initial frequency of SNP1 (0.4%) by performing a set of simulations with neutral mutations for the same large ancestral population size and took the median MAF at equilibrium.To assess the impact of different MAFs at SNP1, we ensured that rerunning our simulations assuming a gradually increasing MAF (up to 10%) would result in similar conclusions (Supplementary Figure 3).We further assumed that SNP2 appeared because of a random point mutation in this genomic background.If the polymorphism was lost, we would restart the simulation with a new seed number.Otherwise, we ran each simulation for a maximum of 133,000 generations or 2 MYA.We tracked the dynamics of the polymorphism every 500 generations and if the polymorphism was either balanced (proportion of 0.4-0.6)or (nearly) fixed (proportion > 0.95) for ten consecutive times (5000 generations), we stopped the run and considered the polymorphism as established.
We tested three alternative hypotheses for the establishment of the color polymorphism: (1) a neutral model, where being either green or olive color had no selective advantage or disadvantage; (2) a positive selection model, where the novel olive phenotype had a selective advantage ranging between 1% and 20%; and (3) a balancing selection model, where being the rarest phenotype had a selective advantage ranging between 1% and 20% (i.e., NFDS).We ran 10,000 replicates per scenario.
Maintenance simulations: Starting from an established balanced polymorphism in the ancestral population (proportion of 0.4-0.6 of olive individuals), we tested which conditions could lead to the maintenance of such a polymorphism whilst considering the declining demographic history of the kākāpō population, ecological changes, and the impact of genetic drift for the last 1000 generations.We seeded the maintenance simulations with the output of the previous step (i.e., establishment simulations) that included a large ancestral population with a balanced color polymorphism.A total of 100 starting input files were sampled at random to seed the maintenance simulations to account for a range of starting variation.Since balancing selection (NFDS) is required to establish the polymorphism, we tested whether the polymorphism could be maintained as observed in the extant kākāpō population when removing the effect of NFDS.
We simulated three competing scenarios: (1) NFDS was completely removed for all 1000 generations leaving the polymorphism to drift, representing a control treatment where only drift is at play; (2) NFDS was removed for only the last 40 generations (i.e., the approximate time of predator extinction), leaving the polymorphism to drift just at the very end of the simulation; (3) NFDS was removed for the last 40 generations as before, and additionally genetic load accumulated in alternative color haplotypes to mimic the effect of associative overdominance.In this last scenario, the assumption is that the alternative color haplotypes accumulated recessive highly deleterious mutations (s=0.2) at different loci.Selection would then act against individuals that are homozygous and expressing the deleterious effects of these mutations, and this would maintain a balanced polymorphism.
To compare the simulated and empirical data, we first calculated a metric to express the level of the balanced polymorphism (BP) as BP = 2 -2(G 2 + O 2 ), where G is the proportion of green individuals and O is the proportion of olive individuals.The value of BP ranges between 0 (no polymorphism, with one phenotype being lost and the other fixed) and 1 (completely balanced polymorphism with equal numbers of green and olive).The BP in the empirical data is 0.992, reflecting a value close to a perfect balanced equilibrium.Then, we compared the BP of the last step of the simulation (i.e., the extant population) to that of the empirical data.We ran 10,000 replicates per scenario.

Functional analyses
We subjected green and olive feather tips to Scanning Electron Microscopy (SEM) to study their morphology in detail; we worked with uncoated samples to not destroy the culturally valuable feathers.The feathers were attached to the SEM stage using double sided carbon tape.They were then imaged in a JEOL 6700F field emission SEM.We used an accelerating voltage of 3 kV, and the lower secondary detector for imaging.This detector mixes secondary and backscatter electrons together, which lessens the charging effect for some uncoated samples.
We further subjected the feather tips to near-infrared/visible photoreflectometry.We therefore used a dual-beam photospectrometer of the model Shimadzu UV-3101PC with MPC-3100 reflectometry accessory.The machine shines light of a single wavelength on the sample, measures how much of it is reflected, and then changes the wavelength and repeats.double-zinc finger motifs known as LIM domains, in addition to a DNAbinding homeodomain.This family member is a transcription factor that plays a role in tooth morphogenesis.

SLC44A5
Role in transmembrane transport; implicated in hepatocellular cancer.

Figure 2 .
Figure 2. Genome and genome-phenome analysis of the color polymorphism of 168 kākāpō individuals.A. Global genomic PCA, colored according to the color polymorphism of the individual kākāpō.B. Chromosome partitioning of color polymorphism heritability according to BayesR 18 ; only the largest eight chromosomes are annotated.C. ROC and AU-ROC of within-population ten-fold cross-validation when predicting the color polymorphism from genome-wide data using BayesR; all predictions across ten training/validation splits of ~17 individuals each are shown.D. Manhattan plot of the mixed-model GWAS between all bi-allelic SNPs and the binary color polymorphism phenotype 19 ; the horizontal line indicates the genome-wide significance line after Bonferroni multiple testing correction.The genome-wide significant hit at the end of chromosome 8 represents a single SNP, which we therefore discarded as noise.

Figure 3 .
Figure 3. Optical analyses of kākāpō feathers of both color polymorphisms. A. Scanning Electron Microscopy images of three green (left three columns) and three olive (right three columns) at different resolutions (rows); the green feathers show a smoother surface than the olive ones.B. Photoreflectometry of near-infrared/visible wavelengths in the feather tips; the relative reflectance of an exemplary olive and green is plotted over the wavelength of the reflected light.

Figure 4 .
Figure 4. Genomic forward simulations of color polymorphism evolution assuming A. a positive selection model, where the novel olive phenotype had a selective advantage ranging between 1% and 20%; and B. a balancing selection model (i.e., NFDS), where being the rarest phenotype had a selective advantage ranging between 1% and 20%.We ran 10,000 replicates per scenario: establishment of color polymorphism for a range of selection coefficients (0-0.2). C. Demographic history of the kākāpō population as reconstructed with PSMC 24 and GONE 25 (Material and methods).The vertical dashed line marks the time point of 40 generations when the major natural avian predators of the kākāpō went extinct.D. Probability of the simulated color polymorphism being balanced as observed in the empirical data of the extant population under three competing scenarios: (1) NFDS is removed for all 1000 generations, (2) NFDS is removed for the last 40 generations, or (3) same as scenario 2 plus genetic load accumulation.The dashed red line marks the probability threshold of 5% (Material and methods).