High-altitude adaptation in rhesus macaques

When natural populations split and migrate to different environments, they may experience different selection pressures that can lead to local adaptation. For aerobic life, the low atmospheric oxygen content of high altitude living presents a special challenge and a strong selection pressure. Searching for evidence of adaptation to high altitude, we compare the whole genomes of 23 wild rhesus macaques captured at high altitude (mean altitude > 4000m above sea level) to 22 wild rhesus macaques captured at low altitude (mean altitude < 500m above sea level). To capture the genomic patterns of a positive selective sweep, we develop XP-nSL, a haplotype-based genomic scan for differential local adaptation with power to detect ongoing and recently completed hard and soft sweeps. We find evidence of local adaptation in the high-altitude population at or near 303 known genes and several unannotated regions. We find the strongest signal for adaptation at EGLN1, a classic target for convergent evolution in several species living in low oxygen environments. Furthermore, many of the 303 genes are involved in processes related to hypoxia, regulation of ROS, DNA damage repair, synaptic signaling, and metabolism. These results suggest that, beyond adapting via a beneficial mutation in one single gene, adaptation to high altitude in rhesus macaques is polygenic and spread across numerous important biological systems. Impact Summary Extreme environments pose a challenge to life on multiple fronts. Very high-altitude environments are one such example, with low atmospheric oxygen, increased ultraviolet light exposure, harsh temperatures, and reduced nutrition availability. In spite of these challenges, many plants and animals, including humans, have genetically adapted to cope with these hardships. Here we study a population of rhesus macaques living at high altitude and compare their genomic patterns with those of a population living much closer to sea level, searching for evidence of genetic changes that are indicative of adaptation to their environment. When positive selection is ongoing or a beneficial mutation has recently fixed in a population, genetic diversity is reduced in the vicinity of the adaptive allele, and we expect to observe long homozygous haplotypes at high frequency. Here we develop a statistic that summarizes these expected patterns and compares between two populations in order to search for evidence of adaptation that may have occurred in one but not the other. We implement this statistic in a popular and easy-to-use software package. We find evidence for adaptation at a critical gene that helps control physiological response to low-oxygen, one that has been the target of repeated convergent evolution across many species. We also find evidence for positive selection across a range of traits, including metabolic and neurological. This work helps to explain the evolutionary history of the rhesus macaque and furthers our understanding about the ways organisms genetically adapt to high altitude environments.


Introduction 90
Selective sweeps produce regions of reduced genetic diversity in the vicinity of an 91 adaptive mutation. These patterns manifest as long extended regions of homozygous 92 haplotypes segregating at high frequency (Przeworski 2002;Sabeti et al. 2002; Kim and 93 population, XP-nSL contrasts between haplotype pools in two different populations, 117 allowing it to detect differential local adaptation. 118 An extreme example of adaptation to a local environment is the transition to high 119 altitude living. Organisms living at high altitude are confronted with many challenges, 120 including a low-oxygen atmosphere and increased ultraviolet light exposure, and these 121 harsh environments inevitably exert strong selection pressure. Indeed, adaptation to 122 high altitude living has been studied extensively across many organisms from plants, Here we test and evaluate our XP-nSL statistic and apply it to study the genomic 150 consequences of high altitude living in the rhesus macaque. We use it to compare the 151 haplotype patterns of the 23 animals from the high-altitude population with another 22 152 and we extract the demographic parameters for our two of interest. This demographic 168 history is recapitulated in Fig. 1 with detailed parameters given in Table 1, which are 169 then used for simulations to test XP-nSL. 170

A Statistic for Detecting Local Adaptation 171
We develop a cross-population haplotype-based statistic, XP-nSL, to scan for 172 regions of the genome implicated in differential local adaptation between two 173 and " ( ) = 180 . ! ( ) and " ( ) represent the mean #$ ( ) over all pairs of 181 haplotypes carrying either the ancestral or derived allele at locus k, respectively. nSL

Detecting Local Adaptation in Real Data 242
From the phased data set, animals captured at high altitude (n = 23) and animals 243 captured at low altitude (n = 22) were subset (see Table S1). Using selscan v1.3.0 244 (Szpiech and Hernandez 2014) to compute raw XP-nSL scores across the genome, 245 scores were then normalized using the genome-wide empirical background. The low-246 altitude population was used as the reference population, so positive XP-nSL scores 247 correspond to long homozygous haplotypes and a possible sweep in the high-altitude 248 population compared to the low-altitude population, and vice versa for negative XP-nSL 249

scores. 250
In order to identify regions implicated as potentially adaptive, we search for 251 clusters of extreme scores along a chromosome. Using selscan's companion program 252 norm, the genome is divided into non-overlapping 100kb regions and both the maximum 253 XP-nSL score and the fraction of XP-nSL scores > 2 are computed. norm then creates 254 10 quantile bins for windows and identifies the top 1% of windows with the highest 255 fraction of extreme scores. Each window is then annotated with the ensembl rheMac8 256 gene list, and a maximum XP-nSL score is assigned to a given gene based on the max-257 score in the 100 kb window with which it overlaps. If a gene overlaps more than one 100 258 kb window, it is assigned the top max-score from among the windows. 259

Power Analysis of XP-nSL 261
First, we evaluate the performance of XP-nSL based on simulations. After 262 computing XP-nSL for all sites in all simulations, scores were normalized by subtracting 263 the mean and dividing by standard deviation of the neutral simulations, giving the neutral 264 scores an approximately N(0,1) distribution (Fig. S1). 265 We consider the maximum score in a 100 kb interval as way to identify regions 266 under positive selection similar to (Voight et al. 2006). To get the null distribution of max-267 scores, the maximum score is computed in the central 100 kb of all neutral simulations. 268 The distribution of max-scores in neutral simulations had a median of 2.093 with 95% of 269 the mass between 0.804 and 3.492, which is represented in Fig. 2 as a solid horizontal 270 black line (median) and two dashed horizontal black lines (95% interval). Next, the 271 maximum score is computed in the central 100 kb of all non-neutral simulations, and the 272 median and 95% intervals are plotted for each parameter combination. Fig. 2 shows 273 good separation between the neutral distribution of max-scores and the distribution of 274 max-scores for a range of non-neutral parameters, suggesting that our statistic can 275 distinguish between neutral and non-neutral scenarios. Note that soft sweeps that start Next, we consider that due to linkage disequilibrium consecutive scores will be 285 correlated, and we should therefore expect clusters of extreme scores in true non-286 neutral regions. We thus consider a window-based approach to identify selected regions 287 similar to (Voight et al. 2006 neutral simulation is computed in order to get power. The results are plotted in Fig. 3B, 295 which shows improved power over the max-score approach across a wider range of 296 parameters. Indeed, using the window-based method, power to detect soft sweeps 297 improves substantially across the parameter space, with > 75% power to detect soft 298 sweeps at or near fixation that started at frequency ≤ 0.05. 299

Identifying Evidence for Adaptation to High Altitude in Rhesus Macaques 300
Next we analyze the pair of rhesus macaque populations using XP-nSL, 301 searching for evidence of local adaptation in the high-altitude population. Using the low-302 altitude population as the reference population, normalized % > 0 corresponds to 303 evidence for adaptation in the high-altitude population. Dividing the genome into 100 kb 304 windows, the maximum XP-nSL score of that region is assigned to each gene in it (see 305 The genome-wide top ten 100 kb windows based on the percentage of extreme 325 XP-nSL scores are summarized in Table 3, and these windows overlap several genes, 326 including EGLN1. The EGLN1 locus is single strongest selection signal identified in the 327 entire genome (Fig. 4). This locus has the third highest cluster of extreme scores (Table  328 3), contains the highest XP-nSL score in the entire genome (chr1:207,698,003, % = 329 6.54809), and contains six of the top ten genome wide XP-nSL scores (colored dark red 330 in Fig. 4). EGLN1 encodes Hypoxia-Inducible Factor Prolyl Hydroxylase 2 and acts as to EGLN1, other genes related to lung function, oxygen use, and angiogenesis had 336 evidence for differential local adaptation between low-and high-altitude populations: 337 TRPM7, PAXIP1, RBPJ, and ENSMMUT00000040566 (Table S2) critical enzyme in the citrate cycle (Tanaka et al. 1996) and is found in the top ten 346 genome-wide regions (Table 3). A paralog of MDH1, MDH1B, has been previously 347 identified as a target of selection in humans living at high altitude (Yi et al. 2010). 348 ACADM and COX15 are also found in putatively adaptive regions (Table S2) STXBP5L also appears in the top ten regions (Table 3) (Table S2). JAG2 is involved in the Notch signaling pathway that is 361 crucial for neuronal differentiation in the mammalian brain (Cardenas et al. 2018). 362 TRPM7, in addition to its association with PAH, plays a role in hypoxic neuronal cell 363 death (Aarts et al. 2003).  (Table 3) and has been shown to promote DNA double-strand break repair 370 (Ismail et al. 2015). Furthermore, the DNA polymerase POLH also appears in a 371 putatively adaptive region (Table S2) and is known to be able to efficiently bypass 372 pyrimidine dimer lesions (Zhang et al. 2000). Life at high altitude presents extreme biological challenges, including low 393 atmospheric oxygen, increased UV exposure, harsh winters, and reduced nutrition 394 availability, which create strong selection pressure. Organisms that survive and persist 395 are likely to be carrying genetic mutations that confer an advantage for living in such 396 harsh environments. Common targets for adaptation to such an environment include 397 genes related to hypoxia, regulation of ROS, DNA damage repair, and metabolism. 398 Indeed, in the high-altitude macaque population, we identify a strong signal of positive 399 selection at the EGLN1 locus (Fig. 4), a classic target for adaptation to low-oxygen 400 environments, in addition to 302 other genes, many of which are related to the myriad 401 environmental selection pressures expected in high altitude environments. These results 402 suggest that, rather than a single adaptive mutation at a single locus, adaptation to this 403 extreme environment by rhesus macaques is polygenic and acts through multiple 404 biological systems. 405

406
The authors would like to thank members of the Stevison Lab for helpful discussions 407 and Lawrence Uricchio for helpful comments on early versions of the manuscript. This