Elephant Genomes Elucidate Disease Defenses and Other Traits

Disease susceptibility and resistance comprise important factors in conservation, particularly in elephants. To determine genetic mechanisms underlying disease resistance and other unique elephant traits, we estimated 862 and 1,017 potential regulatory elements in Asian and African elephants, respectively. These elements are significantly enriched in both species with differentially expressed genes involved in immunity pathways, including tumor-necrosis factor which plays a role in the response to elephant endotheliotropic herpesvirus (EEHV). Population genomics analyses indicate that amplified TP53 retrogenes are maintained by purifying selection and may contribute to cancer resistance in elephants, including less malignancies in African vs. Asian elephants. Positive selection scans across elephant genomes revealed genes that may control iconic elephant traits such as tusk development, memory, and somatic maintenance. Our study supports the hypothesis that interspecies variation in gene regulation contributes to differential inflammatory responses leading to increased infectious disease and cancer susceptibility in Asian versus African elephants. Genomics can inform functional immunological studies which may improve both conservation for elephants and human therapies.


Introduction 1
Elephants (family Elephantidae) first appeared on the planet ~5-10 million years ago (MYA) and 2 three species roam today: the African bush elephant (Loxodonta africana), the African forest 3 elephant. (L. cyclotis), and the Asian elephant (Elephas maximus). These species are the only 4 surviving members of the once diverse proboscidean clade of afrotherian mammals 1 . Other 5 elephantids such as straight-tusked elephants (genus Paleoloxodon) and mammoths 6 (Mammuthus) went extinct around 34,000 and 4,300 years ago, respectively 2,3 . All extant 7 elephant species are now threatened with extinction, largely due to poaching and habitat loss. 8 Asian elephants are "endangered," with only ~200 wild individuals in some countries 4 , and 9 African elephants are "vulnerable" with only ~400,000 wild individuals after a decrease of 10 ~100,000 individuals between 2007 and 2016 5, 6 . It is imperative that we study the genetics of 11 these amazing creatures and how this knowledge of their evolutionary history can contribute to 12 their continued conservation. 13 Elephants share many charismatic traits such as prehensile trunks, ivory tusks, 14 intelligence with long-term memory, and large body sizes 1  Another important disease for elephants is cancer, albeit for different reasons. Elephant 1 cancer mortality rates are low compared to humans, despite the fact that cancer is a body size-2 and age-related disease 12 . A potential mechanism of cancer resistance for elephants is an 3 enhanced apoptotic response of elephant cells to DNA damage associated with extensive 4 amplification of retrogene copies of the tumor suppressor gene TP53 12-14 , yet there are still 5 unknowns related to cancer in elephants. For instance, it is unclear if variation in TP53 copy 6 number contributes to differences in cancer defense between elephant species. Also, it is not 7 known whether the observed differential responses to EEHV and TB between Asian and African 8 elephants relate to cancer susceptibility. Detailed analyses of cancer prevalence and mortality in 9 elephants may provide insight into how elephants evolved to handle disease. Here, we add to 10 the knowledge of elephant cancer prevalence with data from zoos accredited by the Association 11 of Zoos and Aquariums (AZA). AZA sets the standards for animal care and welfare in the United 12 States (https://www.aza.org/about-us, last accessed September 2020). Every elephant in an 13 AZA facility undergoes routine blood screens, full body examinations, and thorough necropsy 14 upon death, making the likelihood of documenting elephant cancer high. 15 In addition to maintaining the health of elephants under human care to improve breeding 16 and species survival plans, conservation efforts can benefit from genomic studies that identify 17 genetic variants associated with traits such as disease defense 15 . However, the few elephant 18 functional genomic studies currently available are limited to a small number of individuals and 19 species [16][17][18][19] and the genetic etiologies of most elephant traits are unknown. In our study, we 20 analyze data from three living and two extinct species in comparative and population genomic 21 frameworks in order to understand the genomic basis of elephant traits, including what drives 22 different disease outcomes between species. 23

Asian elephants suffer from higher rates of malignant cancers than African elephants 2
To estimate rates of neoplasia and malignancy in elephants, we collected and analyzed 3 pathology data from 26 AZA-accredited zoos in the USA, which included diagnoses from 76 4 different elephants (n=35 African and n=41 Asian). We found that 5. 71% Table 5). These genome assemblies were used to generate a whole genome alignment (WGA)  3 with 10 other mammals, which we used to define accelerated genomic regions (ARs) unique to 4 Asian and African elephants. We first defined 676,509 60 bp regions that were present in Asian 5 and African elephants and conserved in the 10 background species with phastCons 23,24 6 (conserved regions or CRs, Fig. 1a). 7 Asian and African elephants likely diverged ~5 MYA 25 , and since differences between 8 closely related mammals are primarily due to changes in non-coding regulatory genomic 9 regions 24,26,27 , we focused on the 376,899 CRs detected in non-coding regions. We tested these 10 for accelerated substitution rates in elephants with phyloP 24,26 and found 3,622 regions with 11 significantly increased nucleotide substitution rates in the Asian elephant while 3,777 regions 12 were accelerated in the African bush elephant (q-value < 0.10). We found 2,418 ARs shared 13 between both species, with 862 Asian elephant-specific and 1,017 African bush elephant-14 specific ARs (Fig. 1b) ARs common to both elephants were not significantly enriched near DE genes (Fig. 1c). This 24 pattern remained robust with subsets of increasingly significantly DE genes based on adjusted 25 p-values ( Supplementary Fig. 3). Asian elephant-and African bush elephant-specific ARs 26 disproportionately overlapped DE gene regulatory regions relative to the common ARs (Chi-1 squared test, P=0.019 and P=0.001 respectively; Fig. 1d, Fig. 1e), suggesting that some ARs 2 reflect changes in regulatory regions that alter gene expression patterns in elephant PBMCs. 3 (a), we defined genomic regions that were accelerated (ARs) in two elephant species (red), yet 5 conserved in the set of background species (blue). Branch lengths are given in terms of mean 6 substitutions per site at fourfold degenerate sites (neutral model). Among the ARs detected in 7 elephants, we found ARs common to both elephant species as well as ARs specific to either 8 Asian or African bush elephants (b). Differentially expressed (DE) genes were much more likely 9 to be found in Asian elephant-specific (Fisher's Exact Test, P=2.05e-4) and African elephant-10 specific (P=8.30e-7) ARs than in common ARs (c). Species-specific ARs disproportionately 11 overlap DE gene regulatory regions relative to the common ARs (Chi-squared test, p=0.019 and 12 p=0.001 respectively) (d, e). 13 14 We functionally annotated AR contributions to African and Asian elephant species Data). The broad term 'immune system process' was 4.5 and 2.8 fold enriched with Asian and 8 African elephant-specific ARs (q-value = 4.87e-12 and 7.84e-06, respectively), but not 9 significantly enriched with elephant common ARs. Our results suggest (1) many of the species-10 specific ARs alter gene expression patterns and transcription factor binding networks that 11 eventuate differences in immune function, and (2) Asian-elephant ARs are more enriched in 12 immune pathways than African elephant-specific ARs in terms of both fold-enrichment and 13 statistical significance. 14 We found 109 GO terms significantly enriched with elephant common ARs (q-value < 15 0.05, Supplementary Fig. 6, Supplementary Data), many of which were related to cancer, 16 including "sphingolipid metabolic process" which was in the top 10 most significantly enriched 17 GO terms for both elephant common (5.7 fold enrichment, q-value = 4.69e-08) and African 18 elephant-specific (17.3 fold enrichment, q-value = 4.18e-22) ARs (Fig. 2). Sphingolipid 19 metabolites mediate the signalling cascades involved in apoptosis 29 , necrosis 29 , senescence 30 , 20 and inflammation 31 . We found 2.9 and 3.6 fold enrichments for 'tumor necrosis factor (TNF)-21 mediated signaling pathway' (q-value=4.75e-04) and 'positive regulation of TNF production' (q-22 value = 1.75e-03) in the common ARs, and a 21.5 fold enrichment of "negative regulation of 23 TNF secretion" in African elephant-specific ARs (q-value = 5.01e-04). TNF is a cytokine involved 24 in cell differentiation and death that can induce the necrosis of cancer cells 34 . The upregulation 25 of TNF-alpha has been associated with increased apoptosis in EEHV-infected Asian elephant 1 PBMCs as well 11 . 2 In a check for reproducibility, we found that the number of African elephant-specific ARs 3 assigned to each gene was correlated with previous studies 19 (R = 0.82). The gene most 4 enriched with previously defined non-coding African elephant ARs was the DNA repair gene 5 FANCL (7.3 fold enrichment; q-value = 2.16e-56), which mediates the E3 ligase activity of the 6 Fanconi anemia core complex, a master regulator of DNA repair 32 . We found that  enriched GO terms in terms of -log10(q-value) for each set of ARs (African elephant-specific, 3 African and Asian elephant common, Asian elephant-specific), and their overlap. "Innate 4 immune response" and "immune system process" are in the top 10 most significantly enriched 5 GO terms for Asian elephant-specific ARs, are significantly enriched in African elephant specific 6 ARs but not in the top 10, and are not significantly enriched in the common ARs. "Negative 7 regulation of T-cell proliferation" was only in top 10 significantly enriched GO terms for the Asian 8 elephant-specific ARs. 9

Evolution of TP53 and its retrogenes in elephant genomes 1
The enhanced apoptotic response to DNA damage in elephant cells correlates with the 2 expansion of ~20 copies of the tumor suppressor gene TP53 in elephant genomes 12-14 , and we 3 wanted to (1) understand the evolutionary origins of TP53 gene duplications in Asian and 4 African bush elephants and (2) determine if TP53 copy number is related to body size evolution 5 in elephants. We annotated TP53 homologs in 44 mammalian genomes including Icky the Asian 6 elephant, an additional genome assembly of an Asian elephant 33-35 ("Methai", born in Thailand 7 and living at the Houston Zoo, assembly available at www.dnazoo.org, last accessed 8 September 2020), and the African bush elephant assembly presented in this study 9 (Supplementary Table 6), and incorporated them in a molecular clock analysis. We estimated 10 that TP53 retrogene copies originated in the paneungulate ancestor of manatees and elephants 11 ~55-60 million years ago (MYA) (41.3-75.2 95% highest posterior density or HPD) (Fig. 3a, 12 Supplementary Fig. 7). A subsequent TP53 expansion began ~45 MYA (30.7-60.1 95% HPD) 13 in a common ancestor of Asian and African elephants, and continued throughout elephantid 14 evolution. We estimated 19 copies of TP53 in the African bush elephant genome assembly, and 15 9-11 TP53 copies in the Asian elephant genome based on the two assemblies for the species. 16 We mapped whole genome shotgun data from multiple individuals belonging to three 17 living and two extinct elephant species (Supplementary Table 7) to the bush elephant genome 18 annotation (loxAfr3) and used normalized read counts to estimate TP53 copy numbers in 19 elephant genomes (Figure 3b; Supplementary Table 8). Based on read depth, African bush 20 elephants (n=4) have on average ~19-23 TP53 copies in their genomes, and Asian elephant 21 genomes (n=7) contain as few as 10 TP53 copies, or as many as 37, but without these outliers 22 average ~19-22 TP53 copies in their genomes. These estimates are similar to previous 23 estimates of TP53 copy numbers for bush and Asian elephants based on smaller numbers of 24 individuals 12-14 . We estimated ~21-24 TP53 copy number variants in forest elephant genomes 25 (n=2). The woolly mammoths (n=2) were estimated to have between 19 and 28 TP53 copies in 26 their genomes, which was slightly higher than previous estimates 14 . Meanwhile, the straight-1 tusked elephant genome contained ~22-25 TP53 copies. 2 The number of TP53 copies estimated in the genomes of Asian elephants differed based 3 on the method used. For instance, we validated two TP53 retrogenes in Icky's genome 4 assembly, which phylogenetically clustered closely with two of the nine retrogenes validated in 5 Methai's genome assembly (Fig. 3a). Taken together, based on the reciprocal BLAT searches 6 of both assemblies we estimated 9-11 TP53 copies in the Asian elephant genome. However, 7 based on normalized read counts, we estimated 10-37 TP53 copies (Fig. 3b), which is more 8 consistent with previous studies 12, 14 . The lower estimates we obtained from the Asian elephant 9 genome assemblies may be due to poorly resolved repetitive regions which hamper graph-10 based de novo genome assemblers 33 . Subsequent refinement of Asian elephant genome 11 assemblies using long read sequencing may better resolve these regions. In the meantime, our 12 results suggest that copy number estimates based on read depth are useful approximations for 13 approaches validated from genomic DNA. 14 15 16 TP53 retrogene sequences extracted from the Asian elephant assembly "Icky" presented in this 7 study. (b) TP53 copy number estimates based on read counts from three living elephants 8 (  (2)   Genetic variation in TP53 copy number variants suggests maintenance of some by 8 purifying selection 9 We found a high degree of sequence conservation in the TP53 paralogs both within and 10 between the three living elephants (Supplementary Table 9). For instance, the proportion of 11 polymorphic sites in putatively neutrally evolving ancestral repeats was 0.013, but across 12 12 annotated TP53 paralogs was 0.004. Despite the deep genomic divergence of forest elephants, 13 we found very little genetic variation in TP53 paralogs for the species, with just a single 14 segregating site in three of the retrogenes. Across all species, we collected zero 1 nonsynonymous SNPs for the functional homolog (ENSLAFG00000007483), consistent with 2 strong purifying selection on this gene and ENSLAFG00000028299, or "retrogene 9", whose 3 protein expression is stabilized by DNA damage in human cells 12 . 4 We annotated variants in 12 TP53 paralogs based on the bush elephant genome 5 annotation and found few variants affecting gene function (Supplementary Table 10 For instance, the most significantly enriched BPs (in terms of fold enrichment) were "dendrite 2 self-avoidance" (27-fold enrichment, FDR=0.02), "ionotropic glutamate receptor signaling 3 pathway" (17-fold enrichment, FDR=0.02), and "regulation of NMDA receptor activity" (14-fold 4 enrichment, FDR=0.03). Many significantly enriched GO terms clustered semantically with 5 "trans-synaptic signaling" (Fig. 5a). We also found significant protein interactions among outlier Our study of elephant genomes expands the knowledge of elephant evolution, highlighting 2 differences and similarities between species. Elephant tumors tend to be benign with strong 3 genetic defenses to prevent malignant transformation. Asian elephants reported in our study 4 develop benign tumors and malignant cancer at higher rates than African elephants, and 5 therefore may benefit from increased monitoring for tumors. Even though our data originates 6 from captive elephants, these differences most likely reflect true genetic differences as the AZA 7 has a Species Survival Plan (SSP) (https://www.aza.org/species-survival-plan-programs) for 8 elephants that maximizes genetic diversity via the careful selection of mate pairs and studbook 9 documentation 47,48 . Together with the fact that many elephants in zoos are wild born, it is likely 10 that wild Asian elephants share increased susceptibility to neoplasia with the same observed 11 genetic variation we report in our study. 12 While collecting cancer prevalence data in wild elephants is challenging due to 13 decomposition and predator consumption, future data from wild elephants and genomic analysis 14 of benign vs. malignant tumors will be crucial to further understand the evolutionary basis of 15 differences in cancer risk between elephant species. This information could benefit the survival 16 of individual elephants and assist with selecting the best treatment interventions when the rare 17 elephant tumor is diagnosed in captivity or in the wild. More than half of the elephant tumors 18 reported here were found in reproductive organs (Table 1)

. Even benign reproductive tumors 19
can affect reproduction and conservation, therefore future studies to understand their impact 20 and to develop preventative and treatment measures are needed. 21 While previous studies suggest that TP53 copy number increased with body mass 22 during proboscidean evolution as a response to increased cancer risk 14 , we estimated some of 23 the highest TP53 copy numbers in the smallest elephants. Based on available sequence data, 24 we estimated ~19-21 TP53 copy number variants in the ~44,800 year old woolly mammoth 25 genome from Oimyakon, Russia, but found that the much more recent ~4,300-year-old Wrangel 26 Island mammoth had ~1.3X this number of TP53 copies in its genome. These findings are 1 consistent with the demographic decline of the last woolly mammoths on Wrangel Island 18 , 2 which by 12,000 years before present shrunk in body size by ~30% relative to more ancient 3 mammoths elsewhere 4 . The estimated TP53 copy number increase in the Wrangel Island 4 mammoth may be related to the random fixation of retrogenes in the population rather than 5 selection acting on body size. 6 We estimated ~21-24 TP53 copy number variants in forest elephant genomes, greater 7 than our estimates for bush elephants despite the smaller body size of forest elephants. 8 Meanwhile, we estimated ~23-25 TP53 copy number variants in the genome of the straight-9 tusked elephant, which at ~13,000kg may have been the largest land mammal to have ever 10 lived (Fig. 3b) 49 . Recent genomic evidence suggests that forest elephants are more closely 11 related to straight-tusked elephants than to bush elephants (Fig. 3b) 25,50 , with extensive gene 12 flow occurring between forest and straight-tusked elephants, as well as between straight-tusked 13 elephants and mammoths 25 . Thus, the possibility exists that, as in the Wrangel Island 14 mammoth, the higher estimated TP53 copy number in forest elephants relative to bush 15 elephants may not be related to modern differences in body mass between species (and 16 possible protection from increased cancer risk), but instead may be due to complicated 17 evolutionary and demographic histories which include migration that can dramatically affect the 18 dynamics of repetitive genomic elements such as retrogenes 51 . Nevertheless, we still find that 19 genetic variation at some TP53 retrogenes is tightly conserved in populations of all living 20 elephant species. This adds at least some evidence to the functionality of TP53 retrogenes. We 21 suggest that there may be a core set of TP53 retrogenes that confer the bulk of cancer 22 suppression in elephants. 23 Our results support the idea that regulatory elements play a role in the increased 24 infectious susceptibility with inflammatory response of Asian versus African elephants, 25 particularly the mediation of the TNF cytokine. Asian elephant calves are much more 26 susceptible than African elephant calves to cytokine storm triggered by EEHV infection 52 . 1 Compared to African elephants, we found that Asian elephant ARs are enriched for BP GO 2 terms "interleukin-1 beta (IL-1β) production" (q-value=0.036), "interleukin-18 (IL-18) production" 3 (q-value=0.00073), and "neutrophil activation involved in immune response" (q-value=2.44e-05) 4 (Supplementary Data). IL-1β, IL-18 and neutrophils, combined with TNF-alpha, are potent 5 mediators of innate immunity. Uncontrolled activation of these factors leads to immune-induced 6 disease pathogenesis through excessive inflammation. In humans and other mammals, 7 neutrophil activation directly contributes to tissue damage through the release of neutrophil 8 extracellular traps (NETs) 53,54 . Functional studies to compare cytokine secretion and NET 9 release in response to infectious agents are ongoing and could confirm that genetic differences 10 in innate immunity contribute to differences in disease susceptibility and outcomes between 11 Asian and African elephants. Our study provides an example of how genomics can inform 12 functional immunological and molecular studies, which may lead to improved conservation and 13 medical care for elephants. This type of genetic information could provide important evolutionary 14 insights to one day be translated into human patients with infection or cancer. Throughput Genomics Core (Salt Lake City, UT). 21

TP53 evolution in African and Asian elephants 22
To determine TP53 copy numbers and evolutionary patterns across placental mammals, we 23 exported the TP53 human peptide from Ensembl (July 2019), and used it as a query in 24 reciprocal BLAT searches 71 of 44 mammalian genome assemblies (Supplementary Table 6), 25 validated with a BLASTX query of the human peptide database on NCBI in order to ensure the 1 top hit was human TP53 with ≥70% protein identity, following Tollis et al. (2020) 72 . Accepted 2 nucleotide sequences were aligned with MAFFT 73 , and we weighted and filtered out unreliable 3 columns in the alignment with GUIDANCE2 74 using 100 bootstrap replicates. We reconstructed 4 the phylogeny of all aligned mammalian TP53 homologs and estimated their divergence times in 5 a Bayesian framework with BEAST 2.5 75 using the HKY substitution model, a relaxed lognormal 6 molecular clock model, and a Yule Model tree prior. We used a normal prior distribution for the The final MCMC chain was run for 100,000,000 generations, and we logged parameter samples 15 every 10,000 generations to collect a total of 10,000 samples from the posterior distribution. We 16 then collected 10,000 of the resulting trees, ignored the first 10% as burn-in, and calculated the 17 maximum clade credibility tree using TreeAnnotator. 18

Detection of accelerated regions in African and Asian elephant genomes 19
We generated a multiple alignment (whole genome alignment or WGA) of 12 mammalian 20 genome assemblies. First, we downloaded publicly available pairwise syntenic alignments of 21 opossum (monDom5), mouse (mm10), dolphin (turTru1), cow (bosTau7), dog (canFam3), horse 22 (equCab2), microbat (myoLuc1), tenrec (echTel2), and hyrax (proCap1) to the human reference 23 (hg19) from the UCSC Genome Browser 77 . We also computed two additional de novo pairwise 24 syntenic alignments with the human genome as a target and the two elephant genome 1 assemblies reported here as queries using local alignments from LASTZ v1.02 78 using the 2 following options from the UCSC Genome Browser for mammalian genome alignments: --3 hspthresh 2200 --inner 2000 --ydrop 3400 --gappedthresh 10000 --scores HOXD70, followed by 4 chaining to form gapless blocks and netting to rank the highest scoring chains 79 . We then 5 constructed a multiple sequence alignment with MULTIZ v11.2 80 with human as the reference 6 species. 7 To define elephant accelerated regions (ARs), we used functions from the R package 8 rphast v1.6 24 . We used phyloFit with the substitution model 'REV' to estimate a neutral model 9 based on autosomal fourfold degenerate sites from the WGA. We then used phastCons to 10 define 60 bp autosomal regions conserved in the 10 non-elephant species in the WGA with the 11 following options: expected.length = 45, target.coverage = 0.3, rho = 0.31. We further selected 12 regions with aligned sequence for both African and Asian elephants that have aligned sequence 13 present for at least 9 of the 10 non-elephant species. We tested the resulting 676,509 regions 14 for acceleration in each elephant species relative to the 10 non-elephant species with phyloP 15 using the following options: mode = 'ACC'. We used the Q-Value method 81 to adjust for multiple 16 testing. Statistically significant ARs were defined with a false discovery rate threshold of 10%. 17 We defined regions significantly accelerated in the Asian elephant, but not the African bush 18 elephant as Asian elephant specific ARs and conversely defined African bush elephant specific 19 ARs. Our previous studies of accelerated regions suggest no significant relationship between 20 genome quality and number of ARs discovered 19 . 21 To define genes differentially expressed between Asian and African elephants we took were obtained with biomaRt. The resulting p-values were q-value corrected for multiple 22 testing 81 . We used the same potential regulatory regions and LOLA to test for GO enrichments. 23 We compared elephant AR set GO enrichments to GO enrichments from previously 24 published AR sets for 5 mammalian species (13-lined ground squirrel, naked mole rat, orca, 25 bottlenose dolphin, and little brown bat) 19 . These AR sets were lifted over from hg19 coordinates 26 to loxAfr3 coordinates. Numbers of ARs and background CRs overlapping potential regulatory 1 regions of genes included in and excluded from each GO term were calculated with LOLA. We 2 used generalized linear models with binomial distributions to compare elephant AR enrichments 3 in each GO term to AR enrichments for the 5 other mammals. We contrasted models without 4 and with an interaction term distinguishing the elephant AR set from the others. The two models

Whole genome sequence analysis of living elephants 18
We obtained ~15-40x whole-genome sequencing data from multiple individuals from across the 19 modern range of living elephants from public databases 12,16,28,91 , and the WGS libraries for "Hi-20 Dari" and "Icky" as well as a third African elephant named "Christie" (Supplementary Table 7). 21 We also obtained sequence data from a straight-tusked elephant 25 and two woolly mammoths 91 . 22 Sequences were quality checked using FastQC and trimmed to remove nucleotide biases and 23 adapter sequences with Trimmomatic where necessary. Reads from each individual were 24 mapped to the L. africana genome (loxAfr3.0, Ensembl version) using bwa-mem v077 92 . 1 Alignments were filtered to include only mapped reads and sorted by position using Samtools 2 v0.0.19 93 , and we removed PCR duplicates using MarkDuplicates in picard v1.125 94 . Single-end 3 reads from the ancient samples were mapped to loxAfr3.0 with bwa-aln following Palkolpoulou 4 et al. (2018). To detect loci that have been putatively subjected to positive selection within each living 2 elephant species, we used SweeD v3.3.1. SweeD scans the genome for selective sweeps by 3 calculating the composite likelihood ratio (CLR) test from the folded site frequency spectrum in 1 4 kb grids across each scaffold. We used the folded site frequency spectrum because we lacked 5 a suitable closely related outgroup with genomic resources that would enable us to establish 6 ancestral alleles. For this analysis, we studied each species individually. Following Nielsen et al. 7 (2005), we established statistical thresholds for this outlier analysis. First, we generated a null 8 model by simulating 1,000 data sets with 100,000 SNPs under neutral demographic models. As 9 our population genomics statistics results were highly concordant with Palkopoulou et al. (2018), 10 we constructed demographic models based on their Pairwise Sequential Markovian Coalescent 11 trajectories 101 for each species, which we implemented with ms (October 2007 release) 102 12 ( Supplementary Fig. 9). Then, we categorized regions as outlier regions in the observed SNP 13 data if their CLR was greater than the 99.99 th percentile of the distribution of the highest CLRs 14 from the simulated SNP data. For the neutral simulations, we assumed a per-year mutation rate 15 of 0.406e-09 and a generation time of 31 years, following Palkopoulou et al. (2018). We then 16 calculated the CLR with the simulated neutral SNP datasets. SweeD output files were changed 17 to BED format using namedCapture 103 and data.table 104 R packages, and we used bedtools 18 intersect 105 to collect elephant gene annotations (loxAfr3.0, Ensembl) overlapping putative 19 selective sweeps. 20 Genomic scans for selection may be complicated by several factors that can increase 21 false positive rates, and false negative rates potentially stem from variable mutation and 22 recombination landscapes 38,106 . We established statistical thresholds using null demographic 23 models. However, the estimated split times within living elephant species differ widely, ranging 24 from 609,000 to 463,000 years before present for forest elephants 37 , to as recent as 38,000 to 25 30,000 years before present for bush elephants 91,107 . Estimated split times between the sampled 26 Asian elephants are highly variable, ranging from 190,000 to 24,000 years before present 25 , 1 indicating continental-wide population structure not accounted for here 108,109 . Still, Palkopoulou 2 et al. (2018) found little evidence for gene flow between the three modern species of elephant, 3 which supports our choice of analyzing them separately for selective sweeps. We focused on 4 shared outlier regions, which show consistent evidence of being targeted by positive selection 5 across all three elephant species. 6 Genes overlapping outlier regions of putative selective sweeps were functionally 7 annotated by testing for GO enrichment of terms for biological processes 110 in the outlier gene 8 list, using default parameters and the Benjamini-Hochberg correction for multiple testing with an 9 adjusted p-value < 0.05. We used REVIGO 111 to semantically cluster and visualize the most 10 significant GO terms according to their adjusted p-values using default parameters. We also 11 created annotation clusters from the outlier genes using DAVID v6.8 112 and constructed protein 12 interaction networks with STRING v11.0 113 . Finally, we tested for enriched mouse phenotypes 13 using ModPhea 114 . 14

Data Availability 15
Short-read sequence data generated for this study has been shared under NCBI Bioproject 16 PRJNA622303, and the genome assembly for Icky the Asian elephant is available on NCBI 17 Miami and three other anonymous zoos. We acknowledge Huntsman Cancer Institute's High-7 Throughput Genomics Core and the Monsoon computing cluster at Northern Arizona University 8 (https://nau.edu/high-performance-computing/). Research reported in this publication was 9 supported by the National Cancer Institute of the National Institutes of Health under Award 10 Number U54CA217376. The content is solely the responsibility of the authors and does not 11 necessarily represent the official views of the National Institutes of Health. 12

Conflict of Interest 13
Dr. Schiffman is co-founder, shareholder, and employed by PEEL Therapeutics, Inc., a 14 company developing evolution-inspired medicines based on cancer resistance in elephants. Dr. 15 Abegglen is share-holder and consultant to PEEL Therapeutics, Inc. 16