Elephant genomes reveal insights into differences in disease defense mechanisms between species

1School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ; 2Arizona Cancer Evolution Center, Arizona State University, Tempe, AZ; 3Department of Neurobiology and Anatomy, University of Utah, Salt Lake City, UT 84132-3401, USA; 4Department of Genetics, University of Utah, Salt Lake City, UT, USA; 5Center for Biocomputing, Security and Society, Biodesign Institute, Arizona State University, Tempe, USA; 6Department of Clinical Sciences, North Carolina State University, Raleigh, NC, 27607 USA; 7Ringling Bros Center for Elephant Conservation, Polk City, Florida USA; 8William H. Darr College of Agriculture, Missouri State University, Springfield, Missouri, USA; 9Northwest ZooPath, Monroe, WA 98272; 10Arizona State University, Department of Psychology, Tempe, AZ; 11Department of Anthropology, University of California, Santa Barbara CA, USA; 12Department of Pediatrics & Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA; 13PEEL Therapeutics, Inc., Salt Lake City, UT, USA & Haifa, IL


Introduction
Elephants are charismatic megafauna recognized by their prehensile trunks, ivory tusks, intelligence with long-term memory, and large body sizes 1  are the only surviving members of the proboscidean clade of afrotherian mammals. These three species range in body size from African bush elephants (3,000-6,000kg), Asian elephants (2,700-4,000kg), to the smallest, African forest elephants (~2,000kg) 2 . The evolutionary history of elephants included speciation, population size changes, and extensive gene flow between species 3 . Mammoths (genus Mammuthus) went extinct between ~11,000 and ~4,300 years ago 4 . Straight-tusked elephants (genus Palaeoloxodon) roamed Eurasia until ~34,000 years ago 5 , and at an estimated average body mass of 13,000kg many individuals of P. antiquus may have been the largest ever land mammals 2 .
All extant elephant species are threatened with extinction due to poaching and habitat loss. Asian elephants are recognized as endangered by the International Union for Conservation Red List, with only ~200 wild individuals in some countries 6  identifying adaptive alleles and phenotype-associated variants 9 . While genomic approaches are useful for elephant conservation, the few elephant functional genomic studies currently available are limited to a small number of individuals and species [10][11][12][13] and the etiologies of many elephant traits with profound fitness (and therefore conservation-oriented) consequences have not yet been discovered. For instance, increased frequencies of tuskless elephants may be a response to selective pressures from ivory poaching [14][15][16] . The genes controlling tusk development, however, remain unknown. Disease defense is extremely important for elephants and has important ramifications for species conservation 9 . Asian elephants are threatened by their susceptibility to acute hemorrhagic disease resulting from infection with elephant endotheliotropic herpesvirus (EEHV) 17 . Despite recent reports of African elephant calf fatalities from EEHV, mortality rates are higher for Asian elephants, suggesting a genetic component for increased EEHV lethality.
Furthermore, elephant cancer mortality rates are low compared to humans, despite their large size and long lifespans, and the fact that cancer is a body size-and age-related disease 18 . This suggests that elephants evolved more efficient cancer suppression mechanisms relative to smaller mammals. Cancer suppression in elephants has profound ramifications for elephant health, and mechanisms of cancer resistance in elephants could inform human cancer research [18][19][20] . Recent studies have largely focused on elephant-specific duplications of the tumor suppressor gene TP53 (or EP53). However, the evolution of cancer suppression in elephants likely involved many pathways including additional DNA repair genes 13 .
Here, we generated the largest elephant genomic dataset to date. We present a draft genome assembly for an Asian elephant and an updated African bush elephant assembly.
Using a comparative genomic approach, we discovered elephant genomic regulatory regions and identified differential gene expression between Asian and African elephants. We also leveraged genome sequences from 16 elephants across three living and two extinct species to estimate genetic changes during elephant evolution, including selective sweeps in living populations. The goals of this study are: (1) to provide novel resources for the mammalian comparative genomics and elephant conservation communities; (2) to place genomic changes underlying elephant traits in the contexts of deeper mammalian and elephant evolution; (3) to understand what drives different disease outcomes between elephant species; and (4) to characterize the evolutionary history of cancer suppression in the elephant lineage.
. CC-BY 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted May 31, 2020. . https://doi.org/10.1101/2020.05.29.124396 doi: bioRxiv preprint

New reference genomes for elephant genomics
All genome sequence data and assemblies generated for this study has been shared under NCBI Bioproject PRJNA622303. We sequenced an estimated 94.4x coverage of the Asian elephant genome using multiple Illumina sequence libraries (Supplementary Table 1 Table 5). A total of 528 previously unjoined scaffolds were linked and nine scaffolds/chromosomes were broken ( Supplementary Fig. 1). We suggest future improvements to these assemblies such as high-resolution genomic profiling with long read and/or optical mapping technologies to further resolve structural differences between elephant genomes.

Elephant-specific accelerated regions correlate with species-specific gene expression patterns
Differences between closely related mammals are most often due to changes in non-coding regulatory regions that alter gene expression patterns and transcription factor binding networks 21,22 . To understand how non-coding regulatory regions differ between Asian and African bush elephants, we defined accelerated genomic regions for each species 22 . We first defined 376,899 non-coding regions present in Asian and African elephants and conserved in . CC-BY 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted May 31, 2020. . https://doi.org/10.1101/2020.05.29.124396 doi: bioRxiv preprint 10 background mammal species (conserved regions or CRs, Fig. 1a). Of these, 3,622 regions had significantly increased nucleotide substitution rates (accelerated regions or ARs) in the Asian elephant, and 3,777 regions were accelerated in the African bush elephant (q-value < 0.10). We found 2,418 ARs in both species, while 862 are Asian elephant-specific and 1,017 are African bush elephant-specific (Fig. 1b).
Accelerated regions common to Asian and African bush elephants were likely driven by changes pre-dating the evolutionary divergence of the two elephants. In contrast, Asian elephant-and African bush elephant-specific ARs may point to enhancers driving gene expression level changes that impact phenotypes distinguishing the two elephant species. We hypothesized that a disproportionate number of genes near these ARs would be differentially expressed between the two species. Using available African bush elephant and Asian elephant peripheral blood mononuclear cell (PBMC) RNA-Seq data 13,23 , we defined 5,034 differentially expressed (DE) elephant genes (false discovery rate or FDR < 0.01). Both Asian elephant-and African bush elephant-specific ARs are significantly enriched near DE genes relative to conserved regions (q-value = 2.05e-4, q-value = 8.30e-7, respectively). Meanwhile, the 2,418 ARs common to both elephants were not significantly enriched near DE genes (Fig. 1c). We used a Chi-squared test to determine if Asian elephant-and African bush elephant-specific ARs disproportionately overlap DE gene regulatory regions relative to the common ARs. Both tests yielded significant p-values, 0.019 and 0.001 (Fig. 1d, Fig. 1e), suggesting that some ARs reflect changes in regulatory regions that alter gene expression patterns in elephant PBMCs.
. CC-BY 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted May 31, 2020. . https://doi.org/10.1101/2020.05.29.124396 doi: bioRxiv preprint Figure 1. Elephant accelerated regions. Using a whole genome alignment of 12 mammals (a), we defined genomic regions that were accelerated (ARs) in two elephant species (red), yet conserved in the set of background species (blue). Branch lengths are given in terms of mean substitutions per site at fourfold degenerate sites (neutral model). Among the ARs detected in elephants, we found ARs common to both elephant species as well as ARs specific to either Asian or African bush elephants (b). Differentially expressed (DE) genes were much more likely to be found in species-specific ARs than in common ARs (c). Species-specific ARs disproportionately overlap DE gene regulatory regions relative to the common ARs (d, e).
We sought to further explore AR contributions to African and Asian elephant species differences with a gene ontology (GO) enrichment analysis. We tested the elephant species specific and common ARs for Biological Process (BP) GO term enrichments. Of 18,056 terms, 252 were significantly enriched in Asian elephant specific ARs and 275 were enriched in African elephant specific ARs (q-value < 0.05). Many of the GO terms related to the immune system in both species ( Fig. 2; Supplementary Fig. 2; Supplementary Fig. 3; Supplementary Fig. 4). The broad term 'immune system process', for example, was 4.5 and 2.8 fold enriched with Asian and African elephant-specific ARs (q-value = 4.87e-12, q-value = 7.84e-06, respectively), but not . CC-BY 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted May 31, 2020. . https://doi.org/10.1101/2020.05.29.124396 doi: bioRxiv preprint significantly enriched with elephant common ARs. We found 109 GO terms significantly enriched with elephant common ARs (q-value < 0.05) including a 3.6 fold enrichment of elephant common ARs for the term 'positive regulation of tumor necrosis factor production' (qvalue = 1.75e-03). In Figure 2 we present the immune and cancer related GO terms significantly enriched with any of the three AR sets. All enrichments are shared as a resource as supplementary data.
The number of African elephant-specific ARs assigned to each gene is well correlated between this study and a previous study (Ferris et

Evolution of TP53 and its retrogenes (EP53) in elephant genomes
A potential mechanism for cancer resistance in elephants is an enhanced apoptotic response to DNA damage associated with extensive amplification of elephant TP53 (EP53) retrogenes 18,19,26 . By incorporating annotated TP53 homologs from Asian elephant, African bush elephant, and 41 additional mammals in a molecular clock analysis, we estimated that the EP53 retrogene families diverged ~35 million years ago (23.51-49.4 95% highest posterior density) ( Fig. 3a). The genome of the West Indian manatee (Trichechus manatus), a paenungulate afrotherian closely related to proboscideans, contains a retrogene copy of TP53 as well. Thus, TP53 retrogenes were likely present in paenungulate ancestors of manatees and elephants, and further expanded in copy number during elephantid evolution.
We obtained whole genome shotgun data from three living and two extinct elephant species from the public databases, and sequenced three additional elephants at ~32-38X coverage (Supplementary Table 6). We mapped these sequences to the African bush elephant genome assembly (loxAfr3.0) with bwa-mem v077 27 (see section on methods). We utilized the loxAfr3.0 version due to its extensive representation in public repositories, including Ensembl (www.ensembl.org). Using normalized read counts to estimate EP53 copy numbers in the genomes of three living and two extinct elephant species, (Figure 3b; Supplementary Table 7), we found that African bush elephants have on average ~19-23 EP53 homologs in their genomes, which is similar to previous estimates 18,19,26 . The Asian elephant genomes contain as few as 10 EP53 copy number variants, or as many as 37, but on average contain ~19-22 EP53 copies in their genomes. This variation across Asian elephant genomes could be due to factors such as the evolutionary distance between African and Asian elephants which may affect mapping quality, or differences between individual Asian elephants. Regardless, these estimates are similar to previous estimates of EP53 copy number for Asian elephants based on smaller numbers of individuals 18,19 . We estimated ~21-24 EP53 copy number variants in forest elephant genomes. The woolly mammoths (Mammuthus primigenius) were estimated to have . CC-BY 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted May 31, 2020. . https://doi.org/10.1101/2020.05.29.124396 doi: bioRxiv preprint between 19 and 28 EP53 copies in their genomes, which was slightly higher than previous estimates 19 . Meanwhile, the straight-tusked elephant genome contained ~22-25 EP53 copies.
. CC-BY 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted May 31, 2020. . https://doi.org/10.1101/2020.05.29.124396 doi: bioRxiv preprint

Genetic variation in EP53 suggests maintenance of some paralogs by purifying selection
We analyzed biallelic SNPs called within loxAfr3.0 EP53 paralogous gene annotations to determine patterns of genetic variation in EP53 and its retrogenes in modern populations of elephants. Overall, we found little genetic variation in the EP53 paralogs both within and between the three living elephants (Supplementary Table 8). For instance, the proportion of . CC-BY 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted May 31, 2020. . https://doi.org/10.1101/2020.05.29.124396 doi: bioRxiv preprint polymorphic sites in putatively neutrally evolving ancestral repeats was 0.013, while the average proportion of polymorphic sites across 12 EP53 paralogous loci was 0.004. We found the most genetic variation of all EP53 loci in Asian elephants. Meanwhile, despite the deep genomic divergence of forest elephants, we found that EP53 is conserved in forest elephants, with no SNPs except within three retrogenes which contained one segregating site each. We collected zero nonsynonymous SNPs for the functional homolog (ENSLAFG00000007483). In addition, ENSLAFG00000028299 or "retrogene 9", whose protein expression is stabilized by DNA damage in human cells 18 , showed no variation across the three modern elephant species.
We annotated variants in 12 EP53 paralogs based on the bush elephant genome annotation and found few variants affecting gene function (Supplementary Table 9), consisting of mostly missense mutations. There were no variants of high or moderate impact on gene function annotated in the functional homolog (ENSLAFG00000007483), with the majority of variants occurring downstream, in introns, or upstream of the gene. The high degree of sequence conservation across three species of elephant, and in particular the lack of variants with functional effect, especially in "retrogene 9," suggests that at least some EP53 retrogenes are being maintained by purifying selection.

An elephant never forgets: positive selection in elephant genomes
To assess the importance of natural selection in elephants, we scanned the genomes of three species (Elephas maximus, Loxodonta africana, and L. cyclotis) for positive selection using

Discussion
We have leveraged available and de novo genomic resources to investigate functional genomic evolution in elephants, presenting a draft assembly for the Asian elephant and an improved assembly for the African bush elephant. By estimating conserved genomic regions in a set of 10 background mammals, we identified elephant-specific accelerated genomic regions associated with significant gene expression differences between Asian and African elephants. These putative regulatory regions are associated with immune function. This may explain observed differential vulnerability to infection and cancer across the two species. We estimated EP53 copy number and SNP variation in living and extinct elephant species to determine that both neutral and selective forces likely control the segregation of EP53 alleles. By modeling selective sweeps in living elephant populations, we identified genomic regions under positive selection that may control tusk development, memory and learning, and somatic maintenance related to cancer resistance.
Cancer mortality rates in elephants have been reported using limited elephant necropsy data 18 . Here we investigated cancer prevalence in elephants using elephant biopsy and necropsy data from 26 zoos. We confirm that elephants develop cancer at a lower rate than predicted based on their size and lifespan, i.e. Peto's Paradox, with reports of neoplasia in 19 of the 76 individuals (25%) and malignant cancer reported in 6 of the 76 individuals (7.89%).
Tumors that develop in elephants tend to be benign, which may further suggest that elephants have strong cancer defense mechanisms to not only suppress cancer from developing, but also to prevent malignant transformation. Asian elephants develop benign tumors and malignant cancer at higher rates than African elephants; therefore, Asian elephants may benefit from increased monitoring for tumors when under human care as well as in the wild. The collection of additional cancer prevalence data and the genomic analysis of benign and malignant tumors in elephants will be important to continue to understand the evolutionary basis of differences in cancer risk between elephant species. This information could assist with selecting the best While previous studies suggest that EP53 copy number increased with body mass during proboscidean evolution as a response to increased cancer risk 19 , we estimate some of the highest EP53 copy numbers in some of the smallest elephants. For instance, while we estimated ~19-21 EP53 copy number variants in the ~44,800 year old woolly mammoth genome from Oimyakon, Russia, we estimated that the much more recent ~4,300-year-old Wrangel Island mammoth had ~1.3X this number of EP53 copies in its genome. This is remarkably consistent with previous results that estimated an increase in the retention of retrogenes in the genome by 1.3X associated with the demographic decline of the last woolly mammoths on Wrangel Island 12 , which by 12,000 years before present shrunk in body size by ~30% relative to more ancient mammoths elsewhere 4 . Therefore, the estimated EP53 copy number increase in the Wrangel Island mammoth may be related to the random fixation of retrogenes in the population rather than selection acting on body size.
We estimated ~21-24 EP53 copy number variants in forest elephant genomes, greater than our overall estimates for bush elephants despite the smaller body size of forest elephants.
Meanwhile, we estimated ~23-25 EP53 copy number variants in the genome of the straighttusked elephant, which was considerably more massive than any extant elephant 2 . Recent genomic evidence suggests that forest elephants are more closely related to straight-tusked elephants than to bush elephants (Fig. 3b) 3,29 , and further evidence suggests extensive gene flow may have occurred between forest and straight-tusked elephants, as well as between straight-tusked elephants and other elephantids including mammoths 3 . Palkopoulou et al.
(2018) also found that despite hybridization in the modern contact zones between forest and . CC-BY 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted May 31, 2020. . https://doi.org/10.1101/2020.05.29.124396 doi: bioRxiv preprint bush elephants, there is little evidence of historical gene flow between these two lineages. Thus, the possibility exists that, as in the Wrangel Island mammoth, the higher estimated EP53 copy number in forest elephants relative to bush elephants may not be related to modern differences in body mass between species (and possible increased cancer risk), but instead may be due to complicated evolutionary and demographic histories. However, we also found that genetic variation at some EP53 retrogenes is conserved in elephant populations, which adds evidence to the potential functionality of at least some EP53 retrogenes. This suggests that there may be a core set of EP53 retrogenes that confer the bulk of cancer suppression in elephants, with important implications for research on the translational and therapeutic potential of EP53.
We found a significant enrichment for the GO term 'tumor necrosis factor-mediated signaling pathway' (q-value=0.000245) in ARs in common to both Asian and African elephants.
Tumor necrosis factor (TNF) is a cytokine involved in cell differentiation and death that can induce the necrosis of cancer cells 34 . The common ancestor of Asian and African elephants was a large-bodied elephantid 1 , and large and long-lived species must evolve cancer suppression mechanisms to offset the increased cancer risk associated with a large number of dividing somatic cells and long lifespans 35 . Therefore, it makes sense that there would be accelerated evolution in cancer-related pathways along the elephant lineage after divergence from the small-bodied common ancestor with the rock hyrax (Procavia capensis). We noted that, in addition, many Asian elephant-or African elephant-specific ARs were found near genes associated with immunological GO terms. Furthermore, genes near Asian elephant-specific ARs were enriched for the GO term 'negative regulation of tumor necrosis factor-mediated signaling pathway' (q-value = 0.0448). While an enhanced apoptotic response to DNA damage can be a potential mechanism for cancer resistance in elephants 18,19 , increased apoptosis also can be associated with the upregulation of TNF-alpha especially in the setting of EEHV-infected Asian elephant PBMCs 36 . This suggests that regulatory elements may govern immunological differences between the two elephant species. In support of these results, Asian elephant . CC-BY 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted May 31, 2020. . https://doi.org/10.1101/2020.05.29.124396 doi: bioRxiv preprint calves are much more susceptible than African elephant calves to cytokine storm triggered by EEHV infection 17 . This is consistent with our observation that Asian elephants are also more susceptible to both malignancies and neoplasia in general. We also note that, assuming both species have equal rates of exposure, rates of tuberculosis infection are higher in Asian elephants under human care relative to African elephants 37 (Fisher's Exact Test, P=0.0002528; Chi-squared test, P=0.0006843; Supplementary Fig. 5).
Our data for enriched biological processes associated with AR (Fig. 2) support the possibility that innate immunity and enhanced cytokine storm may play a role in the increased infectious susceptibility with inflammatory response of Asian versus African elephants.
Compared to African elephants, Asian elephants show increased interleukin-1 beta, interleukin-18, and neutrophil activation. These 3 factors, combined with TNF-alpha are potent mediators of innate immunity. Uncontrolled activation of these factors leads to immune induced disease pathogenesis through excessive inflammation. In humans and other mammals, neutrophil activation directly contributes to tissue damage through the release of neutrophil extracellular traps (NETs) 38,39 . Functional studies to compare cytokine secretion and NET release in response to infectious agents could confirm that genetic differences in innate immunity contribute to differences in disease susceptibility and outcomes between Asian and African elephants. Our study provides an example of how genomics can inform functional immunological and molecular studies, which may lead to improved conservation and medical care for elephants, and could provide important evolutionary insights to translate into human patients with infection or cancer.
. CC-BY 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted May 31, 2020. . https://doi.org/10.1101/2020.05.29.124396 doi: bioRxiv preprint

De novo assembly and annotation of the Asian elephant genome
The DNA sequencing libraries from the Asian elephant ("Icky") from the Ringling Bros. Center for Elephant Conservation were constructed and sequenced at the University of Utah Genomics Core Facility. Paired-end DNA libraries were constructed with the TruSeq Library Prep kit for a target insert size of 200 bp, and mate-paired libraries were constructed for target sizes of 3 kb, 5 kb, 8 kb, and 10 kb using the Nextera Mate Pair Library kit. Genomic DNA was sequenced on an Illumina HiSeq2500. Raw reads were trimmed to remove nucleotide biases, adapters and a quality score cutoff of 30 with Trimmomatic v0.33 40 and SeqClean 41 . Genome assembly was carried out using ALLPATHS-LG 42,43 . The expected gene content was assessed by searching for 4,104 mammalian single-copy orthologs (mammalia_odb9) using BUSCO v3.0.2 44 . We annotated and masked repeats in the resulting assembly using both the de novo method implemented in RepeatModeler v1.0.11 45

EP53 evolution in African and Asian elephants
To determine EP53 copy number and evolutionary patterns across placental mammals, we exported the TP53 human peptide from Ensembl (July 2019), and used it as a query in BLAT searches 56 of 43 mammalian genome assemblies hosted on the UCSC Genome Browser 57 .
Within each mammalian genome, we collected putative BLAT hits, filtered out sequences <33.3% of the length of the human peptide sequence, and used the highest scoring BLAT hit in a second BLAT of the target genome to determine copy number. Putative homologs were validated with a BLASTX query of the human peptide database on NCBI in order to ensure the top hit was human TP53 with ≥70% protein identity. Accepted nucleotide sequences were aligned with MAFFT 58 , and we weighted and filtered out unreliable columns in the alignment with GUIDANCE2 59 using 100 bootstrap replicates. We reconstructed the phylogeny of all aligned mammalian TP53 homologs and estimated their divergence times in a Bayesian framework with BEAST 2.5 60 using the HKY substitution model, a relaxed lognormal molecular clock model, and a Yule Model tree prior. We used a normal prior distribution for the age of MCMC chain was run for 100,000,000 generations, and we logged parameter samples every 10,000 generations to collect a total of 10,000 samples from the posterior distribution. We then collected 10,000 of the resulting trees, ignored the first 10% as burn-in, and calculated the maximum clade credibility tree using TreeAnnotator.

Detection of accelerated regions in African and Asian elephant genomes
We generated a multiple alignment (whole genome alignment or WGA) of 12 mammalian genome assemblies. First, we downloaded publicly available pairwise syntenic alignments of opossum (monDom5), mouse (mm10), dolphin (turTru1), cow (bosTau7), dog (canFam3), horse (equCab2), microbat (myoLuc1), tenrec (echTel2), and hyrax (proCap1) to the human reference (hg19) from the UCSC Genome Browser (http://hgdownload.soe.ucsc.edu/downloads.html#human). We also computed two additional de novo pairwise syntenic alignments with the human genome as a target and the two elephant genome assemblies reported here queries using local alignments from LASTZ v1.02 62  The copyright holder for this preprint (which this version posted May 31, 2020. . https://doi.org/10.1101/2020.05.29.124396 doi: bioRxiv preprint regions with aligned sequence for both African and Asian elephants that have aligned sequence present for at least 9 of the 10 non-elephant species. We tested the resulting 676,509 regions for acceleration in each elephant species relative to the 10 non-elephant species with phyloP using the following options: mode = 'ACC'. We used the Q-Value method 65 to adjust for multiple testing. Statistically significant ARs were defined with a false discovery rate threshold of 10%.
We defined regions significantly accelerated in the Asian elephant, but not the African bush elephant as Asian elephant specific ARs and conversely defined African bush elephant specific ARs.
To define genes differentially expressed between Asian and African elephants we took advantage of closeness of the two species. The Asian elephant is more closely related to the African elephant than humans are to chimpanzees (0.01186 substitutions / 100 bp vs 0.01277 substitutions / 100 bp, fourfold degenerate sites). For the purpose of defining differentially expressed genes, chimpanzee RNA-Seq reads have been aligned to human transcriptome indices 66 . We aligned African bush elephant PBMC reads from a previous study 13 69 based on transcription start site (TSS) locations obtained for protein coding genes with the R package biomaRt 70 for the African bush elephant genome (loxAfr3) with basal distances of 5 kb upstream and 1 kb downstream an extension distance of 100 kb. We chose this extension distance because the majority of conserved enhancers are located within 100 kb of a TSS 71 . We used the R package LOLA 72 to test for enrichment of ARs relative to CRs in the potential regulatory regions of DE genes in the loxAfr3 genome. Biological processes (BP) and associated elephant orthologs of human genes were obtained with biomaRt. The resulting p-values were Q-Value corrected for multiple testing.
We used the same potential regulatory regions and LOLA to test for GO enrichments.
To select GO terms related to traits that distinguish elephants from the background species and Asian and African elephants from one another for Figure 2, we selected significantly enriched GO terms containing any of the words immune, viral, pathogen, interleukin, T cell, B Cell, innate, apoptosis, P53, tumor, or TNF.

Whole genome sequence analysis of living elephants
To understand genetic changes underlying adaptations in Loxodonta africana, L. cyclotis, and Elephas maximus, we obtained ~15-40x whole-genome sequencing data from multiple individuals from across the modern range of living elephants from public databases 3,10,18,23 , and the WGS libraries for "Hi-Dari" and "Icky" as well as a third African elephant named "Christie" (Supplementary Table 6). We also obtained sequence data from a straight-tusked elephant 3 and two woolly mammoths 73 . Sequences were quality checked using FastQC and trimmed to remove nucleotide biases and adapter sequences with Trimmomatic where necessary. Reads from each individual were mapped to the L. africana genome (loxAfr3.0, Ensembl version) using bwa-mem v077. Alignments were filtered to include only mapped reads and sorted by position using samtools v0.0.19 74 , and we removed PCR duplicates using MarkDuplicates in picard

Selective sweep analysis
To detect loci that have been putatively subjected to positive selection within each living elephant species, we used SweeD v3.3.1. SweeD scans the genome for selective sweeps by calculating the composite likelihood ratio (CLR) test from the folded site frequency spectrum in 1 kb grids across each scaffold. We used the folded site frequency spectrum because we lacked a suitable closely related outgroup with genomic resources that would enable us to establish ancestral alleles. For this analysis, we studied each species individually. Following Nielsen et al.  ( Supplementary Fig. 4). Then, we categorized regions as outlier regions in the observed SNP data if their CLR was greater than the 99.99 th percentile of the distribution of the highest CLRs from the simulated SNP data. For the neutral simulations, we assumed a per-year mutation rate of 0.406E-9 and a generation time of 31 years, following Palkopoulou et al. (2018). We then calculated the CLR with the simulated neutral SNP datasets. SweeD output files were changed to BED format using the namedCapture (https://github.com/tdhock/namedCapture-article/) and data.table 81 R packages, and we used bedtools intersect 82 to collect elephant gene annotations (loxAfr3.0, Ensembl) overlapping putative selective sweeps.
Genomic scans for selection may be complicated by several factors that can increase false positive rates, and false negative rates potentially stem from variable mutation and recombination landscapes 30,83 . We therefore established statistical thresholds using null demographic models. However, the estimated split times within living elephant species differ widely, ranging from 609,000 to 463,000 years before present for forest elephants 84 , to as recent as 38,000 to 30,000 years before present for bush elephants 73,85 . Estimated split times between the sampled Asian elephants are highly variable, ranging from 190,000 to 24,000 years before present 3 , indicating continental-wide population structure not accounted for here 86,87 . Still, Palkopoulou et al. (2018) found little evidence for gene flow between the three modern species of elephant, which supports our choice of analyzing them separately for selective sweeps. Therefore, we focused on shared outlier regions, which show consistent evidence of being targeted by positive selection across all three elephant species.
Genes overlapping outlier regions of putative selective sweeps were functionally annotated using a variety of methods. First, we tested for the enrichment of Gene Ontology . CC-BY 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted May 31, 2020. . https://doi.org/10.1101/2020.05.29.124396 doi: bioRxiv preprint terms for biological processes 88 in the outlier gene list, using default parameters and the Benjamini-Hochberg correction for multiple testing with an adjusted p-value < 0.05. We used REVIGO 89 to semantically cluster and visualize the most significant GO terms according to their adjusted p-values using default parameters. We also created annotation clusters from the outlier genes using DAVID v6.8 90,91 and constructed protein interaction networks with STRING v11.0 92 .
Finally, we tested for enriched mouse phenotypes using ModPhea 93 .
Throughput Genomics Core and the Monsoon computing cluster at Northern Arizona University . CC-BY 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted May 31, 2020. . https://doi.org/10.1101/2020.05.29.124396 doi: bioRxiv preprint