Abstract
The risk of developing cancer is correlated with body size and lifespan within species. Between species, however, there is no correlation between cancer and either body size or lifespan, indicating that large, long-lived species have evolved enhanced cancer protection mechanisms. Elephants and their relatives (Proboscideans) are a particularly interesting lineage for the exploration of mechanisms underlying the evolution of augmented cancer resistance because they evolved large bodies recently within a clade of smaller bodied species (Afrotherians). Here, we explore the contribution of gene duplication to body size and cancer risk in Afrotherians. Unexpectedly, we found that tumor suppressor duplication was pervasive in Afrotherian genomes, rather than restricted to Proboscideans. Proboscideans, however, have duplicates in unique pathways that may underlie some aspects of their remarkable anti-cancer cell biology. These data suggest that duplication of tumor suppressor genes facilitated the evolution of increased body size by compensating for decreasing intrinsic cancer risk.
Introduction
Among the constraints on the evolution of large bodies and long lifespans in animals is an increased risk of developing cancer. If all cells in all organisms have a similar risk of malignant transformation and equivalent cancer suppression mechanisms, then organisms with many cells should have a higher prevalence of cancer than organisms with fewer cells, particularly because large and small animals have similar cell sizes [1]. Consistent with this expectation, there is a strong positive correlation between body size and cancer incidence within species; for example, cancer incidence increases with increasing adult height in humans [2, 3] and with increasing body size in dogs, cats, and cattle [4–6]. There is no correlation, however, between body size and cancer risk between species; this lack of correlation is often referred to as ‘Peto’s Paradox’ [7–9]. Indeed, cancer prevalence is relatively stable at ∼5% across species with diverse body sizes ranging from the minuscule 51g grass mouse to the gargantuan 4800kg African elephant [10–12]. While the ultimate resolution to Peto’s Paradox is obvious (large-bodied and long-lived species evolved enhanced cancer protection mechanisms), identifying and characterizing the proximate genetic, molecular, and cellular mechanisms that underlie the evolution of augmented cancer protection has proven difficult [13–17].
One of the challenges for discovering how animals evolved enhanced cancer protection mechanisms is identifying lineages in which large bodied species are nested within species with small body sizes. Afrotherian mammals are generally small-bodied, but also include the largest extant land mammals. For example, maximum adult weights are ∼70g in golden moles, ∼120g in tenrecs, ∼170g in elephant shrews, ∼3kg in hyraxes, and ∼60kg in aardvarks [18]. In contrast, while extant hyraxes are relatively small, the extinct Titanohyrax is estimated to have weighed ∼1300kg [19]. The largest living Afrotheria are also dwarfed by the size of their recent extinct relatives: extant sea cows such as manatees are large bodied (∼322-480kg) but are relatively small compared to the extinct Steller’s sea cow, which is estimated to have weighed ∼8,000-10,000kg [20]. Similarly, African Savannah (4,800kg) and Asian elephants (3,200kg) are large, but are dwarfed by the truly gigantic extinct Proboscideans such as Deinotherium (∼12,000kg), Mammut borsoni (16,000kg), and the straight-tusked elephant (∼14,000kg) [21]. Remarkably, these large-bodied Afrotherian lineages are nested deeply within small bodied species (Fig. 1) [22–25], indicating that gigantism independently evolved in hyraxes, sea cows, and elephants (Paenungulata). Thus, Paenungulates are an excellent model system in which to explore the mechanisms that underlie the evolution of large body sizes and augmented cancer resistance.
Many mechanisms have been suggested to resolve Peto’s paradox, including a decrease in the copy number of oncogenes, an increase in the copy number of tumor suppressor genes [7, 8, 26], reduced metabolic rates, reduced retroviral activity and load [27], and selection for ‘cheater’ tumors that parasitize the growth of other tumors [28], among many others. Among the most parsimonious routes to enhanced cancer resistance may be through an increased copy number of tumor suppressors. For example, transgenic mice with additional copies of TP53 have reduced cancer rates and extended lifespans [29], suggesting that changes in the copy number of tumor suppressors can affect cancer rates. Indeed, candidate gene studies have found that elephant genomes encode duplicate tumor suppressors such as TP53 and LIF [10, 17, 30] as well as other genes with putative tumor suppressive functions [31, 32]. These studies, however, focused on a priori candidate genes; thus, it is unclear whether duplication of tumor suppressor genes is a general phenomenon in the elephant lineage (or reflects an ascertainment bias).
Here we trace the evolution of body mass, cancer risk, and gene copy number variation across Afrotherian genomes, including multiple living and extinct Proboscideans (Fig. 1), to investigate whether duplications of tumor suppressors coincided with the evolution of large body sizes. Our estimates of the evolution of body mass across Afrotheria show that large body masses evolved in a stepwise manner, similar to previous studies [22–25] and coincident with dramatic reductions in intrinsic cancer risk. To explore whether duplication of tumor suppressors occurred coincident with the evolution of large body sizes, we used a genome-wide reciprocal best hit BLAT (RBHB) strategy to identify gene duplications, and used maximum likelihood to infer the lineages in which those duplications occurred. Unexpectedly, we found that duplication of tumor suppressor genes was common in Afrotherians, both large and small. Gene duplications in the Proboscidean lineage, however, were uniquely enriched in pathways that may explain some of the unique cancer protection mechanisms observed in elephant cells. These data suggest that duplication of tumor suppressor genes is pervasive in Afrotherians and preceded the evolution of species with exceptionally large body sizes.
Methods
Ancestral Body Size Reconstruction
We first assembled a time-calibrated supertree of Eutherian mammals by combining the time-calibrated molecular phylogeny of Bininda-Emonds et al. [33] with the time-calibrated total evidence Afrotherian phylogeny from Puttick and Thomas [25]. While the Bininda-Emonds et al. [33] phylogeny includes 1,679 species, only 34 are Afrotherian, and no fossil data are included. The inclusion of fossil data from extinct species is essential to ensure that ancestral state reconstructions of body mass are not biased by only including extant species. Restricting the analysis to extant species can lead to inaccurate reconstructions, for example, if lineages convergently evolved large body masses from a small-bodied ancestor. In contrast, the total evidence Afrotherian phylogeny of Puttick and Thomas [25] includes 77 extant species and fossil data from 39 extinct species. Therefore, we replaced the Afrotherian clade in the Bininda-Emonds et al. [33] phylogeny with the Afrotherian phylogeny of Puttick and Thomas [25] using Mesquite. Next, we jointly estimated rates of body mass evolution and reconstructed ancestral states using a generalization of the Brownian motion model that relaxes assumptions of neutrality and gradualism by considering increments to evolving characters to be drawn from a heavy-tailed stable distribution (the “Stable Model”) implemented in StableTraits [34]. The stable model allows for large jumps in traits and has previously been shown to out-perform other models of body mass evolution, including standard Brownian motion models, Ornstein–Uhlenbeck models, early burst maximum likelihood models, and heterogeneous multi-rate models [34].
Identification of Duplicate Genes
Reciprocal Best-Hit BLAT
We developed a reciprocal best hit BLAT (RBHB) pipeline to identify putative homologs and estimate gene copy number across species. The Reciprocal Best Hit (RBH) search strategy is conceptually straightforward: 1) Given a gene of interest GA in a query genome A, one searches a target genome B for all possible matches to GA; 2) For each of these hits, one then performs the reciprocal search in the original query genome to identify the highest-scoring hit; 3) A hit in genome B is defined as a homolog of gene GA if and only if the original gene GA is the top reciprocal search hit in genome A. We selected BLAT [35] as our algorithm of choice, as this algorithm is sensitive to highly similar (>90% identity) sequences, thus identifying the highest-confidence homologs while minimizing many-to-one mapping problems when searching for multiple genes. RBH performs similarly to other more complex methods of orthology prediction and is particularly good at identifying incomplete genes that may be fragmented in low-quality or poorly assembled regions of the genome [36, 37].
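The three RBH steps can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name, the locus strings, and the alignment scores are hypothetical, and the forward and reciprocal searches (performed with BLAT in the text) are represented here by precomputed toy data.

```python
def reciprocal_best_hits(query_gene, forward_hits, reverse_hits):
    """Return target loci whose top reciprocal hit is the original query gene.

    forward_hits: target-genome loci matched by the query gene (step 1).
    reverse_hits: maps each target locus to {query-genome gene: score},
                  i.e. the result of searching that locus back against
                  the query genome (step 2).
    """
    homologs = []
    for locus in forward_hits:
        scores = reverse_hits.get(locus, {})
        if not scores:
            continue  # no reciprocal hit at all
        best_gene = max(scores, key=scores.get)
        if best_gene == query_gene:  # step 3: keep only reciprocal best hits
            homologs.append(locus)
    return homologs


# Hypothetical toy data: three loci in the target genome match TP53,
# but the third one matches TP63 better in the reciprocal search.
forward = ["scaffold1:100-900", "scaffold7:50-800", "scaffold9:10-400"]
reverse = {
    "scaffold1:100-900": {"TP53": 950, "TP63": 400},
    "scaffold7:50-800": {"TP53": 870, "TP73": 300},
    "scaffold9:10-400": {"TP63": 500, "TP53": 450},
}
hits = reciprocal_best_hits("TP53", forward, reverse)
print(hits)
```

In this toy case two loci are retained as putative TP53 homologs, which would correspond to a raw copy number of two before any correction for fragmentation.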
Effective Copy Number By Coverage
In low-quality genomes, many genes are fragmented across multiple scaffolds, which results in BLA(S)T-like methods calling multiple hits when in reality there is only one gene. To compensate for this, we developed a novel statistic, Estimated Copy Number by Coverage (ECNC), which averages the number of times we hit each nucleotide of a query sequence in a target genome over the total number of nucleotides of the query sequence found overall in each target genome (Fig. S1). This allows us to correct for genes that have been fragmented across incomplete genomes, while accounting for missing sequences from the human query in the target genome. Mathematically, this can be written as:

$$\mathrm{ECNC} = \frac{\sum_{n=1}^{l} C_n}{\sum_{n=1}^{l} \mathrm{bool}(C_n)}$$

where $n$ is a given nucleotide in the query, $l$ is the total length of the query, $C_n$ is the number of instances that $n$ is present within a reciprocal best hit, and $\mathrm{bool}(C_n)$ is 1 if $C_n \geq 1$ or 0 if $C_n = 0$.
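As a rough illustration of how such a coverage-based statistic behaves, the sketch below computes it from hit intervals on the query. It assumes that each reciprocal best hit is summarized as a 0-based, half-open (start, end) interval in query coordinates; the function name and the interval representation are illustrative choices, not taken from the pipeline.

```python
def ecnc(query_length, hits):
    """Estimated copy number by coverage for one query sequence.

    hits: list of (start, end) intervals in query coordinates (0-based,
          half-open), one per reciprocal best hit in the target genome.
    Returns total per-nucleotide coverage divided by the number of query
    nucleotides covered at least once (0.0 if nothing is covered).
    """
    coverage = [0] * query_length
    for start, end in hits:
        for n in range(max(start, 0), min(end, query_length)):
            coverage[n] += 1
    covered = sum(1 for c in coverage if c > 0)
    if covered == 0:
        return 0.0
    return sum(coverage) / covered


# One gene split across two scaffolds: two hits, but still one copy.
print(ecnc(100, [(0, 60), (60, 100)]))   # 1.0
# Two full-length hits: a complete duplication.
print(ecnc(100, [(0, 100), (0, 100)]))   # 2.0
# One full-length hit plus a half-length hit: a partial duplication.
print(ecnc(100, [(0, 100), (0, 50)]))    # 1.5
# Half of the query is missing from the target genome: still one copy.
print(ecnc(100, [(0, 50)]))              # 1.0
```

The first and last cases show the two corrections described above: fragmentation does not inflate the estimate, and query sequence absent from the target does not deflate it.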
RecSearch Pipeline
We created a custom Python pipeline for automating RBHB searches between a single reference genome and multiple target genomes using a list of query sequences from the reference genome. For the query sequences in our search, we used the hg38 UniProt proteome [38], which is a comprehensive set of protein sequences curated from a combination of predicted and validated protein sequences generated by the UniProt Consortium. In order to refine our search, we omitted protein sequences originating from long, noncoding RNA loci (e.g. LINC genes); poorly-studied genes from predicted open reading frames (C-ORFs); and sequences with highly repetitive sequences such as zinc fingers, protocadherins, and transposon-containing genes, as these were prone to high levels of false positive hits. After filtering out problematic protein queries, we then used our pipeline to search for all copies of 18,011 query genes in publicly available Afrotherian genomes [4], including African savannah elephant (Loxodonta africana: loxAfr3, loxAfr4, loxAfrC), African forest elephant (Loxodonta cyclotis: loxCycF), Asian Elephant (Elephas maximus: eleMaxD), Woolly Mammoth (Mammuthus primigenius: mamPriV), Columbian mammoth (Mammuthus columbi: mamColU), American mastodon (Mammut americanum: mamAmeI), Rock Hyrax (Procavia capensis: proCap1, proCap2, proCap2HiC), West Indian Manatee (Trichechus manatus latirostris: triManLat1, triManLat1HiC), Aardvark (Orycteropus afer: oryAfe1, oryAfe1HiC), Lesser Hedgehog Tenrec (Echinops telfairi: echTel2), Nine-banded armadillo (Dasypus novemcinctus: dasNov3), Hoffmann’s two-toed sloth (Choloepus hoffmannii: choHof1, choHof2, choHof2HiC), Cape golden mole (Chrysochloris asiatica: chrAsi1), and Cape elephant shrew (Elephantulus edwardii: eleEdw1).
Query gene inclusion criteria
To assemble our query list, we began with the hg38 human proteome from UniProt (Accession UP000005640) [38]. We first removed all unnamed genes from the UP000005640. Next, we excluded genes from downstream analyses for which assignment of homology was uncertain, including uncharacterized ORFs (991 genes), LOC (63 genes), HLA genes (402 genes), replication dependent histones (72 genes), odorant receptors (499 genes), ribosomal proteins (410 genes), zinc finger transcription factors (1983 genes), viral and repetitive-element-associated proteins (82 genes) and any protein described as either “Uncharacterized,” “Putative,” or “Fragment” by UniProt in UP000005640 (30724 genes), leaving us with a final set of 37,582 query protein isoforms, corresponding to 18,011 genes.
Duplication gene inclusion criteria
In order to condense transcript-level hits into single gene loci, and to resolve many-to-one genome mappings, we removed exons where transcripts from different genes overlapped, and merged overlapping transcripts of the same gene into a single gene locus call. The resulting gene-level copy number table was then combined with the maximum ECNC values observed for each gene in order to call gene duplications. We called a gene duplicated if its copy number was two or more, and if the maximum ECNC value of all the gene transcripts searched was 1.5 or greater; previous studies have shown that incomplete duplications can encode functional genes [17, 30], therefore partial gene duplications were included provided they passed additional inclusion criteria (see below). The ECNC cut-off of 1.5 was selected empirically, as this value minimized the number of false positives seen in a test set of genes and genomes. The results of our initial search are summarized in Fig. 3A. Overall, we identified 13880 genes across all species, or 77.1% of our starting query genes.
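The duplication call described above combines two thresholds, and can be sketched as a small Python predicate. The function name and argument layout are hypothetical; only the thresholds (copy number of two or more, maximum ECNC of 1.5 or greater) come from the text.

```python
def is_duplicated(copy_number, transcript_ecncs, ecnc_cutoff=1.5):
    """Call a gene duplicated if it has >= 2 loci and the maximum ECNC
    across its searched transcripts reaches the cutoff."""
    if not transcript_ecncs:
        return False  # no ECNC evidence for this gene
    return copy_number >= 2 and max(transcript_ecncs) >= ecnc_cutoff


# Two loci, and one transcript covered ~1.6 times: called duplicated.
print(is_duplicated(2, [1.0, 1.6]))  # True
# Three loci but ECNC near 1: likely one gene fragmented across scaffolds.
print(is_duplicated(3, [1.1]))       # False
```

Requiring both conditions means that a gene split across several scaffolds (high locus count, ECNC near 1) is not counted as a duplication.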
Genome Quality Assessment using CEGMA
In order to determine the effect of genome quality on our results, we used the gVolante webserver and CEGMA to assess the quality and completeness of each genome [39, 40]. CEGMA was run using the default settings for mammals (“Cut-off length for sequence statistics and composition” = 1; “CEGMA max intron length” = 100000; “CEGMA gene flanks” = 10000, “Selected reference gene set” = CVG). For each genome, we generated a correlation matrix using the aforementioned genome quality scores, and either the mean Copy Number or mean ECNC for all hits in the genome. We observed that the percentage of duplicated genes in non-Pseudoungulatan genomes was higher (12.94% to 23.66%) than in Pseudoungulatan genomes (3.26% to 7.80%). Mean Copy Number, mean ECNC, and mean CN (the lesser of Copy Number and ECNC per gene) moderately or strongly correlated with genomic quality metrics, such as LD50, the number of scaffolds, and contigs with a length above either 100K or 1M (Fig. S3). The Afrosoricidians had the greatest correlation between poor genome quality and high gene duplication rates, including larger numbers of private duplications. The correlations between genome quality metrics and the number of gene duplications were particularly high for Cape golden mole (Chrysochloris asiatica: chrAsi1) and Cape elephant shrew (Elephantulus edwardii: eleEdw1); therefore, we excluded these species from downstream pathway enrichment analyses.
Determining functionality of duplicated genes via gene expression
In order to ascertain the functional status of duplicated genes, we generated de novo transcriptomes using publicly-available RNA-sequencing data for African savanna elephant, West Indian manatee, and nine-banded armadillo (Table S2). We mapped reads to the highest quality genome available for each species, and assembled transcripts using HISAT2 and StringTie [41–43]. We found that many of our identified duplicates had transcripts mapping to them above a Transcripts Per Million (TPM) score of 2, suggesting that many of these duplications are functional. RNA-sequencing data was not available for Cape golden mole, Cape elephant shrew, rock hyrax, aardvark, or the lesser hedgehog tenrec.
Reconstruction of Ancestral Copy Numbers
We encoded the copy number of each gene for each species as a discrete trait ranging from 0 (one gene copy) to 31 (for 32+ gene copies) and used IQ-TREE to select the best-fitting model of character evolution [44–48], which was inferred to be a Jukes-Cantor type model for morphological data (MK) with equal character state frequencies (FQ) and rate heterogeneity across sites approximated by including a class of invariable sites (I) plus a discrete Gamma model with four rate categories (G4). Next we inferred gene duplication and loss events with the empirical Bayesian ancestral state reconstruction (ASR) method implemented in IQ-TREE [44–48], the best fitting model of character evolution (MK+FQ+GR+I) [49, 50], and the unrooted species tree for Atlantogenata. We considered ancestral state reconstructions to be reliable if they had Bayesian Posterior Probability (BPP) ≥ 0.80; less reliable reconstructions were excluded from pathway analyses.
Pathway Enrichment Analysis
To determine if gene duplications were enriched in particular biological pathways, we used the WEB-based Gene SeT AnaLysis Toolkit (WebGestalt) [51] to perform Over-Representation Analysis (ORA) using the Reactome database [52]. Gene duplicates in each lineage were used as the foreground gene set, and the initial query set was used as the background gene set. WebGestalt uses a hypergeometric test for statistical significance of pathway over-representation, which we refined using two methods: a False Discovery Rate (FDR)-based approach, and an empirical p-value approach [53]. The Benjamini-Hochberg FDR multiple-testing correction was generated by WebGestalt. In order to correct P-values based on an empirical distribution, we modified the approach used by Chen et al. in Enrichr [53] to generate a “combined score” for each pathway based on the hypergeometric P-value from WebGestalt, and a correction for expected rank for each pathway. In order to generate the table of expected ranks and variances for this approach, we randomly sampled foreground sets of 10-5,000 genes from our background set 5000 times, and used WebGestalt ORA to obtain a list of enriched terms and P-values for each run; we then compiled a table of Reactome terms with their expected frequencies and standard deviation. These data were used to calculate a Z-score for terms in an ORA run, and the combined score was calculated using the formula $C = \log(p) \cdot z$.
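To make the two statistics concrete, the following Python sketch computes an upper-tail hypergeometric P-value for one pathway and the combined score C = log(p)·z. It is a standalone illustration, not a reimplementation of WebGestalt or Enrichr: the function names are hypothetical, the natural logarithm is assumed, and the Z-score is assumed to be the deviation of an observed rank from its expected rank in units of standard deviation.

```python
from math import comb, log


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n genes from a background of N genes,
    K of which belong to the pathway (over-representation test)."""
    upper = min(K, n)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)


def rank_z_score(observed_rank, expected_rank, sd_rank):
    """Assumed Z-score: deviation of the observed rank from the expected
    rank obtained from random foreground sets."""
    return (observed_rank - expected_rank) / sd_rank


def combined_score(p_value, z):
    """Combined score C = log(p) * z (natural log assumed)."""
    return log(p_value) * z


# Toy example: background of 10 genes, 4 in the pathway, foreground of 3
# genes, 2 of which are in the pathway.
p = hypergeom_upper_tail(2, 10, 4, 3)
z = rank_z_score(observed_rank=2, expected_rank=10, sd_rank=4)
print(p, z, combined_score(p, z))
```

Because log(p) is negative for p < 1, a pathway that ranks better than expected under random sampling (negative z under this assumed convention) receives a positive combined score.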
Estimating the Evolution of Cancer Risk
The dramatic increase in body mass and lifespan in some Afrotherian lineages, and the relatively constant rate of cancer across species of diverse body sizes [10], indicate that those lineages must have also evolved reduced cancer risk. To infer the magnitude of these reductions we estimated differences in intrinsic cancer risk across extant and ancestral Afrotherians. Following Peto [54], we estimated the intrinsic cancer risk ($K$) as the product of risk associated with body mass and lifespan. In order to determine $K$ across species and at ancestral nodes, we first estimated ancestral lifespans at each node. We used Phylogenetic Generalized Least-Squares Regression (PGLS) [55, 56], with a Brownian covariance matrix as implemented in the R package ape [57], to estimate ancestral lifespans across Atlantogenata from our estimates of body size at each node, using the model $\ln(\text{lifespan}) = \beta_1 \text{corBrownian} + \beta_2 \ln(\text{size}) + \epsilon$. Next, we calculated $K$ at all nodes using a simplified multistage cancer risk model for body size $D$ and lifespan $t$: $K \approx Dt^6$ [9, 54, 58, 59]. We then estimated the fold-change in cancer susceptibility between ancestral and descendant nodes (Fig. 2). The fold change in cancer risk between a node (with risk $K_2$) and its ancestor (with risk $K_1$) was defined as $\log_2(K_2/K_1)$.
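The risk model can be worked through numerically with a short Python sketch. The function names are hypothetical, the body masses and lifespans are invented for illustration, and the change in risk is expressed here as a log2 ratio of descendant to ancestor, which is an assumption chosen to match the log2 units reported in the Results.

```python
from math import log2


def intrinsic_cancer_risk(body_mass, lifespan):
    """Simplified multistage model: K is proportional to D * t**6."""
    return body_mass * lifespan ** 6


def log2_risk_change(k_node, k_ancestor):
    """log2 ratio of intrinsic risk at a node relative to its ancestor."""
    return log2(k_node / k_ancestor)


# Illustrative values only (arbitrary units, not estimates from the paper).
ancestor = intrinsic_cancer_risk(body_mass=1, lifespan=1)
doubled_mass = intrinsic_cancer_risk(body_mass=2, lifespan=1)
doubled_lifespan = intrinsic_cancer_risk(body_mass=1, lifespan=2)

print(log2_risk_change(doubled_mass, ancestor))      # 1.0
print(log2_risk_change(doubled_lifespan, ancestor))  # 6.0
```

Under this model, doubling body mass changes intrinsic risk by one log2 unit, whereas doubling lifespan changes it by six, so inferred lifespan has a much larger effect on K than body mass does.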
Data Analysis
All data analysis was performed using R version 4.0.2 (2020-06-22), and the complete reproducible manuscript, along with code and data generation pipeline, can be found on the author’s GitHub page at www.github.com/docmanny/smRecSearch/tree/publication [57, 60–104].
Results
Step-wise evolution of body size in Afrotherians
Similar to previous studies of Afrotherian body size [25, 34], we found that the body mass of the Afrotherian ancestor was inferred to be small (0.26kg, 95% CI: 0.31-3.01kg) and that substantial accelerations in the rate of body mass evolution occurred coincident with a 67.36x increase in body mass in the stem-lineage of Pseudoungulata (17.33kg); a 1.45x increase in body mass in the stem-lineage of Paenungulata (25.08kg); a 11.82x increase in body mass in the stem-lineage of Tethytheria (296.56kg); a 1.39x increase in body mass in the stem-lineage of Proboscidea (412.5kg); and a 2.69x increase in body mass in the stem-lineage of Elephantimorpha (4114.39kg), which is the last common ancestor of elephants and mastodons using the fossil record (Fig. 2A/B). The ancestral Hyracoidea was inferred to be relatively small (2.86kg-118.18kg), and rate accelerations were coincident with independent body mass increases in large hyraxes such as Titanohyrax andrewsi (429.34kg, 67.36x increase) (Fig. 2A/B). While the body mass of the ancestral Sirenian was inferred to be large (61.7-955.51kg), a rate acceleration occurred coincident with a 10.59x increase in body mass in Steller’s sea cow (Fig. 2A/B). Rate accelerations also occurred coincident with a 36.6x decrease in body mass in the stem-lineage of the dwarf elephants Elephas (Palaeoloxodon) antiquus falconeri and Elephas cypriotes (Fig. 2A/B). These data indicate that gigantism in Afrotherians evolved step-wise, from small to medium bodies in the Pseudoungulata stem-lineage, medium to large bodies in the Tethytherian stem-lineage and extinct hyraxes, and from large to exceptionally large bodies independently in the Proboscidean stem-lineage and Steller’s sea cow (Fig. 2A/B).
Step-wise reduction of intrinsic cancer risk in large, long-lived Afrotherians
In order to account for a relatively stable cancer rate across species [10–12], intrinsic cancer risk must also evolve with changes in body size (and lifespan) across species. As expected, intrinsic cancer risk in Afrotheria also varies with changes in body size and longevity (Fig. 2A/B), with a 6.41-log2 decrease in the stem-lineage of Xenarthra, followed by a 13.37-log2 decrease in Pseudoungulata, and a 1.49-log2 decrease in Aardvarks (Fig. 2A). In contrast to the Paenungulate stem-lineage, there is a 7.84-log2 decrease in cancer risk in Tethytheria, a 0.67-log2 decrease in Manatee, a 3.14-log2 decrease in Elephantimorpha, and a 1.05-log2 decrease in Proboscidea. Relatively minor decreases occurred within Proboscidea, including a 0.83-log2 decrease in Elephantidae and a 0.57-log2 decrease in the American Mastodon. Within the Elephantidae, Elephantina and Loxodontini have a 0.06-log2 decrease in cancer susceptibility, while susceptibility is relatively stable in Mammoths. The three extant Proboscideans, Asian Elephant, African Savannah Elephant, and the African Forest Elephant, meanwhile, have similar decreases in body size, with slight increases in cancer susceptibility (Fig. 2A/B).
Pervasive duplication of tumor suppressor genes in Afrotheria
Our hypothesis was that genes which duplicated coincident with the evolution of increased body mass (IBM) and reduced intrinsic cancer risk (RICR) would be uniquely enriched in tumor suppressor pathways compared to genes that duplicated in other lineages. Therefore, we identified duplicated genes in each Afrotherian lineage (Fig. 3A) and tested if they were enriched in Reactome pathways related to cancer biology (Fig. 3B, Table 2). No pathways related to cancer biology were enriched in either the Pseudoungulata (67.36-fold IBM, 13.37-log2 RICR) or Paenungulata (1.45-fold IBM, 1.17-log2 RICR) stem-lineages (Fig. 3B); however, while a large change in both IBM and RICR occurred in the Pseudoungulata stem-lineage, only a few genes were inferred to be duplicated in this lineage, reducing power to detect enriched pathways. Consistent with our hypothesis, 55.8% (29/52) of the pathways that were enriched in the Tethytherian stem-lineage (11.82-fold IBM, 7.84-log2 RICR), 27.8% (20/72) of the pathways that were enriched in the Proboscidean stem-lineage (1.06-fold IBM, 3.14-log2 RICR), and 28% (33/118) of the pathways that were enriched within Proboscideans were related to tumor suppression (Fig. 3B). Similarly, 17.8% (10/56) and 30% (30/100) of the pathways that were enriched in manatee (1.11-fold IBM, 0.89-log2 RICR) and aardvark (67.36-fold IBM, 1.49-log2 RICR), respectively, were related to tumor suppression. In contrast, only 4.9% (2/41) of the pathways that were enriched in hyrax (1.6-fold IBM, 1.49-log2 RICR) were related to tumor suppression (Fig. 3B). Unexpectedly, however, lineages without major increases in body size or lifespan, or decreases in intrinsic cancer risk, were also enriched for tumor suppressor pathways. For example, 13.2% (9/68), 36.1% (13/36), and 20% (20/100) of the pathways that were enriched in the stem-lineages of Afroinsectivora and Afrosoricida, and in E. telfairi, respectively, were related to cancer biology (Fig. 3B).
Duplication of tumor suppressor genes is pervasive in many Afrotherians
Our observation that gene duplicates in most lineages (15/20) are enriched in cancer pathways suggests that either duplication of genes in cancer pathways is common in Afrotherians, or that there may be a systemic bias in the pathway enrichment analyses. For example, random gene sets may be generally enriched in pathway terms related to cancer biology. To explore this latter possibility, we generated 5000 randomly sampled gene sets of between 10 and 5000 genes, and tested for enriched Reactome pathways using ORA. We found that no cancer pathways were enriched (median hypergeometric p-value ≤ 0.05) among tested gene sets larger than 157 genes; however, in smaller gene sets, 12%-18% of enriched pathways were classified as cancer pathways. Without considering p-value thresholds, the percentage of enriched cancer pathways approaches ∼15% (213/1381) in simulated sets. Thus, for larger gene sets, we conservatively used a threshold of 15% for enriched pathways related to cancer biology resulting from sampling bias. We directly compared our simulated and observed enrichment results by lineage and gene set size, and found that only Columbian mammoth, Paenungulata, Elephantidae, African Forest elephant, Afrosoricida, Tethytheria, Asian elephant, African Savannah elephant, Proboscidea, manatee, aardvark, and tenrec had enriched cancer pathway percentages above background with respect to their gene set sizes, i.e., expected enrichments based on random sampling of small gene sets (Fig. 3B). Thus, we conclude that duplication of genes in cancer pathways is common in many Afrotherians but that the inference of enriched cancer pathway duplication is not different from background in some lineages, particularly in ancestral nodes with a small number of estimated duplicates.
Tumor suppressor pathways enriched exclusively within Proboscideans
While duplication of cancer associated genes is common in Afrotheria, the 157 genes that duplicated in the Proboscidean stem-lineage (Fig. 3A) were uniquely enriched in 12 pathways related to cancer biology (Fig. 3B). Among these uniquely enriched pathways (Fig. 3C) were pathways related to the cell cycle, including “G0 and Early G1”, “G2/M Checkpoints” and “Phosphorylation of the APC/C”, pathways related to DNA damage repair including “Global Genome Nucleotide Excision Repair (GG-NER)”, “HDR through Single Strand Annealing (SSA)”, “Gap-filling DNA repair synthesis and ligation in GG-NER”, “Recognition of DNA damage by PCNA-containing replication complex”, and “DNA Damage Recognition in GG-NER”, pathways related to telomere biology including “Extension of Telomeres” and “Telomere Maintenance”, pathways related to the apoptosome including “Activation of caspases through apoptosome-mediated cleavage”, and pathways related to mTOR signaling including “mTORC1-mediated signaling” and “mTOR signaling”. Thus, duplication of genes with tumor suppressor functions is pervasive in Afrotherians, but genes in some pathways related to cancer biology and tumor suppression are uniquely duplicated in large-bodied (long-lived) Proboscideans (Fig. 4A/B).
Among the genes uniquely duplicated within Proboscideans are TP53, COX20, LAMTOR5, PRDX1, STK11, BRD7, MAD2L1, BUB3, UBE2D1, SOD1, LIF, MAPRE1, CNOT11, CASP9, CD14, HMGB2 (Fig. 4C). Two of these, TP53 and LIF, have been previously described [10, 17, 30]. These genes are significantly enriched in pathways involved in apoptosis, cell cycle regulation, and both upstream and downstream pathways involving TP53. The majority of these genes are expressed in African Elephant transcriptome data (Fig. 4D), suggesting that they maintained functionality after duplication.
Coordinated duplication of TP53-related genes in Proboscidea
Prior studies found that the “master” tumor suppressor TP53 duplicated multiple times in elephants [10, 17], motivating us to further study duplication of genes involved in TP53-related pathways in Proboscidea. We traced the evolution of genes in the TP53 pathway that appeared in one or more Reactome pathway enrichments for genes duplicated recently in the African Elephant, which has the most complete genome among Proboscideans and for which several RNA-Seq data sets are available. We found that the initial duplication of TP53 in Tethytheria, where body size expanded, was preceded by the duplication of GTF2F1 and STK11 in Paenungulata and was coincident with the duplication of BRD7. These three genes are involved in regulating the transcription of TP53 [105–108], and their duplication prior to that of TP53 may have facilitated re-functionalization of TP53 retroduplicates. Interestingly, STK11 is also a tumor suppressor that mediates tumor suppression via p21-induced senescence [106]. The other genes that are duplicated in the pathway are downstream of TP53; these genes duplicated either coincident with TP53, as in the case of SIAH1, or subsequently in Proboscidea, Elephantidae, or extant elephants (Fig. 4). These genes are expressed in RNA-Seq data (Fig. 4D), suggesting that they are functional.
Discussion
Among the evolutionary, developmental, and life history constraints on the evolution of large bodies and long lifespans is an increased risk of developing cancer. While body size and lifespan are correlated with cancer risk within species, there is no correlation between species because large and long-lived organisms have evolved enhanced cancer suppression mechanisms. While this ultimate evolutionary explanation is straightforward [54], determining the mechanisms that underlie the evolution of enhanced cancer protection is challenging because many mechanisms of relatively small effects likely contribute to evolution of reduced cancer risk. Previous candidate gene studies in elephants have identified duplications of tumor suppressors such as TP53 and LIF, among others, suggesting that an increased copy number of tumor suppressors may contribute to the evolution of large body sizes in the elephant lineage [10, 17, 30–32]. Here we: 1) trace the evolution of body size and lifespan in Eutherian mammals, with particular reference to Afrotherians; 2) infer changes in cancer susceptibility across Afrotherian lineages; 3) use a genome-wide screen to identify gene duplications in Afrotherian genomes, including multiple living and extinct Proboscideans; and 4) show that while duplication of genes with tumor suppressor functions is pervasive in Afrotherian genomes, Proboscidean gene duplicates are enriched in unique pathways with tumor suppressor functions.
Correlated evolution of large bodies and reduced cancer risk
The hundred- to hundred-million-fold reductions in intrinsic cancer risk associated with the evolution of large body sizes in some Afrotherian lineages, in particular Elephantimorphs such as elephants and mastodons, suggest that these lineages must have also evolved remarkable mechanisms to suppress cancer. While our initial hypothesis was that large bodied lineages would be uniquely enriched in duplicate tumor suppressor genes compared to other smaller bodied lineages, we unexpectedly found that the duplication of genes in tumor suppressor pathways occurred at various points throughout the evolution of Afrotheria, regardless of body size. These data suggest that this abundance of tumor suppressors may have contributed to the evolution of large bodies and reduced cancer risk, but that these processes were not necessarily coincident. Interestingly, pervasive duplication of tumor suppressors may also have contributed to the repeated evolution of large bodies in hyraxes and sea cows, because at least some of the genetic changes that underlie the evolution of reduced cancer risk were common in this group. It remains to be determined whether our observation of pervasive duplication of tumor suppressors also occurs in other multicellular lineages. Using a similar reciprocal best BLAST/BLAT approach that focused on estimating copy number of known tumor suppressors in mammalian genomes, for example, Caulin et al. (2015) found no correlation between copy number of tumor suppressors and either body mass or longevity, whilst Tollis et al. (2020) found a correlation between copy number and longevity (but not body size) [12, 31]. These opposing conclusions may result from differences in the number of genes (81 vs 548) and genomes (8 vs 63) analyzed, highlighting the need for genome-wide analyses of many species that vary in body size and longevity.
All Afrotherians are equal, but some Afrotherians are more equal than others
While we found that duplication of tumor suppressor genes is common in Afrotheria, genes that duplicated in the Proboscidean stem-lineage (Fig. 3A/B) were uniquely enriched in functions and pathways that may be related to the evolution of unique anti-cancer cellular phenotypes in the elephant lineage (Fig. 3C). Elephant cells, for example, cannot be experimentally immortalized [109, 110], rapidly repair DNA damage [17, 111, 112], and are extremely resistant to oxidative stress [110], and yet are also extremely sensitive to DNA damage [10, 17, 30]. Several pathways related to DNA damage repair, in particular nucleotide excision repair (NER), were uniquely enriched among genes that duplicated in the Proboscidean stem-lineage, suggesting a connection between duplication of genes involved in NER and rapid DNA damage repair [111, 112]. Similarly, we identified a duplicate SOD1 gene in Proboscideans that may confer the resistance of elephant cells to oxidative stress [110]. Pathways related to the cell cycle were also enriched among genes that duplicated in Proboscideans, and cell cycle dynamics are different in elephants compared to other species; population doubling (PD) times for African and Asian elephant cells are 13-16 days, while PD times are 21-28 days in other Afrotherians [110]. Finally, the role of “mTOR signaling” in the biology of aging is well known. Collectively these data suggest that gene duplications in Proboscideans may underlie some of their cellular phenotypes that contribute to cancer resistance.
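As an illustration of what pathway enrichment means here, one common formulation of over-representation analysis is a one-sided hypergeometric test, sketched below. This is an assumption about the general technique, not a description of the exact procedure behind the enrichments reported in this study; the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_pathway, n_duplicated, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    Probability of seeing at least n_overlap pathway genes when
    n_duplicated genes are drawn without replacement from a background of
    n_background genes, of which n_in_pathway belong to the pathway.
    """
    total = comb(n_background, n_duplicated)
    upper = min(n_in_pathway, n_duplicated)
    tail = sum(
        comb(n_in_pathway, i)
        * comb(n_background - n_in_pathway, n_duplicated - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 10,000 background genes, 40 of them in a
# pathway, 150 duplicated genes, 5 of which fall in the pathway.
p_toy = enrichment_p_value(10_000, 40, 150, 5)
print(f"toy enrichment p-value: {p_toy:.3g}")
```

In practice many pathways are tested at once, so a p-value like this would normally be followed by a multiple-testing correction.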
There’s no such thing as a free lunch: Trade-offs and constraints on tumor suppressor copy number
While we observed that duplication of genes in cancer-related pathways – including genes with known tumor suppressor functions – is pervasive in Afrotheria, the number of duplicate tumor suppressor genes was relatively small, which may reflect a trade-off between the protective effects of increased tumor suppressor number on cancer risk and potentially deleterious consequences of increased tumor suppressor copy number. Overexpression of TP53 in mice, for example, is protective against cancer but associated with progeria, premature reproductive senescence, and early death; however, transgenic mice with a duplication of the TP53 locus that includes native regulatory elements are healthy and experience normal aging, while also demonstrating an enhanced response to cellular stress and lower rates of cancer [29, 113]. These data suggest duplication of tumor suppressors can contribute to augmented cancer resistance, if the duplication includes sufficient regulatory architecture to direct spatially and temporally appropriate gene expression. Thus, it is interesting that duplication of genes that regulate TP53 function, such as STK11, SIAH1, and BRD7, preceded the retroduplication of TP53 in the Proboscidean stem-lineage, which may have mitigated toxicity arising from dosage imbalances. Similar co-duplication events may have alleviated the negative pleiotropy of tumor suppressor gene duplications to enable their persistence and allow for subsequent co-option during the evolution of cancer resistance.
Conclusions, caveats, and limitations
Our genome-wide results suggest that duplication of tumor suppressors is pervasive in Afrotherians and may have enabled the evolution of larger body sizes in multiple lineages by lowering intrinsic cancer risk either prior to or coincident with increasing body size. However, our study has several inherent limitations. For example, we have shown that genome quality plays an important role in our ability to identify duplicate genes, and several species have poor quality genomes (and thus were excluded from further analyses). In addition, without comprehensive gene expression data we cannot be certain that duplicate genes are actually expressed. Duplication of tumor suppressor genes is also unlikely to be the only mechanism responsible for the evolution of large body sizes, long lifespans, and reduced cancer risk. The evolution of regulatory elements, coding genes, and genes with non-canonical tumor suppressor functions is also important for mediating cancer risk. We also assume that duplicate genes preserve their original functions and increase overall gene dosage. Many processes, however, such as developmental systems drift, neofunctionalization, and sub-functionalization can cause divergence in gene functions [114–116], leading to inaccurate inferences of dosage effects and pathway functions.
Conflicts of Interest
The Authors have no conflicts of interest to report.
Funding Source
We would like to thank the Department of Human Genetics at the University of Chicago for supporting this project.
Supplementary Data Files
Data File S1: “Atlantogenata_GeneCopyNumber.csv” A spreadsheet with genes and copy numbers for all genomes searched.
Data File S2: “Atlantogenata_mlTree.nexus” A NEXUS file containing estimated copy numbers of genes across Atlantogenata.
Data File S3: “Atlantogenata_Reactome_ORA.xlsx” A spreadsheet with all Reactome enrichments.
Data File S4: “AtlantogenataReactomePathwayClasses.csv” Classification of Reactome Pathways.
Data File S5: “Atlantogenata_RBB.zip” A .zip archive containing BED files with the locations of all identified Reciprocal Best Hits.
Acknowledgements
We would like to thank Dr. Olga Dudchenko and Dr. Erez Aiden at Baylor College of Medicine for the Hi-C scaffolded Procavia capensis, Trichechus manatus, Orycteropus afer, and Choloepus hoffmannii genomes. We would also like to thank D.H. Vazquez for his indispensable support.