The genetic basis of tail-loss evolution in humans and apes

The loss of the tail is one of the main anatomical evolutionary changes to have occurred along the lineage leading to humans and to the “anthropomorphous apes” [1,2]. This morphological reprogramming in the ancestral hominoids has long been considered to have accommodated a characteristic style of locomotion and contributed to the evolution of bipedalism in humans [3–5]. Yet, the precise genetic mechanism that facilitated tail-loss evolution in hominoids remains unknown. Primate genome sequencing projects have made possible the identification of causal links between genotypic and phenotypic changes [6–8], and enable the search for hominoid-specific genetic elements controlling tail development [9]. Here, we present evidence that tail-loss evolution was mediated by the insertion of an individual Alu element into the genome of the hominoid ancestor. We demonstrate that this Alu element, inserted into an intron of the TBXT gene (also called T or Brachyury [10–12]), pairs with a neighboring ancestral Alu element encoded in the reverse genomic orientation and leads to a hominoid-specific alternative splicing event. To study the effect of this splicing event, we generated a mouse model that mimics the expression of human TBXT products by expressing both full-length and exon-skipped isoforms of the mouse TBXT ortholog. We found that mice with this genotype exhibit the complete absence of a tail or a shortened tail, supporting the notion that the exon-skipped transcript is sufficient to induce a tail-loss phenotype, albeit with incomplete penetrance. We further noted that mice homozygous for the exon-skipped isoforms exhibited embryonic spinal cord malformations, resembling a neural tube defect condition, which affects ∼1/1000 human neonates [13]. We propose that selection for the loss of the tail along the hominoid lineage was associated with an adaptive cost of potential neural tube defects and that this ancient evolutionary trade-off may thus continue to affect human health today.

The tail appendage varies widely in its morphology and function across vertebrate species [4,5]. For primates in particular, the tail is adapted to a range of environments, with implications for the animal's style of locomotion [14,15]. The New World howler monkeys, for example, evolved a prehensile tail that helps the animal to grasp or hold objects while occupying arboreal habitats [16]. Hominoids, which include humans and the apes, are however distinct among the primates in their loss of an external tail (Fig. 1a). The loss of the tail is inferred to have occurred ~25 million years ago, when the hominoid lineage diverged from the ancient Old World monkeys (Fig. 1a), leaving only 3–4 caudal vertebrae to form the coccyx, or tailbone, in modern humans [17]. It has long been speculated that tail loss in hominoids contributed to bipedal locomotion, whose evolutionary occurrence coincided with the loss of the tail [18–20]. Recent progress in developmental biology has led to the elucidation of the gene regulatory networks that underlie tail development [9,21]. Specifically, the Mouse Genome Informatics database has so far recorded 31 genes associated with the absence-of-tail phenotype, drawn from the study of mutants and naturally occurring variants [21,22] (Supp. Table 1). Expression of these genes is enriched in the development of the primitive streak and posterior body formation, including the core gene regulatory network for inducing the mesoderm and definitive endoderm, such as Tbxt, Wnt3a, and Msgn1. While these genes and their relationships have been studied, the exact genetic changes that drove the evolution of tail loss in hominoids remain unknown, preventing an understanding of how tail loss affected other human evolutionary events, such as bipedalism.

A hominoid-specific intronic AluY element in TBXT
We screened the 31 human genes involved in tail development, together with their primate orthologs, with the goal of identifying genetic variation associated with the loss of the tail in hominoids (Supp. Table 1). We first examined protein sequence conservation between the hominoid genomes and their closest sister lineage, the Old World monkeys (Cercopithecidae). However, we failed to detect candidate variants in hominoid coding sequences that might provide a genetic mechanism for tail-loss evolution (Supp. File 1). We next queried for hominoid-specific genomic rearrangements in the non-coding regions of genes related to tail development. Surprisingly, we found a hominoid-specific Alu element inserted in intron 6 of TBXT (Fig. 1b and Supp. File 1) [10,11]. TBXT encodes a highly conserved transcription factor critical for mesoderm and definitive endoderm formation during embryonic development [12,23–25]. Heterozygous mutations in the coding regions of the TBXT orthologs in tailed animals, such as mouse [10], Manx cat [26], dog [27] and zebrafish [28], lead to the absence or a reduced form of the tail, and homozygous mutants are typically not viable. Moreover, this particular Alu insertion is from the AluY subfamily, a relatively 'young' but not human-specific subfamily shared between the genomes of hominoids and Old World monkeys, the activity of which coincides with the evolutionary time when early hominoids lost their tails [29].
The AluY element in TBXT is not inserted in the vicinity of a splice site; rather, it is >500 bp from exon 6 of TBXT, the nearest coding exon. As such, it would not be expected to lead to an alternative splicing event of the kind found for other intronic Alu elements affecting splicing [30–32]. However, we noted the presence of another Alu element (AluSx1) in the reverse orientation in intron 5 of TBXT that is conserved in all simians. Together, the AluY and AluSx1 elements form an inverted repeat pair (Fig. 1b). We thus posited that upon transcription, the simian-specific AluSx1 element pairs with the hominoid-specific AluY element, forming a stem-loop structure in the TBXT pre-mRNA and trapping exon 6 in the loop (Fig. 1c). An inferred RNA secondary structure model supported an interaction between these two Alu elements [33] (Fig. S1). The secondary structure of the transcript may thus bring together the splice donor of exon 5 and the splice acceptor of exon 7 and promote the skipping of exon 6, leading to a hominoid-specific, in-frame alternative splicing isoform, TBXT-Δexon6 (Fig. 1c). Indeed, we validated the existence of TBXT-Δexon6 transcripts in human and their absence in mouse, which lacks the Alu elements, using an embryonic stem cell (ESC) in vitro differentiation system that induces TBXT expression similar to that present in the primitive streak of the embryo (Fig. S2) [34,35]. Considering the high conservation of TBXT exon 6 and its potential transcriptional regulation function (Fig. S3), we hypothesized that in humans and apes, the TBXT-Δexon6 isoform protein disrupts tail elongation during embryonic development, leading to the reduction or loss of an external tail (Fig. 1c).

Fig. 1 (caption fragment): [...] UCSC Genome Browser view of the conservation score through multi-species alignment at the TBXT locus across primate genomes [36]. The hominoid-specific AluY element is labelled in red. c, Schematic of the hypothesized mechanism of tail-loss evolution in hominoids.
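To make the inverted-repeat argument concrete, the following minimal Python sketch checks whether two elements on the same RNA strand could base-pair when the strand folds back on itself. The 12-nt sequences are hypothetical stand-ins, not the real AluSx1 or AluY sequences, and the scoring (ungapped Watson-Crick matches only) is a simplifying assumption, not the structure-prediction method behind Fig. S1.

```python
# Toy model: two elements on one RNA strand form a stem only if the upstream
# element is (close to) the reverse complement of the downstream element.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def paired_fraction(upstream: str, downstream: str) -> float:
    """Fraction of upstream bases that pair with the downstream element
    when the strand folds back (ungapped, Watson-Crick pairs only)."""
    partner = reverse_complement(downstream)
    n = min(len(upstream), len(partner))
    if n == 0:
        return 0.0
    matches = sum(1 for a, b in zip(upstream, partner) if a == b)
    return matches / n


# Hypothetical repeat unit; NOT taken from any real Alu consensus sequence.
repeat_unit = "GGCUGGGCGUGG"

# Inverted pair: the upstream copy sits in the reverse orientation
# (as described for AluSx1 in intron 5 relative to AluY in intron 6).
inverted_upstream = reverse_complement(repeat_unit)
inverted_score = paired_fraction(inverted_upstream, repeat_unit)

# Same-orientation pair, for contrast: two direct copies pair poorly.
direct_score = paired_fraction(repeat_unit, repeat_unit)

print(f"inverted repeat pairing: {inverted_score:.2f}")
print(f"direct repeat pairing:   {direct_score:.2f}")
```

In this toy setting only the inverted arrangement pairs along its full length, which is the property the stem-loop hypothesis relies on; real Alu elements diverge from one another, so actual pairing would be partial.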

AluY insertion in TBXT induces alternative splicing, and requires interaction with AluSx1
To test whether both AluY and AluSx1 are required to induce the hominoid-specific alternatively spliced isoform of TBXT, we used CRISPR/Cas9 in human ESCs to individually delete the hominoid-specific AluY element and, in a separate line, its potentially interacting counterpart AluSx1 (Fig. 2a, S4a). Again, we adapted the hESC in vitro differentiation system to mimic TBXT expression in the embryo (Fig. S2) [34]. We found that deleting AluY almost completely eliminated the generation of the TBXT-Δexon6 isoform transcript (Fig. 2b, middle). Similarly, deleting the interacting partner, AluSx1, sufficed to repress this alternatively spliced isoform (Fig. 2b, right). These results support the notion that the hominoid-specific AluY insertion induces a novel TBXT-Δexon6 alternatively spliced (AS) isoform through an interaction with the neighboring AluSx1 element (Fig. 2c, top).

Interestingly, we found that wild-type differentiated hESCs also express a minor, previously unannotated transcript that excludes both exon 6 and exon 7, leading to a frameshift and early truncation at the protein level (Fig. 2b, left, and S4b). Whereas deleting AluY slightly enhanced the abundance of this TBXT-Δexon6&7 transcript, deleting AluSx1 in intron 5 completely eliminated it (Fig. 2b). This may be best explained by a secondary interaction of the AluSx1 element with a distal AluSq2 element in intron 7. In this scenario, the secondary interaction would occur at a lower probability than the AluY-AluSx1 interaction (Fig. 2c, bottom). These results further support an interaction among intronic transposable elements affecting splicing of the conserved TBXT regulator (Fig. 2c, S4b).
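The pairing model described in this section can be summarized as a small piece of logic: each exon-skipping isoform is expected only when both members of its Alu pair are present. The sketch below encodes that rule in Python; it is a deliberately binary simplification (the data show "almost complete" elimination and a slight enhancement, which a yes/no model cannot capture), and the function and constant names are illustrative choices rather than anything defined in the study.

```python
# Binary sketch of the Alu-pairing model: an exon-skipping isoform is
# predicted only if both Alu elements of its pair are present.
REQUIRED_PAIR = {
    "TBXT-Δexon6": {"AluSx1", "AluY"},      # pair flanking exon 6
    "TBXT-Δexon6&7": {"AluSx1", "AluSq2"},  # pair flanking exons 6 and 7
}

WILD_TYPE = frozenset({"AluSx1", "AluY", "AluSq2"})


def predicted_isoforms(alus_present):
    """List the isoforms predicted for a given set of intact Alu elements."""
    present = set(alus_present)
    isoforms = ["TBXT-full-length"]  # always produced in this simplification
    for isoform, pair in REQUIRED_PAIR.items():
        if pair <= present:
            isoforms.append(isoform)
    return isoforms


for label, genotype in [
    ("wild type", WILD_TYPE),
    ("AluY deleted", WILD_TYPE - {"AluY"}),
    ("AluSx1 deleted", WILD_TYPE - {"AluSx1"}),
]:
    print(f"{label}: {', '.join(predicted_isoforms(genotype))}")
```

Under these assumptions, deleting AluY removes only the exon 6-skipped isoform, while deleting AluSx1 removes both skipped isoforms, matching the qualitative pattern reported for Fig. 2b.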

Tbxt-Δexon6 is sufficient to induce tail loss in mice
To test whether the TBXT-Δexon6 isoform is sufficient to induce tail loss, we generated a heterozygous mouse Tbxt Δexon6/+ model (Fig. 3a). TBXT is highly conserved in vertebrates, and the human and mouse protein sequences share 91% identity with a similar exon/intron architecture [11]. We could thus simulate a TBXT-Δexon6 isoform by deleting exon 6 in mice, forcing splicing of exon 5 to exon 7. The Tbxt Δexon6/+ heterozygous mouse thus mimics the TBXT gene in human, which expresses both full-length and Δexon6 splice isoforms (Fig. 2b and 3b-c).

Studying the phenotypes of the Tbxt Δexon6/+ mice, we found that simultaneous expression of both isoforms led to strong but heterogeneous tail morphologies, including no-tail and short-tail phenotypes (Fig. 3d-e, S5). Specifically, 21 of the 63 heterozygous mice showed tail phenotypes, while none of their 35 wild-type littermates did (Table 1). The incomplete penetrance of phenotypes among the heterozygotes was stable across generations and founder lines: no-/short-tailed Tbxt Δexon6/+ parents can give birth to long-tailed Tbxt Δexon6/+ mice, whereas long-tailed Tbxt Δexon6/+ parents can give birth to pups with varied tail phenotypes (Table 1, Fig. S5), providing further evidence that the presence of TBXT-Δexon6 suffices to induce tail loss.

To control for the possibility that zygotic CRISPR targeting induced off-target DNA changes at the Tbxt locus, we performed Capture-seq covering the Tbxt locus and ~200 kb of both upstream and downstream flanking regions [37] (Fig. S6). Capture-seq did not detect any off-target changes at the Tbxt locus across three independent founder mice, supporting our conclusion that the observed tail phenotype of the Tbxt Δexon6/+ mice derived from the Tbxt-Δexon6 isoform.
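The penetrance counts given above (21 of 63 heterozygotes with tail phenotypes versus 0 of 35 wild-type littermates) can be summarized numerically. The sketch below computes the observed penetrance and a one-sided Fisher's exact test using only the Python standard library. The choice of test is an assumption made here for illustration; the text above reports the counts but does not state which test, if any, was applied.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(top-left cell >= a) under the hypergeometric null with all
    row and column totals held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denominator = comb(total, col1)
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / denominator


# Counts as reported in the text (Table 1).
het_affected, het_total = 21, 63
wt_affected, wt_total = 0, 35

penetrance = het_affected / het_total
p_value = fisher_one_sided(
    het_affected,
    het_total - het_affected,
    wt_affected,
    wt_total - wt_affected,
)

print(f"penetrance among heterozygotes: {penetrance:.3f}")
print(f"one-sided Fisher exact p-value: {p_value:.2e}")
```

The penetrance works out to one third of heterozygotes. Because no wild-type animal was affected, the one-sided p-value reduces to the single most extreme table, C(63,21)/C(98,21).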
Note: *: Type 1 intercrossing: at least one of the parent mice is no-/short-tailed. Type 2 intercrossing: both parent mice are long-tailed.

**: Numbers in parentheses indicate the number of pups with tail phenotypes.

Homozygous removal of Tbxt-Δexon6 is lethal
The human TBXT gene expresses a mixture of TBXT-Δexon6 and full-length TBXT transcripts (induced, as we inferred, by the AluY insertion and its interaction with AluSx1), while mouse Tbxt expresses only the full-length transcript. Thus, we next asked how a homozygous TBXT-Δexon6 mutation (Tbxt Δexon6/Δexon6) affects development. Intercrossing the Tbxt Δexon6/+ mice, across multiple litters and replicated with different founders, failed to produce viable homozygotes (Table 1). Dissecting intercrossed embryos at E11.5 showed that homozygotes either arrested development at ~E9 or developed with spinal cord malformations that consequently led to death at birth (Fig. S7). We noted that the Tbxt Δexon6/Δexon6 embryos showed malformations of the spinal cord similar to spina bifida in humans. Together, while the incomplete penetrance of the tail phenotypes in Tbxt Δexon6/+ mice requires further investigation, these results indicate that the TBXT-Δexon6 isoform, which in human is induced by the intronic AluY-AluSx1 interaction, may indeed be the key driver of tail-loss evolution in hominoids.

Fig. 3 (caption fragment): [...] showing an absence of the tail. Two additional founder mice are shown in Figure S5. e, Tbxt Δexon6/+ heterozygous mice display heterogeneous tail phenotypes, varying from a complete absence of the tail to long tails. sv, sacral vertebrae; cv, caudal vertebrae.
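The inference of lethality from the absence of homozygous pups rests on a simple Mendelian expectation: a heterozygous intercross should yield about one quarter homozygotes if they were viable. The sketch below, a minimal illustration in Python, computes how unlikely it would be to observe zero homozygotes by chance. The cohort sizes are hypothetical placeholders, since the pup counts in Table 1 are not reproduced in this text, and independent 1:2:1 segregation is assumed.

```python
def prob_no_homozygotes(n_pups: int, expected_fraction: float = 0.25) -> float:
    """Probability of observing zero homozygotes among n_pups, assuming each
    pup is independently homozygous with probability expected_fraction."""
    if n_pups < 0:
        raise ValueError("n_pups must be non-negative")
    if not 0.0 <= expected_fraction <= 1.0:
        raise ValueError("expected_fraction must lie in [0, 1]")
    return (1.0 - expected_fraction) ** n_pups


# Hypothetical cohort sizes for illustration only (not values from Table 1).
for n in (10, 20, 40):
    print(f"{n} pups: P(no homozygotes by chance) = {prob_no_homozygotes(n):.2e}")
```

The probability falls geometrically with cohort size, so even a few dozen genotyped pups without a single homozygote would be hard to attribute to chance under this assumption.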

Discussion
We have presented evidence that tail-loss evolution in hominoids was driven by the intronic insertion of an AluY element. We inferred that, rather than disrupting a splice site, this element interacts with a (simian-shared) AluSx1 element in the neighboring intron, leading to an alternatively spliced isoform that skips the intervening exon (Fig. 1c). Experimental deletion of AluY or its interacting counterpart eliminates this TBXT alternative splicing in differentiated hESCs in the primitive streak state (Fig. 2). [...] It is an interesting possibility that the interactions between paired transposable elements may create functional splice variants and circRNA isoforms from the same genetic locus.

We found that expressing the Tbxt-Δexon6 transcript along with the full-length transcript in mice was sufficient to induce no-tail phenotypes, though with incomplete penetrance (Fig. 3 and Table 1). It is possible that a heterogeneity of tail phenotypes also existed in the ancestral hominoids upon the initial AluY insertion. Thus, while tail-loss evolution in hominoids may have been initiated by the AluY insertion, additional genetic changes may have then acted to stabilize the no-tail phenotype in early hominoids (Fig. S8). Such a set of genetic events would explain why a change to the AluY in modern hominoids would not result in the re-appearance of the tail.

The specific evolutionary advantage of the loss of the tail is not clear, though it likely involved enhanced locomotion in a non-arboreal lifestyle. We can assume, however, that the selective advantage must have been very strong, since the loss of the tail may have included an evolutionary trade-off of neural tube defects, as demonstrated by the presence of spinal cord malformations in the Tbxt Δexon6/Δexon6 mutant at E11.5 (Fig. S7). Interestingly, mutations leading to neural tube defects and/or sacral agenesis have been detected in the coding and noncoding regions of the TBXT gene [40–43]. We thus speculate that the evolutionary trade-off involving the loss of the tail, made ~25 million years ago, continues to influence health today. This evolutionary insight into a complex human disease may in the future lead to the design of therapeutic strategies.

Acknowledgments
We thank Naoya Yamaguchi, Eric Wang, John Shin, Susan Liao, Huiyuan Zhang, and the members of the Yanai and Boeke labs for constructive comments and suggestions. We thank Megan Hogan and Raven Luther for sequencing assistance, and Michael Ceriello and Ahmad Naimi for assistance with the mouse work. This work was supported in part by the NHGRI RM1 [...]

Mouse ESC culture and differentiation
Mouse ESCs derived from the C57BL/6J strain background were cultured in a feeder cell-free condition. Cells were grown on tissue culture-grade plates coated with mESC-qualified gelatin. Before plating cells, the tissue culture-treated plastic plates were coated with 0.1% gelatin (EMD Millipore ES-006-B) at room temperature for at least 30 min, followed by switching to mESC medium and warming the medium in a 37°C, 5% CO2 incubator for at least 30 min.

The feeder cell-free mESC culturing medium, also called '80/20' medium, comprises 80% 2i medium and 20% mESC medium by volume. 2i medium was made from a 1:1 mix of [...]

CRISPR targeting
All guide RNAs for the CRISPR experiments were designed using the CRISPOR algorithm through its predicted target sites integrated in the UCSC Genome Browser [47]. Guide RNAs were cloned into the pX459V2.0-HypaCas9 plasmid (Addgene plasmid #62988), or a custom derivative in which the puromycin resistance gene was replaced with a blasticidin resistance gene. Guide RNAs in this study were designed in pairs to delete the intervening sequences. The sequences and targeting sites of the guide RNAs are listed below: [...]

[...] day before the nucleofection experiment to maintain a superior condition.
Before performing nucleofection on human ESCs, 6 cm tissue culture plates were treated with 0.5 µg/cm² rLaminin-521 in a 37°C, 5% CO2 incubator for at least 2 h. rLaminin-521-treated plates give better viability when seeding hESCs as single cells. Cultured human ESCs were then washed with PBS and dissociated into single cells using TrypLE Select Enzyme (no phenol red; Gibco, Cat. No. 12563011). One million single hESCs were nucleofected using program A-023 according to the manufacturer's instructions for the Nucleofector 2b device. Transfected cells were transferred onto the rLaminin-521-treated 6 cm plates with pre-warmed StemFlex complete medium supplemented with 1X RevitaCell but without Penicillin-Streptomycin.
Together with the pX459V2.0-HypaCas9-gRNA plasmids for nucleofection, a single-stranded DNA (ssDNA) oligo was co-delivered for microhomology-induced deletion of the targeted sites [48]. These ssDNA oligos were synthesized by IDT through its Ultramer DNA Oligo service, with phosphorothioate bond modifications on the three bases at each end. Detailed sequence information is listed below ("|" indicates a junction site):

T*T*T*ATTCTAGAGCCCATTAACATATCACTCCTGCTCACTTGGTAGAAAGCCACCG|CAGGGGTCCCCAAGGAGGCTTTCATTTCAATATCCATGTGCCTCAGAACATG*C*C*C

Genotyping of CRISPR/Cas9-targeted sites was performed by PCR following standard protocols. The genotyping primers are listed below: [...]
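For the donor oligo shown above, the annotation can be decoded programmatically to check the design: the sketch below splits the sequence at the "|" junction, strips the "*" marks, and reports the length of each homology arm. It assumes that "*" denotes a phosphorothioate bond (consistent with the description of modifications at both ends, but not stated explicitly in the text); the function name `parse_donor_oligo` is a hypothetical helper introduced for illustration.

```python
# Donor oligo exactly as printed in the Methods text.
# "|" marks the deletion junction; "*" is assumed to mark a phosphorothioate bond.
DONOR_OLIGO = (
    "T*T*T*ATTCTAGAGCCCATTAACATATCACTCCTGCTCACTTGGTAGAAAGCCACCG|"
    "CAGGGGTCCCCAAGGAGGCTTTCATTTCAATATCCATGTGCCTCAGAACATG*C*C*C"
)


def parse_donor_oligo(annotated: str) -> dict:
    """Split an annotated donor oligo into its two homology arms."""
    if annotated.count("|") != 1:
        raise ValueError("expected exactly one '|' junction marker")
    left_raw, right_raw = annotated.split("|")
    left = left_raw.replace("*", "")
    right = right_raw.replace("*", "")
    if set(left + right) - set("ACGT"):
        raise ValueError("unexpected characters in sequence")
    return {
        "left_arm": left,
        "right_arm": right,
        "modified_bonds": annotated.count("*"),
        "total_length": len(left) + len(right),
    }


design = parse_donor_oligo(DONOR_OLIGO)
print(f"left arm:  {len(design['left_arm'])} nt")
print(f"right arm: {len(design['right_arm'])} nt")
print(f"total:     {design['total_length']} nt")
print(f"marked bonds: {design['modified_bonds']}")
```

A check of this kind is a convenient way to confirm that the two arms flanking the junction are balanced and that the end modifications are present on both sides before ordering an oligo.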

Splicing isoform detection
Total RNA was collected from the undifferentiated or differentiated cells of both human and mouse ESCs using a standard column-based purification kit (QIAGEN RNeasy Kit, Cat. No. 74004). DNase treatment was applied during the purification to remove any potential DNA contamination. Following extraction, RNA quality was assessed by electrophoresis based on ribosomal RNA integrity. Reverse transcription was performed with 1 µg of high-quality total RNA for each sample, using the High-Capacity RNA-to-cDNA™ Kit (Applied Biosystems, Cat. [...]

[...] Upon confirming the heterozygous genotype (Tbxt Δexon6/+), founder mice were backcrossed with wild-type C57BL/6J mice to generate heterozygous F1 pups. Because of the varied tail phenotypes, intercrossing between F1 heterozygotes was performed in two categories: type 1 intercrossing includes at least one no-/short-tailed parent, whereas type 2 intercrossing mates two long-tailed F1 heterozygotes. Both types of intercrossing produced heterogeneous tail phenotypes in F2 Tbxt Δexon6/+ pups, confirming the incomplete penetrance of tail phenotypes, and the absence of homozygotes (Tbxt Δexon6/Δexon6), as summarized in Table 1. To confirm the embryonic phenotypes in homozygotes, embryos were dissected at gestation stage E11.5 from timed pregnant mice following standard protocols. Adult mice (>12 weeks) were anesthetized for X-ray imaging of vertebrae using a Bruker In-Vivo Xtreme IVIS imaging system.

Capture-seq genotyping
Capture-seq, or targeted sequencing of the loci of interest, was performed as previously described [37]. Conceptually, Capture-seq uses custom biotinylated probes to pull down the genomic loci of interest from standard whole-genome sequencing libraries, thus enabling sequencing of the specific genomic loci at much higher depth while reducing the cost.

Genomic DNA was purified from mESCs or ear punches of founder mice using Zymo [...]

Custom biotinylated probes were prepared as bait through nick translation, using BAC DNA and/or plasmids as the template. The probes were prepared to comprehensively cover the whole locus. We used BAC lines RP24-88H3 and RP23-159G7, purchased from BACPAC Genomics, to generate bait probes covering the mouse Tbxt locus and ~200 kb of flanking sequences [...]

Fig. S4 (caption fragment): [...] (Sx1) or the AluY locus (Y), respectively, with primers that bind the two flanking sequences of the deleted region. Each genotype included two independent clones of AluY deletion or AluSx1 deletion, corresponding to the two replicates in Figure 2B. b, Sanger sequencing of the TBXT-Δexon6 and TBXT-Δexon6&7 transcripts detected in Figure 2B. The sequencing results were aligned to the full-length TBXT mRNA sequence.