Abstract
Transposable elements (TEs), also known as jumping genes, are sequences able to move or copy themselves within a genome. As TEs move throughout genomes they can be exapted as coding and regulatory elements, or can promote genetic rearrangement. In so doing TEs act as a source of genetic novelty, hence understanding TE evolution within lineages is key in understanding adaptation to their environment. Studies into the TE content of lineages of mammals such as bats have uncovered horizontal transposon transfer (HTT) into these lineages, with squamates often also containing the same TEs. Despite the repeated finding of HTT into squamates, little comparative research has examined the evolution of TEs within squamates. The few broad scale studies in Squamata which have been conducted found both the diversity and total number of TEs differs significantly across the entire order. Here we examine a diverse family of Australo-Melanesian snakes (Hydrophiinae) to examine if this pattern of variable TE content and activity holds true on a smaller scale. Hydrophiinae diverged from Asian elapids ∼15-25 Mya and have since rapidly diversified into 6 amphibious, ∼60 marine and ∼100 terrestrial species which fill a broad range of ecological niches. We find TE diversity and expansion differs between hydrophiines and their Asian relatives and identify multiple HTTs into Hydrophiinae, including three transferred into the ancestral hydrophiine likely from marine species. These HTT events provide the first tangible evidence that Hydrophiinae reached Australia from Asia via a marine route.
Introduction
Elapids are a diverse group of venomous snakes found across Africa, Asia, the Americas and Australia. Following their divergence from Asian elapids ∼30 Mya, the Australo-Melanesian elapids (Hydrophiinae) have rapidly diversified into more than 160 species including ∼100 terrestrial snakes, ∼60 fully marine sea snakes, and 6 amphibious sea kraits [1]. Both the terrestrial and fully marine hydrophiines have adapted to a wide range of habitats and niches. Terrestrial Hydrophiinae are found across Australia, for example the eastern brown snake Pseudonaja textilis in open habitats, the tiger snake (Notechis scutatus) in subtropical and temperate habitats, and the inland taipan (Oxyuranus microlepidotus) to inland arid habitats [2]. Since transitioning to a marine habitat, many sea snakes have specialised to feed on a single prey such as fish eggs, catfish, eels or burrowing gobies, while others such as Aipysurus laevis are generalists [3,4]. Sea kraits (Laticauda) are amphibious and have specialised to hunt various fish including eels and anguilliform-like fish at sea, while digesting prey, mating and shedding on land [5]. Since transitioning to marine environments, both sea snakes and sea kraits have been the recipients of multiple independent horizontal transposon transfer (HTT) events, which may have had adaptive potential [6,7].
Transposable elements (TEs) are mobile genetic elements that can move or copy themselves across the genome, and account for a large portion of most vertebrate genomes [8,9]. Though often given short shrift in genome analyses, TEs are important agents of genome evolution and generate genomic diversity [10,11]. For example, the envelope gene of endogenous retroviruses was exapted by both mammals and viviparous lizards to function in placental development [12]. In addition, unequal crossing over caused by CR1 retrotransposons led to the duplication, and hence diversification, of PLA2 venom genes in pit vipers [13]. Transposable elements (TEs) are classified into one of two major classes based on their structure and replication method [14]. DNA transposons (Class II) proliferate through a “cut and paste” method, possess terminal inverted repeats and are further split based on the transposase sequence used in replication. Retrotransposons (Class I) are split into LTR retrotransposons and non-LTR retrotransposons, which proliferate through “copy and paste” methods. Both subclasses of retrotransposons are split into numerous superfamilies based on both coding and structural features [15–17].Within the diverse lineages of higher vertebrates, the evolution of TEs is well described in eutherian mammals and birds. The total repetitive content of both bird and mammal genomes is consistently at 7-10% and 30-50% respectively. Similarly, most lineages of both birds and eutherian mammals are dominated by a single superfamily of non-LTR retrotransposons (CR1s and L1s respectively) and a single superfamily of LTR retrotransposons (endogenous retroviruses in both) [8,18]. Some lineages of birds and mammals contain horizontally transferred retrotransposons which have variably been successful (AviRTE and RTE-BovB respectively) [19,20].
In stark contrast to mammals and birds, squamates have highly variable mobilomes, both in terms of the diversity of their TE superfamilies and the level activity of said superfamilies within each genome [21]. While these broad comparisons have found significant variation in TEs between distant squamate lineages, none have examined how TEs have evolved within a single family of squamates. The one in depth study into the mobilome of snakes found the Burmese python genome is approximately ∼21% TE and appears to have low TE expansion, while that of a pit viper is ∼45% TE due to the expansion of numerous TE superfamilies and microsatellites since their divergence ∼90 Mya [22,23]. Unfortunately it is unclear whether similar expansions have occurred within other lineages of venomous snakes. Here we examine the TE landscape of the family Hydrophiinae, and in doing so discover horizontal transfer events into the ancestral hydrophiine, sea kraits and sea snakes.
Methods
Ab initio TE annotation of the elapid genomes
We used RepeatModeler2 [24] to perform ab initio TE annotation of the genome assemblies of 4 hydrophiines (Aipysurus laevis, Notechis scutatus, Pseudonaja textilis and Laticauda colubrina) and 2 Asian elapids (Naja naja and Ophiophagus hannah). We manually curated the subfamilies of TEs identified by RepeatModeler (rm-families) to ensure they encompassed the full TE, were properly classified and that each species’ library was non-redundant.
We first purged redundant rm-families from each species library based on pairwise identity to and coverage by other rm-families within the library. Using BLAST [25] we calculated the similarity between all rm-families. Any rm-family with over 75% of its length aligning to a larger rm-family at 90% pairwise identity or higher was removed from the library. We then searched for each non-redundant rm-family within their source genome with BLASTN (-task dc-megablast) and selected the best 30 hits based on bitscore. In order to ensure we could retrieve full length TE insertions, we extended the flanks of each hit by 4000 bp. Using BLASTN (-task dc-megablast) we pairwise aligned each of the 30 extended sequences to others, trimming trailing portions of flanks which did not align to flanks of the other 29 sequences. Following this, we constructed a multiple sequence alignment (MSA) of the 30 trimmed sequences with MAFFT [26] (--localpair). Finally we trimmed each MSA at the TE target site duplications (TSDs) and constructed a consensus from the multiple sequence alignments using Geneious Prime 2021.1.1 (www.geneious.com) which we henceforth refer to as a mc-subfamily (manually curated subfamily).
To classify the mc-subfamilies we searched for intact protein domains in the consensus sequences using RPSBLAST [27] and the CDD library [28] and identified homology to previously described TEs in Repbase using CENSOR online [29]. Using this data in conjunction with the classification set out in Wicker (2007) [14], we classified previously unclassified mc-subfamilies where possible and corrected the classification of mc-subfamilies where necessary. Where possible we used the criteria of Feschotte and Pritham (2007) [17] to identify unclassified DNA transposons using TSDs and terminal inverted repeats. Finally, we removed any genes from the mc-subfamily libraries based on searches using online NCBI BLASTN and BLASTX searches against the nt/nr and UniProt libraries respectively [30,31]. Any mc-subfamilies unable to be classified were labelled as “Unknown”.
TE annotation of the elapid genomes
We constructed a custom library for TE annotation of the elapid genome assemblies by combining the mc-subfamilies from the six assemblies with previously described lepidosaur TEs identified using RepeatMasker’s “queryRepeatDatabase.pl” utility. Using RepeatMasker, we generated repeat annotations of all six elapid genome assemblies.
Estimating ancestral TE similarity
To estimate the sequence conservation of ancestral TEs, and hence categorise recently expanding TEs as either ancestral or horizontally transferred, we identified orthologous TE insertions and their flanks present in both the Notechis scutatus and Naja naja genome assemblies. From the Notechis repeat annotation, we took a random sample of 5000 TEs over 500 bp in length and extended each flank by 1000 bp. Using BLASTN (-task dc-megablast) we searched for the TEs and their flanks in the Naja assembly and selected all hits containing at least 250 bp of both the TE and the flank. Sequences with more than one hit containing flanks were treated as potential segmental duplications. We also removed any potential segmental duplications from the results. We then used the orthologous sequences to estimate the expected range in similarity between TEs present in the most recent common ancestor of Australian and Asian elapids. Based on this information, TEs with 95% or higher pairwise identity to the mc-subfamily used to identify them were treated as likely inserted in hydrophiine genomes since their divergence from Asian elapids. In addition, mc-subfamilies which we had identified as recently expanding in hydrophiines but were not found at 80% or higher pairwise identity in other serpentine genomes, were identified as candidates for horizontal transfer.
Identifying recent TE expansions
In each of the four hydrophiines, using the RepeatMasker output we identified mc-subfamilies comprising at least 100 total kbp having 95% or higher pairwise identity to the mc-subfamily. We treated these mc-subfamilies as having expanded since Hydrophiinae’s divergence from Asian elapids. We reduced any redundancy between recently expanding mc-subfamilies by clustering Using CD-HIT-EST (-c 0.95 -n 6 -d 50) [32]. Using BWA [33], we mapped raw transcriptome reads of eye tissue taken from each of the hydrophiines [34] for back to these mc-subfamilies. Retrotransposons with RNA-seq reads mapping across their whole length and DNA transposons with RNA-seq reads mapping to their coding regions were treated as expressed and therefore currently expanding.
Continued expansion or horizontal transfer
Using BLASTN (-task dc-megablast), we searched for homologs of recently expanding mc-subfamilies in a range of snake genomes including Asian elapids, colubrids, vipers and a python. We classified mc-subfamilies having copies of 80% or higher pairwise identity to the query sequence in other snakes as ancestral. All hydrophiine mc-subfamilies we were unable to find in other snakes were treated as candidates for horizontal transfer. We searched for the horizontal transfer candidates in approximately 600 additional metazoan genomes using BLASTN (-task dc-megablast). We classified all mc-subfamilies present in non-serpentine genomes at 80% or higher pairwise identity and absent from other serpentine genomes at 80% or higher pairwise identity as horizontally transferred into hydrophiines.
Results and Discussion
Genome quality affects repeat annotation
Previous studies have highlighted the importance of genome assembly quality in repeat annotation, with higher sequencing depths and long read technologies critical for resolving TEs [35,36]. Our repeat analysis reveals significant variation in total TE content between genome assemblies (Table 1, Figure 1), however some of this variation is likely due to large differences in assembly quality rather than differential TE expansions or contractions in certain lineages. Most notably, the TE content of the Ophiophagus assembly is significantly lower than that of that of the other species (∼36% compared to ∼46%). The TE content of the Aipysurus assembly is also notably lower, however to a lesser extent (41% compared to ∼46%). The Naja, Laticauda, Notechis, and Pseudonaja assemblies are much higher quality assemblies than the Ophiophagus and Aipysurus assemblies, having longer contigs and scaffolds (SI Table 1). This discrepancy is because the Ophiophagus and Aipysurus genomes are both assembled solely from short read data with a low sequencing depth (28x and 30x respectively). In stark contrast the Naja genome was assembled from a combination of long read (PacBio and Oxford Nanopore) and short read (Illumina) data, scaffolded using Chicago and further improved using Hi-C and optical mapping (Bionano) technologies. In the middle ground, the Laticauda, Notechis and Pseudonaja assemblies utilized a combination of 10X Chromium linked read and short read technologies. Many of the recently expanded TEs in the Ophiophagus and Aipysurus genomes likely collapsed during assembly because of their very high sequence similarity. Therefore, the apparent lack of recent activity in Ophiophagus and Aipysurus is a likely artefact of assembly quality. As the total TE content annotated in the Naja, Laticauda, Notechis and Pseudonaja is comparable at 46-48% of the genome and the four genomes are of comparable quality, the majority of the following analyses focuses on these four species.
Variation in assembly repeat content varies both within hydrophiines and between hydrophiines and Asian elapids. Genome assemblies annotated with RepeatMasker [37] and a custom library of curated RepeatModeler2 libraries [24] and previously described lepidosaur TEs from the Repbase RepeatMasker library [38].
Ancestral TE expansion is similar across hydrophiines and Asian elapids while recent expansion varies between species. Due to much lower genome assembly quality resulting in collapsed TEs, little recent expansion in the Aipysurus laevis and Ophiophagus hannah genomes was detected. TEs were identified using RepeatMasker [37] and a custom repeat library (see methods).
Recent insertions vs ancestral insertions
Recent TE insertions are likely to have diverged only slightly from the sequences RepeatMasker used to identify them, while ancestral insertions will likely be highly divergent. Based on this assumption, we discerned between recent and ancestral insertions using the pairwise identity of TE insertions to the mc-subfamily used to identify them. To estimate the expected divergence of ancestral TE insertions from consensus sequences compared to new insertions we searched for orthologues of 5000 randomly selected Notechis TE insertions and their flanks in the Naja assembly (Figure 2). From the 5000 TEs we were able to identify 2192 orthologues in Naja naja. As expected the median pairwise % identity of the ancestral TEs to the curated consensus sequences was notably lower compared to that of all TEs, at 90.5% compared to 93.6%. Similarly the 95% quantile was lower, at 96.2% compared to 98.4%. Based on these results we treated TEs with a percent identity of 96 % or higher to consensus sequences as having likely been inserted since divergence from Asian elapids.
The similarity of ancestral insertions from mc-consensuses used to identify them was notably lower than that of TEs likely inserted since the species diverged. TEs were initially identified in Notechis using RepeatMasker [37]. The presence of orthologues in Naja was determined using BLASTN (-task dc-megablast) [25].
Recent expansion of specific superfamilies
By comparing TE divergence profiles of the various assemblies, we can gain an overall picture of how TE superfamilies have expanded since the split of Hydrophiinae from Asian elapids (Figures 3-5). Large expansions of Gypsy retrotransposons are apparent in both the Naja and hydrophiine assemblies, however Copia, DIRS and ERVs appear inactive in Naja while expanding in hydrophiines. The divergence profile of DNA transposons suggests Tc1-Mariner and hAT transposons to have either been expanding at a similar rate in all species and/or result from ancestral expansion, with the exception of the explosive expansion of PIF-Harbinger transposons in Laticauda (see [7]). The greatest variation was seen within non-LTR retrotransposons, with L2s highly active in Naja yet completely inactive in hydrophiines. Instead, multiple other LINE superfamilies expanded, in particular CR1s and L1s.. This difference in TE expansion between snake lineages is similar to what was reported by Castoe et. al (2011) [23], Yin et. al (2016) [39] and Pasquesi et. al (2018) [21], except here we see variation within a family of snakes, not just between families of snakes.
Across elapids, hAT transposons appear inactive, Tc1-Mariner transposons appear to have expanded in multiple lineages and PIF-Harbinger transposons appear to have expanded rapidly in Laticauda following horizontal transfer from a marine species. TEs were identified using RepeatMasker [37] and a custom repeat library (see methods).
Gypsy elements are the dominant superfamily in all five species of elapids; ERVs, Copia and DIRS elements have expanded in all Hydrophiinae but have been near inactive in the cobra outgroup. TEs were identified using RepeatMasker [37] and a custom repeat library (see methods).
A wide diversity of non-LTR retrotransposons have expanded in all five genomes, with CR1s and L1s being most active in hydrophiines and L2s most active in the cobra outgroup. TEs were identified using RepeatMasker [37] and a custom repeat library (see methods).
Without highly contiguous assemblies of all species it is difficult to rigorously identify recent or ongoing TE expansions. However, by using transcription as a proxy for transposition we identified currently expressed TE families in present day species as candidates for being active and potentially expanding. To achieve this, we first identified TE subfamilies in each species with over 100 kbp of copies with over 95% pairwise identity to the consensus sequences used to identify them; treating these subfamilies as potentially expanding. By mapping raw transcriptome reads back to these consensuses, we were able to identify expressed TE subfamilies. In all four species, diverse TEs were expressed including subfamilies of Copia, ERV, DIRS, Gypsy, Penelope, CR1, L1s, Rex1, RTE, hAT and Tc1-Mariner.
Continued expansion or horizontal transfer
The TE subfamilies which we have identified as recently expanded within Hydrophiinae could be ancestral and continuously expanding since diverging from Asian elapids or have been horizontally transferred from long diverged species. Differentiating between ancestral and horizontally transferred TEs is difficult and must meet strict conditions. Horizontally transferred sequences are defined as having a patchy phylogenetic distribution and higher similarity to sequences in another species than would be expected based on divergence time. To identify any TEs which may have been horizontally transferred into Hydrophiinae we conservatively estimated the expected minimum similarity of TEs present in both hydrophiines and Asian elapids using the 2192 orthologous sequences identified in Notechis and Naja to be 80% (Figure 6). Based on this, any vertically inherited TE subfamily classified as recently expanding in hydrophiines will likely have copies of 80% or higher similarity present in Asian elapids.
TE initially identified in Notechis scutatus using RepeatMasker, orthologues identified in and pairwise identity calculated for Naja naja using BLASTN (-task dc-megablast) [25].
To determine whether any recently expanding TE subfamilies were horizontally transferred into hydrophiines following their divergence from Asian elapids, we searched for them in the genomes of Naja, Ophiophagus and an additional 8 non-elapid snakes. Some recently expanding subfamilies absent from Naja and Ophiophagus were present in non-elapid snakes at 80% or higher identity. To be conservative we treated these TEs as ancestral, likely being lost from Asian elapids. The remaining TE subfamilies, those present in hydrophiines but absent from other snakes, were treated as horizontal transfer candidates. To confirm these candidate TEs were horizontally transferred into hydrophiines we searched for them in over 600 metazoan genomes. This search revealed at least eleven autonomous TEs present in non-serpentine genomes at 80% or higher identity and are therefore likely to have been horizontally transferred into hydrophiines. Of these eleven, three were transferred into the ancestral hydrophiine, five into sea kraits, one into sea snakes and one into the common ancestor of terrestrial hydrophiines and sea snakes (Figure 7).
At least eleven autonomous TEs have been horizontally transferred into hydrophiines, with most likely from marine organisms. We have previously described the horizontal transfer of the Proto2 to Aipysurus laevis in Galbraith et al. (2020) [6] and the PIF-Harbinger to Laticauda in Galbraith et al. (2021) [7]
We have previously described 2 of the 11 HT events in detail, that of Proto2-Snek to Aipysurus and Harbinger-Snek toLaticauda, both of which were likely transferred from a marine species (see [6,7]). Three of the four newly identified HT events identified in Laticauda were probably also from an aquatic species, because similar sequences are only found in marine or amphibious species. Therefore, the transfer of these elements likely occurred following the transition of each group to a marine habitat. The exception is a Tc1-Mariner which is most similar to sequences identified in hemipterans, a beetle and a spider, however as Laticauda is amphibious this is perhaps not surprising.
The Rex1 transferred to the common ancestor of terrestrial hydrophiines and sea snakes was only identified in the central bearded dragon (Pogona vitticeps), an agamid lizard native to the inland woodlands and shrublands of eastern and central Australia [40]. As this TE is restricted to another species of Australian squamate, this HTT appears to have occurred after hydrophiines reached Australia.
The most interesting of the horizontally transferred TEs are the Tc1-Mariner, Gypsy and Rex1 which were horizontally transferred into the ancestral hydrophiine following its divergence from Asian elapids. Those three are most similar to sequences identified in marine species, either fish or tunicates. Marine elapids (sea kraits and sea snakes) and terrestrial Australian elapids were originally considered two distinct lineages [41–43], however recent adoption of molecular phylogenomics has resolved Hydrophiinae as a single lineage, with sea kraits as a deep-branch and sea snakes nested within terrestrial Australian snakes [1,44,45]. Fossil evidence combined with an understanding of plate tectonics has revealed Hydrophiinae, like many other lineages of Australian reptiles, likely colonised Australia via hopping between islands formed in the Late Oligocene-Early Miocene by the collision of the Australian and Eurasian plates [46–50]. Alternatively, it has also been proposed the common ancestor of Hydrophiinae may have been a semi-marine “proto-Laticauda”, which colonised Australia in the Late Oligocene directly from Asia [51]. The horizontal transfer of three TEs into the ancestral hydrophiine likely from a marine organism provides tangible support for the hypothesis that the ancestral hydrophiine was a semi-marine or marine snake.
Conclusion
In our survey of elapid genomes, we have found that TE diversity and their level of expansion varies significantly within a single family of squamates, similar to the variation previously seen across all squamates or within long diverged snakes. This diversity and variation is much greater than what has been reported for mammals and birds. Our finding of HTT into lineages of hydrophiine exposed to novel environments indicates that environment may play a large role in HTT through exposure to new TEs. Additionally, the HTT of three TEs found solely in marine organisms into the ancestral hydrophiine provides evidence that terrestrial Australian elapids are derived from a marine or amphibious ancestor.
As long read genome sequencing becomes feasible for more species, genome assembly quality will continue to increase and multiple genomes of non-model species will be able to be sequenced. Using these higher quality genomes, we will be able to better understand HTT and the role TEs play in adaptive evolution. Due to their rapid adaptation to a wide range of environments and multiple HTT events into different lineages, Hydrophiinae provide the ideal system for such studies.
Supplementary Information
SI Table 1
Genome assembly statistics. Repeat composition was calculated using RepeatMasker [37] and a custom library of curated RepeatModeler libraries and previously described lepidosaur TEs from the Repbase RepeatMasker library [38].
SI Table 2
A list of the metazoan genome assemblies searched for HTT. All genomes were downloaded from NCBI RefSeq and GenBank [52,53].
SI Table 3
Horizontally transferred TEs and the number of hits in species they were identified in. TEs were identified using BLASTN (-task dc-megablast) [25].
Acknowledgements
We thank James Nankivell and Vicki Thomson for their discussions on Hydrophiinae’s phylogeny and dispersal to Australia.