Summary
Monitor lizards are unique among ectothermic reptiles in that they have a high aerobic capacity and distinctive cardiovascular physiology which resembles that of endothermic mammals. We have sequenced the genome of the Komodo dragon (Varanus komodoensis), the largest extant monitor lizard, and present a high resolution de novo chromosome-assigned genome assembly for V. komodoensis, generated with a hybrid approach of long-range sequencing and single molecule physical mapping. Comparing the genome of V. komodoensis with those of related species showed evidence of positive selection in pathways related to muscle energy metabolism, cardiovascular homeostasis, and thrombosis. We also found species-specific expansions of a chemoreceptor gene family related to pheromone and kairomone sensing in V. komodoensis and several other lizard lineages. Together, these evolutionary signatures of adaptation reveal genetic underpinnings of the unique Komodo sensory, cardiovascular, and muscular systems, and suggest that selective pressure altered thrombosis genes to help Komodo dragons evade the anticoagulant effects of their own saliva. As the only sequenced monitor lizard genome, the Komodo dragon genome is an important resource for understanding the biology of this lineage and of reptiles worldwide.
Introduction
The evolution of form and function in the animal kingdom contains numerous examples of innovation and diversity. Within vertebrates, non-avian reptiles are a particularly interesting lineage. There are an estimated 10,000 reptile species worldwide, found on every continent except Antarctica, with a diverse range of morphologies and lifestyles 1. This taxonomic diversity corresponds to a broad range of anatomic and physiological adaptations. Understanding how these adaptations evolved through changes to biochemical and cellular processes will reveal fundamental insights in areas ranging from anatomy and metabolism to behavior and ecology.
The varanid lizards (genus Varanus, or monitor lizards) are an unusual group of squamate reptiles characterized by a variety of traits not commonly observed within non-avian reptiles. Varanid lizards vary in mass by close to five orders of magnitude (8 grams–100 kilograms), comprising the genus with the largest range in size 2. Among the squamate reptiles, varanids have a unique cardiopulmonary physiology and metabolism with numerous parallels to the mammalian cardiovascular system. For example, their cardiac anatomy allows high pressure shunting of oxygenated blood to systemic circulation 3. Furthermore, varanid lizards can achieve and sustain very high aerobic metabolic rates accompanied by elevated blood pressure and high exercise endurance 4–6, which facilitates high-intensity movements while hunting prey 7. The specialized anatomical, physiological, and behavioral attributes of varanid lizards are all present in the Komodo dragon (Varanus komodoensis). As the largest extant lizard species, Komodo dragons can grow to 3 meters in length and run at speeds of up to 20 kilometers miles per hour, which allows them to hunt large prey such as deer and boar in their native Indonesia 8. Komodo dragons have a higher metabolism than predicted by allometric scaling relationships for varanid lizards 9, which helps explains their extraordinary capacity for daily movement to locate prey 10. Their ability to locate injured or dead prey through scent tracking over several kilometers is enabled by a powerful olfactory system 8. Additionally, serrate teeth, sharp claws, and saliva with anticoagulant and shock inducing properties aid Komodo dragons in hunting prey 11,12. Komodo dragons have aggressive intraspecific conflicts over mating, territory, and food, and wild individuals often bear scars from previous conflicts 8.
To understand the genetic underpinnings of the specialized Komodo dragon physiology, we sequenced its genome and used comparative genomics to discover how the Komodo dragon genome differs from other species. We present a high quality de novo assembly, generated with a hybrid approach of long-range sequencing using 10x Genomics, PacBio, and Oxford Nanopore sequencing, and single-molecule physical mapping using the BioNano platform. This suite of technologies allowed us to confidently assemble a high-quality reference genome for the Komodo dragon, which can serve as a template for other varanid lizards. We used this genome to understand the relationship of varanids to other reptiles using a phylogenomics approach. We uncovered Komodo-specific positive selection for a suite of genes encoding regulators of muscle metabolism, cardiovascular homeostasis, and thrombosis. Further, we discovered multiple lineage-specific expansions of a family of chemoreceptor genes in several squamates, including some lizards and a snake, as well as in the Komodo dragon genome. Finally, we generated a high-resolution chromosomal map resulting from assignment of scaffold to chromosomes, providing a powerful tool to address questions about karyotype and sex chromosome evolution in squamates.
Results
De novo genome assembly
We obtained DNA from peripheral blood of two male individuals housed at Zoo Atlanta: Slasher, a male offspring of the first Komodo dragons given as gifts in 1986 to US President Reagan from President Suharto of Indonesia, and Rinca, a juvenile male (Figure 1A). The V. komodoensis genome is distributed across 20 pairs of chromosomes, comprising eight pairs of large chromosomes and 12 pairs of microchromosomes 13,14. De novo assembly was performed using 57X coverage of 10x Genomics linked-read sequencing data to generate an initial assembly with a scaffold N50 of 10.2 Mb and a contig N50 of 95 kb. Separately, 80X coverage of Bionano physical mapping data was de novo assembled to create an assembly with a scaffold N50 of 1.2 Mb. These two assemblies were merged into a hybrid assembly and then scaffolded further using 6.3X coverage from PacBio sequencing and 0.75X coverage from Oxford Nanopore MinIon sequencing, for a total coverage of 144X. The final assembly contained 1,403 scaffolds (>10 kb) with an N50 of 29 Mb (longest scaffold: 138 Mb) (Table 1). The assembly is 1.51 Gb in size, ~32% smaller than the genome of the Chinese crocodile lizard (Shinisaurus crocodilurus), the closest relative of the Komodo dragon for which a sequenced genome is available, and ~15% smaller than the green anole (Anolis carolinensis), a model squamate lizard (Table S1). An assembly-free error corrected k-mer counting estimate of genome size estimates the Komodo dragon genome to be 1.69 Gb, making our assembly 89% complete. The GC content of the Komodo dragon genome is 44.0%, similar to the GC content of the S. crocodilurus genome (44.5%) but higher than the GC content of A. carolinensis (40.3%) (Table S1). Repeats were annotated using RepeatMasker and the squamate repeat dabatase 15. Repetitive elements accounted for 32% of the genome, most of which were transposable elements (Table S2). As repetitive elements in S. crocodilurus account for 49.6% of the genome 16, most of the difference in size between the Komodo dragon genome and its closest sequenced relative genome can be attributed to differences in repetitive element content.
Chromosome scaffold content
We isolated chromosome-specific DNA pools from a female embryo of V. komodoensis (VKO) from Prague zoo stock through flow sorting 14. We obtained Illumina short-read sequences of these 15 DNA pools containing all chromosomes of V. komodoensis (Table S3). For each chromosome we determined scaffold content and homology to A. carolinensis and G. gallus chromosomes (Table 2 and Table S4). For each chromosome, we determined scaffold content and homology to A. carolinensis and G. gallus chromosomes (Table 2 and Tables S4-5). For those pools where chromosomes were mixed (VKO6/7, VKO8/7, VKO9/10, VKO11/12/W, VKO17/18/Z, VKO17/18/19) we determined partial scaffold content of single chromosomes. Homology to A. carolinensis microchromosomes was determined using scaffold assignments to chromosomes performed in Anolis chromosome-specific sequencing project 17. A total of 243 scaffolds containing 1.14 Gb (75% of total 1.51 Gb assembly) were assigned to 20 chromosomes of V. komodoensis. As sex chromosomes shares homologous regions (pseudoautosomal regions), scaffolds that were enriched in both 17/18/Z and 11/12/W samples most likely contained sex chromosome regions of V. komodoensis. Considering that our reference genome was from a male individual, they were assigned to the Z chromosome. All these scaffolds were homologous to A. carolinensis chromosome 18, and mostly to chicken chromosome 28 as recently determined by transcriptome analysis 18.
Gene annotation
As Komodo dragons have a unique cardiovascular physiology, we used heart tissue as the source for RNA sequencing to increase the accuracy of cardiovascular gene prediction, increasing our power to detect interesting changes to the cardiovascular system encoded in the genome. RNA sequencing was assembled into transcripts with Trinity 19. After soft masking repetitive elements, genes were annotated using the MAKER pipeline with protein homology, assembled transcripts, and de novo predictions as evidence, and stringently quality filtered (see Methods). A total of 18,462 protein coding genes were annotated in the Komodo genome, 17,194 (93%) of which have one or more annotated Interpro functional domain (Table 1). Of these protein-coding genes, 63% were expressed (RPKM > 1) in the heart. A total of 89% of Komodo dragon protein-coding genes are orthologous to genes in the model lizard A. carolinensis genome. The median percent identity of single-copy orthologs between Komodo and A. carolinensis is 68.9%, whereas it is 70.6% between single-copy orthologs in Komodo and S. crocodilurus (Figure S1).
Phylogenetic placement of Komodo
As the Komodo dragon genome is the first monitor lizard (Family Varanidae) to have a complete genome sequence, previous phylogenetic analyses of varanid lizards has been limited to marker sequences 20,21. We used the Komodo dragon genome to estimate a species tree using 2,752 single copy orthologs (see Methods) present in the Komodo dragon and 14 representative non-avian reptile species, including 7 squamates, 3 turtles, and 4 crocodilians, along with one avian species (chicken) and one mammalian species (mouse) (Figure 2). The placement of Komodo dragon and the monitor lizard genus using this genome-wide dataset agrees with previous marker gene studies 20,21.
Expansion of vomeronasal genes across squamate reptiles
The vomeronasal, or the Jacobson’s, organ is a chemosensory tissue that detects chemical cues such as pheromones and kairomones. It is shared across amphibians, mammals, and reptiles though it has been secondarily lost in some groups, including birds 22,23. Squamate reptiles such as snakes and lizards have apparently functional vomeronasal organs with the ability to sense prey-derived chemical signals, as well as specific associated behaviors such as tongue-flicking to deliver olfactory cues to the sensory tissue, and it is clear that the vomeronasal organ plays an important role in squamate reptile ecology 24. Two types of chemosensory receptors, both of which are seven-transmembrane G-protein coupled receptors, function as sensors in the vomeronasal organ. The number of Type 1 vomeronasal receptors (V1Rs) has expanded through gene duplications in certain mammalian lineages, while the number of Type 2 receptors (V2Rs) has expanded in amphibians and some mammalian lineages 22. Crocodilian and turtle genomes contain few to no V1R and V2R genes 25. Snakes, in contrast, have a significantly expanded V2R repertoire that has arisen through gene duplication26.
To clarify the relationship between vomeronasal organ function and evolution of vomeronasal-receptor gene families, we analyzed the coding sequences of 15 reptiles, including Komodo, for presence of V1R and V2R genes (Figure 3A). We confirmed that there are few V1R genes across reptiles generally and few to no V2R genes in crocodilians and turtles (Table S6). The low number of V2R genes in green anole (Anolis carolinensis) and Australian dragon lizard (Pogona vitticeps) suggest that V2R genes are infrequently expanded in iguanians, though more iguanian genomes are needed to test this hypothesis. In contrast, we found a large repertoire of V2Rs, comparable in size to that of snakes, in the Komodo dragon and other lizards.
To infer the details of the dynamic evolution of this gene family, we built a phylogeny of all V2R gene sequences across squamates (Figure 3B). The topology of this phylogeny supports that, as previously hypothesized, V2Rs expanded in the common ancestor of squamates, as there are clades of gene sequences containing members from all species 26. In addition, there are a large number of well-supported single species clades (i.e., Komodo dragon only) dispersed across the gene tree, which is consistent with multiple duplications of V2R genes later in squamate evolution, including in the Komodo and gecko lineages (Figure 3B).
Because V2Rs have expanded in rodents through tandem gene duplications that produced clusters of paralogs 27, we examined clustering of V2R genes in our Komodo assembly to determine if a similar mechanism is likely driving these gene expansions. Of 151 V2Rs, 99 are organized into 26 gene clusters ranging in size from 2 to 14 genes (Figure 4A, Table S7). To understand if these gene clusters arose through tandem gene duplication, we constructed a phylogeny of all Komodo dragon V2R genes (Figure 4B). The largest V2R cluster contains 14 V2R genes, which group together in a gene tree of Komodo V2R genes (Figure 4). Of the remaining 52 V2R genes, 38 are on scaffolds less than 10 Kb in size, so our estimate of V2R clustering is a lower bound due to fragmentation in the genome assembly (Table S7).
Positive selection
To test for adaptive protein evolution in the Komodo dragon genome, we identified single-copy orthologs across squamate reptiles, built codon alignments, and ran tests of positive selection using a branch-site model to determine genes that have diversified in the varanid lineage (see Methods and Table S8). Our analysis revealed 201 genes with signatures of positive selection in Komodo dragons (Table S9). Many of the genes under positive selection point towards important adaptations of the Komodo dragon’s mammalian-like cardiovascular and metabolic functions, which are unique among non-varanid ectothermic reptiles, though 25% of positively selected genes were not detectably expressed in the heart and likely represent adaptations in other aspects of Komodo dragon biology. Pathways with positively expressed genes include mitochondrial regulation and cellular respiration, hemostasis and the coagulation cascade, innate and adaptive immunity, and angiotensinogen (a central regulator of cardiovascular physiology). Many of these have implications for Komodo physiology, and for varanid lizard physiology generally. We identified several functional categories with multiple positively selected for more detailed analysis. In each case, the genes are located in different parts of the Komodo genome and therefore likely represent recurrent selection on these functions during Komodo evolution.
Positive selection of genes regulating mitochondrial function
Mitochondria regulate energy production in cells through the oxidative phosphorylation process, which is mediated through the electron transport chain. Multiple subunits and assembly factors of the Type 1 NADH dehydrogenase and cytochrome c oxidase protein complexes, which perform the first and last steps of the electron transport chain respectively, show evidence of positive selection in the Komodo dragon genome (Figure 5, Figure S2, Table S9). These include the genes NDUFA7, NDUFAF7, NDUFAF2, NDUFB5 from the Type 1 NADH dehydrogenase complex and COX6C and COA5 from the cytochrome c oxidase complex.
Beyond the electron transport chain, other elements of mitochondrial function have signatures of positive selection in the Komodo lineage (Figure 5). Of note, we also detected positive selection for ACADL, which encodes LCAD - acyl-CoA dehydrogenase, long chain—a member of the acyl-CoA dehydrogenase family. LCAD is a critical enzyme for mitochondrial fatty acid beta-oxidation, the major postnatal metabolic process in cardiac myocytes 28. Further, two genes that promote mitochondrial biogenesis, TFB2M and PERM1, have undergone positive selection in the Komodo dragon. TFB2M regulates mtDNA transcription and dimethylates mitochondrial 12s rRNA 29,30. PERM1 regulates the expression of selective PPARGC1A/B and ESRRA/B/G target genes with roles in glucose and lipid metabolism, energy transfer, contractile function, muscle mitochondrial biogenesis and oxidative capacity 31. PERM1 also enhances mitochondrial biogenesis, oxidative capacity, and fatigue resistance when over-expressed in mice 32. Finally, we also identified MDH1, encoding malate dehydrogenase, which together with the mitochondrial MDH2, regulates the supply of NADPH and acetyl-CoA to the cytoplasm, thus modulating fatty acid synthesis 33.
Multiple factors regulating translation within the mitochondria have also undergone positive selection in the Komodo dragon (Figure 5). This includes the mitochondrial ribosome, including four components of 28S small ribosomal subunit (MRPS15, MRPS23, MRPS31, and AURKAIP1) and two components of the 39S large ribosomal subunit (MRPL28 and MRPL37). We also found evidence for positive selection on the ELAC2 and TRMT10C genes, which are required for maturation of mitochondrial tRNA, and MRM1, which encodes a mitochondrial rRNA methyltransferase 34–36.
Overall, these instances of positive selection in a large range of genes encoding proteins important for mitochondrial function and biogenesis clearly point to a coordinated genetic pathway that could explain the remarkable aerobic capacity of the Komodo dragon. While it is not possible to determine whether these adaptations are present in other monitor lizards in the absence of a sequenced genome, changes in cellular respiration likely play a role in the high aerobic capacity of most varanid lizards.
Positive selection of angiotensinogen
We detected positive selection for angiotensinogen (AGT), which encodes the precursor of several important peptide regulators of cardiovascular function, the most well-studied being angiotensin II (AII) and angiotensin1-7 (A1-7). AII has multiple important and potent activities in cardiovascular physiology. The two most notable, and perhaps most relevant to Komodo dragon physiology, are its vasoactive function in blood vessels, and its inotropic effects on the heart. In mammals during intense physical activity, All increases and contributes to arterial blood pressure and regional blood regulation 37,38. The positive selection for AGT points to important adaptations in these physiological parameters. Reptiles have a functional renin-angiotensin system that is important for their cardiovascular physiology 39–41. It is likely that positive selection for AGT is related to a mammalian-like cardiovascular function in the Komodo dragon.
Positive selection of thrombosis-related genes
We find evidence for positive selection across different elements of the coagulation cascade, including regulators of platelets and fibrin. The coagulation cascade controls thrombosis, or blood clotting, preventing blood loss during injury. Four genes that regulate platelet activities, MRVI1, RASGRP1, LCP2, and CD63 have undergone positive selection in the Komodo dragon genome. MRVI1 is involved in inhibiting platelet aggregation 42, RASGRP1 coordinates calcium dependent platelet responses 43, LCP2 is involved in platelet activation 44, and CD63 plays a role in controlling platelet spreading 45. In addition to regulators of platelets, two coagulation factors, F10 (Factor X) and F13B (Coagulation factor XIII B chain) have undergone positive selection in the Komodo genome. Factor X is centrally important to the coagulation cascade and its activation is the first step in initiating coagulation 46. Factor 13B is the beta subunit of Factor 13, which is the final coagulation factor activated in the coagulation cascade 47. Further, FGB, which encodes one of the three subunits of fibrinogen, the molecule which is converted to the clotting agent fibrin 48, has undergone positive selection in the Komodo genome.
Komodo dragons, along with other species of monitor lizards, produce anticoagulants and hypotension-inducing proteins in their saliva which are hypothesized to aid in hunting 11,12. In addition to hunting, Komodo dragons use their serrate teeth during intraspecific conflict, which can be aggressive and inflict serious wounds 8. Because it is likely that their saliva enters the bloodstream of Komodo dragons during these conflicts, we hypothesize that the positive selection that we detected in many Komodo dragon coagulation genes may result from selective pressure for Komodo dragons to evade the anticoagulant effects of conspecifics.
Discussion
We have sequenced and assembled a high-quality genome of the Komodo dragon. The combination of platforms that we used allowed the de novo assembly of a genome that will serve as a template for analysis of other varanid genomes, and for further investigation of genomic innovations in the varanid lineage. Moreover, we assigned 75% of the genome to chromosomes. Assignment of the Komodo dragon genome to chromosomes provides a significant contribution to comparative genomics of squamates and vertebrates in general.
Our comparative genomic analysis identified previously undescribed species-specific expansion of Type 2 vomeronasal receptors across multiple squamates, including lizards and at least one snake. It will be exciting to explore the role this expansion of V2Rs plays in behavior and ecology of Komodo dragons, including their ability to locate prey at long distances 8. Komodo dragons, like other squamates, are known to possess a sophisticated lingual-vomeronasal systems for chemical sampling of their environment 49. This sensory apparatus allows Komodo dragons to perceive chemicals from the environment for a variety of social and ecological activities, including kin recognition, mate choice 50,51, predator avoidance 52,53, hunting prey 54,55, and for locating and tracking injured or dead prey. Komodo dragons are unusual as they adopt both foraging tactics across ontogeny with smaller juveniles preferring active foraging for small prey and large adult dragons targeting larger ungulate prey via ambush predation 10. However, retention of a highly effective lingual-vomeronasal system across ontogeny seems likely, given the exceptional capacity for Komodo dragons of all sizes to locate injured or dead prey.
We find evidence for positive selection across many genes involved in regulating mitochondrial biogenesis, cellular respiration, and cardiovascular homeostasis. Komodo dragons, along with other monitor lizards, have a high aerobic capacity and exercise endurance, and our results reveal selective pressures on biochemical pathways that are likely to be the source of this high aerobic capacity. Future genomic work on additional varanid species, and other squamate outgroups, will test these hypotheses. These selective processes are consistent with the increased oxidative capacity in python hearts after feeding 28. Reptile muscle mitochondria typically oxidize substrates at a much lower rate than mammals, partly based on substrate-type use 56. The findings that Komodo have experienced selection for several genes encoding mitochondrial enzymes, including one involved in fatty acid metabolism, points towards a more mammalian-like mitochondrial function. In addition to a clear indication of adaptive muscle metabolism, we found positive selection for AGT, which encodes two potent vasoactive and inotropic peptides with central roles in cardiovascular physiology. A compelling hypothesis is that this positive selection is an important component in the ability of the Komodo to rapidly increase blood pressure and cardiac output for attacks on prey, extended periods of locomotion including inter-island swimming, and male-male combat during the breeding season. Direct measures of cardiac function have not been made in Komodo dragons, but in other varanid lizards, a large aerobic scope during exercise is associated with a large factorial increase in cardiac output 57. Overall, these cardiovascular genes suggest a profoundly different cardiovascular and metabolic profile relative to other squamates, endowing the Komodo dragon with unique physiological properties.
We also found evidence for positive selection across genes that regulate blood clotting. Like other monitor lizards, the saliva of Komodo dragons contains anticoagulants. The extensive positive selection on the genes encoding their coagulation system likely reflects that there is selective pressure for Komodo dragons to evade the anticoagulant and hypotensive effects of the saliva of conspecific rivals for food, territories, or mates. While all monitor lizards tested contain anticoagulants in their saliva, the precise mechanism by which they act varies 12. It is likely that monitor lizards have evolved different types of adaptations that reflect the diversity of their anticoagulants. Understanding how these systems have evolved has the potential to further our understanding of the biology of thrombosis.
Varanids, including Komodo dragons, possess genotypic sex determination and share ZZ/ZW sex chromosomes with other anguimorphan lizards 14,18. Here, we were able to detail the content of Z chromosome of V. komodoensis. The chromosome sequencing data provided significant insights into the content of V. komodoensis Z chromosome. All scaffolds assigned to Z chromosome were homologous to A. carolinensis chromosome 18 (ACA18) and to chicken chromosome 28, as confirmed by comparison of blood transcriptome between sexes 18. The same syntenic blocks and genes appear to be implicated in different vertebrate lineages in sex determination mechanisms 58. In particular, the regions of sex chromosomes that are shared by the common ancestor of varanids and several other lineages of anguimorphan lizards contain the amh (anti-Müllerian hormone) gene 18, which plays a crucial role in the testis differentiation pathway. Homologs of the amh gene are also strong candidates for being the sex-determining genes in several lineages of teleost fishes and in monotremes 59–62.
Materials and Methods
DNA isolation and processing for Bionano optical mapping
Komodo dragon whole blood was used to extract high molecular weight genomic DNA for genome mapping. Blood was centrifuged at 2000g for 2 minutes, plasma was removed, and the sample was stored at 4°C. 2.5μl of blood was embedded in 100μl of agarose gel plug to give ~7μg DNA/plug, using the BioRad CHEF Mammalian Genomic DNA Plug Kit (Bio-Rad Laboratories, Hercules, CA, USA). Plugs were treated with proteinase K overnight at 50°C. The plugs were then washed, melted, and then solubilized with GELase (Epicentre, Madison, WI, USA). The purified DNA was subjected to four hours of drop-dialysis. DNA concentration was determined using Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA), and the quality was assessed with pulsed-field gel electrophoresis.
The high molecular weight DNA was labeled according to commercial protocols using the IrysPrep Reagent Kit (Bionano Genomics, San Diego, CA, USA). Specifically, 300 ng of purified genomic DNA was nicked with 7 U nicking endonuclease Nb.BbvCI (NEB, Ipswich, MA, USA) at 37°C for two hours in NEB Buffer 2. The nicked DNA was labeled with a fluorescent-dUTP nucleotide analog using Taq polymerase (NEB) for one hour at 72°C. After labeling, the nicks were repaired with Taq ligase (NEB) in the presence of dNTPs. The backbone of fluorescently labeled DNA was stained with DNA stain (BioNano).
DNA processing for 10x Genomics linked read sequencing
High molecular weight genomic DNA extraction, sample indexing, and generation of partition barcoded libraries were performed by 10x Genomics (Pleasanton, CA, USA) according to the Chromium Genome User Guide and as published previously 64.
Bionano mapping and assembly
Using the Bionano Irys instrument, automated electrophoresis of the labeled DNA occurred in the nanochannel array of an IrysChip (Bionano Genomics), followed by automated imaging of the linearized DNA. The DNA backbone (outlined by YOYO-1 staining) and locations of fluorescent labels along each molecule were detected using the Irys instrument’s software. The length and set of label locations for each DNA molecule defines an individual single-molecule map. Raw Bionano single-molecule maps were de novo assembled into consensus maps using the Bionano IrysSolve assembly pipeline (version 5134) with default settings, with noise values calculated from the 10x Genomics Supernova assembly.
10x Genomics sequencing and assembly
The 10x Genomics barcoded library was sequenced on the Chromium machine, and the raw reads were assembled using the company’s Supernova software (version 1.0) with default parameters. Output fasta files of the phased Supernova assemblies were generated in pseudohap format.
Merging datasets into a single assembly
Sequencing and mapping data types were merged together as follows. First, Bionano assembled contigs and the 10x Genomics assembly were combined using Bionano’s hybrid assembly tool with the -B2 -N2 options. SSPACE-LongRead (cite https://doi.org/10.1186/1471-2105-15-211) was used in series with default parameters to scaffold the hybrid assembly using PacBio reads, Nanopore reads, and unincorporated 10x Genomics Supernova scaffolds/contigs, resulting in the final assembly.
Assignment of scaffolds to chromosomes
Isolation of V. komodoensis (VKO) chromosome-specific DNA pools as previously described 14. Briefly, fibroblast cultivation of a female V. komodoensis were obtained from tissue samples of an early embryo of a captive individual. Chromosomes obtained by fibroblast cultivation were sorted using a Mo-Flo (Beckman Coulter) cell sorter. Fifteen chromosome pools were sorted in total. Chromosome-specific DNA pools were then amplified and labelled by degenerate oligonucleotide primed PCR (DOP-PCR) and assigned to their respective chromosomes by hybridization of labelled probes to metaphases. V. komodoensis chromosome pools obtained by flow sorting were named according to chromosomes (e.g. majority of DNA of VKO6/7 belong to chromosomes 6 and 7 of V. komodoensis). V. komodoensis pools for macrochromosomes are each specific for one single pair of chromosomes, except for VKO6/7 and VKO8/7, which contain one specific chromosome pair each (pair 6 and pair 8, respectively), plus a third pair which overlaps between the two of them (pair 7). For microchromosomes, pools VKO9/10, VKO17/18/19, VKO11/12/W and VKO17/18/Z contained more than one chromosome each, while the rest are specific for one single pair of microchromosomes. The W and Z chromosomes are contained in pools VKO11/12/W and VKO17/18/Z, respectively, together with two pairs of other microchromosomes each.
Chromosome-pool specific genetic material was amplified by GenomePlex® Whole Genome Amplification (WGA) Kit (Sigma) following manufacturer protocols. DNA from all 15 chromosome pools was used to prepare Illumina sequencing libraries, which were independently barcoded and sequenced 125 bp paired-end in a single Illumina Hiseq2500 lane. Reads obtained from sequencing of flow-sorting-derived chromosome-specific DNA pools were processed with the dopseq pipeline (https://github.com/lca-imcb/dopseq) 17,65. Illumina adapters and WGA primers were trimmed off by cutadapt v1.13 66. Then, pairs of reads were aligned to the genome assembly of V. komodoensis using bwa mem 67. Reads were filtered by MAPQ ≥ 20 and length ≥ 20 bp, and aligned reads were merged into positions using pybedtools 0.7.10 68,69. Reference genome regions were assigned to specific chromosomes based on distance between positions. Finally, several statistics were calculated for each scaffold. Calculated parameters included: mean pairwise distance between positions on scaffold, mean number of reads per position on scaffold, number of positions on scaffold, position representation ratio (PRR) and p-value of PRR. PRR of each scaffold was used to evaluate enrichment of given scaffold on chromosomes. PRR was calculated as ratio of positions on scaffold to positions in genome divided by ratio of scaffold length to genome length. PRRs >1 correspond to enrichment, while PRRs <1 correspond to depletion. As the PRR value is distributed lognormally, we use its logarithmic form for our calculations. To filter out only statistically significant PRR values we used thresholds of logPRR >0 and its p-value <=0.01. Scaffolds with logPRR > 0 were considered enriched in the given sample. If one scaffold was enriched in several samples we chose highest PRR to assign scaffold as top sample. We also assigned homology of V. komodoensis genome to genomes of Anolis carolinensis (AnoCar2.0) and Gallus gallus (galGal3) generating alignment between genomes with LAST 70 and subsequently using chaining and netting technique 71. For LAST we used default scoring matrix and parameters of 400 for gap existence cost, 30 for gap extension cost and 4500 for minimum alignment score. For axtChain we used same distance matrix and default parameters for other chain-net scripts.
RNA sequencing
RNA was extracted from heart tissue obtained from an adult male specimen that died of natural causes. Trizol reagent was used to extract RNA following manufacturer’s instructions. RNAseq libraries were produced using a NuGen RNAseq v2 and Ultralow v2 kits, and sequenced on an Illumina Nextseq 500.
Genome annotation
RepeatMasker was used to mask repetitive elements in the Komodo dragon genome using the squamata repeat database as reference 15. After masking repetitive elements, protein-coding genes were annotated using the MAKER version 3.01.02 72 pipeline, combining protein homology information, assembled transcript evidence, and de novo gene predictions from SNAP and Augustus version 3.3.1 73. Protein homology was determined by aligning proteins from 15 reptile species (Table S10) to the Komodo dragon genome using exonerate version 2.2.0 74. RNA-seq data was aligned to the Komodo genome with STAR version 2.6.0 75 and assembled into 900,722 transcripts with Trinity version 2.4.0 19. Protein domains were determined using InterProScan version 5.31.70 76. Gene annotations from the MAKER pipeline were filtered based on the strength of evidence for each gene, the length of the predicted protein, and the presence of protein domains. Clusters of orthologous genes across 15 reptile species were determined with OrthoFinder v2.0.0 77. A total of 284,107 proteins were clustered into 16,546 orthologous clusters. In total, 96.4% of Komodo genes were grouped into orthologous clusters. For estimating a species phylogeny only, protein sequences from Mus musculus and Gallus gallus were added to the orthologous clusters with OrthoFinder. tRNAs were annotated using tRNAscan-SE version 1.3.178, and other non-coding RNAs were annotated using the Rfam database 79 and the Infernal software suite 80.
Phylogenetic analysis
A total of 2,752 single-copy orthologous proteins across 15 reptile species, including Varanus komodoensis, Shinisaurus crocodilurus, Ophisaurus gracilis, Anolis carolinensis, Pogona vitticeps, Python molorus bivittatus, Eublepharis macularius, Gekko japonicus, Pelodiscus sinensis, Chelonia mydas, Chrysemys picta bellii, Alligator sinensis, Alligator mississippiensis, Gavialis gangeticus, and Crocodylus porosus, along with the chicken Gallus gallus and mouse Mus musculus, were each aligned using PRANK v.170427 81 (Table S10). Aligned proteins were concatenated into a supermatrix, and a species tree was estimated using IQ-TREE version 1.6.7.1 82 with model selection across each partition 83 and 10,000 ultra-fast bootstrap replicates 84.
Gene family evolution analysis
Gene family expansion and contraction analyses were performed with CAFE v4.2 85 for the squamate reptile lineage, with a constant gene birth and gene death rate assumed across all branches.
Vomeronasal type 2 receptors were first identified in all species by containing the V2R domain InterPro domain (IPR004073) 86. To ensure that no V2R genes were missed, all proteins were aligned against a set of representative V2R genes using BLASTp 87 with an e-value cutoff of 1e-6 and a bitscore cutoff of 200 or greater. Any genes passing this threshold were added to the set of putative V2R genes. Transmembrane domains were identified in each putative V2R gene with TMHMM v2.0 88 and discarded if they did not contain 7 transmembrane domains in the C-terminal region. Beginning at the start of the first transmembrane domain, proteins were aligned with MAFFT v7.310 (auto alignment strategy) 89 and trimmed with trimAL (gappyout) 90. A gene tree was constructed using IQ-TREE 82–84 with the JTT+ model of evolution with empirical base frequencies and 10 FreeRate model parameters, and 10,000 bootstrap replicates. Genes were discarded if they failed the IQ-TREE composition test.
Positive selection analysis
We analyzed 4,081 genes that were universal and single-copy across all squamate lineages tested (Varanus komodoensis, Shinisaurus crocodilurus, Ophisaurus gracilis, Anolis carolinensis, Pogona vitticeps, Python molorus bivittatus, Eublepharis macularius, and Gekko japonicus) to test for positive selection (Table S8). An additional 2,040 genes that were universal and single-copy across a subset of squamate species (Varanus komodoensis, Anolis carolinensis, Python molurus bivittatus, and Gekko japonicus) were also analyzed (Table S8). We excluded multi-copy genes from all positive selection analyses to avoid confounding from incorrect paralogy inference. Proteins were aligned using PRANK 81 and codon alignments were generated using PAL2NAL 91.
Positive selection analyses were performed with the branch-site model aBSREL using the HYPHY framework 92,93. For the 4,081 genes that were single-copy across all squamate lineages, the full species phylogeny of squamates was used. For the 2,040 genes that were universal and single-copy across a subset of species, a pruned tree containing only those taxa was used. We discarded genes with unreasonably high dN/dS values across a small proportion of sites, as those were false positives driven by low quality gene annotation in one or more taxa in the alignment. We used a cutoff of dN/dS of less than 50 across 5% or more of sites, and a p-value of less than 0.05 at the Komodo node. Each gene was first tested for positive selection only on the Komodo branch. Genes undergoing positive selection in the Komodo lineage were then tested for positive selection at all nodes in the phylogeny. This resulted in 201 genes being under positive selection in the Komodo lineage (Table S9).
Author contributions
A.L.L. did genome annotation and all comparative genomics analyses. Y.Y.Y.L. led the sequencing and assembly efforts with Y.M. and A.C.Y.M. A.I. sequenced isolated chromosomes with M.R under supervision of L.K., and assigned sequences with A.M., I.K, and V.T. M.F. and V.O. contributed to genome assembly the genome in the lab of R.F. with C.C. A.K.H. led the initial development of the project. W.L.E. initially assembled the transcriptomes and annotated the genome. O.A.R. provided frozen tissue samples. J.M. collected specimen blood. M.M. and M.F. isolated samples and obtained PacBio sequence in the lab of T.P. and C.C.E.S. performed PacBio sequencing. J.W.H., J.M., and C.C. provided direction on Varanid physiology. T.J. and C.C. provided direction on Komodo dragon ecology. P.-Y.K. coordinated the genomics efforts. K.S.P. directed comparative genomic analysis. B.G.B. initiated and coordinated the project. A.L., K.S.P. and B.G.B. wrote the paper with input from all authors.
Supplemental file descriptions
Table S1. Genome statistics for non-avian reptiles used in this study.
Table S2. Repetitive elements in the Komodo dragon genome.
Table S3. Read statistics for chromosomal anchoring.
Table S4. Scaffold assignment and homologies of Komodo dragon scaffolds to green anole and chicken chromosomes.
Table S5. Number of reads, scaffolds, and positions assigned to chromosomes.
Table S6. Number of V1R/V2R genes across non-avian reptiles.
Table S7. V2R gene clusters in the Komodo dragon genome.
Table S8. Genes assayed for positive selection in the Komodo dragon genome.
Table S9. Positively selected genes in the Komodo dragon genome.
Table S10. Sources and versions of genomes used for phylogenetic and comparative methods.
Acknowledgements
Special thanks from B.G.B. to John Romano for inspiration and historical information. We are grateful to staff at Zoo Atlanta for care of Slasher and Rinca and help obtaining samples, Jim Pether from Reptilandia zoo in Gran Canaria in the Canary Islands, for additional samples, and R. Chadwick and N. Carli (Gladstone Genomics Core) for DNA and RNA-seq library preparation. We also thank Kristina Giorda, Rabeea Abbas, Deanna Church (10x Genomics) for 10x Genomics Chromium sequencing and Supernova assembly. This work was supported by institutional funding from the Gladstone Institutes to B.G.B and K.S.P.; an NHLBI grant to K.S.P and B.G.B (HL098179); the Younger Family gift to B.G.B.; an NHGRI grant to P.-Y.K. (R01 HG005946); NIH training grants (T32 AR007175) to A.C.Y.M and (T32 HL007731) to Y.M. M.A. and M.R. were supported by GACR 17-22141Y, M.R. was additionally supported by Charles University projects PRIMUS/SCI/46 and Research Centre (204069).
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.
- 61.
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵
- 91.↵
- 92.↵
- 93.↵