Abstract
The Aldabra giant tortoise (Aldabrachelys gigantea) is one of only two giant tortoise species left in the world. The species is endemic to Aldabra Atoll in Seychelles and is considered vulnerable due to its limited distribution and threats posed by climate change. Genomic resources for A. gigantea are lacking, hampering conservation efforts focused on both wild and ex-situ populations. A high-quality genome would also open avenues to investigate the genetic basis of the species' exceptionally long lifespan. Here, we produced the first chromosome-level de novo genome assembly of A. gigantea using PacBio High-Fidelity sequencing and high-throughput chromosome conformation capture (Hi-C). The 2.37 Gbp assembly has a scaffold N50 of 148.6 Mbp and resolves into 26 chromosomes. RNA-seq-assisted gene model prediction identified 23,953 protein-coding genes and 1.1 Gbp of repetitive sequences. Synteny analyses among turtle genomes revealed high levels of chromosomal collinearity even among distantly related taxa. We also performed low-coverage re-sequencing of 30 individuals from wild populations and two zoo individuals. Our genome-wide population structure analyses detected genetic population structure in the wild and identified the most likely origin of the zoo-housed individuals. The high-quality chromosome-level reference genome for A. gigantea is one of the most complete turtle genomes available. It is a powerful tool to assess the population structure in the wild population and reveal the geographic origins of ex-situ individuals relevant for genetic diversity management and rewilding efforts.
Competing Interest Statement
The authors have declared no competing interest.
Abbreviations
- μg
- microgram
- μl
- microliter
- °C
- degree Celsius
- AGAT
- Another Gtf/Gff Analysis Toolkit
- ANGSD
- Analysis of Next Generation Sequencing Data
- baq
- base alignment quality
- bp
- base pairs
- BUSCO
- Benchmarking Universal Single-Copy Orthologs
- BWA
- Burrows-Wheeler Aligner
- DNA
- deoxyribonucleic acid
- cDNA
- complementary DNA
- dsDNA
- double-strand DNA
- EAZA
- European Association of Zoos and Aquaria
- ETH
- Swiss Federal Institute of Technology in Zürich
- GAIA
- Genome-wide Alignment Including Adapter-trimming
- GATK
- Genome Analysis Toolkit
- Gbp
- gigabase pairs
- GC
- guanine and cytosine
- GCE
- Genomic Character Estimator
- gDNA
- genomic DNA
- Hi-C
- high-throughput chromosome conformation capture
- HiFi
- high-fidelity
- HMW
- high molecular weight
- IsoSeq
- isoform sequencing
- IUCN
- International Union for Conservation of Nature
- JBAT
- Juicebox Assembly Tools
- KAT
- k-mer analysis toolkit
- LS
- liquid sample
- MAF
- minor allele frequency
- Mbp
- megabase pairs
- mg
- milligram
- min
- minute
- mL
- milliliter
- mM
- millimolar
- NCBI
- National Center for Biotechnology Information
- NEB
- New England Biolabs
- ng
- nanogram
- NGSAdmix
- Next Generation Sequencing Admixture
- ngsLD
- Next Generation Sequencing Linkage Disequilibrium
- OrthoDB
- orthologous database
- PacBio
- Pacific Biosciences
- PBS
- phosphate-buffered saline
- PCA
- principal component analysis
- PCR
- polymerase chain reaction
- QUAST
- Quality Assessment Tool
- RefSeq
- reference sequence
- Rfam
- RNA families
- RNA
- ribonucleic acid
- RNA-seq
- RNA sequencing
- rpm
- revolutions per minute
- Sauropsida_odb10
- sauropsids orthologous database 10
- SMRT
- single-molecule real-time
- SNP
- single nucleotide polymorphism
- SRA
- Sequence Read Archive
- STAR
- Spliced Transcripts Alignment to a Reference
- SyRI
- Synteny and Rearrangement Identifier
- tRNA
- transfer RNA
- rRNA
- ribosomal RNA
- snRNA
- small nuclear RNA
- miRNA
- micro RNA
- TSEBRA
- Transcript Selector for BRAKER
- UniProtKB
- Universal Protein Knowledgebase
- Vertebrata_odb10
- vertebrate orthologous database 10