ABSTRACT
Aloe vera is a species of the Asphodelaceae plant family that has unique characteristics such as drought resistance and possesses numerous medicinal properties. However, the genetic basis of these phenotypes is still unknown, primarily because its genome sequence has been unavailable. In this study, we report the first Aloe vera draft genome sequence, comprising 13.83 Gbp and harboring 86,177 coding genes. It is also the first genome from the Asphodelaceae plant family and is the largest angiosperm genome sequenced and assembled to date. Further, we report the first genome-wide phylogeny of monocots including Aloe vera, constructed using 1,440 one-to-one orthologs, which resolves the phylogenetic position of Aloe vera with respect to the other monocots. A comprehensive comparative analysis of the Aloe vera genome with the other available high-quality monocot genomes revealed adaptive evolution in several genes of the drought stress response, CAM pathway, and circadian rhythm in Aloe vera. Further, genes involved in the DNA damage response, a key pathway in several biotic and abiotic stress response mechanisms, were found to be positively selected. These findings provide a genetic basis for the evolution of the drought stress tolerance capabilities of Aloe vera. They also substantiate the previously suggested notion that the evolution of unique characters in this species is perhaps due to selection and adaptive evolution rather than phylogenetic divergence or isolation.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Email addresses of authors: Shubham K. Jaiswal – shubhamj{at}iiserb.ac.in, Abhisek Chakraborty – abhisek18{at}iiserb.ac.in, Shruti Mahajan – shruti17{at}iiserb.ac.in, Sudhir Kumar – sudhir19{at}iiserb.ac.in, Vineet K. Sharma – vineetks{at}iiserb.ac.in
LIST OF ABBREVIATIONS
- MSA: Multiple signs of adaptive evolution
- CAM: Crassulacean acid metabolism
- COG: Clusters of Orthologous Groups
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- GO: Gene ontology
- BUSCO: Benchmarking Universal Single-Copy Orthologs
- SIFT: Sorting Intolerant From Tolerant
- FDR: False discovery rate
- BLAST: Basic Local Alignment Search Tool
- N50: minimum contig length needed to cover 50% of the genome
- ABA: Abscisic acid
- snoRNA: small nucleolar RNA
- snRNA: small nuclear RNA
- tRNA: transfer RNA
- rRNA: ribosomal RNA
- srpRNA: signal recognition particle RNA
- miRNA: microRNA
- MYB: Myeloblastosis
- bHLH: basic helix–loop–helix
- CPP: cysteine-rich polycomb-like protein
- LBD: Lateral Organ Boundaries (LOB) Domain
- EMB3127: Embryo Defective 3127
- PnsB3: Photosynthetic NDH subcomplex B3
- TL29: Thylakoid Lumen 29
- IRT3: Iron regulated transporter 3
- PDV2: Plastid Division 2
- SIRB: Sirohydrochlorin ferrochelatase B
- G6PD5: Glucose-6-phosphate dehydrogenase 5
- KAT2: Potassium channel in Arabidopsis thaliana 2
- PHYB: Phytochrome B
- ELF3: Early Flowering 3
- LHY: Late Elongated Hypocotyl
- FT: Flowering locus T
- PHYA: Phytochrome A
- GI: Gigantea
- FKF1: Flavin-binding, Kelch repeat, F box 1
- SPA1: Suppressor of PHYA-105 1
- HY5: Elongated Hypocotyl 5
- CHS: Chalcone synthase
- CPA: Capping Protein A
- PNC1: Peroxisomal adenine nucleotide carrier 1
- PEX14: Peroxin 14
- NRAMP1: Natural Resistance-Associated Macrophage Protein 1
- ALA1: Aminophospholipid ATPase 1
- NAT8: Nucleobase-Ascorbate Transporter 8
- NRT2.6: High affinity Nitrate Transporter 2.6
- ppc-aL1a: Phosphoenolpyruvate carboxylase
- CENH3: Centromeric histone H3
- NORK: Nodulation receptor kinase
- PPR: Pentatricopeptide Repeat
- PTS1: Peroxisomal targeting signal 1
- PTS2: Peroxisomal targeting signal 2
- LTR-RT: Long terminal repeat retrotransposons
- EST: Expressed sequence tag
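
The N50 entry in the list above can be illustrated with a short calculation. The following is a minimal sketch, not the procedure used in this study: it assumes that "50% of the genome" is measured against the total length of the assembled contigs, and the function name `n50` and the example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of positive contig lengths.

    Assumption: the 50% threshold is taken relative to the summed
    length of the contigs supplied (the assembly size), not an
    independently estimated genome size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig downwards; the first contig that
    # brings the cumulative length to at least half the total is the N50.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

In this example the total length is 32, and the two longest contigs (8 + 8) already reach half of that, so the N50 is 8.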