Determination of phylogenetic relationships in the genus Mangifera based on whole chloroplast genome and nuclear genome sequences

The genus Mangifera (Anacardiaceae) includes 69 species, of which Mangifera indica L. is the most important and the primary species cultivated for commercial mango production. Although the species are classified on the basis of morphological descriptors, molecular evidence has indicated a hybrid origin for two species, raising the possibility that more species in the genus are of hybrid origin. To analyze evolutionary relationships within the genus, 13 samples representing 11 Mangifera species were sequenced, and whole chloroplast (Cp) genomes and 47 common single-copy nuclear gene sequences were assembled and used for phylogenetic analysis with concatenation- and coalescence-based methods. Cp genome size varied from 151,752 to 158,965 bp, with M. caesia and M. laurina having the smallest and largest genomes, respectively. Genome annotation revealed 80 protein-coding genes, 31 tRNA genes and four rRNA genes in all species. Comparison of the whole Cp genome and nuclear gene-based phylogenies revealed topological conflicts suggesting chloroplast capture or cross-hybridization. The Cp genomes of M. altissima, M. applanata, M. caloneura and M. lalijiwa were highly similar to that of M. indica (99.9% sequence similarity). This close sequence relationship suggests a common ancestry and likely cross-hybridization between these wild relatives and M. indica. This study improves knowledge of phylogenetic relationships in Mangifera and indicates extensive gene flow among species, suggesting that hybrids may be common within the genus.


Introduction
Mango (Mangifera indica L.), an evergreen dicotyledonous angiosperm often referred to as the "king of fruits", is adapted to tropical and sub-tropical regions of the world (Mukherjee 1949a; Rabah et al. 2017; Singh et al. 2016; Vasanthaiah et al. 2007). It is one of the most economically successful fresh fruits, cultivated in more than 100 countries, with 55 million tonnes produced in 2020. India is the world's largest mango producer, with approximately 24.7 million tonnes or 45% of total production, followed by Indonesia (6.6%), Mexico (4.3%), China (4.3%) and Pakistan (4.3%) (FAOSTAT 2022). Besides being consumed fresh, ripe and unripe mangoes are used to produce pickles, chutney, juices, cereal flakes, sauce and jam, creating high demand for mangoes on the international market (Saúco 2016).
The taxonomic history of the genus Mangifera (Anacardiaceae) shows consistent recognition of two major groups, with the number of species reported varying between 45 and 69 (Hou 1978; Kostermans and Bompard 1993; Mukherjee 1949a). The most widely accepted classification, described by Kostermans and Bompard (1993), defines 69 species based mainly on morphological descriptors of reproductive tissues. Of the 69 species, 58 are divided into two subgenera, Mangifera and Limus, and the remaining 11 species are placed in an uncertain position due to insufficient voucher material. The subgenus Mangifera is characterized by free stamen filaments and a four- or five-lobed, cushion-shaped papillose disc broader than the base of the ovary, whereas the subgenus Limus is characterized by united stamen filaments and a stalk-like disc narrower than the base of the ovary. The subgenus Mangifera includes 47 species, further divided into four sections: Marchandora Pierre, Euantherae Pierre, Rawa Kosterm. and Mangifera Ding Hou. Section Marchandora Pierre includes only M. gedebe, which has the unique character of a labyrinthine seed. The three species of section Euantherae Pierre (M. pentandra Hooker, M. cochinchinensis Engler and M. caloneura Kurz) are characterized by five fertile stamens. Section Rawa Kosterm. includes nine species that are not well delimited. Section Mangifera Ding Hou is the largest in the genus, with more than 30 species including the domesticated mango (M. indica) (Bompard 2009; Kostermans and Bompard 1993). The 11 species in subgenus Limus are further divided into two sections: Deciduae (deciduous trees) and Perennis (non-deciduous species).
Due to the high global demand for mango, systematic breeding programs have recently been initiated to develop cultivars with high productivity, consumer appeal and transportability, better suited to national and international markets. However, breeding is time-consuming because of the long juvenile period, high heterozygosity and polyembryony of mango (Bally and Dillon 2018). Although M. indica is currently the principal species cultivated for commercial fruit production (Dinesh et al. 2011), and a set of selected commercial cultivars dominates crop improvement programs, 26 other species have been reported to produce edible fruits, including M. altissima, M. foetida, M. caesia, M. odorata, M. pentandra, M. laurina, M. sylvatica, M. zeylanica and M. pajang (Bally et al. 2021; Bompard 2009; Mukherjee and Litz 2009). If their distinctive characteristics are properly exploited, many wild species could be valuable in trait-specific breeding because of favourable traits related to fruit quality, biotic and abiotic stress tolerance, and potential as rootstocks (Bompard 1992; Eiadthong et al. 1999; Iyer 1989). Although the species have been described in terms of morphological characteristics, they are not well characterized genetically. Therefore, identifying molecular evolutionary relationships within the genus is vital to allow efficient use of wild relatives in future breeding programs.
Recent studies have used molecular markers targeting coding and non-coding regions of the chloroplast genome (Eiadthong et al. 1999; Fitmawati et al. 2017; Hartana 2010; Hidayat et al. 2011) and a set of nuclear genes (Fitmawati 2016; Schnell and Knight Jr 1992; Yonemori et al. 2002) to analyse phylogenetic relationships within the genus. However, the results have not been consistent, and many studies failed to produce fully resolved phylogenies. This may be due to the use of molecular markers that capture limited genetic information, slow evolutionary rates in the targeted regions, and the use of different taxa in different analyses. Two studies have used whole chloroplast genome (Niu et al. 2021) and mitochondrial genome (Niu et al. 2022) sequences alone, with a small number of taxa. However, in most angiosperms, chloroplast and mitochondrial genomes are inherited uniparentally, usually through the maternal parent (Corriveau and Coleman 1988). Because these studies relied only on uniparentally inherited genetic information, they could not precisely resolve evolutionary relationships.
The genus Mangifera is native to South and South-East Asia, ranging from Indochina, Burma, Thailand and the Malay Peninsula to Indonesia and the Philippines, where some species are found only in the wild while others are grown locally in gardens and orchards (Kostermans and Bompard 1993). With the introduction of the common mango to South-East Asia during the 4th to 5th century (Mukherjee 1949b), M. indica and wild Mangifera species in the region may have come into contact with each other. Since both wild and domesticated mangoes are assumed to be self-incompatible, hybridization is expected among these outcrossing species when they grow in close proximity.
Among wild species, a hybrid origin has been reported for M. odorata (Teo et al. 2002) and M. casturi (Matra et al. 2021; Warschefsky 2018). As molecular data suggest that cross-hybridization occurs in the genus, more hybrids may be present among the 69 Mangifera species currently recognized as distinct. Comparative phylogenetic analysis based on both the chloroplast genome and, ideally, a set of single-copy nuclear genes, representing maternal and biparental inheritance respectively (Liu et al. 2020; Tsutsui et al. 2009), is a useful approach for precisely determining evolutionary relationships (Duarte et al. 2010).
A suitable and precise reference genome is crucial in evolutionary studies for determining relationships among species accurately. The haploid genome size of M. indica has been estimated at 439 Mbp by flow cytometry (Arumuganathan and Earle 1991). The first draft genome of M. indica was assembled for the Indian cultivar Amrapali (Singh et al. 2014; Singh et al. 2018). The genomes of M. indica cv. Tommy Atkins (Bally et al. 2021), Kensington Pride (Dillon et al. 2016), Hong Xiang Ya (Li et al. 2020) and Alphonso (Wang et al. 2020) have also been sequenced using advanced sequencing platforms, and a high-quality chromosome-level genome is available for the cultivar Alphonso (Wang et al. 2020). Chloroplast genetics and genomics have progressed rapidly with the advent of high-throughput sequencing technologies. Chloroplast genomes in higher plants are typically double-stranded and organized into a conserved quadripartite structure, consisting of a pair of inverted repeats (IR) separated by a small single-copy region (SSC) and a large single-copy region (LSC). Although far smaller than most plant nuclear genomes, chloroplast genomes range from 120 kb to 160 kb (Odintsova and Yurina 2006) and contain 110 to 130 genes. Conflicts between chloroplast and nuclear phylogenies provide valuable insights into speciation, hybridization and incomplete lineage sorting (Degnan and Rosenberg 2009; Joly et al. 2009). The first chloroplast genome sequence in the genus Mangifera was reported for M. indica (Azim et al. 2014), and so far assembled chloroplast genomes are available for only six of the 69 species (Jo et al. 2017; Niu et al. 2021). In this study, we compared the chloroplast genome sequences and a selected set of common single-copy nuclear genes of 11 Mangifera species to analyse evolutionary relationships in the genus.

Plant material and DNA extraction
A total of 13 samples belonging to 11 Mangifera species were selected for sequencing (Table 1). Leaf tissue of all M. indica varieties and Mangifera species, except M. pajang and M. caesia, was sourced from trees grafted onto M. indica cv. Kensington Pride rootstock at the Walkamin Research Station, Mareeba (17°08′02″S, 145°25′37″E), North Queensland, Australia. M. pajang and M. caesia were sourced from trees at Treefarm, El Arish (17°47′59.99″S, 146°00′00.00″E) and Fruit Forest Farm (www.fruitforestfarm.com.au), East Feluga (17°53′46.0″S, 145°59′38.0″E), Queensland, Australia, respectively. After harvest, leaves were snap-frozen in liquid nitrogen, transported on dry ice and stored at -70 °C until DNA extraction. Frozen leaf tissue was first coarsely ground using a mortar and pestle and then finely ground using a Qiagen TissueLyser (Qiagen, USA). DNA was extracted from the finely pulverized leaf tissue using a cetyltrimethylammonium bromide (CTAB) method (Furtado 2014). DNA quality and quantity were assessed against acceptable absorbance ratios (ideally 1.8-2.0 for A260/A280 and above 2.0 for A260/A230) using a NanoDrop spectrophotometer (Thermo Fisher Scientific). DNA degradation and quantity were further assessed by resolving sample and standard DNA by agarose gel electrophoresis (Thermo Fisher Scientific). The isolated DNA was subjected to next-generation short-read sequencing (NGS) on an Illumina HiSeq 2000 platform at the Ramaciotti Centre for Genomics, University of New South Wales (UNSW), Australia, to obtain sequence data with genome coverage of not less than 20X (Table S1).

Chloroplast genomes for each species were assembled using two methods: the chloroplast assembly pipeline (CAP) described by Moner et al. (2018) in CLC Genomics Workbench (CLC-GWB) software (CLC Genomics Workbench 20.0, http://www.clcbio.com), and the Get Organelle pipeline (http://github.com/Kinggerm/GetOrganelle) (Jin et al. 2020). Raw reads for all species were imported into CLC-GWB and trimmed using a quality score limit of 0.01 (equivalent to a Phred score >20) at a sequence length of 1000. The CAP uses two approaches to assemble the chloroplast genome: reference-guided mapping and de novo assembly. For reference-guided mapping, the M. indica cv. Tommy Atkins chloroplast genome (Accession: NC_035239.1) (Rabah et al. 2017), available from NCBI, was imported into CLC-GWB and used as the reference. The two chloroplast sequences generated by the two CAP approaches for each species were aligned in Geneious 2022.2.2 (www.geneious.com) and Clone Manager Professional 9 to identify mismatches, which were manually curated by inspecting the read mappings at each mismatch position. De novo assembled chloroplast genomes from the Get Organelle pipeline were checked in Bandage v. 0.8.1 (Wick et al. 2015) to visualize assembly completeness. Finally, the chloroplast genomes assembled with CAP and the Get Organelle pipeline were compared for mismatches and further manually curated, ensuring that high-quality chloroplast genomes were assembled for all species.
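The quality limit of 0.01 corresponds to a Phred score of 20, since Q = -10 log10(p). A minimal sketch of that conversion follows, together with a Mott-style trimming routine that keeps the highest-scoring stretch of a read. We assume, without confirmation from the text, that CLC-GWB's limit-based trimming behaves like this classic algorithm; the function names are hypothetical and the "sequence length of 1000" setting is not modelled.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Convert an error probability p to a Phred score Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def mott_trim(quals: list[int], limit: float = 0.01) -> tuple[int, int]:
    """Return (start, end) of the best-scoring read segment; end is exclusive.

    Each base scores (limit - error probability), so bases whose error
    probability exceeds the limit (Q < 20 when limit = 0.01) score negatively.
    The maximum-sum contiguous segment is found with Kadane's algorithm.
    An empty segment (0, 0) means no base passed the limit.
    """
    best_sum, best = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += limit - phred_to_error(q)
        if run_sum < 0:
            # The segment so far is a net loss: restart after this base.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


print(round(error_to_phred(0.01)))  # 20
print(mott_trim([2, 2, 40, 40, 40, 40, 2, 2]))  # (2, 6)
```

Bases with an error probability below the limit add to a segment's score and bases above it subtract, so isolated poor bases inside a good stretch are tolerated while low-quality read ends are removed.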

Chloroplast genome annotation and identification of single nucleotide polymorphisms (SNPs), insertions and deletions (INDELs)
Assembled chloroplast genomes were annotated using the GeSeq online tool (https://chlorobox.mpimp-golm.mpg.de/geseq.html) with M. indica cv. Tommy Atkins (Accession: NC_035239.1) as the reference genome. Based on the relationships observed in the chloroplast genome-based phylogenetic analysis, closely related species within the main clades and subclades were compared to determine their evolutionary relationships. Chloroplast genomes of these species were aligned pairwise in Geneious using the MAFFT alignment tool, and the numbers of INDELs, substitutions and SNPs between the sequences were identified.
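The counting step can be illustrated with a minimal sketch that tallies substitutions and indel events from a gapped pairwise alignment. It assumes '-' marks gaps, treats any mismatched non-gap column (including ambiguity codes) as a substitution, and counts a run of consecutive gap columns in the same sequence as one indel event; the function name is hypothetical.

```python
def compare_aligned(seq_a: str, seq_b: str) -> dict[str, int]:
    """Count substitutions and indels between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    substitutions = indel_events = indel_bases = 0
    open_gap = None  # which sequence ('a' or 'b') holds the currently open gap
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # column gapped in both sequences carries no information
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            indel_bases += 1
            if side != open_gap:
                indel_events += 1  # a new gap run starts here
            open_gap = side
        else:
            open_gap = None
            if a != b:
                substitutions += 1
    return {
        "substitutions": substitutions,
        "indel_events": indel_events,
        "indel_bases": indel_bases,
    }


print(compare_aligned("ACGT--ACGT", "ACTTAAAC-T"))
# {'substitutions': 1, 'indel_events': 2, 'indel_bases': 3}
```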

Nuclear gene sequence assembly
Here, we used a set of single-copy nuclear genes. Because such genes had not been characterized for Mangifera species, Citrus sinensis, the closest relative of M. indica for which single-copy nuclear genes had been characterized (Li et al. 2017), was used as the reference to extract the corresponding single-copy genes in mango. The 107 single-copy genes of C. sinensis were mapped against the coding DNA sequences (gene models) of M. indica cv. Alphonso (Wang et al. 2020) in CLC-GWB, and 47 of the 107 were identified as single-copy genes in M. indica.
Trimmed paired-end Illumina reads of each species were then mapped against these 47 M. indica genes, and consensus gene sequences were extracted. All 47 genes were identified as single copy in all Mangifera species. The same 47 gene sequences were also extracted from Anacardium occidentale. Details of the single-copy genes in mango are given in Table S3.
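The text does not state exactly how single-copy status was judged from the mappings. One common criterion, assumed here, is a one-to-one match: a reference gene must hit exactly one target gene model, and that model must not be hit by any other reference gene. The gene identifiers and the function name below are hypothetical.

```python
from collections import defaultdict


def single_copy_genes(hits):
    """Return {reference_gene: target_model} for one-to-one matches only.

    `hits` is an iterable of (reference_gene, target_gene_model) pairs,
    e.g. parsed from a mapping or similarity-search report.
    """
    ref_to_models = defaultdict(set)
    model_to_refs = defaultdict(set)
    for ref, model in hits:
        ref_to_models[ref].add(model)
        model_to_refs[model].add(ref)

    kept = {}
    for ref, models in ref_to_models.items():
        if len(models) == 1:
            (model,) = models
            # Reject models shared by several reference genes.
            if len(model_to_refs[model]) == 1:
                kept[ref] = model
    return dict(sorted(kept.items()))


# Hypothetical gene identifiers, for illustration only.
hits = [
    ("cs_g1", "mi_m1"),
    ("cs_g2", "mi_m2"), ("cs_g2", "mi_m3"),  # two target models: not single copy
    ("cs_g3", "mi_m4"), ("cs_g4", "mi_m4"),  # two references share one model
    ("cs_g5", "mi_m5"), ("cs_g5", "mi_m5"),  # repeated hit to the same model
]
print(single_copy_genes(hits))  # {'cs_g1': 'mi_m1', 'cs_g5': 'mi_m5'}
```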

Phylogenetic analysis
Phylogenetic analyses of Mangifera species were undertaken using both the chloroplast genome sequences and the single-copy nuclear gene sequences, with the corresponding sequences of A. occidentale as the outgroup.
In addition to the 13 samples sequenced in this study, five species whose chloroplast genomes and nuclear gene sequences were generated from sequence data downloaded from NCBI were also included.

Chloroplast genome-based phylogenetic analysis
For the chloroplast genome-based phylogenetic analysis, sequences were imported into Geneious 2022.2.2 (www.geneious.com) and aligned by multiple sequence alignment using the MAFFT alignment tool (MAFFT v7.490) (Katoh and Standley 2013). Two methods were used for phylogenetic analysis: maximum likelihood (ML) and Bayesian inference (BI). jModelTest v2.1.4 (Darriba et al. 2012) was run on the Cyberinfrastructure for Phylogenetic Research (CIPRES) Science Gateway (http://www.phylo.org/) to select the best-fitting nucleotide substitution model (Table S4). ML analysis was performed in raxmlGUI 2.0 (v 2.0.10) (Stamatakis 2014) with 1000 bootstrap replicates, using the model selected under the Akaike information criterion (AIC). Bayesian analysis was carried out in Geneious using MrBayes v. 3.2 (Ronquist et al. 2012), using the model selected under the Bayesian information criterion (BIC) (Table S4). The iTOL v.6 tool (https://itol.embl.de/) (Letunic and Bork 2021) was used to visualize and edit the phylogenies. Node support was evaluated using posterior probability (PP) for the BI tree and bootstrap support (BS) for the ML tree, and the final trees from both approaches were compared to validate the tree topologies.
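Because the ML and Bayesian analyses use models chosen under different criteria, they can end up with different substitution models (for example, GTR+G and TPM1uf+G for the chloroplast alignment). A minimal sketch of both criteria: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where L is the maximized likelihood, k the number of free parameters and n the sample size. The log-likelihoods below are invented for illustration; k counts only free substitution-model parameters (every candidate has the same number of branch-length parameters, which shifts all scores equally and does not change the ranking), and n is taken as the number of alignment sites, one common choice.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion; lower is better."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion; penalizes parameters more as n grows."""
    return k * math.log(n) - 2 * log_likelihood


def best_model(models, criterion, n):
    """models: {name: (log_likelihood, k)}; returns the lowest-scoring name."""
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")

    def score(name):
        lnl, k = models[name]
        return aic(lnl, k) if criterion == "aic" else bic(lnl, k, n)

    return min(models, key=score)


# Hypothetical log-likelihoods; k = free substitution-model parameters only.
candidates = {"TPM1uf+G": (-1010.0, 6), "GTR+G": (-1002.0, 9)}
print(best_model(candidates, "aic", n=10_000))  # GTR+G
print(best_model(candidates, "bic", n=10_000))  # TPM1uf+G
```

With these toy numbers the better fit of GTR+G outweighs its three extra parameters under AIC, but not under BIC, whose per-parameter penalty (ln 10,000 ≈ 9.2) is much larger than AIC's penalty of 2.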

Nuclear gene-based phylogenetic analysis
For the nuclear gene sequences, phylogenetic trees were generated using two approaches, gene concatenation and a coalescent approach, to detect topological incongruence and to better understand evolutionary relationships among species.

Concatenation approach
All 47 genes were concatenated in the same order to obtain one long sequence per species. Sequences for all species were imported into Geneious 2022.2.2 (www.geneious.com) and aligned using MAFFT. Phylogenetic trees were constructed using the ML and BI methods after selecting the best-fitting nucleotide substitution model with jModelTest v2.1.4 (Darriba et al. 2012) (Table S4). ML analysis was performed in RAxML (version 8) (Stamatakis 2014) with 1000 bootstrap replicates, and Bayesian analysis was carried out in Geneious using MrBayes v. 3.2 (Ronquist et al. 2012) (Table S4). The iTOL v.6 tool (https://itol.embl.de/) (Letunic and Bork 2021) was used to visualize and edit the phylogenetic trees. Node support was evaluated using PP for the BI tree and BS for the ML tree, and the final trees from both approaches were compared to validate the tree topologies.
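The text concatenates the 47 gene sequences in a fixed order and then aligns the concatenated sequences. A common alternative, shown in this hypothetical sketch, aligns each gene first and then joins the per-gene alignments into a supermatrix, filling genes missing for a taxon with '?' and recording 1-based partition boundaries that could be used to assign per-gene substitution models. The function name and toy data are assumptions.

```python
def concatenate(genes, order, taxa, missing="?"):
    """Build a supermatrix from per-gene alignments.

    genes: {gene: {taxon: aligned_sequence}}; all sequences of a gene must
    have the same length. Taxa absent from a gene are filled with `missing`.
    Returns ({taxon: concatenated_sequence}, [(gene, start, end), ...]) with
    1-based inclusive partition coordinates.
    """
    parts = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene in order:  # a fixed gene order keeps columns comparable across taxa
        alignment = genes[gene]
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences differ in length (not aligned)")
        (length,) = lengths
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, missing * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    return {taxon: "".join(seqs) for taxon, seqs in parts.items()}, partitions


genes = {"g1": {"A": "ACG", "B": "ATG"}, "g2": {"A": "TT", "C": "TA"}}
matrix, parts = concatenate(genes, ["g1", "g2"], ["A", "B", "C"])
print(matrix)  # {'A': 'ACGTT', 'B': 'ATG??', 'C': '???TA'}
print(parts)  # [('g1', 1, 3), ('g2', 4, 5)]
```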

Coalescent approach
For the coalescent approach, individual gene trees were constructed by the ML method using RAxML (version 8) (Stamatakis 2014), searching for the best-scoring ML tree under a GTR+GAMMA model with 1000 bootstrap replicates. Branches with low support (BS < 10%) in the gene trees were collapsed to minimize the potential impact of gene tree error on species tree reconstruction. These gene trees were then used to construct a coalescent-based species tree using ASTRAL-III (Zhang et al. 2018).

Results

Chloroplast genome assembly and annotation
Illumina sequencing of the 13 samples belonging to 11 Mangifera species in this study yielded 60,699,616 to 181,601,786 raw reads with a mean read length of 150 bp. After trimming at the 0.01 quality limit (Phred score > 20), 59,763,897 to 171,303,402 paired-end reads remained. For all 13 samples, the trimmed reads corresponded to more than 20x coverage of the genome, so all were used for chloroplast assembly (Table S1). Raw reads downloaded from NCBI for five Mangifera species (M. odorata, M. sylvatica, M. persiciformis, M. hiemalis and M. indica cv. Tommy Atkins) totalled 99,649,506 to 127,708,722 reads, and 94,450,606 to 117,445,811 reads after trimming at the 0.01 quality limit. For all of these, mean coverage was higher than 20x, so they were included in the analysis (Table S2). A chloroplast genome and corresponding raw reads were also available for M. longipes (synonym: M. laurina), but its average coverage after quality trimming was below 20x, so it was not included in the analysis. For each species or genotype, the Get Organelle pipeline produced two chloroplast genome output files, because the SSC can occur in either orientation in plant chloroplast genomes.
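The 20x threshold can be checked with the usual estimate of mean depth: total sequenced bases divided by genome size. The sketch below applies it to the smallest and largest trimmed read counts reported above, assuming that every read is 150 bp (trimming shortens some reads, so this is an upper estimate), that read counts include both mates of each pair, and the 439 Mbp haploid genome size of M. indica cited in the Introduction. The function name is hypothetical.

```python
def fold_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


GENOME_SIZE = 439_000_000  # haploid M. indica estimate (Arumuganathan and Earle 1991)
READ_LENGTH = 150  # mean read length reported above; assumed for every read

# Smallest and largest trimmed read counts reported above.
for n_reads in (59_763_897, 171_303_402):
    print(f"{fold_coverage(n_reads, READ_LENGTH, GENOME_SIZE):.1f}x")
# 20.4x
# 58.5x
```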
To select the sequence with the widely accepted SSC orientation (5′-LSC-3′ : 5′-IR1-3′ : 5′-SSC-3′ : 3′-IR2-5′), the two chloroplast sequences produced by the Get Organelle pipeline for each species were aligned with the reference (M. indica; Accession: NC_035239.1) in Clone Manager Professional 9. The chloroplast genomes of the 13 wild Mangifera species and three cultivars of M. indica ranged from 151,752 bp to 158,965 bp, with the smallest and largest genomes recorded for M. caesia and M. laurina Lombok, respectively (Table 2). All 14 Mangifera species showed the typical quadripartite chloroplast genome structure, with LSC, SSC and IR regions ranging from 86,507 to 98,334 bp, 18,319 to 19,064 bp and 17,177 to 26,412 bp, respectively, and overall guanine-cytosine (GC) content ranging from 37.6 to 37.9%. The chloroplast genomes of all species had the same total number of genes (115), comprising 80 protein-coding genes, 31 tRNA genes and 4 rRNA genes (Table 2). Although chloroplast genome size varied across Mangifera species, the three M. indica cultivars had identical chloroplast genomes. The two M. odorata accessions differed slightly in size: the accession sequenced here had a genome size of 158,889 bp, whereas the sample assembled from NCBI data (M. odorata*) had a genome size of 158,883 bp, a 6 bp difference. The length difference was due to two deletions in M. odorata*, one in a non-coding region of the LSC and the other in intron 1 of the petD gene in the LSC. The chloroplast genome of M. indica cv. Kensington Pride (Fig. 1) is shown as representative of the 14 Mangifera species, which have the same number of genes despite differences in total chloroplast genome size and in the sizes of the LSC, IR1, IR2 and SSC regions.
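Because the SSC lies between the two inverted repeats, the two Get Organelle outputs differ only in SSC orientation, and one can be converted into the other by reverse-complementing the SSC. A minimal sketch, assuming a linearized genome laid out as LSC, IRb, SSC, IRa with known region lengths; the function names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (ACGTN only)."""
    return seq.translate(COMPLEMENT)[::-1]


def flip_ssc(genome: str, lsc_len: int, ir_len: int, ssc_len: int) -> str:
    """Return the isomer with the SSC reverse-complemented.

    Assumes the layout LSC, IRb, SSC, IRa, with IRa the reverse complement
    of IRb. Flipping the SSC between the two inverted repeats yields the
    other flip-flop isomer; flipping twice returns the original sequence.
    """
    lsc = genome[:lsc_len]
    irb = genome[lsc_len:lsc_len + ir_len]
    ssc = genome[lsc_len + ir_len:lsc_len + ir_len + ssc_len]
    ira = genome[lsc_len + ir_len + ssc_len:]
    if len(ira) != ir_len:
        raise ValueError("region lengths do not add up to the genome length")
    if ira != reverse_complement(irb):
        raise ValueError("IRa is not the reverse complement of IRb")
    return lsc + irb + reverse_complement(ssc) + ira


# Toy genome: LSC=AAAA, IRb=GGC, SSC=ATTC, IRa=GCC (reverse complement of GGC).
toy = "AAAAGGCATTCGCC"
print(flip_ssc(toy, lsc_len=4, ir_len=3, ssc_len=4))  # AAAAGGCGAATGCC
```

Choosing between the two isomers then reduces to checking which one matches the reference orientation, which is what the alignment against NC_035239.1 accomplishes.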

Chloroplast phylogeny and identification of SNPs and INDELs
Multiple alignment of the chloroplast sequences, with A. occidentale as the outgroup, followed by phylogenetic tree construction produced ML and Bayesian trees with the same topology. BS and PP values for the final tree are presented in Fig. 2. The nucleotide substitution model was GTR+G for the ML analysis and TPM1uf+G for the Bayesian analysis. The ML tree showed a BS of 100 at most nodes, and the Bayesian tree showed a PP of 1 at all nodes. In the whole plastome tree, three main clades were
* Species for which chloroplast genomes were assembled using raw data downloaded from NCBI
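Bootstrap support for a node is the percentage of bootstrap replicate trees that contain the same clade. The sketch below computes this for toy rooted replicate trees, each represented as a set of clades, and keeps clades above 50% as in a majority-rule consensus. The data and function names are hypothetical; in practice RAxML computes these values from its own replicate trees.

```python
from collections import Counter


def clade_support(replicates):
    """Percentage of bootstrap replicate trees containing each clade.

    replicates: list of sets, one per replicate tree, each holding the
    clades (frozensets of taxon names) of that rooted tree.
    """
    counts = Counter(clade for clades in replicates for clade in clades)
    return {clade: 100.0 * n / len(replicates) for clade, n in counts.items()}


def majority_rule_clades(replicates, cutoff=50.0):
    """Clades found in more than `cutoff` percent of replicates."""
    return {c: s for c, s in clade_support(replicates).items() if s > cutoff}


def clade(*taxa):
    return frozenset(taxa)


# Four toy replicate trees over taxa A-D (hypothetical, not study data).
replicates = [
    {clade("A", "B"), clade("A", "B", "C")},
    {clade("A", "B"), clade("A", "B", "C")},
    {clade("A", "B"), clade("A", "B", "D")},
    {clade("A", "C"), clade("A", "B", "C")},
]
support = clade_support(replicates)
print(support[clade("A", "B")])  # 75.0
print(sorted("".join(sorted(c)) for c in majority_rule_clades(replicates)))  # ['AB', 'ABC']
```

Counting clade frequencies in the same way over the post-burn-in trees sampled by MCMC gives the clade posterior probabilities reported by Bayesian analyses.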

Concatenation-based nuclear phylogeny
The concatenation-based nuclear phylogeny was constructed using the same approach as the chloroplast phylogeny, with A. occidentale as the outgroup. A total of 47 common single-copy nuclear genes out of 107 (Li et al. 2017) were identified and selected for Mangifera species. The multiple sequence alignment was 71,881 bp long, and the ML and Bayesian trees had the same topology. The final tree with BS and PP values is presented in Fig. 3a. Although some nodes had lower BS values, all nodes were supported by high PP values. The nucleotide substitution model was GTR+I+G for the ML analysis and TPM1+I+G for the Bayesian analysis.
hiemalis than to M. persiciformis.
Furthermore, both the chloroplast and the concatenation-based nuclear phylogenies showed that M. caesia is evolutionarily distant from the rest of the Mangifera species (Fig. 2 and 3a). The grouping of species in both the chloroplast genome and the nuclear gene-based analyses does not fully agree with the accepted classification of the genus Mangifera described by Kostermans and Bompard (1993). Topological incongruence was also observed between the phylogenies based on the whole plastome and on the nuclear genes.

Coalescence-based nuclear phylogeny
Previously proposed hybrids were included in our dataset, and comparison of the chloroplast and concatenation-based nuclear phylogenies also suggested possible hybridization events for some species. Therefore, to further analyse phylogenetic relationships among Mangifera species based on nuclear genes, a coalescent approach was used, in which individual nuclear gene trees were combined to infer a species tree.
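As described for the coalescent approach, branches with BS < 10% were collapsed in each gene tree before the species tree was built. A minimal sketch of that step on a hand-built tree, assuming supports are stored on internal nodes (RAxML can write them as Newick node labels) and leaving Newick parsing out; the Node class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: Optional[str] = None  # leaf label; None for internal nodes
    support: Optional[float] = None  # bootstrap support of the branch above this node
    children: list["Node"] = field(default_factory=list)


def collapse(node: Node, threshold: float = 10.0) -> Node:
    """Return a copy of the tree with internal branches below threshold collapsed."""
    if not node.children:
        return Node(name=node.name)
    new_children = []
    for child in node.children:
        c = collapse(child, threshold)
        if c.children and c.support is not None and c.support < threshold:
            new_children.extend(c.children)  # splice grandchildren into this node
        else:
            new_children.append(c)
    return Node(name=node.name, support=node.support, children=new_children)


def to_newick(node: Node) -> str:
    """Serialize to Newick, writing supports as internal node labels."""
    def fmt(n: Node) -> str:
        if not n.children:
            return n.name or ""
        inner = ",".join(fmt(c) for c in n.children)
        label = f"{n.support:g}" if n.support is not None else ""
        return f"({inner}){label}"

    return fmt(node) + ";"


def leaf(name: str) -> Node:
    return Node(name=name)


tree = Node(children=[
    Node(support=95.0, children=[leaf("A"), leaf("B")]),
    Node(support=5.0, children=[leaf("C"), leaf("D")]),
])
print(to_newick(collapse(tree)))  # ((A,B)95,C,D);
```

Collapsing turns a poorly supported resolution into a polytomy, so the species-tree method treats that part of the gene tree as uninformative rather than as a confident, possibly wrong, split.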
Individual gene trees were also examined to identify closely related species.
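Counts such as the number of gene trees in which two species are sisters (used in the Discussion for M. odorata and M. sylvatica) can be obtained by listing the clades of each rooted gene tree. A minimal sketch with toy trees written as nested tuples; the data and function names are hypothetical, and a pair inside a collapsed polytomy is not counted as sisters.

```python
def clades(tree):
    """Leaf sets of all internal nodes of a rooted tree.

    Trees are nested tuples whose leaves are taxon-name strings,
    e.g. (("A", "B"), ("C", "D")).
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def count_sister_trees(gene_trees, a, b):
    """Number of gene trees in which taxa a and b form a two-taxon clade."""
    pair = frozenset([a, b])
    return sum(1 for tree in gene_trees if pair in clades(tree))


# Toy rooted gene trees (hypothetical, not study data).
gene_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    ((("A", "B"), "C"), "D"),
]
print(count_sister_trees(gene_trees, "A", "B"))  # 2
```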
In the coalescence-based species tree, local posterior probability (LPP) support values are shown on the branches (/1). In both the concatenation- and coalescence-based nuclear phylogenies, species belonging to subgenus Mangifera and species placed in an uncertain position in the classification clustered in one clade with high support (BS = 99, PP = 1, LPP = 0.93) (Fig. 3). Within this clade, the pattern of clustering into sub-clades differed between the two nuclear phylogenies for some species, and some sub-clades received low BS and LPP support.

Discussion
Determining phylogenetic relationships among crop species provides basic information for inferring their evolutionary history, refining taxonomic classification, and evaluating their diversity and importance in plant breeding (Zhang et al. 2012). Although genetic analysis of plants has improved rapidly with advanced sequencing technology, many phylogenetic studies in the genus Mangifera have relied on molecular markers such as amplified fragment length polymorphisms (AFLP), random amplified polymorphic DNA (RAPD) and simple sequence repeats (SSR), and on sequencing a limited number of targeted regions in the chloroplast genome (maturase K, trnL-F spacer regions) and nuclear ribosomal DNA (internal transcribed spacer/ITS region) (Eiadthong et al. 1999; Fitmawati et al. 2017; Fitmawati 2016; Hartana 2010; Hidayat et al. 2011; Schnell and Knight Jr 1992; Yonemori et al. 2002). This is the first comparative analysis of phylogenetic relationships within the genus using both whole chloroplast genomes and multiple single-copy nuclear genes.
In this study, chloroplast genomes were assembled for the first time for seven species: M. pajang, M. altissima, M. caesia, M. lalijiwa, M. zeylanica, M. applanata and M. casturi. Various pipelines and programs are available to assemble organelle genomes, including SPAdes (Bankevich et al. 2012), SOAPdenovo2 (Luo et al. 2012), ORG.Asm (Coissac et al. 2016), IOGA (Bakker et al. 2016), Fast-Plast (Afinit 2017), Organelle_PBA (Soorni et al. 2017), NOVOPlasty (Dierckxsens et al. 2017), chloroExtractor (Ankenbrand et al. 2018), CAP (Moner et al. 2018) and the Get Organelle toolkit (Jin et al. 2020). Here, we used CAP and the Get Organelle pipeline to assemble plastomes independently for each species. Combining the two approaches in CAP (reference-guided mapping and de novo assembly) removes many errors that either approach produces on its own, giving a highly accurate final chloroplast genome. The Get Organelle pipeline, a fast and versatile toolkit for de novo assembly of organelle genomes, can also generate all possible arrangements of the chloroplast genome arising from flip-flop configurations or other repeat-mediated isomers (Jin et al. 2020). Comparing the chloroplast genomes generated by CAP and the Get Organelle pipeline therefore validated highly accurate final chloroplast genomes for all species. Annotation with GeSeq identified a total of 115 genes, including 80 protein-coding genes, 31 tRNA genes and four rRNA genes, in all species used in this study. This is more than reported in previous studies: Zhang et al. (2020) reported a total of 112 genes (78 protein-coding genes, 30 tRNA genes, 4 rRNA genes) for M. sylvatica, while Niu et al. (2021) reported 113 genes (79 protein-coding genes, 30 tRNA genes, 4 rRNA genes) for M. sylvatica, M. odorata, M. longipes, M. persiciformis, M. hiemalis and M. indica.
Phylogenetic relationships within the genus Mangifera showed topological incongruence between the whole chloroplast and nuclear gene trees for some species, which may be caused by introgressive hybridization, allopolyploidy or incomplete lineage sorting. Reproductive compatibility between species allows the native cytoplasm of one species to be replaced by that of another through hybridization, which has been detected in both animals (mitochondrial capture) (Liu et al. 2016; Rebbeck et al. 2011) and plants (chloroplast capture) (Rieseberg 1995; Rieseberg and Soltis 1991). In plants, chloroplast capture has been reported in many lineages, including Maleae (Rosaceae) (Liu et al. 2020), Nothofagaceae (Acosta and Premoli 2010), Scrophulariaceae (Wolfe and Elisens 1995), Apiaceae (Yi et al. 2015), Poaceae (Ananda et al. 2021; Moner et al. 2020), Rubiaceae (Charr et al. 2020; Guyeux et al. 2019) and Myrtaceae (Healey et al. 2018). Hybridization followed by recurrent backcrossing has explained discrepancies between chloroplast and nuclear gene-based phylogenies in diverse plant families (Liu et al. 2017; Smith and Sytsma 1990; Stegemann et al. 2012; Tsitrone et al. 2003). In mango, interspecific reproductive compatibility has been reported between M. indica and M. laurina: a cross between these species produced 60 successful hybrids (Bally et al. 2010). Hybrid origins have been reported for M. odorata (a cross-hybrid between M. indica and M. foetida) (Teo et al. 2002) and M. casturi (Matra et al. 2021).
A close genetic relationship between M. applanata and M. altissima has been reported in a phylogenetic analysis. Considering the domestication of M. indica, although a single domestication event has been reported based on historical records (Mukherjee 1972; Singh et al. 2016), two independent domestication events, in India and Indochina, have also been proposed (Bompard 2009). Based on a population genomics study, Warschefsky and von Wettberg (2019) suggested that mango domestication was a complex process that may have involved multiple domestication events and interspecific hybridization, two phenomena common in perennial fruit crop domestication. Their results also indicated high genetic diversity among M. indica cultivars distributed outside the region where mango originated, and unique genetic diversity in Southeast Asian cultivars compared with other populations. Warschefsky and von Wettberg (2019) suggest that the origin and initial cultivation of mango may have taken place in Southeast Asia, with further improvement and domestication occurring in India. In addition, cross-hybridization between wild relatives and M. indica was highly likely in the early stages of domestication, given the large number of species present and the evidence for crossbreeding. Thus, in addition to descent from a common ancestor, cross-hybridization between M. indica and the four wild relatives may have contributed to the close evolutionary relationships observed in our study. However, the lack of multiple replicates per species is a limitation of this study; including them would provide further support for these close relationships.
M. zeylanica is a species endemic to Sri Lanka that is found only in forests of the wet and intermediate which it showed more affinity to M. foetida than to M. indica (Teo et al. 2002; Yonemori et al. 2002).
Furthermore, a closer genetic relationship of M. foetida to M. pajang than to M. indica has been suggested (Schnell and Knight Jr 1992). The distinct chloroplast genome of M. odorata and its distant relationship to M. indica in the chloroplast tree suggest that M. odorata may have captured its chloroplast genome from M. foetida, with an ancestor of M. indica contributing as the male progenitor. In both the concatenation and coalescence approaches for nuclear genes, M. odorata showed a relatively distant evolutionary relationship with M. indica.
Because only four individual gene trees clustered the two species, and then only with weak support, M. odorata is unlikely to be a first-generation hybrid. Although a small M. indica genetic background has been suggested for M. odorata, the other proposed parent (M. foetida) was not included in our dataset, so the hybrid status and parentage of M. odorata remain inconclusive.
Another discrepancy between the chloroplast and nuclear trees concerns the position of M. sylvatica.
Previous studies have revealed a close evolutionary relationship between M. indica and M. sylvatica based on restriction fragment length polymorphism (RFLP) (Eiadthong et al. 1999) and ITS (Yonemori et al. 2002) marker analyses, and on whole chloroplast genome analysis (Niu et al. 2021). In our study, although M. sylvatica showed a close genetic relationship with M. indica in the chloroplast phylogeny, it nested with M. hiemalis in the nuclear phylogeny. M. hiemalis is endemic to China, and M. sylvatica is also cultivated in China (Wang et al. 2020; Baul et al. 2016), so the two species share a geographical distribution. Because M. sylvatica clustered with M. hiemalis as sister taxa in 12 individual gene trees, M. sylvatica might have a hybrid origin, with the hybridization having occurred long ago; however, the low BS values in these gene trees do not strongly support this hypothesis.
The topological incongruence between the chloroplast genome and single-copy nuclear gene phylogenies indicates a potential for interspecific hybridization in the genus. However, the low BS values and weak resolution of the gene trees in the coalescence approach, together with the low BS/PP/LPP support for some branches of the concatenation-based nuclear phylogeny and the species tree, suggest that the nuclear genes may not vary enough across the species studied to distinguish them well. The low variability of the nuclear genes and the absence of one proposed parent for each putative hybrid prevented firm conclusions about possible hybridization events in the genus, including the proposed hybrid origins of M. odorata and M. casturi. Nevertheless, if both parents of a hybrid were present in the dataset, the phylogenies would be expected to show its close relationship to them, provided the hybrid were of recent generation.
Therefore, the results of this study suggest that the species in this group are so closely related that large amounts of data are needed to obtain a well-supported concatenated or consensus tree. The evolutionary history of the species and of hybridization in the genus is complex, and a better understanding requires sampling more species. It is possible that, of the 69 species recognized in the genus, some or many have received input from domestication or have cross-hybridized with other wild relatives.
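The contrast above between concatenated and consensus (coalescence) trees depends on how the 47 single-copy gene alignments are combined. In a concatenation analysis they are joined end to end into one supermatrix before ML or BI inference, whereas a coalescence analysis (e.g. ASTRAL) summarizes gene trees inferred separately. The Python sketch below illustrates only the supermatrix step; the function name, the gap padding of taxa missing from a gene and the 1-based partition ranges are illustrative assumptions, not the pipeline used in this study.

```python
def build_supermatrix(gene_alignments):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: dict mapping gene name -> {taxon: aligned sequence}.
    Returns (supermatrix, partitions): supermatrix maps taxon -> concatenated
    sequence; partitions lists (gene, start, end) as 1-based inclusive columns.
    """
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences differ in length; is it aligned?")
        length = lengths.pop()
        for taxon in taxa:
            # A taxon absent from this gene is padded with gaps, which ML/BI
            # programs typically treat as missing data.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


if __name__ == "__main__":
    genes = {  # toy alignments, not data from this study
        "gene1": {"M_indica": "ACGT", "M_caesia": "ACGA"},
        "gene2": {"M_indica": "GG", "M_zeylanica": "GC"},
    }
    matrix, parts = build_supermatrix(genes)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print(parts)
```

Keeping the per-gene column ranges allows a partitioned model to be fitted to each gene, while the coalescence approach instead starts from one tree per gene.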

Conclusions
Our analysis of evolutionary relationships within the genus Mangifera, based on whole chloroplast genomes and 47 single-copy nuclear genes, revealed close genetic relationships among species and discrepancies between plastome-based and nuclear gene-based phylogenies. We suggest that five species, M. indica, M. altissima, M. applanata, M. caloneura and M. lalijiwa, are very closely related and might have descended from the same common ancestor. The previously proposed hybrid origins of M. odorata and M. casturi were difficult to confirm, because one of the proposed parents was absent from our dataset and the available proposed parent clustered with them in only a few gene trees. A relatively high number of gene trees showed close evolutionary relationships between M. zeylanica and M. indica, and between M. sylvatica and M. hiemalis. However, the evidence did not strongly support possible hybridization, owing to weak BS, PP and LPP support in the phylogenies. Moreover, geographical proximity might have facilitated possible hybridization events. Despite the limited number of species used in the study, evolution and hybridization in the genus Mangifera appear to be complex. This is the first comparative analysis of evolutionary relationships within the genus using whole chloroplast genomes and multiple nuclear genes.
These findings improve understanding of the nature of hybridization within the genus between wild and domesticated mango, revealing potential domestication input into some species. Confirming hybridity and refining evolutionary relationships within the genus will require adding more species, including potential parents, and sampling species from different geographical locations.

Authors' contributions
Robert J. Henry: Conceptualization and design, Methodology, Supervision, Validation, Project administration,

Fig. 1 Genome map of chloroplasts in the genus Mangifera. The genome size of the 14 Mangifera species ranges from 151,752 bp (M. caesia) to 158,965 bp (M. laurina Lombok). In the outermost circle, the thick black line indicates the inverted repeat regions (IR), whereas the thin lines indicate the large single-copy (LSC) and small single-copy (SSC) regions. Genes inside the circle are transcribed clockwise, whereas genes outside the circle are transcribed counterclockwise. Genes are coloured according to their functions. In the inner circle, darker grey corresponds to GC content and lighter grey to AT content.
In the whole chloroplast genome phylogeny, the 14 Mangifera species first separated into two distinct clades: only M. caesia, belonging to section Deciduae in subgenus Limus, was placed in the first clade (clade A). The other 13 species grouped into a separate clade, indicating that they are evolutionarily distinct from M. caesia, and were further divided into two subclades (clades B and C). Clade B included six species from different groups in the classification: M. pajang and M. odorata belong to subgenus Limus, M. casturi and M. laurina belong to subgenus Mangifera, and M. persiciformis and M. hiemalis are placed in an uncertain position in the classification. Within clade B, the species of subgenus Mangifera (clade BI), of subgenus Limus (clade BII) and of uncertain position (clade BIII) formed well-supported distinct clades (BS=100, PP=1). The subgenus Mangifera and Limus clades were sister to each other, and together they were sister to the clade of species of uncertain position. Clade C contained only species of subgenus Mangifera and was further divided into three subclades (clades CI, CII and CIII). Interestingly, four wild species (M. lalijiwa, M. applanata, M. altissima and M. caloneura) clustered with three cultivars of domesticated mango (M. indica) in clade CI. Although species of sections Mangifera and Euantherae are characterised by one and multiple fertile stamens, respectively, M. caloneura (section Euantherae) clustered with species of section Mangifera. Furthermore, M. sylvatica and M. zeylanica, two species within clade C, formed separate clades, CII and CIII, respectively. M. sylvatica is sister to the domesticated clade (clade CI), and M. zeylanica (clade CIII) is sister to the clade comprising CI and CII. Thus, the whole chloroplast genome phylogeny clustered species from different taxonomic groups, indicating close genetic and evolutionary relationships among their chloroplast genomes (Fig. 2).

Fig. 2 Phylogenetic tree of Mangifera species based on whole chloroplast genomes. The tree includes 17 accessions of 14 species, with A. occidentale as the outgroup, and was generated using maximum likelihood (ML) and Bayesian inference (BI). Numbers on branches are ML bootstrap values (/100) and BI posterior probabilities (/1). Dark blue: subgenus Mangifera, section Mangifera; light blue: subgenus Mangifera, section Euantherae; red: subgenus Limus, section Perrennis; yellow: subgenus Limus, section Deciduae. * Mangifera species for which chloroplast genomes were assembled using raw data downloaded from NCBI.
In the concatenation-based nuclear phylogeny, the eight species of subgenus Mangifera other than M. sylvatica clustered into one main distinct clade. Within this main clade, M. lalijiwa, M. applanata and M. caloneura clustered with M. casturi in one clade, and M. altissima, M. zeylanica and the M. indica cultivars in another. M. applanata and M. lalijiwa were sister taxa. Furthermore, two M. indica cultivars (Kensington Pride and Tommy Atkins) showed a closer genetic relationship to M. zeylanica and M. altissima, revealing a close evolutionary relationship of these two wild species to domesticated mango. Moreover, although M. sylvatica was closely related to species in the domesticated clade in the chloroplast phylogeny, in the nuclear phylogeny it clustered with the two species placed in an uncertain position in the classification, where it was more closely related to M.

Fig. 3 Phylogenetic trees of Mangifera species based on a selected set of nuclear genes, using a) concatenation and b) coalescence-based methods. The trees include 17 accessions, with A. occidentale as the outgroup. Concatenation-based trees were generated using maximum likelihood (ML) and Bayesian inference (BI), and the consensus tree is shown. Numbers on branches are ML bootstrap values (/100) and BI posterior probabilities (/1). In the coalescence-based tree (ASTRAL tree), numbers on branches are local posterior probabilities (/1). Dark blue: subgenus Mangifera, section Mangifera; light blue: subgenus Mangifera, section Euantherae; red: subgenus Limus, section Perrennis; yellow: subgenus Limus, section Deciduae. * Species for which nuclear genes were extracted using raw data downloaded from NCBI. ** M. indica cultivar whose gene models were downloaded from NCBI and used to create a local database in CLC-GWB for the selection of single-copy nuclear genes in M. indica.
In a phylogenetic analysis based on a single chloroplast gene (Maturase K), M. indica and M. caloneura nested in two distinct clades (Hidayat et al. 2011). In our study, based on whole chloroplast genomes, M. lalijiwa, M. applanata, M. altissima and M. caloneura clustered with M. indica, sharing 99.9% sequence similarity. These four wild relatives also clustered with domesticated mango in a distinct clade in the concatenation-based nuclear phylogeny, showing their close evolutionary relationship to M. indica, whereas in the coalescence approach only M. lalijiwa, of these four species, clustered separately. Furthermore, two of the M. indica cultivars (Kensington Pride and Tommy Atkins) were more closely related to M. altissima than to M. indica cv. Alphonso, so the nuclear gene sequences failed to resolve M. indica from M. altissima. A close evolutionary relationship between M. altissima and M. indica was also observed in the coalescence approach. Therefore, given the remarkably close relationships among these species in both the chloroplast and nuclear phylogenies, we suggest that these four wild relatives and cultivated mango are very closely related and might have descended from the same common ancestor.

Table 2: Details of genome annotation results for Mangifera species
Note. GC, guanine or cytosine; IR, inverted repeats; LSC, large single copy; SSC, small single copy. * Mangifera species for which chloroplast genomes were assembled using raw data downloaded from NCBI.

Table 3: Details of INDELs, SNPs and substitutions identified with respect to the clustering pattern in the chloroplast phylogeny.
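The 99.9% chloroplast sequence similarity and the INDEL and SNP counts summarized in Table 3 are pairwise comparisons of aligned plastomes. The Python sketch below shows one common way to derive such figures from two sequences taken from the same multiple alignment: identity is computed over columns where both sequences have a base, each mismatching column is counted as a SNP, and each contiguous run of gaps is counted as one INDEL event. These conventions and the function name are assumptions for illustration; the sketch does not reproduce the study's distinction between SNPs and substitutions, which is not specified here.

```python
def compare_aligned(seq_a, seq_b):
    """Pairwise identity, SNP and INDEL counts for two aligned sequences ('-' = gap)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = snps = indels = 0
    in_gap_a = in_gap_b = False
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        gap_a, gap_b = x == "-", y == "-"
        if gap_a and gap_b:
            continue  # column is empty in both sequences: ignore it
        # The start of a new gap run in either sequence counts as one INDEL event.
        if gap_a and not in_gap_a:
            indels += 1
        if gap_b and not in_gap_b:
            indels += 1
        in_gap_a, in_gap_b = gap_a, gap_b
        if not (gap_a or gap_b):
            compared += 1
            if x == y:
                matches += 1
            else:
                # Bases are compared literally, so an ambiguity code such as N
                # against A counts as a mismatch in this sketch.
                snps += 1
    identity = 100.0 * matches / compared if compared else float("nan")
    return {"identity_pct": identity, "snps": snps, "indels": indels, "compared": compared}


if __name__ == "__main__":
    # Toy sequences, not chloroplast data from this study.
    print(compare_aligned("ATGC--ATTA", "ATGCGGATCA"))
```

For plastomes of the sizes reported here (151,752 to 158,965 bp), 99.9% identity corresponds to roughly 150 to 160 mismatching aligned positions.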
In both the concatenation and coalescence-based phylogenies, M. casturi and M. lalijiwa clustered together as sister taxa (BS=100, PP=1, LPP=0.92), and M. hiemalis, M. sylvatica and M. persiciformis clustered into one subclade (BS=100, PP=1, LPP=0.96). However, in the coalescence-based tree, M. hiemalis, M. sylvatica and M. persiciformis were more closely related to six other species of subgenus Mangifera (M. altissima, M. applanata, M. indica, M. caloneura, M. zeylanica and M. laurina) than to M. casturi and M. lalijiwa, whereas these eight species of subgenus Mangifera clustered together in one clade in the concatenation-based tree. M. odorata is a proposed hybrid between M. indica and M. foetida, and M. casturi is a proposed hybrid between M. indica and M. quadrifida. Only one parent of each of these hybrids (M. indica) is available in our dataset. Although hybridity cannot be validated without the other parent, we examined the individual gene trees for support by counting the gene trees in which each putative hybrid clustered with the available parent (M. indica). Of the 47 gene trees, M. odorata clustered with M. indica as sister taxa in only four, none of which had high BS support (Table 4, Figure S1). Similarly, M. casturi clustered with M. indica as sister taxa in only four gene trees, with weak BS support for this clade in three of them (Table 4, Figure S1). In the nuclear phylogenies, we observed close evolutionary relationships between M. zeylanica and M. indica, and between M. hiemalis and M. sylvatica. We therefore hypothesized that M. zeylanica might have undergone domestication and that M. sylvatica may have cross-hybridised with M. hiemalis during the evolution of these species. The individual gene trees showed that M. zeylanica clustered with M. indica as sister taxa in 14 gene trees, and M. sylvatica clustered with M. hiemalis as sister taxa in 12 gene trees. Some of these gene trees showed low BS support for the clustering of M. zeylanica with M. indica and of M. sylvatica with M. hiemalis (Table 4, Figure
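The gene-tree tally described above (for example, four of 47 gene trees grouping M. odorata with M. indica) amounts to checking each gene tree for a node whose two child clades contain only the putative hybrid and only the candidate parent. The Python sketch below does this for rooted Newick trees, assuming they have been rooted on the outgroup. The tip naming scheme (`Genus_species_accession`), the function names and the toy trees are hypothetical; the minimal parser skips branch lengths and support values and assumes labels without spaces or quotes, so a dedicated phylogenetics library would be preferable for real gene trees.

```python
def parse_newick(text):
    """Parse a rooted Newick string into nested lists of tip labels.

    Branch lengths and internal labels (e.g. bootstrap values) are skipped.
    Assumes tip labels contain no whitespace, quotes, colons, commas or parentheses.
    """
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def skip_annotation():
        # Skip an internal label and/or ':branch_length' up to the next delimiter.
        nonlocal pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1

    def parse_node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1
            children = [parse_node()]
            while s[pos] == ",":
                pos += 1
                children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            skip_annotation()
            return children
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]
        skip_annotation()
        return name

    return parse_node()


def tips(node):
    """All tip labels below a node."""
    if isinstance(node, str):
        return [node]
    return [tip for child in node for tip in tips(child)]


def species_of(tip):
    # Hypothetical naming scheme: 'M_indica_Alphonso' -> 'M_indica'.
    return "_".join(tip.split("_")[:2])


def are_sisters(node, species_a, species_b):
    """True if some bifurcation splits into a clade of only species_a tips
    and a clade of only species_b tips."""
    if isinstance(node, str):
        return False
    if len(node) == 2:
        groups = {frozenset(species_of(t) for t in tips(child)) for child in node}
        if groups == {frozenset({species_a}), frozenset({species_b})}:
            return True
    return any(are_sisters(child, species_a, species_b) for child in node)


def count_sister_trees(newick_trees, species_a, species_b):
    """Number of gene trees in which species_a and species_b are sister taxa."""
    return sum(are_sisters(parse_newick(t), species_a, species_b) for t in newick_trees)


if __name__ == "__main__":
    gene_trees = [  # toy trees, not results from this study
        "((M_odorata:0.01,(M_indica_Alphonso:0.001,M_indica_KP:0.001)90:0.01)45:0.02,"
        "M_caesia:0.1,A_occidentale:0.3);",
        "((M_odorata,M_pajang)70,(M_indica_Alphonso,M_caesia)55,A_occidentale);",
    ]
    print(count_sister_trees(gene_trees, "M_odorata", "M_indica"))  # prints 1
```

Multiple accessions of one species (such as the three M. indica cultivars) are collapsed by `species_of`, so a hybrid is counted as sister to M. indica only when it is sister to a clade made up entirely of M. indica tips. The sketch ignores support values; keeping the internal labels instead of skipping them would allow a BS threshold, reflecting the weak support noted above.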

Trees in which M. odorata clusters with M. indica in the same clade as sister taxa
Trees in which M. casturi clusters with M. indica in the same clade as sister taxa
Trees in which M. zeylanica clusters with M. indica in the same clade as sister taxa
Trees in which M. sylvatica clusters with M. hiemalis in the same clade as sister taxa
In the concatenation-based nuclear phylogeny, M. zeylanica showed a close evolutionary relationship with M. indica, despite having a distinct chloroplast genome that differs from that of M. indica by 44 INDELs, 122 SNPs and nine substitutions. We therefore hypothesized that cross-hybridization might have occurred between an early lineage of M. zeylanica and M. indica or a close wild relative. Because the chloroplast genome of M. zeylanica is distinct, we assumed that M. indica most likely acted as the pollen donor (paternal parent) in such a cross, producing hybrids that carry the chloroplast genome of M. zeylanica and nuclear genes of both M. zeylanica and M. indica (or its close relative). The coalescence-based nuclear phylogeny (species tree) also showed a close relationship between M. indica and M. zeylanica, and the clustering of M. zeylanica with M. indica in 14 individual gene trees suggests that M. zeylanica might have a hybrid origin. However, as the BS/PP and LPP values for this clade are relatively low in the individual gene trees as well as in both consensus trees, it is also possible that the gene set is not variable enough to resolve the phylogeny well. It is therefore difficult to draw a conclusion about a hybrid origin for M. zeylanica.
Of the species that nested together with M. odorata, M. persiciformis and M. hiemalis in the chloroplast phylogeny, M. casturi is a cultivated species in Indonesia. This endemic species is found only in cultivation (Rhodes and Maxted 2016) and was proposed to be a natural hybrid between M. indica and M. quadrifida on the basis of a SNP analysis (Warschefsky 2018). Because M. casturi showed higher affinity to M. indica than to M. quadrifida, rather than being a direct intermediate between the two species, it was further suggested that M. casturi most likely resulted from an F1 hybrid backcrossed with M. indica (Warschefsky 2018; Warschefsky and von Wettberg 2023). Microsatellite marker analysis showed broad genetic variation among four M. casturi accessions (Kasturi, Cuban, Pinari and Pelipisan), and DNA barcoding-based phylogenetic analysis suggested several species as ancestors of M. casturi (Matra et al. 2021). Genetic variation has also been confirmed among 16 accessions of M. casturi using SNP markers (N. Dillon, pers. comm.). Together, the microsatellite and DNA barcoding data support the view that M. quadrifida and M. indica hybridised to give rise to M. casturi, and that the F1 hybrids may have further hybridised with ancestors of the parental species or with multiple other Mangifera species, generating hybrid cultivars with high genetic diversity (Matra et al. 2021). In our study, M. casturi showed a close genetic relationship with species in the domesticated clade of the concatenation-based nuclear phylogeny despite having a distinct chloroplast genome. In contrast, in the coalescence approach, M. casturi showed a relatively distant evolutionary relationship with M. indica in both the species tree and the individual gene trees. Therefore, the coalescence-based nuclear phylogenies do not strongly support M. indica as a parent of M. casturi. Because only a few gene trees group M. casturi with M. indica, M. casturi is unlikely to be a first-generation hybrid even if M. indica is one of its parents. In addition, the absence of data for the other proposed parent (M. quadrifida) and for other wild relatives prevented us from analysing the contribution of M. quadrifida or any other species to the hybrid origin of M. casturi.
M. laurina is a cultivated species in Indonesia, with a wild distribution ranging from Myanmar, Cambodia, Vietnam, Thailand and Malesia to New Guinea. Analysis of the ITS genomic region (Yonemori et al. 2002) revealed a close evolutionary relationship between M. laurina and M. indica, and analysis of the chloroplast Maturase K region differentiated M. laurina specimens collected in Indonesia and Thailand. Since interspecific hybridization has been suggested to be common for this species (Hartana 2010), M. laurina may have cross-hybridized with other species after its introduction to the regions where it is widely cultivated. Given the relatively close evolutionary relationship between M. laurina and M. indica in the nuclear gene analysis despite their distinct chloroplast genomes, hybridization may have occurred between an early lineage of M. laurina and M. indica. The current data support only a close evolutionary relationship between the two species; further analysis with multiple samples of both species, and with more species of the genus, is needed.
Of the remaining four species (M. pajang, M. odorata, M. persiciformis and M. hiemalis) clustered within the same main clade in the chloroplast phylogeny, M. pajang is an endemic species originating from and cultivated in Borneo, Indonesia. M. odorata, cultivated in Malaysia, is a proposed hybrid between M. indica and M. foetida (Hartana 2010; Warschefsky and von Wettberg 2023).