Whole-genome SNP analysis elucidates the genetic population structure and diversity of Acrocomia species

Acrocomia (Arecaceae) is a genus widely distributed in tropical and subtropical America that has attracted economic interest due to the great oil production potential of some of its species, in particular A. aculeata, which can supply oil with the same productive capacity as the oil palm even in areas with water deficit. Although eight species are recognized in the genus, the taxonomic classification based on morphology and geographic distribution is still controversial. Knowledge about the genetic diversity and population structure of the species is limited, which has hindered the understanding of genetic relationships and the orientation of management, conservation, and genetic improvement activities for species of the genus. In the present study, we analyzed the genomic diversity and population structure of seven species of Acrocomia, including 117 samples of A. aculeata covering a wide geographical area of occurrence, using single nucleotide polymorphism (SNP) markers obtained from genotyping by sequencing (GBS). The genetic structure of the Acrocomia species was partially congruent with the current taxonomic classification based on morphological characters, recovering the separation of the species A. aculeata, A. totai, A. crispa and A. intumescens as distinct taxonomic groups. However, the species A. media was attributed to the cluster of A. aculeata, while A. hassleri and A. glaucescens were grouped together with A. totai. The species that showed the highest and lowest genetic diversity were A. totai and A. media, respectively. When analyzed separately, the species A. aculeata showed a strong genetic structure, forming two genetic groups, the first represented mainly by genotypes from Brazil and the second by accessions from Central and North American countries. Greater genetic diversity was found in Brazil when compared to the other countries. Our results on the genetic diversity of the genus are unprecedented and establish new insights into the genomic relationships between Acrocomia species. This is also the first study to provide a more global view of the genomic diversity of A. aculeata. We also highlight the applicability of genomic data as a reference for future studies on genetic diversity, taxonomy, evolution and phylogeny of the genus Acrocomia, as well as to support strategies for the conservation, exploration and breeding of Acrocomia species, in particular A. aculeata.

Introduction
The genus Acrocomia is endemic to tropical and subtropical America. This genus is one of the most taxonomically complex in the family Arecaceae with regard to species [1]. Taxonomic classifications of Acrocomia are mostly limited to the description of species based on morphological and geographical distribution information. However, extensive morphological plasticity, especially for species with wide geographical distribution, has hindered the taxonomic resolution of species. Since the description of the genus Acrocomia by Martius in 1824 [2], many species have been included in and removed from the genus. From the most recent classifications, [...] genus level. The population genetics approach can assist in species delimitation and provide reference information on the genetic diversity and structure within and between species. Such knowledge is essential for more efficient management and economic exploration of the species and can guide strategies for domestication and conservation of these genetic resources. A. aculeata is an emerging crop with incipient domestication. The analysis of genetic diversity of A. aculeata is crucial to guide the selection of the most promising materials for crop use, to maximize genetic gains, and to more effectively contribute to the creation of commercial cultivars.

In this context, molecular markers have been broadly adopted in plants as an essential tool to investigate genetic diversity in ecological, phylogenetic, and evolutionary studies. In addition, they have been widely used for direct management, conservation, and genetic breeding of several species [18]. More recently, next-generation sequencing (NGS) has facilitated the identification of single nucleotide polymorphisms (SNPs), which have emerged as the most extensively used genotyping markers due to their abundance and distribution in the genome. The use of SNPs has considerably expanded knowledge of the genetic diversity of genomes of various plant species [19] at low cost and without the need for reference genomes [20-22]. However, SNPs have not been used as markers in genetic studies of Acrocomia species.

Only one study has analyzed the genetic diversity of A. totai (Lima et al., 2020).

Considering the wide distribution of A. aculeata in the Americas, all the studies carried out using molecular markers have revealed a limited panorama of species genetic diversity because they considered a very small geographic sampling, with genotypes obtained mainly from the states of São Paulo and Minas Gerais in Brazil (Abreu et al., 2012; Lanes et al., 2015; Mengistu 2016; Oliveira et al., 2012; Coelho et al., 2018). Only a single study has evaluated the genetic diversity of natural populations of A. aculeata (termed A. mexicana) [...] a greater distribution in America, was represented by samples from five countries (Fig 1 and S1 Table).

Data used to generate the species distribution (colored shading) are based on occurrence record data from GBIF (Global Biodiversity Information Facility, www.gbif.org) and Lorenzi et al. [4].

Circles represent geographical location and origin of samples in this study. [...] [41], and selecting the variables that showed the highest correlation.

For the snmf function, the most likely number of populations for the different data sets was determined using 100,000 iterations and 10 repetitions for K = 1-15.
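As a rough illustration of how a value of K can be picked from repeated runs: sNMF results are commonly compared with a cross-entropy criterion, preferring the K with the lowest value. The text above does not state which criterion was applied, so the rule used here (minimum mean cross-entropy across repetitions) is an assumption, and the function name and input values are hypothetical placeholders rather than data from this study.

```python
from statistics import mean


def select_k(cross_entropy_by_k):
    """Return the K with the lowest mean cross-entropy across repetitions.

    cross_entropy_by_k maps each tested K to a list of cross-entropy
    values, one per repetition (hypothetical input; assumed criterion).
    """
    mean_by_k = {k: mean(values) for k, values in cross_entropy_by_k.items()}
    best_k = min(mean_by_k, key=mean_by_k.get)
    return best_k, mean_by_k


# Made-up values for K = 1-4 with 3 repetitions each (illustration only).
example_runs = {
    1: [0.92, 0.93, 0.92],
    2: [0.71, 0.70, 0.72],
    3: [0.64, 0.65, 0.63],
    4: [0.66, 0.67, 0.66],
}
best_k, mean_by_k = select_k(example_runs)
print(best_k)
```

In a real analysis the dictionary would cover K = 1-15 with 10 repetitions per K, and the resulting curve is usually inspected visually as well, since a plateau can be as informative as a strict minimum.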

The identification of SNPs hypothetically under selection (outliers) was performed for the following groups: 1) the genus Acrocomia, considering the species as groups, and 2) A. aculeata, considering the samples' countries of origin as groups. We considered as loci putatively under selection those shared among the three identification methods (fsthet, pcadapt and LFMM) (S2 Table). Consequently, we adopted the remaining SNPs, considered neutral, for the analysis of population genomic diversity and structure.
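The consensus rule described above (a locus is treated as putatively under selection only if all three methods flag it) amounts to a set intersection. The sketch below is a minimal illustration of that rule; the locus identifiers and the function name are hypothetical, and in practice the per-method lists would come from the fsthet, pcadapt, and LFMM outputs.

```python
def split_neutral_and_outliers(all_loci, *outlier_sets):
    """Split loci into the neutral set and the consensus outliers.

    A locus counts as a consensus outlier only if every method flags it.
    """
    consensus = set.intersection(*(set(s) for s in outlier_sets))
    neutral = [locus for locus in all_loci if locus not in consensus]
    return neutral, consensus


# Hypothetical locus IDs, for illustration only.
all_loci = ["snp1", "snp2", "snp3", "snp4", "snp5"]
fsthet_hits = {"snp2", "snp4", "snp5"}
pcadapt_hits = {"snp2", "snp3", "snp4"}
lfmm_hits = {"snp1", "snp2", "snp4"}

neutral, consensus = split_neutral_and_outliers(
    all_loci, fsthet_hits, pcadapt_hits, lfmm_hits
)
print(sorted(consensus))  # ['snp2', 'snp4']
print(neutral)            # ['snp1', 'snp3', 'snp5']
```

Requiring agreement among all methods is a conservative choice: it lowers the chance of discarding a neutral locus flagged by a single method, at the cost of possibly retaining some loci that are truly under selection.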

Population structure
We used all samples (S1 Table) to perform the analysis of the genomic structure for de [...]

Analysis of genomic diversity
We conducted the population diversity analysis only with the SNP data set identified as neutral for two groups or taxonomic levels: 1) the genus Acrocomia (except the species A. hassleri and A. glaucescens, as each is represented by only one individual), and 2) A. aculeata.

In population genetics, neutral loci are genomic regions that are influenced by mutational dynamics and demographic effects, and not by selection. However, loci under selection (i.e., outliers) generally behave differently and therefore reveal "extreme" patterns of variation [55,56]. Since most population genetic inferences are based on neutral loci, the loci under selection can greatly influence the estimates of genetic parameters. It is therefore important to identify and remove the outlier loci from the analysis in order to infer more reliable parameters of population genetic diversity and structure.

Based on pcadapt, fsthet, and LEA, we identified 42 outlier loci for all samples or taxonomic groups for the genus Acrocomia, and 10 outlier loci for the taxonomic group formed by samples of A. aculeata. The neutral datasets for the different groups were constructed by removing the outliers. After the removal of outlier loci (S2 Table), genus Acrocomia (all species) and A. aculeata [...]

The majority (n = 34) of these samples were from the southeast region of the country, with five from the north region (BEL population). Cluster 4 consisted exclusively of A. crispa samples, with a 100% probability of assignment to the cluster.

The NJ and PCoA analyses (Figs 3a and 3b) performed with all the samples showed strong agreement with the results of the Bayesian analysis performed using Structure software. However, the NJ tree showed higher resolution in group/cluster recovery than the PCoA. In both analyses, A. crispa was clearly separated from the rest of the Acrocomia species. In addition, there is a clear genomic differentiation between A. aculeata and A. totai. Similar to the results obtained using the Structure software, the NJ analysis also recovered the substructure within A. aculeata, separating the Brazilian samples from those from other countries (Fig 3a). This separation was not recovered by the PCoA (Fig 3b).

Based on the Structure software results (Fig 2) and NJ and PCoA data (Fig 3a and 3b [...] The proportion of polymorphic loci of the five Acrocomia species ranged from 0.017 to 0.601.

A. aculeata had the highest mean and A. media had the lowest mean ( [...] identified in the structure analysis at the genus level (Fig 2). The two groups were mainly associated with geographical origin, given that samples from Central and North America (Colombia, Costa Rica, Trinidad and Tobago, and Mexico) were grouped in cluster 1, and most of the samples collected in Brazil were grouped in cluster 2 (Fig 4). The same two groups identified using the Structure software were also visualized by using the first two PCoA axes as well as the NJ dendrogram. These analyses clearly revealed the formation of two distinct genetic groups within A. aculeata, which are suggested to be geographically separated by the Amazon Rainforest (Fig 4b and 4c) [...] Tobago, and Puerto Rico (Fig 4b). The second PCoA axis set apart three samples from Cáceres, MT (CAC). These samples formed a subgroup that was very distant from the other samples of A. aculeata. However, the Structure analysis and the NJ dendrogram were not able to discriminate these samples, and grouped them with individuals from Brazil (Fig 4a and 4c).

The 'South' group (Cluster 2 in Fig 4a) [...] the Structure software analysis (Fig 4a). This result was also corroborated by the NJ and PCoA hierarchical classification (Fig 4b and 4c).

Genomic diversity of A. aculeata
Concerning the genomic diversity within A. aculeata species, the greatest diversity was [...]

However, the greatest allelic richness for the species was registered in Brazil (Ar = 1.44) (

To our knowledge, this is the first study using GBS for identifying genome-wide SNPs and their application for inferring the genetic diversity and population structure in Acrocomia species and within A. aculeata. Sampling was broad in terms of the occurrence of Acrocomia species and comprehensively captured the genomic diversity and structure of the species.

A. aculeata
At the genus level, the distinction of A. aculeata as an independent genetic group or taxon was supported by the results obtained with the Bayesian analyses (Fig 2), and by the PCoA and the NJ tree (Figs 3a and 3b). A notable finding was the identification of an accentuated substructure within A. aculeata, showing two genetic groups, corresponding to a north-south split in which the samples from Brazil (Southern group, blue cluster in Fig 3) were separated from those of Central and North America (Northern group, red cluster in Fig 3). This result was evident in the Bayesian analysis performed at the genus level (Fig 2) as well as with only samples of A. aculeata (Fig 4a). The substructure identified in A. aculeata has not been previously reported and can be attributed to the greater number of samples included in this study, which covered a wide geographic occurrence of the species in the American continent. The presence of two genetic groups may be the result of reproductive isolation due to the Amazon Rainforest acting as a geographical barrier that prevented gene flow between them, followed by independent evolution of each group. An alternative hypothesis is that these two gene pools represent more than one species, as reported in a previous taxonomic classification for Central and North American countries [57].

Another interesting result observed was that individuals from the population of Maranhão presented as an admixture between the Northern and Southern groups of A. aculeata (Fig 4). The [...] Table 3).

At the intraspecific level, the highest genetic diversity for the species was found in Brazil, especially in the States of Minas Gerais and São Paulo (Table 3). Although it is not possible to make direct [...]

Although not treated as distinct species, but considering the geographical distribution of both, several studies using molecular and morphological markers also reinforced the classification of A. [...] taxonomic classification, were attributed to cluster 2 of A. totai by the Bayesian analysis (Fig 2)

The genomic data of our study did not allow the assignment of distinct taxonomic units to the species A. hassleri and A. glaucescens. Based on morphological characters, these species are clearly differentiated from the others by their small size. However, based on the results obtained from the cluster analysis, they were assigned to cluster 2, being closely related to A. totai (Figs 2, 3a, and 3b). This result should be considered with caution, as we only used one sample of each species in the analyses, which could limit the comparison of genetic estimates and decrease the probability of detecting genetic structure, as evidenced in similar studies with a low number of samples [102,103]. Further studies with a greater number of accessions are needed to increase the species representation, and to establish reliable genetic relationships between A. hassleri and A. glaucescens and other Acrocomia species.

Our study is the first to offer evidence of the efficiency of NGS through the application of the GBS protocol in Acrocomia. The data may constitute a reference for the application of this protocol in the genus. Even without a reference genome, we successfully identified a large number of SNPs for several species, revealing potentially valuable markers for future studies in the genus Acrocomia. The SNPs yielded unprecedented results of the genetic relationships between Acrocomia species as well as at the population level for A. aculeata. In general, our results were partially congruent with the taxonomy of the genus, supporting the current separation of some species. The genomic structure revealed the formation of well-defined genetic groups and confirmed the distinction of A. aculeata, A. totai, A. intumescens, and A. crispa, with the latter showing a strong genetic differentiation, as well as the absence of genetic distinction of A. media. We recommend a review of the current taxonomic classification of A. crispa and A. media. In addition, SNPs also allowed the identification of gene flow patterns and/or hybridization between species.

In the case of A. aculeata, the data provide an overview of the genomic diversity and structure from sampling over a wide area of occurrence. The genomic data showed the existence of two large gene pools in the species at the continental level (north and south), with greater genomic diversity in the latter populations. The results from this study will serve as a reference for current and future studies on genetic diversity, taxonomy, evolution, ecology, and phylogeny of the genus Acrocomia, and will support genetic breeding, conservation, and management activities for