Abstract
Context Renewed interest in European chestnut in France is focussed on finding locally adapted populations partially resistant to ink disease and identifying local landraces.
Aims We genotyped trees to assess (i) the genetic diversity of wild and cultivated chestnut across most of its range in France, (ii) their genetic structure, notably in relation with the sampled regions, and (iii) relations with its neighbors in Spain and Italy.
Methods A total of 1,401 trees in 17 sampling regions in France were genotyped at 13 SSRs, and a subset of 693 trees at 24 SSRs.
Results Genetic diversity was high in most sampling regions, with redundancy between them. No significant differentiation was found between wild and cultivated chestnut. A genetic structure analysis with no a priori information found a low, yet significant structure, and identified three clusters. Two clusters of sampling regions, south east France and Corsica, were less admixed than the others. A substructure was detected in the admixed cluster suggesting differentiation in wild chestnut trees in Finistère and Aveyron sampling regions.
Conclusion The genetic structure within and between our sampling regions is likely the result of natural events (recolonization after the last glaciation) and human activities (migration and exchanges). Notably, we provide evidence for a common origin of most French and Iberian chestnut trees, except those from south east France, which were associated with the Italian gene pool. This advance in our knowledge of chestnut genetic diversity and structure will benefit conservation and help our local partners’ valorization efforts.
Key message This paper presents the results of the first assessment of genetic diversity and structure of wild and cultivated sweet chestnut in France. It reveals high diversity, a low but significant structure, and strongly suggests that the French gene pool is at the intersection between the Italian and Spanish gene pools.
1. Introduction
Sweet chestnut (Castanea sativa Mill.) is an endemic, multi-purpose tree species cultivated for its wood and nuts. It is the third broad-leaved tree species in France by forest area (750,000 ha) and, in 2016, accounted for 5% of land used for fruit production (FranceAgriMer 2017). With an annual production of 7,000-9,000 tons in the last 10 years, France is the fifth European producer (FAO 2018). Sweet chestnut has been intensively cultivated in coppices and orchards for centuries in France. However, since the beginning of the 18th century, it has suffered from abandonment, leading to a sharp decrease in production (Pitte 1986; Sauvezon et al. 2000). Many landraces and associated knowledge were lost. In the 1960s, the French National Institute for Agricultural Research (INRA) started a breeding program to develop interspecific hybrids resistant to ink disease, which is caused by Phytophthora species, by crossing two Asian tolerant species, Castanea crenata and Castanea mollissima, with local landraces from regions with an oceanic climate. These hybrids are now mainly used for fruit production and as rootstock (particularly Marigoule and Bouche de Bétizac varieties, and more recently BelleFer). However, they are not adapted to continental and Mediterranean conditions (Martin et al. 2017; Míguez-Soto et al. 2019). Their fruit quality has also been criticized by some growers and by chestnut lovers, particularly in comparison with landraces. Action was thus taken by these actors, involving surveys of old chestnut trees, phenotypic observations and the establishment of conservatory orchards.
Strong geographical structure was reported in wild populations in Italy, Spain, Greece and Turkey (Mattioni et al. 2013). A study of wild chestnut in Spain, Italy and Greece (Fernández-Cruz and Fernández-López 2016) found two main gene pools in Europe, and another study of wild, natural or naturalized populations (Mattioni et al. 2017) found three. These findings agree with evidence of spontaneous establishment originating from the Last Glacial Maximum refugia in the north of the Iberian, Italian and Balkan peninsulas, and in northern Anatolia (Krebs et al. 2004, 2019; Roces-Díaz et al. 2018). In southern France, there is possible evidence for chestnut refugia in palaeo-botanical data (Krebs et al. 2019). The preferred hypothesis is therefore that most pre-cultivation Castanea in France are the result of the spontaneous spread of the species from neighboring southern European refugia, i.e. in Spain and Italy. However, the most recent genetic analyses conducted exclusively on French populations were published in the 1990s on wild chestnut and at a regional scale (Frascaria et al. 1991, 1992; Frascaria and Lefranc 1992), and the results obtained in the CASCADE project (Eriksson et al. 2005) have not yet been published (T. Barreneche pers. com.). Mattioni et al. (2008) compared naturalized, coppice and orchard populations in Italy, Greece, Spain, the UK and France, and showed differences in within-population genetic parameters between fruit orchards and other types of chestnut management. This result implies that long-term management techniques can influence the genetic makeup of the populations. Differences between and within countries have also been reported (Pereira-Lorenzo et al. 2016). For these reasons, specifically French, finer-scale sampling of both wild (forest) and cultivated chestnut trees (orchards and alignments) was needed to help distinguish between natural and anthropogenic evolutionary factors.
In terms of sampling, many authors have genotyped tree collections ex situ, i.e., in conservatories (Martín et al. 2010a), and in situ (Pereira-Lorenzo and Fernandez-Lopez 1997; Gobbin et al. 2007; Martin et al. 2010b; Pereira-Lorenzo et al. 2010, 2019; Beccaro et al. 2012; Mellano et al. 2012, 2018; Beghè et al. 2013; Quintana et al. 2014; Fernández-López and Fernández-Cruz 2015). In this study, we used both in- and ex-situ sources to assess the known and currently used genetic diversity of sweet chestnut. As a result, we often sampled several individuals belonging to the same landrace. Hereafter, we use the term “landrace” as defined by Villa et al. (2005) rather than “variety” or “cultivar”, as it better covers the variety of sampling situations we encountered in the field. However, we do use the term “cultivar” when known cultivars were encountered.
The main aims of this work were to assess (i) the genetic diversity of wild and cultivated chestnut in most of its range in France, (ii) their genetic structure, notably in relation with the sampling regions, and (iii) relations between French chestnut and its neighbors in Spain and Italy. For this purpose, we sampled natural chestnut populations, ancient grafted chestnut identified in situ by local partners and ex situ local landraces in conservatories in the main nut-producing regions and in most of the distribution of natural chestnut forests in France. We used microsatellite markers from the EU chestnut database to genotype all sampled trees at 13 SSRs and a subset at 24 SSRs (Pereira-Lorenzo et al. 2017). By including Iberian samples cited in the Pereira-Lorenzo et al. publication, we also provide some evidence for the origin of the trees we sampled.
Fixation of genotypes by grafting from spontaneous chestnut, or “instant domestication” as defined by Harris et al. (2002), is reported in the literature (Aumeeruddy-Thomas et al. 2012), and was recently documented in Italy and Spain (Pereira-Lorenzo et al. 2019). As a working hypothesis, this suggested a possible lack of genetic structure between wild and cultivated chestnut. It is common knowledge that grafts and nuts travel by means of markets, historically via occupational travelers such as glass blowers (Pitte 1986) and now via local and internet-mediated exchange fairs. However, the extent and impact of this phenomenon on the genetic structure of cultivated chestnut was previously unknown in France. We hypothesized that it is sufficiently frequent to have a significant impact, leading to a low genetic structure of cultivated chestnut in France. As reported by Pereira-Lorenzo et al. (2019), we also expected to find a high overall genetic diversity, but without marked differences between the wild and cultivated sets. In addition, we expected in situ local landraces to be multiclonal due to repeated grafting over the centuries and the accumulation of mutations or the use of seedlings from the landrace.
2. Materials and Methods
Terminology: we avoid the use of “population” and instead use “sampling region” to describe a geographically or socially meaningful region where a non-profit association has prospected and conserved chestnut, or a group of sampling sites located close by. We use “genetic cluster” to denote a cluster of genotyped trees resulting from the analysis of genetic structure. “Chestnut type” is used as a category with two levels, “forest” and “cultivated”.
2.1. Geographical sampling
In forest stands, trees were chosen randomly, located several dozen meters apart in the middle of forest patches. Their exact locations were recorded by GPS. In Brittany, Auvergne-Rhône-Alpes, Occitanie, Provence-Alpes-Côte d’Azur (PACA) and Corsica, mature leaves were sampled and immediately enclosed in plastic bags with silica gel. In Gironde, dormant buds were sampled from trees close to the laboratory to facilitate frequent re-sampling when assessing the accuracy of genotyping protocols. In Corsica, nuts and dried leaves were also sampled in the field, whereas cultivated chestnut was provided as DNA extract. Whenever we sampled offspring as groups of half-sib fruits, we also sampled leaves from their mothers. Nuts harvested in the Finistère, Corsica, Basque Country and Aveyron forest sampling regions were germinated and sown in the greenhouse.
2.2. Expert-based sampling
Field surveys of cultivated chestnut were conducted in 2016-2017 in collaboration with producer and amateur organizations. In 2016, we focused our sampling effort on the landraces they knew and were interested in. In 2017, we expanded sampling to most known landraces and grafted trees, supplemented by random sampling in a few chestnut orchards. Associative conservatories were also sampled. We sampled several chestnut trees that had the same name to test the genetic diversity of landraces. When attributing sampled trees to a given landrace, when known, we followed the field expert’s determination.
2.3. SSR genotyping
A total of 1,401 trees were genotyped at 13 SSRs, and a subset of 693 were genotyped at 24 SSRs. Total genomic DNA was extracted from fresh leaves, silica-dried leaves or dormant buds using the DNeasy 96 Plant kit (Qiagen, Hilden, Germany). Twenty-four SSR markers previously selected to study chestnut genetic diversity were used for this study (Buck et al. 2003; Gobbin et al. 2007; Kampfer et al. 1998; Marinoni et al. 2003; Steinkellner et al. 1997) based on the protocol of Pereira-Lorenzo et al. (2017). We amplified these 24 SSRs in 5 multiplex and 2 singleplex PCRs using one of the FAM, NED, PET, VIC fluorophore-labeled primers (PE Applied Biosystems, Warrington, UK) modified following Pereira-Lorenzo et al. (2017, 2019). The PCR final reaction volume was 15 µl (7.5 µl of QIAGEN Multiplex Master Mix, 0.075 to 0.3 µM of each primer, 4 to 4.9 µl RNase Free Water and 2 µl of DNA at 5-10 ng/µl). The amplification conditions were 95°C for 5 min, followed by 30 cycles of 95°C for 30 s, annealing for 1.5 min at a temperature specific to each multiplex set, and extension at 72°C for 1 min, with a final extension at 60°C for 30 min. Negative controls were included in all PCR reactions to enable detection of cross contamination of the samples.
Amplifications at 13 SSRs corresponded to sets 1, 2 and 3. Amplifications at 24 SSRs corresponded to all sets. Set 1 (57°C): EmCs14-VIC, EmCs15-FAM, EmCs38-FAM, EmCs2-NED, CsCAT14-PET, CsCAT2-VIC. Set 2 (50°C): CsCAT16-PET, CsCAT41-FAM, QpZAG110-PET, QpZAG36-VIC, CsCAT3-NED. Set 3 (post-PCR multiplexing): QrZAG4-NED (48°C) and QrZAG96-NED (52°C). Set 4 (50°C): CsCAT6-NED, CsCAT1-PET, CsCAT15-FAM, CsCAT8-VIC. Set 5 (58°C): RIC-FAM, CsCAT17-PET, EmCs22-VIC. Set 6 (60°C): EmCs25-FAM, CIO-NED, OCI-PET and OAL-VIC. Amplification products were diluted with water, and 2 µl of the diluted product was added to 0.12 µl of 600LIZ size standard (Applied Biosystems, Foster City, USA) and 9.88 µl of formamide.
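The composition of the two marker panels can be checked with a short sketch. The Python below is illustrative only (the analyses in this study were run in R); the dictionary layout and the helper name loci_in_sets are assumptions introduced here, and locus names are written without their dye suffix.

```python
# Multiplex sets as listed in the text: locus -> annealing temperature (°C).
# Set 3 is pooled after PCR, so its two loci carry their own temperatures.
MULTIPLEX_SETS = {
    1: {"EmCs14": 57, "EmCs15": 57, "EmCs38": 57, "EmCs2": 57, "CsCAT14": 57, "CsCAT2": 57},
    2: {"CsCAT16": 50, "CsCAT41": 50, "QpZAG110": 50, "QpZAG36": 50, "CsCAT3": 50},
    3: {"QrZAG4": 48, "QrZAG96": 52},
    4: {"CsCAT6": 50, "CsCAT1": 50, "CsCAT15": 50, "CsCAT8": 50},
    5: {"RIC": 58, "CsCAT17": 58, "EmCs22": 58},
    6: {"EmCs25": 60, "CIO": 60, "OCI": 60, "OAL": 60},
}


def loci_in_sets(set_ids):
    """Return the sorted loci amplified by the given multiplex sets (hypothetical helper)."""
    return sorted(locus for s in set_ids for locus in MULTIPLEX_SETS[s])


panel_13 = loci_in_sets([1, 2, 3])      # trees genotyped at 13 SSRs
panel_24 = loci_in_sets(range(1, 7))    # subset genotyped at 24 SSRs
```

Counting the entries confirms that sets 1 to 3 provide 13 loci and that all six sets together provide 24.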
Genotyping was performed partly on an ABI 310 capillary sequencer (Applied Biosystems, Foster City, CA, USA) at the Xylobiotech FCBA facility of Cestas-Pierroton, with further work on an ABI 3500 XL capillary sequencer (Applied Biosystems, Foster City, CA, USA) at the CIRAD GenSeqUM platform in Montpellier, France. Allele sizes were read independently by two investigators using GENEMAPPER 4.1 and 5.0 respectively (Applied Biosystems, Foster City, USA). The output files in the fsa format were made compatible with GENEMAPPER 4.1 using a Python script from the Montpellier platform.
2.4. Data analysis
2.4.1. Detection of clonal groups and null alleles
All individuals with more than 20% of missing alleles were removed, along with individuals showing Asian alleles (Pereira-Lorenzo et al. 2010). CsCAT41 is known to amplify two sites: the CsCAT41A (Pereira-Lorenzo et al. 2010) locus was thus removed before analysis. The presence of uninformative loci was tested with the informloci function in the R/poppr package version 2.8.3 (Kamvar et al. 2015; Kamvar et al. 2014) in both data sets. The percentages of missing data were obtained using the info_table function in R/poppr. The frequency of null alleles per locus was calculated with the R/PopGenReport package version 3.0.4 (Adamack and Gruber 2014) based on Brookfield’s formula (Brookfield 1996). Following Lassois et al. (2016), we discarded loci with more than 10% of null alleles. After removing loci, the genotype_curve function implemented in R/poppr was applied to both data sets to determine the minimum number of loci necessary to discriminate between individuals. Redundant genotypes were searched within each sampling region to identify multi-locus genotypes (MLGs) for each data set, using the clonecorrect function in R/poppr.
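The two filtering thresholds described above can be sketched as follows. This is an illustrative Python sketch, not the R code used in the study; it assumes that the Brookfield formula referred to is the widely used first estimator, r = (He - Ho) / (1 + He), and all function names and the data layout (diploid calls as pairs, None for a missing allele) are hypothetical.

```python
def brookfield_null_frequency(he, ho):
    """Null allele frequency, assuming Brookfield's first estimator: (He - Ho) / (1 + He)."""
    return (he - ho) / (1.0 + he)


def missing_fraction(genotype):
    """Fraction of missing allele calls; genotype is a list of (a1, a2) pairs, None = missing."""
    alleles = [a for pair in genotype for a in pair]
    return sum(a is None for a in alleles) / len(alleles)


def filter_individuals(genotypes, max_missing=0.20):
    """Keep individuals with at most max_missing missing alleles (20% threshold from the text)."""
    return {name: g for name, g in genotypes.items() if missing_fraction(g) <= max_missing}


def loci_to_discard(stats, max_null=0.10):
    """stats maps locus -> (He, Ho); return loci whose estimated null frequency exceeds 10%."""
    return [locus for locus, (he, ho) in stats.items()
            if brookfield_null_frequency(he, ho) > max_null]
```

For instance, a locus with He = 0.8 and Ho = 0.5 gives an estimate of 0.3 / 1.8, about 0.17, and would be discarded under the 10% rule.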
2.4.2. Genetic diversity
The observed number of alleles (Na) and observed heterozygosity (Ho) were calculated at each locus using the summary function in the R/adegenet package 2.1.1 (Jombart 2008). The effective number of alleles (Ne) was calculated using the expected heterozygosity (He) from the summary function for genind objects in R/adegenet, with Ne=1/(1-He). The Fst and corrected Fst (Fstp), Fis and Dest per locus (Jost 2008; Nei 1987) were calculated using the basic.stats function in the R/hierfstat package version 0.04-22 (Goudet 2005). The poppr function in R/poppr was used to report other basic statistics per sampling region including the Shannon-Wiener diversity index (H), the index of association (Ia), and the standardized index of association (rbarD) (Agapow and Burt 2001). The significance of Ia and rbarD was tested with 1000 permutations, shuffling the genotypes at each locus while maintaining the heterozygosity and allelic structures. Deviation from the Hardy-Weinberg equilibrium (HWE) was tested on both loci and populations with 1000 permutations using the hw.test function in the R/pegas package version 0.11 (Paradis 2010). The Chi2 statistic was calculated over the entire data set and two p values were computed, one analytical and one derived from 1000 Monte-Carlo permutations.
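As a worked illustration of the per-locus indices, the following Python sketch computes Na, Ho, He and Ne = 1/(1-He) from diploid genotypes at one locus, together with a Shannon-Wiener index over multi-locus genotype counts. It is a sketch under stated assumptions, not the R code used here: He is taken as 1 minus the sum of squared allele frequencies without sample-size correction, missing data are assumed absent, and the function names are hypothetical.

```python
import math
from collections import Counter


def locus_summary(genotypes):
    """genotypes: list of (a1, a2) diploid calls at one locus, no missing data.

    Returns Na, Ho, He (1 - sum of squared allele frequencies) and Ne = 1 / (1 - He).
    """
    alleles = [a for pair in genotypes for a in pair]
    counts = Counter(alleles)
    n = len(alleles)
    he = 1.0 - sum((c / n) ** 2 for c in counts.values())
    ho = sum(a != b for a, b in genotypes) / len(genotypes)
    return {"Na": len(counts), "Ho": ho, "He": he, "Ne": 1.0 / (1.0 - he)}


def shannon_wiener(mlg_counts):
    """Shannon-Wiener index H = -sum(p * ln p) over multi-locus genotype counts."""
    total = sum(mlg_counts)
    return -sum((c / total) * math.log(c / total) for c in mlg_counts if c > 0)
```

With two equally frequent alleles, He = 0.5 and Ne = 2, which matches the intuition that Ne counts the number of equally frequent alleles that would give the same expected heterozygosity.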
2.4.3. Population structure
In each data set, using the find.clusters function in R/adegenet, SSR genotypes were transformed by a principal component analysis (PCA), followed by the k-means algorithm applied to the principal components (PCs) to identify groups of individuals we call “genetic clusters” (Jombart et al. 2010). The number of clusters was determined using the BIC. Discriminant analysis of principal components (DAPC, Jombart et al. 2010) was then performed based on this grouping. The number of principal components (PCs) to keep was chosen by cross-validation using the xvalDapc function in R/adegenet with 30 repetitions and a maximum of 80 PCs. Hierarchical analysis of molecular variance (AMOVA, Excoffier and Smouse 1992) as implemented in the poppr.amova function in R/poppr was performed using all loci with less than 5% missing data on the preset hierarchy of chestnut types and sampling regions, and on genetic clusters. Fis, pairwise Fst and hierarchical F-statistics were calculated, and 95% confidence intervals were obtained by bootstrapping with 1000 samples over loci using the boot.ppfis, boot.ppfst and boot.vc functions. Differences between hierarchy levels were tested by randomization with the function randtest in the R/ade4 package version 1.7-13 (Excoffier and Smouse 1992; Chessel et al. 2004). Some components of covariance could have slightly negative estimates due to the absence of significant genetic structure at the corresponding hierarchical level (FAQ List for Arlequin 2.000).
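The idea of bootstrapping over loci to obtain confidence intervals for F-statistics can be illustrated with a minimal Python sketch. It does not reproduce the internals of the hierfstat functions named above: it simply assumes that each locus contributes a numerator and a denominator (for example, hypothetical per-locus variance components), that the statistic is the ratio of their sums, and that a percentile interval is read from 1000 resamples of loci drawn with replacement.

```python
import random


def ratio_of_sums(components):
    """components: list of (numerator, denominator) pairs, one per locus."""
    return sum(n for n, _ in components) / sum(d for _, d in components)


def bootstrap_over_loci(components, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval obtained by resampling loci with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    k = len(components)
    estimates = sorted(
        ratio_of_sums([components[rng.randrange(k)] for _ in range(k)])
        for _ in range(n_boot)
    )
    tail = int(n_boot * alpha / 2)  # number of estimates cut from each tail
    return estimates[tail], estimates[n_boot - 1 - tail]
```

An interval whose lower bound lies above zero would be read as evidence of structure at the corresponding level, which is how the confidence intervals are interpreted in the Results.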
2.4.5. Reproducibility
To facilitate method reproducibility (Goodman et al. 2016), all our analyses were performed in R (R Core Team 2019); the scripts are available at https://data.inra.fr/privateurl.xhtml?token=8c03a83c-be4d-4984-972f-7808558b4539.
3. Results
3.1. Sampling scheme
To characterize and understand the genetic diversity and population structure of the European chestnut (Castanea sativa Mill.) in France, we genotyped 1,401 trees in 17 sampling regions in both forest and cultivated areas. Table 1 lists sampling details and Figure 1 shows the location of the sampling regions (GPS of sampled trees are available upon request).
3.2. Detection of null alleles and redundant multi-locus genotypes
After filtering genotyped trees for missing alleles, 1,214 trees genotyped at 13 SSRs (respectively 642 at 24 SSRs) remained for further analysis (Table 1). Moreover, some SSRs were known to often have a high null allele frequency, such as EmCs25 (Lusini et al. 2014) and CsCAT14, CsCAT2, CsCAT41, QrZAG4 and CIO (Pereira-Lorenzo et al. 2017). In our data, the null allele frequency of EmCs38 was higher than 10% in the 13 SSR data set (respectively EmCs38, CIO and EmCs25 in the 24 SSR data set), and these loci were discarded (Online Resource 1). After filtering uninformative loci and those with more than 5% of missing values, the resulting data sets had 19 SSRs, hereafter called 19All, and 10 SSRs, hereafter called 10All. Redundant multi-locus genotypes (MLGs) were then discarded in each sampling region, as they could be the result of both practices (grafting) and sampling choices, and had to be removed to avoid the artefactual detection of genetic structure resulting from the sampling strategy. The resulting data sets (Table 1) are called 10Unik (1,050 trees) and 19Unik (521 trees). In both data sets, the discriminating power of the polymorphic markers was sufficient to discriminate all individuals irrespective of the number of loci and individuals (Online Resource 2).
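The removal of redundant MLGs within each sampling region can be sketched as a simple de-duplication keyed on region and genotype. The Python below is an illustrative sketch of that idea, not the clonecorrect implementation; the function name and tuple layout are hypothetical, and exact genotype identity is assumed (no tolerance for scoring errors).

```python
def clone_correct(samples):
    """samples: list of (tree_id, region, genotype), genotype being a hashable tuple.

    Keeps the first tree encountered for each distinct genotype within each region.
    The same genotype found in two different regions is kept once in each.
    """
    seen = set()
    kept = []
    for tree_id, region, genotype in samples:
        key = (region, genotype)
        if key not in seen:
            seen.add(key)
            kept.append((tree_id, region, genotype))
    return kept
```

Because the key includes the sampling region, a genotype shared between a forest and a cultivated sampling region survives in both, so such shared genotypes remain detectable after correction.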
3.3. Description of SSR diversity per sampling region
The 19 SSRs analyzed in this study varied greatly in allele diversity (Online Resource 3). The 10Unik data set (respectively 19Unik) had a total of 113 alleles (respectively 186), with an average of 11.3 alleles per locus (respectively 9.8). This ranged from 3 for EmCs2 to 33 for CsCAT3 (respectively 2 for QrZAG4 to 31 for CsCAT3). In terms of expected heterozygosity (He), EmCs2 showed the lowest diversity with 0.66 in 10Unik (respectively QrZAG4 with 0.17 in 19Unik) and CsCAT3 the highest diversity with 0.85 in 10Unik (respectively 0.83 in 19Unik). The within-population inbreeding coefficient (Fis) ranged from -0.437 to 0.134 in 10Unik (respectively -0.439 to 0.152 in 19Unik), with a mean of -0.069 in 10Unik (respectively -0.116 in 19Unik). Across all sampling regions, in 10Unik, it was not possible to reject the HWE for CsCAT3 and QpZAG110. In 19Unik, only QpZAG110 and QrZAG4 were in the HWE (Online Resource 4). When tested per sampling region, only ForGard, ForHerault and ForBasque were in the HWE in both data sets. ForAveyron and ForCantal were in the HWE only in 10Unik. Moreover, in both data sets, HWE was rejected for all SSR loci in at least one sampling region, except OCI in 19Unik.
3.4. Redundant diversity among sampling regions and no differentiation between chestnut types
Genetic diversity indices calculated for each sampling region genotyped at 10 SSRs without MLGs are listed in Table 2 (results at 19 SSRs are presented in Online Resource 5). The aim of sampling ForGironde was not to be representative of the region, but to facilitate resampling. In the 19Unik data set ForBasque had a single individual. Therefore, diversity and differentiation are discussed excluding ForGironde in the 10Unik data set, and excluding ForGironde and ForBasque in the 19Unik data set. The highest effective number of alleles per sampling region was found in the Finistère forest sampling region in 10Unik (ForFinistere, north west of France) and the lowest was found in the cultivated sampling region in Var (CultVar, south east of France). The mean observed heterozygosity was 0.681 and the mean expected heterozygosity was 0.658. The sampling regions with the lowest (respectively highest) observed heterozygosity were ForVar in the south east of France (respectively the forest sampling region in Hérault, CultHerault). The sampling regions with the lowest (respectively highest) expected heterozygosity were CultVar (respectively the forest sampling region in Finistère, ForFinistere). Excluding ForGironde, no positive and significant inbreeding (Fis) was found in any region. The highest Ia and rbarD were found in CultVar and the lowest (but not significant) were found in ForFinistere. The results of AMOVA (Table 3 and Online Resource 6) revealed no substantial difference in structure between the two chestnut types, forest stands and cultivated orchards (the variance component did not differ significantly from zero, although the Fct confidence interval excluded zero, albeit narrowly). Instead, more than 80% of the variance was found within each sampling region. At a threshold of 0.001, we rejected the null hypothesis of panmixia, both among sampling regions within chestnut types and within sampling regions.
Among sampling regions within chestnut types, the Phi test statistic of the AMOVA indicated greater variance than expected under the null hypothesis. This suggested an underlying structure at this hierarchical level that was confirmed by a positive bootstrap-derived confidence interval for Fst (7%-9.3%). Within sampling regions, the Phi test statistic indicated lower variance than expected under the null hypothesis. This suggested some inbreeding at this hierarchical level, but this hypothesis was invalidated by a bootstrap-derived confidence interval for Fis including zero.
3.5. Highly admixed genetic structure
In addition to analyzing genetic diversity per sampling region, we also evaluated the overall genetic structure to detect genetic clusters, if any, and to assess their congruence with respect to each sampling region. The number of genetic clusters was determined using the BIC after running the k-means algorithm. For each data set, this criterion started by decreasing sharply (Online Resource 7), demonstrating the presence of genetic structure. However, the signal was not clear for all the data sets, making the choice of the number of genetic clusters rather difficult. Nevertheless, based on the results and motivated by the parsimony principle, we chose K=3 for the remaining analyses and for each data set (except for the cultivated data set genotyped at 10 SSRs, where K=6).
On the 10Unik data set, in the first step of the DAPC, 70 principal components were selected by cross-validation, collectively representing 99.2% of the total variance (Figure 2). In the second step, two linear discriminant functions were used to discriminate the three genetic clusters. The first discriminant function separated clusters 1 and 3 most strongly, and 78.6% of the individuals from Corsica were grouped in cluster 3. The second discriminant function separated clusters 1 and 2. Cluster 2 grouped most individuals in Var (ForVar and CultVar) and some in Ardèche (CultArdech). However, overall, most individuals (79.8%) of the cultivated and forest types were grouped in cluster 1, pointing to an overall admixed genetic structure in our sample. This was confirmed by the relatively low pairwise Fst calculated between clusters and can be seen in the assignment plot (Online Resource 8). Nineteen samples out of 1,050 had a posterior assignment probability for a given genetic cluster of less than 80%. Similar results were obtained with the DAPC at 19 SSRs without MLGs (Online Resource 9, plot 1). Cluster 2 represented 66.4% of all genotyped individuals in most sampling regions. Clusters of forest and cultivated sampling regions in Var and some in Ardèche (n° 3) and Corsica (n° 1) were identified, showing that there was no clear genetic differentiation between the forest and cultivated stands in either of these two sampling regions. A hierarchical AMOVA of the 10Unik data set (respectively 19Unik) corroborated this finding (Table 4 and Online Resource 6) and showed that 84.3% of the variance (respectively 84%) was found among samples within clusters. No substantial difference in structure was found between clusters: the variance component at this level was not significantly different from zero, although the Fst confidence interval excluded zero.
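The 80% posterior threshold mentioned above can be made concrete with a short sketch. The Python below is illustrative only: it assumes that the posterior membership probabilities of each tree are available as one list per tree (as a DAPC would provide), that a tree is assigned to its most probable cluster, and that it is flagged as ambiguous when that highest probability is below 0.80. The function name and input layout are hypothetical.

```python
def assign_clusters(posteriors, threshold=0.80):
    """posteriors: dict tree_id -> list of membership probabilities, one per cluster.

    Returns (assignments, ambiguous): the most probable cluster of each tree
    (numbered from 1) and the trees whose highest probability is below threshold.
    """
    assignments = {}
    ambiguous = []
    for tree_id, probs in posteriors.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        assignments[tree_id] = best + 1
        if probs[best] < threshold:
            ambiguous.append(tree_id)
    return assignments, ambiguous
```

Under this reading, the proportion of ambiguous trees is a direct, if crude, indicator of how admixed the sample is.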
When the inbreeding coefficient was calculated per cluster (Table 5 and Online Resource 10), the 95% confidence interval of all the clusters included zero. The mean observed heterozygosity was 0.693 for 10Unik (respectively 0.703 for 19Unik) and the mean expected heterozygosity was 0.688 (respectively 0.666 for 19Unik).
As cluster 1 in Figure 2 contained 79.8% of all the samples, we investigated its sub-structure by performing a DAPC on its samples (Online Resource 9, plot 3). BIC showed an optimal structure at K=3 (Online Resource 7). The resulting sub-clusters were all admixed with low Fst (0.045 – 0.055) even though the confidence intervals excluded zero. Moreover, this sub-structure separated samples from the two most frequently represented sampling regions: 91% of ForAveyron samples belonged to sub-cluster 1 and 82% of ForFinistere samples belonged to sub-cluster 3.
4. Discussion
4.1. Sampling
This work is the first comprehensive survey of genetic diversity and structure of Castanea sativa Mill. in France. As such, it fills the sampling gap in France for the benefit of future studies of chestnut structure in Europe. Our study benefited from two projects (the first author’s PhD and the FCBA project) which had different goals but whose sampling regions partially overlapped ours, and which used the same genotyping and allele scoring procedures. Combining these projects resulted in a large sampling effort to better assess the overall diversity of cultivated and forest chestnut in France, a crucial component of landscape genetics (Schwartz and McKelvey 2009). It does not, however, reflect their respective abundance in each sampling region, the assessment of which was not our aim in this particular study.
4.2. Diversity indices
The levels of diversity in our sampling regions are comparable with those reported in other studies (Lusini et al. 2014; Mattioni et al. 2017; Mattioni et al. 2013; Skender et al. 2017). Similarly, the mean number of alleles per locus is comparable with those obtained in other European regions (Lusini et al. 2014; Pereira-Lorenzo et al. 2017). The high observed heterozygosity in two of our sampling regions, CultArdech and CultLimousin, could be explained by the fact that they were sampled in several local conservatories.
4.3. Redundant diversity among sampling regions and no differentiation between chestnut types
The absence of significant genetic structure between forest and cultivated stands, and the high variance found within sampling regions, implies that each sampling region hosts substantial diversity, mostly shared with the other sampling regions. Such redundancy between sampling regions can be interpreted as the result of human exchanges (Bruneton-Governatori 1999; Conedera et al. 2016; Krebs et al. 2019; Pitte 1986). Concerning Var and Corsica, some information made us think that the sampled forest in these regions may previously have been used as chestnut orchards. One MLG in ForVar region was equivalent to one in CultVar, and forest and cultivated trees from sampling regions of Var and Corsica were grouped in the same genetic cluster. This could be explained by the multipurpose past uses of the forests, as attested by the current owner of the Corsica stands. After performing the AMOVA on the 10Unik data set, this time after removing the forest sampling regions of Var and Corsica, the Fct among chestnut types had a confidence interval including zero (Online Resource 6).
Redundant genetic diversity in our sampling regions should ensure backup diversity, as long as information about landraces is shared among stakeholders in the different sampling regions. In situ sampling revealed that many landraces are multi-clonal. This source of diversity and hence of potential adaptation argues in favor of not reducing a landrace to one arbitrary clone. Even clones should be carefully evaluated, as morphological differences between clones were reported during our field trips, as has been the case in other species (Cipriani et al. 2010). All this is particularly interesting at a time when chestnut valuation tends to be based on heritage, with significance and quality marks based on local landraces (e.g., AOC Châtaigne d’Ardèche, AOC Farine de châtaigne Corse – Farina castagnina corsa, Label rouge Marron du Périgord). Genetics could provide authorities with arguments to justify certifying landraces as “local”. On the other hand, even if a landrace has been cultivated for centuries in a particular place, this may also be the case elsewhere. Therefore, one might rightfully ask whether the quality of local chestnut comes from its locality. For crops like chestnut, usage and practices may be at least as important as genetics to give value to chestnut for growers and consumers (Dupré 2002, 2005; Martin et al. 2017).
4.4. A highly admixed genetic structure
Paralleling the high redundancy between sampling regions, the genetic structure from the DAPC remained low or moderate among subgroups. The main finding here was the high admixture between the regions we sampled, both forest and cultivated. There was thus no clear-cut distinction between sampling regions considered as forest or as cultivated, as confirmed by the AMOVA. This result was not completely unexpected given that chestnut is an outcrossing species and that gene flow between forest and cultivated stands is known to occur, together with changes in usage over time and in certain practices such as forests being used as a source of seedlings for rootstock, good quality fruits as a source of seedlings to plant forests, peasant woods in Limousin (personal communication), and “instant domestication” (Pereira-Lorenzo et al. 2019).
Characterizing genetic diversity (respectively structure) as high (respectively strong) or low (respectively weak) can be particularly risky, as it has to be done in relative terms. Like Pereira-Lorenzo et al. (2019), we found an Fct close to zero between wild and cultivated chestnut. When characterizing the genetic diversity and structure of wild chestnut from Italy, Spain, Greece and Turkey, Mattioni et al. (2013) obtained a molecular variance among three clusters of 11.58%, i.e. lower than our 15.7%. They also found an Fst of 12.6% between genetic clusters representing Italian and Spanish samples, higher than our 9%.
4.5. A hypothetical common glacial refugium for French and Iberian chestnut
The genetic structure inferred from our samples did not necessarily match the sampling regions. This result was also expected for a continuously dispersed species affected by human management like European chestnut. Moreover, an admixed genetic structure was consistent with the known patterns of divergence and distribution of chestnut (Mattioni et al. 2017), combined with evidence from fossil pollen of several tree species suggesting that chestnut populations originating from Italy or the Balkans spread into the Iberian Peninsula from the north (Grivet and Petit 2003; Petit 2003).
In the EU database (2017) and in Pereira-Lorenzo et al. (2019), « Luguesa » was classified with the Italian group of cultivars. In our analyses, it was found in the south-eastern cluster (cluster n°3 in Online Resource 9, plot 2), grouped together with « Puga » and « Raigona », which were both originally classified in the Iberian group, whereas the other Spanish cultivars were found in cluster n°2. Therefore, the majority of the Iberian group seems to match the main French group, suggesting that both originated in the same glacial refugium.
Before removal of hybrid individuals, the Basque sampling region was represented in the 10 (respectively 19) dataset by 119 (respectively 10) successfully genotyped non-redundant individuals. This high number of admixed individuals is an important feature of the current chestnut forest there, resulting from the long history of interspecific hybridization in this region, which extends on both sides of the border between Spain and France (Pereira-Lorenzo et al. 2017). It is further substantiated by the high prevalence of trees tolerant to ink disease, as found in artificial inoculation experiments (Robin et al., in preparation).
A Europe-wide analysis of the genetic structure of European chestnut that includes a substantial French sample remains to be done.
4.6. Future outlook of SNP genotyping
The markers we used were selected after an extensive review of the literature (by us for the 13 SSRs, and independently by Pereira-Lorenzo et al. 2018), and allele scoring was the subject of a recent optimization by Pereira-Lorenzo et al. (2018). Nevertheless, we faced the usual difficulties and drawbacks of microsatellites, i.e., errors and uncertainties in allele calling, difficulty in data comparison and transferability across labs and collaborators over time, and the huge amount of time needed to perform the analysis, as emphasized in previous studies (reviewed by Guichoux et al. 2011).
We consequently set up a small project to define nuclear SNPs, at least to check clear duplicates (in the case of good quality genotyping results) and putative duplicates (in the case of low quality results) among samples from variety repositories. In a few months, we re-genotyped about 500 samples with up to 160 SNPs and confirmed all suspected duplicates. A detailed description of this work will be the subject of a separate article.
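As an illustration of how duplicate checking with SNP genotypes can work in principle, the following Python sketch compares all pairs of samples at the loci called in both and reports pairs whose number of mismatches does not exceed a threshold. It is not the pipeline used in our SNP project: the function name, the genotype coding (0, 1 or 2 copies of one allele, None for a missing call), the thresholds and the toy data are all hypothetical.

```python
from itertools import combinations


def find_duplicate_pairs(genotypes, max_mismatch=0, min_shared=3):
    """Return candidate duplicate pairs among genotyped samples.

    genotypes: dict mapping sample name to a list of SNP calls
    (0, 1 or 2; None for a missing call), same locus order for all.
    max_mismatch and min_shared are illustrative thresholds.
    """
    pairs = []
    for a, b in combinations(sorted(genotypes), 2):
        # Keep only loci successfully called in both samples
        shared = [
            (x, y)
            for x, y in zip(genotypes[a], genotypes[b])
            if x is not None and y is not None
        ]
        if len(shared) < min_shared:
            continue  # too few shared loci to decide
        mismatches = sum(x != y for x, y in shared)
        if mismatches <= max_mismatch:
            pairs.append((a, b, len(shared), mismatches))
    return pairs


# Hypothetical toy data: four samples genotyped at four SNPs
toy = {
    "tree_A": [0, 1, 2, 1],
    "tree_B": [0, 1, 2, 1],
    "tree_C": [0, 1, None, 2],
    "tree_D": [2, 2, 0, 0],
}
print(find_duplicate_pairs(toy))                  # strict matching
print(find_duplicate_pairs(toy, max_mismatch=1))  # tolerant matching
```

Strict matching corresponds to clear duplicates in good quality data. Allowing a few mismatches is one possible way to flag putative duplicates when genotyping quality is lower, at the cost of more false positives.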
5. Conclusion
In conclusion, this study revealed the genetic diversity and structure of French forest and cultivated chestnut across most of its range. We showed high diversity redundancy between sampling regions and a weak genetic structure. Based on external knowledge, the influence of human activity is the most probable explanation for this finding. Three main clusters were found, one in Corsica, one in the south east of France, probably partially matching a previously described Italian group of cultivars, and one main admixed cluster matching the Iberian cultivars. This confirms existing historical knowledge on land use changes, the movement of landraces, and « instant domestication » landraces. Furthermore, we provide evidence for a common origin of most of the French and Iberian chestnut trees, except those from the south east of France, which were associated with the Italian gene pool. We believe our work provides useful information for conservation planning purposes and for cooperation between chestnut non-profit associations and groups of growers interested in landrace conservation and diffusion.
Footnotes
Email addresses: cathy.bouffartigue{at}inra.fr; sandrine.debille{at}fcba.fr; olivier.fabreguettes{at}inra.fr; ana.ramos{at}usc.es; santiago.pereira.lorenzo{at}usc.es; timothee.flutre{at}inra.fr; luc.harvengt{at}fcba.fr
Revision of typos and authors added in the manuscript.
https://data.inra.fr/privateurl.xhtml?token=8c03a83c-be4d-4984-972f-7808558b4539