Signatures of Environmental Adaptation During Range Expansion of Wild Common Bean (Phaseolus vulgaris)

Landscape genomics integrates population genetics with landscape ecology, allowing the identification of putative molecular determinants involved in environmental adaptation across the natural geographic and ecological range of populations. Wild Phaseolus vulgaris, the progenitor of domesticated common bean (P. vulgaris), has a remarkably wide distribution, extending over 10,000 km from northern Mexico to northwestern Argentina. Earlier research has shown that this distribution represents a range expansion from Mesoamerica to the southern Andes through several discrete migration events, and that the species colonized areas whose temperature and rainfall regimes differ from those of its core area of origin. Thus, this species provides an opportunity to examine to what extent the adaptation of a species can be broadened or, conversely, to what extent its ecological or geographical distribution can be limited by inherent adaptedness. In the current study, we applied a landscape genomics approach to a collection of 246 wild common bean accessions representative of its broad geographical and climatic distribution and genotyped for ∼20K SNPs. We applied two different but complementary approaches for identifying loci putatively involved in environmental adaptation: i) an outlier-detection method that identifies loci showing strong differentiation between sub-populations; ii) an association method based on the identification of loci associated with bio-climatic variables. This integrated approach allowed the identification of several genes showing signatures of selection across the different natural sub-populations of this species, as well as genes associated with specific bio-climatic variables related to temperature and precipitation. The current study demonstrates the feasibility of a landscape genomics approach for the preliminary identification of specific populations and novel candidate genes involved in environmental adaptation in P. vulgaris.
As a resource for broadening the genetic diversity of the domesticated gene pool of this species, the genes identified constitute potential molecular markers and introgression targets for the breeding improvement of domesticated common bean.

Author Summary

The ancestral form of common bean has an unusually large distribution in the Americas, extending over 10,000 km from ∼35° N. Lat. to ∼35° S. Lat. This wide distribution results from discrete long-range dissemination events from the original environments in Mesoamerica to the Andes region. It also suggests adaptation to new environments that are distinct from those encountered in Mesoamerica. In this research, we used two methods to identify genes that may be involved in adaptation to climate variables in these new environments. The first method, outlier detection, was used to identify genome regions that differentiate the wild bean groups in the Andes, which resulted from discrete dissemination events, from one another and from the different groups in Mesoamerica. The second method, genome-wide association, was used to identify candidate genome regions correlated with these same climate variables across the entire distribution, from Mesoamerica to the southern Andes. The two methods identified two sets of candidate genes, several of which were related to the water status of plants, and together they illustrate the genetic architecture of adaptation following long-range dissemination. This study provides candidate genes as well as candidate wild bean populations that, once corroborated, could be used to increase the water-use efficiency of domesticated beans.
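The outlier-detection approach summarized above ranks loci by how strongly their allele frequencies differ among sub-populations. As a rough illustration only, the sketch below scores each biallelic SNP with Nei's G_ST, (H_T - H_S)/H_T, computed from per-group allele frequencies, and flags the top fraction of SNPs as outliers. The 0/1/2 genotype coding, the input layout, the unweighted averaging across groups and the `top_fraction` cutoff are assumptions made for illustration; this is not necessarily the statistic, software or threshold used in the study, and model-based genome-scan methods additionally account for demographic history.

```python
"""Per-SNP differentiation scan with Nei's G_ST (illustrative sketch only).

Assumptions: genotypes are 0/1/2 counts of the alternative allele, missing
calls are None, and outliers are the top `top_fraction` of G_ST scores.
"""
from typing import Dict, List, Optional

# group name -> SNP id -> genotype calls (hypothetical input layout)
Genotypes = Dict[str, Dict[str, List[Optional[int]]]]


def alt_freq(calls: List[Optional[int]]) -> Optional[float]:
    """Alternative-allele frequency from 0/1/2 calls, ignoring missing data."""
    called = [g for g in calls if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def gst(freqs: List[float]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T for one biallelic SNP across groups."""
    # Mean expected heterozygosity within groups.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    # Expected heterozygosity at the unweighted mean frequency across groups.
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def outlier_snps(groups: Genotypes, top_fraction: float = 0.01) -> List[str]:
    """Return SNP ids whose G_ST is in the top `top_fraction` of scored SNPs."""
    snp_ids = next(iter(groups.values())).keys()
    scores: Dict[str, float] = {}
    for snp in snp_ids:
        freqs = [alt_freq(calls[snp]) for calls in groups.values()]
        if any(f is None for f in freqs):
            continue  # skip SNPs with no calls in some group
        scores[snp] = gst(freqs)
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:n_keep]


if __name__ == "__main__":
    # Toy data: snp1 is fixed for different alleles in the two groups.
    toy = {
        "MW1": {"snp1": [0, 0, 0, 0], "snp2": [0, 1, 2, 1]},
        "AW": {"snp1": [2, 2, 2, 2], "snp2": [1, 1, 0, 2]},
    }
    print(outlier_snps(toy, top_fraction=0.5))  # ['snp1']
```

In the toy data, snp1 is completely differentiated (G_ST = 1) while snp2 has identical frequencies in both groups (G_ST = 0), so only snp1 is flagged. A percentile cutoff always flags some SNPs, which is one reason dedicated genome-scan methods prefer model-based significance.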


Bio-climatic data analysis

The bio-climatic variables downloaded from the WorldClim database mostly describe temperature and rainfall patterns over the year. These variables were developed to provide biologically informative predictors for species distribution modeling and landscape genomics approaches. In our analyses, the 19 bio-climatic variables showed a high degree of correlation, particularly between similar variables such as bio_14 (precipitation of the driest month) and bio_17 (precipitation of the driest quarter), or bio_13 (precipitation of the wettest month) and bio_16 (precipitation of the wettest quarter) (S1 Fig).

The loading plot on the first two PCs showed some correlations between bio-climatic variables and principal components, as well as strong correlations between some of the bio-
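Because highly correlated bio-climatic variables carry largely redundant information, a common step is to flag strongly correlated pairs and retain a reduced subset. The sketch below computes pairwise Pearson correlations and greedily keeps a variable only if its absolute correlation with every already-kept variable is below a threshold. The 0.8 threshold, the sorted-name processing order and the toy values are assumptions; the criterion actually used to select variables in the study is not described in this excerpt.

```python
"""Flag and prune strongly correlated bio-climatic variables (sketch).

Assumptions: |r| >= 0.8 counts as "strong", variables are processed in
sorted-name order, and each variable is a list with one value per accession.
"""
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def correlated_pairs(env: Dict[str, List[float]],
                     threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """All variable pairs with |r| >= threshold, with r rounded to 3 decimals."""
    pairs = []
    for a, b in combinations(sorted(env), 2):
        r = pearson(env[a], env[b])
        if abs(r) >= threshold:  # NaN compares False, so constant variables never match
            pairs.append((a, b, round(r, 3)))
    return pairs


def prune(env: Dict[str, List[float]], threshold: float = 0.8) -> List[str]:
    """Greedily keep a variable only if |r| < threshold with all kept ones."""
    kept: List[str] = []
    for name in sorted(env):
        if len(set(env[name])) < 2:
            continue  # constant variables carry no information
        if all(abs(pearson(env[name], env[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical values for four accessions.
    env = {
        "bio_14": [10, 20, 30, 40],
        "bio_17": [35, 65, 95, 125],
        "bio_3": [50, 40, 55, 45],
    }
    print(correlated_pairs(env))  # [('bio_14', 'bio_17', 1.0)]
    print(prune(env))             # ['bio_14', 'bio_3']
```

Greedy pruning is order-dependent: processing bio_17 before bio_14 would keep bio_17 instead, so in practice the retained variable is often chosen on biological grounds.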

Genome-wide association analysis
A genome-wide association analysis identified 49 genes associated with the bio-climatic variables selected for this analysis. Except for the bio_18 variable (Precipitation of Warmest Quarter), for which no associations were detected, the other variables were associated with at least one gene. The bio-climatic variables with the highest number of associated genes were bio_3 (Isothermality) with 29 genes, and bio_12 (Annual precipitation) with 11 genes (S2 Table).
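To make the association step concrete, the sketch below tests each SNP (genotypes coded 0/1/2) for a linear correlation with one bio-climatic variable, converts Pearson's r to a two-sided p-value with Fisher's z-transformation under a normal approximation, and applies a Bonferroni threshold. This is a deliberately simplified, hypothetical stand-in and not the model used in the study: it ignores population structure and kinship, which genotype-environment association methods normally model to avoid spurious hits.

```python
"""Simplified SNP-by-environment association scan (illustrative sketch).

Assumptions: genotypes are 0/1/2 alternative-allele counts with no missing
data, one environmental value per accession (the variable must vary and
n > 3), Pearson correlation tested via Fisher's z (normal approximation),
Bonferroni correction. Population structure and kinship are NOT modeled.
"""
from math import atanh, sqrt
from statistics import NormalDist
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def association_pvalue(genotypes: List[int], env: List[float]) -> float:
    """Two-sided p-value for H0: r = 0, via Fisher's z-transformation."""
    n = len(genotypes)
    r = pearson(genotypes, env)
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite when |r| = 1
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def scan(snps: Dict[str, List[int]], env: List[float],
         alpha: float = 0.05) -> List[Tuple[str, float]]:
    """SNPs significant after Bonferroni correction, sorted by p-value."""
    # Monomorphic SNPs cannot be tested and are excluded from the count m.
    tested = {s: g for s, g in snps.items() if len(set(g)) > 1}
    if not tested:
        return []
    cutoff = alpha / len(tested)
    hits = []
    for snp_id, g in tested.items():
        p = association_pvalue(g, env)
        if p < cutoff:
            hits.append((snp_id, p))
    return sorted(hits, key=lambda t: t[1])


if __name__ == "__main__":
    # Hypothetical annual precipitation for ten accessions and three SNPs.
    env = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
    snps = {
        "snp_a": [0, 0, 0, 0, 1, 1, 2, 2, 2, 2],  # tracks the gradient
        "snp_b": [0, 2, 0, 2, 0, 2, 0, 2, 0, 2],  # weakly related
        "snp_c": [1] * 10,                        # monomorphic, skipped
    }
    for snp_id, p in scan(snps, env):
        print(snp_id, f"{p:.2e}")
```

With about 20K SNPs the Bonferroni cutoff becomes very strict; published analyses often use false discovery rate control instead, and a structured sample such as this one would additionally require correction for population structure.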

The associated genes were located on 10 of the 11 chromosomes of the common bean reference genome; chromosome Pv06 carried no significantly associated SNPs. Some of the genes were associated with more than one bio-climatic variable (Fig 5), suggesting that they could be involved in responses to multiple environmental stimuli.
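Because a single gene can be associated with several bio-climatic variables, per-variable gene counts (such as 29 for bio_3 and 11 for bio_12) need not sum to the number of distinct genes. The sketch below tallies (gene, variable) association pairs, counting distinct genes overall and per variable and listing genes associated with more than one variable. The gene identifiers and the pair-based input format are hypothetical.

```python
"""Summarize gene-by-variable associations (illustrative sketch).

Assumption: associations arrive as (gene_id, variable) pairs, possibly with
duplicates when several significant SNPs fall in the same gene.
"""
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def summarize(
    assocs: Iterable[Tuple[str, str]]
) -> Tuple[int, Dict[str, int], Dict[str, List[str]]]:
    """Return (distinct genes, genes per variable, multi-variable genes)."""
    by_gene: Dict[str, Set[str]] = defaultdict(set)
    by_var: Dict[str, Set[str]] = defaultdict(set)
    for gene, var in assocs:
        by_gene[gene].add(var)  # sets collapse duplicate SNP-level hits
        by_var[var].add(gene)
    genes_per_var = {v: len(genes) for v, genes in by_var.items()}
    multi_var_genes = {g: sorted(vs) for g, vs in by_gene.items() if len(vs) > 1}
    return len(by_gene), genes_per_var, multi_var_genes


if __name__ == "__main__":
    toy = [
        ("geneA", "bio_3"), ("geneA", "bio_12"),
        ("geneB", "bio_3"), ("geneB", "bio_3"),  # two SNPs in one gene
        ("geneC", "bio_12"),
    ]
    n_genes, per_var, multi = summarize(toy)
    print(n_genes, per_var, multi)
```

Here the per-variable counts sum to 4 while only 3 distinct genes are involved, because geneA is counted under both bio_3 and bio_12.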

Of these 49 genes identified by genome-wide association analysis, only 10 (20%) were located within a haplotype block (S2 Table). In addition, when the significant SNPs identified by association analysis were mapped to the latest reference genome (v2.1), 44 genes (90%) were confirmed as putatively associated with environmental variables in this assembly as well (S2 Table). Four of the five missing genes were not present in the v2.1 annotation file.
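Deciding whether an associated gene lies within a haplotype block reduces to an interval lookup. The sketch below assumes, hypothetically, that each gene is represented by the position of its significant SNP and that blocks are given per chromosome as sorted, non-overlapping, inclusive (start, end) base-pair intervals; it uses binary search to find the last block starting at or before the SNP and reports how many genes fall inside a block. All coordinates in the example are invented.

```python
"""Check whether associated SNPs fall inside haplotype blocks (sketch).

Assumptions: blocks are per-chromosome lists of inclusive (start, end)
base-pair intervals, sorted by start and non-overlapping; each gene is
represented by the position of its significant SNP.
"""
from bisect import bisect_right
from typing import Dict, List, Tuple

Blocks = Dict[str, List[Tuple[int, int]]]


def in_block(blocks: Blocks, chrom: str, pos: int) -> bool:
    intervals = blocks.get(chrom, [])
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    return i >= 0 and pos <= intervals[i][1]


def fraction_in_blocks(blocks: Blocks,
                       hits: List[Tuple[str, str, int]]) -> Tuple[int, float]:
    """hits: (gene_id, chrom, snp_pos). Returns (count inside, fraction)."""
    inside = sum(in_block(blocks, chrom, pos) for _, chrom, pos in hits)
    return inside, (inside / len(hits) if hits else 0.0)


if __name__ == "__main__":
    blocks = {"Pv01": [(100, 500), (1000, 1500)], "Pv02": [(200, 300)]}
    hits = [("gA", "Pv01", 450), ("gB", "Pv01", 700),
            ("gC", "Pv02", 300), ("gD", "Pv03", 10)]
    print(fraction_in_blocks(blocks, hits))  # (2, 0.5)
```

The `starts` list is rebuilt on every call for simplicity; for genome-wide use it would be precomputed once per chromosome.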

Among the genes significantly associated with one or more bio-climatic variables, several were related to hormone response, ion homeostasis, plant development, metabolism, and stress response, in particular drought (S2 Table). Among

The allele frequency distributions of the candidate genes identified by the genome scan showed drastic differentiation between the genetic groups (Fig 6), as expected under the assumptions of the genome scan approach, with some alleles private to a single genetic group (e.g., the alternative alleles of GRP and CAO, which were observed only in the AW group). In contrast, the genes identified by association analysis showed widely varying allele frequency distributions across the genetic groups (Fig 7), even though some genes carried a single allele in some populations (e.g., only the reference allele of PLD and TRX was present in the PhI and AW groups). In general, the genes identified by association analysis showed greater variation in allele frequencies among the different MW groups.

was shown again to be the most diverged group from the Mesoamerican and Andean gene pools, especially along the molPC2 and molPC3 axes (Fig 2), further supporting the hypothesis that this