PT - JOURNAL ARTICLE
AU - Matheus D Krause
AU - Kaio O G Dias
AU - Asheesh K Singh
AU - William D Beavis
TI - Using large soybean historical data to study genotype by environment variation and identify mega-environments with the integration of genetic and non-genetic factors
AID - 10.1101/2022.04.11.487885
DP - 2022 Jan 01
TA - bioRxiv
PG - 2022.04.11.487885
4099 - http://biorxiv.org/content/early/2022/05/11/2022.04.11.487885.short
4100 - http://biorxiv.org/content/early/2022/05/11/2022.04.11.487885.full
AB - Soybean (Glycine max (L.) Merr.) provides plant-based protein for global food production and is extensively bred to create cultivars with greater productivity in distinct environments. Plant breeders evaluate new soybean genotypes using multi-environment trials (METs). The use of METs assumes that trial sites provide representative environmental conditions that cultivars are likely to encounter when sold to farmers. Thus, it is important to understand the patterns of genotype by environment interactions (GEI) that occur in METs. To evaluate GEI for soybean seed yield and identify mega-environments, a retrospective analysis was conducted on historical data for 39,006 unique experimental soybean genotypes evaluated in preliminary and uniform trials conducted by public plant breeders from 1989 to 2019. Mega-environments (MEs) were identified using yield records of lines from the annual trials together with geographic, soil, and meteorological records at the trial locations. Results indicate that yield variation was mostly explained by location effects and location by year interaction effects. The static portion of the GEI represented 26.30% of the total yield variance. Estimates of variance due to genotype by location interaction effects were greater than estimates of variance due to genotype by year interaction effects. A trend analysis further indicated a two-fold increase in the genotypic variance.
Furthermore, the heterogeneous estimates of genotypic, genotype by location, genotype by year, and genotype by location by year variances were captured by distinct probability distributions. The observed target population of environments (TPE) can be divided into at least two and at most three MEs, suggesting that the response to selection can be improved by selecting directly within clusters (i.e., regions or MEs) rather than across regions. Clusters obtained using phenotypic data, latitude, and soil variables plus elevation were the most effective.

Highlights
A target population of environments can be split into mega-environments (MEs) according to phenotypic, geographic, and meteorological information.
Reliable estimates of variance components are key to the identification of MEs and can be obtained by analyses of historical experimental data.
From experimental soybean seed yields evaluated across 31 years of field trials, the phenotypic variance was mostly attributed to location and location by year effects. In terms of genotype by environment interactions (GEI), the estimated variance of genotype by location interactions was more important than that of genotype by year interactions.
The GEI trend was successfully captured in terms of parametric probability distributions of variance components, which can be incorporated in simulation studies.

Competing Interest Statement
The authors have declared no competing interest.