Abstract
Key message
Development of models to predict genotype by environment interactions, in unobserved environments, using environmental covariates, a crop model and genomic selection. Application to a large winter wheat dataset.
Abstract
Genotype by environment interaction (G*E) is one of the key issues when analyzing phenotypes. The use of environment data to model G*E has long been a subject of interest but is limited by the same problems as those addressed by genomic selection methods: a large number of correlated predictors, each explaining a small amount of the total variance. In addition, non-linear responses of genotypes to stresses are expected to further complicate the analysis. Using a crop model to derive stress covariates from daily weather data for predicted crop development stages, we propose an extension of the factorial regression model to genomic selection. This model is further extended to the marker level, enabling the modeling of quantitative trait loci (QTL) by environment interaction (Q*E) on a genome-wide scale. A newly developed ensemble method, soft rule fit, was used to improve this model and capture non-linear responses of QTL to stresses. The method was tested using a large winter wheat dataset, representative of the type of data available in a large-scale commercial breeding program. Accuracy in predicting genotype performance in unobserved environments for which weather data were available increased by 11.1% on average, and the variability in prediction accuracy decreased by 10.8%. By leveraging agronomic knowledge and the large historical datasets generated by breeding programs, this new model provides insight into the genetic architecture of genotype by environment interactions and could predict genotype performance based on past and future weather scenarios.
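The marker-level factorial regression idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: plain ridge regression stands in for the genomic selection machinery, the crop model and soft rule fit are omitted, and all data, dimensions, and names (`interaction_features`, `ridge_fit`, `M`, `Z`) are hypothetical. Each marker is multiplied by each environmental stress covariate to form Q*E predictors, and performance is then predicted for an environment held out of training.

```python
import numpy as np

def interaction_features(markers, covariates):
    """Products of every marker with every environmental covariate, per record.

    markers:    (n_obs, n_markers) marker scores of the genotype in each record
    covariates: (n_obs, n_cov) stress covariates of the environment in each record
    returns:    (n_obs, n_markers * n_cov), ordered marker-major
    """
    n_obs = markers.shape[0]
    return (markers[:, :, None] * covariates[:, None, :]).reshape(n_obs, -1)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X'X + lam * I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated toy data (assumption: nothing here comes from the wheat dataset)
rng = np.random.default_rng(0)
n_geno, n_env, n_markers, n_cov = 40, 6, 15, 3
M = rng.choice([-1.0, 0.0, 1.0], size=(n_geno, n_markers))  # hypothetical SNP scores
Z = rng.normal(size=(n_env, n_cov))                          # hypothetical stress covariates

# One record per genotype-environment pair
g_idx, e_idx = np.meshgrid(np.arange(n_geno), np.arange(n_env), indexing="ij")
g_idx, e_idx = g_idx.ravel(), e_idx.ravel()

X_main = M[g_idx]                                # marker main effects
X_int = interaction_features(M[g_idx], Z[e_idx])  # marker-by-covariate (Q*E) terms
X = np.hstack([X_main, X_int])

# Simulated phenotype: main effects + Q*E effects + noise
main_effects = rng.normal(scale=0.5, size=n_markers)
qxe_effects = rng.normal(scale=0.3, size=n_markers * n_cov)
y = X_main @ main_effects + X_int @ qxe_effects + rng.normal(scale=0.5, size=g_idx.size)

# Hold out the last environment entirely, mimicking an unobserved environment
train = e_idx != n_env - 1
test = ~train
beta = ridge_fit(X[train], y[train], lam=1.0)
y_hat = X[test] @ beta

# Prediction accuracy as the correlation between predicted and simulated values
accuracy = np.corrcoef(y_hat, y[test])[0, 1]
print(f"accuracy in the held-out environment: {accuracy:.2f}")
```

Because the held-out environment contributes only its covariates, prediction there relies entirely on the estimated marker and marker-by-covariate effects, which is the property that makes prediction in unobserved environments possible when weather data are available.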
Abbreviations
- BLUP: Best linear unbiased predictor
- GBLUP: Genomic estimated best linear unbiased predictor
- GEBV: Genomic estimated breeding value
- G*E: Genotype by environment interactions
- GS: Genomic selection
- MET: Multi-environment trials
- QTL: Quantitative trait locus
- Q*E: QTL by environment interaction
- SGL: Sparse group lasso
- SNP: Single nucleotide polymorphism
- TPE: Target population of environments
Acknowledgments
We thank Pierre Martre for providing the crop model. The reviewers provided excellent comments that significantly improved the paper. JRC-MARS (Meteorological Data Base, EC, JRC) provided access to the interpolated meteorological data. This research was supported in part by USDA-NIFA-AFRI grants, award numbers 2009-65300-05661, 2011-68002-30029, and 2005-05130, and by Hatch project 149-449. Limagrain Europe provided financial support for N. Heslot.
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical standards
The experiments comply with the current laws of the countries in which they were performed.
Additional information
Communicated by A. E. Melchinger.
Cite this article
Heslot, N., Akdemir, D., Sorrells, M.E. et al. Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions. Theor Appl Genet 127, 463–480 (2014). https://doi.org/10.1007/s00122-013-2231-5
DOI: https://doi.org/10.1007/s00122-013-2231-5