ABSTRACT
In the cereal crop sorghum (Sorghum bicolor) inflorescence morphology variation underlies yield variation and confers adaptation across precipitation gradients, but its genetic basis is poorly understood. Here we characterized the genetic architecture of sorghum inflorescence morphology using a global nested association mapping (NAM) population (2200 recombinant inbred lines) and 198,000 phenotypic observations from multi-environment trials for four inflorescence morphology traits (upper branch length, lower branch length, rachis length, and rachis diameter). Trait correlations suggest that lower and upper branch length are under largely independent genetic control, while lower branch length and rachis diameter are pleiotropic. Joint linkage and genome-wide association mapping revealed an oligogenic architecture with 1–22 QTL per trait, each explaining 0.1%–5.0% of variation across the entire NAM population. Overall, there is a significant enrichment (2.4-fold) of QTL colocalizing with homologs of grass inflorescence genes, notably with orthologs of maize (Ramosa2) and rice (Aberrant Panicle Organization1, TAWAWA1) inflorescence regulators. In global georeferenced germplasm, allelic variation at the major inflorescence morphology QTL is significantly associated with precipitation gradients, consistent with a role for these QTL in adaptation to agroclimatic zones. The findings suggest that global inflorescence diversity in sorghum is largely controlled by oligogenic, epistatic, and pleiotropic variation in ancestral regulatory networks. This genotype-phenotype trait dissection in global germplasm provides a basis for genomics-enabled breeding of locally-adapted inflorescence morphology.
INTRODUCTION
Understanding the genetic architecture of complex traits in crops provides insights into crop evolution and guidance on breeding strategies. Adaptive traits are phenotypic characteristics that are subject to selection towards an optimum for a particular environment (Barrett and Hoekstra, 2011). Genetic architecture describes the structure of the genotype-phenotype map for complex traits in populations: the number of loci, distribution of effect size, frequencies of alleles, gene action (dominance and epistasis), and the degree of linkage and pleiotropy (Holland, 2007). Trait variation in a population may shift from oligogenic to polygenic architecture as a population moves towards an optimum (i.e. Fisher’s geometric model) (Orr, 2005; Tenaillon, 2014). Thus, characterizing genetic architecture of complex traits is key step to bridge theoretical understanding (e.g. evolutionary, metabolic, or developmental drivers) and applied outcomes (e.g. crop and livestock breeding strategies, or management of human genetic disorders) (Hansen, 2006; Timpson et al., 2018). For instance, molecular breeding strategies are guided by genetic architecture, with marker-assisted backcross for monogenic traits, marker-assisted recurrent selection for oligogenic traits, and genomic selection for polygenic traits (Bernardo, 2008).
Divergence of adaptive traits often results in genetic differentiation and population structure that hinders effective characterization of their genetic architecture (Myles et al., 2009; Brachi et al., 2011). Genome-wide association studies (GWAS) in diverse natural populations have been widely used to characterize genetic architecture but are limited by a fundamental tradeoff when causative variants (i.e. the oligogenic component) are confounded with polygenic variation. Models without population and/or kinship terms partition the colinear variance into the monogenic/oligogenic term (leading to false positive associations) while models with population and/or kinship terms partition colinear variation into polygenic terms (leading to false negatives) (Bergelson and Roux, 2010). In a nested association mapping (NAM) population, controlled crosses between the common parent and diverse founders breaks up population structure, increasing power for QTL detection (Myles et al., 2009). In addition, the larger population size in most NAM populations mitigates the Beavis effect, the overestimation of QTL effect size that occurs in small populations (Utz et al., 2000). NAM has greatly facilitated the characterization of genetic architecture in species where controlled crosses are feasible, including many major crops (Buckler et al., 2009; Maurer et al., 2015; Bajgain et al., 2016; Bouchet et al., 2017).
Inflorescence morphology is a key component of crop adaptation and yield (Harlan and de Wet, 1972; Cooper et al., 2014). Homologous variation of inflorescence among cereals has long been noted (Vavilov, 1922) and inflorescence morphology has been a valuable system to investigate the evolutionary dynamics and molecular basis of genetic architecture in plants (Hermann and Kuhlemeier, 2011; Zhang and Yuan, 2014). Analysis of inflorescence mutants has revealed regulatory networks with genes controlling hormonal biosynthesis, hormone transport, signal transduction, and transcriptional regulation (Zhang and Yuan, 2014). Comparative studies indicate that components of inflorescence regulatory networks are largely conserved across grass species, but that substantial variation in ancestral regulatory networks exists within and among species (Kellogg, 2007; Barazesh and McSteen, 2008; Tanaka et al., 2013; Huang et al., 2017). However, since most inflorescence regulators were identified via mutant screens, the role of these ancestral genes in natural variation or adaptive divergence of inflorescence morphology is not well understood (Brown et al., 2011; Crowell et al., 2016; Wu et al., 2016). In addition, studies of natural variation may reveal genes not yet identified via mutant analysis.
Sorghum is a source of food, feed, and bioenergy in many parts of the world, especially important to smallholder farmers in semi-arid regions (National Research Council, 1996). Sorghum has diffused to contrasting agroclimatic zones, and harbors abundant variation in traits such as height, leaf architecture, and inflorescence morphology. Variation in inflorescence morphology underlies yield components (Brown et al., 2006; Witt Hmon et al., 2013) and local adaptation to agroclimatic zones defined by precipitation gradients (De Wet et al., 1972; Olatoye et al., 2018). The five major botanical races of sorghum are largely defined based on inflorescence morphology (Harlan and de Wet, 1972). For instance, guinea sorghums with long open panicles predominate in humid zones while durra sorghums with short compact panicles predominate in arid zones (Lasky et al., 2015; Olatoye et al., 2018). However the genetic architecture of inflorescence morphology in sorghum is poorly understood, and no genes have been characterized (Brown et al., 2008; Witt Hmon et al., 2013). In this study we used nested association mapping to characterize the genetic architecture of inflorescence morphology variation in global sorghum. Our results reveal that global sorghum inflorescence variation is under the control of oligogenic, epistatic, and pleiotropic loci, consistent with Fisher’s geometric model under disruptive selection.
MATERIALS AND METHODS
Plant materials and phenotyping
The sorghum NAM population was derived from a cross between an elite U.S. common parent RTx430 and 10 diverse founders that originated from different agroclimatic zones, thereby capturing a wide genetic and morphological diversity (Supplementary Table 1, Supplementary Figure 1) (Bouchet et al., 2017). Each diverse parent and its RILs represent a family of 200–233 RILs making a total of 2220 RILs in the population. Field phenotyping experiments were conducted in semi-arid (Hays, Kansas; 38.86, −99.33) and humid-continental (Manhattan, Kansas; 39.21, −96.59) environments for two years (2014 and 2015) to represent a typical range of growing conditions. Each site-by-year was regarded as one environment (Table 1). In the second year (2015), RILs were randomized within maturity blocks of families in a row-column design based the first-year flowering data. Each row (corresponding to a plot) was 3 m with 1 m alleys between ranges.
The NAM RILs were phenotyped at F6:7 and F6:8 generations for upper primary branch length (UBL), lower primary branch length (LBL), rachis length (RL), and rachis diameter (RD) (Supplementary Figure 2). Three random panicles were collected from each plot after physiological maturity and subsequently used for phenotyping. Inflorescence morphology traits were measured using barcode rulers (1 mm precision) and barcode readers (Motorola Symbol CS3000 Series Scanner, Chicago IL, USA). RL was measured as the distance from apex of the panicle to the point of attachment of the lowest rachis lower primary branch (Brown et al., 2006). RD was measured using a digital Vernier caliper as the diameter of the peduncle at the point of attachment of the bottommost rachis lower primary branch. For UBL, three primary branches were randomly detached from the apex of the panicle. For LBL, three primary branches were randomly detached from the region closest to the peduncle for two panicles (Supplementary Figure 2).
Genomic data analysis
Genotyping-by-sequencing of the NAM population and diverse global germplasm was previously described (Bouchet et al., 2017; Hu et al., 2019). Briefly, Illumina sequence reads were aligned to the BTx623 reference genome version 3 using Burrow Wheeler Aligner 4.0 and SNP calling was done using TASSEL-GBS 5.0 (Glaubitz et al., 2014). For the current study, missing data imputation was done in two stages using Beagle 4 (Browning and Browning, 2013). The NAM population and the sorghum association mapping population (SAP) GBS data were first extracted from the build. Filtering was conducted to remove markers with (i) tri-allelic SNPs, (ii) missing data in more than 80% of individuals, or (iii) < 3% minor allele frequency prior to imputation. The NAM population and SAP were imputed separately and each germplasm set was filtered for MAF > 0.05. NAM RILs with >10% heterozygosity were dropped from the analysis.
Phenotype and heritability analysis
Phenotypic data analysis was carried out using R programming language and SAS (SAS Institute Inc., Cary, NC, USA). All traits were tested for normality and the only trait (UBL) with significantly skewed distribution was log transformed. Analysis of variance was performed for each trait using aov function in R. The best linear unbiased prediction (BLUP) of each trait was estimated using data from five environments with lmer function in LME4 package in R (Bates et al., 2014) with genotype, environment, and genotype-environment interactions fitted as random effects (Wu et al., 2016). The variance components used for broad sense heritability (H2) were estimated using the maximum likelihood method by PROC VARCOMP of the SAS software (SAS Institute Inc., Cary, NC, USA). RIL-nested-within-family and RIL-nested-within-family by environment interaction were fit as random effects. The resulting variance components were used to estimate the broad sense heritability (H2) following equation 1 in (Hung et al., 2012) as: where is the variance component of RILs nested within family p, nenvlp is the harmonic mean of the number of environments in which each RIL was observed, and nplotp the harmonic mean of the total number of plots in which each RIL. Pearson pairwise correlation between traits was estimated using the residuals derived from fitting a linear model for family and trait phenotypic means: where y is the vector of phenotypic data, μ is the overall mean, γi is the term for the NAM families, and εij is the residual.
Joint linkage mapping
Joint linkage analysis was performed using 92,391 markers and 2220 RILs. This approach is based on forward inclusion and backward elimination stepwise regression approaches implemented in TASSEL 5.0 stepwise plugin (Glaubitz et al., 2014). The family effect was accounted for as a co-factor in the analysis. First, a nested joint linkage (NJL) model was fitted where markers were nested within families (Poland et al., 2011; Würschum et al., 2012). In addition, a non-nested joint linkage model (JL), where markers were not nested within families, was used due to its higher predictive power than NJL (Würschum et al., 2012). Entry and exit Ftest values were set to 0.001 and based on 1000 permutations, the P-value threshold was set to 1.84 × 10−6. The JL model was specified as: where b0 is the intercept, uf is the effect of the family of founder f obtained in the cross with the common parent (RTx430), αf is the coefficient matrix relating uf to y, bi is the effect of the ith identified locus in the model, xi is the incidence vector that relates bi to y and k is the number of significant QTL in the final model (Yu et al., 2008).
Genome-wide association studies
GWAS was performed for all traits using 92,391 markers and 2220 RILs using BLUPs adjusted by environments. The multi-locus-mixed model (MLMM) approach (Segura et al., 2012) implemented in R was used for GWAS in the NAM population, as described previously (Bouchet et al., 2017). The MLMM approach performs stepwise regression involving both forward and backward regressions, accounts for major loci and reduces the effect of allelic heterogeneity. The family effect was fitted as a co-factor and a random polygenic term (kinship relationship matrix) was also accounted for in the MLMM model. Bonferroni correction with α = 0.05 was used to determine the cut-off threshold for each trait association (α/total number of markers = 5.4 × 10−7).
For comparison with NAM, GWAS was performed in the sorghum association panel (SAP; 334 accessions) (Casa et al., 2008) using general linear model (GLM) and compressed mixed linear model (CMLM) with the GAPIT R package (Lipka et al., 2012) to match a previous study (Morris et al., 2013). The GLM (naive model) did not account for population structure and was specified as: where y is the vector of phenotypes, α is a vector of SNPs effects, and e is the vector of residual effects, and S is the incident matrix of 1s and 0s relating y to α. The CMLM model (full model) accounted for population structure and polygenic background effects (kinship) was specified as: where y is the vector of phenotype, and u is a vector of random genetic background effects. X, Q, and Z are incident matrices of 1s and 0s relating y to β and u (Yu et al., 2006). The phenotypic data in the SAP used for GWAS is from a previous study (Brown et al., 2008; Morris et al., 2013). A custom script was used to identify QTL that overlapped within a 50 kb window between the NAM and GWAS (GLM or CMLM) mapping results for LBL and RL.
Effect size and allele frequency estimation
Allele frequencies at the SNPs were calculated using snpStats package in R (Clayton 2015). The additive effect size of QTL within and across families were estimated as the difference between the mean of the two homozygous classes for each QTL divided by two. The additive effect of each QTL was estimated relative to RTx430. The sum of squares of QTLs divided by the total sum of squares gave the proportion of variance explained. To estimate within-family variation explained by each QTL, a regression model was fit with terms for family and QTL nested within family as fixed effects (Würschum et al., 2012): where yijkl is the phenotype, γi is the family term, ωjk is the term for QTL nested within family, and εijk is the residual.
Grass homologs search around identified loci and enrichment analysis
A set of known inflorescence genes (n = 29) that control inflorescence morphology in grasses were compiled from literature consisting of 17 maize genes, eight rice genes, three sorghum genes, and one foxtail millet gene (Supplementary File 1). Based on this candidate gene set, 135 sorghum homologs were found using Phytozome (Goodstein et al., 2012). A custom R script was used to search for homologs within 150 kb window upstream and downstream of each association, based on the LD decay rate in the NAM population (Hu et al., 2019). The enrichment of a priori genes around identified QTL was performed using chi square test to compare a priori gene colocalization around identified QTL with colocalization of QTL with random genes from the sorghum genome version 3.1 gff3 file on Phytozome.
Geographic allele distribution and climatic association of inflorescence QTL
The distribution of the alleles of QTL markers underlying inflorescence variation was visualized on a global geographic map with national boundaries. Markers of QTL associated with inflorescence morphology that also colocalized with sorghum orthologs of maize and rice inflorescence genes were first selected from the genetic data of a global georeferenced sorghum germplasm (number of accession = 2227, number of SNP markers = 431691). The alleles of the NAM markers found in the georeferenced germplasm were then plotted based on the geographic coordinates of the individuals in which they are present. Furthermore, climatic genome wide association was performed between the annual mean precipitation and the genetic data of the georeferenced sorghum global germplasm using both the naive model (GLM) and the mixed model that accounted for kinship only.
Data availability
Phenotype and genotype data are available at FigShare: [add after acceptance]. File S1 contains detailed descriptions of QTL information, a priori gene list and a priori genes that colocalized with QTL. File S2 contains heatmap of QTL effects within NAM families. File S3 contains detailed description of associations that colocalized between NAM, GLM, and CMLM and results of association of inflorescence QTL markers with precipitation for both GLM and MLM. The NAM population seeds are available from the USDA National Plant Germplasm System (https://www.ars-grin.gov/). Raw sequencing data for the NAM population are published (Bouchet et al., 2017) and available in the NCBI Sequence Read Archive under project accession SRP095629 and on Dryad Digital Repository (doi:10.5061/dryad.gm073).
RESULTS
Variation of inflorescence morphology in the NAM population
Phenotypic measurements were collected for four inflorescence morphology traits across five environments, representing over 198,000 observations. The number of RILs in each family ranged from 202 in the Segaolane family to 233 in the SC265 family (Supplementary Table 1). Significant genotypic differences were observed for all four inflorescence traits (Table 2). The broad-sense heritability estimates for all four traits were high, ranging from 0.59 to 0.92. The SC265 and SC283 families had the longest lower branches (mean across RILs of 99 mm). The SC283 family had the longest upper branches (mean across RILs of 64 mm). The SC265 and Segaolane families had the longest rachis, with mean lengths of 316 mm and 305 mm, respectively. The largest rachis diameters were observed in the Ajabsido, Macia, and SC35 families (a mean of ~9.5 mm across RILs in each family). Phenotypic variation distribution within families showed that in some families the mean trait value of the RILs was greater than the mean of either parent (Figure 1). The highest trait-by-trait phenotypic correlations were for RL and LBL (r = 0.71; P-value < 0.01). By contrast, UBL and LBL had a low positive correlation (r = 0.19; P-value < 0.01), and RL had no correlation with either UBL or RD (Figure 2).
QTL variation in the NAM population
A total of 116,405 SNPs were obtained after SNP calling, imputation, and filtering (minimum MAF = 5%). After filtering for 0.96 inbreeding coefficient, a total of 92,391 markers were identified. Significant QTL associations were observed for all traits when using MLMM, JL, and NJL models (Figure 3, Supplementary Figure 3–4). MLMM identified nine significant associations in total for all traits. The JL model identified 81 QTL, while the NJL model identified 40 QTL across all traits (Supplementary File 1 and Supplementary Table 2). Allele frequencies at the QTL ranged from 0.05 to 0.48 (Supplementary File 1). The proportion of within-family variation explained by all QTL (i.e. an estimate of the oligogenic component) varied substantially among traits, with 12%, 37%, 31%, and 21% of variation explained by QTL for UBL, LBL, RL, and RD, respectively. Within-family and across-family effect of each QTL for NJL and JL models were estimated relative to RTx430 (Supplementary File 2). LBL QTL qSbLBL7.5960 explained the largest proportion of variation among all QTL identified in this study (Table 3).
QTL colocalization and enrichment with a priori candidate genes
To assess the overall role of variation at ancestral inflorescence regulators, we performed colocalization and enrichment analysis between the QTL and a set of a priori candidate genes containing sorghum homologs of rice, maize, and foxtail millet genes (n = 135). NAM QTL were significantly enriched for colocalization with a priori candidate genes (2.4-fold enrichment; P-value < 0.001). Of 123 unique QTL, 28 colocalized with a priori genes. Among the QTL that overlapped with a priori candidate genes, two QTL were inside the gene, three QTL were <15 kb from the gene, 16 unique QTL were 15–100 kb from the gene, and eight unique QTL were 100–150 kb from the genes (Table 4 and Supplementary File 1). Overall, 24 genes colocalized with inflorescence QTL, while 111 a priori candidate genes (of 135) did not overlap with any inflorescence QTL (Supplementary File 1).
Comparison of NAM and diversity panel GWAS
NAM provides an independent approach to validate GWAS QTL from diversity panels and assess the relative performance of GWAS models. We compared the inflorescence loci identified in the NAM with GWAS QTL for LBL and RL identified in the SAP, identifying colocalization (within 50 kb) between NAM QTL SNPs and top 5% SNP associations in the GLM or CMLM (Figure 4 and Supplemental File 2). For LBL, the comparison revealed 26 overlaps between NAM versus GLM, and 20 overlaps between NAM versus CMLM. For RL, the comparison revealed 17 overlaps for both NAM versus GLM and NAM versus CMLM. To identify gene candidate that are supported by multiple mapping approaches, a priori candidate genes were catalogued in overlapping NAM and GWAS QTL (Supplementary File 3). For LBL, five a priori candidate genes colocalized with overlapping NAM and GLM QTL, while two a priori candidate genes colocalized with overlapping NAM and CMLM QTL. Similarly for RL, six a priori candidate genes colocalized with overlapping NAM and GLM QTL, and six genes colocalized with overlapping NAM and CMLM QTL.
Geographic distribution of allele and environment-marker associations
A strong geographic pattern of distribution of SNP alleles associated with inflorescence morphology was observed (Figure 5). The LBL-associated C allele (S10_56303321) near the sorghum ortholog of APO1 was predominant in Horn of Africa region, Yemen, Southern Africa, Southern India, and Eastern China. The T allele was predominant in West Africa. For the marker (S3_4750709) associated with upper branch length that colocalized with the sorghum ortholog of ramosa2, the allele was predominant in India, south-eastern Africa, and Sahelian region of West Africa. While the G allele was predominant in Nigeria and tropical West Africa, and eastern China. For the LBL-associated SNP (S7_59751994) near YUCCA5 (i.e. the sparse inflorescence1 paralog), one allele was predominant in West Africa and India, while the other allele was predominant in Southeastern Africa. All the QTL markers were found to be significantly associated with precipitation under the GLM but not with the MLM that accounted for kinship (File S3).
DISCUSSION
Genetic architecture of inflorescence adaptation
Nested association mapping can help characterize the genetic architecture of adaptive traits while avoiding some pitfalls of GWAS. This sorghum NAM study provides a high-powered dissection of genetic architecture for global variation in inflorescence morphology, a key trait for adaptation across agroclimatic zones (Harlan and de Wet, 1972; Olatoye et al., 2018). Our study identified many new loci (Table 3) and provided more precise mapping of known loci (Brown et al., 2006; Witt Hmon et al., 2013). Among the known QTL is LBL QTL (qSbLBL7.598), which appears to be pleiotropic with RL (as qSbRL7.598). Previous linkage mapping studies identified association around this same Dw3 region for QTL associated with rachis length and primary branch length (Brown et al., 2006; Shehzad and Okuno, 2015) and YUCCA5 was proposed as a candidate gene for the branch length QTL (Brown et al., 2008).
The preponderance of moderate and large effect QTL for four inflorescence morphology traits suggests a predominantly oligogenic trait architecture for inflorescence variation in global sorghum diversity (Supplementary Table 2, Supplementary Figure 3–4, Supplementary File 1-2). Note, a PVE estimate that would be considered “small effect” (e.g. 1%) in a typical biparental study (e.g. 100–300 RILs) is better characterized as “moderate effect: in a NAM population, given the denominator is phenotypic variance across many diverse families and thousands of RILs. In previous studies of sorghum inflorescence in biparental populations, large effect loci (explaining up to 19% of the variance) were found, but effect size of these loci may have been upwardly biased due to the Beavis effect (Xu, 2003). The population size of the NAM (2200 RILs) used in this study should provide a more robust estimation of QTL effect size, which are expected to be accurate with population sizes >1000 (King and Long, 2017). In maize, effect size distribution of loci associated ear and tassel traits has been linked to strong directional selection during maize domestication (Brown et al., 2011; Xue et al., 2016). In sorghum, the moderate to large effect loci identified here may reflect selection towards multiple contrasting fitness optima during the adaptation to contrasting agroclimatic zones, consistent with Fisher’s geometric model under disruptive selection (Orr, 2005; Tenaillon, 2014).
Epistasis may be reflected in asymmetric transgressive variation (Rieseberg et al., 1999; Gaertner et al., 2012). The shift of the RIL means from the mid-parent value in some families, and some strongly skewed trait distributions in NAM RILs, suggest that epistasis may be pervasive (e.g. UBL in SC283 family or RD in SC1103 family; Figure 1). These results supports previous findings in a small population (Ben-Israel et al., 2012). Further evidence for epistatic interactions of additive QTL can be provided by opposite allelic effects of QTL across families (Buckler et al., 2009; Peiffer et al., 2014). Consistent with a hypothesis of gene-by-genetic background epistasis, inflorescence morphology QTL showed opposed allelic effects across families for 63% (82/131) of QTL (Supplementary File 2). Other QTL (16%) identified had consistent allelic effects in all families. These loci may influence inflorescence variation additively across multiple botanical races, or may reflect rare variants in the common parent. Further analyses to map interacting loci will be needed to characterize the role of epistasis in sorghum inflorescence variation (Chen et al., 2018).
Genetic correlation among traits due to linkage or pleiotropy can limit or promote adaptive evolution (Lynch and Walsh, 1998). LBL and RL had high phenotypic correlation (r = 0.71, P-value < 0.001) and had two major effect QTL that were in common (qSbLBL7.598/qSbRL7.598 and qSbLBL10.563/qSbRL10.563) (Figure 3). Given the large size of NAM population, and concomitant high mapping resolution, if these QTL colocalizations are not due to pleiotropy then linkage must be very tight (e.g. <2 cM). In maize, mutations in the YUCCA-family gene sparse inflorescence1 led to drastic reduction in both inflorescence rachis length and branch length (Gallavotti et al., 2008), suggesting pleiotropy as a parsimonious explanation for the genetic correlation of these traits. By contrast, the two branch length traits (LBL and UBL) had low phenotypic correlation and lack of colocalization between QTL, suggesting that they are largely under independent genetic control. Studies of the underlying molecular network (e.g. mutant analysis, spatiotemporal expression dynamics) should provide further insight on the basis of pleiotropic versus independent genetic control (Eveland et al., 2014).
Studies have shown evidence of sorghum traits are subject to local adaptation across agroclimatic zones (Morris et al. 2013; Olatoye et al. 2018; Wang et al. 2019). In this study, our results showed independent global geographic distribution of the alleles of inflorescence QTL markers that colocalized with a priori genes regulating inflorescence branch traits like lower branch length and upper branch length (Figure 5). This supports previous finding that there is an independent spread of multiple alleles influencing inflorescence traits (Morris et al. 2013). Furthermore, our results showed that these NAM inflorescence QTL markers’ alleles were not strongly associated with annual mean precipitation across global precipitation zones (Figure 5). However, there is evidence for clinal adaptation of sorghum inflorescence to regional precipitation gradient (Olatoye et al. 2018). This suggests that variation at some of these selected genes may not underlie clinal variation of inflorescence to precipitation gradient.
Our comparison of NAM and GWAS QTL highlights that naive GWAS models (GLM) can contain valuable associations signals for adaptive traits. The number of a priori candidate genes that colocalized with NAM versus GLM overlaps were higher than a priori candidate genes that colocalized with NAM versus CMLM overlaps. This finding suggests that while nominal GLM P-values are often inflated, the top associations in simple GLM may reflect true QTL that are not identified in MLM because they are colinear with polygenic variance (Figure 4 and Supplementary File 1) (Huang et al., 2010).
Evidence of variation in ancestral regulatory networks
Conserved regulatory networks underlying inflorescence development have been identified by comparative mutant and QTL studies (Kellogg, 2007; Zhang and Yuan, 2014). However, it is not yet known whether variation in these ancestral regulatory networks underlies local adaptation of inflorescence morphology. The enrichment of sorghum homologs of grass inflorescence genes at inflorescence QTL suggests that a substantial proportion of sorghum inflorescence variation is due to polymorphism in ancestral regulatory networks. Some of the a priori candidate genes that colocalized with inflorescence QTL were sorghum homologs of hormone transporters or biosynthesis enzymes that regulate inflorescence development. One example is at qSbLBL7.598/qSbRL7.598, which was centered on the intragenic region of YUCCA5 (putative flavin monooxygenase auxin biosynthesis gene) (Figure 3, Supplementary Figure 3–4). This YUCCA5 gene is a paralog of maize auxin biosynthesis gene sparse inflorescence1 (Spi1; 62% similar to Spi1) (Figure 3B). The peak SNP for this QTL is also 70 kb from the canonical sorghum height gene and auxin efflux transporter Dw3 (Sobic.007G163800) (Figure 3B and 3C).
Several other a priori candidate genes under QTL are homologs of transcription factors that regulate gene expression during inflorescence meristem differentiation in cereals. For instance, the top UBL QTL (qSbUBL3.475) colocalized with the sorghum ortholog of maize ramosa2 (ra2) encoding a C2H2 zinc-finger transcription factor (Sobic.003G052900, 92.7% similarity to ra2). In maize and sorghum, the ra2 transcript is expressed in a group of cells that predicts the position of axillary meristem formation in inflorescence (Bortiri et al., 2006; Eveland et al., 2014). The QTL qSbUBL4.585 colocalized with a sorghum paralog (Sobic.004G237300, 12% similar to BAD1) of maize Branch Angle Defective1 (BAD1), a TCP transcription factor that controls cell proliferation in tassel development (Bai et al., 2012).
An LBL and RL QTL (qSbLBL10.563/qSbRL10.563) colocalized with the sorghum ortholog of rice Aberrant Panicle Organization1 (APO1) (Sobic.010G220400, 90% similar to APO1), which encodes an F-box protein that regulates inflorescence meristem fate (Figure 3B and 3C) (Ikeda et al., 2007). Sorghum APO1 was also tagged (inside the gene) by a top branch length-associated SNP in a previous GWAS using the SAP (Morris et al., 2013), strongly suggesting this gene underlies variation for inflorescence compactness. Another RL QTL (qSbRL1.207) colocalized with the sorghum ortholog of rice TAW1 (TAWAWA1) gene (Sobic.001G219400, 70% similar to TAW1). TAW1 regulates rice inflorescence meristem development (Yoshida et al., 2013), so our study suggests it may condition natural variation for inflorescence morphology in grasses more generally.
Prospects for genomic prediction of inflorescence morphology
Obtaining locally-adaptive inflorescence morphology is an essential requirement for sorghum breeding programs globally. In field-based phenotypic selection, inflorescence morphology is directly observable prior to pollination. However, a shift to rapid-cycling genomics-enabled breeding in controlled conditions (Watson et al., 2018) would require accurate marker selection or genome prediction of inflorescence morphology along with other agronomic traits. Since the NAM founders originated from diverse agroclimatic zones, genotype-phenotype map we developed should be relevant for sorghum breeding programs globally. More diversity in inflorescence morphology is likely yet to be discovered in sorghum, since ~30% of global variation was not captured in the 11 NAM founder parents (Bouchet et al., 2017). Therefore, increasing the number of families in the NAM population should be beneficial for both increased mapping resolution and allelic diversity. Although this may increase phenotyping burden, the use of high-throughput phenotyping platforms could overcome this challenge (Crowell et al., 2016).
ACKNOWLEDGEMENTS
The study was carried out using the Beocat High-Performance Computing facility and Integrated Genomics Facility at Kansas State University. This study is contribution no. [add after acceptance] from the Kansas Agricultural Experiment Station.