Genome-wide association studies and genomic selection assays made in a large sample of cacao (Theobroma cacao L.) germplasm reveal significant marker-trait associations and good predictive value for improving yield potential

Frances L. Bekele; Gillian G. Bidaisee; Mathilde Allegre; Xavier Argout; Olivier Fouet; Michel Boccara; Duraisamy Saravanakumar; Isaac Bekele; Claire Lanaud

doi:10.1101/2021.11.22.469505

Abstract

A genome-wide association study was undertaken to unravel marker-trait associations (MTAs) between SNP markers and yield-related traits. It involved a subset of 421 cacao accessions from the large and diverse collection conserved ex situ at the International Cocoa Genebank Trinidad. An average linkage disequilibrium (r²) of 0.10 at 5.2 Mb was found across several chromosomes. Seventeen significant (P ≤ 8.17 × 10^-5 (–log10 (p) = 4.088)) MTAs of interest, which accounted for 5 to 17% of the explained phenotypic variation, were identified using a Mixed Linear Model in TASSEL version 5.2.50. The most significant MTAs identified were related to seed number and seed length on chromosome 7 and seed number on chromosome 1. Other significant MTAs involved seed length to width ratio on chromosomes 3 and 5 and seed length on chromosomes 4 and 9. It was noteworthy that several yield-related traits, viz., seed length, seed length to width ratio and seed number were associated with markers on different chromosomes, indicating their polygenic nature. Approximately 40 candidate genes that encode embryo and seed development, protein synthesis, carbohydrate transport and lipid biosynthesis and transport were identified in this study. A significant association of fruit surface anthocyanin intensity co-localised with MYB-related protein 308 on chromosome 4. Testing of a genomic selection approach revealed good predictive value (GEBV) for economic traits such as seed number (GEBV = 0.611), seed length (0.6199), seed width (0.5435), seed length to width ratio (0.5503), seed/cotyledon mass (0.6014) and ovule number (0.6325). The findings of this study could facilitate genomic selection and marker-assisted breeding of cacao thereby expediting improvement in the yield potential of cacao planting material.

Introduction

Cacao, Theobroma cacao L., Malvaceae sensu lato [1], is an important Neotropical, perennial crop, on which the thriving global cocoa and chocolate industry is based. The World Cocoa Foundation, in 2012, reported that 40-50 million people worldwide depend on cocoa for their livelihood. The global export value of cocoa beans has been fluctuating between 8 billion USD and 10.5 billion USD over the past decade. The global chocolate market was valued at USD 106.6 billion in 2019 [2].

T. cacao L. is a diploid (2n = 2x = 20), allogamous species. Its genome is small, reported by Argout et al. [3] to span 411-494 Mb. Its putative centre of genetic diversity is at the headwaters of the Amazon River, South America [4], and it is indigenous to the Amazon and Orinoco rainforests. Currently, the majority of cacao cultivation, concentrated in West Africa, is still based on traditional varieties collected in South America prior to 1950 [5, 6]. There is much scope for enhancement of cacao planting material towards realizing the full potential of the crop. The significant progress made in cacao genomics in the last decade [7] should accelerate progress in improvement of cacao planting material.

Phenotypic characterisation and evaluation of cacao, based on conventional techniques, provide a relatively inexpensive and easy method of selection of superior genotypes where attributes are easy to score by observation of morphology or simple screening procedures. However, polygenic traits with significant environmental effects on their expression are more difficult to score without a tool for identifying and following the major genes influencing the phenotypes. For these, marker-assisted and genomic selections are advantageous [8]. Some traits of economic importance in cacao, such as yield and resistance to Black Pod disease, have been characterized as polygenic [9, 10], and it is thus desirable to tag these traits with molecular markers to facilitate early selection for the desired genotype. This involves the identification of quantitative trait loci (QTLs) [11] and the establishment of linkage maps [12-19, 10, 20-28, 11, 3, 29-33].

Fouet et al. [32] mapped approximately one Simple Sequence Repeat (SSR) tag for every 2 cM on the 10 linkage groups of T. cacao L. This set of Expressed Sequence Tag SSRs included 14 candidate genes for disease resistance and quality traits. The development of a genomic resource database [34] was crucial in the construction of a high-density gene map for T. cacao L. Cacao’s phylogenetic proximity to Arabidopsis, which has a well-described gene ontology, was useful for elucidating some of the metabolic pathways investigated by Argout et al. [34]. The sequencing of the cacao genome [3, 29] should facilitate the identification of candidate genes for traits of interest. Version 2.0 of the Criollo genome of Argout et al. [29] has 99% of the assembly anchored to the 10 chromosomes of the T. cacao L. genome. This will assist researchers in more easily developing superior cacao plants with disease and pest resistance, high yield, desirable flavour, favourable flavanol (antioxidant) content, self-compatibility [7], and other traits of economic interest or with potential health benefits [35].

The availability of robust phenotypic data for hundreds of accessions at the International Cocoa Genebank Trinidad (ICGT) [36–39], along with whole reference genome sequences for cacao, allows us to fully explore genetic diversity in a large and diverse cacao collection and its relationship to phenotypic diversity, as outlined by Varshney et al. [40]. Since there is tight coverage with molecular markers along the cacao genome [3, 29, 41, 30], admixture studies and genome-wide association studies are facilitated.

Genome-wide association studies (GWAS) entail detecting significant associations between individual genetic markers, such as Single Nucleotide Polymorphisms (SNPs), linked to functional alleles from a dense genome-wide panel, with the phenotypic traits of a group of individuals [42, 43]. Population-wide associations may be detected between SNPs and causal polymorphisms (viz., QTL) that affect traits of interest such as yield. The genetic profiles of detected superior and other variants can be used for genomic prediction and selection [44, 45]. GWAS also involves searching for genotype-phenotype correlations in unrelated individuals [46, 47], and is based on non-random association of alleles in a population or linkage disequilibrium (LD) [48, 49].

Classical QTL bi-parental mapping (linkage) studies have demonstrated that some molecular markers explain a considerable amount of phenotypic variance in quantitative traits [47], but are constrained by the “paucity of large productive progenies from known parental origin for perennial crops” such as cacao [6, 50]. An advantage of GWAS over classical QTL studies is attributed to the fact that with cumulated cycles of recombination at the population level, associations are broken between a genetic factor determining a phenotypic trait and any marker that is not tightly linked to it. GWAS exploits all of the recombination events that have occurred in the evolutionary history of the germplasm under study, which allows a much higher mapping resolution compared with classical QTL mapping [51, 43]. However, factors such as selection, population admixture and family structure may result in spurious associations between phenotypes and a given marker. In this case, the phenotypes are not physically linked to the marker and co-localized genes, but are inferred to be. Robust GWAS models, using population structure as co-variate, allow the identification of mainly authentic (non-spurious) associations [43].

GWAS may have little or no advantage over QTL mapping in cases where LD is extensive [43]. In cross-pollinated crops like T. cacao L., where LD has been observed to decay rapidly in wild genotypes (within 1–2 Mbps to an r² value, a measure of LD, of about 0.1) [49], GWAS is expected to have high resolution. In this case, any marker showing a significant association with a trait is expected to be tightly linked to the gene affecting that trait.

GWAS have been used successfully for identifying phenotype-genotype associations for many traits [52–54]. In Arabidopsis thaliana, such traits include shade avoidance, heavy metal and salt tolerance, flowering time and life history traits [55, 51]. GWAS have been conducted in cacao [6] and several other crops including rice [56]; maize [57]; wheat [58]; sorghum [59]; barley [60]; rapeseed [61]; soybean [62]; peanut [63] and other plant species [64].

Allegre et al. [33] and Fouet et al. [32] identified and mapped SNPs and SSR markers useful as expressed sequence tags (ESTs) and constructed a high-density genetic map for T. cacao L. Both kinds of markers are co-dominant and thus powerful for genetic analysis. Of the 5,246 SNPs screened by Allegre et al. [33], 1536 were found corresponding to genes with putative functions. Of these, 851 SNPs displayed a distinct polymorphic pattern across a selection of cacao germplasm. The latter are SNPs located within a gene expressed sequence and are thus valuable for identifying candidate genes with functional roles in cacao. The average distance between adjacent markers, in the genetic map constructed by Allegre et al. [33], was 0.6 cM. The data are available at http://tropgenedb.cirad.fr.

The objectives of this research were to exploit the naturally occurring genetic variation in a large collection of cacao trees of diverse origin, including wild genotypes, which are conserved ex situ at the ICGT to:

Facilitate, via GWAS, the identification of SNP markers significantly associated with phenotypic traits (Marker-Trait Associations or MTAs) and putative candidate genes; and
Establish predictive values for phenotypic traits of interest, using a genomic selection approach, to examine the efficiency of this breeding strategy to improve cacao yield.

Materials and Methods

Germplasm studied

Four hundred and twenty-one (421) cacao accessions, including 263 wild genotypes (collected in the Amazon Basin [36]), were included in this study. Complete phenotypic data were available for 346 of these accessions (S1 Table). They represent 23 “accession groups”, as described by Bekele et al. [36] as well as most of the genetic groups defined by Motamayor et al. [65], which are conserved ex situ at the ICGT. Wild cacao types such as those of the AMAZ, GU, IMC, MO, NA, PA, POUND, RB, SCA and SPEC (1-54) accession groups [36], which have evolved over a long period of time, were included to improve the power of detection of associations between SNP markers and phenotypic traits of interest as recommended by Stack et al. [49].

Management of germplasm under study

The ICGT is situated at the University Cocoa Research Station, Centeno, Trinidad at an altitude of 15 m above sea level. Shade is provided by trees of Erythrina sp. planted 6 m apart, and bananas (Musa sp.) placed 4 m apart. The cacao trees are planted 1.8 m apart with typically up to 16 trees per plot for each accession. The soil type is Cunupia fine sandy clay with restricted internal drainage. Over a 30-year period from 1981, the average temperature was 26.3°C. It was 26°C for the period 1961 to 1991. This satisfied the optimal temperature requirements for growing cacao. The mean annual rainfall for 1981 to 2011 was 1945.2 mm, lower than the 2,392 mm recorded for 1961 to 1991 (Trinidad and Tobago Meteorological Office https://www.metoffice.gov.tt). The plants are irrigated as necessary during the dry season (January-June) each year. Regular weeding and pruning of the trees are undertaken. However, disease and pest control are avoided to facilitate scientific monitoring of these conditions. Fertilizers are applied at planting and regularly for only young trees. The trees are maintained within a low input system.

Phenotypic data collection

Cacao accessions were assessed in terms of 27 flower, fruit and seed traits as described by Bekele et al. and Bekele and Butler (Table 1) [36, 66]. The traits studied were found to be the most discriminative and taxonomically useful descriptors, which avoid redundancy. They were also selected for ease of observation, reliability of scoring, and, in the case of seed characters, agronomic and economic value [37, 38]. Sample collection was done at the ICGT and spanned the period 1992-2012. When possible, the full complement of samples was collected for each accession at a given time, but in most cases, samples were obtained over multiple years during the same season to preclude the effect of seasonality on phenotypic trait expression. The fruits characterized were the products of open pollination. These data are available online in the International Cocoa Germplasm Database (ICGD) (http://www.icgd.rdg.ac.uk/).

Table 1.

Descriptors used for phenotypic characterisation (n).

Yield-related traits in cacao

The yield-related traits under investigation are listed in Table 1. Since seed/cotyledon mass and seed number per fruit have been reported to have moderate to high heritability [67–69], information on pod index (the number of pods/fruits required to produce 1 kg of dried cocoa) is particularly useful to breeders. A low pod index is desirable since it is normally associated with large seed size, which is preferred by chocolate manufacturers, and is a reliable indicator of good yield potential. A maximum pod index of 16.5 fruits was set as a standard for selection in Trinidad and Tobago [70].
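The definition of pod index given above can be illustrated with a short calculation. The sketch below assumes that pod index is estimated as 1000 g divided by the product of mean seed number per fruit and mean individual dried cotyledon mass (in grams); the function name and the example values are hypothetical and are not taken from the data of this study.

```python
def pod_index(seeds_per_fruit: float, dried_cotyledon_mass_g: float) -> float:
    """Number of fruits needed to yield 1 kg (1000 g) of dried cocoa.

    Assumed estimator: 1000 / (seeds per fruit x dried mass per seed in g).
    """
    if seeds_per_fruit <= 0 or dried_cotyledon_mass_g <= 0:
        raise ValueError("inputs must be positive")
    return 1000.0 / (seeds_per_fruit * dried_cotyledon_mass_g)


# Hypothetical accession: 40 seeds per fruit, 1.25 g per dried cotyledon.
example_pod_index = pod_index(40, 1.25)

# Compare with the selection standard of 16.5 fruits quoted in the text.
meets_standard = example_pod_index <= 16.5
```

Under these assumed values each fruit yields 50 g of dried cocoa, so 20 fruits are needed per kilogram, which would not meet the 16.5-fruit standard.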

Collection of genotypic data

SNP markers that provide good coverage of the cacao genome [33] were employed in this study. The selection of 836 SNPs in coding sequences, which displayed significant similarity with known protein sequences, as described by Argout et al. [34], was carried out by Allegre et al. [33]. Illumina SNP genotyping was performed with the Illumina BeadArray platform at the French National Genotyping Centre (CNG, CEA-IG, Evry, France), according to the GoldenGate Assay manufacturer’s protocol. The genotype calling of each marker was verified using reference genotypes and filtered, as described by Argout et al. [34] and Allegre et al. [33]. The QualitySNP pipeline was used for detection of SNPs in the unigenes. All of the SNPs employed for genotyping have been identified in orthologous genes or gene families and this facilitates reference to genetic information, made available via the genome browser, CocoaGen DB (http://cocoa-genome-hub.southgreen.fr/jbrowse).

Statistical analyses

Phenotypic data analysis

Qualitative data such as fruit shape classes were first converted to binary form. The quantitative traits that were found to deviate from normality were log-transformed. Tests of normality, log transformation of data that were not normally distributed, derivation of descriptive statistics and correlation analysis of the collated phenotypic data were performed using Minitab Version 18.

Genotypic data analysis

Determining population structure

Population structure was determined to allow estimation of marker-trait associations without including spurious associations [71]. This was necessary to satisfy the independence assumption under the null hypothesis on which the marker-trait association is based [72]. The Bayesian clustering software STRUCTURE [72–75, 54] was employed for this purpose. It inferred the ancestry of individuals, expressed as coefficients of membership of the individuals across sub-populations. Individuals with coefficients of membership of less than 0.7 were classified as admixed. The resulting Q matrix was used to account for associations arising from shared evolutionary history, so that only associations closely tied to the marker and trait were retained. The correlated allele frequencies model used Markov Chain Monte Carlo (MCMC) simulations to estimate the group (cluster, K) membership of each individual studied, assuming Hardy-Weinberg and linkage equilibrium within groups, random mating within populations and free recombination between loci [72].
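The admixture rule described above can be sketched in a few lines. This is a minimal illustration, assuming that an individual is assigned to the cluster with its largest membership coefficient and is labelled admixed when that largest coefficient is below 0.7; the accession names and Q values are hypothetical.

```python
ADMIXTURE_THRESHOLD = 0.7  # membership cut-off quoted in the text


def assign_cluster(q_row, threshold=ADMIXTURE_THRESHOLD):
    """Return the 1-based cluster with the largest membership, or 'admixed'."""
    best = max(range(len(q_row)), key=lambda k: q_row[k])
    if q_row[best] < threshold:
        return "admixed"
    return best + 1


# Hypothetical rows of a Q matrix (each row sums to 1 across three clusters).
q_matrix = {
    "ACC_A": [0.92, 0.03, 0.05],
    "ACC_B": [0.45, 0.40, 0.15],
    "ACC_C": [0.10, 0.70, 0.20],
}
assignments = {name: assign_cluster(q) for name, q in q_matrix.items()}
```

Here ACC_B is labelled admixed because no single coefficient reaches 0.7, whereas ACC_C, at exactly 0.7, is retained in cluster 2.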

Multi-locus genotype data for 200 SNPs, distributed over all 10 chromosomes, with minor allele frequency (MAF) greater than 0.05 and low missing values (less than 10%) were analysed in STRUCTURE to describe and visualize population structure, based on allele frequencies of the data. The optimum K value that best defined the population structure was identified using the admixture model of ancestry, assuming correlated allele frequencies for K = 2 to 15 with 150,000 iterations during the burn-in period, 150,000 Markov Chain Monte Carlo repetitions and 10 independent runs for each genetic sub-population (K2-K15).
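The marker filter described above (MAF greater than 0.05 and fewer than 10% missing values) can be sketched as follows. The sketch assumes that genotypes are coded as counts of one allele (0, 1 or 2) with `None` for missing calls; this coding, the function names and the example markers are illustrative assumptions rather than the format actually supplied to STRUCTURE.

```python
def snp_stats(genotypes):
    """Return (minor allele frequency, missing rate) for one SNP."""
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return 0.0, missing_rate
    # Each diploid individual carries two alleles at the locus.
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq), missing_rate


def passes_filter(genotypes, min_maf=0.05, max_missing=0.10):
    """Keep SNPs with MAF above min_maf and missing rate below max_missing."""
    maf, missing = snp_stats(genotypes)
    return maf > min_maf and missing < max_missing


# Hypothetical markers scored on small panels of individuals.
snp_common = [0, 1, 2, 1, 0, 0, 1, 2, 0, 1]           # MAF 0.4, none missing
snp_rare = [0] * 19 + [1]                             # MAF 0.025
snp_sparse = [0, 1, None, None, 2, 1, 0, 1, 2, 0]     # 20% missing

kept = [passes_filter(s) for s in (snp_common, snp_rare, snp_sparse)]
```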

Analysis of inferred population structure

The STRUCTURE outputs were analysed to infer optimal K based on the method described by Pritchard [72]. The optimal K was chosen by plotting the log probability of the data, Pr (X | K), against a range of K values and selecting the one after which the curve formed a plateau, as indicated by the arrow in Fig 1, while also considering the consistency of the groupings across multiple runs with the same K. Runs for which the variance was not homogeneous with variances of the other runs with the same K value were excluded.

Fig 1 Plot of the log probability of the data, LnP(K), versus number of clusters (K) based on STRUCTURE analysis

Legend: Analysis of population structure of 421 cacao accessions using STRUCTURE - estimated LnP(K) of possible clusters (K) from 2 to 15.

When K is approaching a true value, L(K) plateaus (or continues increasing slightly).

The population structure of the 421 accessions studied was visualized using DARwin (Dissimilarity Analysis and Representation for Windows) version 6 (http://darwin.cirad.fr) [76]. DARwin was used to estimate pairwise Jaccard’s genetic dissimilarity indices using the 200 SNP markers employed in STRUCTURE Analysis. A tree was constructed by clustering accessions, based on a dissimilarity matrix using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA). Clade strength in the dendrogram was tested using 1000 bootstraps. This tree was rendered with iTOL (https://itol.embl.de/) to produce Fig 2.
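As an illustration of the dissimilarity measure named above, the following sketch computes Jaccard's dissimilarity between two accessions. It assumes that the SNP data have been recoded as presence (1) or absence (0) of each allele; how DARwin codes and weights the data internally is not reproduced here, and the example vectors are hypothetical.

```python
def jaccard_dissimilarity(a, b):
    """1 - (alleles present in both) / (alleles present in either)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two profiles with no alleles present at all are treated as identical.
    return 1 - both / either if either else 0.0


# Hypothetical allele presence/absence profiles for two accessions.
accession_1 = [1, 1, 0, 1, 0]
accession_2 = [1, 0, 0, 1, 1]
dissimilarity = jaccard_dissimilarity(accession_1, accession_2)
```

Two alleles are shared out of four observed in either accession, giving a dissimilarity of 0.5. A matrix of such pairwise values is the input to a UPGMA clustering step.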

Fig 2 Neighbour-joining tree based on UPGMA of 421 cacao genotypes

Legend: The tree was generated in DARwin Version 6 and rendered in iTOL version 6 (https://itol.embl.de/)

Bootstrap set at ≥ 90.

Seven admixed groups are evident.

Genome-wide Association Study Analysis (GWAS)

The detection of associations between SNPs and traits is dependent on the phenotypic variance within the population that is explained by the SNP alleles [51]. This variance is determined by the “extent to which the two allelic variants differ in their phenotypic effect (effect size)” in the population under study [51]. TASSEL [77], version 5.2.50 for Windows, was employed to conduct GWAS. TASSEL employed a fixed effects linear model to test for association between genetic sites and phenotypes. A ‘main effects only’ model was generated using all variables in the input data. Both General Linear (GLM) and the Mixed Linear (MLM) models were used in this study.

MLM was used to correct covariances due to relatedness at the population level between genotypes (due to population structure) [78] as well as co-ancestry (kinship) or identity by descent. The inclusion of the K matrix allowed the inclusion of multiple backgrounds QTL as a random factor in the mixed linear model, as explained by Henderson [79]. The scaled, centred ‘IBS’ method, described by Endelman and Jannink [80], was used to estimate additive genetic variance and generate the Kinship matrix of relationships among genotypes using TASSEL.

The statistical model used for the MLM is as follows:

Y = Xβ + Zu + e

Where Y is the vector of observations, β an unknown vector containing fixed effects including genetic marker and population structure (Q); u is an unknown vector of random additive genetic effects from multiple background QTL for the individuals; X and Z are the known design matrices; and e is the unobserved vector of random residuals.
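For reference, the distributional assumptions usually attached to a Q+K mixed model of this form can be written out explicitly. The notation below is the standard one and is stated here as an assumption: the symbols σ²_a (additive genetic variance), σ²_e (residual variance) and I (identity matrix) do not appear in the original text, and K denotes the kinship matrix, not the number of clusters.

```latex
\begin{aligned}
\mathbf{Y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e},\\
\mathbf{u} &\sim N\!\left(\mathbf{0},\, \sigma_a^{2}\mathbf{K}\right), \qquad
\mathbf{e} \sim N\!\left(\mathbf{0},\, \sigma_e^{2}\mathbf{I}\right),\\
\operatorname{Var}(\mathbf{Y}) &= \sigma_a^{2}\,\mathbf{Z}\mathbf{K}\mathbf{Z}^{\top} + \sigma_e^{2}\,\mathbf{I}.
\end{aligned}
```

Under these assumptions, the kinship matrix shapes the covariance among observations, which is how relatedness between genotypes is absorbed by the random term rather than being attributed to the marker.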

Each marker allele is fit as a separate class with heterozygotes fit as an additional class so that the resulting marker effect is not broken down into additive and dominance effects. For the most robust MLM model used, the minor allele frequency (MAF) was set to > 0.05 (the SNPs were ‘filtered’). No missing values were included since SNPs with more than 14% missing values were removed and imputation was performed to replace any other missing values using the Euclidean distance measure in the k-nearest-neighbour algorithm [81] generated for the 10 nearest neighbours [77].

Phenotypic data for 75 accessions were imputed based on their genetic profiles.

Testing the robustness of the Mixed Linear Model (MLM) output

Model comparisons were made to test for putative false associations. These involved comparisons of the results of running TASSEL using the General Linear Model (GLM) with the Structure (Q) matrix and Principal Components Analysis (PCA) matrix, respectively, as opposed to the MLM with Structure and Kinship (Q+K). Price et al. [82] described how PCA corrects for stratification in GWAS as an alternative to the Q matrix.

GLM and MLM were run with filtered SNPs (612) and also with unfiltered SNPs with missing values ≤ 14% (737 SNPs). MLM was also run with unfiltered SNPs with missing data > 14%, but less than 20% (836 SNPs) for comparison.

Tests of significance of association

The test of significance in the TASSEL 5 software routine derived from the F-distribution assumes that the traits analysed have normally distributed residual error. The stringent Bonferroni correction test to cater for multiple testing [77] was applied to the derived P values to test the significance of the associations between traits and markers, based on the association analysis using the MLM routine in TASSEL with the Q matrix for K=7, Run 7. Manhattan plots, generated in R [83], were also used to check for evidence of P value inflation using TASSEL and to identify significant MTAs.

The simpleM method, implemented in R [83], was used to determine the number of independent markers that should be used in testing to avoid the penalty of the stringent Bonferroni correction test, as prescribed by Gao et al. [84]. There were 664 SNP markers that were independent and this number of markers was used to infer the Bonferroni threshold of 7.53 × 10^-5 in the analyses with unfiltered SNPs.
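The threshold arithmetic can be checked directly. The sketch below assumes a family-wise significance level of 0.05 divided by the number of tests; the variable names are illustrative. The second calculation uses the 612 filtered SNPs, which appears to correspond to the threshold of 8.17 × 10^-5 (–log10 (p) = 4.088) quoted in the Abstract.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 664 independent markers, as reported in the text.
threshold_independent = bonferroni_threshold(0.05, 664)

# 612 filtered SNPs (assumed here to underlie the Abstract's threshold).
threshold_filtered = bonferroni_threshold(0.05, 612)
neg_log10_filtered = -math.log10(threshold_filtered)
```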

To deduce positive associations, when more than one marker in the same linkage group was related to the same trait, only those separated by more than 2 cM in the reference map [29] were considered. This was reinforced by linkage disequilibrium (LD) decay patterns observed by Stack et al. [49] and in this study.

LD estimates are reported as squared correlations of allele frequencies (r²) and as D prime (D′). The correlation between alleles at two loci, r², and the standardized disequilibrium coefficient for determining whether recombination or homoplasy had occurred between a pair of alleles, D′, were derived in this study. Fisher’s exact test was used to compare alleles at any two loci. The LD pattern was tested with 100,000 permutations in TASSEL to obtain P values for the tests. An LD heatmap was used to scan for high linkage disequilibrium within chromosomes based on r² values. High LD was characterized by many red squares in the heatmap generated. Many red blocks together would correspond to haplotype blocks [85, 86].
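The two LD measures can be illustrated for a pair of biallelic loci. This is a minimal sketch using the textbook definitions and assuming that phased haplotype frequencies are known; TASSEL estimates these quantities from genotype data, and its exact procedure is not reproduced here. The function name and example frequencies are hypothetical.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (r squared, D prime) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2).
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom <= 0:
        raise ValueError("both loci must be polymorphic")
    r2 = d * d / denom
    # D prime scales |D| by its maximum possible value given allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Hypothetical pair: the rare allele A always occurs with the common allele B.
r2_example, d_prime_example = ld_stats(p_ab=0.1, p_a=0.1, p_b=0.5)
```

In this example D′ is 1 while r² is only about 0.11, which shows why the two measures can diverge when allele frequencies differ between loci.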

LD decay was plotted in R [83] to show the points representing the distance, in Mb, on each linkage group/chromosome, at which the mean value of r² decreased to half of the maximum value. LD decay plots were generated and annotated for the chromosomes where significant marker trait associations were observed.
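One simple way to locate the half-decay distance is sketched below. It bins marker pairs by physical distance and reports the first bin whose mean r² is at or below half of the largest bin mean. This binning approach is an assumption made for illustration and may differ from the curve fitting used for the plots in this study; the data points are hypothetical.

```python
def half_decay_distance(pairs, bin_width_mb=1.0):
    """Midpoint (Mb) of the first distance bin with mean r2 <= half the maximum.

    pairs: iterable of (distance_mb, r2) for marker pairs on one chromosome.
    Returns None if no bin falls to half of the maximum bin mean.
    """
    bins = {}
    for distance, r2 in pairs:
        bins.setdefault(int(distance // bin_width_mb), []).append(r2)
    # Bin means, ordered by increasing distance.
    means = {b: sum(v) / len(v) for b, v in sorted(bins.items())}
    half_max = max(means.values()) / 2
    for b, mean_r2 in means.items():
        if mean_r2 <= half_max:
            return (b + 0.5) * bin_width_mb
    return None


# Hypothetical (distance in Mb, r2) values for pairs of markers.
example_pairs = [
    (0.2, 0.40), (0.8, 0.36), (1.5, 0.30), (2.4, 0.22),
    (3.1, 0.18), (3.9, 0.16), (5.5, 0.10),
]
decay_mb = half_decay_distance(example_pairs)
```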

Quantile-quantile plots were generated in R [83] to search for evidence of bias in the GWAS, such as due to genotyping artifacts, and to discern the extent to which the observed distribution of the test statistic followed the expected (null) distribution.
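The coordinates of a quantile-quantile plot can be derived as sketched below. The sketch assumes that, under the null hypothesis, P values are uniformly distributed and that the expected value for the i-th smallest of n P values is (i - 0.5)/n; other plotting positions are in common use, and the P values shown are hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10(P), in ascending order."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected quantiles of a uniform null distribution (assumed positions).
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Hypothetical P values from four marker-trait tests.
points = qq_points([0.5, 0.05, 0.005, 0.9])
```

Points lying well above the diagonal across most of the range would suggest inflation of the test statistic, whereas departure only in the upper tail is what genuine associations would be expected to produce.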

The proportion of phenotypic variance explained by a marker was determined by the square of the partial correlation coefficient (R²%).

Genomic prediction

The predictive breeding value (GEBV) (along with the predictive error variance (PEV)) for each phenotypic trait was derived using ridge regression in TASSEL 5.2.50 software, which performed multiple correlation tests to compute correspondence between genotype and phenotype and accounted for collinearity to avoid bias. The genotypic data (737 SNPs in alpha format) were loaded and kinship analysis performed to generate an Identity by Descent (IBD) file. The 421 phenotypes and K matrix were selected and the genomic selection routine was run with 5-fold cross validation with 100 iterations. The Best Linear Unbiased Prediction (BLUP) model (linear mixed model to estimate random effects) (TASSEL 4 software) was used to derive BLUPs for each SNP marker.
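The cross-validation scheme can be sketched independently of the prediction model. The example below only shows how 421 individuals might be split into five folds and how predictive ability could be summarised as the Pearson correlation between predicted and observed values; treating the reported predictive value as such a correlation is an assumption, and the ridge regression fit performed in TASSEL is not reproduced. Function names and the random seed are illustrative.

```python
import random


def k_fold_indices(n_individuals, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n_individuals))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# One iteration of 5-fold partitioning for the 421 accessions.
folds = k_fold_indices(421, k=5, seed=1)
fold_sizes = [len(fold) for fold in folds]
```

In each iteration, every fold would serve once as the validation set while the model is trained on the remaining four; repeating the partitioning with different seeds gives the iterations over which the predictive ability is averaged.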

Detecting candidate genes located within marker-trait association zones

The identification of candidate genes was done using the latest cocoa genome sequence (T. cacao Criollo genome version 2) [29] available on the cocoa genome hub (https://cocoa-genome-hub.southgreen.fr/jbrowse). A genetic linkage map, showing the position of putative candidate genes that co-localised with SNP markers associated with traits of interest (MTAs), was then constructed using SpiderMap (Rami, 2007 unpublished, Spidermap v1.7.1, free software, CIRAD).

Results

Phenotypic data analysis

The phenotypic data for the fully characterized accessions are presented in S2 Table. The large phenotypic variation expressed in this panel of cacao genotypes, based on the coefficients of variation (Table 2), indicated a diverse genetic background, which was suitable for GWAS. Correlation analysis revealed the interdependence of traits on one another. Of the 91 Pearson correlation coefficients, r, calculated for the quantitative traits, only 20 were not significant (P ≥ 0.05). It is noteworthy that the yield-related traits, seed number, seed/cotyledon mass and seed dimensions, were highly correlated (P ≤ 0.001) (Table 3A). As observed previously at the ICGT by Bekele et al. [37], seed number was negatively correlated (r = -0.205*** in this study) with individual dried cotyledon mass. Seed/cotyledon length and width were positively correlated with individual dried seed/cotyledon mass (r = 0.688*** and 0.751***, respectively). This suggests that the former traits can be used as indicators of the latter.

Table 2.

Descriptive statistics for quantitative fruit and seed traits in wild, cultivated and unclassified cacao germplasm studied.

Table 3A.

Pearson correlation coefficients for quantitative phenotypic traits.

There was also a strong correlation (0.568***) between ovule number and seed number that justifies prediction of the latter using the former when fruits are unavailable. The correlations of fruit width with pod index and seed length, r = -0.621*** and 0.680***, respectively, are also noteworthy (Table 3A).

The Spearman correlations for anthocyanin intensity in the various plant organs are presented in Table 3B. All of these correlations were significant (P ≤ 0.05) except for the anthocyanin pigment concentration in the pedicel and anthocyanin pigmentation of the cotyledon (r = -0.097). It was notable that the correlations involving mature fruit surface (ridges) anthocyanin intensity and that in the ligule and filament of the flower and in the seed/cotyledon were all significantly (P ≤ 0.05) negative (r = -0.150, -0.257 and -0.147, respectively).

Table 3B.

Spearman correlation coefficients for anthocyanin intensity in various plant organs.

Tests of normality indicated significant deviation from normality for fruit length, width, fruit length to width ratio, total fresh seed mass, seed number, individual seed (cotyledon) mass and width, seed length to width ratio, pod index, ovule number, sepal length and width, and style length. Results of tests of normality performed on the natural log transformed values of fruit and seed quantitative traits indicated that natural log transformation corrected for the deviation from normality of these traits (S3 Table). The phenotypic data were thus natural log transformed to conduct GWAS. Untransformed data were also subjected to GWAS for comparison.

Population structure

Fifteen replicate runs of models using genotypic clusters (K) from 2 to 15 confirmed that K = 7 had the highest log-likelihood probability (log Pr (X | K) versus K) (Fig 1).

The population structure analysis revealed that 74% of the accessions could be stratified into seven sub-populations, while 26% could be regarded as admixtures. The constitution of the seven genetic clusters is presented in Table 4 and S4 Table. The IMC and AMAZ groups were clustered together as were also the SCA, MO and LCTEEN groups (Table 4; S4 Table). Several Upper Amazon accessions were genotyped as Trinitario, and are putatively mislabelled. The genetic diversity of the accessions studied is represented in the neighbour-joining tree in Fig 2, which depicts the seven clusters differentiated. (One SPEC accession was grouped alone and this was not considered a cluster).

Table 4.

The fixation index as a measure of population differentiation due to genetic structure for each cluster identified by STRUCTURE analysis.

Linkage disequilibrium (LD)

D′ greater than 0.6, where D′ of 1 represents the highest amount of disequilibrium possible, was indicative of recombination or homoplasy between pairs of alleles (Table 5). With regard to the squared correlations of allele frequencies, r² values, none were observed to be greater than 0.2 and the mean r² across the 10 chromosomes was 0.1 (Table 5). Chromosomes 1 and 3 had mean r² > 0.14. These findings seem to support those of Stack et al. [49], who found that the wild genotypes studied (such as those from the Purus, Contamana, Curaray, Iquitos, Nanay, Marañón and Guiana genetic groups) exhibited very low overall LD, measured by r², which rapidly decayed within 1–2 Mbps to a value of around 0.1. In this study, decay of r² to 50% over chromosomes 1, 4, 5, 7 and 9 occurred at a relatively short distance of 5.21 Mb (9 cM) on average (S5 Table). Motilal et al. [87] reported decay to half over a distance of 9.3 cM on chromosomes 1 to 9 for the germplasm they studied. LD decay plots, based on r², for this study are presented in Fig 3A. These findings suggest that MTAs could be “localized in the genome with relatively high precision using an association mapping approach” [49], particularly when wild germplasm is studied.

Fig 3A. Plots modelling the decay in pairwise linkage disequilibrium coefficients (r²) as a function of the distance between markers in megabases (Mb).

Panels: plots of pairwise linkage disequilibrium coefficients (r²) on chromosomes 1, 4, 5, 7 and 9.

Table 5.

Results of linkage disequilibrium analysis.

It is noteworthy that for cultivated germplasm such as Trinitario (e.g., the ICS accession group), LD decay was observed by Stack et al. [49] to be very gradual with increasing marker distance, falling below 0.1 at approximately 30 Mbps. Marcano et al. [88] found LD to decay to half over 25-35 cM for the recently admixed populations (Meso-American Criollo and South American Forastero) they studied.

The linkage disequilibrium (LD) decay patterns, observed for the chromosomes in this study population and a LD heat map (Figs 3A and 3B, S5 Table), were used to inform the process of searching for putative candidate genes. JBrowse in version 2 of the Criollo B97-61/B2 genome (https://cocoa-genome-hub.southgreen.fr/) was used to locate candidate genes upstream or downstream of MTA zones. The search for putative candidate genes with relevant functional roles was conducted within defined intervals, based on linkage disequilibrium decay, spanning MTAs on each chromosome. For chromosomes 3, 5, 6 and 8, this was over a distance of 2.5 Mb upstream or downstream of the significant MTAs. It was over a distance of 5 Mb for chromosome 1, 1.6 Mb for chromosome 4, 0.87 Mb for chromosome 7 and 0.86 Mb for chromosome 9.
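The interval-based search described above can be sketched as follows. The per-chromosome distances are those stated in the text; treating each distance as a half-width on either side of the SNP is an assumption, and the gene records (`gene_A`, `gene_B`, `gene_C`) are hypothetical placeholders rather than annotations from the Criollo genome. The SNP position used is the one reported in this paper for TcSNP 953 on chromosome 4.

```python
# Search distances (bp) per chromosome, as stated in the text.
# Assumption: the distance applies on each side of the SNP.
WINDOW_BP = {
    1: 5_000_000,
    3: 2_500_000, 5: 2_500_000, 6: 2_500_000, 8: 2_500_000,
    4: 1_600_000,
    7: 870_000,
    9: 860_000,
}


def search_interval(chrom, snp_pos):
    """Return (start, end) of the candidate-gene search interval."""
    half = WINDOW_BP[chrom]
    return max(0, snp_pos - half), snp_pos + half


def genes_in_window(chrom, snp_pos, genes):
    """Return genes on `chrom` that overlap the search interval."""
    start, end = search_interval(chrom, snp_pos)
    return [g for g in genes
            if g["chrom"] == chrom and g["start"] <= end and g["end"] >= start]


# Hypothetical gene records for illustration only
toy_genes = [
    {"name": "gene_A", "chrom": 4, "start": 2_825_000, "end": 2_828_000},
    {"name": "gene_B", "chrom": 4, "start": 9_000_000, "end": 9_004_000},
    {"name": "gene_C", "chrom": 5, "start": 2_800_000, "end": 2_803_000},
]

hits = genes_in_window(4, 2_822_152, toy_genes)
print(search_interval(4, 2_822_152), [g["name"] for g in hits])
```

In practice the gene list would come from a genome annotation file rather than a hand-written list.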

Fig 3B. Heatmap of linkage disequilibrium (r²) across the chromosomes 4 and 5 based on data for 421 cacao accessions genotyped using 612 filtered SNPs.

Legend: Markers were ordered on the x and y axes according to location along the chromosomes and each cell of the heat map represents a single marker pair.

The upper triangle, above the black diagonal, is colour-coded based on the r² value between SNPs while colours depicted in the lower triangle are based on P-values for the corresponding r² values.

Comparison of results generated for different models utilized in GWAS

When 836 SNPs (unfiltered and without correction for missing values >14% <20%) were used for the MLM routine with a Bonferroni threshold of P ≤ 7.53 × 10^-5 (0.05/664; 664 being the number of independent SNP markers employed), 81 significant associations were found with MLM + PCA (accounting for population structure) while 71 significant associations were found with MLM + Q (matrix derived using STRUCTURE). In comparison, when the SNPs were filtered (with MAF ≥ 0.05) in TASSEL prior to executing MLM with Q+K, 612 SNPs were retained, and 17 significant associations were observed.
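The Bonferroni thresholds used in this study follow directly from dividing the family-wise error rate by the number of markers tested. The short sketch below reproduces that arithmetic for 664 and 612 markers; the function name `bonferroni_threshold` is illustrative and not part of TASSEL.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


for n_markers in (664, 612):
    threshold = bonferroni_threshold(0.05, n_markers)
    # Report the threshold and its -log10 value, as used on Manhattan plots
    print(f"{n_markers} markers: P <= {threshold:.3g} "
          f"(-log10 = {-math.log10(threshold):.3f})")
```

With 612 filtered SNPs this gives approximately 8.17 × 10^-5, i.e. –log10(P) ≈ 4.088, the threshold applied to the filtered data set.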

Running the General Linear Model (GLM) in TASSEL without correcting for kinship (GLM + Q, with SNPs filtered at MAF ≥ 0.05) resulted in 5410 significant marker-trait associations at P ≤ 7.53 × 10^-5. It was inferred that most of these MTAs were spurious. In addition, the Manhattan plots generated for the chromosomes showed no clearly distinct peaks. When the results were restricted to marker effects (R²) greater than 0.2, 95 positive associations were obtained using GLM with filtered SNPs. Among these were associations of fruit wall hardness with TcSNP1411 on chromosome 3 (position 25,585,284) and with TcSNP1334, also on chromosome 3 (position 31,031,130). However, when the robust MLM was used for GWAS, no significant MTAs were found involving this trait, which is of interest due to its putative correlation with resistance to Cocoa Pod Borer attack [37].

Putatively robust marker-trait associations

The close agreement between the observed and expected distributions of the –log10(P) values in the MLM+K+Q model suggested a reduction of potential spurious MTAs for MLM analysis based on 612 filtered SNPs. Consequently, significant associations obtained using this MLM model and a Bonferroni threshold of P ≤ 8.17 × 10^-5 (–log10 (P) = 4.088) were scrutinized for potentially authentic/stable marker-trait associations with putative functional value. There was a predominance of positive associations between highly heritable qualitative traits such as fruit shape, anthocyanin intensity on fruit surface [89], fruit apex form, and flower filament anthocyanin intensity and SNP markers. However, there were also some significant associations between SNPs and quantitative traits. Seventeen significant (P ≤ 8.17 × 10^-5) MTAs were identified, including between seed number and log seed length and TcSNP 1335 on chromosome 7 (Fig 4) (at P ≤ 1.15 × 10^-14 and P ≤ 6.75 × 10^-05, respectively), and log seed number and TcSNP 785 on chromosome 1 (P ≤ 2.38 × 10^-05) (Table 6). Manhattan plots and associated Quantile-Quantile plots are presented for yield-related traits in Fig 5A and 5B. The relatively low number of SNPs significantly associated with phenotypic traits, 2.4% of all SNPs used, was probably partly due to the stringent statistical thresholds applied in this study.
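The comparison of observed and expected –log10(P) distributions mentioned above is what a quantile–quantile plot displays. A minimal sketch of how the plotted points can be derived is given below. It assumes that the expected P-value of the i-th smallest of n tests is (i − 0.5)/n, which is one common convention and not necessarily the one used by TASSEL; the P-values shown are hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs for a QQ plot.

    Expected P-values assume a uniform null distribution, using the
    plotting position (rank - 0.5) / n (an assumed convention).
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("no P-values supplied")
    points = []
    for rank, p in enumerate(sorted(p_values), start=1):
        if not 0 < p <= 1:
            raise ValueError("P-values must lie in (0, 1]")
        expected_p = (rank - 0.5) / n
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


# Hypothetical P-values, for illustration only
for expected, observed in qq_points([0.5, 0.05, 0.9, 0.2]):
    print(f"expected {expected:.3f}  observed {observed:.3f}")
```

Points lying close to the diagonal indicate that the test statistics follow the null distribution, whereas a systematic upward departure along the whole curve would suggest inflation from uncorrected structure or kinship.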

Fig 4. Venn diagram depicting relationships among seed traits of interest based on common associations with SNP markers.

Fig 5A. Manhattan plots from genome-wide association analysis.

Legend: Genome-wide association plots across 8 cacao chromosomes for seven phenotypic traits that had statistically significant MTAs: filament anthocyanin intensity, fruit surface (ridges) anthocyanin intensity, log fruit length, log seed length, log seed number, seed length to width ratio, seed number.

Based on TASSEL version 5.2.50 results for 421 cacao accessions, 612 filtered SNPs and the Mixed Linear Model.
Chromosome “11” was designated for unmapped SNP markers.
X- and Y-axes represent the SNP markers along each chromosome and the -log10(P-value), respectively.
The red horizontal line corresponds to the Bonferroni significance threshold of P-values ≤ 8.17 × 10^-5 (–log10 (P) = 4.088) and the blue line corresponds to a significance level of 0.005.

Fig 5B. Quantile–quantile plots of estimated –log10 (P) from genome-wide association studies.

Panels: filament anthocyanin intensity; fruit surface (ridges) anthocyanin intensity; log fruit length; log seed length; log seed number; seed length to width ratio; seed number.

Legend: The plots provide no evidence of bias in the GWAS, such as that due to genotyping artifacts, and display the extent to which the observed distribution of the test statistic followed the expected (null) distribution.

The red line represents expected P-values with no associations.

Table 6.

Most significant, yield-related and other marker-trait associations and variation explained

Consequently, the results were carefully scrutinized to discern MTAs just below the level of significance, which may also have functional importance.

Nine putative candidate genes with functional roles related to seed development, lipid biosynthesis and transfer and carbohydrate transport were identified on chromosome 1 (Table 7). It is noteworthy that TcSNP 785 on chromosome 1 was co-localized with gene Tc01v2_g025880, which is functionally significant since it encodes protein disulfide isomerase that may be required for proper pollen development, ovule fertilization and embryo development (Table 7). On chromosome 3, 11 putative candidate genes were detected, which encode traits associated with seed development and lipid accumulation (Table 7).

Table 7.

Genes co-localized with SNP markers significantly associated with phenotypic traits.

Associations between seed length and TcSNP 953 (P ≤ 6.98 × 10^-04), pod index and TcSNP 667 (P ≤ 4.15 × 10^-04), pod index and TcSNP 555 (P ≤ 3.85 × 10^-04) and seed number and TcSNP 1160 (P ≤ 1.35 × 10^-04), all on chromosome 4 (Table 6), although below the prescribed level of significance with Bonferroni correction, suggest that chromosome 4 may contain a cluster or ‘hotspot’ of QTLs for yield-related traits. There were two putative candidate genes involved with seed development co-localised with TcSNP 344, six that encode for seed development that were co-localised with TcSNP 667, eight responsible for seed protein and development and sugar transport linked to TcSNP 555 and five with seed development functions co-localised with TcSNP 1160 (Table 7). In addition, seven genes involved with lipid formation and seed development were localised close to TcSNP 953 on chromosome 4.

It was noteworthy that when 612 filtered SNPs were employed for GWAS using the MLM, a minor association (at P ≤ 6.98 × 10^-04), below the stringent level of significance with Bonferroni correction, was observed between log seed length and TcSNP 953 at position 2,822,152 on chromosome 4. The latter marker is located 3.7 Kb upstream of the gene that encodes acyl transferase-like protein At3g26840, which controls seed mass and oil content in Assembly cotton 46 (Gossypium spp.). At3g26840 is involved in seed storage (globulins) and seed size in Assembly cotton 46 (Jako et al. 2001) [90]. Gossypium spp. are related to cacao, both being members of the Malvaceae family.

On chromosome 5, seven putative candidate genes were co-localised with TcSNP 733. These were all involved in seed development. One candidate gene encodes soluble inorganic pyrophosphatase 4, which is important for development, but also in stress resistance (including to cadmium ion response) in plants (https://www.uniprot.org/uniprot/Q9LFF9). The latter was not of functional significance in this study, but is important for optimised cocoa production systems. In addition, three functional candidate genes involved with seed development, sugar transport and fruit development were found to be co-localised with TcSNP 1110, and one involved with seed development was co-localised with TcSNP 823 on chromosome 5 (Table 7).

Five putative candidate genes, including a lipid transfer protein, were identified on chromosome 6, co-localised with a MTA involving TcSNP 180. Three putative candidate genes were found on chromosome 7, one of which was a sugar transporter. Likewise, three functional candidate genes with seed development roles were detected on chromosome 8 although the MTA, involving pod index, was below the stringent Bonferroni threshold (P ≤ 2.30 × 10^-04) (Table 7).

On chromosome 9, TcSNP 184 was co-localized with four putative candidate genes, one of which encodes Zinc finger protein CONSTANS-LIKE 5 (Table 7). The latter is involved in the regulation of flower development and regulation of transcription (https://www.uniprot.org/uniprot/Q9FHH8). Other genes detected were all involved in seed development.

It was notable that seed length, seed length to width ratio and seed number were significantly associated with markers on different chromosomes. This suggests putative oligogenic or polygenic inheritance of these yield-related traits.

In addition, seed number and orbicular and oblate fruit shapes were significantly associated with TcSNP 390 on chromosome 7, a possible indication of gene linkage or perhaps pleiotropy. This requires further investigation.

It is also significant that TcSNP 401 (20,485,872) was significantly associated with fruit surface (ridge) anthocyanin intensity on chromosome 4. This marker is located 199 Kb upstream of Tc00_t058610, which encodes a putative MYB-related protein 308, 2 Mb upstream of the gene Tc04v2_g008890, which encodes a putative MYB family transcription factor, and 1.3 Mb upstream of the gene Tc04v2_g009300, which encodes a MYB domain protein 20. These associations were similar to those found previously by Marcano et al. [50] and Motamayor et al. [89]. A significant association (P ≤ 4.57 × 10^-05) was also found between fruit surface anthocyanin intensity and SNP 644 on chromosome 9.

Other interesting results were obtained when MLM was performed with 737 (unfiltered) SNPs. There was a significant (P ≤ 3.81 × 10^-05) MTA, which was detected between sepal length and TcSNP 1334 on chromosome 3. Furthermore, an association between TcSNP 180 and seed number on chromosome 6, when MLM was performed with 836 unfiltered SNPs, may be of interest despite not being considered stable or robust. TcSNP 180 was co-localized with genes encoding seed development as well as drought tolerance and disease resistance (Table 7). The observation regarding drought tolerance is of considerable significance since the conditions at the ICGT, where these accessions were observed, are considered sub-optimal in terms of soil moisture content [37].

Of the 17 significant MTAs unravelled when MLM with 612 (filtered) SNPs was performed, two involved fruit shape on chromosome 3: oblate (an uncommon phenotype in this diverse cacao germplasm sample) and orbicular. These MTAs involved TcSNPs 1353 and 1477. The oblate shape is a trait associated with certain wild types, which have evolved over a long period of time. A well-known accession with this trait is CATONGO. Interestingly, loci controlling fruit shape were dispersed over several chromosomes, representing independent linkage groups, when MLM was performed using 737 (unfiltered) SNPs.

The highly significant MTAs involving quantitative yield-related traits, observed in this study, suggest stability of genomic regions involved, as was also reported by Marcano et al. [50]. However, the genetic markers, co-localized with genes, were not major (accounting for ≥ 20% of the phenotypic variation expressed) since 5 to 17 percent of the phenotypic variation expressed was explained by the marker effect (R²).
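To make the marker effect (R²) concrete, the sketch below estimates the proportion of phenotypic variation explained by a single biallelic marker as the squared correlation between allele dosage (0, 1 or 2 copies) and the phenotype, and applies the 20% cut-off for a major effect mentioned above. This simple regression is an assumption made for illustration; the R² values reported by TASSEL's MLM are computed within the mixed model and will generally differ. The dosages and phenotypes are hypothetical.

```python
def marker_r2(dosages, phenotypes):
    """Proportion of phenotypic variance explained by one marker.

    Computed as the squared Pearson correlation between allele dosage
    and phenotype (simple linear regression, no kinship or structure).
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("marker or phenotype has no variation")
    return sxy * sxy / (sxx * syy)


def is_major(r2, cutoff=0.20):
    """Apply the >= 20% cut-off used in the text for a major effect."""
    return r2 >= cutoff


# Hypothetical dosages and seed numbers, for illustration only
r2 = marker_r2([0, 0, 1, 1, 2, 2], [40, 44, 41, 47, 43, 49])
print(f"R2 = {r2:.3f}, major effect: {is_major(r2)}")
```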

In summary, about 40 putative candidate genes of functional importance were identified in this study. These included those that encode protein precursors, carbohydrate transport, lipid synthesis/bioassembly, binding and metabolism, lipid transfer and seed storage, endosperm development and regulation of seed growth, embryo development leading to seed dormancy, seed development/morphogenesis and regulation of flower and pollen development and ovule fertilization as well as responses to water deprivation, cadmium contamination and other abiotic stresses and biotic (disease) stresses (Table 7).

Genomic prediction value of traits

The predictive (GEBV) values of the phenotypic traits studied are presented in Table 8. Of the qualitative traits studied, fruit basal constriction and filament anthocyanin intensity had predictive values greater than 0.5. The quantitative traits, seed number, seed mass, seed length, seed width, seed length to width ratio, pod index, fruit wall hardness, ovule number and fruit width had GEBV values greater than 0.5. The detection of several markers associated with yield-related traits with good predictive value, in this study, could facilitate genomic selection and marker-assisted selection in cacao. The yield-related traits, seed number and dried seed (cotyledon) mass had GEBV values of 0.611 and 0.6014, respectively. The GEBV value of cotyledon length and width, indicators of seed size, were 0.6199 and 0.5435, respectively. Interestingly, ovule number had the largest GEBV value of 0.6325. It is regarded as a reliable predictor of seed number, which is dependent on successful pollination.
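The section does not state how the predictive values were calculated. One common way of expressing predictive ability in genomic selection is the Pearson correlation between genomic estimated breeding values and observed phenotypes in a validation set, and the sketch below assumes that definition purely for illustration. The toy predicted and observed values are hypothetical; the dictionary of reported values is copied from the text above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least 2 paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the sequences has no variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical predicted (GEBV) and observed seed numbers in a validation set
predicted = [38.0, 41.5, 44.0, 36.5, 47.0]
observed = [37.0, 43.0, 42.0, 39.0, 46.0]
print(f"predictive ability (assumed definition): {pearson_r(predicted, observed):.3f}")

# Predictive values reported in the text, ranked from highest to lowest
reported = {
    "seed number": 0.611,
    "dried seed (cotyledon) mass": 0.6014,
    "cotyledon length": 0.6199,
    "cotyledon width": 0.5435,
    "ovule number": 0.6325,
}
ranked = sorted(reported, key=reported.get, reverse=True)
print(ranked)
```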

Table 8.

Predictive values (GEBV) of phenotypic traits associated with SNPs.

Discussion

It is noteworthy that more than 17 potentially useful MTAs were detected in this study. Several hundred marker trait associations or QTLs have previously been identified in cacao. Allegre et al. [33] referred to 300 of these associations. In addition, Motilal et al. [91] identified a QTL for resistance to P. palmivora close to the region identified by Clément et al. [10] on chromosome 4. Queiroz et al. [92] identified a major QTL linked to resistance to Witches’ Broom disease. Royaert et al. [93, 94] identified marker-trait associations for self-compatibility and resistance to Witches’ Broom, respectively, in a segregating mapping population of cacao. Sounigo et al. [95] found several associations related to SSRs and yield. Motamayor et al. [89] identified candidate genes regulating fruit colour. Da Silva et al. [96] identified markers on chromosome 4, which were putatively co-localized with a major gene encoding self-incompatibility. Osorio-Guarín et al. [97] detected two genes putatively associated with productivity (number of healthy fruits) and seven encoding Frosty Pod disease resistance.

It must be borne in mind that genetic variation of quantitative (polygenic, continuous) traits such as yield and disease resistance is controlled by the combined effects of QTL, epistasis (interactions between QTLs) [14], the environment and interaction between environment and QTL [98]. The use of only biallelic subsets of SNPs, in this study, could have excluded multiallelic loci, which may have contributed to additional variance expressed in the study population for polygenic traits, including those related to yield potential. Mir et al. [99] described yield as a very complex quantitative trait that is controlled by a network of a “large number of small effect minor genes or QTLs”. For such polygenic traits, with small effect size, increasing the sample size of the study population and densely sampling a population that shows phenotypic diversity should improve the power to detect meaningful associations [51]. However, the relatively small effect size of the markers associated with traits, in this study, where none of the markers explained more than 20% of the phenotypic variation expressed, is not unusual for quantitative traits. Most of the markers studied explained 5 to 11% of the phenotypic variation expressed. TcSNP 1335, on chromosome 7, explained 17% of variation expressed for seed number.

Significant associations found in this study between certain traits, such as fruit anthocyanin intensity, shape and seed length to width ratio, seed number, and loci on different chromosomes (Table 6), may be explained by the fact that a large part of trait variance was explained by several marker-trait associations, as described by Semagn et al. [98]. There were 26 SNPs significantly associated with fruit shape, two on chromosome 1, seven on chromosome 2, nine on chromosome 3, three on chromosome 5, two on chromosome 7 and three on chromosome 8, based on the results of MLM using 737 unfiltered SNPs. It seems justifiable to hypothesize that minor genes as well as major genes affect fruit shape.

The presence of markers significantly associated with different traits, in the same genomic region, was also observed in this study. The traits were seed number and orbicular shape on chromosome 7 (SNP 390) and seed number and seed length, also on chromosome 7 (SNP 1335) (Tables 6 and 7). These associations may indicate co-localization of the respective markers with a gene or gene block with pleiotropic effect [100] or may represent the phenomenon of linked genes, each one coding separately for a specific trait, as described by Araújo et al. [11]. In the case of seed number and seed length, indicators of seed size, this putative linkage is noteworthy since pod index (a measure of yield potential) is derived from seed number and mass. The relevant associated putative genes may have adaptive influence due to linkage mediated by selective forces, as explained by Yeaman [101]. Such a phenomenon could plausibly be observed in this study because at least 48 cultivated accessions, including 28 Imperial College Selections (ICS), were included (S1 Table). The latter reportedly evolved over a period of more than two hundred years in Trinidad and Tobago and were selected based on large seed size and seed number and favourable yield, among other selection criteria [70]. It must be noted that pleiotropic markers may facilitate simultaneous selection of the multiple traits with which they are significantly associated and thus gene pyramiding.
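Markers shared among traits, as described above, can be identified by inverting a trait-to-marker mapping. The sketch below uses only the associations named in this paragraph (SNP 390 and SNP 1335 on chromosome 7); the function name `shared_markers` is illustrative, and a shared marker by itself does not distinguish pleiotropy from linkage.

```python
from collections import defaultdict

# Trait-to-marker associations named in the paragraph above
trait_markers = {
    "seed number": {"TcSNP 390", "TcSNP 1335"},
    "orbicular fruit shape": {"TcSNP 390"},
    "seed length": {"TcSNP 1335"},
}


def shared_markers(trait_to_markers):
    """Return markers associated with more than one trait."""
    marker_to_traits = defaultdict(set)
    for trait, markers in trait_to_markers.items():
        for marker in markers:
            marker_to_traits[marker].add(trait)
    return {marker: sorted(traits)
            for marker, traits in marker_to_traits.items()
            if len(traits) > 1}


for marker, traits in sorted(shared_markers(trait_markers).items()):
    print(marker, "->", ", ".join(traits))
```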

Putative candidate genes for yield-related traits

The storage compounds of cacao seeds are starch, lipids (fats) and storage proteins [102]. Bucheli et al. [103] investigated the variation of sugars, carboxylic acids, purine alkaloids, fatty acids, and endoproteinase activity during maturation of cacao seeds. Aspartic endoproteinase activity was observed to increase rapidly during seed expansion and a major change in the fatty acid composition occurred in the young embryo. Mustiga et al. [104] detected a major QTL explaining 24% of the relative level of palmitic acid on the distal end of chromosome 4, located close to the Thecc1EG017405 gene. The latter is an orthologue and isoform of the stearoyl-acyl carrier protein (ACP) desaturase (SAD) gene that is involved in fatty acid biosynthesis.

Cacao seeds also contain a vicilin-like globulin, a seed storage protein [105]. It is noteworthy that TcSNP 555 on chromosome 4 was co-localised with vicilin in this study.

There are three acyltransferases and a phosphohydrolase involved in the bioassembly of plant storage lipids, viz., glycerol-3-phosphate acyltransferase (GPAT), lyso-phosphatidic acid acyltransferase (LPAT), diacylglycerol acyltransferase (DGAT) and phosphatidate phosphohydrolase (PAPase) [90]. Fritz et al. [106] purified glycerol-3-phosphate acyltransferase from the post-microsomal supernatant of cocoa seeds.

Triacylglycerols (TAGs) are the major storage lipids in several plants, and serve as energy reserves in seeds that are later used for germination and seedling development [107, 108]. The terminal step in TAG formation in plants involves the catalytic action of diacylglycerol acyltransferase (DGAT) in the presence of acyl-CoA [107]. Developing seeds in Brassica napus have been reported to produce diacylglycerol (DAG) during the active phase of oil accumulation [109].

The proteins encoded by candidate genes, which were co-localized with SNPs found to be significantly associated with yield-related traits, during this study, are presented in Tables 6 and 7. An association (just below the stringent significance level with Bonferroni correction) observed between seed length and TcSNP 953 on chromosome 4, at a position of 2,822,152 bp, 3.7 Kb upstream of a gene that encodes diacylglycerol acyltransferase (Acyltransferase-like protein At3g26840, chloroplastic), is among the most noteworthy of this study and warrants further investigation. Another putative candidate gene, unravelled during this study, was Tc01v2_g022850, which encodes Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin superfamily protein involved in Glycerolipid metabolism (chromosome 1, 2.5 Mb downstream of SNP 785 (31,785,158)) (Table 7). The results presented in Table 7 justify further investigation into the putatively functional roles of genes on chromosomes 1, 3, 4, 5, 7 and 9 in determining seed size and mass, components of yield potential in cacao.

Consensus MTAs involving yield-related traits

Significant (stable) associations between yield-related traits, seed length and seed length to width ratio, seed number and seed mass and pod index, and SNPs were found on chromosomes 1, 3, 4, 5 and 7 in this study (Table 7). Marcano et al. [50] found 8 significant associations between SSR markers and yield-related traits. These included associations with fresh seed mass on chromosomes 1, 2, 5, 6, 9 and 10, MTAs involving dried seed mass (100 seeds) on chromosomes 2, 4, 9 and 10 as well as one marker associated with seed number per fruit on chromosome 5. In their mapping study, dos Santos et al. [69] identified several QTLs flanked by the markers Tcm004s00289192, Tcm004s00615809, and Tcm004s01127580 on chromosome 4, which were associated with pod index, dried individual seed mass, number of fruits harvested and number of healthy fruits harvested. They also identified a significant association between the marker, Tcm002s23708704, and pod index on chromosome 2. The study of dos Santos et al. [69] unravelled 13 candidate genes linked to yield (dried seed mass, pod number), on chromosomes 4 and 2. Nine of these genes are annotated as transmembrane transporters, specializing in sugar transport, two genes are involved in carbohydrate metabolism, one gene is involved in lipid metabolism, and one gene is involved in glucose metabolism [69]. Motilal et al. [87] identified three SNPs (TcSNP368, 697, 1370) on chromosomes 1 and 9 that were significantly associated with seed number. Clément et al. [20] found two QTLs for yield in the clone, POUND 12, located close to a QTL for yield, identified in IMC 78, on chromosome 4.

Previous studies thus commonly observed loci on chromosomes 1, 2, 4, and 9 to be associated with seed mass and dimensions in cacao [50, 69]. The findings of this study suggest that yield-related traits are associated with loci on chromosomes 1, 3, 4, 5, 6, 7, 8 and 9 (Table 7), putatively linked to functional genes. However, not all of these MTAs were highly significant. Two SNPs, TcSNP 344 and 953, were associated with seed length on chromosome 4 (Tables 6 and 7). TcSNP 953 is located at the top of chromosome 4 (Fig 6) and thus the MTA, though below the stringent Bonferroni level of significance, may be considered validated since it was observed in a common region where yield-related MTAs were located in the studies by dos Santos Fernandes et al. [69] and Clément et al. [20].

Fig 6. Genetic linkage map of T. cacao L. showing candidate genes co-localised with SNP markers associated with yield-related and other traits.

Legend: Gene loci and proteins are shown on the right and genetic distances (Mb) are shown on the left. No candidate genes were identified on chromosomes 2 and 10.

A highly significant MTA, observed in this study, involved seed number and TcSNP 785 on chromosome 1. Motilal et al. [87] also reported such an MTA on chromosome 1. Marcano et al. [50] identified MTAs on chromosome 1 involving fruit number, fresh seed mass per fruit, and seed length, width and thickness.

Marcano et al. [50] found that the ‘Criollo allele’ was favourable in 68% of the seed-marker associations studied, and inferred that the ‘Forastero allele’ may sometimes be favourable for these traits. Criollo, Forastero and Trinitario are widely recognised classes of cacao in the trade. The accessions in this study that displayed favourable yield potential were mainly Trinitario (cultivated germplasm) [36] and those with Criollo ancestry. This was due mainly to their large seed sizes (Table 9). This supports the deduction of Doebley et al. [110] that “cultivated species generally have larger fruits or seeds compared to their wild ancestors, indicating that fruit and seed size are major agronomic traits that have been selected in crops during their domestication.” However, several Upper Amazon Forastero types, in this study, also had favourable (low) pod index due to their large seed numbers (Table 9, S7A Fig and S7B Fig).

Table 9.

Superior accessions in terms of yield-related traits and associated haplotypes (allelic variants) for SNP markers of interest.

Based on the findings of this study, elucidation and selection of genotypes associated with large seed size in T. cacao L. may thus be facilitated by using TcSNP 953 and TcSNP 344 (Tables 6 and 7). The results of a preliminary evaluation to detect genotypes associated with favourable yield potential (pod index) are presented in Table 9 and involve TcSNP 785, TcSNP 953 and TcSNP 667. Further investigation with training populations of T. cacao L. in genomic selection studies, as described by Bhat et al. [111], is recommended.

Studies are also recommended to further investigate functional genomics associated with yield-related traits in cacao such as was done by dos Santos et al. [69]. Such studies, in wheat, have revealed transcription factors, which can affect seed number, genes involved in metabolism or signalling of growth regulators, genes determining cell division and proliferation related to seed size, and floral regulators that regulate inflorescence architecture and seed number. Genes involved in carbohydrate metabolism, affecting plant architecture and grain yield such as trehalose phosphate synthase (TPS) and trehalose phosphate phosphatase (TPP) genes have also been identified [112].

Recommended follow-up studies would entail expression analysis, involving transcriptomics. The DGAT gene, Tag1, from Arabidopsis was shown to encode an acyl-CoA-dependent DGAT [90]. Jako et al. [90] demonstrated that seed-specific over-expression of the Diacylglycerol Acyltransferase (DGAT) cDNA in wild-type Arabidopsis “enhances oil deposition and average seed mass, which are correlated with DGAT transcript levels”, and that DGAT has an important role in regulating the quantity of seed triacylglycerols (TAGs), the sink size in developing seeds and thus seed size. They also demonstrated that “over-expression of the acyl-CoA-dependent DGAT in a seed-specific manner in wild-type Arabidopsis plants results in increased oil deposition and average seed mass.”

Ohto et al. [113] found that the gene APETALA2, which is a member of a large family of transcription factors, influences embryo, endosperm, and seed coat development and determines seed size in Arabidopsis. Kroj et al. [114] found the transcription factor, ABI3, to be implicated in seed maturation and in the expression of genes coding for seed storage proteins in Arabidopsis.

Seed size has also been shown to be directly determined by carbohydrate import into seeds, in maize and rice, and involves SWEET-mediated hexose transport [115]. SWEET genes regulate the transport, distribution and storage of carbohydrates in plants, and are involved in many important physiological processes, including phloem loading, reproductive development, disease-resistance, stress response, and host-pathogen interaction. In this study, SWEET17 was localized on chromosome 4 upstream of TcSNP555 (Table 7), which was associated with pod index (P ≤ 4.29 × 10^-4). dos Santos et al. [69] identified a genomic region with copy-number variations of SWEET genes, also on chromosome 4, in their cacao QTL mapping study. In this study, SWEET 2 was also localized downstream of TcSNP1335, which was significantly associated (P≤1.15 ×10^-14) with seed number as well as with log seed length (P≤ 6.75 × 10^-05) on chromosome 7 (Table 6).

Anthocyanin pigmentation

Marcano et al. [50] identified three regions associated with pigmentation on different organs in cacao. They considered the possible co-localization of markers related to pigmentation of structures especially in a small region of the chromosome 4, in which they found the SSR marker, mTcCIR115, to be located. This sector includes the major locus identified by Crouzillat et al. [12] as responsible for controlling ‘seed colour’ in the Catongo x POUND 12 backcross progeny.

Stack et al. [49] reported that cacao ‘fruit colour’ is considered to be controlled by a single gene localized within “a narrow genetic region with a strong phenotypic effect”. In this study, fruit ridge anthocyanin concentration (fruit colour) was significantly associated with TcSNP 401 on chromosome 4, located at 20,485,872, about 199 Kb upstream of the gene Tc00_t058610 (mRNA), which encodes a putative MYB-related protein 308, and 198 Kb upstream of the Transcription repressor MYB 6 (Table 7) (https://cocoa-genome-hub.southgreen.fr/). TcSNP 401 on chromosome 4 accounted for 7.1% of the phenotypic variation in fruit colour observed (P ≤ 4.78 × 10^-06) in this study (Table 6). Motamayor et al. [89] detected four SNP variants on chromosome 4, between 20,878,891 and 20,879,148 bp within a MYB transcription factor gene (TcMYB113), which were inferred to encode ‘fruit colour differences’ between cacao varieties.

Another significant association (P ≤ 6.79 × 10^-05) for fruit surface anthocyanin intensity was found in this study when TASSEL MLM analysis was performed using 737 unfiltered SNPs. It involved TcSNP 1203, on chromosome 3 (located at 563,101), which accounted for 5.6 % of the phenotypic variation in fruit surface anthocyanin concentration in this cohort of germplasm.

MYB proteins are involved in regulatory networks controlling metabolism, including the synthesis of anthocyanins, responsible for the red pigmentation in cacao [116, 117]. Liu et al. [118] observed that overexpression of the Tc-MYBPA gene elicited increased expression of several genes encoding the major structural enzymes of the proanthocyanidin and anthocyanidin pathway in cacao.

There were also significant associations between filament anthocyanin concentration and SNPs on chromosomes 3 and 8 (TcSNPs 1183 and 1441, respectively) in this study (Table 6). Taken together, the MTAs involving anthocyanin concentration in this study suggest multi-gene control of anthocyanin intensity in the mature fruit epidermis and flower filaments of cacao. Similarly, Marcano et al. [50] concluded that “biosynthesis of anthocyanins, involving several responsible enzymes, may produce a complex genetic system rather than one defined by a single gene.” Furthermore, the differential expression of pigmentation in the seeds and fruits of cacao, observed at the ICGT, evidenced by the negative correlations in Table 3B and stated by Bartley [119], may be explained by the association of seed and fruit pigmentation with several genomic regions. MTAs involving anthocyanin intensity warrant further investigation to determine their value for genomic selection, given the significance of this trait in differentiating among certain genotypes of interest, such as CCN-51 [89, 117], the nutraceutical value of anthocyanin and its putative role in cacao disease resistance [118].

Future prospects

The results of this study appear to support the observation of Rockman [120] that most complex traits (such as those related to yield in cacao) are controlled by several (putatively interacting) loci with small effects. Some phenotypic traits are controlled by a small number of loci with large effects (as is often the case for traits under biotic selection) [51] while others may have more complex genetic architectures. The latter may be controlled by many rare variants, each having a large effect on the phenotype or conversely, many common variants with only small effects on the phenotypes, as described by Lee et al. [121]. The causative variants may be clustered in one or a small number of genes or across many genes. The data presented in Table 7 [on gene ontology] provide evidence of polygenic control of yield-related traits in cacao. For such traits, it may be more effective to predict the performance of genotypes by using multiple molecular markers [122]. Multilocus mixed linear models (MMLMs) may be considered for future studies in cacao when complex traits are investigated because these incorporate multiple markers simultaneously as covariates in a step-wise MLM [123].
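To illustrate the idea of entering markers step-wise as covariates, the following minimal sketch performs greedy forward selection on simulated genotypes. It is a deliberate simplification and not the multilocus mixed model of reference [123]: the kinship (random) term is omitted, a fixed number of steps replaces a formal stopping rule, and all function names and data are hypothetical.

```python
import random


def center(v):
    """Subtract the mean so that an intercept is implicitly fitted."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(v, z):
    """Remove from v its least-squares projection onto z."""
    zz = dot(z, z)
    if zz < 1e-12:
        return list(v)
    b = dot(v, z) / zz
    return [x - b * t for x, t in zip(v, z)]


def forward_select(y, marker_columns, max_steps):
    """Greedy forward selection without a kinship term (simplification).

    At each step the marker giving the largest drop in residual sum of
    squares is added; the phenotype and the remaining markers are then
    adjusted for it, so earlier picks act as covariates in later steps.
    """
    y_res = center(y)
    cols = {j: center(c) for j, c in enumerate(marker_columns)}
    chosen = []
    for _ in range(max_steps):
        if not cols:
            break

        def gain(j):
            xx = dot(cols[j], cols[j])
            return dot(y_res, cols[j]) ** 2 / xx if xx > 1e-12 else 0.0

        best = max(cols, key=gain)
        chosen.append(best)
        z = cols.pop(best)
        y_res = residualize(y_res, z)
        cols = {j: residualize(c, z) for j, c in cols.items()}
    return chosen


# Hypothetical simulated data: 200 individuals, 30 SNPs coded 0/1/2,
# with causal markers placed at indices 4 and 17.
rng = random.Random(0)
n_ind, n_snp = 200, 30
marker_columns = [[rng.randint(0, 2) for _ in range(n_ind)] for _ in range(n_snp)]
y = [
    1.5 * marker_columns[4][i] + 1.0 * marker_columns[17][i] + rng.gauss(0.0, 0.5)
    for i in range(n_ind)
]

selected = forward_select(y, marker_columns, max_steps=2)
```

Because each selected marker is regressed out before the next scan, a second locus is detected on the variation left unexplained by the first, which is the motivation for multilocus models when several loci contribute to a trait.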

The results from this GWAS complemented some of those from previous QTL, admixture and other association mapping studies in cacao. Markers identifying novel QTL in this study should be validated in the future. A substantially larger body of reliable SNPs and an even larger subset of diverse cacao germplasm than those used in this study should help to unravel further useful MTAs in cacao.

It is also recommended that further research be undertaken to establish whether the trait associations revealed in this study, and the specific gene variants putatively linked to them, have a functional role via gene expression and protein synthesis in T. cacao L. Such studies were conducted by Bailey et al. [124] and Pokou et al. [125] for disease traits, Lanaud et al. [7] for self-compatibility and dos Santos et al. [69] for yield components, and by Chai et al. [126] in maize for seed oil content.

Rebbeck et al. [127] stated that “the lack of reproducibility of many association studies might reflect the number of studies that involve genetic variants with no functional significance.” It is noteworthy that several MTAs detected by this study were also found in previous ones on cacao and functional roles are expected.

Once the functional roles of putative genes co-localized with markers significantly associated with traits of interest are elucidated, the relationships of these putative genes with geography and local adaptation must be established, as recommended by McKown et al. [128]. Micheli et al. [129] have reported on functional genomics in cacao focusing on genes expressed under specific physiological conditions. Consistency in QTL effects over different genetic backgrounds must also be established. Individuals with favourable marker genotypes and haplotypes may then be used as parental types for enhancement or breeding programmes in targeted environments.

Although yield is a complex trait, our results on potential genomic selection (GS) for yield traits are very promising, given the high predictive values obtained for these traits, generally above 0.5. The slow progress hitherto realized in cocoa breeding [6] may be accelerated by the advances made towards genomic prediction and selection in T. cacao L. [130, 35, 131].

Genomic selection predicts genotypic performance using genome-wide marker data [132, 111]. There are prospects for GS or marker-assisted selection (MAS) [133] in cacao, whereby numerous (tens of thousands) genetic markers covering the whole genome may be employed so that all QTL are in linkage disequilibrium with at least one marker. GS-MAS has been found useful for complex traits controlled by many QTL with small effects and low heritabilities (h²) [134], provided that the markers with the most significant associations with traits are close to functional genes [46]. Accurate prediction of plant phenotype from genotype through GS-MAS should be facilitated by the utilization of wild cacao germplasm representing different genepools as sources of favourable alleles for traits of interest. It is noteworthy that Romero Navarro et al. [130] used the results of GWAS and genomic prediction to identify associated markers and develop predictive models for frosty pod rot and black pod diseases, as well as yield traits in a population of improved clones. Similarly, McElroy et al. [131] predicted resistance to Moniliophthora spp. diseases in three related populations of cacao using a 15K single nucleotide polymorphism (SNP) microarray for GWAS and genomic selection. They concluded that the “GS framework holds substantial promise in accelerating disease-resistance in cacao.” The results of this study on yield traits substantiate the value of this molecular breeding method to improve cocoa yield.
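The principle of predicting performance from genome-wide markers can be sketched with ridge regression on simulated genotypes: marker effects are estimated jointly in a training set and summed to predict individuals held out for testing. The prediction model used in this study is not restated in this section, so ridge regression serves here only as a common stand-in; the penalty value, the data and all function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Estimate marker effects from (X'X + lam*I) b = X'y on centred data."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    xtx = [
        [sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
         for k in range(p)]
        for j in range(p)
    ]
    xty = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return solve(xtx, xty), col_means, y_mean


def predict(X, effects, col_means, y_mean):
    """Predicted value = training mean + sum of centred genotype x effect."""
    return [
        y_mean + sum((row[j] - col_means[j]) * effects[j] for j in range(len(effects)))
        for row in X
    ]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Hypothetical simulated data: 200 individuals, 20 SNPs coded 0/1/2.
rng = random.Random(1)
n_ind, n_snp = 200, 20
true_effects = [rng.gauss(0.0, 1.0) for _ in range(n_snp)]
X = [[rng.randint(0, 2) for _ in range(n_snp)] for _ in range(n_ind)]
y = [sum(g * e for g, e in zip(row, true_effects)) + rng.gauss(0.0, 1.0) for row in X]

# Train on the first 150 individuals, predict the remaining 50.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]
effects, col_means, y_mean = ridge_fit(X_train, y_train, lam=1.0)
predictive_ability = pearson(predict(X_test, effects, col_means, y_mean), y_test)
```

The correlation between predicted and observed values in the held-out set is one common way of expressing predictive value; in this simulation it is high because the simulated trait is highly heritable and every causal locus is genotyped, conditions that real germplasm data do not guarantee.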

Conclusion

In this study, carefully collated phenotyping data on traits of economic interest in cacao, such as yield potential, and SNP genotyping data, generated via transcriptome sequencing, were subjected to GWAS. A total of 421 cacao accessions were used for the GWAS. Thirty-one of these accessions represent promising material for breeding in terms of yield potential (Table 9). The goal was to use a large germplasm collection to decipher the genetic bases of yield traits and identify putative candidate genes linked to important phenotypic traits, as well as to simulate a genomic selection approach to evaluate its utility for cocoa breeding. By taking into account population structure and false discovery rates, genomic regions were found to be significantly associated with yield-related traits, fruit length, filament and fruit anthocyanin intensity and fruit shape.

The rather limited number of significant (stable) and robust associations (MTAs) detected in this study may be due to the prevalence of small effect size and rare genetic variants, which are not easily detected by GWAS. Genetic variants associated with complex traits, such as yield and disease resistance, are expected to have such small effects on function.
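The difficulty of detecting rare or small-effect variants can be made concrete with the standard expression for the additive variance contributed by a single biallelic locus, 2p(1 - p)a², where p is the allele frequency and a the allele substitution effect. The sketch below assumes Hardy-Weinberg proportions, which a genebank sample need not satisfy, and uses hypothetical frequencies and effect sizes rather than values from this study.

```python
def variance_explained(freq, effect):
    """Additive variance 2p(1-p)a^2 of one biallelic locus.

    Assumes Hardy-Weinberg proportions. With the effect expressed in
    phenotypic standard deviations, the result is the fraction of
    phenotypic variance attributable to the locus.
    """
    return 2.0 * freq * (1.0 - freq) * effect ** 2


# Hypothetical example: the same effect (0.5 SD) at a rare and a common allele.
rare = variance_explained(0.01, 0.5)    # allele frequency 1%
common = variance_explained(0.5, 0.5)   # allele frequency 50%

if __name__ == "__main__":
    print(f"rare allele:   {rare:.5f} of phenotypic variance")
    print(f"common allele: {common:.5f} of phenotypic variance")
```

Under these assumptions, a given effect explains about 25 times less variance at an allele frequency of 1% than at 50%, so far larger samples would be needed to detect it.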

The results presented herein indicated oligogenic and polygenic control of yield-related traits in cacao. The stringently significant marker-trait associations related to yield, found in this study, were indications of the presence of quantitative trait loci on chromosomes such as 1, 3, 4, 5 and 7. They were validated by interval mapping analyses in other studies, which found some corresponding QTL positions on chromosomes 1 and 4 [10, 20, 50, 87, 69]. Chromosome 4 putatively contains a QTL cluster associated with yield-related traits. This study may be unique in identifying putatively useful candidate genes, responsible for encoding yield-related traits via proteins involved in seed length and seed number determination, on chromosome 7 (Tables 6 and 7). Further studies for estimation of the functional effects of these putative candidate genes should be pursued.

A combination of genetics and functional genomics will facilitate understanding of gene function and gene interrelationships in cacao, as stated by Allegre et al. [33]. Fine mapping studies, such as that done for self-compatibility in cacao [7] and in cotton [135], will be useful especially since some putative candidate genes associated with seed development, seed lipid accumulation, metabolism and development and plant stress responses (including to drought and to cadmium) have been identified in this research and other studies in cacao. Genomic selection could be efficiently used to facilitate early selection of superior genotypes, using data from training and selected populations [136, 131]. The identification of yield-related traits with good predictive value, in the test population of this study, such as seed mass, number, length, width and length to width ratio as well as ovule number, could further facilitate genomic selection for yield potential in cacao. The performance of the non-phenotyped individuals at the Trinidad genebank (ICGT) could also thus be predicted if they are genotyped. This would be particularly useful for the enhanced genotypes (GEBP progeny) described by Bekele et al. [38]. Identification of parents possessing high predictive values and favourable alleles prior to crossing should prove beneficial for more rapid development of enhanced cacao progenies.

Author contributions

Conceived and designed the experiments: CL and FLB

Performed the experiments: FLB, GGB, XA, MA, OF, MB

Analyzed the data: FLB, CL, IB

Wrote the manuscript: FLB, CL, XA, DS

All authors reviewed and approved the final manuscript

Supporting information

S1 Table. Background information on the T. cacao L. accessions used in the analyses.

S2A Table. Phenotypic data for 346 cacao accessions fully phenotyped and used to generate descriptive statistics.

S2B Table. Genotype data used in GWAS.

S3 Table. Results of tests of normality performed on the natural log transformed fruit and seed quantitative traits.

S4 Table. Coefficients of membership for clusters of accessions based on STRUCTURE analysis.

S5 Table. Distances over which linkage disequilibrium decayed to 50 percent over chromosomes 1, 4, 5, 7 and 9.

S6 Data. Summary of significantly positive marker-trait associations.

S7A Fig. Summary Report for Pod index in wild cacao.

S7B Fig. Summary Report for Pod index in cultivated cacao.

Acknowledgements

The Director of CRC, Prof. Pathmanathan Umaharan, is gratefully acknowledged for endorsing this collaborative research. Useful discussions with Drs. Didier Clément, Christian Cilas and Martijn Ten Hoppen, CIRAD, France, Dr. Michelle End and Mr. R.A. (Tony) Lass, UK and Tricianna Maharaj, Trinidad, are deeply appreciated. J. Bhola, Dr. W. Mollineau, V. Badall, A. Richardson-Drakes, N. Persad, S. Samnarine, C. Jagroop, T. Jugmohan, E. Solozano and other individuals are gratefully recognized for technical assistance in phenotyping at various times during the period of study.

Financial support from the Government of Trinidad and Tobago, the Cocoa Research Association, UK and CIRAD, France that facilitated collation of the phenotypic and genotypic data, respectively, is gratefully acknowledged. However, the study design, conduct of this research and preparation of the manuscript were not influenced by the funding agencies.

References

1.
Alverson WS, Whitlock BA, Nyffeler R, Bayer C, Baum DA. Phylogeny of the core Malvales: evidence from ndhF sequence data. American Journal of Botany. 1999 Oct;86(10):1474–86. https://doi.org/10.2307/2656928
2.
Expert market research (2020) Expert Market Research Report https://www.expertmarketresearch.com/reports/chocolate-market. Accessed August 6 2020.
3.
Argout X, Salse J, Aury JM, Guiltinan MJ, Droc G, Gouzy J, Allegre M, Chaparro C, Legavre T, Maximova SN, Abrouk M. The genome of Theobroma cacao. Nature Genetics. 2011 Feb;43(2):101–8. https://doi.org/10.1038/ng.736
4.
Cheesman EE. Notes on the nomenclature, classification and possible relationships of cacao populations. Tropical Agriculture. 1944;21(8).
5.
Eskes A, Lanaud C. Cocoa. In: Tropical Plant Breeding. Eds Charrier A, Jacquot M, Hamon S, Nicolas D. CIRAD, Montpellier. 2001. pp. 78–105.
6.
Bekele F, Phillips-Mora W. Cocoa Breeding. In: Advances in Plant Breeding: Industrial and Food Crops, Vol 6. Eds J.M. Al-Khayri, et al. Springer-Verlag, Cham. 2019. pp. 409–87. https://doi.org/10.1007/978-3-030-23265-8_12.
7.
Lanaud C, Fouet O, Legavre T, Lopes U, Sounigo O, Eyango MC, Mermaz B, Da Silva MR, Loor Solórzano RG, Argout X, Gyapay G. Deciphering the Theobroma cacao self-incompatibility system: from genomics to diagnostic markers for self-compatibility. Journal of Experimental Botany. 2017 Oct 13;68(17):4775–90. https://doi.org/10.1093/jxb/erx293
8.
Ribaut JM, De Vicente MC, Delannay X. Molecular breeding in developing countries: challenges and perspectives. Current Opinion in Plant Biology. 2010 Apr 1;13(2):213–8. https://doi.org/10.1016/j.pbi.2009.12.011
9.
Simmonds NW. The breeding of perennial crops. In: Proceedings of the Workshop on the Conservation, Characterisation and Utilization of Cocoa Genetic Resources in the 21st Century, 13-17 September 1992. The Cocoa Research Unit, Port of Spain. 1993;156–62.
10.
Clément D, Risterucci AM, Motamayor JC, N’Goran J, Lanaud C. Mapping QTL for yield components, vigor, and resistance to Phytophthora palmivora in Theobroma cacao L. Genome. 2003a Apr 1;46(2):204–12. https://doi.org/10.1139/g02-125
11.
Araújo IS, de Souza Filho GA, Pereira MG, Faleiro FG, de Queiroz VT, Guimarães CT, Moreira MA, de Barros EG, Machado RC, Pires JL, Schnell R. Mapping of quantitative trait loci for butter content and hardness in cocoa beans (Theobroma cacao L.). Plant Molecular Biology Reporter. 2009 Jun 1;27(2):177–83. https://link.springer.com/content/pdf/10.1007/s11105-008-0069-9.pdf
12.
Crouzillat D, Lerceteau E, Pétiard V, Morera J, Rodríguez H, Walker D, Phillips W, Ronning C, Schnell R, Osei J, Fritz P. Theobroma cacao L.: a genetic linkage map and quantitative trait loci analysis. Theoretical and Applied Genetics. 1996 Jul 1;93(1-2):205–14. https://doi.org/10.1007/BF00225747
13.
Crouzillat D, Ménard B, Mora A, Phillips W, Pétiard V. Quantitative trait analysis in Theobroma cacao using molecular markers. Euphytica. 2000a Jul;114(1):13–23. https://doi.org/10.1023/A:1003892217582
14.
Crouzillat D, Phillips W, Fritz PJ, Pétiard V. Quantitative trait loci analysis in Theobroma cacao using molecular markers. Inheritance of polygenic resistance to Phytophthora palmivora in two related cacao populations. Euphytica. 2000b Jul;114(1):25–36. https://doi.org/10.1023/A:1003994212394
15.
N’Goran JA, Risterucci AM, Clément D, Sounigo O, Lorieux M, Lanaud C. Identification of quantitative trait loci (QTL) in Theobroma cacao L. Agron Afr. 1997;9:55–63.
16.
Lanaud C, Kébé IS, Risterucci AM, Clément D, N’Goran JA, Grivet L, Tahi GM, Cilas C, Pieretti I, Eskes A, Despréaux D. Mapping quantitative trait loci (QTL) for resistance to Phytophthora palmivora in T. cacao. In: Proceedings of the 12th International Cocoa Research Conference, November 17 1996, Bahia, Brazil. Cocoa Producers’ Alliance, Lagos. 1999;99–105.
17.
Lanaud C, Boult E, Clapperton J, N’Goran JKA, Cros E, Chapelin M, Clément D, Petithugenin P. Identification of QTLs related to fat content, seed size and sensorial traits in Theobroma cacao L. In: Proceedings of the 14th International Cocoa Conference, 13-18 October 2003, Accra, Ghana. Cocoa Producers’ Alliance, Lagos. 2005;1119–26.
18.
Lanaud C, Fouet O, Clément D, Boccara M, Risterucci AM, Surujdeo-Maharaj S, Legavre T, Argout X. A meta-QTL analysis of disease resistance traits of Theobroma cacao L. Molecular Breeding. 2009 Nov;24(4):361–74. https://doi.org/10.1007/s11032-009-9297-4
19.
Flament MH, Kébé I, Clément D, Pieretti I, Risterucci AM, N’Goran JA, Cilas C, Despréaux D, Lanaud C. Genetic mapping of resistance factors to Phytophthora palmivora in cocoa. Genome. 2001 Feb 1;44(1):79–85. https://doi.org/10.1139/g00-099
20.
Clément D, Risterucci AM, Motamayor JC, N’Goran J, Lanaud C. Mapping quantitative trait loci for bean traits and ovule number in Theobroma cacao L. Genome. 2003b Feb 1;46(1):103–11. https://doi.org/10.1139/g02-118
21.
Clément D, Lanaud C, Sabau X, Fouet O, Le Cunff L, Ruiz E, Risterucci AM, Glaszmann JC, Piffanelli P. Creation of BAC genomic resources for cocoa (Theobroma cacao L.) for physical mapping of RGA containing BAC clones. Theoretical and Applied Genetics. 2004 May 1;108(8):1627–34. https://doi.org/10.1007/s00122-004-1593-0
22.
Risterucci AM, Paulin D, Ducamp M, N’Goran JA, Lanaud C. Identification of QTLs related to cocoa resistance to three species of Phytophthora. Theoretical and Applied Genetics. 2003 Dec 1;108(1):168–74. https://doi.org/10.1007/s00122-003-1408-8
23.
Pugh T. Etude du déséquilibre de liaison chez le cacaoyer appartenant aux groupes Criollo/Trinitario. Application au marquage génétique d’intérêt pour la sélection. Thèse Doctorat, Ecole Nationale Supérieure d’Agronomie, Montpellier. 2005;107p.
24.
Pugh T, Fouet O, Risterucci AM, Brottier P, Abouladze M, Delettrez C, Courtois B, Clément D, Larmande P, N’Goran JA, Lanaud C. A new codominant marker-based cocoa linkage map: development and integration of new microsatellite markers into cocoa linkage map. A new cocoa reference map. In: Proceedings of the 14th International Cocoa Research Conference, Accra, Ghana, 13-18 October 2003. Cocoa Producers’ Alliance, Lagos. 2005;153–60.
25.
Brown JS, Schnell RJ, Motamayor JC, Lopes U, Kuhn DN, Borrone JW. Resistance gene mapping for witches’ broom disease in Theobroma cacao L. in an F2 population using SSR markers and candidate genes. Journal of the American Society for Horticultural Science. 2005 May 1;130(3):366–73. https://doi.org/10.21273/JASHS.130.3.366
26.
Brown JS, Phillips-Mora W, Power EJ, Krol C, Cervantes-Martinez C, Motamayor JC, Schnell RJ. Mapping QTLs for resistance to frosty pod and black pod diseases and horticultural traits in Theobroma cacao L. Crop Science. 2007 Sep;47(5):1851–8. https://doi.org/10.2135/cropsci2006.11.0753
27.
Brown JS, Sautter RT, Tondo CT, Borrone J, Kuhn D, Motamayor J, Schnell R. A composite linkage map from the combination of three crosses made from commercial clones of cacao, T. cacao L. Trop Plant Biol. 2008 Apr 22;1(2):120–30. https://doi.org/10.1007/s12042-008-9011-4
28.
Faleiro FG, Queiroz VT, Lopes UV, Guimarães CT, Pires JL, Yamada MM, Araújo IS, Pereira MG, Schnell R, de Souza Filho GA, Ferreira CF. Mapping QTLs for witches’ broom (Crinipellis perniciosa) resistance in cacao (Theobroma cacao L.). Euphytica. 2006 May;149(1):227–35. https://doi.org/10.1007/s10681-005-9070-7
29.
Argout X, Martin G, Droc G, Fouet O, Labadie K, Rivals E, Aury JM, Lanaud C. The cacao Criollo genome v2.0: an improved version of the genome for genetic and functional genomic studies. BMC Genomics. 2017 Dec 1;18(1):730. https://doi.org/10.1186/s12864-017-4120-9
30.
Saski CA, Feltus FA, Staton ME, Blackmon BP, Ficklin SP, Kuhn DN, Schnell RJ, Shapiro H, Motamayor JC. A genetically anchored physical framework for Theobroma cacao cv. Matina 1-6. BMC Genomics. 2011 Dec;12(1):413–25. https://doi.org/10.1186/1471-2164-12-413
31.
Feltus FA, Saski CA, Mockaitis K, Haiminen N, Parida L, Smith Z, Ford J, Staton ME, Ficklin SP, Blackmon BP, Cheng CH. Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes. BMC Genomics. 2011 Dec;12(1):1–6. https://doi.org/10.1186/1471-2164-12-379
32.
Fouet O, Allegre M, Argout X, Jeanneau M, Lemainque A, Pavek S, Boland A, Risterucci AM, Loor G, Tahi M, Sabau X. Structural characterization and mapping of functional EST-SSR markers in Theobroma cacao. Tree Genetics & Genomes. 2011 Aug;7(4):799–817. https://doi.org/10.1007/s11295-011-0375-5
33.
Allegre M, Argout X, Boccara M, Fouet O, Roguet Y, Bérard AU, Thévenin JM, Chauveau AU, Rivallan R, Clément D, Courtois B. Discovery and mapping of a new expressed sequence tag-single nucleotide polymorphism and simple sequence repeat panel for large-scale genetic studies and breeding of Theobroma cacao L. DNA research. 2012 Feb 1;19(1):23–35. https://doi.org/10.1093/dnares/dsr039
34.
Argout X, Fouet O, Wincker P, Gramacho K, Legavre T, Sabau X, Risterucci AM, Da Silva C, Cascardo J, Allegre M, Kuhn D. Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions. BMC Genomics. 2008 Dec 1;9(1):512. https://doi.org/10.1186/1471-2164-9-512
35.
Ribeyre F, Sounigo O, Argout X, Cilas C, Efombagn MI, Denis M, Bouvet JM, Fouet O, Lanaud C. The genomic selection of Theobroma cacao L: a new strategy of marker assisted selection to improve breeding efficiency and predict useful traits in new populations. International Symposium on Cocoa Research. Lima, Peru, 13-17 November 2017. ICCO, London. http://agritrop.cirad.fr/589763/1/ID589763.pdf
36.
Bekele FL, Bekele I, Butler DR, Bidaisee GG. Patterns of morphological variation in a sample of cacao (Theobroma cacao L.) germplasm from the International Cocoa Genebank, Trinidad. Genetic Resources and Crop Evolution. 2006 Aug;53(5):933–48. https://doi.org/10.1007/s10722-004-6692-x
37.
Bekele FL, Bidaisee GG, Singh H, Saravanakumar D. Morphological characterisation and evaluation of cacao (Theobroma cacao L.) in Trinidad to facilitate utilisation of Trinitario cacao globally. Genetic Resources and Crop Evolution. 2020a Mar;67(3):621–43. https://doi.org/10.1007/s10722-019-00793-7
38.
Bekele F, Bidaisee G, Saravanakumar D. Examining phenotypic diversity and economic value of cacao (Theobroma cacao L.) conserved at the International Cocoa Genebank, Trinidad to support improvement in cocoa yield globally. Tropical Agriculture. 2020b (released 2021 Feb 25);97(2). https://journals.sta.uwi.edu/ojs/index.php/ta/article/view/7970
39.
Iwaro AD, Bekele FL, Butler DR. Evaluation and utilisation of cacao (Theobroma cacao L.) germplasm at the International Cocoa Genebank, Trinidad. Euphytica. 2003 Mar;130(2):207–21. https://doi.org/10.1023/A:1022855131534
40.
Varshney RK, Terauchi R, McCouch SR. Harvesting the promising fruits of genomics: applying genome sequencing technologies to crop breeding. PLoS Biol. 2014 Jun 10;12(6):e1001883. https://doi.org/10.1371/journal.pbio.1001883
41.
Livingstone DS, Motamayor JC, Schnell RJ, Cariaga K, Freeman B, Meerow AW, Brown JS, Kuhn DN. Development of single nucleotide polymorphism markers in Theobroma cacao and comparison to simple sequence repeat markers for genotyping of Cameroon clones. Molecular Breeding. 2011 Jan;27(1):93–106. https://doi.org/10.1007/s11032-010-9416-2
42.
Akhunov E, Nicolet C, Dvorak J. Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina GoldenGate assay. Theoretical and Applied Genetics. 2009 Aug 1;119(3):507–17. https://link.springer.com/content/pdf/10.1007/s00122-009-1059-5.pdf
43.
Myles S, Peiffer J, Brown PJ, Ersoz ES, Zhang Z, Costich DE, Buckler ES. Association mapping: critical considerations shift from genotyping to experimental design. The Plant Cell. 2009 Aug 1;21(8):2194–202. http://www.plantcell.org/content/plantcell/21/8/2194.full.pdf
44.
Jannink JL, Lorenz AJ, Iwata H. Genomic selection in plant breeding: from theory to practice. Briefings in Functional Genomics. 2010 Mar 1;9(2):166–77. https://doi.org/10.1093/bfgp/elq001
45.
Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, de los Campos G, Burgueño J, González-Camacho JM, Pérez-Elizalde S, Beyene Y, Dreisigacker S. Genomic selection in plant breeding: methods, models, and perspectives. Trends in Plant Science. 2017 Nov 1;22(11):961–75. https://doi.org/10.1016/j.tplants.2017.08.011
46.
Breseghello F, Sorrells ME. Association analysis as a strategy for improvement of quantitative traits in plants. Crop Science. 2006 May;46(3):1323–30. https://doi.org/10.2135/cropsci2005.09-0305
47.
Li H, Bradbury P, Ersoz E, Buckler ES, Wang J. Joint QTL linkage mapping for multiple-cross mating design sharing one common parent. PloS one. 2011 Mar 15;6(3):e17573. https://doi.org/10.1371/journal.pone.0017573
48.
Hill WG, Robertson A. Linkage disequilibrium in finite populations. Theoretical and Applied Genetics. 1968 Jun;38(6):226–31. https://doi.org/10.1007/BF01245622
49.
Stack JC, Royaert S, Gutiérrez O, Nagai C, Holanda IS, Schnell R, Motamayor JC. Assessing microsatellite linkage disequilibrium in wild, cultivated, and mapping populations of Theobroma cacao L. and its impact on association mapping. Tree Genetics & Genomes. 2015 Apr 1;11(2):19. https://doi.org/10.1007/s11295-015-0839-0
50.
Marcano M, Morales S, Hoyer MT, Courtois B, Risterucci AM, Fouet O, Pugh T, Cros E, Gonzalez V, Dagert M, Lanaud C. A genomewide admixture mapping study for yield factors and morphological traits in a cultivated cocoa (Theobroma cacao L.) population. Tree Genetics & Genomes. 2009 Apr 1;5(2):329–37. https://doi.org/10.1007/s11295-008-0185-6
51.
Korte A, Farlow A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods. 2013 Dec;9(1):1–9. https://doi.org/10.1186/1746-4811-9-29
52.
Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003 Aug 1;164(4):1567–87. https://doi.org/10.1093/genetics/164.4.1567
53.
Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: dominant markers and null alleles. Molecular Ecology Notes. 2007 Jul;7(4):574–8. https://doi.org/10.1111/j.1471-8286.2007.01758.x
54.
Hubisz MJ, Falush D, Stephens M, Pritchard JK. Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources. 2009 Sep;9(5):1322–32. https://doi.org/10.1111/j.1755-0998.2009.02591.x
55.
Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, Meng D, Platt A, Tarone AM, Hu TT, Jiang R. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature. 2010 Jun; 465(7298):627–31. https://doi.org/10.1038/nature08800
56.
Kadam NN, Jagadish SK, Struik PC, van der Linden CG, Yin X. Incorporating genome-wide association into eco-physiological simulation to identify markers for improving rice yields. Journal of Experimental Botany. 2019 Apr 15;70(9):2575–86. https://doi.org/10.1093/jxb/erz120
57.
Remington DL, Thornsberry JM, Matsuoka Y, Wilson LM, Whitt SR, Doebley J, Kresovich S, Goodman MM, Buckler ES. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proceedings of the National Academy of Sciences. 2001 Sep 25;98(20):11479–84. https://doi.org/10.1073/pnas.201394398
58.
Alemu A, Feyissa T, Maccaferri M, Sciara G, Tuberosa R, Ammar K, Badebo A, Acevedo M, Letta T, Abeyo B. Genome-wide association analysis unveils novel QTLs for seminal root system architecture traits in Ethiopian durum wheat. BMC Genomics. 2021 Dec;22(1):1–6. https://doi.org/10.1186/s12864-020-07320-4
59.
Boyles RE, Cooper EA, Myers MT, Brenton Z, Rauh BL, Morris GP, Kresovich S. Genome-wide association studies of grain yield components in diverse sorghum germplasm. The Plant Genome. 2016 Jul;9(2): https://doi.org/10.3835/plantgenome2015.09.0091
60.
Zhang M, Kim Y, Zong J, Lin H, Dievart A, Li H, Zhang D, Liang W. Genome-wide analysis of the barley non-specific lipid transfer protein gene family. The Crop Journal. 2019 Feb 1;7(1):65–76. https://doi.org/10.1016/j.cj.2018.07.009
61.
Luo X, Ma C, Yue Y, Hu K, Li Y, Duan Z, Wu M, Tu J, Shen J, Yi B, Fu T. Unravelling the complex trait of harvest index in rapeseed (Brassica napus L.) with association mapping. BMC Genomics. 2015 Dec;16(1):1–0. https://doi.org/10.1186/s12864-015-1607-0
62.
Zhao X, Chang H, Feng L, Jing Y, Teng W, Qiu L, Zheng H, Han Y, Li W. Genome-wide association mapping and candidate gene analysis for saturated fatty acid content in soybean seed. Plant Breeding. 2019 Oct;138(5):588–98. https://doi.org/10.1111/pbr.12706
63.
Wang ML, Sukumaran S, Barkley NA, Chen Z, Chen CY, Guo B, Pittman RN, Stalker HT, Holbrook CC, Pederson GA, Yu J. Population structure and marker-trait association analysis of the US peanut (Arachis hypogaea L.) mini-core collection. Theoretical and Applied Genetics. 2011 Dec 1;123(8):1307–17. https://doi.org/10.1007/s00122-011-1668-7
64.
Zhu C, Gore M, Buckler ES, Yu J. Status and prospects of association mapping in plants. The Plant Genome. 2008 Jul;1(1). https://doi.org/10.3835/plantgenome2008.02.0089
65.
Motamayor JC, Lachenaud P, e Mota JW, Loor R, Kuhn DN, Brown JS, Schnell RJ. Geographic and genetic population differentiation of the Amazonian chocolate tree (Theobroma cacao L). PloS one. 2008 Oct 1;3(10):e3311. https://doi.org/10.1371/journal.pone.0003311
66.
Bekele F, Butler DR. Proposed short list of cocoa descriptors for characterization. In: Working procedures for cocoa germplasm evaluation and selection. Proceedings of the CFC/ICCO/IPGRI Project Workshop, Montpellier, France, 1-6 February, 1998. 2000 (pp. 41–48). International Plant Genetic Resources Institute (IPGRI), Rome.
67.
Toxopeus H. Cocoa breeding: a consequence of mating system heterosis and population structure. In: Proc. of Conf. on Cocoa and Coconuts in Malaysia. Wastie RL, Earp DA (Eds) 25-27 November, 1971, Kuala Lumpur. 1972;3–12. The Incorporated Society of Planters, Kuala Lumpur.
68.
Cilas C, Machado R, Motamayor JC. Relations between several traits linked to sexual plant reproduction in Theobroma cacao L.: number of ovules per ovary, number of seeds per pod, and seed weight. Tree Genetics & Genomes. 2010 Feb 1;6(2):219–26. https://doi.org/10.1007/s11295-009-0242-9
69.
dos Santos Fernandes L, Correa FM, Ingram KT, de Almeida AA, Royaert S. QTL mapping and identification of SNP-haplotypes affecting yield components of Theobroma cacao L. Horticulture Research. 2020 Mar 1;7(1):1–8. https://doi.org/10.1038/s41438-020-0250-3
70.
Pound FJ. The Progress of Selection. In: Third Annual Report on Cacao Research, 1933. 1943;25–8. Trinidad and Tobago Government Printery, Port-of-Spain.
71.
Zhao K, Aranzana MJ, Kim S, Lister C, Shindo C, Tang C, Toomajian C, Zheng H, Dean C, Marjoram P, Nordborg M. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 2007 Jan 19;3(1):e4. https://doi.org/10.1371/journal.pgen.0030004
72.
Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000 Jun 1;155(2):945–59. https://doi.org/10.1093/genetics/155.2.945
73.
Pritchard JK, Donnelly P. Case-control studies of association in structured or admixed populations. Theoretical Population Biology. 2001 Nov 1;60(3):227–37. https://doi.org/10.1006/tpbi.2001.1543
74.
Pritchard JK, Wen W, Falush D. Documentation for STRUCTURE software: Version 2. University of Chicago, Chicago, IL. 2010 Feb 2. http://pritch.bsd.uchicago.edu/structure.html
75.
Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology. 2005 Jul;14(8):2611–20. https://doi.org/10.1111/j.1365-294X.2005.02553.x
76.
Perrier X, Jacquemoud-Collet JP. DARwin software. 2006. http://darwin.cirad.fr/
77.
Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007 Oct 1;23(19):2633–5. https://doi.org/10.1093/bioinformatics/btm308
78.
Yu J, Pressoir G, Briggs WH, Bi IV, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, Kresovich S. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics. 2006 Feb;38(2):203–8. https://doi.org/10.1038/ng1702
79.
Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975 Jun 1:423–47. https://doi.org/10.2307/2529430
80.
Endelman JB, Jannink JL. Shrinkage estimation of the realized relationship matrix. G3: Genes, Genomes, Genetics. 2012 Nov 1;2(11):1405–13. https://doi.org/10.1534/g3.112.004259
81.
Cover T, Hart P. Nearest neighbor pattern classification. IEEE Transactions on information theory. 1967 Jan;13(1):21–7. doi: 10.1109/TIT.1967.1053964.
82.
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics. 2006 Aug;38(8):904–9. https://doi.org/10.1038/ng1847
83.
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. 2017. https://www.R-project.org
84.
Gao X, Becker LC, Becker DM, Starmer JD, Province MA. Avoiding the high Bonferroni penalty in genome-wide association studies. Genetic Epidemiology: The Official Publication of the International Genetic Epidemiology Society. 2010 Jan;34(1):100–5. https://doi.org/10.1002/gepi.20430
85.
Utro F, Haiminen N, Livingstone D, Cornejo OE, Royaert S, Schnell RJ, Motamayor JC, Kuhn DN, Laxmi P. iXora: exact haplotype inferencing and trait association. BMC Genetics. 2013 Dec;14(1):1–5. https://doi.org/10.1186/1471-2156-14-48
86.
Gutiérrez-López N, Ovando-Medina I, Salvador-Figueroa M, Molina-Freaner F, Avendaño-Arrazate CH, Vázquez-Ovando A. Unique haplotypes of cacao trees as revealed by trnH-psbA chloroplast DNA. PeerJ. 2016 Apr 7;4:e1855. https://doi.org/10.7717/peerj.1855
87.
Motilal LA, Zhang D, Mischke S, Meinhardt LW, Boccara M, Fouet O, Lanaud C, Umaharan P. Association mapping of seed and disease resistance traits in Theobroma cacao L. Planta. 2016 Dec;244(6):1265–76. https://doi.org/10.1007/s00425-016-2582-7
88.
Marcano M, Pugh T, Cros E, Morales S, Páez EA, Courtois B, Glaszmann JC, Engels JM, Phillips W, Astorga C, Risterucci AM. Adding value to cocoa (Theobroma cacao L.) germplasm information with domestication history and admixture mapping. Theoretical and Applied Genetics. 2007 Mar 1;114(5):877–84. https://doi.org/10.1007/s00122-006-0486-9
89.
Motamayor JC, Mockaitis K, Schmutz J, Haiminen N, Livingstone III D, Cornejo O, Findley SD, Zheng P, Utro F, Royaert S, Saski C. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color. Genome Biology. 2013 Jun;14(6):1–25. https://doi.org/10.1186/gb-2013-14-6-r53
90.
Jako C, Kumar A, Wei Y, Zou J, Barton DL, Giblin EM, Covello PS, Taylor DC. Seed-specific over-expression of an Arabidopsis cDNA encoding a diacylglycerol acyltransferase enhances seed oil content and seed weight. Plant Physiology. 2001 Jun 1;126(2):861–74. https://doi.org/10.1104/pp.126.2.861
91.
Motilal LA, Sounigo O, Thévenin JM, Risterucci AM, Pieretti I, Noyer JL, Lanaud C. Theobroma cacao L.: genome map and QTLs for Phytophthora palmivora resistance. In: Towards the effective and optimum promotion of cocoa through research and development. Proceedings of the 13th International Cocoa Research Conference, October 9–14, 2000, Kota Kinabalu, Malaysia. Cocoa Producers’ Alliance, Lagos, 2001;111–17.
92.
Queiroz VT, Guimarães CT, Ahnert D, Schuster I, Daher RT, Pereira MG, Miranda VR, Loguercio LL, Barros EG, Moreira MA, Wricke G. Identification of a major QTL in cocoa (Theobroma cacao L.) associated with resistance to Witches’ Broom disease. Plant Breeding. 2003 Jun;122(3):268–72. https://doi.org/10.1046/j.1439-0523.2003.00809.x
93.
Royaert S, Phillips-Mora W, Leal AM, Cariaga K, Brown JS, Kuhn DN, Schnell RJ, Motamayor JC. Identification of marker-trait associations for self-compatibility in a segregating mapping population of Theobroma cacao L. Tree Genetics & Genomes. 2011 Dec;7(6):1159–68. https://doi.org/10.1007/s11295-011-0403-5
94.
Royaert S, Jansen J, da Silva DV, de Jesus Branco SM, Livingstone DS, Mustiga G, Marelli JP, Araújo IS, Corrêa RX, Motamayor JC. Identification of candidate genes involved in Witches’ Broom disease resistance in a segregating mapping population of Theobroma cacao L. in Brazil. BMC Genomics. 2016 Dec;17(1):107. https://doi.org/10.1186/s12864-016-2415-x
95.
Sounigo O, Efombagn B, Lemainque A et al. Association mapping on cocoa: a way to identify functional SSR markers linked to yield, tolerance to black pod and mirids assessed in Cameroon and develop a marker assisted breeding programme. In Proceedings of the 16th Int Cocoa Research Conference, Bali, Indonesia, 16–21 November 2009. 2012;153–58. COPAL, Lagos.
96.
Da Silva MR, Clément D, Gramacho KP, Monteiro WR, Argout X, Lanaud C, Lopes U. Genome-wide association mapping of sexual incompatibility genes in cacao (Theobroma cacao L.). Tree Genetics & Genomes. 2016 Jun;12(3):1–3. https://doi.org/10.1007/s11295-016-1012-0
97.
Osorio-Guarín JA, Berdugo-Cely JA, Coronado-Silva RA, Baez E, Jaimes Y, Yockteng R. Genome-wide association study reveals novel candidate genes associated with productivity and disease resistance to Moniliophthora spp. in cacao (Theobroma cacao L.). G3: Genes, Genomes, Genetics. 2020 May 1;10(5):1713–25. https://doi.org/10.1534/g3.120.401153
98.
Semagn K, Bjørnstad Å, Xu Y. The genetic dissection of quantitative traits in crops. Electronic Journal of Biotechnology. 2010 Sep;13(5):16–7. https://scielo.conicyt.cl/pdf/ejb/v13n5/a16.pdf
99.
Mir RR, Choudhary N, Bawa V, Jan S, Singh B, Ashraf Bhat M, Paliwal R, Kumar A, Chitikineni A, Thudi M, Varshney RK. Allelic diversity, structural analysis and genome-wide association study (GWAS) for yield and related traits using unexplored common bean (Phaseolus vulgaris L.) germplasm from Western Himalayas. Frontiers in Genetics. 2021;11:1797. doi: 10.3389/fgene.2020.609603
100.
Caspari E. Pleiotropic gene action. Evolution. 1952 March;6:1–18. https://www.jstor.org/stable/2405500
101.
Yeaman S. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proceedings of the National Academy of Sciences. 2013 May 7;110(19):E1743–51. https://doi.org/10.1073/pnas.1219381110
102.
Elwers S, Zambrano A, Rohsius C, Lieberei R. Histological features of phenolic compounds in fine and bulk cocoa seed (Theobroma cacao L.). J Appl Bot Food Qual. 2010 Sep 1;83(2):182–8. https://www.researchgate.net/profile/Alexis-Zambrano-3/publication/259703547_Histological_features_of_phenolic_compounds_in_fi_ne_and_bulk_cocoa_seed_Theobroma_cacao_L/links/5693b49708ae820ff0727949/Histological-features-of-phenolic-compounds-in-fi-ne-and-bulk-cocoa-seed-Theobroma-cacao-L.pdf
103.
Bucheli P, Rousseau G, Alvarez M, Laloi M, McCarthy J. Developmental variation of sugars, carboxylic acids, purine alkaloids, fatty acids, and endoproteinase activity during maturation of Theobroma cacao L. seeds. Journal of Agricultural and Food Chemistry. 2001 Oct 15;49(10):5046–51. https://doi.org/10.1021/jf010620z
104.
Mustiga GM, Morrissey J, Stack JC, DuVal A, Royaert S, Jansen J, Bizzotto C, Villela-Dias C, Mei L, Cahoon EB, Seguine E. Identification of climate and genetic factors that control fat content and fatty acid composition of Theobroma cacao L. beans. Frontiers in Plant Science. 2019 Oct 14;10:1159. https://doi.org/10.3389/fpls.2019.01159
105.
Amin I, Jinap S, Jamilah B, Harikrisna K, Biehl B. Analysis of vicilin (7S)-class globulin in cocoa cotyledons from various genetic origins. Journal of the Science of Food and Agriculture. 2002 May 15;82(7):728–32. https://doi.org/10.1002/jsfa.1104
106.
Fritz PJ, Fritz KA, Kauffman JM, Patterson GR, Robertson CA, Stoesz DA, Wilson MR. Cocoa seeds: changes in protein and polysomal RNA during development. Journal of Food Science. 1985 Jul;50(4):946–50. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1365-2621.1985.tb12986.x
107.
Lung SC, Weselake RJ. Diacylglycerol acyltransferase: a key mediator of plant triacylglycerol synthesis. Lipids. 2006 Dec;41(12):1073–88. https://doi.org/10.1007/s11745-006-5057-y
108.
Sørensen BM, Furukawa-Stoffer TL, Marshall KS, Page EK, Mir Z, Forster RJ, Weselake RJ. Storage lipid accumulation and acyltransferase action in developing flaxseed. Lipids. 2005 Oct;40(10):1043–9. https://doi.org/10.1007/s11745-005-1467-0
109.
Perry HJ, Harwood JL. Changes in the lipid content of developing seeds of Brassica napus. Phytochemistry. 1993 Jan 1;32(6):1411–5. https://doi.org/10.1016/0031-9422(93)85148-K
110.
Doebley JF, Gaut BS, Smith BD. The molecular genetics of crop domestication. Cell. 2006 Dec 29;127(7):1309–21. https://doi.org/10.1016/j.cell.2006.12.006
111.
Bhat JA, Ali S, Salgotra RK, Mir ZA, Dutta S, Jadon V, Tyagi A, Mushtaq M, Jain N, Singh PK, Singh GP. Genomic selection in the era of next generation sequencing for complex traits in plant breeding. Frontiers in Genetics. 2016 Dec 27;7:221. https://www.frontiersin.org/articles/10.3389/fgene.2016.00221/full
112.
Mangini G, Blanco A, Nigro D, Signorile MA, Simeone R. Candidate genes and quantitative trait loci for grain yield and seed size in Durum Wheat. Plants 2021;10, 312–28. https://doi.org/10.3390/plants10020312
113.
Ohto MA, Floyd SK, Fischer RL, Goldberg RB, Harada JJ. Effects of APETALA2 on embryo, endosperm, and seed coat development determine seed size in Arabidopsis. Sexual Plant Reproduction. 2009 Dec;22(4):277–89. https://doi.org/10.1007/s00497-009-0116-1
114.
Kroj T, Savino G, Valon C, Giraudat J, Parcy F. Regulation of storage protein gene expression in Arabidopsis. Development. 2003 Dec 15;130(24):6065–73. https://doi.org/10.1242/dev.00814
115.
Sosso D, Luo D, Li QB, Sasse J, Yang J, Gendrot G, Suzuki M, Koch KE, McCarty DR, Chourey PS, Rogowsky PM. Seed filling in domesticated maize and rice depends on SWEET-mediated hexose transport. Nature Genetics. 2015 Dec;47(12):1489. https://doi.org/10.1038/ng.3422
116.
Falcone Ferreyra ML, Rius S, Casati P. Flavonoids: biosynthesis, biological functions, and biotechnological applications. Frontiers in Plant Science. 2012 Sep 28;3:222. https://doi.org/10.3389/fpls.2012.00222.
117.
Devy L, Anita-Sari I, Saputra TI, Susilo AW, Wachjar A. Identification of molecular marker based on MYB Transcription Factor for the selection of Indonesian Fine Cacao (Theobroma cacao L.). Pelita Perkebunan (a Coffee and Cocoa Research Journal). 2018 Aug 31;34(2):59–68.
118.
Liu Y, Shi Z, Maximova SN, Payne MJ, Guiltinan MJ. Tc-MYBPA is an Arabidopsis TT2-like transcription factor and functions in the regulation of proanthocyanidin synthesis in Theobroma cacao. BMC Plant Biology. 2015 Dec;15(1):1–6. https://doi.org/10.1186/s12870-015-0529-y
119.
Bartley BGD. The genetic diversity of cacao and its utilization. 2005. CABI Publishing, Wallingford.
120.
Rockman MV. The QTN program and the alleles that matter for evolution: all that’s gold does not glitter. Evolution: International Journal of Organic Evolution. 2012 Jan;66(1):1–7. https://doi.org/10.1111/j.1558-5646.2011.01486.x
121.
Lee S, Abecasis GR, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. The American Journal of Human Genetics. 2014 Jul 3;95(1):5–23. https://doi.org/10.1016/j.ajhg.2014.06.009
122.
Bernardo R. Molecular markers and selection for complex traits in plants: learning from the last 20 years. Crop Science. 2008 Sep;48(5):1649–64. https://doi.org/10.2135/cropsci2008.03.0131
123.
Qi Z, Song J, Zhang K, Liu S, Tian X, Wang Y, Fang Y, Li X, Wang J, Yang C, Jiang S. Identification of QTNs controlling 100-seed weight in soybean using multilocus genome-wide association studies. Frontiers in Genetics. 2020 Jul 16;11:689. https://doi.org/10.3389/fgene.2020.00689
124.
Bailey BA, Melnick RL, Strem MD, Crozier J, Shao J, Sicher R, Phillips-Mora W, Ali SS, Zhang D, Meinhardt L. Differential gene expression by Moniliophthora roreri while overcoming cacao tolerance in the field. Molecular Plant Pathology. 2014 Sep;15(7):711–29. https://doi.org/10.1111/mpp.12134
125.
Pokou DN, Fister AS, Winters N, Tahi M, Klotioloma C, Sebastian A, Marden JH, Maximova SN, Guiltinan MJ. Resistant and susceptible cacao genotypes exhibit defense gene polymorphism and unique early responses to Phytophthora megakarya inoculation. Plant Molecular Biology. 2019 Mar;99(4):499–516. https://doi.org/10.1007/s11103-019-00832-y
126.
Chai Y, Hao X, Yang X, Allen WB, Li J, Yan J, Shen B, Li J. Validation of DGAT1-2 polymorphisms associated with oil content and development of functional markers for molecular breeding of high-oil maize. Molecular Breeding. 2012 Apr;29(4):939–49. https://doi.org/10.1007/s11032-011-9644-0
127.
Rebbeck TR, Spitz M, Wu X. Assessing the function of genetic variants in candidate gene association studies. Nature Reviews Genetics. 2004 Aug;5(8):589–97. https://doi.org/10.1038/nrg1403
128.
McKown AD, Klápště J, Guy RD, Geraldes A, Porth I, Hannemann J, Friedmann M, Muchero W, Tuskan GA, Ehlting J, Cronk QC. Genome-wide association implicates numerous genes underlying ecological trait variation in natural populations of Populus trichocarpa. New Phytologist. 2014 Jul;203(2):535–53. https://doi.org/10.1111/nph.12815
129.
Micheli F, Maximova S, Gramacho KP, Guiltinan M, Wilkinson MJ, Lanaud, C, … de Mattos Cascardo JC. Functional genomics of cacao. In: Advances in Botanical Research, Chapter 3. Jean-Claude K, Michel D (Eds). 2010;119–177. Academic Press, London.
130.
Romero Navarro JA, Phillips-Mora W, Arciniegas-Leal A, Mata-Quirós A, Haiminen N, Mustiga G, Livingstone III D, van Bakel H, Kuhn DN, Parida L, Kasarskis A. Application of genome wide association and genomic prediction for improvement of cacao productivity and resistance to black and frosty pod diseases. Frontiers in Plant Science. 2017 Nov 14;8:1905. https://doi.org/10.3389/fpls.2017.01905
131.
McElroy MS, Navarro AJ, Mustiga G, Stack C, Gezan S, Peña G, Sarabia W, Saquicela D, Sotomayor I, Douglas GM, Migicovsky Z. Prediction of cacao (Theobroma cacao) resistance to Moniliophthora spp. diseases via genome-wide association analysis and genomic selection. Frontiers in Plant Science. 2018 Mar 20;9:343. https://doi.org/10.3389/fpls.2018.00343
132.
Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001 Apr 1;157(4):1819–29. https://doi.org/10.1093/genetics/157.4.1819
133.
Cerrudo D, Cao S, Yuan Y, Martinez C, Suarez EA, Babu R, Zhang X, Trachsel S. Genomic selection outperforms marker assisted selection for grain yield and physiological traits in a maize doubled haploid population across water treatments. Frontiers in Plant Science. 2018 Mar 20;9:366. https://doi.org/10.3389/fpls.2018.00366
134.
Bernardo R, Yu J. Prospects for genomewide selection for quantitative traits in maize. Crop Science. 2007 May;47(3):1082–90. https://doi.org/10.2135/cropsci2006.11.0690
135.
Liu D, Zhang J, Liu X, Wang W, Liu D, Teng Z, Fang X, Tan Z, Tang S, Yang J, Zhong J. Fine mapping and RNA-Seq unravels candidate genes for a major QTL controlling multiple fiber quality traits at the T1 region in upland cotton. BMC Genomics. 2016 Dec;17(1):1–3. https://doi.org/10.1186/s12864-016-2605-6
136.
Poczai P, Varga I, Laos M, Cseh A, Bell N, Valkonen JP, Hyvönen J. Advances in plant gene-targeted and functional markers: a review. Plant Methods. 2013 Dec;9(1):1–32. https://doi.org/10.1186/1746-4811-9-6
137.
Li N, Xu R, Li Y. Molecular networks of seed size control in plants. Annual Review of Plant Biology. 2019 Apr 29;70:435–63. https://doi.org/10.1146/annurev-arplant-050718-095851
138.
Turner SD. qqman: an R package for visualizing GWAS results using QQ and Manhattan plots. bioRxiv. 2014 Jan 1:005165.

Posted November 22, 2021.

Subject Area

Genomics

[1] 1.
Alverson WS, Whitlock BA, Nyffeler R, Bayer C, Baum DA. Phylogeny of the core Malvales: evidence from ndhF sequence data. American Journal of Botany. 1999 Oct;86(10):1474–86. https://doi.org/10.2307/2656928

[2] 2.
Expert market research (2020) Expert Market Research Report https://www.expertmarketresearch.com/reports/chocolate-market. Accessed August 6 2020.

[3] 3.
Argout X, Salse J, Aury JM, Guiltinan MJ, Droc G, Gouzy J, Allegre M, Chaparro C, Legavre T, Maximova SN, Abrouk M. The genome of Theobroma cacao. Nature Genetics. 2011 Feb;43(2):101–8. https://doi.org/10.1038/ng.736

[4] 4.
Cheesman EE. Notes on the nomenclature, classification and possible relationships of cacao populations. Tropical Agriculture. 1944;21(8).

[5] 5.
Eskes A, Lanaud C. Cocoa. In: Tropical Plant Breeding. Eds Charrier A, Jacquot M, Hamon S, Nicolas D. CIRAD, Montpellier. 2001. pp. 78–105.

[6] 6.
Bekele F, Phillips-Mora W. Cocoa Breeding. In: Advances in Plant Breeding: Industrial and Food Crops, Vol 6. Eds J.M. Al-Khayri, et al. Springer-Verlag, Cham. 2019. pp. 409–87. https://doi.org/10.1007/978-3-030-23265-8_12.

[7] 7.
Lanaud C, Fouet O, Legavre T, Lopes U, Sounigo O, Eyango MC, Mermaz B, Da Silva MR, Loor Solórzano RG, Argout X, Gyapay G. Deciphering the Theobroma cacao self-incompatibility system: from genomics to diagnostic markers for self-compatibility. Journal of Experimental Botany. 2017 Oct 13;68(17):4775–90. https://doi.org/10.1093/jxb/erx293

[8] 8.
Ribaut JM, De Vicente MC, Delannay X. Molecular breeding in developing countries: challenges and perspectives. Current Opinion in Plant Biology. 2010 Apr 1;13(2):213–8. https://doi.org/10.1016/j.pbi.2009.12.011

[9] 9.
Simmonds NW. The breeding of perennial crops. In: Proceedings of the Workshop on the Conservation, Characterisation and Utilization of Cocoa Genetic Resources in the 21st Century, 13-17 September 1992. The Cocoa Research Unit, Port of Spain. 1993;156–62.

[10] 10.
Clément D, Risterucci AM, Motamayor JC, N’Goran J, Lanaud C. Mapping QTL for yield components, vigor, and resistance to Phytophthora palmivora in Theobroma cacao L. Genome. 2003a Apr 1;46(2):204–12. https://doi.org/10.1139/g02-125

[11] 11.
Araújo IS, de Souza Filho GA, Pereira MG, Faleiro FG, de Queiroz VT, Guimarães CT, Moreira MA, de Barros EG, Machado RC, Pires JL, Schnell R. Mapping of quantitative trait loci for butter content and hardness in cocoa beans (Theobroma cacao L.). Plant Molecular Biology Reporter. 2009 Jun 1;27(2):177–83. https://link.springer.com/content/pdf/10.1007/s11105-008-0069-9.pdf

[12] 12.
Crouzillat D, Lerceteau E, Pétiard V, Morera J, Rodríguez H, Walker D, Phillips W, Ronning C, Schnell R, Osei J, Fritz P. Theobroma cacao L.: a genetic linkage map and quantitative trait loci analysis. Theoretical and Applied Genetics. 1996 Jul 1;93(1-2):205–14. https://doi.org/10.1007/BF00225747

[13] 13.
Crouzillat D, Ménard B, Mora A, Phillips W, Pétiard V. Quantitative trait analysis in Theobroma cacao using molecular markers. Euphytica. 2000a Jul;114(1):13–23. https://doi.org/10.1023/A:1003892217582

[14] 14.
Crouzillat D, Phillips W, Fritz PJ, Pétiard V. Quantitative trait loci analysis in Theobroma cacao using molecular markers. Inheritance of polygenic resistance to Phytophthora palmivora in two related cacao populations. Euphytica. 2000b Jul;114(1):25–36. https://doi.org/10.1023/A:1003994212394

[15] 15.
N’Goran JA, Risterucci AM, Clément D, Sounigo O, Lorieux M, Lanaud C. Identification of quantitative trait loci (QTL) in Theobroma cacao L. Agron Afr. 1997;9:55–63.

[16] 16.
Lanaud C, Kébé IS, Risterucci AM, Clément D, N’Goran JA, Grivet L, Tahi GM, Cilas C, Pieretti I, Eskes A, Despréaux D. Mapping quantitative trait loci (QTL) for resistance to Phytophthora palmivora in T. cacao. In: Proceedings of the 12th International Cocoa Research Conference, November 17 1996, Bahia, Brazil. Cocoa Producers’ Alliance, Lagos. 1999;99–105.

[17] 17.
Lanaud C, Boult E, Clapperton J, N’Goran JKA, Cros E, Chapelin M, Clément D, Petithugenin P. Identification of QTLs related to fat content, seed size and sensorial traits in Theobroma cacao L. In Proceedings of the 14th International Cocoa Conference, 13–18 October 2003, Accra, Ghana. Cocoa Producers’ Alliance, Lagos. 2005;1119–26.

[18] 18.
Lanaud C, Fouet O, Clément D, Boccara M, Risterucci AM, Surujdeo-Maharaj S, Legavre T, Argout X. A meta-QTL analysis of disease resistance traits of Theobroma cacao L. Molecular Breeding. 2009 Nov;24(4):361–74. https://doi.org/10.1007/s11032-009-9297-4

[19] 19.
Flament MH, Kébé I, Clément D, Pieretti I, Risterucci AM, N’Goran JA, Cilas C, Despréaux D, Lanaud C. Genetic mapping of resistance factors to Phytophthora palmivora in cocoa. Genome. 2001 Feb 1;44(1):79–85. https://doi.org/10.1139/g00-099

[20] 20.
Clément D, Risterucci AM, Motamayor JC, N’Goran J, Lanaud C. Mapping quantitative trait loci for bean traits and ovule number in Theobroma cacao L. Genome. 2003b Feb 1;46(1):103–11. https://doi.org/10.1139/g02-118

[21] 21.
Clément D, Lanaud C, Sabau X, Fouet O, Le Cunff L, Ruiz E, Risterucci AM, Glaszmann JC, Piffanelli P. Creation of BAC genomic resources for cocoa (Theobroma cacao L.) for physical mapping of RGA containing BAC clones. Theoretical and Applied Genetics. 2004 May 1;108(8):1627–34. https://doi.org/10.1007/s00122-004-1593-0

[22] 22.
Risterucci AM, Paulin D, Ducamp M, N’Goran JA, Lanaud C. Identification of QTLs related to cocoa resistance to three species of Phytophthora. Theoretical and Applied Genetics. 2003 Dec 1;108(1):168–74. https://doi.org/10.1007/s00122-003-1408-8

[23] 23.
Pugh T. Etude du déséquilibre de liaison chez le cacaoyer appartenant aux groupes Criollo/Trinitario. Application au marquage génétique d’intérêt pour la sélection. Thèse Doctorat, Ecole Nationale Supérieure d’Agronomie, Montpellier. 2005;107p.

[24] 24.
Pugh T, Fouet O, Risterucci AM, Brottier P, Abouladze M, Delettrez C, Courtois B, Clément D, Larmande P, N’Goran JA, Lanaud C. A new codominant marker-based cocoa linkage map: development and integration of new microsatellite markers into cocoa linkage map. A new cocoa reference map. In Proceedings of 14th International Cocoa Research Conference, Accra, Ghana, 13-18 October 2003. Cocoa Producers’ Alliance, Lagos. 2005;153–60.

[25] 25.
Brown JS, Schnell RJ, Motamayor JC, Lopes U, Kuhn DN, Borrone JW. Resistance gene mapping for witches’ broom disease in Theobroma cacao L. in an F2 population using SSR markers and candidate genes. Journal of the American Society for Horticultural Science. 2005 May 1;130(3):366–73. https://doi.org/10.21273/JASHS.130.3.366

[26] 26.
Brown JS, Phillips-Mora W, Power EJ, Krol C, Cervantes-Martinez C, Motamayor JC, Schnell RJ. Mapping QTLs for resistance to frosty pod and black pod diseases and horticultural traits in Theobroma cacao L. Crop Science. 2007 Sep;47(5):1851–8. https://doi.org/10.2135/cropsci2006.11.0753

[27] 27.
Brown JS, Sautter RT, Tondo CT, Borrone J, Kuhn D, Motamayor J, Schnell R. A composite linkage map from the combination of three crosses made from commercial clones of cacao, T. cacao L. Trop Plant Biol. 2008 Apr 22;1(2):120–30. https://doi.org/10.1007/s12042-008-9011-4

[28] 28.
Faleiro FG, Queiroz VT, Lopes UV, Guimarães CT, Pires JL, Yamada MM, Araújo IS, Pereira MG, Schnell R, de Souza Filho GA, Ferreira CF. Mapping QTLs for witches’ broom (Crinipellis perniciosa) resistance in cacao (Theobroma cacao L.). Euphytica. 2006 May;149(1):227–35. https://doi.org/10.1007/s10681-005-9070-7

[29] 29.
Argout X, Martin G, Droc G, Fouet O, Labadie K, Rivals E, Aury JM, Lanaud C. The cacao Criollo genome v2.0: an improved version of the genome for genetic and functional genomic studies. BMC Genomics. 2017 Dec 1;18(1):730. https://doi.org/10.1186/s12864-017-4120-9

[30] 30.
Saski CA, Feltus FA, Staton ME, Blackmon BP, Ficklin SP, Kuhn DN, Schnell RJ, Shapiro H, Motamayor JC. A genetically anchored physical framework for Theobroma cacao cv. Matina 1-6. BMC Genomics. 2011 Dec;12(1):413–25. https://doi.org/10.1186/1471-2164-12-413

[31] 31.
Feltus FA, Saski CA, Mockaitis K, Haiminen N, Parida L, Smith Z, Ford J, Staton ME, Ficklin SP, Blackmon BP, Cheng CH. Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes. BMC Genomics. 2011 Dec;12(1):1–6. https://doi.org/10.1186/1471-2164-12-379

[32] 32.
Fouet O, Allegre M, Argout X, Jeanneau M, Lemainque A, Pavek S, Boland A, Risterucci AM, Loor G, Tahi M, Sabau X. Structural characterization and mapping of functional EST-SSR markers in Theobroma cacao. Tree Genetics & Genomes. 2011 Aug;7(4):799–817. https://doi.org/10.1007/s11295-011-0375-5

[33] 33.
Allegre M, Argout X, Boccara M, Fouet O, Roguet Y, Bérard AU, Thévenin JM, Chauveau AU, Rivallan R, Clément D, Courtois B. Discovery and mapping of a new expressed sequence tag-single nucleotide polymorphism and simple sequence repeat panel for large-scale genetic studies and breeding of Theobroma cacao L. DNA Research. 2012 Feb 1;19(1):23–35. https://doi.org/10.1093/dnares/dsr039

[34] 34.
Argout X, Fouet O, Wincker P, Gramacho K, Legavre T, Sabau X, Risterucci AM, Da Silva C, Cascardo J, Allegre M, Kuhn D. Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions. BMC Genomics. 2008 Dec 1;9(1):512. https://doi.org/10.1186/1471-2164-9-512

[35] 35.
Ribeyre F, Sounigo O, Argout X, Cilas C, Efombagn MI, Denis M, Bouvet JM, Fouet O, Lanaud C. The genomic selection of Theobroma cacao L: a new strategy of marker assisted selection to improve breeding efficiency and predict useful traits in new populations. International Symposium on Cocoa Research. Lima, Peru, 13–17 November 2017. ICCO, London. http://agritrop.cirad.fr/589763/1/ID589763.pdf

[36] 36.
Bekele FL, Bekele I, Butler DR, Bidaisee GG. Patterns of morphological variation in a sample of cacao (Theobroma cacao L.) germplasm from the International Cocoa Genebank, Trinidad. Genetic Resources and Crop Evolution. 2006 Aug;53(5):933–48. https://doi.org/10.1007/s10722-004-6692-x

[37] 37.
Bekele FL, Bidaisee GG, Singh H, Saravanakumar D. Morphological characterisation and evaluation of cacao (Theobroma cacao L.) in Trinidad to facilitate utilisation of Trinitario cacao globally. Genetic Resources and Crop Evolution. 2020a Mar;67(3):621–43. https://doi.org/10.1007/s10722-019-00793-7

[38] 38.
Bekele F, Bidaisee G, Saravanakumar D. Examining phenotypic diversity and economic value of cacao (Theobroma cacao L.) conserved at the International Cocoa Genebank, Trinidad to support improvement in cocoa yield globally. Tropical Agriculture. 2020b (released 2021 Feb 25);97(2). https://journals.sta.uwi.edu/ojs/index.php/ta/article/view/7970

[39] 39.
Iwaro AD, Bekele FL, Butler DR. Evaluation and utilisation of cacao (Theobroma cacao L.) germplasm at the International Cocoa Genebank, Trinidad. Euphytica. 2003 Mar;130(2):207–21. https://doi.org/10.1023/A:1022855131534

[40] 40.
Varshney RK, Terauchi R, McCouch SR. Harvesting the promising fruits of genomics: applying genome sequencing technologies to crop breeding. PLoS Biol. 2014 Jun 10;12(6):e1001883. https://doi.org/10.1371/journal.pbio.1001883

[41] 41.
Livingstone DS, Motamayor JC, Schnell RJ, Cariaga K, Freeman B, Meerow AW, Brown JS, Kuhn DN. Development of single nucleotide polymorphism markers in Theobroma cacao and comparison to simple sequence repeat markers for genotyping of Cameroon clones. Molecular Breeding. 2011 Jan;27(1):93–106. https://doi.org/10.1007/s11032-010-9416-2

[42] 42.
Akhunov E, Nicolet C, Dvorak J. Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina GoldenGate assay. Theoretical and Applied Genetics. 2009 Aug 1;119(3):507–17. https://link.springer.com/content/pdf/10.1007/s00122-009-1059-5.pdf

[43] 43.
Myles S, Peiffer J, Brown PJ, Ersoz ES, Zhang Z, Costich DE, Buckler ES. Association mapping: critical considerations shift from genotyping to experimental design. The Plant Cell. 2009 Aug 1;21(8):2194–202. http://www.plantcell.org/content/plantcell/21/8/2194.full.pdf

[44]
Jannink JL, Lorenz AJ, Iwata H. Genomic selection in plant breeding: from theory to practice. Briefings in Functional Genomics. 2010 Mar 1;9(2):166–77. https://doi.org/10.1093/bfgp/elq001

[45]
Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, de los Campos G, Burgueño J, González-Camacho JM, Pérez-Elizalde S, Beyene Y, Dreisigacker S. Genomic selection in plant breeding: methods, models, and perspectives. Trends in Plant Science. 2017 Nov 1;22(11):961–75. https://doi.org/10.1016/j.tplants.2017.08.011

[46]
Breseghello F, Sorrells ME. Association analysis as a strategy for improvement of quantitative traits in plants. Crop Science. 2006 May;46(3):1323–30. https://doi.org/10.2135/cropsci2005.09-0305

[47]
Li H, Bradbury P, Ersoz E, Buckler ES, Wang J. Joint QTL linkage mapping for multiple-cross mating design sharing one common parent. PLoS ONE. 2011 Mar 15;6(3):e17573. https://doi.org/10.1371/journal.pone.0017573

[48]
Hill WG, Robertson A. Linkage disequilibrium in finite populations. Theoretical and Applied Genetics. 1968 Jun;38(6):226–31. https://doi.org/10.1007/BF01245622

[49]
Stack JC, Royaert S, Gutiérrez O, Nagai C, Holanda IS, Schnell R, Motamayor JC. Assessing microsatellite linkage disequilibrium in wild, cultivated, and mapping populations of Theobroma cacao L. and its impact on association mapping. Tree Genetics & Genomes. 2015 Apr 1;11(2):19. https://doi.org/10.1007/s11295-015-0839-0

[50]
Marcano M, Morales S, Hoyer MT, Courtois B, Risterucci AM, Fouet O, Pugh T, Cros E, Gonzalez V, Dagert M, Lanaud C. A genomewide admixture mapping study for yield factors and morphological traits in a cultivated cocoa (Theobroma cacao L.) population. Tree Genetics & Genomes. 2009 Apr 1;5(2):329–37. https://doi.org/10.1007/s11295-008-0185-6

[51]
Korte A, Farlow A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods. 2013 Dec;9(1):1–9. https://doi.org/10.1186/1746-4811-9-29

[52]
Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003 Aug 1;164(4):1567–87. https://doi.org/10.1093/genetics/164.4.1567

[53]
Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: dominant markers and null alleles. Molecular Ecology Notes. 2007 Jul;7(4):574–8. https://doi.org/10.1111/j.1471-8286.2007.01758.x

[54]
Hubisz MJ, Falush D, Stephens M, Pritchard JK. Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources. 2009 Sep;9(5):1322–32. https://doi.org/10.1111/j.1755-0998.2009.02591.x

[55]
Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, Meng D, Platt A, Tarone AM, Hu TT, Jiang R. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature. 2010 Jun;465(7298):627–31. https://doi.org/10.1038/nature08800

[56]
Kadam NN, Jagadish SK, Struik PC, van der Linden CG, Yin X. Incorporating genome-wide association into eco-physiological simulation to identify markers for improving rice yields. Journal of Experimental Botany. 2019 Apr 15;70(9):2575–86. https://doi.org/10.1093/jxb/erz120

[57]
Remington DL, Thornsberry JM, Matsuoka Y, Wilson LM, Whitt SR, Doebley J, Kresovich S, Goodman MM, Buckler ES. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proceedings of the National Academy of Sciences. 2001 Sep 25;98(20):11479–84. https://doi.org/10.1073/pnas.201394398

[58]
Alemu A, Feyissa T, Maccaferri M, Sciara G, Tuberosa R, Ammar K, Badebo A, Acevedo M, Letta T, Abeyo B. Genome-wide association analysis unveils novel QTLs for seminal root system architecture traits in Ethiopian durum wheat. BMC Genomics. 2021 Dec;22(1):1–6. https://doi.org/10.1186/s12864-020-07320-4

[59]
Boyles RE, Cooper EA, Myers MT, Brenton Z, Rauh BL, Morris GP, Kresovich S. Genome-wide association studies of grain yield components in diverse sorghum germplasm. The Plant Genome. 2016 Jul;9(2). https://doi.org/10.3835/plantgenome2015.09.0091

[60]
Zhang M, Kim Y, Zong J, Lin H, Dievart A, Li H, Zhang D, Liang W. Genome-wide analysis of the barley non-specific lipid transfer protein gene family. The Crop Journal. 2019 Feb 1;7(1):65–76. https://doi.org/10.1016/j.cj.2018.07.009

[61]
Luo X, Ma C, Yue Y, Hu K, Li Y, Duan Z, Wu M, Tu J, Shen J, Yi B, Fu T. Unravelling the complex trait of harvest index in rapeseed (Brassica napus L.) with association mapping. BMC Genomics. 2015 Dec;16(1):1–0. https://doi.org/10.1186/s12864-015-1607-0

[62]
Zhao X, Chang H, Feng L, Jing Y, Teng W, Qiu L, Zheng H, Han Y, Li W. Genome-wide association mapping and candidate gene analysis for saturated fatty acid content in soybean seed. Plant Breeding. 2019 Oct;138(5):588–98. https://doi.org/10.1111/pbr.12706

[63]
Wang ML, Sukumaran S, Barkley NA, Chen Z, Chen CY, Guo B, Pittman RN, Stalker HT, Holbrook CC, Pederson GA, Yu J. Population structure and marker-trait association analysis of the US peanut (Arachis hypogaea L.) mini-core collection. Theoretical and Applied Genetics. 2011 Dec 1;123(8):1307–17. https://doi.org/10.1007/s00122-011-1668-7

[64]
Zhu C, Gore M, Buckler ES, Yu J. Status and prospects of association mapping in plants. The Plant Genome. 2008 Jul;1(1). https://doi.org/10.3835/plantgenome2008.02.0089

[65]
Motamayor JC, Lachenaud P, e Mota JW, Loor R, Kuhn DN, Brown JS, Schnell RJ. Geographic and genetic population differentiation of the Amazonian chocolate tree (Theobroma cacao L). PLoS ONE. 2008 Oct 1;3(10):e3311. https://doi.org/10.1371/journal.pone.0003311

[66]
Bekele F, Butler DR. Proposed short list of cocoa descriptors for characterization. In: Working procedures for cocoa germplasm evaluation and selection. Proceedings of the CFC/ICCO/IPGRI Project Workshop, Montpellier, France, 1-6 February, 1998. 2000 (pp. 41–48). International Plant Genetic Resources Institute (IPGRI), Rome.

[67]
Toxopeus H. Cocoa breeding: a consequence of mating system heterosis and population structure. In: Proc. of Conf. on Cocoa and Coconuts in Malaysia. Wastie RL, Earp DA (Eds) 25–27 November, 1971, Kuala Lumpur. 1972;3–12. The Incorporated Society of Planters, Kuala Lumpur.

[68]
Cilas C, Machado R, Motamayor JC. Relations between several traits linked to sexual plant reproduction in Theobroma cacao L.: number of ovules per ovary, number of seeds per pod, and seed weight. Tree Genetics & Genomes. 2010 Feb 1;6(2):219–26. https://doi.org/10.1007/s11295-009-0242-9

[69]
dos Santos Fernandes L, Correa FM, Ingram KT, de Almeida AA, Royaert S. QTL mapping and identification of SNP-haplotypes affecting yield components of Theobroma cacao L. Horticulture Research. 2020 Mar 1;7(1):1–8. https://doi.org/10.1038/s41438-020-0250-3

[70]
Pound FJ. The Progress of Selection. In Third Annual Report on Cacao Research, 1933. 1943;25–8. Trinidad and Tobago Government Printery, Port-of-Spain.

[71]
Zhao K, Aranzana MJ, Kim S, Lister C, Shindo C, Tang C, Toomajian C, Zheng H, Dean C, Marjoram P, Nordborg M. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 2007 Jan 19;3(1):e4. https://doi.org/10.1371/journal.pgen.0030004

[72]
Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000 Jun 1;155(2):945–59. https://doi.org/10.1093/genetics/155.2.945

[73]
Pritchard JK, Donnelly P. Case-control studies of association in structured or admixed populations. Theoretical Population Biology. 2001 Nov 1;60(3):227–37. https://doi.org/10.1006/tpbi.2001.1543

[74]
Pritchard JK, Wen W, Falush D. Documentation for STRUCTURE software: Version 2. University of Chicago, Chicago, IL. 2010 Feb 2. http://pritch.bsd.uchicago.edu/structure.html

[75]
Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology. 2005 Jul;14(8):2611–20. https://doi.org/10.1111/j.1365-294X.2005.02553.x

[76]
Perrier X, Jacquemoud-Collet JP. DARwin software. 2006. http://darwin.cirad.fr/

[77]
Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007 Oct 1;23(19):2633–5. https://doi.org/10.1093/bioinformatics/btm308

[78]
Yu J, Pressoir G, Briggs WH, Bi IV, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, Kresovich S. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics. 2006 Feb;38(2):203–8. https://doi.org/10.1038/ng1702

[79]
Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975 Jun 1:423–47. https://doi.org/10.2307/2529430

[80]
Endelman JB, Jannink JL. Shrinkage estimation of the realized relationship matrix. G3: Genes, Genomes, Genetics. 2012 Nov 1;2(11):1405–13. https://doi.org/10.1534/g3.112.004259

[81]
Cover T, Hart P. Nearest neighbor pattern classification. IEEE Transactions on Information Theory. 1967 Jan;13(1):21–7. https://doi.org/10.1109/TIT.1967.1053964

[82]
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics. 2006 Aug;38(8):904–9. https://doi.org/10.1038/ng1847

[83]
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. 2017. https://www.R-project.org

[84]
Gao X, Becker LC, Becker DM, Starmer JD, Province MA. Avoiding the high Bonferroni penalty in genome-wide association studies. Genetic Epidemiology: The Official Publication of the International Genetic Epidemiology Society. 2010 Jan;34(1):100–5. https://doi.org/10.1002/gepi.20430

[85]
Utro F, Haiminen N, Livingstone D, Cornejo OE, Royaert S, Schnell RJ, Motamayor JC, Kuhn DN, Laxmi P. iXora: exact haplotype inferencing and trait association. BMC Genetics. 2013 Dec;14(1):1–5. https://doi.org/10.1186/1471-2156-14-48

[86]
Gutiérrez-López N, Ovando-Medina I, Salvador-Figueroa M, Molina-Freaner F, Avendaño-Arrazate CH, Vázquez-Ovando A. Unique haplotypes of cacao trees as revealed by trnH-psbA chloroplast DNA. PeerJ. 2016 Apr 7;4:e1855. https://doi.org/10.7717/peerj.1855

[87]
Motilal LA, Zhang D, Mischke S, Meinhardt LW, Boccara M, Fouet O, Lanaud C, Umaharan P. Association mapping of seed and disease resistance traits in Theobroma cacao L. Planta. 2016 Dec;244(6):1265–76. https://doi.org/10.1007/s00425-016-2582-7

[88]
Marcano M, Pugh T, Cros E, Morales S, Páez EA, Courtois B, Glaszmann JC, Engels JM, Phillips W, Astorga C, Risterucci AM. Adding value to cocoa (Theobroma cacao L.) germplasm information with domestication history and admixture mapping. Theoretical and Applied Genetics. 2007 Mar 1;114(5):877–84. https://doi.org/10.1007/s00122-006-0486-9

[89]
Motamayor JC, Mockaitis K, Schmutz J, Haiminen N, Livingstone III D, Cornejo O, Findley SD, Zheng P, Utro F, Royaert S, Saski C. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color. Genome Biology. 2013 Jun;14(6):1–25. https://doi.org/10.1186/gb-2013-14-6-r53

[90]
Jako C, Kumar A, Wei Y, Zou J, Barton DL, Giblin EM, Covello PS, Taylor DC. Seed-specific over-expression of an Arabidopsis cDNA encoding a diacylglycerol acyltransferase enhances seed oil content and seed weight. Plant Physiology. 2001 Jun 1;126(2):861–74. https://doi.org/10.1104/pp.126.2.861

[91]
Motilal LA, Sounigo O, Thévenin JM, Risterucci AM, Pieretti I, Noyer JL, Lanaud C. Theobroma cacao L.: genome map and QTLs for Phytophthora palmivora resistance. In: Towards the effective and optimum promotion of cocoa through research and development. Proceedings of the 13th International Cocoa Research Conference, October 9–14, 2000, Kota Kinabalu, Malaysia. Cocoa Producers’ Alliance, Lagos, 2001;111–17.

[92]
Queiroz VT, Guimarães CT, Ahnert D, Schuster I, Daher RT, Pereira MG, Miranda VR, Loguercio LL, Barros EG, Moreira MA, Wricke G. Identification of a major QTL in cocoa (Theobroma cacao L.) associated with resistance to Witches’ Broom disease. Plant Breeding. 2003 Jun;122(3):268–72. https://doi.org/10.1046/j.1439-0523.2003.00809.x

[93]
Royaert S, Phillips-Mora W, Leal AM, Cariaga K, Brown JS, Kuhn DN, Schnell RJ, Motamayor JC. Identification of marker-trait associations for self-compatibility in a segregating mapping population of Theobroma cacao L. Tree Genetics & Genomes. 2011 Dec;7(6):1159–68. https://doi.org/10.1007/s11295-011-0403-5

[94]
Royaert S, Jansen J, da Silva DV, de Jesus Branco SM, Livingstone DS, Mustiga G, Marelli JP, Araújo IS, Corrêa RX, Motamayor JC. Identification of candidate genes involved in Witches’ Broom disease resistance in a segregating mapping population of Theobroma cacao L. in Brazil. BMC Genomics. 2016 Dec;17(1):107. https://doi.org/10.1186/s12864-016-2415-x

[95]
Sounigo O, Efombagn B, Lemainque A et al. Association mapping on cocoa: a way to identify functional SSR markers linked to yield, tolerance to black pod and mirids assessed in Cameroon and develop a marker assisted breeding programme. In Proceedings of the 16th Int Cocoa Research Conference, Bali, Indonesia, 16–21 November 2009. 2012;153–58. COPAL, Lagos.

[96]
Da Silva MR, Clément D, Gramacho KP, Monteiro WR, Argout X, Lanaud C, Lopes U. Genome-wide association mapping of sexual incompatibility genes in cacao (Theobroma cacao L.). Tree Genetics & Genomes. 2016 Jun;12(3):1–3. https://doi.org/10.1007/s11295-016-1012-0

[97]
Osorio-Guarín JA, Berdugo-Cely JA, Coronado-Silva RA, Baez E, Jaimes Y, Yockteng R. Genome-wide association study reveals novel candidate genes associated with productivity and disease resistance to Moniliophthora spp. in cacao (Theobroma cacao L.). G3: Genes, Genomes, Genetics. 2020 May 1;10(5):1713–25. https://doi.org/10.1534/g3.120.401153

[98]
Semagn K, Bjørnstad Å, Xu Y. The genetic dissection of quantitative traits in crops. Electronic Journal of Biotechnology. 2010 Sep;13(5):16–7. https://scielo.conicyt.cl/pdf/ejb/v13n5/a16.pdf

[99]
Mir RR, Choudhary N, Bawa V, Jan S, Singh B, Ashraf Bhat M, Paliwal R, Kumar A, Chitikineni A, Thudi M, Varshney RK. Allelic diversity, structural analysis and genome-wide association study (GWAS) for yield and related traits using unexplored common bean (Phaseolus vulgaris L.) germplasm from Western Himalayas. Frontiers in Genetics. 2021;11:1797. https://doi.org/10.3389/fgene.2020.609603

[100]
Caspari E. Pleiotropic gene action. Evolution. 1952 March;6:1–18. https://www.jstor.org/stable/2405500

[101]
Yeaman S. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proceedings of the National Academy of Sciences. 2013 May 7;110(19):E1743–51. https://doi.org/10.1073/pnas.1219381110

[102]
Elwers S, Zambrano A, Rohsius C, Lieberei R. Histological features of phenolic compounds in fine and bulk cocoa seed (Theobroma cacao L.). J Appl Bot Food Qual. 2010 Sep 1;83(2):182–8. https://www.researchgate.net/profile/Alexis-Zambrano-3/publication/259703547_Histological_features_of_phenolic_compounds_in_fi_ne_and_bulk_cocoa_seed_Theobroma_cacao_L/links/5693b49708ae820ff0727949/Histological-features-of-phenolic-compounds-in-fi-ne-and-bulk-cocoa-seed-Theobroma-cacao-L.pdf

[103]
Bucheli P, Rousseau G, Alvarez M, Laloi M, McCarthy J. Developmental variation of sugars, carboxylic acids, purine alkaloids, fatty acids, and endoproteinase activity during maturation of Theobroma cacao L. seeds. Journal of Agricultural and Food Chemistry. 2001 Oct 15;49(10):5046–51. https://doi.org/10.1021/jf010620z

[104]
Mustiga GM, Morrissey J, Stack JC, DuVal A, Royaert S, Jansen J, Bizzotto C, Villela-Dias C, Mei L, Cahoon EB, Seguine E. Identification of climate and genetic factors that control fat content and fatty acid composition of Theobroma cacao L. beans. Frontiers in Plant Science. 2019 Oct 14;10:1159. https://doi.org/10.3389/fpls.2019.01159

[105]
Amin I, Jinap S, Jamilah B, Harikrisna K, Biehl B. Analysis of vicilin (7S)-class globulin in cocoa cotyledons from various genetic origins. Journal of the Science of Food and Agriculture. 2002 May 15;82(7):728–32. https://doi.org/10.1002/jsfa.1104

[106]
Fritz PJ, Fritz KA, Kauffman JM, Patterson GR, Robertson CA, Stoesz DA, Wilson MR. Cocoa seeds: changes in protein and polysomal RNA during development. Journal of Food Science. 1985 Jul;50(4):946–50. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1365-2621.1985.tb12986.x

[107]
Lung SC, Weselake RJ. Diacylglycerol acyltransferase: a key mediator of plant triacylglycerol synthesis. Lipids. 2006 Dec;41(12):1073–88. https://doi.org/10.1007/s11745-006-5057-y

[108]
Sørensen BM, Furukawa-Stoffer TL, Marshall KS, Page EK, Mir Z, Forster RJ, Weselake RJ. Storage lipid accumulation and acyltransferase action in developing flaxseed. Lipids. 2005 Oct;40(10):1043–9. https://doi.org/10.1007/s11745-005-1467-0

[109]
Perry HJ, Harwood JL. Changes in the lipid content of developing seeds of Brassica napus. Phytochemistry. 1993 Jan 1;32(6):1411–5. https://doi.org/10.1016/0031-9422(93)85148-K

[110]
Doebley JF, Gaut BS, Smith BD. The molecular genetics of crop domestication. Cell. 2006 Dec 29;127(7):1309–21. https://doi.org/10.1016/j.cell.2006.12.006

[111]
Bhat JA, Ali S, Salgotra RK, Mir ZA, Dutta S, Jadon V, Tyagi A, Mushtaq M, Jain N, Singh PK, Singh GP. Genomic selection in the era of next generation sequencing for complex traits in plant breeding. Frontiers in Genetics. 2016 Dec 27;7:221. https://www.frontiersin.org/articles/10.3389/fgene.2016.00221/full

[112]
Mangini G, Blanco A, Nigro D, Signorile MA, Simeone R. Candidate genes and quantitative trait loci for grain yield and seed size in Durum Wheat. Plants. 2021;10:312–28. https://doi.org/10.3390/plants10020312

[113]
Ohto MA, Floyd SK, Fischer RL, Goldberg RB, Harada JJ. Effects of APETALA2 on embryo, endosperm, and seed coat development determine seed size in Arabidopsis. Sexual Plant Reproduction. 2009 Dec;22(4):277–89. https://doi.org/10.1007/s00497-009-0116-1

[114]
Kroj T, Savino G, Valon C, Giraudat J, Parcy F. Regulation of storage protein gene expression in Arabidopsis. Development. 2003 Dec 15;130(24):6065–73. https://doi.org/10.1242/dev.00814

[115]
Sosso D, Luo D, Li QB, Sasse J, Yang J, Gendrot G, Suzuki M, Koch KE, McCarty DR, Chourey PS, Rogowsky PM. Seed filling in domesticated maize and rice depends on SWEET-mediated hexose transport. Nature Genetics. 2015 Dec;47(12):1489. https://doi.org/10.1038/ng.3422

[116]
Falcone Ferreyra ML, Rius S, Casati P. Flavonoids: biosynthesis, biological functions, and biotechnological applications. Frontiers in Plant Science. 2012 Sep 28;3:222. https://doi.org/10.3389/fpls.2012.00222

[117]
Devy L, Anita-Sari I, Saputra TI, Susilo AW, Wachjar A. Identification of molecular marker based on MYB Transcription Factor for the selection of Indonesian Fine Cacao (Theobroma cacao L.). Pelita Perkebunan (a Coffee and Cocoa Research Journal). 2018 Aug 31;34(2):59–68.

[118]
Liu Y, Shi Z, Maximova SN, Payne MJ, Guiltinan MJ. Tc-MYBPA is an Arabidopsis TT2-like transcription factor and functions in the regulation of proanthocyanidin synthesis in Theobroma cacao. BMC Plant Biology. 2015 Dec;15(1):1–6. https://doi.org/10.1186/s12870-015-0529-y

[119]
Bartley BGD. The genetic diversity of cacao and its utilization. 2005. CABI Publishing, Wallingford.

[120]
Rockman MV. The QTN program and the alleles that matter for evolution: all that’s gold does not glitter. Evolution: International Journal of Organic Evolution. 2012 Jan;66(1):1–7. https://doi.org/10.1111/j.1558-5646.2011.01486.x

[121]
Lee S, Abecasis GR, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. The American Journal of Human Genetics. 2014 Jul 3;95(1):5–23. https://doi.org/10.1016/j.ajhg.2014.06.009

[122]
Bernardo R. Molecular markers and selection for complex traits in plants: learning from the last 20 years. Crop Science. 2008 Sep;48(5):1649–64. https://doi.org/10.2135/cropsci2008.03.0131

[123]
Qi Z, Song J, Zhang K, Liu S, Tian X, Wang Y, Fang Y, Li X, Wang J, Yang C, Jiang S. Identification of QTNs controlling 100-seed weight in soybean using multilocus genome-wide association studies. Frontiers in Genetics. 2020 Jul 16;11:689. https://doi.org/10.3389/fgene.2020.00689

[124]
Bailey BA, Melnick RL, Strem MD, Crozier J, Shao J, Sicher R, Phillips-Mora W, Ali SS, Zhang D, Meinhardt L. Differential gene expression by Moniliophthora roreri while overcoming cacao tolerance in the field. Molecular Plant Pathology. 2014 Sep;15(7):711–29. https://doi.org/10.1111/mpp.12134

[125]
Pokou DN, Fister AS, Winters N, Tahi M, Klotioloma C, Sebastian A, Marden JH, Maximova SN, Guiltinan MJ. Resistant and susceptible cacao genotypes exhibit defense gene polymorphism and unique early responses to Phytophthora megakarya inoculation. Plant Molecular Biology. 2019 Mar;99(4):499–516. https://doi.org/10.1007/s11103-019-00832-y

[126]
Chai Y, Hao X, Yang X, Allen WB, Li J, Yan J, Shen B, Li J. Validation of DGAT1-2 polymorphisms associated with oil content and development of functional markers for molecular breeding of high-oil maize. Molecular Breeding. 2012 Apr;29(4):939–49. https://doi.org/10.1007/s11032-011-9644-0

[127]
Rebbeck TR, Spitz M, Wu X. Assessing the function of genetic variants in candidate gene association studies. Nature Reviews Genetics. 2004 Aug;5(8):589–97. https://doi.org/10.1038/nrg1403

[128]
McKown AD, Klápště J, Guy RD, Geraldes A, Porth I, Hannemann J, Friedmann M, Muchero W, Tuskan GA, Ehlting J, Cronk QC. Genome-wide association implicates numerous genes underlying ecological trait variation in natural populations of Populus trichocarpa. New Phytologist. 2014 Jul;203(2):535–53. https://doi.org/10.1111/nph.12815

[129]
Micheli F, Maximova S, Gramacho KP, Guiltinan M, Wilkinson MJ, Lanaud C, … de Mattos Cascardo JC. Functional genomics of cacao. In: Advances in Botanical Research, Chapter 3. Jean-Claude K, Michel D (Eds). 2010;119–177. Academic Press, London.

[130]
Romero Navarro JA, Phillips-Mora W, Arciniegas-Leal A, Mata-Quirós A, Haiminen N, Mustiga G, Livingstone III D, van Bakel H, Kuhn DN, Parida L, Kasarskis A. Application of genome wide association and genomic prediction for improvement of cacao productivity and resistance to black and frosty pod diseases. Frontiers in Plant Science. 2017 Nov 14;8:1905. https://doi.org/10.3389/fpls.2017.01905

[131]
McElroy MS, Navarro AJ, Mustiga G, Stack C, Gezan S, Peña G, Sarabia W, Saquicela D, Sotomayor I, Douglas GM, Migicovsky Z. Prediction of cacao (Theobroma cacao) resistance to Moniliophthora spp. diseases via genome-wide association analysis and genomic selection. Frontiers in Plant Science. 2018 Mar 20;9:343. https://doi.org/10.3389/fpls.2018.00343

[132]
Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001 Apr 1;157(4):1819–29. https://doi.org/10.1093/genetics/157.4.1819

[133]
Cerrudo D, Cao S, Yuan Y, Martinez C, Suarez EA, Babu R, Zhang X, Trachsel S. Genomic selection outperforms marker assisted selection for grain yield and physiological traits in a maize doubled haploid population across water treatments. Frontiers in Plant Science. 2018 Mar 20;9:366. https://doi.org/10.3389/fpls.2018.00366

[134]
Bernardo R, Yu J. Prospects for genomewide selection for quantitative traits in maize. Crop Science. 2007 May;47(3):1082–90. https://doi.org/10.2135/cropsci2006.11.0690

[135]
Liu D, Zhang J, Liu X, Wang W, Liu D, Teng Z, Fang X, Tan Z, Tang S, Yang J, Zhong J. Fine mapping and RNA-Seq unravels candidate genes for a major QTL controlling multiple fiber quality traits at the T1 region in upland cotton. BMC Genomics. 2016 Dec;17(1):1–3. https://doi.org/10.1186/s12864-016-2605-6

[136]
Poczai P, Varga I, Laos M, Cseh A, Bell N, Valkonen JP, Hyvönen J. Advances in plant gene-targeted and functional markers: a review. Plant Methods. 2013 Dec;9(1):1–32. https://doi.org/10.1186/1746-4811-9-6

[137]
Li N, Xu R, Li Y. Molecular networks of seed size control in plants. Annual Review of Plant Biology. 2019 Apr 29;70:435–63. https://doi.org/10.1146/annurev-arplant-050718-095851

[138]
Turner SD. qqman: an R package for visualizing GWAS results using QQ and Manhattan plots. bioRxiv. 2014 Jan 1:005165.
