ABSTRACT
The cultivated strawberry (Fragaria ×ananassa) is an economically important fruit crop that is intensively bred for improved sensory qualities. The diversity of fruit flavors and aromas in strawberry result mainly from the interactions of sugars, acids, and volatile organic compounds (VOCs) that are derived from diverse biochemical pathways influenced by the expression of many genes. This study integrates multi-omics analyses to identify QTL and candidate genes for multiple aroma compounds in a complex strawberry breeding population. Novel fruit volatile QTL were discovered for methyl anthranilate, methyl 2-hexenoate, methyl 2-methylbutyrate, mesifurane, and a shared QTL on Chr 3 was found for nine monoterpene and sesquiterpene compounds, including linalool, 3-carene, β-phellandrene, α-limonene, linalool oxide, nerolidol, α-caryophellene, α-farnesene, and β-farnesene. Fruit transcriptomes from a subset of sixty-four individuals were used to support candidate gene identification. For methyl esters including the grape-like methyl anthranilate, a novel ANTHANILIC ACID METHYL TRANSFERASE–like gene was identified. Two mesifurane QTL correspond with the known biosynthesis gene O-METHYL TRANSFERASE 1 and a novel FURANEOL GLUCOSYLTRANSFERASE. The shared terpene QTL contains multiple fruit-expressed terpenoid pathway-related genes including NEROLIDOL SYNTHASE 1 (FanNES1). The abundance of linalool and other monoterpenes is partially governed by a co-segregating expression-QTL (eQTL) for FanNES1 transcript variation, and there is additional evidence for quantitative effects from other terpenoid-pathway genes in this narrow genomic region. These QTL present new opportunities in breeding for improved flavor in commercial strawberry.
INTRODUCTION
The dessert strawberry (Fragaria × ananassa) is a widely celebrated fruit with increasing consumption. For decades consumers have reported the desire for improved flavor in commercial strawberry (Fletcher 1917; Chambers 2013). The aroma intensity of modern cultivars is lower than in wild strawberries (Ulrich and Olbricht 2014), and breeding efforts seek to reclaim these qualities. Today flavor and aroma are central priorities of strawberry breeding programs (Faedi et al. 2002; Whitaker et al. 2011; Vandendriessche et al. 2013). However, breeders face a significant challenge in the recapture and consolidation of genetics contributing to favorable flavors and aromas. Genetic and genomic analysis has been used to identify these elements and contribute to the breeding of new cultivars with improved sensory qualities.
Strawberry flavor and aroma are dictated by several factors, including sugars and acids. But it is the trace volatile organic compounds (VOCs) that shape the sensory experience (Bood and Zabetakis 2002). Volatile organic compounds, represented broadly as esters, alcohols, terpenoids, furans and lactones, are a substantial portion of the fruit secondary metabolome and contribute to aroma, flavor, disease resistance, pest resistance and overall fruit quality (Ulrich et al. 1997; Arroyo et al. 2007). Various studies have helped to identify human preferences for individual strawberry aroma and flavor compounds (Schwieterman et al. 2014; Ulrich et al. 1997; Schieberle and Hofmann 1997; Larsen and Poll 1992). Of hundreds of strawberry VOCs, these studies agree on fewer than ten that clearly influence human preference (Schwieterman et al. 2014). Introgressing important compounds into commercially viable cultivars has been aided by efforts in volatilomics QTL detection (Urrutia et al. 2017), multi-omics identification of VOC candidate genes (Pillet et al. 2017; Chambers et al. 2014), integration of sensory and consumer preference data (Schwieterman et al. 2014; Sánchez-Sevilla et al. 2014), and ultimately introgression of genes via marker-assisted selection (Folta and Klee 2016; Rambla et al. 2017; Eggink et al. 2014).
Only a few strawberry genes controlling desirable aroma compounds have been identified with confidence. These include biosynthesis genes for linalool (Aharoni et al. 2004), mesifurane (Wein et al. 2002), γ-decalactone (Chambers et al. 2014; Sánchez-Sevilla et al. 2014) and methyl anthranilate (Pillet et al. 2017). The gene ANTHRANILIC ACID METHYL TRANSFERASE (FanAAMT), located on octoploid chromosome group 4, is a necessary-but-not-sufficient gene for catalyzing the methylation of anthranilate into the grape-like aroma compound methyl anthranilate (Pillet et al. 2017). Methyl anthranilate production has been long regarded as a complex trait, governed by multiple genes and strong environmental influences. Methyl anthranilate is produced abundantly in the fruit of the diploid strawberry sp. Fragaria vesca, but it is reported in only a few octoploid varieties including ‘Mara des Bois’ and ‘Mieze Schindler’ (Ulrich and Olbricht 2016). For terpenoid biosynthesis, strawberry NEROLIDOL SYNTHASE 1 (NES1) was identified by comparing diploid and octoploid species, which are enriched respectively for nerolidol or linalool. A truncated plastid targeted signal in the octoploid FanNES1 gene retargets the enzyme to the cytosol, where there is abundant precursor for linalool biosynthesis (Aharoni et al. 2004). Recent reports have complicated this story somewhat, as three octoploid F. virginiana lines, hexaploid F. moschata, and some diploid strawberries produce linalool without this truncation (Ulrich and Olbricht 2013). In mesifurane biosynthesis, the gene O-METHYL TRANSFERASE 1 (FanOMT1) catalyzes the methylation of furaneol to create mesifurane (Wein et al., 2002, Zorrilla-Fontanesi et al., 2012). Mesifurane abundance is affected by a common FanOMT1 promoter loss-of-function allele, which both eliminates gene expression and mesifurane production. Only one copy of the competent FanOMT1 allele is reportedly sufficient for robust production, however a lack of production is sometimes observed even in the homozygous positive state (Cruz-Rus et al. 2017). An octoploid gene encoding QUINONE REDUCTASE (FanQR) can produce furaneol in vitro, however no natural variants of this gene have been established which vary mesifurane levels in vivo (Raab et al. 2006). Similarly, glucosylation of both furaneol and mesifurane are known to occur in strawberry, however genetic variation has not been established for this step. Several furaneol glucosyltransferases have been cloned and characterized in vitro from F. ×ananassa (Song et al. 2016; Yamada et al. 2019).
This research integrates high-density genotyping and non-targeted fruit volatile metabolomics from eight pedigree-connected octoploid crosses (n=213) (Figure S1). Fruit transcriptomes from a subset of individuals (n=61) were used to identify fruit-expressed candidate genes within QTL regions (Figure S1). Three maps were utilized independently in this analysis, as less than one-third of octoploid subgenome-specific markers are incorporated in any single octoploid genetic map (Anciro et al. 2018; van Dijk et al. 2014). These are the ‘Holiday’ × ‘Korona’ (van Dijk et al. 2014) and FL_08-10 × 12.115-10 (Verma et al. 2017) genetic maps, and the F. vesca physical map. The correspondence of ‘Holiday’ × ‘Korona’ linkage groups to the recent octoploid ‘Camarosa’ reference genome (Edger et al. 2019) helped specify the subgenomic identity of QTL (Hardigan et al. 2020). To correspond QTL markers to specific candidate gene regions, marker nucleotide sequences from the IStraw35 SNP genotyping platform were aligned by sequence to the octoploid genome. However, the very high sequence identity between homoeologous chromosomes limited the specificity of this approach. Evidence from all of these resources were integrated to specify candidate gene regions.
Fruit transcriptomes were used to identify expressed genes within QTL regions and to associate trait/transcript levels. Genotypic data were associated with fruit transcriptomics data via expression-QTL (eQTL) analysis. Transcript eQTL analysis identifies genetic variants associated with heritable transcript level variation. Transcript eQTL often correspond to the locus of the originating gene (cis-eQTL), and often signify gene promoter mutation or gene presence/absence variation. In cases where a trait is governed by simple genetic control of transcript levels of a causal gene, an eQTL should be detected which co-segregates with trait QTL markers. This approach can help specify the casual mechanisms behind trait QTL. Previous eQTL analyses in these same fruit RNA-seq populations identified hundreds of fruit eQTL, the vast majority of which were proximal to the originating gene locus (Barbey et al. 2020). Global transcriptomes from tissues throughout the octoploid cultivar ‘Camarosa’ were used to correlate candidate genes and transcript abundance with ripening-associated volatile biosynthesis (Sánchez-Sevilla et al. 2017).
MATERIALS AND METHODS
Plant Materials
Eight controlled crosses were made among octoploid cultivars and also elite breeding lines from the University of Florida. These were ‘Florida Elyana’ × ‘Mara de Bois’ (population 10.113), ‘Mara des Bois’ × ‘Florida Radiance’ (population 13.75), ‘Strawberry Festival’ × ‘Winter Dawn’ (population 13.76), 12.115-10 × 12.121-5 (population 15.89), 12.22-10 × 12.115-10 (population 15.91), 12.115-10 × 12.74-39 (population 15.93), ‘Florida Elyana’ × 12.115-10 (population 16.11), and ‘Mara des Bois’ × ‘Mara des Bois’ (population 16.85). Seedlings were clonally propagated by runners in a summer nursery to generate multiple plants (clonal replicates), and two to four plants representing original seedlings were established in single plots in the fruiting field. (Barbey et al. 2020).
Field Collection
All fruit were harvested from a field maintained under commercial growing practices during winter growing seasons at the Gulf Coast Research and Education Center (GCREC) in Wimauma, Florida. Fruit from populations 15.89, 15.91, 15.93 were harvested on February 19, March 15, and April 14, 2016. Populations 16.11, 16.85, were harvested on February 16 and March 2, 2017. Populations 13.75 and 13.76 were harvested during the winter of 2014. Population 10.133 was sampled on January 20, February 11, February 25, and March 18, 2011 (Chambers 2013). Harvest days were selected based on dry weather and moderate temperature, both on the day of harvest and for several days preceding harvest to maximize volatile production and capture. Three mature fruit per genotype were cleaned, collected into a single sample bag, crushed, and immediately flash-frozen in liquid nitrogen, in the field. Samples were transported to Gainesville, Florida and maintained at -80 °C.
Sample Processing and Preparation
Unprocessed frozen fruit samples were equilibrated from -80 °C to liquid nitrogen temperature before being pureed in an electric blender. Frozen puree was collected into a 50ml sterile collection tube and stored at -80 °C. For volatile sample processing, 3g of fruit puree from each sampled genotype at a single timepoint was aliquoted into two technical replicate 20ml headspace vials and combined with 3ml 35% NaCl solution containing 1ppm 3-hexanone as an internal standard. Prepared vials were stored at -80°C, thawed at room temperature and vortexed prior to GC-MS analysis.
Volatile Metabolomic Profiling and Analysis
Samples were equilibrated to 40 °C for 30 min in a 40 °C heated chamber. A 2 cm tri-phase SPME fiber (50/30 μm DVB/Carboxen/PDMS, Supelco, Bellefonte, PA, USA) was exposed to the headspace for 30 min at 40°C for volatile collection and concentration. The fiber was then injected into an Agilent 6890 GC (for 5 min at 250°C for desorption of volatiles. Inlet temperatures were maintained at 250°C, ionizing sources at 230°C, and transfer line temperatures at 280°C. The separation was performed via DB5ms capillary column (60 m x 250 μm x 1.00 μm) (J&W, distributed by Agilent Technologies) at a constant flow (He: 1.5 ml per min). The initial oven temperature was maintained at 40°C for 30 seconds, followed by a 4°C per min increase to a final temperature of 230°C, then to 260°C at 100°C per min, with a final hold time of 10 min. Data were collected using the Chemstation G1701 AA software (Hewlett-Packard, Palo Alto, CA).
Chromatograms were processed using the Metalign metabolomics preprocessing software package (Lommen and Kools 2012). Baseline and noise corrections were performed using a peak slope factor of 1x noise, and a peak threshold factor of 2x noise. Auto-scaling and iterative pre-alignment options were not selected. A maximum shift of 100 scans before peak identification, and 200 scans after peak identification was used. In later validation steps, these search tolerances were determined to be sufficiently inclusive while also limiting to false-positives. The MSClust software package was then used for statistical clustering of ions based on retention time and co-variance across the population using default parameters (Tikunov et al. 2012). Clusters were batch queried against the NIST08 reference database using Chemstation G1701 AA software (Hewlett-Packard, Palo Alto, CA). Library search outputs were parsed using a custom Perl script prior to multivariate analysis. Chromatograms were batched by GC-MS sampling year to mitigate ion misalignments caused by system-dependent retention time shifts. VOC abundances were normalized based on the internal standard. Normalized data between seasons were consolidated manually based on elution order, NIST identification, and rerunning of sample standards.
Genotyping of Flavor and Aroma Populations
Individuals from all populations were genotyped using the IStraw90 (Bassil et al. 2015)) platform, excepting populations 16.11 and 16.89 which were genotyped using the IStraw35 platform (Verma et al. 2017). All parents and 204 progeny were selected for genotyping based on the segregation of desirable fruit volatiles. Sequence variants belonging to the Poly High Resolution (PHR) and No Minor Homozygote (NMH) marker classes were included for association mapping. Mono High Resolution (MHR), Off-Target Variant (OTV), Call Rate Below Threshold (CRBT), and other marker quality classes, were discarded and not used for mapping. Individual marker calls inconsistent with Mendelian inheritance from parental lines were removed.
Fruit Transcriptome Assembly and Analysis
Mature fruit from 61 parents and progeny from the biparental populations 10.113, 13.75, and 13.76 were sequenced via Illumina paired-end RNA-seq (avg. 65million 2×100bp reads) as previously described (Barbey et al. 2019). Briefly, RNA-seq reads were assembled based on the Fragaria ×ananassa octoploid ‘Camarosa’ annotated genome, with reads mapping equally-well to multiple loci discarded from the analysis. Separately, raw RNA-seq reads from the ‘Camarosa’ strawberry gene expression atlas study (Sánchez-Sevilla et al. 2017) were assembled via the same method. Transcript abundances were calculated in Transcripts Per Million (TPM). Fruit eQTL analysis was performed using the mixed linear model method implemented in GAPIT v3 (Tang et al. 2016) as described in (Barbey et al. 2020).
Genetic Association of Fruit Volatiles
Relative volatile abundance values were rescaled using the Box-Cox transformation algorithm (Box and Cox 1964) performed in R (R. Development Core Team 2014)using R-studio (Racine 2011) prior to genetic analysis. GWAS on fruit volatiles was performed using the mixed linear model method implemented in GAPIT v3 (Tang et al. 2016) in R, using marker positions oriented to the F. vesca diploid physical map. Significantly-associated volatiles were then reanalyzed in GAPIT using the ‘Holiday’ × ‘Korona’ and FL_08-10 × 12.115-10 genetic maps. Metabolomic associations were evaluated for significance based on the presence of multiple co-locating markers of p-value < 0.05 after FDR multiple comparisons correction (Benjamini and Hochberg 1995). Narrow-sense heritability (h2) estimates were derived from GAPIT v3 while single-marker analysis was performed via ANOVA in R to investigate allelic effects.
Analysis of Candidate Genes
All gene models in the ‘Camarosa’ genome were analyzed with the BLAST2GO pipeline and the Pfam protein domain database. Genes with significant homology to known volatile biosynthesis genes including FanOMT and FanAAMT were collected from the ‘Camarosa’ genome using BLAST with inclusive criteria. This process was replicated for candidate genes including anthranilate synthase alpha subunit (FanAS-α) and others not presented in this analysis. Deduced protein sequences from transcripts were aligned using the slow progressive alignment algorithm in the CLC Genomics Workbench 11 (Gap Open cost=10; Gap Extension=1). Tree construction was performed using the Neighbor Joining method with Jukes-Cantor distance measuring with 1,000 bootstrapping replicates. Fruit transcript heatmaps were added to the cladogram to show the maximum transcript level detected among the 61 fruit transcriptomes. Genes putatively belonging to published volatile biosynthesis gene families were selected for fruit transcript eQTL analysis, using methods described previously (Barbey et al. 2020; Barbey et al. 2019). The 200 genes surrounding the most-correlated volatile QTL markers were also analyzed for eQTL and compared for co-segregation with volatile QTL.
High Resolution Melting Marker Test
For the marker test of mesifurane, two validation crosses were created consisting of ‘Florida Beauty’× 15.89-25 (population 18.50, n= 27) and 15.34-82 × 15.89-25 (population 18.51, n=44). Total genomic DNA was extracted using the simplified cetyltrimethylammonium bromide (CTAB) method described by (Noh et al. 2018) with minor modifications. To develop HRM markers for mesifurane, two probes were selected (AX-166520175, AX-166502845) that were highly associated with mesifurane on Chr 1 QTL. The primers 5’-CCCTTGGCATCAATATTTGTGAAT-3’ and 5’-GAACTCCATTAGAAATCAAGTTATCAGC-3’ were designed for AX-166520175, and the 5’-CTGATCCTGCTTCAAGTACAAG-3’ and 5’-TCAATGAAGACACTTGATCGAC-3’ were designed for AX-166502845 using IDT’s PrimerQuest Software (San Jose, CA, USA). PCR amplifications were performed in a 5 µl reaction containing 2× AccuStart™ II PCR ToughMix® (Quantabio, MA, USA), 1× LC Green® Plus melting dye (BioFire, UT, USA), 0.5 µM of each HRM primer sets and 1 µl of DNA. The PCR and HRM analysis were performed in a LightCycler® 480 system II (Roche Life Science, Germany) using a program consisting of an initial denaturation at 95°C for 5 minutes; 45 cycle of denaturation at 95°C for 10 seconds, annealing at 62°C for 10 seconds, and extension at 72°C for 20 seconds. After PCR amplification, the samples were heated to 95°C for 1 min and cooled to 40°C for 1 min. Melting curves were obtained by melting over the desired range (60 - 95°C) at a rate of 50 acquisitions per 1°C. Melting data were analyzed using the Melt Curve Genotyping and Gene Scanning Software (Roche Life Science, Germany). Analysis of HRM variants was based on differences in the shape of the melting curves and in Tm values.
RESULTS
Strawberry Fruit Flavor and Aroma QTL
Volatile aroma QTL were discovered for methyl anthranilate (CAS 134-20-3), methyl 2-methylbutyrate (868-57-5), methyl 2-hexenoate (CAS 2396-77-2) (Figures 1 and 2), mesifurane (CAS 4077-47-8) (Figure 3), and nine mono- and sesquiterpenes compounds (Figures 4 and 5). These terpenes include linalool (CAS 78-70-6), 3-carene (CAS 498-15-7), β-phellandrene (CAS 555-10-2), α-limonene (CAS 5989-27-5), linalool oxide (CAS 60047-17-8), nerolidol (CAS 7212-44-4), α-caryophellene (CAS 6753-98-6), α-farnesene (CAS 502-61-4), and β-farnesene (CAS 77129-48-7). Many of these compounds are consensus determinants of preferred strawberry flavor in human sensory trials, including grape-like methyl anthranilate (Ulrich et al. 1997), fruity-sweet methyl 2-methylbutanoate (Schieberle and Hofmann 1997), fruity linalool (Larsen and Poll 1992; Schieberle and Hofmann 1997; Ulrich et al. 1997; Schwieterman 2013), and sherry/caramel-like mesifurane (Larsen and Poll 1992; Schieberle and Hofmann 1997; Ulrich et al. 1997; Schwieterman 2013). Population-wide fruit transcript-level data for all candidate genes in the following analysis are provided in Table S1, and transcript levels throughout the ‘Camarosa’ plant are provided in Table S2.
Methyl Anthranilate and Methyl Ester QTL and Candidate Genes
Methyl anthranilate (h2= 0.59) QTL were identified on octoploid linkage groups (LGs) 2A and 5A of the FL_08-10 × 12.115-10 map (Figure 1A, Table 1). The LG 2A QTL is shared with methyl 2-hexenoate (h2 =0.43) and methyl 2-methylbutyrate (h2 =0.79) (Figure 1A; Table 1). This shared methyl ester QTL accounts for 11.7% of methyl anthranilate variance (p= 5.1e-5), 22.8% of methyl 2-hexenoate variance (p= 7.6e-5), and 18.1% of methyl 2-methylbutyrate variance (p= 4.7e-7) (AX-123540849, single-marker analysis) (Figure 1B). The QTL on LG 5D explains 19.7% of methyl anthranilate variance (AX-123358624, p= 6.324e-07). A diagram of known and hypothesized methyl anthranilate pathway components is provided in Figure S2.
The QTL on LG 2A broadly corresponds to the F. vesca-like Chr 2-2 of the ‘Camarosa’ octoploid genome, based on chromosome-wide genetic-genomic connections (Hardigan et al. 2020). However, nucleotide BLAST of IStraw35 methyl anthranilate marker sequences align non-specifically to all ‘Camarosa’ Chr 2 homoeologs except Chr 2-2 (Table 1). The corresponding syntenic regions of Chr 2-1 and Chr 2-3 each contain an ANTHRANILIC ACID METHYL TRANSFERASE-like homoeolog. The most-correlated QTL marker (AX-123540849) aligns 10 kb (2 genes) from the AAMT-like gene on Chr 2-1 (maker-Fvb2-1-snap-gene-255.58), and 14.6 kb (4 genes) from the AAMT-like gene on Chr 2-3 (maker-Fvb2-3-snap-gene-33.59) (Figure 2A and B; Table 1). These two FanAAMT-like genes have the highest sequence identity to the published Chr 4 FanAAMT gene excepting the highly-expressed FanAAMT gene on Chr 4-1 and its non-expressed homoeologs (Figure 2C). Reference-based RNA-seq shows both candidate FanAAMT-like transcripts are highly abundant in some fruit transcriptomes, but relatively low or almost absent in others (Chr 2-1, 3-278 TPM; Chr 2-3, 15-1324 TPM) (Table S1). The RNA-seq read alignments for both gene models show atypically high degrees of sequence disagreement with the ‘Camarosa’ reference, which suggests these transcript reads could be derived from an alternative locus such as an hypothesized deletion on Chr 2-2 (Figure S3).
In the LG 5D/Chr 5-4 QTL region, no candidate genes were found which correspond to known or hypothesized strawberry methyl anthranilate pathway components. However, a co-segregating transcript eQTL on Chr 5-4 was identified for a putative glutathione peroxidase gene (maker-Fvb5-4-augustus-gene-12.41) (transcript level h2=1.0) (Table 1). Single-marker analysis explains 51.1% of the transcript variation observed (AX-123358624, p=2.8-e9) (Figure S4A). Accumulation of this transcript positively correlates with methyl anthranilate production (Figure S4B).
A possible third methyl anthranilate signal on Chr 7 corresponds with the position of two ANTHRANILATE SYNTHASE ALPHA (FanAS-α) homoelogs (Figure S5A, Table S3). Both genes represent the only FanAS-α transcripts abundant in the fruit (Figure S5B), and presence/absence variation of the Chr 7-4 FanAS-α transcript is governed by a cis-eQTL which co-segregates with the methyl anthranilate signal (Figure S5C and D).
Mesifurane QTL and Candidate Genes
Two mesifurane (h2=0.72) QTL were identified on LGs 1A and 7B of the FL_08-10 × 12.115-10 map (Figure 3A-B, Table 2). The mesifurane volatile QTL on LG 7B co-segregates with a transcript eQTL for the published mesifurane biosynthesis gene O-METHYL TRANSFERASE 1 (FanOMT1; maker-Fvb7-4-augustus-gene-6.44) (h2 of transcript accumulation=0.80) (Figure 3B). Homozygosity of the mesifurane LG 7B minor allele (AX-123363184) eliminates both FanOMT1 transcript and mesifurane production (Figure 3C). The FanOMT1 allelic states correspond with stepwise increases in transcript abundance (Figure 3C). A novel mesifurane QTL was detected on LG 1A of the FL_08-10 × 12.115-10 genetic map (Figure 3A, Table 2). Alignments of probe nucleotide sequences to the ‘Camarosa’ genome are not subgenome-specific (Table 2).
An epistatic interaction was detected between the two QTL (F(6,173) = 13.78, p=9.0e-13) (Figure 3D). Homozygosity of the Chr 1-group QTL (AA genotype) strongly diminishes mesifurane abundance even when the competent Chr 7-4 FanOMT1 allele is homozygous (BB genotype) and transcript abundance is highest (AA/BB vs. AB/BB Tukey HSD p=2.4e-7). At least one competent allele at each locus is required for robust mesifurane production (Figure 3D). Mesifurane abundance is somewhat higher when both alleles are homozygous (BB/BB vs AB/AB, AB/BB, and BB/AB ANOVA (F(1,149) = 4.38, p= 0.038).
Because the mesifurane Chr 1 QTL probe sequences align equally-well to multiple subgenomes, all Chr 1 homoeologous regions were considered for candidate gene identification. The published furaneol biosynthesis gene FanQR (maker-Fvb6-3-augustus-gene-21.50) is not located in Chr group 1, nor were genes of similar function found in the region. The published F. ×ananassa furaneol glucosyltransferase gene from Yamada et al. 2019 (UGT85K16) has two putative homologs located in the ‘Camarosa’ Chr 1 group, however they are over 10Mb from the mesifurane Chr 1 QTL (100% nucleotide identity, maker-Fvb1-4-augustus-gene-115.43; 97% nucleotide identity, maker-Fvb1-2-augustus-gene-137.19) (Table 2). The two most active F. ×ananassa furaneol glucosyltransferases from Song et al. 2016 (UGT71K3a/b, UGT73B23/4) have putative orthologs outside of the Chr 1 group (98% nucleotide identity, augustus_masked-Fvb3-4-processed-gene-50.19; 100% nucleotide identity; augustus_masked-Fvb2-2-processed-gene-195.2). However, UGT73B23/4 has a highly-identical second homolog in Chr 1-2 (augustus_masked-Fvb1-2-processed-gene-35.18; 96% nucleotide identity, 100% coverage) located 164 genes (0.9Mb) from the most-significant mesifurane marker (AX-166520175, p=2.4e-6) and 49 genes (0.2Mb) from the marker AX-166510720 (p=5.6e-05) (Figure 3A; Table 2).
In diverse tissues of the ‘Camarosa’ plant (AX-166520175=BB), the candidate Chr 1-2 furaneol glucosyltransferase transcript levels are high in roots but low in the ripe fruit, with fruit expression somewhat increasing with ripening series (Table S3). In the mature fruit RNA-seq populations, the Chr 1-2 candidate is modestly expressed in the fruit (5.3 ± 2.2 TPM).
Using a high-resolution melting (HRM) assay in two separate crosses (n=72), two HRM markers targeting SNPs in the mesifurane Chr 1-2 QTL (AX-166520175, AX-166502845) were tested for association with mesifurane abundance in marker-assisted seedling selection. Both Chr 1 markers were confirmed to predict the segregation of mesifurane abundance (F(1,71) = 8.25823, p= .0006) (Figure S6).
Terpene QTL and Candidate Genes
A shared QTL for the production of nine terpene compounds was discovered, corresponding to Chr 3 of the F. vesca physical map. Only two shared terpene markers are positioned in an octoploid genetic map (AX-166504318, AX-166521725), both of which correspond to LG 3B in ‘Holiday’ × ‘Korona’ (Figure 4A, Table 3) which represents Chr 3-3 in the ‘Camarosa’ genome (Hardigan et al. 2020). For linalool (h2=0.749), the most-correlated QTL marker (AX-166513106, p= 4.2e-11) explains 14.8% of the observed variance in linalool abundance. This represents a large absolute difference as linalool is among the most abundant volatiles in strawberry fruit, commonly exceeding 100 ng1 gFW-1 hr-1 in the cultivars used as parental lines for these populations (Schwieterman 2013). The linalool QTL is shared with the monoterpenes 3-carene (R2=0.09, p=1.2e-8), β-phellandrene (R2=0.21, p=7.0e-7), α-limonene (R2=0.06, p=7.0e-7), and linalool oxide (R2=0.08, p=4.4e-6), and the sesquiterpenes nerolidol (R2=0.09, p=4.3e-6), α-caryophellene (R2=0.10, p=8.9e-6), α-farnesene (R2=0.16, p=1.6e-6), and β-farnesene (R2=0.12, p=2.4e-7) (Figure 4A-B, Table 4). The four sesquiterpene QTL are comprised of two shared significant markers, which are also common to the monoterpenes (Table 4). Mono- and sesquiterpene volatile abundances are very strongly correlated within-group, and moderately correlated between groups (Figure 5).
The LG 3B terpene QTL IStraw35 probe sequences align non-specifically to all four Chr 3 homoeologs (Table 3). These corresponding genomic regions contain putative terpenoid biosynthesis gene clusters, which together contain three annotated copies of (3S,6E)-NEROLIDOL SYNTHASE, three copies of (E,E)-ALPHA-FARNESENE SYNTHASE, and three copies SOLANESYL DIPHOSPHATE SYNTHASE (Table 3). The characterized FanNES1 deletion responsible for linalool biosynthesis in octoploids was not detected among the ‘Camarosa’ FanNES1-like gene sequences, however this gene appears on Chr 3-3 in the updated ‘Camarosa’ v2 genome (Hardigan, personal communication).
To assess terpenoid-related transcript levels in fruit, the published FanNES1 gene sequence was added ad hoc to the ‘Camarosa’ v1 genome as an independent contig prior to reference-based RNA-seq re-assembly. The resulting FanNES1 transcript was the only NES1-like transcript abundant in fruit (Table 3). Two eQTL signals co-segregated with terpene QTL markers (Table 4). These eQTL correspond to the ambiguously-located FanNES1 gene and to a cis-eQTL for a novel SOLANESYL DIPHOSPHATE SYNTHASE (FanSPS) gene on Chr 3-3 (maker-Fvb3-3-augustus-gene-10.48). Linalool abundance (n=61) is statistically correlated with FanNES1 transcript levels (R2=0.31, p=0.017), but a significant relationship was not detected with FanSPS (R2=0.20, p =0.14). An additional cis-eQTL was detected for an (E,E)-ALPHA-FARNESENE SYNTHASE gene on Chr 3-2 (maker-Fvb3-2-augustus-gene-19.46) (AX-166522353; R2=0.50, p =1.9e-7). While this terpene biosynthesis gene is positioned closely with potential IStraw35 physical positions (Table 3), these markers do not genetically co-segregate with the terpene QTL, and transcript levels are not correlated with linalool abundance (R2=0.004, p =0.98).
Discussion
Many QTL were discovered for strawberry flavor and aroma compounds known to influence the human sensory experience. These QTL are derived from eight biparental crosses phenotyped across multiple seasons under a commercial cultural system in central Florida, and are likely to be useful for making genetic gains in related germplasm. Markers correlated with these traits may be used to guide breeding decisions and identify and select for alleles mediating flavor and aroma. Potential causal genes were identified via a multi-omics approach, and provide a foundation for possible gene-editing-based approaches to improved strawberry flavor. These genetic discoveries represent new opportunities for improving flavor in commercial strawberry, and advance the basic understanding of the molecular mechanisms driving fruit flavor and aroma.
Methyl Anthranilate
Consistent with the long-standing polygenic hypothesis for methyl anthranilate production in octoploid strawberry, multiple QTL were identified for this trait. Multi-omics analysis of QTL regions implicated several likely causal genes within distinct QTL. Because many discrete loci affect methyl anthranilate levels, and because environmental interactions transiently induce wide phenotypic swings including trait presence/absence, the interactions between loci could not be reliably measured in this sample size and diverse set of crosses. However, no loci could be identified as singularly required for production. The published FanAAMT gene on Chr 4, which did not emerge as a QTL in this analysis, was identified solely in the context of the biparental population ‘Florida Elyana’ × ‘Mara de Bois’ (population “10.133”), which contained only 13 analyzed progeny from one cross (Pillet et al. 2017). It is possible that these differences are due to segregating genetic factors becoming fixed or lost in subsequent populations. This hypothesis is supported by the low positive rates resulting from F1 backcrosses to ‘Mara des Bois’ (Chambers, 2013), and the fact that population “10.133” does not independently support the identified QTL regions. This QTL analysis is mostly comprised of populations using the parent ‘12.115-10’, which is a descendant of ‘Mara des Bois’ that produces more methyl anthranilate than its ancestor. It is likely this breeding line has been enriched for favorable methyl anthranilate genetics. These findings might relate more to quantitative differences in methyl anthranilate abundance, rather than the genetic presence/absence which historically defines this rare trait among strawberry cultivars.
The methyl anthranilate LG2A QTL is positively correlated with the production of two other methyl ester volatiles, namely 2-hexenoic acid, methyl ester and methyl 2-methylbutyrate. Consistent with historical segregation ratios which implicate methyl anthranilate as a polygenic trait, less methyl anthranilate variance is explained by this QTL compared with the other two methyl ester volatiles. As their precursors are not closely related, a single promiscuous methyl transferase offers a parsimonious explanation. In Pillet et al 2017, moderate methyl anthranilate levels were occasionally detected in the near-absence of the published FanAAMT transcript, which is suggestive of the possibility of additional methyl transferases.
While hundreds of methyl transferase genes exist in the octoploid genome, only the published FanAAMT has experimentally-demonstrated affinity for anthranilate. Four FanAAMT-like transcripts were abundantly detected in mature octoploid fruit transcriptomes. Two of these expressed FanAAMT-like genes correspond to the QTL on Chr 2, located within two genes (Chr 2-1) and four genes (Chr 2-3) from the most-correlated QTL markers. The expressed Chr 4-1 AAMT-like sequence in the ‘Camarosa’ genome is the most similar to the published Chr 4 AAMT sequence, whose subgenomic identity was not established (Pillet et al. 2017). This gene on Chr 4-1 might be the FanAAMT gene in ‘Camarosa’, particularly as RNA-seq reads from fruit transcriptomes have high sequence fidelity with this gene reference (Figure S3).
Genetic mapping suggests only a single methyl anthranilate QTL for chromosome group 2, which should be located on ‘Camarosa’ Chr 2-2. However, this QTL marker region in the Chr 2-2 physical sequence is completely absent. As ‘Camarosa’ is not capable of producing methyl anthranilate, one or more required genetic elements are expected to be missing in this reference genome. Poor RNA-seq sequence agreement with the FanAAMT-like Chr2-1 and Chr 2-3 homoeologs suggests the correct position of these transcript reads is not in the ‘Camarosa’ genome. It is unlikely that RNA-seq reads corresponding to the published Chr 4 FanAAMT transcript would map falsely to the Chr 2 candidate loci, as the published sequence is the most identical to the Chr 4-1 FanAAMT gene, and the RNA-seq mapping criteria excludes all non-specific reads. A comparative pan-genome analysis using a methyl anthranilate-producing individual would be highly informative, and will be undertaken in the future.
No candidate genes belonging to the hypothesized methyl anthranilate pathway are located in the Chr 5-4 region of the ‘Camarosa’ reference. However a co-segregating transcript cis-eQTL was detected for a putative glutathione peroxidase gene. Many but not all significant markers were shared between the trait QTL and transcript cis-eQTL, since methyl anthranilate levels are influenced at multiple loci while the candidate transcript is under strong single locus control. In microbes, there is precedent for heme peroxidase activity catalyzing methyl anthranilate biosynthesis (Van Haandel et al. 2000), however this reaction is unlikely to proceed via a glutathione peroxidase. It is possible that this cis-eQTL is simply in close linkage with the actual causal gene, which was either not correctly identified or is not present in the ‘Camarosa’ reference genome.
A possible third methyl anthranilate QTL corresponds with two Chr 7 ANTHRANILATE SYNTHASE ALPHA (FanAS-α) homoelogs. Presence/absence variation of the Chr 7-4 FanAS-α transcript is governed by a cis-eQTL which co-segregates with the putative methyl anthranilate markers at this locus. The Chr 7-2 FanAS-α transcript also demonstrates transcript presence/absence variation, but this is apparently due mostly to non-heritable factors that are uncorrelated with Chr 7-4 transcript level variation. Although there are few methyl anthranilate positive individuals among 61 fruit transcriptomes, none of the ten individuals with zero combined FanAS-α expression show methyl anthranilate production. This pathway mechanism is consistent with previous findings implicating FanAAMT as necessary-but-not-sufficient for methyl anthranilate production (Pillet et al. 2017). The absence of anthranilate substrate in the mature fruit would help explain the observed absence of methyl anthranilate production even when FanAAMT transcript levels are high. Further efforts to validate this potential QTL signal are underway.
Mesifurane
Mesifurane (2,5-Dimethyl-4-methoxy-3(2H)-furanone, or DMMF) is derived from the methylation of furaneol (4-hydroxy-2,5-dimethyl-3(2H)-furanone, or HDF) by FanOMT1 (Wein et al. 2002; Zorrilla-Fontanesi et al. 2012). Mesifurane variance is influenced by a loss-of-function mutation in the FanOMT1 promoter, which eliminates transcription and mesifurane production. This model was validated by the detection of a cis-eQTL for the published FanOMT1 gene, which co-segregates with the Chr 7B mesifurane trait QTL. A novel mesifurane QTL was detected on Chr 1A which is in epistasis with the Chr 7B QTL. This QTL region contains a fruit-expressed furaneol glucosyl transferase, which is 95% identical to a characterized furaneol glucosyl transferase from F. ×ananassa. A substrate-restricting glucosyltransferase candidate is consistent with the epistatic interaction detected with the FanOMT1 locus. Depletion of substrate via glucosylation would limit mesifurane biosynthesis regardless of high FanOMT1 transcript levels. Conversely, elimination of the FanOMT1 transcript would eliminate mesifurane production regardless of substrate availability. The LG 1A mesifurane QTL was subsequently confirmed using two validation populations, providing robust support for this QTL. This two-gene model for mesifurane biosynthesis in cultivated strawberry can be exploited for genetic gain via marker-assisted selection. Moderate mesifurane levels can be maintained via dual selection for heterozygous/heterozygous allelic states, and somewhat elevated mesifurane levels can be achieved via double-homozygote selection. These findings may resolve some of the outstanding questions in mesifurane genetics posed by (Cruz-Rus et al. 2017).
Terpenes
Homoeologous terpene gene arrays were detected for nine strawberry mono- and sesquiterpene QTL, including the desirable compound linalool. In citrus, monoterpenes and sesquiterpenes co-locate to single genomic QTL containing paralogous terpene synthases (Yu et al. 2017). We identify a similar phenomenon in cultivated strawberry. This terpene hotspot contains clusters of multiple terpenoid synthase classes, in addition to homoeologous genes on three of four subgenomes. The known biosynthesis gene FanNES1 was associated with terpene levels via trait/transcript level correlations and trait QTL/eQTL co-segregation. Solanesyl diphosphate synthase may contribute to terpene abundances as well. The cis-eQTL/QTL genetic association with solanesyl diphosphate synthase on Chr 3-3 helps support the subgenomic location of FanNES1 and the shared terpenoid QTL in the ‘Camarosa’ genome, despite only two markers being genetically mapped and probe nucleotide sequences aligning to multiple subgenomes. It is possible that the influence of other terpene-related genes in this array remain undetected due to limitations in genome completeness, marker subgenome ambiguity, and/or presence-absence variation among genomes. With additional octoploid strawberry genomes for comparison and improved subgenomic genotyping tools, complex associations in octoploid strawberry will become more robust.
CONFLICT OF INTEREST
The authors declare no conflicts of interest.
ACKNOWLEDGEMENTS
We gratefully acknowledge Aristotle Koukoulidis for genomic DNA isolation, Natalia Salinas for assistance with GC/MS sample collection and preparation, Denise Tieman for assistance with GC/MS operation, Nadia Mourad for assistance compiling eQTL results, Andrew Hanson and Harry Klee for project discussion and guidance, Alan Chambers and Jeremy Pillet for RNA isolation and RNA-seq line selection, Zhen Fan for manuscript editing and discussion, and Angelita Arredondo and Kelsey Cearley for field expertise and assistance with fruit collection.