Extensive genetic variation of leaf specialized metabolites in sessile oak (Quercus petraea) populations

Specialized or secondary metabolites play a key role in plant resistance against abiotic stresses and defences against bioaggressors. For example, in sessile oaks Quercus petraea, phenolics contribute to reduce herbivore damage and improve drought resistance. Here, we explored the natural variation of specialized metabolites in nine European provenances of sessile oaks and aimed to detect its underlying genetic bases. We sampled mature leaves from high and low branches on 225 sessile oak trees located in a common garden and used untargeted metabolomics to characterise the variation of 217 specialized metabolites. In addition, we used whole genome low-depth sequencing to genotype individuals for 1.4M genetic markers. We found that leaf specialized metabolites displayed extensive within-provenance variation, but very little differentiation between provenances. In addition, a genome-wide association study allowed detecting significant associations for 42% of these metabolites. Hence, our results suggest that genetic variation for most leaf specialized metabolites is unlikely to be locally adaptive, however lack of differentiation among populations suggests selection acts locally to maintain diversity at loci associated with leaf specialized metabolites variation.

phenolics contribute to reduce herbivore damage and improve drought resistance. Here, we 23 explored the natural variation of specialized metabolites in nine European provenances of 24 sessile oaks and aimed to detect its underlying genetic bases. We sampled mature leaves from 25 high and low branches on 225 sessile oak trees located in a common garden and used 26 untargeted metabolomics to characterise the variation of 217 specialized metabolites. In 27 addition, we used whole genome low-depth sequencing to genotype individuals for 1.4M 28 genetic markers. We found that leaf specialized metabolites displayed extensive within-29 provenance variation, but very little differentiation between provenances. In addition, a 30 genome-wide association study allowed detecting significant associations for 42% of these 31 metabolites. Hence, our results suggest that genetic variation for most leaf specialized 32 metabolites is unlikely to be locally adaptive, however lack of differentiation among 33 populations suggests selection acts locally to maintain diversity at loci associated with leaf 34 specialized metabolites variation. Dolci & Lanzotti 2008) and tomato (Vargas et al. 2013). In addition, specialized metabolites 51 production is also tightly linked to the plant immune system and phytohormones (Bednarek 52 2012), making them key players in plant pathogen interactions and good candidates for 53 quantitative resistance and even exapted resistance (Newcombe 1998; Bartholomé  association study identified significant associations for 42% of the 217 metabolites we 115 quantified, suggesting that leaf specialized metabolite variation was, to a large 116 extent,genetically determined, and that the variation of individual molecules displayed mono-117 to oligogenic architectures. The fact that the vast majority of the molecules with significant 118 associations did not display differentiation between populations suggests that most molecules 119 did not contribute to local adaptation among provenances, at a European geographical scale.

121
Sampling sessile oak leaves 122 We sampled sessile oak trees grown in a common garden experiment located in Eastern 123 France (Sillegny, 48°59′ 24″ N 06°07′ 56″ E). This common garden is one of the four long-   Table S1). We collected leaves in the common garden on September 7 th and 8 th , 130 2016 on 225 trees (22 to 28 trees per population). At the time of sampling, trees were 131 between 29-35 years old. For each tree, we sampled four to six fully developed leaves from 132 branches at two heights: low branches, mostly protected from direct sunlight by the canopy, 133 and high branches exposed to sunlight. Leaves were collected either using a pole pruner when 134 possible, or with a shotgun whenever the branches were too high to reach. Four to six leaves 135 per branch were stored in 20 mL plastic vials and frozen on dry-ice upon harvest (see 136 Supplementary Material and Methods S1). We collected a total of 582 samples. These 137 samples were used to generate three datasets: genome-wide SNP genotypes of oak trees, a 138 "GWA dataset" and an "annotation dataset". The production and analysis of these three 139 datasets is described in the following and illustrated in Figure S1.   (Plomion et al. 2018) and detected bi-allelic SNPs using a genotype-frequency estimator were achieved with a combination of awk command lines and PLINK v1.9 (Chang et al. 162 2015). We discarded three pairs of individuals with genome-wide genetic similarity above 163 95% (i.e 95% of SNPs with identical genotypes between the two individuals) as they may 164 have corresponded to the same individual sequenced twice ( Table S2a). 165 Population genomics analysis of sessile oak populations 166 We estimated linkage disequilibrium (LD) decay along the oak genome using the r² function 167 available in PLINK v1.9 (Chang et al. 2015) with window size of 50kb.

169
To study the genetic structure of Q. petraea populations we pruned the SNPs according to LD 170 previously estimated and removed highly correlated markers (r² > 0.2) using the "--indep-   Table S2b). The goal was to obtain fresh extracts for a subset of the samples collected in      (Table S2c), 226 following the same protocol that was presented above. Leaf samples included one sample per

237
GWA dataset: This dataset was used to screen specialized metabolite variation and perform 238 genome-wide associations analyses and will be designated hereafter as the "GWA dataset".

239
The liquid chromatography protocol was identical to the liquid chromatography performed    (Table S3) and added a noise threshold at 800. We ran 260 the "findChromPeaks" function to detect chromatographic peaks. Then, we ran the 261 "adjustRtime" function to correct RT deviation and calculated adjusted RTs using a subset-262 based alignment based on the QC samples injected at regular intervals. Finally, we ran the 263 "PeakDensityParam" and "groupChromPeaks" functions to match all the detected peaks 264 between samples. We produced a table of peak intensities and identified pseudo-molecules, 265 based on RT and intensity correlations among peak groups using the "groupFWHM" and  After this step, we replaced NA values by zeros (as the fact that a molecule was undetected in 270 a sample means that it was not present or at very low concentration and is biologically 271 meaningful) and we filtered the dataset to remove outlying samples and peaks. First, we used 272 the median peak intensity across all peak groups to remove samples that had values consistent 273 with blanks and removed blanks that had median peak heights similar to regular samples. 274 Second, we filtered the peak groups defined by XCMS using QCs and blanks, with the 275 rationale that peaks had to be significantly higher in the QCs than in the blanks to be retained. 276 We used a one-way variance analysis (ANOVA, p-value<0.05) on log-transformed intensity 277 data for each peak to determine the average difference in intensity between blanks and QCs 278 and assess significance. Third we also removed all peaks with coefficients of variation above 279 30% in the QCs. Finally, for each injection, we checked that the internal standard included in 280 the extraction buffer was still clearly present and that its intensity remained stable along the 281 injection sequence ( Figure S2). 282 We then corrected peak intensities to account for two potentially confounding effects. First, 283 in LC-MS analyses including many samples over long series, it is frequent to observe 284 intensity drift. To correct the effect of intensity drift (although it was minimal in our analyses, 285 see Figure S2) we divided the intensity values of each peak by the intensity of the 286 corresponding peak in the closest QC injected after the sample along the injection sequence 287 (a QC was injected every twelve samples in the injection sequence, see Supplementary   288 Material and Methods S3). Second, the weight of dry material used in the extraction 289 influences peak intensity. We therefore divided the peak intensities (relative to the values 290 measured in the QCs) in each sample by the weight of material used in the extraction.  consecutive SNPs with significant associations in the same cluster if they were located less 312 than 3000 bases apart. If the distance between consecutive significant SNPs was larger than 313 3000 bases a new cluster was created.  In this study we sampled 225 sessile oak trees originating from nine populations (Figure 1a) 328 and sequenced 224 of these trees to a depth of 10X (on average 36.5M paired-end reads per 329 tree). After filtering, we were able to genotype trees for 1,408,029 SNPs with a minor allele  (Figure 1c). 350 The second PC explained 11.38% of total variance and mostly captured the difference 351 between a group of individuals from Bezange from the other individuals included in the study 352 (Figure 1c). Inspection of kinship values shows that these individuals were likely half-sibs. Leaf specialized metabolites annotation and natural variation 369 We used high-throughput LC-MS analysis to measure the relative concentrations of leaf 370 specialized metabolites in the leaves of the pedunculate oak trees from nine populations that 371 we genotyped. The trees sampled are ~25 years old, grow in a single common garden, and 372 were sampled over two days at the end of the summer 2016. Specialized metabolites were 373 extracted using a methanolic extraction and LC-MS analyses were run to produce two distinct 374 datasets.

375
The first, dedicated to automated annotations of pseudo-molecules and referred to as the 376 "annotation dataset", was produced using a HPLC coupled to a LTQ-Orbitrap and samples 377 were analysed in both positive and negative modes. The second dataset was generated using a 378 HPLC coupled to a QTOF (quadrupole time-of-flight) mass spectrometer in positive mode.

379
Yet the same column and chromatographic conditions were used. This dataset was produced 380 to screen variation of leaf specialized metabolites and is referred to as the "GWA dataset". It   height, which is related to leaf exposure to sunlight, is a major factor driving leaf content in 403 specialized metabolites (Figure 2d). 404 While the effect of branch height was evident from the PCA analysis, the population of origin  439 To investigate the genetic basis underlying the variation of leaf specialized metabolites in 440 sessile oaks, we performed a genome-wide association study, using the "GWA dataset" and     (Figure 4a, Figure S5). Three additional observations can be made about 497 these markers. First, alleles at these markers were present at balanced frequencies in all nine 498 populations, hence their low FST values (Figure 4a). Second, allelic variation at these markers 499 explained phenotypic variation similarly well in all nine populations (Figure 4). The third 500 observation is that three out five, the homozygous genotype for the reference allele not  Based on structural annotations, the pseudo-molecules were assigned to a chemical class and given a putative name. The RT column gives the retention time in min, the m/z+ 505 column gives the mass to charge ratio (in positive mode) for the peak in the "GWA dataset" chosen as representative for the pseudo-molecule (which doesn't always 506 correspond to the parent ion). The identification level column gives confidence levels for annotations based on (Schymanski et al. 2014).The column "Chr:Position" gives the 507 coordinates, on the Q. robur reference genome, of the genetic marker most strongly associated with the variation of each pseudo-molecule. The column "Asso. score" is the 508 highest association score (-log 10 (p-value)) observed for the genetic marker across both branch heights using farmCPU. The columns "candidate gene" and "gene name"

524
For the five pseudo-molecules with the highest associations, we sought to check their 525 automatic annotations by comparing MS n spectra with public spectral libraries and by 526 studying known fragmentation patterns (Sumner et al. 2007). Putative structures were 527 proposed for only three of these molecules by the automated annotation pipeline (see 528 Material and Methods). Of these, we were able to manually confirm the structures of two.

529
The MS2 spectrum for m271 was consistent with it being a galloylquinic acid, a quinic and 530 gallic acid derivative. Pseudo-molecule m39 was found to be a "ferulic acid-like" molecule, 531 belonging to the ontology class of hydroxycinnamic acid ( Table 1).

533
SNPs associated with variation of these five molecules were located near candidate genes on  Interestingly, other associations for m271, but also m269 and m270, were located near the interactions influences the production of leaf specialized metabolites.

596
A second observation was that oak populations display very little differentiation for leaf 597 specialized metabolites when phenotyped in a common garden (Figure 2a,b) and patterns of 598 variation for leaf specialized metabolites didn't match patterns of genetic structure. Despite 599 the lack of differences among populations, our genome-wide association analyses detected 600 significant associations for 42% of the 217 specialized metabolites we investigated. This 601 means that heritable variation exists for many molecules within oak populations. In addition, 602 we found that these molecules were generally associated with a small number of loci,   The second pseudo-molecule, m271, is part of a group of three also including m269 and 708 m270. The three pseudo-molecules m269, m270, and m271 did not have the same retention 709 time, but had identical m/z, which together suggests they were isomers. Our automatic 710 annotations and inspection of fragmentation patterns suggests they were likely galloylquinic 711 acid, also known as theogallin.

712
In oaks, galloylquinic acids were previously identified within leaves of 12 species of black     Dataset S1 Pseudo-molecules intensities, sampling and peak descriptions.