Transcriptome analysis of three Agave fiber-producing cultivars suitable for biochemicals and biofuels production in semiarid regions

Agaves, which have been grown commercially for fiber or alcoholic beverages, are emerging as a candidate crop for biochemicals and biofuels production in semiarid regions because of their high productivity in low rainfall areas, drought tolerance, and low lignin content. In this work, we present the transcriptomic atlas of Agave sisalana, Agave fourcroydes, and agave hybrid 11648 (A. amaniensis x A. angustifolia) under prolonged drought in field conditions. Leaf, stem, and root tissues were sequenced, and gene expression profiles were correlated with biomass composition, enzymatic hydrolysis of cell wall carbohydrates, histochemical analysis, and non-structural carbohydrates content. Differences in biomass accessibility were attributed to either lignin content or lignin composition, possibly through modification of s/g ratio promoted by changes in Caffeic Acid 3-O-Methyltransferase (COMT) transcript abundance. Unlike most plants, the most highly expressed transcripts do not encode photosynthetic proteins, but rather involved in stress response. Although the three cultivars presented quantitative differences in global gene expression, they activated a highly overlapping set of genes. The main molecular strategies employed by agave to cope with high-temperature and drought seem to consist in overexpressing HSP and LEA, as well as promoting raffinose accumulation as an osmolyte. In conclusion, our data provide vital new genetic information for the study of Agave species and provide new insights into cell wall architecture, recalcitrance, and resistance to abiotic stresses for these species.

Million) values representing the transcripts' abundance. For ORF prediction, we used Transdecoder v. 5.0.2 138 (Haas et al., 2013) configured to a minimum length of 200 nucleotides. Subsequently, we selected the longest 139 isoform of each locus, considering only those with a TPM value greater than 1, and ORF length longer than 140 255 nucleotides. Functional annotation assignment was performed using Pannzer2 ( To analyze the specific transcription profile of each tissue, we used the SPM (specificity measure) 157 metric implemented by the software tspex (https://github.com/apcamargo/tspex). Tissue specific genes with 158 SPM >= 0,95 were selected. 159 160

Orthologous analysis between cultivars 161
To compare the transcriptome within cultivars, we first defined orthologous groups using OrthoMCL v. To identify exclusive and expanded gene families from Agave and other related species, we 174 employed a comparative genomics approach using OrthoFinder v. 2.3.1 (Emms & Kelly, 2015). Single-175 copy orthologs were aligned with MAFFT v. 7.394 (Katoh & Standley, 2013) configured to use the L-INS 176 algorithm with 1,000 iterations. All alignments were concatenated in a supermatrix that was used for the 177 phylogenetic inference with IQ-TREE v. 1.6.8 (Nguyen, Schmidt, von Haeseler, & Minh, 2015). BadiRate 178 v. 1.35 (Librado, Vieira, & Rozas, 2012) was used to identify expanded gene families using a FDR 179 threshold of 0.05. We compared our data to protein sequences from genomes available at the Phytozome  (Miller, 1959 From the transcriptome quantification, we have ranked the ten most highly expressed transcripts in each 243 tissue for each cultivar (Table 1). Surprisingly, a similar set of highly expressed transcripts was identified in 244 all tissues of the three cultivars. One of those transcripts is LEA-5 (late embryogenesis abundant protein 5 As expected, our analysis revealed that most of the leaf-specific transcripts are photosynthesis-related 256 (Table 2). Interestingly, in the stem for all cultivars, we have found several homeobox transcripts, which are 257 proteins commonly associated with development (Mukherjee, Brocchieri, & Burglin, 2009). The most 258 expressed root-specific genes are no-hit proteins, and for A. fourcroydes and A. sisalana ubiquitins are present 259 as well. Also, A. sisalana has one root-specific heat shock protein with high expression; and H11648 has a 260 Thaumatin, which is a sweet-tasting protein homologous to pathogenesis-related (PR) protein PR-5 (Min et 261 al., 2003). One member of the PR-5 protein family, Osmotin, is accumulated in cells adapted to osmotic stress 262 (Singh et al., 1987). 263 264

Phenylpropanoids pathway 265
The phenylpropanoid pathway is composed of many branches, but all of them share the same common 266 precursors (Fraser & Chapple, 2011). We analyzed the transcriptional profile of the lignin and flavonoids 267 biosynthesis branches (Figure 6a). At least one member of each gene family of the phenylpropanoid pathway 268 was differentially expressed, and here we highlight the main focal points. Phenylalanine Ammonium Lyase 269 (PAL), the first enzyme of the pathway, presented similar behavior in A. fourcroydes and A. sisalana, in both 270 species, the most expressed isozyme gene had higher transcription in the stem. Also, an orphan PAL isozyme 271 gene was found for each species. Notwithstanding, for H11648, the main PAL isozyme gene was not 272 differentially expressed. The first diverging point is the conversion of p-coumaroyl CoA by either Chalcone Synthase (CHS) or Hydroxycinnamoyl-coenzyme A Shikimate:Quinate Hydroxycinnamoyltransferase 274 (HCT). For CHS, which is the starting enzyme for the flavonoid biosynthesis, two main encoding genes were 275 found to be differentially expressed. However, the overall expression of these two isozymes was relatively 276 low. Curiously, one of those isozymes presented significantly higher expression at the H11648 stem, although, 277 its transcription was not very high. For HCT, which catalyzes the outset of the lignin branch, four HCT 278 isozyme genes were deferentially expressed, and two were responsible for the majority of the transcription. 279 Whereas for H11648, these two isozyme-enconding gene did not present any relevant differences. monocots, such as barley, corn, and sorghum, raffinose concentrations oscillates between 2.6-7.9 mg/g (Kuo,317 VanMiddlesworth, & Wolf, 1988). In contrast, for Agave vegetative tissues, we encountered mean values 318 ranging from 10.96 to 23.38mg/g. Perhaps, the elevated presence of this carbohydrate could be related to the 319 prolonged drought that challenged the plants under field conditions. Interestingly, no significant differences 320 were found between the samples for this carbohydrate. That was not the case for fructose, glucose and sucrose. 321 Glucose content was statistically different in the comparison between A. fourcroydes and H11648 leaves, with 322 almost double of this carbohydrate being present in the leaves of H11648. However, there were no significant 323 differences for the other carbohydrates nor between A. sisalana and A. fourcroydes. Among the leaves of A. 324 sisalana and H11648, a difference was detected in glucose and fructose, again almost 50% more was 325 quantified in the leaves of the hybrid. 326 The highest concentrations of water-soluble carbohydrates were found in leaves regardless of the 327 cultivar ( Fig. 7b). A. fourcroydes and H11648 contained the highest sugar content. However, their mono and 328 oligosaccharides fractions were contrasting. For A. fourcroydes, 26.41% of sugars detected were 329 monosaccharides, while the same category in the H11648 represented 76.11%,. Interestingly, the 330 transcriptomic data revealed a higher transcription of fructan-exohydrolases (FEH) in H11648 leaves (Fig. 4b) 331 that may explain the differences found in monosaccharides content. The oligosaccharides fraction for A. 332 sisalana leaves was also higher than the monosaccharide. In stem, A. sisalana presented higher 333 oligosaccharides content than the other cultivars, when analyzing the transcriptome, we noticed that 334 fructosyltransferases (FT) were more abundant in those samples. 335 336

Compositional analysis of Agave leaves and stem 337
Regardless of the cultivar or tissue, the extractives represented the largest fraction of the biomass 338 ( Figure 6B). A. fourcroydes presented the highest content variation among its tissues, going from 31.2% in stem to 56.89% in leaf. The other cultivars and their tissues varied little, ranging from 49.3% to 52.94%. 340 However, only A. fourcroydes stem was statically different from the other samples. 341 In general, the stems were more lignified than leaves. Considering leaves, A. fourcroydes lignin 342 fraction was significatively lower than A. sisalana and H11648. For A. fourcroydes, lignin levels in leaves 343 were practically half of those encountered for stem. Also, the stem of this agave cultivar presented the highest 344 lignin content in all samples. No statistical difference was found between A. sisalana and H11648 lignin 345 fractions. In relation to the cellulose content, A. fourcroydes stem presented the highest percentage (32.25%), 346 with no statistical difference among the other samples. As for hemicellulose mass fraction, similar patterns to 347 lignin were found. Leaves had lower hemicellulose contents and the stems, higher. There is no significant 348 difference between the leaves of the cultivars. Nonetheless, among stems, hemicellulose fraction was 349 significantly higher for A. fourcroydes, this cultivar presented 18.81%, which is 2.5x higher than its leaves. 350 Concerning ashes, the highest values were found at A. sisalana leaves (11.28%), and the lowest was found at 351 A. fourcroydes stem. In general, ash fractions in leaves were almost two times higher than stems. 352 353

Enzymatic hydrolysis of cell wall carbohydrates 354
The hydrolysis protocol was effective in Agave, and, as expected, differences between samples were 355 observed. Regardless of the agave cultivar, the stem showed higher recalcitrance than the leaves. was the less recalcitrant. Interestingly, even with higher cellulose and lignin levels, the stem of A. fourcroydes 361 was more accessible for digestion than the H11648 stem. 362 363

Microscopic analysis of leaf anatomy and cell wall composition 364
The three cultivars presented thick cuticle, sunken stomata, and well-developed fiber caps, as previously bundle presented more fiber cap cells with thicker cell walls, which may explain the higher fiber quality of 369 this cultivar (Medina, 1959). In contrast to pectin deposition, which was widespread, lignification was detected 370 only in the secondary cell wall of the xylem conducting cells for all cultivars ( figure 8 d-f). For the aniline 371 blue staining, although we encountered brighter regions in the cuticle and fiber caps cells, which could indicate 372 callose deposition, autofluorescence impaired the analysis (supplementary material S4 and S5).

DISCUSSION 374
Little is known about the chemical composition of agaves, especially for the cultivars utilized for fiber 375 production. Few studies have been done on the subject, and most of the data available focus on the chemical 376 composition of the fibers itself, not whole leaves or stems, which could be misleading when evaluating agave 377 bioenergetic potential ( In fact, agaves seem to invest heavily in drought and high-temperature resistance mechanisms. Our 460 transcriptomic data revealed that the most expressed transcripts encode proteins that are well known to be 461 stress-responsive. In A. fourcroydes leaves, Phosphoenolpyruvate carboxylase, responsible for the first step 462 in CAM pathway, appears only in the fifth position of our most expressed transcripts list (Table 1)  by the gas chromatography. Instead, our chemical analysis detected the raffinose in every sample tested. We 496 suggest that, in Agave, one of the main osmolytes might be raffinose. This carbohydrate was detected insimilar 497 concentration in every sample tested, yet our transcriptomic data revealed clear distinction regarding the 498 raffinose biosynthetic pathways; whereas A. fourcroydes seems to prefer the RafS-mediated synthesis, the 499 other two cultivars seem to utilize the route mediated by GosL. It is plausible that the inclination of A. 500 fourcroydes to use RafS-mediated synthesis could affect cellulose biosynthesis since SUSY and RafS could 501 compete for the same substrate. 502 Even though leaves presented higher expression of genes related to abiotic stresses, it was in roots that 503 we have found many of the GO terms referred to these responses (e.g., response to oxidative stress; response 504 to stress). Soil surface can have large daily oscillations in temperature, and in desert-like regions, these temperatures can easily overcome 40ºC (Nobel, 2010; Sattari, Dodangeh, & Abraham, 2017). As agaves 506 present shallow roots systems (Nobel, 1988), this tissue is continuously exposed to harsh conditions and must 507 develop special adaptations to cope with high temperature, salinity, and water deficit. It makes sense that the 508 most abundant transcripts of the root-associated fungi are of proteins related to heat responses, which may 509 indicate that the microbial community is well adapted to such an environment. 510

CONCLUSION 511
Drought is one of the most important environmental factors that impact plant growth and development. 512 In the eminence of climate change, projections estimate disturbances in rainfall patterns and rise in global 513 temperature, arid and semi-arid environments might become more common, and plants will have to be adapted 514 to those conditions. Therefore