Abstract
The transition of plants from sea to land sparked an arms race with pathogens. The increased susceptibility of land plants is largely thought to be due to their dependence on micro-organisms for nutrients; the ensuing co-evolution has shaped the plant immune system. By profiling the immune receptors across flowering plants, we identified species with low numbers of NLR immune receptors. Interestingly, four of these species represent distinct lineages of monocots and dicots that returned to the aquatic lifestyle. Both aquatic monocot and dicot species lost the same well-known downstream immune signalling complex (EDS1-PAD4). This observation inspired us to look for other genes with a similar loss pattern and allowed us to predict putative new components of plant immunity. Gene expression analyses confirmed that a group of these genes was differentially expressed under pathogen infection. Excitingly, another subset of these genes was differentially expressed upon drought. Collectively, our study reveals the minimal plant immune system required for life under water, and highlights additional components required for the life of land plants.
Author summary
Plant resistance to pathogens is commonly mediated by a complex gene family, known as NLRs. Upon pathogen infection, changes in the cellular environment trigger NLR activation and subsequent defence responses. Despite the dependence of agricultural practices on NLR genes to control pathogen load, relatively little is known about this gene family outside of model crop species. In this study, we identified a convergent reduction in the NLR gene family among two lineages of aquatic plants. Furthermore, we established that NLR reduction occurred in conjunction with the loss of a common immune signalling pathway. Subsequently, we identified other genes convergently lost in aquatic species and propose these as candidate components of the plant immune signalling pathway. In addition, we revealed components of the agronomically important drought response to be lost in aquatic plants. This study adds to our understanding of the complex interactions between environment and response to biotic stress, widely known as the disease triangle. The pathways identified in this study shed further light on the link between responses to drought and disease.
Introduction
Plants evolved from a common ancestor with charophyte green algae upon a major change in lifestyle, the transition from water to land, over 450 million years ago (MYA) (1,2). Extant plant lineages, such as bryophytes (mosses, liverworts and hornworts), seedless vascular plants (ferns), gymnosperms (pines and other conifers) and angiosperms (monocots, dicots) diverged from the ancestor of terrestrial plants over 300 MYA (3). Modern agriculture depends heavily on two prominent branches of angiosperms, monocots and dicots. Monocots range from cereal crops to tropical palms, whilst the most widely consumed dicots include soybean, potato and tomato (4). Crops are continuously exposed to both biotic and abiotic stresses, which can result in major yield loss. Such losses are expected to increase in frequency with climate change (5). Adaptation of crops to new and evolving pathogen threats often relies on the introgression of Nucleotide Binding Leucine Rich Repeat (NLR) genes, which are involved in pathogen recognition.
The first evidence of pathogenic infection of land plants comes from fossils dating back 400 million years (6), prior to the divergence of angiosperms from mosses and gymnosperms (7). Similarly, major components of the immune system, such as NLR genes, were prevalent in plants soon after the transition to land (8). Plant-pathogen interactions therefore predate the divergence of angiosperms, and for 200 million years angiosperms relied on common pathways to defend against pathogens. Since the divergence of monocots and dicots 120-180 MYA, lineage-specific responses to stress, including pathogen infection, have emerged (9).
PAMP-triggered immunity (PTI) is the initial plant disease resistance response, in which a pathogen is detected through monitoring of the extracellular environment. However, pathogen effectors can suppress PTI to facilitate virulence. Hence, a second, intracellular monitoring system, effector-triggered immunity (ETI), is essential for resistance to many pathogens. Plant NLR immune receptors convey ETI upon detection of intracellular pathogen molecules. The NLR proteins are typically composed of three or more domains. A central Nucleotide Binding (NB-ARC) domain is a component of all NLRs because it is essential for receptor activation, similar to the NAIP, CIITA, HET-E, and TP1 (NACHT) domains found in animal NLR immune receptors (10). The NB-ARC domain is commonly followed by a series of Leucine Rich Repeats (LRRs), previously shown, in some but not all instances of direct recognition, to mediate pathogen-derived effector binding (11–13). A Toll/Interleukin-1 receptor (TIR-1) or coiled-coil (CC) domain, typically found at the N-terminus, functions in the initiation of the signalling cascade (14). NLRs containing TIR-1 are referred to as TIR-1 NLRs, while CNL refers to NLRs with a CC domain. Within the CNL class, there is a sub-clade characterized by CCRPW8 domains, members of which have been shown genetically to be required for signalling by other NLRs (15–17).
A subset of NLRs has undergone functional specialisation to serve downstream of other NLRs in the signalling cascade (18–21). This mechanism is referred to as an NLR sensor-helper pair, where genomically the two NLR genes are often adjacent in a head-to-head orientation with a shared promoter (22,23). The sensor NLR is required to recognise changes upon infection, whilst the helper NLR is activated upon a change in its paired NLR rather than via a pathogen-induced change (18). The helper NLR is then able to activate the signalling cascade for cell death. Additionally, sensor and helper interaction can take the form of a network whereby multiple sensors interact with the same helper (24). Examples of network helpers include the NLR REQUIRED FOR CELL DEATH (NRC) clade (NRC2/3/4) (25) and the CCRPW8-containing NLRs N REQUIREMENT GENE 1 (NRG1) (16,17) and ACTIVATED DISEASE RESISTANCE 1 (ADR1) (26). The NLR gene family is complex, with copy numbers varying 10-fold between species of the same family (27). NLR copy numbers in crops range from as low as 30 in cucumbers to as high as nearly 3,000 in hexaploid wheat (27,28). The variation in copy number of NLRs is becoming more apparent with the increasing number of available genomes. However, due to the historical bias in genome sequencing favouring economically important or model species, most of our information comes from the Brassicaceae, Solanaceae and Poaceae families (29).
While there is abundant literature on pathogen sensing and signalling by NLRs (30), the next stages of the immune signalling cascade are still unclear. Among dicots, typically over half the NLRs contain a TIR-1 domain (29,31). In contrast, no TIR-1 domain containing NLR receptors have yet been identified in monocots. The two key signalling components that are downstream of TIR-1 NLRs, ENHANCED DISEASE SUSCEPTIBILITY 1 (EDS1) and PHYTOALEXIN DEFICIENT 4 (PAD4), are present in almost all angiosperms despite the absence of TIR-1 NLRs in monocots (32). Wheat EDS1 has retained functional importance in immunity despite the absence of TIR-1 NLRs, with EDS1 overexpression contributing to increased powdery mildew resistance (33). Furthermore, EDS1 in dicots forms a complex with PAD4 and also SENESCENCE ASSOCIATED GENE 101 (SAG101) (32). In planta work in Arabidopsis showed EDS1 binding directly to PAD4 and SAG101 to form mutually exclusive heterodimeric complexes (32), with subcellular localization of EDS1 dependent on the interacting partner (34). PAD4 and SAG101 also interact with MYC2/JASMONATE INSENSITIVE-1, a transcriptional regulator of JA signalling (35). Interestingly, while EDS1 and PAD4 are conserved, the SAG101 gene is absent from available genomes of grasses (32). Despite 120-180 million years of divergent evolution, some NLRs, such as barley MILDEW LOCUS A (MLA), are functional when transformed into Arabidopsis (36). This indicates overall conservation of downstream immune pathways across monocots and dicots.
Within flowering plants, there is a huge amount of diversity that can be mined to understand plant molecular pathways. Specifically, by looking at convergent evolution we can predict gains and losses of genetic pathways associated with particular lifestyles. The switch between terrestrial and aquatic environments has occurred numerous times in both monocot and dicot lineages, although the exact points of transition remain unknown due to the lack of fossil evidence and uncertainty in phylogenetic assignment (37). Genome assemblies of aquatic plant species are available for the monocot families Araceae (Spirodela polyrhiza, Lemna minor) and Zosteraceae (Zostera marina) as well as the dicot families Lentibulariaceae (order Lamiales; Utricularia gibba, Genlisea aurea) and Nelumbonaceae (Nelumbo nucifera). Interestingly, aquatic species from both monocot and dicot lineages have reduced genome size compared to terrestrial sister species (38–41). The availability of such genomic resources allows elucidation of pathways that have independently evolved in aquatic species to facilitate acclimation to their new environment.
In this study, we used comparative genomics to identify gene families that were convergently lost in the transition of monocot and dicot species to an aquatic environment (Fig. 1). We identified independent contractions in NLR numbers in aquatic monocots and dicots. We further analysed the known components of disease resistance pathways to identify connections between the loss of NLR genes and the loss of additional signalling pathways. Finally, we used gene family clustering methods to identify other pathways convergently lost in monocot and dicot aquatic species as well as to predict novel candidates involved in plant innate immunity and drought response.
Results
Aquatic plants have lost most of the NLR plant immune receptors
We annotated and characterized NLRs across 18 genomes that represent taxonomic and ecological diversity across monocots and dicots, including plants that have changed their lifestyle from terrestrial to aquatic in both clades (Fig. 2A). Interestingly, we identified multiple independent reductions in the number of NLRs in eelgrass (Zostera marina), duckweed (Spirodela polyrhiza), orchid (Phalaenopsis equestris), resurrection grass (Oropetium thomaeum), humped bladderwort (Utricularia gibba) and corkscrew plant (Genlisea aurea). Compared to the average of 300-400 NLRs in monocots, the genomes of duckweed and eelgrass encode only 62 and 10 NLRs, respectively (Fig. 2B). Likewise, the aquatic dicot G. aurea had only 8 NLRs, while humped bladderwort had no NLRs that retained all motifs associated with this protein family (Fig. 2B). However, not all aquatic species show NLR reduction. An example is Chinese lotus (Nelumbo nucifera), whose genome encodes 237 NLRs.
To test whether the low number of NLRs found in each species could be attributed to poor assembly or annotation, we assessed the presence of single-copy orthologs using the Benchmarking Universal Single-Copy Orthologs (BUSCO) pipeline (42) on the 19 species used in the study (S1 Table). The number of complete single-copy orthologs for duckweed and eelgrass was comparable to other monocots, including orchid and pineapple, and to the dicot A. hypochondriacus. Humped bladderwort and corkscrew plant have a lower number of complete single-copy orthologs than other species. This can be explained by the fact that genes from pathways known to be lost in aquatic species are present in the BUSCO land plant database. For example, the stomatal development genes (EOG0936017K (AT5G03730) and EOG09360A7W (AT1G80080)) are known to be lost in eelgrass, which lacks stomata (41), yet these genes are part of the BUSCO database (S2 Table). The presence of such genes in the BUSCO database reduces the number of complete BUSCOs in these aquatic species, but this reflects the absence of a conserved pathway rather than poor annotation. Subsequently, we investigated whether genome size reduction could explain the NLR reduction in aquatic species. The 99Mb genome of humped bladderwort (38) and the 63.5Mb genome of corkscrew plant (43) are two of the smallest among all plant species. Duckweed and eelgrass have genomes of 202Mb (40) and 142Mb (41), respectively, and thus have a larger genome than the 135Mb genome of A. thaliana (44), which has a higher number of NLRs in both TIR-1 NLR and CNL clades. While NLRs constitute 0.6% of Arabidopsis protein-coding genes, among aquatic species the percentage of NLR protein-coding genes is between 0.003% and 0.43% (S3 Table). The low percentage of NLR proteins in the total proteomes thus cannot be explained by genome reduction alone.
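The normalisation used in this comparison, NLRs as a percentage of all protein-coding genes, can be sketched as follows. This is a minimal illustration only: the function name and the gene counts are hypothetical placeholders and are not taken from S3 Table.

```python
def nlr_percentage(nlr_count: int, protein_coding_genes: int) -> float:
    """Return NLRs as a percentage of all protein-coding genes."""
    if protein_coding_genes <= 0:
        raise ValueError("protein_coding_genes must be positive")
    return 100.0 * nlr_count / protein_coding_genes


# Hypothetical (NLR count, total protein-coding genes) pairs, for illustration only.
example_counts = {
    "species_A": (160, 27000),
    "species_B": (10, 20000),
}

percentages = {
    name: nlr_percentage(nlrs, total)
    for name, (nlrs, total) in example_counts.items()
}

# Rank species from lowest to highest NLR percentage.
ranked = sorted(percentages, key=percentages.get)
```

Expressing NLR content relative to the proteome, rather than as a raw count, is what allows species with very different genome sizes to be compared directly.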
We further tested which NLR clades were lost and which were retained among the four low-NLR aquatic species (Fig. 3). In terrestrial plants, TIR-1 NLRs were lost in monocots soon after the divergence from dicots (29,45,46). Interestingly, TIR-1 NLRs, which are a major component of the dicot immune system (Fig. 3), are absent from aquatic dicot species. The RPW8-like coiled-coil clade C1, found in all terrestrial monocots and dicots, is absent in all four of the aquatic species (Fig. 3). This is pertinent as the CCRPW8 NLR NRG1 is required for TIR-1 NLR signalling (16,17). The absence of this clade in the dicot aquatic species supports its function as a helper for TIR-1 NLR signalling. Overall, 13 NLR clades were lost in multiple aquatic species and only two clades (C2 and C3) are retained in more than one aquatic species. The C2 clade includes the Arabidopsis NLR gene called Resistance to Pseudomonas Syringae 2 (RPS2), whose protein product acts as a guard detecting pathogen effector triggered modification to RPM1-INTERACTING PROTEIN 4 (RIN4) (47,48). The C3 clade includes Arabidopsis LRRAC1 (49). The C3 clade is characterised by a large expansion of monocot NLRs, including 7 NLRs with known defence functions (S4 Table). Two distinct clades (C6 and C10) are each present in only one of the aquatic species and have recently undergone expansions. Corkscrew plant has retained a single clade of NLRs, whilst eelgrass and duckweed have retained NLR members in 3 clades, both retaining C2 and C3 membership in addition to C4 and C10, respectively. Duckweed additionally has 4 NLRs which did not fall into any defined clades based on our criteria. Phylogenetic analysis not only highlighted the loss of NLRs among aquatic species, but also demonstrated recent expansion of some of the clades, implying a much greater initial reduction of the NLR repertoire than what is currently observed. The predominant example of NLR clade expansion is in the C2 NLR family, which contains a sub-group of 16 duckweed NLRs (0.5 branch length, 100 bootstrap).
To test if aquatic plants underwent a general reduction in large protein families, we annotated receptors belonging to the Receptor-like kinase family (RLKs) and actin proteins. We focused on RLKs as they represent another family of immune receptors conserved across plants and would allow us to observe if PTI was affected in addition to ETI. The actin gene family does not play a major role in disease perception or resistance and thus shows how the number of housekeeping genes is affected in the different plant species. Our analyses showed that the percentage of RLKs is similar between monocot aquatic species and monocot terrestrial species, falling in a range of 0.84-1.0% (S5 Table). However, dicot aquatic species showed a reduction in RLKs. Orchid has the lowest percentage of genes encoding RLKs among terrestrial dicots at 0.76%, far greater than the 0.3% and 0.006% we found in corkscrew plant and humped bladderwort, respectively. Across all aquatic species, the reduction of RLKs compared to sister terrestrial lineages was not as pronounced as with NLRs, with the exception of corkscrew plant (S3 and S5 Tables). For the actin gene family, the percentages of actin-encoding genes in the proteomes of the aquatic species were the 2nd (0.13%), 3rd (0.11%), 4th (0.11%) and 8th (0.076%) highest among the 12 species sampled. This is consistent with the conservation of core genes despite the genome reduction among aquatic species (S5 Table).
Both monocot and dicot aquatic plants independently lost immune signalling pathway components EDS1/PAD4/SAG101
To identify if other immune signalling components had been lost in aquatic plant species, we performed a reciprocal BLASTP search for 15 known immune components (Fig. 4A, S1 Fig and S6 Table). Of the 15 immune genes characterized in Arabidopsis, 5 were found to be conserved across monocots and dicots, including aquatic lineages (Fig. 4B). Among the genes that we identified as lost was SAG101, which has previously been shown to be absent in monocots and a few dicots including A. coerulea and M. guttatus (32). The absence of a reciprocal orthologue for PBS1-Like-1 (PBL1) (50) can be explained by the recent duplications at this locus (S2 Fig).
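The general logic of a reciprocal best-hit search can be sketched as below. This is a simplified illustration, assuming BLASTP hits have already been parsed into (query, subject, bitscore) tuples; the function names and protein identifiers are hypothetical, and the criteria actually applied in this study are those described in S1 Fig.

```python
def best_hits(rows):
    """Map each query to its highest-bitscore subject.

    rows: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, bitscore in rows:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_rows, reverse_rows):
    """Keep query/subject pairs that are each other's best hit."""
    forward = best_hits(forward_rows)
    reverse = best_hits(reverse_rows)
    return {q: s for q, s in forward.items() if reverse.get(s) == q}


# Hypothetical hits: Arabidopsis queries against a target proteome, and back.
forward = [
    ("AtEDS1", "spA_g1", 300.0),
    ("AtEDS1", "spA_g2", 120.0),
    ("AtPAD4", "spA_g2", 90.0),
]
reverse = [
    ("spA_g1", "AtEDS1", 310.0),
    ("spA_g2", "AtSAG101", 150.0),
    ("spA_g2", "AtPAD4", 80.0),
]

orthologs = reciprocal_best_hits(forward, reverse)
```

In this toy example only the EDS1 query has a reciprocal partner; the PAD4 query has a forward hit whose own best hit is a different gene, so it would be scored as having no reciprocal orthologue. Recent duplications can produce the same outcome, which is why an absent reciprocal hit needs to be checked against the locus history.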
In addition to SAG101, the aquatic monocot and dicot species with low NLR number appeared to have convergently lost EDS1 and PAD4 (Fig. 4B). To confirm that the inferred absence of EDS1, PAD4 and SAG101 is not an annotation artifact, we scanned all the genomes using TBLASTN and HMMER motif searches for the indicative lipase 3 motif. These two analyses supported absence of all three of these proteins, which have been shown to assemble into a complex (S3 Fig, https://github.com/krasileva-group/Aquatic_NLR/tree/master/).
Another key signalling component downstream of NLRs, NON RACE SPECIFIC DISEASE RESISTANCE 1 (NDR1), is known for its role in activating both PTI and ETI branches of immunity (51) and has a demonstrated role in maintaining adhesion between the plasma membrane and cell wall (52). Similar to SAG101, NDR1 appears to be absent in some terrestrial monocot species and in both sets of aquatic plants (Fig. 4B). The phylogeny of the identified NDR1 orthologs suggests three independent loss events of NDR1, one at the branch point of each of the aquatic clades and a further loss in the Poales order after the divergence from oil palm (S4 Fig). In contrast, RIN4, which is known to interact with NDR1 in Arabidopsis (53), is present in all species.
Orthogroup analysis of protein families provides a global view of the genes convergently lost in aquatic plants, while conserved in terrestrial lineages
The convergent loss of EDS1 and EDS1-dependent NLRs led us to hypothesize that other, as yet unknown, components of the EDS1-dependent signalling cascade would also have been convergently lost in aquatic species. To uncover novel proteins that can potentially function in conjunction with the EDS1-mediated NLR signalling cascade, we used two methods to identify orthogroups: OrthoMCL and GeneSeqToFamily. These methods were applied to the 18 plant proteomes, including monocots (Z. marina, S. polyrhiza, P. equestris, Dioscorea rotundata, Elaeis guineensis, O. thomaeum, Oryza sativa and Zea mays) and dicots (Aquilegia coerulea, Nelumbo nucifera, Arabidopsis thaliana, Amaranthus hypochondriacus, Solanum lycopersicum, Fraxinus excelsior, Mimulus guttatus, G. aurea and U. gibba), both with the outgroups Selaginella moellendorffii and Amborella trichopoda (Fig 5A).
We identified 17 genes lost in only aquatic species from combining the monocot and dicot OrthoMCL runs and 31 genes from 10 orthogroups lost in only aquatic species from GeneSeqToFamily (Fig. 5). Four genes were identified by both pipelines including EDS1, PAD4, ACTIVATED DISEASE RESISTANCE-LIKE1 (ADR1-L1) and REGULATOR OF CHROMOSOME CONDENSATION 1-LIKE (RCC1-like). The former three are known to be involved in plant defence, whilst RCC1-like has not previously been implicated in defence response. We further focused on the 44 genes identified by either OrthoMCL or GeneSeqToFamily methods (Fig 5, S7 Table). We designated these genes as AngioSperm Terrestrial-Retained, Aquatic-Lost (ASTRAL).
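The selection and merging logic described above can be sketched as follows. This is an illustration under stated assumptions: "lost in only aquatic species" is read here as absent from every aquatic species and present in every terrestrial species considered, which may be stricter or looser than the criteria used by the two pipelines. The orthogroup names, the reduced species sets and the gene identifiers are hypothetical.

```python
def lost_only_in_aquatic(presence, aquatic, terrestrial):
    """Return orthogroups absent from all aquatic species and present in all
    terrestrial species.

    presence: dict mapping orthogroup -> set of species with >= 1 member.
    """
    selected = []
    for group, species in presence.items():
        absent_in_aquatic = not (species & aquatic)
        present_in_terrestrial = terrestrial <= species
        if absent_in_aquatic and present_in_terrestrial:
            selected.append(group)
    return sorted(selected)


aquatic = {"Z. marina", "S. polyrhiza", "G. aurea", "U. gibba"}
terrestrial = {"O. sativa", "A. thaliana"}  # reduced set, for illustration

presence = {
    "OG1": {"O. sativa", "A. thaliana"},               # aquatic-lost
    "OG2": {"O. sativa", "A. thaliana", "Z. marina"},  # retained in eelgrass
    "OG3": {"A. thaliana"},                            # also missing in rice
}
candidates = lost_only_in_aquatic(presence, aquatic, terrestrial)

# Merging two candidate lists: 17 + 31 genes with 4 shared gives 44 in total.
method_a = {f"gene{i}" for i in range(17)}      # placeholder IDs, 17 genes
method_b = {f"gene{i}" for i in range(13, 44)}  # placeholder IDs, 31 genes
combined = method_a | method_b
```

The union step reproduces the arithmetic in the text: 17 genes from one method plus 31 from the other, minus the 4 found by both, leaves 44 candidates.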
Arabidopsis and rice homologs of genes lost in aquatic plants are differentially regulated upon drought response and disease resistance
We first took the 44 Arabidopsis ASTRAL gene candidates and searched the 697 microarray experiments available at https://www.ncbi.nlm.nih.gov/geo/ for conditions causing differential expression (up or down; fold change > 3, p-value < 0.001) of more than 20 of our candidate genes. Only two broad condition perturbations, drought and pathogen infection, met this criterion. To further investigate the effects of these two conditions on the ASTRAL genes, we selected representative gene expression sets (drought: AT-00626 (54) and AT-00419 (55), microarray; pathogen: AT-00744 (56) and AT-00736 (57), RNAseq). We extracted the differentially expressed genes in control vs treated conditions (FDR = 0.05, S1 data) and identified the number of ASTRAL genes that were differentially expressed (Fig. 6).
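The screening step can be sketched as a simple filter. This is a minimal illustration, assuming that fold change is supplied as a linear treated/control ratio (so a threefold decrease is a ratio below 1/3) and that per-gene results are already tabulated per condition; the function name, data layout and example values are hypothetical.

```python
def conditions_affecting_candidates(results, candidates,
                                    min_fold=3.0, max_p=0.001, min_genes=20):
    """Count candidate genes passing the thresholds in each condition.

    results: dict condition -> dict gene -> (fold_ratio, p_value), where
             fold_ratio is treated/control on a linear scale (assumption).
    Returns conditions in which more than min_genes candidates pass.
    """
    selected = {}
    for condition, genes in results.items():
        passing = 0
        for gene, (fold_ratio, p_value) in genes.items():
            if gene not in candidates:
                continue
            changed = fold_ratio > min_fold or fold_ratio < 1.0 / min_fold
            if changed and p_value < max_p:
                passing += 1
        if passing > min_genes:
            selected[condition] = passing
    return selected


# Toy data with a lowered gene threshold so the example stays small.
toy_results = {
    "drought": {"g1": (4.0, 0.0001), "g2": (0.2, 0.0005), "g3": (1.5, 0.0001)},
    "cold": {"g1": (3.5, 0.01)},
}
toy_hits = conditions_affecting_candidates(
    toy_results, {"g1", "g2", "g3"}, min_genes=1
)
```

With the default thresholds the function mirrors the stated cut-offs (fold change > 3, p-value < 0.001, more than 20 candidate genes).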
ASTRAL genes were significantly associated with differential expression upon drought, compared to untreated plants, in Col-0 and in the background of srk2cf, a mutant of a major regulator of the drought response (AT-00419, one-sided Fisher's exact test, p-value = 0.03612), and at both 1 hour and 4 hours after drought stress of Col-0 compared to control (AT-00626, one-sided Fisher's exact test, p-value = 0.03678) (Fig. 6A). Across the two drought experiments tested, the same 11 ASTRAL genes were differentially expressed in response to drought (Fig. 6A). Among the upregulated ASTRAL genes was EID-1 Like 3 (EDL3) (58), a known drought response regulator. The ASTRAL genes also appear to show greater differential expression after longer incubation with the hormone abscisic acid (ABA), in an SNF1-RELATED PROTEIN KINASE 2 (SnRK2) subclass II (SRK2C and SRK2F) independent manner (Fig. 6A).
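The one-sided Fisher's exact test used for these associations can be sketched with the hypergeometric tail probability. This is a generic illustration using only the standard library; the 2x2 table layout, the function name and the example counts are assumptions, since the genome-wide totals behind the reported p-values are not restated here.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test on a 2x2 table.

    a: ASTRAL and differentially expressed
    b: ASTRAL and not differentially expressed
    c: non-ASTRAL and differentially expressed
    d: non-ASTRAL and not differentially expressed
    Returns P(X >= a) under the hypergeometric null.
    """
    row1 = a + b            # all ASTRAL genes
    row2 = c + d            # all other genes
    col1 = a + c            # all differentially expressed genes
    total = row1 + row2
    denominator = comb(total, col1)
    k_max = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(row2, col1 - k)
        for k in range(a, k_max + 1)
    )
    return tail / denominator


# Hypothetical counts: 11 of 44 candidate genes differentially expressed,
# against an assumed background of 2,000 of 27,000 other genes.
p_value = fisher_exact_greater(11, 33, 2000, 25000)
```

The test is one-sided because the question is whether ASTRAL genes are over-represented among differentially expressed genes, not whether they differ in either direction.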
The ASTRAL gene set was associated with differential expression upon treatment of Col-0 rosette leaves with Pseudomonas syringae (one-sided Fisher's exact test, p-value = 0.00433). Upon treatment with the necrotrophic pathogen Botrytis cinerea, there was no association between differentially expressed genes and ASTRAL candidates in a wild-type genetic background (one-sided Fisher's exact test, p-value = 0.076) (Fig. 6B). Interestingly, in the wrky33 mutant background there was a significant association between differentially expressed and ASTRAL genes (one-sided Fisher's exact test, p-value = 0.0147) (Fig. 6B). WRKY33 has been shown to be crucial for providing immunity to B. cinerea in A. thaliana through negative regulation of ABA (57). Fifteen ASTRAL genes were differentially expressed upon B. cinerea treatment in the wrky33 mutant background, whilst 20 ASTRAL genes were differentially expressed upon P. syringae treatment of Col-0. Of these, 11 ASTRAL genes were differentially expressed in both experiments. Since EDS1 is known to be induced upon pathogen infection, we were interested to see that, upon hierarchical clustering of expression patterns, 11 A. thaliana genes fell into a cluster with EDS1. Whilst 8 of these genes have previously been implicated in effector-triggered immunity, 3 genes (RPP13-like Protein 1 (RPPL1), AT1G55790 and AT5G66890) have not previously been experimentally shown to play a role in immunity (Fig. 6B). Two other clades of differentially expressed genes were of particular interest: the 5-gene clade characterised by another NLR helper gene, NRG1, and the 7-gene clade containing CTC-INTERACTING DOMAIN 9 (CID9), MICROTUBULE-ASSOCIATED PROTEIN 65-8 (MAP65-8) and ASPARTIC PROTEASE IN GUARD CELL 2 (ASPG2), all of which were down-regulated upon pathogen infection (Fig. 6B). All three clades of up- and down-regulated ASTRAL genes showed minimal changes in expression in SALICYLIC ACID INDUCTION DEFICIENT 2 (sid2-1) knockouts compared to wild type. This matches our expectation since SA pathway components were conserved in several aquatic species. We observed a much larger effect in the AGD-2 DEFENCE LIKE PROTEIN 1 (ald-1-2) background. ALD1 encodes an aminotransferase that works synergistically with PAD4; the two genes have been shown to affect expression of one another (59,60).
Four of the 11 drought differentially expressed ASTRAL genes were also differentially expressed upon both P. syringae and B. cinerea wrky33 treatments. An additional 4 ASTRAL genes that changed expression upon drought were differentially expressed in one of the two pathogen treatments.
To identify if the same stresses disproportionately affect differential expression of ASTRAL genes in a monocot species, we used the 34 rice genes which make up the ASTRAL gene set (S8 Table). We looked for patterns of high differential expression of rice ASTRAL genes across 25 RNAseq and 142 microarray datasets available at https://www.ncbi.nlm.nih.gov/geo/. Drought conditions in several studies appeared to cause an increase in the number of differentially expressed ASTRAL genes. We tested the association between ASTRAL and differentially expressed genes for two rice drought studies (OS-00140 (61) and OS-00143 (62), using FDR = 0.05). A significant association was found between differentially expressed and ASTRAL genes on the 9th day of sampling when comparing gene expression in rainfed and irrigated rice (one-sided Fisher's exact test, p-value = 0.00089), but not on day 14 or day 15 (one-sided Fisher's exact test, p-value = 0.81 and 0.245, respectively). A significant association was also found between ASTRAL and differentially expressed genes in drought experiments simulating 10% or 15% available water content across three rice cultivars (one-sided Fisher's exact test, p-value = 0.0046) (Fig. 7A). A total of 11 genes were found to be differentially expressed upon drought. Six of these ASTRAL genes were found differentially expressed in both conditions. The rice ortholog of EDL3 (LOC_Os01g58850) was differentially expressed under conditions of 15% available water capacity (AWC) but not at 10% AWC.
We then tested if there was a significant association between rice ASTRAL genes and differential expression upon pathogen infection. For this we looked at three infection datasets: bacterial blight Xanthomonas oryzae, rice blast Magnaporthe oryzae and rice gall midge Orseolia oryzae (OS-00139 (63), OS-00045 (64), OS-00082 (65)). We decided to include a dataset of the insect O. oryzae as it has been shown that insects secrete effectors which activate plant immunity in a similar manner to other pathogens. Furthermore, insects commonly act as vectors for plant infection (65–67). There was no significant association between differential gene expression and rice ASTRAL genes upon X. oryzae infection, with only 5 of the 34 ASTRAL genes differentially expressed upon infection when comparing all isolates of X. oryzae to control treatments (one-sided Fisher's exact test, p-value = 0.3054) (Fig. 7B). Conversely, upon infection with O. oryzae (GMB1, GMB4M) and M. oryzae (2dpi) there was a significant association between ASTRAL and differentially expressed genes (one-sided Fisher's exact test, p-value = 0.0001, 0.0026 and 0.0428, respectively) (Fig. 7B). All 4 genes differentially expressed upon rice gall midge GMB1 infection were also differentially expressed in GMB4M, along with one additional gene. However, a different, non-overlapping set of 5 ASTRAL genes was differentially expressed in response to M. oryzae (Fig. 7B). Among the genes differentially expressed upon M. oryzae infection was the rice ortholog of PAD4 (LOC_Os11g09010), although there was no significant differential expression of EDS1 (LOC_Os09g22450) between treated and untreated conditions.
A model placing ASTRAL genes in immunity and drought response pathways
We assembled a working model of known plant immunity and drought pathways from literature (Fig 8A, 8B, S9, S10 Table). ASTRAL genes experimentally validated in the plant immunity pathway (15–17,35,68) were underlined in purple and those validated in the drought pathway (69,70) in brown; interactions consistent with current literature were depicted with solid arrows. For the remaining ASTRAL genes identified as differentially expressed upon pathogen treatment, we investigated their expression further. In plant immunity, we determined whether each gene's expression was EDS1 dependent, utilising microarray data from eds1 mutant plants (71). This revealed 7 EDS1-dependent genes which had not previously been implicated with EDS1 yet have an inverted expression pattern in the eds1 mutant (S9 Table). ASTRAL genes genetically downstream of EDS1 based on expression data were connected by a dashed arrow. To establish how the EDS1 downstream ASTRAL genes contribute to known immunity processes, a further literature search was used to identify known roles of the genes or conditions in which they were differentially expressed (57,71,72). For the 7 genes whose expression was EDS1 independent, we conducted a literature search to see if they had been implicated previously in plant immunity and incorporated this data into the model (Fig 8A, S9 Table) (72,73).
For the drought response pathway, all ASTRAL genes which were known to be differentially expressed upon ABA were included (Fig 6, 7). To dissect where in the ABA drought response pathway the genes were involved, we carried out a literature search (S10 Table). Using information from literature and associated gene expression datasets (74,75), we were able to identify putative genetic positions of ASTRAL genes in ABA response; these were indicated by dashed arrows (Fig 8B). Interestingly, we were also able to identify some literature linking the pathogen responsive ASTRAL genes to ABA signaling (74–76) and the reciprocal (77–80), even though a gene was often only identified as differentially expressed in one of the two conditions. Furthermore, the models highlighted the convergence of drought and pathogen response on similar phenotypes such as senescence and stomatal changes.
Discussion
Previous studies have investigated the genome content of one or two aquatic plant species and highlighted gene loss linked to embryogenesis and root development in humped bladderwort (43), cell wall processes and ABA in duckweed (39,40), and defence response, stomata, terpenoid and hormone pathways in eelgrass (41). In this study, we investigated the convergent adaptation to an aquatic environment across four aquatic plants and identified gene losses linked to the ETI immune signalling pathway. We observed a drastic reduction of NLR immune genes together with the absence of the known signalling components. Using comparative genomics, we identified additional genes that were convergently lost among aquatic lineages and mapped them to defence and drought response pathways using differential expression analyses.
While aquatic species have low numbers of NLRs, phylogenetic analysis shows that many of the retained NLRs come from independent recent expansions of species-specific clades. Furthermore, we were able to identify the parallel loss of the EDS1/PAD4 immune complex in these species as well as other genes such as SAG101 (32,34), NRG1 (16,17), ADR1 (26) and NDR1 (51,53), which have all previously been shown to be genetically required for signalling of certain NLRs. The loss of these components is supported by the absence of TIR-1 NLRs in dicot aquatic species, which require EDS1/PAD4 and often NRG1 to cause plant cell death. We subsequently identified orthogroups of genes convergently lost in monocots and dicots with recent aquatic life history. We show that some of these genes have differential expression upon pathogen infection. Finally, we highlighted several orthogroups with similar differential expression patterns upon specific stimuli, suggesting these convergently lost genes in aquatic lineages may function in distinct pathways. It is tempting to speculate that the loss of the majority of NLR clades and the EDS1 complex in aquatic species is due to selection and rewiring of plant immunity toward a single common mechanism that facilitates increased fitness in the aquatic environment.
Duckweed, eelgrass, corkscrew plant, humped bladderwort and lotus all have the ability to survive with a substantial part of the organism submerged underwater. However, lotus does not have a reduced number of NLRs and retains both EDS1 and PAD4. One possible explanation for this is that lotus has roots in the soil despite its aquatic stem. Additionally, lotus has a substantial surface area of leaves and flowers on the water surface which may affect the pathogen load. Hence, for this study we have not considered lotus to truly be an aquatic species.
Interestingly, though the aquatic species typically have a low number of NLRs, the retained NLRs have undergone a recent rapid expansion. This observation is supported by the short branch length and highly branched nature of the NLR clades present in aquatic species. It suggests that these NLRs have a function in the aquatic plants. Nevertheless, whether or not they play a role in pathogen defence requires further work. Previous studies have shown NLRs including VARIATION IN COMPOUND TRIGGERED ROOT GROWTH RESPONSE (81) and CHILLING SENSITIVE 1 (82) can act as receptors of abiotic stress.
Some ASTRAL genes are not differentially expressed upon pathogen stimulus; however, we cannot rule out the possibility that those genes are involved in disease resistance. We have only looked at differential expression in a small subset of the possible combinations of conditions, tissues and pathogens that can result in an immune response. In addition, a gene does not necessarily need to be differentially expressed for the protein or RNA it encodes to function in mediating a defence response.
The finding that EDS1/PAD4 and SAG101 were recurrently lost together is consistent with genetic, biochemical and structural studies of EDS1/PAD4/SAG101, which show that the three proteins function in heterodimeric complexes (32). Previous studies have also shown the absence of TIR-1 NLRs in monocots despite the presence of hundreds of TIR-1 NLRs in some dicot species (29). Despite the absence of TIR-1 NLRs in terrestrial monocots, EDS1, which is known to function in TIR-1 NLR signalling, is retained (33). Recently, evidence was provided supporting the requirement of the CCRPW8 NLR NRG1 in the signalling of NLRs (16,17). We propose that all members of the CCRPW8 clade play a role in NLR signalling that involves EDS1, as we find that the CCRPW8 clade is absent in both independent groups of aquatic monocots and dicots despite its retention, along with EDS1, in all terrestrial monocots and dicots. Another signalling component, NDR1, was absent in all aquatic species and some terrestrial species despite the retention of CNLs, which commonly signal through NDR1. Interestingly, NDR1 is thought to mediate resistance by controlling fluid loss in the cell (52). This property highlights a possible intersection between the drought-responsive ASTRAL genes and immunity. Both NDR1- and CCRPW8-mediated signalling converge in triggering an increase in SA, which appears to be maintained in the aquatic species in our study (83).
Until now, discoveries of crucial components of the plant immune system have relied heavily on mutant screens and differential expression analysis. Here we have shown a complementary approach to identify potential actors in the plant immune system, which can circumvent issues of genetic redundancy by harnessing conservation and independent transitions in distantly related plant lineages. The study also has practical implications in providing new candidates for roles in disease resistance and shedding light on the important question of the downstream genetic reliance of NLRs used in agricultural crops. To slow the rate of breakdown of resistance by fast-evolving pathogen effectoromes, NLRs are often stacked. The downstream signalling components required for NLRs within a stack are rarely considered, but if they converge on a single helper or signaller, this creates strong selection for effectors that would compromise the downstream component and subsequently break the defence conferred by several NLRs at once. In addition, it is crucial to understand the conservation of downstream signalling components to facilitate the successful interspecies transfer of NLRs. This study also provides fundamental understanding towards a minimum plant immune system and in doing so reveals potential new model systems such as duckweed, a small, rapidly growing plant whose reduced ETI immune system could provide a reduced-complexity background for investigating plant immunity. Unexpectedly, this study has begun to further elucidate the complex cross-talk between the plant immune system and drought tolerance. Future studies could use the candidates identified here to further query the interconnection of the two pathways.
Materials and methods
Genomic datasets used in this study
Genomic assemblies and annotations were obtained from: Phytozome V12 (https://phytozome.jgi.doe.gov/pz/portal.html) for A. coerulea, A. comosus (v3), A. hypochondriacus (325_v1.0), A. thaliana (167_TAIR9), A. trichopoda (291_v1.0), M. guttatus (256_v2.0), O. sativa (v7), O. thomaeum, S. lycopersicum (390_v2.5), S. moellendorffii (91_v1), S. polyrhiza (v2) and Z. marina (v2.2); from COGE (https://genomevolution.org/coge/) for U. gibba (29027); from KEGG for N. nucifera (4432) and E. guineensis (TO3921); from NCBI for P. equestris (PRJNA382149); from Ash Tree Genomes (http://www.ashgenome.org/data) for F. excelsior (BATG-0.5); from Ensembl for Dioscorea rotundata (TDr96_F1_Pseudo_Chromosome_v1.0); and from the Maize genome database (https://www.maizegdb.org/) for Z. mays (AGPv4). BUSCO scores were calculated using BUSCO v1.22 to compare proteomes to the embryophyta_odb9 BUSCO lineage (42).
Annotation, alignment and phylogenetic analysis of NLRs
To annotate NLRs in plant proteomes, the MEME suite (84) based tool NLR-parser (85) was used in addition to the updated version of the NLR-ID pipeline (29) available at https://github.com/krasileva-group/plant_rgenes. Annotations were combined into a non-redundant list of putative NLRs. Where multiple transcripts were present, these were filtered to retain the longest transcript. In addition, a series of characterised NLRs were added. These are available at https://github.com/krasileva-group/Aquatic_NLR/tree/master/Reference%20NLRs. The HMMALIGN programme from the HMMER3.0 suite (86) was used to align proteins to the NB-ARC1_ARC2_prank_aln_domain_ONLY HMM of the NB-ARC domain (22). The alignment was trimmed to the NB-ARC domain region using Belvu (87), and columns and sequences with over 80% gaps were removed. The NB-ARC domain of the remaining NLRs was then manually curated in Jalview (88), allowing no more than 2 consecutive characteristic NLR motifs (Walker A, RNBS-A, Walker B, RNBS-C, GLPL, RNBS-D) to be absent. A maximum likelihood phylogenetic tree was constructed using RAXML-MPI (v.8.2.9) (89) with parameters set as: -f a -x 1123 -p 2341 -# 100 -m PROTCATJTT. Trees were visualised and annotated using iTOL and are available at http://itol.embl.de/shared/erin_baggs.
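The gap-filtering step described above can be illustrated with a minimal Python sketch. This is not the Belvu procedure used in the study; the function name `drop_gappy`, the dictionary input format and the order of operations (columns first, then sequences) are assumptions made for illustration only.

```python
def drop_gappy(alignment, max_gap_fraction=0.8, gap_chars="-."):
    """Remove columns, then sequences, whose gap fraction exceeds the threshold.

    alignment: dict mapping sequence ID -> aligned string (all the same length).
    The column-then-sequence order is an assumption, not taken from the paper.
    """
    ids = list(alignment)
    if not ids:
        return {}
    length = len(alignment[ids[0]])
    if any(len(alignment[i]) != length for i in ids):
        raise ValueError("all aligned sequences must have the same length")

    # Keep columns in which at most max_gap_fraction of the residues are gaps.
    keep = [
        col for col in range(length)
        if sum(alignment[i][col] in gap_chars for i in ids) / len(ids)
        <= max_gap_fraction
    ]
    trimmed = {i: "".join(alignment[i][c] for c in keep) for i in ids}

    # Drop sequences that are still mostly gaps over the retained columns.
    return {
        i: seq for i, seq in trimmed.items()
        if seq and sum(ch in gap_chars for ch in seq) / len(seq) <= max_gap_fraction
    }
```

With the default threshold, a column or sequence is removed only when more than 80% of its positions are gaps, matching the "over 80% gaps" criterion stated in the text.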
Ortholog identification
To identify orthologs of specific genes of interest, we used reciprocal BLAST searches using BLASTP (-max_target_seqs 1 -evalue 1e-10) (BLAST+ 2.2.28.mt) (90). If a homologous gene was not identified upon reciprocal BLAST, we used Ensembl gene trees to check for recent duplication events in the Arabidopsis lineage and to confirm our results. Results were manually inspected and filtered using a script available at the project GitHub. The presence of EDS1 in O. thomaeum was validated using RNAseq data (BioProject SRS957807) mapped onto the Oropetium V1 Bio_nano genome assembly (http://www.oropetium.org/resources) (91,92) using HISAT2 (93). BAM files were processed using SAMtools-1.7 (94) and results visualised using IGB (95). The absence of EDS1 was validated by running HMMSEARCH on the proteomes of A. thaliana, S. polyrhiza and Z. marina using the Lipase 3 domain, which is characteristic of EDS1. Proteins containing the domain were then aligned against the domain using HMMALIGN and the Pfam Lipase 3 HMM. The alignment was manually curated in Belvu before submission to RAXML as above.
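As a rough illustration of the reciprocal search logic, the following Python sketch extracts reciprocal best hits from two BLAST tabular outputs. It assumes the standard 12-column tabular format (-outfmt 6), which the text does not specify, and it is not the filtering script from the project repository; the function names are hypothetical.

```python
def best_hits(blast_lines, max_evalue=1e-10):
    """Return {query: subject}, keeping the highest bit-score hit per query.

    Assumes BLAST tabular rows (-outfmt 6), where columns 1, 2, 11 and 12
    are query ID, subject ID, e-value and bit score.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(forward_lines, reverse_lines, max_evalue=1e-10):
    """Return sorted pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward_lines, max_evalue)
    rev = best_hits(reverse_lines, max_evalue)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)
```

A query with no reciprocal partner in the output is the case that, in the workflow described above, would be followed up with gene trees and manual inspection.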
To identify orthologous gene groups, OrthoMCL (v2.0.9) (96) was used as described previously (97). Due to large computational requirements, we ran OrthoMCL separately for monocots (Z. marina, S. polyrhiza, P. equestris, Dioscorea rotundata, Elaeis guineensis, O. thomaeum, Oryza sativa and Zea mays) and dicots (Aquilegia coerulea, Nelumbo nucifera, Arabidopsis thaliana, Amaranthus hypochondriacus, Solanum lycopersicum, Fraxinus excelsior, Mimulus guttatus, G. aurea and U. gibba), both with the outgroups Selaginella moellendorffii and Amborella trichopoda. We overlaid monocot analyses with dicot orthogroups by mapping the monocot gene IDs to the Arabidopsis homologs using reciprocal BLASTP. Additionally, we applied the GeneSeqToFamily pipeline (98) on all 19 genomes. We then filtered orthogroups to identify those lost in aquatic species (S. polyrhiza, Z. marina, U. gibba, G. aurea) but retained in all terrestrial species of the same monocot/dicot clade. We cannot preclude the possibility that some gene families convergently lost in aquatic species have also been lost independently in some of the terrestrial lineages. After grouping and manual curation, gene families for which pan-species evolutionary history had been previously established were compared to gene families in our orthogroups. This curation led to the decision to mask S. moellendorffii, A. trichopoda, A. coerulea and D. rotundata from later analysis, with the former two species rarely having homologs due to large phylogenetic distance and the latter two species having many erroneous gene fusion annotations. The analyses were filtered using scripts available at https://github.com/krasileva-group/Aquatic_NLR. Grep on the cigar string output of parse_output.pl was used to identify gene families present in all terrestrial species diverging after A. trichopoda but absent in aquatic species.
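The presence/absence filter described above can be sketched in Python as a set operation. This is an illustrative reimplementation, not the repository scripts or the grep-based step; the input format (orthogroup ID mapped to the set of species with at least one member) and the function name `lost_in_aquatic` are assumptions.

```python
# Aquatic species named in the text.
AQUATIC = {"S. polyrhiza", "Z. marina", "U. gibba", "G. aurea"}


def lost_in_aquatic(orthogroups, terrestrial, aquatic=AQUATIC):
    """Return sorted IDs of orthogroups absent from all aquatic species
    but present in every species of the given terrestrial set.

    orthogroups: dict mapping orthogroup ID -> set of species with >= 1 member.
    terrestrial: set of terrestrial species from one clade (monocots or dicots).
    """
    return sorted(
        og for og, species in orthogroups.items()
        if terrestrial <= species and not (species & aquatic)
    )
```

Because the filter requires retention in every terrestrial species supplied, a family that was also lost in one terrestrial lineage would be excluded, which is the limitation noted in the text. Masking a species, as was done for the four species listed above, corresponds to leaving it out of the `terrestrial` set.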
For comparison of OrthoMCL gene families retained between monocots and dicots, we used Arabidopsis and rice proteins as representative genome members and cross-referenced them using reciprocal BLASTP searches (e-value cutoff 1e-10). The validity of this approach was checked on random protein families using Plant Ensembl trees (http://plants.ensembl.org/index.html). The rice gene IDs were converted between Phytozome and Plant Ensembl using the Plant Ensembl conversion tool (http://rapdb.dna.affrc.go.jp/tools/converter/run).
Expression profiling
The expression analysis for this study was performed and visualised using the 706 rice mRNA samples and 2,836 Affymetrix rice genome array samples available on Genevestigator v7.0.3 (https://genevestigator.com). For Arabidopsis, the 1,031 Arabidopsis RNA samples and 10,615 Affymetrix Arabidopsis array samples available on Genevestigator v7.0.3 (https://genevestigator.com) were used. The datasets used to dissect drought and pathogen infection gene expression across the 44 A. thaliana ASTRAL, monocot and dicot overlapping genes were as follows: RNAseq - pathogen - AT-00744 (56), AT-00736 (57); microarray - drought - AT-00419 (55), AT-00626 (54). The RNAseq datasets OS-00143 (62) and OS-00140 (61) were used to assay drought-induced differential expression among ASTRAL genes. Gene expression experiments OS-00139 (63) (RNAseq), OS-0045 (64) (microarray) and OS-0082 (65) (microarray) were used to analyse the differential expression of rice ASTRAL genes under pathogen infection.
Lists of differentially expressed (DE) genes were produced in Genevestigator using the parameter FDR = 0.05 (S1 Data). A Fisher's exact test was performed in R (data <- matrix(c(DE ASTRAL, non-ASTRAL DE, non-DE ASTRAL genes, remaining genes), nrow = 2); fisher.test(data, alternative = "greater")). Hierarchical clustering was generated considering both genes and conditions with parameters of Euclidean distance and optimal leaf-ordering.
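For readers who want to see what the one-sided test computes, the following Python sketch evaluates the same hypergeometric tail probability using only the standard library. It is an illustration rather than the R call used in the study; the function name `fisher_greater` and the argument order are assumptions. Here a = DE ASTRAL genes, b = non-DE ASTRAL genes, c = non-ASTRAL DE genes and d = remaining genes.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with row and column totals fixed, of a top-left
    count at least as large as the observed a (alternative: odds ratio > 1).
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    denom = comb(total, col1)
    # Sum hypergeometric probabilities over tables at least as extreme as observed.
    numer = sum(
        comb(row1, k) * comb(total - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numer / denom
```

A small p-value indicates that ASTRAL genes are over-represented among the differentially expressed genes relative to the rest of the genome. The odds ratio is unchanged by transposing the table, so the column-wise fill used by R's matrix() gives the same one-sided result.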
Contributions
ELB, WH and KVK designed the study. ELB performed the NLR and phylogenetic analyses. AST ran the GeneSeqToFamily pipeline. ELB analysed orthogroups from OrthoMCL and GeneSeqToFamily to identify ASTRAL candidates. RNAseq analysis of ASTRAL genes was performed by ELB, and statistical tests by ELB and KVK. The model of gene interaction was developed by ELB. All authors contributed to, read and approved the final manuscript.
Acknowledgements
The authors are grateful to all members of the Krasileva group and their many colleagues, for thoughtful discussion on the presented material. We thank Daniil Prigozhin and Janina Tamborski for suggestions on the manuscript. The high-performance computing resources and services used in this work were supported by the EI Scientific Computing group alongside the NBIP Computing infrastructure for Science (CiS) group.