The plant circadian clock gene LHY influences Medicago truncatula nodulation

Legumes house nitrogen-fixing endosymbiotic rhizobia in specialized polyploid cells within root nodules, which are factories of metabolic activity. We discovered that the circadian clock-associated transcriptional factor LATE ELONGATED HYPOCOTYL (LHY) affects nodulation in Medicago truncatula. By carrying out expression analysis of transcripts over time in nodules we found that the clock enables coordinated control of metabolic and regulatory processes linked to nitrogen fixation. Rhythmic transcripts in root nodules include a subset of Nodule-specific Cysteine Rich peptides (NCRs) that have the LHY-bound conserved Evening Element in their promoters. Until now, studies have suggested that NCRs act to regulate bacteroid differentiation and keep the rhizobial population in check. However, these conclusions came from the study of a few members of this very large gene family that has complex diversified spatio-temporal expression. We suggest that rhythmic expression of NCRs may be important for temporal coordination of bacterial activity with the rhythms of the plant host, in order to ensure optimal symbiosis. Highlights The circadian clock-associated transcriptional factor LATE ELONGATED HYPOCOTYL (LHY) impacts on successful Medicago truncatula-rhizobial symbiosis The plant clock coordinates rhythmic patterns of metabolic and regulatory activity in nodules and drives rhythmic expression of a subset of Nodule-specific Cysteine Rich (NCR) genes. Rhythmic expression of NCRs may be important for temporal coordination of bacterial activity with plant host rhythms to ensure optimal symbiosis.


Introduction
In plants, animals and microbes, many aspects of physiology, metabolism and development exhibit 24-hour rhythmicity controlled by a circadian clock. Under natural day-night conditions, the circadian clock is synchronized to light-dark and temperature cycles and enables anticipation of predictable daily changes in the environment. Rhythmicity is particularly pervasive in plants. In the model plant Arabidopsis thaliana, about 30% of genes are expressed rhythmically in constant light, and up to 90% under at least some cycling environmental conditions (1). Many aspects of metabolism are rhythmic, including photosynthetic carbon assimilation, nitrogen and sulphur metabolism (2), and appropriate timing of starch utilization is known to ensure optimal growth (3). The circadian clock also impacts plant productivity and health by modulating interactions with microorganisms. Plants show different levels of resistance to fungal and bacterial pathogens depending on the time of infection (4)(5)(6)(7), and bacterial infections have been found to alter plant circadian regulation in order to attenuate immune responses (8). Plant circadian rhythms also influence the composition of rhizosphere microbial communities, and impeded circadian clock function in the plant host results in the recruitment of a different root microbiome, with consequences for plant health (9,10).
The mechanism of the plant circadian clock has been studied extensively and shown to consist of a small gene network comprising multiple transcriptional feedback loops (11). In A. thaliana, a pair of closely related MYB transcription factors, Late Elongated Hypocotyl (LHY) and Circadian Clock Associated (CCA1), are expressed in the morning and act to repress the expression of other clock components, by binding to a DNA sequence motif in their promoters known as the Evening Element or EE (AAATATCT/AGATATTT) (2). As LHY/CCA1 expression declines, a set of pseudo-response regulators (PRR9, PRR7, PRR5 and PRR1, also known as TOC1) are expressed as sequential waves during the day and early evening, and act to repress expression of LHY and CCA1 till the following dawn. A third set of proteins, composed of LUX ARRYTHMO (LUX), EARLY FLOWERING (ELF3) and ELF4, is expressed at dusk and forms an 'Evening Complex'. There is evidence that a similar mechanism operates in roots, although whereas the leaf clock is primarily synchronized to diurnal light-dark cycles, the root clock is thought to be entrained by shoot-derived signals (12)(13)(14). Clock components are conserved in both monocot and dicot crops, and have been linked to important agronomic traits including growth and flowering time (15). Homologues of A. thaliana clock genes have been identified in most legumes including soybean (Glycine max) cow pea (Vigna unguiculata) and garden pea (Pisum sativum) (16)(17)(18)(19). While LHY and CCA1 have largely redundant function in the central oscillator of A. thaliana, a single orthologue of these proteins is present in Medicago truncatula, termed MtLHY (20).
Altered function of the soybean circadian clock through overexpression of a light-signalling component has been seen to lead to grain yield increases (21). However, there is a lack of information about the impact of the circadian clock on legume symbioses with nitrogen-fixing rhizobia. This is important, because this symbiosis contributes to the nitrogen nutrition of the plant which increases plant growth, reducing the need for synthetic nitrogen fertilisers while also improving soil health. During nodulation, rhizobia are accommodated in specialized lateral root organs called nodules. Formation of nodules is initiated following recognition, by host plant LysM receptors, of Nod factors (NF) released from rhizobial bacteria. This leads to activation of calcium oscillations, then transcriptional responses that enable controlled cell division for nodule formation, and rhizobial entry via an infection thread. Within nodules, rhizobia inhabit an intracellular compartment derived from host cell membranes, called the symbiosome. They proliferate and differentiate into nitrogen-fixing bacteroids, which convert atmospheric di-nitrogen into a plant-accessible form such as ammonium that the host plant will incorporate into its own nitrogen metabolism. In exchange for the fixed nitrogen, the bacteria benefits from host-supplied carbon and other nutrients (22,23). The evolution of nodulation in legumes has been greatly shaped by a whole genome duplication event approximately 58 million years ago (MYA), resulting in amplified, rearranged gene families and retention of paralogous genes (24). Prominent amongst these is the Nodule Cysteine-Rich (NCR) gene family of small secreted peptides that are highly specific to nodules (25). Except for some Aeschynomene species from the relatively ancient dalbergoid lineage, NCRs are exclusively found in the Inverted Repeat-Lacking Clade (IRLC) of legumes which includes the model plant M. truncatula and many agriculturally important crops such as alfalfa, clovers, lentils, chickpea, garden pea and fava beans (26). Only a few NCRs have been characterized in detail so far, but a picture is emerging of the importance of functional diversity for this gene family (25). The diverse spatio-temporal expression profiles of NCRs (27)(28)(29), high level of expression specificity across nodules (30,31), and variation in amino acid sequence and isoelectric points (32), could enable this functional variation. Recent findings show that a subset of NCRs is regulated by nitrogen availability and by autoregulation of nodulation, suggesting an additional role for NCRs in controlling nodule development depending on cues from the environment (33).
Here we show that in M. truncatula nodules, disrupted circadian rhythmicity through loss of function of the core clock gene LHY results in reduced nodulation, suggesting that the circadian clock may impact on plant-rhizobia interactions in nodules. We investigate potential mechanisms through analysis of the rhythmic transcriptome in nodules and reveal circadian control of a subset of NCR genes through EE motifs in their promoters. We suggest that circadian regulation of NCR gene expression in nodules may play a role to ensure temporal coordination of bacterial activity with the rhythms of the plant host. Optimising the timing of nodule-specific nitrogen fixationregulatory peptides may allow improvement of nitrogen fixation without altering any aboveground clock features. This may represent an interesting target for sustainable agriculture of legume crops.

Loss of LHY function disrupts circadian rhythms and impairs nodulation in M. truncatula
In A. thaliana, LHY and CCA1 function as transcriptional repressors (34,35) and have a largely redundant function in the central oscillator of A. thaliana. In M. truncatula a single orthologue of these proteins is present (Medtr7g118330) that shares 36.6% identity at the amino acid level to AtCCA1 (At2g46830) and 44.2% identity at the amino acid level to AtLHY (At1g01060) (Fig. 1A), thus we call it MtLHY. In order to test the importance of MtLHY in the regulation of nodulation, we isolated and characterized two mutants with reduced MtLHY expression from the Noble collection of retrotransposon insertion (Tnt1) lines (Fig. 1B). The lhy-1 mutant contained an insertion upstream of the translational start site, likely in the promoter of LHY, and exhibited strongly reduced expression of the LHY transcript (20% of WT levels; Fig. 1C; Table S1). The second mutant, lhy-2, had an insertion at the end of the second-last exon and negligible expression of the LHY transcript ( Fig. 1C; Table S1).
We examined the rhythmicity of our lhy mutants by measuring the rhythmic opening and closing of the first true leaf. Plants were grown for 10 days under 12L12D cycles then imaged over 7 days in constant light. The wild-type R108 showed sustained rhythmicity for over a week in constant light, with a period length of approximately 30 hours. In contrast, both mutant alleles exhibited leaf movement rhythms with shorter free-running periods (approximately 26 hours) and a much lower amplitude, then became arrhythmic after 120 hours ( Fig. 1D-F). Leaf opening in the mutants occurred 6 hours early in the first day following transfer to constant light, indicating that both mutations resulted in a large phase advance in constant light ( Fig. 1E-F). These results demonstrated that loss of LHY function alters the function of the clock in M. truncatula.
In order to assess the effect of lhy mutations on nodulation, we inoculated three biological replicates of lhy-1, lhy-2 and R108 seedlings with the high-efficiency rhizobial symbiont Sinorhizobium meliloti WSM1022. When grown under 16L8D, both mutants had lower nodule weight and lower dry shoot weight than the wild-type ( Fig. 2A-C ; Table S1). Interestingly, our data shows that lhy yield is similar to WT when not inoculated with rhizobia, but reduced when plants are inoculated with rhizobia, suggesting that the reduced yield in the mutant is largely due to disrupted nodulation ( Fig. 2A-C). These observations indicate that normal function of LHY is required for optimal nodulation and plant yield.   Table S1 for all values and analyses.

Gene expression within nodules shows the presence of a functional nodule clock
We then asked what processes involved in nodule function might be under circadian regulation by carrying out a timecourse analysis of the rhythmic transcriptome. Plants were grown under diurnal light-dark cycles (12L12D) for 40 days in order to entrain their circadian clocks, then transferred to constant light to test for persistence of rhythms in the absence of environmental time cues. Nodules were sampled every 3 hrs for the first 24-hours in constant light, then every 6 hrs for another 24 hours (Fig. 3A), and changes in gene expression were analyzed by RNA-seq (Table S2). After normalization and mapping of reads to the M. truncatula genome (4.0v1) we were able to determine expression levels for 61,510 transcripts out of the 62,319 protein-coding transcripts in the genome (98.7%). We then used Metacycle analysis (36) to identify transcripts that oscillate with a period of approximately 24-hours. This identified 2,832 transcripts with rhythmic behaviour in constant light (P<0.05) (Fig. 3B), i.e about 5 % of the transcriptome. Hierarchical clustering identified four broad groups of genes ( Fig. 3C-F). The majority of rhythmic genes peaked around dawn and were found in clusters 1 and 4, peaking 24 and 21 hours after dawn, respectively. These morning and late-night clusters contained 811 and 1,458 transcripts, respectively, whereas the morning and evening clusters (clusters 2 and 3) contained only 242 and 321 transcripts. This contrasted with previous observations in A. thaliana whole plants, that the majority of cycling genes peak either before dawn or dusk (1).
We first examined the expression patterns of clock-associated genes in nodules; Fig. 3C-F; Table S3. Putative orthologs of all A. thaliana circadian clock genes are present within the M. truncatula genome (Table S3). LHY was identified in cluster 1 and peaked at dawn. Q-PCR analyses confirmed this observation, and showed that temporal patterns of LHY expression were similar in roots and nodules, but preceded expression in shoots by several hours in constant light (Fig. S1). PRR5 homologues were expressed in consecutive waves, with PRR5c and PRR5a peaking 3 and 9 hours after dawn, and PRR7a and TOC1 peaking at dusk (Fig. S2). Evening complex genes LUX, ELF3 and ELF4 were expressed at dusk, whereas GI, which plays a role in light-dependent turn-over of the TOC1 protein, was expressed in the late afternoon (Fig. S2). The temporal expression patterns of these clock-associated genes were consistent with those in leaves of A. thaliana (1,37), rice and poplar (38), and the legume soybean (39), suggesting that circadian clock mechanisms are largely conserved between these plant species, and between root nodules and plant leaves.

Rhythmic coordination of nodule metabolism
Having determined that circadian clock components cycle in nodules, we asked which other aspects of nodule function might be under circadian regulation, through analysis of gene descriptions (obtained from Phytomine in Phytozome) as well as GO term, biological pathway and protein domain enrichment (summarized in Fig. 3G-H). As previously described in A. thaliana (2), genes associated with the isoflavonoid pathway were expressed around dawn, including the ratelimiting enzyme phenylalanine ammonia-lyase (PAL). The morning cluster (Cluster 2) was enriched for genes involved in elongation growth, including the transcription factor Phytochrome Interacting Factor 4 (PIF4, Medtr3g449770), which promotes auxin biosynthesis in A. thaliana (40), eight Walls Are Thin1 (WAT)1-related genes encoding glycoside hydrolases, as well as nodulin genes; all related to cell wall biosynthesis and flavonoid biosynthesis (41) (Fig. 3D). This was consistent with the temporal expression pattern of transcripts associated with cell wall synthesis, both in monocots such as rice and dicots such as poplar (38). The morning cluster also contained 4 genes related to light-harvesting complex I chlorophyll a/b binding proteins, which are associated with plastid function in roots and beneficial plant-microbe interactions (42).  Table S2. (C-F) There are four clusters whose expression peaks at: dawn (red, C), morning (yellow, D), evening (blue, E), late night (green, F); overrepresented GO terms, pathways or protein domains and the clock-related genes (see Table S3) within each cluster are shown. (G) Hierarchical clustering representing correlation among significantly enriched (top 20) biological processes in rhythmic transcripts. Enrichment P-values (Bonferronicorrected for multiple testing) are shown for each biological process and represented by blue dots, where larger dots are more highly significant; see Table S4. (H) Summary of changes over a 24h period (12L12D) in nodules.
The evening cluster (Cluster 3) contained genes encoding enzymes associated with starch degradation (Fig. 3E). This is consistent with the pattern of starch accumulation and degradation in A. thaliana, where starch accumulates during the light period, then is utilized to support growth during the night (3). Three genes related to starch degradation were also expressed in the latenight cluster (Cluster 4, Fig. 3F), indicating continuous starch degradation during the night period, which may be important to provide simple carbon compounds to the rhizobial symbionts.
Late-night and early morning (dawn) processes were largely related to nitrogen metabolism. Both Cluster 4 and 1 contain many genes that are associated with glutamate metabolism, amino acids and nitrate/nitrite transport and Cluster 1 contains glutamine synthetase (Medtr3g065250), which catalyses the first step of nitrogen assimilation (Fig. 3F, Table S4). Furthermore, the ureide biosynthesis pathway was enriched in Cluster 4. Ureides are the main long-distance transport forms of nitrogen from nodules to the shoot and are moved up the xylem vessels to the leaf tissue where then they are used as a nitrogen source (43). Although indeterminate nodule-forming legumes including M. truncatula have been classified as amidetype (rather than ureide type), detection of ureide pathway related genes suggests that this part of the metabolism still occurs in these legumes as a response to nitrogen fixation (44). Genes related to phosphate metabolic processes were also enriched in late-night Cluster 4, which may play a critical role to meet the high Pi requirement for nitrogen fixation during the late night (Fig.  3F). Genes related to anion transmembrane transport, over-represented in Cluster 1 (Fig. 3C), may enable movement of compounds formed during the night.
Together, these results indicate that key aspects of nodule function, including starch utlization, nitrogen assimilation and nitrogen transport, occur rhythmically under the control of the circadian clock, and suggest that appropriate temporal coordination of these processes may be important for optimal nodule function.

Circadian regulation of NCR gene expression in nodules via the Evening Element
We then looked for genes that might play a role to coordinate rhizobial activity with metabolic rhythms in nodules, and found that a large number (45) of NCR transcripts were rhythmic ( Fig.  4, Table S5). Rhythmic expression of this group of peptides, known to affect rhizobial activities (25), may be part of the mechanism by which the plant circadian clock impacts on nodulation. The distribution of rhythmic NCRs per cluster showed that there were 2 NCRs in the dawn cluster 1 (0.25%), 5 NCRs in the morning cluster 2 (2%), 24 NCRs in the evening cluster 3 (7.5%) and 14 NCRs in the late-night cluster 4 (1%). These results show an enrichment of NCRs in the evening cluster, compared to a random distribution across all clusters (Fig. 4C Table S2 for all data values. These observations led us to investigate how the clock might control NCR gene expression in nodules. We carried out a de novo motif analysis in promoters of NCRs with rhythmic expression (Fig. 5A). This identified 3 over-represented motifs within 500 bp upstream of the translational start sites and within 200 bp of the TATA box (Fig. S3). These 12bp-motifs mapped into longer stretches of conserved sequences identified in a previous study (31). Fig. 5A-B) was closely related to the Evening Element (EE) AGATATTT, which in A. thaliana is bound by the circadian clock-associated proteins CCA1 and LHY (2,34,35,45). Consistent with regulation by these dawn-specific transcriptional repressors, more than half of the rhythmic NCRs were enriched in the evening cluster. To obtain further evidence that the EE motif drives rhythmic expression of NCRs, we used the FIMO tool (46) to assess the presence of the motif in the promoters of rhythmic vs non-rhythmic NCRs. This showed that 29% (13) of NCR gene promoters that cycled in our experiment contained the canonical EE motif, AGATATTT, compared to 8% of a random set of non-rhythmic NCR genes (P=0.0013 for enrichment) (Fig. 5C, Table S6). We then asked if the EE might also be found in non-rhythmic members of the gene family. Out of the 700 NCR genes, compiled from data from (47) and (48) ( Table S5) we identified an EE in the promoters of 75 genes (~11%), compared to 4% in a randomized set of 700 non-NCR gene promoters ( Fig. 5D; P=0.0047 for enrichment; Table S6). Almost all promoters that contain the EE had a single occurrence of the motif, with just three promoters having 2 occurrences (Table S5). 63 NCRs contained the EE but did not oscillate in our experiment (Table S5). This is consistent with previous observations that a large proportion of CCA1 and LHY regulatory targets do not exhibit rhythmic expression under any observed condition (34,35). Expression of these NCRs may cycle at other stages of nodule development or in specific nodule cell types.
The alternative version of the EE motif, AGACATTT (EE-related, or 'EER' motif) was only found in 6 rhythmic NCRs (Fig. 5C). To ask whether the EER motif was capable of driving rhythmic gene expression, we searched for this motif in the promoters of the 2,832 cycling transcripts described above (represented by 2,703 genes). Whereas the EE motif was associated with evening-specific gene expression, rhythmic genes containing the EER motif were expressed at all phases of the day-night cycle (Fig. 5D), suggesting that the EER motif might not play a role in circadian regulation of gene expression. On the other hand, the EER motif was found in about 23% of NCR promoters, compared to 2.4% in a control gene set (2.5%, P=3.8E-5 for enrichment) suggesting a potential role in nodule-specific gene expression (Fig. 5C). Only 13 NCRs had both EE and EER motifs, supporting the notion that there are distinct functional categories of NCRs (Table S5). Across all NCRs, presence of the EE or of the EER motif was fairly evenly distributed across the phylogeny, suggesting that they did not arise as part of a single lineage-specific expansion event (Fig. S4).
Nodules are comprised of five zones that each have a different role in nodulation, which are the meristem (furthest from the root), the infection zone, the interzone, the nitrogen fixation zone and the senescence zone (closest the root). We asked if the oscillating NCRs are expressed in any of these particular zones by querying data published in a study that determined the expression locations of 521 NCRs out of our list of 701 NCR genes (29). NCR expression is abundant in the interzone (53%), but the interzone was also enriched (P=0.039; chi-squared test) for cycling NCRs whose promoter contains an EE, with 77% of these rhythmic NCRs being localized there. Therefore this region of the nodule, site where bacteroids become fully elongated and start to express N-fixation genes along with the expression of some key late nodulins by host cells (49), seems to be particularly enriched for LHY-controlled NCRs. In contrast, NCRs containing the EER motif were not enriched in the interzone (P=0.742; chi-squared test). These observations suggest that rhythmic expression of NCRs under the control of the EE may be particularly important at the stage where bacterial symbionts are differentiating into bacteroids.  Table S6 for all motif enrichment statistical analysis. (D) Presence of the EE and the EER in the % of transcript promoters within each cluster; see Table  S6 for all motif enrichment statistical analysis.

Discussion
Many aspects of physiology, metabolism and development exhibit circadian regulation across plants, animals and some microbes. Thus, the clock often influences the outcome of interactions between organisms (8). In plants, the oscillator mechanism of the clock has been investigated at length in shoots (reviewed in (11)). In roots, it is known to be entrained by shoot-derived signals (14). Despite the agricultural importance of the beneficial legume-microbe interaction of nodulation, very little is known about the impact of the plant clock on this nitrogen-fixing symbiosis. In common bean (Phaseolus vulgaris), changes in expression levels of clock-associated genes were detected in the early stages of symbiosis, suggesting that the function of the root circadian clock was adjusted in response to infection by rhizobial strains (50). Here we show that loss-offunction of LHY, one of the core clock genes, results in reduced nodulation. Plant yield was reduced in lhy mutants that were inoculated with rhizobium, but not in un-inoculated plants, suggesting that reduced biomass was caused by disrupted nitrogen fixation. These observations were consistent with a recent report showing that reduced function of MtLHY was associated with reduced nodulation (51). While it remains possible that LHY impairs nodulation through a development effect, these results suggest that appropriate timing of circadian rhythms in plant nodules may be important for optimal symbiotic interactions with rhizobium.
We show that the temporal pattern of expression of clock-associated genes in Medicago truncatula nodules is consistent with that observed in other plant species and in other organs, suggesting that the molecular mechanism of the central oscillator is largely conserved. However, while LHY and CCA1 are closely related and have largely redundant functions in A. thaliana, the M. truncatula genome contains a single orthologue of these proteins. Whereas loss of function of both genes was required to disrupt free-running rhythmicity in A. thaliana (52), loss of function of MtLHY led to short period rhythms of leaf movements and gradual arrhythmia in constant light. The LHY binding site, also known as Evening Element or EE, was over-represented in the promoters of M. truncatula nodule genes that peaked in expression in the evening. This was consistent with a role for the cognate transcription factor, MtLHY in driving rhythmic gene expression. These findings suggest that MtLHY plays a similar role to its A. thaliana orthologue as a core component of the nodule central oscillator.
In order to obtain clues to the mechanisms by which the circadian clock might affect nodulation, we went on to identify rhythmic processes in nodules. About 5% of the genome showed rhythmic expression in constant light, which was much smaller than the fraction reported for whole A. thaliana plants, about 15% (1, 2, 53). As no other circadian transcriptome data are available for plant roots or for M. truncatula, it is unclear whether this reflects a species difference or a root vs shoot difference. Expression of genes associated with starch degradation was observed in the evening, as previously described in A. thaliana (Harmer et al., 2000). This was followed by expression of genes associated with ureide biosynthesis in the late subjective night. Ureides are the main long-distance transport forms of organic nitrogen in legumes, and their production in the late subjective night suggests that nitrogen-fixation in symbiosomes occurs during the night, deriving its energy from starch hydrolysis. In support of this hypothesis, we also find transmembrane amino acid transporters in the late-night/dawn gene clusters.
Genes associated with isoflavonoid biosynthesis peaked around dawn, as previously described in other plants such as Ginkgo (54) and A. thaliana ( (2)). Isoflavonoids are polycyclic compounds that belong to the wider group of phytoalexins that are synthesized by many plants and many have antimicrobial activities. In legumes, a wide range of isoflavonoid compounds have been described, with the composition mix being different depending on the species (55). Some of these isoflavonoids actually initiate the plant-symbiont molecular dialogue that leads to nodule formation, by inducing the expression of nod genes in rhizobia (56). The clock is known to regulate plant defence responses, and plants are typically more resistant to pathogen attacks at dawn (4, 6, 57). Production of flavonoids at dawn may contribute to this gating mechanism, to control entry of microbes into plant roots while attracting rhizobial symbionts. However, flavonoids are thought to have a role beyond initial recruitment of rhizobia, since they are mostly produced in the nodule infection zone, where bacteroids become fully elongated and start to express N-fixation genes (58). In mature nodules, isoflavonoids have been suggested to play a role in maintaining a homogeneous rhizobial population (59). Since expression of genes associated with cell wall biosynthesis peaks in the morning, rhythmic production of flavonoids could act to coordinate nodule cell expansion with bacteroid proliferation at dawn.
Our transcriptomic analysis also revealed the rhythmic expression of a subset of NCRs, with the majority peaking in the evening. This large family of peptides is thought to control bacterial differentiation within the nodule, but there is evidence for functional differentiation of NCRs, as different NCRs can have either pro-symbiotic or anti-symbiotic properties (60, 61) and bacterial elongation and activity in nodules can vary depending on the particular suite of NCRs present in the plant host (47). The observation that a subset of NCRs is expressed rhythmically in nodules suggests a function to synchronize bacterial activity with the rhythms of the plant host and provides further evidence for functional differentiation of this group of peptides. Previous studies of NCR promoters identified long stretches of conserved sequence which included putative regulatory motifs such as an ID1 binding site, an Auxin Response Factor (ARF) binding site, a DOF protein binding site, and MADS transcription factor binding sites (31). Here we show that the EE motif is over-represented within NCR promoters, suggesting direct repression by the MtLHY clock protein in the morning. This explains the temporal expression pattern of the majority of NCRs, peaking in the evening in cluster 3. Some NCRs peaking earlier or later in clusters 2 and 4 that contained EEs in their promoters are also likely to be regulated by MtLHY in combination with other rhythmic transcription factors.
A related motif named EE-related or EER was also identified, which was not associated with expression at specific times of the day but was over-represented in NCR promoters, and thus may be associated with nodule-specific gene expression. This motif was also present within one of the stretches of conserved sequence previously identified in NCR promoters (31). The EE sequence (AGATATTT) lies at the same position within this conserved sequence, but was not uncovered in that previous research, likely due to the EER sequence variant (AGACATTT) being present at a high frequency. The presence of a cytosine in the EER motif is interesting because it could be associated with epigenetic regulation of expression (62).
The coordination of nodule growth with bacterial differentiation and nitrogen fixation in indeterminate legume nodules is a well-orchestrated process. Our results suggest that rhythmic expression of NCR peptides under the control of the plant circadian clock plays a vital role in the establishment of successful symbiotic interactions. Many crops have lost their photoperiodic responses as part of domestication, because this was essential for cultivation at a broad range of latitudes, and in many cases, this happened through disruption of the circadian clock (63). For example, clock components have been modified during the soybean domestication process (reviewed in (64). It is therefore crucial to understand how the clock affects the host-symbiont interaction so we can avoid breeding against the efficiency of the nitrogen fixation process. Moreover, the possibility of modulating specific downstream pathways such as rhythmic NCR peptides may enable optimization of nodulation whilst avoiding undesirable plant clock related side-effects. By identifying a mechanism that links control of plant growth and development with that of its symbiotic partner, our work opens up a new field of investigation for understanding how the rhizobial activity is regulated by the plant.

Plant materials and growth conditions
Medicago truncatula wild-type accession A17 in the Jemalong background was obtained from the IGER seed bank (Aberystwyth, http://www.igergru.ibers.aber.ac.uk). Tnt1 M. truncatula mutant lines for LHY (Medtr7g118330) in the R108 background were identified from the Noble Research Institute ((https://medicago-mutant.noble.org/mutant/database.php) (65). Lines were identified by querying R108 sequences of the M. truncatula LHY coding region plus 200 bp upstream and downstream using a blastn search with default parameter settings (E-value cut-off 10 -6 ). Tnt1 lines were selected based on their E-values and % identity > 95. Lines NF17115 (lhy-1) and NF16461 (lhy-2) were identified with insertions in the promoter region and fifth exon respectively (Fig. 1B).
Seeds were scarified with concentrated H2SO4 (~2-3 mL per 100 seeds) then thoroughly washed 3-5 times with sterile water. Scarified seeds were sterilized by treating with 7% sodium hypochlorite solution for 5 minutes with gentle shaking followed by 7 times repeated washes with sterile water. Seeds were sown on 1.5% phyto-agar plates, sealed using 3M Micropore TM tape, wrapped in foil then left in at 4˚C for 72 hrs. Plates were then placed in a Sanyo MLR-352 growth chamber (25˚C) for 4 days before seedlings with a radicle length of >2 cm were transferred to

lhy mutant characterization and rhythmic leaf movement assays
For phenotypic analysis, plants were grown under 12L12D for 5 weeks, then removed from pots and photographed before measuring shoot and nodule weights. For rhythmic leaf movement (RLM) assays, plants were grown under 12L12D for 10 days before transferring to constant light for imaging from above using timelapse cameras (Brinno). Opening and closing of the first true leaf was monitored by measuring changes in visible leaf area using ImageJ software. Greyscale images were thresholded and converted to binary with leaves showing white on a dark background. White pixels were then quantified over time in regions of interest using the Integrated Density tool. The experiment was repeated twice, then data from both biological replicates was combined. Baseline detrending was applied to the data and periodicity for the remaining samples was analyzed using FFT-NLLS in BioDare2 (https://biodare2.ed.ac.uk (66)).

Rhizobial culture preparation and seedling inoculation for timecourse analysis
Sinorhizobium meliloti strain WSM1022 was grown on TY/Ca 2+ plates (5g/L tryptone, 3g/L yeast extract, 6 mM CaCl2·2H2O, pH adjusted to 6.8-7.0) at 28˚C for 2 days. From the solid culture, S. meliloti was spot-inoculated into 10 mL liquid TY/Ca 2+ medium and grown for ~24 hours with gentle shaking at 28˚C. Rhizobial cells were harvested by centrifugation at 3200g for 10 minutes, washed twice with sterile water, then resuspended in sterile water to an OD600=0.05. 250 μL of freshly prepared rhizobial solution was used to inoculate each M. truncatula seedling the day after potting by pipetting onto the vermiculite layer in close proximity to the plants.

Sampling plants for transcriptomic analysis
For RNAseq/qPCR timecourse analysis, after 40 days in 12L12D, pots were transferred to constant light conditions at the same irradiance. At 0hr, and then every 3hr up to 48hr, plants were removed from pots, samples were pooled from 6-7 plants for one biological repetition, immediately flash frozen and stored at -80 °C. Nodules were picked from roots using tweezers, part of each root system without nodules was collected, and leaves were collected as 4-5 trifoliates. Samples for qPCR analysis of LHY expression in mutants vs. R108 (Fig. 1B) were taken at 7:30 (morning) and 15:30 (evening) into the light cycle. Experiments were carried out three independent times.

RNA extraction, RNAseq and quantitative PCR (qPCR) analysis
Frozen plant tissue samples were finely ground using a mortar and pestle, then around 100mg of each powdered sample was used for total RNA extraction then gDNA removal, using the Monarch® Total RNA Miniprep Kit. The quantity (>100 ng/μl) and quality (RNA integrity > 8.5) of RNA was determined using a Bioanalyzer 2100 RNA 6000 Pico Total RNA Kit (Agilent Technologies). Samples containing >5 μg of RNA in total were used for RNAseq. mRNA library preparation, quality assessment and sequencing (150bp, unstranded, paired-end) were carried out by Novogene; mRNA libraries were prepared following the Illumina TruSeqTM RNA library preparation protocol, after rRNA had been removed using the Ribo-Zero kit.
For qPCR analysis, cDNA was prepared using the ProtoScript II First Strand cDNA Synthesis Kit from New England Biolabs (UK) Ltd. qPCR was performed with 20 µl reaction volumes using SYBR Green JumpStart Taq ReadyMix (Sigma-Aldrich) and 40 two-step amplification cycles (95°C and 60°C for 30s and 60s, respectively) in a 96-well Agilent Mx3005P real time PCR machine. Primer pairs designed based on the borders of the 2 nd and 3 rd exons (Fp: CACAAAACAAAGAGAACGATGG, Rp: ATGGCTCCTGATTTGCACAG) were, used for the quantification of LHY expression, normalized against the reference gene Mtβ-Tubulin Medtr7g089120 (Fp: TTTGCTCCTCTTACATCCCGTG, Rp: GCAGCACACATCATGTTTTTGG). Data was analyzed using the ΔCt method, a derivation of the ΔΔCt method (69).

Statistical analysis of transcriptomic levels
Raw sequence data in the form of a pair of .fq.gz files with sequencing depth of at least 20 million reads per sample were processed using tools on the Galaxy EU server (usegalaxy.eu). First, the quality of raw sequencing data was analyzed by FastQC (70). Replicate 2 for 21hr and replicate 3 for 15hr were found to have poor quality data and were removed from analysis. Contaminating adapter sequences and poor-quality sequences were removed by Trimmomatic v36.4 (71) with the following settings: slidingwindow:4:20 and minlen:40 and using phred33 quality scores. Next, these clean, trimmed and paired reads were used to generate raw transcript read-counts and transcripts per million (TPM) normalized read-counts using Salmon quant v0.14.1 (72) with M. truncatula reference transcript sequences (Mt4.0v1) downloaded from the Phytozome database (phytozome.jgi.doe.gov). Expression data for a total of 61,510 transcripts was generated. Read counts were further normalized as log2 transcripts per million (logTPM). To identify the diurnally oscillating transcripts, logTPM expression data were analyzed using the R package MetaCycle v1.2.0 (73) with the following settings: minper: 20, maxper: 28, cycMethod: LS (Lomb-Scargle).
Hierarchical clustering using the total within-cluster sum of square (elbow method) was performed in R using 1-Pearson's correlation coefficient as a dissimilarity distance measure between normalized (mean centred and scaled by SD), oscillating genes. Enrichment analysis for processes was performed with Bonferroni method of correction (P-values<0.05) detailed in Table  S4.

Promoter motif presence and structure analysis
Promoter sequences of M. truncatula (Mt4.0v1) genes were retrieved from the M. truncatula genome database (http://www.medicagogenome.org). Promoter motif analysis was carried using the MEME suite (74) and de novo motif discovery runs were performed on either strand of unaligned 500 bp upstream sequence with motif width of 12 bp. We subsequently also queried 200bp and 1000bp of each NCR promoter, finding that motifs were clustered within the 500bp region; this location is consistent with findings from (31). Conserved motifs were selected based on bit size (range from 0-2), positional bias (P-value<0.05) and with an E-value<0.001 (Fig. S3).
Position weight matrix (PWM) scanning for known motifs was performed using Find Individual Motif Occurrence (FIMO) (46) to determine the number of matches in each promoter region. FIMO was used to search for the occurrence of MEME-generated motifs against these backgrounds on 500 bp promoter upstream sequences with a P-value<1E-4 (46), then genes with at least one motif site in their promoter were assessed for enrichment. For analysis of the whole NCR gene family, 700 NCRs (compiled from data from (47) and (48), Table S5) against 3 sets of 700 randomly sampled non-NCRs from the remainder of the M. truncatula genome were used for enrichment analysis. The Shapiro-Wilk test was used to determine enrichment data normality and an F-test was used to determine data variance. A two sample Welch t-test and pairwise Wilcox test was then used to calculate P-values in R. Similarly, enrichment for the occurrence of the EE in the 45 cycling NCR genes was compared to: (i) 45 expressed but non-cycling NCRs (ii) 45 expressed and cycling non-NCRs and (iii) 45 non-expressed and non-cycling NCRs. FIMO scan for AGATATT enrichment was also performed on upstream regions of (i) all 2,832 cycling transcripts (ii) randomly sampled sets of 2,832 non-cycling genes. For all FIMO enrichment analyses, MEME was used to generate motifs with a size of 10 bp nucleotides and random samples were generated in R.

Multiple sequence alignment of NCR promoters
Promoter sequences were aligned using MAFFT (Multiple Alignment using Fast Fourier Transform) tools at EBI (European Bioinformatics Institute) (75). The promoters of eight cycling NCR genes were aligned using a Hidden Markov Model (HMM), selected based on the occurrence of all the three motif sites. For optimal alignment and representation of motif conservation, we ran algorithms with a parameter setting of gap lowest open penalty 1 allowed in MAFFT, gap extension 0.5 and an iteration of 100 runs; due to the diverse nature of the NCR sequences (76), gaps were required to generate the best local and global alignment (77). The aligned sequences were then visualized in Genious version 11.0.2 (https://www.geneious.com).

Transcription factor search and ortholog identification
Tomtom (78), was used to search for PWM query motifs against the A. thaliana PBM db (79), DAP motifs (O' Malley 2016) and JASPAR plants 2018 databases of known transcription factor binding sites. For othologous gene identification in M. truncatula, we used two methods. Firstly, reciprocal BLASTp was performed with the NCBI blasp suite, them alignment score, percentage query coverage and expect value were determined for forward and reciprocal queries (80). Secondly, a Smith-Watermann (SW) alignment homologue search was carried out using the Phytozome v12.1.6 database. Different homologs of same clock gene are marked as a/b/c, based on their level of similarity to A. thaliana ortholog, with a being the highest. Although we did not find orthologs for PRR9 and PRR3, it should be noted that M. truncatula homologs listed under PRR7/5 also share similarity with AtPRR9/3.

Promoter phylogenetic tree reconstruction
The DNA sequences of 700 NCRs with their upstream region were aligned using MPI-based MAFFT version 7.3 https://mafft.cbrc.jp/alignment/server/index.html for large sequences. Except for three NCR genes (Medtr0538s0010, Medtr1886s0010, Medtr0753s0010) in scaffolds that have an upstream region <500 bp, all other NCR promoters were of 500 bp upstream length. Maximum-likelihood (ML) analyses and search for the best-scoring tree were performed using RAxML version 8.2.10 with rapid bootstrapping of 100 replica runs. The substitution model of Generalised Time Reversal (GTR) and the Gamma model of rate heterogeneity was used. The best resulting ML tree for DNA alignments was used for visualization of the phylogenetic tree on

CCA1/LHY phylogenetic tree reconstruction
Clock gene homologues were initially identified via reciprocal BLASTp was performed with the NCBI blasp suite of A. thaliana clock protein sequences against sequences in the National Center for Biotechnology Information (NCBI). Hits were sorted primarily by maximum bitscore score followed by E value. In cases where there were multiple high scoring hits, multiple putative orthologues were identified. Homology was also assessed using the Smith-Watermann (SW) alignment homologue search with the Phytozome v12.1.6 database. Evolutionary history was inferred using the Maximum Likelihood method and JTT matrix-based model (81). Initial trees for the heuristic search were obtained by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model. Trees with superior log likelihoods are shown. Trees were visualised using the interactive tree of life v4 (82). Data sharing: All raw RNAseq data has been deposited in the NCBI SRA database (PRJNA634620) Competing Interest Statement: The authors declare that they have no competing interests. Figure S1. LHY is rhythmically expressed in nodules. LHY expression exhibits free-running rhythmicity with a dawn peak of expression, consistent with the presence of a functional clock in nodules. The overall expression of LHY is highest in leaves, as expected, but the same rhythm is visible in leaves (green, right-hand y-axis), roots (yellow, left-hand y-axis) and nodules (orange, left-hand y-axis); all measured using qPCR across the timecourse. Expression measurement in nodules using qPCR (orange) corroborates the measurement using RNAseq (blue). White and grey background represents presumptive day and night; error bars indicate standard deviation of the mean (n=3 biological replicates); see Table S3 for all values. The Evening Complex (EC) is formed of three proteins, ELF3, ELF4 and LUX, whose expression peaks in the evening; (C) Pseudo-response regulators (PRRs) are homologous to TOC1 and expressed mainly during the day to form additional loops. (D) RVEs and Gigantea (GI) synchronize the central clock with output pathways. White and grey background represents presumptive day and night (plants were transferred from 12/12 hours dark/light to constant light before sampling); error bars indicate standard deviation of the mean (n=3 biological replicates, except for 15h and 21h where n=2); all graphs have the same y-axis (normalised RNAseq transcripts per million (TPM) count). See Table S2-S3 for all data values. Figure S3: Promoter landscape of the NCRs. Multiple Sequence alignment (MSA) visualization of the conserved motifs in eight NCR promoters that best represents the four conserved regions (red boxes) using Genious 1102. The visualization was performed after aligning 500bp upstream promoters using the EBI tool MAFFT aligned with open gap and gap extension penalty due to the diverse nature of the NCR sequences (see Methods). Conserved motifs were selected based on bits size (range from 0-2), positional bias (P-value<0.05) and with an E-value< 0.001. The evening element (EE) was found within the conserved region that we labelled 'motif 2', there is some homology between the binding sites of ATHB15/ATHB16 and motif 3, and with the binding site of AHL20 and motif 4. de novo identified motif 1 has no known transcriptional regulator. The TATA binding box is indicated within a purple box. Figure S4. Phylogenetic tree of NCR promoter sequences shows high levels of similarity across the family. Polar visualisation of the promoters of 700 NCRs that all have promoter regions of 500bp, excepting three scaffolds (Medtr0538s0010, Medtr1886s0010, Medtr0753s0010). Blue colour denotes NCRs with the evening element, EE (AGATATTT) in their promoter region, green colour with the EER motif AGACATTT and orange colour denotes NCRs with both the EE and EER motif. The NCR promoters can be broadly divided into three groups (denoted 1,2,3 from the root outwards) in which there are NCRs with the EE or the EER. Promoters with both the EE and the EER (orange) are only present in groups 1 and 3. The tree represents DNA maximum-likelihood phylogeny of NCR promoters with 100 bootstrapping and is rooted in the middle. Scale bar of 0.3 represents branch length, measured as the rate of nucleotide substitution per site. Table S1. Analysis of the lhy-1 and lhy-2 mutants. (A) qPCR comparison of LHY expression (relative fold change to β-tubulin) in the lhy-1 and lhy-2 mutants compared to WT R108. (B) Visible leaf area and period length for lhy-1 and lhy-2 mutants compared to WT R108. (C) Nodule fresh weight and plant shoot dry weight for lhy-1 and lhy-2 mutants compared to WT R108, in mockinoculated or S. meliloti WSM1022-inoculated conditions. (D) Expression of LHY, PRR5a and ELF4a transcripts in WT A17 RNAseq data. Rhythmicity was assessed using MetaCycle, and PhaseShift, PhaseShiftHeight, PeakIndex, PeakSPD, Period and P-value for oscillation are given. (E) List of geneIDs for LHY/CCA1 homologues, as shown in Fig. 1A.  Table S3. Expression of clock gene transcripts detected in 40 day-old A17 nodules and information on their oscillating behavior. (A) Medicago truncatula proteins identified with homology to Arabidopsis thaliana clock genes returned from reciprocal BLASTp and Smith-Watermann (SW) alignment homologue searches. Reciprocal BLASTp was performed with the NCBI blasp suite, and alignment score, percentage query coverage and expect value are reported for forward and reciprocal queries. Smith-Watermann alignments were retrieved from the Phytozome v12.1.6 database, SW alignment scores, forward and reciprocal similarity and the incidence of a reciprocal best hit are reported. In cases where a reciprocal high degree of similarity was identified, multiple potential homologues are identified. For each transcript, information on oscillation pattern (Yes or No), oscillating cluster number (1-4), oscillating P-values (LS_pvalue and LS_BH.Q P-value-FDR correction), oscillating period (LS_period), phase (LS_adjphase), and amplitude (LS_amplitude), average and standard deviation (SD) of log2 RNAseq expression values (TPM) per timepoint. (B-C) LHY expression measured in leaves, roots and nodules; C are log2 values of B. Table S4. GO term, pathway and protein domain term enrichment in oscillating gene clusters. (A) Terms, P-value of enrichment with Bonferroni testing and genes associated with each term for dawn cluster 1 (A), morning cluster 2 (B), evening cluster 3 (C) and late-night cluster 4 (D). Table S5. Genes annotated as NCRs in Medicago truncatula A17 and information on their expression levels, oscillating behaviour and presence of motifs. Medicago truncatula transcript IDs identified in RNAseq table, NCR number (where annotated), gene description (from Phytomine), peptide sequence and length, and information on oscillation pattern (Yes (yellow shading) or No), cycling P-value (from Metacycle) and oscillating cluster number (1-4) (A-H). Information on presence of motifs (EE motif=AGATATTT and EER motif=AGACATTT) together with matched sequence, location and P-values (I-AD). Average log2-transformed RNAseq expression values (TPM) for each timepoint (columns AE-AQ), average log2 gene expression value for each gene (AR) and average log2 RNAseq expression values per timepoint compared to the mean for each gene (columns AS-BE). Table S6. Motif analysis in NCRs and cycling genes. (A) Motifs found in the promoters of rhythmic NCRs. (B) Presence of the EE and the EER in the promoters of the 45 cycling NCRs compared to randomized sets of 45 other promoters (as Fig.3G). (C) Presence of the EE and the EER in the promoters of the 700 NCRs compared to randomized sets of 700 other promoters (as Fig.1E). (D) Presence of the EE and the EER in the promoters of the 2,832 cycling NCRs (as Fig.3G). (E) Presence of the EE and the EER within the four clusters of cycling gene promoters (as Fig.3H). For enrichment analysis see Methods; P-values in green are <0.05.