Bioinformatic analysis based genome-wide identification, characterization, diversification and regulatory transcription components of RNA silencing machinery genes in wheat (Triticum aestivum L.)

Dicer-Like (DCL), Argonaute (AGO), and RNA-dependent RNA polymerase (RDR) gene families are known as RNA silencing machinery genes or RNAi genes. They have important activities at post-transcriptional and chromatin modification levels. They regulate gene expression relating to different stresses, growth, and development in eukaryotes. A complete cycle of gene silencing is occurred by the collaboration of these three families. However, these gene families are not yet rigorously studied in the economically important wheat genome. Our bioinformatic analysis based genome-wide identification, characterization, diversification and regulatory components of these gene families identified 7 TaDCL, 39 TaAGO and 16 TaRDR genes from wheat genome against RNAi genes of Arabidopsis thaliana. Phylogenetic analysis of wheat genome with Arabidopsis and rice RNAi genes showed that TaDCL, TaAGO and TaRDR proteins are clustered into four, eight and four subgroups respectively. Domain, motif and exon-intron structure analyses showed that the TaDCL, TaAGO and TaRDR proteins conserve identical characteristics within groups while retain diverse differences between groups. GO annotations implied that a number of biological and molecular pathways are linked to RNAi mechanism in wheat. Gene networking between transcription factors and RNAi proteins indicates that ERF is the leading family linked to maximum RNAi genes followed by MIKC-MADS, C2H2, BBR-BPC, MYB, and Dof. Cis-regulatory elements associated to RNAi genes are predicted to act as regulatory components against various environmental conditions. Expressed sequence tag analysis showed that larger numbers of RNAi genes are expressed in different tissues and organs predicted to play roles for healthy plants and grains. Expression analysis of 7 TaDCL genes using qRT-PCR showed that only TaDCL3a and TaDCL3b had root specific significant expression (p-value<0.05) with no expression in leaf validated EST results. Besides, TaDCL3b and TaDCL4 significantly prompted in drought condition indicating their potential role in drought stress tolerance. Overall results would however help researchers for in-depth biological investigation of these RNAi genes in wheat crop improvement.


Introduction
yeasts; the Piwi and worm-specific AGO(WAGO) subgroups are observed in animals and C. elegans 135 respectively [7,[27][28][29]. However, all members of the subgroup of AGO and Piwi conserve the particular DDH 136 catalytic residues but absent in WAGO. The expression of Piwi subgroup proteins is limited to human germ cell and 137 in rat and some mammals. Although this cluster of AGO proteins have not been identified in any plant species but in 138 recent times a novel type of Argonaute, OsMEL1 has been identified in rice which is linked to male meiosis [7,30].

140
The third type of RNA silencing machinery gene is the RDR protein. This kind of protein family is exist and highly 141 essential for RNAi mechanism in fungi, nematodes and plants, but have not been identified in insects or 142 vertebrates [31]. RDR proteins were separated first from tomato [32]. RDR proteins are critically important for start 143 and increase of silencing process and these protein families contain a well-preserved sequence motif like the 144 catalytic β' subunit of DNA dependent RNA polymerases [7,33]. This gene family, however, possesses a RNA-145 dependent RNA polymerase (RdRP) domain, help form the dsRNA from single-stranded RNAs (ssRNAs) to start a 146 new cycle of RNA silencing [7,34].

148
Recent study suggests that there are numerous copies of DCL, AGO and RDR genes are present in plants and 149 animals [7,10]. Different study on a dozen of crops plant has been made to learn about the role of these RNAi gene

172
Though AtDCL2 produces siRNAs responsible to create defensive mechanism against viral infection and by nature 173 the creation of siRNAs are tend to happen from cis-acting antisense transcript but AtDCL4 regulate the vegetative 174 stage transformation by the production of siRNAs from trans-acting transcript [10,41,42]. Surprisingly,AtDCL3 175 selects short dsRNAs but AtDCL4 cleaves long dsRNA substrate s [11]. Many other functions of DCL genes in 176 plants such as AtDCL1 and AtDCL3 proteins contribute in promoting flowering [43].

178
On the other hand, AGO gene families act as the chief RNA-mediated gene silencing techniques and reported that 179 they play noteworthy performance in growth and development in plants [24,44,45]. AtAGO1 is associated with the 180 transgene silencing pathways [46] and AtAGO4 with the epigenetic silencing [47]. AtAGO7 and AtAGO10 however 181 effect the growth [48] and meristem maintenance [49]. Other AtAGOs however have some important characteristics 182 in gene silencing pathways. Earlier investigations also suggest that the RDR genes are biologically prompt in RNAi 183 mechanism such as co-suppression, protect pathogen invasion, chromatin modification, and post-transcriptional 184 gene silencing activities in plants, viz., Arabidopsis,.

186
Wheat (Triticum aestivum), a global common staple food as well as the second most-produced cereal crops after 187 maize in the world (http://www.fao.org/worldfoodsituation/csdb/en/). It is also the high vital source of carbohydrates 188 and is the leading source of vegetal protein in human food. There is little systemic thorough study has been made so 189 far regarding RNA silencing machinery genes for this crop. The current study was carried out to obtain

249
TaRDR gene families were also listed for the presence of RdRP and RRM functional domains for protein structure.
In order to identify the most potential conserved metal-chelating catalytic triad regions or residues in the PIWI

RNA isolation and expression analysis of 7 TaDCL genes in T.
304 aestivum 305 Complete RNA was taken out using Trizol reagent (TAKARA, Japan) by following the manufacturer's guidelines.

306
RNA was treated with DNaseI (TAKARA, Japan) and reverse-transcribed into cDNA using the PrimeScript RT 307 reagent kit (TAKARA, Japan). The collected cDNAs were used for gene expression studies with quantitative real 308 time PCR (qRT-PCR

398
indicates of an amino acid missing of the corresponding protein sequence. The locations analogous to each protein 399 are mentioned at the end of line.  Arabidopsis and rice homologs, three independent phylogenetic trees were created from the multiple sequence 410 alignment. Encoded protein sequences (S1, S2 and S3 data) of all RNAi genes were used to construct the trees using conservation compared with their Arabidopsis and rice counterpart. Fig. 4A

453
It is observed from the search results from all three databases that all TaDCL protein families possessed all most all 454 assumed six common domains viz., DEAD box/ResIII, Helicase-C, Dicer-dimer, PAZ, RNase III and DSRM (Table   455 3 and Fig.5). These were also conserved by the 4 DCL protein families in model plant Arabidopsis[7,10,70].

477
Besides, the T.aestivum genome encrypted 16 TaRDR protein families sharing a mutual motif analogous to the 478 catalytic β' subunit of RNA-dependent RNA-polymerases(RdRp)( Table 3 and Fig.5), which is also supported by the 479 earlier study [33].

493
Wheat DCL4 protein subfamily, TaDCL4, however was identified to possess a few C-terminal motifs reorganized in 494 comparison of the paralogs AtDCL4, which was also supported by the previous demonstration[10]. All TaDCL 495 protein subfamilies retain 20 conserved motifs with the exception of that TaDCL3b and TaDCL4 contain 19 and 16 496 motifs respectively (Fig.6).

498
On the other hand, out of 39 TaAGO protein subfamilies, the maximum 28 protein members of wheat DCL, AGO 499 and RDR retained 16-20 conserved motifs in their protein domain structures with some variations in other 11 500 subfamilies. High conservation was found in TaAGO1, TaAGO9, and TaAGO10 family members with some 501 exceptions that TaAGO1b, TaAGO1e, and TaAGO10c contained 16, 14, and 15 motifs respectively (Fig.6).
17-20 functional motifs in the RdRp domain similar to that of the AtRDR protein members (Fig.6)

508
and TaRDR groups that are assumed to hold and perform same structural functions. Therefore, discovery and broad 509 study of these motifs is important to structure models of cellular processes at the molecular scale and to identify the 510 mechanisms of diseases in eukaytic groups [58,73].

514
The exon-intron organization of TaDCL, TaAGO and TaRDR genes in wheat (T.aestivum) was studied using  (Table 1 and Fig.7). The members of the TaAGO1 sub-526 family maintained high similarity conservation in containing introns16-18 as in AtAGO1 with exception 11 introns 527 in TaAGO1b (Table 1 and Fig.7).Furthermore all members of the rest of the sub-families follow almost the same 528 pattern in containing introns and exons of their Arabidopsis counterpart. Among the TaRDR protein elements,

529
TaRDR3 contained the highest number of 10 introns followed by 9 and 5 introns in TaRDR5 and TaRDR4 respectively (Table 1and Fig .7). Interestingly, all gene members of TaRDR1 and TaRDR6 sub-group had no introns 531 though AtRDR1 possessed 2 introns and AtRDR6 had 0 introns that might be of some significant implication that 532 these gene sub-families may be responsible for any organ or tissue specific expression characteristics and bear any 533 particular disease resistance potentiality (Table 1 and    functions occur(S4 data , S1 Fig. and Fig.8). An interconnected relationship using QuickGO web-based tool from 548 less specific concepts to more specific concepts among the GO terms and the corresponding gene ontologies are 549 demonstrated using the three hierarchical GO slim trees for BP (S1A Fig.), MF (S1B Fig.), and CC (S1C Fig.)

555
ultimately connected to gene expression (GO: 0010467; p-value: 7.20e-05) (S1A Fig.). There are also a number of 556 biological processes are related to be occurred by the predicted RNAi genes in wheat (T.aestvum) some of which  -20). It is also obvious from the GO tree that 563 response to stimulus and immune system process is associated to the defense response to other organism (GO: 564 0098542; 3.90E-13) (S1A Fig.). GO analysis for BP also suggests that some very important functions are maintained silencing. Nucleic acid binding is also of two types: heterocyclic compound binding (GO: 1901363; 1.80e-10) and organic cyclic compound binding (GO: 0097159; 1.80e-10) (S4 Data). GO ancestry tree showed that all these MFs 587 are the child terms of the GO term binding (GO: 0005488; 0.00217) (S1B Fig.).There is another MF termed as 588 ribonuclease III activity (GO: 0004525; p-value: 7.00E-15) is related to 7 RNAi genes that is the child GO term of 589 double-stranded RNA-specific ribonuclease activity (GO: 0032296; p-value: 7.00e-15) and nuclease activity (GO: 590 0004518; p-value: 6.70e-09) (S1B Fig.). The identified TaAGO  2.20e-05), and membrane-enclosed lumen (GO: 0031974; 2.50e-05) (S1C Fig.).

598
The Venn diagram also illustrates that TaDCLs, TaAGOs and TaRDRs shared significant number of GO pathways.

599
For BP, there were 91 GO enrichment pathways in general (Fig.7D) shared by the identified RNAi genes in wheat 600 (T.aestivum), that suggests that RNAi genes are significantly associated to many important BPs. Furthermore, it 601 were shown that the identified putative RNAi genes displayed a pool of mutual pathways for performing MFs and 602 these MFs area predicted to occur in 6 common CCs( Fig.8E and Fig.8F). GESA however provided insights about  found locating in more than one cellular locations (Fig. 9). It is however observed that most of the RNA silencing 611 genes were found to be located in cytosol (DCL 71.4 %, AGO 87.2 % and RDR 87.5 %) followed by plastid (DCL 612 14.3 %, AGO 33.3 % and RDR 31.2 %). A few of the AGO (20.5 %) and RDR (12.5 %) genes were located in mitochondria but no DCL protein was found in mitochondria. On the other hand, ER, golgi and peroxisome contain 614 no RNA silencing genes ( Fig. 9D and S2 Table).Cytosol is the place of maximum metabolism in plants and most of 615 the proteins in the cell are located in cytosol [81,82]. Since the PTGS takes place in the cytoplasmic region [77], this 616 suggests that the identified putative RNAi genes are actively engaged in PTGS process.

634
In this analysis, 375 TFs were identified those regulate the predicted RNAi genes in wheat (T.aestivum) (S5 Data

640
TFs, which are in leading position accounted for nearly 61% of the total identified TFs (S3 Table) .This, implied that wheat (T.aestivum).

644
Furthermore, a regulatory gene network was created using Cytoscape to analyze in-depth relationship among the 645 TFs and RNAi genes in wheat(T.aestivum).This suggests that there is some diverse relationships among the 646 predicted groups of TFs and predicted RNAi genes (Fig.10)

653
with TaAGO2b) are predicted to bind with some specific cis-acting elements in the RNAi genes to response some 654 physiological and biochemical stimuli in wheat for regulating gene expression (S2 Fig.). Some previous study

655
suggested that a few TFs regulate the expression of a number of important genes for some particular physiological 656 and biochemical reasons [83,84]. Earlier investigation also reported that both AP2 and bZIP TF families contribute in 657 gene regulation for development and resistance response to various environmental stressors [83]. From the analysis 658 results, it is however observed that AP2 TF family only interacts with TaAGO2b (S2 Fig.) in wheat that might be 659 predicted that the DNA-binding domain of this TF family is responsible to play for regulating gene in a specific 660 situation.

663
The nodes of the network were colored based on RNAi genes and TFs. TaDCL, TaAGO and TaRDR genes were 664 represented by pink, green, and yellow color nodes respectively and the TFs were represented by paste color nodes.

665
Different node symbols were used for different TF families.

667
Additionally, four hub TF families were selected based on the node degree criterion that possessed three or more 668 interaction with the predicted wheat (T.aestivum) RNAi genes. Among them 16 TFs belong to ERF family, one TF 669 from MIKC-MADS and C2H2, two TFs from LBD family (Fig.11E). The ERF family with accession no.

670
Traes_2AL_E5A9615E2 regulates all 10 and a TF of ERF family with accession no Traes_5BL_F5D379AFC regulates minimum 5 hub wheat (T. aestivum) RNAi genes in the network followed by the two TF of family LBD 672 with accession no.TRAES3BF084400010CFD_g and Traes_1DS_BB8508CC6 that regulate 8 and 5 RNAi genes 673 respectively (Fig. 11E). The TF family C2H2 and MIKC-MADS each regulates 4 and 3 RNAi genes respectively in 674 wheat (T.aestivum) in the network (Fig. 11E). It is however observed from the Fig. 11E that TFs from ERF family 675 are largely bind to the hub wheat RNAi genes that suggests that TFs from this family contain high tendency to bind

707
Therefore, the identified CREs related to LR is assumed to retain direct connections for high photosynthesis level in 708 wheat plant leaves. Amongst the LR CREs the ATCT-motif, ATC-motif, Box-4, AE-box, G-box, I-box, GAT-motif,

709
GT1-motif were shared by the most of the RNAi genes in wheat (T.aestivum) (Fig 12). The TC-rich repeats (CRE

717
It is well-studied that various plant hormones or phytohormones are essential for plant growth and developmental 718 timing [93]. More than a dozen common CREs is related to wheat plant hormonal activities was also identified from 719 the PlantCARE database analysis. The ABRE (cis-acting element involved in the abscisic acid responsiveness),

734
Besides, there are some other CREs were also found and identified in the promoter and enhancer sites of the RNAi

755
endosperm, spike and other important organs or tissues [1,6,7,10,[35][36][37][38]98]. It is however observed from the Fig.13 tissue or organ while TaAGO5c, TaAGO6e and TaAOG7b did not show any expression in any tissue or organ in 758 wheat(T.aestivum). Nearly 50% RNAi genes of all 62, expressed in root, spike, anther, heads, shoots and endosperm 759 followed by floret, leaf and seed (Fig.13) that certainly implied that these tissues and organs provide major 760 contribution for improved wheat grain formation resulting in increased wheat yield. Among the DCL gene members,

761
TaDCL4 exhibited on in root, wheat head, and seedling shoot specific expression. All TaDCL3 gene sub members 762 displayed expression in root, spike, anther, and endosperm only except TaDCL3d that was in endosperm specific 763 (Fig.13). On the other hand, TaDCL1a and TaDCL1b had no root, leaf, wheat head/shoot and floret/flower specific 764 expression ( Fig.13) but interestingly they had only endosperm-embryo, seed, spike specific expression 765 characteristics which indicates that these two gene members of TaDCL family might play role in formation of 766 healthy and sufficient number of grains in wheat plant(T.aestivum).

770
Though the TaAGO2a/2b genes expressed in wheat endosperm, head, and spike ( Fig.13) implied that these two 771 genes contribute in grain development. But surprisingly three sub members of TaAGO1 (TaAGO1i/j/k) as well as all 772 three sub families of TaAGO4 and TaAGO9 showed expression in all mentioned tissues and organs in the 773 expression map except pistil, spikelet and sheath (Fig.13). This indicates that these 6 members of TaAGO4 and

774
TaAGO9 might have roles in wheat plant growth and development. The gene sub-family TaAGO3 had only 775 endosperm-embryo specific expression. The gene sub families TaAGO6a/6b/6c had shoot anther, spike, and root 776 specific expression whereas the two members TaAGO6c/6d showed leaf and only TaAGO6d showed endosperm 777 specific expression. TaAGO7a also showed root, wheat head/shoot and endosperm specific expression (Fig.13).

778
Among the four sub families of TaAGO10 gene, the TaAGO10a/10b retained expression in all tissues and organs of 779 wheat except leaf, sheath, and floret (Fig.13). Earlier wet lab study reported that RNAi genes of AGO family are 780 tend to show diverse expression intensity in leaf, stem, flower in Brassica napus, pepper(Capsicum Annuum) [1,39].

782
Furthermore, EST analysis for all sub members of TaRDR1 showed their expression in flower but had no expression 783 in any tissues and organs in wheat (T.aestivum) (Fig.12). It that suggests that these sub gene families are tend to 784 responsible for expression in flower like two sub families TaRDR6a/6b (Fig.13).Previous study on pepper(Capsicum sub members of TaRDR2 family did not provide any expression in flower. However, all members of TaRDR2   787 showed expression all most in all mentioned important tissues and organs with additional expression of TaRDR2b in 788 pistil in wheat (T.aestivum) but TaRDR2d expressed in only spike (Fig.12). It can be said from this EST analysis of 789 all 62 RNAi genes in wheat (T.aestivum) that genes those expressed in root and leaf might play role in resistance to