In Silico Identification, Characterization and Diversity Analysis of RNAi Genes and their Associated Regulatory Elements in Sweet Orange (Citrus sinensis L)

RNA interference (RNAi) plays key roles in post-transcriptional and chromatin modification levels as well as regulates various eukaryotic gene expressions which involved in stress responses, development and maintenance of genome integrity during developmental stages. The whole mechanism of RNAi pathway is directly involved with the gene-silencing process by the interaction of Dicer-Like (DCL), Argonaute (AGO) and RNA-dependent RNA polymerase (RDR) gene families. However, the genes of these three RNAi families are largely unknown yet in sweet orange (Citrus sinensis), though it is an economically important fruit plant all over the world. Therefore, a comprehensive investigation for genome-wide identification, characterization and diversity analysis of RNA silencing genes in C. sinensis was conducted and identified 4 CsDCL, 8 CsAGO and 4 CsRDR as RNAi genes. To characterize and validate the predicted genes of RNAi families, various bioinformatics analysis was conducted. Phylogenetic analysis clustered the predicted CsDCLs, CsAGOs and CsRDRs genes into four, six and four subgroups with the relevant genes of Arabidopsis respectively. The domain and motif composition analysis, the gene structure for all three-gene families exhibited almost homogeneity within the same group members while showed significant differences in between groups. The GO enrichment analysis results clearly indicated that the predicted genes have direct involvement into the RNAi process as expected in C. sinensis. Moreover, Cis-regulatory elements and regulatory transcription factor analysis of the reported RNAi genes demonstrated the diverse connection to the huge biological functions and regulatory pathways. The expressed sequence tag (EST) analysis showed that these genes are highly expressed in fruit and leaves which indicate that these reported genes have great involvement in C. sinensis food, flowering and fruit production. The expression analysis of the reported RNAi genes might be more useful to explore the most effective RNAi genes in C. sinensis for further biotechnological application.


INTRODUCTION
In multicellular eukaryotes, wide range of biological functions including genome rearrangement, antiviral defense, heterochromatin formation and development patterning and timing are fine-tuned by generally two types of small RNA (sRNA; including 21-24 nucleotides), named microRNA (miRNA) and short interfering RNA (siRNA) [1][2][3]. These sRNA molecules are initially involved with both transcriptional silencing and post-transcriptional RNA interfering RNA formation However, the genes of these three RNAi families are largely unknown yet in the economically important fruit plant C. sinensis. The sweet orange (C. sinensis) is considered as an inevitable source of high nutrition, a great natural source of vitamin C, an unavoidable antioxidant important for the human body [43]). In production volume, orange is the world second fruit plant all over the world (FAO Statistics 2006). Not only having the market value, but also sweet orange contains about 170 phytonutrients and over 60 flavonoids those works as antioxidant, anti-inflammation, anti-cancer and anti-arteriosclerosis compound and also protects from many chronic diseases like arthritis, obesity and coronary heart diseases [44][45][46][47]. Therefore, a comprehensive investigation for genome-wide identification, characterization and diversity analysis of RNAi genes in C. sinensis is essential. In this project, an attempt is made to accomplish a comprehensive in silico investigation for genomewide identification, characterization and diversity analysis of the members of DCL, AGO and RDR gene families in C. sinensis. A number of bioinformatics analysis such as, phylogenetic analysis, gene structure, functional domains and motifs, gene ontology (GO), sub-cellular localization, transcription factor (TF) analysis, cis-acting regulatory element and expressed sequence tag (EST) analysis have been considered to characterize and validate the predicted RNAi genes in C. sinensis.

Identification of DCL, AGO and RDR Genes
For genome-wide identification of DCL, AGO and RDR genes in sweet orange (C. sinensis), protein sequences were downloaded from the Phytozome database (https://phytozome.jgi.doe.gov/pz/portal.html). In this purpose, the previously identified sRNA biogenesis protein sequences of the model plant Arabidopsis thaliana (AtDCLs, AtAGOs and AtRDRs) were used to search the protein sequence of C. sinensis. The Basic Local Alignment Search Tool (BLASTP) program was used against C. sinensis genome in the Phytozome database (Fig. (1)). The resulted paralogs protein sequences from C. sinensis were downloaded with the significant score (≥50) and the significant E-values. For avoiding the redundancy of sequences, only the primary transcripts were considered in this analysis.
The conserved domains of all retrieved sequences were searched and predicted by using the Pfam (http://pfam.sanger.ac.uk/) and the NCBI-CD database (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and the SMART analysis. By this time, the different genomic information including the primary transcript name, genomic length, the chromosomal location of genes, encoded protein length were downloaded from the C. sinensis genome in Phytozome database. In this study, the computationally identified new CsDCLs, CsAGOs and CsRDRs genes in C. sinensis genome were named according to the nomenclature based on phylogenetic relatedness of the similar family-members of the Arabidopsis thaliana genes as named previously. The molecular weight of the selected protein sequences was predicted by using the ExPASyComputepI/Mwtool (http://au.expasy.org/tools/pito ol.html).

Sequence Alignment and Phylogenetic Analysis
In this in silico identification, the multiple sequence alignments of the encoded protein sequences of the predicted genes were conducted by using the Clustal-W method [48] with the MEGA5 program [49]. Finally, the phylogenetic tree analysis was carried out using the Neighbor-joining method [50] implemented on the aligned sequenced and the 1,000 bootstrap replicates [51] were used to check this evolutionary relationship. The evolutionary distances were computed using the Equal Input method [52].

Conserved Domain and Motif Analysis
To investigate the functional domains of the predicted genes by GENEDOC program using the aligned sequences although domains were predicted according to the NCBI database, Pfam database and SMART analysis (Fig. (1)). The CsDCLs, CsAGOs and CsRDRs proteins reflected multiple functional domains like the AtDCLs, AtAGOs and AtRDRs proteins. The predicted genes were selected containing the maximum number of functional domains similar to the AtDCLs, AtAGOs and AtRDRs. For DCL proteins, all six functional domains i.e. DEAD/ResIII, Helicase_C, Dicer_Dimer, PAZ, RNase III and DSRM were considered to choose the prominent CsDCL candidates in C. sinensis. Similarly, the proteins were sorted out as CsAGO candidate having the Argo-N/Argo-L, PAZ, MID and PIWI significant functional domains, the characteristic of plant AGO proteins. Same approach was applied to discriminate the CsRDRs genes. In motif investigation, the most significant conserved metal-chelating catalytic triad residues in the PIWI domain, i.e. aspartate, aspartate and histidine (DDH) [13] as well as histidine at 798 positions (H798) were considered for CsAGOs (Fig. (2)). In this study, we considered the alignment of CsAGOs with the paralogs AtAGOs using Clustal-W method as mentioned before. Furthermore, the conserved motif divergences among all the predicted genes were conducted by a complete online protein sequence analysis using Multiple Expectation Maximization for Motif Elicitation (MEME-Suite) [53]. For this purpose, the following parameters were specified: (i) optimum motif width as ≥6 and ≤50; (ii) maximum 20 motifs.

Gene Structure Analysis
For further computational analysis, we have considered the predicted gene structure and their interaction network analysis ( Fig. (1)). The gene structure was analyzed using the online Gene Structure Display Server (GSDS 2.0, http://gsds.cbi.pku.edu.cn/index.php). The structures of the selected genes were compared with the gene structure of Arabidopsis thaliana. These combined gene structure Figures revealed the exon-intron composition of the predicted gene in C. sinensis. Moreover, numbers of intron were noted from GSDS.

Gene Ontology and Sub-cellular Localization Analysis
To check the engagement of our predicted RNAi associated genes with the cluster of different biological processes and molecular functional pathways, the GO analysis was conducted using online tool implemented in PlantTFDB [54]. Here, the respective p-values were determined by Fisher's exact test and Benjamini-Hochberg's technique was chosen for multiple testing corrections. We considered the p-value < 0.05 as statistical significant for the GO enrichment results corresponding to the predicted genes. For the reported gene product, the sub-cellular location was investigated into the cell considering the different organelles (Fig. (1)). these allocations could improve the overall understanding of the proposed genes as well as proteins functional pathways through the RNAi process. Web-based integrative subcellular location predictor tool called plant subcellular localization integrative predictor (PSI) [55] was used to predict the subcellular location of the identified genes.

Identification of Regulatory Relationship between TFs and C. sinensis RNAi genes
In every living organism, transcription levels of genes are directly regulated by the expression or suppression activities of transcription factors (TF) associated with the respective genes. TFs are also a kind of protein which can bind to the specific promoter region of the genes and can control the transcription process whether the genes to be expressed or suppressed. TFs are involved in the plant stress response activities; plant development in different life stage, disease resistance and so on. In this study, the analysis of associated TFs family with the predicted RNAi genes in C. sinensis was conducted from the widely used plant transcription factor database, plantTFDB (http://planttfdb.cbi.pku.edu.cn//). The related TFs were also categorized into 27 different TFs families and their roles in plants are also provided. These TFs are extremely important to study for the development of C. sinensis plant and enrich the sweet orange production.

Construction of Regulatory Network between TFs and C. sinensis RNAi genes
After identification of the related regulatory TFs of the C. sinensis RNAi genes, the regulatory network and sub-network were constructed and visualized using Cytoscape 3.7.1 [56] to find out the hub proteins and the related important TF through the interaction network. The networks were constructed to investigate the regulatory relationship between the TFs and RNAi genes.

Cis-regulatory element Analysis
In genetic transcription, the cis-elements are highly associated by controlling the transcription of a specific gene through the regulatory transcription factors. To collect more related genomic information about the C. sinensis RNAi genes, the CIsacting regulatory element In the promoter region was retrieved and analyzed through the online cis-element prediction database PlantCARE (http://bioinformatics.psb.ugent.be/ webtools/plantcare/html/). The collected cis-regulatory element was categorized into five categories like light responsive (LR), stress responsive (SR), hormone responsive (HR), others activities (OT) and unknown function. The cis-element having known and described function are represented into the Fig. 12 for CsDCLs, CsAGOs and CsRDR separately.

In silico Expressed Sequence Tag (EST) Analysis
For the important and valuable information about the gene expression, the in silico expressed sequence tag (EST) data analysis was conducted. The PlantGDB (http://www.plantgdb.org/cgi-bin/blast/PlantGDB/) was used for EST mining against the proposed RNAi genes in C. sinensis. The default parameter with e-value=1e-10 were considered for blastn search for the EST mining in PlantGDB database. The further heatmap was constructed to represent the specific RNAi gene expression into different tissue and organ in this fruit plant.

Identification and characterization of CsDCLs, CsAGOs and CsRDRs genes
To identify the best candidates of RNAi pathway in C. sinensis similar to the A. thaliana, all the previously downloaded sequences were gone throughout various kinds of analysis (Fig. (1)). Finally, we have isolated 4 DCL genes, 8 AGO genes and 4 RDR genes those produce the CsDCLs proteins, CsAGOs and CsRDRs proteins respectfully in the C. sinensis genome.
On the basis of HMMER analysis with regards of all six types of conserved domains DEAD, Helicase_C, Dicer_dimar, PAZ, RNase III and DSRM; four DCL loci were identified in sweet orange genome. From the conserved domain search by Pfam databases, NCBI databases and the SMART analysis, all reflected that half of the predicted proteins (CsDCL1 and CsDCL4) are conserved with the DEAD/ResIII, Helicase_C, Dicer-dimer, PAZ, RNase III and dsRM domains, which are preserved by all the plant DCL proteins from the DCL genes family (class 3 RNase III family) [57,58]. On the other hand, CsDCL2 and CsDCL3 have missed a single DSRM domain while others are contained with a second DSRM domain. This DsRM domain is lacked in by non-plant DCL completely [11]. Compared to others DCL proteins, the CsDCL1 has the N-terminal DEAD domain which might consist of three adjacent segments of amino acid sequence within the full domain length (152 amino acids), resulted by the analysis of Pfam databases and SMART. The CsDCL3 also revealed the ResIII domain instead of the DEAD domain. Additionally, the genome length of predicted CsDCL genes varied from 10603 bp to 12728 bp corresponding to CsDCL1 and CsDCL2 with the coding potentiality of 1931 and 1396 amino acid ( Table (1

)).
Based on the conserved domain PAZ and PIWI from the putative polypeptide sequences by HMM and HMMER analysis, we have isolated a total of 8 AGO genes in the C. sinensis genome. Conserved domain analysis by the Pfam database, NCBI databases and SMART analysis, reported that all the selected AGO proteins (CsAGO1-8) shared an N-terminus PAZ domain and a C-terminus PIWI super family domain, the core properties of plant AGO proteins. Moreover, the CsAGO proteins also preserved the other domains like the Arabidopsis i.e. ArgoN, ArgoL1, DUF1785, ArgoL2, ArgoMid. From the previous study, the PIWI domain demonstrating expansive homology to RNase H binds the siRNA 5' end to the target RNA [15] and cracks the target RNAs that represents the complementary sequences to small RNAs [59]. Interestingly, the catalytic traid, three conserved metal-chelating residues (D=aspartate, D=aspartate and H=histidine) in PIWI domain are related to the previous event and this traid was firstly showed in the model plant Arabidopsis on AGO1 [13].
Moreover, another critical conserved histidine residue in AGO1 for in vitro endonuclease activity [60] was found. The genome length of the selected CsAGO genes varied from 2768 bp to 9667 bp producing the CsAGO5b and CsAGO10 respectively with the coding potentiality of 426 and 992 amino acids. In this study, the multiple sequence alignment of the PIWI domains of all CsAGOs with the paralogs AtAGOs in Arabidopsis using the CLUSTAL-W method (Fig. (2)). This alignment revealed that the five CsAGOs represented the conserved DDH traid residues like Arabidopsis AGO1.
Among the other three CsAGOs, the CsAGO5b represented two missing PIWI domain catalytic residue(s) in the second aspartate at the 845 th position (D845) and third histidine at the 986 th position (H986) ( Table (2)). But the other two CsAGO proteins, CsAGO5a and CsAGO5c has the catalytic traid but with a replacement in the third histidine residue at the 986 th position by tyrosine (Y) residue. Surprisingly, the histidine residue at the 786 th position was replaced by proline (P) in CsAGO4 and CsAGO5c; by arginine (R) in CsAGO5b and in CsAGO6, H786 residue was replaced by serine (S) residue (Table (2)). The newly identified 4 CsRDRs proteins that shared a common domain RdRP which consist of a sequence motif corresponds to the catalytic β' subunit of DNA-dependent RNA polymerases [61]. The CsRDRs have the genome length varied from 4373 bp to 11526 bp corresponding coding potentiality of 1157 and 1015 amino acids for CsRDR6 and CSRDR3 respectively.

Phylogenetic Analysis of DCL, AGO and RDR proteins in C. sinensis and A. thaliana
In order to conduct the phylogenetic relationship analysis among the DCL, AGO and RDR proteins of sweet orange and Arabidopsis, the total length amino acid sequences of these plants were used to assess the evolutionary history throughout the neighbor-joining phylogenetic tree (Fig. (3))construction method with considering 1000 bootstrap values. Phylogenetic tree was generated from the full-length aligned protein sequences (Supplementary Data S1) of the 4 CsDCLs and 4 AtDCLs from C. sinensis and Arabidopsis (Fig. (3A)).The tree represented four subfamilies of newly identified CsDCL proteins i.e. To construct the phylogenetic tree for CsAGO proteins, the full-length multiple aligned protein sequence of the 8 CsAGOs and 6 AtAGOs were considered (Supplementary Data S2). The tree exhibited six subfamilies, AGO1, AGO4, AGO5, AGO6, AGO7 and AGO10 with the higher sequence similarity (Fig. (3B)).   (Fig. (3C)). The CsRDR proteins were designated as CsRDR1, CsRDR2, CsRDR3 and CsRDR6 corresponding to the RDR proteins of Arabidopsis AtRDR1, AtRDR2, AtRDR3 and AtRDR6 respectively for the increased sequence similarity. The predicted CsRDR proteins were clustered according to their high sequence conservation with their reflection part in Arabidopsis RDR proteins.

Conserved Domain and motifs analysis of predicted proteins
Preliminary domain analysis of the predicted CsDCLs, CsAGOs and CsRDRs were conducted for searching the conserved domains by Pfam databases, NCBI-CDD databases and Simple Modular Architecture Research Tool (SMART) analysis, results are tabulated in (Table (3)). The CsDCLs proteins showed all the conserved domains through the SMART analysis exhibited some unknown regions and low complexity regions besides the expected domains (Fig. (4)). The conserved motifs were screened using MEME with the pre-discussed criterions. The CsDCLs, CsAGOs and CsRDRs proteins were performed all characteristic of DCL, AGO and RDR domain organization was found in Arabidopsis DCLs, AGOs and RDRs proteins respectively when the CsDCL3 has the ResIII domain instead of DEAD domain and CsAGO5b has missed the ArgoN domain having the PAZ and PIWI domain ( Fig. (4)). Among the CsAGO proteins one has 16 different motifs (CsAGO7), three proteins (CsAGO4, CsAGO5c and CsAGO6) reflected 17 motifs and others three proteins (CsAGO1, CsAGO5a and CsAGO10) contained 20 conserved motifs (Fig. (5)).
Although from the analysis, it is observed that the well conservation within the AGO proteins of C. sinensis and Arabidopsis, there still has had some variability of motif distribution between the different subfamilies of AGO proteins in C. sinensis.
Moreover, this analysis suggested that the conserved predicted motifs might play some vital functional roles in these AGO proteins. In RDR protein family, the MEME analysis exhibited that the least 6 conserved motifs in CsRDR3 coincided with the AtRDR3. Among other CsRDRs proteins, the CsRDR2 and CsRDR6 lighted 20 out of 20 conserved motifs which are well distributed on the RdRP domain and the CsRDR1 had 18 out of 20 conserved motifs (Fig. (5)). Although the predicted motifs were well conserved in the major part of the RDR domain, the motif schemes of different RDR subfamilies did not follow the same distributional pattern. The RDR proteins also reflected some additional motifs besides the RdRP domain having the unknown functional role. However, the MEME-suite analysis reflexed that the CsDCLs, CsAGOs and CsRDRs proteins are enriched with well conservation and distribution of the motifs throughout the subfamilies. This analysis suggested that the predicted motifs might play a different vital role in the functional importance of these genes in C. sinensis in different life stages.

Gene Structure and Genomic location of CsDCLs, CsAGOs and CsRDRs
To observe the gene structure of the predicted CsDCLs, CsAGOs and CsRDRs gene, their exon-intron configuration was explored by using GSDS with the respective genes family of Arabidopsis. The exon-intron configuration of the predicted genes represented higher conservation as expected for that of DCLs, AGOs and RDRs genes in the model plant Arabidopsis (Fig. (6)). The gene structure of CsDCLs exhibited having the number of intron 18-25 (Table (1  On the other hand, out of eight CsAGOs, six genes displayed 20 or 21 introns in the gene structure except the CsAGO5b and CsAGO7 having the number of introns 10 and 2 respectively (Fig. (6)). This structure indicated that CsAGOs genes are highly similar to the AtAGOs. The CsRDRs showed up with the equal number of introns with their paralogs from Arabidopsis, except the CsRDR3 having a number of intron 18 which is just one short than introns in AtRDR3.
The genomic location of the predicted RNAi pathway related genes in C. sinensis was conducted by observing the position of the genes in different scaffold location. The predicted CsDCLs, CsAGOs and CsRDRs genes were distributed among the 15 different scaffolds through the entire genome (Supplementary Table S1). All proteins have a unique scaffold position while the two CsAGO genes (CsAGO5a and CsAGO5b) were placed in the scafold000595 in the different location. Most of the genes are situated from scafold00001 to scafold00674 where the CsAGO5c is in the scafold03700.

Gene Ontology Enrichment Analyses
In order to better understand the biological roles of the predicted RNAi genes and characterize them, GO enrichment analysis was performed (Fig. (7), Supplementary Data S4). From the analysis result it was observed that 12 genes involved in posttranscriptional gene silencing (PTGS) pathway (GO:0016441; p-value:3.60e-27), 10 related to RNA interference (GO:0016246; p-value: 8.50e-25) and 12 genes are associated with gene silencing (GO:0016458; p-value:7.40e-24). The RNAi is closely related to the phenomenon named post-transcriptional gene silencing (PTGS) in plants [62].  Data S4).
The Ven Diagram of the GO terms for three clusters of the RNAi genes was drawn (Fig. (7)). It was observed that the CsDCL, CsAGO and CsRDR genes shared significant number of GO pathways in common. In biological process, there are 89 GO enrichment pathways (Fig. (7)) were shared by the reported proteins, which indicate the involvement of the RNAi gene members in numerous biological processes together. Also in molecular function and cellular component, the predicted genes exhibited a group of mutual GO pathways. So this GO analysis provides a significant indication of the predicted RNAi genes in this study.

Sub-cellular Localization of the Reported Genes and Proteins
The subcellular localization studies of the predicted proteins were observed to uncover their cellular appearance. By the subcellular localization annotation, it has been shown that all the reported proteins identified in this study appear into the cytosol ( Fig. (8)). As PTGS occurs into the cytoplasmic region [66], this result implies that the reported RNAi proteins may directly involve to the PTGS process. On the other hand, four CsAGO and one CsRDR proteins exhibited their appearance into the nucleus when as no CsDCLs are located there. These bring a significant importance whether the CsDCLs are not found in nucleus. Further expression pattern analysis will provide deeper information about the CsDCLs. Some of the identified RNAi proteins are also distributed into the cell membrane, plastid and mitochondria (Fig. (8)).  Table S2).
Previous studies reported that the RNAi genes are not only highly related with PTGS but also with transcriptional gene silencing (TGS) [66]. In protein transcriptional process, RNA polymerase type II complexes are directly involved [67]. For PTGS, the RNAi proteins have greater participation in RNA-induced silencing complex (RISC-mediated) mediated cleavage activities by the help of DCL, AGO and RDR proteins with other molecules [63]. The PTGS happens into the cytoplasmic region for targeted mRNA protein degradation [67].

Regulatory Relationship between TFs and RNAi genes
TFs are considered as the switch of gene expression in all living organism. Plant TFs play significant role in growth, development and stress response activities [68]. Thus, identifying TFs regulating RNAi genes could help to improve our understanding of gene silencing process in C. sinensis. In this analysis a total of 235 TFs were identified those regulate the predicted RNAi genes (Supplementary Data S5). The identified TFs were distributed into 27 groups based on the TF families.
The  Table S3). This finding indicated that those TFs could be important in regulating RNAi genes. To better understand the regulatory relationship between TFs and RNAi genes, a regulatory network was constructed ( Fig.   (9A)). From the resultant network it is observed that different groups of TFs exhibited distinct structure. For example, TF belongs to ERF family mainly bind to the gene CsAGO5a (Fig. (9B), Supplementary Fig. S2). However some RNAi genes such as CsAGO5c, CsDCL4 and others are regulated by ERF family; all of them are also linked to CsAGO5a (Fig. (10A)).
Very similar results were also observed for the hub TFs NAC, WRKY and bZIP (Fig. (10B-D)). Moreover, six hub TFs were identified on the basis of node degree which have more than five interacting partners with the RNAi genes. Among them 3 belongs to dof family and other three are in MIKC_MSDS, C2H2 and bZIP (Fig. (10E)). The Dof TFs family is directly involved with the DNA binding activities by the N-and C-terminal region and causes the regulation of gene activation or repression of the target genes which is the main theme of RNAi. The Dof TFs family also works for the biosynthesis of flavonoids and glucosinolates, stress tolerance, seed germination and controlling the photoperiodic flowering [69][70][71][72]. The TF orange1.1g027691m (MIKC_MADS) regulates heights seven RNAi genes and the rest regulate five RNAi genes in the network (Fig. (10E)). The regulatory network clearly exposes that these predicted genes of RNAi process in C. sinensis will exhibit a dramatically expression pattern that can be retrieved by deeper investigation of these genes in future.
Besides, all of the hubs TFs are connected to eight RNAi genes. Of RNAi genes corresponding to hub TFs, five are AGO two are DCL and only one is RDR. Three RNAi genes (2 AGO: CsAGO10, CsAGO7; 1 DCL: CsDCL1) were predicted to be regulated by the entire six hub TFs.

Cis-acting regulatory element Analysis
The cis-acting regulatory element analyses were conducted to find out the functional diversity of the motifs related to the promoter region of the proposed RNAi genes into C. sinenesis. The PlantCARE database provided the information about the motifs and their functionality with the genes. The analysis revealed that most of the motifs were light responsive (LR) (Fig.   12), widely presents in the entire RNAi gene's promoter. Supporting the EST analysis, the light responsiveness is associated with the photosynthesis which occurs in leaf. Among the light responsive motifs the ATCT-motif, ATC-motif, Box-4, AEbox, G-box, I-box, GAT-motif, GT1-motif were shared by the most of the RNAi genes in C. sinensis (Fig. 12). The TC-rich repeats (cis-acting element involved in defense and stress responsiveness), MBS (MYB binding site involved in droughtinducibility) and LTR elements (cis-acting element involved in low-temperature responsiveness) were commonly found as stress responsive motif among the CsDCL/AOG/RDR genes in C. sinensis. It is known that the plant hormones are essential for plant growth and development. Fig. (12). The cis-regulatory element in the promoter region of the identified C. sinenis DCLs, AGOs and RDRs genes respectively. The deep color represents the presence of that element with the corresponding genes.
There are some others significant elements were identified and represented as others activities. The AT-rich element (binding site of AT-rich DNA binding protein (ATBP-1)), AT-rich sequence (element for maximal elicitor-mediated activation), CAAT-box (common cis-acting element in promoter and enhancer regions), CAT-box (cis-acting regulatory element related to meristem expression), CCAAT-box (MYBHv1 binding site), GCN4_motif (cis-regulatory element involved in endosperm expression), TATA-box (core promoter element around -30 of transcription start), circadian (cis-acting regulatory element involved in circadian control), silencer (GT-1 factor binding site) and TGACG-motif (cis-acting regulatory element involved in the MeJA-responsiveness) were recognized as others cis-acting regulatory element shared by RNAi genes in C. sinensis (Fig 12). Some unknown cis-regulatory elements were also detected along with the reported cis-elements (Supplementary   Table S6). In general the cis-regulatory elements carried out significant evidences about the proposed RNAi genes that will be helpful for further investigation about their role in plant disease response, growth and development.

In silico Expressed Sequence Tag (EST) Analysis
For the important and valuable information about the gene expression, the expressed sequence tag (EST) data analysis can bring a significant piece. This analysis helps to validate the genes activities and to investigate the activities in various stress conditions. The EST mining results from the PlantGDB database indicated that the RNAi genes of C. sinensis expressed in multiple important tissue and organs. However, the expression evidence of the RNAi genes in several species has been reported which showed that the RNAi genes have expression in leaf, root, flower, seeds and in others tissue [13,[23][24][25][26][27][28][29][30]73].
The in silico expressed sequence tag (EST) analysis of the proposed C. sinensis RNAi genes showed that all the genes have their expression at least in one organ or tissue. In general, most of the genes were expressed in leaf and fruit which indicate that these genes have a direct involvement with the photosynthesis and fruit developmental stages in C. sinensis (Fig. 11). Fig. (11). The in silico expressed sequence tag (EST) analysis of the identified RNAi genes in C. sinensis plant. The green color represents the existence of expression and off color stands for absent of expression.
The expression of CsDCLs were identified in leafs (CsDCL1/3/4), flowers (CsDCL2/4), fruit (CsDCL1/2/4), ovule (CsDCL1/3/4) and bark (CsDCL3), where no expression was found in root. The entire CsAGOs exhibited diverse expression pattern all over the plant through root, leafs, flowers, ovule, fruit, fruit rind and seed. Among the CsAGO genes, CsAGO1 and CsAGO4 were detected in maximum organs in C. sinensis. The only CsAGO1/4 provided the expression evidence into the seeds. Similarly the others RNAi genes, the CsRDRs also highly expressed into the leaf, flower and fruit when the CsRDR6 showed expression in ovule and bark. Interestingly, no expression evidence was found for the CsRDR3 in this in silico EST analysis (Fig. 11). According to the EST analysis, the proposed RNAi genes have vast contribution in ovule fertilization, fruit development process, plant photosynthesis which can be dugout by experiment expression validation.

Conclusion
The orange is considered as the second highest produced fruits all over the world. The C. sinensis plant is the major source of sweet orange which is one of the most favourite and nutritious fruits. Identification, characterization and diversity analyses of the RNAi gene families were essential, since these families play the vital role for silencing of other gene families of any plant. In this study, a number of bioinformatics approaches were used to identify, characterize and analyse the diversification, properties and biological functionality of RNAi genes in C. sinensis. By this ways, we identified 4 CsDCL, 8 CsAGO and 4 CsRDR genes as the RNAi gene families in sweet orange. This study also provided additional genomic and physicochemical information of the predicted genes and corresponding proteins. With the phylogenetic analysis, the subgroups of the three gene families were exhibited, the domains and motifs configuration and the gene structures revealed the maximum homogeneity with the respective gene family of Arabidopsis. Moreover, the GO enrichment and subcellular localization analysis provided the final confirmation about the reported genes and protein that, these are the key factor of RNAi process in C. sinensis. In this analysis, we explored regulatory relationship network between TFs and proposed RNAi genes. Potential targets of TFs were identified those involved in plant growth and development as well as controlling the gene expression or suppression related to RNAi process. The cis-acting regulatory elements and expressed sequence tag (EST) analysis indicates that our reported RNAi genes have diverse involvement into the orange plant growth, flowering, and development processes.
Thus the reported genes in this study may exhibit significant expression pattern under different stress conditions in various developmental stages of sweet orange like the other plants that were investigated by other researchers. Therefore, the findings of this study may provide a basis for further research on RNAi pathway genes in C. sinensis to enrich the ultimate sweet orange plant development and fruit production all dover the world.

Supplementary Files
Supplementary datasets, tables and figures are available at www.bbcba.org/softwares/CsRNAi.zip.