Genome-Wide Analysis of Stress-Resistance Heat Shock Protein 70 (HSP70) Genes in Sorghum (Sorghum bicolor L.)

Heat shock proteins (HSP70s) play important roles in many biological processes. However, systematic identification of HSP70 genes in Sorghum bicolor is very limited, and the role of the HSP70 gene family in the evolution of Sorghum bicolor has not been systematically described. To fill this gap, an in silico analysis of the HSP70 gene family was conducted. Using bioinformatics methods, 30 HSP70 genes were identified from the genome sequence of Sorghum bicolor. These 30 genes were comprehensively analyzed for gene structure, phylogeny, physicochemical properties, subcellular localization, and promoter cis-regulatory elements. Gene structure visualization revealed that 22 genes contain both 5' and 3' UTRs, two genes contain only a single UTR (one 5' and one 3'), and six genes lack UTRs. The highest number of introns in a single gene was 12, while two genes contain no introns. Ten conserved protein motifs were identified and characterized, and promoter analysis identified 2,219 cis-acting elements, among which promoter-related elements were the most numerous (1,411), followed by light-responsive elements (335). Physicochemical analysis revealed that 23 of the proteins are acidic, four are basic, and the remaining three are neutral. Overall, these analyses revealed the structural organization, subcellular localization, physicochemical properties, cis-acting elements, and phylogenetic relationships of the sorghum HSP70 genes and their potential roles under stress conditions. This study provides information for the functional characterization of HSP70 and helps to understand the mechanisms of abiotic stress tolerance in Sorghum bicolor under diverse stress conditions.


INTRODUCTION
Sorghum (Sorghum bicolor L.) is a multipurpose food crop belonging to the Poaceae family; it is a C4 plant with high photosynthetic efficiency and productivity. It ranks among the top five cereal crops in the world and serves as a source of food, fodder, feed, and bioenergy [2]. It plays a major role in the food security of sub-Saharan Africa, supporting about 500 million people. It is cultivated in drought-prone and marginal areas of semi-arid zones where other crops cannot be grown reliably [1,17].
The most critical abiotic stresses that hamper sorghum production include nutrient deficiency, aluminum stress, drought, high salinity, waterlogging, and temperature stress, all of which plants must adapt to during development [39].
If stress exceeds a plant's tolerance limits, it can lead to impaired development, reduced crop yields, permanent damage, or death [25]. Plant survival is restricted to an approximate temperature range of −10 to +60°C, bounded by the freezing point of intracellular water and the temperature of protein denaturation [38]. Such abiotic stresses have been shown to induce the accumulation of various intracellular substances, including nucleic acids, amino acids, carbohydrates, and proteins. Following the introduction of molecular biology techniques into plant science, a great deal of effort went into the discovery of stress-inducible genes [13].
Consequently, understanding the abiotic stress response is now considered one of the most important topics in plant science. The main progress in this research field has come from the use of molecular biology techniques. Since these techniques were introduced into plant science, various abiotic stress-inducible genes have been isolated and their functions characterized in transgenic plants. This information has broadened our perspective on abiotic stress responses and tolerance in plants [13]. Plants respond to heat and other abiotic stresses by inducing genes that are not expressed under normal conditions, which protect them against the abnormal conditions. Similar molecular stress responses are found across living systems, including microorganisms, plants, and animals [12,35].
Genes induced during heat stress encode proteins called heat shock proteins (HSPs), stress-induced proteins, or stress proteins [10]. HSPs are expressed in plants not only under high-temperature stress but also in response to a wide range of other environmental stresses, such as water stress, salt stress, cold stress, and oxidative stress [39,43].
These stress-responsive proteins are classified into groups based on their function and expression pattern, such as constitutive heat shock proteins, which are expressed constitutively, and inducible forms, which are expressed in response to specific stimuli [4]. They have also been grouped by protein molecular weight into HSP90 (83∼110 kDa), HSP70 (66∼78 kDa), HSP60 (58∼65 kDa), and the small-molecular-weight HSP20s [44]. HSP70 is characterized by two functional domains: an N-terminal ATPase domain (44 kDa) with ATPase activity and a C-terminal peptide-binding domain (25 kDa). The peptide-binding domain is further divided into a β-sandwich subdomain (18 kDa), which binds the substrate, and an α-helical subdomain [26,47]. The 70-kDa heat shock proteins (HSP70s) are among the most abundant and highly conserved groups of proteins. sHSPs are small-molecular-weight proteins that function as molecular chaperones critical for protein folding and the prevention of irreversible protein aggregation [44]. Although sHSPs are prominently expressed during the heat shock response in plants, it is now known that some are also expressed in unstressed cells and are therefore involved in processes other than heat stress [40,44]. For instance, they are upregulated during the ripening initiation of tomato fruit [37] and may also protect ripe tomato fruit against chilling injury [31]. In sorghum, sHSPs/HSP20s have been identified as molecular chaperones that maintain proper folding, trafficking, and disaggregation of proteins under diverse abiotic stress conditions [21].

The Problem Statement
Although HSPs are prominently expressed during the heat shock response in plants, it is now known that some are also expressed in unstressed cells and are therefore involved in processes other than heat stress [40,44]. Several studies have revealed that HSP70 is closely associated with plant abiotic stress [41], disease resistance [22], and growth and development [42]. HSP70s have also been linked to heat tolerance in various plant species, such as Triticum aestivum [7], Oryza sativa [35], and Capsicum annuum [9].
Small-molecular-weight HSPs function as molecular chaperones critical for protein folding and the prevention of irreversible protein aggregation [44]. When a plant suffers from high temperature, drought, high salinity, low temperature, or heavy metals, HSP70s rapidly accumulate to maintain the stability of proteins and other biological macromolecules, improving the plant's stress resistance [43]. In addition, some studies have found that HSPs are associated with plant embryogenesis. Small HSPs are also upregulated during the ripening initiation of tomato fruit [37] and may also protect ripe tomato fruit against chilling injury [31]. The relationship between heat shock treatment and embryogenesis was also studied in Brassica napus, where nuclear and cytoplasmic HSP70 and HSP90 were found to be rapidly induced [36].
Despite breakthroughs in the identification and characterization of plant HSPs, few studies have examined HSP70-mediated stress tolerance in sorghum, one of the most widely cultivated and stress-tolerant cereal crops in the world, particularly in sub-Saharan countries [17]. Identifying and characterizing HSP70 genes in sorghum will therefore aid in better understanding the molecular mechanism of its stress tolerance. In sorghum, the existence of HSP70 was verified by immunoblotting following salt stress [28,29] and heat stress [26], and Nagaraju et al. [27] identified the sHSP family, which functions as chaperones. However, the authors could not find any literature describing the characterization of HSP70 genes at the genome-wide level. Because various stresses affect plants, investigating the mechanisms of plant stress responses is crucial, and a genome-wide analysis of sorghum HSP70 genes will help to reveal the underlying complex molecular mechanisms. The published sorghum genome data enable systematic analyses of HSP70 evolution and function.
In this study, bioinformatics methods were used to analyze the genomic HSP70 gene family members of sorghum, including gene identification, phylogenetic relationships, gene structural features (exon-intron organization), and the subcellular localization of HSP70 proteins. Applying the approaches recommended on the basis of these results should benefit sorghum improvement.
Molecular approaches are among the breeding methods that seek solutions to various breeding bottlenecks. Understanding sorghum HSP70s is therefore of great importance, since sorghum is severely affected by heat stress, particularly during the grain filling stage.
This groundwork should therefore be useful for the functional identification of HSP70 genes and for their application in gene engineering and in breeding more adaptable sorghum cultivars.

Study Materials
The study materials consisted of the proteomic and genomic sequences of sorghum, maize, wheat, rice, sugarcane, millet, and Arabidopsis. Online and offline software, including TBtools, MEGA X, and Microsoft Excel, and web-based databases and tools such as NCBI, MEME, PlantCARE, Pfam, GSDS, Phytozome, ExPASy, and Cello life were used.

Database Mining (HSP70 Family Genes)
The whole sorghum genome sequence was downloaded from the Phytozome annotation database.

Gene Structure Display for HSP70 Genes
The Gene Structure Display Server (http://gsds.gao-lab.org/index.php) was used to analyze exon-intron structures [14]. Besides the exon and intron regions, the upstream and downstream untranslated regions (UTRs) were also determined to show the possible structures of the complete mRNAs. Introns were classified by phase according to their position relative to the reading frame of the translated protein: phase 0 (located between two codons), phase 1 (splitting a codon between its first and second nucleotides), or phase 2 (splitting a codon between its second and third nucleotides).
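The phase definitions above reduce to one rule: an intron's phase is the number of coding nucleotides upstream of it, modulo 3. The sketch below assumes the input is the list of coding (CDS) segment lengths of each exon, in transcript order starting at the start codon; introns located in UTRs have no phase and are not covered. The function name is hypothetical and is not part of GSDS.

```python
def intron_phases(cds_exon_lengths):
    """Return the phase of each intron between consecutive coding exons.

    cds_exon_lengths: coding lengths (nt) of the exons in transcript order,
    starting at the start codon (an assumed input format).
    """
    phases = []
    coding_so_far = 0
    # One intron follows every coding exon except the last one.
    for length in cds_exon_lengths[:-1]:
        coding_so_far += length
        # 0 = between codons, 1 = after codon position 1, 2 = after position 2
        phases.append(coding_so_far % 3)
    return phases


# Hypothetical gene with three coding exons of 100, 50 and 200 nt.
print(intron_phases([100, 50, 200]))  # [1, 0]
```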

Protein Motif and Cis Regulatory Elements Prediction of HSP70 Genes
To find conserved motifs in sorghum HSP70 gene family members, the MEME Suite (version 5.3.3) (https://meme-suite.org/meme/tools/meme) [3] was used to search all HSP70 sequences downloaded from Phytozome.

Gene Localization and Gene Duplication Analysis
The amino acid sequences of tandemly and segmentally duplicated HSP70 genes were analyzed with TBtools to determine synonymous (Ks) and nonsynonymous (Ka) substitution rates. Assuming a rate of $6.1 \times 10^{-9}$ substitutions per site per year, the divergence time (T) was calculated as $T = K_s / (2 \times 6.1 \times 10^{-9}) \times 10^{-6}$ million years ago (Mya). The chromosomal locations of the genes were analyzed with PhenoGram (Ritchie Lab), which marks specific base-pair locations with colored lines. PhenoGram allows chromosomal locations and/or regions to be annotated with shapes in different colors, gene identifiers, or other text.
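The divergence-time formula can be checked with a few lines of arithmetic. This is a minimal sketch of the stated formula only; the function name and the example Ks value are hypothetical and are not taken from the study's tables.

```python
# T (Mya) = Ks / (2 * rate) * 1e-6, with the rate given in the text.
RATE_PER_SITE_PER_YEAR = 6.1e-9


def divergence_time_mya(ks: float, rate: float = RATE_PER_SITE_PER_YEAR) -> float:
    """Convert a synonymous distance Ks into a divergence time in Mya."""
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    years = ks / (2 * rate)  # factor 2: both lineages accumulate substitutions
    return years * 1e-6      # years -> million years


# Hypothetical Ks value for illustration.
print(f"{divergence_time_mya(0.122):.2f} Mya")  # 10.00 Mya
```

The factor of 2 reflects that substitutions accumulate independently along both lineages after the duplication event.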

Protein Sub-Cellular Localization Prediction and Physico-Chemical Properties
Protein subcellular localization is crucial for genome annotation and protein function prediction.
Therefore, the subcellular localization of the proteins was predicted using Cello life. Physicochemical features such as molecular mass, isoelectric point, instability index, aliphatic index, and grand average of hydropathicity (GRAVY) were computed using ExPASy ProtParam (https://web.expasy.org/protparam).
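Two of the ProtParam indices have simple closed forms. GRAVY is the mean Kyte-Doolittle hydropathy value over all residues. The aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The sketch below reimplements only these two formulas for standard residues. It is not ProtParam itself, and the function names are hypothetical.

```python
# Kyte & Doolittle hydropathy values used for GRAVY.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity; non-standard residues raise KeyError."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percents of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_pct(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


toy = "AVIL"  # toy peptide, not a sorghum HSP70 sequence
print(round(gravy(toy), 3), round(aliphatic_index(toy), 1))  # 3.575 292.5
```

For full ProtParam-style output, Biopython's `ProteinAnalysis` class offers `gravy()`, `instability_index()`, and `isoelectric_point()`.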

Identification of the HSP70 Gene Family in Sorghum bicolor
A systematic search of the sorghum genome identified 39 candidate genes encoding SbHSP70 proteins. Among these, eight genes were redundant versions of other genes and one was a fragmented sequence, so they were removed. The remaining proteins were then searched with the Pfam tool to confirm the presence of specific HSP70 domains. In total, nine genes were removed and 30 HSP70 genes were identified. All nonredundant HSP70 genes were distributed on chromosomes 1, 2, 3, 4, 6, 8, 9, and 10 of sorghum. The largest number of genes (14) was located on chromosome 1, and four genes were located on chromosome 9 (Table 3 and Figure 3). Previous studies have identified varying numbers of HSP70 genes in different plant species, for instance, 17 HSP70s in Hordeum vulgare [5], 20 StHSP70s in Solanum tuberosum [19], 21 CaHSP70s in Capsicum annuum [8], 24 PvHSP70s in Phaseolus vulgaris, and 61 HSP70s in Glycine max [44]. This variation in the size of the plant HSP70 family may be related to the presence of additional organelles, such as plastids, in plant cells compared with other eukaryotic organisms.
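The text does not state how redundant and fragmented sequences were recognized. The sketch below shows one possible way to implement the three filtering steps under these assumptions: redundancy means alternative transcripts of the same locus (Phytozome-style IDs ending in `.1`, `.2`, and so on), a hypothetical minimum length marks a sequence as fragmentary, and a set of transcripts with a Pfam HSP70 (PF00012) hit has already been computed. The function name, the cut-off, and the toy IDs are illustrative and do not describe the study's own procedure.

```python
def filter_hsp70_candidates(proteins, pfam_hits, min_length=400):
    """Hypothetical three-step filter for HSP70 candidates.

    proteins:   {transcript_id: amino-acid sequence}; IDs are assumed to end
                in a transcript suffix such as ".1" or ".2".
    pfam_hits:  set of transcript IDs with a Pfam HSP70 (PF00012) hit.
    min_length: assumed cut-off (aa) below which a sequence is a fragment.
    Returns the sorted list of retained locus IDs.
    """
    # Step 1: collapse redundant transcripts, keeping the longest per locus.
    longest = {}
    for tid, seq in proteins.items():
        locus = tid.rsplit(".", 1)[0]
        if locus not in longest or len(seq) > len(proteins[longest[locus]]):
            longest[locus] = tid

    kept = []
    for locus, tid in sorted(longest.items()):
        if len(proteins[tid]) < min_length:  # Step 2: drop fragments
            continue
        if tid not in pfam_hits:             # Step 3: require the HSP70 domain
            continue
        kept.append(locus)
    return kept


# Toy input with made-up IDs and placeholder sequences.
toy_proteins = {
    "Sobic.001G000100.1": "M" * 600,
    "Sobic.001G000100.2": "M" * 650,  # longer isoform of the same locus
    "Sobic.002G000200.1": "M" * 120,  # too short: treated as a fragment
    "Sobic.003G000300.1": "M" * 620,  # no PF00012 hit
}
toy_hits = {"Sobic.001G000100.2", "Sobic.002G000200.1"}
print(filter_hsp70_candidates(toy_proteins, toy_hits))  # ['Sobic.001G000100']
```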

Gene Structure Display for HSP70 Genes
The gene structures of the 30 SbHSP70 genes were determined using both genomic and coding sequences. Twenty-two genes have both UTRs (5' and 3'), two genes have only one UTR each (either 5' or 3'), and six genes have no UTR. The maximum number of introns (12) was found in Sobic.004G263500, followed by eight introns each in Sobic.004G011700, Sobic.009G066900, Sobic.009G067000, and Sobic.010G230600. No introns were found in two SbHSP70 genes (Sobic.002G008000 and Sobic.003G378700) (Figure 1 and Table 4).
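The UTR categories and intron counts above can be derived from a gene annotation. The sketch below assumes GFF3-style feature names (`exon`, `five_prime_UTR`, `three_prime_UTR`) given as simple tuples for one transcript. It counts introns as the gaps between consecutive exons. The input format, function name, and toy coordinates are hypothetical.

```python
from collections import Counter


def summarize_structure(features):
    """Classify one transcript by UTR content and count its introns.

    features: list of (feature_type, start, end) tuples (assumed format).
    """
    types = {ftype for ftype, _, _ in features}
    n_exons = sum(1 for ftype, _, _ in features if ftype == "exon")
    has_5utr = "five_prime_UTR" in types
    has_3utr = "three_prime_UTR" in types
    if has_5utr and has_3utr:
        utr_class = "both UTRs"
    elif has_5utr or has_3utr:
        utr_class = "one UTR"
    else:
        utr_class = "no UTR"
    # Introns are the gaps between consecutive exons of the same transcript.
    n_introns = max(n_exons - 1, 0)
    return utr_class, n_introns


# Toy transcripts with hypothetical coordinates.
toy = {
    "gene_a": [("five_prime_UTR", 1, 40), ("exon", 1, 300),
               ("exon", 420, 900), ("three_prime_UTR", 850, 900)],
    "gene_b": [("exon", 1, 1950)],
}
summary = {gene: summarize_structure(f) for gene, f in toy.items()}
print(summary)
print(Counter(utr for utr, _ in summary.values()))
```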
The structure of a gene determines its coding potential and can also give hints about its ancestry, since genes with similar structures probably evolved from a common ancestor [18]. The effect of introns on gene transcription is an evolutionarily conserved feature, occurring in organisms as diverse as yeast, plants, and mammals. Intron-containing genes are often transcribed more efficiently than intronless genes; therefore, the presence of introns is generally associated with increased protein production, mediated through many different mechanisms ranging from increased transcription and translation rates to improved mRNA stability and folding [24].

Protein Motif and Cis Regulatory Element Prediction of HSP70 Genes
MEME was used to analyze the conserved motifs of the different groups of protein sequences, and 10 conserved motifs were identified (Figure 2), which were named Motifs 1 to 10 (Table 2).
In the promoter region analysis, the regions upstream of the transcription start sites of the HSP70 genes contained a total of 2,219 cis-acting elements, identified with the PlantCARE cis-acting element database (Table 6). They were categorized as growth hormone-related (286), light-responsive (335), promoter-related (1,411), development/cell cycle-related (7), drought-related (20), metabolism-related (72), seed-specific regulation-related (16), binding site-related (18), temperature-related (19), stress defense-related (20), and anoxia-related (15) elements. The details of the cis-acting elements are presented in Figure- (Table 6). These results are consistent with previous reports on HSP70 genes in Glycine max [45] and PvHSP70 genes in Phaseolus vulgaris.
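Grouping PlantCARE hits into functional categories is a simple tally, and the reported category counts should add up to the stated total. The sketch below checks that sum and shows the tally step. The element-to-category mapping covers only a few well-known PlantCARE elements and is an illustrative assumption, since the study's full mapping is not given. The function name is hypothetical.

```python
from collections import Counter

# Category counts as reported in the text.
REPORTED_COUNTS = {
    "growth hormone-related": 286,
    "light-responsive": 335,
    "promoter-related": 1411,
    "development/cell cycle-related": 7,
    "drought-related": 20,
    "metabolism-related": 72,
    "seed-specific regulation-related": 16,
    "binding site-related": 18,
    "temperature-related": 19,
    "stress defense-related": 20,
    "anoxia-related": 15,
}

# Illustrative (assumed) mapping for a few common PlantCARE elements.
CATEGORY_OF = {
    "TATA-box": "promoter-related",
    "CAAT-box": "promoter-related",
    "G-box": "light-responsive",
    "ABRE": "growth hormone-related",
    "LTR": "temperature-related",
    "MBS": "drought-related",
    "ARE": "anoxia-related",
}


def tally_categories(element_hits, category_of=CATEGORY_OF):
    """Count element hits per category; unknown names go to 'unclassified'."""
    return Counter(category_of.get(name, "unclassified") for name in element_hits)


total = sum(REPORTED_COUNTS.values())
print(total)  # 2219, matching the total stated in the text
print(tally_categories(["TATA-box", "CAAT-box", "G-box", "LTR", "W-box"]))
```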

Physico-Chemical Properties of HSP70 Protein in Sorghum bicolor
The analyzed physical and chemical properties of HSP70 protein in sorghum are presented in
Hence, it could be hypothesized that molecular variation within the characterized genes plays a fundamental role in their biochemical and physiological functions and could provide candidate gene-based markers closely associated with traits of interest.
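The acidic, basic, and neutral grouping of the proteins depends on their isoelectric points (pI) and on a cut-off that the text does not state. The sketch below estimates pI by bisection on the net charge given by the Henderson-Hasselbalch equation, then classifies it using assumed thresholds (acidic below 6.5, basic above 7.5). The pKa set is a common textbook set and is itself an assumption. ProtParam uses the Bjellqvist scale instead, so these values will differ somewhat from ProtParam output. All names are hypothetical.

```python
# Assumed textbook pKa values (not the Bjellqvist scale used by ProtParam).
PKA_POS = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEG = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def net_charge(seq, ph):
    """Approximate net charge of a peptide at a given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg


def isoelectric_point(seq, tol=1e-4):
    """Bisection for the pH at which the net charge is zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def charge_class(pi, acidic_below=6.5, basic_above=7.5):
    """Assumed thresholds; the study's own cut-off is not stated."""
    if pi < acidic_below:
        return "acidic"
    if pi > basic_above:
        return "basic"
    return "neutral"


for toy in ("DDDD", "KKKK"):  # toy peptides, not sorghum sequences
    pi = isoelectric_point(toy)
    print(toy, round(pi, 2), charge_class(pi))
```

Bisection works here because the net charge decreases monotonically as the pH rises.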

Predicted Proteins Sub-Cellular Localization
Subcellular localization analysis with Cello life revealed that the sorghum HSP70 proteins are distributed across five compartments: cytoplasm (Cyto), endoplasmic reticulum (ER), chloroplast (Chlo), nucleus (Nucl), and mitochondria (Mito). Most proteins were predicted to be cytoplasmic/cytosolic (17), followed by four each in the chloroplast and mitochondria, three in the endoplasmic reticulum, and two in the nucleus (Table 3). The 17 cytosolic HSP70 proteins found in Sorghum bicolor in this study are more than previously reported in Oryza sativa [12] and Arabidopsis thaliana (five) [6], but fewer than in Glycine max (34) [34,35,46].

Gene Localization and Gene Duplication Analysis
The 30 identified HSP70 genes were distributed on eight chromosomes of Sorghum bicolor (Table 5).
Gene duplications play a significant role in evolution, as duplications give rise to gene families [15]. Indeed, it has been suggested that tandem and segmental duplications have been the primary driving force of evolution, as these events lead to the expansion of gene families and the generation of proteins with novel functions [5]. Tandem duplication involves the duplication of two or more genes located on the same chromosome, whereas segmental duplication refers to the duplication of genes that belong to the same clade but are located on different chromosomes [20].
In the current study, a total of eighteen (18/30; 60%) Sorghum bicolor HSP70 genes were shown to be duplicated (Table 5). Seven gene pairs appeared to be tandemly duplicated, all on chromosome 1 (Figure 3). The remaining duplicated genes were all segmentally duplicated.
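The tandem and segmental definitions above depend only on whether the two genes share a chromosome, and sorghum locus IDs encode the chromosome number. The sketch below follows that definition. Many studies also require tandem copies to be physically close, which this sketch does not check. It assumes IDs of the form `Sobic.004G263500` for genes anchored to a numbered chromosome. The function names and example pairs are hypothetical and are not the study's duplicated pairs.

```python
def chromosome_of(gene_id):
    """Chromosome number from a locus ID such as 'Sobic.004G263500'."""
    # The field after 'Sobic.' starts with the zero-padded chromosome number before 'G'.
    return int(gene_id.split(".")[1].split("G")[0])


def duplication_type(gene_a, gene_b):
    """Same chromosome -> tandem, different chromosomes -> segmental."""
    return "tandem" if chromosome_of(gene_a) == chromosome_of(gene_b) else "segmental"


# Hypothetical pairs with made-up IDs.
pairs = [("Sobic.001G000100", "Sobic.001G000200"),
         ("Sobic.001G000100", "Sobic.009G000300")]
for a, b in pairs:
    print(a, b, duplication_type(a, b))
```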
The ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates is a powerful way to investigate the selective constraints on duplicated gene pairs [32]. Hence, Ka, Ks, and Ka/Ks values were calculated for each pair of paralogous genes in this study (Table 4). In principle, Ka/Ks < 1 indicates purifying (negative) selection, Ka/Ks > 1 indicates positive selection, and Ka/Ks = 1 indicates neutral evolution [20]. The Ka/Ks ratios of the 18 duplicated HSP70 genes ranged from 0.040056013 to 0.554428423. All HSP70 genes in the current analysis have Ka/Ks values < 1 (Table 5).
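The Ka/Ks interpretation rule can be written as a small classifier. An estimated ratio almost never equals exactly 1, so the sketch treats an assumed tolerance band around 1 as neutral. The band width and the function name are hypothetical. The example feeds in the two ends of the reported range, expressed with Ks = 1.

```python
def selection_mode(ka, ks, tol=0.05):
    """Interpret Ka/Ks: < 1 purifying, > 1 positive, about 1 neutral.

    tol is an assumed band around 1 that is treated as neutral.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "purifying (negative)" if ratio < 1.0 else "positive"


# Ends of the reported Ka/Ks range, expressed here with Ks = 1.
for ratio in (0.040056013, 0.554428423):
    print(ratio, selection_mode(ratio, 1.0))
```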

Declaration
Ethics Approval and Consent to Participate
Not applicable.

Consent for Publication
Not applicable