Abstract
Non alcoholic fatty liver disease (NAFLD) is the most common metabolic disease in humans, affecting the majority of individuals. In the current investigation, we aim to identify potential key genes linked with NAFLD through bioinformatics analyses of next generation sequencing (NGS) dataset. NGS dataset of GSE135251 from the Gene Expression Omnibus (GEO) database were retrieved. Differentially expressed genes (DEGs) were obtained by DESeq2 package. g:Profiler database was further used to identify the potential gene ontology (GO) and REACTOME pathways. Protein-protein interaction (PPI) network was constructed using the Hippie interactome database. miRNet and NetworkAnalyst databases were used to establish a miRNA-hub gene regulatory network and TF-hub gene regulatory network for the hub genes. Hub genes were verified based on receiver operating characteristic curve (ROC) analysis. Totally, 951 DEGs were identified including 476 up regulated genes and 475 down regulated genes screened in NAFLD and normal control. GO showed that DEGs were significantly enhanced for signaling and regulation of biological quality. REACTOME pathway analysis revealed that DEGs were enriched in signaling by interleukins and extracellular matrix organization. ESR2, JUN, PTN, PTGER3, CEBPB, IKBKG, HSPA8, SFN, CDKN1A and E2F1 were indicated as hub genes from PPI network, miRNA-hub gene regulatory network and TF-hub gene regulatory network. Furthermore, ROC analysis revealed that ESR2, JUN, PTN, PTGER3, CEBPB, IKBKG, HSPA8, SFN, CDKN1A and E2F1 might serve as diagnostic biomarkers in NAFLD. The current investigation provided insights into the molecular mechanism of NAFLD that might be useful in further investigations.
Introduction
Non alcoholic fatty liver disease (NAFLD) has emerged as the main cause of metabolic disorder, but current treatments are still suboptimal [1]. NAFLD is a heterogeneous liver disorder that is characterized by hepatic steatosis to non-alcoholic steatohepatitis (NASH) and cirrhosis [2]. NAFLD are linked with progression of hepatocellular carcinoma [3], cardiovascular diseases [4], cerebrovascular diseases [5], diabetes mellitus [6], dietary rhythm [7], obesity [8], hypertension [9] and renal diseases [10]. Genetic factors can be used to show the pathogenesis of NAFLD [11]. Our understanding of the occurrence and advancement molecular mechanism of NAFLD has been greatly upgraded; however, the cause and potential molecular pathogenesis of NAFLD is still unclear [12]. Therefore, it is necessary to explore novel hub genes and signaling pathways for understanding the molecular mechanism and discovering key biomarkers for NAFLD.
At present, next generation sequencing (NGS) technology is widely used in molecular mechanism study and has a broad range of application in molecular biology [13]. It offers an efficient method for systematically screening NAFLD genes and signaling pathways, and identifying their regulatory mechanisms with bioinformatics [14]. Genes includes heme oxygenase-1 [15], odd-skipped related 1 (Osr1) [16], serine rich splicing factor 3 (SRSF3) [17], lipocalin-type prostaglandin D2 synthase (L-PGDS) [18] and β-Klotho (KLB) [19] are responsible for advancement of NAFLD. Signaling pathways includes LKB1/AMPK signaling pathway [20], Nrf2 signaling pathway [21], Sirt1/AMPK and NF-κB signaling pathways [22], eIF2α signaling pathway [23] and AMPK and LXR signaling pathways [24] are liable for progression of NAFLD. Therefore, a plenty of valuable clues could be explored for novel studies on the base of these data. Furthermore, many bioinformatics investigation on NAFLD have been produced in recent years [25], which proved that the integrated bioinformatics methods could help us to further investigation and better exploring the molecular mechanisms.
We downloaded NGS dataset GSE135251 [26–27] from Gene Expression Omnibus database (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) [28], which contain expression profiling by high throughput sequencing data from NAFLD samples and normal control samples. We then performed deep bioinformatics analysis, including identifying common DEGs, gene ontology (GO) and pathway enrichment analysis, protein-protein interaction (PPI) network analysis, module analysis, construction and analysis of miRNA-hub gene regulatory network and TF-hub gene regulatory network. The findings were further validated by receiver operating characteristic curve (ROC) analysis. The aim of this study was to identify DEGs and key signaling pathways, and to explore potential candidate biomarkers for the diagnosis and therapeutic targets in NAFLD.
Materials and methods
Data resources
Expression profiling by high throughput sequencing dataset GSE135251 [26–27] was downloaded from the GEO database. The data was produced using a GPL18573 Illumina NextSeq 500 (Homo sapiens). The GSE135251 dataset contained data from 216 samples, including 206 NAFLD samples and 10 normal control samples.
Identification of DEGs
To assess DEGs, using the “DESeq2” package of R software [29]. A moderated t-statistic and a log-odds of DEG was computed for each contrast for each gene. The Benjamini and Hochberg (BH) method was performed to adjust P value to reduce the false positive error [30]. DEGs between NAFLD samples and normal control samples were identified via DESeq2 package of R software with |logFC|□ > 1.295 for up regulated genes, |logFC|□ < -1.368 for down regulated genes and adjust P value <□ 0.05, which were visualized as volcano plots and heat map. “ggplot2” and “gplot” packages of R software was applied to generate volcano plots and heat map.
GO and pathway enrichment analyses of DEGs
g:Profiler (http://biit.cs.ut.ee/gprofiler/) [31] is an online functional annotation tool to provide a comprehensive understanding of biological information of genes and proteins. GO enrichment analysis (http://www.geneontology.org) [32] is a commonly used approach for defining genes and its RNA or protein product to identify unique biological properties of high-throughput genomic data. GO enrichment analysis including biological process (BP), cellular components (CC), and molecular function (MF). REACTOME (https://reactome.org/) [33] pathway database is a collection of databases dealing with genomes, diseases, biological pathways, drugs, and chemical materials. P value□<□0.05 was considered as the threshold.
Construction of the PPI network and module analysis
The online Hippie interactome (http://cbdm-01.zdv.uni-mainz.de/~mschaefer/hippie/) [34] database was used to identify potential interaction among the common DEGs. Cytoscape software version 3.8.2 (http://www.cytoscape.org/) [35] was applied to establish the protein interaction network. The Network Analyzer plug-in was used to explore hub genes were generated using node degree [36], betweenness centrality [37], stress centrality [38] and closeness centrality [39]. The intersect function was used to identify the hub genes. Then, PEWCC1 plug-in [40] was performed to monitor PPI network modules with Cytoscape. The functional enhancement analysis of DEGs in the significant module were performed using g:Profiler.
MiRNA-hub gene regulatory network construction
miRNet database (https://www.mirnet.ca/) [41] is a bioinformatics platform for predicting miRNA-hub gene pairs. In the current investigation, the targets of the hub genes were predicted using fourteen databases: TarBase, miRTarBase, miRecords, miRanda (S mansoni only), miR2Disease, HMDD, PhenomiR, SM2miR, PharmacomiR, EpimiR, starBase, TransmiR, ADmiRE, and TAM 2.0. The screening criterion was that the miRNA target exists in the fourteen databases concurrently. The miRNA-hub gene regulatory network was depicted and visualized using Cytoscape software [35].
TF-hub gene regulatory network construction
NetworkAnalyst database (https://www.networkanalyst.ca/) [42] is a bioinformatics platform for predicting TF-hub gene pairs. In the current investigation, the targets of the hub genes were predicted using ChEA database. The screening criterion was that the TF target exists in the ChEA database concurrently. The TF-hub gene regulatory network was depicted and visualized using Cytoscape software [35].
Receiver operating characteristic curve (ROC) analysis
ROC curve analysis was accomplished to assess the sensitivity (true positive rate) and specificity (true negative rate) of the hub genes for NAFLD diagnosis and we studied how large the area under the curve (AUC) was by using the pROC package in R statistical software [43].
Results
Identification of DEGs
We obtained Expression profiling by high throughput sequencing from newly diagnosed patients with NAFLD and normal controls from the GSE135251 and analyzed DEGs using DESeq2 package of R software. Setting the cut□off criterion as adjust P value <□0.05, |logFC|□> 1.295 for up regulated genes and |logFC|□< -1.368 for down regulated genes, we identified 951 DEGs from GSE135251. Total 951 DEGs, 476 were up regulated and 475 down regulated genes in NAFLD compared with normal control samples and are listed in Table 1. The heatmap and volcano plot are shown in Fig. 1 and and Fig. 2, respectively.
Heat map of differentially expressed genes. Legend on the top left indicate log fold change of genes. (A1 – A206= Non Alcoholic Fatty Liver Disease Samples; B1 – B10= Normal Control Samples)
Volcano plot of differentially expressed genes. Genes with a significant change of more than two-fold were selected. Green dot represented up regulated significant genes and red dot represented down regulated significant genes.
The statistical metrics for key differentially expressed genes (DEGs)
GO and pathway enrichment analyses of DEGs
The online software g:Profiler was used to identify overrepresented GO categories and REACTOME pathways. The up regulated genes and all of the down regulated genes were uploaded to the gene list separately. GO enrichment analysis showed that the up regulated genes were significantly enriched in BP, including signaling and response to stimulus. The down regulated genes were also significantly enriched in BP, including regulation of biological quality and cell development. As for CC, the up regulated genes were enriched in chromatin and integral component of plasma membrane, and the down regulated genes were enriched in cell periphery and membrane. In addition, with regards to MF, the up regulated genes were enriched in DNA-binding transcription activator activity, RNA polymerase II-specific and signaling receptor binding, and the down regulated genes were enriched in calcium ion binding and structural molecule activity. The results are shown in full in Table 2. REACTOME pathway enrichment analysis found significantly enriched pathways. The up regulated genes were enriched in signaling by interleukins and signal transduction, while all of the down regulated genes were enriched in extracellular matrix organization and biological oxidations. The results are shown in full in Table 3.
The enriched GO terms of the up and down regulated differentially expressed genes
The enriched pathway terms of the up and down regulated differentially expressed genes
Construction of the PPI network and module analysis
All of the up regulated genes and the down regulated genes were uploaded to Hippie interactome database to draw the PPI network. A total of 3199 nodes and 5019 edges were analyzed in Cytoscape and is displayed in Fig. 3. The hub genes with high node degree, betweenness centrality, stress centrality and closeness centrality were selected as the hub genes of NAFLD. These hub genes were ESR2, JUN, PTN, PTGER3, CEBPB, IKBKG, HSPA8, SFN, CDKN1A and E2F1. All of the results are shown in Tables 4. The two significant modules are displayed in Fig. 4A and Fig. 4B. There are 38 nodes and 121 edges genes involved in module 1 and 45 nodes, and 96 edges genes involved in module 2. These modules were significantly enriched in signaling by interleukins, signaling, signal transduction, response to stimulus, signaling by receptor tyrosine kinases, chromatin, DNA-binding transcription activator activity, RNA polymerase II-specific, metabolism of lipids and regulation of biological quality.
PPI network of DEGs. Up regulated genes are marked in green color; down regulated genes are marked in red color.
Modules of isolated form PPI of DEGs. (A) The most significant module was obtained from PPI network with 95 nodes and 103 edges for up regulated genes (B) The most significant module was obtained from PPI network with 79 nodes and 131 edges for down regulated genes. Up regulated genes are marked in green; down regulated genes are marked in red
Topology table for up and down regulated genes.
MiRNA-hub gene regulatory network construction
To investigate the molecular mechanisms underlying the hub genes and miRNAs were searched by bioinformatics methods. The miRNAs of hub genes were predicted by miRNet database. The miRNA-hub gene regulatory network is consisted of 2546 nodes and 13598 edges, including 2204 miRNAs and 252 hub genes (Fig. 5). As shown in Table 5, miRNAs of hub genes were displayed. JUN might be the target of 118 miRNAs (ex; hsa-mir-199a-5p), FOXO1 might be the target of 63 miRNAs (ex; hsa-mir-33a-3p), FOXC1 might be the target of 62 miRNAs (ex; hsa-mir-561-3p), SOCS3 might be the target of 48 miRNAs (ex; hsa-mir-22-5p), CEBPB might be the target of 47 miRNAs (ex; hsa-mir-3613-3p), CDKN1A might be the target of 187 miRNAs (ex; hsa-mir-30b-3p), HSPA8 might be the target of 116 miRNAs (ex; hsa-mir-338-5p), FASN might be the target of 116 miRNAs (ex; hsa-mir-103a-3p), E2F1 might be the target of 74 miRNAs (ex; hsa-mir-150-5p) and CDT1 might be the target of 45 miRNAs (ex; hsa-mir-202-3p).
Target gene - miRNA regulatory network between target genes. The yellow color diamond nodes represent the key miRNAs; up regulated genes are marked in green color; down regulated genes are marked in red color.
miRNA - target gene and TF - target gene interaction
TF-hub gene regulatory network construction
To investigate the molecular mechanisms underlying the hub genes and TFs were searched by bioinformatics methods. The TFs of hub genes were predicted by NetworkAnalyst database. The TF-hub gene regulatory network is consisted of 441 nodes and 6738 edges, including 181 TFs and 260 hub genes (Fig. 6). As shown in Table 5, TFs of hub genes were displayed. JUN might be the target of 61 TFs (ex; GATA2), SOCS3 might be the target of 56 TFs (ex; RUNX1), TFAP2A might be the target of 55 TFs (ex; SUZ12), ATF3 might be the target of 48 TFs (ex; NUCKS1), CEBPB might be the target of 46 TFs (ex; FOXP2), CDKN1A might be the target of 57 TFs (ex; TP53), HSPA8 might be the target of 39 TFs (ex; KLF4), FASN might be the target of 39 TFs (ex; EGR1), ZWINT might be the target of 37 TFs (ex; AHR) and BCAR1 might be the target of 37 TFs (ex; TRIM28).
Target gene - TF regulatory network between target genes. The blue color triangle nodes represent the key TFs; up regulated genes are marked in green color; down regulated genes are marked in red color.
ROC curve analyses of hub genes. A) ESR2 B) JUN C) PTN D) PTGER3 E) CEBPB F) IKBKG G) SFN I) CDKN1A J) E2F1
Receiver operating characteristic curve (ROC) analysis
ROC curve analysis using “pROC” in R statistical software packages was performed to calculate the capacity of ten hub genes to distinguish NAFLD samples from normal control samples. ESR2, JUN, PTN, PTGER3, CEBPB, IKBKG, HSPA8, SFN, CDKN1A and E2F1 all exhibited excellent diagnostic efficiency (AUC > 0.88) (Fig. 6).
Discussion
The NAFLD is a metabolic disease in light of molecular characteristics, distinct morphologies, metabolic pathways, therapeutic response and clinical outcomes. Therefore, it is important to explore the mechanisms of NAFLD progression to limit its advancement. The NGS platforms for detection of gene expression have been progressing briskly in diseases advancement, which provides the basis of novel targets discovery for diagnosis and therapy for NAFLD.
In the current investigation, NGS dataset, GSE135251 was extracted to identify the DEGs between NAFLD samples and normal control samples. The results showed that there are 476 up regulated and 475 down regulated genes between the NAFLD samples and normal control samples by using bioinformatics analysis. Altered expression of FOS (Fos proto-oncogene, AP-1 transcription factor subunit) [44], IGFBP1 [45], IL6 [46], RGS1 [47], NR4A3 [48] and SLC22A12 [49] occurs in hypertension, but these genes might be novel targets for NAFLD. Pan et al. [50], McKee et al. [51], Simon et al. [52] and Shi et al. [53] proved that IGFBP1, AREG (amphiregulin), IL6 and AKR1B10 were involved in NAFLD progression. Ross et al. [54] and Lin et al. [55] further demonstrated that IGFBP1 and IL6 was involved in liver cirrhosis. IGFBP1 [56], AREG (amphiregulin) [57], IL6 [58], NR4A3 [59], EEF1A2 [60] and AKR1B10 [61] are implicated in the growth and metastasis of hepatocellular carcinoma. IGFBP1 [62], IL6 [63], RGS1 [64], NR4A3 [65], EEF1A2 [66] and MYH7 [67] acts in the progression of cardiovascular diseases, but these genes might be novel targets for NAFLD. IGFBP1 [68], IL6 [69], NR4A3 [70] and SLC22A12 [49] plays in the development of obesity, but these genes might be novel targets for NAFLD. Increasing evidence suggested that IGFBP1 [71] and IL6 [72] are involved in diabetes mellitus development and progression, but these genes might be novel targets for NAFLD. Recent studies found that AREG (amphiregulin) [73], IL6 [74] and RGS1 [75] plays an important role in the occurrence and development of cerebrovascular diseases, but these genes might be novel targets for NAFLD. IL6 [76] has been demonstrated to function in kidney disease, but this gene might be novel target for NAFLD.
Subsequently, GO annotation and REACTOME pathway enrichment analysis were performed to elucidate the significance of these identified altered expressed genes. Pathways and GO annotation include signal transduction [77], GPCR ligand binding [78], metabolism of lipids [79], fatty acid metabolism [80], signaling [81], chromatin [82], cell periphery [83] and membrane [84] were linked with progression of NAFLD. EGR1 [85], NR4A1 [86], DLK1 [87], SIK1 [88], CCN1 [89], SOCS2 [90], IRS2 [91], GADD45B [92], SOCS3 [93], MMP1 [94], CXCR4 [95], THBS1 [96], FST (follistatin) [97], KLF4 [98], CCL2 [99], OSM (oncostatin M) [100], MSTN (myostatin) [101], EPO (erythropoietin) [102], IL10 [103], JUN (Jun proto-oncogene, AP-1 transcription factor subunit) [104], FOXO1 [105], DKK1 [106], PTN (pleiotrophin) [107], ATF3 [108], SOCS1 [109], GPR183 [110], AKR1B10 [111], TREM2 [112], SCD (stearoyl-CoA desaturase) [113], GCK (glucokinase) [114], LPL (lipoprotein lipase) [115], ANGPTL8 [116], UGT1A1 [117], IL2RA [118], CDKN1A [119], GAS6 [120], ACE2 [121], NLRP6 [122], S1PR4 [123], FADS2 [124], FASN (fatty acid synthase) [125], TIMP3 [126], CHI3L1 [127], ADAMTSL2 [128], FNDC5 [129], MMP9 [130] and TMC4 [131] were involved in progression of NAFLD. EGR1 [132], NR4A2 [133], DUSP1 [134], DLK1 [135], SIK1 [136], CCN1 [137], CEBPD (CCAAT enhancer binding protein delta) [138], SOCS2 [139], IRS2 [91], GADD45B [140], CXCL2 [141], ZFAND5 [142], EGR3 [143], PTGS2 [144], KL (klotho) [145], SOCS3 [146], MMP1 [147], CXCR4 [148], THBS1 [149], SLC7A11 [150], FAM83A [151], EGR2 [152], ZFP36 [153], NR0B2 [154], CCN2 [155], FST (follistatin) [156], PCK1 [157], CTLA4 [158], KLF4 [159], PRDM1 [160], CCL2 [161], OSM (oncostatin M) [162], ERBB4 [163], MSTN (myostatin) [164], ADAMTS1 [165], PDK4 [166], INHBB (inhibin subunit beta B) [167], EPO (erythropoietin) [168], TSPAN5 [169], RASSF10 [170], GATA3 [171], IL10 [172], ESR2 [173], VIP (vasoactive intestinal peptide) [174], CEBPB (CCAAT enhancer binding protein beta) [175], FOXC1 [176], FOXO1 [177], DKK1 [178], RNF152 [179], SOX17 [180], PTN (pleiotrophin) [181], GAS1 [182], PMAIP1 [183], CCL3 [184], FGF2 [185], ATF3 [186], FZD7 [187], ANGPT2 [188], SOCS1 [189], MAP3K8 [190], LPAR6 [191], EYA4 [192], FOSL1 [193] MAFF (MAF bZIP transcription factor F) [194], KLF6 [195], BASP1 [196], CTCFL (CCCTC-binding factor like) [197], AKR1B10 [198], S1PR2 [199], COMP (cartilage oligomeric matrix protein) [200], IGF2BP1 [201], ACHE (acetylcholinesterase) [202], TREM2 [203], SCD (stearoyl-CoA desaturase) [204], FOXP3 [205], SCIN (scinderin) [206], EPHB2 [207], ANGPTL8 [208], HSPA8 [209], KCNJ11 [210], PFKL (phosphofructokinase, liver type) [211], EYA2 [212], GLIS2 [213], CDKN1A [214], ITGB3 [215], GAS6 [216], PIEZO1 [217], SYT7 [218], AGRN (agrin) [219], DAB2IP [220], EXO1 [221], WNK2 [222], ACE2 [223], SPP1 [224], ANXA6 [225], E2F1 [226], CLDN3 [227], ALDH3A1 [228], ROS1 [229], FASN (fatty acid synthase) [230], NKD2 [231], TIMP3 [232], CHI3L1 [233], RAB25 [234], SUSD2 [235], LAMB3 [236], LYPD1 [237], ADAMTSL5 [238], FAT1 [239], GPBAR1 [240], FNDC5 [241], HLA-G [242], TRAF7 [243], MMP9 [244], TMC4 [245], TNFRSF12A [246], CABYR (calcium binding tyrosine phosphorylation regulated) [247], BMP10 [248] and LOXL4 [249] contributes to the progression of hepatocellular carcinoma. Previous studies have demonstrated that EGR1 [250], SIK1 [251], PTGS2 [252], RGS2 [253], KL (klotho) [254], SOCS3 [255], MMP1 [256], CXCR4 [257], THBS1 [258], CHRM3 [259], KLF4 [260], CACNA1A [261], CCL2 [262], AVPR1A [263], PDK4 [264], EPO (erythropoietin) [265], IL10 [266], ESR2 [267], VIP (vasoactive intestinal peptide) [268], FOXC1 [269], FOXO1 [270], S100B [271], FGF7 [272], SOX17 [273], FGF2 [274], ATF3 [275], IMPA1 [276], HPGDS (hematopoietic prostaglandin D synthase) [277], F2RL3 [278], SCUBE1 [279], COMP (cartilage oligomeric matrix protein) [280], ADAMTS16 [281], ACHE (acetylcholinesterase) [282], SLC6A2 [283], TREM2 [284], GCK (glucokinase) [285], FOXP3 [286], CYP7A1 [287], CMA1 [288], SLC22A2 [289], LPL (lipoprotein lipase) [290], ANGPTL8 [291], HSPA8 [292], MB (myoglobin) [293], KCNJ11 [294], MUC6 [295], UGT1A1 [296], KCNJ5 [297], GDF2 [298], ARHGEF18 [299], ITGB3 [300], P2RY2 [301], PIEZO1 [302], PDGFA (platelet derived growth factor subunit A) [303], ACE2 [304], NLRP6 [305], ROS1 [306], FASN (fatty acid synthase) [307], TIMP3 [308], CHI3L1 [309], TREH (trehalase) [310], SLC6A19 [311], FNDC5 [312], HLA-G [313], MMP9 [314], SLC22A18 [315] and BMP10 [316] are an independent prognostic factors in hypertension, but these genes might be novel targets for NAFLD. EGR1 [317], NR4A1 [318], NR4A2 [319], DUSP1 [320], GADD45G [321], DLK1 [322], SOCS2 [323], GADD45B [324], CXCL2 [325], EGR3 [326], PTGS2 [327], RGS2 [328], KL (klotho) [329], SOCS3 [330], MMP1 [331], CXCR4 [332], THBS1 [333], SLC7A11 [334], EGR2 [335], ZFP36 [336], KCNE4 [337], CCN2 [338], FST (follistatin) [339], CLEC4E [340], OGN (osteoglycin) [341], CTLA4 [342], KLF4 [343], CYTL1 [344], IL1RL1 [345], HBEGF (heparin binding EGF like growth factor) [346], CCL2 [347], OSM (oncostatin M) [348], PRRX1 [349], ERBB4 [350], MSTN (myostatin) [351], ADAMTS1 [352], PDK4 [353], EPO (erythropoietin) [354], GATA3 [355], IL10 [356], ADTRP (androgen dependent TFPI regulating protein) [357], ESR2 [358], VIP (vasoactive intestinal peptide) [359], KCNIP1 [360], PLA2G2A [361], FOXC1 [362], FOXO1 [363], S100B [364], DKK1 [365], SOX17 [366], SLC6A4 [367], CCL3 [368], NPHP3 [369], ATF3 [370], ANGPT2 [371], SOCS1 [372], MYZAP (myocardial zonulaadherens protein) [373], EYA4 [374], F2RL3 [278], TFAP2A [375], MYH7 [376], SLC22A12 [377], SCUBE1 [378], S1PR2 [379], COMP (cartilage oligomeric matrix protein) [380], EDN2 [381], ACHE (acetylcholinesterase) [382], SLC6A2 [383], TREM2 [384], LMOD2 [385], GCK (glucokinase) [386], FOXP3 [387], CYP7A1 [388], LPL (lipoprotein lipase) [389], ANGPTL8 [390], HCN2 [391], HSPA8 [392], MB (myoglobin) [393], KCNJ11 [394], HRH2 [395], TNFSF13 [396], UGT1A1 [397], IL2RA [398], KCNJ5 [399], ITGB3 [400], P2RY2 [401], GAS6 [402], PIEZO1 [403], DAB2IP [404], ACE2 [405], KCNE1 [406], S1PR4 [407], E2F1 [408], ROS1 [409], FADS2 [410], TIMP3 [411], CHI3L1 [412], PHLDA3 [413], MMP17 [414], RASAL1 [415], SLC5A10 [416], SUSD2 [417], OLFM2 [418], LAMA3 [419], BCAR1 [420], TREH (trehalase) [421], FNDC5 [422], HLA-G [313], MMP9 [423] and MYL7 [424] are an important members in the progress of cardiovascular diseases, but these genes might be novel targets for NAFLD. Previous studies have shown that EGR1 [425], NR4A1 [426], DUSP1 [427], DLK1 [428], SIK1 [429], SOCS2 [430], IRS2 [431], SOCS3 [432], CXCR4 [433], THBS1 [434], GPAT3 [435], ZFP36 [436], DUSP2 [437], CCN2 [438], FST (follistatin) [439], PCK1 [440], KLF4 [441], IL1RL1 [442], CCL2 [443], OSM (oncostatin M) [444], MSTN (myostatin) [445], PDK4 [446], EPO (erythropoietin) [447], IL10 [448], VIP (vasoactive intestinal peptide) [449], CEBPB (CCAAT enhancer binding protein beta) [450], FOXO1 [451], S100B [452], DKK1 [453], SLC6A4 [454], CCL3 [455], FGF2 [456], ATF3 [457], SOCS1 [109], IL1RAP [458], MAP3K8 [459], SIM1 [460], SCUBE1 [461], CIDEA (cell death inducing DFFA like effector a) [462], ACHE (acetylcholinesterase) [463], TREM2 [464], SCD (stearoyl-CoA desaturase) [465], GCK (glucokinase) [466], CYP7A1 [467], LPL (lipoprotein lipase) [468], ANGPTL8 [469], KCNJ11 [470], UGT1A1 [471], KCNJ5 [472], GAS6 [473], ACE2 [474], MCHR1 [475], SPP1 [476], E2F1 [477], FADS2 [478], FASN (fatty acid synthase) [479], TIMP3 [480], CHI3L1 [481], MFAP5 [482], LAMB3 [483], SERPINA12 [484], GPBAR1 [485], TREH (trehalase) [486], ANGPTL7 [487], FNDC5 [488], HLA-G [489] and MMP9 [490] are involved in the progression of obesity, but these genes might be novel targets for NAFLD. Studies indicated that EGR1 [491], NR4A1 [492], DLK1 [493], SOCS2 [494], IRS2 [495], PTGS2 [496], RGS2 [497], SOCS3 [498], MMP1 [499], CXCR4 [500], FST (follistatin) [439], PCK1 [501], CHRM3 [502], CTLA4 [503], CCL2 [504], OSM (oncostatin M) [505], EPO (erythropoietin) [506], GATA3 [507], IL10 [508], ESR2 [509], VIP (vasoactive intestinal peptide) [510], KCNIP1 [511], PLA2G2A [361], FOXO1 [451], S100B [512], DKK1 [513], FGF7 [514], RGS16 [515], SLC6A4 [516], CCL3 [517], FGF2 [518], ATF3 [519], SOCS1 [372], COMP (cartilage oligomeric matrix protein) [520], IGF2BP1 [521], ACHE (acetylcholinesterase) [522], TREM2 [523], SCD (stearoyl-CoA desaturase) [524], GCK (glucokinase) [525], EIF2S3B [526], PDX1 [527], FOXP3 [528], CYP7A1 [529], LPL (lipoprotein lipase) [530], ANGPTL8 [531], HSPA8 [532], MB (myoglobin) [533], KCNJ11 [534], UGT1A9 [535], IL2RA [536], CDKN1A [537], SLC29A4 [538], GAS6 [539], PIEZO1 [540], ACE2 [541], E2F1 [542], ALDH3A1 [543], FADS2 [544], FASN (fatty acid synthase) [545], CHI3L1 [546], VTCN1 [547], SERPINA12 [548], GPBAR1 [549], BCAR1 [550], TREH (trehalase) [551], CFB (complement factor B) [552], ANGPTL7 [553], FNDC5 [554], HLA-G [313] and MMP9 [555] played important roles in promoting the diabetes mellitus, but these genes might be novel targets for NAFLD. Németh et al. [556], Westbrook et al. [557], Hu et al. [558], Hu et al. [559], Duitman et al. [560], Landau et al. [561], Jang et al. [562], Kaludjerovic et al. [563], Li et al. [564], Zhang et al. [565], Dwivedi et al. [566], Mehta et al. [567], Baek et al. [341], Grywalska et al. [568], Xu et al. [569], Yu et al. [570], Sakai et al. [571], Feng et al. [572], Elbjeirami et al. [573], Feng et al. [574], Li et al. [575], Bataille et al. [576], Schrimpf et al. [577], Mitra et al. [578], Sun et al. [579], Hanudel et al. [580], Peda et al. [581], Yamaguchi et al. [582], Wang and He [583], Neto et al. [584], Luna-Antonio et al. [585], Hoff et al. [586], Li et al. [587], Cheng and Lin, [588], Tsai et al. [589], Chen et al. [590], Niitsuma et al. [591], Atanasio et al. [592], Sanchez-Niño et al. [593], Shaw et al. [594], Ichida et al. [595], Mazumder et al. [596], Cao et al. [597], Niida et al. [598], Hishida et al. [599], Hirooka et al. [600], Sallinen et al. [289], Huang et al. [601], Ćwiklińska et al. [602], Zou et al. [603], Tsantoulas et al. [604], Oguri et al. [292], Lenglet et al. [605], Pietrzak-Nowacka et al. [606], Lu et al. [607], Zhong et al. [608], Iatrino et al. [609], Yanagita, [610], Yard et al. [611], Maksimowski et al. [612], Valiño-Rivas et al. [613], Guan et al. [614], Wang et al. [615], Hoste et al. [616], Mao, [617], Fabretti et al. [618], Xiao et al. [619], Atwood et al. [620], Lu et al. [621], Khidr [554], Hauer et al. [622], Laucyte-Cibulskiene et al. [623], Yang et al. [624] and Matejas et al. [625] showed that EGR1, NR4A1, SIK1, CCN1, CEBPD (CCAAT enhancer binding protein delta), SOCS2, RGS2, KL (klotho), SOCS3, MMP1, CXCR4, CCN2, FST (follistatin), OGN (osteoglycin), CTLA4, KLF4, PTGER3, HBEGF (heparin binding EGF like growth factor), CCL2, OSM (oncostatin M), ERBB4, TNFSF11, MSTN (myostatin), ADAMTS1, CALCR (calcitonin receptor), INHBB (inhibin subunit beta B), EPO (erythropoietin), IL10, CEBPB (CCAAT enhancer binding protein beta), FOXO1, DKK1, GAS1, NPHP3, FGF2, ATF3, ANGPT2, SOCS1, IL1RAP, C9ORF72, BASP1, AKR1B10, SLC22A12, ACHE (acetylcholinesterase), TREM2, SCD (stearoyl-CoA desaturase), GCK (glucokinase), FOXP3, SLC22A2, EPHB2, LPL (lipoprotein lipase), ANGPTL8, HCN2, HSPA8, MB (myoglobin), KCNJ11, GLIS2, TNFSF13, LSS (lanosterol synthase), GAS6, AGRN (agrin), ACE2, NLRP6, DPEP1, TIMP3, CHI3L1, RASAL1, FAT1, GPBAR1, TREH (trehalase), CFB (complement factor B), FNDC5, HLA-G, MMP9, C4A and LAMB2 were associated with kidney disease, but these genes might be novel targets for NAFLD. Xu et al. [626], Moles et al. [627], Ogata et al. [628], Okamoto et al. [629], Liu et al. [630], Nakamura et al. [631], Liu et al. [632], Zhu et al. [633], Kovalenko et al. [634], Yang et al. [635], Li et al. [636], Znoyko et al. [637], Gong et al. [638], Diamantis et al. [639], Guo et al. [640], Lee et al. [641], Saleh et al. [642], Heinrichs et al. [643], Kurniawan et al. [644], She et al. [645], Mafanda et al. [646], Norman et al. [200], García-Ayllón et al. [647], Zhao et al. [648], Butler et al. [649], Urbánek et al. [650], Smirne et al. [651], Wu et al. [652], Yang et al. [653], Nishimura et al. [654], Han et al. [655] and Barascuk et al. [656] revealed that IRS2, CXCL2, SOCS3, MMP1, CXCR4, SLC7A11, CCN2, KLF4, CCL2, OSM (oncostatin M), PRRX1, ADAMTS1, IL10, VIP (vasoactive intestinal peptide), S100B, CCL3, FGF2, MSMP (microseminoprotein, prostate associated), SOCS1, COMP (cartilage oligomeric matrix protein), ACHE (acetylcholinesterase), FOXP3, EPHB2, UGT1A1, GAS6, ACE2, E2F1, CHI3L1, PHLDA3 and MMP9 are associated with liver cirrhosis. Altered expression of KL (klotho) [657], SOCS3 [658], MMP1 [659], CXCR4 [660], RASD1 [661], SLC7A11 [662], OGN (osteoglycin) [341], KLF4 [663], IL1RL1 [664], CCL2 [665], EPO (erythropoietin) [666], IL10 [667], ESR2 [668], VIP (vasoactive intestinal peptide) [669], FOXC1 [670], FOXO1 [671], S100B [672], DKK1 [673], SOCS1 [674], C9ORF72 [675], CTCFL (CCCTC-binding factor like) [676], S1PR2 [677], ACHE (acetylcholinesterase) [678], TREM2 [679], EPHB2 [680], LPL (lipoprotein lipase) [681], P2RY2 [682], ACE2 [683], CHI3L1 [684], HLA-G [685] and MMP9 [686] are found in cerebrovascular diseases, but these genes might be novel targets for NAFLD.
By analyzing the PPI network and modules, a number of hub genes were identified that may provide new approaches for therapeutic studies of NAFLD. Thus, IKBKG (inhibitor of nuclear factor kappa B kinase regulatory subunit gamma), SFN (stratifin), FOSB (FosB proto-oncogene, AP-1 transcription factor subunit), JUNB (JunB proto-oncogene, AP-1 transcription factor subunit) and JUND (JunD proto-oncogene, AP-1 transcription factor subunit) can be a novel molecular markers of NAFLD and provide new insight to improve therapeutic strategies for NAFLD complications.
A miRNA-hub gene regulatory network and TF-hub gene regulatory network were constructed with hub genes, and then miRNAs and TFs were identified. CDT1 [687], ZWINT (ZW10 interacting kinetochore protein) [688], hsa-mir-3613-3p [689], GATA2 [690], RUNX1 [691], SUZ12 [692], NUCKS1 [693], FOXP2 [694], TP53 [695], AHR (aryl hydrocarbon receptor) [696] and TRIM28 [697] were revealed to be correlated with disease outcome in patients with hepatocellular carcinoma. Hsa-mir-199a-5p [698], hsa-mir-22-5p [699], hsa-mir-338-5p [700], GATA2 [701], RUNX1 [702], SUZ12 [703], TP53 [704] and AHR (aryl hydrocarbon receptor) [705] were revealed to be associated with cardiovascular diseases, but these molecular markers might be novel targets for NAFLD. Hsa-mir-199a-5p [698], GATA2 [706], FOXP2 [707] and AHR (aryl hydrocarbon receptor) [708] have been demonstrated to function in hypertension, but these genes might be novel targets for NAFLD. Previous studies have demonstrated that hsa-mir-22-5p [709], hsa-mir-150-5p [710], RUNX1 [711], TP53 [712], AHR (aryl hydrocarbon receptor) [713] and TRIM28 [714] are an independent prognostic factor in obesity and might be associated with obesity pathogenesis, but these genes might be novel targets for NAFLD. Wang et al. [715], Lu et al. [716], Shi et al. [717], Yu et al. [718], Zhang et al. [719] and Ren et al. [720] reported that altered expression of hsa-mir-3613-3p, hsa-mir-103a-3p, hsa-mir-150-5p, GATA2, RUNX1 and AHR (aryl hydrocarbon receptor) correlated in patients with kidney disease, but these genes might be novel targets for NAFLD. Previous reports have demonstrated that hsa-mir-150-5p [721] and RUNX1 [722] are associated with liver cirrhosis. Increasing evidence demonstrated that hsa-mir-150-5p [723] and TP53 [724] have a function in diabetes mellitus, but these genes might be novel targets for NAFLD. Bertran et al. [725], Yan et al. [726] and Zhu et al. [727] revealed that RUNX1, TP53 and AHR (aryl hydrocarbon receptor) may be the potential targets for NAFLD diagnosis and treatment. Thus, hsa-mir-33a-3p, hsa-mir-561-3p, hsa-mir-30b-3p and hsa-mir-202-3p can be novel molecular markers of NAFLD and provide new insight to improve therapeutic strategies for NAFLD complications.
In conclusion, the large-scale data analysis of the current investigation provides a comprehensive bioinformatics analysis of DEGs that might be associated in the progress of NAFLD. This investigation provides a set of important targets for future studiws into the molecular mechanisms and molecular markers involved in NAFLD. The exploration of potential key genes of NAFLD might provide some potential help in further identification of novel molecular markers for the susceptibility of NAFLD and useful treatment targets. The real function of key genes needs to be explored further to determine the clinical and biological mechanism of NAFLD.
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
No informed consent because this study does not contain human or animals participants.
Availability of data and materials
The datasets supporting the conclusions of this article are available in the GEO (Gene Expression Omnibus) (https://www.ncbi.nlm.nih.gov/geo/) repository. [(GSE135251) https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE135251)]
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Author Contributions
B. V. - Writing original draft, and review and editing
C. V. - Software and investigation
Acknowledgement
I thank Simon Cockell, Newcastle University, Newcastle, United Kingdom, very much, the author who deposited their profiling by high throughput sequencing dataset GSE135251, into the public GEO database.
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵
- 91.↵
- 92.↵
- 93.↵
- 94.↵
- 95.↵
- 96.↵
- 97.↵
- 98.↵
- 99.↵
- 100.↵
- 101.↵
- 102.↵
- 103.↵
- 104.↵
- 105.↵
- 106.↵
- 107.↵
- 108.↵
- 109.↵
- 110.↵
- 111.↵
- 112.↵
- 113.↵
- 114.↵
- 115.↵
- 116.↵
- 117.↵
- 118.↵
- 119.↵
- 120.↵
- 121.↵
- 122.↵
- 123.↵
- 124.↵
- 125.↵
- 126.↵
- 127.↵
- 128.↵
- 129.↵
- 130.↵
- 131.↵
- 132.↵
- 133.↵
- 134.↵
- 135.↵
- 136.↵
- 137.↵
- 138.↵
- 139.↵
- 140.↵
- 141.↵
- 142.↵
- 143.↵
- 144.↵
- 145.↵
- 146.↵
- 147.↵
- 148.↵
- 149.↵
- 150.↵
- 151.↵
- 152.↵
- 153.↵
- 154.↵
- 155.↵
- 156.↵
- 157.↵
- 158.↵
- 159.↵
- 160.↵
- 161.↵
- 162.↵
- 163.↵
- 164.↵
- 165.↵
- 166.↵
- 167.↵
- 168.↵
- 169.↵
- 170.↵
- 171.↵
- 172.↵
- 173.↵
- 174.↵
- 175.↵
- 176.↵
- 177.↵
- 178.↵
- 179.↵
- 180.↵
- 181.↵
- 182.↵
- 183.↵
- 184.↵
- 185.↵
- 186.↵
- 187.↵
- 188.↵
- 189.↵
- 190.↵
- 191.↵
- 192.↵
- 193.↵
- 194.↵
- 195.↵
- 196.↵
- 197.↵
- 198.↵
- 199.↵
- 200.↵
- 201.↵
- 202.↵
- 203.↵
- 204.↵
- 205.↵
- 206.↵
- 207.↵
- 208.↵
- 209.↵
- 210.↵
- 211.↵
- 212.↵
- 213.↵
- 214.↵
- 215.↵
- 216.↵
- 217.↵
- 218.↵
- 219.↵
- 220.↵
- 221.↵
- 222.↵
- 223.↵
- 224.↵
- 225.↵
- 226.↵
- 227.↵
- 228.↵
- 229.↵
- 230.↵
- 231.↵
- 232.↵
- 233.↵
- 234.↵
- 235.↵
- 236.↵
- 237.↵
- 238.↵
- 239.↵
- 240.↵
- 241.↵
- 242.↵
- 243.↵
- 244.↵
- 245.↵
- 246.↵
- 247.↵
- 248.↵
- 249.↵
- 250.↵
- 251.↵
- 252.↵
- 253.↵
- 254.↵
- 255.↵
- 256.↵
- 257.↵
- 258.↵
- 259.↵
- 260.↵
- 261.↵
- 262.↵
- 263.↵
- 264.↵
- 265.↵
- 266.↵
- 267.↵
- 268.↵
- 269.↵
- 270.↵
- 271.↵
- 272.↵
- 273.↵