Machine learning-based revelation of GABBR1 and IQGAP2 as candidate biomarkers of pulmonary arterial hypertension and their correlation with immune infiltration

Pulmonary arterial hypertension (PAH) is a pulmonary vascular disease with complex pathogenesis, and its intrinsic molecular mechanisms remain unclear. The aim of this study was to screen gene expression data from PAH patients, identify possible diagnostic indicators of PAH and to investigate the role of immune cell infiltration in the progression of PAH. This study made use of the gene expression dataset of PAH patients from the GEO database. R software was used to identify differentially expressed genes and perform functional enrichment analysis. The SVM-RFE, LASSO and Random Forest algorithms were then used to screen for PAH hub genes and validated in the peripheral blood and lung tissue datasets. Finally, the CIBERSORT algorithm was used to assess PAH lung tissue immune cell infiltration and to investigate the correlation between hub genes and immune cells. A total of 132 DEGs were screened in this study, which were centrally involved in the neuroreceptor-ligand activity pathway and associated with neurotransmission and hemoglobin complex. A total of 2 pivotal genes, GABBR1 and IQGAP2, were obtained by machine learning algorithms. The 2 pivotal genes had good predictive power as verified by ROC curves. Further immune infiltration analysis showed a decrease in T cells and an increase in the proportion of macrophages and dendritic cells in the lung tissue of PAH patients. The expression of GABBR1 was positively correlated with T cells and negatively correlated with macrophages and dendritic cells. In our study, we identified 2 potential diagnostic key genes: GABBR1 and IQGAP2. Our findings may provide a theoretical basis for the analysis of the underlying mechanisms of PAH and the development of targeted medicines.

Highlight Box

Key findings
• We identified 2 potential key genes of PAH, GABBR1 and IQGAP2.

What is known and what is new?
• Sympathetic hyperexcitability as well as immune responses are closely associated with the development of PAH, and pulmonary vascular hyperplasia is a key pathogenetic mechanism of PAH.
• Important biomarkers related to neuroreceptors and immune responses in PAH lung tissue have not been identified, while our study identified GABBR1 as a key neuroreceptor and immune cell regulator in PAH. IQGAP2 could be a new hotspot direction for pulmonary vascular remodeling.

What is the implication, and what should change now?
• GABBR1 and IQGAP2 may be potential therapeutic targets for PAH. The new horizon provided by this study will provide some reference for subsequent PAH studies.


Introduction
Pulmonary arterial hypertension (PAH) is a complex pulmonary vascular disease characterized clinically by elevated pulmonary artery pressures and downstream hemodynamic abnormalities in the pulmonary vasculature and right ventricle (1). Accordingly, a resting mean pulmonary artery pressure (mPAP) ≥20 mm Hg has been widely used as an important diagnostic criterion in recent studies (2). The current global prevalence of PAH is approximately 1% (3). The symptoms at onset are non-specific, and patients usually present with dyspnea on exertion (4). However, PAH is a life-threatening disease, and without effective intervention pulmonary hypertension usually progresses to right ventricular failure and death (5). Pulmonary viral infections are strongly associated with the development of PAH, and available studies have confirmed that coronavirus infection (COVID-19) can lead to pulmonary vasculopathy, suggesting an increased incidence of PAH in the future (6). The specific mechanisms underlying the pathogenesis of PAH are complex, and how normal lung tissue transforms into lesions remains unclear. Therefore, identifying the core biomarkers of PAH is essential for the diagnosis and treatment of this complex disease.
PAH is strongly associated with the patient's living environment, metabolism, genetics and hypoxic injury, and many factors activate or inhibit the underlying molecular mechanisms (7). These include dysregulation of oxidative stress, inflammatory and metabolic pathways, which ultimately leads to pathological changes such as vasoconstriction, smooth muscle dysregulation and endothelial cell proliferation in the pulmonary vasculature (8). It has been shown that BMPR2, a key gene for endothelial cell proliferation, is the most commonly mutated gene in hereditary PAH, and that upregulating its expression can effectively interfere with the PAH process (9). However, mutations in BMPR2 are less likely to occur in idiopathic PAH (10). The above findings suggest that exploring the core regulatory genes may offer a novel therapeutic strategy for PAH.
Immune cells and inflammatory factors are among the important contributors to the altered pulmonary vascular pathology in PAH (11). In patients with PAH there is a significant decrease in T-cell function, along with altered composition ratios of monocytes, macrophages, dendritic cells, natural killer cells and B cells (12). In athymic nude mice congenitally lacking T cells, the lungs are infiltrated with macrophages, mast cells and B cells, exhibiting immune infiltrative features similar to those of PAH lesions (13). Inflammatory factors, such as IL-6, have an important role in PAH development. Mice with IL-6 overexpression spontaneously develop PAH and pulmonary vascular remodeling, as well as occlusive remodeling in hypoxia similar to human disease, whereas IL-6 knockout mice are more resistant to the development of PAH induced by chronic hypoxia (14). The above findings suggest that studies targeting immune-related genes may provide new perspectives for the treatment of PAH.
In this study, we obtained microarray data from the Gene Expression Omnibus (GEO) database to analyze differentially expressed genes (DEGs) (15). Machine learning algorithms were used to further screen and identify hub genes as diagnostic factors for PAH, and the diagnostic performance of these core genes was validated by ROC analysis. We then used the CIBERSORT algorithm to study differences in the infiltration of 22 immune cell subpopulations in the lung tissue of PAH patients (16). In addition, to better understand the potential molecular immune mechanisms of the hub genes, we investigated the relationship between hub genes and infiltrating immune cells. We present the following article in accordance with the TRIPOD reporting checklist.

Data download
The GSE15197, GSE33463 and GSE113439 datasets (GPL6480 platform, GPL6947 platform and GPL6244 platform) were downloaded from the GEO database. In each dataset, only samples from normal control patients (Con group) and PAH patients (PAH group) were selected. GSE15197 contains 31 lung tissue samples, 13 from normal control patients and 18 from PAH patients. GSE113439 contains 26 lung tissue samples, 11 from normal control patients and 15 from PAH patients. GSE33463 contains 71 peripheral blood samples, 41 from normal control patients and 30 from PAH patients. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Data processing and consolidation
The GSE15197, GSE33463 and GSE113439 datasets were annotated using R 4.3.0 software (https://www.r-project.org/), and duplicate values were collapsed by taking their mean. The "inSilicoMerging" package was used to merge the GSE15197 and GSE113439 datasets, and the "limma" package was further used to remove batch effects. The combined dataset consisted of 57 samples, 24 from normal control patients and 33 from PAH patients, with 16,869 genes tested. This combined dataset was used for the subsequent analyses, and the GSE33463 dataset was used for subsequent validation.
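The duplicate-handling step can be illustrated with a minimal sketch. It is written in Python for illustration only (the study itself used R), the function name collapse_duplicates and the probe values are hypothetical, and it assumes that "the mean method" means averaging, sample by sample, all probes that map to the same gene symbol.

```python
from collections import defaultdict
from statistics import mean


def collapse_duplicates(rows):
    """Average probes that share a gene symbol.

    rows: iterable of (gene_symbol, [expression value per sample]).
    Returns {gene_symbol: [mean expression per sample]}.
    """
    grouped = defaultdict(list)
    for symbol, values in rows:
        grouped[symbol].append(values)
    # zip(*probes) walks the duplicated probes one sample at a time
    return {
        symbol: [mean(column) for column in zip(*probes)]
        for symbol, probes in grouped.items()
    }


# Hypothetical probe-level values for three samples (not taken from the study)
probe_rows = [
    ("GABBR1", [8.0, 7.0, 6.0]),
    ("GABBR1", [10.0, 9.0, 8.0]),
    ("IQGAP2", [5.0, 6.0, 7.0]),
]
gene_matrix = collapse_duplicates(probe_rows)
```

Merging the two lung datasets and removing batch effects are separate steps that this sketch does not attempt to reproduce.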

DEGs identification
The "limma" package of R 4.3.0 software was used to filter and identify differentially expressed genes (DEGs) in the combined dataset. A threshold of |LogFC| > 1 and Adjusted P value < 0.5 was set. The results obtained were visualized using the "ggplot2" package. On this basis, the "pheatmap" package was used to draw the heat map, setting the clustering method to "complete" and the distance algorithm to "euclidean". To highlight the differences between the Con and PAH groups, we used the "vegan" package to plot PCoA and set the difference algorithm to "anosim". DEGs were used for follow-up studies and undefined genes were removed.
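The thresholding rule can be sketched as follows. This is a Python illustration rather than the limma workflow used in the study: the function name select_degs, the placeholder gene names and the numbers are hypothetical, the cutoffs are the ones stated in the text, and a positive LogFC is assumed to mean higher expression in the PAH group.

```python
def select_degs(results, lfc_cutoff=1.0, adj_p_cutoff=0.5):
    """Split a results table into up- and down-regulated DEGs.

    results: iterable of (gene, log_fc, adjusted_p).
    A gene is kept when |log_fc| > lfc_cutoff and adjusted_p < adj_p_cutoff.
    """
    up, down = [], []
    for gene, log_fc, adj_p in results:
        if adj_p < adj_p_cutoff and abs(log_fc) > lfc_cutoff:
            (up if log_fc > 0 else down).append(gene)
    return up, down


# Hypothetical differential expression results (placeholder gene names)
table = [
    ("GENE_A", 1.8, 0.001),   # passes both cutoffs, up-regulated
    ("GENE_B", 0.4, 0.001),   # fold change too small
    ("GENE_C", -1.3, 0.020),  # passes both cutoffs, down-regulated
    ("GENE_D", 2.1, 0.700),   # adjusted P value too large
]
up_genes, down_genes = select_degs(table)
```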

Functional enrichment analysis
We used the DAVID database (https://david.ncifcrf.gov/) to obtain the biological functions and signaling pathways involved in the DEGs. P value < 0.05 was set to indicate significant functional enrichment.

GO analysis was used to reveal the potential biological functions of the DEGs, KEGG analysis was used to identify the molecular pathways through which the core genes act, and DisGeNET analysis was used to demonstrate the association of the DEGs with disease. All results were visualized using the "ggplot2" package of R 4.3.0 software.

Hub gene screening by machine learning
We used a combination of the SVM-RFE, LASSO and Random Forest algorithms to screen and identify hub genes. The SVM-RFE algorithm was implemented using the "e1071" and "caret" packages of R 4.3.0 software. Cross-validation was set to repeated k-fold cross-validation, with 10 folds repeated 5 times. The LASSO algorithm was implemented using the "glmnet" package of R 4.3.0 software, using "lambda_min" to select the best set of variables. The Random Forest algorithm was implemented using the "randomForest" package, building 10,000 decision trees to obtain candidate hub genes, and the results were assessed by both the increase in MSE (%incMSE) and the increase in node purity (incNodePurity). The intersection of the results of the three machine learning algorithms was taken using the "venneuler" package to obtain the hub genes.
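The final selection step, intersecting the candidate lists from the three algorithms, can be sketched in a few lines. This Python illustration assumes each algorithm has already produced its list of candidate genes; the function name hub_genes and the placeholder gene names are hypothetical, and only GABBR1 and IQGAP2 are taken from the study.

```python
def hub_genes(svm_rfe, lasso, rf_inc_mse, rf_node_purity):
    """Genes selected by all three algorithms.

    The Random Forest candidates are first reduced to the genes that rank
    highly under both importance metrics, as described in the text.
    """
    random_forest = set(rf_inc_mse) & set(rf_node_purity)
    return sorted(set(svm_rfe) & set(lasso) & random_forest)


# Hypothetical candidate lists (GENE_* names are placeholders)
svm_rfe_top = ["GABBR1", "IQGAP2", "GENE_A", "GENE_B"]
lasso_selected = ["GABBR1", "IQGAP2", "GENE_C"]
rf_by_inc_mse = ["GABBR1", "IQGAP2", "GENE_A", "GENE_D"]
rf_by_node_purity = ["IQGAP2", "GABBR1", "GENE_A", "GENE_E"]

selected = hub_genes(svm_rfe_top, lasso_selected, rf_by_inc_mse, rf_by_node_purity)
```

Requiring agreement across methods trades sensitivity for robustness: a gene that only one algorithm favors is dropped.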

Validation of hub gene diagnostic performance
We validated the diagnostic performance of the hub genes using the "pROC" package of R 4.3.0 software. Logistic regression was then performed using the "glm" function, a nomogram was constructed using the "nomogramFormula" package, the accuracy of the model was assessed by the concordance index (C-index), and calibration curves were plotted to assess the performance of the model.
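The quantity summarized by a ROC curve, the AUC, can be computed directly from its probabilistic definition. The Python sketch below is an illustration, not the pROC implementation: the function name auc and the expression values are hypothetical, and it assumes the AUC is read as the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counted as one half.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve from pairwise comparisons.

    Counts the case/control pairs in which the case has the higher score;
    tied pairs contribute 0.5.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical expression values of a marker that is LOWER in patients.
pah = [5.1, 5.4, 6.0, 5.2]
con = [6.8, 7.1, 6.0, 7.4]

# Negating the scores turns "lower in patients" into "higher in patients",
# so the AUC is reported on the conventional 0.5 to 1 scale.
marker_auc = auc([-v for v in pah], [-v for v in con])
```

For a binary outcome the C-index of a single-predictor model is the same pairwise quantity, which is why the two metrics are reported side by side.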

Analysis of immune cell infiltration
Immune cell infiltration studies were performed using the "CIBERSORT" package in R 4.3.0, with the objective of obtaining a matrix of immune cell infiltration in lung tissue from PAH patients. The immune cell infiltration matrix data were visualized using the "ggplot2" package and heat maps were drawn using the "pheatmap" package to highlight the differences in immune cell infiltration between the Con and PAH groups. The Wilcoxon rank sum test was used to analyze the differences between the two groups in terms of immune cells.
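The group comparison can be illustrated with a simplified Wilcoxon rank sum test. This Python sketch is a teaching aid under stated assumptions: it uses the normal approximation without continuity or tie correction, whereas statistical software may compute an exact P value for small samples, so its output will not match R exactly. The function name rank_sum_test and the cell fractions are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney) test, normal approximation.

    Returns (U, p_value), where U counts the pairs in which the value from x
    exceeds the value from y (ties count 0.5).
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided tail probability of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Hypothetical fractions of one immune cell type in each group
con_fraction = [0.30, 0.28, 0.35, 0.32, 0.40]
pah_fraction = [0.10, 0.12, 0.08, 0.15, 0.11]
u_stat, p_val = rank_sum_test(con_fraction, pah_fraction)
```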

Analysis of the correlation between hub genes and immune cells
Correlation of hub genes with infiltrating immune cells was calculated using Spearman's correlation coefficient. The "ggpubr" package in R 4.3.0 was applied for visualization.
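Spearman's coefficient is the Pearson correlation of the ranks of the two variables, which the following Python sketch makes explicit. The function names midranks and spearman and the paired values are hypothetical and serve only to illustrate the calculation; tied values receive the average of the ranks they span.

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the midranks."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical paired values: gene expression vs. an immune cell fraction
expression = [6.1, 6.8, 7.4, 7.9]
cell_fraction = [0.31, 0.22, 0.15, 0.09]
rho = spearman(expression, cell_fraction)
```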

Prediction of regulatory hub gene miRNAs
Regulatory microRNAs (miRNAs) for hub genes were obtained from the miRNet database (https://www.mirnet.ca/). The search results were visualized using the "ggalluvial" package of R 4.3.0 software.

Statistical analysis
All statistical analyses for this study were implemented using R 4.3.0 software, and all statistical tests were two-sided. Comparisons between two groups were performed using the Wilcoxon rank sum test. Graphical visualization was implemented using the "ggplot2", "ggpubr" and "pheatmap" packages. P < 0.05 was considered statistically significant.

Data pre-processing and DEGs identification
Two datasets, GSE15197 and GSE113439, were included in this study and were merged and corrected for batch effects (Figure 1A, B). PCoA analysis of the Con and PAH groups in the combined dataset revealed significant differences between the two (Figure 1C), suggesting that the correction performed on the dataset did not remove their original differences. The volcano plot showed that a total of 132 DEGs were obtained in the combined dataset, including 103 up-regulated and 29 down-regulated genes (Figure 2A). The clustering heat map showed significant differences in the expression of these DEGs between the Con and PAH groups (Figure 2B).

DEGs functional enrichment analysis
We performed KEGG analysis on the DEGs and found that the DEGs were mainly involved in four pathways: Taste transduction, Viral protein interaction with cytokine and cytokine receptor, Cytokine-cytokine receptor interaction and Neuroactive ligand-receptor interaction (Figure 3A). Further analysis revealed that, among the DEGs, GABBR1 is mainly involved in the Taste transduction and Neuroactive ligand-receptor interaction pathways. Its significant downregulation in the PAH group suggests that the physiological activity of neural ligand and receptor molecules in the lung tissue of PAH patients may be attenuated. GO analysis showed that, in terms of biological processes, DEGs are associated with the regulation of neurogenesis, oxygen transport and the neuropeptide signaling pathway. In terms of cellular components, they are highly associated with the hemoglobin complex, haptoglobin-hemoglobin complex and extracellular space. In terms of molecular functions, they are involved in organic acid binding, haptoglobin binding, and RNA polymerase II transcription factor activity and sequence-specific DNA binding. These results suggest that DEGs are primarily involved in the neural activity of lung tissue and are highly correlated with oxygen transport (Figure 3B). We further analyzed the association of DEGs with diseases through the DisGeNET database. DEGs were highly associated with lung diseases such as Pneumonitis and Lobar Pneumonia, but also with immune responses such as Allergic Reaction, suggesting that immune responses have a large impact in PAH. The significant association of DEGs with diseases such as cardiac arrhythmias may explain the molecular basis for the subsequent progression of PAH to cardiac diseases such as right ventricular failure (Figure 3C).

Identification of hub genes
We used the SVM-RFE algorithm for optimal gene subset screening and found that the full set of 132 DEGs had the lowest RMSE (Figure 4A). Based on this result, we took the 10 most important genes as candidate hub genes (Figure 4B). The "lambda_min" subset obtained by the LASSO algorithm contained 15 hub genes (Figure 4C, D). For the Random Forest algorithm, the top 10 genes by %incMSE and by incNodePurity were each taken as an optimal subset (Figure 4E, F). The intersection of these two optimal subsets was taken and a total of 9 hub genes were obtained. The three machine learning results were combined and a total of 2 hub genes were identified: GABBR1 and IQGAP2 (Figure 4H).
Validation in the combined dataset revealed that GABBR1 expression was significantly higher in the Con group than in the PAH group (P < 0.001), while IQGAP2 expression was significantly lower (P < 0.001) (Figure 4I, J).

Validation of hub gene diagnostic efficacy
We validated the diagnostic efficacy of the hub genes obtained from machine learning screening by ROC curves. Firstly, the AUC values of GABBR1 and IQGAP2 were found to be 0.962 (0.915-1.000) and 0.933 (0.867-0.999) in lung tissue (combined gene set), respectively, and their combined validation resulted in an AUC value of 0.996 (0.988-1.000) (Figure 5A). Validation in the peripheral blood gene expression set (GSE33463) revealed AUC values of 0.676 (0.542-0.809) for GABBR1 and 0.530 (0.390-0.670) for IQGAP2, and after combining them for validation, the AUC value was 0.661 (0.531-0.791) (Figure 5B). In the combined dataset, the model had a corrected C-index of 0.999. The nomogram suggests that reduced GABBR1 expression and upregulated IQGAP2 expression are associated with a greater risk of disease (Figure 5C). In the peripheral blood dataset, the model had a corrected C-index of 0.751. The nomogram indicates that IQGAP2 expression has greater diagnostic efficacy compared to IQGAP2 expression (Figure 5D). These results suggest that GABBR1 and IQGAP2 are diagnostic in both peripheral blood and lung tissue, and that the combination of the two genes provides better diagnostic efficacy.

Immuno-infiltration analysis
Differences in immune infiltration between normal and PAH lung tissue in 22 immune cell subpopulations were studied using the CIBERSORT algorithm. Immune cell stacking plots showed significant population differences between normal controls and PAH patients (Figure 6A). In normal lung tissue, T cells CD4 naive and T cells follicular helper had a predominant share. In PAH lung tissue, however, the proportion of Dendritic cells resting, Dendritic cells activated and Macrophages M0 was substantially increased. The clustered heat map showed that normal lung tissue was positively associated with immune cells of the T cell class, such as T cells follicular helper and T cells CD4 naive. In addition, the immune environment of normal lung tissue is mostly composed of Plasma cells and Neutrophils. PAH lung tissue, in contrast, showed a substantial decrease in the T-cell content of its immune microenvironment and an increase in the content of dendritic cells and macrophages (Figure 6B). Pairwise comparison of the two groups made these differences clearer. The results showed that the levels of T cells CD4 naive (P = 0.001), T cells CD4 memory resting (P = 0.002), T cells CD4 memory activated (P = 0.002) and T cells follicular helper (P = 0.001) were significantly higher in the lung tissue of the normal group than in the PAH group. In contrast, the levels of Macrophages M0 (P = 0.032), Mast cells resting (P = 0.006) and Mast cells activated (P = 0.013) were significantly higher in the PAH group (Figure 6C).

Regulatory miRNA analysis of hub genes
We predicted the miRNAs that regulate the hub genes and found that GABBR1 had more potential miRNAs than IQGAP2, with a total of 9 miRNAs, while IQGAP2 had only 5. Among them, hsa-miR-138-5p and hsa-miR-20a-5p were predicted to regulate both hub genes (Figure 8).

Discussion
The development of PAH is mediated by a combination of genes, and inhibition of the expression of some pivotal genes can effectively intervene in the pathogenesis of PAH (17). The immune infiltration of lung tissue in PAH patients is intrinsically linked to their risk level and prognosis (18). However, the expression of some specific genes can significantly alter the immune cell composition of lung tissue (19). Therefore, it is particularly important to search for pivotal genes in PAH and their relevance to immune infiltration.

Key Findings
In our study, analysis of high-throughput gene expression data from the lungs of PAH patients yielded 132 DEGs, whose functional enrichment revealed that a proportion of DEGs are involved in neuroreceptor activity pathways and are closely associated with immune responses. To further mine the core genes, three machine learning models were constructed. The results revealed that gamma-aminobutyric acid type B receptor subunit 1 (GABBR1) and IQ Motif Containing GTPase Activating Protein 2 (IQGAP2) play key roles in PAH. By validation in peripheral blood and lung tissue, these two pivotal genes could predict the occurrence of PAH. We also analyzed the correlation between the two pivotal genes and immune cells and found that GABBR1 was significantly positively correlated with T-cell populations and negatively correlated with macrophage and dendritic cell populations.

Strengths and limitations
Our study effectively narrowed the set of candidate genes by mining the core genes of PAH with multiple machine learning algorithms. The 2 hub genes identified in this study have some ability to predict PAH and may be potential therapeutic targets for PAH. However, there are still some limitations in this study. First, idiopathic PAH can be caused by a variety of factors, such as genetics, residential environment, medications, and strenuous exercise, which were not included in the analysis. Second, the subsequent progression of PAH was not investigated in this study, and the molecular mechanisms underlying the progression of PAH to right heart failure have not been clarified, which limits future clinical application.

Comparison with similar research
Neuroreceptor activity has long been a hot area of research in PAH. Clinical studies have shown that denervation can effectively reduce morbidity in PAH patients (20). It has been proposed that sympathetic overactivity plays a key role in exacerbating the symptoms of PAH patients (21,22). In contrast, GABA can effectively inhibit sympathetic nerves and has an important role in maintaining stable sympathetic inhibitory activity (22). This suggests that GABA receptors have a key role in PAH, but the exact mechanism of their regulation is not yet understood.
Pulmonary artery remodeling is an important risk factor for PAH. The abnormal proliferation of arterial endothelium in PAH can be attenuated by suppressing the expression of some genes (23). Numerous studies have shown that IQGAP2 is a central regulatory gene of cell proliferation, which promotes cell division by regulating the mTOR pathway (24). However, whether it regulates pulmonary artery remodeling in patients with PAH has not been investigated in depth.

Explanations of findings
Our discovery of the GABBR1 and IQGAP2 genes, which have not been reported in PAH models, could be the focus of subsequent studies. GABBR1 is one of the specific receptors for γ-aminobutyric acid (GABA), which receives GABA signals and thus inhibits neural activity (25). It has been shown that there is a strong link between GABBR1 expression and lung function (26). Sympathetic hyperactivity is an important factor in the pathogenesis of PAH; therefore, the expression level of GABBR1 may be related to PAH progression and is a potential therapeutic target for PAH (27). In contrast, IQGAP2 has been shown to have a proliferative effect, promoting angiogenesis and reorganization (28). Downregulating IQGAP2 is expected to attenuate pulmonary artery remodeling in PAH and thus improve the outcome of PAH patients.
The progression of PAH is closely linked to the immune cell population, particularly the T-cell population, which has a huge potential impact on PAH (29). Among these is the CD4+ subpopulation of T cells (CD4+ T), which functions to resist the inflammatory response of other immune cells, and regulatory T cell deficiency can lead to vascular inflammation with PAH (30). In a lung environment deficient in T cells, lung tissue is more susceptible to vascular endothelial transformation, smooth muscle cell proliferation and adventitial fibroblast proliferation, which progresses to pulmonary vascular disease (31). In addition, studies have shown that regulatory T-cell-deficient rats exhibit accumulation of macrophages a week before and after the onset of PAH (13). Macrophages, in turn, secrete Leukotriene B4, a key compound in PAH pulmonary vascular endothelial injury (32). This suggests that a reduced proportion of T cells together with the accumulation of macrophages is a feature of the immune infiltration of PAH lung tissue.
We also noted that dendritic cells were recruited in large numbers in the lung tissue of PAH (33). Dendritic cell populations promote the production of pro-inflammatory cytokines such as IL-6, IL-8 and IL-10, and the increase in these inflammatory factors reduces survival in patients with PAH (34,35). The above findings suggest that the ratio of regulatory T cells, macrophages and dendritic cells is closely related to the progression of PAH, as was also observed in our study. Our study also indicated that GABBR1 expression showed a significant positive correlation with T-cell populations and a significant negative correlation with macrophages and dendritic cells. This suggests that elevating GABBR1 expression may improve the immune infiltration status of lung tissue in PAH.

Implications and actions needed
Traditional PAH biomarkers, such as BMPR2, are highly valuable in predicting the diagnosis and severity of PAH and are widely used in clinical work (36). However, the penetrance of BMPR2 mutations is low, with only 30% of carriers showing disease manifestations, and specificity is still lacking in idiopathic PAH (37). Therefore, the exploration and discovery of more specific biomarkers is beneficial for the early screening and diagnosis of PAH. In addition, there is an urgent need to develop new therapeutic targets for the treatment of PAH. The new horizons provided by this study will provide some reference for subsequent PAH research.

Conclusions
Our study identified 2 potential key genes, GABBR1 and IQGAP2, both of which have good diagnostic significance. They have great potential in regulating neuroreceptor activity and vascular endothelial proliferation in PAH lung tissue. One of them, GABBR1, is also more strongly associated with the immune infiltration of PAH lung tissue. Our findings may provide a theoretical basis for the analysis of the intrinsic mechanisms of PAH and the development of targeted medicines.

Figure 2
Figure 2 Volcano plot and heat map of DEGs expression of PAH. (A) Volcano plot of DEGs (The differences are set as |LogFC| > 1 and Adjusted P value < 0.5); (B) Heatmap of DEGs expression (Top 40 up-regulated and 10 down-regulated genes). DEGs, differentially expressed genes; PAH, pulmonary arterial hypertension; FC, fold change.

Figure 3
Figure 3 KEGG, GO and DisGeNET analysis of DEGs. (A) KEGG analysis. (B) GO analysis (Top 5 according to P value in BP, CC, and MF, respectively). (C) DisGeNET analysis (Top 10 according to P value in DisGeNET). DEGs, differentially expressed genes; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; BP, biological processes; CC, cellular components; MF, molecular functions.

Figure 4
Figure 4 SVM-RFE, LASSO and Random Forest algorithms for screening hub genes. (A) RMSE values for different subsets of the number of variables in the SVM-RFE algorithm. (B) Top 10 variables in the optimal subset of the SVM-RFE algorithm. (C) Coefficient distribution of genes in the LASSO algorithm. (D) Distribution of Lambda values for different subsets of the LASSO algorithm. (E) Top 10 genes with %incMSE values in Random Forest algorithm. (F) Top 10 genes with IncNodePurity values in Random Forest algorithm. (H) Venn diagram of three machine learning algorithms. (I) Expression plot of GABBR1 in the combined dataset. (J) Expression plot of IQGAP2 in the combined dataset.

Figure 5
Figure 5 Validation of the diagnostic efficacy of GABBR1 and IQGAP2. (A) ROC curve of combined data set. (B) ROC curve of GSE33463 data set. (C) Nomogram of combined data set. (D) Nomogram of GSE33463 data set. ROC, receiver operating characteristic.

Figure 6
Figure 6 Immune infiltration analysis of the combined data set. (A) Immune cell accumulation plot. (B) Heat map for immune cells. (C) Box plot of immune cell infiltration levels in healthy individuals and PAH patients. PAH, pulmonary arterial hypertension; ns, P > 0.05; *, P < 0.05; **, P < 0.01.

Figure 7
Figure 7 Heat map of the correlation analysis between core genes and immune cells.

Figure 8
Figure 8 Regulatory miRNAs for hub genes.