Integrated analysis of single-cell and transcriptome based RNA-seq multilayer network and WGCNA for construction and validation of an immune cell-related prognostic model in clear cell renal cell carcinoma

Clear cell renal cell carcinoma (ccRCC) is the most common type of renal cancer (RCC). The increasing incidence and poor prognosis of ccRCC after tumour metastasis makes the study of its pathogenesis extremely important. Traditional studies mostly focus on the regulation of ccRCC by single gene, while ignoring the impact of tumour heterogeneity on disease progression. The purpose of this study is to construct a prognostic risk model for ccRCC by analysing the differential marker genes related to immune cells in the single-cell database for providing help in clinical diagnosis and targeted therapy. Single-cell data and ligand-receptor relationship pair data were downloaded from related publications, and ccRCC phenotype and expression profile data were downloaded from TCGA and CPTAC. The DEGs and marker genes of the immune cell were combined and then intersected with the ligand-receptor gene data, and the 981 ligand-receptor relationship pairs obtained were intersected with the target gene of the transcription factor afterwards; 7,987 immune cell multilayer network relationship pairs were finally observed. Then, the genes in the network and the genes in TCGA were intersected to obtain 966 genes for constructing a co-expression network. Subsequently, 53 genes in black and magenta modules related to prognosis were screened by WGCNA for subsequent analysis. Whereafter, using the data of TCGA, 28 genes with significant prognostic differences were screened out through univariate Cox regression analysis. After that, LASSO regression analysis of these genes was performed to obtain a prognostic risk scoring model containing 16 genes, and CPTAC data showed that the effectiveness of this model was good. The results of correlation analysis between the risk score and other clinical factors showed that age, grade, M, T, stage and risk score were all significantly different (p < 0.05), and the results of prognostic accuracy also reached the threshold of qualification. Combined with clinical information, univariate and multivariate Cox regression analyses verified that risk score was an independent prognostic factor (p < 0.05). A nomogram constructed based on a predictive model for predicting the overall survival was established, and internal validation performed well. Our findings suggest that the predictive model built based on the immune cells scRNA-seq will enable us to judge the prognosis of patients with ccRCC and provide more accurate directions for basic relevant research and clinical practice.

and other clinical factors showed that age, grade, M, T, stage and risk score were all significantly different (p < 0.05), and the results of prognostic accuracy also reached the threshold of qualification. Combined with clinical information, univariate and multivariate Cox regression analyses verified that risk score was an independent prognostic factor (p < 0.05). A nomogram constructed based on a predictive model for predicting the overall survival was established, and internal validation performed well. Our findings suggest that the predictive model built based on the immune cells scRNA-seq will enable us to judge the prognosis of patients with ccRCC and provide more accurate directions for basic relevant research and clinical practice.
In order to construct the ligand-receptor network, we first took the union of the differential genes of all immune cell clusters and the marker genes of all these immune cells. Afterwards, we intersected them with the ligand-receptor relationship pairs downloaded from [3]. Finally, a total of 981 pairs of ligandreceptor relationships were obtained (Fig. S3A, Table S3). Afterwards, we downloaded the transcription factors and target genes data from the TRRUST (https://www.grnpedia.org/trrust/) database and merged the ligand-receptor network data with the transcription factors and target genes data, which is the intersection of the target genes of the transcription factors and the genes from the ligand-receptor network, we obtained 7,987 immune cell multifactor network relationship pairs (Table S4). Then, 966 genes used for the immune cell multifactor network construction (Fig. S3B) were obtained by intersecting the 973 genes contained in the ligand-receptor network and the genes in TCGA (https://portal.gdc.cancer.gov/) dataset about ccRCC (Table S5).
Through the WGCNA algorithm [4], the 966 genes in the immune cell multifactor network were used to construct a co-expression network to find the key modules related to overall survival (OS) and OS time. The appropriate scale-free power value was screened out as 4 (Fig. S4A) when the average connectivity degree was relatively higher [5]. All constructed modules are painted with different colours, and the cluster trees of genes are also shown in Fig. S4B. The black and magenta modules were chosen as the key modules, since they had the highest correlations with OS and OS time of ccRCC ( Fig.  S4C-D). The correlations between module eigengenes (MEs) and clinic traits are shown in Fig. S4E. There were 53 genes in these two modules (Table S6).
For a deeper understanding about the biofunctions of these modules, genes in these modules were carried out to conduct gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses, the top terms were visualized (Fig. S4F-I).
Using the "survival" R package to perform univariate Cox regression analysis on the 53 genes contained in the key modules of WGCNA, 28 genes with P value < 0.05 were obtained. Then, the 28 genes with significant prognostic differences were subjected to LASSO regression analysis. The specific steps were to first adopt the Cox proportional hazard model (family = "Cox") to calculate the hazard ratio (HR) and P values of these genes (Fig. 1A) and then randomly simulated 1,000 times (maxit = 1000) to find the most suitable λ value (Fig. 1B). Finally, "lambda.min" was used to screen out 16 genes for constructing a risk scoring model from these 28 genes (Fig. 1C). According to the risk score of each patient given by the model, the median was taken as the cut-off value, and the samples were divided into high and low risk groups. The time-dependent receiver operating characteristic (ROC) curve was used to evaluate the predictive ability of the model's 1-, 3-and 5year survival periods, and the survival curves of the high and low risk groups were also analysed. The CPTAC (https://cptac-dataportal.georgetown.edu/study-summary/S050) dataset for ccRCC was taken as the external validation database to verify the prognostic risk model. In the evaluation of the predictive efficacy of the prognosis model, Kaplan-Meier (K-M) survival analysis was performed on the high and low risk groups, and the difference was significant (Fig. 1D). In the external CPTAC dataset, K-M survival analysis was performed on the high and low risk groups, and the difference was also significant (Fig. 1E). Moreover, the AUC values in their ROC curves validated their precision (Fig. 1F-G).
In the correlation comparison between risk score and other clinical factors, the results showed that all factors other than age had a certain correlation with risk score (Fig. S5A-G). From the ROC curve, it was also found that, except for the N stage, the AUC values of other ROC curves were above 0.5, and the AUC values of the risk score and stage were above 0.7 (Fig. S5H-J).
The immune infiltration of genes in the model is shown in (Fig. 6A). The differential expression of them in TCGA (Fig. 6B) and CPTAC (Fig. 6C) are displayed. Univariate Cox regression analysis (Fig. S6D) and multivariate Cox regression analysis (Fig. S6E) for the independent analysis of the prognostic model were also done.
Additionally, we also developed a nomogram with predictors included age, gender, T, N, M, grade, risk score and stage (Fig. S7A). The consistency index (C-index), which represents the discrimination, of the nomogram was 0.799. and our calibration chart revealed that the nomogram has a good predictive ability (Fig. S7B-D). The AUC values indicated that the predictive precision of the nomogram combined with all factors was better (Fig. S7E-G).
In summary, our research reveals that the immune cells in ccRCC have a significant impact on the prognosis of patients, and have obvious potential regulatory effects on the occurrence and development of ccRCC. The predictive model based on immune cell scRNA-seq will enable us to judge the prognosis of ccRCC patients and provide valuable guidance for clinical practice.

Consent for publication
Not applicable.

Availability of data and material
The data and materials can be obtained by contacting the corresponding author.

Declarations of interest
None.

Funding
This work was sponsored by the Guangdong Medical Research Foundation (A2021375).

Authors' contributions
G.W designed the research plan, analyzed datasets and wrote the manuscript. W.G and S.Z provided meaningful discussion of key points. G.F gave guidance in whole study and revised the manuscript. All authors read and approved the final manuscript.

Acknowledgements
Thanks to the assistance of Mugu lnc (Beijing) in bioinformatics.         Tables   Table S1. Notes on cell clustering. Table S2. Differential genes in each cell cluster. Table S3. Ligand-receptor relationship pair. Table S4. Immune cell multifactor network relationship pair.