Characteristics of circadian rhythm-related genes and establishment of a prognostic scoring system (CRscore) for lung adenocarcinoma with experimental verification

Background Non-small-cell lung cancer (NSCLC) is one of the most common malignant tumors worldwide. Lung adenocarcinoma (LUAD), which is the m ain subtype of NSCLC, has a poor prognosis. In recent years, circadian rhythm (CR)-related genes (CRRGs) have demonstrated associations with tumor occurrence and development, but the relationship between CRRGs and LUAD is not clear. Methods Based on data from The Cancer Genome Atlas and Gene Expression Omnibus databases, we explored the biological function and immune cell infiltration of LUAD in different CR clusters and quantified CR genes using principal component analysis. Then, we built a CR scoring system (CRscore) to explore the relationship between CRRGs and LUAD prognosis. Results Patients were divided into three clusters (A, B, and C). Biological characteristics, such as survival, immune cell infiltration, and gene enrichment, were significantly different among the three clusters. We then established the usefulness of the CR score, which could effectively predict the prognosis of LUAD. Specifically, patients with a high CR score had a better prognosis and were more sensitive to chemotherapy than patients with a low CR score. Conclusion CRRGs can be used to assess the prognosis of patients with LUAD. Quantification of CR using the CRscore tool in patients with LUAD could help to guide personalized cancer immunotherapy strategies in the future. Thus, the CRscore may be a powerful prognostic tool for LUAD.

18.4% of all cancer deaths, and has the highest incidence of all types of cancer 67 worldwide (11.6%) (9). Non-small-cell lung cancer (NSCLC) is the most common 68 type of lung cancer, accounting for approximately 85% of cases. Lung 69 adenocarcinoma (LUAD) is the most common type of NSCLC, and its incidence is 70 increasing year by year. Therefore, it is necessary to identify key molecules and to 71 establish an effective prediction model with good stability that can be used to 72 implement precise treatment and improve the prognosis of patients with LUAD. 73 In this study, we investigated the relationship between LUAD prognosis and CRRGs 74 and established a CR scoring system (CRscore) to predict the prognosis of patients 75 with LUAD and guide treatment selection.  This study examined 10 genes (AANAT , NPAS2, ARNTL, CRY1, PER3, CLOCK, 94 CRY2, CSNK1E, NR1D2, and BHLHE40). First, the copy numbers of the CRRGs 95 were extracted from TCGA-LUAD using Perl software, and histograms were 96 intuitively constructed using R software. The R Circos package was used to map the 97 change in 10 CRRGs on 23 pairs of chromosomes to investigate the relationships 98 between CRRG copy number and chromosome. The differential expression of these 99 CRRGs in TCGA-LUAD was compared using the Wilcoxon rank-sum test in the 100 limma package, which provides a comprehensive solution for microarray and 101 RNA-Seq differential analyses (13). A waterfall plot was created using the maftools 102 package of R to bear out the mutation rate of CRRGs in patients with LUAD. A 103 boxplot was constructed using the ggpubr package of R, and a heatmap was drawn 104 using the pheatmap package. A P value of <0.05 was considered statistically 105 significant. types, the number of samples in each cluster was not short, and no significant increase 116 in the cumulative distribution curve area . According to the correlations between the 117 CRRG clusters and the survival status, the survminer package was used to determine 118 the cut-off points of each subgroup of data, and all possible cut-off points were tested 119 to identify the maximum rank statistics. Based on the log-rank statistics, patients were 120 divided into high, medium, and low expression groups. The Kaplan-Meier method 121 and the survminer package were used to generate survival curves for the predictive 122 analysis. The log-rank test was used to identify significant differences. A P value of 123 <0.05 was considered statistically significant. 125 GSVA is a non-parametric, unsupervised method for estimating path changes and 126 changes in the activity of biological processes in samples in an experimental dataset 127 (14). Based on differences in CRRGs, we revealed the biological pathways between 128 different CRRG clusters. For the gene enrichment analysis, we download 129 "C2.CP.KEGG.7.5.1.symbols ". The scores of different paths in each sample were 130 calculated using the GSA package of R, and the path differences were analyzed using 131 the limma package. A P value of <0.05 indicated differential expression of pathways 132 in pathway regulation (15,16). Heatmaps were drawn using the phatmap package. 133 We used the single-sample GSEA and the gene enrichment score to inspect the 134 relative abundance of immune cell infiltration to obtain the immune score (17). We  To identify the CRRGs there were associated with prognosis (P < 0.05), the univariate 153 Cox regression analysis was used to analyze the DEGs using the survminer package.

154
The ConsensusClusterPlus package was used to cluster the samples according to the 155 expression of prognostic CRRGs to determine the CRRG clusters that should be 156 further analyzed. First, we performed a survival analysis to evaluate the prognostic the survival curves of the three groups, and the difference between the three groups 160 was significant (P < 0.001) according to the log-rank test . After collecting the clinical 161 data (LUAD stage, age, sex, and alive/deceased status), the phatmap package of R 162 was used to draw a heatmap of the correlations between clinical features and CRRG 163 clusters. The boxplot was drawn using the ggpubr package.  We used the following calculation to construct the CRscore mainly using the PCA : where i represents the expression of prognostic DEGs.

176
Correlations between the CR score and patients' clinical 177 characteristics 178 We divided the patients into the high CR score group and the low CR score group for 179 further analysis using the CRscore. First, the survival analysis was used to assess the 180 prognostic value of the CRRG clusters using the same method as described above. respectively. The CR score and CR stage were compared using the limma package.

193
The log-rank test was used to identify statistically significant differences, and the The Kruskal-Wallis test was used to compare three or more groups.  Unsupervised clustering based on CR 315 We introduced a new cohort (merged cohort), which consisted of TCGA-LUAD data 316 and GEO data (GSE37745). Unsupervised clustering was used to separate the tumor 317 samples according to the expression of the seven CRRGs mentioned above.

318
According to the cumulative distribution function value, the optimal quantitative 319 cluster was determined as 3 (k = 3). Thus, the tumor samples were divided into three 320 groups ( Fig. 2A): CRcluster A, CRcluster B, and CRcluster C. There were significant 321 differences among the three clusters according to the results of the PCA (Fig. 5A).

322
Specifically, CRcluster C had the greatest survival advantage (Fig. 2B). The heatmap 323 shows differences in CRRG expression among the different clusters, and the 324 expression of CRRGs was highest in CRcluster B ( Fig. 2C).

369
Based on the three CR clusters, we conducted a differential analysis of the 370 amalgamated cohort to identify DEGs related to CR. As shown in Fig. 5B the co-intersection of the CR typing differential genes. These genes were analyzed 375 using GO and KEGG enrichment analyses to identify their functions (Figs. 4A, B). 376 We identified 151 GO terms and 49 KEGG pathways (P < 0.05, Q < 0.05 prognosis-related DEGs (Fig. 5D), which showed a significant ability to predict 384 patient survival (P < 0.05). Coincidentally, the patients were still divided into three 385 clusters (A-C) using the unsupervised cluster analysis of the 110 CRRGs (Fig. 6A).

386
Patients with gene cluster B had the greatest survival rate in the survival analysis ( Fig.   387 6B). We found that the proportion of patients with LUAD stage I−II was particularly 388 large, and these patients were mainly concentrated in gene cluster B (Fig. 6C).

389
Subsequently, on the basis of the three gene clusters, we analyzed the differential 390 expression of the seven CRRGs in the merged cohort using the limma package. As 391 expected, there were remarkable differences in CRRG expression among the three 392 gene clusters (Fig. 5C). The regulatory mechanism of CRRGs was verified based on 393 the above results. were significantly negatively correlated with the CR score (Fig. 8B). In other words, 410 the lower the CR score, the stronger the immunity. The Kruskal-Wallis test showed a 411 significant difference in the CR score between the CR clusters and the gene clusters.

412
CRcluster A had the lowest score, while CRcluster C had the highest score. As a 413 result, we hypothesized that patients with high and low CR scores were more inclined 414 to suppress and develop tumors, respectively. This is well supported by the survival 415 curves of the high and low CR score groups (Fig. 7A). In terms of the gene clusters, 416 the CR score sequence was gene cluster C > gene cluster B > gene cluster A (Fig.   417 7D). The same was true for the CR clusters (CRcluster C > Crcluster B > Crcluster A) 418 (Fig. 7C). 419 We also analyzed the relationship between the TMB of LUAD and the CR score to 420 explore the relationship between the CR score and tumor occurrence and 421 development. Tumors with a high CR score showed a high TMB (Fig. 9B). In other 422 words, the CR score was positively correlated with the TMB (P < 0.05; r = 0.17) (Fig.   423 9A). The survival curves of the TMB and CR score show that there was no significant 424 difference in survival between the high and low TMB groups (Fig. 9C), but the TMB 425 combined with the CR score predicted a significant difference in survival (Fig. 9D).

426
Patients with a high TMB and a high CR score had a longer survival time. It can be 427 speculated that combining the CR score with the TMB can enhance the sensitivity of 428 TMB to forecast the effectiveness of immunotherapy in patients with LUAD.

467
Although the P value of patients with stage III-IV LUAD was >0.05, the usefulness 468 of the CRscore tool in predicting prognosis should not be underestimated (Fig. 11H).

469
Next, the univariate and multivariate Cox regression analyses were performed for the 470 CR score and clinical characteristics (LUAD stage, sex, and age). Both the univariate 471 (Fig. 9E) and multivariate (Fig. 9F) analyses showed that age, stage, and CR score 472 were independent prognostic factors in this study.

533
The CR is a natural internal homeostatic mechanism that regulates the physiological significantly different between the LUAD group and the healthy group (P < 0.001). In 554 the subsequent genotyping group of prognosis-associated CRRGs, the expression of 555 these seven CRRGs was also significantly different among the three groups (P < 556 0.05). Therefore, we speculate that CRRGs play an essential role in the occurrence 557 and development of LUAD. During the CR cluster analysis, the order of survival was 558 CRcluster C > CRcluster B > CRcluster A. Coincidentally, the same conclusion was 559 drawn among each CR gene cluster, as follows: gene cluster C > gene cluster B > 560 gene cluster A . We hypothesized that the CR score was inextricably related to patient 561 survival. This conjecture was confirmed in the subsequent analysis. As can be seen 562 from the relationship between the CR score and the TMB, the CR score was 563 positively correlated with the TMB. We speculated that high and low CR scores 564 would reveal an anti-tumor process and tumor cell proliferation, respectively. A series 565 of analyses on the gene population confirmed this speculation.

566
As can be seen from the survival analysis, the lower the CR score, the poorer the 567 survival and the higher the tumor malignancy. Further, we detected four common 568 immune checkpoint proteins and predicted their immunogenicity. We suggest that 569 CTLA-4 immunotherapy is more suitable for patients with high CR scores. The 570 sensitivity analysis of common chemotherapeutic drugs showed that most 571 chemotherapeutic drugs were more effective in patients with high CR scores. From 572 another perspective, we also know that the higher the CR score, the better the 573 prognosis.

574
Although the CR score is a prognostic guide and a positive predictor of prognosis in 575 patients with LUAD, some limitations of this study still need to be considered. First, 576 the samples were obtained from public databases, which may have led to selection 577 bias. Second, CRRGs in the database were transcribed from tumor tissues, making it 578 improbable to recognize where the CRRGs identified in this study came from. Finally, 579 not all patients with high CR scores will gain greater immunotherapy benefits, so 580 more clinical factors need to be added to the prediction model to improve its 581 accuracy.

582
In conclusion, we elucidated the significance of CRRGs in clinical practice, immune