Renping Wan1, Hongliang Liao1, Jingting Liu1, Lin Zhou1, Yingqiu Yin2, Tianhao Mu3,4, Jie Wei2. 1. Department of Thoracic Surgery, Yuebei People's Hospital, 133 Huimin South Road, Wujiang District, Shaoguan City, Guangdong Province, China 440200. 2. Department of Respiratory Medicine, Yuebei People's Hospital, 133 Huimin South Road, Wujiang District, Shaoguan City Guangdong Province, China 440200. 3. Department of Oncology, HaploX Biotechnology, 8/F, Aotexin Power Building, No. 1, Songpingshan Road, High Tech North District, Nanshan District, Shenzhen City, Guangdong Province, China 440300. 4. Department of Biomedical and Health Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue University Town, Nanshan District, Shenzhen City, Guangdong Province, China 440300.
Abstract
Cancer stem cells (CSCs) can induce recurrence and chemotherapy resistance of lung adenocarcinoma (LUAD). Reliable markers identified based on CSC characteristic of LUAD may improve patients' chemotherapy response and prognosis. OCLR was used to calculate mRNA expression-based stemness index (mRNAsi) of LUAD patients' data in TCGA. Association analysis of mRNAsi was performed with clinical features, somatic mutation, and tumor immunity. A prognostic prediction model was established with LASSO Cox regression. Kaplan-Meier Plotter (KM-plotter) and time-dependent ROC were applied to assess signature performance. For LUAD, univariate and multivariate Cox analysis was performed to identify independent prognostic factors. LUAD tissues showed a noticeably higher mRNAsi in than nontumor tissues, and it showed significant differences in T, N, M, AJCC stages, and smoking history. The most frequently mutated gene was TP53, with a higher mRNAsi relating to more frequent mutation of TP53. The mRNAsi was significantly negatively correlated with immune score, stromal score, and ESTIMATE score in LUAD. The blue module was associated with mRNAsi. The 5-gene signature was confirmed as an independent indicator of LUAD prognosis that could promote personalized treatment of LUAD and accurately predict overall survival (OS) of LUAD patients.
Cancer stem cells (CSCs) can induce recurrence and chemotherapy resistance of lung adenocarcinoma (LUAD). Reliable markers identified based on CSC characteristic of LUAD may improve patients' chemotherapy response and prognosis. OCLR was used to calculate mRNA expression-based stemness index (mRNAsi) of LUAD patients' data in TCGA. Association analysis of mRNAsi was performed with clinical features, somatic mutation, and tumor immunity. A prognostic prediction model was established with LASSO Cox regression. Kaplan-Meier Plotter (KM-plotter) and time-dependent ROC were applied to assess signature performance. For LUAD, univariate and multivariate Cox analysis was performed to identify independent prognostic factors. LUAD tissues showed a noticeably higher mRNAsi in than nontumor tissues, and it showed significant differences in T, N, M, AJCC stages, and smoking history. The most frequently mutated gene was TP53, with a higher mRNAsi relating to more frequent mutation of TP53. The mRNAsi was significantly negatively correlated with immune score, stromal score, and ESTIMATE score in LUAD. The blue module was associated with mRNAsi. The 5-gene signature was confirmed as an independent indicator of LUAD prognosis that could promote personalized treatment of LUAD and accurately predict overall survival (OS) of LUAD patients.
Lung adenocarcinoma (LUAD) originates from small airway epithelial type II alveolar cells [1, 2]. Most LUAD patients are diagnosed at advanced cancer stages; conventional treatments for those patients are chemotherapy and radiation, to which LUAD is highly resistant. Thus, LUAD shows a high mortality, the five-year survival chance of which is about 15% [3, 4]. This also demands better improvement of early diagnosis, survival prediction, and relapse monitoring of LUAD patients to prolong their survival.A study found cancer stem cells (CSCs) as a small subgroup of cancer cells with stemness. The self-renewing of CSCs and their production of differentiated cells could facilitate the formation of tumor heterogeneity [5]. The latest evidence indicated that CSC-mediated stem-like phenotypes of cancer cells are the major factor responsible for cancer recurrence and chemical resistance [6]. This also points to the need to accurately identify CSC population. However, due to the quiescent nature of lung epithelial cells, distinguishing normal lung epithelial cells from lung CSCs is still a great challenge [7]. A study showed that identifying CSC markers may help characterize CSCs [8]. In LUAD, several CSC markers, including CD44 [9], CD90 [10], and SOX2, have been discovered [11], but clinical application of CSC-specific biomarkers is less popular. Furthermore, the fact that most markers could mark heterogeneous stem cell populations suggests that their isolation and characterization should be developed using a combination of surface markers or a combination of intracellular or extracellular markers [12]. Malta et al. [13] developed a new indicator to reflect stem cell features of mRNAsi calculated by OCLR. Their results proved that a higher value of mRNAsi is indicative of stronger characteristics of CSCs.At present, there are several system biology methods to identify biomarkers related to the prognosis of LUAD and construct mRNA features. Li et al. [14] identified a 7-gene signature by a nonnegative matrix factorization (NMF) method using gene expression profile related to lipid metabolism, Shi et al. [15] established three TKI-related gene expression profiles to predict the chemosensitivity of lung cancer patients. Zhang et al. [16] identified seven gene markers to predict the prognosis of lung cancer patients by using multiomics data integration analysis. The authors of the three groups tested their gene signatures in internal and external data sets but did not conduct clinical verification. This means that identifying robust gene signatures is still a challenge, and more queues are needed to verify signatures. In conclusion, it is very important to identify the gene characteristics related to the prognosis of LUAD by bioinformatics analysis of its biological function. In this study, clinical LUAD data were derived from TCGA database, and mRNAsi was calculated based on OCLR to investigate the relationship of mRNAsi and clinical LUA characteristics and mutations of LUAD. The weighted gene coexpression network analysis (WGCNA) was constructed for screening modules associated with mRNAsi, according to which genes showing a significance of prognostic relevance to LUAD were filtered. Finally, a 5-gene independent prognostic signature was established that may be beneficial for optimizing survival risk assessment and personalized management of patients with LUAD.
2. Materials and Methods
2.1. Data Acquisition and Research Design
From the PCBC website, RNA-Seq data for pluripotent stem cell samples were acquired using the R package synapser (v 0.6.61). The data of induced pluripotent stem cells (IPSC) samples and embryonic stem cell (ESC) were retained. The Ensembl IDs of ESC and IPSC samples were reserved and converted into gene symbol to retain only protein-encoding genes. The data of 78 samples were obtained. From TCGA database, clinical LUAD data containing RNA-Seq profilation information of 594 samples were acquired. The microarray GSE31210 (n = 246) and GSE50081 (n = 181) were downloaded from GEO website (Table S1). Each LUAD sample had expression profile information and survival data. The median expression value of multiple gene symbols was taken, whereas probes corresponding to multiple genes were excluded. The comprehensive gene annotation is obtained from the GENCODE database (GRCh38.p13), and the information is used to map the Ensembl ID to the gene symbol, and only the protein-encoding genes were reserved. Figure 1 summarizes our study design.
Figure 1
Flowchart of research design.
2.2. Correlation between mRNAsi and Clinical Features
mRNAsi was calculated according to the OCLR method provided by Malta et al. [13]. mRNAsi differences between tumor samples and normal samples were analyzed by an unpaired t-test. We used one-way ANOVA in mRNAsi difference comparison between groups of patients in terms of gender, age, clinical stage, TNM stage, and smoking history.
2.3. Relational Analysis between mRNAsi and Molecular Subtypes
MuTect [17] detection on TCGA-LUAD was performed using TCGAbiolinks [18] (V2.14.0), and differences in mRNAsi of different molecular mutant subtypes were analyzed. In addition, molecular subtype of TCGA-LUAD samples was also extracted using R package TCGA biolinks. mRNAsi differences between samples classified by the CIMP or iCluster were compared.
For constructing a coexpression network [19], the WGCNA algorithm was performed. After removing outlier samples, Pearson's correlation coefficients were determined between groups of genes. The optimal efficacy value β was chosen to construct proximity matrix, which was transformed into topological overlap matrix (TOM). For gene cluster according to TOM (in each gene network module, the minimum number of genes was 80), an average-linkage hierarchical clustering method was applied. The pruning algorithm was applied to divide gene modules and integrate those close modules. The most relevant modules with mRNAsi were screened by correlation analysis.
2.5. Functional Enrichment Analysis
The R software (https://www.R-project.org/,version 4.0.2) packageWebGestaltR [20] (v0.4.2) was employed to perform KEGG and GO functional enrichment analyses for analyzing potential biological functions of the most relevant modules of mRNAsi obtained from WGCNA. GO categories were cellular component (CC), molecular function (MF), and biological process (BP). A statistical significance was defined when FDR < 0.05 and P < 0.05.
2.6. Construction and Verification of Prognostic Signature
Genes of mRNAsi-related modules were separated into verification and training sets based on the principle of the same sample size (Table 1). To analyze the relevance between genes and OS (statistical significance was P < 0.01), Univariate Cox analysis was used here. Glmnet software (doi:10.18637/jss.v039.i05, version 4.1-2) package was used for LASSO Cox regression analysis. Here, those genes of a P < 0.05 were further refined according to the Akaike Information Criterion (AIC). The risk score was determined for each patient by multiplying risk factor obtained by Lasso Cox with gene expression extracted. After standardization, with 0 as the threshold, the samples were grouped by the risk scores into two risk groups (low and high). For OS comparisons between risk groups, we plotted Kaplan-Meier (KM) survival curve. ROC curves were used for prediction evaluation of the signature. In addition, previous LUAD prognostic models were compared with the current risk model.
Table 1
Data statistics of TCGA training set and validation set.
Clinical features
TCGA-train
TCGA-test
P
OS
0
156
162
0.6421
1
94
88
Gender
Female
122
148
0.02488
Male
128
102
T stage
T1
82
85
0.1201
T2
143
124
T3
17
28
T4
8
10
TX
0
3
N stage
N0
160
164
0.2326
N1
52
42
N2
35
34
N3
1
1
NX
2
9
M stage
M0
165
167
0.4442
M1
15
9
MX
70
74
Stage
I
131
137
0.8681
II
61
58
III
39
41
IV
15
10
X
4
4
Age
≤65
111
126
0.3647
>65
133
120
NA
6
4
2.7. The Construction of a Nomogram
To precisely determine independent LUAD prognostic factors, clinical parameters, including gender, age, AJCC stage, T stage, and risk score were subjected to univariate and multivariate Cox regression analyses. Using these clinical factors, a nomogram for OS analysis in 1, 3, and 5 year(s) was built.
3. Results
3.1. mRNAsi and Clinical Characteristics of LUAD
See Figure 1 for the work flow of the current work. mRNAsi, which is a new stemness index for dedifferentiation potential evaluation of tumor cells, has been regarded as a CSC marker [21]. mRNAsi was noticeably higher in LUAD tissues than nontumor ones (Figure 2(a)). The clinical features of mRNAsi in LUAD were examined. The divisions of LUAD patients were divided into two groups according to gender, and age showed no significant difference in mRNAsi between age > 65 and age ≤ 65 (Figure 2(c)), but mRNAsi significant differences in N stage (Figure 2(e)), gender (Figure 2(b)), AJCC stage (Figure 2(g)), and smoking (Figure 2(h)), T stage (Figure 2(d)), and M stage (Figure 2(f)) were observed. Hence, mRNAsi was associated with TNM stage, AJCC stage, and smoking.
Figure 2
mRNAsi and clinical characteristics of LUAD. (a) mRNAsi differences between neoplastic and normal tissues. (b) Differences in mRNAsi between female and male LUAD patients. (c) mRNAsi difference in LUAD patients with ge > 65 and age ≤ 65. (d) mRNAsi differences between LUAD patients at different T stages. (e) mRNAsi differences between LUAD patients at different N stage. (f) mRNAsi analysis of M1 stage, M2 stage, and M3 stage patients. (g) mRNAsi differences among the four AJCC stage. (h) Differences in mRNAsi among grouped patients according to smoking.1 stands for life-long nonsmokers (fewer than 100 cigarettes during lifetime), 2 stands for current smokers (includes daily smokers and nondaily smokers or occasional smokers), 3 stands for current reformed smokers for >15 years, 4 stands for current reformed smokers for ≤15 years, and 5 stands for current reformed smokers; duration not specified = 5.
3.2. Associations of mRNAsi with Mutations
The somatic mutation spectrum of LUAD patients was analyzed by maftools. Figure 3(a) shows the top 10 most frequently genes (KRAS, TTN, TP53, MU16, ZFHX4, Ush2a, CSMD3, LRP1b, RyR2, and XIRP2); here, the gene showing the most frequent mutation was TP53. The mRNAsi differences of each gene between the mutated and nonmutated samples were further analyzed, and the mRNAsi of TTN-, LRP1B-, TP53-, CSMD3-, ZFHX4-, MU16-, USH2A-, XIRP2-, and RyR2-mutated samples were greatly higher than that of nonmutated samples (Figures 3(b)–3(j))3. To further examine the relationship between mRNAsi of tumor samples and clinical features and molecular mutations, tumor samples were arranged according to mRNAsi from low to high, and the clinical data and mutation trends of different samples with mRNAsi were compared. The results demonstrated that the mortality and AJCC stage of patients were increased with the increase of mRNAsi. In addition, a higher mRNAsi was indicative of more frequent mutations of CSMD3, TTN, MU16, RyR2, and TP53 (Figure 4).
Figure 3
Associations of mRNAsi with mutations. (a) An overview of the mutant map in LUAD; only the 10 genes with the highest mutation frequency are shown here. (b) mRNAsi differences between TP53 mutant (MT) and TP53 wild-type (WT) samples. (c) Differences in mRNAsi between TTN mutant LUAD samples and TTN wild-type LUAD samples. (d) mRNAsi differences between MU16 mutant LUAD samples and MU16 wild-type LUAD samples. (e) mRNAsi difference in patients with CSMD3 mutant and CSMD3 wild-type. (f) mRNAsi was compared between RYR2 mutant LUAD and RYR2 wild-type LUAD patients. (g) mRNAsi in LUAD patients with mutant LRP1B was compared with that in LUAD patients without mutant LRP1B. (h) Difference analysis was used to compare the difference in mRNAsi between samples with and without mutations in USH2A. (i) Violin plots showed mRNAsi between LUAD samples with and without LRP1B mutations. (j) Differences between XIRP2 wild-type samples and XIRP2 mutant samples Violin diagram of mRNAsi.
Figure 4
Clinical data and mutation trends for LUAD samples with different mRNAsi.
3.3. Correlation of mRNAsi with Molecular Subtypes and Tumor Immunity
To understand the mRNAsi differences between subtype groupings, LUAD samples were grouped according to CpG Island methylator phenotype (CIMP) or iCluster [22]. According to CIMP, the LUAD samples were divided into three subtypes, and significant differences in mRNAsi among the three subtypes were observed (Figure 5(a)). iCluster analysis detected six clusters in LUAD, with significant differences in mRNAsi among them (Figure 5(b)). We also investigated the association of mRNAsi with tumor immunity. ESTIMATE software (https://sourceforge.net/projects/estimateproject/) package was employing for immune and matrix score determination of LUAD samples, and Pearson's correlation analysis showed that in LUAD, ESTIMATE score, immune score, and stromal score were significantly negatively linked to mRNAsi (Figures 5(c)–5(e)), indicating that mRNAsi was involved in tumor immunity.
Figure 5
Correlation of mRNAsi with molecular subtypes and tumor immunity. (a) mRNAsi differences between LUAD samples classified according to CIMP. (b) mRNAsi differences among molecular subtypes identified by iCluster. (c) Correlativity between mRNAsi and stromal score of LUAD samples in TCGA. (d) Pertinent analysis between mRNAsi and immune score of LUAD samples in TCGA. (e) Correlation analysis between mRNAsi and ESTIMATE score of LUAD samples in TCGA.
3.4. Filtering mRNAsi-Related Gene Modules and Their Functions
WGCNA developed the coexpression network for identifying mRNAsi-related modules. Correlation coefficient in this study was >0.85 when β = 6 (Figure 6(a)). Therefore, a soft threshold of 6 was employed to establish a scale-free network; here, we obtained 14 gene modules (Figure 6(b)). The correlation of each module to LUAD patients' age, T stage, gender, smoking, N stage, AJCC stage, M stage, mRNAsi was analyzed; here, the blue module is the most associated with mRNAsi (Figure 6(c)). The biological processes involved in the blue module were explored using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). The blue module was found to be largely implicated in cell mitogen-related pathways, including organelle fission, ribonucleoprotein complex biogenesis, and ncRNA metabolic process (Figure 6(d)). From KEGG analysis, the blue module was mainly concentrated with DNA replication, homologous recombination, base excision repair, etc. (Figure 6(e)).
Figure 6
Identification of gene module associated with mRNAsi. (a) Analysis of network topology for various soft-thresholding powers. (b) Hierarchical clustering tree bases on the topological overlap dissimilarity. (c) Correlation between 14 gene modules and gender, age, T stage, N stage, M stage, AJCC stage, smoking, and mRNAsi. (d) Go analysis of the blue modules. (e) KEGG analysis of blue module.
3.5. Construction of the 5-Gene Signature Based on mRNAsi-Related Genes
A total of 2297 genes were extracted from the blue module by univariate Cox analysis, and 268 survival-related genes in LUAD were retained (Table S2). Nine prognosis-associated genes for LUAD patients were identified with LASSO Cox. According to AIC, 4 genes were eliminated, while the remaining 5 genes were used to build prognostic signature: Risk score = 0.117∗PKP2 + 0.340∗GNPNAT1 + 0.299∗H2AFX + 0.263∗TLE1 + 0.459∗AVEN (Figure S1). TCGA training set samples were classified into two risk groups (low and high) through calculating each sample's risk score using the 5-gene signature (Figure 7(a)). Survival analysis revealed a better prognosis of low-risk LUAD patient group (Figure 7(b)). For 1-year, 3-year, and 5-year OS, the AUC of the risk score was 0.7, 0.76, and 0.65, respectively (Figure 7(c)).
Figure 7
Construction of 5-gene signature on account of mRNAsi-related genes. (a) In TCGA training set, distribution of the risk score, survival data, and the mRNA expression of prognosis signature. (b) Survival curves of LUAD patients in a TCGA training set. (c) ROC analysis for OS prediction in TCGA training set.
3.6. Internal and External Verification of the 5-Gene Signature
To assess the prediction of the 5-gene signature, further validation was performed in four queues (TCGA validation set, complete TCGA-LUAD data set, GSE31210, and GSE50081). According to the risk score, cohort samples were categorized into two groups (low and high) (Figure 8(a)). We found that in TCGA validation set, complete TCGA-LUAD dataset, GSE31210, and GSE50081, prognosis of LUAD patients with a low risk was greatly better than high-risk ones with significant differences (Figure 8(b)). From the ROC analysis on the AUCs of long-term survival, we found that the 5-year survival was higher than 0.6 in the four cohorts (Figure 8(c)). These results confirmed that the 5-gene signature predicted LUAD survival accurately.
Figure 8
Internal and external verification of 5-gene signature. (a) Distribution of the risk score, survival data, and the mRNA expression of prognosis signature in different cohorts. (b) Survival curves of patients with LUAD in different cohorts in the high-risk and low-risk groups. (c) Time-dependent ROC analysis for OS prediction in four cohorts.
3.7. Independent Prediction of the 5-Gene Signature in LUAD Prognosis
We explored the relationship of risk score to clinical characteristics, such as M stage, gender, T stage, AJCC, age, N stage, and smoking. As shown in Figure 9, all of these clinical characteristics showed a close relation to the risk score. For verifying the effectiveness of 5-gene signature, stratified analysis was conducted on age (age > 65 and age ≤ 65), AJCC stage (stage III-IV, stage I-II), gender (male and female), M stage (N2-N3 and N0-N1), T stage (T3-T4, T1, and T2), and N stage (M0). The results verified an effective OS prediction of the risk model in almost all subgroups apart from N2-N3 stage patients (Figure 10). Next, a nomogram was established through combining gender, age, risk score, AJCC stage, and T stage. Here, the risk score showed the greatest impact on the prediction of OS (Figure S2A). Moreover, AJCC stage and risk score were independent prognostic factors for LUAD, as verified by the data from univariate and multivariate Cox analysis (Table 2).
Figure 9
Correlation between risk score and each clinicopathologic feature. A t-test or one-way ANOVA determined the correlation between risk score and age (a), gender (b), T stage (c), N (d), M stage (e), AJCC stage (f), and smoking (g), respectively.
Figure 10
Kaplan-Meier stratification survival analyses in TCGA-LUAD data set, including age ≥ 65 (a), age ≤ 65 (b), male (c), female (d), T1 (e), T2 (f), T3-T4 (g), M0 (h), N0-N1 (i), N2-N3 (j), stage I-II (k), and stage III-IV (l).
Table 2
Univariate and multivariate Cox analysis of all LUAD samples in TCGA dataset.
Variables
Univariable analysis
Multivariable analysis
HR
95% CI of HR
P
HR
95% CI of HR
P
Lower
Upper
Lower
Upper
Age
1.008
0.993
1.024
0.299
1.014
0.999
1.030
0.065
Gender
1.048
0.783
1.403
0.753
0.978
0.727
1.317
0.884
T stage
1.514
1.275
1.797
2.3E − 06
1.180
0.979
1.421
0.082
Stage
1.437
1.277
1.617
1.7E − 09
1.289
1.122
1.480
3.3E − 04
Risk score
1.739
1.509
2.004
2.2E − 14
1.737
1.489
2.025
2.0E − 12
3.8. The 5-Gene Signature Outperformed the Other Three Signatures in Predicting the Performance of the OS
We also compared the 5-gene signature with three previously developed signatures [23-29]. TCGA samples' risk score were, respectively, determined using the seven signatures, and accordingly, all patients were divided to two risk groups (low and high). The survival analysis revealed a significant prognosis difference in the two groups (Figures 11(a), 11(c), 11(e), 11111111). ROC curve of the seven signatures showed that the AUCs for the survival in 1, 3, and 5 year(s) were both lower than 0.75 and the average AUC of OS predicted by our 5-gene signature (Figures 11(b), 11(d)11(f), 11111111), which indicated a high accuracy and performance of our signature.
Figure 11
Five-gene signature outperformed the other three signatures in predicting the performance of the OS. (a) Kaplan-Meier curve of prognosis in patients with TCGA-LUAD predicted by 8-gene signature. (b) ROC curve of the 8-gene signature for 1-, 3-, and 5-year OS. (c) Kaplan-Meier curve of prognosis in patients with TCGA-LUAD predicted by 3-gene signature. (d) ROC curve of the 3-gene signature for 1-, 3-, and 5-year OS. (e) Kaplan-Meier curve of 3-gene signature developed by Cheng Yue et al. for predicting prognosis of patients with TCGA-LUAD.(f) ROC curve of the 3-gene signature developed by Yue et al. for 1-, 3-, and 5-year OS. (g) Kaplan-Meier curve for predicting the OS of TCGA-LUAD patients based on the 6-gene risk model developed by Wang et al. (h) ROC curve analysis showing the prognostic prediction efficiency of the risk model. (i) Kaplan-Meier curve for predicting the OS of TCGA-LUAD patients based on the 7-gene risk model developed by Al-Dherasi et al. (j) ROC curve analysis showing the prognostic prediction efficiency of the 7-gene risk model.
3.9. Functional Analysis and Immune Correlation Analysis of 5-Gene Signature
In order to clarify the potential regulatory mechanism of 5-gene signature, we used ssGSEA method to calculate the KEGG pathway enrichment score of each patient and further calculated the correlation between 5-gene signature and each pathway. We can observe the relationship between 5-gene signature and p53_SIGNALING_PATHWAY, MISMATCH_REPAIR, DNA_REPLICATION, and CELL_Cycle, and other pathways were significantly positively correlated with Fatty_ACID_METABOLISM and ARACHIDONIC_ACID_Metabolism, and other metabolic pathways were significantly negatively correlated (Figure S3A). In addition, we also observed a significant positive correlation between 5-gene signature and mRNAsi (Figure S3B) and a significant negative correlation between 5-gene signature and immune infiltration (Figure S3C-E). The five genes contained in 5-gene signature were significantly overexpressed in tumor samples (Figure S3F). We also analyzed the correlation between the expression of five genes contained in 5-gene signature and mRNAsi. It can be observed that except TLE1 gene, the other four genes showed significant positive correlation with mRNAsi (Figure S3J-K).
4. Discussion
Local and/or systemic treatment of LUAD has been greatly improved, but posttreatment recurrence is still relatively frequent [30]. The regrowth of such tumors after treatment is now thought to be dependent on a few CSCs [31]. Ongoing trials demonstrated that anti-CSC therapy can increase tumor response to chemotherapy and improve patient outcomes [8]. As CSCs consist of a variety of heterogeneous phenotypes rather than a single cell type, predicting the effectiveness of a specific therapy to target CSC could be difficult. Therefore, CSC-specific regulatory pathways or markers that are characteristic of LUAD should be developed along with anti-CSC therapies [32]. Here, we evaluated the CSC characteristics of LUAD samples based on mRNAsi and calculated mRNAsi for each sample in TCGA database using OCLR. Previous studies have shown that compared with normal tissue, mRNAsi is significantly higher in tumor tissues such as breast cancer tissue [33], gastric cancer tissue [34], liver cancer tissues [35], and lung squamous cell carcinoma [36]. At this point, the analytical data revealed a significantly lower mRNAsi in nontumor tissues than LUAD tissues and that mRNAsi showed great differences in M stage, N stage, smoking, AJCC stage, and T stage. Multiple studies demonstrated that LUAD had a high rate of somatic mutation and genome rearrangement [22]. In recent years, a number of somatic mutations occurring in LUAD, including TP53, KRAS, PIK3CA, and MET, have been discovered [37]. Our work found that TP53 was the gene with the highest mutation frequency in LUAD, which was also consistent with previous studies. It is reported that TP53 mutation had a negative impact on cancer prognosis and is associated with a shorter survival time [38]. The status of TTN mutation can be applied to independently evaluate immunotherapy prognosis of LUAD patients [39]. Loss of CSMD3 function resulted from somatic mutations can stimulate the oncogenic transformation of airway epithelial cells [40]. RYR2 has been considered a mutated driver of lung cancer [41]. Somatic mutations of LRP1B are linked to lung tumor mutation load [42]. Here, a higher mRNAsi was correlated with more frequent mutations of XIRP2, MU16, ZFHX4, CSMD3, TTN, USH2A, TP53, RyR2, and LRP1B, without significant mRNAsi difference in mutant KRAS or wild-type KRAS.A study found that CSCs can shape immune microenvironment of tumors, and in turn, the functional and phenotypic characteristics of tumor-infiltrating immune cells could affect the phenotype and differentiation of tumor cells [43]. This study then explored the association of tumor immunity to mRNAsi and confirmed that mRNAsi was closely correlated with ESTIMATE score, immune score, and stromal score of LUAD. Then, WGCNA showed that blue module was found to have the strongest correlation with mRNAsi; moreover, the module was mainly involved in pathways related to cell division. The accumulation of cell division of stem cells will lead to cancer development [44].Based on the blue module, we developed a 5-gene signature and verified its reliability and independence in four cohorts. Previous studies reported the role of five genes in LUAD. It has been found that high-expressed PKP2 could result in a poor LUAD prognosis. Functionally, PKP2 knockdown inhibits the invasion, proliferation of lung cancer cells in vitro, and xenograft lung tumor growth in vivo [45]. GNPNAT1 has been detected to be significantly higher in LUAD in comparison with normal ones and is linked to tumor size, lymphatic metastasis status, and clinical stage of the patients [46]. GNPNAT1 is associated with prognosis and immune infiltration in LUAD [47]. H2AFX, which is considered one of the key genes related to mRNAsi, is also associated with the prognosis and cell cycle of LUAD [48]. TLE1 is identified as a lung-specific oncogene that regulates the EMT of A549 cells through inhibiting E-cadherin [49]. Aven has critical functions in cancer cell response to radiation therapy [50]. However, heterogeneity of LUAD will reduce the reliability of a single gene than the use of a combination of multiple genes.The present work performed a comprehensive study of LUAD patients based on mRNAsi, screened the blue modules most associated with mRNAsi by WGCNA, and developed 5-gene signature for OS prediction of LUAD. The 5-gene signature showed a higher AUC in short-term OS prediction in both validation and training sets but was much lower in long-term prediction, suggesting that 5-gene signature is more suitable for predicting short-term survival of LUAD. Another limitation of this study was that all our data came from TCGA database, which lacked comprehensiveness. Moreover, biological experiments (in vitro and in vivo) should be conducted for result verification and further exploration, which will be conducted in our future study with systematic biological studies.
Authors: Pengyuan Liu; Carl Morrison; Liang Wang; Donghai Xiong; Peter Vedell; Peng Cui; Xing Hua; Feng Ding; Yan Lu; Michael James; John D Ebben; Haiming Xu; Alex A Adjei; Karen Head; Jaime W Andrae; Michael R Tschannen; Howard Jacob; Jing Pan; Qi Zhang; Francoise Van den Bergh; Haijie Xiao; Ken C Lo; Jigar Patel; Todd Richmond; Mary-Anne Watt; Thomas Albert; Rebecca Selzer; Marshall Anderson; Jiang Wang; Yian Wang; Sandra Starnes; Ping Yang; Ming You Journal: Carcinogenesis Date: 2012-04-17 Impact factor: 4.944