Literature DB >> 32019546

IRGS: an immune-related gene classifier for lung adenocarcinoma prognosis.

Xiaoshun Shi1,2, Ruidong Li3, Xiaoying Dong4, Allen Menglin Chen5,6, Xiguang Liu4, Di Lu4, Siyang Feng4, He Wang4, Kaican Cai7.   

Abstract

BACKGROUND: Tumour cells interfere with normal immune functions by affecting the expression of some immune-related genes, which play roles in the prognosis of cancer patients. In recent years, immunotherapy for tumours has been widely studied, but a practical prognostic model based on immune-related genes in lung adenocarcinoma comparable to existing model has not been established and reported.
METHODS: We first obtained publicly accessible lung adenocarcinoma RNA expression data from The Cancer Genome Atlas (TCGA) for differential gene expression analysis and then filtered immune-related genes based on the ImmPort database. By using the lasso algorithm and multivariate Cox Proportional-Hazards (CoxPH) regression analysis, we identified candidate genes for model development and validation. The robustness of the model was further examined by comparing the model with three established gene models.
RESULTS: Gene expression data from a total of 524 lung adenocarcinoma patients from TCGA were used for model development. We identified four biomarkers (MAP3K8, CCL20, VEGFC, and ANGPTL4) that could predict overall survival in lung adenocarcinoma (HR = 1.98, 95% CI 1.48 to 2.64, P = 4.19e-06) and this model could be used as a classifier for the evaluation of low-risk and high-risk groups. This model was validated with independent microarray data and was highly comparable with previously reported gene expression signatures for lung adenocarcinoma prognosis.
CONCLUSIONS: In this study, we identified a practical and robust four-gene prognostic model based on an immune gene dataset with cross-platform compatibility. This model has potential value in improving TNM staging for survival predictions in patients with lung adenocarcinoma. IMPACT: The study provides a method of immune relevant gene prognosis model and the identification of immune gene classifier for the prediction of lung adenocarcinoma prognosis with RNA sequencing and microarray compatibility.

Entities:  

Keywords:  IRGS; Immune gene classifier; Lung adenocarcinoma; Prognosis; TCGA

Mesh:

Year:  2020        PMID: 32019546      PMCID: PMC7001261          DOI: 10.1186/s12967-020-02233-y

Source DB:  PubMed          Journal:  J Transl Med        ISSN: 1479-5876            Impact factor:   5.531


Background

The high incidence and mortality of lung cancer make this disease one of the most severe public health problems worldwide. Lung adenocarcinoma accounts for 60% of all lung cancers [1]. The TNM staging system is one of the most commonly used survival classifiers for lung cancer disease stage evaluation [2]. However, this system only considers anatomical factors and does not combine these factors with personalized genetic information or patient immune status. Therefore, individual risk stratification can only be predicted based on general populations. With the application of microarray and sequencing technology, high-throughput gene expression profiles associated with prognostic clinical data can be obtained [3], providing an opportunity for the personalized evaluation of cancer recurrence risk. In recent years, an increasing number of studies have shown that gene expression sets are able to predict cancer patient survival, assisting decision-making regarding adjuvant chemotherapy administration. For example, chromosomal instability is a hallmark of cancer that is associated with tumour heterogeneity. Therefore, the centromere and kinetochore gene expression score (CES) signature has been used to predict the prognosis of lung cancer patients after treatment [4]. Furthermore, the discovery of genetic biomarkers has played positive roles in tailoring treatment plans and avoiding unnecessary treatments. However, current gene models, even those including up to 14 genes, have neglected clinicopathological features or are far from practical [5]. The tissue microenvironment of tumour cells plays a crucial role in the development of tumours [6]. Tumour cells can mimic the functions of immune cells to induce immunosuppression by overexpressing immune-related genes, thereby promoting the proliferation and spread of tumour cells. The acquisition of immune functions maintains tumour cells that are able to survive in the immunosuppressive microenvironment. However, the expression patterns of immune-related genes or proteins in lung adenocarcinoma and their clinical significance are still unclear. With the increased study of tumour immunotherapy, immune scoring is considered an important tumour classification method to understand immunological characteristics and predict cancer patient survival [7]. However, this scoring method only utilizes the degree of immune cell infiltration rather than whole-tumour sequencing data to reflect the overall immune function in patients. Studying the differential expression of immune-related genes in lung cancer tissue samples is of great significance for understanding the immune microenvironment of lung cancer tissues and providing new insights to improve clinical diagnostics, prevention, patient immune status evaluation and prognosis. Based on this premise, we hypothesized that an immune-related gene expression score (IRGS) could assist the prognostic evaluation of patients with lung adenocarcinoma. Independent high-throughput data from microarray platform verified the predictive power and universality of this model. Our model can predict the prognosis of patients with lung adenocarcinoma even more precisely when combined with the traditional TNM staging system.

Methods

Data acquisition

In brief, data were obtained from 3 independent databases. The gene expression quantification data of RNAseq [HTSeq—Counts] for a total of 524 primary tumour and 59 normal samples from the TCGA-lung adenocarcinoma (LUAD) project were downloaded using GDCRNATools [8]. Metadata associated with the downloaded files was parsed using the tool to facilitate the integration of the count table with rows are genes and columns are samples. Trimmed mean of M values (TMM) normalization of the count data was performed using edgeR. Next, 2498 immune-associated genes were obtained from the immunology database and analysis portal (ImmPort) [https://www.immport.org/shared/genelists] on June 3, 2018 [9]. In the validation stage, we screened lung adenocarcinoma microarray data from the Gene Expression Omnibus database (GEO; http://www.ncbi.nlm.nih.gov/geo/) in June 2018 with the search term “lung adenocarcinoma”. The following exclusion criteria were applied to the microarray data: (1) containing squamous cell carcinoma, large cell lung cancer or other non-small cell lung cancer samples; (2) studies with no or insufficient clinical data; (3) only one TNM staging evaluation performed; and (4) non-primary surgical specimens. After review, GSE31210, which contained 226 lung adenocarcinoma samples and a full clinical dataset, was enrolled for validation [10].

Development of an IRGS risk model

An immune-related prognostic model was developed by using the TCGA and ImmPort datasets. First, differentially expressed genes were identified by the limma package [11]. A false discovery rate (FDR) less than 0.01 and an absolute fold change greater than 2 were defined as the significance threshold. Thus, 3013 differentially expressed genes (1173 genes with upregulated expression and 1840 genes with downregulated expression) were filtered out. Second, 364 differentially expressed immune-related genes were filtered from 2498 genes in the ImmPort database. Last, the glmnet package [12] in R was used to implement the Lasso algorithm for gene selection. With an optimal lambda value of 0.0175 based on cross validation at which the minimal mean squared error is achieved, a total of 18 genes were further subjected to univariate and multivariate CoxPH regression analysis. Four immune-related genes (MAP3K8, CCL20, VEGFC, and ANGPTL4) were ultimately included in the risk prognosis model (risk score = − 0.434 × MAP3K8 + 0.127 × CCL20 + 0.231 × VEGFC + 0.122 × ANGPTL4). The Risk score is calculated as ∑coefficients * expression values and the median risk score is − 0.2047 for the TCGA-LUAD cohort. For Kaplan–Meier (KM) survival analysis, patients were divided into a high-risk group and a low-risk group according to the median risk score calculated by this prognostic model. Heretofore, we named this immune-related gene expression score signature the IRGS system.

External validation of the IRGS system

Independent validation was performed by using the publicly available lung cancer dataset GSE31210 profiled by Affymetrix Human Genome U133 Plus 2.0. The GSE31210 dataset consisted of 226 lung tumour samples, of which 127 had an EGFR mutation, 20 had a KRAS mutation, 11 had an EML4-ALK fusion, and 68 were triple-negative cases. The data were normalized by the Affymetrix’s probe level normalization algorithm MAS5. We use overall survival (OS) data for survival analysis. Probes for all of the four prognostic genes in the IRGS system were able to be retrieved in the GSE31210 dataset. For genes with multiple probes, only the one with the largest variation was kept. Expression value of the selected probe for each gene was transformed to log2 scale and then the IRGS model was applied for survival prediction.

Robustness of the IRGS system

To evaluate the performance of the IRGS signature in predicting prognosis, we compared the IRGS system with previously reported signatures including the centromere and kinetochore gene expression score (CES) (CENP-A, HJURP, MIS18B, CENP-N, CENP-L, CENP-K, ZWINT, NDC80, SPC24, SPC25, NUF2, CENP-W, CENP-U and CENP-M) [4], the three-gene prognostic classifier (STX1A, HIF1A, and CCR7) [13] and a 14-gene expression assay (BAG1, BRCA1, CDC6, CDK2AP1, ERBB3, FUT3, IL11, LCK, RND3, SH3BGR, WNT3A and the three reference genes ESD, TBP, and YAP1) [5]. Consistent with the approach used in IRGS development, Kaplan–Meier overall survival curves were plotted for patients stratified according to the median risk score calculated by these models for the TCGA dataset. Then, the specificity and sensitivity of these models were analysed to calculate the area under the ROC curve.

In silico function analysis

To further explore the key functions of the genes in the IRGS system, we performed co-expression analysis of the four genes in the TCGA lung adenocarcinoma dataset. Protein coding genes with a Pearson correlation coefficient > 0.4 and P value < 0.01 were considered to be correlated. We sorted the LUAD RNA-seq data downloaded from the TCGA into the top and bottom quartiles of MAP3K8, CCL20, VEGFC, and ANGPTL4 expression (high and low expression, respectively) by GSEA [14] to assess the enriched pathways associated with the IRGS system.

Statistical analysis

R (Version 3.4.3) was used for most of the bioinformatics and statistical analyses including RNAseq and microarray data normalization and transformation, differential gene expression analysis, gene coexpression analysis, CoxPH and KM survival analyses, as well as ROC analysis. The in silico pathway analysis of the IRGS system was performed using GSEA software. Two-sided P-values < 0.01 were considered statistically significant. To rule out batch effects in TCGA data collection processing, we used the dispersion separability criterion (DSC) to measure the batch effects of TCGA-LUAD RNAseq data and didn’t observe strong batch effects (DSC < 0.5, an indicator of less serious batch effects, Additional file 1: Figure S1A). We have also compared the results of differential gene expression analyses before and after batch effect removal. A very large overlap of differential expression genes was observed (Additional file 1: Figure S1B), suggesting the batch effect small.

Results

A subset of immune-related genes is dysregulated in lung adenocarcinoma patients

Aiming to identify immune-related genes that can classify high-risk and low-risk lung adenocarcinoma patients, we confined all the data sources in this study to RNA expression profiles of primary lung adenocarcinoma. By comparing primary tumor and solid tissue normal samples in TCGA, a total of 3013 differentially expressed genes (Additional file 2: Table S1) were identified, including 1173 up-regulated genes and 1840 down-regulated genes (Fig. 1a). The differentially expressed genes were further matched with 2498 immune-associated genes from ImmPort to generate a list of 364 candidate genes (Additional file 3: Table S2). Next, by using the lasso algorithm, the candidate genes were narrowed down to 18 with the optimal lambda value of 0.0175 (Fig. 1b). Then, univariate and multivariate Cox regression analyses were further applied. To identify key components of these candidate genes, those fit with significantly and differentially expressed between tumor and normal samples, selected as an important feature by lasso, and significant in both univariate and multivariate coxph tests were candidate genes, restricting down to 4 genes (MAP3K8, CCL20, VEGFC, and ANGPTL4) with P-values less than 0.01 as potential parameters for use in our model (Additional file 4: Table S3). Using the TCGA database, we analysed MAP3K8, CCL20, VEGFC, and ANGPTL4 gene expression levels in both normal tissue and tumour tissue samples. The four immune-associated genes displayed dysregulation in the tumour tissue samples compared with the normal tissue samples (Fig. 1c). The distribution of the survival status, risk scores, and expression of the 4 IRGS genes in training samples are illustrated in Fig. 1d. None of these genes are common driver genes in lung cancer patients. We speculate that the collaboratively abnormal expression of the immune-related genes could be regarded as an independent prognostic factor.
Fig. 1

The selection of immune-related genes. A Differentially expressed genes in the TCGA-LUAD dataset were filtered out. b Immune-related genes were narrowed down by the lasso algorithm. c The expression of MAP3K8, CCL20, VEGFC, and ANGPTL4 in cancerous and control tissue samples is shown. d The distribution of the survival status, risk scores, and expression of the 4 IRGS genes in training samples are illustrated. The x axis shows the patients ranked by risk scores. High and low risk patients were separated by the dashed line

The selection of immune-related genes. A Differentially expressed genes in the TCGA-LUAD dataset were filtered out. b Immune-related genes were narrowed down by the lasso algorithm. c The expression of MAP3K8, CCL20, VEGFC, and ANGPTL4 in cancerous and control tissue samples is shown. d The distribution of the survival status, risk scores, and expression of the 4 IRGS genes in training samples are illustrated. The x axis shows the patients ranked by risk scores. High and low risk patients were separated by the dashed line

Development and independent validation of IRGS

Next, we enrolled the 4 immune-related genes in building a prediction model to assess the recurrence risk of lung adenocarcinoma patients. Both univariate and multivariate CoxPH survival analysis in the TCGA dataset indicated that the risk score (the minimal, maximum and median of risk score of our model in TCGA-LUAD dataset are − 1.9454, 2.3268 and − 0.2047) calculated by the IRGS model could be used as an independent prognostic factor for lung adenocarcinoma patient survival (Table 1). The median value of the risk score was then used to stratify patients; patients scoring greater than the median value were considered to have a poor prognosis, whereas patients scoring less than the median value were stratified as having a good prognosis. Kaplan–Meier survival analysis showed that the IRGS model is capable of stratifying the patients into two distinct groups with significant difference in overall survival (HR = 1.98, 95% CI 1.48 to 2.64, P = 4.19e−06); Fig. 2a). In addition to TNM staging alone in TCGA LUAD cohort (Additional file 5: Figure S2A–D), examination of our model at different stages of the lung adenocarcinoma revealed that the IRGS can separate low-risk and high-risk patients throughout TNM stages, namely stage I (HR = 2.13; CI 1.30 to 3.48; P = 2.73e−03), stage II (HR = 2.01; CI 1.22 to 3.32; P = 6.12e−03), stage III (HR = 2.62; CI 1.66 to 4.11; P = 3.32e−05), and stage IV (HR = 2.70; CI 1.02 to 7.30; P = 0.049).
Table 1

Univariate and multivariate Cox regression analyses of the IMAGES signature in the TCGA dataset

VariableUnivariate analysisMultivariate analysis
HR95% CIP valueHR95% CIP value
Risk score2.722.10–3.522.74e−14*2.351.82–3.056.70e−11*
Stage1.671.46–1.921.71e−13*1.591.38–1.843.96e−10*
Age1.010.99–1.020.31
Gender1.070.80–1.430.63
Smoking status1.030.89–1.180.71
Fig. 2

The development and validation of IRGS. a Kaplan–Meier analysis of overall survival in high- and low-risk groups of the lung adenocarcinoma patients in the TCGA-LUAD dataset; b ROC curves of the IRGS system and gene components in the system; c the area under the ROC curve (AUC) is given, IRGS system, TNM system, age and TNM plus the IRGS system. d Kaplan–Meier analysis of overall survival in high- and low-risk groups of 226 lung adenocarcinoma patients in an independent microarray dataset

Univariate and multivariate Cox regression analyses of the IMAGES signature in the TCGA dataset The development and validation of IRGS. a Kaplan–Meier analysis of overall survival in high- and low-risk groups of the lung adenocarcinoma patients in the TCGA-LUAD dataset; b ROC curves of the IRGS system and gene components in the system; c the area under the ROC curve (AUC) is given, IRGS system, TNM system, age and TNM plus the IRGS system. d Kaplan–Meier analysis of overall survival in high- and low-risk groups of 226 lung adenocarcinoma patients in an independent microarray dataset To test the applicability of the IRGS system, we applied ROC analyses to the training cohort. These analyses revealed that the individual genes in the model could not function as prognostic factors; they had to be combined (Fig. 2b). Further analysis revealed that the IRGS system demonstrated a significant improvement over age or TNM staging in predicting prognosis. Moreover, the prognostic capability increased 11% when the IRGS system was combined with the traditional TNM staging system, yielding an AUC of 0.706 (Fig. 2c) and suggesting that the IRGS is a powerful tool in LUAD survival prediction. To further verify whether the model we created can be tested by independent validation and have cross-platform universality, microarray data with clinical information were obtained from the GEO database. Following the application of the exclusion criteria, the GSE31210 microarray dataset was extracted from among 12 candidate datasets. Consistent with the training cohort results, the result of a univariate Cox regression analysis suggested that the IRGS signature was significantly associated with survival (HR = 2.11; 95% CI 1.29 to 3.46; P = 2.93e−03) in patients with lung adenocarcinoma (Additional file 6: Table S4). Kaplan–Meier analysis found that the IRGS model distinguished the high-risk group from the low-risk group, effectively differentiating the survival of the external 226 lung adenocarcinoma patients in the microarray dataset (HR = 3.92; 95% CI 2.02 to 7.62; P = 2.36e−04; Fig. 2d). Thus, we found that the IRGS signature can be used in both the TCGA RNA-sequencing data and lung microarray datasets to predict the survival of lung adenocarcinoma patients with different TNM-stage disease.

Performance of the IRGS in predicting LUAD prognosis

Some prognostic gene signatures for lung cancer have been reported previously, and the robustness of the IRGS was compared with the three-gene classifier, the CES signature and the 14-gene practical assay in both the TCGA RNAseq training dataset and in the independent microarray validation dataset. We first extracted a list of genes in the literature from the TCGA-LUAD dataset and submitted them to the bioinformatic pipeline used for IRGS development. Survival curves were generated (Fig. 3a–c) and showed that all three prognostic models effectively differentiated the survival of lung cancer patients (three-gene classifier: HR = 1.44, 95% CI 1.08 to 1.92, P = 0.014; the CES signature: HR = 2.07, 95% CI 1.55 to 2.77, P = 7.20e−07); and the 14-gene practical assay: HR = 1.62, 95% CI 1.21 to 2.15, P = 1.12e−03). ROC analysis was performed with the same TCGA cohort, and the AUC values were 0.602 for the three-gene classifier, 0.616 for the CES signature, and 0.598 for the 14-gene practical assay. The AUC value of the IRGS was 0.644 (Fig. 3d), leading us to conclude that the IRGS prognostic scoring system is more effective and accurate than previous gene expression signatures in predicting the survival of lung adenocarcinoma patients.
Fig. 3

The IRGS system was compared with other gene models in the analysis of a unified RNA-sequencing dataset. a Kaplan–Meier curves were plotted for the TCGA LUAD dataset stratified by the three-gene classifier; b Kaplan–Meier curves were plotted for the TCGA LUAD dataset stratified by the CES signature; c Kaplan–Meier curves were plotted for the TCGA LUAD dataset stratified by the 14-gene assay; and d ROC curves for the IRGS signature, the three-gene classifier, the CES signature and the 14-gene practical assay were plotted

The IRGS system was compared with other gene models in the analysis of a unified RNA-sequencing dataset. a Kaplan–Meier curves were plotted for the TCGA LUAD dataset stratified by the three-gene classifier; b Kaplan–Meier curves were plotted for the TCGA LUAD dataset stratified by the CES signature; c Kaplan–Meier curves were plotted for the TCGA LUAD dataset stratified by the 14-gene assay; and d ROC curves for the IRGS signature, the three-gene classifier, the CES signature and the 14-gene practical assay were plotted In the independent microarray validation dataset, all the three classifiers were capable of stratifying low-risk and high-risk lung adenocarcinoma patients (three-gene classifier: HR = 2.35, 95% CI 1.21 to 4.56, P = 0.015; the CES signature: HR = 4.30, 95% CI 2.22 to 8.35, P = 1.59e−04; and the 14-gene practical assay: HR = 3.80, 95% CI 1.95 to 7.37, P = 3.53e−04; Additional file 7: Figure S3A–C). The AUC values were 0.671 for the three-gene classifier, 0.691 for the CES signature, and 0.672 for the 14-gene signature, whereas the AUC value for the IRGS was 0.681 (Additional file 7: Figure S3D). Taken together, the IRGS prognostic scoring system is comparable with previous gene expression signatures in predicting the survival of lung adenocarcinoma patients using gene expression data from the microarray platform.

Functional analysis of the IRGS signature

To further explore the core biological mechanism of our model, we performed individual co-expression analyses of the four genes with the TCGA lung adenocarcinoma RNA-sequencing dataset. Among the co-expressed genes, 39 genes were significantly associated with MAP3K8, 16 genes were significantly associated with CCL20, 290 genes were significantly associated with VEGFC, and 60 genes were significantly associated with ANGPTL4, but no common intersections were found among the co-expressed genes, suggesting that our model is the optimal gene combination (Fig. 4a). Next, we performed GSEA analysis of the 4 genes in the IRGS model (Additional file 8: Table S5). In the high CCL20 expression phenotype, regulation of immunoglobulin production, positive regulation of immunoglobulin production, positive regulation of the innate immune response, regulation of the T helper 1-type immune response, activation of the innate immune response, positive regulation of B cell-mediated immunity, and positive regulation of the immunoglobulin-mediated immune response were significantly enriched. Similarly, the pathways of T cell activation involved in the immune response, negative regulation of the adaptive immune response, negative regulation of cytokine production involved in the immune response, the humoural immune response, T cell differentiation involved in the immune response, and regulation of cytokine production involved in the immune response were found to be significantly enriched in the high MAP3K8 expression phenotype (Fig. 4b–h). Regarding the high VEGFC expression phenotype, 48 immune-associated pathways were significantly enriched (Additional file 8: Table S5), ranging from innate to adaptive immune responses. Our functional analysis indicates that genes in the IRGS system participate in a variety of immune functions in LUAD tissue.
Fig. 4

The functional analysis of the IRGS. a The Venn diagram indicates that each gene in the IRGS model has an individual biological function; b–h MAP3K8 and its co-expressed genes are involved in multiple immune-related responses

The functional analysis of the IRGS. a The Venn diagram indicates that each gene in the IRGS model has an individual biological function; b–h MAP3K8 and its co-expressed genes are involved in multiple immune-related responses

Discussion

Tumour staging (AJCC/UICC-TNM classification) summarizes data for the tumour burden (T), the presence of cancer cells in draining and regional lymph nodes (N), and metastatic evidence (M) to evaluate the disease stage and predict survival. However, anatomical classification provides limited information for estimating cancer patient outcomes, and the response to treatment cannot be predicted by this system. In 2005, Pages et al. showed that immune cell infiltration and the tumour microenvironment are associated with tumour prognosis [15]. Two years later, Galon et al. proposed the concept of the immunoscore [16] and described key immune components associated with cancer patient survival [17]. The immunoscore was first described as a prognostic marker for early stage (I/II) colorectal cancer patients in 2009 [18]. There have been updated results supporting the implementation of the immunoscore as a new component of a TNM immune cancer classification system [19]. These studies suggest that immunological classification may be superior to the AJCC/UICC TNM classification system alone and that introducing immune-relevant parameters as prognostic factors could be of great clinical significance. The immunoscore method is based on the evaluation of the type, density, and location of immune cells [16] and fails to comprehensively consider the balanced conditions between tumour immune functions and host immune cells, which is a balance of multiple factors such as B cell immunity, T cell immunity, immune escape, cytotoxic effect on tumor cells by the host. An average immune related gene expression from RNA sequencing in bulk could reflect the status of immune cells and tumor cells, contributing better evaluation of disease stage and prediction of survival. Although it has been shown in different studies that a specific gene set can predict the prognosis of cancer patients [20, 21], the number of genes in a model is often beyond practical need [22] or of immune irrelevance [23], making translational feasibility low. With the development of tumour immunotherapy, the role of the immune system in the development of cancer has once again been taken into consideration. However, how immune genes in tumour cells affect cancer prognosis is still poorly understood and has not been well reported. Based on the TCGA-LUAD dataset and a meta-review of microarray data (clinical information listed in Additional file 9: Table S6 and Additional file 10: Table S7), we established and validated a prognostic model based on the expression of immune-related genes in tumour cells. Our system provides a model with high translational feasibility (current models are based on measurement of more than 10 gene expression). Secondly, the four-gene prognostic model based on an immune gene dataset is of cross-platform compatibility. A durable model between two platform with fewer genes which is comparable to the current established multiple genes model could have a better chance to translate into clinical practice, especially aiming at understanding tumor immune environment. To increase the utility of our model, we gradually reduced the number of differentially expressed genes from 3013 to 4 (MAP3K8, CCL20, VEGFC, and ANGPTL4) in the IRGS model by using the lasso algorithm and multivariate Cox regression analysis. In the IRGS system, we postulated that high scores are associated with increased levels of immune instability and an adverse tumour microenvironment in the cancer tissue and the IRGS genes collaboratively function as an indication of a poor prognosis in lung adenocarcinoma. This system reflects that high levels of cancer cell immune instability reduce the susceptibility of cancer cells to adjuvant treatment. The IRGS system can divide lung adenocarcinoma patients into low-risk and high-risk groups, assisting clinical decision-making by increasing the precision of tumour staging and risk stratification. Although current models are able to distinguish high-risk and low-risk patient groups [4, 5, 13], the IRGS model has a better discriminating power in RNA sequencing dataset, and the IRGS combined with TNM staging can increase the predictive capacity. Our functional analysis suggests that the genes in the IRGS model are widely involved in the immunological process. The inflammatory process is essential in the formation and development of cancer and may be involved in tumour growth and metastasis [24]. MAP3K8 (Serine/threonine protein kinase tumour progression locus 2) has been widely reported to be an important signal for inflammatory mediators, and mutational activation of MAP3K8 may be involved in the formation of lung cancer [25]. Su et al. showed that miRNAs might participate in lung cancer progression by regulating MAP3K8 [26]. The chemokine CCL20 is abnormally expressed in non-small cell lung cancer and plays important roles in tumour cell growth, invasion and metastasis [27]. For example, the expression of CCL20 in lung cancer cells is higher in patients with advanced lung cancer than in patients with early-stage lung cancer, and this high expression is associated with a poor prognosis. Bao et al. reported that the CCL20 gene may be a prognostic risk factor or protective factor for lung adenocarcinoma [28]. These observations mean that the unstable expression of these genes can cause different oncological outcomes. VEGF-C has been widely reported in lung cancer and is involved in lung cancer tumourigenesis and associated with lymphatic metastasis; it can be used as a biomarker for evaluating recurrence and prognosis in non-small cell lung cancer [29]. ANGPTL4 (Tumour cell-derived human angiopoietin-like protein 4) can damage vascular endothelial cell junctions, increase pulmonary capillary permeability, and promote the process of tumour cells protruding through the vascular endothelium. It has also been found to be involved in the growth and metastasis of lung cancer [30]. We present an IRGS prognostic model based on a second-generation sequencing technique on tumor samples, providing an efficient assessment of the overall tumour immune status. We noticed that there are some limitations in the current TCGA based LUAD survival model. We understand that there is a need to investigate the lung adenocarcinoma prognostic model based on its intrinsic heterogeneity. However, the current RNA sequencing databases underreported pathological details. Another scenario is that comorbidities have a major impact on the treatment of lung cancer and have proven to be associated with lung cancer survival [31]. Based on current databases, the data were obtained from operable samples (mainly stage I–III), making survival model as a potential tool for early intervention after surgery. Therefore, the use of these models on advance lung adenocarcinoma is limited. To solve the problems as mentioned above, separate databases built by corresponding samples are needed in further studies. We will include these factors in future model development if detailed information were provided from the published database and will focus on these issues at the time of building our inhouse database.

Conclusion

In summary, we identified and validated an immune-related gene expression score, which can be used as an independent prognostic signature in evaluating the survival of patients with lung adenocarcinoma. The IRGS signature was successfully validated with independent microarray data and indicated to be more efficient than existing lung cancer mRNA signatures. Further multicentre prospective validation in clinical trials is needed. Additional file 1: Figure S1. The evaluation of batch effects in TCGA data processing. (A) a dispersion separability criterion did not reveal a strong batch effects; (B) A venm diagram showed that a large overlap of differential expression genes exits. Additional file 2: Table S1. Differentially expressed genes in the TCGA-LUAD dataset. Additional file 3: Table S2. Candidate immune-related genes matched with differentially expressed genes in the TCGA dataset. Additional file 4: Table S3. Univariate and multivariate analyses of candidate genes for IRGS model development. Additional file 5: Figure S2. Survival analysis of TCGA LUAD data by TNM staging. (A) Kaplan–Meier curves were plotted for the TCGA LUAD dataset stratified by T stage; (B) Kaplan–Meier curves were plotted for the TCGA LUAD dataset stratified by N stage; (C) Kaplan–Meier curves were plotted for the TCGA LUAD dataset stratified by M stage; and (D) Kaplan–Meier curves were plotted for the TCGA LUAD dataset stratified by TNM stage. Additional file 6: Table S4. Univariate and multivariate Cox regression analyses of the IRGS signature in the independent microarray dataset. Additional file 7: Figure S3. The IRGS system was compared with other gene models in GSE31210 dataset. (A) Kaplan–Meier curves were plotted for the GSE31210 dataset stratified by the three-gene classifier; (B) Kaplan–Meier curves were plotted for the GSE31210 dataset stratified by the CES signature; (C) Kaplan–Meier curves were plotted for the GSE31210 dataset stratified by the 14-gene assay; and (D) ROC curves for the IRGS signature, the three-gene classifier, the CES signature and the 14-gene practical assay in GSE31210 dataset were plotted. Additional file 8: Table S5. Significantly enriched immune pathways of gene in the IRGS model. Additional file 9: Table S6. Clinical information of TCGA LUAD data. Retrieved from the GDC website (TCGA-LUAD: https://portal.gdc.cancer.gov/projects/TCGA-LUAD). Additional file 10: Table S7. Clinical information of GSE31210 data. Retrieved from GEO website (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE31210).
  31 in total

1.  A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international validation studies.

Authors:  Johannes R Kratz; Jianxing He; Stephen K Van Den Eeden; Zhi-Hua Zhu; Wen Gao; Patrick T Pham; Michael S Mulvihill; Fatemeh Ziaei; Huanrong Zhang; Bo Su; Xiuyi Zhi; Charles P Quesenberry; Laurel A Habel; Qiuhua Deng; Zongfei Wang; Jiangfen Zhou; Huiling Li; Mei-Chun Huang; Che-Chung Yeh; Mark R Segal; M Roshni Ray; Kirk D Jones; Dan J Raz; Zhidong Xu; Thierry M Jahan; David Berryman; Biao He; Michael J Mann; David M Jablons
Journal:  Lancet       Date:  2012-01-27       Impact factor: 79.321

2.  GDCRNATools: an R/Bioconductor package for integrative analysis of lncRNA, miRNA and mRNA data in GDC.

Authors:  Ruidong Li; Han Qu; Shibo Wang; Julong Wei; Le Zhang; Renyuan Ma; Jianming Lu; Jianguo Zhu; Wei-De Zhong; Zhenyu Jia
Journal:  Bioinformatics       Date:  2018-07-15       Impact factor: 6.937

3.  Variations of chromosome 2 gene expressions among patients with lung cancer or non-cancer.

Authors:  Lianmin Bao; Yong Zhang; Jian Wang; Haiyun Wang; Nian Dong; Xiaoqiong Su; Menglin Xu; Xiangdong Wang
Journal:  Cell Biol Toxicol       Date:  2016-06-15       Impact factor: 6.691

4.  Comorbidity Profiles and Their Effect on Treatment Selection and Survival among Patients with Lung Cancer.

Authors:  Michael K Gould; Corrine E Munoz-Plaza; Erin E Hahn; Janet S Lee; Carly Parry; Ernest Shen
Journal:  Ann Am Thorac Soc       Date:  2017-10

Review 5.  The adaptive immunologic microenvironment in colorectal cancer: a novel perspective.

Authors:  Jérôme Galon; Wolf-Herman Fridman; Franck Pagès
Journal:  Cancer Res       Date:  2007-03-01       Impact factor: 12.701

6.  limma powers differential expression analyses for RNA-sequencing and microarray studies.

Authors:  Matthew E Ritchie; Belinda Phipson; Di Wu; Yifang Hu; Charity W Law; Wei Shi; Gordon K Smyth
Journal:  Nucleic Acids Res       Date:  2015-01-20       Impact factor: 16.971

Review 7.  Lung Adenocarcinoma Staging Using the 2011 IASLC/ATS/ERS Classification: A Pooled Analysis of Adenocarcinoma In Situ and Minimally Invasive Adenocarcinoma.

Authors:  Madhusmita Behera; Taofeek K Owonikoko; Anthony A Gal; Conor E Steuer; Sungjin Kim; Rathi N Pillai; Fadlo R Khuri; Suresh S Ramalingam; Gabriel L Sica
Journal:  Clin Lung Cancer       Date:  2016-03-30       Impact factor: 4.785

8.  In situ cytotoxic and memory T cells predict outcome in patients with early-stage colorectal cancer.

Authors:  Franck Pagès; Amos Kirilovsky; Bernhard Mlecnik; Martin Asslaber; Marie Tosolini; Gabriela Bindea; Christine Lagorce; Philippe Wind; Florence Marliot; Patrick Bruneval; Kurt Zatloukal; Zlatko Trajanoski; Anne Berger; Wolf-Herman Fridman; Jérôme Galon
Journal:  J Clin Oncol       Date:  2009-10-26       Impact factor: 44.544

9.  Mining featured micro ribonucleic acids associated with lung cancer based on bioinformatics.

Authors:  Lin Su; Na Li; Xueyun Huo
Journal:  Thorac Cancer       Date:  2015-07-02       Impact factor: 3.500

10.  Twenty-gene-based prognostic model predicts lung adenocarcinoma survival.

Authors:  Kai Zhao; Zulei Li; Hui Tian
Journal:  Onco Targets Ther       Date:  2018-06-12       Impact factor: 4.147

View more
  9 in total

1.  An immune-related model based on INHBA, JAG2 and CCL19 to predict the prognoses of colon cancer patients.

Authors:  Xuankun Yang; Jia Yan; Yahui Jiang; Yaxu Wang
Journal:  Cancer Cell Int       Date:  2021-06-08       Impact factor: 5.722

2.  Development and validation of an immune gene-set based prognostic signature for soft tissue sarcoma.

Authors:  Rui Shen; Bo Liu; Xuesen Li; Tengbo Yu; Kuishuai Xu; Jinfeng Ma
Journal:  BMC Cancer       Date:  2021-02-08       Impact factor: 4.430

3.  The Combined Detection of Immune Genes for Predicting the Prognosis of Patients With Non-Small Cell Lung Cancer.

Authors:  Wen-Juan Tian; Shan-Shan Liu; Bu-Rong Li
Journal:  Technol Cancer Res Treat       Date:  2020 Jan-Dec

4.  Multi-omics analysis of genomics, epigenomics and transcriptomics for molecular subtypes and core genes for lung adenocarcinoma.

Authors:  Yue Zhao; Yakun Gao; Xiaodong Xu; Jiwu Zhou; He Wang
Journal:  BMC Cancer       Date:  2021-03-09       Impact factor: 4.430

5.  A New Risk Score Based on Eight Hepatocellular Carcinoma- Immune Gene Expression Can Predict the Prognosis of the Patients.

Authors:  Dingde Ye; Yaping Liu; Guoqiang Li; Beicheng Sun; Jin Peng; Qingxiang Xu
Journal:  Front Oncol       Date:  2021-11-19       Impact factor: 6.244

6.  Prognostic Index for Nonsmall Cell Lung Cancer Based on Immune-Related Genes Expression.

Authors:  Ying Cao; Hongyu Zhu; Hailin Shen; Desen Liu; Zhenkai Li; Hailong Shang; Hongdi Du; Ying Wang; Juan Ye
Journal:  Biomed Res Int       Date:  2022-09-19       Impact factor: 3.246

7.  Comprehensive profiling of immune-related genes in soft tissue sarcoma patients.

Authors:  Chuan Hu; Bo Chen; Zhangheng Huang; Chuan Liu; Lin Ye; Cailin Wang; Yuexin Tong; Jiaxin Yang; Chengliang Zhao
Journal:  J Transl Med       Date:  2020-09-01       Impact factor: 5.531

8.  A novel immune-related gene signature predicts survival in esophageal squamous cell carcinoma.

Authors:  Tao Xu; Tianyang Dai; Peiyuan Zeng; Yunfen Guo; Kaiming He
Journal:  Transl Cancer Res       Date:  2021-05       Impact factor: 1.241

9.  Identification of Hypoxia Signature to Assess the Tumor Immune Microenvironment and Predict Prognosis in Patients with Ovarian Cancer.

Authors:  Chunyan Wei; Xiaoqing Liu; Qin Wang; Qipei Li; Min Xie
Journal:  Int J Endocrinol       Date:  2021-12-14       Impact factor: 3.257

  9 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.