Yue Zhan¹, Xin Guan¹, Yu Zhang¹, Zhenhua Zhu², Aiping Shi¹, Zhimin Fan¹. ¹Department of Breast Surgery, The First Hospital of Jilin University, Changchun, China. ²Department of Orthopaedic Trauma, The First Hospital of Jilin University, Changchun, China.
Abstract
Background: Although breast cancer outcomes have improved significantly with the recent use of molecularly targeted agents, reliable prognostic signatures are still unavailable because of tumor heterogeneity. Immune processes play an important role in tumor progression. Therefore, the aim of this study was to construct a prognostic signature based on immune-related genes (IRGs). Methods: Clinical information and gene expression data of 3,496 patients were extracted from eight public datasets. A total of 2,498 IRGs associated with 17 immune processes were downloaded from the ImmPort database. RNA sequencing (RNAseq) datasets [The Cancer Genome Atlas (TCGA) and GSE96058] were used as the training set (n=2,736), and all microarray datasets were used as the validation set (n=760). Prognosis-related IRGs were screened in the training set and used to construct gene pairs. A Cox regression model was then built based on the immune-related gene pairs (IRGPs). The risk score of each patient was calculated, and patients were stratified into high- and low-risk groups according to the optimal risk-score threshold. Immune cell infiltration was compared between the two groups. Results: From the 129 prognosis-related immune genes, 8,256 IRGPs were constructed. After screening, 89 IRGPs, comprising 86 unique IRGs, were used in the prognostic prediction model. Patients in the high-risk group had significantly poorer overall survival (OS) than those in the low-risk group in both the training set [hazard ratio (HR): 5.9, 95% confidence interval (CI): 4.61-7.54] and the validation set (HR: 1.52, 95% CI: 1.16-1.98). In addition, patients in the high-risk group showed significantly lower infiltration of CD8+ T cells than patients in the low-risk group. Conclusions: An independent IRGP signature was constructed. Through pairwise comparison of a set of genes, the OS of patients could be predicted.
This method reduces the impact of batch effects caused by different sequencing platforms and has promising application prospects. © 2022 Translational Cancer Research. All rights reserved.
Keywords: Breast cancer; gene pairs; immune-related genes (IRGs); prognosis; signature
Breast cancer is the most common malignancy affecting women worldwide (1). In 2018, 2.1 million women were diagnosed with breast cancer, and a new case was reported every 18 seconds (2). The global incidence of breast cancer has increased by 3.1% annually, from 641,000 cases in 1980 to 1.6 million in 2010 (3).

The precise mechanisms underlying the emergence of breast cancer remain unclear (4); however, the oncogenesis and development of breast cancer are closely related to immunity (5). The breast cancer microenvironment contains large numbers of lymphocytes, macrophages, and bone marrow-derived stromal cells, most of which are involved in the immune response (6). In addition, the number of tumor-infiltrating lymphocytes reflects the strength of the immune response and is positively associated with treatment response and prognosis in breast cancer patients after specific treatment (7). Previous studies reported that, during the early stages of tumorigenesis, the immune microenvironment mainly plays an anti-tumor role through cytokines produced by activated CD8+ and CD4+ T cells (1). This suggests that immune status can reflect a patient's prognosis.

In breast cancer, the presence and number of metastatic axillary lymph nodes is the most important prognostic marker (8). However, axillary nodal status does not fully reflect prognosis: Jennifer (9) reported that about 30% of untreated breast cancer patients without node metastasis developed metastasis or recurrence 10 years later, whereas about 50% of patients with node involvement could be cured by local treatment. Tumor size and grade are two other widely used clinical markers (10-12). However, because of tumor heterogeneity, tumors of the same size or grade do not necessarily share the same pathological outcome.
With the growing adoption of personalized treatment, clinical prognostic markers (e.g., tumor size, tumor grade, and lymph node metastasis) are no longer sufficient to guide the management of patients diagnosed at an early stage. Biomarkers have become new tools for the early diagnosis of tumors. Several attempts have been made to construct prognostic models using gene expression data, and good prognostic efficacy has been observed in individual datasets (13-15). However, challenges remain, such as overfitting and a lack of sufficient validation. Several different gene expression platforms are currently available, and data obtained from these platforms with different strategies may exhibit batch effects that substantially affect the results (16,17). The key challenge is therefore to exploit gene expression data while avoiding the influence of different gene testing methods. To this end, in this study, a method based on relative gene expression is applied to reduce the adverse effects introduced by batch effects and data processing. This approach has previously been used successfully to predict the prognosis of several tumors, such as colorectal cancer and serous ovarian carcinoma (18-21). Specifically, in this study, immune-related gene pairs (IRGPs) were constructed to develop a prognostic signature for breast cancer. We present the following article in accordance with the TRIPOD checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-21-2309/rc).
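The intuition behind relative expression can be shown with a minimal Python sketch on simulated data (the helper `pair_values` and the transform below are hypothetical, not part of the study). Because a gene pair only records which of two genes is expressed lower within the same sample, its value is unchanged by any strictly increasing transformation applied to that sample, such as a log transform, rescaling, or a constant offset of the kind that can separate platforms or batches:

```python
import numpy as np


def pair_values(sample: np.ndarray, pairs: list) -> np.ndarray:
    """Return 1 for each (i, j) pair where gene i is lower than gene j in this sample, else 0."""
    return np.array([int(sample[i] < sample[j]) for i, j in pairs])


rng = np.random.default_rng(0)
expr = rng.lognormal(mean=2.0, sigma=1.0, size=6)  # one simulated sample, six genes
pairs = [(0, 1), (2, 3), (4, 5), (0, 5)]

# Simulate a platform/batch difference as a strictly increasing transform
# (log2, then a scale factor and an offset) applied to the whole sample.
shifted = 3.0 * np.log2(expr + 1.0) + 7.5

print(pair_values(expr, pairs))
print(pair_values(shifted, pairs))  # identical to the line above
```

Effects that reorder genes within a sample (for example, gene-specific probe effects) would still change pair values, so this property reduces, rather than eliminates, platform differences.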
Methods
Public datasets
The whole analysis process is shown in Figure 1. The Cancer Genome Atlas (TCGA) is a cancer genomics program led by the National Cancer Institute and the National Human Genome Research Institute that contains genomic data for 33 cancer types (22). For this study, RNA sequencing (RNAseq) Level 3 data and clinical information for the breast cancer (BRCA) project were downloaded directly from TCGA. Because of incomplete records, not all clinical information was available for every patient. Gene Expression Omnibus (GEO) is a public functional genomics repository for array- and sequence-based data, from which normalized RNAseq or array data of breast cancer samples were retrieved. The GEO search criteria were as follows: (I) breast cancer datasets with more than 50 samples; (II) available clinical information, especially survival status and last follow-up time; and (III) RNAseq data or array data from the HG-U133_Plus_2 or HG-U133A platform. For gene screening, the immunity gene set, which contains 2,498 immune genes from 17 immune processes, was downloaded from the ImmPort database (https://immport.niaid.nih.gov/home). Finally, two RNAseq datasets (TCGA and GSE96058) were used as the training set to identify the signature, and six microarray datasets (GSE7390, GSE124647, GSE42568, GSE20711, GSE48391, and GSE20685) were used as the validation set. Patients who had received chemotherapy or for whom clear survival information was unavailable were excluded. Overall, a total of 3,496 cases were analyzed. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). Because the data were de-identified and publicly available, institutional review board approval and informed consent were not required for this study.
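The three GEO inclusion criteria can be expressed as a simple filter. The sketch below is illustrative only: the `Series` record type, its fields, and the accessions `GSE_A` to `GSE_D` are hypothetical stand-ins, not real GEO records or a GEO API.

```python
from dataclasses import dataclass

ALLOWED_ARRAYS = {"HG-U133_Plus_2", "HG-U133A"}


@dataclass
class Series:
    accession: str
    n_samples: int
    platform: str              # "RNAseq" or an array platform name
    has_survival_status: bool
    has_followup_time: bool


def meets_criteria(s: Series) -> bool:
    """Criteria (I)-(III): >50 samples, survival status and follow-up time, allowed platform."""
    return (
        s.n_samples > 50
        and s.has_survival_status
        and s.has_followup_time
        and (s.platform == "RNAseq" or s.platform in ALLOWED_ARRAYS)
    )


# Hypothetical candidate series, not real GEO records.
candidates = [
    Series("GSE_A", 120, "HG-U133A", True, True),
    Series("GSE_B", 40, "HG-U133_Plus_2", True, True),   # too few samples
    Series("GSE_C", 300, "RNAseq", True, False),         # no follow-up time
    Series("GSE_D", 200, "Illumina HT-12", True, True),  # platform not allowed
]
selected = [s.accession for s in candidates if meets_criteria(s)]
print(selected)  # ['GSE_A']
```

Patient-level exclusions (prior chemotherapy, unclear survival information) would be a second filter applied to the samples of each selected series.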
Figure 1
Flow chart of the data analysis employed in this study. BRCA, breast cancer; ICPI, immune-clinical prognostic index; TCGA, The Cancer Genome Atlas.
Data processing
To demonstrate that the model is valid for different types of gene expression data, the training and validation sets were defined by platform. For the sequencing datasets (TCGA and GSE96058), normalized fragments per kilobase of transcript per million mapped reads (FPKM) data were downloaded from the corresponding databases and used as the training set. For the Affymetrix microarray data in the validation set (HG-U133_Plus_2 or HG-U133A platforms), raw data were downloaded, background-corrected, and quantile-normalized using the Robust Multichip Average (RMA) function of the affy package (v 1.50.0) with default parameters (23).
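Quantile normalization, one step of RMA, forces every array to share the same value distribution. The sketch below shows only that step in NumPy; it is not the affy implementation used in the study, and it omits RMA's background correction and probe-level summarization. Ties are broken by order of appearance, a simplifying assumption.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Quantile-normalize a genes x samples matrix so all columns share one distribution.

    Ties are broken by order of appearance, a simplification.
    """
    order = np.argsort(x, axis=0, kind="stable")        # row index holding each rank, per column
    sorted_cols = np.take_along_axis(x, order, axis=0)  # each column sorted ascending
    reference = sorted_cols.mean(axis=1)                 # mean value at each rank across samples
    out = np.empty(x.shape, dtype=float)
    # Give the gene holding rank r in each column the reference value for rank r.
    np.put_along_axis(out, order, np.tile(reference[:, None], (1, x.shape[1])), axis=0)
    return out


x = np.array([[5, 4, 3],
              [2, 1, 4],
              [3, 4, 6],
              [4, 2, 8]], dtype=float)
print(quantile_normalize(x))
```

After normalization each column contains the same set of values, while the within-column ordering of genes is preserved.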
Screening of the immune-related prognostic genes
Genes were screened before gene pairs were constructed. First, the genes in the TCGA and GSE96058 datasets were screened separately. In each dataset, genes whose average expression ranked in the top 50% were considered sufficiently expressed, and among these, genes whose mean absolute deviation (MAD) ranked in the top 30% were considered informative. Only informative genes present in both the TCGA and GSE96058 datasets and in the immune gene set were retained and subjected to Cox regression analysis in the training set (comprising TCGA and GSE96058). Overall survival (OS) was used as the prognostic outcome. Genes significantly associated with OS (P<0.05) were selected for the construction of gene pairs.
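The two-step expression filter can be sketched in NumPy as follows. The function name `informative_genes` is hypothetical, and rounding the 50% and 30% cutoffs up to whole genes is an assumption; the subsequent Cox screening step is not shown.

```python
import numpy as np


def informative_genes(expr: np.ndarray, genes: list,
                      mean_top: float = 0.5, mad_top: float = 0.3) -> set:
    """Two-step filter on a genes x samples matrix.

    1. Keep genes whose mean expression is in the top `mean_top` fraction.
    2. Of those, keep genes whose mean absolute deviation is in the top `mad_top` fraction.
    """
    means = expr.mean(axis=1)
    n_expressed = int(np.ceil(mean_top * len(genes)))
    expressed = np.argsort(-means)[:n_expressed]          # indices, highest mean first
    sub = expr[expressed]
    mad = np.abs(sub - sub.mean(axis=1, keepdims=True)).mean(axis=1)
    n_informative = int(np.ceil(mad_top * len(expressed)))
    informative = expressed[np.argsort(-mad)[:n_informative]]
    return {genes[i] for i in informative}


# Toy example: 10 genes x 4 samples.
genes = [f"g{i}" for i in range(10)]
expr = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4],
                 [5, 5, 5, 5], [6, 6, 6, 6], [5, 9, 5, 9], [7, 9, 7, 9],
                 [6, 12, 6, 12], [10, 10, 10, 10]], dtype=float)
print(sorted(informative_genes(expr, genes)))  # ['g6', 'g8']
```

In the workflow described above, the per-dataset results would be intersected with each other and with the ImmPort immune gene set before Cox screening.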
Construction and screening of IRGPs
IRGPs were constructed by pairwise combination of the selected genes in the training set. Each IRGP consisted of two genes: if the expression value of the first gene was lower than that of the second gene in a sample, the IRGP value was 1; otherwise, it was 0. IRGPs with a value of 1 (or 0) in more than 90% of the training samples were removed. The remaining IRGP values were used to build the prognostic signature. Least absolute shrinkage and selection operator (Lasso)-Cox regression analysis was used to reduce the number of IRGPs. To select the penalty parameter lambda, which determines the complexity of the model, 10-fold cross-validated Lasso-Cox regression was repeated 200 times. In each run, lambda.min, the lambda value giving the minimum cross-validation error, was extracted. The median of the 200 lambda.min values was used in the final Lasso-Cox regression model, and IRGPs with non-zero coefficients in this model were taken as candidate IRGPs for constructing the prediction model.
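The pair construction and the 90% filter can be sketched in Python as below. The function name `build_irgps` and the `A|B` naming of pairs are hypothetical; the Lasso-Cox selection that follows (performed with glmnet in R in this study) is not reproduced.

```python
from itertools import combinations

import numpy as np


def build_irgps(expr: np.ndarray, genes: list, max_frac: float = 0.9):
    """Build 0/1 gene-pair features from a genes x samples matrix.

    A pair (a, b) is 1 in a sample when gene a is expressed lower than gene b.
    Pairs equal to 1 (or 0) in more than `max_frac` of samples are dropped.
    Returns (pair names, samples x pairs integer matrix).
    """
    n_samples = expr.shape[1]
    names, cols = [], []
    for i, j in combinations(range(len(genes)), 2):
        value = (expr[i] < expr[j]).astype(int)
        ones = int(value.sum())
        if ones > max_frac * n_samples or (n_samples - ones) > max_frac * n_samples:
            continue  # near-constant pair, carries little information
        names.append(f"{genes[i]}|{genes[j]}")
        cols.append(value)
    matrix = np.column_stack(cols) if cols else np.zeros((n_samples, 0), dtype=int)
    return names, matrix


# With 129 screened genes there are 129 * 128 / 2 candidate pairs before filtering.
print(len(list(combinations(range(129), 2))))  # 8256
```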
Construction and validation of the prediction model based on the immune-related gene pair index (IRGPI)
In this step, the risk scoring system was constructed. After gene-pair selection, a standard Cox regression model was constructed from the 89 candidate IRGPs in the training set, and the risk score of each sample in the training and validation sets was calculated with this model. The optimal risk-score threshold for stratifying samples into high- and low-risk groups was selected using the survivalROC package (v 1.0.3) (24). The receiver operating characteristic (ROC) curve for three-year survival in the training set was plotted, and the risk score at the point closest to the coordinate (0,1) on the curve was selected as the optimal threshold, because this point gives the best joint sensitivity and specificity. Log-rank survival analysis was then performed between the high- and low-risk groups in both the training and validation sets.
Immune cell infiltration and gene ontology (GO) analysis
Based on the RNAseq data, immune cell infiltration in samples from the TCGA and GSE96058 datasets was estimated using the online CIBERSORT platform (https://cibersort.stanford.edu/) (11,25). The functions of the genes in the IRGPs used to construct the Cox regression model were explored by GO enrichment analysis using the clusterProfiler package (v 3.11) with default options (26); terms with q-value <0.05 were considered significant.
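Over-representation analysis of the kind run by clusterProfiler is commonly based on a one-sided hypergeometric test per GO term, followed by multiple-testing correction. The pure-Python sketch below illustrates that logic; the function names and the gene counts in the example are hypothetical, and Benjamini-Hochberg adjusted p-values are used as a stand-in for the q-values reported by clusterProfiler, which are computed differently.

```python
from math import comb


def hypergeom_pvalue(k: int, n_list: int, k_term: int, n_universe: int) -> float:
    """P(X >= k): drawing n_list genes from n_universe, of which k_term carry the GO term."""
    total = comb(n_universe, n_list)
    upper = min(n_list, k_term)
    return sum(comb(k_term, x) * comb(n_universe - k_term, n_list - x)
               for x in range(k, upper + 1)) / total


def benjamini_hochberg(pvals: list) -> list:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):           # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 12 of 86 list genes carry a term annotated to 60 of 2,000 genes.
p = hypergeom_pvalue(12, 86, 60, 2000)
print(p, benjamini_hochberg([p, 0.04, 0.3]))
```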
Construction and validation of a composite immune-clinical prognostic index (ICPI)
Univariate Cox regression analysis was performed for the risk score and other clinical characteristics [age, estrogen receptor (ER) status, human epidermal growth factor receptor 2 (HER2) status, node status, and molecular subtype] to identify prognostic factors. Variables that were statistically significant in the univariate analysis were then included in a multivariate Cox regression analysis to identify independent prognostic factors after adjusting for confounding effects. The variables that remained statistically significant in the multivariate analysis were included in a final Cox regression model in the training set, the ICPI, to further improve predictive ability. In this model, age was used as a continuous variable and HER2 status as a binary variable (positive =1, negative =0). The prognostic performance of the ICPI and the risk score was evaluated using the concordance index (C-index).
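The C-index used to compare the ICPI with the risk score can be illustrated with a minimal pure-Python version of Harrell's concordance index. This is a sketch, not the survcomp implementation used in the study: pairs with tied follow-up times are simply skipped, and the values passed as `risk` would be the ICPI or risk score from the fitted Cox model (not reproduced here).

```python
from itertools import combinations


def harrell_c_index(time, event, risk) -> float:
    """Fraction of comparable pairs in which the patient who died earlier had the
    higher predicted risk (ties in predicted risk count as 0.5)."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(time)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if time[i] < time[j] else (j, i)
        if time[a] == time[b] or not event[a]:
            continue  # not comparable: tied times, or the earlier time is censored
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1
        elif risk[a] == risk[b]:
            concordant += 0.5
    return concordant / comparable


# Toy data: earlier deaths have higher risk, so concordance is perfect.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 0], [4, 3, 2, 1]))  # 1.0
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ranking, which is the scale on which the ICPI (0.84) and risk score (0.82) are compared.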
Statistical analysis
Statistical analyses were performed using R (v 3.6.3) and associated packages. Lasso-Cox regression analysis was performed using the glmnet package (v 4.0). The optimal risk-score threshold was calculated using the survivalROC package (v 1.0.3), Cox regression analysis was performed using the survival package (v 3.1), GO enrichment analysis was performed using the clusterProfiler package (v 3.11), and the C-index was calculated using the survcomp package (v 3.11) (27). P<0.05 was considered statistically significant.
Results
Stratification of patients into high- and low-risk groups using the IRGP-based prediction model
In total, 2,736 patients were included in the training set and 760 in the validation set (Table 1, Table S1). A total of 129 immune-related genes (IRGs) were screened according to gene expression and prognosis in the training set, and 8,256 gene pairs were constructed from them. After removal of gene pairs with a value of 1 or 0 in more than 90% of samples, 5,144 gene pairs remained. The optimal lambda.min value in the Lasso-Cox regression model was 0.014, which retained 89 gene pairs (detailed in Table S2). A Cox regression model was constructed using these 89 gene pairs, and the risk scores of all samples in the training and validation sets were calculated from each sample's 89 gene-pair values. The optimal threshold for classifying samples into high- and low-risk groups in the time-dependent ROC curve analysis was 0.81 (Figure 2).
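The number of candidate pairs follows from pairing each of the 129 IRGs with every other IRG exactly once (unordered pairs, each evaluated in a single orientation):

```latex
\binom{129}{2} = \frac{129 \times 128}{2} = 8256
```

Of these 8,256 pairs, 5,144 passed the 90% filter before Lasso-Cox selection.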
Table 1
Clinical characteristics of patients in training set and validation set
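| Characteristics | Training set: TCGA (n=763) | Training set: GSE96058 (n=1,973) | Meta-validation set (n=760) | P value |
|---|---|---|---|---|
| Age, years, median [range] | 58 [26–90] | 68 [34–96] | 52 [24–91] | <0.0001 |
| ER status | | | | <0.0001 |
| Positive | 414 | 1,865 | 296 | |
| Negative | 122 | 55 | 171 | |
| HER2 status | | | | <0.0001 |
| Positive | 72 | 80 | 60 | |
| Negative | 454 | 1,818 | 109 | |
| PR status | | | | <0.0001 |
| Positive | 366 | 1,693 | 80 | |
| Negative | 169 | 144 | 60 | |
| Node status | | | | 0.7317 |
| Positive | 269 | 499 | 118 | |
| Negative | 274 | 1,414 | 272 | |
| Molecular subtype | | | | <0.0001 |
| Luminal A | 344 | 1,282 | 45 | |
| Luminal B | 160 | 380 | 37 | |
| HER2-enriched | 52 | 98 | 35 | |
| Basal-like | 111 | 85 | 32 | |
| Normal-like | 96 | 128 | 13 | |
| Stage | | | | NA |
| I | 149 | – | – | |
| II | 318 | – | – | |
| III | 57 | – | – | |
| IV | 16 | – | – | |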
ER, estrogen receptor; HER2, human epidermal growth factor receptor 2; NA, not available; PR, progesterone receptor; TCGA, The Cancer Genome Atlas.
Figure 2
Time-dependent receiver operating characteristic (ROC) curve used to select the optimal threshold. The red dot (the point with the shortest distance to 100% sensitivity and 100% specificity) was selected as the optimal threshold. At this threshold (0.81), the specificity was 0.76 and the sensitivity was 0.83. The area under the ROC curve (AUC) indicates the performance of the immune-related gene pair index (IRGPI) in predicting 3-year survival.
Risk score as an indicator of patients’ prognosis in breast cancer
Patients in the training and validation sets were divided into high- and low-risk groups. In the training set, high-risk patients had significantly poorer OS than low-risk patients [hazard ratio (HR): 5.9, 95% confidence interval (CI): 4.61–7.54, P<0.0001]. Subgroup analyses by ER status, HER2 status, node status, and molecular subtype gave consistent results. PR status and tumor stage information were available in the TCGA dataset, and subgroup analyses by PR status and stage also showed significantly poorer OS in the high-risk group than in the low-risk group. In the validation set, high-risk patients also had significantly poorer OS than low-risk patients (HR: 1.52, 95% CI: 1.16–1.98) (Figure 3). In addition, in the validation set, all subgroups except the basal-like subgroup showed poorer prognosis in the high-risk group than in the low-risk group (HR >1) (Figure 3).
Figure 3
Forest plot of patients in the training and validation sets, divided into high- and low-risk groups. A hazard ratio (HR) >1 indicates poorer prognosis in the high-risk group than in the low-risk group. The boxes and horizontal lines indicate the HR and 95% confidence interval (CI) of each group. ER, estrogen receptor; HER2, human epidermal growth factor receptor 2; PR, progesterone receptor; TCGA, The Cancer Genome Atlas.
IRGs included in the prediction model as indicators of immune cell infiltration and immune processes
Infiltration of CD8+ T cells was significantly lower in the high-risk group than in the low-risk group in the training set (TCGA and GSE96058) (Figure 4, Figure S1). Furthermore, GO enrichment analysis of the 86 unique genes in the 89 gene pairs of the immune-related risk model showed that these genes were mainly involved in the proliferation, adhesion, activation, and other functions of immune cells (Figure 5).
Figure 4
CD8+ T cell infiltration in the training set was significantly lower in the high-risk groups of both the GSE96058 and TCGA datasets. The horizontal line inside the box indicates the median (Q2) score of the CD8+ T cell infiltration. The upper and lower edges of the box represent the 75th percentile (Q3) and 25th percentile (Q1), respectively. The upper and lower horizontal lines represent the upper (Q3 + 1.5× IQR) and lower (Q1 − 1.5× IQR) bounds in the data, respectively. The dots represent outliers (values exceeding the upper bound). IQR = Q3 − Q1. ****, P<0.0001. BRCA, breast cancer; IQR, interquartile range; TCGA, The Cancer Genome Atlas.
Figure 5
Gene ontology (GO) enrichment items of the immune-related gene pairs used in the prediction model. Items were divided into three categories (i.e., biological process, molecular function, and cellular component), and arranged by q-value. **, q<0.01; ***, q<0.001; ****, q<0.0001.
Integrating the risk score with clinical characteristics to achieve higher predictive ability
Univariate Cox regression analysis showed that risk score, age, ER status, HER2 status, and node status were significantly associated with prognosis (P<0.05). In the multivariate Cox regression analysis of these variables, only risk score, age, and HER2 status remained significantly associated with OS (P<0.05; Table 2), indicating that the prognostic effect of the risk score was independent of other covariates such as age and HER2 status. A novel prognostic index, the ICPI, was then constructed by combining age, HER2 status, and the IRGPI in a Cox regression model. In the training set, the ICPI had a median C-index of 0.84 (range, 0.82–0.86), higher than the median C-index of the risk score alone (0.82; range, 0.72–0.84). Finally, a nomogram including age, HER2 status, and risk score was generated as a clinical reference (Figure 6). A patient's risk score, calculated from the expression of the gene pairs, is combined with age and HER2 status to obtain a total score, from which the 3- and 5-year survival rates can be predicted.
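Cox-based nomograms generally assign each predictor points proportional to its contribution to the Cox linear predictor, so the total points correspond to a linear predictor lp, and the survival probability at time t is S0(t) raised to the power exp(lp). The sketch below shows that final mapping; the coefficients and the baseline survival `s0_3y` are hypothetical placeholders, not the fitted ICPI, whose values are not reported here.

```python
import math


def predicted_survival(baseline_surv: float, coefs: dict, patient: dict) -> float:
    """Cox-model survival at a fixed time t: S(t | x) = S0(t) ** exp(lp), lp = sum(beta_k * x_k).

    `baseline_surv` is S0(t), the survival at time t for a patient whose covariates are all zero.
    """
    lp = sum(coefs[k] * patient[k] for k in coefs)
    return baseline_surv ** math.exp(lp)


# Hypothetical coefficients and baseline survival, for illustration only (not the fitted ICPI).
coefs = {"age": 0.05, "her2": 0.40, "risk_score": 0.90}
s0_3y = 0.99
patient = {"age": 30, "her2": 1, "risk_score": 3}
print(round(predicted_survival(s0_3y, coefs, patient), 3))
```

Because exp(lp) grows with each covariate whose coefficient is positive, an older age, HER2 positivity, or a higher risk score all lower the predicted survival, matching the direction of the nomogram.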
Table 2
Cox regression analysis of clinical characteristics
| Characteristics | Univariate HR | Univariate 95% CI | Univariate P value | Multivariate HR | Multivariate 95% CI | Multivariate P value |
|---|---|---|---|---|---|---|
| Age | 1.06 | 1.05–1.07 | <0.0001 | 1.05 | 1.03–1.06 | <0.0001 |
| ER status | 0.59 | 0.41–0.85 | 0.005 | 1.23 | 0.85–1.77 | 0.28 |
| HER2 status | 1.90 | 1.34–2.68 | <0.0001 | 1.49 | 1.05–2.14 | 0.027 |
| Node | 1.59 | 1.27–1.99 | <0.0001 | 1.18 | 0.97–1.56 | 0.095 |
| Risk score | 2.72 | 2.48–2.98 | <0.0001 | 2.56 | 2.27–2.81 | <0.0001 |
| Subtype | 1.09 | 0.97–1.23 | 0.146 | – | – | – |
CI, confidence interval; ER, estrogen receptor; HER2, human epidermal growth factor receptor 2.
Figure 6
Nomogram for predicting 3- and 5-year survival rates. Given a patient's age, HER2 status, and risk score, the total points can be calculated and the 3- and 5-year survival rates read from the nomogram. For example, a 30-year-old HER2-positive woman with a risk score of 3 had a total of 8.4 points (0.2 + 0.2 + 8), corresponding to predicted 3- and 5-year survival rates of 0.88 and 0.7, respectively. HER2, human epidermal growth factor receptor 2.
Discussion
Breast cancer is the most common malignancy affecting women worldwide (2). Its prognosis has markedly improved with the establishment of molecular subtypes and the use of targeted drugs. However, prognosis still varies significantly among tumors of different molecular subtypes (28). Therefore, a prognostic system independent of molecular subtype may help to better understand the disease and promote more personalized treatment. High-throughput sequencing technology presents new opportunities, because the expression of tens of thousands of genes provides high-dimensional information that may enable a better evaluation of the patient's condition. Increasing attention has focused on the role of immune processes in tumors, as they may reflect tumor prognosis and response to treatment to a certain extent.

In this study, IRGs were used to construct gene pairs, which in turn were used to build a Cox regression model to predict patients' prognoses. The model makes use of high-dimensional gene expression data while reducing the impact of batch effects from different sequencing platforms. Patients in the high-risk group had significantly poorer prognosis than patients in the low-risk group in all subgroups of the training set, and poorer prognosis (HR >1) in most subgroups of the meta-validation set. Taken together, these results suggest that the risk score is an independent prognostic factor, which was confirmed by multivariate Cox regression analysis.

Immune cell infiltration analysis revealed reduced immune cell infiltration, especially of CD8+ T cells, in breast cancer patients in the high-risk group. CD8+ T cells participate in the adaptive immune response and are the main immune cells involved in immune surveillance (29). Once tumor cells are recognized, CD8+ T cells are activated through antigen recognition by the T cell receptor (TCR) and rapidly proliferate and differentiate into cytotoxic T lymphocytes (CTLs), which destroy tumor cells through cell-cell contact (30). Previous studies showed that CD8+ T cells can be used as part of an immune score that evaluates prognosis better than standard pathological criteria, regardless of tumor stage (31). This may partially explain the poor prognosis of the high-risk group. Furthermore, GO enrichment analysis showed that the IRGs used in the prediction model were primarily involved in immune cell activation. These findings indicate that the risk score may partly reflect the immune activation state.

Next, individual prognostic clinical factors, namely age and HER2 status, were combined with the risk score, and higher prognostic accuracy was obtained. Thus, clinical data, especially age and HER2 status, remain important prognostic indicators that can help refine the predicted results.

This study has certain limitations. Currently, RNAseq and microarray analyses are expensive and time-consuming, so applying them in standard clinical practice remains challenging. In addition, details of patient follow-up, an important factor affecting prognosis, are limited. Patients from different datasets showed significant differences in baseline characteristics, which also influenced the accuracy of the prediction model; additional multi-center clinical studies are therefore required to validate these results. Finally, the immune cell infiltration analysis is based on the CIBERSORT model, whose estimates may differ from the actual situation.

In conclusion, an independent IRGP signature was constructed. Through pairwise comparison of a set of genes, the OS of patients could be predicted. This method reduces the impact of batch effects caused by different sequencing platforms and has promising application prospects.
Rafael A Irizarry; Benjamin M Bolstad; Francois Collin; Leslie M Cope; Bridget Hobbs; Terence P Speed. Nucleic Acids Res. 2003-02-15.
Fatima Cardoso; Laura J van't Veer; Jan Bogaerts; Leen Slaets; Giuseppe Viale; Suzette Delaloge; Jean-Yves Pierga; Etienne Brain; Sylvain Causeret; Mauro DeLorenzi; Annuska M Glas; Vassilis Golfinopoulos; Theodora Goulioti; Susan Knox; Erika Matos; Bart Meulemans; Peter A Neijenhuis; Ulrike Nitz; Rodolfo Passalacqua; Peter Ravdin; Isabel T Rubio; Mahasti Saghatchian; Tineke J Smilde; Christos Sotiriou; Lisette Stork; Carolyn Straehle; Geraldine Thomas; Alastair M Thompson; Jacobus M van der Hoeven; Peter Vuylsteke; René Bernards; Konstantinos Tryfonidis; Emiel Rutgers; Martine Piccart. N Engl J Med. 2016-08-25.
Aaron M Newman; Chih Long Liu; Michael R Green; Andrew J Gentles; Weiguo Feng; Yue Xu; Chuong D Hoang; Maximilian Diehn; Ash A Alizadeh. Nat Methods. 2015-03-30.