
Identification of a novel gene pairs signature in the prognosis of gastric cancer.

Pai-Lan Peng1,2, Xiang-Yu Zhou2, Guo-Dong Yi2, Peng-Fei Chen2,3, Fan Wang3, Wei-Guo Dong1.   

Abstract

Current prognostic signatures need to be improved in identifying high-risk patients with gastric cancer (GC). Thus, we aimed to develop a reliable prognostic signature that could assess the prognostic risk in GC patients. Two microarray datasets, GSE62254 (n = 300, training set) and GSE15459 (n = 192, test set), were included in the analysis. Prognostic genes were screened to construct prognosis-related gene pairs (PRGPs). Then, a penalized Cox proportional hazards regression model identified seven PRGPs, which constituted a prognostic signature and divided patients into high- and low-risk groups according to the signature score. High-risk patients showed a poorer prognosis than low-risk patients in both the training set (hazard ratio [HR]: 6.086, 95% confidence interval [CI]: 4.341-8.533) and the test set (1.773 [1.107-2.840]). The PRGPs signature also achieved a higher predictive accuracy (concordance index [C-index]: 0.872, 95% CI: 0.846-0.897) than two existing molecular signatures (0.706 [0.667-0.744] for an 11-gene signature and 0.684 [0.642-0.726] for a 24-lncRNA signature) and TNM stage (0.764 [0.715-0.814]). In conclusion, our study identified a novel gene pairs signature for the prognosis of GC.
© 2017 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.


Keywords:  Gastric cancer; gene pairs; prognosis; signature


Year:  2017        PMID: 29282891      PMCID: PMC5806102          DOI: 10.1002/cam4.1303

Source DB:  PubMed          Journal:  Cancer Med        ISSN: 2045-7634            Impact factor:   4.452


Introduction

Gastric cancer (GC) is one of the most common cancers around the world, with an estimated 951,600 cases and 723,100 deaths per year 1. In spite of great improvements in chemo‐, radio‐, and surgical treatment, the 5‐year overall survival rate remains unsatisfactory. This is mainly caused by advanced stages at diagnosis and high recurrence rates after treatment. Currently, the TNM (tumor/node/metastasis) staging system is widely used for prognostic prediction. However, patients with the same TNM stage and treatment can have different clinical outcomes. Thus, it is necessary to identify the subset of patients at high risk for recurrence and death, and provide timely intervention. The availability of large‐scale gene expression profiles brings the chance to identify more reliable prognostic signatures in various cancers. Several studies have proposed gene‐expression prognostic signatures in GC 2, 3, 4, 5, 6. However, models based on the gene expression levels of one dataset are difficult to apply directly to another dataset because of batch effects 7. Instead, methods based on the relative ranking of gene expression levels can be used without the need for eliminating batch effects 8, 9. In this study, we constructed prognosis‐related gene pairs (PRGPs) to develop and validate a novel prognostic signature for GC.

Method

Data collection

Normalized gene expression profiles of GC were downloaded from Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). Microarray datasets were selected if fulfilling the following criteria: (1) based on the chip platform of Affymetrix Human Genome U133 Plus 2.0 Array (GPL570); (2) availability of related clinical data, especially follow‐up time and survival status; (3) sample size of more than 150. Finally, two datasets (GSE62254 and GSE15459) were included in this study. GSE62254 (n = 300) was used as a training set for signature identification, while GSE15459 (n = 192) was used as a test set for signature validation.

Data preprocessing

The probe IDs were matched to gene symbols using the Affymetrix annotation file (http://www.affymetrix.com). When multiple probes matched to an identical gene symbol, we selected the probe ID with the largest inter‐quartile range (IQR) of expression values among all multiple probe IDs to represent the gene, which was biologically more reasonable and robust than the average method 10.
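The probe‐selection rule above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; the function names, probe IDs, gene symbols and expression values are all hypothetical, and the quartile definition (`method="inclusive"`) is an assumption.

```python
from statistics import quantiles


def iqr(values):
    """Inter-quartile range (Q3 - Q1) of a list of expression values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def collapse_probes(expr, probe_to_gene):
    """For each gene symbol, keep the probe with the largest IQR.

    expr:          dict probe_id -> list of expression values (one per sample)
    probe_to_gene: dict probe_id -> gene symbol
    Returns a dict gene symbol -> chosen probe_id.
    """
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe without an annotated gene symbol
        score = iqr(values)
        if gene not in best or score > best[gene][1]:
            best[gene] = (probe, score)
    return {gene: probe for gene, (probe, _) in best.items()}


# Toy data: hypothetical probe IDs, gene symbols and values.
expr = {
    "p1": [5.0, 5.1, 5.2, 5.3],  # GENE_A, narrow spread
    "p2": [2.0, 4.0, 6.0, 8.0],  # GENE_A, wide spread
    "p3": [7.0, 7.5, 8.0, 8.5],  # GENE_B, only probe
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
chosen = collapse_probes(expr, probe_to_gene)
```

In this toy case the wider‐spread probe `p2` is chosen to represent `GENE_A`.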

Prognostic genes screening

Prior to analysis, we calculated mean intensities of each gene across all samples, and filtered out the un‐expressed genes (the smallest 20% rank sum of mean intensity) and un‐informative genes (the lowest 20% rank sum of standard deviations) to decrease the false discoveries. Then, survival analysis was conducted in the training set to screen prognostic genes in GC, using the log‐rank test and permutation method (n = 300).
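One possible reading of the filtering step is sketched below: genes ranked in the lowest 20% by mean intensity, or in the lowest 20% by standard deviation, are dropped. The exact ranking procedure is not specified in the text, so this interpretation, the function names and the toy values are assumptions.

```python
from statistics import mean, stdev


def bottom_fraction(scores, fraction=0.2):
    """Return the set of keys whose score ranks in the lowest `fraction`."""
    ranked = sorted(scores, key=scores.get)
    n_drop = int(len(ranked) * fraction)
    return set(ranked[:n_drop])


def filter_genes(expr, fraction=0.2):
    """Drop genes with low mean (un-expressed) or low SD (un-informative).

    expr: dict gene -> list of expression values across samples.
    Returns the list of retained genes in their original order.
    """
    means = {gene: mean(values) for gene, values in expr.items()}
    sds = {gene: stdev(values) for gene, values in expr.items()}
    drop = bottom_fraction(means, fraction) | bottom_fraction(sds, fraction)
    return [gene for gene in expr if gene not in drop]


# Toy data: hypothetical genes and values.
expr = {
    "g1": [1.0, 2.0, 3.0],     # lowest mean
    "g2": [10.0, 10.1, 10.2],  # lowest standard deviation
    "g3": [5.0, 7.0, 9.0],
    "g4": [8.0, 11.0, 14.0],
    "g5": [6.0, 10.0, 14.0],
}
kept = filter_genes(expr)
```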

Construction of a prognostic signature based on PRGPs

The expression levels of prognostic genes (PRGs) in a specific sample underwent pair‐wise comparison to generate a score for each PRGP. If the expression of PRG1 was higher than that of PRG2, a PRGP score of 1 was assigned; otherwise, the PRGP score was 0. The PRGPs score profile was used to build the prognostic signature. To minimize the risk of over‐fitting, we used a Cox proportional hazards regression model combined with the least absolute shrinkage and selection operator (LASSO) (glmnet, version 2.0‐10) 11. The penalty parameter was estimated by 10‐fold cross‐validation at 1 standard error (1‐SE) beyond the minimum partial likelihood deviance.
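The pair‐wise scoring rule can be illustrated with a short Python sketch. Gene names and expression values are hypothetical, and the penalized Cox fit itself (done with glmnet in the study) is not reproduced here.

```python
from itertools import combinations


def prgp_scores(sample, genes):
    """Score every unordered gene pair within one sample.

    sample: dict gene -> expression value in that sample
    genes:  list of prognostic genes (its order fixes which gene is "gene 1")
    Returns a dict (gene1, gene2) -> 1 if gene1 > gene2 in this sample, else 0.
    """
    return {
        (g1, g2): 1 if sample[g1] > sample[g2] else 0
        for g1, g2 in combinations(genes, 2)
    }


# Toy sample: hypothetical genes and values.
genes = ["A", "B", "C"]
sample = {"A": 5.2, "B": 3.1, "C": 7.4}
scores = prgp_scores(sample, genes)

# A monotone rescaling of the whole sample (a crude stand-in for a batch
# effect) leaves every within-sample comparison, and hence every score,
# unchanged.
rescaled = {gene: 2.0 * value + 10.0 for gene, value in sample.items()}
scores_rescaled = prgp_scores(rescaled, genes)
```

Because each score depends only on the ordering of two genes within the same sample, the score profile is the same before and after the rescaling.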

Validation and evaluation of the PRGPs signature

The patients were divided into high‐ and low‐risk groups according to the PRGPs score cutoff, which was determined by a time‐dependent receiver operating characteristic (ROC) curve (survivalROC, version 1.0.3) at 5 years 12. We adopted the nearest neighbor estimation (NNE) method to estimate the ROC curve. The score corresponding to the shortest distance between the ROC curve and the point of 100% true positive and 0% false positive was used as the cutoff value. Survival differences between the high‐ and low‐risk groups were assessed by the Kaplan–Meier estimate and compared using the log‐rank test. To validate the signature, we calculated the PRGPs score profile in the test set and classified the patients into high‐ and low‐risk groups using the same cutoff value. We also compared the prognostic accuracy of the PRGPs signature with that of two existing molecular signatures in terms of the time‐dependent area under the ROC curve (AUC) and the concordance index (C‐index) (survcomp, version 1.22.0 and compareC, version 1.3.1) 13.
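The cutoff rule (the ROC point closest to 100% true positive and 0% false positive) can be sketched as below. The sketch assumes the ROC points have already been estimated, for example by survivalROC in R; it does not implement the NNE estimator, and the cutoffs and rates shown are hypothetical.

```python
import math


def closest_to_corner(roc_points):
    """Pick the cutoff whose ROC point is nearest to (FPR = 0, TPR = 1).

    roc_points: iterable of (cutoff, false_positive_rate, true_positive_rate)
    """
    best = min(roc_points, key=lambda p: math.hypot(p[1], 1.0 - p[2]))
    return best[0]


# Hypothetical ROC points: (cutoff, FPR, TPR).
roc_points = [
    (-0.20, 0.60, 0.95),
    (-0.15, 0.20, 0.85),
    (-0.05, 0.05, 0.50),
]
cutoff = closest_to_corner(roc_points)
```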

Gene set enrichment analysis

To identify potential biological processes related to the risk defined by the PRGPs signature, gene set enrichment analysis (GSEA) (http://software.broadinstitute.org/gsea/index.jsp) was conducted to detect whether a series of a priori defined biological processes were enriched in the gene rank derived from differentially expressed genes (DEGs) between the high‐ and low‐risk groups. A false discovery rate (FDR) <0.05 was chosen as the cutoff criterion.

Statistical analysis

All statistical analyses were performed using R (version 3.3.1, https://www.r-project.org/). For use with GSEA software, the collection of annotated gene sets of h.all.v5.2.symbols.gmt in Molecular Signatures Database (MSigDB, http://software.broadinstitute.org/gsea/msigdb/index.jsp) was chosen as the reference gene sets. A two‐sided P value <0.05 was considered statistically significant.

Results

Construction and definition of the PRGPs signature

In the training set, there were a total of 300 GC patients (199 male [66%] and 101 female [34%]; median age [range]: 64 [24-86] years). A total of 518 prognostic genes were identified to construct 133,903 PRGPs. Then, we constructed a risk score consisting of seven PRGPs using L1‐penalized Cox proportional hazards regression (Fig. 1). The PRGPs signature consisted of 12 unique prognostic genes (ACOT7, CES1, IPMK, NES, PBX3, TMEM245, MIR6756, RAB11FIP4, RBPMS2, RPS27L, TPMT, and TNFRSF11A) (Table 1). In time‐dependent ROC curve analysis, the optimal cutoff for the signature to classify patients into high‐ and low‐risk groups was set at −0.154. High‐risk patients showed a poorer prognosis than low‐risk patients (hazard ratio [HR]: 6.086, 95% confidence interval [CI]: 4.341–8.533) (Fig. 2). Subgroup analysis showed consistent results (Table 2).
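The number of gene pairs is consistent with forming every unordered pair among the 518 prognostic genes, as this short check shows:

```latex
\binom{518}{2} = \frac{518 \times 517}{2} = \frac{267{,}806}{2} = 133{,}903
```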
Figure 1

Distribution of the patients’ survival status and risk scores of the prognosis‐related gene pairs (PRGPs) signature.

Table 1

Signature information

Gene 1 | Full name | Gene 2 | Full name | Coefficient
--- | --- | --- | --- | ---
ACOT7 | Acyl‐CoA thioesterase 7 | MIR6756 | microRNA 6756 | −0.038007046
CES1 | Carboxylesterase 1 | RAB11FIP4 | RAB11 family interacting protein 4 | 0.024780097
IPMK | Inositol polyphosphate multikinase | RBPMS2 | RNA‐binding protein with multiple splicing 2 | −0.190466701
NES | Nestin | RPS27L | Ribosomal protein S27‐like | 0.043310157
NES | Nestin | TPMT | Thiopurine S‐methyltransferase | 0.010844818
PBX3 | Pre‐B‐cell leukemia homeobox 3 | TNFRSF11A | TNF receptor superfamily member 11a | 0.048826825
TMEM245 | Transmembrane protein 245 | TNFRSF11A | TNF receptor superfamily member 11a | 0.014605079
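The coefficients in Table 1 and the cutoff of −0.154 can be combined into a per‐patient score, sketched below in Python. Two points are assumptions not stated explicitly in the text: that the score is the sum of the coefficients of those pairs whose first gene is more highly expressed than the second (the usual linear predictor of a Cox model), and that scores above the cutoff indicate high risk. The sample expression values are hypothetical.

```python
# (gene 1, gene 2, coefficient) taken from Table 1.
SIGNATURE = [
    ("ACOT7", "MIR6756", -0.038007046),
    ("CES1", "RAB11FIP4", 0.024780097),
    ("IPMK", "RBPMS2", -0.190466701),
    ("NES", "RPS27L", 0.043310157),
    ("NES", "TPMT", 0.010844818),
    ("PBX3", "TNFRSF11A", 0.048826825),
    ("TMEM245", "TNFRSF11A", 0.014605079),
]
CUTOFF = -0.154  # cutoff reported for the training set


def risk_score(sample):
    """Assumed score: sum of coefficients over pairs with gene 1 > gene 2."""
    return sum(coef for g1, g2, coef in SIGNATURE if sample[g1] > sample[g2])


def risk_group(sample):
    """Assumed direction: a score above the cutoff means high risk."""
    return "high" if risk_score(sample) > CUTOFF else "low"


# Hypothetical patient in which only IPMK exceeds its partner RBPMS2.
sample = {
    "ACOT7": 0.0, "MIR6756": 1.0,
    "CES1": 0.0, "RAB11FIP4": 1.0,
    "IPMK": 2.0, "RBPMS2": 1.0,
    "NES": 0.0, "RPS27L": 1.0, "TPMT": 1.0,
    "PBX3": 0.0, "TMEM245": 0.0, "TNFRSF11A": 1.0,
}
score = risk_score(sample)
group = risk_group(sample)
```

Because the score uses only within‐sample comparisons, it can be computed for a single patient without reference to any other sample.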
Figure 2

Kaplan–Meier curves of overall survival among the patients in training set.

Table 2

Subgroup analysis for the hazard ratios (HRs) between high‐ and low‐risk groups divided by the prognosis‐related gene pairs (PRGPs) signature

Variables | High risk (no. of patients) | Low risk (no. of patients) | HR (95% CI) | Log‐rank P value
--- | --- | --- | --- | ---
All | 141 | 159 | 6.086 (4.341–8.533) | <0.0001
Age | | | |
<65 years | 78 | 83 | 6.583 (4.049–10.700) | <0.0001
>65 years | 63 | 76 | 6.251 (3.881–10.070) | <0.0001
Sex | | | |
Male | 93 | 106 | 5.830 (3.846–8.838) | <0.0001
Female | 48 | 53 | 6.321 (3.552–11.250) | <0.0001
Lauren type | | | |
Intestinal | 53 | 93 | 8.423 (4.713–15.050) | <0.0001
Diffuse | 75 | 59 | 4.671 (2.965–7.359) | <0.0001
Molecular subtype | | | |
MSS/TP53− | 60 | 47 | 5.620 (3.316–9.524) | <0.0001
MSS/TP53+ | 28 | 51 | 7.531 (3.599–15.760) | <0.0001
Differentiation | | | |
Well/moderate | 38 | 76 | 12.250 (6.150–24.410) | <0.0001
Poor | 55 | 61 | 6.172 (3.620–10.520) | <0.0001
TNM stage | | | |
I/II | 41 | 85 | 8.237 (3.856–17.590) | <0.0001
III/IV | 100 | 72 | 4.052 (2.788–5.887) | <0.0001
Lymphovascular invasion | | | |
Positive | 103 | 102 | 4.558 (3.107–6.686) | <0.0001
Negative | 25 | 48 | 25.440 (9.711–66.650) | <0.0001
Venous invasion | | | |
Positive | 25 | 19 | 9.201 (4.055–20.880) | <0.0001
Negative | 47 | 82 | 10.170 (5.474–18.900) | <0.0001
Perineural invasion | | | |
Positive | 51 | 37 | 4.751 (2.799–8.064) | <0.0001
Negative | 60 | 99 | 10.050 (5.730–17.640) | <0.0001

HR, hazard ratio; CI, confidence interval.

Moreover, we also conducted a sensitivity analysis by omitting one gene pair at a time to evaluate the effect of each included gene pair on the signature, and found that each included gene pair helped improve the predictive ability of the signature (Table S1).

Validation of the PRGPs signature

In the test set, there were a total of 192 GC patients (125 male [65%] and 67 female [35%]; median age [range]: 67 [23-92] years). Using the same signature score and cutoff value as in the training set, high‐risk patients also showed a poorer prognosis than low‐risk patients in the test set (HR: 1.773, 95% CI: 1.107–2.840) (Fig. 3).
Figure 3

Kaplan–Meier curves of overall survival among the patients in test set.

Furthermore, we randomly selected 14 genes from the 518 prognostic genes to construct seven gene pairs as pseudo‐PRGPs. The pseudo‐PRGPs signature showed a poor predictive accuracy in the prognosis of GC (AUC at 5 years = 0.620), which supported the methodological reliability of this study (Fig. S1).

Comparison with other prognostic signatures

We also compared the PRGPs signature with two existing molecular signatures, both of which were also generated from GSE62254. The 24‐lncRNA expression signature was reported with an AUC of 0.71, and the 11‐Gene expression signature with an AUC of 0.769 [3,4]. After modeling again in the training set, both signatures showed a lower predictive efficiency than the PRGPs signature (AUC at 5 years = 0.820, 0.751 and 0.695 for PRGPs, 11‐Gene and 24‐lncRNA, respectively), as did TNM stage (AUC at 5 years = 0.737) (Fig. 4). Furthermore, the PRGPs signature achieved a higher predictive accuracy than the 11‐Gene signature (C‐index [95% CI]: 0.872 [0.846–0.897] vs. 0.706 [0.667–0.744]; P = 0.026) and the 24‐lncRNA signature (C‐index [95% CI]: 0.872 [0.846–0.897] vs. 0.684 [0.642–0.726]; P = 0.011). Its C‐index was also numerically higher than that of TNM stage, although this difference did not reach statistical significance (C‐index [95% CI]: 0.872 [0.846–0.897] vs. 0.764 [0.715–0.814]; P = 0.077) (Table 3).
Figure 4

Time‐dependent receiver operating characteristic (ROC) curves of different prognostic signatures in gastric cancer.

Table 3

Predictive accuracy of different prognostic signatures in gastric cancer

Signature | AUC | C‐index (95% CI) | P value (a)
--- | --- | --- | ---
PRGPs | 0.820 | 0.872 (0.846–0.897) |
TNM stage | 0.737 | 0.764 (0.715–0.814) | 0.077
11‐Gene | 0.751 | 0.706 (0.667–0.744) | 0.026
24‐lncRNA | 0.695 | 0.684 (0.642–0.726) | 0.011

AUC, area under time‐dependent receiver operating characteristic (ROC) curve; CI, confidence interval; PRGPs, prognosis‐related gene pairs; C‐index, concordance index.

(a) Represents the difference between the PRGPs and other signatures in terms of C‐index.
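For readers unfamiliar with the C‐index, a minimal Python sketch of Harrell's concordance index for right‐censored data is given below. It is a plain illustration, not the survcomp or compareC implementation used in the study, and the follow‐up times, event indicators and risk scores are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C for right-censored survival data.

    times:  follow-up time for each patient
    events: 1 if death was observed, 0 if censored
    risks:  predicted risk score (higher means worse expected outcome)
    A pair (i, j) is comparable when patient i had an observed event
    strictly before time j. Tied risk scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical cohort of four patients.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
```

In this toy cohort five pairs are comparable and four of them are ordered correctly by the risk score.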


Biological processes associated with the PRGPs signature

After the patients were divided into high‐ and low‐risk groups according to the PRGPs signature, GSEA identified one gene set, “oxidative phosphorylation,” as significantly enriched in the low‐risk group (FDR = 0.029) (Fig. 5). The set contained 188 genes, among which there existed a complex protein–protein interaction network (Fig. S2). When setting the selection criterion at FDR < 0.1, five more enriched sets were identified, namely “MYC targets,” “interferon α response” and “E2F targets” in the low‐risk group and “myogenesis” in the high‐risk group.
Figure 5

Gene set enrichment analysis.


Discussion

With the development of high‐throughput gene detection technology, we are entering a new era of big biological data. A tremendous amount of genomic information can be detected in individual samples, which promotes the identification of novel diagnostic, prognostic, predictive or pharmacodynamic biomarkers 14. Effective development and validation of biomarkers depend mainly on the intended use. A genomic signature is a biomarker in which the genomic data are combined in a defined manner to provide either a continuous score or a categorical classifier for clinical decision‐making. Prognostic signatures are baseline measurements that provide information about the long‐term outcome of cancer patients. Currently, microarray or RNA‐sequencing data on gene mutation and expression are usually used to construct novel prognostic signatures by a Cox proportional hazards regression model 15, 16. Well‐developed and validated prognostic signatures could help improve patient management in a personalized manner. However, most of these signatures have not been accepted or widely used in clinical practice. This is caused by multiple factors. First, methods based on absolute gene expression levels make it difficult to integrate samples from different datasets because of batch effects, which limits the sample size. Second, all signatures showed a significant association with prognosis, but most failed to provide a specific risk‐score formula and a cutoff value for separating high‐ and low‐risk groups. In this study, we constructed a risk score consisting of seven PRGPs using L1‐penalized Cox proportional hazards regression. The score calculated by this gene pair‐based method is based entirely on the gene expression profile of an individual GC patient, and can be used without the need for eliminating batch effects. Thus, the formula and cutoff value can be used across multiple datasets, which is an important advantage.
When applying the same formula and cutoff value in the test set, we reached a consistent result. This indicated the robustness of the method and the PRGPs signature. Accordingly, the PRGPs signature showed a higher predictive efficiency and accuracy than other prognostic signatures. The seven PRGPs consisted of 12 genes, among which two genes (CES1 and TNFRSF11A) were also part of the 11‐Gene signature. Of these 12 genes, only PBX3 has been investigated for its potential mechanism in GC 17, 18. We speculate that the other 11 genes might also play a role in GC. Moreover, the expression imbalance in certain gene pairs might play a more important role than individual differentially expressed genes. In GSEA, we found that “oxidative phosphorylation” was significantly enriched in the low‐risk group, which was consistent with a recent study 19. Several limitations of our study should be acknowledged. First, this was a retrospectively designed study, rather than a prospective cohort study. Second, the sample size was relatively small, although the method we developed could eliminate batch effects. Third, we only considered microarray data based on GPL570 and ignored other, less common platforms, which might lead to selection bias. In conclusion, our study identified a novel gene pairs signature for the prognosis of GC.

Conflict of Interest

None.

Supporting Information

Table S1. Sensitivity analysis for the prognosis‐related gene pairs signature.
Figure S1. Time‐dependent receiver operating characteristic (ROC) curve of the pseudo prognosis‐related gene pairs signature.
Figure S2. Protein–protein interaction network of genes in the enriched set of “oxidative phosphorylation”.
  18 in total

1.  A P53-Deficiency Gene Signature Predicts Recurrence Risk of Patients with Early-Stage Lung Adenocarcinoma.

Authors:  Yanding Zhao; Frederick S Varn; Guoshuai Cai; Feifei Xiao; Christopher I Amos; Chao Cheng
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2017-11-15       Impact factor: 4.254

Review 2.  What should physicians look for in evaluating prognostic gene-expression signatures?

Authors:  Jyothi Subramanian; Richard Simon
Journal:  Nat Rev Clin Oncol       Date:  2010-04-27       Impact factor: 66.675

3.  MicroRNA-144-3p suppresses gastric cancer progression by inhibiting epithelial-to-mesenchymal transition through targeting PBX3.

Authors:  Butian Li; Shengping Zhang; Hao Shen; Chenglong Li
Journal:  Biochem Biophys Res Commun       Date:  2017-01-19       Impact factor: 3.575

4.  Development of a prognostic model for breast cancer survival in an open challenge environment.

Authors:  Wei-Yi Cheng; Tai-Hsien Ou Yang; Dimitris Anastassiou
Journal:  Sci Transl Med       Date:  2013-04-17       Impact factor: 17.956

5.  Development and Validation of an Individualized Immune Prognostic Signature in Early-Stage Nonsquamous Non-Small Cell Lung Cancer.

Authors:  Bailiang Li; Yi Cui; Maximilian Diehn; Ruijiang Li
Journal:  JAMA Oncol       Date:  2017-11-01       Impact factor: 31.777

6.  PBX3 promotes gastric cancer invasion and metastasis by inducing epithelial-mesenchymal transition.

Authors:  Shuanhu Wang; Chenglong Li; Wenbin Wang; Chungen Xing
Journal:  Oncol Lett       Date:  2016-10-21       Impact factor: 2.967

7.  Integrated Analysis Identifies Molecular Signatures and Specific Prognostic Factors for Different Gastric Cancer Subtypes.

Authors:  Li Min; Yu Zhao; Shengtao Zhu; Xintao Qiu; Rui Cheng; Jie Xing; Linlin Shao; Shuilong Guo; Shutian Zhang
Journal:  Transl Oncol       Date:  2016-12-22       Impact factor: 4.243

8.  A seven-gene prognostic signature for rapid determination of head and neck squamous cell carcinoma survival.

Authors:  Sipeng Shen; Jianling Bai; Yongyue Wei; Guanrong Wang; Qingya Li; Ruyang Zhang; Weiwei Duan; Sheng Yang; Mulong Du; Yang Zhao; David C Christiani; Feng Chen
Journal:  Oncol Rep       Date:  2017-10-24       Impact factor: 3.906

9.  SPARC expression in gastric cancer predicts poor prognosis: Results from a clinical cohort, pooled analysis and GSEA assay.

Authors:  Zhi Li; Ao-Di Li; Lu Xu; De-Wei Bai; Ke-Zuo Hou; Hua-Chuan Zheng; Xiu-Juan Qu; Yun-Peng Liu
Journal:  Oncotarget       Date:  2016-10-25

10.  Gene-pair expression signatures reveal lineage control.

Authors:  Merja Heinäniemi; Matti Nykter; Roger Kramer; Anke Wienecke-Baldacchino; Lasse Sinkkonen; Joseph Xu Zhou; Richard Kreisberg; Stuart A Kauffman; Sui Huang; Ilya Shmulevich
Journal:  Nat Methods       Date:  2013-04-21       Impact factor: 28.547

