Bin Ma, Yongmin Li, Yupeng Ren.
Abstract
Gastric cancer (GC) remains a major malignancy worldwide and has a poor prognosis. Long noncoding RNAs (lncRNAs) can markedly affect cancer progression, and lncRNAs have been proposed as diagnostic or prognostic biomarkers of GC. The current study therefore aimed to identify lncRNA-based prognostic biomarkers for GC. LncRNA expression profiles were first downloaded from the Gene Expression Omnibus (GEO) database. After the lncRNAs were re-annotated, univariate Cox analysis identified 177 prognostic lncRNA probes in the training set GSE62254 (n = 225). Multivariate Cox analysis of each lncRNA, with clinical characteristics as covariates, identified a total of 46 prognostic lncRNA probes. Robust likelihood-based survival and least absolute shrinkage and selection operator (LASSO) models were then used to establish a prognostic 6-lncRNA signature. Receiver operating characteristic (ROC) curve analyses were used to evaluate the sensitivity and specificity of survival prediction. Patients with high risk scores had significantly worse overall survival (OS) than patients with low risk scores (log-rank P < .0001), and the area under the ROC curve (AUC) for 5-year survival was 0.77. A nomogram and a forest plot based on multivariable Cox regression were constructed to compare the clinical characteristics and risk scores. This analysis suggested that the 6-lncRNA signature is an independent prognostic factor. Single-sample gene set enrichment analysis (ssGSEA) was used to examine the relationships between the 6-lncRNA signature and biological functions. The internal validation set GSE62254 (n = 75) and the external validation set GSE57303 (n = 70) confirmed the robustness of the signature. In conclusion, the 6-lncRNA signature can effectively predict the prognosis of patients with GC.
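The abstract compares overall survival between high- and low-risk patients using a log-rank test. The minimal pure-Python sketch below shows what that two-group comparison computes. It is not the authors' implementation, since the source does not name the software used. The follow-up times, event flags, and group labels in the usage lines are hypothetical toy data.

```python
import math

def log_rank_test(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up times
    events : 1 if death was observed, 0 if censored
    groups : 1 for one group (e.g. high risk), 0 for the other (e.g. low risk)
    Returns (chi-square statistic, P-value) with 1 degree of freedom.
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    variance = 0.0   # hypergeometric variance of O - E
    for t in death_times:
        # Patients still at risk just before t
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        # Deaths occurring exactly at t
        dead = [g for ti, e, g in zip(times, events, groups) if ti == t and e == 1]
        d, d1 = len(dead), sum(dead)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical toy data: group 1 dies earlier than group 0
chi2, p = log_rank_test([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])
print(round(chi2, 3), round(p, 3))
```

With real data, `groups` would hold each patient's high/low risk assignment, and `times` and `events` would hold the OS follow-up.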
Keywords: GEO; gastric cancer; least absolute shrinkage and selection operator (LASSO); long noncoding RNAs; prognosis; robust likelihood-based survival
Year: 2019 PMID: 31743579 PMCID: PMC6943089 DOI: 10.1002/cam4.2621
Source DB: PubMed Journal: Cancer Med ISSN: 2045-7634 Impact factor: 4.452
Figure 1. The schematic workflow of the present study
The demographic characteristics of samples in the training and validation datasets
| Characteristics | Training dataset (GSE62254, n = 225) | Internal validation dataset (GSE62254, n = 75) | External validation dataset (GSE57303, n = 70) |
|---|---|---|---|
| Age (y) | |||
| ≤60 | 87 | 30 | 29 |
| >60 | 138 | 45 | 39 |
| Survival status | |||
| Living | 116 | 32 | 34 |
| Dead | 109 | 43 | 36 |
| Gender | |||
| Female | 70 | 31 | 18 |
| Male | 155 | 44 | 52 |
| pT | |||
| T2 | 143 | 43 | 7 |
| T3 | 67 | 24 | 54 |
| T4 | 13 | 8 | 9 |
| pN | |||
| N0 | 24 | 14 | 13 |
| N1 | 103 | 28 | 26 |
| N2 | 63 | 17 | 26 |
| N3 | 35 | 16 | 5 |
| pM | |||
| M0 | 202 | 68 | 63 |
| M1 | 20 | 7 | 7 |
| pStage | |||
| Stage I | 20 | 10 | 3 |
| Stage II | 77 | 19 | 9 |
| Stage III | 74 | 21 | 41 |
| Stage IV | 52 | 25 | 17 |
| Lauren subtype | |||
| Diffuse | 107 | 35 | 35 |
| Intestinal | 112 | 38 | 20 |
| Mixed | 6 | 2 | 15 |
| MLH1 IHC | |||
| Negative | 48 | 16 | — |
| Positive | 176 | 58 | — |
| EBV ISH | |||
| Negative | 192 | 65 | — |
| Positive | 15 | 3 | — |
| Molecular subtype | |||
| MSS/TP53‐ | 82 | 25 | — |
| MSS/TP53+ | 59 | 20 | — |
| MSI | 48 | 20 | — |
| EMT | 36 | 10 | — |
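Abbreviations: EBV, Epstein-Barr virus; EMT, epithelial-mesenchymal transition; IHC, immunohistochemistry; ISH, in situ hybridization; MSI, microsatellite instability; MSS, microsatellite stable; pM, pathological metastasis stage; pN, pathological lymph node stage; pT, pathological tumor stage.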
The 20 most significant lncRNA probes in the univariate Cox proportional hazards model
| Probe ID | P-value | HR | Lower 95% CI | Upper 95% CI |
|---|---|---|---|---|
| 236141_at | 1.08E−07 | 2.656 | 1.853 | 3.809 |
| 213447_at | 2.12E−07 | 4.327 | 2.488 | 7.525 |
| 219791_s_at | 4.61E−07 | 3.118 | 2.004 | 4.851 |
| 1559901_s_at | 1.31E−06 | 11.148 | 4.198 | 29.602 |
| 1564139_at | 2.01E−06 | 7.717 | 3.322 | 17.925 |
| 221974_at | 3.03E−06 | 2.612 | 1.745 | 3.908 |
| 235759_at | 3.49E−06 | 2.345 | 1.636 | 3.362 |
| 227909_at | 4.17E−06 | 4.957 | 2.507 | 9.801 |
| 242358_at | 4.65E−06 | 4.243 | 2.286 | 7.876 |
| 226582_at | 7.22E−06 | 2.511 | 1.679 | 3.753 |
| 1558828_s_at | 7.35E−06 | 2.647 | 1.730 | 4.052 |
| 1556695_a_at | 7.90E−06 | 6.592 | 2.882 | 15.075 |
| 229734_at | 7.98E−06 | 12.126 | 4.056 | 36.254 |
| 230589_at | 9.05E−06 | 27.102 | 6.313 | 116.348 |
| 1559965_at | 9.98E−06 | 11.411 | 3.874 | 33.606 |
| 232298_at | 1.10E−05 | 2.196 | 1.546 | 3.119 |
| 244553_at | 1.33E−05 | 0.070 | 0.021 | 0.232 |
| 1556364_at | 1.34E−05 | 5.857 | 2.643 | 12.978 |
| 225381_at | 1.34E−05 | 2.108 | 1.507 | 2.949 |
| 241834_at | 1.86E−05 | 9.697 | 3.427 | 27.433 |
Abbreviations: HR: hazard ratio; CI: confidence interval.
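Each row pairs a P-value with a hazard ratio, HR = exp(β), and its 95% CI. If the CIs are Wald intervals, exp(β ± 1.96·SE), then the standard error and a Wald P-value can be recovered from the table itself. This gives a quick consistency check on transcribed values. The sketch below assumes Wald intervals. The source does not state which test produced the reported P-values, so agreement should only be expected to be approximate.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution

def wald_p_from_hr_ci(hr, lower, upper):
    """Back-calculate a two-sided Wald P-value from a hazard ratio and its 95% CI.

    Assumes the CI was computed as exp(beta +/- 1.96 * SE) on the log-hazard scale.
    """
    beta = math.log(hr)                                     # Cox coefficient
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)  # standard error of beta
    z = beta / se                                           # Wald z statistic
    return math.erfc(abs(z) / math.sqrt(2))                 # two-sided normal tail

# 236141_at from the table: HR 2.656 (95% CI 1.853-3.809), reported P = 1.08E-07
p = wald_p_from_hr_ci(2.656, 1.853, 3.809)
print(f"{p:.2e}")
```

For 236141_at, the back-calculated value comes out close to the reported 1.08E−07. Small differences are expected because the HR and CI bounds are rounded to three decimals.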
Figure 2. Screening of prognosis-related clinical characteristics by Kaplan-Meier analyses. A, Kaplan-Meier curves based on different pT stages. B, Kaplan-Meier curves based on different pN stages. C, Kaplan-Meier curves based on different pM stages. D, Kaplan-Meier curves based on different tumor stages. E, Kaplan-Meier curves based on different age groups, where Q1, Q2, Q3, and Q4 represent quartiles.
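Figure 2 screens clinical characteristics by comparing Kaplan-Meier curves across strata. The product-limit estimator behind such curves is short enough to sketch. The toy times and censoring flags below are hypothetical. To get one curve per stratum (for example, per pT stage), call the function once on each subgroup.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times  : follow-up times
    events : 1 if death was observed, 0 if censored
    Returns [(t, S(t)), ...] at each distinct death time.
    Patients censored at a death time are counted as still at risk at that time.
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n_at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - deaths / n_at_risk  # conditional probability of surviving t
        curve.append((t, survival))
    return curve

# Hypothetical toy stratum: six patients, two censored (event = 0)
curve = kaplan_meier([2, 3, 3, 5, 8, 10], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 3))
```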
Figure 3. Screening of significant lncRNAs by robust likelihood-based survival and LASSO models. A, Distribution of the standard deviations of all lncRNA probes. The horizontal axis represents the standard deviation, and the vertical axis represents the number of probes. Red bars indicate the standard deviations of lncRNA probes selected more than 100 times. B, Frequency with which each lncRNA probe was selected by the robust likelihood-based survival model over 1000 runs. The horizontal axis represents lncRNA probes, and the vertical axis represents the selection frequency. Red bars indicate probes whose standard deviation exceeds the median standard deviation of all probes. C, Threefold cross-validation for tuning parameter selection in the LASSO model. D, Distribution of each lambda and its CI.
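Panels A and B highlight two criteria. The first is an expression standard deviation above the median across all probes. The second is selection in more than 100 of the 1000 robust likelihood-based survival runs. Assuming the two criteria are applied jointly (the caption does not say so explicitly), the filtering step could be sketched as below. The probe names and counts are hypothetical, and the resampled model fitting that produces the counts is not reproduced here.

```python
import statistics

def filter_probes(expression, selection_counts, min_count=100):
    """Keep probes that pass both highlighted criteria.

    expression       : dict probe_id -> expression values across samples
    selection_counts : dict probe_id -> number of resampling runs (e.g. out of 1000)
                       in which the probe was selected
    A probe is kept if its sample standard deviation exceeds the median standard
    deviation of all probes and it was selected more than `min_count` times.
    """
    sds = {probe: statistics.stdev(values) for probe, values in expression.items()}
    median_sd = statistics.median(list(sds.values()))
    return sorted(
        probe
        for probe, sd in sds.items()
        if sd > median_sd and selection_counts.get(probe, 0) > min_count
    )

# Hypothetical probes and selection counts out of 1000 runs
expression = {
    "probe_a": [1, 2, 3, 4],  # SD about 1.29
    "probe_b": [1, 1, 1, 2],  # SD 0.5
    "probe_c": [0, 4, 0, 4],  # SD about 2.31
    "probe_d": [2, 2, 2, 2],  # SD 0
}
counts = {"probe_a": 350, "probe_b": 500, "probe_c": 40, "probe_d": 900}
print(filter_probes(expression, counts))
```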
Univariate Cox analysis results and annotation of the six lncRNA probes
| Probe ID | P-value | HR | Lower 95% CI | Upper 95% CI | RefSeq symbol | Ensembl symbol |
|---|---|---|---|---|---|---|
| 213447_at | 2.12E‐07 | 4.327 | 2.488 | 7.525 | IPW | — |
| 227909_at | 4.17E‐06 | 4.957 | 2.507 | 9.801 | NCRNA00086 | NCRNA00086 |
| 231925_at | 5.73E‐05 | 4.770 | 2.228 | 10.210 | — | RP11‐38P22.2 |
| 232191_at | 0.0026 | 0.009 | 0.000 | 0.191 | — | ERVH48‐1 |
| 243017_at | 0.0006 | 44.374 | 5.016 | 392.582 | LOC158572 | — |
| 244553_at | 1.33E‐05 | 0.070 | 0.021 | 0.232 | — | AC004080.17 |
Abbreviations: CI: confidence interval; HR: hazard ratio; IPW, lncRNA‐IPW.
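A signature of this kind is usually applied as a linear risk score: each probe's expression is weighted by its model coefficient and the weighted values are summed. Patients are then split into high- and low-risk groups at a cutoff. The fitted coefficients and the cutoff rule are not given in this section. The coefficients below are therefore hypothetical placeholders whose signs only mirror the direction of the univariate HRs, and the median split is an assumption.

```python
import statistics

# HYPOTHETICAL coefficients for illustration only; the signature's fitted
# coefficients are not reported in this section. Signs follow the univariate HRs.
COEFFICIENTS = {
    "213447_at": 0.3,
    "227909_at": 0.2,
    "231925_at": 0.2,
    "232191_at": -0.4,
    "243017_at": 0.5,
    "244553_at": -0.6,
}

def risk_score(expression):
    """Linear risk score: sum of coefficient * expression over the six probes."""
    return sum(coef * expression[probe] for probe, coef in COEFFICIENTS.items())

def stratify(patients):
    """Assign risk groups using the median score as cutoff (assumed rule).

    patients : dict patient_id -> dict probe_id -> expression value
    Returns (cutoff, dict patient_id -> "high" or "low"); scores equal to the
    cutoff are placed in the low-risk group.
    """
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cutoff = statistics.median(list(scores.values()))
    return cutoff, {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}

# Hypothetical patients with uniform expression levels across the six probes
patients = {pid: dict.fromkeys(COEFFICIENTS, level)
            for pid, level in [("p1", 0.0), ("p2", 1.0), ("p3", 2.0)]}
cutoff, groups = stratify(patients)
print(groups)
```

A common convention when scoring validation cohorts is to reuse the training-set coefficients and cutoff rather than re-estimate them.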
Figure 4. LncRNA risk score analysis in the training set GSE62254. A, Distribution of 6-lncRNA-based risk scores, lncRNA expression levels, and patient survival durations in the training set GSE62254 (n = 225). B, Kaplan-Meier curves of OS according to the 6-lncRNA signature. C, ROC curve analysis based on the 6-lncRNA signature.
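Panel C summarises the risk score's discrimination with ROC curves, and the abstract reports an AUC of 0.77 for 5-year survival. A simplified time-dependent AUC at a fixed horizon can be sketched as follows. It is the probability that a patient who died by the horizon has a higher risk score than a patient still under follow-up after it. This version simply drops patients censored before the horizon instead of reweighting them. It can therefore be biased under heavy censoring and should not be assumed to match the estimator behind Figure 4C, which the source does not name. The data in the usage lines are hypothetical.

```python
def time_dependent_auc(times, events, scores, horizon):
    """Cumulative/dynamic AUC at `horizon` without censoring weights (simplified).

    Cases    : patients who died at or before the horizon.
    Controls : patients still under follow-up after the horizon.
    Patients censored at or before the horizon are dropped.
    Ties in score count as half-concordant.
    """
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e == 1]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Hypothetical follow-up (years), death flags, and risk scores
times = [1, 2, 3, 4, 6, 7, 8]
events = [1, 1, 0, 1, 1, 0, 0]
scores = [2.0, 1.5, 0.9, 1.8, 0.4, 0.2, 1.6]
print(time_dependent_auc(times, events, scores, horizon=5))
```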
Figure 5. Association between the 6-lncRNA signature and clinical characteristics. A, Distribution of risk scores across clinical subgroups. B, Nomogram predicting the probabilities of 1-, 3-, and 5-year OS. C, ROC curves for the nomogram and the lncRNA risk score. D, Calibration plots for nomogram-predicted 3- and 5-year OS. Nomogram-predicted survival is plotted on the x-axis, and observed survival is plotted on the y-axis. E, Forest plot of risk scores and clinical characteristics.
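A nomogram built on a multivariable Cox model is a graphical form of the model's prediction S(t | x) = S0(t)^exp(lp). Here S0(t) is the baseline survival at horizon t, and lp is the linear predictor (the risk-score term plus the clinical terms). The sketch below shows that conversion. The baseline survival values and coefficients are hypothetical placeholders, not the fitted model behind panel B. S0(t) must also refer to the same covariate reference as lp (for example, covariates centred at their means).

```python
import math

def predicted_survival(baseline_survival, linear_predictor):
    """Cox-model survival probability at a fixed horizon: S0(t) ** exp(lp)."""
    if not 0.0 < baseline_survival <= 1.0:
        raise ValueError("baseline survival must be in (0, 1]")
    return baseline_survival ** math.exp(linear_predictor)

# HYPOTHETICAL baseline survival at 1, 3 and 5 years, for illustration only
BASELINE = {1: 0.90, 3: 0.70, 5: 0.60}

# HYPOTHETICAL linear predictor: coefficient * risk score + coefficient * stage indicator
lp = 0.8 * 1.2 + 0.5 * 1

for years, s0 in BASELINE.items():
    print(f"{years}-year OS: {predicted_survival(s0, lp):.3f}")
```

Calibration plots such as panel D then typically group patients by predicted probability and compare each group's mean prediction with its observed survival, for example a Kaplan-Meier estimate.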
Figure 6. LncRNA risk score analysis in the internal validation set GSE62254. A, Distribution of 6-lncRNA-based risk scores, lncRNA expression levels, and patient survival durations in the internal validation set GSE62254 (n = 75). B, ROC curve analysis based on the 6-lncRNA signature. C, Kaplan-Meier curves of OS based on the 6-lncRNA signature.
Figure 7. LncRNA risk score analysis in the external validation set GSE57303. A, Distribution of 6-lncRNA-based risk scores, lncRNA expression levels, and patient survival durations in the external validation set GSE57303 (n = 70). B, ROC curve analysis based on the 6-lncRNA signature. C, Kaplan-Meier curves of OS based on the 6-lncRNA signature.
Comparison of lncRNA signatures for GC reported in previous studies
| Databases | Methods | LncRNA signature | LncRNA symbols | AUC value | Reference |
|---|---|---|---|---|---|
| | Random survival forest‐variable hunting | 24‐lncRNAs | AF035291, AI028608, AK026189, H04858, BC037827, BC038210, AI916498, AA463827, AA041523, BE621082, AK056852, AW206234, AL703532, AI095542, AI080288, BC021187, BF238392, BC005107, BC039674, AI056187, T79746, H11436, BF511694, and BC035722 | 0.82 | Zhu et al |
| TCGA | LASSO Cox regression model | 3‐lncRNAs | CYP4A22‐AS1, AP000695.6, and RP11‐108M12.3 | 0.737 | Cheng et al |
| | Univariable Cox regression analysis and random survival forest‐variable hunting | 3‐lncRNAs | LINC01140, TGFB2‐OT1, and RP11‐347C12.10 | 0.688 | Song et al |
| TCGA | Limma, univariate, and multivariate Cox regression models | 5‐lncRNAs | CTD‐2616J11.14, RP1‐90G24.10, RP11‐150O12.3, RP11‐1149O23.2, and MLK7‐AS1 | None | Ren et al |
| | Weighted correlation network and LASSO analysis | 11‐lncRNAs | ARHGAP5‐AS1, FLVCR1‐AS1, H19, HOTAIR, LINC00221, MCF2L‐AS1, MUC2, PRSS30P, SCARNA9, TP53TG1, and XIST | None | Zhang et al |
| | Random survival forest‐variable hunting | 5‐lncRNAs | AK001094, AK024171, AK093735, BC003519, and NR_003573 | 0.95 | Fan et al |