Jun Yu1, Xiaoliu Wu1, Min Lv2, Yuanying Zhang1, Xiaomei Zhang1, Jintian Li1, Ming Zhu1, Jianfeng Huang3, Qin Zhang3.
Abstract
Esophageal squamous cell carcinoma (ESCC) is one of the deadliest cancer types, with a poor prognosis owing to the lack of symptoms in the early stages and delayed diagnosis. The present study aimed to identify risk factors significantly associated with prognosis and to search for novel, effective diagnostic modalities for patients with early-stage ESCC. mRNA and methylation data of patients with ESCC and the corresponding clinical information were downloaded from The Cancer Genome Atlas (TCGA) database, and representation features were screened using a deep learning autoencoder. A univariate Cox regression model was used to select the prognosis-related features from the representation features. K-means clustering was used to cluster the TCGA samples. A support vector machine classifier was constructed based on the top 75 features most strongly associated with the risk subgroups obtained from K-means clustering. Two ArrayExpress datasets were used to verify the reliability of the obtained risk subgroups. The differentially expressed genes (DEGs) and differentially methylated genes (DMGs) between the risk subgroups were analyzed, and pathway enrichment analysis was performed. A total of 500 representation features were produced. Using K-means clustering, the TCGA samples were clustered into two risk subgroups with significantly different overall survival rates. The joint multimodal representation strategy, which showed good model fitness (C-index=0.760), outperformed the early-fusion autoencoder strategy. The joint representation learning-based classification model had good robustness. A total of 1,107 DEGs and 199 DMGs were identified between the two risk subgroups. The DEGs were involved in 70 pathways, the majority of which were correlated with metastasis and proliferation of various cancer types, including cytokine-cytokine receptor interaction, cell adhesion molecules, PPAR signaling pathway, pathways in cancer, transcriptional misregulation in cancer and ECM-receptor interaction pathways. The two survival subgroups obtained via the joint representation learning-based model had good robustness and had prognostic significance for patients with ESCC.
Copyright: © Yu et al.
Keywords: autoencoder strategy; differentially expressed genes; esophageal squamous cell carcinoma; joint representation learning; multi-omics classification prediction
Year: 2020 PMID: 33193847 PMCID: PMC7656101 DOI: 10.3892/ol.2020.12250
Source DB: PubMed Journal: Oncol Lett ISSN: 1792-1074 Impact factor: 2.967
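The clustering step described in the abstract (K-means splitting samples into two risk subgroups) can be illustrated with a minimal pure-Python sketch of Lloyd's algorithm. This is not the authors' code: the function name `kmeans`, the toy two-dimensional feature vectors, and the choice of the first `k` samples as initial centroids are all assumptions made for illustration. The study clustered on prognosis-related autoencoder features, which are not reproduced here.

```python
import math


def kmeans(points, k=2, n_iter=100):
    """Plain Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: attach each sample to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy features (two well-separated groups), not study data.
toy_features = [
    [0.10, 0.20], [2.00, 2.10], [0.20, 0.10],
    [2.20, 1.90], [0.15, 0.25], [1.90, 2.00],
]
subgroup_labels, subgroup_centroids = kmeans(toy_features, k=2)
print(subgroup_labels)
```

In a pipeline like the one summarized above, the resulting labels would then serve as the targets for a supervised classifier such as a support vector machine.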
Clinical characteristics of three datasets.
| Clinical index | TCGA cohort (n=96) | E-GEOD-53624 (n=119) | E-GEOD-53625 (n=179) |
|---|---|---|---|
| Age, mean ± SD | 58.29±10.24 | 59.03±8.93 | 59.35±9.03 |
| Sex, female/male | 14/82 | 21/98 | 33/146 |
| OS, years, mean ± SD | 1.25±0.96 | 3.09±2.02 | 3.02±1.91 |
| OS status, alive/dead | 63/33 | 46/73 | 73/106 |
| DFS, years, mean ± SD | 1.11±1.02 | – | – |
| DFS status, 0/1/- | 59/32/5 | – | – |
| Stage, I/II/III/IV/- | 7/55/27/4/3 | 8/44/67/0/0 | 10/77/92/0/0 |
| Pathological T, T1/T2/T3/T4/- | 8/31/50/4/3 | 8/20/62/29 | 12/27/110/30 |
| Pathological N, N0/N1/N2/N3/- | 54/29/6/3/4 | 54/42/13/10/0 | 83/62/22/12/0 |
| Pathological M, M0/M1/- | 83/4/9 | – | – |
| Eastern Cooperative Oncology Group, 0/1/2/3/- | 3/28/5/3/57 | – | – |
TCGA, The Cancer Genome Atlas; OS, overall survival; SD, standard deviation; DFS, disease-free survival; T, tumor; N, node; M, metastasis.
Figure 1. Analysis flow chart. (A) Early-fusion autoencoder strategy, (B) joint multimodal representation strategy.
Figure 2. Kaplan-Meier diagrams of the risk subgroups obtained using different strategies. (A) Kaplan-Meier graphs of the risk subgroups obtained using the joint multimodal representation strategy, (B) Kaplan-Meier graphs of the risk subgroups obtained using the early-fusion autoencoder strategy.
Univariate and multivariate Cox regression analysis of clinical factors in two risk subgroups.
| | Univariate | | | | Multivariate | | | |
|---|---|---|---|---|---|---|---|---|
| Clinical features | HR | 95% CI | Z | P-value | HR | 95% CI | Z | P-value |
| Group | 8.40×10⁻⁴ | | | | | | | |
| G1 | 1.000 | – | – | – | 1.000 | – | – | – |
| G2 | 3.465 | 1.618–7.421 | 3.198 | 1.38×10⁻³ | 2.469 | 1.061–5.747 | 2.097 | 0.036 |
| Pathological N | 6.03×10⁻³ | | | | | | | |
| N0 | 0.217 | 0.091–0.519 | −3.434 | 5.95×10⁻⁴ | 0.682 | 0.200–2.326 | −0.611 | 0.541 |
| N1 | 0.339 | 0.135–0.851 | −2.304 | 2.12×10⁻² | 0.549 | 0.183–1.652 | −1.066 | 0.286 |
| Stage | 1.07×10⁻³ | | | | | | | |
| I+II | 0.067 | 0.018–0.248 | −4.052 | 5.08×10⁻⁵ | 0.171 | 0.028–1.034 | −1.923 | 0.055 |
| III+IV | 0.160 | 0.043–0.593 | −2.745 | 6.00×10⁻³ | 0.221 | 0.042–1.163 | −1.782 | 0.075 |
| Sex | 4.53×10⁻³ | | | | | | | |
| Female | 1.000 | – | – | – | 1.000 | – | – | – |
| Male | 5.365 | 1.246–23.094 | 2.256 | 2.41×10⁻² | 4.704 | 0.991–22.322 | 1.949 | 0.051 |
| Additional pharmaceutical therapy | 4.25×10⁻² | | | | | | | |
| No | 1.017 | 0.348–2.977 | 0.031 | 0.975 | 0.471 | 0–Inf | 0.000 | 1.000 |
| Yes | 0.000 | 0.000–Inf | −0.004 | 0.997 | 0.000 | 0–Inf | −0.001 | 0.999 |
| Additional radiation therapy | 3.16×10⁻² | | | | | | | |
| No | 0.979 | 0.340–2.822 | −0.038 | 0.969 | 2.805 | 0–Inf | 0.000 | 1.000 |
| Yes | 0.000 | 0.000–Inf | −0.004 | 0.997 | 0.000 | 0–Inf | −0.001 | 1.000 |
HR, hazard ratio; CI, confidence interval; N, node.
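The HR, 95% CI, Z and P-value columns in a Cox regression table are linked through the Wald test on the log hazard ratio: the standard error is the width of the log-scale CI divided by 2 × 1.96, Z is ln(HR) divided by that standard error, and the two-sided P-value follows from the standard normal distribution. The short sketch below recomputes Z and P for the univariate G2 row from its reported HR and CI. The helper name `wald_from_ci` is hypothetical, and the calculation assumes the reported interval is a symmetric Wald interval on the log scale, which the record does not state explicitly.

```python
import math


def wald_from_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-HR standard error, Wald Z and two-sided P from HR and 95% CI."""
    # Width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Univariate G2 row: HR 3.465, 95% CI 1.618 to 7.421.
se, z, p = wald_from_ci(3.465, 1.618, 7.421)
print(f"SE={se:.3f}, Z={z:.3f}, P={p:.2e}")
```

Under these assumptions the recomputed values land close to the tabulated Z of 3.198 and P of about 0.0014, allowing for rounding in the reported HR and CI.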
C-index and Brier score of the SVM classifier for robustness evaluation of the risk subgroups using a cross-validation (CV) procedure.
| Dataset | 10-fold CV | C-index | Brier score |
|---|---|---|---|
| Training (60%) | JMR | 0.77±0.04 | 0.13±0.03 |
| | Methylation only | 0.72±0.10 | 0.14±0.03 |
| | RNA only | 0.74±0.05 | 0.13±0.03 |
| Test (40%) | JMR | 0.75±0.06 | 0.14±0.04 |
| | Methylation only | 0.65±0.17 | 0.16±0.05 |
| | RNA only | 0.73±0.11 | 0.14±0.04 |
JMR, joint multimodal representation.
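For readers unfamiliar with the two metrics in this table, the sketch below shows one common way to compute them on toy data. It is an illustration under stated assumptions, not the evaluation code used in the study: `harrell_c_index` implements Harrell's concordance index while ignoring tied event times, and `brier_score` is the simple uncensored mean squared error between predicted probabilities and binary outcomes (survival analyses often use a time-dependent, censoring-weighted variant instead). All function names and data values are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher-risk subject fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable


def brier_score(predicted_probs, outcomes):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted_probs, outcomes)) / len(outcomes)


# Hypothetical toy data: survival time (years), event flag (1 = death), risk score.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
risk = [0.9, 0.3, 0.2, 0.4]

c_index = harrell_c_index(times, events, risk)
brier = brier_score(risk, events)
print(c_index, brier)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, whereas a lower Brier score indicates better-calibrated predictions.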
Figure 3. Verification of the classification model in the two independent validation sets. (A) Kaplan-Meier graphs of the risk subgroups obtained in E-GEOD-53624, (B) Kaplan-Meier graphs of the risk subgroups obtained in E-GEOD-53625.
Figure 4. Risk subgroups of all esophageal cancer samples using the model based on the joint multimodal representation strategy. (A) Kaplan-Meier graphs of the risk subgroups of all esophageal cancer samples obtained using the joint multimodal representation strategy, (B) Kaplan-Meier graphs of the risk subgroups obtained in E-GEOD-53624, (C) Kaplan-Meier graphs of the risk subgroups obtained in E-GEOD-53625.
Figure 5. Heatmap of (A) the top 10 differentially expressed mRNAs and (B) the top 10 differentially methylated genes between risk subgroups G1 and G2.
Figure 6. KEGG pathway enrichment analysis of significantly upregulated and downregulated genes. (A) Top 10 pathways of significantly upregulated genes, and (B) top 10 pathways of significantly downregulated genes.