Pengliang Chen, Pengwei Shi, Gang Du, Zhen Zhang, Liang Liu.
Abstract
Predicting the outcome after a cancer diagnosis is critical. Advances in high-throughput sequencing technologies provide physicians with vast amounts of data, yet prognostication remains challenging because the data are high-dimensional and complex. We evaluated Wnt/β-catenin, carbohydrate metabolism, and PI3K-Akt signaling pathway-related genes as predictive features for classifying tumor and normal samples. Using differentially expressed genes as controls, we assessed the accuracy of these pathway-related genes with support-vector machines and three other recommended machine learning models: the random forest, decision tree, and k-nearest neighbor algorithms. The support-vector machine and random forest models outperformed the others. All candidate pathway-related genes yielded areas under the curve exceeding 95.00% for cancer outcomes, and they were most accurate in predicting colorectal cancer. These results suggest that these pathway-related genes are useful and accurate biomarkers for understanding the mechanisms behind cancer development.
Year: 2019 PMID: 31781361 PMCID: PMC6855054 DOI: 10.1155/2019/9724589
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
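The evaluation described in the abstract (score each sample with a classifier, then summarize tumor-versus-normal discrimination as an area under the curve) can be illustrated with a minimal sketch. This is not the authors' pipeline: the two-gene expression values, the labels, and the helper names `knn_scores` and `auc` are all hypothetical, and only the k-nearest neighbor model is shown, in plain Python.

```python
import math

def knn_scores(train_x, train_y, test_x, k=3):
    """Score each test sample as the fraction of tumor labels (1)
    among its k nearest training samples (Euclidean distance)."""
    scores = []
    for x in test_x:
        nearest = sorted(range(len(train_x)),
                         key=lambda i: math.dist(x, train_x[i]))[:k]
        scores.append(sum(train_y[i] for i in nearest) / k)
    return scores

def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    random tumor sample scores higher than a random normal sample
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical expression values for two genes; 0 = normal, 1 = tumor.
train_x = [[1.0, 1.2], [1.1, 0.9], [0.9, 1.0],
           [3.0, 2.8], [2.9, 3.2], [3.1, 3.0], [2.7, 2.9]]
train_y = [0, 0, 0, 1, 1, 1, 1]
test_x = [[1.0, 1.0], [3.0, 3.0], [2.8, 3.1], [1.2, 1.1]]
test_y = [0, 1, 1, 0]

scores = knn_scores(train_x, train_y, test_x, k=3)
print(scores, auc(test_y, scores))
```

Because the toy classes are cleanly separated, every tumor sample outscores every normal sample and the AUC is 1.0. Real expression data with hundreds of genes per pathway would normally be scaled and cross-validated before such a comparison.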
Clinical features of patients in The Cancer Genome Atlas (TCGA) dataset.
| Clinical factor | COAD | BRCA | STAD | PRAD |
|---|---|---|---|---|
| Patient count (selected/original) | 387/410 | 1089/1109 | 375/375 | 493/498 |
| Age (years, mean ± SD) | 65.73 ± 13.06 | 58.46 ± 13.20 | 65.83 ± 10.65 | 65.83 ± 10.65 |
| Sex (male/female/−) | 201/186 | 12/1077 | 241/134 | 493/0 |
| Death (dead/alive/−) | 82/304/1 | 152/937 | 150/225 | 10/483 |
| Overall survival (months, mean ± SD) | 28.46 ± 26.27 | 40.96 ± 30.17 | 19.32 ± 18.08 | 35.76 ± 25.89 |
Note. Selected patients are those with clinical characteristics who remained after normal samples, replicates, and samples with missing features were removed from the total sample used in the model.
Elements of pathway-related genes in candidate cancers.
| Datasets | Wnt (typical) | Ca-Me (typical) | PI3K (typical) | TLR (nontypical) | TH (nontypical) | RPSC (nontypical) | DEGs |
|---|---|---|---|---|---|---|---|
| COAD (total/selected) | 143/142 | 356/356 | 351/350 | 104/102 | 116/116 | 139/139 | 314/314 |
| BRCA (total/selected) | 143/142 | 356/356 | 351/350 | 104/102 | 116/116 | 139/139 | 241/241 |
| STAD (total/selected) | 143/142 | 356/356 | 351/350 | 104/102 | 116/116 | 139/139 | 133/133 |
| PRAD (total/selected) | 143/142 | 356/356 | 351/350 | 104/102 | 116/116 | 139/139 | 169/169 |
Note. COAD, colorectal cancer; BRCA, breast cancer; STAD, gastric cancer; PRAD, prostate cancer; Wnt, Wnt signaling pathway; Ca-Me, carbohydrate metabolism signaling pathway; PI3K, PI3K-Akt signaling pathway; TLR, toll-like receptor signaling pathway; TH, thyroid hormone signaling pathway; RPSC, signaling pathways regulating pluripotency of stem cells; DEGs, differentially expressed genes.
Figure 1. Average areas under the curve (AUCs) for Wnt signaling pathway-related genes and differentially expressed genes (DEGs) using four machine learning algorithms to predict colorectal cancer from gene expression data. For the pathway genes, support-vector machine (SVM) yields an AUC of 99.49%, decision tree (DT) yields 89.45%, random forest (RF) yields 99.49%, and k-nearest neighbor (KNN) yields 99.42%. For DEGs, SVM yields 99.49%, DT, 99.49%, RF, 96.18%, and KNN, 97.85%.
Performances of pathway-related genes and DEGs in the training set (values in %; normal/tumor samples: COAD, 21/287; BRCA, 78/777; STAD, 21/263; PRAD, 36/349).
| Genes | COAD SVM | COAD RF | COAD DT | COAD KNN | BRCA SVM | BRCA RF | BRCA DT | BRCA KNN | STAD SVM | STAD RF | STAD DT | STAD KNN | PRAD SVM | PRAD RF | PRAD DT | PRAD KNN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wnt | 100.00 | 99.73 | 91.97 | 99.82 | 99.85 | 99.16 | 86.78 | 99.50 | 96.30 | 99.14 | 79.55 | 93.76 | 94.97 | 90.73 | 69.88 | 89.58 |
| Ca-Me | 100.00 | 99.82 | 97.15 | 100.00 | 99.78 | 98.10 | 93.21 | 99.10 | 99.42 | 95.30 | 74.56 | 95.31 | 97.15 | 91.89 | 77.56 | 89.18 |
| PI3K | 100.00 | 100.00 | 96.80 | 100.00 | 99.88 | 99.04 | 89.92 | 99.64 | 99.42 | 98.19 | 81.96 | 97.91 | 95.07 | 94.81 | 71.84 | 90.97 |
| TLR | 100.00 | 99.27 | 84.45 | 100.00 | 99.63 | 98.93 | 86.50 | 98.59 | 96.95 | 91.94 | 81.39 | 95.37 | 94.08 | 89.10 | 65.20 | 86.79 |
| TH | 100.00 | 99.91 | 90.54 | 100.00 | 99.81 | 98.21 | 88.77 | 98.78 | 99.23 | 94.79 | 86.93 | 97.51 | 94.85 | 92.20 | 69.92 | 90.62 |
| RPSC | 100.00 | 99.93 | 97.15 | 99.64 | 99.84 | 99.22 | 90.48 | 99.01 | 98.41 | 99.23 | 81.48 | 97.20 | 93.63 | 94.78 | 78.34 | 89.11 |
| DEGs | 100.00 | 99.96 | 96.95 | 100.00 | 99.85 | 99.89 | 96.94 | 98.68 | 99.32 | 99.66 | 88.81 | 97.61 | 96.48 | 95.53 | 83.60 | 94.33 |
Performances of pathway-related genes and DEGs in the test set (values in %; normal/tumor samples: COAD, 9/123; BRCA, 35/332; STAD, 11/112; PRAD, 16/149).
| Genes | COAD SVM | COAD RF | COAD DT | COAD KNN | BRCA SVM | BRCA RF | BRCA DT | BRCA KNN | STAD SVM | STAD RF | STAD DT | STAD KNN | PRAD SVM | PRAD RF | PRAD DT | PRAD KNN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wnt | 100.00 | 99.90 | 94.03 | 99.86 | 99.95 | 97.64 | 88.79 | 100.00 | 99.26 | 98.70 | 79.13 | 99.75 | 95.76 | 95.55 | 76.65 | 95.26 |
| Ca-Me | 100.00 | 100.00 | 99.18 | 100.00 | 99.96 | 99.50 | 88.42 | 100.00 | 99.10 | 97.88 | 75.04 | 97.44 | 94.75 | 96.95 | 79.67 | 95.11 |
| PI3K | 100.00 | 100.00 | 100.00 | 99.95 | 99.92 | 99.43 | 98.41 | 99.66 | 99.35 | 98.53 | 84.57 | 97.44 | 96.56 | 91.61 | 86.01 | 95.91 |
| TLR | 100.00 | 97.01 | 88.48 | 100.00 | 99.87 | 99.35 | 90.17 | 99.44 | 98.62 | 98.01 | 86.36 | 95.37 | 93.20 | 92.05 | 78.56 | 86.79 |
| TH | 100.00 | 99.72 | 99.18 | 99.86 | 99.93 | 98.86 | 83.38 | 99.57 | 99.43 | 98.45 | 80.92 | 97.93 | 96.60 | 89.53 | 64.91 | 91.42 |
| RPSC | 100.00 | 99.86 | 94.03 | 99.95 | 99.93 | 99.93 | 94.13 | 99.84 | 99.26 | 98.25 | 83.68 | 98.74 | 94.67 | 93.12 | 79.88 | 95.05 |
| DEGs | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 99.96 | 99.03 | 99.29 | 99.35 | 99.10 | 88.67 | 96.02 | 93.37 | 92.76 | 85.48 | 91.23 |
Note. COAD, colorectal cancer; BRCA, breast cancer; STAD, gastric cancer; PRAD, prostate cancer; Wnt, Wnt signaling pathway; Ca-Me, carbohydrate metabolism signaling pathway; PI3K, PI3K-Akt signaling pathway; TLR, toll-like receptor signaling pathway; TH, thyroid hormone signaling pathway; RPSC, signaling pathways regulating pluripotency of stem cells.
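The normal/tumor counts reported for the training and test sets allow a quick check of how the samples were partitioned. The sketch below only does arithmetic on those reported counts. The record does not state the splitting procedure, so the roughly 70/30 ratio it shows is an observation, not a documented design choice, and the helper name `train_fraction` is an assumption made for illustration.

```python
# (normal, tumor) sample counts copied from the training- and test-set tables.
train = {"COAD": (21, 287), "BRCA": (78, 777), "STAD": (21, 263), "PRAD": (36, 349)}
test = {"COAD": (9, 123), "BRCA": (35, 332), "STAD": (11, 112), "PRAD": (16, 149)}

def train_fraction(cancer):
    """Return the share of normal and of tumor samples assigned to training."""
    return tuple(tr / (tr + te) for tr, te in zip(train[cancer], test[cancer]))

fractions = {cancer: train_fraction(cancer) for cancer in train}
for cancer, (normal, tumor) in fractions.items():
    print(f"{cancer}: normal {normal:.3f}, tumor {tumor:.3f}")
```

In every dataset about 70% of the tumor samples fall in the training set. The normal-sample share varies more (about 66% to 70%) because so few normal samples are available, which is also why a class-balance-insensitive metric such as AUC is a reasonable choice here.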
Figure 2. Performance of the Wnt signaling pathway-related genes in three types of cancer (colorectal, breast, and gastric cancer) using four machine learning algorithms.
Figure 3. Receiver operating characteristic curves for the Wnt signaling pathway-, PI3K-Akt signaling pathway-, and carbohydrate metabolism signaling pathway-related genes for the three datasets.
Performance of candidate pathway-related genes in cancer prediction.
| Classifiers | Wnt, COAD (%) | Wnt, BRCA (%) | Wnt, STAD (%) | Carbohydrate metabolism, COAD (%) | Carbohydrate metabolism, BRCA (%) | Carbohydrate metabolism, STAD (%) | PI3K-Akt, COAD (%) | PI3K-Akt, BRCA (%) | PI3K-Akt, STAD (%) |
|---|---|---|---|---|---|---|---|---|---|
| SVM | 99.49 | 99.49 | 98.84 | 99.49 | 99.39 | 99.13 | 99.49 | 99.48 | 99.43 |
| RF | 99.49 | 99.26 | 98.04 | 99.26 | 99.16 | 96.88 | 98.79 | 99.41 | 96.78 |
| DT | 89.45 | 92.56 | 78.51 | 99.39 | 91.24 | 74.65 | 94.51 | 93.34 | 76.85 |
| KNN | 99.42 | 98.85 | 93.54 | 97.83 | 99.44 | 93.27 | 97.81 | 99.45 | 94.71 |
Note. Tumor samples form the positive group and normal samples the negative group. COAD, colorectal cancer; BRCA, breast cancer; STAD, gastric cancer.
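Two claims in the abstract, that the support-vector machine and random forest models lead the comparison and that colorectal cancer is predicted most accurately, can be checked against the table above by averaging its entries. The sketch below is a simple unweighted mean over the reported AUC values. The authors may have summarized the results differently, and the variable names are illustrative.

```python
from statistics import mean

cancers = ("COAD", "BRCA", "STAD")

# AUC values (%) copied from the table; each tuple is (COAD, BRCA, STAD).
auc_table = {
    "SVM": {"Wnt": (99.49, 99.49, 98.84), "Ca-Me": (99.49, 99.39, 99.13), "PI3K": (99.49, 99.48, 99.43)},
    "RF":  {"Wnt": (99.49, 99.26, 98.04), "Ca-Me": (99.26, 99.16, 96.88), "PI3K": (98.79, 99.41, 96.78)},
    "DT":  {"Wnt": (89.45, 92.56, 78.51), "Ca-Me": (99.39, 91.24, 74.65), "PI3K": (94.51, 93.34, 76.85)},
    "KNN": {"Wnt": (99.42, 98.85, 93.54), "Ca-Me": (97.83, 99.44, 93.27), "PI3K": (97.81, 99.45, 94.71)},
}

# Mean AUC per classifier over all nine pathway/cancer combinations.
by_classifier = {
    clf: mean(value for row in rows.values() for value in row)
    for clf, rows in auc_table.items()
}

# Mean AUC per cancer type over all twelve classifier/pathway combinations.
by_cancer = {
    cancer: mean(rows[pathway][i] for rows in auc_table.values() for pathway in rows)
    for i, cancer in enumerate(cancers)
}

classifier_ranking = sorted(by_classifier, key=by_classifier.get, reverse=True)
cancer_ranking = sorted(by_cancer, key=by_cancer.get, reverse=True)
print(classifier_ranking, cancer_ranking)
```

On these values the classifiers rank SVM, RF, KNN, DT and the cancer types rank COAD, BRCA, STAD, which is consistent with both statements in the abstract.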
Figure 4. 3D bar plots of the three candidate features in various types of cancers. The z-axis indicates percent area under the curve. (a) COAD, (b) BRCA, and (c) STAD.