Wnt/β-Catenin, Carbohydrate Metabolism, and PI3K-Akt Signaling Pathway-Related Genes as Potential Cancer Predictors.

Pengliang Chen, Pengwei Shi, Gang Du, Zhen Zhang, Liang Liu.

Abstract

Predicting the outcome after a cancer diagnosis is critical. Advances in high-throughput sequencing technologies provide physicians with vast amounts of data, yet prognostication remains challenging because the data are high-dimensional and complex. We evaluated Wnt/β-catenin, carbohydrate metabolism, and PI3K-Akt signaling pathway-related genes as predictive features for classifying tumor and normal samples. Using differentially expressed genes as controls, these pathway-related genes were assessed for accuracy using support-vector machines and three other recommended machine learning models, namely, the random forest, decision tree, and k-nearest neighbor algorithms. The support-vector machine and random forest models outperformed the other two. All candidate pathway-related genes yielded areas under the curve exceeding 95.00% for cancer outcomes, and they were most accurate in predicting colorectal cancer. These results suggest that these pathway-related genes are useful and accurate biomarkers for understanding the mechanisms behind cancer development.
Copyright © 2019 Pengliang Chen et al.

Year:  2019        PMID: 31781361      PMCID: PMC6855054          DOI: 10.1155/2019/9724589

Source DB:  PubMed          Journal:  J Healthc Eng        ISSN: 2040-2295            Impact factor:   2.682


1. Introduction

Cancer is associated with high mortality and is a serious threat to public health. One cause of the high mortality rate is that symptoms in the early stages are nonspecific, which delays diagnosis and results in a poor prognosis. Thus, accurately predicting cancer is a critical and urgent task for physicians. Because cancer is fundamentally caused by gene malfunction, using gene expression levels as a relatively direct means of diagnosis has attracted a great deal of research attention. To date, analyses of gene expression data have greatly benefited cancer diagnosis and treatment [1-3]. However, the high dimensionality and noise of the data can make these analyses and applications challenging. To reduce these challenges, data are initially processed to identify a small subset of genes primarily responsible for the disease [4, 5]. Feature selection is reportedly a very effective method for reducing the high dimensionality of gene expression datasets [6].

Cancer biology research is rapidly uncovering the recurring roles of a small set of signaling cascades, including the Wnt cascade, metabolism, and the PI3K/AKT signaling pathway. The Wnt signaling pathway is prevalent in carcinogenesis, playing an essential role in the development of various tumors [7, 8]. Indeed, current evidence suggests that up to 80% of colorectal cancers are driven by an activating mutation in the Wnt cascade [9]. Altered energy metabolism is believed to be a hallmark of cancer [10, 11]. Even in the presence of oxygen, cancer cells can reprogram their glucose metabolism to produce energy, largely limiting energy metabolism to glycolysis [12]. In addition, glycolysis provides cancer cells with various metabolic precursors that promote the synthesis of amino acids, nucleotides, and lipids, leading to cancer development. The PI3K-Akt signaling pathway is frequently activated in a variety of cancer lineages [13-15]. A range of malignancies, including ovarian, breast, colorectal, and endometrial cancers, frequently exhibit activation of the PI3K pathway through various mechanisms, including genomic mutations or alterations involving PIK3CA, PIK3R1, PTEN, AKT, TSC1, TSC2, LKB1 (also known as STK11), MTOR, and other oncogenes or tumor suppressor genes [16, 17]. This pathway regulates key biological processes, including proliferation, the cell cycle, motility, metabolism, and genomic instability, all of which support the survival, expansion, and dissemination of cancer [18].

In conjunction with the rapidly increasing amount of gene expression data, state-of-the-art data analysis tools are being developed. Among them, machine learning (ML) methods such as random forest (RF), support-vector machine (SVM), decision tree (DT), and k-nearest neighbor (KNN) have been successfully applied to various areas of genomics research [19, 20]. Applications include inferring the expression profiles of genes [21], predicting the functional activity of genomic sequences [22], and predicting the intrinsic molecular subtypes of breast cancer [23]. Notably, RF can handle high-dimensional data as well as data that are unbalanced or contain missing values [24]. An SVM is an ML algorithm that separates entities into appropriate classes using a hyperplane [25]. In cancer research, it has been used successfully to classify people as those with and without cancer based on microarray expression data [26]. These methods were used in this study to predict the cancer state from gene expression data from various types of cancer. Given the significant roles of these pathways in cancer, pathway-related genes were used as alternative features.

2. Materials and Methods

2.1. Data Acquisition

Genetic data were downloaded from The Cancer Genome Atlas (TCGA), a publicly accessible dataset (https://cancergenome.nih.gov/). The microarray expression data included colorectal cancer (440 samples, 410 tumorous), breast cancer (1222 samples, 1109 tumorous), and gastric cancer (407 samples, 375 tumorous). Detailed information about the data is shown in Table 1, and the number of pathway-related genes in the candidate cancers is shown in Table 2.
Table 1

Clinical features of patients in The Cancer Genome Atlas (TCGA) dataset.

| Clinical factor | COAD (n = 440) | BRCA (n = 1222) | STAD (n = 407) | PRAD (n = 550) |
|---|---|---|---|---|
| Patient count (selected/original) | 387/410 | 1089/1109 | 375/375 | 493/498 |
| Age (years, mean ± SD) | 65.73 ± 13.06 | 58.46 ± 13.20 | 65.83 ± 10.65 | 65.83 ± 10.65 |
| Sex (male/female/−) | 201/186 | 12/1077 | 241/134 | 493/0 |
| Death (dead/alive/−) | 82/304/1 | 152/937 | 150/225 | 10/483 |
| Overall survival (months, mean ± SD) | 28.46 ± 26.27 | 40.96 ± 30.17 | 19.32 ± 18.08 | 35.76 ± 25.89 |

Note. Selected patients included those with clinical characteristics after removing normal, replicate, and missing features from the total sample used in the model.

Table 2

Elements of pathway-related genes in candidate cancers. Typical pathways: Wnt, Ca-Me, and PI3K. Nontypical pathways: TLR, TH, and RPSC.

| Dataset (total/selected) | Wnt | Ca-Me | PI3K | TLR | TH | RPSC | DEGs |
|---|---|---|---|---|---|---|---|
| COAD | 143/142 | 356/356 | 351/350 | 104/102 | 116/116 | 139/139 | 314/314 |
| BRCA | 143/142 | 356/356 | 351/350 | 104/102 | 116/116 | 139/139 | 241/241 |
| STAD | 143/142 | 356/356 | 351/350 | 104/102 | 116/116 | 139/139 | 133/133 |
| PRAD | 143/142 | 356/356 | 351/350 | 104/102 | 116/116 | 139/139 | 169/169 |

Note. COAD, colorectal cancer; BRCA, breast cancer; STAD, gastric cancer; PRAD, prostate cancer; Wnt, Wnt signaling pathway; Ca-Me, carbohydrate metabolism signaling pathway; PI3K, PI3K-Akt signaling pathway; TLR, toll-like receptor signaling pathway; TH, thyroid hormone signaling pathway; RPSC, signaling pathways regulating pluripotency of stem cells; DEGs, differentially expressed genes.

2.2. Data Preprocessing

Data preprocessing is a crucial step in ML, and errors at this stage can lead to misleading prediction results. This study included the following preprocessing steps. Data were normalized by first applying a log2 transformation and then, for each probe, calculating the median of the log-transformed values across all samples and subtracting it from each sample's value. Missing values were replaced with the attribute mean.
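The three preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function name `preprocess`, the use of `None` for missing values, and the `pseudocount` added before the logarithm (to avoid log2 of zero) are all assumptions that the text does not specify.

```python
import math
from statistics import mean, median

def preprocess(matrix, pseudocount=1.0):
    """matrix[i][j] is the raw expression of probe j in sample i.

    None marks a missing value. Each probe is assumed to have at least
    one observed value. The pseudocount is an assumption, not from the paper.
    """
    n_probes = len(matrix[0])
    # Step 1: log2-transform every observed value.
    logged = [[None if v is None else math.log2(v + pseudocount) for v in row]
              for row in matrix]
    for j in range(n_probes):
        observed = [row[j] for row in logged if row[j] is not None]
        # Step 2: subtract the per-probe median computed across samples.
        med = median(observed)
        # Step 3: fill missing entries with the probe (attribute) mean
        # of the centered values.
        fill = mean(v - med for v in observed)
        for row in logged:
            row[j] = fill if row[j] is None else row[j] - med
    return logged

# Toy input: 3 samples x 2 probes, one missing value.
toy = [[1.0, 3.0], [3.0, None], [7.0, 15.0]]
print(preprocess(toy))
```

Whether imputation happens before or after centering is not stated in the text; the sketch imputes after centering, which keeps the filled value consistent with the centered scale.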

2.3. Feature Selection

In clinical datasets, the number of samples is small relative to the number of features, which creates a high risk of overfitting, degrades classification performance, and thus significantly affects prediction accuracy. Effective feature selection is one method used to address this challenge [27]. Considering the importance of pathways in tumorigenesis, three sets of pathway-related genes were selected as candidate features: those of the Wnt/β-catenin, carbohydrate metabolism, and PI3K-Akt signaling pathways. In parallel, significantly differentially expressed genes (DEGs) were used as controls for comparing the features used for cancer classification. DEGs have been previously employed in cancer prediction studies, and the findings support their use as valid features. The DESeq R package [28] was used to identify DEGs. Our criteria were a P value of less than 0.001 and a log2 fold change of 4 or more. The pathway-related genes were derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG; http://www.kegg.jp/).
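The DEG criteria amount to a two-threshold filter over a differential expression results table. The sketch below assumes the results have already been exported as `(gene, p_value, log2_fold_change)` tuples; the function name, the tuple layout, and the example rows are hypothetical. The text does not say whether the fold-change cutoff applies to the absolute value, so the literal reading (up-regulation only) is the default and a `two_sided` flag covers the other reading.

```python
def select_degs(results, p_cutoff=0.001, lfc_cutoff=4.0, two_sided=False):
    """Return genes with P < p_cutoff and log2 fold change >= lfc_cutoff.

    results: iterable of (gene, p_value, log2_fold_change) tuples.
    two_sided=True compares |log2 fold change| instead (an assumption;
    the source only says "a log 2 fold change of 4 or more").
    """
    selected = []
    for gene, p_value, lfc in results:
        effect = abs(lfc) if two_sided else lfc
        if p_value < p_cutoff and effect >= lfc_cutoff:
            selected.append(gene)
    return selected

# Hypothetical rows for illustration only; not values from the study.
example = [
    ("GENE_A", 1e-6, 5.2),   # passes both thresholds
    ("GENE_B", 1e-6, 2.0),   # fold change too small
    ("GENE_C", 0.02, 6.1),   # P value too large
    ("GENE_D", 1e-5, -4.5),  # strongly down-regulated
]
print(select_degs(example))
print(select_degs(example, two_sided=True))
```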

2.4. Conventional Machine Learning Algorithms

Four widely used classification methods (SVM, RF, DT, and KNN) were adopted. In the SVM method, the parameter C was set to 0.1, 1, 10, or 100, and the kernel function was “linear,” “rbf,” “poly,” or “sigmoid.” In the KNN method, the number of neighbors was set to 3, 5, or 7, and each setting was combined with the Euclidean, Manhattan, or Minkowski distance to train the model. In the DT algorithm, CART was used, and the maximum tree depth was 5 or 10. In the RF model, the number of DTs was 5, 10, or 50, and the number of features was 2, 4, 10, or 20.
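To make the size of the search space concrete, the hyperparameter settings above can be written as grids and expanded into individual configurations. The parameter names below follow scikit-learn conventions, which is an assumption: the text does not name the library used. The helper `expand` is hypothetical.

```python
from itertools import product

# Values are taken from the text; key names are assumed (scikit-learn style).
GRIDS = {
    "SVM": {"C": [0.1, 1, 10, 100],
            "kernel": ["linear", "rbf", "poly", "sigmoid"]},
    "KNN": {"n_neighbors": [3, 5, 7],
            "metric": ["euclidean", "manhattan", "minkowski"]},
    "DT": {"max_depth": [5, 10]},
    "RF": {"n_estimators": [5, 10, 50],
           "max_features": [2, 4, 10, 20]},
}

def expand(grid):
    """List every combination of the values in one parameter grid."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]

for name, grid in GRIDS.items():
    print(name, len(expand(grid)))
```

Dictionaries of this shape (parameter name mapped to a list of candidate values) are also what a grid-search utility such as scikit-learn's `GridSearchCV` accepts as `param_grid`, should that library be used. One caveat under that assumption: in scikit-learn the Minkowski metric with its default exponent of 2 coincides with the Euclidean distance, so those two KNN settings would give identical models unless the exponent is changed.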

3. Results

3.1. General Classification Workflow

Pathway gene lists were extracted from the Kyoto Encyclopedia of Genes and Genomes database. Specifically, 142, 356, and 350 elements (pathway-related genes) were found for the Wnt, carbohydrate metabolism, and PI3K-Akt signaling pathways, respectively. In addition, 314, 241, and 133 DEGs were included for colorectal, breast, and gastric cancer, respectively. To evaluate the cancer predictive ability of these pathway-related genes, the workflow shown in Figure 1 was implemented. Before supervised training, the model was pretrained on all data, without labels, using an autoencoder. This step was designed to improve model performance, avoid random initialization of the weights, and select the candidate model architecture associated with the minimum mean square error.
Figure 1

Average areas under the curve (AUCs) for Wnt signal pathway-related genes and differentially expressed genes (DEGs) using four machine learning algorithms to predict colorectal cancer from gene expression data. For the pathway genes, support-vector machine (SVM) yields an AUC of 99.49%, decision tree (DT) yields 89.45%, random forest (RF) yields 99.49%, and k-nearest neighbor (KNN) yields 99.42%. For DEGs, SVM yields 99.49%, DT, 99.49%, RF, 96.18%, and KNN, 97.85%.

3.2. Wnt Pathway-Related Genes Score as High as DEGs in Predicting Colorectal Cancer

Detailed information about the samples and pathway-related genes is shown in Tables 1 and 2. The prediction performances of the entire set of Wnt pathway-related genes and of the DEGs were evaluated using three common metrics: precision, recall, and accuracy. Results are shown in Tables 3 and 4. Scores using Wnt pathway-related genes are comparable to those found using DEGs, achieving approximately 95% accuracy for classifying colorectal cancer regardless of the ML method used (Figure 2).
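For reference, the three metrics can be computed from predicted and true labels as in the sketch below, with tumor treated as the positive class. The function name is hypothetical. The usage example takes the class counts of the COAD test set reported in Table 4 (9 normal, 123 tumor) and scores a trivial classifier that labels every sample as tumor, which gives a reference point for reading accuracies on an unbalanced test set.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy

# COAD test set class counts from Table 4: 9 normal (0), 123 tumor (1).
y_true = [0] * 9 + [1] * 123
always_tumor = [1] * len(y_true)  # trivial baseline, for illustration only
precision, recall, accuracy = classification_metrics(y_true, always_tumor)
print(f"precision={precision:.4f} recall={recall:.4f} accuracy={accuracy:.4f}")
```

The baseline reaches an accuracy of 123/132, about 93.2%, so differences between classifiers on this dataset are best judged against that figure.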
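Table 3

Performances (%) of pathway-related genes and DEGs in the training set.

Training set sizes (normal/tumor): COAD, 21/287; BRCA, 78/777; STAD, 21/263; PRAD, 36/349.

| Genes | COAD SVM | COAD RF | COAD DT | COAD KNN | BRCA SVM | BRCA RF | BRCA DT | BRCA KNN | STAD SVM | STAD RF | STAD DT | STAD KNN | PRAD SVM | PRAD RF | PRAD DT | PRAD KNN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wnt | 100.00 | 99.73 | 91.97 | 99.82 | 99.85 | 99.16 | 86.78 | 99.50 | 96.30 | 99.14 | 79.55 | 93.76 | 94.97 | 90.73 | 69.88 | 89.58 |
| Ca-Me | 100.00 | 99.82 | 97.15 | 100.00 | 99.78 | 98.10 | 93.21 | 99.10 | 99.42 | 95.30 | 74.56 | 95.31 | 97.15 | 91.89 | 77.56 | 89.18 |
| PI3K | 100.00 | 100.00 | 96.80 | 100.00 | 99.88 | 99.04 | 89.92 | 99.64 | 99.42 | 98.19 | 81.96 | 97.91 | 95.07 | 94.81 | 71.84 | 90.97 |
| TLR | 100.00 | 99.27 | 84.45 | 100.00 | 99.63 | 98.93 | 86.50 | 98.59 | 96.95 | 91.94 | 81.39 | 95.37 | 94.08 | 89.10 | 65.20 | 86.79 |
| TH | 100.00 | 99.91 | 90.54 | 100.00 | 99.81 | 98.21 | 88.77 | 98.78 | 99.23 | 94.79 | 86.93 | 97.51 | 94.85 | 92.20 | 69.92 | 90.62 |
| RPSC | 100.00 | 99.93 | 97.15 | 99.64 | 99.84 | 99.22 | 90.48 | 99.01 | 98.41 | 99.23 | 81.48 | 97.20 | 93.63 | 94.78 | 78.34 | 89.11 |
| DEGs | 100.00 | 99.96 | 96.95 | 100.00 | 99.85 | 99.89 | 96.94 | 98.68 | 99.32 | 99.66 | 88.81 | 97.61 | 96.48 | 95.53 | 83.60 | 94.33 |
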
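Table 4

Performances (%) of pathway-related genes and DEGs in the test sets.

Test set sizes (normal/tumor): COAD, 9/123; BRCA, 35/332; STAD, 11/112; PRAD, 16/149.

| Genes | COAD SVM | COAD RF | COAD DT | COAD KNN | BRCA SVM | BRCA RF | BRCA DT | BRCA KNN | STAD SVM | STAD RF | STAD DT | STAD KNN | PRAD SVM | PRAD RF | PRAD DT | PRAD KNN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wnt | 100.00 | 99.90 | 94.03 | 99.86 | 99.95 | 97.64 | 88.79 | 100.00 | 99.26 | 98.70 | 79.13 | 99.75 | 95.76 | 95.55 | 76.65 | 95.26 |
| Ca-Me | 100.00 | 100.00 | 99.18 | 100.00 | 99.96 | 99.50 | 88.42 | 100.00 | 99.10 | 97.88 | 75.04 | 97.44 | 94.75 | 96.95 | 79.67 | 95.11 |
| PI3K | 100.00 | 100.00 | 100.00 | 99.95 | 99.92 | 99.43 | 98.41 | 99.66 | 99.35 | 98.53 | 84.57 | 97.44 | 96.56 | 91.61 | 86.01 | 95.91 |
| TLR | 100.00 | 97.01 | 88.48 | 100.00 | 99.87 | 99.35 | 90.17 | 99.44 | 98.62 | 98.01 | 86.36 | 95.37 | 93.20 | 92.05 | 78.56 | 86.79 |
| TH | 100.00 | 99.72 | 99.18 | 99.86 | 99.93 | 98.86 | 83.38 | 99.57 | 99.43 | 98.45 | 80.92 | 97.93 | 96.60 | 89.53 | 64.91 | 91.42 |
| RPSC | 100.00 | 99.86 | 94.03 | 99.95 | 99.93 | 99.93 | 94.13 | 99.84 | 99.26 | 98.25 | 83.68 | 98.74 | 94.67 | 93.12 | 79.88 | 95.05 |
| DEGs | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 99.96 | 99.03 | 99.29 | 99.35 | 99.10 | 88.67 | 96.02 | 93.37 | 92.76 | 85.48 | 91.23 |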

Note. COAD, colorectal cancer; BRCA, breast cancer; STAD, gastric cancer; PRAD, prostate cancer; Wnt, Wnt signaling pathway; Ca-Me, carbohydrate metabolism signaling pathway; PI3K, PI3K-Akt signaling pathway; TLR, toll-like receptor signaling pathway; TH, thyroid hormone signaling pathway; RPSC, signaling pathways regulating pluripotency of stem cells.
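
The normal/tumor counts reported for the training and test sets are consistent with an approximately 70/30 split made separately within each class (for COAD, 21 + 9 = 30 normal and 287 + 123 = 410 tumor samples). The split procedure is not described in the text, so the following is only a sketch of one way such counts could arise; the function name, the 0.7 fraction, and the random seed are assumptions.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices so each class keeps the same train fraction."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)

# COAD class totals implied by the training and test set sizes.
labels = ["normal"] * 30 + ["tumor"] * 410
train, test = stratified_split(labels)
print(sum(labels[i] == "normal" for i in train),
      sum(labels[i] == "tumor" for i in train))
print(sum(labels[i] == "normal" for i in test),
      sum(labels[i] == "tumor" for i in test))
```

Splitting within each class keeps the few normal samples represented in both sets, which matters when normal samples make up less than 10% of the data.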

Figure 2

Performance of the Wnt signal pathway-related genes in three types of cancers—colorectal cancer, breast cancer, and gastric cancer—using four machine learning algorithms.

3.3. Wnt Pathway-Related Genes Are Efficient Predictors of Cancer

Based on these results, we hypothesized that the Wnt pathway is potentially a feature that can be adopted for cancer detection. To test this, the Wnt pathway-related genes were evaluated on other common cancers, namely breast and gastric cancers. Similar procedures and algorithms were selected, and DEGs were used as controls. Not surprisingly, results using the Wnt pathway-related genes were similar to those using the control group: the area under the curve (AUC) exceeded 94.00%. It is worth noting that Wnt pathway-related genes in breast cancer outperformed those in gastric cancer (AUC values of approximately 98% and 95%, respectively; Figure 3).
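The AUC reported here can be read as the probability that a randomly chosen tumor sample receives a higher classifier score than a randomly chosen normal sample, with ties counted as one half. The sketch below computes it directly from that pairwise definition; it is an illustration of the metric, not the authors' implementation, and the function name and toy scores are hypothetical.

```python
def roc_auc(labels, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Toy scores: 2 normal (0) and 2 tumor (1) samples.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise form is quadratic in the number of samples, which is acceptable at the dataset sizes in this study; a rank-based formulation gives the same value more efficiently for larger data.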
Figure 3

Receiver operating characteristic curves for the Wnt signaling pathway-, PI3K-Akt signaling pathway-, and carbohydrate metabolism signal pathway-related genes for the three datasets.

3.4. Carbohydrate Metabolism and PI3K-Akt Signaling Pathways Can Predict Cancer Status

It was unknown whether other cancer-related pathways could also predict cancer status. Thus, sets of carbohydrate metabolism and PI3K-Akt signaling pathway-related genes were chosen to test their abilities to predict our candidate cancers. The carbohydrate metabolism pathway-related genes scored highest for colorectal cancer, followed by breast cancer and gastric cancer. Similar results were found across ML methods: AUC values were 98.28%, 97.30%, 96.07%, and 96.31% when using SVM, RF, DT, and KNN, respectively. Interestingly, the PI3K-Akt signaling pathway-related genes performed similarly. Both carbohydrate metabolism and PI3K-Akt signaling pathways yielded AUCs above 96.00%, implying that both pathways can detect cancer with great accuracy (Table 5). Of note, the SVM and RF methods outperformed DT and KNN in cancer detection (Figure 4). Taken together, these results indicate that these three sets of pathway-related genes can be vital features for cancer prediction and that the pathways vary in predictive power. We believe that most pathway-related genes are promising features that could be used for early cancer diagnoses.
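Table 5

Performance (AUC, %) of candidate pathway-related genes in cancer prediction.

| Classifier | Wnt COAD | Wnt BRCA | Wnt STAD | Carbohydrate metabolism COAD | Carbohydrate metabolism BRCA | Carbohydrate metabolism STAD | PI3K-Akt COAD | PI3K-Akt BRCA | PI3K-Akt STAD |
|---|---|---|---|---|---|---|---|---|---|
| SVM | 99.49 | 99.49 | 98.84 | 99.49 | 99.39 | 99.13 | 99.49 | 99.48 | 99.43 |
| RF | 99.49 | 99.26 | 98.04 | 99.26 | 99.16 | 96.88 | 98.79 | 99.41 | 96.78 |
| DT | 89.45 | 92.56 | 78.51 | 99.39 | 91.24 | 74.65 | 94.51 | 93.34 | 76.85 |
| KNN | 99.42 | 98.85 | 93.54 | 97.83 | 99.44 | 93.27 | 97.81 | 99.45 | 94.71 |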

Note. Tumor samples in the positive group versus the normal samples. COAD, colorectal cancer; BRCA, breast cancer; STAD, gastric cancer.

Figure 4

3D bar plots of the three candidate features in various types of cancers. The z-axis indicates percent area under the curve. (a) COAD, (b) BRCA, and (c) STAD.

4. Discussion

Increasing evidence indicates that colorectal cancer is often initiated by an activating mutation in the Wnt cascade. The correlation between the Wnt pathway and colorectal cancer prompted our investigation into whether Wnt pathway-related genes can serve as features for detecting colorectal cancer. Thus, we designed this study to take advantage of various conventional ML models and cancer-related pathways for predicting cancer. The results show that these three sets of pathway-related genes could be used as features for cancer prediction; they yielded results comparable to those of DEGs. Given the complexity and high mortality of cancer, the accurate early diagnosis of a cancer type can facilitate clinical management. Only relatively recently have cancer researchers attempted to apply ML for cancer prediction and prognosis [29-31]. Most previous work employed ML methods for modeling cancer progression, identified informative factors for use in a classification scheme, and attempted to develop a set of classifiers for feature selection. Conventional ML algorithms require engineering domain knowledge to identify features from raw data, whereas deep learning automatically extracts simple features from the input data using an all-purpose learning procedure. These simple features are mapped into outputs using a complex architecture composed of a series of nonlinear functions (i.e., “hierarchical representations”) to maximize the predictive accuracy of the model. This accuracy can be improved using the rich information contained in biological research. We aimed to fill this void by assessing pathway-related genes for their performance in cancer prediction and identification. We demonstrated that three cancer-related pathways (the Wnt signaling pathway, carbohydrate metabolism signaling pathway, and PI3K-Akt signaling pathway) have predictive accuracy comparable to that of DEGs for cancer prediction and identification. Furthermore, their performances were similar regardless of the ML algorithm used.
The use of DEGs as features has been previously documented. Our outcomes suggest that all three sets of pathway-related genes can also be used as features for cancer detection. By assessing various cancer types, we observed that the features perform best for colorectal cancer, followed closely by breast cancer and then gastric cancer. We speculate that the function of pathway-related genes varies across cancer types and is more pronounced in colorectal cancer. The results also show that the three sets of pathway-related genes performed differently within one cancer type, which may reflect differences in the contributions of their component genes depending on the type of tumorigenesis. Finally, these results demonstrate that the SVM and RF algorithms are superior to DT and KNN in this genomics setting. This variation might arise because the best classifier differs from one problem to another (e.g., the SVM model tends to handle high-dimensional inputs well, as in this study, whereas DT and KNN depend largely on feature selection when variables are nonlinearly related). Unlike studies using other ML methodologies, this study offers additional insights on feature extraction for cancer classification. Each of the novel observations we made is worthy of further investigation.

5. Conclusions

We propose that pathway-related genes have the potential to be used as biomarkers for cancer prediction. We demonstrated that the Wnt signaling pathway, carbohydrate metabolism signaling pathway, and PI3K-Akt signaling pathway can be incorporated into ML models to achieve better prediction performance. The proposed features have the potential to facilitate preoperative care of patients with cancer.
  31 in total

1.  Support vector machine classification and validation of cancer tissue samples using microarray expression data.

Authors:  T S Furey; N Cristianini; N Duffy; D W Bednarski; M Schummer; D Haussler
Journal:  Bioinformatics       Date:  2000-10       Impact factor: 6.937

2.  Gene expression inference with deep learning.

Authors:  Yifei Chen; Yi Li; Rajiv Narayan; Aravind Subramanian; Xiaohui Xie
Journal:  Bioinformatics       Date:  2016-02-11       Impact factor: 6.937

3.  Inhibition of the phosphoinositide 3-kinase pathway for the treatment of patients with metastatic metaplastic breast cancer.

Authors:  S Moulder; T Helgason; F Janku; J Wheler; J Moroney; D Booser; C Albarracin; P K Morrow; J Atkins; K Koenig; M Gilcrease; R Kurzrock
Journal:  Ann Oncol       Date:  2015-04-15       Impact factor: 32.976

4.  Selective of informative metabolites using random forests based on model population analysis.

Authors:  Jian-Hua Huang; Jun Yan; Qing-Hua Wu; Miguel Duarte Ferro; Lun-Zhao Yi; Hong-Mei Lu; Qing-Song Xu; Yi-Zeng Liang
Journal:  Talanta       Date:  2013-10-03       Impact factor: 6.057

Review 5.  Wnt/beta-catenin signaling: components, mechanisms, and diseases.

Authors:  Bryan T MacDonald; Keiko Tamai; Xi He
Journal:  Dev Cell       Date:  2009-07       Impact factor: 12.270

6.  Multiparametric decision support system for the prediction of oral cancer reoccurrence.

Authors:  Konstantinos P Exarchos; Yorgos Goletsis; Dimitrios I Fotiadis
Journal:  IEEE Trans Inf Technol Biomed       Date:  2011-08-18

7.  Gene expression profiling predicts clinical outcome of breast cancer.

Authors:  Laura J van 't Veer; Hongyue Dai; Marc J van de Vijver; Yudong D He; Augustinus A M Hart; Mao Mao; Hans L Peterse; Karin van der Kooy; Matthew J Marton; Anke T Witteveen; George J Schreiber; Ron M Kerkhoven; Chris Roberts; Peter S Linsley; René Bernards; Stephen H Friend
Journal:  Nature       Date:  2002-01-31       Impact factor: 49.962

Review 8.  Deep learning in bioinformatics.

Authors:  Seonwoo Min; Byunghan Lee; Sungroh Yoon
Journal:  Brief Bioinform       Date:  2017-09-01       Impact factor: 11.622

9.  Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling.

Authors:  Eng-Juh Yeoh; Mary E Ross; Sheila A Shurtleff; W Kent Williams; Divyen Patel; Rami Mahfouz; Fred G Behm; Susana C Raimondi; Mary V Relling; Anami Patel; Cheng Cheng; Dario Campana; Dawn Wilkins; Xiaodong Zhou; Jinyan Li; Huiqing Liu; Ching-Hon Pui; William E Evans; Clayton Naeve; Limsoon Wong; James R Downing
Journal:  Cancer Cell       Date:  2002-03       Impact factor: 31.743

Review 10.  Hallmarks of cancer: the next generation.

Authors:  Douglas Hanahan; Robert A Weinberg
Journal:  Cell       Date:  2011-03-04       Impact factor: 41.582
