Wei Liu1, Xuefeng Bai2, Yuejuan Liu2, Wei Wang1, Junwei Han3, Qiuyu Wang2, Yanjun Xu3, Chunlong Zhang3, Shihua Zhang4, Xuecang Li2, Zhonggui Ren2, Jian Zhang2, Chunquan Li2.
Abstract
Precise cancer classification is a central challenge in clinical cancer research, including diagnosis, prognosis and metastasis prediction. Most existing cancer classification methods based on gene or metabolite biomarkers are limited to a single omics layer (genomics or metabolomics) and do not integrate multiple 'omics' data. The accuracy and robustness of these methods when applied to independent cohorts of patients must be improved. In this study, we propose a directed random walk-based method to evaluate the topological importance of each gene in a reconstructed gene-metabolite graph by integrating information from matched gene expression profiles and metabolomic profiles. The joint use of gene and metabolite information contributes to accurate evaluation of the topological importance of genes and to reproducible pathway activities. We constructed classifiers using reproducible pathway activities for precise cancer classification and risk metabolic pathway identification. We applied the proposed method to the classification of prostate cancer. Within-dataset experiments and cross-dataset experiments on three independent datasets demonstrated that the proposed method achieved a more accurate and robust overall performance than several existing classification methods. The resulting risk pathways and topologically important differential genes and metabolites provide biologically informative models for prostate cancer prognosis and the development of therapeutic strategies.
Year: 2015 PMID: 26286638 PMCID: PMC4541321 DOI: 10.1038/srep13192
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic overview of DRW-GM.
The gene expression profiles are translated into pathway profiles (rows represent pathways and columns represent samples) by DRW-based pathway activity inference. The global gene–metabolite graph is constructed on KEGG metabolite pathways. The input W0 of DRW is the vector of initial weights of genes and metabolites, which are obtained from gene expression profiles and metabolomic profiles respectively. The output W∞ is the probability weight vector of nodes when DRW reaches a steady state. W∞ measures the topological importance of genes by incorporating both gene and metabolite information. The pathway activity is an expression value vector synthesized from the topologically weighted differential gene expression vectors in the pathway (Formula (2)).
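The relationship between W0 and W∞ described above can be illustrated with a minimal sketch of a random walk with restart on a small directed graph. This is not the authors' implementation: the update rule, the restart probability of 0.7, the handling of nodes without outgoing edges, the toy node names, and the form of `pathway_activity` (a weighted sum normalised by the root of the squared weights) are all assumptions made for illustration, and the record does not reproduce Formula (2).

```python
import math


def directed_random_walk(edges, w0, restart=0.7, tol=1e-12, max_iter=10_000):
    """Iterate a random walk with restart until the weights stop changing.

    edges   : iterable of (source, target) pairs of a directed graph
    w0      : dict node -> non-negative initial weight (normalised here)
    restart : probability of jumping back to the initial distribution
              (0.7 is a hypothetical default, not taken from the source)
    Returns a dict node -> steady-state weight (sums to 1).
    """
    edges = list(edges)
    nodes = sorted(set(w0) | {n for edge in edges for n in edge})
    total = sum(w0.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("w0 must contain at least one positive weight")
    p0 = {n: w0.get(n, 0.0) / total for n in nodes}

    successors = {n: [] for n in nodes}
    for source, target in edges:
        successors[source].append(target)

    w = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for source in nodes:
            moving = (1.0 - restart) * w[source]
            if successors[source]:
                share = moving / len(successors[source])
                for target in successors[source]:
                    nxt[target] += share
            else:
                # Assumption: a node with no outgoing edge returns its
                # mass to the initial distribution.
                for n in nodes:
                    nxt[n] += moving * p0[n]
        change = sum(abs(nxt[n] - w[n]) for n in nodes)
        w = nxt
        if change < tol:
            break
    return w


def pathway_activity(expression, weights, genes):
    """Combine per-gene values of one sample into one pathway value.

    Assumed form: sum(w_g * x_g) / sqrt(sum(w_g ** 2)).
    """
    numerator = sum(weights[g] * expression[g] for g in genes)
    denominator = math.sqrt(sum(weights[g] ** 2 for g in genes))
    if denominator == 0:
        raise ValueError("all weights are zero")
    return numerator / denominator


# Toy graph with hypothetical gene and metabolite nodes.
toy_edges = [("geneA", "met1"), ("met1", "geneB"),
             ("geneB", "geneA"), ("geneA", "geneC")]
toy_w0 = {"geneA": 1.0, "met1": 2.0, "geneB": 0.5, "geneC": 0.1}
steady = directed_random_walk(toy_edges, toy_w0)
print({node: round(weight, 4) for node, weight in steady.items()})
```

Because the metabolite node carries its own initial weight, it changes the steady-state weights of the genes it connects to, which is the sense in which W∞ incorporates both kinds of information.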
Comparison of classification performance within datasets.
| Method | Benign–PCA | PCA–Mets |
| DRW-GM | | |
| DRW | 0.9581 ± 0.1037 | 0.9980 ± 0.0281 |
| PAC | 0.9078 ± 0.1520 | 0.9473 ± 0.1165 |
| Mean | 0.8389 ± 0.1845 | 0.8275 ± 0.2083 |
| Median | 0.8224 ± 0.1859 | 0.8581 ± 0.1854 |
| Genes | 0.8436 ± 0.2007 | 0.8654 ± 0.1997 |
Shown are the average AUC and the standard deviation. The classification evaluation is performed according to within-dataset experiments on GSE8511. The AUC shown in bold is the best AUC for the corresponding two phenotypes (Benign–PCA and PCA–Mets).
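For readers unfamiliar with the metric, the following minimal sketch shows how an AUC can be computed from classifier scores and how a "mean ± standard deviation" entry can be summarised over repeated evaluations. The function names and toy numbers are hypothetical, and the use of the sample standard deviation is an assumption; the record does not state how the authors computed either quantity.

```python
from statistics import mean, stdev


def auc(scores, labels):
    """AUC as the probability that a positive sample outscores a negative one.

    scores : classifier outputs, higher meaning "more likely positive"
    labels : 1 for the positive class, 0 for the negative class
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def summarise(auc_values):
    """Return (mean, sample standard deviation) of repeated AUC estimates."""
    return mean(auc_values), stdev(auc_values)


# Toy example: three hypothetical evaluation rounds.
rounds = [
    ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),
    ([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]),
    ([0.7, 0.6, 0.5, 0.4], [1, 0, 1, 0]),
]
values = [auc(scores, labels) for scores, labels in rounds]
average, spread = summarise(values)
print(f"{average:.4f} ± {spread:.4f}")
```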
Comparison of classification performance across datasets.
| DRW-GM | 0.8522 ± 0.1990 | |||||
| DRW-GM-NM | 0.9866 ± 0.0363 | 0.8413 ± 0.2080 | 0.9817 ± 0.0271 | 0.9947 ± 0.0187 | 0.8995 ± 0.0786 | 0.9991 ± 0.0073 |
| DRW | 0.9780 ± 0.0404 | 0.8254 ± 0.2113 | 0.9827 ± 0.0196 | 0.9851 ± 0.0419 | 0.8918 ± 0.0750 | 0.9989 ± 0.0081 |
| PAC | 0.9634 ± 0.0643 | 0.8139 ± 0.2093 | 0.9675 ± 0.0405 | 0.9482 ± 0.1035 | 0.6645 ± 0.0836 | 0.9911 ± 0.0300 |
| Mean | 0.9450 ± 0.0733 | 0.6663 ± 0.2530 | 0.9351 ± 0.0446 | 0.9105 ± 0.0298 | 0.5172 ± 0.0788 | 0.9713 ± 0.0274 |
| Median | 0.9402 ± 0.0838 | 0.5315 ± 0.2121 | 0.8995 ± 0.0793 | 0.9036 ± 0.0671 | 0.7295 ± 0.0689 | 0.9694 ± 0.0512 |
| Genes | 0.9014 ± 0.1231 | 0.9682 ± 0.0186 | 0.8609 ± 0.0703 | 0.7492 ± 0.0835 | 0.8654 ± 0.0744 | |
Shown are the average AUC and the standard deviation. The classification evaluation is performed according to cross-dataset experiments. The training set is GSE8511. The three independent test sets are GSE3325, GSE32269 and GSE35988. The classifications between Benign and PCA samples and between PCA and Mets samples are carried out separately. The AUC shown in bold is the best AUC for the corresponding paired training-test dataset.
a DRW-GM-NM: the classification method that uses the gene–metabolite graph but does not incorporate differential metabolites for topological importance evaluation.
Frequently selected pathway markers for prostate cancer prognosis.
| Benign–PCA | ||
| Porphyrin and chlorophyll metabolism | 1030/1500 | GUSB, FTH1, HMBS, ALAD, |
| Purine metabolism | 416/1500 | POLR2H, PAICS, POLR3GL, NUDT9, NME1, AK2, NT5C, NME2, PAPSS1, POLR1A, PDE4A, GUCY1A3, ENTPD5, ENTPD3, PGM1, ADCY2, NUDT5, ITPA, POLR1E, |
| Pentose and glucuronate interconversions | 62/1500 | ALDH2, GUSB, AKR1B1 |
| Drug metabolism - other enzymes | 49/1500 | UCK2, GUSB, TK1, ITPA |
| One carbon pool by folate | 40/1500 | AMT, MTHFD2, SHMT2 |
| PCA–Mets | ||
| Steroid hormone biosynthesis | 701/1500 | HSD17B6, SRD5A2, CYP1A1, |
| Purine metabolism | 222/1500 | NT5C2, PDE8B, POLD1, NME5, PGM1, PNP, ADK, ADSL, POLR2J2, PKM2, NT5M, NT5C1A, ADCY2, ALLC, PDE6A, |
| Tryptophan metabolism | 131/1500 | MAOB, ALDH7A1, ACAT1, ALDH3A2, ALDH1B1, CCBL1, CYP1A1, ALDH2, |
| Arginine and proline metabolism | 105/1500 | LAP3, MAOB, ALDH7A1, ARG2, ALDH3A2, ODC1, ALDH1B1, SMS, GOT2, ALDH2, PYCRL, |
| Arachidonic acid metabolism | 89/1500 | PLA2G4A, HPGDS, PLA2G2D |
| Histidine metabolism | 68/1500 | MAOB, ALDH7A1, ALDH3A2, ALDH1B1, HNMT, HDC, ALDH2, |
| Glycine, serine and threonine metabolism | 53/1500 | AOC3, MAOB, ALDH7A1, SHMT2, SRR, DLD, |
| Drug metabolism - other enzymes | 45/1500 | DPYD, TK1, TYMP |
a Metabolites are shown in italics.
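The frequency column above can be read as a selection count over repeated classifier construction. The sketch below assumes that each fraction such as 1030/1500 means "selected in 1030 of 1500 trained classifiers"; that reading, the function name, and the toy run lists are assumptions, not details confirmed by this record.

```python
from collections import Counter


def selection_frequency(selected_per_run):
    """Count in how many runs each pathway was selected as a marker.

    selected_per_run : list of lists, one list of pathway names per run
    Returns (pathway, count, number_of_runs) tuples, most frequent first.
    """
    counts = Counter()
    for selected in selected_per_run:
        # set() ensures a pathway is counted at most once per run.
        counts.update(set(selected))
    n_runs = len(selected_per_run)
    return [(name, count, n_runs) for name, count in counts.most_common()]


# Toy example with three hypothetical runs.
runs = [
    ["Purine metabolism", "Steroid hormone biosynthesis"],
    ["Steroid hormone biosynthesis"],
    ["Steroid hormone biosynthesis", "Tryptophan metabolism"],
]
for name, count, n_runs in selection_frequency(runs):
    print(f"{name}: {count}/{n_runs}")
```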
Figure 2. A snapshot of the perturbed nodes in the arginine and proline metabolism pathway.
The differential genes and metabolites of prostate cancer (PCA–Mets) are shown with red node labels and borders. The nodes discussed in the Discussion section are marked with: ①: EC:2.5.1.22, SMS; ②: cpd:C00315, spermidine; ③: cpd:C00750, spermine; ④: EC:3.4.13.5, LAP3; ⑤: cpd:C00148, L-Proline; ⑥: EC:1.4.3.4, MAOB; ⑦: cpd:C02946, N4-Acetylaminobutanoate; ⑧: cpd:C00134, Putrescine.
Figure 3. A snapshot of the perturbed nodes in the steroid hormone biosynthesis pathway.
The differential genes and metabolites of prostate cancer (PCA–Mets) are shown with red node labels and borders. The key enzymes in the pathway (red arrows) to form C-19 steroids of androgens are perturbed.
Comparison of reproducibility powers of six pathway activity inference methods.
| p-value | 9.122 × 10⁻⁶ | 2.301 × 10⁻⁴ | 9.313 × 10⁻¹⁰ | 9.313 × 10⁻¹⁰ | 9.313 × 10⁻¹⁰ |
a Wilcoxon signed-rank test on the 30 Cscore values of DRW-GM against those of the other five methods.
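The repeated value 9.313 × 10⁻¹⁰ coincides with 2⁻³⁰, the smallest one-sided exact p-value a signed-rank test can return for 30 pairs, which occurs when all 30 differences have the same sign. The sketch below computes the exact one-sided p-value by enumerating rank sums; it assumes no zero or tied differences, and whether the authors used the exact one-sided form of the test is an assumption rather than something this record states.

```python
def exact_signed_rank_p(diffs):
    """Exact one-sided Wilcoxon signed-rank p-value, P(W+ >= observed).

    diffs : paired differences (for example, Cscore of one method minus
            Cscore of another). This sketch assumes no zeros and no ties
            among the absolute differences.
    """
    n = len(diffs)
    abs_vals = [abs(d) for d in diffs]
    if 0 in abs_vals or len(set(abs_vals)) != n:
        raise ValueError("this sketch assumes no zero or tied differences")

    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs_vals[i])
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

    # counts[s] = number of sign assignments whose positive-rank sum is s.
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]

    return sum(counts[w_plus:]) / 2 ** n


# Hypothetical case: all 30 differences are positive.
all_positive = [0.01 * (i + 1) for i in range(30)]
p_value = exact_signed_rank_p(all_positive)
print(f"{p_value:.3e}")
```

Under this reading, the three identical entries would correspond to comparisons in which DRW-GM had the higher Cscore in every one of the 30 cases.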
Figure 4. Reproducibility power of pathway activities.
The reproducibility powers of pathway activities inferred by DRW-GM, DRW, PAC, Mean and Median were compared. The individual gene markers were also included for comparison. (A,C,E) Comparison of reproducibility powers for Benign vs PCA (GSE8511→GSE3325, GSE8511→GSE32269 and GSE8511→GSE35988). The x-axis corresponds to the number N of top-ranked pathways considered (N = 10, 20, 30, 40, 50), and the y-axis shows the reproducibility power C of the top N pathways. (B,D,F) Comparison of reproducibility powers for PCA vs Mets.