Jie Hao, Youngsoon Kim, Tae-Kyung Kim, Mingon Kang.
Abstract
BACKGROUND: Predicting prognosis in patients from large-scale genomic data is a fundamentally challenging problem in genomic medicine. However, the prognosis still remains poor in many diseases. The poor prognosis may be caused by the high complexity of biological systems, where multiple biological components and their hierarchical relationships are involved. Moreover, it is challenging to develop robust computational solutions for high-dimension, low-sample-size data.
Keywords: Glioblastoma multiforme; Long-term survival prediction; Pathway-based analysis; Prognosis prediction; Sparse deep neural network; TCGA
Year: 2018 PMID: 30558539 PMCID: PMC6296065 DOI: 10.1186/s12859-018-2500-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 ROC curves. PASNet produces the highest AUC of 0.6622, while the AUCs of Dropout NN, SVM, random LASSO, and LLR are 0.6408, 0.6337, 0.6209, and 0.5899, respectively
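As an illustration of what the AUC values in Fig. 1 measure, the following is a minimal sketch that computes the area under the ROC curve as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score. The function name `roc_auc` and the toy labels and scores are assumptions for illustration, not values or code from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample outscores
    a random negative sample (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = LTS, 0 = non-LTS.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples, which is acceptable for small cohorts; a rank-based formulation would scale better.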
Comparison of AUC and F1-score over ten stratified 5-fold cross-validations
| Model | AUC | F1-Score |
|---|---|---|
| Logistic LASSO | 0.5899 ±0.020 | 0.3347 ±0.025 |
| Random LASSO | 0.6209 ±0.020 | 0.3370 ±0.020 |
| SVM | 0.6337 ±0.015 | 0.3446 ±0.015 |
| Dropout NN | 0.6408 ±0.014 | 0.2957 ±0.025 |
| PASNet | 0.6622 ±0.013 | 0.3978 ±0.016 |
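The evaluation protocol behind this table can be sketched roughly as follows: split the samples into folds that preserve the class ratio, score each fold, and summarize the repeated runs as a mean and a spread. This is a sketch under assumptions: the helper names are hypothetical, and reading the ± values as standard deviations is an assumption, since the record does not say how they were computed.

```python
import random
import statistics


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so each fold keeps roughly
    the same class proportions as the full data set."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def f1_score(labels, predictions):
    """F1 for the positive class (label 1); returns 0.0 if there are no true positives."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize(values):
    """Mean and sample standard deviation of per-run scores."""
    return statistics.mean(values), statistics.stdev(values)
```

Repeating the split with ten different seeds and collecting one score per fold would give the kind of per-run values that `summarize` condenses.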
The Wilcoxon signed-rank tests for comparing PASNet with the Benchmark Classifiers
| Comparison | W statistic | p-value |
|---|---|---|
| PASNet vs. Dropout NN | 146.5 | 2.13e-06 |
| PASNet vs. RBF-SVM | 137.0 | 1.35e-06 |
| PASNet vs. Random LASSO | 45.0 | 1.06e-08 |
| PASNet vs. Logistic LASSO | 43.0 | 9.52e-09 |
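For readers unfamiliar with the test, the following is a minimal sketch of how a Wilcoxon signed-rank statistic can be computed from paired scores (for example, per-fold AUCs of two classifiers). It drops zero differences, averages ranks over ties, and reports the smaller of the two signed rank sums as W. That convention is an assumption; the record does not state which one the authors used, and the p-value computation is omitted.

```python
def wilcoxon_w(x, y):
    """Return (W, W_plus, W_minus) for paired samples x and y,
    where W = min(W_plus, W_minus)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    if not diffs:
        raise ValueError("all paired differences are zero")
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus), w_plus, w_minus
```

A small W relative to the number of pairs means the differences fall mostly on one side, which is what drives the small p-values.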
Fig. 2 Graphical representation of the output node values over the samples by PASNet. LTS samples obtain higher node values in the LTS node than non-LTS samples. Similarly, non-LTS samples obtain higher node values in the non-LTS node than LTS samples
Fig. 3 Graphical representation of the output layer, hidden layer, and pathway layer in PASNet. (a) The weights between the hidden layer and the output layer. Hidden nodes are sorted in descending order. (b) The node values in the hidden layer. The horizontal dotted lines indicate LTS/non-LTS samples. The vertical dotted lines indicate that LTS/non-LTS samples are significantly distinguished by the top 16 pathways. (c) The absolute weights between the pathway layer and the hidden layer
Fig. 4 Graphical representation of the 10 top-ranked pathways by PASNet. (a) The absolute weights between the 10 top-ranked pathway nodes and the hidden layer. It is a zoomed-in view of Fig. 3c. (b) Weights between the gene layer and the 10 top-ranked pathway nodes. The connections are determined by the Reactome database
Top-10 ranked pathways for survival prediction in GBM by PASNet
| Pathway name | Pathway size | Reference | Top-5 ranked genesᵃ |
|---|---|---|---|
| Signaling by GPCR | 920 | [ | SHH, PTGFR, GNG5, CHRM5, LHB |
| GPCR downstream signaling | 805 | [ | PTGFR, OR7C2, GNG5, OR10H3, MLNR |
| Innate immune system | 933 | [ | CD79B, INPPL1, SRC, NUP85, DNM2 |
| Adaptive immune system | 539 | [ | CD79B, ASB6, PTEN, NCF4, FBXO2 |
| Metabolism of carbohydrates | 247 | - | HS3ST3B1, NUP85, PFKFB3, LUM, SLC2A4 |
| Transmembrane transport of small molecules | 413 | [ | SLC9A7, ABCA7, GNG5, AQP8, HK3 |
| Developmental biology | 396 | - | NRP2, FES, WNT10B, MYOD1, SLC2A4 |
| Metabolism of proteins | 518 | - | EIF3G, CCT2, TIMM22, RPL3L, GMPPA |
| Class A/1 (rhodopsin-like receptors) | 305 | [ | PTGFR, OPRD1, CHRM5, NPFF, NTSR2 |
| Axon guidance | 251 | [ | NRP2, NRTN, AGRN, FES, RPS6KA4 |
ᵃ The genes were ranked by absolute weights in the pathways
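The ranking described in the footnote can be sketched as a sort on absolute weight. The function name `top_genes`, the placeholder gene names, and the weights below are hypothetical and only illustrate the idea; they are not values from PASNet.

```python
def top_genes(gene_weights, k=5):
    """Return the k gene names with the largest absolute weight,
    in descending order of |weight|."""
    ranked = sorted(gene_weights.items(), key=lambda item: abs(item[1]), reverse=True)
    return [gene for gene, _ in ranked[:k]]


# Hypothetical weights between gene nodes and one pathway node.
example_weights = {
    "GENE_A": 0.12,
    "GENE_B": -0.87,
    "GENE_C": 0.45,
    "GENE_D": -0.03,
    "GENE_E": 0.60,
    "GENE_F": -0.51,
}
example_top3 = top_genes(example_weights, k=3)
```

Sorting on the absolute value treats strongly negative and strongly positive weights as equally important, which matches the footnote's wording.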
Fig. 5 Hierarchical representation of pathways in PASNet. (a) PASNet is partially visualized, showing the five pathways. Distinct neural network activations between LTS (b) and non-LTS (c) are shown via PASNet. The nodes of the neural network of (b) and (c) correspond to (a). For instance, the nodes in the pathway layer of (b) and (c) represent signaling by GPCR, innate immune system, aquaporin-mediated transport, signaling by BMP, and cytokine signaling in immune system. The pathways of signaling by GPCR and innate immune system are inactive in LTS patients, whereas both pathways are active in non-LTS patients
Fig. 6 Architecture of PASNet. PASNet consists of a gene layer (the input layer), a pathway layer that represents the biological pathways linked with the input genes, a hidden layer that represents hierarchical relationships among biological pathways, and an output layer that corresponds to clinical outcomes, e.g., a binary class (long-term vs. short-term survival) or cancer stage
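The layered structure in Fig. 6 can be sketched as a forward pass in which the gene-to-pathway connections are restricted by a binary membership mask. This is a minimal sketch under assumptions: the sigmoid and softmax activations, the parameter names (`W1`, `b1`, ...), and the tiny example network are illustrative choices, not details confirmed by this record.

```python
import math


def masked_dense(inputs, weights, bias, mask=None):
    """weights[j][i] links input i to output node j; where mask[j][i] is 0
    the connection does not exist and contributes nothing."""
    outputs = []
    for j, row in enumerate(weights):
        total = bias[j]
        for i, w in enumerate(row):
            if mask is None or mask[j][i]:
                total += w * inputs[i]
        outputs.append(total)
    return outputs


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(genes, params, pathway_mask):
    """Gene layer -> pathway layer (masked) -> hidden layer -> output layer."""
    pathway = sigmoid(masked_dense(genes, params["W1"], params["b1"], pathway_mask))
    hidden = sigmoid(masked_dense(pathway, params["W2"], params["b2"]))
    return softmax(masked_dense(hidden, params["W3"], params["b3"]))


# Hypothetical network: 3 genes, 2 pathways, 2 hidden nodes, 2 output classes.
example_mask = [[1, 0, 0], [0, 1, 0]]  # gene 2 belongs to no pathway
example_params = {
    "W1": [[0.5, 0.9, -0.3], [0.2, -0.4, 0.8]], "b1": [0.0, 0.0],
    "W2": [[0.7, -0.6], [-0.2, 0.4]], "b2": [0.1, -0.1],
    "W3": [[1.0, -1.0], [-1.0, 1.0]], "b3": [0.0, 0.0],
}
example_output = forward([1.0, 2.0, 3.0], example_params, example_mask)
```

Because gene 2 is masked out of every pathway in this toy network, changing its value leaves the output unchanged, which is the practical effect of wiring the first layer from a pathway database.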
Fig. 7 Training of PASNet. (a) Weights and biases are randomly initialized. Connections between the gene layer and the pathway layer are determined by biological pathway databases, and the remaining layers are considered fully connected in this step. (b) A sub-network is randomly selected using a dropout technique and trained. (c) Sparse coding optimizes the sparsity of connections in the sub-network
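Steps (b) and (c) of Fig. 7 can be illustrated with a simplified stand-in: sample a random sub-network by keeping each node with some probability, then pick the sparsity level whose pruned weights give the lowest cost. Magnitude-based pruning and the search over candidate levels are assumptions made for this sketch; the record does not spell out the sparse coding procedure, and all function names are hypothetical.

```python
import random


def sample_subnetwork(n_nodes, keep_prob, rng):
    """Dropout-style selection: True marks a node kept in the sub-network."""
    return [rng.random() < keep_prob for _ in range(n_nodes)]


def sparsify(weights, sparsity):
    """Zero out the fraction `sparsity` of entries with the smallest |w|.
    Entries tied with the threshold are also zeroed."""
    flat = sorted(abs(w) for row in weights for w in row)
    n_drop = int(sparsity * len(flat))
    if n_drop == 0:
        return [row[:] for row in weights]
    threshold = flat[n_drop - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def best_sparsity(weights, candidates, cost):
    """Return the candidate sparsity level whose pruned weights minimize `cost`."""
    return min(candidates, key=lambda s: cost(sparsify(weights, s)))


# Hypothetical usage: the cost would normally be the training loss of the sub-network.
rng = random.Random(0)
kept_nodes = sample_subnetwork(6, keep_prob=0.7, rng=rng)
layer_weights = [[0.1, -0.5], [0.3, -0.05]]
pruned = sparsify(layer_weights, 0.5)
```

In a full training loop these two steps would alternate with gradient updates on the selected sub-network, so the sparsity level is re-chosen as the weights change.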