Sangick Park¹, Eunchong Huang¹, Taejin Ahn¹,².
Abstract
Deep learning has proven advantageous in solving cancer diagnosis and classification problems. However, it cannot explain the rationale behind its decisions in terms that humans can readily interpret. Biological pathway databases provide well-studied relationships between genes and the pathways they belong to. Because pathways are knowledge frameworks widely used by human researchers, representing gene-to-pathway relationships in the structure of a deep learning model may make the model easier to understand. Here, we propose a deep neural network, PathDeep, that implements gene-to-pathway relationships in its structure. We also provide an application framework that measures the contributions of pathways and genes to a deep neural network in a classification problem. We applied PathDeep to classify cancer and normal tissues using a large, publicly available gene expression dataset. PathDeep distinguished cancer from normal tissues with higher accuracy than fully connected neural networks (accuracy = 0.994) across 32 tissues. We identified 42 pathways related to cancer in the 32 tissues and 57 associated genes that contribute strongly to the biological functions of cancer. The most significant pathway was G-protein-coupled receptor signaling, and the most enriched function was the G1/S transition of the mitotic cell cycle, suggesting that these biological functions are the most common cancer characteristics across the 32 tissues.
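The core structural idea, wiring each gene only to the pathways that contain it, can be sketched as a masked (sparsely connected) layer. This is an illustration under stated assumptions, not the authors' implementation: the gene and pathway names, the helper functions `build_mask` and `pathway_layer`, the ReLU activation, and the weight values are all hypothetical, and PathDeep's actual layer sizes and training details are not given in this section.

```python
from typing import Dict, List, Sequence


def build_mask(genes: Sequence[str], pathways: Dict[str, List[str]]) -> List[List[int]]:
    """Binary connectivity mask: one row per pathway, one column per gene.

    Entry [p][g] is 1 if gene g is a member of pathway p, else 0. Rows follow
    the insertion order of `pathways`; member genes missing from `genes` are ignored.
    """
    index = {gene: j for j, gene in enumerate(genes)}
    mask = []
    for members in pathways.values():
        row = [0] * len(genes)
        for gene in members:
            if gene in index:
                row[index[gene]] = 1
        mask.append(row)
    return mask


def pathway_layer(x: Sequence[float], weights: Sequence[Sequence[float]],
                  bias: Sequence[float], mask: Sequence[Sequence[int]]) -> List[float]:
    """Forward pass of a hypothetical gene-to-pathway layer with ReLU activation.

    Each pathway node only receives input from its member genes, because
    non-member weights are multiplied by 0 from the mask.
    """
    out = []
    for w_row, m_row, b in zip(weights, mask, bias):
        z = b + sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        out.append(max(0.0, z))  # ReLU
    return out


# Toy example with hypothetical genes G1..G4 and pathways P1, P2.
genes = ["G1", "G2", "G3", "G4"]
pathways = {"P1": ["G2", "G3"], "P2": ["G1", "G4"]}
mask = build_mask(genes, pathways)
weights = [[1.0] * 4, [1.0] * 4]
print(pathway_layer([1.0, 2.0, 3.0, 0.5], weights, [0.0, 0.0], mask))  # [5.0, 1.5]
```

Because the mask zeroes every non-member connection, changing the weight of a gene outside a pathway leaves that pathway node unchanged, which is what lets each hidden node be read as a pathway. In a deep learning framework this would more typically be an element-wise product of a dense weight matrix with a fixed 0/1 mask; the explicit loops here only make the wiring visible.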
Keywords: biological function; cancer gene expression; deep learning; neural networks; pathway
Year: 2021 PMID: 34768960 PMCID: PMC8584109 DOI: 10.3390/ijms222111531
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
One-sample z-test of PathDeep performance for each Molecular Signatures Database (MSigDB) collection.
| Pathway Database Source | Accuracy | p-Value |
|---|---|---|
| c2 reactome | 0.994 | 5.19 × 10⁻² |
| c6 oncogenic signatures | 0.993 | 1.12 × 10⁻¹ |
| c5 GO cc | 0.993 | 1.57 × 10⁻¹ |
| c2 kegg | 0.992 | 2.40 × 10⁻¹ |
| c3 tft | 0.991 | 2.49 × 10⁻¹ |
| c3 mir | 0.989 | 2.66 × 10⁻¹ |
| c2 cp biocarta | 0.992 | 2.73 × 10⁻¹ |
| c5 GO bp | 0.993 | 2.98 × 10⁻¹ |
| c4 cm | 0.990 | 3.25 × 10⁻¹ |
| c4 cgn | 0.992 | 3.41 × 10⁻¹ |
| c2 cp | 0.993 | 3.77 × 10⁻¹ |
| c5 GO mf | 0.992 | 3.89 × 10⁻¹ |
| c2 cgp | 0.993 | 4.16 × 10⁻¹ |
| c7 immunologic signatures | 0.977 | 4.50 × 10⁻¹ |
| c1 positional | 0.991 | 4.58 × 10⁻¹ |
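A one-sample z-test compares a sample mean x̄ with a reference mean μ₀ when the standard deviation σ is treated as known: z = (x̄ − μ₀) / (σ / √n). This section does not state what reference distribution PathDeep's accuracy was tested against or whether the test was one- or two-sided, so the function below is a generic, hypothetical sketch (`one_sample_z_test` is an illustrative name) using only the Python standard library.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def one_sample_z_test(sample: Sequence[float], mu0: float, sigma: float,
                      alternative: str = "two-sided") -> Tuple[float, float]:
    """Return (z, p) for H0: population mean == mu0, with sigma assumed known."""
    n = len(sample)
    if n == 0 or sigma <= 0:
        raise ValueError("need a non-empty sample and a positive sigma")
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    if alternative == "two-sided":
        p = math.erfc(abs(z) / math.sqrt(2))   # P(|Z| >= |z|) under N(0, 1)
    elif alternative == "greater":
        p = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) under N(0, 1)
    else:
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    return z, p


# Example: a single observation one standard deviation above the reference mean.
z, p = one_sample_z_test([1.0], mu0=0.0, sigma=1.0)
print(round(z, 3), round(p, 4))  # 1.0 0.3173
```

The standard normal tail probability is computed with `math.erfc`, which avoids a SciPy dependency; with SciPy available, `scipy.stats.norm.sf` would give the same tail.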
Figure 1. PathDeep performs better than other machine learning methods.
Figure 2. t-SNE analysis of the pathway index and pathway member gene expression. (A) Pathway member gene expression captures information that distinguishes cancer from normal tissues. (B) The pathway index captures this cancer-versus-normal information as well, as pathway member gene expression does. (C) Pathway member gene expression contains information on cancer types. (D) Like pathway member gene expression, the pathway index also contains information on cancer types.
Figure 3. Pathway contribution gene index histogram.
Figure 4. PathDeep performance using the pathway-contributing gene ratio.
Figure 5. Performance comparison between the top 57 pathway-contributing genes and 57 randomly selected genes.
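The control in Figure 5, comparing the highest-contributing genes against an equally sized random gene set, can be sketched as follows. The contribution scores, the function name `top_and_random_genes`, and the choice to draw the random set from the full gene list (rather than excluding the top genes) are assumptions for illustration; this section does not say how the random genes were drawn.

```python
import random
from typing import Dict, List, Tuple


def top_and_random_genes(scores: Dict[str, float], k: int,
                         seed: int = 0) -> Tuple[List[str], List[str]]:
    """Return the k highest-scoring genes and k genes drawn uniformly at random."""
    if k > len(scores):
        raise ValueError("k exceeds the number of genes")
    # Rank genes by contribution score, highest first (ties broken by name for determinism).
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    top = ranked[:k]
    # Hypothetical control: sample k genes without replacement from all genes.
    rng = random.Random(seed)
    control = rng.sample(sorted(scores), k)
    return top, control


# Toy contribution scores for hypothetical genes A..E.
scores = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.7, "E": 0.3}
top, control = top_and_random_genes(scores, k=2)
print(top)  # ['A', 'D']
```

In the figure's setting k would be 57; the two gene sets can then be given to the same classifier so that their accuracies are compared on equal footing. Fixing the seed keeps the random control reproducible.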
The 42 cancer-related c2 Reactome pathways and their one-sample z-test FDR q-values (only pathways with an FDR q-value < 1 × 10⁻¹⁰ are shown here; the FDR q-values of all 42 pathways are listed in Table S3).
| Pathway (c2 Reactome) | FDR q-Value |
|---|---|
| GPCR DOWNSTREAM SIGNALING | 0 |
| GPCR LIGAND BINDING | 4.54 × 10⁻¹⁰⁰ |
| NEURONAL SYSTEM | 9.86 × 10⁻⁷³ |
| SLC MEDIATED TRANSMEMBRANE TRANSPORT | 1.11 × 10⁻⁶⁴ |
| SIGNALING BY EGFR IN CANCER | 4.13 × 10⁻⁵⁵ |
| AXON GUIDANCE | 6.81 × 10⁻⁵³ |
| PEPTIDE LIGAND BINDING RECEPTORS | 1.31 × 10⁻²⁹ |
| DIABETES PATHWAYS | 9.06 × 10⁻²⁰ |
| GASTRIN CREB SIGNALING PATHWAY VIA PKC AND MAPK | 4.20 × 10⁻¹⁷ |
| SIGNALING BY FGFR MUTANTS | 9.46 × 10⁻¹⁷ |
| COLLAGEN FORMATION | 3.74 × 10⁻¹⁶ |
| FATTY ACYL COA BIOSYNTHESIS | 3.42 × 10⁻¹⁵ |
| CIRCADIAN REPRESSION OF EXPRESSION BY REV ERBA | 8.17 × 10⁻¹² |
| NFKB IS ACTIVATED AND SIGNALS SURVIVAL | 1.73 × 10⁻¹¹ |
| ACETYLCHOLINE BINDING AND DOWNSTREAM EVENTS | 6.24 × 10⁻¹¹ |
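The table reports FDR q-values, but this section does not name the multiple-testing correction. A common choice is the Benjamini-Hochberg step-up adjustment, sketched below as an assumption (the names `bh_qvalues` and `significant` are illustrative); the second function applies the q < 1 × 10⁻¹⁰ cutoff used to choose which pathways appear in the table.

```python
from typing import Dict, List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def significant(names: Sequence[str], pvalues: Sequence[float],
                threshold: float = 1e-10) -> Dict[str, float]:
    """Keep only entries whose BH q-value falls below the threshold."""
    return {n: qv for n, qv in zip(names, bh_qvalues(pvalues)) if qv < threshold}


# Toy p-values for three hypothetical pathways.
print(sorted(significant(["P1", "P2", "P3"], [1e-15, 0.5, 1e-12])))  # ['P1', 'P3']
```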
Figure 6. Data processing and analysis flowchart.
Figure 7. PathDeep model structure.
Figure 8. Workflow of the PathDeep label permutation study for identifying cancer-related pathways.
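A label permutation study can be illustrated with a generic permutation test: shuffle the cancer/normal labels many times, recompute a statistic each time, and count how often the shuffled statistic is at least as extreme as the observed one. The statistic here (the difference in mean value of one hypothetical pathway node between cancer and normal samples), the function names, the permutation count, and the fixed seed are all assumptions; the actual PathDeep workflow may retrain the network on permuted labels or use a different score.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def mean_diff(values: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of cancer samples (label 1) minus mean of normal samples (label 0)."""
    cancer = [v for v, y in zip(values, labels) if y == 1]
    normal = [v for v, y in zip(values, labels) if y == 0]
    if not cancer or not normal:
        raise ValueError("both classes must be present")
    return mean(cancer) - mean(normal)


def permutation_p_value(values: Sequence[float], labels: Sequence[int],
                        n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation test; returns (observed statistic, empirical p-value)."""
    rng = random.Random(seed)
    observed = mean_diff(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute labels, preserving class sizes
        if abs(mean_diff(values, shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the empirical p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy pathway-node values: perfectly separated cancer (1) and normal (0) samples.
obs, p = permutation_p_value([5.0] * 10 + [0.0] * 10, [1] * 10 + [0] * 10, n_perm=200)
print(obs, p < 0.05)  # 5.0 True
```

Repeating such a test for every pathway node yields one p-value per pathway, which could then be passed through a multiple-testing correction before pathways are ranked.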