Pankaj Chopra, Jinseung Lee, Jaewoo Kang, Sunwon Lee.
Abstract
Recent studies suggest that the deregulation of pathways, rather than of individual genes, may be critical in triggering carcinogenesis. Pathway deregulation is often caused by the simultaneous deregulation of more than one gene in the pathway. This suggests that robust gene pair combinations may exploit the underlying bio-molecular reactions relevant to pathway deregulation, and could therefore provide better cancer biomarkers than individual genes. To validate this hypothesis, we used gene pair combinations, called doublets, as input to cancer classification algorithms in place of the original expression values, and showed that classification accuracy improved consistently across datasets and classification algorithms. We validated the approach on nine cancer datasets with five classification algorithms: Prediction Analysis for Microarrays (PAM), C4.5 Decision Trees (DT), Naive Bayesian (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN).
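The idea of replacing per-gene expression values with gene-pair features can be sketched as follows. The figure captions in this record name three doublet types (sumdiff, mul, sign) but this section does not define them, so the definitions below are assumptions for illustration only: sumdiff as the sum and difference of the two expression values, mul as their product, and sign as the sign of their difference. The function name `make_doublets` and its parameters are hypothetical.

```python
from itertools import combinations


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def make_doublets(sample, gene_idx, kind):
    """Build doublet features for one sample from every unique pair of selected genes.

    sample   -- list of expression values, one per gene
    gene_idx -- indices of the genes selected for pairing
    kind     -- "sumdiff", "mul", or "sign" (ASSUMED definitions, see lead-in)
    """
    feats = []
    for i, j in combinations(gene_idx, 2):
        a, b = sample[i], sample[j]
        if kind == "sumdiff":
            feats.extend([a + b, a - b])   # assumed: two features per pair
        elif kind == "mul":
            feats.append(a * b)            # assumed: product of the pair
        elif kind == "sign":
            feats.append(sign(a - b))      # assumed: which gene is higher
        else:
            raise ValueError("unknown doublet kind: " + kind)
    return feats


if __name__ == "__main__":
    toy_sample = [2.0, 5.0, 1.0]           # toy values, not real data
    for kind in ("sumdiff", "mul", "sign"):
        print(kind, make_doublets(toy_sample, [0, 1, 2], kind))
```

The resulting feature vectors would then be passed to any of the five classifiers in place of the raw expression values.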
Year: 2010 PMID: 21200431 PMCID: PMC3006158 DOI: 10.1371/journal.pone.0014305
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The accuracy of sumdiff/mul/sign-PAM for the top % genes compared with the PAM accuracy for each of the nine datasets.
The microarray datasets used for classification.
| Dataset | Platform | Total Genes (N) | Total Samples (M) | Reference |
| Colon | cDNA | 2000 | 62 | Alon |
| Leukemia | Affy | 7129 | 72 | Golub |
| CNS | Affy | 7129 | 34 | Pomeroy |
| DLBCL | Affy | 7129 | 77 | Shipp |
| Lung | Affy | 12533 | 181 | Gordon |
| Prostate1 | Affy | 12600 | 102 | Singh |
| Prostate2 | Affy | 12625 | 88 | Stuart |
| Prostate3 | Affy | 12626 | 33 | Welsh |
| GCM | Affy | 16063 | 280 | Ramaswamy |
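To give a sense of scale, the number of unique gene pairs grows quadratically with the number of genes kept. The sketch below counts the pairs obtainable from the top 4% of genes (the fraction mentioned alongside the accuracy results) for a few of the gene counts in the table above. Rounding the gene count down to a whole number is an assumption, and the function name `n_unique_pairs` is hypothetical.

```python
from math import comb  # available in Python 3.8+


def n_unique_pairs(n_genes, top_percent=4):
    """Return (genes kept, unique pairs) when keeping the top `top_percent` % of genes.

    Integer arithmetic is used so the gene count is rounded down (an assumption).
    """
    kept = n_genes * top_percent // 100
    return kept, comb(kept, 2)


if __name__ == "__main__":
    # Gene counts taken from the dataset table.
    for name, n in [("Colon", 2000), ("Leukemia", 7129), ("Lung", 12533), ("GCM", 16063)]:
        kept, pairs = n_unique_pairs(n)
        print(f"{name}: {kept} genes kept, {pairs} unique pairs")
```

Under these assumptions the Colon dataset yields 80 genes and 3160 pairs, while GCM yields 642 genes and 205761 pairs, which is why restricting pairing to a small top fraction keeps the feature space manageable.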
Figure 2. The accuracy of sumdiff/mul/sign-DT for the top % genes compared with the DT accuracy for each of the nine datasets.
Figure 3. The accuracy of sumdiff/mul/sign-NB for the top % genes compared with the NB accuracy for each of the nine datasets.
Figure 4. The accuracy of sumdiff/mul/sign-SVM for the top % genes compared with the SVM accuracy for each of the nine datasets.
Figure 5. The accuracy of sumdiff/mul/sign-k-NN for the top % genes compared with the k-NN accuracy for each of the nine datasets.
LOOCV accuracy of the classifiers for the binary class expression datasets.
| Method | Leukemia | CNS | DLBCL | Colon | Pros.1 | Pros.2 | Pros.3 | Lung | GCM | Avg. |
| TSP | 93.80 | 77.90 | 98.10 | 91.10 | 95.10 | 67.60 | 97.00 | 98.30 | 75.40 | 88.26 |
| | 95.83 | 97.10 | 97.40 | 90.30 | 91.18 | 75.00 | 97.00 | 98.90 | 85.40 | 92.01 |
| DT | 73.61 | 67.65 | 80.52 | 77.42 | 87.25 | 64.77 | 84.85 | 96.13 | 77.86 | 78.90 |
| | 91.67 | 70.59 | 97.40 | 64.52 | 82.35 | 77.27 | 87.88 | 95.03 | 81.43 | 83.13 |
| | 84.72 | 55.88 | 97.40 | 79.03 | 86.27 | 69.32 | 90.91 | 92.27 | 83.21 | 82.11 |
| | 93.06 | 82.35 | 97.40 | 88.71 | 86.27 | 73.86 | 96.97 | 98.34 | 85.00 | 89.11 |
| NB | 100.00 | 82.35 | 80.52 | 56.45 | 62.75 | 73.86 | 90.91 | 97.79 | 84.29 | 80.99 |
| | 98.61 | 82.35 | 92.21 | 87.10 | 82.35 | 76.14 | 96.97 | 99.45 | 81.43 | 88.51 |
| | 97.22 | 79.41 | 89.61 | 85.48 | 85.29 | 75.00 | 100.00 | 100.00 | 82.50 | 88.28 |
| | 65.28 | 73.53 | 75.32 | 64.52 | 50.98 | 56.82 | 72.73 | 82.87 | 67.86 | 67.77 |
| | 84.72 | 82.35 | 89.61 | 74.19 | 74.51 | 73.86 | 93.94 | 98.34 | 86.79 | 84.26 |
| | 98.61 | 85.29 | 94.81 | 83.87 | 91.18 | 76.14 | 96.97 | 99.45 | 92.86 | 91.02 |
| | 97.22 | 85.29 | 93.51 | 80.65 | 85.29 | 77.27 | 100.00 | 100.00 | 91.79 | 90.11 |
| | 97.22 | 85.29 | 96.10 | 82.26 | 88.24 | 76.14 | 100.00 | 99.45 | 91.79 | 90.72 |
| SVM | 98.61 | 82.35 | 97.40 | 82.26 | 91.18 | 76.14 | 100.00 | 99.45 | 93.21 | 91.18 |
| | 98.61 | 82.35 | 96.10 | 88.71 | 93.14 | 78.41 | 100.00 | 99.45 | 91.07 | 91.98 |
| | 97.22 | 88.24 | 96.10 | 87.10 | 88.24 | 79.55 | 100.00 | 99.45 | 91.07 | 91.89 |
| | 97.22 | 79.41 | 97.40 | 79.03 | 89.22 | 75.00 | 100.00 | 99.45 | 87.50 | 89.36 |
| PAM | 94.03 | 82.35 | 85.45 | 89.52 | 90.89 | 81.25 | 94.24 | 97.90 | 82.32 | 88.66 |
| | 95.83 | 79.41 | 87.01 | 87.10 | 93.14 | 77.27 | 96.97 | 98.34 | 83.57 | 88.74 |
| | 95.83 | 85.29 | 92.21 | 90.32 | 92.16 | 79.55 | 93.94 | 98.90 | 82.86 | 90.12 |
| | 95.83 | 85.29 | 94.81 | 88.71 | 90.20 | 76.14 | 100.00 | 98.90 | 81.07 | 90.11 |
*Results obtained in [6]
Results from taking the top 4% of genes for making unique doublets.
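Leave-one-out cross-validation (LOOCV), the accuracy measure reported in the table above, holds out each sample in turn, trains on the remaining samples, and reports the percentage of held-out samples classified correctly. The sketch below illustrates the procedure with a simple 1-nearest-neighbour classifier standing in for the classifiers named in the source; it is not the authors' implementation, and the function names are hypothetical.

```python
def nearest_neighbor_predict(train_X, train_y, x):
    """Predict the label of x as the label of its closest training sample (squared Euclidean distance)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def loocv_accuracy(X, y, predict=nearest_neighbor_predict):
    """Return LOOCV accuracy in percent for feature rows X and labels y."""
    correct = 0
    for i in range(len(X)):
        # Hold out sample i and train on everything else.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return 100.0 * correct / len(X)


if __name__ == "__main__":
    # Toy feature rows and labels, not real expression data.
    X = [[0.0], [0.1], [1.0], [1.1]]
    y = ["normal", "normal", "tumor", "tumor"]
    print(f"LOOCV accuracy: {loocv_accuracy(X, y):.2f}%")
```

In general, any gene ranking or doublet selection step would need to be repeated inside each fold to avoid an optimistic bias; this record does not state how the source handled that step.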
KEGG pathways related to the top 15 doublets for the CNS dataset.
| Doublet No. | Probe 1 | Gene 1 | KEGG 1 | Probe 2 | Gene 2 | KEGG 2 |
| 1 | U40317_s_at | PTPRS | Unknown | U27459_at | ORC2L | Cell cycle |
| 2 | J00212_f_at | IFNA21 | Cytokine-cytokine receptor interaction; Regulation of autophagy; Antigen processing and presentation; Toll-like receptor signaling pathway; Jak-STAT signaling pathway; Natural killer cell mediated cytotoxicity; Autoimmune thyroid disease | U33920_at | SEMA3F | Axon guidance |
| 3 | D50924_at | DHX34 | Unknown | X04707_at | THRB | Neuroactive ligand-receptor interaction |
| 4 | U31215_s_at | GRM1 | Calcium signaling pathway; Neuroactive ligand-receptor interaction; Gap junction; Long-term potentiation; Long-term depression | M64929_at | PPP2R2A | Tight junction |
| 5 | U52828_s_at | CTNND2 | Unknown | U33267_at | GLRB | Neuroactive ligand-receptor interaction |
| 6 | D50582_at | KCNJ11 | Type II diabetes mellitus | Y10204_at | Unknown | Unknown |
| 7 | U83600_at | TNFRSF25 | Cytokine-cytokine receptor interaction | HG2260-HT2349_s_at | Unknown | Unknown |
| 8 | S77835_s_at | IL2 | Cytokine-cytokine receptor interaction; Jak-STAT signaling pathway; T cell receptor signaling pathway; Type I diabetes mellitus; Autoimmune thyroid disease; Allograft rejection; Graft-versus-host disease | M60858_rna1_at | NCL | Pathogenic Escherichia coli infection - EHEC; Pathogenic Escherichia coli infection - EPEC |
| 9 | L32179_at | AADAC | Alkaloid biosynthesis II | M14660_at | IFIT2 | Unknown |
| 10 | D50310_at | CCNI | Unknown | L78833_cds2_at | RND2 | Unknown |
| 11 | HG2417-HT2513_at | Unknown | Unknown | U35451_at | CBX1 | Unknown |
| 12 | U03090_at | PLA2G5 | Glycerophospholipid metabolism; Ether lipid metabolism; Arachidonic acid metabolism; Linoleic acid metabolism; alpha-Linolenic acid metabolism; MAPK signaling pathway; VEGF signaling pathway; Fc epsilon RI signaling pathway; Long-term depression; GnRH signaling pathway | M16594_at | GSTA2 | Glutathione metabolism; Metabolism of xenobiotics by cytochrome P450; Drug metabolism - cytochrome P450 |
| 13 | U68536_at | ZNF24 | Unknown | X64728_at | CHML | Unknown |
| 14 | L43964_at | PSEN2 | Notch signaling pathway; Alzheimer's disease | X98206_at | Unknown | Unknown |
| 15 | D13118_at | ATP5G1 | Oxidative phosphorylation; Alzheimer's disease; Parkinson's disease | U78556_at | MTMR11 | Unknown |