Xiaowei Guan, Mark R Chance, Jill S Barnholtz-Sloan.
Abstract
BACKGROUND: The identification of very small subsets of predictive variables is an important topic that has not often been considered in the literature. To discover highly predictive yet compact gene set classifiers from whole genome expression data, a non-parametric, iterative algorithm, Splitting Random Forest (SRF), was developed to robustly identify genes that distinguish between molecular subtypes. The goal is to improve prediction accuracy while accounting for sparsity.
Year: 2012 PMID: 22616791 PMCID: PMC3444418 DOI: 10.1186/2043-9113-2-13
Source DB: PubMed Journal: J Clin Bioinforma ISSN: 2043-9113
Figure 1. Flowchart of the Splitting Random Forest (SRF) algorithm.
Comparison of SRF with 50, 100 and 500 runs in the GB full dataset and the BC and OC training datasets
| Dataset | Number of runs | Prediction accuracy (change) | Number of genes |
|---|---|---|---|
| GB | 50 | 95.4% (.) | 36 |
| | 100 | 96.6% (+1.2%*) | 23 |
| | 500 | 98.9% (+3.5%**) | 29 |
| BC | 50 | 93.6% (.) | 48 |
| | 100 | 93.6% (+0%*) | 50 |
| | 500 | 95.7% (+2.1%**) | 32 |
| OC | 50 | 92.7% (.) | 189 |
| | 100 | 94.5% (+1.8%*) | 290 |
| | 500 | 94.5% (+1.8%**) | 188 |
*Comparison between 50 vs. 100 runs; ** Comparison between 50 vs. 500 runs.
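The starred changes in the table are differences in percentage points relative to the 50-run result. As a minimal sketch of that arithmetic, the snippet below recomputes them from the accuracies listed above; the dictionary layout and the function name `change_vs_reference` are illustrative choices, not part of the original analysis.

```python
# Prediction accuracies (%) by number of SRF runs, copied from the table above.
accuracy = {
    "GB": {50: 95.4, 100: 96.6, 500: 98.9},
    "BC": {50: 93.6, 100: 93.6, 500: 95.7},
    "OC": {50: 92.7, 100: 94.5, 500: 94.5},
}


def change_vs_reference(by_runs, reference=50):
    """Return the change in percentage points against the reference run count."""
    base = by_runs[reference]
    return {
        runs: round(acc - base, 1)
        for runs, acc in by_runs.items()
        if runs != reference
    }


# Hypothetical helper output: one {runs: change} mapping per dataset.
changes = {dataset: change_vs_reference(v) for dataset, v in accuracy.items()}

for dataset, delta in changes.items():
    print(dataset, delta)
```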
Figure 2. Venn diagrams of the overlap between the SRF50, published and ANOVA gene lists. A and B: GB; C and D: breast cancer (BC); E and F: ovarian cancer (OC).
Overall performance comparison of 5 gene lists in the GB, BC and OC validation datasets
| Dataset | Gene list | Number of genes | Prediction accuracy | Change in accuracy | Multi-class AUC | ACRC |
|---|---|---|---|---|---|---|
| GB | SRF50 | 36 | 80.1% | | 0.87 | 1.68 |
| | Single RF | 88 | 77.8% | −2.3% | 0.86 | 1.63 |
| | Verhaak et al. | 833 | 86.0% | 5.9% | 0.92 | 1.91 |
| | ANOVA | 8670 | 84.1% | 4.0% | 0.90 | 1.83 |
| | Top 50 ANOVA | 50 | 77.2% | −2.9% | 0.87 | 1.71 |
| BC | SRF50 | 48 | 84.0% | | 0.91 | 1.85 |
| | Single RF | 46 | 74.1% | −9.9% | 0.87 | 1.69 |
| | Parker et al. | 979 | 89.0% | 5.0% | 0.89 | 1.78 |
| | ANOVA | 4976 | 85.2% | 1.2% | 0.86 | 1.65 |
| | Top 50 ANOVA | 50 | 82.7% | −1.3% | 0.87 | 1.72 |
| OC | SRF50 | 189 | 89.8% | | 0.96 | 2.06 |
| | Single RF | 245 | 88.9% | −0.9% | 0.96 | 2.06 |
| | Tothill et al. | 2106 | 91.7% | 1.9% | 0.97 | 2.11 |
| | ANOVA | 7144 | 90.7% | 0.9% | 0.97 | 2.09 |
| | Top 50 ANOVA | 50 | 87.0% | −2.8% | 0.95 | 2.01 |
All changes in prediction accuracy were calculated using SRF50 as the referent. Multi-class AUC values were calculated by averaging all pairwise AUC values. ACRC: area covered by radar chart [29].
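The multi-class AUC in this table is described as the average of the pairwise AUC values. The sketch below shows one common way to compute such an average, assuming each sample has a predicted probability per subtype and that each class pair is scored in both directions and averaged (in the style of Hand and Till). That symmetric averaging, the function names and the toy data are assumptions for illustration; the paper's exact procedure may differ.

```python
from itertools import combinations


def binary_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def pairwise_mean_auc(y_true, proba, classes):
    """Average AUC over all unordered class pairs.

    proba[k][c] is the predicted probability that sample k belongs to class c.
    Returns the mean and the per-pair values (the per-pair values are what a
    radar chart of pairwise AUCs would plot).
    """
    pair_aucs = {}
    for a, b in combinations(classes, 2):
        # Separate class a from class b using the class-a score.
        a_vs_b = binary_auc(
            [proba[k][a] for k, y in enumerate(y_true) if y == a],
            [proba[k][a] for k, y in enumerate(y_true) if y == b],
        )
        # Separate class b from class a using the class-b score.
        b_vs_a = binary_auc(
            [proba[k][b] for k, y in enumerate(y_true) if y == b],
            [proba[k][b] for k, y in enumerate(y_true) if y == a],
        )
        pair_aucs[(a, b)] = (a_vs_b + b_vs_a) / 2
    return sum(pair_aucs.values()) / len(pair_aucs), pair_aucs


# Toy data (not from the paper): three subtypes, perfectly separated.
y_true = ["A", "A", "B", "B", "C", "C"]
proba = [
    {c: (0.8 if c == label else 0.1) for c in "ABC"} for label in y_true
]
mean_auc, per_pair = pairwise_mean_auc(y_true, proba, ["A", "B", "C"])
print(mean_auc, per_pair)
```

With four subtypes this yields six pairwise values, and with K subtypes K(K−1)/2 values.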
Subtype prediction accuracy of 5 gene lists in the GB, BC and OC validation datasets
| GB | SRF50 | 70.0% | 81.3% | 93.8% | |
| | Single RF | 77.1% | 91.7% | ||
| | Verhaak et al. | 88.0% | 91.7% | 50.0% | 100.0% |
| | ANOVA | 88.0% | 91.7% | 40.0% | 100.0% |
| | Top 50 ANOVA | 66.0% | 83.3% | 73.3% | 87.5% |
| | | ||||
| BC | SRF50 | 100.0% | 80.0% | 84.4% | |
| | Single RF | 88.9% | 87.5% | ||
| | Parker et al. | 100.0% | 54.5% | 90.0% | 93.8% |
| | ANOVA | 100.0% | 36.4% | 85.0% | 93.8% |
| | Top 50 ANOVA | 100.0% | 75.0% | 84.4% | |
| | | ||||
| OC | SRF50 | 97.6% | 88.0% | 73.9% | |
| | Single RF | 97.6% | 88.0% | 78.3% | |
| | Tothill et al. | 97.6% | 88.0% | 87.0% | 88.9% |
| | ANOVA | 100.0% | 88.0% | 78.3% | 88.9% |
| | Top 50 ANOVA | 95.2% | 88.0% | 73.9% | |
Figure 3. Radar chart of pairwise comparisons (AUC values) for the GB dataset.
Figure 4. Radar chart of pairwise comparisons (AUC values) for the BC dataset.
Figure 5. Radar chart of pairwise comparisons (AUC values) for the OC dataset.
Biological functions identified from IPA using overlapping hub gene lists of SRF50 and the published sets in the GB, BC and OC datasets
| GB | cell morphology, hematological system development and function | |
| | tumor morphology, nervous system development and function | |
| BC | developmental disorder, reproductive system disease, cellular growth and proliferation | |
| | cancer, infection mechanism, gene expression and tumor morphology | |
| | molecular transport, protein trafficking and cell cycle | |
| OC | antigen presentation, cell-to-cell signaling and interaction, cellular growth and proliferation | |
| | tissue disorders, genetic disorder and cellular assembly and organization | |
| | embryonic development and organismal development | |
| | cardiac damage, organismal injury and abnormalities | |
| | cell morphology, connective tissue development and function | |