Lucas B Edelman, Giuseppe Toia, Donald Geman, Wei Zhang, Nathan D Price.
Abstract
BACKGROUND: Identification of molecular classifiers from genome-wide gene expression analysis is an important practice for the investigation of biological systems in the post-genomic era, and one with great potential for near-term clinical impact. The 'Top-Scoring Pair' (TSP) classification method identifies pairs of genes whose relative expression correlates strongly with phenotype. In this study, we sought to assess the effectiveness of the TSP approach in the identification of diagnostic classifiers for a number of human diseases including bacterial and viral infection, cardiomyopathy, diabetes, Crohn's disease, and transformed ulcerative colitis. We examined transcriptional profiles from both solid tissues and blood-borne leukocytes.
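For reference, the pairwise score that gives TSP its name is usually written as below. This is the standard formulation from the TSP literature rather than something spelled out in the abstract, and the symbols $X_i$ (expression of gene $i$), $C_1$, $C_2$ (the two phenotype classes), $p_{ij}$ and $\Delta_{ij}$ are notation assumed here for illustration.

```latex
\begin{aligned}
p_{ij}(C_m) &= \Pr\left(X_i < X_j \mid C_m\right), \qquad m \in \{1, 2\} \\
\Delta_{ij} &= \left| \, p_{ij}(C_1) - p_{ij}(C_2) \, \right| \\
(i^{*}, j^{*}) &= \arg\max_{i < j} \Delta_{ij}
\end{aligned}
```

A new sample is then assigned to whichever class more often shows the observed ordering of $X_{i^{*}}$ and $X_{j^{*}}$, so the rule depends only on which of the two transcripts is more highly expressed.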
Year: 2009 PMID: 19961616 PMCID: PMC2797819 DOI: 10.1186/1471-2164-10-583
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Diagnostic Classification Tasks
| Classification Task | Tissue Source | Samples | GEO ID | # Probes |
|---|---|---|---|---|
| GI Stromal Tumor vs Leiomyosarcoma | GI Biopsy | 68 (37/31) | N/A | 43,931 |
| Crohn's Disease vs Healthy Controls | PBMC | 101 (59/42) | GDS1615 | 22,283 |
| Ischemic vs Idiopathic Cardiomyopathy | Cardiac Biopsy | 194 (86/108) | GSE5406 | 22,283 |
| Type I Diabetes vs Healthy Controls | PBMC | 105 (81/24) | GSE9006 | 22,283 |
| Type II Diabetes vs Healthy Controls | PBMC | 35 (12/23) | GSE9006 | 22,645 |
| Ulcerative Colitis W/WO Transformation | Colon Biopsy | 54 (11/43) | GSE3629 | 54,681 |
| Gram-Negative vs Gram-Positive Infection | PBMC | 73 (29/44) | GSE6269 | 22,283 |
| Gram-Negative vs Viral Infection | PBMC | 62 (18/44) | GSE6269 | 22,283 |
| HIV Infection vs Healthy Controls | PBMC | 86 (74/12) | GDS1449 | 8,793 |
Microarray gene expression datasets obtained from the Gene Expression Omnibus. Transcriptional analysis was performed on either local tissue biopsies or peripheral blood mononuclear cells using commercially available oligonucleotide probe arrays. For sensitivity and specificity analysis, gastrointestinal stromal tumor (GIST), ischemic cardiomyopathy, transformed ulcerative colitis, and viral infection were defined as 'positive' diagnoses.
Figure 1. Non-Overlapping TSP and k-TSP Classifiers for GIST and LMS Diagnosis. Cross-validation accuracy of the k-TSP classifier as a function of the number of top-scoring pairs removed from the microarray gene expression data of clinical GIST and LMS specimens. For k-TSP classification, k is held to a maximum of 11 pairs.
Accuracy of Two-Transcript Classifiers on Diverse Phenotypes
| Classification Task | Accuracy | Classifier Gene Pair and Annotated Functions | False Discovery |
|---|---|---|---|
| GIST/LMS | 100% | PRUNE2 (Regulation of Apoptosis) | < 10 E-5 |
| Crohn's Disease | 96.04% | TBX21 (Immune Modulation) | < 10 E-5 |
| Cardiomyopathy | 74.23% | PDE8B (Phosphodiesterase; cAMP Metabolism) | < 0.002 |
| Type I Diabetes | 91.43% | CD1D (Antigen Processing and Presentation) | < 0.002 |
| Type II Diabetes | 100% | UNC5A (Regulation of Apoptosis) | < 0.005 |
| UC Transformation | 96.3% | PAK2 (Kinase Signaling; Cell Cycle Regulation) | 0.05910 |
| Gram-Negative/Viral | 100% | CD40 (Immune Response; B Cell Proliferation) | < 10 E-4 |
| HIV Infection | 100% | GAD1 (Glutamic Acid Metabolism) | < 10 E-4 |
Top apparent accuracy, sensitivity, specificity, and false discovery rate for each dataset using a two-gene TSP classifier. The false discovery rate was based on the distribution of classifier accuracies following ten-fold random permutation of class labels.
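The permutation procedure in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the toy matrix, and the choice to report the fraction of permuted-label pairs that match or beat the observed top accuracy are all assumptions, since the caption does not give the exact estimator. The dense pairwise comparison is only practical for small gene counts.

```python
import numpy as np


def pair_accuracies(X, y):
    """Apparent accuracy of the best ordering rule for every gene pair i < j.

    X: (n_samples, n_genes) expression matrix; y: 0/1 class labels.
    Hypothetical helper written for illustration only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n0, n1 = np.sum(y == 0), np.sum(y == 1)
    # less[s, i, j] is True when gene i is below gene j in sample s.
    less = X[:, :, None] < X[:, None, :]
    p0 = less[y == 0].mean(axis=0)  # P(X_i < X_j | class 0)
    p1 = less[y == 1].mean(axis=0)  # P(X_i < X_j | class 1)
    # Accuracy of the rule "X_i < X_j means class 0"; the reverse rule scores 1 - acc.
    acc = (n0 * p0 + n1 * (1.0 - p1)) / (n0 + n1)
    acc = np.maximum(acc, 1.0 - acc)
    i, j = np.triu_indices(X.shape[1], k=1)
    return acc[i, j]


def null_fraction_at_or_above(X, y, n_perm=10, seed=0):
    """Compare the observed top accuracy with accuracies under shuffled labels."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = pair_accuracies(X, y).max()
    # Pool pair accuracies over n_perm random relabelings (ten, as in the caption).
    null = np.concatenate(
        [pair_accuracies(X, rng.permutation(y)) for _ in range(n_perm)]
    )
    return observed, float(np.mean(null >= observed))


# Toy data (made up): gene 0 is below gene 1 in class 0 and above it in class 1.
X_toy = [[1, 5, 3], [2, 6, 3], [1, 4, 2], [5, 1, 3], [6, 2, 3], [4, 1, 2]]
y_toy = [0, 0, 0, 1, 1, 1]
observed_top, null_fraction = null_fraction_at_or_above(X_toy, y_toy)
print(observed_top, null_fraction)
```

With real arrays of tens of thousands of probes, the pairwise comparison would need to be computed in blocks rather than as one dense tensor.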
Accuracy of k-TSP Classifiers
| Classification Task | Apparent Accuracy | Cross-Validation | Optimal K |
|---|---|---|---|
| GIST/LMS | 100.00% | 97.06% | 3 |
| Crohn's Disease | 98.00% | 91.10% | 7 |
| Cardiomyopathy | 85.10% | 65.00% | 7 |
| Type I Diabetes | 90.50% | 82.70% | 3 |
| Type II Diabetes | 100.00% | 82.70% | 3 |
| UC Transformation | 98.20% | 88.90% | 3 |
| Gram-Negative/Viral | 100.00% | 100.00% | 3 |
| HIV Infection | 98.80% | 94.20% | 3 |
Top apparent accuracy, leave-one-out cross-validation performance, and optimal 'k' value for each dataset achieved by the combinatoric k-TSP algorithm.
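As a rough illustration of how a k-TSP classifier and its leave-one-out estimate fit together, the sketch below picks up to k disjoint gene pairs greedily by score and lets them vote. It is a simplified stand-in under stated assumptions, not the paper's implementation: `fit_ktsp`, `predict_ktsp`, and `loocv_accuracy` are hypothetical names, k is fixed rather than chosen by an inner search, and the toy data are invented.

```python
import numpy as np


def fit_ktsp(X, y, k=3):
    """Greedily select up to k disjoint gene pairs with the highest TSP score."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    less = X[:, :, None] < X[:, None, :]
    p0 = less[y == 0].mean(axis=0)
    p1 = less[y == 1].mean(axis=0)
    delta = p0 - p1  # > 0 means "gene i below gene j" favours class 0
    i_idx, j_idx = np.triu_indices(X.shape[1], k=1)
    order = np.argsort(-np.abs(delta[i_idx, j_idx]), kind="stable")
    used, pairs = set(), []
    for t in order:
        i, j = int(i_idx[t]), int(j_idx[t])
        if i in used or j in used:
            continue  # keep the selected pairs non-overlapping
        # Third element: the class this pair votes for when X_i < X_j.
        pairs.append((i, j, 0 if delta[i, j] > 0 else 1))
        used.update((i, j))
        if len(pairs) == k:
            break
    return pairs


def predict_ktsp(pairs, x):
    """Unweighted majority vote over the selected pairs (ties go to class 0)."""
    votes = [cls if x[i] < x[j] else 1 - cls for i, j, cls in pairs]
    return int(2 * sum(votes) > len(votes))


def loocv_accuracy(X, y, k=3):
    """Leave-one-out accuracy: refit on n - 1 samples, test on the held-out one."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    hits = 0
    for s in range(len(y)):
        keep = np.arange(len(y)) != s
        pairs = fit_ktsp(X[keep], y[keep], k)
        hits += int(predict_ktsp(pairs, X[s]) == y[s])
    return hits / len(y)


# Toy data (made up): three gene pairs whose ordering flips between classes.
X_toy = np.array([[1, 2, 1, 2, 1, 2]] * 4 + [[2, 1, 2, 1, 2, 1]] * 4)
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])
pairs_toy = fit_ktsp(X_toy, y_toy, k=3)
print(pairs_toy, loocv_accuracy(X_toy, y_toy, k=3))
```

Keeping k odd avoids tied votes. The key point is that pair selection is repeated inside every fold, which is why cross-validated accuracy can fall well below apparent accuracy.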
Cross-Validation Accuracy of Two-Transcript Classifiers
| Classification Task | CV Accuracy | CV Sensitivity | CV Specificity |
|---|---|---|---|
| GIST/LMS | 97.06% | 93.55% | 100.00% |
| Crohn's Disease | 87.13% | 88.14% | 85.71% |
| Cardiomyopathy | 74.23% | 58.14% | 87.04% |
| Type I Diabetes | 91.43% | 96.30% | 75.00% |
| Type II Diabetes | 94.29% | 91.67% | 95.65% |
| UC Transformation | 83.33% | 36.36% | 95.35% |
| Gram-Negative/Viral | 96.77% | 88.89% | 100.00% |
| HIV Infection | 88.37% | 90.54% | 75.00% |
Leave-one-out cross-validation accuracy, sensitivity, and specificity for each dataset using a two-gene TSP classifier.
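The three columns above follow from a confusion matrix in the usual way. The sketch below uses the Crohn's disease row as a worked example. The counts (52 of 59 patients and 36 of 42 controls classified correctly) are back-calculated here from the reported percentages and sample sizes, not taken from the paper, and `diagnostic_metrics` is a hypothetical helper.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # fraction of positives correctly called
        "specificity": tn / (tn + fp),  # fraction of negatives correctly called
    }


# Crohn's disease vs healthy controls: 59 patients, 42 controls.
# Counts are inferred from the reported percentages (an assumption).
crohns = diagnostic_metrics(tp=52, fn=7, tn=36, fp=6)
for name, value in crohns.items():
    print(f"{name}: {100 * value:.2f}%")
# accuracy: 87.13%
# sensitivity: 88.14%
# specificity: 85.71%
```

The same arithmetic shows why the ulcerative colitis row pairs 83.33% accuracy with only 36.36% sensitivity: with 11 positive samples out of 54, overall accuracy is dominated by the majority class.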
Figure 2. Top-Scoring Classifiers and Distributions of Classifier Accuracies. A: The distribution of all possible gene pair classifiers according to accuracy in the diagnosis of GIST and LMS, for both the original data and data with randomly permuted class labels. The vertical axis represents the fraction of pairs achieving the indicated accuracy. B: Plot comparing the expression levels of the two genes from the top-scoring classifier as measured on a microarray platform, with a line of slope one and intercept zero separating the two phenotypes according to the more highly expressed transcript. (Figure adapted from data originally published in [26].) C and D: Classifier accuracy distribution and top-scoring classifier microarray gene expression values for the diagnosis of Crohn's disease from circulating leukocytes. E and F: Accuracy distribution and top-scoring classifier in the diagnosis of ischemic and idiopathic cardiomyopathy.