In-Hee Lee, Gerald H Lushington, Mahesh Visvanathan.
Abstract
BACKGROUND: Lung cancer is the leading cause of cancer death worldwide, and its treatment depends on the type and stage of the cancer detected in the patient. Molecular biomarkers that characterize the cancer phenotype are therefore a key tool in planning a therapeutic response. A common protocol for identifying such biomarkers is to use genomic microarray analysis to find genes whose expression differs by disease state or type. Data-mining techniques such as feature selection are often used to isolate, from the large set of differentially expressed genes, those whose expression patterns best discriminate between phenotypes. One such technique, Biomarker Identifier (BMI), was developed to identify features that distinguish between two data groups of interest, which makes it well suited to such studies.
Year: 2011 PMID: 21884628 PMCID: PMC3164604 DOI: 10.1186/2043-9113-1-11
Source DB: PubMed Journal: J Clin Bioinforma ISSN: 2043-9113
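The feature-selection step described in the abstract can be illustrated with a minimal filter-style sketch. It ranks genes by the absolute value of Welch's two-sample t-statistic, which roughly corresponds to the "t-test" baseline compared in the tables below; it is not BMI, whose scoring function is not given in this excerpt. The function names, the input layout (a dict of gene to expression values plus 0/1 group labels), the toy data, and the choice of Welch's unequal-variance form are all assumptions.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic (mean(a) - mean(b)) / standard error.

    Each sample needs at least two values; the statistic is undefined
    (ZeroDivisionError) if both samples have zero variance.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(expr, labels, top_k):
    """Hypothetical filter: keep the top_k genes by |t| between groups 1 and 0.

    expr   -- dict mapping gene name -> expression values, one per sample
    labels -- 0/1 group labels in the same sample order
    Returns (gene, t, direction) tuples; direction is "Up" when the gene is
    higher on average in group 1, otherwise "Down".
    """
    scored = []
    for gene, values in expr.items():
        group1 = [v for v, y in zip(values, labels) if y == 1]
        group0 = [v for v, y in zip(values, labels) if y == 0]
        t = welch_t(group1, group0)
        scored.append((gene, t, "Up" if t > 0 else "Down"))
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:top_k]


# Toy data: three hypothetical genes, three samples per group.
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 3.0, 3.2, 2.9],
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "GENE_C": [5.0, 5.1, 4.9, 3.0, 3.1, 2.8],
}
labels = [0, 0, 0, 1, 1, 1]
for gene, t, direction in rank_genes(expr, labels, top_k=2):
    print(gene, round(t, 2), direction)
```

The sign of the statistic gives the up/down regulation label of the kind shown in the top-10 gene table; any other per-gene score could replace `welch_t` without changing the ranking loop.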
Comparison of classification performance on the MAQC-II data set
| Feature selection method | Classification algorithms | | | |
|---|---|---|---|---|
| Information Gain | 0.9031 (6) | 0.9380 (25) | 0.9008 (40) | 0.9206 (50) |
| Chi-squared test | 0.8821 (1) | 0.9164 (50) | 0.9151 (4) | 0.9441 (60) |
| Relief-F | 0.8821 (1) | 0.9052 (15) | 0.8995 (50) | 0.9306 (60) |
| t-test | 0.9067 (15) | 0.9100 (20) | 0.9042 (8) | 0.9304 (40) |
| Window t-test | 0.8903 (5) | 0.9216 (5) | 0.9012 (2) | 0.9199 (10) |
| Moderated t-test | 0.8903 (6) | 0.9084 (5) | 0.8987 (1) | 0.9309 (50) |
| BMI | 0.9077 (4) | 0.9298 (15) | 0.9164 (4) | 0.9250 (9) |
Each value is the maximum AUC (10-fold cross-validation) achieved by the corresponding feature selection method and classification algorithm; the number of features used to achieve that maximum is shown in parentheses.
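Each table cell reports a maximum over feature-set sizes. The sketch below shows one way such a number could be produced: for each candidate size k, genes are ranked inside each training fold, a classifier scores the held-out fold, and a pooled 10-fold AUC is computed; the best AUC and its k are then reported. The ranker (absolute mean difference), the nearest-centroid classifier, the candidate grid, and pooling scores across folds are assumptions for illustration; this excerpt does not name the four classification algorithms or state whether selection was repeated within folds.

```python
import random
from statistics import mean


def auc(scores, labels):
    """ROC AUC: probability a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def rank_features(X, y, rows):
    """Stand-in ranker: order features by |class-1 mean - class-0 mean| on the given rows."""
    def gap(j):
        ones = [X[i][j] for i in rows if y[i] == 1]
        zeros = [X[i][j] for i in rows if y[i] == 0]
        return abs(mean(ones) - mean(zeros))
    return sorted(range(len(X[0])), key=gap, reverse=True)


def centroid_scores(X, y, train, test, feats):
    """Stand-in classifier: squared distance to class-0 centroid minus distance to class-1 centroid."""
    c1 = {j: mean(X[i][j] for i in train if y[i] == 1) for j in feats}
    c0 = {j: mean(X[i][j] for i in train if y[i] == 0) for j in feats}
    scores = []
    for i in test:
        d1 = sum((X[i][j] - c1[j]) ** 2 for j in feats)
        d0 = sum((X[i][j] - c0[j]) ** 2 for j in feats)
        scores.append(d0 - d1)  # positive means the sample sits closer to class 1
    return scores


def cv_auc(X, y, k, folds=10, seed=0):
    """Pooled AUC over `folds`-fold cross-validation using the top-k features of each training fold."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    all_scores, all_labels = [], []
    for f in range(folds):
        test = order[f::folds]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        feats = rank_features(X, y, train)[:k]  # selection sees training rows only
        all_scores += centroid_scores(X, y, train, test, feats)
        all_labels += [y[i] for i in test]
    return auc(all_scores, all_labels)


def best_feature_count(X, y, grid=(1, 2, 5, 10)):
    """Return (maximum AUC, number of features), the pair reported in each table cell."""
    results = {k: cv_auc(X, y, k) for k in grid}
    k_best = max(results, key=results.get)  # first k reaching the maximum
    return results[k_best], k_best
```

Here `X` is a list of sample rows (genes as columns) and `y` holds 0/1 phenotype labels. Another ranker (information gain, chi-squared, or BMI) or classifier can be swapped in without changing the cross-validation loop; keeping selection inside the training folds avoids scoring features on the samples used to estimate AUC.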
Comparison of classification performance on the airway data set
| Feature selection method | Classification algorithms | | | |
|---|---|---|---|---|
| Information Gain | 0.6853 (40) | 0.8006 (4) | 0.8297 (50) | 0.8620 (60) |
| Chi-squared test | 0.7052 (20) | 0.8029 (60) | 0.7997 (3) | 0.8309 (50) |
| Relief-F | 0.6633 (25) | 0.7825 (9) | 0.8329 (25) | 0.8685 (60) |
| t-test | 0.6902 (8) | 0.7822 (4) | 0.8402 (4) | 0.8121 (8) |
| Window t-test | 0.6856 (20) | 0.7817 (30) | 0.8367 (20) | 0.8093 (40) |
| Moderated t-test | 0.6878 (6) | 0.7875 (5) | 0.8329 (5) | 0.8115 (20) |
| BMI | 0.7572 (9) | 0.8005 (5) | 0.8299 (5) | 0.8212 (10) |
Each value is the maximum AUC (10-fold cross-validation) achieved by the corresponding feature selection method and classification algorithm; the number of features used to achieve that maximum is shown in parentheses.
Figure 1: The median ranks of validated genes in the airway data set by various feature selection methods.
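The statistic summarized in Figure 1 can be computed as follows: for one method's ranked gene list, look up the 1-based position of each validated gene and take the median, so a lower value means the method places validated genes nearer the top. Skipping validated genes that are absent from a ranking is an assumption; the excerpt does not say how such genes were handled. The gene names and method labels are hypothetical.

```python
from statistics import median


def median_rank(ranked_genes, validated):
    """Median 1-based rank of the validated genes in one method's ranking.

    Validated genes missing from the ranking are skipped (an assumption);
    raises statistics.StatisticsError if none of them are ranked.
    """
    position = {gene: r for r, gene in enumerate(ranked_genes, start=1)}
    ranks = [position[g] for g in validated if g in position]
    return median(ranks)


# Hypothetical rankings from two feature selection methods.
rankings = {
    "method_1": ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"],
    "method_2": ["GENE_E", "GENE_B", "GENE_A", "GENE_D", "GENE_C"],
}
validated = ["GENE_B", "GENE_D", "GENE_E"]
for method, ranked in rankings.items():
    print(method, median_rank(ranked, validated))
# prints:
# method_1 4
# method_2 2
```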
Top 10 genes selected by BMI
| Probe ID | Symbol | Regulation | Name |
|---|---|---|---|
| 201694_s_at | EGR1 | Up | early growth response 1 |
| 202056_at | KPNA1 | Up | karyopherin alpha 1 (importin alpha 5) |
| 203265_s_at | MAP2K4 | Up | mitogen-activated protein kinase kinase 4 |
| 207283_at | RPL23AP13 | Down | ribosomal protein L23a pseudogene 13 |
| 211612_s_at | IL13RA1 | Up | interleukin 13 receptor, alpha 1 |
| 214261_s_at | ADH6 | Up | alcohol dehydrogenase 6 (class V) |
| 216609_at | TXN | Down | Full length insert cDNA clone YI46D09 |
| 219233_s_at | GSDMB | Down | gasdermin B |
| 222339_x_at | - | Down | - |
| 34206_at | ARAP1 | Down | ArfGAP with RhoGAP domain, ankyrin repeat and PH domain 1 |
Classification performance using biomarkers selected by BMI versus biomarkers from the original literature
| Classifier | Biomarkers by BMI | | | Biomarkers from original literature | | |
|---|---|---|---|---|---|---|
| Naïve Bayes | 0.7938++ | 0.7006++ | 0.7489++ | 0.7117 | 0.6644 | 0.6872 |
| SVM | 0.8134++ | 0.7056++ | 0.7615++ | 0.6622 | 0.6593 | 0.6607 |
| Neural Network | 0.7242++ | 0.6422 | 0.6848 | 0.6956 | 0.7459++ | 0.7217++ |
| k-Nearest Neighbor | 0.8325++ | 0.6144 | 0.7275++ | 0.6378 | 0.6964++ | 0.6682 |
| Random Forest | 0.7139++ | 0.7328++ | 0.7230++ | 0.6872 | 0.6680 | 0.6773 |
++ and + denote superior performance at the 1% and 5% significance levels, respectively.
KEGG pathways and PANTHER classifications associated with the top 80 genes selected by BMI
| KEGG pathway name | p-value | Associated genes |
|---|---|---|
| Colorectal cancer | 1.3809E-4 | FOS, MSH2, APC |
| Pathways in cancer | 0.0019 | FOS, MSH2, APC, TCEB2 |
| Metabolic pathways | 0.0021 | ADH6, SAT1, EXT2, TGDS, BTD, PRPS1, AGPS |
| Biotin metabolism | 0.0032 | BTD |
| MAPK signaling pathway | 0.0094 | DUSP10, MAP2K4, FOS |
| Cytokine-cytokine receptor interaction | 0.0098 | CXCR4, ACVR2A, IL13RA1 |
| Toll-like receptor signaling pathway | 0.0117 | FOS, MAP2K4 |
| Tight junction | 0.0196 | PPP2R2D, INADL |
| Mismatch repair | 0.0361 | MSH2 |
| Glycosaminoglycan biosynthesis - heparan sulfate | 0.0408 | EXT2 |
| Pentose phosphate pathway | 0.0423 | PRPS1 |
| Endocytosis | 0.0428 | ARAP1, CXCR4 |
| PANTHER classification | p-value | Associated genes |
| Oxidative stress response | 8.6417E-5 | TXN, MAP2K4, DUSP10 |
| O-antigen biosynthesis | 0.0064 | TGDS |
| T cell activation | 0.0083 | FOS, B2M |
| Interleukin signaling pathway | 0.0108 | IL13RA1, FOS |
| Apoptosis signaling pathway | 0.0133 | ATF3, FOS |
| FGF signaling pathway | 0.0135 | MAP2K4, PPP2R2D |
| Axon guidance mediated by Slit/Robo | 0.0253 | CXCR4 |
| Hypoxia response via HIF activation | 0.0408 | TXN |
| Insulin/IGF pathway-mitogen activated protein kinase kinase/MAP kinase cascade | 0.0484 | FOS |
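The tables list a significance value for each pathway but do not state which test produced it. A common choice for over-representation analysis is the one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test), sketched below as an assumption rather than as the authors' procedure. Here N is the number of background genes, K the number annotated to the pathway, n the number of selected genes (80 in these tables), and k the number of selected genes in the pathway; the background and pathway sizes in the example are hypothetical, so the result is not meant to match any value in the table.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    X counts pathway genes among n genes drawn without replacement from a
    background of N genes, K of which belong to the pathway.
    """
    if not 0 <= k <= n <= N or not 0 <= K <= N:
        raise ValueError("require 0 <= k <= n <= N and 0 <= K <= N")
    hits = range(k, min(n, K) + 1)  # empty when k exceeds the possible overlap
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in hits)
    return tail / comb(N, n)  # exact integer arithmetic until this final division


# Hypothetical numbers: 80 selected genes, 3 of them in a 60-gene pathway,
# out of a 20,000-gene background.
print(enrichment_p(k=3, n=80, K=60, N=20000))
```

Because `math.comb` returns exact integers, the sum stays exact even for genome-sized backgrounds; only the final division rounds to a float.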
NCI-Nature pathway interactions associated with the top 80 genes selected by BMI
| NCI-Nature Pathway Interaction | p-value | Associated genes |
|---|---|---|
| ATF-2 transcription factor network | 6.8276E-5 | ATF3, FOS, DUSP10 |
| Downstream signaling in naïve CD8+ T cells | 1.8173E-4 | B2M, EGR1, FOS |
| Signaling events mediated by Hepatocyte Growth Factor Receptor (c-Met) | 2.6255E-4 | EGR1, MAP2K4, APC |
| Ephrin B reverse signaling | 8.6116E-4 | CXCR4, MAP2K4 |
| ErbB1 downstream signaling | 8.7013E-4 | MAP2K4, FOS, EGR1 |
| Regulation of p38-alpha and p38-beta | 0.0011 | DUSP10, MAP2K4 |
| Direct p53 effectors | 0.0013 | APC, MSH2, ATF3 |
| Trk receptor signaling mediated by the MAPK pathway | 0.0014 | EGR1, FOS |
| RhoA signaling pathway | 0.0021 | FOS, MAP2K4 |
| IL6-mediated signaling events | 0.0023 | MAP2K4, FOS |
| Presenilin action in Notch and Wnt signaling | 0.0024 | FOS, APC |
| Calcineurin-regulated NFAT-dependent transcription in lymphocytes | 0.0025 | EGR1, FOS |
| Regulation of Androgen receptor activity | 0.0027 | EGR1, MAP2K4 |
| Fc-epsilon receptor I signaling in mast cells | 0.0041 | FOS, MAP2K4 |
| IL12-mediated signaling events | 0.0045 | B2M, FOS |
| HIF-1-alpha transcription factor network | 0.0052 | FOS, CXCR4 |
| CDC42 signaling events | 0.0058 | APC, MAP2K4 |
| Regulation of nuclear SMAD2/3 signaling | 0.0075 | FOS, ATF3 |
| Glucocorticoid receptor regulatory network | 0.0077 | FOS, EGR1 |
| Sumoylation by RanBP2 regulates transcriptional repression | 0.0174 | RANBP2 |
| JNK signaling in the CD4+ TCR pathway | 0.0206 | MAP2K4 |
| Ras signaling in the CD4+ TCR pathway | 0.0222 | FOS |
| Hypoxic and oxygen homeostasis regulation of HIF-1-alpha | 0.0284 | TCEB2 |
| Cellular roles of Anthrax toxin | 0.0346 | MAP2K4 |
| VEGFR3 signaling in lymphatic endothelium | 0.0361 | MAP2K4 |
| S1P2 pathway | 0.0377 | FOS |
| PDGFR-alpha signaling pathway | 0.0377 | FOS |
| ALK1 signaling events | 0.0392 | ACVR2A |
| Signaling events mediated by PRL | 0.0392 | EGR1 |
| TRAIL signaling pathway | 0.0438 | MAP2K4 |
| Regulation of CDC42 activity | 0.0453 | APC |
| S1P3 pathway | 0.0453 | CXCR4 |
| CD40/CD40L signaling | 0.0469 | MAP2K4 |
| Canonical Wnt signaling pathway | 0.0469 | APC |
| p38 MAPK signaling pathway | 0.0469 | TXN |
| Calcium signaling in the CD4+ TCR pathway | 0.0484 | FOS |
| Nongenotropic Androgen signaling | 0.0484 | FOS |
| Nephrin/Neph1 signaling in the kidney podocyte | 0.0499 | MAP2K4 |
| IL12 signaling mediated by STAT4 | 0.0499 | FOS |