Pei-Chun Chen, Su-Yun Huang, Wei J Chen, Chuhsing K Hsiao.
Abstract
BACKGROUND: Selection of influential genes with microarray data often faces the difficulties of a large number of genes and a relatively small group of subjects. In addition to the curse of dimensionality, many gene selection methods weight the contribution from each individual subject equally. This equal-contribution assumption cannot account for the possible dependence among subjects who associate similarly to the disease, and may restrict the selection of influential genes.
Year: 2009 PMID: 19187562 PMCID: PMC2669483 DOI: 10.1186/1471-2105-10-44
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The gene weighted sums, proportions, cumulative proportions, and corresponding gene numbers of the selected genes in acute leukemia data with two classes
| weighted sum | proportion | cumulative proportions | gene number | description |
| 150.6797 | 0.1847 | 0.1847 | 6201 | interleukin-8 precursor |
| 125.3594 | 0.1536 | 0.3383 | 1882 | CST3 Cystatin C (amyloid angiopathy and cerebral hemorrhage) |
| 117.9711 | 0.1446 | 0.4829 | 2402 | Azurocidin gene |
| 92.7434 | 0.1137 | 0.5966 | 5552 | probable G protein-coupled receptor LCR1 homolog |
| 72.3649 | 0.0887 | 0.6853 | 1779 | MPO Myeloperoxidase |
| 69.6762 | 0.0854 | 0.7707 | 6181 | PTMA gene extracted from Human prothymosin alpha mRNA |
| 64.7264 | 0.0793 | 0.8500 | 1763 | Thymosin beta-4 mRNA |
| 61.0759 | 0.0749 | 0.9249 | 2345 | G-gamma globin gene extracted from H. sapiens G-gamma globin and A-gamma globin genes |
| 55.6241 | 0.0682 | 0.9931 | 5308 | GDP-dissociation inhibitor protein (Ly-GDI) mRNA |
| 5.6697 | 0.0069 | 1 | 5648 | HLA-B null allele mRNA |
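The proportion and cumulative-proportion columns can be checked against the weighted sums with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: it takes each proportion to be the weighted sum divided by the total over the ten listed genes, and each cumulative proportion to be the running sum of the rounded proportions. The source does not state the formula, and the function names `proportions` and `cumulative` are hypothetical.

```python
from itertools import accumulate

# Weighted sums copied from the two-class acute leukemia table.
weighted_sums = [150.6797, 125.3594, 117.9711, 92.7434, 72.3649,
                 69.6762, 64.7264, 61.0759, 55.6241, 5.6697]


def proportions(sums):
    """Share of each weighted sum in the total (assumed definition), to 4 decimals."""
    total = sum(abs(s) for s in sums)
    return [round(abs(s) / total, 4) for s in sums]


def cumulative(props):
    """Running sum of the rounded proportions (assumed definition), to 4 decimals."""
    return [round(c, 4) for c in accumulate(props)]


props = proportions(weighted_sums)
cums = cumulative(props)

# Print one line per gene: weighted sum, proportion, cumulative proportion.
for s, p, c in zip(weighted_sums, props, cums):
    print(f"{s:10.4f}  {p:.4f}  {c:.4f}")
```

Under these assumptions the computed columns agree with the values printed in the table, which suggests the cumulative column was accumulated from the rounded proportions rather than from the raw sums.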
The gene weighted sums, proportions, cumulative proportions, and corresponding gene numbers of the selected genes in acute leukemia data with three classes
| weighted sum | proportion | cumulative proportions | gene number | description |
| 206.1576 | 0.1583 | 0.1583 | 6201 | interleukin-8 precursor |
| 196.3753 | 0.1508 | 0.3091 | 1674 | FTL Ferritin, light polypeptide |
| 155.8362 | 0.1196 | 0.4287 | 1882 | CST3 Cystatin C (amyloid angiopathy and cerebral hemorrhage) |
| 143.1404 | 0.1099 | 0.5386 | 5552 | probable G protein-coupled receptor LCR1 homolog |
| 141.0207 | 0.1083 | 0.6469 | 2402 | Azurocidin gene |
| 120.1933 | 0.0923 | 0.7392 | 6209 | VIM Vimentin |
| 112.4293 | 0.0863 | 0.8255 | 4017 | HLA class II histocompatibility antigen, DR alpha chain precursor |
| 96.7316 | 0.0743 | 0.8998 | 5716 | RPS3 Ribosomal protein S3 |
| 83.2569 | 0.0639 | 0.9637 | 1779 | MPO Myeloperoxidase |
| 47.4576 | 0.0364 | 1 | 5648 | HLA-B null allele mRNA |
Testing accuracies under different procedures for the acute leukemia data
| Binary classes | | | |
| Procedures | Classifier | No. of genes | Accuracy |
| A: Proposed selection and criterion | | | |
| RLS-SVR | | | |
| | +KFDA | 4 | 0.9412 |
| | +SVM | 4 | 0.9412 |
| ∑ | +KFDA | 7 | 1 |
| | +SVM | 7 | 1 |
| | +KFDA | 10 | 1 |
| | +SVM | 10 | 1 |
| B: Other selection procedures | | | |
| BVS | +KFDA | 5 | 0.9706 |
| | +SVM | 5 | 0.9706 |
| BMA | +KFDA | 20 | 1 |
| | +SVM | 20 | 1 |
| SGS1 | +KFDA | 10 | 0.9118 |
| | +SVM | 10 | 0.9118 |
| SGS2 | +KFDA | 10 | 0.9412 |
| | +SVM | 10 | 0.9412 |
| C: Selection and classification together | | | |
| IFFS | | 14 | 1 |
| SVM-RFE | | 8 | 1 |
| BVS | | 5 | 0.9706 |
| BMA | | 20 | 0.9412 |
| Three classes | | | |
| Procedures | Classifier | No. of genes | Accuracy |
| A: Proposed selection and criterion | | | |
| RLS-SVR | | | |
| | +KFDA | 5 | 0.7353 |
| | +SVM | 5 | 0.9118 |
| ∑ | +KFDA | 7 | 1 |
| | +SVM | 7 | 1 |
| | +KFDA | 10 | 0.9706 |
| | +SVM | 10 | 0.9412 |
| B: Other selection procedures | | | |
| BMA | +KFDA | 15 | 0.9706 |
| | +SVM | 15 | 0.9706 |
| SGS1 | +KFDA | 10 | 0.9118 |
| | +SVM | 10 | 0.8529 |
| SGS2 | +KFDA | 10 | 0.8824 |
| | +SVM | 10 | 0.8529 |
| C: Selection and classification together | | | |
| IFFS | | 23 | 1 |
| BMA | | 15 | 0.9706 |
The gene weighted sums, proportions, cumulative proportions, and corresponding gene numbers of the selected genes in colon cancer data
| weighted sum | proportion | cumulative proportions | gene number | description |
| 33.2835 | 0.2522 | 0.2522 | 164 | interferon-inducible protein 1-8D (human); contains MSR1 repetitive element |
| 28.4860 | 0.2158 | 0.4680 | 1378 | 80.7 KD alpha trans-inducing protein (Bovine herpesvirus type 1) |
| 21.0143 | 0.1592 | 0.6272 | 115 | H. sapiens p27 mRNA |
| 13.9334 | 0.1056 | 0.7328 | 249 | human desmin gene, complete cds. |
| 10.4369 | 0.0791 | 0.8119 | 13 | H. sapiens ACTB mRNA for mutant beta-actin (beta'-actin) |
| 8.2575 | 0.0626 | 0.8745 | 16 | human tra1 mRNA for human homologue of murine tumor rejection antigen gp96 |
| 5.9915 | 0.0454 | 0.9199 | 33 | 40S ribosomal protein S24 (human) |
| 5.8151 | 0.0441 | 0.9640 | 167 | IG lambda chain C regions (human) |
| 3.7171 | 0.0282 | 0.9922 | 14 | myosin light chain ALKALI, smooth-muscle isoform (human) |
| 1.0403 | 0.0079 | 1 | 44 | ubiquitin (human) |
Testing accuracies under different procedures for colon cancer data
| Procedures | Classifier | No. of genes | Accuracy | SD |
| A: Proposed selection and criterion | | | | |
| RLS-SVR | | | | |
| | +KFDA | 4 | 0.9250 | 0.0083 |
| | +SVM | 4 | 0.9067 | 0.0082 |
| ∑ | +KFDA | 5 | 0.9200 | 0.0163 |
| | +SVM | 5 | 0.9183 | 0.0157 |
| | +KFDA | 10 | 0.9400 | 0.0186 |
| | +SVM | 10 | 0.9300 | 0.0167 |
| B: Other selection procedures | | | | |
| EB | +KFDA | 9 | 0.9283 | 0.0076 |
| | +SVM | 9 | 0.9200 | 0.0194 |
| SGS1 | +KFDA | 10 | 0.925 | 0.0083 |
| | +SVM | 10 | 0.9100 | 0.0153 |
| SGS2 | +KFDA | 10 | 0.9283 | 0.0076 |
| | +SVM | 10 | 0.9100 | 0.0186 |
| C: Selection and classification together | | | | |
| IFFS | | 5 | 0.8806 | 0.0167 |
| SVM-RFE | | 8 | 0.9032 | n.a. |
| EB | | 9 | 0.919 | n.a. |
The gene weighted sums, proportions, cumulative proportions, and corresponding gene numbers of the selected genes in SRBCT data
| weighted sum | proportion | cumulative proportions | gene number | description |
| 366.2124 | 0.2283 | 0.2283 | 509 | human DNA for insulin-like growth factor II (IGF-2); exon 7 and additional ORF |
| 293.0313 | 0.1827 | 0.4110 | 187 | insulin-like growth factor 2 (somatomedin A) |
| 139.3697 | 0.0869 | 0.4979 | 246 | caveolin 1, caveolae protein, 22 kD |
| 130.3774 | 0.0813 | 0.5792 | 1955 | fibroblast growth factor receptor 4 |
| 120.8319 | 0.0753 | 0.6545 | 1645 | olfactomedin-related ER localized protein |
| 118.9978 | 0.0742 | 0.7287 | 545 | antigen identified by monoclonal antibodies 12E7, F21 and O13 |
| 110.2948 | 0.0688 | 0.7975 | 1954 | follicular lymphoma variant translocation 1 |
| 109.6586 | 0.0684 | 0.8659 | 1389 | Fc fragment of IgG, receptor, transporter, alpha |
| 108.1788 | 0.0674 | 0.9333 | 1372 | nucleolin |
| 107.1303 | 0.0667 | 1 | 430 | |
Testing accuracies under different procedures for SRBCT data
| Procedures | Classifier | No. of genes | Accuracy |
| A: Proposed selection and criterion | | | |
| RLS-SVR | | | |
| | +KFDA | 2 | 0.6 |
| | +SVM | 2 | 0.55 |
| ∑ | +KFDA | 8 | 0.95 |
| | +SVM | 8 | 0.95 |
| | +KFDA | 10 | 1 |
| | +SVM | 10 | 1 |
| B: Other selection procedures | | | |
| EB | +KFDA | 14 | 1 |
| | +SVM | 14 | 1 |
| SGS1 | +KFDA | 10 | 0.8 |
| | +SVM | 10 | 0.7 |
| SGS2 | +KFDA | 10 | 0.85 |
| | +SVM | 10 | 0.85 |
| C: Selection and classification together | | | |
| EB | | 14 | 1 |
The gene weighted sums, proportions, cumulative proportions, and corresponding gene numbers of the selected genes in breast cancer data
| weighted sum | proportion | cumulative proportions | gene number |
| 68.8881 | 0.1897 | 0.1897 | 422 |
| 49.4341 | 0.1361 | 0.3258 | 2886 |
| 45.0788 | 0.1241 | 0.4499 | 1612 |
| 42.2519 | 0.1163 | 0.5662 | 114 |
| 39.0654 | 0.1076 | 0.6738 | 1066 |
| 29.6513 | 0.0816 | 0.7554 | 3023 |
| 25.4254 | 0.0700 | 0.8254 | 719 |
| 25.0111 | 0.0689 | 0.8943 | 1084 |
| 20.1092 | 0.0554 | 0.9496 | 497 |
| 18.2996 | 0.0504 | 1 | 1561 |
Testing accuracies under different procedures for breast cancer data
| Procedures | Classifier | No. of genes | Accuracy |
| A: Proposed selection and criterion | | | |
| RLS-SVR | | | |
| | +KFDA | 5 | 0.9091 |
| | +SVM | 5 | 0.9545 |
| ∑ | +KFDA | 7 | 0.9091 |
| | +SVM | 7 | 0.9545 |
| | +KFDA | 10 | 0.9545 |
| | +SVM | 10 | 0.9545 |
| B: Other selection procedures | | | |
| MBGS | +KFDA | 10 | 0.9545 |
| | +SVM | 10 | 1 |
| SGS1 | +KFDA | 10 | 0.9091 |
| | +SVM | 10 | 0.9545 |
| SGS2 | +KFDA | 10 | 0.9091 |
| | +SVM | 10 | 0.9545 |
| C: Selection and classification together | | | |
| BMA | | 13–18 | 0.7273 |
| MBGS | | 10 | 1 |
The gene weighted sums, proportions, cumulative proportions, and corresponding gene numbers of the selected genes in lung cancer data
| weighted sum | proportion | cumulative proportions | gene number | description |
| 211.3186 | 0.1600 | 0.1600 | 732 | GRO2 oncogene |
| 208.0781 | 0.1576 | 0.3176 | 2722 | ligand of neuronal nitric oxide synthase with carboxyl-terminal PDZ domain |
| 191.642 | 0.1451 | 0.46278 | 2194 | fatty acid binding protein 7, brain |
| 158.3931 | 0.1200 | 0.5827 | 3243 | bridging integrator 1 |
| 142.839 | 0.1082 | 0.6909 | 2010 | progesterone binding protein |
| 121.546 | 0.0921 | 0.7830 | 2096 | interferon regulatory factor 3 |
| 106.1448 | 0.0804 | 0.8634 | 1881 | occludin |
| 102.9994 | 0.0780 | 0.9414 | 2987 | apoptosis-associated tyrosine kinase |
| 46.473 | 0.0352 | 0.9766 | 215 | ribonuclease, RNase A family, 1 (pancreatic) |
| 30.9358 | 0.0234 | 1 | 270 | UNC13 (C. elegans)-like |
Testing accuracies under different procedures for lung cancer data
| Procedures | Classifier | No. of genes | Accuracy | SD |
| A: Proposed selection and criterion | | | | |
| RLS-SVR | | | | |
| | +KFDA | 5 | 0.903 | 0.0082 |
| | +SVM | 5 | 0.9051 | 0.0111 |
| ∑ | +KFDA | 7 | 0.9179 | 0.0059 |
| | +SVM | 7 | 0.9097 | 0.0065 |
| | +KFDA | 10 | 0.9222 | 0.009 |
| | +SVM | 10 | 0.9071 | 0.0104 |
| B: Other selection procedures | | | | |
| SGS1 | +KFDA | 10 | 0.9005 | 0.0062 |
| | +SVM | 10 | 0.9005 | 0.0052 |
| SGS2 | +KFDA | 10 | 0.8077 | 0.0164 |
| | +SVM | 10 | 0.8513 | 0.0015 |
| C: Selection and classification together | | | | |
| SGS1 | | 98 | 0.938 | n.a. |
| SGS2 | | 99 | 0.931 | n.a. |
Figure 1. Accuracies with respect to different numbers of genes.