Yong Mao, Xiaobo Zhou, Daoying Pi, Youxian Sun, Stephen T C Wong.
Abstract
We investigate multiclass cancer classification with gene selection from gene expression data. We propose two multiclass classifiers with gene selection: a fuzzy support vector machine (FSVM) with gene selection, and a binary classification tree based on SVM (BCT-SVM) with gene selection. Using the F test and SVM-based recursive feature elimination (SVM-RFE) as gene selection methods, we test three combinations in our experiments: BCT-SVM with F test, BCT-SVM with SVM-RFE, and FSVM with SVM-RFE. To accelerate computation, the strongest genes are preselected. The proposed techniques are applied to breast cancer data, small round blue-cell tumors, and acute leukemia data. Compared with existing multiclass cancer classifiers, and with the BCT-SVM with F test and BCT-SVM with SVM-RFE classifiers described in this paper, FSVM with SVM-RFE can find the most important genes that affect certain types of cancer with high recognition accuracy.
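As a rough illustration of F-test gene ranking, the sketch below scores each gene with the one-way ANOVA F statistic (between-class variance over within-class variance) and keeps the highest-scoring genes. It is a minimal sketch under assumptions: the function names `f_statistic` and `rank_genes_by_f` and the toy data are hypothetical, and the paper's exact F-test variant and preselection threshold are not given in this abstract.

```python
from collections import defaultdict


def f_statistic(values, labels):
    """One-way ANOVA F statistic for one gene measured across labelled samples."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-class sum of squares: how far each class mean is from the grand mean.
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # Within-class sum of squares: spread of samples around their own class mean.
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if within == 0.0:
        return float("inf") if between > 0.0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def rank_genes_by_f(expression, labels, top_k):
    """Return the indices of the top_k genes by F statistic (rows = samples)."""
    n_genes = len(expression[0])
    scores = [f_statistic([row[g] for row in expression], labels) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_k]


# Toy data (hypothetical): 6 samples, 3 classes, 3 genes.
# Gene 1 separates the classes sharply; genes 0 and 2 barely do.
toy_expression = [
    [1.0, 0.1, 2.0],
    [2.0, 0.2, 2.5],
    [1.5, 5.0, 2.2],
    [1.8, 5.1, 3.0],
    [1.2, 9.9, 2.1],
    [2.1, 10.0, 2.9],
]
toy_labels = ["A", "A", "B", "B", "C", "C"]

print(rank_genes_by_f(toy_expression, toy_labels, top_k=2))  # [1, 2]
```

A filter of this kind scores every gene independently, which is why it is cheap enough to use as a preselection step before a costlier wrapper method.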
Year: 2005 PMID: 16046822 PMCID: PMC1184049 DOI: 10.1155/JBB.2005.160
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Figure 1. Binary classification tree based on SVM with gene selection.
Algorithm 1. The FSVM with gene selection training algorithm.
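The body of Algorithm 1 is not reproduced on this page, so the following is not the authors' algorithm. It is only a minimal sketch of the recursive feature elimination idea named in the abstract, for a single binary sub-problem: train a linear SVM, drop the gene with the smallest squared weight, and repeat. The trainer here is a plain subgradient-descent stand-in written for self-containment; `train_linear_svm`, `svm_rfe`, the hyperparameters, and the toy data are all assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    X: list of samples (lists of floats); y: labels in {-1, +1}.
    A simple stand-in for a real SVM solver, used only for illustration.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute to the hinge subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep):
    """Eliminate one feature per round (smallest w_j**2) until n_keep remain."""
    surviving = list(range(len(X[0])))
    while len(surviving) > n_keep:
        X_sub = [[row[j] for j in surviving] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(surviving)), key=lambda k: w[k] ** 2)
        del surviving[worst]
    return surviving


# Toy data (hypothetical): gene 2 is strongly informative, gene 0 weakly,
# genes 1 and 3 are noise.
X = [
    [0.6, 0.1, 2.0, -0.2],
    [0.4, -0.1, 1.8, 0.1],
    [0.5, 0.0, 2.2, 0.1],
    [-0.5, 0.1, -2.0, 0.2],
    [-0.4, -0.1, -1.9, -0.1],
    [-0.6, 0.0, -2.1, -0.1],
]
y = [1, 1, 1, -1, -1, -1]

print(svm_rfe(X, y, n_keep=2))  # [0, 2]
```

Removing one gene per round is the simplest schedule; with thousands of genes, implementations commonly remove a batch per round, which is one reason a cheap preselection step helps.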
Table 1. Index numbers of the strongest genes selected in the hereditary breast cancer dataset.
| No | FSVM with SVM-RFE: 1 | FSVM with SVM-RFE: 2 | FSVM with SVM-RFE: 3 | BCT-SVM with F test: 1 | BCT-SVM with F test: 2 | BCT-SVM with SVM-RFE: 1 | BCT-SVM with SVM-RFE: 2 |
|---|---|---|---|---|---|---|---|
| 1 | 1008 | 1859 | 422 | 501 | 1148 | 750 | 1999 |
| 2 | 955 | 1008 | 2886 | 2984 | 838 | 860 | 3009 |
| 3 | 1479 | 10 | 343 | 3104 | 1859 | 1008 | 158 |
| 4 | 2870 | 336 | 501 | 422 | 272 | 422 | 2761 |
| 5 | 538 | 158 | 92 | 2977 | 1008 | 2804 | 247 |
| 6 | 336 | 1999 | 3004 | 2578 | 1179 | 1836 | 1859 |
| 7 | 3154 | 247 | 1709 | 3010 | 1065 | 3004 | 1148 |
| 8 | 2259 | 1446 | 750 | 2804 | 2423 | 420 | 838 |
| 9 | 739 | 739 | 2299 | 335 | 1999 | 1709 | 1628 |
| 10 | 2893 | 1200 | 341 | 2456 | 2699 | 3065 | 1068 |
| 11 | 816 | 2886 | 1836 | 1116 | 1277 | 2977 | 819 |
| 12 | 2804 | 2761 | 219 | 268 | 1068 | 585 | 1797 |
| 13 | 1503 | 1658 | 156 | 750 | 963 | 1475 | 336 |
| 14 | 585 | 560 | 2867 | 2294 | 158 | 3217 | 2893 |
| 15 | 1620 | 838 | 3104 | 156 | 609 | 501 | 2219 |
| 16 | 1815 | 2300 | 1412 | 2299 | 1417 | 146 | 585 |
| 17 | 3065 | 538 | 3217 | 2715 | 1190 | 343 | 1008 |
| 18 | 3155 | 498 | 2977 | 2753 | 2219 | 1417 | 2886 |
| 19 | 1288 | 809 | 1612 | 2979 | 560 | 2299 | 36 |
| 20 | 2342 | 1092 | 2804 | 2428 | 247 | 2294 | 1446 |
Table 2. A subset of the strongest genes selected in the hereditary breast cancer dataset (the first column of genes in Table 1).
| Rank | Index no | Clone ID | Gene description |
|---|---|---|---|
| 1 | 1008 | 897781 | Keratin 8 |
| 2 | 955 | 950682 | Phosphofructokinase, platelet |
| 3 | 1479 | 841641 | Cyclin D1 (PRAD1: parathyroid adenomatosis 1) |
| 4 | 2870 | 82991 | Phosphodiesterase I/nucleotide pyrophosphatase 1 (homologous to mouse Ly-41 antigen) |
| 5 | 538 | 563598 | Human GABA-A receptor |
| 6 | 336 | 823940 | Transducer of ERBB2, 1 |
| 7 | 3154 | 135118 | GATA-binding protein 3 |
| 8 | 2259 | 814270 | Polymyositis/scleroderma autoantigen 1 (75kd) |
| 9 | 739 | 214068 | GATA-binding protein 3 |
| 10 | 2893 | 32790 | mutS ( |
| 11 | 816 | 123926 | Cathepsin K (pycnodysostosis) |
| 12 | 2804 | 51209 | Protein phosphatase 1, catalytic subunit, beta isoform |
| 13 | 1503 | 838568 | Cytochrome c oxidase subunit VIc |
| 14 | 585 | 293104 | Phytanoyl-CoA hydroxylase (Refsum disease) |
| 15 | 1620 | 137638 | ESTs |
| 16 | 1815 | 141959 | (from clone DKFZp566J2446) |
| 17 | 3065 | 199381 | ESTs |
| 18 | 3155 | 136769 | TATA box binding protein (TBP)-associated factor, RNA polymerase II, A, 250kd |
| 19 | 1288 | 564803 | Forkhead (Drosophila)-like 16 |
| 20 | 2342 | 284592 | Platelet-derived growth factor receptor, alpha polypeptide |
Table 3. Classifier performance on the hereditary breast cancer dataset by cross-validation (number of misclassified samples in the leave-one-out test).
| Classification method | Top 5 | Top 10 | Top 20 |
|---|---|---|---|
| FSVM with SVM-RFE | 0 | 0 | 0 |
| BCT-SVM with F test | 1 | 0 | 0 |
| BCT-SVM with SVM-RFE | 0 | 0 | 0 |
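The error counts in the performance tables come from a leave-one-out test: each sample is held out once, the classifier is trained on the rest, and a miss on the held-out sample counts as one error. The sketch below shows that counting loop only. To stay self-contained it plugs in a nearest-centroid classifier as a stand-in, not the FSVM or BCT-SVM classifiers of the paper; all function names and the toy data are hypothetical. Whether gene selection is repeated inside each fold is not stated in this excerpt.

```python
def nearest_centroid_fit(X, y):
    """Stand-in classifier: store the mean vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def loo_errors(X, y, fit, predict):
    """Count misclassified samples in a leave-one-out test."""
    errors = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]  # every sample except the i-th
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        if predict(model, X[i]) != y[i]:
            errors += 1
    return errors


# Toy data (hypothetical): two well-separated classes.
X_toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
y_toy = ["A", "A", "A", "B", "B", "B"]
print(loo_errors(X_toy, y_toy, nearest_centroid_fit, nearest_centroid_predict))  # 0

# Adding one sample that sits in class A's region but is labelled B
# produces exactly one leave-one-out error.
X_noisy = X_toy + [[0.1, 0.1]]
y_noisy = y_toy + ["B"]
print(loo_errors(X_noisy, y_noisy, nearest_centroid_fit, nearest_centroid_predict))  # 1
```

Because `loo_errors` takes `fit` and `predict` as arguments, any classifier with the same two-function shape could be dropped in to produce one cell of such a table.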
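Table 4. Index numbers of the strongest genes selected in the small round blue-cell tumors dataset.
| No | FSVM with SVM-RFE: 1 | FSVM with SVM-RFE: 2 | FSVM with SVM-RFE: 3 | FSVM with SVM-RFE: 4 | FSVM with SVM-RFE: 5 | FSVM with SVM-RFE: 6 | BCT-SVM with F test: 1 | BCT-SVM with F test: 2 | BCT-SVM with F test: 3 | BCT-SVM with SVM-RFE: 1 | BCT-SVM with SVM-RFE: 2 | BCT-SVM with SVM-RFE: 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|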
| 1 | 246 | 255 | 1954 | 851 | 187 | 1601 | 1074 | 169 | 422 | 545 | 174 | 851 |
| 2 | 1389 | 867 | 1708 | 846 | 509 | 842 | 246 | 1055 | 1099 | 1389 | 1353 | 846 |
| 3 | 851 | 246 | 1955 | 1915 | 2162 | 1955 | 1708 | 338 | 758 | 2050 | 842 | 1915 |
| 4 | 1750 | 1389 | 509 | 1601 | 107 | 255 | 1389 | 422 | 1387 | 1319 | 1884 | 1601 |
| 5 | 107 | 842 | 2050 | 742 | 758 | 2046 | 1954 | 1738 | 761 | 1613 | 1003 | 742 |
| 6 | 2198 | 2050 | 545 | 1916 | 2046 | 1764 | 607 | 1353 | 123 | 1003 | 707 | 1916 |
| 7 | 2050 | 365 | 1389 | 2144 | 2198 | 509 | 1613 | 800 | 84 | 246 | 1955 | 2144 |
| 8 | 2162 | 742 | 2046 | 2198 | 2022 | 603 | 1645 | 714 | 1888 | 867 | 2046 | 2198 |
| 9 | 607 | 107 | 348 | 1427 | 1606 | 707 | 1319 | 758 | 951 | 1954 | 255 | 1427 |
| 10 | 1980 | 976 | 129 | 1 | 169 | 174 | 566 | 910 | 1606 | 1645 | 169 | 1 |
| 11 | 567 | 1319 | 566 | 1066 | 1 | 1353 | 368 | 2047 | 1914 | 1110 | 819 | 1066 |
| 12 | 2022 | 1991 | 246 | 867 | 1915 | 169 | 1327 | 2162 | 1634 | 368 | 509 | 867 |
| 13 | 1626 | 819 | 1207 | 788 | 788 | 1003 | 244 | 2227 | 867 | 129 | 166 | 788 |
| 14 | 1916 | 251 | 1003 | 153 | 1886 | 742 | 545 | 2049 | 783 | 348 | 1207 | 153 |
| 15 | 544 | 236 | 368 | 1980 | 554 | 2203 | 1888 | 1884 | 2168 | 365 | 603 | 1980 |
| 16 | 1645 | 1954 | 1105 | 2199 | 1353 | 107 | 2050 | 1955 | 1601 | 107 | 796 | 2199 |
| 17 | 1427 | 1708 | 1158 | 783 | 338 | 719 | 430 | 1207 | 335 | 1708 | 1764 | 783 |
| 18 | 1708 | 1084 | 1645 | 1434 | 846 | 166 | 365 | 326 | 1084 | 187 | 719 | 1434 |
| 19 | 2303 | 566 | 1319 | 799 | 1884 | 1884 | 1772 | 796 | 836 | 1626 | 107 | 799 |
| 20 | 256 | 1110 | 1799 | 1886 | 2235 | 1980 | 1298 | 230 | 849 | 1772 | 2203 | 1886 |
Table 5. A subset of the strongest genes selected in the small round blue-cell tumors dataset (the first column of genes in Table 4).
| Rank | Index no | Clone ID | Gene description |
|---|---|---|---|
| 1 | 246 | 377461 | Caveolin 1, caveolae protein, 22kd |
| 2 | 1389 | 770394 | Fc fragment of IgG, receptor, transporter, alpha |
| 3 | 851 | 563673 | Antiquitin 1 |
| 4 | 1750 | 233721 | Insulin-like growth factor binding protein 2 (36kd) |
| 5 | 107 | 365826 | Growth arrest-specific 1 |
| 6 | 2198 | 212542 | (from clone DKFZp586J2118) |
| 7 | 2050 | 295985 | ESTs |
| 8 | 2162 | 308163 | ESTs |
| 9 | 607 | 811108 | Thyroid hormone receptor interactor 6 |
| 10 | 1980 | 841641 | Cyclin D1 (PRAD1: parathyroid adenomatosis 1) |
| 11 | 567 | 768370 | Tissue inhibitor of metalloproteinase 3 (Sorsby fundus dystrophy, pseudoinflammatory) |
| 12 | 2022 | 204545 | ESTs |
| 13 | 1626 | 811000 | Lectin, galactoside-binding, soluble, 3 binding protein (galectin 6 binding protein) |
| 14 | 1916 | 80109 | Major histocompatibility complex, class II, DQ alpha 1 |
| 15 | 544 | 1416782 | Creatine kinase, brain |
| 16 | 1645 | 52076 | Olfactomedin-related ER localized protein |
| 17 | 1427 | 504791 | Glutathione S-transferase A4 |
| 18 | 1708 | 43733 | Glycogenin 2 |
| 19 | 2303 | 782503 | |
| 20 | 256 | 154472 | Fibroblast growth factor receptor 1 (fms-related tyrosine kinase 2, Pfeiffer syndrome) |
Table 6. Classifier performance on the small round blue-cell tumors dataset by cross-validation (number of misclassified samples in the leave-one-out test).
| Classification method | Top 5 | Top 10 | Top 20 |
|---|---|---|---|
| FSVM with SVM-RFE | 0 | 0 | 0 |
| BCT-SVM with F test | 1 | 0 | 0 |
| BCT-SVM with SVM-RFE | 0 | 0 | 0 |
Table 7. Index numbers of the strongest genes selected in the acute leukemia dataset.
| No | FSVM with SVM-RFE: 1 | FSVM with SVM-RFE: 2 | FSVM with SVM-RFE: 3 | BCT-SVM with F test: 1 | BCT-SVM with F test: 2 | BCT-SVM with SVM-RFE: 1 | BCT-SVM with SVM-RFE: 2 |
|---|---|---|---|---|---|---|---|
| 1 | 6696 | 1882 | 6606 | 2335 | 4342 | 1882 | 4342 |
| 2 | 6606 | 4680 | 6696 | 4680 | 4050 | 6696 | 4050 |
| 3 | 4342 | 6201 | 4680 | 2642 | 1207 | 5552 | 5808 |
| 4 | 1694 | 2288 | 4342 | 1882 | 6510 | 6378 | 1106 |
| 5 | 1046 | 6200 | 6789 | 6225 | 4052 | 3847 | 3969 |
| 6 | 1779 | 760 | 4318 | 4318 | 4055 | 5300 | 1046 |
| 7 | 6200 | 2335 | 1893 | 5300 | 1106 | 2642 | 6606 |
| 8 | 6180 | 758 | 1694 | 5554 | 1268 | 2402 | 6696 |
| 9 | 6510 | 2642 | 4379 | 5688 | 4847 | 3332 | 2833 |
| 10 | 1893 | 2402 | 2215 | 758 | 5543 | 1685 | 1268 |
| 11 | 4050 | 6218 | 3332 | 4913 | 1046 | 4177 | 4847 |
| 12 | 4379 | 6376 | 3969 | 4082 | 2833 | 6606 | 6510 |
| 13 | 1268 | 6308 | 6510 | 6573 | 4357 | 3969 | 2215 |
| 14 | 4375 | 1779 | 2335 | 6974 | 4375 | 6308 | 1834 |
| 15 | 4847 | 6185 | 6168 | 6497 | 6041 | 760 | 4535 |
| 16 | 6789 | 4082 | 2010 | 1078 | 6236 | 2335 | 1817 |
| 17 | 2288 | 6378 | 1106 | 2995 | 6696 | 2010 | 4375 |
| 18 | 1106 | 4847 | 5300 | 5442 | 1630 | 6573 | 5039 |
| 19 | 2833 | 5300 | 4082 | 2215 | 6180 | 4586 | 4379 |
| 20 | 6539 | 1685 | 1046 | 4177 | 4107 | 2215 | 5300 |
Table 8. A subset of the strongest genes selected in the acute leukemia dataset (the second column of genes in Table 7).
| Rank | Index no | Gene accession number | Gene description |
|---|---|---|---|
| 1 | 1882 | M27891_at | CST3 cystatin C (amyloid angiopathy and cerebral hemorrhage) |
| 2 | 4680 | X82240_rna1_at | TCL1 gene (T-cell leukemia) extracted from mRNA for T-cell leukemia/lymphoma 1 |
| 3 | 6201 | Y00787_s_at | Interleukin-8 precursor |
| 4 | 2288 | M84526_at | DF D component of complement (adipsin) |
| 5 | 6200 | M28130_rna1_s_at | Interleukin-8 (IL-8) gene |
| 6 | 760 | D88422_at | Cystatin A |
| 7 | 2335 | M89957_at | IGB immunoglobulin-associated beta (B29) |
| 8 | 758 | D88270_at | GB DEF = (lambda) DNA for immunoglobulin light chain |
| 9 | 2642 | U05259_rna1_at | MEF2C MADS box transcription enhancer factor 2, polypeptide C (myocyte enhancer factor 2C) |
| 10 | 2402 | M96326_rna1_at | Azurocidin gene |
| 11 | 6218 | M27783_s_at | ELA2 Elastase 2, neutrophil |
| 12 | 6376 | M83652_s_at | PFC properdin P factor, complement |
| 13 | 6308 | M57731_s_at | GRO2 GRO2 oncogene |
| 14 | 1779 | M19507_at | MPO myeloperoxidase |
| 15 | 6185 | X64072_s_at | SELL leukocyte adhesion protein beta subunit |
| 16 | 4082 | X05908_at | ANX1 annexin I (lipocortin I) |
| 17 | 6378 | M83667_rna1_s_at | NF-IL6-beta protein mRNA |
| 18 | 4847 | X95735_at | Zyxin |
| 19 | 5300 | L08895_at | MEF2C MADS box transcription enhancer factor 2, polypeptide C (myocyte enhancer factor 2C) |
| 20 | 1685 | M11722_at | Terminal transferase mRNA |
Table 9. Classifier performance on the acute leukemia dataset by cross-validation (number of misclassified samples in the leave-one-out test).
| Classification method | Top 5 | Top 10 | Top 20 |
|---|---|---|---|
| FSVM with SVM-RFE | 1 | 0 | 1 |
| BCT-SVM with F test | 2 | 4 | 2 |
| BCT-SVM with SVM-RFE | 2 | 1 | 2 |