Malik Yousef, Mohamed Ketany, Larry Manevitz, Louise C Showe, Michael K Showe.
Abstract
BACKGROUND: Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. Selecting the genes most significant to the classification problem is a challenging issue in the analysis and interpretation of high-dimensional data. A previous study with SVM-RCE (Recursive Cluster Elimination) suggested that classification based on groups of correlated genes sometimes performs better than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes. We now demonstrate that an algorithm which integrates network information with SVM-based recursive feature elimination exhibits good performance and improves the biological interpretability of the results. We refer to the method as SVM with Recursive Network Elimination (SVM-RNE).
Year: 2009 PMID: 19832995 PMCID: PMC2774324 DOI: 10.1186/1471-2105-10-337
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The SVM-RNE algorithm. The flowchart of the SVM-RNE algorithm consists of three main steps: 1) Building Networks, which builds networks of genes; 2) SVM Scoring, which assesses the significance of each network; and 3) Network Elimination, which removes networks with low scores.
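The score-then-eliminate loop described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `recursive_network_elimination`, the parameters `drop_fraction` and `min_networks`, and the toy scorer `mean_weight` are all hypothetical. In the published method the score of a network comes from an SVM; here the scorer is a pluggable callable, and the example uses a simple stand-in so that the sketch runs with the standard library only.

```python
from typing import Callable, Dict, List, Sequence

Network = Sequence[str]  # a network is represented as a group of gene names


def recursive_network_elimination(
    networks: Dict[str, Network],
    score_network: Callable[[Network], float],
    drop_fraction: float = 0.25,  # hypothetical parameter, not from the paper
    min_networks: int = 2,        # hypothetical stopping point
) -> List[Dict[str, float]]:
    """Score surviving networks, drop the lowest-scoring ones, and repeat.

    Returns one {network_name: score} snapshot per elimination step.
    """
    surviving = dict(networks)
    history: List[Dict[str, float]] = []
    while True:
        # Scoring step: the paper uses an SVM here; any callable works in this sketch.
        scores = {name: score_network(genes) for name, genes in surviving.items()}
        history.append(scores)
        if len(surviving) <= min_networks:
            break
        # Elimination step: drop at least one network, never below min_networks.
        n_drop = max(1, int(len(surviving) * drop_fraction))
        n_drop = min(n_drop, len(surviving) - min_networks)
        ranked = sorted(scores, key=lambda name: scores[name])  # lowest score first
        for name in ranked[:n_drop]:
            del surviving[name]
    return history


# Toy input: gene names, networks, and weights are invented for illustration.
gene_weight = {"g1": 0.9, "g2": 0.7, "g3": 0.2, "g4": 0.1,
               "g5": 0.3, "g6": 0.5, "g7": 0.8, "g8": 0.6}
networks = {
    "net_a": ["g1", "g2"],
    "net_b": ["g3", "g4", "g5"],
    "net_c": ["g6"],
    "net_d": ["g7", "g8"],
}


def mean_weight(genes: Network) -> float:
    """Stand-in scorer (assumption): mean of fixed per-gene weights."""
    return sum(gene_weight[g] for g in genes) / len(genes)


history = recursive_network_elimination(networks, mean_weight)
for step, scores in enumerate(history, start=1):
    print(step, sorted(scores))
```

Replacing `mean_weight` with a function that trains and cross-validates an SVM on the expression values of the genes in a network would bring the sketch closer to the scoring step named in the caption.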
Summary results for the SVM-RNE, SVM-RCE and SVM-RFE algorithms.
| Algorithm | c | g | Accuracy | c | g | Accuracy | c | g | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| SVM-RNE | 2 | 4 | 100% | 2 | 5 | 91% | 4 | 13 | 80% |
| | 4 | 8 | 97% | 4 | 8 | 90% | 6 | 18 | 79% |
| | 14 | 31 | 96% | 14 | 33 | 90% | 11 | 30 | 74% |
| | 24 | 55 | 92% | 30 | 69 | 89% | | | |
| SVM-RCE | 2 | 8 | 96% | 2 | 8 | 76% | 2 | 13 | 96% |
| | 3 | 12 | 96% | 9 | 34 | 89% | | | |
| | 9 | 32 | 97% | 19 | 71 | 91% | | | |
| | 15 | 51 | 97% | 28 | 104 | 91% | | | |
| | 32 | 101 | 96% | | | | | | |
| | 6 | 39 | 92% | | | | | | |
| | 10 | 64 | 92% | | | | | | |
| SVM-RFE | | 9 | 89% | | 8 | 84% | | 12 | 81% |
| | | 32 | 94% | | 32 | 85% | | 18 | 79% |
| | | 102 | 100% | | 102 | 87% | | 30 | 77% |
Classification accuracies for the three algorithms on three datasets are presented at representative steps in the course of recursive feature elimination. The number of clusters (c) and the total number of genes in the clusters (g) are shown for each step. The steps are presented in reverse order, i.e. the last elimination step is shown first in the table. No clusters are shown for SVM-RFE since the genes are eliminated without clustering.
Figure 2. Classification performance of SVM-RCE on the Lymphocyte data set. All of the values are an average of 100 iterations of SVM-RCE. ACC is the accuracy, TP is the sensitivity, and TN is the specificity of the remaining genes determined on the test set. The x-axis shows the median number of clusters and number of genes in the clusters at each step.
Figure 3. Classification performance of SVM-RNE on the Lymphocyte data set. All of the values are an average of 100 iterations of SVM-RNE. ACC is the accuracy, TP is the sensitivity, and TN is the specificity of the remaining genes determined on the test set. The x-axis shows the median number of genes hosted by the networks.
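The three quantities named in the figure captions (accuracy, sensitivity, and specificity) can be computed from test-set labels and predictions as in the short sketch below. The function name `classification_metrics` and the example labels are hypothetical. The captions report averages over 100 iterations; this sketch covers a single iteration only.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Return accuracy (ACC), sensitivity (TP rate), and specificity (TN rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        # Sensitivity: fraction of true positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of true negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Invented labels for illustration: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Averaging each of these values over repeated train/test splits would give per-step curves of the kind the captions describe.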