Mihir S Sewak, Narender P Reddy, Zhong-Hui Duan.
Abstract
Analysis of gene expression data provides an objective and efficient technique for sub-classification of leukemia. The purpose of the present study was to design committee neural network based classification systems to subcategorize leukemia gene expression data. A binary classification system was developed to differentiate acute lymphoblastic leukemia from acute myeloid leukemia. A ternary classification system was also developed to classify leukemia expression data into three subclasses: B-cell acute lymphoblastic leukemia, T-cell acute lymphoblastic leukemia and acute myeloid leukemia. In each classification system, gene expression profiles of leukemia patients were first subjected to a sequence of simple preprocessing steps, which filtered out approximately 95 percent of the genes as non-informative. The remaining 5 percent, the informative genes, were used to train a set of artificial neural networks with different parameters and architectures. The networks that gave the best results during initial testing were recruited into a committee. The committee decision was reached by majority voting. The committee neural network system was later evaluated using data not used in training. The binary classification system classified microarray gene expression profiles into two categories with 100 percent accuracy, and the ternary system correctly predicted the three subclasses of leukemia in over 97 percent of the cases.
Keywords: gene selection; leukemia cancer; microarray; neural networks; sample classification
Year: 2009 PMID: 20140065 PMCID: PMC2808175 DOI: 10.4137/bbi.s2908
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Figure 1. Architectural diagram of the Committee Neural Network System: The expression data for the 250 genes were divided into ten sets of 25 each. Several neural networks were trained using each of these sets. Each network has three output nodes, one for each class (T-ALL, B-ALL, and AML). The best performing 11 networks were recruited into a committee. The committee decision was by majority opinion.
Figure 2. Expression intensity values of the top 50 genes for the binary classification system are scaled to a “hot” color map. The x-axis displays the patients grouped according to disease while the y-axis shows informative genes.
Figure 3. Expression intensity values of the most informative genes for the ternary classification system are scaled to a “hot” color map. The x-axis displays the patients grouped according to disease while the y-axis shows informative genes. Figure 3a shows genes that differentiate B-ALLs from T-ALLs and AMLs. Figure 3b shows genes that differentiate T-ALLs from B-ALLs and AMLs. Figure 3c shows genes that differentiate AMLs from B-ALLs and T-ALLs.
Performance of the recruited committee for the two class system on the initial validation dataset.
| Correct classification out of 8 samples presented | 7 | 8 | 7 | 7 | 7 | 8 |
| Accuracy | 87.5 | 100 | 87.5 | 87.5 | 87.5 | 100 |
Performance of the recruited committee for the two class system on the final validation dataset.
| Correct classification out of 27 samples presented | 24 | 27 | 25 | 27 | 27 | 27 |
| Accuracy | 88.9 | 100 | 92.6 | 100 | 100 | 100 |
The overall performance of the recruited committee for the binary classification system.
| Correct classification out of 35 samples | 31 | 35 | 32 | 34 | 34 | 35 |
| Accuracy | 88.6 | 100 | 91.4 | 97.1 | 97.1 | 100 |
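The overall figures for the binary system can be reproduced from the two validation tables. The sketch below is only an arithmetic check: it assumes each column position refers to the same classifier across the three tables (the column labels are not given here), adds the initial (8 samples) and final (27 samples) correct counts, and converts the totals to percent accuracy out of 35.

```python
# Correct classifications per column, copied from the two binary-system tables.
# Assumption: column i refers to the same classifier in both tables.
initial_correct = [7, 8, 7, 7, 7, 8]        # out of 8 samples
final_correct = [24, 27, 25, 27, 27, 27]    # out of 27 samples

total_samples = 8 + 27

# Overall correct count is the sum over both validation sets.
overall_correct = [a + b for a, b in zip(initial_correct, final_correct)]

# Accuracy in percent, rounded to one decimal place as in the tables.
overall_accuracy = [round(100 * c / total_samples, 1) for c in overall_correct]

print(overall_correct)
print(overall_accuracy)
```

Under that assumption, the summed counts and rounded percentages agree with the overall table above.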
Sample results to demonstrate the working of a committee system.
| NN1 | B-ALL | B-ALL | AML | AML | ||
| NN2 | B-ALL | AML | ||||
| NN3 | B-ALL | B-ALL | AML | AML | ||
| NN4 | B-ALL | B-ALL | AML | |||
| NN5 | B-ALL | B-ALL | T-ALL | AML | AML | |
| NN6 | B-ALL | B-ALL | T-ALL | AML | AML | |
| NN7 | B-ALL | B-ALL | T-ALL | AML | AML | |
| NN8 | B-ALL | T-ALL | T-ALL | AML | ||
| NN9 | B-ALL | B-ALL | T-ALL | T-ALL | ||
| NN10 | B-ALL | B-ALL | T-ALL | T-ALL | ||
| NN11 | B-ALL | B-ALL | T-ALL | T-ALL | ||
| Committee decision | B-ALL | B-ALL | T-ALL | AML | AML | |
| Actual class | B-ALL | B-ALL | T-ALL | T-ALL | AML | AML |
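A minimal sketch of how a committee decision could be formed from the individual network outputs is given below. It is an illustration, not the authors' implementation: the function name `committee_decision` is hypothetical, the votes are invented for the example rather than read from the table, and returning `None` on a tie is an assumption, since tie handling is not described here.

```python
from collections import Counter


def committee_decision(votes):
    """Return the class chosen by the most networks, or None on a tie.

    `votes` is a list of class labels, one per committee member.
    Tie handling (returning None) is an assumption made for this sketch.
    """
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # the two leading classes received the same number of votes
    return ranked[0][0]


# Hypothetical votes from an 11-member committee for a single sample.
votes = ["T-ALL"] * 7 + ["AML"] * 3 + ["B-ALL"]
print(committee_decision(votes))
```

Using an odd number of members, such as the 11 networks recruited here, rules out ties in the two-class case, although ties between the leading classes remain possible with three classes.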
Performance of the recruited committee for the three class system on the initial validation dataset.
| Correct classification out of 8 samples presented | 7 | 7 | 7 | 7 | 7 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| Accuracy | 87.5 | 87.5 | 87.5 | 87.5 | 87.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
Performance of the recruited committee for the three class system on the final validation dataset.
| Correct classification out of 27 samples presented | 26 | 20 | 26 | 23 | 25 | 25 | 25 | 26 | 24 | 23 | 23 | 26 |
| Accuracy | 96.3 | 74.1 | 96.3 | 85.2 | 92.6 | 92.6 | 92.6 | 96.3 | 88.9 | 85.2 | 85.2 | 96.3 |
The overall performance of the recruited committee for the ternary classification system.
| Correct classification out of 35 samples | 33 | 27 | 33 | 30 | 32 | 33 | 33 | 34 | 32 | 31 | 31 | 34 |
| Accuracy | 94.3 | 77.1 | 94.3 | 85.7 | 91.4 | 94.3 | 94.3 | 97.1 | 91.4 | 88.6 | 88.6 | 97.1 |
Confusion matrix for the three class system.
| | | Actual B-ALL | Actual T-ALL | Actual AML | ∑ |
|---|---|---|---|---|---|
| Predicted | B-ALL | 19 | 0 | 0 | 19 |
| | T-ALL | 0 | 3 | 0 | 3 |
| | AML | 0 | 1 | 12 | 13 |
| | ∑ | 19 | 4 | 12 | 35 |
| Sensitivity | | 100 | 75 | 100 | |
| Specificity | | 100 | 97.14 | 100 | |
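The sensitivity row and the overall accuracy can be derived from the counts in the confusion matrix. The sketch below assumes rows are predicted classes and columns are actual classes, both in the order B-ALL, T-ALL, AML. Sensitivity for a class is taken as the correctly predicted samples of that class divided by all samples that actually belong to it.

```python
classes = ["B-ALL", "T-ALL", "AML"]

# Assumption: rows are predicted classes, columns are actual classes.
confusion = [
    [19, 0, 0],   # predicted B-ALL
    [0, 3, 0],    # predicted T-ALL
    [0, 1, 12],   # predicted AML
]

n = len(classes)
total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(n))

# Overall accuracy in percent: diagonal count over all samples.
accuracy = 100 * correct / total

# Sensitivity per class: diagonal entry over the column (actual class) total.
sensitivity = {}
for j, name in enumerate(classes):
    actual_total = sum(confusion[i][j] for i in range(n))
    sensitivity[name] = 100 * confusion[j][j] / actual_total

print(round(accuracy, 2))
print(sensitivity)
```

The single off-diagonal entry is one T-ALL sample predicted as AML, which is what lowers the T-ALL sensitivity to 75 percent.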