Sunil Kumar Prabhakar, Semin Ryu, In Cheol Jeong, Dong-Ok Won.
Abstract
Cancer is one of the major causes of mortality in humans, so it is essential for doctors to identify and treat patients who suffer from it. Leukemia is a group of blood cancers that usually originates in the bone marrow and results in a very high number of abnormal cells. Microarray data are an important clinical tool for cancer diagnosis and a great aid to the medical community. However, the dimensionality of microarray data is very high, so selecting suitable genes is an important step toward improving classification; predicting and diagnosing cancer therefore depends on selecting the most informative genes. In this work, Minimum Redundancy Maximum Relevance (MRMR), Signal to Noise Ratio (SNR), multivariate Error-Weighted Uncorrelated Shrunken Centroid (EWUSC), and multivariate correlation-based feature selection (CFS) are chosen as the initial feature selection techniques. Then, to select the most informative genes, five evolutionary optimization techniques are applied: African Buffalo Optimization (ABO), Artificial Bee Colony Optimization (ABCO), Cockroach Swarm Optimization (CSO), Imperialist Competitive Optimization (ICO), and Social Spider Optimization (SSO). Finally, the optimized values are passed to the classification stage. The best results are obtained when multivariate CFS with SSO is classified with a Probabilistic Neural Network (PNN), which yields a classification accuracy of 95.70%.
Year: 2022 PMID: 35663047 PMCID: PMC9162867 DOI: 10.1155/2022/2052061
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.246
Dataset details.
| Dataset | Number of genes | Class 1 (ALL) | Class 2 (AML) | Total samples |
|---|---|---|---|---|
| Leukemia: AML-ALL | 7129 | 47 | 25 | 72 |
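For a two-class dataset of this shape, the SNR filter named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common two-class form SNR = (μ₁ − μ₂) / (σ₁ + σ₂) per gene, which the text here does not confirm. The function names `snr_scores` and `top_k_genes` are hypothetical, and the expression values are synthetic stand-ins (with the gene count reduced to 500 for speed), not the AML-ALL data.

```python
import random
from statistics import mean, stdev


def snr_scores(samples, labels):
    """Return one signal-to-noise score per gene.

    samples: list of expression vectors, one per sample, all the same length
    labels:  list of 0/1 class labels, one per sample
    Assumed form: (mean_0 - mean_1) / (std_0 + std_1), computed gene by gene.
    """
    class0 = [s for s, y in zip(samples, labels) if y == 0]
    class1 = [s for s, y in zip(samples, labels) if y == 1]
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s in class0]
        b = [s[g] for s in class1]
        denom = stdev(a) + stdev(b)
        # Guard against constant genes, which would give a zero denominator.
        scores.append((mean(a) - mean(b)) / denom if denom > 0 else 0.0)
    return scores


def top_k_genes(scores, k):
    """Indices of the k genes with the largest absolute SNR."""
    order = sorted(range(len(scores)), key=lambda g: abs(scores[g]), reverse=True)
    return order[:k]


# Synthetic stand-in data: 47 samples of class 0 (ALL) and 25 of class 1 (AML),
# matching the class sizes in the table; 500 genes instead of 7129 for speed.
random.seed(0)
n_all, n_aml, n_genes = 47, 25, 500
labels = [0] * n_all + [1] * n_aml
samples = [[random.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in labels]
for s, y in zip(samples, labels):
    if y == 1:
        s[3] += 5.0  # make gene 3 strongly differential between the classes

scores = snr_scores(samples, labels)
selected = top_k_genes(scores, 50)
print(selected[:5])
```

Ranking by absolute value keeps genes that are over-expressed in either class; a filter like this only produces the initial gene subset, which the optimization stage would then refine.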
Figure 1: Illustration of the work.
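The last stage of the workflow is classification, and the abstract reports its best result with a PNN. As a rough sketch of how such a classifier decides, assuming the textbook Parzen-window formulation with a Gaussian kernel and a single smoothing parameter `sigma` (the source does not state the configuration used), the function below picks the class whose training samples give the largest average kernel response. The name `pnn_predict` and the two-dimensional toy points are hypothetical.

```python
import math


def pnn_predict(train_x, train_y, x, sigma=1.0):
    """Classify x with a Parzen-window PNN.

    Each training sample contributes a Gaussian kernel centred on itself;
    the predicted class is the one with the largest average kernel value at x.
    """
    sums, counts = {}, {}
    for xi, yi in zip(train_x, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        sums[yi] = sums.get(yi, 0.0) + math.exp(-sq_dist / (2.0 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    return max(sums, key=lambda c: sums[c] / counts[c])


# Hypothetical toy data: two well-separated clusters labelled ALL and AML.
train_x = [[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [2.9, 3.2]]
train_y = ["ALL", "ALL", "AML", "AML"]

prediction = pnn_predict(train_x, train_y, [2.8, 3.0], sigma=0.5)
print(prediction)
```

In practice the feature vectors would be the selected gene expression values, and `sigma` would need tuning, since it controls how far each training sample's influence reaches.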
Performance analysis of classifiers in terms of classification accuracies with ABO for different gene selection techniques using 50–200 selected genes.
| Method | NBC | | | SVM | | | RF | | | PNN | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Genes selected | 50 | 100 | 200 | 50 | 100 | 200 | 50 | 100 | 200 | 50 | 100 | 200 |
| Minimum Redundancy–Maximum Relevance (MRMR): | 85.22406 | 89.85625 | 88.8125 | 77.6025 | 76.54 | 81.51 | 84.8975 | 78.90875 | 76.27 | 77.02938 | 85.9375 | 78.59852 |
| Signal to Noise Ratio (SNR): | 76.0675 | 90.1125 | 87.5 | 77.47188 | 90.88625 | 77.35758 | 79.82 | 92.45 | 91.67 | 92.97 | 88.55 | 87.10938 |
| Multivariate Error-Weighted Uncorrelated Shrunken Centroid (EWUSC): | 85.9375 | 86.32813 | 91.47406 | 77.35758 | 76.97875 | 88.025 | 84.8975 | 91.1475 | 85.74344 | 91.93 | 89.85625 | 92.97 |
| Multivariate correlation-based feature selection (CFS): | 91.27813 | 77.3984 | 90.625 | 89.85625 | 75.96875 | 77.58617 | 86.32813 | 77.57801 | 77.02938 | 78.97406 | 87.10938 | 76.54 |
Performance analysis of classifiers in terms of classification accuracies with ABCO for different gene selection techniques using 50–200 selected genes.
| Method | NBC | | | SVM | | | RF | | | PNN | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Genes selected | 50 | 100 | 200 | 50 | 100 | 200 | 50 | 100 | 200 | 50 | 100 | 200 |
| Minimum Redundancy–Maximum Relevance (MRMR): | 86.71875 | 76.81844 | 88.8125 | 78.77813 | 78.6475 | 87.10938 | 90.36875 | 77.21063 | 85.84047 | 80.86 | 84.375 | 86.32813 |
| Signal to Noise Ratio (SNR): | 88.55 | 85.42 | 85.74344 | 89.85625 | 88.025 | 90.88625 | 85.84047 | 84.22856 | 80.86 | 84.08213 | 91.27813 | 91.47406 |
| Multivariate Error-Weighted Uncorrelated Shrunken Centroid (EWUSC): | 88.8125 | 81.77 | 78.59852 | 85.67875 | 84.8975 | 75.75781 | 75.84375 | 77.17797 | 76.96609 | 76.675 | 77.73313 | 76.45563 |
| Multivariate correlation-based feature selection (CFS): | 76.21938 | 76.81 | 91.1475 | 85.9375 | 77.5127 | 77.53719 | 88.8125 | 84.08213 | 76.84375 | 77.01672 | 88.025 | 89.20625 |
Performance analysis of classifiers in terms of classification accuracies with CSO for different gene selection techniques using 50–200 selected genes.
| Method | NBC | | | SVM | | | RF | | | PNN | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Genes selected | 50 | 100 | 200 | 50 | 100 | 200 | 50 | 100 | 200 | 50 | 100 | 200 |
| Minimum Redundancy–Maximum Relevance (MRMR): | 78.33727 | 81.25 | 91.27813 | 86.32813 | 77.11266 | 77.92906 | 77.6025 | 79.82 | 84.22856 | 81.77 | 85.54938 | 85.84047 |
| Signal to Noise Ratio (SNR): | 92.19 | 90.88625 | 91.93 | 77.02938 | 90.49688 | 86.71875 | 85.67875 | 77.27594 | 77.39023 | 86.71875 | 77.27594 | 77.39023 |
| Multivariate Error-Weighted Uncorrelated Shrunken Centroid (EWUSC): | 83.3495 | 76.2025 | 77.47188 | 77.54535 | 88.8125 | 76.05063 | 79.82 | 76.37125 | 83.594 | 75.75 | 76.49781 | 79.69 |
| Multivariate correlation-based feature selection (CFS): | 88.55 | 87.10938 | 87.10938 | 89.20625 | 84.8975 | 77.97805 | 77.4882 | 79.17 | 89.85625 | 76.675 | 81.9 | 77.39023 |
Performance analysis of classifiers in terms of classification accuracies with ICO for different gene selection techniques using 50–200 selected genes.
| Method | NBC | | | SVM | | | RF | | | PNN | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Genes selected | 50 | 100 | 200 | 50 | 100 | 200 | 50 | 100 | 200 | 50 | 100 | 200 |
| Minimum Redundancy–Maximum Relevance (MRMR): | 83.3495 | 85.84047 | 91.47406 | 87.10938 | 76.81 | 84.22856 | 77.21063 | 76.6075 | 76.23414 | 86.71875 | 89.6 | 85.02813 |
| Signal to Noise Ratio (SNR): | 77.36574 | 75.75 | 77.32492 | 77.25961 | 77.58617 | 78.71281 | 75.9375 | 77.43922 | 75.875 | 85.67875 | 76.90281 | 76.37125 |
| Multivariate Error-Weighted Uncorrelated Shrunken Centroid (EWUSC): | 77.37391 | 86.71875 | 88.025 | 86.52344 | 90.1125 | 84.8975 | 77.39432 | 76.02531 | 76.135 | 75.625 | 75.75 | 87.10938 |
| Multivariate correlation-based feature selection (CFS): | 82.095 | 84.08213 | 80.015 | 81.77 | 83.78925 | 80.015 | 92.45 | 90.1125 | 83.105 | 76.97875 | 76.91547 | 77.47188 |
Performance analysis of classifiers in terms of classification accuracies with SSO for different gene selection techniques using 50–200 selected genes.
| Method | NBC | | | SVM | | | RF | | | PNN | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Genes selected | 50 | 100 | 200 | 50 | 100 | 200 | 50 | 100 | 200 | 50 | 100 | 200 |
| Minimum Redundancy–Maximum Relevance (MRMR): | 76.81 | 79.82 | 85.74344 | 77.40656 | 78.33727 | 76.2025 | 85.84047 | 77.37391 | 76.2025 | 77.47188 | 85.74344 | 77.53719 |
| Signal to Noise Ratio (SNR): | 76.92813 | 77.21063 | 86.71875 | 86.32813 | 87.10938 | 91.1475 | 85.67875 | 77.35758 | 85.67875 | 76.54 | 77.34125 | 77.47188 |
| Multivariate Error-Weighted Uncorrelated Shrunken Centroid (EWUSC): | 83.78925 | 82.616 | 86.32813 | 78.77813 | 80.73 | 89.6 | 81.51 | 91.1475 | 94.01125 | 77.47188 | 77.39023 | 78.00254 |
| Multivariate correlation-based feature selection (CFS): | 77.63516 | 77.08 | 91.27813 | 91.93 | 94.53375 | 94.66438 | 75.875 | 85.84047 | 88.8125 | 77.78211 | 94.795 | 95.705 |
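Each row of the SSO table holds twelve accuracies: three gene counts (50, 100, 200) for each of the four classifiers, in the order NBC, SVM, RF, PNN. The short sketch below copies the values from the table above and maps each flat column position back to its classifier and gene count in order to locate the best cell. The variable names are illustrative only.

```python
classifiers = ["NBC", "SVM", "RF", "PNN"]
gene_counts = [50, 100, 200]

# Accuracies (%) copied from the SSO table, one list per gene selection method.
sso = {
    "MRMR": [76.81, 79.82, 85.74344, 77.40656, 78.33727, 76.2025,
             85.84047, 77.37391, 76.2025, 77.47188, 85.74344, 77.53719],
    "SNR": [76.92813, 77.21063, 86.71875, 86.32813, 87.10938, 91.1475,
            85.67875, 77.35758, 85.67875, 76.54, 77.34125, 77.47188],
    "EWUSC": [83.78925, 82.616, 86.32813, 78.77813, 80.73, 89.6,
              81.51, 91.1475, 94.01125, 77.47188, 77.39023, 78.00254],
    "CFS": [77.63516, 77.08, 91.27813, 91.93, 94.53375, 94.66438,
            75.875, 85.84047, 88.8125, 77.78211, 94.795, 95.705],
}

# Column i belongs to classifier i // 3 and gene count i % 3.
best = max(
    (acc, method, classifiers[i // 3], gene_counts[i % 3])
    for method, row in sso.items()
    for i, acc in enumerate(row)
)
print(best)
```

The maximum is the CFS row with PNN at 200 selected genes, which matches the combination the abstract reports as best.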
Figure 2: Performance Index of the classifiers, averaged over the five optimization techniques.