Hala M Alshamlan.
Abstract
In the medical domain, rule-based classification models are particularly valuable because they produce a comprehensible model that accounts for its predictions. It is desirable to know not only the classification decisions but also what leads to them. In this paper, we propose a novel dynamic quantitative rule-based classification model, namely DQB, which integrates quantitative association rule mining and the Artificial Bee Colony (ABC) algorithm to give users an accurate, understandable, and interpretable class quantitative association rule-based classifier. As far as we know, this is the first attempt to apply the ABC algorithm to mining quantitative rule-based classifier models, and the first attempt to use quantitative rule-based classification models for classifying microarray gene expression profiles. We also developed a new dynamic local search strategy, named DLS, which improves the local search of the ABC algorithm. The performance of the proposed model was compared with well-known quantitative-based classification methods and bio-inspired meta-heuristic classification algorithms, using six gene expression profiles for binary and multi-class cancer datasets. From the results, it can be concluded that DQB obtains a considerable increase in classification accuracy compared with other available algorithms in the literature, and that it provides an interpretable model for biologists. This confirms the significance of the proposed algorithm in constructing a rule-based classifier model, and accordingly shows that the rules capture high-quality, meaningful knowledge extracted from the training set: all subsets of quantitative rules report close to 100% classification accuracy with a minimum number of genes.
Notably, to the best of our knowledge, several of the genes discovered have not been reported in any past studies. Regarding applicability, based on the results acquired from microarray gene expression analysis, we conclude that DQB can be adopted in different real-world applications with some modifications.
Keywords: ABC; Artificial Bee Colony Algorithm; Cancer gene selection; Classification rule; Gene expression profile; Microarray; Quantitative rule-based classification model
Year: 2018 PMID: 30108444 PMCID: PMC6087852 DOI: 10.1016/j.sjbs.2018.01.017
Source DB: PubMed Journal: Saudi J Biol Sci ISSN: 1319-562X Impact factor: 4.219
Fig. 1. Using evolutionary algorithms to discover a rule-based classification model.
Fig. 2. The main steps of the ABC algorithm.
Fig. 3. The main phases and steps of the proposed DQB (Dynamic Quantitative Bee) algorithm.
Fig. 4. The solution space for the DQB algorithm. SN represents the number of particular solutions or rules (food sources), and D represents the number to be optimized for each solution or quantitative classification rule (QCR). Each cell represents a different .
Fig. 5. The structure of the rule item.
Fig. 6. The dynamic local search method.
Fig. 7. The flowchart of the classification and prediction phase.
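The figures themselves are not reproduced here, so the exact encoding of a rule item is not available. As a rough illustration only, a quantitative classification rule can be thought of as a conjunction of interval conditions on gene expression values that predicts one class. In the sketch below, every name (`RuleItem`, `QuantRule`, `rule_accuracy`), the gene identifiers, and the toy data are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RuleItem:
    """Hypothetical rule item: gene expression must lie in [lower, upper]."""
    gene: str
    lower: float
    upper: float

    def matches(self, sample: Dict[str, float]) -> bool:
        return self.lower <= sample[self.gene] <= self.upper


@dataclass(frozen=True)
class QuantRule:
    """Hypothetical rule: IF all items match THEN predict `label`."""
    items: Tuple[RuleItem, ...]
    label: str

    def covers(self, sample: Dict[str, float]) -> bool:
        return all(item.matches(sample) for item in self.items)


def rule_accuracy(rule: QuantRule,
                  samples: List[Dict[str, float]],
                  labels: List[str]) -> float:
    """Fraction of samples where 'rule fires' agrees with 'label == rule.label'."""
    correct = sum(rule.covers(s) == (y == rule.label)
                  for s, y in zip(samples, labels))
    return correct / len(samples)


# Toy data (made up for illustration, not from any of the six datasets).
samples = [
    {"g1": 0.9, "g2": 0.2},
    {"g1": 0.1, "g2": 0.8},
    {"g1": 0.7, "g2": 0.3},
    {"g1": 0.2, "g2": 0.1},
]
labels = ["cancer", "normal", "cancer", "normal"]

one_item = QuantRule((RuleItem("g1", 0.5, 1.0),), "cancer")
two_items = QuantRule((RuleItem("g1", 0.5, 1.0), RuleItem("g2", 0.0, 0.25)), "cancer")

print(rule_accuracy(one_item, samples, labels))
print(rule_accuracy(two_items, samples, labels))
```

A rule of this shape stays readable by a biologist: each item names one gene and one expression range, which is the kind of interpretability the abstract argues for.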
Statistics for cancer gene expression profiles.
| Gene expression profiles | No. of classes | No. of samples | No. of genes | Description |
|---|---|---|---|---|
| Colon | 2 | 62 | 2000 | 40 cancer samples and 22 normal samples. |
| Leukemia1 | 2 | 72 | 7129 | 25 AML samples and 47 ALL samples. |
| Lung | 2 | 96 | 7129 | 86 cancer samples and 10 normal samples. |
| SRBCT | 4 | 83 | 2308 | 29 EWS cancer samples, 18 NB cancer samples, 11 BL cancer samples, and 25 RMS cancer samples. |
| Lymphoma | 3 | 62 | 4026 | 42 DLBCL cancer samples, 9 FL cancer samples, and 11 B-CLL cancer samples. |
| Leukemia2 | 3 | 72 | 7129 | 28 AML samples, 24 ALL samples, and 20 MLL samples. |
The control parameters for DQB algorithm.
| Parameter | Value |
|---|---|
| | 200 |
| | 2500 |
| | 30 |
| Limit | 10 |
The performance of CFS with the SVM classifier for the cancer gene expression profiles.
| Microarray datasets | Number of genes | Classification accuracy |
|---|---|---|
| Colon | 25 | 91.94% |
| Leukemia1 | 80 | 100% |
| Lung | 71 | 100% |
| SRBCT | 110 | 100% |
| Lymphoma | 184 | 100% |
| Leukemia2 | 103 | 100% |
The best quantitative classification rules for colon dataset.
| Number of rule items | QCR | Accuracy |
|---|---|---|
| 1 | | 87.09% |
| 2 | | 93.54% |
| 3 | | 98.38% |
| 4 | | 100% |
The best quantitative classification rules for leukemia1 dataset.
| Number of rule items | QCR | Accuracy |
|---|---|---|
| 1 | | 95.83% |
| 2 | | 100% |
| 2 | | 98.61% |
The best quantitative classification rules for lung dataset.
| Number of rule items | QCR | Accuracy |
|---|---|---|
| 1 | | 100% |
The best quantitative classification rules for SRBCT dataset.
| Number of rule items | QCR | Accuracy |
|---|---|---|
| 1 | | 100% |
| 2 | | 100% |
| 2 | | 100% |
| 3 | | 100% |
The best quantitative classification rules for lymphoma dataset.
| Number of rule items | QCR | Accuracy |
|---|---|---|
| 1 | | 100% |
| 2 | | 100% |
| 1 | | 96.96% |
The best quantitative classification rules for leukemia2 dataset.
| Number of rule items | QCR | Accuracy |
|---|---|---|
| 1 | | 100% |
| 2 | | 100% |
| 2 | | 100% |
The classification accuracy of the DQB algorithm with the proposed DLS method compared to the original local search method for the colon dataset.
| Number of rule items | Dynamic local search: best | Dynamic local search: worst | Dynamic local search: mean | Original local search: best | Original local search: worst | Original local search: mean |
|---|---|---|---|---|---|---|
| 1 | 87.09% | 85.48% | 86.28% | 72.58% | 69.35% | 70.96% |
| 2 | 93.54% | 91.93% | 92.74% | 88.70% | 85.48% | 87.09% |
| 3 | 98.38% | 96.77% | 97.58% | 90.32% | 85.48% | 87.09% |
| 4 | 100% | 98.38% | 99.19% | 87.09% | 82.25% | 85.48% |
The classification accuracy of the DQB algorithm with the proposed DLS method compared to the original local search method for the leukemia1 dataset.
| Number of rule items | Dynamic local search: best | Dynamic local search: worst | Dynamic local search: mean | Original local search: best | Original local search: worst | Original local search: mean |
|---|---|---|---|---|---|---|
| 1 | 95.83% | 94.44% | 95.13% | 84.72% | 80.55% | 82.63% |
| 2 | 100% | 97.22% | 98.61% | 84.72% | 81.94% | 83.33% |
| 3 | 100% | 100% | 100% | 91.66% | 90.27% | 90.96% |
The classification accuracy of the DQB algorithm with the proposed DLS method compared to the original local search method for the lung dataset.
| Number of rule items | Dynamic local search: best | Dynamic local search: worst | Dynamic local search: mean | Original local search: best | Original local search: worst | Original local search: mean |
|---|---|---|---|---|---|---|
| 1 | 100% | 98.95% | 99.22% | 93.75% | 89.58% | 90.22% |
| 2 | 100% | 100% | 100% | 98.95% | 95.33% | 96.87% |
| 3 | 100% | 100% | 100% | 100% | 91.66% | 97.91% |
The classification accuracy of the DQB algorithm with the proposed DLS method compared to the original local search method for the SRBCT dataset.
| Number of rule items | Dynamic local search: best | Dynamic local search: worst | Dynamic local search: mean | Original local search: best | Original local search: worst | Original local search: mean |
|---|---|---|---|---|---|---|
| 1 | 100% | 96.38% | 98.79% | 92.71% | 73.99% | 83.13% |
| 2 | 100% | 97.59% | 99.8% | 95.18% | 80.72% | 90.36% |
| 3 | 100% | 100% | 100% | 95.18% | 81.92% | 91.56% |
| 4 | 100% | 100% | 100% | 96.38% | 85.54% | 92.77% |
| 5 | 100% | 100% | 100% | 96.38% | 79.51% | 90.36% |
| 6 | 100% | 100% | 100% | 93.97% | 78.31% | 85.54% |
The classification accuracy of the DQB algorithm with the proposed DLS method compared to the original local search method for the lymphoma dataset.
| Number of rule items | Dynamic local search: best | Dynamic local search: worst | Dynamic local search: mean | Original local search: best | Original local search: worst | Original local search: mean |
|---|---|---|---|---|---|---|
| 1 | 100% | 92.42% | 96.96% | 92.42% | 68.18% | 89.39% |
| 2 | 100% | 90.90% | 95.16% | 92.42% | 61.24% | 86.36% |
| 3 | 100% | 90.90% | 95.16% | 87.87% | 59.67% | 74.19% |
| 4 | 100% | 90.90% | 95.16% | 86.36% | 59.67% | 72.58% |
The classification accuracy of the DQB algorithm with the proposed DLS method compared to the original local search method for the leukemia2 dataset.
| Number of rule items | Dynamic local search: best | Dynamic local search: worst | Dynamic local search: mean | Original local search: best | Original local search: worst | Original local search: mean |
|---|---|---|---|---|---|---|
| 1 | 100% | 95.83% | 97.22% | 91.66% | 72.22% | 83.33% |
| 2 | 100% | 98.61% | 99.3% | 93.05% | 68.05% | 80.55% |
| 3 | 100% | 100% | 100% | 91.66% | 55% | 69.44% |
| 4 | 100% | 100% | 100% | 90.27% | 55% | 72.22% |
Average runtime (in s) for the DQB algorithm and other classification algorithms.
| Algorithms | Preprocessing time | Average classification time | Total |
|---|---|---|---|
| DQB | 26.37 s | 30.51 s | 56.88 s |
| CFS + OneR | 26.37 s | 36.86 s | 63.23 s |
| CFS + PART | 26.37 s | 40.22 s | 66.59 s |
| mRMR-ABC with SVM | 25.17 s | 72.13 s | 97.3 s |
| Genetic-ABC with SVM | 25.17 s | 90.26 s | 115.43 s |
| ABC with SVM | 0.0 s | 134.74 s | 134.74 s |
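In the runtime table, each Total appears to be the sum of the preprocessing time and the average classification time. The short check below copies the reported values from the table and confirms that reading; the variable and function names are illustrative only.

```python
import math

# (preprocessing s, average classification s, reported total s), copied from the table.
runtimes = {
    "DQB": (26.37, 30.51, 56.88),
    "CFS + OneR": (26.37, 36.86, 63.23),
    "CFS + PART": (26.37, 40.22, 66.59),
    "mRMR-ABC with SVM": (25.17, 72.13, 97.3),
    "Genetic-ABC with SVM": (25.17, 90.26, 115.43),
    "ABC with SVM": (0.0, 134.74, 134.74),
}


def totals_consistent(table):
    """True if every reported total equals preprocessing + classification time."""
    return all(math.isclose(pre + cls, total, abs_tol=1e-6)
               for pre, cls, total in table.values())


# Algorithm with the smallest reported total runtime.
fastest = min(runtimes, key=lambda name: runtimes[name][2])

print(totals_consistent(runtimes), fastest)
```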
The classification accuracy of the algorithms under comparison for binary-class cancer microarray datasets. Numbers in parentheses are the numbers of selected genes.
| Algorithms | Colon | Leukemia1 | Lung |
|---|---|---|---|
| DQB | 99.19(4) | 100(3) | 100(2) |
| CFS-PART | 93.54(3) | 97.22(2) | 98.95(1) |
| CFS-OneR | 88.70(1) | 95.83(1) | 98.95(1) |
| ABC-SVM | 92.44(20) | 91.89(5) | 93.75(8) |
| mRMR-ABC | 93.60(15) | 92.82(5) | 98.53(7) |
| Genetic-ABC | 93.60(9) | 96.43(5) | 99.11(7) |
| CARSVM | 95.83(4) | | |
| Association rule-based | 90(2) | 82(2) | |
| BSTC | 82.35(866) | 100(2173) | |
| PSO | 85.48(20) | 94.44(23) | |
| PSO | 87.01(2000) | 93.06(7129) | |
| mRMR-PSO | 90.32(10) | 100(18) | |
| GADP | 100(8) | 100(5) | 100(8) |
| mRMR-GA | 100(15) | | |
| IGA | 100(4) | 95.75(7) | |
| MLHD-GA | 97.1(10) | | |
| GA | 93.55(12) | 100(6) | |
| mAnt | 91.5(8) | 100(9) | |
The classification accuracy of the algorithms under comparison for multi-class cancer microarray datasets. Numbers in parentheses are the numbers of selected genes.
| Algorithms | SRBCT | Lymphoma | Leukemia2 |
|---|---|---|---|
| DQB | |||
| CFS-PART | |||
| CFS-OneR | 74.69(1) | 96.77(1) | 86.11(1) |
| ABC-SVM | 87.99(6) | 92.42(5) | 90.27(8) |
| mRMR-ABC | 91.56(6) | 95.69(5) | 91.66(8) |
| GBC | 96.38(6) | 96.96(5) | 95.83(8) |
| GADP | 100(8) | 100(6) | |
| mRMR-GA | 95(5) | | |
| IGA | 98.75(6) | | |
| MLHD-GA | 100(11) | 100(6) | 100(9) |
| CFS-IBPSO | 100(6) | 98.57(41) | |
| mAnt | 100(7) | | |
| MIDClass | 100(3) | | |
| 1: | Represent the solution space of |
| 2: | |
| 3: | Generate random initial solution (rule) consisting of |
| 4: | Calculate the fitness value using fitness function in Eq. |
| 5: | Reset the abandonment counter. |
| 6: | |
| 7: | |
| 8: | Randomly select neighbour food source (rule) for current employee bee |
| 9: | Randomly select the first |
| 10: | Update the position of employee bee food source (rule) by changing the selected |
| 11: | Randomly select the second |
| 12: | Update the position of employee bee food source (rule) by changing selected |
| 13: | Calculate the fitness value using fitness function in Eq. |
| 14: | |
| 15: | Replace the old solution with the new one and reset the abandonment counter of the new solution. |
| 16: | |
| 17: | Increase the abandonment counter of the old solution by 1. |
| 18: | |
| 19: | Choose a food source (rule) depending on its probability to be chosen (roulette wheel). |
| 20: | Randomly select the first |
| 21: | Update the position of food source (rule) by changing the selected |
| 22: | Randomly select the second |
| 23: | Update the position of food source (rule) by changing selected |
| 24: | Calculate the fitness value using fitness function in Eq. |
| 25: | |
| 26: | Replace the old solution with the new one and reset the abandonment counter of the new solution. |
| 27: | |
| 28: | Increase the abandonment counter of the old solution by 1. |
| 29: | Set the abandonment bee Limit |
| 30: | Search for abandonment bee. |
| 31: | |
| 32: | Reset the abandonment counter of bee. |
| 33: | Generate a new solution (rule) for the employee bee randomly. |
| 34: | |
| 35: | Return the generated rules. |
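The listing above has lost its control-flow lines and symbols, so the following is only a minimal sketch of the general ABC loop it outlines (employee bee phase, roulette-wheel onlooker phase, abandonment counter with a limit, and random re-initialisation by scouts). Several things are assumptions and not the paper's method: solutions are plain vectors of floats instead of encoded rules, `toy_fitness` stands in for the paper's fitness equation (which is not reproduced here), and the two-step update is approximated by applying the standard ABC neighbour equation to two randomly chosen dimensions. All function and parameter names are hypothetical.

```python
import random


def toy_fitness(x):
    """Stand-in fitness (assumption): non-negative, maximal (1.0) at x = 0.5 in every dimension."""
    return 1.0 / (1.0 + sum((v - 0.5) ** 2 for v in x))


def abc_search(fitness, dim, bounds, sn=20, max_cycles=100, limit=10, seed=0):
    """Generic ABC maximisation sketch; returns (best_fitness, best_solution)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_solution():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    foods = [random_solution() for _ in range(sn)]   # food sources (candidate rules)
    fits = [fitness(x) for x in foods]
    trials = [0] * sn                                # abandonment counters

    def neighbour(i):
        # Pick a different food source k and perturb two dimensions (assumed reading
        # of the "first"/"second" update steps) with the standard ABC equation.
        k = rng.choice([j for j in range(sn) if j != i])
        cand = foods[i][:]
        for d in rng.sample(range(dim), min(2, dim)):
            phi = rng.uniform(-1.0, 1.0)
            cand[d] = min(hi, max(lo, cand[d] + phi * (cand[d] - foods[k][d])))
        return cand

    def greedy(i, cand):
        # Keep the better solution; otherwise count one more failed attempt.
        f = fitness(cand)
        if f > fits[i]:
            foods[i], fits[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1

    best_fit, best_sol = float("-inf"), None
    for _ in range(max_cycles):
        for i in range(sn):                          # employee bee phase
            greedy(i, neighbour(i))
        weights = [f + 1e-12 for f in fits]          # roulette wheel needs fitness >= 0
        for _ in range(sn):                          # onlooker bee phase
            i = rng.choices(range(sn), weights=weights)[0]
            greedy(i, neighbour(i))
        i_best = max(range(sn), key=fits.__getitem__)
        if fits[i_best] > best_fit:                  # memorise best before scouting
            best_fit, best_sol = fits[i_best], foods[i_best][:]
        for i in range(sn):                          # scout phase
            if trials[i] > limit:
                foods[i] = random_solution()
                fits[i] = fitness(foods[i])
                trials[i] = 0
    return best_fit, best_sol


best_fit, best_sol = abc_search(toy_fitness, dim=3, bounds=(0.0, 1.0))
print(round(best_fit, 4), [round(v, 3) for v in best_sol])
```

The best solution is memorised before the scout phase so that abandoning a stagnant food source cannot discard the best rule found so far.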