Ahmed T. Sahlol, Philip Kollmannsberger, Ahmed A. Ewees.
Abstract
White Blood Cell (WBC) Leukemia is caused by the excessive production of leukocytes in the bone marrow, and image-based detection of malignant WBCs is important for its diagnosis. Convolutional Neural Networks (CNNs) represent the current state of the art for this type of image classification, but their computational cost for training and deployment can be high. Here we present an improved hybrid approach for efficient classification of WBC Leukemia. We first extract features from WBC images using VGGNet, a powerful CNN architecture pre-trained on ImageNet. The extracted features are then filtered using a statistically enhanced Salp Swarm Algorithm (SESSA). This bio-inspired optimization algorithm selects the most relevant features and removes highly correlated and noisy ones. We applied the proposed approach to two public WBC Leukemia reference datasets and achieved both high accuracy and reduced computational complexity. SESSA selected only 1 K out of the 25 K features extracted with VGGNet while improving accuracy at the same time. The results are among the best achieved on these datasets and outperform several convolutional network models. We expect that the combination of CNN feature extraction and SESSA feature optimization will be useful for many other image classification tasks.
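The core of the approach is a Salp Swarm-style search over binary feature masks: a "leader" salp explores around the best mask found so far while "follower" salps average toward the one ahead of them. The sketch below is a minimal NumPy illustration of that plain Salp Swarm update wrapped for feature selection; the function names, the nearest-centroid fitness, and all parameter values are assumptions for illustration only and do not reproduce the paper's statistical enhancements (correlation filtering) or its SVM-based evaluation.

```python
import numpy as np

def _fitness(mask, X, y, alpha=0.99):
    """Weighted sum of nearest-centroid error and selected-feature ratio.

    Stand-in objective for illustration; the paper evaluates subsets
    differently. Smaller is better; empty subsets score worst.
    """
    if not mask.any():
        return 1.0
    Xs = X[:, mask]
    c0 = Xs[y == 0].mean(axis=0)          # class-0 centroid
    c1 = Xs[y == 1].mean(axis=0)          # class-1 centroid
    pred = (np.linalg.norm(Xs - c1, axis=1)
            < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    err = np.mean(pred != y)
    return alpha * err + (1 - alpha) * mask.mean()

def ssa_feature_select(X, y, n_agents=12, n_iter=40, seed=0):
    """Binary feature selection with a basic Salp Swarm optimizer (sketch)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.random((n_agents, d))        # salp positions in [0, 1]^d
    best_pos, best_fit = None, np.inf
    for t in range(n_iter):
        # Evaluate every salp's binary mask (position > 0.5).
        fits = np.array([_fitness(p > 0.5, X, y) for p in pos])
        i = fits.argmin()
        if fits[i] < best_fit:
            best_fit, best_pos = fits[i], pos[i].copy()
        # Exploration coefficient decays over iterations.
        c1 = 2 * np.exp(-(4 * (t + 1) / n_iter) ** 2)
        for i in range(n_agents):
            if i < n_agents // 2:          # leaders move around the best mask
                c2, c3 = rng.random(d), rng.random(d)
                step = c1 * c2             # bounds lb = 0, ub = 1
                pos[i] = np.where(c3 < 0.5, best_pos - step, best_pos + step)
            else:                          # followers track the salp ahead
                pos[i] = (pos[i] + pos[i - 1]) / 2
        pos = np.clip(pos, 0.0, 1.0)
    return best_pos > 0.5
```

On synthetic data where only one feature separates the classes, the returned mask reliably includes that feature while discarding most of the noise dimensions, which mirrors how SESSA prunes the 25 K VGGNet features down to a small informative subset.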
Year: 2020 PMID: 32054876 PMCID: PMC7018965 DOI: 10.1038/s41598-020-59215-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1: Samples from the ALL-IDB2 dataset [2] showing benign (top) and malignant (bottom) lymphocytes.
Figure 2: Overview of the VGGNet layer structure (left) and corresponding parameters (right).
Figure 3: Flow chart of our proposed approach.
Figure 4: Performance of the proposed hybrid VGGNet and SESSA approach on the ALL-IDB dataset; (a) average performance over 10 runs, (b) accuracy for 10 best and worst runs.
Comparison of feature number and performance for both datasets (empty cells were not recoverable from the source record).

| Dataset | Method | Features | Percentage | Accuracy | Specificity | Sensitivity |
|---|---|---|---|---|---|---|
| Dataset 1 (ALL-IDB2) | VGG19 | 25088 | 100% | 94.23 | 88 | |
| Dataset 1 (ALL-IDB2) | Proposed approach | 95 | | | | |
| Dataset 2 (C-NMC) | VGG19 | 25088 | 100% | 80.9 | 80.9 | |
| Dataset 2 (C-NMC) | Proposed approach | 67.3 | | | | |
Results of the feature selection compared to other swarm-based optimization algorithms for both datasets. "Int." denotes internal validation, "test" denotes testing (external validation); empty cells were not recoverable from the source record, and the placement of the two surviving SESSA values on Dataset 2 is inferred from the magnitudes of the neighboring rows.

| Dataset | Alg. | F. no. | RMSE (int.) | Acc. (int.) | Sens. (int.) | Spec. (int.) | Prec. (int.) | F1 (int.) | RMSE (test) | Acc. (test) | Sens. (test) | Spec. (test) | Prec. (test) | F1 (test) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dataset 1 (ALL-IDB2) | SESSA | | | | | | | | | | | | | |
| Dataset 1 (ALL-IDB2) | SEMVO | 1121 | 0.122 | 0.981 | | 0.961 | 0.963 | 0.981 | 0.1902 | 0.9610 | 0.9947 | 0.9268 | 0.9304 | 0.9617 |
| Dataset 1 (ALL-IDB2) | SEGWO | 1101 | 0.170 | 0.968 | 0.999 | 0.938 | 0.939 | 0.967 | 0.1941 | 0.9576 | 0.9942 | 0.9199 | 0.9258 | 0.9587 |
| Dataset 1 (ALL-IDB2) | SEPSO | 1163 | 0.132 | 0.979 | | 0.957 | 0.96 | 0.979 | 0.1944 | 0.9609 | 0.9929 | 0.9263 | 0.9298 | 0.9615 |
| Dataset 1 (ALL-IDB2) | SEGA | 1158 | 0.175 | 0.965 | 0.997 | 0.933 | 0.937 | 0.966 | 0.204 | 0.9547 | 0.9918 | 0.9184 | 0.9247 | 0.9561 |
| Dataset 2 (C-NMC) | SESSA | | | | | 0.673 | 0.85 | | | | | | | |
| Dataset 2 (C-NMC) | SEMVO | 1168 | 0.419 | 0.825 | 0.902 | 0.662 | 0.848 | 0.874 | 0.447 | 0.800 | 0.871 | 0.645 | 0.843 | 0.857 |
| Dataset 2 (C-NMC) | SEGWO | 766 | 0.407 | 0.834 | 0.906 | 0.676 | 0.861 | 0.883 | 0.427 | 0.818 | 0.906 | 0.634 | 0.837 | 0.870 |
| Dataset 2 (C-NMC) | SEPSO | 1196 | 0.399 | 0.841 | 0.916 | 0.673 | 0.862 | 0.888 | 0.418 | 0.825 | 0.897 | | | 0.874 |
| Dataset 2 (C-NMC) | SEGA | 1102 | 0.42 | 0.824 | 0.901 | 0.662 | 0.848 | 0.874 | 0.443 | 0.804 | 0.878 | 0.642 | 0.842 | 0.860 |
Parameter settings of all optimization algorithms (values were not recoverable from the source record).

| Algorithm | Parameter values |
|---|---|
| SESSA | |
| SEMVO | |
| SEGWO | |
| SEPSO | |
| SEGA | |
Figure 5: Convergence curves of the proposed approach and of other optimization approaches; (a) 10 independent runs of SESSA, (b) comparison to other algorithms.
Figure 6: Feature extraction time and accuracy on the ALL-IDB2 dataset (a) and on the C-NMC dataset (b) compared to other CNN models.
Comparison with related works on ALL-IDB2 (top) and C-NMC (bottom); the accuracy of the proposed approach was not recoverable from the source record.

| Dataset | Work | Features | Classifier | Feature extraction | Accuracy % |
|---|---|---|---|---|---|
| ALL-IDB2 | Singhal | Texture | SVM | Manual | 89.72 |
| ALL-IDB2 | Singhal | Texture | KNN | Manual | 93.84 |
| ALL-IDB2 | Bhattacharjee | Shape | KNN | Manual | 95.23 |
| ALL-IDB2 | Sahlol | Shape, color, texture | KNN | Manual | 95.67 |
| ALL-IDB2 | Proposed approach | Deep features (VGG19) | SVM | Autom. | |
| C-NMC | Marzahl | Deep features (ResNet 18) | CNN | Autom. | 86.9 |
| C-NMC | Ding | Deep features (various) | CNN | Autom. | 86.7 |
| C-NMC | Kulhalli | Deep features (ResNeXt) | CNN | Autom. | 85.7 |
| C-NMC | Proposed approach | Deep features (VGG19) | SVM | Autom. | |