Arjun Ghosh, Nanda Dulal Jana, Saurav Mallik, Zhongming Zhao.
Abstract
Convolutional neural networks (CNNs) are deep learning models widely used for tasks such as computer vision and speech recognition. CNNs are typically designed manually, based on problem-specific domain knowledge and intricate settings, which is laborious, time consuming, and challenging. To address these issues, our study develops an improved differential evolution of convolutional neural network (IDECNN) algorithm to design CNN layer architectures for image classification. Variable-length encoding is utilized in IDECNN to represent the flexible layer architecture of a CNN model. An efficient heuristic mechanism is proposed in IDECNN to evolve CNN architectures through mutation and crossover while preventing premature convergence during the evolutionary process. Eight well-known image datasets were utilized. The results showed that IDECNN could design suitable architectures compared with 20 existing CNN models. Finally, the evolved CNN architectures were applied to pneumonia and coronavirus disease 2019 (COVID-19) X-ray biomedical image data. The results demonstrated the usefulness of the proposed approach for generating a suitable CNN model.
Keywords: CNN; DE; NAS; convolutional neural network; differential evolution; image classification; neural architecture search; neuroevolution; optimal neural architecture
Year: 2022 PMID: 36124301 PMCID: PMC9481963 DOI: 10.1016/j.patter.2022.100567
Source DB: PubMed Journal: Patterns (N Y) ISSN: 2666-3899
Figure 1. Conventional structure of a CNN model
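For orientation, a minimal Keras sketch of such a conventional Conv-Pool-FC stack might look as follows; the layer counts and sizes here are illustrative and are not taken from the paper.

```python
# A minimal, illustrative Conv-Pool-FC stack; layer counts and sizes are not from the paper.
from tensorflow.keras import layers, models

def conventional_cnn(input_shape=(28, 28, 1), num_classes=10):
    """Build the classic convolution -> pooling -> fully connected pipeline."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
```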
Summary of the related works and comparison with the proposed work
| Model | Search method | Proposed work | Limitation | Dataset |
|---|---|---|---|---|
| GeNet | GA | A fixed-length binary encoding strategy was used to represent the connections between layers of the CNN model, and GA was used to evolve these connections. | Hyper-parameters of the associated layers were ignored. | CIFAR-10 |
| EvoCNN | GA | A variable-length encoding strategy was used to represent the CNN, and GA was used to optimize both the connections between layers and the weights of the CNN model. | The best-generated architecture faced an over-fitting problem because of the large number of parameters considered. | MNIST, convex, rectangle |
| MA-NET | Memetic | CNN was represented using a variable-length encoding strategy, and the memetic algorithm was used to optimize connections between layers of the CNN model. | Because of the large number of parameters considered, the best-generated architecture may have encountered an over-fitting problem. | MNIST, MNIST variation, convex, rectangle |
| DeepSwarm | ACO | Pheromone information was used collectively to find the best CNN model. The authors used local and global pheromone update rules during method execution to balance exploration and exploitation. | Associated collective behavior made the approach computationally expensive. | MNIST, Fashion-MNIST, CIFAR-10 |
| IPPSO | PSO | A novel encoding scheme inspired by computer networking was used to represent a CNN architecture. PSO was used to optimize the layers and associated hyper-parameters of the CNN models. | Because of the architecture’s fixed pre-defined length, the depth of the architectural search space was reduced. | MNIST, MNIST with noisy image, convex |
| psoCNN | PSO | A variable-length encoding strategy was used to represent the CNN architecture, and layer types, such as Conv, Pool, and FC, were updated by randomly copying layers from the personal or global best solutions. | The architectural search space might be under-explored because each new particle was built from the global or personal best particle. | MNIST, MNIST variation, convex, rectangle |
| DECNN | DE | An internet protocol (IP)-based encoding strategy was used to represent a CNN architecture. DE mutation and crossover operations were used to evolve CNN models, and an extra crossover operator was integrated to generate offspring from the parent individuals. | The trim operation performed before mutation may have reduced exploration of the architectural search space, and the two crossover operations made the approach complicated and expensive. | MNIST, MNIST variation, convex, rectangle |
| DE-NAS | DE | A cell-based encoding scheme was used to represent the CNN models. Continuous values were mapped to the NAS search space with a discretization strategy to evaluate architectures. | Converting from continuous to discrete space was expensive, and the cell-based encoding strategy made the approach more costly and complicated. | CifarA, CifarB, CifarC |
| IDECNN (proposed method) | DE | A variable-length direct encoding scheme is proposed to represent the depth, layer types, and arrangement of layers in CNN architectures. A simple difference mechanism between two architectures, followed by mutation and crossover operations, evolves each CNN through the original DE. | Only layer-based CNN architecture design is considered, rather than cell- or block-based design, because of the limited computational resources at hand. | MNIST, MNIST variation, convex, rectangle |
Figure 2. Framework of the proposed IDECNN algorithm
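Reading the framework at a high level, the search alternates between evaluating candidate architectures with short training runs and evolving them with mutation and crossover. The following is a hedged, pseudocode-level sketch of such a loop; the four callables (`random_individual`, `mutate`, `crossover`, `evaluate`) are hypothetical placeholders standing in for the operators detailed in the paper, not its actual API.

```python
# Hedged sketch of a DE-style architecture-search loop; the injected callables are
# hypothetical placeholders for the paper's operators, not its actual implementation.
def idecnn_search(random_individual, mutate, crossover, evaluate,
                  pop_size=20, generations=20):
    population = [random_individual() for _ in range(pop_size)]   # variable-length encodings
    fitness = [evaluate(ind) for ind in population]               # e.g., 1-epoch training accuracy

    for _ in range(generations):
        for i, target in enumerate(population):
            donor = mutate(population, i)                         # difference + mutation (Figures 4 and 5)
            trial = crossover(target, donor)                      # crossover (Figure 6)
            trial_fitness = evaluate(trial)
            if trial_fitness >= fitness[i]:                       # greedy selection keeps the better candidate
                population[i], fitness[i] = trial, trial_fitness

    best = max(range(pop_size), key=fitness.__getitem__)
    return population[best]                                       # winner is then retrained fully (e.g., 100 epochs)
```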
Figure 3. Individuals with different lengths in IDECNN
The hyper-parameter and parameter settings for IDECNN
| Parameter name | Value |
|---|---|
| DE initialization | |
| Population size | 20 |
| # generation | 20 |
| Scaling factor (F) | 0.6 |
| Crossover rate (CR) | 0.4 |
| Hyper-parameters of CNN | |
| Conv kernel size | 3–7 |
| Conv stride size | 1 |
| # feature maps | 3–256 |
| Pool kernel size | 3 |
| Pool stride size | 2 |
| Pool type | average or max |
| No. of neurons in an FC layer | 1–300 |
| Length of CNN | 3–10 |
| Training of CNN | |
| Activation function | ReLU |
| Weight initialization | Xavier |
| Optimizer | Adam |
| Learning rate | 0.001 |
| Batch size | 200 |
| Dropout rate | 0.5 |
| No. of epochs for single individual evaluation | 1 |
| No. of epochs for final individual | 100 |
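Taking the initialization ranges above at face value, an individual such as those in Figure 3 could be sampled roughly as follows. This is a hedged sketch: the descriptor field names are ours, and the paper's exact encoding may differ.

```python
import random

# Hedged sketch: sample one variable-length individual within the ranges listed above.
# Field names ("type", "kernel", ...) are ours, not the paper's encoding.
def random_layer():
    if random.random() < 0.5:
        return {"type": "conv", "kernel": random.randint(3, 7), "stride": 1,
                "feature_maps": random.randint(3, 256)}
    return {"type": "pool", "kernel": 3, "stride": 2,
            "mode": random.choice(["max", "average"])}

def random_individual(min_depth=3, max_depth=10):
    """One individual is an ordered, variable-length list of layer descriptors."""
    depth = random.randint(min_depth, max_depth)
    encoding = [random_layer() for _ in range(depth - 1)]
    encoding.append({"type": "fc", "neurons": random.randint(1, 300)})  # end with an FC layer
    return encoding
```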
Figure 4. Difference calculation between two individuals
Figure 5. Donor vector generation using mutation operation
Figure 6. Trial vector generation using crossover operation
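Figures 4-6 illustrate how a difference between two individuals drives donor and trial vector generation. The paper's exact operators are not reproduced here; the snippet below is one heavily simplified reading for list-encoded architectures, in which positions where two individuals disagree are resampled in the donor and a binomial crossover with rate CR builds the trial.

```python
import random

# Heavily simplified reading of Figures 4-6 for list-encoded architectures;
# not the paper's exact operators.
def difference(ind_a, ind_b):
    """Layer positions at which the two encodings disagree (up to the shorter length)."""
    return [i for i in range(min(len(ind_a), len(ind_b))) if ind_a[i] != ind_b[i]]

def mutate(base, ind_a, ind_b, new_layer):
    """Donor vector: copy the base and resample the layers where ind_a and ind_b differ."""
    donor = [dict(layer) for layer in base]
    for i in difference(ind_a, ind_b):
        if i < len(donor):
            donor[i] = new_layer()
    return donor

def crossover(target, donor, cr=0.4):
    """Trial vector: take each donor layer with probability CR, else keep the target layer."""
    shared = min(len(target), len(donor))
    trial = [donor[i] if random.random() < cr else target[i] for i in range(shared)]
    trial.extend(target[shared:] or donor[shared:])   # keep the tail of the longer parent
    return trial
```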
Figure 7. Sample pictures of each benchmark dataset used in the proposed work
Overview of the datasets used in the proposed IDECNN algorithm for experimental study
| Dataset | Input size | Description | No. of training | No. of test | No. of classes |
|---|---|---|---|---|---|
| MNIST | | handwritten digits | | | 10 |
| MBI | | handwritten digits with background images | | | 10 |
| MRB | | handwritten digits with random noise as background | | | 10 |
| MRD | | handwritten rotated digits | | | 10 |
| MRDBI | | handwritten rotated digits and background images | | | 10 |
| CS | | convex shapes | | | 2 |
| RECT | | rectangle border shapes | | | 2 |
| RECT-I | | rectangle border shapes and image backgrounds | | | 2 |
Classification error results of IDECNN and state-of-the-art methods/models
| Methods/models | MNIST | MBI | MRB | MRD | MRDBI | CS | RECT | RECT-I | |
|---|---|---|---|---|---|---|---|---|---|
| LeNet-1 | – | – | – | – | – | – | – | ||
| LeNet-4 | – | – | – | – | – | – | – | ||
| LeNet-5 | – | – | – | – | – | – | – | ||
| NNet | |||||||||
| SVM + Poly | |||||||||
| SVM + RBF | |||||||||
| DBN-1 | |||||||||
| DBN-3 | |||||||||
| SAA-3 | |||||||||
| TIRBM | – | – | – | – | – | – | – | ||
| PGBM + DN-1 | – | – | – | – | – | – | |||
| RandNet-2 | |||||||||
| PCANet-2 | |||||||||
| LDANet-2 | |||||||||
| EvoCNN | best | ||||||||
| mean | |||||||||
| MA-NET | best | – | – | – | – | ||||
| mean | – | – | – | – | – | – | – | – | |
| DeepSwarm | best | – | – | – | – | – | – | – | |
| mean | – | – | – | – | – | – | – | ||
| IPPSO | best | – | – | – | – | – | |||
| mean | – | – | – | – | – | ||||
| SD | – | – | – | – | – | ||||
| psoCNN | best | ||||||||
| mean | |||||||||
| DECNN | best | – | – | ||||||
| mean | – | – | |||||||
| SD | – | – | |||||||
| IDECNN | best | ||||||||
| mean | |||||||||
| SD |
Figure 8. Test accuracy boxplots of the IDECNN algorithm for the MNIST, MBI, MRD, MRDBI, CS, RECT, and RECT-I datasets
Classification error results of psoCNN and IDECNN without BN and dropout on the CS image dataset
| Model | | CS |
|---|---|---|
| psoCNN (without BN and dropout) | best | |
| | mean | |
| IDECNN (without BN and dropout) | best | |
| | mean | |
| | SD | |
Best CNN architectures evolved by IDECNN on eight image datasets
| Dataset | CNN architecture with hyper-parameters |
|---|---|
| MNIST | |
| MBI | |
| MRB | |
| MRD | |
| MRDBI | |
| CS | |
| RECT | |
| RECT-I | |
Conv, convolution; Pool, pooling; FC, fully connected. The hyper-parameters listed with each layer are the Conv kernel size, Conv stride size, number of feature maps, Pool kernel size, Pool stride size, and number of neurons.
Number of parameters used in each generated best CNN architecture
| Optimal CNN architecture | No. of parameters |
|---|---|
| MNIST_CNN | 4.32 million |
| MBI_CNN | 12.41 million |
| MRB_CNN | 9.40 million |
| MRD_CNN | 5.58 million |
| MRDBI_CNN | 6.14 million |
| CS_CNN | 16.27 million |
| RECT_CNN | 2.43 million |
| RECT-I_CNN | 1.79 million |
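The evolved designs above are only useful once decoded back into a trainable network, and parameter counts like those in the table come from the instantiated model. Below is a hedged Keras sketch of such a decoding step; the descriptor fields follow the illustrative encoding sketched earlier in this record, not necessarily the paper's internal representation.

```python
from tensorflow.keras import layers, models, optimizers

# Hedged sketch: turn a list of layer descriptors into a Keras model and report its size.
# Descriptor fields follow the illustrative encoding above, not necessarily the paper's.
def decode(encoding, input_shape=(28, 28, 1), num_classes=10):
    model = models.Sequential([layers.Input(shape=input_shape)])
    flattened = False
    for d in encoding:
        if d["type"] == "conv":
            model.add(layers.Conv2D(d["feature_maps"], d["kernel"], strides=d["stride"],
                                    padding="same", activation="relu",
                                    kernel_initializer="glorot_uniform"))   # Xavier initialization
        elif d["type"] == "pool":
            pool_cls = layers.MaxPooling2D if d["mode"] == "max" else layers.AveragePooling2D
            model.add(pool_cls(pool_size=d["kernel"], strides=d["stride"], padding="same"))
        elif d["type"] == "fc":
            if not flattened:
                model.add(layers.Flatten())
                flattened = True
            model.add(layers.Dense(d["neurons"], activation="relu"))
            model.add(layers.Dropout(0.5))
    if not flattened:
        model.add(layers.Flatten())
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    print(f"Parameters: {model.count_params():,}")
    return model
```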
Figure 9. Effect of three epoch numbers (1, 5, and 10) on best model training accuracy during fitness evaluations on the CS dataset
Figure 10. Training accuracy of the best individual for 10 runs on the CS dataset
Figure 11. Training accuracy of the best individual with different F and CR settings for the CS dataset
Overview of the chest X-ray dataset
| Class name | Input size | No. of training | No. of validation | No. of test |
|---|---|---|---|---|
| Normal | 1,082 | 267 | 234 | |
| Pneumonia | 3,110 | 773 | 390 |
Comparison of the classification accuracy of psoCNN and IDECNN on pneumonia chest X-ray images using the best CNN architecture generated for each dataset
| Optimal CNN model | Model | Accuracy |
|---|---|---|
| MNIST_CNN | psoCNN | |
| | IDECNN | |
| MBI_CNN | psoCNN | |
| | IDECNN | |
| MRB_CNN | psoCNN | |
| | IDECNN | |
| MRD_CNN | psoCNN | |
| | IDECNN | |
| MRDBI_CNN | psoCNN | |
| | IDECNN | |
| CS_CNN | psoCNN | |
| | IDECNN | |
| RECT_CNN | psoCNN | |
| | IDECNN | |
| RECT-I_CNN | psoCNN | |
| | IDECNN | |
Figure 12. A sample of predicted images with the predicted accuracy percentage, using the MNIST_CNN model in the case of psoCNN and the MRD_CNN model in the case of IDECNN
Figure 13. The obtained confusion matrices for the chest X-ray dataset using the eight best generated CNN architectures of the proposed IDECNN
(A–H) (A) MNIST_CNN, (B) MBI_CNN, (C) MRB_CNN, (D) MRD_CNN, (E) MRDBI_CNN, (F) CS_CNN, (G) RECT_CNN, and (H) RECT-I_CNN.
The obtained precision, recall, and F1 score for each class of the chest X-ray dataset using the models MNIST_CNN, MBI_CNN, MRB_CNN, MRD_CNN, MRDBI_CNN, CS_CNN, RECT_CNN, and RECT-I_CNN
| Optimal CNN architecture | Normal | Pneumonia | ||||
|---|---|---|---|---|---|---|
| Precision | Recall | F1 score | Precision | Recall | F1 score | |
| MNIST_CNN | 0.92 | 0.67 | 0.78 | 0.83 | 0.97 | 0.89 |
| MBI_CNN | 0.98 | 0.34 | 0.50 | 0.71 | 0.99 | 0.83 |
| MRB_CNN | 0.93 | 0.59 | 0.72 | 0.80 | 0.97 | 0.88 |
| MRD_CNN | 0.96 | 0.71 | 0.82 | 0.85 | 0.98 | 0.91 |
| MRDBI_CNN | 0.98 | 0.47 | 0.63 | 0.76 | 0.99 | 0.86 |
| CS_CNN | 0.98 | 0.34 | 0.50 | 0.71 | 0.99 | 0.83 |
| RECT_CNN | 0.96 | 0.39 | 0.56 | 0.73 | 0.99 | 0.84 |
| RECT-I_CNN | 0.96 | 0.47 | 0.63 | 0.76 | 0.99 | 0.86 |
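For reference, the per-class scores above follow directly from the confusion matrices in Figure 13; a short sketch of the calculation is given below (the counts in the example are placeholders, not values from the paper).

```python
# Precision, recall, and F1 score from binary confusion-matrix counts.
# The example counts below are placeholders, not values from the paper.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: "normal" treated as the positive class with made-up counts.
p, r, f1 = precision_recall_f1(tp=160, fp=10, fn=74)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```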
Overview of the chest X-ray dataset with two random splitting scenarios
| Scenario | Class name | No. of training | No. of validation | No. of test |
|---|---|---|---|---|
| 1 | normal | 1,108 | 316 | 159 |
| pneumonia | 2,991 | 854 | 428 | |
| 2 | normal | 791 | 474 | 318 |
| pneumonia | 2,136 | 1,281 | 856 |
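Random splits like the two scenarios above can be produced with stratified sampling so that class proportions are preserved; a hedged scikit-learn sketch follows (variable names, fractions, and the seed are illustrative, not the paper's settings).

```python
from sklearn.model_selection import train_test_split

# Hedged sketch of a stratified, two-stage random split; fractions and seed are illustrative.
# `paths` holds image file paths and `labels` the corresponding class names.
def random_split(paths, labels, test_frac=0.15, val_frac=0.20, seed=0):
    x_rest, x_test, y_rest, y_test = train_test_split(
        paths, labels, test_size=test_frac, stratify=labels, random_state=seed)
    x_train, x_val, y_train, y_val = train_test_split(
        x_rest, y_rest, test_size=val_frac, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```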
Figure 14. The obtained confusion matrices for the chest X-ray dataset with random splitting (scenario 1) using the eight best generated CNN architectures of the proposed IDECNN
(A–H) (A) MNIST_CNN, (B) MBI_CNN, (C) MRB_CNN, (D) MRD_CNN, (E) MRDBI_CNN, (F) CS_CNN, (G) RECT_CNN, and (H) RECT-I_CNN.
The obtained precision, recall, F1 score, and model accuracy using the models MNIST_CNN, MBI_CNN, MRB_CNN, MRD_CNN, MRDBI_CNN, CS_CNN, RECT_CNN, and RECT-I_CNN for scenario 1
| Optimal CNN architecture | Normal | Pneumonia | Model accuracy | ||||
|---|---|---|---|---|---|---|---|
| Precision | Recall | F1 score | Precision | Recall | F1 score | ||
| MNIST_CNN | 0.85 | 0.67 | 0.75 | 0.89 | 0.96 | 0.92 | |
| MBI_CNN | 0.61 | 0.49 | 0.54 | 0.82 | 0.88 | 0.85 | |
| MRB_CNN | 0.71 | 0.76 | 0.73 | 0.91 | 0.86 | 0.88 | |
| MRD_CNN | 0.89 | 0.79 | 0.84 | 0.93 | 0.96 | 0.94 | |
| MRDBI_CNN | 0.65 | 0.55 | 0.60 | 0.84 | 0.86 | 0.85 | |
| CS_CNN | 0.51 | 0.55 | 0.53 | 0.83 | 0.80 | 0.81 | |
| RECT_CNN | 0.91 | 0.42 | 0.57 | 0.82 | 0.98 | 0.89 | |
| RECT-I_CNN | 0.82 | 0.45 | 0.58 | 0.82 | 0.96 | 0.88 | |
Figure 15. The obtained confusion matrices for the chest X-ray dataset with random splitting (scenario 2) using the eight best generated CNN architectures of the proposed IDECNN
(A–H) (A) MNIST_CNN, (B) MBI_CNN, (C) MRB_CNN, (D) MRD_CNN, (E) MRDBI_CNN, (F) CS_CNN, (G) RECT_CNN, and (H) RECT-I_CNN.
The obtained precision, recall, F1 score, and model accuracy using the models MNIST_CNN, MBI_CNN, MRB_CNN, MRD_CNN, MRDBI_CNN, CS_CNN, RECT_CNN, and RECT-I_CNN for scenario 2
| Optimal CNN architecture | Normal | Pneumonia | Model accuracy | ||||
|---|---|---|---|---|---|---|---|
| Precision | Recall | F1 score | Precision | Recall | F1 score | ||
| MNIST_CNN | 0.72 | 0.87 | 0.79 | 0.95 | 0.87 | 0.91 | |
| MBI_CNN | 0.64 | 0.51 | 0.57 | 0.83 | 0.89 | 0.86 | |
| MRB_CNN | 0.68 | 0.77 | 0.72 | 0.91 | 0.87 | 0.89 | |
| MRD_CNN | 0.82 | 0.92 | 0.87 | 0.97 | 0.93 | 0.95 | |
| MRDBI_CNN | 0.79 | 0.57 | 0.66 | 0.86 | 0.95 | 0.90 | |
| CS_CNN | 0.70 | 0.61 | 0.65 | 0.86 | 0.90 | 0.88 | |
| RECT_CNN | 0.70 | 0.68 | 0.69 | 0.88 | 0.89 | 0.88 | |
| RECT-I_CNN | 0.61 | 0.51 | 0.56 | 0.83 | 0.88 | 0.85 | |
Overview of the COVID-19 X-ray dataset
| Class name | Input size | No. of training | No. of validation | No. of test |
|---|---|---|---|---|
| COVID-19 | 2,531 | 723 | 362 | |
| Non-COVID-19 | 7,134 | 2,038 | 1,020 |
Figure 16. The obtained confusion matrices for the COVID-19 X-ray dataset using the eight best generated CNN architectures of the proposed IDECNN
(A–H) (A) MNIST_CNN, (B) MBI_CNN, (C) MRB_CNN, (D) MRD_CNN, (E) MRDBI_CNN, (F) CS_CNN, (G) RECT_CNN, and (H) RECT-I_CNN.
The obtained precision, recall, F1 score, and model accuracy using the models MNIST_CNN, MBI_CNN, MRB_CNN, MRD_CNN, MRDBI_CNN, CS_CNN, RECT_CNN, and RECT-I_CNN for the COVID-19 dataset
| Optimal CNN architecture | COVID-19 | Non-COVID-19 | Model accuracy | ||||
|---|---|---|---|---|---|---|---|
| Precision | Recall | F1 score | Precision | Recall | F1 score | ||
| MNIST_CNN | 0.65 | 0.67 | 0.66 | 0.88 | 0.87 | 0.87 | |
| MBI_CNN | 0.64 | 0.70 | 0.69 | 0.89 | 0.86 | 0.87 | |
| MRB_CNN | 0.67 | 0.68 | 0.67 | 0.89 | 0.88 | 0.88 | |
| MRD_CNN | 0.61 | 0.69 | 0.65 | 0.88 | 0.84 | 0.86 | |
| MRDBI_CNN | 0.59 | 0.67 | 0.63 | 0.88 | 0.83 | 0.85 | |
| CS_CNN | 0.56 | 0.62 | 0.59 | 0.87 | 0.81 | 0.84 | |
| RECT_CNN | 0.58 | 0.67 | 0.62 | 0.88 | 0.83 | 0.85 | |
| RECT-I_CNN | 0.59 | 0.72 | 0.65 | 0.89 | 0.82 | 0.85 | |