Alhassan Mabrouk1, Abdelghani Dahou2, Mohamed Abd Elaziz3,4,5, Rebeca P Díaz Redondo6, Mohammed Kayed7. 1. Mathematics and Computer Science Department, Faculty of Science, Beni-Suef University, Beni Suef 62511, Egypt. 2. Mathematics and Computer Science Department, University of Ahmed Draia, 01000 Adrar, Algeria. 3. Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt. 4. Faculty of Computer Science and Engineering, Galala University, Suez 435611, Egypt. 5. Artificial Intelligence Research Center (AIRC), Ajman University, P.O. Box 346, Ajman, UAE. 6. Information & Computing Lab, AtlanTTIC Research Center, Telecommunication Engineering School, Universidade de Vigo, 36310 Vigo, Spain. 7. Computer Science Department, Faculty of Computers and Artificial Intelligence, Beni-Suef University, Beni Suef 62511, Egypt.
Abstract
The Internet of Medical Things (IoMT) has dramatically benefited healthcare by letting patients and physicians access medical services from any region. Although the automatic detection and prediction of diseases such as melanoma and leukemia are still being investigated in IoMT, existing approaches do not achieve a high degree of efficiency. A new approach that provides better results would let patients access adequate treatment earlier and reduce the death rate. Therefore, this paper introduces an IoMT proposal for medical image classification that may be used anywhere, i.e., a ubiquitous approach. It was designed in two stages: first, we employ a transfer learning (TL)-based method for feature extraction, carried out with MobileNetV3; second, we use chaos game optimization (CGO) for feature selection, with the aim of excluding unnecessary features and improving performance, which is key in IoMT. Our methodology was evaluated using the ISIC-2016, PH2, and Blood-Cell datasets. The experimental results indicate that the proposed approach obtains an accuracy of 88.39% on ISIC-2016, 97.52% on PH2, and 88.79% on the Blood-Cell dataset. Moreover, our approach performed well on the employed metrics compared to other existing methods.
The Internet of Things (IoT) was formulated to describe devices that can be controlled remotely [1]. The development of these devices enabled a wide range of uses; hence, IoT is applied in many areas, such as industry [2], smart cities [3], agriculture [4], and the Internet of Medical Things (IoMT) [5]. IoMT technology has been widely adopted because of its high performance and because it saves the time and effort of specialists and patients [6]. Besides, it supports patient care, such as monitoring medications and tracking hospital admission locations. IoMT technologies are widely available, especially for diseases with the highest mortality rates globally, such as melanoma [7] and leukemia [8]. Technologies such as mobile devices and wearables can collect information about human health to provide effective hospital care. They can be used in many applications and services, such as acquiring and analyzing data and monitoring the diagnosis of neurological illnesses. As a result of its efficiency and usability, IoMT technology has been broadly accepted and widely used. Deep learning (DL) models can help diagnose breast cancer [9] and Alzheimer's disease [10] using advanced biomedical imaging methods such as thermal imaging and magnetic resonance imaging (MRI); however, these methods are expensive, require specialized medical imaging equipment, and are not available in many rural areas of developing countries. Thus, DL has recently been used in IoMT to automate and accurately diagnose a variety of diseases, helping to deliver efficient and appropriate healthcare [11]. For instance, an IoMT system for stroke detection using convolutional neural networks (CNN) and transfer learning was demonstrated to distinguish between a healthy brain and hemorrhagic and ischemic strokes in CT scan images, as introduced in [12].
Although DL models outperform traditional machine learning [13], less work is known on DL-based IoMT for healthcare than on the services available on IoMT devices. An IoMT system for stroke prevention can capture and maintain a patient's heartbeat, core temperature, and external factors quickly and with the required precision; these factors are essential for stroke examination. DL techniques can help solve recurring difficulties that otherwise take much time. For example, web scraping [14], data mining [15], and sentiment analysis [16] are all areas where TL technology has a broad array of applications. However, these approaches need a huge amount of well-labeled training data. Many transfer learning (TL)-based approaches have been developed in medical image analysis to address this issue. Due to its capacity to effectively overcome the shortcomings of reinforcement learning and supervised learning, TL is becoming more widespread in medical image processing [17]. TL aims to train the prediction function in the target domain by utilizing information obtained in the source domain from a vast number of labeled samples (e.g., ImageNet). TL is widely recognized in different computer vision domains for enhancing learning on sparsely labeled or limited datasets in a particular domain [18]. Unfortunately, for TL in medical imaging, the input image properties of the training examples (i.e., a massive dataset of natural images) and the test data (i.e., a small dataset of clinical images) are highly different. Because of the significantly different domains with various and unconnected classes, as in [19], the transferred functions learned from the source database (training set) may be biased when directly applied to the target database (test set). Consequently, the biased function's features are unlikely to be desirable in the target domain, the medical image field.
Moreover, for TL it is vital to have both representational and discriminative capability in the feature extraction process in order to improve classification accuracy [20]. In the traditional view, the TL model is pretrained and then finetuned for deployment using task-specific information. Unsupervised, inductive, transductive, and negative learning are all types of TL, and TL can address these challenges [21]. Hence, we use a TL model to obtain features from medical images. Many features, such as color, texture, and size, are used in standard medical image categorization methods. When handling high-dimensional feature vectors through an optimization algorithm, selecting the optimal features improves classification efficiency [22]. Finding the optimal representation of the selected subset of features creates additional issues for researchers. To automate this process, feature selection (FS) approaches have been crucial for accurately identifying these essential features. Therefore, we developed a method to solve the diagnostic imaging identification challenge and optimize the process, wrapped as an IoMT system to reduce morbidity and mortality worldwide. To the best of our knowledge, our approach is the first that tries to improve the efficiency of medical image classification on IoMT by merging deep learning (MobileNetV3) with the chaos game optimization metaheuristic. To improve the performance of medical image classification, the system incorporates both TL and FS optimization techniques. First, a TL architecture analyzes the supplied medical images and develops contextualized representations without human intervention. A finetuned MobileNetV3 is utilized to retrieve the embedded images.
Next, a novel FS method is designed to analyze each embedding and choose only the most important properties to improve medical image classification performance. The FS method depends on a new metaheuristic strategy known as chaos game optimization (CGO). The reasons for employing CGO to optimize the FS problem in this paper are as follows. We wanted to examine the most recent CGO optimizer. Furthermore, comparing CGO to complex, modern, and high-efficiency algorithms reveals that it finds the best solutions for the problems examined, with typically stronger classification performance (i.e., fewer iterations and less execution time). The contributions of this paper can be summarized as follows:

The proposed IoMT system helps minimize human intervention in medical centers and provides fast diagnosis reports embedded in low-resourced systems.

The transfer learning (i.e., MobileNetV3) model is finetuned on the assessed medical image datasets to extract relevant features.

A novel feature selection approach to select appropriate features is used to build an IoMT system.

An extensive evaluation of the proposed system is reported and compared to several state-of-the-art techniques using real-world datasets.

The paper is structured as follows: Section 2 reviews recent work on medical imaging. Section 3 offers a detailed description of our approach. Section 4 analyzes the implementation results of the image classification techniques. Finally, the concluding remarks and future scope are given in Section 5.
2. Related Works
The essential strength of the classification task in helping to diagnose medical images makes it an important area of research. Therefore, this section presents recent works on medical image classification. Recently, researchers have improved the Internet of Medical Things (IoMT) using DL and improved classification performance by applying transfer learning. Due to advances in connectivity among systems, the Internet of Things (IoT) is currently being used in various fields. When used in the medical area, the IoT can support care and monitoring systems that can be supervised remotely. It is now possible for medical professionals, and sometimes even patients, to remotely access sensor data generated by devices attached to persons who are being monitored or have specific requirements [23]. Computer-aided diagnosis (CAD) technologies can benefit from the IoT by providing an interface that directly connects the terminal to the devices for medical image classification. To put it another way, any person may now operate a technology that formerly required training [24]. DL has become increasingly popular in the Internet of Medical Things (IoMT) in recent years [25]. As a result, the IoMT concept is suitable for building embedded technologies that can accurately diagnose diseases in the same manner that professionals do. IoMT innovation, according to [26], has contributed to the establishment of vital healthcare systems. Physicians may now receive data in various settings, allowing them to better diagnose patients without being affected by subjective features. Another obstacle yet to be addressed is the disparity between rare and common diseases in the amount of data collected. The authors of [27] introduced a method for the recognition of CT scan images of pulmonary and ischemic stroke on the IoMT. These researchers employed an IoT device that directly contacts users to choose the optimal extraction methods and classifiers for a given situation.
However, as a result of this problem, the system underperformed. A high degree of accuracy is required in the medical sector when diagnosing the form of a disease. Previous research has shown that early identification of cancer is vital for patients to receive the best possible treatment. Thus, our goal is to improve medical image diagnosis by increasing the accuracy of the applied algorithms. In recent decades, metaheuristic optimization algorithms have been combined with convolutional neural networks (CNN) for medical image classification. Transfer learning has become very popular, primarily because it makes systems more powerful, reduces financial costs, and requires fewer inputs, supported by the initial weights supplied by the transferred training process. The study [28] examined learning from many cases through transformation in medical image processing; the researchers discussed various types of learning and possibilities for future study. For finetuning with transfer learning, Ayan and Ünver [29] employed the Xception and VGG16 architectures. They modified the Xception architecture by adding two fully connected layers and an output layer with a SoftMax activation function. In the VGG16 architecture, the last eight layers were frozen and the fully connected layers were modified. Accordingly, the testing time per image for the VGG16 and Xception networks was 16 and 20 ms, respectively. InceptionV3, ResNet18, and GoogLeNet were among the models employed in Reference [30], where the decision was based on convolutional networks. They used each of the models to test the premise that voting may be used to arrive at a diagnosis. In their study, the findings of the classifiers were combined using a clear majority; accordingly, the diagnosis corresponds to the class with the largest number of votes.
The model's mean testing time per image was 161 ms using this method. On top of that, they attained high classification rates for X-ray images. According to this study, pneumonia can be diagnosed using deep convolutional networks. As part of our method, we rely on classical classifiers to minimize the computational cost of classification. As a result of their extensive feature representation abilities, CNNs have been commonly applied in medical image processing in recent years and have shown substantial gains. Zhang et al. [31] developed a system for target-class lesion identification based on multi-CNN collaboration. Their approach was more reliable in identifying lesions, and its utility was evaluated using the necessary details. A strong ensemble structure for cancer detection was created in [32] using dynamic classification techniques, so a more distinctive and robust model could be built. To identify skin lesions on their own, the authors of Reference [33] proposed that a crossnet-based mixture of multiple convolutional networks may be used. For the categorization of melanomas, MobileNet and DenseNet were coupled in [34]. Because this light medical image classification model was designed to improve feature selectivity, computational complexity, and parameter settings, it differed from older systems and used a categorization strategy that worked well. Currently, metaheuristic optimization algorithms are being used to solve a wide range of complex optimization problems. Maintaining a population of candidate solutions, rather than a single answer, allows them to navigate the solution space efficiently; as a result, they beat other optimization approaches. Samala et al. [35] suggested a method of multilayered pathway development to identify breast cancer. They used a two-stage method: transfer learning and feature identification. Regions of interest (ROI) from large lesions were used to train pretrained CNNs.
On top of that, a random forest classification model was created using the learned CNN. Pathways were evolved using a genetic algorithm (GA) with random selection and whole-number crossover operators. Their research found a 34% change in features and a 95% reduction in parameters using the proposed strategy. Through particle swarm optimization (PSO), da Silva et al. [36] optimized the hyperparameters of a CNN for false-positive reduction in CT lung images, where comparable structures and low density cause false-positive results. They found that optimizing an automatic detection system can improve outcomes and minimize human intervention. To acquire the binary threshold value, Vijh et al. [37] adopted OTSU-based adaptive PSO for the automatic classification of brain cancers. To reduce noise and improve image quality, noise removal and skull stripping were applied. GLCM was utilized for feature extraction, and 98% of the features were extracted. Utilizing the grey wolf optimization (GWO) method, Shankar et al. [38] developed a novel approach for Alzheimer's disease using brain imaging analysis. An initial step in image preprocessing is to remove undesirable regions. The processed images are then sent to a CNN for feature extraction, resulting in improved performance. Goel et al. [39] proposed OptCoNet, an optimized CNN architecture for recognizing COVID-19 patients versus normal/pneumonia sufferers. For hyperparameter adjustment of the convolution layers, they employed GWO. Their study found that the proposed approach assisted the automated examination of patients and reduced the burden on medical systems. To enhance architectures for denoising images, Elhoseny et al. [40] employed the dragonfly and improved firefly algorithms (FFA) to categorize images as normal or abnormal.
This adjustment led to a significant improvement in the peak signal-to-noise ratio (PSNR). Melanoma diagnosis was enhanced utilizing the whale optimization algorithm (WOA) and Lévy flights, as introduced in Reference [41]. Two datasets were analyzed using the developed structure, and the accuracy was 87% on both. Some of these methods suffer from premature convergence and local minima, especially when faced with a large solution space [42]. Often, this limitation results in inefficient task scheduling solutions, which hurts system performance; therefore, a globally optimal solution to the IoMT task scheduling problem is urgently needed. However, these existing approaches were still unable to achieve a high degree of efficiency. To overcome this problem, this paper aims to find the best solutions for improving performance. The main difference between the proposed model and previous approaches is that we combine transfer learning with metaheuristic FS optimization to create an available IoMT system. The characteristics of this system allow for outstanding performance and reasonable computing expenses, addressing the financial concerns discussed earlier. As a result of the IoMT, infections and diseases can be treated and detected both inside and outside the clinic; therefore, Internet-connected devices and a digital copy of the scan are used in the IoMT system.
3. Methodology
In the field of medical image classification, detecting a patient's illness using a medical database is an interesting topic. The present study used three datasets for image recognition analysis, with the major goal of achieving maximal performance in disease diagnosis. The three datasets investigated were ISIC-2016 [43], PH2 [44] (both for melanoma detection), and Blood-Cell classification [45]. Figure 1 depicts the established IoMT architecture. Initially, the IoMT devices capture medical images, and if the goal is to train the IoMT system, the image data can be sent to a cloud center. There are three main processes at this level. In the first stage, the features are extracted using the TL architecture, as detailed in Subsection 3.2.2. The next stage is to find the relevant features using CGO. Lastly, the classification is performed, and the results can be distributed across fog operating systems to save communication costs if desired. If the goal is to identify the condition of the collected data, the training data in the fog operating systems are utilized.
Figure 1
Diagram of the proposed IoMT system.
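The three stages (feature extraction, CGO-based selection, classification) can be sketched end to end. The components below are illustrative placeholders, not the paper's actual implementation: random embeddings stand in for the finetuned MobileNetV3 extractor, a binary mask filter stands in for CGO selection, and a nearest-centroid rule stands in for the final classifier.

```python
import numpy as np

def extract_features(images):
    # Placeholder for the finetuned MobileNetV3 extractor:
    # maps each image to a 128-dimensional embedding.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(images), 128))

def select_features(features, mask):
    # Keep only the feature columns chosen by a (CGO-produced) binary mask.
    return features[:, mask.astype(bool)]

def classify(train_x, train_y, test_x):
    # Simple nearest-centroid classifier as a stand-in for the final stage.
    centroids = {c: train_x[train_y == c].mean(axis=0) for c in np.unique(train_y)}
    labels = sorted(centroids)
    dists = np.stack([np.linalg.norm(test_x - centroids[c], axis=1) for c in labels])
    return np.array(labels)[dists.argmin(axis=0)]
```

In the deployed system, the first stage runs on the cloud during training, while the selected-feature model serves predictions from the fog layer.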
3.1. Proposed IoMT System
Our IoMT system is based on a computational cloud that communicates with a fog. Users may easily manipulate the data and parameters required to get the online service's classification results. This system component also handles communication between IoT devices (mobile phones and laptops) and the cloud center. Because all of the patient's images are treated uniformly, the system can be used for various exams, proving its reliability. Image sizes, formats, and color conversions are adjusted to standards. The IoMT system represented in Figure 1 is what we offer to implement our methodology, in order to give a quick reaction and support the physician in making appropriate choices. There are two components in our system: cloud computing and fog computing. First, a medical image database is sent to a training level in the cloud using IoT technologies. Using the training model, the system created in Subsections 3.2.2 and 3.3.2 can be built. The pretrained feature extraction technique is deployed on the cloud service and benefits from being a light and quick approach. The MobileNetV3 structure used to extract the features is well known for its interoperability and limited resource use on embedded systems. The introduced CGO algorithm, a lightweight and robust feature selection method, is applied after feature extraction to minimize the feature embedding set and maintain only the most essential features of each filtered image. We can speed up the training process by decreasing the number of features, which allows us to arrive at a classification choice in an acceptable amount of time. The second component of this IoMT system is fog computing. It allows the approved training model to make predictions without re-training the system, saving time and reducing network traffic. As a result, fog computing devices can assist the expert in making a judgment on medical image diagnosis faster than waiting for a choice from the cloud centers.
In addition, the training process on the cloud centers is finetuned regularly, employing images gathered from connected devices and saved in a database. Thus, the training system's quality will improve, yielding better, more accurate decisions. There will also be a web-based application that the sender can use to create a rapid forecast, using the pretrained or finetuned model to refine the system on a batch of new images. The sender will receive the final decision along with measurement metrics such as accuracy to back up the system's forecasts.
3.2. Feature Extraction Using TL
This section gives a detailed description of the transfer learning technique used for feature learning and extraction. As mentioned in Section 2, a pretrained model for image classification tasks in computer vision is beneficial for training and inference speed. In addition, few parameters have to be finetuned during the training process rather than training models from scratch. In our system, MobileNetV3 is used as the backbone of the feature extraction process, where the top layers of the model are replaced with new layers and only specific layers are finetuned. MobileNetV3 is an optimized version generated by a network architecture search (NAS) algorithm called NetAdapt. The NetAdapt algorithm uses MobileNetV1 and MobileNetV2 components to search for an optimal network architecture and kernel size, minimizing model size and latency while maximizing performance.
3.2.1. Efficient Deep Learning
DL techniques and models have demonstrated success in various tasks, including image classification, image segmentation, and object detection [46-49]. However, the challenges of these tasks, especially the quality and impact of the learned representations, remain largely unexplored. Over the past decade, several DL architectures and training techniques have been proposed. For instance, researchers focus on exploiting the power of DL models to improve performance and efficiency in terms of training time, computational resources, and accuracy. One of the most investigated DL models is the convolutional neural network, with different architectures, designs, parameters, and training processes. Depthwise convolutions are DL components designed to exploit the spatial information in the input image and replace traditional convolution layers, thus facilitating deployment on embedded devices or edge applications. Various DL models have embraced the concept of depthwise convolutions to overcome the limitations of traditional convolution layers, including MobileNets [50, 51], ShuffleNets [52], NASNet [53], MnasNet [54], and EfficientNet [55]. Unlike traditional convolution layers, depthwise convolution layers are applied separately to each input channel. Thus, the models can be computationally inexpensive and trained with fewer parameters and less training time. In this section, we focus on introducing MobileNetV3 [51] and its core components. More details will be discussed in the following sections, where we describe MobileNetV3 as the feature extractor used in the proposed system. Howard et al. [51] introduced MobileNetV3 in two versions: MobileNetV3-large and MobileNetV3-small. MobileNetV3 is designed to optimize the latency and accuracy of the previous version, the MobileNetV2 architecture.
For instance, MobileNetV3-large improved the accuracy by 3.2% compared to the MobileNetV2 while reducing the latency by 20%. The MobileNetV3 was designed using a network architecture search (NAS) technique termed the NetAdapt algorithm to search for the optimal network structure and kernel size of the depthwise convolution. As illustrated in Figure 2 (Section 3.2.2), the MobileNetV3 architecture is composed of the following core building blocks:
Figure 2
The building blocks of the proposed network architecture for feature extraction.
The depthwise separable convolutional layer, which has a depthwise convolutional kernel of size 3 × 3, followed by batch normalization and an activation function.

The 1 × 1 convolution (pointwise convolution) for linear combination computations of the depthwise separable convolutional layer and feature map extraction.

The global average pooling layer, which reduces the dimensionality of the feature maps.

The inverted residual block, inspired by the bottleneck block networks [56] that use the residual skip connection mechanism. The inverted residual block consists of the following sub-blocks: the 1 × 1 expansion and projection convolutional layers with a depthwise convolutional kernel of size 1 × 1 to learn more complex representations and reduce the model's calculations; a depthwise separable convolutional layer; and a residual skip connection mechanism.

The squeeze-and-excite block (SE block) [54], which selects the relevant features on a channelwise basis.

The h-swish activation function [57, 58], which is used interchangeably with the ReLU (rectified linear unit) activation function.
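As a rough illustration of why depthwise separable convolutions are cheaper than standard ones, the parameter counts of the two designs can be compared directly (bias terms omitted for simplicity):

```python
def standard_conv_params(k, c_in, c_out):
    # A standard convolution learns one k x k x c_in kernel per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step: one k x k kernel per input channel,
    # followed by a 1 x 1 pointwise convolution that mixes channels.
    return k * k * c_in + c_in * c_out

# Example: a 3 x 3 convolution with 128 input and 128 output channels.
std = standard_conv_params(3, 128, 128)        # 147456 parameters
sep = depthwise_separable_params(3, 128, 128)  # 17536 parameters
```

For this configuration, the separable design uses roughly 8x fewer parameters, which is why MobileNet-style architectures suit embedded IoMT deployments.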
3.2.2. Feature Extraction Module
Using different image datasets, the MobileNetV3 was finetuned to learn and extract feature vectors from input images of size 224 × 224. The MobileNetV3 was trained on the ImageNet dataset [56]. In our experiments, the MobileNetV3-large pretrained model was employed and finetuned on the datasets containing skin cancer and blood cell images. A 1 × 1 pointwise convolution (Conv) was used to replace the top layers used for classification in the MobileNetV3 model, as shown in Figure 2. The 1 × 1 pointwise convolution can be seen as a multilayer perceptron (MLP) used for classification and feature extraction tasks. Thus, in our implementation, we used two 1 × 1 pointwise convolutions at the top of the model to extract features from the input images and finetune the model on the image classification task. Meanwhile, the MobileNetV3 building block consists of an inverted residual block inspired by the bottleneck blocks. The inverted residual block contains two important blocks: the depthwise separable convolution block and a squeeze-and-excite block used to link the input and output features on the same channels, thus improving the feature representations with low memory usage. The depthwise separable convolution block consists of a 3 × 3 depthwise convolution, batch normalization (BN), activation function, and 1 × 1 pointwise convolution, where the order of execution of the layers is as follows: (3 × 3 Conv)⟶(BN)⟶(ReLU/h-swish)⟶(1 × 1 Conv)⟶(BN)⟶(ReLU/h-swish). In contrast, the squeeze-and-excite block consists of fully connected layers (FC) with nonlinear transformation for global feature extraction using a global pooling operation, with the following execution order: (Pool)⟶(BN)⟶(FC1)⟶(ReLU)⟶(FC2)⟶(Sigmoid).
Each building block can integrate a depthwise separable convolutional layer with different nonlinearity functions such as ReLU or hard swish (h-swish), which are defined in Equations (1) and (2), respectively:

ReLU(x) = max(0, x),  (1)

h-swish(x) = x · ReLU6(x + 3)/6,  (2)

where h-swish is built on the hard sigmoid σ(x) = ReLU6(x + 3)/6, a piecewise linear analog of the sigmoid function. To extract the feature vector from each input image, we used the finetuned model generated on each dataset. We flattened the output of the 1 × 1 pointwise convolutional layer (placed before the classification layer) and used it as the feature vector. The extracted feature vector of size 128 for each image is fed into the feature selection process of the proposed system. The model was finetuned for 100 epochs with a batch size of 32 on each dataset to produce the best classification performance. Meanwhile, to update the model's weight and bias parameters, we used the RMSprop optimizer with a learning rate of 1e-4. To prevent overfitting, we used a dropout layer with a probability of 0.38.
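A minimal NumPy sketch of these activations, using the standard definitions ReLU(x) = max(0, x) and h-swish(x) = x · ReLU6(x + 3)/6:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu6(x):
    # ReLU capped at 6, the building block of the hard sigmoid.
    return np.minimum(np.maximum(0.0, x), 6.0)

def hard_sigmoid(x):
    # Piecewise linear approximation of the sigmoid.
    return relu6(x + 3.0) / 6.0

def h_swish(x):
    # x scaled by the hard sigmoid: a cheap, piecewise-linear analog of swish.
    return x * hard_sigmoid(x)
```

Because h-swish avoids the exponential in the sigmoid, it is cheaper to evaluate on the embedded hardware targeted by MobileNetV3.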
3.3. Feature Selection Optimization
When using feature extraction methods such as MobileNetV3, the extracted features are not transmitted straight to the classification algorithm, since that would require more processing time. Feature selection (FS) techniques remove redundant or unusable features from the retrieved patient data, acting like a content decomposition method; that is, the FS process minimizes the quantity of data transferred. As a result, an optimized feature selection process was implemented wherein the most critical features were identified using an optimizer, namely chaos game optimization (CGO).
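A common way to score a candidate feature subset in wrapper-style FS is to combine the classification error with the fraction of features kept, so the optimizer is rewarded both for accuracy and for compactness. The sketch below uses an illustrative weight alpha, not a value taken from the paper:

```python
import numpy as np

def fitness(mask, error_rate, alpha=0.99):
    # Typical wrapper-FS objective: mostly classification error,
    # plus a small penalty on the fraction of features selected.
    # alpha = 0.99 is an illustrative weight, not a value from the paper.
    ratio = mask.sum() / mask.size
    return alpha * error_rate + (1.0 - alpha) * ratio
```

The metaheuristic (here, CGO) then searches over binary masks to minimize this fitness value.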
3.3.1. Chaos Game Optimization (CGO)
The CGO algorithm relies on certain principles of chaos theory and on fractal self-similarity [59]. According to chaos theory, small changes in the initial conditions of a chaotic system can significantly affect its future, because of the system's sensitivity to its initial conditions. Under this theory, the exact present state of a system determines its future state, while an approximate present state does not. In mathematics, the chaos game constructs fractals by using a main polygon pattern and a randomly chosen initial point to create fractal patterns. The main goal is to construct a sequence of points with a recurrent behavior, producing a shape that is self-similar at different scales. Using the Sierpinski triangle fractal as an example, we can better appreciate the theory behind the chaos game. As shown in Figure 3, if three points are chosen for the main fractal structure, the output is a triangle. The selected vertices are highlighted in red, green, and blue, and the die used has two red sides, two blue sides, and two green sides. First, a random point is chosen as the fractal's seed. On each roll of the die, the seed is moved halfway from its current location toward the vertex matching the rolled color, and its new location is used as the starting point for the next iteration. After the die has been rolled many times, the Sierpinski triangle appears.
Figure 3
Using the chaos game to create the Sierpinski triangle [59].
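The Sierpinski construction described above can be sketched in a few lines (an illustrative toy, not part of the proposed pipeline; the vertex coordinates are arbitrary):

```python
import random

def chaos_game(n_points=10000, seed=0):
    """Generate Sierpinski-triangle points via the chaos game."""
    rng = random.Random(seed)
    vertices = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]  # the three chosen points
    x, y = rng.random(), rng.random()  # random starting seed
    points = []
    for _ in range(n_points):
        vx, vy = rng.choice(vertices)      # "roll the die"
        x, y = (x + vx) / 2, (y + vy) / 2  # move halfway toward that vertex
        points.append((x, y))
    return points
```

Plotting the returned points reproduces the fractal pattern of Figure 3.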
The CGO method was developed from the mechanics of the chaos game and fractals. A set of candidate solutions (S) represents some of the eligible points of the Sierpinski triangle. Each solution candidate (S) is associated with some decision variables (s) that reflect the placement of the eligible seeds within the Sierpinski triangle, and the triangle itself serves as the search space. The primary strategy is to generate new seeds in the search area, which may become newly eligible seeds, by constructing temporary triangles. Toward this goal, four different seed-generation schemes are used. The technique iterates over all eligible seeds, forming the kth temporary triangle inside the search domain from three vertices: the kth solution (S, blue), the global best (G, green), and the mean of a randomly chosen group of points (M, red). Within this temporary triangle, the chaos game principle is applied: new points are created with a die and the three seeds. The three seeds (S, G, and M) are ranked from first to third in importance, respectively. For the first seed, placed in S, a die with six faces (three red and three green) is used. Depending on the rolled color, the point in S is moved toward M (red side) or G (green side). This behavior can be replicated by a random number generator that produces only two values (0 and 1) for the possibility of selecting red or green sides: the green side moves the seed placed in S toward G, while the red side moves it toward M.
Since both colors are equally likely to appear, generating two random binary numbers for M and G also allows the seed placed in S to be relocated along the line connecting M and G. Following the chaos game technique, some randomly generated factors are used to limit the seeds' movement inside the search region. The first seed has the following mathematical expression:

Seed1 = Sk + α × (β × G − γ × Mk), (3)

where Sk is the kth solution candidate and G refers to the best global solution found so far. Mk is the mean of a randomly chosen group of points that forms, together with Sk and G, the three vertices of the kth temporary triangle. The randomly generated factor α models the seed's motion limitations, while β and γ are random integers of 0 or 1 that represent the outcome of rolling the dice. D is the number of eligible points (solution candidates). For the second seed, placed in G, a die with six faces (three red and three blue) is used. The point in G is moved toward S (blue face) or M (red face). This property can be represented by a random number generator producing only two values, 0 and 1, for the possibility of picking red or blue faces: when the blue face shows, the seed in G moves toward S; when the red face shows, it moves toward M. Although each blue or red side has an equal chance of occurring, generating two random binary numbers for S and M also allows the point placed in G to be relocated along the line connecting M and S. As before, randomly generated factors are used to limit the seed's movement inside the search region, in accordance with the chaos game technique.
The mathematical representation of the second seed is as follows:

Seed2 = G + α × (β × Sk − γ × Mk). (4)

For the third seed, placed in M, a die with three blue sides and three green sides is used. The seed in M is moved toward S (blue side) or G (green side) by rolling the die and observing which color shows. This behavior can be represented by a random number generator producing only two values, 0 and 1, for the option of selecting green or blue faces: when the blue face shows, the point in M moves toward S; when the green face shows, it moves toward G. Each of the green and blue sides has an equal chance of occurring, and generating two random binary numbers for S and G also allows M to be relocated along the line connecting G and S. As before, movement inside the search region is controlled by randomly generated factors. The third seed has the following formula:

Seed3 = Mk + α × (β × Sk − γ × G). (5)

An additional fourth seed, placed in S, is also used to carry out mutation inside the search range. Its position is updated through random changes in randomly chosen decision variables:

Seed4 = Sk, with its ith component updated as sk,i = sk,i + R, (6)

where N denotes the dimension of the point, i denotes a randomly chosen integer in the range [1, N], and R (rand) stands for a uniform random value in (0, 1). To manage and adjust the exploration and exploitation rates of the proposed CGO algorithm, four formulas are used to determine α, which simulates the seeds' mobility limitations, as shown in Equation (7):

α ∈ {R, 2 × R, (δ × R) + 1, (ε × R) + (1 − ε)}, (7)

where R denotes a uniform random value in the range (0, 1). One of these four formulas is randomly employed when computing the positions of the first through third seeds.
Here, ε and δ are random integers in the range [0, 1]. According to the self-similarity of fractals, the initially eligible seeds and the freshly formed seeds created by the chaos game principle must be compared to determine whether the newly created seeds should be included among the eligible seeds inside the search domain. The initial seeds are therefore replaced by the new points if the latter achieve higher levels of self-similarity, and they are retained if the new seeds achieve lower levels of self-similarity. Note that this substitution operation keeps the mathematical model at a reduced level of difficulty. Since the Sierpinski triangle is a complete shape, all the points found so far are used to complete it. If the solution variables (S) are out of bounds, they must be handled as soon as this is detected: a flag indicates that S is outside the variable ranges, and the corresponding boundaries are enforced. After a predefined number of optimization rounds, the optimization method concludes. Algorithm 1 outlines the steps of the CGO algorithm, and Figure 4 depicts its flowchart. Initially, the starting locations of the solution candidates inside the search region are determined by random selection. Second, we compute the objective value of the initial solution candidates based on the self-similarity of the initial seeds. The algorithm then records the global best (G), the seed with the highest eligibility level. Furthermore, for each eligible point (S) inside the search area, a mean group (M) is generated using a random selection technique, and a temporary triangle with the three vertices S, M, and G is created. Subsequently, four seeds are produced for each temporary triangle using Equations (3)–(6).
Afterward, the new seeds are checked against the boundary conditions of the decision variables. Moreover, self-similarity is taken into account when calculating the objective function of these new seeds. Finally, the initial eligible points are replaced by the new seeds whenever the latter's objective functions show higher self-similarity levels.
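One CGO iteration following Equations (3)–(7) can be sketched as below. This is an illustrative reconstruction under our reading of the update rules, not the authors' code; the fitness function, bounds, and greedy replacement policy are placeholders:

```python
import random

def cgo_step(population, fitness, lo, hi, rng=random):
    """One CGO iteration: for each solution S, build a temporary triangle
    (S, global best G, random-group mean M) and spawn four seeds per
    Equations (3)-(7). A seed replaces the current solution only if it
    improves the fitness (to MINIMIZE)."""
    n, dim = len(population), len(population[0])
    G = min(population, key=fitness)  # global best found so far
    new_pop = []
    for S in population:
        group = rng.sample(population, rng.randint(1, n))
        M = [sum(p[j] for p in group) / len(group) for j in range(dim)]
        def alpha():
            # Equation (7): one of four mobility-limit formulas, at random
            R, d, e = rng.random(), rng.randint(0, 1), rng.randint(0, 1)
            return rng.choice([R, 2 * R, d * R + 1, e * R + (1 - e)])
        b, g = rng.randint(0, 1), rng.randint(0, 1)  # dice outcomes beta, gamma
        a1, a2, a3 = alpha(), alpha(), alpha()
        seed1 = [S[j] + a1 * (b * G[j] - g * M[j]) for j in range(dim)]  # Eq. (3)
        seed2 = [G[j] + a2 * (b * S[j] - g * M[j]) for j in range(dim)]  # Eq. (4)
        seed3 = [M[j] + a3 * (b * S[j] - g * G[j]) for j in range(dim)]  # Eq. (5)
        seed4 = S[:]                                                     # Eq. (6)
        seed4[rng.randrange(dim)] += rng.random()  # mutate one random dimension
        # clip to bounds, then keep the best of the solution and its seeds
        cands = [[min(max(v, lo), hi) for v in s]
                 for s in (seed1, seed2, seed3, seed4)]
        new_pop.append(min([S] + cands, key=fitness))
    return new_pop
```

Because each solution is only replaced by a better seed, the best fitness in the population is non-increasing across iterations.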
Figure 4
Flowchart of the proposed FS method.
3.3.2. Optimal Feature Selection
In general, the extracted features are separated into training and test sets, with the training set used to teach the model to identify the essential features. Figure 4 depicts the stages of the binary CGO optimization technique. First, the CGO produces a set of N agents X that represent candidate FS solutions, initialized as follows:

Xi = L + rand × (U − L), i = 1, 2, …, N, (8)

where Dim denotes the dimension of the given problem (i.e., the number of features), and U and L define the upper and lower bounds of the search space. The next step is to obtain the Boolean version BX of each Xi, which is accomplished using the following equation:

BXij = 1 if Xij > 0.5, and BXij = 0 otherwise. (9)

The objective value of each X is computed by the optimization technique, based on the binary vector BX and the classification error:

Fit = λ × γ + (1 − λ) × (|BX|/Dim), (10)

in which (|BX|/Dim) represents the ratio of selected features and γ denotes the classification error obtained with an SVM. SVM is commonly used here because it is more stable than other classification techniques and has fewer parameters. The parameter λ adjusts the trade-off between the proportion of selected features and the classification error. The next step is to check the stopping criteria: if they are met, the best solution is returned; otherwise, the update steps are repeated. Classification is conducted after obtaining the optimal features from the CGO algorithm, using a machine learning technique such as stochastic gradient descent (SGD). Training deep neural networks with good predictive ability by exploring highly nonconvex cost landscapes is among the main objectives in DL. A typical explanation of this success would be that the cost landscape itself is simple, with no misleading local optima. However, it turns out that the cost landscapes of strong DL models do contain spurious local (or global) optima, and stochastic gradient descent (SGD) is nevertheless able to navigate them [60].
Nevertheless, the randomly initialized SGD approach shows strong generalization qualities in practice. An explanation of this success would have to cover the entire course of the training procedure, which is far from apparent. The problem remains challenging, even for the most advanced DL models trained on experimental-stage datasets.
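The initialization, binarization, and fitness steps of Equations (8)–(10) can be sketched as follows (λ and the error value are placeholders; in the paper, γ comes from an SVM's classification error):

```python
import random

def init_agent(dim, lower=0.0, upper=1.0, rng=random):
    # Equation (8): X_i = L + rand * (U - L), one value per feature
    return [lower + rng.random() * (upper - lower) for _ in range(dim)]

def binarize(x):
    # Equation (9): feature j is selected when x_j > 0.5
    return [1 if v > 0.5 else 0 for v in x]

def fitness(bx, classification_error, lam=0.99):
    # Equation (10): Fit = lambda * gamma + (1 - lambda) * |BX| / Dim
    # `classification_error` stands in for the SVM error gamma
    ratio = sum(bx) / len(bx)
    return lam * classification_error + (1 - lam) * ratio
```

A large λ makes the classifier's error dominate, so the feature-count penalty only breaks ties between solutions of similar accuracy.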
4. Experiments
4.1. Experimental Data
Three medical image datasets were used to conduct the image classification task in our experimental tests: PH2 [44], ISIC-2016 [43], and the Blood-Cell dataset as in [45]. PH2: a total of 200 dermoscopic images were included in this dataset, comprising 80 Atypical Nevus, 80 Common Nevus, and 40 Melanoma images. This dataset can be freely downloaded at http://www.fc.up.pt/addi/ph2.database.html. Table 1 describes each dataset and its respective classes in more detail. As an example, Figure 5 shows some image samples from the selected databases.
Table 1
Dataset description.
Dataset name | Class | Training data | Test data | # Images per class
Ph2 | Common Nevus | 68 | 12 | 80
Ph2 | Atypical Nevus | 68 | 12 | 80
Ph2 | Melanoma | 34 | 6 | 40
Ph2 | Total | 170 | 30 | 200
ISIC-2016 | Malignant | 173 | 75 | 248
ISIC-2016 | Benign | 727 | 304 | 1,031
ISIC-2016 | Total | 900 | 379 | 1,279
Blood-Cell | Neutrophil | 2,499 | 624 | 3,123
Blood-Cell | Monocyte | 2,478 | 620 | 3,098
Blood-Cell | Lymphocyte | 2,483 | 620 | 3,103
Blood-Cell | Eosinophil | 2,497 | 623 | 3,120
Blood-Cell | Total | 9,957 | 2,487 | 12,444
Figure 5
Example medical image samples for classification task from the three selected datasets.
ISIC-2016: in total, 1,279 images are included in this dataset, separated into two categories: most of the data are benign, while the remainder are malignant. The database can be downloaded from https://challenge.isic-archive.com/data. Blood-Cell: this dataset was collected from the publicly available BCCD Dataset (https://www.kaggle.com/paultimothymooney/blood-cells/data). It comprises 12,444 blood cell images, with 2,487 in the test set and 9,957 in the training set. These images are classified into four types of blood cells: eosinophils, lymphocytes, monocytes, and neutrophils. There are 2,496 eosinophils, 2,484 lymphocytes, 2,477 monocytes, and 2,498 neutrophils in the training set, while 623 eosinophils, 623 lymphocytes, 620 monocytes, and 624 neutrophils are in the test set.
4.2. Evaluation Metrics
This research was evaluated using the metrics in Table 2: balanced accuracy, accuracy, recall, precision, and F1 score. Balanced accuracy is defined as the average of sensitivity and specificity. Accuracy is the proportion of correct predictions over all predictions. Recall is the proportion of actual positive samples that are correctly predicted, while precision is the proportion of predicted positives that are actually positive. Finally, the F1 score is the harmonic mean of precision and recall, which makes it informative under class imbalance. In these formulas, true positives (TP) denotes the number of positive samples correctly identified; true negatives (TN) denotes the number of nonnodular samples correctly identified; false positives (FP) denotes the number of nonnodular samples incorrectly identified as nodular; and false negatives (FN) denotes the number of actual nodular samples incorrectly identified as nonnodular.
Table 2
Various performance parameters.
Metrics | Formula
Recall | TP/(TP + FN)
Precision | TP/(TP + FP)
Accuracy | (TP + TN)/(TP + TN + FP + FN)
F1 score | 2 × Precision × Recall/(Precision + Recall)
Sensitivity | TP/(TP + FN)
Specificity | TN/(FP + TN)
Balanced accuracy | (Sensitivity + Specificity)/2
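The formulas in Table 2 translate directly into code (a small helper for the binary case; the function name and return format are ours):

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute the Table 2 metrics from binary confusion-matrix counts."""
    recall = tp / (tp + fn)                    # a.k.a. sensitivity
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (fp + tn)
    balanced_accuracy = (recall + specificity) / 2
    return {"recall": recall, "precision": precision, "accuracy": accuracy,
            "f1": f1, "specificity": specificity,
            "balanced_accuracy": balanced_accuracy}
```

Note how, on an imbalanced test set, balanced accuracy penalizes a classifier that ignores the minority class even when plain accuracy stays high.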
4.3. Experimental Results and Analysis
This section presents the analysis and discussion of the experimental results for the suggested approach. First, we compare our approach with various metaheuristic optimization strategies. Afterward, three classifiers are compared, namely, k-nearest neighbor (KNN), support vector machine (SVM), and stochastic gradient descent (SGD). Then, we compare our results with those of other current medical image classification algorithms. Finally, a comparison with published techniques is conducted. To objectively examine the effectiveness of the proposed approach, we compared it against eight well-known metaheuristic optimizers:
Particle swarm optimization (PSO) [61],
Multiverse optimizer (MVO) [62],
Grey wolf optimization (GWO) [63],
Moth-flame optimization (MFO) [64],
Whale optimization algorithm (WOA) [65],
Firefly algorithm (FFA) [66],
Bat algorithm (BAT) [67], and
Hunger games search (HGS) [68].
As seen in Table 3, each optimizer has its own set of parameters. As the number of search agents increases, so does the likelihood of finding the global optimum; the population size is set to 50 in all experiments. Reducing the number of search agents would reduce the computational complexity.
Table 3
The parameters of each FS optimizer and their values.
S# | Optimizer | Parameter | Value
1 | PSO | Vmax | 6.0
  |     | Wmax | 0.9
  |     | Wmin | 0.2
2 | MVO | WEPMin | 0.2
  |     | WEPMax | 1.0
3 | GWO | A | 2.0
  |     | R | (−1, 1)
4 | MFO | B | 1.0
  |     | L | (−1, 1)
5 | WOA | A | 2.0
  |     | R | 1.0
6 | FFA | Alpha | 0.5
  |     | BetaMin | 0.2
  |     | Gamma | 1.0
7 | BAT | QMin | 0.0
  |     | QMax | 2.0
8 | HGS | VC2 | 0.03
  |     | Vmax | 6.0
  |     | Wmax | 0.9
  |     | Wmin | 0.2
9 | CGO | β and γ | (1, 2)
The nine optimization techniques were combined with standard machine learning classifiers, namely KNN, SVM, and SGD, to produce the findings. (a) In KNN, an unknown sample's class is determined by the spatial distribution of the labeled samples around it: the k closest examples are located, and similarity is measured by the distance between items, typically the Euclidean distance. (b) SVMs can be used as classification algorithms by transforming the space in which the data are distributed. SVM builds on statistical learning theory and separates classes with a hyperplane whose shape depends on the kernel used; linear, polynomial, and RBF kernels are among the most common types. (c) The SGD technique also offers many advantages: as discussed earlier, despite its simplicity and random initialization, it shows strong generalization in practice, even though fully explaining this success would require covering the whole course of the training procedure.
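Description (a) can be sketched as a tiny Euclidean KNN (illustrative only; the experiments presumably used a standard library implementation rather than this helper):

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k Euclidean-nearest neighbors."""
    dists = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

The same interface (fit on selected features, predict on the test split) is what each optimizer's selected feature subset is plugged into for evaluation.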
4.4. Analysis Results
Multiple measures are used when evaluating these optimization techniques: each method was assessed on recall, precision, accuracy, F1 score, and balanced accuracy. Results on the PH2, ISIC-2016, and Blood-Cell datasets are presented in Tables 4, 5, and 6, respectively, where the best results are shown in bold. According to these tables, the SGD-based CGO beats PSO, MVO, GWO, MFO, WOA, FFA, BAT, and HGS. On the PH2 dataset, Table 4 shows that the CGO approach plays a significant role in feature selection when the SGD classifier is applied, as its results are consistently strong across all measures. On the accuracy metric with the SGD classifier, CGO correctly classifies 97.52% of the test set, higher than the other optimization algorithms; BAT, HGS, MVO, MFO, and GWO share the second level at about 97.50%, PSO and WOA follow at 97.14%, and FFA performs worst at 96.79%. On the precision metric, CGO achieved 97.54%, the best result with the SGD algorithm; BAT, HGS, MFO, and MVO came second at 97.53%, followed by GWO at 97.51%, then PSO (97.19%) and WOA (97.18%); FFA again has the lowest precision at 96.84%. For the recall measure with the SGD classifier, CGO obtained 97.51%; HGS, MVO, MFO, and BAT obtained 97.50%; GWO 97.49%; PSO and WOA 97.14%; and FFA 96.79%. In terms of F1 score, the CGO algorithm came out on top with 97.51%, followed by BAT, HGS, MFO, MVO, and GWO at about 97.50% and WOA at 97.15%; PSO and FFA performed worst. In addition, the balanced accuracy of the CGO algorithm was 97.93%, followed by BAT, MVO, MFO, GWO, and HGS at 97.92% each and PSO at 97.62%; WOA (97.32%) and FFA (97.02%) had the lowest balanced accuracy. Integrating these nine optimization techniques with the KNN and SVM classifiers produced lower metric results than with the SGD classifier.
Table 4
Results of each algorithm on the PH2 dataset.
Optimizer | Classifier | Recall | Precision | F1 score | Accuracy | Balanced accuracy
PSO | SGD | 0.9714 | 0.9719 | 0.9714 | 0.9714 | 0.9762
PSO | KNN | 0.9564 | 0.9569 | 0.9565 | 0.9564 | 0.9732
PSO | SVM | 0.9679 | 0.9684 | 0.9679 | 0.9679 | 0.9702
MVO | SGD | 0.9750 | 0.9753 | 0.9750 | 0.9750 | 0.9792
MVO | KNN | 0.9561 | 0.9566 | 0.9562 | 0.9561 | 0.9762
MVO | SVM | 0.9679 | 0.9684 | 0.9679 | 0.9679 | 0.9702
GWO | SGD | 0.9749 | 0.9751 | 0.9750 | 0.9751 | 0.9792
GWO | KNN | 0.9719 | 0.9721 | 0.9716 | 0.9715 | 0.9762
GWO | SVM | 0.9678 | 0.9694 | 0.9679 | 0.9679 | 0.9702
MFO | SGD | 0.9750 | 0.9753 | 0.9750 | 0.9750 | 0.9792
MFO | KNN | 0.9714 | 0.9719 | 0.9714 | 0.9714 | 0.9762
MFO | SVM | 0.9679 | 0.9684 | 0.9679 | 0.9679 | 0.9702
WOA | SGD | 0.9714 | 0.9718 | 0.9715 | 0.9714 | 0.9732
WOA | KNN | 0.9571 | 0.9576 | 0.9572 | 0.9571 | 0.9762
WOA | SVM | 0.9714 | 0.9719 | 0.9714 | 0.9714 | 0.9762
FFA | SGD | 0.9679 | 0.9684 | 0.9679 | 0.9679 | 0.9702
FFA | KNN | 0.9564 | 0.9569 | 0.9565 | 0.9564 | 0.9762
FFA | SVM | 0.9714 | 0.9719 | 0.9714 | 0.9714 | 0.9762
BAT | SGD | 0.9750 | 0.9753 | 0.9750 | 0.9750 | 0.9792
BAT | KNN | 0.9561 | 0.9566 | 0.9562 | 0.9561 | 0.9762
BAT | SVM | 0.9714 | 0.9719 | 0.9714 | 0.9714 | 0.9762
HGS | SGD | 0.9750 | 0.9753 | 0.9750 | 0.9750 | 0.9792
HGS | KNN | 0.9564 | 0.9569 | 0.9565 | 0.9564 | 0.9732
HGS | SVM | 0.9714 | 0.9719 | 0.9714 | 0.9714 | 0.9762
CGO | SGD | 0.9751 | 0.9754 | 0.9751 | 0.9752 | 0.9793
CGO | KNN | 0.9750 | 0.9753 | 0.9750 | 0.9750 | 0.9792
CGO | SVM | 0.9714 | 0.9719 | 0.9714 | 0.9714 | 0.9762
The proposed CGO algorithm outperformed the other optimization techniques on the ISIC-2016 dataset, as seen in Table 5. With the SGD classifier, the accuracy of CGO was 88.39%, the best performance; BAT was second with 87.60%, followed by HGS with 87.07% and PSO with 85.75%. FFA and MVO follow with 84.70% each, and then GWO (84.43%), WOA (83.91%), and MFO (79.95%). For the precision measure, the suggested CGO approach achieved 87.81%, followed by BAT with 87.75% and HGS with 86.22%; PSO and WOA come next with 84.82% and 83.99%, respectively, followed by GWO (83.92%), MVO (83.89%), and FFA (83.78%); MFO has the lowest precision at 80.15%. On the recall metric, CGO reached 88.39%, followed by BAT (87.60%), HGS (87.07%), PSO (85.75%), FFA and MVO (84.70%), GWO (84.43%), WOA (83.91%), and MFO (79.95%). On the F1 score, the proposed CGO likewise outscored the other algorithms with 87.51%; HGS obtained 86.14%, followed by BAT, MVO, GWO, and WOA with 85.79%, 84.18%, 84.14%, and 83.95%, respectively, while MFO had the poorest F1 score at 80.05%. The balanced accuracy of the CGO algorithm was 75.69%, the best performance; WOA and HGS follow in the second and third levels with 74.90% and 73.86%, respectively, and GWO is close behind with 73.72%; FFA scored the lowest, 64.85%.
Table 5
Results of each algorithm on the ISIC-2016 dataset.
Optimizer | Classifier | Recall | Precision | F1 score | Accuracy | Balanced accuracy
PSO | SGD | 0.8575 | 0.8482 | 0.8390 | 0.8575 | 0.6852
PSO | KNN | 0.8657 | 0.8569 | 0.8523 | 0.8657 | 0.7072
PSO | SVM | 0.8654 | 0.8570 | 0.8587 | 0.8654 | 0.7454
MVO | SGD | 0.8470 | 0.8389 | 0.8418 | 0.8470 | 0.7288
MVO | KNN | 0.8633 | 0.8539 | 0.8498 | 0.8633 | 0.7121
MVO | SVM | 0.8654 | 0.8570 | 0.8587 | 0.8654 | 0.7454
GWO | SGD | 0.8443 | 0.8392 | 0.8414 | 0.8443 | 0.7372
GWO | KNN | 0.8391 | 0.8310 | 0.8341 | 0.8391 | 0.7189
GWO | SVM | 0.8681 | 0.8598 | 0.8610 | 0.8681 | 0.7470
MFO | SGD | 0.7995 | 0.8015 | 0.8005 | 0.7995 | 0.6892
MFO | KNN | 0.8364 | 0.8317 | 0.8338 | 0.8364 | 0.7273
MFO | SVM | 0.8681 | 0.8598 | 0.8610 | 0.8681 | 0.7470
WOA | SGD | 0.8391 | 0.8399 | 0.8395 | 0.8391 | 0.7490
WOA | KNN | 0.8678 | 0.8605 | 0.8531 | 0.8678 | 0.7139
WOA | SVM | 0.8681 | 0.8598 | 0.8610 | 0.8681 | 0.7470
FFA | SGD | 0.8470 | 0.8378 | 0.8204 | 0.8470 | 0.6485
FFA | KNN | 0.8654 | 0.8570 | 0.8514 | 0.8654 | 0.7238
FFA | SVM | 0.8681 | 0.8598 | 0.8610 | 0.8681 | 0.7470
BAT | SGD | 0.8760 | 0.8775 | 0.8579 | 0.8760 | 0.7068
BAT | KNN | 0.8670 | 0.8601 | 0.8520 | 0.8670 | 0.7206
BAT | SVM | 0.8654 | 0.8570 | 0.8587 | 0.8654 | 0.7454
HGS | SGD | 0.8707 | 0.8622 | 0.8614 | 0.8707 | 0.7386
HGS | KNN | 0.8649 | 0.8565 | 0.8510 | 0.8649 | 0.7272
HGS | SVM | 0.8760 | 0.8684 | 0.8680 | 0.8760 | 0.7520
CGO | SGD | 0.8839 | 0.8781 | 0.8751 | 0.8839 | 0.7569
CGO | KNN | 0.8311 | 0.8233 | 0.8265 | 0.8311 | 0.7089
CGO | SVM | 0.8628 | 0.8544 | 0.8564 | 0.8628 | 0.7437
For the Blood-Cell dataset, the results of the CGO method and the other optimizers are shown in Table 6, where the nine optimizers are combined with the SGD, SVM, and KNN classifiers. According to the table, merging the CGO algorithm with SGD surpassed the other algorithms with an accuracy of 88.79%; GWO and MFO share the next outcome (88.74%), MVO reaches 88.70%, and BAT and HGS have the worst score at 88.58%. The CGO also had the best precision, 91.10%, with MFO and WOA second at 91.07%, while BAT and HGS performed worst at 90.92% and 90.83%, respectively. Recall results were likewise best with the CGO algorithm (88.79%); GWO and MFO share the same recall (88.74%), followed closely by MVO (88.70%) and then FFA and WOA at 88.66%; the BAT and HGS algorithms have the lowest recall at 88.58%. The proposed CGO also outperformed the other algorithms on F1 score with 89.95%; the MFO and GWO optimizers came second with 88.98%, followed by MVO (88.95%), WOA (88.92%), FFA (88.91%), PSO (88.88%), and BAT (88.83%), while HGS had the poorest F1 score at 88.82%. Finally, the CGO algorithm attained a balanced accuracy of 88.78%, with GWO and MFO second at 88.74%, MVO at 88.70%, and WOA and FFA next at 88.66%; only BAT and HGS scored as low as 88.58%.
Table 6
Results of each algorithm on the Blood-Cell dataset.
Optimizer | Classifier | Recall | Precision | F1 score | Accuracy | Balanced accuracy
PSO | SGD | 0.8862 | 0.9104 | 0.8888 | 0.8862 | 0.8862
PSO | KNN | 0.8866 | 0.9100 | 0.8890 | 0.8866 | 0.8866
PSO | SVM | 0.8862 | 0.9102 | 0.8888 | 0.8862 | 0.8862
MVO | SGD | 0.8870 | 0.9106 | 0.8895 | 0.8870 | 0.8870
MVO | KNN | 0.8858 | 0.9094 | 0.8883 | 0.8858 | 0.8858
MVO | SVM | 0.8866 | 0.9109 | 0.8892 | 0.8866 | 0.8866
GWO | SGD | 0.8874 | 0.9102 | 0.8898 | 0.8874 | 0.8874
GWO | KNN | 0.8854 | 0.9093 | 0.8880 | 0.8854 | 0.8854
GWO | SVM | 0.8858 | 0.9110 | 0.8885 | 0.8858 | 0.8858
MFO | SGD | 0.8874 | 0.9107 | 0.8898 | 0.8874 | 0.8874
MFO | KNN | 0.8870 | 0.9107 | 0.8895 | 0.8870 | 0.8870
MFO | SVM | 0.8858 | 0.9104 | 0.8885 | 0.8858 | 0.8858
WOA | SGD | 0.8866 | 0.9107 | 0.8892 | 0.8866 | 0.8866
WOA | KNN | 0.8870 | 0.9107 | 0.8895 | 0.8870 | 0.8870
WOA | SVM | 0.8866 | 0.9109 | 0.8892 | 0.8866 | 0.8866
FFA | SGD | 0.8866 | 0.9100 | 0.8891 | 0.8866 | 0.8866
FFA | KNN | 0.8858 | 0.9094 | 0.8883 | 0.8858 | 0.8858
FFA | SVM | 0.8858 | 0.9102 | 0.8884 | 0.8858 | 0.8858
BAT | SGD | 0.8858 | 0.9092 | 0.8883 | 0.8858 | 0.8858
BAT | KNN | 0.8850 | 0.9090 | 0.8876 | 0.8850 | 0.8850
BAT | SVM | 0.8850 | 0.9098 | 0.8877 | 0.8850 | 0.8850
HGS | SGD | 0.8858 | 0.9083 | 0.8882 | 0.8858 | 0.8858
HGS | KNN | 0.8862 | 0.9090 | 0.8886 | 0.8862 | 0.8862
HGS | SVM | 0.8862 | 0.9101 | 0.8888 | 0.8862 | 0.8862
CGO | SGD | 0.8879 | 0.9110 | 0.8995 | 0.8879 | 0.8878
CGO | KNN | 0.8878 | 0.9112 | 0.8902 | 0.8878 | 0.8878
CGO | SVM | 0.8866 | 0.9109 | 0.8892 | 0.8866 | 0.8866
From a different perspective, Figure 6 depicts the average accuracy of each feature selection optimization algorithm with the SGD classifier over the three selected datasets. The overall average across the three databases is about 91.57% for the CGO, while the BAT technique comes second with 91.23%. The HGS, at about 91.05%, outperforms the PSO. These are followed by the GWO (90.23%), FFA (90.05%), MVO (90.03%), and WOA (89.90%). Finally, the MFO has the lowest average performance (88.73%).
Figure 6
Average accuracy of the SGD classifier on the selected datasets based on the nine FS algorithms.
From the user's point of view, the execution time of the complete method also matters. Figure 7 shows that the suggested CGO and the HGS algorithms have average execution times of 0.5672 and 0.5189 seconds over the three datasets, respectively, which are lower than those of the other compared algorithms. The MFO optimizer took 0.7164 seconds to run, whereas GWO, WOA, FFA, BAT, and MVO took 0.7169 s, 0.7177 s, 0.7332 s, 0.7644 s, and 0.7723 s, respectively. The highest (worst) execution time, 1.0576 s, was attained by the PSO.
Figure 7
Average execution time of nine FS methods.
Figure 8 displays, from a different perspective, the average balanced accuracy of each feature selection approach on the three datasets, namely ISIC-2016, PH2, and Blood-Cell. Averaged over the KNN, SVM, and SGD classifiers, the CGO approach performs best with 86.74%, and the HGS method comes second with 86.72%. The WOA delivers better results (86.62%) than GWO and MVO (86.53%), and the BAT reaches 86.22%. Finally, the MFO, PSO, and FFA optimizers obtained the lowest results, with average balanced accuracies of 86.10%, 85.74%, and 85.56%, respectively.
Figure 8
Average balanced accuracy of the selected datasets based on nine FS algorithms.
Figure 9 shows the average accuracy of the SGD, SVM, and KNN classifiers on the three selected datasets across the various optimization techniques (i.e., the nine optimizers introduced before). In the figure, the SVM outperformed the other classifiers on the accuracy metric: the SVM achieved 90.78% average accuracy, the SGD achieved 90.40%, and the KNN achieved 90.13%.
Figure 9
The averaged results of the selected dataset in terms of accuracy metric using the three classifiers.
The execution time of the full procedure is also relevant from the user's perspective, so the average execution times of the classifiers over the three databases are presented in Figure 10. According to the results, the SGD classification algorithm took the least amount of time, followed by the SVM classifier at 0.2767 seconds; the KNN classifier was the slowest (and therefore the worst), at 1.7271 seconds.
Figure 10
Average execution time of the three classifiers.
To sum up, the CGO optimization technique paired with the SGD classifier earned the greatest accuracy among all combinations for the ISIC-2016, PH2, and Blood-Cell datasets. Moreover, the SGD outperforms the other classification algorithms (i.e., KNN and SVM) according to the results.
4.5. Comparison with the Literature Studies
This section compares our approach with other state-of-the-art medical image classification techniques; their results are shown in Table 7. Developing high-accuracy technology for medical image classification is a major undertaking, so it is important to compare our strategy with other models that have been tested on the same datasets. Table 7 evaluates the performance of several disease identification techniques on the ISIC-2016, PH2, and Blood-Cell datasets.
Table 7
Accuracy results (%) of the existing approaches.
Source | Dataset | Year | Classification model | Accuracy (%)
[69] | ISIC-2016 | 2016 | CUMED | 85.50
[70] | ISIC-2016 | 2017 | BL-CNN | 85.00
[71] | ISIC-2016 | 2018 | DCNN-FV | 86.81
[31] | ISIC-2016 | 2019 | MC-CNN | 86.30
[32] | ISIC-2016 | 2019 | KNORA-E | 88.00
[33] | ISIC-2016 | 2020 | MFA | 86.81
[34] | ISIC-2016 | 2020 | FUSION | 87.60
Our | ISIC-2016 | present | CGO+SGD | 88.39
[72] | PH2 | 2017 | ANN | 92.50
[73] | PH2 | 2019 | Kernel Sparse | 93.50
[74] | PH2 | 2020 | DenseNet201 + SVM | 92.00
[75] | PH2 | 2020 | DenseNet201 + KNN | 93.16
[76] | PH2 | 2021 | ResNet50 + NB | 95.40
Our | PH2 | present | CGO+SGD | 97.52
[77] | Blood-Cell | 2013 | CNN + SVM | 85.00
[78] | Blood-Cell | 2017 | CNN | 87.08
[79] | Blood-Cell | 2019 | CNN + Augmentation | 87.00
Our | Blood-Cell | present | CGO+SGD | 88.79
For the ISIC-2016 dataset, the following advanced skin cancer identification methods were compared: a method based on segmentation followed by classification [69], one relying on feature fusion [70], one combining Fisher coding with deep residual networks [71], a multi-CNN interactive learning model [31], an ensemble method [32], a method integrating Fisher vectors and CNN fusion [33], and a fine-grained classification concept applied to differentiate characteristics [34]. For the PH2 dataset, the following advanced melanoma diagnosis techniques were included: the authors of [72] developed a decision-aid system based on an artificial neural network; the authors of [73] proposed sparse kernel models to represent feature data in a high-dimensional feature space; according to the authors of [74], U-Net can be used to detect malignant tumors automatically; the authors of [75] employed transfer learning and a CNN as part of their IoT system; and a hierarchical architecture based on two-dimensional image pixels and ResNet was introduced in [76] for advanced DL. For the Blood-Cell dataset, the following methods for identifying and counting the essential blood cells were compared: a CNN whose outputs were classified by SVM-based classifiers, as proposed in [77]; a granularity feature combined with an SVM in [78]; and CNNs presented as a DL method to automate the entire procedure in [79].
5. Discussion
The bottom line is that we can remove superfluous features from the high-dimensional medical image representations obtained by a CNN (i.e., MobileNetV3). MobileNetV3 performed effectively as a feature extractor in our work. A class activation map was prepared for the MobileNetV3 model, where the activation provided by the last layer is represented as an overlaid heat map, as shown in Figure 11. In the figure, the red regions mark the most important discriminative areas, while the other colored regions are less important.
Figure 11
Grad-CAM heatmaps on the skin images using the MobileNetV3 model.
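The heat maps in Figure 11 follow the standard Grad-CAM recipe: gradients of the class score with respect to the last convolutional layer are pooled into per-channel weights, the activation maps are combined with those weights, and the ReLU of the result is normalized and overlaid on the image. The sketch below shows only that core computation with NumPy; the toy tensor shapes (8 channels, 7×7 maps) and the `grad_cam` helper name are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM heat map from the last conv layer's
    activations (C, H, W) and the gradients of the class score
    with respect to those activations (same shape)."""
    # Global-average-pool the gradients to get one weight per channel.
    weights = gradients.mean(axis=(1, 2))                      # (C,)
    # Weighted sum of the activation maps, then ReLU to keep
    # only regions with a positive influence on the class.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize to [0, 1] so the map can be overlaid as a heat map.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy tensors standing in for MobileNetV3's last conv output.
rng = np.random.default_rng(0)
acts = rng.random((8, 7, 7))
grads = rng.standard_normal((8, 7, 7))
heatmap = grad_cam(acts, grads)
print(heatmap.shape)  # (7, 7)
```

In practice the (7, 7) map is bilinearly upsampled to the input resolution before being blended with the image, which is why the red regions in Figure 11 appear as smooth blobs.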
To provide a more rigorous comparison among the different algorithms, we used the Friedman (FD) test, a nonparametric test that computes a test statistic and ranks the compared methods. In Reference [80], the FD test is used to determine whether there is a significant difference between different methods. Figure 12 compares the CGO method with the nine optimization techniques on the three datasets in terms of recall, precision, F1 measure, accuracy, and balanced accuracy. Across all five metrics, the CGO algorithm clearly surpasses the others. In terms of balanced accuracy, CGO has the lowest mean rank of 1, followed by GWO with a mean rank of 3.50; MFO and MVO have nearly identical mean ranks of about 4, and WOA and HGS share a mean rank of 4.17, while BAT, PSO, and FFA rank lower than the others, with mean ranks of 6.17, 7.33, and 7.83, respectively. According to the FD test results for accuracy, CGO is again the best, with a mean rank of 1, followed by GWO at 3.83; MVO and BAT share a mean rank of 5, MFO and HGS have 5.33, and PSO, FFA, and WOA have the highest mean ranks. For the F1-score measure, CGO has the best mean rank of 1, GWO and MVO come second and third with mean ranks of 3.83 and 4, BAT and HGS have nearly identical mean ranks of about 5, MFO and WOA have 5.17 and 6.00, and PSO and FFA rank lower than the others with 7.33 and 7.67, respectively. For the precision measure, the mean ranks of BAT, MVO, MFO, WOA, HGS, PSO, GWO, and FFA average 4.5, 4.83, 5, 5.17, 5.17, 5.33, 6, and 8, respectively, against CGO's rank of 1. According to the FD test results for recall, CGO is again the best, with a mean rank of 1, followed by MVO at 4.33; BAT has a mean rank of 4.67, MFO and HGS have 5, and GWO, PSO, FFA, and WOA have the highest mean ranks. As the Friedman test shows, there is a significant difference between the proposed model and the other models (p value less than 0.05), as shown in Figure 12.
Figure 12
The mean rank of FD test on several feature selection algorithms on the SGD classifier.
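The mean ranks reported above are obtained by ranking the competing optimizers within each dataset (rank 1 = best) and averaging the ranks across datasets; the Friedman statistic then tests whether those rank differences are significant. A minimal sketch with NumPy and SciPy follows; the score matrix is purely illustrative, not the paper's measured values.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Accuracy of four hypothetical optimizers on five datasets
# (rows = datasets, columns = optimizers); values are illustrative.
scores = np.array([
    [88.4, 86.3, 88.0, 86.8],
    [97.5, 92.5, 93.5, 92.0],
    [88.8, 85.0, 87.1, 87.0],
    [91.0, 89.5, 90.2, 88.9],
    [93.3, 90.1, 91.7, 90.8],
])

# Rank within each dataset: rank 1 = best accuracy,
# so rank the negated scores.
ranks = np.vstack([rankdata(-row) for row in scores])
mean_ranks = ranks.mean(axis=0)
print("mean ranks:", mean_ranks)  # first optimizer is best, mean rank 1.0

# Friedman test: a small p-value means the optimizers differ
# significantly across the datasets.
stat, p = friedmanchisquare(*scores.T)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

A mean rank of exactly 1 (as CGO achieves for several metrics) means the method was ranked first on every dataset.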
These reasons support that our approach obtains the best results. CGO is an effective search algorithm for tackling complex optimization problems, so its parameters must be chosen carefully. First, when the population clusters in CGO were analyzed, CGO worked better when the population around an optimal solution was divided into two parts. Second, CGO searched more consistently than the other methods, as evidenced by the lower standard deviations in its results. Finally, CGO's exploration and exploitation strategies were applied successfully, since the algorithm worked equally well on datasets with a wide variety of dimensions, making it adaptable to FS challenges.
However, our approach also has some limitations, mainly in time and memory complexity, and we are currently working on improving its efficiency. In particular, we are assessing other augmentation procedures, as introduced in [81]. Moreover, we plan to use other deep learning models, such as the Swin or Vision Transformer, which have achieved the best results and have recently been used in different computer vision tasks.
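The interaction between the optimizer and the classifier can be seen in the wrapper objective that any binary feature-selection metaheuristic, CGO included, must evaluate: a candidate binary mask selects a subset of the extracted features, an SGD classifier is trained on that subset, and the fitness combines classification error with a small penalty on the number of features kept. The sketch below, assuming scikit-learn is available, uses random masks and synthetic data in place of CGO's candidate solutions and the MobileNetV3 features; the `fitness` helper and the weighting `alpha` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

def fitness(mask, X_tr, X_te, y_tr, y_te, alpha=0.99):
    """Wrapper fitness for a binary feature mask: combines the SGD
    classifier's error on the selected features with a small penalty
    on the fraction of features kept. Lower is better."""
    if not mask.any():                          # empty subsets are invalid
        return 1.0
    clf = SGDClassifier(random_state=0, max_iter=1000)
    clf.fit(X_tr[:, mask], y_tr)
    error = 1.0 - clf.score(X_te[:, mask], y_te)
    return alpha * error + (1 - alpha) * mask.mean()

# Synthetic stand-in for the MobileNetV3 feature vectors.
X, y = make_classification(n_samples=300, n_features=40,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Twenty random candidate subsets stand in for CGO's population.
rng = np.random.default_rng(0)
masks = rng.random((20, 40)) < 0.5
best = min(masks, key=lambda m: fitness(m, X_tr, X_te, y_tr, y_te))
print("selected features:", int(best.sum()), "of 40")
```

Because each fitness evaluation retrains the classifier, the wrapper formulation is also where the time complexity noted above comes from: the cost grows with the population size and the number of iterations.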
6. Conclusion and Future Work
The automatic medical image classification task has been expanding rapidly in recent years. However, existing approaches are still incapable of achieving good performance due to the similarity in physical attributes of images, the diversity of medical experience, and small medical image datasets. Therefore, this paper presents a new method of classifying medical images that uses the IoMT system to help clinicians and patients make a quick and advanced diagnosis of diseases in any area. The proposed system relies on classification models trained in the cloud center before being used, after extracting features from the medical images acquired from IoT nodes and passing them to fog computing.
To obtain the features, MobileNetV3 was used. MobileNetV3 was fine-tuned on the medical image datasets to generate more sophisticated and informative representations and to retrieve the feature vector representation. After that, we used a new metaheuristic method in binary form (chaos game optimization) to reduce the feature representation space. This algorithm enhances the convergence rate toward the optimal subset of relevant features; CGO therefore achieves a high convergence speed, which indicates that it avoids being trapped in local optima. Thus, it successfully balances the exploration and exploitation phases because of the fast determination of the threshold values and the high accuracy shown in the results.
The learned model's efficiency is evaluated either by transmitting it to a tested medical image cloud center or by using fog computing with a clone of the learning algorithm. Our experiments were applied to three databases: ISIC-2016, PH2, and Blood-Cell. According to the results, the new CGO optimization method outperforms other existing feature selection methods. The work evaluated combinations of nine optimizers with three different classifier configurations.
The most significant results for the accuracy, F1 score, recall, and precision metrics on these datasets were achieved with the CGO optimizer combined with the SGD classifier. For ISIC-2016, the accuracy was 88.39%; for PH2, the accuracy was 97.52%; finally, for Blood-Cell, the accuracy was 88.79%. Furthermore, comparisons with other state-of-the-art medical image classification technologies demonstrated that the created IoMT methodology is an appropriate mechanism.
In the near future, this system would be available in hospitals with the aim of monitoring patients' condition from home. Patients would automatically send a report to the hospital through the connected devices, with vital information about blood pressure, insulin levels, etc. Professional staff at the hospital would then follow up each case and, if needed, communicate directly with the patient. However, there are still some limitations to the proposed model, the most relevant being its resource requirements: more time is needed to obtain the results, along with more memory. We are currently working on lowering the complexity and enhancing the efficiency of the suggested system. We also plan to propose a CGO-based multiobjective feature selection approach for high-dimensional data with few instances, to simultaneously maximize classification performance and minimize the number of features, using more efficient classifiers. Additionally, automatic determination of the cluster number and the application of hyperheuristic approaches in FS are also exciting lines of research. Moreover, a more comprehensive volume of medical data will be evaluated in future studies. Finally, merging several classification algorithms is an attractive object of investigation that could allow practitioners to improve the performance of existing methods.