Hadeer A. Helaly, Mahmoud Badawy, Amira Y. Haikal.
Abstract
Alzheimer's disease (AD) is a chronic, irreversible brain disorder with no effective cure to date, although available medicines can delay its progress. Early detection of AD therefore plays a crucial role in preventing and controlling its progression. The main objective of this work is to design an end-to-end framework for early detection of Alzheimer's disease and for medical image classification across the AD stages. A deep learning approach, specifically convolutional neural networks (CNNs), is used. Four stages of the AD spectrum are classified jointly, and separate binary medical image classifications are implemented between each pair of AD stages. Two methods are used to classify the medical images and detect AD. The first uses simple CNN architectures that handle 2D and 3D structural brain scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, based on 2D and 3D convolutions. The second applies the transfer learning principle to take advantage of pre-trained models for medical image classification, such as the VGG19 model. Because the COVID-19 pandemic makes it difficult for people to visit hospitals regularly while avoiding gatherings and infection, an Alzheimer's checking web application built on the final qualified architectures is proposed. It helps doctors and patients check for AD remotely, determines the patient's stage on the AD spectrum, and advises the patient according to that stage. Nine performance metrics are used to evaluate and compare the two methods. The experimental results show that the CNN architectures of the first method have suitably simple structures that reduce computational complexity, memory requirements, and overfitting while keeping training time manageable. They also achieve very promising accuracies: 93.61% and 95.17% for 2D and 3D multi-class AD stage classification, respectively.
The fine-tuned VGG19 pre-trained model achieves an accuracy of 97% for multi-class AD stage classification.
Keywords: Alzheimer’s disease; Brain MRI; Convolutional neural network (CNN); Deep learning; Medical image classification
Year: 2021 PMID: 34745371 PMCID: PMC8563360 DOI: 10.1007/s12559-021-09946-2
Source DB: PubMed Journal: Cognit Comput ISSN: 1866-9956 Impact factor: 4.890
Fig. 1 Proportion of people affected by AD according to age in the United States [5]
Demographic data for 300 subjects

| Characteristic | AD | EMCI | LMCI | NC |
|---|---|---|---|---|
| Subject number | 75 | 75 | 75 | 75 |
| Male/female | 21/54 | 51/24 | 43/32 | 32/43 |
| Age (mean ± STD) | 75.95 ± 0.91 | 76.08 ± 0.90 | 77.44 ± 1.34 | 75.68 ± 0.47 |
Fig. 2 Slices of MR images of an AD patient: Accelerated Sagittal MPRAGE, Axial Field Mapping, and 3-Plane Localizer views, from left to right
Training, validation, and test set sizes

| Class | Training | Validation | Test | Total |
|---|---|---|---|---|
| 0 AD | 9,600 | 1,200 | 1,200 | 12,000 |
| 1 EMCI | 9,600 | 1,200 | 1,200 | 12,000 |
| 2 LMCI | 9,600 | 1,200 | 1,200 | 12,000 |
| 3 NC | 9,600 | 1,200 | 1,200 | 12,000 |
| Total | 38,400 | 4,800 | 4,800 | 48,000 |
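The per-class counts above work out to an 80/10/10 split; a quick arithmetic check in plain Python (no external data assumed):

```python
# Per-class counts from the table above: training, validation, test
train, val, test = 9600, 1200, 1200

per_class = train + val + test          # 12,000 images per class
total = 4 * per_class                   # 4 classes: AD, EMCI, LMCI, NC

fractions = [n / per_class for n in (train, val, test)]
print(per_class, total, fractions)      # 12000 48000 [0.8, 0.1, 0.1]
```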
Fig. 3 The proposed E2AD2C framework architecture
Fig. 4 Example of the normalization methods applied to an MRI image
Fig. 5 Illustration of the convolution operation
Fig. 6 The difference among the sigmoid, ReLU, and LReLU activation functions [24]
Fig. 7 The 2D-M2IC model architecture
Fig. 8 The 3D-M2IC model architecture
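As a rough illustration of what Figs. 5 and 6 depict, the sketch below implements a valid-mode 2D convolution (cross-correlation, as used in CNN layers) and the sigmoid/ReLU/LReLU activations in NumPy. The image and kernel values are made up purely for the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # LReLU keeps a small slope for negative inputs instead of zeroing them
    return np.where(x > 0, x, alpha * x)

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the core op of a convolutional layer."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1., 2., 0.],
                  [0., 1., 3.],
                  [2., 1., 1.]])
kernel = np.array([[1., 0.],
                   [0., 1.]])          # sums each 2x2 window's diagonal

feature_map = conv2d(image, kernel)    # shape (2, 2)
print(relu(feature_map))               # [[2. 5.] [1. 2.]]
```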
The tuning applied to the VGG19 model

| Layer (type) | Output shape | Param # |
|---|---|---|
| vgg19 (Functional) | (None, 3, 3, 512) | 20,024,384 |
| flatten (Flatten) | (None, 4608) | 0 |
| dense (Dense) | (None, 1024) | 4,719,616 |
| dense_1 (Dense) | (None, 512) | 524,800 |
| dense_2 (Dense) | (None, 256) | 131,328 |
| dropout (Dropout) | (None, 256) | 0 |
| dense_3 (Dense) | (None, 128) | 32,896 |
| dropout_1 (Dropout) | (None, 128) | 0 |
| dense_4 (Dense) | (None, 4) | 516 |
| Total params: 25,433,540 | | |
| Trainable params: 25,433,540 | | |
| Non-trainable params: 0 | | |
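The parameter counts in the table follow from the dense-layer formula (inputs + 1 bias) × units; a quick check in plain Python (the 20,024,384 figure for the VGG19 convolutional base is taken directly from the table):

```python
def dense_params(n_in, n_out):
    # Weights plus one bias per output unit
    return (n_in + 1) * n_out

vgg19_base = 20_024_384  # convolutional base, from the table above

# Dense head: 4608 -> 1024 -> 512 -> 256 -> 128 -> 4 (dropout adds no params)
head = [(4608, 1024), (1024, 512), (512, 256), (256, 128), (128, 4)]
head_params = sum(dense_params(i, o) for i, o in head)

total = vgg19_base + head_params
print(head_params, total)  # 5409156 25433540
```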
Summarization of the applied performance metrics

| Metric | Description | Formula / notes |
|---|---|---|
| Accuracy | The ratio of correct predictions to the total number of predictions | Accuracy = (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP, and FN are the True Positive, True Negative, False Positive, and False Negative counts |
| Loss | For binary classification, binary cross-entropy loss is used | L = -(1/N) Σ_i [y_i log(p_i) + (1 - y_i) log(1 - p_i)], where y_i is the true label and p_i the predicted probability of sample i |
| | For multi-class classification, categorical cross-entropy loss is used | L = -(1/N) Σ_i Σ_c y_{i,c} log(p_{i,c}), summed over the N samples and the C classes |
| F1 score | The harmonic mean of precision and recall, with range [0, 1]; the higher the F1 score, the better the model performance | F1 = 2 · (Precision · Recall) / (Precision + Recall) |
| Recall | The ratio of correct positive results to all relevant samples | Recall = TP / (TP + FN) |
| Precision | The ratio of correct positive results to all positive results predicted by the classifier | Precision = TP / (TP + FP) |
| ROC-AUC | Picks a good cut-off threshold for the model by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) for threshold values in the range [0, 1] | |
| MCC | The higher the correlation between true and predicted values, the better the model prediction | MCC = (TP·TN - FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)) |
| Confusion matrix | The complete description of model performance; it outputs a matrix and forms the basis of the other metrics that depend on TP, TN, FP, and FN | For a binary model classifying AD vs NC: TP: the model predicts AD and the subject has AD; TN: the model predicts NC and the subject is NC; FP: the model predicts AD but the subject is NC; FN: the model predicts NC but the subject has AD |
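The table's formulas can be exercised on a toy example; the confusion counts below are invented solely to illustrate the definitions for a binary AD-vs-NC classifier:

```python
import math

# Hypothetical confusion counts (not results from the paper)
TP, TN, FP, FN = 90, 85, 15, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)                      # a.k.a. sensitivity / TPR
f1        = 2 * precision * recall / (precision + recall)
mcc = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

print(f"acc={accuracy:.3f} P={precision:.3f} R={recall:.3f} "
      f"F1={f1:.3f} MCC={mcc:.3f}")
```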
Comparison of the proposed models with the state-of-the-art models
| Approach | Dataset | Modality | Type of classification | Accuracy | |
|---|---|---|---|---|---|
| 755 in each class (AD, MCI, and HC) | ADNI | MRI | Binary, multi | ||
| 302 subjects (211 AD, 91 NC) | ADNI | MRI, fMRI | Binary | ||
| 210 subjects (70 AD, 70 NC, 70 MCI) | CAD-dementia | MRI | Binary, multi | ||
| 50 AD, 43 LMCI, 77 EMCI, 61 NC | ADNI | MRI | Binary | ||
| 98 AD, 98 NC | Local hospitals, OASIS | MRI | Binary | ||
| 53 AD, 228 MCI, 250 NC | ADNI | sMRI and DTI | |||
| 530 subjects (185 AD, 185 MCI, 160 HC) | ADNI | MRI | Multi | ||
| AD 192, 184 NC | ADNI | MRI | Binary | ||
| 35 AD, 30 aMCI, 40 NC | Beijing Xuanwu Hospital | DTI, fMRI | Multi | ||
| 28 AD, 28 NC | OASIS | MRI | Binary | ||
| 150 subjects (AD 50, NC 50, MCI 50) | ADNI | sMRI | Multi, binary | ||
| AD 12, NC 12, EMCI 12, LMCI 12 | ADNI | DTI | Multi | ||
| 337 subjects (198 AD, 139 NC) | ADNI | MRI | Binary | ||
| 120 subjects, 30 for each class (AD, EMCI, LMCI, NC) | ADNI | 4D fMRI | Multi | |
| 407 HC, 418 AD, 280 c-MCI, 533 stable MCI [s-MCI] | ADNI | 3D MRI | Binary | ||
| 787 subjects for (AD, MCIc, MCInc, HC) classes | ADNI | 3D MRI | Binary | ||
| 600 brain MRI images | ADNI | 3D MRI | Multi | ||
| Proposed 2D-M2IC: 300 subjects (75 AD, 75 EMCI, 75 LMCI, 75 NC), 48,000 MRI images total | ADNI | 2D MRI | Multi, binary | 93.61% (multi) |
| Proposed 3D-M2IC: same 300 subjects | ADNI | 3D MRI | Multi, binary | 95.17% (multi) |
| Proposed fine-tuned VGG19: same 300 subjects | ADNI | 2D MRI | Multi | 97% |
Fig. 9 Comparison of the proposed models with other models for multi-class medical image classification
Fig. 10 Comparison among the proposed models (2D-M2IC, 3D-M2IC, 2D-BMIC, 3D-BMIC, and the fine-tuned VGG19 model)
Comparison of the performance metrics of the two proposed models (2D-M2IC: first three metric columns; 3D-M2IC: last three)

| Class | Precision | Recall | F1-score | Precision | Recall | F1-score | Support |
|---|---|---|---|---|---|---|---|
| 0 AD | 0.96 | 0.93 | 0.95 | 0.98 | 0.94 | 0.96 | 1200 |
| 1 EMCI | 0.90 | 0.97 | 0.94 | 0.92 | 0.96 | 0.94 | 1200 |
| 2 LMCI | 0.98 | 0.90 | 0.93 | 0.97 | 0.88 | 0.92 | 1200 |
| 3 NC | 0.98 | 0.95 | 0.96 | 0.97 | 0.98 | 0.98 | 1200 |
| | 0.95 | 0.94 | 0.95 | 0.96 | 0.95 | 0.95 | 4800 |
| | 0.95 | 0.94 | 0.95 | 0.96 | 0.94 | 0.95 | 4800 |
| | 0.95 | 0.94 | 0.95 | 0.96 | 0.95 | 0.95 | 4800 |
| | 0.94 | 0.94 | 0.94 | 0.95 | 0.95 | 0.95 | 4800 |
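Taking the per-class F1 values from the table above (row order assumed to be AD, EMCI, LMCI, NC; the source omits its row labels), the macro-averaged F1 scores can be recomputed:

```python
# Per-class F1 scores read from the table (2D-M2IC first, then 3D-M2IC)
f1_2d = [0.95, 0.94, 0.93, 0.96]
f1_3d = [0.96, 0.94, 0.92, 0.98]

macro_2d = sum(f1_2d) / len(f1_2d)  # unweighted mean over the 4 classes
macro_3d = sum(f1_3d) / len(f1_3d)
print(round(macro_2d, 3), round(macro_3d, 3))  # 0.945 0.95
```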
Fig. 11 Training and validation accuracy and loss for the 2D-M2IC model
Fig. 12 Training and validation accuracy and loss for the 3D-M2IC model
The confusion matrix and normalized confusion matrix for the proposed models (2D-M2IC, 3D-M2IC).
Fig. 13 The ROC-AUC of the proposed 2D-M2IC model
Fig. 14 The ROC-AUC of the proposed 3D-M2IC model
Fig. 15 The AD stage prediction for MRI medical images