Muhammad Owais, Muhammad Arsalan, Jiho Choi, Tahir Mahmood, Kang Ryoung Park.
Abstract
Various techniques using artificial intelligence (AI) have contributed significantly to the field of medical image and video-based diagnosis, such as radiology, pathology, and endoscopy, including the classification of gastrointestinal (GI) diseases. Most previous studies on the classification of GI diseases use only spatial features, which yield low performance in the classification of multiple GI diseases. Although a few previous studies use temporal features based on a three-dimensional convolutional neural network, they cover only a specific part of the GI tract and a limited number of classes. To overcome these problems, we propose a comprehensive AI-based framework for the classification of multiple GI diseases from endoscopic videos, which simultaneously extracts both spatial and temporal features to achieve better classification performance. Two different residual networks and a long short-term memory model are integrated in a cascaded mode to extract spatial and temporal features, respectively. Experiments were conducted on a combined dataset that forms one of the largest collections of endoscopic videos, with 52,471 frames. The results demonstrate the effectiveness of the proposed classification framework for multiple GI diseases. The experimental results of the proposed model (97.057% area under the curve) demonstrate superior performance over the state-of-the-art methods and indicate its potential for clinical applications.
Keywords: Artificial intelligence (AI); classification of multiple gastrointestinal (GI) diseases; deep learning; endoscopic video analysis; residual network (ResNet) and long short-term memory (LSTM) model
Year: 2019 PMID: 31284687 PMCID: PMC6678612 DOI: 10.3390/jcm8070986
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.241
Comparison of our proposed and existing methods for endoscopy disease classification.
| Endoscopy Type | Method | Purpose | No. of Classes | Strength | Weakness |
|---|---|---|---|---|---|
| CE | Log Gabor filter, SUSAN edge detection and SVM | Small bowel polyps and ulcers detection | 2 | Computationally efficient | Limited dataset and number of classes |
| CE | Texture features (ULBP, wavelet) + SVM | Polyp detection in GI tract | 2 | Robust to illumination change and scale invariant | Limited dataset and number of classes |
| CE | Texture features (LBP, wavelet) + SVM | Tumor recognition in the digestive tract | 2 | Invariant to illumination change | Limited dataset and number of classes |
| CE | Texture features (SIFT, Saliency) + SVM | Polyp classification | 2 | Extract scale invariant features | Limited dataset and number of classes |
| CE | Texture features (SIFT, HoG, LBP, CLBP, ULBP) + SVM, FLDA | Polyp detection | 2 | Extract scale invariant features | Limited dataset and number of classes |
| CE | CNN | Small intestine movement characterization | 6 | High classification performance | Limited number of classes |
| CE | CNN | Celiac disease classification | 2 | High sensitivity and specificity | Limited dataset and number of classes |
| CE | CNN | Hookworm detection | 2 | Edge extraction network results in better performance | Limited number of classes |
| EGD | CNN | | 9 | Comparable performance of second CNN with the clinical diagnosis reference standard | CAD performance should be enhanced. |
| EGD | CNN | Anatomical classification of GI images | 6 | High classification performance | Limited number of classes |
| EGD | CNN-based SSD detector | Gastric cancer detection | 2 | High sensitivity | Overall low positive prediction value |
| Colonoscopy | CNN | Colorectal polyp detection and classification | 3 | High detection performance | Limited dataset and number of classes |
| Colonoscopy | CNN | Real-time colorectal polyp type analysis | 4 | High accuracy and sensitivity | Limited number of classes |
| Colonoscopy | Online and offline 3D-CNN | Detection of colorectal polyps | 2 | Computationally efficient | CAD performance should be enhanced. |
| EGD, Colonoscopy, Sigmoidoscopy, | CNN (ResNet) + LSTM | Classification of multiple GI diseases | 37 | Computationally efficient | Cascaded training of CNN and LSTM requires more time |
Figure 1. Overall flow diagram of the proposed classification framework.
Figure 2. Overview of the proposed cascaded convolutional neural network and long short-term memory (LSTM)-based deep architecture for the classification of multiple gastrointestinal (GI) diseases.
Figure 3. Residual block of ResNet18 with (a) 1 × 1 convolutional-mapping-based residual unit and (b) identity-mapping-based residual unit.
Layer-wise configuration details of the deep ResNet18 model in our study.
| Layer Name | Feature Map | Filters | Kernel Size | Stride | Padding | Total Learnable Parameters |
|---|---|---|---|---|---|---|
| Image input layer | | n/a | n/a | n/a | n/a | n/a |
| Conv1 | | 64 | | 2 | 3 | 9600 |
| Max pooling | | 1 | | 2 | 1 | |
| Conv2-1–Conv2-2 | | 64 | | 1 | 1 | 74,112 |
| Conv3-1–Conv3-2 | | 64 | | 1 | 1 | 74,112 |
| Conv4-1–Conv4-2 | | 128 | | 2 | 1 | 230,528 |
| Conv5-1–Conv5-2 | | 128 | | 1 | 1 | 295,680 |
| Conv6-1–Conv6-3 | | 256 | | 2 | 1 | 919,808 |
| Conv7-1–Conv7-2 | | 256 | | 1 | 1 | 1,181,184 |
| Conv8-1–Conv8-3 | | 512 | | 2 | 1 | 3,674,624 |
| Conv9-1–Conv9-2 | | 512 | | 1 | 1 | 4,721,664 |
| Avg pooling | | 1 | | 7 | 0 | |
| FC layer | 37 | | | | | 18,981 |
| Softmax | 37 | | | | | |
| Classification layer | 37 | | | | | |
| Total number of learnable parameters: 11,200,293 | | | | | | |
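
The parameter counts in the ResNet18 table can be checked by hand. The sketch below is illustrative only: the kernel sizes (7 × 7 for Conv1, 3 × 3 for the residual convolutions, 1 × 1 for the shortcut), the 3-channel input, and the batch-normalization layer after each convolution are assumptions taken from the standard ResNet18 design, not values stated in the table. The helper names `conv_bn_params` and `fc_params` are hypothetical. The 512 inputs of the FC layer come from the average-pooling feature dimension reported in the layer-wise comparison table.

```python
def conv_bn_params(kernel, c_in, c_out):
    """Learnable parameters of a kernel x kernel convolution (weights + biases)
    followed by batch normalization (one scale and one offset per channel).
    The presence of batch normalization is an assumption, see the lead-in."""
    conv = kernel * kernel * c_in * c_out + c_out
    batch_norm = 2 * c_out
    return conv + batch_norm


def fc_params(n_in, n_out):
    """Learnable parameters of a fully connected layer (weights + biases)."""
    return n_in * n_out + n_out


# Assumed standard ResNet18 shapes (not listed in the table itself).
conv1 = conv_bn_params(7, 3, 64)                  # Conv1 row
conv2_block = 2 * conv_bn_params(3, 64, 64)       # Conv2-1-Conv2-2 row
conv4_block = (conv_bn_params(3, 64, 128)         # first 3 x 3 conv, 64 -> 128
               + conv_bn_params(3, 128, 128)      # second 3 x 3 conv
               + conv_bn_params(1, 64, 128))      # 1 x 1 shortcut mapping
fc = fc_params(512, 37)                           # 512 features -> 37 classes

# Per-row totals copied from the table, in order.
table_rows = [9600, 74112, 74112, 230528, 295680,
              919808, 1181184, 3674624, 4721664, 18981]
total = sum(table_rows)

print(conv1, conv2_block, conv4_block, fc, total)
```

Under these assumptions the computed values match the Conv1, Conv2, Conv4, and FC rows, and the rows sum to the reported total of 11,200,293.
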
Figure 4. Internal connectivity of a standard LSTM cell.
Layer-wise configuration details of long short-term memory (LSTM) model in our study.
| Layer Name | Feature Map Size | Total Learnable Parameters |
|---|---|---|
| Sequence input layer | | |
| LSTM | 600 | 1,951,200 |
| Dropout | 600 | |
| FC layer | 37 | 22,237 |
| Softmax | 37 | |
| Classification layer | 37 | |
| Total learnable parameters: 1,973,437 | | |
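
As a quick arithmetic check of the LSTM table, the sketch below recomputes the FC-layer count from the 600 LSTM outputs and 37 classes, then adds the ResNet18 total from the earlier table. The variable names are illustrative; the LSTM layer count itself is copied from the table rather than derived, because the sequence input dimension is not given in this excerpt.

```python
HIDDEN_UNITS = 600        # LSTM feature map size from the table
NUM_CLASSES = 37          # number of GI classes

LSTM_LAYER_PARAMS = 1_951_200    # copied from the table, not derived here
RESNET18_TOTAL = 11_200_293      # total from the ResNet18 table

# Fully connected layer: one weight per (input, class) pair plus one bias per class.
lstm_fc = HIDDEN_UNITS * NUM_CLASSES + NUM_CLASSES

lstm_total = LSTM_LAYER_PARAMS + lstm_fc
combined_total = RESNET18_TOTAL + lstm_total
combined_millions = round(combined_total / 1e6, 2)

print(lstm_fc, lstm_total, combined_total, combined_millions)
```

The combined figure agrees with the 13.17 million parameters listed for the model with one LSTM layer in the structural comparison table.
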
Figure 5. Different anatomical districts of the human GI tract.
Details of multiple subcategories of each anatomical district and their corresponding classes.
| Anatomical District | Subcategory | Class Name | Training Set (Frames) | Testing Set (Frames) | Total |
|---|---|---|---|---|---|
| | Larynx | C1: Normal | 387 | 387 | 774 |
| | Upper part | C2: Normal | 625 | 625 | 1250 |
| | | C3: Esophageal candidiasis | 419 | 419 | 838 |
| | | C4: Esophageal papillomatosis | 272 | 272 | 544 |
| | Lower part (z-line) | C5: Normal | 250 | 250 | 500 |
| | Cardia | C6: Hiatal hernia | 648 | 648 | 1296 |
| | Fundus | C7: Atrophic gastritis | 241 | 241 | 482 |
| | | C8: Atrophic and xanthoma gastritis | 255 | 254 | 509 |
| | Body | C9: Benign hyperplastic polyps | 1070 | 1070 | 2140 |
| | | C10: Adenocarcinoma (Cancer) | 955 | 955 | 1910 |
| | Pylorus | C11: Normal | 1275 | 1275 | 2550 |
| | Duodenum | C12: Normal | 423 | 423 | 846 |
| | | C13: Ulcer | 1345 | 1345 | 2690 |
| | | C14: Papilla vateri | 702 | 702 | 1404 |
| | Terminal Ileum | C15: Crohn’s disease | 840 | 840 | 1680 |
| | Ileocecal | C16: Severe Crohn’s disease | 278 | 278 | 556 |
| | Ileocecal valve | C17: Crohn’s disease | 838 | 838 | 1676 |
| | Caecum | C18: Adenocarcinoma (Cancer) | 1301 | 1301 | 2602 |
| | | C19: Melanosis coli | 342 | 342 | 684 |
| | | C20: Caecal angiectasia | 403 | 404 | 807 |
| | | C21: Appendix aperture | 694 | 694 | 1388 |
| | Ascending/ | C22: Adenocarcinoma (Cancer) | 1293 | 1293 | 2586 |
| | | C23: Melanosis coli | 603 | 604 | 1207 |
| | | C24: Other types of polyps | 250 | 250 | 500 |
| | | C25: Dyed resection margins | 250 | 250 | 500 |
| | | C26: Dyed lifted polyps | 250 | 250 | 500 |
| | | C27: Melanosis coli and tuber adenoma | 243 | 243 | 486 |
| | | C28: Inflammatory polyposis | 382 | 382 | 764 |
| | | C29: Normal | 500 | 500 | 1000 |
| | Sigmoid Colon | C30: Tuber adenoma | 2212 | 2212 | 4424 |
| | | C31: Polypoid cancer | 282 | 282 | 564 |
| | Rectosigmoid | C32: Ulcerative colitis | 2071 | 2071 | 4142 |
| | | C33: Severe Crohn’s disease | 1074 | 1074 | 2148 |
| | | C34: Adenocarcinoma (Cancer) | 1362 | 1362 | 2724 |
| | | C35: Tuber adenoma | 1069 | 1069 | 2138 |
| | | C36: Normal | 420 | 420 | 840 |
| | | C37: A focal radiation injury | 411 | 411 | 822 |
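
The per-class frame counts can be summed to confirm the dataset size quoted in the abstract. The sketch below simply transcribes the (training, testing) pairs for C1 to C37 from the table; the list name `FRAMES_PER_CLASS` is illustrative.

```python
# (training frames, testing frames) for classes C1..C37, in table order.
FRAMES_PER_CLASS = [
    (387, 387), (625, 625), (419, 419), (272, 272), (250, 250),      # C1-C5
    (648, 648), (241, 241), (255, 254), (1070, 1070), (955, 955),    # C6-C10
    (1275, 1275), (423, 423), (1345, 1345), (702, 702), (840, 840),  # C11-C15
    (278, 278), (838, 838), (1301, 1301), (342, 342), (403, 404),    # C16-C20
    (694, 694), (1293, 1293), (603, 604), (250, 250), (250, 250),    # C21-C25
    (250, 250), (243, 243), (382, 382), (500, 500), (2212, 2212),    # C26-C30
    (282, 282), (2071, 2071), (1074, 1074), (1362, 1362),            # C31-C34
    (1069, 1069), (420, 420), (411, 411),                            # C35-C37
]

num_classes = len(FRAMES_PER_CLASS)
train_total = sum(train for train, _ in FRAMES_PER_CLASS)
test_total = sum(test for _, test in FRAMES_PER_CLASS)
grand_total = train_total + test_total

print(num_classes, train_total, test_total, grand_total)
```

The two halves are almost exactly balanced, which fits the two-fold cross-validation referred to in the training-plot captions, and the grand total equals the 52,471 frames stated in the abstract.
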
Figure 6. Examples from each class of the 37 different categories (i.e., C1 to C37), including both normal and diseased cases.
Figure 7. Selected sample images illustrating the high intra-class variance: (a) C18; (b) C22; (c) C34; (d) C32; (e) C13; and (f) C30.
Parameters of the stochastic gradient descent method for the training of both ResNet18 and LSTM models in our experiments.
| Model | Number of Training Epochs | Initial Learning Rate | Momentum | L2-Regularization | Learning Rate Drop Factor | Mini-Batch Size |
|---|---|---|---|---|---|---|
| ResNet18 | 8 | 0.001 | 0.9 | 0.0001 | 0.1 | 10 |
| LSTM | 10 | 0.0001 | 0.9 | 0.0001 | 0.1 | 50 |
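
To make the role of each hyperparameter concrete, here is a minimal sketch of one stochastic-gradient-descent-with-momentum update with L2 regularization, using the values from the table. The function names and the exact update form (regularization added to the gradient, velocity-style momentum) are assumptions chosen for illustration; the training toolkit used in the study may implement the step differently, and the schedule on which the learning rate is dropped is not given in the table.

```python
# Hyperparameters transcribed from the table above.
SGD_CONFIG = {
    "ResNet18": {"epochs": 8, "initial_lr": 0.001, "momentum": 0.9,
                 "l2": 0.0001, "lr_drop_factor": 0.1, "mini_batch_size": 10},
    "LSTM": {"epochs": 10, "initial_lr": 0.0001, "momentum": 0.9,
             "l2": 0.0001, "lr_drop_factor": 0.1, "mini_batch_size": 50},
}


def sgdm_step(weights, grads, velocity, lr, momentum, l2):
    """One SGD-with-momentum step; L2 regularization adds l2 * w to each gradient."""
    new_velocity = [momentum * v - lr * (g + l2 * w)
                    for w, g, v in zip(weights, grads, velocity)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


def dropped_learning_rate(initial_lr, drop_factor, num_drops):
    """Learning rate after num_drops multiplicative drops (drop schedule is hypothetical)."""
    return initial_lr * drop_factor ** num_drops


cfg = SGD_CONFIG["ResNet18"]
w, v = sgdm_step([1.0], [0.5], [0.0],
                 lr=cfg["initial_lr"], momentum=cfg["momentum"], l2=cfg["l2"])
print(w, v, dropped_learning_rate(cfg["initial_lr"], cfg["lr_drop_factor"], 1))
```
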
Figure 8. Training loss and accuracy plots during the first stage (i.e., spatial feature extraction by ResNet18): (a) 1st fold cross-validation; and (b) 2nd fold cross-validation.
Figure 9. Training loss and accuracy plots during the second stage (i.e., temporal feature extraction using LSTM): (a) 1st fold cross-validation; and (b) 2nd fold cross-validation.
Figure 10. Classification performance of our framework according to the number of frames for LSTM.
Performance comparison of our method using ResNet18 + LSTM with the conventional ResNet18 model based on feature extraction from different layers (unit: %).
| Layer Name | Feature Dim. | ResNet18: Accuracy | ResNet18: F1 score | ResNet18: mAP | ResNet18: mAR | Proposed: Accuracy | Proposed: F1 score | Proposed: mAP | Proposed: mAR |
|---|---|---|---|---|---|---|---|---|---|
| Conv6-2 | 50,176 | 75.86 ± 4.03 | 78.62 ± 1.28 | 81.64 ± 0.35 | 75.85 ± 2.69 | 87.15 ± 1.02 | 87.61 ± 0.04 | 88.85 ± 0.81 | 86.40 ± 0.85 |
| Conv7-2 | 50,176 | 77.13 ± 3.61 | 79.61 ± 0.73 | 82.42 ± 0.76 | 77.02 ± 2.02 | 88.02 ± 2.78 | 88.94 ± 1.18 | 91.20 ± 0.12 | 86.81 ± 2.36 |
| Conv8-2 | 25,088 | 84.39 ± 1.54 | 84.75 ± 0.69 | 85.92 ± 0.20 | 83.62 ± 1.15 | 89.07 ± 0.10 | 89.96 ± 0.88 | 91.24 ± 0.86 | 88.72 ± 0.91 |
| Conv9-2 | 25,088 | 87.10 ± 0.70 | 87.57 ± 0.47 | 88.19 ± 0.17 | 86.97 ± 1.09 | 89.39 ± 1.10 | 89.70 ± 1.69 | 90.24 ± 1.61 | 89.18 ± 1.76 |
| Avg. pooling | 512 | 89.95 ± 1.26 | 90.35 ± 1.74 | 90.72 ± 1.17 | 89.99 ± 2.29 | 92.57 ± 0.66 | 93.41 ± 0.12 | 94.58 ± 0.37 | 92.28 ± 0.58 |
Performance comparisons of our method (ResNet18 + LSTM) with the conventional ResNet18 with and without PCA (unit: %).
| Method | ResNet18: Accuracy | ResNet18: F1 score | ResNet18: mAP | ResNet18: mAR | Proposed: Accuracy | Proposed: F1 score | Proposed: mAP | Proposed: mAR |
|---|---|---|---|---|---|---|---|---|
| With PCA | 88.50 ± 1.01 | 90.16 ± 0.16 | 91.85 ± 0.11 | 88.52 ± 0.20 | 90.01 ± 0.17 | 91.82 ± 0.37 | 94.22 ± 0.40 | 89.54 ± 0.33 |
| Without PCA | 89.95 ± 1.26 | 90.35 ± 1.74 | 90.72 ± 1.17 | 89.99 ± 2.29 | 92.57 ± 0.66 | 93.41 ± 0.12 | 94.58 ± 0.37 | 92.28 ± 0.58 |
Figure 11. Confusion matrix of the proposed method. The entry in the i-th row and j-th column corresponds to the percentage of samples from class i that were classified as class j. Precision and recall are calculated as “TP/(TP + FP)” and “TP/(TP + FN)” [45], respectively.
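
The precision and recall definitions in the caption can be applied per class to a confusion matrix and then averaged. The sketch below is a minimal illustration on raw counts, assuming rows are true classes and columns are predicted classes; the function names and the toy 2 × 2 matrix are hypothetical. It also includes the harmonic mean of a precision and a recall value; the F1 scores tabulated for the proposed method appear consistent with the harmonic mean of mAP and mAR, though that relation is inferred from the numbers rather than stated in this excerpt.

```python
def per_class_precision_recall(confusion):
    """Per-class precision TP/(TP+FP) and recall TP/(TP+FN).

    confusion[i][j] = number of samples of true class i predicted as class j
    (row/column convention assumed for this sketch).
    """
    n = len(confusion)
    precisions, recalls = [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # TP + FP
        actual_k = sum(confusion[k])                          # TP + FN
        precisions.append(tp / predicted_k if predicted_k else 0.0)
        recalls.append(tp / actual_k if actual_k else 0.0)
    return precisions, recalls


def mean(values):
    return sum(values) / len(values)


def harmonic_mean_f1(precision, recall):
    """F1 as the harmonic mean of a precision and a recall value."""
    return 2 * precision * recall / (precision + recall)


toy_confusion = [[8, 2],
                 [1, 9]]
precisions, recalls = per_class_precision_recall(toy_confusion)
mean_precision, mean_recall = mean(precisions), mean(recalls)
print(precisions, recalls, harmonic_mean_f1(mean_precision, mean_recall))
```
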
Comparative classification performance of proposed method and different baseline CNN models (unit: %).
| Methods | Accuracy: Fold 1 | Accuracy: Fold 2 | Accuracy: Avg. ± Std | F1 Score: Fold 1 | F1 Score: Fold 2 | F1 Score: Avg. ± Std | mAP: Fold 1 | mAP: Fold 2 | mAP: Avg. ± Std | mAR: Fold 1 | mAR: Fold 2 | mAR: Avg. ± Std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SqueezeNet | 78.69 | 77.00 | 77.84 ± 1.19 | 77.53 | 75.95 | 76.74 ± 1.12 | 78.38 | 75.16 | 76.77 ± 2.27 | 76.70 | 76.76 | 76.73 ± 0.04 |
| AlexNet | 79.19 | 80.97 | 80.08 ± 1.26 | 80.31 | 80.66 | 80.49 ± 0.24 | 80.55 | 80.85 | 80.70 ± 0.21 | 80.08 | 80.47 | 80.28 ± 0.27 |
| GoogLeNet | 83.36 | 85.82 | 84.59 ± 1.74 | 84.99 | 85.29 | 85.14 ± 0.21 | 84.67 | 85.92 | 85.29 ± 0.89 | 85.32 | 84.66 | 84.99 ± 0.47 |
| VGG19 | 84.81 | 85.49 | 85.15 ± 0.48 | 84.57 | 86.02 | 85.29 ± 1.03 | 85.48 | 86.27 | 85.88 ± 0.56 | 83.67 | 85.77 | 84.72 ± 1.48 |
| VGG16 | 83.88 | 87.57 | 85.72 ± 2.61 | 84.84 | 86.77 | 85.80 ± 1.37 | 85.20 | 87.28 | 86.24 ± 1.47 | 84.48 | 86.26 | 85.37 ± 1.26 |
| InceptionV3 | 87.23 | 88.61 | 87.92 ± 0.98 | 87.80 | 89.10 | 88.45 ± 0.92 | 86.50 | 89.24 | 87.87 ± 1.93 | 89.14 | 88.96 | 89.05 ± 0.13 |
| ResNet50 | 88.94 | 90.17 | 89.55 ± 0.87 | 90.13 | 91.06 | 90.60 ± 0.66 | 89.59 | 91.82 | 90.70 ± 1.58 | 90.68 | 90.32 | 90.50 ± 0.26 |
| ResNet18 | 90.84 | 89.06 | 89.95 ± 1.26 | 91.58 | 89.13 | 90.35 ± 1.74 | 91.55 | 89.89 | 90.72 ± 1.17 | 91.62 | 88.37 | 89.99 ± 2.29 |
| Proposed | 92.10 | 93.03 | 92.57 ± 0.66 | 93.49 | 93.33 | 93.41 ± 0.12 | 94.32 | 94.84 | 94.58 ± 0.37 | 92.68 | 91.87 | 92.28 ± 0.58 |
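
The "Avg. ± Std" columns can be reproduced from the two per-fold values. The sketch below assumes the spread is the sample standard deviation (divisor n − 1), which is an inference from the tabulated numbers rather than something stated in the table; because the folds are only given to two decimals, results may differ from the table in the last digit. The function name `summarize_folds` is illustrative.

```python
from statistics import mean, stdev


def summarize_folds(fold_values):
    """Return (average, sample standard deviation) over cross-validation folds."""
    return mean(fold_values), stdev(fold_values)


# Accuracy per fold, copied from the table.
proposed_avg, proposed_std = summarize_folds([92.10, 93.03])
resnet18_avg, resnet18_std = summarize_folds([90.84, 89.06])

print(f"Proposed: {proposed_avg:.3f} +/- {proposed_std:.3f}")
print(f"ResNet18: {resnet18_avg:.3f} +/- {resnet18_std:.3f}")
```

With only two folds the sample standard deviation reduces to the absolute difference between the folds divided by the square root of 2, so a small "± Std" simply means the two folds gave nearly identical results.
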
Figure 12. Receiver operating characteristic curves of our proposed method and other baseline models with the area under the curve (AUC).
Parametric and structural comparisons of different deep CNN models with our proposed model.
| CNN Models | Size | No. of Conv. Layers | No. of FC Layers | No. of LSTM Layers | Network Depth | Parameters (Millions) | Image Input Size |
|---|---|---|---|---|---|---|---|
| SqueezeNet | 4.6 MB | 18 | | | 18 | 1.24 | 227-by-227 |
| AlexNet | 227 MB | 5 | 3 | | 8 | 61 | 227-by-227 |
| GoogLeNet | 27 MB | 21 | 1 | | 22 | 7.0 | 224-by-224 |
| VGG19 | 535 MB | 16 | 3 | | 19 | 144 | 224-by-224 |
| VGG16 | 515 MB | 13 | 3 | | 16 | 138 | 224-by-224 |
| InceptionV3 | 89 MB | 47 | 1 | | 48 | 23.9 | 299-by-299 |
| ResNet50 | 96 MB | 49 | 1 | | 50 | 25.6 | 224-by-224 |
| ResNet18 | 44 MB | 17 | 1 | | 18 | 11.7 | 224-by-224 |
| Proposed | 48 MB | 17 | 1 | 1 | 19 | 13.17 | 224-by-224 |
Figure 13. Sensitivity analysis plot of our method and various baseline models in terms of (a) average accuracy; (b) average F1 score; (c) mAP; and (d) mAR.
Figure 14. t-test performance of our method and the second-best model in terms of (a) average accuracy; (b) average F1 score; (c) mAP; and (d) mAR.
Comparison of classification performance of the proposed method with different handcrafted feature-based methods (unit: %).
| Method | Classifier | Accuracy: Fold 1 | Accuracy: Fold 2 | Accuracy: Avg. ± Std | F1 Score: Fold 1 | F1 Score: Fold 2 | F1 Score: Avg. ± Std | mAP: Fold 1 | mAP: Fold 2 | mAP: Avg. ± Std | mAR: Fold 1 | mAR: Fold 2 | mAR: Avg. ± Std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LBP | AdaBoostM2 | 36.90 | 34.57 | 35.74 ± 1.65 | 28.85 | 26.55 | 27.70 ± 1.63 | 36.90 | 34.57 | 35.74 ± 1.65 | 23.68 | 21.55 | 22.61 ± 1.51 |
| | Multi-SVM | 45.53 | 42.15 | 43.84 ± 2.39 | 43.34 | 41.35 | 42.35 ± 1.41 | 44.05 | 41.94 | 42.99 ± 1.49 | 42.66 | 40.77 | 41.72 ± 1.34 |
| | RF | 57.37 | 56.84 | 57.10 ± 0.37 | 53.40 | 54.31 | 53.85 ± 0.64 | 54.53 | 55.06 | 54.79 ± 0.37 | 52.31 | 53.58 | 52.95 ± 0.90 |
| | KNN | 49.68 | 51.24 | 50.46 ± 1.10 | 46.28 | 48.44 | 47.36 ± 1.53 | 45.73 | 47.99 | 46.86 ± 1.59 | 46.84 | 48.90 | 47.87 ± 1.46 |
| HoG | AdaBoostM2 | 40.28 | 38.41 | 39.35 ± 1.33 | 33.04 | 32.68 | 32.86 ± 0.25 | 40.28 | 38.41 | 39.35 ± 1.33 | 28.00 | 28.44 | 28.22 ± 0.31 |
| | Multi-SVM | 47.96 | 51.73 | 49.84 ± 2.67 | 51.95 | 55.66 | 53.80 ± 2.63 | 68.13 | 66.64 | 67.39 ± 1.05 | 41.97 | 47.79 | 44.88 ± 4.11 |
| | RF | 60.10 | 62.72 | 61.41 ± 1.85 | 61.73 | 64.66 | 63.19 ± 2.07 | 68.03 | 69.29 | 68.66 ± 0.89 | 56.49 | 60.61 | 58.55 ± 2.91 |
| | KNN | 50.14 | 56.26 | 53.20 ± 4.33 | 52.22 | 57.13 | 54.68 ± 3.47 | 57.37 | 59.45 | 58.41 ± 1.47 | 47.93 | 54.98 | 51.45 ± 4.99 |
| MLBP | AdaBoostM2 | 46.42 | 41.62 | 44.02 ± 3.40 | 40.04 | 34.85 | 37.45 ± 3.67 | 46.42 | 41.62 | 44.02 ± 3.40 | 35.20 | 29.98 | 32.59 ± 3.69 |
| | Multi-SVM | 56.18 | 54.76 | 55.47 ± 1.00 | 53.72 | 52.49 | 53.10 ± 0.87 | 55.70 | 53.81 | 54.75 ± 1.33 | 51.87 | 51.23 | 51.55 ± 0.45 |
| | RF | 61.56 | 61.24 | 61.40 ± 0.22 | 56.98 | 58.16 | 57.57 ± 0.84 | 58.41 | 59.75 | 59.08 ± 0.95 | 55.62 | 56.65 | 56.13 ± 0.73 |
| | KNN | 54.38 | 56.43 | 55.40 ± 1.45 | 50.90 | 53.49 | 52.20 ± 1.83 | 50.92 | 53.21 | 52.06 ± 1.61 | 50.88 | 53.78 | 52.33 ± 2.05 |
| Proposed | | 92.10 | 93.03 | 92.57 ± 0.66 | 93.49 | 93.33 | 93.41 ± 0.12 | 94.32 | 94.84 | 94.58 ± 0.37 | 92.68 | 91.87 | 92.28 ± 0.58 |
Figure 15. Class prediction-based retrieval system using our proposed classification framework.
Figure 16. Examples of frames correctly retrieved by our proposed method: (a) C6; (b) C19; (c) C24; and (d) C37.
Figure 17. Examples of frames correctly classified by our proposed method, with probability score graph: (a) C6; (b) C19; (c) C24; and (d) C37.
Figure 18. Examples of frames incorrectly retrieved by our proposed method: (a) C16; (b) C31; (c) C33; and (d) C34.
Figure 19. Examples of frames incorrectly classified by our proposed method, with probability score graph: (a) C16; (b) C31; (c) C33; and (d) C34.