Muhammad Owais, Muhammad Arsalan, Tahir Mahmood, Jin Kyu Kang, Kang Ryoung Park.
Abstract
BACKGROUND: The early diagnosis of various gastrointestinal diseases can lead to effective treatment and reduce the risk of many life-threatening conditions. Unfortunately, various small gastrointestinal lesions are undetectable during early-stage examination by medical experts. In previous studies, various deep learning-based computer-aided diagnosis tools have been used to make a significant contribution to the effective diagnosis and treatment of gastrointestinal diseases. However, most of these methods were designed to detect a limited number of gastrointestinal diseases, such as polyps, tumors, or cancers, in a specific part of the human gastrointestinal tract.
Keywords: artificial intelligence; computer-aided diagnosis; content-based medical image retrieval; deep learning; endoscopic video retrieval; polyp detection
Year: 2020 PMID: 33242010 PMCID: PMC7728528 DOI: 10.2196/18563
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Comprehensive flow diagram of the proposed classification and retrieval framework. The red dotted box highlights our major contributions in the proposed retrieval framework.
Figure 2. Overall block diagram of our proposed spatiotemporal feature–based classification network composed of DenseNet and LSTM-based networks. KNN: k-nearest neighbor; LSTM: long short-term memory; PCA: principal component analysis.
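The temporal stage of the architecture in Figure 2 can be sketched as follows: a single LSTM cell consumes one CNN feature vector per frame, and its final hidden state serves as the video's spatiotemporal descriptor. This is a minimal numpy illustration, not the authors' implementation; all dimensions, the random weights, and the `lstm_descriptor` helper name are assumptions made for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_descriptor(frame_feats, W, U, b):
    """Run one LSTM cell over per-frame CNN feature vectors and return
    the final hidden state as a spatiotemporal descriptor.

    frame_feats : (T, d_in) array, one CNN feature vector per frame
    W : (4*h, d_in) input weights; U : (4*h, h) recurrent weights; b : (4*h,)
    """
    h_dim = U.shape[1]
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    for x in frame_feats:
        z = W @ x + U @ h + b
        i, f, o, g = np.split(z, 4)          # input, forget, output, candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)           # update cell memory
        h = o * np.tanh(c)                   # emit hidden state
    return h                                  # descriptor of the whole clip

# Illustrative sizes: 8 frames, 64-dim CNN features, 16-dim LSTM state
rng = np.random.default_rng(0)
T, d_in, h_dim = 8, 64, 16
feats = rng.standard_normal((T, d_in))
W = rng.standard_normal((4 * h_dim, d_in)) * 0.1
U = rng.standard_normal((4 * h_dim, h_dim)) * 0.1
b = np.zeros(4 * h_dim)
desc = lstm_descriptor(feats, W, U, b)
```

Because the hidden state is an output gate times a tanh, every component of the descriptor is bounded in (-1, 1) regardless of the input scale.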
Figure 3. Example frames of our selected data set for each class and their correspondence with the different anatomical districts of the human gastrointestinal tract.
Performance comparison of our proposed network with and without PCA.
| Fold | Performance without PCAa (fully connected network), % | | | | Performance with PCA + KNNb (λ=31), % | | | |
| | Accuracy | F1 | mAPc | mARd | Accuracy | F1 | mAP | mAR |
| Fold 1 | 97.31 | 97.83 | 97.97 | 97.70 | 98.41 | 98.80 | 98.82 | 98.79 |
| Fold 2 | 94.18 | 95.11 | 96.91 | 93.38 | 93.97 | 95.18 | 97.54 | 92.94 |
| Average, mean (SD) | 95.75 (2.21) | 96.47 (1.92) | 97.44 (0.75) | 95.54 (3.05) | 96.19 (3.13) | 96.99 (2.56) | 98.18 (0.90) | 95.86 (4.13) |
aPCA: principal component analysis.
bKNN: k-nearest neighbor.
cmAP: mean average precision.
dmAR: mean average recall.
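The PCA + KNN stage compared in the table above can be sketched as: project the network's descriptors onto λ principal components, then classify a query by a majority vote among its nearest compressed gallery vectors. Below is a minimal numpy sketch; λ=31 matches the table header, but the toy gallery, dimensions, and helper names (`pca_fit`, `knn_predict`) are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA via SVD; return the data mean and a (d, n_components) projection."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def knn_predict(query, gallery, labels, k=1):
    """Majority vote among the k nearest gallery vectors (Euclidean distance)."""
    d = np.linalg.norm(gallery - query, axis=1)
    nearest = labels[np.argsort(d)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]

# Illustrative gallery: 40 descriptors of dim 100, two well-separated classes
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 100)),
               rng.normal(5.0, 0.1, (20, 100))])
y = np.array([0] * 20 + [1] * 20)

lam = 31                                     # number of retained components
mean, P = pca_fit(X, lam)
Z = (X - mean) @ P                           # compressed gallery
q = (rng.normal(5.0, 0.1, 100) - mean) @ P   # query drawn near class 1
pred = knn_predict(q, Z, y, k=3)
```

The compression step (100 → 31 dimensions here) is what makes the nearest-neighbor search over a large endoscopic gallery cheap at query time.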
Figure 4. Detailed performance results of our proposed network shown in an average confusion matrix.
Comparative classification performance of our proposed network and other state-of-the-art methods used in endoscopy.
| Authors | Deep network | Accuracy, % | F1, % | mAPa, % | mARb, % |
| Zhang et al (2017) | SqueezeNet | 77.84 | 76.74 | 76.77 | 76.73 |
| Hicks et al (2018) | VGG19 | 85.15 | 85.29 | 85.88 | 84.72 |
| Fan et al (2018) | AlexNet | 80.08 | 80.49 | 80.70 | 80.28 |
| Takiyama et al (2018) | GoogLeNet | 84.59 | 85.14 | 85.29 | 84.99 |
| Byrne et al (2019) | InceptionV3 | 87.92 | 88.45 | 87.87 | 89.05 |
| Jani et al (2019) | MobileNetV2 | 88.53 | 88.51 | 88.34 | 88.69 |
| Lee et al (2019) | ResNet50 | 89.55 | 90.60 | 90.70 | 90.50 |
| Vezakis et al (2019) | ResNet18 | 89.95 | 90.35 | 90.72 | 89.99 |
| Owais et al (2019) | CNNc + LSTMd | 92.57 | 93.41 | 94.58 | 92.28 |
| Cho et al (2019) | InceptionResNet | 84.78 | 84.53 | 84.15 | 84.92 |
| Dif et al (2020) | ShuffleNet | 89.63 | 89.14 | 88.67 | 89.63 |
| Song et al (2020) | DenseNet201 | 92.12 | 92.42 | 92.91 | 91.93 |
| Guimarães et al (2020) | VGG16 | 85.72 | 85.80 | 86.24 | 85.37 |
| Hussein et al (2020) | ResNet101 | 90.24 | 91.14 | 91.52 | 90.78 |
| Klang et al (2020) | Xception | 86.05 | 84.88 | 84.19 | 85.58 |
| Proposed method | DenseNet + LSTM + PCAe + KNNf | 96.19 | 96.99 | 98.18 | 95.86 |
amAP: mean average precision.
bmAR: mean average recall.
cCNN: convolutional neural network.
dLSTM: long short-term memory.
ePCA: principal component analysis.
fKNN: k-nearest neighbor.
The t test and Cohen d performance analysis results (P values and effect sizes).
| Methods | Proposed vs second-best method | | | | Proposed vs third-best method | | | |
| | Accuracy | F1 | mAPa | mARb | Accuracy | F1 | mAP | mAR |
| P value (t test) | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 |
| Cohen d | 2.51 | 1.78 | 1.89 | 1.55 | 3.4 | 2.32 | 2.88 | 1.89 |
amAP: mean average precision.
bmAR: mean average recall.
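The statistics in the table above follow standard formulas: Cohen d is the mean difference divided by the pooled standard deviation, and the paired t statistic is the mean per-trial difference over its standard error. The numpy sketch below illustrates both; the per-trial accuracies are invented for illustration and are not the paper's fold-wise results.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d between two score samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    va, vb = np.var(a, ddof=1), np.var(b, ddof=1)
    pooled = np.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled

def paired_t(a, b):
    """Paired t statistic over matched trials (look up P in a t table or scipy)."""
    d = np.asarray(a) - np.asarray(b)
    return np.mean(d) / (np.std(d, ddof=1) / np.sqrt(len(d)))

# Illustrative per-trial accuracies (NOT the paper's measurements)
ours   = np.array([96.1, 96.4, 95.9, 96.3, 96.2])
second = np.array([92.4, 92.9, 92.5, 92.6, 92.7])
d = cohens_d(ours, second)
t = paired_t(ours, second)
```

By Cohen's conventional thresholds, d above 0.8 already counts as a large effect, so values such as 2.51 and 3.4 in the table indicate a very large separation between the methods.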
Figure 5. The t test and Cohen d performance comparison of our method with the second-best and third-best models with the average accuracy, F1 score, mAP, and mAR. CNN: convolutional neural network; LSTM: long short-term memory; mAP: mean average precision; mAR: mean average recall.
Classification performance comparison of our proposed method with other handcrafted feature–based methods.
| Feature descriptor | Classifier | Accuracy, % | F1, % | mAPa, % | mARb, % |
| Local binary pattern | AdaBoostM2 | 35.74 | 27.70 | 35.74 | 22.61 |
| | Multi-SVMc | 43.84 | 42.35 | 42.99 | 41.72 |
| | RFd | 57.10 | 53.85 | 54.79 | 52.95 |
| | KNNe | 50.46 | 47.36 | 46.86 | 47.87 |
| Histogram of oriented gradients | AdaBoostM2 | 39.35 | 32.86 | 39.35 | 28.22 |
| | Multi-SVM | 49.84 | 53.80 | 67.39 | 44.88 |
| | RF | 61.41 | 63.19 | 68.66 | 58.55 |
| | KNN | 53.20 | 54.68 | 58.41 | 51.45 |
| Multilevel local binary pattern | AdaBoostM2 | 44.02 | 37.45 | 44.02 | 32.59 |
| | Multi-SVM | 55.47 | 53.10 | 54.75 | 51.55 |
| | RF | 61.40 | 57.57 | 59.08 | 56.13 |
| | KNN | 55.40 | 52.20 | 52.06 | 52.33 |
| Proposed feature descriptor (DenseNet + LSTMf features + PCAg) | AdaBoostM2 | 93.39 | 93.66 | 94.35 | 92.98 |
| | Multi-SVM | 95.50 | 96.43 | 97.98 | 94.96 |
| | RF | 81.16 | 82.96 | 84.48 | 81.55 |
| | KNN | 96.19 | 96.99 | 98.18 | 95.86 |
amAP: mean average precision.
bmAR: mean average recall.
cSVM: support vector machine.
dRF: random forest.
eKNN: k-nearest neighbor.
fLSTM: long short-term memory.
gPCA: principal component analysis.
Figure 6. Obtained class-specific discriminative regions from different parts of the first-stage network (DenseNet) for given input frames.
Performance comparison of our proposed method and the second-best baseline method [10] using both retrieval methods.
| Method | With class prediction | | | | Without class prediction | | | |
| | Accuracy, % | F1, % | mAPa, % | mARb, % | Accuracy, % | F1, % | mAP, % | mAR, % |
| Owais et al [10] | 92.57 | 93.41 | 94.58 | 92.28 | 93.18 | 94.02 | 94.68 | 93.38 |
| Proposed | 96.19 | 96.99 | 98.18 | 95.86 | 96.13 | 96.94 | 98.04 | 95.89 |
amAP: mean average precision.
bmAR: mean average recall.
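The two retrieval modes compared above differ only in the search space: class prediction–based retrieval ranks only the gallery subset belonging to the query's predicted class, whereas retrieval without class prediction ranks the entire gallery. The numpy sketch below makes that distinction explicit; the `retrieve` helper, the toy gallery, and all sizes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def retrieve(query, gallery, labels, top_k=5, predicted_class=None):
    """Rank gallery descriptors by Euclidean distance to the query.

    With `predicted_class` set (class prediction-based retrieval), only
    that class's subset is searched; otherwise the full gallery is ranked.
    Returns indices into the original gallery, best match first.
    """
    idx = np.arange(len(gallery))
    if predicted_class is not None:
        idx = idx[labels == predicted_class]     # restrict the search space
    d = np.linalg.norm(gallery[idx] - query, axis=1)
    return idx[np.argsort(d)][:top_k]

# Toy gallery: two classes of 31-dim descriptors, far apart
rng = np.random.default_rng(3)
gallery = np.vstack([rng.normal(0, 0.1, (10, 31)),
                     rng.normal(4, 0.1, (10, 31))])
labels = np.array([0] * 10 + [1] * 10)
query = rng.normal(4, 0.1, 31)                   # query near class 1

hits_filtered = retrieve(query, gallery, labels, predicted_class=1)
hits_full = retrieve(query, gallery, labels)
```

When the classifier is accurate, both modes return similar ranked lists (as the near-identical numbers in the table suggest), but the class-filtered search touches far fewer gallery vectors per query.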
Figure 7. Obtained retrieval results in ranked order (1st to 24th best matches) for the C16 input query frame using both a class prediction–based method and a method without class prediction.
Figure 8. Obtained retrieval results in ranked order (1st to 24th best matches) for the C31 input query frame using both a class prediction–based method and a method without class prediction.