| Literature DB >> 32218230 |
Dat Tien Nguyen, Jin Kyu Kang, Tuyen Danh Pham, Ganbayar Batchuluun, Kang Ryoung Park.
Abstract
Computer-aided diagnosis systems have been developed to assist doctors in diagnosing thyroid nodules and to reduce the errors of traditional diagnosis methods, which rely mainly on the experience of doctors. The performance of such systems therefore plays an important role in the quality of the diagnosis task. Although there have been state-of-the-art studies of this problem based on handcrafted features, deep features, or a combination of the two, their performance is still limited. To overcome these problems, we propose an ultrasound image-based method for diagnosing malignant thyroid nodules using artificial intelligence, based on analysis in both the spatial and frequency domains. Additionally, we propose a weighted binary cross-entropy loss function for training deep convolutional neural networks, to reduce the effect of the unbalanced training samples of the target classes in the training data. Through experiments with a popular open dataset, the thyroid digital image database (TDID), we confirm the superiority of our method over state-of-the-art methods.
Keywords: artificial intelligence; deep learning; malignant thyroid nodule; ultrasound image; weighted binary cross-entropy loss
Year: 2020 PMID: 32218230 PMCID: PMC7180806 DOI: 10.3390/s20071822
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
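The weighted binary cross-entropy described in the abstract can be sketched as follows; the weight names `w_pos`/`w_neg` and the epsilon clamp are illustrative assumptions, not the paper's exact formulation:

```python
import math

def weighted_bce(y_true, p_pred, w_pos, w_neg, eps=1e-7):
    """Weighted binary cross-entropy; w_pos/w_neg rescale the two class terms."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

With `w_pos = w_neg = 1` this reduces to ordinary BCE; raising the weight of the minority class penalizes its misclassification more heavily during training.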
Summary of the previous studies on the ultrasound thyroid nodule image classification problem.
| Category | Method | Strength | Weakness |
|---|---|---|---|
| Handcrafted-based methods | Classification is implemented using image features extracted via human-designed methods [ | Easy to implement; does not require high-performance hardware | Low classification accuracy |
| Deep learning-based methods | Fine-tunes an existing CNN for classification [; extracts image features using a pretrained CNN while classification is implemented using an SVM [; combines detection and classification in a single CNN [ | Utilizes the power of deep learning and transfer learning; higher accuracy than handcrafted-based methods | There is room for enhancing classification performance |
| Fusion of deep and handcrafted-based methods | Extracts image features from both the spatial and frequency domains for the classification problem [ | Applies a cascade classifier scheme to enhance classification performance using handcrafted and deep features | More complicated and longer processing time than a single method (FFT-based or CNN-based) |
| Proposed method | Extracts image information from both the spatial and frequency domains; combines the classification results of multiple CNN models; reduces the effect of unbalanced training samples using a weighted cross-entropy loss function | Analyzes ultrasound thyroid images using different CNN architectures; enhances classification results compared to a single CNN architecture | Requires strong hardware to run multiple CNN networks; longer processing time than previous studies |
Figure 1Example of captured ultrasound thyroid images in the thyroid digital image database (TDID) dataset [20]: (a) benign cases and (b) malign cases.
Figure 2Flow chart of the proposed method.
Figure 3 Example result of the thyroid region detection algorithm used in our study: (a) an input ultrasound thyroid image; (b) the binarized image; (c) the thyroid region detected by selecting the largest object; and (d) the final detection result.
Figure 4 Coarse classifier based on information extracted in the frequency domain using the Fast Fourier Transform (FFT)-based method: (a) a thyroid image represented in the spatial and frequency domains with a selected circular frequency region and (b) the flowchart for classifying thyroid images into the ‘benign’, ‘malign’, or ‘ambiguous’ category in our study.
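The coarse FFT-based classifier of Figure 4 compares the spectral energy inside and outside a circular low-frequency region and emits a three-way decision, with 'ambiguous' images passed on to the CNNs. A pure-Python sketch of this idea; the radius and thresholds here are placeholders, not the paper's values:

```python
import cmath

def dft2_magnitude(img):
    """Direct 2-D DFT magnitude of a small grayscale image (list of lists)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0 + 0j
            for y in range(h):
                for x in range(w):
                    s += img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
            mag[u][v] = abs(s)
    return mag

def high_freq_ratio(mag, radius):
    """Fraction of spectral energy outside a circular low-frequency region."""
    h, w = len(mag), len(mag[0])
    low = high = 0.0
    for u in range(h):
        for v in range(w):
            # distance to the DC term, accounting for spectrum periodicity
            du, dv = min(u, h - u), min(v, w - v)
            if du * du + dv * dv <= radius * radius:
                low += mag[u][v]
            else:
                high += mag[u][v]
    return high / (high + low)

def coarse_classify(ratio, t_benign=0.2, t_malign=0.5):
    # hypothetical thresholds: the values used in the paper differ
    if ratio < t_benign:
        return "benign"
    if ratio > t_malign:
        return "malign"
    return "ambiguous"
```

A real implementation would use a fast FFT (e.g. `numpy.fft.fft2`) rather than this O(n^4) direct transform, which is written for clarity only.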
Figure 5 A general architecture of a convolutional neural network (CNN) for the image classification problem.
Figure 6Methodology for constructing the residual convolution block.
Figure 7 Comparison of (a) the conventional convolution block and (b) the naïve inception block.
Figure 8Flow-chart of the deep learning-based system constructed by combining classification results of multiple CNN networks.
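The combination step of Figure 8 can be sketched as score-level fusion; `scores_a` and `scores_b` are hypothetical per-class softmax outputs of the two networks, and the class ordering is an assumption for illustration:

```python
def fuse(scores_a, scores_b, rule):
    """Combine per-class scores from two CNNs using the MIN, MAX, or SUM rule."""
    ops = {"MIN": min, "MAX": max, "SUM": lambda a, b: a + b}
    fused = [ops[rule](a, b) for a, b in zip(scores_a, scores_b)]
    classes = ["benign", "malign"]  # assumed two-class ordering
    return classes[fused.index(max(fused))]
```

The final label is the class whose fused score is largest, matching the decision-level combination the tables below evaluate for each rule.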
ResNet50-based CNN architecture used in our experiments.
| Layer | Input Shape | Output Shape | Number of Parameters |
|---|---|---|---|
| Convolution Layers by ResNet-50 Network | (224, 224, 3) | (7, 7, 2048) | 23,587,712 |
| Global Average Pooling | (7, 7, 2048) | 2048 | 0 |
| Batch Normalization | 2048 | 2048 | 8192 |
| Dropout | 2048 | 2048 | 0 |
| Output Layer (Dense layer) | 2048 | 2 | 4098 |
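The parameter counts of the classification head in this table can be verified by hand; a small counting sketch (pure Python, no deep learning framework required):

```python
def gap_params():
    # global average pooling has no trainable or stored weights
    return 0

def batchnorm_params(channels):
    # gamma, beta, moving mean, and moving variance: 4 values per channel
    return 4 * channels

def dense_params(n_in, n_out):
    # weight matrix plus one bias per output unit
    return n_in * n_out + n_out

# classification head shared by both architectures in the tables
head = gap_params() + batchnorm_params(2048) + dense_params(2048, 2)
print(head)  # 12290 = 0 + 8192 + 4098
```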
Inception-based CNN architecture used in our experiments.
| Layer | Input Shape | Output Shape | Number of Parameters |
|---|---|---|---|
| Convolution Layers by Inception Network | (224, 224, 3) | (5, 5, 2048) | 21,802,784 |
| Global Average Pooling | (5, 5, 2048) | 2048 | 0 |
| Batch Normalization | 2048 | 2048 | 8192 |
| Dropout | 2048 | 2048 | 0 |
| Output Layer (Dense layer) | 2048 | 2 | 4098 |
Description of the TDID dataset used in our experiments (each value is a number of patients).

| Case | Training Data | Testing Data | Total |
|---|---|---|---|
| Benign | 41 | 11 | 52 |
| Malign | 196 | 50 | 246 |
| Total | 237 | 61 | 298 |
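Given the imbalance above (41 benign versus 196 malign training patients), one common way to set the weights for a weighted loss is inverse class frequency; this is an illustrative choice, not necessarily the scheme used in the paper:

```python
def inverse_freq_weights(counts):
    """Weight each class by total / (num_classes * class_count)."""
    total = sum(counts.values())
    n = len(counts)
    return {c: total / (n * k) for c, k in counts.items()}

w = inverse_freq_weights({"benign": 41, "malign": 196})
```

The minority (benign) class receives a weight well above 1, so its errors count more during training, counteracting the skewed sample counts.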
Parameters for training CNN models in our study.
| Optimizer | Number of Epochs | Batch Size | Initial Learning Rate | Stop Criteria |
|---|---|---|---|---|
| Adam | 30 | 32 | 0.0001 | End of Epochs |
Classification performance of the individual CNN network using the TDID dataset (unit: %).
| Network | Loss | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| ResNet50-based | BCE | 87.778 | 91.356 | 64.018 |
| ResNet50-based | wBCE | 82.412 | 83.950 | 72.524 |
| Inception-based | BCE | 81.506 | 83.406 | 68.760 |
| Inception-based | wBCE | 80.792 | 81.842 | 74.016 |
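Accuracy, sensitivity, and specificity in these tables follow the usual confusion-matrix definitions (assuming malign is the positive class, as the class ratios in the dataset suggest):

```python
def metrics(tp, fn, tn, fp):
    """Confusion-matrix metrics; positives are taken to be malign nodules."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # fraction of malign cases correctly flagged
    specificity = tn / (tn + fp)   # fraction of benign cases correctly cleared
    return accuracy, sensitivity, specificity
```

This framing explains the BCE/wBCE trade-off in the tables: the weighted loss lowers sensitivity slightly but raises specificity on the under-represented benign class.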
Classification performance by combining the two CNN networks using MIN, MAX, and SUM rules (unit: %).
| Rule | Loss | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| MIN | BCE | 83.938 | 85.868 | 71.142 |
| MIN | wBCE | 75.200 | 74.859 | 77.226 |
| MAX | BCE | 90.603 | 95.446 | 58.718 |
| MAX | wBCE | 91.192 | 95.083 | 65.687 |
| SUM | BCE | 82.677 | 83.894 | 74.219 |
| SUM | wBCE | 78.709 | 79.167 | 75.967 |
Classification performance of our proposed method using the TDID dataset with MIN, MAX, and SUM rules (unit: %).
| Rule | Loss | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| MIN | BCE | 86.928 | 89.331 | 71.142 |
| MIN | wBCE | 83.517 | 84.466 | 77.226 |
| MAX | BCE | 90.603 | 95.446 | 58.718 |
| MAX | wBCE | 92.051 | 96.072 | 65.687 |
| SUM | BCE | 86.073 | 87.831 | 74.219 |
| SUM | wBCE | 85.286 | 86.748 | 75.967 |
Comparison of the overall accuracy of the previous studies and our proposed method with the TDID dataset (unit: %).
| Method | Accuracy |
|---|---|
| Zhu et al. [ | 84.00 |
| Chi et al. [ | 79.36 |
| Sundar et al. [ (VGG16) | 77.57 |
| Sundar et al. [ (GoogLeNet) | 79.36 |
| Nguyen et al. [ | 90.88 |
| Proposed method | 92.05 |
Figure 9Example results obtained by our proposed method: (a) example results of the benign case and (b) example results of the malign case.
Processing time of our proposed method (unit: ms).
| Preprocessing Step | FFT-Based Classification | ResNet50-Based Classification | Inception-Based Classification | Total |
|---|---|---|---|---|
| 11.646 | 5.093 | 17.525 | 23.178 | 57.442 |