Guanfang Wang, Xianshan Chen, Geng Tian, Jiasheng Yang.
Abstract
Class imbalance and the curse of dimensionality are critical challenges in medical image classification. As a classical machine learning model, the n-gram model has shown excellent performance in addressing these issues in text classification. In this study, we propose an algorithm that classifies medical images by extracting their n-gram semantic features. The algorithm first converts the image classification problem into a text classification problem by building an n-gram corpus for each image, and then classifies the images with an n-gram model. The algorithm was evaluated on two independent public datasets. The first experiment diagnoses benign versus malignant thyroid nodules; the best area under the curve (AUC) is 0.989. The second experiment diagnoses the type of fundus lesion; at best, the algorithm correctly identified 86.667% of patients with dry age-related macular degeneration (AMD), 93.333% of patients with diabetic macular edema (DME), and 93.333% of normal individuals.
Year: 2022 PMID: 35547561 PMCID: PMC9085325 DOI: 10.1155/2022/3151554
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.809
Figure 1. Hex string 2-gram partition example.
Figure 2. Ultrasonic image preprocessing of thyroid nodules. (a) The original image. (b) Image after removing adjacent tissues of nodules. (c) Image after adjusting nodule angle.
Figure 3. Retinal OCT image preprocessing. (a) The original image. (b) The preprocessed image.
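The corpus-building step described in the abstract (and illustrated by the 2-gram partition in Figure 1) can be sketched as follows. This is a minimal sketch, assuming the image's raw pixel bytes are converted to a hexadecimal string and split into non-overlapping 2-character "words"; the paper's exact tokenization (overlap, gram length, byte source) is not reproduced here.

```python
def image_to_ngrams(pixel_bytes: bytes, n: int = 2) -> list[str]:
    """Convert raw pixel bytes to a hex string, then split it into
    non-overlapping n-character 'words' that form the image's corpus."""
    hex_str = pixel_bytes.hex()  # e.g. b'\x1a\x2b' -> "1a2b"
    return [hex_str[i:i + n] for i in range(0, len(hex_str) - n + 1, n)]

grams = image_to_ngrams(bytes([0x1A, 0x2B, 0x3C]))
# hex string "1a2b3c" -> ["1a", "2b", "3c"]
```

Once every image is reduced to such a token list, standard text-classification machinery (term weighting, feature selection, a classifier) applies unchanged.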
Parameter settings for both classifiers.
| Classifier | Parameter | Value |
|---|---|---|
| libsvm | c | 10 |
| libsvm | g | 0.01 |
| newff | Si | 3, 6, 3 |
| newff | Epochs | 100 |
| newff | Goal | 1 |
| newff | lr | 0.01 |
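The tabled settings can be approximated in scikit-learn; this is a sketch, assuming `SVC` (which wraps libsvm, with `C` and `gamma` playing the roles of c and g) and `MLPClassifier` as a stand-in for MATLAB's `newff` back-propagation network. The solver choice and other defaults are assumptions not stated in the paper.

```python
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# libsvm settings from the table: penalty C = 10, RBF kernel with gamma = 0.01.
svm = SVC(C=10, kernel="rbf", gamma=0.01)

# newff settings: hidden layer sizes Si = (3, 6, 3), 100 epochs, lr = 0.01.
bp = MLPClassifier(hidden_layer_sizes=(3, 6, 3), max_iter=100,
                   learning_rate_init=0.01, solver="sgd")
```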
Algorithm 1. Proposed algorithm.
Feature dimensions.
| Mode | Essential feature | Distinguishing feature (100%) | (75%) | (50%) | (25%) |
|---|---|---|---|---|---|
| TF | 3590 | 1463 | 1098 | 732 | 366 |
| TF-RF | 3590 | 1338 | 1004 | 669 | 335 |
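The TF-RF weighting mode compared above multiplies a term's frequency by its relevance frequency, a supervised factor that favors terms concentrated in the positive class. A minimal sketch, assuming the standard relevance-frequency definition (Lan et al.) rather than any paper-specific variant:

```python
import math

def tf_rf(tf: int, pos_docs_with_term: int, neg_docs_with_term: int) -> float:
    """TF-RF weight: term frequency times the relevance frequency
    rf = log2(2 + a / max(1, c)), where a and c count positive- and
    negative-class documents containing the term."""
    rf = math.log2(2 + pos_docs_with_term / max(1, neg_docs_with_term))
    return tf * rf

# A gram appearing 3 times, in 8 positive and 2 negative documents:
w = tf_rf(3, 8, 2)  # 3 * log2(2 + 4) = 3 * log2(6) ≈ 7.75
```

Because rf depends on class labels, TF-RF can shrink the distinguishing-feature set slightly relative to plain TF, consistent with the dimensions in the table above.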
Figure 4. The results of 10-fold cross-validation. (a) The classifier with SVM and TF achieves the best AUC of 0.989 using all features as inputs. (b) The classifier with BP and TF achieves the best AUC of 0.981 using 25% of the features as inputs. (c) The classifier with SVM and TF-RF achieves the best AUC of 0.978 using all features as inputs. (d) The classifier with BP and TF-RF achieves the best AUC of 0.977 using 25% of the features as inputs.
Fraction of samples correctly classified during cross-validation.
| Class | TF-SVM | TF-BP | TF-RF-SVM | TF-RF-BP |
|---|---|---|---|---|
| AMD | 13/15 = 86.667% | 13/15 = 86.667% | 13/15 = 86.667% | 12/15 = 80.000% |
| DME | 13/15 = 86.667% | 11/15 = 73.333% | 14/15 = 93.333% | 12/15 = 80.000% |
| Normal | 14/15 = 93.333% | 14/15 = 93.333% | 14/15 = 93.333% | 14/15 = 93.333% |