Eunhye Choi, Donghyun Kim, Jeong-Yun Lee, Hee-Kyung Park.
Abstract
Orthopantomograms (OPGs) are important for the primary diagnosis of temporomandibular joint osteoarthritis (TMJOA) because of the cost and radiation exposure associated with computed tomography (CT). The aims of this study were to develop an artificial intelligence (AI) model and to compare its performance in diagnosing TMJOA from OPGs with that of an oromaxillofacial radiology (OMFR) expert. The AI model was developed using Keras' ResNet model and trained to classify images into three categories: normal, indeterminate OA, and OA. The study included 1189 OPG images confirmed by cone-beam CT (CBCT) and evaluated the results in terms of model performance (accuracy, precision, recall, and F1 score) and diagnostic performance (accuracy, sensitivity, and specificity). Model performance was unsatisfactory when the AI was developed with three categories. After the indeterminate OA images were reclassified as normal, reclassified as OA, or omitted, the AI diagnosed TMJOA in a manner similar to the expert and agreed most closely with CBCT when the indeterminate OA category was omitted (accuracy: 0.78, sensitivity: 0.73, and specificity: 0.82). Our deep learning model showed a sensitivity equivalent to that of the expert, with a better balance between sensitivity and specificity, which implies that AI can play an important role in the primary diagnosis of TMJOA from OPGs in most general practice clinics where OMFR experts or CT are not available.
Year: 2021 PMID: 33986459 PMCID: PMC8119725 DOI: 10.1038/s41598-021-89742-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Clinical and demographic characteristics of the OPG dataset.
| Sex | Statistic | Normal | Indeterminate | OA |
|---|---|---|---|---|
| Female | Number of joints | 619 | 656 | 683 |
| | Mean age (years) | 34.70 | 34.03 | 41.48 |
| | 95% CI | 33.56–35.83 | 32.92–35.14 | 40.23–42.73 |
| | SD | 14.43 | 14.54 | 16.66 |
| Male | Number of joints | 181 | 123 | 116 |
| | Mean age (years) | 31.86 | 28.20 | 32.88 |
| | 95% CI | 29.78–33.94 | 26.11–30.28 | 29.88–35.88 |
| | SD | 14.27 | 11.78 | 16.50 |
| Total | Number of joints | 800 | 779 | 799 |
| | Mean age (years) | 34.06 | 33.11 | 40.23 |
| | 95% CI | 33.06–35.05 | 32.11–34.11 | 39.06–41.40 |
| | SD | 14.43 | 14.28 | 16.89 |
Indeterminate, indeterminate temporomandibular joint osteoarthritis; OA, temporomandibular joint osteoarthritis.
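
The record does not say how the 95% confidence intervals for mean age were computed. As a minimal sketch, assuming a normal-approximation interval (mean ± 1.96 × SD / √n) with n taken as the number of joints, the Male OA row can be checked as follows. The function name `mean_ci95` and the choice of z = 1.96 are assumptions made for illustration, not details from the source.

```python
import math


def mean_ci95(mean, sd, n, z=1.96):
    """Normal-approximation 95% CI for a mean: mean +/- z * sd / sqrt(n).

    Assumption: the source does not state its CI method; z = 1.96 is the
    standard normal critical value for a two-sided 95% interval.
    """
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


# Male OA row of the table: mean age 32.88, SD 16.50, 116 joints.
low, high = mean_ci95(32.88, 16.50, 116)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Under these assumptions the result agrees with the tabulated interval for that row to two decimals; other rows may differ in the last digit because the published means and SDs are themselves rounded.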
Confusion matrix and model performance for the initial AI.
| Actual | Predicted: Normal | Predicted: Indeterminate | Predicted: OA | Precision | Recall | Accuracy | Weighted average precision | Weighted average recall | F1 score |
|---|---|---|---|---|---|---|---|---|---|
| Normal | 57 | 46 | 47 | 0.72 | 0.38 | 0.51 | 0.55 | 0.51 | 0.53 |
| Indeterminate | 14 | 53 | 83 | 0.44 | 0.35 | | | | |
| OA | 8 | 22 | 120 | 0.48 | 0.80 | | | | |
Indeterminate, indeterminate temporomandibular joint osteoarthritis; OA, temporomandibular joint osteoarthritis.
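
The model-performance columns can be recomputed from the confusion matrix counts. The sketch below uses the usual definitions: per-class precision is the diagonal count divided by the column total, per-class recall is the diagonal count divided by the row total, and the weighted averages weight each class by its number of actual cases. One point is an assumption: the record does not define the reported F1 score, and here it is taken as the harmonic mean of the weighted average precision and recall, which is the variant that matches the tabulated 0.53. The function name `summarize` is illustrative.

```python
# Rows are actual classes, columns are predicted classes,
# both in the order Normal, Indeterminate, OA (counts from the table).
labels = ["Normal", "Indeterminate", "OA"]
cm = [
    [57, 46, 47],
    [14, 53, 83],
    [8, 22, 120],
]


def summarize(cm):
    """Return per-class and overall metrics for a square confusion matrix."""
    n = len(cm)
    support = [sum(row) for row in cm]                       # actual cases per class
    predicted = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    total = sum(support)

    precision = [cm[i][i] / predicted[i] for i in range(n)]
    recall = [cm[i][i] / support[i] for i in range(n)]
    accuracy = sum(cm[i][i] for i in range(n)) / total

    # Support-weighted averages over classes.
    w_precision = sum(p * s for p, s in zip(precision, support)) / total
    w_recall = sum(r * s for r, s in zip(recall, support)) / total

    # Assumption: F1 as the harmonic mean of the two weighted averages.
    f1 = 2 * w_precision * w_recall / (w_precision + w_recall)
    return precision, recall, accuracy, w_precision, w_recall, f1


precision, recall, accuracy, w_precision, w_recall, f1 = summarize(cm)
for name, p, r in zip(labels, precision, recall):
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.2f} weighted P={w_precision:.2f} "
      f"weighted R={w_recall:.2f} F1={f1:.2f}")
```

The weighted average recall always equals the accuracy, which is why those two columns carry the same value in the table.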
Confusion matrix and model performance in each Trial.
| Trial | Actual | Predicted: Normal | Predicted: OA | Precision | Recall | Accuracy | Weighted average precision | Weighted average recall | F1 score |
|---|---|---|---|---|---|---|---|---|---|
| Trial 1 | Normal | 283 | 17 | 0.80 | 0.94 | 0.80 | 0.81 | 0.80 | 0.80 |
| | OA | 72 | 78 | 0.82 | 0.52 | | | | |
| Trial 2 | Normal | 35 | 115 | 0.81 | 0.23 | 0.73 | 0.75 | 0.73 | 0.74 |
| | OA | 8 | 292 | 0.72 | 0.97 | | | | |
| Trial 3 | Normal | 119 | 26 | 0.78 | 0.82 | 0.78 | 0.78 | 0.78 | 0.78 |
| | OA | 34 | 93 | 0.78 | 0.73 | | | | |
OA, temporomandibular joint osteoarthritis.
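
With two classes, the same counts give the diagnostic measures directly. Treating OA as the positive class, sensitivity is the recall of the OA row and specificity is the recall of the Normal row. The following is a small sketch under that convention; the dictionary layout and the function name `diagnostic_performance` are chosen here for illustration.

```python
# Two-class confusion matrices from the table above.
# Each entry is [[actual Normal -> pred Normal, pred OA],
#                [actual OA     -> pred Normal, pred OA]].
trials = {
    "Trial 1": [[283, 17], [72, 78]],
    "Trial 2": [[35, 115], [8, 292]],
    "Trial 3": [[119, 26], [34, 93]],
}


def diagnostic_performance(cm):
    """Accuracy, sensitivity, and specificity with OA as the positive class."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # OA cases correctly flagged
        "specificity": tn / (tn + fp),   # normal joints correctly cleared
    }


results = {name: diagnostic_performance(cm) for name, cm in trials.items()}
for name, metrics in results.items():
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

For Trial 3 this gives the accuracy, sensitivity, and specificity quoted in the abstract (0.78, 0.73, and 0.82).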
Five-fold cross-validation in Trial 3.
| Fold | Precision | Recall | F1 score | Accuracy | AUC (95% CI) |
|---|---|---|---|---|---|
| 1 | 0.82 | 0.59 | 0.69 | 0.74 | 0.83 (0.79–0.88) |
| 2 | 0.80 | 0.71 | 0.75 | 0.77 | 0.86 (0.82–0.90) |
| 3 | 0.83 | 0.76 | 0.79 | 0.81 | 0.87 (0.83–0.91) |
| 4 | 0.75 | 0.76 | 0.75 | 0.75 | 0.83 (0.79–0.88) |
| 5 | 0.77 | 0.74 | 0.76 | 0.76 | 0.84 (0.80–0.89) |
| Average | 0.80 | 0.71 | 0.75 | 0.76 | 0.85 (0.81–0.89) |
Diagnostic performance and level of agreement in each Trial.
| Trial | Rater | Accuracy | Sensitivity | Specificity | Cohen’s kappa | Kappa index | McNemar’s test (p) |
|---|---|---|---|---|---|---|---|
| Trial 1 | Expert | 0.81 | 0.61 | 0.91 | 0.54 | Moderate | .001 |
| | AI | 0.80 | 0.52 | 0.94 | 0.51 | Moderate | .000 |
| Trial 2 | Expert | 0.69 | 0.57 | 0.93 | 0.42 | Moderate | .000 |
| | AI | 0.73 | 0.97 | 0.23 | 0.25 | Fair | .000 |
| Trial 3 | Expert | 0.85 | 0.72 | 0.97 | 0.69 | Substantial | .000 |
| | AI | 0.78 | 0.73 | 0.82 | 0.56 | Moderate | .366 |
AI, artificial intelligence.
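
The agreement statistics for the AI rows can be approximated from the per-trial confusion matrices of AI predictions against the CBCT reference. The record does not state which software or test variant was used, so the sketch below is an assumption on two points: Cohen's kappa is computed in its standard unweighted form, and McNemar's test uses the continuity-corrected chi-square statistic with one degree of freedom. The function names are illustrative. The expert rows cannot be checked this way because the expert's confusion matrices are not given.

```python
import math


def cohens_kappa(tn, fp, fn, tp):
    """Unweighted Cohen's kappa for a 2x2 table of reference vs. prediction."""
    n = tn + fp + fn + tp
    observed = (tn + tp) / n
    # Chance agreement from the row (reference) and column (prediction) totals.
    expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / n ** 2
    return (observed - expected) / (1 - expected)


def mcnemar_p(b, c):
    """p-value of the continuity-corrected McNemar chi-square test (1 df).

    b and c are the two discordant counts. Assumption: the source does not
    say whether an exact or a chi-square version was used.
    """
    if b + c == 0:
        return 1.0  # no discordant pairs, nothing to test
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Trial 3, AI vs. CBCT: actual Normal -> 119 / 26, actual OA -> 34 / 93.
tn, fp, fn, tp = 119, 26, 34, 93
kappa = cohens_kappa(tn, fp, fn, tp)
p_value = mcnemar_p(fp, fn)
print(f"kappa={kappa:.2f} McNemar p={p_value:.3f}")
```

A non-significant McNemar result, as for the AI in Trial 3, indicates that false positives and false negatives occurred at similar rates, which fits the abstract's description of a better balance between sensitivity and specificity.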
Figure 1. Comparison of the sensitivities and specificities in Trials 1, 2, and 3.
Figure 2. Result of ROI extraction, 300 × 300 pixels.
Figure 3. Clinical datasets used for training, validation, and test.