Sunjin Yim, Sungchul Kim, Inhwan Kim, Jae-Woo Park, Jin-Hyoung Cho, Mihee Hong, Kyung-Hwa Kang, Minji Kim, Su-Jung Kim, Yoon-Ji Kim, Young Ho Kim, Sung-Hoon Lim, Sang Jin Sung, Namkug Kim, Seung-Hak Baek.
Abstract
OBJECTIVE: The purpose of this study was to investigate the accuracy of one-step automated orthodontic diagnosis of skeletodental discrepancies using a convolutional neural network (CNN) and lateral cephalogram images of varying quality from multiple hospitals nationwide.
Keywords: Convolutional neural networks; Lateral cephalogram; Multi-center study; One-step automated orthodontic diagnosis
Year: 2022; PMID: 35046138; PMCID: PMC8770967; DOI: 10.4041/kjod.2022.52.1.3
Source DB: PubMed; Journal: Korean J Orthod; Impact factor: 1.372
Figure 1. Flowchart of dataset and experimental setup.
CNN, convolutional neural network.
Information on the product, radiation exposure condition, sensor, and image condition of the cephalometric radiograph system in 10 multi-centers
| Cephalometric radiograph systems | | SNUDH | KADH | AJUDH | AMC | CNUDH | CSUDH | EUMC | KHUDH | KNUDH | WKUDH |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product | Company | Asahi | Vatech | Planmeca | Carestream | Instrumentarium | Planmeca | Asahi | Asahi | Asahi | Planmeca |
| | Model | CX-90SP-II | Uni3D NC | Proline XC | CS9300 | OrthoCeph | Proline XC | Ortho stage | CX-90SP | CX-90SP- | Promax |
| Radiation | kVp | 76 | 85 | 68 | 80 | 85 | 80 | 75 | 70 | 70 | Female 72, |
| | mA | 80 | 10 | 7 | 12 | 12 | 12 | 15 | 15 | 80 | 10 |
| | sec | 0.32 | 0.9 | 2.3 | 0.63 | 1.6 | 1.8 | 1 | 0.3–0.35 | 0.32 | 1.87 |
| Sensor | Image | Cassette | CCD sensor | CCD sensor | CCD sensor | Cassette | Cassette | Cassette | Cassette | Cassette | CCD |
| | Sensor | 10 × 12 (inch) | 30 × 25 (cm) | 10.6 × 8.85 (inch) | 30 × 30 (cm) | 10 × 12 (inch) | 8 × 10 (inch) | 8 × 12 | 10 × 12 | 11 × 14 | 27 × 30 |
| Image | Image size | 2,000 × 2,510/ | 2,360 × 1,880 | 1,039 × 1,200 | 2,045 × 2,272/ | 2,500 × 2,048 | 2,392 × 1,792 | 2,510 × 2,000 | 2,500 | 1,950 × 2,460/ | 1,818 |
| | Actual | 0.150/ 0.100 | 0.110 | 0.250 | 0.132/ 0.145 | 0.115 | 0.100 | 0.100 | 0.110 | 0.100 | 0.132 |
| Lateral cephalogram images used in this study (number) | | 1,129 | 864 | 22 | 21 | 20 | 30 | 26 | 23 | 19 | 20 |
SNUDH, Seoul National University Dental Hospital; KADH, Kooalldam Dental Hospital; AJUDH, Ajou University Dental Hospital; AMC, Asan Medical Center; CNUDH, Chonnam National University Dental Hospital; CSUDH, Chosun University Dental Hospital; EUMC, Ewha University Medical Center; KHUDH, Kyung Hee University Dental Hospital; KNUDH, Kyungpook National University Dental Hospital; WKUDH, Wonkwang University Dental Hospital; CR, computed radiography; CCD, charge-coupled device.
Classification criteria for the anteroposterior skeletal discrepancies (APSDs), vertical skeletal discrepancies (VSDs), and vertical dental discrepancies (VDDs) for orthodontic analysis
| Sex | APSDs: ANB, mean | APSDs: ANB, SD | VSDs: FMA, mean | VSDs: FMA, SD | VSDs: FHR, mean | VSDs: FHR, SD | VDDs: overbite, mean | VDDs: overbite, SD |
|---|---|---|---|---|---|---|---|---|
| Female | 2.4 | 1.8 | 24.2 | 4.6 | 65 | 9 | 1.5 | 1.5 |
| Male | 1.78 | 2.02 | 26.78 | 1.79 | 66.37 | 5.07 | | |
ANB, angle among A point, nasion, and B point; FMA, Frankfort mandibular plane angle; FHR, Jarabak’s posterior/anterior facial height ratio; SD, standard deviation.
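The criteria above can be read as threshold rules. The following is a minimal sketch, assuming that Class I spans the sex-specific ANB mean ± 1 SD, that larger ANB values fall into Class II and smaller ones into Class III, and that the overbite boundaries are 0 mm and 3 mm (the mean ± 1 SD of 1.5 ± 1.5). The function names and the handling of values lying exactly on a boundary are illustrative assumptions, not taken from the study. The VSD rule is omitted because the way FMA and FHR are combined is not stated here.

```python
# Norms (mean, SD) copied from the criteria table; ANB in degrees.
ANB_NORMS = {"female": (2.4, 1.8), "male": (1.78, 2.02)}


def classify_apsd(anb_deg, sex):
    """Hypothetical helper: ANB above mean + SD is Class II, below mean - SD is Class III."""
    mean, sd = ANB_NORMS[sex]
    if anb_deg > mean + sd:
        return "Class II"
    if anb_deg < mean - sd:
        return "Class III"
    return "Class I"  # assumption: boundary values count as Class I


def classify_vdd(overbite_mm):
    """Hypothetical helper using the 0 mm and 3 mm overbite boundaries."""
    if overbite_mm < 0.0:
        return "Open bite"
    if overbite_mm > 3.0:
        return "Deep bite"
    return "Normal overbite"  # assumption: 0 mm and 3 mm count as normal


print(classify_apsd(5.0, "female"), classify_vdd(-1.0))
```

Because the norms differ by sex, the same ANB value can fall into different classes for female and male patients.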
Distribution of classification groups in each diagnosis for human gold standard in the training set, internal test set, and external test set
| Diagnosis | Classification | Training set: SNUDH | Training set: KADH | Training set: sum | Internal test set: SNUDH | Internal test set: KADH | Internal test set: sum | External test set: AJUDH | External test set: AMC | External test set: EUMC | External test set: CNUDH | External test set: CSUDH | External test set: KHUDH | External test set: KNUDH | External test set: WKUDH | External test set: sum | Sum: internal + external | Sum: total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| APSDs | Class I | 238 | 323 | 561 (36.9) | 122 | 40 | 162 (34.4) | 8 | 6 | 5 | 4 | 7 | 11 | 7 | 2 | 50 (27.6) | 212 (32.5) | 773 (35.6) |
| | Class II | 183 | 263 | 446 (29.3) | 112 | 44 | 156 (33.1) | 8 | 8 | 11 | 8 | 13 | 4 | 4 | 6 | 62 (34.3) | 218 (33.4) | 664 (30.5) |
| | Class III | 359 | 156 | 515 (33.8) | 115 | 38 | 153 (32.5) | 6 | 7 | 10 | 8 | 10 | 8 | 8 | 12 | 69 (38.1) | 222 (34.0) | 737 (33.9) |
| | Sum | 780 | 742 | 1,522 | 349 | 122 | 471 | 22 | 21 | 26 | 20 | 30 | 23 | 19 | 20 | 181 | 652 | 2,174 |
| VSDs | Normodivergent | 331 | 389 | 720 (47.3) | 146 | 50 | 196 (41.6) | 10 | 6 | 7 | 9 | 17 | 10 | 7 | 7 | 73 (40.3) | 270 (41.4) | 989 (45.5) |
| | Hyperdivergent | 314 | 241 | 555 (36.5) | 135 | 40 | 175 (37.2) | 5 | 9 | 12 | 6 | 3 | 7 | 8 | 6 | 56 (30.9) | 231 (35.4) | 786 (36.2) |
| | Hypodivergent | 135 | 112 | 247 (16.2) | 68 | 32 | 100 (21.2) | 7 | 6 | 7 | 5 | 10 | 6 | 4 | 7 | 52 (28.7) | 151 (23.2) | 399 (18.4) |
| | Sum | 780 | 742 | 1,522 | 349 | 122 | 471 | 22 | 21 | 26 | 20 | 30 | 23 | 19 | 20 | 181 | 652 | 2,174 |
| VDDs | Normal overbite | 440 | 493 | 933 (61.3) | 196 | 53 | 249 (52.9) | 11 | 11 | 10 | 8 | 9 | 10 | 10 | 10 | 79 (43.6) | 328 (50.3) | 1,261 (58.0) |
| | Open bite | 209 | 194 | 403 (26.5) | 99 | 41 | 140 (29.7) | 4 | 7 | 9 | 5 | 9 | 8 | 4 | 5 | 51 (28.2) | 191 (29.3) | 594 (27.3) |
| | Deep bite | 131 | 55 | 186 (12.2) | 54 | 28 | 82 (17.4) | 7 | 3 | 7 | 7 | 12 | 5 | 5 | 5 | 51 (28.2) | 133 (20.4) | 319 (14.7) |
| | Sum | 780 | 742 | 1,522 | 349 | 122 | 471 | 22 | 21 | 26 | 20 | 30 | 23 | 19 | 20 | 181 | 652 | 2,174 |
Values are presented as number only or number (%).
APSDs, anteroposterior skeletal discrepancies; VSDs, vertical skeletal discrepancies; VDDs, vertical dental discrepancies; SNUDH, Seoul National University Dental Hospital; KADH, Kooalldam Dental Hospital; AJUDH, Ajou University Dental Hospital; AMC, Asan Medical Center; EUMC, Ewha University Medical Center; CNUDH, Chonnam National University Dental Hospital; CSUDH, Chosun University Dental Hospital; KHUDH, Kyung Hee University Dental Hospital; KNUDH, Kyungpook National University Dental Hospital; WKUDH, Wonkwang University Dental Hospital.
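The percentages in parentheses are each class's share of its column total. A small sketch of that arithmetic follows, using the APSD training-set counts from the table; the helper name `class_shares` is an illustrative assumption.

```python
def class_shares(counts):
    """Return each class's share of the total, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# APSD counts in the training set (SNUDH + KADH), copied from the table.
training_apsd = {"Class I": 561, "Class II": 446, "Class III": 515}
shares = class_shares(training_apsd)
print(shares)

# The three splits add up to the full dataset of 2,174 cephalograms.
split_sizes = {"training": 1522, "internal test": 471, "external test": 181}
print(sum(split_sizes.values()))
```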
Figure 5. The results of t-distributed stochastic neighbor embedding in anteroposterior skeletal discrepancies (APSDs), vertical skeletal discrepancies (VSDs), and vertical dental discrepancies (VDDs) per dataset. The labels of ground truth (GT) and prediction (PD) were set to check their distribution. Dotted circles indicate areas with irregular mixing. Dotted lines indicate cutoff lines.
Performance of our model for the diagnosis of the APSDs, VSDs, and VDDs in the internal test set and external test set using the binary ROC analysis
| Diagnosis | Classification | Accuracy, internal, mean | Accuracy, internal, SD | Accuracy, external, mean | Accuracy, external, SD | AUC, internal, mean | AUC, internal, SD | AUC, external, mean | AUC, external, SD | Sensitivity, internal, mean | Sensitivity, internal, SD | Sensitivity, external, mean | Sensitivity, external, SD | Specificity, internal, mean | Specificity, internal, SD | Specificity, external, mean | Specificity, external, SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| APSDs | Class I | 0.8488 | 0.0103 | 0.8320 | 0.0230 | 0.9212 | 0.0038 | 0.9042 | 0.0195 | 0.7938 | 0.0328 | 0.7840 | 0.0297 | 0.8764 | 0.0186 | 0.8504 | 0.0273 |
| | Class II | 0.8972 | 0.0057 | 0.8796 | 0.0153 | 0.9533 | 0.0026 | 0.9601 | 0.0067 | 0.8192 | 0.0334 | 0.7226 | 0.0515 | 0.9359 | 0.0161 | 0.9613 | 0.0046 |
| | Class III | 0.9372 | 0.0063 | 0.9525 | 0.0108 | 0.9807 | 0.0025 | 0.9930 | 0.0023 | 0.9111 | 0.0225 | 0.9652 | 0.0079 | 0.9497 | 0.0086 | 0.9446 | 0.0160 |
| | Mean | 0.8944 | 0.0368 | 0.8880 | 0.0518 | 0.9517 | 0.0245 | 0.9524 | 0.0382 | 0.8414 | 0.0571 | 0.8239 | 0.1076 | 0.9206 | 0.0345 | 0.9188 | 0.0516 |
| VSDs | Normodivergent | 0.8365 | 0.0082 | 0.8309 | 0.0267 | 0.9186 | 0.0046 | 0.9157 | 0.0151 | 0.8235 | 0.0279 | 0.7699 | 0.0416 | 0.8458 | 0.0122 | 0.8722 | 0.0178 |
| | Hyperdivergent | 0.9019 | 0.0035 | 0.9061 | 0.0203 | 0.9730 | 0.0047 | 0.9730 | 0.0047 | 0.8149 | 0.0273 | 0.9143 | 0.0293 | 0.9534 | 0.0190 | 0.9024 | 0.0360 |
| | Hypodivergent | 0.9346 | 0.0098 | 0.9094 | 0.0164 | 0.9824 | 0.0015 | 0.9684 | 0.0026 | 0.9000 | 0.0394 | 0.8000 | 0.0661 | 0.9445 | 0.0127 | 0.9535 | 0.0110 |
| | Mean | 0.8910 | 0.0413 | 0.8821 | 0.0410 | 0.9580 | 0.0283 | 0.9523 | 0.0273 | 0.8461 | 0.0478 | 0.8280 | 0.0757 | 0.9146 | 0.0505 | 0.9094 | 0.0398 |
| VDDs | Normal overbite | 0.7376 | 0.0291 | 0.7591 | 0.0230 | 0.8177 | 0.0166 | 0.8359 | 0.0152 | 0.6530 | 0.0956 | 0.6582 | 0.0664 | 0.8288 | 0.0441 | 0.8373 | 0.0557 |
| | Open bite | 0.8730 | 0.0130 | 0.8917 | 0.0139 | 0.9475 | 0.0053 | 0.9626 | 0.0074 | 0.8371 | 0.0366 | 0.8275 | 0.0611 | 0.8882 | 0.0304 | 0.9262 | 0.0228 |
| | Deep bite | 0.8637 | 0.0270 | 0.8586 | 0.0127 | 0.9286 | 0.0099 | 0.9238 | 0.0055 | 0.8000 | 0.1100 | 0.8196 | 0.0836 | 0.8781 | 0.0530 | 0.8723 | 0.0457 |
| | Mean | 0.8248 | 0.0654 | 0.8365 | 0.0584 | 0.8979 | 0.0582 | 0.9074 | 0.0538 | 0.7634 | 0.1111 | 0.7684 | 0.1006 | 0.8651 | 0.0468 | 0.8786 | 0.0535 |
APSDs, anteroposterior skeletal discrepancies; VSDs, vertical skeletal discrepancies; VDDs, vertical dental discrepancies; ROC, receiver operating characteristic; AUC, area under the curve; SD, standard deviation.
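Binary ROC analysis of a three-class diagnosis is commonly done one-vs-rest: one class is treated as positive and the other two are pooled as negative. The sketch below illustrates how accuracy, sensitivity, specificity, and AUC can be obtained under that reading. The labels and scores are made-up toy values, not study data, and the function names are illustrative assumptions rather than the authors' implementation.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, and specificity with `positive` against all other classes."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    # Assumes both the positive class and the rest occur at least once in y_true.
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc_from_scores(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels and scores).
y_true = ["I", "I", "II", "II", "III", "III"]
y_pred = ["I", "II", "II", "II", "III", "I"]
metrics_class_i = one_vs_rest_metrics(y_true, y_pred, positive="I")
toy_auc = auc_from_scores([0.9, 0.8, 0.3, 0.2], [True, False, True, False])
print(metrics_class_i, toy_auc)
```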
Performance of our model for the diagnosis of the APSDs, VSDs, and VDDs in the internal test set and external test set using the multiple ROC analysis
| Diagnosis | Class pair | Accuracy, internal, mean | Accuracy, internal, SD | Accuracy, external, mean | Accuracy, external, SD | Pairwise AUC, internal, mean | Pairwise AUC, internal, SD | Pairwise AUC, external, mean | Pairwise AUC, external, SD | Pairwise sensitivity, internal, mean | Pairwise sensitivity, internal, SD | Pairwise sensitivity, external, mean | Pairwise sensitivity, external, SD | Pairwise specificity, internal, mean | Pairwise specificity, internal, SD | Pairwise specificity, external, mean | Pairwise specificity, external, SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| APSDs | Class I → Class II | 0.8503 | 0.0086 | 0.8054 | 0.0222 | 0.8943 | 0.0106 | 0.8222 | 0.3830 | 0.8802 | 0.0283 | 0.9080 | 0.0098 | 0.8192 | 0.0299 | 0.7226 | 0.0461 |
| | Class I ← Class II | | | | | 0.9175 | 0.0039 | 0.9061 | 0.0136 | 0.8192 | 0.0299 | 0.7226 | 0.0461 | 0.8802 | 0.0283 | 0.9080 | 0.0098 |
| | Class I → Class III | 0.9143 | 0.0092 | 0.9277 | 0.0147 | 0.9486 | 0.0057 | 0.9780 | 0.0039 | 0.9173 | 0.0149 | 0.8760 | 0.0320 | 0.9111 | 0.0201 | 0.9652 | 0.0071 |
| | Class I ← Class III | | | | | 0.9698 | 0.0035 | 0.9856 | 0.0032 | 0.9111 | 0.0201 | 0.9652 | 0.0071 | 0.9173 | 0.0149 | 0.8760 | 0.0320 |
| | Class II → Class III | 0.9754 | 0.0033 | 0.9725 | 0.0142 | 0.9913 | 0.0014 | 0.9992 | 0.0009 | 0.9654 | 0.0077 | 0.9419 | 0.0299 | 0.9856 | 0.0026 | 1.0000 | 0.0000 |
| | Class II ← Class III | | | | | 0.9920 | 0.0013 | 0.9989 | 0.0013 | 0.9856 | 0.0026 | 1.0000 | 0.0000 | 0.9654 | 0.0077 | 0.9419 | 0.0299 |
| VSDs | Hyper → Hypo | 0.9905 | 0.0037 | 0.9778 | 0.0126 | 0.9998 | 0.0002 | 0.9930 | 0.0019 | 0.9851 | 0.0058 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 0.9538 | 0.0261 |
| | Hyper ← Hypo | | | | | 0.9998 | 0.0001 | 0.9977 | 0.0003 | 1.0000 | 0.0000 | 0.9538 | 0.0261 | 0.9851 | 0.0058 | 1.0000 | 0.0000 |
| | Hyper → Normo | 0.8755 | 0.0040 | 0.8791 | 0.0223 | 0.9593 | 0.0063 | 0.9587 | 0.0068 | 0.8149 | 0.0244 | 0.9143 | 0.0262 | 0.9296 | 0.0257 | 0.8521 | 0.0485 |
| | Hyper ← Normo | | | | | 0.9034 | 0.0088 | 0.9329 | 0.0119 | 0.9296 | 0.0257 | 0.8521 | 0.0485 | 0.8149 | 0.0244 | 0.9143 | 0.0262 |
| | Hypo → Normo | 0.8959 | 0.0139 | 0.8688 | 0.0212 | 0.9669 | 0.0024 | 0.9459 | 0.0042 | 0.9000 | 0.0352 | 0.8000 | 0.0591 | 0.8939 | 0.0231 | 0.9178 | 0.0173 |
| | Hypo ← Normo | | | | | 0.9451 | 0.0153 | 0.8972 | 0.0316 | 0.8939 | 0.0231 | 0.9178 | 0.0173 | 0.9000 | 0.0352 | 0.8000 | 0.0591 |
| VDDs | Open → Deep | 0.9766 | 0.0112 | 0.9706 | 0.0186 | 0.9982 | 0.0012 | 0.9924 | 0.0044 | 0.9814 | 0.0116 | 0.9922 | 0.0096 | 0.9683 | 0.0412 | 0.9490 | 0.0319 |
| | Open ← Deep | | | | | 0.9951 | 0.0066 | 0.9956 | 0.0042 | 0.9683 | 0.0412 | 0.9490 | 0.0319 | 0.9814 | 0.0116 | 0.9922 | 0.0096 |
| | Open → Normal | 0.8463 | 0.0141 | 0.8538 | 0.0201 | 0.9308 | 0.0063 | 0.9434 | 0.0084 | 0.8414 | 0.0318 | 0.8314 | 0.0520 | 0.8490 | 0.0363 | 0.8684 | 0.0284 |
| | Open ← Normal | | | | | 0.8190 | 0.0452 | 0.8373 | 0.0341 | 0.8490 | 0.0363 | 0.8684 | 0.0284 | 0.8414 | 0.0318 | 0.8314 | 0.0520 |
| | Deep → Normal | 0.8066 | 0.0338 | 0.8062 | 0.0132 | 0.8911 | 0.0130 | 0.8775 | 0.0089 | 0.8000 | 0.0984 | 0.8275 | 0.0788 | 0.8088 | 0.0741 | 0.7924 | 0.0682 |
| | Deep ← Normal | | | | | 0.8156 | 0.0388 | 0.8345 | 0.0277 | 0.8088 | 0.0741 | 0.7924 | 0.0682 | 0.8000 | 0.0984 | 0.8275 | 0.0788 |
ROC curve analysis with multiple classification tasks was performed.
APSDs, anteroposterior skeletal discrepancies; VSDs, vertical skeletal discrepancies; VDDs, vertical dental discrepancies; ROC, receiver operating characteristic; AUC, area under the curve; SD, standard deviation; Hyper, hyperdivergent; Hypo, hypodivergent; Normo, normodivergent; Open, open bite; Deep, deep bite; Normal, normal overbite.
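In the pairwise rows, the sensitivity of one direction reappears as the specificity of the reverse direction. The sketch below shows one convention that produces this mirror pattern: keep only samples whose true and predicted labels both belong to the pair, then treat the first class as positive. This convention is an assumption made for illustration; the exact pairwise procedure used in the study is not described here, and the labels are toy values.

```python
def pairwise_metrics(y_true, y_pred, positive, negative):
    """Sensitivity, specificity, and accuracy restricted to one pair of classes."""
    pair = {positive, negative}
    # Assumption: samples involving a third class are dropped from the comparison.
    kept = [(t, p) for t, p in zip(y_true, y_pred) if t in pair and p in pair]
    tp = sum(1 for t, p in kept if t == positive and p == positive)
    fn = sum(1 for t, p in kept if t == positive and p == negative)
    tn = sum(1 for t, p in kept if t == negative and p == negative)
    fp = sum(1 for t, p in kept if t == negative and p == positive)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(kept),
    }


# Toy labels (hypothetical, not study data).
y_true = ["I", "I", "I", "II", "II", "III"]
y_pred = ["I", "I", "II", "II", "I", "III"]
forward = pairwise_metrics(y_true, y_pred, positive="I", negative="II")
reverse = pairwise_metrics(y_true, y_pred, positive="II", negative="I")
print(forward, reverse)
```

Swapping the direction exchanges sensitivity and specificity while leaving accuracy unchanged, which is consistent with the accuracy being reported once per pair.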
Comparison of the binary ROC analysis results between multi-models in a previous study and a single model in this study
| Models | APSDs sensitivity, Yu et al.'s study | APSDs sensitivity, this study | APSDs specificity, Yu et al.'s study | APSDs specificity, this study | APSDs accuracy, Yu et al.'s study | APSDs accuracy, this study | APSDs AUC, Yu et al.'s study | APSDs AUC, this study | VSDs sensitivity, Yu et al.'s study | VSDs sensitivity, this study | VSDs specificity, Yu et al.'s study | VSDs specificity, this study | VSDs accuracy, Yu et al.'s study | VSDs accuracy, this study | VSDs AUC, Yu et al.'s study | VSDs AUC, this study |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model I | 0.8575 | 0.8414 | 0.9288 | 0.9206 | 0.9050 | 0.8944 | 0.938 | 0.9517 | 0.8427 | 0.8461 | 0.9213 | 0.9146 | 0.8951 | 0.8910 | 0.937 | 0.9580 |
| Model II | 0.9079 | NA | 0.9539 | NA | 0.9386 | NA | 0.970 | NA | 0.9222 | NA | 0.9611 | NA | 0.9481 | NA | 0.985 | NA |
| Model III | 0.9355 | NA | 0.9677 | NA | 0.9570 | NA | 0.978 | NA | 0.9459 | NA | 0.9729 | NA | 0.9640 | NA | 0.984 | NA |
ROC, receiver operating characteristic; APSDs, anteroposterior skeletal discrepancies; VSDs, vertical skeletal discrepancies; AUC, area under the curve; SD, standard deviation; NA, not applicable.
Figure 2. Diagrams of the model architecture. A, During training, an ArcFace head was added to the last convolutional layer of the backbone in parallel with the softmax layer. B, After training, the ArcFace head was removed and inference was implemented using only the softmax layer.
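To make the role of the ArcFace head concrete, the sketch below computes additive angular margin logits for a single embedding: the embedding and class weights are L2-normalized, the margin is added to the angle of the target class only, and the cosines are rescaled. The `scale` and `margin` defaults, the function names, and the toy vectors are illustrative assumptions; the study's hyperparameters are not given here, and production implementations also handle the case where the angle plus the margin exceeds π, which this sketch does not.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding, class_weights, target, scale=30.0, margin=0.5):
    """Training-time logits: scale * cos(theta + margin) for the target class, scale * cos(theta) otherwise."""
    e = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding error
        if j == target:
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits


def softmax(logits):
    """Plain softmax, the only head kept at inference in the described setup."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


# Toy 2-D embedding and three class weight vectors (hypothetical values).
toy_logits = arcface_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
                            target=0, scale=1.0, margin=0.5)
toy_probs = softmax(toy_logits)
print(toy_logits, toy_probs)
```

The margin lowers the target-class logit during training, so the backbone has to place embeddings of the same class closer together to compensate. Dropping the head afterwards leaves the learned features in place.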
Figure 3. Metrology distribution of the anteroposterior skeletal discrepancies (APSDs: Class I, Class II, and Class III), vertical skeletal discrepancies (VSDs: normodivergent pattern, hyperdivergent pattern, and hypodivergent pattern), and vertical dental discrepancies (VDDs: normal overbite, open bite, and deep bite) per dataset. Red lines in APSDs and VSDs indicate one standard deviation of the normal classification. Red lines in VDDs indicate the boundary values, which were 0 mm and 3 mm.
ANB, angle among A point, nasion, and B point; FMA, Frankfort mandibular plane angle; FHR, Jarabak’s posterior/anterior facial height ratio; norm, normalized; Man, mandible 1 crown; Max, maxilla 1 crown; dist, distance.
Figure 6. Gradient-weighted class activation mapping plots for anteroposterior skeletal discrepancies (APSDs), vertical skeletal discrepancies (VSDs), and vertical dental discrepancies (VDDs).
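For readers unfamiliar with Grad-CAM, the core computation can be sketched in a few lines: each feature map is weighted by the spatial average of the class-score gradient with respect to that map, the weighted maps are summed, and negative values are clipped to zero. The sketch below uses nested Python lists and made-up numbers. It illustrates the general technique only, not the plotting pipeline used in the study, and the function name is an assumption.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from K feature maps and their gradients (each an H x W nested list)."""
    height, width = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient over all spatial positions.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions that push the class score up.
    return [[max(0.0, v) for v in row] for row in heatmap]


# Two toy 2 x 2 feature maps with constant gradients (hypothetical values).
toy_activations = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]]
toy_gradients = [[[0.5, 0.5], [0.5, 0.5]], [[-4.0, -4.0], [-4.0, -4.0]]]
toy_heatmap = grad_cam(toy_activations, toy_gradients)
print(toy_heatmap)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the cephalogram, which is what the focus areas in the plots correspond to.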
Summary of the study design, methods and results in the orthodontic diagnosis of previous CNN studies and this study
| Author (year) | Samples | Model and its application | Data set | Results |
|---|---|---|---|---|
| Arık et al. | 400 publicly available cephalograms; 19 landmarks; 8 cephalometric parameters; 2 human examiners | Deep learning with CNN and shape-based model; Landmark detection; Cephalometric analysis | Training set: 150; Test set: 250 | High anatomical landmark detection accuracy (∼1% to 2% higher success detection rate for a 2-mm range compared with the top benchmarks in the literature); High anatomical type classification accuracy (~76% average classification accuracy for test set) |
| Park et al. | 1,028 lateral cephalograms; 80 landmarks; 1 human examiner | Deep learning with YOLOv3 and SSD; Landmark detection | Training set: 1,028; Test set: 283 | The YOLOv3 algorithm outperformed SSD in accuracy for 38 of 80 landmarks; The other 42 of 80 landmarks did not show a statistically significant difference between YOLOv3 and SSD; Error plots of YOLOv3 showed not only a smaller error range but also a more isotropic tendency; The mean computational time spent per image was 0.05 seconds and 2.89 seconds for YOLOv3 and SSD, respectively; YOLOv3 showed approximately 5% higher accuracy compared with the top benchmarks in the literature |
| Nishimoto | 219 lateral cephalograms from internet; 10 skeletal landmarks; 12 cephalometric parameters; Human examiners – not mentioned | Personal desktop computer CNN; Landmark detection; Cephalometric analysis | Training set: 153 (expanded 51 folds); Test set: 66 | Average and median prediction errors were 17.02 and 16.22 pixels; Despite the variety of image quality, using cephalogram images on the internet is a feasible approach for landmark prediction |
| Hwang et al. | 1,028 lateral cephalograms; 80 landmarks; 2 human examiners | Deep learning with YOLOv3; Landmark detection | Training set: 1,028; Test set: 283 | Upon repeated trials, AI always detected identical positions on each landmark; Human intra-examiner variability of repeated manual detections demonstrated a detection error of 0.97 ± 1.03 mm; The mean detection error between AI and human: 1.46 ± 2.97 mm; The mean difference between human examiners: 1.50 ± 1.48 mm; Comparisons in the detection errors between AI and human examiners: less than 0.9 mm, which did not seem to be clinically significant |
| Kunz et al. | 1,792 cephalograms; 18 landmarks; 12 orthodontic parameters; 12 human examiners | CNN deep learning algorithm; Landmark detection; Cephalometric analysis; Humans' gold standard: median values of the 12 examiners | Training set: 1,731; Validation set: 61; Test set: 50 | No clinically significant differences between humans' gold standard and the AI's predictions |
| Yu et al. | 5,890 lateral cephalograms and demographic data from one institute; 4 cephalometric parameters; 2 human examiners | One-step diagnostic system for skeletal classification; Multimodal CNN model | Model I: Training set: n = 1,644; Validation set: n = 351; Test set: n = 351; Training set: n = 1,912; Validation set: n = 375; Test set: n = 375 | Vertical and sagittal skeletal diagnosis: > 90% sensitivity, specificity, and accuracy; Vertical classification: highest accuracy at 96.40 (95% CI, 93.06 to 98.39; model III); Binary ROC analysis: excellent performance (mean area under the curve > 95%); Heat maps of cephalograms: visually representing the region of the cephalogram |
| Kim et al. | 2,075 lateral cephalograms from two institutes; 400 open dataset; 23 landmarks; 8 cephalometric parameters; 2 human examiners | Stacked hourglass deep learning; Two-stage automated algorithm; Web-based application; Landmark detection; Cephalometric analysis | Evaluation group 1: Training set: n = 1,675; Validation set: n = 200; Test set: n = 200; Training set: n = 1,675; Validation set: n = 175; Test set: n = 225; ISBI 2015 test set: n = 400 | Landmark detection error: 1.37 ± 1.79 mm; Successful classification rate: 88.43% |
| This study | 2,174 lateral cephalograms from ten institutes; 4 cephalometric parameters; 1 human examiner | One-step diagnostic system for skeletal and dental discrepancy; CNN including DenseNet-169, ArcFace, softmax; External validation | Training set: n = 1,522 from 2 institutes; Internal test set: n = 471 from 2 institutes; External test set: n = 181 from the other 8 institutes | Binary ROC analysis: Accuracy and area under the curve were high in both internal and external test sets (range: 0.8248–0.8944 and 0.8979–0.9580 in internal test set; 0.8821–0.8880 and 0.9074–0.9524 in external test set) in diagnosis of the skeletal and dental discrepancies; Multiple ROC analysis: Accuracy and area under the curve were high in both internal and external test sets (range: 0.8066–0.9905 and 0.8156–0.9998 in internal test set; 0.8054–0.9725 and 0.8222–0.9992 in external test set) in diagnosis of the skeletal and dental discrepancies; t-SNE analysis succeeded in creating the well-separated boundaries between the three classification groups in each diagnosis; Grad-CAM showed different patterns and sizes of the focus areas according to three classification groups in each diagnosis |
CNN, convolutional neural network; YOLO, “you only look once” real-time object detection; SSD, single shot detector; ISBI, International Symposium on Biomedical Imaging; AI, artificial intelligence; CI, confidence interval; ROC, receiver operating characteristic; t-SNE, t-distributed stochastic neighbor embedding; Grad-CAM, gradient-weighted class activation mapping.