Joachim Krois, Thomas Ekert, Leonie Meinhold, Tatiana Golla, Basel Kharbot, Agnes Wittemeier, Christof Dörfer, Falk Schwendicke.
Abstract
We applied deep convolutional neural networks (CNNs) to detect periodontal bone loss (PBL) on panoramic dental radiographs. We synthesized a set of 2001 image segments from panoramic radiographs. Our reference test was the measured % of PBL. A deep feed-forward CNN was trained and validated via 10-times repeated group shuffling. Model architectures and hyperparameters were tuned using grid search. The final model was a seven-layer deep neural network, parameterized by a total of 4,299,651 weights. For comparison, six dentists assessed the image segments for PBL. Averaged over 10 validation folds, the mean (SD) classification accuracy of the CNN was 0.81 (0.02). Mean (SD) sensitivity and specificity were 0.81 (0.04) and 0.81 (0.05), respectively. The mean (SD) accuracy of the dentists was 0.76 (0.06), but the CNN was not statistically significantly superior to the examiners (p = 0.067, t-test). Mean (SD) sensitivity and specificity of the dentists were 0.92 (0.02) and 0.63 (0.14), respectively. A CNN trained on a limited number of radiographic image segments showed discrimination ability at least similar to that of dentists for assessing PBL on panoramic radiographs. Dentists' diagnostic efforts when using radiographs may be reduced by applying machine-learning-based technologies.
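The "10-times repeated group shuffling" named in the abstract can be illustrated with a minimal sketch. It assumes that the grouping unit is the radiograph each segment was cut from and that about 20% of groups are held out per repetition; neither detail is stated in the abstract, and the function name, identifiers and toy data are hypothetical. The point of the technique is that all segments from one group land on the same side of the split, so related images never appear in both training and validation data.

```python
import random


def group_shuffle_split(groups, n_splits=10, valid_fraction=0.2, seed=0):
    """Yield (train_idx, valid_idx) index lists for repeated group shuffling.

    groups[i] is the group label of sample i (assumed here: a radiograph ID).
    valid_fraction is an illustrative assumption, not a value from the source.
    """
    rng = random.Random(seed)
    unique_groups = sorted(set(groups))
    n_valid = max(1, round(valid_fraction * len(unique_groups)))
    for _ in range(n_splits):
        shuffled = unique_groups[:]
        rng.shuffle(shuffled)
        valid_groups = set(shuffled[:n_valid])
        # Assign every sample according to its group, never individually.
        train_idx = [i for i, g in enumerate(groups) if g not in valid_groups]
        valid_idx = [i for i, g in enumerate(groups) if g in valid_groups]
        yield train_idx, valid_idx


# Toy data: 12 image segments from 6 hypothetical radiographs.
groups = ["r1", "r1", "r2", "r2", "r2", "r3",
          "r4", "r4", "r5", "r5", "r6", "r6"]
splits = list(group_shuffle_split(groups))

for train_idx, valid_idx in splits[:3]:
    print("validation groups:", sorted({groups[i] for i in valid_idx}))
```

Unlike k-fold cross-validation, the repetitions are drawn independently, so a group may be held out in several repetitions or in none. scikit-learn offers a comparable splitter as `GroupShuffleSplit`; whether the authors used it is not stated here.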
Year: 2019 PMID: 31186466 PMCID: PMC6560098 DOI: 10.1038/s41598-019-44839-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Base-case and sensitivity analyses.
| T. | PBL | Ref. test | Prevalence valid. set | Images train. set | Images valid. set | Acc. | AUC | F1 | Sens. | Specif. | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All* | 20% | Average | 0.43 ± 0.05 | 1456.4 ± 44.6 | 353.2 ± 24.6 | 0.81 ± 0.02 | 0.89 ± 0.02 | 0.78 ± 0.03 | 0.81 ± 0.04 | 0.81 ± 0.05 | 0.76 ± 0.05 | 0.85 ± 0.02 |
| All | 25% | Average | 0.28 ± 0.04 | 1873.0 ± 49.7 | 353.2 ± 24.6 | 0.81 ± 0.04 | 0.88 ± 0.02 | 0.68 ± 0.05 | 0.75 ± 0.08 | 0.82 ± 0.07 | 0.64 ± 0.10 | 0.90 ± 0.02 |
| All | 30% | Average | 0.18 ± 0.05 | 2185.0 ± 59.6 | 353.2 ± 24.6 | 0.81 ± 0.06 | 0.89 ± 0.02 | 0.57 ± 0.09 | 0.72 ± 0.15 | 0.83 ± 0.10 | 0.50 ± 0.11 | 0.94 ± 0.03 |
| in | 20% | Average | 0.45 ± 0.06 | 1456.4 ± 44.6 | 102.0 ± 7.4 | 0.75 ± 0.03 | 0.84 ± 0.03 | 0.73 ± 0.05 | 0.77 ± 0.07 | 0.73 ± 0.08 | 0.70 ± 0.06 | 0.80 ± 0.03 |
| ca | 20% | Average | 0.26 ± 0.08 | 1456.4 ± 44.6 | 51.7 ± 4.6 | 0.83 ± 0.04 | 0.86 ± 0.04 | 0.63 ± 0.14 | 0.63 ± 0.17 | 0.89 ± 0.04 | 0.65 ± 0.13 | 0.88 ± 0.05 |
| pm | 20% | Average | 0.34 ± 0.07 | 1456.4 ± 44.6 | 93.3 ± 7.1 | 0.80 ± 0.04 | 0.88 ± 0.05 | 0.73 ± 0.07 | 0.79 ± 0.07 | 0.81 ± 0.05 | 0.68 ± 0.10 | 0.88 ± 0.04 |
| m | 20% | Average | 0.56 ± 0.05 | 1456.4 ± 44.6 | 106.2 ± 10.3 | 0.86 ± 0.03 | 0.94 ± 0.03 | 0.88 ± 0.03 | 0.88 ± 0.06 | 0.84 ± 0.08 | 0.88 ± 0.06 | 0.85 ± 0.06 |
The accuracy, the area-under-the-curve (AUC), the F1-score, sensitivity, specificity and positive/negative predictive values (PPV, NPV) as mean (SD) values are shown. In the base-case (marked by an asterisk *), all teeth were included during training, and the cut-off for periodontal bone loss (PBL) in the reference test was set at 20% of the average of three independent measurements. For sensitivity analyses we varied the cut-offs (increasing them to 25% and 30%), and also validated the performance on specific subsets of teeth. Note that the number of images in the training set varies due to oversampling of the minority (prevalent) class.
T. All: all teeth, in: incisors, ca: canines, pm: premolars, m: molars.
Ref. test: Average: Reference test was set at 20% (25%, 30%) of the average of three independent measurements.
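The columns of this table are tied together by standard identities: given sensitivity, specificity and prevalence, the accuracy, PPV, NPV and F1-score follow directly. The sketch below (the function name is illustrative, not from the source) rebuilds those values for the base-case row from its reported means. Because the table reports means of per-fold values, recomputing from the means is only an approximation of the published numbers.

```python
def derived_metrics(sensitivity, specificity, prevalence):
    """Derive accuracy, PPV, NPV and F1 from sensitivity, specificity, prevalence.

    All quantities are expressed as fractions of the whole validation set.
    """
    tp = sensitivity * prevalence              # true positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return {
        "accuracy": tp + tn,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Base-case row (All*, 20% cut-off): Sens. 0.81, Specif. 0.81, prevalence 0.43.
cnn_base = derived_metrics(0.81, 0.81, 0.43)
for name, value in cnn_base.items():
    print(f"{name}: {value:.3f}")
```

For the base case the recomputed values agree with the reported accuracy (0.81), PPV (0.76), NPV (0.85) and F1 (0.78) to within about 0.01. The same identities show why PPV falls and NPV rises in the 25% and 30% rows: prevalence drops while sensitivity and specificity change comparatively little.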
Performance of the six dentists (mean, SD) on the image segments of the validation datasets.
| T. | PBL | Ref. test | Prevalence valid. set | Images valid. set | Acc. | AUC | F1 | Sens. | Specif. | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| All* | 20% | Average | 0.45 ± 0.00 | 1635.3 ± 4.5 | 0.76 ± 0.06 | 0.77 ± 0.06 | 0.78 ± 0.04 | 0.92 ± 0.02 | 0.63 ± 0.14 | 0.68 ± 0.07 | 0.90 ± 0.02 |
| All | 25% | Average | 0.31 ± 0.00 | 1635.3 ± 4.5 | 0.66 ± 0.08 | 0.74 ± 0.05 | 0.64 ± 0.05 | 0.95 ± 0.01 | 0.53 ± 0.12 | 0.48 ± 0.06 | 0.96 ± 0.01 |
| All | 30% | Average | 0.20 ± 0.00 | 1635.3 ± 4.5 | 0.56 ± 0.08 | 0.71 ± 0.05 | 0.47 ± 0.04 | 0.96 ± 0.02 | 0.46 ± 0.10 | 0.31 ± 0.04 | 0.98 ± 0.01 |
| in | 20% | Average | 0.49 ± 0.00 | 472.5 ± 2.1 | 0.73 ± 0.08 | 0.73 ± 0.07 | 0.77 ± 0.05 | 0.89 ± 0.04 | 0.58 ± 0.17 | 0.68 ± 0.07 | 0.84 ± 0.04 |
| ca | 20% | Average | 0.29 ± 0.00 | 240.7 ± 0.5 | 0.73 ± 0.09 | 0.79 ± 0.06 | 0.67 ± 0.07 | 0.91 ± 0.04 | 0.67 ± 0.14 | 0.54 ± 0.09 | 0.95 ± 0.02 |
| pm | 20% | Average | 0.37 ± 0.00 | 437.2 ± 1.1 | 0.75 ± 0.07 | 0.78 ± 0.05 | 0.73 ± 0.05 | 0.91 ± 0.03 | 0.65 ± 0.13 | 0.62 ± 0.09 | 0.92 ± 0.02 |
| m | 20% | Average | 0.58 ± 0.00 | 485.0 ± 1.2 | 0.81 ± 0.05 | 0.79 ± 0.07 | 0.86 ± 0.03 | 0.95 ± 0.04 | 0.62 ± 0.17 | 0.79 ± 0.08 | 0.92 ± 0.04 |
The accuracy, the area-under-the-curve (AUC), the F1-score, sensitivity, specificity and positive/negative predictive values (PPV, NPV) are shown. In the base-case (marked by an asterisk *), all teeth were analyzed, and the cut-off for periodontal bone loss (PBL) in the reference test was set at 20% of the average of three independent measurements. For sensitivity analyses we varied the cut-offs (increasing them to 25% and 30%), and also included only specific subsets of teeth in the analyses.
T. All: all teeth, in: incisors, ca: canines, pm: premolars, m: molars.
Ref. test: Average: Reference test was set at 20% (25%, 30%) of the average of three independent measurements.
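A dentist gives a yes/no judgement rather than a continuous score, so each examiner contributes a single operating point (1 − specificity, sensitivity) in ROC space rather than a full curve. One common way to assign an AUC to such a rater is the trapezoidal area under the path (0, 0) → operating point → (1, 1), which equals (sensitivity + specificity) / 2. The sketch below applies this to the mean values in the table; it is offered as a consistency check only, since the source does not state how the dentists' AUC was computed, and the helper names are illustrative.

```python
def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve given (fpr, tpr) points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def single_point_auc(sensitivity, specificity):
    """AUC of a binary rater with one operating point in ROC space."""
    operating_point = (1 - specificity, sensitivity)
    return trapezoid_auc([(0.0, 0.0), operating_point, (1.0, 1.0)])


# (Sens., Specif., reported AUC) for the dentists, taken from the table rows.
dentist_rows = {
    "All, 20%": (0.92, 0.63, 0.77),
    "All, 25%": (0.95, 0.53, 0.74),
    "All, 30%": (0.96, 0.46, 0.71),
    "in, 20%": (0.89, 0.58, 0.73),
    "ca, 20%": (0.91, 0.67, 0.79),
    "pm, 20%": (0.91, 0.65, 0.78),
    "m, 20%": (0.95, 0.62, 0.79),
}
dentist_auc = {row: single_point_auc(sens, spec)
               for row, (sens, spec, _) in dentist_rows.items()}
for row, auc in dentist_auc.items():
    print(f"{row}: {auc:.3f} (reported {dentist_rows[row][2]:.2f})")
```

For every row the single-point value lies within 0.01 of the reported AUC, which is consistent with the dentists' AUC reflecting one operating point per examiner.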
Figure 1. Receiver operating characteristic (ROC) curves for the base-case model (reference test defined by the average of three independent measurements of the % PBL, cut-off 20%, all teeth included). The CNN was evaluated against the reference test with respect to sensitivity (the proportion of positives that are correctly identified as such) and specificity (the proportion of negatives that are correctly identified as such). The colored curves indicate the discrimination ability in each validation fold. The bold blue line represents the mean discrimination ability; the gray area corresponds to the 95% confidence interval (CI). The discrimination ability is further summarized by the AUC. Each examiner's discrimination ability is represented by a magenta marker (operating point). If the operating point for a particular examiner lies outside the 95% CI, the model's discrimination ability is significantly different (superior or inferior) from that examiner's. Inset: magnified area. Sensitivity analyses with variations in reference test construction, cut-off and tooth subsets can be found in Table 1 and the appendix.