Deep learning improves implant classification by dental professionals: a multi-center evaluation of accuracy and efficiency.

Jae-Hong Lee¹, Young-Taek Kim², Jong-Bin Lee³, Seong-Nyum Jeong⁴.

Abstract

PURPOSE: The aim of this study was to evaluate and compare the accuracy of dental professionals in classifying different types of dental implant systems (DISs) on panoramic radiographic images with and without the assistance of a deep learning (DL) algorithm.
METHODS: Using a self-reported questionnaire, the classification accuracy of dental professionals (5 board-certified periodontists, 8 periodontology residents, and 31 dentists not specialized in implantology, working at 3 dental hospitals) was determined with and without the assistance of an automated DL algorithm and compared. The accuracy, sensitivity, specificity, confusion matrix, receiver operating characteristic (ROC) curves, and areas under the ROC curves were calculated to evaluate the classification performance of the DL algorithm and the dental professionals.
RESULTS: DL assistance led to a statistically significant improvement in the average classification accuracy of DISs (mean accuracy: 78.88%) compared with no assistance (mean accuracy: 63.13%; P<0.05). In particular, when assisted by the DL algorithm, board-certified periodontists (mean accuracy: 88.56%) showed a higher average accuracy than the DL algorithm alone (mean accuracy: 80.56%), and dentists not specialized in implantology (mean accuracy: 77.83%) showed the largest improvement, reaching an average accuracy similar to that of the algorithm.
CONCLUSIONS: The automated DL algorithm classified DISs with an accuracy comparable to that of board-certified periodontists, and it may be useful for dental professionals classifying the various types of DISs encountered in clinical practice.

Keywords: Artificial intelligence; Deep learning; Dental implants; Dentist

Year: 2022 PMID: 35775697 PMCID: PMC9253278 DOI: 10.5051/jpis.2104080204

Source DB: PubMed Journal: J Periodontal Implant Sci ISSN: 2093-2278 Impact factor: 2.086

INTRODUCTION

In recent decades, dental implants have been considered one of the most predictable treatment modalities for the replacement of natural teeth, with an overall cumulative 10-year survival rate of 96.4% (95% confidence interval [CI], 95.2%–97.5%) [1]. In accordance with this trend, numerous implant manufacturers worldwide have developed various types of dental implant systems (DISs), which have been successfully used in clinical practice [2,3]. Despite their relatively high long-term survival and success rates, DISs are frequently associated with biological (e.g., peri-implant mucositis and peri-implantitis) and mechanical (e.g., fracture of prosthetic or fixture parts and screw loosening) complications [4–6]. A long-term follow-up study found that the cumulative complication rate after an observation period of up to 16 years was 48.03%, with prevalences of biological and mechanical complications of 16.94% and 31.09%, respectively [7]. Therefore, regular repair and maintenance care are essential to ensure long-term success, and it is critically important for clinicians to be able to identify the brand and model of a DIS [8,9]. Studies using 2- and 3-dimensional dental radiographs to train deep learning (DL) algorithms based on convolutional neural networks are under way; these models have shown excellent performance in the detection, classification, and segmentation of irregular and complicated medical radiographic images [10–12]. In particular, most current studies on DL algorithms for the identification and classification of various types of DISs have achieved favorable and reliable outcomes, with overall accuracies above 80% [13–17].
In recent years, several studies have reported that medical professionals, especially experienced and board-certified radiologists, showed improved diagnostic accuracy and greater efficiency, in terms of reduced reading times, when assisted by a DL algorithm [18,19]. However, to the best of our knowledge, no study has examined whether DL assistance is clinically efficacious in the classification of DISs by dental professionals, including board-certified periodontists, periodontology residents, and dentists not specialized in implantology. Therefore, the current study compared the accuracy of dental professionals in classifying DISs on panoramic radiographic images with and without the assistance of a DL algorithm.

MATERIALS AND METHODS

Ethics

This multi-center study was approved by the Institutional Review Board of Daejeon Dental Hospital, Wonkwang University (WKUDH; approval No. W2104/003-001), and the requirement for informed consent was waived. All radiographic images were anonymized, and no clinical information was provided. The corresponding author (JHL), who did not have any conflicts of interest, had full access to and managed all the data used in the current study. The checklist for artificial intelligence in dental research was followed [20].

Dataset

To determine whether DL assistance can improve the classification of various types of DISs by dental professionals, we used the training (80%; n=5,716) and validation (20%; n=1,429) datasets from our previous study. A total of 180 cropped panoramic images containing only DISs were newly collected and used as the test dataset for this study. The test dataset contained 6 different types of DISs with diameters of 3.3–5.0 mm and lengths of 7–13 mm: Astra OsseoSpeed® TX (n=30), Dentium Implantium® (n=30), Dentium Superline® (n=30), Osstem TSIII® (n=30), Straumann SLActive® BL (n=30), and Straumann SLActive® BLT (n=30). The images were collected from 3 dental hospitals: Daejeon Dental Hospital, Wonkwang University (WKUDH); Ilsan Hospital, National Health Insurance Service (NHIS-IH); and Mokdong Hospital, Ewha Womans University (EWU-MH).
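The study reports an 80%/20% training/validation split but does not state how images were assigned to each subset. A minimal sketch of one common option, a split stratified by implant type, is shown below; the helper `stratified_split` is hypothetical and is not the authors' procedure. The last lines only check that the reported sizes are consistent with an 80/20 split of 7,145 images.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.8, seed=0):
    """Split item indices into training/validation lists, class by class.

    Hypothetical helper: the study reports an 80%/20% split but not whether
    it was stratified by DIS type.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, val = [], []
    for label in sorted(by_class):           # fixed class order for reproducibility
        idxs = by_class[label]
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_frac)  # images of this class kept for training
        train.extend(idxs[:cut])
        val.extend(idxs[cut:])
    return train, val


# Consistency check of the reported sizes: 5,716 + 1,429 = 7,145, and 80% of 7,145 is 5,716.
total = 5716 + 1429
print(total, round(total * 0.8), total - round(total * 0.8))  # 7145 5716 1429
```

Stratifying keeps every DIS represented in the validation set in roughly the same proportion as in training, which matters when some implant types are rarer than others.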

Automated DL algorithm

An automated DL algorithm (Neuro-T version 2.0.1, Neurocle Inc., Seoul, Korea), which was developed to select the best model and optimize the hyper-parameters of neural networks, was used in this study. Detailed information about its DL architecture and hyper-parameter configuration has been reported in a previously published paper [21].
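Neuro-T's internal search procedure is described in [21] and is not reproduced here. As a hedged illustration of what "selecting the best model and optimizing hyper-parameters" can mean, the sketch below implements plain random search, one of the strategies the Discussion attributes to automated DL. The search space, the `toy_evaluate` scoring function, and all parameter names are hypothetical; in practice `evaluate` would train a network and return its validation accuracy.

```python
import math
import random

# Hypothetical search space (not Neuro-T's actual configuration).
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-2),  # sampled log-uniformly between these bounds
    "batch_size": [16, 32, 64],
    "architecture": ["resnet", "densenet", "efficientnet"],
}


def sample_config(rng):
    """Draw one candidate configuration at random from SEARCH_SPACE."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
        "architecture": rng.choice(SEARCH_SPACE["architecture"]),
    }


def random_search(evaluate, n_trials=20, seed=0):
    """Return the (config, score) pair with the highest validation score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)  # e.g. validation accuracy after training
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Toy stand-in for 'train a network and measure validation accuracy'."""
    penalty = abs(math.log10(config["learning_rate"]) + 3)  # best near 1e-3
    bonus = {"resnet": 0.00, "densenet": 0.02, "efficientnet": 0.03}[config["architecture"]]
    return 0.9 - 0.05 * penalty + bonus


best, score = random_search(toy_evaluate, n_trials=50)
print(best, round(score, 3))
```

Random search is simple and embarrassingly parallel; Bayesian optimization, the other strategy named in the Discussion, instead uses earlier scores to choose the next configuration.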

Comparison of the accuracy performance of dental professionals in classification with and without the assistance of the DL algorithm

Using a self-reported questionnaire, the accuracy of dental professionals with and without the assistance of the automated DL algorithm was assessed and compared. The survey was provided in paper and PDF formats and was completed by 5 board-certified periodontists, 8 periodontology residents, 17 conservative and pediatric dentistry residents, and 14 interns with relatively little experience with and exposure to implant dentistry, from 3 dental hospitals (WKUDH, NHIS-IH, and EWU-MH); the latter 2 groups (n=31) constituted the dentists not specialized in implantology. The questionnaire comprised 180 questions, each showing a cropped radiographic image and asking for the type of DIS; apart from each respondent's occupational category, it collected no personal information. In both surveys, participants classified the 180 cropped panoramic images into the 6 types of DISs; the surveys were held at least 1 month apart, and during the second survey, the classifications determined by the DL algorithm were provided.
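Scoring such a questionnaire reduces to comparing each answer with the true DIS label. The sketch below assumes, hypothetically, that each participant's accuracy is computed first and then averaged within each occupational group; the paper does not spell out the aggregation, and the function names and the 4-image example data are invented for illustration.

```python
from statistics import mean


def participant_accuracy(answers, truth):
    """Fraction of images a participant assigned to the correct DIS."""
    if len(answers) != len(truth):
        raise ValueError("answers and ground truth must have the same length")
    return sum(a == t for a, t in zip(answers, truth)) / len(truth)


def group_mean_accuracy(responses, truth):
    """Map {group: [answers of participant 1, ...]} to {group: mean accuracy}."""
    return {
        group: mean(participant_accuracy(ans, truth) for ans in participants)
        for group, participants in responses.items()
    }


# Hypothetical 4-image example (the real questionnaire had 180 images).
truth = ["BL", "BLT", "TSIII", "Superline"]
responses = {
    "periodontists": [["BL", "BLT", "TSIII", "Superline"],
                      ["BL", "BLT", "TSIII", "TSIII"]],
    "non_specialists": [["BL", "BL", "TSIII", "TSIII"]],
}
print(group_mean_accuracy(responses, truth))
# {'periodontists': 0.875, 'non_specialists': 0.5}
```

Running the same scoring on the first and second surveys gives the paired "without" and "with" accuracies for each participant.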

Statistical analysis

The survey data were collected and managed in a spreadsheet using Excel (version 360, Microsoft, Redmond, WA, USA). All accuracy indicators were aggregated for each group (all dental professionals, board-certified periodontists, periodontology residents, and dentists not specialized in implantology), and the mean values with and without the assistance of the DL algorithm were compared. Receiver operating characteristic (ROC) curve analyses were conducted to evaluate image-wise classification performance on the panoramic radiographs, and the areas under the ROC curves (AUCs) were compared. Sensitivity, specificity, accuracy, and a confusion matrix were also calculated using Neuro-T (version 2.0.1) and R statistical software (version 3.5, R Foundation for Statistical Computing, Vienna, Austria), with sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), and accuracy = (TP + TN)/(TP + TN + FP + FN), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. For all tests, a P value <0.05 was considered to indicate statistical significance.
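The AUC can be computed without drawing the curve: it equals the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below applies this rank-based (Mann–Whitney) formulation one-vs-rest, treating each DIS in turn as the positive class. It is an illustration in plain Python, not the Neuro-T or R routine the authors used; the function names and example scores are hypothetical.

```python
def binary_auc(scores, positives):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as 1/2 (Mann-Whitney formulation); this equals the area under
    the empirical ROC curve.
    """
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_rows, labels, classes):
    """prob_rows[i][k] is the score that image i belongs to classes[k]."""
    return {
        cls: binary_auc([row[k] for row in prob_rows], [lab == cls for lab in labels])
        for k, cls in enumerate(classes)
    }


# Hypothetical scores for 3 images and 3 classes.
rows = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
print(one_vs_rest_auc(rows, ["A", "B", "C"], ["A", "B", "C"]))
```

When a reader gives only a single hard label per image, the scores are 0 or 1 and this formula reduces to (sensitivity + specificity)/2, i.e., a single operating point on the ROC curve.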

RESULTS

Visualization of class activation

Visualization of the contributing features and class activation maps indicated the discriminative regions that the automated DL algorithm used to classify the 6 different types of DISs, which helped to identify and interpret the model output. In Figure 1, the most relevant salient areas are highlighted in yellow to red, marking the most discriminative features of each DIS.
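The text does not state which visualization method Neuro-T uses. For orientation, the sketch below implements the original class activation map (CAM) of Zhou et al. (2016), which applies to networks ending in global average pooling followed by a linear classifier: the map for a class is the weighted sum of the last convolutional feature maps, using that class's classifier weights, rescaled to [0, 1] for display. The feature maps and weights are toy values; in a real model the map would also be upsampled to the radiograph's resolution before being overlaid.

```python
def class_activation_map(feature_maps, class_weights):
    """CAM for a network ending in global average pooling + a linear classifier.

    feature_maps: K maps, each an H x W list of lists (last conv layer output).
    class_weights: K weights linking each map to the target class.
    Returns an H x W map rescaled to [0, 1] (1 = most discriminative).
    """
    if len(feature_maps) != len(class_weights):
        raise ValueError("one weight per feature map is required")
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # Weighted sum of the feature maps at every spatial position.
    cam = [[sum(wk * fmap[i][j] for wk, fmap in zip(class_weights, feature_maps))
            for j in range(w)] for i in range(h)]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat map
    return [[(v - lo) / span for v in row] for row in cam]


# Two hypothetical 2 x 2 feature maps and their weights for one class.
maps = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 2.0]]]
print(class_activation_map(maps, [0.5, 1.0]))
# [[0.25, 0.0], [0.0, 1.0]]
```

The rescaled values are what a yellow-to-red colour map would then render over the implant image.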

Figure 1

Visualization of the class activation and feature maps of the 6 different types of dental implant systems.

Performance of the automated deep convolutional neural network algorithm compared to that of dental professionals

When comparing average accuracy, the automated DL algorithm (mean accuracy: 80.56%) outperformed the dental professionals overall (mean accuracy: 63.13%) and each subgroup: board-certified periodontists (mean accuracy: 77.67%), periodontology residents (mean accuracy: 67.94%), and dentists not specialized in implantology (mean accuracy: 57.81%). When assisted by the DL algorithm, the average classification accuracy of all dental professionals improved significantly (mean accuracy: 78.88%; P<0.05). In particular, board-certified periodontists with DL assistance (mean accuracy: 88.56%) showed a higher average accuracy than the DL algorithm alone, and dentists not specialized in implantology showed the largest improvement with DL assistance (mean accuracy: 77.83%), reaching an average accuracy similar to that of the DL algorithm (Figure 2).
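The improvements quoted above can be checked directly from the reported means. The short calculation below uses only the percentages given in the text, expresses each gain in percentage points, and compares each assisted group with the DL algorithm alone; periodontology residents are omitted because their assisted mean accuracy is not reported in this section. The group keys are illustrative names.

```python
# Mean accuracies (%) as reported in the text.
WITHOUT_DL = {"all": 63.13, "periodontists": 77.67, "non_specialists": 57.81}
WITH_DL = {"all": 78.88, "periodontists": 88.56, "non_specialists": 77.83}
DL_ALONE = 80.56


def gains():
    """Return {group: (gain with DL assistance, assisted minus DL alone)} in percentage points."""
    return {
        g: (round(WITH_DL[g] - WITHOUT_DL[g], 2), round(WITH_DL[g] - DL_ALONE, 2))
        for g in WITHOUT_DL
    }


for group, (gain, vs_dl) in gains().items():
    print(f"{group}: {gain:+.2f} pp with assistance; {vs_dl:+.2f} pp vs DL alone")
# all: +15.75 pp with assistance; -1.68 pp vs DL alone
# periodontists: +10.89 pp with assistance; +8.00 pp vs DL alone
# non_specialists: +20.02 pp with assistance; -2.73 pp vs DL alone
```

The largest gain (about 20 percentage points) belongs to the dentists not specialized in implantology, who end up within about 3 points of the algorithm.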

Figure 2

Comparison of average accuracy of dental professionals for the classification of 6 different types of DISs with and without the assistance of the DL algorithm. Statistically significant improvement in classification accuracy was seen with the assistance of the DL algorithm.

DL: deep learning.

a) P<0.05.

Confusion matrix

Figure 3 shows the normalized confusion matrices summarizing the multiclass classification of DISs by the automated DL algorithm and by dental professionals with and without the assistance of the algorithm. Higher diagonal values, shown as darker shades of blue, indicate more accurate classification. For the automated DL algorithm, classification accuracy was highest for Straumann SLActive® BLT (100%) and lowest for Dentium Superline® (56.7%). For dental professionals without the assistance of the DL algorithm, classification accuracy was highest for Straumann SLActive® BLT (87.2%) and lowest for Dentium Superline® (44.0%). For dental professionals with the assistance of the DL algorithm, classification accuracy was highest for Straumann SLActive® BLT (89.9%) and lowest for Osstem TSIII® (68.0%).
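Row normalization divides each row of a confusion matrix (one true DIS) by its total, so the diagonal holds per-class accuracy, i.e., per-class sensitivity. A minimal sketch follows; the 3-class matrix is hypothetical. The final lines use correct-classification counts back-calculated (an assumption, not counts reported by the authors) from the DL algorithm's per-class sensitivities in Table 1 multiplied by 30 images per class, and show that with a balanced test set the overall accuracy is the mean of the normalized diagonal, matching the reported 80.56%.

```python
def normalize_rows(cm):
    """Divide each row (true class) by its total so the diagonal holds per-class accuracy."""
    out = []
    for row in cm:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


# Hypothetical 3-class matrix: rows = true DIS, columns = predicted DIS.
cm = [[27, 2, 1],
      [3, 24, 3],
      [0, 6, 24]]
print([round(r[i], 3) for i, r in enumerate(normalize_rows(cm))])  # [0.9, 0.8, 0.8]

# Back-calculated correct counts for the DL algorithm (sensitivity x 30 images per DIS).
correct = {"Astra OsseoSpeed TX": 24, "Implantium": 25, "Superline": 17,
           "TSIII": 21, "SLActive BL": 28, "SLActive BLT": 30}
# Balanced classes: overall accuracy = mean of the normalized diagonal = 145/180.
print(f"{sum(correct.values()) / (30 * len(correct)):.2%}")  # 80.56%
```

With unbalanced classes the two quantities would differ, which is one reason to report the normalized matrix alongside overall accuracy.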

Figure 3

Multiclass classification confusion matrices with normalization. (A) Automated DL algorithm; (B, C) dental professionals with and without the assistance of the DL algorithm.

DL: deep learning.

Outcomes for the classification of 6 different types of DISs

The automated DL algorithm outperformed most of the participating dental professionals in terms of overall sensitivity and specificity, as shown in Table 1 and Figure 4. The superiority of the automated DL algorithm was particularly distinct for Straumann SLActive® BLT (accuracy: 0.989, sensitivity: 1.000, and specificity: 0.987). Among dental professionals, board-certified periodontists showed the highest classification accuracy, reaching, for Straumann SLActive® BLT, 98.8% (sensitivity: 0.953 and specificity: 0.995) without and 99.3% (sensitivity: 0.967 and specificity: 0.999) with the assistance of the DL algorithm.

Table 1

Comparison of accuracy between dental professionals for the classification of 6 different types of DISs with and without the assistance of the automated DL algorithm

DIS	Group	Automated DL algorithm			Without the assistance of the DL algorithm			With the assistance of the DL algorithm			P value^a)
		Accuracy	Sensitivity	Specificity	Accuracy	Sensitivity	Specificity	Accuracy	Sensitivity	Specificity	
Dentsply Astra OsseoSpeed® TX
	Automated DL algorithm	0.922	0.800	0.947
	All dental professionals				0.864	0.588	0.919	0.934	0.794	0.963	<0.001
	Board-certified periodontists				0.912	0.747	0.945	0.957	0.873	0.973	<0.001
	Periodontology residents				0.879	0.548	0.945	0.925	0.710	0.968	<0.001
	Dentists not specialized in implantology				0.847	0.562	0.904	0.932	0.803	0.958	<0.001
Dentium Implantium®
	Automated DL algorithm	0.944	0.833	0.967
	All dental professionals				0.876	0.632	0.925	0.937	0.841	0.956
	Board-certified periodontists				0.941	0.867	0.956	0.976	0.967	0.977	<0.001
	Periodontology residents				0.887	0.776	0.909	0.930	0.910	0.934	<0.001
	Dentists not specialized in implantology				0.856	0.523	0.922	0.929	0.785	0.958	<0.001
Dentium Superline®
	Automated DL algorithm	0.894	0.567	0.960
	All dental professionals				0.832	0.440	0.910	0.898	0.796	0.937
	Board-certified periodontists				0.903	0.593	0.965	0.941	0.793	0.971	0.002
	Periodontology residents				0.848	0.529	0.911	0.893	0.690	0.933	<0.001
	Dentists not specialized in implantology				0.808	0.370	0.896	0.890	0.690	0.930	<0.001
Osstem TSIII®
	Automated DL algorithm	0.900	0.700	0.940
	All dental professionals				0.807	0.441	0.881	0.894	0.680	0.937	<0.001
	Board-certified periodontists				0.859	0.607	0.909	0.923	0.807	0.947	<0.001
	Periodontology residents				0.839	0.514	0.904	0.886	0.629	0.937	<0.001
	Dentists not specialized in implantology				0.783	0.373	0.865	0.889	0.667	0.934	<0.001
Straumann SLActive® BL
	Automated DL algorithm	0.961	0.933	0.967
	All dental professionals				0.928	0.816	0.951	0.957	0.873	0.974	<0.001
	Board-certified periodontists				0.950	0.893	0.961	0.981	0.907	0.996	<0.001
	Periodontology residents				0.945	0.857	0.963	0.976	0.924	0.987	<0.001
	Dentists not specialized in implantology				0.917	0.782	0.944	0.945	0.847	0.965	<0.001
Straumann SLActive® BLT
	Automated DL algorithm	0.989	1.000	0.987
	All dental professionals				0.955	0.872	0.972	0.977	0.899	0.992	<0.001
	Board-certified periodontists				0.988	0.953	0.995	0.993	0.967	0.999	0.223
	Periodontology residents				0.962	0.852	0.984	0.981	0.910	0.995	0.004
	Dentists not specialized in implantology				0.945	0.858	0.962	0.971	0.878	0.989	<0.001

DIS: dental implant system, DL: deep learning.

a)P values for comparisons of accuracy between classifications made by dental professionals with and without the assistance of the automated DL algorithm.
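Each row of Table 1 treats one DIS as the positive class and the other five as negatives. The sketch below applies the formulas from the Statistical analysis section to counts back-calculated (an assumption, not data reported by the authors) for the DL algorithm, using 30 positive and 150 negative test images per DIS; the recovered values reproduce the DL algorithm rows of Table 1 to three decimals. The `DL_COUNTS` dictionary and its short keys are illustrative.

```python
def one_vs_rest_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity (Table 1 order) for one DIS as the positive class."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)  # share of this DIS's 30 images labelled correctly
    specificity = tn / (tn + fp)  # share of the other 150 images not labelled as this DIS
    return accuracy, sensitivity, specificity


# Hypothetical back-calculated (TP, FP) per DIS for the DL algorithm:
# TP = sensitivity x 30, FP = 150 - specificity x 150.
DL_COUNTS = {"Astra OsseoSpeed TX": (24, 8), "Implantium": (25, 5), "Superline": (17, 6),
             "TSIII": (21, 9), "SLActive BL": (28, 5), "SLActive BLT": (30, 2)}

for name, (tp, fp) in DL_COUNTS.items():
    acc, sens, spec = one_vs_rest_metrics(tp, fn=30 - tp, fp=fp, tn=150 - fp)
    print(f"{name}: accuracy {acc:.3f}, sensitivity {sens:.3f}, specificity {spec:.3f}")
```

As a consistency check, the back-calculated false positives sum to 35, matching the 35 misclassified images (180 - 145 correct): every misclassification is a false negative for one DIS and a false positive for another.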

Figure 4

Area under the receiver operating characteristic curve for the performance of the automated DL algorithm in the classification of 6 different types of dental implant systems in comparison with those of dental professionals with and without the assistance of the DL algorithm.

DL: deep learning.

DISCUSSION

Panoramic radiography, along with intraoral periapical radiography, is the most widely used dental radiologic examination for identifying the brand and model of a DIS. To classify a DIS clearly, it is ideal to have a radiograph taken perpendicular to the long axis of the implant fixture, where features such as the thread type, grooves, taper, and collar shape are most likely to be visible. Nevertheless, in many cases the position of the implant fixture makes it difficult to acquire such a radiograph for various anatomical or prosthetic reasons, and identifying the DIS then requires expertise and time-consuming work. Therefore, several studies have developed and evaluated various pre-trained and/or fine-tuned DL algorithms for the identification and classification of DISs [13–17]. A pilot study using a fine-tuned YOLO v3 model with 1,282 panoramic images of 6 types of DISs found that the TP ratio and average precision of each DIS ranged from 0.50 to 0.82 and from 0.51 to 0.85, respectively [14]. Sukegawa et al. [13] reported that a fine-tuned VGGNet-16 model trained on a total of 8,859 images of 11 types of DISs achieved an average accuracy of 92.7%, and another study using a pre-trained GoogLeNet Inception model found a diagnostic accuracy of 93.8% (95% CI, 87.2%–99.4%) with 1,206 images of 6 types of DISs [15]. In another recent study that trained 5 DL models (SqueezeNet, GoogLeNet, ResNet-18, MobileNet-v2, and ResNet-50) on 801 images of 4 types of DISs, the average accuracy exceeded 90% (93%–98%) [16]. Unlike the studies above, we sought to improve DIS classification accuracy by using an automated DL algorithm rather than conventional DL algorithms designed by human experts.
An automated DL model builds the entire DL pipeline automatically, mainly using Bayesian optimization and random search methods to optimize models and hyper-parameters [22,23]. In particular, automated DL is considered a useful technique for developing optimized DL models with limited cost, time, and computing resources. In recent studies, automated DL models showed excellent accuracy in the detection and classification of DISs on dental radiographic images [21,24]. An automated DL model using periapical images showed more reliable accuracy in the detection (AUC=0.984; 95% CI, 0.900–1.000) and classification (AUC=0.869; 95% CI, 0.778–0.929) of fractured implants than pre-trained and fine-tuned VGGNet-19 and GoogLeNet models [24]. In addition, an automated DL model for panoramic and periapical images showed excellent accuracy (AUC=0.954; 95% CI, 0.933–0.970), with results comparable to or better than those of dental professionals, including board-certified periodontists, periodontology residents, and dentists not specialized in implantology (P<0.05) [21]. Several studies have shown that the assistance of a DL model improves the performance and efficiency of medical professionals [18,19]. One study reported that a DL model for bone age determination showed significant correlations with the reference bone age (r=0.992, P<0.001) and tended to enhance efficiency by reducing reading times by 18% to 40% without compromising accuracy [18]. Another study confirmed that the assistance of a DL algorithm improved the accuracy of thoracic radiologists (AUC=0.93–0.98; P=0.002) in the detection and localization of major abnormal findings (including nodules, consolidation, interstitial opacity, pleural effusion, and pneumothorax) on chest radiographs and reduced the reading time (from 10–65 seconds to 6–27 seconds; P<0.001) [19].
Consistent with the results of previous studies, our findings confirmed that DL assistance significantly improved the average classification accuracy of DISs (P<0.05). In particular, board-certified periodontists with the assistance of the DL algorithm (mean accuracy: 88.56%) were more accurate than the DL algorithm alone (mean accuracy: 80.56%), which suggests a synergy between DL assistance and the knowledge of experienced experts. In addition, dentists not specialized in implantology showed a larger increase in accuracy than board-certified periodontists and periodontology residents, suggesting that DL assistance can be of considerable clinical value in the decision-making of inexperienced dental professionals.
This study had several limitations. First, because of its retrospective design, spectrum bias is possible, although dental radiographic images were collected from 3 dental hospitals. Second, because this automated DL algorithm did not analyze periapical radiographic images, our study did not include them; previous studies using periapical and panoramic images suggest that datasets containing periapical images are associated with higher accuracy than datasets containing panoramic images [17,21]. Third, although images of 6 different types of DISs were collected from a multi-center database, the number of DIS types and the size of the dataset are still insufficient for direct clinical application. Therefore, it is crucial to collect high-quality, large-scale datasets through well-designed prospective studies.
In conclusion, the automated DL algorithm classified DISs on panoramic radiographs with an accuracy comparable to that of board-certified periodontists, and it may help dental professionals classify various types of DISs in clinical practice.

24 in total

1. Artificial intelligence in dental research: Checklist for authors, reviewers, readers.

Authors: Falk Schwendicke; Tarry Singh; Jae-Hong Lee; Robert Gaudin; Akhilanand Chaurasia; Thomas Wiegand; Sergio Uribe; Joachim Krois
Journal: J Dent Date: 2021-02-22 Impact factor: 4.379

2. Dentists' Most Common Practices when Selecting an Implant System.

Authors: Ahed Al-Wahadni; Mohamed S Barakat; Khladoon Abu Afifeh; Yusuf Khader
Journal: J Prosthodont Date: 2017-10-25 Impact factor: 2.752

3. Diagnosis of cystic lesions using panoramic and cone beam computed tomographic images based on deep learning neural network.

Authors: Jae-Hong Lee; Do-Hyung Kim; Seong-Nyum Jeong
Journal: Oral Dis Date: 2019-11-18 Impact factor: 3.511

4. Geometric comparison of five interchangeable implant prosthetic retaining screws.

Authors: M J Jaarda; M E Razzoog; D G Gratton
Journal: J Prosthet Dent Date: 1995-10 Impact factor: 3.426

5. Detection and diagnosis of dental caries using a deep learning-based convolutional neural network algorithm.

Authors: Jae-Hong Lee; Do-Hyung Kim; Seong-Nyum Jeong; Seong-Ho Choi
Journal: J Dent Date: 2018-07-26 Impact factor: 4.379

6. Incidence and pattern of implant fractures: A long-term follow-up multicenter study.

Authors: Jae-Hong Lee; Yeon-Tae Kim; Seong-Nyum Jeong; Na-Hong Kim; Dong-Woon Lee
Journal: Clin Implant Dent Relat Res Date: 2018-05-15 Impact factor: 3.932

7. A Performance Comparison between Automated Deep Learning and Dental Professionals in Classification of Dental Implant Systems from Dental Imaging: A Multi-Center Study.

Authors: Jae-Hong Lee; Young-Taek Kim; Jong-Bin Lee; Seong-Nyum Jeong
Journal: Diagnostics (Basel) Date: 2020-11-07

8. Efficacy of deep convolutional neural network algorithm for the identification and classification of dental implant systems, using panoramic and periapical radiographs: A pilot study.

Authors: Jae-Hong Lee; Seong-Nyum Jeong
Journal: Medicine (Baltimore) Date: 2020-06-26 Impact factor: 1.817

9. Transfer Learning via Deep Neural Networks for Implant Fixture System Classification Using Periapical Radiographs.

Authors: Jong-Eun Kim; Na-Eun Nam; June-Sung Shim; Yun-Hoa Jung; Bong-Hae Cho; Jae Joon Hwang
Journal: J Clin Med Date: 2020-04-14 Impact factor: 4.241