
Benchmarking Deep Learning Models for Tooth Structure Segmentation.

L Schneider^1,2, L Arsiwala-Scheppach^1,2, J Krois^1,2, H Meyer-Lueckel^3, K K Bressem^4,5, S M Niehues^4, F Schwendicke^1,2.

Abstract

A wide range of deep learning (DL) architectures with varying depths are available, with developers usually choosing one or a few of them for their specific task in a nonsystematic way. Benchmarking (i.e., the systematic comparison of state-of-the-art architectures on a specific task) may provide guidance in the model development process and may allow developers to make better decisions. However, comprehensive benchmarking has not been performed in dentistry yet. We aimed to benchmark a range of architecture designs for 1 specific, exemplary case: tooth structure segmentation on dental bitewing radiographs. We built 72 models for tooth structure (enamel, dentin, pulp, fillings, crowns) segmentation by combining 6 different DL network architectures (U-Net, U-Net++, Feature Pyramid Networks, LinkNet, Pyramid Scene Parsing Network, Mask Attention Network) with 12 encoders from 3 different encoder families (ResNet, VGG, DenseNet) of varying depth (e.g., VGG13, VGG16, VGG19). On each model design, 3 initialization strategies (ImageNet, CheXpert, random initialization) were applied, resulting in 216 trained models overall, which were trained for up to 200 epochs with the Adam optimizer (learning rate = 0.0001) and a batch size of 32. Our data set consisted of 1,625 human-annotated dental bitewing radiographs. We used a 5-fold cross-validation scheme and quantified model performances primarily by the F1-score. Initialization with ImageNet or CheXpert weights significantly outperformed random initialization (P < 0.05). Deeper and more complex models did not necessarily perform better than less complex alternatives. VGG-based models were more robust across model configurations, while more complex models (e.g., from the ResNet family) achieved peak performances. In conclusion, initializing models with pretrained weights may be recommended when training models for dental radiographic analysis. Less complex model architectures may be competitive alternatives if computational resources and training time are restricting factors. Models developed and found superior on nondental data sets may not show this behavior for dental domain-specific tasks.

Keywords: artificial intelligence; computer vision; neural networks; segmentation; tooth structures; transfer learning

Year: 2022 PMID: 35686357 PMCID: PMC9516600 DOI: 10.1177/00220345221100169

Source DB: PubMed Journal: J Dent Res ISSN: 0022-0345 Impact factor: 8.924

Introduction

Deep learning (DL) has been widely employed for image analytics in dermatology (skin photographs) (Jafari et al. 2016), ophthalmology (retina imagery) (Son et al. 2020), or pathology (histological specimens) (Kather et al. 2019). Also in dentistry, DL classification models have been employed to predict the modality of radiographs (Cejudo et al. 2021), the presence of caries lesions (Lee et al. 2018), periodontal bone loss (Krois et al. 2019), and apical lesions (Ekert et al. 2019) on dental radiographs. DL segmentation models, which perform a classification task at the pixel level, were used for the segmentation of anatomical structures in panoramic images (Cha et al. 2021), apical lesions on cone beam computed tomography scans (Orhan et al. 2020), periodontal bone loss on panoramic radiographs (Kim et al. 2019), and caries lesions on bitewings (Cantu et al. 2020). Recent guidelines in the field call for rigorous and comprehensive planning, conducting, and reporting of DL studies in dentistry (Schwendicke et al. 2021). One key element in those guidelines is a hypothesis-driven selection of the DL model configuration, which includes, among others, its architecture, its complexity, and the initialization strategy for the model weights (e.g., via transfer learning). (1) Architecture: The basic unit of an artificial neural network is a neuron, which is a nonlinear mathematical model inspired by the biological neuron (McCulloch and Pitts 1943). These units are stacked to build layers that are connected via mathematical operations with other layers of neurons. The arrangement of these layers and operations defines the model architecture. Model architectures such as ResNet (He et al. 2016) or VGG (Simonyan and Zisserman 2015) are widely used in the field of machine learning. For image segmentation, specialized layers extend the basic model architectures, which in such a setting are referred to as backbone. 
This allows one to plug in different backbones and benchmark them for image segmentation tasks. (2) Complexity: Most model architectures are available in different degrees of complexity, which reflects the depth of the neural network (i.e., the number of layers included and the number of neurons and connections between them). Deeper models are more complex as they consist of more parameters (i.e., connections between neurons). (3) Initialization: The connections between neurons and layers of neurons, which are also referred to as model weights, are numerical values that correspond to the strength of the connection. During model training, these weights are adjusted to find a set of values that are most suitable to solve the underlying task. Starting with a predefined setting of these weights enhances the efficiency of the training process and improves model convergence. Using a predefined setting of weights that stem from a previously trained neural network provides a meaningful starting point for the training process. This technique is referred to as transfer learning (Tan et al. 2018). The sheer number of possible configurations of model architecture, including backbones, complexity, and initialization strategies, impedes systematic and comprehensive comparisons of existing study findings (Schwendicke et al. 2019). One strategy to overcome this issue is to perform benchmarking, which involves the systematic comparison of different model architectures and model configurations on an identical data set. Such benchmarking studies provide guidance for researchers in the model design process, which improves research efficiency by enabling the development of high-performing models in a shorter time at lower development costs. However, in the medical domain and, more so, dentistry, benchmarking initiatives are scarce, owing to limited data availability and high costs for establishing solid and accepted ground truth labels and annotations. 
To cope with these difficulties, the ITU/WHO Focus Group Artificial Intelligence for Health (FG-AI4H) is developing a standard evaluation process and benchmarking framework for artificial intelligence (AI) models in health. The present study will inform this initiative. In a recent benchmarking study, Bressem et al. (2020) benchmarked 16 different model architectures for classification tasks on 2 openly available chest radiograph data sets: CheXpert (Irvin et al. 2019) and the COVID-19 Image Data Collection. They showed that complex and deep models do not necessarily outperform simpler architectures. Similarly, Ke et al. (2021) addressed the assumption that model architectures that perform better on the ImageNet data set (Deng et al. 2009), a popular open-source benchmark data set containing millions of labeled images, also generally perform better on CheXpert. This assumption was not found to be valid based on the comparison of 16 convolutional architectures on 5 classification tasks. In the present study, we aim to expand the studies of Bressem et al. (2020) and Ke et al. (2021) to a dental segmentation task. We benchmarked 216 DL models defined by their architecture, complexity, and initialization strategy. We evaluated these model configurations for a specific dental task: tooth structure (enamel, dentin, pulpal cavity, fillings, and crowns) segmentation on dental bitewing radiographs. We deliberately decided to use this application since, first, there is evidence that segmentation models perform well on this task (Ronneberger et al. 2015a) and, second, there is less ambiguity about the establishment of the ground truth for this task, with tooth structures being easily discriminated even by nonsenior clinicians. We expect our results to inform dental researchers about suitable model configurations for their experiments and aim to contribute to evidence-guided DL model selection in dental research.

Materials and Methods

Benchmarking Tasks

This analysis is based on a segmentation task for tooth structures on dental bitewing radiographs. Several model development aspects were benchmarked. (1) Architecture: First, we assessed different DL model architectures, since to date, most neural networks have mainly been benchmarked on openly available data sets such as ImageNet. However, it is not yet determined whether the best-performing networks on ImageNet will also perform best for dental radiographic images. Hence, we benchmarked architectures such as U-Net (Ronneberger et al. 2015b), U-Net++ (Zhou et al. 2018), Feature Pyramid Networks (FPN) (Kirillov et al. 2019), LinkNet (Chaurasia and Culurciello 2017), Pyramid Scene Parsing Network (PSPNet) (Zhao et al. 2017), and Mask Attention Network (MAnet) (Fan et al. 2020), among others. These networks were selected because they can all be combined with the same established backbones of varying depths (e.g., ResNet50 [He et al. 2016], VGG13 [Simonyan and Zisserman 2015], DenseNet121 [Huang et al. 2017]). The depth of the encoder is conventionally represented by the digits behind the name of the architecture (e.g., ResNet18, ResNet34). All model implementations were taken from the same software package (Yakubovskiy 2020). (2) Complexity: Second, we investigated how model performance relates to model complexity. Supposedly, deeper DL models, which have more trainable parameters, outperform shallower alternatives if enough data and computational resources are available. However, deeper models are more likely to overfit training data, and model convergence may not be reached. Furthermore, limited computational resources imply restrictions regarding image resolution or batch size; both may negatively affect the model performance. (3) Initialization: Third, we analyzed different initialization strategies, such as random weight initialization or initialization based on pretrained weights from the ImageNet and CheXpert data sets. 
The latter strategies are referred to as transfer learning. Thereby, features learned on large, open data sets are directly transferred to a new task and hence do not have to be learned from scratch. This technique speeds up model convergence and improves model performance. Initialization with ImageNet is one of the most popular transfer learning strategies. Even for tasks on medical radiographs, transferring knowledge from models trained on ImageNet yields a boost in performance (Ke et al. 2021). However, the feature space learned on ImageNet differs fundamentally from medical features of radiographs. ImageNet consists of natural RGB color images that are classified into more than 20,000 classes, while radiographic images contain grayscale images and are usually classified in only a few categories. Hence, an initialization with pretrained models on radiographic images such as the CheXpert data set (Irvin et al. 2019) may potentially be more suitable for medical segmentation tasks of, for instance, dental radiographs.

Ethics Statement

This study was ethically approved by the ethics committee of the Charité (EA4/102/14 and EA4/080/18).

Study Design

In the present study, 72 models were built from a combination of varying architectures and encoder backbones and were each trained with 3 different initialization strategies on a tooth structure segmentation task. Each model was trained with 5-fold cross-validation with varying train, validation, and test sets for each fold. Hence, for each model run, the data were randomly split into training, validation, and test data with proportions of 60% (3 folds), 20% (1 fold), and 20% (1 fold), respectively. We additionally applied a sensitivity analysis and assessed model performances on underrepresented classes (in our case, fillings and crowns), as in real-life medical data sets, class imbalance is likely the rule and not the exception. Reporting of this study follows the Standards for Reporting Diagnostic Accuracy guideline (STARD) (Bossuyt et al. 2015) and the Checklist for Artificial Intelligence in Dental Research (Schwendicke et al. 2021).
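The 60/20/20 fold scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `five_fold_splits`, the fixed seed, and the choice to use the fold after the test fold for validation are assumptions made here for the example.

```python
import random

def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train, val, test) index lists, rotating the fold roles.

    Assumption: the validation fold is the one following the test fold;
    the source only states that the sets vary for each fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds folds of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val_k = (k + 1) % n_folds
        test = folds[k]
        val = folds[val_k]
        # The remaining three folds form the training set.
        train = [i for j, fold in enumerate(folds)
                 if j not in (k, val_k) for i in fold]
        yield train, val, test

# 1,625 radiographs, as in the data set described in the text.
for train, val, test in five_fold_splits(1625):
    print(len(train), len(val), len(test))
```

With 1,625 images each fold holds 325 samples, so every rotation trains on 975 images and holds out 325 each for validation and testing, and every image appears in exactly one test set.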

Performance Metrics

Model performances were primarily quantified by the F1-score, which captures the harmonic mean of recall (sensitivity) and precision (positive predictive value [PPV]). F1-scores are computed from the sum of true positives, false positives, and false negatives over all channels of segmentation masks and cross-validation folds. This method was described by Forman and Scholz (2010) and results in unbiased F-scores in cross-validation schemes. Secondary metrics were accuracy, sensitivity, precision, and intersection over union (IoU). Based on the distribution of the results, the median was chosen as a descriptive statistic.
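To make the pooling step concrete, the sketch below sums true positives, false positives, and false negatives over folds before computing F1 and contrasts that with averaging per-fold F1-scores. The per-fold counts are invented for illustration and the function names are hypothetical; only the formulas F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN) are standard.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def iou_from_counts(tp, fp, fn):
    """IoU = TP / (TP + FP + FN); returns 0.0 when undefined."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0

def pooled_counts(fold_counts):
    """Sum TP, FP, and FN over all folds (and, in practice, all channels)."""
    return tuple(sum(c[key] for c in fold_counts) for key in ("tp", "fp", "fn"))

# Hypothetical pixel counts for two folds (illustrative values only).
folds = [
    {"tp": 90, "fp": 10, "fn": 10},
    {"tp": 10, "fp": 10, "fn": 30},
]

tp, fp, fn = pooled_counts(folds)
pooled_f1 = f1_from_counts(tp, fp, fn)
mean_fold_f1 = sum(f1_from_counts(**c) for c in folds) / len(folds)
pooled_iou = iou_from_counts(tp, fp, fn)

print(f"pooled F1:     {pooled_f1:.3f}")
print(f"mean fold F1:  {mean_fold_f1:.3f}")
print(f"pooled IoU:    {pooled_iou:.3f}")
```

The two F1 values differ whenever folds contribute unequal numbers of positives. For a single set of counts, F1 and IoU are linked by F1 = 2 IoU / (1 + IoU), so they rank models identically.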

Data Set, Sample Size, and Reference Test

The available data set consisted of 1,625 dental bitewing radiographs with a maximum of 8 to 9 teeth per image and is described in detail in the Appendix. Tooth structures visible on bitewing radiographs (namely, enamel, dentin, the pulp cavity, and nonnatural “structures” like fillings and crowns) were annotated in a pixel-wise fashion (as masks) by 1 dental expert. These masks represent the ground truth for each data sample. In a second iteration, those annotations were reviewed by another dental expert for validity and correctness. Each annotator independently assessed each image using an in-house custom-built annotation tool described in Ekert et al. (2019). All examiners were calibrated and advised on how to perform the segmentation. Images with implants, bridges, or root canal fillings were very rare (<1%) and therefore excluded. Notably, enamel, dentin, and pulpal areas were present in every radiograph, while fillings and crowns were only available in 80% and 20% of images, respectively. Images and segmentation masks were resized to a resolution of 224 × 224 to provide a fixed input size of the images as required by the model architectures.

Models and Training

As represented in Figure 1, models were built by combining different model architectures (U-Net, U-Net++, FPN, LinkNet, PSPNet, MAnet) with backbones from 3 different families (ResNet, VGG, DenseNet) of different depths (ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, VGG13, VGG16, VGG19, DenseNet121, DenseNet161, DenseNet169, DenseNet201). This led to a total of 72 model designs, which were each initialized with 3 different strategies (random, ImageNet, CheXpert), resulting in 216 trained models in total. All models were trained under a 5-fold cross-validation scheme, where the combination of samples in training, validation, and test set was varied for each fold to achieve a reasonable estimate of the model performance independent from the data split. Details on training are described in the Appendix.
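The configuration grid can be enumerated directly, which also confirms the counts of 72 designs and 216 models. The sketch below only builds the list of name tuples; it does not construct any networks, and the variable names are chosen here for illustration.

```python
from itertools import product

ARCHITECTURES = ["U-Net", "U-Net++", "FPN", "LinkNet", "PSPNet", "MAnet"]
BACKBONES = [
    "ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152",
    "VGG13", "VGG16", "VGG19",
    "DenseNet121", "DenseNet161", "DenseNet169", "DenseNet201",
]
INITIALIZATIONS = ["random", "ImageNet", "CheXpert"]

# 6 architectures x 12 backbones = 72 model designs.
model_designs = list(product(ARCHITECTURES, BACKBONES))

# Each design with each of 3 initialization strategies = 216 models.
configurations = list(product(ARCHITECTURES, BACKBONES, INITIALIZATIONS))

print(len(model_designs), len(configurations))  # 72 216
print(configurations[0])  # ('U-Net', 'ResNet18', 'random')
```

Iterating over `configurations` in a training script keeps every model on the same data splits and hyperparameters, which is what makes the comparison a benchmark rather than a collection of separate experiments.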

Figure 1.

Illustration of the study design. Model setups were based on different architectures, encoder backbones, and initialization strategies (top) and 5-fold cross-validation with varying train, validation, and test sets for each fold (bottom). Exemplary bitewing radiograph (left) and tooth structure components overlaid on an input image (right).

Statistical Analysis

Model configurations with respect to initialization strategies and architectures were ranked according to their median F1-score and formally tested for differences between configurations with the nonparametric Wilcoxon rank-sum test. The nonparametric Spearman’s rank-order correlation was estimated to determine the relationship between complexity and model performance (F1-score). To account for multiple comparisons, we adjusted the P values using the Benjamini–Hochberg method (Benjamini and Hochberg 1995). P values below 0.05 were considered statistically significant. The number of pairwise comparisons C of conditions k was computed via equation (1):

$$C = \frac{k(k-1)}{2} \qquad (1)$$
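As an illustration of the multiple-comparison step, the sketch below counts the pairwise comparisons among k conditions, k(k - 1)/2, and applies a plain-Python Benjamini–Hochberg adjustment. The function names and the example P values are hypothetical; in practice an established statistics package would normally be used instead of a hand-written routine.

```python
from itertools import combinations

def n_pairwise(k):
    """Number of unordered pairs among k conditions: k(k - 1) / 2."""
    return k * (k - 1) // 2

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, scaling by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

architectures = ["U-Net", "U-Net++", "FPN", "LinkNet", "PSPNet", "MAnet"]
pairs = list(combinations(architectures, 2))
print(n_pairwise(len(architectures)), len(pairs))  # 15 15

# Hypothetical raw P values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(significant)
```

For the 6 architectures this gives 15 comparisons, and for the 3 backbone families or 3 initialization strategies it gives 3.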

Results

Figure 2 presents an overview of segmentation outputs generated by different model architectures in comparison to the ground truth. Figure 3 shows the F1-scores of different model configurations grouped by architecture, backbone family, and initialization strategy.

Figure 2.

Examples of segmented bitewing radiographs. (A) Naive input image. (B) Ground truth and (C–H) output of tooth structure segmentation by different model architectures. The red, dark green, light green, gray, and blue colors indicate enamel, pulp cavity and root canals, dentin, filling, and crown classes, respectively. All models in this example were built with a ResNet50 backbone and initialized with pretrained CheXpert weights. This figure is available in color online.

Figure 3.

F1-scores stratified by initialization strategy, architecture, and backbone family based on sample sizes n. Median, interquartile range, and 95% confidence interval are represented by the white dot, the black box, and the black line, respectively. Different superscript letters indicate statistically significant difference (e.g., between U-Net and LinkNet), while the same superscript letters represent no significant difference (e.g., between LinkNet and U-Net++) (see Appendix for more details).

(1) Architecture: Out of 15 pairwise comparisons of model architectures, 14 turned out to be statistically significantly different. U-Net++, U-Net, and LinkNet achieved a median (interquartile range [IQR]) F1-score of 0.86 (0.85, 0.87), (0.84, 0.86), and (0.85, 0.88), respectively, and outperformed MAnet, PSPNet, and FPN with statistical significance. Backbones from the VGG and DenseNet group reached a median (IQR) of 0.85 (0.83, 0.86) and (0.81, 0.86), respectively, while the ResNet group reached a median (IQR) F1-score of 0.84 (0.81, 0.86). Models with backbones from the VGG group outperformed models with backbones of the ResNet group with statistical significance. (2) Complexity: We found a statistically significant weak positive monotonic relationship between the network size and its performance with r = 0.32 (P < 0.001). (3) Initialization: Different initialization strategies computed over all architectures and backbones achieved F1-scores of 0.86 (0.83, 0.87) (ImageNet), 0.86 (0.83, 0.87) (CheXpert), and 0.83 (0.77, 0.84) (random initialization). Models initialized with ImageNet or CheXpert outperformed models initialized with random weights (P_ImageNet < 0.001, P_CheXpert < 0.001). No significant difference was observed between ImageNet and CheXpert (P = 0.85). (4) Class imbalances: In a sensitivity analysis, the model performance was evaluated on the minority classes of filling (80%) and crown (20%). In general, models’ performance was inversely related to class frequencies (Fig. 4).

Figure 4.

F1-scores of different models in the minority classes, filling (white) and crown (steel blue), respectively. We stratified the analyses by initialization strategy, architecture, and backbone family. Median, interquartile range, and 95% confidence interval are represented by the white dot, the black box, and the black line, respectively. Results are based on a sample size n. This figure is available in color online.

(4.1) Architecture: Models based on a VGG backbone outperformed models with a ResNet backbone on the minority classes of filling (P = 0.009) and crown (P = 0.013). Notably, there was no statistical difference between the 3 backbones on the majority classes of pulpal cavity and dentin. (4.2) Complexity: We found a statistically significant weak positive monotonic relationship between the network size and its performance for class dentin (r = 0.245, P < 0.001), enamel (r = 0.239, P < 0.001), filling (r = 0.195, P = 0.004), pulpa (r = 0.218, P < 0.001), and class crown (r = 0.154, P < 0.023). (4.3) Initialization: Models with ImageNet and CheXpert initialization consistently outperformed models with random initialization. There was no statistically significant difference between ImageNet and CheXpert initializations.

Discussion

We benchmarked 216 models defined by their architecture, complexity, and initialization strategy on a tooth structure segmentation task of dental bitewing radiographs. Several findings require a more detailed discussion. First, we aimed to evaluate whether there are superior model architectures for the tooth segmentation task at hand. We discovered a performance advantage of models with backbones from the VGG family over models with backbones from the ResNet family. Our findings are consistent with those from Ke et al. (2021), who reported that architecture improvements reported on ImageNet may not always be translated to performances on medical imaging tasks. New model architectures and model improvements seem to be prone to overfitting on ImageNet data sets. Hence, transferability of the newest AI research results into other domains, here the dental domain, may not be guaranteed. The statistically significant performance advantage of models with VGG encoder backbones argues for using VGG encoders when solid baseline models are required that perform reasonably well across different model configurations and settings. This may be relevant for the implementation of proofs of concept, for example. The top 10 performing models on the tooth structure segmentation task were built with backbones from the ResNet and DenseNet family. Consequently, if the focus is on model performance, it seems warranted to invest time to find an optimal model configuration based on more complex models (e.g., from the ResNet family). If, however, the validation of general concepts or benchmarking is the focus of the study, VGG-based models seem a reasonable choice as they are more robust across model configurations. Second, one of our objectives revolved around the effect of the model complexity on the model performance. One of the key findings was a weak positive relationship between model depth and model performance. Therefore, we accept our hypothesis. 
Notably, however, the number of parameters increased in large steps, with only incremental improvements of model performance. Hence, the performance improvement was oftentimes disproportionate to the increasing demands for computational resources, training time, or the need to reduce image resolutions. The largest network in the present study was MAnet combined with a ResNet152 backbone, which reached an F1-score of 0.85 (0.85, 0.85) over all folds (ImageNet initialization). LinkNet in combination with a ResNet50 backbone was 5 times smaller but reached an F1-score of 0.88 (0.88, 0.88) in comparison. It should be highlighted that lower computational costs allow for input imagery of higher resolution, which may be relevant for many dental applications. Our third objective was to give insight into whether initializing with ImageNet or CheXpert weights is consistently superior to random initialization and whether there is a difference in performance between the two initialization strategies. We found statistically significant performance boosts for models initialized with ImageNet or CheXpert weights in comparison to a random initialization. These findings are consistent with those from Ke et al. (2021), who reported that 12 of 16 architectures benefited from an initialization with ImageNet weights for a classification task of chest radiographs. The comparison of ImageNet and CheXpert initialization showed no significant differences. Fourth, we additionally found predictions on the minority class of filling (80%) to be generally more stable over different model configurations than predictions on class crowns (20%). Our results showed that there are superior architectures for segmenting minority classes (e.g., U-Net, U-Net++, LinkNet), but choosing a reasonable architecture may not be sufficient to overcome class imbalance. Hence, it could be recommended to address this problem with weighted loss functions (Guerrero-Peña et al. 2018) or oversampling (Buda et al. 2018). 
This study comes with several limitations. First, our results were based on 1 specific DL task, a tooth structure segmentation on bitewing radiographs, and are limited to the examined model architectures. Hence, we do not claim generalizability of our findings across other segmentation tasks or over all existing model architectures. Second, images of our data set originate from varying machines, which may lead to different behavior of the models. Furthermore, radiographs with bridges, implants, and root canal fillings were not considered in the present study as they were very rare. We accept this as our aim was to benchmark models and not to build clinically useful ones in this study. In line with this, we were only aiming at a model comparison instead of proposing a high-precision model. Hence, we did not take any actions against the existing class imbalance and did not perform an extensive hyperparameter search. Finally, we based our analysis of the relationship between model performances and model complexity exclusively on the number of model parameters. It may be the case that model architectures with more parameters require less computational power through more efficient structures of layers. Furthermore, we did not evaluate the effect of minor differences in performance within the dental environment or how computational resources are affected by differences in the number of parameters of the models.

Conclusion

We benchmarked different configurations of DL models based on their architecture, backbone, and initialization strategy regarding their performance on a tooth structure segmentation task of dental bitewing radiographs to provide guidance for researchers in their DL model selection process. Regarding the superiority of certain model architectures, we found that VGG backbones provided solid baseline models across different model configurations, while peak performances were reached through combinations of U-Net++, LinkNet, and ResNet or DenseNet encoders. Superior architectures did not overcome class imbalance. Models known to perform better than others on a nondental data set like ImageNet did not demonstrate such superiority on our dental imaging task. The analysis of the relationship between model complexity and performance showed that deeper models did not necessarily perform better than shallow alternatives with lower demands in computational resources. Finally, we found that transfer learning boosts model performance, independent of the origin of transferred knowledge.

Author Contributions

L. Schneider, contributed to conception, design, data analysis, and interpretation, drafted and critically revised the manuscript; L. Arsiwala-Scheppach, contributed to analysis, critically revised the manuscript; J. Krois, contributed to conception, design, and data analysis, drafted and critically revised the manuscript; H. Meyer-Lueckel, contributed to interpretation, critically revised the manuscript; K.K. Bressem, contributed to acquisition and interpretation, critically revised the manuscript; S.M. Niehues, contributed to acquisition, critically revised the manuscript; F. Schwendicke, contributed to conception, design, data acquisition, and interpretation, drafted and critically revised the manuscript. All authors gave final approval and agree to be accountable for all aspects of the work.

17 in total

1. A logical calculus of the ideas immanent in nervous activity. 1943.

Authors: W S McCulloch; W Pitts
Journal: Bull Math Biol Date: 1990 Impact factor: 1.758

2. Detecting caries lesions of different radiographic extension on bitewings using deep learning.

Authors: Anselmo Garcia Cantu; Sascha Gehrung; Joachim Krois; Akhilanand Chaurasia; Jesus Gomez Rossi; Robert Gaudin; Karim Elhennawy; Falk Schwendicke
Journal: J Dent Date: 2020-07-04 Impact factor: 4.379

3. Artificial intelligence in dental research: Checklist for authors, reviewers, readers.

Authors: Falk Schwendicke; Tarry Singh; Jae-Hong Lee; Robert Gaudin; Akhilanand Chaurasia; Thomas Wiegand; Sergio Uribe; Joachim Krois
Journal: J Dent Date: 2021-02-22 Impact factor: 4.379

4. Convolutional neural networks for dental image diagnostics: A scoping review.

Authors: Falk Schwendicke; Tatiana Golla; Martin Dreher; Joachim Krois
Journal: J Dent Date: 2019-11-05 Impact factor: 4.379

5. Development and Validation of Deep Learning Models for Screening Multiple Abnormal Findings in Retinal Fundus Images.

Authors: Jaemin Son; Joo Young Shin; Hoon Dong Kim; Kyu-Hwan Jung; Kyu Hyung Park; Sang Jun Park
Journal: Ophthalmology Date: 2019-05-31 Impact factor: 12.079

6. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer.

Authors: Jakob Nikolas Kather; Alexander T Pearson; Niels Halama; Dirk Jäger; Jeremias Krause; Sven H Loosen; Alexander Marx; Peter Boor; Frank Tacke; Ulf Peter Neumann; Heike I Grabsch; Takaki Yoshikawa; Hermann Brenner; Jenny Chang-Claude; Michael Hoffmeister; Christian Trautwein; Tom Luedde
Journal: Nat Med Date: 2019-06-03 Impact factor: 53.440

7. UNet++: A Nested U-Net Architecture for Medical Image Segmentation.

Authors: Zongwei Zhou; Md Mahfuzur Rahman Siddiquee; Nima Tajbakhsh; Jianming Liang
Journal: Deep Learn Med Image Anal Multimodal Learn Clin Decis Support (2018) Date: 2018-09-20

8. Panoptic Segmentation on Panoramic Radiographs: Deep Learning-Based Segmentation of Various Structures Including Maxillary Sinus and Mandibular Canal.

Authors: Jun-Young Cha; Hyung-In Yoon; In-Sung Yeo; Kyung-Hoe Huh; Jung-Suk Han
Journal: J Clin Med Date: 2021-06-11 Impact factor: 4.241

9. Deep Learning for the Radiographic Detection of Periodontal Bone Loss.

Authors: Joachim Krois; Thomas Ekert; Leonie Meinhold; Tatiana Golla; Basel Kharbot; Agnes Wittemeier; Christof Dörfer; Falk Schwendicke
Journal: Sci Rep Date: 2019-06-11 Impact factor: 4.379

10. Classification of Dental Radiographs Using Deep Learning.

Authors: Jose E Cejudo; Akhilanand Chaurasia; Ben Feldberg; Joachim Krois; Falk Schwendicke
Journal: J Clin Med Date: 2021-04-03 Impact factor: 4.241
