Dan Luo, Wei Zeng, Jinlong Chen, Wei Tang.
Abstract
Deep learning has become an active research topic in the field of medical image analysis. In particular, great advances have been made in the performance of automatic segmentation of stomatological images. In this paper, we systematically reviewed the recent literature on deep learning-based segmentation methods for stomatological images and their clinical applications. We categorized the methods by task and analyzed their advantages and disadvantages. The main categories that we explored were the data source, backbone network, and task formulation. We categorized data sources into panoramic radiography, dental X-rays, cone-beam computed tomography, multi-slice spiral computed tomography, and intraoral scan images. For the backbone network, we distinguished methods based on convolutional neural networks from those based on transformers. We divided task formulations into semantic segmentation tasks and instance segmentation tasks. Toward the end of the paper, we discussed the challenges and provided several directions for further research on the automatic segmentation of stomatological images.
Keywords: automatic segmentation; convolutional neural networks; deep learning; stomatological image; transformer
Year: 2021 PMID: 35047964 PMCID: PMC8757832 DOI: 10.3389/fmedt.2021.767836
Source DB: PubMed Journal: Front Med Technol ISSN: 2673-3129
The imaging characteristics, advantages, and disadvantages of different data types and the prospects for clinical application of deep learning.
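| Data type | Imaging characteristics | Advantages | Disadvantages | Prospects for clinical application of deep learning |
|---|---|---|---|---|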
| Dental X-rays, panoramic radiography | 2D | Easy to operate, low dose, fast imaging | Lack of 3D information | Assisting in diagnosing and screening diseases quickly and accurately. Reducing missed diagnosis and misdiagnosis. |
| CBCT | 3D | High spatial resolution, short exposure time, low effective radiation dose and small metal artifacts | Low density resolution | 1. Rapid and accurate segmentation of teeth or lesions can assist early diagnosis and reduce missed diagnosis and misdiagnosis. |
| MSCT | 3D | High density resolution | Low spatial resolution, long exposure time, high effective radiation dose and large metal artifacts | 1. Reducing the missed diagnosis and misdiagnosis. |
| IOS | Surface 3D data | Obtaining 3D data of tooth, soft-tissue, and hard-tissue surfaces in real time | Lack of internal data within soft and hard tissue | 1. Pursuing accurate tooth segmentation. |
Figure 1. Task definitions for automatic image segmentation. (A) The original image. (B) Semantic segmentation: the teeth, jaws, and background must be segmented, without distinguishing the individuals within the category “Tooth” or “Jaw.” (C) Instance segmentation: in addition to the category label, an instance label is required within each class, i.e., the individuals within the category “Tooth” or “Jaw” must be separated.
Figure 2. Overview of automatic segmentation algorithms. (A) Backbone networks are either CNN-based or Transformer-based. The former include AlexNet, VGG, GoogLeNet, ResNet, DenseNet, MobileNet, ShuffleNet, and EfficientNet; the latter include ViT, Data-efficient image Transformers (DeiT), Convolutional vision Transformer (CvT), and Swin-Transformer. (B) For semantic segmentation, the CNN-based methods include FCN, SegNet, PSPNet, DeepLab (v1, v2, v3, v3+), UNet, VNet, and UNet++, and the Transformer-based methods include SETR, Segmenter, SegFormer, Swin-UNet, Medical Transformer (MedT), UNETR, MBT-Net, TransUNet, and TransFuse. (C) Instance segmentation methods can likewise be categorized as CNN-based or Transformer-based. They can also be divided into detection-based and detection-free methods. Detection-based methods are either single-stage (YOLACT, YOLO, and SSD) or two-stage (Mask R-CNN, PANet, Cascade R-CNN, and HTC); detection-free methods include SOLO, DWT, and DeepMask. Transformer-based methods such as Cell-DETR and ISTR are detection-based.
Figure 3. The development of automatic image segmentation. Black denotes CNN-based methods and red denotes Transformer-based methods.
Figure 4. Structure of semantic segmentation networks. (A) The CNN-based semantic segmentation approach, from UNet. (B) The Transformer-based semantic segmentation approach, from Swin-Transformer.
Figure 5. Structure of instance segmentation networks. (A) The CNN-based instance segmentation approach, from Mask R-CNN. (B) The Transformer-based instance segmentation approach, from ISTR.
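The difference between the two task formulations in Figure 1 can be illustrated with a toy class map. The sketch below is not a method from the reviewed literature: it assumes a hypothetical 3x5 label grid (0 = background, 1 = tooth, 2 = jaw) and derives instance labels from the semantic map by 4-connected component labeling, which only works when individual teeth do not touch.

```python
from collections import deque


def instances_from_semantic(mask, target):
    """Assign a distinct id to each 4-connected region of class `target`."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != target or labels[r][c]:
                continue
            # Found an unlabeled pixel of the target class: start a new instance.
            next_id += 1
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == target and not labels[ny][nx]:
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels, next_id


# Semantic map (assumed toy data): every tooth pixel carries the same label 1.
SEMANTIC = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [2, 2, 2, 2, 2],
]

# Instance map: each tooth receives its own id.
tooth_instances, n_teeth = instances_from_semantic(SEMANTIC, target=1)
print(n_teeth)
for row in tooth_instances:
    print(row)
```

In the semantic map both teeth share the label 1, whereas the instance map separates them into ids 1 and 2. Adjacent teeth in real images usually touch, so connected components would merge them, which is one reason dedicated instance segmentation networks are used instead.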
Features of semantic segmentation algorithms.
| Method | Features |
|---|---|
| FCN | The first fully convolutional network for the semantic segmentation task. |
| SegNet | Improves segmentation performance at boundaries and reduces the number of model parameters and the computational cost. |
| PSPNet | Takes global context information into consideration, improving the segmentation of small objects and co-occurring categories. |
| DeepLab series | V1: enlarging the receptive field by atrous convolution. V2: obtaining multi-scale features with the ASPP module. V3: exploring atrous convolution, multi-grid, and atrous spatial pyramid pooling, which is useful for small objects. V3+: adding a decoder module to refine the segmentation results, especially along object boundaries, yielding a faster and stronger encoder-decoder network. |
| UNet | Well suited to segmenting medical images; can be trained on small-scale datasets with dedicated data augmentation. |
| VNet | A variant of UNet suitable for 3D image analysis. |
| UNet++ | An advanced UNet structure that improves performance on objects of varying size by unifying a set of UNets of different depths. |
| SETR | A novel and accurate Transformer-based model for the semantic segmentation task, without the need for convolution layers or resolution reduction. |
| Segmenter | Applies the Transformer structure to obtain global context information and achieves SOTA performance on the ADE20K dataset. |
| SegFormer | Simplifies the design of Transformer-based models: a lightweight multilayer perceptron decoder avoids complex decoder design, and no positional encoding is needed. |
| Swin-UNet | A combination of UNet and Swin-Transformer designed for medical image segmentation, achieving high performance with a small number of parameters. |
| MedT | A Transformer-based medical image segmentation network that does not require pre-training. |
| UNETR | Effectively captures global and multi-scale information and achieves high performance on 3D brain tumor and spleen tasks. |
| MBT-Net | Exploits global and local context information with a Transformer and a CNN, respectively, and achieves high performance on segmenting corneal endothelial cells. |
| TransUNet | Combines the advantages of UNet and the Transformer structure into a strong method for many medical applications, including multi-organ segmentation and cardiac segmentation tasks. |
| TransFuse | Combines Transformers and CNNs in parallel to capture global and local information, respectively, obtaining better results on both 2D and 3D medical image sets, including polyp, skin lesion, hip, and prostate segmentation. |
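The table credits DeepLab V1 with enlarging the receptive field by atrous convolution. A minimal 1D sketch, assuming a toy signal and a three-tap kernel (neither taken from the reviewed papers), shows how a dilation rate widens the span of inputs seen by each output without adding weights.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """Valid-mode 1D cross-correlation whose kernel taps are `rate` samples apart.

    Returns the output values and the effective receptive field (span)
    of one output element, which is (len(kernel) - 1) * rate + 1.
    """
    span = (len(kernel) - 1) * rate + 1
    output = []
    for start in range(len(signal) - span + 1):
        output.append(sum(w * signal[start + i * rate] for i, w in enumerate(kernel)))
    return output, span


signal = list(range(10))  # assumed toy input: 0, 1, ..., 9
kernel = [1, 1, 1]        # three weights in both cases

dense, dense_span = atrous_conv1d(signal, kernel, rate=1)
dilated, dilated_span = atrous_conv1d(signal, kernel, rate=2)

print(dense_span, dense)      # each output sees 3 consecutive samples
print(dilated_span, dilated)  # each output sees a window of 5 samples
```

With the same three weights, rate 2 covers a window of five samples instead of three. Segmentation networks apply the same idea in 2D, typically with padding so that the feature map keeps its resolution.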
Features of instance segmentation algorithms.
| Method | Features |
|---|---|
| YOLACT | A real-time instance segmentation method that achieves an mAP of 29.8% at 33 fps on the MS COCO dataset. |
| YOLO | Frames object detection as a regression problem over spatially separated bounding boxes and associated class probabilities, reaching very high speed on many tasks with a comparable mAP. |
| SSD | A fast object detection method that predicts bounding box locations by regression and object classes by classification; it is faster than Faster R-CNN and needs no bounding box proposals or pixel/feature resampling. |
| Mask RCNN | Adds a mask branch to the Faster R-CNN detector and proposes RoIAlign for feature alignment. |
| PANet | Proposes a new fusion strategy for multi-scale features; it won the COCO 2017 Challenge Instance Segmentation task and took 2nd place in the Object Detection task without large-batch training. |
| Cascade-RCNN | Progressively refines the prediction results by cascading several detection networks with different IoU thresholds. |
| HTC | Proposes a multi-task, multi-stage hybrid cascade structure and achieves high performance on many tasks. |
| SOLO | An end-to-end, detection-free instance segmentation method. |
| DWT | Combines the traditional watershed transform algorithm with a CNN model. |
| DeepMask | An earlier instance segmentation method with relatively low performance. |
| Cell-DETR | The first Transformer-based instance segmentation method for biomedical data, achieving SOTA performance. |
| ISTR | The first end-to-end Transformer-based framework for the instance segmentation task; it predicts low-dimensional mask embeddings and matches them with ground-truth mask embeddings to compute the loss. |
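The table describes Cascade-RCNN as cascading detectors with different IoU thresholds. The following sketch illustrates only the thresholding part, under stated assumptions: the boxes are hypothetical, the thresholds 0.5, 0.6, and 0.7 are chosen for illustration, and the box refinement that a real cascade performs between stages is omitted.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


GROUND_TRUTH = (0, 0, 10, 10)   # hypothetical annotated tooth box
PROPOSALS = [                   # hypothetical candidate boxes
    (0, 0, 10, 9),              # IoU 0.90
    (0, 0, 10, 6.5),            # IoU 0.65
    (0, 0, 10, 5.5),            # IoU 0.55
    (5, 5, 15, 15),             # IoU about 0.14
]
STAGE_THRESHOLDS = (0.5, 0.6, 0.7)  # assumed, one per cascade stage

# Proposals counted as positive training samples at each stage.
positives = {
    t: [p for p in PROPOSALS if box_iou(p, GROUND_TRUTH) >= t]
    for t in STAGE_THRESHOLDS
}
for t in STAGE_THRESHOLDS:
    print(t, len(positives[t]))
```

Raising the threshold stage by stage leaves fewer but better-localized positives, so later stages are trained on higher-quality boxes.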
Semantic segmentation in teeth and related diseases.
| Study | Year | Method | Image type | Dataset size | Metrics | Results |
|---|---|---|---|---|---|---|
| Wirtz | 2018 | UNet | Panoramic | 24 | Accuracy, Specificity, Precision, Recall, F1-Score, DSC | 0.818, 0.799, 0.790, 0.827, 0.803, 0.744 |
| Koch | 2019 | UNet | Panoramic | 1500 | DSC | 0.934 |
| Sivagami | 2020 | UNet | Panoramic | 1171 | Accuracy, Specificity, Precision, Recall, F1-Score, DSC | 0.97, 0.95, 0.93, 0.94, 0.93, 0.94 |
| Choi | 2016 | FCN | Dental X-ray | 475 | F1-score | 0.74, |
| Cui | 2021 | ToothPix | Panoramic | 1500 | IOU, Accuracy, Specificity, Precision, Recall, F1-score | 0.9042, 0.9808, 0.9852, 0.9407, 0.9591, 0.9486 |
| Zakirov | 2018 | VNet | CBCT | 517 | IOU, Accuracy | 0.963, 0.96 |
| Chen | 2020 | FCN+MWT | CBCT | 25 | DSC, Jaccard, RVD, ASSD | 0.936, 0.881, 0.072, 0.363 mm |
| Lee | 2020 | CNN | CBCT | 102 | DSC, Recall, Precision | Validation set: 0.938, 0.952, 0.924; |
| Rao | 2020 | UNet+DCRF | CBCT | 110 | VD, DSC, ASSD, MSSD | 18.86 mm3, 0.9166, 0.25 mm, 1.18 mm |
| Ezhov | 2019 | VNet | CBCT | 935 | IOU, ASD | 0.94, 0.17 mm |
| Zanjani | 2019 | PointCNN | IOS | 120 | IOU | 0.94 |
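Most of the metrics in the table above are overlap measures derived from the pixel-wise confusion counts. The sketch below uses two assumed eight-pixel binary masks (not data from any reviewed study) and the standard definitions of accuracy, specificity, precision, recall, F1, DSC, and IoU (Jaccard).

```python
def overlap_metrics(pred, truth):
    """Pixel-wise metrics for two flat binary masks of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pred),
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Two empty masks are treated as a perfect match (a convention).
        "dsc": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0,
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 1.0,
    }


PRED = [1, 1, 1, 0, 0, 0, 1, 0]    # assumed predicted mask
TRUTH = [1, 1, 0, 0, 0, 1, 1, 0]   # assumed ground-truth mask

metrics = overlap_metrics(PRED, TRUTH)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On a single pair of binary masks, DSC equals the pixel-wise F1 score and relates to IoU by DSC = 2·IoU / (1 + IoU). The studies listed here may average per image or per class, so their reported F1 and DSC values need not coincide.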
Instance segmentation in teeth and related diseases.
| Study | Year | Method | Image type | Dataset size | Metrics | Results |
|---|---|---|---|---|---|---|
| Jader | 2018 | Mask RCNN | Panoramic | 193 | Accuracy, Specificity, Precision, Recall, F1-score | 0.98, 0.99, 0.94, 0.84, 0.88 |
| Silva | 2020 | Mask RCNN, HTC, ResNeSt, PANet (best) | Panoramic | 1,500 | Accuracy, Specificity, Precision, Recall, F1-score | PANet: 0.967, 0.987, 0.944, 0.891, 0.916 |
| Gurses | 2020 | Mask RCNN + SURF | Panoramic | 580 | Jaccard, Precision, Recall, F1-score, Rank-1 accuracy | 0.82, 0.93, 0.91, 0.95, 0.8039 |
| Wu | 2020 | GH + BADice-DenseASPP-UNet + LO | CBCT | 20 | DSC, ASD, FA, DA | 0.962, 0.122, 0.991, 0.995 |
| Cui | 2019 | ToothNet | CBCT | 20 | DSC | 0.9264 |
| Zanjani | 2021 | Mask-MCNet | IOS | 164 | mIOU, mAP, mAR | 0.98, 0.98, 0.97 |
Semantic segmentation in the jaw.
| Study | Year | Method | Image type | Dataset size | Metrics | Results |
|---|---|---|---|---|---|---|
| Kong | 2020 | UNet | Panoramic | 2602 | Accuracy, Jaccard, HD, PPS, Para(M) | 0.9928, 0.9829, 8.32, 41.0, 0.92 |
| Li | 2020 | Deetal-Perio (based on Mask RCNN) | Panoramic | 470 | mAP, DSC (all), DSC (single), F1-score, Accuracy | Suzhou dataset: 0.826, 0.868, 0.778, 0.878, 0.884; Zhongshan dataset: 0.841, 0.852, 0.748, 0.454, 0.817 |
| Egger | 2018 | FCN-32s, FCN-16s, FCN-8s (best) | MSCT | 20 | DSC | FCN-8s: 0.9203 |
| Zhang | 2018 | UNet | CBCT, MSCT | CBCT (77), MSCT (30) | DSC, SEN, PPV | Midface: 0.9319, 0.9282, 0.9361; Mandible: 0.9327, 0.9363, 0.9293 |
| Torosdagli | 2019 | Tiramisu (based on UNet and DenseNet) | CBCT | 50 | DSC | 0.9382 |
| Lian | 2020 | DTNet | CBCT, MSCT | CBCT (77), MSCT (63) | DSC, SEN, PPV | 0.9395, 0.9424, 0.9368 |
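Alongside overlap measures, the tables report boundary-distance metrics such as HD and ASSD. A minimal sketch of the symmetric Hausdorff distance and the average symmetric surface distance follows, assuming two small hypothetical sets of 2D boundary points in unitless coordinates. Reporting values in millimetres would additionally require the pixel or voxel spacing.

```python
import math


def _nearest_distances(src, dst):
    """Distance from each point in `src` to its nearest point in `dst`."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance: the worst-case boundary disagreement."""
    return max(max(_nearest_distances(a, b)), max(_nearest_distances(b, a)))


def average_symmetric_surface_distance(a, b):
    """ASSD: mean nearest-neighbour distance, pooled over both directions."""
    d_ab = _nearest_distances(a, b)
    d_ba = _nearest_distances(b, a)
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


# Hypothetical boundary points of a predicted and a reference contour.
PRED_BOUNDARY = [(0, 0), (1, 0), (2, 0)]
TRUE_BOUNDARY = [(0, 1), (1, 1), (2, 3)]

hd = hausdorff_distance(PRED_BOUNDARY, TRUE_BOUNDARY)
assd = average_symmetric_surface_distance(PRED_BOUNDARY, TRUE_BOUNDARY)
print(f"HD = {hd:.3f}, ASSD = {assd:.3f}")
```

HD is driven by the single worst outlier, here the reference point (2, 3), while ASSD averages over all boundary points. This is why a segmentation can have a small ASSD and still a large HD.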
Code and data availability of the reviewed works.
| Study | Method | Code | Data |
|---|---|---|---|
| Wirtz | UNet | | Their own dataset |
| Koch | UNet | | The dataset created by Gil Silva |
| Sivagami | UNet | | |
| Choi | FCN | | Their own dataset |
| Cui | ToothPix | Not available | lndb dental dataset: |
| Zakirov | VNet | | Their own dataset |
| Chen | FCN+MWT | | Their own dataset |
| Lee | CNN | Not available | Their own dataset |
| Rao | UNet+DCRF | | Their own dataset |
| Ezhov | VNet | | Their own dataset |
| Zanjani | PointCNN | | Their own dataset |
| Jader | Mask RCNN | | |
| Silva | Mask RCNN, HTC, ResNeSt, PANet (best) | | |
| Gurses | Mask RCNN + SURF | | DS1: |
| Wu | GH + BADice-DenseASPP-UNet + LO | Not available | Their own dataset |
| Cui | ToothNet | Not available | Their own dataset |
| Zanjani | Mask-MCNet | Not available | Their own dataset |
| Kong | UNet | | Their own dataset |
| Li | Deetal-Perio (based on Mask RCNN) | | Suzhou Dataset and Zhongshan Dataset |
| Egger | FCN-32s, FCN-16s, FCN-8s (best) | | Their own dataset |
| Zhang | UNet | | Their own dataset |
| Torosdagli | Tiramisu (based on UNet and DenseNet) | | |
| Lian | DTNet | Not available | Their own dataset |