
The Effectiveness of Semi-Automated and Fully Automatic Segmentation for Inferior Alveolar Canal Localization on CBCT Scans: A Systematic Review.

Julien Issa¹, Raphael Olszewski²,³, Marta Dyszkiewicz-Konwińska¹.

Abstract

This systematic review aims to identify the available semi-automatic and fully automatic algorithms for inferior alveolar canal localization and to present their diagnostic accuracy. Articles on inferior alveolar nerve/canal localization using methods based on artificial intelligence (semi-automated and fully automated) were collected electronically from five databases (PubMed, Medline, Web of Science, Cochrane, and Scopus). Two independent reviewers screened the titles and abstracts of the collected records, stored in EndNote X7, against the inclusion criteria. The included articles were then critically appraised using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. Of the 990 articles initially collected, seven studies were included after deduplication and screening against the exclusion criteria. In total, 1288 human cone-beam computed tomography (CBCT) scans were investigated for inferior alveolar canal localization using different algorithms, and the results were compared with manual tracing performed by experts in the field. The reported diagnostic accuracy values of the algorithms were extracted. The analyzed studies used a wide range of testing measures, yet some expected indexes were still missing from their results. Future studies should follow the new artificial intelligence guidelines to ensure proper methodology, reporting, results, and validation.


Keywords: CBCT; algorithm; artificial intelligence; inferior alveolar nerve


Year: 2022; PMID: 35010820; PMCID: PMC8744855; DOI: 10.3390/ijerph19010560

Source DB: PubMed; Journal: Int J Environ Res Public Health; ISSN: 1660-4601; Impact factor: 3.390

1. Introduction

Artificial intelligence (AI) is a broad domain combining the science and engineering of developing intelligent systems and machines [1,2] that can accomplish complex human cognitive functions such as problem-solving, structure and word recognition, and decision-making [3]. AI has become integrated into daily life, directly and indirectly, through digital assistants (Apple's Siri, Google Now, Amazon's Alexa, Microsoft's Cortana, etc.), online recommendations (music, products, movies, map navigation, etc.), advertisements, email filtering, smart replies, and automatic detection, as well as in essential fields such as medicine, where it is in continuous development [4,5,6]. Machine learning, a subdivision of AI, enables algorithms to learn from patterns in data and make predictions, whereas deep learning performs this process on larger volumes of raw data [7,8]. Making the most accurate knowledge-based decisions requires greater experience and more extensive data analysis [9]. Based on this concept, AI is being implemented extensively in medicine, particularly in diagnosis and decision-making [8,9]. Two forms of AI exist in the medical field: virtual (electronic health records, diagnostic and treatment planning software, and others) and physical (robot-assisted surgery, smart prostheses, etc.) [1,10]. AI applications in dentistry are also growing rapidly [11]. They are used for caries detection and diagnosis [12], oral cancer screening [13,14], improvement of brushing technique [15], management of dental fear [16], automatic cleaning, shaping, and filling of the root canal [17], differential diagnosis, treatment planning, and detection of anatomical structures on dental radiographic data [18]. Despite the popularity of cone-beam computed tomography (CBCT) in dentistry, dentists' knowledge of the basics of dental tomography and of CBCT use remains questionable [19], owing to the lack of uniformity of dental curricula across dental schools worldwide.
In particular, the exclusion of CBCT from undergraduate curricula in some countries and the lack of oral and maxillofacial radiology specialists in most European countries [19] have raised the question of whether dentists are prepared for the diagnostic process despite the growing number of CBCT machines [20]. Consequently, dentists seek additional training and are also becoming interested in available tools that could assist them in reporting. Researchers have proposed AI as a fast assisting tool for dentists in reading and reporting two-dimensional (2D) and three-dimensional (3D) radiographic scans [21,22]. The inferior alveolar nerve (IAN) is an essential nerve that runs, together with the inferior alveolar artery and veins, in the mandibular canal (MC), also known as the inferior alveolar canal (IAC) [23]. Both the IAN and the MC show variations in their course [24,25]. To avoid IAN injuries, which range from temporary nerve numbness with or without paresthesia to permanent nerve paresthesia (with or without trigeminal neuralgia) [26], proper tracing on the radiographic image can be helpful [27]. CBCT, which delivers 3D images [28], allows the operator to evaluate the scanned structures from different views, enabling proper assessment of the IAC and tracing of the IAN [29]. In their review of the clinical applications and diagnostic performance of AI in dental and maxillofacial radiology, Hung et al. [30] emphasized the need for future systematic reviews describing and assessing the value, impact, and reliability of AI in daily practice. Furthermore, because the implementation of AI in dentistry is relatively new, it is essential to investigate its ability to detect or predict disease or to confirm normal physiological presentation, and to compare its diagnostic test accuracy with that of a gold standard test [31].
In this review, we aim to present and systematically analyze the effectiveness of semi-automatic and fully automatic methods for IAN/IAC localization and to offer recommendations for practitioners and researchers.

2. Materials and Methods

This systematic review was conducted in accordance with the Joanna Briggs Institute (JBI) methodology for diagnostic test accuracy reviews [32] and the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [33]. The objective of the review is to identify the available semi-automatic and fully automatic algorithms for IAC localization and to present their diagnostic accuracy. The components of the PIRD mnemonic [34] (Population, Index test, Reference test, and Diagnosis of interest) were defined as follows. Population: CBCT scans of the oral and maxillofacial area in humans. Index test: diagnostic tool based on a semi-automatic or fully automatic algorithm. Reference test: expert judgment or manual tracing. Diagnosis of interest: IAC/IAN localization.

2.1. Searching Strategy

Five databases (PubMed, Medline, Web of Science, Cochrane, and Scopus) were searched electronically until the 14 using a complete search strategy (Table S1). The search strategy was developed after a limited preliminary search and customized for each database; it included the following MeSH keywords: ("algorithm" OR "algorithm*" OR "artificial intelligence" OR "AI" OR "automatic" OR "automated" OR "semi-automatic" OR "semi-automated" OR "deep learning" OR "Convolutional neural network" OR CNN OR "machine learning") AND ("mandibular canal" OR "inferior alveolar canal" OR "inferior alveolar nerve"). All retrieved articles were imported into an EndNote X7 (Clarivate Analytics, PA, USA) library and de-duplicated according to Bramer et al. [35].
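To make the Boolean grouping of the search string explicit, the sketch below assembles it from the two keyword groups listed above. It is a minimal illustration, assuming that terms within a group are combined with OR and the two groups with AND; `or_group` and `build_query` are hypothetical helpers, and the database-specific syntax actually used (Table S1) is not reproduced.

```python
# Hypothetical helpers: assemble a Boolean search string from two synonym groups.
# Assumes terms within a group are ORed and the groups are ANDed. Every term is
# quoted here, including CNN, which the original string leaves unquoted.

METHOD_TERMS = [
    "algorithm", "algorithm*", "artificial intelligence", "AI",
    "automatic", "automated", "semi-automatic", "semi-automated",
    "deep learning", "Convolutional neural network", "CNN", "machine learning",
]
ANATOMY_TERMS = ["mandibular canal", "inferior alveolar canal", "inferior alveolar nerve"]


def or_group(terms: list[str]) -> str:
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*groups: list[str]) -> str:
    """Join the OR-groups with AND so a record must match every group."""
    return " AND ".join(or_group(g) for g in groups)


if __name__ == "__main__":
    print(build_query(METHOD_TERMS, ANATOMY_TERMS))
```

The parentheses matter: without them, most search engines apply AND before OR, and the string would no longer require an anatomical term in every hit.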

2.2. Eligibility Criteria

The inclusion and exclusion criteria were based on the PIRD mnemonic [32,34]. Retrospective clinical trials, cross-sectional studies, and case-control studies that investigated the accuracy of diagnostic tools based on semi-automatic or fully automatic algorithms for tracing the IAN on human CBCT scans, and compared them with manual techniques performed by expert judges, were included. Pilot studies, ex vivo studies, and conference papers were excluded, as were studies investigating orthopantomography or computed tomography (CT) scans and studies on animals (Table 1).

Table 1

Table of Inclusion and Exclusion Criteria.

Inclusion Criteria	Exclusion Criteria
CBCT scans of oral and maxillofacial area in humans	Panoramic and CT scans of oral and maxillofacial area in humans
Diagnostic tool based on semi-automatic or fully automatic algorithm	CBCT scans of oral and maxillofacial area in animals
Expert judgment or manual technique	Tracing any oral and maxillofacial structure other than the IAN/IAC
Tracing the IAN/IAC	Pilot studies, ex vivo studies, conference papers/reviews
Retrospective clinical trials, cross-sectional studies, case-control studies	Full text not accessible
Studies published in any language with accessible full text
No date restriction

Because the review question is new to the field, no date or language restrictions were applied.
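The Table 1 criteria can be read as a single conjunction: a record is eligible only if it satisfies every inclusion criterion. The sketch below encodes them that way as a checklist. It is illustrative only, since the review applied the criteria manually; the `Record` fields, their string values, and `is_eligible` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical encoding of the Table 1 criteria; field names and values are
# illustrative and do not come from the review's screening forms.
INCLUDED_DESIGNS = {"retrospective clinical trial", "cross-sectional", "case-control"}
INDEX_TESTS = {"semi-automatic", "fully automatic"}
REFERENCE_TESTS = {"expert judgment", "manual tracing"}


@dataclass
class Record:
    species: str              # "human" or "animal"
    modality: str             # "CBCT", "CT", or "panoramic"
    index_test: str           # e.g. "fully automatic"
    reference_test: str       # e.g. "manual tracing"
    target: str               # structure traced, e.g. "IAN/IAC"
    design: str               # e.g. "case-control", "pilot", "ex vivo"
    full_text_available: bool


def is_eligible(r: Record) -> bool:
    """True if a record meets every inclusion criterion in Table 1.

    Pilot, ex vivo, conference-paper, and review designs fail the design
    check; no language or date filter is applied, matching the review.
    """
    return (
        r.species == "human"
        and r.modality == "CBCT"
        and r.index_test in INDEX_TESTS
        and r.reference_test in REFERENCE_TESTS
        and r.target == "IAN/IAC"
        and r.design in INCLUDED_DESIGNS
        and r.full_text_available
    )


candidate = Record("human", "CBCT", "fully automatic", "manual tracing",
                   "IAN/IAC", "case-control", True)
print(is_eligible(candidate))
```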

2.3. Study Selection

Two independent reviewers (J.I. and M.D.K.) screened the titles and abstracts of the collected records against the inclusion criteria after pilot-testing the method. Potentially eligible articles from this primary screening were retained, and their full texts were assessed in detail against the inclusion criteria by the same reviewers independently. Any disagreements between the two reviewers at any stage of the process were resolved through discussion or by consulting a third reviewer (R.O.).
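Dual independent screening is usually summarized with Cohen's kappa, which corrects the raw agreement between two reviewers for the agreement expected by chance, and the records on which they disagree are the ones passed to discussion or to a third reviewer. The sketch below shows both steps on ten invented decisions; `cohens_kappa` and `disagreements` are hypothetical helpers, and the data are not the review's.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of records given the same decision.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if both raters decided independently at their own rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if p_expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


def disagreements(rater_a: list[str], rater_b: list[str]) -> list[int]:
    """Indices of records to resolve by discussion or by a third reviewer."""
    return [i for i, (a, b) in enumerate(zip(rater_a, rater_b)) if a != b]


# Hypothetical title/abstract decisions for ten records (not the review's data).
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.737
print(disagreements(reviewer_1, reviewer_2))           # [3]
```

Here the reviewers agree on 9 of 10 records (0.90), but chance alone would produce 0.62 agreement given their include rates, so kappa is (0.90 - 0.62) / (1 - 0.62) ≈ 0.737.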

2.4. Critical Appraisal and Data Extraction

Following the JBI recommendation [32] and the review by Ma et al. [36], the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies-2) tool (Table S2) was used to examine the methodology of the included studies against predefined criteria, with the aim of identifying individual sources of risk of bias. Each QUADAS-2 question was answered with 'Yes', 'No', 'Unclear', or, on some occasions, 'Not applicable'. Before the appraisal, the reviewers agreed on specific criteria for including or excluding a study from the review, and these criteria were then applied consistently across studies. Data extraction was performed by one reviewer (J.I.) and evaluated independently by the second reviewer (M.D.K.). The extracted data, presented in Table 2, include the author(s), year of publication, study location, study methodology, sample size, and the persons executing and interpreting the reference tests (number, training, and expertise), as well as, where available, the reported sensitivity, specificity, accuracy, and level of agreement between the two methods.
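The accuracy indexes extracted here all derive from the same 2×2 table of index test results against the reference test. The sketch below computes sensitivity, specificity, accuracy, and the diagnostic odds ratio (DOR) from those four counts. It assumes the unit of analysis (scan, slice, or voxel) has already been fixed, which each study must define for itself; `TwoByTwo` and the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test vs. reference test counts for one diagnostic accuracy study."""
    tp: int  # index positive, reference positive
    fp: int  # index positive, reference negative
    fn: int  # index negative, reference positive
    tn: int  # index negative, reference negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def diagnostic_odds_ratio(self) -> float:
        """DOR = (TP/FN) / (FP/TN); undefined when FP or FN is zero."""
        if self.fp == 0 or self.fn == 0:
            raise ValueError("DOR is undefined when FP or FN is zero")
        return (self.tp * self.tn) / (self.fp * self.fn)


# Two hypothetical tests with the same accuracy but different error profiles.
balanced = TwoByTwo(tp=45, fp=5, fn=5, tn=45)
skewed = TwoByTwo(tp=48, fp=8, fn=2, tn=42)
for name, t in [("balanced", balanced), ("skewed", skewed)]:
    print(name, t.accuracy, t.sensitivity, t.specificity, t.diagnostic_odds_ratio)
```

Both hypothetical tests reach 90% accuracy, yet the second trades specificity (84%) for sensitivity (96%) and has a DOR of 126 against 81. Accuracy alone hides this difference, which is why the underlying counts are needed.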

Table 2

Data extracted from included studies. OMF, Oral and Maxillofacial.

Author, Study Location, and Year of Publication	Algorithm	Total Sample	Persons Executing and Interpreting Reference Tests: Number	Persons Executing and Interpreting Reference Tests: Expertise	Software Used for Reference Test Method	Data Sets Used for Training, Validation and Test	Validation Technique	Sensitivity	Specificity	Accuracy	Agreement between Methods
Orhan et al., Turkey, 2021. [37]	U-net-like (Diagnocat ©)	85	1	OMF radiologist	N/A	N/A	N/A	N/A	N/A	N/A	Kappa statistics = 0.762
Liu et al., China, 2021. [38]	Two U-Nets, one ResNet-34	229	2	OMF radiologists with 10 years of experience	Manual modification using Multi-Planar Reformation (MPR)	154, 30, 45 (train, valid, test)	Train, validation, and test split	90.2%	95.0%	93.3%	Kendall’s coefficient = 0.901
Bayrakdar et al., Turkey, 2021. [39]	U-net-like (Diagnocat ©)	75	1	OMF radiologist with 8 years of experience	N/A	N/A	N/A	N/A	N/A	N/A	N/A
Kwak et al., Korea, 2020. [40]	2D SegNet, 2D U-Net, 3D U-Net	102	3	Two trained researchers, one OMF radiologist with 6 years of experience	INVIVO™ (Anatomage, San Jose, CA, USA)	6:2:2 (train:valid:test)	Train, validation, and test split	N/A	N/A	96% (2D SegNet), 84% (2D U-Net), 99% (3D U-Net)	N/A
Jaskari et al., Finland, 2020. [41]	Fully convolutional deep neural network	637	2	OMF radiologist with 34 years of experience and a resident in dental and maxillofacial radiology with 10 years of experience	Planmeca Romexis® 4.6.2.R software	457, 52, 128 (train, valid, test)	Train, validation, and test split	N/A	N/A	90%	N/A
Abdolali et al., Iran, 2016. [42]	Statistical shape models	120	2	Radiologists with at least 10 years of experience	N/A	84 (training set)	Leave-one-out cross-validation	N/A	N/A	N/A	N/A
Bahrampour et al., Iran, 2016. [43]	Automated algorithm	40	2	Maxillofacial radiologists	N/A	N/A	N/A	N/A	N/A	N/A	N/A
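The pooled sample size reported in the Results can be cross-checked against Table 2. The sketch below transcribes each study's total and, where the table gives explicit counts, its train/validation/test split, then checks that each split accounts for every scan. Kwak et al. report only a 6:2:2 ratio and Abdolali et al. only an 84-scan training set, so those splits are left as `None`; the dictionary layout is an illustrative choice.

```python
# Sample sizes and explicit data splits as reported in Table 2 (None where not given).
STUDIES = {
    "Orhan et al. 2021": (85, None),
    "Liu et al. 2021": (229, (154, 30, 45)),
    "Bayrakdar et al. 2021": (75, None),
    "Kwak et al. 2020": (102, None),        # only a 6:2:2 ratio is reported
    "Jaskari et al. 2020": (637, (457, 52, 128)),
    "Abdolali et al. 2016": (120, None),    # 84-scan training set, leave-one-out
    "Bahrampour et al. 2016": (40, None),
}

total_scans = sum(n for n, _ in STUDIES.values())

for name, (n, split) in STUDIES.items():
    if split is not None:
        # Each reported train/validation/test split should account for every scan.
        assert sum(split) == n, f"{name}: split {split} does not sum to {n}"

print(total_scans)  # 1288
```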

3. Results

3.1. Search Result

A total of 990 articles were collected on 22 August 2021 from five electronic databases (PubMed, Medline, Web of Science, Cochrane, and Scopus). After removal of 142 duplicates, the titles and abstracts of 848 articles were evaluated against the inclusion and exclusion criteria, yielding 19 articles eligible for full-text assessment. Following full-text evaluation, only seven studies qualified for the systematic review and were subjected to final screening using the QUADAS-2 instrument (Figure 1). Inter-reviewer reliability was high (kappa, κ = 0.883), indicating strong agreement between the reviewers.
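The number of records removed at each stage of the flow in Figure 1 follows by subtraction from the counts above. The short sketch below makes that arithmetic explicit. The per-stage exclusion totals it prints are derived here and are not stated in the text, and the reasons for exclusion are not captured.

```python
# Record counts reported in Section 3.1.
identified = 990
duplicates = 142
full_text_assessed = 19
included = 7

screened = identified - duplicates                       # titles/abstracts screened
excluded_at_screening = screened - full_text_assessed     # removed at title/abstract stage
excluded_at_full_text = full_text_assessed - included     # removed after full-text review

print(screened, excluded_at_screening, excluded_at_full_text)  # 848 829 12
```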

Figure 1

PRISMA flow diagram of the systematic review, including database searches.

The seven included retrospective studies involved a total of 1288 human CBCT scans. Five of the seven studies used convolutional neural network algorithms [37,38,39,40,41]; of the remaining two, one used statistical shape models [42] and the other tested a new automated method [43]. Despite the progress of AI in oral and maxillofacial radiology, the number of published studies testing AI algorithms for IAN/IAC detection on CBCT scans remains relatively low: from 2016 to 22 August 2021, only seven studies were published and identified.

The U-net-like algorithms implemented in Diagnocat software (Diagnocat Inc, West Sacramento, CA, USA) were tested by Orhan et al. [37] and Bayrakdar et al. [39] on 85 and 75 CBCT scans, respectively. In each study, one oral and maxillofacial radiologist performed the reference test. Jaskari et al. [41] tested a fully convolutional deep neural network on 637 CBCT scans, divided into 457 scans for training, 52 for validation, and 128 for testing. The reference test was carried out by a dental and maxillofacial radiologist with 34 years of experience and a resident in oral and maxillofacial radiology with ten years of experience, using Romexis® 4.6.2.R software (Planmeca, Helsinki, Finland) for IAN annotation. Liu et al. [38] used two U-Nets and one ResNet-34 in a two-module approach: one module detected the MC and third molars, and the other classified the relation between the MC and the third molar. Their 229 CBCT scans were divided into 154 for training, 30 for validation, and the remaining 45 for testing. Two oral and maxillofacial radiologists with ten years of experience performed the reference test, manually modifying the primary segmentation using Multi-Planar Reformation (MPR). Kwak et al. [40] tested three algorithms (2D SegNet, 2D U-Net, and 3D U-Net) on 102 CBCT scans of patients aged 18 to 90 years, split into training, validation, and test sets in a 6:2:2 ratio. The reference test in this study was performed by two trained researchers and one oral and maxillofacial radiologist with six years of experience using INVIVO™ (Anatomage, San Jose, CA, USA). Abdolali et al. [42] tested statistical shape models on 120 CBCT scans, with two radiologists conducting the reference test. Bahrampour et al. [43] proposed a new automated algorithm and tested it on 40 CBCT scans, with two maxillofacial radiologists performing the reference test.

The number of experts who traced the IAC varied from one to three and included trained researchers, radiologists, oral and maxillofacial radiologists, and residents in oral and maxillofacial radiology. The reference test results were then compared with the results of the tested algorithms. Sensitivity (90.2%) and specificity (95%) were reported only by Liu et al. [38], while three studies [38,40,41] reported accuracy without presenting the diagnostic odds. Orhan et al. [37] and Liu et al. [38] reported a kappa statistic (0.762) and Kendall’s coefficient (0.901), respectively, to describe the level of agreement between the index and reference tests. Liu et al. [38] also assessed the reliability between their two investigators using weighted kappa (0.783), which indicated good agreement. The extracted data from the studies are described in Table 2.
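Kwak et al. [40] report their split only as a 6:2:2 ratio, and 102 scans do not divide evenly in that ratio, so some rounding rule must be chosen. The sketch below shows one possible rule (floor each share, then give leftover scans to the earliest subsets); `ratio_split` and the rounding rule are assumptions for illustration, not the authors' procedure. Splitting whole scans, rather than slices, keeps every patient's data in exactly one subset.

```python
import random


def ratio_split(items: list, ratios: tuple[int, ...], seed: int = 0) -> list[list]:
    """Shuffle items and split them by integer ratios.

    Each subset gets the floor of its share; leftover items go to the
    earliest subsets. This rounding rule is an illustrative assumption.
    """
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    sizes = [len(items) * r // total for r in ratios]
    for i in range(len(items) - sum(sizes)):
        sizes[i % len(sizes)] += 1
    subsets, start = [], 0
    for size in sizes:
        subsets.append(shuffled[start:start + size])
        start += size
    return subsets


# 102 hypothetical scan IDs split 6:2:2 into training, validation, and test sets.
train, valid, test = ratio_split(list(range(102)), (6, 2, 2))
print(len(train), len(valid), len(test))  # 62 20 20
```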

3.2. Risk of Bias

Based on the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool, all studies demonstrated a low to moderate risk of bias. The detailed quality assessment is shown in Figure 2.

Figure 2

Risk of bias.

4. Discussion

The major weaknesses of most of the included studies were the variation in the indexes used to present results [37,38,39,40,41,42,43], the absence of clear exclusion criteria [37,38,39,42,43], and a poor description of the reference test [37,39,42,43]. These weaknesses mainly hinder replication of the studies, which is essential according to the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guidelines [44]. The samples came from the same setting or location [37,39,40], and the accuracy of the training sets was not described in detail [37,39,43]. Larger training sets are expected to give more accurate results, because an insufficient training sample may lead to overfitting and reduce the algorithm's ability to generalize to unseen data [45]. Inter-observer reliability was reported only by Liu et al. [38], using weighted kappa (κ = 0.783). Reporting both inter-rater and intra-rater reliability would help assess the reproducibility of each observer and the overall agreement between observers [46,47]. Analyzing the design, methodology, and reported results of the seven studies [37,38,39,40,41,42,43], we noted that the authors did not follow any defined reporting guidelines. In three studies [38,40,41], the accuracy of the diagnostic test was reported without the diagnostic odds. However, the diagnostic values (true positives, false negatives, true negatives, and false positives) are mandatory for a complete evaluation of test accuracy [48]. Given the frequency of CBCT artifacts (noise, extinction artifacts, beam hardening, scattering, motion artifacts, etc.) and their impact on diagnosis [49], testing algorithm accuracy on CBCT scans that include these artifacts is essential for future clinical application.
In our review, none of the included studies considered artifact-affected scans in their samples, and Liu et al. [38] explicitly excluded CBCT images blurred by artifacts. The principal research guidelines did not include AI-specific sections because they were established before the recent development of AI in diagnostic research. This explains the high frequency of 'Unclear' and 'Not applicable' answers to the QUADAS-2 questions in our review. For example, in the index test domain, 50% of the answers were 'Not applicable' and 7.14% were 'Unclear', as the QUADAS-2 tool was not designed to evaluate the risk of bias in AI diagnostic accuracy studies [50]. The number of studies testing the accuracy of AI in dentistry, especially in oral and maxillofacial radiology, is increasing alongside the addition of AI sections to research guidelines. Recently, Sounderajah et al. [51] began developing AI-specific extensions of the STARD guidelines, EQUATOR (Enhancing Quality and Transparency of Health Research), and TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis). Furthermore, AI extensions of SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) [52] and CONSORT (Consolidated Standards of Reporting Trials) [53] have been developed and published; they need to be endorsed by journals to improve the quality of dental AI research [54]. A recent checklist by Schwendicke et al. [55] has also been published to guide researchers, reviewers, and readers.
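The reported percentages for the index test domain are consistent with a denominator of 14 answers, that is, seven studies each answering two signaling questions, as in the standard QUADAS-2 index test domain: 7/14 = 50% and 1/14 ≈ 7.14%. The sketch below tallies answers on that assumption. The review's tailored question set is in Table S2 and may differ, and because the split of the remaining six answers is not reported, they are pooled under a hypothetical "Yes or No" label.

```python
from collections import Counter


def answer_percentages(answers: list[str]) -> dict[str, float]:
    """Share of each QUADAS-2 answer category, in percent, rounded to two decimals."""
    counts = Counter(answers)
    return {category: round(100 * n / len(answers), 2) for category, n in counts.items()}


# Assumed index test domain: 7 studies x 2 signaling questions = 14 answers.
# Only the "Not applicable" and "Unclear" counts follow from the reported
# percentages; the other 6 answers are pooled because their split is not given.
index_test_answers = ["Not applicable"] * 7 + ["Unclear"] * 1 + ["Yes or No"] * 6
print(answer_percentages(index_test_answers))
```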

5. Conclusions

In summary, we encourage researchers to address the limitations identified above, which may bias the evaluation of an algorithm's performance, and to follow the AI-specific guidelines, which are continually updated. This is especially important given the potential benefits of implementing AI, which could support globally uniform dental reports and assist dentists by saving time while maintaining quality for better outcomes. This review can be viewed as a preliminary report to guide researchers investigating AI, so that they obtain accurate results that allow proper evaluation of a given algorithm.

47 in total

1. Position and course of the mandibular canal in skulls.

Authors: Ayla Ozturk; Anitha Potluri; Alexandre R Vieira
Journal: Oral Surg Oral Med Oral Pathol Oral Radiol; Date: 2012-04

2. Variant Inferior Alveolar Nerves and Implications for Local Anesthesia.

Authors: Kevin T Wolf; Everett J Brokaw; Andrea Bell; Anita Joy
Journal: Anesth Prog; Date: 2016

3. Dental cone beam CT: An updated review.

Authors: Touko Kaasalainen; Marja Ekholm; Teemu Siiskonen; Mika Kortesniemi
Journal: Phys Med; Date: 2021-07-17

4. Introduction to artificial intelligence in medicine.

Authors: Yoav Mintz; Ronit Brodie
Journal: Minim Invasive Ther Allied Technol; Date: 2019-02-27

5. Artificial Intelligence in Surgery: Promises and Perils.

Authors: Daniel A Hashimoto; Guy Rosman; Daniela Rus; Ozanan R Meireles
Journal: Ann Surg; Date: 2018-07

6. Evaluation of artificial intelligence for detecting impacted third molars on cone-beam computed tomography scans.

Authors: Kaan Orhan; Elif Bilgir; Ibrahim Sevki Bayrakdar; Matvey Ezhov; Maxim Gusarev; Eugene Shumilov
Journal: J Stomatol Oral Maxillofac Surg; Date: 2020-12-18

7. What kind of systematic review should I conduct? A proposed typology and guidance for systematic reviewers in the medical and health sciences.

Authors: Zachary Munn; Cindy Stern; Edoardo Aromataris; Craig Lockwood; Zoe Jordan
Journal: BMC Med Res Methodol; Date: 2018-01-10