AI-based analysis of oral lesions using novel deep convolutional neural networks for early detection of oral cancer.

Kritsasith Warin¹, Wasit Limprasert², Siriwan Suebnukarn¹, Suthin Jinaporntham³, Patcharapon Jantana⁴, Sothana Vicharueang⁴.

Abstract

Artificial intelligence (AI) applications in oncology have developed rapidly, with reported successes in recent years. This work aims to evaluate the performance of deep convolutional neural network (CNN) algorithms for the classification and detection of oral potentially malignant disorders (OPMDs) and oral squamous cell carcinoma (OSCC) in oral photographic images. A dataset of 980 oral photographic images was divided into 365 images of OSCC, 315 images of OPMDs and 300 non-pathological images. Multiclass image classification models were created using DenseNet-169, ResNet-101, SqueezeNet and Swin-S. Multiclass object detection models were built using Faster R-CNN, YOLOv5, RetinaNet and CenterNet2. The best multiclass CNN-based object detection model, Faster R-CNN, achieved an AUC of 0.88 on OSCC and 0.64 on OPMDs. The best multiclass image classification model, DenseNet-169, achieved an AUC of 1.00 on OSCC and 0.98 on OPMDs. These values were in line with the performance of experts and superior to that of general practitioners (GPs). In conclusion, CNN-based models have potential for the identification of OSCC and OPMDs in oral photographic images and are expected to become a diagnostic tool to assist GPs in the early detection of oral cancer.

Year: 2022 PMID: 36001628 PMCID: PMC9401150 DOI: 10.1371/journal.pone.0273508

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752

Introduction

The power and potential of artificial intelligence (AI) innovations in healthcare are increasingly being demonstrated, driven by the desire to improve the quality of clinical care. Novel AI technologies can help clinicians reduce human error and make more accurate decisions, with superior outcomes compared with traditional methods [1]. AI applications in head and neck cancer diagnosis have developed rapidly, with reported successes in the initial interpretation of medical images [2]. Among the technological advances in AI, deep convolutional neural networks (CNNs) are neural network algorithms inspired by the mechanism of human neurons. CNNs are currently being developed as tools to assist clinicians in solving various problems and to increase the accuracy of disease detection in radiographic and clinical images [3]. CNN-based algorithms such as Faster R-CNN, ResNet and DenseNet have been used to detect and classify lesions in chest X-rays [4] and in clinical images of the skin, cervix, esophagus and larynx, with expert-level results [5-8]. The advent of AI technology does not mean that clinicians will ultimately be replaced. Instead, AI will help clinicians, especially general practitioners (GPs), evaluate and diagnose patients more accurately. Globally, cancer of the oral cavity, like other life-threatening diseases, is a highly relevant public health problem. Although oral cancer is only the 18th most common cancer worldwide, it is a fatal disease that caused over 170,000 deaths in 2020 [9]. Oral squamous cell carcinoma (OSCC) is the most frequent malignancy in the oral cavity, accounting for about 90% of all oral cancers [10]. Two-thirds of oral cancers occur in developing and low- to middle-income countries, especially in Southeast Asia and South Asia [9].
Most cases of OSCC arise from oral potentially malignant disorders (OPMDs) of the oral cavity, such as erythroplakia, leukoplakia, erythroleukoplakia and oral lichen planus, which carry an approximately 1% risk of malignant transformation [11]. OPMDs and early-stage OSCC are often asymptomatic and may appear as harmless lesions, so they can easily be missed, especially by GPs [12], which leads to delayed diagnosis. Treatment of oral cancer depends on the cancer stage. Advanced-stage oral cancer often requires more invasive treatment, which increases morbidity and treatment cost and significantly impacts the individual's quality of life [13, 14]. The prognosis of oral cancer also worsens at advanced stages: the 5-year survival rate is approximately 69.3% for early-stage oral cancer but decreases to 31.2% for advanced-stage disease [15, 16]. This rate has not improved significantly in the past few decades despite various treatments [10]. In addition, treating late-stage oral cancer is extremely costly, approximately 7.25 and 2.75 times more expensive than treating OPMDs and early-stage oral cancer, respectively [17]. Early detection of oral cancer is therefore very important: it could reduce the economic burden of the disease, increase the survival rate and improve patients' quality of life. The aim of this study is to evaluate the performance of CNN-based algorithms for the classification and detection of OPMDs and OSCC in oral photographic images, and to compare the automatic classification performance of these algorithms with that of experts (board-certified oral and maxillofacial surgeons) and GPs.
These automatic models, combined with clinical data, are expected to provide a new diagnostic tool for GPs to improve the accuracy of early detection of cancerous lesions and to support expert-level decision making in the oral cancer screening program.

Materials and methods

Data description

This study was approved by the Human Research Ethics Committee of Thammasat University (COE 020/2563) and was performed in accordance with the tenets of the Declaration of Helsinki. Informed consent was waived because of the retrospective nature of the study and the full anonymization of the images. All clinical oral photographs analyzed in this study were collected retrospectively from the Oral and Maxillofacial Surgery Center of Thammasat University and Khon Kaen University between January 2009 and December 2020. The photographs were captured from various areas of the oral cavity, and their resolution varied from 1081 x 836 pixels (smallest) to 4496 x 3000 pixels (largest). The dataset of 980 images was divided into 365 images of OSCC, 315 images of OPMDs and 300 non-pathological oral images. Non-pathological oral images were defined as images of oral mucosa showing no pathological lesions, such as pigmented lesions, OPMDs or malignant lesions. The reference images of OSCC, OPMDs and non-pathological oral mucosa came from various areas of the oral cavity, including the buccal mucosa, tongue, upper and lower alveolar ridges, floor of the mouth, retromolar trigone and lip. All OSCC and OPMD images were biopsy-proven, with the diagnosis confirmed by oral pathologists as the gold standard. The OSCC images covered stages I-IV according to the TNM clinical staging system of the American Joint Committee on Cancer (AJCC) [18]. The OPMD images used for analysis showed oral leukoplakia, erythroplakia, erythroleukoplakia, white striae, and erythematous lesions surrounded by white striae, with pathological diagnoses of mild, moderate or severe epithelial dysplasia, hyperkeratosis and oral lichen planus.

Experiment

All photographic images were uploaded to the VisionMarker server and web application for image annotation (Digital Storemesh, Bangkok, Thailand), a public version of which is available on GitHub (GitHub, Inc., CA, USA). The lesion boundaries in the OSCC and OPMD images were annotated by three oral and maxillofacial surgeons. Because manual labeling differed from one surgeon to another, the ground truth used for CNN training, validation and testing was the largest area of intersection among all of the surgeons' annotations (Fig 1).
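The consensus rule can be sketched as follows. This is a minimal illustration under two assumptions not stated in the text: each surgeon's outline is reduced to an axis-aligned bounding box (the detection models were trained on boxes), and "largest area of intersection" is read as the region common to all three annotations. The names `consensus_box` and `box_area` and the coordinates are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def consensus_box(boxes: Sequence[Box]) -> Optional[Box]:
    """Return the region shared by every annotator's box, or None if there is no overlap."""
    if not boxes:
        return None
    x_min = max(b[0] for b in boxes)
    y_min = max(b[1] for b in boxes)
    x_max = min(b[2] for b in boxes)
    y_max = min(b[3] for b in boxes)
    if x_min >= x_max or y_min >= y_max:
        return None  # annotators do not overlap at all; such cases would need review
    return (x_min, y_min, x_max, y_max)


def box_area(box: Box) -> int:
    """Area of an axis-aligned box in square pixels."""
    return (box[2] - box[0]) * (box[3] - box[1])


# Three hypothetical surgeon annotations of the same lesion.
surgeons = [(100, 120, 400, 380), (90, 130, 410, 360), (110, 110, 390, 370)]
gt = consensus_box(surgeons)
print(gt, box_area(gt))  # (110, 130, 390, 360) 64400
```

Taking the intersection is conservative: a pixel enters the ground truth only if every surgeon included it, so disagreement at the lesion margin shrinks the box rather than enlarging it.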

Fig 1

Examples of OSCC and OPMD images from the dataset.

(A) OSCC image; (B) annotation of OSCC image by surgeons; (C) OPMDs image; (D) annotation of OPMDs image by surgeons.

Image classification

Image classification refers to computer algorithms that assign an image to a class according to its visual content. In this work, the CNN-based image classification networks DenseNet-169, ResNet-101, SqueezeNet and Swin-S were adopted to build multiclass image classification models that distinguish “OSCC” and “OPMDs” from non-pathological oral images in oral photographs. The image classification experiments were run on Google Colab (Google Inc., CA, USA) using a Tesla P100 GPU, Nvidia driver 460.32 and CUDA 11.2 (Nvidia Corporation, CA, USA). The images were augmented using Keras ImageDataGenerator (open-source software), and the framework then resized the input images to 224 x 224 pixels before feeding them into the neural network. DenseNet-169 and ResNet-101 were initialized with weights pre-trained on ImageNet, whereas SqueezeNet and Swin-S were trained from scratch. Each network was modified to output a three-dimensional vector, one element per class (OSCC, OPMDs and non-pathological oral image), with a softmax activation function. The hyperparameters were as follows: a maximum of 43 epochs, a batch size of 32 and a learning rate of 0.00001, except for Swin-S, which used a maximum of 100 epochs and a batch size of 16. The validation loss was very close to the training loss, with no significant indication of over-fitting. The details of each image classification algorithm were as follows: Densely Connected Convolutional Networks (DenseNet) was proposed by Huang et al. [19] as a CNN-based classification algorithm that connects all layers (with matching feature-map sizes) directly with each other.
DenseNet exploits the potential of the network through feature reuse, yielding condensed models that are easy to train and highly parameter-efficient, which makes it a good feature extractor for computer vision tasks that build on convolutional features. Residual Networks (ResNet) were developed by He et al. [20] as an architecture in which layers are reformulated to learn residual functions with reference to the layer inputs. This residual learning framework allows accuracy to improve with considerably increased depth, producing results substantially better than previous networks. SqueezeNet was proposed by Iandola et al. [21] as a small CNN architecture that reduces the number of parameters while maximizing accuracy under a limited parameter budget; with model compression techniques, it can be reduced to less than 0.5 MB. SqueezeNet has 50x fewer parameters than an earlier CNN, AlexNet, while maintaining AlexNet-level accuracy. Swin Transformer (Swin) was presented by Liu et al. [22] as a vision transformer that produces a hierarchical feature representation and has linear computational complexity with respect to input image size. Its shifted-window self-attention design has been shown to be effective and efficient for image classification.
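The last step of the classification models described above, a softmax over one output per class, can be illustrated with the minimal sketch below. The class order, the function names and the example logits are hypothetical, and the backbone (DenseNet-169, ResNet-101, SqueezeNet or Swin-S) that would produce the logits is not shown.

```python
import math
from typing import Dict, List, Sequence

# Assumed class order for the three-element output vector (illustrative only).
CLASSES = ("OSCC", "OPMDs", "non-pathological")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtracting the largest logit avoids overflow in exp()."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: Sequence[float]) -> Dict[str, float]:
    """Map a backbone's three logits to per-class probabilities."""
    if len(logits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} logits, got {len(logits)}")
    return dict(zip(CLASSES, softmax(logits)))


# Hypothetical logits for one image; in practice they come from the network's last layer.
probs = classify([2.0, 0.5, -1.0])
prediction = max(probs, key=probs.get)
print(prediction, round(sum(probs.values()), 6))  # OSCC 1.0
```

Because softmax makes the three outputs sum to one, the same per-class probabilities can be reused directly as scores for the ROC analysis.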

Object detection

Lesion detection is another key to successful disease diagnosis, and CNN-based object detection has been shown to be effective in identifying disease in images. In this study, Faster R-CNN, YOLOv5, RetinaNet and CenterNet2 were adopted to detect OSCC and OPMD lesions in oral photographic images. The object detection experiment used the images annotated in VisionMarker (Digital Storemesh, Bangkok, Thailand). Lesion locations were marked with bounding boxes, and the resulting image-annotation pairs were used for training. The images were augmented using Keras ImageDataGenerator (open-source software), and the framework then resized each input image to 256 x 256 pixels (640 x 640 pixels for YOLOv5) before feeding it into the neural network. Training was performed on an on-premise server with two TitanXP 12 GB GPUs, Nvidia driver 450.102 and CUDA 11.0 (Nvidia Corporation, CA, USA). Faster R-CNN, YOLOv5, RetinaNet and CenterNet2 were initialized with weights pre-trained on the COCO detection dataset, and all networks were trained using stochastic gradient descent (SGD). The hyperparameters were as follows: 20,000 iterations, a maximum of 1,882 epochs, a learning rate of 0.0025 and a batch size per image of 128, except for YOLOv5, which used a maximum of 200 epochs, a learning rate of 0.01 and a batch size per image of 8. The training loss decreased and plateaued between 15,000 and 20,000 iterations. The details of each object detection algorithm were as follows: the Faster region-based convolutional neural network (Faster R-CNN) was introduced by Ren et al. [23] as a CNN-based object detection framework. This algorithm combines an earlier object detection system, Fast R-CNN, with Region Proposal Networks (RPNs) in a single network that shares their convolutional features, enabling near real-time object detection.
This design significantly improved both speed and accuracy compared with the original R-CNN. Faster R-CNN was one of the earliest object detectors to tackle localization and classification in a single deep network, so the visual features are computed once for both tasks in a single forward pass, an approach known as end-to-end detection. The input image is passed through a backbone CNN, such as VGG, to obtain an intermediate feature tensor, which is then sent to two sibling subnetworks: one regresses the bounding box location and the other computes the classification. The loss function is defined as

$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i, t_i^*)$$

where $L$ is the total loss, $i$ is the index of an anchor in a mini-batch, $p_i$ is the predicted probability that anchor $i$ contains an object and $p_i^*$ is its ground-truth label (1 for a positive anchor, 0 otherwise), $t_i$ and $t_i^*$ are the predicted and ground-truth bounding box coordinates, $N_{cls}$ is the number of anchors (sliding-window sub-images) in the mini-batch, $L_{cls}$ is the log loss of classification, $\lambda$ is a hyperparameter that balances the two loss terms, $N_{reg}$ is the number of anchor locations and $L_{reg}$ is the location regression loss computed with the robust smooth L1 loss function [24]. You only look once (YOLO) was proposed by Redmon et al. [25] as a CNN-based object detection algorithm that reframes object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. The YOLO design enables end-to-end training and real-time speeds while maintaining high average precision. Building on the high-accuracy baseline set by Faster R-CNN, YOLO addressed another aspect of the object detection problem by dramatically increasing the frame rate to 45 frames per second on a Titan X GPU (Nvidia Corporation, CA, USA).
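The Faster R-CNN region proposal loss described above can be sketched in plain Python. In Ren et al.'s formulation the regression term of each anchor is multiplied by its ground-truth label (1 for an object anchor, 0 for background), so only object anchors contribute to it. This is an illustrative re-implementation, not the code used in this study; the function names are hypothetical, and the default λ = 10 follows the original Faster R-CNN paper rather than any setting reported here.

```python
import math
from typing import Sequence, Tuple

# Parameterized box offsets (tx, ty, tw, th) for one anchor.
Offsets = Tuple[float, float, float, float]


def smooth_l1(x: float) -> float:
    """Robust regression loss: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def log_loss(p: float, label: int, eps: float = 1e-12) -> float:
    """Binary log loss for the object-vs-background score of one anchor."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if label == 1 else -math.log(1.0 - p)


def rpn_loss(p: Sequence[float], p_star: Sequence[int],
             t: Sequence[Offsets], t_star: Sequence[Offsets],
             n_cls: int, n_reg: int, lam: float = 10.0) -> float:
    """L = (1/N_cls) sum_i L_cls(p_i, p_i*) + lam (1/N_reg) sum_i p_i* L_reg(t_i, t_i*)."""
    cls_term = sum(log_loss(pi, li) for pi, li in zip(p, p_star)) / n_cls
    # Multiplying by p_i* means only positive (object) anchors contribute to regression.
    reg_term = sum(
        li * sum(smooth_l1(a - b) for a, b in zip(ti, tsi))
        for li, ti, tsi in zip(p_star, t, t_star)
    ) / n_reg
    return cls_term + lam * reg_term


# Two hypothetical anchors: one object anchor, one background anchor.
loss = rpn_loss(
    p=[0.9, 0.2], p_star=[1, 0],
    t=[(0.1, 0.0, 0.2, -0.1), (0.5, 0.5, 0.5, 0.5)],
    t_star=[(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)],
    n_cls=2, n_reg=2, lam=1.0,
)
print(round(loss, 6))  # 0.179252
```

In the example, the background anchor's large regression error adds nothing to the loss because its label is 0; it contributes only through the classification term.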
The intersection over union (IoU) metric is emphasized in the YOLO work to make bounding box localization more accurate, and reframing object detection as a single regression problem reduces computation and yields a high frame rate. RetinaNet was proposed by Lin et al. [26] as a simple one-stage object detector with a new loss function that is a more effective alternative to previous approaches for dealing with class imbalance; this design achieves state-of-the-art accuracy and speed for object detection. The new loss, the focal loss, adds a modulating factor to the standard cross-entropy loss that down-weights easy examples, improving the accuracy of dense object detectors. Furthermore, the RetinaNet architecture adopts a Feature Pyramid Network (FPN), whose top-down pathway with lateral connections merges high-level features with the features extracted at each level, giving multi-scale feature extraction; as a result, RetinaNet can detect both small and large objects effectively. CenterNet2 was developed by Zhou et al. [27] as a probabilistic interpretation of two-stage detectors. This algorithm is a simple modification of standard two-stage detector training that optimizes a lower bound to a joint probabilistic objective over both stages, achieving desirable speed and accuracy for object detection. CenterNet2 revisits the two-stage object detection model, in which the first stage computes the probability that an object is present (the object likelihood) to obtain the bounding box, and the second stage classifies the object.
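The focal loss used by RetinaNet can be written for a single binary prediction as FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The sketch below compares it with plain cross-entropy; γ = 2 and α = 0.25 are the defaults reported by Lin et al., not values reported for this study, and the function names are hypothetical.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Standard binary cross-entropy, written as -log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted probability of the positive class (0 < p < 1); y is 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background example (p = 0.1 for a true negative) versus a hard one (p = 0.9).
for p in (0.1, 0.9):
    print(p, round(cross_entropy(p, 0), 4), round(focal_loss(p, 0), 4))
```

With these defaults the easy negative's loss is about 133 times smaller than its cross-entropy, while the hard negative's loss shrinks by only about 1.6 times, which is how the focal loss keeps the many easy background regions from dominating training.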
The major difference in CenterNet2 is that classification combines the object likelihood with a conditional probability:

$$P(C_k) = P(C_k \mid O_k)\,P(O_k)$$

where $k$ is the index of a detected bounding box, $P(O_k)$ is the first-stage object likelihood, $P(C_k \mid O_k)$ is the conditional probability that the detected object belongs to class $C$, and $P(C_k)$ is the probability that bounding box $k$ belongs to class $C$.

To evaluate the performance of the image classification and object detection networks, five-fold cross-validation was employed. The data were split into five subsets using random sampling, with equal numbers of OSCC, OPMD and non-pathological oral images in each subset. One subset was used as the testing set, while the remaining four subsets were used as training and validation sets. This process was repeated five times so that every subset served once as the testing set.
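The five-fold protocol can be sketched as below, under the assumption that "equal numbers" means each fold receives the same share of every class (stratified sampling). How the four remaining folds were further divided into training and validation sets is not stated, so that step is omitted; the function names and the random seed are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[str], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices within each class, then deal them round-robin into k folds."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(by_class):  # sorted for reproducibility
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_validation_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_and_validation, test) index lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield rest, test


# Class counts from the study's dataset.
labels = ["OSCC"] * 365 + ["OPMDs"] * 315 + ["non-pathological"] * 300
folds = stratified_folds(labels)
print([len(f) for f in folds])  # [196, 196, 196, 196, 196]
```

Because 365, 315 and 300 are all divisible by five, every fold here holds exactly 73 OSCC, 63 OPMD and 60 non-pathological images.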

Evaluation measures

This study used metrics commonly applied to evaluate machine learning algorithms in bioinformatics [28]. The CNN-based image classification models were evaluated using precision, recall (sensitivity), specificity, F1 score, and the area under the receiver operating characteristic curve (AUC of ROC) to measure their performance in classifying OSCC and OPMDs in oral photographic images. The classification performance of the models was also examined by generating heat map visualizations with gradient-weighted class activation mapping (Grad-CAM) [29] to show how the models classify and identify OSCC and OPMDs in photographic images. For object detection, the ability of the CNN-based models to produce a bounding box matching the ground truth region in OSCC and OPMD images was evaluated by precision, recall, F1 score and AUC of the precision-recall curve. If the IoU between the generated bounding box and the ground truth was less than 0.5, the predicted bounding box was considered a false positive. A test dataset with known pathological results was used to compare the performance of the CNN-based classification models with that of 20 clinicians: 10 board-certified oral and maxillofacial surgeons and 10 GPs with at least 2 years of experience in dental practice in rural hospitals. None of these readers participated in the clinical care or assessment of the enrolled patients, nor did they have access to their medical records. The overall sensitivity and specificity of these clinicians were calculated. Data analyses were conducted using SPSS version 22.0 (SPSS, Chicago, IL). The outcomes for image classification and object detection were defined as follows: True positive (TP): a positive outcome that the model predicted correctly (for detection, with IoU > 0.5). False positive (FP): a positive outcome that the model predicted incorrectly (for detection, with IoU < 0.5).
True negative (TN): negative outcomes that the model predicted correctly. False negative (FN): negative outcomes that the model predicted incorrectly.
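The outcome definitions above translate directly into the standard metric formulas. The sketch below, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max), computes IoU, applies the IoU > 0.5 rule for a true positive, and derives precision, recall (sensitivity), specificity and F1 score from TP, FP, TN and FN counts. All function names are hypothetical. A box with IoU of exactly 0.5 is not covered by the stated rules; here it is treated as a false positive.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A predicted box counts as TP only if it overlaps the ground truth with IoU > threshold."""
    return iou(pred, truth) > threshold


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    s = precision + recall
    return 2 * precision * recall / s if s else 0.0


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Precision, recall (sensitivity), specificity and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1_score(precision, recall)}


print(is_true_positive((0, 0, 10, 10), (1, 1, 11, 11)))  # True  (IoU = 81/119)
print(is_true_positive((0, 0, 10, 10), (5, 0, 15, 10)))  # False (IoU = 50/150)
print(round(f1_score(0.74, 0.39), 2))                    # 0.51
```

As a consistency check, the YOLOv5 OPMD values in Table 2 (precision 0.74, recall 0.39) give an F1 score of about 0.51, matching the reported value.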

Results

Image classification results

The multiclass classification models were evaluated on the test set, and the precision, recall (sensitivity), specificity, F1 score and AUC of the ROC curve of each CNN-based model are reported in Table 1. For the ten oral and maxillofacial surgeons, the overall sensitivity and specificity were 0.90 (95% CI = 0.85–0.96) and 0.89 (95% CI = 0.81–0.97) for OSCC and 0.74 (95% CI = 0.61–0.87) and 0.93 (95% CI = 0.90–0.96) for OPMDs, respectively. For the ten GPs, the overall sensitivity and specificity were 0.77 (95% CI = 0.70–0.85) and 0.87 (95% CI = 0.85–0.90) for OSCC and 0.68 (95% CI = 0.62–0.75) and 0.86 (95% CI = 0.82–0.90) for OPMDs, respectively. Fig 2 shows an example of the Grad-CAM visualization of the DenseNet-169 output for the OSCC and OPMD classes, in which the model correctly classifies and localizes OSCC and OPMDs in photographic images.
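Grad-CAM builds its heat map by weighting each feature map A^k of a chosen convolutional layer by α_k, the spatial average of the gradient of the class score with respect to that map, summing the weighted maps, and keeping only positive values (ReLU). The sketch below implements that combination step in plain Python. It assumes the activations and gradients have already been extracted from the trained network (for example, through a framework's automatic differentiation); the function name and the toy inputs are hypothetical, and the final rescaling to [0, 1] is a common display convention rather than part of the definition.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def grad_cam(activations: Sequence[Matrix], gradients: Sequence[Matrix]) -> Matrix:
    """Heat map = ReLU(sum_k alpha_k * A^k), alpha_k = mean over pixels of dy_c/dA^k."""
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for a_k, g_k in zip(activations, gradients):
        alpha_k = sum(sum(row) for row in g_k) / (h * w)  # global-average-pooled gradient
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha_k * a_k[i][j]
    cam = [[max(0.0, v) for v in row] for row in cam]  # keep positive evidence only
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Two hypothetical 2x2 feature maps: the first supports the class, the second opposes it.
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```

In a real model the coarse heat map is then upsampled to the input resolution and overlaid on the photograph, as in Fig 2.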

Table 1

Multi-class image classification performances of CNN algorithms on the test sets compared with the average performance of clinicians; ‘oral and maxillofacial surgeons’ vs. ‘GPs’.

Model	OSCC precision	OSCC recall (sensitivity)	OSCC specificity	OSCC F1 score	OSCC AUC of ROC curve	OPMDs precision	OPMDs recall (sensitivity)	OPMDs specificity	OPMDs F1 score	OPMDs AUC of ROC curve
DenseNet-169	0.98	0.99	0.99	0.98	1.00	0.95	0.95	0.97	0.95	0.98
ResNet-101	0.96	0.92	0.94	0.94	0.99	0.97	0.97	0.94	0.97	0.97
SqueezeNet	0.85	0.72	0.92	0.78	0.88	0.76	0.78	0.88	0.77	0.87
Swin-S	0.69	0.73	0.83	0.71	0.71	0.63	0.74	0.88	0.68	0.80
Oral and maxillofacial surgeons	-	0.90	0.89	-	-	-	0.74	0.93	-	-
GPs	-	0.77	0.87	-	-	-	0.68	0.86	-	-

AUC, area under the curve; ROC, receiver operating characteristic; GPs, general practitioners.
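The AUC of ROC values in Table 1 are per-class figures for a three-class problem. The paper does not state how they were computed; a common approach, assumed in the sketch below, is one-vs-rest: each test image is scored by the predicted probability of the target class, and all other classes are treated as negatives. The rank-based formula used here is equivalent to the trapezoidal area under the empirical ROC curve; the function names and example probabilities are hypothetical.

```python
from typing import Dict, Sequence


def roc_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Probability that a random positive outscores a random negative (ties count one half).

    This equals the trapezoidal area under the empirical ROC curve.
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(probs: Sequence[Dict[str, float]], labels: Sequence[str], target: str) -> float:
    """AUC for one class, treating every other class as negative."""
    return roc_auc([p[target] for p in probs], [y == target for y in labels])


# Hypothetical softmax outputs for four test images.
probs = [
    {"OSCC": 0.90, "OPMDs": 0.05, "non-pathological": 0.05},
    {"OSCC": 0.60, "OPMDs": 0.30, "non-pathological": 0.10},
    {"OSCC": 0.20, "OPMDs": 0.70, "non-pathological": 0.10},
    {"OSCC": 0.40, "OPMDs": 0.50, "non-pathological": 0.10},
]
labels = ["OSCC", "OPMDs", "OPMDs", "non-pathological"]
print(one_vs_rest_auc(probs, labels, "OSCC"), one_vs_rest_auc(probs, labels, "OPMDs"))  # 1.0 0.75
```

The pairwise loop is quadratic in the number of images, which is acceptable for a test fold of a few hundred images; a sort-based implementation would scale better.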

Table 2

Multi-class object detection performances of CNN algorithms on the test sets.

Model	OSCC precision	OSCC recall (sensitivity)	OSCC F1 score	OSCC AUC of precision-recall curve	OPMDs precision	OPMDs recall (sensitivity)	OPMDs F1 score	OPMDs AUC of precision-recall curve
Faster R-CNN	0.84	0.90	0.87	0.88	0.60	0.71	0.65	0.64
YOLOv5	0.88	0.86	0.87	0.84	0.74	0.39	0.51	0.34
RetinaNet	0.98	0.82	0.89	0.81	0.92	0.57	0.70	0.55
CenterNet2	0.64	0.92	0.76	0.91	0.49	0.60	0.54	0.58
Oral and maxillofacial surgeons	-	0.90	-	-	-	0.74	-	-
GPs	-	0.77	-	-	-	0.68	-	-

AUC, area under the curve; GPs, general practitioners.
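Table 2 reports the AUC of the precision-recall curve for each detector. The evaluation protocol (for example, the interpolation scheme) is not specified, so the sketch below assumes one common choice, all-point interpolated average precision: detections are ranked by confidence, each is marked as a true positive if it matched a ground-truth box (the IoU > 0.5 rule in the evaluation measures), and the area under the monotone precision envelope is summed. The function names and detections are hypothetical.

```python
from typing import List, Sequence, Tuple

# (confidence score, matched an unclaimed ground-truth box with IoU > 0.5?)
Detection = Tuple[float, bool]


def precision_recall_points(dets: Sequence[Detection], n_gt: int) -> List[Tuple[float, float]]:
    """(recall, precision) after each detection, ranked by descending confidence."""
    if n_gt <= 0:
        raise ValueError("n_gt must be positive")
    tp = fp = 0
    points = []
    for _, matched in sorted(dets, key=lambda d: d[0], reverse=True):
        if matched:
            tp += 1
        else:
            fp += 1
        points.append((tp / n_gt, tp / (tp + fp)))
    return points


def average_precision(dets: Sequence[Detection], n_gt: int) -> float:
    """Area under the PR curve with all-point interpolation (precision envelope)."""
    points = precision_recall_points(dets, n_gt)
    recalls = [0.0] + [r for r, _ in points]
    precisions = [p for _, p in points]
    # Make precision non-increasing in recall by taking the maximum to the right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    return sum((recalls[i + 1] - recalls[i]) * precisions[i] for i in range(len(precisions)))


# Hypothetical detections on images containing four ground-truth lesions.
dets = [(0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.60, False)]
print(average_precision(dets, n_gt=4))  # 0.6875
```

Other conventions, such as COCO-style 101-point interpolation, give slightly different values for the same detections, so AUC of precision-recall figures are only comparable across studies that use the same protocol.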

Fig 2

Example of the Grad-CAM visualization of the DenseNet-169.

(A) Image with an OSCC lesion; (B) the model correctly classified OSCC and highlighted the correct location; (C) image with an OPMD lesion; (D) the model correctly classified OPMDs and highlighted the correct location.

Object detection results

The object detection models were evaluated on the test set, and their precision, recall, F1 score and AUC of the precision-recall curve are reported in Table 2. Examples of detection outputs from the CNN-based object detection models in this study are provided in Fig 3.

Fig 3

(A-B) Ground-truth bounding boxes based on the surgeons' annotations of images of patients with OSCC at the retromolar trigone and lateral tongue, respectively; (C-D) ground-truth bounding boxes based on the surgeons' annotations of images of patients with OPMDs at the retromolar trigone and lateral tongue, respectively; (E-H) true positive outputs from Faster R-CNN detection; (I-L) true positive outputs from YOLOv5 detection; (M-P) true positive outputs from RetinaNet detection; (Q-T) true positive outputs from CenterNet2 detection.

Discussion

Oral cancer screening is an important part of an oral examination, with the goal of identifying changes that indicate the development of oral cancer. It is commonly known that OSCC, the most common oral cancer, is often preceded by OPMDs [11]. Patients with oral lesions are often first seen by GPs, both medical and dental, so GPs are in a unique position to detect oral cancer at early stages. Nevertheless, several studies have indicated that GPs' lack of knowledge and awareness of oral cancer diagnosis, especially of its early signs, is the most significant factor in delaying referral and treatment of oral cancer [30, 31]. Delay in diagnosing oral cancer may lead to more invasive treatment, resulting in greater morbidity of oral functions, such as impaired speech, chewing and swallowing, which significantly impacts an individual's quality of life [13]. When diagnosed at an advanced stage, fewer than 50% of oral cancer patients survive more than 5 years, and this rate has remained disappointingly low and relatively constant during the last few decades [10, 15]. Therefore, early detection of oral cancer, especially OPMDs or early-stage OSCC, with appropriate referral to specialists is crucial to control the disease and improve patients' survival and quality of life. Screening for oral cancer is largely based on visual examination. Current adjunctive diagnostic aids for oral cancer screening include oral cytology, vital staining with toluidine blue and light-based detection systems, e.g., VELscope. However, there is no definitive evidence that any of these technologies improves the sensitivity or specificity of oral cancer screening beyond the oral examination [32]. In recent years, AI techniques have improved performance in image analysis, with a range of promising applications in medicine.
The growing volume of medical image data, together with advances in learning algorithms, is accelerating the development of AI-based image analysis, which promises to improve the efficiency, effectiveness and speed of diagnosis and to enable new insights into diagnoses, treatment options and patient outcomes [33]. Advances in computer vision and AI technology that improve visual detection can be combined with clinical data to assist visual examination as a novel diagnostic tool in oral cancer screening. In this work, the CNN-based image classification models performed well in identifying OSCC and OPMDs. In particular, DenseNet-169 and ResNet-101 achieved near-perfect AUCs, similar to the CNN models for classifying OSCC and OPMDs in oral images reported by Fu et al. [34], Tanriver et al. [35] and Song et al. [36], and more accurate than those of Welikala et al. [37]. These differences in model performance may stem from variations in the class distribution of each study. DenseNet-169 and ResNet-101 are well-optimized algorithms that achieve high performance in image classification and are widely used in the medical field. However, they are large CNN architectures that require a high-performance computing server for classification, which may not be appropriate for a mobile application for oral cancer screening. This work therefore also tested newer and smaller models, SqueezeNet and Swin-S, for classifying OSCC and OPMDs in oral photographic images. SqueezeNet and Swin-S showed acceptable accuracy, with AUCs of 0.71–0.88, lower than those of DenseNet-169 and ResNet-101, but their smaller architectures are better suited to development into a mobile application for oral cancer screening.
In the medical field, SqueezeNet has been used successfully to diagnose coronavirus disease 2019 (COVID-19) from chest X-ray images [38]. To the best of our knowledge, this is the first study to use Swin-S for the classification of oral lesions. Previous studies [35–37, 39, 40] demonstrated the classification potential of various CNN-based algorithms for oral lesions in photographic images without comparing them with clinicians' diagnostic decisions. A strength of this study was the use of histopathological diagnosis as the ground truth. The results showed that the best-performing CNN-based classification models classified OSCC and OPMDs in oral photographs at a level comparable to that of experts (board-certified oral and maxillofacial surgeons) and superior to that of GPs. Moreover, DenseNet-169 and ResNet-101 even outperformed the experts' classification performance. For the detection of oral lesions, the CNN-based object detection models used in this study performed well for OSCC, with AUCs of 0.81–0.91, but less well for OPMDs, with AUCs of 0.34–0.64. Faster R-CNN, one of the CNN-based object detection algorithms commonly used for medical images, performed well in detecting OSCC and OPMDs, with AUCs of 0.88 and 0.64, respectively. Its precision, recall and F1 score were higher than those reported in the previous study by Welikala et al. [37] for detecting OSCC and OPMDs in oral photographs, which may reflect the different number of classes in the two studies. CNN-based object detection continues to be developed to increase the accuracy of detecting objects of interest. CenterNet2, one of the latest CNN-based object detection models, achieved the highest performance in detecting OSCC, with an AUC of 0.91, but was slightly inferior to Faster R-CNN in detecting OPMDs.
Overall detection performance for OPMDs was lower than for OSCC, which may result from the general characteristics of OPMDs in the oral cavity that make them difficult to recognize, even for experts. The lowest-performing model for detecting OPMDs was YOLOv5, with a precision of 0.74, a recall of 0.39, an F1 score of 0.51 and an AUC of 0.34; even so, these results were comparable to those of the study by Tanriver et al. [35]. This lower performance may be because YOLOv5 is an extremely fast detection model, with an operating time of only 0.07 seconds per frame [25], and a high-speed model of this type may not be appropriate for detecting the features of OPMDs in oral images. Deep CNN models have shown potential for the binary classification and detection of OPMDs [39] and OSCC [40] in oral photographs. In real-world scenarios, however, the clinical appearance of OPMDs varies considerably and can mimic malignancy, and vice versa. For this reason, multiclass classification and object detection were explored using several CNN-based algorithms in this study. The AUCs of the best multiclass CNN models were comparable to those of binary classification and detection. As the focus of AI shifts from model and algorithm development to the quality of the data used to train the models [41], the limitations of this study need to be addressed. First, the dataset was small and included only OSCC and OPMD lesions. Second, labeling lesions on oral photographic images required experts to identify the ground truth on each image, which was time-consuming. In future work, we plan to develop a CNN-based mobile application to collect more data and to expand the image dataset to include other oral lesions, such as pigmented and submucosal lesions, from multiple cancer centers and hospitals in remote areas.
In addition, we plan to integrate the system into the clinical workflow so that experts can label the ground truth of lesions in the images. This would not only save time in the labeling process but also give experts more opportunity to study the details of each lesion thoroughly.

Conclusions

CNN-based models showed diagnostic performance comparable to that of experts in classifying OSCC and OPMDs on oral photographic images. In particular, DenseNet-169 and ResNet-101 even exceeded expert-level classification performance. Such models are expected to serve as a novel diagnostic tool to assist clinicians, especially GPs, in improving the accuracy of early detection of cancerous lesions and to support expert-level decision making in oral cancer screening programs.

The receiver operating characteristic (ROC) curve of high performance CNN-based multiclass classification models.

(PDF)

Normalized confusion matrix of high performance CNN-based multiclass classification models.

(PDF)

The precision-recall curve of CNN-based object detection models.

(PDF)

Object detection matrix of CNN-based object detection models.

(PDF)

27 Apr 2022

PONE-D-22-06226

AI-based analysis of oral lesions using novel deep convolutional neural networks for early detection of oral cancer

PLOS ONE

Dear Dr. Warin,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 11 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Ayan Seal, Ph.D
Academic Editor
PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. 
If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1:
1. The study presents the results of primary scientific research.
2. Results reported have not been published elsewhere.
3. Experiments, statistics, and other analyses are performed to a high technical standard and are described in sufficient detail.
4. Conclusions are presented in an appropriate fashion and are supported by the data.
5. The article is presented in an intelligible fashion and is written in standard English.
6. The research meets all applicable standards for the ethics of experimentation and research integrity.
7. The article adheres to appropriate reporting guidelines and community standards for data availability.

Reviewer #2: Briefly try to describe the models which are used in object detection by what all layers they comprise of, and in what fashion. If possible try to highlight the importance of one or more layers and in what way they are supportive for your case. Also try to give some mathematical expression/equations support to your research.

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Dr. Rajashekhargouda C. Patil

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

5 Jul 2022

Response to Reviewers

Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Response: Thank you for your suggestion. We checked the references to ensure that they meet the requirements of the journal.

Reviewer #1:
1. The study presents the results of primary scientific research.
2. Results reported have not been published elsewhere.
3. Experiments, statistics, and other analyses are performed to a high technical standard and are described in sufficient detail.
4. Conclusions are presented in an appropriate fashion and are supported by the data.
5. The article is presented in an intelligible fashion and is written in standard English.
6. The research meets all applicable standards for the ethics of experimentation and research integrity.
7. The article adheres to appropriate reporting guidelines and community standards for data availability.

Response: We thank the reviewer for the comments.

Reviewer #2: Briefly try to describe the models which are used in object detection by what all layers they comprise of, and in what fashion. If possible try to highlight the importance of one or more layers and in what way they are supportive for your case. Also try to give some mathematical expression/equations support to your research.

Response: We thank the reviewer for the comment. We have added more detail on the layers of the object detection algorithms, an additional reference (ref 24), and mathematical expressions/equations to the Object detection subsection of the Materials and Methods section.

Line 181: Faster R-CNN is an early object detection model proposed to tackle both the localization and classification problems in a single deep learning network, so that the convolutional features are computed once for both problems in a single forward pass of the network (also known as end-to-end). The input image is passed through a CNN backbone such as the VGG network to obtain an internal latent tensor (intermediate layer), which is then sent to two separate subnetworks: the first performs bounding box location regression and the second computes the classification. The loss function is defined as $L = \frac{1}{N_{cls}} \sum_i L_{cls,i} + \lambda \frac{1}{N_{reg}} \sum_i L_{reg,i}$, where $L$ is the total loss, $i$ is the index of an anchor in a mini-batch, $N_{cls}$ is the number of possible sub-images from the sliding window, $L_{cls}$ is the log loss of the classification, $\lambda$ is a hyperparameter that balances the two loss terms, $N_{reg}$ is the number of anchor locations, and $L_{reg}$ is the loss function for location regression, computed from the robust loss function (smooth L1) [24].

Reference 24: Girshick R. Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV); 7-13 Dec 2015.

Line 196: Following the early success of Faster R-CNN as a high-accuracy baseline, YOLO tackled another aspect of the object detection problem by dramatically increasing the frame rate to 45 frames per second on a Titan X GPU (Nvidia Corporation, CA, USA). YOLO emphasizes the intersection over union (IoU) metric to make bounding box localization more accurate, and it reframes object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities, which requires less computation and yields a high frame rate.

Line 206: RetinaNet introduced a novel loss function, the focal loss, which adds a modulating factor to the standard cross-entropy loss to improve the accuracy of dense object detectors. Furthermore, the RetinaNet architecture adopts the Feature Pyramid Network (FPN), whose top-down pathway and lateral connections merge top-level features into the feature maps of each layer. This gives multi-scale feature extraction, so RetinaNet is able to detect both the smallest and the largest objects effectively.

Line 215: CenterNet2 revisited the two-stage object detection model, in which the first stage computes the probability of an object in the image, also called the object likelihood, to obtain the bounding box, and the second stage classifies the object. The major difference in CenterNet2 is that it combines the object likelihood with a conditional class probability: $P(C_k) = P(C_k \mid O_k)\,P(O_k)$, where $k$ is the index of the detected bounding box, $P(O_k)$ is the first-stage object likelihood, $P(C_k \mid O_k)$ is the conditional probability that the given object belongs to class $C_k$, and $P(C_k)$ is the probability that bounding box $k$ belongs to class $C_k$.

Submitted filename: Response to Reviewers.docx

10 Aug 2022

AI-based analysis of oral lesions using novel deep convolutional neural networks for early detection of oral cancer

PONE-D-22-06226R1

Dear Dr. Warin,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.
Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Ayan Seal, Ph.D
Academic Editor
PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #2: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Rajashekhargouda C. Patil

16 Aug 2022

PONE-D-22-06226R1

AI-based analysis of oral lesions using novel deep convolutional neural networks for early detection of oral cancer

Dear Dr. Warin:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Ayan Seal
Academic Editor
PLOS ONE
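The authors' response to Reviewer #2 describes three formulas: the Faster R-CNN multi-task loss, the focal loss introduced with RetinaNet, and the CenterNet2 factorization P(C_k) = P(C_k | O_k) P(O_k). The following minimal pure-Python sketch evaluates each one on scalar toy inputs. It is an illustration under assumptions, not the authors' training code: all function names, the default λ = 1 and the example values are hypothetical. Two details go beyond the response as quoted: following the original Faster R-CNN formulation, only positive (object) anchors contribute to the regression term, and the focal-loss defaults γ = 2 and α = 0.25 are the settings reported to work best in the RetinaNet paper.

```python
import math
from typing import Sequence

# Hypothetical helper names; not taken from the study's code.


def smooth_l1(x: float) -> float:
    """Robust (smooth L1) loss from Fast R-CNN: 0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def _clip(p: float, eps: float = 1e-12) -> float:
    """Keep probabilities away from 0 and 1 so that log() stays finite."""
    return min(max(p, eps), 1.0 - eps)


def faster_rcnn_loss(probs: Sequence[float], labels: Sequence[int],
                     deltas: Sequence[Sequence[float]],
                     targets: Sequence[Sequence[float]],
                     n_reg: int, lam: float = 1.0) -> float:
    """(1/N_cls) * sum_i L_cls,i + lam * (1/N_reg) * sum_i L_reg,i.

    probs[i] is the predicted object probability of anchor i and labels[i]
    its ground truth (1 = object, 0 = background). L_cls is the binary log
    loss; L_reg is smooth L1 summed over the box offsets and, as in the
    original paper, counted only for positive anchors."""
    n_cls = len(probs)
    cls_term = sum(-math.log(_clip(p)) if y == 1 else -math.log(1.0 - _clip(p))
                   for p, y in zip(probs, labels)) / n_cls
    reg_term = sum(sum(smooth_l1(d - t) for d, t in zip(dv, tv))
                   for y, dv, tv in zip(labels, deltas, targets) if y == 1) / n_reg
    return cls_term + lam * reg_term


def focal_loss(p: float, label: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    With gamma = 0 the modulating factor disappears and the loss becomes
    alpha-weighted cross-entropy."""
    p = _clip(p)
    p_t = p if label == 1 else 1.0 - p
    alpha_t = alpha if label == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def centernet2_score(p_class_given_obj: float, p_obj: float) -> float:
    """P(C_k) = P(C_k | O_k) * P(O_k) for detection box k."""
    return p_class_given_obj * p_obj


# Toy inputs (hypothetical): anchor 0 is an object, anchor 1 is background,
# so the box offsets of anchor 1 are ignored by the regression term.
print(faster_rcnn_loss([0.9, 0.2], [1, 0],
                       [[0.1, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]],
                       [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]], n_reg=1))
print(focal_loss(0.9, 1), focal_loss(0.9, 1, gamma=0.0))
print(centernet2_score(0.8, 0.6))
```

Comparing the two focal-loss calls shows the design intent: for a well-classified example (p_t = 0.9), the factor (1 - p_t)^γ shrinks the loss by a factor of 100 when γ = 2, so training concentrates on hard examples.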

28 in total

1. Oral potentially malignant disorders: A comprehensive review on clinical aspects and management.

Authors: Saman Warnakulasuriya
Journal: Oral Oncol Date: 2020-01-22 Impact factor: 5.337

2. Role of general practice in the diagnosis of oral cancer.

Authors: Timothy Crossman; Fiona Warburton; Michael A Richards; Helen Smith; Amanda Ramirez; Lindsay J L Forbes
Journal: Br J Oral Maxillofac Surg Date: 2015-12-10 Impact factor: 1.651

3. The cost-effectiveness of screening for oral cancer in primary care. [Review]

Authors: P M Speight; S Palmer; D R Moles; M C Downer; D H Smith; M Henriksson; F Augustovski
Journal: Health Technol Assess Date: 2006-04 Impact factor: 4.014

4. Current concepts in management of oral cancer--surgery. [Review]

Authors: Jatin P Shah; Ziv Gil
Journal: Oral Oncol Date: 2008-07-31 Impact factor: 5.337

5. The Clinical Presentation of Oral Potentially Malignant Disorders.

Authors: Neal J Mccormick; Peter J Thomson; Marco Carrozzo
Journal: Prim Dent J Date: 2016-02-01

6. Artificial Intelligence-based methods in head and neck cancer diagnosis: an overview. [Review]

Authors: Hanya Mahmood; Muhammad Shaban; Nasir Rajpoot; Syed A Khurram
Journal: Br J Cancer Date: 2021-04-19 Impact factor: 9.075

7. Computer-aided diagnosis of laryngeal cancer via deep learning based on laryngoscopic images.

Authors: Hao Xiong; Peiliang Lin; Jin-Gang Yu; Jin Ye; Lichao Xiao; Yuan Tao; Zebin Jiang; Wei Lin; Mingyue Liu; Jingjing Xu; Wenjie Hu; Yuewen Lu; Huaifeng Liu; Yuanqing Li; Yiqing Zheng; Haidi Yang
Journal: EBioMedicine Date: 2019-10-05 Impact factor: 8.143

8. Classification of imbalanced oral cancer image data from high-risk population.

Authors: Bofan Song; Shaobai Li; Sumsum Sunny; Keerthi Gurushanth; Pramila Mendonca; Nirza Mukhia; Sanjana Patrick; Shubha Gurudath; Subhashini Raghavan; Imchen Tsusennaro; Shirley T Leivon; Trupti Kolur; Vivek Shetty; Vidya Bushan; Rohan Ramesh; Tyler Peterson; Vijay Pillai; Petra Wilder-Smith; Alben Sigamani; Amritha Suresh; Moni Abraham Kuriakose; Praveen Birur; Rongguang Liang
Journal: J Biomed Opt Date: 2021-10 Impact factor: 3.758

9. International evaluation of an AI system for breast cancer screening.

Authors: Scott Mayer McKinney; Marcin Sieniek; Varun Godbole; Jonathan Godwin; Natasha Antropova; Hutan Ashrafian; Trevor Back; Mary Chesus; Greg S Corrado; Ara Darzi; Mozziyar Etemadi; Florencia Garcia-Vicente; Fiona J Gilbert; Mark Halling-Brown; Demis Hassabis; Sunny Jansen; Alan Karthikesalingam; Christopher J Kelly; Dominic King; Joseph R Ledsam; David Melnick; Hormuz Mostofi; Lily Peng; Joshua Jay Reicher; Bernardino Romera-Paredes; Richard Sidebottom; Mustafa Suleyman; Daniel Tse; Kenneth C Young; Jeffrey De Fauw; Shravya Shetty
Journal: Nature Date: 2020-01-01 Impact factor: 49.962

10. Automated Detection and Classification of Oral Lesions Using Deep Learning to Detect Oral Potentially Malignant Disorders.

Authors: Gizem Tanriver; Merva Soluk Tekkesin; Onur Ergen
Journal: Cancers (Basel) Date: 2021-06-02 Impact factor: 6.639