Ashley Ferro¹,², Sanjeev Kotecha¹,², Kathleen Fan³,⁴.
Abstract
Machine learning (ML) algorithms are becoming increasingly pervasive in medical diagnostics and prognostication, enabled by complex deep learning architectures that overcome the limitations of manual feature extraction. In this systematic review and meta-analysis, we provide an update on the current progress of ML algorithms in point-of-care (POC) automated diagnostic classification systems for lesions of the oral cavity. Studies reporting performance metrics for ML algorithms used in the automatic classification of oral regions of interest were identified from 4 databases and screened by 2 independent reviewers. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed. Thirty-five studies were suitable for qualitative synthesis and 31 for quantitative analysis. Outcomes were assessed using a bivariate random-effects model following an assessment of bias and heterogeneity. Four distinct methodologies were identified for POC diagnosis: (1) clinical photography; (2) optical imaging; (3) thermal imaging; and (4) analysis of volatile organic compounds. The estimated AUROC across all studies was 0.935, and no difference in performance was identified between methodologies. We discuss the classical and modern ML approaches employed within the identified studies, and highlight issues that will need to be addressed before automated classification systems can be implemented in screening and early detection.
Year: 2022 PMID: 35963880 PMCID: PMC9376104 DOI: 10.1038/s41598-022-17489-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
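The study tables below report a recurring set of classification metrics (sensitivity, specificity, precision, accuracy, F1, AUROC). For reference, a minimal sketch of how these are derived from a binary confusion matrix and model scores using scikit-learn; the labels and scores are toy values, not data from any identified study:

```python
# Illustrative computation of the metrics reported throughout the tables
# (toy labels/scores; not data from any of the identified studies).
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])   # 1 = malignant/OPMD, 0 = benign/healthy
y_score = np.array([0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1])  # model probabilities
y_pred = (y_score >= 0.5).astype(int)         # binarise at an assumed 0.5 threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                  # recall / true positive rate
specificity = tn / (tn + fp)                  # 1 - false positive rate
precision = tp / (tp + fp)                    # positive predictive value (PPV)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"Sensitivity {sensitivity:.3f}  Specificity {specificity:.3f}")
print(f"Precision {precision:.3f}  Accuracy {accuracy:.3f}")
print(f"F1 {f1_score(y_true, y_pred):.3f}  AUROC {roc_auc_score(y_true, y_score):.3f}")
```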
Figure 1. PRISMA flow diagram for study selection.
Summary of identified studies using clinical photography as the screening modality.
| Study | Data source | ML classification methods | Performance metrics | Outcomes (best performing ML) |
|---|---|---|---|---|
| Fu et al. | Heterogeneous dataset from both smartphones and SLR cameras | NN based on DenseNet121 architecture, pre-trained on ImageNet | Sensitivity, specificity, accuracy, AUROC, t-SNE | Sensitivity 89.6; specificity 80.6; accuracy 84.1; AUROC 0.935 |
| Welikala et al. | Smartphone images of oral lesions as part of the MeMoSA initiative | NN based on ResNet101 architecture, pre-trained on ImageNet | Sensitivity, precision, F1 | Sensitivity 89.51; precision 84.77; F1 87.07 |
| Jubair et al. | Heterogeneous dataset from both smartphones and SLR cameras | NN based on EfficientNet architecture, pre-trained on ImageNet | Sensitivity, specificity, accuracy, AUROC | Sensitivity 86.7; specificity 84.5; accuracy 85.0; AUROC 0.928 |
| Shamim et al. | Images extracted directly from search engines | Multiple pre-trained NNs; best performing algorithm based on VGG19 architecture | Sensitivity, specificity, accuracy, AUROC, time (s) | Sensitivity 89.0; specificity 97.0; accuracy 0.98; AUROC 0.990; time 212.09 s |
| Warin et al. | Clinical photography; specific imaging method not disclosed | NN based on DenseNet121 architecture, pre-trained on ImageNet | Sensitivity, specificity, precision, AUROC, F1, Grad-CAM | Sensitivity 98.75; specificity 100; precision 100; AUROC 0.99; F1 0.99 |
| Lin et al. | Heterogeneous dataset from 4 different smartphones | NN based on HRNet-W18 architecture, pre-trained on ImageNet | Sensitivity, specificity, precision, AUROC, F1, Grad-CAM | Sensitivity 83.0; specificity 96.6; precision 0.84; AUROC 0.946; F1 0.9 |
| Welikala et al. | Smartphone images of oral lesions as part of the MeMoSA initiative | Multiple pre-trained NNs; best performing algorithm based on VGG19 architecture | Sensitivity, specificity, precision, accuracy, F1, Grad-CAM | Sensitivity 85.7; specificity 76.4; precision 0.77; accuracy 80.9; F1 0.81 |
| Figueroa et al. | Clinical photographs; specific imaging method not disclosed | NN based on VGG19 architecture, pre-trained on ImageNet | Sensitivity, specificity, accuracy, Grad-CAM | Sensitivity 74.4; specificity 89.1; accuracy 83.8 |
| Warin et al. | SLR camera | NN based on ResNet architecture, pre-trained on ImageNet | Sensitivity, specificity, precision, AUROC | Sensitivity 98.4; specificity 91.7; precision 92.0; AUROC 0.950 |
| Tanriver et al. | Clinical photographs taken in a clinical department, supplemented by images from various search engines | Multiple pre-trained NNs; best performance using EfficientNet-b4 architecture | Sensitivity, precision, F1 | Sensitivity 89.3; precision 86.2; F1 85.7 |
| Jeyaraj et al. | Imaging data extracted from the UCI Machine Learning Repository, The Cancer Imaging Archive, and the Genomic Data Commons data portal | Modified Inception v3 architecture pre-trained on ImageNet; compared to a support vector machine and a deep belief network | Sensitivity, specificity, accuracy, AUROC | Sensitivity 98.0; specificity 94.0; accuracy 96.6; AUROC 0.965 |
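Most studies in this table fine-tune a convolutional network pre-trained on ImageNet rather than training from scratch. A minimal PyTorch sketch of this transfer-learning pattern, assuming a DenseNet121 backbone as in Fu et al. and Warin et al.; the class count, head-only freezing strategy, and training-step details are illustrative assumptions, not taken from any specific study:

```python
# Minimal transfer-learning sketch: ImageNet-pretrained DenseNet121 with a
# replaced classifier head, in the spirit of several studies above.
# Dataset, class count, and training details are placeholders.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 2  # e.g. malignant/OPMD vs benign; adjust per labelling scheme

model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
model.classifier = nn.Linear(model.classifier.in_features, NUM_CLASSES)

# Freeze the pretrained feature extractor and train only the new head.
for p in model.features.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimisation step on a batch of clinical photographs."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Head-only fine-tuning, as sketched here, is one common choice; an alternative is to unfreeze the full backbone at a reduced learning rate once the head has converged.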
Summary of identified studies using optical imaging as the screening modality.
| Study | Data source | ML classification methods | Performance metrics | Outcomes (best performing ML) |
|---|---|---|---|---|
| Uthoff et al. | Custom smartphone-based dual-modality device capable of both white light and autofluorescence imaging | NN based on VGG-M architecture, pre-trained on ImageNet | Sensitivity, specificity, precision, NPV, accuracy, AUROC | Sensitivity 85.0; specificity 89.0; precision 0.88; NPV 0.85; accuracy 86.9; AUROC 0.91 |
| Song et al. | Smartphone-based intraoral imaging system with custom white light probe | NN based on VGG19 architecture, pre-trained on ImageNet | Accuracy | Accuracy 85.6 |
| Chan et al. | VELscope device | Classification based on ResNet or Inception architectures, using either a fully convolutional network or a feature pyramid network | Sensitivity, specificity | Sensitivity 98.0; specificity 88.0 |
| Aubreville et al. | Confocal laser endomicroscopy images of the oral cavity following IV fluorescein; images extracted from intraoral videos; Cystoflex UHD and Coloflex UHD as imaging devices | Untrained LeNet-5 architecture with patch probability fusion, whole-image classification using a pre-trained Inception v3 CNN, and a random forest classifier; best performance using LeNet-5 | Sensitivity, specificity, accuracy, AUROC | Sensitivity 86.6; specificity 90.0; accuracy 88.3; AUROC 0.807 |
| De Veld et al. | Xe lamp with monochromator for illumination, a spectrograph, and a custom set of long-pass and short-pass filters | NN with base architecture not specified; single hidden layer between input and output | AUROC | AUROC 0.68 |
| Roblyer et al. | Multispectral digital microscope (MDM), measuring white light reflectance, autofluorescence, narrow-band reflectance, and cross-polarised light | Linear discriminant analysis | Sensitivity, specificity, AUROC | Sensitivity 93.9; specificity 98.1; AUROC 0.981 |
| Caughlin et al. | Multispectral autofluorescence lifetime imaging (maFLIM) endoscopy | Bespoke neural network using a shared encoder and separate paths for signal reconstruction and classification; classification on a pixel-by-pixel basis | Sensitivity, specificity, precision, accuracy, F1 | Sensitivity 87.5; specificity 67.6; precision 76.3; accuracy 77.6; F1 0.80 |
| Jo et al. | Time-domain multispectral FLIM rigid endoscope; emission spectra collected for collagen, NADH, and FAD | Quadratic discriminant analysis | Sensitivity, specificity, AUROC | Sensitivity 95; specificity 87; AUROC 0.91 |
| Francisco et al. | Portable spectrophotometer with two solid-state lasers as excitation sources: a diode emitting at 406 nm and a frequency-doubled neodymium laser at 523 nm | Compared naïve Bayes, k-nearest neighbours, and decision tree; decision tree provided best performance | Sensitivity, specificity, accuracy | Sensitivity 87.0; specificity 91.2; accuracy 87.0 |
| Wang et al. | Fibre-optic fluorospectrometer, using a Xe lamp with monochromator as the excitation source | Partial least squares combined with an artificial neural network (single hidden layer) | Sensitivity, specificity, precision | Sensitivity 81.0; specificity 96.0; precision 88 |
| Majumder et al. | N2 laser as excitation source | Relevance vector machine (RVM) | Sensitivity, specificity, AUROC | Sensitivity 91; specificity 95; AUROC 0.9 |
| Huang et al. | VELscope device | Quadratic discriminant analysis | Sensitivity, specificity | Sensitivity 92.3; specificity 97.9 |
| Duran-Sierra et al. | Multispectral autofluorescence lifetime imaging endoscopy (maFLIM); preferential excitation of NADH and FAD | Best performance using an ensemble of support vector machine and quadratic discriminant analysis | Sensitivity, specificity, F1, AUROC | Sensitivity 94.0; specificity 74.0; F1 0.85; AUROC 0.81 |
| Jeng et al. | VELscope device | Both linear and quadratic discriminant analysis | Sensitivity, precision, accuracy, F1, AUROC | Sensitivity 92.0; precision 0.86; accuracy 86.0; F1 0.88; AUROC 0.96 |
| Huang et al. | Custom autofluorescence device comprising two LED continuous-wave lamps, for preferential imaging of NADH and FAD | Quadratic discriminant analysis | Sensitivity, specificity | Sensitivity 94.6; specificity 85.7 |
| Kumar et al. | Custom portable autofluorescence device using a collimating lens and beam splitter; 405 nm diode for excitation | Dimensionality reduction using PCA, followed by Mahalanobis distance classification on the first 11 PCs | Sensitivity, specificity, accuracy | Sensitivity 98.7; specificity 100; accuracy 98.9 |
| Rahman et al. | Custom portable imaging system composed of a modified headlamp system capable of both autofluorescence and reflectance imaging | Linear discriminant analysis | Sensitivity, specificity, AUROC | Sensitivity 92.0; specificity 84.0; AUROC 0.913 |
| James et al. | Spectral-domain optical coherence tomography (OCT) system consisting of a 2D scanning long GRIN rod probe with a centre wavelength of 930 nm | 14 artificial neural networks for feature extraction, followed by a support vector machine for classification; best performance using DenseNet-201 and NASNetMobile in delineating OSCC from other lesions | Sensitivity, specificity, PPV, NPV, accuracy | Sensitivity 86.0; specificity 81.0; PPV 51.0; NPV 96.0; accuracy 81.9 |
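Several of the spectroscopy studies above pair dimensionality reduction with a simple statistical classifier; Kumar et al., for example, reduce spectra with PCA and then classify by Mahalanobis distance on the first 11 principal components. A hedged sketch of that pattern using NumPy, SciPy, and scikit-learn on synthetic spectra (the toy data, pooled covariance, and nearest-class-mean decision rule are illustrative assumptions, not the study's exact procedure):

```python
# PCA followed by Mahalanobis-distance classification, sketching the approach
# attributed to Kumar et al. above (synthetic spectra; 11 PCs as reported).
import numpy as np
from scipy.spatial.distance import mahalanobis
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 300))      # 100 emission spectra, 300 wavelength bins (toy)
y = rng.integers(0, 2, size=100)     # 0 = healthy, 1 = lesion (toy labels)

pca = PCA(n_components=11)
Z = pca.fit_transform(X)

# Per-class mean and pooled inverse covariance in PC space.
means = {c: Z[y == c].mean(axis=0) for c in (0, 1)}
VI = np.linalg.pinv(np.cov(Z, rowvar=False))

def classify(z: np.ndarray) -> int:
    """Assign the class whose mean is nearest in Mahalanobis distance."""
    return min(means, key=lambda c: mahalanobis(z, means[c], VI))

preds = np.array([classify(z) for z in Z])
print("Training accuracy:", (preds == y).mean())
```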
Summary of identified studies using thermal imaging and VOC analysis as the screening modality.
| Study | Data source | ML classification methods | Performance metrics | Outcomes (best performing ML) |
|---|---|---|---|---|
| Chakraborty et al. | FLIR T650sc long-wave infrared (7.5–13 µm) camera | Support vector machine (SVM) | Accuracy | Accuracy 84.72 |
| Van de Goor et al. | ‘Aeonose’ electronic nose, using 3 micro-hotplate metal-oxide sensors to detect a range of VOCs in exhaled breath | Compression of 64 × 36 measurements per sensor using tensor decomposition (Tucker3-like); NN implemented through the AeoNose software (Aethena), base architecture not specified | Sensitivity, specificity, accuracy, AUROC | Sensitivity 84; specificity 67; accuracy 72; AUROC 0.850 |
| Mohamed et al. | ‘Aeonose’ electronic nose, using 3 micro-hotplate metal-oxide sensors to detect a range of VOCs in exhaled breath | Compression of 64 × 36 measurements per sensor using tensor decomposition (Tucker3-like); NN implemented through the AeoNose software (Aethena), base architecture not specified | Sensitivity, specificity, precision, accuracy, AUROC | Sensitivity 80; specificity 77; precision 67; accuracy 79; AUROC 0.882 |
| Leunis et al. | ‘DiagNose’ electronic nose, with 12 metal-oxide sensors of four different types: CH4, CO, NOx, Pt | Forward-selection logistic regression | Sensitivity, specificity, AUROC | Sensitivity 90; specificity 80; AUROC 0.850 |
| Hakim et al. | ‘Nanoscale Artificial Nose’ (NA-NOSE) electronic nose, with 5 sensors based on gold nanospheres with tert-dodecanethiol, hexanethiol, 2-mercaptobenzoxazole, 1-butanethiol, and 3-methyl-1-butanethiol ligands | Support vector machine (SVM) trained on principal components 1 and 2, following PCA of sensor measurements | Sensitivity, specificity, accuracy | Sensitivity 100; specificity 92; accuracy 96 |
| Mentel et al. | ‘BreathSpect’ device, utilising two-fold separation by gas chromatography and mass spectrometry to detect VOCs | 2-dimensional output from the ‘BreathSpect’ device converted to integer arrays; best classification performance using logistic regression | Accuracy | Accuracy 89 |
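The electronic-nose pipelines above compress raw sensor traces before applying a lightweight classifier; Hakim et al., for instance, train an SVM on the first two principal components of the sensor measurements. A minimal scikit-learn sketch of that PCA-then-SVM pipeline on synthetic data (the scaling step and linear kernel are illustrative choices, not from the study):

```python
# PCA -> SVM pipeline, sketching the e-nose classification approach
# attributed to Hakim et al. above (synthetic sensor data, not theirs).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5 * 64))   # 60 breath samples, flattened sensor responses (toy)
y = rng.integers(0, 2, size=60)     # 0 = control, 1 = cancer (toy labels)

clf = make_pipeline(StandardScaler(), PCA(n_components=2), SVC(kernel="linear"))
clf.fit(X, y)
print("Training accuracy:", clf.score(X, y))
```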
Figure 2. Summary plots of risk of bias (top panel) and applicability (bottom panel) using the QUADAS-2 tool.
Results of main bivariate random effects model of diagnostic test performance, subgroup analysis, and sensitivity analysis following removal of influential outliers.
Main analysis:

| Category | Subgroup | Sensitivity [95% CI] | False positive rate [95% CI] | AUC [restricted AUC] | Meta-regression, sensitivity: estimate (SE); p-value | Meta-regression, false positive rate: estimate (SE); p-value |
|---|---|---|---|---|---|---|
| Overall | – | 0.892 [0.866; 0.913] | 0.140 [0.108; 0.180] | 0.935 [0.877] | – | – |
| AI type | Classical | 0.904 [0.878; 0.925] | 0.151 [0.111; 0.202] | 0.915 [0.893] | – | – |
| | Modern | 0.883 [0.839; 0.916] | 0.139 [0.096; 0.197] | 0.932 [0.867] | −0.341 (0.247), p = 0.167 | −0.003 (0.320), p = 0.994 |
| Modality | Volatile compounds | 0.863 [0.764; 0.924] | 0.238 [0.142; 0.372] | 0.889 [0.827] | – | – |
| | Clinical photographs | 0.911 [0.848; 0.950] | 0.118 [0.070; 0.192] | 0.952 [0.900] | 0.401 (0.464), p = 0.388 | −0.740 (0.490), p = 0.131 |
| | Optical imaging | 0.882 [0.865; 0.896] | 0.150 [0.112; 0.197] | 0.914 [0.867] | 0.328 (0.450), p = 0.131 | −0.620 (0.476), p = 0.192 |
| Lesion type | OSCC vs healthy | 0.868 [0.858; 0.878] | 0.145 [0.093; 0.218] | 0.861 [0.859] | – | – |
| | OSCC/OPMD vs benign | 0.875 [0.801; 0.924] | 0.153 [0.063; 0.326] | 0.905 [0.869] | −0.222 (0.342), p = 0.516 | 0.122 (0.490), p = 0.803 |
| | OSCC/OPMD vs healthy | 0.874 [0.824; 0.911] | 0.179 [0.115; 0.268] | 0.914 [0.852] | 0.205 (0.385), p = 0.594 | 0.205 (0.385), p = 0.594 |

Sensitivity analysis following removal of influential outliersᵃ:

| Category | Subgroup | Sensitivity [95% CI] | False positive rate [95% CI] | AUC [restricted AUC] | Meta-regression, sensitivity: estimate (SE); p-value | Meta-regression, false positive rate: estimate (SE); p-value |
|---|---|---|---|---|---|---|
| Overall | – | 0.892 [0.871; 0.910] | 0.142 [0.104; 0.190] | 0.883 [0.883] | – | – |
| AI type | Classical | 0.903 [0.875; 0.924] | 0.176 [0.150; 0.205] | 0.931 [0.867] | – | – |
| | Modern | 0.878 [0.843; 0.907] | 0.118 [0.068; 0.199] | 0.870 [0.870] | −0.248 (0.207), p = 0.232 | −0.349 (0.362), p = 0.335 |
| Modality | Volatile compounds | 0.921 [0.863; 0.856] | 0.157 [0.124; 0.197] | 0.916 [0.912] | – | – |
| | Clinical photographs | 0.899 [0.861; 0.928] | 0.084 [0.041; 0.168] | 0.920 [0.890] | 0.244 (0.433), p = 0.574 | −0.784 (0.583), p = 0.179 |
| | Optical imaging | 0.896 [0.868; 0.238] | 0.172 [0.122; 0.238] | 0.904 [0.884] | 0.275 (0.419), p = 0.512 | −0.127 (0.547), p = 0.817 |
| Lesion type | OSCC vs healthy | 0.900 [0.861; 0.929] | 0.185 [0.149; 0.227] | 0.919 [0.866] | – | – |
| | OSCC/OPMD vs benign | 0.875 [0.801; 0.924] | 0.152 [0.063; 0.326] | 0.905 [0.869] | −0.347 (0.306), p = 0.256 | 0.002 (0.479), p = 0.997 |
| | OSCC/OPMD vs healthy | 0.904 [0.863; 0.934] | 0.168 [0.087; 0.299] | 0.910 [0.894] | −0.070 (0.275), p = 0.256 | 0.083 (0.464), p = 0.858 |
ᵃ Influential studies removed for sensitivity analysis [2, 20, 25, 26, 30, 33, 38, 43, 46].
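The pooled estimates above come from a bivariate random-effects model, which jointly models logit-transformed sensitivity and false positive rate across studies; such models are typically fitted with dedicated packages (e.g. mada in R). As a simplified univariate stand-in that illustrates only the transform-and-pool idea, here is a DerSimonian-Laird random-effects pooling of logit sensitivity on toy per-study counts (all counts are invented for illustration; this is not the paper's analysis):

```python
# Simplified random-effects pooling of logit-sensitivity across studies.
# The paper fits a full bivariate model; this DerSimonian-Laird sketch on
# toy counts only illustrates the transform-and-pool idea.
import numpy as np

# Toy per-study counts: true positives (TP) and diseased cases (TP + FN).
tp = np.array([45, 30, 88, 52])
n_pos = np.array([50, 35, 98, 60])

p = (tp + 0.5) / (n_pos + 1.0)                  # continuity-corrected proportions
theta = np.log(p / (1 - p))                     # logit(sensitivity) per study
var = 1 / (tp + 0.5) + 1 / (n_pos - tp + 0.5)   # approximate within-study variance

w = 1 / var                                     # fixed-effect weights
theta_fe = np.sum(w * theta) / np.sum(w)
Q = np.sum(w * (theta - theta_fe) ** 2)         # Cochran's Q
tau2 = max(0.0, (Q - (len(tp) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_re = 1 / (var + tau2)                         # random-effects weights
theta_re = np.sum(w_re * theta) / np.sum(w_re)
pooled_sens = 1 / (1 + np.exp(-theta_re))       # back-transform to a proportion
print(f"Pooled sensitivity ≈ {pooled_sens:.3f}, tau^2 = {tau2:.3f}")
```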
Figure 3. Summary receiver operating characteristic (sROC) curves to estimate model performance. Top left: sROC curve of the bivariate model of all studies (AUC 0.935); top right: sROC curves according to methodology; bottom left: sROC curves according to AI type; bottom right: sROC curves according to lesion type. AUCs for subgroups and results of the subgroup analysis are provided in Table 4.
Figure 4. Summary of the best performing machine learning algorithms adopted by the identified studies. Numbers represent the number of studies that reported best outcomes with the associated model. VGG, Visual Geometry Group; HR, high resolution; NR, not reported.
Figure 5. Overview of training and validation sample sizes for the identified studies included in the meta-analysis. Point size is proportional to F1 score; no obvious relationship between training sample size and performance is apparent.