Literature DB >> 34235035

Machine Learning Demonstrates High Accuracy for Disease Diagnosis and Prognosis in Plastic Surgery.

Angelos Mantelakis1, Yannis Assael2, Parviz Sorooshian3, Ankur Khajuria4,1.   

Abstract

INTRODUCTION: Machine learning (ML) is a set of models and methods that can detect patterns in vast amounts of data and use this information to perform various kinds of decision-making under uncertain conditions. This review explores the current role of this technology in plastic surgery by outlining the applications in clinical practice, diagnostic and prognostic accuracies, and proposed future directions for clinical applications and research.
METHODS: EMBASE, MEDLINE, CENTRAL and ClinicalTrials.gov were searched from 1990 to 2020. Any clinical studies (including case reports) which present the diagnostic and prognostic accuracies of machine learning models in the clinical setting of plastic surgery were included. Data collected were clinical indication, model utilised, reported accuracies, and comparison with clinical evaluation.
RESULTS: The database search identified 1181 articles, of which 51 were included in this review. The clinical utility of these algorithms was to assist clinicians in diagnosis prediction (n=22), outcome prediction (n=21) and pre-operative planning (n=8). The mean accuracies were 88.80%, 86.11% and 80.28%, respectively. The most commonly used models were neural networks (n=31), support vector machines (n=13), decision trees/random forests (n=10) and logistic regression (n=9).
CONCLUSIONS: ML has demonstrated high accuracy in the diagnosis and prognostication of burn patients, congenital or acquired facial deformities, and in cosmetic surgery. No studies compared ML with clinicians' performance. Future research can be enhanced by using larger datasets or data augmentation, employing novel deep learning models, and applying these to other subspecialties of plastic surgery.
Copyright © 2021 The Authors. Published by Wolters Kluwer Health, Inc. on behalf of The American Society of Plastic Surgeons.


Year:  2021        PMID: 34235035      PMCID: PMC8225366          DOI: 10.1097/GOX.0000000000003638

Source DB:  PubMed          Journal:  Plast Reconstr Surg Glob Open        ISSN: 2169-7574


INTRODUCTION

An expanding population in the United States has resulted in increasing demand for plastic surgery services, which, coupled with a static number of residents and an increasing number of retiring surgeons, is raising the pressure on the delivery of high-quality care.[1] It is now estimated that there is a workforce shortage of 800 attending physicians in the United States, reducing the availability of care.[1] Artificial intelligence (AI) could have a major impact on addressing the challenges that healthcare systems face. Digital technologies are predicted to affect more than 80% of the healthcare workforce over the next two decades, changing the way physicians practice medicine and helping to meet the increasing demand for services.[2] AI can help drive this change by automating repetitive tasks to free up clinicians' time, improving the diagnostic accuracy of diseases, and predicting patient outcomes.[2]
Machine learning (ML), a subfield of AI, is a set of models able to learn from past cases (data) to make future predictions. A wide variety of such algorithms are in use today, such as the automated, individualized suggestions generated during a Google Search based on one's previous searches. These models can be classified into two broad categories: supervised learning and unsupervised learning. The difference between the two lies in the presence of labeled data. In supervised learning, models are trained on examples with known labels (labeled data) and, after training, aim to predict outcomes for new data.[3,4] This function has been utilized in healthcare to assist both in making a diagnosis and in predicting disease outcomes.
Authors have utilized supervised learning to successfully classify whether a skin lesion is benign (eg, a benign nevus) or malignant (eg, malignant melanoma), outperforming the accuracy of 21 board-certified dermatologists (accuracy 72% versus 66%, P < 0.05).[5] Similarly, supervised learning has been utilized to predict the risk of developing a condition such as breast cancer based on epidemiological data, and the risk of recurrence after treatment.[6,7] In contrast, unsupervised learning models are trained on unlabeled data and, after training, aim to discover underlying groupings or patterns in the data themselves.[3,8] These algorithms can be particularly useful for identifying previously unknown patterns in vast amounts of unprocessed data, which may then be used in clinical practice. Examples include novel classification of diseases into subtypes and identification of subgroups of patients at increased risk of certain conditions based on various characteristics (for example, their genome).[9,10] Beyond meeting demand for plastic surgery services, this technology has the potential to revolutionize how plastic surgery is practiced and to enhance surgeons' diagnosis prediction, preoperative planning, and outcome prediction, leading to improved patient care.
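The supervised/unsupervised distinction can be illustrated with a minimal, self-contained sketch (the toy features, labels, and thresholds below are invented for illustration and are not drawn from any study in this review):

```python
# Supervised learning: labelled examples (feature, label) train a
# 1-nearest-neighbour classifier that predicts the label of a new case.
def knn_predict(train, x):
    # train: list of (feature, label); return label of nearest feature
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

labelled = [(0.2, "benign"), (0.3, "benign"), (0.8, "malignant"), (0.9, "malignant")]
print(knn_predict(labelled, 0.85))  # -> malignant

# Unsupervised learning: the same features WITHOUT labels; a simple
# 2-means clustering discovers the two groupings from the data themselves.
def two_means(xs, iters=10):
    c1, c2 = min(xs), max(xs)  # initialise centroids at the extremes
    for _ in range(iters):
        g1 = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in xs if abs(x - c1) > abs(x - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

print(two_means([0.2, 0.3, 0.8, 0.9]))  # -> ([0.2, 0.3], [0.8, 0.9])
```

The supervised model needs the labels at training time; the clustering recovers the same two groups from the unlabeled features alone.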
In burn surgery, even the most experienced surgeons achieve a clinical estimation accuracy of 64%–76% in the diagnosis of burn depth.[11,12] ML models may outperform this, achieving correct burn depth identification from 2D photographs in up to 87% of cases, potentially leading to more appropriate clinical management at presentation.[13] Further, in prognosticating whether a burn injury will heal within 14 days of presentation, ML models have demonstrated an accuracy of 86%, again surpassing the accuracy of prognostication by clinicians.[4] In the field of microsurgery, postoperative monitoring via 2D image analysis achieves 95% accuracy in classifying a flap as normal, showing venous obstruction, or showing arterial occlusion, leading to potential early identification of flap failure and increased salvage rates.[4] However, the evidence on clinical applications of ML remains fragmented, with no systematic review summarizing the clinical accuracy of such models in practice. Such a synthesis could act as a starting point for developing clinical practice guidelines and guide future research.[14-17] The aim of this study was to systematically synthesize and report the current literature on the clinical applications of ML in plastic surgery.

METHODS

Search Strategy

The protocol for this systematic review was registered with the PROSPERO international prospective register of systematic reviews (registration number CRD42019140924). The full protocol was published a priori, and there were no deviations from the original protocol.[18] This systematic review was conducted and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.[19] A systematic literature search was performed in the MEDLINE (OVID SP), EMBASE (OVID SP), CENTRAL, and ClinicalTrials.gov databases to identify relevant studies for review. The reference lists of all included studies were also screened, and relevant studies were included. Lastly, manual searches of bibliographies, citations, and related articles (PubMed function) were performed to identify missed relevant studies. Medical Subject Headings (MeSH) terms were used in combination with free text to construct the search strategy. A sample search strategy used in MEDLINE (OVID SP) is shown in Table 1.[20-70]
Table 1.

Example Search Strategy Used for MEDLINE[20–70]

1. (“deep learning” OR “artificial intelligence” OR “machine learning” OR “decision trees” OR “random forests” OR SVM OR “support vector machine”)
2. exp “NEURAL NETWORKS (COMPUTER)”/ OR exp “DEEP LEARNING”/
3. exp “ARTIFICIAL INTELLIGENCE”/
4. (1 OR 2 OR 3)
5. (microsurgery OR (surgery AND (plastic OR reconstructive OR esthetic OR aesthetic OR burns OR hand OR craniofacial OR “peripheral nerve”)))
6. exp “SURGERY, PLASTIC”/ OR exp “RECONSTRUCTIVE SURGICAL PROCEDURES”/
7. (5 OR 6)
8. (4 AND 7)

Selection Criteria

All eligible studies between January 1990 and June 2020 were included in this review. We included any primary studies (including case reports) that present clinical data on the application of ML in plastic surgery. Only articles in the English language were included. Our exclusion criteria included descriptions of ML in plastic surgery without clinical data, review articles, conference abstracts, animal studies, and articles pertaining to the use of ML outside the remit of the specialty (as defined by the Intercollegiate Surgical Curriculum Program in Plastic Surgery). After the library preparation, two independent reviewers (AM and PS) screened the search results for inclusion based on the title and abstracts. Subsequently, a full-text review was performed independently by the same two researchers (AM and PS) for all included studies. At each step, any discrepancy of opinion was resolved with consensus, and if not resolved, was referred to a third reviewer (AK). If any doubt remained, the article proceeded to the next step of the review. The search results of all included articles, abstracts, full-text articles, and records of the reviewers’ decisions, including reasons for exclusion, were recorded.

Outcome Measures

The primary outcome was the ML algorithm statistical accuracy in performing a prespecified clinical task (eg, prediction of a clinical diagnosis or postoperative outcome). Secondary outcomes include the reported specificity, sensitivity, area under the curve, and technical characteristics of the algorithms.

Data Extraction and Analysis

The data from all full-text articles accepted for the final analysis were independently retrieved by AM and PS, using a standardized data extraction form. Any disagreements were resolved by discussion or referred to the third researcher (AK). The following data (where available) were extracted: (a) study details (year of publication, country), patient demographics, study setting, and clinical condition examined; (b) ML algorithm characteristics (intended function, whether the model was supervised or unsupervised, function via classification or outcome prediction, usage of real or synthetic data, and which type of ML model was used); and (c) primary and secondary outcomes, as above. Statistical meta-analysis could not be performed because of the heterogeneity of the studies in the conditions examined and software models utilized. Instead, a narrative review was performed, with a subgroup analysis of the mean accuracy of the models, calculated by measuring the number of correct predictions over the total predictions made. The subgroup analyses are based on model function (diagnosis prediction, preoperative planning, and outcome prediction) and model type (NNs, SVMs, decision trees/random forests, and logistic regression). This subgroup classification was based on the objectives set for AI models in clinical practice by NHS England.[2]
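The accuracy measure described above (correct predictions over total predictions, then averaged across the models in a subgroup) can be sketched as follows; the per-model prediction counts are invented for illustration:

```python
# Mean subgroup accuracy: each model's accuracy is correct / total
# predictions; the subgroup mean averages these per-model accuracies.
def accuracy(correct: int, total: int) -> float:
    return correct / total

# Hypothetical (correct, total) prediction counts for three models:
subgroup = [(88, 100), (43, 50), (160, 200)]
per_model = [accuracy(c, t) for c, t in subgroup]  # [0.88, 0.86, 0.80]
mean_accuracy = sum(per_model) / len(per_model)
print(f"mean accuracy: {mean_accuracy:.2%}")  # -> mean accuracy: 84.67%
```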

Quality Assessment

The quality of the included studies was assessed with the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool by two independent reviewers (AM and PS).[71] There were no disagreements between the authors. The QUADAS-2 tool allows for risk of bias and applicability concern assessment of primary diagnostic accuracy studies. Risk of bias was assessed based on patient selection, index test (in this review, the ML algorithm), reference standard (comparator), and flow and timing. Concerns regarding applicability were assessed on the first three domains alone.

RESULTS

Literature Search Results

From a total of 1536 studies, after removal of duplicates, 1181 articles were eligible for a title and abstract review. Of these, 1074 articles did not meet the inclusion criteria and were excluded. Following full-text review of the remaining 107 articles, 56 articles were excluded because the inclusion criteria were not met. A total of 51 articles were included and formed the basis of this systematic review (Fig. 1). Details of the included studies are summarized in Table 2.[20-70]
Fig. 1.

The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram.

Table 2.

Primary Outcomes of Accuracy, Sensitivity, and Specificity for Reconstructive and Burns Surgery

Study | Author, Year | Function | Model | Accuracy | Sensitivity | Specificity | AUC
1 | Abubakar et al, 2020[20] | DP | CNN | White: 99.3%; Afro-Caribbean: 97.1% | NR | NR | NR
2 | Chauhan J et al, 2020[21] | DP | BPBSAM (CNN + SVM) | 91.70% | NR | NR | NR
3 | Desbois et al, 2020[22] | DP | DNN with 3 measures | 91.98% | NA | NA | NR
| | | DNN with 4 measures | 92.45% | NA | NA | NR
| | | Boost with 3 measures | 97.89% | NA | NA | NR
| | | Boost with 4 measures | 98.08% | NA | NA | NR
| | | avNN with 3 measures | 97.45% | NA | NA | NR
| | | avNN with 4 measures | 98.30% | NA | NA | NR
4 | Rashidi et al, 2020[23] | OP | DNN | 100% | 92% | 93% | 0.880
| | | LR | 95% | 91% | 90% | 0.940
| | | SVM | 98% | NR | NR | 0.780
| | | RF | 93% | NR | NR | 1.000
| | | k-NN | 98% | 91% | 82% | 0.960
5 | Bhalodia et al, 2020[24] | DP | ShapeWorks software with PCA | NR | NR | NR | NR
6 | Guarin et al, 2020[25] | DP | NR | NR | NR | NR | NR
7 | Formeister et al, 2020[26] | OP | Gradient Boosted Decision Tree | 60.00% | 62.00% | 60.00% | NR
8 | Boczar et al, 2020[27] | Intervention | IBM Watson | 92.30% | NR | NR | NR
9 | O’Neil et al, 2020[28] | OP | Decision Tree | NR | 5.00% | 86.80% | 0.672
10 | Yoo et al, 2020[29] | OP | Deep learning (generative adversarial network, GAN) | NR | NR | NR | NR
| | | Pix2pix | NR | NR | NR | NR
| | | Lightweight CycleGAN | NR | NR | NR | NR
| | DP | Deep learning + no data augmentation | 74.20% | 75.80% | 72.70% | 0.824
| | | Deep learning + standard data augmentation | 83.30% | 78.80% | 87.90% | 0.872
| | | Deep learning + GAN data augmentation | 90.90% | 87.80% | 93.90% | 0.957
11 | Angullia et al, 2020[30] | OP | Least squares radial basis function | NA | NA | NA | NA
12 | Eguia et al, 2020[31] | OP | Decision Tree | NA | NA | NA | 0.690
| | | Stepwise Logistic Regression | NA | NA | NA | 0.800
| | | LR | NA | NA | NA | 0.830
| | | k-NN | NA | NA | NA | 0.840
13 | Ohura et al, 2019[32] | DP | SegNet | 97.60% | 90.90% | 98.20% | 0.994
| | | LinkNet | 97.20% | 98.90% | 98.90% | 0.987
| | | U-Net | 98.80% | 99.30% | 99.30% | 0.997
| | | U-Net VGG16 | 98.90% | 99.20% | 99.20% | 0.998
14 | Porras et al, 2019[33] | DP | SVM | 95.30% | 94.70% | 96% | NR
15 | Knoops et al, 2019[34] | DP | SVM | 95.40% | 95.50% | 95.20% | NR
| | OP | LR/RR/LAR/LASSO | NR | NR | NR | NR
16 | Hallac et al, 2019[35] | DP | Pretrained GoogLeNet | 94.10% | 97.80% | 86% | NR
17 | Levites et al, 2019[36] | DP | Text-based emotion analysis | NR | NR | NR | NR
18 | Shew et al, 2019[37] | OP | 2-class Decision Forest | 64.40% | NR | NR | NR
19 | Dorfman et al, 2019[38] | DP | Neural Nets | NR | NR | NR | NR
20 | Qiu et al, 2019[39] | PP | U-Net CNN | NR | NR | NR | NR
21 | Aghaei et al, 2019[40] | OP | ANN-MLP | 73.30% | 76.20% | 70.20% | 0.762
| | | SVM | 67.20% | 66.10% | 68.40% | 0.731
| | | RF | 67.20% | 61% | 73.70% | 0.751
| | | LR (FS) | 67.20% | 61% | 73.70% | 0.711
| | | LR (BS) | 66.40% | 64.40% | 67.70% | 0.718
22 | Cirillo et al, 2019[41] | DP | VGG-16 | 77.53% | NR | NR | NR
| | | GoogLeNet | 73.80% | NR | NR | NR
| | | ResNet-50 | 77.79% | NR | NR | NR
| | | ResNet-101 without data augmentation | 90.54% | 74.35% | 94.25% | NR
| | | ResNet-101 with data augmentation | 82.72% | NR | NR | NR
23 | Tran et al, 2019[42] | OP | k-NN with k = 1–6 or 8–20 | 100% | NA | NA | NR
24 | Yadav et al, 2019[43] | DP | MDS modeling | 80% | 97.00% | 60.00% | NR
| | | SVM | 82.43% | 87.80% | 83.33% | NR
25 | Jiao et al, 2019[44] | DP | R101A CNN | 82.04% | NA | NA | NR
| | | IV2RA CNN | 83.02% | NA | NA | NR
| | | R101FA CNN | 84.51% | NA | NA | NR
26 | Liu et al, 2018[45] | PP | Least Squares Regression | NR | NR | NR | NR
| | | Decision tree | NR | NR | NR | NR
| | | Sigmoid Neural Nets | NR | NR | NR | NR
| | | Hyperbolic Tangent Neural Net | NR | NR | NR | NR
| | | Combined Model (Tree + NN) | NR | NR | NR | NR
27 | Martinez-Jimenez et al, 2018[46] | OP | Recurrent Partitioning Random Forest | 85.35% | NR | NR | NR
28 | Su et al, 2018[47] | OP | Random Forest | NA | NA | NA | NR
29 | Tang et al, 2018[48] | OP | LR | 80.50% | 84.40% | 77.70% | 0.875
| | | XGBoost | 85.40% | 82.00% | 89.70% | 0.920
30 | Cobb et al, 2018[49] | OP | Random Forest | NA | NA | NA | NR
| | | Stochastic Gradient Boosting | NR | | |
31 | Cho MJ et al, 2018[50] | DP | K-means | 96% | NR | NR | NR
32 | Kuo et al, 2018[51] | OP | MLR | 72.70% | 22.10% | 93.30% | NR
33 | Tan et al, 2017[52] | PP | NR | NR | NR | NR | NR
34 | Huang et al, 2016[53] | OP | SVM | 100% | NA | NA | NR
35 | Park et al, 2015[54] | PP | Feature wrapping | 77.30% | 99% | 74.10% | NR
36 | Serrano et al, 2015[55] | PP | SVM | 79.73% | 97% | 60% | NR
37 | Mukherjee et al, 2014[56] | DP | SVM with 3rd polynomial kernel | 86.13% | NA | NA | NR
| | | Bayesian classifier | 81.15% | NA | NA | NR
38 | Mendoza et al, 2014[57] | DP | LDA | 95.70% | 97.90% | 99.60% | NR
| | DP | Random Forest | 87.90% | NR | NR | NR
| | DP | SVM | 90.80% | NR | NR | NR
39 | Acha et al, 2013[58] | DP | k-NN | 66.2% | NR | NR | NR
| | | SVM | 75.7% | NR | NR | NR
| | PP | k-NN | 83.8% | NR | NR | NR
| | | SVM | 82.4% | NR | NR | NR
40 | Schneider et al, 2012[59] | OP | CART Decision Tree with Gini splitting function | 73.30% | NA | NA | NR
41 | Patil et al, 2009[60] | OP | Bayesian classifier | 97.78% | 100% | 95.50% | 0.978
| | | Decision Tree | 96.12% | 96.60% | 95.51% | 0.961
| | | SVM | 96.12% | 98.60% | 93.26% | 0.961
| | | Back propagation | 95% | 96.71% | 93.26% | 0.949
42 | Yamamura et al, 2008[61] | OP | ANN | 100% | NA | NA | NR
| | | LR | 72% | NA | NA | NR
43 | Correa et al, 2008[62] | DP | SVM | 95.05% | NR | NR | NR
44 | Acha et al, 2005[63] | DP | Fuzzy ARTMAP Neural Network | 82.26% | 83.01% | NA | NR
45 | Yeong et al, 2005[64] | OP | ANN | 86% | 75% | 97% | NR
46 | Serrano et al, 2005[65] | DP | Fuzzy ARTMAP Neural Network | 88.57% | 83.01% | NA | NR
47 | Yamamura et al, 2004[66] | OP | ANN | 100% | 100% | 100% | NR
| | | LR | 80% | 66.70% | 85.70% | NR
| | | ANN with leave-one-out cross-validation | 86.60% | 66.70% | 95.20% | NR
48 | Acha et al, 2003[67] | OP | Fuzzy ARTMAP Neural Network | 82.60% | NR | NR | NR
49 | Estahbanati et al, 2002[68] | OP | ANN | 90% | 80% | NA | NR
50 | Hsu et al, 2000[69] | PP | Shallow Neural Net | NA | NA | NA | NR
51 | Frye et al, 1996[70] | OP | Feed-forward, back-propagation error adjustment model | 98% | NA | NA | NR
| | | | 77% | NA | NA | NR

ADTree, alternating decision tree; AUC, area under the curve; CNN, convolutional neural network; DNN, deep neural network; DP, diagnosis prediction; k-NN, k-nearest neighbor; LASSO, least absolute shrinkage and selection operator; LDA, linear discriminant analysis; MLR, multiple logistic regression; NA, not applicable; NB classifier, naive Bayes classifier; NR, not reported; OP, outcome prediction; PP, preoperative planning; RF, random forest.


Breakdown of the Applications of ML Models in Diagnosis Prediction, Outcome Prediction, and Preoperative Planning

In total, 51 studies were included in the review, which evaluated the accuracy of 103 ML algorithms. Of these, 27 were on burns surgery and 24 on general reconstructive surgery. The publication years ranged from 1996 to 2020, with 25 studies published in 2019–2020 alone. The clinical utility of these algorithms was to assist clinicians in diagnosis prediction (n = 22), outcome prediction (n = 21), and preoperative planning (n = 8). In diagnosis prediction, algorithms were created to assist in automated burn depth diagnosis from 2D photography (n = 9) and total burn surface area (n = 1), automated diagnosis of craniosynostosis (n = 5), wound identification in 2D photography (n = 2), diagnosis and severity assessment of facial palsy (n = 1), diagnosis of congenital auricular deformities (n = 1), identification of emotional responses to plastic surgery on Twitter (n = 1), automated age estimation after rhinoplasty (n = 1), and identifying the correct answer to frequently asked questions (n = 1). In outcome prediction, the ML algorithms predicted mortality in burn patients (n = 5), the occurrence of AKI in burn and trauma patients (n = 4), occurrence of postoperative complications in breast and head and neck free flap reconstruction (n = 3), concentration and response of aminoglycosides in burn patients (n = 2), postoperative faces after oculoplastic and craniosynostosis surgery (n = 2), burn healing time (n = 1), mortality in patients with necrotizing soft tissue infection (n = 1), delay in radiotherapy following cancer excision (n = 1), posttraumatic stress disorder following burns (n = 1), and factors predicting the occurrence of burns in the pediatric population (n = 1).
In preoperative planning, ML was used to predict which wounds will need grafting (n = 2), which patients will need orthognathic or cleft palate operations (n = 2), planning of orthognathic and mandibular resections (n = 2), predicting open wound size (n = 1), and the complexity of reconstruction following head and neck cancer excision (n = 2).

ML Models Demonstrate High Accuracy, Sensitivity, and Specificity That May Enhance Clinical Decision-making

The 51 studies evaluated 103 ML algorithms (Table 2). The pooled mean accuracy of the ML algorithms was 86.84% (range 60.00%–100%). The pooled mean sensitivity and specificity were 81.88% (range 5.00%–99.30%) and 86.38% (range 60.00%–100%), respectively, as reported in 39 models. A subgroup analysis was performed based on the clinical utility of the algorithms. For diagnosis prediction, the pooled accuracy, sensitivity, and specificity of the ML algorithms were 88.80% (range 66.20%–97.60%), 90.62% (range 75.80%–97.90%), and 86.81% (range 60.00%–99.60%), respectively. In outcome prediction, these were 86.11% (range 66.20%–97.60%), 69.67% (range 5.00%–100%), and 85.94% (range 60.00%–100%), respectively. In preoperative planning, two studies reported the accuracy, sensitivity, and specificity, which were 80.28% (range 77.30%–83.80%), 98.00% (range 97.00%–99.00%), and 67.05% (range 60.00%–74.10%). A second subgroup analysis of the reported accuracy was performed based on the type of model utilized. The mean accuracy for NNs was 88.25% (range 73.80%–100%), for SVMs 88.02% (range 67.20%–100%), for decision trees/random forests 78.75% (range 60.00%–96.12%), and for logistic regression 76.85% (range 66.40%–95.00%).

Breakdown and Analysis of the Supervised and Unsupervised ML Models Utilized

Supervised ML was utilized in 50 of the included studies and unsupervised learning in three (two studies employed both supervised and unsupervised learning). The supervised ML algorithms identified are summarized in Table 3. The most commonly used were NNs (n = 34), SVMs (n = 13), decision trees/random forests (DT/RF, n = 10), and LR (n = 9). The unsupervised ML models utilized were K-means clustering and ShapeWorks software with principal component analysis; the algorithm was not reported in one study.
Table 3.

Technical Characteristics of ML Algorithms Utilized in Burns and Reconstructive Surgery

Study No. | Author | Function | Purpose | Input | Output | Supervised or Unsupervised | Modeling (Classification or Regression) | Real or Synthetic Data | Training | Validation | Test
1 | Abubakar et al, 2020[20] | DP | Differentiate healthy versus burned skin in both white and black skin | 2D photographs | Differentiate healthy versus burned skin in both white and black skin | Supervised | Classification | Data augmentation | 80% | NA | 20%
2 | Chauhan J et al, 2020[21] | DP | Diagnose depth of burns | 2D photographs | Differentiate body part + severity of burn | Supervised | Classification | Data augmentation | 80% | 20% | Separate test set
3 | Desbois et al, 2020[22] | DP | Automated assessment of TBSA | Anthropometric measurements | Automated assessment of TBSA | Supervised | Regression | Real data | 80% | NA | 20%
4 | Rashidi et al, 2020[23] | OP | Prediction of AKI in burn and trauma patients | Renal injury biomarkers and urine output | Prediction of AKI in burn and trauma patients | Supervised | Classification | Real data | 59% | NA | 41%
5 | Bhalodia et al, 2020[24] | DP | Measuring severity of craniosynostosis | CT images | Measuring severity of craniosynostosis | Unsupervised | NA | Real data | NR | NR | NR
6 | Guarin et al, 2020[25] | DP | Diagnosis and severity assessment of facial palsy | 2D photographs | Automatic localization of 68 facial features in photographs of healthy subjects and patients | Unsupervised | NA | Real data | 90% | 5% | 5%
7 | Formeister et al, 2020[26] | OP | Predicting any type of complication following free flap reconstruction | 14 patient characteristics | Prediction of complications in microvascular free flaps | Supervised | Classification | Real data | 80% | NA | 20%
8 | Boczar et al, 2020[27] | DP | Answering frequently asked questions | Participant question | Correct answer to FAQs | Supervised | Classification | Real data | NR | NR | NR
9 | O’Neil et al, 2020[28] | OP | Predicting flap failure in microvascular breast free flap reconstruction | 7 patient characteristics | Flap failure (yes/no) | Supervised | Classification | Data augmentation | 50%–70% | NA | 30%–50%
10 | Yoo et al, 2020[29] | OP | Postoperative appearance following oculoplastic surgery for thyroid-associated ophthalmopathy | Preoperative photograph | Postoperative photograph | Supervised | Regression | Data augmentation | NR | NR | NR
11 | Angullia et al, 2020[30] | OP | Prediction of changes in face shape from craniosynostosis surgery | High-resolution CT | Predict changes in face shape from craniosynostosis surgery | Supervised | Regression | Real data | NR | NR | NR
12 | Eguia et al, 2019[31] | OP | Prediction of in-hospital mortality in patients with necrotizing skin and soft tissue infection | Patient demographics, comorbidities, and hospital characteristics (73 parameters in total) | Prediction of in-hospital mortality in patients with necrotizing skin and soft tissue infection | Supervised | Classification | Real data | 80% | NA | 20%
13 | Ohura et al, 2019[32] | DP | Diagnosis of wound ulcer | 2D photographs | Differentiation of healthy tissue from ulcer region | Supervised | Classification | Real data | 90% | NA | 10%
14 | Porras et al, 2019[33] | DP | Diagnosis of craniosynostosis from 3D photographs | 3D photographs | Diagnosis of craniosynostosis from 3D photographs | Supervised | Classification | Real data | NR | NR | NR
15 | Knoops et al, 2019[34] | PP | Orthognathic surgery | CT | Need for orthognathic surgery (yes/no) | Supervised | Classification | Real data | 80% | NA | 20%
16 | Hallac et al, 2019[35] | DP | Diagnosis of congenital auricular deformities | 2D photographs | Identify presence of congenital auricular deformities (yes/no) | Supervised | Classification | Real data | NR | NR | NR
17 | Levites et al, 2019[36] | DP | Identify emotional responses to plastic surgery | Twitter key words | Analyze emotional responses to plastic surgery procedures | Supervised | Classification | Real data | 60% | 20% | 20%
18 | Shew et al, 2019[37] | OP | Prediction of delay in radiotherapy | Variable inpatient data | Prediction of delay of radiotherapy (more or less than 50 days to treatment) | Supervised | Classification | Real data | NR | NR | NR
19 | Dorfman et al, 2019[38] | DP | Identification of age perception following rhinoplasty | 2D photographs | Automated age prediction | Supervised | Classification | Real data | NR | NR | NR
20 | Qiu et al, 2019[39] | PP | Plan mandibular resections | CT | Automated 3D mandibular segmentation preoperatively | Supervised | Regression | Real data | 48% | 7% | 45%
21 | Aghaei et al, 2019[40] | OP | Elaboration of factors predicting pediatric burns | Various health, social, and demographic risk factors | Most important factors in predicting burn occurrence | Supervised | Classification | Real data | 70% | NA | 30%
22 | Cirillo et al, 2019[41] | DP | Diagnose depth of burns | 2D photographs | Classification of burn depth | Supervised | Classification | Data augmentation | NR | NR | NR
23 | Tran et al, 2019[42] | OP | Prediction of AKI in burn and trauma patients | Renal injury biomarkers and urine output | Prediction of AKI in burn and trauma patients | Supervised | Classification | Real data | 80% | NA | 20%
24 | Yadav et al, 2019[43] | DP | Diagnose depth of burns | 2D photographs | Classify burns by depth and surface area | Supervised | Classification | Real data | NR | NR | NR
25 | Jiao et al, 2019[44] | DP | Diagnose depth of burns | 2D photographs | Classify burns by depth and surface area | Supervised | Classification | Real data | 87% | NA | 13%
26 | Liu et al, 2018[45] | PP | Explore whether ML can predict open wound size | Fluid resuscitation volume and other patient factors | Predict open wound size | Supervised | Regression | Real data | 90% | NA | 10%
27 | Martinez-Jimenez et al, 2018[46] | PP | Predicting which wounds need grafting | Infrared thermography | Prediction of treatment modality required for burn wound | Supervised | Classification | Real data | 61% | NA | 39%
28 | Su et al, 2018[47] | OP | Prediction of PTSD & major depressive disorder in burn patients | Burn-related variables, empirically derived risk factors from a previous meta-analysis, & theory-derived cognitive variables | Prediction of PTSD & major depressive disorder in burn patients | NR | NR | NR | NR | NR | NR
29 | Tang et al, 2018[48] | OP | Prediction of AKI in burn patients | Patient risk factors and laboratory measurements | Prediction of AKI in burn patients | Supervised | Classification | Real data | NR | NR | NR
30 | Cobb et al, 2018[49] | OP | Prediction of mortality of burn patients | Patient risk factors and laboratory measurements | Predict whether a patient would (1) live versus (2) die | Supervised | Classification | Real data | 66% | NA | 34%
31 | Cho MJ et al, 2018[50] | DP | Diagnosis of craniosynostosis | CT images | Automated differentiation of craniosynostosis from benign metopic ridge on CT | Unsupervised | Classification | Real data | NR | NR | NR
32 | Kuo et al, 2018[51] | OP | Predicting surgical site infection | Patient risk factors | Prediction of SSI (yes/no) | Supervised | Classification | Real data | 70% | NA | 30%
33 | Tan et al, 2017[52] | PP | Complexity of reconstruction following basal cell cancer excision | Patient risk factors | Prediction of intraoperative surgical complexity | Supervised | Classification | Real data | NR | NR | NR
34 | Huang et al, 2016[53] | OP | Prediction of mortality of burn patients | Patient risk factors and laboratory measurements | Prediction of whether a patient would (1) live versus (2) die | Supervised | Classification | Real data | 21% | 66% | 13%
35 | Park et al, 2015[54] | PP | Prediction of need for surgery in patients with cleft lip/palate | Lateral cephalograms | Prediction of need for surgery in patients with cleft lip/palate | Supervised | Classification | Real data | NR | NR | NR
36 | Serrano et al, 2015[55] | PP | Predicting which wounds need grafting | 2D photographs | Predicting which wounds need grafting (yes/no) | Supervised | Classification | Real data | 21% | NA | 79%
37 | Mukherjee et al, 2014[56] | DP | Wound recognition and classification | 2D photographs | Automated assessment of wound classification | Supervised | Classification | Real data | NR | NR | NR
38 | Mendoza et al, 2014[57] | DP | Diagnosis of craniosynostosis | CT images | Automated craniosynostosis diagnosis from CT | Supervised | Classification | Real data | NR | NR | NR
39 | Acha et al, 2013[58] | DP | Diagnose depth of burns | 2D photographs | Classify burns by depth | Supervised | Classification | Real data | 21% | NA | 79%
| | PP | Predicting which wounds need grafting | 2D photographs | Predict whether a burn will need grafting | Supervised | Classification | Real data | 21% | NA | 79%
40 | Schneider et al, 2012[59] | OP | Prediction of AKI in burn patients | Patient risk factors and laboratory measurements | Prediction of AKI in burn patients | Supervised | Classification | Real data | 71% | NA | 29%
41 | Patil et al, 2009[60] | OP | Prediction of mortality of burn patients | Patient risk factors and laboratory measurements | Prediction of mortality in burn patients | Supervised | Classification | Real data | K-fold cross-validation | K-fold cross-validation | K-fold cross-validation
42 | Yamamura et al, 2008[61] | OP | Prediction of response of aminoglycosides against MRSA infection in burn patients | Patient risk factors and laboratory measurements | Prediction of response of aminoglycosides against MRSA infection in burn patients | Supervised | Classification | Real data | K-fold cross-validation | K-fold cross-validation | K-fold cross-validation
43 | Ruiz-Correa et al, 2008[62] | DP | Diagnosis of craniosynostosis | CT images | Classification of craniosynostosis | Supervised | Classification | Real data | | |
44 | Acha et al, 2005[63] | DP | Diagnose depth of burns | 2D photographs | Automated assessment of burn wound depth | Supervised | Classification | Real data | 56% | NA | 44%
45 | Yeong et al, 2005[64] | OP | Prediction of burn healing time | Reflectance spectrometer measurements | Prediction of burn healing time | Supervised | Classification | Real data | NR | NR | NR
46 | Serrano et al, 2005[65] | DP | Diagnose depth of burns | 2D photographs | Automated assessment of burn wound depth | Supervised | Classification | Real data | NR | NR | NR
47 | Yamamura et al, 2004[66] | OP | Prediction of aminoglycoside/ab × concentration in burn patients | Patient risk factors and laboratory measurements | Prediction of aminoglycoside/ab × concentration in burn patients | Supervised | Classification | Real data | 100% | 100% | 100%
| | | | | | Supervised | Classification | Real data | 80% | 66.70% | 85.70%
48 | Acha et al, 2003[67] | DP | Identify burn tissue from healthy, and classify depth of burn | 2D photographs | Identify burn tissue from healthy, and classify depth of burn | Supervised | Classification | Real data | 80% | NA | 20%
49 | Estahbanati et al, 2002[68] | OP | Prediction of mortality of burn patients | Patient risk factors and laboratory measurements | Prediction of mortality of burn patients | Supervised | Classification | Real data | 75% | NA | 25%
50 | Hsu et al, 2000[69] | PP | Skull reconstruction of areas needing an operation | CT | Skull reconstruction in CT for preoperative planning | Supervised | Regression | Real data | NA | NA | NA
51 | Frye et al, 1996[70] | OP | Prediction of mortality of burn patients | Patient risk factors and laboratory measurements | Prediction of mortality of burn patients | Supervised | Classification | Real data | 90% | NA | 10%
| | | Prediction of hospital stay of burn patients | | Prediction of hospital stay of burn patients | Supervised | Classification | Real data | 90% | NA | 10%

DP, diagnosis prediction; NA, not applicable; NR, not reported; OP, outcome prediction; PP, preoperative planning.

Technical Characteristics of ML Algorithms Utilized in Burns and Reconstructive Surgery.

Lack of Data Augmentation and Validation during Training

Data augmentation is often used with small datasets to artificially create more data samples, increasing the effective dataset size and, as a result, the statistical performance of a model. Data augmentation was used in only six of the 51 included studies; the remaining articles relied only on real data. For diagnostic predictions, the majority of studies utilized 2D photographs (n = 15) and CT scans (n = 4). For clinical outcome prediction, patient risk factors and laboratory measurements on admission were utilized in most models (n = 17). In preoperative planning, CT scans (n = 3) and 2D photographs (n = 2) comprised the majority of inputs utilized.

Training ML models requires splitting the dataset into training, validation, and test sets, where the validation set is used for hyperparameter tuning during training to prevent "overfitting" of the model to the given data. In total, 35 studies report their training and testing splits, with an 80%–20% split between the training and testing sets being the most common methodology presented (n = 9); only 10 of these 35 studies utilized a validation set during training. In terms of output, ML algorithms functioned primarily via classification (45 studies) rather than regression (six studies). Classification was utilized to allocate a new subject to a specific outcome (for example, a burn patient needing grafting versus healing by secondary intention), whereas regression was used in studies aiming to predict a postoperative outcome directly (a postoperative CT scan, a postoperative 2D photograph, or a predicted wound size).
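The splitting procedure described above can be sketched in a few lines of Python; the function name and split fractions are illustrative, not taken from any included study:

```python
import random

def train_val_test_split(samples, val_frac=0.1, test_frac=0.2, seed=0):
    """Shuffle and partition a dataset into disjoint train/validation/test sets.

    The validation set is used for hyperparameter tuning during training;
    the test set is held back for a single, final estimate of performance.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

# An 80%-20% train-test split (the most common methodology above),
# with a 10% validation slice carved out of the training portion:
train, val, test = train_val_test_split(range(100))
```

Carving the validation slice out of the training portion preserves the 80%–20% train-test split while still leaving held-out data for hyperparameter tuning.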

Risk of Bias Assessment

The risk of bias (RoB) was assessed via the QUADAS-2 tool for RoB and concerns over applicability (Fig. 2). The majority of studies had an unclear RoB in the patient selection (n = 20) and index test (n = 24) domains. Most had a low RoB in the reference standard (n = 39) and flow and timing (n = 35) domains. Regarding applicability, more than half of the studies raised low concern in the patient selection, index test, and reference standard domains (n = 32, n = 33, and n = 38, respectively).
Fig. 2.

Summary of the QUADAS-2 (Quality Assessment on Diagnostic Accuracy Studies-2) analysis.

DISCUSSION

This is the first systematic review focusing on the application of ML in plastic surgery, adding to previous reviews on AI in the specialty.[72] After careful selection of studies that demonstrated the clinical application of these algorithms, we identified 51 articles describing the application of 103 ML algorithms. In our review, the mean accuracy for diagnosis prediction, outcome prediction, and preoperative planning was 88.80%, 86.11%, and 80.28%, respectively. The model class with the highest mean accuracy was neural networks (NNs; 88.25%), followed by support vector machines (SVMs; 88.02%), decision trees/random forests (78.75%), and logistic regression (LR; 76.85%). Similar findings have been reported in systematic reviews of other surgical specialties. In orthopedic surgery and neurosurgery, the most commonly utilized models have been NNs, followed by SVMs and LR.[3,73] Outcome prediction of ML models in these specialties ranged from 70% to 97%, in line with the findings of this report.[8,72] Nonsurgical specialties have also utilized NNs and SVMs most frequently, with accuracies approaching 96% depending on the specialty and model intent.[74,75] A potential reason for this preference is that NNs, SVMs, and decision trees most closely resemble the cognition behind clinical judgment, in which clinicians derive outcome classifications from multiple, nonlinear inputs. In plastic surgery, ML demonstrated potentially superior accuracy in diagnosis and outcome prediction when compared with clinician judgment.
In burn surgery, models included in this review were able to classify burn thickness with an accuracy of up to 99.3%, in contrast to the 60%–70% achieved by surgeons.[21,76] Models have also demonstrated the ability to predict mortality rates with an accuracy of 93%, outperforming commonly used predictive models such as the Belgian score, Boston score, and APACHE II, which achieve sensitivities of 72%, 66%, and 81%, respectively.[50] In microsurgery, models produced high accuracy in the prognosis of free flap failure (66%), whereas commonly used prognostic surgical risk calculators have been deemed unreliable for head and neck and breast microsurgical reconstruction (Brier scores <0.01 and 0.09–0.44, respectively).[77,78] In addition, ML models demonstrated predictive capacity for outcomes for which predictive models have not yet been developed but may assist the surgeon in the clinical workplace; examples include prediction of acute kidney injury (AKI) in burn patients, mortality from necrotizing infections, and postoperative outcomes in craniosynostosis surgery and reconstructive surgery following craniosynostosis correction.[29,31,48,59] ML in plastic surgery has incredible potential to advance patient care, but it is still in its infancy. This review has highlighted several patterns in successful applications. Whenever a diagnosis relies solely on a visual stimulus, for example 2D photography or CT, ML has consistently and reliably outperformed surgeons' diagnostic accuracy.[18,37,39,40,46,51,53,59,63] Further, in conditions with well-established correlations between certain risk markers and an outcome of interest, such as deranged blood tests on admission and AKI in burn patients, ML yielded highly accurate predictive algorithms.[38,44,55] However, attempts to include weakly related risk markers resulted in algorithms with lower overall predictive accuracy, rendering them unsafe for clinical practice.
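The accuracy, sensitivity, and specificity figures compared throughout this review derive from the standard 2 × 2 confusion table; a minimal sketch, with invented counts for a hypothetical burn-depth classifier:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity and specificity from a 2x2 confusion table."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # all correct calls / all cases
    sensitivity = tp / (tp + fn)                 # true-positive rate
    specificity = tn / (tn + fp)                 # true-negative rate
    return accuracy, sensitivity, specificity

# Invented counts for a hypothetical classifier evaluated on 200 wounds:
acc, sens, spec = diagnostic_metrics(tp=80, fp=10, fn=20, tn=90)  # 0.85, 0.80, 0.90
```

The example makes the review's point concrete: a single pooled accuracy can mask very different sensitivity/specificity trade-offs, which is why each algorithm should be examined in isolation.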
This review further identified that some plastic surgery subspecialties, such as hand surgery, have yet to incorporate this technology. This may be due to the challenging nature of classifying potential outcomes (eg, classification of hand function outcomes) or a lack of data; future studies should aim to harness the potential of this technology. From a technological standpoint, this review identified three key areas for improving future algorithms: expanding the dataset size using data augmentation, utilizing novel deep learning models, and making proper use of algorithm validation in research. Data augmentation can be invaluable in the creation of future algorithms, addressing the main obstacle of access to the large amounts of data needed to train these models. It is a process by which one can artificially enhance the diversity of a patient database without actually collecting new data. (See figure, Supplemental Digital Content 1, which displays data augmentation utilizing random cropping, random rotation, and mirroring (horizontal flipping); a single datapoint has been augmented into seven novel datapoints.) This was utilized in only five studies in this review. O'Neil et al utilized data augmentation to expand a database of 11 patients to 269, allowing the creation of an algorithm to predict the probability of total free flap failure in microvascular breast reconstruction.[24] Until large-scale anonymized medical datasets become more readily available, such as through the OpenSAFELY platform, data augmentation can help clinicians overcome the challenges of limited patient datasets. Secondly, future research could substantially benefit from more recent advances in the field of NNs and deep learning. Compared with traditional ML, deep NNs can process vast amounts of data efficiently and discover complex underlying patterns in the data at scale.
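The geometric augmentations mentioned above (mirroring and rotation; random cropping is omitted here) can be sketched on a toy "image" represented as a nested list; a real pipeline would operate on pixel arrays, and all names are illustrative:

```python
def hflip(img):
    """Mirror an image (nested list of pixel values) horizontally."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img):
    """One datapoint in, eight out: the original plus its mirror and the
    three 90-degree rotations of each."""
    variants = [img, hflip(img)]
    rotated = img
    for _ in range(3):
        rotated = rot90(rotated)
        variants.extend([rotated, hflip(rotated)])
    return variants
```

For an asymmetric input this yields the original plus seven distinct novel variants, the same seven-fold augmentation illustrated in the supplemental figure.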
A limitation here is the large volume of appropriately structured data needed to train these models. Lastly, future research should ensure that all algorithms created are validated before testing. Separating the validation and test sets is crucial because it prevents overfitting of an algorithm to a given set of data and the reporting of misleadingly high performance. Our review identified that only 10 of the 51 studies utilized validation, indicating a high risk of bias in the remaining studies, as their high accuracies could be the result of overfitting.

The evidence in this study is limited by the lack of high-quality level I evidence. The existing studies are mostly small retrospective case series that are inherently at risk of bias. There are no prospective, randomized controlled trials evaluating these technologies in the clinical setting against clinician acumen, which limits our assessment of the safety and utility of the technologies. Further, the mean accuracy, sensitivity, and specificity of included algorithms were reported collectively for all algorithms, rather than via subgroup analysis based on the condition examined, because of insufficient studies in the specialty. This pooling of results is not an indication of the accuracy of any individual model, and each algorithm should be examined in isolation; nevertheless, it still provides invaluable insight into the accuracy of these algorithms in plastic surgery. Finally, because of the limited MeSH terms currently available for ML in medicine, potentially important studies on the topic may have been missed. These are expected to be minimal, as we performed a wide library search complemented by extensive reference checking to provide an accurate, up-to-date review.
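The validation practice urged above is often implemented as k-fold ("K-cross") validation, as reported by several of the tabulated burn studies; a minimal sketch of the fold bookkeeping (function name illustrative):

```python
def k_fold_indices(n, k):
    """Yield (train, validation) index lists for k-fold cross-validation:
    every sample serves in the validation fold exactly once."""
    # Distribute n samples into k folds, the first n % k folds one larger.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size
```

Within each fold the model is never scored on data it was fit to, which is exactly the separation whose absence this review flags as a source of overfitting bias.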

CONCLUSIONS

ML has the potential to enhance clinical decision-making in plastic surgery by making highly accurate diagnostic and outcome predictions; however, the technology is still in its infancy. There is vast heterogeneity between published studies with regard to the clinical task the algorithms are designed for and the model utilized, which precluded data synthesis and meta-analysis. There is a pressing need for larger prospective, randomized controlled trials providing level I and II data, in which these algorithms are utilized in the clinical setting. Future research could benefit from larger datasets, data augmentation, state-of-the-art deep learning models, and more rigorous validation during design.