Yiming Zhang1,2, Ying Weng1, Jonathan Lund2.
Abstract
In recent years, artificial intelligence (AI) has shown great promise in medicine. However, a lack of explainability makes AI difficult to apply in clinical practice. Research has been conducted into explainable artificial intelligence (XAI) to overcome the black-box nature of AI methods. Compared with AI techniques such as deep learning, XAI can provide both a model's decisions and explanations for them. In this review, we survey recent trends in medical diagnosis and surgical applications of XAI. We searched for articles published between 2019 and 2021 in PubMed, IEEE Xplore, the Association for Computing Machinery digital library, and Google Scholar. We included the articles that met the selection criteria and then extracted and analyzed the relevant information from those studies. Additionally, we provide an experimental showcase on breast cancer diagnosis that illustrates how XAI can be applied in medical applications. Finally, we summarize the XAI methods used in the surveyed medical applications, the challenges the researchers encountered, and future research directions. The survey indicates that medical XAI is a promising research direction, and this study aims to serve as a reference for medical experts and AI scientists when designing medical XAI applications.
Keywords: artificial intelligence; deep learning; diagnosis; explainable artificial intelligence (XAI); machine learning; surgery
Year: 2022 PMID: 35204328 PMCID: PMC8870992 DOI: 10.3390/diagnostics12020237
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1The relationship between artificial intelligence, machine learning, deep learning, and explainable artificial intelligence.
Figure 2Taxonomy of XAI methods, post hoc XAI types, and some examples.
Figure 3The overall pipeline of a medical XAI application: the XAI methods can be intrinsic or post hoc, and they can provide decision-making and explanation to the doctors.
Figure 4Chronic wound image and its importance map using LIME [30]: (a) original wound image; (b) importance map.
Figure 5Visual feedback on the surgeon’s surgical task using CAM [50]. The red and orange subsequences in the plot indicate a high contribution to the surgical skill assessment task, whereas the green and blue subsequences indicate a low contribution.
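The CAM visualization in Figure 5 can be sketched in a few lines: a class activation map is the weighted sum of the last convolutional feature maps, using the classifier weights of the target class learned after global average pooling. The following is a minimal numpy illustration, not the code from [50]; the toy feature maps and weights are invented for demonstration.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """CAM (Zhou et al., 2016): weighted sum of conv feature maps using
    the target class's classifier weights, rectified and normalized."""
    # feature_maps: (K, H, W) for images or (K, T) for time series
    cam = np.tensordot(class_weights, feature_maps, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)       # keep only positive evidence
    if cam.max() > 0:
        cam /= cam.max()             # scale to [0, 1] for display
    return cam

# toy example: 3 feature maps of size 4x4
rng = np.random.default_rng(0)
maps = rng.random((3, 4, 4))
weights = np.array([0.5, -0.2, 0.8])
cam = class_activation_map(maps, weights)   # shape (4, 4), values in [0, 1]
```

For the surgical skill task in Figure 5, the same computation over 1-D temporal feature maps of shape (K, T) yields a per-timestep contribution, which is what colors the subsequences in the plot.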
Literature review of medical XAI applications in diagnosis.
| SN# | Reference | Year | Aim | AI Algorithm | AI Evaluation Metrics | XAI Method | XAI Method Type | XAI Evaluation? |
|---|---|---|---|---|---|---|---|---|
| 1 | [ | 2021 | Allergy diagnosis | kNN, SVM, C 5.0, MLP, AdaBag, RF | Accuracy: 86.39% | Condition-prediction (IF-THEN) rules | Rule-based | No |
| 2 | [ | 2021 | Breast cancer therapies | Cluster analysis | N/A | Adaptive | Dimension reduction | No |
| 3 | [ | 2021 | Spine | One-class SVM, binary RF | F1: 80 ± 12% | Local interpretable model-agnostic explanations (LIME) | Explanation by simplification | No |
| 4 | [ | 2021 | Alzheimer’s disease | Two-layer model with RF | First layer: accuracy: 93.95% | SHAP, Fuzzy | Feature relevance, rule-based | No |
| 5 | [ | 2021 | Hepatitis | LR, DT, kNN, SVM, RF | Accuracy: 91.9% | SHAP, LIME, partial dependence plots (PDP) | Feature relevance, explanation by simplification | No |
| 6 | [ | 2021 | Chronic wound | CNN-based model: pretrained VGG-16 | Precision: 95% | LIME | Explanation by simplification | No |
| 7 | [ | 2021 | Fenestral otosclerosis | CNN-based model: proposed | AUC: 99.5% | Visualization of learned deep representations | Visual explanation | No |
| 8 | [ | 2021 | Lymphedema (Chinese EMR) | Counterfactual multi-granularity graph supporting facts extraction (CMGE) method | Precision: 99.04% | Graph neural network, counterfactual reasoning | Restricted neural network architecture | No |
| 9 | [ | 2020 | Clinical diagnosis | Entity-aware Convolutional neural networks (ECNNs) | Top-3 sensitivity: 88.8% | Bayesian network ensembles | Bayesian models | Yes |
| 10 | [ | 2020 | Glioblastoma multiforme (GBM) diagnosis | VGG16 | Accuracy: 97% | LIME | Explanation by simplification | No |
| 11 | [ | 2020 | Pulmonary nodule diagnostic | CNN | Accuracy: 82.15% | Visually interpretable network (VINet), LRP, CAM, VBP | Visual explanation | No |
| 12 | [ | 2020 | Alzheimer’s disease diagnosis | Naïve Bayes (NB), grammatical evolution | ROC: 0.913 | Context-free grammar (CFG) | Rule-based | No |
| 13 | [ | 2020 | Lung cancer diagnosis | Neural networks, RF | N/A | LIME, natural language explanation | Explanation by simplification, text explanation | No |
| 14 | [ | 2020 | Traumatic brain injury (TBI) identification | k-means, spectral clustering, gaussian mixture | N/A | Quality assessment of the clustering features | Feature relevance | No |
| 15 | [ | 2020 | COVID-19 chest X-ray diagnosis | CNN-based model: proposed COVID-Net | Accuracy: 93.3% | GSInquire | Restricted neural network architecture | No |
| 16 | [ | 2020 | Colorectal cancer diagnosis | CNN | Accuracy: 91.08% | Explainable Cumulative Fuzzy Class Membership Criterion (X-CFCMC) | Visual explanation | Yes |
| 17 | [ | 2020 | Diagnosis of thyroid nodules | Neural network | Accuracy: 93.15% | CAM | Visual explanation | No |
| 18 | [ | 2020 | Phenotyping psychiatric disorders diagnosis | DNN | White matter accuracy: 90.22% | Explainable deep neural network (EDNN) | Visual explanation | No |
| 19 | [ | 2020 | Parkinson’s disease (PD) diagnosis | CNN | Accuracy: 95.2% | LIME | Explanation by simplification | No |
| 20 | [ | 2019 | Post-stroke hospital discharge | LR, RF, RF with AdaBoost, MLP | Test accuracy: 71% | LR, LIME | Intrinsic, Explanation by simplification | No |
| 21 | [ | 2019 | Breast cancer diagnostic decision and therapeutic decision | kNN, distance-weighted kNN (WkNN), rainbow boxes-inspired algorithm (RBIA) | Accuracy: 80.3% | Case-based reasoning (CBR) approach | Explanation by example | Yes |
| 22 | [ | 2019 | Alzheimer’s diagnosis | RF, SVM, DT | Sensitivity: 84% | An interpretable ML model: sparse high-order interaction model with rejection option (SHIMR) | Rule-based | No |
SN#: serial number; N/A: not applicable; AI: artificial intelligence; XAI: explainable artificial intelligence; kNN: k-nearest neighbor; SVM: support vector machine; MLP: multi-layer perceptron; RF: random forest; MCC: Matthews correlation coefficient; BSS: Brier skill score; SHAP: SHapley Additive exPlanations; LR: logistic regression; DT: decision tree; LIME: local interpretable model-agnostic explanations; PDP: partial dependence plots; CNN: convolutional neural network; DNN: deep neural network; AUC: area under the curve.
Literature review of medical XAI applications in surgery.
| SN# | Reference | Year | Aim | AI Algorithm | AI Evaluation Metrics | XAI Method | XAI Method Type | XAI Evaluation? |
|---|---|---|---|---|---|---|---|---|
| 23 | [ | 2020 | Evidence-based recommendation surgery | XGBoost | Validation accuracy: 78.9% | SHAP | Feature relevance | No |
| 24 | [ | 2020 | Surgery training | SVM | Accuracy: 92% | Virtual operative assistant | Feature relevance | No |
| 25 | [ | 2019 | Surgical skill assessment | FCN | Suturing accuracy: 100% | CAM | Visual explanation | No |
| 26 | [ | 2019 | Automatic recognition of instruments in laparoscopy videos | CNN | M2CAI Cholec data tuning on InstCnt non-instrument Instrument: | Activation maps | Visual explanation | No |
| 27 | [ | 2019 | Surgical education | CNN | Percentage of relevant frames among top 50 retrieved frames for three phases: 64.42%, 99.54%, 99.09% | Saliency map, content-based image retrieval | Visual explanation, explanation by example | No |
SN#: serial number; AI: artificial intelligence; XAI: explainable artificial intelligence; SHAP: SHapley Additive exPlanations; SVM: support vector machine; FCN: fully convolutional neural network; CAM: class activation mapping; CNN: convolutional neural networks.
Figure 6Interpreting a prediction with the post hoc XAI method: SHAP.
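SHAP, shown in Figure 6, attributes a prediction to features via Shapley values: each feature's contribution is its marginal effect averaged over all feature coalitions, with absent features replaced by a background value. Below is a minimal exact-Shapley sketch in numpy (practical SHAP libraries approximate this); the linear "risk" model and its three features are hypothetical, not the paper's showcase model.

```python
import itertools, math
import numpy as np

def exact_shapley(predict, x, background, n_features):
    """Exact Shapley values for one instance: features outside a
    coalition S are replaced by the background (e.g., mean) value."""
    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                # classic Shapley coalition weight |S|!(n-|S|-1)!/n!
                w = (math.factorial(len(S))
                     * math.factorial(n_features - len(S) - 1)
                     / math.factorial(n_features))
                with_i = background.copy()
                with_i[list(S) + [i]] = x[list(S) + [i]]
                without_i = background.copy()
                without_i[list(S)] = x[list(S)]
                phi[i] += w * (predict(with_i) - predict(without_i))
    return phi

# toy linear model over 3 hypothetical features
coef = np.array([0.7, -0.3, 0.5])
predict = lambda z: float(coef @ z)
x = np.array([2.0, 1.0, -1.0])
bg = np.zeros(3)
phi = exact_shapley(predict, x, bg, 3)
# for a linear model with zero background, phi_i = coef_i * x_i
```

The enumeration is exponential in the number of features, which is why SHAP implementations rely on sampling or model-specific shortcuts such as TreeSHAP.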
Figure 7Interpreting a prediction with the post hoc XAI method: LIME. The x-axis shows the feature effect.
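The LIME explanation in Figure 7 comes from a local surrogate: the black box is queried on perturbations around the instance, the samples are weighted by proximity, and a weighted linear model is fitted whose coefficients are the per-feature effects plotted on the x-axis. A minimal sketch follows, assuming a generic tabular black box; the nonlinear toy function stands in for the paper's breast cancer classifier.

```python
import numpy as np

def lime_explain(predict, x, n_samples=500, sigma=1.0, seed=0):
    """LIME-style local surrogate: perturb around x, weight samples by
    an RBF proximity kernel, fit a weighted linear model."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))
    y = np.array([predict(z) for z in Z])
    d2 = ((Z - x) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * sigma ** 2))            # proximity weights
    A = np.hstack([Z, np.ones((n_samples, 1))])   # intercept column
    # weighted least squares: (A^T W A) beta = A^T W y, lightly regularized
    AtW = A.T * w
    beta = np.linalg.solve(AtW @ A + 1e-6 * np.eye(A.shape[1]), AtW @ y)
    return beta[:-1]                              # per-feature local effects

# toy black box over 2 hypothetical features
f = lambda z: z[0] ** 2 + 3 * z[1]
effects = lime_explain(f, np.array([1.0, 0.0]))
# near x = (1, 0) the local gradient is (2, 3), which the effects approximate
```

The surrogate is only locally faithful: rerunning at a different instance gives different effects, which is exactly the per-patient behavior the figure illustrates.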
Figure 8Interpreting the black-box model’s decisions with a PDP for the feature “mean radius”.
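A partial dependence plot like Figure 8 is computed by sweeping one feature over a grid while averaging the model's predictions over the rest of the dataset. The sketch below uses a synthetic dataset and black box (the "mean radius" role is played by column 0), not the paper's actual breast cancer data.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """One-feature PDP (Friedman, 2001): for each grid value v, set
    column j of every sample to v and average the predictions."""
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        pd_vals.append(np.mean([predict(row) for row in Xv]))
    return np.array(pd_vals)

# toy black box in which feature 0 has a linear effect with slope 2
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
f = lambda z: 2.0 * z[0] + np.sin(z[1]) + z[2] ** 2
grid = np.linspace(-1, 1, 5)
pdp = partial_dependence(f, X, 0, grid)
# pdp rises linearly in the swept feature; the other terms average out
```

Plotting `grid` against `pdp` gives the curve in the figure; the averaging hides feature interactions, which is the method's main caveat.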