Julian Hatwell, Mohamed Medhat Gaber, R Muhammad Atif Azad.
Abstract
BACKGROUND: Computer Aided Diagnostics (CAD) can help medical practitioners make critical decisions about their patients' disease conditions. Practitioners require access to the chain of reasoning behind CAD to build trust in the CAD advice and to supplement their own expertise. Yet CAD systems may be based on black box machine learning models and high dimensional data sources such as electronic health records, magnetic resonance imaging scans, and cardiotocograms. These foundations make interpretation and explanation of the CAD advice very challenging. This challenge is recognised throughout the machine learning research community. eXplainable Artificial Intelligence (XAI) is emerging as one of the most important research areas of recent years because it addresses the interpretability and trust concerns of critical decision makers, including those in clinical and medical practice.
Keywords: AdaBoost; Black box problem; Computer aided diagnostics; Explainable AI; Interpretability
Year: 2020 PMID: 33008388 PMCID: PMC7531148 DOI: 10.1186/s12911-020-01201-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Explanation of a classifier for foetal heart abnormalities
| Decision: | Explanation: | Contrast: | Confidence: |
|---|---|---|---|
| Normal | DP ≤0.0013 ∧ | −74.5% | Coverage: 60.0% |
| | ALTV ≤7.7 ∧ | −43.2% | Precision: 98.2% of covered |
| Prior 79.0% | Min ≤113.15 | −34.58% | |
DP: Number of prolonged decelerations per second.
ALTV: % time with abnormal short term variability.
Min: Minimum of baseline foetal heart rate histogram
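The Coverage and Precision figures in the Confidence column can be illustrated with a minimal Python sketch. It assumes the usual definitions (coverage is the share of instances that satisfy every condition of the rule; precision is the share of those covered instances that belong to the explained class). The helper name `rule_coverage_precision`, the class labels and the four toy instances are hypothetical; only the three thresholds are taken from the table above.

```python
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, float]


def rule_coverage_precision(
    instances: List[Instance],
    labels: List[str],
    rule: Callable[[Instance], bool],
    predicted_class: str,
) -> Tuple[float, float]:
    """Return (coverage, precision) of a conjunctive rule on labelled data."""
    if not instances or len(instances) != len(labels):
        raise ValueError("instances and labels must be non-empty and equal length")
    # Labels of the instances that satisfy every condition of the rule.
    covered_labels = [y for x, y in zip(instances, labels) if rule(x)]
    coverage = len(covered_labels) / len(instances)
    if not covered_labels:
        return coverage, 0.0
    precision = sum(y == predicted_class for y in covered_labels) / len(covered_labels)
    return coverage, precision


def foetal_rule(x: Instance) -> bool:
    # Thresholds copied from the explanation table; the data below is invented.
    return x["DP"] <= 0.0013 and x["ALTV"] <= 7.7 and x["Min"] <= 113.15


toy_instances = [
    {"DP": 0.0, "ALTV": 2.0, "Min": 100.0},    # covered
    {"DP": 0.0, "ALTV": 5.0, "Min": 110.0},    # covered
    {"DP": 0.002, "ALTV": 2.0, "Min": 100.0},  # fails the DP condition
    {"DP": 0.0, "ALTV": 30.0, "Min": 100.0},   # fails the ALTV condition
]
toy_labels = ["Normal", "Suspect", "Pathological", "Suspect"]

coverage, precision = rule_coverage_precision(
    toy_instances, toy_labels, foetal_rule, "Normal"
)
print(f"coverage={coverage:.2f}, precision={precision:.2f}")
```

On the toy data the rule covers two of four instances, and one of those two is labelled Normal. The values reported in the table come from the authors' experiments and are not reproduced by this sketch.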
Explanation of a non-clinical mental health assessment classifier
| Decision: | Explanation: | Contrast: | Confidence: |
|---|---|---|---|
| Has sought treatment | work interfere ≤1.5 ∧ | −45.6% | Coverage: 24.9% |
| | family history >0.9 | −23.3% | Precision: 94.6% of covered |
| Prior 54.9% | | | |
Work interfere: If you have a mental health condition, do you feel it interferes with your work?
Answers: 0 = Often, 1 = Sometimes, 2 = Not Sure, 3 = Rarely, 4 = Never
family history: Do you have a family history of mental illness?
Answers: 0 = No, 1 = Not Sure, 2 = Yes
Explanation of automated 30-day hospital readmission risk assessment
| Decision: | Explanation: | Contrast: | Confidence: |
|---|---|---|---|
| Risk: Low | # inpatient ≤1.0 ∧ | −58.1% | Coverage: 16.5% |
| | # emergency ≤0.5 ∧ | −46.7% | Precision: 98.1% of covered |
| | # outpatient ≤0.5 ∧ | −41.8% | |
| Prior 65.0% | # diagnoses ≤5.5 | −39.6% | |
# xxxx: number of e.g. hospital visits of type xxxx
Explanation of a classifier for thyroid condition
| Decision: | Explanation: | Contrast: | Confidence: |
|---|---|---|---|
| Abnormal | TSH >6.83 | −78.5% | Coverage: 8.2% |
| | | | Precision: 98.2% of covered |
| Prior 26.0% | | | |
TSH: Thyroid Stimulating Hormone level test result
Data sets used in the experiments
| Data set | Target | Classes | Class balance | Features | Of which nominal | N |
|---|---|---|---|---|---|---|
| Breast cancer | mb | 2 | 0.63:0.37 | 31 | 1 | 569 |
| Cardiotocography | NSP | 3 | 0.78:0.14:0.08 | 22 | 1 | 2126 |
| Diabetic retinopathy | dr | 2 | 0.53:0.47 | 20 | 1 | 1151 |
| Cleveland heart | HDisease | 2 | 0.54:0.46 | 14 | 8 | 303 |
| Mental health survey ’16 | mh2 | 2 | 0.50:0.50 | 46 | 44 | 1433 |
| Mental health survey ’14 | treatment | 2 | 0.51:0.49 | 24 | 3 | 1259 |
| Hospital readmission | readmitted | 2 | 0.54:0.46 | 65 | 1 | 25000 |
| Thyroid | diagnosis | 2 | 0.74:0.26 | 30 | 3 | 9172 |
| Understanding society² | mh | 3 | 0.22:0.62:0.16 | 330 | 246 | 11745 |

² This data set is safeguarded by the UK Data Service and used under end user license. It is not included in our repository.
Summary of related work
| Author(s) | Date | Medical Condition(s) | Model | XAI Mechanism |
|---|---|---|---|---|
| Lamy et al. [ | 2019 | Breast Cancer (treatment) | WkNN and MDS | CBR |
| Kwon et al. [ | 2018 | General health | RNN | t-SNE and Visual Analytics |
| Adnan and Islam [ | 2017 | Heart disease, dementia | Tree ensembles | Logical Rules |
| Jalali and Pfeifer [ | 2016 | Cancer biomarkers | L1-SVM ensemble | Feature importance |
| Turgeman and May [ | 2016 | Hospital readmission | C5.0 Tree and SVM ensemble | Logical Rule |
| Jovanovic et al. [ | 2016 | Hospital readmission | Tree Lasso | Regression Coefficients |
| Letham et al. [ | 2015 | Stroke | BRL | Bayesian Rules |
| Caruana et al. [ | 2015 | Pneumonia risk | GA2M | PI plots |
| Kästner et al. [ | 2012 | Breast cancer | Neural Gas | Fuzzy Rules |
Fig. 1 Conceptual diagram of Ada-WHIPS
Fig. 2 Conceptual diagram of a decision path for one instance through one tree
Fig. 3 Counterfactual spaces - conceptual diagram
Final AdaBoost model scores
| Data | ntree | Accuracy (undiscretised) | Cohen’s kappa (undiscretised) | Accuracy (discretised) | Cohen’s kappa (discretised) |
|---|---|---|---|---|---|
| SAMME | | | | | |
| Breast cancer | 200 | 0.98 | 0.96 | 0.96 | 0.92 |
| Cardiotocography | 800 | 0.94 | 0.84 | 0.89 | 0.70 |
| Diabetic retinopathy | 1000 | 0.68 | 0.36 | 0.66 | 0.33 |
| Cleveland heart | 200 | 0.77 | 0.52 | 0.80 | 0.59 |
| Mental health survey ’16 | 200 | 0.88 | 0.76 | 0.88 | 0.75 |
| Mental health survey ’14 | 200 | 0.83 | 0.65 | 0.81 | 0.62 |
| Hospital readmission | 800 | 0.62 | 0.22 | 0.60 | 0.18 |
| Thyroid | 1200 | 0.97 | 0.92 | 0.80 | 0.45 |
| Understanding society | 600 | 0.64 | 0.13 | 0.61 | 0.14 |
| SAMME.R | | | | | |
| Breast cancer | 1000 | 0.98 | 0.96 | 0.95 | 0.90 |
| Cardiotocography | 1600 | 0.94 | 0.82 | 0.88 | 0.67 |
| Diabetic retinopathy | 200 | 0.69 | 0.38 | 0.65 | 0.30 |
| Cleveland heart | 400 | 0.76 | 0.50 | 0.82 | 0.63 |
| Mental health survey ’16 | 800 | 0.87 | 0.73 | 0.86 | 0.72 |
| Mental health survey ’14 | 200 | 0.80 | 0.60 | 0.81 | 0.63 |
| Hospital readmission | 200 | 0.62 | 0.22 | 0.63 | 0.23 |
| Thyroid | 1600 | 0.97 | 0.92 | 0.76 | 0.37 |
| Understanding society | 200 | 0.62 | 0.13 | 0.62 | 0.15 |

Accuracy and Cohen’s kappa on held out data. The undiscretised models are used by Ada-WHIPS and LORE; the discretised models are used by Anchors.
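Accuracy and Cohen's kappa can diverge sharply on imbalanced data, which is why both are reported. The following is a minimal sketch of the standard kappa formula, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (accuracy) and p_e is the agreement expected by chance from the label marginals. The function name `cohens_kappa` and the toy label vectors are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa between true and predicted labels."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    # Observed agreement: identical to plain accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the marginal class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case: a single class in both vectors, so agreement is total.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Balanced toy example: 80% accuracy, kappa 0.6.
balanced_true = ["a"] * 5 + ["b"] * 5
balanced_pred = ["a"] * 4 + ["b"] + ["b"] * 4 + ["a"]
kappa_balanced = cohens_kappa(balanced_true, balanced_pred)

# Majority-class predictor on imbalanced data: 80% accuracy, kappa 0.
skewed_true = ["a"] * 8 + ["b"] * 2
skewed_pred = ["a"] * 10
kappa_majority = cohens_kappa(skewed_true, skewed_pred)

print(round(kappa_balanced, 2), round(kappa_majority, 2))
```

The second toy case shows why a model can report moderate accuracy alongside a kappa near zero: always predicting the majority class agrees with the truth no more often than chance.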
Worked example for foetal heart abnormalities data set
Worked example for non-clinical mental health assessment data set
Worked example for automated 30-day hospital readmission risk assessment data set
Worked example for thyroid condition data set
Fig. 4 Mean Coverage for SAMME model explanations. Guide lines are added to mitigate over-plotting
Fig. 5 Mean Coverage for SAMME.R model explanations. Guide lines are added to mitigate over-plotting
Coverage: Top two by mean rank (mrnk) for three-way comparisons
| Data | 1st | mrnk | 2nd | mrnk | N | Test statistic | p-value |
|---|---|---|---|---|---|---|---|
| SAMME | | | | | | | |
| Breast cancer | LORE | 1.54 | Ada-WHIPS | 1.61 | 170 | 0.41 | 0.3412 |
| Cardiotocography | LORE | 1.52 | Ada-WHIPS | 1.62 | 637 | 1.06 | 0.1442 |
| Diabetic retinopathy | | 1.39 | LORE | 2.20 | 344 | 6.76 | ≈0** |
| Cleveland heart | LORE | 1.63 | Ada-WHIPS | 1.82 | 90 | 0.8158 | 0.2072 |
| Mental health survey ’16 | | 1.51 | Anchors | 2.22 | 429 | 6.19 | ≈0** |
| SAMME.R | | | | | | | |
| Breast cancer | Ada-WHIPS | 1.48 | LORE | 1.70 | 170 | 1.29 | 0.0980 |
| Cardiotocography | LORE | 1.52 | Ada-WHIPS | 1.62 | 637 | 1.14 | 0.1269 |
| Diabetic retinopathy | | 1.57 | Anchors | 2.17 | 344 | 4.98 | 0.0000** |
| Cleveland heart | LORE | 1.50 | Ada-WHIPS | 1.86 | 90 | 1.52 | 0.0649 |
| Mental health survey ’16 | Ada-WHIPS | 1.68 | Anchors | 1.80 | 429 | 1.04 | 0.1492 |
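The mean rank (mrnk) columns summarise how the methods compare instance by instance. As a hedged illustration, the sketch below ranks the methods on each instance (rank 1 for the highest score, tied methods sharing the average of their positions) and then averages the ranks per method. The tie-handling rule, the function names and the four toy score rows are assumptions made for illustration; the source does not spell out the exact procedure.

```python
from typing import Dict, List


def average_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank methods on one instance: 1 = highest score, ties share the mean rank."""
    ordered = sorted(scores, key=lambda m: -scores[m])
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of methods tied with position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2  # mean of the 1-based positions i..j
        for method in ordered[i : j + 1]:
            ranks[method] = shared
        i = j + 1
    return ranks


def mean_ranks(per_instance: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each method's rank over all instances."""
    totals = {method: 0.0 for method in per_instance[0]}
    for scores in per_instance:
        for method, rank in average_ranks(scores).items():
            totals[method] += rank
    return {method: total / len(per_instance) for method, total in totals.items()}


# Invented coverage scores for four instances (not data from the paper).
toy_scores = [
    {"Ada-WHIPS": 0.40, "Anchors": 0.10, "LORE": 0.45},
    {"Ada-WHIPS": 0.35, "Anchors": 0.05, "LORE": 0.20},
    {"Ada-WHIPS": 0.30, "Anchors": 0.30, "LORE": 0.50},  # tie for 2nd place
    {"Ada-WHIPS": 0.50, "Anchors": 0.20, "LORE": 0.10},
]
mrnk = mean_ranks(toy_scores)
print(mrnk)
```

With three methods the mean ranks always sum to 6, so a mean rank close to 1 indicates a method that wins on almost every instance.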
Coverage: Mean rank (mrnk) for two-way comparisons
| Data | 1st | mrnk | 2nd | mrnk | N | Test statistic | p-value |
|---|---|---|---|---|---|---|---|
| SAMME | | | | | | | |
| Mental health survey ’14 | Ada-WHIPS | 1.16 | Anchors | 1.84 | 377 | 66 | ≈0** |
| Hospital readmission | Ada-WHIPS | 1.01 | Anchors | 1.98 | 1000 | 782.5 | ≈0** |
| Thyroid | Ada-WHIPS | 1.10 | Anchors | 1.90 | 1000 | 14806 | ≈0** |
| Understanding society | Ada-WHIPS | 1.20 | Anchors | 1.80 | 1000 | 858 | ≈0** |
| SAMME.R | | | | | | | |
| Mental health survey ’14 | Ada-WHIPS | 1.13 | Anchors | 1.87 | 377 | 119 | ≈0** |
| Hospital readmission | Ada-WHIPS | 1.33 | Anchors | 1.67 | 1000 | 174990 | ≈0** |
| Thyroid | Ada-WHIPS | 1.02 | Anchors | 1.98 | 1000 | 1754 | ≈0** |
| Understanding society | Ada-WHIPS | 1.07 | Anchors | 1.93 | 1000 | 6417 | ≈0** |
Fig. 6 Mean Precision SAMME. Guide lines are added to mitigate over-plotting
Fig. 7 Mean Precision SAMME.R. Guide lines are added to mitigate over-plotting
Fig. 8 Distributions of Precision SAMME
Fig. 9 Distributions of Precision SAMME.R
Proportion of over-fitting, 0.0 precision explanations
| Data | SAMME: Ada-WHIPS | SAMME: Anchors | SAMME: LORE | SAMME.R: Ada-WHIPS | SAMME.R: Anchors | SAMME.R: LORE |
|---|---|---|---|---|---|---|
| Breast cancer | 0 | 0.01 | 0.04 | 0 | 0.18 | 0.06 |
| Cardiotocography | 0.00 | 0.07 | 0.09 | 0.00 | 0.08 | 0.09 |
| Diabetic retinopathy | 0 | 0.15 | 0.19 | 0.00 | 0.13 | 0.28 |
| Cleveland heart | 0 | 0.03 | 0.14 | 0 | 0.03 | 0.12 |
| Mental health survey ’16 | 0 | 0.00 | 0.01 | 0 | 0.04 | 0.06 |
| Mental health survey ’14 | 0 | 0.01 | N/A | 0 | 0.01 | N/A |
| Hospital readmission | 0 | 0.15 | N/A | 0 | 0.01 | N/A |
| Thyroid | 0 | 0.03 | N/A | 0 | 0.15 | N/A |
| Understanding society | 0.00 | 0.01 | N/A | 0.01 | 0.08 | N/A |
Fig. 10 Distributions of Rule Length. Note the y-axis is log10 scaled
Precision: Top two by mean rank (mrnk) for three-way comparisons
| Data | 1st | mrnk | 2nd | mrnk | N | Test statistic | p-value |
|---|---|---|---|---|---|---|---|
| SAMME | | | | | | | |
| Breast cancer | | 1.40 | Ada-WHIPS | 1.97 | 170 | 3.31 | 0.0004** |
| Cardiotocography | | 1.39 | Ada-WHIPS | 2.09 | 637 | 7.89 | ≈0** |
| Diabetic retinopathy | | 1.62 | Ada-WHIPS | 1.96 | 344 | 2.85 | 0.0022** |
| Cleveland heart | | 1.16 | Ada-WHIPS | 2.03 | 90 | 3.68 | 0.0001** |
| Mental health survey ’16 | Anchors | 1.83 | LORE | 1.95 | 429 | 1.02 | 0.1539 |
| SAMME.R | | | | | | | |
| Breast cancer | | 1.35 | Ada-WHIPS | 2.08 | 170 | 4.38 | <0.0001** |
| Cardiotocography | | 1.28 | Ada-WHIPS | 2.09 | 637 | 9.16 | ≈0** |
| Diabetic retinopathy | | 1.50 | Ada-WHIPS | 1.92 | 344 | 3.47 | 0.0002** |
| Cleveland heart | | 1.24 | Ada-WHIPS | 1.90 | 90 | 2.77 | 0.0028** |
| Mental health survey ’16 | Anchors | 1.83 | Ada-WHIPS | 1.84 | 429 | 0.08 | 0.4678 |
Precision: Mean rank (mrnk) for two-way comparisons
| Data | 1st | mrnk | 2nd | mrnk | N | Test statistic | p-value |
|---|---|---|---|---|---|---|---|
| SAMME | | | | | | | |
| Mental health survey ’14 | Anchors | 1.11 | Ada-WHIPS | 1.89 | 377 | 45074 | ≈0** |
| Hospital readmission | Anchors | 1.24 | Ada-WHIPS | 1.76 | 1000 | 333580 | ≈0** |
| Thyroid | Anchors | 1.19 | Ada-WHIPS | 1.81 | 1000 | 405600 | ≈0** |
| Understanding society | Anchors | 1.08 | Ada-WHIPS | 1.92 | 1000 | 458060 | ≈0** |
| SAMME.R | | | | | | | |
| Mental health survey ’14 | Anchors | 1.11 | Ada-WHIPS | 1.89 | 377 | 45281 | ≈0** |
| Hospital readmission | Anchors | 1.07 | Ada-WHIPS | 1.93 | 1000 | 480520 | ≈0** |
| Thyroid | Anchors | 1.47 | Ada-WHIPS | 1.53 | 1000 | 233670 | 0.1601 |
| Understanding society | Anchors | 1.31 | Ada-WHIPS | 1.69 | 1000 | 266150 | ≈0** |
Fig. 11 Mean Stability SAMME. Guide lines are added to mitigate over-plotting
Fig. 12 Mean Stability SAMME.R. Guide lines are added to mitigate over-plotting
Stability: Top two by mean rank (mrnk) for three-way comparisons
| Data | 1st | mrnk | 2nd | mrnk | N | Test statistic | p-value |
|---|---|---|---|---|---|---|---|
| SAMME | | | | | | | |
| Breast cancer | | 1.48 | Anchors | 2.16 | 170 | 3.96 | <0.0001** |
| Cardiotocography | | 1.48 | Anchors | 2.19 | 637 | 7.99 | ≈0** |
| Diabetic retinopathy | Ada-WHIPS | 1.70 | Anchors | 1.84 | 344 | 1.18 | 0.1198 |
| Cleveland heart | Ada-WHIPS | 1.60 | Anchors | 1.70 | 90 | 0.42 | 0.3374 |
| Mental health survey ’16 | Anchors | 1.87 | LORE | 2.00 | 429 | 1.14 | 0.1269 |
| SAMME.R | | | | | | | |
| Breast cancer | | 1.38 | Anchors | 2.18 | 170 | 4.67 | ≈0** |
| Cardiotocography | | 1.49 | LORE | 2.10 | 637 | 6.80 | ≈0** |
| Diabetic retinopathy | Ada-WHIPS | 1.64 | Anchors | 1.67 | 344 | 0.24 | 0.4050 |
| Cleveland heart | Ada-WHIPS | 1.49 | Anchors | 1.73 | 90 | 0.98 | 0.1638 |
| Mental health survey ’16 | | 1.44 | LORE | 2.18 | 429 | 6.45 | ≈0** |
Stability: Mean rank (mrnk) for two-way comparisons
| Data | 1st | mrnk | 2nd | mrnk | N | Test statistic | p-value |
|---|---|---|---|---|---|---|---|
| SAMME | | | | | | | |
| Mental health survey ’14 | Anchors | 1.19 | Ada-WHIPS | 1.81 | 377 | 39293 | ≈0** |
| Hospital readmission | Ada-WHIPS | 1.43 | Anchors | 1.57 | 1000 | 136050 | ≈0** |
| Thyroid | Anchors | 1.35 | Ada-WHIPS | 1.65 | 1000 | 307840 | ≈0** |
| Understanding society | Anchors | 1.14 | Ada-WHIPS | 1.86 | 1000 | 405340 | ≈0** |
| SAMME.R | | | | | | | |
| Mental health survey ’14 | Anchors | 1.19 | Ada-WHIPS | 1.81 | 377 | 40515 | ≈0** |
| Hospital readmission | Anchors | 1.14 | Ada-WHIPS | 1.86 | 1000 | 439750 | ≈0** |
| Thyroid | Ada-WHIPS | 1.18 | Anchors | 1.82 | 1000 | 50600 | ≈0** |
| Understanding society | Anchors | 1.39 | Ada-WHIPS | 1.61 | 1000 | 220150 | ≈0** |
Fig. 13 Distributions of Computation Time per Explanation. Note the y-axis is log10 scaled
Coverage of explanations of AdaBoost SAMME
| Data | Ada-WHIPS | Anchors | LORE |
|---|---|---|---|
| Breast cancer | 0.3635±0.0068 | 0.1530±0.0053 | 0.3914±0.0156 |
| Cardiotocography | 0.3867±0.0092 | 0.0637±0.0018 | 0.4417±0.0120 |
| Diabetic retinopathy | 0.3225±0.0125 | 0.0636±0.0039 | 0.1060±0.0060 |
| Cleveland heart | 0.2310±0.0084 | 0.1101±0.0079 | 0.3259±0.0259 |
| Mental health survey ’16 | 0.4974±0.0026 | 0.3915±0.0083 | 0.3777±0.0086 |
| Mental health survey ’14 | 0.3368±0.0063 | 0.1483±0.0030 | N/A |
| Hospital readmission | 0.1809±0.0040 | 0.0095±0.0004 | N/A |
| Thyroid | 0.3630±0.0074 | 0.0636±0.0015 | N/A |
| Understanding society | 0.6679±0.0108 | 0.2729±0.0040 | N/A |
Coverage of explanations of AdaBoost SAMME.R
| Data | Ada-WHIPS | Anchors | LORE |
|---|---|---|---|
| Breast cancer | 0.33502±0.0055 | 0.1513±0.0054 | 0.3574±0.0157 |
| Cardiotocography | 0.3894±0.0093 | 0.0667±0.0019 | 0.4765±0.0128 |
| Diabetic retinopathy | 0.1349±0.0053 | 0.0759±0.0040 | 0.0945±0.0068 |
| Cleveland heart | 0.2182±0.0085 | 0.1180±0.0078 | 0.3754±0.0271 |
| Mental health survey ’16 | 0.3578±0.0054 | 0.1778±0.0072 | 0.3248±0.0101 |
| Mental health survey ’14 | 0.2927±0.0053 | 0.1444±0.0030 | N/A |
| Hospital readmission | 0.1598±0.0038 | 0.1345±0.0042 | N/A |
| Thyroid | 0.3793±0.0073 | 0.0224±0.0008 | N/A |
| Understanding society | 0.6891±0.0107 | 0.1057±0.0038 | N/A |
Precision of explanations of AdaBoost SAMME
| Data | Ada-WHIPS | Anchors | LORE |
|---|---|---|---|
| Breast cancer | 0.9819±0.0022 | 0.9915±0.0062 | 0.8405±0.0179 |
| Cardiotocography | 0.9369±0.0039 | 0.9915±0.0097 | 0.8209±0.0109 |
| Diabetic retinopathy | 0.8031±0.0075 | 0.8016±0.0188 | 0.6300±0.0182 |
| Cleveland heart | 0.8744±0.0118 | 0.9644±0.0189 | 0.6300±0.0321 |
| Mental health survey ’16 | 0.9862±0.0010 | 0.9873±0.0035 | 0.9744±0.0061 |
| Mental health survey ’14 | 0.9301±0.0021 | 0.9798±0.0056 | N/A |
| Hospital readmission | 0.8973±0.0016 | 0.8163±0.0110 | N/A |
| Thyroid | 0.9205±0.0026 | 0.9441±0.0055 | N/A |
| Understanding society | 0.9643±0.0016 | 0.9749±0.0035 | N/A |
Precision of explanations of AdaBoost SAMME.R
| Data | Ada-WHIPS | Anchors | LORE |
|---|---|---|---|
| Breast cancer | 0.9831±0.0014 | 0.9793±0.0103 | 0.8215±0.0210 |
| Cardiotocography | 0.9324±0.0032 | 0.9117±0.0107 | 0.7931±0.0110 |
| Diabetic retinopathy | 0.8272±0.0073 | 0.8164±0.0175 | 0.5481±0.0203 |
| Cleveland heart | 0.9059±0.0105 | 0.9640±0.0189 | 0.5971±0.0293 |
| Mental health survey ’16 | 0.9849±0.0013 | 0.9502±0.0100 | 0.9129±0.0124 |
| Mental health survey ’14 | 0.9030±0.0043 | 0.9811±0.0056 | N/A |
| Hospital readmission | 0.9129±0.0013 | 0.9811±0.0032 | N/A |
| Thyroid | 0.9481±0.0015 | 0.8154±0.0110 | N/A |
| Understanding society | 0.8677±0.0043 | 0.8903±0.0081 | N/A |
Stability of explanations of AdaBoost SAMME
| Data | Ada-WHIPS | Anchors | LORE |
|---|---|---|---|
| Breast cancer | 0.9500±0.0024 | 0.8992±0.0072 | 0.8226±0.0137 |
| Cardiotocography | 0.9067±0.0044 | 0.8311±0.0078 | 0.8113±0.0085 |
| Diabetic retinopathy | 0.7745±0.0067 | 0.7196±0.0114 | 0.6388±0.0106 |
| Cleveland heart | 0.7973±0.0106 | 0.7671±0.0145 | 0.5906±0.0195 |
| Mental health survey ’16 | 0.9770±0.0011 | 0.9706±0.0053 | 0.9592±0.0046 |
| Mental health survey ’14 | 0.9125±0.0021 | 0.9283±0.0053 | N/A |
| Hospital readmission | 0.8930±0.0017 | 0.7306±0.0071 | N/A |
| Thyroid | 0.9121±0.0028 | 0.9033±0.0047 | N/A |
| Understanding society | 0.9594±0.0017 | 0.9586±0.0035 | N/A |
Stability of explanations of AdaBoost SAMME.R
| Data | Ada-WHIPS | Anchors | LORE |
|---|---|---|---|
| Breast cancer | 0.9505±0.0017 | 0.8885±0.0089 | 0.8035±0.161 |
| Cardiotocography | 0.9020±0.0038 | 0.8226±0.0087 | 0.7844±0.0086 |
| Diabetic retinopathy | 0.7821±0.0064 | 0.7436±0.0109 | 0.5814±0.0111 |
| Cleveland heart | 0.8171±0.0092 | 0.7807±0.0143 | 0.5985±0.0190 |
| Mental health survey ’16 | 0.9707±0.0015 | 0.9051±0.0073 | 0.9013±0.0088 |
| Mental health survey ’14 | 0.8852±0.0041 | 0.9293±0.0051 | N/A |
| Hospital readmission | 0.9075±0.0029 | 0.9514±0.0029 | N/A |
| Thyroid | 0.9401±0.0015 | 0.7716±0.0071 | N/A |
| Understanding society | 0.8616±0.0043 | 0.8624±0.0063 | N/A |