Steindor Ellertsson, Hrafn Loftsson, Emil L Sigurdsson.
Abstract
OBJECTIVE: Machine learning (ML) is expected to play an increasing role within primary health care (PHC) in the coming years. No peer-reviewed studies exist that evaluate the diagnostic accuracy of ML models compared to general practitioners (GPs). The aim of this study was to evaluate the diagnostic accuracy of an ML classifier on primary headache diagnoses in PHC, compare its performance to that of GPs, and examine which signs and symptoms have the greatest impact on a prediction.
Keywords: General practice; Iceland; artificial intelligence; computer-assisted diagnosis; electronic health records; primary headache disorders; statistical data interpretation
Year: 2021 PMID: 34585629 PMCID: PMC8725823 DOI: 10.1080/02813432.2021.1973255
Source DB: PubMed Journal: Scand J Prim Health Care ISSN: 0281-3432 Impact factor: 2.581
Figure 1. The inclusion and filtering process of the CTNs. The clinical features were created by annotating the CTNs, which were then split into training, validation and test sets.
The demographic comparison of the training, validation and test sets.
| | Training set | Validation set | Test set |
|---|---|---|---|
| Total size | 600 | 100 | 100 |
| Female | 434 (72.3%) | 65 (65%) | 72 (72%) |
| Mean age (min–max) | 34.56 (6–90) | 33.53 (8–78) | 31.29 (8–77) |
The performance metrics for the classifier and physicians on the test set.
| | Sensitivity | Specificity | PPV | MCC |
|---|---|---|---|---|
| ML classifier | ||||
| Cluster headache | 0.83 | 1.00 | 1.00 | 0.91 |
| Migraine with aura | 0.92 | 0.99 | 0.92 | 0.90 |
| Migraine without aura | 0.67 | 0.99 | 0.86 | 0.73 |
| Tension headache | 0.99 | 0.85 | 0.95 | 0.87 |
| Weighted average | 0.95 | 0.88 | 0.94 | 0.86 |
| GP Specialist 1 | ||||
| Cluster headache | 0.83 | 1.00 | 1.00 | 0.91 |
| Migraine with aura | 0.92 | 0.95 | 0.73 | 0.79 |
| Migraine without aura | 0.80 | 0.92 | 0.53 | 0.61 |
| Tension headache | 0.89 | 0.96 | 0.98 | 0.79 |
| Weighted average | 0.88 | 0.95 | 0.88 | 0.77 |
| GP Specialist 2 | ||||
| Cluster headache | 0.67 | 1.00 | 1.00 | 0.81 |
| Migraine with aura | 0.58 | 0.96 | 0.70 | 0.59 |
| Migraine without aura | 0.90 | 0.87 | 0.45 | 0.58 |
| Tension headache | 0.90 | 0.95 | 0.98 | 0.79 |
| Weighted average | 0.86 | 0.94 | 0.85 | 0.73 |
| GP Specialist 3 | ||||
| Cluster headache | 0.83 | 1.00 | 1.00 | 0.91 |
| Migraine with aura | 0.67 | 0.99 | 0.89 | 0.74 |
| Migraine without aura | 0.50 | 0.95 | 0.56 | 0.48 |
| Tension headache | 0.97 | 0.72 | 0.91 | 0.75 |
| Weighted average | 0.90 | 0.78 | 0.88 | 0.73 |
| GP Trainee 1 | ||||
| Cluster headache | 0.83 | 1.00 | 1.00 | 0.91 |
| Migraine with aura | 0.92 | 0.99 | 0.92 | 0.90 |
| Migraine without aura | 0.70 | 0.93 | 0.54 | 0.56 |
| Tension headache | 0.93 | 0.88 | 0.96 | 0.80 |
| Weighted average | 0.89 | 0.91 | 0.90 | 0.78 |
| GP Trainee 2 | ||||
| Cluster headache | 0.83 | 0.95 | 0.56 | 0.65 |
| Migraine with aura | 0.42 | 0.99 | 0.83 | 0.55 |
| Migraine without aura | 0.80 | 0.79 | 0.31 | 0.41 |
| Tension headache | 0.81 | 0.95 | 0.98 | 0.64 |
| Weighted average | 0.78 | 0.91 | 0.76 | 0.58 |
| GP Trainee 3 | ||||
| Cluster headache | 0.83 | 0.98 | 0.71 | 0.75 |
| Migraine with aura | 0.50 | 0.99 | 0.86 | 0.62 |
| Migraine without aura | 0.70 | 0.90 | 0.44 | 0.49 |
| Tension headache | 0.94 | 0.90 | 0.97 | 0.82 |
| Weighted average | 0.87 | 0.91 | 0.86 | 0.75 |
PPV stands for Positive Predictive Value and MCC for Matthews Correlation Coefficient.
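The four metrics in the table can each be computed from a one-vs-rest confusion matrix for a single diagnosis. The following is a minimal sketch using the standard textbook definitions; the function name `one_vs_rest_metrics` and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def one_vs_rest_metrics(tp, fp, fn, tn):
    """Compute sensitivity, specificity, PPV and MCC from binary confusion counts.

    tp/fp/fn/tn are the true positive, false positive, false negative and
    true negative counts for one diagnosis treated as "positive" against
    all other diagnoses.
    """
    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    ppv = tp / (tp + fp) if (tp + fp) else nan
    # MCC is undefined when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else nan
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "mcc": mcc,
    }


# Hypothetical counts for one diagnosis in a 100-note test set (assumed, not from the source).
example = one_vs_rest_metrics(tp=5, fp=0, fn=1, tn=94)
print({name: round(value, 2) for name, value in example.items()})
```

Unlike sensitivity or PPV alone, MCC uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance such as a test set dominated by one diagnosis.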
Figure 2. (a) Shapley Additive Explanations (SHAP) for the most impactful input features for a CH prediction. Each feature's name is on the left. The color of each dot represents the feature's value, with blue lower than red: a blue dot means that the feature was negative (not present) in a single CTN, and a red dot means that it was present. The X-axis shows the impact on the prediction, where a higher score increases the probability of a positive prediction.
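SHAP values are based on Shapley values from cooperative game theory, which attribute a prediction to individual features by averaging each feature's marginal contribution over all feature subsets. The sketch below computes exact Shapley values by brute force for a toy scoring function. It is only an illustration of the idea: the source does not state which SHAP explainer was used, and the toy model, its three binary features, and the all-zero baseline are assumptions made here.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for `model` at input `x`, relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this is
    only practical for very small inputs.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical score: two additive terms plus one interaction term.
    return 2.0 * z[0] + 1.0 * z[1] + z[0] * z[2]


# All three (hypothetical) features present, compared with none present.
phi = shapley_values(toy_model, x=[1, 1, 1], baseline=[0, 0, 0])
print(phi)
```

The attributions sum to the difference between the model output at `x` and at the baseline, and the interaction term is shared equally between the two features involved.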
Figure 3. The ROC curve for the ML classifier, plotted for each diagnosis on the test set. The AUROC score is a measure of classifier performance with a maximum value of 1. It is calculated as the area under the ROC curve. The performance of each physician is plotted, as is the mean performance of the trainee physicians and specialists.
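As a rough illustration of how an AUROC score is obtained, the sketch below builds ROC points by sweeping a threshold over predicted scores and integrates them with the trapezoidal rule. The function names and the example labels and scores are assumptions for illustration, not data or code from the study.

```python
def roc_points(labels, scores):
    """Return ROC points as (false positive rate, true positive rate) pairs.

    `labels` are 1 for the diagnosis of interest and 0 otherwise;
    `scores` are the classifier's predicted scores for that diagnosis.
    Tied scores are processed together so they produce a single point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auroc(labels, scores):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical labels and scores for one diagnosis (assumed, not from the source).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auroc(labels, scores), 3))
```

A physician gives a single diagnosis per case rather than a score, so each physician corresponds to one point on this plot at (1 - specificity, sensitivity) rather than a full curve.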