Felix Balzer, Markus A Feufel, Malte L Schmieding, Marvin Kopka, Konrad Schmidt, Sven Schulz-Niethammer.
Abstract
BACKGROUND: Symptom checkers are digital tools assisting laypersons in self-assessing the urgency and potential causes of their medical complaints. They are widely used but face concerns from both patients and health care professionals, especially regarding their accuracy. A 2015 landmark study substantiated these concerns using case vignettes to demonstrate that symptom checkers commonly err in their triage assessment.
Keywords: digital health; eHealth apps; mobile phone; patient-centered care; symptom checker; triage
Year: 2022 PMID: 35536633 PMCID: PMC9131144 DOI: 10.2196/31810
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 7.076
Overall triage accuracy of symptom checker apps in 2015 (data from a study by Semigran et al [23]; N=15) and 2020 (data collected by us; N=22).
| Sample of symptom checker apps | 2015: overall triage accuracy (%), median (IQR) | 2015: apps, n (%) | 2020: overall triage accuracy (%), median (IQR) | 2020: apps, n (%) |
| --- | --- | --- | --- | --- |
| All triaging apps included in the respective study | 59.1 (51.7-67.1) | 15 (100) | 55.8 (47.8-62.9) | 22 (100) |
| Subset of apps included in both studies | 55.9 (49.4-65.7) | 8 (53) | 58.3 (53.8-65.3) | 8 (36) |
| Subset of apps capable of providing self-care triage advice | 59.5 (53.3-70.7) | 11 (73) | 59.5 (50.0-64.4) | 17 (77) |
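Each row of this table summarizes per-app accuracies as a median with IQR. The snippet below is a rough sketch of that summary step using Python's `statistics` module. The per-app values are not listed here, so the input list is made up for illustration, and the `inclusive` quartile method is an assumption, since the source does not state how quartiles were computed.

```python
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) for a list of per-app accuracy percentages."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3


# Hypothetical per-app triage accuracies (%), NOT taken from the study.
example_accuracies = [40.0, 50.0, 55.0, 60.0, 70.0]

med, q1, q3 = median_iqr(example_accuracies)
print(f"{med:.1f} ({q1:.1f}-{q3:.1f})")  # same "median (IQR)" layout as the table
```

Different quartile conventions can shift Q1 and Q3 noticeably for samples as small as 8 to 22 apps, so results from this sketch need not reproduce the published intervals.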
Figure 1. Overall triage accuracy of 8 symptom checkers included in both samples (2015 and 2020) and assessed on the same 45 case vignettes in 2015 and 2020. Data on symptom checker accuracy for 2015 are taken from a study by Semigran et al [23]. Of the 8 symptom checkers, 3 never recommended self-care as triage level (colored in red) in 2015 and 2 in 2020. One symptom checker (Symptomate) never recommended self-care in the 2015 study by Semigran et al [23] but provides such recommendations in 2020, as indicated by our data and reported by Hill et al [24,35]. NHS: National Health Service.
Confusion matrix of triage advice of 11 symptom checker apps assessed in 2015 by Semigran et al [23].
| Triage recommendation provided by the symptom checker app | Gold standard: emergency (n=130 evaluations), n (%) | Gold standard: nonemergency (n=128 evaluations), n (%) | Gold standard: self-care (n=127 evaluations), n (%) |
| --- | --- | --- | --- |
| Emergency care | 103 (79.2) | 41 (32) | 23 (18.1) |
| Nonemergency | 22 (16.9) | 74 (57.8) | 46 (36.2) |
| Self-care | 5 (3.8) | 13 (10.1) | 58 (45.6) |

The gold standard solution is the correct triage level for each case vignette (15 case vignettes per category).
Confusion matrix of triage advice of 17 symptom checker apps assessed in 2020 on the same 45 case vignettes as used by Semigran et al [23] in 2015.
| Triage recommendation provided by the symptom checker app | Gold standard: emergency (n=202 evaluations), n (%) | Gold standard: nonemergency (n=205 evaluations), n (%) | Gold standard: self-care (n=193 evaluations), n (%) |
| --- | --- | --- | --- |
| Emergency care | 116 (57.4) | 26 (12.6) | 6 (3.1) |
| Nonemergency | 80 (39.6) | 147 (71.7) | 99 (51.2) |
| Self-care | 6 (2.9) | 32 (15.6) | 88 (45.5) |

The gold standard solution is the correct triage level for each case vignette (15 case vignettes per category).
Figure 2. Accuracy, sensitivity, and specificity of symptom checker apps and laypersons for 2 binary triage decisions on whether emergency care is required and whether professional medical care is required at all. Data for symptom checkers are taken from Semigran et al [23], Hill et al [24,35], and our own data collection. Data on laypersons are taken from Schmieding et al [36].
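To illustrate how a 3-level confusion matrix collapses into the 2 binary decisions named in Figure 2, the sketch below pools the counts of the 2020 confusion matrix above. It is only an illustration: the function and variable names are hypothetical, and pooling all evaluations is an assumption. The values plotted in Figure 2 may have been computed differently (for example, per app), so the output need not match the figure.

```python
# Pooled 2020 counts: outer key = level recommended by the app,
# inner key = gold standard level of the case vignette.
CONFUSION_2020 = {
    "emergency":    {"emergency": 116, "nonemergency": 26,  "self_care": 6},
    "nonemergency": {"emergency": 80,  "nonemergency": 147, "self_care": 99},
    "self_care":    {"emergency": 6,   "nonemergency": 32,  "self_care": 88},
}


def binary_metrics(confusion, positive_levels):
    """Treat `positive_levels` as the positive class and score the binary decision."""
    tp = fp = fn = tn = 0
    for recommended, row in confusion.items():
        for gold, count in row.items():
            rec_pos = recommended in positive_levels
            gold_pos = gold in positive_levels
            if rec_pos and gold_pos:
                tp += count
            elif rec_pos:
                fp += count
            elif gold_pos:
                fn += count
            else:
                tn += count
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Decision 1: is emergency care required?
emergency = binary_metrics(CONFUSION_2020, {"emergency"})
# Decision 2: is professional medical care required at all?
any_care = binary_metrics(CONFUSION_2020, {"emergency", "nonemergency"})

for name, metrics in [("emergency care", emergency), ("any professional care", any_care)]:
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

On the pooled counts, the emergency decision comes out highly specific but only moderately sensitive, and the any-care decision shows the reverse pattern. This agrees with the matrix itself, where self-care cases are most often assigned to nonemergency care.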
Diagnostic accuracy of symptom checkers as reported by Semigran et al [23] in 2015, Hill et al [24,34], and our data set from 2020 (a).
| Metric of diagnostic accuracy | Semigran et al [23] (%), median (IQR) | Hill et al [24,34] (%), median (IQR) | Our data (n=14 apps) (%), median (IQR) |
| --- | --- | --- | --- |
| M1 | 35.5 (30.0-40.0) | 34.3 (26.5-40.1) | 45.5 (37.5-51.7) |
| M10 | NA (b) | 59.2 (40.5-70.8) | 71.1 (60.9-76.9) |
| M20 | 55.8 (45.2-73.6) | NA | NA |
(a) Diagnostic accuracy as reported by Hill et al [24,34] is based on a different but overlapping set of case vignettes. M1, M10, and M20 denote the proportion of assessed case vignettes for which a symptom checker listed the gold standard diagnosis first (M1), within the first 10 (M10), or within the first 20 diagnostic suggestions (M20). The table displays the median and IQR of these 3 metrics for the 3 samples of symptom checkers.
(b) Not available: Semigran et al [23] presented values only for M1, M3, and M20. Hill et al [24,34] and our data collection disregarded diagnostic suggestions beyond the first 10 suggestions.
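The M1, M10, and M20 metrics are top-k hit rates. The sketch below is a minimal illustration on made-up cases, not the study's procedure: the function name and example data are hypothetical, and exact string matching stands in for whatever judgment was used to decide that a suggestion matched the gold standard diagnosis.

```python
def top_k_accuracy(cases, k):
    """Share of assessed vignettes whose gold diagnosis is among the first k suggestions.

    `cases` is a list of (gold_diagnosis, ranked_suggestions) pairs for the
    vignettes a symptom checker actually assessed.
    """
    if not cases:
        raise ValueError("no assessed case vignettes")
    hits = sum(1 for gold, suggestions in cases if gold in suggestions[:k])
    return hits / len(cases)


# Hypothetical vignettes and app output, NOT taken from the study.
example_cases = [
    ("appendicitis", ["gastroenteritis", "appendicitis", "constipation"]),
    ("migraine", ["migraine", "tension headache"]),
    ("pneumonia", ["bronchitis", "common cold"]),
    ("otitis media", ["otitis media"]),
]

m1 = top_k_accuracy(example_cases, k=1)    # gold diagnosis listed first
m10 = top_k_accuracy(example_cases, k=10)  # gold diagnosis within the first 10
print(f"M1={m1:.2f}, M10={m10:.2f}")
```

Because every top-1 hit is also a top-10 hit, M1 can never exceed M10 for the same set of cases, which is consistent with the ordering of the medians in the table.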
Figure 3. Overall diagnostic accuracy (correct diagnosis listed first, M1) of 7 symptom checkers included in both samples (2015 and 2020) and assessed on the same 45 case vignettes in 2015 and 2020. Data on symptom checker accuracy for 2015 are taken from Semigran et al [23].
Figure 4. Association between M1 diagnostic accuracy (proportion of case vignettes for which the app listed the correct diagnosis first, as a percentage) and triage accuracy. Every dot represents a symptom checker app. Red dots represent apps that provide either only triage or only diagnostic advice. Data for symptom checkers are taken from studies by Semigran et al [23], Hill et al [24,35], and our own data collection.