Adam Ceney (1), Stephanie Tolond (1), Andrzej Glowinski (1), Ben Marks (1), Simon Swift (1, 2), Tom Palser (1, 3, 4).
1. Methods Analytics Ltd, Sheffield, United Kingdom.
2. University of Exeter Business School (INDEX), Exeter, United Kingdom.
3. Department of Surgery, University Hospitals of Leicester NHS Trust, Leicester, United Kingdom.
4. SAPPHIRE, Department of Health Sciences, University of Leicester, Leicester, United Kingdom.
Abstract
OBJECTIVES: The aims of our study are, firstly, to investigate the diagnostic and triage performance of symptom checkers; secondly, to assess their potential impact on healthcare utilisation; and thirdly, to investigate variation in performance between systems.
SETTING: Publicly available symptom checkers for patient use.
PARTICIPANTS: Publicly available symptom checkers were identified. A standardised set of 50 clinical vignettes was developed and systematically run through each system by a non-clinical researcher.
PRIMARY AND SECONDARY OUTCOME MEASURES: System accuracy was assessed as the percentage of cases in which the correct diagnosis was (a) listed first, (b) listed within the top five diagnoses and (c) listed at all. The safety of the disposition advice was assessed by comparing it with national guidelines for each vignette.
RESULTS: Twelve tools were identified and included. Mean diagnostic accuracy of the systems was poor, with the correct diagnosis present in the top five diagnoses on 51.0% of occasions (range 22.2% to 84.0%). Safety of disposition advice decreased with condition urgency (71.8% for emergency cases vs 87.3% for non-urgent cases). On average, systems suggested additional resource utilisation above that recommended by national guidelines in 51.0% of cases (range 18.0% to 61.2%). Both diagnostic accuracy and the appropriateness of resource recommendations varied substantially between systems.
CONCLUSIONS: There is wide variation in performance between available symptom checkers, and overall performance is significantly below what would be accepted in any other medical field, though some do achieve a good level of diagnostic accuracy and safety of disposition advice. External validation and regulation are urgently required to ensure that these public-facing tools are safe.
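The outcome measures above reduce to simple proportions over vignette runs. The Python sketch below illustrates how they could be computed for one system. The `VignetteRun` record, the four-level `URGENCY` scale and its labels, and the rules that advice at least as urgent as the guideline counts as "safe" and advice more urgent than the guideline counts as "additional resource utilisation" are hypothetical assumptions for illustration, not the study's published method or data.

```python
"""Hypothetical sketch of the abstract's outcome measures for one symptom checker."""
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical ordinal scale of disposition urgency, lowest to highest.
URGENCY = {"self-care": 0, "routine": 1, "urgent": 2, "emergency": 3}


@dataclass
class VignetteRun:
    correct_diagnosis: str
    listed_diagnoses: Sequence[str]  # in the order the symptom checker ranked them
    advised: str                     # disposition advised by the symptom checker
    guideline: str                   # disposition recommended by national guidelines


def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    if total == 0:
        raise ValueError("at least one vignette run is required")
    return round(100.0 * count / total, 1)


def diagnostic_accuracy(runs: List[VignetteRun]) -> Dict[str, float]:
    """Correct diagnosis (a) listed first, (b) in the top five, (c) listed at all."""
    n = len(runs)
    first = sum(1 for r in runs
                if len(r.listed_diagnoses) > 0
                and r.listed_diagnoses[0] == r.correct_diagnosis)
    top5 = sum(1 for r in runs if r.correct_diagnosis in r.listed_diagnoses[:5])
    listed = sum(1 for r in runs if r.correct_diagnosis in r.listed_diagnoses)
    return {"first": pct(first, n), "top5": pct(top5, n), "listed": pct(listed, n)}


def disposition_measures(runs: List[VignetteRun]) -> Dict[str, float]:
    """Compare advised disposition with the guideline on the hypothetical scale."""
    n = len(runs)
    # Assumption: advice is "safe" if it is at least as urgent as the guideline.
    safe = sum(1 for r in runs if URGENCY[r.advised] >= URGENCY[r.guideline])
    # Assumption: advice more urgent than the guideline implies extra resource use.
    extra = sum(1 for r in runs if URGENCY[r.advised] > URGENCY[r.guideline])
    return {"safe": pct(safe, n), "additional_resource": pct(extra, n)}


if __name__ == "__main__":
    # Made-up example runs, not data from the study.
    runs = [
        VignetteRun("appendicitis", ["gastroenteritis", "appendicitis"], "urgent", "emergency"),
        VignetteRun("migraine", ["migraine", "tension headache"], "routine", "self-care"),
        VignetteRun("gout", ["cellulitis"], "routine", "routine"),
        VignetteRun("asthma", ["asthma"], "emergency", "emergency"),
    ]
    print(diagnostic_accuracy(runs))   # {'first': 50.0, 'top5': 75.0, 'listed': 75.0}
    print(disposition_measures(runs))  # {'safe': 75.0, 'additional_resource': 25.0}
```

Treating disposition as an ordinal scale makes under-triage (less urgent than the guideline) and over-triage (more urgent, and so more resource use) fall out of a single comparison; per-urgency safety figures, such as the emergency versus non-urgent split reported above, would come from running `disposition_measures` on the subset of vignettes with each guideline level.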