Evaluation of symptom checkers for self diagnosis and triage: audit study.

Hannah L Semigran, Jeffrey A Linder, Courtney Gidengil, Ateev Mehrotra.

Abstract

OBJECTIVE: To determine the diagnostic and triage accuracy of online symptom checkers (tools that use computer algorithms to help patients with self diagnosis or self triage).
DESIGN: Audit study.
SETTING: Publicly available, free symptom checkers.
PARTICIPANTS: 23 symptom checkers that were in English and provided advice across a range of conditions. 45 standardized patient vignettes were compiled and equally divided into three categories of triage urgency: emergent care required (for example, pulmonary embolism), non-emergent care reasonable (for example, otitis media), and self care reasonable (for example, viral upper respiratory tract infection).
MAIN OUTCOME MEASURES: For symptom checkers that provided a diagnosis, our main outcomes were whether the symptom checker listed the correct diagnosis first or within the first 20 potential diagnoses (n=770 standardized patient evaluations). For symptom checkers that provided a triage recommendation, our main outcomes were whether the symptom checker correctly recommended emergent care, non-emergent care, or self care (n=532 standardized patient evaluations).
RESULTS: The 23 symptom checkers provided the correct diagnosis first in 34% (95% confidence interval 31% to 37%) of standardized patient evaluations, listed the correct diagnosis within the top 20 diagnoses given in 58% (55% to 62%) of standardized patient evaluations, and provided the appropriate triage advice in 57% (52% to 61%) of standardized patient evaluations. Triage performance varied by urgency of condition, with appropriate triage advice provided in 80% (95% confidence interval 75% to 86%) of emergent cases, 55% (47% to 63%) of non-emergent cases, and 33% (26% to 40%) of self care cases (P<0.001). Performance on appropriate triage advice across the 23 individual symptom checkers ranged from 33% (95% confidence interval 19% to 48%) to 78% (64% to 91%) of standardized patient evaluations.
CONCLUSIONS: Symptom checkers had deficits in both triage and diagnosis. Triage advice from symptom checkers is generally risk averse, encouraging users to seek care for conditions where self care is reasonable. © Semigran et al 2015.

Year: 2015; PMID: 26157077; PMCID: PMC4496786; DOI: 10.1136/bmj.h3480

Source DB: PubMed; Journal: BMJ; ISSN: 0959-8138


Introduction

Members of the public are increasingly using the internet to research their health concerns. For example, the United Kingdom’s online patient portal for national health information, NHS Choices, reports over 15 million visits per month.1 More than a third of adults in the United States regularly use the internet to self diagnose their ailments, using it both for non-urgent symptoms and for urgent symptoms such as chest pain.2 3 While there is a wealth of online resources to learn about specific conditions, self diagnosis usually starts with search engines like Google, Bing, or Yahoo.2 However, internet search engines can lead users to confusing and sometimes unsubstantiated information, and people with urgent symptoms may not be directed to seek emergent care.3 4 5 6 Recently there has been a proliferation of more sophisticated programs called symptom checkers that attempt to more effectively provide a potential diagnosis for patients and direct them to the appropriate care setting.3 6 7 8 9 10 11 12 13 Using computerized algorithms, symptom checkers ask users a series of questions about their symptoms or require users to input details about their symptoms themselves. The algorithms vary and may use branching logic, bayesian inference, or other methods. Private companies and other organizations, including the National Health Service, the American Academy of Pediatrics, and the Mayo Clinic, have launched their own symptom checkers. One symptom checker, iTriage, reports 50 million uses each year.14 Typically, symptom checkers are accessed through websites, but some are also available as apps for smart phones or tablets. Symptom checkers serve two main functions: to facilitate self diagnosis and to assist with triage. The self diagnosis function provides a list of diagnoses, usually rank ordered by likelihood. The diagnosis function is typically framed as helping educate patients on the range of diagnoses that might fit their symptoms. 
The triage function informs patients whether they should seek care at all and, if so, where (that is, emergency department, general practitioner’s clinic) and with what urgency (that is, emergently or within a few days). Symptom checkers may supplement or replace telephone triage lines, which are common in primary care.15 16 17 18 To ensure the safety of medical mobile apps, the US Congress is considering the regulation of apps that “provide a list of possible medical conditions and advice on when to consult a health care provider.”19 20 Symptom checkers have several potential benefits. They can encourage patients with a life threatening problem such as stroke or heart attack to seek emergency care.21 For patients with a non-emergent problem that does not require a medical visit, these programs can reassure people and recommend they stay home. For approximately a quarter of visits for acute respiratory illness such as viral upper respiratory tract infection, patients do not receive any intervention beyond over the counter treatment,22 and over half of patients receive unnecessary antibiotics.23 24 25 Reducing the number of visits saves patients’ time and money, deters overprescribing of antibiotics, and may decrease demand on primary care providers—a critical problem given that the workload for general practitioners in the United Kingdom increased by 62% from 1995 to 2008.17 However, there are several key concerns. If patients with a life threatening problem are misdiagnosed and not told to seek care, their health could worsen, increasing morbidity and mortality. Alternatively, if patients with minor illnesses are told to seek care, in particular in an emergency department, such programs could increase unnecessary visits and therefore result in increased time and costs for patients and society. The impact of symptom checkers will depend to a large degree on their clinical performance. 
To measure the accuracy of diagnosis and triage advice provided by symptom checkers, we used 45 standardized patient vignettes to audit 23 symptom checkers. The vignettes reflected a range of conditions from common to less common and low acuity to life threatening.

Methods

Search strategy for symptom checkers

Between June 2014 and November 2014 we searched for symptom checkers that were in English, were free, were publicly available, were for humans (rather than for veterinary use), and did not focus on a single type of condition (for example, only orthopedic problems). To find symptom checkers that were available as apps in the Apple app store and Google Play, we used two search phrases (“symptom checker”, “medical diagnosis”) used in a recent study on symptom checkers and examined the first 240 search results by hand.12 We chose 240 because this cut-off has been used in previous studies that have searched smartphone app stores.26 To find online symptom checkers, we entered the same two search phrases in Google and Google Scholar and examined the first 300 results. In previous research, the probability of relevant search results identified using Google declines substantially after the first 300 results.27 We supplemented our searches by asking the developers of two symptom checkers if they knew of other competing products. In total we identified 143 symptom checkers. We excluded 102 that used the same medical content and logic as another tool (and therefore would have identical performance) (see list in supplementary appendix). We excluded a further 25 that only focused on a single class of illness (for example, orthopedic problems), 14 that only provided medical advice (for example, what symptoms are typically associated with a certain condition) and did not provide diagnosis or triage advice, and two that were not working. After these exclusions, we evaluated 23 symptom checkers.

Symptom checkers’ characteristics

We categorized symptom checkers by whether they facilitated self diagnosis, self triage, or both; by the type of organization that operated the symptom checker; by the maximum number of diagnoses provided; and by whether they were based on the Schmitt or Thompson nurse triage guidelines, which are decision support protocols commonly used in telephone triage for pediatric and adult consultations, respectively.28 29 We grouped government and health plans together because both may have a financial incentive to deter unnecessary visits. In the supplementary appendix we provide data, when available, on the estimated total visitors to selected symptom checkers.

Clinical vignettes

To evaluate the diagnosis and triage performance of the symptom checkers, we used 45 standardized patient vignettes. We used clinical vignettes to assess performance because they are a common method to test physicians and other clinicians on their diagnostic ability and management decisions. We purposefully selected standardized patient vignettes from three categories of triage urgency: 15 vignettes for which emergent care is required, 15 vignettes for which non-emergent care is reasonable, and 15 vignettes for which a medical visit is generally unnecessary and self care is sufficient. We chose vignettes across the severity spectrum because patients use symptom checkers for symptoms that require both urgent and non-urgent care.3 We included vignettes for both common and uncommon conditions because we believe that the clinical community would be particularly interested in performance for less common but potentially life threatening problems. The standardized patient vignettes were identified from various clinical sources, including materials used to educate health professionals and a medical resource website with content provided by a panel of physicians.30 The source for each vignette also provided the associated correct diagnosis. Symptom checkers generally require users to enter a list of symptoms or ask a series of questions about their symptoms. Each vignette was simplified into a core set of symptoms for easy entry, and in some situations we supplemented the data provided by the vignette because a symptom checker asked about a symptom not addressed in the vignette (see the supplementary appendix for details on source, core symptoms, and supplemental symptoms for each vignette). We categorized the 45 vignettes as either “common” or “uncommon” diagnoses based on the prevalence of the diagnosis among ambulatory visits in the United States (for full details see the supplementary appendix).31

Assessing diagnosis and triage results

Each standardized patient vignette was entered into each website or app, and we recorded the resulting diagnoses and triage advice. An author (HS) with no clinical training entered all the vignettes. A random sample of 25 vignettes was entered into symptom checkers by another person without clinical training, and the inter-rater reliability between the two in capturing the symptom checker’s recommendations for diagnosis and triage was high (Cohen’s κ 0.90). In some cases we could not evaluate a vignette because some symptom checkers focus only on children or on adults or the symptom checker did not list or ask for the key symptom in the vignette. To avoid penalizing these symptom checkers, we referred to standardized patient vignettes that successfully yielded an output as “standardized patient evaluations.” To assess diagnostic accuracy, we noted whether the correct diagnosis was listed first or listed at all. For several vignettes, two symptom checkers presented a large number of diagnoses (as many as 99). Because such a long list of potential diagnoses is unlikely to be useful for patients, we considered a diagnosis to be listed at all only if it was within the first 20 diagnoses provided by a symptom checker. It is possible that many patients only focus on the top diagnoses listed. Therefore we also looked at whether the correct diagnosis was listed in the first three diagnoses given. We judged the diagnosis incorrect if the symptom checker indicated that the condition could not be identified.
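
The scoring rule for diagnosis described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the authors' procedure: the function name `score_diagnosis`, the case-insensitive name matching, and the example list are all assumptions made for the sketch (the study recorded results by hand).

```python
def score_diagnosis(ranked_diagnoses, correct, cutoffs=(1, 3, 20)):
    """Report whether `correct` appears within the first k listed diagnoses.

    `ranked_diagnoses` is the ordered list a symptom checker returned.
    An empty list stands for "condition could not be identified" and is
    scored as incorrect at every cutoff. Matching by lower-cased name is
    an assumption of this sketch.
    """
    normalized = [d.strip().lower() for d in ranked_diagnoses]
    target = correct.strip().lower()
    return {k: target in normalized[:k] for k in cutoffs}


# Hypothetical output from one symptom checker for an otitis media vignette.
example = ["common cold", "otitis media", "sinusitis"]
print(score_diagnosis(example, "Otitis media"))
```

With these cutoffs, a correct diagnosis that appears in position 21 or later counts as not listed, mirroring the 20-diagnosis limit described above.
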
We categorized the triage advice into three groups: emergent, which included advice to call an ambulance, go to the emergency department, or see a general practitioner immediately; non-emergent, which included advice to call a general practitioner or primary care provider, see a general practitioner or primary care provider, go to an urgent care facility, go to a specialist, go to a retail clinic, or have an e-visit; and self care, which included advice to stay at home or go to a pharmacy. If multiple triage locations were suggested (for example, emergency department or specialist), we used the most urgent suggestion. We chose to do so because in almost all of the cases the most urgent triage suggestion was listed first. If a symptom checker was unable to reach a decision on diagnosis for a given standardized patient vignette but provided triage advice, we still assessed the appropriateness of this triage advice. Symptom checkers that required users to select the correct diagnosis before giving triage advice were not included in assessing the accuracy of triage with the exception of iTriage, which always suggested emergent triage advice.
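
The triage grouping and the "most urgent suggestion" rule can be expressed as a small lookup. The sketch below is a hedged illustration: the dictionary keys paraphrase the advice types named in the paragraph above, and the names `ADVICE_CATEGORY`, `URGENCY`, and `categorize_triage` are hypothetical, not taken from the study.

```python
# Urgency ranks, from least to most urgent.
URGENCY = {"self care": 0, "non-emergent": 1, "emergent": 2}

# Advice phrases grouped as described in the text (wording paraphrased).
ADVICE_CATEGORY = {
    "call an ambulance": "emergent",
    "go to the emergency department": "emergent",
    "see a general practitioner immediately": "emergent",
    "call a general practitioner": "non-emergent",
    "see a general practitioner": "non-emergent",
    "go to an urgent care facility": "non-emergent",
    "go to a specialist": "non-emergent",
    "go to a retail clinic": "non-emergent",
    "have an e-visit": "non-emergent",
    "stay at home": "self care",
    "go to a pharmacy": "self care",
}


def categorize_triage(advice_list):
    """Map a checker's advice to one category, keeping the most urgent."""
    if not advice_list:
        raise ValueError("no triage advice was given")
    categories = [ADVICE_CATEGORY[advice] for advice in advice_list]
    return max(categories, key=URGENCY.__getitem__)


# Emergency department or specialist: the most urgent suggestion is used.
print(categorize_triage(["go to a specialist", "go to the emergency department"]))
```
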

Patient involvement

There was no patient involvement in this study.

Analysis

We calculated summary statistics for diagnostic accuracy and triage advice with 95% confidence intervals based on binomial distribution using Stata/MP 13.0. Given our focus on symptom checkers as a whole, we did not make statistical comparisons of accuracy between individual symptom checkers. We used χ2 tests to compare the diagnosis and triage accuracy by level of urgency and by type of symptom checker. We conducted a sensitivity analysis of triage advice, excluding several symptom checkers that always or usually recommended emergent care.
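
As a worked illustration of a binomial confidence interval, the sketch below uses the normal (Wald) approximation. The text does not say which binomial method was used in Stata, so the choice of the Wald interval is an assumption, as is the helper name `proportion_ci`. For the three rates checked here, this approximation agrees with the reported intervals after rounding.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Proportion with a Wald (normal approximation) 95% interval.

    Returns (p, lower, upper), with the bounds clipped to [0, 1].
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts taken from the reported results: correct diagnosis listed first,
# appropriate triage overall, and appropriate triage for emergent cases.
for label, correct, total in [
    ("diagnosis listed first", 262, 770),
    ("appropriate triage", 301, 532),
    ("appropriate triage, emergent", 147, 183),
]:
    p, low, high = proportion_ci(correct, total)
    print(f"{label}: {100 * p:.0f}% ({100 * low:.0f}% to {100 * high:.0f}%)")
```
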

Results

Study sample

The 23 identified symptom checkers were based in the United Kingdom, United States, the Netherlands, and Poland (table 1): 11 symptom checkers provided both diagnoses and triage advice, eight only provided diagnoses, and four only provided triage advice. The 45 standardized patient vignettes included 26 common and 19 uncommon diagnoses. Performance was assessed on a total of 770 standardized patient evaluations for diagnosis and 532 standardized patient evaluations for triage. Across the symptom checkers, 10 did not ask for demographics (age and sex).
Table 1

 Symptom checkers included in the study

Symptom checker | Description | Maximum No of diagnoses* | Triage options provided
AskMD (USA) | Online health and wellness platform from Sharecare (www.sharecare.com/askmd/get-started) | 15 | Not available
BetterMedicine (USA) | Health resource from HealthGrades; symptom checker provides possible diagnoses and information about these conditions (www.bettermedicine.com/symptom-checker/) | 46 | Not available
DocResponse (USA) | Symptom checker started by a group of certified physicians; user can choose from internal medicine, dermatology, and orthopedic views (www.docresponse.com/) | 5 | Not available
Doctor Diagnose (USA) | App offered on Google Play; provides potential diagnoses and triage advice in some cases | 3 | Seek immediate care; call your doctor now; speak with your doctor; home care
Drugs.com (USA) | Online resource for drug and related health information; uses content from Harvard Health Publications (www.drugs.com/symptom-checker/) | 10 | Emergency department; primary care doctor; home care
EarlyDoc (Netherlands) | For triage criteria, uses Dutch College of General Practitioners (NHG) TriageWijzer and the Australian Triage Scale (used in Australia and New Zealand to assess urgency) (www.earlydoc.com/en/) | 3 | Don’t wait, and call a doctor now; call a doctor preferably today; see your doctor preferably on a weekday; your complaints don’t seem urgent
Esagil (USA) | Provides list of likely diagnoses (based on number of entered symptoms that are congruent with diagnosis); user can also enter blood and urine lab results along with symptoms (http://esagil.org/) | 65 | Not available
Family Doctor (USA) | Displays flow chart to track symptoms to diagnosis and triage option; created by American Academy of Family Physicians (http://familydoctor.org/familydoctor/en/health-tools/search-by-symptom.html) | 7 | Emergency room; see your doctor; home care
FreeMD (USA) | Takes user through a series of questions in a “check-up” to finish with “what might be wrong with you” and “where to go for care”; owned by DSHI Systems, which provides triage decision support solutions from emergency medicine physicians to the US government (Department of Veterans Affairs) and private sector companies; program called TriageXpert (www.freemd.com/) | 3 | Emergency department; urgent care; doctor’s office; doctor e-visit; retail clinic; dentist; home care
Harvard Medical School Family Health Guide (USA) | From Harvard Health Publications; available both online and in print† (www.health.harvard.edu/fhg/symptoms/symptoms.shtml) | 4 | Emergency department; general practice; home care
Healthline (USA) | Health and wellness website that licenses content to employers, health providers, and health plans (www.healthline.com/symptom-checker) | 76 | Not available
Healthwise (USA) | Non-profit provider for health content and patient education; symptom checker licensed to other organizations; we accessed using Province of Alberta’s website (https://myhealth.alberta.ca/health/pages/symptom-checker.aspx) | Not available | Call 911 now; seek care now; seek care today; try home care
Healthy Children (USA) | From American Academy of Pediatrics; uses Barton D Schmitt’s “Pediatric HouseCalls Symptom Checker” triage protocol (www.healthychildren.org/English/tips-tools/symptom-checker) | Not available | Call 911 now; call your doctor now (night or day); call your doctor within 24 hours; call your doctor during weekday office hours; parent care at home
Isabel (UK) | Created by Isabel Medical Charity (http://symptomchecker.isabelhealthcare.com/suggest_diagnoses_advanced/landing_page) | 10 | Walk-in care; family doctor; emergency services
iTriage (USA) | Owned by Aetna; provides clinical sites in user’s region with addresses and phone numbers (www.itriagehealth.com/avatar) | 5 | Emergency department; urgent care; retail clinic; family practice; internal medicine; specialties; prescription medication; over the counter medication
Mayo Clinic (USA) | Health resource website (www.mayoclinic.org/symptom-checker/select-symptom/itt-20009075) | 20 | Not available
MEDoctor (USA) | Free differential diagnosis system (www.medoctor.com/) | 3 | Not available
NHS Symptom Checkers (UK) | Available through England’s National Health Service (NHS) Choices website (www.nhs.uk/symptomcheckers/pages/symptoms.aspx) | Not available | Emergency department; general practitioner; home care
Steps2Care (USA) | iPhone and Android app; provides symptom care guides from Barton D Schmitt’s pediatric telephone triage guidelines and David A Thompson’s adult telephone triage guidelines | Not available | Call 911 now; go to emergency room now; call doctor now or go to emergency room; call doctor within 24 hours; call doctor during office hours; self care at home
Symcat (USA) | Triage tool uses data linking specific patient symptoms and physician diagnoses across visits seen in the National Ambulatory Medical Care Survey (NAMCS) (www.symcat.com/) | 6 | Primary care; retail clinic; emergency room; urgent care
Symptify (USA) | Online self assessment tool and other health services, including, for example, emergency contact list, consultation list (https://symptify.com/) | 9 | Emergency room; urgent care; home care; inconclusive
Symptomate (Poland) | Uses bayesian network methodology and medical database for diagnoses (https://symptomate.com/) | 5 | Emergency room; specialist; general practitioner
WebMD (USA) | Medical reference and healthcare resource website (http://symptoms.webmd.com) | 99 | Not available

*Symptom checkers that provided diagnostic advice presented a list of potential diagnoses. We identified the maximum number of diagnoses presented across applicable vignettes.

†The online tool often refers the user to the book to make a decision on diagnosis and triage. For this study, we assessed the print version of the symptom checker.

Accuracy of diagnosis

Overall, the correct diagnosis was listed first in 34% (95% confidence interval 31% to 37%; table 2) of standardized patient evaluations. Performance varied by urgency of condition. The correct diagnosis was listed first for 24% (19% to 30%) of emergent standardized patient evaluations, 38% (32% to 44%) of non-emergent standardized patient evaluations, and 40% (34% to 47%) of self care standardized patient evaluations (P<0.001 for comparison, table 2). There was no difference between symptom checkers that asked for demographic information and those that did not (34%, 95% confidence interval 30% to 39% and 34%, 28% to 39%, P=0.88; table 3). The correct diagnosis was, however, listed first more often in standardized patient evaluations for common diagnoses than for uncommon diagnoses (38%, 34% to 43% and 28%, 23% to 33%, P=0.004; table 2).
Table 2

 Accuracy of diagnosis decision and triage advice for all symptom checkers, stratified by severity of standardized patient (SP) vignette and by frequency of the condition’s diagnosis

Type of vignette or diagnosis | No of vignettes (%) | Diagnosis listed first: rate* | % (95% CI) | P value | Diagnosis listed in top 3: rate* | % (95% CI) | P value | Diagnosis listed in top 20: rate* | % (95% CI) | P value | Triage: rate* | % (95% CI) | P value
All vignettes | 45 (100) | 262/770 | 34 (31 to 37) | | 394/770 | 51 (47 to 54) | | 449/770 | 58 (55 to 62) | | 301/532 | 57 (52 to 61) |
Type of SP vignette:
 Emergent | 15 (33) | 64/263 | 24 (19 to 30) | <0.001 | 104/263 | 40 (34 to 46) | <0.001 | 132/263 | 50 (44 to 56) | 0.003 | 147/183 | 80 (75 to 86) | <0.001
 Non-emergent | 15 (33) | 98/260 | 38 (32 to 44) | | 148/260 | 57 (51 to 63) | | 157/260 | 60 (54 to 66) | | 96/175 | 55 (47 to 63) |
 Self care | 15 (33) | 100/247 | 40 (34 to 47) | | 142/247 | 57 (51 to 63) | | 160/247 | 65 (59 to 71) | | 58/174 | 33 (26 to 40) |
Type of diagnosis†:
 Common | 26 (58) | 174/457 | 38 (34 to 43) | 0.004 | 254/457 | 56 (52 to 61) | <0.001 | 283/457 | 62 (57 to 66) | 0.01 | 162/313 | 52 (46 to 57) | 0.01
 Uncommon | 19 (42) | 88/313 | 28 (23 to 33) | | 140/313 | 45 (38 to 49) | | 166/313 | 53 (47 to 59) | | 139/219 | 63 (57 to 70) |

*Number of correct SP evaluations divided by applicable SP evaluations. Some SP vignettes could not be applied to a given symptom checker (see text). For example, an adult SP vignette could not be evaluated if it was a pediatric symptom checker.

†Based on annual number of ambulatory care visits in United States, 2009-10 (see supplementary appendix for further description).

Table 3

 Accuracy of diagnosis given and triage advice for all symptom checkers given certain characteristics of the tools

Symptom checker characteristics | All symptom checkers: No of symptom checkers (%) | Diagnosis: No of symptom checkers (%) | Listed first: rate* | % (95% CI) | P value | Listed in top 20: rate* | % (95% CI) | P value | Triage: No of symptom checkers (%) | Rate* | % (95% CI) | P value
All symptom checkers | 23 (100) | 19 (100) | 262/770 | 34 (31 to 37) | | 449/770 | 58 (55 to 62) | | 15 (100) | 301/532 | 57 (52 to 61) |
Uses nurse triage books?†:
 Yes | 2 (9) | 0 (0) | 0 | —‡ | —‡ | 0 | —‡ | —‡ | 2 (13) | 41/57 | 72 (60 to 84) | 0.01
 No | 21 (91) | 19 (100) | 262/770 | 34 (31 to 37) | | 449/770 | 58 (55 to 62) | | 13 (87) | 260/475 | 55 (50 to 59) |
Asks demographic questions?:
 Yes | 14 (61) | 12 (63) | 157/458 | 34 (30 to 39) | 0.88 | 275/458 | 60 (56 to 65) | 0.24 | 9 (60) | 162/319 | 51 (45 to 56) | <0.001
 No | 9 (39) | 7 (37) | 105/312 | 34 (28 to 39) | | 174/312 | 56 (50 to 61) | | 6 (40) | 139/213 | 65 (59 to 72) |
Site owner:
 Health plan or government | 3 (13) | 1 (5) | 16/44 | 36 (22 to 51) | | 34/44 | 77 (64 to 90) | 0.01 | 3 (20) | 56/131 | 43 (34 to 51) | <0.001
 Provider group | 4 (17) | 5 (26) | 56/167 | 34 (26 to 41) | 0.9 | 104/167 | 62 (55 to 70) | | 4 (27) | 65/96 | 68 (58 to 77) |
 Private company | 14 (61) | 13 (68) | 190/559 | 34 (30 to 38) | | 311/559 | 56 (52 to 60) | | 8 (53) | 180/305 | 59 (53 to 65) |
Maximum No of diagnoses listed:
 1-3 | 6 (32) | 6 (32) | 78/227 | 34 (28 to 41) | | 120/227 | 53 (46 to 59) | 0.12 | 5 (33) | 87/163 | 53 (46 to 60) | 0.32
 4-10 | 7 (37) | 7 (37) | 107/283 | 38 (32 to 43) | 0.13 | 175/283 | 62 (56 to 68) | | 6 (40) | 131/224 | 58 (52 to 65) |
 ≥11 | 6 (32) | 6 (32) | 77/260 | 30 (24 to 35) | | 154/260 | 59 (53 to 65) | | 4 (27) | —§ | —§ |

*Number of correct SP evaluations divided by applicable SP evaluations. Some SP vignettes could not be applied to a given symptom checker. For example, we could not evaluate an SP vignette aimed at adults if the symptom checker was for children.

†From the Schmitt or Thompson protocols.

‡Symptom checker does not provide suggestions for diagnosis.

§Symptom checker does not provide triage advice.

Performance varied across symptom checkers. Listing the correct diagnosis first in standardized patient evaluations ranged from 5% for MEDoctor (95% confidence interval 0% to 13%) to 50% for DocResponse (33% to 67%; table 4). Few differences were observed by the symptom checkers’ characteristics (table 3).
Table 4

 Accuracy of diagnosis decision and triage advice for each symptom checker

Symptom checker (n=23) | Diagnosis listed first: rate* | % (95% CI) | Diagnosis listed in top 3: rate* | % (95% CI) | Diagnosis listed in top 20: rate* | % (95% CI) | Triage, all cases: rate* | % (95% CI) | Triage, emergent care required: rate* | % (95% CI) | Triage, non-emergent care reasonable: rate* | % (95% CI) | Triage, self care reasonable: rate* | % (95% CI)
AskMD | 17/40 | 43 (26 to 59) | 27/40 | 68 (52 to 83) | 30/40 | 75 (61 to 89) | —† | —† | —† | —† | —† | —† | —† | —†
BetterMedicine | 11/45 | 24 (11 to 38) | 13/45 | 29 (15 to 43) | 17/45 | 38 (23 to 53) | —† | —† | —† | —† | —† | —† | —† | —†
DocResponse | 18/36 | 50 (33 to 67) | 24/36 | 67 (50 to 83) | 26/36 | 72 (57 to 88) | —† | —† | —† | —† | —† | —† | —† | —†
Doctor Diagnose | 16/39 | 41 (25 to 57) | 17/39 | 44 (27 to 60) | 18/39 | 46 (30 to 63) | 10/16 | 63 (36 to 89) | 8/10 | 80 (13 to 100) | 2/3 | 67 (0 to 100) | 0/3 | 0 (0 to 0)
Drugs.com | 17/43 | 40 (24 to 55) | 20/43 | 47 (31 to 63) | 25/43 | 58 (43 to 74) | 25/42 | 60 (44 to 75) | 8/14 | 57 (27 to 87) | 9/15 | 60 (32 to 88) | 8/13 | 62 (31 to 92)
EarlyDoc | 6/19 | 32 (9 to 55) | 7/19 | 33 (9 to 57) | 7/19 | 33 (9 to 57) | 9/17 | 53 (26 to 79) | 4/6 | 67 (12 to 100) | 3/5 | 60 (0 to 100) | 2/6 | 33 (0 to 88)
Esagil | 9/44 | 20 (8 to 33) | 15/44 | 34 (24 to 54) | 22/44 | 50 (35 to 65) | —† | —† | —† | —† | —† | —† | —† | —†
Family Doctor | 20/43 | 47 (31 to 62) | 24/43 | 56 (40 to 71) | 24/43 | 56 (40 to 71) | 22/41 | 54 (38 to 70) | 6/12 | 50 (17 to 83) | 7/14 | 50 (20 to 80) | 9/15 | 60 (32 to 88)
FreeMD | 16/44 | 36 (22 to 51) | 20/44 | 45 (30 to 61) | 21/44 | 48 (32 to 63) | 26/44 | 59 (44 to 74) | 10/15 | 67 (40 to 94) | 13/15 | 87 (67 to 100) | 3/14 | 21 (0 to 46)
HMS Family Health Guide | 13/38 | 34 (18 to 50) | 20/38 | 52 (39 to 72) | 21/38 | 55 (39 to 72) | 32/40 | 78 (64 to 91) | 13/14 | 93 (77 to 100) | 11/13 | 85 (62 to 100) | 8/13 | 62 (31 to 92)
Healthline | 17/45 | 38 (23 to 53) | 24/45 | 53 (37 to 68) | 26/45 | 58 (43 to 73) | —† | —† | —† | —† | —† | —† | —† | —†
Healthwise | —‡ | —‡ | —‡ | —‡ | —‡ | —‡ | 19/44 | 43 (28 to 58) | 15/15 | 100 (100 to 100) | 1/15 | 7 (0 to 21) | 3/14 | 21 (0 to 46)
Healthy Children | —‡ | —‡ | —‡ | —‡ | —‡ | —‡ | 11/15 | 73 (48 to 99) | 3/3 | 100 (100 to 100) | 5/5 | 100 (100 to 100) | 3/7 | 43 (0 to 92)
Isabel | 20/45 | 44 (29 to 60) | 31/45 | 69 (55 to 83) | 38/45 | 84 (73 to 95) | 23/45 | 51 (36 to 66) | 15/15 | 100 (100 to 100) | 8/15 | 53 (25 to 82) | 0/15 | 0 (0 to 0)
iTriage | 16/45 | 36 (22 to 51) | 29/45 | 64 (49 to 78) | 34/45 | 77 (64 to 90) | 14/43 | 33 (19 to 48) | 14/14 | 100 (100 to 100) | 0/14 | 0 (0 to 0) | 0/15 | 0 (0 to 0)
Mayo Clinic | 7/41 | 17 (5 to 29) | 24/41 | 59 (43 to 74) | 31/41 | 76 (62 to 89) | —† | —† | —† | —† | —† | —† | —† | —†
MEDoctor | 2/37 | 5 (0 to 13) | 16/37 | 43 (26 to 60) | 16/37 | 43 (26 to 60) | —† | —† | —† | —† | —† | —† | —† | —†
NHS | —‡ | —‡ | —‡ | —‡ | —‡ | —‡ | 23/44 | 52 (37 to 68) | 13/15 | 87 (67 to 100) | 3/15 | 20 (0 to 43) | 7/14 | 50 (20 to 80)
Steps2Care | —‡ | —‡ | —‡ | —‡ | —‡ | —‡ | 30/42 | 71 (57 to 86) | 12/14 | 86 (65 to 100) | 10/14 | 71 (44 to 98) | 8/14 | 57 (27 to 87)
Symcat | 18/45 | 40 (25 to 55) | 32/45 | 71 (57 to 85) | 34/45 | 76 (62 to 89) | 20/45 | 44 (29 to 60) | 8/15 | 53 (25 to 82) | 12/15 | 80 (57 to 100) | 0/15 | 0 (0 to 0)
Symptify | 13/45 | 29 (15 to 43) | 16/45 | 36 (22 to 51) | 20/45 | 44 (29 to 60) | 28/40 | 70 (55 to 85) | 11/12 | 92 (73 to 100) | 10/14 | 71 (44 to 98) | 7/14 | 50 (20 to 80)
Symptomate | 10/32 | 31 (14 to 48) | 11/32 | 34 (17 to 52) | 11/32 | 34 (17 to 52) | 9/14 | 64 (36 to 93) | 7/9 | 76 (44 to 100) | 2/3 | 67 (0 to 100) | 0/2 | 0 (0 to 0)
WebMD | 16/45 | 36 (21 to 50) | 23/45 | 51 (36 to 66) | 28/45 | 62 (47 to 77) | —† | —† | —† | —† | —† | —† | —† | —†

HMS=Harvard Medical School; NHS=National Health Service.

*Number of correct SP evaluations divided by applicable SP evaluations. Some SP vignettes could not be applied to a given symptom checker. For example, we could not evaluate an SP vignette aimed at adults if the symptom checker was for children.

†Symptom checker does not provide triage advice.

‡Symptom checker does not provide diagnosis suggestions.

Across all symptom checkers the correct diagnosis was listed in the first three diagnoses in 51% (95% confidence interval 47% to 54%) of standardized patient evaluations and in the first 20 diagnoses in 58% (55% to 62%) of standardized patient evaluations (table 2). Diagnostic accuracy for listing the correct diagnosis in the top three and top 20 was higher for self care conditions than for emergent conditions and was also higher for common conditions than for uncommon conditions. There was no significant difference in listing the correct diagnosis in the top 20 between symptom checkers that listed more than 11 diagnoses compared with those that only listed 1-3 diagnoses (59%, 53% to 65% v 53%, 46% to 59%, P=0.12; table 3). The accuracy of listing the correct diagnosis in the top 20 across the 23 individual symptom checkers ranged from 34% (95% confidence interval 17% to 52%) to 84% (73% to 95%, table 4).

Accuracy of triage advice

Appropriate triage advice was given in 57% (95% confidence interval 52% to 61%) of standardized patient evaluations (table 2). Performance on triage advice was higher for emergent care standardized patient evaluations than for non-emergent and self care standardized patient evaluations (80% (75% to 86%) v 55% (47% to 63%) v 33% (26% to 40%), P<0.001). Appropriate triage advice was higher for uncommon diagnoses than for common diagnoses (63% (57% to 70%) v 52% (46% to 57%), P=0.01). iTriage, Symcat, Symptomate, and Isabel always suggested users seek care and therefore never advised self care (table 4). After excluding these four symptom checkers, appropriate triage advice was given in 61% (95% confidence interval 56% to 66%) of standardized patient evaluations (see supplementary table 5). Symptom checkers that used the Schmitt or Thompson nurse triage protocols were more likely to provide appropriate triage decisions than those that did not (72% (95% confidence interval 60% to 84%) v 55% (50% to 59%), P=0.01; table 3). Accurate triage advice varied by operator of symptom checker (provider groups and physician associations 68% (58% to 77%), private companies 59% (53% to 65%), health plans or governments 43% (34% to 51%), P<0.001).
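
The group comparisons above can be illustrated with a Pearson χ2 test computed from the counts in tables 2 and 3. The sketch below is not the authors' Stata code: it assumes no continuity correction, the function name `chi_squared` is hypothetical, and it only handles comparisons of two or three groups, where the P value has a simple closed form.

```python
import math


def chi_squared(groups):
    """Pearson chi-squared test on (correct, total) pairs, one per group.

    Returns (statistic, degrees_of_freedom, p_value). Only 1 or 2 degrees
    of freedom are supported, because the chi-squared tail probability
    then reduces to erfc(sqrt(x / 2)) and exp(-x / 2) respectively.
    """
    total_correct = sum(correct for correct, _ in groups)
    total_n = sum(n for _, n in groups)
    overall = total_correct / total_n
    stat = 0.0
    for correct, n in groups:
        expected_correct = n * overall
        expected_wrong = n * (1 - overall)
        stat += (correct - expected_correct) ** 2 / expected_correct
        stat += ((n - correct) - expected_wrong) ** 2 / expected_wrong
    df = len(groups) - 1
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2))
    elif df == 2:
        p_value = math.exp(-stat / 2)
    else:
        raise ValueError("this sketch supports only two or three groups")
    return stat, df, p_value


# Appropriate triage by urgency (emergent, non-emergent, self care), table 2.
print(chi_squared([(147, 183), (96, 175), (58, 174)]))
# Appropriate triage by use of nurse triage protocols (yes, no), table 3.
print(chi_squared([(41, 57), (260, 475)]))
```

Under these assumptions the first comparison gives a P value far below 0.001 and the second a P value close to 0.01, in line with the values reported above.
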

Discussion

Using standardized patient vignettes we measured the diagnostic and triage accuracy of symptom checkers. Although there was a range of performance across symptom checkers, overall they had deficits in both diagnosis and triage accuracy. On average, symptom checkers provided the correct diagnosis within the first 20 listed in 58% of standardized patient evaluations, with the best performing symptom checker listing the correct diagnosis in 84% of standardized patient evaluations. Symptom checkers advised the appropriate level of care in 57% of standardized patient evaluations, but this varied by clinical severity. The rate of correct triage decisions was much higher for standardized patient evaluations requiring emergent care (80%) than for those for which self care was appropriate (33%).

Comparisons with other studies

Our results on diagnostic accuracy and appropriate triage are roughly similar to previous work on the performance of single symptom checkers for a limited set of diagnoses.6 7 8 32 An orthopedic symptom checker listed the correct diagnosis for knee pain 89% of the time, and Boots WebMD listed the correct diagnosis 70% of the time for ear, nose, and throat symptoms.7 8 One study that also used two common acute standardized patient vignettes to evaluate WebMD reported a diagnostic accuracy rate of 50%.6

Whether the level of performance for diagnosis and triage that we observed is acceptable depends on the standard for comparison. If symptom checkers are seen as a replacement for seeing a physician, they are likely an inferior alternative. It is believed that physicians have a diagnostic accuracy rate of 85-90%, though in some studies using clinical vignettes, performance was lower.33 34 However, in-person physician visits might be the wrong comparison because patients are likely not using symptom checkers to obtain a definitive diagnosis but for quick and accessible guidance. Also, instead of diagnostic accuracy the key assessment of symptom checkers may be appropriate triage. Distinguishing between Rocky Mountain spotted fever and meningitis may be less important than ensuring patients seek emergent care.

If symptom checkers are seen as an alternative to simply entering symptoms into an online search engine such as Google, then symptom checkers are likely a superior alternative. A recent study found that when acute symptoms that would require urgent medical attention were typed into search engines to identify symptom-related web sites, advice to seek emergent care was present only 64% of the time.3

Perhaps the most appropriate comparison to symptom checkers is telephone triage lines, which are widely used in developed nations.15 16 17 18 In general patients use symptom checkers and telephone triage for similar complaints.13 Also, many nurse phone triage lines use the same underlying clinical logic as the symptom checkers evaluated in this study. For example, some health plan nurse triage lines use the Healthwise symptom checker, and the Schmitt and Thompson protocols were originally developed for phone triage and now provide the underlying logic for several symptom checkers that we evaluated. The accuracy of telephone triage recommendations, as compared to in-person physician recommendations, ranged from 61% in a study of pediatric abdominal pain to 69% in a multicenter observational study.35 36 A recent study of NHS Symptom Checkers and NHS Direct’s telephone triage line found triage advice from both to be comparable.9 Given their similar clinical logic and triage performance and their negligible operating costs, symptom checkers could potentially be a more cost effective way of providing triage advice than nurse-staffed phone lines.17

Implications for using symptom checkers

Both symptom checkers and telephone triage have been promoted as a means of reducing unnecessary office visits.15 16 17 18 37 The impact of symptom checkers on how people seek care depends on how patients respond to advice, and this is unknown. In one study, users expressed skepticism about the diagnosis ultimately suggested by a symptom checker.6 The risk averse nature of symptom checkers’ triage advice is a concern. In two thirds of standardized patient evaluations where medical attention was not necessary, we found symptom checkers encouraged care. Overly risk averse advice is not limited to symptom checkers. Telephone triage advice can also encourage unnecessary care seeking.32 35 For instance, the NHS’s telephone triage line, which is not staffed by health professionals, has been implicated in increasing visits to emergency departments in the UK.38 Some patients researching health conditions online are motivated by fear, and the listing of concerning diagnoses by symptom checkers could contribute to hypochondriasis and “cyberchondria,” which describes the escalated anxiety associated with self diagnosis on the internet.39 40 41 42 43 Together, confusion, risk averse triage advice, and cyberchondria could mean that symptom checkers encourage patients to receive care unnecessarily and thus increase healthcare spending. Understanding how patients interpret and use the advice from symptom checkers and the impact of symptom checkers on care seeking should be a key focus for future research.

The symptom checkers in this study represent the first generation of such tools, and there are several potential advances that may improve their performance in future versions. Incorporating local epidemiological data may help inform diagnoses. For instance, addition of real time information about the local incidence of illness in the community greatly improved the performance of a diagnostic tool for group A streptococcal pharyngitis.10 Diagnosis and triage rates could also be improved if symptom checkers incorporated individual clinical data from medical claims or the electronic medical record. Demographic information is critical for both diagnostic and triage decisions for programs such as symptom checkers.11 One surprising finding in our study was that symptom checkers that asked for demographic background information did not perform better. However, it is possible that this demographic information was not effectively incorporated into the symptom checkers’ algorithms.

Strengths and limitations of this study

Despite the growing use of symptom checkers, we believe our study is the first to assess clinical performance across a large number of symptom checkers and a wide range of conditions. There were key limitations to this study. We cannot be sure we identified all publicly available symptom checkers, despite a thorough search of relevant databases and consultation with experts in this discipline. We used clinical vignettes in which the symptoms and diagnoses were typically clear, and few vignettes included comorbid conditions, resulting in a possible overestimation of the true clinical accuracy of symptom checkers.33 Some standardized patient vignettes contained specific clinical language (for example, mouth ulcers, tonsils with exudate), and actual patients with the same condition might struggle to find words to describe their symptoms or might use different terms. Therefore, our analysis represents an indirect assessment of how well symptom checkers would perform with actual patients. We do not know how well physicians or other providers would diagnose or triage when presented with these standardized patient vignettes, preventing a direct comparison between symptom checkers and physicians. When symptom checkers suggested several care sites (for example, emergency department or general practice), our triage assessment was based only on the highest acuity site of care listed, and this may contribute to our finding that triage advice is risk averse.

Symptom checkers are part of a larger trend of both patients and physicians using the internet for many healthcare tasks and therefore it seems likely that the use of symptom checkers will only increase. Patients are chatting online with physicians,44 emailing their doctors for medical advice,45 receiving care through e-visits,46 47 and downloading health apps to smartphones.48 In addition to the public, physicians and other practitioners are also using conceptually similar tools to aid in the diagnosis and triage of their patients.49 50 Physicians should be aware that an increasing number of their patients are using new internet based tools such as symptom checkers and that the diagnosis and triage advice patients receive may often be inaccurate. For patients, our results imply that in many cases symptom checkers can give the user a sense of possible diagnoses but also provide a note of caution, as the tools are frequently wrong and the triage advice overly cautious. Symptom checkers may, however, be of value if the alternative is not seeking any advice or simply using an internet search engine. Further evaluations and monitoring of symptom checkers will be important to assess whether they help people learn more and make better decisions about their health.

The public is increasingly using the internet for self diagnosis and triage advice, and there has been a proliferation of computerized algorithms called symptom checkers that attempt to streamline this process. Despite the growth in use of these tools, their clinical performance has not been thoroughly assessed. Our study suggests that symptom checkers have deficits in both diagnosis and triage, and their triage advice is generally risk averse.
  39 in total

1.  The dangers of using Google as a diagnostic aid.

Authors:  Peter Black
Journal:  Br J Nurs       Date:  2009 Oct 22-Nov 11

2.  Barriers and facilitators influencing call center nurses' decision support for callers facing values-sensitive decisions: a mixed methods study.

Authors:  Dawn Stacey; Ian D Graham; Annette M O'Connor; Marie-Pascale Pomey
Journal:  Worldviews Evid Based Nurs       Date:  2005       Impact factor: 2.931

3.  Giving patients choice and control: health informatics on the patient journey.

Authors:  B Gann
Journal:  Yearb Med Inform       Date:  2012

4.  Information leaflet and antibiotic prescribing strategies for acute lower respiratory tract infection: a randomized controlled trial.

Authors:  Paul Little; Kate Rumsby; Joanne Kelly; Louise Watson; Michael Moore; Gregory Warner; Tom Fahey; Ian Williamson
Journal:  JAMA       Date:  2005-06-22       Impact factor: 56.272

5.  Are e-health web users looking for different symptom information than callers to triage centers?

Authors:  Frederick North; Prathibha Varkey; Brian Laing; Steven S Cha; Sidna Tulledge-Scheitel
Journal:  Telemed J E Health       Date:  2011-01-07       Impact factor: 3.536

6.  Telephone triage for management of same-day consultation requests in general practice (the ESTEEM trial): a cluster-randomised controlled trial and cost-consequence analysis.

Authors:  John L Campbell; Emily Fletcher; Nicky Britten; Colin Green; Tim A Holt; Valerie Lattimer; David A Richards; Suzanne H Richards; Chris Salisbury; Raff Calitri; Vicky Bowyer; Katherine Chaplin; Rebecca Kandiyali; Jamie Murdoch; Julia Roscoe; Anna Varley; Fiona C Warren; Rod S Taylor
Journal:  Lancet       Date:  2014-08-03       Impact factor: 79.321

7.  A comparison of care at e-visits and physician office visits for sinusitis and urinary tract infection.

Authors:  Ateev Mehrotra; Suzanne Paone; G Daniel Martich; Steven M Albert; Grant J Shevchik
Journal:  JAMA Intern Med       Date:  2013-01-14       Impact factor: 21.873

8.  Trends in prehospital delay in patients with acute myocardial infarction (from the Worcester Heart Attack Study).

Authors:  Jane S Saczynski; Jorge Yarzebski; Darleen Lessard; Frederick A Spencer; Jerry H Gurwitz; Joel M Gore; Robert J Goldberg
Journal:  Am J Cardiol       Date:  2008-10-30       Impact factor: 2.778

9.  Evaluation of the accuracy of smartphone medical calculation apps.

Authors:  Rachel Bierbrier; Vivian Lo; Robert C Wu
Journal:  J Med Internet Res       Date:  2014-02-03       Impact factor: 5.428

10.  mHealth and mobile medical Apps: a framework to assess risk and promote safer use.

Authors:  Thomas Lorchan Lewis; Jeremy C Wyatt
Journal:  J Med Internet Res       Date:  2014-09-15       Impact factor: 5.428

