Alaa Abd-Alrazaq1, Zeineb Safi1, Mohannad Alajlani2, Jim Warren3, Mowafa Househ1, Kerstin Denecke4.
Abstract
BACKGROUND: Dialog agents (chatbots) have a long history of application in health care, where they have been used for tasks such as supporting patient self-management and providing counseling. Their use is expected to grow with increasing demands on health systems and improving artificial intelligence (AI) capability. Approaches to the evaluation of health care chatbots, however, appear to be diverse and haphazard, resulting in a potential barrier to the advancement of the field.
Keywords: chatbots; conversational agents; evaluation; health care; metrics
Year: 2020 PMID: 32442157 PMCID: PMC7305563 DOI: 10.2196/18301
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Flowchart of the study selection process.
Characteristics of the included studies (N=65).
| Parameters and characteristics | Studies, n (%)^a |
| --- | --- |
| | |
| Survey | 41 (63) |
| Quasi-experiment | 11 (17) |
| Randomized controlled trial | 13 (20) |
| | |
| Journal article | 37 (57) |
| Conference proceeding | 25 (39) |
| Thesis | 3 (5) |
| | |
| United States | 33 (51) |
| France | 5 (8) |
| Netherlands | 3 (5) |
| Japan | 3 (5) |
| Australia | 3 (5) |
| Italy | 2 (3) |
| Switzerland and Netherlands | 2 (3) |
| Finland | 1 (2) |
| Sweden | 1 (2) |
| Turkey | 1 (2) |
| United Kingdom | 1 (2) |
| Switzerland and Germany | 1 (2) |
| Mexico | 1 (2) |
| Spain | 1 (2) |
| Global population | 1 (2) |
| Romania, Spain and Scotland | 1 (2) |
| Philippines | 1 (2) |
| Switzerland | 1 (2) |
| New Zealand | 1 (2) |
| Spain and New Zealand | 1 (2) |
| South Africa | 1 (2) |
| | |
| Before 2010 | 3 (5) |
| 2010-2014 | 17 (26) |
| 2015-2019 | 45 (70) |
| | |
| ≤50 | 38 (62) |
| 51-100 | 11 (18) |
| 101-200 | 9 (15) |
| >200 | 3 (5) |
| | |
| Mean (range) | 39 (13-79) |
| | |
| Male | 48.1 |
| | |
| Clinical sample | 34 (55) |
| Nonclinical sample | 28 (45) |
| | |
| Clinical | 30 (50) |
| Community | 20 (33) |
| Educational | 18 (30) |
| | |
| Self-management | 17 (26) |
| Therapy | 12 (19) |
| Counselling | 12 (19) |
| Education | 10 (15) |
| Screening | 9 (14) |
| Training | 7 (11) |
| Diagnosing | 3 (5) |
| | |
| Stand-alone software | 40 (62) |
| Web-based | 25 (39) |
| | |
| Rule-based | 53 (82) |
| Artificial intelligence | 11 (17) |
| Hybrid | 1 (2) |
| | |
| Chatbot | 58 (89) |
| Users | 4 (6) |
| Both | 3 (5) |
| | |
| Text | 40 (62) |
| Voice | 9 (14) |
| Voice and nonverbal | 8 (12) |
| Text and voice | 6 (9) |
| Text and nonverbal | 2 (3) |
| | |
| Text, voice and nonverbal | 21 (32) |
| Text | 20 (31) |
| Voice and nonverbal | 19 (29) |
| Text and voice | 4 (6) |
| Voice | 1 (2) |
| | |
| Any health condition | 20 (31) |
| Depression | 15 (23) |
| Autism | 5 (8) |
| Anxiety | 5 (8) |
| Substance use disorder | 5 (8) |
| Posttraumatic stress disorder | 5 (8) |
| Mental disorders | 3 (5) |
| Sexually transmitted diseases | 3 (5) |
| Sleep disorders | 2 (3) |
| Diabetes | 2 (3) |
| Alzheimer | 1 (2) |
| Asthma | 1 (2) |
| Cervical cancer | 1 (2) |
| Dementia | 1 (2) |
| Schizophrenia | 1 (2) |
| Stress | 1 (2) |
| Genetic variants | 1 (2) |
| Cognitive impairment | 1 (2) |
| Atrial fibrillation | 1 (2) |
^a Percentages were rounded and may not sum to 100.
^b Sample size was reported in 61 studies.
^c Mean age was reported in 44 studies.
^d N/A: not applicable.
^e Sex was reported in 54 studies.
^f Sample type was reported in 62 studies.
^g Setting was reported in 61 studies.
^h Numbers do not add up as several chatbots focused on more than one health condition.
^i Numbers do not add up as several chatbots have more than one purpose.
^j Numbers do not add up as several chatbots target more than one health condition.