Patricia Gual-Montolio, Irene Jaén, Verónica Martínez-Borba, Diana Castilla, Carlos Suso-Ribera.
Abstract
Emotional disorders are the most common mental disorders globally. Psychological treatments have been found to be useful for a significant number of cases, but up to 40% of patients do not respond to psychotherapy as expected. Artificial intelligence (AI) methods might enhance psychotherapy by providing therapists and patients with real- or close to real-time recommendations according to the patient's response to treatment. The goal of this investigation is to systematically review the evidence on the use of AI-based methods to enhance outcomes in psychological interventions in real-time or close to real-time. The search included studies indexed in the electronic databases Scopus, PubMed, Web of Science, and Cochrane Library. The terms used for the electronic search included variations of the words "psychotherapy", "artificial intelligence", and "emotional disorders". Of the 85 full texts assessed, only 10 studies met our eligibility criteria. In these, the most frequently used AI technique was conversational AI agents, which are chatbots based on software that can be accessed online with a computer or a smartphone. Overall, the reviewed investigations indicated significant positive consequences of using AI to enhance psychotherapy and reduce clinical symptomatology. Additionally, most studies reported high satisfaction, engagement, and retention rates when implementing AI to enhance psychotherapy in real- or close to real-time. Despite the potential of AI to make interventions more flexible and tailored to patients' needs, more methodologically robust studies are needed.
Keywords: artificial intelligence; emotional problems; psychotherapy; systematic review
Year: 2022 PMID: 35805395 PMCID: PMC9266240 DOI: 10.3390/ijerph19137737
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Figure 1. Selection progress of the review.
Characteristics of the included studies.
| References | Country | Sample Size | Study Design | Emotional Problem Evaluated | Psychological Intervention | Main Outcome | Type of AI Used |
|---|---|---|---|---|---|---|---|
| [ | USA | 8 Tx | Pre-post study (no control) | MDD | Multimodal intervention (behavioral activation approach) | MINI, QIDS-C, PHQ-9, GAD-7 | The nearest neighbor Support Vector Machines and Random Forest Classifier |
| [ | USA | 74 (24 cont. + 2-week 24 Tx + 4-week 26 Tx) | Parallel group RCT (1:1) | College students with daily stressors | CBT and other interventions | PHQ-9, GAD-7, and PANAS | AI chatbot: conversational Tess app |
| [ | USA | 95 Tx (22 no selection-random intervention; 21 no selection-AI intervention; 26 selection-random; 26 selection-AI) | Controlled trial | Stress | Stress management micro-interventions (positive psychology, cognitive behavioral, meta-cognitive, and somatic) via app | PHQ-9 and CSQ | Reinforcement Learning algorithm |
| [ | UK | 171 (85 Tx MYLO program + 86 cont. ELIZA program) | RCT | Volunteers with daily stressors | Web-based problem-solving intervention | Problem-related distress | AI chatbot; Manage Your Life Online (MYLO) |
| [ | UK | 129 Tx | Quasi-experimental pre-post study | Self-report symptoms of MDD (nonclinical global population) | CBT together with other interventions via Wysa app | PHQ-9 | AI-chatbot: conversational Wysa app |
| [ | Germany | 1234 Tx | Pre-post study (no control) | Affective and anxiety disorders (70%), and other disorders (30%) | CBT via Trier Treatment Navigator | OQ-30, ASC, ASQ, HSCL-11 | Random Forest Algorithm. |
| [ | Switzerland | 126 Tx | Pre-post pilot study | Depressive symptoms | CBT intervention + MOSS app | PHQ-9 | LASSO (least absolute shrinkage and selection operator) |
| [ | Korea | 43 Tx (10 CRM group + 33 non-CRM) | Prospective Case-control study | MDD, BD I and BD II | Feedback intervention of Behavioral guidance | Daily mood state on eMoodChart, mood episodes, circadian rhythm | Circadian rhythm-based algorithm based on data obtained with a wearable activity tracker |
| [ | Kenya | 41 Tx (pregnant women and new mothers) | Nonconcurrent Multiple baseline SG | Non-clinical Perinatal Depression | CBT (Healthy Moms adaptation) | PHQ-9, feelings, and mood | AI chatbot: conversational Zuri app, Kenyan version of Tess |
| [ | China | 83 university students (42 cont. + 41 Tx) | RCT (unblinded) | Depression | CBT intervention | PHQ-9, GAD-7, and PANAS | AI chatbot: conversational XioNan app |
Note: Tx, treatment group; cont., control group; AI, artificial intelligence; CRM, Circadian Rhythm for Mood; MYLO, method-of-levels therapy program; ELIZA, client-centered therapy program; RCT, randomized controlled trial; SG, single group design; MDD, Major Depressive Disorder; BD, Bipolar Disorder; CBT, Cognitive-Behavioral Therapy; App, mobile application; MOL, method of levels; PHQ-9, Patient Health Questionnaire-9; GAD-7, Generalized Anxiety Disorder-7; CSQ, Coping Strategies Questionnaire; OQ-30, Outcome Questionnaire; ASC, Assessment for Signal Clients; ASQ, Affective Style Questionnaire; HSCL-11, Hopkins Symptom Checklist; MINI, Mini-International Neuropsychiatric Interview; QIDS-C, Quick Inventory of Depressive Symptomatology-Clinician Rated; PANAS, Positive and Negative Affect Schedule.
Results of clinical symptoms, engagement, and satisfaction.
| References | Results |
|---|---|
| [ | Improvements were found in PHQ-9 (t = 7.02, B = −0.82, Participants became less likely to meet diagnostic criteria for Major Depressive Disorder ( Seven participants completed all eight weeks of participation; one participant dropped out because of technical problems. Participants completed 4.8/9 sessions on the website. Participants' engagement with the mobile phone gradually decreased across the intervention. Average satisfaction with the mobile phone was 5.71/7. The most common problems were loss of connectivity, short battery life, and phone freezing during use. |
| [ | Control group reported higher PHQ-9 scores than group 1 ( Control group reported higher GAD-7 scores than group 1 ( Statistically significant differences were found between control group and group 1 in PANAS scores ( Tess users reported higher satisfaction, and learning. Best aspects of the bot were accessibility and empathy. Worst aspects were limitations in natural conversations, such as not being able to understand responses, unexpected answers, and low interactivity. |
| [ | Users in the AI group reported greater stress reduction [ AI participants reported more constructive coping behaviors over time [F(1,16) = 4.4, No differences between groups were found in drop-out. Drop-out = married, had children, had trouble at work or illness. Sample retention = 21% ( |
| [ | All participants reported further resolution of their problems from post-intervention to follow up [F(1,60) = 48.78, Both groups reported improvement in distress [F(2,338) = 51.10, A reduction in the DASS was found over the three time-points [F(2,314) = 49.39, MYLO users rated the program as more helpful than ELIZA users (F(1,60) = 12.98, |
| [ | Within-group differences: both comparison groups showed a significant reduction in PHQ-9 score (high users: W = 478.5, Between-groups differences: high users showed higher improvement compared with low users ( 83.3% of high users actively used the app for more than 4 days on. 59.7% (77/129) of users completed at least one wellness tool provided by the app (72 high users and 5 low users). 67.7% rated the experience as favourable (helpful and encouraging); 32% rated it as less favourable (unhelpful and concerns). |
| [ | From pre-treatment to session 10, patients treated with the recommended strategy (optimal) had a significantly higher effect size than patients treated with a non-recommended strategy (non-optimal) on the HSCL-11 (t = 1.01 = −1.99, Drop-out was predicted by higher impairment on the FEP-2, lower impairment on the HSCL-11, a more histrionic personality, higher impairment of interpersonal relationships, a less obsessive personality style, a lower therapist treatment expectation, and a lack of university entrance qualification (all |
| [ | Symptom severity change: significant differences in the PHQ-9 from t0 to t6 ( Comparison of support vector machines and random forest classifier: The random forest classification showed the highest accuracy and specificity. The support vector machine obtained higher sensitivity compared with the random forest classification. |
| [ | CRM group presented fewer (β = 0.033, CRM group had shorter manic/hypomanic episodes (β = 0.039, Positive behavioral changes in CR amplitude, light exposure during daytime, and steps during daytime were found when alert feedback was provided (assuming 95% CIs, No significant differences between groups were found in sleep. |
| [ | Mood improved by 7% over the average mood reported at baseline period (d = 0.17). Retention rate of 51.9%. Less engagement: pregnant, greater depression symptoms, and employed outside. More engagement: married and more educated women. Women had a positive attitude and expressed that they could trust the AI Zuri program. |
| [ | Participants who received chatbot intervention showed an increased reduction of depressive (F = 22.89; Attrition rate of 24.1% (20/83). There were no statistically significant differences between completers and participants who dropped out in sociodemographic or psychological factors. The chatbot group showed decreased adherence during the five assessment points. The bibliotherapy control group slightly increased adherence rates during the first 8 weeks (two assessment points). Differences between both comparison groups were not significant. Chatbot users showed higher therapeutic alliance than the bibliotherapy control group ( Positive aspects of using the AI-based system included: easy access, empathy, friendly, interesting, educational, exploring depression, interactive, and choice list. Negative comments regarding chatbot use were impersonal, unnatural, rigid patterns, misunderstanding, repetitive contents, too general, irrelevant contents, or too simple. |
Quality assessment of Before-After Studies.
| Criterion | [ | [ | [ | [ |
|---|---|---|---|---|
| 1. Was the study question or objective clearly stated? | Yes | Yes | Yes | Yes |
| 2. Were eligibility/selection criteria for the study population prespecified and clearly described? | Yes | Yes | Yes | Yes |
| 3. Were the participants in the study representative of those who would be eligible for the test/service/intervention in the general or clinical population of interest? | Yes | Yes | Yes | Yes |
| 4. Were all eligible participants that met the prespecified entry criteria enrolled? | No | Yes | Yes | Yes |
| 5. Was the sample size sufficiently large to provide confidence in the findings? | NR | Yes | Yes | NR |
| 6. Was the test/service/intervention clearly described and delivered consistently across the study population? | Yes | Yes | No | Yes |
| 7. Were the outcome measures prespecified, clearly defined, valid, reliable, and assessed consistently across all study participants? | Yes | Yes | Yes | Yes |
| 8. Were the people assessing the outcomes blinded to the participants’ exposures/interventions? | NA | Yes | NA | NA |
| 9. Was the loss to follow-up after baseline 20% or less? Were those lost to follow-up accounted for in the analysis? | Yes | Yes | No/Yes | No/No |
| 10. Did the statistical methods examine changes in outcome measures from before to after the intervention? Were statistical tests done that provided p values for the pre-to-post changes? | Yes | Yes | Yes | Yes |
| 11. Were outcome measures of interest taken multiple times before the intervention and multiple times after the intervention (i.e., did they use an interrupted time-series design)? | Yes | No | Yes | Yes |
| 12. If the intervention was conducted at a group level (e.g., a whole hospital, a community, etc.) did the statistical analysis consider the use of individual-level data to determine effects at the group level? | NA | NA | NA | NA |
Note: CD, cannot determine; NA, not applicable; NR, not reported.
Quality assessment of Case Series Studies.
| Criterion | [ |
|---|---|
| 1. Was the study question or objective clearly stated? | Yes |
| 2. Was the study population clearly and fully described, including a case definition? | Yes |
| 3. Were the cases consecutive? | Yes |
| 4. Were the subjects comparable? | Yes |
| 5. Was the intervention clearly described? | Yes |
| 6. Were the outcome measures clearly defined, valid, reliable, and implemented consistently across all study participants? | Yes |
| 7. Was the length of follow-up adequate? | NA |
| 8. Were the statistical methods well-described? | Yes |
| 9. Were the results well-described? | Yes |
Note: CD, cannot determine; NA, not applicable; NR, not reported.
Quality assessment of Case-Control Studies.
| Criterion | [ |
|---|---|
| 1. Was the research question or objective in this paper clearly stated and appropriate? | Yes |
| 2. Was the study population clearly specified and defined? | Yes |
| 3. Did the authors include a sample size justification? | No |
| 4. Were controls selected or recruited from the same or similar population that gave rise to the cases (including the same timeframe)? | Yes |
| 5. Were the definitions, inclusion and exclusion criteria, algorithms or processes used to identify or select cases and controls valid, reliable, and implemented consistently across all study participants? | CD |
| 6. Were the cases clearly defined and differentiated from controls? | Yes |
| 7. If less than 100 percent of eligible cases and/or controls were selected for the study, were the cases and/or controls randomly selected from those eligible? | NR |
| 8. Was there use of concurrent controls? | No |
| 9. Were the investigators able to confirm that the exposure/risk occurred prior to the development of the condition or event that defined a participant as a case? | Yes |
| 10. Were the measures of exposure/risk clearly defined, valid, reliable, and implemented consistently (including the same time period) across all study participants? | Yes |
| 11. Were the assessors of exposure/risk blinded to the case or control status of participants? | No |
| 12. Were key potential confounding variables measured and adjusted statistically in the analyses? If matching was used, did the investigators account for matching during study analysis? | Yes |
Note: CD, cannot determine; NA, not applicable; NR, not reported.
Quality assessment of Controlled Intervention Studies.
| Criterion | [ | [ | [ | [ |
|---|---|---|---|---|
| 1. Was the study described as randomized, a randomized trial, a randomized clinical trial, or an RCT? | No | Yes | Yes | Yes |
| 2. Was the method of randomization adequate (i.e., use of randomly generated assignment)? | NR | Yes | Yes | Yes |
| 3. Was the treatment allocation concealed (so that assignments could not be predicted)? | NR | Yes | Yes | Yes |
| 4. Were study participants and providers blinded to treatment group assignment? | NR | NR | Yes | No |
| 5. Were the people assessing the outcomes blinded to the participants’ group assignments? | Yes | Yes | Yes | No |
| 6. Were the groups similar at baseline on important characteristics that could affect outcomes (e.g., demographics, risk factors, or co-morbid conditions)? | NR | No | Yes | Yes |
| 7. Was the overall drop-out rate from the study at endpoint 20% or lower of the number allocated to treatment? | NR | No | Yes | No |
| 8. Was the differential drop-out rate (between treatment groups) at endpoint 15 percentage points or lower? | NR | Yes | Yes | Yes |
| 9. Was there high adherence to the intervention protocols for each treatment group? | NR | Yes | Yes | Yes |
| 10. Were other interventions avoided or similar in the groups (e.g., similar background treatments)? | NR | NR | Yes | Yes |
| 11. Were outcomes assessed using valid and reliable measures, implemented consistently across all study participants? | Yes | No | Yes | Yes |
| 12. Did the authors report that the sample size was sufficiently large to be able to detect a difference in the main outcome between groups with at least 80% power? | No | Yes | Yes | Yes |
| 13. Were outcomes reported or subgroups analyzed prespecified (i.e., identified before analyses were conducted)? | No | Yes | Yes | Yes |
| 14. Were all randomized participants analyzed in the group to which they were originally assigned, i.e., did they use an intention-to-treat analysis? | NR | No | Yes | Yes |
Note: CD, cannot determine; NA, not applicable; NR, not reported.