Bo Xiao, Zac E Imel, Panayiotis G Georgiou, David C Atkins, Shrikanth S Narayanan.
Abstract
The technology for evaluating patient-provider interactions in psychotherapy, observational coding, has not changed in 70 years. It is labor-intensive, error-prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks including ASR (1200 transcripts from heterogeneous psychotherapy sessions and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally-derived empathy ratings was evaluated against human ratings for each provider. Computationally-derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and an F-score (the harmonic mean of precision and recall) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in only a slight increase in prediction accuracy, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies.
Year: 2015 PMID: 26630392 PMCID: PMC4668058 DOI: 10.1371/journal.pone.0143055
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of psychotherapy corpora and role in automatic empathy evaluation.
| Corpus | # Sessions | # Talk turns | # Words | Has audio | Usage |
|---|---|---|---|---|---|
| General psychotherapy | 1205 | 300863 | 6550270 | No | ASR-LM training |
| MI Randomized Trials (a) | 153 | 36907 | 1123842 | Yes | ASR-AM & ASR-LM training |
| CTT Trial (b) | 200 | 23985 | 624395 | Yes | Empathy Detection |
Note. ASR = Automatic speech recognition; LM = language model; AM = acoustic model
a The specific MI randomized trials are summarized in [24]. Specific studies include Alcohol Research Collaborative: Peer Programs [52]; Event Specific Prevention: Spring Break [53]; Event Specific Prevention: Twenty First Birthday [54]; Brief Intervention for Problem Drug Use and Abuse in Primary Care [55]; Indicated Marijuana Prevention for Frequently Using College Students [56].
b CTT = Context Tailored Training [23].
Fig 1. Distribution of human empathy ratings for the 200 sessions in the CTT trial.
Fig 2. Overview of processing steps for moving from audio recording of session to predicted value of empathy.
The lower portion of the figure represents the process for a single session recording, whereas the upper portion represents various speech signal processing tasks, learned from all available corpora (as indicated in the text).
Empathy prediction performance.
| Model | Correlation | Accuracy (%) | Recall (%) | Precision (%) | F-Score (%) |
|---|---|---|---|---|---|
| Chance level | - | 60.5 | 100.0 | 60.5 | 75.4 |
| Coder (individual vs. average) agreement (a) | - | 89.9 | 87.7 | 93.7 | 90.3 |
| Human Transcription | 0.71 | 85.0 | 96.7 | 81.8 | 88.6 |
| Human Diarization | 0.65 | 80.5 | 93.4 | 78.5 | 85.3 |
| Full VAD, Diarization, ASR (b) | 0.65 | 82.0 | 91.7 | 81.0 | 86.1 |
a These results were calculated on 63 sessions instead of 200. On sessions coded by multiple coders, the average opinion is used to derive the dichotomous (binarized) decision. The opinion of the individual coder is compared with the average decision to establish this coder-agreement.
b Result is fully automatic, no human intervention in algorithm.
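The relationship between the Recall, Precision, and F-Score columns can be checked with a few lines of Python. This is a minimal sketch, assuming the F-Score column is the standard F1 measure (the harmonic mean of precision and recall); the function name `f_score` and the `rows` dictionary are illustrative, and the numbers are copied from the chance-level and model rows of the table above.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %, reported F-score %) copied from the table above.
rows = {
    "Chance level": (60.5, 100.0, 75.4),
    "Human Transcription": (81.8, 96.7, 88.6),
    "Human Diarization": (78.5, 93.4, 85.3),
    "Full VAD, Diarization, ASR": (81.0, 91.7, 86.1),
}

# Recompute the F-score from the rounded precision and recall values.
recomputed = {name: f_score(p, r) for name, (p, r, _) in rows.items()}

for name, (_, _, reported) in rows.items():
    print(f"{name}: recomputed {recomputed[name]:.1f}, reported {reported}")
```

Because the inputs are already rounded to one decimal place, the recomputed values can differ from the reported ones by about 0.1. The chance-level row is consistent with a baseline that labels every session as high empathy: recall is then 100% and precision equals accuracy.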
High vs. low empathy tri-grams.
| High empathy | High empathy (cont.) | Low empathy | Low empathy (cont.) |
|---|---|---|---|
| it sounds like | a lot of | during the past | please answer the |
| do you think | you think about | using card a | you need to |
| you think you | you think that | past twelve months | clean and sober |
| sounds like you | a little bit | do you have | have you ever |
| that sounds like | brought you here | some of the | to help you |
| sounds like it’s | sounds like you’re | little bit about | mm hmm so |
| p s is | you’ve got a | the past ninety | in your life |
| what I’m hearing | and I think | first of all | next questions using |
| one of the | if you were | you know what | you have to |
| so you feel | it would be | the past twelve | school or training |
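Tri-grams like those in the table above are sequences of three consecutive words taken from a transcript. The snippet below is a minimal sketch of how such counts could be collected. It is not the authors' model: the tokenization rule, the helper name `trigrams`, and the example utterance are all assumptions made for illustration.

```python
import re
from collections import Counter

def trigrams(text):
    """Return all runs of three consecutive lowercase word tokens in text."""
    # Assumed tokenization: letters plus straight or curly apostrophes.
    tokens = re.findall(r"[a-z'’]+", text.lower())
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

# Made-up counselor utterance, not taken from any of the corpora.
utterance = "it sounds like you feel stuck and it sounds like a lot"

counts = Counter(trigrams(utterance))
for phrase, n in counts.most_common(3):
    print(n, phrase)
```

Counts of this kind, accumulated over all counselor talk turns in a session, are one plausible input to a text-based model that separates high-empathy from low-empathy sessions.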