Alex S Cohen1,2, Zachary Rodriguez1,2, Kiara K Warren1, Tovah Cowan1, Michael D Masucci1, Ole Edvard Granrud1, Terje B Holmlund3, Chelsea Chandler4,5, Peter W Foltz4,5, Gregory P Strauss6. 1. Louisiana State University, Department of Psychology, Baton Rouge, LA, USA. 2. Louisiana State University, Center for Computation and Technology, Baton Rouge, LA, USA. 3. University of Tromsø-The Arctic University of Norway, Tromsø, Norway. 4. University of Colorado, Institute of Cognitive Science, Boulder, CO, USA. 5. University of Colorado, Department of Computer Science, Boulder, CO, USA. 6. University of Georgia, Department of Psychology, Athens, GA, USA.
Abstract
BACKGROUND AND HYPOTHESIS: Despite decades of "proof of concept" findings supporting the use of Natural Language Processing (NLP) in psychosis research, clinical implementation has been slow. One obstacle is the lack of comprehensive psychometric evaluation of these measures. There is overwhelming evidence that criterion and content validity can be achieved for many purposes, particularly using machine learning procedures. However, there has been very little evaluation of test-retest reliability, divergent validity (sufficient to address concerns of a "generalized deficit"), and potential biases from demographics and other individual differences. STUDY DESIGN: This article highlights these concerns in the development of an NLP measure for tracking clinically rated paranoia from video "selfies" recorded on smartphone devices. Patients with schizophrenia or bipolar disorder were recruited and tracked over a week-long epoch. A small NLP-based feature set from 499 language samples was modeled on clinically rated paranoia using regularized regression. STUDY RESULTS: While test-retest reliability was high, criterion and convergent/divergent validity were only achieved when considering moderating variables, notably whether a patient was away from home, around strangers, or alone at the time of the recording. Moreover, there were systematic racial and sex biases in the model, in part reflecting whether patients submitted videos when they were away from home, around strangers, or alone. CONCLUSIONS: Advancing NLP measures for psychosis will require deliberate consideration of test-retest reliability, divergent validity, systematic biases, and the potential role of moderators. In our example, a comprehensive psychometric evaluation revealed clear strengths and weaknesses that can be systematically addressed in future research.
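The modeling step described in the abstract, fitting a small NLP feature set from 499 language samples to clinician-rated paranoia with regularized regression, can be sketched as follows. This is not the authors' pipeline: the feature values, noise level, and ridge penalty below are synthetic placeholders, and closed-form ridge (L2) regression stands in for whatever regularized estimator the study used.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 499, 8  # 499 language samples, small NLP feature set

# Synthetic stand-ins for NLP features (e.g., lexical, acoustic-derived scores)
X = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
# Synthetic stand-in for clinically rated paranoia scores
y = X @ true_w + rng.normal(scale=0.5, size=n_samples)

# Ridge regression: minimize ||y - Xw||^2 + alpha * ||w||^2,
# solved in closed form as w = (X'X + alpha*I)^(-1) X'y
alpha = 1.0
w = np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# In-sample fit (a real evaluation would use held-out data / cross-validation)
pred = X @ w
r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(f"in-sample R^2: {r2:.2f}")
```

The L2 penalty shrinks coefficients toward zero, which is useful when the feature set is small relative to noise, though, as the abstract notes, fit statistics alone say nothing about test-retest reliability, divergent validity, or demographic bias.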