Rupa Jose (1), Matthew Matero (2), Garrick Sherman (1), Brenda Curtis (3), Salvatore Giorgi (3, 4), Hansen Andrew Schwartz (2), Lyle H Ungar (4, 5).
1. Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
2. Department of Computer Science, Stony Brook University, Stony Brook, New York, USA.
3. Technology and Translational Research Unit, National Institute on Drug Abuse, Baltimore, Maryland, USA.
4. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
5. Department of Psychology, Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Abstract
BACKGROUND: Assessing risk for excessive alcohol use is important for applications ranging from recruitment into research studies to targeted public health messaging. Social media language provides an ecologically embedded source of information for assessing individuals who may be at risk for harmful drinking.
METHODS: Using data collected from 3664 respondents from the general population, we examine how accurately language used on social media classifies individuals as at risk for alcohol problems, based on Alcohol Use Disorders Identification Test-Consumption (AUDIT-C) score benchmarks.
RESULTS: We find that social media language, when used with models based on contextual word embeddings, is moderately accurate (area under the curve = 0.75) at identifying individuals at risk for alcohol problems (i.e., hazardous drinking/alcohol use disorders). High-risk alcohol use was predicted by individuals' usage of words related to alcohol, partying, informal expressions, swearing, and anger. Low-risk alcohol use was predicted by individuals' usage of social, affiliative, and faith-based words.
CONCLUSIONS: The use of social media data to study drinking behavior in the general public is promising and could eventually support primary and secondary prevention efforts among Americans whose at-risk drinking may have otherwise gone "under the radar."
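To make the reported metric concrete, the following is a minimal sketch of how an area under the ROC curve can be computed for a binary at-risk classifier. It is not the authors' pipeline: the function name `roc_auc`, the labels, and the scores are all hypothetical and chosen only for illustration. The sketch uses the pairwise-ranking definition of AUC, the probability that a randomly chosen at-risk respondent receives a higher model score than a randomly chosen low-risk respondent.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 = at risk, 0 = low risk (hypothetical encoding).
    scores: model scores, higher meaning more likely at risk.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a value of 0.75 is described as moderately accurate. The quadratic loop is fine for a toy example; a library routine would be preferable for a sample of several thousand respondents.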