Saeed Hassanpour, Naofumi Tomita, Timothy DeLise, Benjamin Crosier, Lisa A Marsch.
Abstract
Social media may provide new insight into our understanding of substance use and addiction. In this study, we developed a deep-learning method to automatically classify individuals' risk for alcohol, tobacco, and drug use based on the content from their Instagram profiles. In total, 2287 active Instagram users participated in the study. Deep convolutional neural networks for images and long short-term memory (LSTM) for text were used to extract predictive features from these data for risk assessment. The evaluation of our approach on a held-out test set of 228 individuals showed that among the substances we evaluated, our method could estimate the risk of alcohol abuse with statistical significance. These results are the first to suggest that deep-learning approaches applied to social media data can be used to identify potential substance use risk behavior, such as alcohol use. Utilization of automated estimation techniques can provide new insights for the next generation of population-level risk assessment and intervention delivery.
Year: 2018 PMID: 30356094 PMCID: PMC6333814 DOI: 10.1038/s41386-018-0247-x
Source DB: PubMed Journal: Neuropsychopharmacology ISSN: 0893-133X Impact factor: 7.853
The collected Instagram data in the study
| Data source | Number | Per user |
|---|---|---|
| Instagram user participants | 2287 | N/A |
| Collected Instagram images | 466,227 | 201 ± 392 |
| Collected Instagram captions | 369,000 | 161 ± 298 |
| Collected Instagram comments | 475,000 | 218 ± 586 |
Fig. 1. Substance use risk distribution in our dataset. (a) Substance use risk distribution among Instagram users in the dataset according to NIDA Modified ASSIST risk categories. (b) Binarized substance use risk distribution among Instagram users in the dataset.
Fig. 2. Our machine learning architecture for substance use risk estimation based on Instagram data. This architecture uses CNN and LSTM for feature extraction from images and text. A fully connected layer was trained to use the aggregated features to generate the final estimation model for substance use risk.
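The fusion step described in the Fig. 2 caption can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. It assumes that per-post feature vectors have already been produced by an image CNN and a text LSTM, that per-user aggregation is a simple mean (the source does not state the pooling method), and that the final layer is a single sigmoid unit. All function names and the toy numbers are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a user's per-post feature vectors into one vector (assumed pooling)."""
    if not vectors:
        raise ValueError("need at least one feature vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fuse_user_features(
    image_feats: Sequence[Vector],
    caption_feats: Sequence[Vector],
    comment_feats: Sequence[Vector],
) -> Vector:
    """Concatenate pooled image, caption, and comment features for one user."""
    return mean_pool(image_feats) + mean_pool(caption_feats) + mean_pool(comment_feats)


def risk_score(features: Vector, weights: Vector, bias: float) -> float:
    """Single fully connected unit with a sigmoid, giving a score in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy stand-ins for CNN (image) and LSTM (caption, comment) outputs; not study data.
image_feats = [[0.2, 0.4], [0.6, 0.0]]
caption_feats = [[1.0], [0.0]]
comment_feats = [[0.5]]

user_vector = fuse_user_features(image_feats, caption_feats, comment_feats)
# Untrained (all-zero) weights are used here only to show the call shape.
score = risk_score(user_vector, weights=[0.0, 0.0, 0.0, 0.0], bias=0.0)
```

In a real system the weights and bias would be learned from labeled users, and the pooled vectors would have hundreds of dimensions rather than one or two.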
Fig. 3. Evaluation of our approach on a held-out test set of 228 individuals. (a) Receiver operating characteristic (ROC) curves of our risk identification approach on the held-out test set for different substances. (b) The evaluation results of our machine learning model and the associated 95% confidence intervals for four different substances.
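For readers unfamiliar with the metric in Fig. 3, the sketch below computes an AUROC and an accompanying 95% interval on toy data. The source does not say how its confidence intervals were obtained; the percentile bootstrap used here is an assumption chosen for illustration, and the function names, labels, and scores are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for AUROC, resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain only one class
            continue
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data, not from the study: 1 = at risk, 0 = not at risk.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

point = auroc(labels, scores)  # 14 of 16 positive/negative pairs are ranked correctly
low, high = bootstrap_ci(labels, scores)
```

An AUROC of 0.5 corresponds to chance-level ranking, so an interval whose lower bound stays above 0.5 is one common way to argue that a model performs better than chance.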
The performance of different models and the contribution of different social media features and data types for alcohol use risk assessment
| Model | Feature/data type | AUROC |
|---|---|---|
| Logistic Regression Model (Baseline) | Face features | 0.54 |
| | Alcohol-related captions | 0.51 |
| | Alcohol-related comments | 0.50 |
| | Face features and alcohol-related captions | 0.54 |
| | Face features and alcohol-related comments | 0.54 |
| | Alcohol-related captions and comments | 0.51 |
| | All three features combined | 0.55 |
| Our Deep-Learning Model | Images only | 0.54 |
| | Captions only | 0.56 |
| | Comments only | 0.60 |
| | Images and captions | 0.56 |
| | Images and comments | 0.61 |
| | Captions and comments | 0.61 |
| | All three data types combined | 0.65 |