Sandrine R Müller, Xi Leslie Chen, Heinrich Peters, Augustin Chaintreau, Sandra C Matz.
Abstract
Depression is one of the most common mental health issues in the United States, affecting the lives of millions of people suffering from it as well as those close to them. Recent advances in research on mobile sensing technologies and machine learning have suggested that a person's depression can be passively measured by observing patterns in their mobility behaviors. However, the majority of work in this area has relied on highly homogeneous samples, most frequently college students. In this study, we analyse over 57 million GPS data points to show that the same procedure that leads to high prediction accuracy in a homogeneous student sample (N = 57; AUC = 0.82) leads to accuracies only slightly higher than chance in a U.S.-wide sample that is heterogeneous in its socio-demographic composition as well as mobility patterns (N = 5,262; AUC = 0.57). This pattern holds across three different modelling approaches which consider both linear and non-linear relationships. Further analyses suggest that the prediction accuracy is low across different socio-demographic groups, and that training the models on more homogeneous subsamples does not substantially improve prediction accuracy. Overall, the findings highlight the challenge of applying mobility-based predictions of depression at scale.
Year: 2021 PMID: 34234186 PMCID: PMC8263566 DOI: 10.1038/s41598-021-93087-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Predictive accuracies (AUC-ROC) and their 95% confidence intervals across the student and general population samples, using three different classifiers and four different feature sets. (A) shows the results of logistic regression predicting depression from different data sources in college students (light blue) and the general population sample (dark blue). It also highlights the most predictive features in the mobility model for the student sample; red indicates a negative and green a positive relationship with depression. (B) shows the corresponding results using random forest and XGBoost algorithms. Data collected using the MindDoc app (https://minddoc.de/app).
Figure 2. Average out-of-sample predictive performance (AUC) of different socio-demographic subpopulations as a function of sample size. Shapes indicate whether the subpopulation is based on a single socio-demographic variable (triangles) or a combination of two variables (circles).
Figure 3. Out-of-sample predictive accuracies (AUC) of models trained and validated on the data of different socio-demographic subpopulations as a function of sample size. Shapes indicate whether the subpopulation is based on a single socio-demographic variable (circles) or a combination of two variables (triangles).
Figure 4. (a) Distribution of out-of-sample predictive performance (AUC) of clusters determined by the K-means algorithm using different numbers of clusters. (b) Average out-of-sample predictive performance (AUC) of subsamples at various thresholds of GPS records (e.g., 2000 records and higher). The size of the circle indicates the number of participants in each subsample. The dotted line indicates the average AUC score in the MindDoc sample.
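The AUC values and 95% confidence intervals reported in this record can be read as follows: AUC is the probability that a randomly chosen depressed participant receives a higher predicted score than a randomly chosen non-depressed participant, so 0.5 corresponds to chance. The sketch below illustrates that definition in plain Python and pairs it with a percentile bootstrap interval. The bootstrap is an assumption made for illustration (this record does not state how the intervals were computed), the function names are hypothetical, and the labels and scores are made-up toy values, not study data.

```python
import random


def auc_roc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (illustrative assumption)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap AUC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; AUC is undefined, so redraw
        stats.append(auc_roc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy example: 1 = depressed, 0 = not depressed (invented values).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_roc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
toy_ci = bootstrap_ci(toy_labels, toy_scores, n_boot=200)
```

With only a handful of participants the bootstrap interval is very wide, which is one reason a high AUC in a small sample such as N = 57 carries more uncertainty than the same statistic in a sample of several thousand.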