N Palmius, A Tsanas, K E A Saunders, A C Bilderbeck, J R Geddes, G M Goodwin, M De Vos.
Abstract
OBJECTIVE: This paper aims to identify periods of depression using geolocation movements recorded from mobile phones in a prospective community study of individuals with bipolar disorder (BD).
Year: 2016 PMID: 28113247 PMCID: PMC5947818 DOI: 10.1109/TBME.2016.2611862
Source DB: PubMed Journal: IEEE Trans Biomed Eng ISSN: 0018-9294 Impact factor: 4.538
Demographic Data for Participants Who Provided Sufficient Data for Analysis
| | HC | BD With Only QIDS Score < 11 | BD With QIDS Score ≥ 11 |
|---|---|---|---|
| Participants | 14 | 15 | 7 |
| Gender | Two male; 12 female | Five male; ten female | Two male; five female |
| Age (years, mean ± sd) | 42 ± 14 | 46 ± 14 | 41 ± 15 |
| BMI (kg/m², mean ± sd) | 24.5 ± 4.6 | 26.4 ± 3.4 | 29.3 ± 1.9 |
| Employment status | Six full-time employed; three part-time employed; three unemployed; one student; and one retired | Six full-time employed; six part-time employed; two students; and one unknown | Two full-time employed; two part-time employed; and three unemployed |
| Weeks of data (mean ± sd) | 8.9 ± 3.2 | 8.4 ± 2.8 | 8.6 ± 3.6 |
Fig. 1. Stacked QIDS score distribution in labeled weeks. The height of each bar shows the total number of labeled weeks, and the shading indicates how many come from each group.
Fig. 2. Filtering of inaccurate data for a typical weekend day [top, (a)–(c)] and weekday [bottom, (d)–(f)] from different participants. Plots (a) and (d) on the left show the coordinates where the participant was recorded relative to their home location. The dark orange star represents (0, 0), which is assumed to be the participant's home location, from which all other distances are computed. The surrounding ellipse shows the 95% confidence region of the points recorded at the assumed home location, modeled as a multivariate normal distribution. The purple points are accurate location readings, with the path between them joined by blue arrows. The broken arrow in (b) indicates a gap of more than 15 min with no recorded data between the two points, so the participant's location during that interval cannot be accurately determined. The yellow points are inaccurate noise recorded at the same time as the accurate readings. Graphs (b), (c), (e), and (f) on the right show the Euclidean distance from the participant's recorded location to their assumed home location over the course of the day. Graphs (b) and (e) show the original, unfiltered data with noise, and graphs (c) and (f) show the filtered data. The marker colors in graphs (b), (c), (e), and (f) correspond to the locations of the same color in plots (a) and (d). The blue shaded areas in (b) and (e) show rapid transitions between the purple (accurate) and yellow (inaccurate) locations (these transitions were excluded from plots (a) and (d) for clarity). The yellow points can be identified as inaccurate because they lie far from any plausible path in the plots on the left and occur concurrently with the locations on the plausible path in the graphs on the right. The size of each point in the plots on the left indicates the time spent at that location, scaled between 10 min and 1 h. Graphs (c) and (f) show the preprocessed data traces used for further analysis.
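The Fig. 2 caption describes two checks: whether a reading falls inside the 95% confidence ellipse of a normal distribution fitted to the home points, and whether consecutive readings are separated by a gap longer than 15 min. A minimal Python sketch of both checks follows, assuming planar (x, y) coordinates relative to home (for example in km) and timestamps in minutes. The function names are hypothetical, and using the ellipse as a membership test is an illustration rather than the authors' documented filtering rule. For two dimensions, the 95% threshold on the squared Mahalanobis distance is the chi-square quantile -2 ln 0.05 ≈ 5.991.

```python
import math

# Chi-square 95% quantile for 2 degrees of freedom: -2 ln(0.05) ≈ 5.991
CHI2_2DOF_95 = -2.0 * math.log(0.05)

def fit_bivariate_normal(points):
    """Mean and 2x2 sample covariance of (x, y) points recorded at home."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def inside_95_ellipse(point, mean, cov):
    """True if the point lies inside the 95% ellipse of the fitted normal."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix
    m2 = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return m2 <= CHI2_2DOF_95

def split_at_gaps(timestamps_min, max_gap_min=15.0):
    """Split sorted timestamps (minutes) wherever the gap exceeds max_gap_min."""
    segments, current = [], [timestamps_min[0]]
    for prev, t in zip(timestamps_min, timestamps_min[1:]):
        if t - prev > max_gap_min:
            segments.append(current)
            current = []
        current.append(t)
    segments.append(current)
    return segments
```

Splitting at gaps mirrors the broken arrow in the caption: no path is drawn across a segment boundary because the location in between is unknown.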
Feature Overview
| Feature Name | Abbr. | Description |
|---|---|---|
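| Entropy | ENT | A measure of the variability in the time that participants spend in the different locations recorded, defined as $\mathrm{ENT} = -\sum_{i=1}^{N} p_i \log p_i$, where $p_i$ is the fraction of time spent in location cluster $i$ and $N$ is the number of clusters. |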
| Normalized Entropy | NENT | A variant of the ENT feature scaled to the range [0, 1], defined as $\mathrm{NENT} = \mathrm{ENT} / \log N$. |
| Location Variance | LV | An indication of how much the individual moves between different locations, based on the sum of the statistical variances of the latitude and longitude, defined as $\mathrm{LV} = \log\left(\sigma_{\mathrm{lat}}^2 + \sigma_{\mathrm{lon}}^2\right)$. |
| Home Stay | HS | The percentage of time that the participant is recorded in their home location. |
| Transition Time | TT | The percentage of recorded time spent travelling between stationary locations. |
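| Total Distance | TD | The sum of Euclidean distances between consecutive recorded location points, calculated as $\mathrm{TD} = \sum_{t=1}^{T-1} \sqrt{(x_{t+1} - x_t)^2 + (y_{t+1} - y_t)^2}$, where $(x_t, y_t)$ are the coordinates of the $t$-th of $T$ recorded points. |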
| Number of Clusters | NC | The number of distinct location clusters extracted in the week-long data sections using the |
| Diurnal Movement | DM | A measure of daily regularity quantified using the Lomb–Scargle periodogram to determine the power in frequencies with wavelengths around 24 h. The power spectral density (PSD) of the signal in selected frequencies with wavelengths between 23.5 h and 24.5 h is calculated and averaged as |
| Diurnal Movement on Normalized Coordinates | DMN | Similar to the DM feature but calculated on a normalized set of coordinates, where the latitude and longitude are both scaled to have zero mean and unit variance within the period being classified. |
| Diurnal Movement on the Distance From Home | DMD | Similar to the DM and DMN features but calculated using the Euclidean distance from home, rather than latitude and longitude, normalized to have zero mean and unit variance within the period being classified. |
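The stationary-location features in the table above can be sketched in Python as follows. This is a hedged illustration, not the authors' code: it assumes evenly spaced samples that have already been assigned to clusters (hypothetical labels such as "home", with None marking samples in transit), uses planar (x, y) coordinates, and assumes LV takes the logarithm of the summed coordinate variances.

```python
import math
import statistics
from collections import Counter

def time_fractions(labels):
    """Fraction of stationary samples in each cluster; None marks samples in transit."""
    counts = Counter(c for c in labels if c is not None)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def entropy(fractions):
    """ENT: Shannon entropy of the time spent in each location cluster."""
    return -sum(p * math.log(p) for p in fractions.values() if p > 0)

def normalized_entropy(fractions):
    """NENT: ENT divided by log(N), giving a value in [0, 1]; 0 when N = 1."""
    n = len(fractions)
    return entropy(fractions) / math.log(n) if n > 1 else 0.0

def home_stay(labels, home="home"):
    """HS: percentage of all samples recorded at the home cluster."""
    return 100.0 * sum(c == home for c in labels) / len(labels)

def transition_time(labels):
    """TT: percentage of all samples spent travelling between clusters."""
    return 100.0 * sum(c is None for c in labels) / len(labels)

def total_distance(xy):
    """TD: sum of Euclidean distances between consecutive (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(xy, xy[1:]))

def location_variance(xy):
    """LV: log of the summed coordinate variances (assumed log form)."""
    return math.log(statistics.pvariance([p[0] for p in xy])
                    + statistics.pvariance([p[1] for p in xy]))
```

With evenly spaced samples, sample fractions equal time fractions; with irregular sampling each sample would need to be weighted by its duration.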
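The DM-family features average Lomb–Scargle power over periods between 23.5 h and 24.5 h. The sketch below implements the classic (unnormalized) Lomb–Scargle periodogram for one unevenly sampled signal, such as the distance from home used by DMD. The number of frequencies sampled in the band (11) and the function names are assumptions, and this sketch does not attempt to reproduce how latitude and longitude are combined for DM. Because the power is unnormalized, z-scoring the signal first (as for DMN and DMD) changes the result.

```python
import math
import statistics

def lomb_scargle(t, y, freqs):
    """Classic (unnormalized) Lomb–Scargle power of y sampled at times t (hours),
    evaluated at each frequency in freqs (cycles per hour)."""
    mean = statistics.fmean(y)
    yc = [v - mean for v in y]
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # Time offset tau that makes the sine and cosine terms orthogonal
        tau = math.atan2(sum(math.sin(2 * w * tj) for tj in t),
                         sum(math.cos(2 * w * tj) for tj in t)) / (2 * w)
        c = [math.cos(w * (tj - tau)) for tj in t]
        s = [math.sin(w * (tj - tau)) for tj in t]
        yc_c = sum(a * b for a, b in zip(yc, c))
        yc_s = sum(a * b for a, b in zip(yc, s))
        power.append(0.5 * (yc_c ** 2 / sum(a * a for a in c)
                            + yc_s ** 2 / sum(b * b for b in s)))
    return power

def diurnal_movement(t, y, n_freqs=11):
    """Mean power over frequencies whose periods lie between 23.5 h and 24.5 h."""
    f_lo, f_hi = 1.0 / 24.5, 1.0 / 23.5
    freqs = [f_lo + k * (f_hi - f_lo) / (n_freqs - 1) for k in range(n_freqs)]
    return statistics.fmean(lomb_scargle(t, y, freqs))

def zscore(y):
    """Zero mean and unit variance within the period being classified (as for DMN/DMD)."""
    m, sd = statistics.fmean(y), statistics.pstdev(y)
    return [(v - m) / sd for v in y]
```

A signal with a strong 24 h rhythm yields much larger DM than one dominated by shorter cycles, which is the regularity the feature is meant to capture.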
Fig. 3. Feature distributions for features calculated on the data subsets with optimal individual classification results. Feature abbreviations are given in Table II; WD: weekday data subset; OPTIMIZED: optimized daily exclusion data subset; MEDIAN: median data subset.
Fig. 4. Features extracted from the weekday geolocation data of HC and BD participants: (a) normalized entropy and (b) home stay. The standard linear regression model in (6) and the GLM with a quadratic model and logistic link function in (7), both fitted on the BD participants only, are overlaid on each feature. The dashed line marks the moderate depression threshold at a QIDS score of 11.
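Equation (7) is not reproduced in this excerpt, so the following is only a sketch of one way to fit a GLM with a quadratic predictor and logistic link to QIDS scores. It assumes the score is divided by its maximum of 27 so that the response lies in [0, 1], and it fits the coefficients by iteratively reweighted least squares (IRLS) with quasi-binomial weights. The names QIDS_MAX, fit_quadratic_logistic_glm, and predict_qids are hypothetical.

```python
import math

QIDS_MAX = 27.0  # assumed scaling: QIDS-SR16 totals range from 0 to 27

def _solve3(a, b):
    """Solve a 3x3 linear system a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x

def fit_quadratic_logistic_glm(x, qids, iters=25):
    """Fit mu = sigmoid(b0 + b1*x + b2*x^2) to QIDS/27 by IRLS (logit link)."""
    y = [q / QIDS_MAX for q in qids]
    rows = [[1.0, xi, xi * xi] for xi in x]
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        xtwx = [[0.0] * 3 for _ in range(3)]
        xtwz = [0.0] * 3
        for row, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, row))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)          # IRLS weight for the logit link
            z = eta + (yi - mu) / w      # working response
            for i in range(3):
                xtwz[i] += w * row[i] * z
                for j in range(3):
                    xtwx[i][j] += w * row[i] * row[j]
        beta = _solve3(xtwx, xtwz)
    return beta

def predict_qids(beta, x):
    """Predicted QIDS score, bounded to [0, 27] by the logistic link."""
    eta = beta[0] + beta[1] * x + beta[2] * x * x
    return QIDS_MAX / (1.0 + math.exp(-eta))
```

Because the logistic link bounds predictions to [0, 27], the fitted curve cannot leave the valid QIDS range, which a plain linear model can.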
Regression Model Error Rates on Optimal Data Subset
| Feature | Baseline MAE | Linear Model Data Subset | Linear Model MAE | Quadratic Logistic GLM Data Subset | Quadratic Logistic GLM MAE |
|---|---|---|---|---|---|
| ENT | 4.724 | Weekday | 4.432 | Weekday | |
| NENT | 4.724 | Optimized | 4.207 | Weekday | |
| LV | 4.724 | Optimized | 4.276 | Optimized | |
| HS | 4.724 | Weekday | 4.602 | Weekday | |
| TT | 4.724 | Weekend | 4.530 | Weekend | |
| TD | 4.724 | Base | | Median | 4.666 |
| NC | 4.724 | Weekend | | Weekend | 4.576 |
| DM | 4.724 | Optimized | 4.262 | Optimized | |
| DMN | 4.724 | Optimized | 4.629 | Optimized | |
| DMD | 4.724 | Optimized | | Optimized | 4.503 |
| Combined | 4.724 | Ten Features | 3.748 | 14 Features | |
Significance of the difference between each fitted model and the baseline model is indicated by asterisks: * p < 0.05; ** p < 0.01; *** p < 0.001.
Feature abbreviations are given in Table II; MAE: mean absolute error.
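The baseline MAE is identical (4.724) for every feature, which is consistent with a predictor that ignores the features, such as a constant. The sketch below compares a constant baseline with an ordinary least-squares line on one feature. Predicting the mean QIDS score for the baseline is an assumption (the median would minimize in-sample MAE), the fit is in-sample rather than cross-validated, and the function names are hypothetical.

```python
import statistics

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def fit_linear(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def baseline_vs_linear(x, qids):
    """In-sample MAE of a constant (mean) baseline and of a linear model on one feature."""
    mean_q = statistics.fmean(qids)
    baseline = mae(qids, [mean_q] * len(qids))
    a, b = fit_linear(x, qids)
    linear = mae(qids, [a + b * xi for xi in x])
    return baseline, linear
```

A feature is useful for regression only to the extent that its model's MAE falls below the baseline's, which is the comparison the table reports.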
Fig. 5. Classification results for the depression detection model (based on a labeled QIDS score ≥ 11) trained on BD participants only. Classification was performed using quadratic discriminant analysis (QDA) with leave-one-participant-out cross-validation and group equalization, using 100 iterations of group equalization for each left-out participant. Features were presented to the classifier in the order chosen by the wrapper feature selection method. Results are summarized as box plots showing the median and interquartile range, with outliers denoted by crosses. Dashed traces show the classification accuracy for full-time employed and unemployed participants only. Feature abbreviations are given in Table II; WD: weekday data subset; WE: weekend data subset.
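A minimal sketch of leave-one-participant-out evaluation with group equalization follows, assuming that "group equalization" means randomly undersampling the larger class in each training set so that both classes are equally represented. For brevity the classifier is a one-feature QDA (a Gaussian with its own mean and variance per class); a multi-feature QDA, for example scikit-learn's QuadraticDiscriminantAnalysis, would sit in the same loop. The tuple layout and function names are hypothetical.

```python
import math
import random
import statistics

QIDS_THRESHOLD = 11  # weeks with QIDS >= 11 are labeled as depressed

def is_depressed(qids):
    return int(qids >= QIDS_THRESHOLD)

def fit_qda_1d(values, labels):
    """One Gaussian (own mean and variance) and a prior per class."""
    params = {}
    for cls in set(labels):
        v = [x for x, lab in zip(values, labels) if lab == cls]
        params[cls] = (statistics.fmean(v), statistics.pvariance(v), len(v) / len(values))
    return params

def predict_qda_1d(params, x):
    """Return the class with the largest Gaussian log-posterior."""
    def log_post(cls):
        mu, var, prior = params[cls]
        return (-0.5 * math.log(2 * math.pi * var)
                - (x - mu) ** 2 / (2 * var) + math.log(prior))
    return max(params, key=log_post)

def equalize(train, rng):
    """Randomly undersample so every class has as many samples as the smallest one."""
    by_cls = {}
    for s in train:
        by_cls.setdefault(s[2], []).append(s)
    n = min(len(v) for v in by_cls.values())
    return [s for v in by_cls.values() for s in rng.sample(v, n)]

def lopo_group_equalized(samples, n_iter=100, seed=0):
    """samples: (participant_id, feature_value, label) tuples.
    Returns one test accuracy per (left-out participant, iteration)."""
    rng = random.Random(seed)
    accuracies = []
    for pid in sorted({s[0] for s in samples}):
        test = [s for s in samples if s[0] == pid]
        train = [s for s in samples if s[0] != pid]
        for _ in range(n_iter):
            balanced = equalize(train, rng)
            params = fit_qda_1d([s[1] for s in balanced], [s[2] for s in balanced])
            correct = sum(predict_qda_1d(params, s[1]) == s[2] for s in test)
            accuracies.append(correct / len(test))
    return accuracies
```

Holding out whole participants keeps a person's weeks out of their own training data, so the accuracy reflects generalization to unseen individuals.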
Fig. 6. Performance metrics of the leave-one-participant-out classifier trained with the five features that gave optimal classification accuracy. The confusion matrix (a) shows the classification of test samples of each class: each row is the true class and each column is the predicted class. The ROC graph (b) shows how the classifier performs on positive and negative test samples as the classification threshold is adjusted. The gray traces are the results from the individual models trained in cross-validation, and the thicker red trace is the mean. FPR: false positive rate; TPR: true positive rate.
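The confusion matrix and ROC curve in Fig. 6 can be computed from test labels and classifier scores roughly as sketched below (label 1 = depressed week, 0 = not depressed; the function names are hypothetical). The ROC points come from sweeping the threshold over every distinct score, and the AUC is the trapezoidal area under those points.

```python
def confusion_matrix(y_true, y_pred):
    """2x2 matrix: rows are true classes (0, 1), columns are predicted classes (0, 1)."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by sweeping the threshold over every distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Averaging such curves across the cross-validation models would give a mean trace like the red one in the figure.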
Depression Classification Results
| No. Features | LOO Cross-Validation Accuracy | LOO Cross-Validation AUC | 10-fold Cross-Validation Accuracy | 10-fold Cross-Validation AUC | 5-fold Cross-Validation Accuracy | 5-fold Cross-Validation AUC | 3-fold Cross-Validation Accuracy | 3-fold Cross-Validation AUC |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.747 ± 0.016 | 0.810 ± 0.014 | 0.747 ± 0.016 | 0.811 ± 0.017 | 0.753 ± 0.027 | 0.812 ± 0.028 | 0.758 ± 0.022 | 0.824 ± 0.011 |
| 2 | 0.785 ± 0.016 | 0.833 ± 0.016 | 0.780 ± 0.011 | 0.837 ± 0.016 | 0.780 ± 0.027 | 0.825 ± 0.026 | 0.785 ± 0.022 | 0.855 ± 0.010 |
| 3 | 0.806 ± 0.005 | 0.867 ± 0.012 | 0.801 ± 0.016 | 0.867 ± 0.014 | 0.801 ± 0.022 | 0.859 ± 0.021 | 0.817 ± 0.022 | 0.885 ± 0.012 |
| 4 | 0.839 ± 0.016 | 0.828 ± 0.016 | 0.871 ± 0.023 | 0.844 ± 0.022 | 0.878 ± 0.022 | 0.833 ± 0.016 | 0.900 ± 0.013 | |
| 5 | 0.878 ± 0.013 | 0.833 ± 0.022 | 0.876 ± 0.016 | 0.876 ± 0.011 | 0.909 ± 0.010 | |||
| 6 | 0.844 ± 0.016 | 0.871 ± 0.017 | 0.849 ± 0.022 | 0.887 ± 0.024 | 0.887 ± 0.016 | |||
| 7 | 0.844 ± 0.016 | 0.869 ± 0.014 | 0.833 ± 0.016 | 0.869 ± 0.021 | 0.839 ± 0.054 | 0.885 ± 0.078 | 0.908 ± 0.012 | |
| 8 | 0.879 ± 0.015 | 0.839 ± 0.016 | 0.869 ± 0.018 | 0.833 ± 0.027 | 0.882 ± 0.024 | 0.882 ± 0.022 | 0.908 ± 0.020 | |
| 9 | 0.844 ± 0.016 | 0.867 ± 0.015 | 0.839 ± 0.016 | 0.872 ± 0.018 | 0.817 ± 0.027 | 0.855 ± 0.042 | 0.876 ± 0.016 | 0.899 ± 0.021 |
All values are presented as median ± IQR. Classification was performed using 100 iterations of group equalization for each left-out participant or fold. 3-fold cross-validation was performed by splitting each participant's data into three partitions, which were then used as the cross-validation folds.
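The table note describes two procedures that can be sketched as follows: a per-participant 3-fold split, in which every participant contributes data to each fold, and a median ± IQR summary over repeated runs. Assigning weeks to partitions round-robin is an assumption (contiguous blocks would be equally consistent with the note), and the function names are hypothetical.

```python
import statistics

def per_participant_folds(weeks_by_participant, k=3):
    """Assign each participant's weeks round-robin to k partitions, so every
    participant contributes data to every fold (given at least k weeks)."""
    folds = [[] for _ in range(k)]
    for pid, weeks in sorted(weeks_by_participant.items()):
        for idx, week in enumerate(weeks):
            folds[idx % k].append((pid, week))
    return folds

def median_iqr(values):
    """Median and interquartile range (Q3 - Q1) of results from repeated runs."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1
```

Unlike leave-one-participant-out, this split lets the classifier see other weeks from the same participant during training, which is worth keeping in mind when comparing the columns.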