Alina Haines-Delmont, Gurdit Chahal, Ashley Jane Bruen, Abbie Wall, Christina Tara Khan, Ramesh Sadashiv, David Fearnley.
Abstract
BACKGROUND: Digital phenotyping and machine learning are currently being used to augment or even replace traditional analytic procedures in many domains, including health care. Given the heavy reliance on smartphones and mobile devices around the world, this readily available source of data is an important and highly underutilized source that has the potential to improve mental health risk prediction and prevention and advance mental health globally.
Keywords: cell phone; digital phenotyping; machine learning; nearest neighbor algorithm; smartphone; suicidal ideation; suicide
Year: 2020 PMID: 32442152 PMCID: PMC7380988 DOI: 10.2196/15901
Source DB: PubMed Journal: JMIR Mhealth Uhealth ISSN: 2291-5222 Impact factor: 4.773
Figure 1. Strength Within Me study flow diagram. Timeframe for recruitment: January-November 2018. Included in the analysis: participants who completed the C-SSRS at the second follow-up. C-SSRS: Columbia Suicide Severity Rating Scale.
Strength Within Me study data.
| Data source | Examples of variables collected | Examples of raw data | Examples of derived data |
| Facebook | Stats of activity and post activity | Number of posts: 5 and number of total likes: 100 | Average likes per post: 20 |
| User input | Journal, mood, reminders, and safety plan steps | Journal entry: “Last night was horrible. I couldn’t sleep at all with the noise.” | Sentiment: −0.8 and word count: 12 |
| Clinical team | Demographics and C-SSRSa responses | Age: 35 years and C-SSRS risk overall: moderate | C-SSRS risk binary: 1 |
| Passive sensor data | Sleep, steps, and interactions | { | Sleep latency: 1 min and average time asleep: 5 hours |
aC-SSRS: Columbia Suicide Severity Rating Scale.
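The journal-entry row above pairs raw text with derived features (sentiment and word count). A minimal sketch of that derivation, assuming a whitespace word count and a toy lexicon-based sentiment score — the study's actual sentiment method is not given in this excerpt, so the lexicons and the scoring rule here are purely illustrative:

```python
# Toy stand-in lexicons (assumptions, not from the study).
NEGATIVE = {"horrible", "couldn't", "noise"}
POSITIVE = {"great", "calm", "rested"}

def derive_features(entry: str) -> dict:
    """Turn a raw journal entry into simple derived features."""
    words = entry.split()                               # whitespace word count
    tokens = [w.strip(".,!?").lower() for w in words]   # crude normalization
    neg = sum(t in NEGATIVE for t in tokens)
    pos = sum(t in POSITIVE for t in tokens)
    total = neg + pos
    sentiment = (pos - neg) / total if total else 0.0   # in [-1, 1]
    return {"word_count": len(words), "sentiment": sentiment}

entry = "Last night was horrible. I couldn't sleep at all with the noise."
print(derive_features(entry))  # word_count: 12, sentiment: negative
```

The word count of 12 matches the table; the toy sentiment score is merely negative in sign, not calibrated to the −0.8 shown.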
Engagement rate across active and passive data in the study (N=66).
| Data source | Rate, n (%) |
| Step-related features (Fitbit and iPhone) | 26 (40) |
| Journal entries (self-documented via SWiMa app) | 45 (68) |
| Mood entries (self-reported via SWiM app) | 53 (80) |
| Phone activity (data usage) | 66 (100) |
| Sleep (Fitbit) | 59 (90) |
aSWiM: Strength Within Me.
Figure 2. Example of a decision tree formed for the Strength Within Me study data.
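A decision tree like the one in Figure 2 is built by repeatedly choosing the split that best separates the classes. A minimal sketch of a single split search using Gini impurity — the feature ("average hours asleep") and the values below are hypothetical, not taken from the study data:

```python
def gini(labels):
    """Gini impurity of a list of binary labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Try each threshold on one feature; keep the lowest weighted Gini."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        thresh = (pairs[i - 1][0] + pairs[i][0]) / 2
        if score < best[0]:
            best = (score, thresh)
    return best

# Hypothetical feature: average hours asleep vs. binary risk label.
score, thresh = best_split([4.5, 5.0, 5.5, 7.0, 7.5, 8.0], [1, 1, 1, 0, 0, 0])
print(thresh)  # 6.25: perfectly separates the toy labels (impurity 0)
```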
Figure 3. A diagram of principal component analysis. A high-dimensional dataset has been flattened to a 2-dimensional space where the new axes correspond to the principal components (they point in the direction of the largest variance of the data).
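The flattening described in Figure 3 can be sketched in a few lines: center the data, take the top singular vectors as the principal components, and project. This is an illustrative implementation on random data, not the study's code:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))          # 100 samples, 6 features (synthetic)

Xc = X - X.mean(axis=0)                # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]                    # top-2 directions of largest variance
X2 = Xc @ components.T                 # flattened 2-dimensional representation

explained = (S ** 2) / (S ** 2).sum()  # fraction of variance per component
print(X2.shape)                        # (100, 2)
```

By construction the first projected axis carries at least as much variance as the second, which is what the new axes in Figure 3 represent.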
Principal components analysis components and patterns.
| Component | Description | Themes, patterns |
| First component | Maximum efficiency, average efficiency, median efficiency, max time in bed, and number of sleep recordings | Ability to sleep, sleep quality |
| Second component | Number of packets sent, number of times connected to Wi-Fi, number of times connected to cellular data plan, and number of times journal entered | User app activity, data presence |
| Third component | SD sleep start, median journal feeling, max sleep start, max journal feeling, minutes in bed, and minimum journal feeling | Feeling versus sleep activity |
| Fourth component | Median char length, median word length, median journal feeling, SD rest duration, and max rest duration | Journal input versus resting variability |
| Fifth component | Median sentiment, SD number of awakenings during sleep, number of awakenings during sleep, and minimum sentiment | Sleep quality and reflection tone |
The average cross-validation accuracy, along with the SD of the accuracy observed for the various folds.
| Algorithm | 10-fold CVa average accuracy (10,000 iterations) | SD | Comments |
| K-nearest neighbors (k=2)+PCAb (n=5) | 0.68 | 0.12 | Best performance, k=2 seemed natural and worked best up to 10 |
| Random forest (k=25)+PCA (n=5) | 0.60 | 0.13 | Nonlinear helps, too many trees did not, PCA reduced deviation |
| Random forest on raw features (k=25) | 0.60 | 0.15 | Nonlinear helps, too many trees did not |
| SVMc (degree 2 polynomial kernel) | 0.57 | 0.10 | Likely overfit, base guessing |
| Logistic regression+PCA (n=5) | 0.59 | 0.14 | Removed correlation due to PCA+prevent overfitting |
| Logistic regression on raw features | 0.55 | 0.16 | Likely overfit, base guessing |
| Baseline: guessing majority from training fold | 0.53 | 0.20 | Baseline to beat |
aCV: cross-validation.
bPCA: principal component analysis.
cSVM: support vector machine.
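The "baseline to beat" row corresponds to predicting, for every test sample, the majority class of the training fold. A stdlib-only sketch of that 10-fold procedure — the labels below are synthetic stand-ins for the 66 participants' binary C-SSRS risk, and the assumed class balance is illustrative:

```python
import random
from collections import Counter

random.seed(0)
y = [1] * 35 + [0] * 31            # assumed class balance, for illustration
random.shuffle(y)

def majority_baseline_cv(y, folds=10):
    """Average accuracy of majority-class guessing across k folds."""
    n = len(y)
    idx = list(range(n))
    fold_sizes = [n // folds + (1 if i < n % folds else 0) for i in range(folds)]
    accs, start = [], 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        start += size
        majority = Counter(y[i] for i in train).most_common(1)[0][0]
        accs.append(sum(y[i] == majority for i in test) / len(test))
    return sum(accs) / folds

acc = majority_baseline_cv(y)
print(round(acc, 2))
```

With a near-balanced split, this baseline hovers just above 0.5, consistent with the 0.53 reported in the table.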
Figure 4. Example of nearest neighbors with k=2 on data in 2 dimensions. Here, the new test point is x and has 1 minus neighbor and 1 plus neighbor as its 2 closest neighbors. As the minus neighbor is closer, the new point x is classified as minus. "+" stands for the positive class, "-" for the negative class, and "x" for a new data point that has yet to be assigned a class.
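The rule in Figure 4 — take the 2 closest training points and, when they disagree, let the nearer one decide — can be sketched directly. The training points and labels below are made up for illustration:

```python
import math

# Toy labeled points in 2 dimensions (illustrative, not study data).
train = [((0.0, 0.0), "-"), ((-3.0, 0.0), "-"),
         ((4.0, 4.0), "+"), ((5.0, 5.0), "+")]

def knn2(x):
    """Classify x by its 2 nearest neighbors; ties go to the closer one."""
    ranked = sorted(train, key=lambda p: math.dist(x, p[0]))
    first, second = ranked[0][1], ranked[1][1]
    if first == second:
        return first
    return first  # one "+" and one "-": the nearer neighbor wins

print(knn2((1.5, 1.5)))  # "-": nearest neighbors are one "-" then one "+"
print(knn2((4.5, 4.5)))  # "+": both nearest neighbors are "+"
```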
Figure 5. Receiver operating characteristic curve for k-nearest neighbors. AUC: area under the curve.
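The AUC summarized by Figure 5 equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A self-contained sketch of that rank-based computation, with illustrative scores (not the study's classifier outputs):

```python
def auc(scores_pos, scores_neg):
    """AUC as the win rate of positive scores over negative scores."""
    wins = ties = 0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(scores_pos) * len(scores_neg))

# Illustrative scores: positives mostly outrank negatives.
print(auc([0.9, 0.8, 0.6], [0.7, 0.4, 0.3]))  # 8/9 ≈ 0.889
```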