Joanne Zhou, Bishal Lamichhane, Dror Ben-Zeev, Andrew Campbell, Akane Sano.
Abstract
BACKGROUND: Behavioral representations obtained from mobile sensing data can be helpful for the prediction of an oncoming psychotic relapse in patients with schizophrenia and the delivery of timely interventions to mitigate such relapse.
Keywords: Gaussian mixture models; balanced random forest; clustering; dynamic time warping; machine learning; mobile phone; partition around medoids; psychotic relapse; routine; schizophrenia
Year: 2022 PMID: 35404256 PMCID: PMC9039818 DOI: 10.2196/31006
Source DB: PubMed Journal: JMIR Mhealth Uhealth ISSN: 2291-5222 Impact factor: 4.947
Figure 1. Sequential relapse prediction approach used in this study. Features are extracted from a 4-week period to predict whether a relapse will occur in the following week.
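The windowing in Figure 1 can be sketched as follows. This is a minimal illustration, assuming days are indexed by integers, windows slide by one day, and a window is labeled positive if any relapse day falls in the 7 days right after its 28-day feature window. The stride and the function name `make_windows` are hypothetical; the source specifies only the 4-week feature period and the 1-week horizon.

```python
from typing import List, Tuple

FEATURE_WINDOW_DAYS = 28    # 4-week feature extraction period (Figure 1)
PREDICTION_WINDOW_DAYS = 7  # "coming week" prediction horizon (Figure 1)


def make_windows(num_days: int, relapse_days: List[int],
                 step: int = 1) -> List[Tuple[range, int]]:
    """Return (feature_day_range, label) pairs for one patient.

    Assumption: label is 1 if any relapse day lies in the 7 days that
    immediately follow the 28-day feature window, else 0. The daily
    stride (step=1) is also an assumption.
    """
    relapse = set(relapse_days)
    windows = []
    last_start = num_days - FEATURE_WINDOW_DAYS - PREDICTION_WINDOW_DAYS
    for start in range(0, last_start + 1, step):
        feat_end = start + FEATURE_WINDOW_DAYS  # exclusive end of feature window
        pred_days = range(feat_end, feat_end + PREDICTION_WINDOW_DAYS)
        label = int(any(d in relapse for d in pred_days))
        windows.append((range(start, feat_end), label))
    return windows


if __name__ == "__main__":
    labels = [label for _, label in make_windows(40, relapse_days=[30])]
    print(labels)  # [1, 1, 1, 0, 0, 0]
```

With a relapse on day 30, only the windows starting on days 0 to 2 have day 30 inside their prediction week, so they are the only positive samples.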
Features used in the relapse prediction models. Baseline features are derived from a previous study [19]. We evaluated whether the clustering-based features could improve relapse prediction by complementing the baseline features, which capture changes in daily behavioral rhythms.

| Feature set and modalities | Features |
|---|---|
| **Baseline features** | |
| Accelerometer magnitude, ambient light, distance traveled, call duration, sound level, and conversation duration | Mean daily template features (mean, SD, max, range, skewness, and kurtosis), SD template features (mean), absolute difference between mean and maximum template (max), distance between normalized mean templates, weighted distance between normalized mean templates, distance between normalized maximum template and mean template, and daily averages (mean and SD) |
| 10-item EMAs^a | Mean and SD of EMA items in the feature extraction window |
| Screen use; distance-based mobility features (distance from home and total movement); and duration-based mobility features (time spent at a location and time spent at home) | Mean and SD of daily averages in the feature extraction window |
| **Clustering-based features** | |
| GMM^b features | Mean and SD of GMM label and GMM likelihood scores, number of cluster transitions, and number of cluster states |
| PAM^c features | Mean and SD of PAM label, PAM distance scores, and DTW^d difference from the previous day; number of cluster transitions; and number of cluster states |
| **Demographic features** | Age and years of education |

^a EMA: ecological momentary assessment.
^b GMM: Gaussian mixture model.
^c PAM: partition around medoids.
^d DTW: dynamic time warping.
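The window-level clustering features in the table above (mean and SD of the daily cluster label and score, number of cluster transitions, and number of cluster states) can be sketched as below. This is a minimal illustration under stated assumptions: each day in the feature window already has an integer cluster label and a score (GMM likelihood or PAM distance), a "transition" is any change of label between consecutive days, the number of states is the number of distinct labels, and SD is the population SD. The function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def cluster_sequence_features(labels: List[int],
                              scores: List[float]) -> Dict[str, float]:
    """Summarize one feature window of daily cluster labels and scores.

    Assumptions: labels are treated as numbers for mean/SD (as the
    "mean and SD of label" features imply), and pstdev (population SD)
    is used; the source does not specify sample vs population SD.
    """
    # Count day-to-day changes of cluster assignment.
    transitions = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return {
        "label_mean": float(mean(labels)),
        "label_sd": float(pstdev(labels)),
        "score_mean": float(mean(scores)),
        "score_sd": float(pstdev(scores)),
        "n_transitions": float(transitions),
        "n_states": float(len(set(labels))),  # distinct clusters visited
    }


if __name__ == "__main__":
    feats = cluster_sequence_features([0, 0, 2, 2, 1, 1, 1], [1.0] * 7)
    print(feats["n_transitions"], feats["n_states"])  # 2.0 3.0
```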
Figure 2. Personalization approach for the relapse prediction model [19]. A personalization subset, consisting of data from the patients closest in age to the test patient, is used to identify the best feature sets, which are then used to train a (personalized) relapse prediction model.
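Selecting the personalization subset described in Figure 2 can be sketched as a nearest-neighbor search on age. This assumes patients are identified by string IDs and that the subset size `k` is a free parameter; the source does not state the subset size here, and the function name is hypothetical.

```python
from typing import Dict, List


def personalization_subset(test_age: float, train_ages: Dict[str, float],
                           k: int) -> List[str]:
    """Return IDs of the k training patients closest in age to the test patient.

    Ties in age difference are broken by patient ID so the result is
    deterministic (an assumption, not part of the source method).
    """
    ranked = sorted(train_ages,
                    key=lambda pid: (abs(train_ages[pid] - test_age), pid))
    return ranked[:k]


if __name__ == "__main__":
    ages = {"p1": 25, "p2": 40, "p3": 33, "p4": 31}
    print(personalization_subset(32, ages, k=2))  # ['p3', 'p4']
```

The selected patients' data would then be used to choose feature sets before training the personalized model; that selection step is not shown.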
Figure 3. Trace of the sample covariance matrix of each cluster obtained with the Gaussian mixture model (GMM) and partition around medoids (PAM) clustering approaches. A lower trace indicates a more homogeneous cluster, that is, one with lower within-cluster variability.
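The homogeneity measure in Figure 3 relies on the fact that the trace of a sample covariance matrix equals the sum of the per-dimension sample variances, so the full matrix never needs to be formed. The sketch below assumes each day is represented as a fixed-length numeric vector and that every cluster has at least two members; the function names are hypothetical.

```python
from statistics import variance
from typing import Dict, List, Sequence


def covariance_trace(points: Sequence[Sequence[float]]) -> float:
    """Trace of the sample covariance matrix (sum of per-dimension sample variances)."""
    dims = len(points[0])
    return float(sum(variance([p[d] for p in points]) for d in range(dims)))


def trace_per_cluster(points: Sequence[Sequence[float]],
                      labels: Sequence[int]) -> Dict[int, float]:
    """Group points by cluster label and compute each cluster's covariance trace."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    # statistics.variance needs at least two points, so singleton clusters are skipped.
    return {lab: covariance_trace(g) for lab, g in groups.items() if len(g) >= 2}


if __name__ == "__main__":
    pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 14.0)]
    print(trace_per_cluster(pts, [0, 0, 1, 1]))  # {0: 2.0, 1: 8.0}
```

Cluster 0 is tighter (trace 2.0) than cluster 1 (trace 8.0), which is the reading Figure 3 invites.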
Figure 4. Average daily templates of two signal modalities, acceleration (top) and conversation time (bottom), in the clusters obtained from the Gaussian mixture model (GMM) and partition around medoids (PAM) models. Different clusters capture different behavioral patterns.
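The PAM feature set includes a DTW difference from the previous day, that is, a dynamic time warping distance between consecutive daily templates. A minimal DTW sketch is shown below, assuming univariate templates, absolute difference as the local cost, and no warping-window constraint; the source does not specify these choices, and the function names are hypothetical. DTW lets a pattern that is merely shifted in time (for example, waking up an hour later) count as similar.

```python
from typing import List, Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping with absolute-difference cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def dtw_from_previous_day(templates: Sequence[Sequence[float]]) -> List[float]:
    """DTW distance between each daily template and the one before it."""
    return [dtw_distance(prev, cur) for prev, cur in zip(templates, templates[1:])]


if __name__ == "__main__":
    # A one-step time shift costs nothing under DTW.
    print(dtw_distance([0, 0, 1], [0, 1, 1]))  # 0.0
```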
All cluster profiles obtained from the GMM^a and PAM^b models, in descending order of cluster size. Each cluster is associated with a distinctive behavioral pattern, as can be seen from the typical profile of each signal modality in that cluster.

| Cluster size rank | GMM cluster profile | GMM cluster size (days) | PAM cluster profile | PAM cluster size (days) |
|---|---|---|---|---|
| 1 | No app use, high conversation and SMS text messaging, and other attributes are approximately average | 5217 | Low acceleration, conversation, volume, and sleep duration and very low variability in sleep and volume templates | 3318 |
| 2 | Highest app use and phone calls; high acceleration, conversation, SMS text messaging, distance moved, and volume; early wake up (at approximately 7 AM); and no sleep during the day | 3993 | High volume and SMS text messaging and constantly low sleep template | 3300 |
| 3 | Almost all sensor readings near 0 | 2580 | Conversation and volume sharply increase after 6 AM, highest volume, low phone use before 7 AM, wake up at approximately 7 AM, and sleep at approximately 9 PM | 2728 |
| 4 | Highest acceleration, low phone calls, early wake up (at approximately 7 AM), and no sleep during the day | 1883 | High app use, SMS text messaging, and distance moved around midnight and below average acceleration | 2699 |
| 5 | High acceleration after midnight, high phone calls and SMS text messaging, high overall volume even at night, and late sleep and wake up | 1484 | Lowest acceleration (close to 0) and app use, and constantly high screen time and sleep duration | 2378 |
| 6 | Below average volume and distance, wake up after 11 AM, and sleep during the day | 1298 | High phone calls and SMS text messaging, screen time sharply increases after 6 AM, wake up at approximately 9 AM, sleep at approximately 11 PM, and awake during the day | 1686 |
| 7 | Activity level and phone use are high during the day, inactive at night, short sleep duration, high number of phone calls, and acceleration increases after 3 PM | 1046 | Below average screen time and long sleep time (wake up around noon) | 1405 |
| 8 | No app use; low conversation, SMS text messaging, and volume; and long sleep even during the day | 523 | Low phone calls, SMS text messaging, and screen time; high volume at night; and constant long sleep (wake up in the afternoon) | 752 |
| 9 | Accelerometer readings close to 0; low app use, conversation, and volume; phone screen is constantly on; and long sleep duration even during the day | 412 | High app use and distance moved around noon; templates in this cluster have high dissimilarities | 170 |

^a GMM: Gaussian mixture model.
^b PAM: partition around medoids.
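The PAM clusters above group days around representative days (medoids). A compact sketch of the classic BUILD and SWAP phases of partition around medoids is shown below. It works on a precomputed distance matrix; using DTW distances between daily templates for that matrix is an assumption, as are the first-improvement swap strategy and the function names. The repeated full cost evaluation is fine for illustration but far too slow for the 18,436 days clustered in this study.

```python
from typing import List, Sequence, Tuple


def total_cost(dist: Sequence[Sequence[float]], medoids: List[int]) -> float:
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist: Sequence[Sequence[float]], k: int) -> Tuple[List[int], List[int]]:
    """Return (medoid indices, cluster label per point)."""
    n = len(dist)
    # BUILD: greedily add the point that most reduces the total cost.
    medoids: List[int] = []
    for _ in range(k):
        best = min((c for c in range(n) if c not in medoids),
                   key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)
    # SWAP: replace a medoid with a non-medoid whenever that lowers the cost.
    improved = True
    while improved:
        improved = False
        current = total_cost(dist, medoids)
        for mi in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:mi] + [c] + medoids[mi + 1:]
                cost = total_cost(dist, trial)
                if cost < current:
                    medoids, current, improved = trial, cost, True
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


def distance_scores(dist, medoids: List[int], labels: List[int]) -> List[float]:
    """Distance of each point to its assigned medoid (a PAM "distance score")."""
    return [dist[i][medoids[lab]] for i, lab in enumerate(labels)]


if __name__ == "__main__":
    pts = [0, 1, 2, 10, 11, 12]
    d = [[abs(a - b) for b in pts] for a in pts]
    print(pam(d, 2))  # ([1, 4], [0, 0, 0, 1, 1, 1])
```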
Figure 5. Time series of cluster assignments obtained from the Gaussian mixture model (GMM) and partition around medoids (PAM) models (left) and of the weighted average likelihood score and distance score (right) for a sample patient. The cluster features change near the relapse (vertical red line).
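A GMM label and an assigned-cluster likelihood for one day can be sketched as follows, given already fitted mixture parameters. This assumes diagonal covariances, that the label is the component with the highest posterior probability, and that the "likelihood score" is the log density under the assigned component. The exact definition of the weighted average likelihood score in Figure 5 is not given in this section, so it is not reproduced; fitting the mixture by expectation maximization is also omitted, and the parameter values in the example are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mi) ** 2 / v)
               for xi, mi, v in zip(x, mean, var))


def gmm_assign(x: Sequence[float], weights: Sequence[float],
               means: Sequence[Sequence[float]],
               variances: Sequence[Sequence[float]]) -> Tuple[int, float, List[float]]:
    """Return (label, log-likelihood under assigned component, posteriors)."""
    log_joint = [math.log(w) + log_gaussian_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
    # log-sum-exp keeps the posterior computation numerically stable
    top = max(log_joint)
    norm = top + math.log(sum(math.exp(lj - top) for lj in log_joint))
    posteriors = [math.exp(lj - norm) for lj in log_joint]
    label = max(range(len(weights)), key=lambda k: posteriors[k])
    assigned_ll = log_gaussian_diag(x, means[label], variances[label])
    return label, assigned_ll, posteriors


if __name__ == "__main__":
    params = ([0.5, 0.5], [[0.0, 0.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]])
    label, ll, post = gmm_assign([5.0, 4.9], *params)
    print(label)  # 1
```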
Figure 6. Boxplots of the clustering features (Gaussian mixture model likelihood scores, top; partition around medoids distance scores, bottom) in the x days near relapse (NRx) and in all days before relapse that are not in NRx (pre-NRx). Bars indicate a nonnegligible effect size.
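This section does not name the effect size measure behind Figure 6. One common nonparametric choice for comparing two groups such as NRx and pre-NRx days is Cliff's delta, with |delta| < 0.147 often treated as negligible; the sketch below assumes that measure and threshold, and the function names are hypothetical.

```python
from typing import Sequence


def cliffs_delta(a: Sequence[float], b: Sequence[float]) -> float:
    """Cliff's delta: P(a > b) - P(a < b) over all cross-group pairs."""
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    return (greater - less) / (len(a) * len(b))


def is_nonnegligible(delta: float, threshold: float = 0.147) -> bool:
    """Assumed cut-off: |delta| below 0.147 is commonly labeled negligible."""
    return abs(delta) >= threshold


if __name__ == "__main__":
    near_relapse = [3.0, 4.0, 5.0]  # hypothetical scores in NRx days
    earlier = [1.0, 2.0, 3.0]       # hypothetical scores in pre-NRx days
    d = cliffs_delta(near_relapse, earlier)
    print(round(d, 3), is_nonnegligible(d))  # 0.889 True
```

The pairwise loop is quadratic; for long recordings a rank-based formulation would be faster, but the result is the same.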
The top 10 features in the relapse prediction pipeline based on the entire feature set (baseline and clustering-based features), ranked by how often each feature was selected across the cross-validation loops. This selection frequency is used to assess which features are most important for relapse prediction. Note that the number of selected features differs between cross-validation loops because it is a hyperparameter tuned with a nested cross-validation loop.

| Features | Frequency (normalized) |
|---|---|
| Baseline feature–distance template skewness | 0.19 |
| Clustering feature–mean PAM^a label | 0.17 |
| Clustering feature–mean PAM weighted distance | 0.14 |
| Baseline feature–conversation template skewness | 0.14 |
| Clustering feature–number of transitions | 0.12 |
| Clustering feature–SD GMM^b label | 0.10 |
| Clustering feature–SD PAM label | 0.10 |
| Clustering feature–mean GMM assigned cluster likelihood | 0.10 |
| Baseline feature–conversation template kurtosis | 0.08 |
| Baseline feature–volume template range | 0.08 |

^a PAM: partition around medoids.
^b GMM: Gaussian mixture model.
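Ranking features by selection frequency, as in the table above, can be sketched as counting how many cross-validation folds selected each feature. Normalizing by the number of folds is an assumption; the source does not state the normalization used, so this sketch is not expected to reproduce the reported values. The function name is hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple


def selection_frequency(selected_per_fold: List[Set[str]],
                        top: int = 10) -> List[Tuple[str, float]]:
    """Return the `top` features by fraction of folds in which they were selected.

    Folds may select different numbers of features (the count is a tuned
    hyperparameter), so each fold contributes a set of any size.
    """
    counts = Counter(f for fold in selected_per_fold for f in fold)
    n_folds = len(selected_per_fold)
    # Sort by descending count, then alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, count / n_folds) for name, count in ranked[:top]]


if __name__ == "__main__":
    folds = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}]
    print(selection_frequency(folds))  # [('a', 0.75), ('b', 0.5), ('c', 0.25)]
```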
Relapse prediction performance with different feature sets. The baseline features introduced in the previous study [19] are complemented with clustering-based features for evaluation. The performance of the GMM^a-based and PAM^b-based feature sets is also evaluated separately^c.

| Method | F2 score (precision/recall) |
|---|---|
| All features | 0.23 (0.063/0.662) |
| Baseline features [19] | 0.18 (0.055/0.400) |
| Clustering features | 0.14 (0.035/0.487) |
| GMM features | 0.16 (0.042/0.487) |
| PAM features | 0.19 (0.042/0.525) |
| GMM+baseline features | 0.19 (0.052/0.525) |
| PAM+baseline features | 0.16 (0.045/0.438) |

^a GMM: Gaussian mixture model.
^b PAM: partition around medoids.
^c Random classification baseline: mean score 0.042 (SD 0.020).
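The F2 score is the F-beta score with beta = 2, which weights recall more heavily than precision. This suits relapse prediction, where missing a relapse is costlier than a false alarm. The standard formula is sketched below. If the reported scores are averaged over folds or patients, recomputing them from the averaged precision and recall need not match every row exactly.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta=2 gives the F2 score used in the table above."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # "All features" row: precision 0.063, recall 0.662
    print(round(f_beta(0.063, 0.662), 2))  # 0.23
```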