Michael Leo Birnbaum, Prathamesh Param Kulkarni, Anna Van Meter, Victor Chen, Asra F Rizvi, Elizabeth Arenare, Munmun De Choudhury, John M Kane.
Abstract
BACKGROUND: Psychiatry relies almost entirely on patient self-reporting, and few objective, reliable tests or sources of collateral information are available to support diagnostic and assessment procedures. Technology offers opportunities to collect objective digital data to complement patient experience and facilitate more informed treatment decisions.
Keywords: Google; diagnostic prediction; digital biomarkers; digital data; digital phenotyping; internet search activity; machine learning; relapse prediction; schizophrenia spectrum disorders
Year: 2020; PMID: 32870161; PMCID: PMC7492982; DOI: 10.2196/19348
Source DB: PubMed; Journal: JMIR Ment Health; ISSN: 2368-7959
Feature categories along with the dimensionality of each feature type.
| Feature type | Dimensions |
| 24-hour histogram of length of queries with 1-h bin | 24 |
| 24-hour histogram of frequency of queries with 1-h bin | 24 |
| 32-day histogram of length of queries with 4-day bin | 8 |
| 32-day histogram of frequency of queries with 4-day bin | 8 |
| SD of 4-day frequency of queries bins | 1 |
| SD of 4-day length of queries bins | 1 |
| Average of the derivative of 4-day frequency of queries bins | 1 |
| Average of the derivative of 4-day length of queries bins | 1 |
| SD of the derivative of 4-day frequency of queries bins | 1 |
| SD of the derivative of 4-day length of queries bins | 1 |
| Linguistic inquiry and word count | 51 |
| Total number of queries in 1 month | 1 |
| Average query length in 1 month | 1 |
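The table above fixes only the dimensionality of each feature type (24 + 24 + 8 + 8 + 6 + 51 + 2 = 123). A minimal sketch of how such a vector could be assembled follows. It assumes that query length is measured in characters, that each length histogram holds the mean length per bin, that the derivative is the first difference between consecutive 4-day bins, that the SD is the population SD, and that the 51 LIWC scores arrive precomputed. `build_feature_vector` and its parameters are hypothetical names, not the authors' code.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def _bin_means(sums, counts):
    # Mean query length per bin; empty bins contribute 0.0 (an assumption)
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def _diff(values):
    # First difference between consecutive bins (a discrete derivative)
    return [b - a for a, b in zip(values, values[1:])]


def build_feature_vector(queries, window_end, liwc_vector):
    """queries: iterable of (datetime, str); liwc_vector: 51 precomputed scores."""
    if len(liwc_vector) != 51:
        raise ValueError("expected 51 LIWC dimensions")
    window_start = window_end - timedelta(days=32)
    window = [(t, q) for t, q in queries if window_start <= t < window_end]

    hour_n, hour_len = [0] * 24, [0] * 24  # 1-h bins over the day
    day_n, day_len = [0] * 8, [0] * 8      # 4-day bins over 32 days
    for t, q in window:
        b = (t - window_start).days // 4   # 0..7 because t < window_end
        hour_n[t.hour] += 1
        hour_len[t.hour] += len(q)
        day_n[b] += 1
        day_len[b] += len(q)

    len_by_hour = _bin_means(hour_len, hour_n)
    len_by_bin = _bin_means(day_len, day_n)
    freq_by_bin = [float(n) for n in day_n]
    d_freq, d_len = _diff(freq_by_bin), _diff(len_by_bin)
    lengths = [len(q) for _, q in window]

    return (
        len_by_hour                                   # 24: length, 1-h bins
        + [float(n) for n in hour_n]                  # 24: frequency, 1-h bins
        + len_by_bin                                  # 8: length, 4-day bins
        + freq_by_bin                                 # 8: frequency, 4-day bins
        + [pstdev(freq_by_bin), pstdev(len_by_bin)]   # 2: SD of bins
        + [mean(d_freq), mean(d_len)]                 # 2: mean derivative
        + [pstdev(d_freq), pstdev(d_len)]             # 2: SD of derivative
        + list(liwc_vector)                           # 51: LIWC
        + [float(len(window)), mean(lengths) if lengths else 0.0]  # 2: totals
    )
```

The concatenation order follows the row order of the table, so a position in the vector can be traced back to a feature type.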
Participant demographics (N=116).
| Characteristic | Value |
| Age (years), mean (SD) | 24.38 (5.18) |
| Male, n (%) | 51 (44.0) |
| Female, n (%) | 65 (56.0) |
| Asian, n (%) | 18 (15.5) |
| African American, n (%) | 32 (27.6) |
| Caucasian, n (%) | 60 (51.7) |
| Mixed/Other, n (%) | 6 (5.2) |
| Hispanic, n (%) | 11 (9.5) |
| Schizophrenia, n (%) | 16 (13.7) |
| Schizophreniform, n (%) | 13 (11.2) |
| Schizoaffective, n (%) | 2 (1.8) |
| Unspecified SSD^a, n (%) | 11 (9.5) |
| Healthy volunteers, n (%) | 74 (63.8) |
^a SSD: schizophrenia spectrum disorders.
Diagnostic classifier results.
| Classifier type | Mean F1 | Precision (HV^a) | Precision (SSD^b) | Recall (HV) | Recall (SSD) | Mean Accuracy | Mean (SD) AUC^c |
| Support vector machine | 0.49 | 0.73 | 0.51 | 0.73 | 0.5 | 0.65 | 0.66 (0.09) |
| Random forest | 0.54 | 0.75 | 0.72 | 0.86 | 0.48 | 0.73 | 0.74 (0.06) |
| Gradient boost | 0.47 | 0.71 | 0.53 | 0.77 | 0.44 | 0.65 | 0.68 (0.09) |
^a HV: healthy volunteers.
^b SSD: schizophrenia spectrum disorders.
^c AUC: area under the receiver operating characteristic curve.
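For readers who want the column definitions spelled out, the sketch below computes per-class precision, recall, and F1, plus overall accuracy, from true and predicted labels. How the paper averaged F1 is not stated in this record, so the unweighted (macro) mean returned here is an assumption, and `per_class_metrics` is a hypothetical helper name.

```python
def per_class_metrics(y_true, y_pred, labels=("HV", "SSD")):
    """Return ({label: {precision, recall, f1}}, accuracy, macro_f1)."""
    per_class = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Unweighted mean over classes (assumed averaging scheme)
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(per_class)
    return per_class, accuracy, macro_f1
```

Reporting precision and recall separately for each class, as the table does, exposes asymmetries that a single accuracy figure hides, such as the lower recall for SSD than for HV in every row.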
Figure 1. Receiver operating characteristic curves of the random forest diagnostic classifier for each of the 5 folds. AUC: area under the curve.
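A per-fold ROC curve and its AUC can be derived from classifier scores as sketched below. This is an illustration rather than the authors' pipeline: the function names are hypothetical, the area is taken by the trapezoidal rule, and the spread across folds uses the sample SD, which is an assumption because the record does not say which SD was reported.

```python
from statistics import mean, stdev


def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) lists; y_true holds 1 (positive) or 0 (negative)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a tied-score group
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    # Sum trapezoid areas between consecutive ROC points
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(fpr, fpr[1:], tpr, tpr[1:])
    )


def summarize_folds(fold_aucs):
    # Mean and sample SD of the per-fold AUC values
    return mean(fold_aucs), stdev(fold_aucs)
```

Calling `roc_curve_points` once per held-out fold yields one curve per fold, and `summarize_folds` condenses the resulting AUC values into a mean (SD) pair.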
Quantity of search data provided per group for the diagnostic classifier.
| Metric | Healthy volunteers | Participants with SSD^a |
| Total average queries (SD) | 332.93 (298.1) | 192.76 (214.19) |
| Weekly average queries (SD) | 80.37 (71.92) | 48.19 (52.91) |
^a SSD: schizophrenia spectrum disorders.
Relapse classifier results.
| Classifier type | Mean F1 | Precision (HV^a) | Precision (SSD^b) | Recall (HV) | Recall (SSD) | Mean Accuracy | Mean (SD) AUC^c |
| Support vector machine | 0.36 | 0.61 | 0.77 | 0.92 | 0.26 | 0.63 | 0.71 (0.16) |
| Random forest | 0.53 | 0.61 | 0.61 | 0.69 | 0.48 | 0.61 | 0.69 (0.09) |
| Gradient boost | 0.57 | 0.66 | 0.63 | 0.75 | 0.53 | 0.65 | 0.71 (0.10) |
^a HV: healthy volunteers.
^b SSD: schizophrenia spectrum disorders.
^c AUC: area under the receiver operating characteristic curve.
Figure 2. Receiver operating characteristic curves of the support vector machine relapse classifier for each of the 5 folds. AUC: area under the curve.
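Both figures report results per fold of a 5-fold evaluation. The sketch below shows one way to build such folds while keeping class proportions roughly equal across them. Stratification is an assumption here, since this record states only the number of folds, and `stratified_folds` and `train_test_splits` are hypothetical names.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Return k lists of sample indices with class proportions roughly preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)  # deal indices round-robin
    return folds


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

When one participant contributes several observation periods, splitting by participant rather than by sample avoids leaking a person's data across train and test sets; that refinement is omitted here for brevity.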
Quantity of search data provided per group for the relapse classifier.
| Metric | Periods of relative health | Periods of relative illness |
| Total average queries (SD) | 96.80 (98.77) | 168.29 (250.18) |
| Weekly average queries (SD) | 24.2 (11.17) | 42.07 (39.9) |
Feature importance of diagnostic classifiers sorted by decreasing order of importance.
| Diagnostic classifier features | Average feature importance (random forest) |
| Reduced search lengths between 8 and 9 am in participants with SSD^a compared to HV^b | 0.0315 |
| Reduced search lengths between 6 and 7 am in participants with SSD compared to HV | 0.0255 |
| Length of queries from 23-20 days prior to first hospitalization is lower in participants with SSD compared to HV | 0.0178 |
| Reduced usage of “relative” LIWC^c features in participants with SSD compared to HV | 0.0112 |
| Variance in frequency of search lengths is lower in participants with SSD | 0.0111 |
| Reduced search lengths between 11 am and 12 pm in participants with SSD compared to HV | 0.0091 |
| Reduced usage of “inhibition” LIWC features in participants with SSD compared to HV | 0.0078 |
| Reduced search lengths between 4 and 5 am in participants with SSD compared to HV | 0.0073 |
| Reduced usage of “quantifier” LIWC features in participants with SSD compared to HV | 0.0072 |
| Reduced search lengths between 1 and 2 am in participants with SSD compared to HV | 0.0071 |
| Reduced usage of “positive affect” LIWC features in participants with SSD compared to HV | 0.0071 |
| Reduced search lengths between 12 am and 1 am in participants with SSD compared to HV | 0.0070 |
| Reduced usage of “anxiety” LIWC features in participants with SSD compared to HV | 0.0064 |
| Lower overall number of queries in participants with SSD compared to HV | 0.0062 |
| Reduced usage of “preposition” LIWC features in participants with SSD compared to HV | 0.0061 |
| Reduced usage of “inclusive” LIWC features in participants with SSD compared to HV | 0.0059 |
| Frequency of search 19-16 days prior to first hospitalization is lower in participants with SSD compared to HV | 0.0059 |
| Reduced usage of “insight” LIWC features in participants with SSD compared to HV | 0.0057 |
| Number of queries between 2 and 3 am is lower in participants with SSD compared to HV | 0.0056 |
| Number of queries between 11 pm and 12 am is lower in participants with SSD compared to HV | 0.0051 |
^a SSD: schizophrenia spectrum disorders.
^b HV: healthy volunteers.
^c LIWC: linguistic inquiry and word count.
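A ranking like the one above can be produced by averaging each feature's importance and sorting in decreasing order. The sketch assumes that "average" means the mean over cross-validation folds, which this record does not confirm, and `rank_features` is a hypothetical helper. The default of 20 matches the number of rows in the table.

```python
from statistics import mean


def rank_features(fold_importances, top_n=20):
    """fold_importances: one dict per fold mapping feature name -> importance."""
    names = set().union(*fold_importances)
    averaged = {
        # A feature missing from a fold counts as 0.0 importance in that fold
        name: mean(fold.get(name, 0.0) for fold in fold_importances)
        for name in names
    }
    # Sort by decreasing importance; break ties alphabetically for stable output
    ranked = sorted(averaged.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```

The absolute values in such a ranking are small when importance is spread over many features, so the order is usually more informative than the magnitudes.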
Feature importance of relapse classifiers sorted by decreasing order of importance.
| Relapse classifier features | Average feature importance (support vector machine) |
| Reduced length of queries during relapse periods | 0.0688 |
| Increased usage of “sexual” LIWC^a features during relapse periods | 0.0523 |
| Reduced length of queries 3-0 days prior to relapse hospitalization | 0.0506 |
| Reduced frequency of search activity during relapse periods | 0.0263 |
| Reduced usage of “health” LIWC features during relapse periods | 0.0245 |
| Increased usage of “hear” LIWC features during relapse periods | 0.0224 |
| Increased usage of “bio” LIWC features during relapse periods | 0.0223 |
| Increased searches in the 4 days before relapse hospitalization | 0.0209 |
| Reduced length of queries in the 7-4 days prior to relapse hospitalization | 0.0196 |
| Reduced frequency of searches 23-20 days prior to relapse hospitalization | 0.0194 |
| Increased usage of “percept” LIWC features during relapse periods | 0.0186 |
| Increased length of queries in the 31-28 days prior to relapse hospitalization | 0.0162 |
| Increased usage of “inclusive” LIWC features during relapse periods | 0.0143 |
| Denser searches during relapse periods | 0.0140 |
| Increased usage of “anger” LIWC features during relapse periods | 0.0131 |
| Reduced frequency of searches 19-16 days prior to relapse hospitalization | 0.0125 |
| Reduced length of queries 11-8 days prior to relapse hospitalization | 0.0105 |
| Reduced usage of “sadness” LIWC features during relapse periods | 0.0105 |
| Increased usage of “indefinite pronoun” LIWC features during relapse periods | 0.0104 |
| Reduced frequency of searches 15-12 days prior to relapse hospitalization | 0.0097 |
^a LIWC: linguistic inquiry and word count.