Julieta G Rodríguez-Ruiz, Carlos E Galván-Tejada, Huizilopoztli Luna-García, Hamurabi Gamboa-Rosales, José M Celaya-Padilla, José G Arceo-Olague, Jorge I Galván Tejada.
Abstract
Major depressive disorder (MDD) is the most recurrent mental illness globally, affecting approximately 5% of adults. According to the U.S. National Institute of Mental Health (NIMH), calculating an actual prevalence rate for schizophrenia is challenging because the illness is underdiagnosed; still, most current global estimates range between 0.33% and 0.75%. Machine-learning researchers use data from diverse sources to analyze, classify, or predict, with the aim of improving psychiatric care, diagnosis, and treatment of MDD, schizophrenia, and other psychiatric conditions. Motor activity data are gaining popularity as an aid to mental illness diagnosis because they are cost-effective and noninvasive to collect. Within the knowledge discovery in databases (KDD) framework, a model is constructed from accelerometer data to classify depressive patients, schizophrenic patients, and healthy controls. Because mental disorders cause multiple sleep disorders, the main objective is to increase the model's accuracy by employing only night-time activity data. To compare classification across the stages of the day and improve its accuracy, the total activity signal was cut into hourly segments and then grouped into subdatasets by phase of the day: morning (06:00-11:59), afternoon (12:00-17:59), evening (18:00-23:59), and night (00:00-05:59). A random forest classifier (RFC) is the algorithm proposed for the multiclass classification, and its performance is measured with accuracy, recall, precision, the Matthews correlation coefficient, and F1 score. The best model combined night-featured data with the RFC, reaching 98% accuracy for the three-class classification. The effectiveness of this experiment points to shorter monitoring times for patients, which reduces stress and anxiety, and to more efficient models, the use of wearables, and a larger amount of data.
Keywords: depression; machine learning; night-time; random forest; schizophrenia
Year: 2022 PMID: 35885784 PMCID: PMC9318635 DOI: 10.3390/healthcare10071256
Source DB: PubMed Journal: Healthcare (Basel) ISSN: 2227-9032
Figure 1. Data-mining process used in this paper to classify depressive, schizophrenic, and healthy-control episodes.
KDD steps and activities involved in this work.
| KDD Process | |
|---|---|
| Pre-KDD | Precision psychiatry uses ML algorithms with the principal objectives of treatment-response analysis, early identification, suicide prevention, real-time monitoring, and subclassification of existing mental disorders [ |
| Selection | The Depresjon and Psykose datasets contain motor-activity counts of patients with depression and schizophrenia, respectively. |
| Preprocessing | All patients’ activity count data are concatenated into a single matrix, standardized, transposed, and grouped by hours. |
| Transformation | After hourly segmentation, data are grouped into subsets following the day stage: morning (06:00–11:59), afternoon (12:00–17:59), evening (18:00–23:59), and night (00:00–05:59). |
| Data Mining | Classification of depressive, schizophrenic, and control episodes is performed with a random forest classifier. |
| Interpretation/evaluation | Precision, recall, F1 score, MCC, and accuracy measure how effectively each model identifies healthy, schizophrenic, and depressive episodes for its day stage. |
| Post-KDD | It is not limited to this written report. |
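The preprocessing and transformation steps in the table above can be illustrated with a minimal sketch. It assumes one day of per-minute activity counts (1440 values starting at 00:00); the function names and the toy data are hypothetical and are not taken from the paper's code.

```python
from collections import defaultdict

# Day-stage boundaries as stated in the text
# (start hour inclusive, end hour exclusive).
DAY_STAGES = {
    "night": range(0, 6),       # 00:00-05:59
    "morning": range(6, 12),    # 06:00-11:59
    "afternoon": range(12, 18), # 12:00-17:59
    "evening": range(18, 24),   # 18:00-23:59
}


def day_stage(hour):
    """Return the day stage for an hour of the day (0-23)."""
    for name, hours in DAY_STAGES.items():
        if hour in hours:
            return name
    raise ValueError(f"hour out of range: {hour}")


def hourly_segments(minute_counts):
    """Split one day of per-minute counts (1440 values) into 24 hourly lists."""
    if len(minute_counts) != 1440:
        raise ValueError("expected 1440 per-minute values")
    return [minute_counts[h * 60:(h + 1) * 60] for h in range(24)]


def group_by_stage(minute_counts):
    """Group the 24 hourly segments of one day by day stage."""
    groups = defaultdict(list)
    for hour, segment in enumerate(hourly_segments(minute_counts)):
        groups[day_stage(hour)].append(segment)
    return dict(groups)


# Hypothetical toy day: the activity value equals the minute index.
toy_day = list(range(1440))
stages = group_by_stage(toy_day)
```

Each stage then holds six hourly segments of 60 values, which is the unit from which features would be extracted.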
Figure 2. Plot of the mean activity level during all monitored days from every type: schizophrenic patients, depressive patients, and healthy controls. It starts at 00:00, and every point corresponds to one of the 1440 min in one day.
Features extracted from hourly segments of motor activity signals.
| Name | Equation |
|---|---|
| Mean | |
| Sum | |
| Maximum | |
| Minimum | |
| Median | |
| Standard deviation | |
| First decile | |
| Second decile | |
| First quartile | |
| Third decile | |
| Fourth decile | |
| Second quartile | |
| Sixth decile | |
| Seventh decile | |
| Third quartile | |
| Eighth decile | |
| Ninth decile | |
| Kurtosis | |
| Mean absolute deviation | |
| Standard error of mean | |
| Skewness | |
| Variance | |
| Unique | |
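As a rough illustration of the features named above, the following sketch computes them for a single hourly segment using only the Python standard library. The exact definitions used in the paper are not recoverable from this record, so several choices here are assumptions: sample variance with an n - 1 denominator, percentiles by linear interpolation, moment-based skewness and excess kurtosis, and "unique" as the number of distinct values. The `quantileNN` keys follow the naming that appears in the feature-selection table.

```python
import math
import statistics


def percentile(sorted_values, p):
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def segment_features(segment):
    """Statistical features of one hourly segment (assumed definitions)."""
    xs = sorted(segment)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(xs)
    variance = statistics.variance(xs)  # assumption: sample variance (n - 1)
    std = math.sqrt(variance)
    # Central moments with an n denominator, used for skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    features = {
        "mean": mean,
        "sum": sum(xs),
        "max": xs[-1],
        "min": xs[0],
        "median": statistics.median(xs),
        "std": std,
        "variance": variance,
        "mad": sum(abs(x - mean) for x in xs) / n,  # mean absolute deviation
        "sem": std / math.sqrt(n),                  # standard error of the mean
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,  # assumption: excess kurtosis
        "unique": len(set(xs)),                     # assumption: count of distinct values
    }
    # Deciles and quartiles, named as in the feature-selection table.
    for p in (10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90):
        features[f"quantile{p}"] = percentile(xs, p)
    return features


# Hypothetical toy segment, far shorter than a real 60-minute segment.
example = segment_features([1, 2, 3, 4, 5])
```

Other conventions (for example, a population standard deviation or a different percentile rule) would change the values slightly without changing the overall procedure.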
Features selected per day stage.
| Day Stage | No. Features | Features |
|---|---|---|
| 00:00–05:59 | 5 | min, quantile10, quantile20, |
| 06:00–11:59 | 7 | min, median, quantile10, |
| 12:00–17:59 | 8 | max, min, quantile10, |
| 18:00–23:59 | 6 | min, quantile10, quantile20, |
Training and testing instances and number of features per day stage.
| Day Stage | Training Instances | Testing Instances | Features |
|---|---|---|---|
| 00:00–05:59 | 6116 | 2622 | 5 |
| 06:00–11:59 | 5051 | 2165 | 6 |
| 12:00–17:59 | 4809 | 2061 | 8 |
| 18:00–23:59 | 4761 | 2041 | 7 |
Figure 3. Fivefold cross-validation resamples the data into five different datasets to evaluate the model's performance. In each of sets 1 to 5, 80% of the data is used for training the model and the remaining 20% for testing it.
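The fivefold scheme described in this caption can be sketched as follows. This is a generic k-fold index split written with the standard library only; the shuffling, the seed, and the function name are assumptions for illustration, not the authors' implementation.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every sample appears in exactly one test fold; with k = 5 each test
    fold holds about 20% of the data and the training part the other 80%.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: shuffle before splitting
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 100 instances.
splits = list(k_fold_indices(100, k=5))
```

A model would be trained and scored once per split, and the per-fold accuracies then summarized, for example by their maximum, minimum, and overall value.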
Figure 4. Confusion matrices by time segment. (A) Night-time; (B) morning; (C) afternoon; (D) evening. Note: 0 means healthy control, 1 depressive, and 2 schizophrenic episodes.
Fivefold cross-validation results using accuracy as the evaluation metric.
| Model | Statistic | Accuracy |
|---|---|---|
| Nighttime (00:00–05:59) | Maximum | 98.62% |
| | Minimum | 97.25% |
| | Overall | 98.24% |
| Morning (06:00–11:59) | Maximum | 88.44% |
| | Minimum | 87.47% |
| | Overall | 87.97% |
| Afternoon (12:00–17:59) | Maximum | 81.63% |
| | Minimum | 80.27% |
| | Overall | 80.92% |
| Evening (18:00–23:59) | Maximum | 91.26% |
| | Minimum | 88.97% |
| | Overall | 89.84% |
Data-mining results by model of day segment and class.
| Day Stage | Class | Precision | Recall | F1 Score | MCC (per model) |
|---|---|---|---|---|---|
| Night 00:00–05:59 | 0 | 0.98 | 0.99 | 0.98 | 0.96 |
| | 1 | 0.98 | 0.96 | 0.97 | |
| | 2 | 0.98 | 0.98 | 0.98 | |
| Morning 06:00–11:59 | 0 | 0.87 | 0.95 | 0.91 | 0.81 |
| | 1 | 0.94 | 0.85 | 0.89 | |
| | 2 | 0.88 | 0.80 | 0.84 | |
| Afternoon 12:00–17:59 | 0 | 0.78 | 0.91 | 0.84 | 0.69 |
| | 1 | 0.81 | 0.70 | 0.75 | |
| | 2 | 0.87 | 0.72 | 0.79 | |
| Evening 18:00–23:59 | 0 | 0.87 | 0.96 | 0.91 | 0.82 |
| | 1 | 0.90 | 0.84 | 0.87 | |
| | 2 | 0.94 | 0.83 | 0.88 | |

Note: 0 means healthy control, 1 depressive, and 2 schizophrenic episodes.
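For reference, the metrics reported in this table can all be derived from a multiclass confusion matrix. The sketch below uses the standard definitions of precision, recall, and F1 score, and the usual multiclass generalization of the Matthews correlation coefficient; whether the paper computed MCC in exactly this way is an assumption, and the confusion matrix is a made-up toy example rather than the paper's data.

```python
from math import sqrt


def per_class_metrics(cm):
    """Return {class: (precision, recall, f1)}.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    metrics = {}
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, but another class
        fn = sum(cm[c]) - tp                       # true c, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    return metrics


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def multiclass_mcc(cm):
    """Multiclass Matthews correlation coefficient (one value per model)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    true_counts = [sum(cm[i]) for i in range(n)]
    pred_counts = [sum(cm[r][i] for r in range(n)) for i in range(n)]
    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    return numerator / denominator if denominator else 0.0


# Hypothetical 3-class confusion matrix (0 = control, 1 = depressive, 2 = schizophrenic).
toy_cm = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 1, 9],
]
toy_metrics = per_class_metrics(toy_cm)
toy_mcc = multiclass_mcc(toy_cm)
toy_acc = accuracy(toy_cm)
```

Precision, recall, and F1 score are per class, whereas MCC and accuracy summarize the whole three-class model, which is why the table lists a single MCC value per day stage.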
Figure 5. ROC curve plots by time segment. (A) Night-time; (B) morning; (C) afternoon; (D) evening. Note: 0 means healthy control, 1 depressive, and 2 schizophrenic episodes.