Weitong Guo, Hongwu Yang, Zhenyu Liu, Yaping Xu, Bin Hu.
Abstract
The proportion of individuals with depression has increased rapidly along with the growth of the global population, and depression is currently among the most prevalent mental health disorders. An effective depression recognition system is crucial for the early detection of potential depression risk, and a depression-related dataset is equally critical for evaluating such a system. Owing to the sensitive nature of clinical data, datasets of this kind are scarce in both availability and scale; to our knowledge, few practical depression datasets exist for the Chinese population. In this study, we first create a large-scale dataset by asking subjects to perform five mood-elicitation tasks. After each task, the subjects' audio and video are collected, including 3D (depth) information of facial expressions captured by a Kinect. The dataset was constructed in a real environment, i.e., several psychiatric hospitals, and is of substantial scale. We then propose a novel approach to potential depression risk recognition based on two different deep belief network (DBN) models: one extracts 2D appearance features from facial images collected by an optical camera, while the other extracts 3D dynamic features from 3D facial points collected by the Kinect. The final decision is obtained by combining the two models. Finally, we evaluate all of the proposed deep models on the constructed dataset. The experimental results demonstrate that (1) the proposed method can identify patients with potential depression risk; (2) the model combining 2D and 3D features outperforms models using either 2D or 3D features alone; and (3) recognition performance is higher under positive and negative emotional stimuli, and the recognition rate for females is generally higher than that for males. We also compare our approach with other methods on the same dataset; the results show that the DBN integrating 2D and 3D features is more sound and generalizes better than the other methods, and that the experimental paradigm designed for depression recognition is reasonable and practical.
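The record includes no reference implementation. As a rough sketch of the two-stream design described in the abstract, the following Python example stacks BernoulliRBM layers (a common greedy, DBN-style pretraining available in scikit-learn) for a 2D appearance stream and a 3D facial-point stream, then fuses the streams' predicted class probabilities by averaging. All data shapes, layer sizes, and names (`make_dbn_stream`, `X2d`, `X3d`) are illustrative assumptions, not the authors' configuration.

```python
# Hypothetical sketch of decision-level fusion of a 2D and a 3D DBN-style
# stream; layer sizes and feature dimensions are assumptions, not the paper's.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

def make_dbn_stream(n_hidden1: int, n_hidden2: int) -> Pipeline:
    """Two stacked RBMs (greedy layer-wise feature learning) + a logistic classifier."""
    return Pipeline([
        ("rbm1", BernoulliRBM(n_components=n_hidden1, learning_rate=0.05, n_iter=20, random_state=0)),
        ("rbm2", BernoulliRBM(n_components=n_hidden2, learning_rate=0.05, n_iter=20, random_state=0)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])

rng = np.random.default_rng(0)
X2d = rng.random((208, 256))       # assumed 2D appearance features, scaled to [0, 1]
X3d = rng.random((208, 1347 * 3))  # assumed flattened 3D facial points (1,347 vertices)
y = rng.integers(0, 2, 208)        # 0 = control, 1 = potential depression risk

stream_2d = make_dbn_stream(128, 64).fit(X2d, y)
stream_3d = make_dbn_stream(256, 64).fit(X3d, y)

# Decision-level fusion: average the two streams' class probabilities.
proba = (stream_2d.predict_proba(X2d) + stream_3d.predict_proba(X3d)) / 2
y_pred = proba.argmax(axis=1)
```

Averaging probabilities is only one plausible decision-level fusion rule; the paper's actual combination scheme may differ.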
Keywords: 3D deep information; affective rating system; deep belief networks; depression recognition; facial expression
Year: 2021 PMID: 33967675 PMCID: PMC8102822 DOI: 10.3389/fnins.2021.609760
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
Figure 1. Deployment structure of a long short-term memory (LSTM) unit.
Figure 2. The framework of the proposed approach.
Figure 3. The structure of the designed deep belief network (DBN) model.
Figure 4. 2D static appearance deep network.
Figure 5. Framework of four different deep belief network (DBN) models: (A) 4DBN, (B) 4DBN-AU, (C) AU-4DBN, (D) AU-4DBN-LSTM.
Demographic characteristics of the out-patients and control group: mean (standard deviation). The scale names for the last two score columns are not recoverable from this extract.

| Gender | Group | n | Age | Education (years) | Scale 1 score | Scale 2 score |
| --- | --- | --- | --- | --- | --- | --- |
| Male | Control group | 52 | 39 (10.8) | 11.8 (2.5) | 0.8 (2.0) | 6.4 (6.4) |
| | Out-patients | 52 | 34.8 (11.1) | 11.2 (3.4) | 17.5 (5.6) | 26.4 (12.8) |
| Female | Control group | 52 | 34.7 (10.7) | 12.3 (3.2) | 0.3 (0.7) | 4.7 (5.3) |
| | Out-patients | 52 | 37.4 (10.4) | 10.8 (4.0) | 18.3 (5.6) | 33.5 (13.2) |
Figure 6. Schematic diagram of affective rating. Top: valence rating. Bottom: arousal rating.
Figure 7. Facial contour consisting of 1,347 vertices generated by the Kinect.
Differences in the valence and arousal dimensions between the healthy and depressed groups for the five stimuli with three emotional valences (P < 0.1).

| Dimension | Stimulus | Group | Positive | Neutral | Negative |
| --- | --- | --- | --- | --- | --- |
| Valence | Film | Healthy | 1.635 | 0.074 | –1.534 |
| | | Depressed | –1.058 | –0.036 | –1.078 |
| | | | 2.428 | | |
| | Question | Healthy | 1.356 | 0.088 | –0.744 |
| | | Depressed | –0.273 | 0.333 | –0.330 |
| | | | 0.601 | 0.565 | 1.035 |
| | Reading | Healthy | 0.947 | 0.260 | 0.829 |
| | | Depressed | 0.273 | 0.242 | 0.333 |
| | | | 0.233 | 0.137 | 0.531 |
| | Expression figure | Healthy | 1.084 | 0.205 | –0.938 |
| | | Depressed | –0.152 | –0.198 | –0.506 |
| | | | 0.960 | | |
| | Scene figure | Healthy | 0.874 | 0.110 | –0.123 |
| | | Depressed | –0.015 | 0.061 | 0.303 |
| | | | 0.125 | 1.531 | 0.211 |
| Arousal | Film | Healthy | 1.058 | –0.205 | 1.045 |
| | | Depressed | –0.635 | 0.076 | 0.014 |
| | | | 0.151 | | |
| | Question | Healthy | 0.968 | 0.027 | 0.109 |
| | | Depressed | –0.060 | –0.030 | 0.151 |
| | | | 0.254 | 0.325 | |
| | Reading | Healthy | –0.810 | 0.137 | 0.164 |
| | | Depressed | –0.182 | –0.106 | 0.076 |
| | | | 0.222 | 0.115 | 0.177 |
| | Expression figure | Healthy | 1.008 | 0.205 | 0.233 |
| | | Depressed | –0.014 | 0.333 | 0.106 |
| | Scene figure | Healthy | 0.219 | 0.055 | –0.068 |
| | | Depressed | 0.015 | –0.091 | –0.121 |
| | | | 0.146 | | |
The values in the table are the statistics of the differences between the pretest and posttest ratings on the valence and arousal dimensions under the different stimulus tasks.
Bold indicates a significant difference.
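The exact statistical test is not stated in this extract. As a hedged illustration, one plausible reading is a within-group statistic on the pre/post rating change plus a between-group test on those changes, sketched below with synthetic data (all numbers are made up, not the study's ratings).

```python
# Hypothetical sketch: pre/post affective-rating change per group, then a
# between-group comparison of the changes. Data and test choice are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre_h, post_h = rng.normal(0, 1, 52), rng.normal(1.0, 1, 52)  # healthy valence ratings
pre_d, post_d = rng.normal(0, 1, 52), rng.normal(0.1, 1, 52)  # depressed valence ratings

# Within-group statistic of the pre/post difference (paired t-test).
t_h, p_h = stats.ttest_rel(post_h, pre_h)
t_d, p_d = stats.ttest_rel(post_d, pre_d)

# Between-group comparison of the pre/post changes (independent t-test).
t_bg, p_bg = stats.ttest_ind(post_h - pre_h, post_d - pre_d)
print(f"healthy t={t_h:.3f}, depressed t={t_d:.3f}, between-group t={t_bg:.3f} (p={p_bg:.3f})")
```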
Figure 8. Box plots comparing the valence and arousal of the healthy group and depressed group before and after stimulation with positive film clips.
Figure 9. Comparison of recognition performance with different numbers of hidden layers on the three emotional valences of the five stimuli, for 3D facial points and 2D images. (A–C) are based on 3D facial points; (D–F) are based on 2D images.
The accuracy of all models for three emotional valences of five stimuli.

| Stimulus | Model | Positive | Neutral | Negative | Average |
| --- | --- | --- | --- | --- | --- |
| Film | 4DBN-AU | 0.638 | 0.603 | 0.673 | 0.638 |
| | AU-4DBN | 0.693 | 0.635 | 0.701 | 0.676 |
| | 3D-DGDN | 0.745 | 0.677 | 0.752 | 0.725 |
| | 2D-SADN | 0.682 | 0.617 | 0.694 | 0.664 |
| | Joint(2D-3D) | 0.716 | | | |
| Question | 4DBN-AU | 0.605 | 0.593 | 0.592 | 0.597 |
| | AU-4DBN | 0.639 | 0.636 | 0.642 | 0.639 |
| | 3D-DGDN | 0.687 | 0.659 | 0.693 | 0.680 |
| | 2D-SADN | 0.618 | 0.632 | 0.647 | 0.632 |
| | Joint(2D-3D) | 0.702 | 0.683 | 0.713 | 0.699 |
| Reading | 4DBN-AU | 0.572 | 0.583 | 0.601 | 0.585 |
| | AU-4DBN | 0.623 | 0.625 | 0.652 | 0.633 |
| | 3D-DGDN | 0.668 | 0.658 | 0.694 | 0.673 |
| | 2D-SADN | 0.583 | 0.613 | 0.635 | 0.610 |
| | Joint(2D-3D) | 0.711 | 0.697 | 0.712 | 0.707 |
| Scene picture | 4DBN-AU | 0.617 | 0.538 | 0.608 | 0.588 |
| | AU-4DBN | 0.671 | 0.592 | 0.672 | 0.645 |
| | 3D-DGDN | 0.716 | 0.651 | 0.724 | 0.697 |
| | 2D-SADN | 0.623 | 0.613 | 0.668 | 0.635 |
| | Joint(2D-3D) | 0.747 | 0.707 | 0.752 | 0.735 |
| Expression picture | 4DBN-AU | 0.659 | 0.591 | 0.635 | 0.628 |
| | AU-4DBN | 0.703 | 0.642 | 0.690 | 0.678 |
| | 3D-DGDN | 0.729 | 0.683 | 0.751 | 0.721 |
| | 2D-SADN | 0.684 | 0.657 | 0.668 | 0.700 |
| | Joint(2D-3D) | 0.725 | | | |
Bold indicates a higher recognition rate.
Figure 10. Comparison of accuracy with the five stimuli among the three networks.
Figure 11. Comparison of accuracy by gender, with 95% confidence intervals.
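The record does not state how the intervals in Figure 11 were computed. A minimal sketch, assuming a normal-approximation (Wald) binomial interval over a hypothetical sample size `n`, is:

```python
# Hypothetical sketch: normal-approximation 95% CI for a recognition accuracy.
import math

def accuracy_ci(acc: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wald binomial interval: acc +/- z * sqrt(acc * (1 - acc) / n), clipped to [0, 1]."""
    half = z * math.sqrt(acc * (1 - acc) / n)
    return max(0.0, acc - half), min(1.0, acc + half)

# Example with an accuracy from the comparison table and an assumed n.
print(accuracy_ci(0.793, 104))
```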
Comparison of accuracy based on the same database. The names of the nine compared methods are not recoverable from this extract and are numbered generically.

| Gender | Stimulus | Method 1 | Method 2 | Method 3 | Method 4 | Method 5 | Method 6 | Method 7 | Method 8 | Method 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Female | Film clips | 0.793 | 0.825 | 0.737 | 0.723 | 0.761 | 0.868 | 0.806 | 0.816 | 0.763 |
| | Questions | 0.711 | 0.761 | 0.649 | 0.665 | 0.705 | 0.737 | 0.741 | 0.765 | 0.675 |
| | Readings | 0.723 | 0.768 | 0.711 | 0.691 | 0.728 | 0.658 | 0.714 | 0.751 | 0.658 |
| | Scene pictures | 0.728 | 0.801 | 0.711 | 0.701 | 0.713 | 0.763 | 0.755 | 0.806 | 0.711 |
| | Expression pictures | 0.767 | 0.806 | 0.632 | 0.710 | 0.741 | 0.737 | 0.788 | 0.801 | 0.711 |
| Male | Film clips | 0.736 | 0.772 | 0.647 | 0.698 | 0.733 | 0.794 | 0.794 | 0.782 | 0.647 |
| | Questions | 0.693 | 0.738 | 0.725 | 0.693 | 0.728 | 0.735 | 0.727 | 0.726 | 0.667 |
| | Readings | 0.677 | 0.724 | 0.618 | 0.673 | 0.694 | 0.676 | 0.715 | 0.745 | 0.588 |
| | Scene pictures | 0.718 | 0.755 | 0.618 | 0.689 | 0.714 | 0.706 | 0.743 | 0.776 | 0.647 |
| | Expression pictures | 0.726 | 0.761 | 0.647 | 0.710 | 0.673 | 0.706 | 0.780 | 0.737 | 0.588 |