Nivedhitha Mahendran¹, Durai Raj Vincent¹, Kathiravan Srinivasan¹, Chuan-Yu Chang², Akhil Garg³, Liang Gao³, Daniel Gutiérrez Reina⁴.
Abstract
The present methods of diagnosing depression depend entirely on self-report ratings or clinical interviews. These traditional methods are subjective, because respondents may not answer the questions genuinely. In this paper, data were collected both through self-report ratings and through smartwatch sensors. This study aims to develop a weighted average ensemble machine learning model that predicts major depressive disorder (MDD) with high accuracy. The data were pre-processed, and the essential features were selected using a correlation-based feature selection method. On the selected features, Logistic Regression, Random Forest, and the proposed Weighted Average Ensemble Model were applied. The performance of the models was assessed using the Area Under the Receiver Operating Characteristic Curve (AUC-ROC). The results show that the proposed Weighted Average Ensemble Model achieves higher accuracy than the Logistic Regression and Random Forest approaches.
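The abstract names a correlation-based feature selection method but does not state which variant was used. The sketch below assumes Hall's CFS formulation: a feature subset of size k has merit k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean absolute Pearson correlation between the features and the class and r_ff is the mean absolute correlation among the features; features are added greedily while the merit improves. The function names and the toy columns `a`, `b`, `c` are hypothetical.

```python
import math
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(va * vb)


def cfs_merit(subset, columns, target):
    """Hall-style CFS merit (assumed variant): k*r_cf / sqrt(k + k(k-1)*r_ff)."""
    k = len(subset)
    r_cf = mean(abs(pearson(columns[f], target)) for f in subset)
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
        r_ff = mean(abs(pearson(columns[f], columns[g])) for f, g in pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, target):
    """Greedy forward search: add the best feature until merit stops improving."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        candidate = max(remaining, key=lambda f: cfs_merit(selected + [f], columns, target))
        merit = cfs_merit(selected + [candidate], columns, target)
        if merit <= best:
            break  # no remaining feature raises the subset merit
        selected.append(candidate)
        remaining.remove(candidate)
        best = merit
    return selected


# Hypothetical toy data: 'a' matches the label, 'b' is a redundant noisy copy, 'c' is weak
target = [0, 0, 0, 1, 1, 1]
columns = {"a": [0, 0, 0, 1, 1, 1], "b": [0, 0, 1, 1, 1, 1], "c": [1, 0, 1, 0, 1, 0]}
print(cfs_forward_select(columns, target))  # → ['a']
```

Feature `b` correlates with the label but mostly duplicates `a`, so adding it lowers the merit; this redundancy penalty is what distinguishes CFS from ranking features by target correlation alone.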
Keywords: correlation-based feature selection; major depressive disorder; random forest; smartwatch sensor; weighted average ensemble
Year: 2019 PMID: 31698678 PMCID: PMC6891280 DOI: 10.3390/s19224822
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Architectural Diagram of the Proposed Model.
Figure 2. Sigmoidal Curve. Here, ‘x’ denotes the input to the sigmoid function and can be any independent attribute.
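The sigmoid in Figure 2 is what turns a logistic regression score into a probability. A minimal sketch, assuming the standard formulation P(y = 1 | x) = 1 / (1 + e^(−(w·x + b))) with a 0.5 cut-off; the weights and bias below are hypothetical, not fitted values from the study.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), mapping any real z into (0, 1)."""
    # Split by sign so math.exp never overflows for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Logistic regression: P(y=1 | x) = sigmoid(w . x + b)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def predict(weights, bias, x, cutoff=0.5):
    """Label 1 (e.g., depressed) when the probability reaches the cut-off."""
    return 1 if predict_proba(weights, bias, x) >= cutoff else 0


# Hypothetical one-feature model: w = 2.0, b = -1.0
print(predict_proba([2.0], -1.0, [1.0]))  # sigmoid(1.0), about 0.73
print(predict([2.0], -1.0, [0.0]))        # sigmoid(-1.0) < 0.5, so label 0
```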
Figure 3. Schematic Diagram of the Random Forest Approach.
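The random forest idea in Figure 3 (many trees, each trained on a bootstrap sample with a random subset of features, combined by voting) can be sketched in plain Python. To stay short, this sketch assumes depth-1 trees (decision stumps) instead of full-depth trees; the number of trees, `max_features`, and the toy data are hypothetical and not the study's settings.

```python
import math
import random
from collections import Counter


def majority(labels):
    """Most frequent label (ties resolved by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature):
    """Depth-1 tree on one feature; returns (training_errors, predict_fn)."""
    best = None
    for threshold in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        left_label = majority(left)  # never empty: threshold is an observed value
        right_label = majority(right) if right else left_label
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    errors, f, t, lo, hi = best
    return errors, (lambda row: lo if row[f] <= t else hi)


def fit_forest(X, y, n_trees=25, max_features=None, seed=0):
    """Bagged stumps, each choosing its split among a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max_features or max(1, int(math.sqrt(d)))
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        features = rng.sample(range(d), k)          # random feature subset for this tree
        _, tree = min((fit_stump(Xb, yb, f) for f in features), key=lambda p: p[0])
        trees.append(tree)
    return trees


def forest_predict(trees, row):
    """Aggregate the trees by majority vote."""
    return majority(tree(row) for tree in trees)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise
X = [[i, (i * 7) % 5] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
forest = fit_forest(X, y, max_features=2)
print(forest_predict(forest, [-5, 2]), forest_predict(forest, [30, 2]))
```

Bootstrap sampling and random feature subsets decorrelate the trees, so their majority vote is usually more stable than any single tree.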
Metrics derived from the confusion matrix components (TP, TN, FP, FN: true positives, true negatives, false positives, false negatives).
| Metric | Definition | Formula |
|---|---|---|
| Accuracy | Ratio of correctly classified subjects to all subjects. | (TP + TN)/(TP + TN + FP + FN) |
| Precision | Ratio of correctly classified positive subjects to all subjects classified as positive. It answers the question: of the patients labeled as depressed, how many are actually depressed? | TP/(TP + FP) |
| Sensitivity (Recall) | Ratio of correctly classified positive subjects to all subjects who actually have the disease. | TP/(TP + FN) |
| Specificity | Ratio of correctly classified negative subjects to all subjects who are actually healthy. | TN/(TN + FP) |
| F-Measure | Harmonic mean of precision and recall. | 2 × (Precision × Recall)/(Precision + Recall) |
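The formulas in the table above translate directly into code. A minimal sketch, assuming binary labels with 1 as the positive (depressed) class; the helper names and the toy label vectors are hypothetical, and `safe_div` returning 0.0 on an empty denominator is a convention chosen here, not something the paper specifies.

```python
def safe_div(num, den):
    """Return num/den, or 0.0 when the denominator is zero (e.g., no predicted positives)."""
    return num / den if den else 0.0


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return safe_div(2 * precision * recall, precision + recall)


def confusion_metrics(tp, tn, fp, fn):
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # sensitivity
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": recall,
        "specificity": safe_div(tn, tn + fp),
        "f_measure": f_measure(precision, recall),
    }


# Hypothetical labels: 1 = depressed, 0 = healthy
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(confusion_metrics(*confusion_counts(y_true, y_pred)))
```

As a check against the performance table, `f_measure(0.9539, 0.8430)` evaluates to about 0.8950, matching the reported Logistic Regression F-measure.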
Figure 4. Accuracy vs. Cut-off Curve for the Logistic Regression Approach.
Figure 5. Accuracy vs. Cut-off Curve for the Random Forest Approach.
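An accuracy vs. cut-off curve such as those in Figures 4 and 5 is obtained by thresholding the predicted probabilities at a range of cut-offs and measuring accuracy at each one. A minimal sketch, assuming an evenly spaced grid of cut-offs in [0, 1]; the grid size and toy scores are hypothetical.

```python
def accuracy_at_cutoff(y_true, scores, cutoff):
    """Accuracy when every score >= cutoff is labeled positive."""
    preds = [1 if s >= cutoff else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def accuracy_curve(y_true, scores, steps=101):
    """(cutoff, accuracy) pairs over an evenly spaced grid from 0 to 1."""
    cutoffs = [i / (steps - 1) for i in range(steps)]
    return [(c, accuracy_at_cutoff(y_true, scores, c)) for c in cutoffs]


def best_cutoff(y_true, scores, steps=101):
    """Lowest grid cut-off that reaches the maximum accuracy."""
    return max(accuracy_curve(y_true, scores, steps), key=lambda pair: pair[1])


# Hypothetical predicted probabilities for four subjects
y_true = [0, 0, 1, 1]
scores = [0.1, 0.3, 0.6, 0.9]
print(best_cutoff(y_true, scores))
```

Plotting the pairs from `accuracy_curve` gives the curve; the peak indicates which cut-off to use instead of the default 0.5.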
Figure 6. AUC-ROC Curve for the Logistic Regression Approach.
Figure 7. AUC-ROC Curve for the Random Forest Approach.
Figure 8. AUC-ROC Curve for the Weighted Average Ensemble Model.
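The ROC curves in Figures 6 to 8 plot the true positive rate against the false positive rate as the cut-off is lowered, and the AUC is the area beneath that curve. A minimal sketch, assuming both classes are present in the data; it computes the AUC two ways, by the trapezoidal rule over the ROC points and by the equivalent rank (Mann-Whitney) formulation. The toy scores are hypothetical.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the cut-off through every distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for c in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= c and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= c and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as sorted (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one negative (0.4) outranks one positive (0.35)
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_trapezoid(roc_points(y_true, scores)), auc_rank(y_true, scores))  # → 0.75 0.75
```

Both routes agree because the area under the ROC curve equals the fraction of correctly ordered positive/negative pairs.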
Figure 9. Performance Comparison of the Logistic Regression Model, the Random Forest Approach, and the Proposed Weighted Average Ensemble Model.
Performance Evaluation of LR, RF, and the proposed Weighted Average Ensemble Model.
| Performance Metric | Logistic Regression | Random Forest | Weighted Average Ensemble |
|---|---|---|---|
| Accuracy | 0.9318 | 0.9839 | 0.9901 |
| Precision | 0.9539 | 0.9673 | 0.9754 |
| Sensitivity (Recall) | 0.8430 | 0.9729 | 0.9840 |
| Specificity | 0.9785 | 0.9772 | 0.9887 |
| F-Measure | 0.8950 | 0.9465 | 0.9795 |
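The last column of the table comes from the weighted average ensemble, but this section does not state the weights or how they were chosen. A minimal sketch of weighted soft voting, assuming the ensemble averages each model's predicted probability of MDD with non-negative weights and applies a 0.5 cut-off. The grid search over the Logistic Regression weight `w` (with `1 - w` for Random Forest) by validation accuracy is a hypothetical selection scheme, not necessarily the authors'; the probabilities are made-up toy values.

```python
def weighted_average_proba(prob_lists, weights):
    """Per-sample weighted mean of the models' predicted probabilities."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, probs)) / total for probs in zip(*prob_lists)]


def ensemble_predict(prob_lists, weights, cutoff=0.5):
    """Label 1 when the weighted average probability reaches the cut-off."""
    return [1 if p >= cutoff else 0 for p in weighted_average_proba(prob_lists, weights)]


def search_weight(lr_probs, rf_probs, y_val, steps=11):
    """Hypothetical scheme: grid-search w for w*LR + (1-w)*RF by validation accuracy."""
    best_w, best_acc = 0.0, -1.0
    for i in range(steps):
        w = i / (steps - 1)
        preds = ensemble_predict([lr_probs, rf_probs], [w, 1 - w])
        acc = sum(p == t for p, t in zip(preds, y_val)) / len(y_val)
        if acc > best_acc:  # keep the first (smallest) w reaching the best accuracy
            best_w, best_acc = w, acc
    return best_w, best_acc


# Made-up validation probabilities from two already-trained models
lr_probs = [0.2, 0.6, 0.7, 0.4]
rf_probs = [0.4, 0.2, 0.9, 0.8]
y_val = [0, 0, 1, 1]
print(weighted_average_proba([lr_probs, rf_probs], [0.25, 0.75]))
print(search_weight(lr_probs, rf_probs, y_val))
```

Averaging probabilities rather than hard labels lets a confident model outweigh an uncertain one, which is the usual motivation for soft voting over majority voting.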