| Literature DB >> 35699997 |
Jinkyung Park1, Ramanathan Arunachalam2, Vincent Silenzio3, Vivek K Singh1,4.
Abstract
BACKGROUND: Approximately 1 in 5 American adults experience mental illness every year. Thus, mobile phone-based mental health prediction apps that use phone data and artificial intelligence techniques for mental health assessment have become increasingly important and are being rapidly developed. At the same time, multiple artificial intelligence-related technologies (eg, face recognition and search results) have recently been reported to be biased regarding age, gender, and race. This study moves this discussion to a new domain: phone-based mental health assessment algorithms. It is important to ensure that such algorithms do not contribute to gender disparities through biased predictions across gender groups.
Keywords: algorithmic bias; gender bias; health equity; health information systems; medical informatics; mental health; mobile phone
Year: 2022 PMID: 35699997 PMCID: PMC9240929 DOI: 10.2196/34366
Source DB: PubMed Journal: JMIR Form Res ISSN: 2561-326X
Results showing the average overall accuracy, accuracy for men, and accuracy for women for various machine learning models in mental health assessment (averaged over 100 iterations).
| Machine learning models | Overall accuracy (%), mean (SD) | Male accuracy (%), mean (SD) | Female accuracy (%), mean (SD) | Delta across gender (%), mean (SD) | P value |
| Multilayer perceptron neural networks | 59.99 (3.67) | 58.68 (8.14) | 61.92 (9.24) | 12.10 (10.41) | <.001 |
| Support vector machine | 63.17 (2.91) | 65.98 (6.49) | 59.60 (8.37) | 12.20 (8.67) | <.001 |
| Logistic regression | 58.48 (2.69) | 66.59 (5.47) | 47.38 (6.75) | 19.73 (9.80) | <.001 |
| K-nearest neighbors | 61.77 (1.78) | 70.43 (3.72) | 49.63 (5.89) | 20.96 (8.46) | <.001 |
| Random forest | 78.57 (1.61) | 87.16 (2.73) | 71.31 (2.51) | 15.85 (0.22) | <.001 |
The average score for bias metrics in the random forest–based mental health assessment algorithm (average of 100 iterations).
| Bias metrics | Observed score, mean (SD) | Ideal score |
| Delta accuracy (%) | 15.85 (0.22) | 0 |
| Delta true positive rate (%) | −0.88 (8.39) | 0 |
| Delta false positive rate (%) | 33.43 (13.50) | 0 |
| Statistical parity difference (%) | 26.10 (4.16) | 0 |
| Disparate impact | 0.682 (0.049) | 1.0 |
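The bias metrics in this table can be computed directly from per-participant labels, predictions, and a binary gender attribute. Below is a minimal Python sketch; it is not the authors' code, and the function names and sign conventions are assumptions chosen to match the tables (deltas and statistical parity difference are male minus female; disparate impact is the female-to-male ratio of positive-prediction rates).

```python
# Illustrative sketch (not the paper's implementation) of the reported
# group-fairness metrics for a binary classifier and a binary gender attribute.

def group_stats(y_true, y_pred, group, g):
    """Accuracy, TPR, FPR, and positive-prediction rate for members of group g."""
    idx = [i for i, gr in enumerate(group) if gr == g]
    tp = sum(1 for i in idx if y_pred[i] == 1 and y_true[i] == 1)
    fp = sum(1 for i in idx if y_pred[i] == 1 and y_true[i] == 0)
    tn = sum(1 for i in idx if y_pred[i] == 0 and y_true[i] == 0)
    fn = sum(1 for i in idx if y_pred[i] == 0 and y_true[i] == 1)
    n = len(idx)
    return {
        "acc": (tp + tn) / n,
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        "pos_rate": (tp + fp) / n,  # share of the group predicted positive
    }

def bias_metrics(y_true, y_pred, group, privileged="M", unprivileged="F"):
    """Delta accuracy/TPR/FPR, statistical parity difference, disparate impact."""
    p = group_stats(y_true, y_pred, group, privileged)
    u = group_stats(y_true, y_pred, group, unprivileged)
    return {
        "delta_accuracy": p["acc"] - u["acc"],
        "delta_tpr": p["tpr"] - u["tpr"],
        "delta_fpr": p["fpr"] - u["fpr"],
        # Statistical parity difference: privileged minus unprivileged
        # positive-prediction rate (matches the sign of the tables here).
        "spd": p["pos_rate"] - u["pos_rate"],
        # Disparate impact: unprivileged-to-privileged ratio of
        # positive-prediction rates; 1.0 is the ideal score.
        "di": u["pos_rate"] / p["pos_rate"] if p["pos_rate"] else float("inf"),
    }
```

Under this convention, the baseline random forest row above (statistical parity difference +26.10%, disparate impact 0.682) indicates that women received positive predictions less often than men.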
The average score for bias metrics after applying the disparate impact remover approach (average of 100 iterations).
| Bias metrics | Observed score, mean (SD) | Ideal score |
| Delta accuracy (%) | 1.66 (1.56) | 0 |
| Delta true positive rate (%) | 3.74 (6.74) | 0 |
| Delta false positive rate (%) | 5.58 (9.88) | 0 |
| Statistical parity difference (%) | −2.70 (1.71) | 0 |
| Disparate impact | 1.09 (0.041) | 1.0 |
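The disparate impact remover (Feldman et al.) is, at its core, a per-feature repair: each group's feature values are mapped, rank for rank, onto a shared "median" distribution, so the feature no longer encodes group membership while within-group ordering is preserved. The sketch below illustrates that core idea only; it is not the authors' code, the function name is hypothetical, and it assumes at least two distinct values per group.

```python
# Hedged sketch of full repair for one numeric feature, in the spirit of
# the disparate impact remover: replace each value with the value at the
# same within-group quantile in a cross-group median distribution.
import statistics

def repair_feature(values, group):
    """values and group are parallel lists; returns the repaired values."""
    groups = sorted(set(group))
    per_group = {g: sorted(v for v, gr in zip(values, group) if gr == g)
                 for g in groups}
    n_q = min(len(v) for v in per_group.values())  # common quantile grid
    # Target distribution: median across groups at each quantile point.
    target = [
        statistics.median(
            per_group[g][int(q * (len(per_group[g]) - 1) / (n_q - 1))]
            for g in groups
        )
        for q in range(n_q)
    ]
    repaired = []
    for v, gr in zip(values, group):
        # Rank of v within its own group, mapped onto the quantile grid
        # (assumes distinct values within each group, for brevity).
        rank = per_group[gr].index(v)
        size = len(per_group[gr])
        q = round(rank * (n_q - 1) / (size - 1)) if size > 1 else 0
        repaired.append(target[q])
    return repaired
```

After repair, equally ranked members of each group share a feature value, which pushes a downstream classifier's positive-prediction rates toward parity, consistent with the near-ideal scores in this table.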
Comparison of delta accuracy, delta true positive rate, delta false positive rate, statistical parity difference, and disparate impact before and after applying the disparate impact remover approach.
| Bias metrics | Baseline model, mean (SD) | After bias reduction, mean (SD) | Difference | P value |
| Delta accuracy (%) | 15.85 (0.22) | 1.66 (1.56) | 14.19 | <.001 |
| Delta true positive rate (%) | −0.88 (8.39) | 3.74 (6.74) | 4.63 | <.001 |
| Delta false positive rate (%) | 33.43 (13.50) | 5.58 (9.88) | 27.85 | <.001 |
| Statistical parity difference (%) | 26.10 (4.16) | −2.70 (1.71) | 28.80 | <.001 |
| Disparate impact | 0.682 (0.049) | 1.09 (0.041) | 0.408 | <.001 |