Teodora Sandra Buda, João Guerreiro, Jesus Omana Iglesias, Carlos Castillo, Oliver Smith, Aleksandar Matic.
Abstract
Digital mental health applications promise scalable and cost-effective solutions to mitigate the gap between the demand and supply of mental healthcare services. However, very little attention is paid to differential impact and potential discrimination in digital mental health services with respect to different sensitive user groups (e.g., race, age, gender, ethnicity, socio-economic status), as both the extant literature and the market lack the corresponding evidence. In this paper, we outline a 7-step model to assess algorithmic discrimination in digital mental health services, focusing on algorithmic bias assessment and differential impact. We conduct a pilot analysis with 610 users, applying the model to a digital wellbeing service called Foundations that incorporates a rich set of 150 proposed activities designed to increase wellbeing and reduce stress. We further apply the 7-step model to the evaluation of two algorithms that could extend the current service: a monitoring step-up model and a popularity-based activities recommender system. This study applies an algorithmic fairness analysis framework for digital mental health and explores differences in the outcome metrics for the interventions, monitoring model, and recommender engine for users of different age, gender, type of work, country of residence, employment status and monthly income. Systematic Review Registration: The study with main hypotheses is registered at: https://osf.io/hvtf8.
Keywords: algorithmic discrimination; algorithmic fairness; case study; digital mental health; undesired bias
Year: 2022 PMID: 36111262 PMCID: PMC9468215 DOI: 10.3389/fdgth.2022.943514
Source DB: PubMed Journal: Front Digit Health ISSN: 2673-253X
Figure 1. Foundations app screenshots.
Sensitive attributes analysed.
| Sensitive attribute | Values |
|---|---|
| Gender | Female, Male |
| Working position | Do not work, Entry level, Internship |
| Employment status | Unemployed (not searching for job), Unemployed (searching for job), Employed |
| Location of origin | South and East Asia (incl. India and China), UK, Other Western Europe |
| Age | 18–19, 20–26 |
| Level of work | Full time, Part time, Other |
| Monthly income | <£1,000, £1,000–£2,000 |
Distribution of participants per arm, total and for each value of sensitive attribute.
| Sensitive attribute | Attribute value | Group 1 | Group 2 | Group 3 | Group 4 | Total |
|---|---|---|---|---|---|---|
| Total | | 153 | 153 | 151 | 153 | 610 |
| Gender | Female | 105 | 106 | 105 | 106 | 422 |
| Gender | Male | 46 | 46 | 45 | 45 | 182 |
| Working position | Do not work | 81 | 85 | 77 | 84 | 327 |
| Working position | Entry level | 25 | 19 | 20 | 16 | 80 |
| Working position | Internship | 19 | 20 | 20 | 22 | 81 |
| Location of origin | South and East Asia | 43 | 48 | 35 | 40 | 166 |
| Location of origin | UK | 42 | 48 | 49 | 45 | 184 |
| Location of origin | Other Western Europe | 24 | 23 | 26 | 31 | 104 |
| Age | 18–19 | 17 | 20 | 24 | 19 | 80 |
| Age | 20–26 | 112 | 105 | 97 | 112 | 426 |
| Level of work | Full time | 28 | 32 | 29 | 32 | 121 |
| Level of work | Part time | 60 | 54 | 51 | 51 | 216 |
| Level of work | Other | 62 | 67 | 68 | 67 | 264 |
| Monthly income | <£1,000 | 110 | 121 | 115 | 116 | 462 |
| Monthly income | £1,000–£2,000 | 28 | 26 | 22 | 25 | 101 |
| Employment status | Unemployed (searching for job) | 48 | 56 | 49 | 48 | 201 |
| Employment status | Unemployed (not searching for job) | 32 | 33 | 38 | 36 | 139 |
| Employment status | Employed | 44 | 42 | 41 | 38 | 165 |
Contingency table for WHO-5 step ups (in the original data), split by gender values.
| Gender | WHO-5 step up: Yes | WHO-5 step up: No |
|---|---|---|
| Female | 7 (8.23%) | 78 |
| Not female | 6 (13.95%) | 37 |
Contingency table for WHO-5 increments (in the original data), split by gender values.
| Gender | WHO-5 increment: Yes | WHO-5 increment: No |
|---|---|---|
| Female | 33 (38.82%) | 52 |
| Not female | 11 (25.58%) | 32 |
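The two gender contingency tables above are the kind of 2×2 input that Fisher's exact test takes. The following is a minimal sketch of a two-sided test written with the standard library only; the function name `fisher_exact_two_sided` and the variable names are illustrative and not from the source, and the source does not state which software or which two-sided convention the authors used.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def pmf(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = pmf(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(lo, hi + 1) if pmf(k) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Counts copied from the tables above: rows are Female / Not female,
# columns are Yes / No.
step_up_by_gender = [[7, 78], [6, 37]]
increment_by_gender = [[33, 52], [11, 32]]

print("WHO-5 step up p-value:", fisher_exact_two_sided(step_up_by_gender))
print("WHO-5 increment p-value:", fisher_exact_two_sided(increment_by_gender))
```

For attributes with more than two values, one assumed reading of "minimum p-value" is that a test is run for each one-versus-rest split and the smallest result is reported; the source does not spell this out.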
Minimum p-values for Fisher’s exact test on contingency tables involving a sensitive attribute and the following targets: WHO-5 step ups in the original data, WHO-5 increments in the original data, WHO-5 step up events as predicted by the step up model, and Today Explore RecSys outputs.
| Target (minimum p-value) | Gender | Working position | Employment status | Location of origin | Age | Level of work | Monthly income |
|---|---|---|---|---|---|---|---|
| WHO-5 step up (original data) | 0.346 | 0.258 | 0.033 | 0.213 | 0.059 | 0.546 | 0.690 |
| WHO-5 increment (original data) | 0.169 | 0.292 | 0.171 | 0.031 | 0.011 | 0.200 | 0.614 |
| WHO-5 step up (step up model) | 0.091 | 0.549 | 0.049 | 0.049 | 0.112 | 0.517 | 0.338 |
| Today Explore (RecSys model) | 0.035 | 0.399 | 0.102 | 0.270 | 0.343 | 0.613 | 0.257 |
Statistically significant p-values are marked with *.
Figure 2. Percentage of WHO-5 step-ups for each sensitive attribute. None of the differences remain statistically significant after correcting for multiple hypothesis testing.
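To illustrate how a multiple-testing correction can remove nominal significance, here is a minimal sketch of the Holm-Bonferroni adjustment applied to the minimum p-values reported for the WHO-5 targets in the original data. The source does not say which correction procedure or which family of tests the authors used, so the choice of Holm and of a seven-attribute family per target are assumptions, and `holm_adjust` is an illustrative name.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Minimum p-values copied from the table above, in the order: Gender,
# Working position, Employment status, Location of origin, Age,
# Level of work, Monthly income.
step_up_original = [0.346, 0.258, 0.033, 0.213, 0.059, 0.546, 0.690]
increment_original = [0.169, 0.292, 0.171, 0.031, 0.011, 0.200, 0.614]

for name, row in [("step up", step_up_original), ("increment", increment_original)]:
    adjusted = holm_adjust(row)
    print(name, [round(p, 3) for p in adjusted])
```

Under these assumptions, every adjusted value exceeds 0.05, which is consistent with the figure caption's statement that no difference survives correction.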
Performance of machine learning models predicting a step up in WHO-5 level (e.g., regular to low = step up).
| Model | Confusion matrix | AUC | Sensitivity | Specificity | Precision | Recall | Balanced score | Kappa |
|---|---|---|---|---|---|---|---|---|
| XGBoost step up model | [47, 6], [7, 5] | 0.66 | 0.42 | 0.89 | 0.45 | 0.42 | 0.65 | 0.31 |
| Logistic regression step up model | [47, 6], [8, 4] | 0.69 | 0.33 | 0.89 | 0.40 | 0.33 | 0.61 | 0.24 |
| Random forest step up model | [50, 3], [10, 2] | 0.74 | 0.16 | 0.94 | 0.40 | 0.16 | 0.56 | 0.14 |
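The metrics in this table can be re-derived from the confusion matrices. The sketch below assumes each matrix is laid out as `[[TN, FP], [FN, TP]]` (the scikit-learn convention; the source does not state the layout) and that "Balanced score" is the mean of sensitivity and specificity. The function name `binary_metrics` is illustrative. AUC is not computed here because it needs the predicted scores, not just the confusion matrix.

```python
def binary_metrics(confusion_matrix):
    """Derive summary metrics from a 2x2 matrix assumed to be [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = confusion_matrix
    n = tn + fp + fn + tp

    sensitivity = tp / (tp + fn)  # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    balanced = (sensitivity + specificity) / 2

    # Cohen's kappa: observed agreement versus agreement expected by chance.
    p_observed = (tn + tp) / n
    p_expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "balanced": balanced,
        "kappa": kappa,
    }


# Confusion matrices copied from the table above.
models = {
    "XGBoost": [[47, 6], [7, 5]],
    "Logistic regression": [[47, 6], [8, 4]],
    "Random forest": [[50, 3], [10, 2]],
}

for name, cm in models.items():
    print(name, {k: round(v, 3) for k, v in binary_metrics(cm).items()})
```

Under the assumed layout, the XGBoost row reproduces the reported sensitivity, specificity, precision, balanced score and kappa to two decimals, which supports the `[[TN, FP], [FN, TP]]` reading.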
Contingency table for Today Explore RecSys outputs, split by gender values.
| Gender | Activity selected: Yes | Activity selected: No | Clickthrough rate |
|---|---|---|---|
| Female | 128 | 2,303 | 5.27% |
| Not female | 27 | 762 | 3.42% |
Figure 3. Today Explore RecSys clickthrough rate for each sensitive attribute. None of the differences remain statistically significant after correcting for multiple hypothesis testing.