Kasper van Mens, Sascha Kwakernaak, Richard Janssen, Wiepke Cahn, Joran Lokkerbol, Bea Tiemens.
Abstract
A mental healthcare system in which scarce resources are allocated equitably and efficiently benefits from a predictive model of expected service use. The skewness of service use is a challenge for such models. In this study, we applied a machine learning approach to forecast expected service use, as a starting point for agreements between financiers and suppliers of mental healthcare. This study used administrative data from a large mental healthcare organization in the Netherlands. A training set was selected using records from 2017 (N = 10,911), and a test set was selected using records from 2018 (N = 10,201). A baseline model and three random forest models were created from different types of input data to predict (the remainder of) the numeric individual treatment hours. A visual analysis was performed on the individual predictions. Patients consumed 62 h of mental healthcare on average in 2018. The model that best predicted service use had a mean error of 21 min at the insurance group level and a mean absolute error of 28 h at the patient level. Service use was systematically underpredicted for patients with high service use. The application of machine learning techniques to mental healthcare data is useful for predicting expected service use at the group level. The results indicate that these models could support financiers and suppliers of healthcare in the planning and allocation of resources. Nevertheless, uncertainty in the prediction of high-cost patients remains a challenge.
Keywords: Machine learning; Mental healthcare; Resource allocation
Year: 2021 PMID: 34463857 PMCID: PMC8732820 DOI: 10.1007/s10488-021-01150-6
Source DB: PubMed Journal: Adm Policy Ment Health ISSN: 0894-587X
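The abstract reports an error of minutes at the insurance group level next to an error of many hours at the patient level. The following minimal sketch illustrates how both can hold at once: signed errors cancel when averaged over a group, while absolute errors do not. The hours below are made-up toy values, not study data, and the sign convention (predicted minus actual) is an assumption, since the record does not state which convention the authors used.

```python
# Toy illustration: signed errors cancel in a group average, absolute errors do not.
# All numbers are hypothetical and chosen only to show the effect of a skewed outcome.

def mean_error(actual, predicted):
    """Signed mean error (ME). Assumed convention: predicted minus actual."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_error(actual, predicted):
    """Mean absolute error (MAE): average size of the per-patient miss."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

# Four low-use patients are over-predicted, one high-use patient is under-predicted.
actual_hours = [5, 10, 20, 40, 235]
predicted_hours = [30, 35, 45, 60, 140]

print("ME :", mean_error(actual_hours, predicted_hours))
print("MAE:", mean_absolute_error(actual_hours, predicted_hours))
```

In this toy group the over-predictions for low-use patients offset the under-prediction for the single high-use patient, so the group total is matched even though every individual prediction is off.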
Demographic description of patient population in the training data (N = 10,911)
| Demographic variables | Mean | SD | % |
|---|---|---|---|
| Age | 44.0 | 16.55 | |
| Gender, female | | | 55.7 |
| Marital status | | | |
| Married | | | 19.5 |
| Living together, unmarried | | | 4.8 |
| Unmarried, never been married | | | |
| Divorced | | | 9.5 |
| Widowed | | | 1.9 |
| Unknown | | | 25.4 |
| Education | | | |
| High | | | 15.6 |
| Secondary | | | |
| Primary | | | 1.5 |
| Unknown | | | 39.2 |
| Living condition | | | |
| Single | | | 16.2 |
| Without partner, with children | | | 2.4 |
| With partner, without children | | | 7.2 |
| With partner, with children | | | 7.0 |
| Child with single parent | | | 1.4 |
| Child with multiple parent | | | 3.4 |
| Jail, institutionalized, homeless | | | 2.8 |
| Unknown | | | 59.7 |
Clinical description of the patient population in the training data (N = 10,911)
| Clinical features | Mean | SD | % |
|---|---|---|---|
| Main diagnosis group | | | |
| Personality disorders | | | 22.2 |
| Schizophrenia and other psychotic disorders | | | 21.9 |
| Depressive disorders | | | 13.9 |
| Bipolar disorders | | | 11.1 |
| Anxiety disorders | | | 10.4 |
| Somatic symptom disorders | | | 5.0 |
| Pervasive developmental disorders | | | 4.8 |
| Delirium, dementia | | | 3.6 |
| Eating disorders | | | 2.7 |
| Substance related disorders | | | 1.9 |
| Other diagnosis | | | 2.6 |
| Occupational problem (DSM-IV) at start of DRG | | | 10.9 |
| Legal measure at start of DRG | | | 6.9 |
| Acute care at start of DRG | | | 6.1 |
| Global Assessment of Functioning at start of DRG | 48.5 | 10.65 | |
| T-score baseline at start of DRG | 48.0 | 10.85 | |
| Treatment duration from start DRG, years | 4.6 | 6.06 |
Results on test data (2018, N = 10,201)
| Insurer | N | Mean hours | Baseline model (R2 = 0.00) ME | Baseline ME CI | Baseline MAE | Baseline MAE CI | Model1 (R2 = 0.18) ME | Model1 ME CI | Model1 MAE | Model1 MAE CI |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3860 | 60.27 | −1.62 | −3.66 to 0.44 | 45.86 | 44.42 to 47.36 | 0.51 | −1.38 to 2.42 | 40.09 | 38.8 to 41.47 |
| 2 | 1355 | 63.38 | −1.64 | −5.3 to 1.89 | 49.7 | 47.29 to 52.25 | 3.25 | 0.02 to 6.55 | 43.19 | 40.89 to 45.5 |
| 3 | 300 | 68.54 | 7.58 | −1.99 to 16.87 | 57.18 | 50.74 to 63.59 | 9.95 | 1.83 to 18.12 | 51.32 | 45.36 to 57.13 |
| 4 | 1472 | 62.77 | −2.02 | −5.27 to 1.37 | 47.08 | 44.63 to 49.45 | 3.89 | 0.91 to 6.94 | 41.14 | 38.9 to 43.22 |
| 5 | 1431 | 59.25 | 1.70 | −1.55 to 4.84 | 44.5 | 42.26 to 46.93 | 6.84 | 3.85 to 9.74 | 40.51 | 38.28 to 42.71 |
| 6 | 1783 | 63.68 | 0.99 | −2.08 to 4.13 | 49.06 | 46.92 to 51.21 | 4.12 | 1.23 to 7.07 | 43.56 | 41.6 to 45.62 |
| Total | 10,201 | 61.74 | −0.49 | −1.81 to 0.78 | 47.25 | 46.35 to 48.14 | 3.16 | 1.98 to 4.32 | 41.64 | 40.81 to 42.45 |
Aggregated predictions on test data for each insurance company population
ME mean error, MAE mean absolute error, with 95% bootstrapped confidence intervals
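The table note mentions 95% bootstrapped confidence intervals without saying how they were computed. As a rough sketch, assuming a simple percentile bootstrap over patient-level errors (the authors may have used a different variant), an interval for the mean error could be obtained as follows. The function name `bootstrap_ci`, its parameters, and the error values are hypothetical.

```python
import random

def bootstrap_ci(errors, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `errors`.

    Assumption: plain percentile method; the source does not name the variant.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(errors)
    # Resample patients with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(errors, k=n)) / n for _ in range(n_resamples))
    lower_idx = round((alpha / 2) * n_resamples)
    upper_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lower_idx], means[upper_idx]

# Hypothetical per-patient errors in hours (not study data).
toy_errors = [25, 25, 25, 20, -95, 10, -5, 15, -30, 10]

lower, upper = bootstrap_ci(toy_errors)
print(f"ME = {sum(toy_errors) / len(toy_errors):.2f} h, 95% CI {lower:.2f} to {upper:.2f}")
```

Replacing the per-patient error with its absolute value before calling the function would give the corresponding interval for the MAE.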
Fig. 1 Scatterplot of predicted versus actual hours of Model3 on test data 2018 (N = 10,201)
Top five most predictive variables for each model with scaled (relative) variable importance values
| Rank | Model1 variable | Importance | Model2 variable | Importance | Model3 variable | Importance |
|---|---|---|---|---|---|---|
| 1 | GAF | 100 | Hours previous year | 100 | Time spent in hours in month 2 | 100 |
| 2 | T-score baseline (ROM) | 76 | Duration of treatment at start of DRG | 50 | Time spent in hours in month 1 | 26 |
| 3 | Age | 70 | Crisis situation previous year | 44 | Duration of treatment at start of DRG | 22 |
| 4 | Raw score baseline (ROM) | 57 | T-score baseline (ROM) | 38 | Time spent on intake activities in month 1 and/or 2 | 20 |
| 5 | Legal measures | 54 | Age | 37 | Time spent on treatment appointments in month 1 and/or 2 | 20 |
The variable that contributed the most to model performance is set to 100, and the contribution of each other variable is expressed relative to that of the most contributing variable
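The scaling described in the note can be sketched in a few lines: divide each raw importance by the largest one and multiply by 100. The raw scores below are invented for illustration and were picked only so that the scaled output matches the Model1 column; the record does not report raw importances or the exact rounding used.

```python
def scale_importance(raw):
    """Rescale raw importances so the top variable equals 100."""
    top = max(raw.values())
    return {name: round(100 * value / top) for name, value in raw.items()}

# Hypothetical raw importance scores (not from the study).
raw_model1 = {
    "GAF": 500,
    "T-score baseline (ROM)": 380,
    "Age": 350,
    "Raw score baseline (ROM)": 285,
    "Legal measures": 270,
}

scaled_model1 = scale_importance(raw_model1)
for name, value in scaled_model1.items():
    print(f"{name}: {value}")
```

Because each model is scaled against its own top variable, the values are comparable within a model column but not across models.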