Andrew M Jones, James Lomas, Peter T Moore, Nigel Rice.
Abstract
We conduct a quasi-Monte-Carlo comparison of the recent developments in parametric and semiparametric regression methods for healthcare costs, both against each other and against standard practice. The population of English National Health Service hospital in-patient episodes for the financial year 2007-2008 (summed for each patient) is randomly divided into two equally sized subpopulations to form an estimation set and a validation set. Evaluating out-of-sample using the validation set, a conditional density approximation estimator shows considerable promise in forecasting conditional means, performing best for accuracy of forecasting and among the best four for bias and goodness of fit. The best performing model for bias is linear regression with square-root-transformed dependent variables, whereas a generalized linear model with square-root link function and Poisson distribution performs best in terms of goodness of fit. Commonly used models utilizing a log-link are shown to perform badly relative to other models considered in our comparison.
Keywords: Health econometrics; Healthcare costs; Heavy tails; Quasi‐Monte‐Carlo methods
Year: 2015 PMID: 27773970 PMCID: PMC5053270 DOI: 10.1111/rssa.12141
Source DB: PubMed Journal: J R Stat Soc Ser A Stat Soc ISSN: 0964-1998 Impact factor: 2.483
Models included in recent published comparative work
| Model | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Linear regression | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
| Linear regression (logarithmic) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
| Linear regression (square root) | ✓ | ✓ | ✓ | ✓ | ||||||
| Log‐normal | ✓ | ✓ | ✓ | ✓ | ||||||
| Gaussian GLM | ✓ | † | ||||||||
| Poisson | ✓ | ✓ | ✓ | |||||||
| Gamma | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| EEE models | ✓ | ✓ | ✓ | |||||||
| Weibull | ✓ | ✓ | ✓ | ‡ | ||||||
| Generalized gamma | ✓ | ✓ | ✓ | ✓ | ||||||
| Generalized beta of the second kind | ✓ | ✓ | ||||||||
| Finite mixture of gamma distributions | ✓ | ✓ | ||||||||
| Conditional density estimator | ✓ | ✓ | ||||||||
†Not commonly used and problematic in estimation for our data in preliminary work.
‡A special case of generalized gamma and generalized beta of the second kind distributions which are included in our analysis.
Descriptive statistics for hospital costs
| Statistic | Full population | Estimation set | Validation set |
|---|---|---|---|
| N | 6164114 | 3082057 | 3082057 |
| Mean | £2610 | £2610 | £2610 |
| Median | £1126 | £1126 | £1126 |
| Standard deviation | £5088 | £5090 | £5085 |
| Skewness | 13.03 | 12.94 | 13.13 |
| Kurtosis | 363.18 | 347.06 | 379.36 |
| Maximum | £604701 | £476458.3 | £604701 |
| 99th percentile | £19015 | £19074 | £18955 |
| 95th percentile | £8956 | £8943 | £8969 |
| 90th percentile | £6017 | £6010 | £6025 |
| 75th percentile | £2722 | £2721 | £2722 |
| 25th percentile | £610 | £610 | £610 |
| 10th percentile | £446 | £446 | £446 |
| 5th percentile | £407 | £407 | £407 |
| 1st percentile | £347 | £347 | £347 |
| Minimum | £217 | £217 | £217 |
Figure 1. Variance against mean for each of the 20 quantiles of the linear index of covariates: the data were divided into 20 subsets by using the quantiles of a simple linear predictor for healthcare costs with the set of regressors introduced later; the figure plots the means and variances of actual healthcare costs for each of these subsets, with fitted linear and quadratic trends
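The grouping described in the Figure 1 caption can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the function name `variance_mean_by_group`, the gamma data-generating process, and the use of a single simulated index in place of a fitted linear predictor are all assumptions.

```python
import math
import random
import statistics

def variance_mean_by_group(linear_index, cost, n_groups=20):
    """Sort observations by a linear index, cut them into n_groups
    (nearly) equally sized groups, and return one (mean, variance)
    pair of cost per group, ordered from lowest to highest index."""
    order = sorted(range(len(cost)), key=lambda i: linear_index[i])
    n = len(order)
    points = []
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        values = [cost[i] for i in members]
        points.append((statistics.fmean(values), statistics.variance(values)))
    return points

# Synthetic data (assumption): gamma costs with shape 2 and mean
# exp(7 + 0.5 * index), so the variance equals mean**2 / 2.
random.seed(0)
index = [random.gauss(0.0, 1.0) for _ in range(20000)]
cost = [random.gammavariate(2.0, math.exp(7.0 + 0.5 * z) / 2.0) for z in index]

points = variance_mean_by_group(index, cost)
```

Linear and quadratic trends can then be fitted to the 20 (mean, variance) points to judge how the variance scales with the mean.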
Figure 2. Diagram setting out the study design
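The design summarized in the abstract (a random split into an estimation set and a validation set, repeated sampling from the estimation set, and out-of-sample evaluation on the validation set) might look roughly like the sketch below. The population, the single covariate, the sample size of 1000, the 20 replications, and the helper `fit_ols` are hypothetical choices made only to keep the example small.

```python
import random
import statistics

def fit_ols(xs, ys):
    """Least-squares intercept and slope for a single regressor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(1)

# Hypothetical population of (covariate, cost) pairs with gamma costs.
population = []
for _ in range(20000):
    x = random.random()
    population.append((x, random.gammavariate(2.0, (500.0 + 2000.0 * x) / 2.0)))

# Randomly divide the population into two equally sized halves.
random.shuffle(population)
half = len(population) // 2
estimation, validation = population[:half], population[half:]

# Each replication draws a sample from the estimation set, fits the model,
# and scores it on the full validation set.
mean_errors = []
for _ in range(20):
    sample = random.sample(estimation, 1000)
    a, b = fit_ols([x for x, _ in sample], [y for _, y in sample])
    errors = [y - (a + b * x) for x, y in validation]  # actual minus predicted
    mean_errors.append(statistics.fmean(errors))

average_mpe = statistics.fmean(mean_errors)
```

Averaging a performance measure across replications, as in the last line, mirrors how the results tables report values averaged across replications.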
Key for model labels
| Label | Model |
|---|---|
| OLS | Linear regression |
| LOGOLSHET | Transformed linear regression (logarithmically), heteroscedastic smearing factor |
| SQRTOLSHET | Transformed linear regression (square‐root), heteroscedastic smearing factor |
| GLMLOGP | GLM, log‐link, Poisson‐type family |
| GLMLOGG | GLM, log‐link, gamma‐type family |
| GLMSQRTP | GLM, square‐root link, Poisson‐type family |
| GLMSQRTG | GLM, square‐root link, gamma‐type family |
| LOGNORM | Log‐normal |
| GG | Generalized gamma |
| GB2LOG | Generalized beta of the second kind, log‐link |
| GB2SQRT | Generalized beta of the second kind, square‐root link |
| FMMLOGG | Two‐component finite mixture of gamma densities, log‐link |
| FMMSQRTG | Two‐component finite mixture of gamma densities, square‐root link |
| EEE | EEE method |
| CDEM | Conditional density approximation estimator (multinomial logit) |
| CDEO | Conditional density approximation estimator (ordered logit) |
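For the square-root-transformed regression (SQRTOLSHET), predictions on the cost scale need a retransformation: if sqrt(y) = x'b + e, then E[y | x] = (x'b)^2 + E[e^2 | x]. The sketch below uses one regressor and assumes the heteroscedastic smearing term is estimated by regressing the squared residuals on the covariate. That is an illustrative choice, not necessarily the authors' specification, and `fit_line`, `predict_sqrt_ols`, and the synthetic data are hypothetical.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single regressor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def predict_sqrt_ols(xs, ys, x_new):
    """Fit OLS on sqrt(y), then retransform to the cost scale."""
    root_y = [math.sqrt(y) for y in ys]
    a, b = fit_line(xs, root_y)
    # Squared residuals on the square-root scale.
    sq_resid = [(r - (a + b * x)) ** 2 for x, r in zip(xs, root_y)]
    # Assumed variance model: E[e^2 | x] is approximately c + d * x.
    c, d = fit_line(xs, sq_resid)
    # Squared linear prediction plus the (non-negative) smearing term.
    return [(a + b * x) ** 2 + max(c + d * x, 0.0) for x in x_new]

# Synthetic costs whose square root is linear in x with x-dependent noise.
random.seed(2)
xs = [random.random() for _ in range(5000)]
ys = [(20.0 + 10.0 * x + random.gauss(0.0, 3.0 + 2.0 * x)) ** 2 for x in xs]

predicted = predict_sqrt_ols(xs, ys, xs)
```

In-sample, the mean of these predictions equals the mean of observed costs whenever the variance term is never clipped at zero, because OLS residuals are orthogonal to the fitted values.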
Percentage of tests rejected at the 5% significance level, when all converged, 94 converged replications, sample size 5000
| Model | Tests rejected (%) |
|---|---|
| OLS | — |
| LOGOLSHET | 99 |
| SQRTOLSHET | 0 |
| GLMLOGP | 11 |
| GLMLOGG | 99 |
| GLMSQRTP | 0 |
| GLMSQRTG | 13 |
| LOGNORM | 95 |
| GG | 89 |
| GB2LOG | 96 |
| GB2SQRT | 85 |
| FMMLOGG | 85 |
| FMMSQRTG | 82 |
| EEE | 48 |
| CDEM | 7 |
| CDEO | 1 |
Results of model performance, when all converged, sample size 5000, averaged across 94 replications
| Model | MPE | MAPE | RMSE |
|---|---|---|---|
| OLS | −1.56 | 1833.49 | 4475.49 |
| LOGOLSHET | −140.53 | 1816.63 | 4960.08 |
| SQRTOLSHET | | 1725.95 | |
| GLMLOGP | − | 1748.43 | 4557.19 |
| GLMLOGG | −147.33 | 1818.06 | 4984.86 |
| GLMSQRTP | | 1704.77 | |
| GLMSQRTG | 46.71 | | |
| LOGNORM | 64.25 | 1734.10 | 4825.51 |
| GG | 44.60 | 1750.79 | 4754.22 |
| GB2LOG | −63.96 | 1796.91 | 4873.13 |
| GB2SQRT | 134.84 | | 4483.35 |
| FMMLOGG | −3.19 | 1758.06 | 4782.69 |
| FMMSQRTG | 121.80 | | 4477.10 |
| EEE | −42.31 | 1727.28 | 4508.03 |
| CDEM | | | |
| CDEO | −10.13 | 1725.53 | 4474.84 |
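The performance measures named in the Figure 4 caption can be computed as in the sketch below, taking MPE, MAPE, and RMSE to stand for mean prediction error, mean absolute prediction error, and root-mean-square error. These expansions, the sign convention (actual minus predicted), and the function name `prediction_metrics` are assumptions made for illustration.

```python
import math

def prediction_metrics(actual, predicted):
    """Return (MPE, MAPE, RMSE) for paired actual and predicted costs."""
    errors = [y - p for y, p in zip(actual, predicted)]  # actual minus predicted
    n = len(errors)
    mpe = sum(errors) / n                              # bias
    mape = sum(abs(e) for e in errors) / n             # accuracy
    rmse = math.sqrt(sum(e * e for e in errors) / n)   # goodness of fit
    return mpe, mape, rmse

# Toy example: errors are -1, 0, and 1.
mpe, mape, rmse = prediction_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Under these definitions MPE can be negative while RMSE is never smaller than MAPE, so offsetting errors can give a model with near-zero MPE a large RMSE.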
Figure 3. MPE by decile of fitted costs: (a) OLS; (b) LOGOLSHET; (c) GLMLOGP; (d) GLMLOGG; (e) EEE; (f) SQRTOLSHET; (g) GLMSQRTP; (h) GLMSQRTG; (i) LOGNORM; (j) GG; (k) GB2LOG; (l) FMMLOGG; (m) CDEM; (n) CDEO; (o) GB2SQRT; (p) FMMSQRTG
Figure 4. Response surfaces for (a) log(RMSE), (b) log(MAPE), (c) log(ADMPE) and (d) MPE against sample size, constructed by evaluating performance on the ‘validation’ set: ……, SQRTOLSHET; ‐ ‐ ‐ ‐ ‐, GLMLOGP; – – –, GLMSQRTP; ‐ · ‐ · ‐ · ‐, GLMSQRTG; – · –, GB2SQRT; — ·—, FMMSQRTG; · · — · ·, CDEM