Abstract
BACKGROUND: The statistical analysis of health care cost data is often problematic because these data are usually non-negative, right-skewed, and have excess zeros for non-users. This prevents the use of linear models based on the Gaussian or Gamma distribution. A common way to counter this is to use two-part or Tobit models, which makes the results more difficult to interpret. In this study, I explore a distribution from the Tweedie family that can simultaneously model the probability of a zero outcome (i.e., of being a non-user of health care) and the continuous costs of users.
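For 1 < p < 2, a Tweedie variable can be written as a compound Poisson-gamma sum: a Poisson number of events, each adding a gamma-distributed amount, so a draw with zero events is an exact zero while all other draws are continuous and positive. The sketch below, using only the Python standard library, maps a mean μ, dispersion φ and power p onto those Poisson and gamma parameters and simulates costs. The helper names and the values μ = 500 and φ = 20 are illustrative assumptions; only p = 1.719 is taken from the estimate reported for the RAND HIE data.

```python
import math
import random


def tweedie_cpg_params(mu, phi, p):
    """Map Tweedie (mu, phi, p) with 1 < p < 2 to compound Poisson-gamma parameters."""
    if not 1.0 < p < 2.0:
        raise ValueError("the compound Poisson-gamma form needs 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))  # Poisson rate of positive-cost events
    shape = (2.0 - p) / (p - 1.0)              # gamma shape of each event's cost
    scale = phi * (p - 1.0) * mu ** (p - 1.0)  # gamma scale of each event's cost
    return lam, shape, scale


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def draw_tweedie_cost(mu, phi, p, rng):
    """One simulated cost: exactly 0.0 when no events occur (a non-user)."""
    lam, shape, scale = tweedie_cpg_params(mu, phi, p)
    n_events = poisson_draw(lam, rng)
    return sum((rng.gammavariate(shape, scale) for _ in range(n_events)), 0.0)


if __name__ == "__main__":
    rng = random.Random(2017)
    mu, phi, p = 500.0, 20.0, 1.719  # mu and phi are hypothetical; p as in Fig. 3
    lam, _, _ = tweedie_cpg_params(mu, phi, p)
    costs = [draw_tweedie_cost(mu, phi, p, rng) for _ in range(20000)]
    print("theoretical P(Y = 0):", round(math.exp(-lam), 3))
    print("simulated share of zeros:", sum(c == 0.0 for c in costs) / len(costs))
    print("simulated mean cost:", round(sum(costs) / len(costs), 1))
```

With these values λ ≈ 1.02, so about 36% of simulated individuals are non-users, and the mean and variance satisfy E(Y) = μ and Var(Y) = φμ^p by construction. One distribution thus covers both the zero mass and the positive costs that a two-part model splits into separate equations.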
Keywords: Cost data; Health care utilization; Health economics; Tweedie distribution
Year: 2017 PMID: 29258428 PMCID: PMC5735804 DOI: 10.1186/s12874-017-0445-y
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1 Rank values for AIC and RMSE for all models, assessed in 100 simulated data sets each, in situations with different percentages of zero costs. The best model (lowest AIC or lowest RMSE) is assigned a value of 1, the worst a value of 5. Plots show the rank sums over the 100 data sets; lower values are better.
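The ranking scheme in Fig. 1 can be sketched as follows: within each simulated data set, the models are ordered by a lower-is-better criterion (AIC, or RMSE of predicted against observed costs), ranked from 1 upward, and the ranks are summed across data sets. The function names, the toy scores, and the tie handling (ties keep their input order) are assumptions for illustration, not the paper's code.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted costs."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def rank_sums(scores_per_dataset):
    """Sum per-data-set ranks; each element maps model name -> criterion value.

    Lower criterion values (AIC or RMSE) are better, so the lowest value gets rank 1.
    Ties keep the input order because sorted() is stable (an assumption here).
    """
    totals = {}
    for scores in scores_per_dataset:
        ordered = sorted(scores, key=scores.get)
        for rank, model in enumerate(ordered, start=1):
            totals[model] = totals.get(model, 0) + rank
    return totals


if __name__ == "__main__":
    # Hypothetical AIC values for two simulated data sets (not results from the paper).
    toy = [
        {"Tobit": 3.0, "Tweedie": 1.0, "Poisson": 2.0},
        {"Tobit": 2.5, "Tweedie": 0.5, "Poisson": 3.0},
    ]
    print(rank_sums(toy))  # {'Tweedie': 2, 'Poisson': 5, 'Tobit': 5}
```

Summing ranks rather than raw AIC values keeps one extreme data set from dominating the comparison, which matches the "lower rank sum is better" reading of Fig. 1.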
Comparison of marginal effects of Tobit, Tweedie, Poisson, and two-part (Binomial/Gamma and Binomial/GenG) models on the RAND HIE data
| Variable | Tobit Est. | Tobit SE | Tweedie Est. | Tweedie SE | Poisson Est. | Poisson SE | Two-part Binomial Est. | Two-part Binomial SE | Two-part Gamma Est. | Two-part Gamma SE | Two-part GenG Est. | Two-part GenG SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Intercept | -212.276 | 118.391 | 883.966 | 112.936 | 94.892 | 94.754 | -0.091 | 0.050 | 1260.155 | 132.253 | 861.167 | 66.452 |
| age | 2.274 | 1.135 | 1.481 | 1.067 | 1.064 | 0.934 | 0.002 | 0.001 | 1.083 | 1.222 | 1.652 | 0.606 |
| disea | 6.319 | 1.695 | 3.045 | 1.566 | 3.534 | 1.408 | 0.005 | 0.001 | 2.127 | 1.806 | 5.664 | 0.895 |
| physlm | 214.712 | 34.484 | 143.054 | 31.272 | 191.965 | 28.998 | 0.040 | 0.019 | 168.177 | 36.817 | 118.515 | 18.349 |
| logc | -23.994 | 17.530 | -9.577 | 16.520 | -8.445 | 13.323 | -0.023 | 0.008 | -3.963 | 19.191 | -19.759 | 9.494 |
| idp | -7.057 | 34.130 | 10.661 | 32.162 | -3.311 | 27.925 | -0.008 | 0.016 | 14.492 | 37.642 | 26.614 | 18.545 |
| lpi | -1.806 | 5.612 | -5.907 | 5.251 | -1.625 | 4.612 | 0.001 | 0.003 | -7.198 | 6.087 | -4.135 | 3.005 |
| fmde | 1.728 | 10.453 | 3.828 | 9.846 | -0.086 | 8.550 | 0.001 | 0.005 | 4.595 | 11.310 | 0.862 | 5.614 |
| linc | 18.113 | 11.951 | 13.757 | 11.492 | 6.876 | 9.452 | 0.012 | 0.005 | 8.076 | 13.893 | 13.392 | 6.926 |
| lfam | -12.640 | 23.245 | -2.754 | 21.955 | -15.670 | 18.987 | 0.007 | 0.011 | -1.910 | 25.514 | -22.502 | 12.599 |
| female | 138.240 | 26.417 | 82.600 | 25.001 | 66.654 | 21.586 | 0.116 | 0.013 | 68.471 | 28.453 | 71.842 | 14.100 |
| black | -166.908 | 39.181 | -72.274 | 37.161 | -60.124 | 31.299 | -0.129 | 0.016 | -28.317 | 44.221 | -85.676 | 22.035 |
| educdec | 0.150 | 4.556 | -2.301 | 4.290 | -2.869 | 3.736 | 0.005 | 0.002 | -4.205 | 4.897 | 1.894 | 2.441 |
| hlthg | -17.324 | 25.583 | -13.963 | 24.077 | -23.013 | 20.990 | 0.013 | 0.012 | -20.765 | 27.486 | -18.554 | 13.627 |

| Fit statistic | Values |
|---|---|
| p | Tweedie 1.719; 2; 2 |
| AIC | Tobit 43649; Tweedie 37777; Poisson 51495; Binomial 2770; Gamma 34482; GenG 33439 |
| RMSE | 573.26; 568.14; 572.33; 568.60; 568.23 |
p is the estimated mean-variance power parameter
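Because the table reports an estimate and a standard error for each effect, a Wald statistic z = Est./SE with a normal approximation gives a quick significance check. The sketch below applies it to three Tweedie rows copied from the table; treating the reported SEs as asymptotic standard errors with an approximately normal sampling distribution is an assumption of this illustration, and the helper name `wald` is hypothetical.

```python
import math


def wald(est, se, crit=1.959964):
    """Wald z, two-sided normal p-value, and approximate 95% confidence interval."""
    z = est / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    ci = (est - crit * se, est + crit * se)
    return z, p_value, ci


# (Est., SE) pairs taken from the Tweedie columns of the table above.
TWEEDIE_ROWS = {
    "physlm": (143.054, 31.272),
    "female": (82.600, 25.001),
    "black": (-72.274, 37.161),
}

if __name__ == "__main__":
    for name, (est, se) in TWEEDIE_ROWS.items():
        z, p_value, (lo, hi) = wald(est, se)
        print(f"{name:8s} z={z:6.2f}  p={p_value:.4f}  95% CI=({lo:.1f}, {hi:.1f})")
```

Under this approximation, physlm (z ≈ 4.57) and female (z ≈ 3.30) are clearly different from zero, while black (z ≈ -1.94) falls just short of the conventional 5% threshold.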
Fig. 2 Q-Q plots for true and estimated quantiles of total health care utilization in the RAND HIE data for all models. Because of heavy outliers, I do not show the last percentile. Quantile values closer to the dashed line represent a better match of empirical and estimated distributions.
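A Q-Q comparison like Fig. 2 pairs each empirical percentile with the same percentile of the fitted model. The sketch below assumes the model percentiles come from a large set of draws simulated from the fitted model; `statistics.quantiles(..., n=100)` returns only the 1st to 99th percentiles, so the mass above the 99th percentile is left out, mirroring the caption. The helper names and the toy data are hypothetical.

```python
import statistics


def qq_pairs(observed, model_draws, n=100):
    """Pair empirical quantiles with quantiles of draws from a fitted model.

    With n=100, statistics.quantiles returns the 1st to 99th percentiles, so
    values above the 99th percentile (the heavy outliers) are not compared.
    """
    empirical = statistics.quantiles(observed, n=n, method="inclusive")
    estimated = statistics.quantiles(model_draws, n=n, method="inclusive")
    return list(zip(empirical, estimated))


def max_qq_gap(pairs):
    """Largest absolute distance from the 45-degree line; 0 means a perfect match."""
    return max(abs(e - m) for e, m in pairs)


if __name__ == "__main__":
    observed = list(range(101))
    model = [2 * x for x in observed]  # hypothetical model that doubles every cost
    pairs = qq_pairs(observed, model)
    print(pairs[9], max_qq_gap(pairs))  # (10.0, 20.0) 99.0
```

Points lying on the dashed line correspond to a gap of zero; a model that systematically overstates costs, as in the toy example, drifts further from the line at higher percentiles.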
Fig. 3 Mean-variance plots for all 5% quantiles for the Tweedie model. The solid line represents the estimated value for the mean-variance power parameter p = 1.719. Other values are plotted for comparison.
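Under a Tweedie model the variance follows Var(Y) = φμ^p, so on a log-log scale the group variances should fall on a line with slope p. The sketch below groups observations into 20 equal-size bins by fitted mean (5% quantiles) and fits that line by ordinary least squares. This moment-based regression is a swapped-in diagnostic for illustration, not necessarily how p = 1.719 was estimated in the paper, and the function names are hypothetical.

```python
import math
import statistics


def group_mean_var(fitted, observed, groups=20):
    """Sort by fitted mean, split into equal-size groups (20 groups = 5% quantiles),
    and return the (mean, variance) of the observed costs in each group.
    Observations left over when len(observed) is not divisible by `groups` are dropped."""
    order = sorted(range(len(observed)), key=lambda i: fitted[i])
    size = len(order) // groups
    out = []
    for g in range(groups):
        ys = [observed[i] for i in order[g * size:(g + 1) * size]]
        out.append((statistics.fmean(ys), statistics.variance(ys)))
    return out


def fit_power(mean_var_pairs):
    """OLS of log(variance) on log(mean); under Var = phi * mu**p the slope is p.
    Groups with a zero mean or zero variance (e.g. only non-users) are skipped."""
    pts = [(math.log(m), math.log(v)) for m, v in mean_var_pairs if m > 0 and v > 0]
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pts)
    p = sxy / sxx
    phi = math.exp(ybar - p * xbar)
    return p, phi


if __name__ == "__main__":
    # Exact mean-variance pairs built with hypothetical phi = 2.5 and p = 1.719.
    pairs = [(m, 2.5 * m ** 1.719) for m in (10.0, 50.0, 100.0, 500.0, 1000.0)]
    print(fit_power(pairs))  # approximately (1.719, 2.5)
```

Drawing reference lines for other values of p, as Fig. 3 does, amounts to plotting log(φ) + p·log(μ) for several fixed p and checking which slope the grouped points follow.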