Thomas Weichle, Denise M Hynes, Ramon Durazo-Arvizu, Elizabeth Tarlov, Qiuying Zhang.
Abstract
The distributions of medical costs are often skewed to the right because small numbers of patients use large amounts of health care resources. Using data from a study of colon cancer costs, we show, by example, the impact and magnitude of outliers and influential observations on health care costs, and we compare the effects of statistical costing methods for addressing their disproportionate influence. We used data from a retrospective cohort study of 3,842 elderly veterans with colon cancer who were enrolled in, and used health care from, both the Department of Veterans Affairs and Medicare in 1999–2004. After calculating the average colon cancer episode cost and distribution for the full cohort, we used box-plot methods, Winsorization, DFBETAs, and Cook's distance to identify and assess or adjust the outlying and/or influential observations. The number of observations identified as outlying and/or influential ranged from 13 (when the predicted DFBETA measurement was greater than 0.15 and the observation was a qualified box-plot outlier) to 384 (using the Winsorization method at the 5th and 95th percentiles). Average costs of colon cancer episodes using these methods were similar. Based on the results of this particular analysis, the choice of method can depend on whether the purpose is to control only for influential observations or to control simultaneously for outliers and influential observations. Understanding how estimates could change with each approach is important in assessing the impact of a particular method on the results.
Keywords: Colon cancer; Episode of care; Health care costs; Influential observations; Outliers
Year: 2013 PMID: 24303338 PMCID: PMC3843184 DOI: 10.1186/2193-1801-2-614
Source DB: PubMed Journal: Springerplus ISSN: 2193-1801
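The two distribution-based approaches named in the abstract, box-plot outlier detection and Winsorization, can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the source: the data are synthetic log-normal costs (not the study data), the box-plot rule is the conventional Tukey fence at 1.5 times the interquartile range (the abstract does not state what makes an outlier "qualified"), and the helper names `boxplot_outliers` and `winsorize` are hypothetical.

```python
import numpy as np

def boxplot_outliers(costs, k=1.5):
    """Flag values outside Tukey's fences: below Q1 - k*IQR or above Q3 + k*IQR.
    k = 1.5 is the conventional default, assumed here."""
    q1, q3 = np.percentile(costs, [25, 75])
    iqr = q3 - q1
    return (costs < q1 - k * iqr) | (costs > q3 + k * iqr)

def winsorize(costs, lower_pct, upper_pct):
    """Replace values beyond the given percentiles with the percentile values.
    Every observation is kept; only the tails are pulled in."""
    lo, hi = np.percentile(costs, [lower_pct, upper_pct])
    return np.clip(costs, lo, hi)

# Synthetic right-skewed costs, for illustration only (NOT the study data).
rng = np.random.default_rng(0)
costs = rng.lognormal(mean=10.2, sigma=0.8, size=3842)

is_outlier = boxplot_outliers(costs)
w05 = winsorize(costs, 5, 95)          # 5th and 95th percentiles, as in the abstract
n_changed = int((w05 != costs).sum())  # observations altered by Winsorization

print(f"box-plot outliers flagged: {int(is_outlier.sum())}")
print(f"observations Winsorized:   {n_changed}")
print(f"mean, full sample:         {costs.mean():,.0f}")
print(f"mean, outliers removed:    {costs[~is_outlier].mean():,.0f}")
print(f"mean, Winsorized 5/95:     {w05.mean():,.0f}")
```

Note the design difference the sketch makes visible: removing box-plot outliers shrinks the sample, whereas Winsorization keeps all observations and only caps extreme values, so roughly 10% of observations are altered at the 5th and 95th percentiles.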
Sample characteristics of veterans with colon cancer
| | Full sample (N = 3,842), % |
|---|---|
| | |
| 66–75 | 49.2 |
| 76–85 | 45.8 |
| 86 and older | 5.0 |
| | |
| Male | 96.5 |
| | |
| African American | 15.5 |
| | |
| Not married | 37.7 |
| Married | 59.6 |
| Unknown | 2.8 |
| | |
| I | 26.8 |
| II | 30.7 |
| III | 23.2 |
| IV | 19.3 |
| | |
| 0 | 51.4 |
| 1 | 25.6 |
| 2–3 | 18.5 |
| 4 or higher | 4.5 |
| | |
| Yes | 33.6 |
| | |
| Yes | 89.4 |
| | |
| 4th quartile | 24.9 |
| | |
| High use | 23.5 |
| | |
| New England | 3.1 |
| Middle Atlantic | 16.6 |
| East North Central | 9.3 |
| West North Central | 10.4 |
| South Atlantic | 18.9 |
| East South Central | 5.8 |
| West South Central | 13.7 |
| Mountain | 2.8 |
| Pacific | 19.4 |
a. Deyo-Charlson Comorbidity Index with Romano adaptations.
b. Measured from months −6 to 0 in study period.
c. Measured from months 0 to 12 in study period.
d. Measured from months −1 to 6 in study period.
e. Measured from months −6 to −1 in study period.
Figure 1. Sample size, average cost, and cost distribution for each analytic approach. BP: box-plot.
Summary of costs for observations identified as outliers and influential observations
| | Statistic | BP outliers | Winsor02 | Winsor05 | DFBETA03 | DFBETA15 | BP/DFBETA15 | Cook’s distance |
|---|---|---|---|---|---|---|---|---|
| | N | 227 | 152 | 384 | 275 | 16 | 13 | 113 |
| | Mean | 52,952 | 108,152 | 77,669 | 99,398 | 265,093 | 299,690 | 164,845 |
| | SD | 113,674 | 128,017 | 94,688 | 94,326 | 117,330 | 98,632 | 104,123 |
| | Range | 43–679,472 | 43–679,472 | 43–679,472 | 100–679,472 | 50,397–553,115 | 174,413–553,115 | 33,642–679,472 |
| I | N | 135 | 68 | 167 | 114 | 8 | 6 | 39 |
| | Mean | 14,548 | 49,340 | 32,973 | 63,414 | 230,744 | 258,473 | 132,428 |
| | SD | 57,758 | 90,478 | 67,817 | 72,948 | 69,944 | 55,354 | 70,472 |
| | Range | 77–358,478 | 77–358,478 | 77–358,478 | 100–358,478 | 132,207–358,478 | 213,036–358,478 | 45,404–358,478 |
| II | N | 40 | 40 | 91 | 62 | 3 | 2 | 41 |
| | Mean | 112,481 | 155,546 | 116,974 | 136,742 | 319,201 | 453,603 | 172,793 |
| | SD | 166,986 | 150,250 | 110,805 | 119,407 | 253,169 | 140,731 | 127,453 |
| | Range | 43–679,472 | 43–679,472 | 43–679,472 | 105–679,472 | 50,397–553,115 | 354,092–553,115 | 33,642–679,472 |
| III | N | 24 | 22 | 64 | 40 | 2 | 2 | 22 |
| | Mean | 165,703 | 207,876 | 134,272 | 155,595 | 241,240 | 241,240 | 195,733 |
| | SD | 131,600 | 97,587 | 84,274 | 90,093 | 94,508 | 94,508 | 98,996 |
| | Range | 120–405,892 | 120–405,892 | 120–405,892 | 53,407–405,892 | 174,413–308,068 | 174,413–308,068 | 61,253–405,892 |
| IV | N | 28 | 22 | 62 | 59 | 3 | 3 | 11 |
| | Mean | 56,426 | 104,038 | 81,940 | 91,583 | 318,483 | 318,483 | 188,381 |
| | SD | 112,415 | 119,830 | 84,699 | 70,256 | 60,079 | 60,079 | 99,654 |
| | Range | 71–362,784 | 71–362,784 | 71–362,784 | 362–362,784 | 250,098–362,784 | 250,098–362,784 | 66,227–362,784 |
| No | N | 73 | 27 | 90 | 110 | 4 | 1 | 14 |
| | Mean | 5,190 | 16,293 | 16,885 | 34,338 | 153,281 | 267,611 | 100,700 |
| | SD | 31,172 | 59,158 | 45,320 | 44,217 | 89,802 | --- | 61,065 |
| | Range | 90–267,611 | 90–267,611 | 90–267,611 | 100–267,611 | 50,397–267,611 | 267,611–267,611 | 33,642–267,611 |
| Yes | N | 154 | 125 | 294 | 165 | 12 | 12 | 99 |
| | Mean | 75,592 | 127,993 | 96,276 | 142,771 | 302,364 | 302,364 | 173,917 |
| | SD | 130,482 | 130,341 | 98,049 | 93,989 | 102,525 | 102,525 | 105,947 |
| | Range | 43–679,472 | 43–679,472 | 43–679,472 | 394–679,472 | 174,413–553,115 | 174,413–553,115 | 45,949–679,472 |
BP: box-plot.
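The regression-based diagnostics in the table above (DFBETA and Cook's distance) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it fits ordinary least squares to synthetic log-scale costs as a stand-in for whatever cost model the study used, the function name `influence_measures` is hypothetical, and the cut-offs 2/√n and 4/n are common rules of thumb rather than values taken from the source. For n = 3,842, 2/√n is about 0.032, which is close to the 0.03 suggested by the DFBETA03 column label, though the source does not confirm that derivation.

```python
import numpy as np

def influence_measures(X, y):
    """Scaled DFBETAs and Cook's distance for an OLS fit of y on X.
    X must already contain an intercept column."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)  # leverages (hat-matrix diagonal)
    s2 = resid @ resid / (n - p)                 # residual variance, full fit
    # Residual variance with observation i left out (closed form, no refitting).
    s2_loo = ((n - p) * s2 - resid**2 / (1 - h)) / (n - p - 1)
    # Coefficient change b - b_(i) caused by dropping observation i; shape (n, p).
    delta = (X @ xtx_inv) * (resid / (1 - h))[:, None]
    dfbetas = delta / np.sqrt(s2_loo[:, None] * np.diag(xtx_inv)[None, :])
    cooks_d = resid**2 / (p * s2) * h / (1 - h) ** 2
    return dfbetas, cooks_d

# Synthetic data, for illustration only (NOT the study data):
# log cost as a function of one hypothetical binary covariate.
rng = np.random.default_rng(1)
n = 3842
x = (rng.random(n) < 0.1).astype(float)
x[0] = 1.0
X = np.column_stack([np.ones(n), x])
p = X.shape[1]
y = 10.2 + 0.3 * x + rng.normal(0.0, 0.8, size=n)
y[0] = 10.2 + 0.3 + 4.0  # plant one very costly case in the smaller group

dfbetas, cooks_d = influence_measures(X, y)

dfbetas_cut = 2 / np.sqrt(n)  # rule-of-thumb cut-off (assumption)
cooks_cut = 4 / n             # rule-of-thumb cut-off (assumption)
flag_dfbetas = np.abs(dfbetas[:, 1]) > dfbetas_cut
flag_cooks = cooks_d > cooks_cut

print(f"DFBETAS cut-off {dfbetas_cut:.3f}: {int(flag_dfbetas.sum())} flagged")
print(f"Cook's D cut-off {cooks_cut:.5f}: {int(flag_cooks.sum())} flagged")
print(f"planted case flagged: {bool(flag_dfbetas[0])}, {bool(flag_cooks[0])}")
```

Unlike the box-plot and Winsorization approaches, these measures depend on the fitted model: an observation is flagged because dropping it would noticeably move a coefficient or the fitted values, not simply because its cost is extreme.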
Summary of costs among cohorts identified using methods for handling outliers/influential observations
| | Statistic | BP outliers | Winsor02 | Winsor05 | DFBETA03 | DFBETA15 | BP/DFBETA15 | Cook’s distance |
|---|---|---|---|---|---|---|---|---|
| | N | 3,615 | 3,842 | 3,842 | 3,567 | 3,826 | 3,829 | 3,729 |
| | Mean | 37,409 | 36,745 | 35,714 | 33,619 | 37,379 | 37,440 | 34,493 |
| | SD | 25,755 | 27,814 | 23,763 | 22,633 | 33,671 | 33,754 | 24,792 |
| | Range | 5,310–165,803 | 693–135,659 | 6,255–96,806 | 43–210,530 | 43–679,472 | 43–679,472 | 43–228,199 |
| I | N | 893 | 1,028 | 1,028 | 914 | 1,020 | 1,022 | 989 |
| | Mean | 29,783 | 26,796 | 26,437 | 23,338 | 26,190 | 26,428 | 23,656 |
| | SD | 24,578 | 25,489 | 21,321 | 16,513 | 25,192 | 25,742 | 19,622 |
| | Range | 5,506–165,803 | 693–135,659 | 6,255–96,806 | 77–116,396 | 77–242,913 | 77–242,913 | 77–160,942 |
| II | N | 1,141 | 1,181 | 1,181 | 1,119 | 1,178 | 1,179 | 1,140 |
| | Mean | 36,547 | 37,074 | 35,725 | 33,710 | 38,406 | 38,416 | 34,312 |
| | SD | 25,869 | 27,988 | 23,505 | 22,774 | 38,106 | 38,092 | 24,154 |
| | Range | 6,315–153,706 | 693–135,659 | 6,255–96,806 | 43–210,530 | 43–679,472 | 43–679,472 | 43–210,530 |
| III | N | 866 | 890 | 890 | 850 | 888 | 888 | 868 |
| | Mean | 43,976 | 45,170 | 43,750 | 42,160 | 46,821 | 46,821 | 43,495 |
| | SD | 24,262 | 26,889 | 22,784 | 22,905 | 36,306 | 36,306 | 25,197 |
| | Range | 7,313–164,668 | 693–135,659 | 6,255–96,806 | 120–184,422 | 120–405,892 | 120–405,892 | 120–228,199 |
| IV | N | 715 | 743 | 743 | 684 | 740 | 740 | 732 |
| | Mean | 40,353 | 39,897 | 38,909 | 36,592 | 39,834 | 39,834 | 38,743 |
| | SD | 26,198 | 27,589 | 24,214 | 23,825 | 28,470 | 28,470 | 26,055 |
| | Range | 5,310–150,985 | 693–135,659 | 6,255–96,806 | 71–150,485 | 71–237,281 | 71–237,281 | 71–150,985 |
| No | N | 334 | 407 | 407 | 297 | 403 | 406 | 393 |
| | Mean | 28,911 | 24,285 | 24,531 | 21,071 | 23,380 | 24,058 | 21,947 |
| | SD | 23,990 | 24,383 | 21,417 | 15,166 | 22,543 | 24,147 | 20,392 |
| | Range | 5,310–162,908 | 693–135,659 | 6,255–96,806 | 90–99,518 | 90–135,659 | 90–162,908 | 90–127,699 |
| Yes | N | 3,281 | 3,435 | 3,435 | 3,270 | 3,423 | 3,423 | 3,336 |
| | Mean | 38,274 | 38,222 | 37,040 | 34,758 | 39,027 | 39,027 | 35,971 |
| | SD | 25,775 | 27,829 | 23,681 | 22,855 | 34,377 | 34,377 | 24,849 |
| | Range | 5,506–165,803 | 693–135,659 | 6,255–96,806 | 43–210,530 | 43–679,472 | 43–679,472 | 43–228,199 |
BP: box-plot.
Figure 2. Estimated expense rate ratios for key cost-drivers. ERR: expense rate ratio; CI: confidence interval; BP: box-plot.
Figure 3. Estimated post-modeling cost predictions for key cost-drivers. CI: confidence interval; BP: box-plot.