L G Glance (1), T M Osler, P Papadakos. (1) University of Rochester School of Medicine, NY, USA. laurent_glance@urmc.rochester.edu
Abstract
OBJECTIVE: To evaluate the impact of case mix variation on the performance of the Acute Physiology and Chronic Health Evaluation (APACHE) II using measures of calibration and discrimination. DESIGN: APACHE II data were collected prospectively at the surgical intensive care unit of the University of Vermont on all adult admissions over an 8-yr period (excluding cardiac surgical patients, burn patients, and patients < 16 yrs of age). Using a computer-intensive resampling algorithm, the original case mix was systematically varied to create 2,000 different case mixes with mortality rates ranging from 5% to 18%. The area under the receiver operating characteristic curve and the Hosmer-Lemeshow C statistic were estimated for each simulated case mix using bootstrapping. SETTING: The surgical intensive care unit at a 450-bed teaching hospital. PATIENTS: A total of 6,806 adult surgical patients, excluding cardiac surgical and burn patients. MEASUREMENTS AND RESULTS: Simulated data sets were created from a database of patients treated at a single institution to test the hypothesis that the performance of APACHE II is stable across a clinically reasonable range of mortality rates. The discrimination and calibration of APACHE II varied with case mix. CONCLUSION: The discrimination of APACHE II is not independent of case mix. However, the variability of the Hosmer-Lemeshow statistic as a function of case mix may simply reflect the limitations of this goodness-of-fit statistic for assessing model calibration. Because the discrimination of APACHE II is a function of case mix, caution should be exercised when using APACHE II-based adjusted mortality rates to compare intensive care units with widely divergent case mixes.
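The abstract does not specify how the resampling algorithm or the bootstrap was implemented. The following is only a minimal sketch of the two performance measures it reports, computed from per-patient predicted mortality risks and observed outcomes. It assumes a rank-based (Mann-Whitney) estimate of the ROC area, the Hosmer-Lemeshow C statistic with patients grouped into deciles of predicted risk, and a simple percentile bootstrap. The function names (`auc`, `hosmer_lemeshow_c`, `bootstrap_ci`) and all data are hypothetical illustrations, not the authors' code or the study data.

```python
import random


def auc(probs, outcomes):
    """ROC area via the Mann-Whitney rank-sum statistic (ties get average ranks)."""
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied predicted risks.
        while j + 1 < n and probs[order[j + 1]] == probs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n1 = sum(outcomes)  # deaths
    n0 = n - n1         # survivors
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one death and one survivor")
    rank_sum = sum(r for r, y in zip(ranks, outcomes) if y == 1)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


def hosmer_lemeshow_c(probs, outcomes, groups=10):
    """Hosmer-Lemeshow C statistic using groups of (nearly) equal size by predicted risk."""
    n = len(probs)
    if n < groups:
        raise ValueError("need at least as many patients as groups")
    order = sorted(range(n), key=lambda i: probs[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        expected = sum(probs[i] for i in idx)    # expected deaths in the group
        observed = sum(outcomes[i] for i in idx)  # observed deaths in the group
        mean_risk = expected / n_g
        denom = n_g * mean_risk * (1 - mean_risk)
        if denom == 0:
            raise ValueError("group mean predicted risk must lie strictly between 0 and 1")
        stat += (observed - expected) ** 2 / denom
    return stat


def bootstrap_ci(probs, outcomes, statistic, reps=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample patients with replacement (an assumption)."""
    rng = random.Random(seed)
    n = len(probs)
    values = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            values.append(statistic([probs[i] for i in idx], [outcomes[i] for i in idx]))
        except ValueError:
            continue  # skip degenerate resamples (e.g. no deaths)
    if not values:
        raise ValueError("every bootstrap resample was degenerate")
    values.sort()
    m = len(values)
    return values[int((alpha / 2) * (m - 1))], values[int((1 - alpha / 2) * (m - 1))]


if __name__ == "__main__":
    # Synthetic illustration only: NOT data from the study.
    rng = random.Random(42)
    probs = [rng.uniform(0.01, 0.6) for _ in range(1000)]
    outcomes = [1 if rng.random() < p else 0 for p in probs]
    print("AUC:", round(auc(probs, outcomes), 3))
    print("Hosmer-Lemeshow C:", round(hosmer_lemeshow_c(probs, outcomes), 2))
    print("AUC 95% bootstrap interval:", bootstrap_ci(probs, outcomes, auc))
```

Repeating these calculations on each simulated case mix would give the kind of per-case-mix estimates of discrimination and calibration the abstract describes. How the 2,000 case mixes themselves were generated is not stated and is not modeled here.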