
Modification of Acute Physiology and Chronic Health Evaluation II score through recalibration of risk prediction model in critical care patients of a respiratory disease referral center.

Ali A Velayati, Yadollah Mehrabi, Golnar Radmand, Ali A Khadem Maboudi, Hamid R Jamaati, A Shahbazi, Seyed A Mohajerani, Seyed M R Hashemian.

Abstract

BACKGROUND: Several models have been developed to measure the severity of illness in intensive care unit (ICU) patients. It has been suggested that these models should be customized to the characteristics of different patient populations. This study aimed to assess and modify the performance of the Acute Physiology and Chronic Health Evaluation II (APACHE-II) model in a respiratory disease referral center.
MATERIALS AND METHODS: A total of 730 patients admitted to an intensive care unit during one year were divided into two sets (71.6% training and 28.4% test). Our modified APACHE-II model was developed and calibrated on the training set. The performance of the customized model was then checked and compared with the original APACHE-II on the test set. Logistic regression was used to develop the model; ROC analysis, the F-measure and the kappa coefficient were employed to evaluate it.
RESULTS: Both the original and our modified APACHE-II scores showed acceptable discriminative power (AUC = 0.908, 95% CI 0.861-0.954; and AUC = 0.856, 95% CI 0.789-0.923, respectively); the difference was not significant (P = 0.132). Our modified APACHE-II showed improved accuracy (87.9% vs. 84.1%) and sensitivity (56.4% vs. 16.3%) compared with the original model. The F-measure and kappa also indicated improvement for our modified APACHE-II system.
CONCLUSION: The results demonstrated that a modified APACHE-II system in a local ICU of respiratory disease could have similar discrimination and comparable calibration to the original model.


Keywords: Acute Physiology and Chronic Health Evaluation II; calibration; intensive care unit

Year: 2013 PMID: 23724384 PMCID: PMC3665118 DOI: 10.4103/2229-5151.109419

Source DB: PubMed Journal: Int J Crit Illn Inj Sci ISSN: 2229-5151

INTRODUCTION

Patients admitted to intensive care units (ICUs) usually differ widely in their physiological condition. Comparing patient outcomes without accounting for these individual physiological differences is not meaningful, so measures that represent the physiological condition of ICU patients are needed to control and adjust for them. Over the past decades, various models have been applied to ICU patient outcomes and compared. These risk classification systems can also be used to evaluate the performance of ICUs. They are statistical models that estimate the probability of an event, such as death, from the characteristics and information of each individual patient.[1] Systems such as the Simplified Acute Physiology Score (SAPS), the Mortality Probability Model (MPM), and the Acute Physiology and Chronic Health Evaluation II (APACHE-II) have been developed in various versions[2-7] for estimating the probability of mortality in ICUs.

The APACHE-II score is derived from 11 physiological variables, the Glasgow Coma Score (GCS), the patient's age and the chronic health status, and ranges from zero in normal patients to 71 in the worst scenario. The probability of mortality can be estimated using a logistic regression model whose coefficients depend on the category of the cause of ICU admission. Many studies have evaluated and compared the performance of APACHE-II in different populations, but no consistent pattern in the accuracy of outcome prediction has been observed. Observed mortality was higher than the model predictions in some cases and lower in others,[8,9] although some studies have shown good performance for the models.[10,11] Many studies have also assessed the performance of existing models in other populations. They mainly established that intensive care risk prediction models developed in other countries require validation and recalibration before being put into practice in a new country setting.[11-14]

In this study, the performance of the APACHE-II system was evaluated and customized on ICU patients in a tertiary referral center in order to develop a good model for evaluating the severity of illness in respiratory ICU patients.

MATERIALS AND METHODS

Patient selection and data collection

Patients admitted to the ICU of the National Research Institute of Tuberculosis and Lung Diseases (NRITLD) between January 2009 and February 2010 were included in the study. Clinical, laboratory and monitoring variables were collected from the patients’ medical records. Data were collected by trained general physicians using forms designed for this purpose. For quality control, the first 10 forms completed by each person were reviewed by the supervisor, and one out of every twenty forms was rechecked afterwards. Finally, the quality-controlled data were entered into a database.

Statistical analysis

Quantitative variables were presented as mean ± SD and qualitative variables as n (%). Student's t test was used to compare quantitative variables between groups, and qualitative variables were compared using the chi-square or Fisher's exact test. The total sample was divided into two subsamples: a training set for refitting the variables into a modified model, and a test set for evaluating the developed model. The scores and probabilities of death were computed according to the original APACHE-II model.

Development of the modified APACHE-II

All APACHE-II variables were entered into a logistic regression model, and new coefficients were estimated using the training data set. For each categorical variable, the normal group was used as the reference group. The modified scoring system was developed from the significant regression coefficients; variables that were non-significant according to the Wald test were allocated a score of zero. The variable with the smallest coefficient was given a score of 1. The coefficients of the other significant variables were divided by the smallest coefficient and then rounded to obtain their scores. After preparing the modified scoring system, hereafter called the modified APACHE-II, the modified scores were calculated for all patients in the study. These scores were then used as the independent variable in a logistic regression model with observed death as the dependent variable.
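The conversion from regression coefficients to integer scores described above can be illustrated with a minimal Python sketch. The coefficient values, p-values and variable names below are hypothetical placeholders, not the fitted values of Table 3. The function name `coefficients_to_scores`, the 0.05 significance level and the use of absolute values to pick the smallest coefficient are assumptions made for illustration only.

```python
def coefficients_to_scores(coefs, p_values, alpha=0.05):
    """Turn logistic regression coefficients into integer scores.

    Non-significant coefficients (Wald p >= alpha) get a score of 0.
    Significant coefficients are divided by the smallest significant
    coefficient (in absolute value, an assumption) and rounded, so the
    smallest significant coefficient maps to a score of 1.
    """
    significant = {k: b for k, b in coefs.items() if p_values[k] < alpha}
    if not significant:
        return {k: 0 for k in coefs}
    smallest = min(abs(b) for b in significant.values())
    scores = {}
    for name, b in coefs.items():
        if name in significant:
            scores[name] = int(round(b / smallest))
        else:
            scores[name] = 0
    return scores


# Hypothetical coefficients and Wald p-values (not taken from the study).
example_coefs = {"gcs_low": 1.32, "age_65_74": 0.44, "sodium_high": 0.20}
example_p = {"gcs_low": 0.001, "age_65_74": 0.03, "sodium_high": 0.40}
example_scores = coefficients_to_scores(example_coefs, example_p)
print(example_scores)
```

In this toy input the non-significant variable receives zero points, the smallest significant coefficient receives one point, and the remaining coefficient is scaled relative to it.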

Evaluation of the discriminative powers

To assess the discriminative power of the models in both the training and test data sets, Receiver Operating Characteristic (ROC) curve analysis and the Area Under the Curve (AUC) were used.
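As a rough illustration of what the AUC measures, the sketch below computes it as the proportion of (non-survivor, survivor) pairs in which the non-survivor has the higher score, counting ties as one half. This pairwise formulation is a standard equivalent of the area under the ROC curve, but it is not necessarily the software routine used in the study. The function name `auc_from_scores` and the toy data are hypothetical.

```python
def auc_from_scores(scores, died):
    """AUC as the probability that a randomly chosen non-survivor has a
    higher score than a randomly chosen survivor (ties count one half)."""
    pos = [s for s, d in zip(scores, died) if d == 1]
    neg = [s for s, d in zip(scores, died) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical severity scores and outcomes (1 = died, 0 = survived).
toy_scores = [5, 8, 12, 20, 25, 15]
toy_died = [0, 0, 1, 1, 1, 0]
toy_auc = auc_from_scores(toy_scores, toy_died)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of survivors and non-survivors.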

Calibration

Sensitivity, specificity and accuracy were used to evaluate the calibration of the models. The kappa coefficient was also used to evaluate the agreement between observed and expected mortality according to the models. As a further measure of calibration we used the F-measure, the harmonic mean of sensitivity and positive predictive value (PPV): F = (2 × Sensitivity × PPV)/(PPV + Sensitivity).[15] In addition, the grouping of the Hosmer-Lemeshow goodness-of-fit test was used to evaluate calibration: the sample was sorted in increasing order of the expected probability of death and divided into 10 groups of approximately equal size, and the observed and expected mortalities were plotted for each group. The Hosmer-Lemeshow test itself was not applicable because the expected number of deaths in some groups was less than 5, so only the groups produced by this method were used.
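The F-measure and kappa coefficient defined above can be sketched in Python as follows. The function names are hypothetical, and the confusion-matrix counts used for kappa are invented for illustration because the source does not report the underlying counts. The F-measure example uses the test-set sensitivity reported for the original APACHE-II (16.3%); the PPV of 1.0 is inferred from the reported 100% specificity, which implies no false positives.

```python
def f_measure(sensitivity, ppv):
    """Harmonic mean of sensitivity and positive predictive value."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for predicted vs. observed death in a 2x2 table."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, computed from the table margins.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)


# Reported sensitivity of the original APACHE-II on the test set;
# PPV = 1.0 is inferred from the reported 100% specificity.
f_original = f_measure(0.163, 1.0)

# Hypothetical counts, not taken from the study.
kappa_example = cohen_kappa(tp=20, fp=5, fn=10, tn=65)

print(round(f_original, 3), round(kappa_example, 3))
```

Under these assumptions the computed F-measure agrees with the value of 0.280 reported for the original model.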

RESULTS

Main characteristics of the patients

Among the 730 patients admitted to the ICU during the study period, 143 (19.6%) died in hospital. There were 423 (57.9%) males and 307 (42.1%) females in the sample. The mean age was 46.9 ± 9.19 years and was significantly higher in non-survivors than in survivors (P < 0.001). The mean original APACHE-II score was also significantly higher in non-survivors than in survivors (16.84 ± 7.2 vs. 7.37 ± 4.6; P < 0.001). The main characteristics of the patients are shown in Table 1. The first diagnosis at the time of admission was categorized according to the APACHE-II classification [Table 2].

Table 1

Basic characteristics of the patients involved in the study

Table 2

First diagnosis, at the time of admission to ICU


Training and test sets

The total sample was divided into two groups: 523 (71.6%) patients in the training set and 207 (28.4%) in the test set. Comparison of basic characteristics such as age, sex and mortality between the training and test sets showed no significant differences (P = 0.081, P = 0.905 and P = 0.823, respectively).

Model building

The variables of the original APACHE-II system were entered into a logistic regression model and the modified scoring system was developed [Table 3]. The Wald test showed that temperature, mean arterial pressure, serum sodium, serum potassium and serum creatinine were not significant predictors. These variables were therefore removed from the modified APACHE-II system. The modified APACHE-II scores were then computed for all patients, and the following model was developed for estimating the probability of death:

Table 3

NRITLD version of APACHE II Score (Modified)

Logit (π) = -4.866 + 0.331 × Local APACHE-II Score
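To make the fitted equation concrete, the following Python sketch converts a modified (local) APACHE-II score into a predicted probability of death by inverting the logit. The intercept and slope are taken from the equation above; the function and constant names are hypothetical. The sketch also computes the score at which the predicted probability equals 0.5, purely as an illustration, since the source does not state which probability cutoff was used to classify patients.

```python
import math

# Coefficients from the fitted model reported above.
INTERCEPT = -4.866
SLOPE = 0.331


def predicted_mortality(score):
    """Predicted probability of death for a modified APACHE-II score."""
    logit = INTERCEPT + SLOPE * score
    return 1.0 / (1.0 + math.exp(-logit))


# Score at which the logit is zero, i.e. predicted probability is 0.5.
threshold_score = -INTERCEPT / SLOPE

for s in (0, 5, 10, 15, 20):
    print(s, round(predicted_mortality(s), 3))
print("score at p = 0.5:", round(threshold_score, 2))
```

Because the slope is positive, each additional point on the modified score multiplies the odds of death by exp(0.331), roughly 1.39.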

Comparing the discriminative powers

In the training set, the area under the ROC curve was 0.860 (95% CI: 0.820-0.899) for the original APACHE-II and 0.874 (95% CI: 0.832-0.916) for the modified APACHE-II; in the test set, the AUC was 0.908 (95% CI: 0.861-0.954) for the original APACHE-II and 0.856 (95% CI: 0.789-0.923) for the modified APACHE-II [Figure 1].

Figure 1

ROC curve analysis comparing the discriminative power of the original and modified APACHE II systems: (a) ROC curves for the test set; (b) ROC curves for the training set

Comparing the calibrations

On the test set, the original APACHE-II had 84.1% accuracy, 16.3% sensitivity, 100% specificity, F-measure = 0.280 and kappa = 0.228, while the modified APACHE-II system had 87.9% accuracy, 56.4% sensitivity, 96.4% specificity, F-measure = 0.656 and kappa = 0.593 [Table 4].

Table 4

Comparison of Original and Modified APACHE II systems

To evaluate calibration, the observed probability was plotted against the expected probability in the 10 groups of the Hosmer-Lemeshow method [Figure 2]. In this plot, the solid line is the reference line for good calibration and the dashed line shows the relationship between expected and observed probabilities. The deviation of the dashed line from the solid one reflects the deviation of the expected from the observed probabilities. Accordingly, the modified APACHE-II system shows better calibration in both the training and test data sets.

Figure 2

Comparing the calibration of Original and Modified APACHE II via 10 groups of Hosmer-Lemeshow method, (a) Original APACHE II on training set, (b) Original APACHE II on test set, (c) Modified APACHE II on training set, (d) Modified APACHE II on test set
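The grouping behind a calibration plot of this kind can be sketched in Python as follows: patients are sorted by expected probability of death, split into groups of approximately equal size, and the mean expected probability is paired with the observed death rate in each group. The function name `calibration_groups` and the toy data are hypothetical, and the exact rule the authors used to form the groups is not given in the source.

```python
def calibration_groups(probs, died, n_groups=10):
    """Return (mean expected probability, observed death rate) per group,
    after sorting patients by expected probability of death."""
    pairs = sorted(zip(probs, died))
    n = len(pairs)
    groups = []
    for g in range(n_groups):
        start = g * n // n_groups
        stop = (g + 1) * n // n_groups
        chunk = pairs[start:stop]
        if not chunk:
            continue  # fewer patients than groups
        expected = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(d for _, d in chunk) / len(chunk)
        groups.append((expected, observed))
    return groups


# Hypothetical predicted probabilities and outcomes (1 = died).
toy_probs = [0.05, 0.10, 0.20, 0.80, 0.90, 0.60]
toy_died = [0, 0, 0, 1, 1, 1]
toy_groups = calibration_groups(toy_probs, toy_died, n_groups=2)
for expected, observed in toy_groups:
    print(round(expected, 3), round(observed, 3))
```

Plotting the observed rate against the expected probability for each group, together with the diagonal reference line, gives a plot of the type shown in Figure 2; points close to the diagonal indicate good calibration.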

DISCUSSION

In this study we customized one of the most commonly used ICU risk-adjustment models and developed a modified model with comparable performance based on our ICU patients. The main reason for using APACHE-II instead of APACHE-IV was that the APACHE-IV system includes variables such as albumin and bilirubin that are not routinely recorded in our hospitals during the first 24 hours of ICU admission. Using the APACHE-IV system was therefore not feasible.

The performance of a predictive model has two main aspects: discriminative power and calibration (goodness of fit). The results of this study showed that the original APACHE-II system had good discriminative power in our population but poor calibration according to the goodness-of-fit measures. The modified APACHE-II score also showed good discriminative power, although its area under the ROC curve in the test set was lower than that of the original APACHE-II. The difference in discriminative power was not significant and the increase in accuracy was slight (87.9% vs. 84.1%), whereas the remarkable increase in sensitivity could be due to the sample selection (all patients came from a tertiary respiratory referral center). We therefore recommend that the external validity of our model be examined in further studies. In this study, all calibration indices favored the modified APACHE-II except specificity and positive predictive value. This might be due to underestimation of the probability of death in our population by the original APACHE-II score.

Several factors affect the performance of models in different populations. Some researchers believe that differences in model calibration could result from differences in the combination of patients (case mix).[16] Most patients in the present study had respiratory and lung diseases, so the combination of diseases differed from that on which the original APACHE-II was developed. This might be one of the reasons for the different calibration of the APACHE-II score in our population. Most previous studies have shown good discriminative power but variable calibration,[8,9] and researchers continue to try to improve the performance of these models.[6,12,13] Murphy-Filkins et al. showed that when a unit or patient population differs substantially from the average, using customized models is important. They showed that increasing the frequency of patients with a given disease characteristic above the original frequency may cause discrimination and calibration to deteriorate.[17] Furthermore, the APACHE-II system is based on patient characteristics during the first 24 hours of admission, and these measures cannot be considered independent of treatment and the quality of medical care. Moreover, the starting point for this model is the time of admission, which has no standard definition and is often influenced by ICU conditions such as the number of beds, the quality of pre-hospital care, etc.[16,18] Together, these points could explain why models need to be modified for different populations or for specific groups such as respiratory patients.

Our modified model leaves mean arterial pressure (MAP) and creatinine out of the system. Our patient population might, due to random factors, have had a low incidence of renal failure, in which case creatinine would add little to the predictive model. The omission of MAP could be due to its strong collinearity with other variables such as age, pH and WBC. Because these variables were more powerful predictors than MAP and were highly correlated with it, including them together in one model could increase the variance of the regression coefficients and consequently decrease the precision of the model. This certainly does not mean that MAP is not a powerful factor in estimating the severity of illness.

Beyond the points about customization, calibration and discrimination of APACHE models, another aspect is that APACHE and MPM were primarily developed on patients in the USA, whereas SAPS has a more international component and its updated versions did not include USA patients; we therefore suggest that future studies be based on SAPS. Modified models would benefit most from recalibration in a larger ICU patient population. A limitation of this study was that the sample came from a single ICU. A larger sample that includes respiratory patients with different characteristics from different ICUs would be expected to lead to better models and could help to identify the deficiencies of this modified model and improve it. In addition, the calibration and discriminative power of this customized model could be studied in other respiratory disease ICUs.

CONCLUSION

The results of this study show that recalibrating the APACHE-II model on a specific group of patients (those with respiratory disease) reduced the number of variables and enhanced its performance. The APACHE-II score has its own strengths and weaknesses and could be modified to increase its accuracy, performance and adaptability in a local ICU. With ongoing developments in treatment methods and changing mortality patterns in different populations, scoring systems need to be changed and updated frequently. The results of this research also indicate that fitting new models for specific groups of patients leads to simpler models with fewer variables.

16 in total

1. Validation of severity scoring systems SAPS II and APACHE II in a single-center population.

Authors: M Capuzzo; V Valpondi; A Sgarbi; S Bortolazzi; V Pavoni; G Gilli; G Candini; G Gritti; R Alvisi
Journal: Intensive Care Med Date: 2000-12 Impact factor: 17.440

2. Agreement, the f-measure, and reliability in information retrieval.

Authors: George Hripcsak; Adam S Rothschild
Journal: J Am Med Inform Assoc Date: 2005-01-31 Impact factor: 4.497

3. Recalibration of risk prediction models in a large multicenter cohort of admissions to adult, general critical care units in the United Kingdom.

Authors: David A Harrison; Anthony R Brady; Gareth J Parry; James R Carpenter; Kathy Rowan
Journal: Crit Care Med Date: 2006-05 Impact factor: 7.598

4. Veterans Affairs intensive care unit risk adjustment model: validation, updating, recalibration.

Authors: Marta L Render; James Deddens; Ron Freyberg; Peter Almenoff; Alfred F Connors; Douglas Wagner; Timothy P Hofer
Journal: Crit Care Med Date: 2008-04 Impact factor: 7.598

5. Comparison of acute physiology and chronic health evaluations II and III and simplified acute physiology score II: a prospective cohort study evaluating these methods to predict outcome in a German interdisciplinary intensive care unit.

Authors: R Markgraf; G Deutschinoff; L Pientka; T Scholten
Journal: Crit Care Med Date: 2000-01 Impact factor: 7.598

6. Mortality Probability Models (MPM II) based on an international cohort of intensive care unit patients.

Authors: S Lemeshow; D Teres; J Klar; J S Avrunin; S H Gehlbach; J Rapoport
Journal: JAMA Date: 1993-11-24 Impact factor: 56.272

7. Prospective validation of the intensive care unit admission Mortality Probability Model (MPM0-III).

Authors: Thomas L Higgins; Andrew A Kramer; Brian H Nathanson; Wayne Copes; Maureen Stark; Daniel Teres
Journal: Crit Care Med Date: 2009-05 Impact factor: 7.598

8. Comparison of the performance of SAPS II, SAPS 3, APACHE II, and their customized prognostic models in a surgical intensive care unit.

Authors: Y Sakr; C Krauss; A C K B Amaral; A Réa-Neto; M Specht; K Reinhart; G Marx
Journal: Br J Anaesth Date: 2008-10-09 Impact factor: 9.166

9. Mortality prediction using SAPS II: an update for French intensive care units.

Authors: Jean Roger Le Gall; Anke Neumann; François Hemery; Jean Pierre Bleriot; Jean Pierre Fulgencio; Bernard Garrigues; Christian Gouzes; Eric Lepage; Pierre Moine; Daniel Villers
Journal: Crit Care Date: 2005-10-06 Impact factor: 9.097

10. Assessment of performance of four mortality prediction systems in a Saudi Arabian intensive care unit.

Authors: Yaseen Arabi; Samir Haddad; Radoslaw Goraj; Abdullah Al-Shimemeri; Salim Al-Malik
Journal: Crit Care Date: 2002-03-13 Impact factor: 9.097
