Clinical prediction models for mortality in patients with covid-19: external validation and individual participant data meta-analysis.

Valentijn M T de Jong1,2,3, Rebecca Z Rousset4, Neftalí Eduardo Antonio-Villa5,6, Arnoldus G Buenen7,8, Ben Van Calster9,10,11, Omar Yaxmehen Bello-Chavolla5, Nigel J Brunskill12,13, Vasa Curcin14, Johanna A A Damen4,2, Carlos A Fermín-Martínez5,6, Luisa Fernández-Chirino5,15, Davide Ferrari14,16, Robert C Free17,18, Rishi K Gupta19, Pranabashis Haldar17,18,20, Pontus Hedberg21,22, Steven Kwasi Korang23, Steef Kurstjens24, Ron Kusters24,25, Rupert W Major12,12, Lauren Maxwell26, Rajeshwari Nair27,28, Pontus Naucler21,22, Tri-Long Nguyen4,29,30, Mahdad Noursadeghi31, Rossana Rosa32, Felipe Soares33, Toshihiko Takada4,34, Florien S van Royen4, Maarten van Smeden4, Laure Wynants8,35, Martin Modrák36, Folkert W Asselbergs37,38,39, Marijke Linschoten37, Karel G M Moons4,2, Thomas P A Debray4,2.   

Abstract

OBJECTIVE: To externally validate various prognostic models and scoring rules for predicting short term mortality in patients admitted to hospital for covid-19.
DESIGN: Two stage individual participant data meta-analysis.
SETTING: Secondary and tertiary care.
PARTICIPANTS: 46 914 patients across 18 countries, admitted to a hospital with polymerase chain reaction confirmed covid-19 from November 2019 to April 2021.
DATA SOURCES: Multiple (clustered) cohorts in Brazil, Belgium, China, Czech Republic, Egypt, France, Iran, Israel, Italy, Mexico, Netherlands, Portugal, Russia, Saudi Arabia, Spain, Sweden, United Kingdom, and United States previously identified by a living systematic review of covid-19 prediction models published in The BMJ, and through PROSPERO, reference checking, and expert knowledge.
MODEL SELECTION AND ELIGIBILITY CRITERIA: Prognostic models identified by the living systematic review and through contacting experts. Models were excluded a priori if they had a high risk of bias in the participant domain of PROBAST (prediction model study risk of bias assessment tool) or if their applicability was deemed poor.
METHODS: Eight prognostic models with diverse predictors were identified and validated. A two stage individual participant data meta-analysis was performed of the estimated model concordance (C) statistic, calibration slope, calibration-in-the-large, and observed to expected ratio (O:E) across the included clusters.
MAIN OUTCOME MEASURES: 30 day mortality or in-hospital mortality.
RESULTS: Datasets included 27 clusters from 18 different countries and contained data on 46 914 patients. The pooled estimates ranged from 0.67 to 0.80 (C statistic), 0.22 to 1.22 (calibration slope), and 0.18 to 2.59 (O:E ratio) and were prone to substantial between study heterogeneity. The 4C Mortality Score by Knight et al (pooled C statistic 0.80, 95% confidence interval 0.75 to 0.84, 95% prediction interval 0.72 to 0.86) and clinical model by Wang et al (0.77, 0.73 to 0.80, 0.63 to 0.87) had the highest discriminative ability. On average, 29% fewer deaths were observed than predicted by the 4C Mortality Score (pooled O:E 0.71, 95% confidence interval 0.45 to 1.11, 95% prediction interval 0.21 to 2.39), 35% fewer than predicted by the Wang clinical model (0.65, 0.52 to 0.82, 0.23 to 1.89), and 4% fewer than predicted by Xie et al's model (0.96, 0.59 to 1.55, 0.21 to 4.28).
CONCLUSION: The prognostic value of the included models varied greatly between the data sources. Although the Knight 4C Mortality Score and Wang clinical model appeared most promising, recalibration (intercept and slope updates) is needed before implementation in routine care. © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY. No commercial re-use. See rights and permissions. Published by BMJ.

Year:  2022        PMID: 35820692      PMCID: PMC9273913          DOI: 10.1136/bmj-2021-069881

Source DB:  PubMed          Journal:  BMJ        ISSN: 0959-8138


Introduction

Covid-19 has had a major impact on global health and continues to disrupt healthcare systems and social life. Millions of deaths have been reported worldwide since the start of the pandemic in 2019.1 Although vaccines are now widely deployed, the incidence of SARS-CoV-2 infection and the burden of covid-19 remain extremely high. Many countries do not have adequate resources to effectively implement vaccination strategies. Also, the timing and sequence of vaccination schedules are still debatable, and virus mutations could yet hamper the future effectiveness of vaccines.2
Covid-19 is a clinically heterogeneous disease of varying severity and prognosis.3 Risk stratification tools have been developed to target prevention and management or treatment strategies, or both, for people at highest risk of a poor outcome.4 Risk stratification can be improved by the estimation of the absolute risk of unfavourable outcomes in individual patients. This involves the implementation of prediction models that combine information from multiple variables (predictors). Predicting the risk of mortality with covid-19 could help to identify those patients who require the most urgent help or those who would benefit most from treatment. This would facilitate the efficient use of limited medical resources, and reduce the impact on the healthcare system, especially intensive care units. Furthermore, if a patient’s risk of a poor outcome is known at hospital admission, predicting the risk of mortality could help with planning the use of scarce resources.
In a living systematic review (update 3, 12 January 2021; www.covprecise.org), 39 prognostic models for predicting short term (mostly in-hospital) mortality in patients with a diagnosis of covid-19 have been identified.5 Despite many ongoing efforts to develop covid-19 related prediction models, their performance when validated in external cohorts or countries is largely unknown.
Prediction models often perform worse than anticipated and are prone to poor calibration when applied to new individuals.6 7 8 Clinical implementation of poorly performing models leads to incorrect predictions and could lead to unnecessary interventions, or to the withholding of important interventions. Both result in potential harm to patients and inappropriate use of medical resources. Therefore, prediction models should always be externally validated before clinical implementation.9 These validation studies are performed to quantify the performance of a prediction model across different settings and populations and can thus be used to identify the potential usefulness and effectiveness of these models for medical decision making.7 8 10 11 12 We performed a large scale international individual participant data meta-analysis to externally validate the most promising prognostic models for predicting short term mortality in patients admitted to hospital with covid-19.

Methods

Review to identify covid-19 related prediction models

We used the second update (21 July 2020) of an existing living systematic review of prediction models for covid-19 to identify multivariable prognostic models and scoring rules for assessing short term (at 30 days or in-hospital) mortality in patients admitted to hospital with covid-19.5 During the third update of the living review (12 January 2021),13 additional models were found that also met the study eligibility criteria of this individual participant data meta-analysis, which we also included for external validation. We considered prediction models to be eligible for the current meta-analysis if they were developed using data from patients who were admitted to a hospital with laboratory confirmed SARS-CoV-2 infection. In papers that reported multiple prognostic models, we considered each model for eligibility. As all the prognostic models for covid-19 mortality in the second update (21 July 2020) of the living systematic review had a lower quality and high risk of bias in at least one domain of PROBAST (prediction model study risk of bias assessment tool),7 8 we only excluded models that had a high risk of bias for the participant domain and models for which applicability was deemed poor, as well as imaging based algorithms (see fig 1).
Fig 1

Flowchart of inclusion of prognostic models. The second update took place on 21 July 2020. ICU=intensive care unit


Review to identify patient level data for model validation

We searched for individual studies and registries containing data from routine clinical care (electronic healthcare records), and data sharing platforms with individual patient data of those admitted to hospital with covid-19. We further identified eligible data sources through the second update (21 July 2020) of the living systematic review.5 13 In addition, we consulted the PROSPERO database, references of published prediction models for covid-19, and experts in prognosis research and infectious diseases. Data sources were eligible for model validation if they contained data on mortality endpoints for consecutive patients admitted to hospital with covid-19. We included only patients with a polymerase chain reaction confirmed SARS-CoV-2 infection. We excluded patients with no laboratory data recorded in the first 24 hours of admission. In each data source, we adopted the same eligibility criteria for all models that we selected for validation. For the scoring rule by Bello-Chavolla et al14 we used 30 day mortality as the outcome when available; otherwise we used in-hospital mortality (see table 1).

Statistical analyses

For external validation and meta-analysis we used a two stage process.15 16 The first stage consisted of imputing missing data and estimating performance metrics in individual clusters. For datasets that included only one hospital (or cohort) we defined the cluster level as the individual hospital (or cohort). In the CAPACITY-COVID dataset,17 which contains data from multiple countries, we considered each country as a cluster. For the data from UnityPoint Hospitals in Iowa, United States, we considered each hospital as a cluster. We use the term cluster throughout the paper. In the second stage we performed a meta-analysis of the performance metrics.18 19 We did not perform an a priori sample size calculation, as we included all data that we found through the review and that met the inclusion criteria.

Stage 1: Validation

We imputed sporadically missing data 50 times by applying multiple imputation (see supplementary material B). Using each of the eight models, we calculated the mortality risk or mortality score of all participants, in clusters where the respective models’ predictors were measured in at least some of the participants. Subsequently, we calculated the concordance (C) statistic, observed to expected ratio (O:E ratio), calibration slope, and calibration-in-the-large for each model in each imputed cluster.11 The C statistic is an estimator for the probability of correctly identifying the patient with the outcome in a pair of randomly selected patients of which one has developed the outcome and one has not.20 The O:E ratio is the ratio of the number of observed outcomes divided by the number of outcomes expected by the prediction model. The calibration slope is an estimator of the correction factor the prediction model coefficients need to be multiplied with, to obtain coefficients that are well calibrated to the validation sample.11 21 The calibration-in-the-large is an estimator for the (additive) correction to the prediction model’s intercept, while keeping the prediction model’s coefficients fixed.11 21 Supplementary material B provides details of the model equations.
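The four performance measures defined above can be illustrated with a minimal pure Python sketch for a single cluster. This is not the authors' code (their analysis was run in R): the function names are hypothetical, multiple imputation is ignored, and the logistic fits use a plain Newton-Raphson loop that assumes the outcomes are not perfectly separated by the predicted risks.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def c_statistic(y, p):
    """Share of (died, survived) pairs in which the patient who died has the
    higher predicted risk; tied predictions count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))

def oe_ratio(y, p):
    """Observed number of deaths divided by the expected (predicted) number."""
    return sum(y) / sum(p)

def calibration_in_the_large(y, p, iterations=25):
    """Intercept a in logit P(y=1) = a + logit(p), with the slope fixed at 1."""
    lp = [logit(pi) for pi in p]
    a = 0.0
    for _ in range(iterations):
        mu = [expit(a + li) for li in lp]
        score = sum(yi - mi for yi, mi in zip(y, mu))
        info = sum(mi * (1.0 - mi) for mi in mu)
        a += score / info
    return a

def calibration_slope(y, p, iterations=25):
    """Slope b in logit P(y=1) = a + b * logit(p), fitted by Newton-Raphson."""
    lp = [logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iterations):
        mu = [expit(a + b * li) for li in lp]
        w = [mi * (1.0 - mi) for mi in mu]
        g_a = sum(yi - mi for yi, mi in zip(y, mu))
        g_b = sum((yi - mi) * li for yi, mi, li in zip(y, mu, lp))
        h_aa = sum(w)
        h_ab = sum(wi * li for wi, li in zip(w, lp))
        h_bb = sum(wi * li * li for wi, li in zip(w, lp))
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return b

# Toy data (made up for illustration): 1 = died, 0 = survived
y_toy = [0, 0, 1, 0, 1, 1, 0, 1]
p_toy = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
print(c_statistic(y_toy, p_toy), oe_ratio(y_toy, p_toy))
print(calibration_in_the_large(y_toy, p_toy), calibration_slope(y_toy, p_toy))
```

A slope below 1 in such a fit indicates predictions that are too extreme, and an intercept below 0 indicates systematic over-prediction.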

Stage 2: Pooling performance

In the second stage of the meta-analysis, we pooled the cluster specific logit C statistic, calibration slope, and log O:E ratios from stage 1.22 We used restricted maximum likelihood estimation and the Hartung-Knapp-Sidik-Jonkman method to derive all confidence intervals.23 24 To quantify the presence of between study heterogeneity, we constructed approximate 95% prediction intervals, which indicated probable ranges of performance expected in new clusters.25 We performed the analysis in R (version 4.0.0 or later, using packages mice, pROC, and metamisc) and we repeated the main analyses in STATA.26 27 28 29 30 This study is reported following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist for prediction model validation (see supplementary material C).31 32
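The pooling step can be sketched as follows, under stated assumptions. For brevity the between cluster variance is estimated with the DerSimonian-Laird method of moments rather than the restricted maximum likelihood estimator used in the paper, so results would differ somewhat from the published ones. The function name is hypothetical, and the t quantiles must be supplied by the caller because the Python standard library has no t distribution.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def pool_random_effects(estimates, variances, t_crit_ci, t_crit_pi):
    """Pool cluster estimates on a transformed scale (logit C, log O:E).

    estimates, variances: cluster specific estimates and their variances
    t_crit_ci: 0.975 quantile of the t distribution with k - 1 df
    t_crit_pi: 0.975 quantile of the t distribution with k - 2 df
    Returns (pooled estimate, confidence interval, prediction interval).
    """
    k = len(estimates)
    # Fixed effect weights, used only for the DerSimonian-Laird tau^2
    # (assumption: stand-in for REML)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects weights and pooled estimate
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)

    # Hartung-Knapp-Sidik-Jonkman variance of the pooled estimate
    var_hk = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w_re, estimates))
    var_hk /= (k - 1) * sum(w_re)

    half_ci = t_crit_ci * math.sqrt(var_hk)
    half_pi = t_crit_pi * math.sqrt(tau2 + var_hk)
    return mu, (mu - half_ci, mu + half_ci), (mu - half_pi, mu + half_pi)

# Made-up C statistics from four clusters, with variances on the logit scale
c_stats = [0.78, 0.82, 0.75, 0.84]
var_logit = [0.010, 0.020, 0.015, 0.020]
# t quantiles for k = 4 clusters: 3 df for the CI, 2 df for the PI
mu, ci, pi = pool_random_effects([logit(c) for c in c_stats], var_logit, 3.182, 4.303)
print(expit(mu), [expit(x) for x in ci], [expit(x) for x in pi])
```

Pooling on the logit (or log) scale and back-transforming keeps the interval limits inside the valid range of the statistic, which is why the reported intervals are not symmetric around the point estimate.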

Sensitivity analysis

None of the datasets contained all predictors, meaning the models could not all be validated in a single dataset, which hampered the interpretation. As such, for each performance measure taken separately, we performed a meta-regression on all performance estimates where we included country (not cluster, to save degrees of freedom) and model as predictors (both as dummy variables), which we had not prespecified in our protocol. Then we used these meta-models to predict the performance (and 95% confidence intervals) of each prediction model in each included country, thereby allowing for a fairer comparison of the performance between models. All R code is available from github.com/VMTdeJong/COVID-19_Prognosis_IPDMA.

Patient and public involvement

Patients and members of the public were not directly involved in this research owing to lack of funding, staff, and infrastructure to facilitate their involvement. Several authors were directly involved in the treatment of patients with covid-19, have been in contact with hospital patients with covid-19, or have had covid-19.

Results

Review of covid-19 related prediction models

We identified six prognostic models and two scoring rules that met the inclusion criteria (fig 1). Table 1 summarises the details of the models and scores. The score developed by Bello-Chavolla et al predicted 30 day mortality,14 whereas the other score and the six models predicted in-hospital mortality.33 34 35 36 37
Table 1

Overview of selected models for predicting short term mortality in patients admitted to hospital with SARS-CoV-2 infection

Model | Country of development | Development population | Predicted outcome | Predictors | Model type | Estimation method
Bello-Chavolla et al14 | Mexico | All reported confirmed cases of covid-19, including hospital admission, ICU admission, and outpatient treatment. | 30 day mortality | Age, diabetes (type 2), obesity (clinician-defined), pneumonia, chronic kidney disease, chronic obstructive pulmonary disease, immunosuppression | Score | Rounding of Cox regression coefficients (unpenalised)
Xie et al33 | China | Adults (≥18 years) with confirmed covid-19, admitted in officially designated covid-19 treatment centres | In-hospital mortality | Age, lactate dehydrogenase, lymphocyte count, oxygen saturation | Prediction model | Logistic regression (unpenalised)
Hu et al34 | China | Patients with severe covid-19 in Tongji Hospital, which specifically accommodated for people with covid-19. Patients directly admitted to intensive care unit were excluded. Patients with certain comorbidities (including cancer, uraemia, aplastic anaemia) were also excluded. Patients with a short hospital stay (<7 days) were excluded | In-hospital mortality | Age, high sensitivity C reactive protein, D-dimer, lymphocyte count | Prediction model | Logistic regression (unpenalised)
Zhang et al DCS and DCSL models35 | China | Adults (≥18 years) admitted to two hospitals | In-hospital mortality | DCS model: Age, sex, diabetes (unspecified), immunocompromised, malignancy, hypertension, heart disease, chronic kidney disease, cough, dyspnoea. DCSL model: Age, sex, chronic lung disease, diabetes (unspecified), malignancy, cough, dyspnoea, neutrophil count, lymphocyte count, platelet count, C reactive protein, creatinine | Prediction model | Logistic regression (lasso penalty)
Knight et al 4C Mortality Score36 | UK | Adults (≥18 years) admitted across 260 hospitals | In-hospital mortality | Age, sex, number of comorbidities (chronic cardiac disease, respiratory disease, renal disease, liver disease, neurological conditions; dementia; connective tissue disease; diabetes (type 1 and 2); AIDS/HIV; malignancy, obesity), respiratory rate, oxygen saturation (room air), Glasgow coma scale score, urea, C reactive protein | Score | Rounding of logistic regression coefficients (lasso penalty)
Wang et al clinical and laboratory models37 | China | Adults (≥18 years) admitted to hospital. Pregnant women were excluded | In-hospital mortality | Clinical model: Age, history of hypertension, history of heart disease. Laboratory model: Age, oxygen saturation, neutrophil count, lymphocyte count, high sensitivity C reactive protein, D-dimer, aspartate aminotransferase, glomerular filtration rate | Prediction model | Logistic regression (unpenalised). Intercept from nomogram
The six prognostic models were estimated by logistic regression. The Bello-Chavolla score and Knight et al 4C Mortality Score were (simplified) scoring rules that could be used to stratify patients into risk groups. The Bello-Chavolla score was developed with Cox regression, whereas the 4C Mortality Score was developed with lasso logistic regression and its weights were rescaled and rounded to integer values. Although the 4C Mortality Score itself does not provide absolute risks, these were available through an online calculator. As the authors promoted the use of the online calculator, we have used these risks in our analysis. For two models by Wang et al (clinical and laboratory), no intercept was available, so these were approximated.38

Review of patient level data for model validation

We identified 10 data sources, including four through living systematic reviews, one through a data sharing platform, and five by experts in the specialty (fig 2). The obtained datasets included 27 clusters from 18 different countries and contained data on 46 701 patients, 16 418 of whom died (table 2). Study recruitment was between November 2019 and April 2021. Most clusters included all patients with polymerase chain reaction confirmed covid-19, although in some clusters only patients admitted through a specific department were included (see supplementary material A, table S1). Mean age ranged from 45 to 71 years.
Fig 2

Flowchart of data sources

Table 2

Characteristics of the included external validation cohorts and clusters

Dataset, cluster | No of patients | Start recruitment date | End recruitment date | Total No (%) of deaths | Mean (SD) age; (IQR) (years) | No (%) male
Karolinska Institute, Sweden | 1670 | 27 Feb 2020 | 1 Sep 2020 | 193 (11.56) | 57.30 (18.70); (43-71) | 983 (58.90)
Albert Einstein Hospital, Brazil | 453 | 27 Feb 2020 | 25 Jun 2020 | 17 (3.75) | 56.44 (14.92); (46-68.50) | 295 (65.05)
Czech Republic Academy of Sciences, Czech Republic | 213 | 3 Mar 2020 | 12 Oct 2020 | 42 (20)* | 68.56 (16.56); (58-80) | 105 (49)
University College London, UK | 411 | 1 Feb 2020 | 30 Apr 2020 | 115 (28) | 66 (53-79)† | 252 (61.31)
General Directorate of Epidemiology, Mexico: | | | | | |
 All data, from this source | 28 176 | 1 Mar 2020 | 16 Apr 2020 | 12 990 (46.10) | 58.57 (15.93); (48-70) | 17 019 (60.40)
 Development cohort excluded | 25 056 | | | 11 556 (46.12) | 59.08 (15.94); (49-70) | 15 035 (60.01)
Tongji Hospital,39 China | 332 | 10 Jan 2020 | 18 Feb 2020 | 155 (46.69) | 58.98 (16.65); (46-70) | 198 (59.64)
CAPACITY-COVID: | | | | | |
 Belgium | 221 | 12 Feb 2020 | 14 Oct 2020 | 51 (23.08)* | 68.14 (15.72); (57-81) | 137 (61.99)
 Egypt | 45 | 12 Apr 2020 | 12 Aug 2020 | 9 (20) | 60.89 (14.37); (50-73) | 21 (46.67)
 France | 46 | 13 Feb 2020 | 18 Dec 2020 | 3 (6.52)* | 67.96 (12.52); (62-77) | 34 (73.91)
 Iran | 90 | 10 Feb 2020 | 5 May 2020 | 13 (14.44)* | 63.19 (15.39); (53.25-73) | 59 (65.56)
 Israel | 25 | 10 Apr 2020 | 9 Aug 2020 | 2 (8) | 50.36 (20.15); (31-67) | 14 (56)
 Italy | 106 | 6 Feb 2020 | 4 May 2020 | 22 (20.75) | 70.72 (11.63); (62.25-78.75) | 72 (67.92)
 Netherlands | 5100 | 22 Nov 2019 | 30 Jul 2020 | 1003 (19.67)* | 66.37 (14.15); (57-76) | 3172 (62.20)
 Portugal | 44 | 25 Mar 2020 | 19 Aug 2020 | 10 (22.73)* | 71.43 (13.32); (63.75-82) | 30 (61.18)
 Russia | 278 | 22 Apr 2020 | 4 Jun 2020 | 19 (6.83) | 60.09 (15.49); (50.25-71) | 137 (49.28)
 Saudi Arabia | 389 | 29 Feb 2020 | 24 Sep 2020 | 57 (14.65)* | 50.56 (16.79); (38-62) | 270 (69.41)
 Spain | 47 | 5 Mar 2020 | 20 Apr 2020 | 10 (21.28)* | 70.98 (16.68); (55-83.75) | 28 (59.57)
Jeroen Bosch Ziekenhuis, Netherlands | 383 | 9 Mar 2020 | 29 Dec 2020 | 154 (40.21) | 70.21 (14.79); (61-81) | 226 (59.01)
Iowa (USA), UnityPoint Hospitals: | | | | | |
 Hospital 1 | 288 | 3 Mar 2020 | 31 Jul 2020 | 14 (4.86) | 46.49 (19.29); (32-59) | 147 (51.04)
 Hospital 2 | 929 | 3 Mar 2020 | 31 Jul 2020 | 67 (7.21) | 50.12 (22.54); (33-68) | 454 (48.87)
 Hospital 3 | 95 | 3 Mar 2020 | 31 Jul 2020 | 6 (6.32) | 45.31 (20.54); (27-59) | 45 (47.37)
 Hospital 4 | 66 | 3 Mar 2020 | 31 Jul 2020 | 6 (9.09) | 48.77 (21.26); (31-65) | 39 (59.09)
 Hospital 5 | 511 | 3 Mar 2020 | 31 Jul 2020 | 22 (4.31) | 51.03 (20.63); (35-67) | 240 (46.97)
 Hospital 6 | 393 | 3 Mar 2020 | 31 Jul 2020 | 22 (5.60) | 45.35 (19.02); (30-60) | 176 (44.78)
 Hospital 7 | 295 | 3 Mar 2020 | 31 Jul 2020 | 18 (6.10) | 47.60 (20.10); (32-63) | 162 (54.92)
Leicester covTrack, UK | 3908 | Jan 2020 | Apr 2021 | 1103 (28.22) | 63.29 (19.19); (50-79) | 2063 (52.79)
King’s College Hospital, UK | 2400 | 28 Feb 2020 | 28 Mar 2021 | 295 (12.29) | 59.76 (20.58); (47-76) | 1314 (54.75)

SD=standard deviation; IQR=interquartile range.

*Observed number of deaths before multiple imputation.

†Median (IQR).


External validation and meta-analysis

All results are presented in supplementary material D. The Wang clinical model could be validated in 24 clusters (see supplementary table S2), followed by Hu et al’s model, which was validated in 16 clusters (see supplementary table S3). The remaining models were less often validated, as predictor measurements were available in fewer datasets: the Bello-Chavolla score was validated in seven clusters (see supplementary table S4), the model by Xie et al in nine clusters (see supplementary table S5), the DCS model by Zhang et al in six clusters (see supplementary table S6), the DCSL model by Zhang et al in six clusters (see supplementary table S7), the 4C Mortality Score in six clusters (see supplementary table S8), and the Wang laboratory model in three clusters (see supplementary table S9).

Discrimination

The 4C Mortality Score showed the highest discrimination, with a pooled C statistic of 0.80 (95% confidence interval 0.75 to 0.84, fig 3 and see supplementary fig S4). The heterogeneity of discrimination of this model across datasets (95% prediction interval 0.72 to 0.86) was low compared with that of the other models. The next best discriminating model was the Wang clinical model, with a pooled C statistic of 0.77 (0.73 to 0.80, fig 3), and a greater heterogeneity (95% prediction interval 0.63 to 0.87). Two other models attained a summary C statistic >0.70: the Xie model with a C statistic of 0.75 (0.68 to 0.80, 95% prediction interval 0.58 to 0.86, fig 3), and the Hu model with a C statistic of 0.74 (0.66 to 0.80, 95% prediction interval 0.41 to 0.92, fig 3). The summary C statistic estimates for the remaining models were <0.70 (see supplementary fig S1).
Fig 3

Pooled C statistic estimates with corresponding 95% confidence interval and approximate 95% prediction intervals for four models (see supplementary file for full data). The Knight et al 4C Mortality Score had a C statistic of 0.786 (95% confidence interval 0.78 to 0.79) in the development data and 0.767 (0.76 to 0.77) in the validation data in the original publication. The Wang et al clinical model had a C statistic of 0.88 (0.80 to 0.95) in the development data and 0.83 (0.68 to 0.93) in the validation data in the original publication. The Xie et al model had a C statistic of 0.89 (0.86 to 0.93) in the development data, 0.88 after optimism correction, and 0.98 (0.96 to 1.00) in the validation data in the original publication. The Hu et al model had a C statistic of 0.90 in the development data and 0.88 in the validation data in the original publication. UCLH=University College London; DGAE=General Directorate of Epidemiology; KCH=King’s College Hospital


Calibration: observed to expected

The O:E ratio of the Xie model was the closest to 1, with a meta-analysis summary estimate of 0.96 (95% confidence interval 0.59 to 1.55, 95% prediction interval 0.21 to 4.28, fig 4), indicating on average the number of predicted deaths was in agreement with the number of observed deaths. However, the relatively wide 95% confidence interval and 95% prediction interval indicate some heterogeneity. The 4C Mortality Score attained an O:E ratio of 0.71 (0.45 to 1.11, 95% prediction interval 0.21 to 2.39, fig 4). The Hu model attained an O:E ratio of 0.61 (0.42 to 0.87, 95% prediction interval 0.15 to 2.48, fig 4). The Wang clinical model attained an O:E ratio of 0.65 (0.52 to 0.82, 95% prediction interval 0.23 to 1.89, fig 4). Supplementary figure S2 shows the O:E ratios of the other models and supplementary table S10 the calibration-in-the-large values.
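As a worked reading of these ratios, writing O for the observed and E for the expected (predicted) number of deaths, the relative shortfall of observed deaths follows directly from the O:E ratio. The notation is ours; the values are the pooled estimates reported above.

```latex
\text{O:E} = \frac{O}{E}, \qquad \frac{E - O}{E} = 1 - \text{O:E}
% 4C Mortality Score:  1 - 0.71 = 0.29  (29% fewer deaths observed than predicted)
% Wang clinical model: 1 - 0.65 = 0.35  (35% fewer)
% Hu model:            1 - 0.61 = 0.39  (39% fewer)
% Xie model:           1 - 0.96 = 0.04  (4% fewer)
```

An O:E ratio below 1 therefore corresponds to over-prediction of mortality, and a ratio above 1 to under-prediction.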
Fig 4

Pooled observed to expected ratio estimates with corresponding 95% confidence interval and approximate 95% prediction interval for four models. Estimates are presented on the log scale. See supplementary file for full data. UCLH=University College London; DGAE=General Directorate of Epidemiology; KCH=King’s College Hospital


Calibration: slope

Supplementary material D figures S3 and S4 show the forest plots for all calibration slopes. The estimate for the calibration slope was the closest to 1 for the 4C Mortality Score (1.22, 95% confidence interval 0.92 to 1.52, 95% prediction interval 0.63 to 1.80). The Wang clinical model had a calibration slope of 0.50 (0.44 to 0.56, 95% prediction interval 0.34 to 0.66). The calibration slope for the Xie model was 0.45 (0.27 to 0.63, 95% prediction interval −0.07 to 0.96) and for the Hu model was 0.32 (0.15 to 0.49, 95% prediction interval −0.34 to 0.98). Supplementary material D presents details of the remaining models that were estimated.

Sensitivity analyses—meta-regression

In the meta-regression, where all performance estimates were regressed on the country and the prediction model, the point estimate of the discrimination was the highest for the 4C Mortality Score (reference) and lowest for the Wang laboratory model. Country wise, the point estimate was highest in Israel and lowest in Mexico (see supplementary material D for point estimates and supplementary table S11 for predicted C statistics for each country). The results for the predicted O:E ratio were less straightforward, as the predicted values for each model except the Wang laboratory model (see supplementary table S12) were greater than 1 in some countries and smaller than 1 in other countries. Similarly, this was the case for the predicted values of the calibration slopes for all models (see supplementary table S13). This implied that none of the included models were well calibrated to the data from all included countries. Supplementary table S14 shows the predicted calibration-in-the-large estimates.

Discussion

In our individual participant data meta-analysis we found that previously identified prediction models varied in their ability to discriminate between those patients admitted to hospital with covid-19 who will die and those who will survive. The 4C Mortality Score, the Wang clinical model, and the Xie model achieved the highest discrimination on average in our study and could therefore serve as starting points for implementation in clinical practice. The 4C Mortality Score could only be validated in six clusters, which might indicate limited usefulness in clinical practice. Whereas the discrimination of both Wang models and the Xie model was lower than in their respective development studies, the discrimination of the 4C Mortality Score was similar to the estimates in its development study. Although the summary estimates of discrimination performance are rather precise owing to the large number of included patients, some are prone to substantial between cluster heterogeneity. Discrimination varied greatly across hospitals and countries for all models, but least for the 4C Mortality Score. For some models the 95% prediction interval of the C statistic included 0.5, which implies that in some countries these models might not be able to discriminate between patients with covid-19 who survive or die during hospital admission. All models were prone to calibration issues. Most models tended to over-predict mortality on average, meaning that the actual death count was lower than predicted. The Xie model achieved O:E ratios closest to 1, but this model’s predicted risks were often too extreme: too high for high risk patients and too low for low risk patients, as quantified by the calibration slope, which was less than 1. The calibration slope was closest to 1 for the 4C Mortality Score, and this was the only model for which the 95% confidence interval included 1. All other summary calibration slopes were less than 1. 
This could be due to overfitting in the model development process. All the models were prone to substantial between cluster heterogeneity. This implies that local revisions (such as country specific or even centre specific intercepts) are likely necessary to ensure that risk predictions are sufficiently accurate. Implementing existing covid-19 models in routine care is challenging because of the evolution and management of SARS-CoV-2 and the consequences of changes to the virus over time and across geographical areas. In addition, the studied models were developed and validated using data collected during periods of the pandemic, and general practice might have subsequently changed. As a result, baseline risk estimates of existing prediction models (eg, the intercept term) might have less generalisability than anticipated and might require regular updating, as shown in this meta-analysis. As predictor effects might also change over time or geographical region, a subsequent step might be to update these as well.40 Since most data originate from electronic health record databases, hospital registries offer a promising source for dynamic updating of covid-19 related prediction models.41 42 43 As data from new individuals become available, the prognostic models should be updated and their performance re-evaluated in external validation sets.41 42 43
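The kind of local revision discussed here can be sketched as a logistic recalibration, in which an intercept update and a slope update are applied to the logit of the original predicted risk. This is a minimal illustration, not a procedure taken from the paper: the function name and parameter names are hypothetical, and in practice both updates would have to be estimated in local data rather than taken from pooled estimates.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def recalibrate(p, intercept_update=0.0, slope_update=1.0):
    """Updated risk = expit(intercept_update + slope_update * logit(p)).

    intercept_update shifts all risks up or down (baseline risk);
    slope_update below 1 pulls extreme risks towards the middle.
    Both names are hypothetical and chosen for this sketch.
    """
    return expit(intercept_update + slope_update * logit(p))

# Illustration only: a slope update of 0.5 shrinks extreme predictions
for risk in (0.1, 0.5, 0.9):
    print(risk, recalibrate(risk, intercept_update=0.0, slope_update=0.5))
```

With a slope update of 0.5 and no intercept change, an original risk of 0.9 becomes 0.75 and a risk of 0.1 becomes 0.25, which shows how a calibration slope below 1 signals predictions that were too extreme.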

Limitations of this meta-analysis

All the models we considered were developed and validated using data from the first waves of the covid-19 pandemic, up to April 2021, mostly before vaccination was implemented widely. Since the gathering of data, treatments for patients with covid-19 have improved and new options have been introduced. These changes are likely to reduce the overall risk of short term mortality in patients with covid-19. Prediction models for covid-19 for which adequate calibration has previously been established may therefore still yield inaccurate predictions in contemporary clinical practice. This further highlights the need for continual validation and updates.43

An additional concern is that prediction models are typically used to decide on treatment strategies but do not indicate to what extent patients benefit from individualised treatment decisions. Although patients at high risk of death could be prioritised for receiving intensive care, it would be more practical to identify those patients who are most likely to benefit from such care. This individualised approach towards patient management requires models to predict (counterfactual) patient outcomes for all relevant treatment strategies, which is not straightforward.44 45 These predictions of patients’ absolute risk reduction require estimation of the patients’ short term risk of mortality with and without treatment, which might require the estimation of treatment effects that differ by patient.45

As variants of the disease emerge, new treatments are developed, and the disease is better managed, predictor effects and the incidence of mortality due to covid-19 may vary, thereby potentially limiting the predictive performance of the models we investigated.

We only considered models for predicting mortality in patients with covid-19 admitted to hospital, as outcomes such as clinical deterioration might increase the risk of heterogeneity from variation in measurements and differences in definitions. Mortality, however, is commonly recorded in electronic healthcare systems, with limited risk for misclassification. Furthermore, it is an important outcome that is often considered in decision making.

We had to use the reported nomograms to recover the intercepts for two prediction models from one group.31 32 Ideally, authors would have adhered to the TRIPOD guidelines, which would have facilitated the evaluation of their models.
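The absolute risk reduction discussed above can be written out explicitly. The notation below is an assumption introduced here for illustration and does not come from the article: Y is short term mortality, X the patient characteristics, and a indexes the treatment strategy (a = 1 with the treatment, for example intensive care, and a = 0 without it).

```latex
\[
\mathrm{ARR}(x) \;=\; \Pr\!\left(Y^{a=0} = 1 \mid X = x\right) \;-\; \Pr\!\left(Y^{a=1} = 1 \mid X = x\right)
\]
```

Under this notation, a conventional prognostic model estimates only one of the two terms, namely the risk under the care patients actually received. This is why predicting individual benefit requires estimating both counterfactual risks, and possibly treatment effects that vary with x.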

Conclusion

In this large international study, we found considerable heterogeneity across countries in the performance of the prognostic models for predicting short term mortality in patients admitted to hospital with covid-19. Caution is therefore needed in applying these tools for clinical decision making in each of these countries. On average, the number of deaths predicted by the Xie model was closest to the observed number of deaths. The 4C Mortality Score and Wang clinical model showed the highest discriminative ability compared with the other validated models. Although they appear most promising, local and dynamic adjustments (intercept and slope updates) are needed before implementation in routine care. The usefulness of the 4C Mortality Score may be affected by the limited availability of the predictor variables.

Numerous prognostic models for predicting short term mortality in patients admitted to hospital with covid-19 have been published.

These models need to be compared head-to-head in external patient data.

On average, the 4C Mortality Score by Knight et al and the clinical model by Wang et al showed the highest discriminative ability to predict short term mortality in patients admitted to hospital with covid-19.

In terms of calibration, all models require local updating before implementation in new countries or centres.
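The intercept and slope updates mentioned in the conclusion can be illustrated with a minimal sketch of logistic recalibration. This is not the authors' code, and every name in it (`fit_recalibration`, `simulate_cohort`, and so on) is hypothetical. It assumes a simulated local cohort in which an existing model overpredicts mortality, fits logit(risk) = a + b × logit(predicted risk) by Newton-Raphson, and compares the ratio of observed to expected deaths before and after the update.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def observed_expected_ratio(y, p):
    """Observed number of deaths divided by the sum of predicted risks."""
    return sum(y) / sum(p)


def fit_recalibration(y, p, max_iter=50, tol=1e-10):
    """Fit logit(P(y = 1)) = a + b * logit(p) by Newton-Raphson.

    Returns the updated intercept a and slope b.
    """
    lp = [logit(pi) for pi in p]
    a, b = 0.0, 1.0  # start from "no update needed"
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, xi in zip(y, lp):
            mu = expit(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * delta = score
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def update_predictions(p, a, b):
    """Apply the intercept and slope update to the original predictions."""
    return [expit(a + b * logit(pi)) for pi in p]


def simulate_cohort(n=5000, seed=1):
    """Simulated (not real) cohort in which the model overpredicts risk."""
    rng = random.Random(seed)
    y, p_model = [], []
    for _ in range(n):
        lp = rng.gauss(-2.0, 1.0)              # model's linear predictor
        true_risk = expit(-0.7 + 0.8 * lp)     # assumed local truth
        p_model.append(expit(lp))
        y.append(1 if rng.random() < true_risk else 0)
    return y, p_model


y, p_model = simulate_cohort()
a, b = fit_recalibration(y, p_model)
p_updated = update_predictions(p_model, a, b)
print("O:E before update:", round(observed_expected_ratio(y, p_model), 3))
print("O:E after update: ", round(observed_expected_ratio(y, p_updated), 3))
print("intercept:", round(a, 3), "slope:", round(b, 3))
```

In this simulated setting an observed to expected ratio below 1 indicates overprediction, and a fitted slope below 1 indicates that the original predictions are too extreme. The update restores the ratio to 1 in the data used for fitting, so in practice the updated model would itself need validation in new patients.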