Development and validation of a simplified risk score for the prediction of critical COVID-19 illness in newly diagnosed patients.

Stanislas Werfel¹, Carolin E M Jakob^2,3, Stefan Borgmann⁴, Jochen Schneider^5,6, Christoph Spinner^5,6, Maximilian Schons², Martin Hower⁷, Kai Wille⁸, Martina Haselberger⁹, Hanno Heuzeroth¹⁰, Maria M Rüthrich¹¹, Sebastian Dolff¹², Johanna Kessel¹³, Uwe Heemann¹, Jörg J Vehreschild^2,3,13, Siegbert Rieg¹⁴, Christoph Schmaderer¹.

Abstract

Scores to identify patients at high risk of progression of coronavirus disease (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), may become instrumental for clinical decision-making and patient management. We used patient data from the multicenter Lean European Open Survey on SARS-CoV-2-Infected Patients (LEOSS) and applied variable selection to develop a simplified scoring system to identify patients at increased risk of critical illness or death. A total of 1946 patients who tested positive for SARS-CoV-2 were included in the initial analysis and assigned to derivation and validation cohorts (n = 1297 and n = 649, respectively). Stability selection among more than 100 baseline predictors for the combined endpoint of progression to the critical phase or COVID-19-related death enabled the development of a simplified score consisting of five predictors: C-reactive protein (CRP), age, clinical disease phase (uncomplicated vs. complicated), serum urea, and D-dimer (abbreviated as CAPS-D score). In the validation cohort, this score yielded an area under the curve (AUC) of 0.81 (95% confidence interval [CI]: 0.77-0.85) for predicting the combined endpoint within 7 days of diagnosis and 0.81 (95% CI: 0.77-0.85) during full follow-up. We used an additional prospective cohort of 682 patients, diagnosed largely after the "first wave" of the pandemic, to validate the predictive accuracy of the score and observed similar results (AUC for the event within 7 days: 0.83 [95% CI: 0.78-0.87]; for full follow-up: 0.82 [95% CI: 0.78-0.86]). An easily applicable score to calculate the risk of COVID-19 progression to critical illness or death was thus established and validated.

Keywords: COVID-19; logistic models; machine learning; risk factors

Year: 2021 PMID: 34331717 PMCID: PMC8426905 DOI: 10.1002/jmv.27252

Source DB: PubMed Journal: J Med Virol ISSN: 0146-6615 Impact factor: 20.693

INTRODUCTION

The first human cases of coronavirus disease (COVID‐19) were described in December 2019 in Wuhan. COVID‐19 subsequently developed into one of the most disastrous pandemics experienced by our civilization since the Spanish flu at the beginning of the 20th century. The exponential spread of the disease‐causing severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), as occurred throughout Europe during the first wave of the pandemic, can overload hospitals and exhaust healthcare resources, which may negatively affect patient outcomes. This experience underscored the importance of an effective process for allocating limited healthcare resources to the COVID‐19 patients most likely to benefit. To maintain functional patient care, assessing disease severity in patients presenting at the emergency department (ED) may help guide frontline physicians' decisions. A considerable number of patients deteriorate rapidly after hospital admission and require transfer to the intensive care unit (ICU), whereas the clinical condition of other COVID‐19 patients improves rapidly. A prediction model could therefore help physicians determine whether a patient requires hospital admission or can be followed up in outpatient care. A risk assessment score may additionally help estimate the individual risk–benefit trade‐off of therapeutic interventions. This study aimed to develop a simplified risk prediction model based on clinical and demographic characteristics and laboratory findings at the time of COVID‐19 diagnosis to estimate the risk of clinical deterioration to critical illness. We used data from the Lean European Open Survey on SARS‐CoV‐2‐Infected Patients (LEOSS) project, a prospective European multicenter cohort study.

METHODS

Study design and patient cohort

This analysis included patients who received care (inpatient or outpatient) at a LEOSS partner site from March 16, 2020 onward. Cases documented in the LEOSS registry up to August 6, 2020 formed the initial cohort, which was split into derivation and validation sets. Cases entered from August 7, 2020 to November 18, 2020 formed the additional test sets (Figure 2A). The design of the LEOSS study and its data acquisition have been described previously.

Figure 2

Patient flow diagram (A) and months of COVID‐19 diagnosis (B) for the different data sets

Data were recorded anonymously, and no patient‐identifying data were stored; the requirement for written informed consent was therefore waived. Continuous parameters were categorized. To ensure anonymity at all stages of the analysis, an individual LEOSS Scientific Use File (SUF) was created based on the principles of the LEOSS public use file (PUF), as described previously. Following these principles, a small proportion of patients were removed from the data set, and some variable values were set to missing to ensure anonymization. Approval for LEOSS was obtained from the applicable local ethics committees of the participating centers, and the study was registered at the publicly accessible German Clinical Trials Register (DRKS, No. DRKS00021145). All predictors included in the stability selection are listed in Tables 1 and S1. We predefined a combined endpoint of progression to critical disease or COVID‐19‐related death. The definitions of the disease phases are summarized in Figure 1. The baseline (Day 0) was defined as the day of the first positive SARS‐CoV‐2 test. Only baseline predictors were included in the analysis (for laboratory values, those collected within 48 h of diagnosis). As an exception, if no CT scan was performed within 48 h of the positive test, CT variables collected later but during the same clinical phase as at baseline were included. We additionally derived a separate predictor indicating whether the patient had cardiovascular (CV) comorbidities, defined as any of the following: history of (H/O) myocardial infarction, aortic stenosis, atrioventricular block, carotid artery disease, chronic heart failure, peripheral vascular disease, hypertension, atrial fibrillation, or coronary artery disease.
An additional variable was derived for neurological comorbidities, defined as any of the following reported for the patient: hemiplegia, dementia, cerebrovascular disease or stroke, multiple sclerosis, myasthenia gravis, neuromyelitis optica spectrum disorder (NMOSD), movement disorder (e.g., Parkinson's disease, dystonia, ataxia, and tremor), motor neurone disease (e.g., amyotrophic lateral sclerosis and spinal muscular atrophy), other neurological autoimmune diseases, and other prior neurological diagnoses. We defined a predictor for any malignant neoplastic disease as any of the following: H/O lymphoma, leukemia, solid tumor, metastasized solid tumor, or stem cell transplantation.
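The derived comorbidity predictors are "any of" flags over the recorded conditions. The Python sketch below illustrates this for the CV flag. The field names are hypothetical, and the rule for missing entries (the flag is missing unless a condition is known to be present or all conditions are known to be absent) is an assumption, because the text does not state how missing components were handled. The neurological and malignancy flags would follow the same pattern with their own condition lists.

```python
from typing import Iterable, Mapping, Optional

# Hypothetical field names for the cardiovascular conditions listed above.
CV_CONDITIONS = (
    "myocardial_infarction", "aortic_stenosis", "av_block",
    "carotid_artery_disease", "chronic_heart_failure",
    "peripheral_vascular_disease", "hypertension",
    "atrial_fibrillation", "coronary_artery_disease",
)

def any_of(record: Mapping[str, Optional[bool]],
           conditions: Iterable[str]) -> Optional[bool]:
    """True if any listed condition is present, False if all are recorded
    as absent, otherwise None (missing). The missing rule is an assumption."""
    values = [record.get(name) for name in conditions]  # absent key -> None
    if any(v is True for v in values):
        return True
    if all(v is False for v in values):
        return False
    return None
```

For example, a record containing only `{"hypertension": True}` yields `True`, while a record with every condition explicitly `False` yields `False`.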

Table 1

Characteristics of patients in the derivation and validation data sets

Predictor	Deriv.	Valid.	Test, f.	Test, l.
Total patients	1297	649	682	219
Event during follow‐up (7d/all)
No	1095/1036 (84%/80%)	555/522 (86%/80%)	613/597 (90%/88%)	198/190 (90%/87%)
Yes	202/261 (16%/20%)	94/127 (14%/20%)	69/85 (10%/12%)	21/29 (10%/13%)
Type of patient care (not used for analyses)
Outpatient	16 (1%)	11 (2%)	9 (1%)	1 (0%)
Inpatient	1255 (97%)	627 (97%)	648 (95%)	207 (95%)
Missing	26 (2%)	11 (2%)	25 (4%)	11 (5%)
Age (year)
≤25	22 (2%)	17 (3%)	36 (5%)	9 (4%)
26–35	78 (6%)	42 (6%)	64 (9%)	29 (13%)
36–45	105 (8%)	50 (8%)	86 (13%)	29 (13%)
46–55	189 (15%)	98 (15%)	104 (15%)	38 (17%)
56–65	244 (19%)	117 (18%)	120 (18%)	45 (21%)
66–75	214 (16%)	118 (18%)	89 (13%)	25 (11%)
76–85	317 (24%)	140 (22%)	133 (20%)	30 (14%)
>85	110 (8%)	59 (9%)	47 (7%)	13 (6%)
Missing	18 (1%)	8 (1%)	3 (0%)	1 (0%)
Sex
Male	768 (59%)	360 (55%)	390 (57%)	133 (61%)
Female	529 (41%)	289 (45%)	292 (43%)	86 (39%)
Disease phase
Uncompl.	876 (68%)	430 (66%)	488 (72%)	162 (74%)
Compl.	421 (32%)	219 (34%)	194 (28%)	57 (26%)
Any cardiovascular comorbidity
Yes	727 (56%)	370 (57%)	346 (51%)	104 (47%)
No	545 (42%)	262 (40%)	326 (48%)	113 (52%)
Missing	25 (2%)	17 (3%)	10 (1%)	2 (1%)
Malignant neoplasia
No	1263 (97%)	635 (98%)	678 (99%)	218 (100%)
Yes	34 (3%)	14 (2%)	4 (1%)	1 (0%)
LDH (LN)
<Normal	0 (0%)	0 (0%)	8 (1%)	2 (1%)
Normal	439 (34%)	218 (34%)	249 (37%)	98 (45%)
>1x, ≤2x	596 (46%)	312 (48%)	305 (45%)	95 (43%)
>2x, ≤5x	87 (7%)	51 (8%)	38 (6%)	11 (5%)
>5x	4 (0%)	1 (0%)	3 (0%)	2 (1%)
Missing	171 (13%)	67 (10%)	79 (12%)	11 (5%)
Urea (LN)
<Normal	8 (1%)	9 (1%)	33 (5%)	8 (4%)
Normal	846 (65%)	408 (63%)	445 (65%)	173 (79%)
>1x, ≤2x	195 (15%)	106 (16%)	89 (13%)	26 (12%)
>2x	63 (5%)	32 (5%)	30 (4%)	8 (4%)
Missing	185 (14%)	94 (14%)	85 (12%)	4 (2%)
CRP (mg/L)
<3	181 (14%)	101 (16%)	97 (14%)	37 (17%)
3–29	454 (35%)	222 (34%)	250 (37%)	80 (37%)
30–69	266 (21%)	132 (20%)	140 (21%)	47 (21%)
70–119	166 (13%)	85 (13%)	92 (13%)	28 (13%)
120–179	124 (10%)	55 (8%)	67 (10%)	18 (8%)
180–249	52 (4%)	26 (4%)	18 (3%)	6 (3%)
>249	32 (2%)	17 (3%)	6 (1%)	0 (0%)
Missing	22 (2%)	11 (2%)	12 (2%)	3 (1%)
PCT (ng/ml)
<0.005	78 (6%)	28 (4%)	27 (4%)	12 (5%)
0.005–0.5	562 (43%)	282 (43%)	367 (54%)	161 (74%)
0.51–2	58 (4%)	35 (5%)	28 (4%)	10 (5%)
2.1–10	0 (0%)	0 (0%)	13 (2%)	5 (2%)
>10	10 (1%)	6 (1%)	4 (1%)	1 (0%)
Missing	589 (45%)	298 (46%)	243 (36%)	30 (14%)
D‐dimer (LN)
Normal	232 (18%)	123 (19%)	158 (23%)	83 (38%)
>1x, ≤2x	211 (16%)	109 (17%)	126 (18%)	72 (33%)
>2x, ≤5x	159 (12%)	69 (11%)	72 (11%)	34 (16%)
>5x, ≤10x	39 (3%)	27 (4%)	24 (4%)	9 (4%)
>10x, ≤20x	20 (2%)	11 (2%)	8 (1%)	2 (1%)
>20x	21 (2%)	12 (2%)	6 (1%)	4 (2%)
Missing	615 (47%)	298 (46%)	288 (42%)	15 (7%)
Neutrophils (×1000/μl)
<0.1	11 (1%)	3 (0%)	4 (1%)	1 (0%)
0.1 to <0.3	14 (1%)	3 (0%)	2 (0%)	0 (0%)
0.3 to <0.5	22 (2%)	10 (2%)	2 (0%)	0 (0%)
0.5 to <2	118 (9%)	62 (10%)	47 (7%)	15 (7%)
2 to <5	524 (40%)	262 (40%)	275 (40%)	105 (48%)
5 to <9	262 (20%)	139 (21%)	144 (21%)	54 (25%)
≥9	71 (5%)	40 (6%)	39 (6%)	6 (3%)
Missing	275 (21%)	130 (20%)	169 (25%)	38 (17%)
Lymphocytes (×1000/μl)
<0.1	16 (1%)	8 (1%)	7 (1%)	1 (0%)
0.1 to <0.3	56 (4%)	30 (5%)	18 (3%)	1 (0%)
0.3 to <0.5	95 (7%)	43 (7%)	33 (5%)	9 (4%)
0.5 to <0.8	230 (18%)	124 (19%)	118 (17%)	39 (18%)
0.8 to <1.5	421 (32%)	212 (33%)	231 (34%)	94 (43%)
1.5 to <3	198 (15%)	104 (16%)	100 (15%)	34 (16%)
≥3	15 (1%)	13 (2%)	17 (2%)	4 (2%)
Missing	266 (21%)	115 (18%)	158 (23%)	37 (17%)

Abbreviations: 7d, event (critical phase or COVID‐19‐related death) within 7 days of diagnosis; CRP, C‐reactive protein; LDH, lactate dehydrogenase; LN, laboratory normal range, “x” indicates multiples of the upper limit of the normal range; PCT, procalcitonin; Test, f., full test set (as shown in Figure 2); Test, l., limited test set (as shown in Figure 2).

Figure 1

Definition of COVID‐19 disease phases in the LEOSS registry. Patients were assigned to the highest phase for which at least one characteristic was fulfilled. ALT, alanine transaminase; AST, aspartate transaminase; INR, international normalized ratio of prothrombin time; PaO2, partial pressure of oxygen in arterial blood; qSOFA, quick sequential organ failure assessment score; sO2, blood oxygen saturation; ULN, upper limit of normal.

Statistical analysis

All analyses were performed using R (version 3.6.3). Random forest (RF) analyses (including missing value imputation and the individual Boruta stability selection steps) were performed using the “randomForestSRC” package by Ishwaran and Kogalur. Among the approximately 170 baseline variables available in the LEOSS data set, we selected those with <50% missing values in the combined derivation and validation data set (n = 1946 patients, Figure 2); troponin T (52% missing) and pancreatic lipase (56% missing) were retained as exceptions. This resulted in a total of 104 predictors (Tables 1 and S1). In the anonymized LEOSS cohort, time‐to‐event data were grouped for patients experiencing an event ≥8 days after study inclusion; the time variable was therefore coded as single days from 1 to 7 and as ≥8 days, resulting in eight bins (Table S1). These bins were used for the time‐to‐event approaches (random survival forest and Cox models) and for C‐index calculation. Because of anonymization, continuous predictors were available only as value ranges in the LEOSS cohort, and these ranges were coded as consecutively increasing integers. We performed RF missing value imputation using multivariate unsupervised splitting, as described by Tang and Ishwaran, with two iterations per imputation. An RF approach has previously been shown to be the method of choice for ordinal variables, which are the main target of imputation in our data set (because continuous variables were categorized). Imputations were performed separately for the combined derivation and validation data set (n = 1946 patients) and for the full test set (n = 682 patients, Figure 2), with the outcome variables withheld. Twenty imputed data sets were thus generated for each cohort.
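Three of the preprocessing rules above (the missingness filter with named exceptions, the eight time bins, and the integer coding of value ranges) are simple enough to state as code. This is a minimal Python sketch, not the authors' R implementation; the predictor names are hypothetical, and it assumes day counts start at 1 and that range codes start at 0 (the text specifies only consecutively increasing integers).

```python
from typing import Dict, List, Optional, Sequence

def select_predictors(missing_fraction: Dict[str, float], threshold: float = 0.5,
                      keep_anyway: Sequence[str] = ()) -> List[str]:
    """Keep predictors with less than `threshold` missing, plus named exceptions."""
    return sorted(name for name, frac in missing_fraction.items()
                  if frac < threshold or name in keep_anyway)

def time_bin(days_to_event: Optional[int]) -> Optional[int]:
    """Code follow-up time as 1..7 (single days) or 8 (meaning >= 8 days)."""
    if days_to_event is None:
        return None
    if days_to_event < 1:
        raise ValueError("this sketch assumes day counts start at 1")
    return min(days_to_event, 8)

# Ordered CRP value ranges (mg/L, Table 1) coded as consecutive integers.
CRP_RANGES = ["<3", "3-29", "30-69", "70-119", "120-179", "180-249", ">249"]
CRP_CODE = {label: code for code, label in enumerate(CRP_RANGES)}
```

With `keep_anyway=("troponin_t", "lipase")`, both hypothetical lab columns survive the filter despite exceeding 50% missingness, mirroring the two exceptions named above.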
We split the data into derivation and validation cohorts with similar characteristics with respect to the following predefined potential confounders: age, sex, presence of dyspnea, neutrophil count, lymphocyte count, lactate dehydrogenase (LDH), bilirubin, CRP, procalcitonin (PCT), D‐dimer, H/O malignant neoplasia, presence of CV comorbidity (as defined above), and the number of events. We performed 1000 random splits at a 2:1 ratio, calculated the standardized mean differences for each split, and selected the split with the smallest maximal standardized mean difference across these predictors. Variable selection was performed using the Boruta algorithm with 100 iterations, using equal proportions of the 20 imputed derivation data sets and a p value threshold of 0.01 for selection. For the classification RFs, we used the presence of an event (critical phase or COVID‐19‐related death) within 7 days of diagnosis as the outcome of interest during Boruta selection. We used the balanced method by Chen et al. both during Boruta selection and for modeling with the selected variables. We used random survival forests (RSF), as described by Ishwaran et al., during Boruta selection and for the final modeling of time‐to‐event data. Because RSFs take time to event into account, events occurring more than 7 days after diagnosis were also included. Variable importance was calculated using permutation. For Cox and logistic (binomial) regression models, we applied ridge (L2) penalization, optimized by 20‐fold cross‐validation on the imputed derivation data sets. Score values were calculated from the coefficients of the ridge‐penalized binomial regression model containing the five selected predictors, fitted on the derivation data set with an event within 7 days as the outcome; for each patient and predictor, missing values were replaced with the most common value across the 20 imputed data sets. Finally, the regression coefficients were divided by the smallest coefficient and rounded to the next integer.
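The split selection described above is a search over random 2:1 partitions that minimizes the worst-case imbalance. The Python sketch below shows the idea under stated assumptions: every column is complete (for example, taken from one imputed data set), the event indicator is passed as an extra 0/1 column to stand in for "number of events", and the SMD uses the common pooled-variance form, which the text does not specify.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple

def smd(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute standardized mean difference with a pooled standard deviation."""
    diff = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled_sd = math.sqrt((statistics.pvariance(a) + statistics.pvariance(b)) / 2)
    if pooled_sd == 0:
        return 0.0 if diff == 0 else math.inf
    return diff / pooled_sd

def best_split(columns: Dict[str, Sequence[float]], n_splits: int = 1000,
               frac: float = 2 / 3, seed: int = 0) -> Tuple[List[int], List[int], float]:
    """Return the random split whose largest SMD over all columns is smallest."""
    n = len(next(iter(columns.values())))
    cut = round(n * frac)  # 1946 patients -> 1297 derivation, 649 validation
    rng = random.Random(seed)
    idx = list(range(n))
    best: Tuple[List[int], List[int], float] = ([], [], math.inf)
    for _ in range(n_splits):
        rng.shuffle(idx)
        deriv, valid = sorted(idx[:cut]), sorted(idx[cut:])
        worst = max(
            smd([col[i] for i in deriv], [col[i] for i in valid])
            for col in columns.values()
        )
        if worst < best[2]:
            best = (deriv, valid, worst)
    return best
```

Minimizing the maximum SMD, rather than the mean, guards against a split that looks balanced on average but is badly skewed on a single confounder.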
Two‐sided p values for the ridge‐penalized binomial coefficients were obtained as suggested by Cule et al., by repeating the ridge regression procedure 1000 times on data sets with randomly permuted outcomes (using equal numbers of the 20 imputed data sets). The area under the receiver operating characteristic curve (AUC) and Harrell's C‐indices were calculated using linear predictors from the binomial and Cox ridge‐penalized regression models, or out‐of‐bag predictions for the RF approaches. The 95% confidence intervals (CIs) for AUC and C‐indices were calculated from 1000 bootstrap resamples of patients' scores, with equal contributions from the imputed data sets.
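The AUC and its bootstrap CI can be sketched in a few lines of Python. The AUC is computed here as the Mann-Whitney U statistic (ties receive average ranks), and the CI uses the percentile method. As a simplification, the sketch resamples a single set of scores; the study instead drew equally from the 20 imputed data sets, and the percentile method is an assumption, since the text does not name the bootstrap interval type.

```python
import random
from typing import List, Sequence, Tuple

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # average 1-based rank
        start = end + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_auc_ci(scores: Sequence[float], labels: Sequence[int], n_boot: int = 1000,
                     alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(1 for y in ys if y) < n:  # both classes needed for an AUC
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]
```

For scores `[0.1, 0.4, 0.35, 0.8]` with outcomes `[0, 0, 1, 1]`, three of the four event/non-event pairs are ordered correctly, so `auc` returns 0.75.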

RESULTS

Patient population

Important characteristics of the LEOSS cohort have been described previously. More diagnosed SARS‐CoV‐2 cases were available for the current analysis than in the previous report (2969 patients from the first wave of the pandemic in the first data set and 1233 patients in the second test set; Figure 2). Based on the predefined disease phase (Figure 1) and the availability of laboratory values, a total of 1946 patients were included in the first round of analysis and assigned to derivation and validation groups with similar characteristics (Figure 2). Important characteristics are summarized in Table 1, and the remaining predictors are summarized in Table S1. The age distribution in the first data set was balanced, with approximately equal proportions of patients aged ≤65 and >65 years. There were more men than women (55%–59% vs. 41%–45%). At least 56% presented with known CV comorbidity. The incidence of the combined endpoint (critical phase or COVID‐19‐related death) was 14%–16% within 7 days and 20% when any time point during follow‐up was included (Table 1). From the second test set (patients entered into the registry after the first data export for score derivation), 682 patients fulfilled the selection criteria. This set largely consisted of patients diagnosed after June 2020 (Figure 2B). Compared with the derivation/validation cohorts, these patients were younger (60% ≤65 years) and more were diagnosed during an uncomplicated phase (72% vs. 66%–68%). Consequently, the event rate was lower, with only 10% experiencing an event within 7 days of diagnosis and 12% during follow‐up (Table 1). Both the derivation and validation data sets consisted almost exclusively of patients receiving inpatient care.

Predictor selection

We performed Boruta variable stability selection using RF for classification, which selected 5 of the 104 predictors (Table 2): CRP, disease phase, age, serum urea, and D‐dimer levels (Figure S1A). Interestingly, a logistic regression model including only these five predictors performed almost on par with the model using the full set of variables (Table 2, “RF Boruta,” binomial ridge, median AUC = 0.81 in the validation cohort).

Table 2

Summary of the predictive performances of the analyzed models

			AUC, 7d (imp. range)		AUC, all (imp. range)
Selection	Model	N pr.	Derivation	Validation	Derivation	Validation
All pr.	RF	104	0.83 (0.82–0.83)	0.83 (0.82–0.83)	0.83 (0.82–0.83)	0.83 (0.82–0.83)
	Binomial ridge	104	0.88 (0.87–0.89)	0.81 (0.80–0.81)	0.86 (0.86–0.87)	0.81 (0.80–0.82)
RF Boruta	RF	5	0.74 (0.72–0.75)	0.73 (0.71–0.76)	0.73 (0.72–0.75)	0.74 (0.73–0.77)
	Binomial ridge	5	0.80 (0.80–0.80)	0.81 (0.81–0.81)	0.80 (0.80–0.80)	0.81 (0.81–0.81)
	Score	5	0.80 (0.80–0.80)	0.81 (0.81–0.81)	0.80 (0.80–0.80)	0.81 (0.81–0.81)
	Score	5	95% CI, 0.77–0.83	95% CI, 0.77–0.85	95% CI, 0.77–0.83	95% CI, 0.77–0.85
	Validation on the full test set			0.83 (0.82–0.83)		0.82 (0.82–0.83)
	Validation on the full test set			95% CI, 0.78–0.87		95% CI, 0.78–0.86
	Validation on the limited test set			0.82 (0.82–0.82)		0.83 (0.83–0.84)
	Validation on the limited test set			95% CI, 0.73–0.90		95% CI, 0.76–0.90

Note: Initial derivation and validation analyses were performed on the respective data sets (n = 1297 and 649, respectively) as summarized in Figure 2. As indicated, the final score was additionally validated independently on the full and the limited test sets (n = 682 and 219, as described in Figure 2). Values are medians with the full range across the imputed data sets in brackets. AUC values were calculated for an event within 7 days of diagnosis (“7d”) and for all time points (“all”). 95% confidence intervals (95% CI) were calculated for score predictions using bootstrapping with equal contributions of the imputed data sets. The performance of the final score (median AUC and 95% CI) in the respective validation and test data sets is given in the “Score” and test‐set rows.

Abbreviations: AUC, area under the receiver operating characteristic (ROC) curve; imp., imputation; N pr., number of predictors in the model; pr., predictors; RF, random forest for classification.

We additionally performed a Boruta stability selection using an RSF approach. Twenty‐four predictors were retained, with the five predictors from RF Boruta among the most important variables (Figure S1B). Increasing the number of predictors from 5 to 24 had a minor impact on the model's performance in the validation data set as measured by Harrell's C‐index (median C‐index: 0.76 vs. 0.77, respectively; Table S2).
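Harrell's C‐index, used above to compare the RSF, Cox, and score models, generalizes the AUC to time‐to‐event data. The following Python sketch assumes one common convention: a pair is comparable when the patient with the shorter observed time had an event, tied risk scores count as one half, and pairs with equal times are skipped. With the eight binned time values used here, ties are frequent, and how the study's software handled them is not stated.

```python
from typing import Sequence

def harrell_c(time: Sequence[float], event: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    Pair (i, j) is comparable if time[i] < time[j] and patient i had an event;
    it is concordant if risk[i] > risk[j]; tied risks add 0.5.
    Pairs with equal times are skipped (an assumed convention).
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A model that assigns higher risk to every patient who has an earlier event reaches C = 1.0, and constant risk scores give 0.5.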

Derivation and validation of a simplified predictive score

Given these encouraging results and its simple interpretability, we used the coefficients of the binomial ridge regression model with five predictors (Table 3) to derive an additive score for predicting COVID‐19 progression to the critical phase or death. The score is listed in Table 4. It performed similarly to the binomial model in both the derivation and validation data sets (median AUC in the validation data set for events within 7 days of diagnosis: 0.81, 95% CI: 0.77–0.85; for all events: 0.81, 95% CI: 0.77–0.85; Table 2). The simplified score also performed similarly to Cox regression and RSF models with either 5 or 24 predictors, as measured by Harrell's C‐index (median C‐index of 0.76, 95% CI: 0.73–0.80 in the validation cohort; Table S2).

Table 3

Results of the ridge‐penalized binomial regression on the five variables selected by RF Boruta

Predictor	Ridge β	p value	Weight
Age	0.07	0.024	1
Disease phase	0.40	0.003	5
Urea	0.26	0.013	3
CRP	0.14	0.002	2
D‐dimer	0.09	0.041	1

Note: Indicated are β coefficients from binomial ridge regression (outcome: event within 7 days) and the resulting weights per step increase in the respective predictor group (all groups are listed in Table 4). p values were calculated using ridge regression on the derivation data set with permutations of the outcome.

Abbreviation: CRP, C‐reactive protein.
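The Methods state that the weights were obtained by dividing each coefficient by the smallest coefficient and rounding. A minimal Python sketch of that rescaling follows; interpreting "next integer" as nearest‐integer rounding is an assumption, and the coefficients in the usage note are hypothetical. Note also that the β values in Table 3 are printed to two decimals, so recomputing weights from those printed values need not reproduce the published weights exactly.

```python
from typing import Dict

def integer_weights(betas: Dict[str, float]) -> Dict[str, int]:
    """Scale positive coefficients by the smallest one and round to integers.

    Nearest-integer rounding (Python's round) is an assumption of this sketch.
    """
    smallest = min(betas.values())
    if smallest <= 0:
        raise ValueError("this sketch expects strictly positive coefficients")
    return {name: round(beta / smallest) for name, beta in betas.items()}
```

For hypothetical coefficients `{"a": 0.05, "b": 0.21, "c": 0.12}`, the ratios are 1, 4.2, and 2.4, giving weights of 1, 4, and 2.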

Table 4

Calculation of the CAPS‐D score

Predictor	Score	Predictor	Score
Age (year)		CRP (mg/L)
≤25	‐	<3	‐
26–35	+1	3–29	+2
36–45	+2	30–69	+4
46–55	+3	70–119	+6
56–65	+4	120–179	+8
66–75	+5	180–249	+10
76–85	+6	>249	+12
>85	+7	Disease phase
D‐dimer (LN)		Uncomplicated	‐
Normal	‐	Complicated	+5
>1x, ≤2x	+1	Urea (LN)
>2x, ≤5x	+2	<Normal	‐
>5x, ≤10x	+3	Normal	+3
>10x, ≤20x	+4	>1x, ≤2x	+6
>20x	+5	>2x	+9
		Maximum score	38

Abbreviations: CRP, C‐reactive protein; LN, laboratory normal range, “x” indicates multiples of the upper limit of the normal range.
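Table 4 is consistent with a simple rule: each predictor contributes its Table 3 weight multiplied by the number of category steps above the lowest category, which yields the maximum of 7 + 12 + 5 + 9 + 5 = 38 points. The Python sketch below encodes this rule as a calculator. The category labels are hypothetical ASCII stand‐ins for the Table 4 rows, and the sketch requires all five values, whereas the study imputed missing ones.

```python
from typing import Dict

# Ordered categories from Table 4; a category's index is its number of steps.
CATEGORIES = {
    "age": ["<=25", "26-35", "36-45", "46-55", "56-65", "66-75", "76-85", ">85"],
    "crp": ["<3", "3-29", "30-69", "70-119", "120-179", "180-249", ">249"],
    "phase": ["uncomplicated", "complicated"],
    "urea": ["<normal", "normal", ">1x-2x", ">2x"],
    "ddimer": ["normal", ">1x-2x", ">2x-5x", ">5x-10x", ">10x-20x", ">20x"],
}
# Points per category step (weights from Table 3).
WEIGHTS = {"age": 1, "crp": 2, "phase": 5, "urea": 3, "ddimer": 1}

def caps_d_score(patient: Dict[str, str]) -> int:
    """Sum of weight x category index over the five CAPS-D predictors."""
    return sum(WEIGHTS[k] * CATEGORIES[k].index(patient[k]) for k in WEIGHTS)
```

For example, a patient aged 66–75 years (5 points) with CRP of 30–69 mg/L (4), complicated phase (5), normal urea (3), and D‐dimer >2x to ≤5x the upper limit of normal (2) scores 19, which is above the ≥17 cut‐off discussed below.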

We used the second test set, comprising patients whose data were entered into the registry after the initial data export (n = 682 patients, “full test set” in Figure 2), as an independent prospective validation group. To further reduce the impact of missing values on the estimation of score performance, we additionally removed patients from centers with >20% missing values for D‐dimer, the variable with the most missing values (42%–47% missing). Centers that enrolled <5 patients were also excluded, yielding an additional “limited test set” (n = 219 patients; Figure 2). This data set had few missing values (CRP, 1%; serum urea, 2%; D‐dimer, 7% missing; Table 1). In both the full and limited test sets, we confirmed the performance of the scoring system, with a trend toward higher AUC and C‐index values than in the validation data set (full test set, median AUC for 7 days: 0.83, 95% CI: 0.78–0.87; all events: AUC 0.82, 95% CI: 0.78–0.86; limited test set, median AUC for 7 days: 0.82, 95% CI: 0.73–0.90; all events: AUC 0.83, 95% CI: 0.76–0.90; Table 2; median C‐index for the full test set: 0.80, 95% CI: 0.76–0.84; limited test set: 0.81, 95% CI: 0.74–0.87; Table S2). Depending on the clinical application, different cut‐off values may be considered.
Therefore, we provide the predictive metrics of the score, such as sensitivity, specificity, and positive and negative predictive values (PPV and NPV) versus the cut‐off (Figure 3), as well as the absolute event risks for specific score values (Figure S2).
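The metrics reported for a given cut‐off all derive from the 2×2 table obtained by calling a score ≥ cut‐off "high risk". The Python sketch below computes them, including the likelihood ratios and the share of patients below the cut‐off reported in Table 5. The toy scores and outcomes in the usage note are illustrative only, and the sketch does not guard against the division by zero that occurs if a cell total is zero.

```python
from typing import Dict, Sequence

def cutoff_metrics(scores: Sequence[int], events: Sequence[int],
                   cutoff: int) -> Dict[str, float]:
    """Treat score >= cutoff as high risk and summarize test characteristics."""
    pairs = list(zip(scores, events))
    tp = sum(1 for s, e in pairs if s >= cutoff and e)
    fp = sum(1 for s, e in pairs if s >= cutoff and not e)
    fn = sum(1 for s, e in pairs if s < cutoff and e)
    tn = sum(1 for s, e in pairs if s < cutoff and not e)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "frac_below": (tn + fn) / len(pairs),
    }
```

For toy scores `[20, 18, 10, 5, 17, 12]` with outcomes `[1, 0, 0, 0, 1, 1]` and a cut‐off of 17, sensitivity, specificity, PPV, and NPV are each 2/3, LR+ is 2, LR− is 0.5, and half of the patients fall below the cut‐off.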

Figure 3

Summary of key characteristics of the score for predicting the combined endpoint of critical phase or COVID‐19‐related death (A) within 7 days of diagnosis or (B) at any time point during follow‐up in the validation and test cohorts. Color codes distinguish the different data sets as indicated. Sensitivity and NPV are indicated by continuous lines and the corresponding y axis scaling on the left, while specificity and PPV are indicated by dashed lines and the y axis scaling on the right side of the respective panels. Bottom panels show cumulative fractions of patients meeting the respective score cut‐offs for the combined validation and full test set (combined n = 1331). For all panels, the median score of the imputations (rounded to the next whole integer) was used for patients with missing values.

Apart from the discriminative performance, we observed good calibration, with slopes ranging from 0.949 to 1.113 in the different validation/test data sets (Figure S3). The Brier score tended to be smaller in the “full test” set than in the validation data set (0.076–0.091 vs. 0.106–0.124, respectively; Figure S3), mirroring the tendency toward better discriminative performance in this data set (Tables 2 and S2). Calibration‐in‐the‐large for the “full test” set, which had a lower event rate per case, was similar to that in the validation set for an event within 7 days (intercept: −0.160 vs. −0.174, respectively) but lower for all events (intercept: −0.314 vs. 0.010, respectively), potentially reflecting the differences in event rates between the cohorts. One method for selecting a cut‐off is to optimize the modified Youden's J. For the proposed score, the optimal J in the combined validation and full test data set was at a cut‐off of ≥17, both for predictions at 7 days after diagnosis and for all events.
Applying this cut‐off, on average, 69% of patients were predicted not to progress to critical illness (Table 5, combined validation/test data set) at an NPV of 95% for 7 days after diagnosis and an NPV of 94% for full follow‐up. Patients with scores at or above this threshold had ~3‐fold increased odds of experiencing an event, whereas patients below this threshold had ~3‐fold decreased odds as measured by the respective likelihood ratios (Table 5).
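Cut‐off selection by Youden's J scans all candidate thresholds and keeps the one that maximizes J = sensitivity + specificity − 1. The "modified" variant used in the study is not defined in this section, so the Python sketch below uses the standard index as a stand‐in; on ties it keeps the lowest cut‐off, which is also an arbitrary choice.

```python
from typing import Sequence, Tuple

def youden_cutoff(scores: Sequence[int], events: Sequence[int]) -> Tuple[int, float]:
    """Return (c, J) for the rule score >= c that maximizes the standard
    Youden index J = sensitivity + specificity - 1 (lowest c wins ties)."""
    n_pos = sum(1 for e in events if e)
    n_neg = len(events) - n_pos
    best_c, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(1 for s, e in zip(scores, events) if s >= c and e)
        tn = sum(1 for s, e in zip(scores, events) if s < c and not e)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j
```

With toy scores `[5, 8, 12, 16, 17, 19, 22, 25]` and outcomes `[0, 0, 0, 1, 0, 1, 1, 1]`, the cut‐offs 16 and 19 both reach J = 0.75, and the function returns `(16, 0.75)`.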

Table 5

Score characteristics at the selected cut‐off of ≥17

	Validation set (7d/all)	Full test set (7d/all)	Combined (7d/all)
Sensitivity	0.73/0.73	0.74/0.72	0.74/0.73
Specificity	0.72/0.75	0.77/0.79	0.75/0.77
PPV	0.31/0.41	0.27/0.32	0.29/0.37
NPV	0.94/0.92	0.96/0.95	0.95/0.94
LR+	2.6/2.9	3.3/3.3	2.9/3.1
LR−	0.37/0.36	0.34/0.36	0.35/0.36
%score < cut‐off	65%	72%	69%

Abbreviations: 7d, event (critical disease or COVID‐19‐related death) within 7d of diagnosis; all, all events during follow‐up; LR+/−, positive/negative likelihood ratio; NPV, negative predictive value; PPV, positive predictive value; %score < cut‐off, percentage of patients with scores below the cut‐off value (≤16).

DISCUSSION

We describe the derivation and validation of a COVID‐19 risk score for predicting the combined endpoint of critical disease or COVID‐19‐related death using five predictors. We derived the score in an untargeted manner, using an RF approach to select the most stable predictors among more than 100 available at baseline in the LEOSS registry and regularized regression to calculate the coefficients. A number of approaches for COVID‐19 risk stratification have been reported (reviewed by Wynants et al.); several had a similar aim of predicting critical disease, as indicated by admission to the ICU, or death. At the height of the pandemic, resources such as hospital and ICU beds were limited because of the strain on healthcare systems. Outcome predictions of currently available scores that were obtained under these constraints may therefore be difficult to generalize. Some important limiting factors must be considered. If hospital beds are limited, the study population for inpatient analyses may overrepresent patients with exceptionally severe symptoms and high‐risk groups, which may limit generalizability. Similarly, if ICU resources are limited, the indications for admission may be more conservative; a patient may then be classified as having a favorable outcome (not admitted to the ICU) despite having fulfilled the clinical criteria at some point. Another important consideration is the generalizability of mortality as an outcome for patient stratification. Case fatality rates differ widely across countries, possibly owing in part to country‐specific differences in the clinical management of COVID‐19 patients and in resource availability during the first wave of the pandemic. This may limit generalizability and may require existing mortality prediction scores to be updated as care providers gain experience with COVID‐19 management and the strain on hospitals decreases.
A previous review of COVID‐19 prognosis scores reached an overall negative assessment of their risk of bias, which discouraged their use. To our knowledge, a combination of characteristics sets our approach apart from those available at the time of writing, potentially making it more generalizable for clinical application: (a) the outcome was not defined in terms of a specific treatment (or lack thereof, i.e., admission to the ICU) but based on clinical features (a predefined “critical phase”); (b) inclusion was based on predefined clinical criteria (“uncomplicated” or “complicated” phase); and (c) a stability selection approach was used to reduce the number of predictors, as discussed below. Additionally, the majority (>90%) of patients enrolled in the LEOSS cohort were from Germany, where the capacity of the healthcare system was generally not exceeded during the first wave of the pandemic. To address bias in predictor selection, we used an untargeted approach and resampling techniques (stability selection and cross‐validated ridge regression) to test the predictions internally on the derivation data set and then validated them on a withheld validation cohort. Stability selection helps ensure internal validity and an adequate sample size for the derivation data set; too small a sample typically reduces variable stability and leads to fewer variables being selected. Ridge regression shrinks the regression coefficients to improve predictions in a binomial model with internal (cross‐)validation in the derivation data set. We confirmed the performance of our score in an independent test set comprising mostly COVID‐19 cases diagnosed after the first wave of the pandemic. An important contributor to the predictive performance of the final score was the predefined clinical phase (“complicated” vs. “uncomplicated”), which summarizes the presence of manifest organ involvement of the lungs, heart, or liver.
Of note, some parameters of the complicated phase, such as arterial partial pressure of oxygen (PaO2) and pericardial effusion, were acquired only when indicated (e.g., if echocardiography or arterial blood gas analysis was performed), not routinely. Therefore, for phase assignment, these parameters need not be considered in the absence of an indication for the respective measurement. Serum urea, likely reflecting kidney involvement, was an important predictor. It has previously been reported to outperform creatinine in predicting mortality. This predictor potentially captures both pre‐existing chronic kidney disease as a risk factor (Williamson et al.; Figure S1B) and acute kidney injury (AKI) due to COVID‐19 as organ involvement (also stable in RSF Boruta; Figure S1B). Different mechanisms of AKI have been observed in COVID‐19 patients. These include indirect involvement such as cardiorenal syndrome, direct virus‐induced injury, and immunological causes such as complement activation. Differentiating the type of acute kidney involvement in COVID‐19 patients may provide further insights and refine risk stratification in future analyses. Overall, despite being limited to five predictors and using a point system, our score compared favorably with more complex prediction models. Based on the modified Youden's J, we suggest a threshold of ≥17 points to identify patients at increased risk of critical disease. At this threshold, we obtained a threefold positive likelihood ratio while retaining a good negative predictive value of 94%–95%. Different cut‐offs may be considered depending on the application and local circumstances. Relevant circumstances include different local ratios of critical disease per case and, in an outpatient setting, the travel time to the nearest hospital in case of deterioration. 
The graphs provided in Figure 3 for sensitivity/specificity and PPV/NPV (based on the prevalence in the validation and test data sets) as well as in Figure S2 for absolute risk prediction may assist in determining such thresholds.
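The threshold discussion above rests on standard diagnostic-accuracy arithmetic. Sensitivity and specificity at a cut-off give the positive likelihood ratio, LR+ = sensitivity / (1 − specificity), which multiplies the pre-test odds. Bayes' theorem turns the same two quantities into PPV and NPV at any assumed prevalence. The Python sketch below shows both steps. It uses the standard, unmodified Youden's J (sensitivity + specificity − 1), because the modification applied in the study is not described here. The point totals, outcomes and function names are hypothetical and are not LEOSS data or the CAPS-D cut-off.

```python
def confusion_at(scores, labels, cutoff):
    """2x2 counts when a score >= cutoff is classified as 'increased risk'."""
    tp = fp = fn = tn = 0
    for s, event in zip(scores, labels):
        if s >= cutoff:
            if event:
                tp += 1
            else:
                fp += 1
        elif event:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, positive likelihood ratio, NPV, Youden's J."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return {"sens": sens, "spec": spec, "lr_pos": lr_pos, "npv": npv,
            "youden": sens + spec - 1}


def best_cutoff(scores, labels):
    """Candidate cutoff maximising the standard (unmodified) Youden's J."""
    return max(sorted(set(scores)),
               key=lambda c: metrics(*confusion_at(scores, labels, c))["youden"])


def ppv_npv(sens, spec, prevalence):
    """Bayes' theorem: predictive values of a fixed cutoff at a given prevalence."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical point totals and outcomes (1 = critical phase or death).
scores = [5, 8, 10, 12, 14, 16, 17, 18, 20, 22, 24, 26]
labels = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

cut = best_cutoff(scores, labels)
m = metrics(*confusion_at(scores, labels, cut))
print(cut, m)

# Predictive values of the chosen cutoff across assumed prevalences.
for prev in (0.05, 0.10, 0.20, 0.30):
    ppv, npv = ppv_npv(m["sens"], m["spec"], prev)
    print(f"prevalence {prev:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

For these toy data the J-maximizing cut-off is 18 points, with sensitivity 0.8, specificity 6/7, LR+ 5.6 and NPV 6/7. The prevalence loop shows why PPV rises and NPV falls as critical disease becomes more common. This is why the choice of threshold may depend on local case mix.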

Limitations

Our study had several limitations. The LEOSS registry is anonymized, and continuous parameters are categorized, which potentially reduces the predictive performance of laboratory measures. Because LEOSS is a real‐world data set with heterogeneous clinical procedures across centers, our analysis had to compensate for missing values. This typically reduces the predictive performance of the affected variables and the probability that they pass the stability selection criteria. Some predictors may therefore have been underestimated or missed. Our analysis was limited to predicting disease progression from information obtained at the time of the first positive SARS‐CoV‐2 test (typically at presentation to the medical facility), without considering the dynamics of the predictors. The number of days from symptom onset (uncomplicated phase) to diagnosis was included as a variable but did not meet the stability criteria. In addition, the validation and test data sets differed, with the latter having a higher proportion of patients diagnosed in the uncomplicated phase. This suggests earlier diagnosis, possibly due to expanded testing capacity after the first wave. Nevertheless, the score performed similarly or better in the test set. This indirect evidence suggests that our score may also be applicable at time points after diagnosis (or initial presentation), for example, when the patient's condition or laboratory values deteriorate. Further studies are required to evaluate its suitability in such settings. No information on patient race/ethnicity was available for this analysis. The distribution presumably follows that of the German population and is thus largely Caucasian, which may limit generalizability. External validation in different patient populations is therefore required, also with regard to socioeconomic factors and local standards of care. 
Extensive information on the comorbid conditions of study participants was available. Although some comorbidities passed the criteria in RSF stability selection, none passed the RF stability criteria. Including more predictors (24 vs. 5), however, did not improve the overall predictive performance. This suggests that the increased risk due to these comorbidities may already be reflected by the five remaining predictors (collinearity), obviating the need to include them in the score. However, this may not hold true for less common comorbidities. The overall improvement in prediction will be small for low‐prevalence predictors, even if they strongly affect the patients who have these comorbidities. A score based on the total population, as presented here, may therefore underestimate high‐risk constellations due to rare comorbidities, such as specific cancers or autoimmune diseases/immunosuppressive treatments. To our knowledge, this limitation applies to most, if not all, available COVID‐19 prognosis scores derived from the total population. Unfortunately, these patients may deteriorate rapidly. Future studies should therefore establish the additional risk conferred by such rare conditions beyond that captured by the score.

CONFLICT OF INTERESTS

Dr. Spinner reports grants, personal fees, and nonfinancial support from Gilead Sciences, grants and personal fees from Janssen‐Cilag, personal fees from Formycon, other from Aperion, other from Eli Lilly, during the conduct of the study; personal fees from AbbVie, personal fees from MSD, grants and personal fees from GSK/ViiV Healthcare outside the submitted work. Dr. Rüthrich reports grants from the IZKF outside the submitted work. Dr. Vehreschild reports personal fees from Merck/MSD, Gilead, Pfizer, Astellas Pharma, Basilea, German Centre for Infection Research (DZIF), University Hospital Freiburg/Congress and Communication, Academy for Infectious Medicine, University Manchester, German Society for Infectious Diseases (DGI), Ärztekammer Nordrhein, University Hospital Aachen, Back Bay Strategies, German Society for Internal Medicine (DGIM), and grants from Merck/MSD, Gilead, Pfizer, Astellas Pharma, Basilea, German Centre for Infection Research (DZIF), German Federal Ministry of Education and Research (BMBF), (PJ‐T: DLR), University of Bristol, Rigshospitalet Copenhagen. The remaining authors declare that they have no conflicts of interest.

AUTHOR CONTRIBUTIONS

Jörg J. Vehreschild: Initiation and leading of LEOSS. Jörg J. Vehreschild, Carolin E. M. Jakob, and Maximilian Schons: Developing and maintaining LEOSS. Stanislas Werfel, Christoph Schmaderer, and Christoph Spinner: Conception of this study and critical data interpretation. Stanislas Werfel: Machine learning and statistical analyses, generation of tables and figures and manuscript preparation. Carolin E. M. Jakob: Data management, extraction, and additional statistical analyses. Uwe Heemann and Jochen Schneider: Data interpretation and critical revision of the manuscript. Stefan Borgmann, Jochen Schneider, Martin Hower, Kai Wille, Martina Haselberger, Hanno Heuzeroth, Maria M. Rüthrich, Sebastian Dolff, Johanna Kessel, and Siegbert Rieg: Acquisition of data; all authors revised and approved the final version of the manuscript.

REFERENCES

1. The inconsistency of "optimal" cutpoints obtained using two criteria based on the receiver operating characteristic curve.

Authors: Neil J Perkins; Enrique F Schisterman
Journal: Am J Epidemiol Date: 2006-01-12 Impact factor: 4.897

2. Missing value imputation in high-dimensional phenomic data: imputable or not, and how?

Authors: Serena G Liao; Yan Lin; Dongwan D Kang; Divay Chandra; Jessica Bon; Naftali Kaminski; Frank C Sciurba; George C Tseng
Journal: BMC Bioinformatics Date: 2014-11-05 Impact factor: 3.169

3. Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19.

Authors: Carolin E M Jakob; Florian Kohlmayer; Thierry Meurers; Jörg Janne Vehreschild; Fabian Prasser
Journal: Sci Data Date: 2020-12-10 Impact factor: 6.444

4. Management of acute kidney injury in patients with COVID-19.

Authors: Claudio Ronco; Thiago Reis; Faeq Husain-Syed
Journal: Lancet Respir Med Date: 2020-05-14 Impact factor: 30.700

5. Back to the Future: Lessons Learned From the 1918 Influenza Pandemic.

Authors: Kirsty R Short; Katherine Kedzierska; Carolien E van de Sandt
Journal: Front Cell Infect Microbiol Date: 2018-10-08 Impact factor: 5.293

6. Factors associated with COVID-19-related death using OpenSAFELY.

Authors: Elizabeth J Williamson; Alex J Walker; Krishnan Bhaskaran; Seb Bacon; Chris Bates; Caroline E Morton; Helen J Curtis; Amir Mehrkar; David Evans; Peter Inglesby; Jonathan Cockburn; Helen I McDonald; Brian MacKenna; Laurie Tomlinson; Ian J Douglas; Christopher T Rentsch; Rohini Mathur; Angel Y S Wong; Richard Grieve; David Harrison; Harriet Forbes; Anna Schultze; Richard Croker; John Parry; Frank Hester; Sam Harper; Rafael Perera; Stephen J W Evans; Liam Smeeth; Ben Goldacre
Journal: Nature Date: 2020-07-08 Impact factor: 49.962

7. Early prediction of mortality risk among patients with severe COVID-19, using machine learning.

Authors: Chuanyu Hu; Zhenqiu Liu; Yanfeng Jiang; Oumin Shi; Xin Zhang; Kelin Xu; Chen Suo; Qin Wang; Yujing Song; Kangkang Yu; Xianhua Mao; Xuefu Wu; Mingshan Wu; Tingting Shi; Wei Jiang; Lina Mu; Damien C Tully; Lei Xu; Li Jin; Shusheng Li; Xuejin Tao; Tiejun Zhang; Xingdong Chen
Journal: Int J Epidemiol Date: 2021-01-23 Impact factor: 7.196

8. A validated, real-time prediction model for favorable outcomes in hospitalized COVID-19 patients.

Authors: Narges Razavian; Vincent J Major; Mukund Sudarshan; Jesse Burk-Rafel; Peter Stella; Hardev Randhawa; Seda Bilaloglu; Ji Chen; Vuthy Nguy; Walter Wang; Hao Zhang; Ilan Reinstein; David Kudlowitz; Cameron Zenger; Meng Cao; Ruina Zhang; Siddhant Dogra; Keerthi B Harish; Brian Bosworth; Fritz Francois; Leora I Horwitz; Rajesh Ranganath; Jonathan Austrian; Yindalon Aphinyanaphongs
Journal: NPJ Digit Med Date: 2020-10-06

9. Prediction models for covid-19 outcomes.

Authors: Matthew Sperrin; Brian McMillan
Journal: BMJ Date: 2020-10-20

Development and validation of a simplified risk score for the prediction of critical COVID-19 illness in newly diagnosed patients.

Authors: Stanislas Werfel; Carolin E M Jakob; Stefan Borgmann; Jochen Schneider; Christoph Spinner; Maximilian Schons; Martin Hower; Kai Wille; Martina Haselberger; Hanno Heuzeroth; Maria M Rüthrich; Sebastian Dolff; Johanna Kessel; Uwe Heemann; Jörg J Vehreschild; Siegbert Rieg; Christoph Schmaderer
Journal: J Med Virol Date: 2021-08-10 Impact factor: 20.693
