Community-acquired pneumonia: comparison of three mortality prediction scores in the emergency department.

Carolina Hincapié¹, Johana Ascuntar¹, Alba León¹, Fabián Jaimes^1,2.

Abstract

Background: qSOFA is a score used to identify patients with suspected infection who are at risk of complications. Its criteria are similar to those of the prognostic scores for pneumonia (CRB-65 and CURB-65), but it is not clear which score best predicts mortality and admission to the intensive care unit (ICU).
Objective: To compare three scores (CURB-65, CRB-65 and qSOFA) and determine the best tool for identifying emergency department patients with pneumonia at increased risk of mortality or ICU admission.
Methods: Secondary analysis of three prospective cohorts of patients hospitalized with a diagnosis of pneumonia in five Colombian hospitals. The scores' accuracy was validated and compared by means of discrimination and calibration measures.
Results: Cohorts 1, 2 and 3 included 158, 745 and 207 patients, with mortality rates of 32.3%, 17.2% and 18.4% and ICU admission rates of 52.5%, 43.5% and 25.6%, respectively. The best AUC-ROC for mortality was for CURB-65 in cohort 3 (AUC-ROC = 0.67). The calibration was adequate (p > 0.05) for the three scores.
Conclusions: None of these scores proved to be an appropriate predictor of mortality or admission to the ICU. Furthermore, the CRB-65 exhibited the lowest discriminative ability.

Keywords: Sepsis; clinical decision rules; mortality; pneumonia

Year: 2021 PMID: 35499040 PMCID: PMC9015018 DOI: 10.25100/cm.v52i4.4287

Source DB: PubMed Journal: Colomb Med (Cali) ISSN: 0120-8322

Introduction

Pneumonia is a significant cause of sepsis worldwide, representing approximately half of all cases, and is the second most frequent cause of sepsis in Colombia. Globally, pneumonia confers a high risk of mortality. Between 2005 and 2012 in Colombia, acute respiratory infection was the leading cause of death from communicable diseases, with 48.6% of the cases, representing 56.2% of deaths from communicable diseases in women and 43.1% in men. Providing health care to patients with severe infections carries a high cost for a state and its health system. These infections are clinically challenging because there are no simple and specific prognostic markers that allow early identification of at-risk individuals who warrant differential care. It is therefore important to have useful clinical tools to estimate the risk of death or complications in emergency department patients with suspected infection. Several studies have sought to define a mortality prediction score specifically for pneumonia, and the CURB-65 and CRB-65 scores have been widely used because they are easier to apply than others such as the PSI (Pneumonia Severity Index). Recently, the third sepsis consensus (SEPSIS 3) encouraged the use of the qSOFA (quick sepsis-related organ failure assessment) score in adult patients with suspected acute bacterial infection for early identification of those with a worse prognosis. The Colombian Ministry of Health, the Argentine Society of Infectious Diseases and the Mexican Institute of Social Security all recommend CURB-65 in their guidelines for the management of patients with community-acquired pneumonia, despite the lack of local studies to confirm and validate this recommendation.
The CURB-65, CRB-65 and qSOFA were designed to identify patients at increased risk of complications and mortality. These scores share clinical variables, and community-acquired pneumonia is the main cause of sepsis; therefore, exploring potential differences in their performance as prognostic models would have implications for clinical practice. Likewise, any multivariable model developed for prognostic or diagnostic purposes must be validated in independent populations. Therefore, this study aimed to validate and compare the three scores to determine the best tool for identifying emergency department patients with pneumonia who are at increased risk of mortality or intensive care unit (ICU) admission.

Materials and Methods

Study design and setting

This analysis used three prospective cohort studies conducted between 2013 and 2016 in five emergency departments in the city of Medellín: Hospital Universitario San Vicente Fundación (560 adult inpatient beds and 45 ICU beds in 4 units), the University Health Services Provider Institution IPS Universitaria Clínica León XIII (450 adult inpatient beds and 24 ICU beds in 2 units), Hospital Pablo Tobón Uribe (360 adult inpatient beds and 40 ICU beds in 3 units), Hospital General de Medellín (442 beds) and Clínica Las Américas (304 beds). The first cohort was recruited from the emergency departments of three tertiary care hospitals: Hospital Pablo Tobón Uribe, Hospital General de Medellín and Clínica Las Américas (2013-2016). The second was recruited from the emergency departments of three tertiary care hospitals: Hospital Universitario San Vicente Fundación, Institución Universitaria Clínica León XIII and Hospital Pablo Tobón Uribe (2014-2016). The third was recruited from the emergency department of Hospital Universitario San Vicente Fundación (2014-2016).

Source of data

For each of the original cohorts, trained research assistants systematically collected data from electronic medical records, reviewing all hospital admissions and screening all patients admitted through the emergency department with a diagnosis of infection, sepsis, severe sepsis or shock. The source of infection and the presence of organ dysfunction or septic shock were verified with data extracted from the medical records for the first 6 hours. To ensure data accuracy, the co-investigators reviewed the information periodically. The information was recorded on forms designed specifically for each investigation and then stored in electronic databases. Because the cohorts were prospective, the predictors were assessed without knowledge of the outcomes of interest. Additionally, for this study it was necessary to retrieve the BUN (blood urea nitrogen) value at hospital admission for patients at Hospital Universitario San Vicente Fundación and Hospital Pablo Tobón Uribe. The data collection process preserved the confidentiality of the information and was approved by the ethics committees of each participating institution.

Participants

For the current study, the inclusion criterion was entry into one of the previous studies with a diagnosis of pneumonia. Cohort 1 used the Centers for Disease Control and Prevention (CDC) criteria for infection, cohort 2 required suspected infection with at least one organ dysfunction criterion, and cohort 3 required clinical suspicion of infection. The exclusion criteria common to the three cohorts were early discharge or referral to another institution, do-not-resuscitate orders, and terminal disease (Annex 1). The present study applied no exclusion criteria beyond those of the original studies (12-14).

Variables

The primary outcome was hospital mortality; ICU admission was included as a secondary outcome.

The predictor variables

qSOFA: This severity prediction score assigns one point each for a Glasgow Coma Scale score ≤14, systolic blood pressure ≤100 mmHg and respiratory rate ≥22 breaths per minute, for a total score between 0 and 3. It is proposed that the presence of two of these three criteria could predict mortality in patients with suspected infection outside the ICU (7).
CURB-65: This score assigns one point each for confusion (Glasgow Coma Scale score <15), urea >7 mmol/L, respiratory rate ≥30 breaths per minute, systolic blood pressure <90 mmHg or diastolic blood pressure <60 mmHg, and age ≥65 years, for a total score between 0 and 5. It is proposed that three or more points could predict mortality in patients with community-acquired pneumonia.
CRB-65: This score uses the CURB-65 variables except urea, with one point for each variable, for a total score between 0 and 4. It is proposed that three or more points could predict mortality in patients with community-acquired pneumonia.
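
The three definitions above can be sketched in a few lines of Python. This is only an illustration of the scoring rules as stated in this section; the `Vitals` class, its field names and the example patient are hypothetical, and the BUN-to-urea conversion factor (0.357) is the standard unit conversion, under the assumption that BUN is recorded in mg/dL, which the text does not state.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    """Admission findings. Field names are illustrative, not from the study."""
    gcs: int            # Glasgow Coma Scale score (3 to 15)
    resp_rate: int      # breaths per minute
    sbp: int            # systolic blood pressure, mmHg
    dbp: int            # diastolic blood pressure, mmHg
    age: int            # years
    urea_mmol_l: float  # serum urea, mmol/L


def bun_to_urea(bun_mg_dl: float) -> float:
    """Convert BUN in mg/dL (assumed unit) to urea in mmol/L."""
    return bun_mg_dl * 0.357


def qsofa(v: Vitals) -> int:
    """One point each: GCS <= 14, SBP <= 100 mmHg, respiratory rate >= 22."""
    return int(v.gcs <= 14) + int(v.sbp <= 100) + int(v.resp_rate >= 22)


def crb65(v: Vitals) -> int:
    """One point each: confusion, respiratory rate >= 30, low BP, age >= 65."""
    return (int(v.gcs < 15) + int(v.resp_rate >= 30)
            + int(v.sbp < 90 or v.dbp < 60) + int(v.age >= 65))


def curb65(v: Vitals) -> int:
    """CRB-65 plus one point for urea > 7 mmol/L."""
    return crb65(v) + int(v.urea_mmol_l > 7)


# Hypothetical patient, for illustration only.
patient = Vitals(gcs=14, resp_rate=24, sbp=95, dbp=58, age=70,
                 urea_mmol_l=bun_to_urea(27.1))
print(qsofa(patient), crb65(patient), curb65(patient))  # prints: 3 3 4
```

Writing CURB-65 as CRB-65 plus the urea point makes explicit that the two pneumonia scores differ only in the laboratory variable.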

Sample size

Because this was a secondary analysis, no sample size was calculated; the analysis included all patients of the respective cohorts who met the inclusion criteria. However, the power to detect the expected difference between areas under the curve was calculated from the fixed number of patients, with the type I error set at 0.05. The calculation was based on the formula described by Hanley and McNeil. With fixed sample sizes of 158, 745 and 207 patients for cohorts 1, 2 and 3, respectively, an alpha of 0.05, and AUC-ROC (area under the ROC curve) values of θ1 = 0.70 and θ2 = 0.77 (taken from the study by Kolditz et al. because this information was not available locally), the estimated power was 0.52, 0.98 and 0.62, respectively.
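
As a rough illustration of this kind of calculation, the sketch below implements the Hanley-McNeil standard error of an AUC and a simplified normal-approximation power for the difference between two AUCs. The function names are hypothetical, and the correlation `r` between the two AUCs (which are measured on the same patients) is an assumed input that the text does not report, so this sketch should not be expected to reproduce the power values given above.

```python
import math
from statistics import NormalDist


def hanley_mcneil_se(auc: float, n_events: int, n_nonevents: int) -> float:
    """Standard error of an AUC using the Hanley-McNeil approximation."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_events - 1) * (q1 - auc ** 2)
                + (n_nonevents - 1) * (q2 - auc ** 2)) / (n_events * n_nonevents)
    return math.sqrt(variance)


def power_two_aucs(auc1: float, auc2: float, n_events: int, n_nonevents: int,
                   r: float = 0.0, alpha: float = 0.05) -> float:
    """Approximate two-sided power to detect auc1 != auc2.

    r is the assumed correlation between the two AUC estimates;
    r = 0 treats them as independent, which is conservative for paired data.
    """
    se1 = hanley_mcneil_se(auc1, n_events, n_nonevents)
    se2 = hanley_mcneil_se(auc2, n_events, n_nonevents)
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(auc1 - auc2) / se_diff - z_alpha)


# Deaths and survivors per cohort, from the reported mortality counts.
cohorts = {1: (51, 107), 2: (128, 617), 3: (38, 169)}
for cohort, (deaths, survivors) in cohorts.items():
    print(cohort, round(power_two_aucs(0.70, 0.77, deaths, survivors, r=0.5), 2))
```

The value `r=0.5` in the usage lines is arbitrary; a higher correlation between scores shrinks the standard error of the difference and raises the estimated power.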

Statistical methods

Quantitative variables with a normal distribution are presented as means and standard deviations, and those without a normal distribution as medians and interquartile ranges (IQRs). The three predictive models (CURB-65, CRB-65 and qSOFA) were validated and compared in terms of prognosis. Determining the predictive accuracy of the models required examining both calibration and discrimination. Calibration measures the agreement between observed and expected events, while discrimination measures how well the score distinguishes individuals who experience the event of interest from those who do not. Discrimination was assessed with the area under the receiver operating characteristic curve (AUC-ROC), with each model defined as the sum of its predictors. Differences between AUC-ROCs were tested with the DeLong-DeLong statistic. Calibration was assessed with the Hosmer-Lemeshow goodness-of-fit test (adequate if p > 0.05). Additionally, calibration curves were plotted from the results of the models in each cohort. The operative characteristics for predicting mortality and ICU need were then estimated for each score, using cutoffs of two or more points for the qSOFA and three or more for both the CURB-65 and CRB-65, the cutoffs that the original proposals designate as high risk of mortality. Likewise, the performance of each model was analyzed at all possible cutoff points and compared with that at the originally proposed cutoff points. The sensitivity, specificity, predictive values and likelihood ratios of the scores at their respective cutoff points were calculated using Bayes' theorem, with mortality and ICU need as the reference standard.
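
To make the discrimination measure concrete, the sketch below computes the AUC-ROC of an integer score as the probability that a randomly chosen patient with the event has a higher score than a randomly chosen patient without it, counting ties as one half. The function name and the toy data are hypothetical; the study itself used Stata, and this is not its implementation.

```python
def auc_from_scores(scores, outcomes):
    """AUC-ROC via pairwise comparison (equivalent to the Mann-Whitney U statistic)."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevents = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("both outcome classes are required")
    # A pair counts 1 if the event patient scores higher, 0.5 if tied, 0 otherwise.
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in nonevents)
    return wins / (len(events) * len(nonevents))


# Toy data: six patients with their score and outcome (1 = died).
scores = [0, 1, 1, 2, 3, 2]
died = [0, 0, 1, 0, 1, 1]
print(round(auc_from_scores(scores, died), 3))
```

Under this definition a value of 0.5 means the score ranks patients no better than chance, and a value below 0.5 means that patients with the event tend to have lower scores than those without it.
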
In the main analysis, missing data were treated as abnormal values (worst-case scenario). Additionally, a sensitivity analysis was performed with two further models: a best-case scenario, treating missing data as normal values, and multiple imputation by multivariate normal regression (MVN), using BUN, age, gender, Charlson index, SOFA and Acute Physiology and Chronic Health Evaluation II (APACHE II) as independent variables. Statistical analyses were performed with Stata 14® software. Results are presented with their 95% confidence intervals (CI), and a significance level of p < 0.05 was applied. The reporting followed the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) guidelines.
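
The worst-case and best-case handling of a missing value can be illustrated for the urea item of the CURB-65. This is a minimal sketch with a hypothetical function name and scenario labels; it does not cover the multiple imputation model, which requires a statistical package.

```python
from typing import Optional


def urea_point(urea_mmol_l: Optional[float], scenario: str = "worst") -> int:
    """Return the CURB-65 urea point, resolving a missing value by scenario.

    scenario="worst": a missing value counts as abnormal (1 point).
    scenario="best":  a missing value counts as normal (0 points).
    """
    if urea_mmol_l is None:
        if scenario == "worst":
            return 1
        if scenario == "best":
            return 0
        raise ValueError(f"unknown scenario: {scenario}")
    return int(urea_mmol_l > 7)


print(urea_point(None, "worst"), urea_point(None, "best"), urea_point(9.2, "best"))
```

Observed values are scored the same way in both scenarios; only the patients with missing values move between risk categories.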

Results

A total of 158, 745 and 207 patients were analyzed for cohorts 1, 2 and 3, respectively. In the same order, the median age was 70 (IQR = 56-81), 66 (IQR = 54-77) and 60 (IQR = 44-75) years; 34.2%, 48.9% and 44.4% were female; 52.5%, 43.5% and 25.6% required admission to the ICU; and 32.3%, 17.2% and 18.4% died during hospitalization (Table 1). Blood cultures were requested for 95.6%, 84.8% and 84.5% of the patients, and microorganisms were isolated in 23.2%, 10.8% and 9.1% of those cultured in cohorts 1, 2 and 3, respectively. The most frequent microorganisms in each cohort were Streptococcus pneumoniae, Klebsiella pneumoniae, Haemophilus influenzae and Escherichia coli (Table 2).

Table 1

Baseline Characteristics of the study population, according to the original cohorts.

	Cohort 1 (n=158)	Cohort 2 (n=745)	Cohort 3 (n=207)
Characteristics
Age, years	70 (56-81)	66 (54-77)	60 (44-75)
Female sex	54 (34.2%)	364 (48.9%)	92 (44.4%)
CDC criteria	158 (100%)	628 (84.75%)	187 (90.34%)
Severity
Charlson index	1 (0-3)	1 (0-2)	1 (0-2)
SOFA	4 (3-6)	4 (3-6)	3 (2-5)
APACHE II	17 (12-21)	15 (11-19)	13 (9-17)
Variables
RR	24 (20-28)	22 (19-27)	23 (19-26)
SAP	110 (90-130)	113 (92-132)	120 (100-140)
DAP	60 (49-72)	68 (55-80)	76 (60-84)
MAP	76 (64-90)	83 (68-97)	91 (73-101)
Glasgow coma scale	15 (14-15)	15 (15-15)	15 (15-15)
BUN	27.1 (16.3-45.2), n=143	21.5 (14-33.4), n=704	18 (13-29), n=206
≥65 years	93 (58.9%)	393 (52.8%)	89 (43%)
Scores
qSOFA	1 (1-2)	1 (1-2)	1 (0-1)
CURB-65	2 (2-3)	2 (1-3)	1 (1-2)
CRB-65	2 (1-2)	1 (1-2)	1 (0-2)
Outcomes
ICU	83 (52.5%)	324 (43.5%)	53 (25.6%)
Death	51 (32.3%)	128 (17.2%)	38 (18.4%)

Table 2

Blood cultures and microbiological results in the study population, according to the original cohorts

	Cohort 1 (n=158)	Cohort 2 (n=745)	Cohort 3 (n=207)
Characteristics
Blood culture requested	151 (95.6%)	632 (84.8%)	175 (84.5%)
Positive blood culture	35 (23.2%)	68 (10.8%)	16 (9.1%)
Main microorganisms
Streptococcus pneumoniae	13 (8.6%)	23 (3.6%)	6 (3.4%)
Klebsiella pneumoniae	6 (4%)	6 (1%)	1 (1%)
Haemophilus influenzae	2 (1.3%)	6 (1%)	2 (1.1%)
Staphylococcus aureus	4 (2.7%)	8 (1%)	2 (1%)
Escherichia coli	5 (3.3%)	9 (1.4%)	1 (1%)
Pseudomonas aeruginosa	2 (1.3%)	3 (1%)	0

Abbreviations: SOFA, Sequential Organ Failure Assessment; APACHE II, Acute Physiology and Chronic Health Evaluation II; RR, respiratory rate; SAP, systolic arterial pressure; DAP, diastolic arterial pressure; MAP, mean arterial pressure; BUN, blood urea nitrogen. Quantitative variables are expressed as medians with their interquartile ranges; categorical variables are shown as absolute and relative frequencies.

For the outcome of admission to the ICU, discrimination was low for the three scores in the three cohorts. According to the DeLong-DeLong statistic, the AUC-ROCs differed significantly in cohorts 1 and 2 (p < 0.05) (Figure 1), with an AUC-ROC of 0.59 for the qSOFA, 0.43 for the CURB-65 and 0.44 for the CRB-65 in cohort 1.

For the mortality outcome, discrimination was not adequate for any of the three scores in any of the three cohorts. According to the DeLong-DeLong statistic, the AUC-ROCs differed significantly in cohorts 1 and 2 (Figure 2), with an AUC-ROC of 0.66 (95% CI = 0.62-0.71) for the CURB-65, 0.60 (95% CI = 0.56-0.65) for the qSOFA and 0.63 (95% CI = 0.59-0.68) for the CRB-65 in cohort 2.

Figure 1

Receiver operating characteristic curve in the different cohorts for qSOFA, CURB-65 and CRB-65 in the discrimination of ICU admission. A. Cohort 1 (DeLong-DeLong p = 0.0008). B. Cohort 2 (DeLong-DeLong p = 0.0402). C. Cohort 3 (DeLong-DeLong p = 0.3403).

Figure 2

Receiver operating characteristic curve in the different cohorts for qSOFA, CURB-65 and CRB-65 in the discrimination of mortality. A. Cohort 1 (DeLong-DeLong p = 0.0108). B. Cohort 2 (DeLong-DeLong p = 0.0218). C. Cohort 3 (DeLong-DeLong p = 0.1606).

The calibration of the models was adequate in the study population for admission to the ICU and the mortality outcome, according to the Hosmer-Lemeshow statistic of the three scores in each of the cohorts (p> 0.05) (Table S1).

Table S1

Calibration of the models for ICU admission and mortality

	Cohort 1 (n=158)	Cohort 2 (n=745)	Cohort 3 (n=207)
ICU admission
qSOFA
Number of groups	7	5	7
Hosmer-Lemeshow	1.6	1.1	2.0
p-Value	0.9002	0.7875	0.8449
CURB-65
Number of groups	10	10	9
Hosmer-Lemeshow	8	6.2	3.1
p-Value	0.4355	0.6284	0.8749
CRB-65
Number of groups	10	8	7
Hosmer-Lemeshow	12.1	3.9	3.3
p-Value	0.1466	0.6941	0.6478
Mortality
qSOFA
Number of groups	6	6	4
Hosmer-Lemeshow	4.6	1.1	0.14
p-Value	0.3365	0.8985	0.9316
CURB-65
Number of groups	10	10	9
Hosmer-Lemeshow	5	8.1	3.5
p-Value	0.7612	0.4236	0.8309
CRB-65
Number of groups	10	7	7
Hosmer-Lemeshow	5.7	6.42	1.9
p-Value	0.6843	0.2672	0.8632
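
For readers unfamiliar with the statistic reported in Table S1, the sketch below computes a Hosmer-Lemeshow chi-square from grouped observed and expected events. The function name and the three example groups are hypothetical and are not study data. The p-value is obtained from a chi-square distribution with (number of groups - 2) degrees of freedom; it is not computed here so that the example needs only the standard library.

```python
def hosmer_lemeshow(groups):
    """Return (statistic, degrees of freedom) for grouped calibration data.

    groups: list of (n, observed_events, expected_events), one tuple per risk group.
    """
    statistic = 0.0
    for n, observed, expected in groups:
        mean_risk = expected / n  # average predicted probability in the group
        statistic += (observed - expected) ** 2 / (n * mean_risk * (1 - mean_risk))
    return statistic, len(groups) - 2


# Hypothetical groups: (patients, observed deaths, expected deaths).
example = [(50, 5, 6.0), (60, 12, 11.0), (48, 20, 19.5)]
stat, df = hosmer_lemeshow(example)
print(round(stat, 3), df)
```

A small statistic relative to its degrees of freedom (a large p-value) indicates that observed and expected event counts agree within each risk group.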

Additionally, calibration curves were plotted for both outcomes for each model in each cohort; they showed a high degree of correspondence for the scores in most of the cohorts (Supplementary Figures S1 and S2).

Figure S1

Calibration curves for ICU need prediction

Figure S2

Calibration curves for mortality prediction

Regarding the operative characteristics of the models, the greatest sensitivity in cohort 1 was achieved by the qSOFA for ICU need (55.4%) and by the CURB-65 for mortality (58.8%). The greatest specificity was achieved by the CRB-65 for both outcomes: 93.4% for ICU need in cohort 2 and 93.5% for mortality in cohort 3. The lowest performance in predicting mortality in terms of sensitivity was for the CRB-65 in cohort 3 (13.2%), for specificity it was for the qSOFA in cohort 1 (43.9%) and for the positive predictive value it was the CRB-65 in cohort 3 (Tables S2 and S3).

Table S2

Operative characteristics for ICU admission prediction for qSOFA (cutoff point ≥2), CURB-65 (cutoff point ≥3) and CRB-65 (cutoff point ≥3).

	Sensitivity (95% CI)	Specificity (95% CI)	PPV (95% CI)	NPV (95% CI)	LR + (95% CI)	LR - (95% CI)
qSOFA
Cohort 1	55.4 (44.1- 66.7)	60.0 (48.3-71.8)	60.5 (48.9- 72.2)	54.9 (43.5- 66.3)	1.4 (1-1.9)	0.7 (0.6- 1.0)
Cohort 2	37.4 (31.9- 42.8)	78.2 (74.1-82.2)	56.8 (49.9- 63.7)	61.8 (57.6- 66.1)	1.7 (1.4- 2.2)	0.8 (0.7-0.9)
Cohort 3	30.2 (16.9-43.5)	79.2 (72.5-86)	33.3 (19- 47.7)	76.7 (69.9- 83.6)	1.5 (0.9-2.4)	0.9 (0.7-1.1)
CURB-65
Cohort 1	39.8 (28.6-50.9)	45.3 (33.4-57.3)	44.6 (32.6- 56.6)	40.5 (29.4- 51.6)	0.7 (0.5-1.0)	1.3 (1-1.8)
Cohort 2	31.2 (26- 36.4)	76.3 (72.1-80.4)	50.3 (43.1- 57.4)	59.0 (54.8-63.2)	1.3 (1.0-1.7)	0.9 (0.8- 1)
Cohort 3	18.9 (7.4- 30.3)	83.8 (77.6- 89.9)	28.6 (12.2- 45)	75.0 (68.2- 81.8)	1.2 (0.6- 2.3)	1 (0.8- 1.1)
CRB-65
Cohort 1	14.5 (6.3- 22.6)	72.0 (61.2- 82.8)	36.4 (18.4- 54.3)	43.2 (34.1- 52.3)	0.5 (0.3- 1)	1.2 (1.0- 1.4)
Cohort 2	10.5 (7.0- 14)	93.4 (90.9- 95.9)	54.9 (41.6- 68.0)	57.5 (53.8- 61.3)	1.6 (1- 2.6)	1 (0.9- 1.0)
Cohort 3	7.6 (0- 15.6)	92.2 (87.7- 96.8)	25.0 (0.7- 49.3)	74.4 (67.9- 80.8)	1 (0.3- 2.9)	1.0 (0.9- 1.1)

Table S3

Operative characteristics for mortality prediction for qSOFA (cutoff point ≥2), CURB-65 (cutoff point ≥3) and CRB-65 (cutoff point ≥3).

	Sensitivity (95% CI)	Specificity (95% CI)	PPV (95% CI)	NPV (95% CI)	LR + (95% CI)	LR - (95% CI)
qSOFA
Cohort 1	56.9 (42.3- 71.4)	43.9 (34.1- 53.8)	32.6 (22.3- 42.9)	68.1 (56.4- 79.8)	1.0 (0.8- 1.4)	1 (0.7- 1.4)
Cohort 2	39.1 (30.2- 47.9)	73.6 (70.0- 77.1)	23.5 (17.6- 29.4)	85.3 (82.2- 88.4)	1.5 (1.2- 1.9)	0.8 (0.7- 1)
Cohort 3	36.8 (20.2- 53.5)	79.9 (73.5- 86.2)	29.2 (15.3- 43.1)	84.9 (79.0- 90.8)	1.8 (1.1- 3.1)	0.8 (0.6- 1.0)
CURB-65
Cohort 1	58.8 (44.3-73.3)	58.9 (49.1-68.7)	40.5 (28.7-52.4)	75.0 (65.1-84.9)	1.4 (1.0-2)	0.7 (0.5-1.0)
Cohort 2	45.3 (36.3-54.3)	76.8 (73.4- 80.2)	28.9 (22.3- 35.4)	87.1 (84.2- 90.0)	2 (1.5- 2.3)	0.7 (0.6- 0.8)
Cohort 3	31.6 (15.5- 47.7)	86.4 (80.9- 91.9)	34.3 (17.1- 51.4)	84.9 (79.2- 90.5)	2.3 (1.3- 4.2)	0.8 (0.6- 1)
CRB-65
Cohort 1	29.4 (15.9-42.9)	83.2 (75.6-90.7)	45.5 (27-64)	71.2 (62.9-79.5)	1.8 (1-3.2)	0.9 (0.7-1.0)
Cohort 2	14.1 (7.7-20.5)	92.9 (90.9-95)	29.0 (16.9-41.1)	83.9 (81.1-86.7)	2 (1.1-3.3)	0.9 (0.7-1)
Cohort 3	13.2 (1.1-25.2)	93.5 (89.5-97.5)	31.3 (5.4-57.1)	82.7 (77.1-88.4)	2.0 (0.8-5.5)	0.9 (0.8-1.1)
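
As a worked example of how the columns of Tables S2 and S3 relate to one another, the sketch below derives all six measures from a 2x2 table. The function name is hypothetical. The counts are not reported in the article; they were back-calculated here from the percentages given for qSOFA ≥2 and mortality in cohort 1 (51 deaths among 158 patients) and should be read as an inference.

```python
def operating_characteristics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, predictive values and likelihood ratios."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Inferred counts for qSOFA >= 2 versus death in cohort 1 (not reported directly).
result = operating_characteristics(tp=29, fp=60, fn=22, tn=47)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these counts the sensitivity, specificity, PPV and NPV round to 56.9%, 43.9%, 32.6% and 68.1%, the values in the first row of Table S3, and both likelihood ratios are close to 1, which is why a positive or negative qSOFA barely changes the estimated probability of death in that cohort.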

Discussion

We found that qSOFA, CURB-65 and CRB-65 were not optimal in discriminating hospital mortality or ICU admission in three cohorts of patients with community-acquired pneumonia admitted to five hospitals in Medellín. However, judging by the AUC, sensitivity and negative predictive values, CURB-65 appeared to perform consistently better than the other two tools with respect to mortality discrimination. In contrast, calibration was good for the three scores in the three cohorts. Nevertheless, the lack of good discriminative performance indicates that these scoring systems should not be used as predictive tools. It is necessary to account for the setting of the studies that originally developed the scores: the CURB-65 and the CRB-65 were developed more than 20 years ago in the United Kingdom, New Zealand and the Netherlands, countries where mortality associated with community-acquired pneumonia is lower than in Colombia (9% vs 17-32%). The qSOFA, on the other hand, was derived from a very recent cohort that covered a clinical spectrum beyond pneumonia and had a hospital mortality of only 4%. In 2006, Capelastegui et al. showed similar performance of the CURB-65 and CRB-65 scores for 30-day mortality, with an AUC over 0.85. Subsequently, Man et al. compared these prediction rules for 30-day mortality in patients with community-acquired pneumonia and found AUCs higher than those observed in the present study. In the original studies that served as the basis for the development of qSOFA, Seymour et al. found good performance for the prediction of in-hospital mortality. Subsequently, Wang et al. performed a secondary analysis of data from a prospective cohort in which they evaluated the performance of qSOFA in patients with a diagnosis of infection who were admitted to the emergency department, and found that the score did not perform well for 28-day mortality (AUC = 0.66).
Previous studies have demonstrated that these scores underestimate risk in patients with community-acquired pneumonia. A couple of years ago, Chen et al. compared the performance of the qSOFA, CRB and CRB-65 with respect to mortality and admission to the ICU. The AUC-ROC of the qSOFA for the prediction of 28-day mortality was similar to that of the CRB-65 (0.655 vs 0.661, respectively). Likewise, the prediction of admission to the ICU showed similar discrimination (0.666 vs 0.685, respectively). These results are consistent with ours in terms of discrimination, both for admission to the ICU and for mortality, despite that study's large sample size and its single-hospital setting, which could result in less variability in the overall sample. In Germany, Kolditz et al. compared qSOFA with the CRB and CRB-65 for 30-day mortality in patients with community-acquired pneumonia. They found that the AUC-ROC favored the CRB-65 over the qSOFA (0.77 vs 0.70, respectively). More recently, three studies compared severity scores in patients with COVID-19 pneumonia, and all of them suggest that CURB-65 may estimate mortality better than qSOFA. Guohui et al. found an AUC for discharge mortality of 0.85 for CURB-65, 0.80 for CRB-65 and 0.73 for qSOFA. Bradley et al. found an AUC for 30-day mortality of 0.75 for CURB-65 and 0.62 for qSOFA. Lazar Neto et al. found an AUC for 30-day in-hospital mortality of 0.74 for CURB-65 and 0.63 for qSOFA. As these studies show, the performance of the scores varied considerably across cohorts owing to differences among them, including the distribution of etiological agents, coexisting diseases, social support, availability of resources and medical practices, including the ICU admission criteria.
In our study, the scores' performance varied even though the cohorts were from the same city, which can be explained by the variability in the patient inclusion criteria. The AUC-ROC is a statistical parameter that allows predictive models of diagnosis or prognosis to be compared in terms of discrimination capacity, and it is reasonable to use an AUC-ROC >0.75 as a reference for acceptable performance. However, this measure does not allow a direct clinical interpretation, a limitation that recurs throughout the literature on predictive models; for this reason, the operative characteristics of the models must always be evaluated at the same time. Regarding calibration, none of the studies mentioned above accounted for it in the statistical analysis. The critical importance of poor calibration is often underestimated. Poor calibration can reduce clinical utility; implementing a poorly calibrated predictive tool could even lead to decisions that are harmful to the patient. Future studies could consider other variables for score calculations, such as variables related to the microbiological agent, pulse oximetry, temperature, and comorbidities such as chronic obstructive pulmonary disease, congestive heart failure and immunosuppression, among others. One of the limitations of our study was the sample size. We based the difference of 0.70 vs 0.77 between the discrimination (AUC-ROC) of the CRB-65 and qSOFA scores on partial information from Kolditz et al. This difference, however, does not necessarily have a clinical basis, and the calculation did not anticipate that all scores would ultimately show poor discrimination (AUC <0.75). The traditional approach to sample size calculation in predictive models requires at least 10 outcomes for each independent variable. For comparisons between models exclusively by means of discrimination, we based the sample size formula on the AUC-ROC comparison by Hanley and McNeil.
However, specifically for the validation of predictive models, there is no clear guidance on sample size calculation, and although some authors have suggested a minimum of 100 outcomes, many studies do not consider this aspect. On the other hand, the data were collected in five institutions that are recognized as high-quality health care centers, which can lead to selection bias. However, the three cohorts had different inclusion criteria, which considerably broadened the clinical spectrum of the study population. Another limitation was that, although the cohorts were constructed prospectively, this study is a secondary analysis of data, which resulted in missing urea values for some participants. These missing data were treated as abnormal values, which could generate differential or non-differential misclassification bias. However, the missing data represented only 5%, and the sensitivity analysis with different scenarios did not improve the performance of the models. A predictive model is not of practical use unless it both discriminates and is calibrated: properly separating those who present the condition from those who do not is as important as agreement between observed and expected events. Unlike new medical technologies, which require supervision, prediction systems are not subjected to strict scrutiny, despite the potential risk of affecting a greater number of patients owing to their extensive implementation.

Conclusion

In the three independent cohorts of patients admitted through the emergency department with pneumonia, the qSOFA, CURB-65 and CRB-65 were all found to be limited predictive tools for mortality and admission to the ICU. Furthermore, the CRB-65 exhibited the lowest discriminative ability.

Study contribution

Introduction

Pneumonia holds a prominent place among the most important causes of sepsis worldwide and ranks second in frequency in Colombia. Pneumonia is known to confer a high risk of mortality worldwide. The health situation analysis in Colombia reported that between 2005 and 2012, acute respiratory infection ranked first among the causes of death from communicable diseases, with 48.6%, representing 56.2% of deaths from communicable diseases in women and 43.1% in men. Furthermore, health care for patients with severe infections entails high costs for a state and its health system and poses a major challenge for the clinical approach, because no simple and specific prognostic marker is available to allow early identification of at-risk individuals who warrant differential care. Hence the importance of having clinically useful tools to estimate the risk of death or complications in patients hospitalized in the emergency department with suspected infection. The qSOFA score was recently proposed for the early identification of patients with a worse prognosis among those with suspected acute bacterial infection. Likewise, several studies have been conducted to define a mortality prediction score specifically for pneumonia, and the CURB-65 score has been widely used because it is easier to apply than other scores such as the PSI (Pneumonia Severity Index). Recently, the third sepsis consensus (SEPSIS 3) encouraged the implementation of the qSOFA (quick sepsis-related organ failure assessment) score in adult patients suspected of having an acute bacterial infection for the early identification of those with a worse prognosis.
The Colombian Ministry of Health, the Argentine Society of Infectious Diseases and the Mexican Institute of Social Security recommend implementing CURB-65 in their guidelines for the management of patients with community-acquired pneumonia, despite the lack of local studies to confirm and validate this recommendation. Given that the CURB-65, CRB-65 and qSOFA were designed to identify patients at increased risk of complications and mortality, and that they share not only some clinical variables but also a study population, with community-acquired pneumonia as the main cause of sepsis, exploring potential differences in their performance as prognostic models would have implications for clinical practice. Likewise, any multivariable model developed for prognostic or diagnostic purposes for a clinical problem must be validated in independent populations. Therefore, this study proposes to validate and compare the three scores to determine the best tool for identifying emergency department patients with pneumonia who are at increased risk of mortality or admission to the intensive care unit (ICU).

Materials and Methods

Study design and setting

Three prospective cohort studies conducted between 2013 and 2016 in five emergency departments in the city of Medellín were analyzed: Hospital Universitario San Vicente Fundación (560 adult inpatient beds and 45 ICU beds in 4 units), the University Health Services Provider Institution IPS Universitaria Clínica León XIII (450 adult inpatient beds and 24 ICU beds in 2 units), Hospital Pablo Tobón Uribe (360 adult inpatient beds and 40 ICU beds in three units), Hospital General de Medellín (442 beds) and Clínica Las Américas (304 beds). The first cohort was recruited from the emergency departments of three tertiary care hospitals: Hospital Pablo Tobón Uribe, Hospital General de Medellín and Clínica Las Américas (2013-2016). The second was recruited from the emergency departments of three tertiary care hospitals: Hospital Universitario San Vicente Fundación, Institución Universitaria Clínica León XIII and Hospital Pablo Tobón Uribe (2014-2016). The third was recruited from the emergency department of Hospital Universitario San Vicente Fundación (2014-2016).

Data source

For each of the original cohorts, trained research assistants collected data systematically from the electronic medical records, reviewing all hospital admissions and screening every patient admitted through the emergency department with a diagnosis of infection, sepsis, severe sepsis or shock. The source of infection and the presence of organ dysfunction or septic shock were verified against data extracted from the medical records for the first 6 hours. To verify data accuracy, the co-investigators reviewed the information periodically. The information was recorded on forms designed specifically for each study and then stored in electronic databases. Because the cohorts were prospective, predictors were assessed without knowledge of the outcomes of interest. For this study it was also necessary to retrieve the blood urea nitrogen (BUN) value at hospital admission for patients from Hospital Universitario San Vicente Fundación and Hospital Pablo Tobón Uribe. Data collection was approved by the ethics committee of each participating institution.

Participants

The inclusion criterion for the present study was entry into one of the original studies with a diagnosis of pneumonia. Cohort 1 used the infection criteria of the Centers for Disease Control and Prevention (CDC) for inclusion; cohort 2 required suspected infection with at least one criterion of organ dysfunction; and cohort 3 required clinical suspicion of infection. The exclusion criteria common to the 3 cohorts were early discharge or referral to another institution, do-not-resuscitate orders, and terminal illness (Annex 1). The present study applied no exclusion criteria beyond those of the original studies. In-hospital mortality was the primary outcome, and ICU admission was the secondary outcome.

Predictors

qSOFA: this severity prediction score comprises a Glasgow Coma Scale score ≤ 14, systolic blood pressure ≤ 100 mmHg and respiratory rate ≥ 22 breaths per minute, with one point per variable, for a total between 0 and 3. The presence of two of these three criteria is proposed to predict mortality in patients with suspected infection outside the ICU. CURB-65: this score comprises confusion, urea > 7 mmol/L, respiratory rate ≥ 30 breaths per minute, systolic blood pressure < 90 mmHg or diastolic blood pressure < 60 mmHg, and age ≥ 65 years, with one point per variable, for a total between 0 and 5. Three or more points are proposed to predict mortality in patients with community-acquired pneumonia. CRB-65: this score comprises the same variables as CURB-65 except urea, with one point per variable, for a total between 0 and 4. Three or more points are proposed to predict mortality in patients with community-acquired pneumonia.
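The three scoring rules can be written down directly from the thresholds given above. The following is a minimal sketch, not the study's own code: the `Admission` record and its field names are hypothetical, confusion is taken as a yes/no input (how it was ascertained in each cohort is not specified here), and urea is assumed to be already expressed in mmol/L rather than as BUN.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    """Findings at emergency department admission (illustrative field names)."""
    confusion: bool      # new-onset confusion, yes/no
    gcs: int             # Glasgow Coma Scale score (3-15)
    urea_mmol_l: float   # urea in mmol/L (assumed already converted)
    resp_rate: int       # breaths per minute
    sbp: int             # systolic blood pressure, mmHg
    dbp: int             # diastolic blood pressure, mmHg
    age: int             # years


def qsofa(p: Admission) -> int:
    """One point each: GCS <= 14, SBP <= 100 mmHg, respiratory rate >= 22."""
    return int(p.gcs <= 14) + int(p.sbp <= 100) + int(p.resp_rate >= 22)


def crb65(p: Admission) -> int:
    """One point each: confusion, RR >= 30, SBP < 90 or DBP < 60, age >= 65."""
    low_bp = p.sbp < 90 or p.dbp < 60
    return int(p.confusion) + int(p.resp_rate >= 30) + int(low_bp) + int(p.age >= 65)


def curb65(p: Admission) -> int:
    """CRB-65 plus one point for urea > 7 mmol/L."""
    return crb65(p) + int(p.urea_mmol_l > 7)


# Hypothetical patient, not taken from any cohort.
patient = Admission(confusion=True, gcs=14, urea_mmol_l=8.2,
                    resp_rate=24, sbp=95, dbp=58, age=70)
print(qsofa(patient), crb65(patient), curb65(patient))  # 3 3 4
```

With the cut-offs used in this study, this hypothetical patient would be classified as high risk by all three scores (qSOFA ≥ 2, CRB-65 ≥ 3, CURB-65 ≥ 3).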

Sample size

Because this is a secondary analysis of existing data, no sample size calculation was performed; the analysis used all patients in the respective cohorts who met the inclusion criteria. Instead, we calculated the power to detect the expected difference between areas under the curve, given a fixed number of patients and a type I error probability (alpha) fixed at 0.05. The calculation is based on the formula described by Hanley and McNeil. With fixed sample sizes of 158, 745 and 207 patients for cohorts 1, 2 and 3, respectively, an alpha of 0.05, and the AUC-ROC values observed in the study by Kolditz et al. (θ1 = 0.70 and θ2 = 0.77, used because this information is not available for our setting), the estimated power was 0.52, 0.98 and 0.62, respectively.
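The building block of this kind of calculation is the standard error of an AUC given by Hanley and McNeil (1982). The sketch below computes only that standard error. It does not attempt to reproduce the power values reported above, which also depend on the correlation between scores measured on the same patients and on the exact procedure used. The event and non-event counts in the usage lines (51 deaths and 107 survivors in cohort 1) are derived from Table 1 and serve only as an illustration.

```python
import math


def hanley_mcneil_se(auc: float, n_events: int, n_non_events: int) -> float:
    """Standard error of an AUC (Hanley and McNeil, 1982).

    q1: probability that two random events both outrank one random non-event.
    q2: probability that one random event outranks two random non-events.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_events - 1) * (q1 - auc ** 2)
        + (n_non_events - 1) * (q2 - auc ** 2)
    ) / (n_events * n_non_events)
    return math.sqrt(variance)


# Illustration: cohort 1 mortality (51 deaths, 158 - 51 = 107 survivors)
# at the two AUC values assumed for the power calculation.
se_070 = hanley_mcneil_se(0.70, 51, 107)
se_077 = hanley_mcneil_se(0.77, 51, 107)
print(round(se_070, 3), round(se_077, 3))
```

A standard error of this size relative to the assumed 0.07 difference between AUCs gives an intuition for why the smaller cohorts had limited power.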

Statistical methods

Normally distributed quantitative variables are presented as means and standard deviations, and non-normally distributed ones as medians and interquartile ranges (IQR). The three predictive models (CURB-65, CRB-65 and qSOFA) were validated and compared in terms of prognosis. Determining the predictive accuracy of a model requires examining both calibration and discrimination. Calibration measures the agreement between observed and predicted events, whereas discrimination is the ability of the score to distinguish between individuals who do and do not experience the event of interest. Discrimination was assessed with the area under the receiver operating characteristic curve (AUC-ROC), and differences between AUC-ROCs were tested with the DeLong-DeLong statistic. Calibration was assessed with the Hosmer-Lemeshow goodness-of-fit test, with p > 0.05 taken to indicate adequate calibration. In addition, calibration curves were plotted from the model results in each cohort. The operating characteristics for predicting mortality and need for ICU were then estimated for each score, using a cut-off of 2 or more points for qSOFA and 3 or more for both CURB-65 and CRB-65, following the original proposals, which identified these cut-offs as indicating high risk of mortality. The performance of each predictive model was also analyzed at every possible cut-off and compared with the originally proposed cut-offs.
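For an integer score, the AUC-ROC equals the probability that a randomly chosen patient with the event has a higher score than a randomly chosen patient without it, with ties counted as one half. The sketch below illustrates that definition on made-up scores. It is not the Stata procedure used in the study, and the data are hypothetical.

```python
def auc_from_scores(event_scores, non_event_scores):
    """AUC as the proportion of event/non-event pairs ranked correctly.

    A pair counts 1 if the patient with the event has the higher score,
    0.5 if the two scores are tied, and 0 otherwise.
    """
    concordant = 0.0
    for e in event_scores:
        for n in non_event_scores:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(event_scores) * len(non_event_scores))


# Hypothetical scores for 4 patients who died and 4 who survived.
died = [3, 2, 2, 1]
survived = [2, 1, 1, 0]
print(auc_from_scores(died, survived))  # 0.8125
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, which is the reference against which the values reported in the Results are read.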
To calculate sensitivity, specificity, predictive values and likelihood ratios for the scores at their respective cut-offs, Bayes' theorem was applied, taking the outcomes of mortality and need for ICU as the reference (gold) standard. In the main analysis, missing data were treated as abnormal values (worst-case scenario). A sensitivity analysis was also performed with two additional models: a best-case scenario, treating missing data as normal values, and multiple imputation by multivariate normal (MVN) regression, using BUN, age, sex, Charlson index, SOFA and Acute Physiology and Chronic Health Evaluation II (APACHE II) as independent variables. Statistical analyses were performed with Stata 14®. Results are presented with their 95% confidence intervals (CI), and a significance level of p < 0.05 was applied. The TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) reporting standards were followed.
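The operating characteristics all follow from the 2x2 table of score result (at or above the cut-off, or below it) against outcome. The sketch below shows the arithmetic. The counts in the usage lines are an assumption: they were back-calculated from the percentages reported for CURB-65 ≥ 3 and mortality in cohort 1 (Table S3) together with the 51 deaths among 158 patients in Table 1, and are not figures stated in the source.

```python
def operating_characteristics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Point estimates from a 2x2 table (score positive/negative vs outcome)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
        "lr_pos": sensitivity / (1 - specificity),  # positive likelihood ratio
        "lr_neg": (1 - sensitivity) / specificity,  # negative likelihood ratio
    }


# Back-calculated (assumed) counts: CURB-65 >= 3 vs in-hospital death, cohort 1.
result = operating_characteristics(tp=30, fn=21, fp=44, tn=63)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Rounded to one decimal place, these assumed counts give a sensitivity of 58.8%, a specificity of 58.9%, a PPV of 40.5%, an NPV of 75.0%, an LR+ of 1.4 and an LR- of 0.7, matching the corresponding row of Table S3.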

Results

Cohorts 1, 2 and 3 comprised 158, 745 and 207 patients, respectively. Median age was 70 (IQR = 56-81), 66 (IQR = 54-77) and 60 (IQR = 44-75) years; 58.9%, 52.8% and 43% of patients were 65 years or older; 34.2%, 48.9% and 44.4% were female; 52.5%, 43.5% and 25.6% required ICU admission; and 32.3%, 17.2% and 18.4% died during hospitalization, respectively (Table 1). Blood cultures were requested for 95.6%, 84.8% and 84.5% of patients in each cohort, respectively, and a pathogen was isolated in 23.2%, 10.8% and 9.1% of those. The microorganisms most frequently found in the cohorts were Streptococcus pneumoniae, Klebsiella pneumoniae, Haemophilus influenzae and Escherichia coli (Table 2).

Table 1

General characteristics and outcomes of the study participants

	Cohort 1 (n=158)	Cohort 2 (n=745)	Cohort 3 (n=207)
Characteristics
Median age (IQR)	70 (56-81)	66 (54-77)	60 (44-75)
Female sex	54 (34.2%)	364 (48.9%)	92 (44.4%)
CDC criteria	158 (100%)	628 (84.75%)	187 (90.34%)
Severity
Charlson (IQR)	1 (0-3)	1 (0-2)	1 (0-2)
SOFA (IQR)	4 (3-6)	4 (3-6)	3 (2-5)
APACHE II (IQR)	17 (12-21)	15 (11-19)	13 (9-17)
Variables
RR (IQR)	24 (20-28)	22 (19-27)	23 (19-26)
SBP (IQR)	110 (90-130)	113 (92-132)	120 (100-140)
DBP (IQR)	60 (49-72)	68 (55-80)	76 (60-84)
MAP (IQR)	76 (64-90)	83 (68-97)	91 (73-101)
Glasgow (IQR)	15 (14-15)	15 (15-15)	15 (15-15)
BUN (IQR)	n=143; 27.1 (16.3-45.2)	n=704; 21.5 (14-33.4)	n=206; 18 (13-29)
≥65 years	93 (58.9%)	393 (52.8%)	89 (43%)
Scores
qSOFA (IQR)	1 (1-2)	1 (1-2)	1 (0-1)
CURB-65 (IQR)	2 (2-3)	2 (1-3)	1 (1-2)
CRB-65 (IQR)	2 (1-2)	1 (1-2)	1 (0-2)
Outcomes
ICU	83 (52.5%)	324 (43.5%)	53 (25.6%)
Death	51 (32.3%)	128 (17.2%)	38 (18.4%)

Table 2

Microbiological characteristics

	Cohort 1 (n=158)	Cohort 2 (n=745)	Cohort 3 (n=207)
Characteristics
Blood culture requested	151 (95.6%)	632 (84.8%)	175 (84.5%)
Positive blood culture	n=151; 35 (23.2%)	n=632; 68 (10.8%)	n=175; 16 (9.1%)
Microorganism
Streptococcus pneumoniae	13 (8.6%)	23 (3.6%)	6 (3.4%)
Klebsiella pneumoniae	6 (4%)	6 (1%)	1 (1%)
Haemophilus influenzae	2 (1.3%)	6 (1%)	2 (1.1%)
Staphylococcus aureus	4 (2.7%)	8 (1%)	2 (1%)
Escherichia coli	5 (3.3%)	9 (1.4%)	1 (1%)
Pseudomonas aeruginosa	2 (1.3%)	3 (1%)	0

Abbreviations: SOFA, Sequential Organ Failure Assessment; APACHE II, Acute Physiology and Chronic Health Evaluation II; IQR, interquartile range; RR, respiratory rate; SBP, systolic blood pressure; DBP, diastolic blood pressure; MAP, mean arterial pressure; BUN, blood urea nitrogen. Quantitative variables are expressed as medians with their interquartile ranges; categorical variables are shown as absolute and relative frequencies.

For the ICU admission outcome, discrimination was poor for all three scores in all three cohorts. The DeLong-DeLong statistic showed a statistically significant difference between the AUC-ROCs in cohorts 1 and 2 (p < 0.05) (Figure 1), with an AUC-ROC of 0.59 for qSOFA, 0.43 for CURB-65 and 0.44 for CRB-65 in cohort 1. For the mortality outcome, discrimination was inadequate for all three scores in all three cohorts. The DeLong-DeLong statistic showed a statistically significant difference between the AUC-ROCs in cohorts 1 and 2 (Figure 2), with an AUC-ROC of 0.66 (95% CI = 0.62-0.71) for CURB-65, 0.60 (95% CI = 0.56-0.65) for qSOFA and 0.63 (95% CI = 0.59-0.68) for CRB-65 in cohort 2.

Figure 1

Receiver operating characteristic curves for qSOFA, CURB-65 and CRB-65 in discriminating ICU admission, by cohort. A. Cohort 1 (DeLong-DeLong p = 0.0008). B. Cohort 2 (DeLong-DeLong p = 0.0402). C. Cohort 3 (DeLong-DeLong p = 0.3403).

Figure 2

Receiver operating characteristic curves for qSOFA, CURB-65 and CRB-65 in discriminating mortality, by cohort. A. Cohort 1 (DeLong-DeLong p = 0.0108). B. Cohort 2 (DeLong-DeLong p = 0.0218). C. Cohort 3 (DeLong-DeLong p = 0.1606).

Model calibration was adequate in the study population for both ICU admission and mortality, according to the Hosmer-Lemeshow statistic for the three scores in each cohort (p > 0.05) (Table S1).

Table S1

Calibration of the models for ICU admission and mortality

	Cohort 1 (n=158)	Cohort 2 (n=745)	Cohort 3 (n=207)
ICU ADMISSION
qSOFA
Number of groups	7	5	7
Hosmer-Lemeshow	1.6	1.1	2.0
P value	0.9002	0.7875	0.8449
CURB-65
Number of groups	10	10	9
Hosmer-Lemeshow	8	6.2	3.1
P value	0.4355	0.6284	0.8749
CRB-65
Number of groups	10	8	7
Hosmer-Lemeshow	12.1	3.9	3.3
P value	0.1466	0.6941	0.6478
MORTALITY
qSOFA
Number of groups	6	6	4
Hosmer-Lemeshow	4.6	1.1	0.14
P value	0.3365	0.8985	0.9316
CURB-65
Number of groups	10	10	9
Hosmer-Lemeshow	5	8.1	3.5
P value	0.7612	0.4236	0.8309
CRB-65
Number of groups	10	7	7
Hosmer-Lemeshow	5.7	6.42	1.9
P value	0.6843	0.2672	0.8632

In addition, calibration curves were plotted for both outcomes for each model in each cohort; they showed a high degree of agreement for the scores in most cohorts (Supplementary Figures S1 and S2).

Figure S1

Calibration curves for prediction of need for ICU

Figure S2

Calibration curves for prediction of mortality

Regarding the operating characteristics of the models, the highest sensitivity for need for ICU was obtained with qSOFA (55.4%) and for mortality with CURB-65 (58.8%), both in cohort 1. The highest specificity was obtained with CRB-65 for both need for ICU (93.4%, cohort 2) and mortality (93.5%, cohort 3). For mortality prediction, the lowest sensitivity was that of CRB-65 in cohort 3 (13.2%), the lowest specificity that of qSOFA in cohort 1 (43.9%), and the lowest positive predictive value that of CRB-65 in cohort 3 (Tables S2 and S3).

Table S2

Operating characteristics for prediction of need for ICU of the qSOFA (cut-off ≥ 2), CURB-65 (cut-off ≥ 3) and CRB-65 (cut-off ≥ 3) scores. PPV, positive predictive value; NPV, negative predictive value; LR, likelihood ratio.

	Sensitivity (95% CI)	Specificity (95% CI)	PPV (95% CI)	NPV (95% CI)	LR+ (95% CI)	LR- (95% CI)
qSOFA
Cohort 1	55.4 (44.1-66.7)	60.0 (48.3-71.8)	60.5 (48.9-72.2)	54.9 (43.5-66.3)	1.4 (1-1.9)	0.7 (0.6-1.0)
Cohort 2	37.4 (31.9-42.8)	78.2 (74.1-82.2)	56.8 (49.9-63.7)	61.8 (57.6-66.1)	1.7 (1.4-2.2)	0.8 (0.7-0.9)
Cohort 3	30.2 (16.9-43.5)	79.2 (72.5-86)	33.3 (19-47.7)	76.7 (69.9-83.6)	1.5 (0.9-2.4)	0.9 (0.7-1.1)
CURB-65
Cohort 1	39.8 (28.6-50.9)	45.3 (33.4-57.3)	44.6 (32.6-56.6)	40.5 (29.4-51.6)	0.7 (0.5-1.0)	1.3 (1-1.8)
Cohort 2	31.2 (26-36.4)	76.3 (72.1-80.4)	50.3 (43.1-57.4)	59.0 (54.8-63.2)	1.3 (1.0-1.7)	0.9 (0.8-1)
Cohort 3	18.9 (7.4-30.3)	83.8 (77.6-89.9)	28.6 (12.2-45)	75.0 (68.2-81.8)	1.2 (0.6-2.3)	1 (0.8-1.1)
CRB-65
Cohort 1	14.5 (6.3-22.6)	72.0 (61.2-82.8)	36.4 (18.4-54.3)	43.2 (34.1-52.3)	0.5 (0.3-1)	1.2 (1.0-1.4)
Cohort 2	10.5 (7.0-14)	93.4 (90.9-95.9)	54.9 (41.6-68.0)	57.5 (53.8-61.3)	1.6 (1-2.6)	1 (0.9-1.0)
Cohort 3	7.6 (0-15.6)	92.2 (87.7-96.8)	25.0 (0.7-49.3)	74.4 (67.9-80.8)	1 (0.3-2.9)	1.0 (0.9-1.1)

Table S3

Operating characteristics for prediction of mortality of the qSOFA (cut-off ≥ 2), CURB-65 (cut-off ≥ 3) and CRB-65 (cut-off ≥ 3) scores. PPV, positive predictive value; NPV, negative predictive value; LR, likelihood ratio.

	Sensitivity (95% CI)	Specificity (95% CI)	PPV (95% CI)	NPV (95% CI)	LR+ (95% CI)	LR- (95% CI)
qSOFA
Cohort 1	56.9 (42.3-71.4)	43.9 (34.1-53.8)	32.6 (22.3-42.9)	68.1 (56.4-79.8)	1.0 (0.8-1.4)	1 (0.7-1.4)
Cohort 2	39.1 (30.2-47.9)	73.6 (70.0-77.1)	23.5 (17.6-29.4)	85.3 (82.2-88.4)	1.5 (1.2-1.9)	0.8 (0.7-1)
Cohort 3	36.8 (20.2-53.5)	79.9 (73.5-86.2)	29.2 (15.3-43.1)	84.9 (79.0-90.8)	1.8 (1.1-3.1)	0.8 (0.6-1.0)
CURB-65
Cohort 1	58.8 (44.3-73.3)	58.9 (49.1-68.7)	40.5 (28.7-52.4)	75.0 (65.1-84.9)	1.4 (1.0-2)	0.7 (0.5-1.0)
Cohort 2	45.3 (36.3-54.3)	76.8 (73.4-80.2)	28.9 (22.3-35.4)	87.1 (84.2-90.0)	2 (1.5-2.3)	0.7 (0.6-0.8)
Cohort 3	31.6 (15.5-47.7)	86.4 (80.9-91.9)	34.3 (17.1-51.4)	84.9 (79.2-90.5)	2.3 (1.3-4.2)	0.8 (0.6-1)
CRB-65
Cohort 1	29.4 (15.9-42.9)	83.2 (75.6-90.7)	45.5 (27-64)	71.2 (62.9-79.5)	1.8 (1-3.2)	0.9 (0.7-1.0)
Cohort 2	14.1 (7.7-20.5)	92.9 (90.9-95)	29.0 (16.9-41.1)	83.9 (81.1-86.7)	2 (1.1-3.3)	0.9 (0.7-1)
Cohort 3	13.2 (1.1-25.2)	93.5 (89.5-97.5)	31.3 (5.4-57.1)	82.7 (77.1-88.4)	2.0 (0.8-5.5)	0.9 (0.8-1.1)

Discussion

We found that none of qSOFA, CURB-65 or CRB-65 was optimal for discriminating in-hospital mortality or ICU admission in three cohorts of patients with community-acquired pneumonia admitted to five hospitals in Medellín. However, judging by AUC, sensitivity and negative predictive value, CURB-65 appeared to perform consistently better than the other two tools for discriminating mortality. In contrast, calibration was good for all three scores in all 3 cohorts. Nevertheless, the lack of good discriminative performance indicates that these scoring systems should not be used as predictive tools. The setting of the studies that originally developed the scores must be taken into account: CURB-65 and CRB-65 were developed more than 20 years ago in the United Kingdom, New Zealand and the Netherlands, countries with lower mortality associated with community-acquired pneumonia than Colombia (9% vs 17-32%). qSOFA, on the other hand, was derived from a very recent cohort that covered a clinical spectrum beyond pneumonia and had an in-hospital mortality of only 4%. In 2006, Capelastegui et al. showed similar performance for CURB-65 and CRB-65 for 30-day mortality, with an AUC above 0.85. Later, Man et al. compared these prediction rules for 30-day mortality in patients with community-acquired pneumonia and found higher AUCs than those observed in the present study. In the original studies underlying the development of qSOFA, Seymour et al. found good performance for predicting in-hospital mortality. Subsequently, Wang et al. performed a secondary analysis of a prospective cohort to evaluate qSOFA in patients with a diagnosis of infection admitted to the emergency department and found that the score did not perform well (AUC = 0.66) for 28-day mortality. Previous studies have shown that these scores underestimate risk in patients with community-acquired pneumonia. A couple of years ago, Chen et al. compared the performance of qSOFA, CRB and CRB-65 for mortality and ICU admission. The AUC-ROC of qSOFA for predicting 28-day mortality was similar to that of CRB-65 (0.655 vs 0.661, respectively), and prediction of ICU admission showed similar discrimination (0.666 vs 0.685, respectively). These results are consistent with ours in terms of discrimination, for both ICU admission and mortality, despite that study's large sample size and single-hospital setting, which could result in lower variability in the overall sample. In Germany, Kolditz et al. compared qSOFA with CRB and CRB-65 for 30-day mortality in patients with community-acquired pneumonia and found that the AUC-ROC favored CRB-65 over qSOFA (0.77 vs 0.70, respectively). More recently, three different studies compared severity scores in patients with COVID-19 pneumonia, and all of them suggest that CURB-65 may be better than qSOFA for estimating mortality. Fan et al. found an AUC for mortality at discharge of 0.85 for CURB-65, 0.80 for CRB-65 and 0.73 for qSOFA. Bradley et al. found an AUC for 30-day mortality of 0.75 for CURB-65 and 0.62 for qSOFA. Lazar et al. found an AUC for 30-day in-hospital mortality of 0.74 for CURB-65 and 0.63 for qSOFA.
As the studies above show, the performance of the scores varied considerably between cohorts, owing to differences among them that include the distribution of etiological agents, coexisting diseases, social support, resource availability and medical practice, including ICU admission criteria. In our study, performance varied even though the cohorts came from the same city, which may be explained by the differing inclusion criteria. The AUC-ROC is a statistical parameter that allows diagnostic or prognostic prediction models to be compared in terms of discriminative ability, and it is reasonable to take an AUC-ROC > 0.75 as the benchmark for acceptable performance. However, this measure has no direct clinical interpretation, a limitation that recurs throughout the literature on predictive models, so the operating characteristics must always be evaluated alongside it. As for calibration, none of the studies mentioned above considered it in their statistical analysis. The fundamental importance of poor calibration is often underestimated: it can reduce clinical utility, and implementing a poorly calibrated predictive tool could even lead to decisions that harm the patient. Future studies could consider other variables for calculating the score, such as variables related to the microbiological agent, pulse oximetry, temperature, and comorbidities such as chronic obstructive pulmonary disease, congestive heart failure and immunosuppression, among others. One limitation of our study was the sample size. We based the difference of 0.70 vs 0.77 in discrimination (AUC-ROC) between the qSOFA and CRB-65 scores on partial information from Kolditz et al. However, this difference does not necessarily have a clinical basis, and it did not anticipate that all the scores would ultimately discriminate poorly (AUC < 0.75). The traditional approach to sample size in predictive models requires at least 10 outcome events per independent variable. For comparisons between models based exclusively on discrimination, we used the Hanley-McNeil formula for comparing AUC-ROCs. Specifically for the validation of predictive models, however, there is no clear guidance on sample size calculation, and although some authors have suggested a minimum of 100 outcome events, many studies do not consider this aspect. In addition, data were collected in five institutions recognized as high-quality health care centers, which may introduce selection bias. However, the three cohorts had different inclusion criteria, which considerably broadened the clinical spectrum of the study population. Another limitation is that, although the cohorts were assembled prospectively, this study is a secondary analysis of the data, which resulted in missing urea values for some participants. These missing data were treated as abnormal values, which could introduce differential or non-differential misclassification bias. However, missing data accounted for only 5%, and the sensitivity analysis under different scenarios did not improve the performance of the models. A predictive model is of no practical use unless it both discriminates and calibrates: adequately separating those who have the condition from those who do not is as important as agreement between observed and expected events. Unlike the oversight required for new medical technologies, prediction systems are not subject to strict scrutiny, despite their potential to affect a larger number of patients because of their widespread implementation.

Conclusion

In three independent cohorts of patients admitted to the emergency department with pneumonia, qSOFA, CURB-65 and CRB-65 proved to be limited tools for predicting mortality and ICU admission. Moreover, CRB-65 showed the lowest discriminative ability.

1) Why was this study conducted?

The CURB-65, CRB-65 and qSOFA were designed to identify patients at increased risk of complications and mortality. These scores share clinical variables in their compositions and community-acquired pneumonia is the main cause of sepsis; therefore, exploring potential differences in their performance as prognosis models would have implications for clinical practice.

2) What were the most relevant results of the study?

We did not find either the qSOFA, CURB-65 or CRB-65 to be adequate tools for discriminating hospital mortality or ICU admission in three cohorts of patients with community-acquired pneumonia, who were admitted to emergency departments in 5 reference hospitals in Medellín, Colombia.

3) What do these results contribute?

The qSOFA, CURB-65 and CRB-65 were all found to be ineffective predictive tools for mortality and admission to the ICU in our cohorts; it is therefore necessary to develop and validate prognostic prediction models for community-acquired pneumonia that are useful for the Colombian population.

33 in total

1. Calibration of risk prediction models: impact on decision-analytic performance.

Authors: Ben Van Calster; Andrew J Vickers
Journal: Med Decis Making Date: 2014-08-25 Impact factor: 2.583

2. Relaxing the rule of ten events per variable in logistic and Cox regression.

Authors: Eric Vittinghoff; Charles E McCulloch
Journal: Am J Epidemiol Date: 2006-12-20 Impact factor: 4.897

Review 3. Statistical evaluation of prognostic versus diagnostic models: beyond the ROC curve.

Authors: Nancy R Cook
Journal: Clin Chem Date: 2007-11-16 Impact factor: 8.327

4. Antibiotics has more impact on mortality than other early goal-directed therapy components in patients with sepsis: An instrumental variable analysis.

Authors: Jessica Londoño; César Niño; Andrea Archila; Marta Valencia; Diana Cárdenas; Mayla Perdomo; Giovanny Moncayo; César Vargas; Carlos E Vallejo; Carolina Hincapié; Johana Ascuntar; Alba León; Fabián Jaimes
Journal: J Crit Care Date: 2018-08-30 Impact factor: 3.425

5. Association of diagnostic coding with trends in hospitalizations and mortality of patients with pneumonia, 2003-2009.

Authors: Peter K Lindenauer; Tara Lagu; Meng-Shiou Shieh; Penelope S Pekow; Michael B Rothberg
Journal: JAMA Date: 2012-04-04 Impact factor: 56.272

6. Comparison of the qSOFA and CRB-65 for risk prediction in patients with community-acquired pneumonia.

Authors: Martin Kolditz; André Scherag; Gernot Rohde; Santiago Ewig; Tobias Welte; Mathias Pletz
Journal: Intensive Care Med Date: 2016-09-19 Impact factor: 17.440

7. The meaning and use of the area under a receiver operating characteristic (ROC) curve.

Authors: J A Hanley; B J McNeil
Journal: Radiology Date: 1982-04 Impact factor: 11.105

8. Validation of a predictive rule for the management of community-acquired pneumonia.

Authors: A Capelastegui; P P España; J M Quintana; I Areitio; I Gorordo; M Egurrola; A Bilbao
Journal: Eur Respir J Date: 2006-01 Impact factor: 16.671

9. Use of CRB-65 and quick Sepsis-related Organ Failure Assessment to predict site of care and mortality in pneumonia patients in the emergency department: a retrospective study.

Authors: Yun-Xia Chen; Jun-Yu Wang; Shu-Bin Guo
Journal: Crit Care Date: 2016-06-01 Impact factor: 9.097

10. Comparison of severity scores for COVID-19 patients with pneumonia: a retrospective study.

Authors: Guohui Fan; Chao Tu; Fei Zhou; Zhibo Liu; Yeming Wang; Bin Song; Xiaoying Gu; Yimin Wang; Yuan Wei; Hui Li; Xudong Wu; Jiuyang Xu; Shengjin Tu; Yi Zhang; Wenjuan Wu; Bin Cao
Journal: Eur Respir J Date: 2020-09-10 Impact factor: 16.671