Andrew A S Soltan, Samaneh Kouchaki, Tingting Zhu, Dani Kiyasseh, Thomas Taylor, Zaamin B Hussain, Tim Peto, Andrew J Brent, David W Eyre, David A Clifton.
Abstract
BACKGROUND: The early clinical course of COVID-19 can be difficult to distinguish from other illnesses driving presentation to hospital. However, viral-specific PCR testing has limited sensitivity and results can take up to 72 h for operational reasons. We aimed to develop and validate two early-detection models for COVID-19, screening for the disease among patients attending the emergency department and the subset being admitted to hospital, using routinely collected health-care data (laboratory tests, blood gas measurements, and vital signs). These data are typically available within the first hour of presentation to hospitals in high-income and middle-income countries, within the existing laboratory infrastructure.
Year: 2020 PMID: 33509388 PMCID: PMC7831998 DOI: 10.1016/S2589-7500(20)30274-0
Source DB: PubMed Journal: Lancet Digit Health ISSN: 2589-7500
Clinical parameters included in each feature set
| Presentation blood tests | Haemoglobin, haematocrit, mean cell volume, white cell count, neutrophil count, lymphocyte count, monocyte count, eosinophil count, basophil count, platelets, prothrombin time, INR, APTT, sodium, potassium, creatinine, urea, eGFR, CRP, albumin, alkaline phosphatase, ALT, bilirubin |
| Presentation point-of-care blood gas results | Actual base excess, standard base excess, bicarbonate, calcium, chloride, estimated osmolality, fraction of carboxyhaemoglobin, glucose, haemoglobin, haematocrit, potassium, methaemoglobin, sodium, oxygen saturation, calculated lactate, calculated oxygen content, calculated p50, partial pressure of carbon dioxide at point of care, pH, partial pressure of oxygen |
| Presentation vital signs | Heart rate, respiratory rate, oxygen saturation, systolic blood pressure, diastolic blood pressure, temperature, oxygen flow rate |
| Change (Δ) in blood test results from baseline | Δ albumin, Δ alkaline phosphatase, Δ ALT, Δ basophil count, Δ bilirubin, Δ creatinine, Δ eosinophil count, Δ haematocrit, Δ haemoglobin, Δ lymphocyte count, Δ mean cell volume, Δ monocyte count, Δ neutrophil count, Δ platelets, Δ potassium, Δ sodium, Δ urea, Δ white cell count, Δ eGFR |
| Baseline comorbidity data | Charlson comorbidity index |
ALT=alanine aminotransferase. APTT=activated partial thromboplastin time. CRP=C-reactive protein. eGFR=estimated glomerular filtration rate. INR=international normalised ratio. p50=pressure at which haemoglobin is 50% bound to oxygen.
Population characteristics for the study cohorts and the prospective validation set
| | Presenting to hospital: COVID-19 negative (n=114 957) | Presenting to hospital: COVID-19 positive (n=437) | Admitted to hospital: COVID-19 negative (n=71 927) | Admitted to hospital: COVID-19 positive (n=383) | Prospective validation: presenting to hospital (n=3326) | Prospective validation: admitted to hospital (n=1715) | |
|---|---|---|---|---|---|---|---|
| Patients positive for COVID-19 | 0 | 437 | 0 | 383 | 107 | 91 | |
| Age, years | 60 (38) | 69 (26) | 65 (33) | 71 (26) | 56 (37) | 64 (34) | |
| Sex | |||||||
| Men | 53 570 (46·6%) | 246 (56·3%) | 34 381 (47·8%) | 211 (55·1%) | 1513 (45·5%) | 832 (48·5%) | |
| Women | 61 387 (53·4%) | 191 (43·7%) | 37 546 (52·2%) | 172 (44·9%) | 1813 (54·5%) | 883 (51·5%) | |
| Previous EHR encounter | 85 183 (74·1%) | 367 (84·0%) | 53 370 (74·2%) | 331 (86·4%) | 2671 (80·3%) | 1367 (79·7%) | |
| Ethnicity | |||||||
| White British | 76·0% | 65·4% | 78·5% | 68·4% | 66·3% | 68·2% | |
| Not stated | 11·8% | 17·4% | 11·0% | 16·2% | 19·5% | 20·5% | |
| Any other White background | 5·0% | 3·7% | 4·0% | 3·4% | 6·5% | 4·7% | |
| Pakistani | 1·3% | 1·1% | 1·1% | 1·0% | 1·2% | 1·0% | |
| Any other Asian background | 0·9% | 2·5% | 0·8% | 1·8% | 1·4% | 1·2% | |
| Indian or British Indian | 0·8% | 1·1% | 0·7% | 0·8% | 0·9% | 0·8% | |
| White Irish | 0·7% | 0·7% | 0·7% | 0·8% | 0·7% | 0·8% | |
| African | 0·6% | 3·0% | 0·6% | 2·9% | 0·6% | 0·8% | |
| Any other Black background | 0·3% | 0·9% | 0·3% | 0·5% | 0·5% | 0·3% | |
| Bangladeshi | 0·2% | 0·7% | 0·2% | 0·8% | 0·3% | 0·3% | |
| Chinese | 0·2% | 0·2% | 0·2% | 0·3% | 0·4% | 0·3% | |
| Any other ethnic group | 2·0% | 3·2% | 1·8% | 3·2% | 1·6% | 1·3% | |
| Patients positive for influenza | 484 (<0·1%) | 0 | 466 (<0·1%) | 0 | 0 | 0 | |
Data are n (%) or median (IQR). EHR=electronic health-care record.
AUROCs achieved for each independent feature set and for increasing feature sets using stratified 10-fold cross-validation during training
| | Presentation blood tests | Blood gas results | Vital signs | Δ blood tests | Presentation blood tests | Presentation blood tests plus blood gas results | Presentation blood tests plus blood gas results plus vital signs | Sets performed on presentation plus Δ blood tests | Sets performed on presentation plus Δ blood tests plus CCI |
|---|---|---|---|---|---|---|---|---|---|
| Logistic regression | 0·897 (0·003) | 0·730 (0·001) | 0·810 (0·003) | 0·805 (0·008) | 0·897 (0·003) | 0·898 (0·003) | 0·919 (0·002) | 0·920 (0·004) | 0·920 (0·004) |
| Random forest | 0·901 (0·004) | 0·780 (0·000) | 0·815 (0·005) | 0·835 (0·006) | 0·901 (0·004) | 0·907 (0·003) | 0·922 (0·002) | 0·941 (0·004) | 0·937 (0·002) |
| XGBoost | 0·904 (0·000) | 0·770 (0·000) | 0·823 (0·005) | 0·808 (0·050) | 0·904 (0·000) | 0·916 (0·003) | 0·929 (0·003) | 0·942 (0·002) | 0·942 (0·002) |
Data are AUROC (SD). Δ=change in results from baseline. AUROC=area under the receiver operating characteristic curve. CCI=Charlson comorbidity index.
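The "AUROC (SD)" summary over stratified 10-fold cross-validation can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the data are synthetic, the one-feature "model" that only learns a direction is a stand-in for logistic regression, random forest, or XGBoost, and the helper names (`auroc`, `stratified_folds`, `cross_validated_auroc`) are hypothetical.

```python
import random
from statistics import mean, stdev


def auroc(labels, scores):
    """Rank-based AUROC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, dealing each class out round-robin to keep class ratios similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auroc(xs, labels, k=10, seed=0):
    """Return (mean, SD) of the held-out AUROC across k stratified folds."""
    aucs = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        # Toy "training" step (assumption): learn whether higher values indicate the positive class.
        pos_mean = mean(xs[i] for i in train if labels[i] == 1)
        neg_mean = mean(xs[i] for i in train if labels[i] == 0)
        direction = 1.0 if pos_mean >= neg_mean else -1.0
        scores = [direction * xs[i] for i in held_out]
        aucs.append(auroc([labels[i] for i in held_out], scores))
    return mean(aucs), stdev(aucs)


# Synthetic, imbalanced data for illustration only (not study data).
rng = random.Random(42)
labels = [1] * 200 + [0] * 800
xs = [rng.gauss(1.5, 1.0) if y == 1 else rng.gauss(0.0, 1.0) for y in labels]
cv_mean, cv_sd = cross_validated_auroc(xs, labels)
print(f"AUROC {cv_mean:.3f} ({cv_sd:.3f})")
```

Stratifying the folds matters most when positives are rare, as in the study cohorts, because an unstratified split can leave a fold with too few positive cases for a stable AUROC estimate.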
Assessment of performance of the ED and admissions models, calibrated to 70%, 80%, and 90% sensitivities during training, in identifying COVID-19 in patients presenting to or admitted to hospital in the held-out test set
| | Sensitivity 0·70 | Sensitivity 0·80 | Sensitivity 0·90 |
|---|---|---|---|
| ED model | | | |
| Sensitivity | 0·697 (0·009) | 0·774 (0·019) | 0·847 (0·014) |
| Specificity | 0·986 (0·005) | 0·957 (0·009) | 0·917 (0·018) |
| Precision (PPV) | 0·979 (0·007) | 0·944 (0·012) | 0·905 (0·018) |
| NPV | 0·777 (0·005) | 0·820 (0·013) | 0·866 (0·011) |
| AUROC | 0·939 (0·003) | 0·939 (0·003) | 0·939 (0·003) |
| Admissions model | | | |
| Sensitivity | 0·663 (0·029) | 0·774 (0·013) | 0·854 (0·007) |
| Specificity | 0·973 (0·000) | 0·948 (0·005) | 0·891 (0·009) |
| Precision (PPV) | 0·950 (0·002) | 0·922 (0·006) | 0·861 (0·010) |
| NPV | 0·785 (0·014) | 0·841 (0·007) | 0·886 (0·005) |
| AUROC | 0·940 (0·001) | 0·940 (0·001) | 0·940 (0·001) |
Data are performance (SD). The test set was generated from an 80:20 stratified train-test split of the dataset and balanced equally with controls (50% assumed prevalence). AUROC=area under the receiver operating characteristic curve. ED=emergency department. NPV=negative predictive values. PPV=positive predictive values.
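One plausible reading of "calibrated to a target sensitivity during training" is that a score threshold is chosen on the training data and then applied unchanged to the held-out test set. The sketch below illustrates that reading; the exact calibration procedure is an assumption, the scores are made up, and `threshold_for_sensitivity` and `confusion_metrics` are hypothetical helper names.

```python
import math


def threshold_for_sensitivity(positive_scores, target):
    """Highest threshold at which at least `target` of the training positives score >= threshold."""
    ranked = sorted(positive_scores, reverse=True)
    # The small epsilon guards against floating-point noise in target * n.
    needed = max(1, math.ceil(target * len(ranked) - 1e-9))
    return ranked[needed - 1]


def confusion_metrics(labels, scores, threshold):
    """Sensitivity, specificity, PPV, and NPV when scores >= threshold are called positive.

    Assumes both classes and both predicted outcomes occur, so no denominator is zero.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Made-up model scores for training positives (illustration only).
train_positive_scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
threshold = threshold_for_sensitivity(train_positive_scores, target=0.80)

# Made-up balanced held-out set (50% prevalence, as in the table above).
test_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_scores = [0.90, 0.60, 0.45, 0.30, 0.50, 0.20, 0.10, 0.35]
result = confusion_metrics(test_labels, test_scores, threshold)
print(threshold, result)
```

Because the threshold is fixed on training data, the sensitivity realised on the test set need not equal the target, which is consistent with the table reporting 0·774 rather than 0·80 for both models.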
Figure: Receiver operating characteristic curves (A) and relative importance of features (B) for the ED and admissions models
ALT=alanine aminotransferase. APTT=activated partial thromboplastin time. CRP=C-reactive protein. ctO2c=calculated oxygen content. ED=emergency department. FCOHb=fraction of carboxyhaemoglobin. p50c=calculated pressure at which haemoglobin is 50% bound to oxygen.
PPV and NPV of the ED and admissions models, calibrated during training to 70% and 80% sensitivities, for identifying COVID-19 in test sets with various prevalences
| | 1% | 2% | 5% | 10% | 20% | 25% | 33% | 50% | |
|---|---|---|---|---|---|---|---|---|---|
| ED model: sensitivity 0·70 | |||||||||
| PPV | 0·203 | 0·383 | 0·613 | 0·763 | 0·834 | 0·902 | 0·888 | 0·979 | |
| NPV | 0·996 | 0·990 | 0·985 | 0·953 | 0·932 | 0·871 | 0·886 | 0·778 | |
| ED model: sensitivity 0·80 | |||||||||
| PPV | 0·133 | 0·282 | 0·493 | 0·638 | 0·767 | 0·831 | 0·823 | 0·944 | |
| NPV | 0·997 | 0·993 | 0·991 | 0·962 | 0·946 | 0·909 | 0·908 | 0·820 | |
| Admissions model: sensitivity 0·70 | |||||||||
| PPV | 0·175 | 0·304 | 0·513 | 0·595 | 0·830 | 0·859 | 0·876 | 0·950 | |
| NPV | 0·996 | 0·992 | 0·982 | 0·969 | 0·926 | 0·905 | 0·881 | 0·785 | |
| Admissions model: sensitivity 0·80 | |||||||||
| PPV | 0·098 | 0·211 | 0·390 | 0·509 | 0·755 | 0·797 | 0·812 | 0·922 | |
| NPV | 0·998 | 0·994 | 0·986 | 0·977 | 0·942 | 0·920 | 0·907 | 0·841 | |
ED=emergency department. NPV=negative predictive values. PPV=positive predictive values.
The 10% scenario approximates the observed prevalence of COVID-19 in patients presenting to the study hospitals during April 1–8, 2020.
The 20% scenario approximates the observed prevalence of COVID-19 in patients admitted to the study hospitals during April 1–8, 2020.
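The dependence of PPV and NPV on prevalence follows from Bayes' theorem, which the short sketch below applies for a fixed sensitivity and specificity. This is an analytic approximation, not the method used for the table: the table reports values measured on test sets with varied prevalence, so its figures will not match this calculation exactly. The function name `predictive_values` is hypothetical, and the example inputs (sensitivity 0·774, specificity 0·957) are the ED model's test-set values at the 0·80 calibration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test with fixed sensitivity and specificity at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Prevalence scenarios taken from the table header.
prevalences = (0.01, 0.02, 0.05, 0.10, 0.20, 0.25, 0.33, 0.50)
for prevalence in prevalences:
    ppv, npv = predictive_values(0.774, 0.957, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

As prevalence falls, PPV drops sharply while NPV rises, which is the pattern the table shows for both models.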