Timothy B Plante, Aaron M Blau, Adrian N Berg, Aaron S Weinberg, Ik C Jun, Victor F Tapson, Tanya S Kanigan, Artur B Adib.
Abstract
BACKGROUND: Conventional diagnosis of COVID-19 with reverse transcription polymerase chain reaction (RT-PCR) testing (hereafter, PCR) is associated with prolonged time to diagnosis and significant costs to run the test. The SARS-CoV-2 virus might lead to characteristic patterns in the results of widely available, routine blood tests that could be identified with machine learning methodologies. Machine learning modalities integrating findings from these common laboratory test results might accelerate ruling out COVID-19 in emergency department patients.
Keywords: COVID-19; SARS-CoV-2; artificial intelligence; development; electronic medical records; emergency department; laboratory results; machine learning; model; testing; validation
Year: 2020 PMID: 33226957 PMCID: PMC7713695 DOI: 10.2196/24048
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Demographics of patients and encounter details, by COVID-19 status^a.

| Demographics | Training: negative (n=10,000) | Training: positive (n=2183) | External validation: negative (n=171,734) | External validation: positive (n=1020) | Sensitivity analysis: negative (n=6890) | Sensitivity analysis: positive (n=952) |
|---|---|---|---|---|---|---|
| Age (years), n (%) | | | | | | |
| 20 to <30 | 1392 (14) | 198 (9) | 27,952 (16) | 71 (7) | 709 (10) | 70 (7) |
| 30 to <40 | 1481 (15) | 304 (14) | 29,187 (17) | 127 (12) | 882 (13) | 119 (12) |
| 40 to <50 | 1398 (14) | 413 (19) | 27,764 (16) | 214 (21) | 896 (13) | 205 (22) |
| 50 to <60 | 1649 (16) | 400 (18) | 28,896 (17) | 217 (21) | 1172 (17) | 208 (22) |
| 60 to <70 | 1512 (15) | 367 (17) | 23,771 (14) | 180 (18) | 1200 (17) | 163 (17) |
| 70 to <80 | 1322 (13) | 264 (12) | 18,460 (11) | 121 (12) | 1063 (15) | 108 (11) |
| ≥80 | 1246 (12) | 237 (11) | 15,704 (9) | 90 (9) | 968 (14) | 79 (8) |
| Sex, n (%) | | | | | | |
| Female | 5876 (59) | 1079 (49) | 102,942 (60) | 502 (49) | 3650 (53) | 477 (50) |
| Male | 4122 (41) | 1104 (51) | 68,790 (40) | 518 (51) | 3240 (47) | 475 (50) |
| Unknown | 2 (0) | 0 (0) | 2 (0) | 0 (0) | 0 (0) | 0 (0) |
| Race, n (%) | | | | | | |
| Black | 1791 (18) | 397 (18) | 28,874 (17) | 212 (21) | 1230 (18) | 201 (21) |
| Other | 904 (9) | 976 (45) | 23,222 (14) | 453 (44) | 772 (11) | 448 (47) |
| Unknown | 450 (4) | 102 (5) | 12,284 (7) | 48 (5) | 368 (5) | 36 (4) |
| White | 6855 (69) | 708 (32) | 107,354 (63) | 307 (30) | 4520 (66) | 267 (28) |
| Census division^b, n (%) | | | | | | |
| East North Central | 2065 (21) | 280 (13) | 16,184 (9) | 108 (11) | 1103 (16) | 108 (11) |
| East South Central | 0 (0) | 0 (0) | 3549 (2) | 50 (5) | 138 (2) | 50 (5) |
| Middle Atlantic | 782 (8) | 294 (13) | 18,776 (11) | 92 (9) | 1356 (20) | 92 (10) |
| New England | 493 (5) | 1 (0) | 31,624 (18) | 1 (0) | 1 (0) | 1 (0) |
| Pacific | 106 (1) | 32 (1) | 3617 (2) | 69 (7) | 34 (0) | 1 (0) |
| South Atlantic | 3116 (31) | 1192 (55) | 70,463 (41) | 613 (60) | 2790 (40) | 613 (64) |
| West North Central | 633 (6) | 39 (2) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| West South Central | 2805 (28) | 345 (16) | 27,521 (16) | 87 (9) | 1468 (21) | 87 (9) |
| Urban or rural status^b, n (%) | | | | | | |
| Rural | 583 (6) | 21 (1) | 3617 (2) | 1 (0) | 34 (0) | 1 (0) |
| Urban | 9417 (94) | 2162 (99) | 168,117 (98) | 1019 (100) | 6856 (100) | 951 (100) |
| Emergency department disposition, n (%) | | | | | | |
| Discharge from emergency department | 7487 (75) | 1175 (54) | 132,195 (77) | 522 (51) | 4072 (59) | 522 (55) |
| Non–intensive care unit admission | 2068 (21) | 805 (37) | 29,793 (17) | 379 (37) | 2375 (34) | 335 (35) |
| Intensive care unit admission | 445 (4) | 203 (9) | 9746 (6) | 119 (12) | 443 (6) | 95 (10) |

^a For the training data set: COVID-19 positivity was defined as a positive COVID-19 reverse-transcription polymerase chain reaction (hereafter, PCR) test on the day of presentation to the emergency department among patients in the pandemic time frame (March 2020 through July 2020) in the Premier Healthcare Database (PHD), among a random selection of 43 of the 64 PHD hospitals reporting PCR positives. COVID-19 negativity was defined as a selection of 10,000 patients in the prepandemic time frame (January through December 2019) in the PHD from the same 43 hospitals as the patients with COVID-19. For the external validation data set: COVID-19 positivity was defined as in the training data set and comprised 952 PCR-positive patients from the 21 hospitals in the PHD holdout set. It additionally included 68 patients with PCR-confirmed COVID-19 seen at Cedars-Sinai Medical Center in March and April 2020. COVID-19 negativity in the external validation set was defined using 154,341 prepandemic visits from the 21 hospitals in the PHD holdout set (January through December 2019) in which the primary diagnosis was among the 20 most frequent primary diagnoses given to patients negative for COVID-19 during the pandemic, using Clinical Classification Software Refined codes. It also included 17,393 prepandemic (2008-2019) patient encounters from Beth Israel Deaconess Medical Center. For the sensitivity analysis data set: COVID-19 positivity included the same 952 PCR-positive patients from the 21 hospitals in the external validation data set. COVID-19 negativity was defined as visits with at least 1 PCR-negative but no PCR-positive result on the day of presentation, and included all 6890 patients with such results from the same 21 hospitals as the positives.
^b Census division was defined using US Census classification [19]. Rural areas are considered territory outside of the US Census Bureau’s definition of urban [20]. These geographic descriptions pertain to the hospital, not the patient’s permanent residence.
Figure 1. Discrimination as assessed by ROC curves for the training, external validation, and sensitivity analysis data sets. ROC curves are shown for the 3 data sets: training (blue), external validation (orange), and sensitivity analysis (green). The training curve was obtained through 5-fold cross-validation, where positive controls are PCR-confirmed cases during the pandemic (N=2183) and negative controls are prepandemic patients (N=10,000) from 43 hospitals in the PHD. The training AUROC was 0.91 (95% CI 0.90-0.92). The external validation curve was obtained by applying the model to the external validation data set after training it on the training data set. External validation positives are PCR-confirmed cases from Cedars-Sinai Medical Center (N=68) and from the PHD holdout set (N=952) comprising 21 hospitals. External validation negatives are prepandemic (2019) patients from the same 21 PHD hospitals whose primary diagnoses match the top 20 primary non–COVID-19 diagnoses in 2020 (N=154,341), as well as all eligible prepandemic (2008-2019) Beth Israel Deaconess Medical Center patients (N=17,393). The AUROC in the external validation data set was 0.91 (95% CI 0.90-0.92). The sensitivity analysis curve demonstrates the effect of using prepandemic patients as negative controls compared to using PCR-negatives from 2020. In this data set, both positives (N=952) and negatives (N=6890) were patients with PCR results from the PHD holdout set (21 hospitals), and no prepandemic data were included. The AUROC in the sensitivity analysis set was 0.89 (95% CI 0.88-0.90). AUROC: area under the receiver operating characteristic curve; PCR: polymerase chain reaction; PHD: Premier Healthcare Database; ROC: receiver operating characteristic.
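For readers less familiar with the AUROC values quoted in the caption, the following is a minimal sketch of what the statistic measures: the probability that a randomly chosen positive patient receives a higher model score than a randomly chosen negative patient, with ties counted as one half. The function name `auroc` and the toy scores are assumptions made for illustration; they are not the study's code or data, and the source does not state how its AUROCs or confidence intervals were computed.

```python
def auroc(positive_scores, negative_scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Counts the fraction of positive/negative pairs in which the positive
    case has the higher score; a tied pair contributes 0.5.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical scores, invented for illustration only.
toy_positives = [62, 35, 8, 90]
toy_negatives = [1, 3, 8, 40, 2]
print(auroc(toy_positives, toy_negatives))
```

The double loop is quadratic in the number of patients, which is fine for a toy example; for data sets of the size described here, a rank-based implementation would be the practical choice.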
Clinical performance metrics for the model in the external validation data set for various score cutoffs and COVID-19 pretest prevalence^a.

| Score cutoff | Sensitivity, % | Specificity, % | Likelihood ratio^b | NPV^c, % (prevalence of 1%) | Yield^d, % (prevalence of 1%) | NPV, % (prevalence of 10%) | Yield, % (prevalence of 10%) | NPV, % (prevalence of 20%) | Yield, % (prevalence of 20%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 95.9 | 41.7 | 0.099 | 99.9 | 41.3 | 98.9 | 38.0 | 97.6 | 34.2 |
| 2 | 92.6 | 60.0 | 0.124 | 99.9 | 59.4 | 98.6 | 54.7 | 97.0 | 49.5 |
| 5 | 85.5 | 78.5 | 0.185 | 99.8 | 77.8 | 98.0 | 72.1 | 95.6 | 65.7 |
| 10 | 79.4 | 87.6 | 0.235 | 99.8 | 86.9 | 97.4 | 80.9 | 94.4 | 74.2 |

^a The maximum score was 100; a higher score indicates a stronger model prediction of COVID-19 positivity.
^b The likelihood ratio is the negative likelihood ratio, (1 − sensitivity)/specificity.
^c NPV: negative predictive value.
^d Yield refers to diagnostic yield, which is the percentage of patients in whom COVID-19 can be ruled out (ie, those with a score below the cutoff).
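The prevalence-dependent columns of the table can be approximated from sensitivity and specificity alone. The sketch below assumes the standard textbook definitions (negative likelihood ratio, negative predictive value via Bayes' rule, and yield as the share of all patients scoring below the cutoff); the function name `rule_out_metrics` is hypothetical, and the source does not give the authors' exact computation. Because the published sensitivity and specificity are rounded, results can differ from the table in the last digit.

```python
def rule_out_metrics(sensitivity, specificity, prevalence):
    """Return (negative LR, NPV, yield) as fractions for a rule-out test.

    All three inputs are fractions between 0 and 1.
    """
    # Share of all patients who are infected but score below the cutoff.
    false_neg = (1.0 - sensitivity) * prevalence
    # Share of all patients who are uninfected and score below the cutoff.
    true_neg = specificity * (1.0 - prevalence)

    negative_lr = (1.0 - sensitivity) / specificity
    npv = true_neg / (true_neg + false_neg)
    diagnostic_yield = true_neg + false_neg  # everyone below the cutoff
    return negative_lr, npv, diagnostic_yield


# Score cutoff of 1 from the table: sensitivity 95.9%, specificity 41.7%.
for prevalence in (0.01, 0.10, 0.20):
    lr, npv, yld = rule_out_metrics(0.959, 0.417, prevalence)
    print(f"prevalence={prevalence:.0%}  LR-={lr:.3f}  NPV={npv:.1%}  yield={yld:.1%}")
```

The pattern in the table follows directly from these formulas: raising the cutoff increases specificity and therefore yield, while lowering sensitivity and therefore NPV, and both NPV and yield fall as pretest prevalence rises.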
Figure 2. Discrimination as assessed by AUROC in age, sex, race, and ED disposition subgroups in the external validation data set. AUROCs are shown per demographic subgroup and per patient disposition type (ED discharge, non-ICU admission, and ICU admission). Non-ICU patients were admitted to the hospital but not to an ICU. Top numbers are AUROCs; bottom numbers in parentheses are the numbers of patients. AUROC: area under the receiver operating characteristic curve; ED: emergency department; ICU: intensive care unit.