Jeffrey G Klann1, Zachary H Strasser1, Meghan R Hutch2, Chris J Kennedy3, Jayson S Marwaha4, Michele Morris5, Malarkodi Jebathilagam Samayamuthu5, Ashley C Pfaff4, Hossein Estiri1, Andrew M South6, Griffin M Weber7, William Yuan7, Paul Avillach7, Kavishwar B Wagholikar1, Yuan Luo2, Gilbert S Omenn8, Shyam Visweswaran5, John H Holmes9, Zongqi Xia10, Gabriel A Brat7, Shawn N Murphy11.
Abstract
BACKGROUND: Admissions are generally classified as COVID-19 hospitalizations if the patient has a positive SARS-CoV-2 polymerase chain reaction (PCR) test. However, because 35% of SARS-CoV-2 infections are asymptomatic, patients admitted for unrelated indications with an incidentally positive test could be misclassified as a COVID-19 hospitalization. Electronic health record (EHR)-based studies have been unable to distinguish between a hospitalization specifically for COVID-19 versus an incidental SARS-CoV-2 hospitalization. Although the need to improve classification of COVID-19 versus incidental SARS-CoV-2 is well understood, the magnitude of the problems has only been characterized in small, single-center studies. Furthermore, there have been no peer-reviewed studies evaluating methods for improving classification.
Keywords: COVID-19; SARS-CoV-2; clinical research informatics; electronic health records; health care; health data; medical informatics; patient data; phenotype; public health
Year: 2022 PMID: 35476727 PMCID: PMC9119395 DOI: 10.2196/37931
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 7.076
Participating health care systems’ overall characteristics and the number and period of chart reviews performed for this study.
| Participating site | Hospitals, n | Inpatient discharges per year, n | Number of chart reviews performed, n | Chart review time period, start date-end date |
| BIDMCa | 1 | 40,752 | 400 | March 2020-March 2021 |
| MGBb | 10 | 163,521 | 406 | March 2020-July 2021 |
| NWUc | 10 | 103,279 | 70 | March 2020-February 2021 |
| UPITTd | 39 | 369,300 | 247 | April 2020-August 2021 |
aBIDMC: Beth Israel Deaconess Medical Center.
bMGB: Mass General Brigham.
cNWU: Northwestern University.
dUPITT: University of Pittsburgh/University of Pittsburgh Medical Center.
Figure 1. The chart review process. (1-2) At each site, an equal number of patients admitted with a positive SARS-CoV-2 PCR test were sampled by quarter or by month. (3-4) A chart reviewer at the site examined primarily the admission note, discharge summary (or death note), and laboratory values for the hospitalization to classify it as admitted for COVID-19, incidental SARS-CoV-2, or uncertain. (5-6) These classifications were then merged with 4CE EHR data for use with shared analytic scripts in R. (7-8) The top phenotypes at each site output by the data mining algorithm were summarized, and this was used to manually construct feature sets to be used across sites by selecting components that appeared in step 7 at multiple sites. (9) The performance over time of the top multisite phenotypes was visualized. 4CE: Consortium for Clinical Characterization of COVID-19 by EHR; EHR: electronic health record; ICD-10: International Classification of Diseases, Tenth Revision; PCR: polymerase chain reaction.
Figure 2. Design of the phenotyping algorithm. Predictive feature sets of iteratively larger size were selected based on their sensitivity and specificity in correctly identifying COVID-19-specific admissions using 4CE EHR data and chart reviews. We chose the following parameters after testing various thresholds at all 4 sites: AND feature sets, x=0.40, y=0.20, p=0.30; OR feature sets, x=0.10, y=0.50, p=0.20; and single features, x=y=p=0. 4CE: Consortium for Clinical Characterization of COVID-19 by EHR; EHR: electronic health record.
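The evaluation step described in this caption, scoring a candidate feature set against chart-review labels, can be sketched as follows. This is a minimal illustration, not the study's R scripts: the function names, the toy patients, and the representation of a feature set as a list of OR clauses joined by AND (conjunctive normal form) are assumptions made for the example, and the x, y, and p thresholds are not modeled because the caption does not define them.

```python
from typing import FrozenSet, Sequence, Set, Tuple

Clause = FrozenSet[str]        # features joined by OR (hypothetical representation)
FeatureSet = Sequence[Clause]  # clauses joined by AND (conjunctive normal form)


def matches(feature_set: FeatureSet, present: Set[str]) -> bool:
    """Return True if every AND clause has at least one feature present."""
    return all(bool(clause & present) for clause in feature_set)


def sensitivity_specificity(
    feature_set: FeatureSet,
    patients: Sequence[Set[str]],
    admitted_for_covid: Sequence[bool],
) -> Tuple[float, float]:
    """Score a feature set against chart-review labels."""
    tp = fn = tn = fp = 0
    for present, is_covid in zip(patients, admitted_for_covid):
        predicted = matches(feature_set, present)
        if is_covid and predicted:
            tp += 1
        elif is_covid:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# CRP AND (LDH OR Ferritin) AND Cardiac Troponin, written as three clauses.
rule = [
    frozenset({"CRP"}),
    frozenset({"LDH", "Ferritin"}),
    frozenset({"Cardiac Troponin"}),
]

# Toy data invented for illustration only (not study data).
toy_patients = [
    {"CRP", "LDH", "Cardiac Troponin"},
    {"CRP", "Ferritin", "Cardiac Troponin"},
    {"CRP"},
    {"Cardiac Troponin"},
    {"CRP", "LDH", "Cardiac Troponin"},
]
toy_labels = [True, True, True, False, False]

sens, spec = sensitivity_specificity(rule, toy_patients, toy_labels)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Only the presence of a test, medication, or diagnosis code is used as a feature here, which mirrors the note elsewhere in the record that laboratory test results are not included in the feature sets.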
Summary of the chart review criteria developed by the 4CEa subgroup of physicians, medical informaticians, and data scientists.
| Chart-reviewed classification | Criteria |
| Admitted specifically | Symptoms on admission: respiratory insufficiency; blood clots in vital organs; hemodynamic changes; other common viral symptoms, such as cough and fever. Also: admitted for non-COVID-19 issue but developed any of the above symptoms while hospitalized |
| Admitted incidentally | The admission history was: trauma; procedure or operation requiring hospitalization; term labor; alternative causes, including drug overdose, cancer progression, and nonrespiratory severe infection |
| Uncertain | Symptoms on admission: preterm labor; liver dysfunction; graft failure; immune system dysfunction; alternative causes, including sickle cell crisis, failure to thrive, and altered mental status |
a4CE: Consortium for Clinical Characterization of COVID-19 by EHRb.
bEHR: electronic health record.
Proportion of chart-reviewed patients admitted specifically for COVID-19 vs admitted with incidental SARS-CoV-2, overall and stratified by site, with a detailed criteria breakdown. A detailed breakdown at site D could not be included, because their process did not record the specific criteria for each classification. Note that cells with 0% are still included to show all the chart review criteria.
| Category | Site A (N=406), n (%) | Site B (N=70), n (%) | Site C (N=247), n (%) | Site D (N=400), n (%) | Overall (N=1123), n (%) |
| Admitted specifically for COVID-19 | | | | | 764 (68) |
| All | 288 (71) | 59 (84) | 180 (73) | 240 (60) | N/Aa |
| Respiratory insufficiency | 202 (50) | 36 (51) | 128 (52) | N/A | N/A |
| Blood clot | 6 (1) | <3 (<5) | <3 (<5) | N/A | N/A |
| Hemodynamic changes | <3 (<5) | <3 (<5) | <3 (<5) | N/A | N/A |
| Other symptomatic COVID-19 | 71 (18) | 19 (27) | 47 (20) | N/A | N/A |
| Not admitted for COVID-19 but developed 1 of the above criteria | 8 (2) | <3 (<5) | 5 (2) | N/A | N/A |
| Admitted with incidental SARS-CoV-2 | | | | | 292 (26) |
| All | 85 (20) | 9 (13) | 54 (22) | 144 (36) | N/A |
| Full-term labor | 18 (4) | <3 (<5) | <3 (<5) | N/A | N/A |
| Procedure | 8 (2) | <3 (<5) | 9 (4) | N/A | N/A |
| Trauma | <3 (<5) | <3 (<5) | <3 (<5) | N/A | N/A |
| Other not COVID-19 | 50 (13) | 6 (9) | 44 (18) | N/A | N/A |
| Uncertain | | | | | 67 (6) |
| All | 33 (8) | <3 (<5) | 10 (4) | 16 (4) | N/A |
| Immune dysfunction | <3 (<5) | <3 (<5) | <3 (<5) | N/A | N/A |
| Early labor | <3 (<5) | <3 (<5) | <3 (<5) | N/A | N/A |
| Liver dysfunction | <3 (<5) | <3 (<5) | <3 (<5) | N/A | N/A |
| Graft failure | <3 (<5) | <3 (<5) | <3 (<5) | N/A | N/A |
| Other possible COVID-19 | 31 (8) | <3 (<5) | 10 (4) | N/A | N/A |
aN/A: not applicable.
Demographic characterization of the chart-reviewed cohort by site. For each row, the count and percentage (in parentheses) at each site are shown. Two sites did not report Hispanic/Latino. N values for each site are shown in the header; these might not exactly match the summation of each category due to blurring requirements.
| Category | Site A (N=406), n (%) | Site B (N=70), n (%) | Site C (N=247), n (%) | Site D (N=400), n (%) |
| Age (years) | | | | |
| 0-25 | 14 (4) | 11 (14) | 4 (1) | 11 (3) |
| 26-49 | 95 (23) | 15 (21) | 26 (10) | 76 (18) |
| 50-69 | 138 (35) | 22 (31) | 99 (40) | 135 (33) |
| 70-79 | 72 (17) | 9 (13) | 59 (24) | 90 (22) |
| 80+ | 83 (20) | 13 (18) | 59 (24) | 81 (19) |
| Race/ethnicity | | | | |
| Asian | 8 (2) | 2 (3) | 5 (2) | 17 (4) |
| Black | 60 (14) | 9 (13) | 58 (23) | 97 (24) |
| Hispanic/Latino | 21 (6) | N/Aa | N/A | 55 (14) |
| White | 78 (19) | 50 (71) | 179 (72) | 173 (42) |
| No information | 230 (58) | 8 (11) | 5 (2) | 61 (14) |
| Sex | | | | |
| Male | 200 (50) | 42 (60) | 121 (49) | 188 (47) |
| Female | 200 (50) | 28 (40) | 126 (51) | 211 (52) |
aN/A: not applicable.
Figure 3. Chart-reviewed proportion of admissions specifically for COVID-19 among all chart reviews by month at each site. The bubble size shows the relative number of patient chart reviews performed that month. The trendline was weighted by bubble size and fitted using locally weighted least squares (loess) regression. Note that the y axis and 95% CI limits extend above 100%.
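A trendline of the kind this caption describes can be sketched as a local linear fit in which each month's weight is the product of a distance kernel and the number of chart reviews (the bubble size). The sketch below is a simplified stand-in, not the study's R code: the function name, the tricube kernel, the `span` default, and the toy values are all assumptions made for illustration, and it does not compute confidence intervals.

```python
import math
from typing import Sequence


def tricube(u: float) -> float:
    """Tricube kernel: weight falls from 1 at u=0 to 0 at u>=1."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def weighted_loess_at(
    x0: float,
    xs: Sequence[float],
    ys: Sequence[float],
    sizes: Sequence[float],
    span: float = 0.75,
) -> float:
    """Fit a size-weighted local line around x0 and return its value at x0."""
    n = len(xs)
    k = min(n, max(2, math.ceil(span * n)))   # number of neighbours in the window
    dists = [abs(x - x0) for x in xs]
    h = max(sorted(dists)[k - 1], 1e-12)      # bandwidth: k-th nearest distance
    w = [s * tricube(d / h) for s, d in zip(sizes, dists)]
    sw = sum(w)
    if sw == 0.0:
        return float("nan")                   # no point received a positive weight
    # Weighted least squares for y = a + b * (x - x0); the fitted value at x0 is a.
    sx = sum(wi * (x - x0) for wi, x in zip(w, xs))
    sxx = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxy = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    det = sw * sxx - sx * sx
    if abs(det) < 1e-12:
        return sy / sw                        # degenerate window: weighted mean
    return (sxx * sy - sx * sxy) / det


# Toy monthly data invented for illustration only (not study data).
months = [0, 1, 2, 3, 4, 5]
proportion_for_covid = [0.90, 0.80, 0.75, 0.70, 0.72, 0.68]
chart_reviews = [30, 25, 40, 35, 20, 15]

trend = [
    weighted_loess_at(m, months, proportion_for_covid, chart_reviews)
    for m in months
]
print([round(t, 3) for t in trend])
```

Because each local fit is an unconstrained straight line, fitted values are not restricted to the 0 to 1 range, which is one plausible reason a plotted trend or its interval can extend above 100%.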
Top 10 ICD-10a diagnoses among patients’ charts reviewed as admitted specifically for COVID-19, with the proportion of patients with each diagnosis at each site. Each patient might have multiple diagnoses, and therefore, the sum might be greater than 100%.
| ICD-10 diagnosis | Site A (N=288), n (%) | Site B (N=59), n (%) | Site C (N=180), n (%) | Site D (N=240), n (%) |
| U07.1 Covid-19 | 265 (92) | 54 (92) | 145 (80) | 226 (95) |
| J12.89 Other Viral Pneumonia | 125 (44) | 24 (41) | 64 (35) | 173 (70) |
| I10 Essential (Primary) Hypertension | 113 (39) | 16 (27) | 74 (41) | 89 (37) |
| J96.01 Acute Respiratory Failure With Hypoxia | 75 (26) | 20 (34) | 56 (31) | 139 (58) |
| E78.5 Hyperlipidemia, Unspecified | 79 (28) | 4 (7) | 69 (38) | 108 (46) |
| N17.9 Acute Kidney Failure, Unspecified | 74 (25) | 4 (7) | 40 (22) | 94 (39) |
| K21.9 Gastro-Esophageal Reflux Disease Without Esophagitis | 64 (22) | <3 (<3) | 57 (31) | 65 (26) |
| Z87.891 Personal History of Nicotine Dependence | 56 (18) | <3 (<3) | 44 (24) | 66 (27) |
| R09.02 Hypoxemia | 81 (29) | 15 (25) | 21 (12) | 43 (17) |
| J12.82 Pneumonia due to COVID-19 | 72 (25) | 12 (20) | 39 (22) | 35 (15) |
aICD-10: International Classification of Diseases, Tenth Revision.
Top 10 ICD-10a diagnoses among patients’ charts reviewed as admitted with incidental COVID-19, with the proportion of patients with each diagnosis at each site. Each patient might have multiple diagnoses, and therefore, the sum might be greater than 100%.
| ICD-10 diagnosis | Site A (N=85), n (%) | Site B (N=9), n (%) | Site C (N=54), n (%) | Site D (N=144), n (%) |
| U07.1 Covid-19 | 63 (74) | 5 (56) | 40 (73) | 122 (85) |
| N17.9 Acute Kidney Failure, Unspecified | 12 (14) | <3 (<11) | 12 (22) | 24 (17) |
| E11.22 Type 2 Diabetes Mellitus with Diabetic Chronic Kidney Disease | 5 (6) | <3 (<11) | 7 (13) | 23 (15) |
| E11.9 Type 2 Diabetes Mellitus Without Complications | 12 (11) | <3 (<11) | 4 (7) | 14 (11) |
| D64.9 Anemia, Unspecified | 13 (19) | <3 (<11) | 5 (9) | 10 (6) |
| E87.2 Acidosis | 8 (6) | <3 (<11) | <3 (<5) | 12 (10) |
| J12.89 Other Viral Pneumonia | <3 (<2) | <3 (<11) | 4 (7) | 15 (12) |
| J96.01 Acute Respiratory Failure With Hypoxia | 6 (8) | <3 (<11) | 4 (7) | 13 (8) |
| D69.6 Thrombocytopenia, Unspecified | 5 (7) | <3 (<11) | 6 (11) | 12 (7) |
| N18.6 End-Stage Renal Disease | 6 (7) | <3 (<11) | 5 (9) | 6 (5) |
aICD-10: International Classification of Diseases, Tenth Revision.
Top phenotyping feature sets by specificity, with a sensitivity of at least 0.60 for detecting admissions specifically for COVID-19. The table is grouped into feature sets involving potentially real-time data (laboratory tests) and all available data (presence of laboratory tests, medications, and diagnosis codes). Note that laboratory test results are not included in the feature sets. Ranges are shown in the summary statistics because multiple rules with similar performance were summarized using conjunctive normal form.
| Phenotyping feature set | Site | Sensitivity | Specificity | Prevalence (%) |
| Potentially real-time data (laboratory tests) | | | | |
| CRPa AND (Total Bilirubin OR Ferritin OR LDHb) AND (Lymphocyte Count OR Neutrophil Count) AND Cardiac Troponin | D | 0.65-0.72 | 0.85 | 67-71 |
| Ferritin AND LDH AND Cardiac Troponin AND (INRc OR PTTd OR Lymphocyte Count OR Neutrophil Count) | D | 0.62-0.69 | 0.85 | 67-71 |
| CRP AND (LDH AND/OR Ferritin) AND Cardiac Troponin | A | 0.67-0.70 | 0.89-0.90 | 72-77 |
| Procalcitonin OR D-dimer OR CRP OR Cardiac Troponin OR Ferritin | A | 0.63-0.87 | 0.73-0.85 | 65-85 |
| Any 2 of: Procalcitonin, LDH, CRP | B | 0.56-0.58 | 0.67 | 63-67 |
| D-dimer OR Ferritin OR CRP | C | 0.26-0.37 | 0.86-0.93 | 54-58 |
| All available data (presence of laboratory tests, medications, and diagnosis codes) | | | | |
| Total bilirubin AND (Ferritin OR LDH OR Lymphocyte Count OR Neutrophil Count) AND diagnosis of Other Viral Pneumonia (J12.89) | D | 0.62-0.64 | 0.92 | 46-48 |
| Diagnosis of: Other Viral Pneumonia (J12.89) OR Acute Respiratory Failure with Hypoxia (J96.01) OR Anemia (D64.9) | D | 0.70-0.74 | 0.82-0.88 | 50-63 |
| Diagnosis of: Other Viral Pneumonia (J12.89) OR Supplemental Oxygen (severe) | D | 0.75 | 0.82 | 61 |
| CRP AND (LDH OR Ferritin) AND Cardiac Troponin | A | 0.70 | 0.89 | 74-77 |
| Remdesivir OR Procalcitonin OR Other Viral Pneumonia (J12.89) OR Nonspecific Abnormal Lung Finding (R91.8) OR Shortness of Breath (R06.02) OR Other COVID Disease (J12.82) | A | 0.68-0.72 | 0.85-0.95 | 58-74 |
| Hypoxemia (R09.02) OR Other Coronavirus as Cause of Disease (B97.29) OR Shortness of Breath (R06.02) OR Pneumonia (unspecified organism) (J18.9) OR Acute Respiratory Failure with Hypoxia (J96.01) OR Nonspecific Abnormal Lung Finding (R91.8) | B | 0.63-0.68 | 0.89-0.99 | 54-67 |
| D-dimer OR ferritin OR CRP OR Other Viral Pneumonia (J12.89) OR Acute Respiratory Failure with Hypoxia (J96.01) | C | 0.71-0.75 | 0.79-0.86 | 52-58 |
aCRP: C-reactive protein.
bLDH: lactate dehydrogenase.
cINR: international normalized ratio.
dPTT: partial thromboplastin time.
The best multisite phenotyping feature sets and their overall performance characteristics. The multisite phenotypes were derived from Table 7 by selecting components of phenotypes that appeared at multiple sites.
| Phenotyping Feature Set | Description | Sensitivity, specificity |
| Other Viral Pneumonia OR Acute Respiratory Failure with Hypoxia OR Shortness of Breath OR Abnormal Lung Finding | | Site A: 0.79, 0.72; Site B: 0.88, 0.85; Site D: 0.64, 0.58 |
| CRPa AND Ferritin | | Site A: 0.76, 0.85; Site B: 0.88, 0.85; Site C: 0.42, 0.98; Site D: 0.66, 0.55 |
| Remdesivir OR Oxygen (severe) OR Dx of Other Viral Pneumonia | | Site C: 0.60, 0.92 |
aCRP: C-reactive protein.
bThe top-performing phenotype at each site is italicized.
Figure 4. Performance of the top phenotyping feature sets (Table 7) over time at each site. The y axis is the number of admissions per week, the x axis is the week, and overall sensitivity and specificity are shown on each figure panel. Solid lines show the total number of weekly admissions for patients with a positive SARS-CoV-2 PCR test. Dashed lines show the number of weekly admissions after filtering to select patients admitted specifically for COVID-19 (ie, removing all patients who do not meet the phenotyping feature set criteria). The dotted line shows the difference between the solid line and the dashed line (ie, patients removed from the cohort in the dashed line). Green dots indicate correct classification by the phenotype according to chart review. Orange dots indicate incorrect classification. The dot size is proportional to the number of chart reviews. PCR: polymerase chain reaction.
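The three weekly series in this caption (all PCR-positive admissions, admissions retained by the phenotype, and the difference) can be derived with a simple tally. The sketch below assumes each admission is reduced to an admission date plus a flag for whether it meets the feature set criteria; the function names, the Monday week boundary, and the toy admissions are hypothetical choices made for illustration.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def week_start(d: date) -> date:
    """Return the Monday of the week containing d (assumed week boundary)."""
    return d - timedelta(days=d.weekday())


def weekly_counts(
    admissions: Iterable[Tuple[date, bool]],
) -> Dict[date, Dict[str, int]]:
    """Tally weekly admissions: all, kept by the phenotype, and removed."""
    total: Counter = Counter()
    kept: Counter = Counter()
    for admitted_on, meets_phenotype in admissions:
        wk = week_start(admitted_on)
        total[wk] += 1            # solid line: all PCR-positive admissions
        if meets_phenotype:
            kept[wk] += 1         # dashed line: admissions meeting the criteria
    return {
        wk: {
            "all": total[wk],
            "kept": kept[wk],
            "removed": total[wk] - kept[wk],  # dotted line: the difference
        }
        for wk in sorted(total)
    }


# Toy admissions invented for illustration only (not study data).
toy_admissions = [
    (date(2021, 1, 4), True),
    (date(2021, 1, 6), False),
    (date(2021, 1, 12), True),
]

for wk, counts in weekly_counts(toy_admissions).items():
    print(wk.isoformat(), counts)
```

Comparing the `kept` series with chart-review labels for the same weeks would give the correct and incorrect classifications that the figure marks with colored dots.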