Juan Zhao, Monika E Grabowska, Vern Eric Kerchberger, Joshua C Smith, H Nur Eken, QiPing Feng, Josh F Peterson, S Trent Rosenbloom, Kevin B Johnson, Wei-Qi Wei.
Abstract
OBJECTIVE: Identifying symptoms and characteristics highly specific to coronavirus disease 2019 (COVID-19) would improve the clinical and public health response to this pandemic challenge. Here, we describe a high-throughput approach, the Concept-Wide Association Study (ConceptWAS), that systematically scans a disease's clinical manifestations from clinical notes. We used this method to identify symptoms specific to COVID-19 early in the course of the pandemic.
Keywords: COVID-19; EHR; Natural language processing
Year: 2021 PMID: 33774203 PMCID: PMC7992296 DOI: 10.1016/j.jbi.2021.103748
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 8.000
Fig. A1. Flowchart of study design for ConceptWAS between COVID-19-positive (case) and COVID-19-negative (control) patients.
Fig. B1. Proportion of cases/controls with clinical notes on the days around the COVID-19 test date. The x-axis indicates the note day relative to the COVID-19 test date. More than 86% of patients who had a PCR test had a clinical note within 24 h before the test date.
The note types that were extracted in the study.
| Note type |
|---|
| Progress Notes |
| Social History |
| Imaging |
| ECG_IMPRESSION |
| ED Triage Notes |
| ED Provider Notes |
| H&P |
| Problem List |
| Assessment & Plan Note |
| Synopsis Sub-Note |
| Procedures |
| Subjective & Objective |
| History of Present Illness Sub-Note |
| Consults |
| Diagnostic Studies Sub-Note |
| Initial Assessments |
| ED Notes |
| Anesthesia Preprocedure Evaluation |
| Hospital Course Sub-Note |
| Clinical Update |
| Pathology And Cytology |
| Anesthesia Procedure Notes |
| Cardiac Services |
| Operative Report |
| Consult Reason Sub-Note |
| ED Progress Note |
| Anesthesia Postprocedure Evaluation |
| Technologist Note |
| Brief Op Note |
| Perioperative Nursing Note |
| Nursing Note |
| Neurology |
| Discharge Summary |
| Plan by Systems Sub-Note |
| ED Procedure Note |
| Echocardiography |
| Transthoracic Echocardiogram Report |
| Significant Event |
| Lactation Note |
| LAB |
| Post-operative Check |
| Research Informed Consent Note |
| Treatment Plan |
| Group Note |
| Pre-Procedure Note |
| Pre-Procedure Instructions |
| Plan of Care |
| Death Summary |
| Covering Surgeon |
| Research Coordinator Notes |
| Post-Procedure Note |
| Transition Plan Sub-Note |
| Interval H&P Note |
| Anesthesia Post-op Follow-up Note |
| Discharge Instr - Other Orders |
| Discharge Instr - Appointments |
| Interim Summary |
| Research Billing Note |
| External Transfer Orders |
| Declaration of Brain Death |
| Radiation oncology |
| Teleconsult |
| Discharge Instr - Activity |
| Discharge Instr - Diet |
| ACP (Advance Care Planning) |
| Code Documentation |
| Letter |
| Medical Student Progress Note |
| Onc Cost of Treatment |
| Transesophageal Echocardiogram Report |
Fig. C1. Schematic framework of the ConceptWAS and NLP pipeline.
Semantic types of concepts that were included in the analysis.
| Semantic type |
|---|
| Sign or Symptom |
| Finding |
| Disease or Syndrome |
| Mental Process |
| Mental or Behavioral Dysfunction |
| Organism Function |
| Laboratory or Test Result |
| Individual Behavior |
| Social Behavior |
| Acquired Abnormality |
| Age Group |
| Population Group |
Patient characteristics of the study cohort.
| Attribute | Cases: COVID-19-positive (n = 1,483) | Controls: COVID-19-negative (n = 18,209) | P-value |
|---|---|---|---|
| Age, mean years (SD) | 41.5 (16.2) | 44.9 (16.9) | <0.0001 |
| Gender (% male) | 48.0% | 41.7% | <0.0001* |
| Race (% White) | 49.6% | 66.7% | <0.0001* |
| Average EHR length, years (SD) | 7.3 (8.1) | 9.2 (8.5) | <0.0001 |
| Average CUIs (SD) | 46.1 (61.1) | 71.9 (96.3) | <0.0001 |
* A two-proportion z-test was performed. For age, EHR length, and average CUIs, a t-test was performed to compare the means.
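
The gender comparison can be checked approximately with a two-proportion z-test. The following is a minimal sketch, assuming the pooled-variance form of the test and counts reconstructed from the rounded percentages in the table (48.0% of 1,483 and 41.7% of 18,209). The function name `two_proportion_z_test` is illustrative, not taken from the study.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Assumption: counts are rebuilt from rounded percentages, so they are approximate
n_cases, n_controls = 1483, 18209
male_cases = round(0.480 * n_cases)
male_controls = round(0.417 * n_controls)

z, p = two_proportion_z_test(male_cases, n_cases, male_controls, n_controls)
print(f"z = {z:.2f}, p = {p:.2e}")
```

With these reconstructed counts the p-value falls below 0.0001, in line with the table. The exact figure depends on the true counts, which are not given here.
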
Fig. 1. Volcano plot of a ConceptWAS scan for 19,692 patients, comprising the COVID-19-positive group (cases) and the COVID-19-negative group (controls). The points are colored by the semantic type of the concepts. Selected associations related to signs, symptoms, or diseases/syndromes are labeled. The plot shows -log10 (p-value) for each association (y-axis) against its log2 (fold change) (x-axis). The dashed line represents the significance level after Bonferroni correction.
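
The volcano plot coordinates can be sketched from the reported statistics. This example assumes the odds ratio is used as the fold change, which the caption does not state, and takes three rows and the Bonferroni threshold (2.55E-06) from the results table. The helper `volcano_point` is a hypothetical name.

```python
import math

BONFERRONI_P = 2.55e-06  # threshold quoted with the results table

# (concept, odds ratio, p-value) copied from the results table
rows = [
    ("Anosmia", 4.97, 9.21e-11),
    ("Fever", 1.43, 1.97e-10),
    ("Depression", 0.34, 5.12e-13),
]

def volcano_point(odds_ratio, p_value):
    """Map one association to (x, y) volcano-plot coordinates.

    Assumption: the odds ratio stands in for the fold change.
    """
    return math.log2(odds_ratio), -math.log10(p_value)

threshold_y = -math.log10(BONFERRONI_P)  # height of the dashed line
points = {name: volcano_point(or_, p) for name, or_, p in rows}

for name, (x, y) in points.items():
    side = "more frequent in cases" if x > 0 else "more frequent in controls"
    print(f"{name}: x={x:.2f}, y={y:.2f}, above line={y > threshold_y}, {side}")
```

Concepts with a positive x value lie to the right of the plot (more frequent in cases), and those with a negative x value lie to the left.
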
ConceptWAS between COVID-19-positive (case) and COVID-19-negative (control) patients. The table presents the significant concepts related to sign or symptom, disease or syndrome, or individual behavior that met the Bonferroni-corrected threshold (p-value < 2.55E-06). “Neg” stands for a negated attribute.
| Concept CUI (attribute) | Concept Name | Semantic type | Case Count (%) | Control Count (%) | OR (95% CI) | P-value |
|---|---|---|---|---|---|---|
| C1998726 | Adequate Knowledge | Finding | 109 (7.3%) | 3033 (16.7%) | 0.46 (0.38,0.56) | 2.22E-16 |
| C0243095 | Finding | Finding | 75 (5.1%) | 2309 (12.7%) | 0.44 (0.34,0.56) | 9.55E-14 |
| C0205400 | Increased thickness (Finding) | Finding | 11 (0.7%) | 854 (4.7%) | 0.19 (0.10,0.33) | 4.97E-13 |
| C0011570 | Depression | Mental or Behavioral Dysfunction | 34 (2.3%) | 1430 (7.9%) | 0.34 (0.24,0.47) | 5.12E-13 |
| C0013604 | Edema | Sign or Symptom | 48 (3.2%) | 1698 (9.3%) | 0.40 (0.29,0.53) | 1.88E-12 |
| C0015967 (neg) | Fever (neg) | Sign or Symptom | 286 (19.3%) | 5088 (27.9%) | 0.63 (0.55,0.72) | 3.87E-12 |
| C0003467 | Anxiety | Mental or Behavioral Dysfunction | 42 (2.8%) | 1494 (8.2%) | 0.39 (0.28,0.52) | 4.40E-12 |
| C1305863 | Anesthesia Type | Finding | 4 (0.3%) | 536 (2.9%) | 0.12 (0.04,0.26) | 1.17E-11 |
| C0554862 | Result, Lab.- General (Observable Entity) | Laboratory or Test Result | 17 (1.1%) | 891 (4.9%) | 0.27 (0.16,0.42) | 1.65E-11 |
| C0445088 | Neck Flexion (Finding) | Finding | 3 (0.2%) | 494 (2.7%) | 0.10 (0.03,0.24) | 1.74E-11 |
| C0231224 | Crisis | Finding | 14 (0.9%) | 850 (4.7%) | 0.24 (0.14,0.39) | 2.16E-11 |
| C0453996 | Tobacco smoking behavior | Individual Behavior | 99 (6.7%) | 2445 (13.4%) | 0.52 (0.42,0.64) | 4.11E-11 |
| C0020255 (neg) | Hydrocephalus (neg) | Disease or Syndrome | 8 (0.5%) | 662 (3.6%) | 0.18 (0.09,0.34) | 5.98E-11 |
| C0235195 (neg) | Sedated State (neg) | Finding | 3 (0.2%) | 445 (2.4%) | 0.10 (0.03,0.25) | 7.07E-11 |
| C0003126 | Anosmia | Finding | 30 (2.0%) | 85 (0.5%) | 4.97 (3.21,7.50) | 9.21E-11 |
| C1880200 | Current some day smoker | Finding | 17 (1.1%) | 796 (4.4%) | 0.28 (0.17,0.44) | 1.25E-10 |
| C0457318 | Blood Group Ab Rh(d) Negative | Laboratory or Test Result | 7 (0.5%) | 589 (3.2%) | 0.18 (0.08,0.34) | 1.58E-10 |
| C0586120 | Smoking monitoring status | Finding | 69 (4.7%) | 1780 (9.8%) | 0.48 (0.37,0.61) | 1.77E-10 |
| C0015967 | Fever | Sign or Symptom | 614 (41.4%) | 6055 (33.3%) | 1.43 (1.28,1.59) | 1.97E-10 |
| C0031039 (neg) | Pericardial Fluid (neg) | Disease or Syndrome | 15 (1.0%) | 819 (4.5%) | 0.27 (0.16,0.43) | 3.72E-10 |
| C1262869 | Body position | Finding | 16 (1.1%) | 821 (4.5%) | 0.28 (0.16,0.44) | 4.26E-10 |
| C0221198 | Lesion | Finding | 16 (1.1%) | 860 (4.7%) | 0.28 (0.17,0.45) | 7.68E-10 |
| C0521530 (neg) | Consolidation of Lung (neg) | Disease or Syndrome | 53 (3.6%) | 1604 (8.8%) | 0.46 (0.34,0.60) | 8.03E-10 |
| C0020295 (neg) | Hydronephroses (neg) | Disease or Syndrome | 10 (0.7%) | 667 (3.7%) | 0.22 (0.11,0.39) | 8.57E-10 |
| C0750426 | White Blood Cell Count Increased (Lab Result) | Finding | 70 (4.7%) | 1951 (10.7%) | 0.50 (0.39,0.64) | 2.02E-09 |
| C0455735 | Comments on own reading | Finding | 66 (4.5%) | 1850 (10.2%) | 0.51 (0.39,0.65) | 7.13E-09 |
| C0580359 | Allergy test positive | Laboratory or Test Result | 39 (2.6%) | 128 (0.7%) | 3.35 (2.29,4.79) | 7.42E-09 |
| C0231170 | Disability | Finding | 8 (0.5%) | 559 (3.1%) | 0.22 (0.10,0.40) | 1.40E-08 |
| C1277295 | Cough with fever | Sign or Symptom | 70 (4.7%) | 396 (2.2%) | 2.29 (1.75,2.96) | 1.46E-08 |
| C0427451 | Sickling test positive | Laboratory or Test Result | 15 (1.0%) | 22 (0.1%) | 8.66 (4.38,16.69) | 1.55E-08 |
| C0184763 | Patient condition unchanged | Finding | 3 (0.2%) | 403 (2.2%) | 0.13 (0.04,0.31) | 1.67E-08 |
| C0030554 (neg) | Paresthesias (neg) | Disease or Syndrome | 6 (0.4%) | 441 (2.4%) | 0.19 (0.08,0.38) | 1.84E-08 |
| C0043144 | Wheezings | Sign or Symptom | 10 (0.7%) | 571 (3.1%) | 0.25 (0.13,0.44) | 3.35E-08 |
| C3853152 | Does with Much Difficulty | Finding | 21 (1.4%) | 871 (4.8%) | 0.35 (0.22,0.53) | 3.71E-08 |
| C0028259 | Nodule | Acquired Abnormality | 7 (0.5%) | 552 (3.0%) | 0.21 (0.09,0.41) | 4.08E-08 |
| C0023518 | Leukocytosis | Disease or Syndrome | 10 (0.7%) | 607 (3.3%) | 0.26 (0.13,0.45) | 4.42E-08 |
| C0337671 | Former smoker | Finding | 28 (1.9%) | 1036 (5.7%) | 0.40 (0.27,0.57) | 4.62E-08 |
| C0014544 | Epilepsy | Disease or Syndrome | 8 (0.5%) | 556 (3.1%) | 0.23 (0.11,0.42) | 5.09E-08 |
| C2364111 | Ageustia | Sign or Symptom | 20 (1.3%) | 53 (0.3%) | 5.18 (3.02,8.58) | 6.16E-08 |
| C0444867 | both patent | Finding | 64 (4.3%) | 1673 (9.2%) | 0.52 (0.40,0.67) | 7.36E-08 |
| C0032227 | Pleural effusion disorder | Disease or Syndrome | 9 (0.6%) | 587 (3.2%) | 0.25 (0.12,0.45) | 9.69E-08 |
| C1851100 | Laurin-sandrow syndrome | Disease or Syndrome | 15 (1.0%) | 706 (3.9%) | 0.32 (0.18,0.51) | 1.23E-07 |
| C0086409 | Hispanics | Population Group | 30 (2.0%) | 71 (0.4%) | 3.66 (2.33,5.61) | 1.23E-07 |
| C1287298 | Urine volume finding | Finding | 3 (0.2%) | 374 (2.1%) | 0.14 (0.04,0.34) | 1.51E-07 |
| C0002871 | Anemia | Disease or Syndrome | 26 (1.8%) | 964 (5.3%) | 0.40 (0.27,0.59) | 2.00E-07 |
| C4081907 | Patient Identity Verified (Finding) | Finding | 3 (0.2%) | 311 (1.7%) | 0.14 (0.04,0.35) | 2.21E-07 |
| C0032074 | Planning | Mental Process | 195 (13.1%) | 3636 (20.0%) | 0.68 (0.58,0.79) | 3.48E-07 |
| C0455458 | Pmh - Past Medical History | Finding | 80 (5.4%) | 1887 (10.4%) | 0.57 (0.45,0.72) | 4.55E-07 |
| C0004048 | Inspiration Function | Organism Function | 54 (3.6%) | 1420 (7.8%) | 0.52 (0.39,0.68) | 5.06E-07 |
| C0442770 | Sees hand movements | Finding | 7 (0.5%) | 5 (0.0%) | 22.18 (7.26,71.96) | 5.60E-07 |
| C0332148 | Probable diagnosis | Finding | 37 (2.5%) | 1114 (6.1%) | 0.47 (0.33,0.64) | 5.73E-07 |
| C0700124 | Dilated | Finding | 16 (1.1%) | 691 (3.8%) | 0.35 (0.20,0.55) | 6.16E-07 |
| C0036572 | Convulsion | Sign or Symptom | 10 (0.7%) | 537 (2.9%) | 0.28 (0.14,0.50) | 7.28E-07 |
| C0233519 (neg) | Suspiciousness (neg) | Finding | 12 (0.8%) | 595 (3.3%) | 0.31 (0.17,0.52) | 7.50E-07 |
| C0332219 | Not Difficult at all | Finding | 9 (0.6%) | 501 (2.8%) | 0.27 (0.13,0.49) | 7.71E-07 |
| C0475269 | G1 Grade (Finding) | Finding | 2 (0.1%) | 309 (1.7%) | 0.12 (0.03,0.34) | 8.01E-07 |
| C0025517 | Disease, Metabolic | Disease or Syndrome | 17 (1.1%) | 746 (4.1%) | 0.36 (0.21,0.56) | 8.35E-07 |
| C0032326 (neg) | Pneumothorax (neg) | Disease or Syndrome | 161 (10.9%) | 3280 (18.0%) | 0.66 (0.55,0.78) | 8.87E-07 |
| C0277803 (neg) | Normal vital signs (neg) | Finding | 34 (2.3%) | 146 (0.8%) | 2.88 (1.94,4.17) | 9.60E-07 |
| C0032227 (neg) | Pleural effusion Disorder (neg) | Disease or Syndrome | 127 (8.6%) | 2674 (14.7%) | 0.64 (0.53,0.77) | 9.94E-07 |
| C0032310 | Pneumonias, Viral | Disease or Syndrome | 33 (2.2%) | 180 (1.0%) | 2.88 (1.94,4.15) | 1.00E-06 |
| C0033213 | Problem | Finding | 191 (12.9%) | 3525 (19.4%) | 0.69 (0.58,0.80) | 1.15E-06 |
| C1285647 | Characteristic of Perceptual Performance (Observable Entity) | Mental Process | 5 (0.3%) | 382 (2.1%) | 0.21 (0.08,0.43) | 1.32E-06 |
| C0574839 | Seen on arrival (Finding) | Finding | 20 (1.3%) | 729 (4.0%) | 0.39 (0.24,0.60) | 2.17E-06 |
| C0043157 | Caucasians | Population Group | 7 (0.5%) | 421 (2.3%) | 0.25 (0.11,0.48) | 2.25E-06 |
| C0449850 (neg) | Patient position finding (neg) | Finding | 3 (0.2%) | 269 (1.5%) | 0.16 (0.04,0.39) | 2.25E-06 |
| C0278061 | Altered Mental Status (Finding) | Mental or Behavioral Dysfunction | 7 (0.5%) | 469 (2.6%) | 0.25 (0.11,0.48) | 2.46E-06 |
Fig. 2. Forest plot comparing individual concepts between COVID-19-positive (case) and COVID-19-negative (control) patients. Selected associations include the significant signals related to semantic types of symptoms that met Bonferroni-corrected significance (p-value < 2.55E-06). The odds ratio has been adjusted for age, gender, and race. The concepts are ordered by p-value.
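
The reported odds ratios are adjusted for age, gender, and race, which requires patient-level data. As a rough sanity check, a crude (unadjusted) odds ratio can be computed from the counts in the results table. The sketch below uses the Woolf log-based 95% confidence interval, which is an assumption of this example and not necessarily the interval method used in the study. The function name `crude_odds_ratio` is illustrative.

```python
import math

def crude_odds_ratio(case_hits, n_cases, control_hits, n_controls):
    """Unadjusted odds ratio with a Woolf (log-based) 95% CI."""
    a, b = case_hits, n_cases - case_hits            # cases with / without concept
    c, d = control_hits, n_controls - control_hits   # controls with / without concept
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, lower, upper

# Anosmia counts from the results table: 30 of 1,483 cases, 85 of 18,209 controls
or_, lower, upper = crude_odds_ratio(30, 1483, 85, 18209)
print(f"crude OR = {or_:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

The crude estimate is not expected to equal the adjusted value of 4.97 reported for anosmia, because it ignores the covariates. It should point in the same direction.
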
Fig. 3. Temporal ConceptWAS using cumulative data at 2-week intervals. For signals (related to signs or symptoms) that were significant using all data (labeled in Fig. 2), the plot shows the -log10 (p-value) for association (y-axis) against the cumulative data window running from March 8, 2020 to n weeks later (x-axis). The dashed line indicates the significance level after Bonferroni correction.
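
The cumulative windows behind the temporal analysis can be sketched as follows. This is a minimal illustration, assuming each window starts on March 8, 2020 and its end moves forward in 2-week steps. The end date, the half-open window boundary, the record layout, and the helper names are all hypothetical.

```python
from datetime import date, timedelta

START = date(2020, 3, 8)  # start date given in the Fig. 3 caption

def cumulative_windows(end, step_weeks=2):
    """Yield (start, window_end) pairs whose end grows by step_weeks each time."""
    window_end = START + timedelta(weeks=step_weeks)
    while window_end <= end:
        yield START, window_end
        window_end += timedelta(weeks=step_weeks)

def records_in_window(records, window_end):
    """Keep records whose test date falls in [START, window_end)."""
    return [r for r in records if START <= r["test_date"] < window_end]

# Toy records for illustration only; not study data
toy_records = [
    {"patient": "A", "test_date": date(2020, 3, 10)},
    {"patient": "B", "test_date": date(2020, 3, 30)},
    {"patient": "C", "test_date": date(2020, 4, 15)},
]

windows = list(cumulative_windows(end=date(2020, 4, 19)))
counts = [len(records_in_window(toy_records, w_end)) for _, w_end in windows]
for (_, w_end), n in zip(windows, counts):
    print(f"through {w_end}: {n} record(s)")
```

In a full analysis, the association scan would be rerun on each cumulative subset so that the p-value of a concept can be tracked as data accumulate.
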
Fig. D1. The cumulative number of COVID-19-positive patients (cases) by week.
Results of chart reviews.
| Concepts | Reviewed samples | True signals | True signals (%) | Examples of false positives |
|---|---|---|---|---|
| Anosmia | 20 | 19 | 95.00% | “(-) altered/loss of smell” was wrongly recognized as an affirmative/positive attribute. |
| Ageustia | 20 | 19 | 95.00% | “Symptoms, n/v, fever, cough, loss of taste or smell or around anyone + for Covid 19.” |
| Depression | 20 | 18 | 90.00% | One was extracted from a medical history heading with no response recorded; the other came from a recommendation for further psychosocial assessment. |
| Current some day smoker | 20 | 20 | 100.00% | |
| Smoking monitoring status | 20 | 19 | 95.00% | One was uncertain: “Smoking Status Not on file”. |
| Fever | 20 | 17 | 85.00% | Template issue. “The following ROS were reviewed and are negative, unless otherwise stated as + positive: |
| Pericardial Fluid (neg) | 20 | 20 | 100.00% | |
| Hydrocephalus (neg) | 20 | 20 | 100.00% | |
| Hydronephrosis | 20 | 20 | 100.00% | |
| Blood group AB Rh(D) negative | 20 | 0 | 0.00% | From blood typing tests. This signal was not specific to blood type AB+, but generated by other ABO blood types and Rh-positive patients. |
| Allergy test positive | 20 | 5 | 25.00% | The false positives were wrongly mapped from a sentence like “He /She has been exposed to covid, family member or friends have tested positive.” |
| Laurin-Sandrow syndrome | 20 | 20 | 100.00% | |
| Cough nonproductive | 20 | 20 | 100.00% | |
| In total | 260 | 217 | 83.46% | |
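
The totals in the chart review table can be recomputed from the per-concept rows. The short sketch below copies the reviewed and true-signal counts from the table and derives the overall percentage. The variable names are illustrative.

```python
# (concept, reviewed samples, true signals) copied from the chart review table
chart_review = [
    ("Anosmia", 20, 19),
    ("Ageustia", 20, 19),
    ("Depression", 20, 18),
    ("Current some day smoker", 20, 20),
    ("Smoking monitoring status", 20, 19),
    ("Fever", 20, 17),
    ("Pericardial Fluid (neg)", 20, 20),
    ("Hydrocephalus (neg)", 20, 20),
    ("Hydronephrosis", 20, 20),
    ("Blood group AB Rh(D) negative", 20, 0),
    ("Allergy test positive", 20, 5),
    ("Laurin-Sandrow syndrome", 20, 20),
    ("Cough nonproductive", 20, 20),
]

total_reviewed = sum(reviewed for _, reviewed, _ in chart_review)
total_true = sum(true for _, _, true in chart_review)
overall_pct = 100 * total_true / total_reviewed

# Per-concept share of reviewed mentions that were true signals
per_concept_pct = {name: 100 * true / reviewed for name, reviewed, true in chart_review}

print(f"{total_true}/{total_reviewed} = {overall_pct:.2f}%")
```

Listing `per_concept_pct` in ascending order makes the two weakest concepts (blood group and allergy test) easy to spot.
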