Jihoon Kim, Larissa Neumann, Paulina Paul, Michele E Day, Michael Aratow, Douglas S Bell, Jason N Doctor, Ludwig C Hinske, Xiaoqian Jiang, Katherine K Kim, Michael E Matheny, Daniella Meeker, Mark J Pletcher, Lisa M Schilling, Spencer SooHoo, Hua Xu, Kai Zheng, Lucila Ohno-Machado.
Abstract
OBJECTIVE: To use electronic health record (EHR) data from 202 hospitals, in a manner that preserves both individual and institutional privacy, by analyzing answers to COVID-19-related questions and posting these answers online.
Keywords: COVID-19; common data elements; electronic health record; observational study; regression analysis
Year: 2021 PMID: 34051088 PMCID: PMC8194878 DOI: 10.1093/jamia/ocab054
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 7.942
Participating sites: Cedars Sinai Medical Center (CSMC), University of Colorado Anschutz Medical Campus (CU-AMC), Ludwig Maximilian University of Munich (LMU), San Mateo Medical Center (SMMC), University of California (UC) Davis (UCD), Irvine (UCI), San Diego (UCSD), San Francisco (UCSF), University of Southern California (USC), University of Texas Health Science Center at Houston and Memorial Hermann Health System (UTH), Veterans Affairs Medical Center (VAMC)

| Site | Hospitals | Beds | Discharges per year | EHR system | Data source |
|---|---|---|---|---|---|
| CSMC | 2 | 1019 | 61 386 | Epic | EHR |
| CU-AMC | 12 | 1829 | 106 325 | Epic | EHR |
| LMU | 12 | 1964 | 78 673 | SAP/i.s.h.med, QCare, IMESO | COVID-19 Registry |
| SMMC | 1 | 62 | 1951 | Harris Software (Pulsecheck), Cerner (Soarian), eClinicalWorks | EHR |
| UCD | 1 | 620 | 32 248 | Epic | EHR |
| UCI | 1 | 417 | 21 656 | Epic | EHR |
| UCLA | 2 | 786 | 47 491 | Epic | EHR |
| UCSD | 3 | 808 | 29 895 | Epic | EHR |
| UCSF | 3 | 796 | 48 120 | Epic | EHR |
| USC | 2 | 1511 | 23 454 | Cerner | EHR |
| UTH | 17 | 4164 | 233 890 | Cerner | COVID-19 Registry |
| VAMC | 146 | 13 000 | 676 402 | VistA/CPRS | EHR |
| Total | 202 | 26 976 | 1 361 491 | | |
Available data on hospital characteristics from 2018.
Two additional sites joined the consortium and will begin answering queries in 2021.
Figure 1. What happens behind the scenes: from questions to answers. The workflow of the question-answer system is shown in 5 steps. Step 1. Users access a public web portal and post a new question if they cannot find a posted answer. Step 2. The questions get triaged to a Consortium Hub clinical informatician who determines their general interest and assigns the edited version of the question to a Lead Site. Step 3. At the Lead Site, the clinical informatician and the database analyst work together to create concept sets, design a query, and check local results. Step 4. The Responding Site runs the released structured query language (SQL) code and uploads its results to the Consortium Hub. During this step, the clinical informatician and the Responding Site data analyst adjust the concept set, inclusion logic, and database query code in SQL for local implementation; obtain and quality control the site-level results; and submit results to the Consortium Hub. Step 5. The Consortium Hub aggregates the site-level results, generates the visualizations, and posts the answer on the web portal.
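
Step 5 of the workflow above (the Consortium Hub combining site-level results) can be illustrated with a minimal Python sketch. The site names, category labels, and counts below are hypothetical and are not taken from the study; the function `aggregate_site_counts` is an illustrative helper, not the consortium's code.

```python
from collections import Counter

def aggregate_site_counts(site_results):
    """Sum per-category counts uploaded by each site (hypothetical format)."""
    total = Counter()
    for counts in site_results.values():
        total.update(counts)  # adds counts category by category
    return dict(total)

# Hypothetical site-level uploads: only aggregate counts leave each site.
site_results = {
    "site_a": {"deceased": 12, "discharged_alive": 140},
    "site_b": {"deceased": 30, "discharged_alive": 410},
}
aggregate = aggregate_site_counts(site_results)
print(aggregate)
```

Only counts are exchanged in this sketch, which mirrors the caption's description of sites uploading results rather than patient-level records.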
Figure 2. Swimlane diagram. A Q&A process flow starts from a user entering a request and ends with the user receiving e-mail notification about a response. At the Consortium Hub, the data scientist is responsible for aggregating site-level results and for data quality checks. The clinician at the Consortium Hub is responsible for feasibility assessment of the question, triaging to a Lead Site, and for the approval of the aggregate answer. At the Lead Site, the clinician reviews the assigned question text and works with the database analyst to translate the question into SQL and ensure the results are clinically relevant. The database analyst at the Lead Site writes the SQL code, runs it, verifies the results, and releases the code to the Consortium Hub. At the Responding Site, the database analyst runs the Lead Site’s SQL code, reviews the results together with local clinicians, and uploads the site-level results to the Consortium Hub through an iterative process of ETL update, local data mapping, and concept set development led by the Lead Site.
Figure 3. Cohort definition and concept set development. Defining a cohort of patients that is frequently used to answer questions helps us reuse code. In this example, defining the cohort of patients hospitalized with COVID-19 involves use of SARS-CoV-2 test results or diagnosis codes (A). In (B), we illustrate how a laboratory test is defined differently at two sites and how blood type had yet to be harmonized into OMOP at one site.
Figure 4. An example of a COVID-19 question: monthly mortality. The in-hospital mortality rate per month (red line) is shown as a percentage, with its 95% confidence interval between January and November in 2020. The observed counts for the deceased during hospitalization (orange) and the discharged alive (blue) are shown in bar plots. The unit of analysis is the hospital encounter.
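
The monthly mortality rate and its 95% confidence interval can be computed from the two encounter counts shown in the bar plots. The source does not state which interval method was used; the sketch below assumes a Wilson score interval, and the function name `mortality_rate_ci` and the example counts are hypothetical.

```python
import math

def mortality_rate_ci(deceased, discharged_alive, z=1.96):
    """In-hospital mortality rate with a Wilson score interval (assumed method).

    The unit is the hospital encounter, so n counts encounters, not patients.
    """
    n = deceased + discharged_alive
    if n == 0:
        raise ValueError("no encounters in this month")
    p = deceased / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, center - half, center + half

# Hypothetical month: 10 deaths among 100 hospital encounters.
rate, lower, upper = mortality_rate_ci(10, 90)
print(f"{rate:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The Wilson interval stays within 0 and 1 even for months with few encounters, which is why it is used here instead of the simpler normal approximation.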
Data quality checks and issues. Data quality check types are listed together with real issues identified in this COVID-19 project.
| Check Type | Example of data quality issue |
|---|---|
| Date/time reversal | A condition/observation was recorded after discharge date |
| Extreme outlier | The hospital length of stay was greater than 80 days. The median length of stay ranged between 11 and 15 days in China and US studies |
| Gaps in data transformation | Discharge disposition and ICU departments were not transformed to OMOP |
| Loss of granularity during mapping | Invasive and noninvasive mechanical ventilation mapped to the same concept |
| Impossible events | Multiple death events were recorded for the same patient at different time points across multiple hospital encounters |
| Noncompliance with the output format | The predefined .csv output had a missing header, missing columns, shifted columns, or duplicate rows |
| Unexpected proportion | The percentage of current smokers was 65% at a certain site. The national percentage of smoking was 15.6% among male adults in 2018 US CDC data |
| Unexpected zero count | The number of patients who were taking any antihypertensives was zero |
| Unmatched group sum | The total sums of patient count in age groups and race groups were different even when all cell counts were greater than 10 |
| Version mismatch | The version of the template query was revised after the query result was uploaded |
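
Several of the check types in the table above lend themselves to simple automated rules. The following is a minimal sketch of three of them (date/time reversal, extreme outlier, unmatched group sum); the function names and input shapes are hypothetical, and only the 80-day length-of-stay threshold comes from the table.

```python
from datetime import date

def has_date_reversal(event_date, discharge_date):
    """Date/time reversal: an event recorded after the discharge date."""
    return event_date > discharge_date

def is_extreme_length_of_stay(admit_date, discharge_date, max_days=80):
    """Extreme outlier: hospital stay longer than max_days (80 in the table)."""
    return (discharge_date - admit_date).days > max_days

def has_unmatched_group_sums(groupings):
    """Unmatched group sum: totals differ across groupings of one cohort.

    groupings maps a grouping name (e.g. "age", "race") to its cell counts.
    """
    totals = {sum(cells.values()) for cells in groupings.values()}
    return len(totals) > 1

# Hypothetical site-level result in which the age and race totals disagree.
flagged = has_unmatched_group_sums({
    "age": {"<50": 40, "50+": 60},
    "race": {"group_1": 55, "group_2": 40},
})
print(flagged)
```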
Figure 6. Regression results. (A) Adjusted effects from the Grid binary LOgistic REgression (GLORE) (15) federated logistic regression model (3146 patients from 8 health systems). The baselines were SEX=female, RACE=white, ETHNICITY=non-Hispanic. AGE (in years) was divided by 100. After adjustment via distributed logistic regression, AGE remains significant. (B) Results from local logistic regression performed at two sites are also shown for comparison with GLORE results.
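
The general idea behind a federated logistic regression can be sketched as follows: each site computes the gradient and information matrix of the log-likelihood on its own rows, and a hub sums these summaries to take a Newton-Raphson step, so no patient-level rows leave a site. This is an illustrative sketch under those assumptions, not the GLORE implementation; it uses a single covariate (AGE divided by 100, as in the caption) plus an intercept, and all data and function names are hypothetical.

```python
import math

def local_summaries(rows, beta):
    """Run at each site: gradient and information matrix for its own rows.

    rows is a list of (x, y) pairs: x = AGE / 100, y = binary outcome.
    """
    b0, b1 = beta
    g = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w, r = p * (1.0 - p), y - p
        g[0] += r
        g[1] += r * x
        h[0][0] += w
        h[0][1] += w * x
        h[1][0] += w * x
        h[1][1] += w * x * x
    return g, h

def newton_step(beta, summaries):
    """Run at the hub: sum site summaries and solve the 2x2 Newton update."""
    g = [sum(s[0][i] for s in summaries) for i in range(2)]
    h = [[sum(s[1][i][j] for s in summaries) for j in range(2)] for i in range(2)]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    d0 = (h[1][1] * g[0] - h[0][1] * g[1]) / det
    d1 = (h[0][0] * g[1] - h[1][0] * g[0]) / det
    return (beta[0] + d0, beta[1] + d1)

def fit_federated(sites, iterations=25):
    beta = (0.0, 0.0)
    for _ in range(iterations):
        beta = newton_step(beta, [local_summaries(rows, beta) for rows in sites])
    return beta

# Hypothetical rows at two sites (AGE / 100, outcome).
site_a = [(0.30, 0), (0.45, 0), (0.60, 1), (0.70, 0), (0.80, 1)]
site_b = [(0.35, 0), (0.50, 1), (0.55, 0), (0.75, 1), (0.85, 1), (0.40, 0)]
federated = fit_federated([site_a, site_b])
pooled = fit_federated([site_a + site_b])  # same fit if data were centralized
print(federated, pooled)
```

Because the gradient and information matrix are sums over rows, the federated estimate matches the one obtained by pooling all rows in one place, up to floating-point error.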
Concept relationships between ICD10CM and SNOMED concepts. ICD10CM concepts and their mapped SNOMED concepts from the
| Concept Code 1 (ICD10CM) | Concept Name 1 | Concept Id 1 | Relationship Id | Concept Id 2 | Concept Name 2 | Concept Code 2 (SNOMED) |
|---|---|---|---|---|---|---|
| J12.89 | Other viral pneumonia | 45572161 | ‘Maps to’ | 261326 | Viral pneumonia | 75570004 |
| J20.8 | Acute bronchitis due to other specified organisms | 35207965 | ‘Maps to’ | 260139 | Acute bronchitis | 10509002 |
| J22 | Unspecified acute lower respiratory infection | 35207970 | ‘Maps to’ | 4307774 | Acute lower respiratory tract infection | 195742007 |
| J40 | Bronchitis, not specified as acute or chronic | 35208013 | ‘Maps to’ | 256451 | Bronchitis | 32398004 |
| J80 | Acute respiratory distress syndrome | 35208069 | ‘Maps to’ | 4195694 | Acute respiratory distress syndrome | 67782005 |
| J98.8 | Other specified respiratory disorders | 35208108 | ‘Maps to’ | 320136 | Disorder of respiratory system | 50043002 |
| B97.29 | Other coronavirus as the cause of diseases classified elsewhere | 45600471 | ‘Maps to’ | 4100065 | Disease due to Coronaviridae | 27619001 |
| U07.1 | Emergency use of U07.1 \| Disease caused by severe acute respiratory syndrome coronavirus 2 | 702953 | ‘Maps to’ | 37311061 | Disease caused by 2019-nCoV | 840539006 |
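
A lookup of the kind shown in the table above can be sketched with an in-memory SQLite database. The table and column names follow the OMOP vocabulary tables `concept` and `concept_relationship`, reduced here to the columns the query needs; the helper `map_icd10cm_to_snomed` is hypothetical, and the two sample mappings are copied from the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY, concept_name TEXT,
    vocabulary_id TEXT, concept_code TEXT);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT);
""")
conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?)", [
    (35208069, "Acute respiratory distress syndrome", "ICD10CM", "J80"),
    (4195694, "Acute respiratory distress syndrome", "SNOMED", "67782005"),
    (35208013, "Bronchitis, not specified as acute or chronic", "ICD10CM", "J40"),
    (256451, "Bronchitis", "SNOMED", "32398004"),
])
conn.executemany("INSERT INTO concept_relationship VALUES (?, ?, ?)", [
    (35208069, 4195694, "Maps to"),
    (35208013, 256451, "Maps to"),
])

def map_icd10cm_to_snomed(conn, icd_code):
    """Return (concept_id, concept_name, concept_code) of mapped SNOMED concepts."""
    return conn.execute("""
        SELECT c2.concept_id, c2.concept_name, c2.concept_code
        FROM concept AS c1
        JOIN concept_relationship AS cr ON cr.concept_id_1 = c1.concept_id
        JOIN concept AS c2 ON c2.concept_id = cr.concept_id_2
        WHERE c1.vocabulary_id = 'ICD10CM'
          AND c1.concept_code = ?
          AND cr.relationship_id = 'Maps to'
          AND c2.vocabulary_id = 'SNOMED'
    """, (icd_code,)).fetchall()

print(map_icd10cm_to_snomed(conn, "J80"))
```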