Privacy-Protecting, Reliable Response Data Discovery Using COVID-19 Patient Observations.

Jihoon Kim1, Larissa Neumann1,2,3, Paulina Paul1, Michael Aratow4, Douglas S Bell5, Jason N Doctor6, Ludwig C Hinske2,3, Xiaoqian Jiang7, Katherine K Kim8,9, Michael E Matheny10,11, Daniella Meeker6,12, Mark J Pletcher13, Lisa M Schilling14, Spencer SooHoo15, Hua Xu7, Kai Zheng16, Lucila Ohno-Machado1,17.   

Abstract

There is an urgent need to answer questions related to COVID-19's clinical course and associations with underlying conditions and health outcomes. Multi-center data are necessary to generate reliable answers, but centralizing data in a single repository is not always possible. Using a privacy-protecting strategy, we launched a public Questions & Answers web portal (https://covid19questions.org) with analyses of comorbidities, medications and laboratory tests using data from 202 hospitals (59,074 COVID-19 patients) in the USA and Germany. We find, for example, that 8.6% of hospitalizations in which the patient was not admitted to the ICU resulted in the patient returning to the hospital within seven days from discharge and that, when adjusted for age, mortality for hospitalized patients was not significantly different by gender or ethnicity.

Year:  2020        PMID: 32995818      PMCID: PMC7523159          DOI: 10.1101/2020.09.21.20196220

Source DB:  PubMed          Journal:  medRxiv


The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic represents a watershed event in public health and has highlighted numerous opportunities and needs in clinical and public health informatics infrastructure (1–3). One of the key challenges has been the rapid analysis and interpretation of observational data to inform clinical decision making and patient expectations, understanding, and perceptions (4–8). Several initiatives are building COVID-19 registries or consortia to analyze electronic health record (EHR) data (6, 7, 9). The expectation is that these resources will give researchers and clinicians access to a rich source of observational data to understand the clinical progression of COVID-19, to estimate the impact of therapies, and to make predictions regarding outcomes. Registries may contain limited data for patients diagnosed with COVID-19: the barriers to having more data stem both from privacy concerns and from which elements have been deemed valuable by health professionals and researchers at a particular point in time. The problem with a new and evolving disease like COVID-19 is that we do not know what data or information will be most valuable. For example, in the pandemic’s early stages, the dermatological and hematological findings were not evident, and those data were not included in registries or reports. Interest in specific laboratory markers (e.g., D-dimer, troponin) for these disturbances and in additional symptoms (e.g., anosmia, conjunctivitis) has increased over time. Additionally, it is challenging for researchers and clinicians to understand the structure and quality of the data in registries and other types of data repositories, and to formulate queries to consult the data in their institution. The process becomes more complicated when data from multiple institutions are involved. Thus, using EHRs to characterize COVID-19 disease progression and outcomes is challenging.
However, observational studies using EHR data may be useful when a research question does not lend itself to a randomized clinical trial (RCT). Observational studies may also help determine if results from RCTs replicate after relaxing eligibility criteria for real-world applications. While the scientific community has raised concerns about the reproducibility of findings, data provenance, and proper utilization of observational data, resulting in some COVID-19 articles being retracted (10), there remains a clear need to responsibly, ethically, and transparently analyze observational data to provide hypothesis generation and guidance in the pursuit of evidence-based healthcare. We focus on using novel decentralized data governance and methods to analyze EHR-derived data. Researchers’ questions posed in natural language are adapted to standardized queries using distributed data maintained in 12 health systems, covering 202 hospitals located in all U.S. states and two territories, and one international academic medical center (Table 1). This collaboration provides the capability to answer questions that require comparisons with historical data from over 45 million patients and uses a dynamic approach to account for an evolving awareness of the most impactful COVID-19 questions to answer and hypotheses to explore. Having access to complete EHR data from 10 of these health systems (two sites use COVID-19 registries), and not just a predefined list of key data elements, differentiates our approach from centralized registries and public health reports. The ability to build and evaluate multivariate models across a large number of health systems and to integrate results from registries differentiates our approach from most federated clinical data research network approaches.
Table 1.

Participating sites.

Cedars Sinai Medical Center (CSMC), University of Colorado Anschutz Medical Campus (CU-AMC), Ludwig Maximilian University of Munich (LMU), San Mateo Medical Center (SMMC), University of California (UC) Davis (UCD), Irvine (UCI), Los Angeles (UCLA), San Diego (UCSD), San Francisco (UCSF), University of Southern California (USC), University of Texas Health Science Center at Houston and Memorial Hermann Health System (UTH), Veterans Affairs Medical Center (VAMC).

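| Institution | Hospitals | Beds | Discharges per year | EHR system | Data source |
|---|---|---|---|---|---|
| CSMC | 2 | 1,019 | 61,386 | Epic | EHR |
| CU-AMC | 12 | 1,829 | 106,325 | Epic | EHR |
| LMU* | 12 | 1,964 | 78,673 | SAP/i.s.h.med QCare IMESO | COVID-19 Registry |
| SMMC | 1 | 62 | 1,951 | Harris Software (Pulsecheck); Cerner (Soarian) eClinicalworks | EHR |
| UCD | 1 | 620 | 32,248 | Epic | EHR |
| UCI | 1 | 417 | 21,656 | Epic | EHR |
| UCLA | 2 | 786 | 47,491 | Epic | EHR |
| UCSD | 3 | 808 | 29,895 | Epic | EHR |
| UCSF | 3 | 796 | 48,120 | Epic | EHR |
| USC | 2 | 1,511 | 23,454 | Cerner | EHR |
| UTH | 17 | 4,164 | 233,890 | Cerner | COVID-19 Registry |
| VAMC | 146 | 13,000 | 676,402 | VistA/CPRS | EHR |
| Total | 202 | 26,976 | 1,361,491 | | |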

Available data on hospital characteristics from 2018.

The responsibility for translating each question into code and for performing quality control lies with members of the Reliable Response Data Discovery for COVID-19 Clinical Consults using Patient Observations (R2D2) Consortium (see supplemental materials). The analyses do not require data transfer outside these institutions, which reduces the risk of individual or institutional privacy breaches. We make results publicly available as soon as they pass quality control. In this approach, specific data elements are analyzed locally. Only the results of calculations (e.g., counts, statistics, coefficients, variance-covariance matrices) performed on data transformed into the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) from relevant patient cohorts are released from the healthcare institutions; no individual patient-level data are shared (11). Between December 11, 2019 and August 31, 2020, our consortium had 928,255 patients tested for SARS-CoV-2 and 59,074 diagnosed with COVID-19, of whom 19,022 were hospitalized and 2,591 died. Our public Questions and Answers (Q&A) portal (https://covid19questions.org) provides answers to research questions using several univariate or multivariate analyses, including potential associations between mortality and comorbidities; pre-hospitalization use of anti-hypertensive medications; laboratory values and hospital events. For each question, we report the number of participating institutions and the time period within which local queries were run. Figure 1 shows the graphical display of two answers.
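To illustrate the idea that only calculation results leave each institution, the following is a minimal sketch, not the consortium's actual code. The names `SiteAggregate`, `summarize_locally`, and `pool` are hypothetical, and the toy rows are invented for illustration. Each site reduces its patient rows to two counts, and the hub sees only those counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteAggregate:
    """Summary a site releases; it contains no patient-level rows."""
    n_hospitalized: int
    n_deceased: int


def summarize_locally(patient_died_flags):
    """Run inside a site's firewall: reduce patient rows to two counts."""
    flags = list(patient_died_flags)
    return SiteAggregate(n_hospitalized=len(flags), n_deceased=sum(flags))


def pool(aggregates):
    """Run at the hub: combine site-level counts into one answer."""
    total = sum(a.n_hospitalized for a in aggregates)
    deaths = sum(a.n_deceased for a in aggregates)
    rate = deaths / total if total else float("nan")
    return total, deaths, rate


# Hypothetical toy data for three sites (1 = died in hospital, 0 = survived).
site_rows = [
    [0, 0, 1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
]
released = [summarize_locally(rows) for rows in site_rows]
total, deaths, rate = pool(released)
print(f"{deaths}/{total} deceased, pooled rate {rate:.3f}")
```

The hub function never receives `site_rows`, only the `released` summaries, which is the property the paragraph above describes.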
Fig 1.

Examples of two COVID-19 Questions and Answers: Return to hospital and mortality.

(A) 8.6% of hospitalizations without an ICU admission resulted in the patient presenting to the Emergency Room or a hospital readmission within seven days (data from ten health systems). (B-E) Unadjusted mortality rates from aggregated results are shown with 95% confidence intervals (data from ten health systems). Univariate analyses indicate that lower age, Hispanic ethnicity, and female gender are associated with lower mortality for adult hospitalized COVID-19 patients.

Example 1. “Many adult COVID-19 patients who were hospitalized did not get admitted to the ICU and were discharged alive. How many returned to the hospital within a week, either to the Emergency Room (ER) or for another hospital stay?” The answer is 8.6%. This question is important both for understanding the natural course of the disease and for planning needed resources. Although efforts are underway to understand post-discharge outcomes in patients with COVID-19, to date they have been limited to case series (12), modest sample sizes (13), or single-center or geographically concentrated health systems (14, 15). These extant studies may also be hampered by fixed inclusion/exclusion criteria (16).
Example 2. “Among adults hospitalized with COVID-19, how does the in-hospital mortality rate compare per subgroup (age, ethnicity, gender and race)?” The answers from univariate and multivariate analyses (logistic regression, Fig. 2) indicate that age is a major risk factor, but ethnicity and gender are also significant when considered univariately. There is great interest in, and a growing peer-reviewed literature on, risk factors for COVID-19 mortality: the agility of our approach allows us to quickly re-run queries and rebuild models as new predictors become relevant and the understanding of the disease evolves (17–20).
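As a rough illustration of how a question like Example 1 might be turned into a standardized query, the sketch below counts inpatient stays followed by an ER or inpatient visit within seven days, using an in-memory SQLite table shaped like the OMOP CDM `visit_occurrence` table. It is an assumption-laden toy, not the consortium's published query: the concept IDs 9201 (inpatient) and 9203 (emergency room) are assumed here, the rows are invented, and the restrictions to adult COVID-19 patients who avoided the ICU and were discharged alive are left out for brevity.

```python
import sqlite3

INPATIENT, EMERGENCY = 9201, 9203  # assumed OMOP visit concept IDs

# For every inpatient stay, flag whether the same person has another
# ER or inpatient visit starting 0 to 7 days after that stay ended.
QUERY = """
SELECT COUNT(*), COALESCE(SUM(returned), 0)
FROM (
    SELECT EXISTS (
        SELECT 1
        FROM visit_occurrence AS later
        WHERE later.person_id = idx.person_id
          AND later.visit_occurrence_id <> idx.visit_occurrence_id
          AND later.visit_concept_id IN (:inpatient, :emergency)
          AND julianday(later.visit_start_date) - julianday(idx.visit_end_date)
              BETWEEN 0 AND 7
    ) AS returned
    FROM visit_occurrence AS idx
    WHERE idx.visit_concept_id = :inpatient
)
"""


def seven_day_return_counts(conn):
    """Return (number of inpatient stays, number followed by a return visit)."""
    params = {"inpatient": INPATIENT, "emergency": EMERGENCY}
    return conn.execute(QUERY, params).fetchone()


# Hypothetical toy rows: (visit id, person id, concept, start date, end date).
rows = [
    (1, 101, INPATIENT, "2020-04-01", "2020-04-05"),
    (2, 101, EMERGENCY, "2020-04-09", "2020-04-09"),  # 4 days after discharge
    (3, 102, INPATIENT, "2020-04-02", "2020-04-10"),
    (4, 103, INPATIENT, "2020-04-03", "2020-04-06"),
    (5, 103, INPATIENT, "2020-04-20", "2020-04-25"),  # 14 days later: too late
    (6, 104, INPATIENT, "2020-05-01", "2020-05-03"),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE visit_occurrence (
        visit_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        visit_concept_id INTEGER,
        visit_start_date TEXT,
        visit_end_date TEXT)"""
)
conn.executemany("INSERT INTO visit_occurrence VALUES (?, ?, ?, ?, ?)", rows)

n_index, n_returned = seven_day_return_counts(conn)
print(f"{n_returned} of {n_index} hospitalizations had a return within 7 days")
```

Only the two counts would need to leave a site, so the same query could run unchanged at every institution whose data follow the common data model.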
Fig 2.

Regression Results.

(A) Adjusted effects from the Grid Binary LOgistic REgression (GLORE) (11) federated logistic regression model (3,146 patients from eight health systems). The baselines were GENDER=female, RACE=white, ETHNICITY=non-hispanic. Age (in years) was divided by 100. After adjustment via distributed logistic regression, age remains significant. (B) Results from local logistic regression performed at two sites are also shown for comparison with GLORE results.
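The general idea behind a distributed logistic regression such as GLORE can be sketched as follows. This is a simplified illustration written for this text, not the GLORE implementation: each site computes the gradient and information matrix of the log-likelihood on its own rows, and a hub sums them and takes a Newton-Raphson step. The function names and the toy data (an intercept plus age divided by 100) are hypothetical.

```python
import math


def local_summaries(rows, beta):
    """Site-side step: gradient and information matrix for this site's rows.

    rows is a list of (features, outcome); only the sums leave the site.
    """
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, y in rows:
        z = sum(coef * value for coef, value in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-z))
        weight = p * (1.0 - p)
        for i in range(k):
            grad[i] += x[i] * (y - p)
            for j in range(k):
                info[i][j] += weight * x[i] * x[j]
    return grad, info


def solve(matrix, vector):
    """Solve matrix * step = vector by Gaussian elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    step = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * step[c] for c in range(r + 1, n))
        step[r] = (aug[r][n] - tail) / aug[r][r]
    return step


def federated_fit(sites, n_features, iterations=25, tol=1e-10):
    """Hub-side loop: sum site summaries, then take a Newton-Raphson step."""
    beta = [0.0] * n_features
    for _ in range(iterations):
        grad = [0.0] * n_features
        info = [[0.0] * n_features for _ in range(n_features)]
        for rows in sites:
            g, h = local_summaries(rows, beta)
            grad = [a + b for a, b in zip(grad, g)]
            info = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(info, h)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def make_rows(pairs):
    """Build ([intercept, age / 100], died) rows from (scaled age, died) pairs."""
    return [([1.0, age], died) for age, died in pairs]


# Hypothetical toy data for three sites.
sites = [
    make_rows([(0.30, 0), (0.45, 0), (0.60, 1), (0.75, 0), (0.80, 1)]),
    make_rows([(0.35, 0), (0.50, 1), (0.65, 0), (0.70, 1), (0.85, 1)]),
    make_rows([(0.40, 0), (0.55, 0), (0.62, 1), (0.78, 1), (0.90, 0)]),
]
beta_federated = federated_fit(sites, n_features=2)
beta_pooled = federated_fit([sum(sites, [])], n_features=2)
print("federated:", beta_federated)
print("pooled:   ", beta_pooled)
```

Because the gradient and information matrix are sums over patients, adding the per-site pieces gives the same Newton-Raphson updates as fitting on pooled rows, up to floating-point rounding. With age divided by 100, a coefficient b on that variable corresponds to an odds ratio of exp(b / 10) per ten years of age.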

Several other questions and answers are shown in the portal and are updated when answers are approved for posting. We also have a set of approved questions awaiting response from the majority of sites. Institutions participating in our network are diverse in terms of organization, population served, prior investments in information technology, and location. A novel governance structure in our consortium allows us to distribute the workload across various teams without relying on a traditional coordinating center, instead including a Consortium Hub for certain functions. This approach keeps patient data in-house, simplifies data use agreements, avoids delegation of control of patient data to another institution, and allows any institution to benchmark its results against those produced by the consortium, since all questions and respective final, aggregated answers, database query code, concept definitions and analytics code are made public. It complies with HIPAA, the Common Rule, the GDPR, and the California Consumer Privacy Act with regard to the handling of patient data. Code sharing and public answers promote transparency and reproducibility without disclosing patient information or institutional information.
Our approach has advantages but also some limitations. The advantages are that we can publicly post, in a relatively short time, answers to questions that are of general interest, using data from a spectrum of highly diverse institutions with different information technology baselines, different levels of expertise in standardized data models and vocabularies, and different institutional policies and state and federal regulations. Because we keep data locally and only consult data elements that are necessary to answer specific questions, this approach has a lower risk of privacy breach when compared to registries in which patient data are exchanged or to distributed consortia in which summary-level results for each institution are reported.
Additionally, since registries typically focus on a single disease or condition, they often lack comparator data from other patients, limiting the opportunity to characterize a new disease and discover how it differs from what we currently know. Participating sites do not need to transform all data into OMOP CDM. They can decline to answer any question they do not feel comfortable with, or answer partially, masking counts between 1 and 10 to ensure patient-level privacy. Institutional privacy is also preserved because all public answers combine the aggregate data from at least three Responding Sites (we do not specify which ones), but we keep all answers for audits. Making concept definitions, query code, and results available allows reproducibility and enables automated updates to the answers. A major advantage is that existing registries or consortia can serve as additional sites to help answer certain questions.
The limitations stem from considering all sites equal when formulating a final answer. Regional or institutional practice variations are not represented in the answers. Additionally, the distributed nature of the R2D2 consortium adds a requirement for a dynamic management team, the need to educate local leadership on distributed analytics, and the potential for delays in certain decisions in order to exercise shared governance. A specific limitation of our current consortium is the preponderance of institutions based in California: eight out of 12 (67%), accounting for 17.5% of COVID-19 patients (Fig. 3). This was a convenience sample of organizations that had shared interests and a history of collaboration. We invite other institutions, consortia and registries worldwide to join us in answering questions of general interest.
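The two disclosure safeguards described above, masking counts between 1 and 10 and publishing only answers that combine at least three Responding Sites, can be sketched as below. This is an illustrative assumption about how such rules might be coded, not the consortium's implementation: the range is treated as inclusive, masked counts are simply left out of the pooled total, and all names are hypothetical.

```python
MIN_SITES = 3                  # minimum number of sites behind a public answer
MASK_LOW, MASK_HIGH = 1, 10    # assumed inclusive range of counts to mask


def mask_small_count(count):
    """Site-side: suppress counts in [1, 10]; zero and larger counts pass."""
    if MASK_LOW <= count <= MASK_HIGH:
        return None
    return count


def publishable_total(site_counts):
    """Hub-side: pool unmasked counts only if enough sites contributed.

    site_counts maps a site label to its raw count. Returns None when
    fewer than MIN_SITES unmasked counts remain.
    """
    masked = (mask_small_count(n) for n in site_counts.values())
    usable = [n for n in masked if n is not None]
    if len(usable) < MIN_SITES:
        return None
    return sum(usable)


# Hypothetical toy counts; site labels are placeholders.
enough_sites = {"site_a": 120, "site_b": 7, "site_c": 45, "site_d": 0}
too_few_sites = {"site_a": 120, "site_b": 7, "site_c": 45}

print(publishable_total(enough_sites))   # site_b is masked, three sites remain
print(publishable_total(too_few_sites))  # only two unmasked sites remain
```

Dropping masked counts makes the pooled total a lower bound. A real deployment would need to decide how to report that, which this sketch does not address.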
Fig 3.

Location of consortium’s medical centers and hospitals.

Map by Ilya Zaslavsky

  22 in total

1.  Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area.

Authors:  Safiya Richardson; Jamie S Hirsch; Mangala Narasimhan; James M Crawford; Thomas McGinn; Karina W Davidson; Douglas P Barnaby; Lance B Becker; John D Chelico; Stuart L Cohen; Jennifer Cookingham; Kevin Coppa; Michael A Diefenbach; Andrew J Dominello; Joan Duer-Hefele; Louise Falzon; Jordan Gitlin; Negin Hajizadeh; Tiffany G Harvin; David A Hirschwerk; Eun Ji Kim; Zachary M Kozel; Lyndonna M Marrast; Jazmin N Mogavero; Gabrielle A Osorio; Michael Qiu; Theodoros P Zanos
Journal:  JAMA       Date:  2020-05-26       Impact factor: 56.272

2.  COVID-19 and the Need for a National Health Information Technology Infrastructure.

Authors:  Dean F Sittig; Hardeep Singh
Journal:  JAMA       Date:  2020-06-16       Impact factor: 56.272

3.  A Data Quality Assessment Guideline for Electronic Health Record Data Reuse.

Authors:  Nicole G Weiskopf; Suzanne Bakken; George Hripcsak; Chunhua Weng
Journal:  EGEMS (Wash DC)       Date:  2017-09-04

4.  The COVID-19 Pandemic Highlights Shortcomings in US Health Care Informatics Infrastructure: A Call to Action.

Authors:  Vikas N O'Reilly-Shah; Katherine R Gentry; Wil Van Cleve; Samir M Kendale; Craig S Jabaley; Dustin R Long
Journal:  Anesth Analg       Date:  2020-08       Impact factor: 5.108

5.  The urgent need for integrated science to fight COVID-19 pandemic and beyond. (Review)

Authors:  Negar Moradian; Hans D Ochs; Constantine Sedikies; Michael R Hamblin; Carlos A Camargo; J Alfredo Martinez; Jacob D Biamonte; Mohammad Abdollahi; Pedro J Torres; Juan J Nieto; Shuji Ogino; John F Seymour; Ajith Abraham; Valentina Cauda; Sudhir Gupta; Seeram Ramakrishna; Frank W Sellke; Armin Sorooshian; A Wallace Hayes; Maria Martinez-Urbistondo; Manoj Gupta; Leila Azadbakht; Ahmad Esmaillzadeh; Roya Kelishadi; Alireza Esteghamati; Zahra Emam-Djomeh; Reza Majdzadeh; Partha Palit; Hamid Badali; Idupulapati Rao; Ali Akbar Saboury; L Jagan Mohan Rao; Hamid Ahmadieh; Ali Montazeri; Gian Paolo Fadini; Daniel Pauly; Sabu Thomas; Ali A Moosavi-Movahed; Asghar Aghamohammadi; Mehrdad Behmanesh; Vafa Rahimi-Movaghar; Saeid Ghavami; Roxana Mehran; Lucina Q Uddin; Matthias Von Herrath; Bahram Mobasher; Nima Rezaei
Journal:  J Transl Med       Date:  2020-05-19       Impact factor: 5.531

6.  Rapid response in the COVID-19 pandemic: a Delphi study from the European Pediatric Dialysis Working Group.

Authors:  Fabian Eibensteiner; Valentin Ritschl; Gema Ariceta; Augustina Jankauskiene; Günter Klaus; Fabio Paglialonga; Alberto Edefonti; Bruno Ranchin; Claus Peter Schmitt; Rukshana Shroff; Constantinos J Stefanidis; Johan Vande Walle; Enrico Verrina; Karel Vondrak; Aleksandra Zurowska; Tanja Stamm; Christoph Aufricht
Journal:  Pediatr Nephrol       Date:  2020-05-17       Impact factor: 3.714

7.  Association of Race With Mortality Among Patients Hospitalized With Coronavirus Disease 2019 (COVID-19) at 92 US Hospitals.

Authors:  Baligh R Yehia; Angela Winegar; Richard Fogel; Mohamad Fakih; Allison Ottenbacher; Christine Jesser; Angelo Bufalino; Ren-Huai Huang; Joseph Cacchione
Journal:  JAMA Netw Open       Date:  2020-08-03

8.  Incidence, clinical course and risk factor for recurrent PCR positivity in discharged COVID-19 patients in Guangzhou, China: A prospective cohort study.

Authors:  Jiazhen Zheng; Rui Zhou; Fengjuan Chen; Guofang Tang; Keyi Wu; Furong Li; Huamin Liu; Jianyun Lu; Jiyuan Zhou; Ziying Yang; Yuxin Yuan; Chunliang Lei; Xianbo Wu
Journal:  PLoS Negl Trop Dis       Date:  2020-08-31

9.  The COronavirus Pandemic Epidemiology (COPE) Consortium: A Call to Action. (Review)

Authors:  Andrew T Chan; David A Drew; Long H Nguyen; Amit D Joshi; Wenjie Ma; Chuan-Guo Guo; Chun-Han Lo; Raaj S Mehta; Sohee Kwon; Daniel R Sikavi; Marina V Magicheva-Gupta; Zahra S Fatehi; Jacqueline J Flynn; Brianna M Leonardo; Christine M Albert; Gabriella Andreotti; Laura E Beane-Freeman; Bijal A Balasubramanian; John S Brownstein; Fiona Bruinsma; Annie N Cowan; Anusila Deka; Michael E Ernst; Jane C Figueiredo; Paul W Franks; Christopher D Gardner; Irene M Ghobrial; Christopher A Haiman; Janet E Hall; Sandra L Deming-Halverson; Brenda Kirpach; James V Lacey; Loïc Le Marchand; Catherine R Marinac; Maria Elena Martinez; Roger L Milne; Anne M Murray; Denis Nash; Julie R Palmer; Alpa V Patel; Lynn Rosenberg; Dale P Sandler; Shreela V Sharma; Shepherd H Schurman; Lynne R Wilkens; Jorge E Chavarro; A Heather Eliassen; Jaime E Hart; Jae Hee Kang; Karestan C Koenen; Laura D Kubzansky; Lorelei A Mucci; Sebastien Ourselin; Janet W Rich-Edwards; Mingyang Song; Meir J Stampfer; Claire J Steves; Walter C Willett; Jonathan Wolf; Tim Spector
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2020-05-05       Impact factor: 4.090

10.  Rapid response to COVID-19: health informatics support for outbreak management in an academic health system.

Authors:  J Jeffery Reeves; Hannah M Hollandsworth; Francesca J Torriani; Randy Taplitz; Shira Abeles; Ming Tai-Seale; Marlene Millen; Brian J Clay; Christopher A Longhurst
Journal:  J Am Med Inform Assoc       Date:  2020-06-01       Impact factor: 7.942
