Arianna Dagliati, Alberto Malovini, Valentina Tibollo, Riccardo Bellazzi.
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has clearly shown that major challenges and threats for humankind need to be addressed with global answers and shared decisions. Data and their analytics are crucial components of such decision-making activities. Interestingly, one of the most difficult aspects is the reuse and sharing of accurate and detailed clinical data collected by Electronic Health Records (EHR), even though these data are of paramount importance. EHR data, in fact, are not only essential for supporting day-to-day activities, but can also be leveraged for research and support critical decisions about the effectiveness of drugs and therapeutic strategies. In this paper, we focus on collaborative data infrastructures to support COVID-19 research and on the open issues of data sharing and data governance that COVID-19 has brought to light. Data interoperability, healthcare process modelling and representation, shared procedures to deal with different data privacy regulations, and data stewardship and governance are seen as the most important aspects to boost collaborative research. Lessons learned from the COVID-19 pandemic can be a strong element to improve international research and our future capability of dealing with fast-developing emergencies and needs, which are likely to become more frequent in our connected and intertwined world.
Keywords: COVID-19 pandemic; Electronic Health Record; clinical research; data sharing; international initiatives
Year: 2021 PMID: 33454728 PMCID: PMC7929411 DOI: 10.1093/bib/bbaa418
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Collaborative infrastructures
| Name | Resource Link | Founders | Description | Accessibility | Format |
|---|---|---|---|---|---|
| 4CE | | 4CE: Consortium for Clinical Characterization of COVID-19 by EHR | International consortium for EHR data-driven studies. The goal is to inform doctors, epidemiologists and public about COVID-19 patients with data acquired through the healthcare process | Free download of aggregated data provided strictly for research purposes | Files available in csv format |
| AWS data lake | | Amazon AWS | Hosted on the AWS cloud, this curated data lake contains useful datasets such as COVID-19 case tracking data from The New York Times, COVID-19 testing data from the COVID Tracking Project, hospital bed availability from Definitive Healthcare, health survey data from the Delphi Research Group and research data from over 45 000 articles about COVID-19 and related coronaviruses from the Allen Institute for AI | It requires an Amazon AWS account | Unstructured data |
| C3.ai data lake | | C3.ai | Daily case reports, epidemiology line lists, genomic sequences of COVID-19 nucleotide and protein samples, collection of journal articles, repositories of clinical assets related to COVID-19 such as CT and X-ray lung images, patient test results, vaccine coverage, active therapeutics and clinical trials, data sources on movement trends during COVID-19 in different geographies around the world, collection of actions and policies taken by government and regulatory bodies to address COVID-19 | Upon registration | Unstructured data |
| CORD-19 | | The Semantic Scholar team, Allen Institute for AI | A free resource of >130 000 scholarly articles | Free download corpus | Annotated corpus |
| CORONANet | | NYU Abu Dhabi, Hochschule für Politik at the TU Munich, Yale University | The data yields detailed information on the level of government responding to the COVID-19 crisis with focus on specific actions taken and the geographical areas targeted by these measures | Free download | Files available in csv format |
| Coronavirus Disease Dashboard | | WHO | Trends over time and querying and retrieving information about epidemic summary statistics by country | Free download | Files available in csv format |
| COVID-19 research database | | Public–private consortium (Datavant, Health Care Cost Institute, Medidata, Mirador Analytics, Veradigm, Change Healthcare, Snowflake) | The database includes de-identified and limited datasets from medical and pharmacy claims data, EHR data, mortality data and consumer data. More information, including coverage, data dictionaries and update frequency is available on our knowledge base for approved researchers | It requires registration. The database can be accessed by academic, scientific and medical researchers conducting real-world data studies related to COVID-19. Although researchers may come from any sector, only non-profit, non-commercial projects related to COVID-19 or pandemics will be considered. All results must be made publicly available, preferably through peer-reviewed publications | – |
| EU Open Data Portal | | European Centre for Disease Prevention and Control | The dataset contains the latest available public data on COVID-19, including a daily situation update, the epidemiological curve and the global geographical distribution (EU/EEA and the UK, worldwide). The updates come from EU/EEA countries through the Early Warning and Response System (EWRS), The European Surveillance System (TESSy), the World Health Organization (WHO) and email exchanges with other international stakeholders | Free download for daily situation update. Access to TESSy data for individuals nominated by the EU/EEA countries, European Commission, EU bodies, international organizations and other entities, following the TESSy nomination procedure. | Files available in csv, xlsx, json, xml formats |
| European Data Portal | | European Union | Collection of datasets that are directly or indirectly related to COVID-19, organized in categories. It includes epidemiological data, patient-level and population-level data, data related to the impact on lifestyle (food price monitor, slaughtered bovine animals) | Freely accessible URL to the datasets resource | URL to the datasets resource |
| Microsoft data lake | | Microsoft | COVID-19-related datasets from various sources, covering testing and patient outcome tracking data, social distancing policy, hospital capacity, mobility, etc. | Free download | Files available in csv and json formats |
| National COVID Cohort Collaborative (N3C) | | NCATS (CTSA, Clinical and Translational Science Awards) Program hubs, the National Center for Data to Health (CD2H) | It contains real-world data from patients who were tested for COVID-19 or whose symptoms are consistent with COVID-19, as well as data from individuals infected with pathogens such as SARS 1, MERS and H1N1, which can support comparative studies | Under an approved Data Use Agreement with NCATS, anyone can access N3C data after receiving approval for their Data Use Request. N3C users can include, but are not limited to, non-profit or not-for-profit organizations, federal, state and local health departments, researchers from industry and citizen scientists. Access is dependent on the level of data being requested, and IRB approval may be needed | OMOP CDM |
| OHDSI-CHARYBDIS | | The Observational Health Data Sciences and Informatics (OHDSI) | It contains baseline demographic, clinical characteristics, treatments and outcomes of interest among individuals tested for SARS-CoV-2 and/or diagnosed with COVID-19 overall and stratified by sex, age and specific comorbidities. As more data becomes available, it will include additional databases that are formatted to the OMOP-CDM. There are now 13 databases from three different continents and from seven European countries | The analytical code is available at | OMOP CDM |
| UK Biobank | | University of Oxford | It contains data about the health and well-being of 500 000 volunteer participants. It is now receiving COVID-19 test data for UK Biobank participants in England and more frequent updates of data on deaths, inpatient hospital admissions, including intensive care and primary care data | Upon registration (approved UK Biobank researcher) | – |
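
For readers who want to query the table programmatically, the following is a minimal sketch of one way to encode a few of its rows and filter them by access level and data format. The `Resource` class, the `find` helper and the three access labels (`open`, `registration`, `approval`) are illustrative assumptions that simplify the Accessibility column; they are not part of any of the listed infrastructures. Rows whose access terms are incomplete in the table (such as OHDSI-CHARYBDIS) are left out on purpose.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Resource:
    """One row of the table, reduced to name, access level and formats."""
    name: str
    access: str                 # assumed labels: "open", "registration", "approval"
    formats: Tuple[str, ...]    # lowercase format tags taken from the Format column


# A hand-encoded subset of the table; the access labels are a simplification.
CATALOG = [
    Resource("4CE", "open", ("csv",)),
    Resource("CORONANet", "open", ("csv",)),
    Resource("Coronavirus Disease Dashboard", "open", ("csv",)),
    Resource("Microsoft data lake", "open", ("csv", "json")),
    Resource("C3.ai data lake", "registration", ("unstructured",)),
    Resource("N3C", "approval", ("omop-cdm",)),
]


def find(catalog: List[Resource],
         access: Optional[str] = None,
         fmt: Optional[str] = None) -> List[str]:
    """Return the names of resources matching the given access level and format.

    A criterion set to None is ignored, so find(catalog) returns every name.
    """
    return [
        r.name
        for r in catalog
        if (access is None or r.access == access)
        and (fmt is None or fmt in r.formats)
    ]
```

Calling `find(CATALOG, access="open", fmt="csv")` would list the freely downloadable CSV resources in this subset, while `find(CATALOG, fmt="omop-cdm")` would list those distributed in the OMOP Common Data Model.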