Data capture and sharing in the COVID-19 pandemic: a cause for concern.

Louis Dron, Vinusha Kalatharan, Alind Gupta, Jonas Haggstrom, Nevine Zariffa, Andrew D Morris, Paul Arora, Jay Park.

Abstract

Routine health care and research have been profoundly influenced by digital-health technologies. These technologies range from primary data collection in electronic health records (EHRs) and administrative claims to web-based, artificial-intelligence-driven analyses. The use of such health technologies increased during the COVID-19 pandemic, driven in part by the availability of these data. In some cases, this has had profound and potentially long-lasting positive effects on medical research and routine health-care delivery. In other cases, high-profile shortcomings have been evident, potentially attenuating the effect of, or reflecting a decreased appetite for, digital-health transformation. In this Series paper, we provide an overview of how facets of health technologies in routinely collected medical data (including EHRs and digital data sharing) have been used for COVID-19 research and tracking, and how these technologies might influence future pandemics and health-care research. We explore the strengths and weaknesses of digital-health research during the COVID-19 pandemic and discuss how lessons from COVID-19 might translate into new approaches in a post-pandemic era.

Year: 2022 PMID: 36150783 PMCID: PMC9489064 DOI: 10.1016/S2589-7500(22)00147-9

Source DB: PubMed Journal: Lancet Digit Health ISSN: 2589-7500

This is the second in a Series of two papers about translating data in a pandemic. All papers in the Series are available at www.thelancet.com/series/translating-data-in-a-pandemic

Introduction

The need for timely, high-quality evidence to inform policy makers' decisions during the COVID-19 pandemic has catalysed the development of innovative digital-health technologies, with variable effects and benefits. Timely communication of crucial information requires the acquisition, access, and analysis of large volumes of data. For example, analysis of electronic health records (EHRs), which capture patient records of routine clinical care in real time, has improved disease surveillance and produced evidence to inform public-health decisions. EHRs can be supplemented with clinical trial data or administrative claims to monitor the effect of novel treatment strategies and vaccines for COVID-19 in almost real time.2, 3 Although using EHRs, observational datasets, public epidemiological data, and clinical trial data to inform public-health policies sounds promising, success during the COVID-19 pandemic has been mixed. Challenges with data capture (including non-standardised data collection, heterogeneous data terminologies, and enduring structural, cultural, and political barriers to data sharing) have led to data gaps and attenuated the potential research effect of these data. To overcome the limitations of individual datasets, and the challenges of larger integrated datasets, several research groups have undertaken data mining for potentially valuable insights into disease processes.4, 5 These challenges in data access and dissemination are not unique to COVID-19 research, although the scale of the data, the need for timely generation and dissemination, and the concurrent global research focus created an unprecedented context for them. Numerous reviews have examined the impact of the COVID-19 pandemic on topics related to digital health, including publication practices, public perception of science, and research funding overall.
In this Series paper, we specifically consider the effect of the pandemic on primary data collection through digital-health technologies, subsequent data sharing for secondary use, and the associated challenges for collaborative international scientific endeavours uncovered by the COVID-19 pandemic.

Routinely collected health data: real-world evidence

Both before and during the COVID-19 pandemic, research using routinely collected health data has focused substantially on the concept of real-world evidence. The US Food and Drug Administration defines this concept as the translation of routinely collected medical data from EHRs, claims and billing activities, disease registries, and other mobile-health disease-monitoring sources into actionable and meaningful evidence. A distinction is made between real-world evidence and real-world data, the latter referring to the primary data captured in the course of medical care. Real-world evidence has been used in research programmes in varied ways: to address disparate topics (such as core epidemiological estimates of disease burden), as regulatory control datasets for comparative efficacy, and to identify routine study procedures for pragmatic clinical trials.10, 11 The uptake of these data has been driven by timely access and standardisation, methodological developments, and regulatory guidance. Existing momentum and expertise enabled rapid engagement with, and dissemination of, real-world evidence for COVID-19 research, as illustrated by the sheer volume of output. By July 7, 2022, 13 395 publications of real-world evidence on COVID-19 had appeared. Restricting this to the first 600 days of the pandemic (up to September 24, 2021), there were 5951 peer-reviewed publications related to real-world data and COVID-19 (figure 1), and a substantial number of research hypotheses were tested using real-world data. We categorised the available studies by assigning a broad research topic to each study on the basis of its principal stated research question, according to prespecified categories of study design, based on consensus between two reviewers (LD and VK). Studies were classified as related to safety and efficacy if they reported on clinical outcome measures, clinical biomarkers of disease, or any safety-related outcomes of interventions to treat or prevent COVID-19.
Research with a primary objective of describing the influence of COVID-19 in a population of patients with existing and known disease status was classified as at-risk population research, whereas studies describing outcomes and their variance (such as time in hospital) among general COVID-19 populations were classified as research on identifying outcomes and trajectories. Studies were classified as risk-factor research if they explored the quantitative relationship between a clinical parameter and associated COVID-19 outcomes.

Figure 1

Number of peer-reviewed publications related to real-world data and COVID-19 published during the first year and a half of the COVID-19 pandemic

The most frequent research questions pertained to the safety and efficacy of potential treatments for COVID-19 (21%), followed by outcomes research in specific at-risk populations (16%), research identifying outcomes and trajectories of patients affected by COVID-19 (15%), and research identifying risk factors for COVID-19 and COVID-19-induced outcomes (11%). Our search might not have captured the full extent of research in this space, because studies incorporating EHRs might not have used the terms "observational" or "real-world", or their synonyms, in their abstract or title. The publications from our search were used to inform our discussions and identify specific cases mentioned herein.

The COVID-19 pandemic has fostered many international collaborative efforts for real-world evidence research. One example is the international Consortium for Clinical Characterization of COVID-19 by EHR (4CE). Within 3 weeks, the consortium gathered data on 27 584 patients diagnosed with COVID-19 between Jan 1, 2020, and April 11, 2020, as well as 187 802 laboratory tests, from 96 hospitals across five countries. This initiative provided early data on disease progression in people with COVID-19. Similarly, descriptive statistics harnessing large-volume EHRs were instrumental in rapidly identifying at-risk populations; for example, the disproportionate morbidity and mortality of Black and Asian people in the UK was reported as early as March, 2020. This finding was subsequently validated by the OpenSAFELY review of over 17 million primary care records in England, which similarly found that Black and South Asian people with COVID-19 had significantly higher mortality rates than white people, even after adjustment for other sociodemographic characteristics.
Other international endeavours include the International COVID-19 Data Alliance, which aims to build an open and trustworthy international research partnership to support a rapid response to COVID-19, and a long-term alliance for making data accessible to health researchers and scientists worldwide. Collaborative approaches to data, technology, public engagement, and governance infrastructure are supporting 12 research projects, enabling data sharing across up to 42 countries. Sponsored projects include population-driven estimates of clinical complications across different COVID-19 waves in South Africa and assessment of vaccine efficacy in marginalised communities in Brazil.

Separate from these initiatives were developments by groups traditionally involved in offering commercial real-world data services. Multiple organisations offer these services, typically partnering with either health networks or insurance providers, depending on the availability of data. These organisations frequently work with regulatory and reimbursement agencies to evaluate treatment patterns in non-randomised settings. These existing networks pivoted towards COVID-19, with commercial vendors of real-world data rapidly advertising services relating to COVID-19 patient data from their health-care networks, predominantly based in North America; there has been little coverage of commercial COVID-19 EHR data in other regions of the world.

Differing health record systems across organisations and jurisdictions, together with localised processes, still posed challenges to using these data for research during the COVID-19 pandemic. Case definition and case ascertainment were initially evolving concepts, with few internationally agreed-upon criteria available. During the early days of the pandemic, EHR systems had no systematic way to identify patients with COVID-19.
To overcome this challenge, an International Classification of Diseases, 10th revision (ICD-10) code for COVID-19 was developed by April 1, 2020. By December, 2020, updated ICD-10 codes for encounters related to COVID-19 (such as encounters for COVID-19 screening, or pneumonia due to COVID-19) had been developed. These advances made identifying patients with COVID-19, and conducting COVID-19-related research using routine health-care data, less confounded by poor data access and heterogeneous disease definitions. Despite these advances, most patients were still cared for outside health-care systems with digitised medical coding, and the burden on health-care systems, alongside the rapid introduction of new codes, might have contributed to incomplete data capture, particularly in the early stages of the pandemic and for marginalised populations.

There have been high-profile examples of routinely collected data generating high-value research outputs through combinations of innovative methodologies and data sources. Much attention has focused on using these large-scale, data-ready formats to assess vaccine effectiveness for COVID-19. One such study applied causal inference methodologies to centrally provided electronic health data on 596 618 individuals who had received a COVID-19 vaccine to determine vaccine effectiveness in Israel. It was one of the first large-scale studies of vaccine effectiveness in real-world settings. Similarly, a study used national linked data, based on routinely collected EHR data from different national databases, on 5·4 million residents of Scotland (approximately 99% of the Scottish population) to prospectively evaluate the real-world effectiveness of vaccination efforts in health-care personnel, using multiple vaccine types and timepoints. With this dataset, data on the differential risk of mortality from the delta variant were available within 6 weeks of the variant's identification.
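To make the coding point concrete, the following is a minimal Python sketch of code-based case identification in an EHR extract. It assumes the ICD-10-CM codes U07.1 (COVID-19), J12.82 (pneumonia due to coronavirus disease 2019), and Z11.52 (encounter for screening for COVID-19); these codes should be checked against the code-set release in use. The `Encounter` record type, the `classify_patients` helper, and all records are hypothetical illustrations, not the method of any study cited here. The sketch also shows why early capture was incomplete: a patient coded only with unspecified pneumonia before a COVID-19 code existed is simply not found.

```python
from dataclasses import dataclass
from datetime import date

# Assumed ICD-10-CM codes; check against the code-set release in use.
CASE_CODES = {"U07.1"}                # COVID-19 diagnosis
RELATED_CODES = {"J12.82", "Z11.52"}  # COVID-19 pneumonia; screening encounter


@dataclass(frozen=True)
class Encounter:
    """One coded encounter from a hypothetical EHR extract."""
    patient_id: str
    encounter_date: date
    icd10_code: str


def classify_patients(encounters):
    """Return (coded cases, patients with only related encounters)."""
    cases = {e.patient_id for e in encounters if e.icd10_code in CASE_CODES}
    related = {e.patient_id for e in encounters if e.icd10_code in RELATED_CODES}
    # A screening or related encounter alone does not make a patient a case.
    return cases, related - cases


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = [
        Encounter("p1", date(2021, 1, 19), "U07.1"),
        Encounter("p1", date(2021, 1, 20), "J12.82"),
        Encounter("p2", date(2021, 2, 3), "Z11.52"),
        # Coded as unspecified pneumonia before a COVID-19 code existed,
        # so code-based identification misses this patient entirely.
        Encounter("p3", date(2020, 3, 10), "J18.9"),
    ]
    cases, related_only = classify_patients(records)
    print("Coded cases:", sorted(cases))
    print("Related encounters only:", sorted(related_only))
```

Keeping the case-defining code separate from related-encounter codes avoids counting screened patients as cases; a real study would also need a case definition that handles uncoded and pre-code encounters.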
Large-scale analyses of linked routine data provide important insights into vaccination efforts for COVID-19 that conventional trials cannot, owing to the very large sample sizes they make possible. Major advances in using routinely collected health data for research have been made over the past 15 years, and this progress has continued during the pandemic. However, a comprehensive checklist is needed to critically appraise research that uses real-world data and thereby promote high-quality research. The regulatory framework for the use of real-world data in clinical research is distinct from that for clinical trials; therefore, the quality-assurance assumptions associated with clinical trials might not translate directly to real-world evidence research. Checklists such as the 4CE checklist for EHR research in COVID-19 have been proposed, and broader tools developed to critically appraise studies that use real-world evidence, such as the STaRT-RWE guidelines and ISPOR-ISPE recommendations, have been suggested for appraising studies that use EHR data. These concepts are not necessarily new. Indeed, the RECORD statement, published in 2015, has been used extensively for studies using observational evidence from routinely collected health data. Whether these criteria will receive international endorsement comparable to that of other critical appraisal tools, such as the CONSORT criteria for clinical trials, is unclear. Currently, medical journals have given little endorsement to standards for publishing research that harnesses EHRs or other real-world data sources. Publication standards could improve reporting of key study characteristics, as noted in other clinical research areas, and this has often been achieved together with groups of journals and their editors. Whether such standards can be applied to EHR-related research remains to be seen, owing to the multifaceted analytical and research questions addressed using real-world data.
Advances in the use of routinely collected health-care data for individual research projects and opportunities have occurred,28, 29 but little success has been noted in integrating these data into clinical trial activities. Pragmatic clinical trials, a design that harnesses the existing capture of patients' health-care data to generate testable hypotheses on health-care interventions, were argued to be uniquely applicable to the emergent nature of COVID-19. However, few published pragmatic trials of COVID-19 management exist. It remains uncertain whether the small number of pragmatic trials on COVID-19 is due to incomplete existing infrastructure, insufficient familiarity with the design-related challenges of such trials, or difficulties in coordinating them. Complications might exist in a pandemic context, with substantial non-capture of medical services administered in systems without electronic data capture, and in contexts in which the health-care system was overwhelmed and organisational structuring of data was deprioritised in favour of patient care.

Routinely collected medical data present a unique opportunity to address health inequities. As we have identified, multiple studies were able to rapidly identify disproportionate burdens of COVID-19-associated hospitalisation and mortality in minority populations.16, 17 Global health inequities are often widened by low visibility of these inequities, and by little engagement with, and active recruitment of, diverse groups into both active clinical trials and retrospective database reviews. As such, the rapid and representative acquisition of routinely collected health data on patients with COVID-19 is one illustrative example of how these data might result in more representative and equitable health research.
Other uses of routinely collected data to address COVID-19 health inequities have focused on the possible role of EHRs in targeting populations or geographies with proportionately lower vaccination uptake. These efforts must be counterbalanced against the potential for research based on real-world evidence to inadvertently worsen these inequities. Data sources used in real-world evidence research are subject to their own selection bias for some patient demographics, owing to disparities in health-care use, and these biases can result in unbalanced representations of patient disease burden. It is important to emphasise that the use of medical services itself might not be the source of these disparities; rather, differences in access to services, which are expressed as differences in use, are likely to be influential. Further, some authors contend that the use of race as an identifier in medical research is itself problematic, owing to its frequent use as a poor surrogate for socioeconomic determinants of health. Indeed, this problem could be further compounded in routinely collected health data, in which the recording of patients' ethnicity and race has been noted to be suboptimal and non-representative. Accordingly, developing recording structures and reporting standards that acknowledge and articulate health inequities and racial biases in medical research might help to contextualise the health research generated for marginalised populations.

The generation of these data has proven to be an essential element of research insights in the COVID-19 pandemic, yet their capture, reporting, and conversion into meaningful and interpretable outputs are conditional on two crucial aspects: the ability to share insights with both the general public and other researchers, and the tools with which the data are analysed. We therefore describe the importance of data sharing and its role in the COVID-19 pandemic.

Data sharing for COVID-19 research

Medical research data are multifaceted in the way they are captured, spanning sources such as qualitative surveys, wearable medical devices, clinical trials (efficacy and safety data), routinely collected health data, and audits. Accordingly, the ways in which data are shared are highly variable, ranging from direct transfer of raw medical data on individual patients to printed summaries of clinical trials published in medical journals or press releases. Data shared in these ways enable and facilitate further research, and provide quality assurance if required. Data, regardless of capture method and primary type, are shared and assessed in a multitude of formats, ranging from imaging data to highly ordered clinical codes and numerical outputs of laboratory assessments. Even within these subformats, specifications vary with regard to data standards and the software required to read and use the data. Accordingly, challenges exist both in sharing and in using core medical data (table).

Table

Summary of the barriers to, and challenges of, research using digital health technology, the solutions implemented during the COVID-19 pandemic and future, long-term solutions

Barrier or challenge	Solutions implemented during the COVID-19 pandemic	Solutions to improve future digital-health technology research
Routinely collected health-care data
Inadequate standardisation, including administrative codes for data capture and comparability	Adoption of shared administrative codes for COVID-19 and related conditions³²	Adoption of comprehensive checklists for research from these data by authors and scientific journals
Variability in uptake and availability for research use across different geographical settings	Screening of EHRs to identify populations at high risk of COVID-19¹⁸,²⁸	Adoption of community-driven solutions and collaboration with researchers and community leaders for EHR research
Problems with accessibility of technology platforms due to cost and insufficient technology infrastructure	Causal inference methods applied to real-world observational data³⁴	Implementation of standardised data capture, dictionaries, and technology systems
Absence of unique patient identifiers resulting in potential duplicated patient records	Applications of real-time predictive analytics for in-hospital mortality⁴⁷	Educational foundations to improve researcher and reader literacy in associated methods and limitations
Data sharing
Absence of organisational support, staff, and incentives	Rapid dissemination of annotated imaging data⁴⁸,⁴⁹	Building good-quality and accessible common data infrastructures for scientific communities
Infrequent audit and enforcement of data sharing by scientific journals and governing bodies	Rapid dissemination of disease models alongside associated codes and datasets⁵⁰	Continued mandate and reinforcement of data reporting in trial registries and other forms of data sharing by regulators and scientific journals
Multiplicity in data-sharing avenues, increasing the burdens on data collectors	..	Establishment of quasi-automated data pipelines and review processes

EHRs=electronic health records. Concepts in the table were informed by the literature search, in conjunction with discussions with owners of trial data, research funding organisations, and data scientists.

Although the most detailed form of data sharing would involve individual patient-level data, data are most frequently shared in publications as aggregate summary-level data. The framework for sharing observational data, such as EHRs, is not as clear as the framework for sharing clinical trial data, and varying formats have been proposed. Several coordinated efforts have been made over the past decades to improve data flows and sharing. These include the adoption of the FAIR (findable, accessible, interoperable, and reusable) principles, a framework to improve data sharing across scientific disciplines that focuses on four domains of data properties. Separately, existing data-sharing platforms have improved their user interfaces and project-request pipelines. In 2017, the International Committee of Medical Journal Editors began requiring data-sharing statements for reports of clinical trials, and many funding bodies for medical research around the world have required data sharing from the clinical research they fund. Research participants could also be advocates for data sharing, with previous research indicating that a majority are willing to share their data with both the public and private sectors. Although many might wish for better data sharing, this is not true of all individuals involved in research, and gaps exist across geographies, age groups, and other sociodemographic factors. Regardless of these broad characteristics, individuals might not trust existing safeguards or the research communities using these data.
As such, without appropriate engagement of patient groups to ensure that individuals are better informed about how their data will be used for research, there is potential to exacerbate these existing gaps and perpetuate health inequities. In particular, challenges exist for medical data shared at the individual patient level. In interviews with clinical trialists, Rathi and colleagues found that clinicians and researchers involved in clinical trials were concerned about potentially misleading secondary analyses, implications for future planned research activities on the primary data, the financial and workload burdens of data sharing, and the ethics approval or regulatory implications of data sharing. These concerns have been echoed in systematic reviews on sharing individual patient data, which additionally note concerns over reidentification of research participants. These challenges are intensified when considering global health inequities in data-sharing exercises, in which resources that facilitate transfers might be scarcer (eg, in low-income and middle-income countries), alongside concerns over extractive research practices. There is, therefore, an intrinsic balance to be struck between the capacity to share individual patient data to facilitate novel and meaningful research and the burdens placed on principal investigators and research participants. This balance becomes increasingly difficult to strike in a global pandemic, as the relative benefit of novel research might be offset by the rapid pace of data generation and the subsequent pathways to data-sharing agreements. Despite substantial support for improving data sharing during the COVID-19 pandemic, sharing clinical trial data remained challenging.
Although international registries have seen close to 3000 interventional clinical trials registered for COVID-19, only 121 articles on drug treatments for COVID-19 had been published as of April 25, 2021. Despite large trial registration numbers, less than 4% of trials had been published by that date, and less than 2% of registered trials had results available in the associated trial registries. An unknown percentage of these trials have established data-sharing agreements, although the low percentage of available evidence indicates that such agreements might cover only a small part of the total evidence generated to date by COVID-19 trials. Of 132 545 studies registered on the WHO International Clinical Trials Registry Platform between January, 2019, and December, 2020, 11·2% stated that individual patient data would be shared. COVID-19 studies on this platform had rates of stated data sharing similar to those of studies of other, non-pandemic diseases in 2020 (13·7%). Within the context of the COVID-19 pandemic, centralised, sponsor-driven primary data sharing remains absent. As of July 7, 2022, 8012 studies registered on ClinicalTrials.gov, comprising both observational and interventional studies, were identified as associated with COVID-19 (figure 2). Across all registered COVID-19 research studies, 259 (3·2%) had their results posted on the platform. This is despite 4239 (52·9%) studies having completion dates listed as Dec 31, 2021, or earlier (figure 2). There are multifaceted explanations for why data are not shared in this way, despite the legal requirement in the USA to report summary results on this platform for applicable clinical trials, including (but not limited to) unawareness of the legal requirements, poorer than anticipated trial results, trial misconduct, substantial under-recruitment, and language barriers.
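The ClinicalTrials.gov proportions reported above can be rechecked with a few lines of Python. The counts are those given in the text; the `percentage` helper is a hypothetical name used only for this sketch, and rounding to one decimal place mirrors the text's reporting.

```python
def percentage(count: int, total: int, decimals: int = 1) -> float:
    """Share of `total` represented by `count`, as a rounded percentage."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    return round(100 * count / total, decimals)


# Counts reported in the text for COVID-19 studies on ClinicalTrials.gov
# (as of July 7, 2022).
REGISTERED = 8012
RESULTS_POSTED = 259
COMPLETED_BY_END_2021 = 4239  # completion date on or before Dec 31, 2021

print(percentage(RESULTS_POSTED, REGISTERED))         # 3.2
print(percentage(COMPLETED_BY_END_2021, REGISTERED))  # 52.9
```

Both figures use all 8012 registered studies as the denominator, as the text does. Dividing 259 by 4239 instead would assume that every study with posted results also had an early completion date, which the text does not state.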

Figure 2

Data sharing of registered clinical trials investigating COVID-19 from CT.gov

CT.gov=ClinicalTrials.gov

For researchers involved in clinical research integrating the data discussed in this Series paper, the scarcity of data sharing during the COVID-19 pandemic is perhaps not a surprise, but it is a disappointment. Data sharing requires resources and investment. Data-sharing and governance agreements must be made between the organisations involved in transferring data from one organisation to another. When international research is conducted, multiple legal frameworks might need to be adhered to, creating challenges when synthesising data assets. Further, data transfer can involve many branches, from the originating source to specific external groups, and then from those groups to others, depending on the data being organised. Organising data for sharing purposes, as opposed to for the originally defined research question, requires dedicated personnel time, both from the body providing the data and from the organisation receiving and disseminating them. Simultaneously, there are recognised sociopolitical barriers to data sharing both within and between institutions; although these are potentially among the most substantial barriers to data sharing, they are also among the most difficult to quantify and eradicate, even in a global pandemic. Although there are many metaregistries (clinical data registries that house or link data from multiple, individually unique clinical data sources), the absence of a common data dictionary across individual data sources can restrict data linkage and thus creates the need for staff time and resources to reconcile discordant definitions.
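The metaregistry point, that discordant definitions across sources consume staff time, can be illustrated with a minimal Python sketch that maps source-specific field names and local codes onto a common data dictionary. The dictionary, the two sources (`site_a`, `site_b`), their local codes, and the `harmonise` helper are all hypothetical; in practice, harmonisation would more likely target an established common data model than an ad hoc dictionary. Anything that cannot be mapped is returned for manual review, which is where the staff time described above is spent.

```python
# Hypothetical common data dictionary: shared variable -> permitted values.
COMMON_DICTIONARY = {
    "sex": {"female", "male", "unknown"},
    "covid_status": {"confirmed", "suspected", "negative"},
}

# Hypothetical per-source mappings of local field names and local codes.
SOURCE_MAPPINGS = {
    "site_a": {
        "fields": {"gender": "sex", "covid": "covid_status"},
        "values": {
            "sex": {"F": "female", "M": "male", "U": "unknown"},
            "covid_status": {"POS": "confirmed", "PROB": "suspected", "NEG": "negative"},
        },
    },
    "site_b": {
        "fields": {"sex_at_birth": "sex", "sars_cov_2_result": "covid_status"},
        "values": {
            "sex": {"1": "male", "2": "female", "9": "unknown"},
            "covid_status": {"detected": "confirmed", "not detected": "negative"},
        },
    },
}


def harmonise(record: dict, source: str) -> tuple[dict, list[str]]:
    """Map one source record onto the common dictionary.

    Returns the harmonised record and the local fields that could not be
    mapped and therefore need manual review.
    """
    mapping = SOURCE_MAPPINGS[source]
    harmonised, needs_review = {}, []
    for local_field, raw_value in record.items():
        shared_field = mapping["fields"].get(local_field)
        if shared_field is None:
            needs_review.append(local_field)  # no agreed equivalent field
            continue
        value = mapping["values"][shared_field].get(str(raw_value))
        if value not in COMMON_DICTIONARY[shared_field]:
            needs_review.append(local_field)  # discordant or unknown local code
            continue
        harmonised[shared_field] = value
    return harmonised, needs_review


if __name__ == "__main__":
    print(harmonise({"gender": "F", "covid": "POS"}, "site_a"))
    print(harmonise({"sex_at_birth": 2, "sars_cov_2_result": "inconclusive", "ward": "ICU"}, "site_b"))
```

The second call shows the typical outcome: the sex field maps cleanly, but an unmapped local result code and a field with no shared equivalent are both routed to manual review.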
Additionally, individual researchers could be concerned that reanalysis of their data for alternative research questions might lead to conclusions that contradict their original statistical analyses, and hence might be less willing to share their data. Finally, there might be commercial or financial reasons to restrict access to data, including proprietary information or data that could subsequently be monetised by groups incorporating them into their analyses. However, several data-sharing initiatives have been organised through existing bodies and new entities in attempts to facilitate data sharing.66, 67, 68 For example, F1000Research is an open research publishing platform that offers peer review, data deposition, and sharing of open research following the FAIR principles. Vivli is another data-sharing initiative for clinical research data. As of April 22, 2021, this global clinical research data-sharing platform offered 21 COVID-19 studies eligible for review; however, this represents only 0·74% of the trials registered on ClinicalTrials.gov.61, 66 This is despite widespread acknowledgment of the importance of data sharing, particularly for emerging diseases with high unmet needs. Unlike for clinical trials, data sharing for EHRs and other non-trial data is less well established. Many of the existing data-sharing platforms are exclusive to clinical trial data. These platforms usually do not use the Observational Medical Outcomes Partnership (OMOP) format, which could be more compatible with the clinical trial data standards set by the Clinical Data Interchange Standards Consortium. Despite this drawback, sharing of non-trial data in COVID-19 has been promoted and distributed by several pre-existing and new non-commercial groups.15, 70, 71

Reproducibility is a fundamental concern in clinical research, particularly for digital-health technologies.
In the context of COVID-19, high-profile instances of fraudulent research activities73, 74 and increased public engagement with emerging research mean that timely, independently conducted, reproducible analytics are more important than ever. This issue is particularly pressing when integrating into clinical research digital-health technologies that harness and adapt to continually updated data, such as artificial-intelligence and machine-learning methodologies. In particular, the research challenges that lend themselves to artificial-intelligence and machine-learning methodologies and data capture offer greater opportunities for data sharing than other primary data-generation activities (such as in-vitro assays or psychological research). These opportunities arise because these methods require high volumes of data, which in turn can drive collaborative efforts to optimise the time invested in data acquisition and curation. By reducing traditional barriers to data-sharing access for the purposes of reproducibility, research harnessing digital-health technologies can become both more timely and more reliable. Beyond reproducibility, timely transfer of both aggregate data and individual patient data is another issue that needs to be addressed. Traditional timelines for data sharing do not allow timely data transfer during a health crisis such as a pandemic. There is no single timeline for data sharing; the timeline depends on the level of data shared (ie, individual patient vs aggregate), the data types handled (eg, summary group statistics vs original imaging data), and the existing tools for sharing data.64, 65 Research communities can improve responsiveness for future pandemics and medical research by modifying technological and legal frameworks to include regular audits and better enforcement of data sharing.
Auditing and enforcing data sharing are not without historical precedent. On April 28, 2021, the US Food and Drug Administration issued a notice of non-compliance to a trial sponsor that had failed to submit summary results to ClinicalTrials.gov within the federally mandated reporting period. Although this example pertains to aggregate data sharing on a particular platform, similar frameworks could be envisioned for other data types, particularly when pressing health crises necessitate expedited timelines. Guidance without enforcement or audit is subject to substantial variability in adherence and is more challenging to monitor. On the basis of the findings above, the current pace of reporting is insufficient. Deciding on appropriate timeframes for data transfer is challenging, owing to the multifaceted nature of clinical research. Rather than fixed timeframes, a pragmatic system of active encouragement and ingrained funding opportunities might help. Such a system would incentivise research groups, provide resources for data sharing, and could improve data-sharing agreements for future health emergencies. Beyond clinical data, the sharing and dissemination of aggregated health-system metrics often attracted high-profile interest atypical of academic sources. This was particularly true of case, hospitalisation, and vaccination tracking dashboards, such as the Johns Hopkins University interactive web tracker. In addition to its academic impact and citations, the tracker's data were published in a shareable format via a GitHub repository, which has itself been used in reporting by multiple media outlets. Pivotal to the success of this web tracker were its rapid publication (in February, 2020) and the provision of its data in a free, accessible, and standardised format. Similarly, key molecular data have been shared rapidly and with enormous success. For example, the full genomic characterisation of the SARS-CoV-2 virus was published on Jan 30, 2020.
This timely data sharing has been instrumental in the preclinical characterisation of potential drug targets and in vaccine development. Although clinical and genomic data differ in several respects, the framework for developing and disseminating genomic data across large study teams could prove informative for future digital-health applications seeking to improve data sharing. For example, sequencing of SARS-CoV-2 variants has been facilitated through several viral sequencing initiatives, including GISAID, Nextstrain, and Pango. These activities have provided valuable data for in-silico research.

Conclusion

With COVID-19, there has been growing anticipation of paradigm shifts and fundamental overhauls to medical research, particularly in the application of digital-health technologies. Enormous shifts to the routine research process are evidenced by the rapid development and implementation of diagnostic and prognostic criteria, interventions to minimise patient burden, and global vaccination programmes. The data and data-sharing methods covered in this Series paper share a long history before COVID-19, during which they awaited widespread acceptance and routine integration into health-care research programmes. In some cases, incremental gains over the preceding years have translated into tangible benefits, yielding meaningful and actionable tools and data to help manage the pandemic. In other cases, existing barriers (technological, sociological, and operational) have reduced the effect of these digital-health technologies. A key question remains: if the imperative of the COVID-19 pandemic was insufficient to advance data collection and sharing efforts, are the issues identified intractable? It is often stated that necessity is the mother of invention. However, it is now uncertain how these innovations can be sustained in the post-pandemic era, when existing barriers might become more entrenched or be considered insurmountable. A rare opportunity therefore exists, while the lessons learned remain current, to translate the successes and shortcomings of digital tools in COVID-19 into meaningful, sustainable, and equitable tools for transformational change.

Search strategy and selection criteria

To identify relevant studies related to electronic health records (EHRs) and COVID-19, we searched the MEDLINE bibliographic database on July 7, 2022. We used the terms “((COVID 19 or COVID-19 or 2019-nCoV Infection$ or 2019 nCoV or coronavirus or COVID19) and (Observational Study/ or observational stud* or real world or real-world or real life or real-life))” for articles published after Jan 1, 2019. Full-text English-language reports and their reference lists were reviewed to gain insights into the different types of EHR research being conducted during the COVID-19 pandemic.

Declaration of interests

LD reports personal fees from the Canadian Agency for Drugs and Technologies in Health (CADTH). JH reports personal fees from the Bill & Melinda Gates Foundation. LD, JH, and PA report stock options in Cytel, which provides statistical consulting on topics including real-world evidence, outside the submitted work. NZ is Strategic Advisor to the US Food and Drug Administration (FDA), in the Office of the Commissioner: RWD & COVID-19. This Series paper reflects NZ's personal views, not those of the FDA. NZ also reports personal fees from the Bill & Melinda Gates Foundation, AstraZeneca, Genentech, Bristol Myers Squibb (BMS), ANOVA, ZS Associates, and from the FDA outside the submitted work. No entity funded NZ's work on this Series paper. NZ is a member of the executive committee of the International COVID-19 Data Alliance (ICODA). NZ is a founder of the NMD Group. NZ holds varying levels of stock equity in AstraZeneca, GlaxoSmithKline (GSK), Johnson & Johnson (J&J), Merck, Moderna, Pfizer, Sanofi, Takeda, TranslateBio, Vaxart, Vir, and Inovio, and holds stock options in ANOVA. ADM is a member of the executive committee of ICODA. ADM was a co-founder of Aridhia Informatics and a non-executive director from 2007 to 2015, and holds less than 2% stock equity. All other authors declare no competing interests.

References

1. Pandemic publishing poses a new COVID-19 challenge.

Authors: Adam Palayew; Ole Norgaard; Kelly Safreed-Harmon; Tue Helms Andersen; Lauge Neimann Rasmussen; Jeffrey V Lazarus
Journal: Nat Hum Behav Date: 2020-07

2. Pandemic: public feeling more positive about science.

Authors: Eric Allen Jensen; Eric B Kennedy; Ethan Greenwood
Journal: Nature Date: 2021-03

3. Uptake and Accuracy of the Diagnosis Code for COVID-19 Among US Hospitalizations.

Authors: Sameer S Kadri; Jake Gundrum; Sarah Warner; Zhun Cao; Ahmed Babiker; Michael Klompas; Ning Rosenthal
Journal: JAMA Date: 2020-12-22

4. The Role of the Electronic Medical Record in the Intensive Care Unit Nurse's Detection of Patient Deterioration: A Qualitative Study.

Authors: Laurel A Despins; Bonnie J Wakefield
Journal: Comput Inform Nurs Date: 2018-06

5. The challenge of using routinely collected data to compare hospital admission rates by ethnic group: a demonstration project in Scotland.

Authors: S Knox; R S Bhopal; C S Thomson; A Millard; A Fraser; L Gruer; D Buchanan
Journal: J Public Health (Oxf) Date: 2020-11-23

6. The time is now: role of pragmatic clinical trials in guiding response to global pandemics.

Authors: Aws Almufleh; Jacob Joseph
Journal: Trials Date: 2021-03-24

7. How COVID-19 has fundamentally changed clinical research in global health.

Authors: Jay J H Park; Robin Mogg; Gerald E Smith; Etheldreda Nakimuli-Mpungu; Fyezah Jehan; Craig R Rayner; Jeanine Condo; Eric H Decloedt; Jean B Nachega; Gilmar Reis; Edward J Mills
Journal: Lancet Glob Health Date: 2021-05

8. BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Mass Vaccination Setting.

Authors: Noa Dagan; Noam Barda; Eldad Kepten; Oren Miron; Shay Perchik; Mark A Katz; Miguel A Hernán; Marc Lipsitch; Ben Reis; Ran D Balicer
Journal: N Engl J Med Date: 2021-02-24

9. Development and evaluation of rapid data-enabled access to routine clinical information to enhance early recruitment to the national clinical platform trial of COVID-19 community treatments.

Authors: Caroline Cake; Emma Ogburn; Heather Pinches; Garry Coleman; David Seymour; Fran Woodard; Sinduja Manohar; Marjia Monsur; Martin Landray; Gaynor Dalton; Andrew D Morris; Patrick F Chinnery; F D Richard Hobbs; Christopher Butler
Journal: Trials Date: 2022-01-20

10. The international Perinatal Outcomes in the Pandemic (iPOP) study: protocol.

Authors: Sarah J Stock; Helga Zoega; Meredith Brockway; Rachel H Mulholland; Jessica E Miller; Jasper V Been; Rachael Wood; Ishaya I Abok; Belal Alshaikh; Adejumoke I Ayede; Fabiana Bacchini; Zulfiqar A Bhutta; Bronwyn K Brew; Jeffrey Brook; Clara Calvert; Marsha Campbell-Yeo; Deborah Chan; James Chirombo; Kristin L Connor; Mandy Daly; Kristjana Einarsdóttir; Ilaria Fantasia; Meredith Franklin; Abigail Fraser; Siri Eldevik Håberg; Lisa Hui; Luis Huicho; Maria C Magnus; Andrew D Morris; Livia Nagy-Bonnard; Natasha Nassar; Sylvester Dodzi Nyadanu; Dedeke Iyabode Olabisi; Kirsten R Palmer; Lars Henning Pedersen; Gavin Pereira; Amy Racine-Poon; Manon Ranger; Tonia Rihs; Christoph Saner; Aziz Sheikh; Emma M Swift; Lloyd Tooke; Marcelo L Urquia; Clare Whitehead; Christopher Yilgwan; Natalie Rodriguez; David Burgner; Meghan B Azad
Journal: Wellcome Open Res Date: 2021-02-02