
Clinical Research Informatics: Contributions from 2018.

Abstract

OBJECTIVES: To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select best papers published in 2018.
METHOD: A bibliographic search using a combination of MeSH descriptors and free-text terms on CRI was performed using PubMed, followed by a double-blind review to select a list of candidate best papers to be subsequently peer-reviewed by external reviewers. After the peer-review ranking, a consensus meeting of the editorial team was held to finalize the selection of best papers.
RESULTS: Among the 1,469 retrieved papers published in 2018 in the various areas of CRI, the full review process selected four best papers. The first best paper describes a simple algorithm detecting co-morbidities in Electronic Healthcare Records (EHRs) using a clinical data warehouse and a knowledge base. The authors of the second best paper present a federated algorithm for predicting heart failure hospital admissions based on patients' medical history described in their distributed EHRs. The third best paper reports the evaluation of an open source, interoperable, and scalable data quality assessment tool measuring completeness of data items, which can be run on different architectures (EHRs and Clinical Data Warehouses (CDWs) based on PCORnet or OMOP data models). The fourth best paper reports a data quality program conducted across 37 hospitals addressing data quality issues through the whole data life cycle from patient to researcher.
CONCLUSIONS: Research efforts in the CRI field currently focus on consolidating the promises of early Distributed Research Networks aimed at maximizing the potential of large-scale, harmonized data from diverse, quickly developing digital sources. Data quality assessment methods and tools as well as privacy-enhancing techniques are major concerns. It is also notable that, following examples in the US and Asia, ambitious regional or national plans are being launched in Europe that aim to develop big data and new artificial intelligence technologies, to contribute to the understanding of health and diseases in whole populations and whole health systems, and to return actionable feedback loops that improve existing models of research and care. The use of "real-world" data is continuously increasing, but the ultimate role of these data in clinical research remains to be determined. Georg Thieme Verlag KG Stuttgart.

Year: 2019 PMID: 31419833 PMCID: PMC6697501 DOI: 10.1055/s-0039-1677921

Source DB: PubMed Journal: Yearb Med Inform ISSN: 0943-4747

Introduction

Within the 2019 International Medical Informatics Association (IMIA) Yearbook, the Clinical Research Informatics (CRI) section aims at providing an overview of research trends from 2018 publications that demonstrate the progress in multifaceted aspects of medical informatics supporting the life-cycle of clinical trials as well as the ever-growing use of “real-world” data. New methods, tools, and CRI systems have been developed to collect, integrate, and mine healthcare data for better care. The CRI community has especially addressed the important challenges of evaluating the impact of “new artificial intelligence technologies”, this year’s special theme of the IMIA Yearbook.

About the Paper Selection

A comprehensive review of articles published in 2018 and addressing a wide range of CRI issues was conducted. The selection was performed by querying MEDLINE via PubMed (from NCBI, the National Center for Biotechnology Information) with a set of predefined MeSH descriptors and free-text terms: Clinical research informatics, Biomedical research, Nursing research, Clinical research, Medical research, Pharmacovigilance, Patient selection, Phenotype, Genotype-phenotype associations, Feasibility studies, Eligibility criteria, Feasibility criteria, Cohort selection, Patient recruitment, Clinical trial eligibility screening, Eligibility determination, Patient-trial matching, Protocol feasibility, Real world evidence, Data Collection, Epidemiologic research design, Clinical studies as Topic, Multicenter studies as Topic, and Evaluation studies as Topic. Papers addressing topics of other sections of the Yearbook, such as Translational Bioinformatics, were excluded using predefined exclusion MeSH descriptors such as Genetic research, Gene ontology, Human genome project, Stem cell research, or Molecular epidemiology. Bibliographic databases were searched on January 30, 2019 for papers published in 2018, based on the electronic publication date. Of an original set of 1,468 references, 1,019 papers were selected as being in the scope of CRI, and their scientific quality was blindly rated as low, medium, or high by the two section editors based on each paper’s title and abstract.
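The search strategy above can be sketched as a Boolean PubMed query submitted through the NCBI E-utilities `esearch` endpoint. This is a minimal illustration under stated assumptions, not the editors' actual query: the term subsets, the field tag chosen for each term (`[MeSH Terms]` or `[tiab]`), and the helper names `build_query` and `esearch_url` are hypothetical, and the `pdat` (publication date) filter only approximates the electronic publication date the editors used. The code builds the URL but sends no request.

```python
from urllib.parse import urlencode

# Hypothetical subsets of the search terms listed above.
MESH_TERMS = ["Biomedical Research", "Patient Selection", "Eligibility Determination"]
FREE_TEXT_TERMS = ["clinical research informatics", "patient recruitment", "real world evidence"]
EXCLUDED_MESH = ["Genetic Research", "Gene Ontology", "Molecular Epidemiology"]

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(mesh, free_text, excluded):
    """OR together the inclusion terms, then subtract the excluded MeSH descriptors."""
    included = [f'"{t}"[MeSH Terms]' for t in mesh] + [f'"{t}"[tiab]' for t in free_text]
    query = "(" + " OR ".join(included) + ")"
    if excluded:
        query += " NOT (" + " OR ".join(f'"{t}"[MeSH Terms]' for t in excluded) + ")"
    return query


def esearch_url(query, year):
    """Build an esearch URL restricted to one calendar year of publication dates."""
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",  # assumption: approximates the electronic publication date
        "mindate": f"{year}/01/01",
        "maxdate": f"{year}/12/31",
        "retmax": 10000,
    }
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    q = build_query(MESH_TERMS, FREE_TEXT_TERMS, EXCLUDED_MESH)
    print(q)
    print(esearch_url(q, 2018))
```

Keeping query construction separate from the HTTP call makes the exact Boolean expression easy to log and reproduce alongside the search date.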
Eighty-four references rated as medium- or high-quality contributions to the field by at least one of the section editors were classified into the following eleven dimensions/sub-areas of the CRI domain: observational studies; reuse of electronic health record (EHR) data; data integration and semantic interoperability; feasibility studies; patient recruitment; data management and CRI systems; data/text mining and algorithms; data quality assessment or validation; security and confidentiality; ethical, legal, social, and policy issues and solutions; stakeholder participation; and communicating study results. The two section editors jointly reviewed the 84 references to select a consensual list of 14 candidate best papers representative of all CRI categories. Following the IMIA Yearbook process, these 14 papers were peer-reviewed by the IMIA Yearbook editors and external reviewers (at least four reviewers per paper). Four papers were finally selected as best papers (Table 1). A content summary of these best papers can be found in the appendix of this synopsis.
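The shortlisting rule described above (a paper is kept if at least one of the two editors rated it medium or high) can be expressed in a few lines. The `Rating` record and the `shortlist` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

QUALIFYING = {"medium", "high"}


@dataclass
class Rating:
    paper_id: str
    editor_a: str  # "low", "medium" or "high", rated blind from title and abstract
    editor_b: str


def shortlist(ratings):
    """Keep papers rated medium or high by at least one of the two editors."""
    return [r.paper_id for r in ratings
            if r.editor_a in QUALIFYING or r.editor_b in QUALIFYING]
```

Using "either editor" rather than "both editors" favours recall at this stage; the later joint review and external peer review then narrow the list.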

Table 1

Best paper selection of articles for the IMIA Yearbook of Medical Informatics 2019 in the section ‘Clinical Research Informatics’. The articles are listed in alphabetical order of the first author’s surname.

Section: Clinical Research Informatics
▪ Brisimi TS, Chen R, Mela T, Olshevsky A, Paschalidis IC, Shi W. Federated learning of predictive models from federated Electronic Health Records. Int J Med Inform 2018 Apr;112:59-67.
▪ Daniel C, Serre P, Orlova N, Breant S, Paris N, Griffon N. Initializing a hospital-wide data quality program. The AP-HP experience. Comput Methods Programs Biomed 2018 Nov 9.
▪ Estiri H, Stephens KA, Klann JG, Murphy SN. Exploring completeness in clinical data research networks with DQe-c. J Am Med Inform Assoc 2018 Jan 1;25(1):17-24.
▪ Sylvestre E, Bouzille G, Chazard E, His-Mahier C, Riou C, Cuggia M. Combining information from a clinical data warehouse and a pharmaceutical database to generate a framework to detect comorbidities in electronic health records. BMC Med Inform Decis Mak 2018 Jan 24;18(1):9.

Outlook

The 14 candidate best papers for 2018 illustrate recent efforts towards data-driven research and innovation and exemplify trends in CRI sub-domains such as data/text mining, artificial intelligence, data integration and semantic interoperability, data management and CRI systems, data quality and reproducibility in biomedical research, security, and initiatives for scaling up real-world data. In addition to these research papers, Hemingway et al. [1] published a useful overview of the challenges of, and approaches to, scaling up research using large-scale health data resources.

Data/Text Mining and New Technologies from Artificial Intelligence

The proliferation of diverse health data sources has made it feasible to analyse “real-world” data to generate evidence for decision-making by healthcare professionals. For example, Ledieu et al. demonstrate that a smart representation of heterogeneous data integrated within Clinical Data Warehouses (CDWs) improves caregivers’ experience [2]. One of the best papers, by Sylvestre et al. [3], proposes an algorithm to detect comorbidities in electronic health records (EHRs). It combines structured data, such as drug prescriptions and laboratory results, with the indications for each drug provided by a pharmaceutical database. Comorbidity diagnoses were suggested for 68.4% of the 4,312 patients in the test data set and were confirmed in 20.3% of the reviewed cases.

Important health information in hospital CDWs is hidden in unstructured data. Garcelon et al. [4] combined two information extraction methods to detect phenotypes of patients with rare diseases. Dietrich et al. [5] extended the document-oriented CDW PaDaWaN with an ad hoc, dynamic, interactive, and adjustable information extraction service that lets users query text data in much the same way as structured data. The service works on the fly at runtime, recognizes negation and context, and computes the frequencies of Boolean and numeric values with high recall and precision.

One way to protect data in federated (virtual) databases is to avoid exchanging granular data. Another best paper, by Brisimi et al. [6], describes a computationally efficient and privacy-aware solution for large-scale machine learning problems running on distributed data. The iterative cluster Primal Dual Splitting (cPDS) algorithm, developed to solve the large-scale sparse Support Vector Machine (sSVM) problem in a decentralized fashion, allows data holders to collaborate while keeping every participant’s data private.
The distributed learning scheme cPDS, evaluated on the problem of predicting hospitalizations due to heart diseases, converges faster than centralized methods and achieves similar prediction accuracy.
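To make the idea of learning without pooling records concrete, the sketch below trains a linear SVM across several sites: each site computes a hinge-loss subgradient on its own records and shares only that vector with a coordinator, which averages the vectors (weighted by site size) and updates the model. This is deliberately simpler than cPDS. It is plain federated subgradient descent on a dense, L2-regularized SVM, not the primal-dual splitting scheme for sparse SVMs described by Brisimi et al. The function names, synthetic data, step size, and number of rounds are illustrative assumptions.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def local_subgradient(w, records, lam):
    """Subgradient of the L2-regularized average hinge loss on one site's records.

    Runs inside the site; only the returned vector leaves it.
    """
    grad = [lam * wi for wi in w]
    for x, y in records:
        if y * dot(w, x) < 1:  # record is inside the margin or misclassified
            for i, xi in enumerate(x):
                grad[i] -= y * xi / len(records)
    return grad


def federated_train(sites, dim, lam=0.01, lr=0.1, rounds=300):
    """Coordinator loop: size-weighted average of site subgradients, then one step."""
    w = [0.0] * dim
    n_total = sum(len(records) for records in sites)
    for _ in range(rounds):
        step = [0.0] * dim
        for records in sites:
            g = local_subgradient(w, records, lam)
            weight = len(records) / n_total
            for i in range(dim):
                step[i] += weight * g[i]
        w = [wi - lr * si for wi, si in zip(w, step)]
    return w


def make_site(rng, n):
    """Hypothetical synthetic site: label is the sign of x1 + x2; last feature is a bias term."""
    records = []
    while len(records) < n:
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if abs(x1 + x2) < 0.2:
            continue  # keep the toy data linearly separable with a gap
        records.append(([x1, x2, 1.0], 1 if x1 + x2 > 0 else -1))
    return records
```

Because the size-weighted sum of site subgradients equals the subgradient of the pooled objective, each round takes the same step a centralized learner would take, while record-level data never leave the sites. Shared gradients can still leak some information, which is one reason dedicated privacy-preserving schemes exist.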

Data Integration and Semantic Interoperability

Data heterogeneity is one of the critical problems in sharing, linking, reusing, and analysing datasets. Fast Healthcare Interoperability Resources (FHIR) is the new HL7 interoperability standard. The SMART (Substitutable Medical Applications, Reusable Technologies) platform defines the SMART-on-FHIR specification for how applications interface with EHRs through FHIR. Paris et al. [7] extended i2b2 to query one or more remote SMART-on-FHIR Application Programming Interfaces (APIs). This opens i2b2 to new data types and improves security and interoperability management for scalable, cross-border, and cross-domain data networking.
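As an illustration of the kind of request a FHIR client issues, the sketch below builds a FHIR RESTful search URL (`[base]/[type]?parameter=value`) and, from a returned search-set Bundle, extracts the patients whose laboratory value meets a threshold. That is roughly the information an i2b2-style cohort count needs. The server base URL, the sample Bundle, and the helper names are hypothetical. SMART authorization (OAuth 2.0) and result paging are omitted, and this is not how Paris et al. implemented their i2b2 extension.

```python
import json
from urllib.parse import urlencode


def fhir_search_url(base, resource_type, **params):
    """Build a FHIR search URL of the form [base]/[type]?name=value&..."""
    return f"{base.rstrip('/')}/{resource_type}?{urlencode(params)}"


def patients_above(bundle_json, threshold):
    """Return references of patients having an Observation value >= threshold."""
    bundle = json.loads(bundle_json)
    matches = set()
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Observation":
            continue
        value = resource.get("valueQuantity", {}).get("value")
        if value is not None and value >= threshold:
            matches.add(resource["subject"]["reference"])
    return matches


# Hypothetical search-set Bundle, trimmed to the fields used above.
SAMPLE_BUNDLE = json.dumps({
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Observation",
                      "subject": {"reference": "Patient/1"},
                      "valueQuantity": {"value": 130, "unit": "mg/dL"}}},
        {"resource": {"resourceType": "Observation",
                      "subject": {"reference": "Patient/2"},
                      "valueQuantity": {"value": 95, "unit": "mg/dL"}}},
    ],
})

if __name__ == "__main__":
    # Token search on LOINC 2345-7 (glucose, serum or plasma), 100 results per page.
    print(fhir_search_url("https://fhir.example.org/r4", "Observation",
                          code="http://loinc.org|2345-7", _count=100))
    print(patients_above(SAMPLE_BUNDLE, 126))
```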

Data Management and CRI Systems

Devine et al. [8] present an evaluation of data management at hospitals of the Washington State Surgical Care Outcomes and Assessment Program (SCOAP) network engaged in the Comparative Effectiveness Research and Translation Network (CERTAIN), which aims to reuse EHR data for quality improvement and research. The authors compared manual and automated abstraction processes, based on a centralized federated data model, in four SCOAP hospitals. Six to 15 percent of data elements were automatically abstracted with more than 90% consistency.
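The text does not define how consistency was measured. One simple reading is percent agreement between the manually and automatically abstracted values of each data element. The sketch below computes that measure and lists the elements above the 90% threshold. The element names and values are invented, and skipping records where either source has no value is an assumption.

```python
def percent_agreement(manual, automated):
    """Fraction of records on which manual and automated abstraction agree.

    Records where either source has no value (None) are skipped (assumption).
    Returns None when no record can be compared.
    """
    pairs = [(m, a) for m, a in zip(manual, automated) if m is not None and a is not None]
    if not pairs:
        return None
    return sum(m == a for m, a in pairs) / len(pairs)


def consistent_elements(manual_by_element, automated_by_element, threshold=0.9):
    """Names of data elements whose agreement exceeds the threshold."""
    selected = []
    for element, manual_values in manual_by_element.items():
        score = percent_agreement(manual_values, automated_by_element.get(element, []))
        if score is not None and score > threshold:
            selected.append(element)
    return sorted(selected)
```

For categorical elements, a chance-corrected statistic such as Cohen's kappa would be a stricter alternative to raw agreement.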

Data Quality and Reproducibility in Biomedical Research

Although data quality (DQ) is a major concern in distributed research networks (DRNs), DQ assessment of hospital information systems is largely unpublished. The US National Patient-Centered Clinical Research Network (PCORnet®) is one of the first DRNs to incorporate EHR data from multiple domains on a national scale. Qualls et al. [9] describe the data curation process used by the PCORnet Coordinating Center to evaluate foundational DQ and to assess fitness-for-use across a broad research portfolio. Looten et al. [10] leveraged the CDW of the European Hospital Georges Pompidou and tracked the evolution of 192 biological parameters over 17 years (445,000+ patients, 131 million laboratory test results). The authors developed computational and statistical methods to identify different evolution profiles and formulated recommendations for the safe use and sharing of biological data collections, so as to limit the impact of data evolution in retrospective and federated studies.

The paper by Daniel et al. [11], selected as a best paper, presents a DQ program at AP-HP to increase the reproducibility of analyses run on the CDW that aggregates EHR data from 37 hospitals. Two DQ campaigns were conducted, on patient identification (PI) and on healthcare services (HS). The results of semi-automated DQ profiling of the PI data set (8.8 million patients) and the HS data set (13,099 consultations, 2,122 care units) are presented, together with improvement campaigns that have already resulted in significant DQ improvement.

The paper by Estiri et al. [12], also selected as a best paper, presents DQe-c, an open source, interoperable, and scalable DQ assessment tool for evaluating and visualizing completeness and conformance in EHR data repositories based on either the PCORnet® or the OMOP Common Data Model. DQe-c was validated on 200,000 patient records randomly selected from the Research Patient Data Registry at Partners HealthCare.
The web-based DQ reports include descriptive graphics and tables that are tailored to EHR DQ assessment but could be extended to the other steps of the data quality life-cycle.
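Completeness, the DQ dimension at the core of DQe-c, can be illustrated as the percentage of records that carry a usable value in each column. The sketch below is not DQe-c's code. The column names and the set of placeholder values treated as missing are assumptions; real assessments usually derive such placeholder lists from the common data model's conventions.

```python
# Hypothetical values treated as "missing" in addition to absent keys.
MISSING_TOKENS = {None, "", "NA", "UNKNOWN"}


def column_completeness(records, columns, missing=MISSING_TOKENS):
    """Percentage of records with a usable (non-missing) value, per column.

    A key absent from a record counts as missing, because dict.get returns None.
    """
    n = len(records)
    report = {}
    for col in columns:
        present = sum(1 for r in records if r.get(col) not in missing)
        report[col] = 100.0 * present / n if n else 0.0
    return report
```

Running the same function on every site's extract, and comparing the per-column percentages, gives the kind of cross-site completeness overview that a DRN coordinating center reviews.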

Security

Linking record-level data between repositories often relies on a pseudonym (a linkage key), which makes privacy-preserving linkage an important approach for complying with the EU General Data Protection Regulation (GDPR). A 2018 paper applies secure Multi-Party Computation (MPC), a well-known technique for privacy-preserving data mining, to three pilot data mining scenarios: location tracking within a hospital, joint data analysis across multiple care providers, and mining a mixture of data sources [13]. MPC is proposed as a scalable method for linked data mining in a GDPR-compliant way.
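A textbook building block of MPC is additive secret sharing. In the sketch below, several care providers learn the total of their private patient counts without any of them revealing its own count. Each value is split into random shares modulo a public prime, each party adds up the shares it receives, and only these partial sums are published. The sketch assumes honest-but-curious, non-colluding parties and uses hypothetical helper names. It is not the protocol or framework used in the cited paper.

```python
import secrets

PRIME = 2**61 - 1  # public modulus agreed by all parties (illustrative choice)


def share(value, n_parties):
    """Split value into n additive shares that sum to value modulo PRIME.

    Any n - 1 of the shares are uniformly random and reveal nothing about value.
    """
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values):
    """Simulate all parties in one process: return the sum of the private values."""
    n = len(private_values)
    # Party i splits its value and sends share j to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds the shares it received; this partial sum looks random on its own.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Publishing the partial sums reveals only the total.
    return sum(partial_sums) % PRIME
```

For example, `secure_sum([120, 45, 310])` returns 475, the joint count, while each party only ever sees random-looking shares of the other parties' counts. Real MPC frameworks build comparisons, multiplications, and model training on top of primitives like this one.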

Initiatives for Scaling up Real World Data

Several European countries, alongside others globally, are investing in national infrastructures and competencies to integrate EHR data at scale to enable big data research. The two newest programmes are in Germany [14] and France. They have been designed quite differently, and the Survey Paper in this section provides an in-depth analysis and comparison of both initiatives [15]. There are valuable opportunities for both programmes to learn from each other.

References (13 in total)

1. Enabling Analytics on Sensitive Medical Data with Secure Multi-Party Computation.

Authors: Meilof Veeningen; Supriyo Chatterjea; Anna Zsófia Horváth; Gerald Spindler; Eric Boersma; Peter van der Spek; Onno van der Galiën; Job Gutteling; Wessel Kraaij; Thijs Veugen
Journal: Stud Health Technol Inform Date: 2018

2. Federated learning of predictive models from federated Electronic Health Records.

Authors: Theodora S Brisimi; Ruidi Chen; Theofanie Mela; Alex Olshevsky; Ioannis Ch Paschalidis; Wei Shi
Journal: Int J Med Inform Date: 2018-01-12 Impact factor: 4.046

3. Exploring completeness in clinical data research networks with DQe-c.

Authors: Hossein Estiri; Kari A Stephens; Jeffrey G Klann; Shawn N Murphy
Journal: J Am Med Inform Assoc Date: 2018-01-01 Impact factor: 4.497

4. Combining information from a clinical data warehouse and a pharmaceutical database to generate a framework to detect comorbidities in electronic health records.

Authors: Emmanuelle Sylvestre; Guillaume Bouzillé; Emmanuel Chazard; Cécil His-Mahier; Christine Riou; Marc Cuggia
Journal: BMC Med Inform Decis Mak Date: 2018-01-24 Impact factor: 2.796

5. Automating Electronic Clinical Data Capture for Quality Improvement and Research: The CERTAIN Validation Project of Real World Evidence.

Authors: Emily Beth Devine; Erik Van Eaton; Megan E Zadworny; Rebecca Symons; Allison Devlin; David Yanez; Meliha Yetisgen; Katelyn R Keyloun; Daniel Capurro; Rafael Alfonso-Cristancho; David R Flum; Peter Tarczy-Hornoch
Journal: EGEMS (Wash DC) Date: 2018-05-22

6. i2b2 implemented over SMART-on-FHIR.

Authors: Nicolas Paris; Michael Mendis; Christel Daniel; Shawn Murphy; Xavier Tannier; Pierre Zweigenbaum
Journal: AMIA Jt Summits Transl Sci Proc Date: 2018-05-18

7. Next generation phenotyping using narrative reports in a rare disease clinical data warehouse.

Authors: Nicolas Garcelon; Antoine Neuraz; Rémi Salomon; Nadia Bahi-Buisson; Jeanne Amiel; Capucine Picard; Nizar Mahlaoui; Vincent Benoit; Anita Burgun; Bastien Rance
Journal: Orphanet J Rare Dis Date: 2018-05-31 Impact factor: 4.123

8. Ad Hoc Information Extraction for Clinical Data Warehouses.

Authors: Georg Dietrich; Jonathan Krebs; Georg Fette; Maximilian Ertl; Mathias Kaspar; Stefan Störk; Frank Puppe
Journal: Methods Inf Med Date: 2018-05-25 Impact factor: 2.176

9. German Medical Informatics Initiative.

Authors: Sebastian C Semler; Frank Wissing; Ralf Heyder
Journal: Methods Inf Med Date: 2018-07-17 Impact factor: 2.176

10. Big data from electronic health records for early and late translational cardiovascular research: challenges and potential. (Review)

Authors: Harry Hemingway; Folkert W Asselbergs; John Danesh; Richard Dobson; Nikolaos Maniadakis; Aldo Maggioni; Ghislaine J M van Thiel; Maureen Cronin; Gunnar Brobert; Panos Vardas; Stefan D Anker; Diederick E Grobbee; Spiros Denaxas
Journal: Eur Heart J Date: 2018-04-21 Impact factor: 29.983
