Validating the extract, transform, load process used to populate a large clinical research database.

Michael J Denney1, Dustin M Long2, Matthew G Armistead1, Jamie L Anderson3, Baqiyyah N Conway4.   

Abstract

BACKGROUND: Informaticians developing clinical research support infrastructure at any institution are tasked with populating research databases with data extracted and transformed from the institution's operational databases, such as electronic health records (EHRs). These data must be correctly extracted from the source systems, transformed into a standard data structure, and then loaded into the data warehouse without loss of integrity. We validated the correctness of the extract, transform, load (ETL) process used to populate the West Virginia Clinical and Translational Science Institute's Integrated Data Repository (IDR), a clinical data warehouse containing data extracted from two EHR systems.
METHODS: Four hundred ninety-eight observations were randomly selected from the integrated data repository and compared with the two source EHR systems.
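The sampling-and-comparison step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the `Observation` fields, the per-EHR lookup dictionaries, and the helper names `sample_observations` and `compare_to_source` are hypothetical, and "concordant" is simplified here to exact field-by-field equality with the record read back from the source system.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One warehouse observation (hypothetical fields for illustration)."""
    obs_id: str
    source_system: str  # which EHR the observation was extracted from
    patient_id: str
    concept: str        # e.g. a lab test or diagnosis code
    value: str


def sample_observations(warehouse, n, seed=None):
    """Draw a simple random sample of n observations without replacement."""
    rng = random.Random(seed)
    return rng.sample(warehouse, n)


def compare_to_source(sample, sources):
    """Split sampled observations into concordant and discordant lists.

    `sources` maps a source-system name to a dict of obs_id -> Observation
    as read back from that EHR (a hypothetical lookup structure). A record
    that is missing from its source, or differs in any field, is discordant.
    """
    concordant, discordant = [], []
    for obs in sample:
        source_record = sources.get(obs.source_system, {}).get(obs.obs_id)
        if source_record == obs:
            concordant.append(obs)
        else:
            discordant.append(obs)
    return concordant, discordant
```

In practice each discordant observation would then be reviewed by hand, since, as the results show, a mismatch may reflect a design decision or a timing difference rather than an ETL error.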
RESULTS: Of the 498 observations, there were 479 concordant and 19 discordant observations. The discordant observations fell into three general categories: a) design decision differences between the IDR and source EHRs, b) timing differences, and c) user interface settings. After resolving apparent discordances, our integrated data repository was found to be 100% accurate relative to its source EHR systems.
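The reported counts imply a raw concordance of 479/498 before any discordance was investigated, and 100% once all 19 discordances were explained as non-errors. A short sketch of that arithmetic follows; the function name `concordance_rate` and its `resolved` parameter are hypothetical conveniences, not part of the study.

```python
def concordance_rate(concordant: int, discordant: int, resolved: int = 0) -> float:
    """Share of sampled observations that agree with the source EHRs.

    `resolved` counts discordant observations later explained as non-errors
    (e.g. design, timing, or user-interface-setting differences); these are
    reclassified as concordant.
    """
    if not 0 <= resolved <= discordant:
        raise ValueError("resolved must be between 0 and discordant")
    total = concordant + discordant
    if total == 0:
        raise ValueError("no observations to score")
    return (concordant + resolved) / total


print(f"raw: {concordance_rate(479, 19):.2%}")  # raw: 96.18%
print(f"after resolution: {concordance_rate(479, 19, resolved=19):.0%}")  # after resolution: 100%
```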
CONCLUSION: Any institution whose clinical data warehouse is populated by extracting data from operational databases, such as EHRs, employs some form of ETL process. As secondary use of EHR data begins to transform the research landscape, the importance of basic validation of extracted EHR data cannot be overstated, and that validation should start with the extraction process itself.
Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

Keywords:  Clinical data warehouse; Correctness; Electronic health record; Extract transform load; Informatics

Year:  2016        PMID: 27506144      PMCID: PMC5556907          DOI: 10.1016/j.ijmedinf.2016.07.009

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046
