Michael J Denney (1), Dustin M Long (2), Matthew G Armistead (1), Jamie L Anderson (3), Baqiyyah N Conway (4)
1. Biomedical Informatics, West Virginia Clinical and Translational Science Institute, Morgantown, WV, USA.
2. Department of Biostatistics, West Virginia University, Morgantown, WV, USA.
3. Department of Health Information Management, West Virginia University Healthcare, Morgantown, WV, USA.
4. Department of Epidemiology, West Virginia University, Morgantown, WV, USA. Electronic address: bnconway@hsc.wvu.edu.
Abstract
BACKGROUND: Informaticians who develop clinical research support infrastructure are tasked with populating research databases with data extracted and transformed from their institution's operational databases, such as electronic health records (EHRs). These data must be extracted correctly from the source systems, transformed into a standard data structure, and loaded into the data warehouse without loss of integrity. We validated the correctness of the extract, transform, and load (ETL) process for the West Virginia Clinical and Translational Science Institute's Integrated Data Repository (IDR), a clinical data warehouse that includes data extracted from two EHR systems.
METHODS: Four hundred ninety-eight observations were randomly selected from the IDR and compared with the two source EHR systems.
RESULTS: Of the 498 observations, 479 were concordant and 19 were discordant. The discordant observations fell into three general categories: a) design decision differences between the IDR and the source EHRs, b) timing differences, and c) user interface settings. After the apparent discordances were resolved, the IDR was found to be 100% accurate relative to its source EHR systems.
CONCLUSION: Any institution whose clinical data warehouse is built by extraction from operational databases, such as EHRs, employs some form of ETL process. As secondary use of EHR data begins to transform the research landscape, the importance of basic validation of the extracted EHR data should not be underestimated, and that validation should start with the extraction process itself.
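The sampling-and-comparison approach described in the METHODS can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `sample_and_compare`, the dictionary-based stand-ins for the warehouse and the source system, and the toy records are all hypothetical and serve only to show how a concordance count might be computed. The last two lines restate the arithmetic reported in the RESULTS (479 of 498 observations concordant before resolution).

```python
import random


def sample_and_compare(warehouse, source, n, seed=0):
    """Randomly sample n record keys from the warehouse and compare each
    sampled value with the value stored under the same key in the source.

    Returns the number of concordant observations and the list of
    discordant keys. Both inputs are plain dicts here (an assumption made
    for illustration; real systems would be queried instead).
    """
    rng = random.Random(seed)              # seeded so the sample is reproducible
    keys = rng.sample(sorted(warehouse), n)
    discordant = [k for k in keys if warehouse[k] != source.get(k)]
    concordant = n - len(discordant)
    return concordant, discordant


# Hypothetical toy data: 1000 records, one of which differs in the warehouse.
source = {i: f"value-{i}" for i in range(1000)}
warehouse = dict(source)
warehouse[10] = "stale"                    # simulated timing difference

# Sampling every record guarantees the single mismatch is found.
concordant, discordant = sample_and_compare(warehouse, source, n=1000)

# Arithmetic from the abstract: initial concordance before resolution.
reported_concordant, reported_total = 479, 498
initial_rate = round(100 * reported_concordant / reported_total, 1)  # percent
```

In practice each discordant key would then be reviewed by hand and assigned to a cause, such as the three categories listed in the RESULTS, before the final accuracy is stated.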