Edna C Shenvi (1), Daniella Meeker (2), Aziz A Boxwala (3). Affiliations: (1) Division of Biomedical Informatics, University of California San Diego, La Jolla, CA, United States. (2) RAND Corporation, Santa Monica, CA, United States. (3) Meliorix Inc., La Jolla, CA, United States. Electronic address: aziz.boxwala@meliorix.com.
Abstract
BACKGROUND AND OBJECTIVE: Use of data from electronic health records (EHRs) in clinical research is increasing, but there is little empirical knowledge of which data are needed to support the multiple types of research that draw on these sources. This study seeks to characterize the types and patterns of data usage from EHRs for clinical research.
MATERIALS AND METHODS: We analyzed the data requirements of over 100 retrospective studies by mapping their selection criteria and study variables to the data elements of two standard data dictionaries, one from the healthcare domain and the other from the clinical research domain. We also contacted study authors to validate our results.
RESULTS: The majority of variables mapped to one or both of the two dictionaries. Studies used an average of 4.46 (range 1-12) data element types in the selection criteria and 6.44 (range 1-15) in the study variables. The most frequently used items (e.g., procedure, condition, medication) are often available in coded form in EHRs. Study criteria were frequently complex: 49 of 104 studies involved relationships between data elements, and 22 used aggregate operations on data variables. Author responses supported these findings.
DISCUSSION AND CONCLUSION: The high proportion of mapped data elements demonstrates the significant potential for clinical data warehousing to facilitate clinical research. The unmapped data elements illustrate the difficulty of developing a complete data dictionary.