RESEARCH OBJECTIVE: To develop scalable informatics infrastructure for normalization of both structured and unstructured electronic health record (EHR) data into a unified, concept-based model for high-throughput phenotype extraction.
MATERIALS AND METHODS: Software tools and applications were developed to extract information from EHRs. Representative and convenience samples of both structured and unstructured data from two EHR systems, Mayo Clinic and Intermountain Healthcare, were used for development and validation. Extracted information was standardized and normalized to meaningful use (MU) conformant terminology and value set standards using Clinical Element Models (CEMs). These resources were used to demonstrate semi-automatic execution of MU clinical-quality measures modeled using the Quality Data Model (QDM) and an open-source rules engine.
RESULTS: Using CEMs and open-source natural language processing and terminology services engines, namely Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) and Common Terminology Services (CTS2), we developed a data-normalization platform that ensures data security, end-to-end connectivity, and reliable data flow within and across institutions. We demonstrated the applicability of this platform by executing a QDM-based MU quality measure on a randomly selected cohort of 273 Mayo Clinic patients. The measure determines the percentage of patients between 18 and 75 years of age with diabetes whose most recent low-density lipoprotein cholesterol test result during the measurement year was <100 mg/dL. The platform identified 21 and 18 patients for the denominator and numerator of the quality measure, respectively. Validation results indicate that all identified patients meet the QDM-based criteria.
CONCLUSIONS: End-to-end automated systems for extracting clinical information from diverse EHR systems require extensive use of standardized vocabularies and terminologies, as well as robust information models for storing, discovering, and processing that information. This study demonstrates the application of modular and open-source resources for enabling secondary use of EHR data through normalization into a standards-based, comparable, and consistent format for high-throughput phenotyping to identify patient cohorts.
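ILLUSTRATIVE SKETCH: The logic of the quality measure described in the results can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's implementation: the study executed the measure from its QDM representation with a rules engine over CEM-normalized data, whereas the `Patient` and `LdlResult` records, the inclusive age bounds, the use of a calendar year as the measurement year, and the demo cohort below are all hypothetical stand-ins. With the reported counts, the measure rate would be 18/21, or about 85.7%.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class LdlResult:
    """Hypothetical normalized LDL cholesterol observation."""
    taken_on: date
    value_mg_dl: float


@dataclass
class Patient:
    """Hypothetical normalized patient record (not a CEM or QDM structure)."""
    patient_id: str
    age: int  # assumption: age as evaluated for the measurement year
    has_diabetes: bool
    ldl_results: List[LdlResult] = field(default_factory=list)


def in_denominator(p: Patient) -> bool:
    # Denominator: patients with diabetes aged 18 to 75 (inclusive bounds assumed).
    return p.has_diabetes and 18 <= p.age <= 75


def most_recent_ldl(p: Patient, year: int) -> Optional[LdlResult]:
    # Only results taken during the measurement year are considered.
    in_year = [r for r in p.ldl_results if r.taken_on.year == year]
    return max(in_year, key=lambda r: r.taken_on, default=None)


def in_numerator(p: Patient, year: int) -> bool:
    # Numerator: denominator patients whose most recent LDL result was <100 mg/dL.
    if not in_denominator(p):
        return False
    latest = most_recent_ldl(p, year)
    return latest is not None and latest.value_mg_dl < 100.0


def evaluate_measure(cohort: List[Patient], year: int) -> Tuple[int, int, Optional[float]]:
    """Return (denominator count, numerator count, rate or None if undefined)."""
    denom = [p for p in cohort if in_denominator(p)]
    numer = [p for p in denom if in_numerator(p, year)]
    rate = len(numer) / len(denom) if denom else None
    return len(denom), len(numer), rate


# Synthetic demo cohort (invented for illustration, not study data).
demo_cohort = [
    Patient("A", 50, True, [LdlResult(date(2012, 3, 1), 120.0),
                            LdlResult(date(2012, 9, 1), 95.0)]),
    Patient("B", 60, True, [LdlResult(date(2012, 5, 1), 130.0)]),
    Patient("C", 80, True, [LdlResult(date(2012, 6, 1), 90.0)]),   # outside age range
    Patient("D", 40, False, [LdlResult(date(2012, 6, 1), 90.0)]),  # no diabetes
    Patient("E", 30, True, [LdlResult(date(2011, 12, 1), 90.0)]),  # result outside year
]

if __name__ == "__main__":
    denominator, numerator, rate = evaluate_measure(demo_cohort, 2012)
    print(denominator, numerator, rate)
```

Patient A shows why the most recent result matters: an earlier value of 120 mg/dL is superseded by a later value of 95 mg/dL in the same year, so A counts toward the numerator. Patient E stays in the denominator but not the numerator, because the only LDL result falls outside the measurement year.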
Keywords:
Electronic health record; Meaningful Use; Natural Language Processing; Normalization; Phenotype Extraction