Charmaine S Tam1,2, Janice Gullick3, Aldo Saavedra4,5, Stephen T Vernon6, Gemma A Figtree7,6, Clara K Chow8,9, Michelle Cretikos10, Richard W Morris4,7, Maged William11, Jonathan Morris7,12, David Brieger13. 1. Centre for Translational Data Science, The University of Sydney, Sydney, Australia. charmaine.tam@sydney.edu.au. 2. Northern Clinical School, The University of Sydney, Sydney, Australia. charmaine.tam@sydney.edu.au. 3. Susan Wakil School of Nursing and Midwifery, The University of Sydney, Sydney, Australia. 4. Centre for Translational Data Science, The University of Sydney, Sydney, Australia. 5. Faculty of Health Sciences, The University of Sydney, Sydney, Australia. 6. Cardiothoracic and Vascular Health, Kolling Institute of Medical Research and Department of Cardiology, Royal North Shore Hospital, Northern Sydney Local Health District, Sydney, Australia. 7. Northern Clinical School, The University of Sydney, Sydney, Australia. 8. Westmead Applied Research Centre, The University of Sydney, Sydney, Australia. 9. Department of Cardiology, Westmead Hospital, Sydney, Australia. 10. Centre for Population Health, NSW Ministry of Health, Sydney, Australia. 11. Department of Cardiology, Central Coast Local Health District and University of Newcastle, Sydney, Australia. 12. Clinical and Population Perinatal Health, Northern Sydney Local Health District, Sydney, Australia. 13. Department of Cardiology, Concord Hospital, Sydney, Australia.
Abstract
BACKGROUND: Few studies have described how production EMR systems can be systematically queried to identify clinically-defined populations, and fewer still have used free text in this process. The aim of this study is to provide a generalisable methodology for constructing clinically-defined EMR-derived patient cohorts using structured and unstructured data in EMRs.
METHODS: Patients with possible acute coronary syndrome (ACS) were used as an exemplar. Cardiologists defined clinical criteria for patients presenting with possible ACS. These were mapped to data tables within the production EMR system, creating seven inclusion criteria comprising structured data fields (orders and investigations, procedures, scanned electrocardiogram (ECG) images, and diagnostic codes) and unstructured clinical documentation. Data were extracted from two local health districts (LHDs) in Sydney, Australia. Outcome measures included the relative contribution of individual inclusion criteria to the identification of eligible encounters, comparisons between inclusion criteria, and the consistency of data extracts across years and LHDs.
RESULTS: Among 802,742 encounters in a 5-year dataset (1/1/13-30/12/17), the presence of an ECG image (54.8% of encounters) and symptoms and keywords in clinical documentation (41.4-64.0%) were used most often to identify presentations of possible ACS. Orders and investigations (27.3%) and procedures (1.4%) were less often present for identified presentations. Relevant ICD-10/SNOMED CT codes were present for 3.7% of identified encounters. Similar trends were seen when the two LHDs were examined separately, and across years.
CONCLUSIONS: Constructing clinically-defined EMR-derived cohorts by combining structured and unstructured data during cohort identification is a necessary prerequisite for the critical validation work required to develop real-time clinical decision support and learning health systems.
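The cohort logic summarised in the abstract (an encounter is eligible if it meets at least one inclusion criterion, and each criterion's contribution is its share of eligible encounters) can be illustrated with a minimal sketch. This is not the authors' extraction pipeline: the `Encounter` fields, the keyword pattern, the code prefixes and the criterion names are all hypothetical stand-ins, and only five illustrative criteria are shown rather than the study's seven.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical keyword pattern; the study's actual symptom and keyword lists are not given here.
ACS_KEYWORDS = re.compile(r"\b(chest pain|stemi|nstemi|angina)\b", re.IGNORECASE)

# Illustrative ICD-10 prefixes (I20 angina pectoris, I21 acute myocardial infarction);
# the study's actual code list is an assumption here.
ACS_CODE_PREFIXES = ("I20", "I21")

CRITERIA = ("ecg_image", "orders", "procedure", "diagnosis_code", "free_text")


@dataclass
class Encounter:
    """Hypothetical flattened view of one EMR encounter."""
    encounter_id: str
    has_ecg_image: bool = False
    has_troponin_order: bool = False
    has_procedure: bool = False
    diagnosis_codes: tuple = ()
    notes: str = ""


def criteria_met(enc):
    """Return the set of inclusion criteria this encounter satisfies."""
    met = set()
    if enc.has_ecg_image:
        met.add("ecg_image")
    if enc.has_troponin_order:
        met.add("orders")
    if enc.has_procedure:
        met.add("procedure")
    if any(code.startswith(ACS_CODE_PREFIXES) for code in enc.diagnosis_codes):
        met.add("diagnosis_code")
    if ACS_KEYWORDS.search(enc.notes):
        met.add("free_text")
    return met


def contribution(encounters):
    """Percentage of eligible encounters (at least one criterion met) matching each criterion."""
    eligible = [met for met in map(criteria_met, encounters) if met]
    counts = Counter(name for met in eligible for name in met)
    if not eligible:
        return {name: 0.0 for name in CRITERIA}
    return {name: 100.0 * counts[name] / len(eligible) for name in CRITERIA}


if __name__ == "__main__":
    sample = [
        Encounter("a", has_ecg_image=True, notes="Patient reports chest pain"),
        Encounter("b", diagnosis_codes=("I21.4",)),
        Encounter("c", notes="Sprained ankle"),
    ]
    for name, share in contribution(sample).items():
        print(f"{name}: {share:.1f}%")
```

Because an encounter can satisfy several criteria at once, the percentages need not sum to 100, which matches how the abstract reports overlapping contributions.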