W Chen1, R Kowatch2, S Lin1, M Splaingard3, Y Huang1. 1. Research Information Solutions and Innovations, Columbus, OH. 2. Center for Innovation in Pediatric Practice, Columbus, OH. 3. Sleep Disorder Center, Nationwide Children's Hospital, Columbus, OH.
Abstract
UNLABELLED: Nationwide Children's Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification. Discrete data were gleaned from semi-structured sleep study reports. The system was shown to work more efficiently than the traditional manual chart review method, and it also enabled search capabilities that were previously not possible. OBJECTIVE: We report on the development and implementation of the sleep disorder i2b2 cohort identification system using natural language processing of semi-structured documents. METHODS: We developed a natural language processing approach to automatically parse concepts and their values from semi-structured sleep study documents. Two parsers were developed: a regular expression parser for extracting numeric concepts and an NLP-based tree parser for extracting textual concepts. Concepts were further organized into i2b2 ontologies based on document structures and in-domain knowledge. RESULTS: A total of 26,550 concepts were extracted, 99% of which were textual concepts. In all, 1.01 million facts were extracted from sleep study documents, covering demographic information, sleep study lab results, medications, procedures, and diagnoses, among others. The average accuracy of terminology parsing was over 83% when compared against parsing by experts. The system is capable of capturing both standard and non-standard terminologies. The time for cohort identification has been reduced significantly, from a few weeks to a few seconds. CONCLUSION: Natural language processing was shown to be powerful for quickly converting large amounts of semi-structured or unstructured clinical data into discrete concepts, which, in combination with intuitive domain-specific ontologies, allow fast and effective interactive cohort identification through the i2b2 platform for research and clinical use.
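The two-parser design described under METHODS can be illustrated with a minimal Python sketch. It assumes a hypothetical report layout (all-caps section headings followed by either `Label: number unit` lines or semicolon-separated terms). The sample report, the `parse_report` and `concept_path` helpers, and the `\Sleep Study\...` root are illustrative inventions, not the Nationwide Children's Hospital implementation. Numeric concepts are pulled out with a regular expression, as in the paper. Textual concepts are simply split on delimiters and keyed by their section, which is a much simpler stand-in for the NLP-based tree parser the authors describe. Each concept is emitted as a backslash-delimited path in the style of i2b2 ontology concept paths.

```python
import re
from typing import List, Tuple

# Hypothetical excerpt of a semi-structured sleep study report (invented for illustration;
# the actual report layout is not described in the abstract).
SAMPLE_REPORT = """\
SLEEP STUDY RESULTS
Total Sleep Time: 412.5 min
Sleep Efficiency: 88 %
Apnea-Hypopnea Index: 7.2 events/hr
MEDICATIONS
melatonin; albuterol
DIAGNOSES
obstructive sleep apnea; snoring
"""

# Regular expression parser: "Label: number [unit]" lines become numeric concepts.
NUMERIC_LINE = re.compile(
    r"^(?P<label>[A-Za-z][A-Za-z \-]*?)\s*:\s*"
    r"(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>\S.*)?$"
)
# Assumed convention: an all-caps line without a colon is a section heading.
HEADING_LINE = re.compile(r"^[A-Z][A-Z ]+$")


def concept_path(*parts: str) -> str:
    # Join parts into an i2b2-style backslash-delimited path,
    # e.g. \Sleep Study\MEDICATIONS\melatonin\
    return "\\" + "\\".join(p.strip() for p in parts) + "\\"


def parse_report(text: str, root: str = "Sleep Study"):
    """Return (numeric, textual) concepts keyed by section-based concept paths."""
    numeric: List[Tuple[str, float, str]] = []
    textual: List[str] = []
    section = "UNSPECIFIED"  # used until the first heading is seen
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if HEADING_LINE.match(line):
            section = line
            continue
        match = NUMERIC_LINE.match(line)
        if match:
            path = concept_path(root, section, match.group("label"))
            unit = (match.group("unit") or "").strip()
            numeric.append((path, float(match.group("value")), unit))
        else:
            # Simplified stand-in for the tree parser: split textual lines on ";".
            for term in line.split(";"):
                term = term.strip().lower()
                if term:
                    textual.append(concept_path(root, section, term))
    return numeric, textual


if __name__ == "__main__":
    numeric_facts, textual_facts = parse_report(SAMPLE_REPORT)
    for path, value, unit in numeric_facts:
        print(path, value, unit)
    for path in textual_facts:
        print(path)
```

Keying each concept by the section heading it appears under mirrors the abstract's statement that concepts were organized into ontologies based on document structure. Numeric facts keep their value and unit, so a later cohort query could filter on thresholds rather than on text alone.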