Garrett Eickelberg (1), Yuan Luo (1), L Nelson Sanchez-Pinto (1, 2)
1. Department of Preventive Medicine (Health & Biomedical Informatics), Feinberg School of Medicine, Chicago, Illinois, USA.
2. Department of Pediatrics (Critical Care), Chicago, Illinois, USA.
Abstract
Objective: Microbiology culture reports contain critical information for important clinical and public health applications. However, these reports often consist of complex, semistructured free text that presents a barrier to secondary use. Here we present the development and validation of an open-source package designed to ingest free-text microbiology reports, determine whether the culture is positive, and return a list of bacteria mapped to Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) concepts.

Materials and Methods: Our concept extraction Python package, MicrobEx, is built on a rule-based natural language processing algorithm. It was developed using microbiology reports from 2 different electronic health record systems in a large healthcare organization and then externally validated on reports from 2 other institutions, with manually reviewed results as the reference standard.

Results: MicrobEx achieved F1 scores >0.95 on all classification tasks across 2 independent validation sets with minimal customization. Additionally, MicrobEx matched or surpassed the performance of our MetaMap-based benchmark algorithm on the positive culture classification and species capture classification tasks.

Discussion: Our results suggest that MicrobEx can reliably estimate binary bacterial culture status, extract bacterial species, and map them to SNOMED CT organism observations when applied to semistructured, free-text microbiology reports from different institutions with relatively little customization.

Conclusion: MicrobEx offers an open-source software solution (available on both GitHub and PyPI) for bacterial culture status estimation and bacterial species extraction from free-text microbiology reports. The package was designed to be reused and adapted to individual institutions as an upstream process for other clinical applications such as machine learning, clinical decision support, and disease surveillance systems.
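The abstract does not describe MicrobEx's internal rules or API, so the following is only a minimal, hypothetical Python sketch of the general kind of rule-based pipeline it names: match "no growth" statements, look up organism names in a lexicon, map each hit to an organism concept, and score the binary culture-status predictions with F1 (2 × precision × recall / (precision + recall)). The pattern list, the lexicon, the names `parse_report`, `CultureResult`, and `f1_score`, and the placeholder concept labels are all assumptions for illustration; they are not MicrobEx code and are not real SNOMED CT identifiers.

```python
import re
from dataclasses import dataclass, field

# Hypothetical "no growth" phrases (assumption); real reports would need a
# much larger, institution-tuned rule set with negation handling.
NO_GROWTH_PATTERNS = [
    r"\bno growth\b",
    r"\bno (?:bacteria|organisms?) (?:isolated|seen|detected)\b",
]

# Hypothetical lexicon: surface form -> placeholder organism concept label.
# These labels stand in for SNOMED CT organism concepts; they are NOT real
# SNOMED CT concept IDs.
ORGANISM_LEXICON = {
    "staphylococcus aureus": "Staphylococcus aureus (organism)",
    "s. aureus": "Staphylococcus aureus (organism)",
    "escherichia coli": "Escherichia coli (organism)",
    "e. coli": "Escherichia coli (organism)",
    "klebsiella pneumoniae": "Klebsiella pneumoniae (organism)",
}


@dataclass
class CultureResult:
    positive: bool
    organisms: list = field(default_factory=list)


def parse_report(text: str) -> CultureResult:
    """Classify one free-text report and extract organism concepts."""
    lowered = text.lower()
    organisms = []
    for surface, concept in ORGANISM_LEXICON.items():
        # (?<!\w) and (?!\w) act as word boundaries; re.escape handles the "."
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        if re.search(pattern, lowered) and concept not in organisms:
            organisms.append(concept)
    no_growth = any(re.search(p, lowered) for p in NO_GROWTH_PATTERNS)
    # Call the culture positive only if at least one organism was found
    # and no "no growth" statement is present.
    return CultureResult(positive=bool(organisms) and not no_growth,
                         organisms=organisms)


def f1_score(y_true, y_pred):
    """Binary F1 = 2 * precision * recall / (precision + recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy reports with hand-assigned culture-status labels (illustrative only).
    reports = [
        "Blood culture: Staphylococcus aureus isolated.",
        "No growth at 5 days.",
        "Urine culture: >100,000 CFU/mL Escherichia coli.",
        "Wound culture: moderate growth of E. coli and Klebsiella pneumoniae.",
        "Sputum culture: heavy growth of Pseudomonas aeruginosa.",
    ]
    labels = [True, False, True, True, True]
    results = [parse_report(r) for r in reports]
    for report, result in zip(reports, results):
        print(result.positive, result.organisms, "|", report)
    print("F1:", round(f1_score(labels, [r.positive for r in results]), 3))
```

On these five toy reports the sketch misses the Pseudomonas report because that organism is not in its lexicon, so recall drops to 0.75 and F1 to about 0.86. This illustrates why lexicon coverage and per-institution customization matter for a rule-based extractor.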