Hannah L Weeks [1], Cole Beck [1], Elizabeth McNeer [1], Michael L Williams [1], Cosmin A Bejan [2], Joshua C Denny [3], Leena Choi [1]
1. Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
2. Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
3. Department of Biomedical Informatics, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
Abstract
OBJECTIVE: We developed medExtractR, a natural language processing system to extract medication information from clinical notes. Using a targeted approach, medExtractR focuses on individual drugs to facilitate creation of medication-specific research datasets from electronic health records.
MATERIALS AND METHODS: Written using the R programming language, medExtractR combines lexicon dictionaries and regular expressions to identify relevant medication entities (eg, drug name, strength, frequency). MedExtractR was developed on notes from Vanderbilt University Medical Center, using medications prescribed with varying complexity. We evaluated medExtractR and compared it with 3 existing systems: MedEx, MedXN, and CLAMP (Clinical Language Annotation, Modeling, and Processing). We also demonstrated how medExtractR can be easily tuned for better performance on an outside dataset using the MIMIC-III (Medical Information Mart for Intensive Care III) database.
RESULTS: On 50 test notes per development drug and 110 test notes for an additional drug, medExtractR achieved high overall performance (F-measures >0.95), exceeding performance of the 3 existing systems across all drugs. MedExtractR achieved the highest F-measure for each individual entity, except drug name and dose amount for allopurinol. With tuning and customization, medExtractR achieved F-measures >0.90 in the MIMIC-III dataset.
DISCUSSION: The medExtractR system successfully extracted entities for medications of interest. High performance in entity-level extraction provides a strong foundation for developing robust research datasets for pharmacological research. When working with new datasets, medExtractR should be tuned on a small sample of notes before being broadly applied.
CONCLUSIONS: The medExtractR system achieved high performance extracting specific medications from clinical text, leading to higher-quality research datasets for drug-related studies than some existing general-purpose medication extraction tools.
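The targeted dictionary-plus-regular-expression idea described under MATERIALS AND METHODS can be illustrated with a minimal sketch. This is not medExtractR's implementation or API (medExtractR is an R package); it is a Python illustration under stated assumptions. The function name `extract_entities`, the tiny lexicon, the regular expressions, and the 30-character search window are all hypothetical choices made for this example.

```python
import re

# Hypothetical mini-lexicon of frequency expressions (assumption for this sketch;
# a real system would rely on much larger, curated dictionaries).
FREQUENCY_LEXICON = ["twice a day", "twice daily", "daily", "bid", "qam", "qpm"]

# Illustrative patterns only, not the expressions used by medExtractR.
STRENGTH_RE = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g)\b", re.IGNORECASE)
DOSE_AMT_RE = re.compile(
    r"\b\d+(?:\.\d+)?\s?(?:tablets?|tabs?|capsules?|caps?)\b", re.IGNORECASE
)
# Longest lexicon entries first so multi-word phrases win over shorter ones.
FREQUENCY_RE = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(FREQUENCY_LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_entities(note, drug_names, window=30):
    """Find mentions of the target drug(s) and label entities that follow them.

    Only text within `window` characters after each drug mention is searched,
    which keeps the extraction focused on the drug of interest.
    Returns (label, text, start_offset) tuples sorted by position in the note.
    """
    drug_re = re.compile(
        r"\b(?:" + "|".join(re.escape(d) for d in drug_names) + r")\b",
        re.IGNORECASE,
    )
    patterns = (
        ("Strength", STRENGTH_RE),
        ("DoseAmt", DOSE_AMT_RE),
        ("Frequency", FREQUENCY_RE),
    )
    results = []
    for mention in drug_re.finditer(note):
        results.append(("DrugName", mention.group(0), mention.start()))
        offset = mention.end()
        segment = note[offset:offset + window]
        for label, pattern in patterns:
            for hit in pattern.finditer(segment):
                results.append((label, hit.group(0), offset + hit.start()))
    return sorted(results, key=lambda r: r[2])


note = "Continue tacrolimus 1 mg 2 capsules twice a day. Allopurinol 300 mg daily."
entities = extract_entities(note, ["tacrolimus"])
for entity in entities:
    print(entity)
```

Because only tacrolimus is passed as the drug of interest, the allopurinol mention and its strength are ignored. The window size is the kind of parameter one would expect to tune on a small sample of notes when moving to a new dataset.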