Authors: David Goodman-Meza [1,2], Amber Tang [3], Babak Aryanfar [2], Sergio Vazquez [4], Adam J Gordon [5,6], Michihiko Goto [7,8], Matthew Bidwell Goetz [2,3], Steven Shoptaw [9], Alex A T Bui [10]

Affiliations:
1. Division of Infectious Diseases, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA.
2. Veterans Affairs Greater Los Angeles Healthcare System, Los Angeles, California, USA.
3. Department of Internal Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA.
4. Undergraduate Studies, Dartmouth College, Hanover, New Hampshire, USA.
5. Informatics, Decision-Enhancement, and Analytic Sciences Center, Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA.
6. Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, Utah, USA.
7. Department of Internal Medicine, University of Iowa, Iowa City, Iowa, USA.
8. Center for Access and Delivery Research and Evaluation, Iowa City Veterans Affairs Medical Center, Iowa City, Iowa, USA.
9. Department of Family Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA.
10. Medical and Imaging Informatics Group, Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, California, USA.
Abstract
Background: Better identification of people who inject drugs (PWID) in electronic medical records can improve clinical decision making, risk assessment and mitigation, and health services research. PWID are currently identified using heterogeneous, nonspecific International Classification of Diseases (ICD) codes as proxies. Natural language processing (NLP) and machine learning (ML) methods may have better diagnostic metrics than nonspecific ICD codes for identifying PWID.

Methods: We manually reviewed 1000 records of patients diagnosed with Staphylococcus aureus bacteremia who were admitted to Veterans Health Administration hospitals from 2003 through 2014. The manual review served as the reference standard. We developed and trained NLP/ML algorithms with and without regular expression filters for negation (NegEx) and compared them with 11 proxy combinations of ICD codes for identifying PWID. Data were split 70% for training and 30% for testing. We calculated diagnostic metrics and estimated 95% confidence intervals (CIs) by bootstrapping the hold-out test set. The best models were those with the highest F-score, a summary of sensitivity and positive predictive value.

Results: Random forest with and without NegEx were the best-performing NLP/ML algorithms in the training set. Random forest with NegEx outperformed all ICD-based algorithms. The F-score was 0.905 (95% CI, 0.786-0.967) for the best NLP/ML algorithm and 0.592 (95% CI, 0.550-0.632) for the best ICD-based algorithm. The NLP/ML algorithm had a sensitivity of 92.6% and a specificity of 95.4%.

Conclusions: NLP/ML outperformed ICD-based coding algorithms at identifying PWID in electronic health records. NLP/ML models should be considered for identifying cohorts of PWID to improve clinical decision making, health services research, and administrative surveillance.

Published by Oxford University Press on behalf of Infectious Diseases Society of America 2022.
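The evaluation described in the Methods (an F-score with a bootstrapped 95% CI on the hold-out test set) can be illustrated with a minimal sketch. It assumes the F-score is the equally weighted F1 (the harmonic mean of sensitivity and positive predictive value) and that the interval is a percentile bootstrap; the abstract does not specify either detail. The function names, the number of resamples, and the toy labels are hypothetical and are not the study's code or data.

```python
import random

def f_score(y_true, y_pred):
    """F1 (assumed): harmonic mean of sensitivity and positive predictive value."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: treat the F-score as 0 to avoid division by zero.
        return 0.0
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return 2 * sensitivity * ppv / (sensitivity + ppv)

def bootstrap_ci(y_true, y_pred, metric, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed method): resample records with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Toy labels (hypothetical): 1 = PWID per manual review / model, 0 = not PWID.
reference = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]

point = f_score(reference, predicted)
low, high = bootstrap_ci(reference, predicted, f_score)
print(f"F-score {point:.3f} (95% CI, {low:.3f}-{high:.3f})")
```

With only a few hundred test records, resampled intervals of this kind tend to be wide, which is consistent with the broad interval reported for the NLP/ML model.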