Toward a clinical text encoder: pretraining for clinical natural language processing with applications to substance misuse.

Dmitriy Dligach, Majid Afshar, Timothy Miller.

Abstract

OBJECTIVE: Our objective is to develop algorithms for encoding clinical text into representations that can be used for a variety of phenotyping tasks.
MATERIALS AND METHODS: Obtaining large datasets to take advantage of highly expressive deep learning methods is difficult in clinical natural language processing (NLP). We address this difficulty by pretraining a clinical text encoder on billing code data, which is typically available in abundance. We explore several neural encoder architectures and deploy the text representations obtained from these encoders in the context of clinical text classification tasks. While our ultimate goal is learning a universal clinical text encoder, we also experiment with training a phenotype-specific encoder. A universal encoder would be more practical, but a phenotype-specific encoder could perform better for a specific task.
RESULTS: We successfully train several clinical text encoders, establish a new state-of-the-art on comorbidity data, and observe good performance gains on substance misuse data.
DISCUSSION: We find that pretraining using billing codes is a promising research direction. The representations generated by this type of pretraining have universal properties, as they are highly beneficial for many phenotyping tasks. Phenotype-specific pretraining is a viable route for trading the generality of the pretrained encoder for better performance on a specific phenotyping task.
CONCLUSIONS: We successfully applied our approach to many phenotyping tasks. We conclude by discussing potential limitations of our approach.
© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
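
The pretrain-then-reuse idea summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the toy notes, the two ICD-10 codes used as labels, the averaged-embedding encoder (the paper explores several neural architectures), and the hyperparameters. It pretrains an encoder by predicting billing codes from note text, then reuses the frozen encoder to produce a fixed-size feature vector that a separate downstream phenotyping classifier could consume.

```python
import math
import random

random.seed(0)

# Hypothetical toy data: tokenized notes paired with illustrative billing codes.
NOTES = [
    (["chest", "pain", "troponin", "elevated"], {"I21"}),
    (["alcohol", "withdrawal", "tremor"], {"F10"}),
    (["chest", "pain", "alcohol", "use"], {"I21", "F10"}),
    (["tremor", "alcohol", "intoxication"], {"F10"}),
    (["troponin", "elevated", "stent"], {"I21"}),
]
VOCAB = sorted({t for tokens, _ in NOTES for t in tokens})
CODES = sorted({c for _, codes in NOTES for c in codes})
DIM = 8  # size of the text representation (arbitrary choice)

# Encoder parameters: one embedding vector per token.
emb = {t: [random.uniform(-0.1, 0.1) for _ in range(DIM)] for t in VOCAB}
# Pretraining head: one logistic unit per billing code.
head_w = {c: [0.0] * DIM for c in CODES}
head_b = {c: 0.0 for c in CODES}


def encode(tokens):
    """Average the embeddings of known tokens into a fixed-size vector."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def code_probability(h, code):
    """Sigmoid output of the pretraining head for one billing code."""
    logit = sum(w * x for w, x in zip(head_w[code], h)) + head_b[code]
    return 1.0 / (1.0 + math.exp(-logit))


def bce_loss():
    """Mean binary cross-entropy per note, summed over billing codes."""
    total = 0.0
    for tokens, codes in NOTES:
        h = encode(tokens)
        for c in CODES:
            p = min(max(code_probability(h, c), 1e-12), 1.0 - 1e-12)
            y = 1.0 if c in codes else 0.0
            total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(NOTES)


def pretrain(epochs=500, lr=0.2):
    """Multi-label billing code prediction trained with per-note SGD."""
    for _ in range(epochs):
        for tokens, codes in NOTES:
            h = encode(tokens)
            grad_h = [0.0] * DIM
            for c in CODES:
                # Gradient of the cross-entropy with respect to the logit.
                err = code_probability(h, c) - (1.0 if c in codes else 0.0)
                for i in range(DIM):
                    grad_h[i] += err * head_w[c][i]
                    head_w[c][i] -= lr * err * h[i]
                head_b[c] -= lr * err
            # Backpropagate through the average into each token embedding.
            for t in tokens:
                for i in range(DIM):
                    emb[t][i] -= lr * grad_h[i] / len(tokens)


loss_before = bce_loss()
pretrain()
loss_after = bce_loss()

# After pretraining, the head is discarded and encode() is reused unchanged
# as a feature extractor for a downstream phenotyping classifier.
features = encode(["alcohol", "tremor"])
```

In this sketch, a universal encoder would correspond to pretraining on all available billing codes, while a phenotype-specific encoder would restrict `CODES` to those related to the target phenotype.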

Keywords:  biomedical informatics; natural language processing; phenotyping

Year:  2019        PMID: 31233140      PMCID: PMC6798566          DOI: 10.1093/jamia/ocz072

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


  10 in total

1.  Catastrophic forgetting in connectionist networks.

Authors:  R M French
Journal:  Trends Cogn Sci       Date:  1999-04       Impact factor: 20.229

2.  2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text.

Authors:  Özlem Uzuner; Brett R South; Shuying Shen; Scott L DuVall
Journal:  J Am Med Inform Assoc       Date:  2011-06-16       Impact factor: 4.497

3.  Deepr: A Convolutional Net for Medical Records.

Authors:  Phuoc Nguyen; Truyen Tran; Nilmini Wickramasinghe; Svetha Venkatesh
Journal:  IEEE J Biomed Health Inform       Date:  2016-12-01       Impact factor: 5.772

4.  Doctor AI: Predicting Clinical Events via Recurrent Neural Networks.

Authors:  Edward Choi; Mohammad Taha Bahadori; Andy Schuetz; Walter F Stewart; Jimeng Sun
Journal:  JMLR Workshop Conf Proc       Date:  2016-12-10

5.  Development of the Alcohol Use Disorders Identification Test (AUDIT): WHO Collaborative Project on Early Detection of Persons with Harmful Alcohol Consumption--II.

Authors:  J B Saunders; O G Aasland; T F Babor; J R de la Fuente; M Grant
Journal:  Addiction       Date:  1993-06       Impact factor: 6.526

6.  Natural language processing and machine learning to identify alcohol misuse from the electronic health record in trauma patients: development and internal validation.

Authors:  Majid Afshar; Andrew Phillips; Niranjan Karnik; Jeanne Mueller; Daniel To; Richard Gonzalez; Ron Price; Richard Cooper; Cara Joyce; Dmitriy Dligach
Journal:  J Am Med Inform Assoc       Date:  2019-03-01       Impact factor: 4.497

Review 7.  Systematic review of comorbidity indices for administrative data.

Authors:  Mansour T A Sharabiani; Paul Aylin; Alex Bottle
Journal:  Med Care       Date:  2012-12       Impact factor: 2.983

8.  Patient representation learning and interpretable evaluation using clinical notes.

Authors:  Madhumita Sushil; Simon Šuster; Kim Luyckx; Walter Daelemans
Journal:  J Biomed Inform       Date:  2018-07-03       Impact factor: 6.317

9.  Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records.

Authors:  Riccardo Miotto; Li Li; Brian A Kidd; Joel T Dudley
Journal:  Sci Rep       Date:  2016-05-17       Impact factor: 4.379

10.  MIMIC-III, a freely accessible critical care database.

Authors:  Alistair E W Johnson; Tom J Pollard; Lu Shen; Li-Wei H Lehman; Mengling Feng; Mohammad Ghassemi; Benjamin Moody; Peter Szolovits; Leo Anthony Celi; Roger G Mark
Journal:  Sci Data       Date:  2016-05-24       Impact factor: 6.444

  6 in total

Review 1.  A scoping review of ethics considerations in clinical natural language processing.

Authors:  Oliver J Bear Don't Walk; Harry Reyes Nieva; Sandra Soo-Jin Lee; Noémie Elhadad
Journal:  JAMIA Open       Date:  2022-05-26

2.  Using Natural Language Processing and Machine Learning to Identify Hospitalized Patients with Opioid Use Disorder.

Authors:  Suzanne V Blackley; Erin MacPhaul; Bianca Martin; Wenyu Song; Joji Suzuki; Li Zhou
Journal:  AMIA Annu Symp Proc       Date:  2021-01-25

3.  Pre-training phenotyping classifiers.

Authors:  Dmitriy Dligach; Majid Afshar; Timothy Miller
Journal:  J Biomed Inform       Date:  2020-11-28       Impact factor: 6.317

4.  External validation of an opioid misuse machine learning classifier in hospitalized adult patients.

Authors:  Majid Afshar; Brihat Sharma; Sameer Bhalla; Hale M Thompson; Dmitriy Dligach; Randy A Boley; Ekta Kishen; Alan Simmons; Kathryn Perticone; Niranjan S Karnik
Journal:  Addict Sci Clin Pract       Date:  2021-03-17

5.  Improving the Performance of Outcome Prediction for Inpatients With Acute Myocardial Infarction Based on Embedding Representation Learned From Electronic Medical Records: Development and Validation Study.

Authors:  Yanqun Huang; Zhimin Zheng; Moxuan Ma; Xin Xin; Honglei Liu; Xiaolu Fei; Lan Wei; Hui Chen
Journal:  J Med Internet Res       Date:  2022-08-03       Impact factor: 7.076

6.  Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients.

Authors:  Brihat Sharma; Dmitriy Dligach; Kristin Swope; Elizabeth Salisbury-Afshar; Niranjan S Karnik; Cara Joyce; Majid Afshar
Journal:  BMC Med Inform Decis Mak       Date:  2020-04-29       Impact factor: 3.298
