Domain specific word embeddings for natural language processing in radiology.

Timothy L Chen, Max Emerling, Gunvant R Chaudhari, Yeshwant R Chillakuru, Youngho Seo, Thienkhai H Vu, Jae Ho Sohn.

Abstract

BACKGROUND: There has been increasing interest in machine learning-based natural language processing (NLP) methods in radiology; however, models have often used word embeddings trained on general web corpora because of the lack of a radiology-specific corpus.
PURPOSE: We examined the potential of Radiopaedia to serve as a general radiology corpus for producing radiology-specific word embeddings that could enhance performance on an NLP task on radiological text.
MATERIALS AND METHODS: Embeddings of dimension 50, 100, 200, and 300 were trained with the GloVe algorithm on articles collected from Radiopaedia and evaluated on analogy completion. A shallow neural network taking as input either our trained embeddings or pre-trained Wikipedia 2014 + Gigaword 5 (WG) embeddings was used to label the Radiopaedia articles. Labeling performance was evaluated by exact match accuracy and Hamming loss. McNemar's test with continuity correction, the Benjamini-Hochberg correction, and a 5×2 cross-validation paired two-tailed t-test were used to assess statistical significance.
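For reference, the objective minimized by the standard GloVe algorithm (Pennington et al., 2014) is reproduced below. Here X_ij counts how often word j occurs in the context of word i, w_i and w̃_j are the d-dimensional word and context vectors (d = 50, 100, 200, or 300 in this study), and b_i, b̃_j are bias terms. The abstract does not report the weighting parameters used for the Radiopaedia embeddings; x_max = 100 and α = 3/4 are the defaults from the original GloVe paper and are an assumption here, not necessarily this study's settings.

```latex
J = \sum_{i,j=1}^{V} f\!\left(X_{ij}\right)
    \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^{2},
\qquad
f(x) =
\begin{cases}
  \left( x / x_{\max} \right)^{\alpha}, & x < x_{\max} \\
  1, & \text{otherwise}
\end{cases}
```

The weighting function f down-weights rare co-occurrences and caps the influence of very frequent ones; pairs with X_ij = 0 contribute nothing, so the sum runs only over observed co-occurrences in practice.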
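Analogy completion asks questions such as "kidney is to renal as liver is to ?" (the organ-adjective subcategory). A common way to answer them with word embeddings is the vector-offset rule (often called 3CosAdd), sketched below with the Python standard library. The abstract does not state which scoring rule the authors used, so treat this as an assumption; the function names and the 3-D toy vectors are hypothetical illustrations, not the Radiopaedia or WG embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def complete_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset (3CosAdd) rule.

    Every vocabulary word except the three query words is scored by its cosine
    similarity to b - a + c; the highest-scoring word is returned.
    """
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, expected_d) questions answered exactly right."""
    hits = sum(complete_analogy(emb, a, b, c) == d for a, b, c, d in questions)
    return hits / len(questions)


# Hypothetical 3-D toy vectors (axes: "kidney-ness", "liver-ness", "adjective-ness");
# these are NOT the embeddings evaluated in the study.
toy = {
    "kidney": [1.0, 0.0, 0.0],
    "renal": [1.0, 0.0, 1.0],
    "liver": [0.0, 1.0, 0.0],
    "hepatic": [0.0, 1.0, 1.0],
    "lung": [0.5, 0.5, 0.0],
}

print(complete_analogy(toy, "kidney", "renal", "liver"))  # hepatic
```

Per-subcategory accuracy, as reported in the results, is then just `analogy_accuracy` applied to each subcategory's question list.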
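The two labeling metrics treat each article's labels as a binary indicator vector. Exact match accuracy (higher is better) counts an article as correct only if every label is right; Hamming loss (lower is better) is the fraction of individual article-label entries that are wrong. A minimal sketch under that standard definition follows; the function names and toy data are hypothetical.

```python
def exact_match_accuracy(y_true, y_pred):
    """Share of samples whose full predicted label vector equals the true one."""
    matches = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def hamming_loss(y_true, y_pred):
    """Share of individual (sample, label) entries that are predicted wrongly."""
    errors = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    total = sum(len(t) for t in y_true)
    return errors / total


# Hypothetical toy data: 3 articles x 3 labels, 1 = label present.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]

print(exact_match_accuracy(y_true, y_pred))  # 0.333... (1 of 3 rows fully correct)
print(hamming_loss(y_true, y_pred))          # 0.222... (2 of 9 entries wrong)
```

Exact match is the stricter of the two: one wrong label out of many makes the whole article count as a miss, whereas Hamming loss gives partial credit. scikit-learn offers equivalents (`sklearn.metrics.accuracy_score` computes subset accuracy on multilabel indicator input, and `sklearn.metrics.hamming_loss`).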
RESULTS: In the analogy task, 50-dimensional (50-D) Radiopaedia embeddings were more accurate than WG embeddings on tumor origin analogies (p < 0.05) and organ adjectives (p < 0.01), whereas WG embeddings tended to be more accurate on inflammation location and bone vs. muscle analogies (p < 0.01). The two embeddings performed comparably on the other subcategories. In the labeling task, the Radiopaedia-based model outperformed the WG-based model at 50, 100, 200, and 300-D in both exact match accuracy (p < 0.001, p < 0.001, p < 0.01, and p < 0.05, respectively) and Hamming loss (p < 0.001, p < 0.001, p < 0.01, and p < 0.05, respectively).
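McNemar's test compares two models evaluated on the same items using only the items where they disagree, and the Benjamini-Hochberg procedure adjusts a family of p-values (for example, one per analogy subcategory) to control the false discovery rate. The standard-library sketch below assumes per-item correctness lists as input; the abstract does not say exactly how the tests were paired with each comparison, and the function names and toy data are hypothetical.

```python
import math


def mcnemar_cc(correct_a, correct_b):
    """McNemar's test with continuity correction on paired per-item outcomes.

    correct_a[i] / correct_b[i] say whether model A / model B got item i right.
    Returns (chi-square statistic, two-sided p-value with 1 degree of freedom).
    """
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)  # only A right
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)  # only B right
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree: no evidence of a difference
    # Continuity-corrected statistic, floored at 0 so b == c gives exactly 0.
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy outcomes: A alone is right on 10 items, B alone on 2, both on 20.
correct_a = [True] * 10 + [False] * 2 + [True] * 20
correct_b = [False] * 10 + [True] * 2 + [True] * 20

stat, p = mcnemar_cc(correct_a, correct_b)  # stat = 49/12, p roughly 0.043
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```

Only the discordant counts b and c enter the statistic, which is why McNemar's test suits paired comparisons such as two embeddings answering the same analogy questions.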
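The 5×2 cross-validation paired t-test (Dietterich, 1998) runs 2-fold cross-validation five times and combines the per-fold score differences between the two models into a statistic that follows a t distribution with 5 degrees of freedom under the null hypothesis. The sketch below assumes the per-fold differences (for example, in exact match accuracy or Hamming loss) have already been computed; the names and toy numbers are hypothetical, and the p-value uses the closed-form Student's t distribution for 5 degrees of freedom so that only the standard library is needed.

```python
import math


def five_by_two_cv_t(diffs):
    """Dietterich's 5x2cv paired t statistic.

    diffs holds five (d1, d2) pairs: for each of 5 replications of 2-fold
    cross-validation, the score difference between the two models on fold 1
    and on fold 2.
    """
    if len(diffs) != 5:
        raise ValueError("expected five replications of 2-fold CV")
    variances = []
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        variances.append((d1 - mean) ** 2 + (d2 - mean) ** 2)
    # The numerator is the fold-1 difference of the first replication only.
    return diffs[0][0] / math.sqrt(sum(variances) / 5)


def t5_two_tailed_p(t):
    """Two-tailed p-value for Student's t with 5 degrees of freedom (closed form)."""
    theta = math.atan(abs(t) / math.sqrt(5))
    cos = math.cos(theta)
    # P(|T| < |t|) for odd degrees of freedom, specialized to nu = 5.
    inside = (2 / math.pi) * (theta + math.sin(theta) * (cos + (2 / 3) * cos ** 3))
    return 1.0 - inside


# Hypothetical per-fold differences (model A minus model B).
diffs = [(0.04, 0.02)] + [(0.03, 0.01)] * 4

t = five_by_two_cv_t(diffs)  # 2 * sqrt(2), about 2.83
p = t5_two_tailed_p(t)       # roughly 0.037
```

Using only one difference in the numerator but all ten in the variance estimate is Dietterich's design choice for keeping the numerator and denominator approximately independent; if SciPy is available, `scipy.stats.t.sf(abs(t), 5) * 2` gives the same two-tailed p-value.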
CONCLUSION: We have developed a set of word embeddings from Radiopaedia and shown that they can preserve relevant medical semantics and augment performance on a radiology NLP task. Our results suggest that the cultivation of a radiology-specific corpus can benefit radiology NLP models in the future.
Copyright © 2020 Elsevier Inc. All rights reserved.

Keywords:  Analogy completion; Multi-label classification; Natural language processing; Word embeddings

Year:  2020        PMID: 33333323      PMCID: PMC7856086          DOI: 10.1016/j.jbi.2020.103665

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317

