A clinical text classification paradigm using weak supervision and deep representation.

Yanshan Wang, Sunghwan Sohn, Sijia Liu, Feichen Shen, Liwei Wang, Elizabeth J Atkinson, Shreyasee Amin, Hongfang Liu.

Abstract

BACKGROUND: Automatic clinical text classification is a natural language processing (NLP) technology that unlocks information embedded in clinical narratives. Machine learning approaches have been shown to be effective for clinical text classification tasks. However, a successful machine learning model usually requires extensive human effort to create labeled training data and to conduct feature engineering. In this study, we propose a clinical text classification paradigm that uses weak supervision and deep representation to reduce this human effort.
METHODS: We develop a rule-based NLP algorithm to automatically generate labels for the training data, and then use pre-trained word embeddings as deep representation features for training machine learning models. Because the machine learning models are trained on labels generated by the automatic NLP algorithm rather than by human annotators, this training process is called weak supervision. We evaluate the paradigm's effectiveness on two institutional case studies at Mayo Clinic, smoking status classification and proximal femur (hip) fracture classification, and on one case study using a public dataset, the i2b2 2006 smoking status classification shared task. We test four widely used machine learning models with this paradigm: Support Vector Machine (SVM), Random Forest (RF), Multilayer Perceptron Neural Networks (MLPNN), and Convolutional Neural Networks (CNN). Precision, recall, and F1 score are used as metrics to evaluate performance.
RESULTS: CNN achieves the best performance in both institutional tasks (F1 score: 0.92 for Mayo Clinic smoking status classification and 0.97 for fracture classification). We show that word embeddings significantly outperform tf-idf and topic modeling features in the paradigm, and that CNN captures additional patterns from the weak supervision compared to the rule-based NLP algorithms. We also observe two drawbacks of the proposed paradigm: CNN is more sensitive to the size of the training data, and the paradigm might not be effective for complex multiclass classification tasks.
CONCLUSION: By leveraging weak supervision and deep representation, the proposed clinical text classification paradigm could reduce the human effort of creating labeled training data and engineering features when applying machine learning to clinical text classification. The experiments validated the effectiveness of the paradigm on two institutional and one shared clinical text classification tasks.
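
The workflow described in METHODS (rule-generated labels, embedding features, a trained classifier, then precision/recall/F1) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the study: the keyword rules, the two-dimensional `TOY_EMBEDDINGS`, the example notes, and the hand-assigned `GOLD` labels are all made up, and a nearest-centroid classifier stands in for the SVM, RF, MLPNN, and CNN models the authors actually tested.

```python
import re

# Hypothetical keyword rules standing in for a rule-based NLP labeler.
SMOKING_TERMS = re.compile(r"\b(smokes|smoker|cigarettes|tobacco use)\b", re.I)
NEGATION_TERMS = re.compile(r"\b(denies|never|no|non)\b", re.I)


def rule_label(note):
    """Return a weak label from the rules: 1 (smoker) or 0 (non-smoker)."""
    mentioned = SMOKING_TERMS.search(note) is not None
    negated = NEGATION_TERMS.search(note) is not None
    return 1 if mentioned and not negated else 0


# Made-up 2-d vectors standing in for pre-trained word embeddings.
TOY_EMBEDDINGS = {
    "smokes": (1.0, 0.0), "smoker": (1.0, 0.1), "cigarettes": (0.9, 0.0),
    "tobacco": (0.8, 0.1), "pack": (0.7, 0.0),
    "denies": (0.0, 1.0), "never": (0.0, 0.9), "no": (0.1, 0.8),
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of 2-d vectors."""
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def embed(note):
    """Represent a note as the average embedding of its known words."""
    tokens = re.findall(r"[a-z]+", note.lower())
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    return mean_vector(vectors) if vectors else (0.0, 0.0)


def train_centroids(vectors, labels):
    """Nearest-centroid 'training': one mean vector per weak label."""
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: mean_vector(vs) for label, vs in grouped.items()}


def predict(vector, centroids):
    """Assign the label whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return (vector[0] - c[0]) ** 2 + (vector[1] - c[1]) ** 2
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def precision_recall_f1(gold, pred, positive=1):
    """Precision, recall, and F1 for the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Weak supervision: the training labels come from the rules, not from humans.
TRAIN_NOTES = [
    "Patient smokes one pack of cigarettes daily",
    "Current smoker with heavy tobacco use",
    "Patient denies tobacco use",
    "Never smoker, no cigarettes",
]
weak_labels = [rule_label(note) for note in TRAIN_NOTES]
centroids = train_centroids([embed(note) for note in TRAIN_NOTES], weak_labels)

# Evaluation against hypothetical hand-assigned labels.
TEST_NOTES = ["One pack daily", "Denies cigarettes",
              "Smokes cigarettes", "Never used tobacco"]
GOLD = [1, 0, 1, 0]
model_pred = [predict(embed(note), centroids) for note in TEST_NOTES]
rule_pred = [rule_label(note) for note in TEST_NOTES]
print("model (P, R, F1):", precision_recall_f1(GOLD, model_pred))
print("rules (P, R, F1):", precision_recall_f1(GOLD, rule_pred))
```

The toy data are deliberately contrived so that the note "One pack daily", which contains none of the rule keywords, is still classified as a smoker because the vector for "pack" lies near the smoker centroid. This only illustrates how a model trained on weak labels could pick up patterns the rules miss; it says nothing about how large that effect is on real clinical notes.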

Keywords:  Clinical text classification; Electronic health records; Machine learning; Natural language processing; Weak supervision

Year: 2019; PMID: 30616584; PMCID: PMC6322223; DOI: 10.1186/s12911-018-0723-6

Source DB: PubMed; Journal: BMC Med Inform Decis Mak; ISSN: 1472-6947; Impact factor: 2.796


