Expert guided natural language processing using one-class classification.

Erel Joffe, Emily J Pettigrew, Jorge R Herskovic, Charles F Bearden, Elmer V Bernstam.

Abstract

INTRODUCTION: Automatically identifying specific phenotypes in free-text clinical notes is critically important for the reuse of clinical data. In this study, the authors combine expert-guided feature (text) selection with one-class classification for text processing.
OBJECTIVES: To compare the performance of one-class classification to traditional binary classification; to evaluate the utility of feature selection based on expert-selected salient text (snippets); and to determine the robustness of these models with respect to irrelevant surrounding text.
METHODS: The authors trained one-class support vector machines (1C-SVMs) and two-class SVMs (2C-SVMs) to identify notes discussing breast cancer. Manually annotated visit summary notes (88 positive and 88 negative for breast cancer) were used to compare the performance of models trained on whole notes labeled as positive or negative to models trained on expert-selected text sections (snippets) relevant to breast cancer status. Model performance was evaluated using a 70:30 split for 20 iterations and on a realistic dataset of 10 000 records with a breast cancer prevalence of 1.4%.
RESULTS: When tested on a balanced experimental dataset, 1C-SVMs trained on snippets had comparable results to 2C-SVMs trained on whole notes (F = 0.92 for both approaches). When evaluated on a realistic imbalanced dataset, 1C-SVMs performed considerably better (F = 0.61 vs. F = 0.17 for the best performing model), attributable mainly to improved precision (precision = 0.88 vs. 0.09 for the best performing model).
CONCLUSIONS: 1C-SVMs trained on expert-selected relevant text sections perform better than 2C-SVM classifiers trained on either snippets or whole notes when applied to realistically imbalanced data with low prevalence of the positive class.
© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
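
The gap between the balanced and the realistic results is partly a base-rate effect: at a fixed sensitivity and specificity, precision falls sharply as prevalence drops. The sketch below illustrates this with Bayes' rule. The sensitivity and specificity values are hypothetical and are not taken from the paper; only the two prevalence settings (a balanced set and 1.4%) mirror the abstract. The function names are illustrative as well.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value (precision) from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical operating point, NOT reported in the paper.
SENSITIVITY = 0.95
SPECIFICITY = 0.90

# Prevalence settings mirroring the abstract: balanced vs. 1.4%.
for label, prevalence in [("balanced (50%)", 0.5), ("realistic (1.4%)", 0.014)]:
    p = precision_at_prevalence(SENSITIVITY, SPECIFICITY, prevalence)
    f = f_measure(p, SENSITIVITY)
    print(f"{label}: precision={p:.2f}, F={f:.2f}")
```

Under these assumed values, the same classifier that looks strong on balanced data loses most of its precision at 1.4% prevalence, because false positives from the large negative class outnumber the true positives.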

Keywords:  feature selection; natural language processing; novelty detection; one class classification

Year:  2015        PMID: 26063744      PMCID: PMC4986669          DOI: 10.1093/jamia/ocv010

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


