Unsupervised biomedical named entity recognition: experiments with clinical and biological texts.

Shaodian Zhang, Noémie Elhadad.

Abstract

Named entity recognition is a crucial component of biomedical natural language processing, enabling information extraction and ultimately reasoning over and knowledge discovery from text. Much progress has been made in the design of rule-based and supervised tools, but they are often genre and task dependent. As such, adapting them to different genres of text or identifying new types of entities requires major effort in re-annotation or rule development. In this paper, we propose an unsupervised approach to extracting named entities from biomedical text. We describe a stepwise solution to tackle the challenges of entity boundary detection and entity type classification without relying on any handcrafted rules, heuristics, or annotated data. A noun phrase chunker followed by a filter based on inverse document frequency extracts candidate entities from free text. Classification of candidate entities into categories of interest is carried out by leveraging principles from distributional semantics. Experiments show that our system, especially the entity classification step, yields competitive results on two popular biomedical datasets of clinical notes and biological literature, and outperforms a baseline dictionary match approach. Detailed error analysis provides a road map for future work.
Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
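
The two steps named in the abstract (candidate filtering by inverse document frequency, then classification by distributional similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the smoothed IDF formula, the mean-IDF threshold rule, the two-token context window, the seed-term signatures, and every function name below are assumptions chosen for illustration, and the noun phrase chunker is replaced by a hand-supplied list of single-token candidates.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Map each token to a smoothed IDF over a list of tokenized documents."""
    n_docs = len(documents)
    doc_freq = Counter(tok for doc in documents for tok in set(doc))
    # Smoothing (+1) is an assumption; it keeps the ratio finite and >= 1.
    return {tok: math.log((1 + n_docs) / (1 + df)) for tok, df in doc_freq.items()}


def filter_candidates(phrases, idf, threshold):
    """Keep phrases whose mean token IDF reaches the threshold (hypothetical rule)."""
    kept = []
    for phrase in phrases:
        tokens = phrase.split()
        score = sum(idf.get(tok, 0.0) for tok in tokens) / len(tokens)
        if score >= threshold:
            kept.append(phrase)
    return kept


def context_vector(target, documents, window=2):
    """Count tokens within `window` positions of each occurrence of a single-token target."""
    counts = Counter()
    for doc in documents:
        for i, tok in enumerate(doc):
            if tok == target:
                left = doc[max(0, i - window):i]
                right = doc[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(candidate, seeds_by_class, documents):
    """Assign the class whose seed-term signature is most similar to the candidate's contexts."""
    candidate_vec = context_vector(candidate, documents)
    scores = {}
    for label, seeds in seeds_by_class.items():
        signature = Counter()
        for seed in seeds:
            signature.update(context_vector(seed, documents))
        scores[label] = cosine(candidate_vec, signature)
    return max(scores, key=scores.get)


# Toy corpus and seed terms, invented for this sketch.
docs = [
    "patient denies chest pain today".split(),
    "patient denies nausea today".split(),
    "patient started aspirin daily".split(),
    "patient started metformin daily".split(),
]
seeds = {"problem": ["pain"], "treatment": ["aspirin"]}

idf = inverse_document_frequency(docs)
candidates = filter_candidates(["patient", "nausea", "metformin"], idf, threshold=0.5)
labels = {c: classify(c, seeds, docs) for c in candidates}
print(labels)
```

On this toy corpus the word "patient" occurs in every document, so its IDF is zero and the filter drops it, while "nausea" and "metformin" are labeled by the seed term whose neighboring words they share. A real system would need multi-token phrase handling and a far larger corpus for the context counts to be meaningful.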

Keywords:  Chunking; Distributional semantics; Named entity recognition; Natural language processing; UMLS

Year: 2013; PMID: 23954592; PMCID: PMC3865922; DOI: 10.1016/j.jbi.2013.08.004

Source DB: PubMed; Journal: J Biomed Inform; ISSN: 1532-0464; Impact factor: 6.317


