Mining terminological knowledge in large biomedical corpora.

Hongfang Liu, Carol Friedman.

Abstract

Terminological knowledge of the biomedical domain is important for natural language processing (NLP) and information retrieval (IR) applications, and a number of terminological knowledge sources, such as LocusLink, GenBank, and the UMLS, already exist. However, because of the tremendous amount of research activity in the field, new terms and symbols are continually being created; many of them are published in the literature but are not available in any of these resources. Effective mining of the literature for new terminology is therefore critical for furthering NLP and IR applications. Abbreviations are widely used in the biomedical domain, and understanding them requires a terminological knowledge base that consists of abbreviations with their associated senses. In previous work, several methods have been developed for the automatic construction of abbreviation knowledge bases from parenthetical expressions. However, these methods pair abbreviations with their expansions using manually crafted patterns or rules. In this paper, we propose an automatic method that is based not on patterns or rules but on collocations. It extracts sets of related terms from parenthetical expressions, including abbreviations associated with their expansions and other types of related terms such as synonyms and hyponyms. Our method is based on the observation that terms associated with parenthetical expressions (i) are usually related, and (ii) are often collocations, because they tend to co-occur more often than expected by chance. Our method was applied to the collection of MEDLINE abstracts. The method and the results were evaluated using two collections: Berman's handcrafted abbreviation list and the LocusLink collection.
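
The collocation idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which association statistic the authors use, so the sketch below swaps in pointwise mutual information (PMI) as an assumed stand-in. The names `candidate_pairs`, `score_pairs`, `MAX_WORDS`, and `min_count` are hypothetical, as is the choice to take the last one to three words before the opening parenthesis as candidate outer terms. It should be read as an illustration of scoring parenthetical pairs by co-occurrence, not as the authors' method.

```python
import math
import re
from collections import Counter

PAREN = re.compile(r"\(([^()]+)\)")   # a non-nested parenthetical expression
MAX_WORDS = 3                         # assumed cap on candidate term length


def candidate_pairs(text, max_words=MAX_WORDS):
    """Yield (outer, inner) pairs for each parenthetical expression.

    `inner` is the text inside the parentheses; `outer` is each word n-gram
    (1..max_words words) that ends immediately before the opening parenthesis.
    """
    for match in PAREN.finditer(text):
        inner = match.group(1).strip().lower()
        words = re.findall(r"[\w-]+", text[:match.start()].lower())
        for n in range(1, min(max_words, len(words)) + 1):
            yield " ".join(words[-n:]), inner


def score_pairs(abstracts, min_count=2):
    """Score candidate pairs by PMI (an assumed statistic, see lead-in).

    PMI = log( c(outer, inner) * N / (c(outer) * c(inner)) ), where the counts
    are taken over all candidate pairs and N is the total number of pairs.
    Pairs seen fewer than `min_count` times are dropped.
    """
    pair_counts, outer_counts, inner_counts = Counter(), Counter(), Counter()
    for text in abstracts:
        for outer, inner in candidate_pairs(text):
            pair_counts[(outer, inner)] += 1
            outer_counts[outer] += 1
            inner_counts[inner] += 1

    total = sum(pair_counts.values())
    scores = {}
    for (outer, inner), count in pair_counts.items():
        if count >= min_count:
            expected = outer_counts[outer] * inner_counts[inner]
            scores[(outer, inner)] = math.log(count * total / expected)
    return scores


if __name__ == "__main__":
    toy_corpus = [
        "The polymerase chain reaction (PCR) was used.",
        "We ran polymerase chain reaction (PCR) twice.",
        "A chain reaction (explosion) occurred.",
    ]
    ranked = sorted(score_pairs(toy_corpus).items(), key=lambda kv: -kv[1])
    for pair, score in ranked:
        print(pair, round(score, 3))
```

In this toy corpus the shorter candidates "reaction" and "chain reaction" also precede a different parenthetical term, so their association with "pcr" is no stronger than chance, while the full expansion scores higher. A real run over MEDLINE would need sentence splitting and a more robust statistic for rare pairs, both of which are omitted here.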

Year:  2003        PMID: 12603046

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


  13 in total

1.  Identification of related gene/protein names based on an HMM of name variations.

Authors:  L Yeganova; L Smith; W J Wilbur
Journal:  Comput Biol Chem       Date:  2004-04       Impact factor: 2.877

2.  Improved identification of noun phrases in clinical radiology reports using a high-performance statistical natural language parser augmented with the UMLS specialist lexicon.

Authors:  Yang Huang; Henry J Lowe; Dan Klein; Russell J Cucina
Journal:  J Am Med Inform Assoc       Date:  2005-01-31       Impact factor: 4.497

3.  Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts.

Authors:  A M Cohen; W R Hersh; C Dubay; K Spackman
Journal:  BMC Bioinformatics       Date:  2005-04-22       Impact factor: 3.169

4.  ALICE: an algorithm to extract abbreviations from MEDLINE.

Authors:  Hiroko Ao; Toshihisa Takagi
Journal:  J Am Med Inform Assoc       Date:  2005-05-19       Impact factor: 4.497

5.  Enhancing acronym/abbreviation knowledge bases with semantic information.

Authors:  Manabu Torii; Hongfang Liu
Journal:  AMIA Annu Symp Proc       Date:  2007-10-11

6.  Building a high-quality sense inventory for improved abbreviation disambiguation.

Authors:  Naoaki Okazaki; Sophia Ananiadou; Jun'ichi Tsujii
Journal:  Bioinformatics       Date:  2010-03-25       Impact factor: 6.937

7.  Ambiguity of human gene symbols in LocusLink and MEDLINE: creating an inventory and a disambiguation test collection.

Authors:  Marc Weeber; Bob J Schijvenaars; Erik M Van Mulligen; Barend Mons; Rob Jelier; Christian C Van Der Eijk; Jan A Kors
Journal:  AMIA Annu Symp Proc       Date:  2003

8.  Recent advances in biomedical literature mining (review).

Authors:  Sendong Zhao; Chang Su; Zhiyong Lu; Fei Wang
Journal:  Brief Bioinform       Date:  2021-05-20       Impact factor: 11.622

9.  A comparison study on algorithms of detecting long forms for short forms in biomedical text.

Authors:  Manabu Torii; Zhang-zhi Hu; Min Song; Cathy H Wu; Hongfang Liu
Journal:  BMC Bioinformatics       Date:  2007-11-27       Impact factor: 3.169

10.  Facilitating the development of controlled vocabularies for metabolomics technologies with text mining.

Authors:  Irena Spasić; Daniel Schober; Susanna-Assunta Sansone; Dietrich Rebholz-Schuhmann; Douglas B Kell; Norman W Paton
Journal:  BMC Bioinformatics       Date:  2008-04-29       Impact factor: 3.169
