
MBA: a literature mining system for extracting biomedical abbreviations.

Yun Xu, ZhiHao Wang, YiMing Lei, YuZhong Zhao, Yu Xue.

Abstract

BACKGROUND: The explosive growth of the biomedical literature presents many challenges for biological researchers. One such challenge is the heavy use of abbreviations. Extracting abbreviations and their definitions accurately is very helpful to biologists and also facilitates biomedical text analysis. Existing approaches fall into four broad categories: rule-based, machine-learning-based, text-alignment-based and statistics-based. State-of-the-art methods either focus exclusively on acronym-type abbreviations or cannot recognize rare abbreviations. We propose a systematic method to extract abbreviations effectively. First, a scoring method is used to classify abbreviations into acronym-type and non-acronym-type abbreviations; their corresponding definitions are then identified by two different methods: a text alignment algorithm for the former and a statistical method for the latter.
RESULTS: A literature mining system, MBA, was constructed to extract both acronym-type and non-acronym-type abbreviations. An abbreviation-tagged literature corpus, the Medstract gold standard corpus, was used to evaluate the system. MBA achieved a recall of 88% at a precision of 91% on the Medstract gold standard evaluation corpus.
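For readers who want a single summary figure, the reported precision and recall can be combined into an F-measure. The abstract itself does not report one, so the value below is only arithmetic on the two published numbers, and the helper name `f_measure` is an assumption made for this sketch.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported for MBA on the Medstract corpus.
f1 = f_measure(precision=0.91, recall=0.88)
print(round(f1, 3))  # 0.895
```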
CONCLUSION: We present a new literature mining system, MBA, for extracting biomedical abbreviations. Our evaluation demonstrates that MBA performs better than other systems. It can identify the definitions not only of acronym-type abbreviations, including slightly irregular ones (e.g., <CNS1, cyclophilin seven suppressor>), but also of non-acronym-type abbreviations (e.g., <Fas, CD95>).
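The alignment idea behind acronym-type extraction can be illustrated with a small sketch. This is not the MBA algorithm itself, whose scoring scheme the abstract does not give. It is a simplified dynamic-programming alignment that assumes digits in the abbreviation are ignored, that word-initial matches are the only ones rewarded, and that the shortest matching window of preceding words is taken as the definition. The names `alignment_score`, `find_definition`, and `max_extra` are hypothetical.

```python
from functools import lru_cache
from typing import List, Optional


def alignment_score(abbr: str, words: List[str]) -> Optional[int]:
    """Align the letters of abbr, in order, to characters of words.

    Returns how many letters land on a word-initial position in the best
    alignment, or None when the letters cannot all be matched in order.
    Assumption of this sketch: non-letters (e.g. the "1" in CNS1) are ignored.
    """
    letters = [c for c in abbr.lower() if c.isalpha()]
    # Flatten the candidate words into (character, is_word_start) pairs.
    chars = [(c, i == 0) for w in words for i, c in enumerate(w.lower())]

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> float:
        if i == len(letters):      # every letter has been placed
            return 0
        if j == len(chars):        # text exhausted with letters left over
            return float("-inf")
        score = best(i, j + 1)     # option 1: skip this text character
        c, is_start = chars[j]
        if c == letters[i]:        # option 2: match the letter here
            score = max(score, (1 if is_start else 0) + best(i + 1, j + 1))
        return score

    result = best(0, 0)
    return None if result == float("-inf") else int(result)


def find_definition(abbr: str, preceding: List[str], max_extra: int = 2) -> Optional[str]:
    """Return the shortest run of words before the abbreviation that aligns."""
    limit = min(len(preceding), len(abbr) + max_extra)
    for n in range(1, limit + 1):
        window = preceding[-n:]
        # Require the definition to begin with the abbreviation's first letter.
        if window[0][:1].lower() != abbr[:1].lower():
            continue
        if alignment_score(abbr, window) is not None:
            return " ".join(window)
    return None


context = "a gene named cyclophilin seven suppressor".split()
print(find_definition("CNS1", context))    # cyclophilin seven suppressor
print(alignment_score("Fas", ["CD95"]))    # None
```

The second call shows why a pair such as <Fas, CD95> cannot be recovered by letter alignment: the two strings share no characters, so a different (for example statistical) method is needed for non-acronym-type abbreviations.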

Year:  2009        PMID: 19134199      PMCID: PMC2639376          DOI: 10.1186/1471-2105-10-14

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


  12 in total

1.  Mapping abbreviations to full forms in biomedical articles.

Authors:  Hong Yu; George Hripcsak; Carol Friedman
Journal:  J Am Med Inform Assoc       Date:  2002 May-Jun       Impact factor: 4.497

2.  Automatic extraction of acronym-meaning pairs from MEDLINE databases.

Authors:  J Pustejovsky; J Castaño; B Cochran; M Kotecki; M Morrell
Journal:  Stud Health Technol Inform       Date:  2001

3.  Creating an online dictionary of abbreviations from MEDLINE.

Authors:  Jeffrey T Chang; Hinrich Schütze; Russ B Altman
Journal:  J Am Med Inform Assoc       Date:  2002 Nov-Dec       Impact factor: 4.497

4.  Acronymesis: the exploding misuse of acronyms.

Authors:  Herbert L Fred; Tsung O Cheng
Journal:  Tex Heart Inst J       Date:  2003

Review 5.  A survey of current work in biomedical text mining.

Authors:  Aaron M Cohen; William R Hersh
Journal:  Brief Bioinform       Date:  2005-03       Impact factor: 11.622

6.  ALICE: an algorithm to extract abbreviations from MEDLINE.

Authors:  Hiroko Ao; Toshihisa Takagi
Journal:  J Am Med Inform Assoc       Date:  2005-05-19       Impact factor: 4.497

Review 7.  Literature mining for the biologist: from information retrieval to biological discovery.

Authors:  Lars Juhl Jensen; Jasmin Saric; Peer Bork
Journal:  Nat Rev Genet       Date:  2006-02       Impact factor: 53.242

8.  ADAM: another database of abbreviations in MEDLINE.

Authors:  Wei Zhou; Vetle I Torvik; Neil R Smalheiser
Journal:  Bioinformatics       Date:  2006-09-18       Impact factor: 6.937

9.  A general method applicable to the search for similarities in the amino acid sequence of two proteins.

Authors:  S B Needleman; C D Wunsch
Journal:  J Mol Biol       Date:  1970-03       Impact factor: 5.469

10.  A comparison study on algorithms of detecting long forms for short forms in biomedical text.

Authors:  Manabu Torii; Zhang-zhi Hu; Min Song; Cathy H Wu; Hongfang Liu
Journal:  BMC Bioinformatics       Date:  2007-11-27       Impact factor: 3.169

  4 in total

Review 1.  Recent progress in automatically extracting information from the pharmacogenomic literature.

Authors:  Yael Garten; Adrien Coulet; Russ B Altman
Journal:  Pharmacogenomics       Date:  2010-10       Impact factor: 2.533

2.  Rewriting and suppressing UMLS terms for improved biomedical term identification.

Authors:  Kristina M Hettne; Erik M van Mulligen; Martijn J Schuemie; Bob Ja Schijvenaars; Jan A Kors
Journal:  J Biomed Semantics       Date:  2010-03-31

3.  Discriminative application of string similarity methods to chemical and non-chemical names for biomedical abbreviation clustering.

Authors:  Atsuko Yamaguchi; Yasunori Yamamoto; Jin-Dong Kim; Toshihisa Takagi; Akinori Yonezawa
Journal:  BMC Genomics       Date:  2012-06-11       Impact factor: 3.969

4.  A gene-phenotype relationship extraction pipeline from the biomedical literature using a representation learning approach.

Authors:  Wenhui Xing; Junsheng Qi; Xiaohui Yuan; Lin Li; Xiaoyu Zhang; Yuhua Fu; Shengwu Xiong; Lun Hu; Jing Peng
Journal:  Bioinformatics       Date:  2018-07-01       Impact factor: 6.937
