MBA: a literature mining system for extracting biomedical abbreviations.

Yun Xu, ZhiHao Wang, YiMing Lei, YuZhong Zhao, Yu Xue.

Abstract

BACKGROUND: The explosive growth of the biomedical literature presents many challenges for biological researchers. One such challenge is the heavy use of abbreviations. Extracting abbreviations and their definitions accurately helps biologists and facilitates biomedical text analysis. Existing approaches fall into four broad categories: rule-based, machine-learning-based, text-alignment-based, and statistical. State-of-the-art methods either focus exclusively on acronym-type abbreviations or cannot recognize rare abbreviations. We propose a systematic method to extract abbreviations effectively. First, a scoring method classifies abbreviations as acronym-type or non-acronym-type; their definitions are then identified by two different methods: a text alignment algorithm for the former and a statistical method for the latter.
RESULTS: A literature mining system, MBA, was constructed to extract both acronym-type and non-acronym-type abbreviations. An abbreviation-tagged literature corpus, the Medstract gold standard corpus, was used to evaluate the system. MBA achieved a recall of 88% at a precision of 91% on the Medstract gold standard evaluation corpus.
CONCLUSION: We present a new literature mining system, MBA, for extracting biomedical abbreviations. Our evaluation demonstrates that MBA performs better than the other systems compared. It can identify the definitions not only of acronym-type abbreviations, including slightly irregular ones (e.g., <CNS1, cyclophilin seven suppressor>), but also of non-acronym-type abbreviations (e.g., <Fas, CD95>).
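
To make the distinction between acronym-type and non-acronym-type abbreviations concrete, here is a minimal sketch of character-level matching between a short form and a candidate long form. It is not the alignment algorithm used in MBA, which the abstract does not specify; it is a simple right-to-left matching heuristic in the style of Schwartz and Hearst's algorithm, and the function name `align_long_form` is hypothetical.

```python
from typing import Optional


def align_long_form(short_form: str, candidate: str) -> Optional[str]:
    """Return the shortest suffix of `candidate` whose characters cover
    `short_form` in order, or None if no such match exists.

    Illustrative heuristic only (an assumption, not the MBA method).
    """
    l_index = len(candidate) - 1
    # Walk the short form from its last character to its first.
    for s_index in range(len(short_form) - 1, -1, -1):
        c = short_form[s_index].lower()
        if not c.isalnum():
            continue  # ignore punctuation inside the short form
        # Move left in the candidate until the character matches.
        # The first short-form character must also start a word.
        while (l_index >= 0 and candidate[l_index].lower() != c) or (
            s_index == 0 and l_index > 0 and candidate[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None  # ran out of candidate text
        l_index -= 1
    # Extend left to the start of the word containing the first match.
    start = candidate.rfind(" ", 0, l_index + 1) + 1
    return candidate[start:]


print(align_long_form("HMM", "a hidden Markov model"))
print(align_long_form("CNS1", "cyclophilin seven suppressor"))
print(align_long_form("Fas", "CD95"))
```

Under this heuristic, "HMM" matches "hidden Markov model", while both pairs quoted in the abstract return None: "CNS1" contains a digit with no counterpart in its definition, and "Fas" shares no characters with "CD95". Pairs like these are the ones that plain character matching cannot resolve, which is consistent with the abstract's use of a separate statistical method for non-acronym-type abbreviations.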

Year: 2009 PMID: 19134199 PMCID: PMC2639376 DOI: 10.1186/1471-2105-10-14

Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
