
Heuristics for identification of acronym-definition patterns within text: towards an automated construction of comprehensive acronym-definition dictionaries.

J D Wren, H R Garner.

Abstract

OBJECTIVES: To develop an automated, accurate and scalable method by which acronym-definition pairs can be identified within text. Its primary advantage is in enabling information processing methods to resolve author-defined acronyms, but it also allows the automated creation of a reference work on acronym definitions. Beyond the time and effort saved, this has several advantages over manual or semi-automated methods, such as enabling identification of relative frequencies for alternate acronyms and definitions, as well as spelling, phrasing and hyphenation variants for a unique acronym-definition pair. It also aids users in identifying acronym/definition variants that are present in the literature but not necessarily in biomedical databases.
METHODS: A set of heuristics to accurately locate and identify the boundaries of acronym-definition pairs was developed and refined in terms of precision and recall on subsets of MEDLINE records. These training sets were gradually increased in size and heuristics re-evaluated to ensure scalability.
RESULTS: Our final set of Acronym Resolving General Heuristics (ARGH) had a sample-based estimated rate of 96.5 +/- 0.4% precision and 93.0 +/- 2.7% recall when tested on over 12 million MEDLINE records, identifying more than 174,000 unique acronyms and their 737,000 associated definitions.
CONCLUSIONS: We estimate that as much as 36% of the acronyms in MEDLINE are associated with more than one definition and, conversely, up to 10% of definitions are associated with more than one acronym. The number of unique acronyms in MEDLINE is increasing at a rate of approximately 11,000 per year, while the number of definitions associated with them is growing at approximately four times that rate. Access to the ARGH database is available online at http://lethargy.swmed.edu/ARGH/argh.asp. The heuristic module and database are available upon request.
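
The abstract does not spell out the individual heuristics, so the following is only a minimal sketch of one common pattern for this kind of task, not the ARGH rule set. It assumes a definition is immediately followed by its acronym in parentheses, and that each letter of the acronym matches the first letter of one preceding word. The names `PAREN` and `find_pairs` are hypothetical and chosen for illustration.

```python
import re

# Assumption: an acronym is a parenthesized token of 2-10 characters
# that starts with a letter, e.g. "(PCR)". This is not the ARGH pattern.
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def find_pairs(text):
    """Return (acronym, definition) pairs found in `text`.

    A pair is accepted only when the N words directly before "(ACRONYM)"
    begin with the acronym's N letters, in order (case-insensitive).
    """
    pairs = []
    for match in PAREN.finditer(text):
        acronym = match.group(1)
        letters = [c.lower() for c in acronym if c.isalpha()]
        # Alphabetic words that precede the opening parenthesis.
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(letters):]
        if len(candidate) == len(letters) and all(
            word[0].lower() == letter
            for word, letter in zip(candidate, letters)
        ):
            pairs.append((acronym, " ".join(candidate)))
    return pairs


sample = ("We used the polymerase chain reaction (PCR) and "
          "magnetic resonance imaging (MRI), as shown previously (see).")
print(find_pairs(sample))
```

A rule this strict would miss many real cases, such as acronyms that skip stop words or take several letters from one word, and it discards hyphens and punctuation when rebuilding the definition. Handling those variants is the kind of refinement against precision and recall that the METHODS paragraph describes.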

Year:  2002        PMID: 12501816

Source DB:  PubMed          Journal:  Methods Inf Med        ISSN: 0026-1270            Impact factor:   2.176

