Corpus-based statistical screening for phrase identification.

W Kim, W J Wilbur.

Abstract

PURPOSE: The authors study the extraction of useful phrases from a natural language database by statistical methods. The aim is to leverage human effort by providing preprocessed phrase lists with a high percentage of useful material.
METHOD: The approach is to develop six different scoring methods that are based on different aspects of phrase occurrence. The emphasis here is not on lexical information or syntactic structure but rather on the statistical properties of word pairs and triples that can be obtained from a large database.
MEASUREMENTS: The Unified Medical Language System (UMLS) incorporates a large list of humanly acceptable phrases in the medical field as a part of its structure. The authors use this list of phrases as a gold standard for validating their methods. A good method is one that ranks the UMLS phrases high among all phrases studied. Measurements are 11-point average precision values and precision-recall curves based on the rankings.
RESULT: The authors find that each of the six scoring methods proves effective in identifying UMLS-quality phrases in a large subset of MEDLINE. These methods are applicable both to word pairs and word triples. All six methods are optimally combined to produce composite scoring methods that are more effective than any single method. The quality of the composite methods appears sufficient to support the automatic placement of hyperlinks in text at the site of highly ranked phrases.
CONCLUSION: Statistical scoring methods provide a promising approach to the extraction of useful phrases from a natural language database for the purpose of indexing or providing hyperlinks in text.
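
The 11-point average precision named under MEASUREMENTS can be illustrated with a small sketch. This is not the authors' code: it assumes the standard interpolated definition from information retrieval (precision at recall levels 0.0, 0.1, ..., 1.0, each taken as the maximum precision at any recall at or above that level), and the function name, the toy phrases, and the toy gold standard are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def eleven_point_average_precision(ranked: Sequence[str], gold: Set[str]) -> float:
    """Interpolated 11-point average precision of a ranked phrase list.

    `ranked` is a list of candidate phrases ordered by descending score.
    `gold` is the set of phrases accepted by the gold standard
    (assumption: every gold phrase could in principle appear in `ranked`).
    """
    total_relevant = len(gold)
    if total_relevant == 0:
        raise ValueError("gold standard must not be empty")

    # Recall and precision after each rank position.
    hits = 0
    points: List[Tuple[float, float]] = []
    for rank, phrase in enumerate(ranked, start=1):
        if phrase in gold:
            hits += 1
        points.append((hits / total_relevant, hits / rank))

    # Interpolated precision at level r: best precision at any recall >= r.
    levels = [i / 10 for i in range(11)]
    interpolated = []
    for level in levels:
        candidates = [p for r, p in points if r >= level - 1e-12]
        interpolated.append(max(candidates) if candidates else 0.0)

    return sum(interpolated) / len(levels)


# Hypothetical usage: two gold phrases ranked 1st and 3rd among four candidates.
ranking = ["heart attack", "of the", "blood pressure", "in a"]
gold_standard = {"heart attack", "blood pressure"}
score = eleven_point_average_precision(ranking, gold_standard)
```

A scoring method that places every gold-standard phrase ahead of every other candidate would score 1.0 under this definition, which matches the abstract's criterion that a good method ranks the UMLS phrases high among all phrases studied.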

Year:  2000        PMID: 10984469      PMCID: PMC79045          DOI: 10.1136/jamia.2000.0070499

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


  8 in total

1.  Extracting noun phrases for all of MEDLINE.

Authors:  N A Bennett; Q He; K Powell; B R Schatz
Journal:  Proc AMIA Symp       Date:  1999

2.  UMLS-based conceptual queries to biomedical information databases: an overview of the project ARIANE. Unified Medical Language System.

Authors:  M Joubert; M Fieschi; J J Robert; F Volot; D Fieschi
Journal:  J Am Med Inform Assoc       Date:  1998 Jan-Feb       Impact factor: 4.497

3.  The Unified Medical Language System: an informatics research collaboration.

Authors:  B L Humphreys; D A Lindberg; H M Schoolman; G O Barnett
Journal:  J Am Med Inform Assoc       Date:  1998 Jan-Feb       Impact factor: 4.497

4.  An experiment comparing lexical and statistical methods for extracting MeSH terms from clinical free text.

Authors:  G F Cooper; R A Miller
Journal:  J Am Med Inform Assoc       Date:  1998 Jan-Feb       Impact factor: 4.497

5.  Empirical, automated vocabulary discovery using large text corpora and advanced natural language processing tools.

Authors:  W R Hersh; E H Campbell; D A Evans; N D Brownlow
Journal:  Proc AMIA Annu Fall Symp       Date:  1996

6.  The Unified Medical Language System.

Authors:  D A Lindberg; B L Humphreys; A T McCray
Journal:  Methods Inf Med       Date:  1993-08       Impact factor: 2.176

7.  Lexical methods for managing variation in biomedical terminologies.

Authors:  A T McCray; S Srinivasan; A C Browne
Journal:  Proc Annu Symp Comput Appl Med Care       Date:  1994

8.  Indexing consistency in MEDLINE.

Authors:  M E Funk; C A Reid
Journal:  Bull Med Libr Assoc       Date:  1983-04
  7 in total

1.  Finding UMLS Metathesaurus concepts in MEDLINE.

Authors:  Suresh Srinivasan; Thomas C Rindflesch; William T Hole; Alan R Aronson; James G Mork
Journal:  Proc AMIA Symp       Date:  2002

2.  Doublet method for very fast autocoding.

Authors:  Jules J Berman
Journal:  BMC Med Inform Decis Mak       Date:  2004-09-15       Impact factor: 2.796

3.  Identifying well-formed biomedical phrases in MEDLINE® text.

Authors:  Won Kim; Lana Yeganova; Donald C Comeau; W John Wilbur
Journal:  J Biomed Inform       Date:  2012-06-08       Impact factor: 6.317

4.  A strategy for assigning new concepts in the MEDLINE database.

Authors:  Won Kim; W John Wilbur
Journal:  AMIA Annu Symp Proc       Date:  2005

5.  Automatic extraction of candidate nomenclature terms using the doublet method.

Authors:  Jules J Berman
Journal:  BMC Med Inform Decis Mak       Date:  2005-10-18       Impact factor: 2.796

6.  Enhancing Comparative Effectiveness Research With Automated Pediatric Pneumonia Detection in a Multi-Institutional Clinical Repository: A PHIS+ Pilot Study.

Authors:  Stephane Meystre; Ramkiran Gouripeddi; Joel Tieder; Jeffrey Simmons; Rajendu Srivastava; Samir Shah
Journal:  J Med Internet Res       Date:  2017-05-15       Impact factor: 5.428

7.  PubMed Phrases, an open set of coherent phrases for searching biomedical literature.

Authors:  Sun Kim; Lana Yeganova; Donald C Comeau; W John Wilbur; Zhiyong Lu
Journal:  Sci Data       Date:  2018-06-12       Impact factor: 6.444

