
Comparing a Rule Based vs. Statistical System for Automatic Categorization of MEDLINE Documents According to Biomedical Specialty.

Susanne M Humphrey, Aurélie Névéol, Julien Gobeil, Patrick Ruch, Stéfan J Darmoni, Allen Browne.

Abstract

Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents with the Medical Subject Headings® (MeSH®) controlled vocabulary to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), which is based on human categorization of about 4,000 journals and on statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of categories assigned by humans to one hundred MEDLINE documents, using six measures selected from trec_eval. The results show that performance is comparable on five of the measures and that JDI is superior on the sixth. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and in maintaining a rule base that maps MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than with textwords; it may be worthwhile to investigate whether this statistical JDI method and the rule-based CISMeF could be combined, and to evaluate whether the two are complementary.
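To make the rule-based vs. statistical contrast concrete, here is a toy sketch of both ideas plus one ranked-retrieval measure. It is not the authors' implementation. Assumptions, all hypothetical: the rule base `MESH_TO_MT`, the training pairs in `TRAINING`, and the functions `assign_metaterms`, `train_word_vectors`, `rank_jds`, and `average_precision`. The statistical side assumes a simplified JDI-like scheme in which each word gets a vector of JD scores (the fraction of training documents containing the word whose journal carries that JD), a document's vector is the average of its words' vectors, and JDs are ranked by score. Average precision is shown only as an example of the kind of measure trec_eval computes; the abstract does not say which six measures were used.

```python
from collections import Counter, defaultdict

# --- Rule-based side (toy, in the spirit of CISMeF) ----------------------
# Hypothetical rule base mapping MeSH descriptors to metaterms (MTs).
MESH_TO_MT = {
    "Breast Neoplasms": "Oncology",
    "Heart Failure": "Cardiology",
    "Genetic Predisposition to Disease": "Genetics",
}

def assign_metaterms(mesh_terms, rules=MESH_TO_MT):
    """Return the sorted set of MTs triggered by a document's MeSH indexing."""
    return sorted({rules[t] for t in mesh_terms if t in rules})

# --- Statistical side (toy, in the spirit of JDI) -------------------------
# Hypothetical training data: (text, JDs assigned to the source journal).
TRAINING = [
    ("tumor growth chemotherapy response", ["Neoplasms"]),
    ("tumor gene expression", ["Neoplasms", "Genetics"]),
    ("gene mutation inheritance", ["Genetics"]),
    ("heart failure therapy", ["Cardiology"]),
]

def train_word_vectors(training):
    """Map each word to {JD: fraction of training docs containing the word
    whose journal carries that JD}."""
    doc_count = Counter()            # word -> number of docs containing it
    jd_count = defaultdict(Counter)  # word -> JD -> co-occurrence count
    for text, jds in training:
        for word in set(text.lower().split()):
            doc_count[word] += 1
            for jd in jds:
                jd_count[word][jd] += 1
    return {w: {jd: c / doc_count[w] for jd, c in jd_count[w].items()}
            for w in doc_count}

def rank_jds(text, word_vectors):
    """Average the JD vectors of known words; rank JDs by descending score."""
    words = [w for w in text.lower().split() if w in word_vectors]
    if not words:
        return []
    totals = Counter()
    for w in words:
        for jd, score in word_vectors[w].items():
            totals[jd] += score
    scores = {jd: s / len(words) for jd, s in totals.items()}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda jd: (-scores[jd], jd))

# --- Evaluation against a gold standard ----------------------------------
def average_precision(ranking, relevant):
    """Non-interpolated average precision of a ranking w.r.t. gold categories."""
    hits, total = 0, 0.0
    for i, category in enumerate(ranking, start=1):
        if category in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

if __name__ == "__main__":
    print(assign_metaterms(["Heart Failure", "Humans"]))  # ['Cardiology']
    vectors = train_word_vectors(TRAINING)
    ranking = rank_jds("tumor gene therapy", vectors)
    print(ranking)  # ['Genetics', 'Neoplasms', 'Cardiology']
    print(average_precision(ranking, {"Neoplasms", "Genetics"}))  # 1.0
```

The sketch mirrors the overhead argument in the abstract: the rule-based side needs both human MeSH indexing of each document and a hand-maintained rule base, whereas the statistical side needs only document text and journal-level labels.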


Year:  2009        PMID: 19956557      PMCID: PMC2782854          DOI: 10.1002/asi.21170

Source DB:  PubMed          Journal:  J Am Soc Inf Sci Technol        ISSN: 1532-2882


Cited by

1.  How are the different specialties represented in the major journals in general medicine?

Authors:  Jean-Francois Gehanno; Joel Ladner; Laetitia Rollin; Badisse Dahamna; Stefan J Darmoni
Journal:  BMC Med Inform Decis Mak       Date:  2011-01-21       Impact factor: 2.796

2.  Extracting laboratory test information from biomedical text.

Authors:  Yanna Shen Kang; Mehmet Kayaalp
Journal:  J Pathol Inform       Date:  2013-08-31
