Chi-square-based scoring function for categorization of MEDLINE citations.

A Kastrin, B Peterlin, D Hristovski.

Abstract

OBJECTIVES: Text categorization has been used in biomedical informatics to identify documents that contain topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood that a MEDLINE citation contains genetically relevant topics.
METHODS: Our procedure requires the construction of a genetic and a nongenetic domain document corpus. We used the MeSH descriptors assigned to MEDLINE citations for this categorization task. We compared the frequencies of MeSH descriptors between the two corpora using the chi-square test. A MeSH descriptor was considered a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all citations, with the highest scores given to citations containing MeSH descriptors typical of the genetic domain.
RESULTS: The method was validated on a set of 734 manually annotated MEDLINE citations. It achieved a predictive accuracy of 0.87, with recall of 0.69 and precision of 0.64. We evaluated the method by comparing it with three machine-learning algorithms (support vector machines, decision trees, naïve Bayes). The differences were not statistically significant, indicating that our chi-square scoring performs as well as the compared machine-learning algorithms.
CONCLUSIONS: We suggest that chi-square scoring is an effective way to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for the gene symbol disambiguation process.
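
The scoring idea described under METHODS can be illustrated with a minimal Python sketch. The abstract does not give the exact scoring formula, so several details here are assumptions: each descriptor is weighted by the Pearson chi-square statistic of a 2x2 table (corpus by descriptor presence), the weight is negated when the descriptor is not a positive indicator, and a citation's score is the plain sum of its descriptors' weights. The function names (chi_square_2x2, descriptor_weights, score_citation) are hypothetical and are not taken from BITOLA.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def descriptor_weights(genetic_docs, nongenetic_docs):
    """Return a signed chi-square weight for every MeSH descriptor.

    Each corpus is a list of citations; each citation is an iterable of
    MeSH descriptor strings. Assumption: the sign is positive when the
    relative frequency in the genetic corpus is higher, negative otherwise.
    """
    n_gen, n_non = len(genetic_docs), len(nongenetic_docs)
    if n_gen == 0 or n_non == 0:
        raise ValueError("both corpora must contain at least one citation")

    # Document frequencies: count each descriptor once per citation.
    gen_counts = Counter(d for doc in genetic_docs for d in set(doc))
    non_counts = Counter(d for doc in nongenetic_docs for d in set(doc))

    weights = {}
    for desc in set(gen_counts) | set(non_counts):
        a = gen_counts[desc]  # genetic citations with the descriptor
        c = non_counts[desc]  # nongenetic citations with the descriptor
        chi2 = chi_square_2x2(a, n_gen - a, c, n_non - c)
        is_positive = a / n_gen > c / n_non
        weights[desc] = chi2 if is_positive else -chi2
    return weights


def score_citation(descriptors, weights):
    """Assumed aggregation: sum the signed weights of a citation's descriptors."""
    return sum(weights.get(d, 0.0) for d in set(descriptors))
```

Under these assumptions, sorting citations by score_citation in descending order places those with descriptors typical of the genetic corpus first. Turning scores into a yes/no category would additionally require a cutoff, which the abstract does not specify.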

Year:  2010        PMID: 20091016     DOI: 10.3414/ME09-01-0009

Source DB:  PubMed          Journal:  Methods Inf Med        ISSN: 0026-1270            Impact factor:   2.176


  6 in total

1.  A bottom-up approach to MEDLINE indexing recommendations.

Authors:  Antonio Jimeno-Yepes; Bartłomiej Wilkowski; James G Mork; Elizabeth Van Lenten; Dina Demner Fushman; Alan R Aronson
Journal:  AMIA Annu Symp Proc       Date:  2011-10-22

2.  Developing topic-specific search filters for PubMed with click-through data.

Authors:  J Li; Z Lu
Journal:  Methods Inf Med       Date:  2013-05-13       Impact factor: 2.176

3.  Cost sensitive hierarchical document classification to triage PubMed abstracts for manual curation.

Authors:  Emily Seymour; Rohini Damle; Alessandro Sette; Bjoern Peters
Journal:  BMC Bioinformatics       Date:  2011-12-19       Impact factor: 3.169

4.  MeSH indexing based on automatically generated summaries.

Authors:  Antonio J Jimeno-Yepes; Laura Plaza; James G Mork; Alan R Aronson; Alberto Díaz
Journal:  BMC Bioinformatics       Date:  2013-06-26       Impact factor: 3.169

5.  Extracting laboratory test information from biomedical text.

Authors:  Yanna Shen Kang; Mehmet Kayaalp
Journal:  J Pathol Inform       Date:  2013-08-31

6.  Using MEDLINE Elemental Similarity to Assist in the Article Screening Process for Systematic Reviews.

Authors:  Xiaonan Ji; Po-Yin Yen
Journal:  JMIR Med Inform       Date:  2015-08-31