Multilabel associative classification categorization of MEDLINE articles into MeSH keywords.

Rafal Rak, Lukasz A Kurgan, Marek Reformat.

Abstract

A distinguishing characteristic of classifying medical documents from the MEDLINE database is that each document is assigned to more than one category, which requires a system for multilabel classification. Another major challenge was to develop a scalable method capable of dealing with hundreds of thousands of documents. We proposed a novel system for automated classification of MEDLINE documents to MeSH keywords based on the recently developed data mining algorithm called ACRI, which was modified to accommodate multilabel classification. Five different classification configurations, in conjunction with different methods of measuring classification quality, were proposed and tested. The extensive experimental comparison showed the superiority of methods based on the reoccurrence of words in an article over nonrecurrent-based associative classification. The relatively high macro F1 achieved (46%) demonstrates the high quality of the proposed system for this challenging dataset. Accuracy of the proposed classifier, defined as the ratio of the sum of TP and TN examples to the total number of examples, reached 90%. Three scenarios were proposed based on the performed tests and different possible objectives. If the goal is to classify the largest number of documents, a configuration that maximizes micro F1 should be chosen. On the other hand, if the system is to work well for categories with a small number of documents, a configuration that maximizes macro F1 is more suitable. A tradeoff can be obtained by using a configuration that optimizes the average of macro and micro F1.
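The distinction the abstract draws between micro F1, macro F1, and accuracy can be illustrated with a small sketch. It uses the standard textbook definitions of these measures, not necessarily the exact evaluation procedure of the paper. The label names, the four toy documents, and all function names are hypothetical, and pooling accuracy over every (document, label) pair is an assumption about how the abstract's (TP + TN) / total ratio is counted.

```python
from typing import Dict, List, Set, Tuple

Counts = Tuple[int, int, int, int]  # (TP, FP, FN, TN) for one label


def per_label_counts(true_sets: List[Set[str]],
                     pred_sets: List[Set[str]],
                     labels: List[str]) -> Dict[str, Counts]:
    """Count TP, FP, FN, TN separately for each label (one-vs-rest)."""
    counts = {}
    n = len(true_sets)
    for label in labels:
        tp = sum(1 for t, p in zip(true_sets, pred_sets) if label in t and label in p)
        fp = sum(1 for t, p in zip(true_sets, pred_sets) if label not in t and label in p)
        fn = sum(1 for t, p in zip(true_sets, pred_sets) if label in t and label not in p)
        counts[label] = (tp, fp, fn, n - tp - fp - fn)
    return counts


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Counts]) -> float:
    """Average the per-label F1 scores, so every label weighs the same."""
    return sum(f1(tp, fp, fn) for tp, fp, fn, _ in counts.values()) / len(counts)


def micro_f1(counts: Dict[str, Counts]) -> float:
    """Pool the counts over all labels first, so frequent labels dominate."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


def accuracy(counts: Dict[str, Counts]) -> float:
    """(TP + TN) / total, pooled over all (document, label) decisions."""
    correct = sum(c[0] + c[3] for c in counts.values())
    total = sum(sum(c) for c in counts.values())
    return correct / total


# Hypothetical toy data: one frequent label and one rare label.
labels = ["Humans", "Zebrafish"]
true_sets = [{"Humans"}, {"Humans"}, {"Humans"}, {"Humans", "Zebrafish"}]
pred_sets = [{"Humans"}, {"Humans"}, {"Humans"}, {"Humans"}]  # rare label is missed

counts = per_label_counts(true_sets, pred_sets, labels)
print(f"macro F1 = {macro_f1(counts):.3f}")  # 0.500
print(f"micro F1 = {micro_f1(counts):.3f}")  # 0.889
print(f"accuracy = {accuracy(counts):.3f}")  # 0.875
```

In this toy case the classifier never predicts the rare label. Micro F1 and accuracy stay high because most decisions involve the frequent label or are true negatives, while macro F1 drops to 0.5. This is consistent with the abstract reporting a 90% accuracy alongside a 46% macro F1, and with its advice to maximize macro F1 when categories with few documents matter.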

Year:  2007        PMID: 17441608     DOI: 10.1109/memb.2007.335581

Source DB:  PubMed          Journal:  IEEE Eng Med Biol Mag        ISSN: 0739-5175


  3 in total

1.  A recent advance in the automatic indexing of the biomedical literature.

Authors:  Aurélie Névéol; Sonya E Shooshan; Susanne M Humphrey; James G Mork; Alan R Aronson
Journal:  J Biomed Inform       Date:  2008-12-30       Impact factor: 6.317

2.  MeSH Up: effective MeSH text classification for improved document retrieval.

Authors:  Dolf Trieschnigg; Piotr Pezik; Vivian Lee; Franciska de Jong; Wessel Kraaij; Dietrich Rebholz-Schuhmann
Journal:  Bioinformatics       Date:  2009-04-17       Impact factor: 6.937

3.  Automatic inference of indexing rules for MEDLINE.

Authors:  Aurélie Névéol; Sonya E Shooshan; Vincent Claveau
Journal:  BMC Bioinformatics       Date:  2008-11-19       Impact factor: 3.169
