Designing a hybrid dimension reduction for improving the performance of Amharic news document classification.

Demeke Endalie, Tesfa Tegegne.

Abstract

The volume of Amharic digital documents has grown rapidly in recent years, which makes automatic document categorization essential. In this paper, we present a novel dimension reduction approach that improves classification accuracy by combining feature selection with feature extraction. The method uses Information Gain (IG), the Chi-square test (CHI), and Document Frequency (DF) to select important features, and Principal Component Analysis (PCA) to refine the selected features. We evaluate the proposed method on a dataset containing 9 news categories. Our experimental results show that it outperforms the individual selection methods: classification accuracy with the new dimension reduction is 92.60%, which is 13.48%, 16.51% and 10.19% higher than with IG, CHI, and DF alone, respectively. Further work is required, since classification accuracy still decreases as the feature size is reduced to save computational time.
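The two-stage idea in the abstract (score terms with DF, CHI and IG, then refine the selected terms with PCA) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the three rankings are combined, so the union of the top-k terms per metric is an assumption, as are presence-based scoring, taking the maximum chi-square over classes, and the use of power iteration for PCA. All function names are hypothetical, and the toy corpus uses English placeholder tokens rather than Amharic text.

```python
import math
from collections import Counter

def entropy(counts, total):
    # Shannon entropy (bits) of a class-count distribution; zero counts are skipped.
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def term_scores(docs, labels):
    """Return {term: (df, chi2, ig)} for tokenized docs and their class labels."""
    n = len(docs)
    classes = Counter(labels)
    doc_sets = [set(d) for d in docs]
    h_class = entropy(classes.values(), n)
    scores = {}
    for t in set().union(*doc_sets):
        with_t = Counter(l for s, l in zip(doc_sets, labels) if t in s)
        df = sum(with_t.values())
        without_t = [classes[c] - with_t[c] for c in classes]
        # Information gain: class entropy minus entropy conditioned on term presence.
        h_cond = df / n * entropy(with_t.values(), df) if df else 0.0
        if n - df:
            h_cond += (n - df) / n * entropy(without_t, n - df)
        # Chi-square: 2x2 contingency statistic per class, keeping the maximum (assumption).
        chi = 0.0
        for cls, n_cls in classes.items():
            a = with_t[cls]          # in class, term present
            b = df - a               # other classes, term present
            c = n_cls - a            # in class, term absent
            d = n - df - c           # other classes, term absent
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                chi = max(chi, n * (a * d - c * b) ** 2 / denom)
        scores[t] = (df, chi, h_class - h_cond)
    return scores

def select_features(scores, k):
    # Assumed combination rule: union of the top-k terms under each of the three metrics.
    selected = set()
    for i in range(3):
        ranked = sorted(scores, key=lambda t: (-scores[t][i], t))
        selected.update(ranked[:k])
    return sorted(selected)

def pca(rows, n_components, iters=200):
    # PCA via power iteration with deflation on the sample covariance matrix.
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(m)] for i in range(m)]
    comps = []
    for _ in range(n_components):
        v = [1.0 / (j + 1) for j in range(m)]  # deterministic start vector
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0:
                break
            v = [a / norm for a in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m)) for i in range(m))
        comps.append(v)
        # Deflate: remove the component just found before searching for the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return [[sum(r[j] * c[j] for j in range(m)) for c in comps] for r in x]

# Toy corpus (placeholder tokens, two classes).
docs = [["goal", "match", "team"], ["team", "coach", "goal"],
        ["bank", "loan", "rate"], ["rate", "bank", "market"]]
labels = ["sport", "sport", "economy", "economy"]

scores = term_scores(docs, labels)
selected = select_features(scores, k=4)
rows = [[d.count(t) for t in selected] for d in docs]
reduced = pca(rows, n_components=1)
```

On this toy corpus the single retained component places the two classes on opposite sides of zero. In practice a library routine for PCA would normally replace the hand-written power iteration, and the value of k and the number of components would need tuning against classification accuracy.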

Year:  2021; PMID:  34019571; DOI:  10.1371/journal.pone.0251902

Source DB:  PubMed; Journal:  PLoS One; ISSN:  1932-6203; Impact factor:  3.240


  1 in total

1.  Feature selection by integrating document frequency with genetic algorithm for Amharic news document classification.

Authors:  Demeke Endalie; Getamesay Haile; Wondmagegn Taye Abebe
Journal:  PeerJ Comput Sci       Date:  2022-04-25