A fast supervised density-based discretization algorithm for classification tasks in the medical domain.

Aristos Aristodimou, Andreas Diavastos, Constantinos S Pattichis.

Abstract

Discretization is a preprocessing technique that converts continuous features into categorical ones. This step is essential for algorithms that cannot handle continuous data as input. In addition, in the big data era, a discretizer must be able to discretize data efficiently. In this paper, a new supervised density-based discretization (DBAD) algorithm is proposed that satisfies these requirements. The algorithm was evaluated on 11 datasets covering a wide range of data from the medical domain. It was tested against three state-of-the-art discretizers using three classifiers with different characteristics. A parallel version of the algorithm was evaluated using two synthetic big datasets. In the majority of the tests, the algorithm performed statistically similarly to or better than the three discretization algorithms it was compared with. Additionally, the algorithm was faster than the other discretizers in all of the tests. Finally, the parallel version of DBAD shows almost linear speedup for a Message Passing Interface (MPI) implementation (9.64× for 10 nodes), while a hybrid MPI/OpenMP implementation reduces execution time by a factor of 35.3 for 10 nodes and 6 threads per node.
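As a rough illustration of two ideas in the abstract, the sketch below maps continuous values to interval indices and computes parallel efficiency from the reported speedups. It is not the DBAD algorithm: the abstract does not describe how DBAD selects cut points, so the cut points here are hypothetical and supplied by hand. The function names `discretize` and `parallel_efficiency` are illustrative, and treating the hybrid run as 10 × 6 = 60 workers is an assumption.

```python
from bisect import bisect_right


def discretize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in.

    With k cut points there are k + 1 intervals, indexed 0..k.
    A value equal to a cut point is placed in the interval to its right.
    """
    cuts = sorted(cut_points)
    return [bisect_right(cuts, v) for v in values]


def parallel_efficiency(speedup, workers):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / workers


# Hypothetical feature values and hand-picked cut points (not from the paper).
ages = [23.0, 41.5, 67.2, 35.0, 58.9]
bins = discretize(ages, cut_points=[30.0, 50.0])

# Speedups reported in the abstract.
mpi_eff = parallel_efficiency(9.64, 10)         # MPI, 10 nodes
hybrid_eff = parallel_efficiency(35.3, 10 * 6)  # assumes 60 workers in total

print(bins)
print(f"MPI efficiency: {mpi_eff:.1%}")
print(f"Hybrid MPI/OpenMP efficiency: {hybrid_eff:.1%}")
```

Under these assumptions, the MPI result corresponds to roughly 96% of ideal linear speedup, which is consistent with the abstract's description of it as almost linear. The hybrid configuration reaches a larger absolute speedup at a lower efficiency per worker.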

Keywords:  big data; classification; density estimation; density-based discretization; supervised discretization

Year:  2022        PMID: 35170333     DOI: 10.1177/14604582211065397

Source DB:  PubMed          Journal:  Health Informatics J        ISSN: 1460-4582            Impact factor:   2.681


