Comparison between Statistical Models and Machine Learning Methods on Classification for Highly Imbalanced Multiclass Kidney Data.

Bomi Jeong¹, Hyunjeong Cho²,³, Jieun Kim¹, Soon Kil Kwon²,³, SeungWoo Hong⁴, ChangSik Lee⁴, TaeYeon Kim⁴, Man Sik Park⁵, Seoksu Hong¹, Tae-Young Heo¹.

Abstract

This study compares the classification performance of statistical models and machine learning methods on highly imbalanced kidney data. The health examination cohort database provided by the National Health Insurance Service in Korea is used to build the models. The glomerular filtration rate (GFR), which is used to diagnose chronic kidney disease (CKD), is estimated with the Modification of Diet in Renal Disease method and classified into five stages (1, 2, 3A and 3B, 4, and 5). Because stage 3 is split into 3A and 3B, the CKD stages based on the estimated GFR form the six classes of the response variable. The study uses two representative generalized linear models for classification, multinomial logistic regression (multinomial LR) and ordinal logistic regression (ordinal LR), and two machine learning models, random forest (RF) and autoencoder (AE). The classification performance of the four models is compared in terms of accuracy, sensitivity, specificity, precision, and F1-measure. To find the model that classifies CKD stages best, the data are divided into 10 folds with the same rate for each CKD stage. Results indicate that RF and AE achieve higher accuracy than the multinomial and ordinal LR models when classifying the response variable. However, when a highly imbalanced dataset is modeled, accuracy can misrepresent actual performance, because accuracy remains high even if a model classifies minority-class cases into a majority class. To address this problem in performance interpretation, we consider not only accuracy from the confusion matrix but also sensitivity, specificity, precision, and F1-measure for each class. To summarize classification performance with a single value for each model, we calculate the macro-average and micro-weighted values for each model. We conclude that AE is the best model for classifying CKD stages correctly across all performance indices.
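The point about accuracy on imbalanced data can be illustrated with a small sketch. The code below is not the authors' implementation: it uses a made-up 3-class confusion matrix (hypothetical counts, not the study's data) and plain Python to compute per-class sensitivity, specificity, precision, and F1, then a macro average and a support-weighted average. Reading the abstract's "micro-weighted" value as a support-weighted average is an assumption made here for illustration; the function names are likewise illustrative.

```python
from typing import Dict, List

Matrix = List[List[int]]


def per_class_metrics(cm: Matrix) -> List[Dict[str, float]]:
    """One-vs-rest metrics per class; cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true k, predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp  # other classes predicted as k
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        results.append({"sensitivity": sens, "specificity": spec,
                        "precision": prec, "f1": f1, "support": tp + fn})
    return results


def accuracy(cm: Matrix) -> float:
    """Share of all samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def macro_average(metrics: List[Dict[str, float]], key: str) -> float:
    """Unweighted mean over classes: every class counts equally."""
    return sum(m[key] for m in metrics) / len(metrics)


def weighted_average(metrics: List[Dict[str, float]], key: str) -> float:
    """Mean over classes weighted by class size (support)."""
    total = sum(m["support"] for m in metrics)
    return sum(m[key] * m["support"] for m in metrics) / total


# Hypothetical imbalanced example: 950 / 45 / 5 samples in three classes.
# The classifier sends almost everything to the majority class (class 0).
cm = [
    [950, 0, 0],
    [40, 5, 0],
    [5, 0, 0],
]
metrics = per_class_metrics(cm)
print(f"accuracy             = {accuracy(cm):.3f}")
print(f"macro sensitivity    = {macro_average(metrics, 'sensitivity'):.3f}")
print(f"weighted sensitivity = {weighted_average(metrics, 'sensitivity'):.3f}")
for k, m in enumerate(metrics):
    print(f"class {k}: sensitivity={m['sensitivity']:.3f} f1={m['f1']:.3f}")
```

In this toy case accuracy is 0.955 even though the smallest class is never detected; the macro-averaged sensitivity (about 0.37) exposes the problem, while the support-weighted sensitivity equals accuracy and hides it. This is why per-class and macro-averaged indices are worth reporting alongside accuracy.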

Keywords: autoencoder; chronic kidney disease; imbalanced data; machine learning; national health screening

Year: 2020 PMID: 32570782 DOI: 10.3390/diagnostics10060415

Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418

6 in total

1. Potential of Immune-Related Genes as Biomarkers for Diagnosis and Subtype Classification of Preeclampsia.

Authors: Ying Wang; Zhen Li; Guiyu Song; Jun Wang
Journal: Front Genet Date: 2020-12-01 Impact factor: 4.599

2. Construction of genetic classification model for coronary atherosclerosis heart disease using three machine learning methods.

Authors: Wenjuan Peng; Yuan Sun; Ling Zhang
Journal: BMC Cardiovasc Disord Date: 2022-02-12 Impact factor: 2.298

3. Identification of Immune-Related Biomarkers for Sciatica in Peripheral Blood.

Authors: Xin Jin; Jun Wang; Lina Ge; Qing Hu
Journal: Front Genet Date: 2021-12-02 Impact factor: 4.599

4. Automated Early Detection of Alzheimer's Disease by Capturing Impairments in Multiple Cognitive Domains with Multiple Drawing Tasks.

Authors: Masatomo Kobayashi; Yasunori Yamada; Kaoru Shinkawa; Miyuki Nemoto; Kiyotaka Nemoto; Tetsuaki Arai
Journal: J Alzheimers Dis Date: 2022 Impact factor: 4.160

5. Performance Analysis of Conventional Machine Learning Algorithms for Identification of Chronic Kidney Disease in Type 1 Diabetes Mellitus Patients.

Authors: Nakib Hayat Chowdhury; Mamun Bin Ibne Reaz; Fahmida Haque; Shamim Ahmad; Sawal Hamid Md Ali; Ahmad Ashrif A Bakar; Mohammad Arif Sobhan Bhuiyan
Journal: Diagnostics (Basel) Date: 2021-12-03

6. (Review) A Powerful Paradigm for Cardiovascular Risk Stratification Using Multiclass, Multi-Label, and Ensemble-Based Machine Learning Paradigms: A Narrative Review.

Authors: Jasjit S Suri; Mrinalini Bhagawati; Sudip Paul; Athanasios D Protogerou; Petros P Sfikakis; George D Kitas; Narendra N Khanna; Zoltan Ruzsa; Aditya M Sharma; Sanjay Saxena; Gavino Faa; John R Laird; Amer M Johri; Manudeep K Kalra; Kosmas I Paraskevas; Luca Saba
Journal: Diagnostics (Basel) Date: 2022-03-16
