The Impact of Oversampling with SMOTE on the Performance of 3 Classifiers in Prediction of Type 2 Diabetes.

Azra Ramezankhani¹, Omid Pournik², Jamal Shahrabi³, Fereidoun Azizi⁴, Farzad Hadaegh¹, Davood Khalili¹,⁵.

Abstract

OBJECTIVE: To evaluate the impact of the synthetic minority oversampling technique (SMOTE) on the performance of probabilistic neural network (PNN), naïve Bayes (NB), and decision tree (DT) classifiers for predicting diabetes in a prospective cohort of the Tehran Lipid and Glucose Study (TLGS).
METHODS: Data of the 6647 nondiabetic participants, aged 20 years or older with more than 10 years of follow-up, were used to develop prediction models based on 21 common risk factors. The minority class in the training dataset was oversampled using the SMOTE technique, at 100%, 200%, 300%, 400%, 500%, 600%, and 700% of its original size. The original and the oversampled training datasets were used to establish the classification models. Accuracy, sensitivity, specificity, precision, F-measure, and Youden's index were used to evaluate the performance of the classifiers on the test dataset. To compare the performance of the 3 classification models, we used the ROC convex hull (ROCCH).
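The oversampling step described in METHODS can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, the toy data, the choice of `k`, and the reading of the percentage as "number of synthetic samples per original sample times 100" are all assumptions made for illustration, and the sketch handles only continuous features.

```python
import random

def smote(minority, percent=100, k=5, seed=0):
    """Return synthetic minority samples (hypothetical helper, continuous features only).

    percent=100 creates one synthetic sample per original minority sample,
    percent=200 creates two, and so on (assumed convention).
    """
    rng = random.Random(seed)
    n_new_per_sample = percent // 100
    synthetic = []
    for i, x in enumerate(minority):
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbours = others[:k]
        for _ in range(n_new_per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Toy minority class with two made-up continuous features.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
synthetic = smote(minority, percent=200, k=2)
print(len(synthetic), "synthetic samples")
```

Because every synthetic point is an interpolation between two real minority samples, it always lies inside the range spanned by the original minority class. Only the training data would be oversampled in this way; the test dataset keeps its original class distribution.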
RESULTS: Oversampling the minority class at 700% (completely balanced) increased the sensitivity of the PNN, DT, and NB by 64%, 51%, and 5%, respectively, but decreased the accuracy and specificity of the 3 classification methods. NB had the best Youden's index before and after oversampling. The ROCCH showed that PNN is suboptimal for any class and cost conditions.
CONCLUSIONS: When building a classifier with a machine learning algorithm such as the PNN or DT, class skew in the data should be considered. The NB and DT were the optimal classifiers for a prediction task in an imbalanced medical database.
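The performance measures named in the abstract can all be computed from the four cells of a binary confusion matrix. The sketch below uses the standard definitions. The function name `binary_metrics` and the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the measures listed in the abstract from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    youden = sensitivity + specificity - 1  # Youden's index J
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "youden": youden,
    }

# Made-up counts for an imbalanced test set (50 positives, 200 negatives).
m = binary_metrics(tp=30, fp=40, tn=160, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Accuracy is dominated by the majority class in an imbalanced test set, whereas Youden's index weights sensitivity and specificity equally. This is why a rise in sensitivity after oversampling can coincide with a fall in accuracy and specificity.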

Keywords: SMOTE; classification; data mining; diabetes

Year: 2014 PMID: 25449060 DOI: 10.1177/0272989X14560647

Source DB: PubMed Journal: Med Decis Making ISSN: 0272-989X Impact factor: 2.583

Cited

16 in total

1. Classification-based data mining for identification of risk patterns associated with hypertension in Middle Eastern population: A 12-year longitudinal study.

Authors: Azra Ramezankhani; Ali Kabir; Omid Pournik; Fereidoun Azizi; Farzad Hadaegh
Journal: Medicine (Baltimore) Date: 2016-08 Impact factor: 1.889

2. Decision tree-based modelling for identification of potential interactions between type 2 diabetes risk factors: a decade follow-up in a Middle East prospective cohort study.

Authors: Azra Ramezankhani; Esmaeil Hadavandi; Omid Pournik; Jamal Shahrabi; Fereidoun Azizi; Farzad Hadaegh
Journal: BMJ Open Date: 2016-12-01 Impact factor: 2.692

Review 3. Machine Learning and Data Mining Methods in Diabetes Research.

Authors: Ioannis Kavakiotis; Olga Tsave; Athanasios Salifoglou; Nicos Maglaveras; Ioannis Vlahavas; Ioanna Chouvarda
Journal: Comput Struct Biotechnol J Date: 2017-01-08 Impact factor: 7.271

4. Multiobjective grammar-based genetic programming applied to the study of asthma and allergy epidemiology.

Authors: Rafael V Veiga; Helio J C Barbosa; Heder S Bernardino; João M Freitas; Caroline A Feitosa; Sheila M A Matos; Neuza M Alcântara-Neves; Maurício L Barreto
Journal: BMC Bioinformatics Date: 2018-06-26 Impact factor: 3.169

5. Detecting Hypoglycemia Incidents Reported in Patients' Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance.

Authors: Jinying Chen; John Lalor; Weisong Liu; Emily Druhl; Edgard Granillo; Varsha G Vimalananda; Hong Yu
Journal: J Med Internet Res Date: 2019-03-11 Impact factor: 5.428

6. Joint use of over- and under-sampling techniques and cross-validation for the development and assessment of prediction models.

Authors: Rok Blagus; Lara Lusa
Journal: BMC Bioinformatics Date: 2015-11-04 Impact factor: 3.169

7. Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes.

Authors: Habibollah Esmaeily; Maryam Tayefi; Majid Ghayour-Mobarhan; Alireza Amirabadizadeh
Journal: Iran Biomed J Date: 2018-01-27

8. PPAI: a web server for predicting protein-aptamer interactions.

Authors: Jianwei Li; Xiaoyu Ma; Xichuan Li; Junhua Gu
Journal: BMC Bioinformatics Date: 2020-06-09 Impact factor: 3.169

9. Artificial intelligence-assisted prediction of preeclampsia: Development and external validation of a nationwide health insurance dataset of the BPJS Kesehatan in Indonesia.

Authors: Herdiantri Sufriyana; Yu-Wei Wu; Emily Chia-Yu Su
Journal: EBioMedicine Date: 2020-04-10 Impact factor: 8.143

10. Predicting Cardiovascular Risk in Athletes: Resampling Improves Classification Performance.

Authors: Davide Barbieri; Nitesh Chawla; Luciana Zaccagni; Tonći Grgurinović; Jelena Šarac; Miran Čoklo; Saša Missoni
Journal: Int J Environ Res Public Health Date: 2020-10-28 Impact factor: 3.390