The Impact of Oversampling with SMOTE on the Performance of 3 Classifiers in Prediction of Type 2 Diabetes.

Azra Ramezankhani, Omid Pournik, Jamal Shahrabi, Fereidoun Azizi, Farzad Hadaegh, Davood Khalili.

Abstract

OBJECTIVE: To evaluate the impact of the synthetic minority oversampling technique (SMOTE) on the performance of probabilistic neural network (PNN), naïve Bayes (NB), and decision tree (DT) classifiers for predicting diabetes in a prospective cohort of the Tehran Lipid and Glucose Study (TLGS).
METHODS: Data from 6647 nondiabetic participants, aged 20 years or older with more than 10 years of follow-up, were used to develop prediction models based on 21 common risk factors. The minority class in the training dataset was oversampled using SMOTE at 100%, 200%, 300%, 400%, 500%, 600%, and 700% of its original size. The original and the oversampled training datasets were used to build the classification models. Accuracy, sensitivity, specificity, precision, F-measure, and Youden's index were used to evaluate the performance of the classifiers on the test dataset. To compare the performance of the 3 classification models, we used the ROC convex hull (ROCCH).
RESULTS: Oversampling the minority class at 700% (completely balanced) increased the sensitivity of the PNN, DT, and NB by 64%, 51%, and 5%, respectively, but decreased the accuracy and specificity of the 3 classification methods. NB had the best Youden's index before and after oversampling. The ROCCH showed that PNN is suboptimal for any class and cost conditions.
CONCLUSIONS: When building a classifier with a machine learning algorithm such as the PNN or DT, class skew in the data should be considered. The NB and DT were the optimal classifiers for a prediction task in an imbalanced medical database.
© The Author(s) 2014.
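
The oversampling and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `smote_oversample` and `binary_metrics`, the neighbor count `k=5`, and the toy numbers are assumptions made for illustration. It follows the usual SMOTE convention in which an oversampling rate of N% adds N/100 synthetic points per original minority point, each placed on the line segment between that point and one of its k nearest minority-class neighbors.

```python
import numpy as np


def smote_oversample(X_min, percent, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    X_min   : array of shape (n, d) holding the minority-class rows only.
    percent : oversampling rate; 100 adds one synthetic row per original
              row, 700 adds seven (assumed convention, see lead-in).
    k       : number of nearest minority neighbors to draw from (assumed).
    """
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    rng = np.random.default_rng(seed)
    per_point = percent // 100
    k = min(k, n - 1)

    # Pairwise Euclidean distances among minority rows; a row is never
    # its own neighbor, so the diagonal is set to infinity.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]

    synthetic = []
    for i in range(n):
        for _ in range(per_point):
            j = rng.choice(neighbors[i])   # pick one of the k neighbors
            gap = rng.random()             # position along the segment, in [0, 1)
            synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic).reshape(-1, X_min.shape[1])


def binary_metrics(tp, fp, tn, fn):
    """Compute the six measures named in the abstract from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    youden = sensitivity + specificity - 1
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "youden": youden,
    }


if __name__ == "__main__":
    # Toy minority class: four points at the corners of the unit square.
    toy = [[0, 0], [1, 0], [0, 1], [1, 1]]
    new_rows = smote_oversample(toy, percent=200, k=2)
    print(new_rows.shape)

    # Hypothetical confusion counts, not taken from the study.
    print(binary_metrics(tp=40, fp=30, tn=120, fn=10))
```

Youden's index (sensitivity + specificity - 1) does not depend on class prevalence, which is one reason it is a useful summary when oversampling trades specificity for sensitivity, as reported in the results. In practice the synthetic rows would be appended to the training set only, leaving the test set untouched.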

Keywords:  SMOTE; classification; data mining; diabetes

Year:  2014        PMID: 25449060     DOI: 10.1177/0272989X14560647

Source DB:  PubMed          Journal:  Med Decis Making        ISSN: 0272-989X            Impact factor:   2.583


  16 in total

1.  Classification-based data mining for identification of risk patterns associated with hypertension in Middle Eastern population: A 12-year longitudinal study.

Authors:  Azra Ramezankhani; Ali Kabir; Omid Pournik; Fereidoun Azizi; Farzad Hadaegh
Journal:  Medicine (Baltimore)       Date:  2016-08       Impact factor: 1.889

2.  Decision tree-based modelling for identification of potential interactions between type 2 diabetes risk factors: a decade follow-up in a Middle East prospective cohort study.

Authors:  Azra Ramezankhani; Esmaeil Hadavandi; Omid Pournik; Jamal Shahrabi; Fereidoun Azizi; Farzad Hadaegh
Journal:  BMJ Open       Date:  2016-12-01       Impact factor: 2.692

Review 3.  Machine Learning and Data Mining Methods in Diabetes Research.

Authors:  Ioannis Kavakiotis; Olga Tsave; Athanasios Salifoglou; Nicos Maglaveras; Ioannis Vlahavas; Ioanna Chouvarda
Journal:  Comput Struct Biotechnol J       Date:  2017-01-08       Impact factor: 7.271

4.  Multiobjective grammar-based genetic programming applied to the study of asthma and allergy epidemiology.

Authors:  Rafael V Veiga; Helio J C Barbosa; Heder S Bernardino; João M Freitas; Caroline A Feitosa; Sheila M A Matos; Neuza M Alcântara-Neves; Maurício L Barreto
Journal:  BMC Bioinformatics       Date:  2018-06-26       Impact factor: 3.169

5.  Detecting Hypoglycemia Incidents Reported in Patients' Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance.

Authors:  Jinying Chen; John Lalor; Weisong Liu; Emily Druhl; Edgard Granillo; Varsha G Vimalananda; Hong Yu
Journal:  J Med Internet Res       Date:  2019-03-11       Impact factor: 5.428

6.  Joint use of over- and under-sampling techniques and cross-validation for the development and assessment of prediction models.

Authors:  Rok Blagus; Lara Lusa
Journal:  BMC Bioinformatics       Date:  2015-11-04       Impact factor: 3.169

7.  Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes.

Authors:  Habibollah Esmaeily; Maryam Tayefi; Majid Ghayour-Mobarhan; Alireza Amirabadizadeh
Journal:  Iran Biomed J       Date:  2018-01-27

8.  PPAI: a web server for predicting protein-aptamer interactions.

Authors:  Jianwei Li; Xiaoyu Ma; Xichuan Li; Junhua Gu
Journal:  BMC Bioinformatics       Date:  2020-06-09       Impact factor: 3.169

9.  Artificial intelligence-assisted prediction of preeclampsia: Development and external validation of a nationwide health insurance dataset of the BPJS Kesehatan in Indonesia.

Authors:  Herdiantri Sufriyana; Yu-Wei Wu; Emily Chia-Yu Su
Journal:  EBioMedicine       Date:  2020-04-10       Impact factor: 8.143

10.  Predicting Cardiovascular Risk in Athletes: Resampling Improves Classification Performance.

Authors:  Davide Barbieri; Nitesh Chawla; Luciana Zaccagni; Tonći Grgurinović; Jelena Šarac; Miran Čoklo; Saša Missoni
Journal:  Int J Environ Res Public Health       Date:  2020-10-28       Impact factor: 3.390
