Nashreen Md Idris, Yin Kia Chiam, Kasturi Dewi Varathan, Wan Azman Wan Ahmad, Kok Han Chee, Yih Miin Liew.
Abstract
Coronary artery disease (CAD) is an important cause of mortality across the globe. Early risk prediction of CAD could reduce the death rate by allowing early and targeted treatment. In healthcare, some studies have applied data mining techniques and machine learning algorithms to CAD risk prediction using patient data collected by hospitals and medical centers. However, most of these studies used all the attributes in their datasets, which might reduce the performance of prediction models because of data redundancy. The objective of this research is to identify significant features for building models that predict the risk level of patients with CAD. Significant features were selected using three methods: the chi-squared test, recursive feature elimination, and an embedded decision tree. The Synthetic Minority Over-sampling Technique (SMOTE) was applied to address the imbalanced dataset. The prediction models were built from the identified significant features and eight machine learning algorithms, using Acute Coronary Syndrome (ACS) datasets provided by the National Cardiovascular Disease Database (NCVD) Malaysia. The models were evaluated and compared using six performance evaluation metrics, and the top-performing models achieved an AUC of more than 90%.
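Two of the steps named in the abstract, chi-squared feature scoring and SMOTE oversampling, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the abstract does not state which software was used, the function names `chi2_score` and `smote` are hypothetical, and the toy data below is invented purely for illustration.

```python
import math
import random
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-squared statistic for one categorical feature vs. class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_n in feature_totals.items():
        for l, l_n in label_totals.items():
            expected = f_n * l_n / n  # count expected under independence
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (needs >= 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Invented toy data: 3 high-risk (1) and 5 low-risk (0) patients.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
informative = ["y", "y", "y", "n", "n", "n", "n", "n"]  # tracks the label
noisy = ["a", "b", "a", "b", "a", "b", "a", "b"]        # unrelated to the label

informative_score = chi2_score(informative, labels)
noisy_score = chi2_score(noisy, labels)

# Oversample the minority class so both classes have 5 samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
new_points = smote(minority, n_new=2)
```

A feature with a higher chi-squared score is more strongly associated with the class label, so ranking features by this score and keeping the top ones is one way to drop redundant attributes. In practice, oversampling is normally applied only to the training split, so that synthetic points do not leak into the evaluation data.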
Keywords: Classification algorithms; Coronary artery disease; Data mining; Feature selection; Heart disease prediction; Prediction model
Year: 2020 PMID: 33155096 DOI: 10.1007/s11517-020-02268-9
Source DB: PubMed Journal: Med Biol Eng Comput ISSN: 0140-0118 Impact factor: 2.602