Rohit Bharti, Aditya Khamparia, Mohammad Shabaz, Gaurav Dhiman, Sagar Pande, Parneet Singh.
Abstract
Accurate prediction of heart disease can prevent life threats, while incorrect prediction can prove fatal. In this paper, different machine learning and deep learning algorithms are applied to the UCI Machine Learning Heart Disease dataset, and their results are compared and analyzed. The dataset consists of 14 main attributes used for performing the analysis. Promising results are achieved and validated using accuracy and the confusion matrix. The dataset contains some irrelevant features, which are handled using Isolation Forest, and the data are normalized to obtain better results. How this study can be combined with multimedia technology such as mobile devices is also discussed. Using a deep learning approach, 94.2% accuracy was obtained.
Year: 2021 PMID: 34306056 PMCID: PMC8266441 DOI: 10.1155/2021/8387680
Source DB: PubMed Journal: Comput Intell Neurosci
Summary of the literature review.
| Sr. no. | Author | Year | Findings |
|---|---|---|---|
| 1 | Gárate-Escamila et al. | 2020 | DNN and ANN were used with the χ² statistical model; clinical data parameters were used for confirming the predictions |
| 2 | Harvard Medical School | 2020 | Hungarian-Cleveland datasets were used for predicting heart disease with different machine learning classifiers; PCA was used for dimensionality reduction and feature selection |
| 3 | Zhang et al. | 2018 | An AdaBoost classifier combined with PCA was used for feature extraction, and the prediction accuracy was increased |
| 4 | Singh et al. | 2018 | Heart rate variability was used for the detection of coronary artery disease; the Fisher method and generalised discriminant analysis with binary classifiers were used to detect important features |
| 5 | Chen et al. | 2018 | Subspace feature clustering was used as a subset of stratified feature clustering and for reducing the features of the clusters formed |
| 6 | Yang and Nataliani | 2018 | A fuzzy clustering method, especially fuzzy c-means, was used with various feature-weighting methods, and the features were reduced |
| 7 | Kumar | 2017 | Different machine learning algorithms were applied and their results compared with each other |
| 8 | Rajagopal and Ranganathan | 2017 | A combination of a probabilistic neural network classifier, PCA, kernel PCA, and unsupervised dimensionality reduction was used for feature reduction, and a domain expert was consulted for correct analysis of the results |
| 9 | Zhang et al. | 2017 | A support vector machine was used to classify clinical data matched with the codes of the New York Heart Association; further findings are left to other researchers |
| 10 | Khan and Quadri | 2016 | The main aim of this research was to identify the best model for angiographic disease status by analyzing different unstructured data with data mining techniques |
| 11 | Negi et al. | 2016 | Uncorrelated linear discriminant analysis with PCA was used to study the electrocardiogram, and Wilson methods were used to distinguish upper-limb motions |
| 12 | Dun et al. | 2016 | A variety of deep learning and ensemble techniques were applied, with hyperparameter tuning to increase accuracy |
| 13 | Rahhal et al. | 2016 | An ECG approach was developed in consultation with various domain experts, using the MIT-BIH arrhythmia database and two other databases, INCART and SVDB |
| 14 | Imani and Ghassemian | 2015 | For cases where data are scarce, a weighted training-sample method with feature extraction over the spatial dimension of images was used, and accuracy was increased |
| 15 | Guidi et al. | 2014 | Neural networks, SVM, and a fuzzy-system approach were used, with Random Forest as the classifier, to predict heart failure within a clinical decision support system |
| 16 | Santhanam and Ephzibah | 2013 | A regression technique with PCA and its variants (PCA1, PCA2, PCA3, and PCA4) was used for feature extraction, with promising results |
| 17 | Ratnasari et al. | 2013 | The Cleveland-Hungarian dataset and the UCI machine learning datasets were analyzed with feature selection techniques |
| 18 | Kamencay et al. | 2013 | Object recognition was performed with scale-invariant feature transformation; the Caltech 101 database was used for evaluation |
| 19 | Melillo et al. | 2013 | Two public Holter databases were used to identify high-risk and low-risk patients; the CART algorithm was applied for classification |
| 20 | Amma | 2012 | The dataset was from the University of California, Irvine; a genetic algorithm was used for training and a neural network for classification |
| 21 | Keogh and Mueen | 2012 | How to break the curse of dimensionality and reduce features using PCA, SVM, and other classifiers |
| 22 | Parthiban and Srivatsa | 2012 | Diabetes is one of the main causes of heart disease; Naïve Bayes and SVM classifiers were used for extracting important features and for classification |
| 23 | Srinivas et al. | 2010 | Prediction of heart disease among coal miners was the prime consideration; decision tree, naïve Bayes, and neural networks were used for classification |
| 24 | Das et al. | 2009 | On the Cleveland dataset, high accuracy was achieved with different ensemble techniques using SAS-based software |
| 25 | Yaghouby et al. | 2009 | Cardiac arrhythmias were studied using the MIT-BIH database, with HRV analysis similar to earlier work |
| 26 | Asl et al. | 2008 | Generalised discriminant analysis and SVM were used for feature reduction and classification |
| 27 | Avendaño-Valencia et al. | 2009 | Feature extraction was based on the time-frequency representation of heart-murmur frequencies, and PCA was used for analysis of the features |
| 28 | Guyon et al. | 2008 | A book on performing feature extraction efficiently |
| 29 | UCI Machine Learning Repository | 1998 | This dataset is used for many ML and deep learning benchmark results |
| 30 | Liu and Motoda | 1998 | Feature importance and how to select features appropriately were discussed in this book |
| 31 | Wettschereck et al. | 1997 | The K-NN algorithm was used for classification, as lazy learning algorithms are commonly used for feature selection with weighted methods |
| 32 | Wettschereck and Dietterich | 1995 | Decision boundaries of different classification problems were analyzed, and the problem was tackled using nested generalized exemplars |
Figure 1. Class distribution of disease and no disease.
Figure 2. Distribution of age and sex.
Figure 3. Distribution of chest pain and trestbps.
Figure 4. Features important for heart disease.
Figure 5. Features not important for heart disease.
Duplicate values.
| Age | Sex | Cp | Trestbps | Chol | Restecg | Thalach | Exang | Oldpeak | Slope | Ca | Thal | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 38 | 1 | 2 | 138 | 175 | 1 | 173 | 0 | 0.0 | 2 | 4 | 2 | 1 |
Dropping these duplicate rows with the pandas `drop_duplicates` function is the simplest approach, and deduplication is an important part of data preprocessing.
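The deduplication step can be sketched as below. The toy frame uses only a few of the dataset's 14 attributes; the column names mirror the table above but the values are illustrative.

```python
import pandas as pd

# Toy frame with one exact duplicate row (rows 0 and 1 are identical)
df = pd.DataFrame({
    "age": [38, 38, 52],
    "sex": [1, 1, 0],
    "trestbps": [138, 138, 120],
    "target": [1, 1, 0],
})

before = len(df)
df = df.drop_duplicates().reset_index(drop=True)  # keeps the first occurrence
print(before, "->", len(df))  # 3 -> 2
```

`drop_duplicates` compares all columns by default; pass `subset=` to deduplicate on a chosen set of attributes instead.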
Figure 6. 1st schematic diagram of the proposed model.
Figure 7. 2nd schematic diagram of the proposed model.
Figure 8. Confusion matrix.
Figure 9. Correlation heatmap.
Figure 10. Feature selection on correlation heatmap.
Comparative analysis.
| Classifier | Accuracy (%) | Specificity (%) | Sensitivity (%) |
|---|---|---|---|
| Logistic regression | 83.3 | 82.3 | 86.3 |
| K-nearest neighbors | 84.8 | 77.7 | 85.0 |
| SVM | 83.2 | 78.7 | 78.2 |
| Random forest | 80.3 | 78.7 | 78.2 |
| Decision tree | 82.3 | 78.9 | 78.5 |
| Deep learning | 94.2 | 83.1 | 82.3 |
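The metrics in the table derive directly from the confusion matrix (Figure 8). A minimal sketch of that computation, using a hypothetical binary confusion matrix rather than the paper's actual counts:

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual class, columns = predicted class
#                 pred: no disease   disease
cm = np.array([[40,  8],   # actual: no disease
               [ 5, 47]])  # actual: disease
tn, fp, fn, tp = cm.ravel()

accuracy    = (tp + tn) / cm.sum()  # fraction of all predictions that are correct
sensitivity = tp / (tp + fn)        # recall on the disease class
specificity = tn / (tn + fp)        # recall on the no-disease class
print(round(accuracy, 3), round(sensitivity, 3), round(specificity, 3))
```

Sensitivity and specificity expose error asymmetries that accuracy alone hides, which matters here because a missed disease case (false negative) is costlier than a false alarm.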