
Score and Correlation Coefficient-Based Feature Selection for Predicting Heart Failure Diagnosis by Using Machine Learning Algorithms.

Ebrahim Mohammed Senan1, Ibrahim Abunadi2, Mukti E Jadhav3, Suliman Mohamed Fati2.   

Abstract

Cardiovascular disease (CVD) is one of the most common causes of death, killing approximately 17 million people annually. The main causes of CVD are myocardial infarction and the failure of the heart to pump blood normally. Doctors can diagnose heart failure (HF) from electronic medical records on the basis of a patient's symptoms and clinical laboratory investigations. However, accurate diagnosis of HF requires medical resources and expert practitioners that are not always available, which makes diagnosis challenging. Predicting patients' condition by using machine learning algorithms is therefore a way to save time and effort. This paper proposes a machine-learning-based approach that identifies the most important correlated features in patients' electronic clinical records. The SelectKBest function was applied with the chi-squared statistical method to determine the most important features, and a feature engineering method was then applied to create new, strongly correlated features for training the machine learning models and obtaining promising results. Five classification algorithms with optimised hyperparameters, namely, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression, were trained on two different datasets. The first dataset, called Cleveland, consists of 303 records. The second dataset, used for predicting HF, consists of 299 records. Experimental results showed that the Random Forest algorithm achieved accuracy, precision, recall, and F1 scores of 95%, 97.62%, 95.35%, and 96.47%, respectively, during the test phase for the second dataset. The same algorithm achieved accuracy scores of 100% for the first dataset and 97.68% for the second dataset, while 100% precision, recall, and F1 scores were reached for both datasets.
Copyright © 2021 Ebrahim Mohammed Senan et al.


Year:  2021        PMID: 34966445      PMCID: PMC8712170          DOI: 10.1155/2021/8500314

Source DB:  PubMed          Journal:  Comput Math Methods Med        ISSN: 1748-670X            Impact factor:   2.238


1. Introduction

Cardiovascular disease (CVD) is one of the most common causes of morbidity and mortality, contributing to a third of deaths worldwide according to the American College [1]. Numerous surveys have concluded that nearly 56 million people lost their lives in 2012; amongst them, 17.5 million died due to CVD [2]. According to [3], CVD has three types: circulatory, structural, and electrical. In circulatory CVD, also called coronary artery disease (CAD), atherosclerosis (i.e., an accumulation of plaque) builds up on the inner walls of a coronary artery, causing the arteries to harden [4-6]. This accumulated plaque consists of cholesterol or fatty deposits that restrict blood flow through the arteries. When CAD progresses, potentially fatal symptoms, such as stroke and myocardial infarction, begin to appear. Therefore, early treatment and reversal of atherosclerosis are important to minimise CVD risks. Several imaging methods with high accuracy and sensitivity have been introduced to determine disease severity [7, 8]. They include dobutamine stress echocardiography, exercise electrocardiogram (ECG), coronary computed tomography angiography, myocardial perfusion scintigraphy, and conventional coronary angiography. However, all these imaging methods can only detect atherosclerosis that has already developed. For instance, ECG is an easy-to-access diagnostic tool that records the electrical activity of the heart [9]. ECG signals can be obtained during exercise as a patient undergoes stress [10], and heart rate variability signals can be extracted from them [11]. The ECG technique is the primary choice for evaluating heart conditions because it is easy to perform, inexpensive, and noninvasive. However, manual diagnosis of ECG signals is tedious and difficult because the signals differ morphologically. 
In addition, the discovery of biomarkers, such as chest pain, serum cholesterol, resting electrocardiographic results, resting blood pressure, maximum heart rate, depression, fasting blood sugar, exercise-induced angina, slope of peak exercise, number of major vessels, segment, and thallium stress, in clinical samples has been helpful in understanding and diagnosing atherosclerosis. Therefore, to diagnose ECG signals with high accuracy, artificial intelligence techniques are used to help diagnose arteriosclerosis through these tests and biomarkers. As the data collected for these biomarkers are huge, most studies on diagnosis systems focused on preprocessing to clean the data, select the most important representative features, delete redundant features, and choose appropriate classification algorithms. For instance, Zimmerli et al. presented an assay for polypeptides that contribute to biomarkers for identifying CAD. They screened 359 urine samples from 88 patients with CAD and 282 controls. The system reached a sensitivity of 98% and a specificity of 83% [12]. Likewise, Tan et al. presented three diagnostic algorithms for a set of diagnostic features of heart disease. The systems were evaluated by accuracy, sensitivity, and specificity on four datasets: Cleveland, Hungarian, SPECTF, and Switzerland. Their proposed system reached accuracy scores of 81.19%, 92.68%, 82.7%, and 84.52% for Cleveland, Hungarian, SPECTF, and Switzerland, respectively [13]. Arabasadi et al. presented a hybrid method to diagnose CAD; their algorithm was able to increase the performance of neural networks by 10% through a genetic algorithm (GA) that optimises the initial weights. The system achieved an accuracy of 93.85%, a sensitivity of 97%, and a specificity of 92% [14]. Maji and Arora presented a hybrid method between Decision Tree and ANN classifiers for diagnosing heart disease. 
The ANN achieved an accuracy of 77.4%, a sensitivity of 77.4%, and a specificity of 21.7% [15]. Saqlain et al. presented three feature selection algorithms: a Fisher score-based algorithm, a forward feature selection algorithm, and a backward (reverse) feature selection algorithm. The selected features were fed into an SVM classifier with an RBF kernel for the diagnosis of four cardiac disease datasets. The system achieved an accuracy of 81.19% for the Cleveland dataset [16]. Babu et al. extracted 14 features and fed them into three classification algorithms, namely, K-means, MAFI, and Decision Tree, to classify heart disease. All algorithms performed well for diagnosing heart failure [17]. Reddy and Khare presented a rule-based fuzzy logic (RBFL) algorithm to predict heart disease and help medical practitioners diagnose it at an early stage. The locality preserving projection (LPP) method was first applied to determine the most important characteristics of the UCI dataset. The RBFL algorithm achieved an accuracy of 78% [18]. Feshki et al. presented a Particle Swarm Optimization method with a feed-forward backpropagation Neural Network, which reduced the 13 features to 8 enhanced features; the system reached an accuracy of 91.94% with these selected features [19]. Uyar and İlhan presented a GA-trained Recurrent Fuzzy Neural Network (RFNN) to diagnose heart disease, and the system achieved an accuracy of 97.78% [20]. Haq et al. presented seven machine learning algorithms to classify features extracted by three feature selection methods on the heart failure dataset. The performance of the systems was evaluated using several measures, such as accuracy, sensitivity, specificity, receiver operating characteristic (ROC) curves, and AUC, and they reached good results [21]. Kerexeta et al. 
presented two methods for predicting the risk of readmitting a patient with high blood pressure to hospital. The first method combined supervised and unsupervised classification and reached an AUC of 61%. The second method combined Naïve Bayes and Decision Tree classifiers and achieved an AUC of 73%. The limitations of this study relate to the dataset because the study is based on a readmission-day threshold [22]. Adler et al. presented machine learning algorithms that link patient features with mortality by training a Decision Tree algorithm with a set of features associated with high mortality risk. Eight features carrying a very high risk of death were extracted, and the risk score based on these features reached 88% on the AUC scale. A limitation is that the MARKER-HF data were derived from two hospitals in San Diego, California, and are therefore subject to demographic region bias [23]. Jin et al. presented an effective method for predicting heart failure by using a neural network, where one-hot encoding and word vectors were used to model the diagnosis and prediction of heart failure through a long short-term memory algorithm [24]. Gjoreski et al. presented a method that combines machine learning and deep learning to diagnose heart failure based on the heart sounds of 947 people. Machine learning algorithms were trained on expert features, while deep learning models were trained on the spectral chains of the heart signal. The method achieved an accuracy of 92.9% and an error rate of 7.1% [25]. Vijayashree and Sultana presented the Particle Swarm Optimization (PSO) method, which selects the most appropriate features for diagnosing heart disease. PSO was used in conjunction with SVM to reduce the number of features and increase accuracy, and the system achieved good results for diagnosing heart disease [26]. However, most of the studies discussed above are insufficient. 
Therefore, the main contributions of this paper are as follows:
(i) Adjusting and optimising the hyperparameters of five machine learning algorithms for predicting heart failure (HF) with high accuracy
(ii) Selecting the most important, strongly correlated features to obtain more realistic diagnostic results
(iii) Applying feature scoring to rank the features by their correlation with the target feature
(iv) Solving the class imbalance issue in the second dataset with the synthetic minority oversampling technique (SMOTE)
(v) Creating new features that have a strong correlation with the target feature to obtain more realistic diagnostic results
The remainder of this study is organised as follows. Section 2 gives a background on the overview and risk factors of HF diseases and an explanation of the machine learning algorithms. Section 3 presents the exploratory data analysis (EDA) that describes the two datasets, the correlations between the features, and the replacement of missing values. Section 4 presents data processing, including subsections on the engineering and selection of the most important features. Section 5 reports the experimental results and discussion. Finally, Section 6 concludes the paper.

2. Background

2.1. Overview and Risk Factors of HF Diseases

Heart disease and atherosclerosis are disorders of the heart and arteries that include HF, coronary heart disease (heart attacks), cerebrovascular diseases (strokes), and other types of heart disease [27]. CVD is one of the most common causes of death in the world, with the number of deaths reaching roughly 17 million annually worldwide. HF occurs because the heart is unable to pump enough blood to the rest of the body; it is caused by diabetes, high blood pressure, and other heart diseases [28]. Doctors classify HF into two types on the basis of the ejection fraction, the percentage of blood that the heart pumps out during one contraction, whose physiological value ranges from 50% to 75%. In HF with reduced ejection fraction, previously called systolic or left ventricular (LV) HF, the ejection fraction drops below 40% [29]. The other type is HF with preserved ejection fraction, previously called diastolic HF, in which the ejection fraction remains normal. In this case, the LV contracts normally during systole but fails during diastole due to ventricular stiffness; thus, blood pumping is impaired [30]. Given the importance of the heart, predicting HF has become of utmost importance for physicians; however, even today in clinical practice, physicians fail to reach high accuracy in predicting HF [31]. Electronic medical records could be considered one of the most useful sources for uncovering correlated data amongst patients and an important source for researchers and clinical practice [32]. Machine learning techniques play an important role in analysing medical records, predicting the survival of each patient with HF, and detecting the most important features that lead to HF [33].

2.2. Machine Learning

Machine learning is the ability of computer programs to adapt, learn, and address new problems. Machine learning algorithms are applied in medical diagnostics to help experts support their diagnostic decisions. Machine learning has the ability to learn from training data and solve classification problems for new data [34].

2.2.1. K-Nearest Neighbor (KNN)

KNN is used to solve classification problems based on stored data. The algorithm stores the training dataset in memory. When a new data point is to be classified, the algorithm measures the similarity between the new point and the stored data, usually by Euclidean distance, and assigns the new point to the most common class amongst its K nearest neighbours.
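The mechanism above can be sketched in a few lines. This is a minimal illustration (toy data, Euclidean distance, majority vote), not the implementation used in the paper:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote of its k nearest stored points."""
    # Euclidean distance from the query to every stored training point
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority class amongst the k closest neighbours
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy example: two well-separated clusters
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (2, 2), k=3))  # -> 0 (nearest cluster)
```

In practice a library implementation such as scikit-learn's `KNeighborsClassifier` would be used, with K tuned as a hyperparameter.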

2.2.2. Support Vector Machine (SVM)

This model is similar to neural networks in its objective of adjusting a set of parameters that establish boundaries in a dimensional space and approximate functions or separate patterns into different regions of the attribute space. The difference lies in the training method for adjusting the parameters: SVMs base their training on maximising the margin between the hyperplane and the instances of two classes (the model was initially designed to solve two-class problems, but extensions for multiclass and regression problems exist) [35]. The algorithm works with linear and nonlinear data. When the data are linearly separable, the algorithm finds the hyperplane with maximum margin, that is, the largest distance between the data points of the two classes. A maximum margin allows the algorithm to classify the test dataset with high confidence. The hyperplane is the decision boundary that separates the class data. Support vectors are the data points that lie closest to the hyperplane; the margin is maximised with respect to them, and the hyperplane changes if these support vectors are removed. Therefore, these points build the SVM classifier. For nonlinear data, the original coordinate space is transformed into a separable space [35].

2.2.3. Decision Tree

Decision Tree is used to solve classification problems. It consists of a root node, inner nodes, branches, and leaf nodes, organised in the form of a tree: the root node represents the complete dataset, the internal nodes represent the features contained in the dataset, the branches represent the decision rules, and the leaf nodes represent the outcomes. Decisions are made on the basis of the features selected from the dataset. When predicting, the algorithm starts from the root node, compares the value of the root feature with the corresponding feature value of the record, and, in accordance with the comparison, moves to the next node. The process repeats at each internal node until a leaf node is reached.

2.2.4. Random Forest

Random Forest is used to solve classification problems. It works on the basis of ensemble learning: it solves the problem by combining several classifiers to improve the performance of the algorithm. The algorithm contains several Decision Tree classifiers. Each Decision Tree is trained on a subset of the data, and the results are aggregated to improve prediction accuracy. Instead of taking the prediction of one tree, the Random Forest algorithm takes a prediction from each tree and makes its final prediction on the basis of majority voting.

2.2.5. Logistic Regression

Logistic Regression is a supervised machine learning algorithm used to solve classification problems by predicting probability-based target variables. In binary Logistic Regression, the target (dependent) variable contains two classes; in multinomial Logistic Regression, the target variable has three or more unordered classes; and in ordinal Logistic Regression, the target variable contains three or more ordered classes.
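The five classifiers described in this section can be instantiated side by side with scikit-learn. The snippet below is a sketch on synthetic stand-in data with the same shape as the Cleveland task (13 features, binary target); the hyperparameter values shown are defaults for illustration, not the optimised values the paper reports:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Stand-in data: 300 records, 13 features, two classes
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "SVM": SVC(kernel="rbf", C=1.0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy {model.score(X_test, y_test):.2f}")
```

Hyperparameter optimisation of the kind the paper describes would typically wrap each model in a grid or randomised search (e.g. `GridSearchCV`) over candidate values.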

3. Exploratory Data Analysis (EDA)

This section focuses on data preprocessing, including missing data treatment, outlier removal, and feature correlation test. Figure 1 describes the structure applied to evaluate the performance of the algorithms on the two datasets for early diagnosis of heart disease.
Figure 1

Experimental methodology of heart disease.

3.1. Description of Datasets

Two datasets of heart disease and heart failure data were collected from the UCI machine learning repository. The first dataset is called Cleveland (https://archive.ics.uci.edu/ml/datasets/heart+Disease) [36] and is commonly used by researchers in machine learning for heart disease diagnosis. The Cleveland dataset consists of 303 records with 76 features. However, the UCI repository provides an approved subset of the 14 features that are most influential in diagnosing heart disease. Table 1 describes the features, measures, and their ranges. Thirteen features are used for diagnosing heart disease, plus one target feature that describes whether or not the disease exists.
Table 1

Diagnosing heart disease features with metrics from the Cleveland dataset.

Features | Description | Explanation | Type
Age | Patient age | Age of patient in years | Numeric
Sex | Patient gender | 1 = male; 0 = female | Nominal
cp | Chest pain | 1 = typical angina; 2 = atypical angina; 3 = nonanginal pain; 4 = asymptomatic | Nominal
trestbps | Patient's blood pressure at rest (mm/Hg) | Resting blood pressure (mm/Hg) | Numeric
chol | Patient's cholesterol (mg/dL) | Serum cholesterol (mg/dL) | Numeric
fbs | Patient's blood sugar during fasting | 1 = fasting blood sugar > 120 mg/dL; 0 = fasting blood sugar < 120 mg/dL | Nominal
restecg | Electrocardiographic measurement at rest | 0 = normal; 1 = ST-T wave abnormality; 2 = probable left ventricular hypertrophy | Nominal
thalach | Maximum heart rate | Maximum heart rate achieved | Numeric
exang | Angina due to exercise | 1 = exercise-induced angina; 0 = no exercise-induced angina | Nominal
Oldpeak | ST depression | ST depression induced by exercise relative to rest | Numeric
Slope | Slope of ST | 1 = upsloping; 2 = flat; 3 = downsloping | Nominal
ca | Number of major vessels | Number of major vessels (0-3) colored by fluoroscopy | Numeric
thal | Blood disorder | 3 = normal; 6 = fixed defect; 7 = reversible defect | Nominal
Target | Diagnosis | 0 = normal; 1 = heart disease | Nominal
The second dataset for predicting HF contains medical records of 299 patients with HF (https://archive.ics.uci.edu/ml/machine-learning-databases/00519/) [37]. The dataset was collected from the Faisalabad Institute of Heart Disease and the Allied Hospital in Faisalabad. Table 2 describes the features, measurement, and range of HF prediction. Twelve features could predict HF in addition to the target feature that describes whether or not the patient died during follow-up. Table 2 also explains each feature and the subsections that represent each feature.
Table 2

Heart failure features with metrics from the Allied Hospital dataset.

Features | Explanation | Range | Measurement
Age | Age of patient in years | [40, ..., 95] | Year
Anaemia | 1 = haematocrit levels lower than 36%; 0 = haematocrit levels higher than 36% | 0, 1 | Boolean
High blood pressure | 1 = patient has hypertension; 0 = patient has no hypertension | 0, 1 | Boolean
Creatinine phosphokinase | Level of CPK in blood | [23, ..., 7861] | mcg/L
Diabetes | 1 = patient has diabetes; 0 = patient has no diabetes | 0, 1 | Boolean
Sex | 1 = male; 0 = female | 0, 1 | Boolean
Platelets | Blood platelets | [25.01, ..., 850.00] | Kiloplatelets/mL
Serum creatinine | Level of creatinine in blood | [0.50, ..., 9.40] | mg/dL
Serum sodium | Level of sodium in blood | [114, ..., 148] | mEq/L
Smoking | 1 = patient smokes; 0 = patient does not smoke | 0, 1 | Boolean
Time | Periodic follow-up of patient | [4, ..., 285] | Days
Death event (target) | 1 = patient died during follow-up; 0 = patient did not die during follow-up | 0, 1 | Boolean

mcg/L refers to micrograms per litre. mL refers to millilitre. mEq/L refers to milliequivalents per litre.

3.2. Statistical Feature Correlation Using Heat Map

A heat map is a graphical representation that shows the correlation between features, that is, the degree to which each feature correlates with every other feature and with the target feature. Statistics is a set of computational tools used to interpret raw data and convert them into understandable information, and it is one of the tools used in machine learning; the two fields are closely related. In this study, descriptive statistics were calculated on the heart disease (Cleveland) and HF datasets to obtain the mean, standard deviation, and maximum and minimum values of the data samples. Table 3 describes the statistical operations applied to the Cleveland dataset, where count refers to the number of values recorded for each feature, mean refers to the mean of each feature, std refers to its standard deviation, and min and max refer to its minimum and maximum values. Descriptive statistics support graphic visualisations that make raw data easy to understand and relate to one another. Figure 2 illustrates the correlation between the features of the dataset: the "cp" (chest pain), "thalach," and "slope" features correlate most closely with the target feature, and the correlation of each feature with the others can also be read from the map.
Table 3

Statistical operations for the Cleveland dataset.

Statistical | Age | Sex | cp | trestbps | chol | fbs | restecg | thalach | exang | Oldpeak | Slope | ca | thal | Target
Count | 303 | 303 | 303 | 303 | 303 | 303 | 303 | 303 | 303 | 303 | 303 | 303 | 303 | 303
Mean | 54.37 | 0.68 | 0.97 | 131.6 | 246.3 | 0.15 | 0.53 | 149.7 | 0.33 | 1.04 | 1.4 | 0.73 | 2.31 | 0.54
std | 9.08 | 0.47 | 1.03 | 17.54 | 51.83 | 0.36 | 0.53 | 22.91 | 0.47 | 1.16 | 0.62 | 1.02 | 0.61 | 0.5
Min | 29 | 0 | 0 | 94 | 126 | 0 | 0 | 71 | 0 | 0 | 0 | 0 | 0 | 0
Max | 77 | 1 | 3 | 200 | 564 | 1 | 2 | 202 | 1 | 6.2 | 2 | 4 | 3 | 1
Figure 2

Feature correlation of the Cleveland dataset by using heat map.
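A heat map of this kind is simply a pairwise correlation matrix rendered as colours. The sketch below computes such a matrix with pandas on a small hypothetical stand-in for the Cleveland dataframe (the column names follow Table 1; the data here are synthetic, not the paper's):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with a few Cleveland-style columns
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cp": rng.integers(0, 4, 100),
    "thalach": rng.normal(150, 23, 100),
    "slope": rng.integers(0, 3, 100),
})
df["target"] = (df["cp"] + df["slope"] > 2).astype(int)

# Pairwise Pearson correlation matrix; a heat map colours exactly this matrix
corr = df.corr()

# Correlation of every feature with the target, sorted as in the discussion
print(corr["target"].sort_values(ascending=False))
```

With a plotting library the matrix would typically be drawn via something like `seaborn.heatmap(corr, annot=True)`.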

3.3. Treatment of Missing Values

Datasets that contain missing values need to be addressed and cleaned up. Missing values arise when patients skip some measurements during testing. The Cleveland dataset contained six missing values, so statistical measures must be applied to replace them. Statistical measures, such as the mean, median, and standard deviation, are applied to replace missing numerical values, while the mode method is applied to replace missing nominal values. Table 4 describes the missing values for the Cleveland dataset after processing. The mean method was applied to replace missing numerical values by calculating the mean of the feature and substituting it for the missing value. The mode method was applied to replace missing nominal values with the most common value of the feature.
Table 4

Missing values.

Features | Missing values
Age | 0
Sex | 0
cp | 0
trestbps | 0
chol | 0
fbs | 0
restecg | 0
thalach | 0
exang | 0
Oldpeak | 0
Slope | 0
ca | 0
thal | 0
Target | 0
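The mean/mode imputation described above can be sketched with pandas. The toy dataframe and values here are illustrative, not taken from the datasets:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "chol": [233.0, np.nan, 204.0, 236.0],   # numeric feature with a gap
    "cp":   [3, 2, None, 2],                 # nominal feature with a gap
})

# Numeric feature: replace the missing value with the column mean
df["chol"] = df["chol"].fillna(df["chol"].mean())

# Nominal feature: replace the missing value with the mode (most common value)
df["cp"] = df["cp"].fillna(df["cp"].mode()[0])

print(df.isnull().sum())  # every count is 0, as in Table 4
```

scikit-learn's `SimpleImputer` (with `strategy="mean"` or `"most_frequent"`) offers the same behaviour inside a pipeline.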

3.4. Balancing a Dataset

With regard to data balancing, the Cleveland dataset contains 165 people with heart disease and 138 people without heart disease; thus, the dataset is balanced. In the second dataset, 203 people did not die during follow-up, while 96 people died during follow-up; therefore, the dataset is unbalanced. To obtain satisfactory results, the dataset must be balanced during the training phase. In this study, the synthetic minority oversampling technique (SMOTE) was applied, which is one of the appropriate methods for balancing a dataset. For each point in the minority class, SMOTE finds its nearest neighbours within that class and generates a new synthetic sample at a random point between them; the mechanism continues until the minority class becomes approximately equal to the majority class during the training phase. Table 5 describes the second dataset before and after the application of SMOTE: the minority class (died during follow-up) increased from 79 training cases to 160, making the two classes equal during the training phase.
Table 5

Balancing the dataset by SMOTE.

Phase | Class | Before oversampling | After oversampling
Training 80% | Did not die during follow-up | 160 | 160
Training 80% | Died during follow-up | 79 | 160
Testing 20% | Did not die during follow-up | 43 | 43
Testing 20% | Died during follow-up | 17 | 17
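The core SMOTE interpolation step can be written from scratch in a few lines. This is a simplified sketch of the idea (random neighbour among the k nearest, random position on the connecting segment), not the exact implementation the paper used:

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples, SMOTE-style:
    pick a minority point, pick one of its k nearest minority-class
    neighbours, and interpolate at a random position between them."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    samples = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from the chosen point to every other minority point
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]      # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                       # position along the segment
        samples.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(samples)

# Minority class of 79 training cases grown to 160, as in Table 5
minority = np.random.default_rng(1).normal(size=(79, 12))
synthetic = smote(minority, n_new=160 - 79)
print(len(minority) + len(synthetic))  # -> 160
```

In practice the `SMOTE` class from the imbalanced-learn library is the usual choice; it is applied to the training split only, exactly as described above.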

3.5. Data Conversion

Data conversion is the transformation of data into a useful form that can be manipulated and analysed. In this study, categorical variables were converted into dummy variables, which take the values 0 and 1. Dummy variables are useful for representing multiple groups in a single regression equation. Table 6 describes the dataset after converting the categorical features to dummy variables.
Table 6

Converting categorical data to dummy.

 | Age | trestbps | chol | thalach | Oldpeak | Target | Sex_0 | Sex_1 | cp_0 | cp_1 | ... | Slope_2 | ca_0 | ca_1 | ca_2 | ca_3
0 | 63 | 145 | 233 | 150 | 2.3 | 1 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0
1 | 37 | 130 | 250 | 187 | 3.5 | 1 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0
2 | 41 | 130 | 204 | 172 | 1.4 | 1 | 1 | 0 | 0 | 1 | ... | 1 | 1 | 0 | 0 | 0
3 | 56 | 120 | 236 | 178 | 0.8 | 1 | 0 | 1 | 0 | 1 | ... | 1 | 1 | 0 | 0 | 0
4 | 57 | 120 | 354 | 163 | 0.6 | 1 | 1 | 0 | 0 | 1 | ... | 1 | 1 | 0 | 0 | 0
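Dummy encoding of this kind is a one-liner in pandas. A minimal sketch on three illustrative Cleveland-style rows (values hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [63, 37, 41],
    "Sex": [1, 1, 0],   # nominal: 1 = male, 0 = female
    "cp":  [3, 2, 1],   # nominal: chest pain type
})

# Convert the nominal columns to 0/1 dummy (indicator) columns
dummies = pd.get_dummies(df, columns=["Sex", "cp"])
print(list(dummies.columns))
# -> ['Age', 'Sex_0', 'Sex_1', 'cp_1', 'cp_2', 'cp_3']
```

Each category becomes its own 0/1 column, which is what produces the `Sex_0`, `Sex_1`, `cp_0`, ... columns shown in Table 6.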

3.6. Data Standardization

Preprocessing is one of the most important stages of data mining, and it underpins the diagnostic accuracy of the following stages. In this study, the data standardization method was applied to the two datasets. Standardization coordinates the data internally, ensures that all data have the same formatting and content, and gives the dataset more meaning. It transforms each feature so that its distribution has a mean of 0 and a standard deviation of 1. The standardization method was applied in accordance with Equation (1):

z = (x − μ) / σ, (1)

where x denotes the value of each feature, μ refers to the mean of that feature, σ denotes its standard deviation, and z refers to the feature in standardised form. Each value of a feature has the feature's mean subtracted from it and is then divided by the feature's standard deviation. Table 7 describes the application of the standardization method to five features of the dataset, namely, "Age," "trestbps," "chol," "thalach," and "Oldpeak."
Table 7

Preprocessing of data by using standardization method.

 | Age | trestbps | chol | thalach | Oldpeak
0 | 0.95 | 0.76 | −0.26 | 0.02 | 1.09
1 | −1.92 | −0.09 | 0.07 | 1.63 | 2.12
2 | −1.47 | −0.09 | −0.82 | 0.98 | 0.31
3 | 0.18 | −0.66 | −0.2 | 1.24 | −0.21
4 | 0.29 | −0.66 | 2.08 | 0.58 | −0.38
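Equation (1) applied to a single column looks like this (a sketch using the population standard deviation; the sample values are the first five "chol" entries from Table 6 and serve only as an example):

```python
import numpy as np

def standardize(col):
    """z = (x - mu) / sigma for one feature column."""
    col = np.asarray(col, dtype=float)
    return (col - col.mean()) / col.std()

chol = [233, 250, 204, 236, 354]
z = standardize(chol)
print(z.round(2))  # standardised values: mean ~0, std ~1
```

scikit-learn's `StandardScaler` performs the same transformation column by column and is the usual way to fit the scaler on the training split and reuse it on the test split.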

4. Feature Processing

4.1. Feature Engineering

Feature engineering, also called feature creation, is the process of creating new features from the existing dataset for the purpose of training machine learning models and obtaining more reliable results. Usually, feature engineering is a manual process relying on intuition, domain knowledge, and data manipulation, which makes it tedious and limited. Automated feature engineering therefore helps data scientists create well-correlated new features and use them for training. Table 8 lists the additional features created by correlating pairs of features, whereby 60 features were obtained from the original 13 features of the Cleveland dataset. With these new features, the solution to the classification problem can be enhanced: the better the features, the better the results.
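Judging by the names in Table 8 (e.g. exang_oldpeak2, thal_trestbps2), the new features appear to be squares of original features and products of feature pairs. The sketch below illustrates that reading on hypothetical data; the paper does not spell out the exact construction rule, so this is an assumption:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "exang":   rng.integers(0, 2, 50),
    "oldpeak": rng.random(50) * 6,
    "thal":    rng.integers(0, 4, 50),
})

engineered = df.copy()
# Squared feature, matching names like 'oldpeak2' in Table 8
engineered["oldpeak2"] = df["oldpeak"] ** 2
# Cross-products of feature pairs, matching names like 'exang_oldpeak2'
engineered["exang_oldpeak2"] = df["exang"] * engineered["oldpeak2"]
engineered["thal_oldpeak2"] = df["thal"] * engineered["oldpeak2"]

print(engineered.columns.tolist())
```

Generating all such squares and pairwise products over the 13 original features yields a candidate pool from which the 60 features of Table 8 could then be scored.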
Table 8

Creation of features and arranging the best features.

No | Feature | Score
1 | exang_oldpeak2 | 652.854396
2 | exang_ca | 263.212119
3 | Sex_oldpeak2 | 252.657949
4 | thal_oldpeak2 | 247.914913
5 | exang_trestbps2 | 241.749732
6 | Thalach | 186.180286
7 | Oldpeak2 | 171.4864
8 | fbs_oldpeak2 | 164.89897
9 | Age2_oldpeak2 | 139.151372
10 | exang_chol2 | 131.365522
11 | thal_trestbps2 | 116.88462
12 | thal_chol2 | 113.985724
13 | thal_ca | 90.668503
14 | Sex_trestbps2 | 78.162433
15 | Sex_ca | 77.302537
16 | Oldpeak | 71.692782
17 | Ca | 71.020719
18 | Cp | 62.116086
19 | Age2_ca | 54.956199
20 | Age2_trestbps2 | 53.221349
21 | restecg_cp | 51.837075
22 | fbs_ca | 43.441045
23 | exang | 38.518849
24 | Age2_chol2 | 36.438097
25 | Sex_chol2 | 35.823916
26 | fbs_cp | 32.072291
27 | restecg_thalach2 | 29.718076
28 | exang_thalach2 | 27.279766
29 | Age | 22.210517
30 | Chol | 21.690747
31 | exang_slope | 20.48139
32 | exang_cp | 18.443334
33 | restecg_slope | 18.246965
34 | trestbps | 15.094591
35 | restecg_trestbps2 | 12.462827
36 | thal_thalach2 | 12.403249
37 | Slope | 9.677715
38 | restecg_oldpeak2 | 8.249627
39 | Sex | 7.72169
40 | thal_slope | 7.199342
41 | Sex_thalach2 | 5.89906
42 | fbs_trestbps2 | 5.897746
43 | thal_cp | 5.838268
44 | Thal | 5.75303
45 | thalach2 | 5.688919
46 | restecg_chol2 | 5.639162
47 | fbs_slope | 5.480661
48 | Age2_thalach2 | 5.466855
49 | fbs_thalach2 | 4.64652
50 | Age2_slope | 4.031292
51 | Sex_slope | 2.987813
52 | restecg | 2.877743
53 | Age2_cp | 2.324974
54 | fbs_chol2 | 2.116344
55 | Age2 | 2.00334
56 | trestbps2 | 1.655964
57 | chol2 | 0.877493
58 | Sex_cp | 0.564425
59 | restecg_ca | 0.256374
60 | fbs | 0.184946

4.2. Feature Selection

Feature selection methods aim to remove unimportant features and focus on the features that contribute most to predicting the target feature. Reducing the number of features reduces the computational cost of modelling and improves the performance of the model. Statistical feature selection assesses the relationship between each feature and the target feature and selects the input features with the strongest correlation to the target. In this study, SelectKBest with the chi-square method was used to extract the best features from the dataset. The SelectKBest function uses chi-square as a score function to determine the score, and thereby the correlation, between each feature and the target feature. A low score indicates that the feature is independent of the target feature, whereas a high score indicates that the feature is not randomly related to the target feature. Table 8 shows how the SelectKBest function automatically returns the K features with the highest scores for the Cleveland dataset. The exang_oldpeak2 feature, which combines the exang and oldpeak2 features, had the highest score of 652.85, while the fbs feature had the lowest score of 0.18.
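The SelectKBest/chi-square step can be sketched as follows. The data here are a synthetic stand-in (chi-square requires non-negative feature values, hence the shift); the choice k=10 is illustrative, not the paper's setting:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2

# Stand-in data: 300 records, 13 features, binary target
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X = X - X.min(axis=0)           # chi2 requires non-negative values

selector = SelectKBest(score_func=chi2, k=10)   # keep the 10 best features
X_best = selector.fit_transform(X, y)

# selector.scores_ ranks features by dependence on the target, as in Table 8
ranking = np.argsort(selector.scores_)[::-1]
print(X_best.shape, ranking[:3])
```

`selector.scores_` holds exactly the kind of chi-square score listed in Table 8; sorting it in descending order reproduces the table's ranking.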

5. Experimental Result and Discussion

5.1. Splitting Datasets

The Cleveland dataset consists of 303 records in two classes: heart disease (class 1), which contains 165 records (54.46%), and normal (class 0), which contains 138 records (45.54%). The second dataset (HF) contains 299 records in two classes: did not die during follow-up (class 0), containing 203 records (67.89%), and died during follow-up (class 1), containing 96 records (32.11%). After balancing the second dataset, the two classes became equal at 160 cases each during the training phase. Table 9 describes the distribution of the two datasets over the two classes during the training and testing phases.
Table 9

Splitting the datasets.

Dataset | Cleveland dataset | Cleveland dataset | HF dataset | HF dataset
Class | Heart disease | Normal | Died during follow-up | Did not die during follow-up
Training | 133 | 109 | 160 | 160
Testing | 32 | 29 | 43 | 17
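An 80/20 stratified split of the 303 Cleveland records reproduces the overall training/testing sizes above. This is a plausible sketch of the split, not necessarily the paper's exact procedure:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 303 Cleveland records: 165 heart disease (1), 138 normal (0)
X = np.arange(303).reshape(-1, 1)       # record indices as placeholder data
y = np.array([1] * 165 + [0] * 138)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

print(len(X_tr), len(X_te))             # -> 242 61, as in Section 5.3
print((y_te == 1).sum(), (y_te == 0).sum())  # per-class testing counts
```

Stratification keeps the class proportions of the full dataset in both splits, which matches the near-equal class ratios shown in Table 9.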

5.2. Evaluation Criteria

Four qualitative measures were used, namely, accuracy, precision, recall, and F1 score, to evaluate the proposed systems on the two datasets, as shown in Equations (2)-(5):

Accuracy = (TP + TN) / (TP + TN + FP + FN), (2)
Precision = TP / (TP + FP), (3)
Recall = TP / (TP + FN), (4)
F1 score = 2 × (Precision × Recall) / (Precision + Recall), (5)

where TP is the number of heart disease samples that are correctly classified, TN is the number of nonheart disease samples that are correctly classified, FN is the number of heart disease samples classified as nonheart disease, and FP is the number of nonheart disease samples classified as heart disease [38].
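These four measures follow directly from the confusion-matrix counts. As a worked check, the counts TP = 41, TN = 16, FP = 1, FN = 2 are one assignment over the 60-record HF test set that is consistent with the Random Forest test scores quoted in the abstract (95%, 97.62%, 95.35%, 96.47%); they are used here purely for illustration:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=41, tn=16, fp=1, fn=2)
print(f"{acc:.2%} {prec:.2%} {rec:.2%} {f1:.2%}")
# -> 95.00% 97.62% 95.35% 96.47%
```

In practice these values come straight from `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`) applied to the test predictions.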

5.3. Results for the Cleveland Dataset

Several machine learning algorithms were applied to predict heart disease. The hyperparameters of the classification algorithms were optimised to reduce the loss function and obtain high diagnostic performance; tuning hyperparameters is an important process for determining the behaviour of machine learning networks during training. In this study, machine learning models were applied to the dataset containing 13 original features for 303 patients. New features were created by correlating the original features, expanding the set to 60 features per patient. The dataset was divided randomly into 80% for training (242 patients) and 20% for testing (61 patients). Figure 3 shows the performance of the classification algorithms on the dataset during the training and testing phases. Table 10 shows the diagnostic results for heart disease obtained by the five machine learning algorithms during the training and testing phases. During the training phase, Decision Tree and Random Forest obtained the best results, 100% for all measures. However, during the testing phase, the SVM and KNN algorithms achieved the best results, with approximately 90% for all measures. Logistic Regression obtained the lowest results during the training phase amongst all the algorithms. During the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached accuracy scores of 92.56%, 87.60%, 100%, 100%, and 87.60%, respectively; in the testing phase, their accuracy scores were 90.16%, 90.16%, 81.97%, 85.25%, and 88.52%, respectively. For precision, during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached 93.45%, 88.82%, 100%, 100%, and 88.19%, respectively; in the testing phase, their precision rates were 90.12%, 90.26%, 82.43%, 85.82%, and 88.56%, respectively.
During the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached recall rates of 92.52%, 87.63%, 100%, 100%, and 87.44%, respectively; in the testing phase, their recall rates were 90.45%, 90.38%, 82.39%, 85.29%, and 89.46%, respectively. For the F1 score, during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached 92.98%, 88.23%, 100%, 100%, and 87.81%, respectively; in the testing phase, their F1 scores were 90.28%, 90.32%, 82.41%, 85.55%, and 89%, respectively.
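The expansion from the 13 original features to a larger correlated set can be sketched as pairwise interaction features. The helper below is an illustrative guess at the construction (the paper names products such as exang_oldpeak2 but does not give the exact recipe), and the feature names are only examples:

```python
from itertools import combinations

def add_interaction_features(rows, names):
    """Append the product of every pair of base features to each record."""
    pair_names = [f"{a}_{b}" for a, b in combinations(names, 2)]
    out = []
    for row in rows:
        feats = dict(zip(names, row))
        extended = list(row) + [feats[a] * feats[b]
                                for a, b in combinations(names, 2)]
        out.append(extended)
    return out, names + pair_names

# Hypothetical 4-feature records: 4 base + 6 pairwise = 10 features each.
rows = [[1.0, 2.0, 0.0, 3.0]]
names = ["exang", "oldpeak", "fbs", "thalach"]
new_rows, new_names = add_interaction_features(rows, names)
print(len(new_names))   # 10
print(new_names[4])     # exang_oldpeak
```

Applied to 13 base features this would yield 13 + 78 combined features; a scoring step such as SelectKBest would then keep only the strongest, which is consistent with the 60 features reported here.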
Figure 3

Evaluating the performance of five classifiers on the Cleveland dataset.

Table 10

Results of diagnosing heart disease (Cleveland dataset) by using five machine learning algorithms.

Criteria      | SVM                       | KNN                       | Decision Tree             | Random Forest             | Logistic Regression
              | Train (80%) | Test (20%)  | Train (80%) | Test (20%)  | Train (80%) | Test (20%)  | Train (80%) | Test (20%)  | Train (80%) | Test (20%)
Accuracy (%)  | 92.56       | 90.16       | 87.60       | 90.16       | 100         | 81.97       | 100         | 85.25       | 87.60       | 88.52
Precision (%) | 93.45       | 90.12       | 88.82       | 90.26       | 100         | 82.43       | 100         | 85.82       | 88.19       | 88.56
Recall (%)    | 92.52       | 90.45       | 87.63       | 90.38       | 100         | 82.39       | 100         | 85.29       | 87.44       | 89.46
F1 score (%)  | 92.98       | 90.28       | 88.23       | 90.32       | 100         | 82.41       | 100         | 85.55       | 87.81       | 89.00
Table 11 and Figure 4 describe an in-depth analysis of the results for each class, where heart disease = 1 and nonheart disease = 0. The dataset was divided into 80% for training and 20% for testing. The training data comprised 133 records for heart disease and 109 for nonheart disease, while the test data comprised 32 records for heart disease and 29 for nonheart disease. During the training phase, Decision Tree and Random Forest achieved the best results for diagnosing heart and nonheart disease, with 100% for all measures. However, during the testing phase, KNN performed better than the rest of the algorithms, with 90% for all criteria when diagnosing negative cases (nonheart disease) and 91% for all criteria when diagnosing positive cases (heart disease). First, in the analysis of the results for the diagnosis of heart disease (class 1), during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached a precision of 92%, 88%, 100%, 100%, and 87%, respectively; in the testing phase, their precision was 93%, 91%, 86%, 87%, and 90%, respectively. For the recall, during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached 95%, 90%, 100%, 100%, and 91%, respectively; in the testing phase, their recall was 88%, 91%, 78%, 84%, and 88%, respectively. For the F1 score, during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached 93%, 89%, 100%, 100%, and 89%, respectively; in the testing phase, their F1 scores were 90%, 91%, 82%, 86%, and 89%, respectively.
Second, in the analysis of the results for the diagnosis of nonheart disease (class 0), during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached a precision of 93%, 88%, 100%, 100%, and 88%, respectively; in the testing phase, their precision was 87%, 90%, 78%, 83%, and 87%, respectively. For the recall, during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached 90%, 84%, 100%, 100%, and 83%, respectively; in the testing phase, their recall was 93%, 90%, 86%, 86%, and 90%, respectively. For the F1 score, during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached 92%, 86%, 100%, 100%, and 86%, respectively; in the testing phase, their F1 scores were 90%, 90%, 82%, 85%, and 88%, respectively.
Table 11

Per-class results of diagnosing heart disease by using five machine learning algorithms.

SN | Classifiers          | Division of data | Class | Precision (%) | Recall (%) | F1 score (%) | Number of patients
1  | SVM                  | Training (80%)   | 0     | 93            | 90         | 92           | 109
2  |                      |                  | 1     | 92            | 95         | 93           | 133
3  |                      | Testing (20%)    | 0     | 87            | 93         | 90           | 29
4  |                      |                  | 1     | 93            | 88         | 90           | 32
5  | KNN                  | Training (80%)   | 0     | 88            | 84         | 86           | 109
6  |                      |                  | 1     | 88            | 90         | 89           | 133
7  |                      | Testing (20%)    | 0     | 90            | 90         | 90           | 29
8  |                      |                  | 1     | 91            | 91         | 91           | 32
9  | Decision Tree        | Training (80%)   | 0     | 100           | 100        | 100          | 109
10 |                      |                  | 1     | 100           | 100        | 100          | 133
11 |                      | Testing (20%)    | 0     | 78            | 86         | 82           | 29
12 |                      |                  | 1     | 86            | 78         | 82           | 32
13 | Random Forest        | Training (80%)   | 0     | 100           | 100        | 100          | 109
14 |                      |                  | 1     | 100           | 100        | 100          | 133
15 |                      | Testing (20%)    | 0     | 83            | 86         | 85           | 29
16 |                      |                  | 1     | 87            | 84         | 86           | 32
17 | Logistic Regression  | Training (80%)   | 0     | 88            | 83         | 86           | 109
18 |                      |                  | 1     | 87            | 91         | 89           | 133
19 |                      | Testing (20%)    | 0     | 87            | 90         | 88           | 29
20 |                      |                  | 1     | 90            | 88         | 89           | 32
Figure 4

The performance of classification algorithms for each class.

5.4. Results of the HF Dataset

A medical dataset containing 299 patients with HF was analysed. This section describes the outcomes of predicting patient survival during the follow-up period. The features were ranked according to their correlation with the target feature (death event), and values missing because some tests were not performed during patient examination were imputed. Correlated features were created from pairs of features, which had a substantial effect on increasing the prediction accuracy. The hyperparameters of the classification algorithms were tuned to reduce the loss function and obtain high predictive results. The dataset was divided into 80% for training (160 patients who died during follow-up and 79 patients who did not) and 20% for testing (43 patients who died during follow-up and 17 patients who did not). Figure 5 illustrates the performance of the algorithms on this dataset during the training and testing phases. Table 12 shows the results for predicting HF by using the five classification algorithms during the training and testing phases. Random Forest achieved the best performance during both phases, followed by Decision Tree, KNN, SVM, and Logistic Regression. During the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached an accuracy of 92.35%, 96.82%, 96.46%, 97.68%, and 91.05%, respectively; in the testing phase, their accuracy was 90%, 93.33%, 95%, 95%, and 88.33%, respectively. For precision, during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached 95.41%, 95.76%, 97.11%, 100%, and 94.52%, respectively; in the testing phase, their precision rates were 93.02%, 93.33%, 93.48%, 97.62%, and 93%, respectively.
During the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached recall rates of 96.10%, 98.51%, 100%, 100%, and 92.39%, respectively; in the testing phase, their recall rates were 93.02%, 97.67%, 100%, 95.35%, and 90.90%, respectively. For the F1 score, during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached 95.75%, 97.12%, 98.53%, 100%, and 93.44%, respectively; in the testing phase, their F1 scores were 93.02%, 95.45%, 96.63%, 96.47%, and 91.93%, respectively.
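The hyperparameter tuning described above can be sketched with a cross-validated grid search. The parameter grid and the synthetic stand-in data below are illustrative assumptions, not the grid the authors actually searched:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the clinical records (299 patients, binary target).
X, y = make_classification(n_samples=299, n_features=12, n_informative=6,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Search a small, hypothetical hyperparameter grid with 3-fold cross-validation.
grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42), grid, cv=3)
search.fit(X_train, y_train)

print(search.best_params_)                  # best hyperparameter combination
print(round(search.score(X_test, y_test), 2))  # held-out accuracy
```

The model refit on the best combination is then evaluated once on the held-out 20% test split, mirroring the train/test protocol used for both datasets.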
Figure 5

Evaluating the performance of five classifiers on the heart failure dataset.

Table 12

Prediction results of heart failure by using five machine learning algorithms.

Criteria      | SVM                       | KNN                       | Decision Tree             | Random Forest             | Logistic Regression
              | Train (80%) | Test (20%)  | Train (80%) | Test (20%)  | Train (80%) | Test (20%)  | Train (80%) | Test (20%)  | Train (80%) | Test (20%)
Accuracy (%)  | 92.35       | 90.00       | 96.82       | 93.33       | 96.46       | 95.00       | 97.68       | 95.00       | 91.05       | 88.33
Precision (%) | 95.41       | 93.02       | 95.76       | 93.33       | 97.11       | 93.48       | 100.00      | 97.62       | 94.52       | 93.00
Recall (%)    | 96.10       | 93.02       | 98.51       | 97.67       | 100.00      | 100.00      | 100.00      | 95.35       | 92.39       | 90.90
F1 score (%)  | 95.75       | 93.02       | 97.12       | 95.45       | 98.53       | 96.63       | 100.00      | 96.47       | 93.44       | 91.93
Table 13 and Figure 6 describe the HF results predicted using the five machine learning algorithms for each class (1 = died during follow-up and 0 = did not die during follow-up). The training data comprised 160 records for class 1 and 79 for class 0, while the test data comprised 43 records for class 1 and 17 for class 0. Random Forest achieved the best result during the training phase for both classes, with 100% for each criterion (precision, recall, and F1 score). During the testing phase, Random Forest also achieved the best precision for predicting HF, with 97% for class 1 and 98% for class 0. Meanwhile, Decision Tree achieved the best recall of 100% for both classes. Random Forest showed the best F1 score of 96% for predicting positive cases and 97% for predicting negative cases. First, in the analysis of the results for the died-during-follow-up class (class 1), during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached a precision of 93%, 91%, 98%, 100%, and 96%, respectively; in the testing phase, their precision was 94%, 94%, 94%, 97%, and 93%, respectively. For the recall, during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached 96%, 99%, 100%, 100%, and 91%, respectively; in the testing phase, their recall was 94%, 98%, 100%, 95%, and 92%, respectively. For the F1 score, during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached 97%, 98%, 99%, 100%, and 93%, respectively; in the testing phase, their F1 scores were 94%, 96%, 97%, 96%, and 93%, respectively.
Second, in the analysis of the results for the did-not-die-during-follow-up class (class 0), during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached a precision of 92%, 89%, 97%, 100%, and 94%, respectively; in the testing phase, their precision was 92%, 92%, 92%, 98%, and 93%, respectively. For the recall, during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached 95%, 97%, 100%, 100%, and 93%, respectively; in the testing phase, their recall was 91%, 97%, 100%, 96%, and 90%, respectively. For the F1 score, during the training phase, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached 95%, 96%, 98%, 100%, and 94%, respectively; in the testing phase, their F1 scores were 92%, 94%, 95%, 97%, and 91%, respectively.
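Per-class scores like those above come from treating each class in turn as the positive label. A small helper makes the calculation explicit; the label vectors below are made up solely to demonstrate it:

```python
def per_class_metrics(y_true, y_pred, cls):
    """Precision, recall, and F1 for one class, treated as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical predictions: 4 positives (one missed), 4 negatives (one false alarm).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(per_class_metrics(y_true, y_pred, 1))  # class 1: (0.75, 0.75, 0.75)
print(per_class_metrics(y_true, y_pred, 0))  # class 0: (0.75, 0.75, 0.75)
```

Running the helper once per class and per split reproduces the layout of Tables 11 and 13.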
Table 13

Per-class prediction results of heart failure by using five machine learning algorithms.

SN | Classifiers          | Division of data | Class | Precision (%) | Recall (%) | F1 score (%) | Number of patients
1  | SVM                  | Training (80%)   | 0     | 92            | 95         | 95           | 79
2  |                      |                  | 1     | 93            | 96         | 97           | 160
3  |                      | Testing (20%)    | 0     | 92            | 91         | 92           | 17
4  |                      |                  | 1     | 94            | 94         | 94           | 43
5  | KNN                  | Training (80%)   | 0     | 89            | 97         | 96           | 79
6  |                      |                  | 1     | 91            | 99         | 98           | 160
7  |                      | Testing (20%)    | 0     | 92            | 97         | 94           | 17
8  |                      |                  | 1     | 94            | 98         | 96           | 43
9  | Decision Tree        | Training (80%)   | 0     | 97            | 100        | 98           | 79
10 |                      |                  | 1     | 98            | 100        | 99           | 160
11 |                      | Testing (20%)    | 0     | 92            | 100        | 95           | 17
12 |                      |                  | 1     | 94            | 100        | 97           | 43
13 | Random Forest        | Training (80%)   | 0     | 100           | 100        | 100          | 79
14 |                      |                  | 1     | 100           | 100        | 100          | 160
15 |                      | Testing (20%)    | 0     | 98            | 96         | 97           | 17
16 |                      |                  | 1     | 97            | 95         | 96           | 43
17 | Logistic Regression  | Training (80%)   | 0     | 94            | 93         | 94           | 79
18 |                      |                  | 1     | 96            | 91         | 93           | 160
19 |                      | Testing (20%)    | 0     | 93            | 90         | 91           | 17
20 |                      |                  | 1     | 93            | 92         | 93           | 43
Figure 6

The performance of classification algorithms for each class.

5.5. Comparison of the Performance of Algorithms between the Two Datasets

Similar data processing methods, including preprocessing, feature processing and ranking by importance, and classification algorithms, were applied to the two datasets, Cleveland and HF. The analyses in the previous sections show that the diagnostic systems evaluated the HF dataset with an accuracy exceeding that of the Cleveland dataset during the test phase. Table 14 and Figure 7 present the analytical results comparing the performance of the machine learning algorithms on the two datasets. First, on the Cleveland dataset, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached training accuracies of 92.56%, 87.60%, 100%, 100%, and 87.60%, respectively; in the testing phase, their accuracies were 90.16%, 90.16%, 81.97%, 85.25%, and 88.52%, respectively. Second, on the HF dataset, SVM, KNN, Decision Tree, Random Forest, and Logistic Regression reached training accuracies of 92.35%, 96.82%, 96.46%, 97.68%, and 91.05%, respectively; in the testing phase, their accuracies were 90%, 93.33%, 95%, 95%, and 88.33%, respectively.
Table 14

Accuracy of diagnosing the two datasets by using five machine learning algorithms.

Dataset   | SVM                 | KNN                 | Decision Tree       | Random Forest       | Logistic Regression
          | Training | Testing  | Training | Testing  | Training | Testing  | Training | Testing  | Training | Testing
Cleveland | 92.56    | 90.16    | 87.60    | 90.16    | 100      | 81.97    | 100      | 85.25    | 87.60    | 88.52
HF        | 92.35    | 90.00    | 96.82    | 93.33    | 96.46    | 95.00    | 97.68    | 95.00    | 91.05    | 88.33
Figure 7

Comparison of system performance on the diagnostic accuracy in two datasets.

5.6. Comparison with Previous Studies

Table 15 and Figure 8 compare the proposed machine learning models with relevant previous studies on several evaluation criteria. As noted, previous studies reported only some of the criteria. Previous studies reached accuracies ranging between 77.55% and 93.85%, while the accuracy of the proposed system reached 100% during training and 95% during testing. Previous studies reached precisions ranging between 77.4% and 91.9%, while the proposed system reached 100% during training and 97.62% during testing. The recall (sensitivity) in previous studies ranged between 72% and 97%, while the proposed system reached 100% during training and 95.35% during testing.
Table 15

Comparison of the performance between the proposed system and previous studies.

Previous studies                           | Accuracy (%) | Precision (%) | Recall (%) | F1 score (%)
Arabasadi et al. [14]                      | 93.85        | -             | 97         | -
Maji and Arora [15]                        | 77.4         | 77.4          | -          | -
Reddy et al. [39]                          | 90           | 91            | -          | -
Amin et al. [40]                           | 78.15        | 78.15         | 80.25      | -
Feshki and Shijani [19]                    | 91.94        | 91.9          | 93         | -
Pouriyeh et al. [41]                       | 77.55        | 77.4          | 83         | 80.1
Chicco and Jurman [42]                     | 83.8         | -             | 72         | 71.9
Proposed model, first dataset (training)   | 100          | 100           | 100        | 100
Proposed model, second dataset (training)  | 97.68        | 100           | 100        | 100
Proposed model, first dataset (testing)    | 90.16        | 90.26         | 90.38      | 90.32
Proposed model, second dataset (testing)   | 95           | 97.62         | 95.35      | 96.47
Figure 8

The performance of our systems with the previous studies.

6. Conclusion and Future Work

The importance of electronic clinical records was verified in the process of predicting heart disease and heart failure. The SelectKBest function with the chi-square statistical method was applied to select the features with a strong correlation with the target feature, and the score between each feature and the target feature was determined. A feature engineering method was also applied to increase the number of correlated features and to train the machine learning models, obtaining reliable results that were better than those obtained from the original features of the two datasets. The machine learning algorithms used optimised hyperparameters and were fed with the new features. All algorithms reached superior results during the training and testing phases on the two datasets. During the testing phase, all algorithms achieved better results for the second dataset (HF) than for the first dataset (Cleveland). For the first dataset, Random Forest and Decision Tree reached the best results during the training phase, with 100% for all measures; during the testing phase, SVM and KNN achieved better results than the rest of the algorithms. For the second dataset, Random Forest obtained the best results during both phases. This study has some limitations. First, the two publicly available datasets used are relatively small. Second, the datasets do not contain natriuretic peptides (NPs), which are biomarkers of heart failure; NPs rise with age and decrease in obese patients. Third, the datasets did not include information about the patients' diet. Despite these limitations, the two datasets had sufficient features. Our aim was to rank the significance of the features on the basis of their score and correlation with heart failure. The future scope of this work is the application of the Internet of Things and the testing of new samples in real time.