
A novel approach for the prediction of treadmill test in cardiology using data mining algorithms implemented as a mobile application.

A Jerline Amutha¹, R Padmajavalli², D Prabhakar³.

Abstract

OBJECTIVE: To develop a mobile app called "TMT Predict" that predicts the result of the Treadmill Test (TMT) by applying data mining techniques to a clinical dataset with a minimal set of clinical attributes, and to test the app prospectively in real time against the TMT and correlate its output with coronary angiogram results.
METHODS: A data mining approach, rather than a conventional statistical one, was used to predict the TMT result by analyzing the clinical records of 1000 cardiac patients. Two algorithms were employed: the Decision Tree and K-Sorting and Searching (KSS), a new modified version of the K-Nearest Neighbor (KNN) algorithm. In addition, a mathematical curve fitting technique was used to improve the accuracy. The system used six clinical attributes: age, gender, body mass index (BMI), dyslipidemia, diabetes mellitus and systemic hypertension. An Android app called "TMT Predict" was developed in which the results of the three methods are combined, and the final result is the one returned by the majority of the three. The app was further tested prospectively in 300 patients to predict the results of TMT and to correlate them with coronary angiography.
RESULTS: The accuracy of the Decision Tree and K-Sorting & Searching (KSS) algorithms in predicting the TMT result was 73% and 78%, respectively. The mathematical curve fitting method predicted with 82% accuracy. The accuracy of the mobile app "TMT Predict" improved to 84%. Age-wise analysis of the results shows that the accuracy of the app dips when the age is more than 60 years, indicating that other factors, such as retirement stress, may have to be included. This gives scope for future research. In the prospective study, the positive and negative predictive values of the app were 40% and 83% for TMT and 52% and 80% for coronary angiogram. The negative predictive value of the app was high, indicating that it is a good screening tool to rule out coronary artery heart disease (CAHD).
CONCLUSION: "TMT Predict" is a simple, user-friendly Android app that uses six simple clinical attributes to predict the results of TMT. The app has a high negative predictive value, indicating that it is a useful tool to rule out CAHD. "TMT Predict" could be a future digital replacement for the manual TMT as an initial screening tool to rule out CAHD.

Keywords: Cardiology; Curve fitting; K-Nearest neighbour (KNN); K-Sorting & Searching (KSS); Pattern recognition; Treadmill test (TMT)

Year: 2018 PMID： 30170646 PMCID： PMC6117803 DOI： 10.1016/j.ihj.2018.01.011

Source DB: PubMed Journal: Indian Heart J ISSN： 0019-4832

Introduction

Across the world, cardiovascular diseases account for a considerable percentage of untimely deaths. A statistical study conducted by the WHO in 2012 revealed that around 17.5 million people had died of coronary artery heart disease (CAHD) worldwide. In India, nearly 45 million people suffer from cardiac diseases, out of which one in four Indians die of CAHD every year. Inadequate awareness and lack of seriousness often result in the loss of precious lives. Periodic screening and risk factor modification are mandatory to contain this epidemic of CAHD. Earlier studies found that non-modifiable risk factors such as age, gender and family history were the main causes of heart disease. Subsequent studies revealed that modifiable risk factors, namely lack of exercise, high fat intake, overweight, stress, smoking, alcohol, diabetes and systemic hypertension, were equally responsible for the development of heart disease. As a result, the combination of non-modifiable and modifiable risk factors is identified as an important set of parameters to be evaluated to predict CAHD. The exercise treadmill test (TMT) is an important screening test to detect provocable ischemia. However, the TMT is time consuming and requires physical stress, and it carries a small element of risk in cases of serious silent CAHD. To provide a simpler and safer screening procedure, data mining and the mathematical curve fitting procedure were applied to develop a mobile application called “TMT Predict” as a replacement for TMT. The app aims to predict the result of the TMT without the person actually undergoing one. The prediction methodology includes the data mining algorithms Decision Tree and K-Sorting & Searching (KSS), a new modification of K-Nearest Neighbor (KNN), together with the mathematical curve fitting method.

Methods

At the global level, the development of methods and models to improve the accuracy of diagnostic techniques is an ongoing process, and a lot of research is still underway. Automatic diagnosis of heart disease using data mining algorithms on a number of clinical attributes (more than 13) [7, 8], together with evaluation of its accuracy, has been carried out since 2001. Data mining strategies such as association, clustering, classification and prediction are widely used in virtually every field [10, 11]. Within the medical field, the classification and prediction of various diseases have frequently been analyzed. A technically flexible and strong combination of data mining and machine learning provides a better platform for the analysis of a large volume of data. Accordingly, an Android mobile application called “TMT Predict”, shown in Fig. 4, was developed to predict the results of TMT. The app applies data mining methodologies to a minimal set of six clinical attributes. Age, sex, height and weight are entered in the app, and BMI is calculated by the app. If the patient is known to have diabetes, hypertension or dyslipidemia, this can be entered as yes or no. Alternatively, measured values can be entered, in which case the app recognizes systemic hypertension (SHT) if BP is > 139/89 mmHg, diabetes mellitus if fasting blood sugar is > 125 mg/dl or postprandial blood sugar is > 199 mg/dl, and dyslipidemia if HDL is < 40 mg/dl in males or < 50 mg/dl in females, or if LDL is > 70 mg/dl in patients with diabetes, > 100 mg/dl in patients with SHT, or > 130 mg/dl in others.
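The input rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the app's code: the function names are hypothetical, height is assumed to be entered in centimetres, a blood pressure reading is assumed to count as hypertensive when either component exceeds its cutoff, and the diabetes LDL cutoff is assumed to take precedence when a patient has both diabetes and SHT.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 (height in cm is an assumption)."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def has_hypertension(systolic, diastolic):
    # Text: BP > 139/89 mmHg. Treating either component above its cutoff
    # as sufficient is an assumption of this sketch.
    return systolic > 139 or diastolic > 89


def has_diabetes(fasting=None, postprandial=None):
    # Text: fasting > 125 mg/dl or postprandial > 199 mg/dl.
    fasting_high = fasting is not None and fasting > 125
    postprandial_high = postprandial is not None and postprandial > 199
    return fasting_high or postprandial_high


def has_dyslipidemia(male, hdl, ldl, diabetic, hypertensive):
    # HDL cutoff depends on sex; LDL cutoff depends on comorbidity.
    hdl_cutoff = 40 if male else 50
    if diabetic:            # assumed to take precedence over SHT
        ldl_cutoff = 70
    elif hypertensive:
        ldl_cutoff = 100
    else:
        ldl_cutoff = 130
    return hdl < hdl_cutoff or ldl > ldl_cutoff
```

Each function returns a plain boolean (or a float for BMI), matching the yes/no form in which the app accepts known conditions.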

Fig. 4

Screen Shots of the Mobile app “TMT Predict” for Known and Unknown Cases.

Work flow

As illustrated in Fig. 1, the first step is data collection followed by data pre-processing, classification and clustering. Then, the data is ready for the application of algorithms to predict the results of TMT.

Fig. 1

Work flow of the Preparation of the Model.

Two data mining algorithms, Decision Tree and KSS, were applied to the processed data. In addition, a mathematical procedure called curve fitting was applied.

Data Description

The cardiology clinical dataset was acquired from a cardiology center in Chennai, where TMT was performed as a test for provocable ischemia in asymptomatic individuals without CAHD. The patient case details were recorded electronically and contain demographic details, risk factor assessment and all test results, including details of treatment and follow-up. The data is stored in the cloud as MS Excel spreadsheets.

Data Pre-Processing

The data taken for the current study was pre-processed to eliminate inconsistent, noisy and redundant data. From this processed data set, 1000 TMT records were separated for analysis. As discussed elsewhere, all the clinical attributes were initially considered for classification, but analysis of the impact of each attribute on TMT helped in identifying the most significant attributes. The six most significant attributes were age, sex, BMI, diabetes, dyslipidemia and systemic hypertension. Ultimately, the most significant attributes were finalized using the entropy for each attribute, as given in Eq. (1):

$$\text{Entropy} = -p \log_2 p - (1 - p) \log_2 (1 - p) \qquad (1)$$

where p represents the proportion of positive cases and (1 − p) the proportion of negative cases. Further, the Boolean attributes gender, diabetes, hypertension and dyslipidemia were converted into a numeric representation; for example, the gender attribute has a value of one for male and zero for female.
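As a small worked illustration of the entropy measure, with p the fraction of positive cases, the following Python sketch computes the binary entropy of a set of TMT labels. The function names and the 1/0 label encoding are assumptions made for this example; the source does not describe how the entropy values were compared across attributes.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-class split with positive fraction p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a pure group carries no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def entropy_of_labels(labels):
    """labels: iterable of 1 (TMT positive) and 0 (TMT negative)."""
    labels = list(labels)
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return binary_entropy(p)
```

A group split evenly between positive and negative results has the maximum entropy of 1 bit, while a group with only one outcome has entropy 0.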

Data Analysis

To develop the “TMT Predict” mobile app following the data mining approach, the TMT dataset was divided into a training set of 750 records and a test set of 250 records. The training set was used to formulate a prediction method that takes the dependency between the attributes into account, and the test set was used to test the formulated method. The preliminary analysis of the most significant attributes identified “gender” as one of the prime factors in the diagnosis of heart disease. This led to the classification of the entire training data set into male and female classes. Subsequent analysis revealed that, for each gender, the TMT result behaved differently in each age group (≤ 30, 31–40, 41–50, 51–60 and above 60). This led to the clustering of the data into groups by age category.

Implementation of Decision Tree Method

The Decision Tree method was applied to the classified and clustered dataset, and the resulting tree was converted to IF-THEN classification rules by tracing the path from the root node to each leaf node (Fig. 2). One of the rules extracted from the tree, R1, is given as an example.

Fig. 2

Decision Tree.

R1: IF Gender = Female & Age ≤ 30 & Dyslipidemia = no & Diabetes = no & BMI < 25 THEN TMT class = ‘Negative’

From Table 1, it is evident that applying the Decision Tree rules yields 73% accuracy, which can be improved. To increase the accuracy, the analysis was extended to devise an alternative algorithm and to find procedures for a more exact prediction of the TMT result.
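Rule R1 translates directly into a conditional. The sketch below shows only this one rule; the function name, the argument encoding and the convention of returning None when the rule does not fire are assumptions of the example, and the remaining rules of the tree are not given in the text.

```python
def rule_r1(gender, age, dyslipidemia, diabetes, bmi):
    """Return 'Negative' when rule R1 fires, otherwise None.

    dyslipidemia and diabetes are booleans (True = yes).
    """
    if (gender == "Female" and age <= 30
            and not dyslipidemia and not diabetes and bmi < 25):
        return "Negative"
    return None  # R1 does not apply; another rule would have to decide
```

A full rule set would chain such checks, one per root-to-leaf path of the tree.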

Table 1

Results of 250 test data for Decision Tree, Curve Fitting, KSS Pattern matching and the proposed TMT Predict along with its Accuracy Graph.

Pattern Recognition

To improve the accuracy, a pattern recognition strategy was proposed. Comparing a new patient’s data with the existing records helps in detecting the illness more accurately. Among the data mining pattern recognition procedures, the KNN method was chosen and applied in our research for the prediction of TMT. The limitations of KNN in accuracy and time complexity led to a revised procedure based on the ‘Divide and Conquer’ method. This algorithm is named ‘K-Sorting & Searching (KSS)’.

Implementation of K-Sorting & Searching (KSS) Algorithm:

When handling a large amount of data, searching for a particular pattern in a large database is a time-consuming process. For efficient recognition of the pattern, the KSS algorithm, which is based on the “Divide and Conquer” method, can be applied. The algorithm is as follows.

K-Sorting & Searching (KSS) algorithm
1. Label the most significant numeric attribute as SORT_FACT.
2. Arrange the records in the TRAINING data set in ascending order of SORT_FACT.
3. Divide the TRAINING data into different ranges based on SORT_FACT.
4. Assign a k value to each range such that k ≥ 0.
5. Store the first record SEQ_NO in FIRST_POINT[k] and the last record SEQ_NO in LAST_POINT[k] for each category of k value.
6. Get the TEST data.
7. Assign a k value to the TEST data based on the range defined by SORT_FACT.
8. Call DIS_CAL (TRAINING, TEST) for TRAINING = FIRST_POINT[k] to LAST_POINT[k].
9. The TRAINING record that has the minimum distance to the TEST record is considered the most similar search result, and the corresponding pattern is the required output.
10. End.

DIS_CAL (TRAINING, TEST)
Find the Hamming distance (DH) between the attributes of the TRAINING and TEST data. Let xi be an attribute of the TEST data and yi be the corresponding attribute of the TRAINING data, where i = 1, 2, 3, …; then the Hamming distance between xi and yi is as given in Eq. (2):

$$D_H(x_i, y_i) = \begin{cases} 0, & x_i = y_i \\ 1, & x_i \neq y_i \end{cases} \qquad (2)$$

The distance between two records is the sum of DH(xi, yi) over all attributes i.

When the KSS algorithm was applied to the TMT dataset, SORT_FACT was identified as age, which is the most significant attribute in the given dataset, and all the patients’ details were sorted by SORT_FACT (age). The ranges were defined based on SORT_FACT, and the k value was assigned to each range as follows: if age ≤ 30 then k = 0; if age = 31–40 then k = 1; if age = 41–50 then k = 2; if age = 51–60 then k = 3; if age > 60 then k = 4.

Consider TEST data with Age = 45. Unlike KNN, this method does not need to search for the pattern in the entire training data. Instead, k was assigned a value of 2, as in step 7 of the KSS algorithm. The Hamming distance (Eq. (2)) was calculated between the TEST data and the TRAINING data in the range FIRST_POINT[2] to LAST_POINT[2], as described in step 8 of the KSS algorithm. The TRAINING record with the minimum distance to the TEST data in the range k = 2 was taken as the search result, and the TMT value of that TRAINING record was the predicted TMT result of the TEST data. MATLAB was used to program and test this procedure before its use in the app. The results obtained indicate a time complexity of O(n log n). As listed in Table 1, the accuracy was 78% when KSS was applied to TMT prediction, which is an improvement over the Decision Tree.
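The KSS procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' MATLAB program: records are assumed to be (age, attributes, tmt_result) tuples, the attributes compared by Hamming distance are assumed to be already encoded as categorical values, ties in distance are resolved by taking the first match, and all function names are hypothetical.

```python
from bisect import bisect_left

AGE_UPPER_BOUNDS = [30, 40, 50, 60]  # k = 0..3; ages above 60 give k = 4


def age_to_k(age):
    """Map age to its range index k (steps 3, 4 and 7)."""
    return bisect_left(AGE_UPPER_BOUNDS, age)


def hamming(x, y):
    """Number of attribute positions where x and y differ (Eq. 2, summed)."""
    return sum(a != b for a, b in zip(x, y))


def build_index(training):
    """Sort by age and record the first/last position of each k (steps 1-5)."""
    records = sorted(training, key=lambda r: r[0])
    first_point, last_point = {}, {}
    for seq_no, (age, _attrs, _tmt) in enumerate(records):
        k = age_to_k(age)
        first_point.setdefault(k, seq_no)
        last_point[k] = seq_no
    return records, first_point, last_point


def kss_predict(index, age, attrs):
    """Search only the age range of the test record (steps 6-9)."""
    records, first_point, last_point = index
    k = age_to_k(age)
    if k not in first_point:
        return None  # no training record in this age range
    candidates = records[first_point[k]:last_point[k] + 1]
    best = min(candidates, key=lambda r: hamming(r[1], attrs))
    return best[2]
```

Sorting once up front costs O(n log n), after which each query scans only the slice of records sharing the test patient's age range instead of the whole training set.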

Implementation of Curve Fitting Method

Mathematical derivations such as polynomial methods were used to identify relationships through simple empirical models. This was applied to the existing system in order to find the mathematical relationship between the attributes for the prediction of the TMT result. Polynomial models for curves are given by Eq. (3):

$$y = \sum_{i=1}^{n+1} p_i x^{n+1-i} \qquad (3)$$

where n + 1 is the order of the polynomial and n is the degree of the polynomial, with 1 ≤ n ≤ 9.

To derive the polynomial function, the attribute combination of age and TMT was considered individually with respect to BMI, systemic hypertension, dyslipidemia and diabetes through the curve fitting methodology, executed with the MATLAB Curve Fitting tool. The polynomial derived for the attribute combination of age and TMT with respect to BMI is given in Eq. (4):

$$f(x, y) = p_{00} + p_{10} x + p_{01} y \qquad (4)$$

where f(x, y) represents TMT, x represents age, y represents BMI, and p00, p10 and p01 are coefficients.

Coefficients (with 95% confidence bounds):
p00 = −0.919 (−2.126, 0.2878)
p10 = 0.02791 (0.003078, 0.05274)
p01 = −0.006745 (−0.02655, 0.01306)

The goodness of fit for the polynomial was calculated in terms of the Sum of Squared Errors (SSE), R-square, adjusted R-square and Root Mean Squared Error (RMSE).

Goodness of fit:
Sum of Squared Errors (SSE): 14.27
R-square: 0.04722
Adjusted R-square: 0.0289
Root Mean Squared Error (RMSE): 0.3704

Polynomial functions were obtained similarly for the other attribute combinations. The derived polynomial functions were reduced to a single polynomial equation with all the attributes for the exact prediction of TMT. The resultant TMT values were analyzed as shown in Fig. 3. For instance, for patients in the age group ≤ 30, if the calculated TMT value is less than 1, the TMT result of the model is taken as Negative; otherwise it is taken as Positive. Similarly, a TMT range was fixed for the other age groups to classify Positive and Negative TMT. The result analysis in Table 1 shows that the accuracy of this procedure is 82%, an increase of 9 percentage points over the Decision Tree method.
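To make the fitted surface concrete, the following Python sketch evaluates the age and BMI polynomial using the point estimates of p00, p10 and p01 reported above. It covers only this one surface: the combined polynomial over all attributes is not reproduced in the text, and only the cutoff for the ≤ 30 age group (a value of 1) is stated, so the cutoff is left as a parameter. The function names are hypothetical.

```python
# Point estimates reported for the age/BMI surface
P00, P10, P01 = -0.919, 0.02791, -0.006745


def tmt_surface(age, bmi):
    """Evaluate f(x, y) = p00 + p10*x + p01*y with x = age, y = BMI."""
    return P00 + P10 * age + P01 * bmi


def classify(tmt_value, cutoff):
    """Below the age-group cutoff is Negative, otherwise Positive.

    The text gives cutoff = 1 for the <= 30 group, applied to the
    combined polynomial; other cutoffs are not stated.
    """
    return "Negative" if tmt_value < cutoff else "Positive"
```

For example, `tmt_surface(45, 25)` evaluates −0.919 + 0.02791·45 − 0.006745·25. The low R-square reported for this surface suggests that age and BMI alone explain little of the variation, which is consistent with the authors combining several fitted functions.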

Fig. 3

An illustration of the mathematical calculation of the TMT value and the Curve Fitting for TMT with Age and BMI.

TMT Predict

The three methods were combined in a simple, user-friendly Android app called “TMT Predict”. The app predicts the TMT result with each of the three methods independently and compares the results. If the results of the Decision Tree, Curve Fitting and Pattern Recognition methods are identical, “TMT Predict” delivers that result. Otherwise, the result on which two of the three methods agree is taken as the final TMT result of the app. For example, if KSS and Curve Fitting give a positive result but the Decision Tree gives a negative one, the result of “TMT Predict” is positive. The accuracy of “TMT Predict”, obtained by combining the three methods, was higher than that of any of the three methods applied individually, and the combination was implemented in the Android mobile application. The accuracy has been tested in a real-time environment, and the app will be uploaded to the Google Play Store for patients’ use.
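The combination rule is a majority vote over three binary results, which can be sketched as follows. The function name and the string labels are assumptions of this example.

```python
from collections import Counter


def tmt_predict(decision_tree, kss, curve_fitting):
    """Return the result on which at least two of the three methods agree."""
    votes = Counter([decision_tree, kss, curve_fitting])
    result, _count = votes.most_common(1)[0]
    return result
```

With three voters and two possible outcomes a tie cannot occur, so the rule always yields a result.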

Evaluation methods

Evaluation of a model is an integral part of developing a classifier or predictor [19, 20]. The app was tested in real time with 300 new patients, and the positive predictive value, negative predictive value and accuracy in predicting the result of the TMT were calculated from the True Negative (TN), True Positive (TP), False Negative (FN) and False Positive (FP) values. Accuracy = (TN + TP)/(TN + TP + FN + FP). Positive Predictive Value = TP/(TP + FP). Negative Predictive Value = TN/(TN + FN).
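These measures can be computed from the four counts as in the sketch below. Sensitivity and specificity, which are reported in the results, are included using their standard definitions, TP/(TP + FN) and TN/(TN + FP). The function name is hypothetical, and the usage line takes the app's total counts from Table 2 as input; values derived this way from the totals need not coincide exactly with the rounded percentages quoted in the text.

```python
def evaluate(tp, tn, fp, fn):
    """Return evaluation measures as fractions between 0 and 1."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tn + tp) / total,
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "sensitivity": tp / (tp + fn),  # standard definition
        "specificity": tn / (tn + fp),  # standard definition
    }


# Totals of the "TMT Predict" rows in Table 2
app_metrics = evaluate(tp=24, tn=179, fp=54, fn=43)
```

The function assumes that none of the denominators is zero, which holds for the totals used here.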

Results & discussions

The three methods, Decision Tree, KSS and Curve Fitting, along with the “TMT Predict” app, were assessed on the test set of 250 patients. The results are shown in Table 1. The accuracy of the Decision Tree, KSS and Curve Fitting was 73%, 78% and 82%, respectively. The “TMT Predict” app, which combines all three methods, improved the accuracy to 84%.

The results were also analyzed for different age categories, as shown in Table 1. For patients older than 60 years, the accuracy of the app decreased from 84% to 76%. This raises the question of other influencing factors, such as retirement stress, which may have to be considered in this age group, and opens up a new avenue for future research. When only the working-age category (between 30 and 60 years) was considered, the accuracy of the “TMT Predict” app improved from 84% to 88%. This behavior of the dataset paves the way for the inclusion of other compounding risk factors in different age groups, which is the future scope of this work. The app is flexible in that it can be modified with new attributes in subsequent versions.

As the final step, the app was tested prospectively on 300 new patients. The patients were analyzed in six groups of 50, and the results are tabulated in Table 2. The sensitivity and specificity of the “TMT Predict” app were 35% and 77%, respectively. The coronary angiographic results for the same set of patients were also analyzed against the TMT Predict output, which showed a specificity of 80%, whereas the sensitivity remained the same at 35%. This indicates that the TMT Predict app is more specific than the TMT itself in predicting the results of coronary angiography. For the real-time analysis, the positive and negative predictive values were also calculated. The positive and negative predictive values were 40% and 83% for the TMT and 52% and 80% for the coronary angiogram. The negative predictive value of the app was high, indicating that it is a good screening tool to rule out CAHD.

Table 2

The real time result for 300 patients through “TMT Predict” app.

Real time Analysis	Evaluative Metrics	I	II	III	IV	V	VI	Total
Mobile App- “TMT Predict”	TP	2	5	6	2	4	5	24
	TN	33	29	31	25	29	32	179
	FP	10	9	6	10	10	9	54
	FN	5	7	7	13	7	4	43
	Se%	29	42	60	15	36	55	35
	Sp%	77	76	84	71	74	78	77
	Acc%	70	68	74	54	66	74	67

Angio	TP	2	4	7	2	3	6	24
	TN	33	31	33	26	32	33	188
	FP	5	5	5	12	4	3	34
	FN	10	10	5	10	11	8	54
	Se%	20	29	58	17	21	43	35
	Sp%	87	86	94	68	89	92	80
	Acc%	70	70	80	56	70	78	71

There are no similar datasets in the literature that use multiple data mining techniques to predict the results of TMT.

Conclusion

The “TMT Predict” app is a novel method to predict the result of TMT. It is a simple, user-friendly app requiring only six parameters (age, sex, BMI, diabetes, dyslipidemia and systemic hypertension). Using data mining techniques (Decision Tree and the KSS method based on the KNN algorithm) together with the Curve Fitting method, the app gave a very high negative predictive value, although the sensitivity was not satisfactory. This makes the app a useful tool to “rule out” CAHD. The data analysis also produced interesting findings, such as reduced accuracy when the age is more than 60 years, indicating that there are other factors that will have to be analyzed in the future. The methodologies used in this app are simple and flexible, giving scope for future refinement. The proposed “TMT Predict” app can be a digital replacement for the manual TMT in the future.

7 in total


1. Knowledge discovery with classification rules in a cardiovascular dataset.

2. Automated diagnosis of coronary artery disease based on data mining and fuzzy modeling.

3. Knowledge discovery in cardiology: A systematic literature review. (Review)

4. A data mining approach for diagnosis of coronary artery disease.

5. Data mining for rapid prediction of facility fit and debottlenecking of biomanufacturing facilities.

6. Understanding and using sensitivity, specificity and predictive values.

7. Part 1: Simple Definition and Calculation of Accuracy, Sensitivity and Specificity.