Ahmad Shaker Abdalrada1, Jemal Abawajy2, Tahsien Al-Quraishi1, Sheikh Mohammed Shariful Islam3. 1. Faculty of Computer Science and Information Technology, Wasit University, Al Kut, Iraq. 2. School of Information Technology, Deakin University, Melbourne, VIC, Australia. 3. Institute for Physical Activity and Nutrition, Deakin University, 221 Burwood Highway, Burwood, Melbourne, VIC 3125, Australia.
Abstract
Background: Cardiac autonomic neuropathy (CAN) is a diabetes-related complication with increasing prevalence that remains challenging to detect in clinical settings. Machine learning (ML) approaches have the potential to predict CAN using clinical data. In this study, we aimed to develop and evaluate the performance of an ML model to predict early CAN occurrence in patients with diabetes. Methods: We used the Diabetes Complications Screening Research Initiative data set containing 200 CAN-related tests on more than 2000 participants with type 2 diabetes in Australia. Data were collected on peripheral nerve functions, Ewing's tests, blood biochemistry, demographics, and medical history. The ML model was validated using 10-fold cross-validation, in which 90% of the data were used to train the model and the remaining 10% were used to evaluate its performance. Predictive accuracy was assessed by the area under the receiver operating characteristic (ROC) curve, sensitivity, specificity, positive predictive value, and negative predictive value. Results: Of the 237 patients included, 105 were diagnosed with an early stage of CAN, while the remaining 132 were healthy. The ML model showed outstanding performance for CAN prediction, with an area under the ROC curve of 0.962 [95% confidence interval (CI) = 0.939-0.984], 87.34% accuracy, and 87.12% sensitivity. There was a significant and positive association between the ML model and CAN occurrence (p < 0.001). Conclusion: Our ML model has the potential to detect CAN at an early stage using Ewing's tests. This model might be useful for healthcare providers for predicting the occurrence of CAN in patients with diabetes, monitoring its progression, and providing timely intervention.
Diabetes is a major contributor to the global burden of disease; its prevalence is
growing, and it leads to increased morbidity and mortality.[1,2] The International Diabetes
Federation estimated that by 2040, one in every 10 people in the world would have diabetes.
Patients with diabetes often present with several co-morbidities, including
cardiovascular diseases,[4,5]
stroke, kidney diseases, and depression,[6,7] which affect their quality of life
and lead to increased morbidity. Cardiac autonomic neuropathy (CAN) is one of
the most severe complications of diabetes, resulting from a complex interaction of
blood glucose control, duration of disease, age, and blood pressure (BP).[9-11] CAN affects nerve fibers in
the blood vessels and the heart muscle in diabetic patients, thereby causing cardiac
arrhythmias, exercise intolerance, myocardial injury, silent myocardial ischemia,
stroke, and sudden cardiac death,[12,13] and substantially impacts the
quality of life of patients.[10,14] Furthermore, CAN
independently predicts the progression of diabetic nephropathy and chronic kidney
disease in diabetes.
Longitudinal studies in patients with CAN have shown 5-year mortality rates
of 16–50%, with a high proportion attributed to sudden cardiac death.

Globally, there is an increase in the occurrence of CAN, particularly in patients
with type 2 diabetes.[17,18] Previous studies estimated that 20–73% of type 2 diabetes
patients and about 17% of type 1 diabetes patients develop CAN.[19,20] Subclinical
CAN, manifested as changes in heart rate variability (HRV), may be detected within
1–2 years of diagnosis of diabetes.
The clinical symptoms of CAN, however, often do not appear until long after
diabetes onset, thus remaining undiagnosed until the disease progresses to an
advanced stage.
The early detection and appropriate management of CAN are essential to
prevent future complications and have been widely recommended.[15,22]

Despite the importance of early detection, there is currently no specific method for
predicting CAN and its progression.
CAN is commonly diagnosed in clinics using Ewing’s test, which
includes the assessment of HRV, orthostatic hypotension, 24-h BP profiles, and
other variables.
There is, however, still debate about the diagnostic criteria and staging of
CAN using Ewing’s test.
A majority of the previous research conducted in this regard has focused on
using Ewing’s battery test alongside a wide range of risk factors to diagnose
CAN.[24-32] Although these methods have
performed well, the progression of CAN remains unclear. This has a significant
impact on the ability of healthcare providers to increase the awareness of patients
and to intervene promptly. In this study, we aimed at addressing this gap by
developing and validating a machine learning (ML) model to detect and predict the
progression of CAN in diabetic patients.
Methods
Design
We conducted a secondary analysis of data from a retrospective cohort study. We developed
and evaluated the performance of an ML model for detecting CAN and predicting its
progression using the following steps: develop an ML model for CAN detection and
prediction of progression; investigate the significance of Ewing’s tests in the prediction
of normal and early categories of CAN; and evaluate the proposed ML predictive
model.
Participants, location, and data collection
We used the Diabetes Complications Screening Research Initiative (DiScRi) data
set,[26,33,34] which was collected at Charles Sturt University in Australia.
The DiScRi data set contains more than 200 variables measured in more
than 2000 participants with type 2 diabetes in rural New South Wales, Australia,
between 2011 and 2014. Patients were recruited through a public media campaign,
including newspaper advertisements, radio and local television, and
advertisements in general practice and community health centers. Potential
participants were requested to contact the university if they wished to undergo
a health check, and an appointment was made to attend the clinic. All
participants older than 40 years were eligible to participate.
Participants with existing cardiovascular, respiratory, and renal disease
as well as depression, schizophrenia, and Parkinson’s disease were excluded. The
data collection procedure involved the following steps: All participants were
required to refrain from smoking and from consuming drinks such as alcohol and coffee
for 24 h before being tested. They were required to fast from midnight before
the testing day. The tests were conducted from 9:00 a.m. to 12:00 p.m.
Variables and measurements
The data set contains a record of participants’ details, including demographic
data such as age and sex, and history of diabetes, heart attack, palpitations,
and atrial fibrillation. The data set also contained measurements of BP, body
mass index (BMI), blood glucose level (BGL), cholesterol profile, and
electrocardiography (ECG), and tests on peripheral nerve function, Ewing’s
battery tests, HRV, and attributes of different blood biochemistry.

In the current study, we used Ewing’s tests, which include the five standard
tests for CAN proposed by Ewing and Clarke.
The five tests are (1) lying to standing heart rate (LSHR) change,
expressed by the 30:15 ratio, which is the ratio of the longest R-R interval
(found between beats 20 and 40) to the shortest R-R interval (found between beats 5 and 25)
produced by a change in position (from horizontal to vertical);
(2) deep breathing heart rate (DBHR) change, which refers to the
evaluation of beat-to-beat heart rate variation (R-R variation) during deep
breathing; (3) Valsalva maneuver heart rate (VAHR) change, which measures the response
of heart rate during and after increasing the intra-abdominal and intrathoracic
pressure; (4) handgrip blood pressure (HGBP) change, which measures the change in
diastolic BP after using a handgrip dynamometer; and (5) lying to standing BP
(LSBP) change, which measures the difference in the baroreflex-mediated BP after a
change in position. The results of Ewing’s test are shown in Supplementary Table S2.

Based on Ewing’s tests, CAN is categorized into five main classes: (1)
normal (all tests normal or one borderline); (2) early CAN (one of the three
heart rate tests abnormal or two borderline); (3) definite CAN (two or more of
the heart rate tests abnormal); (4) severe CAN (two or more of the heart rate
tests abnormal plus one or both of the BP tests abnormal or both borderline);
and (5) atypical CAN (any other combination of tests with abnormal results).
Because there were only a few patients with severe, definite, and
atypical CAN in this study, we excluded them from the analysis and included only
237 patients with normal and early CAN.
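The five-class rule described above can be written as a small decision function. The following is a minimal Python sketch, not the authors' implementation: it assumes each test has already been graded as normal, borderline, or abnormal (the grading thresholds are not given here), and the function name `classify_can`, the grade labels, and the handling of combinations the text leaves open (for example, one abnormal heart rate test together with a borderline result) are illustrative assumptions.

```python
HR_TESTS = ("LSHR", "DBHR", "VAHR")   # heart rate tests
BP_TESTS = ("HGBP", "LSBP")           # blood pressure tests
GRADES = ("normal", "borderline", "abnormal")


def classify_can(results):
    """Map graded Ewing test results to a CAN category.

    `results` maps each of the five test names to one of GRADES.
    The order of the checks below is an assumption: severe is tested
    before definite because it is the stricter condition.
    """
    grades = [results[name] for name in HR_TESTS + BP_TESTS]
    if any(grade not in GRADES for grade in grades):
        raise ValueError("each test must be graded normal, borderline, or abnormal")

    hr_abnormal = sum(results[name] == "abnormal" for name in HR_TESTS)
    bp_abnormal = sum(results[name] == "abnormal" for name in BP_TESTS)
    bp_borderline = sum(results[name] == "borderline" for name in BP_TESTS)
    abnormal = hr_abnormal + bp_abnormal
    borderline = grades.count("borderline")

    # (1) Normal: all tests normal or one borderline.
    if abnormal == 0 and borderline <= 1:
        return "normal"
    # (4) Severe: two or more HR tests abnormal plus BP involvement.
    if hr_abnormal >= 2 and (bp_abnormal >= 1 or bp_borderline == 2):
        return "severe"
    # (3) Definite: two or more HR tests abnormal.
    if hr_abnormal >= 2:
        return "definite"
    # (2) Early: one HR test abnormal, or two borderline results.
    if (hr_abnormal == 1 and bp_abnormal == 0) or (abnormal == 0 and borderline == 2):
        return "early"
    # (5) Atypical: any other combination.
    return "atypical"
```

For example, a patient whose only non-normal result is an abnormal LSHR would be classified as early CAN under these assumptions.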
Ethics
Written informed consent was obtained from all participants before data
collection. The protocol for the DiScRi study was approved by the Ethics in
Human Research Committee of the Charles Sturt University (ID # 03/164).
The framework of the ML predictive model
Data analysis: Figure 1 shows the high-level
framework and the steps followed to construct the proposed model. We used
logistic regression, one of the most common ML algorithms, to develop
the model because it is compatible with a wide range of tools and platforms such as
the R language. The model uses the following formula

$$P = \frac{1}{1 + e^{-z}} \tag{1}$$

where $P$ denotes the probability of an outcome occurrence and
$z$ represents the linear combination function of the
independent variables. The expression $z$ can
be given as follows

$$z = \beta_0 + \beta_1 \nu_1 + \beta_2 \nu_2 + \cdots + \beta_n \nu_n \tag{2}$$

where $\beta_0$ is the intercept, which determines the expected value of
$P$ when all $\nu_i = 0$; $n$ denotes the number of independent
variables; $\beta_i$ is the regression coefficient of each independent variable,
which is the influence of that variable on the likelihood of
the outcome ($P$); and $\nu_i$ represents the independent
variables that are included. Applying the linear
combination function given in equation (2) to
Ewing’s tests as independent variables to determine an outcome (CAN
progress), the predictive model computes the
probability of the occurrence of CAN progress as expressed below

$$P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 T_1 + \beta_2 T_2 + \beta_3 T_3 + \beta_4 T_4 + \beta_5 T_5)}} \tag{3}$$

where $P$ denotes the probability of the occurrence of an
outcome (CAN progress) based on the selected independent variables;
$\beta_1, \ldots, \beta_5$ indicate the regression coefficients of the Ewing’s tests that
have been included; and $T_1, T_2, T_3, T_4$, and $T_5$
refer to Ewing’s tests LSHR, DBHR, VAHR, HGBP, and LSBP,
respectively. Ewing’s battery tests are described in the Variables and
measurements section.

Figure 1. A graphical description of the proposed model.

Subsequently, the coefficients of Ewing’s tests were calculated using logistic
regression, with CAN as the dependent variable and Ewing’s tests as the
independent variables. The results are presented in Table 1. All five tests were
significantly associated with CAN (p < 0.05)
and were therefore included in the predictive model.
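The logistic function and the linear combination described above can be illustrated with a short Python sketch. The function names `linear_combination` and `logistic` are hypothetical and chosen only for this example; the sketch shows the general form of the model, not the authors' R or SPSS code.

```python
import math


def linear_combination(intercept, coefficients, values):
    """Return z = b0 + b1*v1 + ... + bn*vn for paired coefficients and values."""
    if len(coefficients) != len(values):
        raise ValueError("coefficients and values must have the same length")
    return intercept + sum(b * v for b, v in zip(coefficients, values))


def logistic(z):
    """Return P = 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)
```

A value of z = 0 gives P = 0.5; positive z pushes the probability toward 1 and negative z toward 0, which is why the sign of each coefficient determines the direction of a test's effect.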
Table 1. The results of logistic regression analysis.

| Test | Coefficient | p value | Odds ratio | Confidence interval (%) |
|---|---|---|---|---|
| Intercept | 28.88 | 0.000 | 3.48 | |
| LSHR | −4.68 | 0.013 | 0.009 | 0.000–0.373 |
| DBHR | −0.488 | 0.000 | 0.614 | 0.53–0.711 |
| VAHR | −11.57 | 0.000 | 0.009 | 0.00–0.001 |
| HGBP | −0.197 | 0.000 | 0.821 | 0.754–0.894 |
| LSBP | 0.11 | 0.003 | 1.116 | 1.038–1.199 |

DBHR, deep breathing heart rate; HGBP, handgrip blood pressure;
LSBP, lying to standing blood pressure; LSHR, lying to standing
heart rate; VAHR, Valsalva maneuver heart rate.
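In logistic regression, the odds ratio of a predictor is the exponential of its coefficient. The short Python sketch below applies that standard relation to the rounded coefficients printed in Table 1; the dictionary and function names are illustrative assumptions, and small differences from the table are expected because the printed coefficients are rounded.

```python
import math

# Rounded coefficients as printed in Table 1.
COEFFICIENTS = {
    "LSHR": -4.68,
    "DBHR": -0.488,
    "VAHR": -11.57,
    "HGBP": -0.197,
    "LSBP": 0.11,
}


def odds_ratio(beta):
    """Odds ratio for a one-unit increase in a predictor: exp(beta)."""
    return math.exp(beta)


for name, beta in COEFFICIENTS.items():
    print(f"{name}: exp({beta}) = {odds_ratio(beta):.3f}")
```

With these inputs, the computed odds ratios for LSHR, DBHR, HGBP, and LSBP agree with the table to three decimals. For VAHR, the same relation gives a value well below the printed 0.009, so that entry is best read together with its confidence interval.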
As presented in Table 1, the coefficients of the $T_1$ and $T_2$ tests were
$\beta_1 = -4.68$ (p = 0.013) and $\beta_2 = -0.488$ (p = 0.000), respectively. The
coefficients of the $T_3$ and $T_4$ tests were
$\beta_3 = -11.57$ (p = 0.000) and $\beta_4 = -0.197$ (p = 0.000), respectively. As for
$T_5$, the coefficient was $\beta_5 = 0.11$ (p = 0.003).

Negative β coefficient values were obtained for $T_1$ (LSHR), $T_2$
(DBHR), $T_3$ (VAHR), and $T_4$ (HGBP), indicating that higher values of
these tests are associated with a lower probability of the disease. In
contrast, a positive β coefficient was obtained for $T_5$ (LSBP), indicating
that higher values of this test are associated with a higher probability of
the disease. This is consistent with Ewing’s rules for
diagnosing the condition. Based on equation (3) and the
outcomes in Table 1, the predictive model was built as follows

$$P = \frac{1}{1 + e^{-(28.88 - 4.68\,T_1 - 0.488\,T_2 - 11.57\,T_3 - 0.197\,T_4 + 0.11\,T_5)}} \tag{4}$$

Once the predictive model of CAN has been constructed, it can be used to
calculate the probability of the occurrence of CAN whenever new tests
are obtained from the clinic. The following cases
illustrate the application of the predictive model. Assume the Ewing’s tests for patient A
are LSHR = 1.06, DBHR = 16, VAHR = 1.1, HGBP = 15, and LSBP = 10; for patient B,
LSHR = 1.4, DBHR = 10, VAHR = 1.23, HGBP = 15, and LSBP = 8; and for patient
C, LSHR = 1.1, DBHR = 14, VAHR = 1.1, HGBP = 14, and LSBP = 8. Applying
equation (4) to the tests
of the three patients, the probability that diabetic patient A will develop
CAN can be computed as follows

$$P_A = \frac{1}{1 + e^{-(28.88 - 4.68(1.06) - 0.488(16) - 11.57(1.1) - 0.197(15) + 0.11(10))}}$$

For patient B, the probability of developing CAN can be calculated
as follows

$$P_B = \frac{1}{1 + e^{-(28.88 - 4.68(1.4) - 0.488(10) - 11.57(1.23) - 0.197(15) + 0.11(8))}}$$

The probability of patient C developing CAN can be calculated as
follows

$$P_C = \frac{1}{1 + e^{-(28.88 - 4.68(1.1) - 0.488(14) - 11.57(1.1) - 0.197(14) + 0.11(8))}}$$
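The three worked cases can be reproduced with a few lines of Python. This is a minimal sketch that plugs the rounded intercept and coefficients from Table 1 into the logistic model; the names `INTERCEPT`, `BETAS`, `PATIENTS`, and `can_probability` are illustrative assumptions. Because the fitted model presumably used unrounded coefficients, the probabilities computed here need not match the values reported by the authors.

```python
import math

# Rounded values as printed in Table 1 (assumption: the fitted model
# used more decimal places than are shown in the table).
INTERCEPT = 28.88
BETAS = {"LSHR": -4.68, "DBHR": -0.488, "VAHR": -11.57, "HGBP": -0.197, "LSBP": 0.11}

# Ewing test values for the three example patients given in the text.
PATIENTS = {
    "A": {"LSHR": 1.06, "DBHR": 16, "VAHR": 1.1, "HGBP": 15, "LSBP": 10},
    "B": {"LSHR": 1.4, "DBHR": 10, "VAHR": 1.23, "HGBP": 15, "LSBP": 8},
    "C": {"LSHR": 1.1, "DBHR": 14, "VAHR": 1.1, "HGBP": 14, "LSBP": 8},
}


def can_probability(tests):
    """Probability of CAN from the five Ewing test values via the logistic model."""
    z = INTERCEPT + sum(BETAS[name] * tests[name] for name in BETAS)
    return 1.0 / (1.0 + math.exp(-z))


for label, tests in PATIENTS.items():
    print(f"Patient {label}: P = {can_probability(tests):.3f}")
```

With the rounded coefficients, the linear terms are z = 1.5292, 1.1419, and 2.295 for patients A, B, and C, giving probabilities of roughly 0.82, 0.76, and 0.91, respectively.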
Evaluation of the predictive model
The efficiency of the proposed predictive model was evaluated
using sensitivity, specificity, accuracy, type I and type II errors,
and the confusion matrix. The model was validated using 10-fold
cross-validation to obtain an unbiased estimate of the generalization error.
The whole data set was randomly divided into 10 subsets; in each round, nine subsets (90%)
were used to train the model and the remaining subset (10%) was used to test
its performance. The training and testing procedures were
repeated 10 times, so that each subset served once as the test set. All experiments
were performed using SPSS version 20.0 (SPSS, Inc., Chicago, IL, USA) and the R
programming language on a personal computer with an Intel Core i5 3.4 GHz CPU and
16 GB RAM running the Windows 10 operating system.
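The 10-fold procedure described above can be sketched as an index-splitting routine. The following Python example uses only the standard library; the function name `k_fold_indices` and the fixed random seed are assumptions made for illustration, and the sketch does not reproduce the authors' SPSS or R workflow.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # random division of the data set
    # Striding gives k folds whose sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_indices = folds[i]
        train_indices = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_indices, test_indices


# With 237 patients, each round trains on about 90% and tests on about 10%.
splits = list(k_fold_indices(237, k=10))
for round_number, (train, test) in enumerate(splits, start=1):
    print(f"round {round_number}: train={len(train)}, test={len(test)}")
```

Each patient appears in exactly one test fold, so every observation is used for testing once and for training nine times.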
Performance metrics
The performance of the model was evaluated using the area under the
receiver operating characteristic (ROC) curve (AUC), together with
the following assessment measures:
Sensitivity indicates the proportion of patients with CAN who are correctly predicted.
Specificity indicates the proportion of controls who are correctly predicted.
Accuracy indicates the proportion of all patients and controls who are correctly predicted.
Type I error (α) is the probability that patients will be diagnosed into the control group.
Type II error is the probability that controls will be diagnosed and categorized into the patients’ group.
Results
A total of 237 patients with type 2 diabetes were included in this study (55% females,
age range = 32–90 years), of whom 105 were diagnosed with an early stage of CAN,
while the remaining 132 participants were found to be free of CAN. Of
the 132 participants without CAN, 119 were correctly
predicted as normal by the predictive model, and of the 105 patients, 88 were correctly
predicted to be at the early stage of the disease. The confusion matrix (Table 2) shows
that the predictive model achieved a predictive accuracy of 87.34%. This performance is
consistent with the fact that all the included tests are significantly associated with
the tested categories of CAN (p < 0.05).
Table 2. The confusion matrix.

| Actual class | Predicted normal | Predicted early |
|---|---|---|
| Normal | 119 | 13 |
| Early | 17 | 88 |
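The counts in Table 2 can be converted into the usual performance measures. The Python sketch below is illustrative only: it treats early CAN as the positive class and the rows of Table 2 as the actual classes (an assumption based on the counts given in the Results text), and the function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard rates from a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # row-wise rate for actual positives
        "specificity": tn / (tn + fp),   # row-wise rate for actual negatives
        "ppv": tp / (tp + fp),           # column-wise rate for predicted positives
        "npv": tn / (tn + fn),           # column-wise rate for predicted negatives
    }


# Table 2: early CAN is positive; rows are assumed to be actual classes.
metrics = confusion_metrics(tp=88, fn=17, fp=13, tn=119)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under this reading, accuracy is 87.34%. The column-wise rates (positive and negative predictive values) come to about 87.1% and 87.5%, while the row-wise rates come to about 83.8% and 90.2%, so the orientation of the matrix matters when comparing these figures with percentages quoted elsewhere.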
The predictive model achieved a sensitivity of 87.12%, that is, the
proportion of people with the disease who were correctly identified,
and a specificity of 87.5%. The type I and type II errors were 12.88% and 12.5%,
respectively. This indicates that the model can efficiently predict
the probability of the occurrence of CAN. The same modeling approach could be applied
to a wide range of health conditions.

The ROC curve of the predictive model is presented in Figure 2, and the AUC of the
predictive model is listed in Table 3. The predictive model achieved an AUC of 0.962
with a 95% confidence interval of 0.939–0.984. The AUC differed significantly
from 0.5 (asymptotic significance p < 0.05),
indicating that the proposed model predicts significantly better
than chance.
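The AUC can be read as the probability that a randomly chosen patient with early CAN receives a higher predicted probability than a randomly chosen normal participant. The Python sketch below computes it by direct pairwise comparison; the function name `auc_from_scores` is an assumption, and the scores and labels are hypothetical values for illustration, not data from the study.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities (1 = early CAN, 0 = normal).
scores = [0.91, 0.80, 0.65, 0.40, 0.35, 0.10]
labels = [1, 1, 0, 1, 0, 0]
example_auc = auc_from_scores(scores, labels)
print(f"AUC = {example_auc:.3f}")
```

An AUC of 0.5 corresponds to ranking no better than chance, which is why the significance test compares the observed area against 0.5.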
Figure 2. ROC curve of the predictive model.

Table 3. The area under the curve of the predictive model.
Test result variable(s): predicted probability

| Area | Std. error | Asymptotic sig.a | Asymptotic 95% confidence interval, lower bound | Asymptotic 95% confidence interval, upper bound |
|---|---|---|---|---|
| 0.962 | 0.012 | 0.000 | 0.939 | 0.984 |

a P-value (asymptotic significance) is < 0.05.
Discussion
In this article, we presented an ML model to predict the probability of CAN
occurrence in patients with diabetes. The results suggest that an ML method can
predict the risk of developing CAN in patients with diabetes in the primary care
setting. The model provided high accuracy (87.34%), sensitivity (87.12%), and
specificity (87.5%), as well as a high AUC (0.962). The cardiovascular
autonomic function abnormalities in CAN may develop before diabetes is manifested
and remain subclinical. Our results provide evidence of the suitability of the
ML model for detecting CAN early with minimum resources. The predictive model
also provided further evidence that Ewing’s tests can significantly predict the
normal and early categories of this condition. Thus, our model can be used by
healthcare providers to detect and follow the progression of CAN over time and take
appropriate measures.

Our findings support and extend prior studies of CAN detection using Ewing’s tests
and ML models. In a previous study by Jelinek et al.,
the neurological diagnostics of CAN was examined on the basis of five Ewing’s
test attributes using decision tree classification like REPtree, J48, SimpleCart,
and NBTree. In a study by Kelarev et al.,
Ewing’s test features were combined with extra features from the clinical
data to detect CAN using feature selection based on random forest and multilevel
ensemble classifiers. Research conducted by Abawajy et al.
focused on enhancing the accuracy of the classification of CAN using blood
biochemistry features alongside the results of Ewing’s tests. They
improved the classification accuracy through the use of the automated
iterative multitier ensemble (AIME), which makes use of a variety of ensemble
classifiers in each layer. The AIME achieved a high level of accuracy (99.57%) and
could be used where Ewing’s tests are not available. In the current study, however,
we used logistic regression, which we consider more robust, to detect CAN and its
progression rather than only detecting CAN, as previous researchers did.

Previous studies have attempted to predict CAN using different approaches. A hybrid
of wrapper filter feature selection was proposed by Huda et al.
using ECG features and Ewing’s tests. Based on Ripple down Rules, ensemble
classifiers were proposed by Kelarev et al.
to predict CAN using Ewing’s tests. Kelarev et al.
proposed ensemble classifiers based on decision tree classifiers for the
monitoring of diabetic patients through using ECG features and Ewing’s tests. A
different study conducted by Abawajy et al.
proposed a multitier ensemble classifier for the prediction of CAN, based on
the QRS features of ECG and Ewing’s tests. Jelinek et al.
discussed an iterative multilayer attributes selection and classification
model, which used Ewing’s tests and HRV for the prediction of CAN. A meta-ensemble
model was proposed in another study
to investigate the prediction of CAN using HRV. Although these
models[24-32] demonstrated superior
performance, they did not address the problem of predicting the probability of CAN
occurrence in patients with diabetes. In addition, the models were more general,
in that they were proposed to predict the categories of the disease (e.g. normal,
early, severe, definite, atypical). In contrast, our ML model can measure the
relationship between the CAN categories and Ewing’s tests by estimating
probabilities using a logistic function (the cumulative logistic
distribution), which is a simple and novel method.

The main strength of the study is the applicability of the results to the primary
care diabetes population in rural settings. This study has limitations that should
be considered. First, the small sample size might not be representative of all
diabetes patients. Therefore, the results should be interpreted with caution.
Second, individuals in our study were diagnosed using only the Ewing’s test with
normal and early categories. Third, the probabilities predicted by our model were
not calibrated to true occurrence probabilities. Furthermore, we did not correlate
our model with clinical findings and biochemical data. In the future, we aim to compare
the model with various clinical and biochemical tests in more extensive studies with
a more varied population.

CAN is a serious medical complication of diabetes and an independent predictor of
cardiovascular morbidity and mortality. The importance of early diagnosis of CAN is
widely recognized. Diagnosis of CAN, however, is problematic due to the medical
expertise required for evaluating the data from Ewing’s test even when resources for
the test are available. More broadly, our proposed model points toward the
application of innovative methods for the early detection of CAN. The unique feature
of our model is that it is simple to employ and does not require complex
code. Thus, our model can be applied as a mobile application or online tool,
for example, in combination with the National Heart Foundation of Australia
heart age calculator, which is commonly used by clinicians in general practice
(https://www.heartfoundation.org.au/heart-age-calculator). These
results might be useful for healthcare providers to predict the early occurrence of
CAN in people with diabetes and support them in combination with digital
technologies[39-42] such as wearable devices,
smartphone apps,
and text messages,[45,46] which have been shown to be cost-effective.[47,48]

Furthermore, our study suggests several directions for future work. The ML models can
potentially be applied in primary healthcare settings by general
practitioners and specialist nurses to detect CAN early where there is a lack of trained healthcare
specialists. Clinicians may use these models to monitor the progress of CAN using
cheap and straightforward digital health tools such as web-based and mobile phone
applications in their clinics. This may lead to early detection and prevention of
CAN-related complications and morbidity. Furthermore, the methods used in this study
offer a principled way to help inform individualized care using routine data from
clinics. Before implementation in clinical practice, however, further research using
large local data sets and randomized trials evaluating the effectiveness and
cost-effectiveness is recommended.
Conclusion
The ML model developed in this study was shown to predict early-stage CAN, which might
be useful for both healthcare providers and patients for early intervention, thereby
leading to the prevention of CAN-related complications.

Supplemental material, sj-docx-1-tae-10.1177_20420188221086693, for Prediction of
cardiac autonomic neuropathy using a machine learning model in patients with
diabetes by Ahmad Shaker Abdalrada, Jemal Abawajy, Tahsien Al-Quraishi and
Sheikh Mohammed Shariful Islam in Therapeutic Advances in Endocrinology and
Metabolism