
Modified Distribution Entropy as a Complexity Measure of Heart Rate Variability (HRV) Signal.

Radhagayathri Udhayakumar¹, Chandan Karmakar¹, Peng Li², Xinpei Wang³, Marimuthu Palaniswami⁴.

Abstract

The complexity of a heart rate variability (HRV) signal is considered an important nonlinear feature to detect cardiac abnormalities. This work aims at explaining the physiological meaning of a recently developed complexity measurement method, namely, distribution entropy (DistEn), in the context of HRV signal analysis. We thereby propose modified distribution entropy (mDistEn) to remove the physiological discrepancy involved in the computation of DistEn. The proposed method generates a distance matrix that is devoid of over-exerted multi-lag signal changes. Restricted element selection in the distance matrix makes "mDistEn" a computationally inexpensive and physiologically more relevant complexity measure in comparison to DistEn.

Keywords: Shannon entropy; complexity analysis; distribution entropy; heart rate variability

Year: 2020 PMID: 33286846 PMCID: PMC7597155 DOI: 10.3390/e22101077

Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524

1. Introduction

Heart rate variability (HRV) analysis is a powerful non-invasive method for examining the functioning of the autonomic nervous system (ANS). It helps in understanding the interplay between the sympathetic and parasympathetic branches of the ANS, which speed up and slow down the heart rate, respectively [1]. HRV, the variation in the time period between consecutive heartbeats (RR intervals), is thought to reflect the heart's adaptability to changing physiological conditions. Various HRV measures are considered critical biomarkers for understanding and diagnosing cardiac health [2,3]. Popular non-linear entropy statistics such as approximate entropy (ApEn) and sample entropy (SampEn) are significant biomarkers that measure the extent of irregularity contained in HRV signals [4,5,6]. Physiological signals are highly non-linear in nature, so non-linear tools of analysis are preferred over linear ones [7,8,9,10]. A healthy cardiac system is associated with higher complexity than one affected by cardiac disease. However, a high level of complexity does not necessarily indicate a high level of irregularity [11]. ApEn and SampEn, being measures of irregularity [12,13], do not always reflect the level of complexity of the underlying system: they assess a signal's state of orderliness (or chaos) by surveying the patterns that occur in the signal. An irregular signal is not always associated with a high level of complexity, and vice versa. For example, when an original time series (say, one that represents an underlying complex system) is randomized to form a surrogate time series, ApEn or SampEn will be higher for the surrogate series than for the original. Does this increase in randomness (or entropy) also reflect an increase in the complexity of the underlying system? No: randomization breaks the inherent structure of the originally complex series, leading to information loss, in other words a loss of content/complexity [14].
Many previous studies have reported higher irregularity in arrhythmic cardiac signals than in their healthy counterparts [11,15]. However, an arrhythmic heart functions with a much lower level of complexity than a healthy one. In such cases, analyzing complexity separately from irregularity becomes very important. Distribution entropy (DistEn) is a recently introduced measure of signal “complexity”. It is calculated from the empirical probability distribution function (ePDF) of the vector-to-vector distances of the signal [16]. DistEn has been used to extract complexity information (rather than irregularity) from HRV signals [16,17,18]. DistEn follows the same conceptual strategy as ApEn and SampEn. However, unlike ApEn or SampEn, DistEn (1) quantifies complexity, not irregularity, and (2) is computationally superior, since it does not require the tolerance parameter r, the most critical parameter [4,19] of ApEn and SampEn [16]. DistEn is a function of three parameters: data length N, embedding dimension m and number of bins M used in the probability distribution. In most cases, DistEn is known to be less influenced by changes in N and M [16,20]. Additionally, DistEn performs better than other entropy measures, especially for short signals [16]. The efficiency of DistEn as a complexity measure and biomarker has been tested and found to be good for both synthetic and physiological signals [16]. In this study, we explore the physiological relevance of DistEn in HRV analysis. We hypothesized that such an exploration could answer significant questions, for instance: (1) Is the quantified DistEn value a direct consequence of an underlying physiological mechanism? (2) In DistEn measurement, can the distance between template vectors be mapped to a change in a physiological factor? Consequently, we introduce a variant of DistEn, “modified distribution entropy (mDistEn)”, which is defined considering the underlying physiology of an HRV signal. Finally, the efficacy of mDistEn as a biomarker of cardiac health is compared to that of DistEn.
The novelty of this modified algorithm lies in the way mDistEn uses only the distances between vectors within a certain time lag, instead of collecting the distances across all vectors in the state space as the original DistEn does.

2. Data and Methods

2.1. Data

Synthetic: Logistic time series at two different levels of irregularity were used for the study. The data were generated from the logistic map $x_{n+1} = a\,x_n(1 - x_n)$ using MATLAB R2019b, with the initial value set to 0.5. The constant a sets the level of irregularity of the generated signal: one value of a produces a “periodic” time series and another a “chaotic” one. While generating the time series, the function also adds random noise to the signal as $y = x + \mathrm{NSR}\cdot\sigma(x)\cdot\eta$, where $\eta$ is a normally distributed sequence of random numbers of the same length as $x$, $\sigma(\cdot)$ denotes the standard deviation, and NSR, the noise-to-signal ratio (noise standard deviation divided by the standard deviation of the noise-free time series), is set at 0.1. Ten different realizations (differing in the new random noise added each time) were synthesized at each level of irregularity, namely “periodic” and “chaotic”. We used only the logistic map to produce time series with chaotic and periodic regimes, since it is the simplest and most widely used synthetic example for demonstrating entropy level variations [5,16,21,22,23]. Data lengths of 50, 100, 200, 500 and 1000 were generated.

Physiological: All real RR interval data were obtained from the PhysioNet database [24]. Corrected beat annotation files were available from the database; these were further corrected manually to remove ectopic beats. The data included: (i) Healthy: RR interval time series of 72 normal sinus rhythm subjects, comprising 18 subjects from the MIT-BIH Normal Sinus Rhythm database (nsrdb) and 54 subjects from the Normal Sinus Rhythm RR Interval database (nsr2db).
(ii) Diseased: RR interval time series of diseased subjects were obtained from the MIT-BIH databases of PhysioNet, comprising (a) 48 arrhythmia recordings from 47 subjects [25], digitized at 360 samples per second per channel with 11-bit resolution over a 10 mV range; and (b) 25 atrial fibrillation recordings [25], each sampled at 250 samples per second with 12-bit resolution over a 10 mV range. Atrial fibrillation is a specific category of arrhythmia related to paroxysmal atrial malfunction. It is the most common sustained arrhythmia and, unlike many other common arrhythmias, can occur as a post-surgical event. After the RR interval series were extracted from all recordings, each signal segment was taken from the beginning of the series, with lengths of 50, 100, 200, 500 and 1000 beats.
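
The synthetic-data recipe described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the helper `logistic_series` is hypothetical, the Gaussian noise is drawn with Python's `random.gauss`, and because the study's exact values of a are not reproduced here, the example uses a = 3.5 (a period-4 regime of the logistic map) and a = 3.9 (a chaotic regime) purely for illustration.

```python
import random
import statistics


def logistic_series(n, a, x0=0.5, nsr=0.1, seed=None):
    """Hypothetical helper: logistic map x[k+1] = a * x[k] * (1 - x[k]) plus noise.

    Assumption: the noise is Gaussian and scaled so that
    std(noise) / std(noise-free series) is approximately `nsr`.
    """
    rng = random.Random(seed)
    x = [x0]
    for _ in range(n - 1):
        x.append(a * x[-1] * (1.0 - x[-1]))
    sd = statistics.pstdev(x)                      # sigma of the noise-free series
    eta = [rng.gauss(0.0, 1.0) for _ in range(n)]  # standard normal sequence
    return [xi + nsr * sd * ei for xi, ei in zip(x, eta)]


# Ten noisy realizations per regime; a = 3.5 and a = 3.9 are illustrative values only.
periodic = [logistic_series(1000, 3.5, seed=s) for s in range(10)]
chaotic = [logistic_series(1000, 3.9, seed=100 + s) for s in range(10)]
```

Each realization differs only in its noise seed, mirroring the "new random noise added each time" in the text.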

2.2. Distribution Entropy

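Distribution entropy (DistEn) is calculated from the empirical probability distribution function (ePDF) of the distances among vectors formed from a given time series [16]. For a time series $\{x_1, x_2, \ldots, x_N\}$ of length N and embedding dimension m, DistEn is calculated as follows:

Step 1: Form $N-m+1$ vectors of length m each, given by
$$X_i = \{x_i, x_{i+1}, \ldots, x_{i+m-1}\}, \quad 1 \le i \le N-m+1. \tag{1}$$

Step 2: Take each vector of Step 1 as a template vector $X_i$ and find its distance from every vector $X_j$, where the distance is given by
$$d_{i,j} = \max_{0 \le k \le m-1} \left| x_{i+k} - x_{j+k} \right|. \tag{2}$$

Step 3: Repeating Step 2 for all template vectors $X_i$, $1 \le i \le N-m+1$, forms a distance matrix D of dimension $(N-m+1) \times (N-m+1)$:
$$D = \begin{bmatrix} d_{1,1} & d_{1,2} & \cdots & d_{1,N-m+1} \\ d_{2,1} & d_{2,2} & \cdots & d_{2,N-m+1} \\ \vdots & \vdots & \ddots & \vdots \\ d_{N-m+1,1} & d_{N-m+1,2} & \cdots & d_{N-m+1,N-m+1} \end{bmatrix} \tag{3}$$

Step 4: From matrix (3), it is evident that each off-diagonal element of D appears twice, i.e., $d_{i,j} = d_{j,i}$, because the distances are absolute values, as can be seen from Equation (2); the diagonal elements $d_{i,i}$ are all zero. Thus, in formulating DistEn, it is sufficient to use either the upper or the lower triangle of D [16]. Here, we use the upper triangle only (excluding the diagonal) and denote the resulting matrix as $D_u$, where
$$D_u = \{ d_{i,j} : 1 \le i < j \le N-m+1 \}. \tag{4}$$

Step 5: The elements of $D_u$ are grouped into M bins of equal width and the corresponding histogram is obtained.

Step 6: For each bin t of the histogram, its probability is estimated as
$$p_t = \frac{\text{number of elements of } D_u \text{ in bin } t}{\text{total number of elements of } D_u}, \quad t = 1, 2, \ldots, M, \tag{5}$$
where $p_t$ is the probability of the t-th bin.

Step 7: By the definition of Shannon entropy, the normalized DistEn of the time series is
$$\mathrm{DistEn}(m, M) = -\frac{1}{\log_2 M} \sum_{t=1}^{M} p_t \log_2 p_t. \tag{6}$$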
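
The seven steps can be sketched in plain Python as follows. This is a hedged illustration rather than the reference implementation of [16]: the function name `dist_en` is hypothetical, the double loop favours readability over speed, and Step 5 is assumed to use M equal-width bins spanning the minimum to the maximum distance.

```python
import math


def dist_en(x, m=2, M=500):
    """Hypothetical sketch of DistEn (Steps 1-7), normalised to [0, 1].

    Assumption: Step 5 uses M equal-width bins spanning [min(D_u), max(D_u)].
    """
    n_vec = len(x) - m + 1
    vecs = [x[i:i + m] for i in range(n_vec)]          # Step 1, Eq. (1)
    # Steps 2-4: Chebyshev distances, upper triangle only (j > i), Eqs. (2)-(4)
    d = [max(abs(a - b) for a, b in zip(vecs[i], vecs[j]))
         for i in range(n_vec) for j in range(i + 1, n_vec)]
    lo, hi = min(d), max(d)
    if hi == lo:                                       # one occupied bin -> zero entropy
        return 0.0
    counts = [0] * M                                   # Step 5: histogram
    for v in d:
        counts[min(int((v - lo) / (hi - lo) * M), M - 1)] += 1
    total = len(d)
    # Steps 6-7: bin probabilities p_t and normalised Shannon entropy, Eqs. (5)-(6)
    h = -sum(c / total * math.log2(c / total) for c in counts if c)
    return h / math.log2(M)
```

For example, `dist_en([0, 1, 0, 1], m=1, M=2)` pools the six upper-triangle distances (four equal to 1, two equal to 0) and returns the binary entropy of (1/3, 2/3), about 0.918.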

2.3. Modified Distribution Entropy

2.3.1. Physiological Explanation of Distance in Measurement for HRV Signal

Let an inter-heartbeat RR interval time series of length N be defined as
$$RR = \{RR_1, RR_2, \ldots, RR_N\}. \tag{7}$$
For an embedding dimension m, template vectors can be defined using Equation (1); for m = 1, the template vectors of RR are
$$X_i = \{RR_i\}, \quad 1 \le i \le N. \tag{8}$$
The distance of vector $X_j$ from template vector $X_i$ can then be computed using Equation (2) as
$$d_{i,j} = d_{i,i+l} = \left| RR_{i+l} - RR_i \right|, \tag{9}$$
where $l = j - i$, i denotes the i-th RR interval, and l is the lag (delay) used to calculate the change between RR intervals (shown in Figure 1). Similarly, for embedding dimension m = 2, the template vectors are
$$X_i = \{RR_i, RR_{i+1}\}, \quad 1 \le i \le N-1. \tag{10}$$

Figure 1

Changes of individual RR intervals from their l-lagged RR intervals for embedding dimension m = 2.

Now, the distance of vector $X_j$ from template vector $X_i$ can be computed using Equation (2) as
$$d_{i,i+l} = \max\left\{ \left| RR_{i+l} - RR_i \right|, \left| RR_{i+l+1} - RR_{i+1} \right| \right\}. \tag{11}$$
This signifies that $d_{i,i+l}$ quantifies the maximum of the changes of individual RR intervals from their l-lagged RR intervals for embedding dimension m = 2 (shown in Figure 1). Therefore, the generalized distance Equation (2) can be rewritten with respect to the RR interval signal as
$$d_{i,i+l} = \max_{0 \le k \le m-1} \left| RR_{i+k+l} - RR_{i+k} \right|, \quad 1 \le l \le N-m, \; 1 \le i \le N-m+1-l. \tag{12}$$
Therefore, DistEn is a measure of the Shannon entropy of the change of an RR interval, calculated for lags ranging from 1 to N − m. The embedding dimension m controls the calculation of the change by defining the number of candidates for the maximum-change calculation.
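
The equivalence between the generic distance of Equation (2) and the lagged-change form of Equation (12) can be checked numerically. In this sketch the two function names and the RR values are hypothetical, chosen only for illustration; indices are 0-based, so index 0 corresponds to $RR_1$.

```python
def chebyshev_distance(x, i, j, m):
    """Eq. (2): d_ij = max over k of |x[i+k] - x[j+k]| (0-based indices)."""
    return max(abs(x[i + k] - x[j + k]) for k in range(m))


def lagged_change_distance(rr, i, lag, m):
    """Eq. (12): largest change between RR intervals `lag` beats apart,
    over the m consecutive beats starting at beat i (0-based)."""
    return max(abs(rr[i + k + lag] - rr[i + k]) for k in range(m))


# Hypothetical RR intervals in seconds, for illustration only.
rr = [0.80, 0.82, 0.79, 0.85, 0.81, 0.78, 0.84]
m = 2
N = len(rr)
for lag in range(1, N - m + 1):                 # lags 1 .. N - m
    for i in range(N - m + 1 - lag):            # templates with a partner at i + lag
        assert chebyshev_distance(rr, i, i + lag, m) == lagged_change_distance(rr, i, lag, m)

# Worked case: template X_1 = (0.80, 0.82) vs X_3 = (0.79, 0.85), lag 2
print(lagged_change_distance(rr, 0, 2, m))      # max(|0.79 - 0.80|, |0.85 - 0.82|), about 0.03
```

The loop bounds make the lag range explicit: with N = 7 and m = 2, DistEn pools lags 1 to 5, one fewer pair at each additional lag.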

2.3.2. Elimination of $d_{i,i+l}$ for $l > 10$

From the analytical explanation of DistEn, it is clear that it measures the entropy of the change, or derivative, of the HRV signal at all lags from 1 to N − m. Therefore, the maximum lag at which the change is measured depends on the data length N and the embedding dimension m. Since $m \ll N$, the maximum lag depends predominantly on the length of the signal. The physiological discrepancy in the definition of DistEn lies in this dependency of the lag on data length. If we consider the physiological mechanism of heart rate variability, the effect of the present heartbeat on future heartbeats is determined by the properties of cardiovascular mechanisms rather than by the recording length or number of heartbeats. Therefore, using lags based on data length (for calculating the change in HRV) may mostly assess random phenomena rather than physiological information. Previous studies have reported that, on average, a heartbeat's influence is felt on only the 6–10 beats following it [26,27]. It is thus physiologically irrelevant to find the change between a given beat and all other beats following it, as is done in DistEn. Hence, it is physiologically justified to remove from the upper-triangle distance matrix of Section 2.2 all changes corresponding to lags $l > 10$. This modification results in
$$D_{\mathrm{mod}} = \{ d_{i,j} : 1 \le j - i \le 10, \; 1 \le i < j \le N-m+1 \}. \tag{13}$$
This modified distance matrix (13) is then subjected to the Shannon entropy calculation of steps 5 to 7 of Section 2.2 to evaluate the modified distribution entropy (mDistEn) of the signal.
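
A minimal sketch of mDistEn, assuming the same equal-width binning as in Section 2.2, differs from DistEn only in which distances enter the histogram. The function name `mdist_en` and its `max_lag` parameter are hypothetical; the default of 10 follows the 6–10 beat argument above.

```python
import math


def mdist_en(x, m=2, M=500, max_lag=10):
    """Hypothetical sketch of mDistEn: only distances d_{i,i+l} with
    1 <= l <= max_lag (Eq. 13) enter the histogram.

    max_lag=None keeps every lag (1 .. N - m), i.e. plain DistEn.
    Assumption: M equal-width bins spanning [min, max] of the kept distances.
    """
    n_vec = len(x) - m + 1
    last_lag = n_vec - 1 if max_lag is None else min(max_lag, n_vec - 1)
    # Eq. (12): lagged-change distances, grouped by lag
    d = [max(abs(x[i + k] - x[i + lag + k]) for k in range(m))
         for lag in range(1, last_lag + 1)
         for i in range(n_vec - lag)]
    lo, hi = min(d), max(d)
    if hi == lo:                       # a single occupied bin has zero entropy
        return 0.0
    counts = [0] * M
    for v in d:
        counts[min(int((v - lo) / (hi - lo) * M), M - 1)] += 1
    h = -sum(c / len(d) * math.log2(c / len(d)) for c in counts if c)
    return h / math.log2(M)            # steps 6-7: normalised Shannon entropy
```

Setting `max_lag=None` keeps every lag and therefore gives DistEn, so one function covers both measures; for series with N − m ≤ 10 the two coincide.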

2.4. Statistical Analysis

In order to test the efficiency of the entropy measures as classification features, we need to quantify their strength in separating data belonging to different classes. In our study, we used two statistics for this purpose: the p-value and the AUC. The p-value was obtained using the Mann–Whitney U test, which tests the null hypothesis that X and Y, samples taken from two independent populations, come from continuous distributions with equal medians. The p-value can take values from 0 to 1, and in this study we used a significance level of 0.05. AUC, the area under the ROC (receiver operating characteristic) curve, is the probability that a classifier ranks a randomly chosen instance X higher than a randomly chosen instance Y, X and Y being samples taken from two independent populations. An AUC value of 0.5 indicates that the distributions of the feature are similar in the two groups, with no discriminatory power. Conversely, an AUC value of 1.0 means that the distributions of the feature in the two groups do not overlap at all. The Statistics Toolbox of MATLAB R2019b was used to perform all statistical tests.
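
The AUC definition above can be made concrete with a pairwise count, which is also the Mann–Whitney U statistic of the first sample divided by the number of pairs. This sketch is illustrative only: the function name and the entropy values are hypothetical, and the p-value itself would come from a library routine such as SciPy's `scipy.stats.mannwhitneyu` or MATLAB's `ranksum`.

```python
def empirical_auc(x, y):
    """AUC as P(X > Y) + 0.5 * P(X == Y), counted over all (x, y) pairs.

    Equals the Mann-Whitney U statistic of `x` divided by len(x) * len(y).
    Values below 0.5 mean the feature tends to be larger in `y`.
    """
    wins = sum((xi > yi) + 0.5 * (xi == yi) for xi in x for yi in y)
    return wins / (len(x) * len(y))


# Hypothetical entropy values for two groups (not data from this study).
healthy = [0.42, 0.40, 0.45]
arrhythmic = [0.30, 0.41, 0.28]
print(empirical_auc(healthy, arrhythmic))   # 8 of 9 pairs ranked correctly
```

Ties count as half a win, which matches the rank-based definition of U.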

3. Results

3.1. Effect of Eliminating $d_{i,i+l}$ ($l > 10$) from DistEn

For data of length N = 100, the average DistEn was calculated for each lag l ranging from 1 to 99; the histogram consisted of the elements of D corresponding to lags 1 to l. The embedding dimension m was 2 and the number of bins M was kept fixed at 500. As can be seen from Figures 2–4, the entropy values obtained using lags from 1 to 10 (i.e., mDistEn) were 0.4838, 0.9066 and 0.3885 (marked by a vertical blue line in each subgraph) for the periodic, chaotic and healthy RR interval time series, respectively. These values increased by 0.0804, 0.0665 and 0.0266, respectively, when using the DistEn measure, i.e., when considering lags from 1 to 98. The increase in entropy due to the addition of elements corresponding to lags over 10 was small compared to the values already attained from the first 10 lags.
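
The lag sweep behind Figures 2–4 can be approximated for a single series with the sketch below (the figures additionally average over realizations or subjects). Assumptions: the function names are hypothetical, binning is equal-width over [min, max] of the pooled distances, and distances are indexed by lag as in Equation (12). Distances are computed once per lag and pooled cumulatively, so only the histogram is rebuilt for each l.

```python
import math


def normalised_hist_entropy(d, M):
    """Steps 5-7: entropy of an M-bin equal-width histogram, divided by log2(M)."""
    lo, hi = min(d), max(d)
    if hi == lo:
        return 0.0
    counts = [0] * M
    for v in d:
        counts[min(int((v - lo) / (hi - lo) * M), M - 1)] += 1
    return -sum(c / len(d) * math.log2(c / len(d)) for c in counts if c) / math.log2(M)


def entropy_vs_max_lag(x, m=2, M=500):
    """Entropy using lags 1..l, for every l from 1 to N - m.

    curve[9] corresponds to mDistEn (lags 1-10) when N - m >= 10;
    curve[-1] corresponds to DistEn (all lags).
    """
    n_vec = len(x) - m + 1
    by_lag = [[max(abs(x[i + k] - x[i + lag + k]) for k in range(m))
               for i in range(n_vec - lag)]
              for lag in range(1, n_vec)]
    pooled, curve = [], []
    for lag_distances in by_lag:
        pooled.extend(lag_distances)
        curve.append(normalised_hist_entropy(pooled, M))
    return curve
```

Plotting `curve` against l for each series, then averaging across realizations, reproduces the shape of the analysis described above.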

Figure 2

Average DistEn of periodic data (10 realizations) as a function of lag. The blue line indicates the end of the first 10 lags. mDistEn, calculated using the first 10 lags, was 0.4838, while DistEn, calculated using all lags, was 0.5642.

Figure 3

Average DistEn of chaotic data (10 realizations) as a function of lag. The blue line indicates the end of the first 10 lags. mDistEn, calculated using the first 10 lags, was 0.9066, while DistEn, calculated using all lags, was 0.9731.

Figure 4

Average DistEn of healthy RR interval data (72 RR interval time series) as a function of lag. The blue line indicates the end of the first 10 lags. mDistEn, calculated using the first 10 lags, was 0.3885, while DistEn, calculated using all lags, was 0.4151.

This supports our hypothesis that the entropy of the underlying physiological mechanism can be captured from the change of the signal up to 10 lags, rather than from all lags determined by the data length. Another benefit of using a maximum lag of 10 is that it reduces the computational cost from $O(N^2)$ to $O(N)$. From Equations (3) and (4), for any data length N the number of elements to be calculated is $(N-m+1)(N-m)/2$. On the other hand, for a maximum lag of 10 (and $N - m \ge 10$), the number of elements in the modified distance matrix is $\sum_{l=1}^{10}(N-m+1-l) = 10(N-m) - 45$. Therefore, mDistEn reduces the computational burden and is suitable for energy-constrained devices such as mobile or sensor devices.
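
The element counts above can be checked numerically. The helper names are hypothetical; the formulas simply count the upper-triangle entries of D and the entries with $1 \le j - i \le 10$, assuming $N - m + 1$ template vectors as in Equation (1).

```python
def n_dist_en_elements(N, m):
    """Upper-triangle elements d_ij (j > i) of the (N-m+1) x (N-m+1) matrix D."""
    n = N - m + 1
    return n * (n - 1) // 2


def n_mdist_en_elements(N, m, max_lag=10):
    """Elements d_{i,i+l} with 1 <= l <= max_lag: sum over l of (N - m + 1 - l)."""
    n = N - m + 1
    return sum(n - lag for lag in range(1, min(max_lag, n - 1) + 1))


for N in (50, 100, 200, 500, 1000):
    print(N, n_dist_en_elements(N, 2), n_mdist_en_elements(N, 2))
```

For N = 1000 and m = 2 this gives 498,501 elements for DistEn versus 9,935 for mDistEn, roughly a 50-fold reduction; the first count grows quadratically in N and the second linearly.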

3.2. mDistEn as a Classification Feature: Comparison with DistEn

The DistEn and mDistEn values for the synthetic and physiological data are shown in Figures 5–7. Both measures classify the synthetic data very significantly and consistently across data lengths N, while for the physiological data the significance of classification varies with N. A better sense of the classification can be obtained from the corresponding p-values (listed in Table 1). As the table shows, for (a) the healthy vs. arrhythmic case, both DistEn and mDistEn classify the data set significantly at all data lengths, with slightly stronger significance (smaller p-values) for mDistEn. On the other hand, for (b) the healthy vs. atrial fibrillation case, DistEn shows significant classification only at the higher data lengths (N ≥ 500), whereas mDistEn shows significant classification from N as low as 100. Thus, mDistEn appears better suited than DistEn to shorter data lengths.

Figure 5

Periodic vs. chaotic data: values of DistEn and mDistEn.

Figure 6

Healthy vs. arrhythmic HRV data: values of DistEn and mDistEn.

Figure 7

Healthy vs. atrial fibrillation HRV data: values of DistEn and mDistEn.

Table 1

p-values of DistEn and mDistEn in classification of data at various data lengths.

p-Value
	DistEn					mDistEn
N	50	100	200	500	1000	50	100	200	500	1000
Periodic vs. Chaotic	1.59 × 10⁻⁵	1.59 × 10⁻⁵	1.59 × 10⁻⁵	1.59 × 10⁻⁵	1.59 × 10⁻⁵	1.59 × 10⁻⁵	1.59 × 10⁻⁵	1.59 × 10⁻⁵	1.59 × 10⁻⁵	1.59 × 10⁻⁵
Healthy vs. Arrhythmic	5.64 × 10⁻¹⁶	1.75 × 10⁻¹⁵	4.14 × 10⁻¹⁵	7.56 × 10⁻¹⁷	1.30 × 10⁻¹⁶	5.02 × 10⁻¹⁷	2.10 × 10⁻¹⁷	4.97 × 10⁻¹⁸	1.09 × 10⁻¹⁸	3.78 × 10⁻¹⁹
Healthy vs. Atrial Fibrillated	NS	NS	NS	0.03	0.01	NS	0.05	0.01	0.004	0.002

For further clarity, the AUC values of DistEn and mDistEn for the synthetic and physiological data are shown in Figure 8 and tabulated in Table 2. For synthetic signals, the AUC values of DistEn and mDistEn are the same (AUC = 1) and consistent across data lengths N. This shows that mDistEn performs as well as DistEn on synthetic data, and supports the previous finding that DistEn is less affected by data length [20].

Figure 8

AUC values of DistEn and mDistEn in classification of data at various data lengths.

Table 2

AUC values of DistEn and mDistEn in classification of data at various data lengths.

	AUC
	DistEn					mDistEn
N	50	100	200	500	1000	50	100	200	500	1000
Periodic vs. Chaotic	1	1	1	1	1	1	1	1	1	1
Healthy vs. Arrhythmic	0.94	0.93	0.92	0.95	0.95	0.95	0.96	0.97	0.98	0.98
Healthy vs. Atrial Fibrillated	0.61	0.61	0.60	0.64	0.66	0.61	0.64	0.67	0.69	0.71

Looking at the healthy vs. arrhythmia data, the AUC values of mDistEn are higher than those of DistEn and consistent across data lengths N. Therefore, mDistEn performs better than DistEn for all N, and this improvement may be attributed to the physiologically motivated selection of lags for evaluating change in the DistEn measurement. Similarly, for the healthy vs. atrial fibrillation data, the AUC values show that mDistEn performs better than DistEn for all N > 50. At the lowest data length used (N = 50), the performances of the two methods are equal and not significant (NS). Overall, the results suggest that increasing the number of lags in DistEn (with increasing data length) negatively affects classification performance, which is avoided in mDistEn by choosing a physiologically relevant number of lags.

4. Discussion

Complexity analysis of HRV signals has significant prognostic value. It could be used as an important non-invasive predictor of adverse cardiovascular events, such as arrhythmia and atrial fibrillation [28,29,30]. Many non-linear algorithms, especially entropy methods, have been used to assess HRV complexity [31]. Among these, DistEn is a recently introduced measure that is less parametric than traditional entropy formulations such as ApEn and SampEn [16]. Different methods capture one or several different aspects of signal complexity, including irregularity and fractal dynamics. DistEn captures the irregularity of spatial structures (of a given time series) in the state space, which is unique for different dynamics [16]. This represents one aspect of signal complexity. If, on the other hand, we are interested in a measure of randomness, DistEn may not differentiate a signal from its surrogate. However, this is true only when the surrogate data are generated by random shuffling of the original time series, not for surrogate data based on phase randomization: DistEn relies on the distribution of inter-vector distances, which is theoretically retained after random shuffling but perturbed by other randomization processes. We may also interpret DistEn as sensitive to the irregularity of signal dynamics, since it increases as the number of random dynamics increases in the MIX process. This is in keeping with its two well-studied entropy ancestors, ApEn and SampEn [16]. Thus, DistEn is not a complete measure of signal complexity; it captures just a few aspects of it, each interpreted independently. In this study, we interpret complexity as the irregularity of spatial structures in the state space. DistEn is an algorithm that focuses particularly on short-term data [16,20]. The idea behind DistEn is to map length-N RR intervals to an inter-vector distance matrix of dimension $(N-m+1) \times (N-m+1)$ in the state space. This expands the limited information contained in the original RR interval time series from N values to on the order of $N^2$ distances [16].
Examinations on both benchmark synthetic and real clinical data have indicated significantly improved stability and reliability of DistEn over traditional methods [16,20]. This is because DistEn uses the probability distribution of the entire inter-vector distance matrix, a global quantification, as compared to the partial quantification in ApEn or SampEn [16]. In the present study, we have mapped inter-vector distances to changes in the given RR intervals, using a limited time lag. In other words, we have reformulated the estimation of inter-vector distances in the original DistEn algorithm. The reformulation was motivated by the possibility that not all elements in the distance matrix are physiologically significant, because the influence of a heartbeat may last for only the 6–10 beats following it [26,27]. A modified DistEn (mDistEn) algorithm has been developed accordingly to restrict the time lag to a fixed value, thereby counting only those distances that are physiologically relevant to the template vector. Our simulation tests on logistic and RR interval time series suggest that the proposed mDistEn (using only lags up to 10) accounts for roughly 86–94% of what DistEn (using all possible lags) measures (the ratios mDistEn/DistEn in Figures 2–4 are 0.86, 0.93 and 0.94). This indicates that distances corresponding to time lags > 10 contribute a relatively small portion (about 6–14%) of the quantified information. Our tests also indicate that the information captured by mDistEn has sufficient prognostic value to classify distinct data sets, in fact more so than DistEn. We have shown that mDistEn is a better classification feature than DistEn in differentiating arrhythmic or atrial fibrillation patients from healthy controls. Using physiologically insignificant lags (as DistEn does) increases computational expense while adding little informative value.
Consequently, a major advantage of our limited-lag algorithm is the reduction of computational complexity, giving it the potential to be embedded in modern battery-driven wearable devices, which are becoming increasingly popular. An interesting question here concerns the role of the inter-vector distances corresponding to larger lags (lags > 10). These appear to be largely negligible when comparing the absolute difference between DistEn and mDistEn. From a physiological perspective, vagal and sympathetic mediation of RR intervals happens through the synaptic release of acetylcholine and noradrenaline, respectively. The vagal effects are almost immediate on a beat-by-beat basis, as the turnover rate of acetylcholine is high. In contrast, noradrenaline is reabsorbed and metabolized relatively slowly, which results in a long effect latency of sympathetic mediation [32]. Therefore, it may seem necessary to use larger lags (l > 10) in entropy measurement. However, the small difference between DistEn and mDistEn in the presented scenarios showed that most of the information can be captured with l ≤ 10. In this study, we have not used RR time series of very long duration, and therefore the impact of very long HRV time series on the proposed mDistEn is currently unknown. This is a limitation of the current study, and future exploration of continuous data from ambulatory monitoring could shed more light on the use of mDistEn for analyzing long-term HRV time series. For physiological signals other than HRV, the respective physiological mechanism should be considered to find the memory effect that determines the range of lags. Therefore, we propose this modification to DistEn only for HRV analysis. A second limitation of our study is that mDistEn was proposed in the context of HRV complexity analysis, using prior knowledge of the possible effect time (6–10 subsequent beats).
Given a completely different data set (e.g., EEG data), mDistEn cannot be used unless there are clear implications for restricting the effect time pertaining to those data. On the other hand, the original DistEn algorithm can still be used, irrespective of the data. In conclusion, the better performance of mDistEn in the current study suggests that the future design of such algorithms could also take "physiological context" into consideration, for better accuracy and reduced computation, thereby maximizing their benefits.

5. Conclusions

This study examined distribution entropy (DistEn) measurement on HRV signals and modified the method to better reflect the complexity of the underlying physiological mechanisms. We explained what the inter-vector distances in DistEn represent when mapped to the given RR interval time series: DistEn uses multiple time lags to measure the Shannon entropy of changes in the HRV signal. In this paper, we propose modified distribution entropy (mDistEn), a physiologically significant alternative to DistEn for HRV complexity analysis. Our experiments and analyses indicate that, in comparison to DistEn, mDistEn reduces computational cost, performs equally well in classifying synthetic signals and performs better in classifying physiological signals. Thus, mDistEn is a more pragmatic option than DistEn for HRV complexity analysis, since it is (i) physiologically more relevant, (ii) computationally less expensive and (iii) a better classification feature.

References (26 in total)

1. Approximate entropy as a measure of system complexity.

Authors: S M Pincus
Journal: Proc Natl Acad Sci U S A Date: 1991-03-15 Impact factor: 11.205

2. Effect of data length and bin numbers on distribution entropy (DistEn) measurement in analyzing healthy aging.

Authors: Radhagayathri K Udhayakumar; Chandan Karmakar; Marimuthu Palaniswami
Journal: Conf Proc IEEE Eng Med Biol Soc Date: 2015

3. Heart rate variability: a review.

Authors: U Rajendra Acharya; K Paul Joseph; N Kannathal; Choo Min Lim; Jasjit S Suri
Journal: Med Biol Eng Comput Date: 2006-11-17 Impact factor: 2.602

4. Assessing the complexity of short-term heartbeat interval series by distribution entropy.

Authors: Peng Li; Chengyu Liu; Ke Li; Dingchang Zheng; Changchun Liu; Yinglong Hou
Journal: Med Biol Eng Comput Date: 2014-10-29 Impact factor: 2.602

5. Power spectrum analysis of heart rate fluctuation: a quantitative probe of beat-to-beat cardiovascular control.

Authors: S Akselrod; D Gordon; F A Ubel; D C Shannon; A C Berger; R J Cohen
Journal: Science Date: 1981-07-10 Impact factor: 47.728

6. Physiological time-series analysis: what does regularity quantify?

Authors: S M Pincus; A L Goldberger
Journal: Am J Physiol Date: 1994-04

7. Heart rate analysis in normal subjects of various age groups.

Authors: Rajendra Acharya U; N Kannathal; Ong Wai Sing; Luk Yi Ping; TjiLeng Chua
Journal: Biomed Eng Online Date: 2004-07-20 Impact factor: 2.819

8. Selection of entropy-measure parameters for knowledge discovery in heart rate variability data.

Authors: Christopher C Mayer; Martin Bachler; Matthias Hörtenhuber; Christof Stocker; Andreas Holzinger; Siegfried Wassertheurer
Journal: BMC Bioinformatics Date: 2014-05-16 Impact factor: 3.169

9. The physiological basis and measurement of heart rate variability in humans.

Authors: Adina E Draghici; J Andrew Taylor
Journal: J Physiol Anthropol Date: 2016-09-28 Impact factor: 2.867

10. Stability, Consistency and Performance of Distribution Entropy in Analysing Short Length Heart Rate Variability (HRV) Signal.

Authors: Chandan Karmakar; Radhagayathri K Udhayakumar; Peng Li; Svetha Venkatesh; Marimuthu Palaniswami
Journal: Front Physiol Date: 2017-09-20 Impact factor: 4.566
