Chang Chen1, Shixue Sun1, Zhixin Cao2,3,4, Yan Shi5, Baoqing Sun6, Xiaohua Douglas Zhang1,7.
Abstract
Sample entropy is a powerful tool for analyzing the complexity and irregularity of physiological signals, which may be associated with human health. Nevertheless, the complexity of its calculation hinders its widespread application. The R language currently provides multiple open-source packages for calculating sample entropy, but each is designed for a different scenario. Investigators searching for a suitable package may therefore be confused about parameter settings and the choice of algorithm. To ease this selection, we explored the functions of five existing R packages for calculating sample entropy and compared their computing capability along several dimensions. We used four published datasets on respiration and heart rate to study their input parameters, types of entropy, and running time. In summary, nonlinearTseries and CGManalyzer can analyze sample entropy with different embedding dimensions and similarity thresholds. CGManalyzer is a good choice for calculating the multiscale sample entropy of physiological signals because it not only reports sample entropy at all scales simultaneously but also provides various visualization plots. MSMVSampEn is the only package that can calculate multivariate multiscale entropies. In terms of computing time, nonlinearTseries, CGManalyzer, and MSMVSampEn run significantly faster than the other two packages. Moreover, we identify issues in the MSMVSampEn package. This article provides guidelines for researchers to find a suitable R package for their analysis and applications using sample entropy.
Keywords: R package; comparison; nonlinear dynamics; sample entropy; time series
Year: 2019 PMID: 32161808 PMCID: PMC6994089 DOI: 10.1093/biomethods/bpz016
Source DB: PubMed Journal: Biol Methods Protoc ISSN: 2396-8923
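For readers unfamiliar with the quantity being compared, the following is a minimal Python sketch of the usual textbook definition of sample entropy, with embedding dimension `m`, time lag `lag`, and a similarity threshold expressed as a multiple of the standard deviation. It is not the code of any of the five R packages. The function name, the default values, the use of the population standard deviation, and the `<=` comparison are all assumptions made for illustration, and individual packages may differ on each of these points.

```python
import math
import statistics


def sample_entropy(series, m=2, r_factor=0.2, lag=1):
    """Sample entropy with threshold r = r_factor * SD (illustrative sketch)."""
    n = len(series)
    r = r_factor * statistics.pstdev(series)  # assumption: population SD
    # Use the same number of templates for lengths m and m + 1.
    n_templates = n - m * lag
    if n_templates < 2:
        raise ValueError("series is too short for this m and lag")

    def count_matches(length):
        # Count template pairs (i < j) whose Chebyshev distance is within r.
        count = 0
        for i in range(n_templates - 1):
            for j in range(i + 1, n_templates):
                if all(
                    abs(series[i + k * lag] - series[j + k * lag]) <= r
                    for k in range(length)
                ):
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return float("nan")   # undefined when no matches are found
    return -math.log(a / b)
```

The double loop makes the cost grow quadratically with the series length, which is one plausible reason why running time becomes a practical concern for recordings with tens of thousands of points.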
General information of testing datasets
| Dataset no. | Data type | Number of data points | Data content |
|---|---|---|---|
| Dataset 1 | Air flow | 196 533 | 24-hour recording of one subject |
| Dataset 2 | Air flow | 44 419 | 24-hour recording of one subject |
| Dataset 3 | RR interval | Average 6000 | 20 subjects |
| Dataset 4 | RR interval | 1134 | One subject |
Comparison of five R programs
| Package | mousetrap | pracma | nonlinearTseries | MSMVSampEn | CGManalyzer |
|---|---|---|---|---|---|
| Download web address | packages/mousetrap/index.html | packages/pracma/index.html | packages/nonlinearTseries/index.html | areshenk/MSMVSampEn | packages/CGManalyzer/index.html |
| Latest version | Version 3.1.3 | Version 2.2.5 | Version 0.2.6 | No exact version | Version 1.2 |
| Updated time | 4 October 2019 | 9 April 2019 | 21 February 2019 | 17 July 2017 | 23 October 2019 |
| Core function | | | | | |
| Type of entropy | Sample entropy | Sample entropy, Approximate entropy | Sample entropy | MMSE | MSE |
| Types of input data | Mouse movement trajectory | One-dimensional time series | One-dimensional time series | High-dimensional time series | One-dimensional time series |
| Embedding dimension | Single, modifiable | Single, modifiable | Multiple, modifiable | Single, modifiable | Multiple, modifiable |
| Time lag | Unchangeable (time lag = 1) | Modifiable | Modifiable | Modifiable | Unchangeable (time lag = 1) |
| Similarity threshold | Single, require multiplying standard deviation | Single, require multiplying standard deviation | Multiple, require multiplying standard deviation | Single, require multiplying standard deviation | Multiple |
| Estimated value | No | No | Yes | No | No |
| Multiscale value | No | No | No | Yes | Yes |
| Output value | Single value | Single value | Multiple value | Single value | Multiscale value |
| Figures for displaying data/results | No | No | Yes | No | Yes |
| Required package/work | No | No | Require correlation dimension | Require | No |
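The "Multiscale value" row refers to entropy computed on progressively coarser versions of the same series. A minimal Python sketch of the commonly used coarse-graining step is given below: each scale averages consecutive, non-overlapping windows of that many points. Whether CGManalyzer or MSMVSampEn coarse-grain in exactly this way is an assumption, and the names `coarse_grain` and `multiscale` are hypothetical helpers introduced only for illustration.

```python
def coarse_grain(series, scale):
    """Average consecutive, non-overlapping windows of `scale` points."""
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    n_windows = len(series) // scale  # a trailing partial window is dropped
    return [
        sum(series[i * scale:(i + 1) * scale]) / scale
        for i in range(n_windows)
    ]


def multiscale(series, scales, entropy_fn):
    """Apply any entropy function to the coarse-grained series at each scale."""
    return {scale: entropy_fn(coarse_grain(series, scale)) for scale in scales}
```

Passing a sample entropy function as `entropy_fn` yields one value per scale, which corresponds to the "Multiscale value" output type in the table, as opposed to the single value returned by the single-scale packages.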
Sample entropy value and operating time for four datasets
| Package | Dataset 1 (191 415 points), value | Dataset 1, time | Dataset 2 (44 419 points), value | Dataset 2, time | Dataset 3 series 1 (6823 points), value | Dataset 3 series 1, time | Dataset 4 (1134 points), value | Dataset 4, time |
|---|---|---|---|---|---|---|---|---|
| | 0.2187467 | 6 h 5 m 58 s | 1.083679 | 19 m 40 s | 0.8499059 | 31 s | 1.315703 | 1 s |
| | 0.1780288 | 6 h 19 m 16 s | 1.286609 | 18 m 45 s | 0.9012826 | 29 s | 1.530295 | 1 s |
| | 0.1780246 | 11 h 20 m 7 s | 1.286609 | 34 m 58 s | 0.9012826 | 49 s | 1.530295 | 2 s |
| | 0.1780273 | 1 m 14 s | 1.287526 | 1 s | 0.9039996 | 1 s | 1.535612 | 1 s |
| | NaN | NA | 1.28662 | 2 m 11 s | 0.9013839 | 3 s | 1.530749 | 1 s |
| | 0.178 | 2 m 2 s | 1.287 | 6 s | 0.901 | 1 s | 1.530 | 1 s |
The percentage error in the sample entropy values compared with the nonlinearTseries package and other packages.
| R package name compared with | 40 000 | 20 000 | 5000 | 2000 | 500 |
|---|---|---|---|---|---|
| | 0.02% ± 0.01% | 0.03% ± 0.02% | 0.19% ± 0.12% | 0.42% ± 0.30% | 2.33% ± 1.09% |
| | 0.02% ± 0.01% | 0.05% ± 0.02% | 0.19% ± 0.12% | 0.42% ± 0.30% | 2.33% ± 1.10% |
| | | | | | |
| | 0.04% ± 0.03% | 0.10% ± 0.06% | 0.22% ± 0.20% | 0.46% ± 0.20% | 2.28% ± 2.40% |
| | 0.05% ± 0.01% | 0.04% ± 0.03% | 0.21% ± 0.18% | 0.49% ± 0.32% | 2.24% ± 1.15% |
The percentage error is expressed as deviations from the values returned by the other packages and computed for datasets of different size. Values are given as mean ± standard deviation.
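As a rough illustration of how such a summary could be produced, the Python sketch below computes the absolute percentage deviation of one package's values from another's and reports the mean and sample standard deviation. The exact formula used in the article is not given here, so the absolute-value form, the choice of sample SD, and the function names are assumptions. The numbers in the usage lines are invented for the example and are not taken from the datasets.

```python
import statistics


def percentage_error(value, reference):
    """Absolute deviation of `value` from `reference`, in percent."""
    return 100.0 * abs(value - reference) / abs(reference)


def summarize_errors(values, references):
    """Mean and sample SD of the percentage errors over paired results."""
    errors = [percentage_error(v, r) for v, r in zip(values, references)]
    return statistics.mean(errors), statistics.stdev(errors)


# Hypothetical entropy values from two packages on three data segments.
mean_err, sd_err = summarize_errors([1.02, 0.98, 1.01], [1.00, 1.00, 1.00])
print(f"{mean_err:.2f}% ± {sd_err:.2f}%")
```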
Figure 1: Adjustment of missing values using CGManalyzer. (a) First 500 data points from the original Dataset 2. (b) Adjusted data after CGManalyzer has processed the missing values. The red line indicates the mean value of the data.
The operating time (unit: second) of five packages for different lengths of data
| R package name | 40 000 | 20 000 | 5000 | 2000 | 500 |
|---|---|---|---|---|---|
| | 1205.6 ± 428.73 | 278.4 ± 10.85 | 25.0 ± 3.37 | 19.6 ± 1.29 | 3.1 ± 0.32 |
| | 2034.0 ± 507.65 | 440.2 ± 5.16 | 27.1 ± 0.31 | 4.4 ± 0.52 | 0.3 ± 0.48 |
| | 7.2 ± 0.42 | 1.5 ± 0.52 | 0.1 ± 0.31 | 0.0 ± 0.0 | 0.0 ± 0.0 |
| | 104.8 ± 1.98 | 26.4 ± 0.70 | 1.7 ± 0.48 | 0.5 ± 0.53 | 0.0 ± 0.0 |
| | 7.0 ± 1.05 | 2.0 ± 0 | 0.3 ± 0.48 | 0.3 ± 0.48 | 0.2 ± 0.42 |
Values are given as mean ± standard deviation.
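A timing summary of this kind can be sketched in Python as follows. The number of repetitions, the timer, and the use of the sample standard deviation are not stated in this record, so ten repeats and `statistics.stdev` are assumptions, and `time_runs` and `mean_sd` are hypothetical helper names. Under those assumptions, an entry such as 0.3 ± 0.48 is what ten runs recorded in whole seconds would give if three took 1 s and seven took 0 s.

```python
import statistics
import time


def time_runs(func, repeats=10):
    """Call func() `repeats` times and return each run's elapsed seconds."""
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed.append(time.perf_counter() - start)
    return elapsed


def mean_sd(times):
    """Mean and sample standard deviation of a list of timings."""
    return statistics.mean(times), statistics.stdev(times)


# Worked check with hypothetical timings in whole seconds:
# three runs of 1 s and seven runs of 0 s.
mean, sd = mean_sd([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
print(f"{mean:.1f} ± {sd:.2f}")  # prints "0.3 ± 0.48"
```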
Figure 2: Graphs from nonlinearTseries. (a) Sample entropy as a function of the similarity threshold for different embedding dimensions, for Dataset 2 (44 419 points). (b) Linear fitting results for the curve of each embedding dimension. Users can select a similarity threshold interval for a linear fit and then take the mean of the fitted values as an approximation of the entropy value.
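The estimation step described in the caption can be sketched in Python as below: restrict the curve to a chosen threshold interval, fit a straight line by ordinary least squares, and average the fitted values. This is an illustration of the idea in the caption, not the nonlinearTseries implementation. The function name and arguments are hypothetical.

```python
def estimate_from_linear_fit(thresholds, entropies, r_min, r_max):
    """Mean of least-squares fitted entropy values for r in [r_min, r_max]."""
    points = [(r, h) for r, h in zip(thresholds, entropies) if r_min <= r <= r_max]
    if len(points) < 2:
        raise ValueError("need at least two points inside the interval")
    n = len(points)
    mean_r = sum(r for r, _ in points) / n
    mean_h = sum(h for _, h in points) / n
    sxx = sum((r - mean_r) ** 2 for r, _ in points)
    if sxx == 0:
        raise ValueError("thresholds in the interval must not all be equal")
    # Ordinary least-squares slope and intercept.
    slope = sum((r - mean_r) * (h - mean_h) for r, h in points) / sxx
    intercept = mean_h - slope * mean_r
    fitted = [intercept + slope * r for r, _ in points]
    return sum(fitted) / n
```

For a least-squares line with an intercept, the mean of the fitted values equals the mean of the observed values in the interval, so under this reading the estimate is driven mainly by which threshold interval the user selects.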
Figure 3: Display of calculated sample entropy using CGManalyzer. (a) Plot showing the mean and standard deviation of sample entropy for each of the two groups in Dataset 3 (RR interval data). (b) Antenna plot showing the strictly standardized mean difference (SSMD) against the mean difference, with its confidence interval, between old and young people.
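For two independent groups, SSMD is the difference of the group means divided by the square root of the sum of the group variances. The Python sketch below uses the simple plug-in estimate with sample variances. Whether CGManalyzer uses this estimator or a corrected variant is an assumption, and the group values in the usage lines are invented for illustration.

```python
import math
import statistics


def ssmd(group_a, group_b):
    """Plug-in SSMD estimate for two independent groups."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    pooled_sd = math.sqrt(
        statistics.variance(group_a) + statistics.variance(group_b)
    )
    return mean_diff / pooled_sd


# Hypothetical sample entropy values for two groups of subjects.
young = [1.9, 2.1, 2.0, 2.2]
old = [1.5, 1.7, 1.6, 1.4]
print(round(ssmd(young, old), 2))
```

Unlike the raw mean difference, SSMD scales the difference by the variability of both groups, which is why the antenna plot shows the two quantities against each other.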