| Literature DB >> 35271128 |
Mouna Benchekroun1,2, Baptiste Chevallier1,3, Dan Istrate1, Vincent Zalc1, Dominique Lenne2.
Abstract
Thanks to wearable devices combined with AI algorithms, it is possible to record and analyse physiological parameters such as heart rate variability (HRV) in ambulatory environments. The main downside of such setups is the poor quality of the recorded data due to movement, noise, and data losses. These errors may considerably alter HRV analysis and should therefore be addressed beforehand, especially if the data are used for medical diagnosis. One widely used method to handle such problems is interpolation, but this approach does not preserve the time dependence of the signal. In this study, we propose a new method for HRV processing that includes filtering and iterative data imputation using a Gaussian distribution. The particularity of the method is that many physiological aspects are taken into consideration, such as the HRV distribution, RR variability, and normal boundaries, as well as time series characteristics. We study the effect of this method on classification using a random forest (RF) classifier and compare it to other data imputation methods, including linear, shape-preserving piecewise cubic Hermite (pchip), and spline interpolation, in a case study on stress. Features from the HRV signals of 67 healthy subjects reconstructed with all four methods were analysed and separately classified by a random forest algorithm to detect stress against relaxation. The proposed method reached a stable F1 score of 61% even with a high percentage of missing data, whereas the other interpolation methods reached approximately 54% for a low percentage of missing data, dropping to about 44% as the percentage increased. This suggests that our method gives better results for stress classification, especially on signals with a high percentage of missing data.
Keywords: ambulatory; biosensors; e-health; heart rate variability (HRV); stress monitoring; wearables
Year: 2022 PMID: 35271128 PMCID: PMC8914897 DOI: 10.3390/s22051984
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Overview of processing steps from data collection to HRV feature extraction and classification. Each of these steps is detailed in Section 2.
Figure 2. Flowchart of the DVC algorithm, including the filtering and data imputation processes.
Physiological conditions for RRI.

| Conditions for RRI |
|---|
| 1. 0.3 s < RRI < 1.3 s |
| 2. Deviation (…) |
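The physiological filtering conditions above can be sketched as a mask over the RR series. The 0.3 s and 1.3 s bounds come from the table and Figures 3 and 4; the deviation rule and all function names here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def filter_rri(rri, lower=0.3, upper=1.3, max_dev=0.3):
    """Flag RR intervals (in seconds) violating the physiological
    conditions: values outside (lower, upper), or values deviating
    from the series median by more than max_dev (illustrative rule)."""
    rri = np.asarray(rri, dtype=float)
    out_of_bounds = (rri <= lower) | (rri >= upper)
    too_deviant = np.abs(rri - np.median(rri)) > max_dev
    return out_of_bounds | too_deviant

rri = [0.8, 0.82, 0.25, 0.79, 1.6, 0.81]
mask = filter_rri(rri)
print(mask.tolist())  # [False, False, True, False, True, False]
```

Flagged intervals would then be handed to the merge or imputation steps rather than simply dropped.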
Figure 3. In the right merge, an interval RRi < 0.3 s is added to the next value RRi+1, and RRi and its timestamp are deleted; the new value is RRj = RRi + RRi+1. In the left merge, RRi is added to the previous value RRi-1 and placed at ti, and RRi-1 as well as its timestamp are deleted to respect the abscissa-ordinate equality; the new value is RRj = RRi-1 + RRi. The subscript i indexes the initial RR intervals, and the new value is referred to as RRj.
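The merge procedure of Figure 3 can be sketched as follows. This minimal version implements only the right merge (adding a too-short interval to its successor and dropping its timestamp); names and the example series are illustrative:

```python
def merge_short_rri(t, rri, min_rri=0.3):
    """Right-merge RR intervals shorter than min_rri (s) into the next
    interval, deleting the short value and its timestamp (sketch of the
    merge step shown in Figure 3)."""
    t, rri = list(t), list(rri)
    i = 0
    while i < len(rri):
        if rri[i] < min_rri and i + 1 < len(rri):
            rri[i + 1] += rri[i]   # new value RRj = RRi + RRi+1
            del rri[i], t[i]       # delete RRi and its timestamp
        else:
            i += 1
    return t, rri

t, rri = merge_short_rri([0.0, 0.8, 1.0, 1.8], [0.8, 0.2, 0.8, 0.8])
print(rri)  # [0.8, 1.0, 0.8]
```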
Figure 4. Data imputation using the DVC method. In the first iteration, T1 is computed and RR1 is randomly generated. The same process is repeated until the remaining gap is < 1.3 s, and the last RR is the time difference between the last two timestamps.
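A minimal sketch of the iterative Gaussian imputation described in Figure 4: RR values are drawn from a Gaussian fitted to the clean intervals until the remaining gap is below 1.3 s, and the leftover time becomes the last RR. The acceptance rule and parameter names are illustrative simplifications, not the exact DVC algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def impute_gap(gap, clean_rri, upper=1.3, lower=0.3):
    """Fill a gap of `gap` seconds with RR values drawn from a Gaussian
    fitted to the clean RR intervals (sketch of the iterative,
    distribution-based imputation)."""
    mu, sigma = np.mean(clean_rri), np.std(clean_rri)
    filled, remaining = [], gap
    while remaining >= upper:
        rr = rng.normal(mu, sigma)
        if lower < rr < min(upper, remaining):  # keep physiological draws
            filled.append(rr)
            remaining -= rr
    filled.append(remaining)  # last RR = time left before the next timestamp
    return filled

clean = [0.78, 0.80, 0.82, 0.79, 0.81]
filled = impute_gap(4.0, clean)
print(round(sum(filled), 6))  # 4.0 -- the gap is filled exactly
```

Because the draws follow the observed distribution, the reconstructed segment keeps the signal's variability instead of the smooth trajectory an interpolant would produce.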
Figure 5. Poincaré plot analysis with the ellipse fitting procedure. SD1 and SD2 are the standard deviations perpendicular to and along the line of identity, respectively. Adapted with permission from [35]. © 2016–2021 Kubios Oy.
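SD1 and SD2 can be computed from successive RR pairs with the standard Poincaré formulation (a rotation of the (RRn, RRn+1) scatter by 45°); the example series is illustrative:

```python
import numpy as np

def poincare_sd(rri):
    """SD1/SD2 from the Poincaré plot of successive RR intervals:
    SD1 is the spread perpendicular to the identity line, SD2 along it."""
    x, y = np.asarray(rri[:-1]), np.asarray(rri[1:])
    sd1 = np.std((y - x) / np.sqrt(2))
    sd2 = np.std((y + x) / np.sqrt(2))
    return sd1, sd2

sd1, sd2 = poincare_sd([0.80, 0.82, 0.78, 0.84, 0.79])
```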
Figure 6. Example of data imputation for 20% deleted data.
Figure 7. Example of standardized SDNN extracted from the raw and reconstructed signals of 10 min length.
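A sketch of how SDNN could be extracted per window and standardized for the comparison in Figure 7; the window values and helper names are illustrative assumptions:

```python
import numpy as np

def sdnn(rri_ms):
    """SDNN: standard deviation of normal-to-normal intervals (ms)."""
    return float(np.std(rri_ms, ddof=1))

def standardize(values):
    """Z-score standardization, to compare a feature across raw and
    reconstructed signals on a common scale (illustrative)."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)

windows = [[800, 820, 790, 810], [760, 900, 700, 880]]
z = standardize([sdnn(w) for w in windows])
```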
RF hyper-parameters used for classification.
| RF Hyper-Parameters |
|---|
| criterion = 'entropy', max_features = 0.6, min_samples_split = 3, n_estimators = 500 |
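The reported hyper-parameters map directly onto scikit-learn's RandomForestClassifier; shown here as a configuration fragment (the sklearn call is commented out to keep the snippet dependency-free):

```python
# Hyper-parameters reported for the random forest classifier, in the
# form expected by sklearn.ensemble.RandomForestClassifier.
rf_params = {
    "criterion": "entropy",        # split quality measured by information gain
    "max_features": 0.6,           # fraction of features considered per split
    "min_samples_split": 3,        # minimum samples required to split a node
    "n_estimators": 500,           # number of trees in the forest
}

# from sklearn.ensemble import RandomForestClassifier
# clf = RandomForestClassifier(**rf_params)  # requires scikit-learn
```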
F1 scores for each data imputation method.

| F1 Scores | | | | |
|---|---|---|---|---|
| 5% | 5% | 0.54 | 0.53 | 0.56 |
| 5% | 10% | 0.52 | 0.51 | 0.54 |
| 5% | 15% | 0.48 | 0.47 | 0.55 |
| 5% | 20% | 0.45 | 0.45 | 0.55 |
| 5% | 25% | 0.44 | 0.43 | 0.55 |
| 5% | 30% | 0.44 | 0.43 | 0.55 |
| 5% | 35% | 0.44 | 0.43 | 0.55 |
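For reference, linear interpolation of missing RR samples can be done with NumPy alone (the pchip and spline baselines would use scipy.interpolate's PchipInterpolator and CubicSpline). Timestamps and values below are illustrative:

```python
import numpy as np

# Linear interpolation across a gap in the RR series; pchip and spline
# variants differ only in the interpolating function used.
t = np.array([0.0, 0.8, 1.6, 4.0, 4.8])         # timestamps (s), with a gap
rri = np.array([0.78, 0.80, 0.82, 0.86, 0.88])  # observed RR intervals (s)
t_missing = np.array([2.4, 3.2])                # timestamps lost in the gap

rri_filled = np.interp(t_missing, t, rri)       # values inside the gap
```

Note that, unlike the Gaussian imputation, any such interpolant draws a smooth curve through the gap and therefore flattens the beat-to-beat variability that HRV features depend on.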
Summary table for advantages and disadvantages of data imputation methods.
| Method | Advantages | Disadvantages |
|---|---|---|
| Linear | - Makes fewer assumptions than the other methods - Simple and efficient for good-quality signals | - Less effective for signals with a lot of missing data - Loss of time dependency |
| Pchip | - Preserves the linear trend and the slightly non-linear contributions in the RR time series […] | - Less effective for signals with a lot of missing data - Loss of time dependency |
| Spline | - Can capture abrupt variations when data quality is good | - Introduces outliers due to oscillation of the interpolation function […] |
| DVC | - Adaptive to data distribution and variability - No ectopic values in the processed signal - Preserves signal’s time dependency - Effective for low quality signals | - Computationally expensive - Algorithm could be optimised |