Marta Vigier, Benjamin Vigier, Elisabeth Andritsch, Andreas R Schwerdtfeger.
Abstract
Most cancer patients exhibit autonomic dysfunction with attenuated heart rate variability (HRV) levels compared to healthy controls. This research aimed to create and evaluate a machine learning (ML) model that discriminates between cancer patients and healthy controls based on 5-min ECG recordings. We selected 12 HRV features based on previous research and compared them between cancer patients and healthy individuals using the Wilcoxon rank-sum test. Recursive Feature Elimination (RFE) identified the top five features, which were averaged over 5 min and used as input to three different ML classifiers. Next, we created an ensemble model based on a stacking method that aggregated the predictions from all three base classifiers. All HRV features were significantly different between the two groups. SDNN, RMSSD, pNN50%, HRV triangular index, and SD1 were selected by RFE and used as input to the three ML classifiers. All three base classifiers performed above chance level, with RF the most accurate at a testing accuracy of 83%. The ensemble model showed a classification accuracy of 86% and an AUC of 0.95. The results obtained by the ML algorithms suggest that HRV parameters could be a reliable input for differentiating between cancer patients and healthy controls. Results should be interpreted in light of some limitations that call for replication studies with larger sample sizes.
Year: 2021 PMID: 34785733 PMCID: PMC8595703 DOI: 10.1038/s41598-021-01779-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Cancer patients characteristics.
| Patient characteristics | N = 77 |
|---|---|
| 50 | |
| Sex | |
| Male | 30 |
| Female | 47 |
| Cancer type | |
| Breast | 33 |
| Colorectal | 29 |
| Lung | 3 |
| Pancreas | 10 |
| Prostate | 2 |
| Stage | |
| I | 32 |
| II | 7 |
| III | 12 |
| IV | 26 |
HRV features.
| HRV features | Type | Description |
|---|---|---|
| Mean RR (ms) | Time-domain | The average of RR intervals during a period of time |
| SDNN (ms) | Time-domain | Standard deviation of NN intervals |
| RMSSD (ms) | Time-domain | Root mean square of successive RR interval differences |
| pNN50% | Time-domain | Percentage of successive RR intervals that differ by more than 50 ms |
| HRV triangular index | Time-domain | The integral of the sample density distribution of RR intervals divided by the maximum of the density distribution |
| TINN (ms) | Time-domain | Baseline width of the RR interval histogram |
| LF power % | Frequency-domain | Includes the frequency range between 0.04 Hz and 0.15 Hz |
| HF power % | Frequency-domain | Includes the frequency range between 0.16 Hz and 0.4 Hz |
| Total Power (ms²) | Frequency-domain | Reflects the overall autonomic activity |
| SD1 | Non-linear | Poincaré plot standard deviation perpendicular to the line of identity |
| SD2 | Non-linear | Poincaré plot standard deviation along the line of identity |
| Sample Entropy | Non-linear | Measures the regularity and complexity of a time series |
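The time-domain definitions above, together with SD1, can be sketched directly from a list of RR intervals. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function name `hrv_time_domain`, the dictionary keys, and the sample RR values are assumptions made for the example, and the sketch assumes the RR series has already been cleaned of artifacts and ectopic beats.

```python
import math
import statistics


def hrv_time_domain(rr_ms):
    """Compute a few HRV features from RR intervals given in milliseconds.

    Hypothetical helper for illustration; assumes an artifact-free RR series.
    """
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")

    # Successive differences between adjacent RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]

    mean_rr = statistics.fmean(rr_ms)
    # SDNN: standard deviation of the intervals (sample SD, n - 1).
    sdnn = statistics.stdev(rr_ms)
    # RMSSD: root mean square of successive differences.
    rmssd = math.sqrt(statistics.fmean([d * d for d in diffs]))
    # pNN50: percentage of successive differences larger than 50 ms.
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    # SD1: Poincare plot spread perpendicular to the line of identity,
    # equal to the SD of successive differences divided by sqrt(2).
    sd1 = statistics.stdev(diffs) / math.sqrt(2)

    return {
        "mean_rr": mean_rr,
        "sdnn": sdnn,
        "rmssd": rmssd,
        "pnn50": pnn50,
        "sd1": sd1,
    }


# Made-up RR intervals (ms), purely for demonstration.
features = hrv_time_domain([800, 810, 790, 860, 800, 805])
print(features["pnn50"])  # 40.0 (2 of the 5 successive differences exceed 50 ms)
```

When the mean successive difference is close to zero, SD1 is approximately RMSSD divided by √2, which is why the two features tend to carry nearly the same information.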
HRV features comparison between cancer and control individuals.
| HRV features | Cancer | Control | Test statistic | p-value |
|---|---|---|---|---|
| Mean RR (ms) | 717.91/93.18 | 832.9/108.9 | 5119 | < 0.001 |
| SDNN (ms) | 21.4/7.57 | 35.37/14.76 | 5289 | < 0.001 |
| RMSSD (ms) | 13.87/4.49 | 29.26/16.47 | 5332 | < 0.001 |
| pNN50% | 0.62/0.87 | 7.82/12.15 | 5345 | < 0.001 |
| HRV triangular index | 5.5/1.73 | 8.95/3.57 | 5503.5 | < 0.001 |
| TINN (ms) | 126.69/57.68 | 194.6/101.61 | 4625 | 0.013 |
| LF power % | 69.27/12.22 | 60.91/17.85 | 5160 | < 0.001 |
| HF power % | 16.69/9.17 | 28.92/18.2 | 5522 | < 0.001 |
| Total power (ms²) | 440.23/357.83 | 1294/1231.43 | 5425 | < 0.001 |
| SD1 | 9.81/3.18 | 20.72/11.66 | 5332 | < 0.001 |
| SD2 | 28/10.55 | 45.03/18.6 | 5290 | < 0.001 |
| Sample entropy | 1.37/0.34 | 1.58/0.32 | 4811 | 0.002 |
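For readers unfamiliar with the Wilcoxon rank-sum test named in the abstract, the following sketch shows how a rank-sum statistic of this kind can be computed for two independent samples. It is an illustration only: the function names `average_ranks` and `rank_sum_w` are hypothetical, the input values are invented, and the source does not state which software, sample ordering, or tie correction produced the reported statistics. A p-value would additionally require the null distribution of the statistic, which is omitted here.

```python
def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_w(x, y):
    """Rank-sum statistic for sample x versus sample y.

    Computed as the sum of the ranks of x in the pooled sample minus
    n_x * (n_x + 1) / 2, so it ranges from 0 to len(x) * len(y).
    """
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Invented values, not taken from the study.
print(rank_sum_w([1, 2, 3], [4, 5, 6]))  # 0.0 (every x value is below every y value)
print(rank_sum_w([4, 5, 6], [1, 2, 3]))  # 9.0 (the maximum, len(x) * len(y))
```

A statistic near 0 or near `len(x) * len(y)` indicates strong separation between the groups, while a value near half the maximum indicates heavy overlap.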
Results of recursive features elimination algorithm applied to 12 prior knowledge-based selected HRV features.
| Variables | Accuracy | Kappa | Accuracy SD | Kappa SD | Selected |
|---|---|---|---|---|---|
| 4 | 0.8 | 0.58 | 0.1 | 0.21 | |
| 8 | 0.83 | 0.64 | 0.1 | 0.2 | * |
| 12 | 0.82 | 0.62 | 0.1 | 0.21 | |
| Top 5 out of 8 | RMSSD, SD1, SDNN, pNN50, HRV.triangular.index | | | | |
Figure 1. Density plots for the five most important HRV features selected by the RFE method. The density plots illustrate the distribution of the top features between control individuals (pink) and cancer patients (blue).
Prediction performance of base classifiers and stacked ensemble.
| Classifier | Accuracy | Kappa | ROC | Sensitivity | Specificity |
|---|---|---|---|---|---|
| LDA | 0.798 | 0.6 | 0.91 | 0.88 | 0.74 |
| NB | 0.790 | 0.58 | 0.89 | 0.67 | 0.89 |
| RF | 0.849 | 0.7 | 0.91 | 0.83 | 0.86 |
| Ensemble | 0.93 | 0.86 | 0.96 | 0.85 | 0.92 |
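The accuracy, Kappa, sensitivity, and specificity columns can all be derived from a binary confusion matrix. The sketch below shows the arithmetic under stated assumptions: the function name `binary_metrics` is hypothetical, and the counts in the usage example are invented for illustration and are not the study's confusion matrix.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, Cohen's kappa, sensitivity and specificity from 2x2 counts.

    tp/fn: positive cases predicted positive/negative.
    fp/tn: negative cases predicted positive/negative.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate

    # Agreement expected by chance, from the marginal totals.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((tn + fp) / n) * ((tn + fn) / n)
    p_chance = p_pos + p_neg

    # Kappa rescales accuracy so that 0 means chance-level agreement.
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = binary_metrics(tp=40, fn=10, fp=5, tn=45)
print(round(m["accuracy"], 2), round(m["kappa"], 2))  # 0.85 0.7
```

With perfectly balanced classes and balanced predictions, chance agreement is 0.5 and Kappa reduces to twice the accuracy minus one, which is why Kappa is always the more conservative of the two figures.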
Figure 2. Accuracy and Kappa statistics for different classifiers.
Figure 3. ROC, sensitivity and specificity for different classifiers.
Figure 4. Correlation between the results of base classifiers.
Figure 5. Confusion matrix showing the meta-classifier results for the testing dataset.
Figure 6. Receiver operating characteristic curve for the meta-classifier model on the testing dataset. The horizontal axis represents the false-positive rate (1 − Specificity). The vertical axis represents the true-positive rate (Sensitivity).