Lucas Marzec, Sridharan Raghavan, Farnoush Banaei-Kashani, Seth Creasy, Edward L Melanson, Leslie Lange, Debashis Ghosh, Michael A Rosenberg.
Abstract
Low levels of physical activity are associated with increased mortality risk, especially in cardiac patients, but most studies are based on self-report. Cardiac implantable electronic devices (CIEDs) offer an opportunity to collect data for longer periods of time. However, there is limited agreement on the best approaches for quantification of activity measures due to the time series nature of the data. We examined physical activity time series data from 235 subjects with CIEDs and at least 365 days of uninterrupted measures. Summary statistics for raw daily physical activity (minutes/day), including statistical moments (e.g., mean, standard deviation, skewness, kurtosis), time series regression coefficients, frequency domain components, and forecasted predicted values, were calculated for each individual, and used to predict occurrence of ventricular tachycardia (VT) events as recorded by the device. In unsupervised analyses using principal component analysis, we found that while certain features tended to cluster near each other, most provided a reasonable spread across activity space without a large degree of redundancy. In supervised analyses, we found several features that were associated with the outcome (P < 0.05) in univariable and multivariable approaches, but few were consistent across models. Using a machine-learning approach in which the data was split into training and testing sets, and models ranging in complexity from simple univariable logistic regression to ensemble decision trees were fit, there was no improvement in classification of risk over naïve methods for any approach. Although standard approaches identified summary features of physical activity data that were correlated with risk of VT, machine-learning approaches found that none of these features provided an improvement in classification. 
Future studies are needed to explore and validate methods for feature extraction and machine learning in classification of VT risk based on device-measured activity.
Year: 2018 PMID: 30372463 PMCID: PMC6205644 DOI: 10.1371/journal.pone.0206153
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. A. First two principal components of activity features. B. Cumulative variance explained by principal components.
Device types and episodes.
| Type | Number (%) | Episodes (%) |
|---|---|---|
| ICD—Single chamber | 88 (37%) | 16 (18%) |
| ICD—Dual chamber | 46 (20%) | 16 (35%) |
| CRT-D | 59 (25%) | 17 (29%) |
| Pacemaker—Single chamber | 6 (3%) | - |
| Pacemaker—Dual chamber | 30 (13%) | - |
| CRT-P | 6 (3%) | - |
ICD = Implantable cardioverter-defibrillator, CRT-D = Cardiac resynchronization device-defibrillator, CRT-P = Cardiac resynchronization device-pacemaker
Physical activity summary information for each patient (N = 235).
| Feature | Value |
|---|---|
| Mean | 124.0 ± 61.5 |
| SD | 40.3 ± 19.3 |
| Skew | 0.74 ± 0.80 |
| Kurtosis | 1.8 ± 6.1 |
| Max | 277.7 ± 112.4 |
| Min | 34.6 ± 28.3 |
| Slope | -0.04 ± 0.09 |
| Intercept | 130.6 ± 63.2 |
| ACF1 | -0.43 ± 0.09 |
| ACF2 | -0.03 ± 0.10 |
| ACF7 | 0.10 ± 0.14 |
| ACF14 | 0.11 ± 0.13 |
| PACF1 | -0.43 ± 0.09 |
| PACF2 | -0.27 ± 0.07 |
| PACF7 | -0.06 ± 0.06 |
| PACF14 | -0.03 ± 0.06 |
| 7-day forecast | 122.3 ± 67.6 |
| 30-day forecast | 123.4 ± 65.3 |
| 60-day forecast | 121.1 ± 59.0 |
| 90-day forecast | 123.6 ± 65.9 |
| Top Period (days) | 6.9 |
All values except Top Period are mean ± standard deviation of physical activity across all patients. Daily physical activity is measured in minutes/day. Forecasts were obtained from autoregressive integrated moving average (ARIMA) (1, 0, 1) models. Top Period is the mode (in days) across patients, obtained from the fast Fourier transform of activity, and corresponds to the highest peak of the frequency plot for each patient. See Methods for details.
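To make the per-patient features in this table concrete, the following is a minimal sketch of how several of them could be computed from one daily activity series. It is not the authors' code: the function names, the synthetic series, and the exact estimator definitions (moment-based skewness, excess kurtosis, biased autocorrelation) are assumptions, and a direct discrete Fourier transform stands in for the fast Fourier transform named in the note. ARIMA forecasts and partial autocorrelations are omitted.

```python
import cmath
import math


def moments(x):
    """Mean, sample SD, skewness, and excess kurtosis of a series."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    sd = math.sqrt(m2 * n / (n - 1))
    # Assumed definitions: the source does not say which estimators were used.
    return mean, sd, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def linear_trend(x):
    """Least-squares slope and intercept of activity against day index."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    slope = sxy / sxx
    return slope, x_mean - slope * t_mean


def acf(x, lag):
    """Sample autocorrelation at the given lag (in days)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


def top_period(x):
    """Period (in days) of the strongest non-zero-frequency DFT component."""
    n = len(x)
    mean = sum(x) / n
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coef = sum((v - mean) * cmath.exp(-2j * math.pi * k * t / n)
                   for t, v in enumerate(x))
        power = abs(coef) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k


# Synthetic example (not study data): 52 weeks of activity in minutes/day
# with a slight downward trend and a weekly cycle.
series = [120 - 0.04 * t + 30 * math.sin(2 * math.pi * t / 7) for t in range(364)]

mean, sd, skew, kurt = moments(series)
slope, intercept = linear_trend(series)
print(f"mean={mean:.1f} sd={sd:.1f} skew={skew:.2f} kurtosis={kurt:.2f}")
print(f"slope={slope:.3f} intercept={intercept:.1f}")
print(f"ACF7={acf(series, 7):.2f} top period={top_period(series):.1f} days")
```

On this synthetic series the weekly cycle dominates the spectrum, which is the same kind of pattern a Top Period near 7 days would reflect in real data.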
Fig 2. A. Univariate association with VT episodes (t-test). B. Multivariable logistic regression p-values for association with VT episodes. Dashed red line: p = 0.05.
Fig 3. Variable importance plot from random forest model.
Predictive accuracy of different models for VT episodes.
| Model | Accuracy | AUC | F1 score |
|---|---|---|---|
| Naïve | 74.5% | 0.50 | NA |
| UV Logistic Regression | 74.5% | 0.50 | NA |
| MV Logistic Regression | 70.2% | 0.61 | 0.417 |
| Penalized Logistic Regression | 74.5% | 0.50 | NA |
| Bagged Decision Tree | 74.5% | 0.55 | 0.250 |
| Random Forest | 76.6% | 0.57 | 0.267 |
| Boosted Decision Tree | 70.2% | 0.50 | 0.125 |
| KNN (k = 1) | 55.3% | 0.43 | 0.16 |
| KNN (k = 10) | 72.3% | 0.49 | 0.00 |
| SVC | 74.5% | 0.50 | NA |
| SVM | 74.5% | 0.50 | NA |
Note: Penalized Logistic Regression includes lasso, ridge, and elastic net regression models (result was the same across models). UV = Univariable (standard deviation, skew, kurtosis, and 60-day forecast, separately), MV = Multivariable, KNN = K-nearest neighbors classifier, SVC = Support vector classifier, SVM = Support vector machine, AUC = Area under the receiver operating characteristic curve. F1 score is the harmonic mean of precision and recall (range 0–1).
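A short sketch may help explain why a model can reach 74.5% accuracy while its F1 score is NA. The confusion-matrix counts below are hypothetical (the source does not report them); they were chosen only so that the outputs resemble the Naïve and KNN (k = 10) rows. A classifier that never predicts VT has no positive predictions, so precision, and therefore F1, is undefined.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; None when either is undefined."""
    if tp + fp == 0 or tp + fn == 0:
        return None  # no positive predictions or no positive cases: reported as NA
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical test set: 47 patients, 12 of them with a VT episode.
# "Naive" model: always predicts no VT.
naive = {"tp": 0, "fp": 0, "fn": 12, "tn": 35}
# A model that flags one patient, incorrectly.
one_false_alarm = {"tp": 0, "fp": 1, "fn": 12, "tn": 34}

naive_acc = round(100 * accuracy(**naive), 1)
naive_f1 = f1_score(naive["tp"], naive["fp"], naive["fn"])
alarm_acc = round(100 * accuracy(**one_false_alarm), 1)
alarm_f1 = f1_score(one_false_alarm["tp"], one_false_alarm["fp"], one_false_alarm["fn"])

print(naive_acc, naive_f1)   # 74.5 None
print(alarm_acc, alarm_f1)   # 72.3 0.0
```

Because the outcome is imbalanced, accuracy alone rewards always predicting the majority class, which is why the table also reports AUC and F1.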