Jong-Hwan Jang, Junggu Choi, Hyun Woong Roh, Sang Joon Son, Chang Hyung Hong, Eun Young Kim, Tae Young Kim, Dukyong Yoon.
Abstract
BACKGROUND: Data collected by an actigraphy device worn on the wrist or waist can provide objective measurements for studies related to physical activity; however, some data may contain intervals where values are missing. In previous studies, statistical methods have been applied to impute missing values on the basis of statistical assumptions. Deep learning algorithms, however, can learn features from the data without any such assumptions and may outperform previous approaches in imputation tasks.
Keywords: accelerometer; actigraphy; autoencoder; deep learning; imputation
Year: 2020 PMID: 32445459 PMCID: PMC7413283 DOI: 10.2196/16113
Source DB: PubMed Journal: JMIR Mhealth Uhealth ISSN: 2291-5222 Impact factor: 4.773
Figure 1. Overview of the study and data, where n indicates the number of records (days). IV: intradaily variability; KCCDB: Korean Chronic Cerebrovascular Disease Oriented Biobank; KNHANES: Korea National Health and Nutrition Examination Survey; MVPA: moderate-to-vigorous physical activity; NHANES: National Health and Nutrition Examination Survey; PRMSE: partial root mean square error; PMAE: partial mean absolute error; RMSE: root mean square error.
Figure 2. (a) Frequency of missing data intervals found in the NHANES data set. The interval of approximately 30 minutes occurred most frequently. (b) Example of a complete data record and of a record with a missing data interval.
Figure 3. Model architecture of the zero-inflated denoising convolutional autoencoder. The encoder consists of 5 convolutional layers in which the filter size and stride decrease while the number of feature maps increases. The decoder consists of 5 transconvolutional layers whose hyperparameters mirror those of the encoder.
Baseline characteristics of the data sets.
| Characteristics | NHANES^a (n=12,475) | KNHANES^b (n=1768) | KCCDB^c (n=177) |
| Age (years), mean (SD) | 39.04 (22.27) | 42.88 (13.04) | 74.07 (7.05) |
| Sex, n (%) | | | |
| Male | 6077 (48.71) | 662 (37.44) | 56 (31.63) |
| Female | 6398 (51.28) | 1106 (62.55) | 121 (68.36) |
| Weight (kg), mean (SD) | 75.26 (21.73) | 63.35 (11.97) | 59.03 (10.04) |
| Height (cm), mean (SD) | 166.01 (11.72) | 163.45 (8.55) | 156.96 (8.33) |
| BMI (kg/m2), mean (SD) | 27.03 (6.56) | 23.62 (3.48) | 22.66 (7.19) |
| Activity (count), mean (SD) | 344 (694.23) | 433 (586.78) | 637 (1121.27) |
^a NHANES: National Health and Nutrition Examination Survey data set (device: ActiGraph AM-7164; type: uniaxial; sample rate: 0.016 Hz).
^b KNHANES: Korea National Health and Nutrition Examination Survey data set (device: ActiGraph GT3X+; type: triaxial; sample rate: 0.016 Hz).
^c KCCDB: Korean Chronic Cerebrovascular Disease Oriented Biobank data set (device: Fitmeter; type: triaxial; sample rate: 0.01 Hz).
Result of 10-fold cross-validation.
| Experiment | Hyperparameter: latent vector size, k | Hyperparameter: filter size, m | Result: RMSE^a (count), mean |
| 1 | 40 | 20 | 830.5 |
| 2 | 40 | 30 | 838.0 |
| 3 | 60 | 20 | 858.7 |
| 4 | 60 | 30 | 788.4 |
| 5 | 80 | 20 | 825.1 |
| 6 | 80 | 30 | 831.0 |
^a RMSE: root mean square error.
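The choice of hyperparameters implied by the cross-validation results can be reproduced in a few lines. The sketch below copies the six mean RMSE values from the table above and picks the pair with the lowest error; the variable names are illustrative and do not come from the study.

```python
# Mean 10-fold cross-validation RMSE (counts) copied from the table above,
# keyed by (latent vector size k, filter size m).
cv_rmse = {
    (40, 20): 830.5,
    (40, 30): 838.0,
    (60, 20): 858.7,
    (60, 30): 788.4,
    (80, 20): 825.1,
    (80, 30): 831.0,
}

# Choose the hyperparameter pair with the smallest mean RMSE.
best_k, best_m = min(cv_rmse, key=cv_rmse.get)
best_rmse = cv_rmse[(best_k, best_m)]
print(f"best setting: k={best_k}, m={best_m}, RMSE={best_rmse}")
```

With these numbers the minimum is experiment 4 (k=60, m=30), which agrees with the first-layer filter size of 30 and the latent size of 60 listed in the hyperparameter settings table.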
Hyperparameter settings for the zero-inflated denoising convolutional autoencoder.
| Component layer (order of layer) | Filter, n^a × size (stride) | Feature map output size, n × size |
| Input | | 1×720 |
| Encoder | | |
| Convolution (1) | 8×30^b (2) | 8×346 |
| Convolution (2) | 16×20 (2) | 16×164 |
| Convolution (3) | 32×10 (2) | 32×78 |
| Convolution (4) | 64×10 (1) | 64×69 |
| Convolution (5) | 128×10 (1) | 128×60^c |
| Decoder | | |
| Transconvolution (6) | 64×10 (1) | 64×69 |
| Transconvolution (7) | 32×10 (1) | 32×78 |
| Transconvolution (8) | 16×10 (2) | 16×164 |
| Transconvolution (9) | 8×20 (2) | 8×346 |
| Transconvolution (10^d) | 1×30 (2) | 1×720 |
^a Number of filters, n.
^b Filter size, m.
^c Latent vector size, k, extracted by the encoder.
^d Number of layers, q.
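The feature map sizes in the table follow from the standard length formulas for 1-D convolution and transposed convolution. The sketch below recomputes them, assuming no padding and no dilation (an assumption; the table states neither). The helper names `conv_out` and `transconv_out` are hypothetical.

```python
def conv_out(length, kernel, stride):
    """Output length of a 1-D convolution with no padding."""
    return (length - kernel) // stride + 1


def transconv_out(length, kernel, stride):
    """Output length of a 1-D transposed convolution with no padding."""
    return (length - 1) * stride + kernel


# (filter size, stride) per layer, copied from the table above.
encoder = [(30, 2), (20, 2), (10, 2), (10, 1), (10, 1)]
decoder = [(10, 1), (10, 1), (10, 2), (20, 2), (30, 2)]

length = 720  # input record length from the table

# Walk the encoder, recording the length after each convolution.
encoder_sizes = []
for kernel, stride in encoder:
    length = conv_out(length, kernel, stride)
    encoder_sizes.append(length)

# Walk the decoder, starting from the latent length.
decoder_sizes = []
for kernel, stride in decoder:
    length = transconv_out(length, kernel, stride)
    decoder_sizes.append(length)

print(encoder_sizes)
print(decoder_sizes)
```

Under these assumptions the encoder lengths come out as 346, 164, 78, 69, and 60, and the decoder lengths as 69, 78, 164, 346, and 720, in agreement with the table.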
Figure 4. Examples of (a) NHANES, (b) KNHANES, and (c) KCCDB data sets for zero-inflated denoising convolutional autoencoder (left), zero-inflated Poisson regression (center), and Bayesian regression (right) with imputed (red) and original (green) intervals within the record (blue). KCCDB: Korean Chronic Cerebrovascular Disease Oriented Biobank; KNHANES: Korea National Health and Nutrition Examination Survey; ZI-DCAE: zero-inflated denoising convolutional autoencoder; ZIP: zero-inflated Poisson regression.
Imputation performance results for the comparison methods.
| Data set, measurement | ZI-DCAE^a | Mean imputation | Zero-inflated Poisson regression | Bayesian regression |
| NHANES^b | | | | |
| Partial RMSE^c (counts) | 839.3 | 1053.2 | 1255.6 | 924.5 |
| Partial MAE^d (counts) | 431.1 | 545.4 | 508.5 | 605.8 |
| RMSE of SD (counts) | 35.1 | 65.2 | 69.2 | 34.2 |
| RMSE of intradaily variability index | 0.047 | 0.067 | 0.060 | 0.071 |
| RMSE of moderate-to-vigorous physical activity (minutes) | 12.3 | 16.2 | 16.2 | 11.0 |
| KNHANES^e | | | | |
| Partial RMSE (counts) | 672.1 | 660.0 | 778.8 | 824.1 |
| Partial MAE (counts) | 396.3 | 419.7 | 395.5 | 555.5 |
| RMSE of SD (counts) | 24.4 | 26.5 | 26.0 | 24.7 |
| RMSE of intradaily variability index | 0.037 | 0.039 | 0.040 | 0.050 |
| RMSE of moderate-to-vigorous physical activity (minutes) | 12.9 | 14.7 | 14.6 | 12.2 |
| KCCDB^f | | | | |
| Partial RMSE (counts) | 1217.2 | 1313.2 | 1638.4 | 1139.4 |
| Partial MAE (counts) | 819.6 | 1045.8 | 1161.6 | 810.7 |
| RMSE of SD (counts) | 27.1 | 30.8 | 29.6 | 27.7 |
| RMSE of intradaily variability index | 0.02 | 0.036 | 0.041 | 0.018 |
| RMSE of moderate-to-vigorous physical activity (minutes) | 13.4 | 14.9 | 14.9 | 13.6 |
^a ZI-DCAE: zero-inflated denoising convolutional autoencoder.
^b NHANES: National Health and Nutrition Examination Survey data set.
^c RMSE: root mean square error.
^d MAE: mean absolute error.
^e KNHANES: Korea National Health and Nutrition Examination Survey data set.
^f KCCDB: Korean Chronic Cerebrovascular Disease Oriented Biobank data set.
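The partial RMSE and partial MAE rows can be illustrated with a small sketch. It assumes that "partial" means the error is computed only over the interval that was missing and then imputed, which this record does not spell out. The toy counts, the interval bounds, and the simple mean-fill rule are made up for illustration and are not study data or the study's exact mean imputation procedure.

```python
import math


def partial_errors(original, imputed, start, end):
    """Return (RMSE, MAE) over positions start..end-1 only.

    Assumption: "partial" restricts the error to the interval
    that was missing and then imputed.
    """
    diffs = [imputed[i] - original[i] for i in range(start, end)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return rmse, mae


# Toy record of activity counts (made-up values, not study data).
original = [0, 120, 300, 450, 200, 80, 0, 0]
start, end = 2, 5  # hypothetical missing interval

# Simplified mean imputation: fill the gap with the mean of the
# values observed outside the gap.
observed = original[:start] + original[end:]
fill = sum(observed) / len(observed)
imputed = original[:start] + [fill] * (end - start) + original[end:]

rmse, mae = partial_errors(original, imputed, start, end)
print(round(rmse, 1), round(mae, 1))
```

Restricting the error to the imputed interval keeps the unchanged part of the record, which has zero error by construction, from diluting the comparison between methods.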