Ryan Moore, Kristin R Archer, Leena Choi.
Abstract
Accelerometers are increasingly being used in biomedical research, but the analysis of accelerometry data is often complicated by both the massive size of the datasets and the collection of unwanted data from the process of delivery to study participants. Current methods for removing delivery data involve arduous manual review of dense datasets. We aimed to develop models for the classification of days in accelerometry data as activity from human wear or the delivery process. These models can be used to automate the cleaning of accelerometry datasets that are adulterated with activity from delivery. We developed statistical and machine learning models for the classification of accelerometry data in a supervised learning context using a large accelerometry dataset labeled for human activity and delivery. Model performances were assessed and compared using Monte Carlo cross-validation. We found that a hybrid convolutional recurrent neural network performed best in the classification task with an F1 score of 0.960, but simpler models such as logistic regression and random forest also had excellent performance, with F1 scores of 0.951 and 0.957, respectively. The best performing models and related data processing techniques are made publicly available in the R package PhysicalActivity.
Keywords: accelerometry; machine learning; neural networks; physical activity; predictive modeling; statistical learning
Year: 2021 PMID: 33924388 PMCID: PMC8069625 DOI: 10.3390/s21082726
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Example of accelerometry data for an assessment. The black lines represent the measurements on the x-axis with a one-minute epoch. Vertical dashed blue lines indicate midnight. The ‘Delivery’ label indicates 0 for a human wear day and 1 for a delivery day. The red text enumerates the day of the assessment.
Summary of methods to generate minimally and fully processed data.
| Minimally Processed | Fully Processed |
|---|---|
| Zero-pad days with <1440 min | Zero-pad days to 1440 min |
| | Remove days with <5000 total counts |
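The two processing levels in the table above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation in the R package: the function names, the dictionary layout (day number mapped to a list of one-minute counts), and the choice to pad with trailing zeros are all assumptions made for the example. It also covers only the two rules listed in the table.

```python
from typing import Dict, List

MINUTES_PER_DAY = 1440    # one-minute epochs in a full day
MIN_TOTAL_COUNTS = 5000   # removal threshold taken from the table


def zero_pad_day(counts: List[int], length: int = MINUTES_PER_DAY) -> List[int]:
    """Pad a partial day with zeros so it has exactly `length` epochs.

    Assumption: zeros are appended at the end; the source does not say
    where the padding is placed.
    """
    if len(counts) > length:
        raise ValueError("a day cannot have more than 1440 one-minute epochs")
    return counts + [0] * (length - len(counts))


def minimally_process(days: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Keep every day; only zero-pad the short ones."""
    return {day: zero_pad_day(counts) for day, counts in days.items()}


def fully_process(days: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Zero-pad, then drop days with fewer than 5000 total counts."""
    padded = minimally_process(days)
    return {day: c for day, c in padded.items() if sum(c) >= MIN_TOTAL_COUNTS}


# Hypothetical toy data: day 1 is short and nearly empty, day 2 is complete.
example = {1: [10] * 100, 2: [5] * 1440}
print({d: len(c) for d, c in minimally_process(example).items()})
print(sorted(fully_process(example)))
```

In this toy input, day 1 totals 1000 counts and is dropped by the full processing, while day 2 totals 7200 counts and is kept.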
Figure 2. Example of minimally and fully processed data. The minimally processed data retains all days from the raw data. The full processing removes days with non-compliant human activity (days 10–11) and delivery days with little information (days 1–2, 16–18, and 20–21).
Figure 3. The number of days by activity in the minimally and fully processed datasets. The percentage of each activity is presented on top of each bar. The fully processed data has a lower proportion of delivery days than the minimally processed data.
Figure 4. Box plots of the number of days of data per participant for the minimally and fully processed datasets. The center line of each box plot indicates the median. The bottom and top hinges of the box indicate the 25th and 75th quantiles. The whiskers extend from the end of the box to a length of 1.5 multiplied by the interquartile range. Additionally, data points are overlaid on the box plot.
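As a rough illustration of the box plot quantities described in the Figure 4 caption, the sketch below computes the hinges, median, and whisker limits for a list of per-participant day counts. The function name and the sample values are hypothetical. It assumes linear-interpolation quantiles (the `inclusive` method of Python's `statistics.quantiles`) and the common convention that each whisker is drawn to the most extreme data point lying within 1.5 times the interquartile range of its hinge; the plotting software used for the figure may differ in these details.

```python
from statistics import quantiles
from typing import Dict, List


def box_plot_stats(values: List[float]) -> Dict[str, object]:
    """Return hinges, median, whisker ends, and points beyond the whiskers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # most extreme point within the fence
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


# Hypothetical days-of-data counts for ten participants.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```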
Figure 5. Cross-validated average model performance metrics on (A) the minimally processed data and (B) the fully processed data. The models were validated with 5-fold Monte Carlo cross-validation. RF: Random Forest; GLM: Generalized Linear Model; GLMM: Generalized Linear Mixed-Effects Model; MLP: Multilayer Perceptron; CNN: Convolutional Neural Network; RNN: Recurrent Neural Network; CRNN: Convolutional Recurrent Neural Network; PPV: Positive Predictive Value.
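Monte Carlo cross-validation repeatedly draws a fresh random train/test split, fits the model on the training part, scores it on the held-out part, and averages the scores. The sketch below illustrates that procedure together with the F1 and PPV metrics named in the caption. Everything specific in it is an assumption made for illustration: the 20% test fraction, the item-level (rather than participant-level) splitting, the treatment of label 1 as the positive class, and the toy threshold classifier, which stands in for the models compared in the figure.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def f1_and_ppv(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """F1 and positive predictive value, treating label 1 as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * recall / (ppv + recall) if ppv + recall else 0.0
    return f1, ppv


def monte_carlo_cv(X: List[float], y: List[int],
                   fit: Callable[[List[float], List[int]], Callable[[float], int]],
                   n_splits: int = 5, test_fraction: float = 0.2,
                   seed: int = 0) -> float:
    """Average F1 over `n_splits` independent random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)                      # new random split each repeat
        n_test = max(1, int(len(idx) * test_fraction))
        test, train = idx[:n_test], idx[n_test:]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = [predict(X[i]) for i in test]
        scores.append(f1_and_ppv([y[i] for i in test], y_pred)[0])
    return mean(scores)


def fit_threshold(X: List[float], y: List[int]) -> Callable[[float], int]:
    """Hypothetical stand-in model: split at the midpoint of the class means."""
    pos = mean(x for x, label in zip(X, y) if label == 1)
    neg = mean(x for x, label in zip(X, y) if label == 0)
    cut = (pos + neg) / 2
    return lambda x: int((x < cut) == (pos < neg))


# Toy feature: total daily counts; label 1 marks the low-count days.
X = [100, 200, 150, 120, 110, 9000, 8000, 9500, 8700, 9100]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(monte_carlo_cv(X, y, fit_threshold))
```

Unlike k-fold cross-validation, the test sets of different repeats may overlap, and a split whose test set contains no positive days scores an F1 of 0 under the convention used here.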