| Literature DB >> 29463052 |
Talko B Dijkhuis, Frank J Blaauw, Miriam W van Ittersum, Hugo Velthuijsen, Marco Aiello.
Abstract
Living a sedentary lifestyle is one of the major causes of numerous health problems. To encourage employees to lead a less sedentary life, the Hanze University started a health promotion program. One of the interventions in the program was the use of an activity tracker to record participants' daily step count. The daily step count served as input for a fortnightly coaching session. In this paper, we investigate the possibility of automating part of the coaching procedure on physical activity by providing personalized feedback throughout the day on a participant's progress in achieving a personal step goal. The gathered step count data was used to train eight different machine learning algorithms to make hourly estimations of the probability of achieving a personalized, daily steps threshold. In 80% of the individual cases, the Random Forest algorithm was the best performing algorithm (mean accuracy = 0.93, range = 0.88-0.99, and mean F1-score = 0.90, range = 0.87-0.94). To demonstrate the practical usefulness of these models, we developed a proof-of-concept Web application that provides personalized feedback about whether a participant is expected to reach his or her daily threshold. We argue that the use of machine learning could become an invaluable asset in the process of automated personalized coaching. The individualized algorithms allow for predicting physical activity during the day and provide the possibility to intervene in time.
Keywords: coaching; machine learning; physical activity; sedentary lifestyle
Year: 2018 PMID: 29463052 PMCID: PMC5856112 DOI: 10.3390/s18020623
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Confusion matrix.
| | Predicted: threshold met | Predicted: threshold not met |
|---|---|---|
| Actual: threshold met | True Positive (TP) | False Negative (FN) |
| Actual: threshold not met | False Positive (FP) | True Negative (TN) |
True Positive: the threshold of daily steps was met and predicted as met; True Negative: the threshold of daily steps was not met and predicted as not met; False Negative: the threshold of daily steps was met but predicted as not met; False Positive: the threshold of daily steps was not met but predicted as met.
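The accuracy and F1-scores reported below follow directly from these four confusion-matrix counts. A minimal sketch of both metrics, using illustrative counts that are not taken from the paper:

```python
# Accuracy and F1-score computed from confusion-matrix counts.
# The counts below are illustrative examples, not the paper's data.

def accuracy(tp, fn, fp, tn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fn + fp + tn)

def f1_score(tp, fn, fp):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

tp, fn, fp, tn = 80, 10, 5, 55
print(accuracy(tp, fn, fp, tn))   # (80 + 55) / 150 = 0.9
print(f1_score(tp, fn, fp))       # 2*80 / (2*80 + 5 + 10) ≈ 0.914
```

Note that the F1-score simplifies to 2·TP / (2·TP + FP + FN), so it ignores True Negatives entirely, which is why accuracy and F1 can rank models differently.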
Algorithms and their scores for the whole dataset.
| Algorithm Name | Mean Accuracy (Standard Deviation) | Mean F1-Score (Standard Deviation) | Rank |
|---|---|---|---|
| AdaBoost (ADA) | 0.776623 (0.002080) | 0.854157 (0.001626) | 1 |
| Neural Network (NN) | 0.777774 (0.001545) | 0.852797 (0.002938) | 2 |
| Support Vector Classifier (SVC) | 0.770728 (0.002505) | 0.856341 (0.002405) | 3 |
| Stochastic Gradient Descent (SGD) | 0.767623 (0.005490) | 0.853575 (0.004574) | 4 |
| KNeighborsClassifier (KNN) | 0.749171 (0.005683) | 0.829826 (0.005544) | 5 |
| Logistic Regression (LR) | 0.742125 (0.009821) | 0.825725 (0.008487) | 6 |
| Random Forest (RF) | 0.737451 (0.003210) | 0.819065 (0.003840) | 7 |
| Decision Tree (DT) | 0.720535 (0.004787) | 0.804220 (0.003006) | 8 |
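A ranking like the one above comes from averaging each algorithm's score over cross-validation folds and sorting by the mean. A minimal stdlib-only sketch of that procedure, using made-up fold accuracies rather than the paper's data:

```python
# Rank classifiers by mean cross-validation score (illustrative values,
# not the paper's measurements).
from statistics import mean, stdev

fold_scores = {
    "ADA": [0.775, 0.778, 0.777],
    "NN":  [0.776, 0.779, 0.778],
    "DT":  [0.718, 0.722, 0.721],
}

# Sort by mean score, best first; report mean and standard deviation,
# mirroring the "mean (SD)" columns in the table above.
ranked = sorted(fold_scores.items(), key=lambda kv: mean(kv[1]), reverse=True)
for rank, (name, scores) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: {mean(scores):.4f} ({stdev(scores):.4f})")
```

The names `fold_scores` and the three-fold setup are assumptions for illustration; the paper's exact cross-validation scheme is described in its Methods section.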
Figure 1. Algorithm accuracy comparison.
Figure 2. Algorithm F1-score comparison.
Algorithms, used parameters, and grid search values.
| Algorithm Name | Hyperparameter | Values |
|---|---|---|
| AdaBoost (ADA) | n_estimators: number of decision trees in the ensemble | [10, 50] |
| | learning_rate: shrinks the contribution of each successive decision tree in the ensemble | [0.1, 0.5, 1.0, 10.0] |
| Decision Tree (DT) | criterion: the function used to decide on a split | ['gini', 'entropy'] |
| | max_features: the number of features to consider when looking for a split | ['auto', 'sqrt', 'log2'] |
| KNeighborsClassifier (KNN) | metric: the distance metric to use | ['minkowski', 'euclidean', 'manhattan'] |
| | weights: weight function used in prediction | ['uniform', 'distance'] |
| | n_neighbors: number of neighbors to use for queries | [5, 6, 7, 8, 9] |
| Neural Network (NN) | learning_rate: the schedule for updating the weights | ['constant', 'invscaling', 'adaptive'] |
| | activation: the activation function for the hidden layer | ['identity', 'logistic', 'tanh', 'relu'] |
| | learning_rate_init: the initial step size in updating the weights | [0.01, 0.05, 0.1, 0.5, 1.0] |
| Logistic Regression (LR) | C: inverse of regularization strength | [0.001, 0.01, 0.1, 1, 10, 100, 1000] |
| | penalty: whether to use Lasso (L1) or Ridge (L2) regularization | ['l1', 'l2'] |
| | fit_intercept: whether or not to compute the intercept of the linear classifier | [True, False] |
| Stochastic Gradient Descent (SGD) | fit_intercept: whether or not the intercept should be computed | [True, False] |
| | l1_ratio: the Elastic Net mixing parameter (0 = L2 penalty, 1 = L1 penalty) | [0, 0.15, 1] |
| | loss: the loss function to be used | ['log', 'modified_huber'] |
| Support Vector Classifier (SVC) | kernel: the kernel type to be used in the algorithm | ['linear', 'rbf'] |
| Random Forest (RF) | n_estimators: number of decision trees | [10, 50, 100, 500] |
| | max_features: the number of features to consider when looking for a split | [0.1, 0.25, 0.5, 0.75, 'sqrt', 'log2', None] |
| | criterion: the function used to decide on a split | ['gini', 'entropy'] |
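Grid search exhaustively evaluates every combination of the listed values and keeps the best-scoring one. A minimal stdlib-only sketch over the Random Forest grid above, mirroring what scikit-learn's GridSearchCV automates; the scoring function is a deterministic placeholder, not the paper's F1 evaluation:

```python
# Exhaustive grid search over the Random Forest hyperparameter grid.
from itertools import product

param_grid = {
    "n_estimators": [10, 50, 100, 500],
    "max_features": [0.1, 0.25, 0.5, 0.75, "sqrt", "log2", None],
    "criterion": ["gini", "entropy"],
}

def evaluate(params):
    # Placeholder score favouring n_estimators == 100. A real run would
    # fit RandomForestClassifier(**params) on the training folds and
    # return the mean validation F1-score instead.
    return -abs(params["n_estimators"] - 100)

# Materialize every combination: 4 * 7 * 2 = 56 candidates.
keys = list(param_grid)
candidates = [dict(zip(keys, combo)) for combo in product(*param_grid.values())]
best = max(candidates, key=evaluate)
print(len(candidates), best)
```

Per-participant tuning, as the paper does, repeats this search once per individual's data, which is why the tuned hyperparameter values differ between participants in the tables below.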
Figure 3. Average accuracy and F1-score per model.
Example of differently tuned personalized Random Forest models.
| Participant | Parameters | Values |
|---|---|---|
| 1119 | criterion | gini |
| 1121 | criterion | entropy |
The number of different values per Random Forest hyperparameter.
| Hyperparameter | Value | Number of Occurrences |
|---|---|---|
| criterion | entropy | 7 |
| | gini | 37 |
| max_features | 0.1 | 4 |
| | 0.25 | 5 |
| | 0.5 | 7 |
| | 0.75 | 15 |
| | log2 | 2 |
| | sqrt | 2 |
| | None | 9 |
| n_estimators | 10 | 3 |
| | 50 | 16 |
| | 100 | 17 |
| | 500 | 6 |
Figure 4. Average F1-score per algorithm, per hour, based on the individual scores.
Figure 5. Average accuracy per algorithm, per hour, based on the individual scores.
Figure 6. Screenshot of the Personalized Coach Web Application.