Toby Perrett, Alessandro Masullo, Dima Damen, Tilo Burghardt, Ian Craddock, Majid Mirmehdi.
Abstract
BACKGROUND: Calorimetry is expensive and obtrusive, but it is the only way to accurately measure a specific person's energy expenditure during daily living activities, because different people can use different amounts of energy even when performing the same actions in the same manner. Deep learning video analysis techniques have traditionally required large amounts of training data; however, recent advances in few-shot learning, where only a few training examples are needed, have made it possible to develop personalized models without a calorimeter.
Keywords: calories; calorimetry; computer vision; deep learning; energy expenditure
Year: 2022 PMID: 36103223 PMCID: PMC9520387 DOI: 10.2196/33606
Source DB: PubMed Journal: JMIR Form Res ISSN: 2561-326X
Figure 1. Neural network architecture for processing silhouette video streams, consisting of a convolutional neural network (CNN) for extracting frame features and a temporal convolutional network (TCN) for combining frame features over a period of 30 seconds. To achieve an initialization that can be quickly adapted to unseen participants, the main training objective is to minimize the calorie loss while maximizing the person loss. Seq: sequence.
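The training objective described in the Figure 1 caption can be illustrated with a minimal sketch. The caption states only that the calorie loss is minimized while the person loss is maximized; the concrete forms used here (mean squared error for the calorie loss, cross-entropy of a participant classifier for the person loss) and the `weight` parameter are assumptions made for illustration, not details taken from the paper.

```python
import math


def calorie_loss(predicted, measured):
    """Mean squared error between predicted and measured energy expenditure
    (assumed form of the calorie loss)."""
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)


def person_loss(person_probs, person_ids):
    """Cross-entropy of a participant classifier (assumed form of the person loss).

    person_probs[i] is a list of class probabilities for sample i,
    person_ids[i] is the index of the true participant.
    """
    total = -sum(math.log(probs[pid]) for probs, pid in zip(person_probs, person_ids))
    return total / len(person_ids)


def training_objective(predicted, measured, person_probs, person_ids, weight=0.1):
    """Combined objective: lower calorie loss and HIGHER person loss both reduce it.

    `weight` is a hypothetical trade-off parameter, not a value from the paper.
    """
    return calorie_loss(predicted, measured) - weight * person_loss(person_probs, person_ids)
```

Under these assumptions, features that make participants harder to tell apart lower the objective, which is one way to encourage an initialization that is not tied to any single training participant.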
Figure 2. Visualization of our data pipeline used to train and fine-tune a neural network, which is then used to provide personalized energy expenditure estimations from video.
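The train, fine-tune, and estimate stages named in the Figure 2 caption can be sketched with a toy stand-in. Here a one-feature linear estimator replaces the neural network, and the starting parameters, the scalar motion feature, the learning rate, and the step count are all hypothetical; the sketch shows only the general idea of adapting a pretrained estimator to one person from a short calibration clip.

```python
def estimate(weight, bias, features):
    """Predict energy expenditure from a (hypothetical) scalar motion feature."""
    return [weight * x + bias for x in features]


def mse(predicted, measured):
    """Mean squared error between two equally long sequences."""
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)


def fine_tune(weight, bias, features, calories, lr=0.05, steps=500):
    """Adapt pretrained parameters to one person with plain gradient descent on MSE."""
    n = len(features)
    for _ in range(steps):
        errors = [weight * x + bias - y for x, y in zip(features, calories)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, features)) / n
        grad_b = 2 * sum(errors) / n
        weight -= lr * grad_w
        bias -= lr * grad_b
    return weight, bias


# Pretrained (population-level) parameters: assumed values for illustration.
pretrained_w, pretrained_b = 1.0, 0.0

# Short calibration clip from a new person whose true relation is y = 1.5x + 0.5.
clip_features = [1.0, 2.0, 3.0]
clip_calories = [2.0, 3.5, 5.0]

before = mse(estimate(pretrained_w, pretrained_b, clip_features), clip_calories)
personal_w, personal_b = fine_tune(pretrained_w, pretrained_b, clip_features, clip_calories)
after = mse(estimate(personal_w, personal_b, clip_features), clip_calories)
```

In this toy setting `after` is much smaller than `before`, mirroring the intent of personalization; it says nothing about how well the actual network adapts.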
Mean square error averaged across all participants.
| Actions | Stand | Sit | Walk | Wipe | Vacuum | Sweep | Lie | Exercise | Stretch | Clean | Read |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Single action | 2.32 | 2.20 | 2.80 | 1.40 (a) | 1.94 | 2.15 | 2.43 | 6.77 | 17.85 | 3.27 | 2.47 |
| Paired actions | | | | | | | | | | | |
| Stand | 2.55 | — (b) | — | — | — | — | — | — | — | — | — |
| Sit | 2.65 | 2.09 | — | — | — | — | — | — | — | — | — |
| Walk | 2.87 | 2.57 | 2.72 | — | — | — | — | — | — | — | — |
| Wipe | 1.52 | 1.50 | 1.72 | 1.55 | — | — | — | — | — | — | — |
| Vacuum | 1.40 | 1.77 | 1.74 | 1.34 | 2.01 | — | — | — | — | — | — |
| Sweep | 1.61 | 1.09 (c) | 1.59 | 1.36 | 1.60 | 2.55 | — | — | — | — | — |
| Lie | 1.56 | 1.24 | 2.39 | 1.38 | 2.12 | 2.60 | 2.34 | — | — | — | — |
| Exercise | 2.42 | 1.87 | 2.82 | 3.18 | 2.89 | 2.62 | 3.50 | 5.72 | — | — | — |
| Stretch | 17.70 | 3.01 | 4.47 | 6.34 | 4.88 | 4.83 | 11.79 | 7.63 | 12.65 | — | — |
| Clean | 1.45 | 1.59 | 1.71 | 1.52 | 2.46 | 2.03 | 2.08 | 5.28 | 8.06 | 3.42 | — |
| Read | 1.98 | 4.98 | 2.47 | 1.40 | 2.16 | 2.24 | 2.44 | 3.43 | 3.02 | 2.42 | 2.35 |
| Baselines | | | | | | | | | | | |
| MET (d) | 2.87 | — | — | — | — | — | — | — | — | — | — |
| Before train only | 2.17 | — | — | — | — | — | — | — | — | — | — |
| All actions (whole sequence) | 1.06 (e) | — | — | — | — | — | — | — | — | — | — |
| All actions (32 s/action) | 3.30 | — | — | — | — | — | — | — | — | — | — |
| Sequence start [ | 1.74 | — | — | — | — | — | — | — | — | — | — |

(a) Best single action.
(b) Not applicable.
(c) Best paired action.
(d) MET: metabolic equivalent task.
(e) Best baseline.
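The footnoted "best" entries can be recovered directly from the values in the table. The following sketch copies the single-action row and the lower-triangular paired-action rows and picks the minimum of each; the variable names are illustrative, and the row/column reading of the paired entries (row action paired with column action) is an interpretation of the table layout.

```python
ACTIONS = ["Stand", "Sit", "Walk", "Wipe", "Vacuum", "Sweep",
           "Lie", "Exercise", "Stretch", "Clean", "Read"]

# Mean square error when fine-tuning on one action (columns follow ACTIONS).
SINGLE = [2.32, 2.20, 2.80, 1.40, 1.94, 2.15, 2.43, 6.77, 17.85, 3.27, 2.47]

# Lower-triangular matrix: row r holds the pairs (ACTIONS[r], ACTIONS[0..r]).
PAIRED = [
    [2.55],
    [2.65, 2.09],
    [2.87, 2.57, 2.72],
    [1.52, 1.50, 1.72, 1.55],
    [1.40, 1.77, 1.74, 1.34, 2.01],
    [1.61, 1.09, 1.59, 1.36, 1.60, 2.55],
    [1.56, 1.24, 2.39, 1.38, 2.12, 2.60, 2.34],
    [2.42, 1.87, 2.82, 3.18, 2.89, 2.62, 3.50, 5.72],
    [17.70, 3.01, 4.47, 6.34, 4.88, 4.83, 11.79, 7.63, 12.65],
    [1.45, 1.59, 1.71, 1.52, 2.46, 2.03, 2.08, 5.28, 8.06, 3.42],
    [1.98, 4.98, 2.47, 1.40, 2.16, 2.24, 2.44, 3.43, 3.02, 2.42, 2.35],
]

# Tuples compare on their first element, so min() selects the lowest error.
best_single = min(zip(SINGLE, ACTIONS))
best_pair = min(
    (error, ACTIONS[r], ACTIONS[c])
    for r, row in enumerate(PAIRED)
    for c, error in enumerate(row)
)

print(best_single)  # (1.4, 'Wipe')
print(best_pair)    # (1.09, 'Sweep', 'Sit')
```

The two results agree with the entries marked as best single action and best paired action in the table footnotes.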
Figure 3. Example energy expenditure estimations from silhouettes (recorded at 30 frames per second) using single-action fine-tuning. The top example shows a success case in which a model fine-tuned using only 32 seconds of wipe outperforms the whole-sequence baseline; it also shows that stretch is not a good action to use. The bottom example shows a failure case in which the models fine-tuned on a single action do not adapt to the period of high energy expenditure toward the end of the sequence. Seq: sequence.
Figure 4. An example sequence of silhouettes and energy expenditure estimations. Here, the best pair of actions for calibration across all participants is compared against the best single action, calibration on a whole video sequence, and shorter footage from every action. Seq: sequence.
Mean square error of baselines and of single- and selected double-action fine-tuned models. The results are shown for each participant (“Pn”) individually along with the average over all participants. A blank entry indicates that the action was not in the video sequence used for fine-tuning.
| Actions | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Baselines | | | | | | | | | | | |
| MET (a) | 2.19 | 2.56 | 1.76 | 0.22 | 2.52 | 3.96 | 8.81 | 1.72 | 1.84 | 3.14 | 2.87 |
| Before train only | 1.38 | 0.82 | 0.87 | 0.69 | 1.46 | 3.55 | 7.41 | 0.71 | 1.43 | 3.34 | 2.17 |
| All (whole sequence) | 0.60 | 0.54 | 0.62 | 0.14 | 1.54 | 1.54 | 1.75 | 0.28 | 0.55 | 2.02 | 1.06 (b) |
| All (32 s/action) | 0.85 | 0.41 | 0.74 | 0.09 | 1.11 | 2.53 | 22.79 | 0.79 | 0.63 | 3.10 | 3.30 |
| Sequence start [ | 0.29 | 0.58 | 0.54 | 0.29 | 1.25 | 2.30 | 3.24 | 3.50 | 0.65 | 4.73 | 1.74 |
| Single action | | | | | | | | | | | |
| Stand | 0.53 | 0.67 | 0.60 | 0.50 | 1.10 | 5.04 | 4.26 | 2.20 | 0.59 | 7.66 | 2.32 |
| Sit | 0.49 | 0.92 | 0.42 | 0.21 | 1.13 | 3.02 | 3.35 | 3.12 | 0.42 | 8.96 | 2.20 |
| Walk | 0.80 | 0.53 | 0.47 | 0.29 | 2.07 | 7.78 | 4.32 | 2.28 | 0.47 | 8.97 | 2.80 |
| Wipe | 0.29 | 1.36 | 0.45 | 0.36 | 0.73 | 3.37 | 2.95 | 1.80 | 0.48 | 2.17 | 1.40 (c) |
| Vacuum | 0.79 | 0.63 | 0.54 | 0.60 | 1.67 | 2.95 | 5.18 | 1.89 | 0.85 | 4.29 | 1.94 |
| Sweep | 1.01 | 0.57 | 0.81 | 0.47 | 0.62 | 2.85 | 9.24 | 3.29 | 0.39 | 2.31 | 2.15 |
| Lie | 1.52 | 0.70 | 1.14 | 1.29 | 0.92 | 3.04 | 10.59 | 1.53 | 1.35 | 2.22 | 2.43 |
| Exercise | 1.56 | 0.76 | — (d) | 2.96 | 0.59 | 5.74 | 7.59 | 7.47 | 0.80 | 33.41 | 6.77 |
| Stretch | 5.93 | 46.52 | 0.48 | 5.82 | 5.16 | 21.19 | 30.64 | 13.86 | 30.81 | 18.05 | 17.85 |
| Clean | 1.17 | 2.15 | 0.94 | 0.32 | 1.04 | 5.93 | 8.94 | 2.17 | 4.65 | 5.35 | 3.27 |
| Read | 2.05 | 1.35 | 0.84 | 0.56 | 0.81 | 2.53 | 7.50 | 1.92 | 2.22 | 4.90 | 2.47 |
| Double action | | | | | | | | | | | |
| Sweep/sit | 0.96 | 0.67 | 0.47 | 0.13 | 0.90 | 2.51 | 1.02 | 0.99 | 0.47 | 2.75 | 1.09 (e) |
| Lie/sit | 0.61 | 0.53 | 0.43 | 0.45 | 0.82 | 2.69 | 3.00 | 1.07 | 0.60 | 2.24 | 1.24 |
| Vacuum/stand | 0.87 | 0.57 | 0.38 | 0.14 | 1.53 | 3.39 | 1.73 | 0.78 | 0.61 | 4.04 | 1.40 |
| Vacuum/wipe | 0.48 | 0.64 | 0.64 | 0.19 | 1.21 | 3.25 | 1.59 | 1.63 | 0.60 | 3.15 | 1.34 |
| Sweep/wipe | 0.60 | 0.59 | 0.67 | 0.16 | 1.01 | 2.52 | 4.16 | 1.02 | 0.54 | 2.35 | 1.36 |
| Wipe/wipe | 0.57 | 0.95 | 0.48 | 0.11 | 1.07 | 3.88 | 3.43 | 1.98 | 0.56 | 2.42 | 1.55 |
| Stretch/exercise | 2.83 | 2.19 | — | 4.18 | 3.58 | 8.74 | 14.63 | 8.93 | 0.93 | 22.70 | 7.63 |
| Clean/stretch | 5.01 | 5.39 | 0.57 | 2.20 | 2.79 | 11.90 | 21.57 | 7.26 | 17.12 | 6.80 | 8.06 |
| Stretch/lie | 1.78 | 9.14 | 0.61 | 1.19 | 3.70 | 2.98 | 77.20 | 3.46 | 9.64 | 8.23 | 11.79 |
| Stretch/stand | 1.72 | 2.63 | 0.56 | 1.72 | 1.96 | 8.08 | 146.70 | 5.08 | 3.49 | 5.01 | 17.70 |

(a) MET: metabolic equivalent task.
(b) Best baseline.
(c) Best single action.
(d) Blank entries indicate the action was not in the video sequence used for fine-tuning.
(e) Best action pair.
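As a rough illustration of how the Average column relates to the per-participant values, the sketch below recomputes it for a few rows copied from the table. Skipping blank entries when averaging is an assumption (it is consistent with the Exercise row), and because the inputs are already rounded to two decimals, agreement is only expected to within about 0.01.

```python
# (per-participant MSE for P1..P10, reported average) for selected rows.
# None marks a blank entry: the action was not in that participant's sequence.
ROWS = {
    "MET": ([2.19, 2.56, 1.76, 0.22, 2.52, 3.96, 8.81, 1.72, 1.84, 3.14], 2.87),
    "Wipe": ([0.29, 1.36, 0.45, 0.36, 0.73, 3.37, 2.95, 1.80, 0.48, 2.17], 1.40),
    "Sweep/sit": ([0.96, 0.67, 0.47, 0.13, 0.90, 2.51, 1.02, 0.99, 0.47, 2.75], 1.09),
    "Exercise": ([1.56, 0.76, None, 2.96, 0.59, 5.74, 7.59, 7.47, 0.80, 33.41], 6.77),
}


def participant_average(values):
    """Mean over participants, ignoring blank (None) entries (an assumption)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


for name, (values, reported) in ROWS.items():
    recomputed = participant_average(values)
    print(f"{name}: recomputed {recomputed:.3f}, reported {reported:.2f}")
```

The per-participant spread is large in several rows (for example, P7 and P10), so a single average can hide substantial differences between individuals.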