Charlotte Lund Rasmussen1,2, Javier Palarea-Albaladejo3, Melker Staffan Johansson4, Patrick Crowley5, Matthew Leigh Stevens5, Nidhi Gupta5, Kristina Karstad5, Andreas Holtermann5,4.
Abstract
BACKGROUND: Researchers applying compositional data analysis to time-use data (e.g., time spent in physical behaviors) often face the problem of zeros, that is, recordings of zero time spent in any of the studied behaviors. Zeros hinder the application of compositional data analysis because the analysis is based on log-ratios. One way to overcome this challenge is to replace the zeros with sensible small values. The aim of this study was to compare the performance of three existing replacement methods used within physical behavior time-use epidemiology: simple replacement, multiplicative replacement, and the log-ratio expectation-maximization (lrEM) algorithm. Moreover, we assessed the consequence of choosing replacement values higher than the lowest observed value for a given behavior.
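To make the first two replacement methods concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the commonly cited formulations: simple replacement substitutes a small value `delta` for each zero and then rescales the whole composition back to its total, whereas multiplicative replacement sets each zero to exactly `delta` and shrinks only the non-zero parts. The function names, the 1440-minute day, and the example values are assumptions chosen for illustration; lrEM is model-based and is not sketched here.

```python
import numpy as np

def simple_replacement(x, delta, total=1440.0):
    """Replace zeros with delta, then rescale all parts to sum to total.

    Assumed formulation: after rescaling, the imputed parts end up
    slightly below delta.
    """
    x = np.asarray(x, dtype=float)
    filled = np.where(x == 0, delta, x)
    return filled * total / filled.sum()

def multiplicative_replacement(x, delta, total=1440.0):
    """Replace zeros with exactly delta and shrink the non-zero parts.

    Assumes x already sums to total. The non-zero parts are multiplied
    by a common factor so the composition still sums to total.
    """
    x = np.asarray(x, dtype=float)
    zeros = x == 0
    shrink = 1.0 - zeros.sum() * delta / total
    return np.where(zeros, delta, x * shrink)

# Hypothetical day (minutes): SB, standing, walking, running, stairs, time in bed
day = np.array([516.0, 350.0, 125.0, 0.0, 6.0, 443.0])
sr = simple_replacement(day, delta=0.5)
mr = multiplicative_replacement(day, delta=0.5)
```

In this sketch both results still sum to 1440 minutes and keep the ratios between the non-zero behaviors; the methods differ in whether the imputed part equals the chosen replacement value exactly.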
Keywords: Compositional data analysis; Missing data; Physical activity; Sedentary time; Time-use
Year: 2020 PMID: 33023619 PMCID: PMC7542467 DOI: 10.1186/s12966-020-01029-z
Source DB: PubMed Journal: Int J Behav Nutr Phys Act ISSN: 1479-5868 Impact factor: 6.457
Characteristics of the study sample (n = 1310)
| Variable | n | % | Mean (SD) | Range |
|---|---|---|---|---|
| Age in years | 1296 | | 45.1 (10.1) | [18.0; 71.0] |
| BMI in kg/m² | 1257 | | 26.7 (4.8) | [15.4; 45.1] |
| Sex | | | | |
| Men | 542 | 42 | | |
| Women | 754 | 58 | | |
| Cohort | | | | |
| NOMAD | 230 | 18 | | |
| DPhacto | 686 | 52 | | |
| DOSES | 394 | 30 | | |
| Working sector | | | | |
| Cleaning | 159 | 12 | | |
| Manufacturing | 540 | 41 | | |
| Transportation | 73 | 6 | | |
| Health Service | 409 | 31 | | |
| Assemblers | 33 | 3 | | |
| Construction | 39 | 3 | | |
| Garbage Collectors | 27 | 2 | | |
| Mobile Plant Operators | 10 | 1 | | |
| Other (A) | 20 | 2 | | |

BMI body mass index, SD standard deviation. (A) Includes general office clerks and other elementary workers
Fig. 1 Difference in ADG, RDTV, and RDCM for each replacement method across the six simulated scenarios. Comparison between the complete dataset and the dataset with replaced zeros using the average difference in geometric means (ADG), the relative difference in total variance (RDTV), and the relative difference in ilr-covariance matrices (RDCM). Low values for ADG, RDTV, and RDCM indicate small differences in geometric means, total variance, and covariance structure, respectively, between the complete and replaced datasets. EM is the log-ratio expectation-maximization replacement method, MR is the multiplicative replacement method, and SR is the simple replacement method. The points indicate the mean across 1000 simulated datasets and the vertical lines represent ± standard deviation
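The quantities compared in Fig. 1 build on two standard compositional summaries: the compositional (closed geometric) mean and the total variance. The Python sketch below shows one way to compute these building blocks. It is an illustration under stated assumptions, not the study's code: the function names are hypothetical, total variance is taken as the sum of the variances of the clr-transformed parts, and the `relative_difference` helper is an assumed generic form rather than the exact ADG, RDTV, or RDCM definitions, which this record does not give.

```python
import numpy as np

def closure(x, total=1440.0):
    """Rescale a composition (or each row of a matrix) to sum to total."""
    x = np.asarray(x, dtype=float)
    return x * total / x.sum(axis=-1, keepdims=True)

def compositional_mean(X, total=1440.0):
    """Column-wise geometric mean, closed to total (requires no zeros)."""
    X = np.asarray(X, dtype=float)
    geometric = np.exp(np.log(X).mean(axis=0))
    return closure(geometric, total)

def total_variance(X):
    """Sum of the sample variances of the clr-transformed parts."""
    logX = np.log(np.asarray(X, dtype=float))
    clr = logX - logX.mean(axis=1, keepdims=True)
    return clr.var(axis=0, ddof=1).sum()

def relative_difference(reference, replaced):
    """Assumed generic form: (replaced - reference) / reference."""
    return (replaced - reference) / reference

# Hypothetical three-part compositions (minutes/day), one row per person
X = np.array([[500.0, 400.0, 540.0],
              [600.0, 300.0, 540.0],
              [450.0, 450.0, 540.0]])
centre = compositional_mean(X)
tv = total_variance(X)
```

Because both summaries rely on logarithms, they are undefined when any part is zero, which is why the zeros must be replaced before the complete and replaced datasets can be compared.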
Compositional means (in minutes/day) of the complete dataset and datasets with replaced zeros
| Scenario | Replacement method | SB | Standing | Walking | Running | Stairs | TIB |
|---|---|---|---|---|---|---|---|
| Complete dataset | | 516.04 | 350.54 | 124.63 | 1.53 | 6.14 | 441.12 |
| 20% zeros, replacement value of 0.5 min (A) | SR | 514.01 | 350.52 | 124.62 | 1.62 | 6.14 | 441.09 |
| | MR | 516.01 | 350.53 | 124.62 | 1.60 | 6.14 | 441.10 |
| | lrEM | 516.04 | 350.54 | 124.63 | 1.54 | 6.14 | 441.12 |
| 20% zeros, considering observation threshold (B) | SR | 516.04 | 350.54 | 124.63 | 1.53 | 6.14 | 441.12 |
| | MR | 516.04 | 350.54 | 124.63 | 1.54 | 6.14 | 441.12 |
| | lrEM | 516.05 | 350.54 | 124.63 | 1.53 | 6.14 | 441.12 |
| 10% zeros, replacement value of 0.5 min (A) | SR | 516.03 | 350.53 | 124.62 | 1.58 | 6.14 | 441.10 |
| | MR | 516.01 | 350.52 | 124.62 | 1.62 | 6.14 | 441.09 |
| | lrEM | 516.04 | 350.54 | 124.63 | 1.54 | 6.14 | 441.12 |
| 10% zeros, considering observation threshold (B) | SR | 516.04 | 350.54 | 124.63 | 1.53 | 6.14 | 441.12 |
| | MR | 516.04 | 350.54 | 124.63 | 1.53 | 6.14 | 441.12 |
| | lrEM | 516.05 | 350.54 | 124.63 | 1.53 | 6.14 | 441.12 |

MR multiplicative replacement method, lrEM log-ratio expectation-maximization replacement method, SB sedentary behavior, SR simple replacement method, TIB time in bed. (A) SR and MR use 0.5 min for replacement, whereas lrEM estimates a value below the observation threshold. (B) SR and MR use 65% of the observation threshold for replacement, whereas lrEM estimates a value below the observation threshold
Fig. 2 Biplots of complete and replaced datasets (20% zeros, 0.5 minutes replacement). Zeros replaced using simple, multiplicative, and lrEM replacement. Note that the use of the fixed value of 0.5 minutes to replace zeros only affects the simple and multiplicative replacement methods, whereas the lrEM method by construction replaces zeros with estimated values below the observation threshold. Individuals for whom zeros have been imposed and replaced are indicated with a different color
Fig. 3 Biplots of complete and replaced datasets (20% zeros, observation threshold-based replacement). Zeros replaced using simple, multiplicative, and lrEM replacement. The simple and multiplicative replacement methods were set up to replace zeros with 65% of the observation threshold, whereas the lrEM method by construction replaces zeros with estimated values below the observation threshold. Individuals with imposed and replaced zeros are indicated with a different color