Nidhi Gupta (1), Charlotte Lund Rasmussen (1, 2), Andreas Holtermann (1, 3), Svend Erik Mathiassen (4)
1. National Research Centre for the Working Environment, Department of Musculoskeletal Disorders and Physical Work Demands, Copenhagen Ø, Denmark.
2. Section of Social Medicine, Department of Public Health, University of Copenhagen, Copenhagen, Denmark.
3. Department of Sports Science and Clinical Biomechanics, University of Southern Denmark, Odense, Denmark.
4. Centre for Musculoskeletal Research, Department of Occupational Health Sciences and Psychology, University of Gävle, Gävle, Sweden.
Abstract
Data on the use of time in different exposures, behaviors, and work tasks are common in occupational research. Such data are most often expressed in hours, minutes, or percentage of work time. Thus, they are constrained or 'compositional', in that they add up to a finite sum (e.g. 8 h of work or 100% work time). Due to their properties, compositional data need to be processed and analyzed using specifically adapted methods. Compositional data analysis (CoDA) has become a particularly established framework to handle such data in various scientific fields such as nutritional epidemiology, geology, and chemistry, but has only recently gained attention in public and occupational health sciences. In this paper, we introduce the reader to CoDA by explaining why CoDA should be used when dealing with compositional time-use data, showing how to perform CoDA, including a worked example, and pointing at some remaining challenges in CoDA. The paper concludes by emphasizing that CoDA in occupational research is still in its infancy, and stresses the need for further development and experience in the use of CoDA for time-based occupational exposures. We hope that the paper will encourage researchers to adopt and apply CoDA in studies of work exposures and health.
Occupational research and practice have a long-standing interest in the use of time (Bauman ). Examples of exposures include time spent in different work tasks (Mathiassen ; Svendsen ; Lewné ; Notø ; Pulido ), in hazardous environments (Fruin ; Stewart ), on different physical activities (Thorp ; Rasmussen ; Gupta ), and in specific postures (Mathiassen ; Wahlström ; Palm ).
Such time-based occupational exposures are often expressed in terms of minutes (e.g. per day), hours (e.g. per week), or percentages (e.g. of total working time). Thus, they are compositional: they form parts of a finite total such as a whole day, a whole week, or 100% working time. Therefore, any exposure part, such as time spent in a particular work task, will necessarily range between 0 and 100%, and will correlate with time spent in other exposures that are also parts of the total 100%.
As early as 1896, Pearson warned against using standard statistical techniques on data adding up to a whole (Pearson, 1896). Since then, different approaches have been suggested for managing compositional data. Examples include multivariate pattern analysis (Aadland ), fractional multinomial models (Murteira and Ramalho, 2016), isotemporal substitution models (Mekary ), modified hierarchical regression (Jansen ), and compositional data analysis (CoDA) (Aitchison, 1982). The present paper focuses on the latter, i.e. CoDA, which has received particular attention lately.
The landmark statistical basis of CoDA was developed in the early 1980s (Aitchison, 1982). Since then, CoDA has become an established framework for handling compositional data in, e.g. nutritional epidemiology (Leite, 2016), geology (Tolosana-Delgado and von Eynatten, 2009), and chemistry (Buccianti and Pawlowsky-Glahn, 2005). However, in public and occupational health sciences, CoDA has gained attention only recently (Pedišić, 2014; Chastin ; Pedisic ; Dumuid ; Foley ; Bauman ), with few papers devoted to exposures at work (Gupta , 2019; Rasmussen ; Hallman ; Coenen ).
In this paper, we introduce CoDA by (i) explaining why time-based data need to be analyzed using CoDA, (ii) showing how data are processed and analyzed within a CoDA framework, and (iii) pointing at some remaining methodological challenges in CoDA, emphasizing the need for continued development. In giving this introduction to CoDA, we hope that readers will be inspired to adopt and apply CoDA when dealing with time-based occupational exposures.
The ‘whys’ of CoDA
As emphasized above, data on time spent in various exposures, behaviors, or tasks during work are often compositional. For example, Fig. 1A illustrates a cleaner spending 72, 9, 6, and 13% of the working hours on two different cleaning tasks, other work tasks, and breaks, respectively. These compositional ‘parts’ add up to 100% work time. Similarly, Fig. 1B and C illustrate the distributions of 100% working time in different postures for a construction worker (Fig. 1B) and in different physical behaviors for an office worker (Fig. 1C).
Figure 1.
Examples of compositions of time at work spent: (A) on different tasks for a cleaner; (B) in different categories of arm elevation for a construction worker, and (C) in sitting (in periods of <30 and ≥30 min), standing and moving for an office worker.
Because the composition is a finite total, changing the time spent in one part will inevitably lead to a change in time for at least one of the remaining parts. For example, for the office worker in Fig. 1C, reducing time in long sitting bouts will increase time spent in at least one of the remaining behaviors, i.e. short sitting bouts, standing, or moving. Notably, parts in the composition will still be dependent even if one or more are left out. Thus, even if only time in long sitting bouts (cf. Fig. 1C) is considered of interest in a particular study, it needs to be treated as part of a full composition including the other behaviors. The inevitable presence of those other behaviors also contributes to determining the possible effects of long sitting bouts.
Previous occupational research dealing with compositional data has predominantly made the error of assuming that one part of a composition influences health independently of the other parts in that composition. This may result in misleading inferences (e.g. Chastin ; Gupta ; Dumuid ). As an example, times spent in sedentary behavior and physical activity are very often treated as separate variables in epidemiology, even though they are complementary parts of a composition (Dumuid ). Negative health effects claimed to result from extensive sitting may, therefore, actually be due to the complementary behavior, i.e. too little non-sitting (van der Ploeg and Hillsdon, 2017; Stamatakis ). Analyzing compositional data using standard methods may in some cases even lead to absurd results, such as confidence intervals (CIs) including values less than 0% or more than 100% time (Coenen ).
Similarly, in occupational practice, compositional data are often understood using a standard approach focusing on a single risk factor within the composition of multiple factors. For example, a recent guideline for nurses focused on reducing time spent standing without considering which other compositional parts (e.g. walking or sitting) should then be increased (Waters and Dick, 2015). This approach of addressing only isolated parts of a composition in research and practice needs to be revisited, and CoDA offers a suitable tool for that purpose.
The ‘hows’ of CoDA
Since the first comprehensive proposal of how to deal with compositional data in 1982, several extensive and excellent textbooks have presented CoDA procedures (e.g. Pawlowsky-Glahn ). Also, a number of recent papers have reviewed and described in detail basic issues in CoDA implementation (Chastin ; Dumuid , 2019, 2020).
CoDA generally comprises a series of steps:
1. Log-ratio transformation of compositional parts. The point of departure of CoDA is the notion that the information contained in a part of a composition can only be correctly understood if it is expressed relative to information about other part(s). Thus, in a first step of processing data, ratios are formed between compositional parts. Ratios of non-negative numbers can, however, only take non-negative values, and are still constrained. Therefore, in a second step, the ratios are log-transformed in order to arrive at numbers that can vary freely on the entire scale from minus to plus infinity. In mathematical terms, this two-step log-ratio transformation moves data from the so-called simplex to Euclidean space, where standard data operate and standard statistics can be used (Aitchison, 1986).
Different principles for constructing ratios and performing the log-transformation have been described in the literature (Pawlowsky-Glahn ). In current research, the most widely used is the isometric log-ratio (ilr) transformation (Carson ; Biddle ; Dumuid ,b; Foley ; Hallman ; Coenen ). Using the ilr transformation on a composition with d parts results in d − 1 orthonormal ilr coordinates. If the composition consists of more than two parts, the parts can be partitioned into ilr coordinates in several ways, and which partition to use will depend on the research question. Appendix A, available at Annals of Work Exposures and Health online, exemplifies two possible sets of ilrs pertaining to the four-part composition illustrated in Fig. 1C, including the associated log-transformation formulas.
2. Further analysis using standard statistical methods. After transformation of the composition into a set of log-ratios, standard statistical methods can be applied. This must always be preceded by a check of the assumptions associated with the intended statistical analyses.
3. Interpretation of results. Since the statistical analyses are performed on log-transformed data, some results, e.g. regression coefficients, are also expressed on a log-scale, which makes straightforward interpretation of some statistical parameters difficult. Procedures have been proposed that ease the interpretation of results of regression analyses of exposure–outcome associations with compositions as the independent variable(s) (Chastin ; Dumuid , 2019; Gupta ); regression analyses with compositions as the dependent variable(s) (Rasmussen ; Hallman ); and ANOVA or MANOVA addressing differences in compositions between groups (Foley ; Gupta ).
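To make the transformation step concrete, the following is a minimal Python sketch of one common way of building ilr coordinates, often called pivot coordinates. It is an illustration under stated assumptions, not a reproduction of the formulas in Appendix A: the function name `ilr_pivot` is hypothetical, and the scaling constant sqrt(k/(k+1)), with k the number of parts in the denominator, is the usual orthonormal choice assumed here. Each coordinate is the scaled log of the ratio between one part and the geometric mean of the parts that follow it.

```python
import math


def ilr_pivot(parts):
    """Return d - 1 ilr (pivot) coordinates for a composition with d parts.

    Coordinate i contrasts part i with the geometric mean of all later parts.
    The name and the choice of pivot coordinates are assumptions made for
    illustration; other partitions of the parts are equally valid.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    coords = []
    for i in range(len(parts) - 1):
        rest = parts[i + 1:]
        k = len(rest)
        # Geometric mean of the remaining parts, computed on the log scale.
        gmean_rest = math.exp(sum(math.log(p) for p in rest) / k)
        # sqrt(k / (k + 1)) makes the coordinates orthonormal.
        coords.append(math.sqrt(k / (k + 1)) * math.log(parts[i] / gmean_rest))
    return coords


# Illustrative three-part composition in minutes: sitting, standing, walking.
minutes = [175.0, 205.0, 98.0]
ilr1, ilr2 = ilr_pivot(minutes)

# The same composition expressed as percentages of the total.
percent = [100 * m / sum(minutes) for m in minutes]
```

Because only ratios between parts enter the calculation, `ilr_pivot(minutes)` and `ilr_pivot(percent)` give the same coordinates: the unit in which the composition is expressed does not matter.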
Example of a CoDA application: is time sitting at work associated with low-back pain?
In an accelerometer-based study of 209 blue-collar workers (Gupta ), workers spent, on average, 478 min at work, distributed between sitting (175 min), standing (205 min), and walking (98 min). The workers reported an average low-back pain (LBP) intensity during the past month of 2.9 (SD 2.6) on a scale from 0 to 9.
The three-part composition was transformed into a set of two ilrs, i.e. ilr1: the log-transformed ratio of sitting to the geometric mean of standing and walking; ilr2: the log-transformed ratio of standing to walking.
We examined the association between the compositions, expressed in terms of the two ilrs, and LBP intensity using multiple linear regression, adjusted for confounders.
The regression coefficient for ilr1 indicated that more sitting time relative to time spent in standing and walking is associated with higher LBP intensity (B = 0.73; 95% CI 0.25, 1.20; P = 0.003). Since the effect size estimate B (and its associated CI) is measured on a logarithmic scale, it needs to be ‘back-transformed’ to the original scale (i.e. minutes) for ease of interpretation. For this purpose, we used the ‘compositional isotemporal substitution’ method (Dumuid ). This method interprets regression parameters in terms of the expected difference in LBP intensity (outcome) if time is reallocated to/from sitting from/to standing and walking (Dumuid ). The isotemporal substitution procedure is detailed in Appendix B, available at Annals of Work Exposures and Health online.
We examined reallocations in the range of −60 to +60 min of sitting, which is within the range occurring in the source data. Results are illustrated in Fig. 2. For instance, a reallocation of 30 min from sitting to standing and walking is estimated to be associated with an LBP intensity that is 0.17 lower (95% CI for the difference −0.28, −0.06).
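The arithmetic behind such a reallocation can be sketched in a few lines of Python. This is a hedged illustration rather than the procedure of Appendix B: it assumes that ilr1 is scaled by sqrt(2/3), that time taken from sitting is split between standing and walking in proportion to their current durations (consistent with the 41 and 19 min split given in the caption of Fig. 2), and that the starting point is the average composition. Under proportional reallocation the standing-to-walking ratio, and hence ilr2, does not change, so only the coefficient for ilr1 is needed. The function names are hypothetical, and CIs are not reproduced.

```python
import math

B_ILR1 = 0.73  # regression coefficient for ilr1 reported in the text
MEAN = {"sitting": 175.0, "standing": 205.0, "walking": 98.0}  # minutes


def ilr1(sitting, standing, walking):
    """Sitting relative to the geometric mean of standing and walking.

    The sqrt(2/3) scaling is an assumption (the usual orthonormal choice).
    """
    return math.sqrt(2 / 3) * math.log(sitting / math.sqrt(standing * walking))


def reallocate_from_sitting(comp, minutes):
    """Move `minutes` from sitting to standing and walking, proportionally.

    A negative value moves time in the other direction, i.e. to sitting.
    """
    share_standing = comp["standing"] / (comp["standing"] + comp["walking"])
    return {
        "sitting": comp["sitting"] - minutes,
        "standing": comp["standing"] + minutes * share_standing,
        "walking": comp["walking"] + minutes * (1 - share_standing),
    }


def expected_lbp_difference(minutes):
    """Expected difference in LBP intensity relative to the mean composition."""
    new = reallocate_from_sitting(MEAN, minutes)
    # ilr2 is unchanged by a proportional reallocation, so its term cancels.
    return B_ILR1 * (ilr1(**new) - ilr1(**MEAN))


diff_30 = expected_lbp_difference(30)
diff_60 = expected_lbp_difference(60)
```

Under these assumptions, `diff_30` and `diff_60` come out close to the reductions of 0.17 and 0.36 reported in the text and in Fig. 2. Note that the relationship is not linear in minutes: moving 60 min gives slightly more than twice the difference of moving 30 min, because the model is linear on the log-ratio scale.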
Figure 2.
Isotemporal substitution illustrating the direction and strength of the association between time in sitting, relative to standing and walking, and LBP intensity. Zeroes on the x and y axes correspond to the average composition (175 min sitting, 205 min standing, 98 min walking), and the mean pain intensity (2.9) in the source population, respectively. Numbers on the x-axis show reallocations of time to/from sitting from/to standing and walking (see running text for details). For example, reallocating 60 min from sitting to standing and walking (41 min to standing and 19 min to walking) is estimated to be associated with an LBP intensity 0.36 lower than the group average [95% CI for the difference (−0.59, −0.12)].
Isotemporal substitutions offer one way to interpret regression estimates based on CoDA. However, they cannot answer the obvious question of ‘what is the expected outcome associated with a specified composition?’ Thus, we suggest developing alternative graphical illustrations to answer this question. Fig. 3 shows an example suited to a three-part composition such as that in the example study. The figure illustrates the estimated LBP intensity associated with a specific composition within the range represented in the source population. For example, a composition of 24% sitting, 52% standing, and 24% walking is estimated to be associated with an LBP intensity of 2.6, while a composition of 49% sitting, 34% standing, and 17% walking corresponds to an estimated intensity of 3.2. While this ternary plot alternative to isotemporal substitution works well for three-part compositions, it remains a challenge to visualize regression models developed for compositions with four or more parts.
Figure 3.
A ternary plot illustrating the estimated LBP intensity at different compositions of sitting, standing, and walking, according to a CoDA regression analysis among 209 blue-collar workers. The gray-blue contour plot indicates the occurrence of compositions in the source population, with the density of the gray-blue color representing the number of workers; lighter blue color, higher density. For example, many workers had compositions of about 20% sitting, 55% standing, and 25% walking (upper right ‘mountain’), and many had about 60% sitting, 30% standing, and 10% walking (lower left mountain). The circles illustrate the estimated pain intensity for selected compositions, sizes coded as shown in the legend. The white dot shows the average composition and pain intensity in the source population.
Results and interpretations based on a standard analysis approach
We also analyzed the same dataset using a standard (non-compositional) multiple linear regression. We observed, as expected, that it was not possible to include all three behaviors in the regression model at the same time, since their perfect collinearity resulted in a singular covariance matrix. Therefore, two different models were constructed, one including sitting and standing (but not walking); the other including sitting and walking (but not standing).
The estimated association of sitting time with LBP intensity differed considerably between the two models. The model including only sitting and standing resulted in a regression coefficient B for sitting of 0.46 (95% CI −0.06, 0.98; P = 0.08), while the effect of sitting was less pronounced, yet more certain, according to the other model including sitting and walking: B = 0.32 (95% CI 0.01, 0.63; P = 0.04). This illustrates an annoying, and misleading, result of analyzing compositional data using standard procedures, i.e. that the association between a particular behavior and LBP may differ depending on which other behavior is omitted from the statistical model.
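The collinearity problem can be demonstrated with a small Python sketch. It assumes, for illustration only, that the three behaviors are expressed as percentages of work time, so that each row sums to 100; the five compositions below are hypothetical and are not the study data. The sketch computes the rank of the design matrix (intercept plus behaviors) by exact Gaussian elimination: a rank lower than the number of columns means that the regression coefficients cannot be uniquely estimated.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical compositions: % sitting, % standing, % walking (sum = 100).
workers = [(24, 52, 24), (49, 34, 17), (60, 30, 10), (20, 55, 25), (37, 43, 20)]

# Design matrices: intercept column followed by the included behaviors.
full_design = [[1, sit, stand, walk] for sit, stand, walk in workers]
without_walking = [[1, sit, stand] for sit, stand, _ in workers]
without_standing = [[1, sit, walk] for sit, _, walk in workers]

rank_full = matrix_rank(full_design)            # 4 columns
rank_no_walking = matrix_rank(without_walking)  # 3 columns
rank_no_standing = matrix_rank(without_standing)
```

The full design has four columns but a lower rank, because walking equals 100 minus sitting minus standing. Dropping one behavior restores full rank, but it also changes what the coefficient for sitting means: with walking omitted and standing held fixed, more sitting can only come at the expense of walking, whereas with standing omitted it comes at the expense of standing. This is one way to understand why the two standard models above give different estimates for sitting.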
The future of CoDA in occupational studies
CoDA is a suitable tool when dealing with data expressing tasks, exposures, and behaviors in terms of time use. Associations between time use and health outcomes, or differences in time use between occupational groups or working conditions, need to be examined with consideration of the constrained and correlated nature of compositional data. CoDA takes into account the complete combination of exposures or behaviors, e.g. when determining effects on health, as opposed to the standard approach of addressing one exposure, behavior, or risk factor at a time. Thus, CoDA shifts the focus of research and practice from the influence of a single exposure to understanding and intervening on exposures occurring together, as parts of the total time spent at work.
We wish to emphasize, however, that occupational risk prevention strategies also need to consider exposures and behaviors during non-work time. As an example, the effect of high occupational physical activity on workers’ health has been shown to depend on the extent of leisure time physical activity (Holtermann ; Hallman ). This extension of an occupational health perspective to include non-work exposures that are otherwise most often covered by public health studies is a prerequisite for understanding the contributions of work to health, well-being, and social equality, in a 24/7 approach (Holtermann , 2020). CoDA offers an attractive opportunity for such analyses in allowing information from non-work time to be included in an occupational research context, as part of a full-day composition (Gupta ).
In the present paper, we exemplify the use of CoDA in cases where exposure is compositional, such as in regression analysis of effects of physical behaviors on a health outcome such as LBP. We emphasize that CoDA is equally justified if the outcome is compositional, or if both the exposure and the outcome are compositional, as illustrated in a recent study on associations between physical behaviors at work and during leisure (Rasmussen ).
CoDA procedures and applications have developed considerably since they first appeared in 1982 (Aitchison, 1982), but implementation of CoDA is still in its infancy in the area of work exposures and health. Thus, research is needed to obtain more experience of the pros and cons of CoDA, including addressing a number of issues that still need to be resolved. One important issue is how to deal with zeroes in compositional parts, since CoDA builds on log-transformed ratios, which do not allow such zeroes. Rounded zeroes resulting from very little time being trimmed to a zero, or from the data sampling strategy not detecting a particular exposure that does, in fact, occur at times, can be handled (Martín-Fernández , 2012). Essential, ‘true’ zeroes, however, are still a major challenge in CoDA (Martín-Fernández ). Another challenge in CoDA is how to illustrate and interpret CoDA-based results in terms of useful metrics and informative diagrams. We suggested one way of visualizing regression results (cf. Fig. 3), and we encourage further developments addressing this issue. Finally, CoDA addresses summary metrics of time use, such as percentages of a time total, but not the real-time sequence of exposure, such as whether periods of walking are followed by standing or by sitting. Some papers have suggested approaches for handling real-time properties of time-use data (e.g. Paraschiv-Ionescu ; Chinapaw ), and integrating CoDA with such methods would be of high interest.
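As an illustration of the rounded-zero issue raised above, the following Python sketch shows one simple option known from the CoDA literature, multiplicative replacement. It is not necessarily the method recommended in the papers cited above, and the replacement value of 0.5 min is a hypothetical choice that would in practice depend on the measurement resolution. Each zero is replaced by a small value, and the non-zero parts are shrunk by a common factor so that the total, and the ratios between the non-zero parts, are preserved.

```python
def multiplicative_replacement(parts, delta):
    """Replace zeroes by `delta` and rescale non-zero parts to keep the total.

    `delta` is an assumed small value below the detection limit; the function
    name and interface are hypothetical, chosen for illustration.
    """
    total = sum(parts)
    n_zero = sum(1 for p in parts if p == 0)
    if n_zero == len(parts):
        raise ValueError("at least one part must be non-zero")
    if delta <= 0 or n_zero * delta >= total:
        raise ValueError("delta must be positive and small relative to the total")
    # Common shrink factor applied to all non-zero parts.
    shrink = 1 - n_zero * delta / total
    return [delta if p == 0 else p * shrink for p in parts]


# Hypothetical worker with no detected walking: sitting, standing, walking.
observed = [300.0, 178.0, 0.0]
replaced = multiplicative_replacement(observed, delta=0.5)
```

After replacement, all parts are strictly positive, so log-ratios can be formed, while the work time total is unchanged. This only addresses rounded zeroes; essential zeroes, as noted above, remain a challenge.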
Conclusion
In this paper, we have presented reasons why tasks, exposures, and behaviors expressed in terms of time need to be processed and analyzed using methods that acknowledge their compositional properties. We have argued that CoDA answers this need, and we have explained how to use CoDA, hoping that the present paper will inspire readers to adopt and apply CoDA. While CoDA may appear unfamiliar and difficult at present, we believe that it will eventually be adopted as a standard approach in studies of work exposures and health.
Funding
Funding for this project was provided by the Danish Working Environment Research Fund (grant no. 01-2015-09 and 01-2015-03) and the Swedish Research Council for Health, Working Life and Welfare (Dnr. 2009-1761).
Conflict of interest
The authors declare no conflict of interest relating to the material presented in this Article. Its contents, including any opinions and/or conclusions expressed, are solely those of the authors.