Rémi Colin-Chevalier, Frédéric Dutheil, Sébastien Cambier, Samuel Dewavrin, Thomas Cornet, Julien Steven Baker, Bruno Pereira.
Abstract
Ever greater technological advances and the democratization of digital tools such as computers and smartphones offer researchers new possibilities to collect large amounts of health data in order to conduct clinical research. Such data, called real-world data, appears to be a perfect complement to traditional randomized clinical trials and has become more important in health decisions. Due to its longitudinal nature, real-world data is subject to specific and well-known methodological issues, namely issues with the analysis of cluster-correlated data, missing data and longitudinal data itself. These concepts have been widely discussed in the literature, and many methods and solutions have been proposed to cope with these issues. For example, mixed and trajectory models have been developed to explore longitudinal data sets, imputation methods can resolve missing data issues, and multilevel models facilitate the treatment of cluster-correlated data. Nevertheless, the analysis of real-world longitudinal occupational health data remains difficult, especially when the methodological challenges overlap. The purpose of this article is to present various solutions developed in the literature to deal with cluster-correlated data, missing data and longitudinal data, which sometimes overlap, in an occupational health context. The novelty and usefulness of our approach are supported by a step-by-step search strategy and an example from the Wittyfit database, which is an epidemiological database of occupational health data. We hope that this article will facilitate the work of researchers in the field and improve the accuracy of future studies.
Keywords: cluster-correlated data; longitudinal data; methodological issues; missing data; modeling; occupational health; real-world data
Year: 2022 PMID: 35742269 PMCID: PMC9222958 DOI: 10.3390/ijerph19127023
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Figure 1. Data diversity of real-world longitudinal occupational health databases (see Appendix A for the novelty and usefulness of our approach and main formulas surrounding cluster-correlated data, missing data, and longitudinal data). SEM: structural equation model; CLPM: “cross-lagged” panel model.
Figure 2. Example of cluster-correlated data: here, a prospective cohort of workers from multiple companies is followed over time.
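To make the idea of cluster correlation concrete, the sketch below estimates an intraclass correlation coefficient (ICC) with the classical one-way ANOVA estimator, treating each company as a cluster. The function name `icc_oneway` and the scores are hypothetical illustrations, not taken from the article or from Wittyfit; a multilevel model would normally estimate this quantity from its variance components instead.

```python
def icc_oneway(clusters):
    """One-way ANOVA estimator of the intraclass correlation, ICC(1).

    `clusters` maps a cluster label (e.g. a company) to a list of outcomes.
    Values near 1 mean that most variability lies between clusters;
    values near 0 (or below) mean that clustering explains little.
    """
    groups = [list(values) for values in clusters.values()]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least 2 non-empty clusters and more observations than clusters")

    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-cluster and within-cluster sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted mean cluster size, which handles unequal cluster sizes.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)

    denominator = ms_between + (n0 - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all observations are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical job-satisfaction scores for workers in three companies.
scores = {
    "company_a": [6.0, 7.0, 6.5],
    "company_b": [3.0, 4.0, 3.5, 4.5],
    "company_c": [8.0, 7.5],
}
icc = icc_oneway(scores)
```

An ICC clearly above zero suggests that observations from the same company cannot be treated as independent, which is the situation multilevel models are designed for.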
Figure 3. Missing data structures. White sections indicate the presence of data, shaded sections their absence.
Imputation methods by type of missing data.
| Missing Completely at Random | Missing at Random | Missing Not at Random |
|---|---|---|
| Expectation maximization algorithm | “Sensitivity analysis” | |
| Multiple imputation | | |
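As an illustration of how multiple imputation yields a single result, the sketch below pools estimates from several imputed data sets using Rubin's rules. It assumes the imputation and the per-data-set model fits have already been done; the function name `pool_rubin` and the numbers are hypothetical and are not taken from the article.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules).

    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least 2 estimates and one variance per estimate")

    q_bar = mean(estimates)   # pooled point estimate
    u_bar = mean(variances)   # average within-imputation variance
    b = variance(estimates)   # between-imputation variance (denominator m - 1)
    total_variance = u_bar + (1 + 1 / m) * b
    return q_bar, total_variance


# Hypothetical regression coefficients from m = 3 imputed data sets,
# each with its squared standard error.
pooled_estimate, pooled_variance = pool_rubin(
    estimates=[1.0, 2.0, 3.0],
    variances=[0.5, 0.5, 0.5],
)
pooled_se = pooled_variance ** 0.5
```

The between-imputation term inflates the variance to reflect the uncertainty caused by the missing values, which single imputation ignores.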
Figure 4. Evolutionary trajectories of job satisfaction among Wittyfit users: (a) mean trajectory using a mixed model (error bars show the standard errors of the coefficients), (b) individual trajectories using a group-based trajectory model, where each group represents a possible evolution of workers’ job satisfaction in the population (e.g., individuals belonging to “Group 3” are characterized by a slight decrease between times 1 and 2, then by a sharp increase beyond time 2).
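For readers who want the two approaches in Figure 4 side by side, one common way to write them is sketched below. The notation is an assumption made for illustration and may differ from the formulas in the article's Appendix A: $y_{ij}$ is the job satisfaction of worker $i$ at time $t_{ij}$, $G$ is the number of trajectory groups, and $\pi_g$ is the probability of belonging to group $g$.

```latex
% (a) Linear mixed model with random intercept and slope:
%     the fixed part (beta_0 + beta_1 t) is the population mean trajectory.
\begin{align}
  y_{ij} &= \beta_0 + \beta_1 t_{ij} + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij}, \\
  (b_{0i}, b_{1i})^\top &\sim \mathcal{N}(0, \Sigma_b), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align}

% (b) Group-based trajectory model: a finite mixture in which each
%     group g has its own mean trajectory and no random effects.
\begin{align}
  f(y_i) &= \sum_{g=1}^{G} \pi_g \, f_g(y_i), \qquad \sum_{g=1}^{G} \pi_g = 1, \\
  \mathbb{E}\left[ y_{ij} \mid \text{group } g \right] &= \beta_{0g} + \beta_{1g} t_{ij} + \beta_{2g} t_{ij}^2 .
\end{align}
```

The mixed model describes a single average trajectory with individual deviations around it, whereas the group-based model describes several distinct average trajectories and assigns each worker to the most probable one.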
Summary of the different models.
| Model | Search Strategy (Number of Articles) | Mathematical Formulation | Missing Data | Advantages | Drawbacks |
|---|---|---|---|---|---|
| MULTILEVEL MODELING | | | | | |
| MLM | 3585 | | | Calculation of the intra-cluster variability apart from the overall variability | Validity of conclusions highly dependent on the specification of the cluster effect and on the definition and interpretation of the model parameters |
| FIRST APPROACHES | | | | | |
| ANOVA for repeated measures | 1547 | | | Comparison of group means across time | No information on possible individual trajectories; time effect as fixed effect |
| Mixed model | 1983 | | | Combination of fixed and random effects; randomly missing estimates unbiased; possible consideration of subject-specific/cluster effect | Only provides information on the average trajectory followed by the population |
| TRAJECTORIES | | | | | |
| GCM | 126 | | | Person-centered method; identification of different trajectories within the population; assigning individuals to a trajectory; possibility to study each trajectory shape with static covariates; possible consideration of cluster effect | Questioning of the validity of individual trajectories caused by data missingness; need for a sufficient amount of data for each individual |
| GMM | 50 | | | | |
| LCA | 154 | | | | |
| GBTM | 78 | | | | |
| COMPLEMENTARY APPROACHES | | | | | |
| SEM | 380 | | | Description of latent process | Need for a large sample size to fit |
| CLPM | 105 | | | Study of causal relationships | Need for a large sample size to fit; strong assumptions (stationarity and synchronicity) |
Legend: Missing data: a: complete-case analysis; b: estimates can be computed despite missing data; *: unbiased estimates. Number of articles (search strategy in PubMed): the keywords (“occupation*” OR “profession*” OR “job-related” OR “work-related”) were linked with the following model-specific keywords. MLM: (multilevel OR multi-level). Repeated measures ANOVA: (ANOVA). Mixed model: (mixed). GCM: (GCM OR growth curve). GMM: (GMM OR growth mixture). LCA: (LCA OR LCGA OR latent class). GBTM: (GBTM OR group-based trajector*). SEM: (SEM OR structural equation). CLPM: (CLPM OR cross-lagged).
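The search strategy in the legend can be sketched as a small query builder. This is a minimal illustration that assumes the occupational keywords and the model-specific keywords were joined with `AND`; the legend only says they were "linked", so the operator, like the names `OCCUPATIONAL_TERMS`, `MODEL_TERMS` and `build_query`, is an assumption.

```python
# Occupational keywords shared by every query (taken from the legend).
OCCUPATIONAL_TERMS = ['"occupation*"', '"profession*"', '"job-related"', '"work-related"']

# Model-specific keywords (taken from the legend).
MODEL_TERMS = {
    "MLM": ["multilevel", "multi-level"],
    "Repeated measures ANOVA": ["ANOVA"],
    "Mixed model": ["mixed"],
    "GCM": ["GCM", "growth curve"],
    "GMM": ["GMM", "growth mixture"],
    "LCA": ["LCA", "LCGA", "latent class"],
    "GBTM": ["GBTM", "group-based trajector*"],
    "SEM": ["SEM", "structural equation"],
    "CLPM": ["CLPM", "cross-lagged"],
}


def or_group(terms):
    """Join terms into a parenthesized OR group."""
    return "(" + " OR ".join(terms) + ")"


def build_query(model):
    """Build the query string for one model (AND is an assumed operator)."""
    return f"{or_group(OCCUPATIONAL_TERMS)} AND {or_group(MODEL_TERMS[model])}"


# One query string per model, ready to paste into the PubMed search box.
queries = {model: build_query(model) for model in MODEL_TERMS}
```

Keeping the keyword lists in one place makes the article counts easier to reproduce and update, although the counts will drift as PubMed indexes new articles.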