Lennard L van Wanrooij1, Marieke P Hoevenaar-Blom1,2, Nicola Coley3,4, Tiia Ngandu5, Yannick Meiller6, Juliette Guillemont7, Anna Rosenberg8, Cathrien R L Beishuizen1, Eric P Moll van Charante9, Hilkka Soininen8,10, Carol Brayne11, Sandrine Andrieu3,4, Miia Kivipelto5,8,12,13, Edo Richard1,2.
Abstract
BACKGROUND: Pooling individual participant data to enable pooled analyses is often complicated by diversity in variables across available datasets. Therefore, recoding original variables is often necessary to build a pooled dataset. We aimed to quantify how much information is lost in this process and to what extent this jeopardizes the validity of analysis results.
Year: 2020 PMID: 32396543 PMCID: PMC7217432 DOI: 10.1371/journal.pone.0232970
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Main study characteristics of pooled studies.
| | preDIVA | FINGER | MAPT |
|---|---|---|---|
| Start year | 2006 | 2009 | 2008 |
| End year | 2015 | 2014 | 2014 |
| Participants (n) | 3526 | 1260 | 1679 |
| Age range (years) | 70–78 | 60–75 | >70 |
| Intervention duration (years) | 6–8 | 2 | 3 |
| Primary outcomes | dementia incidence, disability level | change in cognitive function | change in cognitive function |
| Secondary outcomes | cardiovascular events, change in cognitive function, depression | cardiovascular events, dementia incidence, depression, disability level, quality of life, health resources utilization | cardiovascular events, functional assessment, depression, dementia incidence, health resources utilization |
Fig 1. Flowchart of step 1 and step 2 analyses.
Fig 2. Pooling accuracy for three data recoding categories.
Change in the beta coefficient of an association when the recoded variable, rather than the original variable, is used as a confounder, shown for variables with less than 80% explained variance after recoding. The comparison assesses the impact of information loss on the validity of associations.
| Original variable | Recoded variable | Study | Proportion of variance explained | Dependent variable | Independent variable | Beta of independent variable when using original confounder | Beta of independent variable when using recoded confounder | Change in beta >10% |
|---|---|---|---|---|---|---|---|---|
| Number of years stopped smoking | Stopped smoking more than 3 years (yes/no) | preDIVA | 0.10 | BMI | Age | -0.027 (-0.098 to 0.043, p = 0.448) | -0.042 (-0.112 to 0.028, p = 0.237) | yes |
| Glucose level (mmol/L) | Glucose normal (yes/no) | preDIVA | 0.20 | Diagnosis of diabetes | LDL | -0.076 (-0.086 to -0.065, p < .001) | -0.125 (-0.138 to -0.112, p < .001) | yes |
| Glucose level (g/L) | Glucose normal (yes/no) | MAPT | 0.35 | Diagnosis of diabetes | LDL | -0.038 (-0.085 to 0.009, p = 0.111) | -0.045 (-0.090 to 0.000, p < .001) | yes |
| Zung | Depression (yes/no) | FINGER | 0.42 | MMSE | Age | -0.043 (-0.068 to -0.017, p = 0.001) | -0.045 (-0.070 to -0.019, p = 0.001) | no |
| Glucose level (mmol/L) | Glucose normal (yes/no) | FINGER | 0.51 | Diagnosis of diabetes | LDL | -0.083 (-0.102 to -0.064, p < .001) | -0.084 (-0.102 to -0.065, p < .001) | no |
| GDS-15 | Depression | preDIVA | 0.58 | MMSE sum score | Age | -0.016 (-0.040 to 0.007, p = 0.179) | -0.020 (-0.045 to 0.004, p = 0.107) | yes |
| GDS-15 sum score | Depression | MAPT | 0.63 | MMSE sum score | Age | -0.047 (-0.064 to -0.030, p < .001) | -0.048 (-0.066 to -0.031, p < .001) | no |
| Heart disease: Father (yes/no) Mother (yes/no) Sibling (yes/no) Child (yes/no) | Family history of heart disease (yes/no) | preDIVA | 0.78 | History of heart disease (yes/no) | LDL | -0.130 (-0.145 to -0.114, p < .001) | -0.130 (-0.145 to -0.114, p < .001) | no |
Abbreviations: BMI, body mass index; LDL, low-density lipoprotein; Zung, Zung Self-Rating Depression Scale; MMSE, Mini-Mental State Examination; GDS, Geriatric Depression Scale.
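The "Change in beta >10%" column can be reproduced from the two point estimates in each row. The sketch below assumes the change is measured as the absolute difference relative to the beta obtained with the original confounder; the page does not state the exact formula, and the function name `relative_change` is illustrative only.

```python
def relative_change(beta_original: float, beta_recoded: float) -> float:
    """Absolute change in beta, relative to the beta from the original confounder.

    Assumed definition; the source does not spell out the formula.
    """
    return abs(beta_recoded - beta_original) / abs(beta_original)


# (study, original variable) -> (beta with original, beta with recoded confounder),
# point estimates copied from the table above.
betas = {
    ("preDIVA", "Years stopped smoking"): (-0.027, -0.042),
    ("preDIVA", "Glucose level"): (-0.076, -0.125),
    ("MAPT", "Glucose level"): (-0.038, -0.045),
    ("FINGER", "Zung"): (-0.043, -0.045),
    ("FINGER", "Glucose level"): (-0.083, -0.084),
    ("preDIVA", "GDS-15"): (-0.016, -0.020),
    ("MAPT", "GDS-15"): (-0.047, -0.048),
    ("preDIVA", "Family heart disease"): (-0.130, -0.130),
}

# Flag rows whose beta shifts by more than 10% after recoding.
flags = {key: relative_change(*pair) > 0.10 for key, pair in betas.items()}

for (study, variable), flagged in flags.items():
    print(f"{study:8s} {variable:24s} change >10%: {'yes' if flagged else 'no'}")
```

Under this assumed definition, the flags agree with the yes/no column reported for all eight rows.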
Fig 3. R² values for simulations in which numbers between 0 and 1000 are recoded to 0 and those above 1000 to 1.
For all iterations, 1000 numbers between 0 and 1000 were sampled and recoded to 0. Numbers that were recoded to 1 originated from sampling numbers between 1001 and N with sample size K. Left: K was a constant of 1000, N was between 1 and 34 times as high or low as 1000 (simulation type 1a, red circles); N was a constant of 2001, K was between 1 and 34 times as high or low as 1000 (simulation type 1b, blue triangles). Right: N was between 1 and 34 times as high or low as 1000 and K was between 1 and 34 times as high or low as 1000 (simulation type 2).
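A minimal sketch of one iteration of such a simulation follows. It rests on several assumptions not confirmed by the text: values are drawn as uniform integers, R² is taken as the squared Pearson correlation between the original values and their 0/1 recoding, and only upper bounds N of at least 1001 are handled. The function names `r_squared` and `simulate` are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def simulate(n_upper: int, k: int, seed: int = 0) -> float:
    """One iteration: R² between original values and their binary recoding.

    1000 values in [0, 1000] are recoded to 0; k values in [1001, n_upper]
    are recoded to 1. Uniform integer sampling is an assumption.
    """
    rng = random.Random(seed)
    low = [rng.randint(0, 1000) for _ in range(1000)]
    high = [rng.randint(1001, n_upper) for _ in range(k)]
    original = low + high
    recoded = [0] * len(low) + [1] * len(high)
    return r_squared(original, recoded)


# Type 1a style: K fixed at 1000, upper bound N varied.
for n_upper in (2001, 5000, 34000):
    print("N =", n_upper, "R2 =", round(simulate(n_upper, 1000), 3))

# Type 1b style: N fixed at 2001, sample size K varied.
for k in (100, 1000, 10000):
    print("K =", k, "R2 =", round(simulate(2001, k), 3))
```

In this setup, widening the range of values collapsed into the single category 1 increases the variance hidden within that category, so the binary variable explains a smaller share of the original variance.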