Kamala Adhikari, Scott B Patten, Alka B Patel, Shahirose Premji, Suzanne Tough, Nicole Letourneau, Gerald Giesbrecht, Amy Metcalfe.
Abstract
Pooling data from pre-existing datasets can increase study sample size and statistical power to answer a research question. However, individual datasets may contain variables that measure the same construct differently, which poses challenges for data pooling. Variable harmonization, an approach that can generate comparable datasets from heterogeneous sources, can address this issue in some circumstances. As an illustrative example, this paper describes the data harmonization strategies that helped generate comparable datasets across two Canadian pregnancy cohort studies: All Our Families, and Alberta Pregnancy Outcomes and Nutrition. Variables were harmonized considering multiple features across the datasets: the construct measured, the question asked and its response options, the measurement scale used, the frequency of measurement, the timing of measurement, and the data structure. Based on these features, variables were classified as completely matching, partially matching, or completely unmatching across the datasets. Variables that were an exact match were pooled as is. Partially matching variables were harmonized or processed under a common format across the datasets considering the frequency of measurement, the timing of measurement, the measurement scale used, and the response options. Variables that were completely unmatching could not be harmonized into a single variable. The variable harmonization strategies that were used to generate comparable cohort datasets for data pooling are applicable to other data sources. Future studies may employ or evaluate these strategies, which permit researchers to answer novel research questions in a statistically efficient, timely, and cost-efficient manner that could not be achieved using a single data source.
Keywords: cohort studies; comparable dataset; data harmonization; data pooling or combination; harmonization strategies
Year: 2021 PMID: 34888420 PMCID: PMC8631396 DOI: 10.23889/ijpds.v6i1.1680
Source DB: PubMed Journal: Int J Popul Data Sci ISSN: 2399-4908
| Variable | AOF | APrON | Matching assessment and harmonization | Pooled variable |
|---|---|---|---|---|
| Maternal age | Variable name: Q1MMAGE2; construct: maternal age at recruitment | Variable name: MAQ | Complete matching of construct; complete matching of response or data type and coding, except missing value coding (partial matching) | Maternal age variables with continuous data combined and recoded as: <35 years; ≥35 years; . Missing |
| Marital status | Variable name: Q1MMSTAT1; 1 Single; 2 Single with partner; 3 Married; 4 Common-law; 5 Divorced; 6 Separated; . Missing | Variable name: MAGB1; 0 Single; 1 Married; 2 Divorced; 3 Common-law; 4 Widowed; 5 Separated; 999 Missing | Complete matching of construct; partial matching of variable response and coding; recoded as: 0 Single; 1 Married/common-law; 2 Divorced/separated/widowed; . Missing | Variables with the following categories combined: 0 Single; 1 Married/common-law; 2 Divorced/separated/widowed; . Missing |
| Maternal ethnicity | Variable name: Q1METH1_2; construct: ethnic origin; 0 Others; 1 White/Caucasian | Variable name: MAGB16; 1 Caucasian; 2 Chinese; 3 Filipino; 4 Japanese; 5 Korean; 6 Latin American; 7 Aboriginal/Native; 8 South Asian; 9 South East Asian; 10 Arab; 11 West Asian; 12 Black; 13 Others | Complete matching of construct; partial matching of variable response and coding; recoded as: 0 Others; 1 White/Caucasian; variable renamed with same name | Variables with the following categories combined: 0 Others; 1 White/Caucasian |
| Body mass index | Variable name: Q1MHW8 | Variable names: MAANTH2 and MBANTH2 | Complete matching of construct; partial matching of data coding or management system; combined 2 weight variables into one; combined 2 height variables into one; recoded missing (999) into (.); calculated body mass index | Combined continuous body mass index variable and recoded as 4 categories: 0 Underweight <18.5; 1 Normal weight 18.5–24.9; 2 Overweight 25–29.9; 3 Obese 30+ |
| Parity | Variable name: Q1MPPI1_1; 0 No previous births; 1 Previous birth to a fetus (at least once); . Missing; 1 to 7; Missing (.) | Variable name: MAPI3; 0 to 4; missing (999) | Complete matching of construct; partial matching of variable response and coding; recoded as: 1 Primiparous; 2 Multiparous; 3 Grand multiparous (>2 live births); . Missing | Variables with the following categories combined: 1 Primiparous; 2 Multiparous; 3 Grand multiparous; . Missing |
| Depression during pregnancy | Variable name: Q1MEDPS | Variable name: MAEPDS_Score | Complete matching of construct; partial matching in terms of number of measurements and measurement time during pregnancy (week of gestation); derived: EPDS score in first trimester; EPDS score in second trimester; EPDS score in third trimester | Three combined variables for depression during pregnancy: EPDS score in first trimester; EPDS score in second trimester; EPDS score in third trimester |
| Anxiety during pregnancy | Variable name: Q1MSSAI | Variable name: MASCL_Score | Completely unmatching variable. Action taken: harmonized the anxiety score measured by each scale for each trimester using the same process as for depression during pregnancy. Accordingly, three separate variables for anxiety during pregnancy by trimester (as for depression) were created for each anxiety scale. Overlapping participants and their anxiety data measured by both scales were identified. | Anxiety data measured by two different scales were pooled as two different variables. For the 231 participants who participated in both studies, each variable contained anxiety data. For independent participants, each variable contained missing values if they did not have anxiety data measured by the same scale. |
| Anxiety during pregnancy, measured by EPDS-3A | Variable name: Q1MEDPS | Variable name: MAEPDS_Score | Complete matching of construct; partial matching in terms of number of measurements and measurement time during pregnancy (week of gestation); derived: EPDS-3A score in first trimester; EPDS-3A score in second trimester; EPDS-3A score in third trimester | Three combined variables for anxiety during pregnancy: EPDS-3A score in first trimester; EPDS-3A score in second trimester; EPDS-3A score in third trimester |
Note: AOF: All Our Families; APrON: Alberta Pregnancy Outcomes and Nutrition; EPDS: Edinburgh Postnatal Depression Scale; STAI-20: State-Trait Anxiety Inventory-State 20-item scale; SCL-90: Symptom Checklist-90; EPDS-3A: Edinburgh Postnatal Depression Scale, anxiety subscale.
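The recoding rules in the table can be illustrated with a short Python sketch. This is not the authors' code: the function names, the record layout, and the units (kilograms and metres) are hypothetical, and mapping the AOF category "Single with partner" to "Single" is an assumption, since the table does not say where that category was placed. The category codes, cut-points, and missing-value codes are taken from the table.

```python
from typing import Optional

# Missing-value codes shown in the table: "." (AOF) and 999 (APrON).
MISSING_CODES = {999, "."}

# Pooled marital status: 0 single, 1 married/common-law,
# 2 divorced/separated/widowed (codes from the table).
AOF_MARITAL = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}  # 2 -> 0 is an assumption
APRON_MARITAL = {0: 0, 1: 1, 2: 2, 3: 1, 4: 2, 5: 2}


def clean(value):
    """Map cohort-specific missing codes to None."""
    return None if value is None or value in MISSING_CODES else value


def harmonize_marital(value, cohort: str) -> Optional[int]:
    """Recode a cohort-specific marital status code to the pooled coding."""
    mapping = AOF_MARITAL if cohort == "AOF" else APRON_MARITAL
    value = clean(value)
    return None if value is None else mapping.get(value)


def age_group(age) -> Optional[str]:
    """Dichotomize continuous maternal age at 35 years."""
    age = clean(age)
    if age is None:
        return None
    return "<35" if age < 35 else ">=35"


def bmi_category(weight_kg, height_m) -> Optional[int]:
    """Compute BMI and recode it into the four pooled categories."""
    weight_kg, height_m = clean(weight_kg), clean(height_m)
    if weight_kg is None or height_m is None:
        return None
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return 0  # underweight
    if bmi < 25:
        return 1  # normal weight
    if bmi < 30:
        return 2  # overweight
    return 3      # obese


# Hypothetical records, one per cohort.
records = [
    {"cohort": "AOF", "marital": 4, "age": 36, "weight_kg": 70.0, "height_m": 1.65},
    {"cohort": "APrON", "marital": 999, "age": 29, "weight_kg": 999, "height_m": 1.70},
]
pooled = [
    {
        "cohort": r["cohort"],
        "marital": harmonize_marital(r["marital"], r["cohort"]),
        "age_group": age_group(r["age"]),
        "bmi_cat": bmi_category(r["weight_kg"], r["height_m"]),
    }
    for r in records
]
```

Keeping one mapping per cohort makes each recoding decision explicit and auditable, which matters when the source codings overlap numerically but differ in meaning (for example, code 2 is "Single with partner" in one cohort and "Divorced" in the other).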