Kylie E Hunter, Angela C Webster, Mike Clarke, Matthew J Page, Sol Libesman, Peter J Godolphin, Mason Aberoumand, Larysa H M Rydzewska, Rui Wang, Aidan C Tan, Wentao Li, Ben W Mol, Melina Willson, Vicki Brown, Talia Palacios, Anna Lene Seidler.
Abstract
Individual participant data meta-analyses enable detailed checking of data quality and more complex analyses than standard study-level synthesis of summary data based on publications. However, there is limited existing guidance on the specific systematic checks that should be undertaken to confirm and enhance data quality for individual participant data meta-analyses and how to conduct these checks. We aim to address this gap by developing a checklist of items for data quality checking and cleaning to be applied to individual participant data meta-analyses of randomised trials. This study will comprise three phases: 1) a scoping review to identify potential checklist items; 2) two e-Delphi survey rounds among an invited panel of experts followed by a consensus meeting; and 3) pilot testing and refinement of the checklist, including development of an accompanying R-markdown program to facilitate its uptake.
Year: 2022 PMID: 36219622 PMCID: PMC9553056 DOI: 10.1371/journal.pone.0275893
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. Stages of an individual participant data meta-analysis, highlighting where CHIPPR fits in and interrelated types of data checks (adapted from Rydzewska & Tierney) [4].
Dimensions of data quality.
| Data dimension | Definition | Examples of non-compliance |
|---|---|---|
| Completeness | The degree to which i) all required variables, ii) all required records, and iii) all required data values are present in the dataset | • Birthweight variable is missing from dataset, but is reported in publication |
| | | • Not all randomised participants are present in the dataset |
| | | • Birthweight is not provided for every record/participant |
| | The degree to which a data pattern meets expectations. Includes time series trends, expectations of randomness and digit preference | • Allocation patterns not random |
| | | • Although every digit is equally likely, a digit is never 5 or is mostly 8 |
| | | • Participant height at time 2 is less than height at time 1 |
| | The degree to which values and variables are in accordance with the IPD study codebook, either in their original form or after transformation. Includes variable definition, format specification, categories and outcome scales | • Nutritional intake should be in kilojoules but is provided in calories |
| | | • Post-partum haemorrhage defined as blood loss >600 ml rather than >500 ml |
| | | • Date recorded as mm/dd/yy instead of dd/mm/yyyy |
| | | • Sex should be coded male = 1, female = 2, but is female = 1, male = 2 |
| Consistency | The degree to which i) data values match corresponding publications or reports (external consistency), or ii) data values of two or more variables within a participant are logical or comply with a rule or equation (internal consistency). Do not confuse consistency with accuracy or correctness | • Publication reports 52/168 children had obesity; data shows 49/170 |
| | | • Protocol states eligible age is ≥18 years, but participant is 15 years old |
| | | • Participant body mass index does not equal their weight/height² |
| | | • Participant age at baseline was 50, but age at follow-up was 45 |
| | | • Hours slept at night plus hours slept during the day is >24 hours |
| | | • Date of follow-up assessment occurs before date of enrolment |
| Validity | The degree to which dates or data values are within a pre-specified or valid range. A data value can be valid but not accurate | • An enrolment date that occurs outside of study start and end dates |
| | | • Score on a scale of 1–10 is 12 |
| | | • Gestational age at birth is 54 weeks (outlier), but valid range is 20–45 weeks |
| Plausibility | The degree to which data values match knowledge of the real world (this can be seen as a type of consistency). Values may be possible but not plausible | • 90% of participants died despite scoring well on measures of health |
| | | • Participant self-reported intense physical activity as 20 hours/day |
| | | • 80% of routine visits occurred on a public holiday |
| Uniqueness | The degree to which records occur only once in a dataset | • Duplicate participant identifiers |
| | | • Identical values for all variables except participant identifier for ≥2 records |
Adapted from Black et al. [12, 13].
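As a rough illustration of how some of the examples in the table could be turned into automated checks, the sketch below flags missing values, duplicate participant identifiers, out-of-range values and one internal-consistency rule (body mass index versus weight/height²). It is not the R-markdown program described in the abstract: the records, field names, function names and the rounding tolerance of 0.1 are all hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical toy dataset; field names and values are illustrative only.
records = [
    {"id": "P01", "weight_kg": 70.0, "height_m": 1.75, "bmi": 22.9, "score": 7},
    {"id": "P02", "weight_kg": 82.0, "height_m": 1.80, "bmi": 30.0, "score": 12},
    {"id": "P03", "weight_kg": 60.0, "height_m": 1.60, "bmi": 23.4, "score": None},
    {"id": "P01", "weight_kg": 70.0, "height_m": 1.75, "bmi": 22.9, "score": 7},
]


def missing_values(rows, field):
    """Return ids of records that have no value for `field`."""
    return [r["id"] for r in rows if r.get(field) is None]


def duplicate_ids(rows):
    """Return participant identifiers that occur more than once."""
    counts = Counter(r["id"] for r in rows)
    return sorted(i for i, n in counts.items() if n > 1)


def out_of_range(rows, field, low, high):
    """Return ids whose non-missing value lies outside [low, high]."""
    return [
        r["id"]
        for r in rows
        if r.get(field) is not None and not (low <= r[field] <= high)
    ]


def bmi_mismatches(rows, tol=0.1):
    """Return ids whose recorded BMI differs from weight / height**2.

    `tol` is an assumed allowance for rounding in the recorded BMI.
    """
    flagged = []
    for r in rows:
        weight, height, bmi = r.get("weight_kg"), r.get("height_m"), r.get("bmi")
        if weight is None or height is None or bmi is None:
            continue  # missing inputs are reported by missing_values instead
        if abs(bmi - weight / height ** 2) > tol:
            flagged.append(r["id"])
    return flagged


if __name__ == "__main__":
    print("Missing score:", missing_values(records, "score"))
    print("Duplicate ids:", duplicate_ids(records))
    print("Score outside 1-10:", out_of_range(records, "score", 1, 10))
    print("BMI inconsistent:", bmi_mismatches(records))
```

Each function returns the identifiers of flagged records rather than modifying the data, so that any queries can be raised with the trial investigators before values are changed.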
Fig 2. Overview of project phases.