Marianne Huebner, Werner Vach, Saskia le Cessie, Carsten Oliver Schmidt, Lara Lusa.
Abstract
BACKGROUND: In the data pipeline from the data collection process to the planned statistical analyses, initial data analysis (IDA) typically takes place after the end of data collection and before the planned analyses, and it does not touch the research questions. A systematic process for IDA and clear reporting of its findings would help readers understand the potential shortcomings of a dataset, such as missing values, subgroups with small sample sizes, or shortcomings in the collection process, and evaluate the impact of these shortcomings on the research results. Clear reporting of findings is also relevant when making datasets available to other researchers. Initial data analyses can provide valuable insights into the suitability of a dataset for a future research study. Our aim was to describe the practice of reporting initial data analyses in observational studies in five highly ranked medical journals, with a focus on data cleaning, screening, and reporting of findings that led to a potential change in the analysis plan.
Keywords: Initial data analysis; Observational studies; Reporting; STRATOS initiative
Year: 2020 PMID: 32169053 PMCID: PMC7071755 DOI: 10.1186/s12874-020-00942-y
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Search and selection of articles
| | NEJM | JCO | Lancet | JAMA | CIRC | Total |
|---|---|---|---|---|---|---|
| Selected papers via PubMed search | 11 | 63 | 21 | 29 | 68 | 192 |
| Included according to criteria after reviewing abstract | 7 | 22 | 12 | 19 | 45 | 105 |
| Included according to criteria after reviewing full text article | 6 | 21 | 10 | 19 | 44 | 100 |
| Randomly selected for review | 5 | 5 | 5 | 5 | 5 | 25 |
Fig. 1 Flow diagram for initial data analysis reporting
Characteristics of the included studies
| Study | Journal | Location | Years of participant selectionᵃ | Study sizeᵃ | Data sourceᵃ |
|---|---|---|---|---|---|
| Inohara et al. | JAMA | USA | 2013–2016 | 141,311 | Stroke registry |
| Purnell et al. | JAMA | USA | 1995–2014 | 453,162 | Transplant registry |
| Reges et al. | JAMA | Israel | 2005–2015 | 33,540 | Multiple hospitals |
| Snyder et al. | JAMA | USA | 2006–2007 | 8529 | Cancer registry |
| Yu et al. | JAMA | China | 2004–2008 | 271,217 | Nationwide Biobank |
| Biccard et al. | Lancet | 25 African countries | 2016 | 11,422 | Multiple hospitals |
| Wood et al. | Lancet | 19 high income countries | 1964–2010 | 599,912 | Multiple CVD registries and a biobank |
| Dziadzko et al. | Lancet | USA | 2000–2010 | 1294 | Single hospital and a medical registry of area residents |
| Zylbersztejn et al. | Lancet | UK, Sweden | 2003–2013 | 4,946,246 | Hospital episode registries, birth and death registries |
| Gilbert et al. | Lancet | UK | 2013–2015 | 22,139 | Hospital episode registry; death registry |
| Alexander et al. | Circulation | Australia | 1987–1996 | 80 | Childhood cardiomyopathy registry |
| Nazerian et al. | Circulation | Brazil, Germany, Italy, Switzerland | 2014–2016 | 1850 | Multiple hospitals |
| Pollack et al. | Circulation | USA, Canada | 2011–2015 | 2500 | Resuscitation outcomes registry |
| Puelacher et al. | Circulation | Switzerland | 2014–2015 | 2018 | Single hospital |
| Chao et al. | Circulation | Taiwan | 1996–2015 | 32,160 | Health Insurance database |
| Chow et al. | JCO | USA | 1962–2001 | 13,060 | Multiple hospitals |
| Kenzik et al. | JCO | USA | 2000–2011 | 72,408 | Cancer registry and Health insurance database |
| Degnim et al. | JCO | USA | 1967–2001 | 669 | Single hospital |
| Gundle et al. | JCO | USA | 1989–2014 | 2217 | Single hospital |
| Clarke et al. | JCO | USA | 2003–2015 | 944,227 | Multiple hospitals |
| Hoen et al. | NEJM | French territories in the Americas | 2016 | 555 | ZIKV pregnancy population cohort |
| Amarenco et al. | NEJM | Europe, Asia, Latin America | 2009–2011 | 3356 | Stroke registry |
| Calderon et al. | NEJM | Israel | 1980–2014 | 1,522,731 | Renal registry and population cohort |
| Kyle et al. | NEJM | USA | 1960–1994 | 1384 | Single hospital |
| Mead et al. | NEJM | USA | 2016–2017 | 184 | ZIKV male population cohort |
ᵃ Only the development sample size (i.e., not the validation sample size) or the population of main interest for the analysis (i.e., not matched populations) was included here.
Number of papers with data screening statements by location in the paper
| Data screening statement | Mentioned in papers, n (%) | M | R | D | S |
|---|---|---|---|---|---|
| Description of non-outcome variables | 25 (100%) | 5 | 24 | 0 | 15 |
| Description of missing values of non-outcome variables | 19 (76%) | 6 | 12 | 0 | 6 |
| Reporting association between non-outcome variables | 14 (56%) | 5 | 6 | 0 | 5 |
| Description of non-outcome variables for subgroups | 21 (84%) | 2 | 19 | 1 | 11 |
| Description of transformation of non-outcome variables | 10 (40%) | 4 | 4 | 0 | 2 |
| Description of outcome variable(s) | 25 (100%) | 2 | 25 | 0 | 9 |
| Information of missing values for outcome variables | 12 (48%) | 3 | 7 | 3 | 4 |
| Description of methods for outcome variables | 19 (76%) | 13 | 4 | 0 | 1 |
| Description of missingness of subjects | 15 (60%) | 1 | 11 | 2 | 5 |
| Description of transformations in outcome variables | 7 (28%) | 1 | 6 | 0 | 0 |
Abbreviations (location in paper): M, Methods; R, Results; D, Discussion; S, Supplement
Number of papers with changes of the analysis plan statements by location in the paper
| Reasons for change | Number of papers, n (%) | M | R | D | S |
|---|---|---|---|---|---|
| Unexpected Values | 2 (8%) | 2 | 0 | 1 | 0 |
| Heterogeneity | 1 (4%) | 0 | 1 | 0 | 0 |
| Unexpected confounding | 2 (8%) | 1 | 1 | 2 | 0 |
| Variable Distribution | 4 (16%) | 3 | 1 | 1 | 0 |
| Other Data Properties | 2 (8%) | 2 | 0 | 0 | 0 |
| Missing Data | 5 (20%) | 4 | 1 | 1 | 0 |
Abbreviations (location in paper): M, Methods; R, Results; D, Discussion; S, Supplement
Recommendations for reporting practice for initial data analyses
| | Current reporting practice | Recommendations for improved reporting practice |
|---|---|---|
| 1 | Information on IDA is sparse and may suffer from selective reporting | Full reporting of relevant results as supplementary material and reporting of all results with impact on analysis/interpretation in the paper |
| 2 | Information on IDA can be found in all sections of a paper. | • IDA methodology to be described in Methods; • IDA results to be described in Methods or Results; • Impact of IDA on interpretation to be described in Discussion. |
| 3 | Distinction between pre-planned decisions and IDA-driven decisions is unclear. | Pre-planned decisions should be reported in Methods; IDA-driven alterations of the analysis plan should be reported with motivation in Methods. |
| 4 | Characteristics of participants are listed without comments. | Participants’ characteristics should be checked for consistency with expectations and for potential impact on analysis and interpretation. At a minimum, a statement should be included to confirm that no expectations were violated. |
| 5 | Reporting on missingness is incomplete. | Full reporting of missingness, e.g., a flow chart describing unit missingness and a table describing item missingness of variables |
| 6 | Associations among variables are not reported. | Associations not involving the research question but with potential impact on interpretation of results should be reported |
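
Recommendation 5 above asks for a table of item missingness per variable. The following is a minimal sketch of how such a table might be produced, not a method taken from the reviewed papers. The toy records, the variable names (`age`, `sex`, `bmi`), and the helper functions `item_missingness` and `complete_cases` are all hypothetical and serve only as illustration; `None` is assumed to mark a missing item.

```python
from typing import Any, Dict, List

# Hypothetical toy records (NOT data from any reviewed study).
# Assumption: None marks a missing item.
records: List[Dict[str, Any]] = [
    {"age": 54, "sex": "F", "bmi": None},
    {"age": None, "sex": "M", "bmi": 27.1},
    {"age": 61, "sex": "F", "bmi": 24.3},
    {"age": 47, "sex": None, "bmi": None},
]


def item_missingness(rows: List[Dict[str, Any]],
                     variables: List[str]) -> List[Dict[str, Any]]:
    """Return one summary row per variable: count and percent of missing items."""
    n = len(rows)
    table = []
    for var in variables:
        # A key that is absent from a record is treated the same as None.
        n_missing = sum(1 for row in rows if row.get(var) is None)
        pct = 100.0 * n_missing / n if n else 0.0
        table.append({
            "variable": var,
            "n_missing": n_missing,
            "pct_missing": round(pct, 1),
        })
    return table


def complete_cases(rows: List[Dict[str, Any]], variables: List[str]) -> int:
    """Count records with no missing value in any of the listed variables."""
    return sum(
        1 for row in rows
        if all(row.get(var) is not None for var in variables)
    )


if __name__ == "__main__":
    variables = ["age", "sex", "bmi"]
    for summary in item_missingness(records, variables):
        print(f"{summary['variable']}: {summary['n_missing']} missing "
              f"({summary['pct_missing']}%)")
    print("complete cases:", complete_cases(records, variables))
```

A table like this covers item missingness only. Unit missingness (participants lost between eligibility and analysis) would still need a flow chart, as the recommendation notes, and the complete-case count is one number such a chart might end with.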