Exploratory analyses in aetiologic research and considerations for assessment of credibility: mini-review of literature.

Kim Luijken¹, Olaf M Dekkers², Frits R Rosendaal², Rolf H H Groenwold²,³.

Abstract

OBJECTIVE: To provide considerations for reporting and interpretation that can improve assessment of the credibility of exploratory analyses in aetiologic research.
DESIGN: Mini-review of the literature and account of exploratory research principles.
SETTING: This study focuses on a particular type of causal research, namely aetiologic studies, which investigate the causal effect of one or multiple risk factors on a particular health outcome or disease. The mini-review included aetiologic research articles published in four epidemiology journals in the first issue of 2021: American Journal of Epidemiology, Epidemiology, European Journal of Epidemiology, and International Journal of Epidemiology, specifically focusing on observational studies of causal risk factors of diseases.
MAIN OUTCOME MEASURES: Number of exposure-outcome associations reported, grouped by type of analysis (main, sensitivity, and additional).
RESULTS: The journal articles reported many exposure-outcome associations: a mean number of 33 (range 1-120) exposure-outcome associations for the primary analysis, 30 (0-336) for sensitivity analyses, and 163 (0-1467) for additional analyses. Six considerations were discussed that are important in assessing the credibility of exploratory analyses: research problem, protocol, statistical criteria, interpretation of findings, completeness of reporting, and effect of exploratory findings on future causal research.
CONCLUSIONS: Based on this mini-review, exploratory analyses in aetiologic research were not always reported properly. Six considerations for reporting of exploratory analyses in aetiologic research were provided to stimulate a discussion about their preferred handling and reporting. Researchers should take responsibility for the results of exploratory analyses by clearly reporting their exploratory nature and specifying which findings should be investigated in future research and how. © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

MeSH:
Humans
Research Design

Year: 2022 PMID: 35504648 PMCID: PMC9062703 DOI: 10.1136/bmj-2021-070113

Source DB: PubMed Journal: BMJ ISSN: 0959-8138

Introduction

Reports of aetiologic studies often include results of multiple exploratory analyses, with the aim of identifying topics for future research. Although this form of reporting might seem reasonable, it is not without risk, because assessing the credibility of exploratory findings is generally more complicated than assessing the credibility of the results of a confirmatory study. The origin of exploratory data analysis can be traced back at least to Tukey in the 1960s and 1970s,1 2 who encouraged statisticians to develop visualisation techniques for representing and capturing structures in datasets to establish new research questions. These new research questions should subsequently be answered with independent datasets (often termed confirmatory analysis). For example, when a new biomarker is thought to be part of a known causal pathway, performing a small preparatory exploratory study before conducting a full blown large cohort study seems worthwhile, because the cohort study is financially expensive and requires large investments of resources. Similarly, if a known exposure-outcome effect is thought to vary across subgroups of the population, exploring this idea first before embarking on confirmative analyses of effect heterogeneity seems appropriate.

Even when researchers consider an analysis to be exploratory, a hypothesis is easily promoted to a fact. For example, findings in journal articles can be exaggerated to more certain statements in press releases and news articles.3 In medical science in particular, where results are sometimes quickly implemented in clinical practice, researchers should take responsibility for the results they report. The Hippocratic oath (“First, do no harm”) applies as well to medical research as it does to clinical practice.

In this paper, we discuss issues that complicate the interpretation of exploratory analyses in causal studies. Causal research can refer to different types of research, such as randomised studies or intervention studies. We do not address these studies in our manuscript; we focus on aetiologic research, in which causes of disease are investigated. Specifically, the causal effect of risk factors on a health outcome or disease is studied, typically in an observational setting. We provide practical pointers for researchers on how to report exploratory analyses in aetiologic research and how to clarify what the exploratory results imply for future research and implementation in practice. We hope to encourage a discussion about the preferred handling and reporting of these analyses.

Methods

Exploratory analyses in aetiologic research

The term exploratory analysis typically refers to analyses for which the hypothesis was not specified before the data analysis.4 Considering exploratory analyses in a broader sense, however, is probably more relevant in aetiologic research, because of the observational data and clustering of analyses within cohorts. We use the term exploratory analyses here to indicate analyses that are initial and preliminary steps towards solving a research problem. Exploratory analyses are often conducted in addition to planned primary analyses of a study. We do not consider sensitivity analyses, where the main hypothesis is evaluated under different assumptions, to be exploratory in this paper. We also do not consider outcomes that are evaluated as a secondary objective but are correlated with the primary outcome to be exploratory, because these analyses contribute to the investigation of the primary research question. Genome-wide association studies, where the exploratory nature of analyses is commonly accounted for by looking at multiple testing,5 are beyond the scope of this paper.

Mini-review and overview of existing reporting guidance

Before we discuss considerations about the reporting of exploratory aetiologic studies, we wanted to illustrate some of the aspects of exploratory studies that need explicit reporting. Hence we performed a small review of published aetiologic studies. We identified all articles on original research in four journals in their first issue of 2021: American Journal of Epidemiology, Epidemiology, European Journal of Epidemiology, and International Journal of Epidemiology. We excluded studies that did not look at an aetiologic research question, such as prediction studies, studies on therapeutic interventions, and randomised trials. For each article, we counted the number of primary analyses, sensitivity analyses, and additional analyses that were performed. The unit of counting was the association estimator, where we counted only one association if the association was reported on different scales (eg, absolute and relative scales for binary endpoints). Also, we reviewed existing reporting guidance documents on aspects relevant to exploratory analyses, specifically the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) statement,6 RECORD (REporting of studies Conducted using Observational Routinely collected health Data) statement,7 STROBE-MR (Strengthening the Reporting of Observational Studies in Epidemiology Using Mendelian Randomisation) for mendelian randomisation studies,8 STREGA (Strengthening the Reporting of Genetic Association Studies) for genome association studies,9 and the CONSORT (Consolidated Standards of Reporting Trials) extension to randomised pilot and feasibility trials.10
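The counting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure or data used in the mini-review: the record layout, the `variant` field (used here to keep distinct subgroup or sensitivity analyses apart), the function names, and every value in `RECORDS` are hypothetical. The sketch counts an association once per article and analysis type even when it is reported on several scales, and then summarises the counts as a mean and range.

```python
from collections import defaultdict

ANALYSIS_TYPES = ("primary", "sensitivity", "additional")

# Hypothetical extraction records, one per reported estimate:
# (article, analysis type, exposure, outcome, variant, scale).
# All values are invented for illustration only.
RECORDS = [
    ("article_A", "primary", "smoking", "stroke", "overall", "risk ratio"),
    ("article_A", "primary", "smoking", "stroke", "overall", "risk difference"),
    ("article_A", "additional", "smoking", "stroke", "women", "risk ratio"),
    ("article_A", "additional", "smoking", "stroke", "men", "risk ratio"),
    ("article_B", "primary", "bmi", "diabetes", "overall", "hazard ratio"),
    ("article_B", "sensitivity", "bmi", "diabetes", "complete cases", "hazard ratio"),
]


def count_associations(records):
    """Count unique associations per article and analysis type, ignoring scale."""
    # Dropping the scale means an association reported on both an absolute
    # and a relative scale is counted only once.
    unique = {
        (article, kind, exposure, outcome, variant)
        for article, kind, exposure, outcome, variant, _scale in records
    }
    counts = defaultdict(lambda: dict.fromkeys(ANALYSIS_TYPES, 0))
    for article, kind, *_rest in unique:
        counts[article][kind] += 1
    return dict(counts)


def summarise(counts):
    """Return the mean, minimum, and maximum count per analysis type."""
    summary = {}
    for kind in ANALYSIS_TYPES:
        values = [per_article[kind] for per_article in counts.values()]
        summary[kind] = {
            "mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


if __name__ == "__main__":
    for kind, stats in summarise(count_associations(RECORDS)).items():
        print(f"{kind}: mean {stats['mean']:.1f} (range {stats['min']}-{stats['max']})")
```

Articles without any analysis of a given type contribute a count of zero, which is why the reported ranges for sensitivity and additional analyses can start at 0.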

Patient and public involvement

Involving patients or the public in the design, conduct, reporting, or dissemination plans of our research was not appropriate or possible.

Results

Mini-review

The mini-review included 25 original aetiologic articles. These articles reported a mean number of 33 (range 1-120) exposure-outcome associations for the primary analysis, 30 (0-336) for sensitivity analyses, and 163 (0-1467) for additional analyses, mainly concerning subgroup or interaction analyses (supplementary file). Most articles did not explicitly report which analyses were prespecified, and only one study referred to a publicly available protocol.11 The methodological scrutiny of the subgroup analyses varied from thoughtful evaluations of exposure effect heterogeneity in well established subgroups to evaluations of exposure effects across subgroups that seemed to have been formed exhaustively across many potential risk factors. Despite the fact that our review included only a small sample of studies, the image that arises from it is that many results were presented, and insufficient information was reported to fully judge the validity and merits of the results.

Existing reporting guidance

The STROBE6 and RECORD7 statements provide checklists of items to report in observational studies that are relevant to exploratory analyses (table 1). Extensions of STROBE, such as STROBE-MR8 and STREGA,9 provide additional guidance for reporting of studies where many analyses are performed. Guidance for reporting randomised trials also provides helpful information for reporting exploratory analyses in aetiologic research, in particular the CONSORT extension to randomised pilot and feasibility trials.10 Not all of these recommendations can be directly applied to observational aetiologic studies, however, because the procedures for generating and testing of hypotheses are more established in randomised studies than in observational settings.

Table 1

Considerations for reporting of exploratory aetiologic research

Considerations for reporting of exploratory aetiologic studies	Items from existing guidelines that can inform reporting of exploratory analyses
1. Explicitly state the objective of all analyses, including exploratory analyses. State the objective of an exploratory analysis to clarify how the results are to be interpreted. Outline the objective of the definite analysis of interest and clarify why an exploratory analysis should be conducted first	State the objective of the definite trial and rationale for a pilot study (CONSORT pilot and feasibility studies,10 items 2 and 6c)
2. Establish a study protocol before data analysis and make the protocol available to readers. Specify the objective, design, and analysis plan in a protocol, even when existing data are analysed or when an analysis is considered exploratory	Register a protocol (CONSORT statement12 and CONSORT pilot and feasibility studies,10 items 23 and 24; RECORD,7 item 22)
3. Do not base judgments on significance values only. Avoid selective reporting of results based on significance values, particularly because exploratory analyses are commonly conducted with less rigorously collected data and suboptimal ability to adjust for confounding. Also, statistical properties of exploratory tests are less well known than those of confirmatory tests	Consider adjustment for multiple comparisons (STROBE-MR,8 item 6e; STREGA,9 items 12i and 16d)
4. Interpret findings in line with the nature of the analysis. Be transparent about the exploratory aim of the analysis and avoid overstating the credibility of findings. Minimise suggestions on generalisability and clinical relevance for exploratory findings	Distinguish prespecified from exploratory results in reporting (CONSORT pilot and feasibility studies,10 items 21 and 22; CONSORT statement,12 items 3b and 6b). Discuss if the data used for analysis might affect interpretation of findings (RECORD,7 item 19)
5. Report (summarised) results of all exploratory analyses that were performed. Report results of all exploratory analyses that were conducted (possibly in a supplementary file) to provide a transparent and honest account of the analysis that facilitates interpretation of findings	Report all study results (STROBE6 and RECORD,7 items 16 and 17; CONSORT statement12 and CONSORT pilot and feasibility studies,10 items 17 and 18). Report all study results when multiple (data driven) analyses are performed (STROBE-MR,8 items 11 and 13; STREGA,9 items 17b and 17c). Make analysis code available (RECORD,7 item 22)
6. Accompany exploratory analyses by a proposed research agenda. Formulate a research agenda prioritising future research and how this research should be set up. This process ensures researchers take responsibility for the presented exploratory findings and follow-up research that should be performed	Report which and how future confirmative studies can be informed by the conducted exploratory analyses (CONSORT pilot and feasibility studies,10 items 21 and 22)

STROBE=Strengthening the Reporting of Observational Studies in Epidemiology statement; RECORD= REporting of studies Conducted using Observational Routinely collected health Data statement; STROBE-MR= Strengthening the Reporting of Observational Studies in Epidemiology Using Mendelian Randomization; STREGA=Strengthening the Reporting of Genetic Association Studies; CONSORT=Consolidated Standards of Reporting Trials.


Exploratory research principles

Inspired by the existing recommendations for reporting, we list six considerations for reporting and interpretation that can improve the assessment of the credibility of exploratory analyses in aetiologic research (table 1). The list is not exhaustive but we hope it will encourage further discussion on the reporting of exploratory research.

Consideration 1: explicitly state the objective of all analyses, including exploratory analyses

Stating the objective of an aetiologic study clarifies how to interpret the results. The objectives of confirmatory aetiologic research ideally contain a well defined targeted effect of a specific aetiologic factor on a specific outcome in a specific population.13 14 In early discovery research, objectives are not always rigorously defined but could be specified more generally (eg, understanding the origin of a particular outcome). An implication of stating the objective in general terms, however, is that the methodological handling of the analysis becomes less clear and the number of researchers’ degrees of freedom becomes large.15 Consequently, interpreting results without deriving spurious (causal) conclusions requires thought and effort because the analysis does not necessarily provide information towards a causal effect (see consideration 4).16 17 18 The more general an objective is stated, the more provisional the analysis becomes. This caveat includes machine learning approaches where no explicit causal modelling assumptions are made. Because exploratory analyses in aetiologic research often aim to inform a future in-depth causal analysis, reporting both the objective of the provisional exploratory analysis and the (future) confirmatory analysis is important. This reporting is in line with the CONSORT reporting checklist for pilot randomised controlled trials which recommends that researchers state the objective of the eventual trial in the manuscript of a pilot study.10 The rationale and need for the exploratory analysis in aetiologic research should be outlined together with uncertainties that need to be dealt with before performing an independent confirmative analysis of the causal mechanism. Reporting the position of provisional analyses relative to future research clarifies the level of credibility of the findings from exploratory analyses.

Consideration 2: establish a study protocol before data analysis and make the protocol available to readers

Preregistered protocols help distinguish which analyses were planned before observing the data and which analyses were performed post hoc, thereby avoiding hypothesising after the results are known. For randomised trials, preregistration of the study protocol is considered the norm.19 Preregistration does not seem as widespread in observational aetiologic research, but is increasingly encouraged,20 21 and explicitly recommended in the RECORD reporting checklist.7 Because aetiologic research often uses existing cohort data that have been analysed for related research questions, preregistration of aetiologic studies does not ensure the same level of credibility of statistical evidence as preregistration before collecting the data. Nosek and colleagues22 have provided preliminary guidance on preregistration of analyses conducted with existing data. These authors suggest that what was known in advance about the dataset should be transparently reported so that the credibility of statistical findings can be assessed, taking into account analyses that have been performed previously. Implementing this advice is probably challenging in large epidemiological cohort studies because of the many analyses that might have been performed. But trying to clarify why and how an analysis is conducted before observing the data is a laudable practice that can be implemented directly in aetiologic studies. This practice is ideally accompanied by work on developing guidance for preregistration of aetiologic studies that use existing data.

Preregistration of analyses that are exploratory in nature is even less common, possibly because it seems to contradict the definition of exploration. We consider exploratory analysis, however, as discovery work that serves to motivate funding for larger studies that are, for example, better able to control confounding or to collect data rigorously. Given this important probing role, simply stating in a research protocol that certain relations will be explored is not enough; time and effort must be invested in designing the analysis appropriately. Not every detail can be specified in advance, but interpretation of the results provided by data can be challenging and unintentionally overconfident when no question was clearly articulated before seeing the answer.

Consideration 3: do not base judgments on significance values only

Only reporting the results of analyses that provided a P value below the prespecified α level (eg, 0.05) is discouraged throughout all scientific disciplines (for example, as discussed in a 2019 supplementary issue of The American Statistician).23 Avoiding selective reporting based on significance values is particularly relevant to exploratory findings because the statistical properties of exploratory tests are less well known than those of confirmatory tests.24 For example, the probability of false positive findings (that is, the type I error rate) is probably increased when the choice of a statistical test was based on patterns in the observed data. Although procedures have been developed for correction of multiple testing in confirmatory settings, consensus on how to prevent false positive findings in exploratory settings has not yet been established.24 25 26 Increasing the number of exploratory analyses, without correction for multiple testing, raises the risk of deriving false positive conclusions, but too strict corrections for multiple testing increase the probability of false negative findings (that is, the type II error rate).27 A raised type II error rate could occur, for example, when an analysis of various positively correlated hypotheses is corrected for multiple testing as if all of the hypotheses were independent (eg, by applying a Bonferroni correction). The decision to statistically correct for multiple testing depends, among other issues, on the total number of tests performed in the same dataset, correlation between the hypotheses being tested, and sample size. Reporting each of these considerations clarifies the analytical context of findings and helps to assess the credibility of the results. This form of reporting is in line with the STROBE-MR8 and STREGA9 checklists which recommend stating how multiple comparisons were managed, although recommendations for the handling of multiple testing seem more established in genome-wide association studies than in clinical aetiologic cohort studies.5
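The arithmetic behind these concerns can be made concrete with a minimal Python sketch. It rests on simplifying assumptions that are ours and not part of the mini-review: every null hypothesis is true, each test is performed at the same α level, and (for the family-wise error rate only) the tests are independent. The function names are hypothetical. The value m = 163 is borrowed from the mean number of additional analyses in the mini-review purely as an example of scale.

```python
def expected_false_positives(m, alpha=0.05):
    """Expected number of false positives among m tests when all nulls are true.

    This holds regardless of correlation between tests (linearity of expectation).
    """
    return m * alpha


def familywise_error_rate(m, alpha=0.05):
    """Probability of at least one false positive among m tests.

    Assumes independent tests and that all null hypotheses are true.
    """
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(m, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / m


if __name__ == "__main__":
    m = 163  # example scale only: mean number of additional analyses per article
    print(f"Expected false positives: {expected_false_positives(m):.2f}")
    print(f"Family-wise error rate:   {familywise_error_rate(m):.4f}")
    print(f"Bonferroni threshold:     {bonferroni_threshold(m):.6f}")
```

Under these assumptions, roughly eight of 163 uncorrected tests would be expected to fall below 0.05 by chance alone, whereas the Bonferroni threshold becomes so small that true effects are easily missed, particularly when the hypotheses are positively correlated and the correction is therefore stricter than needed.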

Consideration 4: interpret findings in line with the nature of the analysis

Interpreting and communicating results in line with the exploratory nature of an analysis is challenging because an accurate representation of the degree of tentativeness of the results is required. Assessing this degree of tentativeness based on only the results of an analysis (that is, based on the numerical estimates) is complicated because seemingly convincing results can be misleading and a clinical explanation can be found that does not follow from the statistical evidence.28 29 Cognitive biases, such as hindsight bias, can distort the interpretation of findings. Reporting of findings from exploratory analyses starts with indicating whether the analysis was planned before or after observing the data, which is recommended in the CONSORT extension to randomised pilot and feasibility trials.10 Results of exploratory analyses can be interpreted by focusing on what is reported about the objectives and applied methodology rather than overstepping the findings. The specificity with which findings are interpreted should match the generality with which the objective is stated (see consideration 1).16 17 18 For example, when various subgroup analyses are performed with the general aim of identifying possible subgroups from the available data where an exposure effect was different, researchers should report that many subgroups were explored, including characterisation of the subgroups and description of the presence or absence of effect heterogeneity, rather than discussing only one or two specific subgroups where the effect size was extreme. Furthermore, exploratory analyses often fail to support strong conclusions. Recommendations for clinical practice or generalisations based on exploratory analyses should generally be avoided.

Consideration 5: report (summarised) results of all exploratory analyses that were performed

When findings are selectively reported, especially when reporting is guided by significant findings (see consideration 3), the credibility of reported findings is probably overstated.30 Reporting the results of all of the exploratory analyses that were conducted (possibly in a supplementary file) provides a transparent and honest report of the analysis and facilitates better interpretation of the findings. This approach is in line with the STROBE extension in STREGA, which recommends that all results of analyses should be presented, even if numerous analyses were undertaken.9 Reporting all analyses that have been conducted seems simple, but can be challenging in practice, mainly because the process of performing a study is typically iterative. A framework for initial data analysis by Huebner and colleagues could help keep track of all subanalyses that are conducted as part of a main analysis.31 This framework distinguishes exploratory analyses that are part of a primary analysis from additional exploratory analyses that require separate reporting. Another helpful practice could be to have a reflection period after performing analyses to establish whether the analyses look at (slightly) different research questions and to report separate analyses for each research question.

Consideration 6: accompany exploratory analyses by a proposed research agenda

The credibility of exploratory findings can be communicated through a research agenda prioritising future research and how this research should be set up. Reporting a research agenda is similar to the CONSORT extension to randomised pilot and feasibility trials that recommends reporting which and how future confirmative trials can be informed by the pilot study.10 Formulating a research agenda allows researchers to take responsibility for the exploratory findings presented and future research that should be performed, avoiding the empty statement that “more research is needed”. In medical science in particular, where study results are sometimes quickly implemented into clinical practice, researchers are encouraged to take responsibility for the results they report by clearly explaining which exploratory findings should be investigated in future research and how.

Discussion

Our mini-review showed that exploratory analyses in aetiologic research were not always reported optimally. The credibility of exploratory results is affected by a combination of the theoretical rationale for the analysis, clarity of the defined research problem, applied methodology, and degree to which analytical decisions are driven by the data. Choosing a particular analysis based on observed patterns in the data complicates statistical inferences. Moreover, the design and methods applied in an exploratory analysis might be less optimal than the primary analysis of the study, which further complicates interpretation of exploratory analyses. Therefore, information on these aspects should be clearly reported. Exploration is essential to the progress of science. Strict confirmatory studies are a powerful mechanism for final evaluations before implementation in clinical practice, but will probably not stimulate new ideas.32 33 Open minded exploratory analyses can lead to unexpected discoveries and resourceful innovations of epidemiological science, but effort is required to accurately interpret the results. Because exploratory analyses are usually done to generate new research questions, quickly performing a statistical test (or multiple tests) to get the first answer to the problem is tempting. When quick test results are presented in a research article, however, their interpretation might be ad hoc and unintentionally overconfident. To show their full value, exploratory analyses of aetiologic research need to be conducted and interpreted correctly. We have provided six considerations for reporting of exploratory analyses to encourage a discussion on exploratory analyses and how the credibility of these analyses is ideally assessed in aetiologic research. 
Continuation of this discussion will contribute to the understanding of inferences that can be made from exploratory analyses in aetiologic research and will help strike a balance between their opportunities and risks.

Exploratory analyses in aetiologic research are initial steps towards solving a research problem and are often conducted in addition to planned primary analyses of a study
Exploratory analyses might lead to new discoveries in aetiologic research, but effort is needed to accurately interpret the results because these analyses are often conducted with few data resources and insufficient adjusting for confounding
Statistical properties of exploratory tests are less well known than those of confirmatory tests
This study focuses on a particular type of causal research, namely aetiologic studies, which investigate the causal effect of one or multiple risk factors on a particular health outcome or disease
Six considerations for reporting of exploratory analyses in aetiologic research were provided to stimulate a discussion about their preferred handling and reporting
Researchers should take responsibility for results of exploratory analyses by clearly reporting their exploratory nature and specifying which findings should be investigated in future research and how

1. Descriptive studies: what they can and cannot do.

Authors: David A Grimes; Kenneth F Schulz
Journal: Lancet Date: 2002-01-12

2. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant.

Authors: Joseph P Simmons; Leif D Nelson; Uri Simonsohn
Journal: Psychol Sci Date: 2011-10-17

3. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials.

Authors: David Moher; Sally Hopewell; Kenneth F Schulz; Victor Montori; Peter C Gøtzsche; P J Devereaux; Diana Elbourne; Matthias Egger; Douglas G Altman
Journal: BMJ Date: 2010-03-23

4. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies.

Authors: Erik von Elm; Douglas G Altman; Matthias Egger; Stuart J Pocock; Peter C Gøtzsche; Jan P Vandenbroucke
Journal: BMJ Date: 2007-10-20

5. The table 2 fallacy: presenting and interpreting confounder and modifier coefficients.

Authors: Daniel Westreich; Sander Greenland
Journal: Am J Epidemiol Date: 2013-01-30

6. Update on Trial Registration 11 Years after the ICMJE Policy Was Established.

Authors: Deborah A Zarin; Tony Tse; Rebecca J Williams; Thiyagu Rajakannan
Journal: N Engl J Med Date: 2017-01-26

7. Population-based organized screening by faecal immunochemical testing and colorectal cancer mortality: a natural experiment.

Authors: Matthew T Keys; Miquel Serra-Burriel; Natalia Martínez-Lizaga; Maria Pellisé; Francesc Balaguer; Ariadna Sánchez; Enrique Bernal-Delgado; Antoni Castells
Journal: Int J Epidemiol Date: 2021-03-03