Leslie Myint, Jeffrey T Leek, Leah R Jager.
Abstract
Most researchers do not deliberately claim causal results in an observational study. But do we lead our readers to draw a causal conclusion unintentionally by explaining why significant correlations and relationships may exist? Here we perform a randomized controlled experiment in a massive open online course run in 2013 that teaches data analysis concepts to test the hypothesis that explaining an analysis will lead readers to interpret an inferential analysis as causal. We test this hypothesis with a single example of an observational study on the relationship between smoking and cancer. We show that adding an explanation to the description of an inferential analysis leads to a 15.2% increase in readers interpreting the analysis as causal (95% confidence interval for difference in two proportions: 12.8%-17.5%). We then replicate this finding in a second large scale massive open online course. Nearly every scientific study, regardless of the study design, includes an explanation for observed effects. Our results suggest that these explanations may be misleading to the audience of these data analyses and that qualification of explanations could be a useful avenue of exploration in future research to counteract the problem. Our results invite many opportunities for further research to broaden the scope of these findings beyond the single smoking-cancer example examined here.
Keywords: Behavior; Causality; Explanation; Inference; Interpretation; Randomized trial
Year: 2018; PMID: 30225177; PMCID: PMC6139016; DOI: 10.7717/peerj.5597
Source DB: PubMed; Journal: PeerJ; ISSN: 2167-8359; Impact factor: 2.984
Goals for different analysis types (Leek & Peng, 2015).
These analysis types form the set of possible answer choices in our randomized experiment and were taught to students before the experiment was performed.
| Type of analysis | Goal of analysis |
|---|---|
| Descriptive | Summarizing the data without interpretation |
| Exploratory | Summarizing the data with interpretation, but without generalization beyond the original sample |
| Inferential | Generalizing beyond the original sample, with the goal of describing an association in a larger population |
| Predictive | Generalizing beyond the original sample, with the goal of predicting a measurement for a new individual |
| Causal | Generalizing beyond the original sample, with the goal of learning how changing the average of one measurement affects, on average, another measurement |
| Mechanistic | Generalizing beyond the original sample, with the goal of learning how changing one measurement deterministically affects another variable’s measurement |
Effect of explanatory language on student responses.
For each of four sets of answer choices seen, differences in the percentage choosing the “causal” and “inferential” answer choices are given, as well as 95% confidence intervals for the differences and sample sizes.
Values are the difference in the percentage choosing “causal” when seeing explanatory language vs. not seeing explanatory language (95% CI for difference in proportions).
| Answer choices seen | January 2013 course | October 2013 course |
|---|---|---|
| Inferential, causal, descriptive, predictive | 14.5% (12.2%, 16.8%) | 14.3% (6.4%, 22.2%) |
| Inferential, causal, descriptive, mechanistic | 15.8% (13.4%, 18.1%) | 14.8% (6.6%, 23.0%) |
| Inferential, causal, predictive, mechanistic | 15.2% (12.8%, 17.5%) | 19.9% (11.5%, 28.3%) |
Detailed results for the experimental arm with answer choices: inferential, causal, predictive, and mechanistic.
In the presence of explanatory language, nearly twice as many students incorrectly selected “causal” with a corresponding decrease in the percentage of students correctly selecting “inferential”.
| This is an example of a/an _________ data analysis. | January 2013 course: saw explanatory language | January 2013 course: no explanatory language | October 2013 course: saw explanatory language | October 2013 course: no explanatory language |
|---|---|---|---|---|
| inferential | 1,508 (59.9%) | 1,977 (76.9%) | 116 (58.3%) | 190 (79.8%) |
| causal | 799 (31.8%) | 427 (16.6%) | 68 (34.2%) | 34 (14.3%) |
| predictive | 120 (4.8%) | 138 (5.4%) | 8 (4.0%) | 11 (4.6%) |
| mechanistic | 89 (3.5%) | 30 (1.2%) | 7 (3.5%) | 3 (1.3%) |
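The reported differences and intervals for this arm can be recomputed from the counts in the table above. The following is a minimal sketch, not the authors' code: it assumes a normal-approximation (Wald) interval with a continuity correction, a choice the source does not state but which matches the reported figures for both courses. The function name `diff_in_proportions_ci` and its parameters are hypothetical, introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_proportions_ci(x1, n1, x2, n2, level=0.95, continuity=True):
    """Difference p1 - p2 with a normal-approximation confidence interval.

    The continuity correction is an assumption: the source does not say
    which interval method was used.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * se
    if continuity:
        margin += 0.5 * (1 / n1 + 1 / n2)
    return diff, diff - margin, diff + margin


# Counts copied from the table above; group sizes are the column sums.
counts = {
    "January 2013": {
        "explained": {"inferential": 1508, "causal": 799, "predictive": 120, "mechanistic": 89},
        "plain": {"inferential": 1977, "causal": 427, "predictive": 138, "mechanistic": 30},
    },
    "October 2013": {
        "explained": {"inferential": 116, "causal": 68, "predictive": 8, "mechanistic": 7},
        "plain": {"inferential": 190, "causal": 34, "predictive": 11, "mechanistic": 3},
    },
}

for course, arms in counts.items():
    explained, plain = arms["explained"], arms["plain"]
    diff, lo, hi = diff_in_proportions_ci(
        explained["causal"], sum(explained.values()),
        plain["causal"], sum(plain.values()),
    )
    print(f"{course}: {diff:.1%} ({lo:.1%}, {hi:.1%})")
```

Under these assumptions the script prints 15.2% (12.8%, 17.5%) for January 2013 and 19.9% (11.5%, 28.3%) for October 2013. Passing `continuity=False` gives a slightly narrower interval for the smaller October course, which is why the correction is assumed here.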
Detailed results for the experimental arm with answer choices: inferential, descriptive, predictive, and mechanistic (no causal).
In the presence of explanatory language, a lower percentage of students correctly selected “inferential”, and a higher percentage of students incorrectly selected “mechanistic”.
| This is an example of a/an _________ data analysis. | January 2013 course: saw explanatory language | January 2013 course: no explanatory language | October 2013 course: saw explanatory language | October 2013 course: no explanatory language |
|---|---|---|---|---|
| inferential | 2,011 (80.9%) | 2,232 (88.2%) | 160 (80.4%) | 185 (85.3%) |
| predictive | 196 (7.9%) | 181 (7.2%) | 10 (5.0%) | 12 (5.5%) |
| descriptive | 138 (5.6%) | 82 (3.2%) | 14 (7.0%) | 14 (6.5%) |
| mechanistic | 140 (5.6%) | 36 (1.4%) | 15 (7.5%) | 6 (2.8%) |