J Masel, P T Humphrey, B Blackburn, J A Levine.
Abstract
Most students have difficulty reasoning about chance events, and misconceptions regarding probability can persist or even strengthen following traditional instruction. Many biostatistics classes sidestep this problem by prioritizing exploratory data analysis over probability. However, probability itself, in addition to statistics, is essential both to the biology curriculum and to informed decision making in daily life. One area in which probability is particularly important is medicine. Given the preponderance of prehealth students, in addition to more general interest in medicine, we capitalized on students' intrinsic motivation in this area to teach both probability and statistics. We use the randomized controlled trial as the centerpiece of the course, because it exemplifies the most salient features of the scientific method, and the application of critical thinking to medicine. The other two pillars of the course are biomedical applications of Bayes' theorem and science and society content. Backward design from these three overarching aims was used to select appropriate probability and statistics content, with a focus on eliciting and countering previously documented misconceptions in their medical context. Pretest/posttest assessments using the Quantitative Reasoning Quotient and Attitudes Toward Statistics instruments are positive, bucking several negative trends previously reported in statistics education.
Year: 2015 PMID: 26582236 PMCID: PMC4710403 DOI: 10.1187/cbe.15-04-0079
Source DB: PubMed Journal: CBE Life Sci Educ ISSN: 1931-7913 Impact factor: 3.325
Common misconceptions about probability
| Misconception | Reference | Example of reasoning according to the misconception |
|---|---|---|
| Outcome orientation | | Considering the next 6 rolls of a die with 5 black sides and 1 white side, the most likely outcome is 6 rolls of black. |
| Representativeness heuristic | | The sequence of births G B G B B G (G = girl; B = boy) is more likely than the sequence B G B B B B. |
| Equiprobability bias | | When rolling two dice, rolling a 5 and a 6 is as likely as rolling two 5s. |
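To see why each line of reasoning in the table is a misconception, the three examples can be checked with exact arithmetic. The sketch below is illustrative only; it assumes fair, independent rolls and births with P(girl) = P(boy) = 1/2, and the variable names are ours, not the authors'.

```python
from fractions import Fraction
from math import comb

# Outcome orientation: a die with 5 black sides and 1 white side, rolled 6 times.
p_black = Fraction(5, 6)
p_six_black = p_black ** 6
# Exactly 5 black and 1 white, with the white roll in any of the 6 positions.
p_five_black_one_white = comb(6, 1) * p_black ** 5 * (1 - p_black)

# Representativeness heuristic: every specific sequence of 6 births has the
# same probability (assumption: independent births, P(girl) = P(boy) = 1/2).
p_any_specific_sequence = Fraction(1, 2) ** 6

# Equiprobability bias: two fair dice have 36 equally likely ordered outcomes.
p_five_and_six = Fraction(2, 36)  # (5, 6) or (6, 5)
p_two_fives = Fraction(1, 36)     # only (5, 5)

print("6 black:", float(p_six_black))
print("5 black + 1 white:", float(p_five_black_one_white))
print("any specific birth sequence:", p_any_specific_sequence)
print("5 and 6:", p_five_and_six, "| two 5s:", p_two_fives)
```

Under these assumptions, five black rolls plus one white roll is more likely than six black rolls, the two birth sequences are equally likely, and a 5 with a 6 is twice as likely as two 5s.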
Figure 1. Use of a natural frequency tree to implement Bayes’ theorem. For this problem, the information given is “About 0.01% of men with no known risk factors have HIV. HIV+ men test positive 99.9% of the time. HIV− men test negative 99.99% of the time. A man with no known risk factors tests positive. What is the probability that he has HIV?” The two individuals meeting the condition of testing positive are circled; one of them has HIV, making the probability 0.5.
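The natural frequency reasoning in the Figure 1 caption can be reproduced numerically. This is a minimal sketch using only the three numbers quoted in the caption; the function name and the cohort size of 10,000 are our own illustrative choices.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Bayes' theorem: P(disease | positive test)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# Numbers quoted in the caption: 0.01% prevalence, 99.9% sensitivity,
# 99.99% specificity.
prevalence, sensitivity, specificity = 0.0001, 0.999, 0.9999

# Natural frequency view: imagine 10,000 men (hypothetical cohort size).
cohort = 10_000
expected_true_pos = cohort * prevalence * sensitivity               # about 1 man
expected_false_pos = cohort * (1 - prevalence) * (1 - specificity)  # about 1 man

ppv = positive_predictive_value(prevalence, sensitivity, specificity)
print(round(expected_true_pos, 3), round(expected_false_pos, 3), round(ppv, 3))
```

Roughly one true positive and one false positive are expected per 10,000 men, so the probability that a man who tests positive has HIV is close to 0.5, matching the tree.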
Figure 2. Evidence pyramid. Near the end of the course, students are exposed to alternatives to RCTs and learn to identify the level to which a research article belongs and to choose the highest level of evidence available for a given question. The value of meta-analyses vs. large single trials is discussed, with publication bias raising a question mark over the order shown here.
Course content
| Basic probability | Statistics/randomized controlled trials | Science and society | Bayes’ theorem/medical screening |
|---|---|---|---|
| Frequency = the number of times something happened ÷ the number of times it could have happened; Probability = limit of frequency given a large number of “events,” e.g., rolls of a multilink cube or patients; AND rule for independent events, OR rule for mutually exclusive events; Coin tosses HH, HT, TH, and TT are equiprobable, but 2H, 1H1T, and 2T are not; Binomial distribution | 2 × 2 contingency table: treated vs. not, live vs. die; Case study on Fisher’s lady tasting tea, illustrating principles of experimental design; Likelihood ratio test (using binomial distributions) on 2 × 2 contingency tables; Mean, SD, and SEM; The effect size you have power to detect goes with (SD)/sqrt(; Statistical vs. clinical significance; Randomized interventions distinguish between causation and correlation; Randomization procedures with parallel groups, crossover, split-body, and cluster study designs; Efficacy and effectiveness; Class projects with randomized designs; Regression to the mean; Experimental designs to distinguish between different kinds of placebo effects; How bias can creep into experimental design, e.g., cherry-picking outcomes, subgroup analysis; Evidence pyramid | Reading the history book; Case study on Semmelweis and hand-washing; Hand-washing and checklists today; Institutional review board procedures, informed consent; Reductionism in the biomedical sciences; Case studies of the history behind drug discoveries; Lack of evidence for “chemical imbalance” theories; Declining cost-effectiveness of drug discovery pipeline; Placebo effects, e.g., for antidepressant drugs and vertebroplasty; Ethics of placebo use; Insurance and doctor payments systems, Affordable Care Act; Essay on “How do you want the U.S. healthcare system to work?”; Approval processes for new drugs/devices; Drug company marketing strategies | Conditional probability; Prob(die of breast cancer \| die young) ≠ prob(die young \| die of breast cancer); Bayes’ theorem is needed for backward prob (hypothesis \| data), likelihood = forward prob(data \| hypothesis): when to use which; Classical vs. frequentist vs. Bayesian definitions of what probability means; Low base rates lead to high probability that a positive is false, e.g., for HIV and mammograms; Lead-time bias and length bias: earlier detection tests can perform worse; False positives vs. overdiagnosis; Balance sheet of harms vs. benefits for mammograms and how to communicate them, e.g., relative vs. absolute risks; “Why Most Published Research Findings Are False” ( |
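The table lists a likelihood ratio test on 2 × 2 contingency tables (treated vs. not, live vs. die). One standard form of such a test is the G-test sketched below, which compares a model with separate binomial survival probabilities per group against a model with one shared probability. This is offered as an assumption about what such a test can look like, not as the course's exact procedure, and the counts are hypothetical.

```python
from math import erfc, log, sqrt

def g_test_2x2(a, b, c, d):
    """Likelihood ratio (G) test for the 2 x 2 table [[a, b], [c, d]].

    Rows: treated / untreated. Columns: live / die.
    Returns (G statistic, approximate p-value from chi-square with 1 df).
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    g = 0.0
    for obs, row, col in zip(observed, row_totals, col_totals):
        expected = row * col / n  # expected count if treatment has no effect
        if obs > 0:               # a zero cell contributes nothing to the sum
            g += 2 * obs * log(obs / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(g / 2))
    return g, p_value

# Hypothetical trial: 30 of 40 treated patients live, 15 of 40 untreated live.
g_stat, p_val = g_test_2x2(30, 10, 15, 25)
print(f"G = {g_stat:.2f}, p = {p_val:.4f}")
```

The chi-square approximation is only reliable for reasonably large counts; for small tables an exact binomial calculation, as in Fisher's lady tasting tea, is the safer choice.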
Course learning objectives are for students to
- Recognize opportunities for gaining knowledge via randomized trials in familiar contexts within your daily life
- Recognize the temptations to, and dangers of, not using randomized trials, including in contexts you have not seen before
- Understand why randomization removes the need to “control” for everything in an experiment
- Understand how a randomized intervention solves the problem of distinguishing between correlation and causation
- Calculate probabilities and frequencies, using tools that include the “AND” and “OR” rules, the binomial distribution, and Bayes’ theorem
- Use the most appropriate interpretation of probability (classical, frequentist, and Bayesian) depending on the task
- Distinguish between prob(A | B) and prob(B | A) and choose the correct one for any question
- Identify null and alternative hypotheses in novel situations
- Explain and justify the philosophy of a null hypothesis and a
- Identify type I and type II errors (false positives and false negatives), including in contexts you have not seen before
- Calculate mean, variance, SD, the SEM, and the SE of the difference between two means, and relate these quantities to power and to each other
- Analyze alternative study designs (e.g., parallel, crossover,
- Test hypotheses using a log-likelihood test for discrete data and the direct application of the binomial distribution (Fisher’s lady tasting tea) for discrete data, also having some familiarity with the
- Critique our current systems of healthcare and biomedical research, including the roles of reductionism, the Food and Drug Administration, the drug companies, and payment/insurance systems
- Understand the pipeline of drug discovery and approval
- Evaluate biomedical ethical regulations, norms, and decision-making processes
- Compare, contrast, and critique the different ways to communicate statistics (relative risk reduction, absolute risk reduction, number needed to treat, and increase in life expectancy)
- Apply Bayes’ theorem to clinical screening programs such as mammography
- Develop correct intuitions about the importance of different factors on Bayes’ theorem and power calculations
- Analyze how lead-time bias and length bias affect screening programs such as mammography
- Distinguish between false positives and overdiagnosis
- Evaluate the appropriateness of screening decisions based both on available data and on patient values
- Identify multiple placebo effects in medicine (including regression to the mean) and design experiments to control for/investigate them
- Evaluate the reliability of biomedical findings using the evidence-based pyramid, taking into account factors including publication bias and reproducibility
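One of the objectives above is to compare ways of communicating statistics: relative risk reduction, absolute risk reduction, and number needed to treat. The sketch below shows how the same trial result can sound very different depending on the measure. The function name and the trial counts are hypothetical and chosen only for illustration.

```python
def risk_summary(control_events, control_n, treated_events, treated_n):
    """Summarize a two-arm trial using three common risk measures."""
    control_risk = control_events / control_n
    treated_risk = treated_events / treated_n
    arr = control_risk - treated_risk  # absolute risk reduction
    rrr = arr / control_risk           # relative risk reduction
    # Number needed to treat; infinite when there is no risk difference.
    # A negative value would indicate harm rather than benefit.
    nnt = float("inf") if arr == 0 else 1 / arr
    return {"ARR": arr, "RRR": rrr, "NNT": nnt}

# Hypothetical trial: 10 deaths per 1,000 untreated vs. 8 per 1,000 treated.
summary = risk_summary(10, 1000, 8, 1000)
print(f"RRR = {summary['RRR']:.0%}")   # sounds large
print(f"ARR = {summary['ARR']:.2%}")   # sounds small
print(f"NNT = {summary['NNT']:.0f}")   # patients treated per death avoided
```

With these made-up numbers, a "20% reduction in deaths" and "treating 500 people to prevent one death" describe the same result.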
Precourse, postcourse, and changes in QRQ total and subscores
| Category | Category text | Pretest mean^a | Pretest SD | Posttest mean^a | Posttest SD | Mean difference | p |
|---|---|---|---|---|---|---|---|
| Overall | — | 58.95 | 9.9 | 65.15 | 12.28 | 6.2 (± 2.93) | <0.001 |
| Competencies | | 2.55 | 1.4 | 2.85 | 1.46 | 0.30 (± 0.44) | 0.183 |
| | | 3.74 | 0.79 | 3.89 | 0.75 | 0.14 (± 0.25) | 0.263 |
| | | 2.4 | 0.9 | 2.67 | 0.99 | 0.27 (± 0.35) | 0.132 |
| | | 2.78 | 1.1 | 2.48 | 1.13 | −0.30 (± 0.5) | 0.239 |
| | | 3.83 | 1.21 | 4.23 | 1.02 | 0.40 (± 0.43) | 0.073 |
| | | 2.56 | 0.79 | 3.33 | 0.83 | 0.77 (± 0.28) | <0.001 |
| | | 3.57 | 1.22 | 4.03 | 1.31 | 0.47 (± 0.38) | 0.018 |
| | | 3.1 | 1.81 | 3.2 | 1.8 | 0.10 (± 0.82) | 0.809 |
| | | 3.15 | 1.66 | 3.85 | 1.49 | 0.70 (± 0.68) | 0.046 |
| | | 3.94 | 0.86 | 4.26 | 0.84 | 0.32 (± 0.31) | 0.044 |
| | | 3.73 | 1.11 | 3.58 | 1.24 | −0.15 (± 0.41) | 0.467 |
| Misconceptions | | 2.54 | 0.7 | 2.2 | 0.66 | −0.34 (± 0.28) | 0.018 |
| | | 1.53 | 0.37 | 1.39 | 0.35 | −0.14 (± 0.14) | 0.048 |
| | | 2.5 | 1.28 | 2.43 | 1.34 | −0.08 (± 0.47) | 0.752 |
| | | 2.05 | 1.2 | 1.6 | 0.93 | −0.45 (± 0.49) | 0.071 |
| | | 1.93 | 0.89 | 1.6 | 0.77 | −0.33 (± 0.29) | 0.025 |
| | | 2.43 | 1.22 | 1.97 | 1.31 | −0.47 (± 0.38) | 0.018 |
| | | 3.25 | 1.39 | 3.43 | 1.34 | 0.18 (± 0.6) | 0.562 |
| | | 2.1 | 1.8 | 2.5 | 1.96 | 0.40 (± 0.75) | 0.291 |
| | | 2 | 1.01 | 1.75 | 0.98 | −0.25 (± 0.38) | 0.200 |
| | | 1.25 | 0.5 | 1.23 | 0.53 | −0.03 (± 0.18) | 0.785 |
| | | 1.4 | 0.41 | 1.5 | 0.51 | 0.10 (± 0.19) | 0.291 |
| | | 2 | 1.76 | 1.7 | 1.54 | −0.30 (± 0.66) | 0.372 |
| | | 1.97 | 0.81 | 1.63 | 0.72 | −0.33 (± 0.27) | 0.018 |
| | | 1.1 | 0.64 | 1.2 | 0.88 | 0.10 (± 0.35) | 0.570 |
| | | 1.3 | 0.27 | 1.26 | 0.25 | −0.04 (± 0.09) | 0.418 |

^a Overall score scaled to 0–100%; individual scores scaled to 1–5 Likert-like scale. Error on mean differences is ± 2 × SEM of paired post- vs. pretest differences.
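The footnote defines the error on each mean difference as ± 2 × SEM of the paired post- vs. pretest differences. A minimal sketch of that calculation follows; the function name and the student scores are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev

def paired_change(pre, post):
    """Return (mean paired difference, 2 x SEM of the paired differences)."""
    # Pair each student's posttest score with the same student's pretest score.
    diffs = [after - before for before, after in zip(pre, post)]
    sem = stdev(diffs) / sqrt(len(diffs))  # sample SD over sqrt(sample size)
    return mean(diffs), 2 * sem

# Hypothetical scores for four students on a 1-5 scale.
pre_scores = [1, 2, 3, 4]
post_scores = [2, 3, 4, 6]
change, half_width = paired_change(pre_scores, post_scores)
print(f"{change:.2f} (± {half_width:.2f})")
```

Working with per-student differences, rather than comparing the two group means separately, removes between-student variation from the error estimate.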
Precourse, postcourse, and changes in ATS total and individual items
| Category | Pretest mean^a | Pretest SD | Posttest mean^a | Posttest SD | Mean difference | p |
|---|---|---|---|---|---|---|
| Overall attitude | 63.71 | 10.35 | 69.50 | 12.57 | 5.78 (± 3.64) | 0.002 |
| | 3.85 | 0.93 | 4.18 | 0.97 | 0.33 (± 0.34) | 0.037 |
| | 3 | 1 | 3.08 | 1.11 | 0.08 (± 0.33) | 0.656 |
| | 3.95 | 0.86 | 4.56 | 0.55 | 0.62 (± 0.3) | 0.001 |
| | 3.31 | 0.92 | 3.54 | 0.85 | 0.23 (± 0.33) | 0.214 |
| | 3.64 | 0.71 | 4 | 0.69 | 0.36 (± 0.23) | 0.005 |
| | 3.79 | 0.92 | 4.05 | 1 | 0.26 (± 0.32) | 0.106 |
| | 3.05 | 0.96 | 3.55 | 0.92 | 0.50 (± 0.38) | 0.014 |
| | 2.56 | 0.72 | 2.92 | 1.09 | 0.36 (± 0.32) | 0.031 |
| | 3.82 | 0.6 | 3.95 | 0.79 | 0.13 (± 0.3) | 0.414 |
| | 4.03 | 0.67 | 4.31 | 0.69 | 0.28 (± 0.24) | 0.029 |
| | 3.66 | 0.88 | 4.05 | 0.8 | 0.39 (± 0.23) | 0.003 |
| | 3.13 | 1.08 | 3.59 | 1.14 | 0.46 (± 0.38) | 0.023 |
| | 3.66 | 0.71 | 3.92 | 0.78 | 0.26 (± 0.24) | 0.043 |
| | 3.97 | 0.74 | 4.1 | 0.94 | 0.13 (± 0.28) | 0.386 |
| | 3.08 | 1.02 | 3.32 | 1.16 | 0.24 (± 0.4) | 0.238 |
| | 3.76 | 0.85 | 3.82 | 0.69 | 0.05 (± 0.29) | 0.721 |
| | 4.03 | 0.79 | 4.08 | 0.91 | 0.05 (± 0.26) | 0.721 |
| | 3.13 | 1.04 | 3.42 | 1.15 | 0.29 (± 0.32) | 0.100 |
| | 2.76 | 0.86 | 3.14 | 1.03 | 0.38 (± 0.39) | 0.072 |
| | 4.16 | 0.68 | 4.24 | 0.88 | 0.08 (± 0.27) | 0.607 |
| | 3.92 | 0.78 | 4.05 | 0.84 | 0.13 (± 0.2) | 0.212 |
| | 3.84 | 0.59 | 4.08 | 0.82 | 0.24 (± 0.32) | 0.166 |
| | 3.92 | 0.43 | 4.05 | 0.77 | 0.13 (± 0.27) | 0.374 |
| | 3.68 | 0.62 | 3.92 | 0.85 | 0.24 (± 0.27) | 0.100 |
| | 3.58 | 1 | 3.74 | 1.06 | 0.16 (± 0.25) | 0.228 |
| | 3.26 | 0.6 | 3.63 | 0.79 | 0.37 (± 0.27) | 0.014 |
| | 3.63 | 0.82 | 3.74 | 1 | 0.11 (± 0.32) | 0.597 |
| | 3.68 | 0.77 | 3.89 | 0.8 | 0.21 (± 0.26) | 0.120 |
| | 2.68 | 0.84 | 2.53 | 0.89 | −0.16 (± 0.3) | 0.313 |

^a Overall scores scaled to 0–100%; individual scores reflect 1–5 Likert scale. Scores for all items are oriented to reflect a 1–5 negative-to-positive transition, with 3 being neutral. Error on mean differences is ± 2 × SEM of paired post- vs. pretest differences. p-values for individual items were obtained from nonparametric paired Wilcoxon signed-rank tests and for overall attitudes by repeated-measures ANOVA.
Figure 3. We observed postcourse vs. precourse (a) overall improvements and improvements in some (b) QRQ subscores and (c) ATS item scores for our Spring 2014 course offering. The ATS is a 1–5 Likert scale, and QRQ scores are arbitrarily scaled to match. Negatively phrased ATS questions are shown with scores in reverse direction such that higher scores indicate more positive attitudes across all items; an attitude score of 3 is “neutral.” Because analysis is of paired measures, 95% confidence intervals (red) are shown once for the precourse vs. postcourse differences rather than separately for precourse and for postcourse scores.
The course addresses calls for change
| Challenge addressed | Document | Reference | Specific competencies covered by our course |
|---|---|---|---|
| Four out of six core competencies | | | Ability to apply the process of science; Ability to use quantitative reasoning; Ability to tap into the interdisciplinary nature of science; Ability to understand the relationship between science and society |
| Two of the eight competencies | | | Apply quantitative reasoning and appropriate mathematics to describe or explain phenomena in the natural world; Demonstrate understanding of the process of scientific inquiry and explain how scientific knowledge is discovered and validated |
| New MCAT requirements | — | | Psychological, Social, and Biological Foundations of Behavior; Critical Analysis and Reasoning Skills |
| Integrate ethics with scientific content | — | | — |