Kevin Cummiskey, Shonda Kuiper, Rodney Sturdivant.
Abstract
This paper discusses the influence that decisions about data cleaning and violations of statistical assumptions can have on drawing valid conclusions from research studies. The datasets provided in this paper were collected as part of a National Science Foundation grant to design online games and associated labs for use in undergraduate and graduate statistics courses that can effectively illustrate issues not always addressed in traditional instruction. Students play the role of a researcher by selecting from a wide variety of independent variables to explain why some students complete games faster than others. Typical project data sets are "messy," with many outliers (usually from some students taking much longer than others) and distributions that do not appear normal. Classroom testing of the games over several semesters has produced evidence of their efficacy in statistics education. The projects tend to be engaging for students and they make the impact of data cleaning and violations of model assumptions more relevant. We discuss the use of one of the games and associated guided lab in introducing students to issues prevalent in real data and the challenges involved in data cleaning and dangers when model assumptions are violated.
Keywords: Guided Interdisciplinary Statistics Games and Labs; messy data; model assumptions
Year: 2012 PMID: 23055992 PMCID: PMC3457080 DOI: 10.3389/fpsyg.2012.00354
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Tangrams web interface.
Figure 2. Side-by-side boxplots of the completion time of the Tangrams game for raw and cleaned data.
Summary statistics for raw and cleaned data.
| Statistic | Raw data: athlete | Raw data: non-athlete | Cleaned data (outliers removed): athlete | Cleaned data (outliers removed): non-athlete |
|---|---|---|---|---|
| Sample size | 36 | 92 | 33 | 84 |
| Sample mean | 82.72 | 72.50 | 65.23 | 53.02 |
| SD | 72.00 | 73.50 | 39.35 | 27.11 |
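The record does not say which rule was used to flag the outliers removed in the cleaned columns. As a minimal sketch, assuming the common boxplot (Tukey) rule of dropping values beyond 1.5 × IQR from the quartiles, the cleaning step and the three summary statistics could be computed as follows. The helper names `iqr_clean` and `summarize` and the completion times are hypothetical, not the study's data or code.

```python
import statistics


def iqr_clean(times, k=1.5):
    """Drop values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    Assumed cleaning rule (the usual boxplot rule); the study's own
    rule is not stated in this record.
    """
    q1, _, q3 = statistics.quantiles(times, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [t for t in times if low <= t <= high]


def summarize(times):
    """Return sample size, sample mean, and sample SD (n - 1 denominator)."""
    return len(times), statistics.mean(times), statistics.stdev(times)


# Hypothetical completion times: one very slow player among quick ones.
raw = [10, 12, 11, 13, 12, 100]
cleaned = iqr_clean(raw)
print("raw:    ", summarize(raw))
print("cleaned:", summarize(cleaned))
```

Removing the single slow time lowers both the mean and the SD, the same direction of change seen between the raw and cleaned columns above.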
Summary of Shapiro–Wilk normality test p-values under various conditions.
| Data | Athlete | Non-athlete |
|---|---|---|
| Raw data | <0.001 | <0.001 |
| Cleaned data | 0.00157 | <0.001 |
| Log-transformed raw data | 0.118 | <0.001 |
| Log-transformed cleaned data | 0.153 | 0.0521 |
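The larger p-values in the log-transformed rows are consistent with the log pulling in a long right tail of slow completion times. The sketch below does not run the Shapiro–Wilk test itself (a library routine such as `scipy.stats.shapiro` would normally be used for that); it only illustrates, on hypothetical right-skewed times, how a log transform reduces the moment coefficient of skewness. The function name `skewness` and the data are assumptions for illustration.

```python
import math


def skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed completion times (a few slow players), not study data.
times = [20, 25, 30, 35, 40, 50, 60, 80, 120, 300]
log_times = [math.log(t) for t in times]

print(f"skewness raw: {skewness(times):.2f}")
print(f"skewness log: {skewness(log_times):.2f}")
```

Both values stay positive here, which mirrors the table: the transform helps, but it does not guarantee normality in every group.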
Figure 3. Side-by-side boxplots of the log of completion time for the cleaned data.
Summary of .
| Test | Raw data ( | Cleaned data ( |
|---|---|---|
| Two-sample | 0.478 | 0.058 |
| Two-sample | 0.475 | 0.109 |
| Two-sample | 0.307 | 0.139 |
| Two-sample | 0.323 | 0.180 |
Candidate independent variables to explain Tangrams performance.
| Variable | Research question |
|---|---|
| Gender | Do males or females perform better at tangrams? |
| Academic major | Do students majoring in science, technology, engineering, and mathematics perform better at tangrams than other students? |
| Type of high school attended | Do students who attended private or public high schools perform better at tangrams? |
| Athlete | Do college athletes perform better at tangrams than non-athletes? |
| Political affiliation | Do students who affiliate with the Democratic, Republican, or other parties perform better at tangrams? |
| Academic performance | Do students who made the dean’s list perform better at tangrams than those who did not? |
Candidate dependent variables.
| Variable | Description |
|---|---|
| Puzzle completion time | Time to complete a tangrams puzzle |
| Puzzle success or failure | Given a fixed amount of time, whether or not a student can complete the puzzle |
| Number of moves | Number of moves (a flip or rotation) required to solve the puzzle |
| Time to quit | Time before a student quits a puzzle that is impossible to solve |
| Time to receive a hint | Time until a student asks the game for a hint |
| Number of puzzles solved | Given a fixed amount of time, the number of puzzles that a student can solve |
Survey results for 115 students after completing the Tangrams lab.
| Survey question | Strongly agree (%) | Agree (%) | Neutral (%) | Disagree (%) | Strongly disagree (%) |
|---|---|---|---|---|---|
| The Tangrams lab was a good way of learning about hypothesis testing | 43 | 38 | 8 | 7 | 3 |
| Students who do not major in science should not have to take statistics courses | 5 | 10 | 23 | 37 | 24 |
| Statistics is essentially an accumulation of facts, rules, and formulas | 10 | 34 | 30 | 19 | 6 |
| Creativity plays a role in research | 30 | 47 | 12 | 7 | 4 |
| If an experiment shows that something does not work, the experiment was a failure | 9 | 2 | 5 | 31 | 52 |
| The tangrams lab had a positive effect on my interest in statistics | 17 | 38 | 32 | 13 | 1 |
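Five-point Likert results such as these are often read by collapsing the two "agree" columns and the two "disagree" columns. A minimal sketch of that arithmetic on two rows copied from the table (the helper name `collapse` is hypothetical). Because each percentage is rounded, a row can total 99 or 101 rather than exactly 100.

```python
# Percentages copied from two rows of the survey table (115 respondents).
survey = {
    "The Tangrams lab was a good way of learning about hypothesis testing": [43, 38, 8, 7, 3],
    "Creativity plays a role in research": [30, 47, 12, 7, 4],
}


def collapse(row):
    """Collapse a 5-point Likert row (SA, A, N, D, SD) into agree/neutral/disagree."""
    strongly_agree, agree, neutral, disagree, strongly_disagree = row
    return {
        "agree": strongly_agree + agree,
        "neutral": neutral,
        "disagree": disagree + strongly_disagree,
        "total": sum(row),  # may differ from 100 because of rounding
    }


for item, row in survey.items():
    print(item, collapse(row))
```

For the first item, 81% agreed or strongly agreed that the lab was a good way of learning about hypothesis testing.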