Florian Loffing
Abstract
Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization using the most commonly reported statistical software package for data analysis, IBM SPSS Statistics, are missing. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available. Together, they allow the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs, as well as for selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match individual needs. A variety of example applications of the syntax are illustrated in a tutorial-like fashion along with fictitious datasets accompanying this contribution. It is hoped that the syntax collection provides researchers, students, teachers, and others working with SPSS with a valuable tool to move towards more transparency in data visualization.
Keywords: continuous data; descriptive; quantitative methods; statistics; teaching; univariate distribution
Year: 2022; PMID: 35432129; PMCID: PMC9005633; DOI: 10.3389/fpsyg.2022.808469
Source DB: PubMed; Journal: Front Psychol; ISSN: 1664-1078
Overview of study designs and graph types covered by the SPSS-syntax collection.
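| Design | Grouping variables in SPSS | Measurement variables in SPSS | Examples of parametric statistical tests | Pure dot plot | Box plot with dots | Line plot with dots | Bar plot with dots |
|---|---|---|---|---|---|---|---|
| One-sample | 0 | 1 | One-sample | ✓ | ✓ | × | × |
| One between-subject factor | 1 (≥2 levels) | 1 | | ✓ | ✓ | × | ✓ |
| Two between-subject factors | 2 (IVbs-1 and IVbs-2: ≥2 levels) | 1 | Two-factorial univariate ANOVA | ✓ | ✓ | × | ✓ |
| One within-subject factor | 0 | 2 | | ✓ | ✓ | ✓ | × |
| One within-subject factor | 0 | 3 | One-way repeated-measures ANOVA (e.g., pre-post-retention) | ✓ | ✓ | ✓ | × |
| One within-subject factor | 0 | 4 or 5 | One-way repeated-measures ANOVA | ✓ | ✓ | ✓ | × |
| Mixed design (1 between-subject factor x 1 within-subject factor) | 1 (IVbs: 2 levels) | 2 (IVws) | Two-factorial mixed ANOVA with repeated measures on the within-subject factor | ✓ | ✓ | ✓ | × |
| Mixed design (1 between-subject factor x 1 within-subject factor) | 1 (IVbs: 2 levels) | 3 (IVws) | Two-factorial mixed ANOVA with repeated measures on the within-subject factor | ✓ | ✓ | ✓ | × |
| Mixed design (1 between-subject factor x 1 within-subject factor) | 1 (IVbs: 2 levels) | 4 or 5 (IVws) | Two-factorial mixed ANOVA with repeated measures on the within-subject factor | ✓ | ✓ | ✓ | × |
| Mixed design (1 between-subject factor x 1 within-subject factor) | 1 (IVbs: 3 levels) | 2 (IVws) | Two-factorial mixed ANOVA with repeated measures on the within-subject factor | ✓ | ✓ | ✓ | × |
| Mixed design (1 between-subject factor x 1 within-subject factor) | 1 (IVbs: 3 levels) | 3 (IVws) | Two-factorial mixed ANOVA with repeated measures on the within-subject factor | ✓ | ✓ | ✓ | × |
| Mixed design (1 between-subject factor x 1 within-subject factor) | 1 (IVbs: 3 levels) | 4 or 5 (IVws) | Two-factorial mixed ANOVA with repeated measures on the within-subject factor | ✓ | ✓ | ✓ | × |
| Two within-subject factors | 0 | 4 (IVws-A [2 levels] x IVws-B [2 levels]) | Two-factorial repeated measures ANOVA | ✓ (difference scores only) | × | × | × |
| Two within-subject factors | 0 | 6 (IVws-A [2 levels] x IVws-B [3 levels]) | Two-factorial repeated measures ANOVA | ✓ (difference scores only) | × | × | × |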
Two visualization options are available: Individual dots (i.e., cases in a dataset) connected by grey solid lines (filenames include the string “raw-dots-CONNECTED”) or not connected (filenames include the string “raw-dots-NOT-connected”; see main text for details).
An additional panel with difference score(s) can optionally be created as well. The corresponding syntax filenames include the string “with-Delta.”
IV = independent variable. In the case of two-factorial designs, subscripts differentiate between the two independent variables (bs = between-subject factor, 1 = factor 1, 2 = factor 2; ws = within-subject factor, A = factor A, B = factor B).
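The optional difference-score panel ("with-Delta") displays differences between within-subject factor levels, for example between consecutive levels of a pre-post-retention design. Whether the templates derive these scores internally is not stated here. As a minimal sketch, assuming hypothetical variable names ("pre", "post", "retention") and a later-minus-earlier sign convention, such scores could be computed by hand as follows:

```spss
* Minimal sketch, not part of the published syntax collection.
* "pre", "post" and "retention" are hypothetical variable names.
* Assumed sign convention: later level minus earlier level.
COMPUTE delta_post_pre = post - pre.
COMPUTE delta_ret_post = retention - post.
VARIABLE LABELS
  delta_post_pre "Post minus pre"
  delta_ret_post "Retention minus post".
EXECUTE.
```

Each new variable holds one difference score per case, which is the kind of value a difference-score dot plot would display.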
Options for the visualization of measures of central tendency and dispersion, in addition to raw data, depending on graph type.
| Measures | Dot plot | Box plot | Line plot | Bar plot |
|---|---|---|---|---|
| Mean(s) + 95% CI | ✓ | × | ✓ | ✓ |
| Mean(s) + SD | ✓ | × | ✓ | ✓ |
| Mean(s) | ✓ | ✓ | ✓ | ✓ |
| Median(s) | ✓ | × | ✓ | ✓ |
| Mean(s) + median(s) | ✓ | × | ✓ | ✓ |
| Mean(s) + 95% CI + median(s) | ✓ | × | ✓ | ✓ |
| Mean(s) + SD + median(s) | ✓ | × | ✓ | ✓ |
The order of measures listed from the table’s top to bottom corresponds to the order of code included in the respective syntax files. By default, in all graphs that can be created with the SPSS-syntax templates mean values are indicated by red horizontal lines (except for bar graphs) and median values are indicated by blue horizontal lines. CI = confidence interval, SD = standard deviation. Note that CIs are visualized as stand-alone intervals that are related to their respective arithmetic means. Therefore, inference of between- or within-subject comparisons from visual inspection of CIs is not permitted (see Cousineau et al., 2021, for an R-based solution to that problem).
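To cross-check the values a graph displays, the same measures can also be requested as numbers. The following is a minimal sketch using the EXAMINE command. It is not taken from the syntax collection, and "score" and "group" are hypothetical variable names standing in for one measurement variable and one grouping variable:

```spss
* Minimal sketch with hypothetical variable names "score" and "group".
* Requests mean, 95% CI of the mean, median and SD separately for each group.
EXAMINE VARIABLES=score BY group
  /PLOT NONE
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
```

The resulting descriptives table lists each group's mean with its lower and upper 95% CI bounds, the median, and the standard deviation. These values should correspond to the red mean lines, blue median lines, and error bars described above.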
Figure 1. Sections taken from the variable view of the SPSS dataset template and variables needed to specify in the dataset depending on the study design or the factors/factor combination selected for data visualization. Color-filled circles denote essential variables for a given design, whereas white-filled color-bordered circles denote variables that are optional for a given design depending on the number of factors (one- or two-factorial between-subject designs), the number of within-subject factor levels in one-factorial within-subject or mixed designs (2–5 levels supported) or the number of levels of the second factor B (2 or 3 levels supported) in two-factorial within-subject designs.
Figure 2. Basic workflow for raw data visualization in SPSS using syntax and the dataset template provided alongside this article (see main text for details).
Figure 3. Dot plots created based on the example dataset for one-factorial between-subject designs. (A) Original output based on syntax and (B) adjusted figure based on the properties changes illustrated in (C). In (A,B) the red horizontal bars represent the arithmetic mean and error bars represent 95% CIs associated with respective means. In (C), the adjustments made are highlighted red (see main text for details).
Figure 4. Box plots created based on the example dataset for two-factorial between-subject designs. (A) Original output based on syntax and (B) adjusted figure based on similar settings as for the one-factorial between-subject example. Further adjustment of properties related to the bottom horizontal axis (here: the second grouping factor “age”) as illustrated in (C) results in the figure displayed in (D). In (A,B,D) each group’s arithmetic mean is represented through the red horizontal bars. In (C), the adjustments made are circled red (see main text for details).
Figure 5. Line plots created based on the example dataset for two-factorial mixed-designs. (A) Original output based on syntax and (B) adjusted figure based on the settings illustrated and highlighted red in (C). Please note that not each single step realized to move from (A) to (B) is shown in (C) (see main text for details). In (A,B) the red horizontal bars represent the arithmetic mean, the smaller blue horizontal bars represent the median and error bars represent 95% CIs associated with respective means.
Figure 6. Line plots created based on the example dataset for two-factorial mixed-designs. In addition to what is shown in Figure 5, another panel is added which illustrates raw data and summary statistics for differences between consecutive within-subject factor levels (i.e., pre to post, post to retention). (A) Original output as obtained from syntax and (B) adjusted figure based on a variety of settings not illustrated here, but partially explained in Figure 5C as well as in the main text. (C) Panel with individual difference scores connected (see main text for details). In all panels, red horizontal bars represent arithmetic means, blue horizontal bars represent medians and error bars represent 95% CIs associated with respective means.
Figure 7. Exemplar plots visualizing the same data underlying the fictitious example for one-factorial within-subject designs. Line plot on mean choice RT (A) without and (B) with raw data connected. (C) Box plot together with mean choice RT without connection of raw data. (D) Box plot with raw data connected. (E) Dot plot together with mean and median choice RT without connection of raw data. (F) Dot plot of differences in choice reaction time between adjacent within-subject factor conditions (see main text for details). In (A,B,E,F) error bars represent 95% CIs associated with the respective means. Means are represented by red bars (A–C,E,F), medians are indicated by blue bars (E).
Figure 8. (A) Illustration of the exemplar 2-x-2 within-subject design. The labels of the cells representing factor level combinations correspond to the names of variables in the SPSS dataset template that need to be filled with values to run the data visualization syntax. Graphical outputs obtained from 2 × 2 RM-ANOVA showing arithmetic means and associated 95% CIs (B) for quiet and loud conditions by music tempo and (C) for slow and fast conditions by music intensity. (D) Dot plots visualizing individual differences between the loud and quiet condition under slow and fast tempo (left panel) as well as individual differences between the fast and slow condition under quiet and loud intensity (right panel). (E) Same as in (D) with individual data points additionally being connected through straight lines. In (D,E) red bars represent arithmetic means and error bars indicate 95% CIs associated with means.
Overview of exemplar aspects users might want to modify in syntax.
| Aspect to modify | Default syntax code | Comment for modification |
|---|---|---|
| Level of CI | alpha(0.95) | Change the value in parentheses to, e.g., 0.90 or 0.99 to visualize 90% or 99% CIs. By default, the value is set to 0.95 in all syntax templates that allow the visualization of 95% CIs. |
| Display of SEM instead of SD | region.spread.sd | Change “sd” to “se” so that the modified code reads: region.spread.se |
| Shape | shape(shape.ibeam) | Whether the shape of a graphic element can be changed may depend on its type: “interval” displays dispersion measures (i.e., CI, SD), “point” displays raw data values or measures of central tendency (i.e., mean, median), and “line” connects values (raw data dots or means). To change a graphic element’s shape in syntax, change the code given in parentheses after “shape.” Since a multitude of shape options is available, please see the “GPL Reference Guide for IBM SPSS Statistics.” |
| Color | color.interior(color.grey) | To change the color of, for example, the dots representing raw data, change the color constant written in parentheses to, e.g., red, green, blue, black, or any other of the many color constants available through GPL. |
| Size | size(size.medium) | The size of graphic elements is indicated either through size constants (e.g., medium in the left column) or through explicit values given in pixels (e.g., “6px” in the left column). To change an element’s size, replace the code given in parentheses after “size” with either another constant (tiny, small, medium, large, huge) or another value (e.g., “8px”, “10px”). |
| Transparency | transparency.interior(transparency."0.4") | Transparency values can range from 0 (no transparency) to 1 (full transparency). Change the value in parentheses (default is 0.4) to put more or less emphasis on raw data dots compared to measures of central tendency and dispersion. |
| Arrangement of dots representing raw data | point.dodge.symmetric | Change “symmetric” to “asymmetric” for an asymmetric arrangement of raw data dots with left alignment relative to the central position of a category (no overlap of dots representing the same value). |
The code snippet “interior” relates to a graphic element’s fill, whereas “exterior” relates to its border. If neither “interior” nor “exterior” is specified, the code is implicitly treated as “interior.”
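To show where the fragments listed in the table above sit within a chart specification, here is a minimal GGRAPH/GPL sketch. It is not one of the published templates. The variable names "group" and "score" are hypothetical, and raw data are drawn as plain points instead of using the dot arrangement of the templates, so the chart is a stripped-down illustration only:

```spss
* Minimal sketch, not one of the published templates.
* Assumes a hypothetical categorical variable "group" and a continuous variable "score".
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=group score MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: group=col(source(s), name("group"), unit.category())
  DATA: score=col(source(s), name("score"))
  GUIDE: axis(dim(1), label("Group"))
  GUIDE: axis(dim(2), label("Score"))
  COMMENT: Raw data as grey, semi-transparent, medium-sized points
  ELEMENT: point(position(group*score), color.interior(color.grey), transparency.interior(transparency."0.4"), size(size.medium))
  COMMENT: Error bars showing the 95 percent CI around each group mean
  ELEMENT: interval(position(region.confi.mean(group*score, alpha(0.95))), shape(shape.ibeam))
  COMMENT: Group means as red points
  ELEMENT: point(position(summary.mean(group*score)), color.interior(color.red))
END GPL.
```

In this sketch, replacing alpha(0.95) with alpha(0.99) would draw 99% CIs. Swapping region.confi.mean(group*score, alpha(0.95)) for region.spread.sd(group*score) or region.spread.se(group*score) would display the mean plus or minus one SD or one SEM instead.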