
SuperPlotsOfData-a web app for the transparent display and quantitative comparison of continuous data from different conditions.

Abstract

Plots and charts are graphical tools that make data intelligible and digestible for humans. However, oversimplifying data by plotting only the statistical summaries conflicts with the transparent communication of results. Therefore, plotting all data is generally encouraged, and for discrete conditions this can be achieved with a dotplot. Dotplots, however, often fail to communicate whether the data come from different technical or biological replicates. The superplot was proposed by Lord and colleagues (Lord et al., 2020) to improve the communication of experimental design and results. To simplify plotting data from discrete conditions as a superplot, the SuperPlotsOfData web app was developed. The tool offers easy and open access to state-of-the-art data visualization. In addition, it incorporates recent innovations in data visualization and analysis, including raincloud plots and estimation statistics. The free, open source webtool can be accessed at: https://huygens.science.uva.nl/SuperPlotsOfData/.


Year: 2021 PMID: 33476183 PMCID: PMC8101441 DOI: 10.1091/mbc.E20-09-0583

Source DB: PubMed Journal: Mol Biol Cell ISSN: 1059-1524 Impact factor: 4.138

INTRODUCTION

Graphs have a key role in the communication of experimental results in lab meetings, presentations, and manuscripts. Over the years, efforts have been made to increase transparency in the reporting of results, and this has led to the recommendation to plot all data rather than just a summary (Drummond and Vowler, 2011; Wilcox and Rousselet, 2018; Weissgerber). For the display of continuous data from discrete conditions, this means that the dot is the geometry of choice, used instead of, or on top of, a bar that summarizes the data, because it enables the display of all data. Dotplots can be generated by several popular commercial software packages, but free, open source solutions also exist (Spitzer; Mauri; Weissgerber; Postma and Goedhart, 2019). We have previously created PlotsOfData to facilitate the visualization of continuous data from different (discrete) conditions (Postma and Goedhart, 2019). Under the hood, PlotsOfData uses the statistical computing software R and several packages, including ggplot2 for state-of-the-art data visualization (Wickham, 2011). The software is operated through a graphical user interface in a web browser. As a result, PlotsOfData is a universally accessible open source tool that delivers high-quality plots without the need for coding skills and with a minimal learning curve. In addition to the transparent display of the data, information about the experimental design is necessary to interpret the results. For instance, it should be clear whether data are paired, how many technical and biological replicates are plotted, and how “n” is defined for different conditions (Naegle; Lazic). The “superplot” has been proposed by Lord and colleagues as a way to make this clear in a plot (Lord). A superplot identifies different replicates by color and/or shape and uses the average of each (biological) replicate as input for the comparison of conditions.
An additional benefit of explicitly identifying biological replicates, instead of aggregating all technical replicates, is that realistic p values are obtained. Other approaches that aim to improve data visualization and analysis have recently been proposed, namely raincloud plots (Allen) and the use of estimation statistics (Cumming, 2014; Claridge-Chang and Assam, 2016; Ho). The benefit of an open source tool (such as PlotsOfData) that is based on a powerful statistical computing language with excellent graphics is that these innovations can readily be incorporated. In contrast, implementing these new ideas in commercial software requires multistep workarounds (Lord). Here, I present SuperPlotsOfData, which builds on the PlotsOfData web tool and follows the same philosophy of providing easy and open access to state-of-the-art data visualization. In addition, it enables the identification of replicates as a superplot and incorporates recent innovations, including raincloud plots and estimation statistics.

Availability, code, and issue reporting

The SuperPlotsOfData app is available at: https://huygens.science.uva.nl/SuperPlotsOfData. The code was written in R (https://www.r-project.org) using RStudio (https://www.rstudio.com). To run the app, several freely available packages are required: shiny, ggplot2, magrittr, dplyr, readr, tidyr, ggbeeswarm, readxl, DT, broom, and RCurl. This version of the manuscript corresponds to version 1.0.3 of the web app (https://github.com/JoachimGoedhart/SuperPlotsOfData/releases/tag/v1.0.3), which is archived at Zenodo, doi: https://doi.org/10.5281/zenodo.4423341. Up-to-date code and new releases will be made available on GitHub, together with information on running the app locally: https://github.com/JoachimGoedhart/SuperPlotsOfData/. The GitHub page of SuperPlotsOfData is the preferred way to report issues and request features (https://github.com/JoachimGoedhart/SuperPlotsOfData/issues). Alternatively, users can contact the developers by email or Twitter. Contact information can be found on the “About” page of the app.

Data input and data format

The default data structure is the tidy format (Wickham, 2014), and an example is shown in Table 1. The minimum input consists of a column with conditions and a column with measured values. A third column that identifies the replicates is recommended to take full advantage of the application.
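The following is a minimal Python sketch of reading tidy input such as Table 1. The app itself is written in R, so this only illustrates the data format and is not the app's code. The function name `read_tidy` and its parameters are hypothetical, and the column names are taken from Table 1. Only the condition and value columns are required. When no replicate column is given, every record gets `None` as its replicate, so the measurements end up grouped per condition.

```python
import csv
import io
from typing import List, Optional

# Table 1 as CSV text (column names as in Table 1).
TABLE_1_CSV = """Condition,Replicate,Value
Control,Replicate1,7.1
Control,Replicate1,6.7
Control,Replicate1,6.3
Drug,Replicate1,9.9
Drug,Replicate1,9.5
Drug,Replicate1,8.5
Control,Replicate2,5.1
Control,Replicate2,4.5
Control,Replicate2,4.4
Drug,Replicate2,8.7
Drug,Replicate2,9.1
Drug,Replicate2,8.5
"""


def read_tidy(text: str, condition_col: str = "Condition", value_col: str = "Value",
              replicate_col: Optional[str] = None) -> List[dict]:
    """Hypothetical reader returning one record per row of tidy data.

    The condition and value columns are required; the replicate column is
    optional and is recorded as None when it is not selected.
    """
    reader = csv.DictReader(io.StringIO(text))
    required = [condition_col, value_col] + ([replicate_col] if replicate_col else [])
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {missing}")
    return [
        {
            "condition": row[condition_col],
            "replicate": row[replicate_col] if replicate_col else None,
            "value": float(row[value_col]),
        }
        for row in reader
    ]


with_reps = read_tidy(TABLE_1_CSV, replicate_col="Replicate")
without_reps = read_tidy(TABLE_1_CSV)  # no replicates selected
print(len(with_reps), with_reps[0])
```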

TABLE 1:

Synthetic data in the tidy format, where each row is an observation and all measured values are in one column.

Condition	Replicate	Value
Control	Replicate1	7.1
Control	Replicate1	6.7
Control	Replicate1	6.3
Drug	Replicate1	9.9
Drug	Replicate1	9.5
Drug	Replicate1	8.5
Control	Replicate2	5.1
Control	Replicate2	4.5
Control	Replicate2	4.4
Drug	Replicate2	8.7
Drug	Replicate2	9.1
Drug	Replicate2	8.5

Data are, however, often organized and stored in spreadsheets. The web app supports this type of “wide” data when it is structured according to Table 2. Users are asked to specify the number of rows that define the conditions and replicates. For a superplot, two rows are typically needed: one for the conditions and another for the replicates. Users can assign a label to each of these rows. After upload, the data are converted to the tidy format.
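The wide-to-tidy conversion described here can be sketched in Python using the layout of Table 2. This is an assumed reimplementation and not the app's R code. The name `wide_to_tidy` is hypothetical, and the number of header rows is taken to equal the number of user-defined labels. Empty cells are skipped so that conditions may have different numbers of measurements.

```python
from typing import List, Sequence

# Table 2 as a list of rows: two header rows, then the measurements.
TABLE_2 = [
    ["Control", "Drug", "Control", "Drug"],
    ["Replicate1", "Replicate1", "Replicate2", "Replicate2"],
    ["7.1", "9.9", "5.1", "8.7"],
    ["6.7", "9.5", "4.5", "9.1"],
    ["6.3", "8.5", "4.4", "8.5"],
]


def wide_to_tidy(rows: List[List[str]],
                 labels: Sequence[str] = ("Condition", "Replicate")) -> List[dict]:
    """Hypothetical converter: the first len(labels) rows are header rows.

    Each header row gets the corresponding label, and every non-empty cell
    below the headers becomes one tidy observation.
    """
    n_header = len(labels)
    headers, body = rows[:n_header], rows[n_header:]
    tidy = []
    for col in range(len(headers[0])):          # walk the table column by column
        for row in body:
            cell = row[col].strip() if col < len(row) else ""
            if not cell:                        # columns may differ in length
                continue
            record = {label: headers[i][col] for i, label in enumerate(labels)}
            record["Value"] = float(cell)
            tidy.append(record)
    return tidy


tidy = wide_to_tidy(TABLE_2)
for record in tidy[:3]:
    print(record)
```

Because the table is walked column by column, the observations come out in the same order as the rows of Table 1.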

TABLE 2:

Data, identical to the data in Table 1, in a spreadsheet format. The first two rows specify the condition and replicate.

Control	Drug	Control	Drug
Replicate1	Replicate1	Replicate2	Replicate2
7.1	9.9	5.1	8.7
6.7	9.5	4.5	9.1
6.3	8.5	4.4	8.5

The data can be supplied as a CSV or XLS(X) file, by copy-paste, or through a URL (CSV files only). An example dataset is available to demonstrate the data structure and to test the app. After data upload, the user selects the columns that hold the information on the Conditions, Measurements, and (optional) Replicates. When no replicates are selected, the measurements are grouped per condition.

Data visualization

When the data are composed of different replicates, each replicate is indicated within a condition by a different color and/or symbol, as suggested previously (Galbraith; Weissgerber; Lord). The mean or median of each replicate is indicated with a larger dot (Figure 1). When the user qualifies the data as “paired,” lines are drawn between the means or medians of the same replicate in the different conditions. Pairing of the data affects the statistics for the quantitative comparison of conditions, as explained below.
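The sketch below shows how the larger dots and the connecting lines relate to the tidy data. The names `replicate_summaries` and `paired_lines` are hypothetical, and the sketch computes only the values to plot, not the plot itself. Each (condition, replicate) group is reduced to its mean or median. For paired data, one line per replicate connects that replicate's summaries across conditions.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Optional, Tuple

# Table 1 as (condition, replicate, value) tuples.
TABLE_1 = [
    ("Control", "Replicate1", 7.1), ("Control", "Replicate1", 6.7), ("Control", "Replicate1", 6.3),
    ("Drug", "Replicate1", 9.9), ("Drug", "Replicate1", 9.5), ("Drug", "Replicate1", 8.5),
    ("Control", "Replicate2", 5.1), ("Control", "Replicate2", 4.5), ("Control", "Replicate2", 4.4),
    ("Drug", "Replicate2", 8.7), ("Drug", "Replicate2", 9.1), ("Drug", "Replicate2", 8.5),
]


def replicate_summaries(rows, summary: str = "mean") -> Dict[Tuple[str, Optional[str]], float]:
    """Summary value per (condition, replicate), i.e. the 'larger dot'.

    If every replicate is None, this reduces to grouping per condition.
    """
    func = {"mean": mean, "median": median}[summary]
    groups = defaultdict(list)
    for condition, replicate, value in rows:
        groups[(condition, replicate)].append(value)
    return {key: func(values) for key, values in groups.items()}


def paired_lines(summaries) -> Dict[Optional[str], Dict[str, float]]:
    """For paired data: per replicate, its summary value in each condition (one line)."""
    lines = defaultdict(dict)
    for (condition, replicate), value in summaries.items():
        lines[replicate][condition] = value
    return dict(lines)


means = replicate_summaries(TABLE_1)
medians = replicate_summaries(TABLE_1, summary="median")
lines = paired_lines(means)
print(lines)
```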

FIGURE 1:

Output of the application based on the example data. (A) By default, the (biological) replicates are identified by unique, colorblind-friendly colors, and the mean of each replicate is indicated with a larger dot. (B) An alternative presentation of the same data that uses different symbols and gray values to identify replicates.

Statistics

Technical replicates.

A median or mean can be selected as the summary statistic for each replicate, and this is shown in the graphs as a larger dot. Several other statistics for individual replicates are available under the Data Summary tab, including n, the standard deviation (SD), the standard error of the mean (SEM), and the 95% confidence interval (CI). The table with the statistics for the individual replicates (Figure 2) also lists the p value of a Shapiro–Wilk test, which can be used to evaluate whether the data of each replicate deviate from a normal distribution. A low p value indicates a deviation from normality. A high p value only means that no deviation was detected; it does not prove normality. When the majority of the replicates show a deviation from normality (p < 0.05), the user is notified and advised to use the median as the summary statistic.
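The per-replicate statistics and the normality notification can be sketched as follows. Both functions are hypothetical. The Shapiro–Wilk test itself is not reimplemented here (in Python, `scipy.stats.shapiro` provides it), so the p values passed to `suggest_median` are made-up examples. The sketch also assumes that "majority" means more than half of the replicates; the app's exact rule may differ.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence


def describe_replicate(values: Sequence[float]) -> Dict[str, float]:
    """n, mean, sample SD and SEM for the measurements of one replicate."""
    n = len(values)
    sd = stdev(values)                 # sample SD (n - 1 in the denominator)
    return {"n": n, "mean": mean(values), "sd": sd, "sem": sd / math.sqrt(n)}


def suggest_median(shapiro_p_values: List[float], alpha: float = 0.05) -> bool:
    """True when more than half of the replicates have Shapiro-Wilk p < alpha."""
    n_flagged = sum(p < alpha for p in shapiro_p_values)
    return n_flagged > len(shapiro_p_values) / 2


# Control, Replicate1 from Table 1.
control_rep1 = describe_replicate([7.1, 6.7, 6.3])
print(control_rep1)
# Hypothetical Shapiro-Wilk p values for three replicates (not computed here).
print(suggest_median([0.01, 0.03, 0.40]))   # two of three below 0.05 -> True
```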

FIGURE 2:

A screenshot of statistics from the example data that are calculated by SuperPlotsOfData. The statistics are available under the Data Summary tab. Each of the tables can be downloaded in a number of formats.

Biological replicates.

The statistics for each condition are presented in a second table that is available under the Data Summary tab. This table displays the number of (biological) replicates, the mean, the standard deviation, the standard error of the mean, and the 95% CI.
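The following sketches the condition-level statistics, assuming they are computed from one summary value (here the mean) per biological replicate. A t-based 95% CI needs a quantile of Student's t distribution, which the Python standard library lacks. The sketch therefore derives it from the regularized incomplete beta function. This helper code is an assumption of the example and is not part of SuperPlotsOfData. With only two replicates, the CI uses df = 1 and is therefore very wide.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        m2 = 2 * m
        # Even and odd terms of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return h


def betainc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: float) -> float:
    """P(|T| >= |t|) for a Student t variable with df degrees of freedom."""
    return betainc(df / 2.0, 0.5, df / (df + t * t))


def t_critical(df: float, level: float = 0.95) -> float:
    """Two-sided critical value, found by bisection on t_two_sided_p."""
    lo, hi = 0.0, 1e4
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if t_two_sided_p(mid, df) > 1.0 - level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def condition_summary(replicate_means: Sequence[float], level: float = 0.95) -> Dict[str, object]:
    """Statistics across biological replicates, one summary value per replicate."""
    n = len(replicate_means)
    m, sd = mean(replicate_means), stdev(replicate_means)
    sem = sd / math.sqrt(n)
    half_width = t_critical(n - 1, level) * sem
    return {"n": n, "mean": m, "sd": sd, "sem": sem, "ci": (m - half_width, m + half_width)}


# Replicate means of the Control condition in Table 1: 20.1/3 and 14.0/3.
control = condition_summary([20.1 / 3, 14.0 / 3])
print(control)
```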

Comparing conditions.

The aim of an experiment with different conditions is often to detect a difference between those conditions. One way to do this in a plot is to compare the 95% CIs. SuperPlotsOfData has an option to display the mean and 95% CI to enable inference by eye (Cumming and Finch, 2005; Cumming, 2009). Alternatively, the mean can be displayed together with the SD to summarize the spread in the data measured for a condition. The predominant statistical method for the quantitative comparison of data is null-hypothesis significance testing (NHST). A low p value is taken as evidence for a difference between conditions. SuperPlotsOfData offers a t test for a difference between the means of the individual replicates. Depending on the experimental design, the data can be paired. To highlight a paired relation, lines can be added to connect the mean or median values of the replicates. This affects the result of the statistical analysis, and the app displays a notification. A paired t test is performed when the means are connected with lines. When the means are not connected, an unpaired t test with a correction for unequal variances (also known as Welch’s t test) is performed. The table with statistics can be displayed under the plot and is also available under the Data Summary tab. Instead of testing for statistical significance, it is often more interesting and biologically relevant to answer the question: how large is the difference? (Cumming, 2014). The difference between conditions is an effect size, and reporting it together with its 95% CI is termed estimation statistics (Claridge-Chang and Assam, 2016). SuperPlotsOfData has an option to display the difference between a reference condition, selected by the user, and each of the other conditions.
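The paired and Welch t tests on replicate means, together with the difference and its 95% CI, can be sketched as follows. The sketch uses the replicate means of Table 1, with Control as the reference condition. It rests on several assumptions. The t distribution is computed from the regularized incomplete beta function because the standard library has none. The CI shown is the classical t-based interval, which is not necessarily how the app computes the CI of the difference. The function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance
from typing import Sequence


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return h


def betainc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: float) -> float:
    """P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return betainc(df / 2.0, 0.5, df / (df + t * t))


def t_critical(df: float, level: float = 0.95) -> float:
    """Two-sided critical value by bisection."""
    lo, hi = 0.0, 1e4
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if t_two_sided_p(mid, df) > 1.0 - level else (lo, mid)
    return (lo + hi) / 2.0


def _result(diff: float, se: float, df: float, level: float) -> dict:
    t = diff / se
    half = t_critical(df, level) * se
    return {"difference": diff, "t": t, "df": df, "p": t_two_sided_p(t, df),
            "ci": (diff - half, diff + half)}


def paired_test(reference: Sequence[float], other: Sequence[float], level: float = 0.95) -> dict:
    """Paired t test; replicate summaries are matched by position (same replicate)."""
    diffs = [o - r for r, o in zip(reference, other)]
    return _result(mean(diffs), stdev(diffs) / math.sqrt(len(diffs)), len(diffs) - 1, level)


def welch_test(reference: Sequence[float], other: Sequence[float], level: float = 0.95) -> dict:
    """Unpaired t test with Welch's correction for unequal variances."""
    va, vb = variance(reference) / len(reference), variance(other) / len(other)
    df = (va + vb) ** 2 / (va ** 2 / (len(reference) - 1) + vb ** 2 / (len(other) - 1))
    return _result(mean(other) - mean(reference), math.sqrt(va + vb), df, level)


control = [20.1 / 3, 14.0 / 3]   # Replicate1, Replicate2 means from Table 1
drug = [27.9 / 3, 26.3 / 3]
paired = paired_test(control, drug)
unpaired = welch_test(control, drug)
print(paired)
print(unpaired)
```

With only two biological replicates, both intervals for the difference are very wide. This shows how estimation statistics conveys the (im)precision of an estimate, not only whether p falls below a threshold.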

Optimizing the visualization

Scientific graphs often present data in an unpolished format, with default settings that are not optimized to communicate the data in an easy-to-digest manner. In marked contrast, the field of data visualization focuses on “storytelling” and aims to convey the story told by the data with a clear and compelling illustration. Several principles of storytelling can aid the construction of graphs that are easier to understand. SuperPlotsOfData has several features that make the communication and interpretation of the data more effective, including 1) the option to sort the conditions according to the measured variable, 2) the option to rotate the graph by 90 degrees to improve the readability of the condition labels, 3) effective use of colorblind-friendly colors, and 4) the option to switch off the gridlines. Finally, a dark theme is available to generate plots for presentations or websites that use a dark background.
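Sorting the conditions by the measured variable takes only a few lines, as this sketch shows. It uses the median as the sort key, which is an assumption since the app may use another summary. The name `order_conditions` is hypothetical.

```python
from statistics import median
from typing import Callable, Dict, List, Sequence


def order_conditions(groups: Dict[str, Sequence[float]],
                     key: Callable[[Sequence[float]], float] = median,
                     descending: bool = False) -> List[str]:
    """Return condition names sorted by a summary of their measurements."""
    return sorted(groups, key=lambda name: key(groups[name]), reverse=descending)


# All measurements per condition from Table 1.
groups = {
    "Control": [7.1, 6.7, 6.3, 5.1, 4.5, 4.4],
    "Drug": [9.9, 9.5, 8.5, 8.7, 9.1, 8.5],
}
print(order_conditions(groups))                   # ['Control', 'Drug']
print(order_conditions(groups, descending=True))  # ['Drug', 'Control']
```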

Output

The graphs can be downloaded in PDF, SVG, or PNG format. The PDF format enables editing of figures in software applications that accept vector-based graphics. The statistics can be downloaded in CSV or Excel format. A snapshot of the current settings can be made with the “Clone current setting” button, which returns a URL that encodes the user-defined settings of the current session. When the data are imported from a web address, the graph can be stored and exchanged by means of this URL. This option enables a reproducible user interface. In addition, the option to retrieve a hyperlink of the current settings facilitates data sharing and reuse. For details, the reader is referred to the papers that report our other apps in which this feature is implemented (Postma and Goedhart, 2019; Goedhart and Luijsterburg, 2020). The hyperlinks (URLs) that correspond to the settings used for the plots in Figures 1 and 3 are listed in Table 3.
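To show how a settings URL carries its information, the sketch below splits the 3B URL from Table 3 into its parameter groups (`data`, `vis`, `layout`, `color`, `label`) and their semicolon-separated fields. It does not interpret the fields, whose meaning is defined by the app. `url_settings` is a hypothetical helper. The query string is split by hand because older Python versions of `parse_qs` also treated `;` as a separator.

```python
from typing import Dict, List
from urllib.parse import unquote, urlsplit


def url_settings(url: str) -> Dict[str, List[str]]:
    """Split a settings URL into parameter groups and their ';'-separated fields."""
    settings = {}
    for part in urlsplit(url).query.split("&"):
        if not part:                      # e.g. the trailing '&'
            continue
        key, _, value = part.partition("=")
        settings[key] = [unquote(field) for field in value.split(";")]
    return settings


URL_3B = ("https://huygens.science.uva.nl:/SuperPlotsOfData/?data=1;;Treatment;Speed;Replicate"
          "&vis=quasirandom;;0.7;mean;solid;;;1;none"
          "&layout=Horizontal;;;TRUE;;0,60;;6;;480;480"
          "&color=none&label=;;;;;;24;24;18;;&")
settings = url_settings(URL_3B)
for key, fields in settings.items():
    print(key, fields)
```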

FIGURE 3:

An example of the flexibility in plotting the data offered by SuperPlotsOfData. All plots are based on the example data, and the biological replicates are paired, as indicated by the solid lines. (A) Classic presentation, (B) separate presentation of each replicate, and (C) rotated plot with the data distributions on top, also known as a raincloud plot. The URLs to recreate these figures are listed in Table 3.

TABLE 3:

Hyperlinks that define the plots in Figures 1 and 3.

Figure	URL
1A	https://huygens.science.uva.nl:/SuperPlotsOfData/?data=2
1B	https://huygens.science.uva.nl/SuperPlotsOfData/?data=2&vis=;;0.7;;;;TRUE;1;none&layout=No;;;;;;;1
3A	https://huygens.science.uva.nl:/SuperPlotsOfData/?data=1;;Treatment;Speed;Replicate&vis=quasirandom;;0.7;mean;solid;;;1;none&layout=No;;;;;;;6;;480;480&color=none&label=;;;;;;24;24;18;;&
3B	https://huygens.science.uva.nl:/SuperPlotsOfData/?data=1;;Treatment;Speed;Replicate&vis=quasirandom;;0.7;mean;solid;;;1;none&layout=Horizontal;;;TRUE;;0,60;;6;;480;480&color=none&label=;;;;;;24;24;18;;&
3C	https://huygens.science.uva.nl:/SuperPlotsOfData/?data=1;;Treatment;Speed;Replicate&vis=random;TRUE;0.7;mean;solid;;;1;none&layout=No;TRUE;TRUE;TRUE;;0,60;;6;;480;480&color=none&label=;;;;;;24;24;18;;&

Launching the webtool with a hyperlink reproduces the corresponding figure (make sure to copy-paste the entire URL; clicking the hyperlink in the PDF may result in a broken link).

Limitations

A limitation of the app is the absence of a dedicated statistical analysis, for several reasons. First, the main purpose of the app is to visualize the data. Second, it is hard to implement an appropriate and fail-safe statistical analysis, since it is unpredictable what data will be supplied and analyzed (in terms of experimental design, but also the number of technical and biological replicates and the data distribution). By not implementing a range of analyses, mindless, click-a-button statistical analysis and “shopping” for the test that yields significance are discouraged. The test for statistical significance should be carefully chosen to match the experimental design and data. For an overview of statistical tests and a decision tree to select the correct test, the reader is referred to Pollard et al. (Pollard). Finally, I believe that estimation statistics, that is, quantifying the actual difference between conditions, should be promoted, and therefore an emphasis on significance testing is not desirable. Experimental data that comprise both technical and biological replicates have a nested design. Each type of replicate contributes in a different way to the overall variance. A statistically correct comparison for these types of data uses a multilevel model (Galbraith; Aarts). However, this is a relatively complicated analysis. As a practical and intuitive alternative, the average of the technical replicates within each biological replicate can be used as input for a standard (paired) t test. This approach is statistically valid (Galbraith; Aarts), but it is recommended to keep the number of technical replicates per biological replicate similar (Lazic, 2010). Although averaging the technical replicates is not the best approach (Galbraith), it is preferred over aggregating all technical replicates and ignoring the existence of biological replicates.
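The effect of ignoring biological replicates can be illustrated with the Table 1 data. The sketch below computes Welch's t statistic and degrees of freedom twice: once on all technical replicates pooled, and once on the mean of each biological replicate. It is an illustration only; `welch_t_and_df` is a hypothetical helper, and Welch's test is used in both cases so that the two results are directly comparable.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t_and_df(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic (b minus a) and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(b) - mean(a)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


control = {"Replicate1": [7.1, 6.7, 6.3], "Replicate2": [5.1, 4.5, 4.4]}
drug = {"Replicate1": [9.9, 9.5, 8.5], "Replicate2": [8.7, 9.1, 8.5]}

# Aggregating all technical replicates as if independent (n = 6 per condition).
pooled = welch_t_and_df([v for vals in control.values() for v in vals],
                        [v for vals in drug.values() for v in vals])
# One mean per biological replicate (n = 2 per condition).
averaged = welch_t_and_df([mean(v) for v in control.values()],
                          [mean(v) for v in drug.values()])
print("pooled:   t = %.2f, df = %.2f" % pooled)
print("averaged: t = %.2f, df = %.2f" % averaged)
```

Pooling treats six observations per condition as independent. This roughly doubles t and raises the degrees of freedom from about 1 to about 7, which would give a much smaller and unrealistic p value.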
For a detailed discussion of the superplot, the reader is referred to the original paper that proposed superplots (Lord). The web app enables the simultaneous visualization of multiple conditions. However, when more than two conditions are compared, a different test for statistical significance may be required. For instance, testing multiple conditions may require a correction for multiple testing to reduce the number of false positives. Since the primary goal of the web app is to visualize data, these analyses are currently not implemented. Even when more sophisticated statistics are used to calculate p values (e.g., hierarchical or nested tests), SuperPlotsOfData graphs can still be used to convey the distribution of the data.
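Holm's step-down method is one example of a correction for multiple testing; the app does not implement it. The sketch below shows how it works. The choice of Holm's method and the input p values are illustrative assumptions, and `holm_adjust` is a hypothetical name.

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm's step-down adjustment; adjusted p values are returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply by the number of remaining hypotheses, cap at 1, keep monotone.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical p values from three comparisons against a reference condition.
print(holm_adjust([0.01, 0.04, 0.03]))
```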

DISCUSSION

The SuperPlotsOfData app implements some of the recent innovations in data visualization and analysis. I hope that the webtool will encourage users to adopt best practices in data presentation and analysis. These best practices include the display of individual observations, the distinction between technical and biological replicates, and the use of estimation statistics for the quantitative comparison of conditions. The tool democratizes modern data visualization because 1) the app is freely accessible online and can also run locally with free software, 2) it does not require any coding skills, and 3) it has a minimal learning curve. Finally, the code of the app is available, which makes the analysis procedure transparent and open to modification to accommodate future developments in analysis and visualization.

REFERENCES (21 in total, 10 listed)

1. Estimation statistics should replace significance testing.

Authors: Adam Claridge-Chang; Pryseley N Assam
Journal: Nat Methods Date: 2016-02

2. Inference by eye: confidence intervals and how to read pictures of data.

Authors: Geoff Cumming; Sue Finch
Journal: Am Psychol Date: 2005 Feb-Mar

3. Inference by eye: reading the overlap of independent confidence intervals.

Authors: Geoff Cumming
Journal: Stat Med Date: 2009-01-30

4. The new statistics: why and how.

Authors: Geoff Cumming
Journal: Psychol Sci Date: 2013-11-12

5. Show the data, don't conceal them.

Authors: G B Drummond; S L Vowler
Journal: J Physiol Date: 2011-04-15

6. A Guide to Robust Statistical Methods in Neuroscience.

Authors: Rand R Wilcox; Guillaume A Rousselet
Journal: Curr Protoc Neurosci Date: 2018-01-22

7. Reveal, Don't Conceal: Transforming Data Visualization to Improve Transparency.

Authors: Tracey L Weissgerber; Stacey J Winham; Ethan P Heinzen; Jelena S Milin-Lazovic; Oscar Garcia-Valencia; Zoran Bukumiric; Marko D Savic; Vesna D Garovic; Natasa M Milic
Journal: Circulation Date: 2019-10-28

8. Raincloud plots: a multi-platform tool for robust data visualization.

Authors: Micah Allen; Davide Poggiali; Kirstie Whitaker; Tom Rhys Marshall; Rogier A Kievit
Journal: Wellcome Open Res Date: 2019-04-01

9. VolcaNoseR is a web app for creating, exploring, labeling and sharing volcano plots.

Authors: Joachim Goedhart; Martijn S Luijsterburg
Journal: Sci Rep Date: 2020-11-25

10. What exactly is 'N' in cell culture and animal experiments?

Authors: Stanley E Lazic; Charlie J Clarke-Williams; Marcus R Munafò
Journal: PLoS Biol Date: 2018-04-04
