
Violin SuperPlots: visualizing replicate heterogeneity in large data sets.

Martin Kenny, Ingmar Schoen.


Year: 2021 | PMID: 34264756 | PMCID: PMC8694042 | DOI: 10.1091/mbc.E21-03-0130

Source DB: PubMed | Journal: Mol Biol Cell | ISSN: 1059-1524 | Impact factor: 4.138


To the editor: A recent article in Molecular Biology of the Cell (Goedhart, 2021) presented a web interface for creating “SuperPlots.” SuperPlots were introduced by Lord and colleagues last year (Lord et al., 2020). They visualize, in a single plot, both the cell-level variability within replicates and the experimental reproducibility between replicates. Simple bar charts or boxplots of the mean or median values of experimental replicates mask the contribution of the underlying cell-to-cell variation in individual experiments. Pooling cell-level data across replicates, by contrast, overemphasizes statistical differences, because every cell is treated as an independent measurement. The SuperPlot put forward by Lord et al. uses a beeswarm plot to display the cell-level data color-coded by replicate. It overlays the mean (or median) and error bars (SD or confidence intervals) of each replicate (Figure 1A). The new web interface (Goedhart, 2021) lets researchers generate beeswarm SuperPlots, as well as RainCloud plots (Allen et al., 2019), from their own data online. We welcome the transparency brought by SuperPlots and would like to introduce an augmentation, the Violin SuperPlot. It further simplifies the visual inspection of raw data with large sample sizes.
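To illustrate why pooling overstates precision, here is a minimal sketch using only the Python standard library. The input layout is hypothetical: a dict that maps a replicate ID to its cell-level values. The data are also hypothetical: three replicates of 100 cells each, with different baselines. The SEM computed from the 300 pooled cells comes out more than ten times smaller than the SEM computed from the three replicate means. The SEM of the replicate means is the quantity that reflects experimental reproducibility.

```python
from math import sqrt
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def pooled_vs_replicate_sem(cells_by_replicate):
    """Return (SEM of pooled cells, SEM of replicate means).

    cells_by_replicate: hypothetical layout, replicate ID -> list of cell values.
    """
    pooled = [v for values in cells_by_replicate.values() for v in values]
    replicate_means = [mean(values) for values in cells_by_replicate.values()]
    return sem(pooled), sem(replicate_means)


# Hypothetical data: three replicates of 100 cells with different baselines.
cells = {
    rep: [base + i % 5 for i in range(100)]
    for rep, base in (("rep1", 10), ("rep2", 20), ("rep3", 30))
}
pooled_sem, replicate_sem = pooled_vs_replicate_sem(cells)
print(f"pooled cells (n=300): SEM={pooled_sem:.2f}")
print(f"replicate means (n=3): SEM={replicate_sem:.2f}")
```

Treating the 300 cells as independent samples shrinks the error bar. The spread that actually matters for reproducibility is between replicates, not between cells.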
FIGURE 1:

Violin SuperPlots for the visualization of replicate heterogeneity in large data sets. (A) Beeswarm SuperPlots show cell-level data (technical replicates) color-coded by experimental (biological) replicate. Distributions of individual replicates can be difficult to interpret because of the density and jitter of the data points. This plot was created using the SuperPlotsOfData web app. (B) Violin SuperPlots depict the cell-level data from each replicate as stripes in a compound violin plot. Same data as in A. (C) The number of replicates (here, six) in Violin SuperPlots can be increased without compromising readability. Symbols: means of experimental replicates. Lines: mean and SEM of the replicate means. Statistical test: paired Student's t test. Data shown: spreading area (µm²) of human platelets seeded on fibrinogen-coated coverslips for 60 min in the presence or absence of 40 µM blebbistatin.
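The replicate-level statistics named in the caption are the mean and SEM of the replicate means and a paired Student's t test. They can be sketched in plain Python as below. This is a minimal illustration with hypothetical per-donor means, not the authors' analysis code. The pairing assumes that both conditions were measured for every donor, as in the platelet experiment. Each replicate contributes exactly one value per condition, so the test has n − 1 degrees of freedom, where n is the number of replicates.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Mean and SEM of replicate-level values (e.g. one mean per donor)."""
    return mean(values), stdev(values) / sqrt(len(values))


def paired_t(control_means, treated_means):
    """Paired Student's t statistic and degrees of freedom.

    Position i in both lists must come from the same replicate (donor).
    """
    if len(control_means) != len(treated_means):
        raise ValueError("each condition needs one mean per replicate")
    diffs = [c - t for c, t in zip(control_means, treated_means)]
    d_mean, d_sem = mean_and_sem(diffs)
    return d_mean / d_sem, len(diffs) - 1


# Hypothetical per-donor means (not the data shown in Figure 1).
control = [10.0, 12.0, 14.0]
treated = [7.0, 8.0, 11.0]
print("control mean, SEM:", mean_and_sem(control))
t_stat, df = paired_t(control, treated)
print("paired t =", t_stat, "with df =", df)
```

In practice, `scipy.stats.ttest_rel(control, treated)` returns the same t statistic together with a two-sided p-value.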

Beeswarm plots are a direct visualization of the raw data points, which sample an underlying parameter distribution. As the number of data points increases, the individual points become indistinguishable, while the outline of the beeswarm plot approaches the shape of the underlying parameter distribution. Moreover, the jittered arrangement of color-coded beeswarms in SuperPlots makes it very difficult to identify differences between the replicates’ distributions (Figure 1A). Lacking suitable alternatives, researchers have instead shown the pooled data distribution as a violin plot. Such a plot contains no information about how cells are distributed within individual biological replicates (Chavali et al., 2020; Pagès et al.). We therefore propose replacing the beeswarm plot with a modified violin plot. A violin plot is essentially a smoothed histogram, rotated by 90° and mirrored, that provides a density estimate of the data (Hintze and Nelson, 1998).
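The idea of a violin as a mirrored density estimate can be sketched with NumPy. The sketch computes a Gaussian kernel density estimate with a Scott's-rule bandwidth, which is the default of `scipy.stats.gaussian_kde`. The kernel, bandwidth rule, evaluation grid and `half_width` scaling are illustrative choices, not taken from the Letter. The "cell areas" are simulated.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule for 1-D data: sample SD times n^(-1/5).
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)


def violin_outline(samples, grid, center=0.0, half_width=0.4):
    """Left and right violin edges: the density mirrored about `center`.

    The density is rescaled so that the widest point spans 2 * half_width.
    """
    density = gaussian_kde_1d(samples, grid)
    scaled = half_width * density / density.max()
    return center - scaled, center + scaled


rng = np.random.default_rng(0)
cells = rng.normal(loc=50.0, scale=10.0, size=300)  # simulated cell areas
grid = np.linspace(0.0, 100.0, 501)
left, right = violin_outline(cells, grid, center=1.0)
```

The estimate integrates to approximately 1 over the grid. The violin is this curve drawn on both sides of its x position, with the data axis vertical.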
In our Violin SuperPlot (Figure 1B), the normalized density estimates of the individual replicates are stacked. This shows how each replicate (color-coded stripe) contributes to the overall density estimate (outline) and allows rapid inspection of experimental variability. These vertical stripes are then overlaid with markers for the central tendency of each replicate’s distribution (mean or median) and with summary statistics of the replicate means (mean and SEM). RainCloud plots (Allen et al., 2019; Goedhart, 2021) are a less familiar representation. Compared with them, Violin SuperPlots are more compact and concise, allowing rapid visual comparison and interpretation. Violin SuperPlots are especially useful for high-throughput single-cell data sets from microscopy screens that contain hundreds of cells per experimental replicate (Pepperkok and Ellenberg, 2006; Jones et al., 2008). Some cell parameters need not be normally distributed. For example, cell spreading area can show skewed distributions with a tail in either direction. The direction depends on the proportion of spread versus nonspread cells, which may vary with drug treatment or because of experimental variability (in Figure 1, from donor to donor). This can be appreciated directly from the width of the stripes in a Violin SuperPlot (Figure 1B), even for experiments with more than three replicates (Figure 1C). It is less clear from the color-coded points of a beeswarm representation (Figure 1A). Violin SuperPlots are particularly suited for data sets with >10 data points per replicate and up to ∼18 biological replicates (Supplemental Figure S1). For fewer data points (<10) and no more than three replicates, a direct depiction of the raw data as a color-coded beeswarm plot might be more appropriate than the smoothed density estimate of a violin plot. For many biological replicates (>18), the shapes of the individual stripes in a Violin SuperPlot become uninformative.
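The stacking step can be sketched with NumPy. This Letter does not state exactly how the published tool normalizes each replicate, so the weighting here is an assumption. Each replicate's unit-area density is weighted by its share of cells, or equally with the hypothetical `equal_weights=True` flag. The cumulative sums then give the boundaries of contiguous stripes, whose outer edges form the violin outline. The function names and data are hypothetical.

```python
import numpy as np


def kde(samples, grid):
    """Gaussian KDE (unit area) with a Scott's-rule bandwidth, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


def stacked_stripes(replicates, grid, center=0.0, half_width=0.4, equal_weights=False):
    """Return one (left_edge, right_edge) pair of arrays per replicate.

    Assumption (not stated in the Letter): each replicate's unit-area density
    is weighted by its share of cells, or equally if equal_weights=True, so
    the stacked stripes sum to a mixture density that forms the outline.
    """
    grid = np.asarray(grid, dtype=float)
    sizes = np.array([len(r) for r in replicates], dtype=float)
    if equal_weights:
        weights = np.full(len(replicates), 1.0 / len(replicates))
    else:
        weights = sizes / sizes.sum()
    densities = np.array([w * kde(r, grid) for w, r in zip(weights, replicates)])
    # Row 0 is zero; row k is the summed density of the first k replicates.
    cumulative = np.vstack([np.zeros_like(grid), np.cumsum(densities, axis=0)])
    # Scale so the widest point of the outline spans 2 * half_width.
    scale = 2 * half_width / cumulative[-1].max()
    outline = cumulative[-1] * scale
    # Center the whole violin on `center`; stripe k lies between rows k and k+1.
    edges = (center - outline / 2) + cumulative * scale
    return [(edges[k], edges[k + 1]) for k in range(len(replicates))]


# Hypothetical data: three replicates with different sizes and shapes.
rng = np.random.default_rng(1)
replicates = [
    rng.normal(50, 8, 200),
    rng.normal(60, 12, 150),
    rng.lognormal(4.0, 0.2, 250),  # right-skewed replicate
]
grid = np.linspace(0, 150, 601)
stripes = stacked_stripes(replicates, grid, center=1.0)
```

Each (left, right) pair can be drawn with Matplotlib's `Axes.fill_betweenx(grid, left, right)` in the replicate's color. The replicate means and their mean ± SEM can then be overlaid as markers and lines. Weighting by cell count lets large replicates dominate the outline. Equal weighting instead gives every biological replicate the same visual weight.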
In this limiting case, plotting the replicate means together with their summary statistics on top of a violin plot of the pooled data (Chavali et al., 2020; Pagès et al.) provides a suitable compromise. Violin SuperPlots thus do not replace previous SuperPlot formats (Lord et al., 2020; Goedhart, 2021) but rather complement them and extend their scope. To help cell biologists generate Violin SuperPlots from their own data, we have developed a Python-based command-line application. It is built on libraries routinely used for scientific data processing and visualization (Harris et al., 2020; Virtanen et al., 2020). The application was designed to be accessible to programmers and nonprogrammers alike, and the generated plots can easily be customized to suit user preferences (Supplemental Figures S2–S4). The package and supporting documentation are freely available from the PyPI repository and in the Supplemental Material accompanying this Letter. A basic implementation for MATLAB is also available as Supplemental Material. The software license also allows Violin SuperPlots to be integrated into web interfaces and other data visualization programs. We join Goedhart (2021) and Lord et al. (2020) in encouraging authors to represent data in ways that help the reader assess biological variation within individual experiments, between biological replicates, and between conditions. We hope that researchers will find Violin SuperPlots intuitive and helpful for this purpose.
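The size guidance on when to use which format can be condensed into a small helper. This is a hypothetical function, not part of the authors' package. The thresholds come from the Letter and are approximate: >10 points per replicate, ∼18 replicates, and <10 points with at most three replicates. Cases the Letter does not address return an explicit "no explicit recommendation" rather than a guess.

```python
def suggest_superplot(points_per_replicate: int, n_replicates: int) -> str:
    """Map the Letter's rule-of-thumb thresholds to a plot type (hypothetical helper)."""
    if n_replicates > 18:
        # Stripes become uninformative: show replicate means over a pooled violin.
        return "pooled violin plot with replicate means"
    if points_per_replicate < 10 and n_replicates <= 3:
        # Few points: show the raw data directly.
        return "beeswarm SuperPlot"
    if points_per_replicate > 10:
        return "Violin SuperPlot"
    # e.g. fewer than 10 points with 4 to 18 replicates: not covered by the Letter.
    return "no explicit recommendation"


print(suggest_superplot(points_per_replicate=300, n_replicates=6))
```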
References

1.  High-throughput fluorescence microscopy for systems biology.

Authors:  Rainer Pepperkok; Jan Ellenberg
Journal:  Nat Rev Mol Cell Biol       Date:  2006-07-19

2.  Wnt-Dependent Oligodendroglial-Endothelial Interactions Regulate White Matter Vascularization and Attenuate Injury.

Authors:  Manideep Chavali; Maria José Ulloa-Navas; Pedro Pérez-Borredá; Jose Manuel Garcia-Verdugo; Patrick S McQuillen; Eric J Huang; David H Rowitch
Journal:  Neuron       Date:  2020-10-20

3.  SuperPlotsOfData-a web app for the transparent display and quantitative comparison of continuous data from different conditions.

Authors:  Joachim Goedhart
Journal:  Mol Biol Cell       Date:  2021-01-21

4.  Raincloud plots: a multi-platform tool for robust data visualization.

Authors:  Micah Allen; Davide Poggiali; Kirstie Whitaker; Tom Rhys Marshall; Rogier A Kievit
Journal:  Wellcome Open Res       Date:  2019-04-01

5.  SuperPlots: Communicating reproducibility and variability in cell biology.

Authors:  Samuel J Lord; Katrina B Velle; R Dyche Mullins; Lillian K Fritz-Laylin
Journal:  J Cell Biol       Date:  2020-06-01

6.  Array programming with NumPy.

Authors:  Charles R Harris; K Jarrod Millman; Stéfan J van der Walt; Ralf Gommers; Pauli Virtanen; David Cournapeau; Eric Wieser; Julian Taylor; Sebastian Berg; Nathaniel J Smith; Robert Kern; Matti Picus; Stephan Hoyer; Marten H van Kerkwijk; Matthew Brett; Allan Haldane; Jaime Fernández Del Río; Mark Wiebe; Pearu Peterson; Pierre Gérard-Marchant; Kevin Sheppard; Tyler Reddy; Warren Weckesser; Hameer Abbasi; Christoph Gohlke; Travis E Oliphant
Journal:  Nature       Date:  2020-09-16

7.  CellProfiler Analyst: data exploration and analysis software for complex image-based screens.

Authors:  Thouis R Jones; In Han Kang; Douglas B Wheeler; Robert A Lindquist; Adam Papallo; David M Sabatini; Polina Golland; Anne E Carpenter
Journal:  BMC Bioinformatics       Date:  2008-11-15

8.  SciPy 1.0: fundamental algorithms for scientific computing in Python.

Authors:  Pauli Virtanen; Ralf Gommers; Travis E Oliphant; Matt Haberland; Tyler Reddy; David Cournapeau; Evgeni Burovski; Pearu Peterson; Warren Weckesser; Jonathan Bright; Stéfan J van der Walt; Matthew Brett; Joshua Wilson; K Jarrod Millman; Nikolay Mayorov; Andrew R J Nelson; Eric Jones; Robert Kern; Eric Larson; C J Carey; İlhan Polat; Yu Feng; Eric W Moore; Jake VanderPlas; Denis Laxalde; Josef Perktold; Robert Cimrman; Ian Henriksen; E A Quintero; Charles R Harris; Anne M Archibald; Antônio H Ribeiro; Fabian Pedregosa; Paul van Mulbregt
Journal:  Nat Methods       Date:  2020-02-03
