Alessandra Dal Molin, Giacomo Baruzzo, Barbara Di Camillo.
Abstract
The sequencing of the transcriptomes of single cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types and for the study of stochastic gene expression. In recent years, various tools for analyzing single-cell RNA-sequencing data have been proposed, many of them with the purpose of performing differential expression analysis. In this work, we compare four different tools for single-cell RNA-sequencing differential expression, together with two popular methods originally developed for the analysis of bulk RNA-sequencing data but widely applied to single-cell data. We discuss results obtained on two real datasets and one synthetic dataset, along with considerations about the perspectives of single-cell differential expression analysis. In particular, we explore the methods' performance in four different scenarios, mimicking the different unimodal or bimodal distributions of the data that are characteristic of single-cell transcriptomics. We observed marked differences between the selected methods in terms of precision and recall, the number of detected differentially expressed genes, and the overall performance. Globally, the results obtained in our study suggest that it is difficult to identify a best-performing tool and that efforts are needed to improve the methodologies for single-cell RNA-sequencing data analysis and to gain better accuracy of results.
Keywords: assessment; benchmark; differential distributions; differential expression; single-cell RNA-seq
Year: 2017 PMID: 28588607 PMCID: PMC5440469 DOI: 10.3389/fgene.2017.00062
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Tools compared in this study.
| MAST; Finak et al. | Generalized linear hurdle model | | Unix/Linux, Mac OS, Windows | Yes |
| SCDE; Kharchenko et al. | Mixture of a negative binomial distribution and low-level Poisson distribution | | Unix/Linux, Mac OS, Windows | Yes |
| Monocle; Trapnell et al. | Generalized additive model | | Unix/Linux, Mac OS, Windows | Yes |
| D3E; Delmans and Hemberg | Transcriptional bursting model | Python | Unix/Linux, Mac OS, Windows | No |
| DESeq; Anders and Huber | Negative binomial distribution | | Unix/Linux, Mac OS, Windows | No |
| edgeR; Robinson et al. | Negative binomial distribution | | Unix/Linux, Mac OS, Windows | No |
MAST, SCDE, Monocle, and D.
No information available about the version.
Figure 1. Examples of the four classes of differential distributions, as defined in Korthauer et al.
Mean number of DEGs (± standard deviation) detected by each of the assessed tools below the FDR cut-off of 0.05.
| Tool | Detected DEGs | True DEGs among detected DEGs |
| MAST | 1,153.00 ± 15.19 | 1,148.10 ± 15.72 |
| MASTNotCDR | 1,149.00 ± 15.55 | 1,144.10 ± 15.72 |
| SCDE | 1,021.30 ± 25.64 | 1,018.10 ± 24.92 |
| Monocle | 1,576.70 ± 8.47 | 1,471.30 ± 17.17 |
| D3E CvM | 1,741.00 ± 34.28 | 1,507.30 ± 7.78 |
| D3E KS | 1,700.70 ± 23.22 | 1,534.40 ± 16.70 |
| DESeq | 1,122.60 ± 16.95 | 1,116.20 ± 17.75 |
| edgeR | 1,564.50 ± 15.50 | 1,471.10 ± 16.75 |
The third column reports the average number of true DEGs (± standard deviation) among the total number of detected DEGs.
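Dividing the third column by the second gives, for each tool, the approximate fraction of called DEGs that are true positives. The following is a minimal sketch using the mean values from the table above. The names `mean_degs` and `approx_precision` are illustrative, and a ratio of means only approximates the mean of the per-run precision values.

```python
# Mean detected DEGs and mean true DEGs among them, copied from the table above.
mean_degs = {
    "MAST": (1153.00, 1148.10),
    "MASTNotCDR": (1149.00, 1144.10),
    "SCDE": (1021.30, 1018.10),
    "Monocle": (1576.70, 1471.30),
    "D3E CvM": (1741.00, 1507.30),
    "D3E KS": (1700.70, 1534.40),
    "DESeq": (1122.60, 1116.20),
    "edgeR": (1564.50, 1471.10),
}


def approx_precision(detected, true_detected):
    """Ratio of mean true DEGs to mean detected DEGs (an approximation)."""
    return true_detected / detected


precision_by_tool = {
    tool: approx_precision(detected, true_detected)
    for tool, (detected, true_detected) in mean_degs.items()
}

# Print tools from the highest to the lowest approximate precision.
for tool, p in sorted(precision_by_tool.items(), key=lambda kv: -kv[1]):
    print(f"{tool:<11} {p:.3f}")
```

Read this way, the tools that call the most DEGs are not necessarily those with the largest share of true positives among their calls.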
Figure 2. Results of the analysis of simulated data. (A) Global PR curve for all tested tools. (B) Boxplots of global AUPRC. (C) Boxplots of global Precision. (D) Boxplots of global Recall.
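As an illustration of the quantities in this figure, a precision-recall (PR) curve and the area under it can be computed from a ranked gene list as sketched below. This is a generic sketch, not the procedure used in the study: the function names, the toy scores, and the step-wise area rule are assumptions, and ties between scores are broken by input order.

```python
def pr_curve(scores, labels):
    """Return (recall, precision) points obtained by sweeping a threshold.

    scores: smaller means more significant (e.g. adjusted p-values).
    labels: True for genes that are truly differentially expressed.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    total_pos = sum(labels)
    points, tp = [], 0
    for rank, i in enumerate(order, start=1):
        tp += labels[i]  # True counts as 1, False as 0
        points.append((tp / total_pos, tp / rank))
    return points


def auprc(points):
    """Area under the PR curve by step-wise summation (average precision)."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Toy example: six genes, three of them truly differentially expressed.
scores = [0.001, 0.01, 0.02, 0.20, 0.03, 0.60]
labels = [True, True, False, True, False, False]

points = pr_curve(scores, labels)
area = auprc(points)
print(points)
print(round(area, 3))
```

Precision at a fixed cut-off (for example FDR 0.05) corresponds to a single point on this curve, whereas the area summarizes the whole ranking.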
Figure 3. Boxplots of Precision and Recall of simulated data for all tools, reported for the four Differential Distributions classes.
Figure 4. Results of the analysis of the Islam dataset, using the list of the top 1,000 DEGs of Moliner et al. as the benchmark. Stacked barplots of detected DEGs are shown for all tools. The coral bar indicates the intersection with Moliner's reference list. The value above each coral bar is the fraction of the 1,000 Moliner genes, assumed to be true positives, that the tool detected. The value above each blue bar is the ratio between the size of the intersection with Moliner's reference list and the total number of DEGs called by the tool.
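The two ratios described in this caption can be sketched with plain set operations. The gene identifiers and the function name `overlap_ratios` below are hypothetical and serve only to show the arithmetic.

```python
def overlap_ratios(called, reference):
    """Return (hits, hits / |reference|, hits / |called|) for one tool."""
    called, reference = set(called), set(reference)
    hits = called & reference
    frac_of_reference = len(hits) / len(reference)
    frac_of_called = len(hits) / len(called) if called else 0.0
    return len(hits), frac_of_reference, frac_of_called


# Hypothetical gene identifiers: a 4-gene reference list and 3 called DEGs.
reference = {"g1", "g2", "g3", "g4"}
called = {"g2", "g3", "g5"}

n_hits, frac_of_reference, frac_of_called = overlap_ratios(called, reference)
print(n_hits, frac_of_reference, round(frac_of_called, 3))
```

The first ratio plays the role of recall with respect to the reference list, and the second plays the role of precision, under the assumption that the reference genes are the true positives.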
Figure 5. Intersection plot of the tools under comparison. The coral-colored histogram next to the tools' names corresponds to the coral bar of Figure 4, as it reports for each tool the intersection size (i.e., number of DEGs in common) with Moliner's reference list. The green-colored histogram shows the intersection of Moliner's reference list with different combinations of tools. The “dot matrix” below the figure shows these combinations, with black dots marking the tools considered in the intersection and gray dots marking the tools that do not contribute to it.
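The counts behind such an intersection plot can be sketched as follows. The sketch assumes that each reference gene is assigned to the exact combination of tools that detected it (exclusive intersections), which may differ from how the figure was actually built. The gene identifiers are hypothetical and only three of the tools are shown.

```python
from collections import Counter


def exclusive_intersections(reference, calls_by_tool):
    """Count reference genes by the exact set of tools that detected them."""
    counts = Counter()
    for gene in reference:
        tools = frozenset(
            tool for tool, called in calls_by_tool.items() if gene in called
        )
        if tools:  # genes detected by no tool are not counted
            counts[tools] += 1
    return counts


# Hypothetical reference list and per-tool DEG calls.
reference = {"g1", "g2", "g3", "g4", "g5"}
calls_by_tool = {
    "MAST": {"g1", "g2", "g9"},
    "SCDE": {"g1", "g3"},
    "edgeR": {"g1", "g2", "g3", "g4"},
}

counts = exclusive_intersections(reference, calls_by_tool)
for tools, n in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(" & ".join(sorted(tools)), n)
```

Each key of `counts` corresponds to one column of the dot matrix, and its value to the height of the matching bar.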
Summary statistics of run time for all tools on simulated data.
| MAST | 00:00:03:52 ± 00:00:00:65 | 00:00:16:57 ± 00:00:03:47 |
| SCDE | 00:00:19:25 ± 00:00:02:02 | 00:01:26:75 ± 00:00:10:08 |
| Monocle | 00:01:05:04 ± 00:00:07:08 | 00:07:04:44 ± 00:00:11:05 |
| D3E_CvM | – | 04:19:39:46 ± 00:01:39:35 |
| D3E_KS | – | 04:18:41:22 ± 00:01:13:33 |
| DESeq | – | 00:00:26:14 ± 00:00:02:12 |
| edgeR | – | 00:00:03:23 ± 00:00:01:10 |
Means and standard deviations are reported over ten test runs.