Christian H Holland, Jovan Tanevski, Javier Perales-Patón, Jan Gleixner, Manu P Kumar, Elisabetta Mereu, Brian A Joughin, Oliver Stegle, Douglas A Lauffenburger, Holger Heyn, Bence Szalai, Julio Saez-Rodriguez.
Abstract
BACKGROUND: Many functional analysis tools have been developed to extract functional and mechanistic insight from bulk transcriptome data. With the advent of single-cell RNA sequencing (scRNA-seq), it is in principle possible to perform such analyses for single cells. However, scRNA-seq data have characteristics such as drop-out events and low library sizes. It is thus not clear whether functional TF and pathway analysis tools established for bulk sequencing can be applied to scRNA-seq in a meaningful way.
Keywords: Benchmark; Functional analysis; Pathway analysis; Transcription factor analysis; scRNA-seq
Year: 2020 PMID: 32051003 PMCID: PMC7017576 DOI: 10.1186/s13059-020-1949-z
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Testing the robustness of DoRothEA (AB), PROGENy, and GO-GSEA against low gene coverage. a DoRothEA (AB) performance (area under ROC curve, AUROC) versus gene coverage. b PROGENy performance (AUROC) for different numbers of footprint genes per pathway versus gene coverage. c Performance (AUROC) of GO-GSEA versus gene coverage. The dashed line indicates the performance of a random model. The colors in a and c are meant only as a visual aid to distinguish between the individual violin plots and jittered points
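The AUROC metric used throughout these figures can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function name `auroc` and the toy inputs are hypothetical, and the rank-based (Mann-Whitney) formulation is assumed here only because it makes the random-model baseline of 0.5 easy to see.

```python
def auroc(scores, labels):
    """Rank-based AUROC (hypothetical helper, not from the paper).

    Returns the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example: activity scores for four perturbation experiments,
# where label 1 marks the truly perturbed TF or pathway.
perfect = auroc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
uninformative = auroc([0.5, 0.5], [1, 0])
```

A score that cannot separate positives from negatives yields 0.5, which corresponds to the dashed random-model line in the panels.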
Fig. 2 Benchmark results of TF and pathway analysis tools on simulated scRNA-seq data. a Simulation strategy of single cells from an RNA-seq bulk sample. b Example workflow of DoRothEA’s performance evaluation on simulated single cells for a specific parameter combination (number of cells = 10, mean library size = 5000). Step 1: ROC curves of DoRothEA’s performance on single cells (25 replicates) and on bulk data including only TFs with confidence level A. Step 2: DoRothEA performance on single cells and bulk data summarized as AUROC vs TF coverage. TF coverage denotes the number of distinct perturbed TFs in the benchmark dataset that are also covered by the gene set resource (see Additional file 1: Figure S3a). Results are provided for different combinations of DoRothEA’s confidence levels (A, B, C, D, E). Error bars of AUROC values depict the standard deviation and correspond to different simulation replicates. Step 3: Averaged difference across all confidence level combinations between AUROC of single cells and bulk data for all possible parameter combinations. The letters within the tiles indicate which confidence level combination performs best on single cells. The tile marked in red corresponds to the parameter setting used for the previous plots (Steps 1 and 2). c D-AUCell and d metaVIPER performance on simulated single cells summarized as AUROC for a specific parameter combination (number of cells = 10, mean library size = 5000) and corresponding bulk data vs TF coverage. e, f Performance results of e PROGENy and f P-AUCell on simulated single cells for a specific parameter combination (number of cells = 10, mean library size = 5000) and corresponding bulk data in ROC space vs number of footprint genes per pathway. c–f Plots revealing the change in performance for all possible parameter combinations (Step 3) are available in Additional file 1: Figure S7. b–f The dashed line indicates the performance of a random model
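The caption names two simulation parameters, the number of cells and the mean library size, but this record does not spell out the sampling procedure. The sketch below is therefore only one plausible reading and not the authors' method. It assumes each simulated cell is obtained by drawing reads from the bulk sample in proportion to its gene counts, with the library size held fixed at the mean for simplicity. The function name `simulate_cells` and the toy gene names are hypothetical.

```python
import random


def simulate_cells(bulk_counts, n_cells, mean_library_size, seed=0):
    """Downsample one bulk profile into several sparse 'single cells'.

    Assumption (not from the source): reads are drawn with replacement,
    with probability proportional to the bulk count of each gene, and
    every cell receives exactly `mean_library_size` reads.
    """
    rng = random.Random(seed)
    genes = list(bulk_counts)
    weights = [bulk_counts[g] for g in genes]

    cells = []
    for _ in range(n_cells):
        # Draw the reads of one cell from the bulk expression profile.
        drawn = rng.choices(genes, weights=weights, k=mean_library_size)
        cell = {g: 0 for g in genes}
        for g in drawn:
            cell[g] += 1
        cells.append(cell)
    return cells


# Toy bulk sample with three genes; GENE_C is not expressed.
bulk = {"GENE_A": 90, "GENE_B": 10, "GENE_C": 0}
cells = simulate_cells(bulk, n_cells=10, mean_library_size=5000)
```

With small library sizes, lowly expressed genes frequently receive zero reads in a given cell, which mimics the drop-out events mentioned in the abstract.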
Fig. 3 Benchmark results of TF analysis tools on real scRNA-seq data. a Performance of DoRothEA, D-AUCell, metaVIPER, and SCENIC on all sub-benchmark datasets in ROC space vs TF coverage. b Performance of DoRothEA, D-AUCell, and metaVIPER on all sub-benchmark datasets in ROC space vs TF coverage, split by combinations of DoRothEA’s confidence levels (A-E). a, b Within each panel, the results for all tools are based on the same set of (shared) TFs; this set differs between the two panels. TF coverage reflects the number of distinct perturbed TFs in the benchmark dataset that are also covered by the gene sets
Fig. 4 Application of TF and pathway analysis tools on a representative scRNA-seq dataset of PBMCs and HEK cells. a Dendrogram showing how cell lines/cell types are clustered together at different hierarchy levels. The dashed line marks hierarchy level 2, where CD4 T cells, CD8 T cells, and NK cells are aggregated into a single cluster. Similarly, CD14+ monocytes, FCGR3A+ monocytes, and dendritic cells are aggregated into a single cluster. The B cells and HEK cells are represented by separate, pure clusters. b, d Comparison of cluster purity (clusters are defined by hierarchy level 2) between the top 2000 highly variable genes and b TF activity and TF expression and d pathway activities. The dashed line in b separates SCENIC, as it is not directly comparable to the other TF analysis tools and controls due to a different number of considered TFs. c UMAP plots of TF activities calculated with DoRothEA and corresponding TF expression measured by the SMART-Seq2 protocol. e Heatmap of selected TF activities inferred with DoRothEA from gene expression data generated via Quartz-Seq2