Chenfei Wang, Dongqing Sun, Xin Huang, Changxin Wan, Ziyi Li, Ya Han, Qian Qin, Jingyu Fan, Xintao Qiu, Yingtian Xie, Clifford A Meyer, Myles Brown, Ming Tang, Henry Long, Tao Liu, X Shirley Liu.
Abstract
We present Model-based AnalysEs of Transcriptome and RegulOme (MAESTRO), a comprehensive open-source computational workflow (http://github.com/liulab-dfci/MAESTRO) for the integrative analysis of single-cell RNA-seq (scRNA-seq) and ATAC-seq (scATAC-seq) data from multiple platforms. MAESTRO provides functions for pre-processing, alignment, quality control, expression and chromatin accessibility quantification, clustering, differential analysis, and annotation. By modeling gene regulatory potential from chromatin accessibility at the single-cell level, MAESTRO outperforms existing methods for integrating cell clusters between scRNA-seq and scATAC-seq. Furthermore, MAESTRO supports automatic cell-type annotation using predefined cell-type marker genes and identifies driver regulators from differential scRNA-seq genes and scATAC-seq peaks.
Keywords: Cell-type annotation; Computational workflow; Integrate scRNA-seq and scATAC-seq; Predict transcriptional regulators; Single-cell ATAC-seq; Single-cell RNA-seq
Year: 2020 PMID: 32767996 PMCID: PMC7412809 DOI: 10.1186/s13059-020-02116-x
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 An overview of the MAESTRO workflow. Starting from fastq files, MAESTRO performs pre-processing, alignment, quality control, expression index for scRNA-seq, peak calling for scATAC-seq, clustering, differential analysis, cell-type annotation, and transcription factor identification analysis. If scRNA-seq and scATAC-seq from the same experiment are given, MAESTRO can perform the integration analysis and annotate the integrated clusters.
Comprehensive features of MAESTRO compared to other single-cell tools
| Methods | Multiple technologies (e.g., scRNA-seq, scATAC-seq) | Multiple platforms (e.g., 10X Genomics, sci-ATAC-seq) | Alignment | Bulk level QC | Single-cell QC | Expression index and peak calling | Normalization | Clustering | Differential analysis | Cell-type annotation | Regulator annotation | Integrated analysis between multiple technologies |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SC3 | ✓ | ✓ | ✓ | |||||||||
| SNN-cliq | ✓ | ✓ | ✓ | |||||||||
| MAST | ✓ | ✓ | ||||||||||
| scde | ✓ | ✓ | ||||||||||
| Monocle | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| Pagoda | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| Scanpy | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| Seurat | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
| scABC | ✓ | ✓ | ✓ | |||||||||
| CisTopic | ✓ | ✓ | ✓ | ✓ | ||||||||
| chromVAR | ✓ | ✓ | ✓ | ✓ | ||||||||
| Cicero | ✓ | ✓ | ✓ | ✓ | ||||||||
| Cellranger | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
| Dr.seq2 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
| snapATAC | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| MAESTRO | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Fig. 2 Pre-processing and quality control using MAESTRO. a Mapping summary of the human PBMC scRNA-seq (12k cells) dataset. b Mapping summary of the human PBMC scATAC-seq (10k cells) dataset. c Cell filtering plot of PBMC scRNA-seq. The x-axis represents the number of unique reads/UMIs present in each cell, and the y-axis represents the number of genes covered in each cell. d Cell filtering plot of PBMC scATAC-seq. The x-axis represents the number of unique reads present in each cell, and the y-axis represents the fraction of reads in promoter regions (defined as 2 kb up/downstream of TSS).
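The filtering in panels c and d can be sketched as a pair of per-cell threshold checks on the plotted axes. This is a minimal illustration, not MAESTRO's implementation: the `CellStats` record, the function names, and every threshold value below are hypothetical, since the caption does not state the cutoffs used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CellStats:
    """Per-cell QC summary (hypothetical record, not a MAESTRO type)."""
    barcode: str
    unique_reads: int        # unique reads/UMIs per cell (x-axis in panels c and d)
    genes_covered: int = 0   # scRNA-seq: genes covered (y-axis in panel c)
    promoter_reads: int = 0  # scATAC-seq: reads within 2 kb up/downstream of a TSS


def promoter_fraction(cell: CellStats) -> float:
    """Fraction of a cell's unique reads in promoter regions (y-axis in panel d)."""
    if cell.unique_reads == 0:
        return 0.0
    return cell.promoter_reads / cell.unique_reads


def filter_rna_cells(cells: List[CellStats],
                     min_reads: int = 1000,    # assumed cutoff, not from the source
                     min_genes: int = 500      # assumed cutoff, not from the source
                     ) -> List[CellStats]:
    """Keep scRNA-seq cells that pass both the read/UMI and gene-count cutoffs."""
    return [c for c in cells
            if c.unique_reads >= min_reads and c.genes_covered >= min_genes]


def filter_atac_cells(cells: List[CellStats],
                      min_reads: int = 1000,                # assumed cutoff
                      min_promoter_fraction: float = 0.2    # assumed cutoff
                      ) -> List[CellStats]:
    """Keep scATAC-seq cells that pass the read and promoter-fraction cutoffs."""
    return [c for c in cells
            if c.unique_reads >= min_reads
            and promoter_fraction(c) >= min_promoter_fraction]
```

In practice the cutoffs would be chosen by inspecting the filtering plots for each dataset rather than fixed in advance.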
Fig. 3 Clustering, cell-type, and transcriptional regulator annotation using MAESTRO. a UMAP visualization of human PBMC scRNA-seq (12k cells) clusters. Colors represent the different clusters with the cluster ID labeled. b UMAP visualization of human PBMC scATAC-seq (10k cells) clusters. Colors represent the different clusters with the cluster ID labeled. c UMAP visualization of human PBMC scRNA-seq (12k cells) clusters. Colors represent the different cell types. The cell-type information for each cluster is annotated using the expression level of marker genes. d UMAP visualization of human PBMC scATAC-seq (10k cells) clusters. Colors represent the different cell types. The cell-type information for each cluster is annotated using the regulatory potential of marker genes. e The rank of driver transcriptional regulators in the CD14 monocyte cells of PBMC scRNA-seq (12k cells). The regulators are ranked by the TF enrichment score from LISA results in cluster-specific genes, and the color of the circles represents the averaged expression level of corresponding regulators in CD14 monocyte cells. The names of the top 10 TFs are labeled on the graph. f The rank of driver transcriptional regulators in the CD14 monocyte cells of PBMC scATAC-seq (10k cells). The regulators are ranked by the TF enrichment score from GIGGLE results in cluster-specific peaks, and the color of the circles represents the averaged regulatory potential of corresponding regulators in CD14 monocyte cells. The names of the top 10 TFs are labeled on the graph.
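Marker-based annotation of a cluster, as in panels c and d, can be illustrated with a toy scoring rule: average the cluster-level signal (expression for scRNA-seq, regulatory potential for scATAC-seq) over each cell type's marker genes and pick the highest-scoring type. The mean-based score, the function name, and the marker lists are assumptions made for illustration; the caption does not specify how MAESTRO computes the score.

```python
from typing import Dict, List, Tuple


def annotate_cluster(cluster_signal: Dict[str, float],
                     markers: Dict[str, List[str]]) -> Tuple[str, Dict[str, float]]:
    """Assign a cell type to one cluster from predefined marker genes.

    cluster_signal maps gene -> average signal in the cluster (expression
    level or regulatory potential). markers maps cell type -> marker genes.
    The score is a plain mean over the markers found in cluster_signal,
    which is an illustrative assumption, not MAESTRO's scoring function.
    """
    if not markers:
        raise ValueError("at least one cell type with markers is required")
    scores: Dict[str, float] = {}
    for cell_type, genes in markers.items():
        found = [cluster_signal[g] for g in genes if g in cluster_signal]
        scores[cell_type] = sum(found) / len(found) if found else 0.0
    best = max(scores, key=lambda cell_type: scores[cell_type])
    return best, scores


# Toy marker lists (hypothetical, one gene per type for brevity).
example_markers = {
    "B cells": ["MS4A1"],
    "CD8 T cells": ["CD8A"],
    "CD14 monocytes": ["CD14"],
}
# Made-up average signal for a single cluster.
example_cluster = {"MS4A1": 4.2, "CD8A": 0.1, "CD14": 0.3}
label, _ = annotate_cluster(example_cluster, example_markers)
```

The same function applies to either modality because only the meaning of the per-gene signal changes.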
Fig. 4 Integrated analysis of PBMC scRNA-seq and scATAC-seq data using MAESTRO. a UMAP visualization for joint clustering of human PBMC scRNA-seq (12k cells) and PBMC scATAC-seq (10k cells). Colors represent the cells from different technologies. The cells are joined by CCA on gene expression level and regulatory potential from MAESTRO. b UMAP visualization for joint clustering of human PBMC scRNA-seq and scATAC-seq. The cells are joined by CCA on the gene expression level and regulatory potential from MAESTRO. Colors represent the cell types, which are generated using the scRNA-seq dataset and transferred to the scATAC-seq dataset. c The rank of driver regulators in CD14 monocyte cells of the PBMC dataset. The x-axis represents the TF enrichment score from LISA results in cluster-specific genes using scRNA-seq; the y-axis represents the TF enrichment score from GIGGLE results in cluster-specific peaks using scATAC-seq. The color of the circles represents the averaged expression level of corresponding regulators in CD14 monocyte scRNA-seq cells, and the size represents the TF enrichment score using GIGGLE in CD14 monocyte scATAC-seq cells. The names of the top 10 TFs from LISA and GIGGLE are labeled on the graph. d Comparison of transcriptional regulators predicted using scRNA-seq and scATAC-seq in each cell type for the PBMC dataset. The y-axis represents the Spearman’s correlation coefficient between the LISA-predicted TF enrichment score and the GIGGLE-predicted TF enrichment score for all the tested regulators. e Genome browser view of the MS4A1 (B cells), CD8A (T cells), and HLA-DQA1 (monocytes and DCs) loci. The pseudo-bulk ATAC-seq profiles are generated by pooling together cells within each cell type. The y-axis represents the sequence depth-normalized ATAC-seq signals (reads per million mapped reads (RPM)).
Fig. 5 The dramatic change of bone marrow microenvironment in a CLL patient versus a healthy donor. a UMAP visualization for joint clustering of human BMMC scRNA-seq (5k cells) and scATAC-seq (9k cells) from the CLL patient and the healthy donor. Colors represent the cells from different technologies. The cells are joined by CCA on gene expression level and regulatory potential from MAESTRO. b UMAP visualization for joint clustering of human BMMC scRNA-seq (5k cells) and scATAC-seq (9k cells) from the CLL patient and the healthy donor. The cells are joined by CCA on gene expression level and regulatory potential from MAESTRO. Colors represent the cell types, which are generated using the scRNA-seq dataset and transferred to the scATAC-seq dataset. c Cell-type proportions of the CLL patient and the healthy donor. The cell-type proportions within each sample (CLL patient or healthy donor) sum to 1. The scRNA-seq and scATAC-seq are quantified separately. Statistical significance is evaluated using a two-proportion z-test, ***p < 2.2e−16, *p ≤ 0.05, N.S. p > 0.05. d The rank of driver regulators in the CLL1 (left) and CLL2 (right) clusters of the BMMC dataset. The x-axis represents the TF enrichment score from the LISA result on differentially expressed genes between CLL1 and CLL2 clusters in scRNA-seq; the y-axis represents the TF enrichment score from the GIGGLE result on differential peaks between CLL1 and CLL2 clusters in scATAC-seq. The color of the circles represents the averaged expression level of regulators in corresponding clusters of scRNA-seq, and the size represents the TF enrichment score using GIGGLE in corresponding clusters of scATAC-seq. The names of the top 10 TFs from LISA and GIGGLE are labeled on the graph.
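The two-proportion z-test used for panel c compares the fraction of cells of one type between two samples. The sketch below assumes the common pooled-variance, two-sided form of the test; the caption does not say which variant was used, and the function name and the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Two-sided, pooled two-proportion z-test (assumed variant).

    x1 of n1 cells in sample 1 and x2 of n2 cells in sample 2 belong to
    the cell type of interest. Returns (z statistic, p-value).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if standard_error == 0.0:
        # Both proportions are 0 or both are 1: no evidence of a difference.
        return 0.0, 1.0
    z = (p1 - p2) / standard_error
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up counts: 60 of 100 cells in one sample versus 40 of 100 in the other.
z_stat, p_val = two_proportion_z_test(60, 100, 40, 100)
```

Because scRNA-seq and scATAC-seq are quantified separately, a test like this would be run once per cell type and per technology.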
Fig. 6 MAESTRO outperforms the existing software in integrating scRNA-seq and scATAC-seq datasets. a Comparison of the cell-type label prediction score distribution after integration of the scRNA-seq and scATAC-seq using gene activity scores from MAESTRO, Seurat, snapATAC, and Cicero. The comparisons were made on three independent datasets: PBMC from different donors, PBMC from the same donor, and BMMC from the same donor. Statistical significance was evaluated using the Wilcoxon rank-sum test. b Comparison of the consistency between gene expression and gene activity scores from MAESTRO, Seurat, snapATAC, and Cicero. The comparisons were made on three independent datasets. The x-axis represents the different cell types, and the y-axis represents the Spearman’s correlation coefficient between the expression level from scRNA-seq and the gene activity score from scATAC-seq within each cluster. Only the top 2000 highly variable genes were used in the analysis.
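The consistency measure in panel b is a Spearman correlation computed across genes within a cluster. A dependency-free sketch is given below, using average ranks for ties followed by a Pearson correlation of the ranks; the function names are hypothetical, and the inputs would be the per-gene cluster averages of expression and gene activity restricted to the selected highly variable genes.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def spearman(expression: Sequence[float], gene_activity: Sequence[float]) -> float:
    """Spearman correlation between per-gene expression and gene activity."""
    if len(expression) != len(gene_activity) or len(expression) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(expression), average_ranks(gene_activity))
```

Rank-based correlation is a natural fit here because expression levels and gene activity scores are on different scales, so only their ordering across genes is compared.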