Gregory W Schwartz, Yeqiao Zhou, Jelena Petrovic, Warren S Pear, Robert B Faryabi.
Abstract
Emerging single-cell epigenomic assays are used to investigate the heterogeneity of chromatin activity and its function. However, identifying cells with distinct regulatory elements and clearly visualizing their relationships remains challenging. To this end, we introduce TooManyPeaks to address the need for the simultaneous study of chromatin state heterogeneity in both rare and abundant subpopulations. Our analyses of existing data from three widely used single-cell assays for transposase-accessible chromatin using sequencing (scATAC-seq) show the superior performance of TooManyPeaks in delineating and visualizing pure clusters of rare and abundant subpopulations. Furthermore, the application of TooManyPeaks to new scATAC-seq data from drug-naive and drug-resistant leukemic T cells clearly visualizes relationships among these cells and stratifies a rare "resistant-like" drug-naive sub-clone with distinct cis-regulatory elements.
Keywords: T cell leukemia; chromatin accessibility; drug resistance; single-cell ATAC-seq; visualization and clustering algorithms
Year: 2021 PMID: 34433064 PMCID: PMC8409102 DOI: 10.1016/j.celrep.2021.109575
Source DB: PubMed Journal: Cell Rep Impact factor: 9.423
Figure 1. TooManyPeaks overview and performance comparison
(A) Graphical representation of the TooManyPeaks algorithm. Following the arrows from left to right, TooManyPeaks converts scATAC-seq data to a cell-by-bin matrix, binarizes each value (accessible or inaccessible), and identifies and visualizes cell clade relationships by using matrix-free divisive hierarchical spectral clustering (see STAR Methods). TooManyPeaks trees are interpreted by following the cell groups from the root (the largest inner node) to the leaves. A leaf node here is shown as a pie chart of its cell composition. The sizes of a leaf and branches are proportional to the number of cells in the node. TooManyPeaks may then perform several downstream analyses.
(B–D) Clustering benchmarks with, from left to right, lower entropy, higher purity, higher normalized mutual information (NMI), higher adjusted Rand index (ARI), higher homogeneity, and higher residual average Gini index (RAGI; not applicable to synthetic data) representing more accurate clustering of simulated bone marrow cells with a moderate noise level of 0.2 (Chen et al., 2019) (B), CD34+ hematopoietic progenitor cells profiled using 10x Genomics (n = 7,771 cells) (Satpathy et al., 2019) (C), or Fluidigm C1 (n = 2,954 cells) (Buenrostro et al., 2018) (D).
(E–G) Detection of cells from two “rare” populations mixed with a “common” population was benchmarked. Box-and-whisker plots quantifying the accuracy of rare population detection in controlled admixtures from various datasets (m = 10 admixtures), as follows: n = 1,000 synthetic cells generated by simATAC (Navidi et al., 2021) (E); n = 1,000 B (common), CD8+ T (“rare1”), and Treg cells (“rare2”) (Satpathy et al., 2019) (F); and n = 500 common myeloid progenitors (CMPs) (common), monocytes (rare1), and plasmacytoid dendritic cells (pDC) (rare2) (Buenrostro et al., 2018) (G). Each point represents the average performance of 10 experiments from an admixture (100 admixtures overall). Performance is the ratio of true rare pairs (cells from the same rare population in the same cluster) to total rare pairs (true rare pairs plus cells from different rare populations). Box-and-whisker plots represent the following: center line, median; box limits, upper (75th) and lower (25th) percentiles; whiskers, 1.5 × interquartile range; points, outliers. See also Figure S1.
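The rare-population performance measure described in (E–G) can be illustrated with a small sketch. This is one possible reading of the caption, not the authors' implementation: it assumes that only pairs of rare cells placed in the same cluster are counted, and that a pair is "true" when both cells come from the same rare population. The function name `rare_pair_accuracy` and the toy labels are hypothetical.

```python
from itertools import combinations


def rare_pair_accuracy(labels, clusters, rare=("rare1", "rare2")):
    """Fraction of co-clustered rare-cell pairs that share a rare population.

    Assumption: pairs of rare cells in different clusters are ignored.
    """
    rare_set = set(rare)
    rare_idx = [i for i, lab in enumerate(labels) if lab in rare_set]
    true_pairs = 0   # same rare population, same cluster
    mixed_pairs = 0  # different rare populations, same cluster
    for i, j in combinations(rare_idx, 2):
        if clusters[i] != clusters[j]:
            continue
        if labels[i] == labels[j]:
            true_pairs += 1
        else:
            mixed_pairs += 1
    total = true_pairs + mixed_pairs
    return true_pairs / total if total else float("nan")


# Toy admixture: one rare1 cell is wrongly grouped with the rare2 cells.
labels = ["common", "rare1", "rare1", "rare2", "rare2", "rare1"]
clusters = [0, 1, 1, 2, 2, 2]
score = rare_pair_accuracy(labels, clusters)
```

Under this reading, a score of 1 means that no cluster mixes the two rare populations, and lower scores indicate that rare cells of different origin were merged.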
Figure 2. Stratification and annotation of murine bone marrow and spleen cells
(A) The TooManyPeaks algorithm for cell-type annotation based on input reference cis-regulatory elements is used to predict the cell types in mouse bone marrow and spleen (n = 16,749 cells) (Cusanovich et al., 2018). Reference cis-regulatory elements of 92 phenotypically defined progenitor and differentiated hematopoietic cell types are generated from the analyses of bulk ATAC-seq in FACS-sorted cells (Yoshida et al., 2019). A TooManyPeaks tree pruned at median(modularity) + 15 × MAD (modularity) threshold shows major hematopoietic lineages. At each bipartitioning, a darker circle circumference represents higher modularity.
(B–J) TooManyPeaks tree (B) and UMAP outputs (C–J) colored by T3 B cells (red, left) or cluster label (right) generated by the noted algorithms.
(K) Clustering benchmarks with, from left to right, lower entropy, higher purity, higher NMI, higher ARI, and higher homogeneity showing more accurate clustering of phenotypically defined progenitor and differentiated hematopoietic cell types in mouse bone marrow and spleen by TooManyPeaks. An “X” marks algorithms that failed to complete. See also Figures S2, S3, and S4.
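The pruning threshold in (A), median(modularity) + 15 × MAD(modularity), can be sketched as follows. This is a minimal illustration under stated assumptions: MAD is taken as the unscaled median absolute deviation, splits whose modularity falls below the threshold are assumed to be collapsed, and the function name `mad_threshold` and the example modularity values are hypothetical.

```python
from statistics import median


def mad_threshold(values, k=15.0):
    """Return median(values) + k * MAD(values), with unscaled MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med + k * mad


# Hypothetical modularity values, one per bipartition in the tree.
modularities = [0.01, 0.02, 0.02, 0.03, 0.04, 0.9]
threshold = mad_threshold(modularities)

# Assumption: only splits at or above the threshold survive pruning.
kept = [m for m in modularities if m >= threshold]
```

With a multiplier as large as 15, only bipartitions whose modularity is far above the typical value remain, which is consistent with the caption's statement that the pruned tree shows major lineages.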
Figure 3. TooManyPeaks identifies genomic elements specific to resistant-like parental T-ALL cells
(A) TooManyPeaks tree of parental (n = 3,831 cells) and GSI-resistant (n = 4,158 cells) DND-41 T-ALL cells showing a resistant-like parental subpopulation of n = 144 cells.
(B) Genome tracks highlight key genomic elements at the MYC locus from 5′ to 3′, as follows: MYC promoter, Notch-dependent MYC enhancer E1, LINC00977-proximal enhancer E2, and Notch-independent MYC enhancer E3. The top two and bottom two tracks show H3K27ac and aggregated scATAC-seq of DND-41 populations in (A), respectively.
(C–F) TooManyPeaks tree as in (A) showing the accessibility of the MYC promoter (C) and enhancers E1 (D), E2 (E), and E3 (F).
(G) Box-and-whisker plot showing normalized accessibility at each locus in (B) for each population from (A).
(H) TooManyCells tree of gene expression showing elevated LINC00977 levels in the parental population (n = 7,371 cells).
(I) Box-and-whisker plot quantifying upper-quartile-normalized LINC00977 expression in each population from (H). See also Figures S5, S6, and S7 and Tables S1, S2, S3, S4, and S5.
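As a rough illustration of the upper-quartile normalization mentioned in (I), the sketch below scales one cell's counts by the 75th percentile of its nonzero counts. The source does not specify the exact procedure, so every detail here is an assumption: normalization per cell, the use of nonzero counts only, the quantile interpolation method, and the function name `upper_quartile_normalize`.

```python
from statistics import quantiles


def upper_quartile_normalize(counts):
    """Divide one cell's counts by the upper quartile of its nonzero counts.

    Assumption: cells with fewer than two nonzero counts are left unscaled.
    """
    nonzero = [c for c in counts if c > 0]
    if len(nonzero) < 2:
        return [float(c) for c in counts]
    upper_quartile = quantiles(nonzero, n=4)[2]  # third cut point = 75th percentile
    return [c / upper_quartile for c in counts]


# Hypothetical raw counts for one cell across six genes.
cell_counts = [0, 2, 4, 0, 8, 16]
normalized = upper_quartile_normalize(cell_counts)
```

Restricting the quartile to nonzero counts is a common choice for sparse single-cell data, where the 75th percentile of all counts is often zero.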
KEY RESOURCES TABLE
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Antibodies | ||
| Rabbit polyclonal anti-H3 acetyl-K27 | Active Motif | Cat# 39133; RRID:AB_2561016 |
| Chemicals, peptides, and recombinant proteins | ||
| Recombinant Protein G Agarose | Invitrogen | Cat# 15920-010 |
| Proteinase K | Invitrogen | Cat# 25530-049 |
| RNase A | Roche | Cat# 10109169001 |
| γ-Secretase Inhibitor XXI (compound E) | Calbiochem | Cat# 565790 |
| RPMI 1640 | Corning | Cat# 10-040-CM |
| HyClone Fetal bovine serum | Thermo Fisher Scientific | Cat# SH30070.03 |
| L-glutamine | Corning | Cat# 25-005-CI |
| Penicillin-Streptomycin | Corning | Cat# 30-002-CI |
| MEM Non-Essential Amino Acids | GIBCO | Cat# 11140-050 |
| Sodium Pyruvate | GIBCO | Cat# 11360-070 |
| Glycine | Invitrogen | Cat# 15527-013 |
| Pierce 16% Formaldehyde | Thermo Fisher Scientific | Cat# 28908 |
| Trizma Hydrochloride Solution, pH 7.4 | Sigma-Aldrich | Cat# T2194-100ml |
| Sodium Chloride Solution, 5M | Sigma-Aldrich | Cat# 59222C-500ml |
| Magnesium Chloride Solution, 1M | Sigma-Aldrich | Cat# M1028-100ml |
| Nonidet P40 Substitute | Sigma-Aldrich | Cat# 74385-5l |
| MACS BSA Stock Solution | Miltenyi Biotec | Cat# 130-091-376 |
| Flowmi Cell Strainer, 40 μm | Bel-Art | Cat# H13680-0040 |
| Digitonin | Thermo Fisher Scientific | Cat# BN2006 |
| Dulbecco’s Phosphate-Buffered Salt Solution 1X | Corning | Cat# 21031CV |
| Critical commercial assays | ||
| KAPA Library Quant Kit | Roche | Cat# KK4824 |
| D1000 ScreenTape | Agilent | Cat# 5067-5582 |
| D1000 Reagents | Agilent | Cat# 5067-5583 |
| High Sensitivity D1000 ScreenTape | Agilent | Cat# 5067-5584 |
| High Sensitivity D1000 Reagents | Agilent | Cat# 5067-5585 |
| QIAquick PCR Purification Kit | QIAGEN | Cat# 28106 |
| NEBNext Ultra II DNA Library Prep Kit | NEB | Cat# E7645S |
| Chromium Single Cell ATAC Library & Gel Bead Kit, 4 rxns | 10X GENOMICS | Cat# PN-1000111 |
| Chromium i7 Multiplex Kit N, Set A | 10X GENOMICS | Cat# PN-1000084 |
| Chromium Chip E Single Cell ATAC Kit, 48 rxns | 10X GENOMICS | Cat# PN-1000082 |
| NextSeq® 500/550 High Output Kit v2 (75 cycles) | Illumina | Cat# FC-404-2005 |
| NextSeq® 500/550 High Output Kit v2 (150 cycles) | Illumina | Cat# FC-404-2002 |
| Deposited data | ||
| Raw and analyzed scATAC-seq data | This paper | GEO: GSE155916 |
| Raw and analyzed ChIP-seq data | This paper | GEO: GSE171098 |
| Bulk ATAC-seq of purified progenitor and differentiated hematopoietic cells | | GEO: GSE100738 |
| 10x Genomics scATAC-seq of CD34+ hematopoietic progenitor cells | | GEO: GSE129785 |
| Fluidigm C1 scATAC-seq of CD34+ hematopoietic progenitor cells | | GEO: GSE96769 |
| sciATAC-seq of murine marrow and spleen cells | | GEO: GSE111586 |
| scRNA-seq of GSI-resistant DND-41 cells | | GEO: GSE138892 |
| Experimental models: Cell lines | ||
| DND-41 | DSMZ | ACC 525 |
| Software and algorithms | ||
| APEC v1.2.2 | | |
| Cicero v1.9.1 | | |
| CisTopic v0.3.0 | | |
| Cusanovich2018 | This paper | |
| EpiScanpy v0.3.0 | | |
| Seurat v3.2.3 | | |
| Signac v1.1.0 | | |
| SnapATAC v1.0.0 | | |
| tsne v0.1.3 | | |
| TooManyPeaks v2.2.0.0 | This paper | |
| TooManyPeaks analysis code | This paper | |
| R wrapper for TooManyCells v0.1.1.0 | | |
| umap-learn v0.4.6 | | |
| HOMER v4.9 | | |
| bedtools v2.30.0 | | |
| BWA v0.7.13 | | |
| Cell Ranger ATAC v1.2.0 | | |
| Picard v2.1.0 | Broad Institute | |
| Trim Galore v0.4.1 | Babraham Bioinformatics | |
| UCSC tools v404 | | |