Mahdi Zamanighomi, Zhixiang Lin, Timothy Daley, Xi Chen, Zhana Duren, Alicia Schep, William J Greenleaf, Wing Hung Wong.
Abstract
Characterizing epigenetic heterogeneity at the cellular level is a critical problem in the modern genomics era. Assays such as single cell ATAC-seq (scATAC-seq) offer an opportunity to interrogate cellular level epigenetic heterogeneity through patterns of variability in open chromatin. However, these assays exhibit technical variability that complicates clear classification and cell type identification in heterogeneous populations. We present scABC, an R package for the unsupervised clustering of single-cell epigenetic data, to classify scATAC-seq data and discover regions of open chromatin specific to cell identity.
Year: 2018 PMID: 29925875 PMCID: PMC6010417 DOI: 10.1038/s41467-018-04629-3
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 The scABC framework for unsupervised clustering of scATAC-seq data. a Overview of the scABC pipeline. scABC constructs a matrix of read counts over peaks, weights cells by sample depth, and applies weighted K-medoids clustering. The clustering defines a set of K landmarks, which are then used to reassign cells to clusters. b Assignment of cells to landmarks by Spearman correlation, where each cell is highly correlated with just one landmark. The similarity measure shown is the Spearman correlation of each cell to each landmark, normalized for every cell by the mean of the absolute values across all landmarks. This normalization makes the relative correlation easier to compare across cells. c Accessibility of peaks across all cells. The vast majority of peaks are either common or cluster-specific, which allows cluster-specific peaks to be defined.
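The landmark assignment and normalized similarity described in panel b can be illustrated with a minimal Python sketch. It is not the scABC implementation (scABC is an R package): the function names, the toy count values, and the use of average ranks for tied counts are assumptions made for illustration, and the landmarks are simply given here rather than derived from weighted K-medoids clustering.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n; tied values share their mean rank (an assumption)."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)                # highest rank within each tie group
    mean_rank = last_rank - (counts - 1) / 2.0   # mean rank within each tie group
    return mean_rank[inverse]

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def assign_to_landmarks(cells, landmarks):
    """Correlate every cell with every landmark and pick the best landmark.

    cells:     (n_cells, n_peaks) read counts (hypothetical toy input)
    landmarks: (K, n_peaks) landmark profiles (hypothetical toy input)
    Returns the normalized similarity matrix and one cluster label per cell.
    """
    corr = np.array([[spearman(c, l) for l in landmarks] for c in cells])
    # Normalize each cell's correlations by the mean absolute value across landmarks.
    # Note: this is undefined if a cell has zero correlation with every landmark.
    scale = np.mean(np.abs(corr), axis=1, keepdims=True)
    similarity = corr / scale
    labels = np.argmax(corr, axis=1)
    return similarity, labels

# Toy example: two landmarks over six peaks, three cells (all values invented).
landmarks = np.array([[9, 7, 8, 0, 1, 0],
                      [0, 1, 0, 8, 9, 7]], dtype=float)
cells = np.array([[3, 2, 4, 0, 0, 1],
                  [0, 0, 1, 2, 5, 3],
                  [5, 1, 2, 0, 1, 0]], dtype=float)

similarity, labels = assign_to_landmarks(cells, landmarks)
print(labels)
print(np.round(similarity, 2))
```

Because the normalization divides each row by a positive constant, it changes only how the similarities are displayed, not which landmark a cell is assigned to.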
Fig. 2 Cluster-specific peaks determined by scABC shed light on cell identity. a Application of chromVAR to the cluster-specific narrow peaks allows for the identification of cluster-specific transcription factor binding motifs. chromVAR-calculated deviations are shown for the twenty most variable transcription factor binding motifs. b Cluster-specific open promoters distinguish expression. Shown are the densities of the average log gene expression values for genes with a K562-specific open promoter, an HL-60-specific open promoter, or a non-specific promoter (neither) in K562 cells (left) or HL-60 cells (right), with each plot normalized to have a total area equal to one. c Integration of scATAC-seq and scRNA-seq enables clear delineation of cell identity. scABC applied to scATAC-seq identified genes with cluster-specific open promoters for K562 and HL-60 cells. These genes were then used for Principal Component Analysis (PCA) of 42 K562 and 54 HL-60 cells (right) and compared to PCA of all genes (left).
Fig. 3 The application of scABC to a biological cell mixture. a 95 scATAC-seq samples were obtained on day 4 of RA-treated mESC differentiation and classified into two clusters by scABC. Here, the similarity between cells (rows) and the two detected landmarks (columns) is depicted, with cluster assignments on the left. b Heatmap of peak accessibility across cluster-specific peaks (columns) and cells (rows). To simplify the presentation, we show only the top 500 peaks specific to each cluster, i.e., those with the smallest scABC p-values (Methods). c chromVAR deviations for the top 50 most variable TF motifs (columns) and cells (rows), calculated using cluster-specific narrow peaks. Hierarchical cluster analysis of the deviations divides the motifs into two groups, each specific to just one cluster.