Zhana Duren, Xi Chen, Mahdi Zamanighomi, Wanwen Zeng, Ansuman T Satpathy, Howard Y Chang, Yong Wang, Wing Hung Wong.
Abstract
When different types of functional genomics data are generated on single cells from different samples of cells from the same heterogeneous population, the clustering of cells in the different samples should be coupled. We formulate this "coupled clustering" problem as an optimization problem and propose the method of coupled nonnegative matrix factorizations (coupled NMF) for its solution. The method is illustrated by the integrative analysis of single-cell RNA-sequencing (RNA-seq) and single-cell ATAC-sequencing (ATAC-seq) data.
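The coupled-clustering idea can be illustrated with a minimal sketch. The abstract does not spell out the objective, so the code below assumes one plausible form: two NMF reconstruction terms (accessibility O ~ W1 H1 and expression E ~ W2 H2) tied together by a coupling term -lam2 * trace(W2^T A W1), plus a ridge penalty mu * (||W1||^2 + ||W2||^2). The function name coupled_nmf, the parameters lam1, lam2, and mu, the multiplicative update rules, and the rescaling of the coupling matrix A are all assumptions made for illustration and should not be read as the authors' implementation.

```python
import numpy as np


def coupled_nmf(O, E, A, k, lam1=1.0, lam2=1.0, mu=1.0,
                n_iter=200, seed=0, eps=1e-10):
    """Illustrative coupled NMF (assumed objective, not the published code).

    O : (peaks x cells_atac) nonnegative accessibility matrix
    E : (genes x cells_rna)  nonnegative expression matrix
    A : (genes x peaks)      nonnegative coupling matrix linking peaks to genes
    k : number of clusters shared by both factorizations
    """
    rng = np.random.default_rng(seed)
    p, n1 = O.shape
    g, n2 = E.shape

    # Assumption: rescale A to spectral norm 1 so that, with lam2 <= 2 * mu,
    # the ridge penalty dominates the coupling term and the objective stays
    # bounded below.
    A = A / (np.linalg.norm(A, 2) + eps)

    # Random nonnegative initialization of all four factors.
    W1, H1 = rng.random((p, k)), rng.random((k, n1))
    W2, H2 = rng.random((g, k)), rng.random((k, n2))

    for _ in range(n_iter):
        # Standard Lee-Seung multiplicative updates for the cell loadings.
        H1 *= (W1.T @ O) / (W1.T @ W1 @ H1 + eps)
        H2 *= (W2.T @ E) / (W2.T @ W2 @ H2 + eps)
        # Updates for the feature loadings: the coupling term enters the
        # numerator, the ridge penalty enters the denominator.
        W1 *= (O @ H1.T + lam2 * (A.T @ W2)) / (
            W1 @ (H1 @ H1.T) + 2 * mu * W1 + eps)
        W2 *= (lam1 * (E @ H2.T) + lam2 * (A @ W1)) / (
            lam1 * (W2 @ (H2 @ H2.T)) + 2 * mu * W2 + eps)

    # Assign each cell to the cluster with the largest loading.
    labels_atac = H1.argmax(axis=0)
    labels_rna = H2.argmax(axis=0)
    return W1, H1, W2, H2, labels_atac, labels_rna


# Toy usage on random nonnegative data (hypothetical sizes).
demo_rng = np.random.default_rng(1)
O_demo = demo_rng.random((30, 12))   # 30 peaks, 12 scATAC-seq cells
E_demo = demo_rng.random((20, 10))   # 20 genes, 10 scRNA-seq cells
A_demo = demo_rng.random((20, 30))   # gene-by-peak coupling
result = coupled_nmf(O_demo, E_demo, A_demo, k=2, n_iter=50)
```

Because cluster j in W1 and cluster j in W2 are rewarded for agreeing through A, the same cluster index refers to the same putative cell type in both data sets, which is the sense in which the two clusterings are coupled in this sketch.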
Keywords: NMF; coupled clustering; single-cell genomic data
Year: 2018 PMID: 29987051 PMCID: PMC6065048 DOI: 10.1073/pnas.1805681115
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Overview of the coupled-clustering method. (A) Single-cell gene expression and single-cell chromatin accessibility data. (B) Learning coupling matrix from public data. (C) Coupled clustering model. (D) Cluster-specific gene expression and chromatin accessibility.
Fig. 2. (A) Clustering results of k-means, NMF, and our coupled clustering on simulation scRNA-seq data of CMP and MEP. (B) Clustering results of k-means, NMF, and our coupled clustering on simulation scATAC-seq data of CMP and MEP. (C) Comparison of k-means, NMF, and coupled clustering on simulation data of CMP and MEP.
Fig. 3. (A) t-SNE plot of scRNA-seq data (Right) and scATAC-seq data (Left) from RA day 4. Different colors represent clustering assignment from the coupled-clustering method. (B) Same t-SNE plots as in A. Different colors represent cluster-specific TFs’ (Ebf1, Gata4, and Rfx4) gene expression Z score and motif activity Z score. (C) Comparison of cluster-specific TFs’ expression Z score with motif activity Z score at the cluster level. (D) Overlap of cluster-specific peaks nearby genes with cluster-specific genes. The values represent Fisher’s exact test P value and fold change.
Fig. 4. (A–C) Similarity of cluster-specific peaks with enhancers of 12 tissues’ seven developmental stages. The numbers represent 10,000× Jaccard index and NA indicates enhancer data of that tissue in that stage are not available. (D) Percentage of VISTA enhancer that overlapped with cluster-specific peaks. (E) GO enrichment of cluster-specific genes.