Wanwen Zeng, Xi Chen, Zhana Duren, Yong Wang, Rui Jiang, Wing Hung Wong.
Abstract
Characterizing and interpreting heterogeneous mixtures at the cellular level is a critical problem in genomics. Single-cell assays offer an opportunity to resolve cellular level heterogeneity, e.g., scRNA-seq enables single-cell expression profiling, and scATAC-seq identifies active regulatory elements. Furthermore, while scHi-C can measure the chromatin contacts (i.e., loops) between active regulatory elements and their target genes in single cells, bulk HiChIP can measure such contacts at a higher resolution. In this work, we introduce DC3 (De-Convolution and Coupled-Clustering) as a method for the joint analysis of various bulk and single-cell data such as HiChIP, RNA-seq and ATAC-seq from the same heterogeneous cell population. DC3 can simultaneously identify distinct subpopulations, assign single cells to the subpopulations (i.e., clustering) and de-convolve the bulk data into subpopulation-specific data. The subpopulation-specific profiles of gene expression, chromatin accessibility and enhancer-promoter contact obtained by DC3 provide a comprehensive characterization of the gene regulatory system in each subpopulation.
Year: 2019 PMID: 31601804 PMCID: PMC6787340 DOI: 10.1038/s41467-019-12547-1
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Overview of the DC3 method. a DC3 performs joint analysis using three types of data from separate samples of the same cell population: scRNA-seq, scATAC-seq and bulk HiChIP. The scRNA-seq matrix gives the expression level of each gene in each cell; the scATAC-seq matrix gives the chromatin accessibility of each enhancer in each cell; the bulk HiChIP matrix C gives the enhancer-promoter interaction strength (loop counts) between each gene and each enhancer. b A graphical example of simultaneously decomposing the three data matrices to obtain the underlying clusters and cluster-specific HiChIP in the K = 3 case: (1) scRNA-seq: the (i, k) entry of W1 gives the mean gene expression for the i-th gene in the k-th cluster of cells, while the (k, j) entry of H1 gives the assignment weight of the j-th cell to the k-th cluster; (2) scATAC-seq: the (i, k) entry of W2 gives the mean chromatin accessibility for the i-th enhancer in the k-th cluster of cells, while the j-th column of H2 gives the assignment weights of the j-th cell to the different clusters; (3) bulk HiChIP: each enhancer–promoter interaction c can be decomposed into subpopulation-specific interactions, i.e. c = Σ_k λ_k c^(k), where c^(k) is the interaction strength in the k-th subpopulation and λ_k is proportional to the size of the subpopulation; Λ is a K by K diagonal matrix with diagonal entries λ_1, λ_2, …, λ_K. Within each subpopulation, following the assumption that an enhancer-promoter interaction is proportional to the product of accessibilities of the corresponding enhancer and promoter, we model c^(k) as d times the product of the corresponding entries of the k-th columns of W1 and W2, where d is a set of indicators selecting the enhancer-promoter pairs to be modeled. Therefore, cluster-specific HiChIP interactions of the k-th subpopulation can be obtained from the k-th column of W1 multiplied by the transpose of the k-th column of W2
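The decomposition described in panel b can be illustrated with a small sketch. This is not the DC3 implementation: it assumes the factors W1, W2, H1 and the weights λ have already been estimated, uses toy numbers, and all function and variable names (`cluster_specific_contacts`, `reconstruct_bulk`, `assign_cells`, `D` for the indicator matrix) are hypothetical. It only shows how cluster-specific contacts follow from the k-th columns of W1 and W2, how they sum to the bulk signal, and how a cell can be assigned to the cluster with the largest weight.

```python
# Illustrative sketch only: toy numbers and hypothetical names,
# not the optimisation procedure used by DC3.

def cluster_specific_contacts(W1, W2, D, k):
    """Contacts for cluster k: D[g][e] * W1[g][k] * W2[e][k]
    (indicator times the outer product of the k-th columns)."""
    return [
        [D[g][e] * W1[g][k] * W2[e][k] for e in range(len(W2))]
        for g in range(len(W1))
    ]

def reconstruct_bulk(W1, W2, lam, D):
    """Bulk contacts as the lambda-weighted sum of cluster-specific contacts."""
    parts = [cluster_specific_contacts(W1, W2, D, k) for k in range(len(lam))]
    return [
        [sum(lam[k] * parts[k][g][e] for k in range(len(lam)))
         for e in range(len(W2))]
        for g in range(len(W1))
    ]

def assign_cells(H):
    """H is K x n_cells; each cell goes to the cluster with the largest weight."""
    return [
        max(range(len(H)), key=lambda k: H[k][j])
        for j in range(len(H[0]))
    ]

# Toy inputs (assumed, for illustration): 2 genes, 3 enhancers, K = 2 clusters.
W1 = [[2.0, 0.0], [0.0, 3.0]]              # genes x clusters
W2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # enhancers x clusters
lam = [0.5, 0.5]                           # relative subpopulation sizes
D = [[1, 0, 1], [0, 1, 1]]                 # gene-enhancer pairs being modeled
H1 = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.4]]    # clusters x cells

contacts_0 = cluster_specific_contacts(W1, W2, D, 0)
bulk = reconstruct_bulk(W1, W2, lam, D)
labels = assign_cells(H1)
```

In this toy case the first cluster only contributes contacts for the first gene and the second cluster only for the second gene, so the bulk matrix is a mixture that neither cluster shows on its own.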
The deconvolution performance of DC3 with different input combinations
| Input combinations | HiChIP (K562) | HiChIP (GM12878) | RNA-seq (K562) | RNA-seq (GM12878) | ATAC-seq (K562) | ATAC-seq (GM12878) | Mean |
|---|---|---|---|---|---|---|---|
| scRNA-seq, scATAC-seq and scHi-C | 0.92 ± 0.01 | 0.95 ± 0.01 | 0.98 ± 0.00 | 0.99 ± 0.00 | 0.98 ± 0.01 | 0.99 ± 0.01 | 0.97 |
| scRNA-seq, scATAC-seq and bulk Hi-C | 0.86 ± 0.00 | 0.95 ± 0.00 | 0.98 ± 0.00 | 0.99 ± 0.00 | 0.98 ± 0.01 | 0.99 ± 0.01 | 0.95 |
| scRNA-seq, bulk ATAC-seq, bulk Hi-C | 0.78 ± 0.09 | 0.85 ± 0.08 | 0.95 ± 0.04 | 0.94 ± 0.02 | 0.85 ± 0.02 | 0.85 ± 0.02 | 0.87 |
| Bulk RNA-seq, scATAC-seq, bulk Hi-C | 0.71 ± 0.09 | 0.83 ± 0.08 | 0.85 ± 0.02 | 0.88 ± 0.03 | 0.88 ± 0.01 | 0.89 ± 0.01 | 0.84 |
| Random deconvolution | 0.61 ± 0.12 | 0.76 ± 0.10 | 0.76 ± 0.11 | 0.74 ± 0.08 | 0.62 ± 0.12 | 0.71 ± 0.08 | 0.70 |
Fig. 2 Validation of DC3 in simulation and real data. a Performance of HiChIP, scRNA-seq and scATAC-seq deconvolution for GM12878 and K562 simulation data under different dropout rates. As a comparison, the random deconvolution results are presented. Error bars represent standard deviation. b Performance of joint clustering for GM12878 and K562 simulation data under different dropout rates. Error bars represent standard deviation. c FACS plot shows that in RA-day 4, 15.7 ± 3.2% of cells are double positive for the population-2 markers EpCAM and CD38. d Performance of HiChIP deconvolution in RA-day 4 real data. The HiChIP profile measured from double positive cells (red triangle) is much closer to that inferred for subpopulation 2 (blue triangle) than to the HiChIP profiles inferred for the other subpopulations (blue circle and blue rhombus) or measured from the bulk sample (red circle). All HiChIP profiles are represented as n-dimensional vectors, with each dimension indicating the corresponding loop count. Source data are provided as a Source Data file
Subpopulation-specific GO terms enrichment results
| Subpopulation | GO term | Original: scRNA-seq | Original: scRNA-seq + HiChIP | Down-sampling: scRNA-seq | Down-sampling: scRNA-seq + HiChIP |
|---|---|---|---|---|---|
| Subpop 1 | neuron development | 37.83 | 37.84 | 16.81 | 24.26 |
| | axon development | 26.44 | 27.63 | 11.52 | 17.87 |
| | axonogenesis | 18.12 | 20.12 | 10.01 | 16.55 |
| | neuron projection guidance | 16.73 | 17.88 | 7.99 | 16.03 |
| Subpop 2 | cardiovascular system development | 9.13 | 11.34 | 4.87 | 11.02 |
| | vasculature development | 9.76 | 11.75 | 5.59 | 8.58 |
| | circulatory system development | 7.17 | 9.93 | 3.89 | 8.87 |
| | muscle structure development | 5.98 | 8.72 | 2.88 | 7.33 |
| Subpop 3 | forebrain development | 11.32 | 13.26 | 1.44 | 13.03 |
| | central nervous system development | 10.74 | 14.34 | 1.40 | 11.96 |
| | brain development | 9.02 | 13.98 | 2.18 | 11.22 |
| | head development | 8.77 | 12.05 | 1.57 | 9.51 |
The enrichment p-values are transformed to −log10(p-values) and shown in the table. Original: scRNA-seq measured in SMART-seq with median ~1 million reads per cell; Down-sampling: simulated scRNA-seq measured in Drop-seq with median UMI ~5000
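As a reading aid for the table values, the following minimal sketch shows the −log10 transform and its inverse. The function names are hypothetical, and nothing here reproduces how the enrichment p-values themselves were computed.

```python
import math

def neg_log10(p_value):
    """Transform a p-value into the -log10 scale used in the table."""
    return -math.log10(p_value)

def p_from_score(score):
    """Invert the transform: recover the p-value from a -log10 score."""
    return 10.0 ** (-score)

# A score of 37.83 (neuron development, original scRNA-seq) corresponds
# to a p-value of roughly 1.5e-38; a score of 1.44 to roughly 0.036.
p_strong = p_from_score(37.83)
p_weak = p_from_score(1.44)
```

On this scale a difference of 1 is a tenfold change in the p-value, so the gap between 1.44 and 13.03 for forebrain development in the down-sampled data spans more than eleven orders of magnitude.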
Fig. 3 Analysis of subpopulation-specific regulatory networks. a–c Scatter plots of TF expression level and motif enrichment scores in the three subpopulations in RA-day 4. Node color represents expression specificity. Horizontal and vertical black lines indicate threshold values of motif enrichment scores and TF expression level. Key TFs are represented by squares (see text for key TF definition). d Top 30 key TFs in each subpopulation. Ranking is based on the product of log2(FPKM), motif enrichment score and expression specificity. e–g Dense subnetworks of key TFs plus expressed RA receptors in subpopulations 1 to 3 (left to right). Cadet blue color nodes represent the core subnetwork, violet nodes represent the upstream subnetwork and pink nodes represent the downstream subnetwork. Only the top 30 key TFs are shown. Source data are provided as a Source Data file
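The ranking rule in panel d can be sketched as follows. This is a minimal illustration under stated assumptions: the TF names and values are made up, the record fields (`fpkm`, `motif_score`, `specificity`) and the function `rank_key_tfs` are hypothetical, no pseudocount is added to FPKM, and the thresholds that define a key TF (described in the main text) are not applied.

```python
import math

def rank_key_tfs(tfs, top_n=30):
    """Rank TFs by log2(FPKM) * motif enrichment score * expression specificity.

    `tfs` is a list of dicts with hypothetical keys 'name', 'fpkm',
    'motif_score' and 'specificity'. FPKM must be positive because no
    pseudocount is assumed here.
    """
    scored = [
        (math.log2(tf["fpkm"]) * tf["motif_score"] * tf["specificity"], tf["name"])
        for tf in tfs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_n]]

# Made-up example records, not values from the paper.
example_tfs = [
    {"name": "TF_A", "fpkm": 8.0, "motif_score": 2.0, "specificity": 0.5},   # 3 * 2 * 0.5 = 3
    {"name": "TF_B", "fpkm": 4.0, "motif_score": 1.0, "specificity": 1.0},   # 2 * 1 * 1 = 2
    {"name": "TF_C", "fpkm": 16.0, "motif_score": 3.0, "specificity": 0.5},  # 4 * 3 * 0.5 = 6
]

ranking = rank_key_tfs(example_tfs)
```

Because the score is a product, a TF needs to do reasonably well on all three factors to rank highly; a highly expressed TF with a weak motif enrichment score still ends up low.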
The numbers of cells in simulation data
| Assay | GM12878 | K562 | Total |
|---|---|---|---|
| scRNA-seq | 73 | 73 | 146 |
| scATAC-seq | 373 | 373 | 746 |
| scHi-C | 100 | 100 | 200 |