Bryce Rowland, Ruth Huh, Zoey Hou, Cheynna Crowley, Jia Wen, Yin Shen, Ming Hu, Paola Giusti-Rodríguez, Patrick F Sullivan, Yun Li.
Abstract
Hi-C data provide population-averaged estimates of three-dimensional chromatin contacts across cell types and states in bulk samples. Effective analysis of Hi-C data therefore requires controlling for a potential confounder: differences in cell type proportions across heterogeneous bulk samples. We propose a novel unsupervised deconvolution method for inferring cell type composition from bulk Hi-C data, the Two-step Hi-c UNsupervised DEconvolution appRoach (THUNDER). We tested THUNDER in extensive simulations built by combining two published single-cell Hi-C (scHi-C) datasets. THUNDER estimates the underlying cell type proportions more accurately than reference-free methods (e.g., TOAST and NMF) and is more robust than reference-dependent methods (e.g., MuSiC). We further demonstrate the practical utility of THUNDER for estimating cell type proportions and identifying cell-type-specific interactions in Hi-C data from adult human cortex tissue samples. THUNDER will be a useful tool for adjusting for varying cell type composition in population samples, enabling valid and more powerful downstream analyses such as differential chromatin organization studies. Additionally, THUNDER-estimated contact profiles provide a useful exploratory framework for investigating the cell-type-specificity of the chromatin interactome while experimental data remain scarce.
Year: 2022 PMID: 35259165 PMCID: PMC8932604 DOI: 10.1371/journal.pgen.1010102
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 1. Overview of the THUNDER Procedure.
(A) Overview of nonnegative matrix factorization (NMF) applied to bulk Hi-C data. Three underlying cell types each contribute to the observed contact frequencies in two bulk Hi-C samples. The first step of the THUNDER algorithm deconvolves the input bulk Hi-C data into two estimated matrices: the cell type profile matrix and the proportion matrix. (B) To select informative bin-pairs for deconvolution, THUNDER applies a feature selection algorithm tailored to Hi-C data, which analyzes the contact frequency distribution of each bin-pair in the cell type profile matrix. (C) In the final step, THUNDER subsets the input bulk Hi-C samples to the informative bin-pairs and performs NMF a second time. The resulting proportion matrix is scaled so that its entries estimate the underlying cell type proportions in the bulk Hi-C samples, and the cell type profile matrix estimates cell-type-specific contact distributions.
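The factorization in panels A and C can be illustrated with a generic NMF. The sketch below is a minimal illustration under stated assumptions, not THUNDER's implementation: it treats the bulk input as a nonnegative matrix V with one row per bin-pair and one column per sample, factors it with Lee-Seung multiplicative updates for the Frobenius loss, and then rescales. The function names `nmf` and `scale_to_proportions`, the iteration count, and the column-sum normalization used to turn H into proportions are all assumptions chosen for illustration.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-10):
    """Factor nonnegative V (bin-pairs x samples) as V ~ W @ H.

    Generic Lee-Seung multiplicative updates for the Frobenius loss;
    this is an assumed solver, not necessarily the one THUNDER uses.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, k)) + eps  # cell type profile matrix
    H = rng.random((k, n_samples)) + eps   # unscaled proportion matrix
    for _ in range(n_iter):
        # Each update keeps entries nonnegative and does not increase the loss.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def scale_to_proportions(W, H, eps=1e-12):
    """Rescale the factors so H can be read as cell type proportions.

    1. Normalize each column of W (one cell type profile) to sum to 1 and
       push that scale into the matching row of H, so W @ H is unchanged.
    2. Divide each column of H (one sample) by its sum, so the estimated
       proportions for every sample sum to 1.
    """
    col = W.sum(axis=0, keepdims=True) + eps  # shape (1, k)
    W_scaled = W / col
    H_scaled = H * col.T                      # shape (k, n_samples)
    proportions = H_scaled / H_scaled.sum(axis=0, keepdims=True)
    return W_scaled, proportions
```

Because NMF components carry no labels, the rows of the resulting proportion matrix are unlabeled clusters; mapping them to cell types needs a separate step, such as the enrichment analysis described for Fig 4.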
Fig 2. Performance of Feature Selection Strategies for Unsupervised Hi-C Deconvolution in HAP1, HeLa, and GM12878 Mixtures.
We test 11 feature selection strategies: no feature selection (NMF), Fano 100, Fano 1,000, and 8 strategies combining bin-pairs with high cell-type-specificity (CTS) and high across-cell-type variation (ACV). We computed the mean absolute deviation (MAD) and Pearson correlation between the true simulated cell type proportions and the estimated proportions in each of 5 simulation replicates. Colors are grouped as follows: “reds” are feature score strategies that analyze the estimated cell-type-specific profiles using the mean across bin-pairs for thresholding, “blues” are feature score strategies that use the median across bin-pairs for thresholding, and “greens” are NMF with no feature selection or with a pre-specified number of features selected by Fano factor. Distributions are shown across simulation replicates.
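Fig 2 compares Fano-factor selection with CTS/ACV thresholding, and both ideas can be sketched briefly under explicit assumptions. `fano_top_k` ranks bin-pairs by Fano factor (variance divided by mean) across bulk samples, which is the standard definition; whether THUNDER computes it on raw or normalized contacts is not stated here. `cts_acv_select` uses hypothetical CTS and ACV scores invented for this illustration, not the published definitions, and keeps bin-pairs whose two scores both exceed the mean or the median across bin-pairs, mirroring the "red" and "blue" thresholding families in the caption.

```python
import numpy as np


def fano_top_k(V, n_features, eps=1e-12):
    """Indices of the n_features bin-pairs (rows of V) with the largest Fano factor.

    Fano factor = variance / mean of a bin-pair's contacts across samples.
    """
    mean = V.mean(axis=1)
    var = V.var(axis=1)
    fano = var / (mean + eps)
    return np.argsort(fano)[::-1][:n_features]


def cts_acv_select(W, threshold="mean", eps=1e-12):
    """Illustrative CTS/ACV filter on an estimated profile matrix W (bin-pairs x k).

    Hypothetical scores, NOT THUNDER's published definitions:
      ACV = variance of a bin-pair's profile values across cell types
      CTS = largest share of that bin-pair's signal held by one cell type
    A bin-pair is kept when both scores exceed the chosen summary
    (mean or median) taken across all bin-pairs.
    """
    acv = W.var(axis=1)
    cts = W.max(axis=1) / (W.sum(axis=1) + eps)
    summarize = np.mean if threshold == "mean" else np.median
    keep = (acv > summarize(acv)) & (cts > summarize(cts))
    return np.flatnonzero(keep)
```

The two thresholds can disagree: with `W = [[1,1,1],[9,0,0],[0,5,0],[2,2,3]]`, the mean rule keeps only bin-pair 1 (0-based), while the median rule keeps bin-pairs 1 and 2, because a single large ACV value pulls the mean above bin-pair 2's score.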
Fig 3. Performance of Deconvolution Methods on Mixtures with 6 Human Brain Cell Types.
(A,B) We computed the mean absolute deviation (MAD) and Pearson correlation between the true simulated cell type proportions and the estimated proportions in each of 5 simulation replicates. Bars show the average value across simulation replicates; lower MAD and higher Pearson correlation indicate better performance. Error bars show the standard deviation across simulation replicates. (C) Number of bin-pairs selected by the deconvolution methods that perform feature selection.
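The MAD and Pearson summaries described here can be computed as follows. This is a sketch under assumptions: it presumes the estimated proportion matrix (cell types x samples) has already been matched row by row to the true cell types, which an unsupervised method does not do on its own; it computes the Pearson correlation over all flattened entries; and it reports the sample standard deviation (`ddof=1`) across replicates for the error bars. The helper names `mad`, `pearson`, and `summarize` are illustrative.

```python
import numpy as np


def mad(true_props, est_props):
    """Mean absolute deviation between true and estimated proportion matrices."""
    diff = np.asarray(true_props, dtype=float) - np.asarray(est_props, dtype=float)
    return float(np.mean(np.abs(diff)))


def pearson(true_props, est_props):
    """Pearson correlation between all true and estimated proportions (flattened)."""
    return float(np.corrcoef(np.ravel(true_props), np.ravel(est_props))[0, 1])


def summarize(values):
    """Mean (bar height) and sample standard deviation (error bar) across replicates."""
    v = np.asarray(values, dtype=float)
    return float(v.mean()), float(v.std(ddof=1))
```

Calling `mad` and `pearson` once per replicate and passing each list of scores to `summarize` gives one bar and one error bar per method.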
Fig 4. THUNDER-Estimated Cell Type Proportions in 3 Samples of Human Cortex Tissue.
We use THUNDER to estimate cell type proportions for 3 Hi-C samples from cortex tissue and perform enrichment analyses to assign brain cell types to THUNDER clusters. The estimated proportions match the expected ratio of neuronal to non-neuronal cells in cortex tissue.