K Polovnikov, A Gorsky, S Nechaev, S V Razin, S V Ulianov.
Abstract
Chromatin communities stabilized by protein machinery play an essential role in gene regulation and refine the global polymeric folding of the chromatin fiber. However, treating these communities within classical network theory (the stochastic block model, SBM) does not take into account the intrinsic linear connectivity of the chromatin loci. Here we propose the polymer block model, paving the way for community detection in polymer networks. On the basis of this new model, we modify the non-backtracking flow operator and suggest the first protocol for annotating compartmental domains in sparse single-cell Hi-C matrices. In particular, we prove that our approach corresponds to the maximum entropy principle. The benchmark analysis demonstrates that the spectrum of the polymer non-backtracking operator resolves the true compartmental structure up to the theoretical detectability threshold, while all commonly used operators fail above it. We test various operators on real data and conclude that the sizes of the non-backtracking single-cell domains are closest to the sizes of compartments from the population data. Moreover, the domains found clearly segregate by gene density and correlate with the population compartmental mask, corroborating the biological significance of our annotation of chromatin compartmental domains in single-cell Hi-C matrices.
Year: 2020 PMID: 32647272 PMCID: PMC7347895 DOI: 10.1038/s41598-020-68182-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Adjacency matrices of with two clusters generated according to (a) the polymer stochastic block model () and (b) the canonical stochastic block model (). Vertices in the graph are enumerated by the polymer coordinate in (a), and with all red vertices first, then all blue ones, in (b).
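The contrast between the two panels can be illustrated with a minimal sampling sketch. It is not the authors' generator: the function name `sample_adjacency`, the probabilities `p_in` and `p_out`, the segment lengths, and the `1/s` decay with genomic distance (the fractal-globule scaling) are all assumptions chosen for illustration, and the parameter values used in the figure are not reproduced here.

```python
import numpy as np

def sample_adjacency(labels, p_in, p_out, distance_decay=None, seed=0):
    """Sample a symmetric 0/1 adjacency matrix with two kinds of vertices.

    Hypothetical helper: without distance_decay this is a plain two-block SBM;
    with it, the link probability is also scaled by a function of the distance
    |i - j| along the chain and the backbone links are always present.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n = len(labels)
    same = labels[:, None] == labels[None, :]
    prob = np.where(same, p_in, p_out).astype(float)
    if distance_decay is not None:
        s = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
        prob = np.minimum(1.0, prob * distance_decay(np.maximum(s, 1)))
        prob[s == 1] = 1.0  # neighbours along the chain are always linked
    # Draw the upper triangle only, then mirror it to keep the matrix symmetric.
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return (upper | upper.T).astype(int)

# (a) polymer-like: contiguous alternating segments, ordered along the chain
polymer_labels = np.repeat([0, 1, 0, 1], 25)
A_polymer = sample_adjacency(polymer_labels, 0.9, 0.1,
                             distance_decay=lambda s: 1.0 / s)

# (b) canonical: all vertices of one cluster first, then the other
canonical_labels = np.repeat([0, 1], 50)
A_canonical = sample_adjacency(canonical_labels, 0.3, 0.05)
```

In the canonical case the ordering of vertices is arbitrary, so sorting by cluster yields two dense diagonal blocks. In the polymer-like case the ordering is fixed by the chain, and the clusters appear as a checkerboard superimposed on the decay away from the diagonal.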
Figure 2. (a) Depiction of the polymer SBM network: the backbone (bold), contacts between genomically distant monomers (dashed), and two chemical sorts of monomers (red and blue) arranged into contiguous alternating segments. An example of a non-backtracking walk on such a graph is shown by arrows. Immediate returns are forbidden, preventing localization on hubs; (b) Spectrum of the polymer non-backtracking flow (11) for the fractal globular () large-scale organization of the chain with two overlaid compartments with the mean length .
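The rule that immediate returns are forbidden can be made concrete with the classical non-backtracking (Hashimoto) matrix, which acts on directed edges rather than on vertices. The sketch below builds that classical matrix only; it is not the polymer flow operator of equation (11), which is not reproduced in this excerpt, and the function name `non_backtracking_matrix` is hypothetical.

```python
import numpy as np

def non_backtracking_matrix(adjacency):
    """Classical non-backtracking matrix of an undirected graph.

    Rows and columns are indexed by directed edges. The entry for the pair
    (i -> j, j -> l) is 1 when l != i, so a walk may continue from j to any
    neighbour except the vertex it has just come from.
    """
    adjacency = np.asarray(adjacency)
    n = adjacency.shape[0]
    edges = [(i, j) for i in range(n) for j in range(n)
             if i != j and adjacency[i, j]]
    index = {edge: k for k, edge in enumerate(edges)}
    B = np.zeros((len(edges), len(edges)), dtype=int)
    for (i, j), row in index.items():
        for l in np.flatnonzero(adjacency[j]):
            l = int(l)
            if l != i and l != j:  # skip the immediate return and self-loops
                B[row, index[(j, l)]] = 1
    return B, edges

# A triangle: every directed edge has exactly one allowed continuation.
triangle = np.ones((3, 3), dtype=int) - np.eye(3, dtype=int)
B_triangle, triangle_edges = non_backtracking_matrix(triangle)

# A three-vertex chain 0 - 1 - 2: only the walks 0 -> 1 -> 2 and 2 -> 1 -> 0 exist.
chain = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
B_chain, chain_edges = non_backtracking_matrix(chain)
```

The spectrum can then be inspected with `np.linalg.eigvals(B_triangle)`. Because a walk cannot bounce back and forth across a single edge, high-degree vertices attract less spectral weight than they do under the adjacency matrix.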
Figure 3. (a) Comparison of the performance of different classical operators without background, polymer modularity, and polymer non-backtracking flow operators (); (b) The iterative approach that can be used to determine the optimal value of for five values of ; the true optimal values of calculated from (8) are shown by dashed lines; (c) The mean numbers of inner and outer edges, calculated for each value of in order to estimate the detectability threshold for the corresponding regular network. (d) Number of isolated eigenvalues of the polymer flow operator plotted against . Full spectra of the polymer flow operator for the two values of are shown in the insets.
Figure 4. (a) The average contact probability P(s) of single cells (gray) and of the merged cell (solid, black), computed for logarithmically spaced bins with the log factor 1.4; the fractal globule scaling is also shown by a dashed line for comparison. (b) Annotation of active (red) and inactive (blue) compartmental domains for one of the contact maps (cell 29749, chromosome 3, length , 200 kb resolution) by the polymer non-backtracking flow operator. Below the map, the compartmental signal from the corresponding leading eigenvector of the polymer non-backtracking flow matrix is shown. Inset: the full spectrum of the polymer flow for the same contact map. (c,d) Averaged profiles of the GC content (z-scores) plotted around the centers of the compartmental domains (active in red, inactive in blue) for the population of cells and for a pool of single cells.
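The contact probability curve in panel (a) can be sketched as follows, assuming that P(s) in a bin is the number of observed contacts divided by the number of possible pairs at the separations falling into that bin. The exact binning convention of the authors is not given here, so the helper names `log_bin_edges` and `contact_probability`, the integer rounding of bin edges, and the geometric bin centers are assumptions.

```python
import numpy as np

def log_bin_edges(s_max, factor=1.4):
    """Integer bin edges 1, ceil(f), ceil(f^2), ... that cover 1..s_max."""
    edges = [1]
    x = 1.0
    while edges[-1] <= s_max:
        x *= factor
        edge = int(np.ceil(x))
        if edge > edges[-1]:  # drop duplicates produced by rounding
            edges.append(edge)
    return np.array(edges)

def contact_probability(contact_map, factor=1.4):
    """Average contact probability versus genomic separation s, log-binned."""
    contact_map = np.asarray(contact_map, dtype=float)
    n = contact_map.shape[0]
    # Contacts observed on the s-th diagonal, for s = 1 .. n-1.
    counts = np.array([np.trace(contact_map, offset=s) for s in range(1, n)])
    pairs = np.arange(n - 1, 0, -1)  # n - s possible pairs at separation s
    edges = log_bin_edges(n - 1, factor)
    centers, prob = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        hi = min(hi, n)             # the bin holds separations lo .. hi-1
        window = slice(lo - 1, hi - 1)
        centers.append(np.sqrt(lo * (hi - 1)))
        prob.append(counts[window].sum() / pairs[window].sum())
    return np.array(centers), np.array(prob)

# Toy check on a 12-bin map containing only the backbone contacts (s = 1).
backbone = np.diag(np.ones(11), 1)
backbone = backbone + backbone.T
s_centers, p_of_s = contact_probability(backbone)
```

For a sparse single-cell map the binned estimate is noisy at large s, which is one reason to compare it with the curve of the merged cell on a log-log plot.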