
Network analysis identifies chromosome intermingling regions as regulatory hotspots for transcription.

Anastasiya Belyaeva, Saradha Venkatachalapathy, Mallika Nagarajan, G V Shivashankar, Caroline Uhler.

Abstract

The 3D structure of the genome plays a key role in regulatory control of the cell. Experimental methods such as high-throughput chromosome conformation capture (Hi-C) have been developed to probe the 3D structure of the genome. However, it remains a challenge to deduce from these data chromosome regions that are colocalized and coregulated. Here, we present an integrative approach that leverages 1D functional genomic features (e.g., epigenetic marks) with 3D interactions from Hi-C data to identify functional interchromosomal interactions. We construct a weighted network with 250-kb genomic regions as nodes and Hi-C interactions as edges, where the edge weights are given by the correlation between 1D genomic features. Individual interacting clusters are determined using weighted correlation clustering on the network. We show that intermingling regions generally fall into either active or inactive clusters based on the enrichment for RNA polymerase II (RNAPII) and H3K9me3, respectively. We show that active clusters are hotspots for transcription factor binding sites. We also validate our predictions experimentally by 3D fluorescence in situ hybridization (FISH) experiments and show that active RNAPII is enriched in predicted active clusters. Our method provides a general quantitative framework that couples 1D genomic features with 3D interactions from Hi-C to probe the guiding principles that link the spatial organization of the genome with regulatory control.
Copyright © 2017 the Author(s). Published by PNAS.

Keywords:  3D FISH; Hi-C; chromosome intermingling; epigenetics; network and clustering analysis

Year:  2017        PMID: 29229825      PMCID: PMC5748172          DOI: 10.1073/pnas.1708028115

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


The 3D structure of the genome plays a key role in regulatory control of the cell. Historically, the spatial organization of the genetic material has been probed with fluorescence in situ hybridization (FISH), and it was shown that chromosome organization is nonrandom. Each chromosome occupies its own territory with gene-dense chromosomes more likely to be in the nuclear interior (1). As an addition to FISH, chromosome conformation capture methods (3C, 4C, 5C, and Hi-C) have been designed to probe the 3D organization of the genome by measuring the genome-wide contact frequencies over a population of cells (2–5). Computational and experimental efforts have largely focused on investigating intrachromosomal contacts. Studies where these interactions have been analyzed together with epigenetic modifications as measured by chromatin immunoprecipitation sequencing (ChIP-seq) showed that epigenetic marks are tightly linked to shaping the architecture of the genome (6, 7). Few studies have considered interchromosomal interactions. It was shown that regions on neighboring chromosome territories may loop out and intermingle with each other in a transcription-dependent manner (8, 9). In addition, a recent study has revealed that intermingling regions are enriched in both active and repressive epigenetic marks, as well as the active form of RNA polymerase II (RNAPII) and transcription factors (10). Furthermore, it was identified that genes are spatially colocalized and coregulated by sharing common transcription factors (11, 12) and epigenetic machinery like the polycomb proteins (13). For example, TNF-responsive genes (on the same and different chromosomes) have been shown to colocalize upon their stimulation. Their spatial clustering was found to be correlated with their temporal expression patterns (12). 
The clustering of genes, transcriptional machinery, and regulatory factors to coordinate expression, also known as transcription factories, has been proposed as a model for gene regulation (14–16). Collectively, these studies suggest that interchromosomal regions could harbor coregulated gene clusters. However, missing in this picture is a systematic analysis linking 1D epigenetic marks and 3D intermingling regions and their roles in transcription control. Various methods have been developed to infer the spatial connectivity of the whole genome from Hi-C data. Restraint-based approaches transform Hi-C contact matrices into distances to deduce one consensus structure (17–21). However, it remains a challenge to map contact frequencies to spatial distances due to biases in Hi-C matrices (22). A different approach is to produce an ensemble of structures that could explain the experimental data (23, 24). Computational methods have largely focused on inferring the 3D genome structure based on Hi-C data alone without leveraging functional genomic data for studying its architecture. A recent study has explored this idea by superimposing ChIP-seq data of three transcription factors (TFs) on the 3D genome architecture inferred from Hi-C and determined functional hotspots in Saccharomyces cerevisiae (25). Another study used 1D epigenomic tracks to predict 3D interactions (26). But there remains a lack of a general quantitative framework that integrates 1D functional genomic features with 3D intermingling regions to determine a regulatory code for interchromosomal interactions. In this paper, we take a unique approach by integrating Hi-C and functional genomic data to predict regions that are colocalized and coregulated in 3D. The model of gene regulation that is captured by our analysis is the spatial clustering of genomic regions for their coregulation (27). 
This mode of gene regulation may enable the cell to coordinate gene expression and activate or repress pathways that are important for cell function in a coordinated manner. We focus on interchromosomal interactions to study chromosome intermingling regions. Using a network analysis approach, we construct a network of chromosomal interactions weighted by correlations in their genomic features at a 250-kb resolution. We find that intermingling regions can be divided into active and inactive clusters, where active clusters are hotspots for TF binding. We validate our predictions using FISH by comparing a predicted active cluster vs. a predicted negative control and also confirm that active RNAPII is significantly enriched in the predicted active cluster.

Results

Identification of Intermingling Domains.

To identify interchromosomal regions that are both spatially colocalized and coregulated, we leveraged spatial information from Hi-C experiments and regulatory information, namely, epigenetic marks, TF ChIP-seq, DNase I hypersensitivity (DNase-seq), and RNA-seq. Our aim was to identify clusters of chromosome regions at the whole-genome scale that interact spatially due to similarities in their regulatory features and thus might be coregulated by shared regulatory factors and epigenetic marks. Our method consists of four steps outlined in Fig. 1: (i) identification of highly interacting domains by determining large average submatrices in interchromosomal Hi-C maps, (ii) superimposing regulatory marks on the interacting domains, (iii) construction of a network of interacting regions with edges weighted by the correlation of the superimposed marks as a measure of coregulation, and (iv) network clustering to obtain spatially colocalized and coregulated domains.
Fig. 1.

Overview of the proposed quantitative framework for detecting intermingling regions. (A) Example of an observed interchromosomal Hi-C contact matrix at 250-kb resolution after preprocessing and transformation (standardized by mean and SD after log(1 + x) transformation) for chromosomes 19 and 20. Rectangular boxes represent interacting domains for this pair of chromosomes as detected by the large average submatrix (LAS) algorithm, which finds submatrices with a high average. (B) Matrix containing the number of interacting 250-kb regions identified by the LAS algorithm for each pair of chromosomes. (C) Subnetwork of the chromosome interaction network corresponding to two distinct clusters. Nodes are colored by chromosome number. Each node in the network corresponds to a 250-kb region. Edges link nodes that are found together in a submatrix (box) as determined by the LAS algorithm. The edge weights are given by the strength of correlation between the genomic features (histone modifications, TF ChIP-seq, DNase-seq, and RNA-seq) of adjacent 250-kb nodes. (D and E) Activity (normalized number of peaks in a 250-kb region) of the genomic features for the two clusters obtained by weighted correlation clustering on the subnetwork in C. Each ring corresponds to one genomic feature. Features are grouped into active (outer rings: RNA-seq, RNAPII, H3K4me1, H3K4me2, H3K4me3, H3K36me3, H3K9ac), repressive (middle rings: H3K27me3 and H3K9me3), and other (inner rings) categories. (F) Fold enrichment of each genomic feature in the intermingling regions.

We analyzed Hi-C data from human lung fibroblast (IMR-90) cells at 250-kb resolution, obtained from ref. 28. After bias correction, filtering, and transformation of the data, we identified a stringent set of highly interacting interchromosomal regions by solving the following submatrix-finding problem in Hi-C maps. We sought a contiguous submatrix with a high average value within the real-valued data matrix, where each entry is an interchromosomal contact frequency between two 250-kb regions. We used the iterative large average submatrix (LAS) algorithm (29), which balances matrix size and average value, to discover highly interacting domains. Fig. 1A shows the identified domains in the Hi-C contact map for chromosomes 19 and 20; the LAS algorithm captures the regions with high intensity in the interchromosomal matrix. Applying this procedure to all pairwise interchromosomal maps yields Fig. 1B, where each entry in the matrix corresponds to the number of 250-kb regions identified for the particular chromosome pair [false discovery rate (FDR) < , ]. The total size of highly interacting domains across all chromosomes spanned 903.25 Mb. Consistent with previous observations (2, 30), Fig. 1B shows that gene-dense chromosomes such as 15–17 and 19–22 had a high number of intermingling 250-kb regions. In addition, as previously noted (31), we found a striking difference between chromosomes 18 and 19: although these two chromosomes are approximately equal in size, the gene-poor chromosome 18 has a low level of intermingling across most chromosomes, while the gene-rich chromosome 19 tends to intermingle more with other chromosomes.
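The submatrix search described above can be illustrated with a minimal sketch. It applies the log(1 + x) transformation and standardization mentioned in the Fig. 1 legend, and then exhaustively scores every contiguous window of a toy matrix. The score used here (window average multiplied by the square root of the window size) is an assumption chosen only to show the trade-off between size and average; it is not the significance score of the LAS algorithm (29), and the iterative search of that algorithm is replaced by brute force. The function names are hypothetical.

```python
import math

def transform(counts):
    """Apply log(1 + x), then standardize by the matrix-wide mean and SD."""
    logged = [[math.log1p(v) for v in row] for row in counts]
    flat = [v for row in logged for v in row]
    mean = sum(flat) / len(flat)
    sd = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
    return [[(v - mean) / sd for v in row] for row in logged]

def best_contiguous_submatrix(x):
    """Score every contiguous window and return (score, r0, r1, c0, c1).

    Row and column ends are exclusive. The score average * sqrt(size) is a
    stand-in (assumption), not the LAS score of ref. 29. The exhaustive
    search is only practical for toy matrices.
    """
    n, m = len(x), len(x[0])
    # 2D prefix sums so that each window sum costs O(1).
    p = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            p[i + 1][j + 1] = x[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    best = None
    for r0 in range(n):
        for r1 in range(r0 + 1, n + 1):
            for c0 in range(m):
                for c1 in range(c0 + 1, m + 1):
                    size = (r1 - r0) * (c1 - c0)
                    total = p[r1][c1] - p[r0][c1] - p[r1][c0] + p[r0][c0]
                    score = (total / size) * math.sqrt(size)
                    if best is None or score > best[0]:
                        best = (score, r0, r1, c0, c1)
    return best

# Toy contact counts: a 2 x 2 block of strong contacts on a weak background.
counts = [[1] * 5 for _ in range(5)]
for r in (1, 2):
    for c in (2, 3):
        counts[r][c] = 100
print(best_contiguous_submatrix(transform(counts))[1:])
```

On this toy input the highest-scoring window coincides with the planted block (rows 1 to 2, columns 2 to 3), because adding any background cell lowers the window sum while the size penalty grows.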

Integration of Functional Genomic Data and Network Analysis.

We obtained functional genomic data (TF ChIP-seq, histone modifications, DNase-seq, and RNA-seq) from the ENCODE (32), Roadmap Epigenomics (33), and GEO databases. We used these experimental data as a regulatory profile for all 250-kb regions that lay within the intermingling domains. Considering each selected 250-kb region as a node, a whole-genome network of chromosomal interactions was constructed as follows. Between chromosomes, the edges in the network were placed between pairs of 250-kb regions that lay within the same submatrix as identified by the LAS algorithm. Within chromosomes, edges were placed between loci that fall within the same intrachromosomal domain, as determined in ref. 28. After establishing the skeleton of the network, we calculated the edge weights. Since our goal was to determine spatially coregulated regions, we weighted the edges by Spearman’s correlation between the genomic profiles of adjacent 250-kb regions. This combined approach can mitigate some of the noise associated with using Hi-C contact frequencies alone. In addition, it allows us to identify chromosome intermingling regions with coordinated activity, which might be controlled by the same set of TFs or epigenetic marks, as opposed to domains that interact in 3D by chance. A subnetwork containing six 250-kb regions from three distinct chromosomes is shown in Fig. 1C. The edge weights in this subnetwork suggest the presence of two separate clusters. To retrieve intermingling regions that are coregulated, the weighted network of 250-kb regions was partitioned into clusters, using weighted correlation clustering (34). For example, this approach can identify regions that are brought together for transcription, since these would have high RNAPII and low repressive epigenetic marks. This approach indeed found two clusters in the subnetwork shown in Fig. 1C. The regulatory profiles of the six regions, separated into two clusters, are illustrated in Fig. 1 D and E.
As a consequence of using weighted correlation clustering, the genomic features within a cluster are more similar than across clusters. Interestingly, one of the two clusters in Fig. 1 D and E is enriched for active genomic features (we analyzed H3K9ac, H3K36me3, H3K4me3, H3K4me2, H3K4me1, RNAPII, and RNA-seq) and depleted for repressive features (we analyzed H3K27me3 and H3K9me3), while the other cluster is depleted for active features. Using this method, 446 clusters (totaling 459.5 Mb) were identified (P value under a test) that consist of at least two nodes and span multiple chromosomes (Dataset S1). On average, 2.5 chromosomes interact within one cluster. We analyzed the enrichment of regulatory marks in intermingling regions and found that these regions were most enriched for RNAPII, namely by a factor of 2.23 (Fig. 1F). We also found the active and repressive marks (e.g., H3K9ac, H3K4me3, and H3K9me3) to be enriched in intermingling clusters, which is consistent with a previous study (10).
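The edge weighting and clustering steps can be sketched on a toy network of six regions. The sketch computes Spearman's correlation between feature profiles as the edge weight and then groups nodes with a simple greedy pivot heuristic, in which each pivot collects the unassigned neighbors that are positively correlated with it. The pivot heuristic is a swapped-in simplification, not the weighted correlation clustering procedure of ref. 34. The region names, the feature profiles, and the choice to connect all pairs (rather than only pairs sharing a LAS submatrix) are assumptions made for illustration.

```python
import math
from itertools import combinations

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the rank vectors.

    A constant profile (zero rank variance) is not handled in this sketch.
    """
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)

def pivot_clustering(nodes, weights):
    """Greedy pivot heuristic (assumption, not the method of ref. 34).

    weights maps frozenset({u, v}) to a correlation; missing pairs are
    treated as having no edge.
    """
    unassigned = list(nodes)
    clusters = []
    while unassigned:
        pivot = unassigned[0]
        cluster = [pivot] + [
            v for v in unassigned[1:]
            if weights.get(frozenset((pivot, v)), 0.0) > 0
        ]
        clusters.append(cluster)
        unassigned = [v for v in unassigned if v not in cluster]
    return clusters

# Hypothetical profiles: three active-like and three repressed-like regions.
profiles = {
    "r1": [9, 8, 7, 1, 2], "r2": [8, 9, 6, 2, 1], "r3": [7, 9, 8, 1, 3],
    "r4": [1, 2, 3, 9, 8], "r5": [2, 1, 3, 8, 9], "r6": [1, 3, 2, 9, 7],
}
weights = {
    frozenset((u, v)): spearman(profiles[u], profiles[v])
    for u, v in combinations(profiles, 2)
}
print(pivot_clustering(["r1", "r4", "r2", "r5", "r3", "r6"], weights))
```

With these profiles, edges within each group have positive weights and edges between groups have negative weights, so the heuristic separates the six regions into two clusters of three.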

Regulatory Features are Predictive of Intermingling.

To characterize intermingling regions as a whole and evaluate whether they are distinct from nonintermingling regions on a regulatory level, we built a classifier and determined the features that contribute the most to distinguishing between these two classes. These features may represent a mechanism to spatially cluster genes for their coregulation. We annotated 250-kb regions as intermingling or nonintermingling based on the results from our network analysis and clustering. We then performed classification based on the associated regulatory profiles. We used eXtreme gradient boosting trees with 10-fold cross-validation to train our classifier. Using all features, the classifier achieves an accuracy of and the corresponding receiver operating characteristic (ROC) curve in Fig. 2A has an area under the curve (AUC) of 0.77.
Fig. 2.

Performance and feature importance for classifying intermingling regions. (A) ROC curve for eXtreme gradient boosting trees classifier that was trained on genomic features of intermingling vs. nonintermingling regions. This results in AUC of 0.77. (B) Features ranked in order of importance (relative depth of feature in the decision tree) for distinguishing intermingling domains. (C) AUC when recursively eliminating one feature at a time based on 10-fold cross-validation. Near-optimal performance is reached with 13 features, which are indicated by asterisks in B.

To quantify the importance of each feature by itself and in conjunction with all other features, we computed its univariate and multivariate rank based on its depth in the decision trees of the ensemble (Fig. 2B). The most important features determined by this analysis are lamin B1 (LMNB1), H3K9me3, H3K56ac, and H2A.Z. The importance of both repressive (H3K9me3, LMNB1) and active (H3K56ac, H2A.Z) marks agrees with the observation that intermingling regions contain both active and repressed regions (35). Furthermore, previous mapping of LMNB1 in the genome revealed the presence of lamina-associated domains (LADs) that interact with the lamina on the nuclear envelope, spatially organize chromosomes by anchoring them to the lamina, and display coordinated gene repression (36–38). H3K9me3 is enriched in LADs and may facilitate gene silencing in LADs (37, 39). The context-dependent importance of this feature is in line with its low univariate but high multivariate rank. H3K56ac is a known mark of transcriptionally active chromatin regions (40, 41). Finally, H2A.Z is enriched at transcription start sites (42), indicating its involvement in transcription initiation, and it appears to be a defining feature of intermingling on its own. Performing stepwise feature elimination shows that ∼13 features are sufficient for achieving high AUC (Fig. 2C); the corresponding features are annotated by asterisks in Fig. 2B.
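The evaluation quantities used in this section can be made concrete with a small sketch. It computes the AUC directly from classifier scores as the probability that a randomly chosen intermingling region receives a higher score than a randomly chosen nonintermingling region, and it builds a simple 10-fold partition of sample indices. The gradient boosting classifier itself is not reproduced here; the labels and scores below are invented toy values, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def kfold_indices(n_samples, k=10):
    """Partition sample indices into k interleaved folds (no shuffling)."""
    return [list(range(start, n_samples, k)) for start in range(k)]

# Toy data: 1 = intermingling, 0 = nonintermingling (invented values).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(round(roc_auc(labels, scores), 3))

# Each fold serves once as the held-out set; the rest is used for training.
for held_out in kfold_indices(25, k=10):
    train = [i for i in range(25) if i not in held_out]
```

In the toy example, seven of the nine positive-negative pairs are ordered correctly, giving an AUC of 7/9. An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, which gives a scale for reading the reported value of 0.77.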

Intermingling Clusters Are Divided into Active and Inactive Clusters.

While it is interesting to evaluate intermingling regions altogether, studying these on a cluster-by-cluster level may give insights into the links between regulatory processes and spatial colocalization. Based on previous evidence (43), we hypothesized that active regions are clustered with other active regions and inactive regions with other inactive regions. To analyze the types of clusters we obtained, we computed the fold enrichment of each cluster for several regulatory features (Dataset S1). We found that a high proportion of the clusters, 41.7% (186 clusters), was enriched for all active marks (RNAPII, H3K9ac, H3K36me3, H3K4me3, and H3K4me1), as shown in Fig. 3A (P value = under a test, ). Notably, the majority of clusters were either enriched for all five active marks or not enriched for any active mark.
Fig. 3.

Classification of intermingling regions into active and inactive clusters. (A) Five-way Venn diagram representing the number of clusters enriched for each active epigenetic mark and RNAPII. Interestingly, many clusters (186 of 446) are enriched for all five active marks. (B) Venn diagram of the active clusters (the 186 clusters in the intersection of the five-way diagram in A) and clusters enriched for the silencing mark H3K9me3. Note that only 18 of 446 clusters are both active and silenced, showing that the clusters separate into two categories of active and inactive clusters.

The percentage of clusters enriched for the repressive/inactivating mark H3K9me3 was 38.3% (171 clusters). Interestingly, we observed a clear separation of the intermingling clusters into active and inactive, with only 4% of clusters (18 clusters) falling into both categories, as shown in Fig. 3B (P value =  under a test, ). Active clusters were defined as those clusters enriched for RNAPII (fold enrichment >1) but not for H3K9me3. Inactive clusters were defined as those enriched for H3K9me3 but not for RNAPII. Active clusters also had significantly higher gene expression (P value = 0.004 under a t test) in comparison with inactive clusters. In addition, high-occupancy target (HOT) regions, i.e., regions that are occupied by many TFs (44), were overrepresented in active clusters in comparison with low-occupancy target (LOT) regions, by a HOT:LOT ratio of 2.94. These findings suggest that active clusters may be hotspots for TF binding.
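The labeling rule stated above can be written out as a short sketch, together with a check of the reported percentages. The definition of fold enrichment used here (mean signal over the regions of a cluster divided by the genome-wide mean) is an assumption, since the exact definition is not given in this section; the function names are hypothetical. The threshold of 1 and the active and inactive rules follow the text.

```python
def fold_enrichment(cluster_signal, genome_mean):
    """Mean signal in the cluster relative to the genome-wide mean (assumed definition)."""
    return (sum(cluster_signal) / len(cluster_signal)) / genome_mean

def label_cluster(rnapii_fe, h3k9me3_fe):
    """Active: enriched for RNAPII only. Inactive: enriched for H3K9me3 only."""
    active = rnapii_fe > 1
    silenced = h3k9me3_fe > 1
    if active and not silenced:
        return "active"
    if silenced and not active:
        return "inactive"
    return "both" if active else "neither"

def percent(part, total=446):
    """Share of the 446 clusters, rounded to one decimal place."""
    return round(100 * part / total, 1)

print(label_cluster(fold_enrichment([4.0, 6.0], 2.5), 0.4))
print(percent(186), percent(171), percent(18))
```

The last line reproduces the proportions quoted in the text for 186, 171, and 18 of the 446 clusters.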

Active Clusters Are Hotspots for TF Binding.

We probed the active clusters for shared TFs that may be involved in colocalizing and coregulating regions in a cluster by analyzing TF binding sites (TFBS). We used the JASPAR 2016 database to obtain the TFBS. These data were overlaid and then filtered using ChIP-seq peaks from all human cell lines available from ENCODE (32). This resulted in TFBS for 52 TF motifs. We performed an additional analysis to also consider a larger set of TF motifs (386) by overlaying and filtering the JASPAR 2016 database with a robust set of CAGE peaks from ref. 45, collected across 353 human tissue samples as part of the FANTOM5 project. This filtering step provided us with a list of potential transcription start sites that contain motifs for the TFs under consideration. We compared the distributions of TFBS counts per 250-kb region for active clusters vs. the whole genome. Several factors, such as EGR1, YY1, CTCF, and the E2F family of proteins, showed a significant increase in TFBS counts under a Mann–Whitney U test (Fig. 4A).
Fig. 4.

TFBS and GO terms across active clusters. (A) Top 10 TFs with significantly overrepresented TFBS in active clusters compared with the whole-genome distribution (under a Mann–Whitney U test). (B) Matrix corresponding to a representative active cluster with the number of TFBS for each 250-kb region in the cluster. Only TFs containing at least one nonzero column entry are shown. A TF shared among multiple regions in the cluster may indicate its role in colocalization and coregulation of the clustered regions. (C) Significantly enriched GO terms computed from the genes that are expressed and colocalized in the intermingling cluster shown in B (ranked by P value using DAVID).

The majority of active clusters contained binding sites for TFs that are shared across regions spanning multiple chromosomes. For example, the cluster studied in Fig. 1 involving chromosomes 12 and 17 contains binding sites for the TFs USF1 and NRF1 on regions of both chromosomes (Fig. 4B). This cluster is formed by the colocalization between two adjacent 250-kb regions on chromosome 12 and one region on chromosome 17. Gene ontology (GO) term analysis of the expressed genes in this cluster revealed an enrichment for biological processes related to fibroblasts such as “cytoskeleton-dependent intracellular transport” (Fig. 4C). On the other hand, we found that inactive clusters contained a low number of TFBS, reaffirming the existence of two distinct categories of clusters for intermingling regions.
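The comparison of TFBS counts between active clusters and the whole genome relies on the Mann–Whitney U test, which can be sketched as follows. The sketch computes the U statistic from average ranks and a one-sided P value from the normal approximation with a tie correction and without a continuity correction. These choices and the toy counts are assumptions for illustration; the settings used in the original analysis are not stated in this section.

```python
import math

def mann_whitney_u(x, y):
    """Return (U for x, one-sided P value that x tends to exceed y).

    Uses the normal approximation with a tie correction and no continuity
    correction (assumption). Data in which every value is tied are not handled.
    """
    pooled = sorted((v, group) for group, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # 1-based average rank of the tie group
        t = j - i + 1
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        i = j + 1
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u - mu) / sigma
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return u, p_one_sided

# Invented TFBS counts per 250-kb region for one TF.
active_cluster_counts = [5, 6, 7, 8]
genome_counts = [1, 2, 3, 4]
print(mann_whitney_u(active_cluster_counts, genome_counts))
```

A rank-based test is a natural fit here because TFBS counts per region are discrete and typically skewed, so a comparison of means under a normality assumption would be less appropriate.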

Experimental Validation.

We ranked the active clusters according to the presence of binding sites for TFs that were shared across multiple chromosomes, using a permutation test. Chromosomes 12 and 17 were consistently found together among the top-ranked clusters and were thus chosen for experimental validation. We compared the amount of overlap between chromosomes 12 and 17 to a negative control that we obtained by analyzing the network of least-interacting chromosomes. The chromosome territories were identified in human fibroblast (BJ) cells using DNA FISH and visualized using a laser scanning confocal microscope (Fig. 5 A–F). To obtain a representative sample of the population, we imaged at least 200 cells for each chromosome pair. We confirmed that chromosomes 12 and 17 consistently intermingle in a population of cells (Fig. 5 A–C and Movie S1), while the negative control chromosome pair does not (Fig. 5 D–F and Movie S2). To quantify our results, we calculated the intermingling degree, i.e., the amount of overlap between the two chromosome territories of each pair. We found that the chromosome pair 12 and 17, which was predicted to interact, had a significantly higher intermingling degree than the negative control pair 3 and 20 (Fig. 5G, P value = 0.005 under a Welch two-sample t test). The percentage of nuclei that were intermingling (intermingling degree >0) was higher in the predicted pair of interacting chromosomes, 12 and 17, than in the negative control, 3 and 20. In addition, we calculated the enrichment of active RNAPII in the intermingling regions for the aforementioned pairs. We found that the predicted chromosome pair, 12 and 17, which belongs to an active cluster, had significantly higher enrichment for active RNAPII in the intermingling regions compared with the negative control pair, 3 and 20 (Fig. 5H, P value = 7.125e-05 under a Welch two-sample t test), showing that the chromosome pair 12 and 17 indeed contains an active mark at the site of intermingling.
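The quantification above can be sketched in two parts: an overlap measure between two segmented chromosome territories, and the Welch two-sample t statistic used to compare the two chromosome pairs. The overlap measure below (shared voxels divided by all voxels in either territory) is a hypothetical stand-in, since the exact definition of the intermingling degree is not given in this section. The Welch statistic and its Welch–Satterthwaite degrees of freedom follow the standard formulas; the P value is not computed because the t distribution is not available in the Python standard library. All data are invented.

```python
import math
from statistics import mean, variance

def overlap_fraction(territory_a, territory_b):
    """Hypothetical intermingling measure: shared voxels / voxels in either territory."""
    a, b = set(territory_a), set(territory_b)
    return len(a & b) / len(a | b)

def welch_t(sample_a, sample_b):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(sample_a) / len(sample_a)   # squared standard error of mean a
    vb = variance(sample_b) / len(sample_b)   # squared standard error of mean b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Two toy territories given as voxel coordinates (invented).
chr_a = [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
chr_b = [(0, 0, 1), (0, 1, 0), (1, 1, 1)]
print(overlap_fraction(chr_a, chr_b))

# Invented per-nucleus intermingling degrees for two chromosome pairs.
pair_predicted = [2.0, 4.0, 6.0, 8.0, 10.0]
pair_control = [1.0, 2.0, 3.0, 4.0, 5.0]
print(welch_t(pair_predicted, pair_control))
```

Welch's variant of the t test does not assume equal variances in the two groups, which is a reasonable default when the predicted and control pairs may differ in how variable their overlap is across nuclei.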
Fig. 5.

Experimental validation. (A) Representative images of the maximum-intensity Z projections of the nucleus, active RNAPII, and chromosomes 17 and 12, from Left to Right, respectively. (B) Raw image resulting from merging the nuclear (blue) and the two chromosome channels depicting the overlap between chromosomes 17 (purple) and 12 (cyan). (C) Image in B after segmentation with nucleus (white), chromosome 17 (red), and chromosome 12 (green). Yellow regions are the overlapping or intermingling regions. (C, Right) Enlargement of the region in the dotted white boxes in C, Left. (D) Representative images of the maximum-intensity Z projections of the nucleus, active RNAPII, and chromosomes 20 and 3, from Left to Right, respectively. (E) Raw image resulting from merging the nuclear (blue) and the two chromosome channels depicting the overlap between chromosomes 20 (purple) and 3 (cyan). (F) Image in E after segmentation with nucleus (white), chromosome 20 (red), and chromosome 3 (green). (F, Right) Enlargement of the region in the dotted white boxes in F, Left. (G) Boxplot depicting intermingling degree between chromosomes 12 and 17 and chromosomes 3 and 20 (P value = 0.005 under a Welch two-sample t test). (H) Boxplot depicting the enrichment of active RNAPII between chromosomes 12 and 17 and chromosomes 3 and 20 (P value = 7.125e-05 under a Welch two-sample t test). (All scale bars, 5 μm.)


Discussion

Understanding the spatial organization of the chromosomes within the cell nucleus has been a major question in cell biology. A number of studies have suggested that the packing of DNA plays a critical role in regulating genomic programs (3). Earlier experiments took advantage of chromosome painting methods and revealed that chromosomes are organized nonrandomly and in a cell–type-specific manner (1, 8, 9). Analysis of gene positioning using FISH showed that coregulated genes were coclustered (11, 12). Such clusters of genes were also found to be colocalized with transcription-related machinery such as active RNAPII and TFs (11, 12). Recent developments in chromosome capture technologies further revealed that genome-wide chromosome contact maps are correlated with epigenetic marks (6, 7). The majority of studies using chromosome conformation capture focused on linking chromatin contacts with epigenetic modifications at the resolution of genes in intrachromosomal regions (6, 7). However, the coupling between the global organization of chromosomes with genome-wide epigenetic marks and the intermingling regions as an additional layer of transcriptional regulation has not been well studied. In this paper, we developed a network analysis approach to reveal the principles of transcription-dependent chromosome intermingling by taking advantage of 3D contact maps obtained using Hi-C and 1D epigenetic marks, TF ChIP-seq, DNA accessibility, and RNA-seq. Our computational approach focuses on interchromosomal domains, since their organizational principles have been largely unknown. The proposed quantitative framework enables the prediction of chromosome intermingling regions at a genome-wide scale, thereby complementing experimental methods such as FISH that can be used to study specific clusters of interchromosomal interactions. The novelty of our method lies in leveraging 1D genomic features in combination with 3D interactions from Hi-C data. 
This allows us to study functionally colocalized regions: Since interactions can occur by chance in 3D, some intermingling regions may not be of biological relevance. By leveraging epigenetic marks and data from TF binding and DNA accessibility, as well as gene expression, we can determine interchromosomal regions that are colocalized and coregulated. Our predictions reveal intriguing patterns of chromosome organization and have been validated by FISH experiments. Our findings recapitulate known principles of chromosome interactions, such as the tendency of gene-dense chromosomes to intermingle more frequently (2, 30) and the enrichment of RNAPII in intermingling regions (10), suggesting that RNAPII may play a crucial role in establishing and maintaining chromosome interactions. We observe that the clusters of interchromosomal regions fall broadly into two categories, active and inactive, where active clusters are enriched for active epigenetic marks and RNAPII and inactive clusters are enriched for H3K9me3. Interestingly, we found that active clusters are hotspots for TF binding sites, with several TFs being shared among multiple chromosomes within a cluster. These clusters contain genes with biologically relevant GO terms. We established the predictive power of our model through experimental validation. Using FISH experiments we showed that the predicted intermingling chromosomes interact consistently across a population of cells and that such intermingling regions are enriched for active RNAPII. Our quantitative analysis provides evidence that TF hotspots in active clusters are colocalized with active epigenetic modifications and with RNAPII and have a significantly higher gene expression than inactive clusters, suggesting that the relative positioning of the chromosomes in the cell nucleus is optimized to facilitate the clustering of coregulated genes, TFs, epigenetic modifications, and transcriptional machinery. 
Collectively, these findings suggest that the spatial organization of the genomic material in the cell nucleus is optimized for transcription programs. The framework we present here is general and can be applied to analyze any cell type. By experimentally validating the predictions from our model using single-cell imaging methods, we showed that population-level genome-wide contact and epigenetic data carry enough information to identify highly interacting regions. However, we anticipate that the power of our method will increase as more robust single-cell genomic data become available. We believe that our quantitative approach will provide a useful framework to gain insights into the interplay between chromosome reorganization and regulation during processes such as cell differentiation, reprogramming, or the maintenance of homeostasis.

Materials and Methods

Details about the methods used for processing the raw Hi-C matrices, the LAS algorithm for identifying highly interacting regions, weighted correlation clustering, classification into intermingling and nonintermingling domains, the computation of fold enrichment of genomic features, the cell culture and chromosome FISH protocols, and the methods and settings used for confocal imaging and image analysis are provided in SI Materials and Methods. The code for interchromosomal network construction via LAS and for the identification and analysis of clusters is available at https://github.com/anastasiyabel/functional_chromosome_interactions. The code for performing the image analysis is available at https://github.com/SaradhaVenkatachalapathy/Chromsome-intermingling-region-indentifcation-and-characterisation-of-protein-levels.
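As a rough illustration of the network described in the abstract (genomic regions as nodes, Hi-C interactions as edges, edge weights given by the correlation between 1D genomic features), a minimal sketch follows. It is not the code from the repositories above. It assumes the highly interacting region pairs have already been selected (the paper uses LAS for that step), it uses Pearson correlation as one possible choice of correlation, and the names `pearson`, `build_network`, and the toy feature vectors are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length feature vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # constant vector: treat correlation as 0 in this sketch
    return cov / (norm_x * norm_y)


def build_network(features, hic_pairs):
    """Return {(region_u, region_v): weight} for interchromosomal pairs.

    features:  dict mapping a region (chromosome, start) to its 1D
               feature vector (e.g., epigenetic mark signals).
    hic_pairs: iterable of region pairs assumed to interact in Hi-C.
    """
    edges = {}
    for u, v in hic_pairs:
        if u[0] == v[0]:
            continue  # same chromosome: not an interchromosomal edge
        edges[(u, v)] = pearson(features[u], features[v])
    return edges


# Toy input (hypothetical values, 250-kb bins labeled by start coordinate).
features = {
    ("chr12", 0): [1.0, 2.0, 3.0],
    ("chr12", 250000): [3.0, 2.0, 1.0],
    ("chr17", 250000): [2.0, 4.0, 6.0],
}
pairs = [
    (("chr12", 0), ("chr17", 250000)),
    (("chr12", 0), ("chr12", 250000)),       # intrachromosomal, skipped
    (("chr12", 250000), ("chr17", 250000)),
]
network = build_network(features, pairs)
```

In this toy network the first pair receives a positive weight and the third a negative one. A clustering step such as weighted correlation clustering would then tend to group regions joined by positive weights and separate those joined by negative weights.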
References (45 in total; 10 listed)

Review 1.  Long-Range Chromatin Interactions.

Authors:  Job Dekker; Tom Misteli
Journal:  Cold Spring Harb Perspect Biol       Date:  2015-10-01       Impact factor: 10.005

Review 2.  Genome architecture: domain organization of interphase chromosomes.

Authors:  Wendy A Bickmore; Bas van Steensel
Journal:  Cell       Date:  2013-03-14       Impact factor: 41.582

3.  Population-based 3D genome structure analysis reveals driving forces in spatial genome organization.

Authors:  Harianto Tjong; Wenyuan Li; Reza Kalhor; Chao Dai; Shengli Hao; Ke Gong; Yonggang Zhou; Haochen Li; Xianghong Jasmine Zhou; Mark A Le Gros; Carolyn A Larabell; Lin Chen; Frank Alber
Journal:  Proc Natl Acad Sci U S A       Date:  2016-03-07       Impact factor: 11.205

Review 4.  The 3D Genome as Moderator of Chromosomal Communication.

Authors:  Job Dekker; Leonid Mirny
Journal:  Cell       Date:  2016-03-10       Impact factor: 41.582

5.  A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.

Authors:  Suhas S P Rao; Miriam H Huntley; Neva C Durand; Elena K Stamenova; Ivan D Bochkov; James T Robinson; Adrian L Sanborn; Ido Machol; Arina D Omer; Eric S Lander; Erez Lieberman Aiden
Journal:  Cell       Date:  2014-12-11       Impact factor: 41.582

6.  A promoter-level mammalian expression atlas.

Authors:  Alistair R R Forrest; Hideya Kawaji; Michael Rehli; J Kenneth Baillie; Michiel J L de Hoon; Vanja Haberle; Timo Lassmann; Ivan V Kulakovskiy; Marina Lizio; Masayoshi Itoh; Robin Andersson; Christopher J Mungall; Terrence F Meehan; Sebastian Schmeier; Nicolas Bertin; Mette Jørgensen; Emmanuel Dimont; Erik Arner; Christian Schmidl; Ulf Schaefer; Yulia A Medvedeva; Charles Plessy; Morana Vitezic; Jessica Severin; Colin A Semple; Yuri Ishizu; Robert S Young; Margherita Francescatto; Intikhab Alam; Davide Albanese; Gabriel M Altschuler; Takahiro Arakawa; John A C Archer; Peter Arner; Magda Babina; Sarah Rennie; Piotr J Balwierz; Anthony G Beckhouse; Swati Pradhan-Bhatt; Judith A Blake; Antje Blumenthal; Beatrice Bodega; Alessandro Bonetti; James Briggs; Frank Brombacher; A Maxwell Burroughs; Andrea Califano; Carlo V Cannistraci; Daniel Carbajo; Yun Chen; Marco Chierici; Yari Ciani; Hans C Clevers; Emiliano Dalla; Carrie A Davis; Michael Detmar; Alexander D Diehl; Taeko Dohi; Finn Drabløs; Albert S B Edge; Matthias Edinger; Karl Ekwall; Mitsuhiro Endoh; Hideki Enomoto; Michela Fagiolini; Lynsey Fairbairn; Hai Fang; Mary C Farach-Carson; Geoffrey J Faulkner; Alexander V Favorov; Malcolm E Fisher; Martin C Frith; Rie Fujita; Shiro Fukuda; Cesare Furlanello; Masaaki Furino; Jun-ichi Furusawa; Teunis B Geijtenbeek; Andrew P Gibson; Thomas Gingeras; Daniel Goldowitz; Julian Gough; Sven Guhl; Reto Guler; Stefano Gustincich; Thomas J Ha; Masahide Hamaguchi; Mitsuko Hara; Matthias Harbers; Jayson Harshbarger; Akira Hasegawa; Yuki Hasegawa; Takehiro Hashimoto; Meenhard Herlyn; Kelly J Hitchens; Shannan J Ho Sui; Oliver M Hofmann; Ilka Hoof; Furni Hori; Lukasz Huminiecki; Kei Iida; Tomokatsu Ikawa; Boris R Jankovic; Hui Jia; Anagha Joshi; Giuseppe Jurman; Bogumil Kaczkowski; Chieko Kai; Kaoru Kaida; Ai Kaiho; Kazuhiro Kajiyama; Mutsumi Kanamori-Katayama; Artem S Kasianov; Takeya Kasukawa; Shintaro Katayama; Sachi Kato; Shuji Kawaguchi; Hiroshi Kawamoto; Yuki I Kawamura; 
Tsugumi Kawashima; Judith S Kempfle; Tony J Kenna; Juha Kere; Levon M Khachigian; Toshio Kitamura; S Peter Klinken; Alan J Knox; Miki Kojima; Soichi Kojima; Naoto Kondo; Haruhiko Koseki; Shigeo Koyasu; Sarah Krampitz; Atsutaka Kubosaki; Andrew T Kwon; Jeroen F J Laros; Weonju Lee; Andreas Lennartsson; Kang Li; Berit Lilje; Leonard Lipovich; Alan Mackay-Sim; Ri-ichiroh Manabe; Jessica C Mar; Benoit Marchand; Anthony Mathelier; Niklas Mejhert; Alison Meynert; Yosuke Mizuno; David A de Lima Morais; Hiromasa Morikawa; Mitsuru Morimoto; Kazuyo Moro; Efthymios Motakis; Hozumi Motohashi; Christine L Mummery; Mitsuyoshi Murata; Sayaka Nagao-Sato; Yutaka Nakachi; Fumio Nakahara; Toshiyuki Nakamura; Yukio Nakamura; Kenichi Nakazato; Erik van Nimwegen; Noriko Ninomiya; Hiromi Nishiyori; Shohei Noma; Shohei Noma; Tadasuke Noazaki; Soichi Ogishima; Naganari Ohkura; Hiroko Ohimiya; Hiroshi Ohno; Mitsuhiro Ohshima; Mariko Okada-Hatakeyama; Yasushi Okazaki; Valerio Orlando; Dmitry A Ovchinnikov; Arnab Pain; Robert Passier; Margaret Patrikakis; Helena Persson; Silvano Piazza; James G D Prendergast; Owen J L Rackham; Jordan A Ramilowski; Mamoon Rashid; Timothy Ravasi; Patrizia Rizzu; Marco Roncador; Sugata Roy; Morten B Rye; Eri Saijyo; Antti Sajantila; Akiko Saka; Shimon Sakaguchi; Mizuho Sakai; Hiroki Sato; Suzana Savvi; Alka Saxena; Claudio Schneider; Erik A Schultes; Gundula G Schulze-Tanzil; Anita Schwegmann; Thierry Sengstag; Guojun Sheng; Hisashi Shimoji; Yishai Shimoni; Jay W Shin; Christophe Simon; Daisuke Sugiyama; Takaai Sugiyama; Masanori Suzuki; Naoko Suzuki; Rolf K Swoboda; Peter A C 't Hoen; Michihira Tagami; Naoko Takahashi; Jun Takai; Hiroshi Tanaka; Hideki Tatsukawa; Zuotian Tatum; Mark Thompson; Hiroo Toyodo; Tetsuro Toyoda; Elvind Valen; Marc van de Wetering; Linda M van den Berg; Roberto Verado; Dipti Vijayan; Ilya E Vorontsov; Wyeth W Wasserman; Shoko Watanabe; Christine A Wells; Louise N Winteringham; Ernst Wolvetang; Emily J Wood; Yoko Yamaguchi; Masayuki 
Yamamoto; Misako Yoneda; Yohei Yonekura; Shigehiro Yoshida; Susan E Zabierowski; Peter G Zhang; Xiaobei Zhao; Silvia Zucchelli; Kim M Summers; Harukazu Suzuki; Carsten O Daub; Jun Kawai; Peter Heutink; Winston Hide; Tom C Freeman; Boris Lenhard; Vladimir B Bajic; Martin S Taylor; Vsevolod J Makeev; Albin Sandelin; David A Hume; Piero Carninci; Yoshihide Hayashizaki
Journal:  Nature       Date:  2014-03-27       Impact factor: 49.962

7.  Three-dimensional maps of all chromosomes in human male fibroblast nuclei and prometaphase rosettes.

Authors:  Andreas Bolzer; Gregor Kreth; Irina Solovei; Daniela Koehler; Kaan Saracoglu; Christine Fauth; Stefan Müller; Roland Eils; Christoph Cremer; Michael R Speicher; Thomas Cremer
Journal:  PLoS Biol       Date:  2005-04-26       Impact factor: 8.029

8.  Genome-wide identification and characterisation of HOT regions in the human genome.

Authors:  Hao Li; Feng Liu; Chao Ren; Xiaochen Bo; Wenjie Shu
Journal:  BMC Genomics       Date:  2016-09-15       Impact factor: 3.969

9.  Recruitment to the nuclear periphery can alter expression of genes in human cells.

Authors:  Lee E Finlan; Duncan Sproul; Inga Thomson; Shelagh Boyle; Elizabeth Kerr; Paul Perry; Bauke Ylstra; Jonathan R Chubb; Wendy A Bickmore
Journal:  PLoS Genet       Date:  2008-03-21       Impact factor: 5.917

10.  Integrative analysis of 111 reference human epigenomes.

Authors:  Anshul Kundaje; Wouter Meuleman; Jason Ernst; Misha Bilenky; Angela Yen; Alireza Heravi-Moussavi; Pouya Kheradpour; Zhizhuo Zhang; Jianrong Wang; Michael J Ziller; Viren Amin; John W Whitaker; Matthew D Schultz; Lucas D Ward; Abhishek Sarkar; Gerald Quon; Richard S Sandstrom; Matthew L Eaton; Yi-Chieh Wu; Andreas R Pfenning; Xinchen Wang; Melina Claussnitzer; Yaping Liu; Cristian Coarfa; R Alan Harris; Noam Shoresh; Charles B Epstein; Elizabeta Gjoneska; Danny Leung; Wei Xie; R David Hawkins; Ryan Lister; Chibo Hong; Philippe Gascard; Andrew J Mungall; Richard Moore; Eric Chuah; Angela Tam; Theresa K Canfield; R Scott Hansen; Rajinder Kaul; Peter J Sabo; Mukul S Bansal; Annaick Carles; Jesse R Dixon; Kai-How Farh; Soheil Feizi; Rosa Karlic; Ah-Ram Kim; Ashwinikumar Kulkarni; Daofeng Li; Rebecca Lowdon; GiNell Elliott; Tim R Mercer; Shane J Neph; Vitor Onuchic; Paz Polak; Nisha Rajagopal; Pradipta Ray; Richard C Sallari; Kyle T Siebenthall; Nicholas A Sinnott-Armstrong; Michael Stevens; Robert E Thurman; Jie Wu; Bo Zhang; Xin Zhou; Arthur E Beaudet; Laurie A Boyer; Philip L De Jager; Peggy J Farnham; Susan J Fisher; David Haussler; Steven J M Jones; Wei Li; Marco A Marra; Michael T McManus; Shamil Sunyaev; James A Thomson; Thea D Tlsty; Li-Huei Tsai; Wei Wang; Robert A Waterland; Michael Q Zhang; Lisa H Chadwick; Bradley E Bernstein; Joseph F Costello; Joseph R Ecker; Martin Hirst; Alexander Meissner; Aleksandar Milosavljevic; Bing Ren; John A Stamatoyannopoulos; Ting Wang; Manolis Kellis
Journal:  Nature       Date:  2015-02-19       Impact factor: 69.504
