
3D-GNOME: an integrated web service for structural modeling of the 3D genome.

Przemyslaw Szalaj¹, Paul J Michalski², Przemysław Wróblewski³, Zhonghui Tang⁴, Michal Kadlof³, Giovanni Mazzocco³, Yijun Ruan⁵, Dariusz Plewczynski⁶.

Abstract

Recent advances in high-throughput chromosome conformation capture (3C) technology, such as Hi-C and ChIA-PET, have demonstrated the importance of 3D genome organization in development, cell differentiation and transcriptional regulation. There is now a widespread need for computational tools to generate and analyze 3D structural models from 3C data. Here we introduce our 3D GeNOme Modeling Engine (3D-GNOME), a web service which generates 3D structures from 3C data and provides tools to visually inspect and annotate the resulting structures, in addition to a variety of statistical plots and heatmaps which characterize the selected genomic region. Users submit a bedpe (paired-end BED format) file containing the locations and strengths of long range contact points, and 3D-GNOME simulates the structure and provides a convenient user interface for further analysis. Alternatively, a user may generate structures using published ChIA-PET data for the GM12878 cell line by simply specifying a genomic region of interest. 3D-GNOME is freely available at http://3dgnome.cent.uw.edu.pl/.

Year: 2016 PMID: 27185892 PMCID: PMC4987952 DOI: 10.1093/nar/gkw437

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

ChIA-PET (1), Hi-C (2) and related technologies have revealed that the mammalian genome has multiple levels of organization, from large-scale chromosome territories, mega-base sized topologically associated domains (TADs) (3,4) and chromosome contact domains (CCDs) (5), and down to specific CTCF-mediated looping interactions (5,6). Structural chromosome models are a key computational tool for analysis of such data (7,8), as they provide a representation of the data which can be more revealing than 1D looping depictions or 2D heatmaps. In particular, overlaying the 3D structure with additional genomic annotation data, such as histone methylation marks, can reveal spatial clustering of epigenetic factors which is not readily apparent in other representations (9). Recently, we have shown that a single ChIA-PET experiment reveals information at all relevant genomic resolutions (5). Previous ChIA-PET experiments focused on high-frequency, high-confidence interactions representing true long range interactions, and discarded singletons, which are low frequency, low confidence interactions which in fact constitute the majority of ChIA-PET reads (10–12). However, we showed that the singleton data provides the same information as Hi-C data, namely, low-resolution (∼1 Mb) information about large-scale topological domains. In light of this observation, we developed a structural modeling algorithm which leverages the multiscale nature of ChIA-PET data to produce 3D chromosome models at multiple resolutions (5). Although it was designed for ChIA-PET data, the algorithm works equally well with Hi-C and other genome-wide 3C-like data, provided some distinction is made between weak and strong interactions. The number of tools available for 3D chromatin modeling is growing rapidly (for a recent review, see (13)), and includes a mix of private (14–16) and publicly available (7,9,17–22) software. 
However, most of these tools require additional dependencies and/or some computational expertise. For example, ChromSDE (17) requires Matlab; ShRec3D (18) only runs on Linux; TADbit (19) requires the Integrative Modeling Platform (23); BACH (9) requires R, the GNU scientific library, and must be compiled from source; PASTIS (20) requires Python and includes files which must be compiled from source; and MCMC5C (7) and MOGEN (22) require Java and must be run from the command line. AutoChrom3D (21) is available as a convenient web application and requires no technical expertise from the user, but it imposes size limits on the reconstructed region dependent on the selected resolution, and its visualization requires Java, which is no longer supported by Google Chrome and generally poses a security risk when run in the browser. Here we introduce a new, freely available web tool, the 3D GeNOme Modeling Engine (3D-GNOME), which allows a user without any programming experience to generate 3D structures from 3C data with minimal effort, and requires only a modern web browser to access and use. The simulation program is based on our algorithms published in (5). 3D-GNOME provides a web-based, interactive 3D viewer to visualize and analyze the resulting 3D structure, and includes options for the user to upload genomic annotation data to overlay on the structure. In addition to the 3D structure, 3D-GNOME provides a variety of other analysis tools, including 1D arc representations and 2D heatmap representations of the data.

IMPLEMENTATION

Web server

A schematic of the web server architecture is shown in Figure 1. The web server is written in Python using the Flask framework (http://flask.pocoo.org/). Job requests submitted by the user are subject to both client-side and server-side validation. Upon validation, the request is saved to the MySQL database, and a request carrying the id of the corresponding database record is added to the job queue. The job queue is simply a text file monitored by GNU Parallel (http://www.gnu.org/software/parallel/), which starts new jobs while managing the available resources (number of threads used, available memory, etc.). This allows us to easily adjust how many simultaneous jobs are run and to distribute processes to several machines. Technically, each job is a Python script containing all the processing steps: parsing the input, running external scripts (written in Python, PHP and R) to calculate statistics and generate plots, and, finally, running the 3D simulations. Structures are viewed using an interactive 3D viewer, which is implemented in WebGL (https://www.khronos.org/webgl/) using the Three.js JavaScript helper library (http://threejs.org/) and the dat.gui library (https://github.com/dataarts/dat.gui) for the user interface.
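The validate-then-enqueue step described above can be illustrated with a minimal sketch. It is not the server's actual code: the function names, the validation rules and the queue file name are all hypothetical, and the MySQL insert is replaced by a caller-supplied job id so that the example stays self-contained.

```python
import os
import tempfile


def validate_request(chrom, start, end):
    """Server-side sanity check of a region request (hypothetical rules)."""
    return isinstance(chrom, str) and chrom.startswith("chr") and 0 <= start < end


def enqueue_job(queue_path, job_id):
    """Append one job id per line to the text-file queue.

    A watcher process (GNU Parallel in the paper) is assumed to pick up
    each newly appended line and start the corresponding job.
    """
    with open(queue_path, "a", encoding="utf-8") as queue:
        queue.write(f"{job_id}\n")


def submit(queue_path, job_id, chrom, start, end):
    """Validate a request and enqueue it; return the job id, or None if rejected.

    In the real service the id would come from the database record;
    here it is passed in by the caller (an assumption of this sketch).
    """
    if not validate_request(chrom, start, end):
        return None
    enqueue_job(queue_path, job_id)
    return job_id


if __name__ == "__main__":
    queue_file = os.path.join(tempfile.mkdtemp(), "jobqueue.txt")
    submit(queue_file, 1, "chr4", 109556994, 113054287)   # accepted
    submit(queue_file, 2, "chr2", 24629787, 24026977)     # rejected: start >= end
    with open(queue_file, encoding="utf-8") as handle:
        print(handle.read().split())
```

On the consuming side, the GNU Parallel documentation describes following a queue file with a pipeline of the form `tail -n+0 -f jobqueue | parallel`; whether the 3D-GNOME server uses this exact invocation is not stated in the paper.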

Figure 1.

Webserver architecture. The central role is played by a Flask-based server which accepts the requests, stores them in the database and adds them to the job queue. GNU Parallel monitors the queue and runs the jobs as soon as there are computational resources available. Each job consists of a number of external scripts executed sequentially.

3D Simulation

The simulation framework is written in C++. A complete description of the modeling approach is given in (5), and is also available in the technical documentation on the web server. Briefly, we use a multiscale, top-to-bottom modeling approach in which different scales correspond to different resolutions. At each level the chromatin is represented as a beads-and-springs polymer, with beads representing different genomic regions. The assignment of genomic regions to beads is data driven and reflects the underlying biological features that can be identified using interaction clusters—CCDs and interaction anchors—as shown in Figure 2A. We first model the general, low resolution (1–2 Mb) structure using singleton data, and then refine this structure using PET interactions to achieve a high resolution (1–10 kb) structure (Figure 2B).

Figure 2.

Presentation of basic modeling principles used in 3D-GNOME. (A) CCDs (marked with blue bars) can be clearly distinguished in both the PET clusters and singleton heatmaps. (B) Schematic representation of low (megabase size; left) and high (1–10 kb size; right) resolution structure levels. On the low resolution level each CCD is represented with a single bead. On the high resolution level the interior of CCDs is modeled as an interaction complex with the chromatin loops extending outwards. (C) An example of a PET cluster interaction pattern in a single CCD with anchors and CTCF motif orientations marked (top) and schematic representation of the corresponding structure (bottom).

The simulation protocol is similar for all levels. First, an energy function is defined taking into account the data available on this particular level. For the low-resolution level we apply the energy function built using singleton heatmaps. The number of interactions between regions can be used as a proxy of their pairwise physical distances: intuitively, the more interactions between two regions, the closer they should be in 3D space. Thus, the interaction frequencies are converted to expected distances between genomic regions. The proper way to convert from interaction frequencies to distances, and even the appropriateness of such a conversion, is the subject of much discussion in the literature. We use a simple inverse power-law with user adjustable parameters, an approach used by most other modeling programs. For the high-resolution level we use PET clusters to position the anchors within an interaction complex, and we include terms to account for typical polymer physics interactions like stretching and bending energies. We do not consider excluded volume interactions, because including such an interaction generally introduced only minor modifications to the structure but dramatically increased the computation time. There are two optional refinements on this level: first, if CTCF motif orientation is available, as it is in the GM12878 line (5,6), then it can be used to orient the interactions (Figure 2C), and second, the shape of chromatin loops may be modified using high-resolution singleton heatmaps. A more comprehensive discussion of our modeling assumptions and the functional forms used to describe the various interaction terms can be found in the technical documentation on the website.
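As an illustration of the frequency-to-distance conversion, the sketch below applies an inverse power law and scores a set of bead coordinates against the resulting expected distances. The parameter names `scale` and `exponent` stand in for the user-adjustable parameters and are hypothetical, and the harmonic penalty is an assumption made for the example; the functional forms actually used by 3D-GNOME are given in its technical documentation.

```python
import math


def frequency_to_distance(freq, scale=1.0, exponent=1.0):
    """Inverse power law: d = scale * freq ** (-exponent).

    `scale` and `exponent` are illustrative names for the tunable parameters.
    """
    if freq <= 0:
        raise ValueError("interaction frequency must be positive")
    return scale * freq ** (-exponent)


def restraint_energy(coords, expected, weight=1.0):
    """Sum of harmonic penalties weight * (|r_i - r_j| - d_ij) ** 2.

    coords   -- list of (x, y, z) bead positions
    expected -- dict mapping a bead pair (i, j) to its expected distance d_ij
    The harmonic form is an assumption of this sketch.
    """
    return sum(
        weight * (math.dist(coords[i], coords[j]) - d) ** 2
        for (i, j), d in expected.items()
    )


if __name__ == "__main__":
    # Toy interaction counts between three beads (made-up numbers).
    counts = {(0, 1): 4, (0, 2): 1}
    expected = {
        pair: frequency_to_distance(c, exponent=0.5) for pair, c in counts.items()
    }
    beads = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(expected)
    print(restraint_energy(beads, expected))
```

A simulation would then search for coordinates that lower this energy; the more frequently two beads interact, the shorter their expected distance and the more strongly a distant placement is penalized.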

USAGE

Input

There are two potential use cases. In the first, the user has generated their own dataset from a ChIA-PET or Hi-C library and would like to generate 3D structures. The data should be stored in a tab-delimited, bedpe-like (http://bedtools.readthedocs.org/en/latest/content/general-usage.html) file consisting of seven or eight columns, where the first three columns describe the region on one side of the interaction (chromosome, start and stop positions), the second three columns describe the region on the other side of the interaction (chromosome, start and stop positions), and the seventh column indicates the frequency of that interaction in the dataset. The eighth column is optional and may be used to name the transcription factors pulled down in the experiments. The data should be sorted into two files, the first containing high-frequency, high-confidence ‘true’ interactions, and the second containing low-frequency, low-confidence singleton interactions. The user is also advised to supply a file with a definition of TADs, as this may lead to a more reasonable structure. If this file is not provided, then a heuristic algorithm will be used to determine TADs automatically. In the second use case, a user may generate 3D structures using our recently published GM12878 ChIA-PET data set (5). This data is stored on our server and requires no additional input file from the user. In either use case, the user then selects the genomic region they would like to model and, optionally, specifies a name for their model. Additionally, there are a number of simulation parameters, which the user may tune, including the weights of various interactions, or features (like CTCF orientation) and the number of algorithm iterations on each genomic scale. These are set to reasonable defaults, which work well for the GM12878 dataset, but may not be appropriate for other species or even other human cell lines. 
The purpose of each parameter is described, and there is a link to a help document for additional information. Finding a reasonable parameter set for a given data set may involve some trial and error. We tested our modeling approach on four additional cell lines (HEK293T, K562, HeLa and MCF7), which confirmed that different parameter values and different segmentations of the chromatin chain are indeed needed. Nevertheless, we were able to successfully prepare three-dimensional models for all of these cell lines using our simulation code. Upon clicking the submit button, the user is provided with a custom URL to a page where they can find their results. While the simulation is running, the page indicates its status.
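The seven- or eight-column input layout described above can be checked before upload with a small parser such as the following sketch. The `Interaction` record and its field names are hypothetical, and the frequency column is assumed to hold an integer count; the server's own parser may differ.

```python
from typing import NamedTuple, Optional


class Interaction(NamedTuple):
    """One bedpe-like record; field names are illustrative."""
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    frequency: int            # assumed to be an integer count
    factor: Optional[str]     # optional eighth column (e.g. pulled-down factor)


def parse_bedpe_line(line):
    """Parse one tab-delimited line with seven or eight columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (7, 8):
        raise ValueError(f"expected 7 or 8 tab-separated columns, got {len(fields)}")
    factor = fields[7] if len(fields) == 8 else None
    return Interaction(
        fields[0], int(fields[1]), int(fields[2]),
        fields[3], int(fields[4]), int(fields[5]),
        int(fields[6]), factor,
    )


def read_bedpe(lines):
    """Parse all non-empty, non-comment lines of a bedpe-like file."""
    return [
        parse_bedpe_line(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]


if __name__ == "__main__":
    example = [
        "chr4\t109600000\t109601000\tchr4\t109900000\t109901000\t12\tCTCF\n",
        "chr4\t110000000\t110001000\tchr4\t110050000\t110051000\t1\n",
    ]
    for record in read_bedpe(example):
        print(record)
```

Running the same check on both the high-confidence file and the singleton file catches malformed rows early, before a job is submitted.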

Output

When the simulation finishes, the status page shows several plots for data analysis, as shown in Figure 3 for a ∼3.5 Mb region on chromosome 4. At the top is a 1D arc representation of the PET interactions, where the x-axis represents genomic position and the arc height represents the measured contact frequency. This representation allows the user to quickly evaluate the number and distribution of interaction clusters, and can be useful in estimating the accuracy of a 3D model. Next the user is presented with some population-level statistics on the PET interactions and singletons used in the current simulation. The singleton data is represented using conventional 2D heatmaps: one heatmap shows the raw data, and the other shows the normalized heatmap. Lastly, several helpful distributions are plotted: a histogram of PET cluster length, a density plot of singleton interaction distance, and the density of interacting loci in the selected region of interest.
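To make the raw singleton heatmap concrete, the following sketch bins pairwise interactions into a symmetric count matrix for a selected region. The function name, the use of single positions per anchor and the bin size are assumptions of the example; the normalization applied by 3D-GNOME is not specified here and is therefore left out.

```python
def build_heatmap(interactions, region_start, region_end, bin_size):
    """Bin interactions into a symmetric matrix of summed counts.

    interactions -- iterable of (pos1, pos2, count), one position per anchor
                    (e.g. the anchor midpoint; an assumption of this sketch)
    Interactions with either end outside [region_start, region_end) are skipped.
    """
    n_bins = -(-(region_end - region_start) // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2, count in interactions:
        inside = (region_start <= pos1 < region_end
                  and region_start <= pos2 < region_end)
        if not inside:
            continue
        i = (pos1 - region_start) // bin_size
        j = (pos2 - region_start) // bin_size
        matrix[i][j] += count
        if i != j:
            matrix[j][i] += count  # keep the matrix symmetric
    return matrix


if __name__ == "__main__":
    # Toy singleton data in a 300 bp region with 100 bp bins (made-up numbers).
    singletons = [(0, 250, 1), (120, 130, 2), (500, 10, 1)]
    for row in build_heatmap(singletons, 0, 300, 100):
        print(row)
```

Dense square blocks along the diagonal of such a matrix are the visual signature used to recognize domains in the heatmaps.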

Figure 3.

Example results page for a selected region (chr 4:109556994–113054287). (A) Interaction arcs representing the strength of PET interactions. Orange bars on top correspond to disjoint interaction subsets that can be distinguished. (B) Various statistics on the PET interactions (separately for CTCF and RNAPII) and singleton interactions. (C) The heatmaps showing the raw (left) and normalized (right) singleton data. Orange bars on top of the normalized heatmap represent a possible TAD calling for this region. (D) Plots showing the length distribution for PET and singleton interactions and the number of interactions originating from each site.

It can be readily seen that RNAPII interactions are much shorter than CTCF ones (with an average length of 191 kb for RNAPII and 436 kb for CTCF), and that they are usually found very close to the CTCF sites, which is in concordance with the genome folding model we proposed earlier (5). With some simplification it can be said that CTCF is a major contributor that shapes the genome topology, one of its functions being to bring together transcription and regulatory elements. Given the presented interaction plots one could argue that the selected region comprises four substructures (Figure 3A, marked with orange bars), with no PET interactions joining them. This detailed information conveyed by the interaction arc plots is complemented by singleton heatmaps, which allow easy recognition of TADs. Looking at the heatmaps alone it seems that there are three, four or five TADs, depending on whether we prefer to identify small and dense regions, or larger, but possibly less compacted ones. Interestingly, one of the most apparent splits into TADs (Figure 3C, marked with orange bars) does not entirely align with the regions suggested by the PET interactions, suggesting that inferring 3D structures requires careful examination and interpretation of both types of data.
At the top of the page is a link labeled ‘Open 3D view’, which will open a page for the interactive 3D viewer with the model pre-loaded. The viewer supports all the usual interactions: translation, rotation, and zoom. A wide variety of options are provided through a dropdown menu on the right. Here we only mention a few of these, and refer the reader to the online tutorial for a full list of options. Figure 4 shows a model of chr2:24026977-24629787, a region we previously investigated for its complex but functional looping interactions (5). For ChIA-PET libraries, the viewer can display the locations of the DNA binding protein(s) used to create the library, as shown in Figure 4A, for CTCF (green) and RNA pol II (red). Notably, the user may upload a genomic annotation file in the broadPeak (http://genome.ucsc.edu/FAQ/FAQformat.html#format13) format used by ENCODE, which will color the 3D structure according to the intensity of the annotated peak. The color scale may be adjusted in a variety of ways to highlight regions of interest. Figure 4B shows the locations of strong and weak promoters (red), according to ChromHMM (ENCODE). It is immediately obvious that the promoters are colocalized within the cluster. This example demonstrates the utility of 3D modeling; to make such an inference from a 1D or 2D representation would require examining neighbors, next-nearest-neighbors, etc., to elucidate the promoter cluster. Additional genomic annotations are shown in Figure 4C (H3K4me1) and D (H3K4me3). In these cases, there is no apparent colocalization, as the histone marks are rather evenly distributed within the cluster and along the loops.
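The annotation overlay can be pictured as assigning each bead of the model an intensity derived from the peaks that overlap its genomic interval. The sketch below reads the nine-column broadPeak layout (chrom, chromStart, chromEnd, name, score, strand, signalValue, pValue, qValue) and takes the maximum signalValue per bead. The choice of signalValue and of the maximum as the aggregation rule are assumptions of this example; the viewer may use a different rule.

```python
def parse_broadpeak_line(line):
    """Parse one broadPeak line, keeping only the fields used for coloring."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        "signal": float(fields[6]),   # signalValue column
    }


def bead_intensities(beads, peaks):
    """Return one intensity per bead: the largest overlapping peak signal.

    beads -- list of (chrom, start, end) half-open intervals
    peaks -- list of dicts as returned by parse_broadpeak_line
    Beads without any overlapping peak get 0.0 (an assumption of this sketch).
    """
    values = []
    for chrom, bead_start, bead_end in beads:
        overlapping = [
            peak["signal"]
            for peak in peaks
            if peak["chrom"] == chrom
            and peak["start"] < bead_end
            and peak["end"] > bead_start
        ]
        values.append(max(overlapping, default=0.0))
    return values


if __name__ == "__main__":
    # Made-up peaks and beads inside the region shown in Figure 4.
    peak_lines = [
        "chr2\t24030000\t24031000\t.\t500\t.\t3.5\t-1\t-1\n",
        "chr2\t24030500\t24032000\t.\t700\t.\t8.0\t-1\t-1\n",
    ]
    peaks = [parse_broadpeak_line(line) for line in peak_lines]
    beads = [("chr2", 24026977, 24031500), ("chr2", 24040000, 24050000)]
    print(bead_intensities(beads, peaks))
```

The resulting per-bead values are what a color scale would then be applied to.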

Figure 4.

3D structure of chr2:24026977-24629787 region for the GM12878 genome. (A) Structure colored according to genomic position. The locations of CTCF (green) and RNA pol II (red) are indicated by spheres. (B) Strong and weak promoter sites identified by ChromHMM (ENCODE) are colored red. The promoter sites co-localize within the chromosome cluster. (C) Intensity of H3K4me1 histone marks (ENCODE). (D) Intensity of H3K4me3 histone marks (ENCODE).

There are two options for locally saving the structure. The current view can be saved as an image (png), suitable for publication and presentations. Additionally, the user can download the structure as an STL file, which is a common format for 3D rendering software.

DISCUSSION

The 3D-GNOME web server provides easy access to our 3D chromatin modeling platform and a wide array of analysis tools, all packaged in a convenient, user-friendly environment. A user with no computational expertise can easily create 3D models from 3C-like data, and we expect the availability of such a resource to greatly expand the opportunities for 3D chromatin analysis. Model generation typically takes a few minutes, and multiple structures can be requested simultaneously. The 1D arcs, 2D heatmaps and 3D structures generated by 3D-GNOME offer complementary representations of the library data. Together, these representations offer ample opportunities for data analysis, hypothesis generation, and testing. We believe that the 3D-GNOME web server is a valuable tool for researchers who are already interested in higher-order chromatin organization but lack either the experimental data (the first scenario) or the advanced simulation software (the second scenario) needed to infer 3D structures from their own interaction data.

23 in total

1. MOGEN: a tool for reconstructing 3D models of genomes from chromosomal conformation capturing data.

Authors: Tuan Trieu; Jianlin Cheng
Journal: Bioinformatics Date: 2015-12-31 Impact factor: 6.937

2. ChIA-PET tool for comprehensive chromatin interaction analysis with paired-end tag sequencing.

Authors: Guoliang Li; Melissa J Fullwood; Han Xu; Fabianus Hendriyan Mulawadi; Stoyan Velkov; Vinsensius Vega; Pramila Nuwantha Ariyaratne; Yusoff Bin Mohamed; Hong-Sain Ooi; Chandana Tennakoon; Chia-Lin Wei; Yijun Ruan; Wing-Kin Sung
Journal: Genome Biol Date: 2010-02-25 Impact factor: 13.583

Review 3. Restraint-based three-dimensional modeling of genomes and genomic domains.

Authors: François Serra; Marco Di Stefano; Yannick G Spill; Yasmina Cuartero; Michael Goodstadt; Davide Baù; Marc A Marti-Renom
Journal: FEBS Lett Date: 2015-05-14 Impact factor: 4.124

4. Predictive polymer modeling reveals coupled fluctuations in chromosome conformation and transcription.

Authors: Luca Giorgetti; Rafael Galupa; Elphège P Nora; Tristan Piolot; France Lam; Job Dekker; Guido Tiana; Edith Heard
Journal: Cell Date: 2014-05-08 Impact factor: 41.582

5. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.

Authors: Suhas S P Rao; Miriam H Huntley; Neva C Durand; Elena K Stamenova; Ivan D Bochkov; James T Robinson; Adrian L Sanborn; Ido Machol; Arina D Omer; Eric S Lander; Erez Lieberman Aiden
Journal: Cell Date: 2014-12-11 Impact factor: 41.582

6. An oestrogen-receptor-alpha-bound human chromatin interactome.

Authors: Melissa J Fullwood; Mei Hui Liu; You Fu Pan; Jun Liu; Han Xu; Yusoff Bin Mohamed; Yuriy L Orlov; Stoyan Velkov; Andrea Ho; Poh Huay Mei; Elaine G Y Chew; Phillips Yao Hui Huang; Willem-Jan Welboren; Yuyuan Han; Hong Sain Ooi; Pramila N Ariyaratne; Vinsensius B Vega; Yanquan Luo; Peck Yean Tan; Pei Ye Choy; K D Senali Abayratna Wansa; Bing Zhao; Kar Sian Lim; Shi Chi Leow; Jit Sin Yow; Roy Joseph; Haixia Li; Kartiki V Desai; Jane S Thomsen; Yew Kok Lee; R Krishna Murthy Karuturi; Thoreau Herve; Guillaume Bourque; Hendrik G Stunnenberg; Xiaoan Ruan; Valere Cacheux-Rataboul; Wing-Kin Sung; Edison T Liu; Chia-Lin Wei; Edwin Cheung; Yijun Ruan
Journal: Nature Date: 2009-11-05 Impact factor: 49.962

7. A three-dimensional model of the yeast genome.

Authors: Zhijun Duan; Mirela Andronescu; Kevin Schutz; Sean McIlwain; Yoo Jung Kim; Choli Lee; Jay Shendure; Stanley Fields; C Anthony Blau; William S Noble
Journal: Nature Date: 2010-05-02 Impact factor: 49.962

8. Topological domains in mammalian genomes identified by analysis of chromatin interactions.

Authors: Jesse R Dixon; Siddarth Selvaraj; Feng Yue; Audrey Kim; Yan Li; Yin Shen; Ming Hu; Jun S Liu; Bing Ren
Journal: Nature Date: 2012-04-11 Impact factor: 49.962

9. Chromatin Interaction Analysis with Paired-End Tag Sequencing (ChIA-PET) for mapping chromatin interactions and understanding transcription regulation.

Authors: Yufen Goh; Melissa J Fullwood; Huay Mei Poh; Su Qin Peh; Chin Thing Ong; Jingyao Zhang; Xiaoan Ruan; Yijun Ruan
Journal: J Vis Exp Date: 2012-04-30 Impact factor: 1.355

10. The sequencing bias relaxed characteristics of Hi-C derived data and implications for chromatin 3D modeling.

Authors: Cheng Peng; Liang-Yu Fu; Peng-Fei Dong; Zhi-Luo Deng; Jian-Xin Li; Xiao-Tao Wang; Hong-Yu Zhang
Journal: Nucleic Acids Res Date: 2013-08-21 Impact factor: 16.971
