Sarah B Reiff1, Andrew J Schroeder1, Koray Kırlı1, Andrea Cosolo1, Clara Bakker1, Soohyun Lee1, Alexander D Veit1, Alexander K Balashov1, Carl Vitzthum1, William Ronchetti1, Kent M Pitman1, Jeremy Johnson1, Shannon R Ehmsen1, Peter Kerpedjiev1, Nezar Abdennur2, Maxim Imakaev2, Serkan Utku Öztürk3, Uğur Çamoğlu3, Leonid A Mirny2,4, Nils Gehlenborg1, Burak H Alver1, Peter J Park5.
Abstract
The 4D Nucleome (4DN) Network aims to elucidate the complex structure and organization of chromosomes in the nucleus and the impact of their disruption in disease biology. We present the 4DN Data Portal ( https://data.4dnucleome.org/ ), a repository for datasets generated in the 4DN Network and relevant external datasets. Datasets were generated with a wide range of experiments, including chromosome conformation capture assays such as Hi-C and other innovative sequencing and microscopy-based assays probing chromosome architecture. Altogether, the 4DN Data Portal hosts more than 1800 experiment sets and 36000 files. Results of sequencing-based assays from different laboratories are uniformly processed and quality-controlled. The portal interface allows easy browsing, filtering, and bulk downloads, and the integrated HiGlass genome browser allows interactive visualization and comparison of multiple datasets. The 4DN Data Portal represents a primary resource for chromosome contact and other nuclear architecture data for the scientific community.
Year: 2022 PMID: 35501320 PMCID: PMC9061818 DOI: 10.1038/s41467-022-29697-4
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Genomic assay types in the 4D Nucleome Data Portal.
| Experiment type | No. of public experiment sets |
|---|---|
| in situ Hi-C | 335 |
| Dilution Hi-C | 118 |
| DNase Hi-C | 21 |
| Micro-C | 26 |
| Single-cell Hi-C | 11 |
| Single nucleus Hi-C | 17 |
| sci-Hi-C | 28 |
| Capture Hi-C | 40 |
| TCC | 14 |
| ChIA-PET | 4 |
| in situ ChIA-PET | 10 |
| ChIA-Drop | 2 |
| PLAC-seq | 18 |
| ChIP-seq | 141 |
| CUT&RUN | 61 |
| CUT&Tag | 2 |
| Repli-seq | 138 |
| SPRITE | 3 |
| DamID | 66 |
| ATAC-seq | 21 |
| RNA-seq | 90 |
| TRIP | 7 |
| NAD-seq | 8 |
| TSA-seq | 67 |
| MARGI | 6 |
| GAM | 6 |
| RE-seq (DpnII-seq) | 11 |
| Bru-seq | 1 |
| MC-3C | 1 |
Bold rows indicate categories of genomic assays and their subtotal counts.
4D Nucleome analysis pipelines.
| Pipeline | Steps | Software | Available file formats | CWL/WDL filename |
|---|---|---|---|---|
| Hi-C¹ | Alignment | bwa-mem | .bam | bwa-mem.cwl |
| | Filtering | pairtools | .pairs | hi-c-processing-bam.cwl |
| | Merging replicates & matrix aggregation | cooler | .hic, .mcool | hi-c-processing-pairs.cwl |
| MARGI² | Alignment | bwa-mem | .bam | imargi-processing-fastq.cwl |
| | Filtering | pairtools | .pairs | imargi-processing-bam.cwl |
| | Merging replicates & matrix aggregation | cooler | .mcool | imargi-processing-pairs.cwl |
| Repli-seq³ | Alignment | bwa-mem | .bam | repliseq-parta.cwl |
| | Filtering | samtools | - | |
| | Binning & aggregation | bedtools | .bw, .bg | |
| CUT&RUN⁴ | Alignment & filtering | bowtie2, Picard, samtools, bedtools | .bam, .bedpe | cut-and-run-processing.cwl |
| | Peak calling | SEACR | .bw, .bg, .bed | cut-and-run-postaln.cwl |
| ATAC-seq⁵ | Alignment & filtering | bowtie2, bedtools | .bed | atac.wdl |
| | Peak calling | MACS2 | .bw, .bigbed | |
| ChIP-seq⁶ | Alignment & filtering | bwa, bedtools | .bed | chip.wdl |
| | Peak calling | MACS2, SPP | .bw, .bigbed | |
| RNA-seq⁷ | Alignment | STAR | .bam | rna-seq-pipeline.wdl |
| | Expression quantification | RSEM | .tsv | |
| | Read coverage | STAR | .bw | |
Listed below are (i) subdirectories for Docker images from https://hub.docker.com/r/4dndcic; (ii) subdirectories from github repositories at https://github.com/4dn-dcic/ that hold the CWL or WDL pipeline files; and (iii) subdirectories for more information from https://data.4dnucleome.org/resources/data-analysis/. Note that for the Repliseq pipeline as well as the WDL pipelines from ENCODE, there is only one workflow file for the whole pipeline.
¹ 4dn-hic, docker-4dn-hic/tree/v43/cwl, hi_c-processing-pipeline.
² imargi, iMARGI-Docker/tree/v1.1.1_dcic_4/src/cwl, imargi-pipeline.
³ repliseq, docker-4dn-repliseq/tree/v16/cwl, repli-seq-processing-pipeline.
⁴ cut-and-run-pipeline, docker-4dn-cut-and-run-pipeline/tree/v1/cwl, cut-and-run-pipeline.
⁵ encode-atacseq, atac-seq-pipeline, atacseq-processing-pipeline.
⁶ encode-chipseq, chip-seq-pipeline2, chipseq-processing-pipeline.
⁷ encode-rnaseq, rna-seq-pipeline, rnaseq-processing-pipeline.
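The note above gives three base locations, and each numbered footnote gives three subdirectories. The following is a minimal sketch of how those pieces could be joined into full URLs. The base URLs and subdirectory names are copied from the note and footnotes 1 to 7. The dictionary layout and the helper name `resource_urls` are illustrative assumptions, and the sketch assumes simple path concatenation without checking that each URL resolves.

```python
# Base locations quoted in the table note.
DOCKER_BASE = "https://hub.docker.com/r/4dndcic"
GITHUB_BASE = "https://github.com/4dn-dcic"
DOCS_BASE = "https://data.4dnucleome.org/resources/data-analysis"

# (Docker image, GitHub path, documentation page) per pipeline,
# copied from footnotes 1 to 7 of the pipeline table.
PIPELINE_RESOURCES = {
    "Hi-C": ("4dn-hic", "docker-4dn-hic/tree/v43/cwl", "hi_c-processing-pipeline"),
    "MARGI": ("imargi", "iMARGI-Docker/tree/v1.1.1_dcic_4/src/cwl", "imargi-pipeline"),
    "Repli-seq": ("repliseq", "docker-4dn-repliseq/tree/v16/cwl",
                  "repli-seq-processing-pipeline"),
    "CUT&RUN": ("cut-and-run-pipeline",
                "docker-4dn-cut-and-run-pipeline/tree/v1/cwl",
                "cut-and-run-pipeline"),
    "ATAC-seq": ("encode-atacseq", "atac-seq-pipeline", "atacseq-processing-pipeline"),
    "ChIP-seq": ("encode-chipseq", "chip-seq-pipeline2", "chipseq-processing-pipeline"),
    "RNA-seq": ("encode-rnaseq", "rna-seq-pipeline", "rnaseq-processing-pipeline"),
}


def resource_urls(pipeline):
    """Hypothetical helper: join each base URL with the footnote subdirectory."""
    image, repo_path, docs_page = PIPELINE_RESOURCES[pipeline]
    return {
        "docker": f"{DOCKER_BASE}/{image}",
        "github": f"{GITHUB_BASE}/{repo_path}",
        "docs": f"{DOCS_BASE}/{docs_page}",
    }


# Print the three locations for every pipeline in the table.
for name in PIPELINE_RESOURCES:
    for kind, url in resource_urls(name).items():
        print(f"{name:10s} {kind:7s} {url}")
```

Keeping the footnote values in one mapping makes it easy to update a version tag such as `v43` in a single place.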
Microscopy assay types in the 4D Nucleome Data Portal.
| Experiment type | No. of public experiment sets |
|---|---|
| DNA FISH | 275 |
| Immunofluorescence | 138 |
| SPT | 101 |
| RNA FISH | 77 |
| Optodroplet | 13 |
| Electron Tomography | 3 |
| Multiplexed FISH | 1 |
| Total | 608 |
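The counts in the two assay tables can be cross-checked against the reported microscopy total and against the abstract's figure of more than 1800 experiment sets. The sketch below copies the per-assay counts from the tables. The variable names are illustrative, and the genomic sum covers only the rows listed here.

```python
# Per-assay counts copied from the genomic assay table.
GENOMIC_COUNTS = {
    "in situ Hi-C": 335, "Dilution Hi-C": 118, "DNase Hi-C": 21, "Micro-C": 26,
    "Single-cell Hi-C": 11, "Single nucleus Hi-C": 17, "sci-Hi-C": 28,
    "Capture Hi-C": 40, "TCC": 14, "ChIA-PET": 4, "in situ ChIA-PET": 10,
    "ChIA-Drop": 2, "PLAC-seq": 18, "ChIP-seq": 141, "CUT&RUN": 61,
    "CUT&Tag": 2, "Repli-seq": 138, "SPRITE": 3, "DamID": 66, "ATAC-seq": 21,
    "RNA-seq": 90, "TRIP": 7, "NAD-seq": 8, "TSA-seq": 67, "MARGI": 6,
    "GAM": 6, "RE-seq (DpnII-seq)": 11, "Bru-seq": 1, "MC-3C": 1,
}

# Per-assay counts copied from the microscopy assay table.
MICROSCOPY_COUNTS = {
    "DNA FISH": 275, "Immunofluorescence": 138, "SPT": 101, "RNA FISH": 77,
    "Optodroplet": 13, "Electron Tomography": 3, "Multiplexed FISH": 1,
}
REPORTED_MICROSCOPY_TOTAL = 608  # "Total" row of the microscopy table

genomic_total = sum(GENOMIC_COUNTS.values())
microscopy_total = sum(MICROSCOPY_COUNTS.values())
combined_total = genomic_total + microscopy_total

print("genomic experiment sets:   ", genomic_total)
print("microscopy experiment sets:", microscopy_total)
print("combined:                  ", combined_total)
print("matches reported microscopy total:",
      microscopy_total == REPORTED_MICROSCOPY_TOTAL)
print("consistent with 'more than 1800':", combined_total > 1800)
```

The microscopy rows sum to the reported 608, and the combined count of listed rows is 1881, in line with the abstract.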
Fig. 1 Browsing 4DN experiment sets.
The browse view features a table of experiment sets, the second of which can be seen expanded here to show additional metadata and information about files. On the left are a number of properties that can be used to filter the results; here “in situ Hi-C” is selected as well as Tier 1 samples. The top two experiment sets in the table are also shown with their checkboxes checked so that the “Download Files” button above the table can be used.
Fig. 2 Item page for a replicate set of Hi-C experiments.
A source publication is shown near the top of the page, when available. Below it are two dropdown boxes: the first, titled Assay Description, can be expanded for an explanation of the assay, and the second, titled Experiment Set Properties, contains a selection of basic metadata fields. Below these is a window with several tabs. The selected tab shows the processed files associated with the experiment. The .mcool contact matrix file is visualized on the left using the integrated HiGlass browser as a 2D track, with TAD boundaries, insulation scores, and compartments as 1D tracks above it. This display can be expanded for further data exploration. Scrolling down the page reveals the quality control metrics associated with the processed data.
Fig. 3 HiGlass Display containing 2D and 1D Genomic Tracks.
The display shows a visualization of an HFF in situ Hi-C contact matrix on the left, and one for H1 hESCs on the right. Above each 2D contact matrix visualization are 1D tracks from a TSA-seq experiment (topmost) and a DamID-seq experiment (second from top), in the cell type that matches the corresponding matrix. The “Add Data” button in the top left gives the user the option to add more files to the visualization display; on the top right, the user has the ability to save the display, clone the display to create a new item, or to manage permissions of who can view the display.
Fig. 4 Microscopy experiment set item page.
Item pages for microscopy experiment sets are similar to those for genomics, but with a few important differences. The sample image field contains an image preview, which can be clicked for a popup containing an interactive image viewer. Beside the sample image is a field called “Imaging Paths”, which describes what is imaged in each channel for the experiments, including the biological target and the antibodies or probes used.
Fig. 5 4DN Data Portal architecture.
Metadata is submitted by users via spreadsheet forms and loaded into the database, while associated files are uploaded into cloud storage. Once metadata and files are in place, automated processing pipelines can be run on AWS using the Tibanna pipeline runner. All metadata can then be searched via the data portal website, where files can also be visualized with HiGlass. The data portal website is accessible to external users, who can also log in to download files.