diffloop: a computational framework for identifying and analyzing differential DNA loops from sequencing data.

Caleb A Lareau, Martin J Aryee

Abstract

Summary: The 3D architecture of DNA within the nucleus is a key determinant of interactions between genes, regulatory elements, and transcriptional machinery. As a result, differences in DNA looping structure are associated with variation in gene expression and cell state. To systematically assess changes in DNA looping architecture between samples, we introduce diffloop, an R/Bioconductor package that provides a suite of functions for the quality control, statistical testing, annotation, and visualization of DNA loops. We demonstrate this functionality by detecting differences between ENCODE ChIA-PET samples and relate looping to variability in epigenetic state.

Availability and implementation: Diffloop is implemented as an R/Bioconductor package available at https://bioconductor.org/packages/release/bioc/html/diffloop.html.

Contact: aryee.martin@mgh.harvard.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Year:  2018        PMID: 29028898      PMCID: PMC5860605          DOI: 10.1093/bioinformatics/btx623

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The organization of DNA within the nucleus into hierarchical 3D structures plays a key role in regulating gene expression by determining the accessibility of genes to the transcriptional machinery as well as the proximity of genes to their distal regulatory elements. Differences in 3D architecture, such as the presence or absence of loops between enhancers and their target genes, are associated with transcriptional variation in both normal and disease states. Several recent studies have implicated alterations in genome topology in a diverse set of diseases (Flavahan et al.; Hnisz et al.). Experimental techniques that couple chromosome conformation capture (3C; Dekker et al.) with high-throughput sequencing have made the genome-wide identification of 3D interactions feasible. For example, the high-throughput chromosome conformation capture (Hi-C) assay, which can theoretically yield a near-complete map of chromatin interactions, has been used to map the 3D genome at 1-kb resolution (Rao et al.). As Hi-C requires billions of reads to achieve this resolution, methods such as Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) (Tang et al.), HiChIP (Mumbach et al.), or promoter-capture Hi-C (Mifsud et al.) use capture techniques to enrich for specific subsets of interactions, such as structural loops or enhancer–promoter interactions, allowing lower sequencing depths. These assays, when coupled with appropriate preprocessing tools (Cairns et al.; Phanstiel et al.), produce interaction frequencies between pairs of genomic loci. To fully explore the role that 3D genome organization plays in determining normal and pathogenic cell states, statistical tools are needed to identify differences in DNA loops, much as differential expression analysis is applied to transcriptional data.
Additionally, the systematic integration of biological prior knowledge, such as the location of active enhancer regions, into topology analyses can provide annotation and insight into the regulatory role of a loop. To address these needs, we have developed diffloop, an R/Bioconductor package that implements statistical testing for differential DNA looping between samples from ChIA-PET, HiChIP, and related 3C assays. While existing tools such as DiffBind (Stark and Brown, 2015) and diffHic (Lun and Smyth, 2015) provide functionality for identifying differential features from ChIP-seq and Hi-C experiments, diffloop provides a suite of functions tailored to chromatin loop data. Here, we briefly demonstrate some of the utility of the diffloop package by comparing chromatin interactions inferred from ChIA-PET replicates between the MCF7 and K562 cell lines (ENCODE Project Consortium, 2012).

2 Materials and methods

Following import of raw loop read counts, diffloop combines counts across samples and assigns statistical significance to each putative loop using the method developed by Phanstiel et al. The calculation uses a model that takes into account the signal intensity at each of the anchors and the expected background chromatin interaction frequency for the given anchor separation distance. To identify differential loops, diffloop by default applies the statistical test in edgeR (Robinson et al.), in which counts are modeled using the negative binomial distribution and an empirical Bayes procedure is used to moderate the degree of overdispersion. Rather than representing reads mapped to genes or transcripts, as is typical in a differential expression analysis, the counts matrix contains the number of PETs (paired-end tags, i.e. paired-end reads) supporting each loop in each sample. A scaling size factor is calculated for each sample to account for variations in read depth. diffloop also provides functionality for annotating loops and loop anchors to facilitate interpretation of the functional relevance of the significantly differential loops identified. A typical use case involves annotating anchors with chromatin mark data, promoter overlap, and gene expression levels. Loops may then be assigned to categories such as CTCF-CTCF or enhancer-promoter based on these annotations and visualized using diffloop's plotting functions.
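The depth scaling and anchor-based categorization described above can be illustrated with a small, package-independent sketch. It is written in Python rather than R and does not use the diffloop API. The text says only that a scaling size factor is calculated per sample, so the median-of-ratios scheme below is an assumed stand-in for illustration. The function names, annotation labels, and toy counts are likewise hypothetical.

```python
from math import exp, log
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a loops x samples matrix of PET counts.

    Assumption: this scheme is chosen for illustration only; the source does
    not state which size-factor method is used.
    """
    n_samples = len(counts[0])
    # Use only loops seen in every sample so that every logarithm is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no loop has a nonzero count in every sample")
    # Log geometric mean of each usable loop across samples.
    log_means = [sum(log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [log(row[j]) - m for row, m in zip(usable, log_means)]
        factors.append(exp(median(log_ratios)))
    return factors


def classify_loop(left, right):
    """Label a loop from the sets of annotations on its two anchors."""
    pair = (left, right)
    if any("enhancer" in a and "promoter" in b for a, b in (pair, pair[::-1])):
        return "enhancer-promoter"
    if "enhancer" in left and "enhancer" in right:
        return "enhancer-enhancer"
    if "CTCF" in left and "CTCF" in right:
        return "CTCF-CTCF"
    return "unannotated"


# Hypothetical toy data: 4 loops x 2 samples; sample 2 is sequenced twice as deeply.
toy_counts = [[10, 20], [5, 10], [0, 3], [8, 16]]
factors = size_factors(toy_counts)
# Dividing by the size factor puts both samples on a comparable scale.
normalized = [[c / f for c, f in zip(row, factors)] for row in toy_counts]
```

In this toy matrix the second sample has exactly twice the counts of the first for every loop seen in both, so its size factor comes out twice as large and the normalized counts agree across samples.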

3 Results

POL2 ChIA-PET data from two MCF7 and two K562 samples were individually preprocessed from raw reads to loops using the Mango preprocessing pipeline (Phanstiel et al.). Across the union of the four samples considered for our analyses, we observed a total of 87 456 anchor pairs involving 24 576 autosomal loci (anchors). After filtering out loci biased by copy number variation, loops detected in only a single sample, and anchor pairs with interaction frequencies within the range of the background signal (Phanstiel et al.), we retained 9320 loops for differential testing (see Supplementary Material). At an FDR of 1%, we identified 2633 differential loops between the cell lines, including 1974 loops that were annotated as enhancer–promoter loops. Supplementary Table S3 summarizes the top five differential enhancer–promoter loops specific to each cell line. Figure 1 provides a sample visualization of one of these differential loop regions, where multiple loops near the MTHFR gene were more prevalent in the K562 cell line than in the MCF7 cell line.
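Two of the steps above, dropping loops detected in only a single sample and calling loops at an FDR threshold, can be sketched as follows. This is an illustrative Python sketch, not the diffloop implementation. It assumes Benjamini-Hochberg adjustment, which the text does not name, and the helper names, counts, and p-values are hypothetical.

```python
def keep_replicated(counts, min_samples=2):
    """Indices of loops with a nonzero PET count in at least `min_samples` samples."""
    return [
        i for i, row in enumerate(counts)
        if sum(c > 0 for c in row) >= min_samples
    ]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data: 3 loops x 4 samples (two replicates per cell line).
toy_counts = [[4, 0, 0, 0], [3, 2, 9, 11], [0, 0, 5, 6]]
kept = keep_replicated(toy_counts)

# Hypothetical per-loop p-values from a differential test.
toy_pvalues = [0.001, 0.01, 0.03, 0.5]
adjusted = benjamini_hochberg(toy_pvalues)
significant = [i for i, q in enumerate(adjusted) if q <= 0.01]
```

Filtering before testing reduces the number of hypotheses, which in turn makes the multiple-testing adjustment less severe for the loops that remain.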
Fig. 1

Sample visualizations of differential looping. The figure shows the combined POL2 ChIA-PET replicates for the K562 and MCF-7 cell lines as well as the cell type-specific H3K27ac ChIP-Seq track. Line widths are indicative of the number of PETs supporting a loop while colors represent biological annotation (red, enhancer-promoter; purple, enhancer-enhancer; black, no special annotation). The region highlighted contains the MTHFR gene, which has previously been implicated as an up-regulated feature of human leukemias such as the K562 cell line.

To characterize the structural differences globally, we identified nine pathways enriched for genes involved in differential enhancer–promoter looping (see Supplementary Material). Genes related to estrogen response, such as GREB1 and XBP1, are linked by several strong loops to unique enhancers in the MCF-7 breast cancer cell line. Conversely, targets of the c-MYC transcription factor, which plays a well-documented role in leukemia and hematopoiesis, were enriched in K562. These results suggest that differential topology analyses can systematically uncover known and novel regulatory loops related to disease and other phenotypes of interest. Thus, we suggest that cell type-specific chromatin loops such as those identified here by diffloop can serve as a valuable epigenetic feature for characterizing cell identity.
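The enrichment procedure behind the nine pathways is described in the Supplementary Material and is not reproduced here. As a generic illustration of how over-representation of a pathway among looped genes might be scored, the Python sketch below computes a one-sided hypergeometric tail probability. The choice of test, the function name, and the numbers are all assumptions.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without
    replacement from n_universe genes, of which n_pathway are in the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 genes in total, 3 in the pathway, 3 genes involved
# in differential loops, and all 3 of those fall in the pathway.
p_value = hypergeom_enrichment_p(10, 3, 3, 3)
```

With several pathways tested at once, the resulting p-values would still need a multiple-testing adjustment before being interpreted.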
References (10 of 12 shown)

1.  CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription.

Authors:  Zhonghui Tang; Oscar Junhong Luo; Xingwang Li; Meizhen Zheng; Jacqueline Jufen Zhu; Przemyslaw Szalaj; Pawel Trzaskoma; Adriana Magalska; Jakub Wlodarczyk; Blazej Ruszczycki; Paul Michalski; Emaly Piecuch; Ping Wang; Danjuan Wang; Simon Zhongyuan Tian; May Penrad-Mobayed; Laurent M Sachs; Xiaoan Ruan; Chia-Lin Wei; Edison T Liu; Grzegorz M Wilczynski; Dariusz Plewczynski; Guoliang Li; Yijun Ruan
Journal:  Cell       Date:  2015-12-10       Impact factor: 41.582

2.  Mango: a bias-correcting ChIA-PET analysis pipeline.

Authors:  Douglas H Phanstiel; Alan P Boyle; Nastaran Heidari; Michael P Snyder
Journal:  Bioinformatics       Date:  2015-06-01       Impact factor: 6.937

3.  Capturing chromosome conformation.

Authors:  Job Dekker; Karsten Rippe; Martijn Dekker; Nancy Kleckner
Journal:  Science       Date:  2002-02-15       Impact factor: 47.728

4.  A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.

Authors:  Suhas S P Rao; Miriam H Huntley; Neva C Durand; Elena K Stamenova; Ivan D Bochkov; James T Robinson; Adrian L Sanborn; Ido Machol; Arina D Omer; Eric S Lander; Erez Lieberman Aiden
Journal:  Cell       Date:  2014-12-11       Impact factor: 41.582

5.  Activation of proto-oncogenes by disruption of chromosome neighborhoods.

Authors:  Denes Hnisz; Abraham S Weintraub; Daniel S Day; Anne-Laure Valton; Rasmus O Bak; Charles H Li; Johanna Goldmann; Bryan R Lajoie; Zi Peng Fan; Alla A Sigova; Jessica Reddy; Diego Borges-Rivera; Tong Ihn Lee; Rudolf Jaenisch; Matthew H Porteus; Job Dekker; Richard A Young
Journal:  Science       Date:  2016-03-03       Impact factor: 47.728

6.  CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data.

Authors:  Jonathan Cairns; Paula Freire-Pritchett; Steven W Wingett; Csilla Várnai; Andrew Dimond; Vincent Plagnol; Daniel Zerbino; Stefan Schoenfelder; Biola-Maria Javierre; Cameron Osborne; Peter Fraser; Mikhail Spivakov
Journal:  Genome Biol       Date:  2016-06-15       Impact factor: 13.583

7.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Authors:  Mark D Robinson; Davis J McCarthy; Gordon K Smyth
Journal:  Bioinformatics       Date:  2009-11-11       Impact factor: 6.937

8.  An integrated encyclopedia of DNA elements in the human genome.

Authors:  ENCODE Project Consortium
Journal:  Nature       Date:  2012-09-06       Impact factor: 49.962

9.  diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data.

Authors:  Aaron T L Lun; Gordon K Smyth
Journal:  BMC Bioinformatics       Date:  2015-08-19       Impact factor: 3.169

10.  Insulator dysfunction and oncogene activation in IDH mutant gliomas.

Authors:  William A Flavahan; Yotam Drier; Brian B Liau; Shawn M Gillespie; Andrew S Venteicher; Anat O Stemmer-Rachamimov; Mario L Suvà; Bradley E Bernstein
Journal:  Nature       Date:  2015-12-23       Impact factor: 49.962

