
Chicdiff: a computational pipeline for detecting differential chromosomal interactions in Capture Hi-C data.

Jonathan Cairns, William R Orchard, Valeriya Malysheva, Mikhail Spivakov

Abstract

SUMMARY: Capture Hi-C is a powerful approach for detecting chromosomal interactions involving, at least on one end, DNA regions of interest, such as gene promoters. We present Chicdiff, an R package for robust detection of differential interactions in Capture Hi-C data. Chicdiff enhances a state-of-the-art differential testing approach for count data with bespoke normalization and multiple testing procedures that account for the specific statistical properties of Capture Hi-C data. We validate Chicdiff on published Promoter Capture Hi-C data in human monocytes and CD4+ T cells, identifying many cell type-specific interactions and confirming the overall positive association between promoter interactions and gene expression.
AVAILABILITY AND IMPLEMENTATION: Chicdiff is implemented as an R package that is publicly available at https://github.com/RegulatoryGenomicsGroup/chicdiff.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press.


Bioinformatics, 2019. ISSN: 1367-4803. PMID: 31197313; PMCID: PMC6853696; DOI: 10.1093/bioinformatics/btz450


1 Introduction

Differential signal detection in sequencing data is one of the most common tasks in genomic analyses. Multiple tools have been developed for this purpose, many of which, including DESeq and edgeR, are based on negative binomial models for count data (Anders and Huber, 2010; Robinson et al.). Such tools are theoretically suitable for the analysis of most sequencing data types, including chromatin immunoprecipitation and Hi-C data, which has led to the development of wrapper packages around DESeq and edgeR that facilitate differential analyses of such data (Lareau and Aryee, 2018; Ross-Innes et al.). However, both of these algorithms were developed with standard RNA sequencing data in mind and therefore neither account for nor benefit from the specific properties of data generated by other assays, prompting the development of assay-specific differential analysis tools (Chen et al.; Liu and Ruan, 2017; Stansfield et al.; Xu et al.).

Capture Hi-C (CHi-C) is a powerful experimental technique for detecting chromosomal interactions globally and at high resolution (Schoenfelder et al.). In CHi-C, the genome-wide pulldown of pairs of interacting genomic fragments by Hi-C is followed by sequence capture, which selectively enriches the Hi-C material for interactions involving (at least on one end) fragments of interest, termed ‘baits’. Differential analyses of CHi-C data are challenging because of sample normalization issues, data sparsity and signal-to-noise ratios that vary across interaction distances and capture baits; standard differential analysis algorithms do not account for these properties. We have previously reported CHiCAGO, a statistical pipeline for robust detection of significant interactions in CHi-C data from a single condition (Cairns et al.).

Here, we present Chicdiff, an R package for differential CHi-C data analysis. Chicdiff combines the moderated differential testing for count data implemented in DESeq2 (Love et al.) with CHi-C-specific procedures for signal normalization (informed by CHiCAGO) and P-value weighting. Together, these procedures enable robust and sensitive detection of differential interactions in CHi-C data.

2 Approach

A schematic of the overall analysis approach is presented in Supplementary Figure S1. The following sections and the Supplementary Note describe specific steps in more detail.

2.1 Feature selection

CHi-C data are often sparse, particularly at large interaction distances, which limits the power of differential signal detection. This problem can be partly mitigated by exploiting the fact that CHi-C signals commonly spread to adjacent fragments (Eijsbouts et al.), most likely because nearby specific interactions tether these fragments into the vicinity of the baits. Therefore, to increase power, Chicdiff pools reads across several fragments (by default, five in each direction) surrounding each interacting fragment of interest for each bait. Chicdiff also provides functionality to prioritize fragment-level interactions post hoc within each detected differentially interacting region (see Supplementary Note).
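The pooling step can be pictured as a sliding-window sum over consecutive other-end fragments. The sketch below is a minimal NumPy illustration under our own assumptions, not Chicdiff's R implementation: counts for one bait are held in a dense array whose rows are consecutive restriction fragments in genomic order and whose columns are samples, and `pool_counts`, its `k` parameter (mirroring the default of five fragments in each direction) and the toy data are hypothetical.

```python
import numpy as np


def pool_counts(counts, k=5):
    """Pool read counts over a window of +/- k neighbouring fragments.

    Hypothetical helper. `counts` holds the counts for ONE bait, with rows
    ordered by the genomic position of consecutive other-end fragments and
    one column per sample (a 1-D array is treated as a single sample).
    Row i of the result is the sum over fragments i-k .. i+k, truncated at
    the ends of the array.
    """
    counts = np.asarray(counts)
    if counts.ndim == 1:
        counts = counts[:, None]
    n = counts.shape[0]
    # Prefix sums with a leading zero row: sum(counts[lo:hi]) == csum[hi] - csum[lo]
    csum = np.vstack([
        np.zeros((1, counts.shape[1]), dtype=counts.dtype),
        np.cumsum(counts, axis=0),
    ])
    idx = np.arange(n)
    lo = np.clip(idx - k, 0, n)
    hi = np.clip(idx + k + 1, 0, n)
    return csum[hi] - csum[lo]


# Toy example: one bait, 12 consecutive fragments, two samples.
toy = np.array([[0, 1], [2, 0], [0, 0], [5, 1], [9, 3], [4, 2],
                [0, 0], [1, 0], [0, 0], [0, 4], [3, 6], [0, 2]])
pooled = pool_counts(toy, k=1)
# Keep only the windows centred on fragments of interest (hypothetical indices).
of_interest = [4, 10]
print(pooled[of_interest])  # expected pooled counts: [18, 6] and [3, 12]
```

Prefix sums keep the cost linear in the number of fragments whatever the window size. Real CHi-C counts are usually stored sparsely, and windows around nearby fragments of interest can overlap; how such overlaps are handled is not modelled in this sketch.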

2.2 Data normalization and significance testing

Typically, in differential count analyses, a single normalization (scaling) factor is estimated per sample to account for differences in library size. However, we found that normalization of CHi-C data can be further improved by taking into account between-sample differences in the background levels of specific fragment pairs. In CHi-C, unlike many other data types such as RNA-seq, such background estimates can be obtained from the data themselves, and procedures for this are implemented in the Chicago package (Cairns et al.). Chicdiff combines scaling factors based on these background estimates with sample-level scaling factors in a way that minimizes the total dispersion of read counts across replicates and conditions at each interaction. The resulting count and scaling-factor matrices are provided as input to the DESeq2 package, which tests each interaction for differences between conditions using a negative binomial model with moderated dispersion estimation.
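To make the idea of combining sample-level and background-based factors concrete, here is a minimal NumPy sketch. The sample-level part is the standard DESeq median-of-ratios size factor. The combination rule is our own assumption and not Chicdiff's exact procedure: each factor is taken as s_j · (b_ij / geometric mean of b_i)^alpha, rescaled to a row-wise geometric mean of 1, and a single exponent `alpha` is chosen by grid search to minimize a simple dispersion proxy (summed per-interaction variance of log1p-transformed normalized counts). The names `median_of_ratios`, `combined_norm_factors` and the toy data are hypothetical. A per-interaction, per-sample matrix of this kind is what DESeq2 accepts through its `normalizationFactors()` replacement function in place of per-sample size factors.

```python
import numpy as np


def median_of_ratios(counts):
    """DESeq-style size factors (median-of-ratios), one per sample.

    counts: (n_interactions, n_samples). Only rows without zero counts
    are used, as in the median-of-ratios estimator.
    """
    counts = np.asarray(counts, dtype=float)
    log_c = np.log(counts[np.all(counts > 0, axis=1)])
    log_ratios = log_c - log_c.mean(axis=1, keepdims=True)  # vs. row geometric mean
    return np.exp(np.median(log_ratios, axis=0))


def combined_norm_factors(counts, background, alphas=np.linspace(0.0, 1.0, 11)):
    """Hypothetical combination of sample-level and background-based factors.

    background: (n_interactions, n_samples) expected background counts per
    fragment pair (assumed positive), such as those CHiCAGO estimates.
    Returns the chosen alpha and the (n_interactions, n_samples) factor matrix.
    """
    counts = np.asarray(counts, dtype=float)
    log_b = np.log(np.asarray(background, dtype=float))
    log_b_rel = log_b - log_b.mean(axis=1, keepdims=True)  # relative background per row
    log_s = np.log(median_of_ratios(counts))[None, :]       # sample-level factors
    best = None
    for alpha in alphas:
        log_nf = log_s + alpha * log_b_rel
        log_nf = log_nf - log_nf.mean(axis=1, keepdims=True)  # row geometric mean = 1
        nf = np.exp(log_nf)
        # Dispersion proxy (an assumption): spread of normalized counts per interaction.
        dispersion = np.var(np.log1p(counts / nf), axis=1).sum()
        if best is None or dispersion < best[0]:
            best = (dispersion, float(alpha), nf)
    return best[1], best[2]


# Toy data (not real CHi-C): counts that track the background exactly.
background = np.array([[1.0, 4.0], [4.0, 1.0], [1.0, 4.0], [4.0, 1.0]])
counts = 10 * background
alpha, nf = combined_norm_factors(counts, background)
print(alpha)  # 1.0: fully background-based factors remove all between-sample spread here
```

When the background is identical across samples, every `alpha` gives the same dispersion and the sketch falls back to the pure sample-level factors (`alpha = 0`).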

2.3 Weighted multiple testing treatment

As with other Hi-C-derived data types, signal-to-noise ratios and effect sizes in CHi-C data vary strongly with interaction distance. This makes a strong case for non-uniform multiple testing correction, in which P-values from differential tests on longer-range interactions are corrected more stringently than those on short-range interactions. To achieve this, Chicdiff uses the Independent Hypothesis Weighting (IHW) method (Ignatiadis et al.) to learn distance-based P-value weights in a way that maximizes the number of rejected null hypotheses. However, training IHW weights on the test regions is not appropriate, since their P-values are often not uniformly distributed under the null because of selection bias, which violates IHW’s core assumption. Therefore, we instead learn the weights on a separate ‘weight training set’ of fragment pairs drawn at random from the full interaction count data for each sample (i.e. not limited to CHiCAGO-detected significant interactions), thus avoiding selection bias. The distance-dependent weights learned in this way are then applied to the P-values in the test set, and the resulting weighted P-values are reported to the user.
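The train-then-apply logic can be illustrated with a minimal sketch of distance-binned weights followed by a weighted Benjamini-Hochberg (BH) adjustment, in which each P-value is divided by a weight that averages 1 across the tested hypotheses before standard BH is applied. IHW itself learns the weights by maximizing the number of rejections. Here it is deliberately swapped for a much simpler heuristic, a per-bin Storey estimate of the fraction of non-null tests, so `learn_bin_weights`, the distance-bin edges and the simulated data are all hypothetical and stand in for the real IHW step.

```python
import numpy as np


def storey_pi0(p, lam=0.5):
    """Storey's estimate of the proportion of true null hypotheses."""
    p = np.asarray(p, dtype=float)
    return min(1.0, float(np.mean(p > lam)) / (1.0 - lam))


def learn_bin_weights(train_p, train_bins, n_bins, floor=1e-3):
    """Hypothetical stand-in for IHW: weight each distance bin by its
    estimated fraction of non-null tests, 1 - pi0 (floored to stay positive)."""
    train_p = np.asarray(train_p, dtype=float)
    train_bins = np.asarray(train_bins)
    w = np.full(n_bins, floor)
    for b in range(n_bins):
        pb = train_p[train_bins == b]
        if pb.size:
            w[b] = max(1.0 - storey_pi0(pb), floor)
    return w


def weighted_bh(p, w):
    """Weighted Benjamini-Hochberg: BH applied to p_i / w_i, with the
    weights first rescaled to average 1 over the tested hypotheses."""
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    q = p / (w / w.mean())
    m = q.size
    order = np.argsort(q)
    scaled = q[order] * m / np.arange(1, m + 1)
    adj = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out


# Toy usage with simulated data (not real CHi-C results).
rng = np.random.default_rng(0)
edges = np.array([1e5, 3e5, 1e6])  # hypothetical distance-bin edges (bp)
n_bins = edges.size + 1

# 'Weight training set': random fragment pairs, not pre-selected by significance.
train_dist = rng.uniform(1e4, 2e6, size=5000)
train_p = np.where(train_dist < 3e5, rng.beta(0.3, 1.0, 5000), rng.uniform(size=5000))
bin_w = learn_bin_weights(train_p, np.digitize(train_dist, edges), n_bins)

# Apply the learned distance weights to the test-set P-values.
test_dist = rng.uniform(1e4, 2e6, size=200)
test_p = rng.uniform(size=200)
adj_p = weighted_bh(test_p, bin_w[np.digitize(test_dist, edges)])
print((adj_p < 0.05).sum())
```

Because the weights are learned on randomly drawn fragment pairs and only then applied to the test set, the selection of test regions cannot leak into the weights, which mirrors the rationale given above.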

3 Use example

We applied Chicdiff to detect interactions specific to naive CD4+ T cells versus monocytes, using promoter CHi-C data from Javierre et al. This yielded 208 232 differentially interacting regions (weighted adjusted P-value < 0.05; see Supplementary Table S1 for further summary statistics). An example of differential interactions is shown in Figure 1, and a heatmap of a subset of differential and non-differential interactions is shown in Supplementary Figure S2. As expected, differential promoter-interacting regions were enriched for differential enhancer activity between the two cell types (Supplementary Fig. S3). In addition, many genes whose promoters engaged in differential interactions showed consistent differences in expression (Supplementary Fig. S4). Supplementary Figures S5–S9 validate the Chicdiff approach by comparing the differential interaction calls obtained with and without pooling across multiple fragments, with Chicdiff versus standard DESeq2 normalization, and with and without P-value weighting, in terms of the expression of associated genes and other parameters.
Fig. 1.

Example of differential interactions detected by Chicdiff. Profiles of Promoter CHi-C interaction counts for the WNT7A promoter in naive CD4+ T cells (top) and monocytes (bottom), generated by Chicdiff (data from Javierre et al.). Significant interactions detected by CHiCAGO for each condition separately are colour-coded (blue: 35). Significant differentially interacting regions detected by Chicdiff are depicted as red blocks. Interactions beyond 1 Mb in each direction are cropped out.

References

1. Simon Anders, Wolfgang Huber. Differential expression analysis for sequence count data. Genome Biol, 2010-10-27.

2. Caryn S Ross-Innes, Rory Stark, Andrew E Teschendorff, Kelly A Holmes, H Raza Ali, Mark J Dunning, Gordon D Brown, Ondrej Gojis, Ian O Ellis, Andrew R Green, Simak Ali, Suet-Feung Chin, Carlo Palmieri, Carlos Caldas, Jason S Carroll. Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature, 2012-01-04.

3. Stefan Schoenfelder, Mayra Furlan-Magaril, Borbala Mifsud, Filipe Tavares-Cadete, Robert Sugar, Biola-Maria Javierre, Takashi Nagano, Yulia Katsman, Moorthy Sakthidevi, Steven W Wingett, Emilia Dimitrova, Andrew Dimond, Lucas B Edelman, Sarah Elderkin, Kristina Tabbada, Elodie Darbo, Simon Andrews, Bram Herman, Andy Higgs, Emily LeProust, Cameron S Osborne, Jennifer A Mitchell, Nicholas M Luscombe, Peter Fraser. The pluripotent regulatory circuitry connecting promoters to their long-range interacting elements. Genome Res, 2015-03-09.

4. Michael I Love, Wolfgang Huber, Simon Anders. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol, 2014.

5. Nikolaos Ignatiadis, Bernd Klaus, Judith B Zaugg, Wolfgang Huber. Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nat Methods, 2016-05-30.

6. Jonathan Cairns, Paula Freire-Pritchett, Steven W Wingett, Csilla Várnai, Andrew Dimond, Vincent Plagnol, Daniel Zerbino, Stefan Schoenfelder, Biola-Maria Javierre, Cameron Osborne, Peter Fraser, Mikhail Spivakov. CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol, 2016-06-15.

7. Biola M Javierre, Oliver S Burren, Steven P Wilder, Roman Kreuzhuber, Steven M Hill, Sven Sewitz, Jonathan Cairns, Steven W Wingett, Csilla Várnai, Michiel J Thiecke, Frances Burden, Samantha Farrow, Antony J Cutler, Karola Rehnström, Kate Downes, Luigi Grassi, Myrto Kostadima, Paula Freire-Pritchett, Fan Wang, Hendrik G Stunnenberg, John A Todd, Daniel R Zerbino, Oliver Stegle, Willem H Ouwehand, Mattia Frontini, Chris Wallace, Mikhail Spivakov, Peter Fraser. Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters. Cell, 2016-11-17.

8. John C Stansfield, Kellen G Cresswell, Vladimir I Vladimirov, Mikhail G Dozmorov. HiCcompare: an R-package for joint normalization and comparison of HI-C datasets. BMC Bioinformatics, 2018-07-31.

9. Christiaan Q Eijsbouts, Oliver S Burren, Paul J Newcombe, Chris Wallace. Fine mapping chromatin contacts in capture Hi-C data. BMC Genomics, 2019-01-23.

10. Mark D Robinson, Davis J McCarthy, Gordon K Smyth. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 2009-11-11.