HiCapTools: a software suite for probe design and proximity detection for targeted chromosome conformation capture applications.

Anandashankar Anil¹, Rapolas Spalinskas¹, Örjan Åkerborg¹, Pelin Sahlén¹.

Abstract

Summary: Folding of eukaryotic genomes within nuclear space enables physical and functional contacts between regions that are otherwise kilobases away in sequence space. Targeted chromosome conformation capture methods (T2C, chi-C and HiCap) report genomic contacts for the subset of regions targeted by probes. Here we present HiCapTools, a software package that designs sequence capture probes for targeted chromosome capture applications and analyses sequencing output to detect proximities involving targeted fragments. Two probes are designed for each feature while avoiding repeat elements and non-unique regions. The data analysis suite processes alignment files to report genomic proximities for each feature at restriction fragment level and is isoform-aware for gene features. Statistical significance of contact frequencies is evaluated using an empirically derived background distribution. Targeted chromosome conformation capture applications are invaluable for locating target genes of disease-associated variants found by genome-wide association studies. Hence, we believe our software suite will prove useful for a wider user base within clinical and functional applications.

Availability: https://github.com/sahlenlab/HiCapTools.

Contact: pelinak@kth.se.

Supplementary information: Supplementary data are available at Bioinformatics online.

Year: 2018 PMID: 29444232 PMCID: PMC6368139 DOI: 10.1093/bioinformatics/btx625

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Promoters play a pivotal role in regulating expression levels of the corresponding genes (Smale and Kadonaga, 2003). Promoters and enhancers contain binding sites for both ubiquitous and tissue-specific transcription factors, and chromatin loops bring promoters close to distal enhancers, enabling modulated regulation (Maston et al.; Shen et al.; Spitz and Furlong, 2012; Visel et al.). Chromosome conformation capture adapted for high-throughput sequencing (Hi-C) preserves DNA looping information and provides a list of regions in close proximity (Lieberman-Aiden et al.). This powerful methodology has made it possible to understand how genomes are organized in 3D space within the nucleus (Dixon et al.; Ea et al.). However, a linear increase in Hi-C resolution requires a quadratic increase in sequencing depth, making it a costly method for detecting interactions between features such as promoters or enhancers. To map promoter-anchored interactions, a targeted Hi-C approach can be used in which Hi-C material is hybridized to a set of promoter-targeting sequence capture probes (Dixon et al.; Dryden et al.; Jäger et al.; Ma et al.; Sahlén et al.). This focuses the experiment on proximities of targeted regions and restores the linear relationship between read depth and sensitivity. Hi-C uses a four- or six-cutter restriction enzyme to fragment the genome, and this choice dictates the resolution of targeted Hi-C. CHiCAGO is a software tool that can be used to detect DNA looping in targeted Hi-C data (Cairns et al.). It deploys a convolution of two distributions corresponding to Brownian collisions and technical noise to call feature interactions. Here we present HiCapTools, a software package that fully supports a targeted Hi-C experimental setup, with modules to determine probe placement and to detect proximities in the output. In contrast to CHiCAGO, HiCapTools uses a negative control probe set to generate a background contact frequency distribution, against which the statistical significance of observed proximities is calculated.
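The scaling argument above can be illustrated with a short calculation. This is only a sketch under simplifying assumptions that are not part of the original text: the genome is split into equal-sized bins, sequencing depth is taken to be proportional to the number of bin pairs that must be covered, and the genome size and target count are round hypothetical figures.

```python
def all_vs_all_pairs(genome_bp, bin_bp):
    """Unordered bin pairs (self-pairs included) in a genome-wide contact map."""
    n = genome_bp // bin_bp
    return n * (n + 1) // 2


def targeted_pairs(genome_bp, bin_bp, n_targets):
    """(target, bin) pairs when only n_targets captured regions are read out."""
    return n_targets * (genome_bp // bin_bp)


GENOME_BP = 3_000_000_000   # assumed round figure for a human-sized genome
N_TARGETS = 20_000          # hypothetical number of captured promoters

# Halving the bin size roughly quadruples the all-vs-all pair count,
# but only doubles the targeted pair count.
for bin_bp in (40_000, 20_000, 10_000):
    print(bin_bp,
          all_vs_all_pairs(GENOME_BP, bin_bp),
          targeted_pairs(GENOME_BP, bin_bp, N_TARGETS))
```

Under these assumptions the genome-wide map grows with the square of the number of bins, whereas the targeted design grows linearly for a fixed probe set.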

2 Implementation

HiCapTools has two modules. The first selects capture probes to target sequences of interest. The second processes a mapped and filtered targeted Hi-C dataset (Wingett et al.) to detect proximities between probes and the rest of the genome. The two modules are described separately below. The software is implemented in C++11, compiled with the gcc compiler (version 4.9) and packaged with the CMake tool (version 3.5.1). Detailed instructions on installation and usage are provided in the supplementary text.

Module 1: probe design (PD1)

The module generates a list of regions that will be targeted by sequence capture probes. Since most chromosome conformation capture applications fragment the genome using restriction enzymes, informative products contain a ligation site between two restriction fragments, i.e. a junction. Therefore, probes are placed precisely next to the restriction sites to maximize capture of fragments containing junctions. HiCapTools is currently compatible only with Hi-C performed with restriction enzymes. The software requires two mandatory and two optional input files: coordinates of restriction fragments, a list of transcripts or features, coordinates of the repeat regions (optional) and alignability scores of the genome (optional). The module reads restriction fragment coordinates, repeat regions and alignability scores into memory (Kent, 2014), and stores them using interval trees (Garrison, 2015). It then reads features [such as transcripts and single nucleotide variants (SNVs)] and locates neighboring restriction sites. In the minimal mode, the module reports sequences that target restriction fragments closest to the features, as dictated by the lower and upper thresholds of distance from the feature (user provided). However, if repeat and alignability files are provided, regions with low sequence quality are avoided while placing probes. In this case PD1 successively searches restriction sites within a given distance from the feature and chooses probes satisfying the conditions set by the user (Supplementary Fig. S1 and Supplementary Material). The user can also set the distance between probes to avoid placing probes too close to each other.
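The probe placement logic described above can be sketched as follows. This is a minimal illustration, not the HiCapTools implementation: the function names, the default probe length, the `GATC` motif and the linear overlap scan (in place of the interval trees mentioned in the text) are all assumptions made for the example, and alignability scores are left out.

```python
def restriction_sites(sequence, motif="GATC"):
    """Return 0-based start positions of every occurrence of the motif."""
    sites, pos = [], sequence.find(motif)
    while pos != -1:
        sites.append(pos)
        pos = sequence.find(motif, pos + 1)
    return sites


def overlaps_any(start, end, intervals):
    """True if half-open [start, end) overlaps any half-open (s, e) interval."""
    return any(s < end and start < e for s, e in intervals)


def place_probes(feature_pos, sites, repeats, probe_len=120,
                 min_dist=0, max_dist=2000, n_probes=2, min_spacing=0):
    """Pick up to n_probes windows abutting the sites nearest to feature_pos.

    Sites are visited from nearest to farthest. A site upstream of the feature
    gets a probe starting at the site; a downstream site gets a probe ending
    at the site, so both probes point towards the feature.
    """
    probes = []
    for site in sorted(sites, key=lambda s: abs(s - feature_pos)):
        if not (min_dist <= abs(site - feature_pos) <= max_dist):
            continue  # outside the user-provided distance thresholds
        start = site if site <= feature_pos else site - probe_len
        end = start + probe_len
        if start < 0 or overlaps_any(start, end, repeats):
            continue  # skip windows that run off the sequence or hit a repeat
        padded = [(s - min_spacing, e + min_spacing) for s, e in probes]
        if overlaps_any(start, end, padded):
            continue  # too close to a probe that was already chosen
        probes.append((start, end))
        if len(probes) == n_probes:
            break
    return sorted(probes)


# Hypothetical toy input: four restriction sites, one repeat, one feature.
example = place_probes(600, [100, 500, 900, 1500], [(880, 900)],
                       probe_len=50, max_dist=1000)
print(example)
```

When the nearest site is rejected because of a repeat, the search simply continues to the next nearest site, which mirrors the successive search described for PD1.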

Module 2: proximity detector (PD2)

The second module of the suite reports proximities between targeted regions and the rest of the genome. The module takes a sorted binary alignment (BAM) file and requires that invalid junctions and duplicate read pairs are removed beforehand. The program processes only pairs mapped on targeted regions. BAMTools is used to read the alignment files (Barnett et al.). First, a probe region is set to determine all alignments on that probe (Supplementary Fig. S2). Then restriction fragments containing the mate of each alignment are located and counted. PD2 then uses the fragments and counts to generate two lists of proximities: one between targeted and non-targeted regions (feature to distal) and one between targeted regions themselves (feature to feature). It is possible to include negative controls, i.e. a set of probes in the design that target regions with no known annotation or regulatory potential; PD1 can select such regions and design probes for them (Supplementary text). Proximities of such probes can then be used to obtain contact frequencies at different distances occurring due to structural constraints (Supplementary text). This is achieved by binning the observed distances of such probes (default bin size is 1 kb). The mean and standard deviation of each bin are calculated to generate the contact frequency versus distance relationship. To avoid over-penalizing distances observed only a few times (particularly the case for distances over 200 kb), the contact frequency distribution was smoothed using the moving average method (Kenney and Keeping, 1962). Statistical significance of observed proximities was assigned by means of a P-value, obtained relative to the background distribution of the corresponding distance bin, under a normality assumption (Gautschi, 1972; Bochkanov). The value of P-value based filtering was assessed using an in-house generated targeted Hi-C dataset obtained from the GM12878 cell line (Marco et al.).
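The background model described above (distance binning, per-bin mean and standard deviation, moving-average smoothing, and a P-value under a normality assumption) can be sketched as below. Several details are assumptions made for illustration and are not stated in the text: the population standard deviation, a centred window of three bins, smoothing over observed bins only, and a one-sided upper-tail test. All function names are hypothetical.

```python
import math
from collections import defaultdict


def background_by_bin(control_contacts, bin_size=1000):
    """Map distance bin -> (mean, std) of read counts from negative-control probes.

    control_contacts is an iterable of (distance_bp, read_count) pairs.
    """
    counts = defaultdict(list)
    for dist, n in control_contacts:
        counts[abs(dist) // bin_size].append(n)
    stats = {}
    for b, vals in counts.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)  # population variance
        stats[b] = (mean, math.sqrt(var))
    return stats


def moving_average(values, window=3):
    """Centred moving average; the window shrinks at both ends of the list."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def smooth(stats, window=3):
    """Smooth means and stds across the observed bins, in distance order."""
    bins = sorted(stats)
    means = moving_average([stats[b][0] for b in bins], window)
    stds = moving_average([stats[b][1] for b in bins], window)
    return {b: (m, s) for b, m, s in zip(bins, means, stds)}


def p_value(observed, distance_bp, stats, bin_size=1000):
    """Upper-tail probability of `observed` under Normal(mean, std) of its bin."""
    mean, std = stats[abs(distance_bp) // bin_size]
    if std == 0:
        return 0.0 if observed > mean else 1.0
    z = (observed - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical negative-control observations: (distance in bp, supporting pairs).
controls = [(500, 2), (700, 4), (1500, 1), (1800, 3), (2500, 1), (2600, 1)]
raw = background_by_bin(controls)
smoothed = smooth(raw)
print(p_value(5, 600, raw), p_value(5, 600, smoothed))
```

Smoothing pulls sparsely observed bins towards their neighbours, which is the effect the text relies on to avoid over-penalizing long-range distances with few control observations.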
The dataset was overlapped with selected enhancer-associated H3K4me1 peaks from ENCODE (ENCODE Project Consortium, 2004) and with peaks obtained from the enhancer track of the tfNet-repository (Diamanti et al.). Overlap enrichment was calculated relative to promoter-distance and fragment-length matched but otherwise random regions. A stricter P-value increases enrichment (Supplementary Fig. S3a and b), in particular for regions at larger distance from the corresponding promoter. A control set of enhancer-inactive H3K9me3 peaks lacks a similar enrichment signal (Supplementary Fig. S3c). We then processed the same dataset using the CHiCAGO tool (Cairns et al.) and compared its regulatory element enrichment levels to those obtained with HiCapTools (Supplementary Material and Supplementary Table S1). We found that CHiCAGO performs better at short (<50 kb) distances, whereas the opposite is true for distant (>500 kb) proximities (Supplementary Fig. S3b). The two tools show similar results at medium distances.
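One way to express the overlap enrichment used in this evaluation is as a ratio of overlap fractions. The exact definition is not given in the text, so the sketch below is an assumption: it divides the fraction of called distal fragments that overlap a peak by the same fraction for the matched random regions. Function names and coordinates are hypothetical.

```python
def overlap_fraction(regions, peaks):
    """Fraction of (chrom, start, end) regions overlapping at least one peak."""
    if not regions:
        return 0.0
    hits = 0
    for chrom, start, end in regions:
        if any(c == chrom and s < end and start < e for c, s, e in peaks):
            hits += 1
    return hits / len(regions)


def enrichment(called, matched_controls, peaks):
    """Overlap fraction of called regions divided by that of matched controls."""
    observed = overlap_fraction(called, peaks)
    background = overlap_fraction(matched_controls, peaks)
    if background == 0:
        return float("inf") if observed > 0 else float("nan")
    return observed / background


# Hypothetical toy data with half-open coordinates.
peaks = [("chr1", 100, 200), ("chr1", 1000, 1100)]
called = [("chr1", 150, 250), ("chr1", 1050, 1150),
          ("chr1", 5000, 5100), ("chr2", 150, 250)]
controls = [("chr1", 0, 50), ("chr1", 190, 300),
            ("chr1", 3000, 3100), ("chr1", 4000, 4100)]
print(enrichment(called, controls, peaks))
```

Repeating the calculation at successively stricter P-value cut-offs, and separately per distance range, would give curves comparable in spirit to those referenced in Supplementary Fig. S3.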

3 Discussion

Targeted Hi-C applications are gaining significant interest in the functional and disease biology fields, particularly in cases where non-coding variants play important roles. Special care is taken to make the software accessible for users with minimal bioinformatics background such as clinical researchers. Therefore, HiCapTools should be able to attract a wide user base given its relevance and convenience.
