CaptureProbe: a Java tool for designing probes for capture Hi-C applications.

Yun-Fei Ma, Adeniyi C Adeola, Yan-Bo Sun, Hai-Bing Xie, Ya-Ping Zhang.

Abstract

Many functional elements associated with traits and diseases are located in non-coding regions and act on distant target genes via chromatin looping and folding, which makes it difficult to reveal the underlying genetic regulatory mechanisms. Capture Hi-C is a newly developed chromosome conformation capture technology based on hybridization capture between probes and target genomic regions. It can identify interactions between target loci and all other loci in a genome at low cost and high resolution. Here, we developed CaptureProbe, a user-friendly, graphical Java tool for the design of capture probes across a range of target sites or regions. Numerous parameters help users tailor and optimize the designed probes. Design testing of CaptureProbe showed a high design success ratio for target loci and high probe specificity. Hence, this program will help scientists conduct genome spatial interaction research. CaptureProbe and its source code are available at https://sourceforge.net/projects/captureprobe/.

Keywords:  Capture Hi-C; Capture probes; CaptureProbe

Year:  2020        PMID: 31840950      PMCID: PMC6956725          DOI: 10.24272/j.issn.2095-8137.2020.010

Source DB:  PubMed          Journal:  Zool Res        ISSN: 2095-8137


DEAR EDITOR,

Genome-level studies on traits and diseases in different organisms have revealed that the majority of associated genetic loci are located in non-coding regions and are enriched in regulatory signals, suggesting that they have regulatory functions (Maurano et al., 2012; Welter et al., 2014; Zhang et al., 2014). Regulatory elements can act on multiple genes and on distant target genes via chromatin looping (Maston et al., 2006). Therefore, simply assigning these non-coding loci to their nearest genes is not a reliable way to elucidate their regulatory mechanisms. Chromosome conformation capture with high-throughput sequencing (Hi-C) allows for the identification of physical chromatin interactions across an entire genome (Lieberman-Aiden et al., 2009). However, the enormous complexity of Hi-C libraries makes it costly to obtain sufficient spatial resolution to detect interactions among specific elements.
To circumvent these issues, capture Hi-C technology was developed. It uses capture probes to restrict sequencing to target regions, allowing interactions between target loci and all other loci in a genome to be identified at low cost (Mifsud et al., 2015; Sahlén et al., 2015; Schoenfelder et al., 2015). This technology has been used extensively to reveal the regulatory mechanisms of trait- or disease-associated loci in non-coding regions (Baxter et al., 2018; Mishra & Hawkins, 2017).
The design of capture probes is a necessary prerequisite for capture Hi-C experiments and can be complex for researchers without programming experience. Several software tools have been developed for designing capture Hi-C probes, including CapSequm (Davies et al., 2016), HiCapTools (Anil et al., 2018), and GOPHER (Hansen et al., 2019). These tools are important for capture Hi-C-related analysis but cannot meet all the requirements of diverse experiments. For instance, CapSequm, a web application for designing capture probes, can only process 1 000 positions at a time and provides very limited design parameters (e.g., probe length, restriction enzyme). HiCapTools was designed to find probes for target sites but not for target regions, which are very common candidates in genetic research. In addition, HiCapTools offers limited parameters, which reduces its flexibility when considering specific DNA sequence contexts. Furthermore, it is a command-line program that requires a series of input files and is thus not very user friendly. The recently developed program GOPHER can design capture probes for both target sites and regions and includes a user-friendly graphical user interface (GUI). However, its capture probe design capacity is currently limited to human and mouse.
In this study, we developed CaptureProbe, a Java tool with a user-friendly graphical interface that can design capture probes for both target genetic sites and regions without species limitation.
CaptureProbe is easy to use, requires only simple input files, and provides abundant parameters for probe design. It also reports detailed statistics on the design results. Comparisons between CaptureProbe and other existing tools showed that it provides rich software functions and shows better or equivalent performance in designing capture probes.
To achieve good performance in capturing informative ligation fragments, CaptureProbe designs probes based on the structural features of the Hi-C library. The Hi-C library consists of ligated restriction fragments that were originally in close spatial proximity in the nucleus (Lieberman-Aiden et al., 2009). Usually, these ligation fragments are sheared to a specific size range to ensure suitability for high-throughput sequencing. Therefore, CaptureProbe designs probes to capture both ends of the target restriction fragment (i.e., the fragment overlapping the target site or region) and selects the probes nearest to each end. The program starts probe design from both ends of the target restriction fragment and moves inward by 1 bp in each cycle. To improve the capture efficiency and specificity of probes, CaptureProbe calculates the GC content and the numbers of missing and repeated bases in each candidate probe sequence (missing bases: n/N; repeated bases: marked in lowercase) and chooses the first probe that satisfies all parameter limits set by the user. CaptureProbe also avoids redundant probe sequences caused by multiple target sites falling in the same restriction fragment.
Running CaptureProbe is very simple, requiring only the coordinate file of the target sites/regions and the sequence file (FASTA format). CaptureProbe can restrict any repeated sequences marked in the sequence file when designing the probes. Users can use the program windows to directly specify the paths of the required files and to set parameters. All configuration information can be printed in real time for users to check progress.
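The end-inward scan described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CaptureProbe's actual Java implementation: the function names (`gc_fraction`, `probe_ok`, `design_end_probes`), the way the design margin bounds the scan, and the treatment of lowercase bases as a simple count limit are all hypothetical choices made for the example.

```python
from typing import Optional, Tuple


def gc_fraction(seq: str) -> float:
    """Return the fraction of G/C bases in seq, ignoring case."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def probe_ok(seq: str, gc_min: float, gc_max: float,
             max_missing: int, max_repeat: int) -> bool:
    """Check one candidate probe against the user-set limits (assumed semantics)."""
    missing = sum(1 for b in seq if b in "nN")                   # missing bases: n/N
    repeated = sum(1 for b in seq if b.islower() and b != "n")   # lowercase = repeat
    return (missing <= max_missing
            and repeated <= max_repeat
            and gc_min <= gc_fraction(seq) <= gc_max)


def design_end_probes(fragment: str, probe_len: int = 120, margin: int = 500,
                      gc_min: float = 0.25, gc_max: float = 0.65,
                      max_missing: int = 0, max_repeat: int = 6
                      ) -> Tuple[Optional[int], Optional[int]]:
    """Return 0-based start offsets of the upstream and downstream probes.

    Each scan starts flush with one end of the restriction fragment and moves
    inward by 1 bp per cycle; the first candidate passing all limits is kept.
    None means no probe was found within `margin` bp of that end.
    """
    n = len(fragment)
    window = min(margin, n)  # assumption: probes must lie within the margin
    upstream: Optional[int] = None
    downstream: Optional[int] = None
    for shift in range(0, window - probe_len + 1):
        if upstream is None:
            cand = fragment[shift:shift + probe_len]
            if probe_ok(cand, gc_min, gc_max, max_missing, max_repeat):
                upstream = shift
        if downstream is None:
            start = n - probe_len - shift
            cand = fragment[start:start + probe_len]
            if probe_ok(cand, gc_min, gc_max, max_missing, max_repeat):
                downstream = start
        if upstream is not None and downstream is not None:
            break
    return upstream, downstream


# Toy fragment: two missing bases at the 5' end, four repeat bases at the 3' end.
fragment = "NN" + "ACGT" * 10 + "aaaa"
print(design_end_probes(fragment, probe_len=8, margin=20,
                        max_missing=0, max_repeat=0))  # (2, 34)
```

In this toy case both scans skip the unusable bases at the fragment ends and stop at the first clean 8 bp window, which mirrors the "nearest to the end" selection rule described in the text.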
After running, CaptureProbe prints detailed information on the results of the capture probe design for users to evaluate. It also generates a series of result files that allow users to customize probes and to check the design state of each target site or region.
We systematically compared software function and probe design performance between CaptureProbe and other existing tools (Tables 1–2). As the functionality of CapSequm is limited, it was not included in the comparison. Both CaptureProbe and GOPHER offer rich functions and a user-friendly GUI (Table 1).
Table 1  Functional comparisons among CaptureProbe and other tools

Tool | CaptureProbe | HiCapTools | GOPHER
Supported species | All | All | Human, Mouse
Type of target loci | Site/Region | Site | Site/Region
GUI×
Detailed probe information×
Probe GC content limitation×
Probe missing base limitation××
Repeated sequence limitation×
Design margin limitation×
Fragment size limitation×
Fragment missing base limitation××
Mapping score limitation×
Table 2  Comparisons of capture probe design among CaptureProbe and other tools

Tool | CaptureProbe | HiCapTools | GOPHER
Testing site number (n) | 20 000 | 20 000 | 20 000
Both ends with probes (%) | 42.40 | 21.26 | 85.76
Only upstream with probe (%) | 18.77 | 23.14 | 2.67
Only downstream with probe (%) | 19.28 | 23.77 | 3.02
Total sites with probes (%) | 80.45 | 68.17 | 91.45
Total sites with no probes (%) | 19.57 | 31.83 | 8.56
Probe GC content <25% (%) | 0.00 | 3.03 | 0.00
Probe GC content >65% (%) | 0.00 | 0.34 | 0.00
Probe with extreme GC content (%) | 0.00 | 3.37 | 0.00
Probe with unique alignment (%) | 92.84 | 77.44 | 83.34
Probe with multiple alignments (%) | 7.10 | 22.31 | 16.41
Probe with no alignment (%) | 0.06 | 0.25 | 0.25
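
The design-outcome categories in Table 2 (both ends, upstream only, downstream only, none) can be tallied from per-site design results. The sketch below is purely illustrative; the `summarize_design` function and its input format (one pair of booleans per target site) are assumptions and are not taken from any of the compared tools.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_design(outcomes: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Convert per-site (has_upstream, has_downstream) flags to percentages."""
    counts: Counter = Counter()
    total = 0
    for has_up, has_down in outcomes:
        total += 1
        if has_up and has_down:
            counts["both"] += 1
        elif has_up:
            counts["upstream_only"] += 1
        elif has_down:
            counts["downstream_only"] += 1
        else:
            counts["none"] += 1
    if total == 0:
        return {}
    keys = ("both", "upstream_only", "downstream_only", "none")
    pct = {k: 100.0 * counts[k] / total for k in keys}
    # "Total sites with probes" is the sum of the three probe-bearing categories.
    pct["any"] = pct["both"] + pct["upstream_only"] + pct["downstream_only"]
    return pct
```

Under this tally, the "total sites with probes" row is simply the sum of the first three rows, as is the case for each tool in Table 2 (e.g., 42.40 + 18.77 + 19.28 = 80.45 for CaptureProbe).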
In this study, we only evaluated design performance for target sites, as the design mechanism is the same for target sites and regions. Twenty thousand random target sites (not from gap regions) in the human genome (hg38) were generated for testing. The same parameters were set for all tools: probe length, 120 bp; repeat sequence length, 6 bp; restriction enzyme, HindIII; minimal fragment length, 300 bp; design margin size, 500 bp; probe GC content, 25%–65%; all other parameters were left at their default values. First, we compared the design success ratio among the three programs. GOPHER showed the highest design success ratio (91.45%), followed by CaptureProbe (80.45%) and HiCapTools (68.17%). We next assessed the specificity of the probes, using BLASTN (Altschul et al., 1990) to align all probes to the genome sequence. CaptureProbe showed the highest ratio of uniquely aligned probes (92.84%) among the programs (GOPHER: 83.34%; HiCapTools: 77.44%). As HiCapTools could not filter probes by GC content, a portion of its probes (3.37%) showed extreme GC content (<25% or >65%), outside the efficient capture range (Agilent Technologies). Furthermore, we found that a small number of probes from GOPHER contained ambiguous characters (N).
Here, we present a very simple and user-friendly Java tool (CaptureProbe) that facilitates rapid capture probe design for target chromosome capture applications with no species limitation. CaptureProbe provides rich software functions and shows good probe design performance. Comparisons with existing software demonstrated that CaptureProbe has a good design success ratio and better probe specificity. CaptureProbe will be useful for a wide range of scientists studying genome spatial interactions.
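
The probe specificity assessment (unique, multiple, or no alignment) can be illustrated with a short Python sketch. It assumes BLASTN was run with tabular output (`-outfmt 6`, whose first four columns are query ID, subject ID, percent identity, and alignment length). The text does not state which hits were counted as alignments, so the identity and coverage thresholds below, like the function name `classify_probes`, are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def classify_probes(blast_lines: Iterable[str], probe_ids: List[str],
                    probe_len: int = 120, min_identity: float = 95.0,
                    min_cover: float = 0.9) -> Dict[str, str]:
    """Label each probe 'unique', 'multiple', or 'none' from BLAST tabular hits."""
    hits: Counter = Counter()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, pident, length = fields[0], float(fields[2]), int(fields[3])
        # Assumed criterion: near-full-length, high-identity hits only.
        if pident >= min_identity and length >= min_cover * probe_len:
            hits[qseqid] += 1
    labels = {}
    for pid in probe_ids:
        n = hits[pid]  # Counter returns 0 for probes without qualifying hits
        labels[pid] = "unique" if n == 1 else "multiple" if n > 1 else "none"
    return labels
```

Passing the full list of probe IDs separately is what allows probes with no reported hit to be labeled "none", since they do not appear in the BLAST output at all.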

COMPETING INTERESTS

The authors declare that they have no competing interests.

AUTHORS' CONTRIBUTIONS

Y.F.M., Y.P.Z., and H.B.X. designed the research. Y.F.M. implemented the Java code and analyzed the data. Y.F.M., Y.P.Z., and H.B.X. wrote the paper. A.C.A. and Y.B.S. revised and edited the manuscript. All authors read and approved the final version of the manuscript.
REFERENCES

1.  Basic local alignment search tool.

Authors:  S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal:  J Mol Biol       Date:  1990-10-05       Impact factor: 5.469

2.  Transcriptional regulatory elements in the human genome. (Review)

Authors:  Glenn A Maston; Sara K Evans; Michael R Green
Journal:  Annu Rev Genomics Hum Genet       Date:  2006       Impact factor: 8.929

3.  Comprehensive mapping of long-range interactions reveals folding principles of the human genome.

Authors:  Erez Lieberman-Aiden; Nynke L van Berkum; Louise Williams; Maxim Imakaev; Tobias Ragoczy; Agnes Telling; Ido Amit; Bryan R Lajoie; Peter J Sabo; Michael O Dorschner; Richard Sandstrom; Bradley Bernstein; M A Bender; Mark Groudine; Andreas Gnirke; John Stamatoyannopoulos; Leonid A Mirny; Eric S Lander; Job Dekker
Journal:  Science       Date:  2009-10-09       Impact factor: 47.728

4.  Laying a solid foundation for Manhattan--'setting the functional basis for the post-GWAS era'. (Review)

Authors:  Xiaoyang Zhang; Swneke D Bailey; Mathieu Lupien
Journal:  Trends Genet       Date:  2014-03-22       Impact factor: 11.639

5.  Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C.

Authors:  Borbala Mifsud; Filipe Tavares-Cadete; Alice N Young; Robert Sugar; Stefan Schoenfelder; Lauren Ferreira; Steven W Wingett; Simon Andrews; William Grey; Philip A Ewels; Bram Herman; Scott Happe; Andy Higgs; Emily LeProust; George A Follows; Peter Fraser; Nicholas M Luscombe; Cameron S Osborne
Journal:  Nat Genet       Date:  2015-05-04       Impact factor: 38.330

6.  The pluripotent regulatory circuitry connecting promoters to their long-range interacting elements.

Authors:  Stefan Schoenfelder; Mayra Furlan-Magaril; Borbala Mifsud; Filipe Tavares-Cadete; Robert Sugar; Biola-Maria Javierre; Takashi Nagano; Yulia Katsman; Moorthy Sakthidevi; Steven W Wingett; Emilia Dimitrova; Andrew Dimond; Lucas B Edelman; Sarah Elderkin; Kristina Tabbada; Elodie Darbo; Simon Andrews; Bram Herman; Andy Higgs; Emily LeProust; Cameron S Osborne; Jennifer A Mitchell; Nicholas M Luscombe; Peter Fraser
Journal:  Genome Res       Date:  2015-03-09       Impact factor: 9.043

7.  The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

Authors:  Danielle Welter; Jacqueline MacArthur; Joannella Morales; Tony Burdett; Peggy Hall; Heather Junkins; Alan Klemm; Paul Flicek; Teri Manolio; Lucia Hindorff; Helen Parkinson
Journal:  Nucleic Acids Res       Date:  2013-12-06       Impact factor: 16.971

8.  Capture Hi-C identifies putative target genes at 33 breast cancer risk loci.

Authors:  Joseph S Baxter; Olivia C Leavy; Nicola H Dryden; Sarah Maguire; Nichola Johnson; Vita Fedele; Nikiana Simigdala; Lesley-Ann Martin; Simon Andrews; Steven W Wingett; Ioannis Assiotis; Kerry Fenwick; Ritika Chauhan; Alistair G Rust; Nick Orr; Frank Dudbridge; Syed Haider; Olivia Fletcher
Journal:  Nat Commun       Date:  2018-03-12       Impact factor: 14.919

9.  Multiplexed analysis of chromosome conformation at vastly improved sensitivity.

Authors:  James O J Davies; Jelena M Telenius; Simon J McGowan; Nigel A Roberts; Stephen Taylor; Douglas R Higgs; Jim R Hughes
Journal:  Nat Methods       Date:  2015-11-23       Impact factor: 28.547

10.  Three-dimensional genome architecture and emerging technologies: looping in disease. (Review)

Authors:  Arpit Mishra; R David Hawkins
Journal:  Genome Med       Date:  2017-09-30       Impact factor: 11.117

