Disperse--a software system for design of selector probes for exon resequencing applications.

J Stenberg, M Zhang, H Ji.

Abstract

SUMMARY: Selector probes enable the amplification of many selected regions of the genome in multiplex. Disperse is a software pipeline that automates the procedure of designing selector probes for exon resequencing applications. AVAILABILITY: Software and documentation are available at http://bioinformatics.org/disperse

Year:  2009        PMID: 19158162      PMCID: PMC2647824          DOI: 10.1093/bioinformatics/btp001

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


The advent of next-generation sequencing systems has brought with it interesting new possibilities for genomic research (Shendure et al., 2004). One application is resequencing selected regions of the human genome. To accomplish this, sample preparation methods are required that allow the selective enrichment of many arbitrary regions of the genome. Several methods have been proposed to this end, such as hybridization of fragmented genomic DNA to oligonucleotide arrays, followed by washing and amplification of the retained material (Albert et al., 2007; Okou et al., 2007), and various probe-based approaches using PCR to achieve multiplex amplification (Dahl et al., 2007; Fredriksson et al., 2007; Porreca et al., 2007).

One of these approaches, the selector technique, allows amplification of arbitrarily selected restriction fragments in a PCR with a single primer pair (Dahl et al., 2005). Selectors are oligonucleotide constructs that consist of a general sequence motif flanked by target-specific end sequences. Each selector targets a single restriction fragment and guides the circularization of this fragment by ligation, incorporating the general sequence motif into all fragments. Either the complete fragment is circularized, or a portion of the fragment is cleaved off using the endonucleolytic cleavage activity of a polymerase, and the 3′ part of the fragment is circularized. All targeted fragments are then amplified in multiplex. In previous work, we have demonstrated how this can be used for amplification of a large set of exons for subsequent resequencing using massively parallel pyrosequencing (Dahl et al., 2007).

Designing selector probes for exon resequencing of a set of genes has previously involved querying databases through web interfaces and running a series of command line utilities and interactive programs.
This procedure has been time consuming and not readily reproducible, which prompted us to develop an integrated software system, Disperse, to facilitate this task.

Given a list of genes as HUGO gene symbols and a set of design parameters, Disperse will generate a set of selector probe sequences designed to select the largest possible portion of the targeted sequence. The design work is performed in a pipelined fashion: a number of steps are executed sequentially, with each step using the results of previous steps. Each step is implemented as a Perl script or a command line Java program, and all steps can be executed in order using a pipeline script. The design parameters and the input set of gene names are defined in text files, and the intermediate and final results are also stored as text files. The steps are as follows:

1.  Determine coordinates for all coding regions of target genes by lookup in a local copy of CCDS data (http://www.ncbi.nlm.nih.gov/projects/CCDS/). For genes not found, use NCBI eUtils (http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html) to acquire GenBank (Benson et al., 2008) files and extract coding sequence coordinates. If any gene is not found in this step, halt the execution.
2.  Define regions of interest (ROIs) by extending all coding regions by a number of bases on each flank, and merging overlapping ROIs.
3.  Use the fastacmd program to retrieve reference sequence for ROIs and flanking regions from a local BLAST database file (http://www.ncbi.nlm.nih.gov/blast).
4.  Extract SNP data for target regions from a local SNP data file containing an extract of dbSNP (Sherry et al., 2001).
5.  Collate sequence and SNP data by creating a copy of the target file with SNP data included using degeneracy symbols.
6.  Select a reaction combination and generate a list of all accepted fragments for the combination using the PieceMaker API (application programming interface) (Stenberg et al., 2005a).
7.  Select a subset of the generated fragments, retaining optimal coverage of the targeted regions, using the PieceMaker API.
8.  Generate an amplicon output file listing the circularized parts of the fragments of the selected subset, including their genomic coordinates.
9.  Assemble selector probe sequences using the ProbeMaker (Stenberg et al., 2005b) command line interface.
10.  Generate consolidated output files to provide an overview of the design.

By default, all these steps are carried out when the pipeline script is executed. If manual intervention is required at some stage, e.g. to add a region of interest outside coding regions, the script can be set to run only a subset of the steps at a time.

The results of a design job are affected by the design parameters and the pool of restriction reactions to choose from. The design parameters determine, among other things, acceptable fragment lengths and the length of the ends of the selector probes. Reactions, which may contain one or more restriction enzymes, are picked from a pool of reactions defined by the user. The size and contents of this pool, and the maximum number of reactions allowed, will affect the design results as measured by the portion of the ROIs that is covered by selected fragments. The likelihood of finding acceptable fragments that cover all ROIs will increase with the size of the pool and with the number of reactions picked. The time required to generate and evaluate fragments will increase linearly with the number of reactions, while the time required to select a reaction combination will increase with the number of possible reaction combinations, which depends on both the number of reactions in the pool and the maximum number of reactions allowed.

To demonstrate the performance of the software, we designed a set of selector probe sequences for the coding regions of the 206 genes on chromosome 18 present in the CCDS data.
We used similar design settings as in our previous work (Dahl et al., 2005), allowing selected fragments of length 100–200 bases, with a maximum flap length of 500. We allowed a combination of up to three reactions to be picked from a pool of 120 reactions. This design took 42 min on a computer with a 2.4 GHz Intel quad-core processor running 64-bit openSUSE 10.3 with a 32-bit Java virtual machine, and resulted in 90.7% coverage of the targeted region using 5519 selector probes. The results are provided on the project web site.

To investigate the effects of using a smaller reaction pool, we repeated the design using a randomly selected subset of 24 reactions from the pool of 120 reactions. This run completed in 8 min, yielding a slightly lower coverage of 89.2%. Another run, picking up to six reactions from the smaller pool, resulted in 98.2% coverage and took 34 min. We estimate that a design allowing six reactions to be picked from the larger pool would yield close to full coverage, but would take several months to complete on a typical workstation. We conclude that for each design it is reasonable to do some form of prescreening of enzymes and reactions, for example by analyzing the length distribution of generated fragments, before running the design pipeline.

Disperse depends on external data sources, programs and libraries. To our knowledge, all required data sources are freely available, as are the external dependencies, with the exception of the PieceMaker program (version 1.3.2), which is freely available for non-commercial users only. Instructions for acquiring PieceMaker are available on the Disperse web site. Disperse is written in Java and Perl, and has been tested on machines running openSUSE Linux, Gentoo Linux, Windows XP and MacOS X, using Perl 5 and Java 1.6.
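
The ROI definition step of the pipeline (extending each coding region by a number of bases on each flank and merging overlapping ROIs) can be illustrated with a minimal sketch. This is not the Disperse implementation, which is written in Perl and Java; the function name `define_rois`, the inclusive 1-based coordinates and the example values are assumptions made for illustration only.

```python
from typing import List, Tuple

# (start, end) on a single chromosome; inclusive, 1-based coordinates are an
# assumption of this sketch, not a documented Disperse convention.
Interval = Tuple[int, int]


def define_rois(coding_regions: List[Interval], flank: int) -> List[Interval]:
    """Extend every coding region by `flank` bases per side, then merge overlaps."""
    # Extend each region, clamping the start so it never drops below position 1.
    extended = sorted(
        (max(1, start - flank), end + flank) for start, end in coding_regions
    )
    merged: List[Interval] = []
    for start, end in extended:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous ROI: grow that ROI instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Hypothetical exon coordinates; the first two overlap once flanks are added.
    exons = [(1000, 1100), (1130, 1200), (5000, 5100)]
    print(define_rois(exons, flank=20))
```

With a flank of 20 bases, the first two hypothetical exons become (980, 1120) and (1110, 1220), which overlap and are merged into a single ROI, while the third remains separate.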
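
The collation step, in which SNP data are included in the target sequence using degeneracy symbols, can be sketched in the same way. The mapping below is the standard IUPAC nucleotide ambiguity code; everything else is an assumption of this sketch: the function name `annotate_snps`, the 0-based offsets, the restriction to single-base substitutions in an A/C/G/T reference, and the choice to include the reference base in the symbol. The file formats used by Disperse are not reproduced here.

```python
from typing import Dict, FrozenSet, Iterable, Mapping

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they denote.
IUPAC_CODES: Dict[FrozenSet[str], str] = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def annotate_snps(sequence: str, snps: Mapping[int, Iterable[str]]) -> str:
    """Return `sequence` with each SNP position replaced by a degeneracy symbol.

    `snps` maps a 0-based offset to the alleles observed at that position
    (single bases only; insertions and deletions are outside this sketch).
    """
    bases = list(sequence.upper())
    for offset, alleles in snps.items():
        # Combine the observed alleles with the reference base at that position.
        observed = frozenset(a.upper() for a in alleles) | {bases[offset]}
        bases[offset] = IUPAC_CODES[observed]
    return "".join(bases)


if __name__ == "__main__":
    # Hypothetical reference fragment with a C/T SNP at offset 1 and an A/G SNP at offset 4.
    print(annotate_snps("ACGTAC", {1: ["C", "T"], 4: ["G"]}))
```

In this hypothetical example, the C/T position is written as Y and the A/G position as R, so that downstream probe design can recognize and handle polymorphic positions.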
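
The dependence of the search space on the pool size and on the maximum number of reactions can be made concrete by counting candidate reaction combinations. The sketch below assumes that every unordered combination of one up to the maximum number of reactions is a candidate; this is a simplifying assumption, the function name is hypothetical, and the counts are not a model of the run times reported above.

```python
from math import comb


def count_reaction_combinations(pool_size: int, max_reactions: int) -> int:
    """Number of ways to pick between 1 and `max_reactions` reactions from the pool."""
    return sum(comb(pool_size, k) for k in range(1, max_reactions + 1))


if __name__ == "__main__":
    # Pool sizes and reaction limits taken from the design runs described above.
    for pool_size, max_reactions in [(120, 3), (24, 3), (24, 6), (120, 6)]:
        total = count_reaction_combinations(pool_size, max_reactions)
        print(f"pool={pool_size:>3}  max={max_reactions}: {total:,} combinations")
```

Under this assumption, up to three reactions from 24 gives 2,324 combinations, up to three from 120 gives 288,100, and up to six from 24 gives 190,050, whereas up to six from 120 gives more than three billion. The measured run times do not scale in direct proportion to these counts, but the last figure is consistent with the estimate that such a design would be impractical without prescreening.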
References (11 in total)

1.  dbSNP: the NCBI database of genetic variation.

Authors:  S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  Advanced sequencing technologies: methods and goals (review).

Authors:  Jay Shendure; Robi D Mitra; Chris Varma; George M Church
Journal:  Nat Rev Genet       Date:  2004-05       Impact factor: 53.242

3.  Multigene amplification and massively parallel sequencing for cancer mutation discovery.

Authors:  Fredrik Dahl; Johan Stenberg; Simon Fredriksson; Katrina Welch; Michael Zhang; Mats Nilsson; David Bicknell; Walter F Bodmer; Ronald W Davis; Hanlee Ji
Journal:  Proc Natl Acad Sci U S A       Date:  2007-05-17       Impact factor: 11.205

4.  Microarray-based genomic selection for high-throughput resequencing.

Authors:  David T Okou; Karyn Meltz Steinberg; Christina Middle; David J Cutler; Thomas J Albert; Michael E Zwick
Journal:  Nat Methods       Date:  2007-10-14       Impact factor: 28.547

5.  Direct selection of human genomic loci by microarray hybridization.

Authors:  Thomas J Albert; Michael N Molla; Donna M Muzny; Lynne Nazareth; David Wheeler; Xingzhi Song; Todd A Richmond; Chris M Middle; Matthew J Rodesch; Charles J Packard; George M Weinstock; Richard A Gibbs
Journal:  Nat Methods       Date:  2007-10-14       Impact factor: 28.547

6.  Multiplex amplification of large sets of human exons.

Authors:  Gregory J Porreca; Kun Zhang; Jin Billy Li; Bin Xie; Derek Austin; Sara L Vassallo; Emily M LeProust; Bill J Peck; Christopher J Emig; Fredrik Dahl; Yuan Gao; George M Church; Jay Shendure
Journal:  Nat Methods       Date:  2007-10-14       Impact factor: 28.547

7.  ProbeMaker: an extensible framework for design of sets of oligonucleotide probes.

Authors:  Johan Stenberg; Mats Nilsson; Ulf Landegren
Journal:  BMC Bioinformatics       Date:  2005-09-19       Impact factor: 3.169

8.  Multiplex amplification enabled by selective circularization of large sets of genomic DNA fragments.

Authors:  Fredrik Dahl; Mats Gullberg; Johan Stenberg; Ulf Landegren; Mats Nilsson
Journal:  Nucleic Acids Res       Date:  2005-04-28       Impact factor: 16.971

9.  PieceMaker: selection of DNA fragments for selector-guided multiplex amplification.

Authors:  Johan Stenberg; Fredrik Dahl; Ulf Landegren; Mats Nilsson
Journal:  Nucleic Acids Res       Date:  2005-04-28       Impact factor: 16.971

10.  GenBank.

Authors:  Dennis A Benson; Ilene Karsch-Mizrachi; David J Lipman; James Ostell; David L Wheeler
Journal:  Nucleic Acids Res       Date:  2007-12-11       Impact factor: 16.971

