Efficient selection of unique and popular oligos for large EST databases.

Jie Zheng, Timothy J Close, Tao Jiang, Stefano Lonardi.

Abstract

MOTIVATION: Expressed sequence tag (EST) databases have grown exponentially in recent years and now represent the largest collection of genetic sequences. One important application of these databases is the design of gene-specific oligonucleotides (or simply, oligos) for PCR primer design, microarray experiments and genomic library screening.
RESULTS: In this paper, we study two complementary problems concerning the selection of short oligos (e.g. 20-50 bases) from a large database of tens of thousands of ESTs: (i) selecting oligos each of which appears (exactly) in one unigene but does not appear (exactly or approximately) in any other unigene, and (ii) selecting oligos that appear (exactly or approximately) in many unigenes. The first problem, called the unique oligo problem, has applications in PCR primer and microarray probe design and in library screening for gene-rich clones. The second, called the popular oligo problem, is also useful for screening genomic libraries. We present an efficient algorithm that identifies all unique oligos in the unigenes and an efficient heuristic algorithm that enumerates the most popular oligos. By taking into account the frequency distribution of words in the unigene database, the algorithms have been carefully engineered to run quickly on regular PCs: each takes only a couple of hours (on a machine with a 1.2 GHz CPU and 1 GB of RAM) on a 28 Mb dataset of barley unigenes from the HarvEST database. We present simulation results on synthetic data and a preliminary analysis of the barley unigene database.
AVAILABILITY: Available on request from the authors.
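To make the two problems in the RESULTS paragraph concrete, here is a minimal brute-force sketch in Python. It is not the authors' efficient algorithm or heuristic. It assumes that "approximately" means within Hamming distance d (at most d mismatches), and the function names `unique_oligos` and `popular_oligos`, the parameters `l` (oligo length) and `d`, and the toy sequences are hypothetical illustrations.

```python
# Naive reference sketch of the unique and popular oligo problems.
# ASSUMPTION: an approximate occurrence means an l-mer within Hamming distance d.

def kmers(seq, l):
    """Yield every length-l substring (l-mer) of seq."""
    for i in range(len(seq) - l + 1):
        yield seq[i:i + l]

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_approx(oligo, seq, d):
    """True if seq contains an l-mer within Hamming distance d of oligo (d=0 means exact)."""
    return any(hamming(oligo, w) <= d for w in kmers(seq, len(oligo)))

def unique_oligos(unigenes, l, d):
    """For each unigene index, list its l-mers that occur in no other unigene,
    neither exactly nor within d mismatches."""
    result = {}
    for i, seq in enumerate(unigenes):
        others = [s for j, s in enumerate(unigenes) if j != i]
        result[i] = sorted({w for w in kmers(seq, l)
                            if not any(occurs_approx(w, s, d) for s in others)})
    return result

def popular_oligos(unigenes, l, d, top=3):
    """Rank candidate l-mers by how many unigenes contain them within d mismatches."""
    candidates = {w for seq in unigenes for w in kmers(seq, l)}
    counts = {w: sum(occurs_approx(w, s, d) for s in unigenes) for w in candidates}
    # Sort by descending coverage, breaking ties alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTTTTT", "GGGGACGT"]  # hypothetical toy "unigenes"
    print(unique_oligos(toy, 4, 0))
    print(popular_oligos(toy, 4, 0, top=1))  # prints [('ACGT', 3)]
```

The nested scans make this sketch roughly quadratic in the total database size, which is fine for toy inputs but impractical for a 28 Mb unigene set; the abstract's algorithms instead exploit the frequency distribution of words in the database to reach running times of a couple of hours.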

Year:  2004        PMID: 15059835     DOI: 10.1093/bioinformatics/bth210

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937