Parallel hash-based EST clustering algorithm for gene sequencing.

R Mudhireddy, F Ercal, R Frank.

Abstract

EST clustering is a simple yet effective method for discovering all the genes present in a variety of species. Although ESTs offer a cost-effective approach to gene discovery, the amount of data, and hence the computational resources required, makes clustering a very challenging problem: time and storage requirements are prohibitively expensive, and existing tools have quadratic time complexity resulting from all-against-all sequence comparisons. With the rapid growth of EST data, better and faster clustering tools are needed. In this paper, we present HECT (Hash based EST Clustering Tool), a novel time- and memory-efficient algorithm for EST clustering. HECT clusters a dataset of 10,000 human ESTs (also used to benchmark d2_cluster) in 207 minutes on a 1 GHz Pentium III processor, which is 36 times faster than the original d2_cluster algorithm. A parallel version of HECT (PECT) was also developed and used to cluster 269,035 soybean EST sequences on the IA-32 Linux cluster at the National Center for Supercomputing Applications at UIUC. The parallel algorithm exhibited excellent speedup over its sequential counterpart, and its memory requirements are almost negligible, making it suitable for data of virtually any size. The performance of the proposed clustering algorithms is compared against other known clustering techniques, and the results are reported in the paper.
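The abstract does not describe HECT's internals, so the following is only a minimal illustrative sketch of the general idea of replacing all-against-all comparison with hash lookups. It assumes a simple scheme that is not taken from the paper: each sequence is split into fixed-length words (k-mers), the words are stored in a hash index, and two sequences are merged into one cluster when they share at least `min_shared` words. The function names, `k`, and `min_shared` are hypothetical, and reverse-complement matching is ignored for brevity.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of distinct length-k words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_by_shared_kmers(seqs, k=8, min_shared=3):
    """Group sequences that share at least min_shared k-mers.

    Illustrative only: this is NOT the HECT algorithm, just a generic
    hash-index approach that avoids comparing every pair of sequences.
    """
    parent = list(range(len(seqs)))
    index = defaultdict(list)  # k-mer -> ids of earlier sequences containing it

    for i, seq in enumerate(seqs):
        words = kmers(seq, k)
        hits = Counter()
        # Only sequences that share a word with seq are ever examined.
        for w in words:
            for j in index.get(w, ()):
                hits[j] += 1
        for j, shared in hits.items():
            if shared >= min_shared:
                ri, rj = find(parent, i), find(parent, j)
                if ri != rj:
                    parent[ri] = rj
        for w in words:
            index[w].append(i)

    groups = defaultdict(list)
    for i in range(len(seqs)):
        groups[find(parent, i)].append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    toy = ["ACGTACGTTGCA", "CGTACGTTGCAT", "TTTTGGGGCCCC"]
    print(cluster_by_shared_kmers(toy, k=4, min_shared=3))
```

In this toy input the first two sequences overlap heavily and the third shares no 4-mer with either, so the first two fall into one cluster and the third stays alone. Real EST data would additionally need handling for sequencing errors, low-complexity repeats, and both strands.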

Year:  2004        PMID: 15585119     DOI: 10.1089/dna.2004.23.615

Source DB:  PubMed          Journal:  DNA Cell Biol        ISSN: 1044-5498            Impact factor:   3.311


  2 in total

1.  Evaluation of Glycine max mRNA clusters.

Authors:  Ronald L Frank; Fikret Ercal
Journal:  BMC Bioinformatics       Date:  2005-07-15       Impact factor: 3.169

2.  An automated method for rapid identification of putative gene family members in plants.

Authors:  Ronald L Frank; Ajay Mane; Fikret Ercal
Journal:  BMC Bioinformatics       Date:  2006-09-06       Impact factor: 3.169
