
KABOOM! A new suffix array based algorithm for clustering expression data.

Scott Hazelhurst, Zsuzsanna Lipták

Abstract

MOTIVATION: Second-generation sequencing technology has reinvigorated research using expression data, and clustering such data remains a significant challenge, with much larger datasets and with different error profiles. Algorithms that rely on all-versus-all comparison of sequences are not practical for large datasets.
RESULTS: We introduce a new filter for string similarity which has the potential to eliminate the need for all-versus-all comparison in clustering of expression data and other similar tasks. Our filter is based on multiple long exact matches between the two strings, with the additional constraint that these matches must be sufficiently far apart. We give details of its efficient implementation using modified suffix arrays. We demonstrate its efficiency by presenting our new expression clustering tool, wcd-express, which uses this heuristic. We compare it to other current tools and show that it is very competitive both with respect to quality and run time.
AVAILABILITY: Source code and binaries available under GPL at http://code.google.com/p/wcdest. Runs on Linux and MacOS X.
CONTACT: scott.hazelhurst@wits.ac.za; zsuzsa@cebitec.uni-bielefeld.de
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
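
The filter idea in the RESULTS paragraph (several long exact matches that lie sufficiently far apart) can be illustrated with a naive sketch. This is not the authors' suffix-array implementation and not the wcd-express code: it uses a plain set of substrings, and the function names and the parameters `k` (match length), `min_matches` and `min_gap` are hypothetical choices made for illustration. As a further simplifying assumption, the distance between matches is measured only in the first string.

```python
def exact_match_positions(a: str, b: str, k: int) -> list[int]:
    """Start positions in `a` of length-k substrings that also occur in `b`."""
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return [i for i in range(len(a) - k + 1) if a[i:i + k] in kmers_b]


def passes_filter(a: str, b: str, k: int = 8,
                  min_matches: int = 2, min_gap: int = 12) -> bool:
    """Return True if `a` and `b` share at least `min_matches` exact
    length-k matches whose start positions in `a` are pairwise at
    least `min_gap` apart.

    Positions are scanned in increasing order and a match is accepted
    whenever it is far enough from the previously accepted one; this
    greedy choice maximises the number of well-separated matches.
    """
    accepted = 0
    last = None
    for pos in exact_match_positions(a, b, k):
        if last is None or pos - last >= min_gap:
            accepted += 1
            last = pos
            if accepted >= min_matches:
                return True
    return False


if __name__ == "__main__":
    read = "ACGTTGCAAGGCTTAACCGG"
    # Identical strings share matches all along their length.
    print(passes_filter(read, read, k=4, min_gap=6))
    # Only the first six characters are shared, so the matches cluster
    # at one end and the distance constraint rejects the pair.
    print(passes_filter(read, "ACGTTGTTTTTTTT", k=4, min_gap=6))
```

Only pairs that pass such a filter would need a full similarity comparison. The abstract's point is that modified suffix arrays can find candidate pairs efficiently, without examining all pairs one by one as this sketch does.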

Year:  2011        PMID: 21984769     DOI: 10.1093/bioinformatics/btr560

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  6 in total

1.  gsufsort: constructing suffix arrays, LCP arrays and BWTs for string collections.

Authors:  Felipe A Louza; Guilherme P Telles; Simon Gog; Nicola Prezza; Giovanna Rosone
Journal:  Algorithms Mol Biol       Date:  2020-09-22       Impact factor: 1.405

2.  Ultrafast clustering algorithms for metagenomic sequence analysis.

Authors:  Weizhong Li; Limin Fu; Beifang Niu; Sitao Wu; John Wooley
Journal:  Brief Bioinform       Date:  2012-07-06       Impact factor: 11.622

3.  A bioinformatician's guide to the forefront of suffix array construction algorithms.

Authors:  Anish Man Singh Shrestha; Martin C Frith; Paul Horton
Journal:  Brief Bioinform       Date:  2014-01-10       Impact factor: 11.622

4.  EasyCluster2: an improved tool for clustering and assembling long transcriptome reads.

Authors:  Vitoantonio Bevilacqua; Nicola Pietroleonardo; Ely Giannino; Fabio Stroppa; Domenico Simone; Graziano Pesole; Ernesto Picardi
Journal:  BMC Bioinformatics       Date:  2014-12-03       Impact factor: 3.169

5.  Large Differences in Gene Expression Responses to Drought and Heat Stress between Elite Barley Cultivar Scarlett and a Spanish Landrace.

Authors:  Carlos P Cantalapiedra; María J García-Pereira; María P Gracia; Ernesto Igartua; Ana M Casas; Bruno Contreras-Moreira
Journal:  Front Plant Sci       Date:  2017-05-01       Impact factor: 5.753

6.  A hybrid distance measure for clustering expressed sequence tags originating from the same gene family.

Authors:  Keng-Hoong Ng; Chin-Kuan Ho; Somnuk Phon-Amnuaisuk
Journal:  PLoS One       Date:  2012-10-11       Impact factor: 3.240

