CAST: an iterative algorithm for the complexity analysis of sequence tracts. Complexity analysis of sequence tracts.

V J Promponas, A J Enright, S Tsoka, D P Kreil, C Leroy, S Hamodrakas, C Sander, C A Ouzounis.

Abstract

MOTIVATION: Sensitive detection and masking of low-complexity regions in protein sequences. Filtered sequences can be used in sequence comparison without the risk of matching compositionally biased regions. The main advantage of the method over similar approaches is the selective masking of single residue types without affecting other, possibly important, regions.
RESULTS: A novel algorithm for low-complexity region detection and selective masking. The algorithm is based on multiple-pass Smith-Waterman comparison of the query sequence against twenty homopolymers with infinite gap penalties. The output of the algorithm is both the masked query sequence for further analysis, e.g. database searches, as well as the regions of low complexity. The detection of low-complexity regions is highly specific for single residue types. It is shown that this approach is sufficient for masking database query sequences without generating false positives. The algorithm is benchmarked against widely available algorithms using the 210 genes of Plasmodium falciparum chromosome 2, a dataset known to contain a large number of low-complexity regions.
AVAILABILITY: CAST (version 1.0) executable binaries are available to academic users free of charge under license. Web site entry point, server and additional material: http://www.ebi.ac.uk/research/cgg/services/cast/
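
The multiple-pass idea described in RESULTS can be illustrated with a minimal sketch. This is not the CAST implementation. It assumes that, with gaps forbidden, the best local alignment of the query against a sufficiently long homopolymer of residue type r is simply the highest-scoring contiguous stretch of the query when every position is scored against r. The match/mismatch scores, the threshold, the mask character and the function names below are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of iterative, residue-selective masking.
# Scoring values, threshold and names are assumptions for illustration only.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty homopolymer residue types
MASK = "X"                            # assumed mask character


def best_segment(seq, residue, match=1, mismatch=-1):
    """Best ungapped local alignment of seq against a homopolymer of `residue`.

    Without gaps this is a maximum-sum contiguous segment problem.
    Returns (score, start, end) with `end` exclusive.
    """
    best = (0, 0, 0)
    run_score, run_start = 0, 0
    for i, ch in enumerate(seq):
        if run_score <= 0:
            # A non-positive prefix never helps, so restart the segment here.
            run_score, run_start = 0, i
        run_score += match if ch == residue else mismatch
        if run_score > best[0]:
            best = (run_score, run_start, i + 1)
    return best


def mask_low_complexity(seq, threshold=5):
    """Repeat passes until no residue type scores at or above `threshold`.

    Each pass picks the best-scoring (residue, segment) pair and masks only
    that residue type inside the segment; other residues are left untouched.
    Returns (masked_sequence, regions), regions as (residue, start, end, score).
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive so that every pass masks something")
    chars = list(seq.upper())
    regions = []
    while True:
        score, start, end, residue = max(
            (best_segment(chars, aa) + (aa,) for aa in AMINO_ACIDS),
            key=lambda hit: hit[0],
        )
        if score < threshold:
            break
        regions.append((residue, start, end, score))
        for i in range(start, end):
            if chars[i] == residue:
                chars[i] = MASK
    return "".join(chars), regions


if __name__ == "__main__":
    masked, regions = mask_low_complexity("MKTNNNNKNNNNLVE")
    print(masked)   # the lysine inside the asparagine run is kept
    print(regions)
```

In this toy setting the lysine embedded in the asparagine run stays unmasked, which mirrors the selective masking of single residue types that the abstract describes. A realistic version would score against an amino acid substitution matrix rather than a flat match/mismatch scheme.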

Year:  2000        PMID: 11120681     DOI: 10.1093/bioinformatics/16.10.915

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  70 in total

1.  An efficient algorithm for large-scale detection of protein families.

Authors:  A J Enright; S Van Dongen; C A Ouzounis
Journal:  Nucleic Acids Res       Date:  2002-04-01       Impact factor: 16.971

2.  Functional versatility and molecular diversity of the metabolic map of Escherichia coli.

Authors:  S Tsoka; C A Ouzounis
Journal:  Genome Res       Date:  2001-09       Impact factor: 9.043

3.  The phylogenetic extent of metabolic enzymes and pathways.

Authors:  José Manuel Peregrin-Alvarez; Sophia Tsoka; Christos A Ouzounis
Journal:  Genome Res       Date:  2003-03       Impact factor: 9.043

4.  The phylogenetic diversity of eukaryotic transcription.

Authors:  Richard M R Coulson; Christos A Ouzounis
Journal:  Nucleic Acids Res       Date:  2003-01-15       Impact factor: 16.971

5.  Protein families and TRIBES in genome sequence space.

Authors:  Anton J Enright; Victor Kunin; Christos A Ouzounis
Journal:  Nucleic Acids Res       Date:  2003-08-01       Impact factor: 16.971

6.  Dictionary-driven protein annotation.

Authors:  Isidore Rigoutsos; Tien Huynh; Aris Floratos; Laxmi Parida; Daniel Platt
Journal:  Nucleic Acids Res       Date:  2002-09-01       Impact factor: 16.971

7.  PRED-GPCR: GPCR recognition and family classification server.

Authors:  P K Papasaikas; P G Bagos; Z I Litou; V J Promponas; S J Hamodrakas
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

8.  Automated metabolic reconstruction for Methanococcus jannaschii.

Authors:  Sophia Tsoka; David Simon; Christos A Ouzounis
Journal:  Archaea       Date:  2004-10       Impact factor: 3.273

9.  Lineage-specific partitions in archaeal transcription.

Authors:  Richard M R Coulson; Nathalie Touboul; Christos A Ouzounis
Journal:  Archaea       Date:  2007-05       Impact factor: 3.273

10.  Comparative genomics of transcriptional control in the human malaria parasite Plasmodium falciparum.

Authors:  Richard M R Coulson; Neil Hall; Christos A Ouzounis
Journal:  Genome Res       Date:  2004-07-15       Impact factor: 9.043
