

A set-theoretic approach to database searching and clustering.

Abstract

MOTIVATION: In this paper, we introduce an iterative method of database searching and apply it to design a database clustering algorithm applicable to an entire protein database. The clustering procedure relies on the quality of the database searching routine and further improves on its results through a set-theoretic analysis of a highly redundant, yet efficiently generated, cluster system.
RESULTS: Overall, we achieve unambiguous assignment of 80% of SWISS-PROT sequences to non-overlapping sequence clusters in an entirely automatic fashion. For validation, our results are compared with an expert-generated clustering. The database searching method is fast, and the clustering technique does not require a time-consuming all-against-all comparison, which allows large numbers of sequences to be clustered quickly.

AVAILABILITY: The resulting clustering for the PIR1 (Release 51) and SWISS-PROT (Release 34) databases is available over the Internet from http://www.dkfz-heidelberg.de/tbi/services/modest/browsesysters.pl.

CONTACT: a.krause@dkfz-heidelberg.de; m.vingron@dkfz-heidelberg.de
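The two ideas in the abstract, an iterative database search and a set-theoretic reduction of a redundant cluster system to non-overlapping clusters, can be illustrated with the short Python sketch below. It is a minimal illustration under stated assumptions, not the authors' SYSTERS implementation. The `search` function is a hypothetical stand-in for one database search that returns the IDs of significant hits. The rule for deciding whether a sequence is assigned unambiguously (all candidate clusters containing it must be nested, and it goes to the largest one) is an assumed simplification chosen for illustration, not the paper's published criterion.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, Iterable, List, Set

# Hypothetical search oracle (assumption): maps a query ID to the IDs that a
# single database search reports as significant hits.
SearchFn = Callable[[str], Set[str]]


def iterative_search(query: str, search: SearchFn, max_rounds: int = 10) -> FrozenSet[str]:
    """Re-query with every newly found hit until no new members appear
    (or until max_rounds rounds have been run)."""
    members: Set[str] = {query}
    frontier: Set[str] = {query}
    for _ in range(max_rounds):
        hits: Set[str] = set()
        for seq in frontier:
            hits |= search(seq)
        frontier = hits - members  # only previously unseen hits are searched next
        if not frontier:
            break
        members |= frontier
    return frozenset(members)


def cluster_system(ids: Iterable[str], search: SearchFn) -> Set[FrozenSet[str]]:
    """One candidate cluster per query; identical sets collapse automatically."""
    return {iterative_search(q, search) for q in ids}


def assign(ids: Iterable[str], clusters: Set[FrozenSet[str]]) -> List[List[str]]:
    """Assumed rule: a sequence is assigned only if the candidate clusters that
    contain it form a chain A < B < ...; it is grouped under the largest one.
    Sequences lying in partially overlapping clusters are left unassigned."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for s in ids:
        chain = sorted((c for c in clusters if s in c), key=len)
        # `x < y` on frozensets tests for a proper subset
        if chain and all(x < y for x, y in zip(chain, chain[1:])):
            groups[chain[-1]].append(s)
    return sorted(sorted(members) for members in groups.values())


# Toy hit lists (invented for illustration): "c" is reachable from "a" only
# by iterating, and "q" sits in two overlapping, non-nested clusters.
toy_hits = {"a": {"b"}, "b": {"c"}, "c": set(),
            "p": {"q"}, "r": {"q"}, "q": set()}
ids = list(toy_hits)
clusters = cluster_system(ids, lambda s: toy_hits[s])
print(assign(ids, clusters))  # [['a', 'b', 'c'], ['p'], ['r']]
```

Frozensets are used so that duplicate candidate clusters from different queries collapse into one set element, and so that Python's built-in subset comparison can check nesting directly. In the toy run, "q" stays unassigned because its clusters {p, q} and {q, r} overlap without either containing the other. This mirrors, in simplified form, why only part of a database can be assigned unambiguously.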

Entities: Gene


Substances: Proteins

Year: 1998 PMID: 9682056 DOI: 10.1093/bioinformatics/14.5.430

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited by: 9 in total

1. ProtoMap: automatic classification of protein sequences and hierarchy of protein families.

Authors: G Yona; N Linial; M Linial
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. The SYSTERS protein sequence cluster set.

Authors: A Krause; J Stoye; M Vingron
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

3. SYSTERS, GeneNest, SpliceNest: exploring sequence space from genome to protein.

Authors: Antje Krause; Stefan A Haas; Eivind Coward; Martin Vingron
Journal: Nucleic Acids Res Date: 2002-01-01 Impact factor: 16.971

4. Substitution scoring matrices for proteins - An overview. (Review)

Authors: Rakesh Trivedi; Hampapathalu Adimurthy Nagarajaram
Journal: Protein Sci Date: 2020-10-12 Impact factor: 6.725

5. Genome-wide comparative gene family classification.

Authors: Christian Frech; Nansheng Chen
Journal: PLoS One Date: 2010-10-15 Impact factor: 3.240

6. Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks.

Authors: Qicheng Ma; Gung-Wei Chirn; Richard Cai; Joseph D Szustakowski; N R Nirmala
Journal: BMC Bioinformatics Date: 2005-10-03 Impact factor: 3.169