A set-theoretic approach to database searching and clustering.

A Krause, M Vingron.

Abstract

MOTIVATION: We introduce an iterative method of database searching and use it to design a clustering algorithm applicable to an entire protein database. The clustering procedure relies on the quality of the database searching routine and further improves its results through a set-theoretic analysis of a cluster system that is highly redundant yet efficient to generate.
RESULTS: Overall, we achieve unambiguous assignment of 80% of SWISS-PROT sequences to non-overlapping sequence clusters in an entirely automatic fashion. Our results are compared to an expert-generated clustering for validation. The database searching method is fast and the clustering technique does not require time-consuming all-against-all comparison. This allows for fast clustering of large amounts of sequences.
AVAILABILITY: The resulting clustering for the PIR1 (Release 51) and SWISS-PROT (Release 34) databases is available over the Internet from http://www.dkfz-heidelberg.de/tbi/services/modest/browsesysters.pl.
CONTACT: a.krause@dkfz-heidelberg.de; m.vingron@dkfz-heidelberg.de
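
The abstract does not spell out the set-theoretic analysis, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes, hypothetically, that each seed sequence yields one cluster (a set of sequence IDs) from its database search, that the resulting cluster system is therefore redundant, and that a cluster counts as "unambiguous" when it is disjoint from every other distinct cluster. The function names and the toy data are illustrative assumptions.

```python
from itertools import combinations


def relation(a, b):
    """Classify the set-theoretic relation between two clusters."""
    if a == b:
        return "identical"
    if a <= b or b <= a:
        return "nested"
    if a & b:
        return "overlapping"
    return "disjoint"


def unambiguous_clusters(seed_clusters):
    """Return distinct clusters that are disjoint from all other distinct clusters.

    seed_clusters maps a seed ID to the set of IDs its search returned
    (hypothetical input format, assumed for this sketch).
    """
    # Identical clusters found from different seeds collapse into one.
    distinct = {frozenset(members) for members in seed_clusters.values()}
    conflicted = set()
    for a, b in combinations(distinct, 2):
        if relation(a, b) in ("nested", "overlapping"):
            conflicted.update((a, b))
    return distinct - conflicted


def assigned_fraction(seed_clusters):
    """Fraction of all sequences that fall into an unambiguous cluster."""
    all_ids = set().union(*seed_clusters.values())
    if not all_ids:
        return 0.0
    assigned = set().union(*unambiguous_clusters(seed_clusters))
    return len(assigned) / len(all_ids)


# Toy cluster system: three seeds agree on one cluster, three others conflict.
example = {
    "s1": {"s1", "s2", "s3"},
    "s2": {"s1", "s2", "s3"},
    "s3": {"s1", "s2", "s3"},
    "s4": {"s4", "s5"},
    "s5": {"s5", "s6"},
    "s6": {"s6"},
}
```

In this toy input, only the cluster shared by `s1`, `s2`, and `s3` is disjoint from all others, so `assigned_fraction(example)` covers three of the six sequences. No pairwise sequence comparison is needed at this stage: the analysis operates on cluster membership alone.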

Year:  1998        PMID: 9682056     DOI: 10.1093/bioinformatics/14.5.430

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  9 in total

1.  ProtoMap: automatic classification of protein sequences and hierarchy of protein families.

Authors:  G Yona; N Linial; M Linial
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  The SYSTERS protein sequence cluster set.

Authors:  A Krause; J Stoye; M Vingron
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

3.  SYSTERS, GeneNest, SpliceNest: exploring sequence space from genome to protein.

Authors:  Antje Krause; Stefan A Haas; Eivind Coward; Martin Vingron
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

4.  Substitution scoring matrices for proteins - An overview. (Review)

Authors:  Rakesh Trivedi; Hampapathalu Adimurthy Nagarajaram
Journal:  Protein Sci       Date:  2020-10-12       Impact factor: 6.725

5.  Genome-wide comparative gene family classification.

Authors:  Christian Frech; Nansheng Chen
Journal:  PLoS One       Date:  2010-10-15       Impact factor: 3.240

6.  Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks.

Authors:  Qicheng Ma; Gung-Wei Chirn; Richard Cai; Joseph D Szustakowski; N R Nirmala
Journal:  BMC Bioinformatics       Date:  2005-10-03       Impact factor: 3.169

7.  Functional classification using phylogenomic inference. (Review)

Authors:  Duncan Brown; Kimmen Sjölander
Journal:  PLoS Comput Biol       Date:  2006-06-30       Impact factor: 4.475

8.  Large scale hierarchical clustering of protein sequences.

Authors:  Antje Krause; Jens Stoye; Martin Vingron
Journal:  BMC Bioinformatics       Date:  2005-01-22       Impact factor: 3.169

9.  PFASUM: a substitution matrix from Pfam structural alignments.

Authors:  Frank Keul; Martin Hess; Michael Goesele; Kay Hamacher
Journal:  BMC Bioinformatics       Date:  2017-06-05       Impact factor: 3.169
