Automated protein sequence database classification. II. Delineation of domain boundaries from sequence similarities.

J Gracy, P Argos.

Abstract

MOTIVATION: Decomposing each protein into modular domains is a basic prerequisite to classify accurately structural units in biological molecules. Boundaries between domains are indicated by two similar amino acid sequence segments located within the same protein (repeats) or within homologous proteins at notably different distances from their respective N- or C-termini.
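The positional criterion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `LocalHit` record, the `boundary_hints` function and the `min_offset_diff` threshold of 30 residues are hypothetical names and values chosen for illustration. The sketch compares how far a shared segment lies from the N- and C-termini in two proteins and reports a candidate boundary wherever those distances differ notably.

```python
from dataclasses import dataclass


@dataclass
class LocalHit:
    """A local similarity between two segments (0-based, end-exclusive).

    Hypothetical record: segment [a_start, a_end) of protein A is similar
    to segment [b_start, b_end) of protein B. For an internal repeat,
    A and B are the same protein.
    """
    a_start: int
    a_end: int
    a_len: int
    b_start: int
    b_end: int
    b_len: int


def boundary_hints(hit, min_offset_diff=30):
    """Return candidate domain boundaries as (protein, position) pairs.

    min_offset_diff is an assumed threshold for "notably different"
    distances from the termini; the source gives no value.
    """
    hints = []

    # N-terminal side: compare the distance of each segment start
    # from its protein's N-terminus.
    n_diff = hit.a_start - hit.b_start
    if abs(n_diff) >= min_offset_diff:
        # The protein with more residues before the shared segment
        # gets a candidate boundary at the segment start.
        hints.append(("A", hit.a_start) if n_diff > 0 else ("B", hit.b_start))

    # C-terminal side: compare the distance of each segment end
    # from its protein's C-terminus.
    c_diff = (hit.a_len - hit.a_end) - (hit.b_len - hit.b_end)
    if abs(c_diff) >= min_offset_diff:
        # The protein with more residues after the shared segment
        # gets a candidate boundary at the segment end.
        hints.append(("A", hit.a_end) if c_diff > 0 else ("B", hit.b_end))

    return hints
```

For example, a segment at residues 120 to 200 of a 210-residue protein that matches residues 5 to 85 of a 300-residue protein suggests one boundary before the segment in the first protein and one after it in the second.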
RESULTS: We have developed an automated method that combines such positional constraints derived from various detected pairwise sequence similarities to delineate the modular organization of proteins. The procedure has been applied to a non-redundant data set of 26 990 proteins whose sequences were taken from the PIR and SWISS-PROT databanks and shared <60% sequence identity amongst pairs. The resultant clustering, delineation and multiple alignment of 24 380 sequence fragments yielded a new database of 4364 domain families. Comparison of the domain collection with that of PRODOM indicates a clear improvement in the number and size of domain families, domain boundaries and multiple sequence alignments. The accuracy and sensitivity of the method are illustrated by results obtained for ankyrin-like repeats and EGF-like modules.
AVAILABILITY: The resulting database, called DOMO, is available through the database search routine SRS at Infobiogen (http://www.infobiogen.fr/srs5/), EBI (http://srs.ebi.ac.uk:5000/) and EMBL (http://www.embl-heidelberg.de/srs5/) World Wide Web sites.
CONTACT: gracy@infobiogen.fr
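The idea in RESULTS of combining positional constraints from many pairwise similarities can be sketched roughly as follows. This is an assumption-laden illustration, not the published DOMO procedure: the function name `consensus_boundaries`, the `tolerance` window and the `min_support` count are all hypothetical. Candidate boundary positions proposed for one protein by different pairwise similarities are grouped when they fall close together, and only groups supported by several similarities are kept.

```python
def consensus_boundaries(positions, tolerance=10, min_support=2):
    """Merge nearby candidate boundary positions for a single protein.

    positions   -- candidate boundaries (residue indices), one per
                   pairwise similarity that proposed a boundary
    tolerance   -- assumed maximum gap between neighbouring candidates
                   placed in the same group
    min_support -- assumed minimum number of candidates a group needs
    Returns one representative position per well-supported group.
    """
    clusters = []
    for pos in sorted(positions):
        # Extend the current group if this candidate is close to the
        # previous one; otherwise start a new group.
        if clusters and pos - clusters[-1][-1] <= tolerance:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])

    # Keep groups with enough support and report the middle candidate
    # (each group is already sorted).
    return [c[len(c) // 2] for c in clusters if len(c) >= min_support]
```

With candidates at 118, 120, 125, 300, 45 and 47, the sketch keeps two groups (around 47 and 120) and discards the unsupported candidate at 300.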

Year:  1998        PMID: 9545450     DOI: 10.1093/bioinformatics/14.2.174

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

