DIVCLUS: an automatic method in the GEANFAMMER package that finds homologous domains in single- and multi-domain proteins.

J Park, S A Teichmann.

Abstract

MOTIVATION: Large-scale determination of relationships between the proteins produced by genome sequences is now common. All protein sequences are matched and those that have high match scores are clustered into families. In cases where the proteins are built of several domains or duplication modules, this can lead to misleading results. Consider the very simple example of three proteins: 1, formed by duplication modules A and B; 2, formed by duplication modules B' and C; and 3, formed by duplication modules C' and D. Duplication modules B and B' are homologous, as are C and C'. Matching the sequences of 1, 2 and 3 followed by simple single-linkage clustering would put all three in the same family, even though proteins 1 and 3 are not related. This is because the different parts of 2 match 1 and 3. This paper describes a procedure, DIVCLUS, that divides such complex clusters of partially related sequences into simple clusters that contain only related duplication modules. In the example just given, it would produce two groups of sequences: the first with domains B of sequence 1 and B' of sequence 2, and the second with domains C of sequence 2 and C' of sequence 3. DIVCLUS is part of a package called GEANFAMMER, for GEnome ANalysis and protein FAMily MakER. The package automates the detection of families of duplication modules from a protein sequence database.
RESULTS: DIVCLUS has been applied to the division of single-linkage clusters generated from the protein sequences of six completely sequenced bacterial genomes. Out of 12 013 genes in these six genomes, 4563 single- and multi-domain sequences formed 1071 complex clusters. Application of the DIVCLUS program resolved these clusters into 2113 clusters corresponding to single duplication modules.
AVAILABILITY: The perl5 program and its documentation are available at the following address: http://www.mrc-lmb.cam.ac.uk/genomes/ and by anonymous ftp at ftp.mrc-lmb.cam.ac.uk in the directory /pub/genomes/Software/.
CONTACT: sat@mrc-lmb.cam.ac.uk; jong@mrc-lmb.cam.ac.uk
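
ILLUSTRATION: The three-protein example in the MOTIVATION paragraph can be sketched in a few lines of Python. This is only a toy illustration, not the DIVCLUS algorithm or any part of the perl5 program: it assumes the duplication modules and their homologous pairs are already known, whereas DIVCLUS has to derive them from sequence matches. The helper `single_linkage` and the `module_matches` list are hypothetical names introduced here.

```python
from collections import defaultdict

def single_linkage(items, links):
    """Group items into connected components of the match graph."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for x in items:
        groups[find(x)].add(x)
    return sorted(sorted(g) for g in groups.values())

# Toy data from the abstract: protein -> duplication modules.
proteins = {"1": ["A", "B"], "2": ["B'", "C"], "3": ["C'", "D"]}

# Assumed homologous module pairs: B with B', and C with C'.
module_matches = [(("1", "B"), ("2", "B'")), (("2", "C"), ("3", "C'"))]

# Whole-protein view: a match between any two modules links the proteins.
protein_links = [(a[0], b[0]) for a, b in module_matches]
protein_clusters = single_linkage(list(proteins), protein_links)

# Module view: cluster (protein, module) pairs, keep clusters with >1 member.
modules = [(p, m) for p, ms in proteins.items() for m in ms]
module_clusters = [c for c in single_linkage(modules, module_matches) if len(c) > 1]

print(protein_clusters)
print(module_clusters)
```

Clustering whole proteins puts 1, 2 and 3 in a single family, while clustering at the module level yields the two groups described in the abstract (B with B', and C with C').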

Year:  1998        PMID: 9545446     DOI: 10.1093/bioinformatics/14.2.144

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  15 in total

1.  Cloning and sequencing of cDNAs for hypothetical genes from chromosome 2 of Arabidopsis.

Authors:  Yong-Li Xiao; Mukesh Malik; Catherine A Whitelaw; Christopher D Town
Journal:  Plant Physiol       Date:  2002-12       Impact factor: 8.340

2.  Proteomics of Mycoplasma genitalium: identification and characterization of unannotated and atypical proteins in a small model genome.

Authors:  S Balasubramanian; T Schneider; M Gerstein; L Regan
Journal:  Nucleic Acids Res       Date:  2000-08-15       Impact factor: 16.971

3.  A limited universe of membrane protein families and folds.

Authors:  Amit Oberai; Yungok Ihm; Sanguk Kim; James U Bowie
Journal:  Protein Sci       Date:  2006-07       Impact factor: 6.725

4.  Prediction of protein domain boundaries from sequence alone.

Authors:  Oxana V Galzitskaya; Bogdan S Melnik
Journal:  Protein Sci       Date:  2003-04       Impact factor: 6.725

5.  Genome of lumpy skin disease virus.

Authors:  E R Tulman; C L Afonso; Z Lu; L Zsak; G F Kutish; D L Rock
Journal:  J Virol       Date:  2001-08       Impact factor: 5.103

6.  The genome of swinepox virus.

Authors:  C L Afonso; E R Tulman; Z Lu; L Zsak; F A Osorio; C Balinsky; G F Kutish; D L Rock
Journal:  J Virol       Date:  2002-01       Impact factor: 5.103

7.  Comparison of the small molecule metabolic enzymes of Escherichia coli and Saccharomyces cerevisiae.

Authors:  Oliver Jardine; Julian Gough; Cyrus Chothia; Sarah A Teichmann
Journal:  Genome Res       Date:  2002-06       Impact factor: 9.043

8.  Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements.

Authors:  S A Teichmann; J Park; C Chothia
Journal:  Proc Natl Acad Sci U S A       Date:  1998-12-08       Impact factor: 11.205

9.  DoBo: Protein domain boundary prediction by integrating evolutionary signals and machine learning.

Authors:  Jesse Eickholt; Xin Deng; Jianlin Cheng
Journal:  BMC Bioinformatics       Date:  2011-02-01       Impact factor: 3.169

10.  SECOM: a novel hash seed and community detection based-approach for genome-scale protein domain identification.

Authors:  Ming Fan; Ka-Chun Wong; Taewoo Ryu; Timothy Ravasi; Xin Gao
Journal:  PLoS One       Date:  2012-06-28       Impact factor: 3.240
