
An efficient algorithm for large-scale detection of protein families.

A J Enright, S Van Dongen, C A Ouzounis.

Abstract

Detection of protein families in large databases is one of the principal research objectives in structural and functional genomics. Protein family classification can significantly contribute to the delineation of functional diversity of homologous proteins, the prediction of function based on domain architecture or the presence of sequence motifs as well as comparative genomics, providing valuable evolutionary insights. We present a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families. The method relies on the Markov cluster (MCL) algorithm for the assignment of proteins into families based on precomputed sequence similarity information. This novel approach does not suffer from the problems that normally hinder other protein sequence clustering algorithms, such as the presence of multi-domain proteins, promiscuous domains and fragmented proteins. The method has been rigorously tested and validated on a number of very large databases, including SwissProt, InterPro, SCOP and the draft human genome. Our results indicate that the method is ideally suited to the rapid and accurate detection of protein families on a large scale. The method has been used to detect and categorise protein families within the draft human genome and the resulting families have been used to annotate a large proportion of human proteins.
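The abstract names the Markov cluster (MCL) algorithm but does not spell out its steps. As a rough illustration only, the sketch below follows the generic MCL procedure as it is commonly described: a column-stochastic matrix built from pairwise similarities is alternately expanded (squared) and inflated (raised elementwise to a power and renormalised) until it stops changing. The function names, the `inflation` default, the convergence tolerance, and the toy similarity values are all assumptions made for this example; this is not the TRIBE-MCL implementation.

```python
def normalise_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product, adequate for a toy example."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=50, tol=1e-6):
    """Generic MCL sketch on a symmetric, non-negative similarity matrix.

    All parameter defaults are illustrative assumptions.
    """
    n = len(similarity)
    # Add self-loops so that no column is empty, then normalise.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalise_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)                      # expansion: flow spreads
        inflated = normalise_columns(                # inflation: strong flow wins
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow defines a cluster: the columns it reaches.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-3)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy input: two groups of three sequences, similar within a group only.
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(toy))  # [[0, 1, 2], [3, 4, 5]]
```

In this toy case the two groups share no similarity, so the families separate trivially. With real similarity data the inflation value would influence how finely families are split.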

Year:  2002        PMID: 11917018      PMCID: PMC101833          DOI: 10.1093/nar/30.7.1575

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


  47 in total

1.  An insight into domain combinations.

Authors:  G Apic; J Gough; S A Teichmann
Journal:  Bioinformatics       Date:  2001       Impact factor: 6.937

2.  Domain combinations in archaeal, eubacterial and eukaryotic proteomes.

Authors:  G Apic; J Gough; S A Teichmann
Journal:  J Mol Biol       Date:  2001-07-06       Impact factor: 5.469

Review 3.  The emergence of major cellular processes in evolution.

Authors:  C Ouzounis; N Kyrpides
Journal:  FEBS Lett       Date:  1996-07-22       Impact factor: 4.124

Review 4.  Hidden Markov models.

Authors:  S R Eddy
Journal:  Curr Opin Struct Biol       Date:  1996-06       Impact factor: 6.809

Review 5.  Aspects of molecular evolution.

Authors:  W M Fitch
Journal:  Annu Rev Genet       Date:  1973       Impact factor: 16.830

Review 6.  The multiplicity of domains in proteins.

Authors:  R F Doolittle
Journal:  Annu Rev Biochem       Date:  1995       Impact factor: 23.643

Review 7.  Eukaryotes have "two-component" signal transducers.

Authors:  C Chang; E M Meyerowitz
Journal:  Res Microbiol       Date:  1994 Jun-Aug       Impact factor: 3.992

Review 8.  Evolutionarily mobile modules in proteins.

Authors:  R F Doolittle; P Bork
Journal:  Sci Am       Date:  1993-10       Impact factor: 2.142

9.  Identification of common molecular subsequences.

Authors:  T F Smith; M S Waterman
Journal:  J Mol Biol       Date:  1981-03-25       Impact factor: 5.469

10.  Strain-specific genes of Helicobacter pylori: distribution, function and dynamics.

Authors:  P J Janssen; B Audit; C A Ouzounis
Journal:  Nucleic Acids Res       Date:  2001-11-01       Impact factor: 16.971

  1402 in total

1.  A machine learning approach to identify hydrogenosomal proteins in Trichomonas vaginalis.

Authors:  David Burstein; Sven B Gould; Verena Zimorski; Thorsten Kloesges; Fuat Kiosse; Peter Major; William F Martin; Tal Pupko; Tal Dagan
Journal:  Eukaryot Cell       Date:  2011-12-02

2.  Organismal complexity, protein complexity, and gene duplicability.

Authors:  Jing Yang; Richard Lusk; Wen-Hsiung Li
Journal:  Proc Natl Acad Sci U S A       Date:  2003-12-05       Impact factor: 11.205

3.  Genome evolution reveals biochemical networks and functional modules.

Authors:  Christian von Mering; Evgeny M Zdobnov; Sophia Tsoka; Francesca D Ciccarelli; Jose B Pereira-Leal; Christos A Ouzounis; Peer Bork
Journal:  Proc Natl Acad Sci U S A       Date:  2003-12-12       Impact factor: 11.205

4.  EyeSite: a semi-automated database of protein families in the eye.

Authors:  David A Lee; Sandrine Fefeu; Adrian A Edo-Ukeh; Christine A Orengo; Christine Slingsby
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

5.  Genetic relatedness of the Streptococcus pneumoniae capsular biosynthetic loci.

Authors:  Angeliki Mavroidi; David M Aanensen; Daniel Godoy; Ian C Skovsted; Margit S Kaltoft; Peter R Reeves; Stephen D Bentley; Brian G Spratt
Journal:  J Bacteriol       Date:  2007-08-31       Impact factor: 3.490

6.  A complex-based reconstruction of the Saccharomyces cerevisiae interactome.

Authors:  Haidong Wang; Boyko Kakaradov; Sean R Collins; Lena Karotki; Dorothea Fiedler; Michael Shales; Kevan M Shokat; Tobias C Walther; Nevan J Krogan; Daphne Koller
Journal:  Mol Cell Proteomics       Date:  2009-01-27       Impact factor: 5.911

7.  Two Rumex species from contrasting hydrological niches regulate flooding tolerance through distinct mechanisms.

Authors:  Hans van Veen; Angelika Mustroph; Gregory A Barding; Marleen Vergeer-van Eijk; Rob A M Welschen-Evertman; Ole Pedersen; Eric J W Visser; Cynthia K Larive; Ronald Pierik; Julia Bailey-Serres; Laurentius A C J Voesenek; Rashmi Sasidharan
Journal:  Plant Cell       Date:  2013-11-27       Impact factor: 11.277

8.  Génolevures: comparative genomics and molecular evolution of hemiascomycetous yeasts.

Authors:  David Sherman; Pascal Durrens; Emmanuelle Beyne; Macha Nikolski; Jean-Luc Souciet
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

9.  MIPS: analysis and annotation of proteins from whole genomes.

Authors:  H W Mewes; C Amid; R Arnold; D Frishman; U Güldener; G Mannhaupt; M Münsterkötter; P Pagel; N Strack; V Stümpflen; J Warfsmann; A Ruepp
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

10.  PanOCT: automated clustering of orthologs using conserved gene neighborhood for pan-genomic analysis of bacterial strains and closely related species.

Authors:  Derrick E Fouts; Lauren Brinkac; Erin Beck; Jason Inman; Granger Sutton
Journal:  Nucleic Acids Res       Date:  2012-08-16       Impact factor: 16.971
