
Identification and distribution of protein families in 120 completed genomes using Gene3D.

David Lee, Alastair Grant, Russell L Marsden, Christine Orengo.

Abstract

Using a new protocol, PFscape, we undertake a systematic identification of protein families and domain architectures in 120 complete genomes. PFscape clusters sequences into protein families using a Markov clustering algorithm (Enright et al., Nucleic Acids Res 2002;30:1575-1584) followed by complete linkage clustering according to sequence identity. Within each protein family, domains are recognized using a library of hidden Markov models comprising CATH structural and Pfam functional domains. Domain architectures are then determined using DomainFinder (Pearl et al., Protein Sci 2002;11:233-244) and the protein family and domain architecture data are amalgamated in the Gene3D database (Buchan et al., Genome Res 2002;12:503-514). Using Gene3D, we have investigated protein sequence space, the extent of structural annotation, and the distribution of different domain architectures in completed genomes from all kingdoms of life. As with earlier studies by other researchers, the distribution of domain families shows power-law behavior such that the largest 2,000 domain families can be mapped to approximately 70% of nonsingleton genome sequences; the remaining sequences are assigned to much smaller families. While approximately 50% of domain annotations within a genome are assigned to 219 universal domain families, a much smaller proportion (< 10%) of protein sequences are assigned to universal protein families. This supports the mosaic theory of evolution whereby domain duplication followed by domain shuffling gives rise to novel domain architectures that can expand the protein functional repertoire of an organism. Functional data (e.g. COG/KEGG/GO) integrated within Gene3D result in a comprehensive resource that is currently being used in structural genomics initiatives and can be accessed via http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/. Copyright 2005 Wiley-Liss, Inc.
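The second clustering stage named in the abstract, complete linkage clustering according to sequence identity, can be illustrated with a minimal sketch. This is not the PFscape implementation: the function name `complete_linkage`, the sequence labels, the pairwise identity values, and the 80% cutoff are all hypothetical and chosen only for illustration. The preceding Markov clustering step is not shown.

```python
from itertools import combinations


def complete_linkage(identity, threshold):
    """Agglomerative complete-linkage clustering (illustrative sketch).

    identity  -- dict mapping frozenset({a, b}) to percent sequence identity
    threshold -- two clusters merge only if EVERY cross-pair meets this cutoff
    """
    items = sorted({name for pair in identity for name in pair})
    clusters = [frozenset([name]) for name in items]

    def linkage(c1, c2):
        # Complete linkage: cluster similarity is the minimum pairwise identity.
        # Pairs missing from the table are treated as 0% identity (assumption).
        return min(identity.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)

    while len(clusters) > 1:
        # Pick the most similar pair of clusters under complete linkage.
        best = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(*best) < threshold:
            break  # no remaining pair satisfies the cutoff
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]

    return sorted(sorted(c) for c in clusters)


# Hypothetical pairwise identities (percent) for four sequences.
identity = {
    frozenset(("seqA", "seqB")): 90.0,
    frozenset(("seqA", "seqC")): 40.0,
    frozenset(("seqA", "seqD")): 30.0,
    frozenset(("seqB", "seqC")): 85.0,
    frozenset(("seqB", "seqD")): 35.0,
    frozenset(("seqC", "seqD")): 95.0,
}

families = complete_linkage(identity, threshold=80.0)
print(families)
```

In this toy data, seqB and seqC share 85% identity, so a single-linkage scheme would chain all four sequences into one group. Complete linkage keeps two groups because seqA and seqD are only 30% identical, which is the property that makes it suited to producing tight subclusters at a fixed identity level.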

Year:  2005        PMID: 15768405     DOI: 10.1002/prot.20409

Source DB:  PubMed          Journal:  Proteins        ISSN: 0887-3585


