
Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB.

Qifang Xu, Roland L Dunbrack.

Abstract

MOTIVATION: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed.
RESULTS: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was therefore applied first to sequence/HMM alignments, then to HMM-HMM alignments and then to structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our assignments, presented in a database called PDBfam, contain Pfams for 99.4% of chains >50 residues.
AVAILABILITY: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.
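
The greedy step described in RESULTS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Hit` record, the function names, the E-value cutoff, the overlap allowance and the HMM-coordinate slack are all assumptions chosen for illustration, and the abstract does not state the actual thresholds. The sketch first joins same-family alignment fragments that continue one another along the HMM (as happens when a long insertion splits an alignment), then accepts hits from strongest to weakest E-value, tolerating only a small overlap with hits already accepted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment of a query region to a family model (hypothetical record)."""
    family: str      # family identifier, e.g. a Pfam accession
    start: int       # first query residue covered (1-based, inclusive)
    end: int         # last query residue covered (inclusive)
    evalue: float    # alignment E-value; smaller means stronger
    hmm_start: int   # first model position aligned
    hmm_end: int     # last model position aligned


def overlap(a: Hit, b: Hit) -> int:
    """Number of query residues shared by two hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def join_split_hits(hits, hmm_slack=10):
    """Merge same-family fragments whose HMM ranges continue one another.

    A second fragment that resumes the model near where the first one
    stopped is treated as the same domain interrupted by an insertion,
    not as a second copy. hmm_slack is an assumed tolerance.
    """
    joined = []
    for hit in sorted(hits, key=lambda h: (h.family, h.start)):
        prev = joined[-1] if joined else None
        if (prev is not None and prev.family == hit.family
                and hit.hmm_start >= prev.hmm_end - hmm_slack):
            joined[-1] = Hit(hit.family, prev.start, hit.end,
                             min(prev.evalue, hit.evalue),
                             prev.hmm_start, hit.hmm_end)
        else:
            joined.append(hit)
    return joined


def greedy_assign(hits, already_assigned=(), max_evalue=1e-5, max_overlap=10):
    """Accept hits in order of increasing E-value, allowing partial overlaps.

    already_assigned lets a later level (for example HMM-HMM hits) fill
    only the regions left uncovered by an earlier level. Both thresholds
    are assumed values.
    """
    accepted = list(already_assigned)
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > max_evalue:
            break  # remaining hits are weaker still
        if all(overlap(hit, kept) <= max_overlap for kept in accepted):
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h.start)


# Made-up hits: PF_A is split in two by a long insertion in the query.
hits = [
    Hit("PF_A", 1, 60, 1e-30, 1, 60),
    Hit("PF_A", 120, 180, 1e-20, 61, 120),
    Hit("PF_B", 175, 250, 1e-10, 1, 70),
    Hit("PF_C", 5, 55, 1e-8, 1, 50),
]
assigned = greedy_assign(join_split_hits(hits))
for h in assigned:
    print(h.family, h.start, h.end)
```

With these made-up hits, the two PF_A fragments are reported as one domain, PF_B is kept because it overlaps PF_A by only a few residues, and the weaker PF_C is rejected because it lies almost entirely inside PF_A. Calling `greedy_assign` again with `already_assigned=assigned` and a second list of hits mimics the multilevel idea of applying the same rule to successive alignment sources. The relaxed E-value treatment of repeat families mentioned in the abstract is not modelled here.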


Year:  2012        PMID: 22942020      PMCID: PMC3476341          DOI: 10.1093/bioinformatics/bts533

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


