Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB.

Abstract

MOTIVATION: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are shorter than the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often cause two copies of the domain to be erroneously assigned to the query. Divergent repeat sequences in proteins are often missed.
RESULTS: We have developed a multilevel procedure to produce nearly complete assignments of protein families from an existing classification system to a large set of sequences. We apply this procedure to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, leading to erroneous assignments at the Pfam family level. We therefore applied a greedy algorithm allowing partial overlaps, first to sequence/HMM alignments, then to HMM-HMM alignments and finally to structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignments of repeat Pfams with weaker E-values were allowed once stronger assignments of the same repeat HMM had been made. Our assignments, presented in a database called PDBfam, cover 99.4% of chains longer than 50 residues.

AVAILABILITY: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam and can be searched by PDB codes and Pfam identifiers. They will be updated regularly.
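The RESULTS paragraph combines three ideas: a greedy pass that takes the strongest hits first while tolerating partial overlaps, joining of two fragments of one domain that a long insertion has split, and a relaxed E-value cutoff for extra copies of a repeat family that has already been assigned. The Python sketch below illustrates those ideas under stated assumptions; it is not the PDBfam implementation. The `Hit` record, the function names, the family labels and all four thresholds are hypothetical, chosen only for illustration. The `prior` argument is one assumed way to model the multilevel order (sequence/HMM, then HMM-HMM, then structure alignments): each later level only adds hits compatible with what earlier levels assigned.

```python
from dataclasses import dataclass

# Hypothetical, illustrative thresholds; the abstract does not state PDBfam's values.
STRICT_EVALUE = 1e-3          # E-value cutoff for ordinary assignments
RELAXED_REPEAT_EVALUE = 1.0   # weaker cutoff for extra copies of a repeat family
MAX_OVERLAP_FRACTION = 0.2    # tolerated overlap, as a fraction of the shorter domain
MAX_HMM_GAP = 10              # max HMM-position gap for joining two fragments


@dataclass(frozen=True)
class Hit:
    pfam: str       # family label (hypothetical names in the example)
    start: int      # first aligned query residue (1-based, inclusive)
    end: int        # last aligned query residue (inclusive)
    hmm_start: int  # first HMM position covered by the alignment
    hmm_end: int    # last HMM position covered by the alignment
    evalue: float


def join_split_hits(hits):
    """Merge two hits to the same family when their HMM ranges are contiguous,
    treating the query residues between them as one long insertion."""
    merged = []
    for hit in sorted(hits, key=lambda h: (h.pfam, h.start)):
        prev = merged[-1] if merged else None
        if (prev is not None and prev.pfam == hit.pfam and hit.start > prev.end
                and abs(hit.hmm_start - prev.hmm_end) <= MAX_HMM_GAP):
            # Simplification: the joined hit keeps the better of the two E-values.
            merged[-1] = Hit(prev.pfam, prev.start, hit.end, prev.hmm_start,
                             hit.hmm_end, min(prev.evalue, hit.evalue))
        else:
            merged.append(hit)
    return merged


def overlap(a, b):
    """Number of query residues shared by two hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def compatible(hit, accepted):
    """True if hit overlaps each accepted domain by no more than
    MAX_OVERLAP_FRACTION of the shorter of the two."""
    for other in accepted:
        shorter = min(hit.end - hit.start + 1, other.end - other.start + 1)
        if overlap(hit, other) > MAX_OVERLAP_FRACTION * shorter:
            return False
    return True


def assign(hits, repeat_pfams=frozenset(), prior=()):
    """Greedy, overlap-tolerant assignment for one alignment source.

    `prior` holds assignments from an earlier level; new hits must be
    compatible with them, and they are included in the returned list.
    """
    hits = join_split_hits(hits)
    by_evalue = sorted(hits, key=lambda h: h.evalue)
    accepted = list(prior)
    # Pass 1: strongest hits first, ordinary cutoff.
    for hit in by_evalue:
        if hit.evalue <= STRICT_EVALUE and compatible(hit, accepted):
            accepted.append(hit)
    # Pass 2: weaker hits, only for repeat families already assigned.
    assigned = {h.pfam for h in accepted}
    for hit in by_evalue:
        if (STRICT_EVALUE < hit.evalue <= RELAXED_REPEAT_EVALUE
                and hit.pfam in repeat_pfams and hit.pfam in assigned
                and compatible(hit, accepted)):
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h.start)


if __name__ == "__main__":
    example = [
        Hit("DOM_A", 10, 120, 1, 110, 1e-30),    # N-terminal part of DOM_A
        Hit("DOM_A", 200, 290, 112, 200, 1e-12), # C-terminal part after an insertion
        Hit("DOM_B", 100, 200, 1, 100, 1e-5),    # overlaps DOM_A too much
    ]
    # Expect a single DOM_A assignment spanning residues 10-290.
    print(assign(example))
```

In the example, the two DOM_A fragments cover adjacent HMM positions (1-110 and 112-200), so they are joined into one domain instead of being reported as two copies, and DOM_B is rejected because its overlap with the joined domain exceeds the tolerated fraction. Ranking candidates by E-value alone is a simplification; the abstract notes that HHsearch scores could rank remote clan members above closer families, which a fuller treatment would need to handle.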

Substances:
Proteins

Year: 2012 PMID: 22942020 PMCID: PMC3476341 DOI: 10.1093/bioinformatics/bts533

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

References: 33 in total

1. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment.

Authors: Michael Remmert; Andreas Biegert; Andreas Hauser; Johannes Söding
Journal: Nat Methods Date: 2011-12-25 Impact factor: 28.547

2. Inference of macromolecular assemblies from crystalline state.

Authors: Evgeny Krissinel; Kim Henrick
Journal: J Mol Biol Date: 2007-05-13 Impact factor: 5.469

3. SCWRL and MolIDE: computer programs for side-chain conformation prediction and homology modeling.

Authors: Qiang Wang; Adrian A Canutescu; Roland L Dunbrack
Journal: Nat Protoc Date: 2008 Impact factor: 13.491

4. Statistical analysis of interface similarity in crystals of homologous proteins.

Authors: Qifang Xu; Adrian A Canutescu; Guoli Wang; Maxim Shapovalov; Zoran Obradovic; Roland L Dunbrack
Journal: J Mol Biol Date: 2008-06-07 Impact factor: 5.469

5. The protein common interface database (ProtCID)--a comprehensive database of interactions of homologous proteins in multiple crystal forms.

Authors: Qifang Xu; Roland L Dunbrack
Journal: Nucleic Acids Res Date: 2010-10-29 Impact factor: 16.971

6. The Pfam protein families database.

Authors: Marco Punta; Penny C Coggill; Ruth Y Eberhardt; Jaina Mistry; John Tate; Chris Boursnell; Ningze Pang; Kristoffer Forslund; Goran Ceric; Jody Clements; Andreas Heger; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman; Robert D Finn
Journal: Nucleic Acids Res Date: 2011-11-29 Impact factor: 16.971

7. Powerful fusion: PSI-BLAST and consensus sequences.

Authors: Dariusz Przybylski; Burkhard Rost
Journal: Bioinformatics Date: 2008-08-04 Impact factor: 6.937

8. Comparative mapping of sequence-based and structure-based protein domains.

Authors: Ya Zhang; John-Marc Chandonia; Chris Ding; Stephen R Holbrook
Journal: BMC Bioinformatics Date: 2005-03-25 Impact factor: 3.169

9. InterPro: the integrative protein signature database.

Authors: Sarah Hunter; Rolf Apweiler; Teresa K Attwood; Amos Bairoch; Alex Bateman; David Binns; Peer Bork; Ujjwal Das; Louise Daugherty; Lauranne Duquenne; Robert D Finn; Julian Gough; Daniel Haft; Nicolas Hulo; Daniel Kahn; Elizabeth Kelly; Aurélie Laugraud; Ivica Letunic; David Lonsdale; Rodrigo Lopez; Martin Madera; John Maslen; Craig McAnulla; Jennifer McDowall; Jaina Mistry; Alex Mitchell; Nicola Mulder; Darren Natale; Christine Orengo; Antony F Quinn; Jeremy D Selengut; Christian J A Sigrist; Manjula Thimma; Paul D Thomas; Franck Valentin; Derek Wilson; Cathy H Wu; Corin Yeats
Journal: Nucleic Acids Res Date: 2008-10-21 Impact factor: 16.971

10. Characterization of protein hubs by inferring interacting motifs from protein interactions.

Authors: Ramon Aragues; Andrej Sali; Jaume Bonet; Marc A Marti-Renom; Baldo Oliva
Journal: PLoS Comput Biol Date: 2007-07-30 Impact factor: 4.475

Cited by: 26 in total

1. Histone methyltransferases: novel targets for tumor and developmental defects. (Review)

Authors: Xin Yi; Xue-Jun Jiang; Xiao-Yan Li; Ding-Sheng Jiang
Journal: Am J Transl Res Date: 2015-11-15 Impact factor: 4.060

2. Identifying three-dimensional structures of autophosphorylation complexes in crystals of protein kinases.

Authors: Qifang Xu; Kimberly L Malecka; Lauren Fink; E Joseph Jordan; Erin Duffy; Samuel Kolander; Jeffrey R Peterson; Roland L Dunbrack
Journal: Sci Signal Date: 2015-12-01 Impact factor: 8.192

3. The origin of CDR H3 structural diversity.

Authors: Brian D Weitzner; Roland L Dunbrack; Jeffrey J Gray
Journal: Structure Date: 2015-01-08 Impact factor: 5.006

4. Profiles of Natural and Designed Protein-Like Sequences Effectively Bridge Protein Sequence Gaps: Implications in Distant Homology Detection.

Authors: Gayatri Kumar; Narayanaswamy Srinivasan; Sankaran Sandhya
Journal: Methods Mol Biol Date: 2022

5. Orchestrating copper binding: structure and variations on the cupredoxin fold. (Review)

Authors: Jing Guo; Oriana S Fisher
Journal: J Biol Inorg Chem Date: 2022-08-22 Impact factor: 3.862

6. Biological function derived from predicted structures in CASP11.

Authors: Peter J Huwe; Qifang Xu; Maxim V Shapovalov; Vivek Modi; Mark D Andrake; Roland L Dunbrack
Journal: Proteins Date: 2016-06-15

7. Charge asymmetry in the proteins of the outer membrane.

Authors: Joanna S G Slusky; Roland L Dunbrack
Journal: Bioinformatics Date: 2013-06-19 Impact factor: 6.937

8. The study of homology between tumor progression genes and members of retroviridae as a tool to predict target-directed therapy failure.

Authors: Janaina Fernandes
Journal: Front Pharmacol Date: 2015-05-01 Impact factor: 5.810

9. CDD: conserved domains and protein three-dimensional structure.

Authors: Aron Marchler-Bauer; Chanjuan Zheng; Farideh Chitsaz; Myra K Derbyshire; Lewis Y Geer; Renata C Geer; Noreen R Gonzales; Marc Gwadz; David I Hurwitz; Christopher J Lanczycki; Fu Lu; Shennan Lu; Gabriele H Marchler; James S Song; Narmada Thanki; Roxanne A Yamashita; Dachuan Zhang; Stephen H Bryant
Journal: Nucleic Acids Res Date: 2012-11-28 Impact factor: 16.971

10. SIFTS: Structure Integration with Function, Taxonomy and Sequences resource.

Authors: Sameer Velankar; José M Dana; Julius Jacobsen; Glen van Ginkel; Paul J Gane; Jie Luo; Thomas J Oldfield; Claire O'Donovan; Maria-Jesus Martin; Gerard J Kleywegt
Journal: Nucleic Acids Res Date: 2012-11-29 Impact factor: 16.971