Discrimination between distant homologs and structural analogs: lessons from manually constructed, reliable data sets.

Hua Cheng, Bong-Hyun Kim, Nick V Grishin.

Abstract

A natural way to study protein sequence, structure, and function is to put them in the context of evolution. Homologs inherit similarities from their common ancestor, while analogs converge to similar structures due to a limited number of energetically favorable ways to pack secondary structural elements. Using novel strategies, we previously assembled two reliable databases of homologs and analogs. In this study, we compare these two data sets and develop a support vector machine (SVM)-based classifier to discriminate between homologs and analogs. The classifier uses a number of well-known similarity scores. We observe that although both structure scores and sequence scores contribute to SVM performance, profile sequence scores computed based on structural alignments are the best discriminators between remote homologs and structural analogs. We apply our classifier to a representative set from the expert-constructed database, Structural Classification of Proteins (SCOP). The SVM classifier recovers 76% of the remote homologs defined as domains in the same SCOP superfamily but from different families. More importantly, we also detect and discuss interesting homologous relationships between SCOP domains from different superfamilies, folds, and even classes.
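The general idea of combining similarity scores in an SVM can be illustrated with a minimal sketch. This is not the authors' classifier: the two features (a profile sequence score and a structure score), their value ranges, the synthetic pairs, and all function names are assumptions made for illustration, and the linear SVM is trained here by plain batch subgradient descent on the hinge loss rather than with any particular SVM package. The final number is the fraction of held-out synthetic homolog pairs the classifier recovers, the same kind of quantity as the recovery rate reported above.

```python
import random

def make_pairs(n, seed):
    """Synthetic domain pairs: [profile_score, structure_score] and a label.

    Assumption for illustration only: homolog pairs (+1) get high profile
    scores, analog pairs (-1) get low ones, and both have similar structure
    scores. These ranges are invented, not taken from the paper.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        X.append([rng.uniform(0.7, 1.0), rng.uniform(0.4, 0.9)])  # homolog
        y.append(1)
        X.append([rng.uniform(0.0, 0.3), rng.uniform(0.4, 0.9)])  # analog
        y.append(-1)
    return X, y

def center(X, means):
    """Subtract per-feature means so the bias term stays small."""
    return [[xj - mj for xj, mj in zip(x, means)] for x in X]

def train_linear_svm(X, y, lam=0.01, lr=0.1, steps=2000):
    """Batch subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, label in zip(X, y):
            margin = label * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute to the subgradient
                for j in range(d):
                    grad_w[j] -= label * x[j] / n
                grad_b -= label / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 (homolog) or -1 (analog) for a centered feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Train on one synthetic set, evaluate on another drawn the same way.
X_train, y_train = make_pairs(20, seed=0)
means = [sum(col) / len(col) for col in zip(*X_train)]
w, b = train_linear_svm(center(X_train, means), y_train)

X_test, y_test = make_pairs(20, seed=1)
preds = [predict(w, b, x) for x in center(X_test, means)]
true_pos = sum(1 for p, t in zip(preds, y_test) if p == 1 and t == 1)
recall = true_pos / sum(1 for t in y_test if t == 1)

print("weights (profile, structure):", w, "bias:", b)
print("fraction of homolog pairs recovered:", recall)
```

Because the synthetic profile score separates the two classes by construction, the learned weight on it dominates; this reflects how the toy data were generated and says nothing about real SCOP domains.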

Year: 2008; PMID: 18313074; PMCID: PMC4494761; DOI: 10.1016/j.jmb.2007.12.076

Source DB: PubMed; Journal: J Mol Biol; ISSN: 0022-2836; Impact factor: 5.469


