Defining and predicting structurally conserved regions in protein superfamilies.

Ivan K Huang, Jimin Pei, Nick V Grishin.

Abstract

MOTIVATION: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and by our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict the SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment.
RESULTS: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed the structural conservation index (SCI), which was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs using various features derived from a single structure and from homologous sequences. Assessment of the predictions by 5-fold cross-validation revealed that predictions based on features derived from a single structure performed similarly to those based on homologous sequences, while combining sequence and structural features gave the best accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to predict SCRs for a protein effectively. Finally, inspection of the structures with the worst predictions highlights difficulties in defining SCRs.

AVAILABILITY: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR.

CONTACT: 91huangi@gmail.com or grishin@chop.swmed.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Online.
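The abstract reports predictive performance as accuracy and the Matthews correlation coefficient (MCC) under 5-fold cross-validation. As an illustration of how these standard metrics are computed for a binary SCR/non-SCR labelling, the Python sketch below assumes per-residue labels encoded as 1 (SCR) and 0 (non-SCR). The helper names and the simple contiguous fold split are hypothetical and not taken from the paper; the abstract does not describe how the folds were built or how the networks were configured.

```python
import math
from typing import Sequence

# Hypothetical helpers for illustration; not the authors' code.
# Labels are assumed to be binary: 1 = SCR, 0 = non-SCR.


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of residues whose predicted label matches the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient: (TP*TN - FP*FN) / sqrt of the four marginals."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By common convention MCC is reported as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def k_fold_indices(n: int, k: int = 5) -> list[list[int]]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(accuracy(y_true, y_pred), matthews_cc(y_true, y_pred))
    print(k_fold_indices(12, 5))
```

In this toy example there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so accuracy is 6/8 = 0.75 and MCC is (3·3 - 1·1)/√(4·4·4·4) = 0.5. Reporting MCC alongside accuracy is useful because MCC stays informative when one class dominates the labels. In each cross-validation round, one fold would serve as the test set and the remaining folds as training data.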

Year:  2012        PMID: 23193223      PMCID: PMC3546793          DOI: 10.1093/bioinformatics/bts682

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


