
Automated search of natively folded protein fragments for high-throughput structure determination in structural genomics.

Y Kuroda, K Tani, Y Matsuo, S Yokoyama.

Abstract

Structural genomics projects envision almost routine protein structure determinations, which are currently imaginable only for small proteins with molecular weights below 25,000 Da. For larger proteins, structural insight can be obtained by breaking them into small segments of amino acid sequences that can fold into native structures, even when isolated from the rest of the protein. Such segments are autonomously folding units (AFUs) and have sizes suitable for fast structural analyses. Here, we propose to expand an intuitive procedure often employed for identifying biologically important domains into an automatic method for detecting putative folded protein fragments. The procedure is based on the recognition that large proteins can be regarded as a combination of independent domains conserved among diverse organisms. We have thus developed a program that reorganizes the output of BLAST searches and detects regions with a large number of similar sequences. To automate the detection process, it is reduced to a simple geometrical problem of recognizing rectangular-shaped elevations in a graph that plots the number of similar sequences at each residue of a query sequence. We used our program to quantitatively corroborate the premise that segments with conserved sequences correspond to domains that fold into native structures. We applied our program to a test data set composed of 99 amino acid sequences containing 150 segments with structures listed in the Protein Data Bank, and thus known to fold into native structures. Overall, the fragments identified by our program have an almost 50% probability of forming a native structure, and comparable results are observed with sequences containing domain linkers classified in SCOP. Furthermore, we verified that our program identifies AFUs in libraries from various organisms, and we found a significant number of AFU candidates for structural analysis, covering an estimated 5 to 20% of the genomic databases. Altogether, these results argue that methods based on sequence similarity can be useful for dissecting large proteins into small autonomously folding domains, and such methods may provide efficient support to structural genomics projects.
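
The detection step described in the abstract (counting similar sequences at each residue of the query, then looking for rectangular-shaped elevations in that plot) can be illustrated with a minimal sketch. This is not the authors' program: the function names `coverage_profile` and `find_elevations`, the thresholds `min_hits` and `min_length`, and the simple "run of residues above a threshold" criterion are assumptions chosen for illustration, and the hit ranges are made-up example values standing in for the aligned query ranges of BLAST hits.

```python
from typing import List, Tuple


def coverage_profile(query_length: int, hits: List[Tuple[int, int]]) -> List[int]:
    """Count, for each query residue, how many hits cover it.

    `hits` holds 1-based inclusive (start, end) ranges on the query,
    e.g. the aligned query ranges reported for each similar sequence.
    """
    diff = [0] * (query_length + 2)  # difference array, index 0 unused
    for start, end in hits:
        start = max(1, start)
        end = min(query_length, end)
        if start > end:
            continue  # hit lies entirely outside the query
        diff[start] += 1
        diff[end + 1] -= 1
    profile = []
    running = 0
    for pos in range(1, query_length + 1):
        running += diff[pos]
        profile.append(running)
    return profile


def find_elevations(profile: List[int], min_hits: int, min_length: int) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges where the profile stays
    at or above `min_hits` for at least `min_length` consecutive residues.

    Both thresholds are hypothetical parameters, not values from the paper.
    """
    segments = []
    run_start = None
    for i, count in enumerate(profile, start=1):
        if count >= min_hits:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_length:
                segments.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(profile) - run_start + 1 >= min_length:
        segments.append((run_start, len(profile)))
    return segments


# Made-up example: three hits pile up on residues 3-10, one hit on 15-18.
example_hits = [(3, 10), (3, 10), (4, 10), (15, 18)]
example_profile = coverage_profile(20, example_hits)
print(find_elevations(example_profile, min_hits=2, min_length=5))  # [(3, 10)]
```

In this toy case only the region covered by several hits is reported as a candidate fragment; the single-hit region is rejected by `min_hits`. A real implementation would also need to decide how to handle sloped edges and nested elevations, which this sketch does not attempt.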

Year: 2000    PMID: 11206052    PMCID: PMC2144534    DOI: 10.1110/ps.9.12.2313

Source DB: PubMed    Journal: Protein Sci    ISSN: 0961-8368    Impact factor: 6.725

