H-DROP: an SVM based helical domain linker predictor trained with features optimized by combining random forest and stepwise selection.

Teppei Ebina, Ryosuke Suzuki, Ryotaro Tsuji, Yutaka Kuroda.

Abstract

Domain linker prediction is attracting much interest because it can help identify novel domains suitable for high-throughput proteomics analysis. Here, we report H-DROP, an SVM-based Helical Domain linker pRediction using OPtimal features. H-DROP is, to the best of our knowledge, the first predictor for specifically and effectively identifying helical linkers. This was made possible, first, because a large training dataset became available from IS-Dom and, second, because we selected a small number of optimal features from a very large pool of candidates. The helical linker training dataset, which included 261 helical linkers, was constructed by detecting helical residues at the boundary regions between two independent structural domains listed in our previously reported IS-Dom dataset. Random forest selected 45 optimal feature candidates from 3,000 features, and stepwise selection further reduced these to 26 optimal features. The prediction sensitivity and precision of H-DROP were 35.2% and 38.8%, respectively. These values were more than 10.7% higher than those of the control methods, including our previously developed DROP, which is a coil linker predictor, and PPRODO, which is trained with undifferentiated domain boundary sequences. Overall, these results indicate that helical linkers can be predicted from sequence information alone by using a strictly curated training dataset of helical linkers and a carefully selected set of optimal features. H-DROP is available at http://domserv.lab.tuat.ac.jp.
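The abstract describes building the training set by detecting helical residues at the boundary region between two structural domains. The following is a minimal sketch of that idea, not the authors' procedure: it assumes a DSSP-style secondary-structure string, treats the DSSP helix classes H, G and I as helical, and uses a hypothetical boundary window of ±5 residues. The function name `find_helical_linker` and its parameters are illustrative only.

```python
from typing import Optional, Tuple

# Assumption: DSSP helix classes (alpha H, 3-10 G, pi I) all count as helical.
HELIX_CODES = {"H", "G", "I"}


def find_helical_linker(ss: str, boundary: int,
                        window: int = 5) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the helix covering a domain boundary, or None.

    ss:       DSSP-style secondary-structure string, one character per residue.
    boundary: 0-based index of the last residue of the N-terminal domain.
    window:   hypothetical half-width (in residues) of the boundary region.
    """
    lo = max(0, boundary - window)
    hi = min(len(ss) - 1, boundary + window)
    # Helical residues that fall inside the boundary region.
    hits = [i for i in range(lo, hi + 1) if ss[i] in HELIX_CODES]
    if not hits:
        return None  # no helix at this boundary: not a helical linker
    # Start from the helical residue nearest the boundary ...
    seed = min(hits, key=lambda i: abs(i - boundary))
    start = end = seed
    # ... and extend in both directions to the ends of that helix.
    while start > 0 and ss[start - 1] in HELIX_CODES:
        start -= 1
    while end < len(ss) - 1 and ss[end + 1] in HELIX_CODES:
        end += 1
    return start, end
```

For example, with `ss = "EEEEE--" + "H" * 10 + "--EEEE"` and a boundary at residue 11, the helix spanning residues 7 to 16 is returned as `(7, 16)`; a boundary lying in a coil region returns `None`.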
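The two-stage feature selection in the abstract (3,000 features reduced to 45 candidates by random forest, then to 26 by stepwise selection) can be sketched as a ranking step followed by a greedy search. This sketch assumes the importance scores were computed elsewhere (for instance, scikit-learn's `RandomForestClassifier` exposes `feature_importances_`) and that `score` is a hypothetical callback, such as cross-validated SVM performance on a feature subset. It uses forward stepwise selection as one common variant; the abstract does not say which stepwise direction or stopping rule H-DROP used.

```python
from typing import Callable, Dict, List, Sequence


def top_k_by_importance(importance: Dict[str, float], k: int) -> List[str]:
    """Stage 1: keep the k features with the highest importance scores.

    The scores are assumed to come from a random forest trained elsewhere.
    """
    return sorted(importance, key=lambda f: importance[f], reverse=True)[:k]


def forward_stepwise(candidates: Sequence[str],
                     score: Callable[[List[str]], float],
                     min_gain: float = 0.0) -> List[str]:
    """Stage 2: greedy forward stepwise selection.

    Repeatedly add the candidate that most improves score(selected) and
    stop when no remaining candidate improves it by more than min_gain.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        # Score every one-feature extension of the current subset.
        trials = [(score(selected + [f]), f) for f in remaining]
        # max() returns the first maximal entry, so ties favor
        # earlier (higher-ranked) candidates.
        new_best, f_best = max(trials, key=lambda t: t[0])
        if new_best - best <= min_gain:
            break  # no candidate improves the score enough
        selected.append(f_best)
        remaining.remove(f_best)
        best = new_best
    return selected
```

In this sketch, `top_k_by_importance(rf_scores, 45)` followed by `forward_stepwise(candidates, cv_score)` mirrors the 3,000 → 45 → 26 pipeline, with `rf_scores` and `cv_score` standing in for hypothetical inputs. Ranking first keeps the expensive stepwise search, which re-scores every remaining candidate at each step, limited to a small pool.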
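The abstract reports a sensitivity of 35.2% and a precision of 38.8% but does not define when a predicted linker counts as correct. As a hedged illustration of the two metrics, sensitivity = TP / (TP + FN) and precision = TP / (TP + FP), the sketch below assumes that linkers are represented by residue positions and that a prediction matches a true linker when it falls within a hypothetical tolerance of ±10 residues. The representation and the tolerance are assumptions, not the paper's evaluation protocol.

```python
from typing import Sequence, Tuple


def sensitivity_precision(true_linkers: Sequence[int],
                          predicted: Sequence[int],
                          tolerance: int = 10) -> Tuple[float, float]:
    """Linker-level sensitivity and precision under a positional tolerance.

    true_linkers, predicted: residue positions of linkers (assumed representation).
    tolerance: hypothetical matching window, in residues.
    """
    def near(a: int, others: Sequence[int]) -> bool:
        return any(abs(a - b) <= tolerance for b in others)

    # Sensitivity: fraction of true linkers hit by at least one prediction.
    detected = sum(1 for t in true_linkers if near(t, predicted))
    # Precision: fraction of predictions lying near some true linker.
    correct = sum(1 for p in predicted if near(p, true_linkers))
    sensitivity = detected / len(true_linkers) if true_linkers else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return sensitivity, precision
```

With true linkers at residues 50, 120 and 200 and predictions at 55, 130, 300 and 400, two of three true linkers are detected and two of four predictions are correct, giving a sensitivity of about 0.667 and a precision of 0.5.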

Year:  2014        PMID: 24965847     DOI: 10.1007/s10822-014-9763-x

Source DB:  PubMed          Journal:  J Comput Aided Mol Des        ISSN: 0920-654X            Impact factor:   3.686


  Cited by: 4 in total

1.  Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers.

Authors:  Tambi Richa; Soichiro Ide; Ryosuke Suzuki; Teppei Ebina; Yutaka Kuroda
Journal:  J Comput Aided Mol Des       Date:  2016-12-27       Impact factor: 3.686

2.  ThreaDomEx: a unified platform for predicting continuous and discontinuous protein domains by multiple-threading and segment assembly.

Authors:  Yan Wang; Jian Wang; Ruiming Li; Qiang Shi; Zhidong Xue; Yang Zhang
Journal:  Nucleic Acids Res       Date:  2017-07-03       Impact factor: 16.971

3.  A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data.

Authors:  Runtao Yang; Chengjin Zhang; Rui Gao; Lina Zhang
Journal:  Int J Mol Sci       Date:  2016-02-06       Impact factor: 5.923

4.  PssJ Is a Terminal Galactosyltransferase Involved in the Assembly of the Exopolysaccharide Subunit in Rhizobium Leguminosarum bv. Trifolii.

Authors:  Małgorzata Marczak; Magdalena Wójcik; Kamil Żebracki; Anna Turska-Szewczuk; Kamila Talarek; Dominika Nowak; Leszek Wawiórka; Marcin Sieńczyk; Agnieszka Łupicka-Słowik; Kamila Bobrek; Marceli Romańczuk; Piotr Koper; Andrzej Mazur
Journal:  Int J Mol Sci       Date:  2020-10-20       Impact factor: 5.923
