Armadillo: domain boundary prediction by amino acid composition.


Michel Dumontier, Rong Yao, Howard J Feldman, Christopher W V Hogue.

Abstract

The identification and annotation of protein domains is a critical step in accurately determining molecular function. Both computational and experimental methods of protein structure determination may be hindered by large multi-domain proteins or flexible linker regions. Knowledge of domains and their boundaries may reduce the experimental cost of protein structure determination by allowing researchers to work on a set of smaller constructs that are more likely to succeed. Current domain prediction methods often rely on sequence similarity to conserved domains and are therefore ill-suited to detecting domain structure in poorly conserved or orphan proteins. We present here a simple computational method to identify protein domain linkers and their boundaries from sequence information alone. Our domain predictor, Armadillo (http://armadillo.blueprint.org), uses any amino acid index to convert a protein sequence into a smoothed numeric profile from which domains and domain boundaries may be predicted. We derived an amino acid index, the domain linker propensity index (DLI), from the amino acid composition of domain linkers in a non-redundant structure dataset. The index indicates that Pro and Gly have a propensity to occur in linkers, whereas small hydrophobic residues do not. Armadillo predicts domain linker boundaries from Z-score distributions and, using DLI, obtains 35% sensitivity on a two-domain, single-linker dataset (counting predictions within ±20 residues of the linker). Combining DLI with an entropy-based amino acid index increases Armadillo's overall sensitivity to 56% for two-domain proteins. Moreover, Armadillo achieves 37% sensitivity for multi-domain proteins, surpassing most other prediction methods. Armadillo provides a simple but effective method for predicting domain boundaries with reasonable sensitivity.
Armadillo should prove to be a valuable tool for rapidly delineating protein domains in poorly conserved proteins or those with no sequence neighbors. Domain meta-predictors could also yield improved results by incorporating Armadillo as a first-line predictor.
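The profile-and-Z-score approach summarized in the abstract can be illustrated with a short Python sketch. This is not Armadillo's code. The log-odds form of the propensity index, the 15-residue smoothing window, the Z-score cutoff of 1.5 and the 20-residue minimum spacing between predicted boundaries are all assumptions chosen for illustration, and the helper names (`propensity_index`, `smoothed_profile`, `predict_boundaries`, `sensitivity`) are hypothetical. The entropy-based index mentioned in the abstract is not modeled.

```python
import math
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def propensity_index(linker_seqs, all_seqs, pseudocount=1.0):
    """Log-odds linker propensity per residue type (assumed form, not the published DLI)."""

    def frequencies(seqs):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in seqs:
            for aa in seq:
                if aa in counts:  # skip non-standard residues such as X
                    counts[aa] += 1
        total = sum(counts.values())
        return {aa: c / total for aa, c in counts.items()}

    f_linker, f_all = frequencies(linker_seqs), frequencies(all_seqs)
    # Positive values: enriched in linkers; negative values: depleted.
    return {aa: math.log(f_linker[aa] / f_all[aa]) for aa in AMINO_ACIDS}


def smoothed_profile(seq, index, window=15):
    """Replace each residue by its index value, then take a centred moving average.

    Windows are truncated at the sequence ends; residues missing from the
    index contribute 0.0. The window size of 15 is a hypothetical default.
    """
    values = [index.get(aa, 0.0) for aa in seq]
    half = window // 2
    profile = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile


def predict_boundaries(profile, z_cutoff=1.5, min_separation=20):
    """Return positions of local Z-score maxima at or above z_cutoff.

    A peak closer than min_separation to an already accepted peak is
    skipped (greedy, left to right). Both thresholds are hypothetical.
    """
    if not profile:
        return []
    mean = statistics.fmean(profile)
    sd = statistics.pstdev(profile) or 1.0  # flat profile: avoid dividing by zero
    z = [(p - mean) / sd for p in profile]
    peaks = []
    for i, zi in enumerate(z):
        left = z[i - 1] if i > 0 else -math.inf
        right = z[i + 1] if i + 1 < len(z) else -math.inf
        is_peak = zi >= left and zi > right
        if is_peak and zi >= z_cutoff and (not peaks or i - peaks[-1] >= min_separation):
            peaks.append(i)
    return peaks


def sensitivity(true_linkers, predicted, tolerance=20):
    """Fraction of true linker positions with a prediction within +/- tolerance residues."""
    if not true_linkers:
        return 0.0
    hits = sum(any(abs(t - p) <= tolerance for p in predicted) for t in true_linkers)
    return hits / len(true_linkers)


# Toy usage: two hydrophobic "domains" joined by a Pro/Gly-rich linker.
dli_like = propensity_index(["PGGSPGPG"], ["LVIAFLVIAPGGSPGPGLVIAF"])
sequence = "L" * 40 + "PGPGPGSGPG" + "L" * 40
peaks = predict_boundaries(smoothed_profile(sequence, dli_like))
print(peaks, sensitivity([45], peaks))
```

In this toy run, the Pro/Gly-rich stretch yields a single Z-score peak inside the linker, which counts as a hit under the ±20-residue criterion. A realistic index would be derived from a curated, non-redundant set of structurally defined linkers rather than from one toy sequence.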



Substances: Proteins

Year: 2005; PMID: 15978619; DOI: 10.1016/j.jmb.2005.05.037

Source DB: PubMed; Journal: J Mol Biol; ISSN: 0022-2836; Impact factor: 5.469

Cited by (22 in total)

1. Domain structure of Lassa virus L protein.

Authors: Linda Brunotte; Michaela Lelke; Meike Hass; Katja Kleinsteuber; Beate Becker-Ziaja; Stephan Günther
Journal: J Virol Date: 2010-10-27 Impact factor: 5.103

2. IS-Dom: a dataset of independent structural domains automatically delineated from protein structures.

Authors: Teppei Ebina; Yuki Umezawa; Yutaka Kuroda
Journal: J Comput Aided Mol Des Date: 2013-05-29 Impact factor: 3.686

3. Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers.

Authors: Tambi Richa; Soichiro Ide; Ryosuke Suzuki; Teppei Ebina; Yutaka Kuroda
Journal: J Comput Aided Mol Des Date: 2016-12-27 Impact factor: 3.686

4. H-DROP: an SVM based helical domain linker predictor trained with features optimized by combining random forest and stepwise selection.

Authors: Teppei Ebina; Ryosuke Suzuki; Ryotaro Tsuji; Yutaka Kuroda
Journal: J Comput Aided Mol Des Date: 2014-06-26 Impact factor: 3.686

5. ThreaDomEx: a unified platform for predicting continuous and discontinuous protein domains by multiple-threading and segment assembly.

Authors: Yan Wang; Jian Wang; Ruiming Li; Qiang Shi; Zhidong Xue; Yang Zhang
Journal: Nucleic Acids Res Date: 2017-07-03 Impact factor: 16.971

6. DomSVR: domain boundary prediction with support vector regression from sequence information alone.

7. Mathematical model for empirically optimizing large scale production of soluble protein domains.

8. OPUS-Dom: applying the folding-based method VECFOLD to determine protein domain boundaries.

9. Ab initio and homology based prediction of protein domains by recursive neural networks.

10. DomHR: accurately identifying domain boundaries in proteins using a hinge region strategy.