Rapid protein domain assignment from amino acid sequence using predicted secondary structure.

Russell L Marsden, Liam J McGuffin, David T Jones.

Abstract

The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for both domain number and location of domain boundaries (within ±20 residues) was correct for 24% of the multidomain set. These results have been put into context in relation to the results obtained from the other prediction methods assessed.
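The two success criteria reported in the abstract (correct domain number, and correct domain number plus boundaries within ±20 residues) can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the representation of a chain as a list of boundary residue positions, and the in-order pairing of predicted with observed boundaries are all assumptions made for illustration.

```python
from typing import Sequence


def domain_number_correct(predicted: Sequence[int], observed: Sequence[int]) -> bool:
    """Check the domain count only.

    Assumption: each argument lists boundary positions between continuous
    domains, so a chain with k boundaries has k + 1 domains.
    """
    return len(predicted) == len(observed)


def boundaries_correct(
    predicted: Sequence[int],
    observed: Sequence[int],
    tolerance: int = 20,
) -> bool:
    """Check domain count and boundary placement together.

    Assumption: boundaries are paired in sequence order, and every predicted
    boundary must fall within `tolerance` residues of its observed partner.
    """
    if not domain_number_correct(predicted, observed):
        return False
    pairs = zip(sorted(predicted), sorted(observed))
    return all(abs(p - o) <= tolerance for p, o in pairs)


def success_rate(outcomes: Sequence[bool]) -> float:
    """Percentage of chains whose top prediction was judged correct."""
    if not outcomes:
        raise ValueError("no predictions to score")
    return 100.0 * sum(outcomes) / len(outcomes)
```

For example, under these assumptions a predicted boundary at residue 120 counts as correct against an observed boundary at residue 105, but not against one at residue 99, because the second differs by 21 residues.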

Substances: Proteins

Year: 2002 PMID: 12441380 PMCID: PMC2373756 DOI: 10.1110/ps.0209902

Source DB: PubMed Journal: Protein Sci ISSN: 0961-8368 Impact factor: 6.725


