Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment.

Abstract

MOTIVATION: Genome sequencing projects require the periodic application of analysis tools that can classify and multiply align related protein sequence domains. Full automation of this task requires an efficient integration of similarity and alignment techniques.
RESULTS: We have developed a fully automated process that classifies entire protein sequence databases, resulting in alignment of the homologous sequences. The successive steps of the procedure are based on compositional and local sequence similarity searches followed by multiple sequence alignments. Global similarities are detected from the pairwise comparison of amino acid and dipeptide compositions of each protein. After the elimination of all but one sequence from each detected cluster of closely related proteins, the remaining sequences are compiled in a suffix tree which is self-compared to detect local sequence similarities. Sets of proteins which share similar sequence segments are then weighted according to their closeness and multiply aligned using a fast hierarchical dynamic programming algorithm. Computational strategies were devised to minimize computer processing time and memory space requirements. The accuracy of the sequence classifications has been evaluated for 12 462 primary structures distributed over 341 known families. The percentage of sequences with missed or incorrect family assignments was 6.8% on the test set. This low error level is only twice that of the manually constructed PROSITE database (3.4%) and is substantially better than that found for the automatically built PRODOM database (34.9%).
AVAILABILITY: The resulting database, called DOMO, is available through database search routine SRS at Infobiogen (http://www.infobiogen.fr/srs5/), EBI (http://srs.ebi.ac.uk:5000/) and EMBL (http://www.embl-heidelberg.de/srs5/) World Wide Web sites.
CONTACT: gracy@infobiogen.fr
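The compositional comparison step mentioned under RESULTS (pairwise comparison of amino acid and dipeptide compositions) can be illustrated with a minimal Python sketch. The abstract does not state which distance measure or thresholds the authors used, so the Euclidean distance here, along with the function names `composition_vector` and `composition_distance`, is an assumption made purely for illustration and should not be read as the DOMO procedure itself.

```python
from collections import Counter
from itertools import product
import math

# The 20 standard amino acids and all 400 ordered dipeptides built from them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def composition_vector(seq):
    """Return 20 amino acid frequencies followed by 400 dipeptide frequencies.

    Non-standard letters (e.g. X) count toward the sequence length but
    have no component of their own in the vector.
    """
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = max(len(seq), 1)      # avoid division by zero on empty input
    n_di = max(len(seq) - 1, 1)
    return ([mono[a] / n_mono for a in AMINO_ACIDS]
            + [di[d] / n_di for d in DIPEPTIDES])


def composition_distance(seq_a, seq_b):
    """Euclidean distance between two composition vectors.

    Assumption: the original paper may use a different measure.
    """
    va = composition_vector(seq_a)
    vb = composition_vector(seq_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


if __name__ == "__main__":
    # Toy sequences, invented for this example only.
    a = "MKVLAAGIVGL"
    b = "MKVLAAGIVGA"   # one substitution relative to a
    c = "WWWWPPPPCCCC"  # very different composition
    print(composition_distance(a, b) < composition_distance(a, c))
```

A comparison of this kind needs no alignment, which is presumably why it suits a first pass for grouping closely related sequences before the more expensive local similarity search.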

Substances: Proteins

Year: 1998 PMID: 9545449 DOI: 10.1093/bioinformatics/14.2.164

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

15 in total

1. Increased coverage of protein families with the blocks database servers.

Authors: J G Henikoff; E A Greene; S Pietrokovski; S Henikoff
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. ProtoMap: automatic classification of protein sequences and hierarchy of protein families.

Authors: G Yona; N Linial; M Linial
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

3. The MetaFam Server: a comprehensive protein family resource.

Authors: K A Silverstein; E Shoop; J E Johnson; A Kilian; J L Freeman; T M Kunau; I A Awad; M Mayer; E F Retzel
Journal: Nucleic Acids Res Date: 2001-01-01 Impact factor: 16.971

4. Mendel-GFDb and Mendel-ESTS: databases of plant gene families and ESTs annotated with gene family numbers and gene family names.

Authors: D Lonsdale; M Crowe; B Arnold; B C Arnold
Journal: Nucleic Acids Res Date: 2001-01-01 Impact factor: 16.971

5. Automated search of natively folded protein fragments for high-throughput structure determination in structural genomics.

Authors: Y Kuroda; K Tani; Y Matsuo; S Yokoyama
Journal: Protein Sci Date: 2000-12 Impact factor: 6.725

6. DbClustal: rapid and reliable global multiple alignments of protein sequences detected by database searches.

Authors: J D Thompson; F Plewniak; J Thierry; O Poch
Journal: Nucleic Acids Res Date: 2000-08-01 Impact factor: 16.971

7. Sequence similarities of protein kinase peptide substrates and inhibitors: comparison of their primary structures with immunoglobulin repeats.

Authors: J Kubrycht; J Borecký; K Sigler
Journal: Folia Microbiol (Praha) Date: 2002 Impact factor: 2.099