
Discovering active motifs in sets of related protein sequences and using them for classification.

J T Wang, T G Marr, D Shasha, B A Shapiro, G W Chirn.

Abstract

We describe a method for discovering active motifs in a set of related protein sequences. The method is an automatic two-step process: (1) find candidate motifs in a small sample of the sequences; (2) test whether these motifs are approximately present in all the sequences. To reduce the running time, we develop two optimization heuristics based on statistical estimation and pattern matching techniques. Experimental results obtained by running these algorithms on generated data and functionally related proteins demonstrate the good performance of the presented method compared with the visual method of O'Farrell and Leopold. By combining the discovered motifs with an existing fingerprint technique, we develop a protein classifier. When we apply the classifier to the 698 groups of related proteins in the PROSITE catalog, it gives information that is complementary to the BLOCKS protein classifier of Henikoff and Henikoff. Thus, using our classifier in conjunction with theirs, one can obtain high-confidence classifications (if BLOCKS and our classifier agree) or suggest a new hypothesis (if the two disagree).
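The two-step process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes fixed-length contiguous motifs, uses the first few sequences as the sample, and counts substitutions only (Hamming distance) as the measure of approximate presence. The abstract does not specify the motif form, the sampling scheme, the distance measure, or the two optimization heuristics, so the function names and parameters (`sample_size`, `length`, `max_mismatches`, `min_occurrence`) are hypothetical.

```python
from typing import Dict, List, Set


def candidate_motifs(sample: List[str], length: int) -> Set[str]:
    """Step 1: collect every substring of the given length from the sample."""
    return {
        seq[i:i + length]
        for seq in sample
        for i in range(len(seq) - length + 1)
    }


def occurs_approximately(motif: str, seq: str, max_mismatches: int) -> bool:
    """True if some window of seq differs from motif in at most
    max_mismatches positions (substitutions only; an assumption)."""
    m = len(motif)
    for i in range(len(seq) - m + 1):
        mismatches = sum(a != b for a, b in zip(motif, seq[i:i + m]))
        if mismatches <= max_mismatches:
            return True
    return False


def discover_motifs(
    sequences: List[str],
    sample_size: int,
    length: int,
    max_mismatches: int,
    min_occurrence: int,
) -> Dict[str, int]:
    """Step 2: keep candidates that approximately occur in at least
    min_occurrence of the full set of sequences."""
    sample = sequences[:sample_size]  # naive sample: the first few sequences
    found: Dict[str, int] = {}
    for motif in candidate_motifs(sample, length):
        count = sum(
            occurs_approximately(motif, s, max_mismatches) for s in sequences
        )
        if count >= min_occurrence:
            found[motif] = count
    return found


if __name__ == "__main__":
    # Toy sequences invented for illustration, not real protein data.
    seqs = ["MKVLAAGIT", "TTKVLSAGQ", "KVLAAGPP", "AAKILAAGR"]
    print(discover_motifs(seqs, sample_size=2, length=6,
                          max_mismatches=1, min_occurrence=4))
```

On the toy input, only `KVLAAG` occurs within one mismatch in all four sequences, so the sketch reports it with an occurrence count of 4. The brute-force scan in step 2 is the part that optimization heuristics such as those mentioned in the abstract would aim to speed up.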

Year:  1994        PMID: 8052532      PMCID: PMC308246          DOI: 10.1093/nar/22.14.2769

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


