C G Nevill-Manning, K S Sethi, T D Wu, D L Brutlag.
Abstract
Discrete motifs that discriminate functional classes of proteins are useful for classifying new sequences, capturing structural constraints, and identifying protein subclasses. Although the space of such motifs can grow exponentially with the length and number of sequences, we show that in practice it usually does not, and we describe a technique that infers motifs from aligned protein sequences by exhaustively searching this space. Our method generates sequence motifs over a wide range of recall and precision, and chooses a representative motif based on a score that we derive from both statistical and information-theoretic frameworks. Finally, we show that the selected motifs perform well in practice, classifying unseen sequences with extremely high precision, and we use them to infer protein subclasses that correspond to known biochemical classes.
Year: 1997; PMID: 9322037
Source DB: PubMed; Journal: Proc Int Conf Intell Syst Mol Biol; ISSN: 1553-0833
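To illustrate the kind of search the abstract describes, here is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that a discrete motif is a fixed-length sequence of allowed-residue sets (or wildcards) built from the columns of a gapless aligned block, that each column may be generalized to a single observed residue, a substitution group, or a wildcard, and that motifs are compared only by recall and precision. The groups in the hypothetical `GROUPS` list are placeholders chosen for illustration, and the names `column_choices`, `enumerate_motifs`, and `recall_precision` are likewise hypothetical. The paper's own scoring function, which the abstract says is derived from statistical and information-theoretic frameworks, is not reproduced here.

```python
from itertools import product
from typing import FrozenSet, List, Optional, Sequence, Tuple

# A motif element is a set of allowed residues, or None for a wildcard position.
Element = Optional[FrozenSet[str]]
Motif = Tuple[Element, ...]

# Hypothetical substitution groups, chosen only for illustration (assumption).
GROUPS: List[FrozenSet[str]] = [
    frozenset("ILMV"), frozenset("FWY"), frozenset("DE"),
    frozenset("KR"), frozenset("ST"),
]


def matches(motif: Motif, seq: str) -> bool:
    """Return True if the motif matches any contiguous window of seq."""
    k = len(motif)
    for start in range(len(seq) - k + 1):
        if all(el is None or seq[start + i] in el for i, el in enumerate(motif)):
            return True
    return False


def column_choices(column: Sequence[str]) -> List[Element]:
    """Candidate elements for one alignment column: wildcard, each observed
    residue on its own, and every group containing an observed residue."""
    seen = set(column)
    choices: List[Element] = [None]
    choices += [frozenset(r) for r in sorted(seen)]
    choices += [g for g in GROUPS if g & seen]
    return choices


def enumerate_motifs(alignment: Sequence[str]) -> List[Motif]:
    """Exhaustively combine per-column choices over a gapless aligned block,
    skipping the motif that consists only of wildcards."""
    per_column = [column_choices(col) for col in zip(*alignment)]
    return [m for m in product(*per_column) if any(el is not None for el in m)]


def recall_precision(motif: Motif, positives: Sequence[str],
                     negatives: Sequence[str]) -> Tuple[float, float]:
    """Recall over family members; precision over all matched sequences."""
    tp = sum(matches(motif, s) for s in positives)
    fp = sum(matches(motif, s) for s in negatives)
    recall = tp / len(positives)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


if __name__ == "__main__":
    block = ["DKL", "EKI", "DRV"]            # toy aligned block (hypothetical)
    family = ["AADKLAA", "GEKIG", "DRVTT"]   # toy family members (hypothetical)
    others = ["AAAAAA", "DKAAAA", "SSEKMS"]  # toy non-members (hypothetical)
    # Print only the motifs that make no false-positive matches.
    for m in enumerate_motifs(block):
        r, p = recall_precision(m, family, others)
        if p == 1.0:
            print(["*" if el is None else "".join(sorted(el)) for el in m], r, p)
```

Because every combination of per-column choices is generated, the toy output ranges from narrow motifs such as `D K L` (high precision, low recall) to broad ones such as `DE KR ILMV` (full recall on the toy family, but also matching the non-member `SSEKMS`). In the paper, a derived score selects one representative motif from such a range; this sketch stops at listing the candidates with their recall and precision.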