Abstract
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain.
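The consensus-embedding idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `consensus` and `embed_consensus`, the 0-based block coordinates, the toy sequences, and the use of an unweighted majority vote per column are all assumptions made for illustration. The abstract does not say how consensus residues are derived from the alignment.

```python
from collections import Counter


def consensus(block):
    """Return the most common residue in each column of an ungapped block.

    Assumption: plain majority vote, ties broken alphabetically.
    """
    width = len(block[0])
    if any(len(row) != width for row in block):
        raise ValueError("all rows of a block must have the same width")
    residues = []
    for column in zip(*block):
        counts = Counter(column)
        # Highest count wins; alphabetical order makes ties deterministic.
        residues.append(min(counts, key=lambda r: (-counts[r], r)))
    return "".join(residues)


def embed_consensus(representative, blocks):
    """Overwrite conserved regions of a representative sequence.

    `blocks` is a list of (start, rows) pairs, where `start` is the
    0-based position of the block in the representative sequence and
    `rows` are the aligned segments from family members (hypothetical
    input layout). Residues outside the blocks are left unchanged.
    """
    seq = list(representative)
    for start, rows in blocks:
        cons = consensus(rows)
        end = start + len(cons)
        if start < 0 or end > len(seq):
            raise ValueError("block falls outside the representative sequence")
        seq[start:end] = cons
    return "".join(seq)


# Toy example: one conserved block of width 4 starting at position 2.
query = embed_consensus("MKTAYIAKQR", [(2, ["SAYV", "SAYV", "TGYI"])])
print(query)
```

The resulting string is still an ordinary single sequence, which is why a query of this kind can be passed to single sequence search programs such as BLAST or FASTA. Embedding PSSMs instead would require a search program that accepts position-specific scores.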
Year: 1997 PMID: 9070452 PMCID: PMC2143675 DOI: 10.1002/pro.5560060319
Source DB: PubMed Journal: Protein Sci ISSN: 0961-8368 Impact factor: 6.725