Iterated sequence databank search methods.

Abstract

Iterated sequence databank search methods were assessed from the viewpoint of someone who has the sequence of a novel gene product and wishes to find distant relatives of their protein and, with the specific searches against the PDB, also hopes to find a relative of known structure. We examined three methods in detail, spanning a range from simple pattern-matching to sophisticated weighted profiles. Rather than apply these methods 'blindly' (with default parameters) to a large number of test queries, we concentrated on the globins, which allowed a more detailed investigation of each method on different data subsets with different parameter settings. Despite its widespread use, regular-expression matching proved to be very limited, seldom extending beyond the sub-family from which the pattern was derived. To attain any generality, the patterns had to be 'stripped down' to include only the most highly conserved parts. The QUEST program avoided these problems by introducing more flexible (weighted) matching. On the PDB sequences this was highly effective, missing only a few globins with probes based on each sub-family or even on a single representative from each sub-family. In addition, very few false positives were encountered, and those that did match often did so for only a few cycles before being lost again. On the larger sequence collection, however, QUEST had problems maintaining (or achieving) the alignment of the full globin family. psi-BLAST also recognised almost all the globins when matching against the PDB sequences, typically missing three or four of the most distantly related sequences while picking up a few false positives. In contrast to QUEST, psi-BLAST performed very well on the larger databank, retrieving almost a full collection of globins, although still retaining the same proportion of false positives.
SAM applied to the PDB sequences performed reasonably well with the myoglobin and hemoglobin families as probes, typically missing several of the more difficult proteins, but performed poorly with the leghemoglobin probe. Only with the full family range as a probe did it produce results comparable to psi-BLAST and QUEST. With the larger databank, SAM produced a good result but, again, this was achieved only using the full range of sequence variation with the default regulariser; the use of Dirichlet mixtures failed completely in this situation.
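
The general idea of an iterated, weighted-profile search can be illustrated with a minimal Python sketch. This is not the algorithm of QUEST, psi-BLAST or SAM: it assumes ungapped matching, a uniform background, a fixed score threshold and a simple pseudocount, and the sequences are invented toy strings rather than real globins. All function and variable names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(segments, pseudocount=1.0):
    """Column-wise log-odds scores against a uniform background (an assumption)."""
    background = 1.0 / len(ALPHABET)
    total = len(segments) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(len(segments[0])):
        counts = Counter(seg[col] for seg in segments)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def best_window(profile, seq):
    """Return (score, segment) for the best ungapped window of seq."""
    width = len(profile)
    best_score, best_segment = float("-inf"), None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col[aa] for col, aa in zip(profile, window))
        if score > best_score:
            best_score, best_segment = score, window
    return best_score, best_segment


def iterated_search(seed, databank, threshold, max_cycles=10):
    """Rebuild the profile from the current hits until the hit set stops changing."""
    members = {}       # sequence name -> matched segment
    segments = [seed]  # the probe always stays in the profile
    for _ in range(max_cycles):
        profile = build_profile(segments)
        hits = {}
        for name, seq in databank.items():
            score, segment = best_window(profile, seq)
            if segment is not None and score >= threshold:
                hits[name] = segment
        if hits == members:  # converged: no sequence gained or lost
            break
        members = hits
        segments = [seed] + list(hits.values())
    return members


# Toy data, invented for illustration only.
SEED = "HGKKV"
DATABANK = {
    "close": "MLHGKRVAD",      # contains a segment one substitution from the seed
    "distant": "SHGRRVLE",     # contains a segment two substitutions from the seed
    "unrelated": "AAAAAAAAA",
}

print(iterated_search(SEED, DATABANK, threshold=2.0, max_cycles=1))
print(iterated_search(SEED, DATABANK, threshold=2.0))
```

With these toy inputs, a single cycle reports only the close relative, while iterating also picks up the distant one, because the profile rebuilt from the first hit tolerates the extra substitution. The same mechanism is what lets false positives enter a profile, or be lost again, from one cycle to the next.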

Entities: Gene

Substances: Proteins

Year: 1999 PMID: 10404625 DOI: 10.1016/s0097-8485(99)00017-0

Source DB: PubMed Journal: Comput Chem ISSN: 0097-8485

Cited by

2 in total

1. Homology-extended sequence alignment.

Authors: V A Simossis; J Kleinjung; J Heringa
Journal: Nucleic Acids Res Date: 2005-02-07 Impact factor: 16.971

2. Reduction, alignment and visualisation of large diverse sequence families.

Authors: William R Taylor
Journal: BMC Bioinformatics Date: 2016-08-02 Impact factor: 3.169