Improved database searches for orthologous sequences by conditioning on outgroup sequences.

Philip J Cotter, Daniel R Caffrey, Denis C Shields.

Abstract

MOTIVATION: Searches of biological sequence databases are usually focussed on distinguishing significant from random matches. However, the increasing abundance of related sequences in databases presents a second challenge: to distinguish the evolutionarily most closely related sequences (often orthologues) from more distantly related homologues. This is particularly important when searching a database of partial sequences, where short orthologous sequences from a non-conserved region will score much more poorly than non-orthologous (outgroup) sequences from a conserved region.
RESULTS: Such inferences are shown to be improved by conditioning the search results on the scores of an outgroup sequence. The log-odds score for each target sequence identified in the database has the log-odds score of the outgroup sequence subtracted from it. A test group of Caenorhabditis elegans kinase sequences and their identified C. elegans outgroups were searched against a test database of human Expressed Sequence Tag (EST) sequences, where the sets of true target sequences were known in advance. The outgroup-conditioned method was shown to identify 58% more true positives ahead of the first false positive, compared to the straightforward search without an outgroup. A test dataset of 151 proteins drawn from the C. elegans genome, where the putative 'outgroup' was assigned automatically, similarly found 50% more true positives using outgroup conditioning. Thus, outgroup conditioning provides a means to improve the results of database searching with little increase in the search computation time.
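
The subtraction described in RESULTS can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each database hit has one log-odds score against the query and one against the outgroup, and that a hit with no outgroup score is treated as scoring 0 against the outgroup. The function names, the EST identifiers, and all scores are hypothetical and chosen only to show the re-ranking.

```python
from typing import Dict, List, Set, Tuple


def condition_on_outgroup(
    query_scores: Dict[str, float],
    outgroup_scores: Dict[str, float],
    missing_outgroup_score: float = 0.0,  # assumption: no outgroup hit means score 0
) -> List[Tuple[str, float]]:
    """Rank hits by (score against query) minus (score against outgroup)."""
    conditioned = {
        hit: score - outgroup_scores.get(hit, missing_outgroup_score)
        for hit, score in query_scores.items()
    }
    # Highest conditioned score first.
    return sorted(conditioned.items(), key=lambda item: item[1], reverse=True)


def true_positives_before_first_false(
    ranked: List[Tuple[str, float]], true_targets: Set[str]
) -> int:
    """Count known true targets ranked ahead of the first false positive."""
    count = 0
    for hit, _score in ranked:
        if hit not in true_targets:
            break
        count += 1
    return count


# Made-up toy data: est_C is a conserved-region hit that is closer to the outgroup.
query_scores = {"est_A": 40.0, "est_B": 12.0, "est_C": 55.0}
outgroup_scores = {"est_A": 20.0, "est_B": 2.0, "est_C": 60.0}
true_targets = {"est_A", "est_B"}

raw_ranked = sorted(query_scores.items(), key=lambda item: item[1], reverse=True)
conditioned_ranked = condition_on_outgroup(query_scores, outgroup_scores)

print(true_positives_before_first_false(raw_ranked, true_targets))
print(true_positives_before_first_false(conditioned_ranked, true_targets))
```

In this toy case the raw ranking places the non-orthologous hit first, while the conditioned ranking moves both true targets ahead of it. The only extra work is one additional search with the outgroup sequence and a per-hit subtraction, which is consistent with the small increase in computation time noted in the abstract.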

Substances: Protein Kinases

Year: 2002 PMID: 11836215 DOI: 10.1093/bioinformatics/18.1.83

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

1 in total

1. Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes.

Authors: Ikuo Uchiyama
Journal: Nucleic Acids Res Date: 2006-01-25 Impact factor: 16.971