Improving Retrieval Efficacy of Homology Searches Using the False Discovery Rate.

Hyrum D Carroll, Alex C Williams, Anthony G Davis, John L Spouge.   

Abstract

Over the past few decades, discovery based on sequence homology has become a widely accepted practice. Consequently, the comparative accuracy of retrieval algorithms (e.g., BLAST) has been rigorously studied for improvement. Unlike most components of retrieval algorithms, the E-value threshold criterion has yet to be thoroughly investigated. An investigation of the threshold is important because it alone dictates which sequences are declared relevant and which irrelevant. In this paper, we introduce the false discovery rate (FDR) statistic as a replacement for the uniform threshold criterion in order to improve efficacy in retrieval systems. Using NCBI's BLAST and PSI-BLAST software packages, we demonstrate the applicability of such a replacement in both non-iterative (BLAST(FDR)) and iterative (PSI-BLAST(FDR)) homology searches. For each application, we evaluated retrieval efficacy with five different multiple testing methods on a large training database. For each algorithm, we chose the best performing method, Benjamini-Hochberg, as the default statistic. As measured by the threshold average precision, BLAST(FDR) yielded 14.1 percent better retrieval performance than BLAST on a large (5,161 queries) test database, and PSI-BLAST(FDR) attained 11.8 percent better retrieval performance than PSI-BLAST. The C++ source code specific to BLAST(FDR) and PSI-BLAST(FDR) and instructions are available at http://www.cs.mtsu.edu/~hcarroll/blast_fdr/.
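
For readers unfamiliar with the Benjamini-Hochberg method named in the abstract, the following is a minimal sketch of the generic step-up procedure in Python. It is not the authors' C++ implementation. The function name `benjamini_hochberg`, the `alpha` parameter, and the assumption that each hit already carries a p-value are illustrative choices; the abstract does not say how BLAST(FDR) derives its per-hit statistics from BLAST output.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag each hit as relevant (True) or not, using the generic
    Benjamini-Hochberg step-up rule at FDR level `alpha`.

    Assumption: `p_values` holds one p-value per retrieved hit; how those
    p-values are obtained from a search tool is outside this sketch.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Rank hits from smallest to largest p-value, remembering positions.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Every hit ranked at or before the cutoff is declared relevant,
    # even if its own p-value missed its own per-rank threshold.
    relevant = [False] * m
    for idx in order[:cutoff_rank]:
        relevant[idx] = True
    return relevant


if __name__ == "__main__":
    hits = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(hits, alpha=0.05))
```

Unlike a uniform threshold, the cutoff here depends on the whole list of hits: the same p-value can be accepted in one result list and rejected in another, because each rank is compared against its own bound (k / m) * alpha.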

Year:  2015        PMID: 26357264      PMCID: PMC4568567          DOI: 10.1109/TCBB.2014.2366112

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


