
Statistical evaluation of pairwise protein sequence comparison with the Bayesian bootstrap.

Gavin A Price, Gavin E Crooks, Richard E Green, Steven E Brenner.

Abstract

MOTIVATION: Protein sequence comparison methods are routinely used to infer the intricate network of evolutionary relationships found within the rapidly growing library of protein sequences, and thereby to predict the structure and function of uncharacterized proteins. In the present study, we detail an improved statistical benchmark of pairwise protein sequence comparison algorithms. We use bootstrap resampling techniques to determine standard statistical errors and to estimate the confidence of our conclusions. We show that the underlying structure within benchmark databases causes Efron's standard, non-parametric bootstrap to be biased. Consequently, the standard bootstrap underpredicts average performance when used in the context of evaluating sequence comparison methods. We have developed, as an alternative, an unbiased statistical evaluation based on the Bayesian bootstrap, a resampling method operationally similar to the standard bootstrap.
RESULTS: We apply our analysis to the comparative study of amino acid substitution matrix families and find that using modern matrices results in a small but statistically significant improvement in remote homology detection compared with the classic PAM and BLOSUM matrices.
AVAILABILITY: The sequence sets and code for performing these analyses are available from http://compbio.berkeley.edu/.
CONTACT: brenner@compbio.berkeley.edu.
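
The abstract contrasts Efron's standard bootstrap with the Bayesian bootstrap without showing how the two differ operationally. The sketch below is not the authors' code; it illustrates the usual textbook formulation, in which the standard bootstrap weights each item by how often it is drawn when resampling with replacement, while the Bayesian bootstrap draws continuous weights from a flat Dirichlet distribution. The function names, the toy `scores` list, and the use of a weighted mean as the statistic are all assumptions made for illustration; the paper's actual performance measure is not described in the abstract.

```python
import random


def standard_bootstrap_weights(n, rng):
    """Efron-style bootstrap: draw n items with replacement; weight = count / n."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return [c / n for c in counts]


def bayesian_bootstrap_weights(n, rng):
    """Bayesian bootstrap: weights from a flat Dirichlet(1, ..., 1).

    Normalizing independent Gamma(1, 1) draws yields a Dirichlet sample.
    """
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def weighted_mean(values, weights):
    """Illustrative statistic (an assumption, not the paper's measure)."""
    return sum(v * w for v, w in zip(values, weights))


def bootstrap_std_error(values, weight_fn, replicates=1000, seed=0):
    """Return (mean, standard deviation) of the statistic over the replicates."""
    rng = random.Random(seed)
    n = len(values)
    stats = [weighted_mean(values, weight_fn(n, rng)) for _ in range(replicates)]
    mean = sum(stats) / replicates
    var = sum((s - mean) ** 2 for s in stats) / (replicates - 1)
    return mean, var ** 0.5


if __name__ == "__main__":
    # Hypothetical per-sequence scores, for demonstration only.
    scores = [0.1 * i for i in range(20)]
    print(bootstrap_std_error(scores, standard_bootstrap_weights))
    print(bootstrap_std_error(scores, bayesian_bootstrap_weights))
```

One practical difference is visible in the weights themselves: a standard bootstrap replicate leaves some items with weight zero, whereas every item keeps a positive weight under the Bayesian bootstrap. Whether this is the mechanism behind the bias reported in the abstract is not stated there, so treat that connection as a guess.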

Year: 2005    PMID: 16105900    DOI: 10.1093/bioinformatics/bti627

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


  16 in total

1.  Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment.

Authors:  Eric L Peterson; Jané Kondev; Julie A Theriot; Rob Phillips
Journal:  Bioinformatics       Date:  2009-04-07       Impact factor: 6.937

2.  Testing statistical significance scores of sequence comparison methods with structure similarity.

Authors:  Tim Hulsen; Jacob de Vlieg; Jack A M Leunissen; Peter M A Groenen
Journal:  BMC Bioinformatics       Date:  2006-10-12       Impact factor: 3.169

3.  Evaluation of jackknife and bootstrap for defining confidence intervals for pairwise agreement measures.

Authors:  Ana Severiano; João A Carriço; D Ashley Robinson; Mário Ramirez; Francisco R Pinto
Journal:  PLoS One       Date:  2011-05-18       Impact factor: 3.240

4.  Lateral transfer of genes and gene fragments in prokaryotes.

Authors:  Cheong Xin Chan; Robert G Beiko; Aaron E Darling; Mark A Ragan
Journal:  Genome Biol Evol       Date:  2009-11-04       Impact factor: 3.416

5.  Hidden Markov model speed heuristic and iterative HMM search procedure.

Authors:  L Steven Johnson; Sean R Eddy; Elon Portugaly
Journal:  BMC Bioinformatics       Date:  2010-08-18       Impact factor: 3.169

6.  Bayesian models for comparative analysis integrating phylogenetic uncertainty.

Authors:  Pierre de Villemereuil; Jessie A Wells; Robert D Edwards; Simon P Blomberg
Journal:  BMC Evol Biol       Date:  2012-06-28       Impact factor: 3.260

7.  Accelerated Profile HMM Searches.

Authors:  Sean R Eddy
Journal:  PLoS Comput Biol       Date:  2011-10-20       Impact factor: 4.475

8.  Optimizing substitution matrix choice and gap parameters for sequence alignment.

Authors:  Robert C Edgar
Journal:  BMC Bioinformatics       Date:  2009-12-02       Impact factor: 3.169

9.  The effectiveness of position- and composition-specific gap costs for protein similarity searches.

Authors:  Aleksandar Stojmirović; E Michael Gertz; Stephen F Altschul; Yi-Kuo Yu
Journal:  Bioinformatics       Date:  2008-07-01       Impact factor: 6.937

10.  MACHOS: Markov clusters of homologous subsequences.

Authors:  Simon Wong; Mark A Ragan
Journal:  Bioinformatics       Date:  2008-07-01       Impact factor: 6.937
