Estimating and evaluating the statistics of gapped local-alignment scores.

Timothy L Bailey, Michael Gribskov.

Abstract

We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.
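The abstract does not spell out the score model. The Python code below is therefore only a minimal sketch of maximum-likelihood fitting under one assumption: that scores of unrelated sequences follow the standard Karlin-Altschul length-adjusted extreme value form P(S ≥ x) = 1 − exp(−K·m·n·e^(−λx)), where m is the query length and n the target length. For fixed λ, the likelihood gives K in closed form, so only a one-dimensional root search over λ remains. The function names (`fit_evd_ml`, `p_value`, `fit_ignoring_small_evalues`, `simulate_null_scores`), the E-value cutoff of 0.01, and the definition E = (number of scores) × p-value are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random


def fit_evd_ml(scores, lengths, query_len):
    """Maximum-likelihood estimate of (lambda, K), ASSUMING the length-adjusted
    Gumbel model P(S >= x) = 1 - exp(-K*m*n*exp(-lambda*x)),
    with m = query_len and n = target sequence length."""
    n_obs = len(scores)
    if n_obs < 2:
        raise ValueError("need at least two scores")
    log_n = [math.log(n) for n in lengths]
    sum_s = sum(scores)

    def weighted_stats(lam):
        # Weights n_i * exp(-lambda * s_i), computed in log space
        # (log-sum-exp) so that high scores do not underflow to zero.
        w = [ln - lam * s for ln, s in zip(log_n, scores)]
        w_max = max(w)
        e = [math.exp(wi - w_max) for wi in w]
        total = sum(e)
        mean_s = sum(ei * s for ei, s in zip(e, scores)) / total
        return mean_s, w_max + math.log(total)

    def profile_score(lam):
        # Derivative of the log-likelihood after substituting the closed-form
        # K(lambda); it is strictly decreasing in lambda, so bisection finds its root.
        mean_s, _ = weighted_stats(lam)
        return n_obs / lam - sum_s + n_obs * mean_s

    lo, hi = 1e-6, 1.0
    while profile_score(hi) > 0:
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("scores are too degenerate to fit")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if profile_score(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    # For fixed lambda, the ML estimate is K = N / (m * sum_i n_i * exp(-lambda * s_i)).
    _, log_sum = weighted_stats(lam)
    k = math.exp(math.log(n_obs) - math.log(query_len) - log_sum)
    return lam, k


def p_value(score, length, query_len, lam, k):
    """Per-sequence p-value P(S >= score) under the fitted (assumed) model."""
    return -math.expm1(-k * query_len * length * math.exp(-lam * score))


def fit_ignoring_small_evalues(scores, lengths, query_len, e_cut=0.01, max_iter=10):
    """Refit repeatedly, leaving out scores whose E-value is below e_cut.
    ASSUMPTION: E-value = (number of scores in the search) * p-value."""
    db_size = len(scores)
    keep = list(range(db_size))
    for _ in range(max_iter):
        used = keep
        lam, k = fit_evd_ml([scores[i] for i in used],
                            [lengths[i] for i in used], query_len)
        keep = [i for i in range(db_size)
                if db_size * p_value(scores[i], lengths[i], query_len, lam, k) >= e_cut]
        if keep == used:
            break
    return lam, k, len(used)


def simulate_null_scores(count, lam, k, query_len, min_len, max_len, seed=0):
    """Draw synthetic unrelated-sequence scores from the assumed model by
    inverting its CDF; intended only for checking the fit."""
    rng = random.Random(seed)
    scores, lengths = [], []
    while len(scores) < count:
        u = rng.random()
        if u == 0.0:
            continue  # -log(u) requires u in (0, 1)
        n = rng.randint(min_len, max_len)
        scores.append((math.log(k * query_len * n) - math.log(-math.log(u))) / lam)
        lengths.append(n)
    return scores, lengths
```

Given one query's scores against every database sequence, together with the target lengths, `fit_ignoring_small_evalues` returns λ, K and the number of scores it used, and `p_value` then gives length-adjusted p-values. The iterative cutoff is one simple reading of "ignoring scores with small E-values during estimation"; the two-component mixture model fitted by expectation maximization that the abstract also mentions is not shown.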

Year:  2002        PMID: 12162893     DOI: 10.1089/106652702760138637

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  6 in total

1.  Statistical calibration of the SEQUEST XCorr function.

Authors:  Aaron A Klammer; Christopher Y Park; William Stafford Noble
Journal:  J Proteome Res       Date:  2009-04       Impact factor: 4.466

2.  Combining High-Resolution and Exact Calibration To Boost Statistical Power: A Well-Calibrated Score Function for High-Resolution MS2 Data.

Authors:  Andy Lin; J Jeffry Howbert; William Stafford Noble
Journal:  J Proteome Res       Date:  2018-10-18       Impact factor: 4.466

3.  Detecting transcriptionally active regions using genomic tiling arrays.

Authors:  Gabor Halasz; Marinus F van Batenburg; Joelle Perusse; Sujun Hua; Xiang-Jun Lu; Kevin P White; Harmen J Bussemaker
Journal:  Genome Biol       Date:  2006       Impact factor: 13.583

4.  RSEARCH: finding homologs of single structured RNA sequences.

Authors:  Robert J Klein; Sean R Eddy
Journal:  BMC Bioinformatics       Date:  2003-09-22       Impact factor: 3.169

5.  Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts.

Authors:  Ferhat Ay; Timothy L Bailey; William Stafford Noble
Journal:  Genome Res       Date:  2014-02-05       Impact factor: 9.043

6.  Physicochemical property consensus sequences for functional analysis, design of multivalent antigens and targeted antivirals.

Authors:  Catherine H Schein; David M Bowen; Jessica A Lewis; Kyung Choi; Aniko Paul; Gerbrand J van der Heden van Noort; Wenzhe Lu; Dmitri V Filippov
Journal:  BMC Bioinformatics       Date:  2012-08-24       Impact factor: 3.169