Statistical significance in biological sequence analysis.

Alexander Yu Mitrophanov, Mark Borodovsky.

Abstract

One of the major goals of computational sequence analysis is to find sequence similarities, which could serve as evidence of structural and functional conservation, as well as of evolutionary relations among the sequences. Since the degree of similarity is usually assessed by the sequence alignment score, it is necessary to know if a score is high enough to indicate a biologically interesting alignment. A powerful approach to defining score cutoffs is based on the evaluation of the statistical significance of alignments. The statistical significance of an alignment score is frequently assessed by its P-value, which is the probability that this score or a higher one can occur simply by chance, given the probabilistic models for the sequences. In this review we discuss the general role of P-value estimation in sequence analysis, and give a description of theoretical methods and computational approaches to the estimation of statistical significance for important classes of sequence analysis problems. In particular, we concentrate on the P-value estimation techniques for single sequence studies (both score-based and score-free), global and local pairwise sequence alignments, multiple alignments, sequence-to-profile alignments and alignments built with hidden Markov models. We anticipate that the review will be useful both to researchers working professionally in bioinformatics and to biomedical scientists interested in using contemporary methods of DNA and protein sequence analysis.
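
The P-value definition in the abstract can be illustrated with a small Monte Carlo sketch. It is not taken from the review, which covers theoretical and computational estimation methods; it only shows the idea of asking how often a chance sequence scores at least as high as the observed alignment. Everything specific here is an assumption made for illustration: the null model (random shuffling of the second sequence), the scoring parameters (`match`, `mismatch`, `gap`), and the function names `local_alignment_score` and `empirical_p_value`.

```python
import random


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring parameters are illustrative assumptions, not values
    recommended by the review.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def empirical_p_value(a, b, n_shuffles=200, seed=0):
    """Estimate P(score >= observed) under a shuffling null model.

    Assumed null model: the letters of b are randomly permuted, which
    preserves composition but destroys order. The (hits + 1) / (n + 1)
    estimator keeps the estimate strictly above zero.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if local_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    score, p = empirical_p_value("ACGTTGCAAGCTAGCTAGGA", "ACGTTGCAAGCTAGCTAGGA")
    print(f"score={score}, estimated P-value={p:.4f}")
```

The smallest P-value this estimator can report is 1 / (n_shuffles + 1), so resolving very small P-values by simulation alone requires many shuffles.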

Year:  2006        PMID: 16761361     DOI: 10.1093/bib/bbk001

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  19 in total

1.  PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences.

Authors:  K Ganesan; S Parthasarathy
Journal:  J Struct Funct Genomics       Date:  2011-12-03

2.  Statistical significance of optical map alignments.

Authors:  Deepayan Sarkar; Steve Goldstein; David C Schwartz; Michael A Newton
Journal:  J Comput Biol       Date:  2012-04-16       Impact factor: 1.479

3.  Importance sampling of word patterns in DNA and protein sequences.

Authors:  Hock Peng Chan; Nancy Ruonan Zhang; Louis H Y Chen
Journal:  J Comput Biol       Date:  2010-12       Impact factor: 1.479

4.  Significance of gapped sequence alignments.

Authors:  Lee A Newberg
Journal:  J Comput Biol       Date:  2008-11       Impact factor: 1.479

5.  Discovering functional modules by identifying recurrent and mutually exclusive mutational patterns in tumors.

Authors:  Christopher A Miller; Stephen H Settle; Erik P Sulman; Kenneth D Aldape; Aleksandar Milosavljevic
Journal:  BMC Med Genomics       Date:  2011-04-14       Impact factor: 3.063

6.  Detection of distant evolutionary relationships between protein families using theory of sequence profile-profile comparison.

Authors:  Mindaugas Margelevicius; Ceslovas Venclovas
Journal:  BMC Bioinformatics       Date:  2010-02-17       Impact factor: 3.169

7.  From IMGT-ONTOLOGY to IMGT/LIGMotif: the IMGT standardized approach for immunoglobulin and T cell receptor gene identification and description in large genomic sequences.

Authors:  Jérôme Lane; Patrice Duroux; Marie-Paule Lefranc
Journal:  BMC Bioinformatics       Date:  2010-04-30       Impact factor: 3.169

8.  Error statistics of hidden Markov model and hidden Boltzmann model results.

Authors:  Lee A Newberg
Journal:  BMC Bioinformatics       Date:  2009-07-09       Impact factor: 3.307

9.  MISCORE: a new scoring function for characterizing DNA regulatory motifs in promoter sequences.

Authors:  Dianhui Wang; Sarwar Tapan
Journal:  BMC Syst Biol       Date:  2012-12-12

10.  Next-generation phylogenomics.

Authors:  Cheong Xin Chan; Mark A Ragan
Journal:  Biol Direct       Date:  2013-01-22       Impact factor: 4.540
