Fundamentals of massive automatic pairwise alignments of protein sequences: theoretical significance of Z-value statistics.

Olivier Bastien, Jean-Christophe Aude, Sylvaine Roy, Eric Maréchal.

Abstract

MOTIVATION: Different automatic methods of sequence alignment are routinely used as a starting point for homology searches and function inference. Confidence in an alignment probability is one of the major fundamentals of massive automatic genome-scale pairwise comparisons, whether for clustering of putative orthologs and paralogs, annotation of sequenced genomes or construction of multi-genome trees. Extreme value distributions based on the Karlin-Altschul model, usually advised for large-scale comparisons, are not always valid, particularly when non-biased genomes are compared with nucleotide-biased genomes (such as that of Plasmodium falciparum). Z-value estimates based on Monte Carlo techniques can be calculated experimentally for any alignment output, whatever the method used. Empirically, a Z-value higher than approximately 8 is considered sufficient to conclude that an alignment score is significant, but this arbitrary figure was never theoretically justified.
RESULTS: In this paper, we used the Bienaymé-Chebyshev inequality to prove a theorem on the upper limit of an alignment score probability (or P-value). This theorem implies that a computed Z-value is both a statistical test and a single-linkage clustering criterion, and that 1/Z-value² is an upper limit to the probability of an alignment score, whatever the actual probability law is. Therefore, this study provides the missing theoretical link between a Z-value cut-off used for automatic clustering of putative orthologs and/or paralogs and the corresponding statistical risk in such genome-scale comparisons (using non-biased or biased genomes).
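
The relationship described in the abstract can be illustrated with a short sketch. It estimates a Z-value by Monte Carlo shuffling and then applies the Bienaymé-Chebyshev inequality, P(|S − μ| ≥ Zσ) ≤ 1/Z², to bound the P-value. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the toy match/mismatch/gap scores, the choice to shuffle only the second sequence, the number of shuffles and the example sequences.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a toy linear scoring scheme."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def z_value(a, b, n_shuffles=200, seed=0, score=local_alignment_score):
    """Monte Carlo Z-value: (observed - mean) / std over shuffles of b."""
    rng = random.Random(seed)
    observed = score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # preserves the composition of b, destroys its order
        null_scores.append(score(a, "".join(letters)))
    mu = statistics.mean(null_scores)
    sigma = statistics.pstdev(null_scores)
    if sigma == 0:
        raise ValueError("shuffled scores have zero variance; Z-value undefined")
    return (observed - mu) / sigma


def chebyshev_p_bound(z):
    """Upper bound 1/Z^2 on P(score >= observed); only informative for Z > 1."""
    if z <= 0:
        return 1.0
    return min(1.0, 1.0 / (z * z))


if __name__ == "__main__":
    # Hypothetical sequences, chosen only to make the example run.
    seq_a = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    seq_b = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    z = z_value(seq_a, seq_b)
    print(f"Z-value: {z:.2f}, P-value upper bound: {chebyshev_p_bound(z):.4g}")
```

With the empirical cut-off of Z = 8 mentioned above, the bound is 1/64, or about 0.016. Because the bound holds for any score distribution with finite variance, it is conservative: the actual P-value under a specific law may be much smaller.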

Substances:
Amino Acids
Proteins

Year: 2004 PMID: 14990449 DOI: 10.1093/bioinformatics/btg440

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

10 in total

1. R. S. WebTool, a web server for random sampling-based significance evaluation of pairwise distances.

Authors: Florent Villiers; Olivier Bastien; June M Kwak
Journal: Nucleic Acids Res Date: 2014-05-30 Impact factor: 16.971

2. Testing statistical significance scores of sequence comparison methods with structure similarity.

Authors: Tim Hulsen; Jacob de Vlieg; Jack A M Leunissen; Peter M A Groenen
Journal: BMC Bioinformatics Date: 2006-10-12 Impact factor: 3.169

3. CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions.

Authors: Yongchao Liu; Bertil Schmidt; Douglas L Maskell
Journal: BMC Res Notes Date: 2010-04-06

4. The cyst-dividing bacterium Ramlibacter tataouinensis TTB310 genome reveals a well-stocked toolbox for adaptation to a desert environment.

Authors: Gilles De Luca; Mohamed Barakat; Philippe Ortet; Sylvain Fochesato; Cécile Jourlin-Castelli; Mireille Ansaldi; Béatrice Py; Gwennaele Fichant; Pedro M Coutinho; Romé Voulhoux; Olivier Bastien; Eric Maréchal; Bernard Henrissat; Yves Quentin; Philippe Noirot; Alain Filloux; Vincent Méjean; Michael S DuBow; Frédéric Barras; Valérie Barbe; Jean Weissenbach; Irina Mihalcescu; André Verméglio; Wafa Achouak; Thierry Heulin
Journal: PLoS One Date: 2011-09-01 Impact factor: 3.240

5. A simple derivation of the distribution of pairwise local protein sequence alignment scores.

Authors: Olivier Bastien
Journal: Evol Bioinform Online Date: 2008-02-14 Impact factor: 1.625

6. Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space? (Review)

Authors: Lyn-Marie Birkholtz; Olivier Bastien; Gordon Wells; Delphine Grando; Fourie Joubert; Vinod Kasam; Marc Zimmermann; Philippe Ortet; Nicolas Jacq; Nadia Saïdani; Sylvaine Roy; Martin Hofmann-Apitius; Vincent Breton; Abraham I Louw; Eric Maréchal
Journal: Malar J Date: 2006-11-17 Impact factor: 2.979

7. Application of the MAHDS Method for Multiple Alignment of Highly Diverged Amino Acid Sequences.

Authors: Dimitrii O Kostenko; Eugene V Korotkov
Journal: Int J Mol Sci Date: 2022-03-29 Impact factor: 5.923

8. A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities.

Authors: Olivier Bastien; Philippe Ortet; Sylvaine Roy; Eric Maréchal
Journal: BMC Bioinformatics Date: 2005-03-10 Impact factor: 3.169

9. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

Authors: Olivier Bastien; Eric Maréchal
Journal: BMC Bioinformatics Date: 2008-08-07 Impact factor: 3.169

10. PreLnc: An Accurate Tool for Predicting lncRNAs Based on Multiple Features.

Authors: Lei Cao; Yupeng Wang; Changwei Bi; Qiaolin Ye; Tongming Yin; Ning Ye
Journal: Genes (Basel) Date: 2020-08-23 Impact factor: 4.096
