Fundamentals of massive automatic pairwise alignments of protein sequences: theoretical significance of Z-value statistics.

Olivier Bastien, Jean-Christophe Aude, Sylvaine Roy, Eric Maréchal.

Abstract

MOTIVATION: Different automatic methods of sequence alignment are routinely used as a starting point for homology searches and function inference. Confidence in an alignment probability is one of the major fundamentals of massive automatic genome-scale pairwise comparisons, for clustering of putative orthologs and paralogs, sequenced genome annotation or multiple-genomic tree constructions. Extreme value distributions based on the Karlin-Altschul model, usually advised for large-scale comparisons, are not always valid, particularly when non-biased genomes are compared with nucleotide-biased genomes (such as that of Plasmodium falciparum). Z-value estimates based on Monte Carlo techniques can be calculated experimentally for any alignment output, whatever the method used. Empirically, a Z-value higher than approximately 8 is considered reasonable for judging an alignment score significant, but this arbitrary figure has never been theoretically justified.
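To make the Monte Carlo estimate concrete, the following is a minimal sketch of how a Z-value might be computed for one pair of sequences. It is not the authors' implementation. The function names `local_alignment_score` and `z_value`, the toy match/mismatch/gap scores, the number of shuffles and the choice to shuffle only the second sequence are all assumptions made for illustration; a real pipeline would use a substitution matrix and a production aligner.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a toy scoring scheme (assumed)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_value(a, b, n_shuffles=200, seed=0, score=local_alignment_score):
    """Estimate Z = (s - mean) / sd, with mean and sd taken over shuffles of b."""
    rng = random.Random(seed)
    observed = score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # shuffling keeps the composition of b unchanged
        shuffled_scores.append(score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; Z is undefined")
    return (observed - mean) / sd
```

Because the shuffled sequences keep the amino-acid composition of the original, the estimate does not rely on a particular score distribution, which is the property the abstract highlights for compositionally biased genomes.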
RESULTS: In this paper, we used the Bienaymé-Chebyshev inequality to demonstrate a theorem on the upper limit of an alignment score probability (or P-value). This theorem implies that a computed Z-value is both a statistical test and a single-linkage clustering criterion, and that 1/(Z-value)² is an upper limit to the probability of an alignment score, whatever the actual probability law is. Therefore, this study provides the missing theoretical link between a Z-value cut-off used for an automatic clustering of putative orthologs and/or paralogs, and the corresponding statistical risk in such genome-scale comparisons (using non-biased or biased genomes).
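The bound stated in the RESULTS can be sketched in one line from the Bienaymé-Chebyshev inequality. The notation is an assumption, since the abstract does not fix symbols: S is the random alignment score under shuffling, s the observed score, and μ and σ the mean and standard deviation of S, treated here as known rather than estimated.

```latex
\begin{align}
Z &= \frac{s - \mu}{\sigma}, \qquad Z > 0, \\
P(S \ge s) &= P(S - \mu \ge Z\sigma)
  \;\le\; P\bigl(|S - \mu| \ge Z\sigma\bigr)
  \;\le\; \frac{1}{Z^{2}} .
\end{align}
```

Under these assumptions, the empirical cut-off of Z = 8 corresponds to a P-value of at most 1/64, or about 0.016, whatever the score distribution is.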

Year:  2004        PMID: 14990449     DOI: 10.1093/bioinformatics/btg440

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  10 in total

1.  R. S. WebTool, a web server for random sampling-based significance evaluation of pairwise distances.

Authors:  Florent Villiers; Olivier Bastien; June M Kwak
Journal:  Nucleic Acids Res       Date:  2014-05-30       Impact factor: 16.971

2.  Testing statistical significance scores of sequence comparison methods with structure similarity.

Authors:  Tim Hulsen; Jacob de Vlieg; Jack A M Leunissen; Peter M A Groenen
Journal:  BMC Bioinformatics       Date:  2006-10-12       Impact factor: 3.169

3.  CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions.

Authors:  Yongchao Liu; Bertil Schmidt; Douglas L Maskell
Journal:  BMC Res Notes       Date:  2010-04-06

4.  The cyst-dividing bacterium Ramlibacter tataouinensis TTB310 genome reveals a well-stocked toolbox for adaptation to a desert environment.

Authors:  Gilles De Luca; Mohamed Barakat; Philippe Ortet; Sylvain Fochesato; Cécile Jourlin-Castelli; Mireille Ansaldi; Béatrice Py; Gwennaele Fichant; Pedro M Coutinho; Romé Voulhoux; Olivier Bastien; Eric Maréchal; Bernard Henrissat; Yves Quentin; Philippe Noirot; Alain Filloux; Vincent Méjean; Michael S DuBow; Frédéric Barras; Valérie Barbe; Jean Weissenbach; Irina Mihalcescu; André Verméglio; Wafa Achouak; Thierry Heulin
Journal:  PLoS One       Date:  2011-09-01       Impact factor: 3.240

5.  A simple derivation of the distribution of pairwise local protein sequence alignment scores.

Authors:  Olivier Bastien
Journal:  Evol Bioinform Online       Date:  2008-02-14       Impact factor: 1.625

6.  Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space? (Review)

Authors:  Lyn-Marie Birkholtz; Olivier Bastien; Gordon Wells; Delphine Grando; Fourie Joubert; Vinod Kasam; Marc Zimmermann; Philippe Ortet; Nicolas Jacq; Nadia Saïdani; Sylvaine Roy; Martin Hofmann-Apitius; Vincent Breton; Abraham I Louw; Eric Maréchal
Journal:  Malar J       Date:  2006-11-17       Impact factor: 2.979

7.  Application of the MAHDS Method for Multiple Alignment of Highly Diverged Amino Acid Sequences.

Authors:  Dimitrii O Kostenko; Eugene V Korotkov
Journal:  Int J Mol Sci       Date:  2022-03-29       Impact factor: 5.923

8.  A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities.

Authors:  Olivier Bastien; Philippe Ortet; Sylvaine Roy; Eric Maréchal
Journal:  BMC Bioinformatics       Date:  2005-03-10       Impact factor: 3.169

9.  Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

Authors:  Olivier Bastien; Eric Maréchal
Journal:  BMC Bioinformatics       Date:  2008-08-07       Impact factor: 3.169

10.  PreLnc: An Accurate Tool for Predicting lncRNAs Based on Multiple Features.

Authors:  Lei Cao; Yupeng Wang; Changwei Bi; Qiaolin Ye; Tongming Yin; Ning Ye
Journal:  Genes (Basel)       Date:  2020-08-23       Impact factor: 4.096
