Olivier Bastien¹, Jean-Christophe Aude, Sylvaine Roy, Eric Maréchal.
¹ Laboratoire de Physiologie Cellulaire Végétale, Département Réponse et Dynamique Cellulaire, UMR 5168 CNRS-CEA-INRA-Université J. Fourier, CEA Grenoble, 17 rue des Martyrs, F-38054, Grenoble cedex 09, France.
Abstract
MOTIVATION: Various automatic sequence alignment methods are routinely used as a starting point for homology searches and function inference. Confidence in the probability of an alignment is fundamental to massive automatic genome-scale pairwise comparisons, whether for clustering putative orthologs and paralogs, annotating sequenced genomes or constructing multi-genome trees. Extreme value distributions based on the Karlin-Altschul model, usually advised for large-scale comparisons, are not always valid, particularly when non-biased genomes are compared with nucleotide-biased genomes (such as that of Plasmodium falciparum). Z-value estimates based on Monte Carlo techniques can be calculated experimentally for any alignment output, whatever the method used. Empirically, a Z-value higher than approximately 8 is considered a reasonable threshold for deeming an alignment score significant, but this arbitrary figure has never been theoretically justified.
RESULTS: In this paper, we used the Bienaymé-Chebyshev inequality to prove a theorem on the upper limit of an alignment score probability (or P-value). This theorem implies that a computed Z-value is a statistical test and a single-linkage clustering criterion, and that 1/(Z-value)^2 is an upper limit to the probability of an alignment score, whatever the actual probability law is. Therefore, this study provides the missing theoretical link between a Z-value cut-off used for automatic clustering of putative orthologs and/or paralogs and the corresponding statistical risk in such genome-scale comparisons (using non-biased or biased genomes).
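The two quantities discussed in the abstract, a Monte Carlo Z-value and the 1/(Z-value)^2 upper limit on the P-value, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy scoring scheme (match 2, mismatch -1, linear gap -2), the choice to shuffle only the second sequence, the number of shuffles, and the function names `smith_waterman_score`, `z_value` and `p_value_upper_bound`. The bound follows from the Bienaymé-Chebyshev inequality: for Z > 0, P(S >= s) <= P(|S - mean| >= Z * sd) <= 1/Z^2. In this sketch the mean and standard deviation are themselves estimated from the shuffled scores.

```python
import random
import statistics


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (toy scoring)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_value(a, b, n_shuffles=200, seed=0):
    """Monte Carlo Z-value: (observed - mean) / sd over shuffled versions of b."""
    rng = random.Random(seed)
    observed = smith_waterman_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps the composition of b, destroys its order
        shuffled_scores.append(smith_waterman_score(a, "".join(letters)))
    mean = statistics.fmean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mean) / sd


def p_value_upper_bound(z):
    """Distribution-free bound 1/Z^2; it is only informative when Z > 1."""
    if z <= 0:
        return 1.0
    return min(1.0, 1.0 / (z * z))


if __name__ == "__main__":
    seq = "MKVLAAGIVGLLLAQPSFAADTTQKEMVRWYNSH"  # arbitrary example sequence
    z = z_value(seq, seq)
    print(f"Z-value = {z:.1f}, P-value upper bound = {p_value_upper_bound(z):.4g}")
```

With this bound, the empirical cut-off of Z = 8 corresponds to a P-value of at most 1/64, about 0.016, regardless of the underlying score distribution.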