SIBIS: a Bayesian model for inconsistent protein sequence estimation.

Walyd Khenoussi, Renaud Vanhoutrève, Olivier Poch, Julie D Thompson.

Abstract

MOTIVATION: The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors.
RESULTS: We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that its sensitivity is significantly higher than that of previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences.
AVAILABILITY AND IMPLEMENTATION: Source code, implemented in C on a Linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
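
The RESULTS paragraph mentions estimating amino acid probabilities with a Bayesian framework and Dirichlet mixture models. The abstract does not give the actual SIBIS scoring scheme, so the following is only a minimal sketch of the general technique: the posterior mean estimate of residue probabilities for one alignment column under a Dirichlet mixture prior. The two mixture components, their weights, the function names and the 0.01 threshold are all hypothetical values chosen for illustration; they are not the published SIBIS parameters.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical two-component prior (illustrative only, not SIBIS parameters):
# one flat component and one that favours hydrophobic residues.
UNIFORM = [1.0] * 20
HYDROPHOBIC = [5.0 if aa in "AILMFVW" else 0.1 for aa in AMINO_ACIDS]
WEIGHTS = [0.5, 0.5]
ALPHAS = [UNIFORM, HYDROPHOBIC]


def log_marginal(counts, alpha):
    """Log P(counts | alpha) for a Dirichlet-multinomial, omitting the
    multinomial coefficient, which is the same for every component."""
    n, a = sum(counts), sum(alpha)
    total = math.lgamma(a) - math.lgamma(n + a)
    for c, x in zip(counts, alpha):
        total += math.lgamma(c + x) - math.lgamma(x)
    return total


def posterior_mean(counts, weights, alphas):
    """Posterior mean residue probabilities under a Dirichlet mixture prior."""
    logs = [math.log(w) + log_marginal(counts, al)
            for w, al in zip(weights, alphas)]
    top = max(logs)  # subtract the maximum for numerical stability
    resp = [math.exp(v - top) for v in logs]
    norm = sum(resp)
    resp = [r / norm for r in resp]  # posterior weight of each component

    n = sum(counts)
    probs = [0.0] * len(counts)
    for r, al in zip(resp, alphas):
        a = sum(al)
        for i, (c, x) in enumerate(zip(counts, al)):
            probs[i] += r * (c + x) / (n + a)
    return probs


def column_probabilities(column):
    """Map each amino acid to its estimated probability for one column."""
    counts = [column.count(aa) for aa in AMINO_ACIDS]
    return dict(zip(AMINO_ACIDS, posterior_mean(counts, WEIGHTS, ALPHAS)))


def is_inconsistent(residue, column, threshold=0.01):
    """Flag a residue whose estimated probability is below a hypothetical
    threshold, given the residues seen in the other aligned sequences."""
    return column_probabilities(column)[residue] < threshold


if __name__ == "__main__":
    neighbours = "LLLIVLLM"  # toy column taken from the other sequences
    for query in "LD":
        print(query, is_inconsistent(query, neighbours))
```

In this toy column the hydrophobic component dominates the posterior, so an aspartate (D) receives a much lower estimated probability than a leucine (L). A real detector would presumably combine such per-column evidence over a window of positions to delimit an inconsistent segment, but that step is not described in the abstract and is not attempted here.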

Year: 2014   PMID: 24825613   DOI: 10.1093/bioinformatics/btu329

Source DB: PubMed   Journal: Bioinformatics   ISSN: 1367-4803   Impact factor: 6.937


  3 in total

1.  A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms.

Authors:  Nicolas Scalzitti; Anne Jeannin-Girardon; Pierre Collet; Olivier Poch; Julie D Thompson
Journal:  BMC Genomics       Date:  2020-04-09       Impact factor: 3.969

2.  Understanding the causes of errors in eukaryotic protein-coding gene prediction: a case study of primate proteomes.

Authors:  Corentin Meyer; Nicolas Scalzitti; Anne Jeannin-Girardon; Pierre Collet; Olivier Poch; Julie D Thompson
Journal:  BMC Bioinformatics       Date:  2020-11-10       Impact factor: 3.169

3.  LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system.

Authors:  Renaud Vanhoutreve; Arnaud Kress; Baptiste Legrand; Hélène Gass; Olivier Poch; Julie D Thompson
Journal:  BMC Bioinformatics       Date:  2016-07-07       Impact factor: 3.169
