vALId: validation of protein sequence quality based on multiple alignment data.

Laurent Bianchetti, Julie Dawn Thompson, Odile Lecompte, Frederic Plewniak, Olivier Poch.

Abstract

Sequence validation is essential for accurate phylogeny and structure/function analysis. However, among the thousands of protein sequences available in the public databases, most have been predicted in silico and have not systematically undergone quality verification. It has recently become evident that they often contain sequence errors. To address the problem of automatic protein quality control, we have developed vALId, an interactive software tool with a web interface. Taking advantage of high-quality multiple alignments of complete protein sequences (MACS), vALId first warns about the presence of suspicious insertions and deletions (indels) and divergent segments, and second, proposes corrections based on transcripts and genome contigs. In a first evaluation test, hundreds of indels and divergent segments were randomly generated in a manually refined MACS. The sensitivity (Sn) and specificity (Sp) of indel detection were excellent (0.96), while the mean Sn (0.49) and Sp (0.56) of divergent segment delineation depended on the percent identity between sequence neighbors. In a second test, 6195 sequences in 100 MACS corresponding to different functional and structural protein families were analyzed. Of these sequences, 65% were in silico predictions, and 44% of the predicted eukaryotic proteins were partially incorrect, with at least one suspicious indel or divergent segment.
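The abstract does not describe how vALId detects indels or how Sn and Sp were computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a suspicious indel is a run of alignment columns where one sequence disagrees with the gap state of most other sequences, and that Sp follows the gene-prediction convention TP / (TP + FP). The function names `flag_indels` and `sn_sp` and the parameters `min_len` and `majority` are hypothetical.

```python
from typing import List, Set, Tuple

GAP = "-"


def flag_indels(alignment: List[str], target: int,
                min_len: int = 3, majority: float = 0.8
                ) -> List[Tuple[int, int, str]]:
    """Flag runs of columns where the target row disagrees with the gap
    state of most other rows. Returns (start, end, kind), end exclusive.
    Thresholds are illustrative assumptions, not values from the paper."""
    others = [row for i, row in enumerate(alignment) if i != target]
    if not others:
        raise ValueError("need at least two aligned sequences")
    labels = []
    for col, residue in enumerate(alignment[target]):
        gap_frac = sum(o[col] == GAP for o in others) / len(others)
        if residue != GAP and gap_frac >= majority:
            labels.append("insertion")   # target has residues, others gapped
        elif residue == GAP and (1 - gap_frac) >= majority:
            labels.append("deletion")    # target gapped, others have residues
        else:
            labels.append("")
    # Merge consecutive columns carrying the same label into runs.
    runs = []
    start = 0
    for col in range(1, len(labels) + 1):
        if col == len(labels) or labels[col] != labels[start]:
            if labels[start] and col - start >= min_len:
                runs.append((start, col, labels[start]))
            start = col
    return runs


def sn_sp(predicted: Set[int], truth: Set[int]) -> Tuple[float, float]:
    """Column-level Sn = TP / (TP + FN) and Sp = TP / (TP + FP).
    The Sp definition is an assumption; the abstract does not state it."""
    tp = len(predicted & truth)
    sn = tp / len(truth) if truth else 0.0
    sp = tp / len(predicted) if predicted else 0.0
    return sn, sp


if __name__ == "__main__":
    toy = [
        "MKTAYIAKQR",
        "MKTAYIAKQR",
        "MKT----KQR",  # row 2 lacks four residues present in all other rows
        "MKTAYIAKQR",
    ]
    print(flag_indels(toy, target=2))
    print(sn_sp({3, 4, 5, 6}, {3, 4, 5, 6, 7}))
```

Running `flag_indels` on each row of an alignment in turn gives a per-sequence list of candidate indels. Comparing the flagged columns with artificially introduced errors, as in the first evaluation test described above, is one way such Sn and Sp figures could be obtained.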

Year: 2005    PMID: 16078368    DOI: 10.1142/s0219720005001326

Source DB: PubMed    Journal: J Bioinform Comput Biol    ISSN: 0219-7200    Impact factor: 1.122


  4 in total

1.  A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives.

Authors:  Julie D Thompson; Benjamin Linard; Odile Lecompte; Olivier Poch
Journal:  PLoS One       Date:  2011-03-31       Impact factor: 3.240

2.  ICDS database: interrupted CoDing sequences in prokaryotic genomes.

Authors:  Emmanuel Perrodou; Caroline Deshayes; Jean Muller; Christine Schaeffer; Alain Van Dorsselaer; Raymond Ripp; Olivier Poch; Jean-Marc Reyrat; Odile Lecompte
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

3.  Strategies for reliable exploitation of evolutionary concepts in high throughput biology.

Authors:  Anthony Levasseur; Pierre Pontarotti; Olivier Poch; Julie D Thompson
Journal:  Evol Bioinform Online       Date:  2008-05-08       Impact factor: 1.625

4.  SAGETTARIUS: a program to reduce the number of tags mapped to multiple transcripts and to plan SAGE sequencing stages.

Authors:  Laurent Bianchetti; Yan Wu; Eric Guerin; Frédéric Plewniak; Olivier Poch
Journal:  Nucleic Acids Res       Date:  2007-09-20       Impact factor: 16.971
