Mind the gaps: evidence of bias in estimates of multiple sequence alignments.

Tanya Golubchik, Michael J Wise, Simon Easteal, Lars S Jermiin.

Abstract

Multiple sequence alignment (MSA) is a crucial first step in the analysis of genomic and proteomic data. Commonly occurring sequence features, such as deletions and insertions, are known to affect the accuracy of MSA programs, but the extent to which alignment accuracy is affected by the positions of insertions and deletions has not been examined independently of other sources of sequence variation. We assessed the performance of 6 popular MSA programs (ClustalW, DIALIGN-T, MAFFT, MUSCLE, PROBCONS, and T-COFFEE) and one experimental program, PRANK, on amino acid sequences that differed only by short regions of deleted residues. The analysis showed that the absence of residues often led to an incorrect placement of gaps in the alignments, even though the sequences were otherwise identical. In data sets containing sequences with partially overlapping deletions, most MSA programs preferentially aligned the gaps vertically at the expense of incorrectly aligning residues in the flanking regions. Of the programs assessed, only DIALIGN-T was able to place overlapping gaps correctly relative to one another, but this was usually context dependent and was observed only in some of the data sets. In data sets containing sequences with non-overlapping deletions, both DIALIGN-T and MAFFT (G-INS-I) were able to align gaps with near-perfect accuracy, but only MAFFT produced the correct alignment consistently. The same was true for data sets that comprised isoforms of alternatively spliced gene products: both DIALIGN-T and MAFFT produced highly accurate alignments, with MAFFT being the more consistent of the 2 programs. Other programs, notably T-COFFEE and ClustalW, were less accurate. For all data sets, alignments produced by different MSA programs differed markedly, indicating that reliance on a single MSA program may give misleading results. 
It is therefore advisable to use more than one MSA program when dealing with sequences that may contain deletions or insertions, particularly for high-throughput and pipeline applications where manual refinement of each alignment is not practicable.
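
The evaluation idea described in the abstract (sequences that are identical except for known deletions, so the true alignment is known by construction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy parent sequence, the helper names `true_alignment`, `residue_pairs`, and `sum_of_pairs_score`, and the choice of a sum-of-pairs style score. The abstract does not state which accuracy measure the authors used.

```python
from itertools import combinations


def true_alignment(parent, deletions):
    """Build the reference alignment for sequences derived from `parent`.

    `deletions` holds one half-open (start, end) range per sequence;
    (0, 0) means "no deletion". Hypothetical helper, not from the paper.
    """
    return [parent[:s] + "-" * (e - s) + parent[e:] for s, e in deletions]


def residue_pairs(rows):
    """Return the set of residue pairs that an alignment places in the
    same column, as ((seq_i, residue_idx), (seq_j, residue_idx))."""
    pairs = set()
    counters = [0] * len(rows)  # next residue index for each sequence
    for col in range(len(rows[0])):
        present = []
        for i, row in enumerate(rows):
            if row[col] != "-":
                present.append((i, counters[i]))
                counters[i] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test_rows, ref_rows):
    """Fraction of reference residue pairs recovered by the test alignment
    (an assumed metric, chosen here only for illustration)."""
    ref = residue_pairs(ref_rows)
    return len(ref & residue_pairs(test_rows)) / len(ref)


# Toy data: the full parent plus two copies with partially overlapping deletions.
reference = true_alignment("ACDEFGHIKLMN", [(0, 0), (3, 6), (5, 8)])

# A hypothetical aligner output that stacks the two gaps vertically,
# which misplaces the residues E and F of the third sequence.
stacked = ["ACDEFGHIKLMN", "ACD---HIKLMN", "ACD---EFKLMN"]

print(sum_of_pairs_score(reference, reference))  # 1.0
print(sum_of_pairs_score(stacked, reference))    # 0.92
```

The same score could be computed for the output of several MSA programs on one data set, which is one way to act on the recommendation to compare more than one program.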

Year:  2007        PMID: 17709332     DOI: 10.1093/molbev/msm176

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


  38 in total

Review 1.  Computational approaches to study the effects of small genomic variations.

Authors:  Kamil Khafizov; Maxim V Ivanov; Olga V Glazova; Sergei P Kovalenko
Journal:  J Mol Model       Date:  2015-09-08       Impact factor: 1.810

2.  High sensitivity to aligner and high rate of false positives in the estimates of positive selection in the 12 Drosophila genomes.

Authors:  Penka Markova-Raina; Dmitri Petrov
Journal:  Genome Res       Date:  2011-03-10       Impact factor: 9.043

3.  Evaluating Statistical Multiple Sequence Alignment in Comparison to Other Alignment Methods on Protein Data Sets.

Authors:  Michael Nute; Ehsan Saleh; Tandy Warnow
Journal:  Syst Biol       Date:  2019-05-01       Impact factor: 15.683

4.  Expanding the Halohydrin Dehalogenase Enzyme Family: Identification of Novel Enzymes by Database Mining.

Authors:  Marcus Schallmey; Julia Koopmeiners; Elizabeth Wells; Rainer Wardenga; Anett Schallmey
Journal:  Appl Environ Microbiol       Date:  2014-09-19       Impact factor: 4.792

5.  Taxonomic status and origin of the Egyptian weasel (Mustela subpalmata) inferred from mitochondrial DNA.

Authors:  Mónica Rodrigues; Arthur R Bos; Richard Hoath; Patrick J Schembri; Petros Lymberakis; Michele Cento; Wissem Ghawar; Sakir O Ozkurt; Margarida Santos-Reis; Juha Merilä; Carlos Fernandes
Journal:  Genetica       Date:  2016-03-09       Impact factor: 1.082

6.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240

7.  Roselliniella revealed as an overlooked genus of Hypocreales, with the description of a second species on parmelioid lichens.

Authors:  D L Hawksworth; A M Millanes; M Wedin
Journal:  Persoonia       Date:  2010-01-21       Impact factor: 11.051

8.  A Comparison of Three Molecular Markers for the Identification of Populations of Globodera pallida.

Authors:  Angelique H Hoolahan; Vivian C Blok; Tracey Gibson; Mark Dowton
Journal:  J Nematol       Date:  2012-03       Impact factor: 1.402

9.  Functional characterization of PAS and HES family bHLH transcription factors during the metamorphosis of the red flour beetle, Tribolium castaneum.

Authors:  Kavita Bitra; Anjiang Tan; Ashley Dowling; Subba R Palli
Journal:  Gene       Date:  2009-08-13       Impact factor: 3.688

10.  Gene classification based on amino acid motifs and residues: the DLX (distal-less) test case.

Authors:  Nuno A Fonseca; Cristina P Vieira; Jorge Vieira
Journal:  PLoS One       Date:  2009-06-01       Impact factor: 3.240
