Benjamin P Blackburne (1), Simon Whelan. (1) Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK.
Abstract
MOTIVATION: Multiple sequence alignment (MSA) is a core method in bioinformatics. The accuracy of such alignments may influence the success of downstream analyses such as phylogenetic inference, protein structure prediction, and functional prediction. The importance of MSA has led to a proliferation of MSA methods, each with its own objective function and heuristics for searching for the optimal MSA. Different methods of inferring MSAs produce different results in all but the most trivial cases. By measuring the differences between inferred alignments, we may be able to develop an understanding of how these differences (i) relate to the objective functions and heuristics used in MSA methods, and (ii) affect downstream analyses.

RESULTS: We introduce four metrics for comparing MSAs, which can take into account the position in a sequence where a gap occurs or the location on a phylogenetic tree where an insertion or deletion (indel) event occurs. We use both real and synthetic data to explore the information given by these metrics and demonstrate how the metrics in combination can yield more information about MSA methods and the differences between them.

AVAILABILITY: MetAl is a free software implementation of these metrics in Haskell. Source and binaries for Windows, Linux and Mac OS X are available from http://kumiho.smith.man.ac.uk/whelan/software/metal/.
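As a rough illustration of what measuring the difference between two alignments of the same sequences can mean, the Python sketch below reduces each alignment to the set of residue pairs it places in the same column and reports the fraction of pairs on which the two alignments disagree. This is an assumption made purely for illustration: it is not the definition of any of the four metrics introduced here, nor MetAl's implementation. The names `homology_pairs` and `pair_distance` and the toy alignments are hypothetical. The sketch ignores gaps entirely, so it captures neither the position at which a gap occurs nor where an indel falls on a tree.

```python
from itertools import combinations

GAP = "-"


def homology_pairs(alignment):
    """Return the set of residue pairs an alignment places in the same column.

    `alignment` maps sequence name -> gapped row; all rows have equal length.
    Each pair is ((name_a, index_a), (name_b, index_b)), where the indices
    count residues in the ungapped sequences. Gaps contribute no pairs.
    """
    names = sorted(alignment)
    lengths = {len(alignment[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same length")

    next_index = {name: 0 for name in names}  # residues seen so far per row
    pairs = set()
    for col in range(lengths.pop()):
        present = []
        for name in names:
            if alignment[name][col] != GAP:
                present.append((name, next_index[name]))
                next_index[name] += 1
        # Every two residues sharing a column are asserted to be homologous.
        pairs.update(combinations(present, 2))
    return pairs


def pair_distance(aln_a, aln_b):
    """Jaccard-style distance between the homology-pair sets of two alignments.

    Illustrative only: 0.0 means the alignments agree on every residue pair,
    1.0 means they share none.
    """
    pairs_a = homology_pairs(aln_a)
    pairs_b = homology_pairs(aln_b)
    union = pairs_a | pairs_b
    if not union:
        return 0.0
    return len(pairs_a ^ pairs_b) / len(union)


# Toy example: the same two sequences aligned with the gap in different columns.
aln1 = {"s1": "AC-GT", "s2": "ACCGT"}
aln2 = {"s1": "ACG-T", "s2": "ACCGT"}
print(pair_distance(aln1, aln1))  # identical alignments
print(pair_distance(aln1, aln2))  # gap shifted by one column
```

In the toy example the two alignments agree on three residue pairs and each asserts one pair that the other does not, so the second call reports two disagreements out of five distinct pairs. A gap-blind measure like this cannot distinguish alignments that differ only in where gaps are placed relative to unaligned residues, which is one reason gap-aware comparisons are of interest.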