ALTER: program-oriented conversion of DNA and protein alignments.

Daniel Glez-Peña, Daniel Gómez-Blanco, Miguel Reboiro-Jato, Florentino Fdez-Riverola, David Posada.

Abstract

ALTER is an open web-based tool for converting between different multiple sequence alignment formats. The originality of ALTER lies in its focus on the specifications of mainstream alignment and analysis programs rather than on conversion among more or less specific formats. In addition, ALTER is capable of identifying and removing identical sequences during the transformation process. Besides its user-friendly environment, ALTER provides programmatic access to its functionality through a Representational State Transfer (REST) web service. ALTER's front-end and its API are freely available at http://sing.ei.uvigo.es/ALTER/ and http://sing.ei.uvigo.es/ALTER/api/, respectively.

Year:  2010        PMID: 20439312      PMCID: PMC2896128          DOI: 10.1093/nar/gkq321

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Multiple sequence alignments (MSAs) are at the core of many bioinformatic analyses that rely on comparing genomic sequences, from phylogenetic reconstruction to functional prediction (1,2). MSAs can be stored in a large variety of formats (e.g. FASTA, PIR, PHYLIP and NEXUS), and researchers very often have to convert between them in order to use different tools. Some conversion utilities have been extremely useful in this regard, the most popular being ReadSeq (http://iubio.bio.indiana.edu/soft/molbio/readseq/java/). In addition, several tools developed mainly for other purposes can also import and export alignments in several formats, such as ReadAl/TrimAl (3), SeaView (4), Se-Al (http://tree.bio.ed.ac.uk/software/seal/) or even ClustalX2 (5), among others. Projects such as BioPython (6) and BioPerl (7) also offer conversion capabilities. However, most of these converters understandably focus on more or less flexible format specifications that are often violated by both developers and users. In fact, in recent years MSA formats have ‘evolved’ much like the sequences they contain, with mutational events such as long names, extra spaces and additional carriage returns. As a result, different applications often require or produce particular MSA formats that do not fully meet the requirements of the ‘canonical’ formats, which complicates the analysis of the same data with different tools. For example, ReadSeq and programs such as PAML (8) or PAUP* (http://paup.csit.fsu.edu/) fail to read simple alignments produced by ClustalX2 in PHYLIP format. To alleviate this kind of problem, we introduce a web server called ALTER for the program-oriented, rather than format-oriented, conversion between DNA and protein MSA formats. ALTER is free and open to all, and there is no login requirement.

FUNCTIONALITY

ALTER was designed to accomplish two main objectives: (i) converting between the MSA formats used by popular tools and (ii) collapsing sequences to haplotypes (unique sequences). To make these operations intuitive, ALTER implements a straightforward workflow that guides the user through a four-step wizard in which the different options are activated automatically once the required information is available. ALTER also provides easy-to-follow online help and a range of sample MSA files for testing purposes.

Program workflow

Using ALTER typically involves four simple steps: (i) format/program identification, (ii) data loading, (iii) definition of the conversion parameters and (iv) storage of the generated file (Figure 1).
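For step (i), ALTER can also try to auto-detect the input format. How its program-independent grammar does this is not described here, so the Python sketch below only illustrates the kind of header signatures such a detector can rely on (for example `#NEXUS`, a `CLUSTAL` banner, or a PHYLIP `<taxa> <sites>` line). The function name `guess_format` and the order of the checks are assumptions for illustration, not ALTER's code.

```python
import re


def guess_format(text: str) -> str:
    """Guess an MSA format from simple signatures near the top of the file.

    Hypothetical heuristic for illustration only; it is not ALTER's detector.
    Returns one of the format names used in Table 1, or "UNKNOWN".
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "UNKNOWN"
    first = lines[0]
    upper = first.upper()
    # Specific '#'-prefixed signatures must be tested before the generic GDE check.
    if upper.startswith("#NEXUS"):
        return "NEXUS"
    if upper.startswith("#MEGA"):
        return "MEGA"
    if upper.startswith("CLUSTAL"):
        return "ALN"
    # PIR/NBRF entries start with ">XX;" where XX is a two-character type code (e.g. P1, DL).
    if re.match(r"^>[A-Z][A-Z0-9];", first):
        return "PIR"
    if first.startswith(">"):
        return "FASTA"
    # GDE flat files mark nucleotide entries with '#' and protein entries with '%'.
    if first.startswith(("#", "%")):
        return "GDE"
    # A PHYLIP header holds the number of taxa and the number of sites.
    if re.fullmatch(r"\d+\s+\d+.*", first):
        return "PHYLIP"
    # MSF files carry an "MSF:" field in their header block.
    if any("MSF:" in ln for ln in lines):
        return "MSF"
    return "UNKNOWN"
```

Checks run from the most to the least specific signature because several of them overlap; a MEGA file, for instance, also begins with the `#` that GDE uses for nucleotide entries.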
Figure 1.

Schematic ALTER workflow. The user can select between different input alignment programs and formats, and obtain an MSA specifically formatted for a particular program.

The process of converting an MSA in ALTER starts with the selection of the source program and/or the current format. Users who are unsure of this information can let the server try to auto-detect the format of the input file. Next, the user specifies the operating system (OS) under which the input file was generated and either uploads the file or pastes the data directly. To process the input MSA, ALTER first instantiates a sequence reader appropriate for both the input format and the input program. For each program/format pair there is a specific parser generated from a formal grammar with JavaCC. Although grammars could be reused among programs that share a format, ALTER is designed to associate a different grammar with each program/format pair in order to handle potential differences between them. If the user has selected the ‘auto detect’ option, a program-independent grammar is used instead. If the input contains syntax errors, the parser reports precise information about them and the process aborts.

Once the input MSA has been read successfully, ALTER can perform an optional step to identify redundant sequences and collapse them into haplotypes. Finally, a writer appropriate for the output program/format/OS is instantiated to generate the converted MSA, taking several parameters into account. These allow the user to (i) generate sequential or interleaved sequences (in NEXUS and PHYLIP formats), (ii) use lower case for residues, (iii) use a match character (‘.’) to indicate that a residue is identical to the one at the same position in the first sequence and (iv) append the running total of residues at the end of each sequence line (ALN format).
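To make output options (i) to (iii) concrete, the following Python sketch renders an alignment as PHYLIP with a choice of sequential or interleaved layout, lower-case residues and a match character. It is a hypothetical illustration, not ALTER's Java writer: `write_phylip` is an invented name, strict PHYLIP with space-padded 10-character names is assumed, and the match character is applied only to sequences after the first.

```python
def write_phylip(seqs, interleaved=False, block=50, lower=False, match_char=None):
    """Render (name, sequence) pairs as strict PHYLIP text.

    Hypothetical sketch: names are truncated/padded to 10 characters, and
    `match_char` replaces residues equal to the first sequence's residue
    at the same column (the first sequence itself is left unchanged).
    """
    if not seqs:
        raise ValueError("empty alignment")
    length = len(seqs[0][1])
    if any(len(s) != length for _, s in seqs):
        raise ValueError("all sequences must have the same length")
    ref = seqs[0][1].upper()
    rows = []
    for i, (name, s) in enumerate(seqs):
        if match_char and i > 0:
            # Compare case-insensitively against the first (reference) sequence.
            s = "".join(match_char if c.upper() == r else c for c, r in zip(s, ref))
        if lower:
            s = s.lower()
        rows.append((name[:10].ljust(10), s))
    lines = [f"{len(seqs)} {length}"]
    if not interleaved:
        # Sequential layout: one complete sequence per line.
        lines += [name + s for name, s in rows]
    else:
        for start in range(0, length, block):
            if start > 0:
                lines.append("")  # blank line between interleaved blocks
            for name, s in rows:
                # Only the first block carries the taxon names.
                lines.append((name if start == 0 else "") + s[start:start + block])
    return "\n".join(lines) + "\n"
```

In this sketch, interleaved blocks after the first omit the names, as in classic PHYLIP; programs that expect otherwise illustrate the kind of variation that a program-oriented converter has to absorb.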
In addition, the collapsing step can be configured to (i) treat gaps as missing data, (ii) count missing data as differences between sequences and (iii) set a maximum number of differences allowed when collapsing sequences. It is also possible to perform a program-independent conversion using only the canonical format specification. Whenever a conversion job finishes without errors, the output file is displayed and a download button is activated. All relevant information about loading and recognizing the input MSA is automatically categorized (info, error, warning) and displayed to the user in informative log panels (Figure 2).
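The three collapsing options can be illustrated with a greedy clustering sketch in Python. The exact algorithm ALTER uses is not described here, so the following is an assumption: each sequence joins the first existing haplotype whose representative differs from it by at most `max_diff` sites, and otherwise starts a new haplotype. The missing-data symbols (`N`, `?`) and the function names are likewise hypothetical choices for DNA.

```python
# Hypothetical missing-data symbols for DNA; gaps are added when requested.
MISSING = set("N?")


def differences(a, b, gaps_as_missing=False, missing_counts=False):
    """Count the sites at which two aligned sequences differ."""
    missing = MISSING | ({"-"} if gaps_as_missing else set())
    diff = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == y:
            continue
        if x in missing or y in missing:
            # Missing data only count as a difference when requested.
            if missing_counts:
                diff += 1
            continue
        diff += 1
    return diff


def collapse(seqs, max_diff=0, gaps_as_missing=False, missing_counts=False):
    """Greedily collapse (name, sequence) pairs into haplotypes.

    Returns a list of (representative_name, member_names) in input order.
    """
    haplotypes = []  # each entry: [representative name, representative sequence, members]
    for name, s in seqs:
        for hap in haplotypes:
            if differences(hap[1], s, gaps_as_missing, missing_counts) <= max_diff:
                hap[2].append(name)
                break
        else:
            # No haplotype is close enough: this sequence founds a new one.
            haplotypes.append([name, s, [name]])
    return [(rep, members) for rep, _, members in haplotypes]
```

Because the assignment is greedy, the result can depend on input order whenever "within the limit" is not transitive, which happens when `max_diff` is above zero or when missing data are ignored.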
Figure 2.

Example of an MSA conversion in ALTER. The ‘Info panel’ in the log area shows information related to the process carried out. Help, feedback and contact information options are available from the upper left area. Source code and a description of the web services are available from the upper right area.

Supported MSA formats/programs

ALTER supports a variety of specific MSA formats produced by popular alignment tools and accepted by a variety of analysis programs. The current focus is on molecular evolution, but other tools can easily be added on request. The supported programs include alignment, alignment filtering, sequence editing, model selection, phylogenetic, network and population genetics software (Table 1).
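The program-oriented design can be illustrated by encoding the output section of Table 1 as a lookup from program to accepted formats, which then answers questions such as which single file format two programs can share. This Python sketch is hypothetical: the dictionary simply transcribes Table 1, and the helper names are invented, not part of ALTER.

```python
# Output programs and the formats ALTER can write for them, transcribed from Table 1.
OUTPUT_FORMATS = {
    "Clustal": {"ALN", "FASTA", "GDE", "MSF", "PIR"},
    "MAFFT": {"FASTA"},
    "MUSCLE": {"FASTA"},
    "PROBCONS": {"FASTA"},
    "TCoffee": {"ALN", "FASTA", "MSF", "PIR"},
    "Gblocks": {"FASTA", "PIR"},
    "BioEdit": {"ALN", "FASTA", "MSF", "NEXUS", "PHYLIP", "PIR"},
    "Se-Al": {"FASTA", "GDE", "NEXUS", "PHYLIP", "PIR"},
    "jModelTest": {"ALN", "FASTA", "MSF", "NEXUS", "PHYLIP", "PIR"},
    "ProtTest": {"NEXUS", "PHYLIP"},
    "MEGA": {"ALN", "FASTA", "MEGA", "MSF", "NEXUS", "PHYLIP", "PIR"},
    "Mesquite": {"NEXUS"},
    "MrBayes": {"NEXUS"},
    "PAML": {"NEXUS", "PHYLIP"},
    "PAUP*": {"MEGA", "MSF", "NEXUS", "PHYLIP", "PIR"},
    "PhyML": {"PHYLIP"},
    "RAxML": {"PHYLIP"},
    "SplitsTree": {"ALN", "FASTA", "NEXUS", "PHYLIP"},
    "TCS": {"NEXUS", "PHYLIP"},
    "DnaSP": {"FASTA", "MEGA", "NEXUS", "PHYLIP", "PIR"},
}


def programs_accepting(fmt):
    """Output programs for which the given format is listed, sorted by name."""
    return sorted(p for p, formats in OUTPUT_FORMATS.items() if fmt in formats)


def common_formats(programs):
    """Formats listed for every program in `programs` (empty set for no programs)."""
    sets = [OUTPUT_FORMATS[p] for p in programs]
    return set.intersection(*sets) if sets else set()
```

For instance, `common_formats(["PAML", "MrBayes"])` yields `{"NEXUS"}`: according to Table 1, NEXUS is the only output format listed for both programs.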
Table 1.

List of programs/formats supported by ALTER

Tool                        Supported formats
INPUT: multiple sequence alignment programs
    Clustal (10)            ALN, FASTA, GDE, MSF, NEXUS, PHYLIP, PIR
    MAFFT (11)              ALN, FASTA
    TCoffee (12)            ALN, FASTA, MSF, PHYLIP, PIR
    MUSCLE (13)             ALN, FASTA, MSF, PHYLIP
    PROBCONS (14)           ALN, FASTA
OUTPUT: alignment
    Clustal                 ALN, FASTA, GDE, MSF, PIR
    MAFFT                   FASTA
    MUSCLE                  FASTA
    PROBCONS                FASTA
    TCoffee                 ALN, FASTA, MSF, PIR
OUTPUT: alignment filtering
    Gblocks (15)            FASTA, PIR
OUTPUT: sequence editing
    BioEdit (16)            ALN, FASTA, MSF, NEXUS, PHYLIP, PIR
    Se-Al [a]               FASTA, GDE, NEXUS, PHYLIP, PIR
OUTPUT: model selection
    jModelTest (17)         ALN, FASTA, MSF, NEXUS, PHYLIP, PIR
    ProtTest (18)           NEXUS, PHYLIP
OUTPUT: phylogenetic analysis
    MEGA (19)               ALN, FASTA, MEGA, MSF, NEXUS, PHYLIP, PIR
    Mesquite [b]            NEXUS
    MrBayes (20)            NEXUS
    PAML (8)                NEXUS, PHYLIP
    PAUP* (21)              MEGA, MSF, NEXUS, PHYLIP, PIR
    PhyML (22)              PHYLIP
    RAxML (23)              PHYLIP
OUTPUT: phylogenetic networks
    SplitsTree (24)         ALN, FASTA, NEXUS, PHYLIP
    TCS (25)                NEXUS, PHYLIP
OUTPUT: population genetics
    DnaSP (26)              FASTA, MEGA, NEXUS, PHYLIP, PIR
OUTPUT: general
    standard specification  ALN, FASTA, GDE, MEGA, MSF, NEXUS, PHYLIP, PIR

[a] http://tree.bio.ed.ac.uk/software/seal/

[b] http://mesquiteproject.org/

Web services

In addition to the functionality provided by the end-user front-end, ALTER also implements a web service that allows developers to convert multiple sequence alignments with ALTER directly from within their own algorithms and programs (http://sing.ei.uvigo.es/ALTER/api/). Essentially, ALTER’s API offers a single convert function with multiple parameters, plus some metadata functions that give information about the formats and options currently supported. Table 2 summarizes the API functionality.
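As an illustration of how a client might call the metadata functions listed in Table 2, the following Python sketch builds their URLs from the patterns given there and fetches them with the standard library. Assumptions: program and format identifiers are the lower-case names seen in the example URLs (`paml`, `nexus`), and the response body is treated as plain text because its media type is not specified. The parameters of the POST-based convert function are not documented here, so that call is not shown. All helper names are hypothetical, and the sketch has not been run against the live server.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as given in the text; availability of the service is not assumed.
API_BASE = "http://sing.ei.uvigo.es/ALTER/api"

# Fixed-path metadata endpoints listed in Table 2.
METADATA = {
    "os": "/so",
    "input_programs": "/input/programs",
    "input_formats": "/input/formats",
    "output_programs": "/output/programs",
    "output_formats": "/output/formats",
}


def metadata_url(key: str) -> str:
    return API_BASE + METADATA[key]


def program_formats_url(program: str) -> str:
    # Pattern from the Table 2 example .../output/paml/formats; lower-casing is an assumption.
    return f"{API_BASE}/output/{quote(program.lower())}/formats"


def program_options_url(program: str, fmt: str) -> str:
    # Pattern from the Table 2 example .../output/paml/nexus/options.
    return f"{API_BASE}/output/{quote(program.lower())}/{quote(fmt.lower())}/options"


def fetch(url: str, timeout: float = 10.0) -> str:
    """HTTP GET a metadata URL and return the body decoded as UTF-8 text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

A call such as `fetch(program_options_url("PAML", "NEXUS"))` would request the same URL as the last example in Table 2.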
Table 2.

Core functionality provided by ALTER’s RESTful API

Function: Description
Convert: Converts an input sequence from one format to another. This function is accessed via HTTP POST; both the sequence and the parameters should be sent to the server.
Metadata functions
    List OSs: Lists the available OSs to read files from.
        URL: http://sing.ei.uvigo.es/ALTER/api/so
    List input programs: Lists the currently supported input programs.
        URL: http://sing.ei.uvigo.es/ALTER/api/input/programs
    List input formats: Lists the currently supported input formats.
        URL: http://sing.ei.uvigo.es/ALTER/api/input/formats
    List output programs: Lists the currently supported output programs.
        URL: http://sing.ei.uvigo.es/ALTER/api/output/programs
    List output formats: Lists the currently supported output formats.
        URL: http://sing.ei.uvigo.es/ALTER/api/output/formats
    List output formats for a specific program: Lists the supported output formats for a given output program.
        Example URL: http://sing.ei.uvigo.es/ALTER/api/output/paml/formats
    List options for output program and format: Lists the supported options for a given output program and format.
        Example URL: http://sing.ei.uvigo.es/ALTER/api/output/paml/nexus/options

Supported platforms

ALTER runs on a standard Tomcat 5.5 web application server. It has been successfully tested in the Internet Explorer 7, Firefox 3, Opera 9.62 and Safari 3 browsers running on Windows XP/Vista, Ubuntu Linux 8.04 and Mac OS X 10.5 (Intel architecture).

IMPLEMENTATION

ALTER is implemented as an AJAX-enabled web application written in Java (J2SE 1.5). The ZK development framework (http://www.zkoss.org) was used to build the user interface, and JavaCC to parse the input MSAs. JavaCC is a parser and lexical analyzer generator: it reads a formal description of a language (a grammar) and generates code that parses instances of that language. It can be seen as the Java counterpart of the Lex/Flex and Yacc/Bison tools. With JavaCC it is possible to (i) isolate the description of each sequence format in an independent grammar file and (ii) generate precise error messages during parsing (9). ALTER also implements a REST-based programming interface. As with any RESTful web service, operations are performed through HTTP requests with a well-defined URL structure. Currently, the server gives access to the main sequence conversion functionality as well as to a set of reflective functions that return up-to-date information about the supported programs and formats. This server module was implemented following the JAX-RS 1.0 specification (Java API for RESTful Web Services), using the implementation provided by the Apache CXF library.
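JavaCC generates Java parsers from grammar files, so a faithful example would require a JavaCC grammar. As a language-neutral illustration of benefit (ii), precise error messages, the following hand-written Python reader for strict sequential PHYLIP reports the offending line whenever the input breaks the expected structure. It is a hypothetical stand-in, not one of ALTER's grammars; the names `ParseError` and `parse_phylip_sequential` are invented, and the accepted residue alphabet (letters plus `-`, `?`, `.`, `*`) is an assumption.

```python
import re


class ParseError(Exception):
    """Parse failure carrying the 1-based line number where it occurred."""

    def __init__(self, line_no, message):
        super().__init__(f"line {line_no}: {message}")
        self.line_no = line_no


def parse_phylip_sequential(text):
    """Parse strict sequential PHYLIP (10-character names, one sequence per line)."""
    lines = text.splitlines()
    if not lines:
        raise ParseError(1, "empty input")
    header = re.fullmatch(r"\s*(\d+)\s+(\d+)\s*", lines[0])
    if not header:
        raise ParseError(1, "expected '<taxa> <sites>' header")
    ntax, nsites = int(header.group(1)), int(header.group(2))
    seqs = []
    for line_no, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # tolerate blank lines
        name, residues = line[:10].strip(), line[10:].replace(" ", "")
        if not name:
            raise ParseError(line_no, "missing taxon name")
        bad = re.search(r"[^A-Za-z\-?.*]", residues)
        if bad:
            raise ParseError(line_no, f"unexpected character {bad.group()!r}")
        if len(residues) != nsites:
            raise ParseError(line_no, f"expected {nsites} sites, found {len(residues)}")
        seqs.append((name, residues))
    if len(seqs) != ntax:
        raise ParseError(len(lines), f"expected {ntax} taxa, found {len(seqs)}")
    return seqs
```

Each check corresponds to one structural rule of the format, so the error message can name both the rule that was broken and where; a generated parser obtains the same effect from the grammar itself.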

CONCLUSIONS

Current MSA conversion tools understandably focus on translation among ‘canonical’ formats, but in many instances they are of little help to users who want to work with particular programs that use idiosyncratic format variations. To address this limitation, we introduce a web server called ALTER for the program-oriented, rather than format-oriented, conversion between DNA and protein MSA formats. In addition, ALTER is able to ‘collapse’ sequences to haplotypes (unique sequences), indicating which sequence corresponds to which haplotype. Eliminating this redundancy can be very helpful, for example, to speed up phylogenetic analyses.

FUNDING

European Research Council (ERC-2007-Stg 203161-PHYGENOM to D.P.); Spanish Ministry of Science and Education (BFU2009-08611 to D.P.); Xunta de Galicia (PGIDIT07PXIB310202PR to D.P.); INBIOMED initiative, Angeles Alvariño fellowship (to D.G-P.); University of Vigo (09VIB10 to F.F-R.). Funding for open access charge: European Research Council (ERC-2007-Stg 203161-PHYGENOM to D.P.).

Conflict of interest statement. None declared.

REFERENCES

1.  The Bioperl toolkit: Perl modules for the life sciences.

Authors:  Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

2.  MrBayes 3: Bayesian phylogenetic inference under mixed models.

Authors:  Fredrik Ronquist; John P Huelsenbeck
Journal:  Bioinformatics       Date:  2003-08-12       Impact factor: 6.937

3.  ProbCons: Probabilistic consistency-based multiple sequence alignment.

Authors:  Chuong B Do; Mahathi S P Mahabhashyam; Michael Brudno; Serafim Batzoglou
Journal:  Genome Res       Date:  2005-02       Impact factor: 9.043

4.  ProtTest: selection of best-fit models of protein evolution.

Authors:  Federico Abascal; Rafael Zardoya; David Posada
Journal:  Bioinformatics       Date:  2005-01-12       Impact factor: 6.937

5.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models.

Authors:  Alexandros Stamatakis
Journal:  Bioinformatics       Date:  2006-08-23       Impact factor: 6.937

6.  MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0.

Authors:  Koichiro Tamura; Joel Dudley; Masatoshi Nei; Sudhir Kumar
Journal:  Mol Biol Evol       Date:  2007-05-07       Impact factor: 16.240

7.  PAML 4: phylogenetic analysis by maximum likelihood.

Authors:  Ziheng Yang
Journal:  Mol Biol Evol       Date:  2007-05-04       Impact factor: 16.240

8.  SplitsTree: analyzing and visualizing evolutionary data.

Authors:  D H Huson
Journal:  Bioinformatics       Date:  1998       Impact factor: 6.937

9.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Authors:  J D Thompson; D G Higgins; T J Gibson
Journal:  Nucleic Acids Res       Date:  1994-11-11       Impact factor: 16.971

10.  MUSCLE: a multiple sequence alignment method with reduced time and space complexity.

Authors:  Robert C Edgar
Journal:  BMC Bioinformatics       Date:  2004-08-19       Impact factor: 3.169
