
TOPALi v2: a rich graphical interface for evolutionary analyses of multiple alignments on HPC clusters and multi-core desktops.

Iain Milne, Dominik Lindner, Micha Bayer, Dirk Husmeier, Gráinne McGuire, David F Marshall, Frank Wright.

Abstract

TOPALi v2 simplifies and automates the use of several methods for the evolutionary analysis of multiple sequence alignments. Jobs are submitted from a Java graphical user interface as TOPALi web services to either run remotely on high-performance computing clusters or locally (with multiple cores supported). Methods available include model selection and phylogenetic tree estimation using the Bayesian inference and maximum likelihood (ML) approaches, in addition to recombination detection methods. The optimal substitution model can be selected for protein or nucleic acid (standard, or protein-coding using a codon position model) data using accurate statistical criteria derived from ML co-estimation of the tree and the substitution model. Phylogenetic software available includes PhyML, RAxML and MrBayes.

Availability: Freely downloadable from http://www.topali.org for Windows, Mac OS X, Linux and Solaris.

Year:  2008        PMID: 18984599      PMCID: PMC2638937          DOI: 10.1093/bioinformatics/btn575

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

The statistical revolution in molecular phylogenetics (Felsenstein, 2001) continues to gather pace with the development of faster and/or more sophisticated maximum likelihood (ML) and Bayesian inference methods, with recent software implementations including PhyML (Guindon and Gascuel, 2003), RAxML (Stamatakis, 2006) and MrBayes (Ronquist and Huelsenbeck, 2003). To fully utilize these methods, it is also important to select a model of evolution with an appropriate level of complexity for the dataset by making use of model selection procedures, e.g. as in the ModelTest software (Posada and Crandall, 1998). However, there are opportunities for improved practice in model testing, particularly for protein-coding DNA where appropriate models for each codon position (CP) are usually required (Bofkin and Goldman, 2007) and are often underused (Shapiro et al., 2005). While biologists are being encouraged to adopt these improved methods (Whelan et al., 2001), there are still obstacles, including the lack of a single easy-to-use interface for analyzing a multiple sequence alignment (MSA) with a range of methods (e.g. model selection and phylogenetic tree estimation) and sophisticated access to computer power. While computational resources are becoming available for phylogenetic analysis (e.g. Dereeper et al., 2008), the interfaces available are basic and analysis options are often not stored for amendment.

TOPALi (tree TOPology-related analysis of ALignments Interface) version 1 (Milne et al., 2004) specialized in the detection of recombination in an MSA and ran only on desktops. We have extended and redesigned TOPALi to carry out additional analyses, particularly automated model selection linked to phylogenetic analyses, resulting in rich graphical output, and to utilize, via a sophisticated interface with access to previous option choices, the increased power of high performance computing (HPC) clusters and multi-core desktops. In particular, TOPALi v2 reduces the learning curve for biologists to carry out high quality phylogenetic analyses, by hiding the complexities of setting up and running many analyses.

2 FEATURES

Alignment handling: TOPALi can import/export DNA, RNA and protein MSAs in many formats and create DNA alignments from a protein MSA and corresponding unaligned DNA. Several MSAs can be stored within a TOPALi project, which allows a group of related MSAs to be worked on together. TOPALi can quickly render alignments, facilitating quality checks: the alignment overview shows the position of the zoomed region relative to the full alignment. The user can semi-automatically or manually select a reduced number of sequences for analysis and can also restrict the columns, e.g. exons could be extracted from a genomic alignment and saved as a new alignment.

Model selection: the menu launches models available in MrBayes (24 nucleotide models, 36 amino acid models) or in PhyML (56 and 40, respectively) or in RAxML (no nucleotide model choice, 40 amino acid models). The optimal model is automatically selected (and passed to the phylogenetic analysis launch menus) based on calculations involving either hierarchical likelihood ratio tests (hLRTs), Akaike information criterion (AIC), or Bayesian information criterion (BIC), generally following the ModelTest approach, except that (i) the model parameters (substitution, rate heterogeneity) and the phylogenetic tree are estimated by running a separate PhyML job for each model, resulting in more accurate estimates of the log likelihood and derived quantities (AIC, BIC), and (ii) the hLRTs among the 56 PhyML nucleotide models are based on single pairwise LRTs. CP model selection treats the coding region as three separate alignments. Model components that are similar across CPs can be linked to share parameter estimation in subsequent MrBayes analysis.

Phylogenetic tree estimation: web services run the MrBayes, PhyML and RAxML programs on either nucleotide or protein MSAs. For nucleotide data, MrBayes analysis can use a model for all positions or a CP model. The user then accepts or overrules the model selection choices and enters the MrBayes analysis settings (nRuns, nGenerations, Sample Frequency and Burn-in percentage). For ML analysis, PhyML offers only one model for all positions, so the user accepts or overrules the model selection choice and analysis settings (including number of bootstrap runs). RAxML has three rate heterogeneity models (including the Gamma distribution) but only one parameter-rich model (GTR) for nucleotide analysis, although the model parameters can be estimated separately for each CP. Tree manipulation tools include midpoint rooting and editing to simplify the display of support values.
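The information-criterion part of the model selection step described above can be illustrated with a minimal Python sketch. It assumes that each candidate model has already been fitted (in TOPALi, by a separate PhyML job) and that its maximized log likelihood and number of free parameters are known. The log likelihoods and parameter counts below are invented for illustration, the function and class names are hypothetical, and using the alignment length as the sample size in BIC is a common convention rather than something the text specifies.

```python
import math
from typing import NamedTuple


class FittedModel(NamedTuple):
    name: str        # substitution model label (illustrative)
    log_lik: float   # maximized log likelihood from the ML fit
    n_params: int    # number of free parameters, k (illustrative count)


def aic(m: FittedModel) -> float:
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * m.n_params - 2 * m.log_lik


def bic(m: FittedModel, n_sites: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return m.n_params * math.log(n_sites) - 2 * m.log_lik


def select_best(models, criterion):
    """Return the model that minimizes the given criterion."""
    return min(models, key=criterion)


# Invented values for three nucleotide models on a 1000-site alignment.
candidates = [
    FittedModel("JC", -5230.0, 0),
    FittedModel("HKY", -5105.0, 4),
    FittedModel("GTR", -5100.0, 8),
]
n_sites = 1000

best_aic = select_best(candidates, aic)
best_bic = select_best(candidates, lambda m: bic(m, n_sites))
print("AIC choice:", best_aic.name)
print("BIC choice:", best_bic.name)
```

With these invented numbers the two criteria disagree: AIC picks the parameter-richer GTR, while BIC, whose penalty grows with alignment length, picks HKY. This is one reason an interface that lets the user accept or overrule the automatic choice is useful.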

3 IMPLEMENTATION

TOPALi's analysis methods have been designed to run either on remote HPC clusters (the default setting), or on standard desktops. With no underlying code duplication, we have devised a novel approach that runs tasks as independent processes within their own Java Virtual Machine (when executed on separate cluster nodes), or as semi-independent processes within a typical multithreaded application (when running on a desktop), managed by a local process manager that sets the number of CPUs to be used. In addition to the obvious speed benefits, HPC usage also eliminates any compilation or configuration issues a user may encounter when running jobs locally, as some sub-components of the analyses are handled by C or C++ programs from third parties and must therefore be compiled for local use.

TOPALi is designed to be user-friendly and thus includes functionality that allows the user to work with a project locally (loading or examining alignments, for instance) and then to submit one or more analysis jobs for remote processing. The client can be closed and reopened later, and the progress of the jobs will be updated from the server. Previously completed jobs can also be reselected and a new job submission can be created that mirrors the settings from the original job, with or without further modifications.

Making use of a newly developed web services resource broker (I. Milne et al., manuscript in preparation), TOPALi queries the broker monitoring a pool of remote HPC clusters hosting TOPALi web services (currently at the Scottish Crop Research Institute and University of Dundee) that can then intelligently decide which cluster is most suitable for the job. We can also manage load by rejecting jobs submitted with very high numbers of sequences (on a per analysis basis). Readers who are interested in hosting TOPALi services on their own Sun Grid Engine-enabled cluster (either for private or further public use) may contact us for advice on configuration.
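The desktop mode described above, in which a local process manager caps the number of CPUs used by otherwise independent tasks, can be sketched in a few lines. This is not TOPALi's Java implementation: it is a minimal Python illustration using a standard-library thread pool, and `run_jobs_locally` and `make_job` are hypothetical names introduced only for this example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_jobs_locally(jobs, n_cpus=None):
    """Run independent analysis jobs on at most n_cpus worker threads.

    jobs   -- list of zero-argument callables, one per analysis task
    n_cpus -- user-chosen cap; defaults to the number of detected cores
    """
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    # Never start more workers than there are jobs, and always at least one.
    n_cpus = max(1, min(n_cpus, len(jobs) or 1))
    with ThreadPoolExecutor(max_workers=n_cpus) as pool:
        futures = [pool.submit(job) for job in jobs]
        # Collect results in submission order; job exceptions surface here.
        return [f.result() for f in futures]


def make_job(model_name):
    """Hypothetical stand-in for one per-model likelihood job."""
    def job():
        return f"{model_name}: done"
    return job


results = run_jobs_locally([make_job(m) for m in ("JC", "HKY", "GTR")], n_cpus=2)
print(results)
```

In a real setting each job would typically wrap a call to an external program, so each worker thread would mostly wait on that process. The same list of jobs could instead be dispatched to separate cluster nodes, which is the sense in which a single task definition can serve both modes.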
Figure: TOPALi's main interface showing alignment handling, tree estimation and model selection features.

TOPALi is coded in Java for platforms supporting Java version 1.5.0 and above. We provide installable versions with everything required to run the application, including a suitable Java runtime.

Funding: UK BBSRC/EPSRC Bioinformatics/E-science Initiative (BBSB16615); the Scottish Government; Scottish Funding Council; Scottish Enterprise.

Conflict of Interest: none declared.
REFERENCES

1.  Molecular phylogenetics: state-of-the-art methods for looking into the past.

Authors:  S Whelan; P Liò; N Goldman
Journal:  Trends Genet       Date:  2001-05       Impact factor: 11.639

2.  The troubled growth of statistical phylogenetics.

Authors:  J Felsenstein
Journal:  Syst Biol       Date:  2001-08       Impact factor: 15.683

3.  MrBayes 3: Bayesian phylogenetic inference under mixed models.

Authors:  Fredrik Ronquist; John P Huelsenbeck
Journal:  Bioinformatics       Date:  2003-08-12       Impact factor: 6.937

4.  A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.

Authors:  Stéphane Guindon; Olivier Gascuel
Journal:  Syst Biol       Date:  2003-10       Impact factor: 15.683

5.  TOPALi: software for automatic identification of recombinant sequences within DNA multiple alignments.

Authors:  Iain Milne; Frank Wright; Glenn Rowe; David F Marshall; Dirk Husmeier; Gráinne McGuire
Journal:  Bioinformatics       Date:  2004-02-26       Impact factor: 6.937

6.  Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences.

Authors:  Beth Shapiro; Andrew Rambaut; Alexei J Drummond
Journal:  Mol Biol Evol       Date:  2005-09-21       Impact factor: 16.240

7.  Variation in evolutionary processes at different codon positions.

Authors:  Lee Bofkin; Nick Goldman
Journal:  Mol Biol Evol       Date:  2006-11-21       Impact factor: 16.240

8.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models.

Authors:  Alexandros Stamatakis
Journal:  Bioinformatics       Date:  2006-08-23       Impact factor: 6.937

9.  MODELTEST: testing the model of DNA substitution.

Authors:  D Posada; K A Crandall
Journal:  Bioinformatics       Date:  1998       Impact factor: 6.937

10.  Phylogeny.fr: robust phylogenetic analysis for the non-specialist.

Authors:  A Dereeper; V Guignon; G Blanc; S Audic; S Buffet; F Chevenet; J-F Dufayard; S Guindon; V Lefort; M Lescot; J-M Claverie; O Gascuel
Journal:  Nucleic Acids Res       Date:  2008-04-19       Impact factor: 16.971
