Malin: maximum likelihood analysis of intron evolution in eukaryotes.

Miklós Csűrös

Abstract

Malin is a software package for the analysis of eukaryotic gene structure evolution. It provides a graphical user interface for various tasks commonly used to infer the evolution of exon–intron structure in protein-coding orthologs. Implemented tasks include the identification of conserved homologous intron sites in protein alignments and the estimation of ancestral intron content and of lineage-specific intron losses and gains. Estimates are computed either with parsimony or with a probabilistic model that incorporates rate variation across lineages and intron sites. AVAILABILITY: Malin is available as a stand-alone Java application and as an application bundle for MacOS X, at the website http://www.iro.umontreal.ca/~csuros/introns/malin/. The software is distributed under a BSD-style license.

Year:  2008        PMID: 18474506      PMCID: PMC2718671          DOI: 10.1093/bioinformatics/btn226

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

An idiosyncratic feature of eukaryotic gene organization is that the genomic sequences of protein-coding genes are frequently interrupted by non-coding sequences, called introns, which are excised (spliced) from the transcripts prior to translation. Fundamental constituents of the splicing machinery are present throughout the main eukaryotic lineages (Collins and Penny, 2005). Intron-containing genes are spread across diverse eukaryotic phyla, and orthologous genes often have similar exon–intron organization even at large evolutionary distances (Rogozin et al., 2003). Accordingly, it is fairly certain that splicing was already present in the last common ancestor of eukaryotes (Rodríguez-Trelles et al., 2006). Gene structures have since changed to different extents in different eukaryotic lineages (Roy and Gilbert, 2006). Whole-genome sequencing projects have made it possible to perform large-scale phylogenetic analyses of the evolution of exon–intron organization. Following the pioneering study by Rogozin et al. (2003), numerous studies (Carmel et al., 2005, 2007; Csűrös, 2005; Csűrös et al., 2007, 2008; Nguyen et al., 2005; Nielsen et al., 2004; Roy and Gilbert, 2005; Roy and Penny, 2006; Stajich et al., 2007; Sullivan et al., 2006) have inferred lineage- and gene-specific features of gene structure evolution, often introducing methodological novelties along the way. This note introduces Malin, a software package developed for the analysis of eukaryotic gene structure evolution.

2 FEATURES

Malin provides a graphical user interface for various tasks commonly used to infer the evolution of exon–intron structure in multiple protein-coding ortholog sets (Fig. 1) along a fixed species phylogeny. The implemented tasks include the following:

- Identification of conserved homologous splice sites in annotated protein sequence alignments.
- Computation of primary statistics about introns in homologous sites (‘shared introns’).
- Estimation of ancestral intron content, intron losses and gains by Dollo parsimony.
- Estimation of intron loss and gain rates in a probabilistic model.
- Estimation of ancestral intron content, intron losses and gains in a probabilistic model.
- Inference of histories at individual or multiple sites.
- Error estimation for rates and histories by bootstrap.

Fig. 1.

(A) Typical analysis pipeline for intron evolution. Malin can perform the tasks downstream of ortholog identification and alignment. (B) Alignment panel in Malin. The intron table is constructed from a set of multiple alignments (listed as the rows of the table at the top center), based on conservation criteria specified by the user (in the form at the upper right). The bottom half of the panel illustrates the selected alignment, showing alignment gaps and projected intron sites (colored tags).

Figure 1A illustrates the typical analysis pipeline for eukaryotic gene structure evolution (Rogozin et al., 2005). To infer whether spliceosomal introns occupy homologous positions, splice sites are first projected onto the coding sequences, and homology is then established in conserved regions of the protein alignments. An intron table is constructed from the projected intron annotations: a binary table of intron presence and absence at homologous sites across the studied organisms. Malin can also cope with ambiguous characters in this table. The presence/absence patterns can be analyzed by Dollo parsimony (Farris, 1977), under the assumption that intron gains and losses are rare events, or by probabilistic models of intron evolution. Malin uses the likelihood framework that I have developed (Csűrös, 2005; Csűrös et al., 2007, 2008).
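To make the Dollo-parsimony step concrete, here is a minimal Python sketch for a single intron site on a rooted species tree. It is an illustration, not Malin's implementation: the tree encoding (a parent-to-children dictionary), the function names `mark_carriers` and `dollo`, and the restriction to unambiguous 0/1 patterns (no ambiguous characters) are assumptions. Under Dollo parsimony the intron is gained once, at the last common ancestor of the taxa that carry it, and each maximal subtree below that point whose taxa all lack the intron is charged one loss, on the branch leading into it.

```python
"""Dollo-parsimony sketch for one unambiguous intron site (hypothetical helpers)."""
from typing import Dict, List, Optional, Set, Tuple

# Rooted tree as a parent -> children mapping; leaves have no entry.
Tree = Dict[str, List[str]]


def mark_carriers(tree: Tree, node: str, present: Dict[str, bool], has: Set[str]) -> bool:
    """Post-order pass: add to `has` every node with an intron-carrying leaf below it."""
    kids = tree.get(node, [])
    hit = bool(present[node]) if not kids else False
    for k in kids:
        hit = mark_carriers(tree, k, present, has) or hit
    if hit:
        has.add(node)
    return hit


def dollo(tree: Tree, root: str,
          present: Dict[str, bool]) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Return (gain node, loss branches as (parent, child)) for one site."""
    has: Set[str] = set()
    if not mark_carriers(tree, root, present, has):
        return None, []  # no intron in any taxon: nothing to explain
    # The gain node is the last common ancestor of the carriers: descend
    # while exactly one child subtree still contains the intron.
    gain = root
    while True:
        below = [k for k in tree.get(gain, []) if k in has]
        if len(below) != 1:
            break
        gain = below[0]
    # Below the gain, every branch into an intron-free subtree is one loss.
    losses: List[Tuple[str, str]] = []
    stack = [gain]
    while stack:
        u = stack.pop()
        for v in tree.get(u, []):
            if v in has:
                stack.append(v)  # intron retained along u -> v
            else:
                losses.append((u, v))  # intron lost on branch u -> v
    return gain, losses
```

For the tree ((A,B)X,(C,D)Y)R with the intron present only in A and C, `dollo` places the gain at the root R and the losses on the branches from X to B and from Y to D.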
The corresponding probabilistic model is a rates-across-sites Markov model of intron evolution with branch-specific intron gain and loss rates. If no rate variation is assumed across sites, then every branch has a single gain rate and a single loss rate, with corresponding gain and loss probabilities. Briefly, an intron is lost on a branch of length $t$ with probability $p = \frac{\mu}{\lambda+\mu}\left(1 - e^{-(\lambda+\mu)t}\right)$, where $\lambda$ and $\mu$ are the gain and loss rates; a new intron appears in a previously unoccupied site with probability $q = \frac{\lambda}{\lambda+\mu}\left(1 - e^{-(\lambda+\mu)t}\right)$. The constant-rate model (Csűrös et al., 2007) is completely specified by the branch-specific gain and loss rates and by the probability with which intron sites are occupied at the root. The rate-variation model (Csűrös et al., 2008) assumes that intron sites belong to discrete rate categories. Each site category is defined by a pair of loss and gain rate factors $(\alpha, \beta)$: on a branch with prototypical loss rate $\mu$ and gain rate $\lambda$, sites in that category evolve with loss rate $\alpha\mu$ and gain rate $\beta\lambda$. Malin optimizes the rate factors, and can analyze the same dataset with different models simultaneously.

Malin is written entirely in Java. It runs on any platform with a Java Runtime Environment implementing J2SE 1.5 or higher, including Microsoft Windows, MacOS X and Linux; it is also available as an integrated application bundle for MacOS X. The software is distributed with test data and a detailed User's Guide. Input files follow commonly used formats: Newick for the (possibly multifurcating) species phylogeny, FASTA for alignments, and the syntax used by Rogozin et al. (2003) for intron tables. Intron sites are specified in the FASTA headers. Analysis results can be exported as tab-delimited text files.
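The probabilistic model described above can be made concrete with a short Python sketch: each branch is treated as a two-state continuous-time Markov chain (state 0 = no intron, 1 = intron) whose loss and gain probabilities follow the standard closed form, and the likelihood of one presence/absence pattern is computed with Felsenstein's pruning algorithm. This is an illustration under stated assumptions, not Malin's code: the tree and rate encodings, the names `transition`, `conditional_likelihoods` and `site_likelihood`, and the list of weighted `(weight, alpha, beta)` categories are hypothetical, and the sketch omits refinements a full implementation may need, such as handling unobservable site patterns or numerical scaling on large trees.

```python
"""Sketch: likelihood of one intron site pattern under a two-state gain/loss model."""
import math
from typing import Dict, List, Sequence, Tuple

Tree = Dict[str, List[str]]                    # parent -> children; leaves have no entry
Rates = Dict[str, Tuple[float, float, float]]  # child -> (branch length t, gain lam, loss mu)


def transition(t: float, lam: float, mu: float) -> List[List[float]]:
    """P[i][j] = Pr(child state j | parent state i); 0 = no intron, 1 = intron."""
    s = lam + mu
    if s == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]  # no gains or losses possible
    decay = 1.0 - math.exp(-s * t)
    q = lam / s * decay  # gain probability, 0 -> 1
    p = mu / s * decay   # loss probability, 1 -> 0
    return [[1.0 - q, q], [p, 1.0 - p]]


def conditional_likelihoods(tree: Tree, rates: Rates, node: str, pattern: Dict[str, int],
                            alpha: float = 1.0, beta: float = 1.0) -> List[float]:
    """Pruning step: [Pr(leaf data below | node state 0), Pr(leaf data below | node state 1)]."""
    kids = tree.get(node, [])
    if not kids:
        x = pattern[node]
        return [1.0 - x, float(x)]
    like = [1.0, 1.0]
    for child in kids:
        t, lam, mu = rates[child]
        # A rate category multiplies the gain rate by beta and the loss rate by alpha.
        P = transition(t, beta * lam, alpha * mu)
        c = conditional_likelihoods(tree, rates, child, pattern, alpha, beta)
        for i in (0, 1):
            like[i] *= P[i][0] * c[0] + P[i][1] * c[1]
    return like


def site_likelihood(tree: Tree, rates: Rates, root: str, pattern: Dict[str, int], pi: float,
                    categories: Sequence[Tuple[float, float, float]] = ((1.0, 1.0, 1.0),)) -> float:
    """Mix over (weight, alpha, beta) categories; pi = Pr(intron present at the root)."""
    total = 0.0
    for w, alpha, beta in categories:
        L = conditional_likelihoods(tree, rates, root, pattern, alpha, beta)
        total += w * ((1.0 - pi) * L[0] + pi * L[1])
    return total
```

With the default single category `(1.0, 1.0, 1.0)` the sketch reduces to a constant-rate model. Summing `site_likelihood` over all $2^n$ presence/absence patterns of $n$ leaves gives 1 up to rounding, which is a quick sanity check for any such implementation.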
The software implements previously described computational innovations (Csűrös et al., 2007, 2008), including rate optimization, posterior predictions, fast evaluation of the likelihood function, and estimation of statistical confidence by bootstrapping. Malin provides a feature-rich graphical user interface for these analysis tasks. Figure 1B shows an example of the alignment panel, where, in order to build an intron table, the user selects the conservation criteria (such as the minimum number of gapless positions next to an intron site) for discerning homologous sites in a set of multiple alignments. The aim is for Malin to let researchers conduct phylogenetic analyses of gene structure as easily as they can currently analyze molecular sequences.
REFERENCES

1. Complex spliceosomal organization ancestral to extant eukaryotes.

Authors: Lesley Collins; David Penny
Journal: Mol Biol Evol. Date: 2005-01-19

2. Rates of intron loss and gain: implications for early eukaryotic evolution.

Authors: Scott William Roy; Walter Gilbert
Journal: Proc Natl Acad Sci U S A. Date: 2005-04-12

3. Analysis of evolution of exon-intron structure of eukaryotic genes.

Authors: Igor B Rogozin; Alexander V Sverdlov; Vladimir N Babenko; Eugene V Koonin
Journal: Brief Bioinform. Date: 2005-06

4. Origins and evolution of spliceosomal introns.

Authors: Francisco Rodríguez-Trelles; Rosa Tarrío; Francisco J Ayala
Journal: Annu Rev Genet. Date: 2006

5. The evolution of spliceosomal introns: patterns, puzzles and progress.

Authors: Scott William Roy; Walter Gilbert
Journal: Nat Rev Genet. Date: 2006-03

6. Extremely intron-rich genes in the alveolate ancestors inferred with a flexible maximum-likelihood approach.

Authors: Miklós Csűrös; Igor B Rogozin; Eugene V Koonin
Journal: Mol Biol Evol. Date: 2008-02-21

7. Large-scale intron conservation and order-of-magnitude variation in intron loss/gain rates in apicomplexan evolution.

Authors: Scott William Roy; David Penny
Journal: Genome Res. Date: 2006-09-08

8. Very little intron loss/gain in Plasmodium: intron loss/gain mutation rates and intron number.

Authors: Scott William Roy; Daniel L Hartl
Journal: Genome Res. Date: 2006-05-15

9. Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution.

Authors: Igor B Rogozin; Yuri I Wolf; Alexander V Sorokin; Boris G Mirkin; Eugene V Koonin
Journal: Curr Biol. Date: 2003-09-02

10. New maximum likelihood estimators for eukaryotic intron evolution.

Authors: Hung D Nguyen; Maki Yoshihama; Naoya Kenmochi
Journal: PLoS Comput Biol. Date: 2005-12-30