
Boulder ALignment Editor (ALE): a web-based RNA alignment tool.

Jesse Stombaugh, Jeremy Widmann, Daniel McDonald, Rob Knight.

Abstract

SUMMARY: The explosion of interest in non-coding RNAs, together with improvements in RNA X-ray crystallography, has led to a rapid increase in RNA structures at atomic resolution from 847 in 2005 to 1900 in 2010. The success of whole-genome sequencing has led to an explosive growth of unaligned homologous sequences. Consequently, there is a compelling and urgent need for user-friendly tools for producing structure-informed RNA alignments. Most alignment software considers the primary sequence alone; some specialized alignment software can also include Watson-Crick base pairs, but none adequately addresses the needs introduced by the rapid influx of both sequence and structural data. Therefore, we have developed the Boulder ALignment Editor (ALE), which is a web-based RNA alignment editor, designed for editing and assessing alignments using structural information. Some features of BoulderALE include the annotation and evaluation of an alignment based on isostericity of Watson-Crick and non-Watson-Crick base pairs, along with the collapsing (horizontally and vertically) of the alignment, while maintaining the ability to edit the alignment.

AVAILABILITY: http://www.microbio.me/boulderale.

Year:  2011        PMID: 21546392      PMCID: PMC3106197          DOI: 10.1093/bioinformatics/btr258

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

The RNA Alignment Ontology (Brown ) provides several key recommendations that are essential for the development of a user-friendly editor for alignments of a few dozen to a few hundred sequences, consisting of a few hundred base pairs. These are: (i) the incorporation of concepts introduced by the RNA Ontology, in particular the Leontis–Westhof classification system for non-Watson–Crick base pairs (Leontis ; Leontis and Westhof, 2001; Stombaugh ), which are the building blocks of tertiary motifs (Nasalean ); (ii) the annotation of specific regions within the structures (e.g. the P4 helix of RNase P), which can be used to support alternative notions of correspondence (sequence level versus structure level), including homology; and (iii) the ability to perform both horizontal and vertical collapsing of the alignment, allowing the user to focus on specific sequences or on specific regions of the alignment. Several additional considerations are especially useful for curating large databases of structured RNAs such as Rfam (Griffiths-Jones ), the RNase P database (Brown, 1999) and the tRNA database (Juhling ): (i) the ability to dynamically score an alignment based on how well it preserves features of a known structure, including non-Watson–Crick pairing, and to highlight mismatches in a context where the user can edit the alignment to resolve them; (ii) the ability to visualize the secondary structure of any sequence within the RNA family based on the consensus secondary structure, using standard tools that can be embedded in a web context (Darty ); and (iii) the ability to exploit recently discovered compositional preferences in RNA structural regions (Smit ) to indicate when an alignment is a plausible representative of a putative secondary structure.
Additional desiderata include a high level of interactivity, for example, the ability to dynamically rearrange rows of the alignments to juxtapose relevant groups, and the ability to stretch, rotate and otherwise manipulate the picture of the secondary structure, while keeping the bases aligned.

2 THE BOULDERALE SOFTWARE

BoulderALE is built on the PyCogent toolkit (Knight ) and combines these features into a single web application that will greatly assist both in the curation of RNA family databases and in the understanding of novel RNA structures. BoulderALE is available at http://www.microbio.me/boulderale, and source code and unit tests can be obtained from SourceForge under the GPL (http://sourceforge.net/projects/boulderale). The availability of the source code will allow the developers of other RNA resources to integrate BoulderALE into their own web sites. BoulderALE fully implements several of the RNA Ontology Consortium (ROC) recommendations, including the display and evaluation of non-Watson–Crick base pairs, the interactive annotation of structural regions within the RNA [including automatic inference of these annotations from Infernal (Nawrocki ) covariance models], horizontal and vertical collapsing based on manual choices of sequences or regions, and the display and evaluation of secondary structures. Some other considerations include automatically deciding which sequences or regions to collapse, and fully implementing the RNA Alignment Ontology correspondence concepts.
A typical workflow is as follows. First, the alignment is input as a Stockholm or FASTA file. Then, a list of valid base pairs, including non-Watson–Crick base pairs, associated with one reference sequence, is uploaded. A tab-delimited file including locations of regions or motifs can also be uploaded. Alternatively, the list of valid base pairs and features can be stored in the Stockholm file. Using the secondary structure (including non-Watson–Crick base pairs), it is then possible to highlight base pairs in the secondary structure that do or do not match in the alignment, and the user can then edit the alignment to optimize this matching. Base composition metrics can also be produced, and the secondary structure can be plotted. This workflow is illustrated in Figure 1.
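The base-pair matching step of this workflow can be illustrated with a minimal Python sketch. It is not BoulderALE's code: the function names, the toy alignment and the 0-based pair coordinates are hypothetical, and the check accepts only Watson-Crick and G-U wobble pairs as a simplifying assumption, whereas BoulderALE evaluates isostericity of both Watson-Crick and non-Watson-Crick pairs. The sketch maps base-pair positions given on the ungapped reference sequence to alignment columns, then lists the sequences whose bases in those columns do not pair.

```python
# Hypothetical illustration only; not taken from the BoulderALE source.
# Assumption: a pair "matches" if it is Watson-Crick or a G-U wobble.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}
GAPS = "-."


def reference_to_columns(aligned_ref):
    """Map 0-based positions in the ungapped reference to alignment columns."""
    return [col for col, base in enumerate(aligned_ref) if base not in GAPS]


def pair_mismatches(alignment, ref_name, ref_pairs):
    """Return {(i, j): [names of sequences whose bases do not pair]}.

    alignment: dict of name -> aligned sequence (all the same length)
    ref_pairs: (i, j) positions in the ungapped reference sequence
    """
    columns = reference_to_columns(alignment[ref_name])
    report = {}
    for i, j in ref_pairs:
        ci, cj = columns[i], columns[j]
        report[(i, j)] = [
            name
            for name, seq in alignment.items()
            # Treat DNA-style T as U; gaps never form a valid pair.
            if (seq[ci] + seq[cj]).upper().replace("T", "U") not in CANONICAL
        ]
    return report


# Toy hairpin: a three-pair stem around a loop, with gap columns.
alignment = {
    "ref":  "GGC-AAA-GCC",
    "seq1": "GACUAAAUGUC",  # A-U covariation at the middle pair: still paired
    "seq2": "GGC-AAA-GAC",  # G/A at the middle pair: mismatch
}
ref_pairs = [(0, 8), (1, 7), (2, 6)]
report = pair_mismatches(alignment, "ref", ref_pairs)
for pair, names in report.items():
    print(pair, "mismatches:", names or "none")
```

In an editor, the sequences reported for a given pair are the ones a user would inspect and possibly realign; a fuller treatment would replace the `CANONICAL` set with a lookup keyed by base-pair family.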
Fig. 1. Illustration of an alignment analysis with BoulderALE. Users upload a Stockholm or FASTA-formatted sequence file along with tab-delimited base pair and motif files. Next, users have options to annotate and edit their alignment, and can produce visualizations to aid in their analysis.

Finally, features can be mapped onto the alignment, and BoulderALE allows the alignment to be vertically or horizontally collapsed to focus the user's attention on specific taxa or regions of the sequence. In practice, it is often useful to do this iteratively, cleaning up a particular region of the alignment in each of several closely related groups of sequences, then aligning the groups of sequences to each other to reveal higher-level correspondences that rely more on structure than on sequence.
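The two kinds of collapsing can be sketched in Python as follows. This is an illustration under stated assumptions, not BoulderALE's implementation: the function names, the `~` placeholder and the toy alignment are hypothetical, and the functions are named after the operation they perform (hiding rows or hiding column ranges) because this section does not spell out which of the two is termed "horizontal" and which "vertical".

```python
# Hypothetical illustration only; not taken from the BoulderALE source.

def collapse_rows(alignment, keep_names):
    """Keep only the named sequences (rows), preserving their original order."""
    return {name: seq for name, seq in alignment.items() if name in keep_names}


def collapse_columns(alignment, regions, placeholder="~"):
    """Replace each column range with a single placeholder character.

    regions: non-overlapping (start, end) column ranges, end exclusive.
    """
    collapsed = {}
    for name, seq in alignment.items():
        parts, cursor = [], 0
        for start, end in sorted(regions):
            parts.append(seq[cursor:start])  # visible columns before the region
            parts.append(placeholder)        # marker for the hidden region
            cursor = end
        parts.append(seq[cursor:])           # visible columns after the last region
        collapsed[name] = "".join(parts)
    return collapsed


alignment = {
    "ref":  "GGC-AAA-GCC",
    "seq1": "GACUAAAUGUC",
    "seq2": "GGC-AAA-GAC",
}

# Hide the loop (columns 3 to 7) so that only the stem remains visible.
stem_only = collapse_columns(alignment, [(3, 8)])
# Focus on two sequences of interest.
subset = collapse_rows(alignment, {"ref", "seq2"})

for name, seq in stem_only.items():
    print(f"{name:5s} {seq}")
```

Because both functions return new dictionaries and leave the input untouched, the full alignment remains available for editing while a collapsed view is displayed, which mirrors the requirement that collapsing should not remove the ability to edit.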

3 COMPARISONS WITH OTHER SOFTWARE

Other software packages offer some functionality that overlaps with BoulderALE, but they are targeted at different alignment problems. Jalview (Clamp ), although web-embeddable, lacks the ability to incorporate structural data. BioEdit (Hall, 1999), although user-friendly and allowing for Watson–Crick pairing, is restricted to the Windows platform and does not allow for horizontal collapsing. S2S (Jossinet and Westhof, 2005) allows for non-Watson–Crick base pairs; however, many users find its interface conventions counterintuitive, since it was primarily designed for modeling RNA, and it cannot annotate and collapse structural motifs. MultiSeq (Roberts ) can filter and group redundant sequences, but lacks a representation of non-Watson–Crick base pairs. SARSE (Andersen ) and RALEE (Griffiths-Jones, 2005) allow for feature coloring; however, both lack the ability to annotate non-Watson–Crick base pairing and to perform horizontal collapsing. These examples are intended to be illustrative rather than exhaustive, since there are several sequence alignment editors to choose from, many of which are optimized for specific tasks other than those addressed here.

4 CONCLUSIONS

In conclusion, BoulderALE provides a user-friendly package that allows rapid visualization of RNA sequence alignments that have previously been inaccessible. Its collapsing features rapidly focus the user's attention on specific parts of the alignment, while its highlighting features allow users to identify specific sequences or regions that require manual cleanup. We believe BoulderALE will thus assist users in dealing with the flood of structural and sequence data now becoming available.

REFERENCES (10 of 16 shown)

1.  Geometric nomenclature and classification of RNA base pairs.

Authors:  N B Leontis; E Westhof
Journal:  RNA       Date:  2001-04       Impact factor: 4.942

2.  RALEE--RNA ALignment editor in Emacs.

Authors:  Sam Griffiths-Jones
Journal:  Bioinformatics       Date:  2004-09-17       Impact factor: 6.937

3.  Infernal 1.0: inference of RNA alignments.

Authors:  Eric P Nawrocki; Diana L Kolbe; Sean R Eddy
Journal:  Bioinformatics       Date:  2009-03-23       Impact factor: 6.937

4.  The RNA structure alignment ontology.

Authors:  James W Brown; Amanda Birmingham; Paul E Griffiths; Fabrice Jossinet; Rym Kachouri-Lafond; Rob Knight; B Franz Lang; Neocles Leontis; Gerhard Steger; Jesse Stombaugh; Eric Westhof
Journal:  RNA       Date:  2009-07-21       Impact factor: 4.942

5.  Semiautomated improvement of RNA alignments.

Authors:  Ebbe S Andersen; Allan Lind-Thomsen; Bjarne Knudsen; Susie E Kristensen; Jakob H Havgaard; Elfar Torarinsson; Niels Larsen; Christian Zwieb; Peter Sestoft; Jørgen Kjems; Jan Gorodkin
Journal:  RNA       Date:  2007-09-05       Impact factor: 4.942

6.  VARNA: Interactive drawing and editing of the RNA secondary structure.

Authors:  Kévin Darty; Alain Denise; Yann Ponty
Journal:  Bioinformatics       Date:  2009-04-27       Impact factor: 6.937

7.  MultiSeq: unifying sequence and structure data for evolutionary analysis.

Authors:  Elijah Roberts; John Eargle; Dan Wright; Zaida Luthey-Schulten
Journal:  BMC Bioinformatics       Date:  2006-08-16       Impact factor: 3.169

8.  RNA structure prediction from evolutionary patterns of nucleotide composition.

Authors:  S Smit; R Knight; J Heringa
Journal:  Nucleic Acids Res       Date:  2009-01-07       Impact factor: 16.971

9.  tRNAdb 2009: compilation of tRNA sequences and tRNA genes.

Authors:  Frank Jühling; Mario Mörl; Roland K Hartmann; Mathias Sprinzl; Peter F Stadler; Joern Pütz
Journal:  Nucleic Acids Res       Date:  2008-10-28       Impact factor: 16.971

10.  Frequency and isostericity of RNA base pairs.

Authors:  Jesse Stombaugh; Craig L Zirbel; Eric Westhof; Neocles B Leontis
Journal:  Nucleic Acids Res       Date:  2009-02-24       Impact factor: 16.971

