CMV: visualization for RNA and protein family models and their comparisons.

Florian Eggenhofer^1,2, Ivo L Hofacker^2,3, Rolf Backofen^1,4, Christian Höner Zu Siederdissen^2,5,6.

Abstract

Summary: A standard method for identifying novel RNAs or proteins is homology search with probabilistic models. One approach relies on the definition of families, which can be encoded as covariance models (CMs) or hidden Markov models (HMMs). Although these models are powerful tools, their complexity makes it tedious to inspect them in their default tabular form. This applies especially to the interpretation of comparisons between multiple models, as in family clans. The Covariance model visualization tools (CMV) visualize CMs or HMMs to: I) obtain an easily interpretable representation of HMMs and CMs; II) put them in context with the structural sequence alignments they were created from; III) investigate results of model comparisons and highlight regions of interest.

Availability and implementation: Source code (http://www.github.com/eggzilla/cmv), web-service (http://rna.informatik.uni-freiburg.de/CMVS).

Supplementary information: Supplementary data are available at Bioinformatics online.

Substances: RNA

Year: 2018 PMID: 29554223 PMCID: PMC6061798 DOI: 10.1093/bioinformatics/bty158

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Probabilistic models are constructed for specific RNA and protein families whose members share a common ancestor and a biological function. The most prominent instances are the HMM architecture used by HMMER3 (Eddy, 2011) and the CMs used by INFERNAL (Nawrocki and Eddy, 2013). Currently, 2686 RNA families are available from the Rfam database (Burge et al.; Kalvari et al.; Nawrocki et al.) and 16 712 from Pfam (Finn et al.). Visualizing the models provides an overview of whole regions and allows direct inspection of states, nodes and probabilities. An HMM visualization tool exists as part of SAM (Krogh et al.), whereas for CMs, as far as we are aware, no automatic solution exists.

2 Approach

Each tool of CMV accepts one or more models (INFERNAL or HMMER3 format) and, optionally, one or more corresponding alignments (Stockholm format) as input. The tools for comparison visualization require input in CMCompare (Eggenhofer et al.; Höner zu Siederdissen and Hofacker, 2010) format. Additional parameters control the level of detail of the visualization: the minimal setting shows only the index of each node, while the detailed setting shows states and probabilities. It is also possible to choose whether emission probabilities are displayed as numerical values or graphically. The number of alignment entries shown, the image size and the output format (svg, png, eps, pdf) can also be set via options. The tools were written using the diagrams library with a cairo back-end for visualization. For the first 100 Rfam models, processing a model with detailed output takes 13 s on average (see Supplementary Table 1).

The tools create one visualization output file per input model. If the Stockholm alignment for the family is provided, a second output file is generated per alignment. Family models can be drawn at three levels of detail (minimal, simple, detailed), and CMs additionally in a linear or tree layout. The minimal level shows each node of the model (roughly corresponding to a base pair, or to a single amino acid or nucleotide) as a box labeled with the node index. At the simple level, the visualization additionally includes the emission probabilities of each node for HMMs and the node type for CMs. The detailed level shows the individual states per node (encoding match, insertion and deletion options), together with emission and transition probabilities (see Fig. 1B–G). Emission probabilities are shown either as numerical values (score, probability) or as graphical bars. Transition probabilities are visualized as arrows between states, with higher probabilities drawn with increasing opacity, and as text labels. For more information and figures, see the Supplementary Material.
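The three detail levels and the score-to-probability conversion can be illustrated with a minimal, text-only sketch. It is not CMV's implementation (CMV draws with the diagrams library and a cairo back-end); the `HmmNode` container, the `render` function and the linear probability-to-opacity mapping are all hypothetical. The one format fact it relies on is that HMMER3 save files store probabilities as negative natural logarithms, with `*` meaning probability 0.

```python
import math
from dataclasses import dataclass, field


@dataclass
class HmmNode:
    """Hypothetical container for one HMM node (not CMV's data type)."""
    index: int
    emissions: list[str]  # HMMER3-style -ln(p) fields, one per alphabet symbol
    transitions: dict[str, str] = field(default_factory=dict)  # e.g. "M->M" -> -ln(p)


def to_prob(value: str) -> float:
    """HMMER3 stores probabilities as -ln(p); '*' encodes p = 0."""
    return 0.0 if value == "*" else math.exp(-float(value))


def opacity(p: float, floor: float = 0.1) -> float:
    """Assumed linear mapping: a more probable transition gets a more opaque arrow."""
    return floor + (1.0 - floor) * p


def render(node: HmmNode, level: str, alphabet: str = "ACGU", width: int = 20) -> list[str]:
    """Text stand-in for the minimal / simple / detailed views of one node."""
    lines = [f"node {node.index}"]  # minimal: node index only
    if level in ("simple", "detailed"):  # simple: add emission probabilities as bars
        for symbol, value in zip(alphabet, node.emissions):
            p = to_prob(value)
            lines.append(f"  {symbol} {p:.3f} {'#' * round(p * width)}")
    if level == "detailed":  # detailed: add transitions with arrow opacity
        for edge, value in node.transitions.items():
            p = to_prob(value)
            lines.append(f"  {edge} p={p:.3f} opacity={opacity(p):.2f}")
    return lines


if __name__ == "__main__":
    # Toy node: emissions 0.5/0.2/0.2/0.1, transitions 0.9/0.05/0.05 (as -ln(p)).
    node = HmmNode(
        index=1,
        emissions=["0.69315", "1.60944", "1.60944", "2.30259"],
        transitions={"M->M": "0.10536", "M->I": "2.99573", "M->D": "2.99573"},
    )
    for level in ("minimal", "simple", "detailed"):
        print(f"--- {level}")
        print("\n".join(render(node, level)))
```

Each level only appends lines to the previous one, which mirrors how the simple and detailed views extend the minimal view; a graphical renderer would replace the `#` bars and the opacity numbers with drawn bars and semi-transparent arrows.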

Fig. 1.

Visualization of HMM (B, C, D) and CM (E, F, G), consensus secondary structure (H, I) and Stockholm alignment (J) for the Hammerhead RNA_HH9 in comparison with families from the Hammerhead RNA family clan (A). Color labels indicate to which other model an alignment column or node has been linked via CMCompare (complete figures are shown in the Supplementary Material). A: color legend for the compared models; B: minimal HMM details show nodes with indices; C: simple HMM details also show emission probabilities; D: detailed HMM view shows states with emission and transition probabilities; E: minimal CM details show nodes with indices; F: simple CM details add node type information; G: detailed CM view shows nodes with states and emission and transition probabilities; H and I: secondary structure visualization via R2R and forna; J: slice of the input alignment, in which each line corresponds to one family member. Numbers on top of the columns represent the column index stored in the corresponding CM node.

Results of model comparison are visualized by labeling nodes with colors encoding the linked models (see Fig. 1A). Since the alignment columns corresponding to a node are known via the column index, the comparison information is also annotated in the alignment visualization (see Fig. 1J). For (structured) RNAs, this comparative information can be mapped back to the consensus secondary structure of the family, enabling the identification of specific motifs or regions that are linked. This is done by labeling a secondary structure visualization from R2R (Weinberg and Breaker, 2011) or, alternatively, an input file for forna (Kerpedjiev et al.) (see Fig. 1H and I). The tool is also available as a web-service, along with documentation and precomputed examples at all three detail levels for all models in the Rfam database and the first 1500 models of the Pfam database.
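The step from linked nodes to annotated alignment columns can be sketched as follows, under stated assumptions. The function `link_marker_line` is hypothetical: it takes a node-to-column map (two columns for a base-pair node, one otherwise), the set of nodes linked to each compared model, and a one-character legend label per model, and returns one marker per alignment column: `.` if unlinked, the model's label if linked to one model, `*` if linked to several. It does not parse real CMCompare output; the dictionaries below are toy data, and a text marker line stands in for the colors CMV draws.

```python
def link_marker_line(
    n_columns: int,
    node_columns: dict[int, tuple[int, ...]],
    links: dict[str, set[int]],
    labels: dict[str, str],
) -> str:
    """One marker per alignment column: '.' unlinked, a model label, or '*' for several models."""
    hits: list[set[str]] = [set() for _ in range(n_columns)]
    for model, nodes in links.items():
        for node in nodes:
            # A base-pair node contributes both of its alignment columns.
            for col in node_columns.get(node, ()):
                hits[col].add(labels[model])
    return "".join(
        "." if not h else (next(iter(h)) if len(h) == 1 else "*") for h in hits
    )


if __name__ == "__main__":
    # Toy data (hypothetical): node 1 pairs columns 0 and 9, node 3 pairs columns 3 and 6.
    node_columns = {1: (0, 9), 2: (1,), 3: (3, 6)}
    links = {"model_A": {1, 3}, "model_B": {3}}
    labels = {"model_A": "a", "model_B": "b"}
    rows = {"SS_cons": "(..(..)..)", "seq1": "GAAGUACAAC", "seq2": "GUAGAACUGC"}

    print(f"{'links':8}{link_marker_line(10, node_columns, links, labels)}")
    for name, row in rows.items():
        print(f"{name:8}{row}")
```

Running it prints the marker line `a..*..*..a` above the consensus structure and the two sequences. Because the consensus structure uses the same column coordinates as the alignment, the same markers line up with the paired brackets, which is the idea behind carrying comparison results over to a secondary structure drawing.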

3 Conclusion

We provide an open-source tool and web-service for visualizing HMMs, CMs, their alignments and, for RNA, their consensus secondary structure. The visualizations can supplement models in the Pfam and Rfam databases and enable convenient inspection of models newly constructed with RNAlien (Eggenhofer et al.), RNAscClust (Miladi et al.) or the RNA workbench (Backofen et al.; Grüning et al.). Nodes linked by comparison to other models are highlighted in the visualization, which allows investigation of sequence and structure elements shared within family clans. This simplifies the identification of domains, or of secondary structure elements, with potentially related biological function.

Funding

This project was funded, in part, by the Austrian Fonds zur Förderung der wissenschaftlichen Forschung (FWF), project Doktoratskolleg RNA Biology W1207-B09 and project SFB F43 RNA regulation of the transcriptome, and by the Deutsche Forschungsgemeinschaft (DFG), grants BA 2168/3-3 and BA 2168/16-1. The open access fee was covered by FWF F 4305-B09. We thank the anonymous reviewer for constructive comments, which helped us to improve the tools and the manuscript. Conflict of Interest: none declared.

References

1. Hidden Markov models in computational biology. Applications to protein modeling.

Authors: A Krogh; M Brown; I S Mian; K Sjölander; D Haussler
Journal: J Mol Biol Date: 1994-02-04 Impact factor: 5.469

2. Discriminatory power of RNA family models.

Authors: Christian Höner zu Siederdissen; Ivo L Hofacker
Journal: Bioinformatics Date: 2010-09-15 Impact factor: 6.937

3. RNAlien - Unsupervised RNA family model construction.

Authors: Florian Eggenhofer; Ivo L Hofacker; Christian Höner Zu Siederdissen
Journal: Nucleic Acids Res Date: 2016-06-21 Impact factor: 16.971

4. R2R--software to speed the depiction of aesthetic consensus RNA secondary structures.

Authors: Zasha Weinberg; Ronald R Breaker
Journal: BMC Bioinformatics Date: 2011-01-04 Impact factor: 3.169

5. Rfam 12.0: updates to the RNA families database.

Authors: Eric P Nawrocki; Sarah W Burge; Alex Bateman; Jennifer Daub; Ruth Y Eberhardt; Sean R Eddy; Evan W Floden; Paul P Gardner; Thomas A Jones; John Tate; Robert D Finn
Journal: Nucleic Acids Res Date: 2014-11-11 Impact factor: 19.160

6. Forna (force-directed RNA): Simple and effective online RNA secondary structure diagrams.

Authors: Peter Kerpedjiev; Stefan Hammer; Ivo L Hofacker
Journal: Bioinformatics Date: 2015-06-22 Impact factor: 6.937

7. The RNA workbench: best practices for RNA and high-throughput sequencing bioinformatics in Galaxy.

Authors: Björn A Grüning; Jörg Fallmann; Dilmurat Yusuf; Sebastian Will; Anika Erxleben; Florian Eggenhofer; Torsten Houwaart; Bérénice Batut; Pavankumar Videm; Andrea Bagnacani; Markus Wolfien; Steffen C Lott; Youri Hoogstrate; Wolfgang R Hess; Olaf Wolkenhauer; Steve Hoffmann; Altuna Akalin; Uwe Ohler; Peter F Stadler; Rolf Backofen
Journal: Nucleic Acids Res Date: 2017-07-03 Impact factor: 16.971

8. Rfam 11.0: 10 years of RNA families.

Authors: Sarah W Burge; Jennifer Daub; Ruth Eberhardt; John Tate; Lars Barquist; Eric P Nawrocki; Sean R Eddy; Paul P Gardner; Alex Bateman
Journal: Nucleic Acids Res Date: 2012-11-03 Impact factor: 16.971

9. The Pfam protein families database: towards a more sustainable future.

Authors: Robert D Finn; Penelope Coggill; Ruth Y Eberhardt; Sean R Eddy; Jaina Mistry; Alex L Mitchell; Simon C Potter; Marco Punta; Matloob Qureshi; Amaia Sangrador-Vegas; Gustavo A Salazar; John Tate; Alex Bateman
Journal: Nucleic Acids Res Date: 2015-12-15 Impact factor: 16.971

10. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families.

Authors: Ioanna Kalvari; Joanna Argasinska; Natalia Quinones-Olvera; Eric P Nawrocki; Elena Rivas; Sean R Eddy; Alex Bateman; Robert D Finn; Anton I Petrov
Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971
