Georg Sauthoff, Mathias Möhl, Stefan Janssen, Robert Giegerich.
Abstract
MOTIVATION: Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error-prone and tedious. Bellman's GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique.
Year: 2013 PMID: 23355290 PMCID: PMC3582264 DOI: 10.1093/bioinformatics/btt022
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Left: tree representing an alignment of the amino acid sequences ‘DARLING’ and ‘AIRLINE’. rep(lace), ins(ert) and del(ete) denote the typical edit operations, and nil denotes an empty alignment. Right: tree representing a secondary structure assigned to an RNA sequence. pair indicates a base pair enclosing a sub-structure, split a branching structure, open an unpaired base next to a sub-structure, and nil the empty sub-structure
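The idea behind the left-hand tree can be sketched in a few lines of Python. This is only an illustration, not GAP-L and not the tree drawn in the figure: the nested-tuple encoding, the names `evaluate`, `unit_cost` and `pretty`, and the particular alignment chosen are all assumptions made for this example. It shows how one candidate tree built from rep, ins, del and nil can be interpreted by different evaluation algebras.

```python
# A candidate is a nested tuple: (operator, arguments..., sub-candidate).
# Hypothetical encoding chosen for this sketch only.
NIL = ("nil",)

def evaluate(tree, algebra):
    """Fold a candidate tree with an algebra (dict: operator name -> function)."""
    op, *args = tree
    values = [evaluate(a, algebra) if isinstance(a, tuple) else a for a in args]
    return algebra[op](*values)

# Algebra 1: unit edit cost (a mismatch or a gap costs 1).
unit_cost = {
    "nil": lambda: 0,
    "rep": lambda a, b, x: x + (a != b),
    "del": lambda a, x: x + 1,
    "ins": lambda b, x: x + 1,
}

# Algebra 2: render the alignment as two gapped strings.
pretty = {
    "nil": lambda: ("", ""),
    "rep": lambda a, b, x: (a + x[0], b + x[1]),
    "del": lambda a, x: (a + x[0], "-" + x[1]),
    "ins": lambda b, x: ("-" + x[0], b + x[1]),
}

# One possible alignment of DARLING and AIRLINE (an assumption, not
# necessarily the alignment shown in Fig. 1).
tree = ("del", "D",
        ("rep", "A", "A",
         ("ins", "I",
          ("rep", "R", "R",
           ("rep", "L", "L",
            ("rep", "I", "I",
             ("rep", "N", "N",
              ("rep", "G", "E", NIL))))))))

score = evaluate(tree, unit_cost)
rows = evaluate(tree, pretty)
```

The same tree yields a score under one algebra and a printable alignment under the other, which is the separation of search space and evaluation that the figure illustrates.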
Algebra products provided in Bellman’s GAP
| Operator | Product name | Effect | Restrictions |
|---|---|---|---|
| | Cartesian | Cartesian product | |
| | Lexicographic | Optimization under lexicographic ordering | |
| | | classified DP | |
| | | reporting candidates | |
| | Take-one | Same as | |
| | Interleaved | Optimization | |
| | Overlay | Stochastic backtrace | |
See Section 4.2 for products in action.
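As a rough illustration of what a product of algebras does, the following Python sketch combines a scoring algebra with a printing algebra in the style of a lexicographic product: candidates are optimized under the first algebra, and the second algebra's values are reported for the optimal ones. It is not GAP-L and not the Bellman's GAP implementation; `lex_product`, `align`, the dictionary-based algebras and the use of pairwise alignment as the example problem are all assumptions of this sketch.

```python
from functools import lru_cache

# Each algebra maps operator names to functions; "h" is its choice function.
unit_cost = {
    "nil": lambda: 0,
    "rep": lambda a, b, x: x + (a != b),
    "del": lambda a, x: x + 1,
    "ins": lambda b, x: x + 1,
    "h": lambda xs: [min(xs)] if xs else [],   # keep the minimal cost
}
pretty = {
    "nil": lambda: ("", ""),
    "rep": lambda a, b, x: (a + x[0], b + x[1]),
    "del": lambda a, x: (a + x[0], "-" + x[1]),
    "ins": lambda b, x: ("-" + x[0], b + x[1]),
    "h": lambda xs: list(xs),                   # keep every candidate
}

def lex_product(A, B):
    """Pair the values of A and B; choose by A first, then by B within ties."""
    def h(cands):
        out = []
        for l in dict.fromkeys(A["h"]([l for l, _ in cands])):
            out += [(l, r) for r in B["h"]([r2 for l2, r2 in cands if l2 == l])]
        return out
    return {
        "nil": lambda: (A["nil"](), B["nil"]()),
        "rep": lambda a, b, x: (A["rep"](a, b, x[0]), B["rep"](a, b, x[1])),
        "del": lambda a, x: (A["del"](a, x[0]), B["del"](a, x[1])),
        "ins": lambda b, x: (A["ins"](b, x[0]), B["ins"](b, x[1])),
        "h": h,
    }

def align(s, t, alg):
    """Tabulated recursion over all alignments of s and t, evaluated by alg."""
    @lru_cache(maxsize=None)
    def go(i, j):
        cands = []
        if i == len(s) and j == len(t):
            cands.append(alg["nil"]())
        if i < len(s) and j < len(t):
            cands += [alg["rep"](s[i], t[j], x) for x in go(i + 1, j + 1)]
        if i < len(s):
            cands += [alg["del"](s[i], x) for x in go(i + 1, j)]
        if j < len(t):
            cands += [alg["ins"](t[j], x) for x in go(i, j + 1)]
        return tuple(alg["h"](cands))
    return list(go(0, 0))

results = align("DARLING", "AIRLINE", lex_product(unit_cost, pretty))
```

Each entry of `results` is a pair of the optimal cost and one co-optimal alignment. The recursion itself never mentions scores or strings, so swapping in a different algebra or product changes the analysis without touching the search-space description.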
Modules related to RNA structure prediction based on thermodynamics
| Type | Name | Purpose |
|---|---|---|
| Signature | foldrna | Describes RNA folding space |
| Grammar | nodangle | Model without dangling bases |
| Grammar | overdangle | Model with overaggressive dangling |
| Grammar | microstate | Correct dangling with… |
| Grammar | macrostate | …Without candidate space extension |
| Grammar | nodangle_lp | Model allowing lonely pairs |
| Algebra | mfe | MFE computation |
| Algebras | pfunc | Boltzmann weights (partition function) |
| | pfunc_id | Individual candidate Boltzmann weight |
| Algebras | shapes1…5 | Classification by shape abstraction |
| Algebra | dotBracket | Printing structures |
Fig. 2. Measurement on 10 000 uniformly distributed random sequences with length between 1 and 500 bases. See text for discussion. Runtime (user + system) and memory consumption (max, RSS) measured by UNIX tool ‘memtime’ (by Johan Bengtsson). RNAfold version 2.0.2: -d 1 --noLP --noPS; UNAfold: hybrid-ss-min --suffix=DAT --mfold --NA=RNA --tmin=37 --tinc=1 --tmax=37 --sodium=1 --magnesium=0 -I; Bellman’s GAP: mfe * dotBracket --backtrace -t, microstate grammar. For short sequences, memory consumption appears to fit within the initial stack/heap size of an OS process
GAP-L program sizes and target code sizes
| Tool | Problem solved | No. of algebra functions | No. of algebras used | No. of NTs in | No. of distinct cases | No. of NTs tabulated | Lines of code GAP-L | Lines of code C++ |
|---|---|---|---|---|---|---|---|---|
| RNAshapes | Shape representative structures of RNA | 19 | 3 | 11 | 29 | 5 | 487 | 5456 |
| RNAshapes | Probabilistic shape analysis | 36 | 2 | 26 | 72 | 15 | 640 | 4796 |
| pknotsRG | Pseudoknot prediction | 38 | 2 | 25 | 63 | 19 | 755 | 9581 |
| GAP-RNAfold | RNA folding (RNAfold -d0 emulation) | 15 | 2 | 11 | 25 | 5 | 191 | 3719 |
| RF00553 | Covariance model remake | 258 | 2 | 80 | 257 | 77 | 3348 | 51 646 |
The number of cases distinguished in the problem decomposition is defined as the number of right-hand sides over all rules of the tree grammar. Lines of code are given for a typical instance using a product of two or three algebras, such as .
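To make the counting convention concrete, here is a small Python sketch that applies it to a toy grammar. The grammar, its informal string notation for right-hand sides, and the helper names `count_nonterminals` and `count_cases` are hypothetical and are not taken from any of the tools in the table.

```python
# Hypothetical toy tree grammar: each nonterminal maps to its list of
# right-hand sides, written here as informal strings (not GAP-L syntax).
toy_grammar = {
    "alignment": [
        "nil(EMPTY)",
        "rep(CHAR, alignment, CHAR)",
        "del(CHAR, alignment)",
        "ins(alignment, CHAR)",
    ],
}

def count_nonterminals(grammar):
    """Number of nonterminals (NTs) in the grammar."""
    return len(grammar)

def count_cases(grammar):
    """Number of distinct cases: right-hand sides summed over all rules."""
    return sum(len(alternatives) for alternatives in grammar.values())

n_nts = count_nonterminals(toy_grammar)
n_cases = count_cases(toy_grammar)
```

For this toy grammar the count is one nonterminal and four cases, one per edit operation plus the empty alignment.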