Jan Reinkensmeier, Robert Giegerich.
Abstract
RNA family models describe classes of functionally related, non-coding RNAs based on sequence and structure conservation. The most important method for modeling RNA families is the use of covariance models, which are stochastic models that serve in the discovery of yet unknown, homologous RNAs. However, covariance models perform poorly at finding remote homologs in RNA families with high sequence conservation, while for families with high structure but low sequence conservation, these models are difficult to build in the first place. A complementary approach to RNA family modeling involves the use of thermodynamic matchers. Thermodynamic matchers are RNA folding programs based on the established thermodynamic model, but tailored to a specific structural motif. Because thermodynamic matchers focus on structure and folding energy, they show their full potential in discovering homologs when high structure conservation is paired with low sequence conservation. In contrast to covariance models, constructing a thermodynamic matcher does not require an input alignment, but it does require human design decisions and experimentation, so model construction is more laborious. Here we report a case study on an RNA family model that was constructed by means of thermodynamic matchers. It starts from a set of known but structurally different members of the same RNA family. The consensus secondary structure of this family consists of 2 to 4 adjacent hairpins. Each hairpin loop carries the same motif, CCUCCUCCC, while the stems show high variability in their nucleotide content. The present study (1) describes a novel approach for integrating the structurally varying family into a single RNA family model by means of the thermodynamic matcher methodology, and (2) provides the results of homology searches conducted with this model in a wide spectrum of bacterial species.
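The consensus shape described in the abstract (2 to 4 adjacent hairpins, each loop carrying CCUCCUCCC) can be expressed as a simple post-hoc check on an already folded candidate. The sketch below is not a thermodynamic matcher: it computes no free energies, it assumes the structure is available in dot-bracket notation (for example from an MFE folding), and the function names are hypothetical.

```python
# Hypothetical post-hoc check of the cuckoo consensus shape; not a TDM.
MOTIF = "CCUCCUCCC"


def hairpin_loops(structure: str) -> list[tuple[int, int]]:
    """Half-open (start, end) ranges of hairpin loops in a dot-bracket string.

    In a balanced dot-bracket string, a '(' followed only by dots and then
    a ')' is a base pair that closes a hairpin loop.
    """
    loops = []
    i = 0
    while i < len(structure):
        if structure[i] == "(":
            j = i + 1
            while j < len(structure) and structure[j] == ".":
                j += 1
            if j < len(structure) and structure[j] == ")" and j > i + 1:
                loops.append((i + 1, j))
            i = j
        else:
            i += 1
    return loops


def top_level_domains(structure: str) -> int:
    """Number of outermost helices (domains opened at nesting depth 0)."""
    depth = 0
    domains = 0
    for ch in structure:
        if ch == "(":
            if depth == 0:
                domains += 1
            depth += 1
        elif ch == ")":
            depth -= 1
    return domains


def matches_cuckoo_shape(seq: str, structure: str, motif: str = MOTIF) -> bool:
    """True for 2 to 4 side-by-side hairpins whose loops all contain the motif."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    loops = hairpin_loops(structure)
    # "Adjacent" is approximated by requiring as many outermost domains as
    # hairpin loops, which rejects multiloops branching into several hairpins.
    if not 2 <= len(loops) <= 4 or len(loops) != top_level_domains(structure):
        return False
    return all(motif in seq[start:end] for start, end in loops)


hp_seq = "GCGCGC" + MOTIF + "GCGCGC"
hp_str = "((((((" + "." * 9 + "))))))"
print(matches_cuckoo_shape(hp_seq + "AA" + hp_seq, hp_str + ".." + hp_str))  # True
```

A check like this only inspects one given structure; a TDM instead searches all foldings that fit its grammar and scores them energetically.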
Keywords: CIN, conserved intergenic neighborhood; CM, covariance model; HMM, hidden Markov model; MFE, minimum free energy; OG, orthologous group of genes; RBS, ribosome binding site; RFM, RNA family model; TDM, thermodynamic matcher; aSD, anti Shine-Dalgarno; alphaproteobacteria; cuckoo RNA; dRNA-seq, differential RNA sequencing; family model; homology search; sRNA, small non-coding RNA; small RNA; structural RNA; thermodynamic matcher
Year: 2015 PMID: 25779873 PMCID: PMC4615179 DOI: 10.1080/15476286.2015.1017206
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652
Figure 1. RNA structures of experimentally validated cuckoo RNAs obtained by TDM folding. (A) RSs0680a RNA (position 692386-692458, Rhodobacter sphaeroides 2.4.1), (B) ReC11 RNA (462572-462689, Rhizobium etli CFN 42), (C) L5 RNA (1831446-1831604, Agrobacterium tumefaciens str. C58).
Distribution of cuckoo RNAs. The table summarizes the number of cuckoo RNAs and their distribution across CINs for each species. Columns HP2 to HP4 give the number of cuckoo RNAs of each structural variant (2, 3, or 4 hairpins). Columns CIN1–CIN6 show the distribution of cuckoo RNAs within the 6 CINs. Each digit represents a cuckoo RNA within a CIN, and the digit's value gives its number of hairpin modules. Unless denoted by a leading character (c = secondary chromosome, p = plasmid), a CIN, and therefore the corresponding cuckoo RNA, is located on the primary chromosome of the respective bacterium. The row for Brucella abortus A13334, for example, reads in words as follows: “In Brucella abortus A13334, we find 4 cuckoo RNAs, one with 2 hairpins, 2 with 3 hairpins, and one with 4 hairpins. The 4HP cuckoo is found on the main chromosome in neighborhood CIN1; in the same neighborhood, but on the secondary chromosome, we find a 2HP cuckoo. The 3HP cuckoos are found on the main chromosome in neighborhood CIN5, and on the secondary chromosome in neighborhood CIN6.” See Table S1 for complete sequences and detailed results.
| Species | HP2 | HP3 | HP4 | CIN1 | CIN2 | CIN3 | CIN4 | CIN5 | CIN6 |
|---|---|---|---|---|---|---|---|---|---|
| Polymorphum gilvum SL003B-26A1 | 0 | 1 | 1 | ||||||
| Brucellaceae | |||||||||
| Brucella abortus A13334 | 1 | 2 | 1 | 4;c2 | 3 | c3 | |||
| Brucella abortus S19 | 1 | 2 | 1 | 4;c2 | 3 | c3 | |||
| Brucella abortus bv. 1 str. 9-941 | 1 | 1 | 1 | 4;c2 | 3 | ||||
| Brucella canis ATCC 23365 | 1 | 1 | 1 | 4 | 3 | c2 | |||
| Brucella canis HSK A52141 | 1 | 1 | 1 | 4 | 3 | c2 | |||
| Brucella melitensis ATCC 23457 | 1 | 1 | 1 | 4;c2 | 3 | ||||
| Brucella melitensis M28 | 1 | 1 | 1 | 4;c2 | 3 | ||||
| Brucella melitensis M5-90 | 1 | 1 | 1 | 4;c2 | 3 | ||||
| Brucella melitensis NI | 1 | 1 | 1 | 4;c2 | 3 | ||||
| Brucella melitensis biovar Abortus 2308 | 2 | 1 | 1 | 4;c2 | 3 | c2 | |||
| Brucella melitensis bv. 1 str. 16M | 2 | 1 | 1 | 4 | 3 | c2 | |||
| Brucella microti CCM 4915 | 2 | 1 | 1 | 4;c2 | 3 | c2 | |||
| Brucella ovis ATCC 25840 | 1 | 1 | 1 | 4;c2 | |||||
| Brucella pinnipedialis B2/94 | 1 | 2 | 1 | 4;c2 | 3 | c3 | |||
| Brucella suis 1330 | 1 | 2 | 1 | 4;c2 | 3 | c3 | |||
| Brucella suis ATCC 23445 | 1 | 2 | 1 | 4;c2 | |||||
| Brucella suis VBI22 | 1 | 2 | 1 | 4;c2 | 3 | c3 | |||
| Ochrobactrum anthropi ATCC 49188 | 0 | 4 | 1 | 4;c3 | |||||
| Hyphomicrobiaceae | |||||||||
| Pelagibacterium halotolerans B2 | 0 | 2 | 0 | ||||||
| Phyllobacteriaceae | |||||||||
| Chelativorans sp. BNC1 | 2 | 4 | 0 | 3 | |||||
| Mesorhizobium australicum WSM2073 | 0 | 5 | 0 | 333 | |||||
| Mesorhizobium ciceri biovar biserrulae WSM1271 | 0 | 6 | 0 | 333 | |||||
| Mesorhizobium loti MAFF303099 | 0 | 6 | 0 | ||||||
| Mesorhizobium opportunistum WSM2075 | 0 | 5 | 0 | 333 | |||||
| Rhizobiaceae | |||||||||
| Agrobacterium tumefaciens str. C58 | 0 | 1 | 2 | ||||||
| Agrobacterium radiobacter K84 | 0 | 3 | 3 | 3 | 3 | 3 | |||
| Agrobacterium sp H13-3 | 0 | 2 | 2 | l4 | |||||
| Agrobacterium vitis S4 | 0 | 3 | 3 | c4 | |||||
| Rhizobium etli CFN 42 | 0 | 4 | 2 | p4 | 3 | 3 | |||
| Rhizobium etli CIAT 652 | 1 | 3 | 3 | p4 | 3 | ||||
| Rhizobium etli bv. mimosae str. Mim1 | 0 | 4 | 1 | p4 | 3 | 3 | |||
| Rhizobium leguminosarum bv. trifolii WSM1325 | 0 | 4 | 4 | 3;p4;p4 | 3 | 3 | |||
| Rhizobium leguminosarum bv. trifolii WSM2304 | 1 | 4 | 2 | 3;p4 | 3 | 3 | |||
| Rhizobium leguminosarum bv. viciae 3841 | 0 | 4 | 1 | p4 | 3 | ||||
| Rhizobium tropici CIAT 899 | 0 | 3 | 1 | p4 | 3 | 3 | |||
| Sinorhizobium fredii HH103 | 0 | 6 | 0 | 33 | p3 | ||||
| Sinorhizobium fredii NGR234 | 0 | 4 | 0 | 33 | p3 | ||||
| Sinorhizobium fredii USDA 257 | 0 | 6 | 0 | 33 | 3 | ||||
| Sinorhizobium medicae WSM419 | 1 | 4 | 2 | 33;p4 | p3 | ||||
| Sinorhizobium meliloti 1021 | 0 | 5 | 1 | ||||||
| Sinorhizobium meliloti 2011 | 0 | 5 | 1 | 3 | p3 | ||||
| Sinorhizobium meliloti AK83 | 0 | 5 | 1 | 3 | 3 | c3 | |||
| Sinorhizobium meliloti BL225C | 0 | 4 | 1 | 33 | 3 | p3 | |||
| Sinorhizobium meliloti GR4 | 0 | 4 | 1 | 33 | 3 | p3 | |||
| Sinorhizobium meliloti Rm41 | 0 | 5 | 1 | 3 | p3 | ||||
| Sinorhizobium meliloti SM11 | 0 | 4 | 1 | 33 | 3 | p3 | |||
| Rhodobacteraceae | |||||||||
| Parvibaculum lavamentivorans DS-1 | 1 | 0 | 0 | ||||||
| Dinoroseobacter shibae DFL 12 | 1 | 0 | 0 | ||||||
| Jannaschia sp. CCS1 | 3 | 0 | 0 | 222 | |||||
| Ketogulonicigenium vulgare WSH-001 | 0 | 1 | 0 | ||||||
| Ketogulonicigenium vulgare Y25 | 0 | 1 | 0 | ||||||
| Loktanella vestfoldensis DSM 16212 | 2 | 1 | 0 | ||||||
| Loktanella vestfoldensis SKA53 | 2 | 1 | 0 | 223 | |||||
| Oceanicola batsensis HTCC2597 | 2 | 0 | 1 | 24 | |||||
| Oceanicola granulosus HTCC2516 | 4 | 1 | 0 | 22223 | |||||
| Octadecabacter antarcticus 307 | 2 | 0 | 4 | ||||||
| Octadecabacter arcticus 238 | 3 | 1 | 3 | ||||||
| Paracoccus aminophilus JCM 7686 | 0 | 0 | 2 | 44 | |||||
| Paracoccus denitrificans PD1222 | 4 | 0 | 1 | 22224 | |||||
| Phaeobacter gallaeciensis 2.10 | 2 | 0 | 0 | 22 | |||||
| Phaeobacter gallaeciensis DSM 17395 | 2 | 0 | 0 | 22 | |||||
| Phaeobacter gallaeciensis DSM 26640 | 2 | 0 | 0 | ||||||
| Pseudovibrio sp FO-BEG1 | 0 | 0 | 1 | ||||||
| Rhodobacter capsulatus SB 1003 | 4 | 0 | 0 | 2222 | |||||
| Rhodobacter sphaeroides 2.4.1 | 7 | 0 | 0 | 2222222 | |||||
| Rhodobacter sphaeroides ATCC 17025 | 4 | 0 | 0 | 2222 | |||||
| Rhodobacter sphaeroides ATCC 17029 | 9 | 0 | 0 | 222222222 | |||||
| Rhodobacter sphaeroides KD131 | 7 | 0 | 0 | 2222222 | |||||
| Roseobacter denitrificans OCh 114 | 2 | 0 | 0 | 22 | |||||
| Roseobacter litoralis OCh 149 | 2 | 0 | 0 | 22 | |||||
| Roseovarius nubinhibens ISM | 2 | 0 | 0 | 2 | |||||
| Roseovarius sp. 217 | 2 | 0 | 0 | 22 | |||||
| Ruegeria pomeroyi DSS-3 | 3 | 0 | 0 | 222 | |||||
| Ruegeria sp TM1040 | 2 | 0 | 0 | 22 | |||||
| Sagittula stellata E-37 | 3 | 1 | 0 | 2223 | |||||
| Sulfitobacter sp. EE-36 | 1 | 0 | 1 | 24 | |||||
| Sulfitobacter sp NAS-14.1 | 2 | 0 | 1 | 24 | | | | | |
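The CIN cell notation described in the table caption is compact enough to decode mechanically. The sketch below is an illustration under one stated assumption: a leading c or p applies to every digit of its semicolon-separated entry (the caption only shows single-digit prefixed entries such as c2). Function names are hypothetical, and entries with any other leading character are rejected rather than guessed.

```python
from collections import Counter

# Replicon prefixes as defined in the table caption; no prefix = primary chromosome.
REPLICON = {"": "primary chromosome", "c": "secondary chromosome", "p": "plasmid"}


def parse_cin_cell(cell: str) -> list[tuple[str, int]]:
    """Decode one CIN cell into (replicon, hairpin count) pairs, one per cuckoo RNA.

    Entries are separated by ';'. A leading 'c' or 'p' is assumed to apply to
    every digit of its entry, and each digit stands for one cuckoo RNA.
    """
    rnas = []
    for entry in (part.strip() for part in cell.split(";")):
        if not entry:
            continue
        prefix = entry[0] if entry[0] in ("c", "p") else ""
        digits = entry[len(prefix):]
        if not digits.isdecimal():
            raise ValueError(f"unrecognized CIN entry: {entry!r}")
        rnas.extend((REPLICON[prefix], int(d)) for d in digits)
    return rnas


def hairpin_tally(cells: list[str]) -> Counter:
    """Count the CIN-assigned cuckoo RNAs of one table row by hairpin number."""
    return Counter(hp for cell in cells for _, hp in parse_cin_cell(cell))


print(parse_cin_cell("4;c2"))  # [('primary chromosome', 4), ('secondary chromosome', 2)]
# Brucella abortus A13334 with the CIN cells from the caption example:
# one 2HP, two 3HP and one 4HP cuckoo RNA, matching its HP2 to HP4 columns.
print(hairpin_tally(["4;c2", "3", "c3"]))
```

Only cuckoo RNAs assigned to a CIN appear in columns CIN1–CIN6, so for rows such as Polymorphum gilvum SL003B-26A1, which has no CIN entries, the tally can fall short of the HP2 to HP4 totals.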
Figure 2. Conserved intergenic neighborhoods of cuckoo RNAs. CINs are drawn as graphs. Nodes and edges depict conserved flanking features of cuckoo RNAs and represent single OGs and combinations of OGs, respectively. Nodes and edges are annotated with the number of cuckoo RNAs involved. Each node is a pie chart displaying the phylogenetic distribution, and its area is proportional to the number of flanked cuckoo RNAs. The colors green, yellow, purple, and red correspond to the taxa Brucellaceae, Rhizobiaceae, Phyllobacteriaceae, and Rhodobacterales.
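The bookkeeping behind such a CIN graph can be sketched as counting: each node (OG) tallies the cuckoo RNAs it flanks, split by taxon for the pie chart, and each edge tallies the RNAs flanked by a given OG combination. The input format below (one pair of flanking OGs plus a taxon per cuckoo RNA) and the OG identifiers are hypothetical. Because node area, not radius, is proportional to the count, the radius grows with the square root of the count.

```python
import math
from collections import Counter, defaultdict


def build_cin_graph(flanks):
    """Aggregate flanking orthologous groups (OGs) into node and edge counts.

    flanks: iterable of (og_a, og_b, taxon) triples, one per cuckoo RNA
    (hypothetical format: the two conserved flanking OGs and the taxon
    of the genome carrying the RNA).
    """
    node_taxa = defaultdict(Counter)  # OG -> taxon -> number of flanked cuckoo RNAs
    edges = Counter()                 # unordered OG pair -> number of cuckoo RNAs
    for og_a, og_b, taxon in flanks:
        for og in {og_a, og_b}:       # a set avoids double counting if og_a == og_b
            node_taxa[og][taxon] += 1
        edges[tuple(sorted((og_a, og_b)))] += 1
    return node_taxa, edges


def node_radius(n_rnas: int, scale: float = 1.0) -> float:
    """Radius of a circular node whose area is proportional to n_rnas."""
    return scale * math.sqrt(n_rnas / math.pi)


flanks = [("OG1", "OG2", "Brucellaceae"), ("OG1", "OG2", "Rhizobiaceae"),
          ("OG3", "OG1", "Rhizobiaceae")]
nodes, edges = build_cin_graph(flanks)
for og, taxa in sorted(nodes.items()):
    total = sum(taxa.values())
    # Pie-chart slices are the per-taxon fractions of the node's cuckoo RNAs.
    print(og, total, round(node_radius(total), 3),
          {t: n / total for t, n in taxa.items()})
```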
Figure 3. Skeleton (A), cuckoo TDM grammar (B), and cuckoo sequence motif constraints (C). In both grammars, struct is the axiom. Vertical bars separate alternative productions that start at the same nonterminal. Algebra functions are colored green and build the tree-like data structure from terminals and nonterminals. In the cuckoo TDM grammar, these functions call upon the energy functions of the thermodynamic model to compute free energies for the corresponding substructure. The following terminals (in blue) are used: ε denotes the empty word, b a single base from the RNA alphabet {A,C,G,U}, r a region of unpaired bases, and loc the end position of a neighboring subword. Numbers depict thresholds for size filters: a single number specifies the maximum size, while two numbers determine a size range. dangle applies a base pair filter (in red), requiring at least six base pairs. For each cuckoo motif in C, an alternative production of seqmotif exists, where r corresponds to the cuckoo motif. Cuckoo motifs are expressed using the IUPAC convention.
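The sequence side of the seqmotif rule (one alternative per cuckoo motif, each written in IUPAC notation) can be approximated with regular expressions. The motifs of Figure 3C are not reproduced in the text, so this sketch uses the CCUCCUCCC core from the abstract and a made-up degenerate motif for testing. It matches sequence only and ignores the grammar's loop placement, size filters, and energy evaluation; the function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes, restricted to the RNA alphabet {A,C,G,U}.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif into a regular expression over A, C, G, U."""
    parts = []
    for code in motif.upper():
        bases = IUPAC_RNA[code]  # raises KeyError for non-IUPAC characters
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def seqmotif_regex(motifs: list[str]) -> str:
    """One alternative per motif, mirroring the alternative productions of seqmotif."""
    return "|".join(f"(?:{iupac_to_regex(m)})" for m in motifs)


def motif_hits(seq: str, motifs: list[str]) -> list[int]:
    """0-based start positions of all, possibly overlapping, motif matches."""
    rna = seq.upper().replace("T", "U")
    # A zero-width lookahead lets re.finditer report overlapping hits.
    return [m.start() for m in re.finditer(f"(?={seqmotif_regex(motifs)})", rna)]


print(motif_hits("AACCUCCUCCCAA", ["CCUCCUCCC"]))  # [2]
```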
Figure 4. Pipeline for cuckoo RNA discovery based on TDMs. Different colors correspond to structural variants of cuckoo RNAs (HP2 in blue, HP3 in green, HP4 in red). Bacterial genome sequences from the NCBI reference genome database were gathered and scanned successively by the skeleton TDM, which focuses on primary sequence conservation, and the cuckoo TDM, which incorporates structural constraints. Then the energy filter is applied. HP2 cuckoo candidates that pass the structural filter are processed by the HP2 cuckoo TDM, which was adapted to match HP2 cuckoo RNAs specifically.
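The staged filtering in Figure 4 can be summarized as control flow. In this sketch every stage is a placeholder callable supplied by the caller, because the real stages are TDMs that are not reproduced here; the Candidate record is hypothetical, and no energy threshold is assumed. Following the order of the caption's sentences, HP2 candidates are re-checked by the HP2-specific matcher after the energy filter, which is an assumption about the exact ordering.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Candidate:
    """Hypothetical record for one TDM hit."""
    window: str    # scanned genome subsequence
    hairpins: int  # structural variant: 2, 3 or 4
    energy: float  # free energy reported by the matcher (kcal/mol)


def run_pipeline(
    windows: Iterable[str],
    skeleton: Callable[[str], bool],
    cuckoo_tdm: Callable[[str], Optional[Candidate]],
    energy_ok: Callable[[Candidate], bool],
    hp2_tdm: Callable[[str], Optional[Candidate]],
) -> list[Candidate]:
    """Apply the stages in caption order; every stage is a caller-supplied stub."""
    hits = []
    for window in windows:
        if not skeleton(window):                 # 1: sequence-centred skeleton TDM
            continue
        cand = cuckoo_tdm(window)                # 2: structure-aware cuckoo TDM
        if cand is None or not energy_ok(cand):  # 3: energy filter
            continue
        if cand.hairpins == 2:                   # 4: HP2 candidates re-checked
            cand = hp2_tdm(window)
            if cand is None:
                continue
        hits.append(cand)
    return hits
```

Passing the stages in as plain callables keeps the sketch independent of how any particular matcher is implemented and makes each stage easy to replace with a stub when testing the control flow.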