Adi Millman, Daniel Dar, Maya Shamir, Rotem Sorek.
Abstract
A common strategy for regulation of gene expression in bacteria is conditional transcription termination. This strategy is frequently employed by 5'UTR cis-acting RNA elements (riboregulators), including riboswitches and attenuators. Such riboregulators can assume two mutually exclusive RNA structures: one forms a transcriptional terminator that results in premature termination, while the other forms an antiterminator that allows read-through into the coding sequence to produce a full-length mRNA. We developed a machine-learning-based approach that, given the 5'UTR of a gene, predicts whether it can form the two alternative structures typical of riboregulators employing conditional termination. Using a large positive training set of riboregulators derived from 89 human microbiome bacteria, we show high specificity and sensitivity for our classifier. We further show that our approach allows the discovery of previously unidentified riboregulators, as exemplified by the detection of new LeuA leaders and T-boxes in Streptococci. Finally, we developed PASIFIC (www.weizmann.ac.il/molgen/Sorek/PASIFIC/), an online web server that, given a user-provided 5'UTR sequence, predicts whether this sequence can adopt two alternative structures conforming to the conditional termination paradigm. This web server is expected to assist in the identification of new riboswitches and attenuators in the bacterial pan-genome.
Year: 2016 PMID: 27574119 PMCID: PMC5314783 DOI: 10.1093/nar/gkw749
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The principle of regulation by conditional termination. A riboregulator that functions through conditional transcription termination can assume two mutually exclusive structural conformations: (A) A ‘closed’ conformation that entails an intrinsic terminator, which is a stem-loop (strands #3 and #4) followed by a poly-U. This structure causes the RNA polymerase to terminate transcription prematurely. The terminator structure is usually preceded by another stem-loop structure called the anti-antiterminator or P1 (formed by strands #1 and #2). (B) An ‘open’ conformation, in which an antiterminator stem (not immediately followed by a poly-U) is generated from pairing of strands #2 and #3, allowing the RNA polymerase to continue transcription into the downstream gene. (C) The closed state (‘Gene off’) typically results in higher amounts of the short, prematurely terminated transcript, which can be measured by RNA-seq (5). In the open state (‘Gene on’) more full-length transcripts are observed.
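The distinction drawn in panels (A) and (B), a stem-loop that is or is not followed by a poly-U, can be illustrated with a minimal Python sketch. This is not the authors' method: the function name `has_poly_u_tail`, the 8-nt window, the 5-uridine cutoff, and the toy sequences are all assumptions chosen for illustration, and the structure is supplied as a dot-bracket string rather than predicted.

```python
def has_poly_u_tail(sequence, structure, window=8, min_u=5):
    """Return True if the last stem in `structure` is followed by a U-rich tract.

    `window` and `min_u` are illustrative thresholds, not values from the paper.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    # Position of the last closing bracket, i.e. the 3' end of the last stem.
    stem_end = structure.rfind(")")
    if stem_end == -1:
        return False  # no stem at all
    # Nucleotides immediately downstream of the stem (accept DNA or RNA letters).
    tail = sequence[stem_end + 1 : stem_end + 1 + window].upper().replace("T", "U")
    return tail.count("U") >= min_u


# Toy examples (hypothetical sequences): the same hairpin with and without a poly-U.
stem_structure = "((((((....)))))).........."
terminator_like = "GGCGCCUUCGGGCGCCUUUUUUUUAG"
antiterminator_like = "GGCGCCUUCGGGCGCCACGAGCAUAG"

print(has_poly_u_tail(terminator_like, stem_structure))
print(has_poly_u_tail(antiterminator_like, stem_structure))
```

Under these assumed thresholds the first toy sequence is flagged as terminator-like and the second is not, mirroring the difference between the stem in (A) and the antiterminator stem in (B).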
Selected features for the Random Forest classification
| Feature name | Explanation |
|---|---|
| ΔGopen | The free energy of the ‘open state’ RNA fold |
| ΔGclosed | The free energy of the ‘closed state’ RNA fold |
| ΔGopen/ΔGclosed | The ratio between the free energy of the two alternative folds |
| ΔGopen/length | The free energy of the ‘open state’ RNA fold normalized to the length of the sequence |
| ΔGantiterminator | The strength of the antiterminator stem-loop |
| ΔGantiterminator/length | The strength of the antiterminator stem, normalized to its length |
| ΔGantiterminator/ΔGterminator | The ratio between the strength of the antiterminator and terminator stem-loops |
| Endantiterminator | The distance, in nt, of the antiterminator stem end from the TSS |
| ΔGclosed/ΔGMFE | The ratio between the free energy of the ‘closed state’ RNA fold and the most stable folding of the RNA molecule retrieved from RNAFold |
| ΔGopen/ΔGMFE | The ratio between the free energy of the ‘open state’ RNA fold and the most stable folding of the RNA molecule retrieved from RNAFold |
| ΔGopen-ΔGMFE | The difference in kcal/mol between the free energy of the ‘open state’ RNA fold and the most stable folding of the RNA molecule |
| ΔGclosed-ΔGMFE | The difference in kcal/mol between the free energy of the ‘closed state’ RNA fold and the most stable folding of the RNA molecule |
| ΔGP1 | The strength of the anti-antiterminator (P1) stem-loop |
| ΔGP1/length | The strength of the P1 stem, normalized to its length |
| length P1 | The length of P1 |
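
Most entries in the table are simple ratios, differences, or length normalizations of a few free energies. The following Python sketch shows how such derived features could be assembled once the underlying energies are known. It is an illustration under stated assumptions, not the authors' code: the function name `derived_features`, its parameter names, and the example values are hypothetical, and the free energies (kcal/mol) are assumed to have been computed beforehand by a folding tool.

```python
def derived_features(dg_open, dg_closed, dg_mfe, dg_antiterm, dg_term, dg_p1,
                     seq_len, antiterm_len, p1_len, antiterm_end):
    """Assemble the table's features from precomputed free energies (kcal/mol).

    All argument names are hypothetical. Ratios whose denominator is zero
    are reported as None instead of raising an error.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator != 0 else None

    return {
        "dG_open": dg_open,
        "dG_closed": dg_closed,
        "dG_open/dG_closed": ratio(dg_open, dg_closed),
        "dG_open/length": ratio(dg_open, seq_len),
        "dG_antiterminator": dg_antiterm,
        "dG_antiterminator/length": ratio(dg_antiterm, antiterm_len),
        "dG_antiterminator/dG_terminator": ratio(dg_antiterm, dg_term),
        "End_antiterminator": antiterm_end,   # distance from the TSS, in nt
        "dG_closed/dG_MFE": ratio(dg_closed, dg_mfe),
        "dG_open/dG_MFE": ratio(dg_open, dg_mfe),
        "dG_open-dG_MFE": dg_open - dg_mfe,
        "dG_closed-dG_MFE": dg_closed - dg_mfe,
        "dG_P1": dg_p1,
        "dG_P1/length": ratio(dg_p1, p1_len),
        "length_P1": p1_len,
    }


# Made-up example values, for illustration only.
features = derived_features(
    dg_open=-20.0, dg_closed=-25.0, dg_mfe=-25.0,
    dg_antiterm=-10.0, dg_term=-12.5, dg_p1=-6.0,
    seq_len=100, antiterm_len=20, p1_len=12, antiterm_end=80,
)
for name, value in features.items():
    print(f"{name}: {value}")
```

In this made-up example the closed fold coincides with the minimum free energy fold, so the closed-to-MFE ratio is 1 and the corresponding difference is 0, while the open fold is 5 kcal/mol less stable than the MFE.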
Figure 2. Classification of riboregulators (positive set) and other small ncRNAs (negative set) using a Random Forest classifier. (A) Receiver Operating Characteristic (ROC) curve depicting the performance of the Random Forest classifier in differentiating riboregulators from other small RNAs. The area under the curve (AUC) is 0.9. Sensitivity and specificity are specified for the score threshold (0.5) chosen as the classifier threshold. (B) Prediction results for the test set. Individual elements belonging to the positive and negative sets are depicted by blue and red points, respectively. Y-axis depicts the classifier score. Thick horizontal line depicts the classifier threshold. (C) Box plot describing the classification score distribution of the positive and negative test sets.
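To make the thresholding in panel (A) concrete, the sketch below computes sensitivity and specificity for a set of classifier scores at a cutoff of 0.5. The scores and labels are invented for illustration and are not data from the paper; the function name `sensitivity_specificity` and the choice to treat a score equal to the threshold as a positive call are assumptions.

```python
def sensitivity_specificity(scores, labels, threshold=0.5):
    """Return (sensitivity, specificity) for scores cut at `threshold`.

    labels: 1 for the positive set (riboregulators), 0 for the negative set.
    A score >= threshold counts as a positive call (an assumption).
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Invented example: three positives and three negatives.
example_scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]

sens, spec = sensitivity_specificity(example_scores, example_labels)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sweeping `threshold` from 0 to 1 and plotting sensitivity against 1 minus specificity would trace out an ROC curve of the kind shown in panel (A).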
Figure 3. Identification of riboregulators in bacteria belonging to the human oral microbiome. (A) The leucine leader in Streptococcus sanguinis SK36. Data are shown for the Streptococcus sanguinis 2-isopropylmalate synthase (leuA) gene. Shown are RNA-seq data (blue curve), TSS inferred from 5′ end sequencing data (red arrows) and TTS inferred from Term-seq data (black arrows) (5). X-axis, position on the Streptococcus sanguinis chromosome (NC_009009). (B) Predicted alternative conformations of the LeuA 5′UTR: Left – the ‘closed’ state where strands #3 and #4 form a terminator. Right – the ‘open’ state, with the predicted antiterminator. (C) Sequence of leucine-rich uORFs upstream of LeuA in various Streptococcus species and E. coli. (D) RNA-seq data for the S. sanguinis phenylalanyl-tRNA synthase gene. (E) Predicted alternative conformations of the phenylalanyl-tRNA synthase 5′UTR, characteristic of a T-box leader structure. Left and right, the ‘closed’ and ‘open’ states, respectively.
Figure 4. The PASIFIC web server. (A) Screenshot of the query page (left) and the results page (right) with predicted alternative conformations of a TPP riboswitch given as an example. (B) Comparison of the experimental alternative structures of the purine riboswitch (adapted from (38)) (top) and its PASIFIC-predicted structures (bottom). Left: the ‘closed’ state, where the anti-antiterminator (P1) is stabilized by the bound purine and strands #3 and #4 form a terminator. Right: the ‘open’ state, where in the absence of purine the P1 stem is not stabilized and strand #2 is free to form an antiterminator with strand #3.