Santiago Radío, Rafael Sebastián Fort, Beatriz Garat, José Sotelo-Silveira, Pablo Smircich.
Abstract
Most signals involved in post-transcriptional regulatory networks are located in the untranslated regions (UTRs) of mRNAs. Therefore, to deepen our understanding of gene expression regulation, these regions must be delimited with high accuracy. The trypanosomatid lineage includes a variety of parasitic protozoans that cause a significant worldwide burden on human health. Given their peculiar mechanisms of gene expression, these organisms depend on post-transcriptional regulation as the main level of gene expression control. In this context, defining the UTRs is of key importance. We have developed UTR-mini-exon (UTRme), a stand-alone application with a graphical user interface (GUI) that identifies and annotates 5' and 3' UTRs with high accuracy. UTRme implements a multiple scoring system tailored to address the false-positive UTR assignments that frequently arise because of the characteristics of the intergenic regions. Even though it was developed for trypanosomatids, the tool can be used to predict 3' sites in any eukaryote and 5' UTRs in any organism where trans-splicing occurs (such as the model organism C. elegans). UTRme offers a way for non-bioinformaticians to precisely determine UTRs from transcriptomic data. The tool is freely available via the conda and GitHub repositories.
Keywords: GUI; UTR prediction software; post transcriptional regulation; prediction score; untranslated region
Year: 2018 PMID: 30619487 PMCID: PMC6305552 DOI: 10.3389/fgene.2018.00671
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Outline of the UTRme pipeline. Required initial files, data processing steps and software packages used during processing are depicted in dark gray, white and light gray backgrounds, respectively.
Figure 2. UTRme classification of read regions. Regions of each read and their counterparts in the genome are defined by UTRme as primary and secondary.
Example of UTRme summary report output.
| TcCLB.397937.5 | 15 | AG | 89 | 418 | 4 |
| TcCLB.398343.9 | 80 | AG | 79 | 2 | 2 |
| TcCLB.399033.19 | 21 | AG | 90 | 27 | 4 |
| TcCLB.400945.10 | 100 | AG | 85 | 39 | 4 |
| TcCLB.404001.10 | 14 | AG | 95 | 59 | 3 |
| TcCLB.404001.4 | 11 | AG | 91 | 75 | 5 |
| TcCLB.404843.20 | 143 | AG | 92 | 65 | 2 |
| TcCLB.405165.19 | 41 | AG | 92 | 54 | 4 |
| TcCLB.407477.20 | 10 | AG | 91 | 64 | 2 |
| TcCLB.407477.30 | 63 | AG | 96 | 51 | 4 |
Summary report of the best scoring SL sites in epimastigotes, using epimastigote RNA-seq data from Li et al. (2016).
Figure 3. Example of UTRme summary plots output. Reported plots for the 5′ and 3′ UTRs predicted using T. cruzi epimastigote Y strain RNA-seq data from Li et al. (2016). Plots for 5′ and 3′ UTRs are in dark gray and light gray, respectively. (A) Kernel density estimation plot of UTR lengths. (B) Kernel density estimation plot of both 5′ and 3′ UTR score distribution. (C) Kernel density estimation plot for the number of 5′ and 3′ UTR sites. In all cases the median is indicated as a dotted line. (D) Central panel: Scatter plot of 5′ UTR scores vs occurrences. A higher point density is indicated by a darker color for each bin. Upper panel: histogram of occurrences. Right panel: histogram of scores.
Comparison of UTRme predictions against experimentally defined processing sites.
| 5′ | TcCLB.509147.50 | 48 | 51 | 51 | 54 | 55 | Di Noia et al., |
| 5′ | TcCLB.511679.10 | 51 | 51 | 51 | 54 | 51 | Di Noia et al., |
| 3′ | TcCLB.506533.142 | 786 | 786 | 764 | – | 789 | Di Noia et al., |
| 3′ | TcCLB.511679.10 | – | 375 | – | – | ~353 | Di Noia et al., |
| 5′ | TcCLB.507485.140 | – | 140 | 137 | – | 137 | Teixeira et al., |
| 5′ | TcCLB.506407.10 | 93 | 102 | 101 | 718 | 103 | Vandersall-Nairn et al., |
| 5′ | TcCLB.509123.10 | – | 33 | – | – | 33 | García et al., |
| 5′ | TcCLB.505931.50 | 43 | 76 | 72 | 43 | 76 | Bontempi et al., |
| 5′ | TcCLB.507093.220 | 68 | 66 | 68 | – | 68 | D'Orso and Frasch, |
| 5′ | TcCLB.507639.30 | 42 | 42 | 42 | 42 | 42 | Coelho et al., |
| 5′ | TcCLB.507511.81 | – | 41 | 41 | – | 41 | Di Noia et al., |
| 5′ | TcCLB.510241.70 | – | 144 | 144 | 144 | 142 | Bhatia et al., |
| 5′ | TcCLB.506925.300 | 60 | 60 | 58 | 63 | 60 | Búa et al., |
| 5′ | TcCLB.506563.40 | 110 | 110 | 110 | 113 | 110 | Bartholomeu et al., |
For UTRme predictions, the best scoring site using T. cruzi epimastigote data is shown. UTRme 5′ enriched: UTRme predictions using in-house low-pass sequencing of a 5′ UTR enriched library. UTRme Li: UTRme predictions using data from Li et al. (2016).
Figure 4. UTRme accuracy assessment for 5′ UTRs. (A) Dependence of the number of true positives and false positives on the UTRme score (indicated as inserts). (B) False positive annotations are plotted as dots indicating their score and distance to the real processing site. The histogram shows the distribution of scores for all predicted sites.
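The kind of analysis summarized in Figure 4A can be illustrated with a minimal Python sketch. It is not UTRme's code: the function name `tp_fp_by_threshold`, the `tolerance` parameter, and the example values are hypothetical, and it assumes that a prediction counts as a true positive when it lies within `tolerance` nucleotides of the experimentally defined site.

```python
from typing import List, Tuple


def tp_fp_by_threshold(
    predictions: List[Tuple[float, int]],
    thresholds: List[float],
    tolerance: int = 0,
) -> List[Tuple[float, int, int]]:
    """Count true and false positives among predictions kept at each score cutoff.

    predictions: (score, distance to the real processing site in nt) pairs.
    A prediction is a true positive if |distance| <= tolerance (assumption).
    """
    counts = []
    for cutoff in thresholds:
        # Keep only predictions whose score reaches the cutoff.
        kept = [(s, d) for s, d in predictions if s >= cutoff]
        tp = sum(1 for _, d in kept if abs(d) <= tolerance)
        fp = len(kept) - tp
        counts.append((cutoff, tp, fp))
    return counts


# Hypothetical (score, distance) pairs, not taken from the paper.
example = [(95, 0), (90, 0), (60, 12), (40, 250), (85, 0), (30, 3)]
curve = tp_fp_by_threshold(example, thresholds=[0, 50, 80])
for cutoff, tp, fp in curve:
    print(f"score >= {cutoff}: TP={tp}, FP={fp}")
```

In this toy data set, raising the cutoff removes false positives while keeping every true positive, which is the behavior a score-based filter is meant to achieve.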
Figure 5. Venn diagrams comparing the 5′ processing site annotations of UTRme and SLaP mapper. (A) The intersection of the genes predicted by each tool is shown. (B) For genes where annotations are available for both tools, the intersection of the sites predicted by each tool is shown.
Figure 6. Comparison of UTRme best scoring sites with the ones predicted by SLaP mapper using Pastro et al. (2017) data. (A) Scatter plot of 5′ UTR lengths. Darker regions indicate higher density of points. (B) The percentage of points that have scores above a threshold is plotted for coincident and non-coincident sites. Dark gray: non-coincident sites. Light gray: coincident sites. The percentage was calculated for as long as the number of remaining sites stayed above 10. (C,D) Same as (A,B) for 3′ UTRs.
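The threshold curves described for panels (B) and (D) can be sketched in Python as follows. This is an illustrative reading of the caption, not the authors' code: the function name `percent_above`, the `min_sites` parameter, and the example scores are hypothetical. The sketch assumes thresholds are supplied in increasing order and that the curve stops once 10 or fewer sites remain.

```python
from typing import List, Tuple


def percent_above(
    scores: List[float],
    thresholds: List[float],
    min_sites: int = 10,
) -> List[Tuple[float, float]]:
    """Percentage of sites scoring above each threshold.

    Stops at the first threshold that leaves min_sites or fewer sites,
    mirroring the stopping rule stated in the caption (assumed reading).
    """
    total = len(scores)
    curve = []
    for threshold in thresholds:
        remaining = sum(1 for s in scores if s > threshold)
        if remaining <= min_sites:
            break
        curve.append((threshold, 100.0 * remaining / total))
    return curve


# Hypothetical scores for one group of sites (e.g. coincident sites).
coincident_scores = list(range(1, 41))  # 40 sites with scores 1..40
coincident_curve = percent_above(coincident_scores, [0, 10, 20, 30])
print(coincident_curve)
```

Running the same function separately on the coincident and non-coincident score lists would give the two curves that the panels compare.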