Rohan Parikh1, Briana Wilson1, Laine Marrah1, Zhangli Su1, Shekhar Saha1, Pankaj Kumar1, Fenix Huang2, Anindya Dutta1.
Abstract
tRNA fragments (tRFs) are small RNAs comparable in size and function to miRNAs. tRFs are generally Dicer-independent, are found associated with Ago, and can repress expression of genes post-transcriptionally. Given that this expands the repertoire of small RNAs capable of post-transcriptional regulation of gene expression, it is important to predict tRF targets with confidence. Some attempts have been made to predict tRF targets, but they are limited in the scope of tRF classes covered or in feature selection. We hypothesized that established miRNA target prediction features applied to tRFs through a random forest machine learning algorithm would greatly improve tRF target prediction. Using this approach, we show significant improvements in tRF target prediction for all classes of tRFs and validate our predictions in two independent cell lines. Finally, Gene Ontology analysis suggests that among the tRFs conserved between mice and humans, the predicted targets are significantly enriched in neuronal function, and we show this specifically for tRF-3009a. These improvements to tRF target prediction further our understanding of tRF function broadly across species and provide avenues for testing novel roles for tRFs in biology. We have created a publicly available website for the targets of tRFs predicted by tRForest.
Year: 2022 PMID: 35664803 PMCID: PMC9155213 DOI: 10.1093/nargab/lqac037
Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
Figure 1. Comprehensive workflow from initial dataset to validated algorithm applied to tRFs in seven species.
Sensitivity and positive predictive value (PPV) for tRForest with different seed match criteria (see Figure 2). The 7mer-m1 and 8mer seed matches outperform the other criteria.
| Seed match | Sensitivity | PPV |
|---|---|---|
| 6mer | 0.9439 | 0.9446 |
| 7mer-A1 | 0.9523 | 0.9525 |
| 7mer-m1 | 0.9616 | 0.9620 |
| 7mer-m8 | 0.9588 | 0.9585 |
| 8mer | 0.9622 | 0.9617 |
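The two metrics in the table above have standard definitions: sensitivity is TP / (TP + FN) and PPV is TP / (TP + FP), where TP, FN and FP count true positives, false negatives and false positives. The following is a minimal sketch of that arithmetic; the function name and the toy labels are hypothetical and are not taken from the study.

```python
def sensitivity_and_ppv(true_labels, predicted_labels):
    """Return (sensitivity, PPV) for binary labels: 1 = target, 0 = non-target."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # targets called targets
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # targets that were missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-targets called targets
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Hypothetical toy labels for illustration only (not data from the study).
truth = [1, 1, 1, 1, 0, 0, 0, 0]
calls = [1, 1, 1, 0, 1, 0, 0, 0]
sens, ppv = sensitivity_and_ppv(truth, calls)
print(sens, ppv)  # 0.75 0.75
```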
Figure 2. Site type breakdown for seed-matching. The 6mer seed comprises nucleotides 2–7 on the tRF, with additional matching of the target at nucleotides 1 and/or 8 on the tRF based on the site type.
Run time comparisons, sensitivity and PPV values for tRForest evaluated on various subsets of features.
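The site types in Figure 2 can be illustrated with a short classifier. This is a sketch under stated assumptions, not the tRForest implementation: it assumes perfect Watson-Crick pairing only, takes 7mer-m8 as pairing at tRF nucleotides 2–8, 7mer-m1 as pairing at nucleotides 1–7, 8mer as pairing at nucleotides 1–8, and 7mer-A1 as a 6mer with an adenine in the target opposite tRF nucleotide 1 (the usual miRNA convention). The function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return rna.translate(COMPLEMENT)[::-1]


def classify_seed_sites(trf, target):
    """Return (start, site_type) for every 6mer seed site in `target`.

    Both sequences are read 5'->3'. Because pairing is antiparallel, the base
    opposite tRF nucleotide 8 lies just upstream of the 6mer site in the
    target, and the base opposite nucleotide 1 lies just downstream.
    """
    trf = trf.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    if len(trf) < 8:
        raise ValueError("tRF must be at least 8 nt long")
    core = revcomp(trf[1:7])   # target 6mer pairing with tRF nucleotides 2-7
    pair8 = revcomp(trf[7])    # target base that would pair with nucleotide 8
    pair1 = revcomp(trf[0])    # target base that would pair with nucleotide 1
    sites = []
    start = target.find(core)
    while start != -1:
        upstream = target[start - 1] if start > 0 else ""
        downstream = target[start + 6] if start + 6 < len(target) else ""
        if upstream == pair8 and downstream == pair1:
            kind = "8mer"
        elif upstream == pair8:
            kind = "7mer-m8"
        elif downstream == pair1:
            kind = "7mer-m1"
        elif downstream == "A":
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = target.find(core, start + 1)
    return sites


# Hypothetical sequences chosen so that nucleotides 1-8 all pair.
print(classify_seed_sites("AGCUUAGCAA", "CCGCUAAGCUCC"))  # [(3, '8mer')]
```

Note that when tRF nucleotide 1 is a U, the 7mer-m1 and 7mer-A1 definitions assumed here coincide; the sketch reports such sites as 7mer-m1.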
| Rank by time to calculate | Feature subset | Sensitivity | PPV |
|---|---|---|---|
| 0 | All features | 0.9616 | 0.9620 |
| 1 | Removed phastCons flanking score | 0.9596 | 0.9597 |
| 2 | Removed phastCons stem score | 0.9603 | 0.9605 |
| 3 | Removed phyloP flanking score | 0.9596 | 0.9599 |
| 4 | Removed phyloP stem score | 0.9594 | 0.9597 |
| 5 | Removed binding energy | 0.9583 | 0.9587 |
| 6 | Removed position of longest consecutive pairing | 0.9606 | 0.9609 |
| 7 | Removed length of longest consecutive pairing | 0.9605 | 0.9610 |
| 8 | Removed AU content | 0.9604 | 0.9608 |
| 9 | Removed number of paired positions | 0.9616 | 0.9620 |
| 10 | Removed seed | 0.9602 | 0.9607 |
| 11 | Removed binding region length | 0.9570 | 0.9577 |
| 12 | Removed number of 3′ end pairs | 0.9602 | 0.9605 |
| 13 | Removed difference between seed and 3′ end pairs | 0.9613 | 0.9616 |
| N/A | All RFE ranked > 1 (RFE = 7) | 0.9458 | 0.9453 |
| N/A | All RFE ranked = 1 (RFE = 7) | 0.9423 | 0.9422 |
| N/A | Randomly shuffled | 0.5053 | 0.5054 |
The run times are ranked from 0 (slowest) to 13 (fastest). The clustering of the sensitivity and PPV values after removal of various features suggests that no individual or small subset of features contributes disproportionately to the classification. RFE: Recursive Feature Elimination. Last row: tRForest performs significantly better when labels properly indicate targets and non-targets compared to randomly shuffling the labels.
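The shuffled-label control in the last row can be illustrated with a toy experiment. The sketch below is an assumption-laden stand-in: it uses a one-feature midpoint-threshold classifier on synthetic Gaussian scores rather than a random forest, and none of the names or numbers come from the study. It only shows why a classifier evaluated on shuffled labels is expected to land near 0.5 for both sensitivity and PPV when classes are balanced.

```python
import random


def midpoint_classifier(xs, ys):
    """Fit a threshold halfway between the two class means (not a random forest)."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    sign = 1 if mean_pos >= mean_neg else -1
    return lambda x: 1 if sign * (x - threshold) > 0 else 0


def evaluate(xs, ys, seed):
    """Train on a random half, then return (sensitivity, PPV) on the other half."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    predict = midpoint_classifier([xs[i] for i in train], [ys[i] for i in train])
    tp = fn = fp = 0
    for i in test:
        call = predict(xs[i])
        if ys[i] == 1 and call == 1:
            tp += 1
        elif ys[i] == 1:
            fn += 1
        elif call == 1:
            fp += 1
    return tp / (tp + fn), tp / (tp + fp)


# Synthetic, balanced data: targets score around 4, non-targets around 0.
rng = random.Random(0)
labels = [1] * 1000 + [0] * 1000
scores = [rng.gauss(4.0 if y else 0.0, 1.0) for y in labels]

true_sens, true_ppv = evaluate(scores, labels, seed=1)

shuffled = labels[:]
rng.shuffle(shuffled)  # break the link between features and labels
shuf_sens, shuf_ppv = evaluate(scores, shuffled, seed=1)

print(f"true labels:     sensitivity={true_sens:.3f}  PPV={true_ppv:.3f}")
print(f"shuffled labels: sensitivity={shuf_sens:.3f}  PPV={shuf_ppv:.3f}")
```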
Figure 3. (A) Heatmap of Pearson correlations between pairs of features for tRF-3009a. Only the two different methods of computing evolutionary conservation scores are highly correlated, with low-to-moderate correlation between all other pairs. (B) Accuracy-efficiency analysis of tRForest. tRForest continues to perform well as the most time-intensive features are removed, allowing it to become more efficient while maintaining accuracy above 95%. (C) Receiver operating characteristic curve. tRForest performs nearly ideally compared to random guessing during training and testing.
Figure 4. (A) tRForest distinguishes targets from non-targets of tRF-3009 in RNA-sequencing data collected after overexpression of the parental tRNA (chr6.trna83-LeuTAA) in HEK293T cells, which is known to increase the level of tRF-3009a (8). (B) tRForest distinguishes targets from non-targets of tRF-3009 in independent RNA-sequencing data collected after transfection of a single-stranded tRF-3009 mimic in U87 cells.
tRForest outperforms several miRNA target prediction algorithms in its ability to predict repression of target genes upon tRF induction in two different cell lines.
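A feature-correlation heatmap such as the one in panel A is built from pairwise Pearson coefficients. The sketch below computes that matrix in plain Python; the feature names echo those listed in the feature table, but the values are invented for illustration, with the second conservation score constructed as exactly twice the first so that the pair correlates perfectly.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_matrix(features):
    """Map every ordered pair of feature names to its Pearson coefficient."""
    names = list(features)
    return {(a, b): pearson(features[a], features[b]) for a in names for b in names}


# Invented values for illustration only (not data from the study).
features = {
    "phastCons_stem": [0.10, 0.40, 0.35, 0.80, 0.95],
    "phyloP_stem": [0.20, 0.80, 0.70, 1.60, 1.90],  # exactly 2x the row above
    "AU_content": [0.50, 0.30, 0.50, 0.30, 0.40],
}
matrix = correlation_matrix(features)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: r = {r:.3f}")
```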
| Experiment | Cell type | Algorithm | # of targets | Effect size | P-value |
|---|---|---|---|---|---|
| tRNA overexpression (GSE99769) | HEK-293T | miRDB | 287 | −0.0436 | 0.0714 |
| | | TargetScan | 118 | −0.0484 | 0.0830 |
| | | tRFTar | 308 | −0.0770 | 0.0152 |
| | | tRFTarget | 1380 | −0.0417 | 9.74E-04 |
| | | tRFTars | 2519 | −0.0446 | 8.15E-05 |
| | | tRForest | 1083 | −0.0732 | 3.65E-04 |
| tRF-3009a mimic overexpression (GSE189510) | U87 | miRDB | 284 | −0.0082 | 0.759 |
| | | TargetScan | 114 | 0.0267 | 0.560 |
| | | tRFTar | 309 | −0.1646 | 0.0015 |
| | | tRFTarget | 1336 | −0.0350 | 0.0219 |
| | | tRFTars | 2458 | −0.0746 | 8.350E-05 |
| | | tRForest | 578 | −0.1458 | 2.64E-05 |
| miR-941 mimic overexpression (GSE93717) | HEK-293T | miRDB | 27 | −0.2160 | 9.45E-07 |
| | | TargetScan | 776 | −0.0279 | 3.35E-05 |
| miR-101-3p.1 mimic overexpression (GSE180331, sample GSM5460899) | 143BTK | miRDB | 893 | −0.2177 | < 2E-16 |
| | | TargetScan | 834 | −0.2107 | < 2E-16 |
| tRF-3004b overexpression (GSE197091) | HEK-293T | miRDB | 491 | −0.2373 | 2.93E-04 |
| | | TargetScan | 190 | −0.2049 | 6.09E-05 |
| | | tRFTar* | 0 | N/A | N/A |
| | | tRFTarget | 1110 | −0.1189 | 4.40E-06 |
| | | tRFTars | 541 | −0.1883 | 1.55E-07 |
| | | tRForest | 111 | −0.3857 | 3.49E-05 |
* No targets were found using tRFTar for the tRF-3004b sequence.
tRForest matches or outperforms all existing tRF target prediction algorithms in its ability to predict repression of target genes upon tRF induction in two different cell lines, across three experiments and two tRFs. Its performance is also comparable to that of miRDB and TargetScan evaluated on miRNA mimic overexpression data.
Figure 5. Several tRFs and tRF target GO terms are conserved. (A) Infographic showing the top 10 GO Biological Process terms targeted by tRFs across four species. (B) Upset plot showing the intersection of conserved tRF sequences across four species. (C) Venn diagram of the intersection of conserved, significant GO terms of targets predicted for the 54 conserved tRFs between humans and mice. (D) Top 10 GO terms conserved among predicted targets of human and mouse tRFs. The top 10 GO terms were selected based on q value. MF: molecular function. CC: cellular component. BP: biological process. (E) An example gene ontology analysis plot describing biological processes enriched among predicted targets of tRF-3009a. Left: dot plot. Right: gene-concept network plot.