Moritz Schäfer1,2, Constance Ciaudo1.
Abstract
MicroRNAs (miRNAs) are well-studied small noncoding RNAs involved in post-transcriptional gene regulation in a wide range of organisms, including mammals. Their function is mediated by base pairing with their target RNAs. Although many features required for miRNA-mediated repression have been described, the identification of functional interactions is still challenging. In the last two decades, numerous Machine Learning (ML) models have been developed to predict their putative targets. In this review, we summarize the biological knowledge and the experimental data used to develop these ML models. Recently, Deep Neural Network-based models have also emerged in miRNA interaction modeling. We thus outline established and emerging models to give a perspective on the future developments needed to improve the identification of genes directly regulated by miRNAs.
Keywords: Deep Learning; Machine Learning; microRNA target prediction
Year: 2020 PMID: 32211130 PMCID: PMC7082591 DOI: 10.1016/j.csbj.2020.02.019
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Predictive features for functional miRNA-target interactions. a) Manually engineered features based on biological assumptions. The seed and supplementary region of the miRNA are marked in red; the corresponding complementary regions on the mRNA are marked in blue. Dots and Ns denote arbitrary nucleotides. Blue Ns and vertical bars denote Watson-Crick base pairing. From top to bottom: Nucleotides 2–8 define the seed of a miRNA. Extensive base matching, as well as an adenine opposite the first miRNA nucleotide, generally leads to stronger repression. By simulating a heteroduplex between the miRNA and its putative binding site, the binding free energy of the interaction can be determined. This feature is also computed for shorter parts of the interaction, such as the seed region. Nucleotides 13–16 of a miRNA are denoted as its supplementary region, and the extent of binding to this region is extracted into a feature. Of note, the mRNA can form a bulge opposite the miRNA’s central region. Functional miRNA targets are often conserved, and features have been developed to convert conservation into a usable metric. Since mRNAs can fold and form secondary structures, some target sites are more accessible for RISC-mediated repression than others. High AU content, either near the putative site or in the whole 3′UTR, has been shown to increase site accessibility and is therefore commonly used as a feature. mRNA folding is not the only factor that can hinder efficient repression. It has been suggested that the ribosome complex can compete with the RISC for binding, which might explain why coding sequence (CDS) regions are not targeted to the same extent as 3′UTRs. The position of a site within the 3′UTR, as well as the length of the 3′UTR, are therefore important binding site features. Individual miRNAs that target large sets of mRNAs might distribute their repression potential, leading to a decreased repression level for individual mRNAs. Here, the target site abundance is counted and used as a feature.
Target nucleotides bound to the seed or the supplementary region are colored in blue, and vertical dashes denote Watson-Crick base pairing. b) Implicit feature extraction by neural networks, based on the provided training data, here exemplified for a fictitious MIP NN model. Nucleotide identities for the relevant input sequences are the only input data provided. In the first layer(s), simple features are extracted from the raw sequence data, which are then combined into more complex features in the later layers. Such features may resemble engineered features as described in a), but may also include unexpected, yet predictive, representations. This hierarchical structure leads to the autonomous extraction of high-level features, ultimately enabling the assessment of the repression potential of input interactions. Panel b) is inspired by [20]. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
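Two of the engineered features in panel a), the seed match and the AU content, can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed tools. The site labels (6mer, 7mer-m8, 7mer-A1, 8mer) follow commonly used nomenclature rather than the legend itself, the function names are hypothetical, and only strict Watson-Crick pairing is considered (no G:U wobble, no bulges).

```python
# Assumption: sequences are upper-case RNA (A, C, G, U), written 5'->3'.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[n] for n in reversed(rna))


def find_seed_sites(mirna, utr):
    """Locate Watson-Crick matches to miRNA nucleotides 2-7 in a 3'UTR.

    Each hit is labelled by two optional extensions: pairing to miRNA
    nucleotide 8 (upstream on the mRNA) and an adenine opposite miRNA
    nucleotide 1 (downstream on the mRNA).
    Returns a list of (index of the 6-nt core match in utr, site type).
    """
    core = reverse_complement(mirna[1:7])   # pairs with miRNA nt 2-7
    nt8_partner = COMPLEMENT[mirna[7]]      # pairs with miRNA nt 8
    sites = []
    i = utr.find(core)
    while i != -1:
        m8 = i > 0 and utr[i - 1] == nt8_partner
        a1 = i + 6 < len(utr) and utr[i + 6] == "A"
        if m8 and a1:
            kind = "8mer"
        elif m8:
            kind = "7mer-m8"
        elif a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((i, kind))
        i = utr.find(core, i + 1)           # continue, allowing overlaps
    return sites


def au_content(seq):
    """Fraction of A/U nucleotides in seq (0.0 for an empty string)."""
    return sum(n in "AU" for n in seq) / len(seq) if seq else 0.0


# Illustrative input: hsa-let-7a-5p and a made-up 3'UTR fragment.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
print(find_seed_sites(LET7A, "AAACUACCUCAGGG"))  # [(4, '8mer')]
```

Real predictors combine such site calls with further features (free energy, conservation, site position, 3′UTR length, target site abundance), which this sketch leaves out.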
Properties of established and Deep Learning-based miRNA target prediction tools.
| Tool | ML type | Accessibility | Organisms | Last update | Features | Output | Training data set | Independent test data set | Particularities | References |
|---|---|---|---|---|---|---|---|---|---|---|
| TargetScan | Linear regression | Web, Dataset, Source code | mmu, dme, hsa, cel, dre | 2015 | Features from | cont. | 74 individual miRNA transfections and subsequent MicroArray readout in HeLa cells | 7 individual miRNA transfections and subsequent MicroArray readout in HCT116 cells; Experimentally validated interactions; CLIP-seq data set | - Strong focus on feature engineering | First: |
| RNA22 | No ML used | Web, Dataset | mmu, hsa, dme, cel | 2019 | folding energy, heteroduplex | cont. | N/A | N/A | - No ML employed | |
| miSTAR | 2-layer model using logistic regression and random forest | Web | hsa | 2016 | Features from | cont. | Luciferase reporter assay for 17 human mRNAs and 470 miRNA mimics | N/A | - Stacked model for ML based estimation of cooperative repression | |
| MiRTarget | SVM | Web, Dataset | mmu, hsa, rno, clf, ggm | 2019 | 96 features including the features from | cont. | 25 individual miRNA transfections and subsequent RNA-seq in HeLa cells | CLIP-seq data set; concurrent knockout of 25 miRNAs and subsequent MicroArray readout | N/A | First: |
| miRAW | “Normal” DNN | Dataset, Source code | hsa | 2018 | Nucleotide identities of miRNA (30 nts) and target site (40 nts) | t/n/f | Positive: CLASH and CLIP data set intersected with TarBase and mirTarBase validated interactions | 5 individual miRNA transfections and subsequent MicroArray readout | - Very broad identification of potential interaction sites | |
| deepTarget | RNN with Autoencoder for unsupervised input representation learning | Source code | hsa | 2016 | Nucleotide identities of miRNA (30 nts) and target site (30 nts) | t/f | Positive: Experimentally validated interaction data from miRecords | N/A | N/A | |
| DeepMirTar | Stacked Autoencoder | Source code | hsa | 2018 | 750 features grouped in categories “high level”, “expert-designed”, “low-level” and “raw-data-level” | t/f | Positive: CLASH data set and validated interaction data from miRecords | PAR-CLIP based interactions | N/A | |
| Biochemical affinity CNN1 | CNN | Source code | hsa | 2018 | Nucleotide identities of miRNA (10 nts) and target site (12 nts) | cont. | RISC binding affinity data for 6 individual miRNAs (AGO2 RNA bind-n-seq) | N/A | CNN used for prediction of miRNA-target binding affinities; affinities are forwarded into a separate regressor for final miRNA interaction efficacy prediction | |
t/f (true/false) denotes binary classification, t/n/f (true/neutral/false) denotes ternary classification, cont. denotes a continuous regression.
1No name given by publication.
Abbreviations: N/A (Not available), ML (Machine Learning), miRNA (microRNA), CNN (Convolutional Neural Network), SVM (Support Vector Machine), DNN (Deep Neural Network), ORF (Open reading frame), CLIP-seq (Cross-Linking ImmunoPrecipitation high-throughput sequencing), PAR-CLIP (Photoactivatable-Ribonucleoside-Enhanced CLIP-seq), CLASH (Cross-linking, Ligation and Sequencing of Hybrids), RISC (RNA-induced silencing complex), hsa (Homo sapiens), mmu (Mus musculus), dme (Drosophila melanogaster), cel (Caenorhabditis elegans), dre (Danio rerio), rno (Rattus norvegicus), clf (Canis lupus familiaris), ggm (Gallus gallus domesticus).
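Several Deep Learning-based tools in the table take only the nucleotide identities of the miRNA and the target site as input. As a rough illustration of such an input representation, the sketch below one-hot encodes a miRNA/site pair into a fixed-size matrix. The window lengths of 30 and 40 nts are borrowed from the miRAW row. The alphabet order, the zero-padding scheme, and the function names are assumptions made for this example and do not reproduce the preprocessing of any published tool.

```python
# Assumption: upper-case RNA alphabet in this fixed column order.
ALPHABET = "ACGU"


def one_hot(seq, length):
    """Encode seq as a list of `length` rows with 4 columns each.

    Sequences longer than `length` are truncated. Shorter sequences and
    unknown characters are represented by all-zero rows (an assumed
    padding convention).
    """
    rows = []
    for i in range(length):
        if i < len(seq) and seq[i] in ALPHABET:
            rows.append([1 if seq[i] == n else 0 for n in ALPHABET])
        else:
            rows.append([0, 0, 0, 0])
    return rows


def encode_pair(mirna, site, mirna_len=30, site_len=40):
    """Stack the encoded miRNA and target site into one input matrix."""
    return one_hot(mirna, mirna_len) + one_hot(site, site_len)


# Illustrative input: hsa-let-7a-5p and a made-up target site fragment.
x = encode_pair("UGAGGUAGUAGGUUGUAUAGUU", "AAACUACCUCAGGG")
print(len(x), len(x[0]))  # 70 4
```

A matrix like this would be passed to the first network layer. The label attached to it depends on the tool's output type (t/f, t/n/f, or cont.).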