Haopeng Yu, Yiman Qi, Yiliang Ding.
Abstract
Deep learning, or artificial neural networks, is a type of machine learning algorithm that can decipher underlying relationships from large volumes of data and has been successfully applied to solve structural biology questions, such as RNA structure. RNA can fold into complex RNA structures by forming hydrogen bonds, thereby playing an essential role in biological processes. While experimental effort has enabled resolving RNA structure at the genome-wide scale, deep learning has been more recently introduced for studying RNA structure and its functionality. Here, we discuss successful applications of deep learning to solve RNA problems, including predictions of RNA structures, non-canonical G-quadruplex, RNA-protein interactions and RNA switches. Following these cases, we give a general guide to deep learning for solving RNA structure problems.
Keywords: RNA G-quadruplex; RNA secondary structure; RNA structure prediction; RNA tertiary structure; RNA-protein interaction; deep learning
Year: 2022 PMID: 35677883 PMCID: PMC9168262 DOI: 10.3389/fmolb.2022.869601
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
FIGURE 1. Schematic overview of the deep learning workflow. (A) Data processing. Supervised learning requires explicit labelling of the data: class labels for classification questions and numeric values for regression questions. (B) Model design. Multilayer perceptron (MLP), convolutional neural network (CNN) and recurrent neural network (RNN) are the three main families of deep learning architecture. Typically, deep learning models combine different architectures according to the structure of the data. (C) Model training. The full data set is first divided into a training set, a validation set and a test set. The input data is then passed through the model to obtain predicted values. A loss function measures the difference between the predicted and the true values, and the model weights are updated accordingly. (D) Model interpretation. Feature importance can be obtained by in silico mutations. For a CNN model, features can also be evaluated by extracting the weight matrix of each filter.
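Panels (C) and (D) of the workflow can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the toy labelling rule (class 1 when position `KEY_POS` is G), the 70/15/15 split, and the use of a single-layer logistic model as a stand-in for the MLP, CNN and RNN families. It uses only the Python standard library.

```python
import math
import random

BASES = "ACGU"
SEQ_LEN = 8
KEY_POS = 3  # hypothetical: the toy label depends only on this position

def one_hot(seq):
    """Encode an RNA sequence as a flat one-hot vector (4 values per base)."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]

def make_toy_data(n, rng):
    """Toy labelled data: class 1 if the base at KEY_POS is G, else class 0."""
    seqs = ["".join(rng.choice(BASES) for _ in range(SEQ_LEN)) for _ in range(n)]
    return [(s, 1.0 if s[KEY_POS] == "G" else 0.0) for s in seqs]

def split(data, train_frac=0.7, val_frac=0.15):
    """Divide the data into training, validation and test sets."""
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

def predict(model, seq):
    """Forward pass: sigmoid of a weighted sum of the one-hot input."""
    weights, bias = model
    z = bias + sum(w * x for w, x in zip(weights, one_hot(seq)))
    return 1.0 / (1.0 + math.exp(-z))

def loss(model, data):
    """Mean binary cross-entropy between predicted and true labels."""
    eps = 1e-9
    total = 0.0
    for seq, y in data:
        p = predict(model, seq)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def train(train_set, val_set, epochs=200, lr=0.5):
    """Full-batch gradient descent; returns the model and per-epoch validation loss."""
    weights, bias = [0.0] * (4 * SEQ_LEN), 0.0
    val_losses = []
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for seq, y in train_set:
            err = predict((weights, bias), seq) - y  # dLoss/dz for sigmoid + cross-entropy
            for i, x in enumerate(one_hot(seq)):
                grad_w[i] += err * x
            grad_b += err
        # Update the weights against the averaged gradient.
        weights = [w - lr * g / len(train_set) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(train_set)
        val_losses.append(loss((weights, bias), val_set))
    return (weights, bias), val_losses

def accuracy(model, data):
    """Fraction of sequences whose thresholded prediction matches the label."""
    return sum((predict(model, s) >= 0.5) == (y == 1.0) for s, y in data) / len(data)

def mutation_importance(model, seq):
    """In silico mutagenesis: largest change in prediction per mutated position."""
    base_pred = predict(model, seq)
    scores = []
    for i in range(len(seq)):
        deltas = [abs(predict(model, seq[:i] + b + seq[i + 1:]) - base_pred)
                  for b in BASES if b != seq[i]]
        scores.append(max(deltas))
    return scores

rng = random.Random(0)
train_set, val_set, test_set = split(make_toy_data(300, rng))
model, val_losses = train(train_set, val_set)
importance = mutation_importance(model, "AAAGAAAA")
print("test accuracy:", accuracy(model, test_set))
print("importance per position:", [round(s, 2) for s in importance])
```

The test set is touched only once, after training, while the validation loss is tracked during training. In this toy setting the mutation scan should single out `KEY_POS`, because no other position carries signal; with real data the scores would be spread over many positions.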
Deep learning-based models in RNA structure.
| Function | Name | Model | Method highlights | Link |
|---|---|---|---|---|
| RNA secondary structure prediction | SPOT-RNA | ResNet, LSTM | The model was first trained on a large volume of RNA secondary structures, then retrained with a transfer learning strategy on a small number of validated RNA structures | |
| | CDPfold | CNN, MLP | Predicts the pairing probability matrix of RNA structures and applies dynamic programming methods to generate RNA structures | |
| | DMfold | Bi-LSTM | Predicts the pairing probability matrix of RNA structures and applies IBPMP methods to generate RNA structures | |
| | | CNN, MLP | Integrates RNA thermodynamic methods, chemical probing data and co-evolutionary information into the model | |
| | | Bi-LSTM | Generates synthetic SHAPE data for RNA structure prediction | |
| | MXfold2 | CNN, Bi-LSTM | Four types of folding score are calculated for each nucleotide pair | |
| | UFold | FCN | The input is not the RNA sequence itself but a matrix of the 16 possible pairings and pairing features for each base pair | |
| RNA tertiary structure scoring | ARES | MLP | Many candidate RNA structures are first generated by sampling, and the model predicts how far each differs from the true structure, thus overcoming the shortage of solved RNA tertiary structures | |
| G-quadruplex structure prediction | G4NN | MLP | The model is trained on experimentally validated RNA GQSs and provides a stability score for RNA GQSs | |
| | PENGUINN | CNN | Robust to unbalanced data sets; provides an easy-to-use web interface | |
| | G4detector | CNN | Introduces RNA secondary structure information into the model to improve G4 prediction | |
| | DeepG4 | CNN, MLP | The model is trained on | |
| RNA structure-mediated protein interaction prediction | iDeepS | CNN, Bi-LSTM | Combines RNA sequence and RNA structure as input during model training | |
| | PrismNet | CNN, ResNet, SE network | Integrates experimental | |
| RNA structure-mediated regulatory element prediction | | MLP | The model outperformed comparable approaches by using RNA sequences directly as input data, rather than extracted features | |
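Several secondary structure entries in the table follow the same two-step pattern: a network predicts a pairing probability matrix, and a separate procedure turns that matrix into a structure. The sketch below illustrates only the second step, using a Nussinov-style dynamic programme that maximises the summed probability of nested base pairs. It is an illustrative assumption, not the decoding procedure of CDPfold or any other listed tool; the function name `decode_structure`, the `min_loop` and `threshold` parameters, and the toy matrix are all hypothetical.

```python
def decode_structure(prob, min_loop=3, threshold=0.5):
    """Turn an L x L pairing probability matrix into dot-bracket notation.

    Maximises the summed probability of nested pairs (no pseudoknots),
    requiring at least `min_loop` unpaired bases inside every hairpin and
    ignoring pairs whose probability is below `threshold`.
    """
    n = len(prob)
    score = [[0.0] * n for _ in range(n)]   # best score for subsequence i..j
    move = [[None] * n for _ in range(n)]   # decision taken to reach that score
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best, how = score[i + 1][j], ("skip_i",)          # leave i unpaired
            if score[i][j - 1] > best:                         # leave j unpaired
                best, how = score[i][j - 1], ("skip_j",)
            paired = score[i + 1][j - 1] + prob[i][j]
            if prob[i][j] >= threshold and paired > best:      # pair i with j
                best, how = paired, ("pair",)
            for k in range(i + 1, j):                          # two independent parts
                if score[i][k] + score[k + 1][j] > best:
                    best, how = score[i][k] + score[k + 1][j], ("split", k)
            score[i][j], move[i][j] = best, how

    # Trace back through the recorded decisions to recover the pairs.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j or move[i][j] is None:
            continue
        how = move[i][j]
        if how[0] == "skip_i":
            stack.append((i + 1, j))
        elif how[0] == "skip_j":
            stack.append((i, j - 1))
        elif how[0] == "pair":
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            stack.append((i, how[1]))
            stack.append((how[1] + 1, j))
    return "".join(structure)

# Toy matrix for a 10-nucleotide hairpin: three confident stem pairs
# and one low-probability pair that falls below the threshold.
n = 10
prob = [[0.0] * n for _ in range(n)]
for i, j in [(0, 9), (1, 8), (2, 7)]:
    prob[i][j] = prob[j][i] = 0.9
prob[0][5] = prob[5][0] = 0.3

print(decode_structure(prob))  # (((....)))
```

Separating prediction from decoding keeps the output a valid nested structure even when the predicted matrix contains conflicting pairs, at the cost of excluding pseudoknots in this simple formulation.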