Weizhong Lu, Ye Tang, Hongjie Wu, Hongmei Huang, Qiming Fu, Jing Qiu, Haiou Li.
Abstract
BACKGROUND: RNA secondary structure prediction is an important problem in structural bioinformatics, and predicting RNA secondary structures that contain pseudoknots is NP-hard. Recently, many machine-learning methods, Markov models, and neural networks have been applied to this problem with encouraging predictive accuracy. However, their performance is usually limited by over-fitting and by the requirements of the learning model, which demands a fixed number of training features. Because most natural biological sequences vary in length, they have to be truncated before their features can be fed to the learning model, which not only loses information but also destroys the integrity of the biological sequence.
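The information loss caused by truncation can be illustrated with a minimal sketch. The helper names (`truncate`, `pad_with_mask`), the fixed length of 8, the padding symbol, and the toy sequences are all hypothetical and are not taken from the paper. The sketch only contrasts cutting sequences to a fixed length with padding them and keeping a mask, which is one common way to preserve variable-length input; it is not the authors' method.

```python
from typing import List, Tuple

PAD = "-"  # hypothetical padding symbol, not part of the RNA alphabet


def truncate(seq: str, fixed_len: int) -> str:
    """Cut a sequence to a fixed length; bases past fixed_len are lost."""
    return seq[:fixed_len]


def pad_with_mask(seqs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Pad every sequence to the longest one and mark which positions are real."""
    max_len = max(len(s) for s in seqs)
    padded = [s + PAD * (max_len - len(s)) for s in seqs]
    masks = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, masks


# Toy sequences of length 12 and 4 (illustrative only).
seqs = ["GGCUAGCUAGCC", "AUGC"]

# Truncation to a fixed length of 8 discards the tail of the longer sequence.
truncated = [truncate(s, 8) for s in seqs]
lost = [len(s) - len(t) for s, t in zip(seqs, truncated)]

# Padding keeps every base; the mask tells a model which positions to ignore.
padded, masks = pad_with_mask(seqs)
```

In this toy case, truncation drops four bases from the longer sequence, whereas padding with a mask retains every base of both sequences.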
Keywords: LSTM; Pseudoknots; RNA; Recurrent neural network; Secondary structure prediction
Year: 2019 PMID: 31874602 PMCID: PMC6929275 DOI: 10.1186/s12859-019-3258-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Framework of the Adaptive-LSTM with filter
Datasets
| Dataset | Number of sequences | Average length | Max length | Min length |
|---|---|---|---|---|
| TMR | 721 | 361.1 | 463 | 102 |
| ASE | 454 | 332.6 | 486 | 189 |
| SPR | 622 | 77.3 | 93 | 54 |
| SRP | 383 | 224.7 | 533 | 66 |
| RFA | 313 | 118.9 | 553 | 40 |
Fig. 2 Scatter plot comparing the accuracy of adaptive-LSTM with and without filter
MCC and ACC of adaptive LSTM and three other methods
| Dataset | Metric | ProbKnot | Cylofold | CentroidFold | Adaptive LSTM with filter |
|---|---|---|---|---|---|
| TMR | MCC | 0.105 | −0.043 | 0.106 | 0.434 |
| | ACC | 0.531 | 0.485 | 0.561 | 0.630 |
| SPR | MCC | 0.591 | * | 0.668 | 0.751 |
| | ACC | 0.796 | * | 0.834 | 0.870 |
| SRP | MCC | 0.262 | −0.184 | 0.177 | 0.421 |
| | ACC | 0.613 | 0.396 | 0.584 | 0.690 |
| RFA | MCC | 0.398 | 0.256 | 0.299 | 0.699 |
| | ACC | 0.677 | 0.624 | 0.650 | 0.661 |
| ASE | MCC | 0.238 | 0.043 | 0.286 | 0.323 |
| | ACC | 0.611 | 0.523 | 0.642 | 0.556 |
| Average | MCC | 0.319 | 0.014 | 0.307 | 0.483 |
| | ACC | 0.646 | 0.406 | 0.654 | 0.689 |
Boldface represents the highest MCC or ACC in comparison with the other three methods
*indicates that Cylofold produced no results on the SPR dataset, because it cannot accept the sequences with missing bases found in that dataset
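
The two metrics in the table can be sketched as follows, assuming the standard definitions of accuracy (ACC) and the Matthews correlation coefficient (MCC) computed from a confusion matrix. This section does not state how the paper labels individual bases, so the binary labeling used here (1 = paired, 0 = unpaired), the function names, and the toy label vectors are assumptions for illustration only.

```python
import math


def confusion_counts(true_labels, pred_labels):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = paired, 0 = unpaired)."""
    tp = tn = fp = fn = 0
    for t, p in zip(true_labels, pred_labels):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy per-base labels (illustrative only, not data from the paper).
true_labels = [1, 1, 1, 0, 0, 0, 1, 0]
pred_labels = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(true_labels, pred_labels)  # (TP, TN, FP, FN)
acc = accuracy(*counts)
m = mcc(*counts)
```

Because MCC ranges from −1 to 1, a prediction that does worse than chance yields a negative value, which is consistent with the negative Cylofold entries in the table.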
Fig. 3 Native secondary structure of RFA_00633
Fig. 4 Predicted secondary structure of RFA_00633. (a) ProbKnot. (b) Cylofold. (c) CentroidFold. (d) Adaptive LSTM with energy-based filter