Jun Meng, Qiang Kang, Zheng Chang, Yushi Luan.
Abstract
BACKGROUND: Long noncoding RNAs (lncRNAs) play an important role in regulating biological activities, and predicting them is important for exploring biological processes. Long short-term memory (LSTM) and convolutional neural network (CNN) models can automatically extract and learn abstract information from encoded RNA sequences, which avoids complex feature engineering. An ensemble model learns the information from multiple perspectives and shows better performance than a single model. It is therefore feasible to treat an RNA sequence as a sentence and as an image to train an LSTM and a CNN, respectively, and then to hybridize the trained models to predict lncRNAs. To date, there are various predictors for lncRNAs, but few of them have been proposed for plants. A reliable and powerful predictor for plant lncRNAs is needed.
Keywords: Convolutional neural network; Deep learning; Long short-term memory; Plant; Prediction; lncRNA
Year: 2021 PMID: 33980138 PMCID: PMC8114701 DOI: 10.1186/s12859-020-03870-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Effect evaluations of different values of p in lncRNA-LSTM using 5-fold cross validation
Fig. 2 Effect evaluations of different hybrid strategies in PlncRNA-HDeep using 5-fold cross validation
Least significant difference of compared methods
| Method | Compared method | p value |
|---|---|---|
| PlncRNA-HDeep_G | CNN | 0.001+ |
| PlncRNA-HDeep_G | lncRNA-LSTM | 0.001+ |
| PlncRNA-HDeep_G | PlncRNA-HDeep_C | 0.078+ |
| PlncRNA-HDeep_G | PlncRNA-HDeep_L | 0.745− |
| PlncRNA-HDeep_C | CNN | 0.001+ |
| PlncRNA-HDeep_C | lncRNA-LSTM | 0.001+ |
| PlncRNA-HDeep_C | PlncRNA-HDeep_G | 0.078− |
| PlncRNA-HDeep_C | PlncRNA-HDeep_L | 0.040− |
| PlncRNA-HDeep_L | CNN | 0.001+ |
| PlncRNA-HDeep_L | lncRNA-LSTM | 0.001+ |
| PlncRNA-HDeep_L | PlncRNA-HDeep_G | 0.745+ |
| PlncRNA-HDeep_L | PlncRNA-HDeep_C | 0.040+ |
“+” means the method obtains better accuracy than the compared method; “−” means the compared method obtains better accuracy than the method. The difference between the results obtained by two methods is significant at the 0.05 level when the p value ⩽ 0.05
Impact evaluations of balanced and imbalanced sample datasets on performance of PlncRNA-HDeep
| Ratio | F1-score (%) | AUC (%) | GM (%) |
|---|---|---|---|
| 1:1 | 96.5 | 99.3 | 96.5 |
| 1:2 | 76.5 | 91.6 | 82.7 |
| 1:3 | 70.4 | 91.1 | 80.7 |
“Ratio” refers to the ratio of positive samples to negative samples in the dataset
Performance of PlncRNA-HDeep compared with six shallow machine learning methods
| Method | Sensitivity (%) | Precision (%) | Accuracy (%) | F1-score (%) |
|---|---|---|---|---|
| SVM | 87.8 | 92.3 | 90.6 | 90.0 |
| RF | 95.2 | 95.1 | 95.3 | 95.1 |
| k-NN | 90.6 | 94.0 | 92.7 | 92.3 |
| DT | 93.9 | 94.6 | 94.5 | 94.3 |
| NB | 76.7 | 80.3 | 80.0 | 78.4 |
| LR | 84.4 | 96.4 | 91.0 | 90.0 |
| PlncRNA-HDeep | 97.9 | 95.1 | 96.5 | 96.5 |
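The F1-score column in the table above can be cross-checked against the sensitivity and precision columns. The sketch below assumes the F1-score is the usual harmonic mean of sensitivity (recall) and precision; the function name `f1_from_percent` is illustrative and does not come from the paper.

```python
def f1_from_percent(sensitivity, precision):
    """Harmonic mean of sensitivity (recall) and precision, both given in percent."""
    return 2 * sensitivity * precision / (sensitivity + precision)

# (sensitivity %, precision %) copied from three rows of the table above
rows = {
    "SVM": (87.8, 92.3),
    "k-NN": (90.6, 94.0),
    "PlncRNA-HDeep": (97.9, 95.1),
}

for method, (sn, pr) in rows.items():
    # Rounded to one decimal place, as in the table
    print(f"{method}: F1 = {f1_from_percent(sn, pr):.1f}%")
```

Under that assumption the three rows reproduce the reported F1-scores of 90.0, 92.3 and 96.5.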
Fig. 3 ROC curves and AUC values obtained by PlncRNA-HDeep and six shallow machine learning methods
Performance of PlncRNA-HDeep compared with five existing tools
| Tool | Sensitivity (%) | Precision (%) | Accuracy (%) | F1-score (%) |
|---|---|---|---|---|
| CNCI | 64.5 | 90.5 | 78.9 | 75.3 |
| PLEK | 93.3 | 68.4 | 75.1 | 78.9 |
| CPC2 | 88.4 | 91.9 | 90.3 | 90.1 |
| LncADeep | 66.6 | 91.0 | 80.0 | 76.9 |
| lncRNAnet | 72.0 | 73.3 | 72.9 | 72.6 |
| PlncRNA-HDeep | 97.9 | 95.1 | 96.5 | 96.5 |
Fig. 4 Two encoding styles. a p-nucleotide encoding when the value of p is 3. b one-hot encoding
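Since the figure itself is not reproduced here, the two encoding styles can be illustrated with a minimal sketch. Several details are assumptions rather than the authors' specification: the nucleotide order `ACGT`, the mapping of `U` to `T`, overlapping windows with a configurable `stride`, and integer indices starting at 1 (leaving 0 free for padding). The function names are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide order; 'U' is mapped to 'T' below

def p_nucleotide_encode(seq, p=3, stride=1):
    """Encode each window of p nucleotides as an integer 'word' index.

    Indices start at 1 so that 0 stays free for padding (an assumption).
    Windows containing characters outside ALPHABET raise a KeyError.
    """
    vocab = {"".join(kmer): i + 1
             for i, kmer in enumerate(product(ALPHABET, repeat=p))}
    seq = seq.upper().replace("U", "T")
    return [vocab[seq[i:i + p]] for i in range(0, len(seq) - p + 1, stride)]

def one_hot_encode(seq):
    """Encode each nucleotide as a 4-dimensional indicator vector.

    Characters outside ALPHABET become an all-zero row.
    """
    seq = seq.upper().replace("U", "T")
    return [[1 if base == ref else 0 for ref in ALPHABET] for base in seq]

print(p_nucleotide_encode("ACGU", p=3))  # windows ACG and CGT
print(one_hot_encode("ACGU"))            # 4 x 4 matrix, one row per base
```

The first form yields a sequence of word indices, which suits the sentence-like input of an LSTM; the second yields a matrix with one row per nucleotide, which suits the image-like input of a CNN.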
Fig. 5 Architecture of lncRNA-LSTM
Fig. 6 Architecture of CNN