Abstract
BACKGROUND: RNA sequencing (RNA-seq) enables scientists to develop novel data-driven methods for discovering previously unidentified lincRNAs. Meanwhile, knowledge-based technologies are undergoing a potential revolution ignited by new deep learning methods. By scanning newly generated RNA-seq data sets, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two pieces of evidence provide the rationale for adopting knowledge-based deep learning methods in lincRNA detection. As in the transcription of coding regions, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in biological processes as regulatory units instead of producing proteins. Identifying these transcriptional regions within non-coding regions is the first step towards lincRNA recognition.
Keywords: Auto-encoder; Deep learning; Knowledge-based discovery; Long intergenic non-coding RNA (lincRNA); RNA-seq; Transcription sites
Year: 2017 PMID: 29244011 PMCID: PMC5731497 DOI: 10.1186/s12859-017-1922-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Architecture of Deep Neural Network. (a) An illustration of the deep neural network architecture. (b) An illustration of an auto-encoder
Fig. 2 Flow Chart for Auto-encoder Method
Fig. 3 Five Encoding Schemes
Results on lincRNA Acceptor Data
| | I | II | III | IV | V |
|---|---|---|---|---|---|
| TP | 49.4 | 49.4 | 49.0 | 49.4 | 49.4 |
| FP | 0.0 | 0.2 | 0.0 | 1.4 | 50.6 |
| FN | 0.0 | 0.4 | 0.0 | 0.1 | 0.0 |
| TN | 50.5 | 50.4 | 0.6 | 49.2 | 0.0 |

| | I | II | III | IV | V |
|---|---|---|---|---|---|
| Sn | | 99.2 | | 99.9 | 100.0* |
| Sp | 99.9 | 99.6 | | 97.2 | 0.0 |
| Acc | 100.0 | 99.4 | | 98.5 | 49.4 |
| Mcc | 99.9 | 98.8 | | 97.1 | – |
| Ppv | 99.9 | 99.6 | | 97.2 | 49.4 |
| Pc | 99.9 | 98.8 | | 97.1 | 49.4 |
| F1 | | 99.4 | | 98.5 | 66.1 |

I: DAX, II: EIIP, III: Complementary, IV: Enthalpy, V: Galois
Panel : the measurement of methods
TP: True positive
FP: False positive
FN: False negative
TN: True negative
Panel : the evaluation of methods
Sensitivity, Sn=TP/(TP+FN)
Specificity, Sp=TN/(TN+FP)
Accuracy, Acc=(TP+TN)/(TP+FP+FN+TN)
Matthews correlation coefficient, Mcc=(TP×TN−FP×FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN))
Positive predictive value, Ppv=TP/(TP+FP)
Performance coefficient, Pc=TP/(TP+FN+FP)
F1 score, the harmonic mean of precision and sensitivity, F1=2×TP/(2×TP+FP+FN)
*: Not eligible for comparison due to training failure
–: Invalid value
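
The evaluation measures defined above can be checked against the measurement panel with a few lines of Python. This is a minimal sketch rather than the authors' code: the names `evaluate` and `ratio` are illustrative, the Mcc expression is the standard definition, and the scaling to percent is an assumption based on how the table reports its values. A measure whose denominator is zero is returned as `None`, which corresponds to the "–" (invalid value) entries.

```python
import math


def evaluate(tp, fp, fn, tn):
    """Return the evaluation measures, in percent, for one confusion matrix.

    A measure with a zero denominator is returned as None (shown as "–"
    in the tables).
    """
    def ratio(num, den):
        # Guard against division by zero instead of raising an error.
        return 100.0 * num / den if den else None

    # Denominator of the standard Matthews correlation coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {
        "Sn": ratio(tp, tp + fn),
        "Sp": ratio(tn, tn + fp),
        "Acc": ratio(tp + tn, tp + fp + fn + tn),
        "Mcc": ratio(tp * tn - fp * fn, mcc_den),
        "Ppv": ratio(tp, tp + fp),
        "Pc": ratio(tp, tp + fn + fp),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Scheme II (EIIP) on the acceptor data, using the TP/FP/FN/TN values
# listed in the measurement panel.
eiip = evaluate(tp=49.4, fp=0.2, fn=0.4, tn=50.4)
for name, value in eiip.items():
    print(name, "–" if value is None else round(value, 1))
```

Rounded to one decimal, the values for scheme II agree with the corresponding column of the evaluation panel. Calling `evaluate(tp=49.4, fp=50.6, fn=0.0, tn=0.0)` for scheme V yields `None` for Mcc, which is consistent with the "–" entry.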
Results on lincRNA Donor Data
| | I | II | III | IV | V |
|---|---|---|---|---|---|
| TP | 7.7 | 9.0 | 8.5 | 11.2 | 0.0 |
| FP | 2.1 | 2.7 | 2.8 | 4.5 | 0.0 |
| FN | 6.7 | 5.4 | 5.9 | 3.2 | 14.4 |
| TN | 83.5 | 82.9 | 82.8 | 81.1 | 85.6 |

| | I | II | III | IV | V |
|---|---|---|---|---|---|
| Sn | 53.2 | 62.5 | 58.8 | | 0.0 |
| Sp | | 96.9 | 96.7 | 94.8 | 100.0* |
| Acc | 91.2 | 91.9 | 91.2 | | 85.6 |
| Mcc | 60.1 | 64.9 | 61.5 | | – |
| Ppv | | 77.1 | 75.0 | 71.5 | – |
| Pc | 46.5 | 52.7 | 49.1 | | 0.0 |
| F1 | 63.5 | 69.0 | 65.9 | | 0.0 |

I: DAX, II: EIIP, III: Complementary, IV: Enthalpy, V: Galois
Panel : the measurement of methods
Panel : the evaluation of methods
*: Not eligible for comparison due to training failure
–: Invalid value
Fig. 4 Comparison between Support Vector Machine and Deep Learning on lincRNA Acceptor Data Set
Fig. 5 Comparison between Support Vector Machine and Deep Learning on lincRNA Donor Data Set
Fig. 6 Comparison between Conventional Neural Network Method and Deep Learning Method on lincRNA Acceptor Data Set
Fig. 7 Comparison between Conventional Neural Network Method and Deep Learning Method on lincRNA Donor Data Set
Fig. 8 An Unidentified lincRNA Acceptor Site