Johannes Smolander, Alexey Stupnikov, Galina Glazko, Matthias Dehmer, Frank Emmert-Streib.
Abstract
BACKGROUND: Deciphering the meaning of human DNA is an outstanding goal that would revolutionize medicine and the way we treat diseases. In recent years, non-coding RNAs have attracted much attention and have been shown to be functional in part. Yet the importance of these RNAs, especially for higher biological functions, remains under investigation.
Keywords: Classification; Deep belief network; Deep learning; Lung cancer; Machine learning; Non-coding RNA
Year: 2019 PMID: 31796020 PMCID: PMC6892207 DOI: 10.1186/s12885-019-6338-1
Source DB: PubMed Journal: BMC Cancer ISSN: 1471-2407 Impact factor: 4.430
Confusion matrix summarizing the results for binary classifications
| | | True class | |
|---|---|---|---|
| | | Positive | Negative |
| Predicted class | Positive | True positives (TP) | False positives (FP) |
| | Negative | False negatives (FN) | True negatives (TN) |
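The accuracy (A), true positive rate (TPR) and true negative rate (TNR) columns in the result tables can be derived from these four counts. The following is a minimal sketch, assuming the standard definitions A = (TP + TN) / (TP + FP + FN + TN), TPR = TP / (TP + FN) and TNR = TN / (TN + FP). The function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, true positive rate and true negative rate in percent."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    positives = tp + fn  # all samples whose true class is positive
    negatives = tn + fp  # all samples whose true class is negative
    accuracy = 100.0 * (tp + tn) / total
    tpr = 100.0 * tp / positives if positives else float("nan")
    tnr = 100.0 * tn / negatives if negatives else float("nan")
    return accuracy, tpr, tnr


# Hypothetical counts, chosen only for illustration (not from the study).
acc, tpr, tnr = classification_metrics(tp=48, fp=6, fn=2, tn=44)
print(f"A = {acc:.2f} %, TPR = {tpr:.2f} %, TNR = {tnr:.2f} %")
```

When both classes contain the same number of samples, as in this example, A equals the mean of TPR and TNR.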
Summary of the best classification results for lung cancer
Only RNA-seq data from protein-coding genes were used. The best classifier is highlighted in green. For comparison, reference results from the literature are shown, highlighted in blue.
A: DBN results for the RNA-Seq data set. B: SVM results for the RNA-Seq data set
| A. | | | | | | | |
|---|---|---|---|---|---|---|---|
| | | DBN | | | DBN and SVM | | |
| Model | Architecture | A % | TPR % | TNR % | A % | TPR % | TNR % |
| DBN + Bprop | A-500-250-100-1 | 95.48±0.25 | 98.06±0.40 | 92.90±0.36 | 94.13±0.27 | 97.58±0.27 | 91.94±0.34 |
| DBN + Bprop | A-100-1 | 95.65±0.13 | 97.90±0.25 | 93.39±0.16 | 94.19±0.11 | 95.16±0.50 | 93.23±0.22 |
| DBN + Rprop | A-5-10-1 | 95.73±0.17 | 98.39±0.50 | 93.06±0.34 | 95.89±0.08 | 98.39±0.50 | 93.39±0.16 |
| DBN + Rprop | A-50-1 | 95.16±0.50 | 98.39±0.50 | 91.94±0.50 | | 98.39±0.50 | 93.55±0.50 |
| Task: AC vs N; non-coding, A=3124 | | | | | | | |
| DBN + Bprop | A-2000-1000-500-1 | | 100±0.50 | 93.55±0.50 | 95.16±0.50 | 96.94±0.16 | 93.39±0.16 |
| DBN + Bprop | A-100-1 | 95.97±0.50 | 100±0.50 | 91.94±0.50 | 95.48±0.13 | 97.42±0.26 | 93.55±0.50 |
| DBN + Rprop | A-5-10-1 | 96.05±0.22 | 98.39±0.50 | 93.71±0.45 | 96.45±0.13 | 98.06±0.22 | 94.84±0.22 |
| DBN + Rprop | A-50-1 | 94.35±0.50 | 100±0.50 | 88.71±0.50 | 95.48±0.13 | 98.39±0.50 | 92.58±0.26 |
| B. | | | | | | | |
| | | Radial | | | Linear | | |
| Data | Features | A % | TPR % | TNR % | A % | TPR % | TNR % |
| Protein-coding | 12360 | | 92.66±0.87 | 94.66±0.91 | 93.00±0.89 | 90.00±0.89 | 96.00±0.85 |
| Non-coding | 3124 | 91.41±0.97 | 88.00±0.99 | 94.83±0.56 | | 90.50±0.73 | 97.33±0.72 |
The best results are shown in bold
Summary of the best classification results for lung cancer
Only RNA-seq data from non-coding RNAs were used. The best classifier is highlighted in green. For comparison, reference results from the literature are shown, highlighted in blue.
Fig. 1 Comparison of the results for different classification methods (see x-axis) and data from coding RNAs (red) and non-coding RNAs (green) for RPKM normalization. a Accuracy of the classification. b True positive rate. c True negative rate
Fig. 2 Comparison of classification results for SVMs with a linear (red) and radial (green) basis kernel as a function of the number of input features/RNAs (x-axis). Data are from coding RNAs with RPKM normalization, and the label 'all' corresponds to 12360 RNAs. Feature selection methods used are a Variance, b JIM and c JMI
Fig. 3 Comparison of classification results for SVMs with a linear (red) and radial (green) basis kernel as a function of the number of input features/RNAs (x-axis). Data are from non-coding RNAs with RPKM normalization, and the label 'all' corresponds to 3124 RNAs. Feature selection methods used are a Variance, b JIM and c JMI
Fig. 4 Comparison of classification results for SVMs with a linear (red) and radial (green) basis kernel as a function of the number of input features/RNAs (x-axis). Data are from non-coding RNAs with TPM normalization, and the label 'all' corresponds to 1398 ncRNAs. Feature selection methods used are a Variance, b JIM and c JMI
Fig. 5 Comparison of classification results for SVMs with a linear (red) and radial (green) basis kernel as a function of the number of input features/RNAs (x-axis). Data combine coding and non-coding RNAs, and the label 'all' corresponds to 15484 RNAs. Feature selection method used is JIM. Data were RPKM normalized
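The figures refer to two normalizations of RNA-seq read counts, RPKM and TPM. This record does not state how they were computed, so the following is only a sketch of the commonly used definitions, assuming raw read counts and transcript lengths in base pairs are available. The function names and the example values are hypothetical.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no mapped reads")
    # count / (length in kb) / (total reads in millions)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    if scale == 0:
        raise ValueError("no mapped reads")
    return [r * 1e6 / scale for r in rates]


# Hypothetical read counts and transcript lengths (bp) for three RNAs.
counts = [100, 300, 600]
lengths_bp = [1000, 1500, 6000]

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
print(rpkm_values)
print(tpm_values)
print(sum(tpm_values))
```

Under these definitions, TPM values sum to one million within every sample, whereas the RPKM total depends on the length distribution of the expressed RNAs and can differ between samples.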