Elakkiya R, Deepak Kumar Jain, Ketan Kotecha, Sharnil Pandya, Sai Siddhartha Reddy, Rajalakshmi E, Vijayakumar Varadarajan, Aniket Mahanti, Subramaniyaswamy V.
Abstract
Over the last decade, the field of bioinformatics has grown rapidly, and robust bioinformatics tools will play a vital role in future progress. Researchers in the field conduct a large number of studies to extract knowledge from the available biological data. The creation of massive amounts of imbalanced data has given rise to several bioinformatics problems. Classifying precursor microRNA (pre-miRNA) in imbalanced RNA genome data is one such problem. Studies have shown that pre-miRNAs can act as oncogenes or tumor suppressors in various cancer types. This paper introduces a Hybrid Deep Neural Network (H-DNN) framework for classifying pre-miRNA in imbalanced data. The proposed H-DNN framework integrates Deep Artificial Neural Networks (Deep ANN) and Deep Decision Tree Classifiers: the Deep ANN extracts meaningful features, and the Deep Decision Tree Classifier classifies the pre-miRNA. H-DNN was evaluated on animal, plant, human, and Arabidopsis genomes with imbalance ratios up to 1:5000, and on virus genomes with a ratio of 1:400. The experiments showed an accuracy of more than 99% in all cases, and the time complexity of the proposed H-DNN is also much lower than that of existing approaches.
Keywords: bioinformatics; deep artificial neural network; deep decision tree classifier; hybrid deep neural network; precursor microRNA
Year: 2021 PMID: 35004605 PMCID: PMC8733243 DOI: 10.3389/fpubh.2021.821410
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Comparison of various state-of-the-art methodologies.
| Reference | Methodology | Reported results | Dataset |
|---|---|---|---|
| Park ( | Recurrent Neural Network (RNN). | Various metrics were used. Experimental results yielded an F1-score of 0.93 for the Human dataset, 0.94 for the Cross-Species dataset, and 0.93 for the New pre-miRNA dataset. | Human, Cross-Species, and New miRNA |
| Thomas et al. ( | Deep learning with a restricted Boltzmann machine (RBM) for classification; a modified sampling technique is applied to address the class imbalance problem. | Various metrics were used. The model gave an accuracy of 0.968. | Human dataset |
| Stegmayer et al. ( | Deep self-organizing maps (DeepSOM) with clustering. | Various evaluation metrics were used. The model provided almost 95% Gm and 97% accuracy for the most imbalanced data. | H. sapiens, A. thaliana, Animal, and Plant |
| Bugnon et al. ( | Hybrid deep learning method with CNN and LSTM. | Various metrics were used for different imbalance ratios. Experimental results yielded an F1-score of more than 40% for the Animal dataset and more than 33% for the Plant dataset. | Animal and Plant |
| Tang et al. ( | CNN with different types of feature learning and encoding methods. | The results gave an accuracy of 99.25% and an F1-score of 99.25% for Rfam-300. | Rfam-300, Rfam-120, Rfam-60, Rfam-30 |
| Shi et al. ( | Localized multiple kernel learning model with a nonlinear synthetic kernel (LMKL-D). | The model gave an accuracy of 98.3%, sensitivity of 93.06 | pre-microRNAs and pseudo pre-microRNA |
| Yones et al. ( | Convolutional deep residual neural network. | The model provided a precision of | A. thaliana, C. elegans, A. gambiae, and H. sapiens |
| Proposed H-DNN methodology | Artificial Neural Network embedded with Decision Tree. | Various metrics were used for evaluation at different imbalance ratios. The results yielded an F1-score of more than 0.50 (0.50–0.93) for Animal, more than 0.58 (0.58–0.95) for Plant, more than 0.60 (0.60–0.94) for Human, more than 0.50 (0.50–0.95) for Arabidopsis, and more than 0.50 (0.50–0.94) for Virus. | Animal, Plant, Human, Arabidopsis, and Virus |
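The metrics quoted in this comparison (accuracy, sensitivity, precision, F1-score, and geometric mean, Gm) are all derived from a confusion matrix. The sketch below uses the standard textbook definitions, which are an assumption here because this record does not give the paper's own formulas. The helper `binary_metrics` and the two classifiers are hypothetical and only illustrative. The counts 44 and 218,154 are the animal positives and negatives at a 1:5000 ratio, treated as if the whole set were used for evaluation.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts (assumed definitions)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard each ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical classifier that labels every sequence as negative.
all_negative = binary_metrics(tp=0, fp=0, tn=218154, fn=44)

# Hypothetical classifier that finds 33 of the 44 positives with 11 false alarms.
partial = binary_metrics(tp=33, fp=11, tn=218143, fn=11)

for name, m in (("all-negative", all_negative), ("partial", partial)):
    print(name, {k: round(v, 4) for k, v in m.items()})
```

Under these definitions the all-negative classifier still scores above 99.9% accuracy while its sensitivity, F1-score, and Gm are all zero, which is why F1 and Gm are usually reported alongside accuracy for highly imbalanced data.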
Figure 1. Correlation matrix.
Figure 2. Structure of the ANN in each node of the decision tree in the proposed framework.
Proposed H-DNN.
Modified Backpropagation(a).
Figure 3. Proposed H-DNN framework.
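Number of positive and negative samples for different imbalance ratios (IRs).
| Imbalance ratio | Animal positive | Animal negative | Plant positive | Plant negative | Human positive | Human negative | Arabidopsis positive | Arabidopsis negative |
|---|---|---|---|---|---|---|---|---|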
| 1:1 | 7,053 | 7,053 | 2,172 | 2,172 | 1,406 | 1,406 | 231 | 231 |
| 1:100 | 2,182 | 218,154 | 1,149 | 114,929 | 812 | 81,228 | 231 | 23,100 |
| 1:500 | 436 | 218,154 | 230 | 114,929 | 162 | 81,228 | 57 | 28,359 |
| 1:1000 | 218 | 218,154 | 115 | 114,929 | 81 | 81,228 | 28 | 28,359 |
| 1:1500 | 145 | 218,154 | 77 | 114,929 | 54 | 81,228 | 19 | 28,359 |
| 1:2000 | 109 | 218,154 | 57 | 114,929 | 41 | 81,228 | 14 | 28,359 |
| 1:2500 | 87 | 218,154 | 46 | 114,929 | 32 | 81,228 | 11 | 28,359 |
| 1:3000 | 73 | 218,154 | 38 | 114,929 | 27 | 81,228 | 9 | 28,359 |
| 1:3500 | 62 | 218,154 | 33 | 114,929 | 23 | 81,228 | 8 | 28,359 |
| 1:4000 | 55 | 218,154 | 29 | 114,929 | 20 | 81,228 | 7 | 28,359 |
| 1:4500 | 48 | 218,154 | 26 | 114,929 | 18 | 81,228 | 6 | 28,359 |
| 1:5000 | 44 | 218,154 | 23 | 114,929 | 16 | 81,228 | 6 | 28,359 |
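The counts in this table are consistent with a simple subsampling rule: for a ratio of 1:n, keep all negatives and take round(negatives / n) positives; when that would require more positives than exist (Arabidopsis at 1:100), keep all positives and n times as many negatives. This rule is inferred from the numbers and is not stated in this record, and `samples_for_ratio` is a hypothetical helper written only to check the arithmetic.

```python
def samples_for_ratio(n_pos, n_neg, n):
    """Return (positives, negatives) for an imbalance ratio of 1:n.

    n_pos and n_neg are the full numbers of available positive and
    negative samples. The rule is an inference from the table, not a
    procedure documented by the authors.
    """
    if n == 1:
        # Balanced case: both classes are cut to the size of the smaller one.
        size = min(n_pos, n_neg)
        return size, size
    wanted_pos = round(n_neg / n)
    if wanted_pos > n_pos:
        # Not enough positives: keep them all and shrink the negatives instead.
        return n_pos, n_pos * n
    return wanted_pos, n_neg


# Full class sizes, read from the 1:1 rows and the largest negative counts.
animal = (7053, 218154)
arabidopsis = (231, 28359)

for n in (1, 100, 500, 5000):
    print(f"1:{n}", samples_for_ratio(*animal, n), samples_for_ratio(*arabidopsis, n))
```

The same rule also reproduces the virus counts reported for this study (237 positives, 839 negatives), for example 839 / 400 rounds to 2 positives at 1:400.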
The number of positive and negative samples for different imbalance ratios (IRs) for the virus dataset.
| Imbalance ratio | Positive | Negative |
|---|---|---|
| 1:1 | 237 | 237 |
| 1:50 | 17 | 839 |
| 1:100 | 8 | 839 |
| 1:200 | 4 | 839 |
| 1:300 | 3 | 839 |
| 1:400 | 2 | 839 |
Figure 4. Experimental results using (A) Animal, (B) Plant, (C) Human, (D) Arabidopsis, and (E) Virus genome datasets.
Experimental results using animal dataset.
| Imbalance ratio (1:n) | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.51 | 0.03 | 1.00 | 0.37 | 0.73 | 0.57 | 0.49 | 0.96 | 0.65 | 0.83 | 0.93 | 0.94 | 0.93 | 0.93 | 0.93 |
| 100 | 0.99 | 0.99 | 0.66 | 0.74 | 0.69 | 0.99 | 0.81 | 0.83 | 0.78 | 0.76 | 0.99 | 0.64 | 1.00 | 0.82 | 0.82 |
| 500 | 1.00 | 1.00 | 0.53 | 0.50 | 0.50 | 1.00 | 0.72 | 0.77 | 0.60 | 0.59 | 1.00 | 0.44 | 1.00 | 0.70 | 0.68 |
| 1,000 | 1.00 | 1.00 | 0.67 | 0.50 | 0.50 | 1.00 | 0.71 | 0.83 | 0.59 | 0.59 | 1.00 | 0.41 | 1.00 | 0.69 | 0.67 |
| 1,500 | 1.00 | 1.00 | 0.60 | 0.50 | 0.50 | 1.00 | 0.71 | 0.80 | 0.61 | 0.61 | 1.00 | 0.41 | 1.00 | 0.72 | 0.73 |
| 2,000 | 1.00 | 1.00 | 0.44 | 0.50 | 0.50 | 1.00 | 0.63 | 0.72 | 0.57 | 0.58 | 1.00 | 0.26 | 1.00 | 0.64 | 0.65 |
| 2,500 | 1.00 | 1.00 | 0.45 | 0.50 | 0.50 | 1.00 | 0.58 | 0.73 | 0.54 | 0.55 | 1.00 | 0.15 | 1.00 | 0.58 | 0.59 |
| 3,000 | 1.00 | 1.00 | 0.33 | 0.50 | 0.50 | 1.00 | 0.63 | 0.67 | 0.58 | 0.60 | 1.00 | 0.26 | 1.00 | 0.66 | 0.69 |
| 3,500 | 1.00 | 1.00 | 0.13 | 0.50 | 0.50 | 1.00 | 0.60 | 0.56 | 0.55 | 0.55 | 1.00 | 0.20 | 1.00 | 0.60 | 0.60 |
| 4,000 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.50 | 0.50 | 0.50 | 0.50 | 1.00 | 0.00 | 1.00 | 0.50 | 0.50 |
| 4,500 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.50 | 0.50 | 0.50 | 0.50 | 1.00 | 0.00 | 1.00 | 0.50 | 0.50 |
| 5,000 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.55 | 0.50 | 0.52 | 0.52 | 1.00 | 0.10 | 1.00 | 0.55 | 0.55 |
Experimental results using plant dataset.
| Imbalance ratio (1:n) | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.51 | 1.00 | 0.96 | 0.99 | 0.98 | 0.73 | 0.97 | 0.96 | 0.97 | 0.97 | 0.95 | 0.94 | 0.97 | 0.95 | 0.95 |
| 100 | 0.99 | 1.00 | 0.55 | 0.76 | 0.49 | 0.99 | 0.88 | 0.77 | 0.82 | 0.68 | 1.00 | 0.77 | 1.00 | 0.87 | 0.87 |
| 500 | 1.00 | 1.00 | 0.36 | 0.50 | 0.50 | 1.00 | 0.79 | 0.68 | 0.63 | 0.62 | 1.00 | 0.58 | 1.00 | 0.76 | 0.74 |
| 1,000 | 1.00 | 1.00 | 0.45 | 0.50 | 0.50 | 1.00 | 0.77 | 0.73 | 0.61 | 0.59 | 1.00 | 0.53 | 1.00 | 0.71 | 0.67 |
| 1,500 | 1.00 | 1.00 | 0.54 | 0.50 | 0.50 | 1.00 | 0.67 | 0.77 | 0.58 | 0.57 | 1.00 | 0.33 | 1.00 | 0.65 | 0.64 |
| 2,000 | 1.00 | 1.00 | 0.50 | 0.50 | 0.50 | 1.00 | 0.68 | 0.75 | 0.60 | 0.61 | 1.00 | 0.36 | 1.00 | 0.70 | 0.73 |
| 2,500 | 1.00 | 1.00 | 0.33 | 0.50 | 0.50 | 1.00 | 0.64 | 0.67 | 0.57 | 0.57 | 1.00 | 0.27 | 1.00 | 0.64 | 0.64 |
| 3,000 | 1.00 | 1.00 | 0.11 | 0.50 | 0.50 | 1.00 | 0.71 | 0.56 | 0.57 | 0.55 | 1.00 | 0.43 | 1.00 | 0.64 | 0.61 |
| 3,500 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.80 | 0.50 | 0.61 | 0.58 | 1.00 | 0.60 | 1.00 | 0.71 | 0.67 |
| 4,000 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.80 | 0.50 | 0.62 | 0.61 | 1.00 | 0.60 | 1.00 | 0.75 | 0.71 |
| 4,500 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.58 | 0.50 | 0.54 | 0.54 | 1.00 | 0.17 | 1.00 | 0.58 | 0.58 |
| 5,000 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.61 | 0.50 | 0.56 | 0.56 | 1.00 | 0.22 | 1.00 | 0.61 | 0.61 |
ANN and DT results using human dataset.
| Imbalance ratio (1:n) | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.50 | 1.00 | 0.99 | 0.86 | 0.85 | 0.72 | 0.97 | 0.97 | 0.90 | 0.90 | 0.94 | 0.94 | 0.95 | 0.94 | 0.94 |
| 100 | 0.99 | 1.00 | 0.79 | 0.50 | 0.50 | 0.99 | 0.85 | 0.89 | 0.67 | 0.66 | 0.99 | 0.69 | 1.00 | 0.83 | 0.83 |
| 500 | 1.00 | 1.00 | 0.56 | 0.50 | 0.50 | 1.00 | 0.75 | 0.78 | 0.63 | 0.63 | 1.00 | 0.51 | 1.00 | 0.76 | 0.76 |
| 1,000 | 1.00 | 1.00 | 0.46 | 0.50 | 0.50 | 1.00 | 0.67 | 0.73 | 0.61 | 0.65 | 1.00 | 0.34 | 1.00 | 0.72 | 0.81 |
| 1,500 | 1.00 | 1.00 | 0.41 | 0.50 | 0.50 | 1.00 | 0.76 | 0.71 | 0.62 | 0.61 | 1.00 | 0.53 | 1.00 | 0.74 | 0.72 |
| 2,000 | 1.00 | 1.00 | 0.45 | 0.50 | 0.50 | 1.00 | 0.79 | 0.72 | 0.63 | 0.62 | 1.00 | 0.57 | 1.00 | 0.77 | 0.75 |
| 2,500 | 1.00 | 1.00 | 0.24 | 0.50 | 0.50 | 1.00 | 0.75 | 0.62 | 0.59 | 0.57 | 1.00 | 0.50 | 1.00 | 0.69 | 0.65 |
| 3,000 | 1.00 | 1.00 | 0.13 | 0.50 | 0.50 | 1.00 | 0.80 | 0.56 | 0.58 | 0.56 | 1.00 | 0.60 | 1.00 | 0.67 | 0.62 |
| 3,500 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.60 | 0.50 | 0.55 | 0.55 | 1.00 | 0.20 | 1.00 | 0.60 | 0.60 |
| 4,000 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.80 | 0.50 | 0.65 | 0.65 | 1.00 | 0.60 | 1.00 | 0.80 | 0.80 |
| 4,500 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 1.00 | 0.50 | 0.56 | 0.54 | 1.00 | 1.00 | 1.00 | 0.62 | 0.57 |
| 5,000 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.92 | 0.50 | 0.63 | 0.60 | 1.00 | 0.83 | 1.00 | 0.76 | 0.69 |
ANN and DT results using Arabidopsis dataset.
| Imbalance ratio (1:n) | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.54 | 1.00 | 0.85 | 0.35 | 0.27 | 0.74 | 0.98 | 0.87 | 0.64 | 0.60 | 0.93 | 0.96 | 0.90 | 0.93 | 0.93 |
| 100 | 0.54 | 1.00 | 0.75 | 0.35 | 0.27 | 0.75 | 0.98 | 0.85 | 0.65 | 0.61 | 0.95 | 0.96 | 0.95 | 0.95 | 0.95 |
| 500 | 1.00 | 1.00 | 0.12 | 0.50 | 0.50 | 1.00 | 0.79 | 0.56 | 0.66 | 0.67 | 1.00 | 0.58 | 1.00 | 0.81 | 0.84 |
| 1,000 | 1.00 | 1.00 | 0.25 | 0.50 | 0.50 | 1.00 | 0.70 | 0.63 | 0.61 | 0.62 | 1.00 | 0.40 | 1.00 | 0.72 | 0.75 |
| 1,500 | 1.00 | 1.00 | 0.55 | 0.50 | 0.50 | 1.00 | 0.55 | 0.77 | 0.53 | 0.53 | 1.00 | 0.10 | 1.00 | 0.56 | 0.56 |
| 2,000 | 1.00 | 1.00 | 0.24 | 0.50 | 0.50 | 1.00 | 0.75 | 0.62 | 0.62 | 0.62 | 1.00 | 0.50 | 1.00 | 0.75 | 0.75 |
| 2,500 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.50 | 0.50 | 0.50 | 0.50 | 1.00 | 0.00 | 1.00 | 0.50 | 0.50 |
| 3,000 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.63 | 0.50 | 0.57 | 0.58 | 1.00 | 0.25 | 1.00 | 0.64 | 0.67 |
| 3,500 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 1.00 | 0.50 | 0.60 | 0.56 | 1.00 | 1.00 | 1.00 | 0.70 | 0.63 |
| 4,000 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.75 | 0.50 | 0.57 | 0.55 | 1.00 | 0.50 | 1.00 | 0.64 | 0.60 |
| 4,500 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.50 | 0.50 | 0.50 | 0.50 | 1.00 | 0.00 | 1.00 | 0.50 | 0.50 |
| 5,000 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.50 | 0.50 | 0.50 | 0.50 | 1.00 | 0.00 | 1.00 | 0.50 | 0.50 |
ANN and DT results using virus dataset.
| Imbalance ratio (1:n) | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.52 | 0.48 | 0.99 | 0.52 | 0.52 | 0.86 | 0.94 | 0.92 | 0.86 | 0.92 | 0.94 | 0.92 | 0.95 | 0.94 | 0.94 |
| 100 | 0.99 | 1.00 | 0.32 | 0.50 | 0.49 | 0.99 | 1.00 | 0.92 | 0.90 | 0.93 | 1.00 | 1.00 | 1.00 | 0.94 | 0.90 |
| 200 | 0.99 | 1.00 | 0.16 | 0.50 | 0.50 | 0.99 | 1.00 | 0.96 | 0.48 | 0.44 | 0.99 | 0.00 | 1.00 | 0.50 | 0.50 |
| 300 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 0.99 | 0.00 | 0.98 | 0.76 | 0.65 | 0.99 | 1.00 | 0.99 | 0.75 | 0.67 |
| 400 | 1.00 | 1.00 | 0.00 | 0.50 | 0.50 | 1.00 | 0.00 | 1.00 | 0.41 | 0.49 | 1.00 | 0.00 | 1.00 | 0.50 | 0.50 |