Sriparna Saha¹, Sayantan Mitra², Ravi Kant Yadav².
Abstract
MicroRNAs (miRNAs) play vital roles in biological processes such as RNA splicing and the regulation of gene expression. Studies have suggested possible links between oncogenesis and the expression profiles of some miRNAs, because these miRNAs are differentially expressed between normal and tumor tissues. However, automatically classifying miRNAs into categories based on the similarity of their expression values has rarely been addressed. This article proposes a two-stage framework for solving real-life classification problems on cancer, miRNA, and mRNA expression datasets. In the first stage, a multiobjective optimization framework based on the non-dominated sorting genetic algorithm II (NSGA-II) automatically determines the appropriate classifier type, along with a suitable combination of parameters and features, for classifying a given dataset. In the second stage, a stack-based ensemble technique combines the set of solutions obtained in the first stage into a single solution. The performance of the proposed two-stage approach is evaluated on several cancer and RNA expression profile datasets. Compared with several state-of-the-art approaches, our method achieves higher classification accuracy on these datasets.
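The first stage relies on NSGA-II, whose core steps are fast non-dominated sorting (splitting candidate solutions into Pareto fronts) and a crowding-distance tie-breaker within a front. The following is a minimal sketch of those two steps, assuming every objective is to be maximized. The abstract does not name the objectives, so the objective tuples in the usage lines are illustrative values only, not results from the paper.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b, assuming every objective is maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def fast_non_dominated_sort(objs: List[Sequence[float]]) -> List[List[int]]:
    """Split solution indices into Pareto fronts (front 0 = non-dominated)."""
    n = len(objs)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # solutions that p dominates
    domination_count = [0] * n  # how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated_by[p].append(q)
            elif dominates(objs[q], objs[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(objs: List[Sequence[float]], front: List[int]) -> Dict[int, float]:
    """Crowding distance of each index in one non-empty front (larger = less crowded)."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        # Boundary solutions are always kept.
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for j in range(1, len(ordered) - 1):
            gap = objs[ordered[j + 1]][k] - objs[ordered[j - 1]][k]
            dist[ordered[j]] += gap / (hi - lo)
    return dist


if __name__ == "__main__":
    # Illustrative two-objective scores for four candidate classifier strings.
    objs = [(0.90, 0.80), (0.85, 0.95), (0.80, 0.70), (0.95, 0.60)]
    print(fast_non_dominated_sort(objs))  # [[0, 1, 3], [2]]
    print(crowding_distance(objs, [0, 1, 3]))  # {0: 2.0, 1: inf, 3: inf}
```

Solution 2 is dominated by solutions 0 and 1, so it falls into the second front. Within the first front, the crowding distance favors solutions that lie in sparsely populated regions of the objective space.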
Keywords: MicroRNA; Multiobjective optimization; Non-dominated sorting genetic algorithm; Sequential minimal optimizer
Year: 2017 PMID: 29246520 PMCID: PMC5828659 DOI: 10.1016/j.gpb.2016.10.006
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. Two stages of the proposed NSGA-II-based approach
A. String/solution representation showing the first stage. There are three parts involved, including the type of classifier, parameters corresponding to the selected classifier, and feature combination. B. Stack-based ensemble showing the second stage of the proposed approach. S1, S2, …, S represent the samples present in the dataset; F1, F2, …, F represent the corresponding features; P1, P2, …, P represent the predicted class labels corresponding to a particular classifier. The absence and presence of a particular feature are indicated with “0” and “1”, respectively. NSGA-II, non-dominated sorting genetic algorithm-II.
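The caption describes each NSGA-II string as three parts (classifier type, its parameters, and a 0/1 feature mask). It also describes the second stage as a stack whose inputs are the predicted labels P1, P2, … produced by the selected solutions. The following is a minimal sketch of both ideas. The `Solution` fields, the example classifier and parameter names, and `LookupMetaLearner` are all hypothetical. The caption does not say which meta-classifier the authors use, so a simple lookup table stands in for it. That table records the most frequent true label for each pattern of base predictions and falls back to a majority vote for unseen patterns.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Solution:
    """One NSGA-II string: classifier type, its parameters, and a feature mask."""
    classifier: str           # hypothetical label, e.g. "svm"
    params: Dict[str, float]  # hypothetical parameter names
    feature_mask: List[int]   # 1 = feature present, 0 = feature absent


def select_features(X: Sequence[Sequence[float]], mask: Sequence[int]) -> List[List[float]]:
    """Keep only the feature columns whose mask bit is 1."""
    return [[v for v, bit in zip(row, mask) if bit == 1] for row in X]


def meta_features(predictions: Sequence[Sequence[int]]) -> List[Tuple[int, ...]]:
    """Stack base predictions P1..Pn column-wise: one tuple per sample S1..Sm."""
    return list(zip(*predictions))


class LookupMetaLearner:
    """Hypothetical meta-classifier over stacked base predictions."""

    def fit(self, Z: List[Tuple[int, ...]], y: Sequence[int]) -> "LookupMetaLearner":
        counts: Dict[Tuple[int, ...], Counter] = {}
        for z, label in zip(Z, y):
            counts.setdefault(z, Counter())[label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in counts.items()}
        return self

    def predict(self, Z: List[Tuple[int, ...]]) -> List[int]:
        # Unseen prediction patterns fall back to a majority vote of the base labels.
        return [self.table.get(z, Counter(z).most_common(1)[0][0]) for z in Z]


if __name__ == "__main__":
    sol = Solution("svm", {"C": 1.0}, [1, 0, 1])
    print(select_features([[0.2, 0.5, 0.9], [0.1, 0.4, 0.3]], sol.feature_mask))
    # [[0.2, 0.9], [0.1, 0.3]]
    preds = [[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]]  # P1..P3 on four samples
    Z = meta_features(preds)
    meta = LookupMetaLearner().fit(Z, [1, 0, 1, 1])
    print(meta.predict([(1, 1, 0), (0, 0, 0)]))  # [1, 0]
```

In a real stack, the base predictions used to fit the meta-learner should come from held-out folds, so that the meta-level does not learn from overfitted training-set labels.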
Figure 2. Steps of the proposed method
Performance comparison between the proposed approach and SVM-nRFE
| SPECT | 267 | 44 | 0.9489 | 1.0000 | 17 | 18 | 95.87 | 87.30 |
| GCM miRNA | 89 | 100 | 0.9350 | 0.9597 | 13 | 6 | 97.43 | 95.80 |
| GCM mRNA | 89 | 100 | 0.9670 | 0.9804 | 16 | 7 | 97.14 | 94.60 |
| GCM miRNA 217 | 75 | 99 | 0.9460 | 0.9780 | 12 | 8 | 97.11 | 88.30 |
| POM | 90 | 100 | 0.8630 | 0.8400 | 23 | 20 | 84.00 | 76.00 |
Note: SVM-nRFE, support vector machine-based recursive feature elimination technique.
Number of mRNA targets and cancer types associated with the selected miRNAs for the GCM miRNA dataset

| No. | miRNA | No. of mRNA targets | Associated cancer types |
| --- | --- | --- | --- |
| 1 | hsa-miR-18 | 682 | HCC/liver, lung, follicular lymphoma |
| 2 | hsa-miR-101 | 671 | Breast, lung, ovary |
| 3 | hsa-miR-126* | 644 | Colon, CNS, lung, hematologic, HCC/liver |
| 4 | hsa-miR-30d | 1603 | CNS |
| 5 | hsa-miR-30a | 1609 | Lung |
| 6 | hsa-miR-152 | 559 | Colon, hematologic |
| 7 | hsa-miR-148 | 945 | Pancreas |
| 8 | hsa-miR-185 | 1517 | Bladder, kidney |
| 9 | hsa-miR-199a* | 621 | Colon, HCC/liver, hematologic |
| 10 | mmu-miR-342 | 542 | – |
| 11 | mmu-miR-340 | 538 | – |
Note: Data were generated from the data obtained in [29], a mammalian dataset. HCC, hepatocellular carcinoma; CNS, central nervous system.
Number of mRNA targets and cancer types associated with the selected miRNAs for the GCM miRNA 217 dataset

| No. | miRNA | No. of mRNA targets | Associated cancer types |
| --- | --- | --- | --- |
| 1 | hsa-miR-99a | 41 | Colon, lung, uterus, hematologic |
| 2 | hsa-miR-197 | 436 | CNS, thyroid, uterus |
| 3 | hsa-miR-220 | – | – |
| 4 | hsa-miR-195 | 1497 | CLL, CNS, HCC/liver, lung, hematologic, uterus |
| 5 | hsa-miR-154 | 373 | CNS |
| 6 | hsa-miR-184 | 45 | Uterus |
| 7 | hsa-miR-133a | 310 | Bladder, breast |
| 8 | hsa-miR-32 | 880 | Colon, pancreas, lung, prostate, uterus |
| 9 | mmu-miR-292 | 497 | – |
| 10 | mmu-miR-293* | 266 | – |
| 11 | mmu-miR-339 | 256 | – |
Note: The GCM miRNA 217 dataset was generated from the data obtained in [29], a mammalian dataset. HCC, hepatocellular carcinoma; CNS, central nervous system; CLL, chronic lymphocytic leukemia.
Top significant KEGG pathways identified for the GCM miRNA dataset
| 1 | hsa-miR-18 | Gap junction | 5.2E−3 |
| 2 | hsa-miR-101 | Ubiquitin mediated proteolysis | 8.8E−3 |
| 3 | hsa-miR-126* | Melanogenesis | 1.5E−2 |
| 4 | hsa-miR-30d | Ubiquitin mediated proteolysis | 2.1E−3 |
| 5 | hsa-miR-30a | Ubiquitin mediated proteolysis | 1.8E−3 |
| 6 | hsa-miR-152 | Ubiquitin mediated proteolysis | 7.7E−4 |
| 7 | hsa-miR-148 | Ubiquitin mediated proteolysis | 2.8E−4 |
| 8 | hsa-miR-185 | Axon guidance | 4.5E−5 |
| 9 | hsa-miR-199a* | Axon guidance | 1.3E−4 |
Top significant KEGG pathways identified for the GCM miRNA 217 dataset
| 1 | hsa-miR-99a | MAPK signaling pathway | 7.7E−2 |
| 2 | hsa-miR-197 | MAPK signaling pathway | 4.4E−3 |
| 3 | hsa-miR-195 | Pathways in cancer | 1.1E−3 |
| 4 | hsa-miR-154 | Ubiquitin mediated proteolysis | 4.8E−3 |
| 5 | hsa-miR-184 | Neurodegenerative diseases | 1.3E−3 |
| 6 | hsa-miR-133a | Selenoamino acid metabolism | 3.3E−2 |
| 7 | hsa-miR-32 | RNA degradation | 1.1E−2 |
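This excerpt does not say which tool or statistical test produced the values in the last column of the KEGG tables. They look like enrichment P values. Pathway-enrichment analyses of miRNA targets commonly use a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The following is a minimal sketch of that test only, under the assumption that it is the kind of computation involved. The argument names are hypothetical, and the numbers in the usage line are made up.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background set
    K: background genes annotated to the pathway
    n: predicted targets of the miRNA
    k: predicted targets that fall in the pathway
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 3 in the pathway, 2 targets, both in the pathway.
    print(hypergeom_enrichment_p(2, 2, 3, 10))  # 3/45, about 0.0667
```

Integer binomial coefficients from `math.comb` keep the tail sum exact until the final division. For genome-scale backgrounds, a library routine such as SciPy's hypergeometric survival function would be the more practical choice.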