Shib Sankar Bhowmick (1,2), Indrajit Saha (3), Debotosh Bhattacharjee (1), Loredana M Genovese (4), Filippo Geraci (4)
Abstract
MicroRNAs are small non-coding RNAs that influence gene expression by binding to the 3' UTR of target mRNAs in order to repress protein synthesis. Soon after their discovery, microRNA dysregulation was associated with several pathologies. In particular, microRNAs have often been reported as differentially expressed between healthy and tumor samples. This suggested that microRNAs are likely to be good candidate biomarkers for cancer diagnosis and personalized medicine. With the advent of Next-Generation Sequencing (NGS), measuring the expression level of the whole miRNAome at once is now routine. Moreover, the collaborative effort of sharing data opens up the possibility of population-level analyses. This context motivated us to perform an in-silico study to distill cancer-specific panels of microRNAs that can serve as biomarkers. We observed that the problem of finding biomarkers can be modeled as a two-class classification task where, given the miRNAomes of a population of healthy and cancerous samples, we want to find the subset of microRNAs that leads to the highest classification accuracy. We accomplish this task by leveraging a combination of data mining tools. In particular, we used differential evolution for candidate selection, component analysis to preserve the relationships among miRNAs, and SVM for sample classification. We identified 10 cancer-specific panels whose classification accuracy is always higher than 92%. These panels overlap very little, suggesting that miRNAs are not only predictive of the onset of cancer but can also be used for classification purposes. We experimentally validated the contribution of each of the employed tools to the selection of discriminating miRNAs. Moreover, we tested the significance of each panel for the corresponding cancer type. In particular, enrichment analysis showed that the selected miRNAs are involved in oncogenesis pathways, while survival analysis showed that miRNAs can be used to evaluate cancer severity.
In summary, the results demonstrate that our method is able to produce cancer-specific panels that are promising candidates for subsequent in vitro validation.
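The abstract names an SVM as the final classifier but gives no implementation details. As a rough, dependency-free illustration of that step, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method. The linear kernel, the function names `train_linear_svm` and `predict`, and the hyperparameters `lam` and `epochs` are illustrative assumptions, not the configuration used in the study.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style linear SVM (illustrative sketch, not the study's implementation).

    X: list of feature vectors (e.g. expression values of selected miRNAs).
    y: labels in {-1, +1} (e.g. +1 = tumor, -1 = control).
    A constant 1.0 is appended to every sample so the bias is learned as an
    ordinary (regularized) weight. Returns the augmented weight vector.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # L2-regularization shrink step
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # hinge loss is active: move w towards y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Classify one sample with the augmented weight vector returned above."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A production analysis would more likely rely on an established SVM library with a tuned kernel; the sketch only makes the margin-based training rule concrete.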
Year: 2018 PMID: 30048452 PMCID: PMC6061989 DOI: 10.1371/journal.pone.0200353
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Graphical representation of the differential evolution-based feature selection.
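The differential evolution (DE) search summarized in Fig 1 can be sketched as a generic DE/rand/1/bin loop applied to miRNA subset selection. This is a minimal sketch under simplifying assumptions: each individual is a real vector in [0, 1]^d whose k largest entries mark the selected miRNAs, and the fitness is the leave-one-out accuracy of a nearest-centroid classifier, used here as a lightweight stand-in for the component-analysis-plus-SVM evaluation described in the abstract. The encoding, the fitness function, the helper names and the parameters `F`, `CR`, `pop_size` and `generations` are hypothetical choices, not the authors' settings.

```python
import random


def decode(vector, k):
    """Hypothetical encoding: the k positions with the largest values are the selected miRNAs."""
    return sorted(range(len(vector)), key=lambda j: vector[j], reverse=True)[:k]


def loo_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `features`.

    Stand-in fitness (the paper uses component analysis + SVM). Needs >= 2 samples per class.
    """
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(y)):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            centroid = [sum(row[f] for row in rows) / len(rows) for f in features]
            dist = sum((X[i][f] - c) ** 2 for f, c in zip(features, centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def de_feature_selection(X, y, k, pop_size=20, generations=30, F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin search for a k-miRNA subset maximizing the stand-in fitness."""
    rng = random.Random(seed)
    d = len(X[0])
    pop = [[rng.random() for _ in range(d)] for _ in range(pop_size)]
    fit = [loo_centroid_accuracy(X, y, decode(p, k)) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # three distinct individuals, all different from the target i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(d)  # guarantees at least one mutated gene
            trial = []
            for j in range(d):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])  # rand/1 mutation
                    trial.append(min(1.0, max(0.0, v)))  # keep genes in [0, 1]
                else:
                    trial.append(pop[i][j])  # binomial crossover keeps the target gene
            f_trial = loo_centroid_accuracy(X, y, decode(trial, k))
            if f_trial >= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = max(range(pop_size), key=lambda i: fit[i])
    return decode(pop[best], k), fit[best]
```

Called as `de_feature_selection(X, y, k=10)` on a samples × miRNAs matrix, the sketch returns the column indices of a 10-miRNA candidate panel together with its fitness.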
Type and number of tumor samples for the cancer types involved in our study.
Each cancer type is coupled with 412 control samples to form a two-class dataset.
| Cancer | Description | Tumor Samples |
|---|---|---|
| BRCA | Breast invasive carcinoma | 762 |
| KIRC | Kidney renal clear cell carcinoma | 255 |
| LGG | Brain lower grade Glioma | 526 |
| LIHC | Liver hepatocellular carcinoma | 374 |
| LUAD | Lung adenocarcinoma | 452 |
| PAAD | Pancreatic adenocarcinoma | 179 |
| PRAD | Prostate adenocarcinoma | 495 |
| SKCM | Skin cutaneous melanoma | 450 |
| STAD | Stomach adenocarcinoma | 395 |
| THCA | Thyroid carcinoma | 510 |
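Because every dataset is imbalanced, the accuracy of a trivial classifier that always predicts the larger class is a useful reference point for the classification accuracies reported above. The snippet below derives it from this table and the 412 shared controls; it is illustrative arithmetic, and the names `TUMORS` and `majority_baseline` are hypothetical.

```python
CONTROLS = 412  # control samples paired with each cancer type (from the table caption)
TUMORS = {
    "BRCA": 762, "KIRC": 255, "LGG": 526, "LIHC": 374, "LUAD": 452,
    "PAAD": 179, "PRAD": 495, "SKCM": 450, "STAD": 395, "THCA": 510,
}


def majority_baseline(tumors, controls=CONTROLS):
    """Accuracy of a trivial classifier that always predicts the larger class."""
    return max(tumors, controls) / (tumors + controls)


for cancer, n_tumor in TUMORS.items():
    total = n_tumor + CONTROLS
    print(f"{cancer}: {total} samples, majority-class accuracy {majority_baseline(n_tumor):.1%}")
```

For example, BRCA has 1,174 samples and a majority-class accuracy of about 64.9%, while PAAD, with only 179 tumors, reaches about 69.7% simply by always predicting "control".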
miRNA panels for the investigated cancer types and their FDR corrected p-values.
| Cancer | Rank | miRNA | FDR | Cancer | Rank | miRNA | FDR |
|---|---|---|---|---|---|---|---|
| BRCA | 1 | hsa-mir-140 | 3.83e-15 | KIRC | 1 | hsa-mir-542 | 5.17e-15 |
| | 2 | hsa-mir-100 | 4.35e-05 | | 2 | hsa-mir-3065 | 1.64e-14 |
| | 3 | hsa-mir-375 | 1.19e-09 | | 3 | hsa-mir-361 | 4.63e-16 |
| | 4 | hsa-mir-328 | 1.37e-14 | | 4 | hsa-mir-374a | 1.58e-09 |
| | 5 | hsa-mir-744 | 1.24e-15 | | 5 | hsa-mir-500a | 1.52e-08 |
| | 6 | hsa-mir-324 | 1.19e-09 | | 6 | hsa-mir-103-2 | 1.49e-02 |
| | 7 | hsa-mir-30e | 3.32e-11 | | 7 | hsa-mir-18a | 9.45e-14 |
| | 8 | hsa-mir-1307 | 3.17e-12 | | 8 | hsa-mir-203 | 1.49e-02 |
| | 9 | hsa-mir-26a-2 | 1.48e-10 | | 9 | hsa-mir-576 | 1.09e-12 |
| | 10 | hsa-mir-29c | 9.45e-14 | | | | |
| LGG | 1 | hsa-mir-335 | 9.99e-01 | LIHC | 1 | hsa-mir-10b | 1.17e-15 |
| | 2 | hsa-mir-148b | 2.52e-15 | | 2 | hsa-mir-92a-2 | 1.26e-08 |
| | 3 | hsa-mir-21 | 9.01e-16 | | 3 | hsa-mir-20a | 6.58e-11 |
| | 4 | hsa-mir-155 | 1.19e-11 | | 4 | hsa-mir-181a-1 | 5.09e-14 |
| | 5 | hsa-mir-574 | 5.89e-13 | | 5 | hsa-mir-17 | 1.08e-10 |
| | 6 | hsa-let-7e | 1.44e-13 | | 6 | hsa-mir-92a-1 | 5.29e-16 |
| | 7 | hsa-mir-455 | 1.82e-16 | | 7 | hsa-mir-455 | 3.03e-14 |
| | 8 | hsa-mir-128-1 | 1.09e-12 | | 8 | hsa-mir-34a | 7.51e-10 |
| | 9 | hsa-mir-424 | 2.24e-14 | | | | |
| LUAD | 1 | hsa-mir-140 | 6.47e-11 | PAAD | 1 | hsa-mir-582 | 1.91e-14 |
| | 2 | hsa-mir-103-1 | 2.76e-09 | | 2 | hsa-mir-151 | 1.09e-14 |
| | 3 | hsa-mir-195 | 8.12e-12 | | 3 | hsa-mir-130b | 1.33e-15 |
| | 4 | hsa-mir-874 | 2.82e-15 | | 4 | hsa-mir-194-2 | 4.01e-12 |
| | 5 | hsa-mir-149 | 3.00e-13 | | 5 | hsa-mir-409 | 1.09e-14 |
| | 6 | hsa-mir-629 | 4.79e-16 | | 6 | hsa-mir-181a-1 | 3.67e-14 |
| | 7 | hsa-mir-141 | 2.51e-12 | | 7 | hsa-mir-598 | 2.08e-12 |
| | 8 | hsa-mir-574 | 1.19e-09 | | 8 | hsa-mir-424 | 1.20e-13 |
| | 9 | hsa-let-7i | 2.86e-08 | | 9 | hsa-mir-574 | 4.63e-16 |
| | 10 | hsa-mir-625 | 1.24e-15 | | | | |
| PRAD | 1 | hsa-mir-27b | 6.47e-11 | SKCM | 1 | hsa-mir-27b | 4.66e-12 |
| | 2 | hsa-mir-1287 | 9.49e-10 | | 2 | hsa-mir-30b | 5.47e-14 |
| | 3 | hsa-mir-27a | 4.67e-10 | | 3 | hsa-mir-425 | 3.49e-08 |
| | 4 | hsa-mir-342 | 3.93e-14 | | 4 | hsa-mir-30a | 2.85e-10 |
| | 5 | hsa-mir-146b | 1.01e-13 | | 5 | hsa-mir-155 | 7.02e-16 |
| | 6 | hsa-mir-3065 | 2.34e-10 | | 6 | hsa-let-7g | 5.75e-11 |
| | 7 | hsa-mir-128-2 | 7.55e-12 | | 7 | hsa-mir-576 | 2.14e-11 |
| | 8 | hsa-mir-625 | 9.45e-14 | | 8 | hsa-mir-99b | 5.75e-11 |
| STAD | 1 | hsa-mir-134 | 1.95e-14 | THCA | 1 | hsa-mir-660 | 3.92e-11 |
| | 2 | hsa-mir-381 | 3.52e-14 | | 2 | hsa-mir-20a | 7.16e-11 |
| | 3 | hsa-mir-142 | 1.29e-12 | | 3 | hsa-mir-15a | 2.90e-09 |
| | 4 | hsa-mir-337 | 1.82e-13 | | 4 | hsa-mir-93 | 1.70e-09 |
| | 5 | hsa-mir-127 | 1.20e-07 | | 5 | hsa-mir-576 | 6.48e-09 |
| | 6 | hsa-mir-409 | 5.64e-08 | | 6 | hsa-let-7a-2 | 1.86e-13 |
| | 7 | hsa-mir-542 | 9.79e-10 | | 7 | hsa-let-7a-1 | 1.45e-11 |
| | 8 | hsa-mir-744 | 5.34e-10 | | 8 | hsa-mir-24-2 | 2.36e-15 |
| | 9 | hsa-mir-128-1 | 2.28e-11 | | 9 | hsa-mir-33a | 9.99e-13 |
| | 10 | hsa-mir-199b | 3.20e-06 | | | | |
| | 11 | hsa-mir-379 | 2.14e-07 | | | | |
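The caption does not state which FDR procedure produced these values. The Benjamini-Hochberg step-up correction is the most common choice, so the sketch below assumes it; it converts raw per-miRNA p-values into FDR-adjusted values of the kind listed in the table. The function name is illustrative.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR adjustment; returns adjusted values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Step-up: walk from the largest p-value to the smallest, scaling by m / rank
    # and enforcing that adjusted values never increase as p decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this procedure, a miRNA would be called significant at FDR level 0.05 when its adjusted value is at most 0.05.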
Average SVM classification accuracy of kernel and non-kernel based methods varying the number of selected miRNAs in the feature selection step.
| Cancer | KPCA | PCA | ||||||
|---|---|---|---|---|---|---|---|---|
| n = 5 | n = 10 | n = 20 | n = 30 | n = 5 | n = 10 | n = 20 | n = 30 | |
| BRCA | 90.87 | 89.16 | 56.07 | 91.18 | 89.53 | 60.42 | ||
| KIRC | 91.63 | 90.87 | 88.37 | 91.21 | 91.33 | 69.39 | ||
| LGG | 90.87 | 91.00 | 89.89 | 90.75 | 91.15 | 90.19 | ||
| LIHC | 91.30 | 91.75 | 88.58 | 91.35 | 90.16 | 89.75 | ||
| LUAD | 90.33 | 92.14 | 90.34 | 89.63 | 90.73 | 88.30 | ||
| PAAD | 91.77 | 91.21 | 75.17 | 90.34 | 75.37 | 75.17 | ||
| PRAD | 90.60 | 90.58 | 89.06 | 90.05 | 90.05 | 67.39 | ||
| SKCM | 92.28 | 91.75 | 90.33 | 91.21 | 91.90 | 89.63 | ||
| STAD | 90.63 | 91.11 | 91.11 | 91.56 | 90.29 | 80.44 | ||
| THCA | 90.67 | 91.95 | 87.76 | 89.94 | 90.90 | 78.62 | ||
| Average | 91.10 | 91.15 | 84.67 | 90.72 | 89.14 | 78.93 | ||
Aggregated score derived from: Accuracy, F-measure, Matthews correlation coefficient (MCC) and Area under the curve (AUC) for SVM classification after different feature selection methods.
The aggregated score (AS) is bounded in the range [0, 4].
| Algorithm | BRCA | KIRC | LGG | LIHC | LUAD | PAAD | PRAD | SKCM | STAD | THCA |
|---|---|---|---|---|---|---|---|---|---|---|
| Our method | 3.75 | 3.80 | 3.85 | 3.69 | 3.75 | 3.74 | 3.69 | 3.80 | 3.61 | 3.66 |
| DE+KPCA | 3.69 | 3.72 | 3.77 | 3.68 | 3.76 | 3.76 | 3.75 | 3.78 | 3.71 | 3.73 |
| DE+PCA | 3.70 | 3.67 | 3.68 | 3.74 | 3.69 | 3.67 | 3.69 | 3.71 | 3.73 | 3.69 |
| KPCA | 3.67 | 3.67 | 3.69 | 3.67 | 3.68 | 3.71 | 3.68 | 3.71 | 3.67 | 3.68 |
| PCA | 3.67 | 3.66 | 3.68 | 3.69 | 3.68 | 3.68 | 3.68 | 3.68 | 3.68 | 3.68 |
| KECA | 2.57 | 3.59 | 3.65 | 3.68 | 3.69 | 3.65 | 3.68 | 3.69 | 3.68 | 3.65 |
| ICA | 2.79 | 3.66 | 3.50 | 3.53 | 3.52 | 3.65 | 3.53 | 3.54 | 3.52 | 3.47 |
| RFFS | 3.58 | 3.57 | 3.65 | 3.54 | 3.59 | 3.45 | 3.49 | 3.61 | 3.49 | 3.58 |
| MIFS | 3.10 | 3.69 | 3.40 | 3.35 | 3.43 | 3.17 | 3.29 | 3.53 | 3.33 | 3.38 |
| t-test | 3.11 | 3.35 | 3.37 | 3.39 | 3.39 | 3.35 | 3.40 | 3.49 | 3.32 | 3.38 |
| mRMR | 3.02 | 3.69 | 3.39 | 3.32 | 3.37 | 3.29 | 3.37 | 3.50 | 3.28 | 3.33 |
| RankSum | 3.07 | 3.27 | 3.33 | 3.34 | 3.35 | 3.31 | 3.40 | 3.53 | 3.33 | 3.32 |
| JMI | 3.02 | 3.42 | 3.39 | 3.34 | 3.37 | 3.27 | 3.31 | 3.47 | 3.32 | 3.33 |
| SNR | 3.07 | 3.27 | 3.33 | 3.34 | 3.35 | 3.31 | 3.31 | 3.49 | 3.32 | 3.32 |
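The caption lists the four components of the aggregated score (AS) without giving the formula. Assuming AS is their unweighted sum (consistent with the stated upper bound of 4), the sketch below computes it from predicted labels and continuous classifier scores. The function name and the 1 = tumor / 0 = control label encoding are illustrative assumptions; note that MCC alone ranges over [-1, 1], so the sketch does not by itself enforce a lower bound of 0.

```python
import math


def aggregated_score(y_true, y_pred, scores):
    """Sum of accuracy, F-measure, MCC and AUC (assumed unweighted).

    y_true, y_pred: labels, 1 = tumor, 0 = control.
    scores: continuous classifier outputs (e.g. SVM decision values) used for AUC.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    f_measure = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # MCC itself lies in [-1, 1]
    # AUC as the probability that a random tumor sample outscores a random control (ties = 1/2)
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return accuracy + f_measure + mcc + auc
```

A perfect classifier scores exactly 4 under this definition, matching the upper bound stated in the caption.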
Fig 2. Average aggregated score of the compared algorithms.
This score is the mean of the per-cancer-type ASs and is bounded in the range [0, 4]. The x-axis is restricted to a narrower range to highlight the differences among the methods.
Running time of the compared methods (format: hh:mm:ss.cc, where cc is hundredths of a second).
| Algorithm | Average time |
|---|---|
| Our method | 03:02:43.64 |
| DE+KPCA | 00:14:48.51 |
| DE+PCA | 00:11:57.29 |
| KPCA | 00:00:07.71 |
| PCA | 00:00:05.38 |
| KECA | 00:00:09.01 |
| ICA | 00:00:08.14 |
| RFFS | 00:00:07.86 |
| MIFS | 00:00:00.94 |
| t-test | 00:01:04.26 |
| mRMR | 00:00:00.52 |
| RankSum | 00:00:00.97 |
| JMI | 00:00:00.56 |
| SNR | 00:00:00.57 |
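The running times are easier to compare on a single scale. The helper below (a hypothetical name) converts full hh:mm:ss.cc stamps, as well as shortened forms that omit leading zero fields, into seconds.

```python
def to_seconds(stamp):
    """Convert 'hh:mm:ss.cc', 'mm:ss.cc' or 'ss.cc' to seconds.

    Horner-style accumulation: each ':'-separated field is worth 60 times the next one.
    """
    seconds = 0.0
    for field in stamp.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds
```

For example, 03:02:43.64 corresponds to about 10,964 s and 14:48.51 to about 889 s, so the full method takes roughly 12 times longer than DE+KPCA.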
Homogeneity and separation of the tumor and control samples.
| Cancer | Homogeneity (tumor) | Homogeneity (control) | Separation |
|---|---|---|---|
| BRCA | 0.461 | 0.476 | 0.361 |
| KIRC | 0.597 | 0.548 | 0.449 |
| LGG | 0.777 | 0.928 | 0.801 |
| LIHC | 0.522 | 0.579 | 0.397 |
| LUAD | 0.899 | 0.886 | 0.869 |
| PAAD | 0.581 | 0.428 | 0.343 |
| PRAD | 0.814 | 0.545 | 0.623 |
| SKCM | 0.680 | 0.747 | 0.288 |
| STAD | 0.551 | 0.506 | 0.467 |
| THCA | 0.863 | 0.889 | 0.872 |
| Average | 0.674 | 0.653 | 0.547 |
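The table does not define homogeneity and separation. One common definition, assumed here, measures homogeneity as the average similarity between each sample and the centroid of its own class, and separation as the size-weighted average similarity between class centroids; with only two classes, the latter reduces to the similarity of the tumor and control centroids. The sketch uses Pearson correlation as the similarity measure, which is also an assumption, and its function names are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length vectors (assumed similarity measure)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((z - mb) ** 2 for z in b))
    return cov / norm  # undefined (ZeroDivisionError) for constant vectors


def centroid(rows):
    """Element-wise mean of the samples (rows = samples, columns = miRNAs)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def homogeneity(rows):
    """Average similarity between each sample and the centroid of its own class."""
    c = centroid(rows)
    return sum(pearson(r, c) for r in rows) / len(rows)


def separation(rows_a, rows_b):
    """Two-class case: similarity between the two class centroids."""
    return pearson(centroid(rows_a), centroid(rows_b))
```

Under this definition, a lower separation value means that the tumor and control centroids are less similar.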
Fig 3. Comparison with panels derived from differential analysis.
Most common KEGG pathways associated with target genes of our cancer panels.
| KEGG Pathway | BRCA | KIRC | LGG | LIHC | LUAD | PAAD | PRAD | SKCM | STAD | THCA |
|---|---|---|---|---|---|---|---|---|---|---|
| hsa04144: Endocytosis | ✓ | ✓ | ✓ | |||||||
| hsa04360: Axon guidance | ✓ | ✓ | ✓ | ✓ | ||||||
| hsa05200: Pathways in cancer | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| hsa05205: Proteoglycans in cancer | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| hsa04151: PI3K-Akt signaling pathway | ✓ | ✓ | ✓ | ✓ | ||||||
| hsa04010: MAPK signaling pathway | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
| hsa04550: Signaling pathways regulating pluripotency of stem cells | ✓ | ✓ | ✓ |
Most significant Gene Ontology Cellular Component terms associated with the miRNA-targeted genes, obtained through enrichment analysis via Enrichr [54].
| GO Cellular component | BRCA | KIRC | LGG | LIHC | LUAD | PAAD | PRAD | SKCM | STAD | THCA |
|---|---|---|---|---|---|---|---|---|---|---|
| GO:0005829 Cytosol | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| GO:0044456 Synapse part | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
| GO:0005654 Nucleoplasm | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
| GO:0005911 Cell-cell junction | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
| GO:0045202 Synapse | ✓ | ✓ | ✓ | |||||||
| GO:0000785 Chromatin | ✓ | ✓ | ✓ |
Most significant Gene Ontology Biological Process terms associated with the miRNA-targeted genes, obtained through enrichment analysis via Enrichr [54].
| GO Biological process | BRCA | KIRC | LGG | LIHC | LUAD | PAAD | PRAD | SKCM | STAD | THCA |
|---|---|---|---|---|---|---|---|---|---|---|
| GO:0007411 Axon guidance | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| GO:0097485 Neuron projection guidance | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| GO:0045664 Regulation of neuron differentiation | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
| GO:0032989 Cellular component morphogenesis | ✓ | ✓ | ✓ | ✓ | ||||||
| GO:0048598 Embryonic morphogenesis | ✓ | ✓ | ✓ | ✓ | ||||||
| GO:0048729 Tissue morphogenesis | ✓ | ✓ | ✓ |
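Pathway and GO tables like these rest on an over-representation test. Enrichr's reported p-value is based on Fisher's exact test, whose one-sided form equals the hypergeometric upper tail computed below. This is a minimal sketch: the function name is hypothetical, and the size of the background gene universe is an assumption the caller must supply.

```python
from math import comb


def enrichment_p_value(overlap, list_size, term_size, background):
    """One-sided over-representation p-value, P(X >= overlap), X ~ Hypergeometric.

    background: number of genes in the universe (assumed),
    term_size:  genes annotated to the pathway / GO term,
    list_size:  miRNA target genes submitted for enrichment,
    overlap:    target genes that are annotated to the term.
    """
    upper = min(list_size, term_size)
    hits = sum(comb(term_size, i) * comb(background - term_size, list_size - i)
               for i in range(overlap, upper + 1))
    return hits / comb(background, list_size)
```

Since every pathway or GO term is tested separately, the resulting p-values would then be corrected for multiple testing before ranking terms.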
Fig 4. Kaplan-Meier survival plots of the 10 best miRNAs, selected by lowest log-rank p-value, for the respective cancer types: (a) BRCA, (b) KIRC, (c) LGG, (d) LIHC, (e) LUAD, (f) PAAD, (g) PRAD, (h) SKCM, (i) STAD and (j) THCA.
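The survival analysis combines two standard tools: the Kaplan-Meier estimator, which draws each curve, and the log-rank test, which compares two groups. The caption does not say how patients were split, so a high- versus low-expression split of each miRNA is assumed here; the function names are also illustrative. Times and events are plain Python lists, with event = 1 for death and 0 for censoring.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each event time.

    events[i] is 1 if patient i died at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings leave the risk set
    return curve


def log_rank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (e.g. high vs low expression); chi-square p-value, 1 df."""
    o_minus_e, variance = 0.0, 0.0
    event_times = sorted({t for t, e in zip(times_a + times_b, events_a + events_b) if e})
    for t in event_times:
        n_a = sum(1 for s in times_a if s >= t)
        n_b = sum(1 for s in times_b if s >= t)
        d_a = sum(e for s, e in zip(times_a, events_a) if s == t)
        d_b = sum(e for s, e in zip(times_b, events_b) if s == t)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group a
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # P(chi2_1 > chi2) = P(|Z| > sqrt(chi2))
```

Ranking miRNAs by the p-value returned for their high- and low-expression groups would yield an ordering of the kind used to pick the plotted miRNAs.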