Malik Yousef1,2, Gokhan Goy3, Ramkrishna Mitra4, Christine M Eischen4, Amhar Jabeer3, Burcu Bakir-Gungor3.
Abstract
A better understanding of disease development and progression mechanisms at the molecular level is critical both for the diagnosis of a disease and for the development of therapeutic approaches. Advances in high-throughput technologies have made it possible to generate mRNA and microRNA (miRNA) expression profiles, and the integrative analysis of these profiles has helped uncover the functional effects of RNA expression in complex diseases such as cancer. Several studies have attempted to integrate miRNA and mRNA expression profiles using statistical methods such as Pearson correlation, and then combine the results with enrichment analysis. In this study, we developed a novel tool called miRcorrNet, which performs machine learning-based integration to analyze miRNA and mRNA gene expression profiles. miRcorrNet groups mRNAs based on their correlation to miRNA expression levels and hence generates groups of target genes associated with each miRNA. These groups are then subjected to a rank function for classification. We evaluated our tool using miRNA and mRNA expression profiling data downloaded from The Cancer Genome Atlas (TCGA) and performed a comparative evaluation with existing tools. In our experiments, miRcorrNet performed as well as other tools in terms of accuracy (reaching more than 95% AUC). Additionally, miRcorrNet includes ranking steps to separate two classes, namely case and control, which are not available in other tools. We also evaluated the performance of miRcorrNet using a completely independent dataset. Moreover, we conducted a comprehensive literature search to explore the biological functions of the identified miRNAs. We validated our significantly identified miRNA groups against known databases, which yielded about 90% accuracy. Our results suggest that miRcorrNet is able to accurately prioritize pan-cancer regulating high-confidence miRNAs.
The miRcorrNet tool and all other supplementary files are available at https://github.com/malikyousef/miRcorrNet.
Keywords: Gene expression; Grouping; Integrated; Machine learning; Ranking; microRNA
Year: 2021 PMID: 34055490 PMCID: PMC8140596 DOI: 10.7717/peerj.11458
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1: General workflow for classification based on the grouping function G() and ranking those groups by the R() function.
Output of G() function applied on BLCA.
| Name of the miRNA/group | Genes associated with the miRNA |
|---|---|
| hsa-miR-361-5p | CELF2, FBN1, LAMA4, NFIX, ENTPD1, AP1S2, ARHGAP24, HSPA12A, SYDE1, TSHZ3, NRP2, RAB3IL1, CCDC80, ABCD2, EMILIN1, MS4A2, SDC3, ROR2, ANGPTL2, STX2, SLC25A12, GAS7, LIX1L, SEC23A, SMOC2, ANXA6, ZEB2, ALDH2, GPR124 |
| hsa-miR-361-3p | SYNJ2BP, NLGN1, KCNK3 |
| hsa-miR-15b-3p | C9orf3, EMILIN1, CNRIP1, GPR124 |
| hsa-miR-30e-5p | RAB3IL1, LIX1L |
| hsa-miR-181a-5p | SYNJ2BP, TACC2, JMY, ZDHHC15, MEIS1 |
| hsa-let-7a-5p | FBN1, LAMA4, ENTPD1, SETBP1, EMILIN1, ANGPTL2, PDE3B |
| hsa-miR-22-3p | PDZRN3, NPR2, SCN7A, CSGALNACT1, GNAQ, SOX10, C5orf53, LAMB2, PJA2, NFIX, GRIK3, SPARCL1, AP1S2, TCEAL2, HECTD2, THRA, ADCY9, FAM149A, LOC653653, SYNE1, C4orf12, DCHS1, MS4A2, ABI3BP, PBX3, NR3C2, CNRIP1, UBE2Q2, RCAN2, PCDHGB7, RNASE4, ZDHHC15, RNF180, MYOT, SYT11, NAP1L2, STARD13, PLP1, GATA6, GRM7, TENC1, RAI2, SGCE, PLSCR4, GAS7, PKD2, TOR1AIP1, LIX1L, STAT5B, DCN, SMOC2, TCEAL7, LOC399959, RHOJ, ZEB2, ALDH2, PRIMA1, PCDH18, GPR124, KCNK3 |
| hsa-miR-126-3p | LOC653653 |
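
The grouping step can be illustrated with a minimal sketch. It assumes that G() places an mRNA in a miRNA's group when their Pearson correlation across samples is negative and at or below a cutoff. The cutoff of -0.6, the function names, and the toy expression values are assumptions made for illustration and are not taken from miRcorrNet.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def group_targets(mirna_expr, mrna_expr, cutoff=-0.6):
    """Map each miRNA to the mRNAs whose correlation with it is <= cutoff.

    mirna_expr / mrna_expr: dict name -> list of expression values, with
    samples in the same order in both dicts.
    The cutoff of -0.6 is a hypothetical value chosen for this sketch.
    """
    groups = {}
    for mirna, m_values in mirna_expr.items():
        targets = [gene for gene, g_values in mrna_expr.items()
                   if pearson(m_values, g_values) <= cutoff]
        if targets:  # keep only miRNAs with at least one associated gene
            groups[mirna] = targets
    return groups


# Toy data: four samples, made-up values and placeholder names.
mirna_expr = {"miR-A": [1.0, 2.0, 3.0, 4.0], "miR-B": [2.0, 2.1, 1.9, 2.0]}
mrna_expr = {
    "GENE1": [8.0, 6.0, 4.1, 2.0],  # falls as miR-A rises
    "GENE2": [1.0, 2.1, 2.9, 4.2],  # rises with miR-A
    "GENE3": [5.0, 5.2, 4.9, 5.1],  # roughly flat
}
groups = group_targets(mirna_expr, mrna_expr)
print(groups)
```

In this toy run only GENE1 is anti-correlated strongly enough with miR-A to form a group, and miR-B gets no group at all, which mirrors how group sizes in the table above range from a single gene to several dozen.
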
Ranking algorithm for acquired miRNA–mRNA groups. The ranking method R() assigns a score for each group by performing an internal cross-validation.
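
A minimal sketch of such a scoring step follows. It assumes the score of a group is the mean held-out accuracy of a classifier trained only on that group's genes, over repeated stratified random splits. The nearest-centroid classifier is a stand-in chosen to keep the example dependency-free; the classifier, the split fraction, the number of iterations, and all names are assumptions, not the miRcorrNet implementation.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(x, centroids):
    """Label of the nearest class centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda label: sum(
        (a - b) ** 2 for a, b in zip(x, centroids[label])))


def score_group(samples, labels, genes, n_iter=20, test_frac=0.3, seed=0):
    """Mean held-out accuracy of a classifier restricted to `genes`.

    samples: list of dicts gene -> expression; labels: parallel list.
    Each class needs at least two samples so that training is never empty.
    """
    rng = random.Random(seed)
    X = [[s[g] for g in genes] for s in samples]
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    accuracies = []
    for _ in range(n_iter):
        train, test = [], []
        for idx_list in by_class.values():  # stratified random split
            shuffled = idx_list[:]
            rng.shuffle(shuffled)
            n_test = max(1, int(len(shuffled) * test_frac))
            test += shuffled[:n_test]
            train += shuffled[n_test:]
        centroids = {lab: centroid([X[i] for i in train if labels[i] == lab])
                     for lab in by_class}
        hits = sum(predict(X[i], centroids) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / n_iter


def rank_groups(samples, labels, groups, **kwargs):
    """Return (group name, score) pairs sorted by descending score."""
    scores = {name: score_group(samples, labels, genes, **kwargs)
              for name, genes in groups.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: SIG separates the classes, FLAT is uninformative.
samples, labels = [], []
for i in range(10):
    samples.append({"SIG": 10.0 + i * 0.1, "FLAT": 5.0})
    labels.append("pos")
    samples.append({"SIG": 2.0 + i * 0.1, "FLAT": 5.0})
    labels.append("neg")
groups = {"miR-X": ["SIG"], "miR-Y": ["FLAT"]}
ranking = rank_groups(samples, labels, groups)
print(ranking)
```

The group built on the informative gene is ranked first, while the group built on the flat gene scores at chance level because every test sample receives the same label.
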
Example of input data for miRcorrNet.
| mRNA expression data | | | | | |
|---|---|---|---|---|---|
| TCGA-DK-A6AV | neg | 32.877 | 28.283 | … | 721.166 |
| TCGA-DK-A3WX | neg | 39.634 | 57.526 | … | 593.293 |
| TCGA-GC-A3WC | pos | 29.789 | 98.344 | … | 1,057.069 |
| TCGA-BT-A20N | pos | 37.378 | 55.011 | … | 755.688 |
| … | … | … | … | … | … |
| TCGA-DK-A6AV | neg | 44.775623 | 13345.98449 | … | 9772.686386 |
| TCGA-DK-A3WX | neg | 34.30313 | 17531.35061 | … | 13508.08329 |
| TCGA-GC-A3WC | pos | 9.389 | 15229.41331 | … | 18601.78121 |
| TCGA-BT-A20N | pos | 3.534104 | 4717.325745 | … | 6845.372094 |
| … | … | … | … | … | … |
Figure 2: miRcorrNet workflow.
Figure 3: Details of the R() function.
Example of the output of the R() function applied to BLCA.
All values in this R() output are means. The columns are the performance measures obtained by cross-validation. Each row is a group, named after its miRNA. The table sorted by accuracy gives the rank of each miRNA.
| Group | Accuracy | Sensitivity | Specificity | Recall | Precision | F-measure | Cohen’s kappa |
|---|---|---|---|---|---|---|---|
| hsa-miR-32-5p | 0.65 | 0.55 | 0.71 | 0.55 | 0.61 | 0.52 | 0.27 |
| hsa-miR-361-3p | 0.85 | 0.70 | 0.94 | 0.70 | 0.87 | 0.76 | 0.66 |
| hsa-miR-205-5p | 0.91 | 0.90 | 0.91 | 0.90 | 0.86 | 0.88 | 0.81 |
| hsa-miR-30e-5p | 0.76 | 0.60 | 0.86 | 0.60 | 0.77 | 0.60 | 0.46 |
| hsa-miR-181a-5p | 0.89 | 0.75 | 0.97 | 0.75 | 0.96 | 0.82 | 0.75 |
| hsa-miR-106b-5p | 0.93 | 0.85 | 0.97 | 0.85 | 0.96 | 0.88 | 0.83 |
| hsa-let-7a-5p | 0.78 | 0.65 | 0.86 | 0.65 | 0.75 | 0.68 | 0.52 |
| hsa-miR-22-3p | 0.95 | 1.00 | 0.91 | 1.00 | 0.89 | 0.94 | 0.89 |
| hsa-miR-17-3p | 0.91 | 0.80 | 0.97 | 0.80 | 0.96 | 0.85 | 0.79 |
| hsa-miR-151a-5p | 0.82 | 0.70 | 0.89 | 0.70 | 0.86 | 0.73 | 0.60 |
| hsa-miR-374a-3p | 0.69 | 0.55 | 0.77 | 0.55 | 0.57 | 0.55 | 0.32 |
| hsa-miR-186-5p | 0.84 | 0.75 | 0.89 | 0.75 | 0.83 | 0.78 | 0.65 |
| hsa-miR-200c-3p | 0.85 | 0.70 | 0.94 | 0.70 | 0.86 | 0.72 | 0.65 |
| hsa-miR-576-5p | 0.82 | 0.65 | 0.91 | 0.65 | 0.89 | 0.67 | 0.57 |
| hsa-let-7a-3p | 0.93 | 0.85 | 0.97 | 0.85 | 0.95 | 0.89 | 0.84 |
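
For reference, the measures in the columns above can all be derived from a single confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical. Sensitivity and recall are the same quantity, which is why those two columns coincide in the table. Because the table reports means over cross-validation runs, a metric recomputed from other mean values need not match the tabulated mean exactly.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts.

    Assumes the counts are such that no denominator is zero.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: agreement beyond what chance alone would produce.
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((tn + fn) / total) * ((tn + fp) / total)
    expected = p_pos + p_neg
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "recall": sensitivity,
        "precision": precision,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Hypothetical counts for one test split: 20 cases and 35 controls.
metrics = classification_metrics(tp=18, fp=3, tn=32, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```
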
Details of the datasets used. The 11 datasets used to test miRcorrNet and other tools. The two class-label columns give the number of samples belonging to each class (normal and tumor).
| TCGA cancer types | Abbreviation | Control | Case | Pubmed ID |
|---|---|---|---|---|
| Bladder urothelial carcinoma | BLCA | 405 | 19 | PMID: 24476821 |
| Breast invasive carcinoma | BRCA | 760 | 87 | PMID: 31878981 |
| Kidney chromophobe | KICH | 66 | 25 | PMID: 25155756 |
| Kidney renal papillary cell carcinoma | KIRP | 290 | 32 | PMID: 28780132 |
| Kidney renal clear cell carcinoma | KIRC | 255 | 71 | PMID: 23792563 |
| Lung adenocarcinoma | LUAD | 449 | 20 | PMID: 25079552 |
| Lung squamous cell carcinoma | LUSC | 342 | 38 | PMID: 22960745 |
| Prostate adenocarcinoma | PRAD | 493 | 52 | PMID: 26544944 |
| Stomach adenocarcinoma | STAD | 370 | 35 | PMID: 25079317 |
| Papillary thyroid carcinoma | THCA | 504 | 59 | PMID: 25417114 |
| Uterine corpus endometrial carcinoma | UCEC | 174 | 23 | PMID: 23636398 |
Example of the performance output of the tools based on the general approach. This is an example of the output of miRcorrNet, maTE, or SVM-RCE. These results were obtained with miRcorrNet using BLCA data. The column Number of genes is the average number of genes. In the first step, we build a model from the genes belonging to the top-ranked group and then test it using the testing part of the data. Then we build a model from the top 1 and 2 groups and test it. For j = 10, the model is built from the genes belonging to the top 10 groups and tested accordingly.
| #Top groups | Number of genes | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| 10 | 388.79 | 0.94 | 0.92 | 0.95 |
| 9 | 355.81 | 0.95 | 0.92 | 0.96 |
| 8 | 328.58 | 0.94 | 0.91 | 0.96 |
| 7 | 288.23 | 0.93 | 0.91 | 0.95 |
| 6 | 259.99 | 0.94 | 0.92 | 0.95 |
| 5 | 223.87 | 0.94 | 0.92 | 0.95 |
| 4 | 182.58 | 0.94 | 0.91 | 0.95 |
| 3 | 146.43 | 0.94 | 0.91 | 0.95 |
| 2 | 93.16 | 0.93 | 0.9 | 0.94 |
| 1 | 45.06 | 0.91 | 0.86 | 0.93 |
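
The cumulative construction described in the caption above can be sketched as follows: for each j, the gene set is the union of the genes in the top j groups. The three groups are taken from the G() output table for BLCA, but the order in which they are ranked here is hypothetical, and the function name is an assumption.

```python
def cumulative_gene_sets(ranked_groups):
    """Yield (j, genes), where genes is the union of the top-j groups.

    ranked_groups: list of (group name, gene list), best group first.
    Genes shared between groups are counted once.
    """
    seen = set()
    ordered = []
    for j, (_, genes) in enumerate(ranked_groups, start=1):
        for gene in genes:
            if gene not in seen:
                seen.add(gene)
                ordered.append(gene)
        yield j, list(ordered)


# Group contents from the G() table; the ranking order is made up.
ranked_groups = [
    ("hsa-miR-361-3p", ["SYNJ2BP", "NLGN1", "KCNK3"]),
    ("hsa-miR-181a-5p", ["SYNJ2BP", "TACC2", "JMY", "ZDHHC15", "MEIS1"]),
    ("hsa-miR-30e-5p", ["RAB3IL1", "LIX1L"]),
]
sizes = [len(genes) for _, genes in cumulative_gene_sets(ranked_groups)]
print(sizes)
```

Since the ranking and the group contents can differ between cross-validation iterations, averaging these set sizes over iterations would yield non-integer values like those in the Number of genes column.
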
Ranking miRNAs with the RobustRankAggreg strategy using BLCA data. This table presents an example of the miRcorrNet-based ranking of miRNAs, determined by the RobustRankAggreg method. Additionally, column three lists the genes that are negatively correlated with the corresponding miRNA. The last column is the number of genes in each group associated with each miRNA.
| miRNA | Score | Targets | #Genes |
|---|---|---|---|
| hsa-miR-21-5p | 8.42423E-33 | RASGEF1C, SOX10, NOVA1, PCSK2, GRIK3, AR, EID1, ARHGAP6, C1QTNF7, CNTN2, TACC2, LYRM7, ZFP2, FAM149A, GPRASP2, FOXP1, TNNI3K, MID2, SYNE1, LRRTM1, RBM24, NR3C2, FAM54B, FOXF1, MEIS1, RNF180, MYOT, ZNF280D, SMAD9, PLP1, RAI2, NRXN1, CBX7, HERC1, MOAP1, LOC643763, MYST4, SERINC1, ZBTB4, PRIMA1, C20orf194 | 41 |
| hsa-miR-22-3p | 1.20936E-12 | MEIS1 | 1 |
| hsa-miR-16-5p | 0.006982937 | EVC, ZNF154, PPP3CB | 3 |
| hsa-miR-1976 | 0.011501451 | FAM168B, AHNAK, ACOX2, PJA2, DNAJC18, F8, NFIX, ARHGAP24, TCEAL2, SETBP1, EVC, THRA, RNF38, ATL1, CRTC3, SETD7, GPRASP2, PLCL1, ZHX3, NFIA, DDR2, PBX3, KLHL13, ZFHX4, MEIS1, PBX1, RNF180, NFIC, KIAA1614, SLC24A3, EPDR1, HERC1, TOR1AIP1, SERINC1, NEK9, ZEB2, GPR124 | 37 |
| hsa-miR-182-5p | 0.125595903 | CSGALNACT1, ACOX2, CD99L2, ARHGAP24, LRRK2, ROR2, ZEB2, ALDH2 | 8 |
| hsa-miR-576-5p | 0.126381607 | MID2, ZBTB4 | 2 |
| hsa-miR-92a-3p | 0.301933719 | SOX10, NOVA1, AQP1, EVC, LRRK2, MID2, ARHGAP1, C10orf72, SLC24A3, ALDH2 | 10 |
| hsa-miR-26b-3p | 0.92385325 | SOX10, RRAGD, ARHGAP24 | 3 |
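
To give a sense of how several ranked lists can be merged into one score per miRNA, the sketch below reimplements the core idea behind robust rank aggregation in plain Python: each item's normalized ranks are compared against the order statistics expected under a uniform null, and the smallest resulting p-value is Bonferroni-corrected. It is a simplified illustration and not the RobustRankAggreg R package. It assumes every list ranks the same items, and that each list would come from one cross-validation iteration, which is also an assumption.

```python
from math import comb


def beta_scores(norm_ranks):
    """For the k-th smallest normalized rank x, the probability that at
    least k of n uniform draws are <= x."""
    ordered = sorted(norm_ranks)
    n = len(ordered)
    scores = []
    for k, x in enumerate(ordered, start=1):
        p = sum(comb(n, l) * x ** l * (1 - x) ** (n - l)
                for l in range(k, n + 1))
        scores.append(p)
    return scores


def rho_score(norm_ranks):
    """Smallest beta score, Bonferroni-corrected and capped at 1."""
    n = len(norm_ranks)
    return min(1.0, min(beta_scores(norm_ranks)) * n)


def aggregate(ranked_lists):
    """Return (name, score) pairs, most consistently top-ranked first.

    ranked_lists: list of lists of names, best first; every list is
    assumed to contain the same names.
    """
    scores = {}
    for name in ranked_lists[0]:
        norm = [(lst.index(name) + 1) / len(lst) for lst in ranked_lists]
        scores[name] = rho_score(norm)
    return sorted(scores.items(), key=lambda kv: kv[1])


# Three hypothetical rankings of four placeholder items.
ranked_lists = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["b", "a", "c", "d"],
]
aggregated = aggregate(ranked_lists)
print(aggregated)
```

Smaller scores indicate items that sit near the top of the lists more consistently than chance would explain, which matches the direction of the Score column above.
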
Comparison results using all 11 datasets. The AUC column is the area under the curve. All values are averaged over 100 iterations of Monte Carlo cross-validation (MCCV) at the level of the top 2 groups for maTE and miRcorrNet, at 8 and 125 genes for SVM-RFE, and at an average of 190.05 genes from cluster level 2 for SVM-RCE. Standard deviation values are given for AUC.
| Method | Number of genes | Accuracy | Sensitivity | Specificity | AUC | Standard deviation |
|---|---|---|---|---|---|---|
| miRcorrNet | 141.1 | 0.96 | 0.94 | 0.97 | 0.98 | 0.05 ± 0.05 |
| maTE | 7.48 | 0.96 | 0.94 | 0.96 | 0.98 | 0.034 ± 0.026 |
| SVM-RCE | 190.05 | 0.96 | 0.94 | 0.97 | 0.99 | 0.06 ± 0.03 |
| SVM-RFE | 8 | 0.84 | 0.85 | 0.85 | 0.91 | 0.07 ± 0.04 |
| SVM-RFE | 125 | 0.96 | 0.97 | 0.95 | 0.98 | 0.05 ± 0.03 |
miRcorrNet results. All miRcorrNet results are reported as Area Under the Curve (AUC) values. #Grp is the number of top groups. The number of genes is given as a mean value.
| miRcorrNet performance | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.98 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 0.95 | 0.96 | 1.00 | 0.99 | |
| 0.98 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 0.95 | 0.98 | 1.00 | 0.99 | |
| 0.99 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 0.96 | 0.97 | 1.00 | 0.99 | |
| 0.97 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 0.96 | 0.98 | 1.00 | 0.99 | |
| 0.97 | 0.99 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 0.95 | 0.93 | 0.99 | 0.99 | |
| 407 | 56 | 4,916 | 245 | 365 | 352 | 398 | 122 | 86 | 278 | 389 | |
| 290 | 60 | 2,998 | 207 | 316 | 257 | 270 | 69 | 52 | 219 | 269 | |
| 211 | 49 | 2,031 | 162 | 297 | 181 | 194 | 54 | 26 | 173 | 193 | |
| 84 | 32 | 870 | 70 | 157 | 65 | 68 | 21 | 13 | 92 | 75 | |
| 46 | 24 | 306 | 35 | 69 | 29 | 28 | 10 | 8 | 48 | 33 | |
Comparison of miRNA-disease associations between miRcorrNet findings and existing associations in databases.
| miRNA name | Score | Evidence | miRNA name | Score | Evidence |
|---|---|---|---|---|---|
| hsa-miR-21-5p | 7.32 | dbDEMC, | hsa-miR-21-5p | 9.66 | dbDEMC, |
| hsa-miR-22-3p | 4.67 | miRCancer | hsa-miR-10b-5p | 7.98 | dbDEMC, |
| hsa-miR-148b-3p | 4.06 | dbDEMC, | hsa-miR-200c-3p | 5.26 | dbDEMC, |
| hsa-let-7g-5p | – | No evidence | – | – | – |
| hsa-miR-222-3p | 9.33 | dbDEMC | hsa-miR-21-5p | 8.62 | dbDEMC, |
| hsa-miR-221-3p | 8.1 | dbDEMC, | hsa-miR-10b-5p | 4.95 | dbDEMC, |
| hsa-miR-96-5p | 7.03 | dbDEMC | hsa-miR-589-5p | 4.27 | dbDEMC |
| hsa-miR-28-3p | 7.96 | dbDEMC, | hsa-miR-151a-5p | 2.23 | dbDEMC |
| hsa-miR-21-5p | 6.35 | dbDEMC, | hsa-miR-200b-3p | 2.12 | dbDEMC, |
| hsa-miR-106b-3p | 6.17 | dbDEMC, | hsa-miR-141-3p | 2.01 | dbDEMC, |
| – | – | – | hsa-miR-1301 | – | No evidence |
| hsa-miR-30a-3p | 6.23 | dbDEMC, | hsa-miR-146b-3p | 3.76 | dbDEMC, |
| hsa-let-7a-5p | 6.13 | dbDEMC, | hsa-miR-181a-5p | 3.74 | dbDEMC |
| hsa-miR-22-3p | 5.49 | dbDEMC, | hsa-miR-205-5p | 3.44 | dbDEMC, |
| hsa-miR-143-3p | 3.31 | dbDEMC, | hsa-miR-21-5p | 9.59 | dbDEMC, |
| hsa-miR-375 | 3.04 | dbDEMC, | hsa-miR-148b-3p | 3.22 | miRCancer |
| hsa-miR-200c-3p | 1 | dbDEMC | hsa-miR-185-5p | 2.39 | dbDEMC, |
| hsa-miR-152 | 7.26 | dbDEMC | | | |
| hsa-miR-30a-5p | 6.56 | dbDEMC, | | | |
| hsa-miR-148b-3p | 6.5 | dbDEMC | | | |
Performance results obtained by applying different experiments on the validation data.
| Experiments | Sensitivity | Specificity | Accuracy |
|---|---|---|---|
| LUAD_E random 1 | 0.63 | 0.52 | 0.59 |
| LUAD_E random 2 | 0.58 | 0.71 | 0.63 |
| LUAD_E random 5 | 0.73 | 0.77 | 0.74 |
| LUAD_E random 30 | 0.92 | 0.98 | 0.94 |
| LUAD (train) test on LUAD_E random 1 | 0.53 | 0.61 | 0.56 |
| LUAD (train) test on LUAD_E random 2 | 0.53 | 0.73 | 0.62 |
| LUAD (train) test on LUAD_E random 5 | 0.73 | 0.76 | 0.74 |
| LUAD (train) test on LUAD_E random 30 | 0.87 | 0.94 | 0.91 |
| LUAD (train) test on LUAD_E top 1 | 0.86 | 0.75 | 0.81 |
| LUAD (train) test on LUAD_E top 2 | 0.76 | 0.97 | 0.86 |
| LUAD (train) test on LUAD_E top 5 | 0.97 | 0.92 | 0.95 |
| LUAD (train) test on LUAD_E top 30 | 0.98 | 0.97 | 0.97 |
Figure 4: Pan-cancer regulating miRNAs predicted by miRcorrNet.
(A) Eleven miRNAs that potentially regulate six or more cancer types are highlighted. (B) Ranks of these 11 miRNAs in individual cancer types are denoted by dots. These miRNAs are sorted based on their median rank.
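
The selection and ordering described for this figure can be sketched in a few lines. The sketch assumes a miRNA counts toward a cancer type whenever it has a rank in that cancer type's list; the miRNA names, the ranks, and the function name are all hypothetical.

```python
from statistics import median


def pan_cancer_ranking(ranks_by_mirna, min_cancer_types=6):
    """Keep miRNAs ranked in enough cancer types, sorted by median rank.

    ranks_by_mirna: dict miRNA -> dict cancer type -> rank (1 is best).
    """
    kept = {name: ranks for name, ranks in ranks_by_mirna.items()
            if len(ranks) >= min_cancer_types}
    return sorted(kept, key=lambda name: median(kept[name].values()))


# Made-up ranks for three placeholder miRNAs.
ranks_by_mirna = {
    "miR-B": {"BLCA": 7, "BRCA": 2, "KICH": 9, "KIRP": 6,
              "LUAD": 8, "LUSC": 3, "PRAD": 5},
    "miR-A": {"BLCA": 1, "BRCA": 3, "KICH": 2, "KIRP": 5,
              "LUAD": 1, "LUSC": 4},
    "miR-C": {"BLCA": 1, "BRCA": 1},  # too few cancer types, dropped
}
ordering = pan_cancer_ranking(ranks_by_mirna)
print(ordering)
```
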
List of correlation based tools for mRNA–miRNA integration.
| Tool name | Data sets used | Link | Status |
|---|---|---|---|
| anamiR | Multiple Myeloma | R package | Not Available |
| miRComb | Colon Cancer | R package | Not Available |
| MMIA | ALL | http://cancer.informatics.indiana.edu/mmia (inactive) | Not Available |
| MAGIA | ALL | http://gencomp.bio.unipd.it/magia (inactive) | Not Available |
| MirConnX | GBM | | Not Available |
| BCM | BRCA and THCA | | Not Available |