Minsik Oh, Sungmin Rhee, Ji Hwan Moon, Heejoon Chae, Sunwon Lee, Jaewoo Kang, Sun Kim.
Abstract
miRNAs are small non-coding RNAs that regulate gene expression by binding to the 3'-UTRs of their target mRNAs. Many recent studies have reported that miRNAs play important biological roles by regulating specific mRNAs or genes. Many sequence-based algorithms have been developed to predict miRNA targets. However, these methods are not designed for condition-specific target prediction and produce many false positives. Expression-based target prediction algorithms have therefore been developed for condition-specific prediction. A typical strategy for using expression data is to exploit the negative regulatory effect of miRNAs on their target genes, that is, to look for miRNA-mRNA pairs whose expression levels are negatively correlated. To control false positives, a stringent cutoff is typically set, but such methods then tend to reject many true target relationships (false negatives). Overcoming these limitations requires additional information, and the literature is probably the best resource available. Recent literature mining systems compile millions of articles describing experiments designed for specific biological questions, and they provide functions for searching for specific information. To exploit this information, we used BEST, a literature mining system that automatically extracts information from PubMed articles and lets the user search the literature with arbitrary English keywords. By integrating omics data analysis methods with BEST, we developed Context-MMIA, a miRNA-mRNA target prediction method that combines the results of expression data analysis with literature information extracted according to a user-specified context. In a pathway enrichment analysis of the genes included in the top 200 miRNA-target pairs, Context-MMIA outperformed the four existing target prediction methods that we tested. In a second test, which assessed whether prediction methods can reproduce experimentally validated target relationships, Context-MMIA again outperformed the four existing methods.
In summary, Context-MMIA allows the user to specify the context of the experimental data when predicting miRNA targets, and we believe that Context-MMIA is very useful for predicting condition-specific miRNA targets.
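The typical expression-based strategy described in the abstract, along with the false-negative trade-off of a stringent cutoff, can be illustrated with a minimal correlation filter. This is a sketch under stated assumptions, not code from Context-MMIA or the other tools. It assumes Pearson correlation across matched samples. The function names, cutoffs, and toy profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile has no defined correlation; treat as uncorrelated
    return cov / (sx * sy)


def negatively_correlated_pairs(mirna_expr, mrna_expr, cutoff):
    """Keep (miRNA, gene, r) triples whose correlation r is at or below a negative cutoff."""
    pairs = []
    for mir, mir_profile in mirna_expr.items():
        for gene, gene_profile in mrna_expr.items():
            r = pearson(mir_profile, gene_profile)
            if r <= cutoff:
                pairs.append((mir, gene, r))
    return pairs


# Hypothetical toy profiles over four matched samples.
mirna = {"miR-x": [1.0, 2.0, 3.0, 4.0]}
mrna = {
    "geneA": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated
    "geneB": [4.0, 3.5, 1.0, 2.0],  # weaker anti-correlation (r is about -0.80)
}

loose = negatively_correlated_pairs(mirna, mrna, cutoff=-0.5)
strict = negatively_correlated_pairs(mirna, mrna, cutoff=-0.9)
```

With the looser cutoff, both genes pass. With the stricter cutoff, only geneA survives. If geneB were a true target, the stricter cutoff would produce exactly the kind of false negative the abstract describes.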
Year: 2017 PMID: 28362846 PMCID: PMC5376335 DOI: 10.1371/journal.pone.0174999
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The number of published papers related to the keyword ‘cancer’ since 2010.
More than 100,000 papers have been published every year.
Fig 2. Schematic workflow for Context-MMIA.
The system accepts miRNA and mRNA expression profiles as inputs. In the MMIA step, differentially expressed miRNAs and mRNAs (DEmiRNAs and DEmRNAs) are extracted based on differences in their expression levels, and the negative correlation between them is computed. In the Context-MMIA step, the system computes omics scores and context scores, the latter from user-provided keywords via the BEST system. Finally, the system ranks miRNA-mRNA pairs using these scores.
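The final ranking step can be pictured as sorting candidate pairs by a score that combines expression evidence with literature evidence. The sketch below is hypothetical. The caption does not give Context-MMIA's scoring formula, so it assumes a weighted sum. In that sum, the omics score is the strength of the negative correlation, the context score is a literature-relevance value in [0, 1] standing in for BEST output, and the weight `alpha` is an arbitrary parameter.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    mirna: str
    gene: str
    correlation: float    # Pearson r from the MMIA step; negative values are candidates
    context_score: float  # hypothetical literature relevance in [0, 1]


def combined_score(p, alpha=0.5):
    """Hypothetical weighted sum of an omics score and a context score."""
    omics_score = max(0.0, -p.correlation)  # stronger anti-correlation gives a larger score
    return alpha * omics_score + (1 - alpha) * p.context_score


def rank_pairs(pairs, alpha=0.5):
    """Return pairs sorted from highest to lowest combined score."""
    return sorted(pairs, key=lambda p: combined_score(p, alpha), reverse=True)


# Hypothetical candidates: G2 is weaker in expression but strongly supported by the literature.
pairs = [
    Pair("miR-1", "G1", correlation=-0.9, context_score=0.1),
    Pair("miR-1", "G2", correlation=-0.7, context_score=0.9),
    Pair("miR-2", "G3", correlation=-0.8, context_score=0.0),
]

with_context = [p.gene for p in rank_pairs(pairs)]
omics_only = [p.gene for p in rank_pairs(pairs, alpha=1.0)]
```

With `alpha=1.0`, the ranking reduces to correlation alone (G1, G3, G2). Giving weight to the context score promotes G2 to the top. This is the qualitative effect that adding literature evidence is meant to have.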
Dataset summary.
Each GEO dataset is paired with an experimentally validated miRNA-mRNA target relationship (second column) that is relevant to its disease domain (third column). The disease information was used to test performance when different contexts are specified.
| Data | Experimentally validated target | Disease |
|---|---|---|
| GSE21411 | hsa-miR-23a—NEDD4L | Interstitial Lung Diseases |
| GSE40059 | hsa-miR-200c—CFL2 | Breast Cancer |
| GSE53482 | hsa-miR-155—JARID2 | Primary Myelofibrosis |
Ratio of pathway-mapped genes to all genes in the top 200 miRNA-target pairs.
For each method, we extracted the top 200 target pairs and performed pathway analysis using DAVID. The numerator is the number of genes mapped to enriched pathways, and the denominator is the number of genes in the top 200 pairs (edges). Context-MMIA has the largest ratio in every dataset. X: CoSMic did not run on GSE53482.
| Methods | GSE21411 | GSE40059 | GSE53482 |
|---|---|---|---|
| Context-MMIA | 37 / 79 | 45 / 157 | 42 / 127 |
| MMIA | 12 / 157 | 20 / 179 | 11 / 124 |
| GenMiR++ | 0 / 194 | 18 / 197 | 26 / 200 |
| MAGIA2 | 18 / 182 | 12 / 191 | 19 / 193 |
| CoSMic | 24 / 196 | 9 / 195 | X |
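The ratios above are simple proportions. The short script below recomputes them from the table's counts, treating the X entry as missing, and checks which method has the largest ratio in each dataset. The variable and function names are illustrative only.

```python
# (mapped genes, genes in top-200 pairs) copied from the table; None marks the X entry.
results = {
    "Context-MMIA": {"GSE21411": (37, 79), "GSE40059": (45, 157), "GSE53482": (42, 127)},
    "MMIA": {"GSE21411": (12, 157), "GSE40059": (20, 179), "GSE53482": (11, 124)},
    "GenMiR++": {"GSE21411": (0, 194), "GSE40059": (18, 197), "GSE53482": (26, 200)},
    "MAGIA2": {"GSE21411": (18, 182), "GSE40059": (12, 191), "GSE53482": (19, 193)},
    "CoSMic": {"GSE21411": (24, 196), "GSE40059": (9, 195), "GSE53482": None},
}


def mapped_ratio(cell):
    """Fraction of top-200 genes that fall in enriched pathways, or None if missing."""
    if cell is None:
        return None
    mapped, total = cell
    return mapped / total


def best_method(dataset):
    """Method with the largest mapped-gene ratio on one dataset, skipping missing entries."""
    ratios = {m: mapped_ratio(by_ds[dataset]) for m, by_ds in results.items()}
    valid = {m: r for m, r in ratios.items() if r is not None}
    return max(valid, key=valid.get)


for ds in ("GSE21411", "GSE40059", "GSE53482"):
    print(ds, best_method(ds), round(mapped_ratio(results["Context-MMIA"][ds]), 3))
```

Context-MMIA's ratios work out to roughly 0.468, 0.287, and 0.331. The next-best values are 0.122 (CoSMic), 0.112 (MMIA), and 0.130 (GenMiR++), respectively.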
Pathway enrichment analysis of the GSE40059 breast cancer data.
Breast-cancer-related pathways were selected by literature search. A circle (O) in a cell means that the pathway is enriched in the gene set predicted by the corresponding method (A: Context-MMIA, B: MMIA, C: GenMiR++, D: MAGIA2, E: CoSMic). More pathways are enriched in the gene set predicted by Context-MMIA than in those predicted by the other methods.
| Breast-Cancer-Related Pathway | A | B | C | D | E |
|---|---|---|---|---|---|
| Purine metabolism | O | | | | |
| Pyrimidine metabolism | O | | | | |
| ABC transporters | O | | | | |
| MAPK signaling pathway | O | | | | |
| Cytokine-cytokine receptor interaction | O | | | | |
| Neuroactive ligand-receptor interaction | O | | | | |
| p53 signaling pathway | O | O | | | |
| Apoptosis | O | | | | |
| Notch signaling pathway | O | | | | |
| TGF-beta signaling pathway | O | | | | |
| Axon guidance | O | | | | |
| Focal adhesion | O | O | O | | |
| ECM-receptor interaction | O | | | | |
| Cell adhesion molecules (CAMs) | O | O | | | |
| Adherens junction | O | | | | |
| Regulation of actin cytoskeleton | O | | | | |
| Glioma | O | | | | |
| Melanoma | O | | | | |
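Whether a pathway counts as "enriched" comes from an over-representation test. DAVID reports its EASE score, a more conservative modification of the one-sided Fisher exact test. The sketch below shows only the plain hypergeometric upper-tail probability on which such tests are based, not DAVID's exact procedure. The background size and counts are toy values, not data from this study.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N background genes, K in the pathway, n genes in the list."""
    upper = min(n, K)
    total = comb(N, n)
    # Count the ways to draw i pathway genes and n - i non-pathway genes, for every i >= k.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy example: 20 background genes, 5 in the pathway, 4 genes in the list, 3 of them in the pathway.
p = hypergeom_upper_tail(k=3, n=4, K=5, N=20)
```

Here `p` equals 155/4845 (about 0.032). A small value means the list contains more pathway genes than random sampling would usually give. In practice, p-values across many pathways also need multiple-testing correction.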
Reproducibility of validated targets.
This table lists the rank of each validated target pair in the three datasets. The validated targets are listed in the second column of Table I (the dataset summary). Context-MMIA outperformed the existing tools in predicting the validated targets. MAGIA2 and CoSMic failed to reproduce the validated targets (X).
| Data | GSE21411 | GSE40059 | GSE53482 |
|---|---|---|---|
| Context-MMIA | |||
| MMIA | 1411 | 387 | 1465 |
| GenMiR++ | 8625 | 1673 | 95492 |
| MAGIA2 | X | X | X |
| CoSMic | X | X | X (did not run) |
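Each entry in the reproducibility table is the position of the validated pair in a method's ranked output. The helper below is a hypothetical sketch that assumes predictions are available as an ordered list of (miRNA, gene) tuples. It returns the 1-based rank, or None when the pair is absent, which corresponds to the X entries for methods that failed to reproduce the target. The ranked list is made up for illustration.

```python
def rank_of(target, ranked_pairs):
    """1-based position of target in a ranked list of (miRNA, gene) tuples, or None if absent."""
    for position, pair in enumerate(ranked_pairs, start=1):
        if pair == target:
            return position
    return None


# Hypothetical ranked predictions; only the validated pair names come from the dataset table.
ranked = [
    ("hsa-miR-1", "GENE1"),
    ("hsa-miR-23a", "NEDD4L"),
    ("hsa-miR-200c", "GENE2"),
]

found = rank_of(("hsa-miR-23a", "NEDD4L"), ranked)
missing = rank_of(("hsa-miR-155", "JARID2"), ranked)
```

Exact tuple matching assumes that both sources use identical miRNA names. Real comparisons may need to normalize names (for example, arm suffixes such as -3p or -5p) first.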
Detection of human-specific validated targets.
This table lists the number of validated target pairs detected by each method in the three datasets. The validated targets were taken from miRTarBase and restricted to human miRNA-target interactions (MTIs) supported as functional.
| Data | GSE21411 | GSE40059 | GSE53482 |
|---|---|---|---|
| Context-MMIA | |||
| MMIA | 5 | 4 | 12 |
| GenMiR++ | 3 | 4 | 3 |
| MAGIA2 | 0 | 0 | 0 |
| CoSMic | 7 | 0 | X (did not run) |
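Counting detected validated targets amounts to filtering a reference set and intersecting it with each method's predictions. The sketch below assumes rows shaped like a miRTarBase export. The field names and the exact string "Functional MTI" are assumptions, and may not match the filter the authors used. All miRNA and gene names are placeholders.

```python
# Hypothetical rows mimicking a miRTarBase export; field names are assumptions.
mirtarbase_rows = [
    {"miRNA": "hsa-miR-x", "Target Gene": "GENE1", "Species (miRNA)": "Homo sapiens", "Support Type": "Functional MTI"},
    {"miRNA": "hsa-miR-x", "Target Gene": "GENE2", "Species (miRNA)": "Homo sapiens", "Support Type": "Functional MTI (Weak)"},
    {"miRNA": "mmu-miR-y", "Target Gene": "Gene3", "Species (miRNA)": "Mus musculus", "Support Type": "Functional MTI"},
    {"miRNA": "hsa-miR-z", "Target Gene": "GENE4", "Species (miRNA)": "Homo sapiens", "Support Type": "Functional MTI"},
]


def human_functional_mtis(rows):
    """Set of (miRNA, gene) pairs from human rows labelled exactly 'Functional MTI'."""
    return {
        (r["miRNA"], r["Target Gene"])
        for r in rows
        if r["Species (miRNA)"] == "Homo sapiens" and r["Support Type"] == "Functional MTI"
    }


def count_validated(predicted_pairs, validated):
    """Number of distinct predicted pairs that appear in the validated set."""
    return len(set(predicted_pairs) & validated)


validated = human_functional_mtis(mirtarbase_rows)
predicted = [("hsa-miR-x", "GENE1"), ("hsa-miR-x", "GENE2"), ("hsa-miR-z", "GENE9")]
n_detected = count_validated(predicted, validated)
```

The exact-match filter drops the "(Weak)" row and the mouse row, so only one of the three predictions counts as validated. As with ranking, real comparisons usually need consistent miRNA and gene identifiers before the intersection.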
Sensitivity tests using different keywords.
Rankings of the validated targets are shown for each keyword. The validated targets were ranked higher when the disease-related keyword was used than when unrelated keywords were used.
| Keyword | GSE21411 | GSE40059 | GSE53482 |
|---|---|---|---|
| Correct keyword | |||
| Insulin resistance | 12479 | 2036 | 4250 |
| Influenza | 6826 | 1169 | 1623 |
| HIV | 5865 | 4002 | 3238 |
| Hepatocellular carcinoma | 5278 | 3265 | 7180 |