Zeynep Hakguder, Jiang Shu, Chunxiao Liao, Kaiyue Pan, Juan Cui.
Abstract
BACKGROUND: MicroRNA regulation fine-tunes the gene network in humans and has been implicated in most physiological and pathological conditions. Studies of the regulatory impact of microRNAs on cellular and disease processes have produced numerous computational tools that investigate microRNA-mRNA interactions by predicting static binding sites, which depend heavily on sequence pairing. However, the practical use of such target prediction has been hindered by the interplay between competing and cooperative microRNA binding, which greatly complicates the regulatory process.
Keywords: Bayesian inference; Dirichlet process Gaussian mixture; Dynamic microRNA regulation; Machine learning; MicroRNA; MicroRNA target prediction
Year: 2018 PMID: 30255782 PMCID: PMC6157162 DOI: 10.1186/s12864-018-5029-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Datasets applied in this study
| Datasets | Content |
|---|---|
| CLASH data | 17,436 interactions in human kidney cells (HEK293), associated with Ago1 |
| iPAR-CLIP data | 10,566 interactions in HEK293, human embryonic stem cells, EBV-infected lymphoblastoid cell lines, and a primary effusion lymphoma cell line, associated with Ago1 and Ago2 |
| CLEAR-CLIP data | 32,711 interactions in human hepatoma cells, associated with Ago |
| mirTarbase | 11,002 interactions on the human genome, predicted by miRanda |
| RefSeq | 56,000 human transcripts |
Fig. 1. Interaction features used in the model. (a) Breakdown of features according to categories. (b) Recursive feature elimination process. The top panel shows the performance of the feature elimination process on the initial set of 2059 features. The bottom panel illustrates the distributions of four discriminative features in the positive and negative datasets.
The positive and negative datasets
| Dataset | miRNA/mRNA | Interactions |
|---|---|---|
| Pos-1: Interactions reported in CLASH data | 399 | 17,436 |
| Pos-2: Interactions reported in iPAR-CLIP data | 291 | 10,567 |
| Neg-1: Interactions generated on reported miRNA and mRNA pairs | 755 | 8768 |
| Neg-2: Interactions generated on reported miRNAs and unreported mRNAs | 755 | 8768 |
| Neg-3: Interactions generated on unreported miRNAs and reported mRNAs | 1833 | 7332 |
| Neg-4: Interactions generated on unreported miRNAs and unreported mRNAs | 1833 | 7332 |
Fig. 2. Cascade DPGMM model. (a) Flowchart describing training of the model. (b) Illustrative example of cascade DPGMM. Rectangles represent mixed clusters (M); circles represent homogeneous clusters (H). Clustering continues until homogeneous clusters are obtained.
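The cascade idea in the caption above (keep re-clustering mixed clusters until every leaf is homogeneous) can be sketched as a short recursion. This is only an illustration under stated assumptions, not the authors' implementation: `cascade_cluster`, `is_homogeneous`, the `purity` threshold, and the `max_depth` guard are hypothetical names, and `median_split` is a trivial stand-in for the clustering step, not a Dirichlet process Gaussian mixture. In practice a DPGMM-style clusterer, for example scikit-learn's `BayesianGaussianMixture` with `weight_concentration_prior_type="dirichlet_process"`, could be supplied as the `split` function.

```python
from collections import Counter

def is_homogeneous(items, purity=1.0):
    """A cluster is homogeneous when its most common label reaches `purity`."""
    counts = Counter(label for _, label in items)
    return max(counts.values()) / len(items) >= purity

def median_split(items):
    """Stand-in clusterer (NOT a DPGMM): split 1-D features at the median."""
    values = sorted(x for x, _ in items)
    mid = values[len(values) // 2]
    low = [it for it in items if it[0] < mid]
    high = [it for it in items if it[0] >= mid]
    return [group for group in (low, high) if group]

def cascade_cluster(items, split, purity=1.0, max_depth=10, depth=0):
    """Recursively re-cluster mixed clusters; return the list of leaf clusters."""
    if is_homogeneous(items, purity) or depth >= max_depth:
        return [items]
    groups = split(items)
    if len(groups) < 2:  # the clusterer could not separate the items further
        return [items]
    leaves = []
    for group in groups:
        leaves.extend(cascade_cluster(group, split, purity, max_depth, depth + 1))
    return leaves

# Hypothetical (feature, label) pairs; the labels are placeholders.
data = [(0.1, "A"), (0.2, "B"), (0.3, "A"), (0.4, "A"), (0.8, "B"), (0.9, "B")]
leaves = cascade_cluster(data, median_split)
for leaf in leaves:
    print(len(leaf), sorted({label for _, label in leaf}))
```

The `max_depth` guard and the single-group check are there so the recursion stops even when a clusterer cannot separate a mixed cluster any further.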
Prediction performance on the training, testing, and independent datasets
| Dataset | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|
| Training | 0.78 | 0.86 | 0.82 | 0.64 |
| Testing | 0.77 | 0.86 | 0.82 | 0.64 |
| Validation-1 CLEAR-CLIP | 0.80 | – | 0.80 | – |
| Validation-2 mirTarbase (validated) | 0.61 | – | 0.61 | – |
| Validation-3 mirTarbase (predicted) | 0.62 | – | 0.62 | – |
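For readers less familiar with the four metrics in the table above, the following minimal sketch computes them from confusion-matrix counts using their standard definitions. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study. When a dataset contains no negative examples, specificity and MCC are undefined and accuracy reduces to sensitivity, which would be consistent with the dashes and equal values in the validation rows.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and MCC from confusion-matrix counts.

    Metrics whose denominator is zero are returned as None.
    """
    def ratio(num, den):
        return num / den if den else None

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }

# Hypothetical counts, chosen only for illustration.
balanced = classification_metrics(tp=78, fp=14, tn=86, fn=22)
print({name: round(value, 2) for name, value in balanced.items()})

# A positives-only set: specificity and MCC are undefined (None).
positives_only = classification_metrics(tp=80, fp=0, tn=0, fn=20)
print(positives_only)
```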
Performance based on the 1st and leaf layer clusters
| Dataset | Sensitivity (1st) | Sensitivity (Leaf) | Specificity (1st) | Specificity (Leaf) | Accuracy (1st) | Accuracy (Leaf) | MCC (1st) | MCC (Leaf) |
|---|---|---|---|---|---|---|---|---|
| Training | 0.93 | 0.94 | 0.96 | 0.97 | 0.95 | 0.96 | 0.89 | 0.91 |
| Testing | 0.93 | 0.94 | 0.96 | 0.97 | 0.94 | 0.96 | 0.89 | 0.91 |
| Validation-1 CLEAR-CLIP | 0.87 | 0.92 | – | – | 0.87 | 0.92 | – | – |
| Validation-2 mirTarbase (validated) | 0.63 | 0.60 | – | – | 0.63 | 0.60 | – | – |
| Validation-3 mirTarbase (predicted) | 0.60 | 0.58 | – | – | 0.60 | 0.58 | – | – |
Fig. 3. Clusters resulting from the cascade DPGMM model. (a) The clustering tree is shown at the bottom left; a section of the tree is enlarged at the top right. Ellipses are homogeneous clusters. Numbers in parentheses give the number of examples in each cluster. Striped boxes are mixed clusters; the width of each stripe corresponds to the percentage of a miRNA in the mixture. (b) Top: the distribution of the number of examples per cluster over all clusters. Bottom: the distribution of the percentage of the dominant miRNA in each cluster. The dominant miRNA is the one with the highest presence in a cluster.
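The notion of a dominant miRNA in the caption above can be made concrete with a small counting sketch. It assumes each cluster is represented simply as a list of miRNA labels, one per example; the function name `dominant_mirna` and the placeholder labels are hypothetical.

```python
from collections import Counter

def dominant_mirna(cluster):
    """Return the most frequent miRNA label in a cluster and its share.

    Ties are resolved in favor of the label that appears first.
    """
    counts = Counter(cluster)
    name, n = counts.most_common(1)[0]
    return name, n / len(cluster)

# Placeholder labels, not data from the study.
mixed_cluster = ["miR-A", "miR-A", "miR-B", "miR-A"]
print(dominant_mirna(mixed_cluster))  # ('miR-A', 0.75)
```

A homogeneous cluster has a dominant share of 1.0, while a mixed cluster has a share below 1.0.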
Fig. 4. Transcriptome prediction statistics. (a) Distribution of the number of mRNA transcript targets per miRNA. (b) Distribution of the number of miRNA regulators per gene. The y-axis shows the percentage of the whole transcriptome.
Fig. 5. Network topologies of predicted regulators of the EGFR gene in Stage I, II, and III breast cancer.