Jiang Shu¹, Kevin Chiang¹, Janos Zempleni², Juan Cui¹.
Abstract
MicroRNAs had long been considered to be synthesized only endogenously, until very recent discoveries showed that humans can absorb dietary microRNAs of animal and plant origin, although the mechanism remains unknown. Compelling evidence that microRNAs from rice, milk, and honeysuckle are transported to human blood and tissues has generated strong interest in the fundamental questions of which exogenous microRNAs can be transferred into human circulation, how they are transferred, and whether they exert functions in humans. Here we present an integrated genomics and computational analysis of the features that may determine whether a microRNA is transportable. Specifically, we analyzed all publicly available microRNAs, a total of 34,612 from 194 species, with 1,102 features derived from microRNA sequence and structure. Through in-depth bioinformatics analysis, 8 groups of discriminative features were used to characterize human circulating microRNAs and to infer the likelihood that a microRNA will be transferred into human circulation. For example, 345 dietary microRNAs were predicted as highly transportable candidates, of which 117 have sequences identical to their human homologs and 73 are known to be associated with exosomes. In a milk feeding experiment, we validated 9 cow-milk microRNAs in human plasma using microRNA-sequencing analysis, including top-ranked microRNAs such as bta-miR-487b, miR-181b, and miR-421. Functional analysis illustrates their potential implications in health-related processes. This work demonstrates that data-driven computational analysis is highly promising for studying novel molecular characteristics of transportable microRNAs while bypassing complex mechanistic details.
Year: 2015 PMID: 26528912 PMCID: PMC4631372 DOI: 10.1371/journal.pone.0140587
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Detailed statistics of the microRNA data: a total of 34,612 mature sequences and 28,421 stem-loop precursor sequences from 194 species in 5 kingdoms.
| Types | Animalia | Plantae | Fungi | Viruses | Protista | Total |
|---|---|---|---|---|---|---|
| Mature miRNAs | 26,705 | 7,645 | 84 | 152 | 26 | 34,612 |
| Precursor miRNAs | 21,257 | 6,990 | 53 | 91 | 30 | 28,421 |
| Species | 111 | 71 | 5 | 5 | 2 | 194 |
Annotation information was collected from the following resources:
1. miRBase [3], which provides the species/kingdom information for the 34,612 mature miRNAs included in this study
2. DMD [26], which contains dietary species information for 5,217 miRNAs
3. Weber et al. [25], which provides a list of 360 human circulating miRNAs
4. ExoCarta and EVpedia [27, 28], which provide 370 exosomal miRNAs in human and mouse.
Fig 1. Length distribution of mature miRNA sequences in the 5 kingdoms (A) and statistics of the cross-species sequence comparison (B).
(A) Length distribution of mature miRNA sequences in Animalia (red), Fungi (brown), Plantae (green), Protista (blue) and Viruses (purple), shown as histograms (left) and boxplots (right). (B) Schematic plot of statistics from the cross-species sequence comparison. For each species, light blue indicates the percentage of miRNAs that have homologous miRNAs in human, light purple the percentage that have homologs in other species of the same kingdom, and gray the percentage that have no homologs in any other species.
Fig 2. Phylogenetic tree of the miR-190/171 family and sequence alignments of hsa-miR-190a/b and sly-MIR171a.
(A) Phylogenetic tree of the miR-190/171 family based on 162 precursor sequences from 89 species, forming three major clusters for miR-190a, miR-190b, and miR-171. (B) Alignments among the precursor sequences of hsa-miR-190a/b and sly-MIR171a. hsa-miR-190b shows 79% sequence identity with sly-miR-171a (tomato; BLAST score = 28.3, E-value = 2E-05) and 77% sequence identity with hsa-miR-190a (human; BLAST score = 76.1, E-value = 2E-18).
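Percent identity, as quoted in panel (B), is usually the share of alignment columns holding identical nucleotides. The sketch below assumes that definition (BLAST reports identities over the aligned region, so figures can differ under other conventions); the helper name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns where both sequences carry the same
    (non-gap) nucleotide. Assumes the two strings are already aligned."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap  # a shared gap does not count as an identity
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment (hypothetical sequences, not taken from Fig 2):
print(percent_identity("ACGU-AGG", "ACGUUAGC"))
```

Here 6 of the 8 columns are identical, so the function reports 75.0.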
Performance summary for kingdom-wise classification and human secreted miRNA prediction.
| Classification | Selected Features | Accuracy | Sensitivity | Specificity | MCC |
|---|---|---|---|---|---|
|  | 166 | 93.29% | 96.46% | 90.11% | 86.75% |
|  | 147 | 93.28% | 89.71% | 96.86% | 86.79% |
|  | 126 | 89.79% | 87.39% | 92.19% | 79.68% |
|  | 96 | 90.03% | 84.68% | 95.37% | 80.51% |
Classification results were obtained by 5-fold cross-validation using the feature subset selected for each classifier.
“FPV” denotes the virtual kingdom combining Fungi, Protista and Viruses.
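The four metrics in this table follow their standard confusion-matrix definitions. In every row the reported accuracy equals, to rounding, the mean of sensitivity and specificity, which is what one expects when the positive and negative sets are (nearly) equal in size; MCC appears to be reported multiplied by 100. The sketch below assumes those standard definitions; the counts are hypothetical values chosen to reproduce the first row (166 features).

```python
import math


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, specificity and Matthews correlation
    coefficient (MCC) from the counts of a binary confusion matrix."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Hypothetical balanced counts: 10,000 positives and 10,000 negatives.
m = classification_metrics(tp=9646, fn=354, tn=9011, fp=989)
print({name: round(100 * value, 2) for name, value in m.items()})
```

With equal class sizes, accuracy reduces to (sensitivity + specificity) / 2, which is why the accuracy column tracks that average so closely.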
Examples of overlapping discriminative features selected by the three kingdom-wise classifiers and the human blood secretory prediction.
| Features | Details | A | P | F | H | adj-P |
|---|---|---|---|---|---|---|
| Ensemble Free Energy (EFE) | Free energy of the thermodynamic ensemble, in kcal/mol | 2 | 8 | 8 | 3.81E-02 | |
| Pairs | Number of base pairs in the stem-loop structure | 1 | 2 | 19 | 9.43E-14 | |
| Minimum Free Energy (MFE) | Free energy (kcal/mol) of the precursor fold with the lowest thermodynamic free energy | 7 | 9 | 7 | ||
| %pairGC | Normalized frequency of G-C pairs in the stem-loop structure | 10 | 24 | 5 | ||
| Length_P | Length of precursor sequence | 9 | 10 | 57 | ||
| G((( | Frequency of 3 paired nucleotides led by G | 15 | 17 | 31 | ||
| C((( | Frequency of 3 paired nucleotides led by C | 12 | 77 | 11 | 1.14E-03 | |
| freqUUCC_seed | Normalized frequency of UUCC in the seed region | 29 | 33 | |||
| freqCCA_seed | Normalized frequency of CCA in the seed region | 27 | 35 | |||
| freqCG_P | Normalized frequency of CG on precursor sequence | 41 | 12 | |||
| %G+C content_P | Normalized frequency of G and C on precursor sequence | 16 | 12 | 57 | 5.14E-16 | |
| Max_stem_length | Longest stem block in the stem-loop structure | 39 | 20 | |||
| A((( | Frequency of 3 paired nucleotides led by A | 26 | 1 | |||
| freqGA_m | Normalized frequency of GA on mature miRNA | 105 | 18 | |||
| freqC_m | Normalized frequency of C on mature miRNA | 126 | 94 | 30 | ||
| freqCUG_P | Normalized frequency of CUG on precursor sequence | 24 | 58 | 98 | 80 | |
| freqCUG_m | Normalized frequency of CUG on mature miRNA | 122 | 120 | 92 | ||
| freqCUGG_P | Normalized frequency of CUGG on precursor sequence | 97 | 75 | 51 | ||
| freqGU_P | Normalized frequency of GU on precursor sequence | 137 | 101 | 52 | 5.78E-08 | |
| freqGU_m | Normalized frequency of GU on mature miRNA | 51 | 103 | 5 | 2.27E-04 | |
| freqU_m | Normalized frequency of U on mature miRNA | 114 | 53 | 127 | 48 | 6.89E-08 |
| freqUGU_m | Normalized frequency of UGU on mature miRNA | 101 | 98 | 93 | ||
| Palindromes | Number of palindromes longer than 3 nucleotides in the precursor sequence | 11 | 16 | 79 | 68 | 2.22E-16 |
| freqC_seed | Frequency of C in the seed region | 82 | 26 | 70 | ||
| %G+C content_m | Normalized frequency of G and C on mature miRNA | 93 | 70 | 37 | 3.90E-09 |
The numbers in the table represent the ranks of the selected features in the corresponding classifiers (A: Animalia versus others, P: Plantae versus others, F: FPV versus others; H: human blood secretory miRNAs versus other human miRNAs); ranks of unselected features are not shown. The last column, “adj-P”, gives the adjusted p-value of the corresponding feature in the blood secretory prediction, computed with the Wilcoxon signed-rank test (insignificant p-values are not shown). The complete list is given in Table B in S1 File.
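The note does not state which multiple-testing correction produced the adj-P values. One common choice is the Benjamini-Hochberg false discovery rate adjustment, sketched below as an assumption rather than the authors' documented procedure; the function name and example p-values are illustrative.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the raw p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In practice the raw p-values would come from one test per feature (here, the Wilcoxon test named in the note) before the adjustment is applied across all tested features.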
The 221 features selected for the final ranking of possible circulating miRNAs, categorized into eight groups according to feature type and the sequence region involved.
| Feature groups | Number of selected features | Feature list |
|---|---|---|
| Frequency in seed region | 28 | AG, AGGU, C, CAGC, CAUC, CC, CCA, CCAG, CCAU, CCCA, CUUC, GA, GAG, GAGG, GCA, GCAG, GGU, GGUA, GU, GUA, GUAG, UA, UAG, UCC, UCCA, UGAG, UUC, UUCC |
| Frequency in mature miRNA | 63 | ACG, ACGG, AG, AGC, AGCU, AGG, AGGU, AGU, AGUA, AGUU, AUA, AUAG, AUCG, C, CAGU, CAUA, CC, CCA, CCAU, CCCA, CCG, CCGA, CG, CGA, CGAC, CGG, CGGA, CU, CUCC, CUG, GA, GAC, GACG, GAGC, GAGG, GCAC, GCUC, GGG, GGU, GGUA, GGUU, GU, GUA, GUAG, GUU, GUUG, U, UA, UAG, UAGG, UAGU, UC, UCC, UCCG, UCG, UG, UGA, UGU, UGUA, UGUG, UUCC, UUG, UUGU |
| Frequency in precursor sequence | 80 | ACCC, ACG, ACGA, ACGG, ACUA, AGCC, AGCG, AGG, AGGU, AGU, AGUA, AUA, AUAC, AUCG, C, CACG, CAG, CAGG, CAGU, CC, CCA, CCG, CCGA, CCGC, CCUG, CG, CGA, CGAC, CGAG, CGCC, CGG, CGGA, CGU, CGUG, CUA, CUAG, CUAU, CUCG, CUG, CUGG, CUGU, G, GA, GAAU, GACG, GCG, GCGA, GCUC, GCUG, GGCC, GGCG, GGU, GGUA, GGUU, GU, GUA, GUAG, GUUG, U, UA, UAC, UACG, UAG, UAGC, UAGG, UAGU, UAUA, UC, UCC, UCCG, UCG, UCGU, UCU, UG, UGCU, UGG, UGGC, UGGG, UUG, UUGU |
| Frequency of 3 nucleotides in stem-loop structure | 16 | A(((, A((., A(.(, A.((, A.(., A..., C(((, C(.(, C.((, C..., G(((, G((., G(.(, G(., G..., U((( |
| Structure indicators | 14 | 5 predicted shape-type probabilities based on RNAshapes, MFE, Normalized_MFE, EFE, Normalized_EFE, freqMFEStructures, MFEI1, MFEI3, MFEI4, Shannon_Entropy |
| Stems/Pairs | 12 | %pairAU, %pairGC, %pairGU, %A_unpaired, %C_unpaired, %G_unpaired, max_stem_length, %G+C_stem, pairs, %pairs, %stems, Base-pairing-propensity |
| Percentage of nucleotides | 4 | %A+U_P, %A+U_m, %G+C content_P, %G+C content_m |
| Length/Palindromes | 4 | Length_m, Length_P, palindromes_P, palindromes_seed |
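Several of these feature groups can be illustrated with small helper functions. This is a sketch under stated assumptions: the seed region is taken as nucleotides 2 to 8 of the mature miRNA (the usual definition, not stated in this section); "normalized frequency" is assumed to mean a k-mer count divided by the number of overlapping windows of length k; and the structure triplets follow the table's "led by" wording (nucleotide at the start of a 3-symbol window), with ")" folded into "(" as in triplet-SVM-style features. All function names are hypothetical.

```python
from collections import Counter


def seed_region(mature: str) -> str:
    """Nucleotides 2-8 of the mature miRNA (assumed seed definition)."""
    return mature[1:8]


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Overlapping k-mer counts divided by the number of k-length windows."""
    seq = seq.upper().replace("T", "U")
    windows = len(seq) - k + 1
    if windows <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(windows))
    return {kmer: n / windows for kmer, n in counts.items()}


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides (the %G+C content features)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def structure_triplets(seq: str, dot_bracket: str) -> dict[str, float]:
    """Frequency of keys such as 'G(((': the nucleotide leading a window of
    three structure symbols, with ')' treated the same as '(' (paired)."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure lengths differ")
    structure = dot_bracket.replace(")", "(")
    windows = len(seq) - 2
    if windows <= 0:
        return {}
    counts = Counter(seq[i].upper() + structure[i:i + 3] for i in range(windows))
    return {key: n / windows for key, n in counts.items()}


# Illustrative inputs only.
mature = "UGAGGUAGUAGGUUGUAUAGUU"
print(seed_region(mature))
print(kmer_frequencies(seed_region(mature), 2))
print(round(gc_content(mature), 3))
print(structure_triplets("GGGAAACCC", "(((...)))"))
```

Structure-based indicators such as MFE or EFE would instead come from an RNA folding tool; they are not computed here.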
Statistics of the top-ranked miRNA entries in the ranking list with respect to their origins.
| Entries | Animalia | Plantae | Viruses | Fungi | Protista | Dietary (DMD) |
|---|---|---|---|---|---|---|
| All miRNAs | 26705 (77.16%) | 7645 (22.09%) | 152 (0.44%) | 84 (0.24%) | 26 (0.08%) | 5217 (15.07%) |
| Top 500 | 499 (99.8%) | 1 (0.2%) | 0 | 0 | 0 | 14 (2.8%) |
| Top 1,000 | 962 (96.2%) | 30 (3%) | 8 (0.8%) | 0 | 0 | 62 (6.2%) |
| Top 3,000 | 2812 (93.7%) | 163 (5.43%) | 25 (0.83%) | 0 | 0 | 273 (9.1%) |
| Top 5,000 | 4678 (93.56%) | 295 (5.9%) | 27 (0.54%) | 0 | 0 | 519 (10.38%) |
| Top 10,000 | 9269 (92.69%) | 670 (6.7%) | 55 (0.55%) | 4 (0.04%) | 2 (0.02%) | 1024 (10.24%) |
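The percentages in this table are simple proportions within each top-k slice of the ranking (for example, 499 of the top 500 entries are animal miRNAs, i.e. 99.8%). A minimal sketch, assuming the ranking is available as a list of kingdom labels ordered from the highest to the lowest predicted likelihood of transport; the input format and the function name `origin_composition` are hypothetical.

```python
from collections import Counter


def origin_composition(ranked_kingdoms: list[str],
                       top_k: int) -> dict[str, tuple[int, float]]:
    """Count and percentage of each kingdom among the top_k ranked miRNAs.

    ranked_kingdoms: kingdom labels ordered by decreasing predicted
    likelihood of being transportable (hypothetical input format)."""
    top = ranked_kingdoms[:top_k]
    counts = Counter(top)
    return {kingdom: (n, 100.0 * n / len(top)) for kingdom, n in counts.items()}


# Hypothetical ranking whose first 500 entries mirror the Animalia/Plantae counts.
ranking = ["Animalia"] * 499 + ["Plantae"] + ["Viruses"] * 10
print(origin_composition(ranking, 500))
```

Kingdoms absent from a slice simply do not appear in the result, which corresponds to the zero cells in the table.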
Gene targets and functional analysis of the top three predicted transportable miRNAs from cow’s milk, EBV, and rLCV.
| Dietary miRNAs | Human homologs with identical sequences | Experimentally validated targets | Predicted targets | Enriched pathway examples | Enriched GO term examples |
|---|---|---|---|---|---|
| bta-miR-487b | hsa-miR-487b-3p | 46 | 468 | Axon guidance (4.2E-2); Regulation of actin cytoskeleton (4.2E-2); Butanoate metabolism (5.2E-2); MAPK signaling pathway (3.4E-2). | Cell migration (8.0E-3); Positive regulation of locomotion (9.8E-3); Localization of cell (1.2E-2); Protein kinase activity (1.8E-2). |
| bta-miR-181b | hsa-miR-181b-5p | 1077 | 2084 | Glioma (5.8E-3); Melanoma (8.1E-3); p53 Signaling Pathway (2.6E-2); Prostate cancer (1.0E-5). | Organelle lumen (7.3E-9); Nuclear envelope (9.5E-5); Intracellular non-membrane-bounded organelle (1.4E-4). |
| bta-miR-421 | hsa-miR-421 | 793 | 808 | Huntington's disease (3.5E-2); Hypoxia and p53 in the Cardiovascular system (4.6E-2); Apoptotic Signaling in Response to DNA Damage (3.6E-2). | Endomembrane system (4.6E-5); Nuclear lumen (7.8E-5); Establishment of protein localization (1.6E-4); Intracellular transport (2.8E-4). |
| ebv-mir-BART13-3p | - | - | 208 | Long-term potentiation (3.3E-2); Neurotrophin signaling pathway (3.9E-2); ErbB signaling pathway (6.1E-2); Melanogenesis (8.3E-2). | Phosphate metabolic process (1.2E-4); Wnt receptor signaling pathway (4.7E-2); Phosphorylation (1.3E-3); Enzyme linked receptor protein signaling pathway (2.6E-3). |
| ebv-mir-BART8-3p | - | - | 549 | Aldosterone-regulated sodium reabsorption (6.5E-3); Growth factors Survival factors Mitogens (2.3E-2); Prostate cancer (4.5E-2); Renal cell carcinoma (5.3E-2). | Nuclear lumen (5.6E-5); Transcriptional repressor complex (6.2E-5); Synapse (8.8E-5); Nucleoplasm (8.8E-5). |
| ebv-mir-BART9-3p | - | - | 568 | Adherens junction (1.4E-2); Endocytosis (2.0E-2); Signaling Pathway from G-Protein Families (5.4E-2); Cell cycle (5.9E-2). | Membrane raft (1.5E-4); Regulation of transcription (1.8E-4); Regulation of transcription from RNA polymerase II promoter (3.4E-4); Positive regulation of macromolecule metabolic process (3.8E-4). |
| rLCV-mir-rl1-16-3p | - | - | 466 | Fc gamma R-mediated phagocytosis (5.8E-4); Ubiquitin mediated proteolysis (6.0E-4); Endocytosis (6.6E-4); Melanoma (1.9E-3). | Regulation of endocytosis (1.0E-4); Protein modification by small protein conjugation or removal (1.4E-4); Extrinsic to membrane (5.7E-4). |
| rLCV-mir-rl1-16-5p | - | - | 403 | Regulation of actin cytoskeleton (5.6E-4); Ca++/Calmodulin-dependent Protein Kinase Activation (7.9E-3); Vascular smooth muscle contraction (9.3E-3); Focal adhesion (9.9E-3). | Positive regulation of cell migration (1.2E-4); Cytoskeleton (2.0E-4); Positive regulation of locomotion (2.6E-4). |
| rLCV-mir-rl1-7-3p | - | - | 228 | Axon guidance (6.9E-3); Adherens junction (3.5E-2); Metabolism of Anandamide (5.7E-2). | Intrinsic to membrane (1.8E-4); Alkali metal ion binding (9.0E-4); Plasma membrane part (1.6E-3); Metal ion transport (1.7E-3). |
Experimentally validated targets were collected from CLASH, miRTarBase and DIANA-TarBase; the complete lists of enriched pathways and GO terms are given in Table D in S1 File.
Fig 3. Multiple sequence alignment of the binding regions of 15 miR-487b targets in cow (_bta) and 46 targets in human (_hsa).
Fig 4. Regulatory network of bta-miR-487b in human.
Blue octagons indicate genes involved in the MAPK signaling pathway (adjusted Fisher's exact test p-value = 0.034); purple circles, genes involved in regulation of actin cytoskeleton (p-value = 0.042); green triangles, genes involved in axon guidance (p-value = 0.042); pink squares, genes involved in butanoate metabolism (p-value = 0.052). Small light-blue circles represent other predicted targets of miR-487b.
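The pathway p-values in this caption come from a Fisher test for over-representation of miR-487b targets in each pathway. The sketch below shows only the unadjusted one-sided test, written as the equivalent hypergeometric upper tail; the gene-universe size, the correction applied afterwards, and the function name `enrichment_pvalue` are assumptions, and the numbers in the usage line are hypothetical.

```python
from math import comb


def enrichment_pvalue(total_genes: int, pathway_genes: int,
                      targets: int, overlap: int) -> float:
    """One-sided Fisher's exact test (over-representation) p-value, computed as
    the hypergeometric upper tail P(X >= overlap) when `targets` genes are drawn
    without replacement from `total_genes`, of which `pathway_genes` belong to
    the pathway."""
    upper = min(pathway_genes, targets)
    tail = sum(
        comb(pathway_genes, x) * comb(total_genes - pathway_genes, targets - x)
        for x in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, targets)


# Hypothetical numbers: 3 targets drawn from 10 genes, all 3 of the pathway's
# genes hit, so p = 1 / C(10, 3) = 1/120.
print(enrichment_pvalue(total_genes=10, pathway_genes=3, targets=3, overlap=3))
```

Each pathway's raw p-value would then be adjusted across all pathways tested, which is how "adjusted" p-values such as 0.034 arise.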