Wing-Sze Leung, Marie C M Lin, David W Cheung, S M Yiu.
Abstract
BACKGROUND: MicroRNAs (miRNAs) are small non-coding RNA gene products whose roles vary from species to species. The rapid growth of microRNA research in recent years reflects the importance of microRNAs in biological systems, and microRNAs are believed to have valuable therapeutic potential in human diseases. Continued efforts are therefore required to locate and verify the unknown microRNAs in various genomes. Because many miRNAs are arranged in clusters, meaning that they lie in close proximity to their neighboring miRNAs, we are interested in applying the concept of microRNA clustering to computational microRNA prediction.
Year: 2008 PMID: 19091026 PMCID: PMC2638143 DOI: 10.1186/1471-2105-9-S12-S3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Effect of the chromosomal separation distance between clustered miRNAs on the number of clustered miRNAs found in the human, mouse and rat genomes.
| Genome | Measure | 1500 nt | 3000 nt | 6000 nt | 10000 nt | 25000 nt | 50000 nt |
| Human | # of clustered miRNAs | 196 | 217 | 240 | 241 | 242 | 261 |
| | # of clusters defined | 71 | 68 | 60 | 60 | 60 | 65 |
| | Average cluster size | 2.76 | 3.19 | 4 | 4.02 | 4.03 | 4.02 |
| Mouse | # of clustered miRNAs | 204 | 215 | 237 | 243 | 253 | 260 |
| | # of clusters defined | 70 | 53 | 55 | 57 | 58 | 61 |
| | Average cluster size | 2.91 | 4.06 | 4.31 | 4.26 | 4.36 | 4.26 |
| Rat | # of clustered miRNAs | 119 | 129 | 145 | 154 | 154 | 160 |
| | # of clusters defined | 46 | 45 | 48 | 49 | 47 | 48 |
| | Average cluster size | 2.59 | 2.87 | 3.04 | 3.14 | 3.28 | 3.14 |
"Average cluster size" is the average number of miRNAs found in a single cluster. The number of clustered miRNAs rises abruptly between the 3000 nt case and the 6000 nt case. Once the separation exceeds 10000 nt, it has little effect on the number of clustered miRNAs or on the number of clusters defined. We therefore conclude that, among the six distances tested, 6000 nt is an optimal upper bound on the chromosomal distance separating two clustered miRNAs.
Figure 1. Our definition of a cluster. MiRNAs which are separated by a distance of less than 6000 nt are grouped as one cluster.
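The cluster definition above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each miRNA is given as a hypothetical `(name, start, end)` record on a single chromosome strand, and that the separation is measured from the end of one miRNA to the start of the next; the source does not state how the distance is measured. The record names and coordinates are invented for illustration.

```python
from typing import List, Tuple

Record = Tuple[str, int, int]  # (name, start, end); hypothetical layout

MAX_GAP_NT = 6000  # distance bound taken from the cluster definition


def cluster_by_distance(
    records: List[Record], max_gap: int = MAX_GAP_NT
) -> Tuple[List[List[Record]], List[Record]]:
    """Split records into clusters (two or more members) and isolated miRNAs.

    Assumption: a miRNA joins the current group when the gap between its
    start and the furthest end seen in that group is below max_gap.
    """
    groups: List[List[Record]] = []
    furthest_end = 0
    for rec in sorted(records, key=lambda r: r[1]):
        if groups and rec[1] - furthest_end < max_gap:
            groups[-1].append(rec)
            furthest_end = max(furthest_end, rec[2])
        else:
            groups.append([rec])
            furthest_end = rec[2]
    clusters = [g for g in groups if len(g) >= 2]
    isolated = [g[0] for g in groups if len(g) == 1]
    return clusters, isolated


# Invented coordinates, for illustration only.
records = [
    ("mir-a", 1000, 1080),
    ("mir-b", 2500, 2590),
    ("mir-c", 20000, 20085),
]
clusters, isolated = cluster_by_distance(records)
print([[r[0] for r in c] for c in clusters])
print([r[0] for r in isolated])
```

Here `mir-a` and `mir-b` are 1420 nt apart and form one cluster, while `mir-c` lies more than 6000 nt from its nearest neighbor and is reported as isolated.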
The number of clustered miRNAs and isolated miRNAs found in the human, mouse and rat genomes using our definition of miRNA cluster.
| | | Human | Mouse | Rat |
| Version of miRBase | | 10.0 | 10.0 | 10.1 |
| Total # of clusters | | 60 | 55 | 48 |
| Size of clusters | Min | 2 | 2 | 2 |
| | Mean | 4 | 4.31 | 3.04 |
| | Max | 43 | 52 | 17 |
| Isolated miRNAs | # | 288 | 220 | 154 |
| | % | 54.55% | 48.14% | 51.33% |
| Clustered miRNAs | # | 240 | 237 | 146 |
| | % | 45.45% | 51.86% | 48.67% |
| Total # of miRNAs | | 528 | 457 | 300 |
"Version of miRBase" denotes the miRBase release from which each dataset was downloaded. "Size of clusters" is the number of miRNAs found in a single cluster. There are 60, 55 and 48 clusters identified in the human, mouse and rat genomes respectively, and the miRNAs in these clusters account for 45.45%, 51.86% and 48.67% of the total human, mouse and rat miRNAs.
Figure 2. Similarity analyses of a clustered miRNA with four groups of sequences. A clustered miRNA is aligned with sequences from four categories: (i) miRNA(s) in the same cluster; (ii) miRNAs outside its cluster; (iii) random sequences extracted from the genome; and (iv) random sequences extracted from its flanking 3000 nt region.
Results of the similarity analyses between clustered miRNAs and other sequences.
| | | Sequence alignment | | | | Secondary structural alignment | | | |
| Genome | Category | Maximum | Average | Minimum | Std Dev | Maximum | Average | Minimum | Std Dev |
| Human | (i) | 79 | 45.20 | 17 | 15.70 | 71 | 28.86 | 0 | 9.73 |
| | (ii) | 73 | 46.30 | 15 | 12.96 | 94 | 37.21 | 15 | 11.29 |
| | (iii) | 74 | 45.47 | 17 | 14.13 | 203 | 140.89 | 104 | 18.78 |
| | (iv) | 77 | 44.81 | 15 | 12.60 | 134 | 75.55 | 28 | 22.19 |
| Mouse | (i) | 79 | 43.63 | 0 | 14.40 | 72 | 30.41 | 0 | 9.33 |
| | (ii) | 78 | 45.28 | 16 | 13.03 | 69 | 35.19 | 10 | 9.67 |
| | (iii) | 75 | 43.72 | 16 | 14.40 | 239 | 139.93 | 103 | 17.31 |
| | (iv) | 72 | 45.37 | 15 | 13.33 | 128 | 69.55 | 26 | 19.31 |
| Rat | (i) | 75 | 44.28 | 17 | 12.83 | 113 | 31.41 | 11 | 12.60 |
| | (ii) | 72 | 45.18 | 16 | 13.12 | 116 | 35.68 | 13 | 11.73 |
| | (iii) | 69 | 45.03 | 16 | 12.48 | 198 | 143.12 | 110 | 18.43 |
| | (iv) | 73 | 47.90 | 15 | 13.26 | 114 | 74.69 | 32 | 18.74 |
Sequence and secondary structural alignments are performed for each clustered miRNA with sequences from the following categories: (i) clustered miRNAs, (ii) non-clustered miRNAs, (iii) random and (iv) neighboring sequences. A higher score implies a greater distance and hence a higher degree of dissimilarity. Std Dev, standard deviation.
Results of the performance analyses of ProMirII-g and miR-abela using human, mouse and rat genome data.
| Program | Measure | Human | Mouse | Mouse (distinct) | Rat |
| ProMirII-g | # of predictions | 690 | 656 | 640 | 615 |
| | # of TPs | 215 | 199 | 183 | 127 |
| | # of FPs | 475 | 457 | 457 | 485 |
| | # of real miRNAs missed | 25 | 46 | 29 | 19 |
| | SE | 89.58% | 81.22% | 85.92% | 86.99% |
| | PPV | 31.16% | 30.34% | 28.59% | 20.65% |
| miR-abela | # of predictions | 1036 | 915 | 901 | 646 |
| | # of TPs | 149 | 140 | 126 | 86 |
| | # of FPs | 887 | 775 | 775 | 560 |
| | # of real miRNAs missed | 91 | 105 | 86 | 60 |
| | SE | 62.08% | 57.14% | 59.43% | 58.90% |
| | PPV | 14.38% | 15.30% | 13.98% | 13.31% |
It should be noted that in the mouse genome, 25 of the pre-miRNAs are duplicated. In other words, only 212 mouse pre-miRNAs are distinct in the genome. To avoid overestimation of the performance of the software, we identify the duplicated ones and conduct two measurements. #, number; FP, false positives; TP, true positives; SE, sensitivity; PPV, positive predictive value.
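As a worked check of the two measures, the sketch below recomputes SE and PPV from the counts in the first column of the ProMirII-g rows. It assumes the standard definitions, SE = TP / (TP + missed) and PPV = TP / (TP + FP); the function names are illustrative only.

```python
def sensitivity(tp: int, missed: int) -> float:
    """Fraction of real miRNAs that were predicted: TP / (TP + missed)."""
    return tp / (tp + missed)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of predictions that are real miRNAs: TP / (TP + FP)."""
    return tp / (tp + fp)


# Counts from the first column of the ProMirII-g rows in the table above.
tp, fp, missed = 215, 475, 25

se = sensitivity(tp, missed)
ppv = positive_predictive_value(tp, fp)
print(f"SE  = {se:.2%}")   # SE  = 89.58%
print(f"PPV = {ppv:.2%}")  # PPV = 31.16%
```

The number of predictions is TP + FP (215 + 475 = 690), and the number of real miRNAs is TP + missed (215 + 25 = 240).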
Figure 3. Overview of the clustering-based approach. In principle, our clustering-based approach consists of two stages. In stage 1, a currently available prediction program, for example ProMirII-g, is selected to produce a list of potential candidates. A loose threshold is used because we want to include as many TPs as possible to achieve a high SE. In stage 2, we aim at filtering the FPs from the list of candidates by picking out the dissimilar pairs as determined by the RNAdistance scores.
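The stage 2 idea can be sketched as follows. This is a rough illustration under several assumptions, not the published procedure: the candidate layout, the score threshold `max_score`, the rule "keep a candidate if at least one nearby candidate is structurally similar", and the toy distance function are all hypothetical. In particular, `toy_distance` is only a stand-in for an RNAdistance score and does not reproduce it.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, int, str]  # (name, position, dot-bracket structure); hypothetical


def toy_distance(a: str, b: str) -> float:
    """Stand-in for a structural distance: mismatches plus length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def filter_candidates(
    candidates: List[Candidate],
    distance: Callable[[str, str], float],
    max_gap: int = 6000,      # cluster distance bound from the text
    max_score: float = 3.0,   # hypothetical similarity threshold
) -> List[Candidate]:
    """Keep candidates that have a structurally similar neighbor within max_gap."""
    kept = []
    for i, (_, pos_i, struct_i) in enumerate(candidates):
        for j, (_, pos_j, struct_j) in enumerate(candidates):
            if i == j or abs(pos_i - pos_j) >= max_gap:
                continue
            if distance(struct_i, struct_j) <= max_score:
                kept.append(candidates[i])
                break
    return kept


# Invented stage 1 output, for illustration only.
stage1 = [
    ("cand-1", 100, "((((....))))"),
    ("cand-2", 900, "(((......)))"),
    ("cand-3", 1500, "............"),
]
stage2 = filter_candidates(stage1, toy_distance)
print([name for name, _, _ in stage2])
```

In this toy input, `cand-1` and `cand-2` differ at two positions and are both kept, while `cand-3` is at least six positions away from either neighbor and is discarded as a likely FP.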
Results of our clustering-based approach on the human, mouse and rat genome data.
| Measure | Human | Mouse | Mouse (distinct) | Rat |
| # of predictions | 374 | 381 | 359 | 276 |
| # of TPs | 196 | 172 | 165 | 101 |
| # of real miRNAs | 240 | 245 | 212 | 146 |
| SE | 81.67% | 70.20% | 77.83% | 69.18% |
| Change in SE | -7.91% | -11.02% | -8.09% | -17.81% |
| PPV | 52.41% | 45.14% | 45.96% | 36.59% |
| Change in PPV | +21.25% | +14.81% | +17.37% | +15.94% |
This table shows the SE and PPV obtained after our filtering approach is applied to the predicted candidates generated by ProMirII-g. The PPV increases by roughly 15 to 21 percentage points in all three genomes. The SE remains reasonably high: it falls by less than 10 percentage points in the human genome and, when duplicated pre-miRNAs are counted only once, in the mouse genome.
Results of our clustering-based approach when applied on clusters with more than one TP.
| | Human | | Mouse | | Mouse (distinct) | | Rat | |
| Before/After filtering | before | after | before | after | before | after | before | after |
| # of predictions | 620 | 335 | 632 | 363 | 616 | 350 | 525 | 224 |
| # of TPs | 206 | 189 | 195 | 168 | 179 | 155 | 121 | 99 |
| # of real miRNAs | 220 | 220 | 234 | 234 | 202 | 202 | 132 | 132 |
| SE | 93.64% | 85.91% | 83.33% | 71.79% | 88.61% | 76.73% | 91.67% | 75.00% |
| Change in SE | - | -7.73% | - | -11.54% | - | -11.88% | - | -16.67% |
| PPV | 33.23% | 56.42% | 30.85% | 46.28% | 29.06% | 44.29% | 23.05% | 44.20% |
| Change in PPV | - | +23.19% | - | +15.43% | - | +15.23% | - | +21.15% |
If we exclude the clusters which bear no TPs or just one TP among the candidates predicted by ProMirII-g, we can see a great improvement in PPV without a significant effect on SE after our filtering approach is applied. The results agree with the principle of our approach, which is developed based on the phenomenon of miRNA clustering. In other words, if there are no clustered miRNAs in a sequence, our approach is not going to work properly. This table presents the results of a fairer comparison, suggesting that our approach is effective in filtering FPs.