Martin Sturm, Michael Hackenberg, David Langenberger, Dmitrij Frishman.
Abstract
BACKGROUND: Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently, however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites.
Year: 2010 PMID: 20509939 PMCID: PMC2889937 DOI: 10.1186/1471-2105-11-292
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Classification of microRNA target site prediction tools
| Organism | Seed match not required | Seed match required | Seed match required and conservation considered |
|---|---|---|---|
| Human | RNA22 | PITA All 3/15 | EIMMo |
| Fly | RNA22 | PITA All 3/15 | EIMMo |
Figure 1. A schematic overview of the TargetSpy method. Annotated 3'UTR sequences and all known microRNAs from a given species serve as input. MicroRNAs are matched against 3'UTRs to generate potential candidate zones. The resulting candidate zones are classified and ranked according to their score, and overlapping zones are merged.
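The ranking and merging steps named in the caption can be illustrated with a minimal sketch. It assumes, hypothetically, that each candidate zone is a `(start, end, score)` tuple with inclusive 3'UTR coordinates and that a merged zone keeps the highest score of its members; the source does not state either convention.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end, score), coordinates inclusive.
Zone = Tuple[int, int, float]


def merge_overlapping_zones(zones: List[Zone]) -> List[Zone]:
    """Merge zones whose coordinate ranges overlap, keeping the highest score."""
    merged: List[Zone] = []
    for start, end, score in sorted(zones):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous zone: extend it and keep the better score.
            prev_start, prev_end, prev_score = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), max(prev_score, score))
        else:
            merged.append((start, end, score))
    return merged


def rank_zones(zones: List[Zone]) -> List[Zone]:
    """Order zones from highest to lowest score."""
    return sorted(zones, key=lambda zone: zone[2], reverse=True)
```

For example, `rank_zones(merge_overlapping_zones(zones))` returns one entry per non-overlapping region, best-scoring first.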
Number of target sites predicted in each species by different versions of the TargetSpy method. The last four columns give the number of predicted target sites. See Methods for more detail.
| Species | Number of 3'UTRs | Average 3'UTR length | Number of microRNAs | TargetSpy no-seed sens | TargetSpy no-seed spec | TargetSpy seed sens | TargetSpy seed spec |
|---|---|---|---|---|---|---|---|
| Human | 26161 | 1210 | 692 | 4837 k | 1023 k | 829 k | 339 k |
| Mouse | 18694 | 1082 | 513 | 1906 k | 407 k | 340 k | 137 k |
| Rat | 11859 | 760 | 292 | 535 k | 113 k | 91 k | 36 k |
| Chicken | 3676 | 927 | 443 | 372 k | 80 k | 59 k | 24 k |
| Fly | 15884 | 471 | 147 | 247 k | 54 k | 50 k | 20 k |
A ranked list of all features used in this work. The score is calculated by the ReliefF method.
| Rank | Features | Score |
|---|---|---|
| 1 | Number of base pairings to the microRNA 8-mer seed | 0.03175 |
| 2 | G+C content of target site | 0.01263 |
| 3 | Number of base pairings to the first 8 nucleotides of the microRNA 3' end | 0.01038 |
| 4 | Number of consecutive base-pairings to the microRNA 3' end with two allowed non-pairing positions | 0.00995 |
| 5 | Occurrence of CpG in target site | 0.00799 |
| 6 | G+C content ratio between the microRNA and the target site | 0.00642 |
| 7 | Compactness | 0.00619 |
| 8 | T9 anchor | 0.00556 |
| 9 | Longest stretch of consecutive base-pairings in the hybrid | 0.00513 |
| 10 | Number of bulges in the microRNA of size three | 0.00498 |
| 11 | T1 S/W anchor | 0.00491 |
| 12 | Total number of base-pairings | 0.00475 |
| 13 | Number of bulges on the target site of size seven or greater | 0.00442 |
| 14 | T1 anchor | 0.00434 |
| 15 | Number of bulges in the microRNA of size two | 0.00433 |
| 16 | Occurrence of CpG in the upstream flanking area | 0.00383 |
| 17 | Number of bulges in the target site of size one | 0.00374 |
| 18 | Total bulge length of the target site | 0.00362 |
| 19 | Length of the target site | 0.00336 |
| 20 | Total bulge length of the microRNA | 0.00334 |
| 21 | Target site position within the 3'UTR | 0.00333 |
| 22 | Number of symmetric bulges | 0.00290 |
| 23 | G+C content upstream of the target site | 0.00287 |
| 24 | Number of bulges on the target site | 0.00286 |
| 25 | Length of the second largest bulge on the target site | 0.00268 |
| 26 | Mean length of bulges on the target site | 0.00263 |
| 27 | T9 S/W anchor | 0.00261 |
| 28 | Binding asymmetry | 0.00255 |
| 29 | Number of bulges in the target site of size two | 0.00240 |
| 30 | Total number of G:U wobble base pairs | 0.00227 |
| 31 | Local RISC accessibility 30/30 | 0.00220 |
| 32 | Local RISC accessibility 3/15 | 0.00215 |
| 33 | Number of bulges in the target site of size four | 0.00210 |
| 34 | Difference in G+C content between the first and the last nt of the target site | 0.00201 |
| 35 | Occurrence of CpG in downstream flanking area | 0.00179 |
| 36 | Number of bulges in the microRNA of size one | 0.00179 |
| 37 | Length of the second largest bulge on the microRNA | 0.00174 |
| 38 | Number of bulges on the microRNA | 0.00153 |
| 39 | Number of bulges in the microRNA of size five | 0.00128 |
| 40 | Number of bulges in the target site of size three | 0.00113 |
| 41 | Difference in G+C content between the target site and the 20 nt upstream and downstream flanking region | 0.00112 |
| 42 | Number of bulges in the target site of size five | 0.00100 |
| 43 | Number of bulges in the microRNA of size four | 0.00084 |
| 44 | G+C content downstream of the target site | 0.00084 |
| 45 | Number of bulges in the target site of size six | 0.00021 |
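Several of the ranked features are simple sequence statistics. The following is a minimal sketch of three of them (G+C content, CpG occurrence, and the G+C content ratio between microRNA and target site). It assumes sequences are plain strings over A/C/G/U (or T) and that CpG occurrence is counted as the number of `CG` dinucleotides; the function names are hypothetical and the exact definitions used by the authors are not given here.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in the sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def cpg_count(seq: str) -> int:
    """Number of CG dinucleotides, counted at every position of the sequence."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def gc_ratio(mirna: str, site: str) -> float:
    """G+C content of the microRNA divided by that of the target site."""
    site_gc = gc_content(site)
    if site_gc == 0.0:
        return float("inf")  # assumption: undefined ratio reported as infinity
    return gc_content(mirna) / site_gc
```

The same `gc_content` and `cpg_count` helpers could be applied to the upstream and downstream flanking regions to obtain the corresponding flank features.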
Figure 2. Classifier performance as a function of the feature set size. The classifier was evaluated iteratively, adding one feature at a time. Features were added in the order of the ranked feature list (see Table 3), beginning with the best feature. The AUC values (y-axis) for each feature set size (x-axis) are shown in black. The red line indicates the AUC value achieved with the feature set chosen by the feature subset selection approach.
Applied thresholds and limitations on the prediction subsets
| Prediction dataset name | Seed match required | Conservation considered | False-positive rate threshold |
|---|---|---|---|
| TargetSpy no-seed sens | No | No | 0.05 |
| TargetSpy no-seed spec | No | No | 0.01 |
| TargetSpy seed sens | Yes | No | 0.05 |
| TargetSpy seed spec | Yes | No | 0.01 |
| TargetSpy cons. seed sens | Yes | Yes | 0.05 |
| TargetSpy cons. seed spec | Yes | Yes | 0.01 |
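The false-positive rate thresholds in the table can be translated into classifier score cutoffs. The sketch below shows one way to do this, assuming (hypothetically) that higher scores indicate more likely target sites and that scores for a set of known negative examples are available; it is not taken from the TargetSpy implementation.

```python
from typing import Sequence


def score_cutoff_for_fpr(negative_scores: Sequence[float], max_fpr: float) -> float:
    """Return a cutoff such that at most max_fpr of the negatives score above it.

    A site is called positive when its score is strictly greater than the cutoff.
    """
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # negatives tolerated above the cutoff
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]


def false_positive_rate(negative_scores: Sequence[float], cutoff: float) -> float:
    """Fraction of negative examples scoring strictly above the cutoff."""
    return sum(score > cutoff for score in negative_scores) / len(negative_scores)
```

With this convention, a "sens" dataset would use `max_fpr=0.05` and a "spec" dataset `max_fpr=0.01`.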
Figure 3. Performance comparison of target prediction approaches. A) and C) refer to the dataset compiled by Stark et al. [17]. B) and D) refer to the dataset compiled by Kertesz [9]. A) and B) show the ROC curves of the tested approaches, C) and D) the AUC values. The gray line indicates the performance of random guessing.
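For readers unfamiliar with the AUC, the following minimal sketch computes it directly from its rank interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The function name and inputs are illustrative assumptions, not part of the evaluated tools.

```python
from typing import Sequence


def roc_auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positive_scores) * len(negative_scores))
```

A predictor that cannot separate the classes yields an AUC of 0.5, which corresponds to the random-guessing baseline drawn in gray.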
Figure 4. Performance evaluation of various prediction approaches on the pSILAC data set. This set contains changes in protein production caused by the five microRNAs miR-1, miR-16, miR-155, miR-30a-5p and let-7b. The first value in each bar represents the number of predicted microRNA-target interactions that are associated with down-regulation (log2-fold change < -0.1) and the second value reports the total number of interactions predicted for the pSILAC set. The value on top of each bar displays the accuracy. White bars with black outlines display the trivial predictors; TargetSpy is shown in orange and the other approaches in black.
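The three numbers reported per bar can be reproduced with a short sketch. It assumes that accuracy is the ratio of the first value to the second, which the caption implies but does not state outright, and that the input is the list of measured log2-fold changes for all interactions predicted by one approach.

```python
from typing import Sequence, Tuple

DOWN_REGULATION_CUTOFF = -0.1  # log2-fold change cutoff given in the caption


def psilac_accuracy(log2_fold_changes: Sequence[float]) -> Tuple[int, int, float]:
    """Return (down-regulated predictions, total predictions, accuracy)."""
    total = len(log2_fold_changes)
    down = sum(change < DOWN_REGULATION_CUTOFF for change in log2_fold_changes)
    accuracy = down / total if total else 0.0
    return down, total, accuracy
```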
Figure 5. Cumulative fraction of predicted target sites of down-regulated proteins according to the measured fold change. The distributions are given for A) approaches not requiring a seed (class I), B) approaches requiring a seed (class II), and C) approaches requiring a seed and considering site conservation (class III).
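A cumulative fraction curve of this kind can be computed as an empirical cumulative distribution. The sketch below assumes the curve gives, for each fold-change threshold, the fraction of predicted targets whose fold change is at or below that threshold; the direction of accumulation used in the figure is not stated here, so this is an assumption.

```python
from bisect import bisect_right
from typing import List, Sequence


def cumulative_fraction(fold_changes: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Fraction of fold changes that are <= each threshold."""
    ordered = sorted(fold_changes)
    if not ordered:
        return [0.0 for _ in thresholds]
    # bisect_right counts how many sorted values are <= the threshold.
    return [bisect_right(ordered, threshold) / len(ordered) for threshold in thresholds]
```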
Figure 6. Schematic illustration of the candidate target site generation pipeline. A) MicroRNA-mRNA duplexes sharing the same anchor position on the mRNA are grouped. Duplexes with the lowest free energy in each group are shown in green, all others in blue. B) Zoom-in on one group. The anchor of each hybrid (red vertical line) is the first nucleotide of the target site base-pairing with the 5' end of the microRNA. Only the energetically most favorable hybrid, shown in green, is retained for further analysis. C) Smoothed attraction graph of all the retained hybrids. A candidate zone is defined as the stretch of the target sequence (shown in purple) where the smoothed hybrid free energy falls below a certain energy threshold. D) For each candidate zone, the energetically most favorable hybrid that shows base pairing within the first two nucleotides counting from the microRNA 5' end is selected as its representative.
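Steps A) to D) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Hybrid` record, the moving-average smoothing, the window size, the treatment of positions without a hybrid as energy 0, and the threshold value are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Hybrid:
    anchor: int           # target position pairing with the microRNA 5' end
    energy: float         # hybrid free energy; lower is more favorable
    pairs_5p_start: bool  # True if one of the first two microRNA nt is paired


def best_per_anchor(hybrids: Iterable[Hybrid]) -> Dict[int, Hybrid]:
    """Steps A/B: keep only the lowest-energy hybrid for each anchor position."""
    best: Dict[int, Hybrid] = {}
    for hybrid in hybrids:
        if hybrid.anchor not in best or hybrid.energy < best[hybrid.anchor].energy:
            best[hybrid.anchor] = hybrid
    return best


def smoothed_profile(best: Dict[int, Hybrid], length: int, window: int = 5) -> List[float]:
    """Step C: moving average of the retained energies along the target."""
    raw = [best[i].energy if i in best else 0.0 for i in range(length)]
    half = window // 2
    profile = []
    for i in range(length):
        chunk = raw[max(0, i - half): i + half + 1]
        profile.append(sum(chunk) / len(chunk))
    return profile


def candidate_zones(profile: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Step C: maximal stretches where the smoothed energy is below the threshold."""
    zones: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, energy in enumerate(profile):
        if energy < threshold and start is None:
            start = i
        elif energy >= threshold and start is not None:
            zones.append((start, i - 1))
            start = None
    if start is not None:
        zones.append((start, len(profile) - 1))
    return zones


def zone_representative(zone: Tuple[int, int], best: Dict[int, Hybrid]) -> Optional[Hybrid]:
    """Step D: most favorable hybrid in the zone that pairs near the 5' end."""
    low, high = zone
    eligible = [h for a, h in best.items() if low <= a <= high and h.pairs_5p_start]
    return min(eligible, key=lambda h: h.energy) if eligible else None
```

Note that the representative is not necessarily the lowest-energy hybrid in the zone, because hybrids without pairing in the first two microRNA nucleotides are excluded in step D.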
Figure 7. ROC curves generated by various classifiers evaluated in 10-fold cross-validations on the training set.
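The 10-fold cross-validation scheme can be sketched as a simple index splitter. This is a generic illustration, assuming a random, unstratified partition with a fixed seed; whether the original evaluation stratified the folds is not stated here.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield train, test
```

Each sample appears in exactly one test fold, so the classifier scores collected over all folds can be pooled to draw a single ROC curve per classifier.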