Eric C Lai, Pavel Tomancak, Robert W Williams, Gerald M Rubin.
Abstract
BACKGROUND: MicroRNAs (miRNAs) are a large family of 21-22 nucleotide non-coding RNAs with presumed post-transcriptional regulatory activity. Most miRNAs were identified by direct cloning of small RNAs, an approach that favors detection of abundant miRNAs. Three observations suggested that miRNA genes might be identified using a computational approach. First, miRNAs generally derive from precursor transcripts of 70-100 nucleotides with extended stem-loop structure. Second, miRNAs are usually highly conserved between the genomes of related species. Third, miRNAs display a characteristic pattern of evolutionary divergence.
Year: 2003 PMID: 12844358 PMCID: PMC193629 DOI: 10.1186/gb-2003-4-7-r42
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. miRNA genes are isolated, evolutionarily conserved genomic sequences that have the capacity to form extended stem-loop structures as RNA. Shown are VISTA plots of globally aligned sequence from D. melanogaster and D. pseudoobscura, in which the degree of conservation is represented by the height of the peak. This particular region contains a conserved sequence identified in this study that adopts a stem-loop structure characteristic of known miRNAs. Expression of this sequence was confirmed by northern analysis (Table 2), and it was subsequently determined to be the fly ortholog of mammalian mir-184. Most conserved sequences do not have the ability to form extended stem-loops, as evidenced by the fold adopted by the sequence in the neighboring peak.
Figure 2. Classification of conserved stem-loop sequences. (a) Patterns of nucleotide divergence among Drosophila pre-miRNAs imply a canonical progression in miRNA evolution. The Drosophila orthologs of 23/24 previously described miRNAs are either completely conserved (class 1), contain one or more mismatches or gaps located exclusively in the loop (class 2) or contain an equal or greater number of mutations within the loop compared to the non-miRNA-encoding arm (class 3). We consider these to represent successive steps in the normal evolution of miRNAs and therefore connect them with arrows. Members of classes 1-3 are considered equally good candidates, while members of classes 4-6 are poor candidates. As we expect class 3 candidates to eventually evolve into class 6 candidates (broken arrow), these evolutionary considerations are most relevant to species separated by an evolutionary distance comparable to that between D. melanogaster and D. pseudoobscura. (b) Preferential divergence of miRNAs within their loop sequences is illustrated by let-7. The Drosophila orthologs of let-7 contain three mismatches and one gap within the loop, whereas both arms have been completely conserved.
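The class definitions in panel (a) can be sketched as a small decision function. This is only an illustration, not miRseeker itself: the function name, the use of per-region mutation counts (mismatches plus gaps) as inputs, and the requirement that the miRNA-encoding arm be unchanged for class 3 are assumptions. Classes 4-6 are not defined in the caption, so they are lumped together here as "poor candidate".

```python
from typing import Optional


def classify_divergence(loop: int, mirna_arm: int, other_arm: int) -> Optional[int]:
    """Assign a divergence class (1-3) from mutation counts per hairpin region.

    Each argument is the number of mismatches plus gaps between the two
    orthologous hairpins in that region. Returns None for patterns that fall
    outside classes 1-3 (the caption's classes 4-6, which it does not define).
    """
    if loop == 0 and mirna_arm == 0 and other_arm == 0:
        return 1  # completely conserved
    if mirna_arm == 0 and other_arm == 0:
        return 2  # changes located exclusively in the loop
    # Assumption: class 3 also requires an unchanged miRNA-encoding arm.
    if mirna_arm == 0 and loop >= other_arm:
        return 3  # loop has at least as many changes as the non-miRNA arm
    return None  # poor candidate (classes 4-6)


# let-7 from panel (b): three mismatches and one gap in the loop, arms conserved.
print(classify_divergence(loop=4, mirna_arm=0, other_arm=0))
```

Under these assumptions the let-7 example falls into class 2, matching the description of loop-only divergence in panel (b).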
Figure 3. Overview of miRseeker, a computational strategy for identifying Drosophila miRNAs. See text for details.
Table 1. List of Drosophila miRNAs and additional unverified candidates supported by third-species conservation
| Rank | Score | miR name | miR position | Cytological position | Sequence | Ano | Apis | Other | Nearest gene | Comment |
| 2 | 26.15 | miR-2a-2 (1) | 2L:19547562 | 37E | UAUCACAGCCAAGCUUUGAUGAGC | - | - | | In the intron of spi (sense) | [ |
| 3 | 26.00 | miR-2a-1 (2) | 2L:19547974 | 37E | UAUCACAGCCAGCUUUGAUGAGC | + | + | Worm | In the intron of spi (sense) | [ |
| 6 | 24.16 | miR-2b-2 (3) | 2L:19548259 | 37E | UAUCACAGCCAGCUUUGAGGAGC | + | + | | In the intron of spi (sense) | [ |
| (8) | 24.01* | miR-2b-1 | 2L:8250840 | 28B | UAUCACAGCCAGCUUUGAGGAGC | - | - | | 895 upstream of Btk29A | [ |
| 8 | 23.52 | miR-13b-2 (4) | X:8830202 | 8C | UAUCACAGCCAUUUUGACGAGU | - | - | | In the intron of CG7033 (sense) | [ |
| 9 | 23.45 | miR6-3 (5) | 2R:14724424 | 56E | UAUCACAGUGGCUGUUCUUUUU | - | - | | 1732 upstream of CG11018 | [ |
| 12 | 22.89 | miR-12 (6) | X:15240478 | 13D | UGAGUAUUACAUCAGGUACUGGU | + | + | | 1986 upstream of Ac13E | [ |
| 13 | 22.84 | miR-7 (7) | 2R:15669777 | 57A | UGGAAGACUAGUGAUUUUGUUGU | + | + | Vertebrate | 816 upstream of CG30147 | [ |
| 17 | 22.45 | miR-14 (8) | 2R:4614375 | 45E | UCAGUCUUUUUCUCUCUCCUA | + | + | | 5855 upstream of Or45b | [ |
| 27 | 20.94 | miR-9 (9) | 3L:19515075 | 76C | UCUUUGGUUAUCUAGCUGUAUGA | + | + | | 6462 upstream of Shal | [ |
| 29 | 20.77 | miR6-2 (10) | 2R:14724582 | 56E | UAUCACAGUGGCUGUUCUUUUU | - | - | | 1574 upstream of CG11018 | [ |
| 31 | 20.65 | miR6-1 (11) | 2R:14724711 | 56E | UAUCACAGUGGCUGUUCUUUUU | - | - | | 1445 upstream of CG11018 | [ |
| 33 | 20.38 | miR-13a (12) | 3R:11243269 | 88F | UAUCACAGCCAUUUUGAUGAGU | + | + | | 4626 upstream of CG6118 | [ |
| 36 | 20.08 | miR-5 (13) | 2R:14724858 | 56E | AAAGGAACGAUCGUUGUGAUAUG | - | - | | 1298 upstream of CG11018 | [ |
| 62 | 18.86 | let-7 (14) | 2L:18450101 | 36E | UGAGGUAGUAGGUUGUAUAGU | + | + | Vertebrate/worm | 932 upstream of CG10283 | [ |
| | 18.45* | miR-10 | 3R:2635277 | 84B | ACCCUGUAGAUCCGAAUUUGU | + | + | Vertebrate | 13566 upstream of Scr | [ |
| 74 | 18.42 | miR-1 (15) | 2L:20457182 | 38D | UGGAAUGUAAAGAAGUAUGGAG | + | - | Vertebrate/worm | 14444 upstream of CG15476 | [ |
| 96 | 17.79 | miR-3 (16) | 2R:14725313 | 56E | UCACUGGGCAAAGUGUGUCUCA | - | - | | 843 upstream of CG11018 | [ |
| 114 | 17.49 | miR-11 (17) | 3R:17439181 | 93E | CAUCACAGUCUGAGUUCUUGC | - | - | | In the intron of E2f (sense) | [ |
| 124 | 17.36 | miR-4 (18) | 2R:14724998 | 56E | AUAAAGCUAGACAACCAUUGA | - | - | | 1158 upstream of CG11018 | [ |
| 172 | 16.54 | miR-13b-1 (19) | 3R:11243135 | 88F | UAUCACAGCCAUUUUGACGAGU | + | + | | 4760 upstream of CG6118 | [ |
| 192 | 16.27 | miR-8 (20) | 2R:11895154 | 53D | UAAUACUGUCAGGUAAAGAUGUC | + | + | | 3783 downstream of CG6301 | [ |
| | 14.20 | miR-125 | 2L:18450405 | 36E | UCCCUGAGACCCUAACUUGUGA | + | + | Vertebrate/worm | 628 upstream of CG10283 | [ |
| | | miR-2c | 3R:11243493 | 88F | UAUCACAGCCAGCUUUGAUGGGC | - | - | | 4402 upstream of CG6118 | Hom; score n/a; 3 miR cluster |
| Rank | Score | miR name | miR position | Cytological position | Sequence | Ano | Apis | Other | Nearest gene | Comment |
| 4 | 24.67 | miR-184 | 2R:8394117 | 50A | UGGACGGAGAACUGAUAAGGG | + | + | Vertebrate | 24406 upstream of CG17048 | Expression verified |
| 7 | 24.15 | miR-274 | 3L:11614451 | 68C | UUUUGUGACCGACACUAACGGGUAAU | - | - | | In the intron of CG32085 (antisense) | Expression verified |
| 10 | 23.10 | miR-275 | 2L:7418027 | 27F | cAGUCAGGUACCUGAAGUAGCGCGCG | + | + | | 1070 upstream of CG5261 | 2 miR cluster (+Ano and Apis); expression verified |
| 16 | 22.57 | miR-92a | 3R:21461594 | 96E | CAUUGCACUUGUCCCGGCCUG | + | + | Vertebrate | 6578 upstream of BcDNA:LD22548 | 2 miR cluster |
| 21 | 21.72 | miR-219 | 3L:17263886 | 74A | UGAUUGUCCAAACGCAAUUCUUG | + | + | Vertebrate | 4955 upstream of CG6485 | Expression not seen |
| 25 | 21.12 | miR-276a | 3L:10322758 | 67E | CAGCGAGGUAUAGAGUUCCUACG | + | + | | 47587 upstream of CG12362 | Duplicated, 45 kb apart; expression verified; 1 copy in Ano and Apis |
| 28 | 20.88 | miR-277 | 3R:5925763 | 85F | UGUAAAUGCACUAUCUGGUACGACAU | + | + | | 1391 upstream of Fmr1 | 2 miR cluster; expression verified |
| 30 | 20.73 | miR-278 | 2R:10720792 | 52B | ggUGGGACUUUCGUCCGUUUGUAA | + | - | | 386 upstream of fus | Expression verified |
| 34 | 20.27 | miR-133 | 2L:20586360 | 38D | UUGGUCCCCUUCAACCAGCUGU | + | + | Vertebrate | 1059 downstream of CG15475 | 3 miR cluster; expression verified; [ |
| 37 | 20.03 | miR-279 | 3R:25030674 | 99A | UGUGACUAGAUCCACACUCAU | + | + | | 1328 upstream of CG31044 | Related to miR-286; expression verified |
| 38 | 19.90 | miR-33 | 3L:19716503 | 76C | AGGUGCAUUGUAGUCGCAUUG | - | - | Vertebrate | In the intron of HLH106 (sense) | |
| 39 | 19.77 | miR-280 | 2R:3358854 | 44C | UGUAUUUACGUUGCAUAUGAAAUGAUA | - | - | | 21740 upstream of CG30358 | Expression verified |
| 41 | 19.73 | miR-281a | 2R:7235078 | 48E | ACUGUCGACGGACAGCUCUCUU | - | - | | 356 downstream of SmD3 | Duplicated cluster; expression verified; 1 copy in Ano |
| 43 | 19.64 | miR-282 | 3L:3231652 | 63C | aaucUAGCCUCUACUAGGCUUUGUCUGU | + | - | | 7132 upstream of CG14959 | Expression verified |
| 44 | 19.55 | miR-283 | X:15238971 | 13D | AAAUAUCAGCUGGUAAUUCUGGG | + | + | | 3493 upstream of Ac13E | 2 miR cluster; expression verified |
| 46 | 19.52 | miR-284 | 3R:8377257 | 87C | UGAAGUCAGCAACUUGAUUCCAGCAAUUG | - | - | | 1128 upstream of CG6989 | Expression verified |
| 47 | 19.47 | miR-281b | 2R:7234866 | 48E | ACUGUCGACGGAUAGCUCUCUU | + | - | | 144 downstream of SmD3 | Duplicated cluster; expression verified |
| 49 | 19.35 | miR-34 | 3R:5926677 | 85F | UGGCAGUGUGGUUAGCUGGUUG | + | + | Vertebrate/worm | 477 upstream of Fmr1 | 2 miR cluster; expression verified; [ |
| 50 | 19.27 | miR-263a | 2L:11942273 | 33B | aAUGGCACUGGAAGAAUUCACg | + | + | Vertebrate | 4764 downstream of CG16964 | Expression verified; [ |
| 59 | 18.89 | miR-124 | 2L:17544454 | 36D | AUAAGGCACGCGGUGAAUGCCA | + | + | Vertebrate/worm | 10606 downstream of CG7094 | 2 miR cluster; [ |
| 66 | 18.58 | miR-79 | 2L:16676639 | 36A | AUAAAGCUAGAUUACCAAAGC | + | + | Worm | 822 upstream of CG31782 | 3 miR cluster; expression verified; [ |
| 67 | 18.57 | miR-276b | 3L:10277315 | 67E | CAGCGAGGUAUAGAGUUCCUACG | - | - | Vertebrate | 7073 downstream of CG6559 | Duplicated, 45 kb apart; expression verified; 1 copy in Ano and Apis |
| 77 | 18.36 | miR-210 | X:17859179 | 16F | UUGUGCGUGUGACAGCGGCUA | + | + | Vertebrate | 1193 downstream of CG32553 | |
| 83 | 18.11 | miR-285 | 3L:11903642 | 68E | UAGCACCAUUCGAAAUCAGUGCU | - | - | Vertebrate | 1592 upstream of CG7252 | Similar to miR-29 |
| | 18.08* | miR-100 | 2L:18449518 | 36E | AACCCGUAAAUCCGAACUUGUG | + | - | Vertebrate | 1515 upstream of CG10283 | Failed conservation filter; 3 miR cluster; expression verified; [ |
| 91 | 17.93 | miR-92b | 3R:21466486 | 96E | AAUUGCACUAGUCCCGGCCU | + | - | Vertebrate | 1686 upstream of BcDNA:LD22548 | Expression verified; 2 miR cluster; [ |
| 145 | 17.12 | miR-286 | 2R:14724858 | 56E | AGUGACUAGACCGAACACUCG | + | - | | 1013 upstream of CG11018 | Expression verified; 7 miR cluster; related to miR-279 |
| 146 | 17.11 | bantam | 3L:622845 | 61C | AGUGAGAUCAUUUUGAAAGCUG | + | - | Worm | 6301 upstream of CG12030 | [ |
| 208 | 16.09 | miR-289 | 3L:13578391 | 70C | UAAAUAUUUAAGUGGAGCCUGCGACU | - | - | | In the intron of bru-3 (antisense) | Expression verified |
| | 13.73 | miR-287 | 2L:17552694 | 36D | UGUGUUGAAAAUCGUUUGCAC | + | - | | 14896 upstream of Oli | Very low score; found by proximity to miR-124; expression verified |
| | 13.35 | miR-87 | 2L:9942828 | 30D | UGAGCAAAAUUUCAGGUGUG | - | - | Worm | 2009 upstream of CG13126 | Hom: very low score |
| | | miR-263b | 3L:15666960 | 72D | cuUGGCACUGGGAGAAUUCACa | + | - | Vertebrate | 4243 upstream of comm | Hom; score n/a |
| | | miR-288 | 2L:20588106 | 38D | UUUCAUGUCGAUUUCAUUUCAUG | + | - | | 2805 downstream of CG15475 | Score n/a; found by proximity to miR-133; expression verified |
| Rank | Score | miR name | miR position | Cytological position | Sequence | Ano | Apis | Other | Nearest gene | Comment |
| 1 | 26.76 | | 2R:4681879 | 46A | CAUCACACCCAGGUUGAGUGAGU | + | + | | In the intron of Mmp2 (antisense) | NT |
| 5 | 24.35 | | 3R:121090 | 82A | AAAUUGACUCUAGUAGGGAGUCC | + | + | | 533 downstream of CG9780 | NT |
| 14 | 22.63 | | X:1545630 | 2B | UGCAGGUUUCGUCGACAACGA | + | - | | 732 upstream of CG32806 | NT |
| 19 | 22.13 | | 3L:21585985 | 79A | CGAUUUGUCUUUUUCCGCUUACUG | + | - | | 1727 downstream of CG7160 | NT |
| 20 | 21.95 | | 3L:18809845 | 75E | UUUUGAUUGUUGCUCAGAAAGCC | + | + | | 3283 upstream of CG6865 | No expression seen either strand |
| 23 | 21.38 | | 3L:8530512 | 66D | GUGAGAUAUGUUUGAUAUUCUUGGUUGUU | + | + | | 2374 upstream of CG6638 | NT |
| 40 | 19.75 | | X:12366993 | 11B | UAUCAUAAGACACACGCGGCUAU | + | - | | In the intron of tomosyn (sense) | NT |
| 54 | 19.06 | | 2R:11128979 | 52E | guUAUUGCUUGAGAAUACACGUAGUU | + | + | | 15915 upstream of Dg | No expression seen either strand |
| 61 | 18.86 | | 2L:859210 | 21D | AGUUUGUUCGUUUGGCUCGAGUUAU | + | - | | 2208 downstream of CG13949 | NT |
| 104 | 17.64 | | 2L:16676008 | 36A | UCUUUGGUAUUCUAGCUGUAGA | + | - | | 1453 upstream of CG31782 | No expression seen; miR-79 cluster |
| 117 | 17.44 | | 3R:21403955 | 96E | UGAUAUUGUCCUGUCACAGCAGUA | + | - | | 3265 upstream of CG12250 | No expression seen |
| 123 | 17.36 | | 2L:7418192 | 27F | AUUGUACUUCAUCAGGUGCUCUGGUG | + | + | | 905 upstream of CG5261 | NT |
| 126 | 17.31 | | 3R:16621175 | 92F | UUUGUUUUGCAAUUUUCGCUUU | + | - | | In the intron of CG17838 (sense) | NT |
| 130 | 17.24 | | 2L:16676828 | 36A | CUUUGGUGAUUUUAGCUGUAUG | + | - | | 633 upstream of CG31782 | No expression seen; miR-79 cluster |
| 183 | 16.39 | | 2R:7223583 | 48E | UCAUCCCCUUGUUGCAAACCUCACGC | + | - | | In the intron of CG8877 (sense) | NT |
| 190 | 16.28 | | 3R:5916861 | 85F | UGGGAUACACCCUGUGCUCGCU | + | - | | 17107 upstream of CG5361 | NT |
| 195 | 16.24 | | 2L:243049 | 21B | CAUAAGCGUAUAGCUUUUCCC | + | + | | In the intron of kis (sense) | NT |
These sequences were identified as high-scoring candidates through miRseeker analysis of drosophilid genomes (except as noted) and are ordered by their rank and score. The first part of the table includes members of the reference set, whose rank within the reference set is given in parentheses after the gene name; thus miR-4 ranked 18th among the reference set and 124th overall. The second part of the table includes miRNAs newly identified in this study. In general, we defined a candidate miRNA sequence on the basis of the bounds of conserved sequence; this is often longer than the presumed 21-22 nucleotide mature product. The third part of the table includes unverified gene predictions supported by conservation in Anopheles and/or Apis. Drosophila-specific predictions without confirming expression data may be viewed on the web [43]. References in the comments are to miRNAs that have been independently identified in previous or concurrent studies. n/a, score not available; NT, expression not tested; Hom, miRNA identified solely by homology to other miRNAs. The following miRs were not identified by miRseeker: miR-10 was not aligned using the first release of the D. pseudoobscura genome, while miR-2b-1 and miR-100 failed the conservation filters. These three nevertheless received very high miRseeker scores, and they have been placed into the list for the sake of comparison, although they are not ranked. Six additional miRNAs scored poorly but are genuine. These include two members of the reference set (miR-125 and miR-2c), two that were identified by homology to miRNAs cloned from other species (miR-87 and miR-263b), and two that were identified as Anopheles-conserved stem-loops located in proximity to other Drosophila miRNAs whose expression was verified by northern analysis (miR-287 and miR-288). Most miRNAs are located in intergenic regions, and there is an apparent bias for intronic miRNAs to be located on the transcribed strand.
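The strand bias noted for intronic miRNAs can be checked directly against the "Nearest gene" entries in the first two parts of the table. The sketch below is only an illustration: the dictionary, the regular expression and the function name are assumptions made for this example, and the strings are copied from the table rows that read "In the intron of ...".

```python
import re
from collections import Counter

# "Nearest gene" text for the intronic miRNAs in parts one and two of the table.
INTRONIC_ENTRIES = {
    "miR-2a-2": "In the intron of spi (sense)",
    "miR-2a-1": "In the intron of spi (sense)",
    "miR-2b-2": "In the intron of spi (sense)",
    "miR-13b-2": "In the intron of CG7033 (sense)",
    "miR-11": "In the intron of E2f (sense)",
    "miR-274": "In the intron of CG32085 (antisense)",
    "miR-33": "In the intron of HLH106 (sense)",
    "miR-289": "In the intron of bru-3 (antisense)",
}

# Hypothetical pattern for this table's wording: host gene, then orientation.
INTRON_PATTERN = re.compile(
    r"^In the intron of (?P<gene>\S+) \((?P<strand>sense|antisense)\)$",
    re.IGNORECASE,
)


def strand_counts(entries):
    """Count intronic miRNAs by orientation relative to the host gene."""
    counts = Counter()
    for text in entries.values():
        match = INTRON_PATTERN.match(text)
        if match:
            counts[match.group("strand").lower()] += 1
    return counts


counts = strand_counts(INTRONIC_ENTRIES)
print(counts["sense"], counts["antisense"])  # prints: 6 2
```

Six of the eight intronic miRNAs listed lie on the transcribed strand of their host gene, which is the bias the table note describes.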
Figure 4. Efficient selection of genuine miRNAs by miRseeker. (a) Distribution of the top 2,996 candidates binned by helical/free energy score (white bars), of which 570 passed subsequent conservation filters (green bars). 21/24 members of the reference set received a score of 16 or higher, and 20 of these passed the conservation filters. Note that these figures do not include mir-10, which did not fall in an aligned contig and was thus not analyzed, even though its miRseeker score is 18.45 and it passes conservation filters. (b) List of the top 124 miRseeker candidates; members of the reference set are highlighted in green, newly identified miRNAs from this study in blue, and additional third-species-conserved candidates in orange. The vast majority of the highest-scoring candidates are bona fide.
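The counts quoted in panel (a) translate into proportions as follows. This is a simple arithmetic sketch; the variable names and the helper `pct` are hypothetical, and only the counts come from the caption.

```python
# Counts taken from the Figure 4(a) caption.
TOP_CANDIDATES = 2996        # candidates binned by score
PASSED_CONSERVATION = 570    # candidates passing the conservation filters
REFERENCE_SET = 24           # previously described miRNAs
REFERENCE_SCORE_16_PLUS = 21 # reference miRNAs scoring 16 or higher
REFERENCE_PASSED = 20        # of those, the number passing the filters


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(pct(PASSED_CONSERVATION, TOP_CANDIDATES))    # 19.0
print(pct(REFERENCE_SCORE_16_PLUS, REFERENCE_SET)) # 87.5
print(pct(REFERENCE_PASSED, REFERENCE_SET))        # 83.3
```

Roughly one in five scored candidates survives the conservation filters, while about five in six reference miRNAs clear both the score of 16 and the filters.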
Figure 5. Diverse temporal and quantitative expression profiles of novel miRNAs by northern blotting. The three lanes represent 0-24-hour embryos (E), third instar larvae and 0-1-day pupae (L) and adult males (A), and hybridizing bands from the 21-24 nucleotide range are shown. (a-g) miRs with preferential expression at individual stages or a combination of two of these stages. (h-j) miRs that are expressed throughout development, either at uniform levels or in a graded fashion. (k) miR-1 was used as a control. Note that the blots shown were exposed for different lengths of time, so the relative levels of different miRNAs are not directly comparable; please refer to Table 2.
Table 2. Summary of northern blot studies
| Gene | E | L | A |
| miR-286 | ++++ | ||
| miR-92b | ++ | ++ | |
| miR-79 | ++ | ++ | + |
| miR-275 | + | + | + |
| miR-287 | + | + | + |
| miR-283 | + | + | + |
| miR-281a, b | + | + | + |
| miR-279 | +++ | +++ | +++ |
| miR-263a | +++ | +++ | +++ |
| miR-276a, b | + | ++ | +++ |
| miR-288 | + | + | ++ |
| miR-184 | + | + | ++ |
| miR-282 | ++ | ||
| miR-289 | ++ | ||
| miR-133 | + | ++ | |
| miR-278 | + | + | |
| miR-284 | + | + | |
| miR-100 | ++ | ++ | |
| miR-34 | + | ++ | |
| miR-280 | + | ||
| miR-277 | ++++ | ||
| miR-274 | ++++ |
The relative abundance of a given miRNA at each stage is represented by the number of plus signs. miR-276a, b and miR-281a, b produce similar miRNAs that are not distinguished by northern analysis.
Figure 6. Example of a miRNA with false-negative evidence by northern blot (2R:11128979 = miR-137). In this example, four related sequences from four species of insects (Dm, D. melanogaster; Dp, D. pseudoobscura; Ag, A. gambiae; Am, A. mellifera) all adopt a phylogenetically conserved stem-loop structure. One arm has been perfectly preserved among all four species, and we presume that a miRNA is processed from within the conserved sequence (orange). Patterns of nucleotide divergence characteristic of miRNAs are seen, with more related sequences (Dm/Dp and Ag/Am) showing approximately equal amounts of divergence within the loop and along one arm, whereas the Dm/Dp vs Ag/Am comparison shows complete divergence within the loop (blue), with slightly less overall divergence along the putative non-miRNA-encoding arms. We deduce that a mature miRNA may initiate at one of the U residues that are highlighted by asterisks, as the first residue of the conserved region is found in the loop of the drosophilid hairpins and the second G residue is unfavored as the 5' residue of a miRNA. Northern analysis was negative using a probe complementary to the conserved region as well as with a probe identical to the conserved region (in the event that a miRNA is transcribed from the other strand). This sequence was only subsequently discovered to be orthologous to vertebrate miR-137 (which initiates at the second highlighted U). We consider other unverified predicted genes conserved in other insect species with similar characteristics to be potential candidates (see also Table 1).
Figure 7. Examples of Drosophila miRNA gene clusters. In this figure, pre-miRNAs are represented by rectangles and the arm that gives rise to the mature miRNA is colored. (a) The largest miRNA cluster was previously identified by Tuschl and colleagues [10]; we identified and experimentally verified a new member of this cluster, mir-286. A second conserved hairpin was found (light gray box), but its expression was not seen. Of the seven genes in this cluster, only mir-286 is conserved in Anopheles (ano). Note also that this cluster contains both related miRNA genes (mir-6-1, -2, -3 and the K-box antisense gene mir-5, yellow), as well as unrelated miRNA genes (black). (b) A second example of rapid miRNA gene evolution. The Anopheles genome contains four members of the mir-2/mir-13 family, which are all located in a single cluster. In contrast, drosophilid genomes contain eight members of this family, located at four distinct genomic locations on three different chromosomes. (c) A cluster of putative developmental regulators. let-7 and mir-125 are orthologous to the genetically characterized genes let-7 and lin-4 in C. elegans. A similar gene cluster exists in Anopheles, although mir-100 is separated from the other two by several kilobases (not shown). (d) Other examples of miRNA clusters. Note that, as is the case for the other clusters shown, miRNA clusters can contain related genes (yellow), but appear to be as likely to contain unrelated genes (black).