Abstract
MicroRNAs are short (approximately 22 nt) regulatory RNA molecules that play key roles in metazoan development and have been implicated in human disease. First discovered in Caenorhabditis elegans, over 2500 microRNAs have been isolated in metazoans and plants; it has been estimated that there may be more than a thousand microRNA genes in the human genome alone. Motivated by the experimental observation of strong conservation of the microRNA let-7 among nearly all metazoans, we developed a novel methodology to characterize the class of such strongly conserved sequences: we identified a non-redundant set of all sequences 20 to 29 bases in length that are shared among three insects: fly, bee and mosquito. Among the few hundred sequences greater than 20 bases in length are close to 40% of the 78 confirmed fly microRNAs, along with other non-coding RNAs and coding sequence.
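The N-mer intersection described in the abstract can be illustrated with a minimal sketch. The function names, the toy sequences and the forward-strand-only treatment are assumptions made for illustration; the sketch holds every N-mer in memory and is not the genome-scale procedure used in the study.

```python
def nmers(seq: str, n: int) -> set[str]:
    """Return the distinct length-n substrings of seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def shared_nmers(sequences: list[str], n: int) -> set[str]:
    """Return the N-mers that occur in every sequence of the list."""
    return set.intersection(*(nmers(s, n) for s in sequences))


# Hypothetical toy data: one 21-base core embedded in three different flanks,
# standing in for the fly, bee and mosquito genomes.
core = "TGAGGTAGTAGGTTGTATAGT"
fly = "ACGT" + core + "TCCA"
bee = "TT" + core + "AAGC"
mosquito = "GGC" + core + "GG"

shared_20 = shared_nmers([fly, bee, mosquito], 20)
shared_21 = shared_nmers([fly, bee, mosquito], 21)
shared_22 = shared_nmers([fly, bee, mosquito], 22)
```

With these toy inputs the two 20-mers inside the core are shared, the core itself is the only shared 21-mer, and nothing of length 22 survives because the flanks differ.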
Year: 2006 PMID: 16698958 PMCID: PMC3303174 DOI: 10.1093/nar/gkl173
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Fractions of unfiltered maximal 25mer hits to repeat-masked fly genome with a given annotation from FlyBase; see caption to Table 2 for details and abbreviations. Corresponding figures for all lengths are available in Supplementary Figures 1-unfiltered and 1-filtered. Annotations were counted as described in Materials and Methods: annotations. A single hit can have multiple annotations; consequently, the sum of the hit fractions displayed in the figure may exceed unity.
Table 1. Maximal N-mers in ternary insect intersections
| Length N | Unfiltered N-mers | Filtered N-mers | miRNAs | cumulative miRNAs |
|---|---|---|---|---|
| >29 | 67 | 67 | 2ᵃ | 2ᵃ |
| 29 | 8 | 8 | 1 | 1 |
| 28 | 9 | 9 | 2 | 3 |
| 27 | 6 | 6 | 3 | 6 |
| 26 | 17 | 17 | 4 | 10 |
| 25 | 18 | 15 | 9 | 19 |
| 24 | 29 | 18 | 5 | 24 |
| 23 | 90 | 32 | 5 | 29 |
| 22 | 334 | 28 | 1 | 30 |
| 21 | 1163 | 88 | 3 | 33 |
| 20 | 5635 | 695 | 1 | 34 |
| Total: 34 of 78 in fly |
Number of unfiltered maximal N-mers [N-mers not contained in any (N+1)-mer] in the ternary intersection of repeat-masked fly, bee and mosquito genomes. Column 4 displays the number of exact, full-length hits containing mature miRNA possibly together with some flanking sequence in the precursor. Although all genomes were repeat-masked before intersection, we found that intersection strongly enriches for repetitive and low-entropy sequence, especially as N decreases. We applied a simple filter to the N-mers, detailed in the text, that increased enrichment for confirmed miRNA sequence in this regime as shown in column 3: Filtered N-mers.
ᵃ There were two exact hits above N = 29 to miRNA sequences containing the partners of mature miRNAs within their respective stem–loop precursors; one hit at N = 30, and one at N = 31. All hairpin precursors and the sequences of the hits to them can be found in Supplementary Tables 1-unfiltered and 1-filtered for both binary and ternary intersections.
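The maximality criterion in the caption above (an N-mer is kept only if it is not contained in any (N+1)-mer) can be sketched as follows. The function name and input layout are hypothetical, and the sketch assumes the caller supplies the shared set for one length beyond the longest N of interest; otherwise every N-mer at the top length is reported as maximal.

```python
def maximal_nmers(shared_by_length: dict[int, set[str]]) -> dict[int, set[str]]:
    """Keep only shared N-mers that are not contained in a shared (N+1)-mer."""
    maximal = {}
    for n, nmer_set in shared_by_length.items():
        longer = shared_by_length.get(n + 1, set())
        # An N-mer contained in an (N+1)-mer is either its prefix or its suffix.
        covered = {s[:-1] for s in longer} | {s[1:] for s in longer}
        maximal[n] = nmer_set - covered
    return maximal


# Hypothetical shared sets for N = 20, 21 and 22.
shared_by_length = {
    20: {"TGAGGTAGTAGGTTGTATAG", "GAGGTAGTAGGTTGTATAGT", "ACGTACGTACGTACGTACGT"},
    21: {"TGAGGTAGTAGGTTGTATAGT"},
    22: set(),
}
result = maximal_nmers(shared_by_length)
```

In this toy input, the two 20-mers covered by the shared 21-mer are dropped, while the unrelated 20-mer and the 21-mer itself remain maximal.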
Table 2. Annotations of unfiltered maximal 25mers
| | miR | tR | snR | ncR | snoR | rR | 3′ | 5′ | cds | cborder | intron |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Fly-bee-mosquito | 9 | 4 | 0 | 6 | 0 | 0 | 0 | 0 | 3 | 3 | 1 |
| Fly-bee | 9 | 11 | 0 | 6 | 0 | 0 | 12 | 7 | 51 | 5 | 157 |
| Fly-mosquito | 6 | 4 | 1 | 3 | 0 | 0 | 17 | 21 | 302 | 17 | 220 |
| Fly-sample (0.2%) | 12 | 74 | 5 | 338 | 10 | 2 | 6098 | 3574 | 37573 | 5169 | 65886 |

| | interg | repeat | tss | enh | pbs | regal | poly(A) | transpo | teis | hits | seqs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Fly-bee-mosquito | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 22 | 18 |
| Fly-bee | 237 | 16 | 0 | 0 | 0 | 0 | 0 | 1 | 8 | 491 | 446 |
| Fly-mosquito | 243 | 152 | 3 | 0 | 0 | 0 | 0 | 1 | 19 | 832 | 631 |
| Fly-sample (0.2%) | 88138 | 7293 | 1896 | 58 | 74 | 269 | 1 | 500 | 2660 | 207152 | 199560 |
Annotations from FlyBase of unfiltered maximal 25mers conserved exactly among binary and ternary intersections with fly. Annotations of a 0.2% random sample of all distinct 30mers from repeat-masked fly genome are shown for comparison. Labels: miR: microRNA; tR: transfer RNA; snR: small nuclear RNA; ncR: non-coding RNA; snoR: small nucleolar RNA; rR: ribosomal RNA; cds: protein coding; interg: intergenic; cborder: spans a coding/intron or coding/UTR boundary; repeat: repeat region; tss: transcription start site; enh: enhancer; pbs: protein binding site; regal: regulatory region; poly(A): polyadenylation site; transpo: transposable element; teis: transposable element insertion site; hits: number of hits to repeat-masked fly genome; seqs: number of sequences. Complete data for all N > 19, both filtered and unfiltered, are available as Supplementary Table 2.
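As a worked check of how annotation counts translate into hit fractions, the sketch below uses the Fly-bee-mosquito row of the table above. It assumes that each fraction is the annotation count divided by the "hits" column; that denominator is an assumption for illustration, since the counting procedure is described in Materials and Methods.

```python
# Counts copied from the Fly-bee-mosquito row of the annotation table.
ternary_counts = {
    "miR": 9, "tR": 4, "snR": 0, "ncR": 6, "snoR": 0, "rR": 0,
    "3'": 0, "5'": 0, "cds": 3, "cborder": 3, "intron": 1,
    "interg": 2, "repeat": 1, "tss": 1, "enh": 0, "pbs": 0,
    "regal": 0, "poly(A)": 0, "transpo": 0, "teis": 1,
}
hits = 22  # "hits" column of the same row

# Assumed definition: fraction of hits carrying each annotation.
fractions = {label: count / hits for label, count in ternary_counts.items()}

# A hit can carry several annotations, so the counts can add up to more than
# the number of hits and the fractions to more than 1.
total_annotations = sum(ternary_counts.values())
total_fraction = total_annotations / hits
```

Under this assumption the annotation counts sum to 31 for 22 hits, so the fractions add up to more than one, consistent with the remark in the Figure 1 caption that the sum of the hit fractions may exceed unity.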
Figure 2. Fraction of unfiltered maximal 25mers with annotations from FlyBase, for a 0.2% random sample of distinct 25mers from the repeat-masked fly genome, and for binary and ternary intersections with fly genome. The number of maximal 25mers in the intersection is displayed inside the parentheses, followed by P-value. Corresponding figures for all other values of N are available in Supplementary Figures 2-unfiltered and 2-filtered.
Table 3. Whole-genome alignment versus N-mer intersection
Exact matches of length >22 containing mature microRNAs

| Genomes | Alignment | Intersection |
|---|---|---|
| Dm2/Am2 | 5 | 29 |
| Dm2/Ag1 | 25 | 33 |
| Hg17/Xt1 | 75 | 132 |
| Mm7/Xt1 | 65 | 128 |
| Hg16/Mm4/Rn3 | 207 | 213 |
For the ‘alignment’ column, we identified all gap-free contiguous sequences of length greater than 22 bases from UCSC whole-genome alignments (see Materials and Methods) that were both (i) aligned at corresponding positions and (ii) identical in base sequence in all organisms contributing to the alignment. For the ‘intersection’ column, we performed N-mer intersections on the same sources, versions and assemblies of the genomes that were used for the alignments. Both sets of sequences were searched against miRBase for exact, full-length hits to mature microRNAs, and the numbers of distinct hits were recorded in the table. With one or two exceptions, whenever a subsequence of an N-mer exactly matched a confirmed mature microRNA sequence, the exact match extended into the corresponding parent hairpin up to the full N-mer length.
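The two criteria for the ‘alignment’ column can be sketched on a toy alignment block. The function name, the input layout (equal-length rows with `-` marking gaps) and the choice to break runs at any character other than A, C, G or T are assumptions for illustration; reading the UCSC alignment files is not covered here.

```python
def identical_runs(aligned: list[str], min_len: int = 23) -> list[str]:
    """Return gap-free runs of at least min_len columns identical in all rows."""
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("aligned rows must be non-empty and of equal length")
    runs: list[str] = []
    current: list[str] = []
    for column in zip(*aligned):
        base = column[0].upper()
        # A column extends the run only if it is an unambiguous base shared by all rows.
        if base in "ACGT" and all(c.upper() == base for c in column):
            current.append(base)
        else:
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
    if len(current) >= min_len:
        runs.append("".join(current))
    return runs


# Hypothetical two-species alignment block with one mismatch and one gap.
core = "TGAGGTAGTAGGTTGTATAGTTAC"  # 24 identical columns
row_a = "AC" + core + "-T"
row_b = "AG" + core + "AT"

runs = identical_runs([row_a, row_b])
```

The default `min_len` of 23 corresponds to "length greater than 22"; in the toy block only the 24-base core qualifies, since the mismatch and the gap terminate the runs on either side.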