Yael Altuvia, Pablo Landgraf, Gila Lithwick, Naama Elefant, Sébastien Pfeffer, Alexei Aravin, Michael J Brownstein, Thomas Tuschl, Hanah Margalit.
Abstract
MicroRNAs (miRNAs) are approximately 22 nt-long non-coding RNA molecules, believed to play important roles in gene regulation. We present a comprehensive analysis of the conservation and clustering patterns of known miRNAs in human. We show that human miRNA gene clustering is significantly higher than expected at random. A total of 37% of the known human miRNA genes analyzed in this study appear in clusters of two or more with pairwise chromosomal distances of at most 3000 nt. Comparison of the miRNA sequences with their homologs in four other organisms reveals a typical conservation pattern, persistent throughout the clusters. Furthermore, we show enrichment in the typical conservation patterns and other miRNA-like properties in the vicinity of known miRNA genes, compared with random genomic regions. This may imply that additional, yet unknown, miRNAs reside in these regions, consistent with the current recognition that there are overlooked miRNAs. Indeed, by comparing our predictions with cloning results and with identified miRNA genes in other mammals, we corroborate the predictions of 18 additional human miRNA genes in the vicinity of the previously known ones. Our study raises the proportion of clustered human miRNAs that are <3000 nt apart to 42%. This suggests that the clustering of miRNA genes is higher than currently acknowledged, alluding to its evolutionary and functional implications.
Year: 2005 PMID: 15891114 PMCID: PMC1110742 DOI: 10.1093/nar/gki567
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
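The clustering criterion stated in the abstract (genes at most 3000 nt apart) can be illustrated with a minimal sketch. It assumes that distance is measured from the end of one gene to the start of the next on the same chromosome and strand, and that clusters are formed by chaining successive genes; the record does not spell out either detail. The gene names and coordinates below are hypothetical and serve only to show the mechanics.

```python
from collections import defaultdict

def cluster_genes(genes, max_gap=3000):
    """Group genes into clusters of two or more.

    genes: iterable of (name, chrom, strand, start, end).
    Assumption: successive genes on the same chromosome and strand are
    chained into one cluster when next.start - previous.end <= max_gap.
    """
    by_strand = defaultdict(list)
    for name, chrom, strand, start, end in genes:
        by_strand[(chrom, strand)].append((start, end, name))

    clusters = []
    for key in sorted(by_strand):
        members = sorted(by_strand[key])  # order by start coordinate
        current = [members[0]]
        for gene in members[1:]:
            if gene[0] - current[-1][1] <= max_gap:
                current.append(gene)
            else:
                clusters.append(current)
                current = [gene]
        clusters.append(current)
    # Keep only clusters with at least two members
    return [[name for _, _, name in c] for c in clusters if len(c) >= 2]

def clustered_fraction(genes, max_gap=3000):
    """Fraction of all genes that belong to a cluster of two or more."""
    genes = list(genes)
    in_clusters = sum(len(c) for c in cluster_genes(genes, max_gap))
    return in_clusters / len(genes)

# Hypothetical toy input, not taken from the paper
genes = [
    ("g1", "chr1", "+", 1000, 1080),
    ("g2", "chr1", "+", 2500, 2580),
    ("g3", "chr1", "+", 9000, 9080),
    ("g4", "chr1", "-", 2600, 2680),
    ("g5", "chr2", "+", 500, 580),
    ("g6", "chr2", "+", 3500, 3580),
]
print(cluster_genes(genes))
print(clustered_fraction(genes))
```

In this toy input, g4 is not clustered with g2 despite the short distance, because the two lie on opposite strands.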
Figure 1. Cumulative distance distribution of miRNA genes and other types of human genomic functional elements. For each of the described elements, the distances (in nucleotides) between every two same-chromosome same-strand successive elements were calculated. Distance is drawn on a logarithmic scale. The different elements are marked: orange (exon of protein-coding genes), green (protein-coding gene), black (snoRNA), blue (tRNA), red (miRNA) and cyan (snRNA). The genomic coordinates were derived from the UCSC July 2003 human genome assembly build 34, hg16 (20,21). Protein-coding genes and exons were based on the refGene and knownGene tables. SnoRNA, tRNA and snRNA pseudogenes were excluded.
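The distance calculation described in this caption can be sketched as follows. The sketch assumes that the distance between two successive elements is the gap from the end of one to the start of the next, which the caption does not state explicitly; the element coordinates are hypothetical.

```python
from collections import defaultdict

def successive_distances(elements):
    """Distances between successive elements on the same chromosome and strand.

    elements: iterable of (chrom, strand, start, end).
    Assumption: distance = next.start - previous.end, floored at 0 for overlaps.
    """
    groups = defaultdict(list)
    for chrom, strand, start, end in elements:
        groups[(chrom, strand)].append((start, end))

    distances = []
    for intervals in groups.values():
        intervals.sort()
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            distances.append(max(0, next_start - prev_end))
    return sorted(distances)

def cumulative_fraction(distances, threshold):
    """Fraction of distances that are <= threshold (one point of the cumulative curve)."""
    if not distances:
        return 0.0
    return sum(d <= threshold for d in distances) / len(distances)

# Hypothetical coordinates, for illustration only
elements = [
    ("14", "+", 100, 180),
    ("14", "+", 1000, 1080),
    ("14", "+", 50000, 50080),
    ("X", "-", 10, 90),
]
distances = successive_distances(elements)
print(distances, cumulative_fraction(distances, 3000))
```

Evaluating `cumulative_fraction` over a logarithmically spaced grid of thresholds would give one curve of the kind plotted in the figure for each element type.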
Figure 2. Conservation patterns of known and predicted human miRNAs. The conservation patterns are based on the UCSC phastCons scores (22,23). The chromosomal regions of the miRNAs with an additional 3000 nt flanking on both sides are presented. The chromosomal coordinates follow the build 34 assembly (hg16) of the human genome from UCSC (20,21). For simplicity the x-axis displays the relative positions. Known miRNAs are designated by their Rfam name omitting the ‘hsa’ prefix (19). The predicted miRNAs that were verified experimentally fall into two categories: (E), verified experimentally in this study, and (S), verified by similarity to a homologous miRNA in another organism. The miRNA orientation is marked by an arrow. (A) known large miRNA cluster; (B) known miRNA clustered pair; (C) example of a miRNA prediction that extends a known pair cluster; (D) reveals a new multi-member cluster; and (E) reveals a new clustered pair. The plots are not drawn to scale; the width of a conserved region therefore depends on the length of the presented region: the longer the region, the narrower the presented profile.
Table 1. Comparison of conservation and stem–loop folding potential between miRNA regions and random regions
| | NMH | NMH random | MH | MH random |
|---|---|---|---|---|
| No. of conserved subsequences | 525 | 407 | 259 | 147 |
| No. of conserved subsequences with predicted stem–loop | 253 (48%) | 109 (27%) | 122 (47%) | 47 (32%) |
| No. of known miRNAs included in the conserved stem–loop regions | 113 (86%) | — | 60 (84%) | — |
| No. of miRNA-neighboring sequences with conserved stem–loops | 140 | — | 62 | — |
a. The NMH set includes a total of 103 fragments of 131 miRNA sequences within intergenic regions and their flanking regions.
b. ‘NMH random’ is comprised of 25 sets, each consisting of 103 random sequences in intergenic regions. This column shows the number of random fragments with a tested property, averaged over the 25 random sets. The percentage is also an average over the 25 sets.
c. The MH set includes a total of 55 fragments of 71 miRNA gene sequences within pre-mRNA non-coding sequences and their flanking regions.
d. ‘MH random’ is comprised of 25 sets, each consisting of 55 sequence fragments, chosen randomly from pre-mRNA intronic regions. The column contains information as in ‘b’.
e. As miRNAs are rarely found to overlap exons on the opposite strand, such conserved stem–loop regions were filtered out. The percentage is out of the conserved subsequences.
f. Percentage is out of the known miRNA genes (131 for NMH and 71 for MH).
g. Subtraction of the third row from the second.
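The derived entries of the table above can be re-computed from its raw counts as a quick consistency check. This is only a sketch: the dictionary keys are illustrative names, and the random columns are omitted because they are averages over 25 sets and cannot be re-derived from the averaged counts.

```python
# Counts copied from the NMH and MH columns of the table above
table_counts = {
    "NMH": {"conserved": 525, "stem_loop": 253, "known_in_stem_loop": 113},
    "MH": {"conserved": 259, "stem_loop": 122, "known_in_stem_loop": 60},
}

def derived(counts):
    """Return (neighboring stem-loops, % of conserved subsequences with a stem-loop)."""
    # Footnote g: second row minus third row
    neighbors = counts["stem_loop"] - counts["known_in_stem_loop"]
    # Footnote e: percentage is out of the conserved subsequences
    stem_loop_pct = 100 * counts["stem_loop"] / counts["conserved"]
    return neighbors, stem_loop_pct

for name, counts in table_counts.items():
    neighbors, pct = derived(counts)
    print(name, neighbors, round(pct))
```

The subtraction reproduces the last row of the table (140 for NMH, 62 for MH), and the rounded percentages match the 48% and 47% reported in the second row.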
Table 2. miRNA-related properties in miRNA genes and random data
| | Free energy of folding (kcal/mole) | No. of base-paired nucleotides/fold length | Conserved region length (nucleotides) |
|---|---|---|---|
| NMH training set | −41.6 ± 9.9 | 0.37 ± 0.025 | 130 ± 81 |
| NMH random set | −23.39 ± 12.6 | 0.31 ± 0.036 | 259 ± 213 |
| Applied threshold | X < −18.8 | X > 0.31 | X < 333 |
a. Average number over the 25 random sets.
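The three thresholds in the last row of the table above can be applied as a simple conjunctive filter. The sketch below assumes a candidate must satisfy all three conditions; the `Candidate` class and its field names are hypothetical, and the two example candidates simply reuse the mean values reported for the training and random sets.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    free_energy: float      # free energy of folding, kcal/mole
    paired_fraction: float  # base-paired nucleotides / fold length
    conserved_length: int   # conserved region length, nucleotides

# Thresholds from the "Applied threshold" row
MAX_FREE_ENERGY = -18.8
MIN_PAIRED_FRACTION = 0.31
MAX_CONSERVED_LENGTH = 333

def passes_filters(c: Candidate) -> bool:
    """True when the candidate meets all three thresholds (assumed to be combined with AND)."""
    return (
        c.free_energy < MAX_FREE_ENERGY
        and c.paired_fraction > MIN_PAIRED_FRACTION
        and c.conserved_length < MAX_CONSERVED_LENGTH
    )

# Mean values from the table, used here as stand-in candidates
training_mean = Candidate("NMH training mean", -41.6, 0.37, 130)
random_mean = Candidate("NMH random mean", -23.39, 0.31, 259)
print(passes_filters(training_mean), passes_filters(random_mean))
```

The random-set mean fails only on the paired fraction, which equals the threshold of 0.31 while the inequality is strict.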
Table 3. Identification of sequences with miRNA-like properties
| | Initial no. of sequences | No. of sequences that passed additional filters |
|---|---|---|
| NMH (training set) | 113 | 108 (96%) |
| MH (test set) | 60 | 60 (100%) |
| NMH random | 109 | 27 (25%) |
| MH random | 47 | 12 (25%) |
| Vicinity of NMH | 140 | 76 (54%) |
| Vicinity of MH | 62 | 21 (34%) |
a. Number of sequences that show the conservation pattern and folding potential.
b. Number of sequences that, in addition to the conservation and folding potential, show three other properties concerning the length of the conserved region and the properties of the folded structure (detailed in Table 2).
c. Average number and average percentage over the 25 random sets.
d. All predictions excluding the known miRNAs.
Table 4. Supporting evidence for the predicted miRNA genes in the vicinity of known miRNAs
| Cluster-founding miRNAs | Chromosome (strand) | Cluster-founding miRNAs: start | Cluster-founding miRNAs: end | Predicted miRNA precursor: start | Predicted miRNA precursor: end | Supporting evidence: by cloning (this study) | Supporting evidence: by similarity |
|---|---|---|---|---|---|---|---|
| Predicted miRNA genes supported by cloning | | | | | | | |
| miR-200b, miR-200a | 1 (+) | 1 008 542 | 1 009 390 | 1 010 452 | 1 010 518 | hsa-miR-429 ( | miR-429 ( |
| miR-191 (MH) | 3 (−) | 49 017 063 | 49 017 154 | 49 016 591 | 49 016 681 | hsa-miR-425–3p,5p | Rfam: hsa-miR-425 |
| miR-127, miR-136 | 14 (+) | 99 339 357 | 99 341 161 | 99 337 372 | 99 337 503 | hsa-miR-431 | |
| | | | | 99 338 264 | 99 338 356 | hsa-miR-433 | |
| miR-299, miR-323 | 14 (+) | 99 480 172 | 99 482 195 | 99 478 434 | 99 478 519 | hsa-miR-379 | Rfam: hsa-miR-379 |
| | | | | 99 483 163 | 99 483 242 | hsa-miR-329 | |
| miR-368 | 14 (+) | 99 496 068 | 99 496 133 | 99 497 151 | 99 497 236 | hsa-miR-376a-3p | Rfam: hsa-miR-376a |
| miR-134 | 14 (+) | 99 511 065 | 99 511 137 | 99 510 681 | 99 510 762 | hsa-miR-382 | Rfam: hsa-miR-382 |
| | | | | 99 512 568 | 99 512 647 | hsa-miR-453 | |
| miR-154 | 14 (+) | 99 516 133 | 99 516 216 | 99 518 408 | 99 518 516 | hsa-miR-377 | Rfam: hsa-miR-377 |
| miR-369 | 14 (+) | 99 521 976 | 99 522 045 | 99 521 669 | 99 521 773 | hsa-miR-409–3p,5p | Rfam: mmu-miR-409 |
| miR-144 | 17 (−) | 27 334 114 | 27 334 199 | 27 333 954 | 27 334 017 | hsa-miR-451 | cand919 ( |
| miR-224 (MH) | X (−) | 149 744 663 | 149 744 743 | 149 745 713 | 149 745 797 | hsa-miR-452 | |
| Predicted miRNA genes supported by similarity | | | | | | | |
| miR-92, miR-19b, miR-106a | X (−) | 132 009 175 | 132 009 915 | 132 009 008 | 132 009 096 | — | cand343 ( |
| miR-299, miR-323 | 14 (+) | 99 480 172 | 99 482 195 | 99 481 385 | 99 481 464 | — | Rfam: hsa-miR-380 |
| miR-368 | 14 (+) | 99 496 068 | 99 496 133 | 99 496 814 | 99 496 913 | — | Rfam: mmu-miR-376b |
| miR-369 | 14 (+) | 99 521 976 | 99 522 045 | 99 521 825 | 99 521 915 | — | Rfam: mmu-miR-412 |
| | | | | 99 522 290 | 99 522 369 | — | Rfam: mmu-miR-410 |
a. The precursor coordinates are listed. When the predicted miRNA is in the vicinity of a previously known miRNA cluster, the coordinates of the whole cluster are listed, from the initial coordinate of the precursor of the first miRNA to the end coordinate of the precursor of the last miRNA. MiRNAs from the MH group are marked. The cluster-founding miRNA sequences and their precursor sequences are listed in Supplementary Tables 4 and 5, respectively.
b. The chromosome number, strand and coordinates were taken from the UCSC July 2003 human genome assembly build 34 (hg16).
c. The coordinates of predicted miRNAs are on the same chromosome and strand as the known cluster member(s). Coordinates in bold designate new predictions that were submitted to Rfam since their similarity to a known ortholog was very high. They were named hsa-miR-376b, hsa-miR-412 and hsa-miR-410, in their listed order.
d. Cloned miRNAs were named following Rfam convention. miRNA names in bold designate miRNAs that were submitted to Rfam either as novel miRNAs or as new human orthologs. Hsa-miR-429 was submitted to Rfam by (37) while this paper was submitted. Hsa-miR-376a, hsa-miR-379, hsa-miR-377, hsa-miR-382 and hsa-miR-425 were each listed in Rfam as a miRNA confirmed by similarity. Here, we present experimental evidence for the existence of these miRNAs. miRNAs that were identified from both sides of the precursor stem and matched our predictions were designated with 3p and 5p. The cloned miRNA sequences and their predicted precursor sequences are listed in Supplementary Tables 4 and 5, respectively.
e. There are three types of ‘by similarity’ supporting evidence: (i) similarity to miRNAs in other mammals not recorded in Rfam; (ii) similarity to miRNAs in other mammals where only non-human orthologs are recorded in Rfam; (iii) similarity to miRNAs in other mammals with a human (‘hsa’) ortholog recorded in Rfam. None of the ‘hsa’ Rfam entries presented here as supporting the predictions was included in our analysis, as they are new entries of the miRNA registry 5.1 (19). None of these entries has direct experimental evidence in human, and they are tagged in Rfam as ‘not_experimental’. However, they are regarded as human miRNAs ‘by similarity’.
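As a consistency check on the coordinates in the table above, the gap between a predicted precursor and its founding cluster can be computed and compared with the 3000 nt limit. The sketch assumes that the gap is measured between the nearest ends of the two intervals and is zero when they overlap; the short names used as labels are for illustration, and only a few rows are included.

```python
def gap_to_cluster(cluster, predicted):
    """Nucleotides separating two (start, end) intervals; 0 if they overlap."""
    c_start, c_end = cluster
    p_start, p_end = predicted
    if p_end < c_start:   # prediction lies before the cluster
        return c_start - p_end
    if p_start > c_end:   # prediction lies after the cluster
        return p_start - c_end
    return 0              # prediction falls within the cluster span

# (label, founding-cluster coordinates, predicted precursor coordinates),
# copied from selected rows of the table above
rows = [
    ("miR-429", (1008542, 1009390), (1010452, 1010518)),
    ("miR-425", (49017063, 49017154), (49016591, 49016681)),
    ("miR-451", (27334114, 27334199), (27333954, 27334017)),
    ("miR-452", (149744663, 149744743), (149745713, 149745797)),
    ("miR-380", (99480172, 99482195), (99481385, 99481464)),
]
gaps = {label: gap_to_cluster(cluster, predicted) for label, cluster, predicted in rows}
print(gaps)
```

Under this definition, each of the listed predictions lies within 3000 nt of its founding cluster, and the miR-380 precursor falls inside the span of the miR-299/miR-323 cluster.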
Table 5. Cloning frequencies of experimentally verified newly predicted human miRNAs clustered with known human miRNAs
| Cluster-founding miRNAs | miRNA | Pituitary gland (tissue) | MCF7 (cell line) | SkBr3 (cell line) | BE(2)-M17 (cell line) |
|---|---|---|---|---|---|
| miR-200b, miR-200a | miR-429 | 2 | 2 | — | — |
| | miR-200a | 9 | 7 | 1 | — |
| | miR-200a* | 1 | — | — | — |
| | miR-200b | 9 | 7 | 1 | — |
| miR-368 | miR-368 | 10 | 2 | — | 1 |
| | miR-376a-3p | 3 | — | — | — |
| miR-369 | miR-409–5p | 2 | — | — | — |
| | miR-409–3p | — | — | — | 1 |
| | miR-369–5p | 1 | — | — | — |
| | miR-369–3p | 1 | — | — | — |
| miR-144 | miR-451 | 20 | — | — | — |
| | miR-144 | — | — | — | — |
| miR-224 | miR-452 | — | — | 1 | — |
| | miR-224 | — | — | 1 | — |
| miR-191 | miR-425–5p | 1 | 3 | 5 | 1 |
| | miR-425–3p | — | — | 3 | — |
| | miR-191 | 3 | 5 | 16 | — |
| | miR-191* | — | — | — | 1 |
| miR-127, miR-136 | miR-431 | — | — | — | 3 |
| | miR-433 | 1 | — | — | — |
| | miR-127 | 4 | — | — | 1 |
| | miR-136 | 2 | — | — | 31 |
| miR-299, miR-323 | miR-329 | 1 | — | — | — |
| | miR-379 | 4 | 1 | — | 30 |
| | miR-299–3p | — | — | — | 1 |
| | miR-299–5p | — | — | — | — |
| | miR-323 | — | — | — | — |
| miR-134 | miR-453 | — | — | — | 1 |
| | miR-382 | — | — | — | 1 |
| | miR-134 | 2 | — | — | 4 |
| miR-154 | miR-377 | 4 | — | — | 13 |
| | miR-154 | 5 | — | — | 2 |
| Total miRNA clones | | 1502 | 767 | 794 | 616 |
a. The absolute numbers of cloned sequences are given. The total number of miRNA sequences in the library is indicated at the bottom.
b. The miRNA sequences and the precursor sequences are listed in Supplementary Tables 4 and 5, respectively.
c. MCF7 and SkBr3 are human breast cancer cell lines. BE(2)-M17 is a human neuroblastoma cell line.
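Because the four libraries differ in size, the absolute clone counts above are easier to compare after dividing by the library totals in the last row. The following sketch does this for two entries; normalizing by the total is a convenience for reading the table and not a step reported in this record.

```python
# Library sizes from the "Total miRNA clones" row
TOTAL_CLONES = {
    "Pituitary gland": 1502,
    "MCF7": 767,
    "SkBr3": 794,
    "BE(2)-M17": 616,
}

def relative_frequency(count, library):
    """Share of a library's miRNA clones represented by one miRNA."""
    return count / TOTAL_CLONES[library]

# Counts copied from the table above
mir451_pituitary = relative_frequency(20, "Pituitary gland")
mir136_be2 = relative_frequency(31, "BE(2)-M17")
print(f"{mir451_pituitary:.2%}", f"{mir136_be2:.2%}")
```

With these counts, miR-451 accounts for roughly 1.3% of the pituitary gland clones and miR-136 for roughly 5% of the BE(2)-M17 clones.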