Anthony Mathelier, Alessandra Carbone.
Abstract
MicroRNAs (miRNAs) can group together along the human genome to form stable secondary structures made of several hairpins hosting miRNAs in their stems. The few known examples of such structures are all involved in cancer development. A large-scale computational analysis of human chromosomes, crossing sequence analysis and deep-sequencing data, revealed the presence of >400 structural clusters of miRNAs in the human genome. An a posteriori analysis validates predictions as bona fide miRNAs. A functional analysis of the positions of structural clusters along the chromosomes co-localizes them with genes involved in several key cellular processes, such as immune systems, sensory systems, signal transduction and development. Immune system diseases, infectious diseases and neurodegenerative diseases are characterized by genes that are especially well organized around structural clusters of miRNAs. Functional analysis of target genes strongly supports a regulatory role for most predicted miRNAs and, notably, a strong involvement of predicted miRNAs in the regulation of cancer pathways. This analysis provides new fundamental insights into the genomic organization of miRNAs in human chromosomes.
Year: 2013 PMID: 23444140 PMCID: PMC3632110 DOI: 10.1093/nar/gkt112
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Four examples of structural clusters predicted by the algorithm. (a): Structural cluster, known as mir-17-92, predicted on human chromosome 13 from deep-sequencing data. (b): Structural cluster, known as mir-106a-363, predicted on human chromosome X from deep-sequencing data. (c): Structural cluster predicted on human chromosome 19 from paralogous sequences. (d): Structural cluster predicted on human chromosome 22 by combining paralogous sequences and deep-sequencing data. In (a) and (b), miRNAs validated by the algorithm are highlighted in blue. In (c) and (d), miRNAs validated by the algorithm are highlighted in red (similar sequences) or blue (deep-sequencing reads). All structural clusters were filtered with RepeatMasker.
Figure 2. MIReStruC: an algorithm searching for miRNA structural clusters along a genome. The search starts either from repeated (similar) sequences in palindromic regions (black path) or from deep-sequencing data (red path). Predictions can also be made by combining the two kinds of information (green path).
Structural cluster predictions on human chromosomes
| Method | SC total | SC intron | SC inter | Known miRNAs in SC seq | SC with seed |
|---|---|---|---|---|---|
| Paral | 300 | 142 | 158 | 37 (16) | 179 (64) |
| Deep | 99 | 66 | 33 | 88 (43) | 84 (32) |
| Comb | 20 | 10 | 10 | 0 (0) | 14 (1) |
| All methods | 416 | 217 | 199 | 89 (43) | 276 (96) |
Predictions are made with the three paths of the algorithm, based respectively on paralogous sequences (paral), deep-sequencing reads (deep) and a combination of the two kinds of data (comb). The total number of predicted structural clusters (SCs; total), the number of predicted SCs lying in intronic regions (intron) and the number of predicted SCs lying in intergenic regions (inter) are reported for each method. The number of known miRNAs (with 100% sequence identity) occurring in predicted SC sequences and the number of SCs containing at least one predicted miRNA with the same seed as a known miRNA are also reported (last two columns). (Recall that two miRNAs have the same seed if their nucleotides at positions 2–8 are the same.) The number of known miRNAs is computed on the miRBase data set. The number of known human miRNAs is given in parentheses. A full set of information, organized by chromosome, is reported in Supplementary Table S3. The total number of predictions obtained by the three methods is reported in the last line. Identical predictions (see ‘Materials and Methods’ section) are counted once. See Supplementary Figure S3.
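The seed criterion recalled in the caption above can be sketched in a few lines of Python. This is only an illustration of the definition (identical nucleotides at positions 2–8, counted from 1 at the 5′ end); the function names `seed` and `same_seed` and the example sequences are hypothetical and are not taken from the paper or from miRBase.

```python
def seed(mirna: str) -> str:
    """Return the nucleotides at positions 2-8 (1-based) of a miRNA.

    The sequence is upper-cased and DNA-style T is mapped to U so that
    sequences written in either alphabet compare equal (an assumption
    made for this sketch).
    """
    s = mirna.upper().replace("T", "U")
    if len(s) < 8:
        raise ValueError("sequence too short to contain a seed")
    return s[1:8]  # 0-based slice covering 1-based positions 2..8


def same_seed(a: str, b: str) -> bool:
    """True if two miRNAs share the same seed (positions 2-8)."""
    return seed(a) == seed(b)


# Illustrative, made-up sequences (not real miRNAs).
predicted = "CAAAGUGCUUACAGU"
known_same = "UAAAGUGCAAGGCCU"   # differs at position 1 and after position 8
known_other = "CAAUGUGCUUACAGU"  # differs at position 4, inside the seed

print(same_seed(predicted, known_same))
print(same_seed(predicted, known_other))
```

Under this definition, a mismatch at position 1 or beyond position 8 does not affect seed identity, which is why a predicted miRNA can share a seed with a known miRNA without matching it at 100% sequence identity.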
Best miRNA/target hits localized in 3′UTR or CDS regions by miRanda
| Sets of pairs | Number of pairs | Number of 3′UTR | Number of miRNA | Number of SCs | DAVID IDs | GO-BP | | GO-MF | | KEGG | | PIR | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| miRNA/3′UTR pairs | |||||||||||||
| Large | 394 005 | 20 264 | 1263 | 349 | 11 087 | 608 | 244 | 142 | 70 | 63 | 41 | 139 | 81 |
| Small | 50 805 | 9316 | 623 | 229 | 5141 | 406 | 132 | 85 | 22 | 48 | 22 | 87 | 45 |
For 3′UTR and CDS regions, we report the number of miRNA/target pairs, the number of 3′UTRs or CDSs containing targets, the number of miRNAs for which targets are predicted, the number of different predicted structural clusters (416) with at least one miRNA that has a target, the number of different genes involving targets (DAVID_IDs), and the number of GO terms characterizing the best miRNA/target pairs and found in the BP and MF GO classes. The same analysis is reported for the KEGG data set and the PIR keywords of Swiss-Prot. For the four data sets, GO, KEGG and PIR terms with P-value, or are counted.
Structural clusters, miRNAs and targets
| Sets | 3′UTR: total | 3′UTR: ≥2 miRNAs | 3′UTR: ≥3 miRNAs | CDS: total | CDS: ≥2 miRNAs | CDS: ≥3 miRNAs |
|---|---|---|---|---|---|---|
| Large | 349 | 319 | 280 | 342 | 315 | 275 |
| Small | 229 | 177 | 121 | 213 | 158 | 116 |
The number of predicted structural clusters containing at least two or three miRNAs that target either 3′UTR or CDS regions is reported. Both large and small sets of miRNA/3′UTR and miRNA/CDS pairs are considered and for those the total number of predicted structural clusters with at least one miRNA that targets 3′UTR or CDS regions is given.
Pathways containing genes whose 3′UTR region is targeted by some predicted miRNA
Functional analysis is realized on the large set of pairs. For each pathway, the number of genes of the pathway that are targeted by some predicted miRNA, P-value, fold enrichment and Benjamini-corrected P-value are reported. The most significant outcomes are listed for GO (BP and MF), KEGG and Swiss-Prot databases. Pathways related to regulation (orange), binding (blue), signalling (pink), cancer (green) are highlighted.
Functional analysis of structural clusters
For each pathway, the number of mRNAs in the pathway containing at least one target and the number of miRNAs with at least one target in these mRNAs are reported in the last two columns. For the set of miRNAs targeting genes associated with a specific pathway, we report the number of structural clusters (SCs) containing at least one of the miRNAs in the set (second column), the number of SCs with all their miRNAs in the set (third column) and the ratio of these two numbers (fourth column). Pathways with a ratio % (blue) and % (green) are highlighted. In the second and third columns, the numbers in parentheses correspond to structural clusters predicted with deep-sequencing data. Pathways correspond to those in Table 4 and Supplementary Table S27.
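The three cluster-level quantities described in the caption above can be illustrated with a minimal Python sketch. It assumes each structural cluster is represented as a set of miRNA identifiers and that the ratio is taken as (SCs with all their miRNAs in the set) divided by (SCs with at least one miRNA in the set); the direction of the ratio, the function name `cluster_pathway_ratio` and the toy identifiers are assumptions for illustration, not details confirmed by the source.

```python
from typing import Dict, Set, Tuple


def cluster_pathway_ratio(
    clusters: Dict[str, Set[str]],
    pathway_mirnas: Set[str],
) -> Tuple[int, int, float]:
    """Summarise how structural clusters relate to one pathway.

    clusters       -- hypothetical mapping: SC identifier -> miRNAs it contains
    pathway_mirnas -- miRNAs targeting genes associated with the pathway

    Returns (SCs with >= 1 miRNA in the set,
             SCs with all their miRNAs in the set,
             ratio of the second count to the first).
    """
    # SCs sharing at least one miRNA with the pathway set.
    touched = [sc for sc, mirnas in clusters.items() if mirnas & pathway_mirnas]
    # Among those, SCs whose miRNAs all belong to the pathway set.
    full = [sc for sc in touched if clusters[sc] <= pathway_mirnas]
    ratio = len(full) / len(touched) if touched else 0.0
    return len(touched), len(full), ratio


# Toy data, invented for this example.
clusters = {
    "SC1": {"m1", "m2"},
    "SC2": {"m3", "m4"},
    "SC3": {"m5"},
}
pathway = {"m1", "m2", "m3"}

print(cluster_pathway_ratio(clusters, pathway))
```

In the toy data, SC1 and SC2 each contain a miRNA from the pathway set, but only SC1 has all of its miRNAs in the set, so one cluster out of two is fully dedicated to the pathway.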
Figure 3. Analysis of KEGG classes coverage by structural cluster regions. Curves associated to the subclass of immune system diseases (a), of sensory systems (b), and olfactory transduction (c). Each arrow corresponds to the point in the curve with largest distance from the corresponding random curve. P-values for these arrow points are . See Supplementary Table S8. Curves are constructed by interpolating on all δ values (see ‘Materials and Methods’ section for the definition of δ steps). Comparison is realized with random curves (dotted curves; see ‘Materials and Methods’ section for randomized gene selection).
Figure 4. Distribution of gene coverage by fragile sites and by structural cluster regions. Coverage is computed for all biological pathways defined in KEGG and containing at least five genes. Fragile sites cover 26.38% of chromosomes, and structural cluster regions are set to cover 26.40%. Structural cluster regions (dotted line) cover genes in KEGG biological pathways better than fragile sites do (solid line). See also Supplementary Table S30 and Figure S13.
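As a rough illustration of how a per-pathway gene coverage of this kind could be computed, the sketch below takes coverage to be the fraction of a pathway's genes that overlap at least one region (a fragile site or a structural cluster region) on the same chromosome. That definition, the coordinate representation, and the names `overlaps` and `pathway_coverage` are assumptions made for the example; the paper's exact procedure is described in its 'Materials and Methods' section and is not reproduced here. Only the minimum of five genes per pathway comes from the caption.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end), with inclusive coordinates; hypothetical layout.
Interval = Tuple[str, int, int]


def overlaps(gene: Interval, region: Interval) -> bool:
    """True if gene and region lie on the same chromosome and intersect."""
    return (
        gene[0] == region[0]
        and gene[1] <= region[2]
        and region[1] <= gene[2]
    )


def pathway_coverage(
    genes: List[Interval],
    regions: List[Interval],
    min_genes: int = 5,
) -> Optional[float]:
    """Fraction of pathway genes overlapping at least one region.

    Pathways with fewer than `min_genes` genes are skipped (None),
    mirroring the five-gene minimum stated in the caption.
    """
    if len(genes) < min_genes:
        return None
    covered = sum(any(overlaps(g, r) for r in regions) for g in genes)
    return covered / len(genes)


# Toy coordinates, invented for this example.
genes = [
    ("chr1", 100, 200),
    ("chr1", 500, 600),
    ("chr2", 50, 80),
    ("chr2", 300, 400),
    ("chrX", 10, 20),
]
regions = [("chr1", 150, 550), ("chr2", 350, 360)]

print(pathway_coverage(genes, regions))
```

Running the same function once with fragile sites and once with structural cluster regions, over every qualifying pathway, would give two coverage distributions of the kind compared in the figure.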