Sung-Chou Li, Chao-Yu Pan, Wen-chang Lin.
Abstract
BACKGROUND: MicroRNAs (miRNAs) function in many physiological processes, and discovering new miRNAs helps in studying those functions further. However, many of the miRNAs predicted from genomic sequences have not been experimentally validated as authentic expressed RNA transcripts, which reduces the reliability of miRNA discovery. To overcome this problem, we examined expressed transcripts (ESTs and intronic sequences) to identify novel miRNAs as well as their target genes.
Year: 2006 PMID: 16813663 PMCID: PMC1526439 DOI: 10.1186/1471-2164-7-164
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Illustration of how to infer putative miRNAs. After applying Srnaloop, we noted the positions of the first nucleotide (Pfn) and the terminal nucleotide (Ptn) of the terminal loop. We elongated each putative miRNA by two nucleotides at each end. By doing so, we acquired 26-nt putative miRNAs, each of which spans positions (Pfn-24, Pfn+1) or (Ptn+1, Ptn+26) within its candidate hairpin.
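The window arithmetic in this caption can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `putative_mirna_windows` is hypothetical, positions are assumed to be 1-based and inclusive, and returning `None` for a window that runs off the hairpin is an assumption made here.

```python
def putative_mirna_windows(hairpin, pfn, ptn, length=26):
    """Return the two putative miRNA windows flanking the terminal loop.

    hairpin: candidate hairpin sequence (string).
    pfn, ptn: positions of the first and terminal nucleotide of the
              terminal loop (assumed 1-based, inclusive).
    length:   window length; 26 nt as stated in the caption.
    """
    def cut(start, end):
        # Assumption: a window that falls outside the hairpin is dropped.
        if start < 1 or end > len(hairpin):
            return None
        return hairpin[start - 1:end]  # convert to 0-based slicing

    # 5' arm: positions Pfn-24 .. Pfn+1 (26 nt when length == 26)
    five_prime = cut(pfn - (length - 2), pfn + 1)
    # 3' arm: positions Ptn+1 .. Ptn+26
    three_prime = cut(ptn + 1, ptn + length)
    return five_prime, three_prime
```

For an 80-nt hairpin with Pfn = 35 and Ptn = 42, the two windows cover positions 11 to 36 and 43 to 68.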
Figure 2. Evaluation of inferring putative miRNAs and sequence comparison among mir-192 orthologs from distinct species. (a) We inferred putative miRNAs from 195 known pre-miRNAs (detected by Srnaloop). We then compared the sequences of the known mature miRNAs with those of the putative miRNAs. The results show that 93.8% of the known miRNAs were almost entirely included in the putative miRNAs inferred from their corresponding precursors. This high level of coverage enabled us to use the putative miRNA sequences for the conservation examination. (b) Mir-192 is found in human (hsa), mouse (mmu) and rat (rno). Using ClustalW [30], we compared the sequences of the mir-192 orthologs. The alignment shows that most of the mismatches occur in the terminal loop, the opposite arm and the external portion of the hairpins. Besides the mature functional sequences, the entire pre-miRNA sequences are also highly conserved.
Distributions and optimal ranges of pre-miRNA quantifiable features. We calculated the distributions of quantifiable features, namely GC content, core mfe, hairpin mfe and the ratio of core mfe to hairpin mfe. Because of the existence of extreme values, we adopted the reference value rather than the original distribution ranges in the Sequence & Structural features filter. This strategy led to 90% coverage and 86.5% sensitivity.
| Feature | GC content (%) | Core mfe | Hairpin mfe | Ratio of core mfe to hairpin mfe (%) |
| Distribution | 21 ~ 68 | -42.5 ~ -11.2 | -56.1 ~ -24.02 | 36 ~ 96 |
| Reference value | 30 ~ 60 | -42.5 ~ -17.0 | -50.0 ~ -24.02 | 50 ~ 96 |
| Coverage | 182/195 = 93% | 193/195 = 99% | 193/195 = 99% | 193/195 = 99% |
| Total coverage | 179/195 = 90% | Sensitivity | 179/207 = 86.5% | |
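A minimal sketch of how the reference values in this table could be applied as a filter. The names `REFERENCE_RANGES`, `gc_content` and `passes_filter` are hypothetical, the bounds are assumed to be inclusive, and the mfe values are assumed to be supplied by the caller (for example from RNAfold output) rather than computed here.

```python
# Reference values copied from the table; inclusive bounds are an assumption.
REFERENCE_RANGES = {
    "gc_content": (30.0, 60.0),      # percent
    "core_mfe": (-42.5, -17.0),
    "hairpin_mfe": (-50.0, -24.02),
    "mfe_ratio": (50.0, 96.0),       # core mfe / hairpin mfe, in percent
}


def gc_content(seq):
    """GC content of a sequence, in percent."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def passes_filter(features, ranges=REFERENCE_RANGES):
    """True if every quantifiable feature lies inside its reference range.

    features: dict with the same keys as `ranges`.
    """
    return all(low <= features[name] <= high
               for name, (low, high) in ranges.items())
```

A candidate with 45% GC content, a core mfe of -30, a hairpin mfe of -35 and a ratio of 85.7% passes; the same candidate with 68% GC content is rejected even though 68% lies inside the observed distribution.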
Statistics of candidates from different result sets. We started with 842,212 ESTs and 209,904 intronic sequences, which originally contained 26 and 60 pre-miRNAs, respectively. After conservation examination, target finding, and checking the conservation of target genes, we obtained the HMDR, HMDR_Target(H), HMDR_Target(M), HMDR_Target(R), HMDR_Target(HM) and HMDR_Target(HR) sets and calculated the specificity and sensitivity for each set. HMDR contains the pre-miRNA candidates conserved in all four genomes (human, mouse, dog and rat). HMDR_Target(H), HMDR_Target(M) and HMDR_Target(R) contain the HMDR candidates that also have human, mouse and rat target genes, respectively. HMDR_Target(HM) and HMDR_Target(HR) contain the HMDR candidates that also have human and mouse, or human and rat, target genes in orthologous pairs.
| miRNA Candidate Set | Our Candidate | Known miRNA | Our Candidate | Known miRNA | Our Candidate | Known miRNA | Sensitivity | Specificity |
| HMDR | ||||||||
| HMDR_Target(H) | ||||||||
| HMDR_Target(M) | ||||||||
| HMDR_Target(R) | ||||||||
| HMDR_Target(HM) | ||||||||
| HMDR_Target(HR) | ||||||||
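The set definitions in the caption above can be expressed as plain set operations. The sketch below is illustrative only: the function `build_result_sets` and its three input structures are hypothetical, and it does not reproduce the authors' pipeline or their sensitivity and specificity calculations.

```python
def build_result_sets(conserved, targets, orthologs):
    """Build the six candidate sets named in the caption.

    conserved: dict species -> set of candidate ids conserved in that genome
    targets:   dict species -> dict candidate id -> set of target gene ids
    orthologs: dict (species_a, species_b) -> set of (gene_a, gene_b) pairs
    All three structures are assumptions made for this sketch.
    """
    # HMDR: candidates conserved in all four genomes.
    hmdr = set.intersection(
        *(conserved[s] for s in ("human", "mouse", "dog", "rat")))

    def with_targets(species):
        # HMDR candidates with at least one target gene in `species`.
        return {c for c in hmdr if targets[species].get(c)}

    def with_orthologous_targets(a, b):
        # HMDR candidates whose targets in `a` and `b` include an orthologous pair.
        pairs = orthologs[(a, b)]
        return {c for c in hmdr
                if any((gene_a, gene_b) in pairs
                       for gene_a in targets[a].get(c, ())
                       for gene_b in targets[b].get(c, ()))}

    return {
        "HMDR": hmdr,
        "HMDR_Target(H)": with_targets("human"),
        "HMDR_Target(M)": with_targets("mouse"),
        "HMDR_Target(R)": with_targets("rat"),
        "HMDR_Target(HM)": with_orthologous_targets("human", "mouse"),
        "HMDR_Target(HR)": with_orthologous_targets("human", "rat"),
    }
```

Each Target set is a subset of HMDR by construction, and the orthologous-pair sets are subsets of the corresponding single-species sets.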
Sensitivity test on 130 newly published pre-miRNAs. We tested the sensitivity of the same criteria, derived from the 207 original pre-miRNAs, when applied to 130 newly published pre-miRNAs based on release 8.0. After the hairpin finding procedure, 116 of the 130 input pre-miRNAs were detected. We calculated the distributions of quantifiable features, namely GC content, core mfe, hairpin mfe and the ratio of core mfe to hairpin mfe. Applying these criteria in the Sequence & Structural features filter led to 85% sensitivity, similar to the result obtained from the test on the original 207 pre-miRNAs.
| Feature | GC content (%) | Core mfe | Hairpin mfe | Ratio of core mfe to hairpin mfe |
| Distribution | 22 ~ 72 | -46.1 ~ -6.8 | -56.1 ~ -13.3 | 0.36 ~ 0.96 |
| Reference value | 30 ~ 60 | -42.5 ~ -17.0 | -50.0 ~ -24.02 | 0.50 ~ 0.96 |
| Coverage | 114/116 = 98% | 112/116 = 97% | 112/116 = 97% | 115/116 = 99% |
| Total coverage | 110/116 = 95% | Sensitivity | 110/130 = 85% | |
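The last row of this table uses two different denominators, which the short calculation below makes explicit. The variable names are hypothetical; the counts are taken from the table and its caption.

```python
def percentage(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


published = 130   # newly published pre-miRNAs used as input
detected = 116    # of those, detected by the hairpin finding procedure
passed = 110      # of those, passing all four feature criteria

# Coverage is measured against the detected hairpins only.
total_coverage = percentage(passed, detected)
# Sensitivity is measured against every input pre-miRNA.
sensitivity = percentage(passed, published)

print(f"total coverage: {total_coverage:.1f}%")
print(f"sensitivity:    {sensitivity:.1f}%")
```

Sensitivity is lower than total coverage because the 14 pre-miRNAs missed at the hairpin finding step count against it as well.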
Figure 3. Illustration of the TDL_miRBase dataset report interface. (a) The complete information report for a candidate hairpin includes host gene, host gene NM accession number (for intronic candidates), genomic location, expression level and match to known miRNAs. The score and minimum free energy (mfe) are the output results from Srnaloop and RNAfold, respectively. (b) Target gene information for a candidate from hairpin Ih788. Target genes were discovered by RNAhybrid and pre-defined conserved motif seeds as described in the text. Optimal free energy and RNA duplex mfe are the output values of RNAhybrid. (c) GO information for the Ih788 host gene and one of its target genes. (d) Orthologous target genes report. Some of the target genes were found to be orthologous pairs according to Ensembl gene information. They are displayed as human-mouse or human-rat pairs.