
ProMiR II: a web server for the probabilistic prediction of clustered, nonclustered, conserved and nonconserved microRNAs.

Jin-Wu Nam, Jinhan Kim, Sung-Kyu Kim, Byoung-Tak Zhang.

Abstract

ProMiR is a web-based service for the prediction of potential microRNAs (miRNAs) in a query sequence of 60-150 nt, using a probabilistic colearning model. Identification of miRNAs requires a computational method to predict clustered and nonclustered, conserved and nonconserved miRNAs in various species. Here we present an improved version of ProMiR for identifying new clusters near known or unknown miRNAs. This new version, ProMiR II, integrates additional evidence, such as free energy data, G/C ratio, conservation score and entropy of candidate sequences, for more controllable prediction of miRNAs in mouse and human genomes. It also provides a wider range of services, e.g. the prediction of miRNA genes in long nonrelated sequences such as viral genomes. Importantly, we have validated this method using several case studies. All data used in ProMiR II are structured in the MySQL database for efficient analysis. The ProMiR II web server is available at http://cbit.snu.ac.kr/~ProMiR2/.

Year: 2006 PMID: 16845048 PMCID: PMC1538778 DOI: 10.1093/nar/gkl321

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

MicroRNAs (miRNAs) constitute a large family of noncoding RNAs, which take part directly in posttranscriptional regulation either by arresting the translation of mRNAs or by their cleavage (1). miRNAs are defined as single-stranded RNAs of ∼22 nt in length (range 19–25 nt) generated from endogenous transcripts that can form local hairpin structures (2). Since the discovery of lin-4 and let-7, efforts to identify miRNA genes have led to the discovery of hundreds of miRNAs in animals, plants and viruses (3–6). All of them have been archived in miRBase. High-throughput miRNA identification has been accomplished by directional cloning of endogenous small RNAs (7,8). However, a limitation of this approach is that miRNAs expressed at low levels, or only under specific conditions or in specific cell types, are difficult to detect. Computational approaches can overcome this problem, at least in part. They are based on the structural and sequence characteristics of miRNA precursors. Previous computational approaches for miRNA prediction have mainly searched for miRNAs that are closely homologous to published miRNAs (9–11). However, such methods fail to detect new families that lack clear homologues. In particular, several miRNAs with genus-specific patterns require a method to predict unrelated miRNA genes. Several approaches have been proposed to search for new miRNA families using comparative genomics, based on regulatory motifs in conserved DNA and on patterns conserved among the sequences and structures of previously studied distant families (12–14). ProMiR has been used successfully to predict an miRNA in a stem–loop sequence using a score generated by a probabilistic colearning model without any other evidence (15). Here we introduce an improved method to identify the conserved and nonconserved miRNAs near known miRNAs or candidates.
This strategy is very useful because more than half of the known miRNA genes are present as tandem arrays within operon-like clusters. This new version, ProMiR II, generates a list of nearby potential miRNAs according to score and to several filtering criteria such as conservation score, entropy, G/C ratio and free energy. This enhanced method allows for low- or high-stringency prediction of conserved and nonconserved miRNA genes by adjusting the filtering criteria. Importantly, we have used it to validate the prediction of miRNA genes through two case studies.

SYSTEM SPECIFICATION

The ProMiR II web interface is implemented on a Linux server using PHP scripting. The core module of ProMiR, a probabilistic colearning model, is written in Java version 1.4.2. It uses the library of the program ‘RNAfold’ to predict the folding of a primary RNA sequence (Vienna RNA package version 1.6) (16). For efficient analysis and management, all data and information are stored in a MySQL database (version 5.0). The system runs on two dual 2.2 GHz OPTERON CPUs with four 1 GB RAM modules.

PRINCIPLE OF PROGRAM

ProMiR II is a web-based tool that searches for potential miRNAs in a given sequence or in its vicinity. It provides three programs: ProMiR-v, ProMiR-c and ProMiR-g. They share some procedures and differ in others, according to their purpose. ProMiR-v searches for clusters of miRNAs near a known miRNA sequence. It maps them on one of two genome assemblies, human (hg17) or mouse (mm7), together with known miRNAs and genes. ProMiR-c predicts clustered miRNAs near an miRNA candidate. It also maps predicted miRNAs on one of the two genome assemblies, as does ProMiR-v. If there are clustered miRNAs, the initial candidate is tagged as a likely ‘real’ miRNA. Both ProMiR-v and ProMiR-c predict human and mouse miRNAs. ProMiR-g is a general version of ProMiR, which searches for an miRNA in a stem–loop sequence. ProMiR-g provides the prediction of all potential miRNAs in a long sequence using a model for one of several species: Homo sapiens, Mus musculus, Rattus norvegicus, Gallus gallus, Drosophila melanogaster, Drosophila pseudoobscura, Caenorhabditis elegans and Caenorhabditis briggsae. All three programs extract stem–loops based on the filtering parameters by scanning a given sequence with a predefined window size (range 70–150 nt) and a given shift size (range 3–10 nt). In ProMiR-v and ProMiR-c, the orientation of the scanned sequence follows the orientation of the input query (a known miRNA or a candidate sequence). During scanning, the programs search for miRNA candidates whose ProMiR score exceeds a set threshold; the score is generated by a probabilistic model trained on published miRNAs (miRBase release 7.0). In addition, ProMiR-v and ProMiR-c can find both conserved and nonconserved miRNAs across the human and mouse genomes using conserved sequence information; ProMiR-g does not use this information because it searches for unrelated miRNAs in a given sequence.
For genome mapping, ProMiR-v retrieves the genomic coordinates of known miRNAs from the MySQL database, whereas ProMiR-c obtains the position of a query sequence on a genome by BLAT searching.
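The scanning step described above can be illustrated with a minimal sketch. It is not the ProMiR II implementation (the core module is written in Java); the function name `scan_windows` and the decision to emit only full-length windows are assumptions made for illustration. The window and shift ranges are the ones stated in the text.

```python
from typing import Iterator, Tuple


def scan_windows(sequence: str, window: int = 100, shift: int = 5) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each full-length window along the sequence.

    Hypothetical helper: the 70-150 nt window range and 3-10 nt shift range
    come from the text; skipping a trailing partial window is an assumption.
    """
    if not 70 <= window <= 150:
        raise ValueError("window size must be between 70 and 150 nt")
    if not 3 <= shift <= 10:
        raise ValueError("shift size must be between 3 and 10 nt")
    # range() is empty when the sequence is shorter than one window.
    for start in range(0, len(sequence) - window + 1, shift):
        yield start, sequence[start:start + window]


# Usage: a 200 nt toy sequence scanned with a 100 nt window and 10 nt shift.
toy_sequence = "ACGU" * 50
windows = list(scan_windows(toy_sequence, window=100, shift=10))
```

Each window would then be folded, filtered and scored; those steps are outside this sketch.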

INPUT DESIGN

The interface of the program is shown in Figure 1. The required input depends on the program. For ProMiR-v, the user selects a species (human or mouse) and one of the known miRNAs in the list box (based on miRBase release 8.0), and enters a range to define the vicinity (up to ±10 kb). For ProMiR-c, a species is selected and a candidate sequence of 70–150 nt is input as plain text, and the range of the vicinity is then set. For ProMiR-g, a long sequence (from 70 nt to 10 kb) should be entered as plain text and one of eight species is selected as the model. In ProMiR-c and ProMiR-g, the input sequence should consist of only the four bases A, T(U), G and C; no other characters are allowed. For all programs, the user also needs to set filtering parameters and a threshold for the ProMiR score. The filtering step has four parameters: minimum free energy (MFE), G/C ratio, entropy and conservation score (Cscore). The MFE parameter is the cutoff for the minimum free energy of a stem–loop structure, with a default of −25 kcal/mol; it guarantees the extraction of stem–loops of sufficient length. The G/C ratio and entropy settings filter out stem–loops made of simple repeats. The default G/C ratio ranges from 0.3 to 0.7, covering the values for most published pre-miRNAs. Entropy is entered as Shannon's entropy value, ranging from 0 to 2 (17), with a default threshold of 1.8. The Cscore uses phastCons scores for multiple alignments of eight vertebrate genomes: human (hg17), chimp (panTro1), dog (canFam1), mouse (mm5), rat (rn3), chicken (galGal2), zebrafish (danRer1) and fugu (fr1), as defined by Siepel et al. (18). The Cscore ranges from 0 to 1. If the Cscore is 0, ProMiR II will search for both conserved and nonconserved miRNAs. Otherwise, it will look only for conserved miRNAs. The default Cscore is 0. ProMiR-g does not use conserved sequence information.
The distribution of each parameter for published miRNAs is shown in Supplementary Figure S1.
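As a rough illustration of how the four filtering parameters might be combined, the sketch below computes the G/C ratio and a single-nucleotide Shannon entropy in bits (which ranges from 0 to 2 for four bases, matching the stated range) and applies the default cutoffs. Several points are assumptions rather than details given in the text: that entropy is computed on mononucleotide composition, that 1.8 is a lower bound, that a stem–loop passes when its MFE is at or below the cutoff, and that the Cscore setting acts as a minimum. The MFE and Cscore values are taken as precomputed inputs (in ProMiR II they come from RNAfold and phastCons, respectively).

```python
import math
from collections import Counter


def gc_ratio(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the base composition; T and U are treated alike."""
    counts = Counter(seq.upper().replace("T", "U"))
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def passes_filters(seq: str, mfe: float, cscore: float,
                   mfe_cutoff: float = -25.0,
                   gc_range: tuple = (0.3, 0.7),
                   min_entropy: float = 1.8,
                   min_cscore: float = 0.0) -> bool:
    """Apply the four filters with the default values quoted in the text.

    The comparison directions are assumptions (see the lead-in).
    """
    return (mfe <= mfe_cutoff
            and gc_range[0] <= gc_ratio(seq) <= gc_range[1]
            and shannon_entropy(seq) >= min_entropy
            and cscore >= min_cscore)


# Usage with toy sequences; the MFE and Cscore values are made up.
balanced = "ACGU" * 20       # even base composition
simple_repeat = "AU" * 40    # low-complexity repeat
ok = passes_filters(balanced, mfe=-30.0, cscore=0.0)
rejected = passes_filters(simple_repeat, mfe=-30.0, cscore=0.0)
```

With the default Cscore of 0, the conservation filter passes every candidate, which corresponds to searching for both conserved and nonconserved miRNAs.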

Figure 1

The input page of ProMiR II. Use is demonstrated in the online tutorial page.

ProMiR generates a score for the classification of a stem–loop. If the score is greater than the given threshold, ProMiR predicts the stem–loop to be an miRNA candidate. The higher the threshold, the greater the specificity of classification; the lower the threshold, the greater the sensitivity, as shown in the receiver operating characteristic (ROC) curve (Supplementary Figure S2). The default threshold value is 0.033.
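The trade-off between sensitivity and specificity can be made concrete with a small sketch. The scores and labels below are invented for illustration and are not ProMiR output; only the decision rule (score greater than threshold) and the default threshold of 0.033 come from the text. The function name is hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold=0.033):
    """Classify each stem-loop as a candidate if score > threshold and
    return (sensitivity, specificity) against the true labels."""
    tp = fn = tn = fp = 0
    for score, is_mirna in zip(scores, labels):
        predicted = score > threshold
        if is_mirna and predicted:
            tp += 1
        elif is_mirna:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (made up): three true miRNAs followed by three non-miRNAs.
toy_scores = [0.20, 0.05, 0.02, 0.04, 0.01, 0.005]
toy_labels = [True, True, True, False, False, False]

default_point = sensitivity_specificity(toy_scores, toy_labels)         # threshold 0.033
strict_point = sensitivity_specificity(toy_scores, toy_labels, 0.1)     # higher threshold
lenient_point = sensitivity_specificity(toy_scores, toy_labels, 0.001)  # lower threshold
```

On this toy data, raising the threshold to 0.1 removes the false positive but misses two true miRNAs, while lowering it to 0.001 recovers every true miRNA at the cost of calling every negative a candidate.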

SYSTEM OUTPUT DESIGN

ProMiR II produces three reports (Figure 2). The first is a summary of input parameters. The second shows predicted miRNAs, known miRNAs and genes on a map. The third lists the miRNA candidates in order of position. The information shown for each predicted miRNA candidate includes its position, its sequence and a note. More detailed information, including parameter values and the secondary structure, is available on a linked page.

Figure 2

The output page of ProMiR II. This is also explained in the online tutorial page.

EXAMPLES

Clustered mouse miRNAs

To test whether there are clustered miRNAs in the vicinity of a new mouse miRNA, identified by cloning and northern blotting, we applied ProMiR-c with a ProMiR score threshold of 0.017 and the default values of conservation score, entropy, MFE and G/C ratio. The search range was ±10 kb around the position of the new miRNA. The window and shift sizes were 100 and 5 nt, respectively. The program found five upstream and four known downstream clustered miRNAs, and predicted six new clustered miRNA candidates. The results are summarized in Supplementary Figure S4.

Nonrelated viral miRNAs

We analyzed a genome sequence of the human cytomegalovirus (HCMV; complete genome of strain AD169; GenBank accession no. X17403) to search for potential miRNAs using ProMiR-g. HCMV is a member of the herpesvirus family and has a double-stranded DNA genome of 229 354 bp (19). Nine miRNAs have been identified to date. Because HCMV does not have genes related to miRNA processing, it must use human genes when infecting human immune cells. We therefore assumed that its miRNAs share the same recognition and processing mechanisms, and used the human miRNAs as training data to search for HCMV miRNAs. ProMiR-g predicted 51 candidates using a ProMiR score threshold of 0.01 and the default values of entropy, MFE and G/C ratio. The window and shift sizes were 100 and 10 nt, respectively. The candidates include five of the nine published miRNAs (hcmv-mir-UL36-1, hcmv-mir-UL112-1, hcmv-mir-US5-1, hcmv-mir-US5-2 and hcmv-mir-US33-1). Results are detailed in the Supplementary Data.
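To give a sense of the search space behind the 51 candidates, the following back-of-the-envelope sketch counts the window positions implied by the stated genome length, window size and shift size. It assumes that only full-length windows are evaluated and counts a single strand; neither detail is stated in the text, and the function name is hypothetical.

```python
def count_windows(length: int, window: int, shift: int) -> int:
    """Number of full-length windows of size `window` placed every `shift` nt."""
    if length < window:
        return 0
    return (length - window) // shift + 1


# HCMV AD169 genome length, window size and shift size as quoted in the text.
hcmv_windows = count_windows(229354, window=100, shift=10)
```

Under these assumptions the scan covers 22 926 window positions per strand.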

DISCUSSION

ProMiR is applicable to all species given sufficient training data, and searches for related and unrelated miRNAs. Evaluation of ProMiR was performed by plotting ROCs using 5-fold cross-validation according to 15 classification thresholds (Supplementary Figure S2). ProMiR showed good performance in six species, excluding the Caenorhabditis genus.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

17 in total

1. Identification of novel genes coding for small expressed RNAs.

Authors: M Lagos-Quintana; R Rauhut; W Lendeckel; T Tuschl
Journal: Science Date: 2001-10-26 Impact factor: 47.728

Review 2. MicroRNAs: genomics, biogenesis, mechanism, and function.

Authors: David P Bartel
Journal: Cell Date: 2004-01-23 Impact factor: 41.582

3. The microRNA Registry.

Authors: Sam Griffiths-Jones
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

4. Vienna RNA secondary structure server.

Authors: Ivo L Hofacker
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

5. Identification of microRNAs and other tiny noncoding RNAs by cDNA cloning.

Authors: Victor Ambros; Rosalind C Lee
Journal: Methods Mol Biol Date: 2004

6. Phylogenetic shadowing and computational identification of human microRNA genes.

Authors: Eugene Berezikov; Victor Guryev; José van de Belt; Erno Wienholds; Ronald H A Plasterk; Edwin Cuppen
Journal: Cell Date: 2005-01-14 Impact factor: 41.582

7. An extensive class of small RNAs in Caenorhabditis elegans.

Authors: R C Lee; V Ambros
Journal: Science Date: 2001-10-26 Impact factor: 47.728

8. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes.

Authors: Adam Siepel; Gill Bejerano; Jakob S Pedersen; Angie S Hinrichs; Minmei Hou; Kate Rosenbloom; Hiram Clawson; John Spieth; Ladeana W Hillier; Stephen Richards; George M Weinstock; Richard K Wilson; Richard A Gibbs; W James Kent; Webb Miller; David Haussler
Journal: Genome Res Date: 2005-07-15 Impact factor: 9.043

Review 9. The human cytomegalovirus.

Authors: Santo Landolfo; Marisa Gariglio; Giorgio Gribaudo; David Lembo
Journal: Pharmacol Ther Date: 2003-06 Impact factor: 12.310

10. Computational identification of Drosophila microRNA genes.

Authors: Eric C Lai; Pavel Tomancak; Robert W Williams; Gerald M Rubin
Journal: Genome Biol Date: 2003-06-30 Impact factor: 13.583
