Literature DB >> 23703208

ComiR: Combinatorial microRNA target prediction tool.

Claudia Coronnello1, Panayiotis V Benos.   

Abstract

ComiR is a web tool for combinatorial microRNA (miRNA) target prediction. Given an messenger RNA (mRNA) in human, mouse, fly or worm genomes, ComiR computes the potential of being targeted by a set of miRNAs, each of which can have zero, one or more targets on its 3'untranslated region. In determining the regulatory potential of an mRNA from a set of miRNAs, ComiR uses user-provided miRNA expression levels in a combination of appropriate thermodynamic modeling and machine learning techniques to make more accurate predictions. For each gene, ComiR returns the probability of being a functional target of a set of miRNAs, which depends on the relative miRNA expression levels. The tool provides a user-friendly interface to input a miRNA expression table containing many sample information and filter out the most relevant miRNAs. ComiR results can be downloaded or visualized on a table, which can then be used to select the most relevant targets and to compare the results obtained with different miRNA expression input. ComiR is freely available for academic use at http://www.benoslab.pitt.edu/comir/.

Entities:  

Mesh:

Substances:

Year:  2013        PMID: 23703208      PMCID: PMC3692082          DOI: 10.1093/nar/gkt379

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

MicroRNAs (miRNA) are a class of short (18–25 nucleotide) non-coding RNAs that regulate gene expression post-transcriptionally. Their regulatory activity depends heavily on the recognition of binding sites located mainly on the 3′-untranslated regions (3′UTRs) of target messenger RNA (mRNA) (1). Existing computational tools predict the miRNA targets by considering site-specific factors of target sites [see (2–4) for some examples]. These tools, tested against the available experimentally validated miRNA-target pairs each, were able to predict only a portion of them, while their target overlap remains poor (5,6). However, the efficiency of miRNA-mediated regulation can be affected by multiple system-wide factors (7,8), such as miRNA expression, combinatorial binding of multiple miRNA targets or the expression of competitive endogenous RNA. ComiR addresses two of them, namely how miRNA expression affects binding and how combinatorial miRNA binding affects mRNA regulation. We note that some recent algorithms also consider these factors when they decide on targets: PicTar (9) is a computational method for identifying common targets of miRNAs, but it does not consider the miRNA expression in determining the relative binding; GenMir++ (10), TargetMiner (11) and TaLasso (12) evaluate the relevance of miRNA:mRNA interactions by analyzing expression profiles and prior targeting information, but they require paired mRNA and miRNA expression profiles over many samples. Furthermore, none of them incorporates miRNA expression in the binding model. Recent experimental approaches, like those based on immunoprecipitation (IP) of miRISC proteins (RNA-induced silencing complex) (13,14), generate validated miRNA:mRNA target data in a high-throughput fashion. These data are useful to unravel the systemic aspects of miRNA targeting. Indeed, miRNA expression profiles are now known in several tissues and cell lines (15) and as a result it becomes more important to know the targets of a set of miRNA genes instead of individual miRNAs. Of course, one might infer the targets of a set from the targets of its members, but as we have recently shown, a naïve inference model does not produce good results (16). To this end, we developed ComiR (http://www.benoslab.pitt.edu/comir/), an algorithm that predicts the regulatory potential of a set of miRNA genes toward a given mRNA. ComiR initially integrates miRNA expression in a scoring scheme for each single target on this mRNA, and then it additively combines the weighted scores of the single targets. ComiR uses four popular scoring schemes [the ones used by miRanda (4), PITA (2), TargetScan (3) and mirSVR (17)] and at the final stage it combines all four integrative scorings into one using a support vector machine (SVM) trained on Drosophila AGO1 IP data (18). The end result is the SVM probability, which represents the likelihood that this mRNA is regulated by the set of miRNA genes. Our results (16) have shown that ComiR presents a significant improvement over standard miRNA prediction algorithms adapted for combinatorial miRNA targeting.

TOOL DESCRIPTION

General framework

ComiR is a tool designed to predict the targets of a set of miRNAs by considering the miRNA expression levels as an integral part of the target decision process. The user has the option to add his/her own target set of genes (e.g. 3′UTRs of mRNAs, LINCs) to the default list of genes considered for the available species (Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans) Below, without loss of generality, references to ‘target sequences’ or ‘mRNAs’ should include the 3′UTRs of the mRNAs together with any other custom target sequence set the user submits. For each mRNA, ComiR determines the potential of being targeted by the miRNA set. The targeting potential is calculated in two steps. In the first step, four different methods are used that complement each other to some extent. One method is an adaptation of miRanda (4), in which the probability of an mRNA:miRNA binding is calculated based on the Fermi–Dirac equation (16,19), which takes into consideration the miRNA expression, and the individual probabilities are summed over all targets on this mRNA of all miRNAs in the miRNA set. The second method is a similar adaptation of PITA (2), in which the Fermi–Dirac equation also substitutes the standard energies. As the third method, we use the TargetScan (3) scoring (without conservation) weighted by each miRNA expression level. Finally, mirSVR (20) is also used, but its scores are also combined after they have been weighted by miRNA expression. In the second step, we combine the predictions of the above four methods with a SVM, which is trained on a high-quality data set derived from AGO1 IP experiments in D.melanogaster. It has been shown (16) that this model can be efficiently applied to predict miRNA targets in other species, after a normalization step. ComiR’s output is a ranked list of genes based on the target probability computed through the SVM model. Because the score computation includes the miRNAs expression values, the same set of miRNAs might generate different ranking if different miRNA expression levels are used. ComiR’s pipeline is shown in Figure 1.
Figure 1.

Overview of the integrated analysis in ComiR.

Overview of the integrated analysis in ComiR.

Databases and pre-computed binding scores

The ComiR help page provides links to the FASTA files with the mRNA and miRNA sequences used to run ComiR. We restricted our analysis on the mRNA 3′UTR sequences that we downloaded from the ENSEMBL Web site (as in 20 September 2012). When more than one 3′UTR sequence is associated to the same ENSEMBL gene ID, we use the longest sequence. miRNAs mature sequences were downloaded from the miRBase Web site (Release 19). An internal ID conversion tool is used to convert the old miRNA IDs with the updated ones. Regardless, we suggest the users to refer to the latest version of miRBase official ID. miRNA binding scores have been pre-computed for every 3′UTR gene sequence according to the original miRanda, PITA, TargetScan and mirSVR scoring systems and they are stored on our server for a fast access. PITA and TargetScan ran with their default parameters. The miRanda’s parameters have been relaxed to include in the predictions the weaker binding sites (score = 0, energy = 0). Finally, we used the mirSVR prediction files labeled as ‘Good mirSVR score, Conserved miRNA’. The single target scores we use as basis to calculate the overall regulatory potential of each miRNA:mRNA pair are the following. (i) miRanda-type scoring: single target score is the binding energy of each binding site. (ii) PITA-type scoring: single target score is the difference between the free energy gained by the binding of the miRNA to the target and the free energy lost by unpairing the target-site nucleotides (ΔΔG) of each binding site. (iii) TargetScan-type scoring: single target score is the number of detected binding sites (no phylogenetic conservation features are used). (iv) mirSVR scoring: single target score is mirSVR score of the predicted target sites.

Input formats

ComiR accepts miRNA expression levels input as comma-separated field table format. The first row contains the sample IDs where miRNA expression levels are detected and the first column contains miRNA IDs. If miRNA expression levels are expressed in log scale (e.g. from microarray measurements), ComiR will internally convert them into the linear scale, provided the user checks the corresponding checkbox. We also allow users to submit a list of miRNA IDs without expression levels, in which case ComiR assumes that all miRNAs are expressed at the same level. Custom mRNA sequences can be inserted with FASTA file format.

Data pre-processing

The input miRNA IDs are first converted into the latest miRBase IDs. The miRNAs that are not found in the stored data set related to the selected species are removed from the analysis and the user is notified. If miRNA expression levels are provided, the miRNA concentrations are computed as the fraction of expression level over the total amount, for each sample. It is possible to focus the analysis on the top expressed miRNA, by selecting the percentage of total miRNA abundance to be covered in the ‘More Options’ panel, and/or a threshold for the minimum expression level to be considered. It is also possible to input a list of miRNAs to be excluded from the filtering, even if their expression level does not reach the desired threshold. The list of miRNAs can be treated in two different ways, i.e. as a set of miRNAs or as single miRNAs. One of the two options can be selected in the ‘More Options’ panel. If the user wants the miRNAs in the list to be considered as a set, ComiR will compute one score for each gene. For multiple samples, ComiR will calculate one score per gene per sample. If the user wants each miRNA in the list to be considered independently, ComiR will compute one set of scores per miRNA, by using a concentration equal to 1 for each of them even if the miRNA input contains expression levels. If custom target sequences are provided, ComiR will run the publicly available scripts of miRanda, PITA and TargetScan tools, to obtain the initial binding energy and target prediction scores and these values will be transformed and combined as we describe above.

Score combination and method integration

As we mentioned above, ComiR uses four complementary methods to assess the binding potential of a miRNA to a single target in each 3′UTR. One is based on simple binding energies, which we take from the miRanda scoring file. The second is based on binding energies after the mRNA secondary structure has been factored in. For this we use the PITA scoring file. The third is the number of string matches of the miRNA seed sequence, which we take from the TargetScan primary predictions. And the fourth method is the prediction score of mirSVR. ComiR transforms these primary scores of individual targets using the miRNA expression and sums the transformed scores to account for multiple miRNA targets. The summing is a simple weighted sum in the case of TargetScan and mirSVR types of scores, whereas in the case of miRanda and PITA types we use the Fermi–Dirac equation. The details are provided in (16). Briefly, the Fermi–Dirac model is the following: where , N is the number of miRNAs in the input set, n is the number of binding sites predicted for the miRNAi:mRNAk pair and for each of them E is the binding energy computed for the binding site j. For the seed matching scores and the mirSVR scores we use a simpler scoring scheme (weighted sum): At the end of this step, each gene is associated with four scores, one for each primary method. These four scores are then used as input to the SVM model, which has been calculated on a D.melanogaster AGO1 IP data set (16). To use the SVM to efficiently predict targets in other species, the algorithm includes a normalization step. Specifically, the input to the SVM is the ranking of the genes with respect to the computed combined scores. This expedient avoids the biases due to different prediction scores distributions in different species and to the variety of miRNAs the user may use as input. Finally, the SVM computed probability is chosen as the ComiR score referring to the probability of an mRNA to be regulated by the set of miRNAs with the specific expression levels.

Submission and wait time

Depending on the size of the input files and the chosen species, the analysis could take anywhere from seconds to up to an hour. As an example, for D.melanogaster predictions with 25 miRNAs, the computing time is roughly 25 s, while the same number of miRNAs in H.sapiens predictions is processed in roughly 5 min. The inclusion of custom mRNA sequences will increase the execution time on the server linearly with the number and the length of the inserted sequences. For instance, processing one custom sequence of 3000 bases and 25 miRNAs requires about 45 extra seconds. After submission, a summary page describing the input and the results of the data pre-processing is shown. If the input setting needs to be adjusted then a ‘Go to Input settings’ button leads back to the input page. Otherwise the ‘Go to Results’ button brings to an execution log page. This page will contain the results once the job finishes and the link to it can be saved so it can be visited later.

ComiR output

The user can decide to download the complete results in a table format or to browse and filter the results table in the web page. The downloadable results zip file contains the Ensembl ID and the Official Gene ID columns followed by the computed ComiR scores columns, one per each miRNA set present in the miRNA input file. If custom mRNA sequences have been inserted, the file will also contain a row for each inserted sequence. The results web page shows a table with the first three columns describing the genes followed by the computed ComiR score columns. The genes are identified by the Ensembl ID, Entrez ID and Official Gene Name. Ensembl IDs and Entrez IDs are linked to the correspondent external database. The user can sort the table rows by any header and in ascending or descending order. It is possible to modify the shown results table by removing all the genes not predicted as targets by any of the input miRNA sets, by choosing the ComiR score threshold above which the genes has to be considered as targets. Finally, if more than one miRNA set is inserted, it is possible to visualize an extra column containing the difference between the ComiR scores obtained with any two miRNA sets. Moreover, if enough data are available, it is possible to perform a Wilcoxon analysis comparing the ranks of the genes in ComiR predictions of two groups of at least three miRNA sets. In fact, it is possible to select the composition of the two groups of samples to be compared and a new column containing the Wilcoxon-test P-value will be shown. In this case, the score difference column will contain the difference between the average ComiR scores in the two groups. The genes with significant P-values are what we call the ‘differentially predicted’ genes. The results visualization and the differentially predicted genes analysis can also be performed by considering the rank of the genes with respect of the associated ComiR score instead of the ComiR score itself.

CASE STUDY SCENARIOS

The primary goal of ComiR is to rank the targets of a set of miRNAs with known expression levels in the sample of interest. In (16) we described some applications of ComiR in this basic direction, where we predicted the targets of D.melanogaster, C.elegans and H.sapiens miRNA sets and showed that ComiR outperforms existing target prediction tools. Moreover, ComiR can be applied to perform more elaborate analysis, and some examples are described above.

Single-nucleotide polymorphisms effects on miRNA binding

Single-nucleotide polymorphisms (SNP) can be located in miRNA binding regions and consequently they could affect gene expression. ComiR can be used to evaluate the SNP’s effect on miRNA regulation by inserting as input the wild type and the SNP mRNA sequence and ranking the evaluated ComiR scores. As an example, in (16) we evaluated the effect of 15 high-frequency SNPs in ERα pathway genes on the affinity of the affected genes with every known human miRNA. We found that rs17737058, located in the 3′UTR of NCOA1 gene, causes the disruption of the binding with the hsa-miR-488* miRNA. This effect has been experimentally validated, and we also found that this SNP is associated with decreased bone mineral density.

Differentially predicted genes

This analysis is thought to compare the target predictions obtained with different miRNA sets. It is, for instance, conceived for data sets containing replicates of miRNA expression levels data in different samples. The differentially predicted genes analysis can be applied as an alternative strategy to predicting the targets of differentially expressed miRNAs. In fact, with the same list of miRNAs, ComiR generates different results, depending on the concentrations associated with the miRNAs. It is worth noticing that while predicting the targets of the differentially expressed miRNAs does not take into account the presence of all the other miRNAs that might prevail over them, ComiR can be used to predict the targets of the top-expressed miRNAs in each sample, and consequently used to search for which genes are highly predicted in one group of samples and not in the other (21).

Single miRNA contribution

ComiR algorithm incorporates the information about all the input miRNAs into a single score. If the user is interested in determining which miRNAs contributed more to the formation of one specific gene’s score, we suggest to run ComiR with the same miRNA set, but with the option ‘single miRNAs’ activated. This analysis will run a separated ComiR analysis for each miRNA in the set. The higher is the contribution of a specific miRNA to the ComiR score associated to the entire miRNA set, the higher will be the ComiR score associated to the single miRNA.

CONCLUSIONS

In recent years, with the availability of high-throughput miRNA expression data, a new perspective for miRNA target prediction is starting to emerge. Many target prediction tools already exist, but none of them takes into account system-level factors, like the co-expression of several miRNAs with sample-specific expression levels. ComiR has been developed specifically to predict the targets of sets of miRNAs, and its user-friendly interface allows researchers to compare the results obtained with groups of them. A number of issues remain unsolved, however, such as the competition of miRNAs for the Ago proteins (22), or the extent of the co-operative effect of miRNAs. As more relevant data sets become available we will extend the miRNA binding models to incorporate these effects.

FUNDING

National Institutes of Health (NIH) - National Library of Medicine (NLM) R01 [LM007994 and LM009657 to P.V.B.]; Fondazione Ri.MED (to C.C.). Funding for open access charge: NIH - NLM R01 [LM009657]. Conflict of interest statement. None declared.
  21 in total

1.  Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets.

Authors:  Benjamin P Lewis; Christopher B Burge; David P Bartel
Journal:  Cell       Date:  2005-01-14       Impact factor: 41.582

2.  Combinatorial microRNA target predictions.

Authors:  Azra Krek; Dominic Grün; Matthew N Poy; Rachel Wolf; Lauren Rosenberg; Eric J Epstein; Philip MacMenamin; Isabelle da Piedade; Kristin C Gunsalus; Markus Stoffel; Nikolaus Rajewsky
Journal:  Nat Genet       Date:  2005-04-03       Impact factor: 38.330

3.  Using expression profiling data to identify human microRNA targets.

Authors:  Jim C Huang; Tomas Babak; Timothy W Corson; Gordon Chua; Sofia Khan; Brenda L Gallie; Timothy R Hughes; Benjamin J Blencowe; Brendan J Frey; Quaid D Morris
Journal:  Nat Methods       Date:  2007-11-18       Impact factor: 28.547

4.  The role of site accessibility in microRNA target recognition.

Authors:  Michael Kertesz; Nicola Iovino; Ulrich Unnerstall; Ulrike Gaul; Eran Segal
Journal:  Nat Genet       Date:  2007-09-23       Impact factor: 38.330

5.  Transfection of small RNAs globally perturbs gene regulation by endogenous microRNAs.

Authors:  Aly A Khan; Doron Betel; Martin L Miller; Chris Sander; Christina S Leslie; Debora S Marks
Journal:  Nat Biotechnol       Date:  2009-06       Impact factor: 54.908

6.  mirWIP: microRNA target prediction based on microRNA-containing ribonucleoprotein-enriched transcripts.

Authors:  Molly Hammell; Dang Long; Liang Zhang; Andrew Lee; C Steven Carmack; Min Han; Ye Ding; Victor Ambros
Journal:  Nat Methods       Date:  2008-09       Impact factor: 28.547

7.  A mammalian microRNA expression atlas based on small RNA library sequencing.

Authors:  Pablo Landgraf; Mirabela Rusu; Robert Sheridan; Alain Sewer; Nicola Iovino; Alexei Aravin; Sébastien Pfeffer; Amanda Rice; Alice O Kamphorst; Markus Landthaler; Carolina Lin; Nicholas D Socci; Leandro Hermida; Valerio Fulci; Sabina Chiaretti; Robin Foà; Julia Schliwka; Uta Fuchs; Astrid Novosel; Roman-Ulrich Müller; Bernhard Schermer; Ute Bissels; Jason Inman; Quang Phan; Minchen Chien; David B Weir; Ruchi Choksi; Gabriella De Vita; Daniela Frezzetti; Hans-Ingo Trompeter; Veit Hornung; Grace Teng; Gunther Hartmann; Miklos Palkovits; Roberto Di Lauro; Peter Wernet; Giuseppe Macino; Charles E Rogler; James W Nagle; Jingyue Ju; F Nina Papavasiliou; Thomas Benzing; Peter Lichter; Wayne Tam; Michael J Brownstein; Andreas Bosio; Arndt Borkhardt; James J Russo; Chris Sander; Mihaela Zavolan; Thomas Tuschl
Journal:  Cell       Date:  2007-06-29       Impact factor: 41.582

8.  Expression of regulatory platelet microRNAs in patients with sickle cell disease.

Authors:  Shilpa Jain; Maria G Kapetanaki; Nalini Raghavachari; Kimberly Woodhouse; Guoying Yu; Suchitra Barge; Claudia Coronnello; Panayiotis V Benos; Gregory J Kato; Naftali Kaminski; Mark T Gladwin
Journal:  PLoS One       Date:  2013-04-12       Impact factor: 3.240

9.  Identification and target prediction of miRNAs specifically expressed in rat neural tissue.

Authors:  You-Jia Hua; Zhong-Yi Tang; Kang Tu; Li Zhu; Yi-Xue Li; Lu Xie; Hua-Sheng Xiao
Journal:  BMC Genomics       Date:  2009-05-09       Impact factor: 3.969

10.  MicroRNA targets in Drosophila.

Authors:  Anton J Enright; Bino John; Ulrike Gaul; Thomas Tuschl; Chris Sander; Debora S Marks
Journal:  Genome Biol       Date:  2003-12-12       Impact factor: 13.583

View more
  85 in total

1.  Norcantharidin induces autophagy-related prostate cancer cell death through Beclin-1 upregulation by miR-129-5p suppression.

Authors:  Wenjun Xiao; Bo Dai; Yao Zhu; Dingwei Ye
Journal:  Tumour Biol       Date:  2015-12-05

2.  High Expression of miR-532-5p, a Tumor Suppressor, Leads to Better Prognosis in Ovarian Cancer Both In Vivo and In Vitro.

Authors:  Fan Wang; Jeremy T-H Chang; Chester Jingshiu Kao; R Stephanie Huang
Journal:  Mol Cancer Ther       Date:  2016-02-12       Impact factor: 6.261

3.  Overexpression of miR-100 inhibits cancer growth, migration, and chemosensitivity in human NSCLC cells through fibroblast growth factor receptor 3.

Authors:  Jie Luo; Bin Chen; Xian-Xiu Ji; Song-Wen Zhou; Di Zheng
Journal:  Tumour Biol       Date:  2015-08-28

4.  MicroRNA-423 enhances the invasiveness of hepatocellular carcinoma via regulation of BRMS1.

Authors:  Xiangjun Sun; Mingchun Wang; Haiyan Liu; Jian Wang
Journal:  Am J Transl Res       Date:  2017-12-15       Impact factor: 4.060

Review 5.  Statistical analysis of non-coding RNA data.

Authors:  Qianchuan He; Yang Liu; Wei Sun
Journal:  Cancer Lett       Date:  2018-01-04       Impact factor: 8.679

6.  Regulation of MET-mediated proliferation of thyroid carcinoma cells by miR-449b.

Authors:  Lei Chen; Lei Xu; Gang Wang
Journal:  Tumour Biol       Date:  2015-06-06

7.  Functional analysis of microRNA activity in Brugia malayi.

Authors:  Canhui Liu; Denis Voronin; Catherine B Poole; Saheed Bachu; Matthew B Rogers; Jingmin Jin; Elodie Ghedin; Sara Lustigman; Larry A McReynolds; Thomas R Unnasch
Journal:  Int J Parasitol       Date:  2015-05-22       Impact factor: 3.981

8.  Extracellular vesicles from osteosarcoma cell lines contain miRNAs associated with cell adhesion and apoptosis.

Authors:  Sofía Jerez; Héctor Araya; Daniel Hevia; Carlos E Irarrázaval; Roman Thaler; Andre J van Wijnen; Mario Galindo
Journal:  Gene       Date:  2019-06-06       Impact factor: 3.688

9.  Molecular mechanisms underlying application of serum procalcitonin and stool miR-637 in prognosis of acute ischemic stroke.

Authors:  You-Mei Li; Xue-Yuan Liu
Journal:  Am J Transl Res       Date:  2016-10-15       Impact factor: 4.060

10.  Suppression of miR-4735-3p in androgen receptor-expressing prostate cancer cells increases cell death during chemotherapy.

Authors:  Weidong Zhou; Shengsong Huang; Qiquan Jiang; Tao Yuan
Journal:  Am J Transl Res       Date:  2017-08-15       Impact factor: 4.060

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.