
A Distributed Classifier for MicroRNA Target Prediction with Validation Through TCGA Expression Data.

Asish Ghoshal, Jinyi Zhang, Michael A Roth, Kevin Muyuan Xia, Ananth Y Grama, Somali Chaterji.   

Abstract

BACKGROUND: MicroRNAs (miRNAs) are approximately 22-nucleotide-long regulatory RNAs that mediate RNA interference by binding to cognate mRNA target regions. Here, we present a distributed kernel SVM-based binary classification scheme to predict miRNA targets. It captures the spatial profile of miRNA-mRNA interactions via smooth B-spline curves, fitted separately for each class of input features, such as thermodynamic and sequence-based features. Further, we use a principled approach to uniformly model both canonical and non-canonical seed matches, using a novel seed enrichment metric. Finally, we verify our miRNA-mRNA pairings using an Elastic Net-based regression model on TCGA expression data for four cancer types to estimate the miRNAs that together regulate any given mRNA.
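To illustrate the idea of turning a per-position feature profile into a smooth curve, here is a minimal sketch in pure Python. It is not the Avishkar implementation: it assumes the per-position values (for example, a hypothetical per-nucleotide thermodynamic score) are used directly as control points of a uniform cubic B-spline, whereas the paper may fit the splines differently. The function name `cubic_bspline_profile` and the parameter `samples_per_segment` are invented for this example.

```python
def cubic_bspline_profile(values, samples_per_segment=4):
    """Sample a uniform cubic B-spline whose control points are `values`.

    Assumption: `values` is a per-position feature profile around a
    candidate target site. The returned list is a smoothed curve.
    """
    if len(values) < 4:
        raise ValueError("need at least 4 control points")
    curve = []
    # Each run of 4 consecutive control points defines one spline segment.
    for i in range(len(values) - 3):
        p0, p1, p2, p3 = values[i:i + 4]
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            # Uniform cubic B-spline basis functions; they sum to 1 for any t.
            b0 = (1 - t) ** 3 / 6
            b1 = (3 * t**3 - 6 * t**2 + 4) / 6
            b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
            b3 = t**3 / 6
            curve.append(b0 * p0 + b1 * p1 + b2 * p2 + b3 * p3)
    return curve


# Hypothetical noisy profile over 8 positions.
profile = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
smooth = cubic_bspline_profile(profile)
```

Because the basis functions are non-negative and sum to one, every sampled point is a weighted average of four neighbouring positions, which is what damps position-to-position noise in the profile.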
RESULTS: We present a suite of algorithms for miRNA target prediction, under the banner Avishkar, with superior prediction performance over competing methods. Specifically, our final kernel SVM model, with an Apache Spark backend, achieves an average true positive rate (TPR) of more than 75 percent at a false positive rate of 20 percent for non-canonical human miRNA target sites. This is an improvement of over 150 percent in the TPR for non-canonical sites over the best-in-class algorithm. We are able to achieve such superior performance by representing the thermodynamic and sequence profiles of miRNA-mRNA interaction as curves, devising a novel seed enrichment metric, and learning an ensemble of miRNA family-specific kernel SVM classifiers. We provide an easy-to-use system for large-scale interactive analysis and prediction of miRNA targets. All operations in our system, namely candidate set generation, feature generation and transformation, training, prediction, and computing performance metrics, are fully distributed and scalable.
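The headline metric above, TPR at a fixed false positive rate, can be read off a ROC curve. The sketch below shows one plausible way to compute it from classifier scores. It is an illustration only; the function name `tpr_at_fpr` and the toy scores are assumptions, not taken from the Avishkar code.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.20):
    """Return the best TPR over all thresholds whose FPR is <= max_fpr.

    scores: classifier decision values (higher means more likely a target).
    labels: 1 for a true target site, 0 for a non-target.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative")
    best = 0.0
    # Try every distinct score as a threshold (predict positive if s >= thr).
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        if fp / neg <= max_fpr:
            best = max(best, tp / pos)
    return best


# Toy data: 4 true targets and 6 non-targets.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
print(tpr_at_fpr(scores, labels, max_fpr=0.20))  # 0.75
```

In this toy example, only one of the six negatives may be misclassified at a 20 percent FPR budget, and three of the four positives score above the second negative, giving a TPR of 0.75.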
CONCLUSIONS: We have developed an efficient SVM-based model for miRNA target prediction using recent CLIP-seq data, demonstrating superior performance, evaluated using ROC curves for different species (human or mouse), or different target types (canonical or non-canonical). We analyzed the agreement between the target pairings using CLIP-seq data and using expression data from four cancer types. To the best of our knowledge, we provide the first distributed framework for miRNA target prediction based on Apache Hadoop and Spark.
AVAILABILITY: All source code and sample data are publicly available at https://bitbucket.org/cellsandmachines/avishkar. Our scalable implementation of kernel SVM using Apache Spark, which can be used to solve large-scale non-linear binary classification problems, is available at https://bitbucket.org/cellsandmachines/kernelsvmspark.

Year:  2018        PMID: 29993641      PMCID: PMC6175706          DOI: 10.1109/TCBB.2018.2828305

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


