Nicolò Navarin (1), Fabrizio Costa (2, 3). (1) Department of Mathematics, University of Padova, Padova 35121, Italy. (2) Department of Computer Science, University of Freiburg, D-79110 Freiburg, Germany. (3) Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK.
Abstract
MOTIVATION: The importance of RNA in regulating protein-coding genes is by now well appreciated. Non-coding RNAs (ncRNAs) are known to regulate gene expression at practically every stage, from chromatin packaging to mRNA translation. However, the functional characterization of specific instances remains challenging in genome-scale settings, which makes automatic annotation approaches of interest. Existing computational methods are either efficient but inaccurate, or more precise but difficult to scale.
RESULTS: In this article, we present a predictive system based on kernel methods, a type of machine learning algorithm grounded in statistical learning theory. We employ a flexible graph encoding to preserve multiple structural hypotheses and exploit recent advances in representation and model induction to scale to large data volumes. Experimental results on tens of thousands of ncRNA sequences available from the Rfam database indicate that we can not only improve upon state-of-the-art predictors, but also achieve speedups of several orders of magnitude.
AVAILABILITY AND IMPLEMENTATION: The code is available from http://www.bioinf.uni-freiburg.de/~costa/EDeN.tgz.
CONTACT: f.costa@exeter.ac.uk.
Authors: Martin Raden; Syed M Ali; Omer S Alkhnbashi; Anke Busch; Fabrizio Costa; Jason A Davis; Florian Eggenhofer; Rick Gelhausen; Jens Georg; Steffen Heyne; Michael Hiller; Kousik Kundu; Robert Kleinkauf; Steffen C Lott; Mostafa M Mohamed; Alexander Mattheis; Milad Miladi; Andreas S Richter; Sebastian Will; Joachim Wolff; Patrick R Wright; Rolf Backofen. Journal: Nucleic Acids Res. Date: 2018-07-02. Impact factor: 16.971.
Authors: Merylin Monaro; Cristina Mazza; Marco Colasanti; Stefano Ferracuti; Graziella Orrù; Alberto di Domenico; Giuseppe Sartori; Paolo Roma. Journal: Psychol Res. Date: 2021-01-16.