Identification of non-coding RNAs with a new composite feature in the Hybrid Random Forest Ensemble algorithm.

Supatcha Lertampaiporn, Chinae Thammarongtham, Chakarida Nukoolkit, Boonserm Kaewkamnerdpong, Marasri Ruengjitchatchawalya.

Abstract

To identify non-coding RNA (ncRNA) signals within genomic regions, a classification tool was developed based on a hybrid random forest (RF) with a logistic regression model to efficiently discriminate short ncRNA sequences as well as long, complex ncRNA sequences. This RF-based classifier was trained on a well-balanced dataset with a discriminative set of features and achieved an accuracy, sensitivity and specificity of 92.11%, 90.7% and 93.5%, respectively. The selected feature set includes a newly proposed feature, SCORE. This feature is generated by a logistic regression function that combines five significant features (structure, sequence, modularity, structural robustness and coding potential) to enable improved characterization of long ncRNA (lncRNA) elements. The use of SCORE improved the performance of the RF-based classifier in the identification of Rfam lncRNA families. A genome-wide ncRNA classification framework was applied to a wide variety of organisms, with an emphasis on those of economic, social, public health, environmental and agricultural significance, such as various bacterial genomes, the Arthrospira (Spirulina) genome, and rice and human genomic regions. Our framework was able to identify known ncRNAs with sensitivities of greater than 90% and 77.7% for prokaryotic and eukaryotic sequences, respectively. Our classifier is available at http://ncrna-pred.com/HLRF.htm.
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
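
As a rough illustration of the idea in the abstract, the sketch below shows how a logistic function could fold five feature values into a single SCORE-like value, and how accuracy, sensitivity and specificity are computed from confusion-matrix counts. The feature names, weights, intercept and counts are hypothetical placeholders; the abstract does not give the fitted coefficients, and this is not the authors' implementation.

```python
import math
from typing import Dict

# Hypothetical weights and intercept, chosen only for illustration.
# The coefficients fitted in the paper are not stated in the abstract.
WEIGHTS: Dict[str, float] = {
    "structure": 1.0,
    "sequence": 0.8,
    "modularity": 0.5,
    "robustness": 0.6,
    "coding_potential": -1.2,
}
INTERCEPT = -0.5


def score(features: Dict[str, float]) -> float:
    """Combine the five feature values into one value in (0, 1)
    using a logistic function of their weighted sum."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def evaluate(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # fraction of true ncRNAs recovered
        "specificity": tn / (tn + fp),  # fraction of negatives rejected
    }


if __name__ == "__main__":
    # Made-up feature values for a single candidate sequence.
    candidate = {
        "structure": 0.9,
        "sequence": 0.4,
        "modularity": 0.7,
        "robustness": 0.6,
        "coding_potential": 0.1,
    }
    print(f"SCORE-like value: {score(candidate):.3f}")
    # Made-up counts, not results from the paper.
    print(evaluate(tp=9, fp=1, tn=19, fn=1))
```

In a hybrid setup of the kind the abstract describes, a value like this would be appended to the other features before training the random forest, so the forest can use the combined signal alongside the individual ones.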

Year:  2014        PMID: 24771344      PMCID: PMC4066759          DOI: 10.1093/nar/gku325

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


  12 in total

1.  Inferring Potential CircRNA-Disease Associations via Deep Autoencoder-Based Classification.

Authors:  K Deepthi; A S Jereesh
Journal:  Mol Diagn Ther       Date:  2020-11-06       Impact factor: 4.074

2.  CPPred: coding potential prediction based on the global description of RNA sequence.

Authors:  Xiaoxue Tong; Shiyong Liu
Journal:  Nucleic Acids Res       Date:  2019-05-07       Impact factor: 16.971

3.  FEELnc: a tool for long non-coding RNA annotation and its application to the dog transcriptome.

Authors:  Valentin Wucher; Fabrice Legeai; Benoît Hédan; Guillaume Rizk; Lætitia Lagoutte; Tosso Leeb; Vidhya Jagannathan; Edouard Cadieu; Audrey David; Hannes Lohi; Susanna Cirera; Merete Fredholm; Nadine Botherel; Peter A J Leegwater; Céline Le Béguec; Hille Fieten; Jeremy Johnson; Jessica Alföldi; Catherine André; Kerstin Lindblad-Toh; Christophe Hitte; Thomas Derrien
Journal:  Nucleic Acids Res       Date:  2017-05-05       Impact factor: 16.971

4.  Derivation and validation of a clinical risk score to predict death among patients awaiting cardiac surgery in Ontario, Canada: a population-based study.

Authors:  Louise Y Sun; Harindra C Wijeysundera; Douglas S Lee; Sean van Diepen; Marc Ruel; Anan Bader Eddeen; Thierry G Mesana
Journal:  CMAJ Open       Date:  2022-03-08

5.  Dietary MicroRNA Database (DMD): An Archive Database and Analytic Tool for Food-Borne microRNAs.

Authors:  Kevin Chiang; Jiang Shu; Janos Zempleni; Juan Cui
Journal:  PLoS One       Date:  2015-06-01       Impact factor: 3.240

Review 6.  Long non-coding RNAs and their biological roles in plants.

Authors:  Xue Liu; Lili Hao; Dayong Li; Lihuang Zhu; Songnian Hu
Journal:  Genomics Proteomics Bioinformatics       Date:  2015-04-30       Impact factor: 7.691

7.  Computational identification of putative lincRNAs in mouse embryonic stem cell.

Authors:  Hui Liu; Jie Lyu; Hongbo Liu; Yang Gao; Jing Guo; Hongjuan He; Zhengbin Han; Yan Zhang; Qiong Wu
Journal:  Sci Rep       Date:  2016-10-07       Impact factor: 4.379

8.  nRC: non-coding RNA Classifier based on structural features.

Authors:  Antonino Fiannaca; Massimo La Rosa; Laura La Paglia; Riccardo Rizzo; Alfonso Urso
Journal:  BioData Min       Date:  2017-08-01       Impact factor: 2.522

9.  An improved method for identification of small non-coding RNAs in bacteria using support vector machine.

Authors:  Ranjan Kumar Barman; Anirban Mukhopadhyay; Santasabuj Das
Journal:  Sci Rep       Date:  2017-04-06       Impact factor: 4.379

10.  IRSOM, a reliable identifier of ncRNAs based on supervised self-organizing maps with rejection.

Authors:  Ludovic Platon; Farida Zehraoui; Abdelhafid Bendahmane; Fariza Tahi
Journal:  Bioinformatics       Date:  2018-09-01       Impact factor: 6.937
