
Predicting novel microRNA: a comprehensive comparison of machine learning approaches.

Georgina Stegmayer, Leandro E Di Persia, Mariano Rubiolo, Matias Gerard, Milton Pividori, Cristian Yones, Leandro A Bugnon, Tadeo Rodriguez, Jonathan Raad, Diego H Milone.

Abstract

MOTIVATION: The importance of microRNAs (miRNAs) is now widely recognized, because these short RNA segments can play several roles in almost all biological processes. The computational prediction of novel miRNAs involves training a classifier to identify the sequences with the highest chance of being miRNA precursors (pre-miRNAs). The main difficulty in this task is that well-known pre-miRNAs are few compared with the hundreds of thousands of candidate sequences in a genome, which results in high class imbalance. This imbalance strongly affects most standard classifiers, and if it is not properly addressed in the model and the experiments, not only can the reported performance be completely unrealistic, but the classifier will also fail to work properly for pre-miRNA prediction. Another important issue is that most of the machine learning (ML) approaches used so far (supervised methods) require both positive and negative examples. Selecting positive examples is straightforward (well-known pre-miRNAs). However, it is difficult to build a representative set of negative examples, because these should be sequences with a hairpin structure that do not contain a pre-miRNA.
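As a hedged illustration of why unaddressed imbalance can make reported performance unrealistic (this sketch is not taken from the paper), consider a degenerate classifier that labels every candidate hairpin as a non-pre-miRNA. The class sizes (100 positives, 100,000 negatives) and the helper names `confusion_counts`, `accuracy` and `sensitivity` are hypothetical choices for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = pre-miRNA."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    # Fraction of all candidates labeled correctly.
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    # Fraction of true pre-miRNAs recovered; 0.0 if there are no positives.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical candidate set: 100 known pre-miRNAs vs 100,000 other hairpins.
n_pos, n_neg = 100, 100_000
y_true = [1] * n_pos + [0] * n_neg

# Degenerate "classifier": every hairpin is predicted to be a non-pre-miRNA.
y_pred = [0] * len(y_true)

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"accuracy    = {accuracy(tp, fp, tn, fn):.4f}")
print(f"sensitivity = {sensitivity(tp, fn):.4f}")
```

Under these assumed numbers the script reports an accuracy of 0.9990 while recovering no pre-miRNAs at all (sensitivity 0.0000), which is why accuracy alone says little about pre-miRNA prediction under high imbalance.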
RESULTS: This review provides a comprehensive study and comparative assessment of methods from the two ML approaches to the prediction of novel pre-miRNAs: supervised and unsupervised training. We present and analyze the ML proposals that have appeared in the literature during the past 10 years. They have been compared in several prediction tasks involving two model genomes and increasing levels of imbalance. This work provides a review of existing ML approaches for pre-miRNA prediction and fair comparisons of the classifiers using the same features and data sets, rather than just a survey of published software tools. The results and discussion can help the community select the most adequate bioinformatics approach for the prediction task at hand. The comparative results suggest that supervised methods can perform best at low to mid levels of class imbalance. However, at very high imbalance levels, closer to real-case scenarios, models that include unsupervised and deep learning can provide better performance.
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
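A related back-of-the-envelope sketch, again an illustration rather than the review's own evaluation protocol, suggests why results measured at low imbalance may not carry over to genome-scale imbalance. It holds a hypothetical classifier's sensitivity (0.90) and specificity (0.99) fixed and computes its expected precision as negatives increasingly outnumber positives; the operating point, the ratios and the function name `expected_precision` are all assumptions.

```python
def expected_precision(sensitivity, specificity, n_pos, n_neg):
    """Expected precision of a classifier with a fixed operating point."""
    tp = sensitivity * n_pos          # expected true positives
    fp = (1.0 - specificity) * n_neg  # expected false positives
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


# Hypothetical operating point: 90% sensitivity, 99% specificity.
SENS, SPEC = 0.90, 0.99
n_pos = 100

# Increasing positive:negative imbalance ratios.
for ratio in (1, 10, 100, 1_000):
    p = expected_precision(SENS, SPEC, n_pos, n_pos * ratio)
    print(f"1:{ratio:<5} precision = {p:.3f}")
```

With these assumed numbers, precision falls from about 0.99 at 1:1 to about 0.08 at 1:1000, even though sensitivity and specificity never change. The sketch only shows the arithmetic effect of imbalance on precision; it says nothing about which family of methods handles that effect best.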


Keywords:  high class imbalance; machine learning; miRNA prediction

Year:  2019        PMID: 29800232     DOI: 10.1093/bib/bby037

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  7 in total

Review 1.  Why the hype - What are microRNAs and why do they provide unique investigative, diagnostic, and therapeutic opportunities in veterinary medicine?

Authors:  Joshua Antunes; Olivia Lee; Amir Hamed Alizadeh; Jonathan LaMarre; Thomas Gadegaard Koch
Journal:  Can Vet J       Date:  2020-08       Impact factor: 1.008

Review 2.  MicroRNAs in hypertrophic cardiomyopathy: pathogenesis, diagnosis, treatment potential and roles as clinical biomarkers.

Authors:  Fanyan Luo; Wei Liu; Haisong Bu
Journal:  Heart Fail Rev       Date:  2022-03-25       Impact factor: 4.654

3.  HumiR: Web Services, Tools and Databases for Exploring Human microRNA Data.

Authors:  Jeffrey Solomon; Fabian Kern; Tobias Fehlmann; Eckart Meese; Andreas Keller
Journal:  Biomolecules       Date:  2020-11-20

4.  Deep Learning for the discovery of new pre-miRNAs: Helping the fight against COVID-19.

Authors:  L A Bugnon; J Raad; G A Merino; C Yones; F Ariel; D H Milone; G Stegmayer
Journal:  Mach Learn Appl       Date:  2021-09-09

Review 5.  MicroRNA-mediated bioengineering for climate-resilience in crops.

Authors:  Suraj Patil; Shrushti Joshi; Monica Jamla; Xianrong Zhou; Mohammad J Taherzadeh; Penna Suprasanna; Vinay Kumar
Journal:  Bioengineered       Date:  2021-12       Impact factor: 3.269

6.  Robust and efficient COVID-19 detection techniques: A machine learning approach.

Authors:  Md Mahadi Hasan; Saba Binte Murtaz; Muhammad Usama Islam; Muhammad Jafar Sadeq; Jasim Uddin
Journal:  PLoS One       Date:  2022-09-15       Impact factor: 3.752

7.  Fast and accurate microRNA search using CNN.

Authors:  Xubo Tang; Yanni Sun
Journal:  BMC Bioinformatics       Date:  2019-12-27       Impact factor: 3.169


北京卡尤迪生物科技股份有限公司 © 2022-2023.