
geneRFinder: gene finding in distinct metagenomic data complexities.

Raíssa Silva, Kleber Padovani, Fabiana Góes, Ronnie Alves.

Abstract

BACKGROUND: Microbes play a fundamental economic, social, and environmental role in our society. Metagenomics makes it possible to investigate microbes in their natural environments (complex communities) and their interactions. Their activity is usually estimated from the functions they perform in those environments, and their contribution is measured through their genes. Advances in next-generation sequencing technology have facilitated metagenomics research; however, they also create a heavy computational burden. Large and complex biological datasets are available as never before. Many gene predictors are available to aid the gene annotation process, but they do not appropriately handle the complexities of metagenomic data. There is no standard metagenomic benchmark dataset for gene prediction. Thus, gene predictors may inflate their results by obfuscating low false discovery rates.
RESULTS: We introduce geneRFinder, an ML-based gene predictor that outperforms state-of-the-art gene prediction tools across the benchmark presented here using a single pre-trained Random Forest model. On high-complexity metagenomes, the average prediction rates of geneRFinder differed by 54% and 64% from those of Prodigal and FragGeneScan, respectively. The specificity of geneRFinder differed most from that of FragGeneScan, by 79 percentage points, and exceeded that of Prodigal by 66 percentage points. According to McNemar's test, all percentage differences between the predictors' performances are statistically significant for all datasets at the 99% confidence level.
CONCLUSIONS: We provide geneRFinder, an approach for gene prediction across distinct metagenomic complexities, available at gitlab.com/r.lorenna/generfinder and https://osf.io/w2yd6/. We also provide a novel, comprehensive benchmark dataset for gene prediction, based on the Critical Assessment of Metagenome Interpretation (CAMI) challenge and containing labeled data from gene regions, available at https://sourceforge.net/p/generfinder-benchmark.
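
The abstract compares predictors by specificity and tests the differences with McNemar's test, but it does not spell out the computation. The following is a minimal, generic sketch of both quantities, assuming binary labels (1 = gene, 0 = non-gene) and paired predictions from two tools on the same sequences. The function names `specificity` and `mcnemar_exact` are hypothetical, and the exact binomial form of the test is an assumption; the authors may have used the chi-square form instead.

```python
from math import comb


def specificity(y_true, y_pred):
    """True-negative rate TN / (TN + FP) for binary labels (0 = non-gene)."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp) if (tn + fp) else float("nan")


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on the discordant pairs.

    Returns (b, c, p_value), where b counts items that predictor A gets
    right and B gets wrong, and c counts the reverse. Under the null
    hypothesis of equal error rates, b follows Binomial(b + c, 0.5).
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the predictors never disagree on correctness
    k = min(b, c)
    # Probability of the smaller tail, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

With a significance threshold of 0.01, matching the 99% level quoted above, a p-value below the threshold would indicate that the two predictors' error rates differ on that dataset.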


Keywords:  Gene prediction; Machine learning; Metagenomics


Year:  2021        PMID: 33632132      PMCID: PMC7905635          DOI: 10.1186/s12859-021-03997-w

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


