
The elusive short gene--an ensemble method for recognition for prokaryotic genome.

Baharak Goli, Achuthsankar S Nair.

Abstract

Accurate prediction of short protein-coding DNA from genome sequence information remains an unsolved problem in DNA sequence analysis. Popular gene-finding tools show a drastic reduction in accuracy when predicting genes shorter than 400 nt, the length we define as short. This study quantitatively evaluates a set of selected coding measures in terms of their discriminative power in recognizing short genes in prokaryotic genomes. Using the Fast Correlation Based Feature Selection (FCBF) technique, we identified a subset of coding measures with high discriminative power. Using these measures, we present a novel approach to short-gene recognition. A short-gene predictor employing AdaBoost.M1 with random forests as the base classifier achieves 92.74% accuracy, 94.77% sensitivity and 90.06% specificity on short genes.
Copyright © 2012 Elsevier Inc. All rights reserved.
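
The abstract reports accuracy, sensitivity and specificity without defining them. The sketch below shows the standard definitions of these three figures for a binary coding/non-coding classifier. It is an illustration only, not the authors' evaluation code: the names `ConfusionCounts` and `evaluate` are hypothetical, and the counts in the usage note are made-up values chosen for easy arithmetic, not data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary short-gene classifier (hypothetical helper)."""
    tp: int  # coding sequences predicted as coding
    fn: int  # coding sequences predicted as non-coding
    tn: int  # non-coding sequences predicted as non-coding
    fp: int  # non-coding sequences predicted as coding


def evaluate(c: ConfusionCounts) -> dict:
    """Return accuracy, sensitivity and specificity as fractions in [0, 1]."""
    positives = c.tp + c.fn   # all true coding sequences
    negatives = c.tn + c.fp   # all true non-coding sequences
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        # share of all sequences classified correctly
        "accuracy": (c.tp + c.tn) / (positives + negatives),
        # share of true coding sequences that were recognized
        "sensitivity": c.tp / positives,
        # share of true non-coding sequences that were rejected
        "specificity": c.tn / negatives,
    }


# Illustrative counts only (not from the paper).
example = evaluate(ConfusionCounts(tp=90, fn=10, tn=80, fp=20))
```

With these illustrative counts, `example["sensitivity"]` is 90/100 and `example["specificity"]` is 80/100. When the positive and negative sets are the same size, accuracy is the average of the other two.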

Year:  2012        PMID: 22554517     DOI: 10.1016/j.bbrc.2012.04.090

Source DB:  PubMed          Journal:  Biochem Biophys Res Commun        ISSN: 0006-291X            Impact factor:   3.575


