Compositional features of eukaryotic genomes for checking predicted genes.

Stéphane Cruveiller, Kamel Jabbari, Oliver Clay, Giorgio Bernardi.

Abstract

Gene prediction relies on the identification of characteristic features of coding sequences that distinguish them from non-coding DNA. The recent large-scale sequencing of entire genomes from higher eukaryotes, in conjunction with currently used gene prediction algorithms, has provided an abundance of putative genes that can now be analysed for their compositional properties. Strong, systematic differences still exist, in several species, between the compositional properties of sets of ex novo predicted genes and genes that have been experimentally detected and/or verified. This is particularly evident in the estimated gene set (>45,000 genes) of the recently sequenced rice genome, where roughly half the predicted genes are compositionally unusual and have no known orthologues in the dicot Arabidopsis. In a few cases such differences might suggest a bias in experimental gene-finding protocols, but the quasi-random nature of the compositionally aberrant predicted genes is a strong indication that many, if not most, of them are false positives. It therefore appears that some important features of coding regions have not yet been taken into account in existing gene prediction programs. Statistical base compositional properties of curated gene data sets from vertebrates, which we briefly review here, should therefore provide a useful benchmark for fine-tuning probabilistic gene models and model parameters that are currently in use.
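As a rough illustration of the kind of compositional check the abstract describes, the sketch below computes the GC fraction of a coding sequence and the GC fraction at third codon positions (GC3), then flags predicted genes whose GC3 falls outside a reference range. The abstract does not name the statistics it uses; the choice of GC3, the function names, and the reference bounds are all assumptions made for illustration, not the authors' method.

```python
def gc_fraction(seq):
    """Fraction of G or C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc3_fraction(cds):
    """GC fraction at third codon positions; assumes cds starts in frame."""
    return gc_fraction(cds[2::3])


def flag_outliers(predicted, ref_low, ref_high):
    """Names of predicted CDSs whose GC3 lies outside [ref_low, ref_high].

    ref_low and ref_high are hypothetical bounds that would have to be
    estimated from a curated, experimentally verified gene set.
    """
    return [
        name
        for name, cds in predicted.items()
        if not ref_low <= gc3_fraction(cds) <= ref_high
    ]


# Toy sequences, not real genes.
predicted = {"geneA": "ATGGCCAAA", "geneB": "ATAATAATA"}
outliers = flag_outliers(predicted, ref_low=0.3, ref_high=0.9)
```

In practice the reference range would come from the distribution observed in a curated gene set, and a flagged gene would be a candidate for closer inspection rather than an automatic false positive.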

Year: 2003; PMID: 12715833; DOI: 10.1093/bib/4.1.43

Source DB: PubMed; Journal: Brief Bioinform; ISSN: 1467-5463; Impact factor: 11.622


