In search of the small ones: improved prediction of short exons in vertebrates, plants, fungi and protists.

Yvan Saeys, Pierre Rouzé, Yves Van de Peer.

Abstract

MOTIVATION: Prediction of the coding potential for stretches of DNA is crucial in gene calling and genome annotation, where it is used to identify potential exons and to position their boundaries in conjunction with functional sites, such as splice sites and translation initiation sites. The ability to discriminate between coding and non-coding sequences relates to the structure of coding sequences, which are organized in codons, and to the biased usage of those codons. For statistical reasons, the longer the sequences, the easier it is to detect this codon bias. However, in many eukaryotic genomes, where genes harbour many introns, both introns and exons might be small and hard to distinguish based on coding potential.
RESULTS: Here, we present novel approaches that specifically aim at a better detection of coding potential in short sequences. The methods use complementary sequence features, combined with identification of which features are relevant in discriminating between coding and non-coding sequences. These newly developed methods are evaluated on different species, representative of four major eukaryotic kingdoms, and extensively compared to state-of-the-art Markov models, which are often used for predicting coding potential. The main conclusions drawn from our analyses are that (1) combining complementary sequence features clearly outperforms current Markov models for coding potential prediction in short sequence fragments, (2) coding potential prediction benefits from length-specific models, and these models are not necessarily the same for different sequence lengths, and (3) comparing the results across several species indicates that, although our combined method consistently performs extremely well, there are important differences across genomes.
SUPPLEMENTARY DATA: http://bioinformatics.psb.ugent.be/.
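
ILLUSTRATION: The codon-bias signal described in the motivation can be sketched with a simple log-likelihood ratio over in-frame codons. This is a minimal sketch and not the method of the paper, whose feature sets and models are not detailed in this abstract. The function names, the add-one pseudocount, the uniform background model and the toy training sequences are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codons_in_frame(seq, frame=0):
    """Split seq into complete codons starting at the given frame offset."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def codon_frequencies(sequences, pseudocount=1.0):
    """Estimate codon frequencies from in-frame sequences (assumed smoothing)."""
    counts = Counter({c: pseudocount for c in CODONS})
    for s in sequences:
        for c in codons_in_frame(s):
            if c in counts:  # skip codons containing ambiguous bases such as N
                counts[c] += 1
    total = sum(counts.values())
    return {c: counts[c] / total for c in CODONS}


def coding_score(seq, coding_freq, background_freq, frame=0):
    """Log-likelihood ratio (in nats) of seq under coding vs background model."""
    score = 0.0
    for c in codons_in_frame(seq, frame):
        if c in coding_freq:
            score += math.log(coding_freq[c] / background_freq[c])
    return score


if __name__ == "__main__":
    # Hypothetical toy training set; real models need many annotated exons.
    training = ["ATGGCTGCTGAAGCT", "GCTGAAGCTGCTGAA"]
    coding_freq = codon_frequencies(training)
    background = {c: 1.0 / 64 for c in CODONS}  # assumed uniform background

    short = "GCTGAA"
    print(coding_score(short, coding_freq, background))
    print(coding_score(short * 5, coding_freq, background))
```

Because the score is a sum over codons, a longer fragment with the same composition accumulates proportionally more evidence, which is one way to see why short exons are harder to separate from non-coding sequence.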

Year:  2007        PMID: 17204465     DOI: 10.1093/bioinformatics/btl639

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  15 in total

1.  Classifier assessment and feature selection for recognizing short coding sequences of human genes.

Authors:  Kai Song; Ze Zhang; Tuo-Peng Tong; Fang Wu
Journal:  J Comput Biol       Date:  2012-03       Impact factor: 1.479

2.  Small open reading frames associated with morphogenesis are hidden in plant genomes.

Authors:  Kousuke Hanada; Mieko Higuchi-Takeuchi; Masanori Okamoto; Takeshi Yoshizumi; Minami Shimizu; Kentaro Nakaminami; Ranko Nishi; Chihiro Ohashi; Kei Iida; Maho Tanaka; Yoko Horii; Mika Kawashima; Keiko Matsui; Tetsuro Toyoda; Kazuo Shinozaki; Motoaki Seki; Minami Matsui
Journal:  Proc Natl Acad Sci U S A       Date:  2013-01-22       Impact factor: 11.205

3.  SNR of DNA sequences mapped by general affine transformations of the indicator sequences.

Authors:  Jianfeng Shao; Xiaohua Yan; Shuo Shao
Journal:  J Math Biol       Date:  2012-07-21       Impact factor: 2.259

Review 4.  Small open reading frames: current prediction techniques and future prospect.

Authors:  Haoyu Cheng; Wai Soon Chan; Zhixiu Li; Dan Wang; Song Liu; Yaoqi Zhou
Journal:  Curr Protein Pept Sci       Date:  2011-09       Impact factor: 3.272

5.  Some novel intron positions in conserved Drosophila genes are caused by intron sliding or tandem duplication.

Authors:  Jörg Lehmann; Carina Eisenhardt; Peter F Stadler; Veiko Krauss
Journal:  BMC Evol Biol       Date:  2010-05-26       Impact factor: 3.260

6.  Constructing Physical and Genomic Maps for Puccinia striiformis f. sp. tritici, the Wheat Stripe Rust Pathogen, by Comparing Its EST Sequences to the Genomic Sequence of P. graminis f. sp. tritici, the Wheat Stem Rust Pathogen.

Authors:  Jinbiao Ma; Xianming Chen; Meinan Wang; Zhensheng Kang
Journal:  Comp Funct Genomics       Date:  2010-02-11

7.  Ontologies for bioinformatics.

Authors:  Nadine Schuurman; Agnieszka Leszczynski
Journal:  Bioinform Biol Insights       Date:  2008-03-12

8.  Large-scale experimental studies show unexpected amino acid effects on protein expression and solubility in vivo in E. coli.

Authors:  W Nicholson Price; Samuel K Handelman; John K Everett; Saichiu N Tong; Ana Bracic; Jon D Luff; Victor Naumov; Thomas Acton; Philip Manor; Rong Xiao; Burkhard Rost; Gaetano T Montelione; John F Hunt
Journal:  Microb Inform Exp       Date:  2011-06-27

9.  Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes.

Authors:  Michael F Lin; Ameya N Deoras; Matthew D Rasmussen; Manolis Kellis
Journal:  PLoS Comput Biol       Date:  2008-04-18       Impact factor: 4.475

Review 10.  small ORFs: A new class of essential genes for development.

Authors:  João Paulo Albuquerque; Vitória Tobias-Santos; Aline Cáceres Rodrigues; Flávia Borges Mury; Rodrigo Nunes da Fonseca
Journal:  Genet Mol Biol       Date:  2015-08-21       Impact factor: 1.771
