Coding sequence density estimation via topological pressure.

David Koslicki, Daniel J Thompson.

Abstract

We give a new approach to coding sequence (CDS) density estimation in genomic analysis based on the topological pressure, which we develop from a well-known concept in ergodic theory. Topological pressure measures the 'weighted information content' of a finite word, and incorporates 64 parameters which can be interpreted as a choice of weight for each nucleotide triplet. We train the parameters so that the topological pressure fits the observed coding sequence density on the human genome, and use this to give ab initio predictions of CDS density over windows of size around 66,000 bp on the genomes of Mus musculus, rhesus macaque and Drosophila melanogaster. While the differences between these genomes are too great to expect that training on the human genome could predict, for example, the exact locations of genes, we demonstrate that our method gives reasonable estimates for the 'coarse scale' problem of predicting CDS density. Inspired again by ergodic theory, the weightings of the nucleotide triplets obtained from our training procedure are used to define a probability distribution on finite sequences, which can be used to distinguish between intron and exon sequences from the human genome of lengths between 750 and 5,000 bp. At the end of the paper, we explain the theoretical underpinning for our approach, which is the theory of Thermodynamic Formalism from the dynamical systems literature. Mathematica and MATLAB implementations of our method are available at http://sourceforge.net/projects/topologicalpres/.
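
The abstract does not state the formula for topological pressure, so the following Python sketch is only an illustration of the idea of a 'weighted information content' with one weight per nucleotide triplet. It assumes one plausible form: for a word w and subword length n, sum over the distinct length-n subwords of w the product of the weights of the triplets each subword contains, then take log base 4 and divide by n. The function names, the product form, and the choice of n are assumptions made for illustration and are not taken from the authors' Mathematica or MATLAB code.

```python
import math
from itertools import product

# Hypothetical weight table: one positive weight per nucleotide triplet
# (4^3 = 64 parameters). Uniform weights are used here as a placeholder;
# the paper trains these on the human genome.
uniform = {"".join(t): 1.0 for t in product("ACGT", repeat=3)}


def distinct_subwords(w, n):
    """Return the set of distinct length-n subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def topological_pressure(w, weights, n):
    """Assumed form of the pressure of a finite word w:

        (1/n) * log_4( sum over distinct length-n subwords u of w
                       of the product of weights over the triplets in u )

    This is a sketch under stated assumptions, not the paper's definition.
    """
    total = 0.0
    for u in distinct_subwords(w, n):
        score = 1.0
        for i in range(n - 2):  # a length-n word contains n - 2 triplets
            score *= weights[u[i:i + 3]]
        total += score
    return math.log(total, 4) / n


if __name__ == "__main__":
    w = "ACGTACGT"
    # With all weights equal to 1 the sum just counts distinct subwords,
    # so the value is an unweighted (entropy-like) quantity.
    print(topological_pressure(w, uniform, 3))
```

Under this assumed form, a window of length 4^n + n - 1 is long enough to contain every possible length-n subword; for n = 8 that is 65,543 bp, which is consistent with the "around 66,000 bp" windows mentioned in the abstract, although that pairing is an inference here rather than something the abstract states.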

Year:  2014        PMID: 24448658     DOI: 10.1007/s00285-014-0754-2

Source DB:  PubMed          Journal:  J Math Biol        ISSN: 0303-6812            Impact factor:   2.259

