
Estimation of protein coding density in a corpus of DNA sequence data.

J W Fickett, R Guigó.

Abstract

A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a 'coding statistic' is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C. elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
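The decomposition step described above can be illustrated with a minimal sketch. The abstract does not say which coding statistic is used or how the two normal distributions are fitted, so everything here is an assumption made for illustration: the fit uses the standard expectation-maximization (EM) procedure, the names `normal_pdf` and `fit_two_normals` are hypothetical, and the window statistics are synthetic values drawn from two known normal distributions rather than real sequence data. The fitted weight of the higher-mean component plays the role of the estimated fraction of coding windows.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_normals(values, n_iter=100):
    """Fit a two-component normal mixture by EM (an assumed fitting method).

    Returns (weight_hi, (mean_lo, sd_lo), (mean_hi, sd_hi)), where "hi" is the
    component with the larger mean, taken here to stand for coding windows.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Initialise the two means from the lower and upper quartiles.
    mean_lo, mean_hi = ordered[n // 4], ordered[(3 * n) // 4]
    sd_lo = sd_hi = (ordered[-1] - ordered[0]) / 4.0 or 1.0
    w_hi = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each window belongs to "hi".
        resp = []
        for x in values:
            p_hi = w_hi * normal_pdf(x, mean_hi, sd_hi)
            p_lo = (1.0 - w_hi) * normal_pdf(x, mean_lo, sd_lo)
            total = p_hi + p_lo
            resp.append(p_hi / total if total > 0.0 else 0.5)
        # M-step: re-estimate weight, means, and standard deviations.
        sum_hi = sum(resp)
        sum_lo = n - sum_hi
        if sum_hi < 1e-9 or sum_lo < 1e-9:
            break  # one component has collapsed; keep the last estimates
        mean_hi = sum(r * x for r, x in zip(resp, values)) / sum_hi
        mean_lo = sum((1.0 - r) * x for r, x in zip(resp, values)) / sum_lo
        var_hi = sum(r * (x - mean_hi) ** 2 for r, x in zip(resp, values)) / sum_hi
        var_lo = sum((1.0 - r) * (x - mean_lo) ** 2 for r, x in zip(resp, values)) / sum_lo
        sd_hi = max(math.sqrt(var_hi), 1e-6)
        sd_lo = max(math.sqrt(var_lo), 1e-6)
        w_hi = sum_hi / n
    if mean_lo > mean_hi:
        # Keep the convention that "hi" is the larger-mean component.
        mean_lo, mean_hi, sd_lo, sd_hi = mean_hi, mean_lo, sd_hi, sd_lo
        w_hi = 1.0 - w_hi
    return w_hi, (mean_lo, sd_lo), (mean_hi, sd_hi)


# Synthetic stand-in for per-window coding statistics (NOT real data):
# 3000 "noncoding" windows and 2000 "coding" windows, so the true fraction is 0.4.
rng = random.Random(0)
window_stats = [rng.gauss(0.0, 1.0) for _ in range(3000)]
window_stats += [rng.gauss(4.0, 1.0) for _ in range(2000)]
rng.shuffle(window_stats)

estimate, noncoding_fit, coding_fit = fit_two_normals(window_stats)
print(f"estimated coding fraction of windows: {estimate:.3f}")
```

On real data the two components overlap more than in this synthetic case, so the quality of the estimate depends on how well the chosen statistic separates coding from noncoding windows and on how closely each fraction follows a normal distribution.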

Year:  1993        PMID: 8332493      PMCID: PMC309664          DOI: 10.1093/nar/21.12.2837

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


