
Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve.

C T Zhang, J Wang.

Abstract

The Z curve is a three-dimensional space curve that uniquely represents a given DNA sequence, in the sense that each can be uniquely reconstructed from the other. Based on the Z curve, a new gene-finding algorithm specific to the yeast genome is proposed that recognizes protein coding genes at better than 95% accuracy. Six cross-validation tests were performed to confirm this accuracy. Using the new algorithm, the number of protein coding genes in the yeast genome is re-estimated. The estimate is based on the assumption that the unknown genes have statistical properties similar to those of the known genes. The number of protein coding genes in the 16 yeast chromosomes is found to be ≤5645, significantly smaller than the widely accepted 5800-6000 and much larger than the 4800 recently estimated by another group. The mitochondrial genes were not included in this estimate. A codingness index called the YZ score (YZ ∈ [0,1]) is proposed to recognize protein coding genes in the yeast genome. An ORF is classified as coding if YZ > 0.5 and as non-coding if YZ < 0.5. The ORFs annotated in the MIPS (Munich Information Centre for Protein Sequences) database that the present algorithm recognizes as non-coding are listed in detail in this paper. The YZ scores for all ORFs annotated in the MIPS database have been calculated and are available on request by e-mail to the corresponding author.
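The abstract does not spell out the Z curve formulas, so the following is only a minimal sketch based on the commonly cited definition of the Z curve, in which the three coordinates track the cumulative excess of purines over pyrimidines (x), amino over keto bases (y), and weak over strong hydrogen-bond bases (z). The function names `z_curve` and `sequence_from_z_curve` are illustrative assumptions, not names from the paper, and the sketch covers only the curve itself, not the gene-finding algorithm or the YZ score.

```python
from typing import List, Tuple

Point = Tuple[int, int, int]

# Per-base step in (x, y, z), following the standard Z curve definition:
#   x_n = (A_n + G_n) - (C_n + T_n)   purine vs pyrimidine
#   y_n = (A_n + C_n) - (G_n + T_n)   amino vs keto
#   z_n = (A_n + T_n) - (C_n + G_n)   weak vs strong hydrogen bonding
STEP = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}
# Each base has a distinct step, so the mapping can be inverted.
BASE_FOR_STEP = {step: base for base, step in STEP.items()}


def z_curve(seq: str) -> List[Point]:
    """Return the Z curve points P_1..P_N for a DNA sequence of length N."""
    x = y = z = 0
    points: List[Point] = []
    for base in seq.upper():
        dx, dy, dz = STEP[base]  # raises KeyError on non-ACGT symbols
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def sequence_from_z_curve(points: List[Point]) -> str:
    """Rebuild the DNA sequence from its Z curve (the inverse mapping)."""
    prev = (0, 0, 0)
    bases = []
    for p in points:
        step = (p[0] - prev[0], p[1] - prev[1], p[2] - prev[2])
        bases.append(BASE_FOR_STEP[step])
        prev = p
    return "".join(bases)
```

Because each of the four bases moves the curve in a different direction, the differences between consecutive points identify the bases one by one, which is the sense in which the sequence and the curve can each be reconstructed from the other.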

Year:  2000        PMID: 10908339      PMCID: PMC102655          DOI: 10.1093/nar/28.14.2804

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


