
Assembling genes from predicted exons in linear time with dynamic programming.

R Guigó.

Abstract

In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from such a pool as sequences of nonoverlapping frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest scoring candidate gene is assumed to be the most likely gene encoded by the query DNA sequence. Considering additive gene scoring functions, currently available algorithms to determine such a highest scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. Polynomial algorithms rely on the fact that, while scanning the set of predicted exons, the highest scoring gene ending in a given exon can be obtained by appending the exon to the highest scoring among the highest scoring genes ending at each compatible preceding exon. The algorithm here relies on the simple fact that such a highest scoring gene can be stored and updated. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. On the other hand, the algorithm described here does not assume an underlying gene structure model. Indeed, the definition of valid gene structures is externally defined in the so-called Gene Model. The Gene Model simply specifies which gene features are allowed immediately upstream of which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem. In particular, it allows for multiple-gene two-strand predictions and for considering gene features other than coding exons (such as promoter elements) in valid gene structures.
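
The simultaneous scan by acceptor and donor position described in the abstract can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the published implementation: frame compatibility and the Gene Model are ignored, two exons are treated as compatible whenever the first one's donor lies strictly before the second one's acceptor, and the gene score is the plain sum of exon scores. The names `Exon` and `assemble_best_gene` are hypothetical. The sketch sorts the exons itself, so the linear bound applies only to the scanning loop, where a single running maximum replaces the inner loop over all preceding exons.

```python
from typing import List, NamedTuple, Optional, Tuple


class Exon(NamedTuple):
    acceptor: int   # start position of the exon (assumed <= donor)
    donor: int      # end position of the exon
    score: float    # additive exon score


def assemble_best_gene(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return (score, exon chain) of the highest scoring non-overlapping chain.

    Simplifying assumption: exon k may precede exon i iff k.donor < i.acceptor.
    """
    n = len(exons)
    if n == 0:
        return 0.0, []

    # Two views of the same exon set: one ordered by acceptor, one by donor.
    by_acceptor = sorted(range(n), key=lambda i: exons[i].acceptor)
    by_donor = sorted(range(n), key=lambda i: exons[i].donor)

    best_ending = [0.0] * n                      # best gene score ending at exon i
    prev: List[Optional[int]] = [None] * n       # back-pointer for traceback

    # Stored and updated best gene among exons whose donor has been passed.
    # A value of 0.0 with no exon stands for "start a new gene here".
    best_prefix_score = 0.0
    best_prefix_exon: Optional[int] = None

    j = 0  # pointer into by_donor; it only moves forward, so the scan is linear
    for i in by_acceptor:
        exon = exons[i]
        # Absorb every exon that ends before this one starts. Such an exon also
        # starts earlier, so its best_ending value has already been computed.
        while j < n and exons[by_donor[j]].donor < exon.acceptor:
            k = by_donor[j]
            if best_ending[k] > best_prefix_score:
                best_prefix_score = best_ending[k]
                best_prefix_exon = k
            j += 1
        best_ending[i] = exon.score + best_prefix_score
        prev[i] = best_prefix_exon

    # Trace back from the best terminal exon.
    last = max(range(n), key=lambda i: best_ending[i])
    chain: List[Exon] = []
    cur: Optional[int] = last
    while cur is not None:
        chain.append(exons[cur])
        cur = prev[cur]
    chain.reverse()
    return best_ending[last], chain
```

Because the donor pointer `j` never moves backwards, each exon is visited once in each ordering. Supporting frame compatibility or a Gene Model would, under this sketch's design, mean keeping one running maximum per feature class rather than a single one.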

Year: 1998 · PMID: 10072084 · DOI: 10.1089/cmb.1998.5.681

Source DB: PubMed · Journal: J Comput Biol · ISSN: 1066-5277 · Impact factor: 1.479

