
Homology-based gene structure prediction: simplified matching algorithm using a translated codon (tron) and improved accuracy by allowing for long gaps.

O Gotoh

Abstract

MOTIVATION: Locating protein-coding exons (CDSs) on a eukaryotic genomic DNA sequence is an initial and essential step in predicting the functions of the genes embedded in that part of the genome. Accurate prediction of CDSs may be achieved by directly matching the DNA sequence with a known protein sequence, or with a profile of one or more homologous family members.
RESULTS: A new convention for encoding a DNA sequence into a series of 23 possible letters (the translated codon, or tron, code) was devised to improve this type of analysis. Using this convention, a dynamic programming algorithm was developed that aligns a DNA sequence with a protein sequence or profile so that the spliced and translated sequence optimally matches the reference, in the same way as a standard protein sequence alignment, while allowing for long gaps. The objective function also takes account of frameshift errors, coding potential, and translational initiation, termination and splicing signals. The method was tested on Caenorhabditis elegans genes of known structure. The accuracy of prediction, measured as a correlation coefficient (CC) at the nucleotide level, was about 95% for the 288 genes tested, and 97.0% for the 170 genes whose product and closest homologue share more than 30% identical amino acids. We also propose a strategy to improve the accuracy of prediction for a set of paralogous genes by iterating gene prediction and reconstruction of the reference profile from the predicted sequences.

AVAILABILITY: The source code for the program 'aln', written in ANSI-C, and the test data will be available via anonymous FTP at ftp.genome.ad.jp/pub/genomenet/saitama-cc.

CONTACT: gotoh@cancer-c.pref.saitama.jp
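The abstract does not spell out the tron alphabet or the scoring scheme, so the following is only a minimal sketch under stated assumptions, not the algorithm implemented in 'aln'. It reads "translated codon" as assigning to every nucleotide position the amino acid of the codon that starts there, using the standard 21-symbol genetic code rather than the paper's 23-letter tron code. It then runs a toy dynamic programming recurrence with linear gaps, a fixed frameshift penalty and GT...AG introns placed between codons. The function names `tron_like` and `spliced_align` and all score values are hypothetical.

```python
# Standard genetic code (NCBI table 1); codon index = 16*b1 + 4*b2 + b3 with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
NEG = float("-inf")


def translate(codon: str) -> str:
    """Translate one codon; return 'X' if it is incomplete or has a non-ACGT letter."""
    if len(codon) != 3 or any(b not in BASES for b in codon):
        return "X"
    idx = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
    return AMINO[idx]


def tron_like(dna: str) -> str:
    """Hypothetical per-position encoding: letter k is the translation of dna[k:k+3].

    Assumption: uses the 21-symbol standard code, not the paper's 23-letter tron code.
    """
    return "".join(translate(dna[k:k + 3]) for k in range(len(dna) - 2))


def spliced_align(dna, prot, match=2, mismatch=-1, gap=-2, frameshift=-5, intron=-6):
    """Best score of a toy spliced DNA-vs-protein alignment (all parameters hypothetical)."""
    dna, prot = dna.upper(), prot.upper()
    tron = tron_like(dna)                        # precomputed once, valid in every frame
    n, m = len(dna), len(prot)
    H = [[NEG] * (m + 1) for _ in range(n + 1)]  # H[i][j]: coding state after dna[:i], prot[:j]
    N = [[NEG] * (m + 1) for _ in range(n + 1)]  # N[i][j]: inside an intron after dna[:i]
    for i in range(n + 1):
        H[i][0] = 0                              # leading genomic DNA is free
    for i in range(n + 1):
        for j in range(m + 1):
            if i >= 1:
                N[i][j] = N[i - 1][j]            # extend an open intron by one nucleotide
            if i >= 2 and dna[i - 2:i] == "GT":
                N[i][j] = max(N[i][j], H[i - 2][j] + intron)  # open an intron at a GT donor
            best = H[i][j]
            if j >= 1:
                best = max(best, H[i][j - 1] + gap)           # protein residue left unmatched
            if i >= 3:
                best = max(best, H[i - 3][j] + gap)           # whole codon left unmatched
                aa = tron[i - 3]                              # O(1) lookup of the codon dna[i-3:i]
                if j >= 1 and aa != "*":                      # stop codons cannot be matched
                    s = match if aa == prot[j - 1] else mismatch
                    best = max(best, H[i - 3][j - 1] + s)
            for k in (1, 2):                                  # frameshift: skip 1 or 2 nucleotides
                if i >= k:
                    best = max(best, H[i - k][j] + frameshift)
            if i >= 2 and dna[i - 2:i] == "AG":
                best = max(best, N[i][j])                     # close the intron at an AG acceptor
            H[i][j] = best
    return max(H[i][m] for i in range(n + 1))                 # trailing genomic DNA is free
```

For example, `spliced_align("ATGGCC" + "GTAAGTTTTTTAG" + "AAAGATGAAGGT", "MAKDEG")` can align all six residues by spending one intron penalty on the 13-nt GT...AG segment. Precomputing the per-position translation lets the recurrence look up `tron[i - 3]` in constant time whatever the reading frame, which is the kind of simplification a translated-codon encoding suggests. Unlike the published method, this sketch ignores splice-site scores, coding potential, initiation and termination signals, profiles and the special handling of long gaps, and it only admits introns that fall between codons.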


Year:  2000        PMID: 10869012     DOI: 10.1093/bioinformatics/16.3.190

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  12 articles in total (10 listed)

1.  Massive sequence comparisons as a help in annotating genomic sequences.

Authors:  A Louis; E Ollivier; J C Aude; J L Risler
Journal:  Genome Res       Date:  2001-07       Impact factor: 9.043

2.  BLAT--the BLAST-like alignment tool.

Authors:  W James Kent
Journal:  Genome Res       Date:  2002-04       Impact factor: 9.043

3.  Current methods of gene prediction, their strengths and weaknesses. (Review)

Authors:  Catherine Mathé; Marie-France Sagot; Thomas Schiex; Pierre Rouzé
Journal:  Nucleic Acids Res       Date:  2002-10-01       Impact factor: 16.971

4.  Cooperation of Spaln and Prrn5 for Construction of Gene-Structure-Aware Multiple Sequence Alignment.

Authors:  Osamu Gotoh
Journal:  Methods Mol Biol       Date:  2021

5.  The 2008 update of the Aspergillus nidulans genome annotation: a community effort.

Authors:  Jennifer Russo Wortman; Jane Mabey Gilsenan; Vinita Joardar; Jennifer Deegan; John Clutterbuck; Mikael R Andersen; David Archer; Mojca Bencina; Gerhard Braus; Pedro Coutinho; Hans von Döhren; John Doonan; Arnold J M Driessen; Pawel Durek; Eduardo Espeso; Erzsébet Fekete; Michel Flipphi; Carlos Garcia Estrada; Steven Geysens; Gustavo Goldman; Piet W J de Groot; Kim Hansen; Steven D Harris; Thorsten Heinekamp; Kerstin Helmstaedt; Bernard Henrissat; Gerald Hofmann; Tim Homan; Tetsuya Horio; Hiroyuki Horiuchi; Steve James; Meriel Jones; Levente Karaffa; Zsolt Karányi; Masashi Kato; Nancy Keller; Diane E Kelly; Jan A K W Kiel; Jung-Mi Kim; Ida J van der Klei; Frans M Klis; Andriy Kovalchuk; Nada Krasevec; Christian P Kubicek; Bo Liu; Andrew Maccabe; Vera Meyer; Pete Mirabito; Márton Miskei; Magdalena Mos; Jonathan Mullins; David R Nelson; Jens Nielsen; Berl R Oakley; Stephen A Osmani; Tiina Pakula; Andrzej Paszewski; Ian Paulsen; Sebastian Pilsyk; István Pócsi; Peter J Punt; Arthur F J Ram; Qinghu Ren; Xavier Robellet; Geoff Robson; Bernhard Seiboth; Piet van Solingen; Thomas Specht; Jibin Sun; Naimeh Taheri-Talesh; Norio Takeshita; Dave Ussery; Patricia A vanKuyk; Hans Visser; Peter J I van de Vondervoort; Ronald P de Vries; Jonathan Walton; Xin Xiang; Yi Xiong; An Ping Zeng; Bernd W Brandt; Michael J Cornell; Cees A M J J van den Hondel; Jacob Visser; Stephen G Oliver; Geoffrey Turner
Journal:  Fungal Genet Biol       Date:  2008-12-25       Impact factor: 3.495

6.  Composition-based statistics and translated nucleotide searches: improving the TBLASTN module of BLAST.

Authors:  E Michael Gertz; Yi-Kuo Yu; Richa Agarwala; Alejandro A Schäffer; Stephen F Altschul
Journal:  BMC Biol       Date:  2006-12-07       Impact factor: 7.431

7.  Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment.

Authors:  Osamu Gotoh; Mariko Morita; David R Nelson
Journal:  BMC Bioinformatics       Date:  2014-06-14       Impact factor: 3.169

8.  Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features.

Authors:  Hiroaki Iwata; Osamu Gotoh
Journal:  Nucleic Acids Res       Date:  2012-07-30       Impact factor: 16.971

9.  Improved annotation through genome-scale metabolic modeling of Aspergillus oryzae.

Authors:  Wanwipa Vongsangnak; Peter Olsen; Kim Hansen; Steen Krogsgaard; Jens Nielsen
Journal:  BMC Genomics       Date:  2008-05-23       Impact factor: 3.969

10.  A space-efficient and accurate method for mapping and aligning cDNA sequences onto genomic sequence.

Authors:  Osamu Gotoh
Journal:  Nucleic Acids Res       Date:  2008-03-15       Impact factor: 16.971
