Gene recognition via spliced sequence alignment.

M S Gelfand, A A Mironov, P A Pevzner.

Abstract

Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics, and applications of combinatorial methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way toward a new approach to gene recognition that uses previously sequenced genes as a clue for recognition of newly sequenced genes. This paper describes a spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual proteins was 99%. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused either by short (less than 5 amino acids) initial/terminal exons or by alternative splicing. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is nonvertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins at 160 accepted point mutations (PAM) (25% similarity), the correlation between the predicted and actual genes was still as high as 95%.
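The core idea in the abstract, searching all chains of candidate exons in polynomial time for the one that best fits a known target, can be illustrated with a small dynamic program. The following is a minimal sketch under simplifying assumptions, not the authors' implementation: the target is a nucleotide string rather than a protein, scoring uses fixed match/mismatch/gap values, and candidate blocks are supplied by the caller as half-open, non-empty intervals. The function name `spliced_alignment` and all of its parameters are hypothetical.

```python
def spliced_alignment(genome, blocks, target, match=1, mismatch=-1, gap=-1):
    """Return (score, chain) for the chain of non-overlapping blocks whose
    concatenation has the best global alignment score against `target`.

    `blocks` are half-open, non-empty intervals (start, end) into `genome`.
    Illustrative simplification: nucleotide target and fixed scores.
    """
    m = len(target)
    by_score = lambda cell: cell[0]
    # Process blocks by end coordinate so every possible predecessor
    # (a block ending at or before this block's start) is already done.
    blocks = sorted(blocks, key=lambda b: b[1])
    # Row for the empty chain: a target prefix aligned against nothing.
    empty = [(gap * j, ()) for j in range(m + 1)]
    finished = []  # list of ((start, end), last DP row of that block)
    best = empty[m]
    for start, end in blocks:
        # Entry row: best of the empty chain and any chain ending before start.
        row = list(empty)
        for (_, prev_end), prev_row in finished:
            if prev_end <= start:
                row = [max(a, b, key=by_score) for a, b in zip(row, prev_row)]
        # Record that the chain now continues through this block.
        row = [(s, chain + ((start, end),)) for s, chain in row]
        # Ordinary global-alignment recurrence inside the block.
        for i in range(start, end):
            new = [(row[0][0] + gap, row[0][1])]
            for j in range(1, m + 1):
                sub = match if genome[i] == target[j - 1] else mismatch
                new.append(max(
                    (row[j - 1][0] + sub, row[j - 1][1]),  # align two bases
                    (row[j][0] + gap, row[j][1]),          # gap in target
                    (new[j - 1][0] + gap, new[j - 1][1]),  # gap in genome
                    key=by_score,
                ))
            row = new
        finished.append(((start, end), row))
        best = max(best, row[m], key=by_score)
    return best


genome = "AAACCCGGGTTT"
candidates = [(0, 3), (3, 6), (6, 9), (9, 12)]
print(spliced_alignment(genome, candidates, "AAAGGG"))
# -> (6, ((0, 3), (6, 9)))
```

Each block is aligned against the target once, and its entry row is taken from the best-scoring compatible predecessor, so the work grows roughly with (total block length + number of blocks squared) times the target length rather than with the number of possible chains. Carrying the chain inside every cell keeps the sketch short at the cost of memory; a fuller implementation would more likely store back-pointers and recover the chain by traceback.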

Year:  1996        PMID: 8799154      PMCID: PMC38595          DOI: 10.1073/pnas.93.17.9061

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205
