A decision tree system for finding genes in DNA.

S Salzberg, A L Delcher, K H Fasman, J Henderson.

Abstract

MORGAN is an integrated system for finding genes in vertebrate DNA sequences. MORGAN uses a variety of techniques to accomplish this task, the most distinctive of which is a decision tree classifier. The decision tree system is combined with new methods for identifying start codons, donor sites, and acceptor sites, and these are brought together in a frame-sensitive dynamic programming algorithm that finds the optimal segmentation of a DNA sequence into coding and noncoding regions (exons and introns). The optimal segmentation depends on a separate scoring function that takes a subsequence and assigns it a score reflecting the probability that the subsequence is an exon. The scoring functions in MORGAN are sets of decision trees whose outputs are combined to give a probability estimate. Experimental results on a database of 570 vertebrate DNA sequences show that MORGAN has excellent performance by many different measures. On a separate test set, it achieves an overall accuracy of 95%, with a correlation coefficient of 0.78, and a sensitivity and specificity for coding bases of 83% and 79%, respectively. In addition, MORGAN identifies 58% of coding exons exactly; i.e., both the beginning and the end of the coding region are predicted correctly. This paper describes the MORGAN system, including its decision tree routines and the algorithms for site recognition, and its performance on a benchmark database of vertebrate DNA.
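
The abstract reports nucleotide-level scores without defining them. As a minimal illustrative sketch (not MORGAN's own evaluation code), the Python below computes these measures from per-base coding labels. It assumes the convention common in gene-finding benchmarks, where sensitivity is TP/(TP+FN), specificity is the fraction of predicted coding bases that are truly coding, TP/(TP+FP), and the correlation coefficient is the Matthews correlation over all bases. The function name `nucleotide_metrics` and the toy sequences are hypothetical.

```python
import math
from typing import Dict, Sequence


def nucleotide_metrics(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compare per-base coding labels (True = coding base).

    Assumption: Sp = TP / (TP + FP), as is customary in gene-finding
    evaluations; CC is the Matthews correlation coefficient.
    """
    if len(actual) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # coding, predicted coding
    tn = sum(1 for a, p in pairs if not a and not p)  # noncoding, predicted noncoding
    fp = sum(1 for a, p in pairs if p and not a)      # noncoding, predicted coding
    fn = sum(1 for a, p in pairs if a and not p)      # coding, missed

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sensitivity, "Sp": specificity, "Acc": accuracy, "CC": cc}


# Toy example: "C" marks a coding base, "N" a noncoding base.
actual = [c == "C" for c in "NNCCCCCCNN"]
predicted = [c == "C" for c in "NNNCCCCCCN"]
# TP = 5, FN = 1, FP = 1, TN = 3, so Sn = Sp = 5/6, Acc = 0.8, CC = 14/24
print(nucleotide_metrics(actual, predicted))
```

The toy prediction is shifted one base to the right of the true exon. Even though 80% of bases are labeled correctly, the correlation coefficient is noticeably lower. This illustrates why gene-finding papers report CC alongside overall accuracy: noncoding bases usually dominate a sequence and inflate accuracy.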

Year: 1998    PMID: 10072083    DOI: 10.1089/cmb.1998.5.667

Source DB: PubMed    Journal: J Comput Biol    ISSN: 1066-5277    Impact factor: 1.479


