Locating protein coding regions in human DNA using a decision tree algorithm.

S Salzberg.

Abstract

Genes in eukaryotic DNA cover hundreds or thousands of base pairs, while the regions of those genes that code for proteins may occupy only a small percentage of the sequence. Identifying the coding regions is of vital importance in understanding these genes. Many recent research efforts have studied computational methods for distinguishing between coding and noncoding regions, and several promising results have been reported. We describe here a new approach, using a machine learning system that builds decision trees from the data. This approach combines several coding measures to produce classifiers with consistently higher accuracies than previous methods, on DNA sequences ranging from 54 to 162 base pairs in length. The algorithm is very efficient, and it can easily be adapted to different sequence lengths. Our conclusion is that decision trees are a highly effective tool for identifying protein coding regions.
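The general idea of combining several coding measures in a decision tree can be illustrated with a minimal sketch. This is not the system described in the abstract. The abstract names neither the coding measures nor the tree-building software, so the two measures (`min_stop_count`, `position_asymmetry`), the Gini-based axis-parallel splitting, and the toy 54-base-pair training sequences below are all assumptions chosen for illustration.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}

def min_stop_count(seq):
    """Fewest stop codons found in any of the three forward reading frames."""
    counts = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        counts.append(sum(c in STOP_CODONS for c in codons))
    return min(counts)

def position_asymmetry(seq):
    """Sum over bases of the variance of base frequency across codon positions."""
    total = 0.0
    for base in "ACGT":
        freqs = []
        for pos in range(3):
            column = seq[pos::3]
            freqs.append(column.count(base) / max(len(column), 1))
        mean = sum(freqs) / 3
        total += sum((f - mean) ** 2 for f in freqs) / 3
    return total

def features(seq):
    # Hypothetical feature vector: one value per (assumed) coding measure.
    return (min_stop_count(seq), position_asymmetry(seq))

def gini(labels):
    """Gini impurity for binary labels (1 = coding, 0 = noncoding)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return (score, feature index, threshold) with the lowest weighted Gini."""
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a small tree; leaves are class labels, nodes are 4-tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:  # all feature values identical, nothing to split on
        return majority
    _, j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (j, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def classify(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

# Toy 54-bp windows (made up for illustration, not real genomic data).
coding = ["GCTGAAGGC" * 6, "ATGGCCAAG" * 6]
noncoding = [("TAACTAGCTGAC" * 5)[:54], ("TGATTAGATAAC" * 5)[:54]]
rows = [features(s) for s in coding + noncoding]
labels = [1] * len(coding) + [0] * len(noncoding)
tree = build_tree(rows, labels)
```

A new window is then scored with `classify(tree, features(window))`. Adapting the sketch to another window length (the abstract mentions 54 to 162 base pairs) only requires retraining on windows of that length, since the features are computed per sequence.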

Year:  1995        PMID: 8521276     DOI: 10.1089/cmb.1995.2.473

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  8 in total

1.  Computational gene finding in plants. (Review)

Authors:  Mihaela Pertea; Steven L Salzberg
Journal:  Plant Mol Biol       Date:  2002-01       Impact factor: 4.076

2.  A Random Forest Approach for Counting Silicone Oil Droplets and Protein Particles in Antibody Formulations Using Flow Microscopy.

Authors:  Miguel Saggu; Ankit R Patel; Theodoro Koulis
Journal:  Pharm Res       Date:  2016-12-19       Impact factor: 4.200

3.  The use of classification trees for bioinformatics.

Authors:  Xiang Chen; Minghui Wang; Heping Zhang
Journal:  Wiley Interdiscip Rev Data Min Knowl Discov       Date:  2011-01-06

4.  Classification of genomic islands using decision trees and their ensemble algorithms.

Authors:  Dongsheng Che; Cory Hockenbury; Robert Marmelstein; Khaled Rasheed
Journal:  BMC Genomics       Date:  2010-11-02       Impact factor: 3.969

5.  IN-MACA-MCC: Integrated Multiple Attractor Cellular Automata with Modified Clonal Classifier for Human Protein Coding and Promoter Prediction.

Authors:  Kiran Sree Pokkuluri; Ramesh Babu Inampudi; S S S N Usha Devi Nedunuri
Journal:  Adv Bioinformatics       Date:  2014-07-15

6.  Investigation and identification of protein carbonylation sites based on position-specific amino acid composition and physicochemical features.

Authors:  Shun-Long Weng; Kai-Yao Huang; Fergie Joanda Kaunang; Chien-Hsun Huang; Hui-Ju Kao; Tzu-Hao Chang; Hsin-Yao Wang; Jang-Jih Lu; Tzong-Yi Lee
Journal:  BMC Bioinformatics       Date:  2017-03-14       Impact factor: 3.169

7.  Characterization and Identification of Natural Antimicrobial Peptides on Different Organisms.

Authors:  Chia-Ru Chung; Jhih-Hua Jhong; Zhuo Wang; Siyu Chen; Yu Wan; Jorng-Tzong Horng; Tzong-Yi Lee
Journal:  Int J Mol Sci       Date:  2020-02-02       Impact factor: 5.923

8.  Gene expression prediction using low-rank matrix completion.

Authors:  Arnav Kapur; Kshitij Marwah; Gil Alterovitz
Journal:  BMC Bioinformatics       Date:  2016-06-17       Impact factor: 3.169
