
Gene classification using codon usage and support vector machines.

Jianmin Ma, Minh N Nguyen, Jagath C Rajapakse.

Abstract

A novel approach for gene classification is proposed, which adopts codon usage bias as the input feature vector for classification by support vector machines (SVM). The DNA sequence is first converted to a 59-dimensional feature vector in which each element corresponds to the relative synonymous usage frequency of a codon. As the input to the classifier is independent of sequence length and variance, our approach is useful when the sequences to be classified are of different lengths, a condition under which homology-based methods tend to fail. The method is demonstrated using 1,841 Human Leukocyte Antigen (HLA) sequences, which are classified into two major classes, HLA-I and HLA-II; each major class is further subdivided into sub-groups of HLA-I and HLA-II molecules. Using codon usage frequencies, binary SVM achieved an accuracy rate of 99.3% for HLA major class classification, and multi-class SVM achieved accuracy rates of 99.73% and 98.38% for sub-class classification of HLA-I and HLA-II molecules, respectively. The results show that gene classification based on codon usage bias is consistent with the molecular structures and biological functions of HLA molecules.
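
The feature extraction step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so this assumes the commonly used relative synonymous codon usage (RSCU) definition: the observed count of a codon divided by the mean count over its synonymous family. The 59 dimensions are obtained here by dropping the three stop codons and the two single-codon amino acids (Met and Trp) from the 64 codons of the standard genetic code. The names `FEATURE_CODONS` and `rscu_vector`, the alphabetical feature ordering, and the choice of 0.0 for amino acids absent from a sequence are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons into synonymous families, keyed by amino acid.
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    FAMILIES.setdefault(aa, []).append(codon)

# Keep only codons that have at least one synonym and are not stop codons.
# This leaves 64 - 3 (stops) - 2 (ATG, TGG) = 59 codons.
FEATURE_CODONS = sorted(
    codon
    for aa, codons in FAMILIES.items()
    if aa != "*" and len(codons) > 1
    for codon in codons
)


def rscu_vector(dna):
    """Return a 59-element RSCU vector for an in-frame coding sequence."""
    seq = dna.upper().replace("U", "T")
    # Count non-overlapping codons; trailing bases that do not fill a codon are ignored.
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    features = []
    for codon in FEATURE_CODONS:
        family = FAMILIES[CODON_TABLE[codon]]
        total = sum(counts[c] for c in family)
        # RSCU = observed count / mean count within the synonymous family.
        # Assumption: use 0.0 when the amino acid does not occur at all.
        features.append(counts[codon] * len(family) / total if total else 0.0)
    return features
```

Because each value is a ratio within a synonymous family, the vector has the same length and scale for short and long sequences, which is the length independence the abstract relies on. The resulting vectors could then be passed to any SVM implementation; that step is not shown here.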

Year: 2009    PMID: 19179707    DOI: 10.1109/TCBB.2007.70240

Source DB: PubMed    Journal: IEEE/ACM Trans Comput Biol Bioinform    ISSN: 1545-5963    Impact factor: 3.710


