Vector space classification of DNA sequences.

H-M Müller, S E Koonin.

Abstract

Revisiting the problem of intron-exon identification, we use a principal component analysis (PCA) to classify DNA sequences and present first results that validate our approach. Sequences are translated into document vectors that represent their word content; a principal component analysis then defines Gaussian-distributed sequence classes. The classification uses word content and variation of word usage to distinguish sequences. We test our approach with several data sets of genomic DNA and are able to classify introns and exons with an accuracy of up to 96%. We compare the method with the best traditional coding measure, the non-overlapping hexamer frequency count, and find that the PCA method produces better results. We also investigate the degree of cross-validation between different data sets of introns and exons and find evidence that the quality of a data set can be detected.
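The first step described in the abstract, translating a sequence into a document vector of its word content, can be illustrated with a minimal sketch. The function name `word_vector`, the default word length `k=3`, and the normalization to relative frequencies are assumptions made for illustration; the abstract does not state the word length or weighting used for the PCA method. Setting `overlapping=False` with `k=6` gives a non-overlapping hexamer frequency count of the kind the abstract names as the comparison measure. The PCA step itself is not shown; it would operate on a matrix whose rows are such vectors.

```python
from collections import Counter
from itertools import product


def word_vector(seq, k=3, overlapping=True):
    """Return relative frequencies of all 4**k DNA words of length k in seq.

    Hypothetical helper: word length and normalization are assumptions,
    not details taken from the paper.
    """
    seq = seq.upper()
    # Overlapping words advance one base at a time; non-overlapping
    # words advance a full word length.
    step = 1 if overlapping else k
    words = [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]
    # Count only words made of unambiguous bases.
    counts = Counter(w for w in words if set(w) <= set("ACGT"))
    total = sum(counts.values())
    # Fixed vocabulary order so that vectors from different sequences align.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[w] / total if total else 0.0 for w in vocab]


# Non-overlapping trinucleotide content of a short toy sequence.
vec = word_vector("ATGGCCATGGCC", k=3, overlapping=False)
```

Each sequence then becomes a point in a 4^k-dimensional space, which is the representation a principal component analysis would take as input.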

Year:  2003        PMID: 12814599     DOI: 10.1016/s0022-5193(03)00082-1

Source DB:  PubMed          Journal:  J Theor Biol        ISSN: 0022-5193            Impact factor:   2.691


  4 in total

1.  Capturing genomic signatures of DNA sequence variation using a standard anonymous microarray platform.

Authors:  C H Cannon; C S Kua; E K Lobenhofer; P Hurban
Journal:  Nucleic Acids Res       Date:  2006-09-25       Impact factor: 16.971

2.  Graphical classification of DNA sequences of HLA alleles by deep learning.

Authors:  Jun Miyake; Yuhei Kaneshita; Satoshi Asatani; Seiichi Tagawa; Hirohiko Niioka; Takashi Hirano
Journal:  Hum Cell       Date:  2018-01-11       Impact factor: 4.174

3.  A comprehensive tool for rapid and accurate prediction of disease using DNA sequence classifier.

Authors:  Garima Mathur; Anjana Pandey; Sachin Goyal
Journal:  J Ambient Intell Humaniz Comput       Date:  2022-06-25

4.  Metagenome fragment classification using N-mer frequency profiles.

Authors:  Gail Rosen; Elaine Garbarine; Diamantino Caseiro; Robi Polikar; Bahrad Sokhansanj
Journal:  Adv Bioinformatics       Date:  2008-11-16