A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network.

Antonino Fiannaca, Massimo La Rosa, Riccardo Rizzo, Alfonso Urso.

Abstract

OBJECTIVES: This paper proposes an alignment-free method for DNA barcode classification that is based on both a spectral representation of DNA sequences and a neural gas network for unsupervised clustering.
METHODS: In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments compared our method with supervised machine learning classification algorithms (support vector machine, random forest, RIPPER, naïve Bayes, Ridor, and classification tree), on full-length sequences and on short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database".
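As a rough illustration of what a k-mer spectral representation can look like, the following minimal sketch counts every overlapping k-mer in a sequence and returns the counts as a fixed-length vector. The function name `kmer_spectrum`, the default `k=3`, and the restriction to the alphabet ACGT are assumptions made for this example; the abstract does not state the value of k used, and the selection of the sequence signature is not shown here.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(sequence, k=3):
    """Count overlapping k-mers and return them as a fixed-length vector.

    Hypothetical helper for illustration only. The vector has 4**k entries,
    one per k-mer over the alphabet ACGT, in lexicographic order. Windows
    containing other symbols (for example N) are not represented.
    """
    sequence = sequence.upper()
    # Slide a window of width k across the sequence, one base at a time.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts.get(word, 0) for word in vocabulary]


if __name__ == "__main__":
    vector = kmer_spectrum("ACGTACGT", k=2)
    print(len(vector), sum(vector))
```

Because every sequence maps to a vector of the same length regardless of its own length, sequences can be compared without alignment, which is the property an alignment-free method relies on.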
RESULTS: The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when the classification task is performed on a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method reaches an accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best of the other classifiers (random forest) reaches an accuracy of 20.9%.
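The short-fragment experiments rely on subsequences drawn at random from the full barcodes. The sketch below shows one plausible way to draw such a fragment; the function name `random_fragment`, the uniform choice of start position, and the handling of sequences shorter than the requested length are assumptions, since the abstract does not describe the sampling protocol.

```python
import random


def random_fragment(sequence, length, rng):
    """Return a contiguous fragment of `length` bases from `sequence`.

    Hypothetical helper for illustration only. The start position is drawn
    uniformly; a sequence no longer than `length` is returned unchanged.
    """
    if len(sequence) <= length:
        return sequence
    start = rng.randrange(len(sequence) - length + 1)
    return sequence[start:start + length]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the draw is repeatable
    barcode = "ACGT" * 160  # placeholder 640-bp sequence, not real data
    fragment = random_fragment(barcode, 200, rng)
    print(len(fragment))
```

Passing the random number generator in explicitly keeps the extraction repeatable, which matters when several classifiers are to be compared on the same fragments.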
CONCLUSIONS: Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments.
Copyright © 2015 Elsevier B.V. All rights reserved.

Keywords:  Alignment-free analysis; DNA barcode classification; Neural gas; k-Mer representation

Year:  2015        PMID: 26170017     DOI: 10.1016/j.artmed.2015.06.002

Source DB:  PubMed          Journal:  Artif Intell Med        ISSN: 0933-3657            Impact factor:   5.326


  5 in total

1.  Deep learning models for bacteria taxonomic classification of metagenomic data.

Authors:  Antonino Fiannaca; Laura La Paglia; Massimo La Rosa; Giosuè Lo Bosco; Giovanni Renda; Riccardo Rizzo; Salvatore Gaglio; Alfonso Urso
Journal:  BMC Bioinformatics       Date:  2018-07-09       Impact factor: 3.169

2.  Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification.

Authors:  Rahul Jamdade; Maulik Upadhyay; Khawla Al Shaer; Eman Al Harthi; Mariam Al Sallani; Mariam Al Jasmi; Asma Al Ketbi
Journal:  Plants (Basel)       Date:  2021-12-13

3.  Mathematical Modeling and Computational Prediction of High-Risk Types of Human Papillomaviruses.

Authors:  Junchao Zhang; Kechao Wang
Journal:  Comput Math Methods Med       Date:  2022-07-21       Impact factor: 2.809

4.  Deep learning architectures for prediction of nucleosome positioning from sequences data.

Authors:  Mattia Di Gangi; Giosuè Lo Bosco; Riccardo Rizzo
Journal:  BMC Bioinformatics       Date:  2018-11-20       Impact factor: 3.169

5.  Methylation-driven model for analysis of dinucleotide evolution in genomes.

Authors:  Jian-Hong Sun; Shi-Meng Ai; Shu-Qun Liu
Journal:  Theor Biol Med Model       Date:  2020-04-08       Impact factor: 2.432
