Learning vector quantization as an interpretable classifier for the detection of SARS-CoV-2 types based on their RNA sequences.

Marika Kaden, Katrin Sophie Bohnsack, Mirko Weber, Mateusz Kudła, Kaja Gutowska, Jacek Blazewicz, Thomas Villmann.

Abstract

We present an approach to discriminate SARS-CoV-2 virus types based on their RNA sequence descriptions without requiring a sequence alignment. For that purpose, sequences are preprocessed by feature extraction, and the resulting feature vectors are analyzed by prototype-based classification so that the model remains interpretable. In particular, we propose to use variants of learning vector quantization (LVQ) based on dissimilarity measures for RNA sequence data. The respective matrix LVQ provides additional knowledge about the classification decisions, such as discriminant feature correlations, and can additionally be equipped with easy-to-realize reject options for uncertain data. Those options provide self-controlled evidence, i.e., the model refuses to make a classification decision if the model evidence for the presented data is not sufficient. The model is first trained on a GISAID dataset with given virus types, which were detected according to the molecular differences in coronavirus populations by phylogenetic tree clustering. In a second step, we apply the trained model to another, unlabeled SARS-CoV-2 virus dataset. For these data, we can either assign a virus type to the sequences or reject atypical samples. The rejected sequences allow speculation about new virus types with respect to nucleotide base mutations in the viral sequences. Moreover, this rejection analysis improves model robustness. Last but not least, the presented approach has lower computational complexity than methods based on (multiple) sequence alignment. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00521-021-06018-2.
© The Author(s) 2021.
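
The pipeline described in the abstract (alignment-free feature extraction, nearest-prototype classification, and a reject option) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the k-mer frequency features, the plain squared Euclidean distance (matrix LVQ would instead learn an adaptive metric), the relative-distance certainty score, the threshold value, and the function names `kmer_profile` and `classify_with_reject`. Prototype training is omitted; the prototypes are assumed to be given.

```python
from itertools import product


def kmer_profile(seq, k=2):
    """Normalized k-mer frequency vector over the RNA alphabet ACGU.

    Assumed feature extraction: no alignment is needed, only counting.
    """
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[w] / total for w in kmers]


def sq_dist(x, w):
    """Squared Euclidean distance (stand-in for a learned dissimilarity)."""
    return sum((a - b) ** 2 for a, b in zip(x, w))


def classify_with_reject(x, prototypes, threshold=0.2):
    """Nearest-prototype decision with a simple reject option.

    prototypes: list of (vector, label) pairs covering at least two classes.
    Returns (label, certainty); label is None if the sample is rejected.
    """
    # Smallest distance to any prototype of each class.
    best = {}
    for w, label in prototypes:
        d = sq_dist(x, w)
        if label not in best or d < best[label]:
            best[label] = d
    ranked = sorted(best.items(), key=lambda item: item[1])
    (label, d_win), (_, d_other) = ranked[0], ranked[1]
    denom = d_win + d_other
    # Assumed certainty score in [0, 1]: 0 means the two closest classes
    # are equally far away, 1 means the sample sits on a prototype.
    certainty = (d_other - d_win) / denom if denom > 0 else 0.0
    return (label if certainty >= threshold else None), certainty


if __name__ == "__main__":
    # Hypothetical toy prototypes, one per class.
    protos = [
        (kmer_profile("AAAAAAAA"), "typeA"),
        (kmer_profile("CCCCCCCC"), "typeB"),
    ]
    for query in ("AAAAAAAC", "AAAACCCC"):
        print(query, classify_with_reject(kmer_profile(query), protos))
```

A sample that lies roughly halfway between the prototypes of two classes gets a certainty near zero and is rejected instead of being forced into a class, which is the behavior the abstract describes for atypical sequences.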

Keywords:  Interpretable models; Genomic sequence analysis; Learning vector quantization; Reject options

Year:  2021        PMID: 33935376      PMCID: PMC8076884          DOI: 10.1007/s00521-021-06018-2

Source DB:  PubMed          Journal:  Neural Comput Appl        ISSN: 0941-0643            Impact factor:   5.606


