Tirthankar Paul, Seppo Vainio, Juha Roning.
Abstract
The coronavirus pandemic has become a major risk to global public health. The outbreak is caused by SARS-CoV-2, a member of the coronavirus family. Although images of the virus are familiar, the present study attempts to make the coronavirus audible by translating its spike protein into audio sequences. Musical features such as pitch, timbre, volume and duration are mapped from the coronavirus protein sequence. Three viruses (Influenza, Ebola and Coronavirus) were studied and compared through their auditory sequences using the Haar wavelet transform. Sonification of the coronavirus aids the understanding of protein structures by enhancing hidden features. It also clearly distinguishes the representation of coronavirus from that of the other viruses, which should help research related to virus sequences. This offers a simplified and novel way of representing the conventional computational methods.
Keywords: Coronavirus; Haar wavelet; MIDI; Protein music; SVM
Year: 2020 PMID: 33069829 PMCID: PMC7561519 DOI: 10.1016/j.ygeno.2020.10.009
Source DB: PubMed Journal: Genomics ISSN: 0888-7543 Impact factor: 5.736
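The mapping from protein sequence to musical features described in the abstract can be illustrated with a minimal sketch. The pitch table here is hypothetical: this record does not give the paper's actual assignment, so the example simply places the 20 standard amino acids on consecutive MIDI note numbers starting at 48 (one octave below middle C, note 60) and uses a fixed duration and volume. The names `PITCH`, `BASE_NOTE` and `sonify` are illustrative only.

```python
# Hypothetical mapping: the paper's real pitch/duration/volume rules are not
# given in this record, so every constant below is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
BASE_NOTE = 48  # MIDI note number one octave below middle C (60)

# Each residue gets its own pitch, one semitone apart.
PITCH = {aa: BASE_NOTE + i for i, aa in enumerate(AMINO_ACIDS)}


def sonify(protein, duration=0.25, velocity=80):
    """Return a list of (midi_pitch, duration_in_beats, velocity) note events."""
    notes = []
    for residue in protein.upper():
        if residue not in PITCH:
            continue  # skip ambiguous or unknown codes such as X
        notes.append((PITCH[residue], duration, velocity))
    return notes


# A short example peptide string, used only to show the output shape.
notes = sonify("MFVFLVLLPLVSSQ")
```

The resulting note events could be written to a MIDI file with any MIDI library; that step is left out so the sketch stays free of external dependencies.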
Genome sequence clustering algorithms.
| Reference | Algorithm | Significance | Year |
|---|---|---|---|
| [ | Natural Vector (NV) | Maps DNA sequences to vectors of twelve statistical points. | 2011 |
| [ | mBKM with DMk | Uses the occurrence and position of k-tuples in DNA sequences. | 2012 |
| [ | Linclust | Reduces time complexity for large datasets. | 2018 |
| [ | MeShClust | Clustering method based on the mean shift algorithm from image processing. | 2018 |
| [ | Haar wavelet filtering | Detects cancerous and non-cancerous genomes. | 2018 |
| [ | Accumulated Natural Vector (ANV) | Maps DNA sequences to vectors of eighteen statistical points. | 2019 |
Fig. 1 Steps of the musical conversion.
Fig. 2 Cross-validation optimization fit for coronavirus and non-coronavirus data.
Accession numbers of whole-genome virus sequences.
| Reference | Accession number | Description |
|---|---|---|
| [ | AF304460, AY994055, DQ811787, GQ477367, EU420138, EU420139, AF353511, DQ648858, AY585228, DQ011855, AY597011, FJ938068, AY278741, DQ412042, AY304486, M95169, EU022526, EU111742 | Coronavirus Family |
| [ | KU922529, KU922531, KU922536, KU922542, KY888158 | H1N1 Virus from Kerala, India |
| [ | NC_014373, NC_004161, NC_006432, NC_014372, NC_002549 | Ebolavirus |
| [ | NC_003045, NC_004718, KX722529, AB257344, AY291451, AY310120, AY338174, DQ182595, HQ890541 | Coronavirus |
| [ | AF455734, AF455726, MF955665, AF250130, AF250129, AB731584, CY043013, CY042542, CY020868, NC_002023 | Influenza |
| [ | KU182909, KY007523, KY007522, KT725391, KT725389, KT725378 | Ebolavirus |
Fig. 3 Piano roll plot of virus protein sequence.
Euclidean distance of coronavirus before and after translating into music.
Fig. 4 Clustering of virus sequences.
Fig. 5 Clustering of auditory sequence of the viruses.
Fig. 6 Boxplot of the average detail coefficient distribution for different viruses.
Fig. 7 Classification based on Haar wavelet coefficients.
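As a rough illustration of the Haar wavelet features referred to in the captions above, the following sketch computes one level of the orthonormal Haar transform on a numeric sequence (for example, a list of note pitches) and summarizes the detail coefficients by their mean absolute value. Whether the paper averages absolute values, how many decomposition levels it uses, and how odd-length inputs are handled are not stated in this record, so those choices are assumptions. The function names are illustrative.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    values = list(signal)
    if not values:
        raise ValueError("signal must contain at least one sample")
    if len(values) % 2:
        values.append(values[-1])  # assumption: pad odd lengths by repeating the last sample
    scale = math.sqrt(2.0)
    # Pairwise sums give the smooth part, pairwise differences the detail part.
    approx = [(values[i] + values[i + 1]) / scale for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / scale for i in range(0, len(values), 2)]
    return approx, detail


def mean_abs_detail(signal):
    """Mean absolute first-level detail coefficient (assumed summary feature)."""
    _, detail = haar_step(signal)
    return sum(abs(d) for d in detail) / len(detail)


approx, detail = haar_step([4, 2, 6, 6])
feature = mean_abs_detail([4, 2, 6, 6])
```

A single scalar per sequence, as computed here, is the kind of value a boxplot across virus groups could display; a classifier such as the SVM named in the keywords would more likely take several coefficients or levels as input.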