Sidharth Purohit, Suresh Chandra Satapathy, S Sibi Chakkaravarthy, Yu-Dong Zhang.
Abstract
Viral outbreaks have had devastating effects on humankind. Prominent viruses such as the Ebola virus (2012), SARS-CoV (severe acute respiratory syndrome coronavirus), Middle East respiratory syndrome-related coronavirus, known as MERS (EMC/2012), the Spanish flu (H1N1 virus, 1918) and, most recently, COVID-19 (SARS-CoV-2) have threatened the survival of the human race. A global pandemic is currently placing the economy, livelihoods and human life under severe strain throughout the world. Most of these viruses exhibit similar characteristics and genetic patterns. Analysing these characteristics and patterns can give researchers deeper insight into the viruses and help in finding an appropriate medicine or cure. To this end, this paper presents an experimental analysis of data on these viruses using correlation methods. The virus data considered include the distribution of amino acids, protein sequences, 3D models of the viruses and pairwise alignment of the proteins encoded in the DNA genome of the viruses. Furthermore, this comparative analysis can be used by researchers, organizations such as the WHO (World Health Organization), computational biologists and genetic engineers as a framework for studying the DNA sequence distribution and the percentage of GC (guanine-cytosine) content, which determines the heat stability of viruses. We used Biopython to carry out the gene study of the prominent viruses and derived results and insights in the form of 3D models. The experimental results are promising, with an accuracy rate of 96% in the overall virus relationship calculation. © King Fahd University of Petroleum & Minerals 2021.
Keywords: Biopython; COVID-19; Corona; DNA; Ebola; Genome; MERS; SARS; Spanish Flu
Year: 2021 PMID: 34189012 PMCID: PMC8221988 DOI: 10.1007/s13369-021-05811-4
Source DB: PubMed Journal: Arab J Sci Eng ISSN: 2191-4281 Impact factor: 2.807
Fig. 1 Flowchart of the proposed methodology
Modified sequence of the COVID-19 virus in k-mer form
| Virus DNA genome considered | Modified sequence with optimum k-mer | String conversion of the sequence | Cleaned hexamers to be used in the frequency matrix |
|---|---|---|---|
| COVID-19 | [Seq('attaaa', SingleLetterAlphabet()), Seq('ttaaag', SingleLetterAlphabet()), Seq('taaagg', SingleLetterAlphabet()), Seq('aaaggt', SingleLetterAlphabet()), Seq('aaggtt', SingleLetterAlphabet()), Seq('aggttt', SingleLetterAlphabet()), …] | "[Seq('attaaa', SingleLetterAlphabet()), Seq('ttaaag', SingleLetterAlphabet()), Seq('taaagg', SingleLetterAlphabet()), Seq('aaaggt', SingleLetterAlphabet()), Seq('aaggtt', SingleLetterAlphabet()), …]" | attaaa ttaaag taaagg aaaggt aaggtt aggttt ggttta gtttat tttata ttatac tatacc atacct tacctt accttc ccttcc cttccc ttccca tcccag cccagg ccaggt caggta |
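The hexamers in the last column are overlapping windows of length 6 that advance one base at a time. A minimal sketch of that step in plain Python follows; the function name `build_kmers` is hypothetical, and the 26-base `prefix` is simply the string implied by the hexamers listed in the table, not the full genome.

```python
def build_kmers(sequence, k=6):
    """Return all overlapping k-mers (stride 1) of a nucleotide string."""
    seq = sequence.lower()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Assumption: the first 26 bases, reconstructed from the tabulated hexamers.
prefix = "attaaaggtttataccttcccaggta"

hexamers = build_kmers(prefix, k=6)

# Join the k-mers with spaces to obtain the "cleaned" form that a
# bag-of-words style frequency matrix can consume.
sentence = " ".join(hexamers)
print(sentence)
```

A sequence of length n yields n - k + 1 k-mers, so the 26-base prefix gives 21 hexamers, matching the table row.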
Fig. 2 DNA standard codon table
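As a rough illustration of how a standard codon table can be used to obtain an amino acid distribution, the sketch below builds the 64-codon standard genetic code in plain Python and counts the residues of a translated sequence. The names `CODON_TABLE`, `translate` and `amino_acid_distribution` are hypothetical, the input string is a toy example, and the source does not confirm that the authors translated sequences this way (Biopython offers its own translation routines).

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def amino_acid_distribution(dna):
    """Count how often each amino acid (and '*' for stop) occurs."""
    return Counter(translate(dna))


# Toy input (not taken from the paper): Met, Ala, stop.
print(translate("atggcctaa"))
print(amino_acid_distribution("atggcctaa"))
```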
Length of each DNA sequence considered
| Virus name | Length of DNA sequence (nucleotides) |
|---|---|
| COVID-19 virus | 29,903 |
| SARS virus | 29,751 |
| MERS virus | 30,119 |
| HIV virus | 999 |
| Ebola virus | 18,959 |
| Dengue virus | 15,256 |
| Rotavirus | 213 |
| Hanta virus | 3,653 |
| Spanish flu virus | 930 |
| Swine flu virus | 982 |
GC content of each virus
| Virus name | GC content percentage |
|---|---|
| COVID-19 virus | 37.97277865097148 |
| SARS virus | 40.7616550704178 |
| MERS virus | 41.23642883229855 |
| HIV virus | 33.233233233233236 |
| Ebola virus | 41.07284139458832 |
| Dengue virus | 49.0954378605139 |
| Rotavirus | 42.857142857142854 |
| Hanta virus | 39.41965507801807 |
| Spanish flu virus | 47.41935483870968 |
| Swine flu virus | 47.04684317718941 |
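GC content is the share of guanine and cytosine bases in a sequence, expressed here as a percentage of its length. A minimal sketch in plain Python is given below; `gc_percent` is a hypothetical helper name, and the example input is the short prefix implied by the hexamer table rather than a full genome. Biopython's `Bio.SeqUtils` module provides a comparable utility.

```python
def gc_percent(sequence):
    """Return the percentage of G and C bases in a nucleotide string."""
    seq = sequence.upper()
    if not seq:
        return 0.0  # avoid division by zero for an empty sequence
    gc_count = seq.count("G") + seq.count("C")
    return 100.0 * gc_count / len(seq)


# Assumption: a 26-base prefix used only for illustration.
print(gc_percent("attaaaggtttataccttcccaggta"))
```

Note that any ambiguous bases (for example `N`) still count towards the length in this sketch, which lowers the reported percentage.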
Pairwise global alignment between the COVID-19 DNA sequence and the DNA sequences of other viruses
| Virus name | Pairwise global alignment percentage (%) |
|---|---|
| SARS virus | 89 |
| MERS virus | 72 |
| HIV virus | 57 |
| Ebola virus | 58 |
| Dengue virus | 62 |
| Rotavirus | 21 |
| Hanta virus | 62 |
| Spanish flu virus | 64 |
| Swine flu virus | 62 |
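The source does not state how the alignment percentages above were derived. As one plausible reading, the sketch below computes a Needleman-Wunsch global alignment score with match = 1 and no mismatch or gap penalties, then divides by the length of the longer sequence. Both the scoring scheme and the normalisation are assumptions, and the function names are hypothetical.

```python
def global_alignment_score(a, b, match=1.0, mismatch=0.0, gap=0.0):
    """Needleman-Wunsch global alignment score, keeping one DP row in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def alignment_percent(a, b):
    """Assumed normalisation: score relative to the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 100.0 * global_alignment_score(a, b) / longest


# Toy sequences (not taken from the paper).
print(alignment_percent("acgt", "agt"))
```

The running time is proportional to the product of the two sequence lengths, so this pure-Python version is only practical for short inputs; for genomes of roughly 30,000 bases an optimised aligner such as the one in Biopython is a more realistic choice.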
Fig. 3 Protein sequence distribution of various viruses: (i) COVID-19, (ii) Ebola virus, (iii) MERS, (iv) SARS virus, (v) Hanta virus, (vi) Spanish flu virus, (vii) Dengue virus, (viii) Swine flu virus, (ix) Rotavirus
Fig. 4 Dot plot-based comparison of COVID-19 with various viruses: (i) SARS, (ii) Ebola virus, (iii) MERS, (iv) Hanta, (v) Spanish flu, (vi) Dengue virus, (vii) Swine flu virus, (viii) Rotavirus
Fig. 5 Visualization (3D model) of various viruses: (i) COVID-19, (ii) Ebola virus, (iii) MERS, (iv) SARS virus, (v) Hanta virus, (vi) Spanish flu virus, (vii) Dengue virus, (viii) Swine flu virus, (ix) Rotavirus
Cosine similarity-based analysis and angle between the COVID-19 DNA sequence and other virus sequences
| Virus name | Cosine of the angle between the COVID-19 vector and the considered virus vector (value) | Angle between COVID-19 and the virus, cos⁻¹(value) |
|---|---|---|
| COVID-19 virus | 1 | 0° |
| SARS virus | 0.92229834 | 22.7355694° |
| MERS virus | 0.89049951 | 27.06391766° |
| HIV virus | 0.06197418 | 86.44686409° |
| Ebola virus | 0.787751 | 38.02416713° |
| Dengue virus | 0.65690051 | 48.93608574° |
| Rotavirus | 0.05758748 | 86.69865401° |
| Hanta virus | 0.64873133 | 49.55398225° |
| Spanish flu | 0.31512345 | 71.63173348° |
| Swine flu | 0.32418355 | 71.08388204° |
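The angle column is the inverse cosine of the similarity value, converted to degrees. The sketch below illustrates the calculation on k-mer count vectors in plain Python. Representing each genome as hexamer counts is an assumption based on the "frequency matrix" mentioned in the k-mer table, and the function names are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(sequence, k=6):
    """Count overlapping k-mers; the Counter acts as a sparse frequency vector."""
    seq = sequence.lower()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def angle_degrees(cosine):
    """Angle in degrees; the cosine is clamped to guard against rounding."""
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


# Toy sequences (not taken from the paper), compared on 2-mers.
similarity = cosine_similarity(kmer_counts("acgtacgt", 2), kmer_counts("acgtaagt", 2))
print(similarity, angle_degrees(similarity))

# Converting a tabulated cosine back to an angle, e.g. the SARS row.
print(angle_degrees(0.92229834))
```

Because k-mer counts are never negative, the similarity lies between 0 and 1 and the angle between 0° and 90°: identical profiles give 0°, and profiles with no k-mer in common give 90°.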
Cosine similarity-based analysis and angle between the Wuhan COVID-19 DNA sequence and country-specific variants
| Country-specific virus variant/NCBI ID | Angle between the Wuhan COVID-19 DNA and the considered variant, cos⁻¹(value), in degrees | Cosine of the angle between the Wuhan COVID-19 DNA vector and the considered variant (value) |
|---|---|---|
| Australia/MT007544.1 | 0.68362699 | 0.99992882 |
| Bahrain/MW332535.1 | 2.94291781 | 0.99868118 |
| France/MT470127.1 | 0.91760131 | 0.99987176 |
| Germany/MT270108.1 | 2.78379925 | 0.99881991 |
| Greece/MT459897.1 | 2.75083162 | 0.99884769 |
| India/MW243003.1 | 2.85687987 | 0.99875715 |
| Italy/MW423686.1 | 3.20000429 | 0.99844076 |
| Japan/LC529905.1 | 0.54542658 | 0.99995469 |
| Netherlands/MT705205.1 | 0.83872064 | 0.99989286 |
| Saudi-Arabia/MT755890.1 | 1.05335376 | 0.99983101 |
| South-Korea/MT039890.1 | 0.95329293 | 0.99986159 |
| Spain/MW375727.1 | 2.77313779 | 0.99882893 |
| Thailand/MT447155.1 | 2.66007165 | 0.99892246 |
| Tunisia/MW426406.1 | 1.55209695 | 0.99963311 |
| USA (California)/MW306388.1 | 2.96895029 | 0.99865775 |
| USA (Minnesota)/MW349056.1 | 2.95714573 | 0.9986684 |
| USA (New Mexico)/MW269901.1 | 89.7553073 | 0.00427068 |
| USA (Wisconsin)/MW342048.1 | 46.1985380 | 0.69216159 |
| UK (England)/MW059036.1 | 1.18159272 | 0.99978736 |
| Vietnam/MT192773.1 | 2.83321587 | 0.99877765 |
| Zambia/MT790522.1 | 2.68441672 | 0.99890265 |