Dipendra C Sengupta, Matthew D Hill, Kevin R Benton, Hirendra N Banerjee.
Abstract
The novel coronavirus (SARS-CoV-2), generally referred to as the Covid-19 virus, has spread to 213 countries, with nearly 7 million confirmed cases and nearly 400,000 deaths. Such major outbreaks demand classification of the viral genomic sequence and identification of its origin for planning, containment, and treatment. Motivated by this need, we report two alignment-free methods combined with chaos game representation (CGR) to perform clustering analysis and build a phylogenetic tree from the result. To each DNA sequence we associate a matrix, and we define the distance between two DNA sequences as the distance between their associated matrices. We use these methods for phylogenetic analysis of coronavirus sequences. Our approach provides a powerful tool for analyzing and annotating genomes and their phylogenetic relationships. We also compare our tool with the ClustalX algorithm, one of the most popular alignment methods. Our alignment-free methods are shown to be capable of finding the closest genetic relatives of coronaviruses.
Keywords: Chaos Game Representation; Covid-19; Deoxyribonucleic Acid; Phylogenetic Analysis; Shannon Entropy
Year: 2020 PMID: 32953249 PMCID: PMC7497811 DOI: 10.4236/cmb.2020.103004
Source DB: PubMed Journal: Comput Mol Biosci ISSN: 2165-3445
Table 1. Dataset for the experiment.
| Virus name | NCBI/GISAID Accession number |
|---|---|
| 1) hCov-19/bat/Yunnan | EPI_ISL_412976 |
| 2) hCov-19/pangolin/Guangdong | EPI_ISL_410721 |
| 3) hCov-19/bat/Yunnan/RaTG13 | EPI_ISL_402131 |
| 4) hCov-19/India | EPI_ISL_431117 |
| 5) hCov-19/Italy | EPI_ISL_417446 |
| 6) hCov-19/Iran | EPI_ISL_437512 |
| 7) hCov-19/Spain | EPI_ISL_428684 |
| 8) hCov-19/USA | EPI_ISL_431086 |
| 9) hCov-19/Wuhan | EPI_ISL_412980 |
| 10) Human Coronavirus-229E | KF-514433 |
| 11) Human Coronavirus-HKU1 | KF-430201 |
| 12) Human Coronavirus-NL63 | KF-530114 |
| 13) Human Coronavirus-OC43 | KF-530099 |
| 14) SARS-Cov | NC_004718 |
| 15) MERS | KT-026456 |
Figure 1. CGR images of all fifteen coronaviruses listed in Table 1.
Probability distance matrix of 15 viruses listed in Table 1.
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | | | | | | | | | | | | |
| 2 | 0.3079 | | | | | | | | | | | | | | |
| 3 | 0.4900 | 0.4606 | | | | | | | | | | | | | |
| 4 | 0.5129 | 0.6301 | 0.6303 | | | | | | | | | | | | |
| 5 | 0.7076 | 0.7548 | 0.7506 | 0.7436 | | | | | | | | | | | |
| 6 | 0.7342 | 0.7737 | 0.7602 | 0.7969 | 0.7858 | | | | | | | | | | |
| 7 | 0.8657 | 0.8700 | 0.8443 | 0.9420 | 0.8850 | 0.8406 | | | | | | | | | |
| 8 | 0.8074 | 0.8299 | 0.8037 | 0.8828 | 0.8587 | 0.7247 | 0.7237 | | | | | | | | |
| 9 | 0.7578 | 0.7904 | 0.7744 | 0.8132 | 0.7894 | 0.7612 | 0.7067 | 0.7470 | | | | | | | |
| 10 | 0.4920 | 0.7671 | 0.2929 | 0.6313 | 0.7441 | 0.7714 | 0.8531 | 0.8123 | 0.7846 | | | | | | |
| 11 | 0.4947 | 0.4750 | 0.0600 | 0.6408 | 0.7608 | 0.7614 | 0.8519 | 0.8029 | 0.7827 | 0.3143 | | | | | |
| 12 | 0.4930 | 0.4677 | 0.0321 | 0.6341 | 0.7553 | 0.7602 | 0.8477 | 0.8028 | 0.7783 | 0.3024 | 0.0299 | | | | |
| 13 | 0.4905 | 0.4644 | 0.0180 | 0.6311 | 0.7529 | 0.7601 | 0.8456 | 0.8032 | 0.7757 | 0.2972 | 0.0492 | 0.0200 | | | |
| 14 | 0.4901 | 0.4646 | 0.0179 | 0.6318 | 0.7524 | 0.7595 | 0.8451 | 0.8030 | 0.7748 | 0.2978 | 0.0530 | 0.0254 | 0.0168 | | |
| 15 | 0.4907 | 0.4623 | 0.0095 | 0.6306 | 0.7514 | 0.7599 | 0.8444 | 0.8037 | 0.7748 | 0.2953 | 0.0583 | 0.0320 | 0.0192 | 0.0192 | |
CGR Centroid distance matrix of 15 viruses listed in Table 1.
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | | | | | | | | | | | | |
| 2 | 0.4531 | | | | | | | | | | | | | | |
| 3 | 0.5567 | 0.4439 | | | | | | | | | | | | | |
| 4 | 0.5408 | 0.6281 | 0.6188 | | | | | | | | | | | | |
| 5 | 0.9029 | 0.9255 | 0.8784 | 0.7598 | | | | | | | | | | | |
| 6 | 0.8845 | 0.8718 | 0.8409 | 0.8615 | 0.8762 | | | | | | | | | | |
| 7 | 1.4297 | 1.3203 | 1.2682 | 1.3924 | 1.300 | 1.2339 | | | | | | | | | |
| 8 | 1.2246 | 1.0924 | 1.0161 | 1.2011 | 1.200 | 0.9157 | 0.9635 | | | | | | | | |
| 9 | 1.0256 | 0.9862 | 0.9295 | 0.9869 | 0.9310 | 0.8623 | 0.9538 | 0.9123 | | | | | | | |
| 10 | 0.5581 | 0.4575 | 0.3303 | 0.6356 | 0.9271 | 0.8824 | 1.2759 | 1.0163 | 0.9912 | | | | | | |
| 11 | 0.5915 | 0.4816 | 0.1350 | 0.6525 | 0.9115 | 0.8667 | 1.2682 | 1.0432 | 0.9391 | 0.3694 | | | | | |
| 12 | 0.5654 | 0.4591 | 0.0969 | 0.6312 | 0.8839 | 0.8518 | 1.2604 | 1.0403 | 0.9217 | 0.3446 | 0.0670 | | | | |
| 13 | 0.5607 | 0.4576 | 0.0702 | 0.6247 | 0.8837 | 0.8450 | 1.2644 | 1.0326 | 0.9291 | 0.3367 | 0.1156 | 0.0636 | | | |
| 14 | 0.6113 | 0.5127 | 0.1596 | 0.6785 | 0.9097 | 0.8583 | 1.3064 | 1.0584 | 0.9613 | 0.3859 | 0.2254 | 0.1793 | 0.1558 | | |
| 15 | 0.5460 | 0.4416 | 0.0454 | 0.6167 | 0.8783 | 0.8332 | 1.2680 | 1.0221 | 0.9235 | 0.3290 | 0.1295 | 0.0943 | 0.0721 | 0.1586 | |
Figure 2. HAC phylogenetic tree using probability matrix distance.
Figure 3. HAC phylogenetic tree using CGR centroid distance.
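As a rough illustration of how a hierarchical agglomerative clustering (HAC) tree can be derived from a pairwise distance matrix, the sketch below runs a plain-Python clustering on four entries of the probability distance matrix (viruses 1, 2, 3, and 15 from Table 1). The linkage rule is not stated in this record, so average linkage is an assumption here, and the function name `hac_average` is hypothetical.

```python
def hac_average(labels, dist):
    """Average-linkage agglomerative clustering (assumed linkage rule).

    labels: list of leaf names.
    dist:   dict mapping frozenset({a, b}) -> distance between leaves a and b.
    Returns the merges in order as (cluster_a, cluster_b, height) tuples,
    where each cluster is a tuple of leaf names.
    """
    clusters = [(name,) for name in labels]
    merges = []

    def linkage(c1, c2):
        # mean of all pairwise leaf distances between the two clusters
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


# Four rows of the probability distance matrix, keyed by Table 1 index.
pairs = {
    ("1", "2"): 0.3079,
    ("1", "3"): 0.4900,
    ("2", "3"): 0.4606,
    ("1", "15"): 0.4907,
    ("2", "15"): 0.4623,
    ("3", "15"): 0.0095,
}
dist = {frozenset(key): value for key, value in pairs.items()}
merges = hac_average(["1", "2", "3", "15"], dist)
for left, right, height in merges:
    print(left, right, round(height, 4))
```

The merge order and heights define the tree: leaves joined at a small height are the closest relatives under the chosen distance. A different linkage rule (single, complete, Ward) can change the topology.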
Figure 4. Phylogenetic tree created by Clustal X by aligning the 15 DNA sequences using the neighbor-joining method.
Figure 5. Shannon entropy of 57 virus genomes.
Figure 6. 7-mer Shannon entropy of 57 virus sequences.
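For readers unfamiliar with k-mer Shannon entropy, the following sketch shows one common way to compute it for a single sequence. The use of overlapping k-mers, log base 2, and the skipping of k-mers with ambiguous bases are assumptions, as is the function name `kmer_entropy`; the record does not specify how the entropies in Figures 5 and 6 were obtained.

```python
from collections import Counter
from math import log2


def kmer_entropy(seq, k=7):
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # keep only k-mers made of unambiguous bases
    kmers = [m for m in kmers if set(m) <= set("ACGT")]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

With `k=1` this reduces to the entropy of the base composition, which is at most 2 bits for a four-letter alphabet; with `k=7` the maximum is 14 bits, reached only when all 4^7 possible 7-mers are equally frequent.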