Jacobo Pardo-Seco1,2, Alberto Gómez-Carballa1,2, Xabier Bello1,2, Federico Martinón-Torres2,3, Antonio Salas1,4.
Abstract
Analysis of SARS-CoV-2 genome variation using a minimal number of selected informative sites that form a genetic barcode presents several drawbacks. We show that purely mathematical procedures for site selection should be supervised by known phylogeny (i) to ensure that solid tree branches are represented instead of mutational hotspots with poor phylogeographic properties, and (ii) to avoid phylogenetic redundancy. We propose a procedure that prevents information redundancy in site selection by considering the cumulative informativeness of previously selected sites (as a proxy for phylogenetic-based criteria). This procedure demonstrates that, for short barcodes (e.g., 11 sites), there are thousands of informative site combinations that improve on previous proposals. We also show that barcodes based on worldwide databases inevitably prioritize variants located at the basal nodes of the phylogeny, such that most representative genomes in these ancestral nodes are no longer in circulation. Consequently, coronavirus phylodynamics cannot be properly captured by universal genomic barcodes because most SARS-CoV-2 variation is generated in geographically restricted areas by the continuous introduction of domestic variants.
Keywords: Barcode; COVID-19; Informative subtype markers; Phylodynamics; Phylogeny; SARS-COV-2
Year: 2021 PMID: 33410308 PMCID: PMC7840454 DOI: 10.24272/j.issn.2095-8137.2020.364
Source DB: PubMed Journal: Zool Res ISSN: 2095-8137
Figure 1. Skeleton of the SARS-CoV-2 phylogeny based on ISM signatures, interpolated frequency maps of haplogroup sub-lineages with differential geographic distributions, and comparative entropy values for ISM signatures obtained using different strategies
ISMs selected using the HE procedure described in the present study and the 20-ISM signature reported by Zhao et al. (2020)
| | 90 K database – | | | | | | 90 K database – | | | | | |
| | All database | | Before 18 June 2020 | | After 17 June 2020 | | All database | | Before 18 June 2020 | | After 17 June 2020 | |
| | Site | Entropy | Site | Entropy | Site | Entropy | Site | Entropy | Site | Entropy | Site | Entropy |
| #1 | | 0.93 | | 0.86 | | 0.99 | | 0.93 | | 0.86 | | 0.99 |
| #2 | | 1.58 | | 1.58 | | 1.58 | | 1.58 | | 1.58 | | 1.58 |
| #3 | | 2.06 | | 2.07 | | 1.97 | | 2.06 | | 2.07 | | 1.97 |
| #4 | | 2.37 | | 2.41 | 1163 | 2.35 | | 2.37 | | 2.41 | | 2.24 |
| #5 | 1163 | 2.61 | | 2.64 | | 2.61 | | 2.59 | | 2.64 | | 2.45 |
| #6 | | 2.83 | | 2.84 | 28854 | 2.83 | | 2.78 | | 2.84 | | 2.64 |
| #7 | | 3.02 | | 3.03 | | 3.03 | | 2.93 | | 3.03 | | 2.75 |
| #8 | | 3.17 | | 3.21 | 19839 | 3.21 | | 3.06 | | 3.21 | | 2.82 |
| #9 | 23731 | 3.31 | 15324 | 3.33 | 23731 | 3.37 | 18060 | 3.12 | 17747 | 3.30 | 14408 | 2.87 |
| #10 | 28854 | 3.45 | 27964 | 3.44 | | 3.52 | 14408 | 3.18 | 2558 | 3.36 | 18060 | 2.92 |
| #11 | 19839 | 3.58 | 10097 | 3.54 | 27964 | 3.65 | 2558 | 3.22 | 3037 | 3.42 | 23403 | 2.95 |
| #12 | | 3.70 | 28854 | 3.64 | 313 | 3.77 | 23403 | 3.25 | 26144 | 3.45 | 2558 | 2.99 |
| #13 | 27964 | 3.83 | 27046 | 3.73 | | 3.88 | 3037 | 3.28 | 14408 | 3.48 | 3037 | 3.01 |
| #14 | 15324 | 3.93 | 17747 | 3.81 | 11916 | 3.98 | 26144 | 3.31 | 28144 | 3.50 | 17747 | 3.03 |
| #15 | 313 | 4.03 | 25429 | 3.89 | 15324 | 4.07 | 17747 | 3.32 | 18060 | 3.52 | 26144 | 3.04 |
| #16 | 11916 | 4.12 | 11916 | 3.97 | 22480 | 4.15 | 28144 | 3.33 | 23403 | 3.54 | 28882 | 3.05 |
| #17 | 18877 | 4.19 | 313 | 4.04 | | 4.22 | 28882 | 3.34 | 2480 | 3.54 | 2480 | 3.05 |
| #18 | 25429 | 4.26 | 29553 | 4.11 | 21575 | 4.29 | 2480 | 3.35 | 28882 | 3.55 | 28144 | 3.05 |
| #19 | 18060 | 4.32 | 19839 | 4.18 | 18877 | 4.35 | 17858 | 3.35 | 17858 | 3.55 | 17858 | 3.06 |
| #20 | 21575 | 4.38 | 18877 | 4.24 | 13862 | 4.41 | 28883 | 3.35 | 28883 | 3.56 | 28883 | 3.06 |
Sites common in all columns are in bold. Database used by
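The idea of selecting each new site according to the cumulative informativeness of the sites already chosen, as described in the abstract, can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes that informativeness is measured as the joint Shannon entropy (in bits) of the haplotypes defined by the selected sites; the function names `joint_entropy` and `greedy_barcode` and the toy alignment are hypothetical and serve only as an illustration.

```python
from collections import Counter
from math import log2


def joint_entropy(seqs, sites):
    """Shannon entropy (bits) of the haplotypes formed by the given sites."""
    counts = Counter(tuple(s[i] for i in sites) for s in seqs)
    n = len(seqs)
    return -sum(c / n * log2(c / n) for c in counts.values())


def greedy_barcode(seqs, k):
    """Choose up to k sites, each one maximizing the cumulative joint entropy.

    Assumption: aligned sequences of equal length; ties are broken in
    favour of the lowest site index.
    """
    chosen, trace = [], []
    candidates = set(range(len(seqs[0])))
    for _ in range(min(k, len(candidates))):
        best = max(sorted(candidates),
                   key=lambda i: joint_entropy(seqs, chosen + [i]))
        chosen.append(best)
        candidates.remove(best)
        trace.append(joint_entropy(seqs, chosen))
    return chosen, trace


# Toy alignment (hypothetical data): sites 0 and 1 carry the same split,
# so they are redundant; site 2 is less variable but adds new information.
toy = ["AAC", "AAC", "TTC", "TTG"]
sites, trace = greedy_barcode(toy, 2)
print(sites, trace)
```

In this toy case, ranking sites by their individual entropy would pick sites 0 and 1, which split the sequences identically. The cumulative criterion skips the redundant site 1 and takes site 2 instead, which is the kind of redundancy the abstract argues should be avoided.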