Abdelmalek Hakmaoui1, Faisal Khan2, Abdelhamid Liacini3, Amanjot Kaur2, Yacine Berka4, Safaa Machraoui1, Hafid Soualhine5, Noureddine Berka6, Hanane Rais1, Brahim Admou1,7.
Abstract
Real-time genome monitoring of the SARS-CoV-2 pandemic outbreak is of utmost importance for designing diagnostic tools, guiding antiviral treatment and vaccination strategies. In this study, we present an accurate method for temporal and geographical comparison of mutational events based on GISAID database genome sequencing. Among 42523 SARS-CoV-2 genomes analyzed, we found 23202 variants compared to the reference genome. The Ti/Tv (transition/transversion) ratio was used to filter out possible false-positive errors. Transition mutations generally occurred more frequently than transversions. Our clustering analysis revealed remarkable hotspot mutation patterns for SARS-CoV-2. Mutations were clustered based on how their frequencies changed over time according to each geographical location. We observed some clusters showing a clear variation in mutation frequency and continuously evolving in the world. However, many mutations appeared in specific periods without a clear pattern over time. Various important nonsynonymous mutations were observed, mainly in Oceania and Asia. More than half of these mutations were observed only once. Four hotspot mutations were found in all geographical locations at least once: T265I (NSP2), P314L (NSP12), D614G (S), and Q57H (ORF3a). The current analysis of SARS-CoV-2 genomes provides valuable information on the geographical and temporal mutational evolution of SARS-CoV-2.
Year: 2021 PMID: 34258267 PMCID: PMC8241501 DOI: 10.1155/2021/5553173
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Distribution of the number of transition and transversion mutations and the Ti/Tv ratio according to cutoffs of variant frequencies (VF) using global data of 42523 SARS-CoV-2 genomes.
| Cutoff frequency | Transition | Transversion | Ti/Tv ratio |
|---|---|---|---|
| VF ≥ 1% | 42 (76%) | 13 (24%) | 3.2 |
| VF ≥ 0.1% | 832 (73%) | 303 (27%) | 2.7 |
| VF ≥ 0.01% | 5423 (63%) | 3225 (37%) | 1.7 |
| Total | 12822 (55%) | 10380 (45%) | 1.2 |
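The Ti/Tv filtering summarized in the table above can be illustrated with a minimal sketch. It assumes each variant is a single-nucleotide substitution recorded as `(ref, alt, frequency)`, with frequency expressed as a fraction of genomes. The function names and the toy variant list are hypothetical and are not taken from the study's pipeline.

```python
# Purines and pyrimidines: a transition swaps bases within one class,
# a transversion swaps a purine for a pyrimidine or vice versa.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return True for A<->G or C<->T substitutions, False for transversions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def titv_by_cutoff(variants, cutoffs):
    """Count transitions and transversions at or above each frequency cutoff.

    Returns {cutoff: (n_transitions, n_transversions, ratio)}.
    """
    rows = {}
    for cutoff in cutoffs:
        kept = [(ref, alt) for ref, alt, freq in variants if freq >= cutoff]
        ti = sum(is_transition(ref, alt) for ref, alt in kept)
        tv = len(kept) - ti
        rows[cutoff] = (ti, tv, ti / tv if tv else float("nan"))
    return rows


# Hypothetical toy data, for illustration only (not from the study).
toy_variants = [
    ("C", "T", 0.62),
    ("A", "G", 0.30),
    ("G", "T", 0.02),
    ("C", "T", 0.004),
    ("G", "C", 0.0005),
    ("T", "C", 0.0002),
]

# Cutoffs of 1%, 0.1%, and no cutoff, mirroring the table's layout.
toy_table = titv_by_cutoff(toy_variants, [0.01, 0.001, 0.0])
for cutoff, (ti, tv, ratio) in toy_table.items():
    print(f"VF >= {cutoff:.2%}: Ti={ti} Tv={tv} Ti/Tv={ratio:.2f}")
```

Applied to the counts reported in the table, the same division gives 42/13 ≈ 3.2, 832/303 ≈ 2.7, 5423/3225 ≈ 1.7, and 12822/10380 ≈ 1.2, in line with the listed ratios.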
Distribution of the number of genome samples (N) and the number of transition (Ti) and transversion (Tv) mutations and the Ti/Tv ratio for data from each location during different periods of virus collection. A chi-squared test was used to compare the distribution of mutation types by period for each location and to compare the mutation type by location for each period. NA: data not available or sample number under 20.
| Period of collection | January | February | March | April | May | June | p value |
|---|---|---|---|---|---|---|---|
| Africa | | | | | | | 0.9134 |
| N | NA | NA | 142 | 210 | 78 | 139 | |
| Ti, n (%) | NA | NA | 64 (76.19) | 80 (72.07) | 139 (73.54) | 67 (72.04) | |
| Tv, n (%) | NA | NA | 20 (23.81) | 31 (27.93) | 50 (26.46) | 26 (27.96) | |
| Ti/Tv | NA | NA | 3.2 | 2.58 | 2.78 | 2.58 | |
| Asia | | | | | | | 0.8918 |
| N | 379 | 551 | 1247 | 1025 | 685 | 418 | |
| Ti, n (%) | 30 (78.95) | 36 (72) | 38 (70.37) | 44 (69.94) | 56 (68.3) | 65 (73.03) | |
| Tv, n (%) | 8 (21.05) | 14 (28) | 16 (29.63) | 19 (30.16) | 26 (31.7) | 24 (26.97) | |
| Ti/Tv | 3.75 | 2.57 | 2.37 | 2.32 | 2.15 | 2.7 | |
| Europe | | | | | | | 0.1793 |
| N | 21 | 196 | 10025 | 10450 | 3003 | 631 | |
| Ti, n (%) | 86 (71.67) | 55 (84.62) | 37 (80.43) | 33 (75) | 42 (66.67) | 117 (70.9) | |
| Tv, n (%) | 34 (28.33) | 10 (15.38) | 9 (19.57) | 11 (25) | 21 (33.33) | 48 (29.02) | |
| Ti/Tv | 2.53 | 5.5 | 4.11 | 3 | 2 | 2.34 | |
| North America | | | | | | | 0.7178 |
| N | 20 | 106 | 5370 | 3858 | 1051 | 473 | |
| Ti, n (%) | 18 (75) | 91 (77.75) | 32 (74.42) | 42 (75) | 72 (76.6) | 61 (74.39) | |
| Tv, n (%) | 6 (25) | 26 (22.22) | 11 (25.58) | 14 (25) | 22 (23.4) | 21 (25.61) | |
| Ti/Tv | 3 | 3.5 | 2.90 | 3 | 3.27 | 2.90 | |
| Oceania | | | | | | | 0.2222 |
| N | NA | 32 | 1315 | 362 | 88 | 107 | |
| Ti, n (%) | NA | 26 (82.53) | 57 (75) | 61 (75.31) | 132 (72.13) | 185 (68.01) | |
| Tv, n (%) | NA | 6 (17.65) | 19 (25) | 20 (24.61) | 51 (27.87) | 87 (31.99) | |
| Ti/Tv | NA | 4.67 | 3 | 3.05 | 2.59 | 2.12 | |
| South America | | | | | | | 0.6071 |
| N | NA | NA | 289 | 207 | 23 | 22 | |
| Ti, n (%) | NA | NA | 39 (72.22) | 58 (80.56) | 53 (75.71) | 32 (82.1) | |
| Tv, n (%) | NA | NA | 15 (27.78) | 14 (19.44) | 17 (24.29) | 7 (17.9) | |
| Ti/Tv | NA | NA | 2.6 | 4.14 | 3.12 | 4.6 | |
| p value | 0.6654 | 0.4094 | 0.8990 | 0.77753 | 0.68839 | 0.52399 | |
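The chi-squared comparison described in the table caption can be sketched with the standard library alone. This is an illustrative reconstruction, not the authors' code: it assumes a Pearson chi-squared test of independence on the Ti and Tv counts, and the helper names `chi2_sf` and `chi2_independence` are hypothetical. The input is the Africa row (March to June) from the table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1.

    Uses the exact recurrence Q(x; k+2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2).
    """
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def chi2_independence(table):
    """Pearson chi-squared test on a contingency table given as a list of rows.

    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Africa, March to June: first row transitions, second row transversions.
africa = [
    [64, 80, 139, 67],
    [20, 31, 50, 26],
]
stat, df, p = chi2_independence(africa)
print(f"chi2={stat:.4f}, df={df}, p={p:.4f}")
```

Under these assumptions the Africa counts give a p value close to the 0.9134 listed for that location, which suggests the per-location test was run on the mutation counts across the periods with available data.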
Figure 1. Boxplot comparing the distribution of transition and transversion mutations with variant frequencies according to the geographical location and period of sample collection using the nonparametric Wilcoxon test. The x-axis represents the mutation substitution type, and the y-axis represents the log-transformed variant frequency.
Figure 2. A heatmap showing the dynamics of hotspot mutation frequencies according to geographical locations. The period of collection is shown as rows, and hotspot mutations are shown as columns. Red/blue coloration implies higher/lower mutation frequencies.
Figure 3. Lollipop plots showing the distribution of nonsynonymous hotspot mutations on the SARS-CoV-2 genome and the change of mutation frequencies over time within geographical locations. The presence of a mutation is shown on the x-axis (lollipop), and the frequency of mutations is shown on the y-axis vertical line. The period of genome collection is distinguished by color. Nucleotide coordinates are according to the SARS-CoV-2 reference genome. Amino acid positions are according to the mature peptides in the SARS-CoV-2 reference genome. Only nonsynonymous mutations with frequencies ≥ 10% are presented here.