Dania Haddad1, Sumi Elsa John1, Anwar Mohammad2, Maha M Hammad2, Prashantha Hebbar1, Arshad Channanath1, Rasheeba Nizam1, Sarah Al-Qabandi3, Ashraf Al Madhoun1, Abdullah Alshukry4, Hamad Ali1,5, Thangavel Alphonse Thanaraj1, Fahd Al-Mulla1.
Abstract
COVID-19 is challenging healthcare preparedness, world economies, and livelihoods. The infection and death rates associated with this pandemic vary strikingly among countries. To elucidate this discrepancy, we analyzed 2431 early-spread SARS-CoV-2 sequences from GISAID. We estimated continent-wise admixture proportions, estimated haplotype blocks, and tested for the presence or absence of recombination between strains. We identified 1010 unique missense mutations and seven different SARS-CoV-2 clusters. In samples from Asia, a small haplotype block was identified, whereas samples from Europe and North America harbored large and different haplotype blocks with nonsynonymous variants. Variant frequency and linkage disequilibrium varied among continents, especially in North America. Recombination between different strains was observed only in North American and European sequences. In addition, we structurally modelled the two most common mutations, Spike_D614G and Nsp12_P314L; the models suggested that these linked mutations may enhance viral entry and replication, respectively. Overall, we propose that genomic recombination between different strains may contribute to SARS-CoV-2 virulence and COVID-19 severity and may present additional challenges for current treatment regimens and countermeasures. Furthermore, our study provides a possible explanation for the substantial second wave of COVID-19, which presented with higher infection and death rates in many countries.
Year: 2021 PMID: 34033650 PMCID: PMC8148317 DOI: 10.1371/journal.pone.0251368
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 3. Identification of SARS-CoV-2 genetic clusters in different continents.
Illustration of the seven (C1 to C7, color-coded) genetic subdivisions of SARS-CoV-2 sequences across continents using variants with MAF ≥ 0.5%. Differential proportions of strong LD (C), weak LD (D), haplotype block (E), nonsynonymous (F), and synonymous (G) variants across continental datasets are shown.
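The strong and weak LD classes in this figure rest on a pairwise linkage disequilibrium statistic between variant sites. The following is a minimal sketch, assuming haploid viral genomes, biallelic sites, and the commonly used r² measure. The source does not state which LD statistic or software produced the figure, and the function name and toy data are hypothetical.

```python
from typing import Sequence


def ld_r2(site_a: Sequence[str], site_b: Sequence[str],
          alt_a: str, alt_b: str) -> float:
    """Return r^2 between two biallelic sites across haploid genomes.

    site_a[i] and site_b[i] are the alleles carried by genome i.
    alt_a and alt_b are the alternate alleles whose frequencies are tracked.
    """
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and of equal length")
    n = len(site_a)
    # Marginal frequencies of the alternate allele at each site.
    p_a = sum(a == alt_a for a in site_a) / n
    p_b = sum(b == alt_b for b in site_b) / n
    # Frequency of genomes carrying both alternate alleles.
    p_ab = sum(a == alt_a and b == alt_b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # coefficient of disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic site: r^2 is undefined, reported as 0 here
    return d * d / denom


# Toy data (hypothetical): alleles at two sites in six genomes.
site_1 = list("CCTTCT")
site_2 = list("AAGGAG")
print(ld_r2(site_1, site_2, alt_a="T", alt_b="G"))
```

In the toy data the two sites always change together, so r² is 1; sites that vary independently give values near 0.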
Fig 6. 3D modelling of SARS-CoV-2 Spike protein.
(A) Trimeric structure of the SARS-CoV-S spike-like protein (PDB ID: 6VSB). (B) Overlay of the SARS-CoV-S spike-like protein (PDB ID: 6VSB, blue) with the modelled SARS-CoV-2 S protein (PDB ID: 6M71, magenta). (C) Surface of the modelled S protein with the RRAR furin cleavage site (blue).
Fig 8. 3D modelling of G614 mutation.
(A) S protein monomer (PDB ID: 6VSB) with the D614G mutation; the red region depicts the part of the protein made more flexible by the D614G mutation, with a decrease in stability (ΔΔG: -0.086 kcal/mol) and an increase in vibrational entropy (ΔΔS_Vib: 0.137 kcal·mol⁻¹·K⁻¹). (B) A zoomed-in structure of the N-terminal domain (NTD) and the G614 mutation in close vicinity to the RRAR furin cleavage site.
Fig 10. 3D modelling of P323L mutation.
(A) Suggested bonding network of P323, where the COO- group might form H-bonds with the backbone NH groups of T324 and S325 and with the side chain of S325. The grey dashed lines depict the hydrophobic interactions between P323 and W268 and F275. (B) The mutated L323 forms an H-bond with the side chain of S325 and a hydrophobic interaction with L270, which lies at the curve of the loop, making that region more compact.
Evidence of recombination in sequences from the continental datasets, and the regions showing evidence of recombination as identified by the ‘Profile’ program.
| Dataset | Number of informative variants | Regions showing significant (p<0.05) evidence of recombination | Tests to detect evidence of recombination | Significance of observed statistics (P-value) |
|---|---|---|---|---|
| Africa (n = 25) | 19 | 18800–18900, 22450–23425, 29300–29325 | NSS | 1 |
| | | | MaxChi2 | 0.978 |
| | | | Phi (permutation) | 1 |
| | | | Phi (normal) | 1 |
| Asia (n = 364) | 127 | 4600–4825, 5200–5825, 6425–6500, 6900–7375, 8800–8925, 9175–9750, 9950–10150, 21000–21050, 22650–23625 | NSS | 0.113 |
| | | | MaxChi2 | 0 |
| | | | Phi (permutation) | 0.493 |
| | | | Phi (normal) | 0.305 |
| Europe (n = 1132) | 276 | 500–700, 2100–2125, 3300–3325, 4050–4725, 5150–5450, 7825–7850, 12925–14075, 18825–18850, 19375–19400, 20075–20475, 22975–23000, 23575–23625, 25275–25325, 26275–26325, 29200–29325 | NSS | 0.001 |
| | | | MaxChi2 | 0.208 |
| | | | Phi (permutation) | 0.343 |
| | | | Phi (normal) | 0.223 |
| North America (n = 738) | 194 | 800–1100, 5575–6300, 6475–6525, 7300–7450, 18800–18875, 19950–19975, 21525–22625, 23500–23600, 24275–24675, 24975–25000, 25725–26050, 26375–26425, 27600–27800, 29300–29325 | NSS | 0.007 |
| | | | MaxChi2 | 0.061 |
| | | | Phi (permutation) | 0.060 |
| | | | Phi (normal) | 0.042 |
| South America (n = 24) | 24 | None | NSS | 0.596 |
| | | | MaxChi2 | 0.680 |
| | | | Phi (permutation) | 0.717 |
| | | | Phi (normal) | 0.454 |
| Oceania (n = 69) | 50 | 22625–23425 | NSS | 1 |
| | | | MaxChi2 | 0.502 |
| | | | Phi (permutation) | 0.872 |
| | | | Phi (normal) | 0.361 |
| Combined (n = 2352) | 554 | 1625–1650, 3300–3525, 3975–4225, 4625–4725, 5175–5375, 7550–7800, 8800–8925, 9175–9750, 9925–10050, 10600–12125, 13025–14075, 18175–18550, 18850–19150, 20075–20475, 23000–23025, 23475–23525, 24700–24750, 27050–28225 | | |
Results of the NSS, MaxChi2, Phi (permutation), and Phi (normal) tests using the pairwise homoplasy index test available from the PhiPack software on the combined dataset of all 2352 samples. Significant P-values suggest the possibility of coinfection on a global level. The European (NSS test, P-value of 0.001) and North American (NSS and Phi (normal) tests, P-values of 0.007 and 0.042, respectively) datasets show evidence of recombination events, whereas the African, Oceanian, South American, and Asian datasets show no recombination in the early spread of SARS-CoV-2 on the respective continents.
Plausible recombination events validated by RDP4 suite.
| Continental datasets | Event | Start | End | RDP | GENECONV | MaxChi2 | Chimera | 3Seq |
|---|---|---|---|---|---|---|---|---|
| North American | 1~ | 530 | 29326 | NS | NS | 3.84E-03 | NS | 1.14E-04 |
| | 2~ | 226 | 29535 | NS | NS | 1.05E-02 | NS | 2.54E-04 |
| | 3~ | 175 | 26046 | NS | NS | 1.02E-02 | NS | 4.83E-03 |
| European | 1~ | 820 | 24825 | NS | NS | 9.72E-05 | 2.12E-03 | 4.47E-04 |
| Combined | 1~ | 1519 | 29004 | NS | NS | 1.09E-03 | NS | 1.07E-04 |
| | 2~ | 29575 | 29835 | NS | 4.98E-05 | NS | NS | 2.56E-02 |
Analysis performed using RDP, GENECONV, MaxChi2, Chimera, and 3Seq algorithms.
NS = No significant p-value was observed for the recombination event using the respective method.
* = The actual breakpoint position is undetermined (it was most likely overprinted by a subsequent recombination event).
~ = It is possible that this apparent recombination signal could have been caused by an evolutionary process other than recombination.
MAF distribution, in each continental dataset, of the 18 variants in the haplotype block of the combined dataset.
| SNV | MAF (Africa) | MAF (Asia) | MAF (Europe) | MAF (North America) | MAF (South America) | MAF (Oceania) | MAF (Combined) | Functional consequence | Gene |
|---|---|---|---|---|---|---|---|---|---|
| 241CT | 0.08 | 0.0811 | 0.2507 | 0.3231 | 0.4583 | 0.1905 | 0.49 | downstream | 5’-UTR |
| 1059CT | 0.24 | 0.019 | 0.1435 | 0.1865 | 0 | 0.0289 | 0.1313 | nonsynonymous | ORF1a |
| 3037CT | 0.08 | 0.0760 | 0.2502 | 0.313 | 0.4583 | 0.1884 | 0.4804 | synonymous | ORF1a |
| 8782CT | 0.04 | 0.2304 | 0.0424 | 0.4197 | 0.2917 | 0.1884 | 0.2484 | synonymous | ORF1a |
| 11083GT | 0.04 | 0.269 | 0.1339 | 0.0531 | 0.125 | 0.4928 | 0.1416 | nonsynonymous | ORF1a |
| 14408CT | 0.08 | 0.0760 | 0.25 | 0.3148 | 0.4583 | 0.1884 | 0.481 | nonsynonymous | ORF1b |
| 14805CT | 0.08 | 0.0047 | 0.1366 | 0.0224 | 0.3333 | 0.1159 | 0.0787 | synonymous | ORF1b |
| 17747CT | 0 | 0 | 0.0106 | 0.4616 | 0 | 0.0579 | 0.174 | nonsynonymous | ORF1b |
| 17858AG | 0 | 0 | 0.0097 | 0.4418 | 0 | 0.0579 | 0.1798 | nonsynonymous | ORF1b |
| 18060CT | 0 | 0.0166 | 0.0088 | 0.4382 | 0 | 0.0579 | 0.1829 | synonymous | ORF1b |
| 23403AG | 0.08 | 0.0783 | 0.25 | 0.3121 | 0.4583 | 0.1884 | 0.4808 | nonsynonymous | S |
| 25563GT | 0.32 | 0.0213 | 0.1851 | 0.2262 | 0 | 0.0434 | 0.1647 | nonsynonymous | ORF3a |
| 26144GT | 0.04 | 0.1119 | 0.1293 | 0.0211 | 0.125 | 0.1594 | 0.0923 | nonsynonymous | ORF3a |
| 27046CT | 0 | 0 | 0.1114 | 0.0013 | 0.125 | 0.0144 | 0.0538 | nonsynonymous | M |
| 28144TC | 0.04 | 0.2304 | 0.0407 | 0.4185 | 0.2917 | 0.1884 | 0.2483 | nonsynonymous | N |
| 28881GA | 0 | 0.0381 | 0.2396 | 0.0423 | 0.375 | 0.0869 | 0.1378 | nonsynonymous | N |
| 28882GA | 0 | 0.0381 | 0.2396 | 0.0410 | 0.375 | 0.0869 | 0.1374 | synonymous | N |
| 28883GC | 0 | 0.0381 | 0.2396 | 0.0410 | 0.375 | 0.0869 | 0.1374 | nonsynonymous | N |
Minor allele frequency (MAF) of each variant in each continent, the functional consequence of each variant, and the corresponding gene. (SNV: single nucleotide variant.)
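For readers who want to reproduce values of this kind, the following is a minimal sketch of a per-site minor allele frequency calculation. It assumes one allele call per genome with ambiguous bases already removed, and it takes the minor allele to be the second most common allele at the site. The function name and toy data are hypothetical, and the source does not describe the exact tooling used.

```python
from collections import Counter
from typing import Iterable


def minor_allele_frequency(alleles: Iterable[str]) -> float:
    """Frequency of the second most common allele at one site."""
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles supplied")
    if len(counts) < 2:
        return 0.0  # monomorphic site: no minor allele
    # most_common(2) returns [(major, n_major), (minor, n_minor)].
    minor_count = counts.most_common(2)[1][1]
    return minor_count / total


# 0.5%, the MAF cut-off quoted in the Fig 3 caption.
MAF_THRESHOLD = 0.005

# Toy site (hypothetical): nine genomes carry C, three carry T.
toy_site = ["C"] * 9 + ["T"] * 3
maf = minor_allele_frequency(toy_site)
print(maf, maf >= MAF_THRESHOLD)
```

Running the same function on each continental subset of genomes, site by site, would give one column of the table per subset.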
Characteristics of haplotype blocks estimated from three continental datasets.
| Dataset | Haplotype block start | Haplotype block end | Length (in kb) | Number of variants | Number of nonsynonymous variants | Variant | MAF |
|---|---|---|---|---|---|---|---|
| Asia | 3037 | 23403 | 20.367 | 5 | 3 | 3037CT | 0.076 |
| | | | | | | 8782CT | 0.23 |
| Europe | 241 | 28883 | 28.643 | 17 | 10 | 241CT | 0.25 |
| | | | | | | 3037CT | 0.25 |
| | | | | | | 14805CT | 0.136 |
| | | | | | | 15324CT | 0.062 |
| | | | | | | 17247TC | 0.0689 |
| | | | | | | 20268AG | 0.0734 |
| | | | | | | 28882GA | 0.239 |
| North America | 241 | 8782 | 8.54 | 4 | 1 | 241CT | 0.323 |
| | | | | | | 3037CT | 0.313 |
| | | | | | | 8782CT | 0.419 |
| North America | 14408 | 28144 | 13.737 | 7 | 6 | | |
| | | | | | | 18060CT | 0.438 |
Characteristics of haplotype blocks estimated from Asian, European, and North American datasets. Nonsynonymous variants are shown with bold font.
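The tabulated block lengths are consistent with counting genome positions inclusively, that is, length = end − start + 1, expressed in kb. A small sketch of that arithmetic follows; the function name is hypothetical, and the inclusive convention is inferred from the tabulated numbers rather than stated in the source.

```python
def block_length_kb(start: int, end: int) -> float:
    """Length in kb of a block spanning positions start..end inclusive."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return (end - start + 1) / 1000


# Start and end coordinates of the four blocks listed in the table.
blocks = [(3037, 23403), (241, 28883), (241, 8782), (14408, 28144)]
for start, end in blocks:
    print(start, end, block_length_kb(start, end))
```

The four results are 20.367, 28.643, 8.542, and 13.737 kb; the table reports the third as 8.54.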