Shu-Miaw Chaw, Jui-Hung Tai, Shi-Lun Chen, Chia-Hung Hsieh, Sui-Yuan Chang, Shiou-Hwei Yeh, Wei-Shiung Yang, Pei-Jer Chen, Hurng-Yi Wang.
Abstract
BACKGROUND: SARS-CoV-2 began spreading in December 2019 and has since become a pandemic that has impacted many aspects of human society. Several issues concerning the origin, time of introduction to humans, evolutionary patterns, and underlying force driving the SARS-CoV-2 outbreak remain unclear.
Keywords: Coronavirus; Mutational bias; Population genetics; Positive selection
Year: 2020 PMID: 32507105 PMCID: PMC7276232 DOI: 10.1186/s12929-020-00665-8
Source DB: PubMed Journal: J Biomed Sci ISSN: 1021-7770 Impact factor: 8.410
Pairwise comparison of nonsynonymous (dN; before the slash) and synonymous (dS; after the slash) divergence between SARS-CoV-2, RaTG13, and Pangolin_2019 for different coding regions
| Region | Length (aa) | SARS-CoV-2 vs RaTG13 | SARS-CoV-2 vs Pangolin_2019 | RaTG13 vs Pangolin_2019 |
|---|---|---|---|---|
| All | 9555 | 0.007/0.168 | 0.024/0.469 | 0.025/0.467 |
| | | (0.042) | (0.051) | (0.054) |
| | 4330 | 0.008/0.166 | 0.024/0.472 | 0.023/0.472 |
| | | (0.048) | (0.051) | (0.049) |
| | 2692 | 0.003/0.126 | 0.008/0.505 | 0.010/0.515 |
| | | (0.024) | (0.016) | (0.019) |
| | 1219 | 0.013/0.313 | 0.068/0.651 | 0.073/0.680 |
| | | (0.040) | (0.104) | (0.107) |
| RBD of spike | 219 | 0.055/0.511 | 0.058/0.863 | |
| | | (0.107) | (0.068) | |
| | 274 | 0.009/0.156 | 0.019/0.285 | 0.019/0.261 |
| | | (0.060) | (0.066) | (0.072) |
| | 75 | 0/0.018 | 0/0.037 | 0/0.018 |
| | | (0) | (0) | (0) |
| | 221 | 0.004/0.186 | 0.010/0.299 | 0.006/0.317 |
| | | (0.021) | (0.033) | (0.019) |
| | 60 | 0/0.099 | 0.014/0.220 | 0.014/0.345 |
| | | (0) | (0.062) | (0.040) |
| | 121 | 0.011/0.177 | 0.018/0.275 | 0.029/0.329 |
| | | (0.061) | (0.066) | (0.088) |
| | 121 | 0.032/0.303 | 0.025/0.362 | 0.017/0.391 |
| | | (0.105) | (0.069) | (0.042) |
| | 415 | 0.005/0.124 | 0.011/0.145 | 0.010/0.125 |
| | | (0.042) | (0.076) | (0.080) |
Numbers in parentheses are dN/dS ratios throughout this table
RBD: Receptor binding domain of spike
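The ratios in parentheses can be recomputed from the dN/dS pairs printed in each cell. The following is a minimal sketch, assuming each cell is a string of the form `dN/dS`; the helper name `dn_ds_ratio` is hypothetical, and returning `None` for a zero dS is a choice made here, not something the source specifies.

```python
from typing import Optional


def dn_ds_ratio(cell: str) -> Optional[float]:
    """Parse a 'dN/dS' table cell and return dN divided by dS.

    Returns None when dS is zero, because the ratio is undefined there
    (an assumption of this sketch, not taken from the source).
    """
    dn_text, ds_text = cell.split("/")
    dn, ds = float(dn_text), float(ds_text)
    if ds == 0:
        return None
    return dn / ds


# Cell copied from the "All" row, SARS-CoV-2 vs RaTG13.
print(round(dn_ds_ratio("0.007/0.168"), 3))
```

Because the printed dN and dS values are rounded to three decimals, a recomputed ratio can differ slightly from the printed one in some cells.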
Fig. 1 Frequency spectra of SARS-CoV-2. The mutation frequency in 137 SARS-CoV-2 genomes is depicted on the x axis, and the y axis shows the number of sites in which mutations occurred. a The derived nucleotides were inferred by referencing SARS-CoV-2 genomes to the RaTG13 genome. b The direction of changes was cross-referenced with the haplotype network in Fig. 2
Fig. 2 A haplotype network of sampled SARS-CoV-2 genomes. The haplotype network was constructed by the median joining algorithm. Circle areas are proportional to the number of sequences. Numbers along the branches are mutation steps between haplotypes. Mutation types are given on the branches. Mutations that are involved in different evolutionary pathways or that occurred more than once are enclosed. Also see Table 2 for comparison. Seven genomes (EPI_ISL_404253, 407079, 408511, 408512, 408487, 410480, and 408483) were excluded from this analysis because their sequences contained too many ‘N’ notations
Non-singleton mutations detected across the sampled SARS-CoV-2 genomes
| Label | Genome position | Gene | RaTG13 | Pangolin_2017 | Pangolin_2019 | Major allele | Minor allele | Amount of change (I) | Amount of change (II) | Mutation |
|---|---|---|---|---|---|---|---|---|---|---|
| Nonsynonymous | | | | | | | | | | |
| A | 614 | | G | G | G | G | A | 2 | | H116Q |
| B | 1190 | | C | C | C | C | T | 3 | | P308S |
| C | 5084 | | A | A | A | A | G | 2 | | A1606T |
| D | 9438 | | C | C | C | C | T | 3 | | T3058I |
| E | 11,083 | | G | T | G | G | T | 9 | | L3606F |
| F | 18,488 | | T | T | T | T | C | 2 | | I6074V |
| G | 21,707 | | C | C | N/A | C | T | 5 | | H48Y |
| H | 22,661 | | G | G | G | G | T | 5 | | V366F |
| I | 26,144 | | G | G | G | G | T | 18 | | G251V |
| J | 27,147 | | G | G | G | G | C | 2 | | I208T |
| K | 28,077 | | G | G | G | G | C | 4 | | V61L |
| L | 28,144 | | C | C | C | T | C | 99 | 38 | L84S |
| M | 28,854 | | C | C | C | C | T | 5 | | S194L |
| N | 28,878 | | G | G | G | G | A | 6 | | S202N |
| O | 29,019 | | A | A | A | A | T | 2 | | D249H |
| P | 29,303 | | C | C | C | C | T | 2 | | K343I |
| Synonymous | | | | | | | | | | |
| α | 2662 | | C | T | T | C | T | 3 | | C2397T |
| β | 8782 | | T | T | T | C | T | 100 | 37 | C8517T |
| γ | 10,138 | | T | T | T | C | T | 134 | 3 | C9873T |
| δ | 15,324 | | C | C | C | C | T | 2 | | C15059T |
| ϵ | 17,373 | | T | C | T | C | T | 132 | 5 | C17108T |
| ζ | 18,060 | | T | T | A | C | T | 131 | 6 | C17795T |
| η | 18,603 | | T | T | C | T | A | 2 | | T18338C |
| θ | 23,569 | | A | C | A | T | C | 2 | | T2007C |
| ι | 23,605 | | N/A | N/A | N/A | T | G | 2 | | T2043G |
| κ | 24,034 | | T | C | C | C | T | 131 | 6 | C2472T |
| λ | 24,325 | | A | A | A | A | G | 2 | | A2763G |
| μ | 26,729 | | T | T | T | T | C | 4 | | T207C |
| ν | 29,095 | | T | T | T | C | T | 125 | 12 | C822T |
I: Number of changes was inferred by outgroup comparison only
II: Number of changes was cross-referenced with the haplotype network of SARS-CoV-2; only numbers that differ from method I are shown
E: envelope; M: matrix; N: nucleocapsid; S: spike
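Method I in the footnotes polarizes each site by outgroup comparison: the allele that matches the outgroup is treated as ancestral and the other allele as derived. The sketch below illustrates that idea under several assumptions that are not stated in the source: RaTG13 is used as the single outgroup, all 137 genomes from Fig. 1 are called at the site, and the value in column II for row L is the minor-allele count. The helper `derived_by_outgroup` is hypothetical and is not the authors' code.

```python
from typing import Optional, Tuple


def derived_by_outgroup(
    outgroup: str, major: str, minor: str, minor_count: int, n_genomes: int
) -> Optional[Tuple[str, int]]:
    """Polarize a biallelic site against an outgroup base.

    The allele that matches the outgroup is taken as ancestral; the other
    allele is derived. Returns (derived_allele, derived_count), or None when
    the outgroup matches neither allele (for example a missing base).
    """
    if outgroup == major:
        return minor, minor_count
    if outgroup == minor:
        return major, n_genomes - minor_count
    return None


# Row L (position 28,144): RaTG13 = C, major allele T, minor allele C (38 genomes).
print(derived_by_outgroup("C", "T", "C", 38, 137))
```

For row L this treats T as derived with a count of 99, which agrees with column I, while the haplotype network (column II) instead assigns 38 changes to the site. Sites where the outgroup matches neither allele are left unpolarized in this sketch.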
Fig. 3 Mutation frequency of 84S in orf8 and 251V in orf3 proteins. Numbers in parentheses are cumulative number of sequences on the indicated day. The dashed line indicates the date of the Wuhan lockdown
Comparison of dN, dS, and dN/dS estimates in the coding regions of SARS-CoV-2, excluding singletons, between two episodes
| Gene | Episode I | | Episode II | | Episode I + II (2019/12/24–2020/2/23) | |
|---|---|---|---|---|---|---|
| | dN × 10⁴ | dS × 10⁴ | dN × 10⁴ | dS × 10⁴ | dN × 10⁴ | dS × 10⁴ |
| | dN/dS | | dN/dS | | dN/dS | |
| All | 0.34 | 1.70 | 0.78 | 1.98 | 0.61 | 1.87 |
| | 0.20 | | 0.39 | | 0.32 | |
| | 0.10 | 1.46 | 0.37 | 2.15 | 0.26 | 1.85 |
| | 0.07 | | 0.17 | | 0.14 | |
| | 0.06 | 0.81 | 0.08 | 1.60 | 0.07 | 1.29 |
| | 0.07 | | 0.05 | | 0.05 | |
| | 0.23 | 2.49 | 0.64 | 1.82 | 0.48 | 2.10 |
| | 0.09 | | 0.35 | | 0.23 | |
| | 0.00 | 0.00 | | | | |
| | 0.00 | | | | | |
| | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.00 | | 0.00 | | 0.00 | |
| | 0.00 | 4.60 | 0.97 | 3.37 | 0.57 | 3.86 |
| | 0.00 | | 0.29 | | 0.15 | |
| | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.00 | | 0.00 | | 0.00 | |
| | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.00 | | 0.00 | | 0.00 | |
| | 1.16 | 7.31 | 2.98 | 4.26 | 2.25 | 5.61 |
| | 0.16 | | 0.70 | | 0.40 | |
*No synonymous mutation in this region was observed. The genome-wide dS value was used here. Because the sequence EPI_ISL_411929 from South Korea lacked a sampling date, it was excluded from this analysis
Fig. 4 The epidemic growth curve for the SARS-CoV-2 outbreak. The three lines are the median (blue line) and 95% HPD intervals (gray lines) of the Bayesian skyline plot (m = 5). The vertical dashed line indicates the date of the Wuhan lockdown
Estimated nucleotide diversity of SARS-CoV-2 across geographic regions
| Sample origin | Sample size | S | θ × 10⁻⁴ | π × 10⁻⁴ |
|---|---|---|---|---|
| Total | 137 | 223 | 13.92 | 1.81 |
| China | 64 | 157 | 11.38 | 2.10 |
| Wuhan | 24 | 41 | 3.76 | 1.16 |
| Rest of China | 40 | 119 | 9.59 | 2.62 |
| Rest of the World | 73 | 81 | 5.71 | 1.52 |
| USA | 17 | 28 | 2.84 | 1.71 |
| Rest of the World excluding USA | 56 | 62 | 4.63 | 1.43 |
S: Number of segregating sites
θ: Nucleotide diversity based on Watterson [29]
π: Nucleotide diversity based on Nei and Li [30]
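The two diversity measures in this table can be illustrated with a short sketch. It assumes the standard per-site forms: Watterson's θ as S divided by the harmonic number a_n = Σ 1/i (i = 1 … n−1) and by the alignment length, and π as the mean proportion of differing sites over all sequence pairs. The function names, the `seq_length` parameter, and the toy alignment are hypothetical; the alignment length used in the study is not given in this record, and gaps or ambiguous bases are not handled here.

```python
from itertools import combinations
from typing import Sequence


def watterson_theta(segregating_sites: int, sample_size: int, seq_length: int) -> float:
    """Per-site Watterson estimator: S / (a_n * L), with a_n = sum of 1/i for i < n."""
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return segregating_sites / (a_n * seq_length)


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Per-site pi: mean proportion of differing sites over all sequence pairs.

    Assumes equal-length, already aligned sequences.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment (not real data): three sequences, one segregating site.
toy = ["ACGT", "ACGA", "ACGT"]
print(watterson_theta(segregating_sites=1, sample_size=3, seq_length=4))
print(nucleotide_diversity(toy))
```

Both estimators divide by the sequence length, so the values are per site and can be multiplied by 10⁴ to match the scale of the table's column headers.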