James E San1, Sinaye Ngcapu2,3, Aquillah M Kanzi1, Houriiyah Tegally1, Vagner Fonseca1, Jennifer Giandhari1, Eduan Wilkinson1, Chase W Nelson4,5, Werner Smidt1,6, Anmol M Kiran7,8, Benjamin Chimukangara1, Sureshnee Pillay1, Lavanya Singh1, Maryam Fish1, Inbal Gazy1, Darren P Martin9, Khulekani Khanyile1, Richard Lessells1, Tulio de Oliveira1,10.
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes acute, highly transmissible respiratory infection in humans and a wide range of animal species. Its rapid global spread has resulted in a major public health emergency, necessitating commensurately rapid research to improve control strategies. In particular, the ability to effectively retrace transmission chains in outbreaks remains a major challenge, partly due to our limited understanding of the virus' underlying evolutionary dynamics within and between hosts. We used high-throughput sequencing whole-genome data coupled with bottleneck analysis to retrace the pathways of viral transmission in two nosocomial outbreaks that were previously characterised by epidemiological and phylogenetic methods. Additionally, we assessed the mutational landscape, selection pressures, and diversity at the within-host level for both outbreaks. Our findings show evidence of within-host selection and transmission of variants between samples. Both bottleneck and diversity analyses highlight within-host and consensus-level variants shared by putative source-recipient pairs in both outbreaks, suggesting that certain within-host variants in these outbreaks may have been transmitted upon infection rather than arising de novo independently within multiple hosts. Overall, our findings demonstrate the utility of combining within-host diversity and bottleneck estimations for elucidating transmission events in SARS-CoV-2 outbreaks, provide insight into the maintenance of viral genetic diversity, provide a list of candidate targets of positive selection for further investigation, and demonstrate that within-host variants can be transferred between patients. Together these results will help in developing strategies to understand the nature of transmission events and curtail the spread of SARS-CoV-2.
Keywords: NGS whole-genome sequencing; SARS-CoV-2; South Africa; bottleneck; nonsynonymous; selection; transmission dynamics; within-host variants
Year: 2021 PMID: 34035952 PMCID: PMC8135343 DOI: 10.1093/ve/veab041
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
Figure 1. Phylogenetic analysis of the two outbreaks showing the clustering of sequences across hospital departments and the Pangolin lineages to which the sequences belong. (A) Phylogeny of samples from the CH1 outbreak. (B) Phylogeny of samples from the CH3 outbreak. The inset of each phylogeny is the TempEst plot showing the clock-like signal. Sample clustering was not consistent with the epidemiological settings in CH3.
Figure 2. Overview of the general diversity of SARS-CoV-2 genomes from South African patients. (A) Nucleotide changes in SARS-CoV-2 genomes. (B) Distribution of variant frequencies across nucleotide changes. (C) Regression plot showing the correlation between the frequencies of mutations in the two replicates. Outliers colored in red are variants that occurred in only one replicate, or at very low frequency (<5%) in the second replicate, and were therefore filtered out. (D) The upset plot shows the distribution of iSNVs and SNVs across the outbreaks. The vertical bar chart shows the size of each intersection, and the black dots and lines show the combination of iSNVs and SNVs. The horizontal bars show the unconditional frequency count of variants within each group. (E) Sequence variability detected in SARS-CoV-2 overlaid with the major protein-coding regions of the genome. Variants that occurred only as SNVs in more than ten samples are labelled in black, while those that also occurred as iSNVs, and as SNVs in more than ten samples, are marked in red.
Common consensus mutations shared between putative source-recipient pairs in the CH1 outbreak.
| Source | Recipient |
|---|---|
| P3 (C241T, C3037T, C14408T, A23403G) | P7 (C241T, C3037T, C14408T, A23403G) |
| | P10 (C241T, C3037T, C14408T, A23403G) |
| | HW4 (C241T, C3037T, C14408T, A23403G) |
| | P22 (C241T, C3037T, C14408T, …) |
| | P5 (C241T, C3037T, C14408T, …) |
| | P20 (C241T, C3037T, C14408T, …) |
| | P27 (C241T, C3037T, C14408T, …) |
| | P29 (C241T, C3037T, C14408T, …) |
| HW4 (C241T, C3037T, C14408T, A23403G) | P11 (C241T, C3037T, C14408T, A23403G) |
| | P15 (C241T, C3037T, C14408T, A23403G) |
| P7 (C241T, C3037T, C14408T, A23403G) | P23 (C241T, C3037T, C14408T, C16376T, A23403G) |
| | X1 (C241T, …) |
| | P26 (C241T, C3037T, C14408T, …) |
Mutations in bold were present in the recipient but not in the source. SNP distances between the genomes were confirmed using the snp-dists package. Mutations were called relative to the Wuhan-Hu-1 reference (NC_045512.2).
Shared iSNVs, days between samples, SNP distance and bottleneck estimates of CH1 outbreak putative source–recipient pairs.
| Source_Recipient outbreak ID | Days between samples | SNP distance | Shared iSNVs | Shared iSNV count | Bottleneck estimate | Lower CI | Upper CI | L1 norm |
|---|---|---|---|---|---|---|---|---|
| P3_P10 | 5 | 0 | U11288G, A12240G, A13003G, A20465G | 4 | 4 | 2 | 8 | 13 |
| P3_HW4 | 7 | 0 | G11241A, G11243C, U11288G, A11556U, A12240G, A13003G, G14707A, A20465G, U22507A, C29187U, A29188G | 11 | 11 | 6 | 19 | 13 |
| P3_P13 | 12 | 1 | U11288G, A11556U, A12240G, A13003G, A13587U, G14707A, G18181U, A20465G, U22507A, U22514A | 10 | 17 | 9 | 34 | 8 |
| P3_P21 | 14 | 1 | U11288G, A11556U, A12240G, A13003G, G14707A, G18181U, A20465G, U22507A, G22763A, C29187U, A29188G | 11 | 10 | 6 | 17 | 13 |
| P3_P22 | 16 | 1 | U11288G, A11556U, A12240G, A13003G, G14707A, G18181U, A20465G, U22507A, G22763A, C29187U, A29188G | 11 | 13 | 8 | 23 | 7 |
| P3_P5 | 5 | 1 | A11556U, A12240G, G22763A, G23302A, C23306G, C29187U, A29188G | 7 | 1,000 | 311 | 1,000 | 11 |
| P3_P7 | 11 | 0 | A12240G, A13003G, A13587U, G18181U, G23302A, C23306G | 6 | 32 | 12 | 74 | 6 |
| P3_P29 | 15 | 1 | A12240G, A13003G, G14707A, G18181U, A20465G, U22507A, G22763A | 7 | 5 | 3 | 8 | 12 |
| P3_P20 | 14 | 1 | A11556U, A12240G, A13003G, G14707A, G18181U, A20465G, C29187U, A29188G | 8 | 7 | 4 | 13 | 11 |
| P3_P27 | 14 | 1 | None | 0 | 942 | 1 | 1,000 | 5 |
| HW4_P11 | 5 | 0 | U11288G, A11556U, A12240G, A13003G, G14707A, C17933G, U20135A, U22507A | 8 | 8 | 5 | 16 | 15 |
| HW4_P15 | 6 | 0 | None | 0 | 704 | 1 | 1,000 | 14 |
| P7_P23 | 3 | 1 | A12240G, A13003G, U17928G, A17929C, C17933G, G18181U, C18904U, U20135A, A20387G | 9 | 38 | 15 | 83 | 12 |
| P7_X1 | 0 | 1 | C12053G, A12240G, A13003G, A13587U, G17252U, A17256G, U17928G, A17929C, C17933G, G18181U, C18904U, U24552C, C25132A, A25136G | 14 | 8 | 6 | 12 | 21 |
| P7_P26 | 3 | 2 | None | 0 | 725 | 1 | 1,000 | 6 |
Three pairs shared no iSNVs even though other recipients sharing the same source had iSNVs present in the source.
Summary of iSNVs present at frequencies between 5% and 50% in the 109 SARS-CoV-2 genomes, classified according to their impact on the genes and ORFs in which they occur.
| Gene | Length | High (nonsense) | Moderate (non-synonymous) | Low (synonymous) | Total (N) |
|---|---|---|---|---|---|
| ORF1ab | 21,393 | 41 | 1,234 | 466 | 1,741 (81.38) |
| S | 3,822 | 32 | 287 | 141 | 460 (120.36) |
| ORF3a | 828 | 3 | 62 | 32 | 97 (117.15) |
| E | 228 | 2 | 15 | 5 | 22 (96.49) |
| M | 669 | 3 | 42 | 13 | 58 (86.7) |
| ORF6 | 186 | 0 | 2 | 14 | 16 (86.02) |
| ORF7a | 366 | 0 | 11 | 16 | 27 (73.77) |
| ORF7b | 132 | 1 | 1 | 0 | 2 (15.15) |
| ORF8 | 366 | 1 | 16 | 9 | 26 (71.04) |
| N | 1,260 | 5 | 139 | 60 | 204 (161.9) |
| ORF10 | 117 | 0 | 5 | 3 | 8 (68.38) |
| Total | | 88 | 1,814 | 759 | |
In the last column, total mutation counts are normalized to the number of mutations per kilobase (N, in parentheses) for easy comparison. The majority of the iSNVs detected were nonsynonymous.
N = (total variants in gene/gene length) × 1,000.
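The normalization above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `variants_per_kb` and the `GENES` dictionary are hypothetical, and the lengths and counts are copied from four rows of the summary table.

```python
# Gene length (nt) and total iSNV count, copied from the summary table.
GENES = {
    "ORF1ab": (21393, 1741),
    "S": (3822, 460),
    "ORF7b": (132, 2),
    "N": (1260, 204),
}


def variants_per_kb(total_variants: int, gene_length: int) -> float:
    """Return N = (total variants in gene / gene length) x 1,000."""
    if gene_length <= 0:
        raise ValueError("gene length must be positive")
    return total_variants / gene_length * 1000


if __name__ == "__main__":
    for gene, (length, total) in GENES.items():
        # Two decimal places, matching the precision used in the table.
        print(f"{gene}: {variants_per_kb(total, length):.2f} iSNVs per kb")
```

Normalizing by length is what makes short ORFs such as ORF7b comparable with ORF1ab, which would otherwise dominate on raw counts alone.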
Figure 3. Transmission dynamics of shared within-host variants between CH1 outbreak samples. Each plot shows shared iSNVs between putative donor (red)/recipient (blue) pairs as evidence for transmission. (A) The presence of shared iSNVs between P3 and P5, (B) between P3 and P7. P7 further infected X1 (L) and P23 (C). P3 also sustained a transmission chain through HW4 (I) who infected P11 (F). Finally, P3 infected and transmitted iSNVs to P22 (D), P21 (E), P10 (G), P13 (H), P20 (J) and P29 (L). Variants with a frequency greater than 0.5 were fixed as SNVs in the recipient.
Figure 4. Reconstruction of CH1 and CH3 transmission chains. (A) and (C) show epidemiological links between samples, while (B) and (D) show refined links after incorporating within-host diversity, bottleneck estimates, SNP distance and days between samples for CH1 and CH3, respectively. Bold green lines connect pairs with more than three shared iSNVs, an SNP distance of ≤2, and fewer than ten days between samples. Dashed lines show pairs that shared more than three iSNVs but had more than ten days between samples, or that did not share any iSNVs but had an SNP distance of less than two and fewer than ten days between samples. Maroon lines show samples that did not share any iSNVs with the source in CH1 even though other recipients from the same source share multiple iSNVs with the source.
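The line styles in the Figure 4 caption amount to a small set of threshold rules, which can be sketched in Python as follows. The `Pair` class, the `classify_link` function and the label strings are hypothetical names introduced for illustration, not the authors' code. The maroon category is not modelled because it depends on the other recipients of the same source, so CH1 pairs with no shared iSNVs may be drawn differently in the figure. Combinations the caption does not cover (for example, exactly ten days between samples) are returned as "unclassified".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One putative source-recipient pair (hypothetical container)."""
    pair_id: str
    days_between: int
    snp_distance: int
    shared_isnvs: int


def classify_link(pair: Pair) -> str:
    """Apply the caption's thresholds; labels are illustrative only."""
    many_shared = pair.shared_isnvs > 3
    # Bold green: >3 shared iSNVs, SNP distance <= 2, fewer than ten days.
    if many_shared and pair.snp_distance <= 2 and pair.days_between < 10:
        return "bold"
    # Dashed, case 1: >3 shared iSNVs but more than ten days apart.
    if many_shared and pair.days_between > 10:
        return "dashed"
    # Dashed, case 2: no shared iSNVs, SNP distance < 2, fewer than ten days.
    if (pair.shared_isnvs == 0 and pair.snp_distance < 2
            and pair.days_between < 10):
        return "dashed"
    # Anything the caption does not describe is left unclassified.
    return "unclassified"


if __name__ == "__main__":
    examples = [
        Pair("P3_P10", days_between=5, snp_distance=0, shared_isnvs=4),
        Pair("P3_P13", days_between=12, snp_distance=1, shared_isnvs=10),
        # Made-up pair, used only to exercise the second dashed rule.
        Pair("hypothetical", days_between=6, snp_distance=0, shared_isnvs=0),
    ]
    for pair in examples:
        print(pair.pair_id, classify_link(pair))
```

The first two example pairs use the values reported for P3_P10 and P3_P13 in the CH1 bottleneck table; the third is invented.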
Figure 5. Putative iSNV transmission events amongst CH3 samples. (A) Gapped barplot showing the number of shared iSNVs amongst CH3 pairs and (B) at given nucleotide positions. The majority of pairs had no shared iSNVs, while positions 28881/2/3 co-evolved as iSNVs in three samples and as SNVs in twenty-three other samples. Positions 6,762, 16,376, 22,675, 24,034, and 26,530 showed strong signals for shared iSNVs and were later fixed as SNVs.
Figure 6. Whole-gene within-host nonsynonymous (πN) and synonymous (πS) nucleotide diversity in SARS-CoV-2 samples from the CH1 and CH3 outbreaks. Each gene/outbreak is shaded according to the normalized difference between mean nonsynonymous and synonymous differences per site (πN − πS) to indicate purifying selection (πN < πS; blue) or positive selection (πN > πS; red). Values of πN/πS range from a minimum of 0.007 (nsp9, CH1 outbreak; P = 0.257) to a maximum of 12.46 (M, CH1 outbreak; P = 0.00238), where significance was evaluated using Z-tests of the null hypothesis that πN − πS = 0 (10,000 bootstrap replicates, codon unit). Sites encoding two or more genes in different reading frames were excluded from analysis (e.g. ORF3a sites overlapping ORF3c, ORF3d, or ORF3b). Error bars represent the standard error, evaluated using 10,000 bootstrap replicates (codon unit).
Candidate regions of positive selection within hosts.
| Gene product | Codons | Length (codons) | πN | πS | πN/πS (P) | Codons with nonsynonymous differences |
|---|---|---|---|---|---|---|
| nsp2 | 331–369 | 39 | 2.98 (±1.14) | 1.21 (±0.69) | 2.46 (0.176) | 332, 336, 338, 340+, 345, 355, 359, … |
| nsp3 | 103–155 | 53 | 1.69 (±0.75) | 0.07 (±0.08) | 22.92 (0.033)* | 112+, 113, 126+, 132, … |
| nsp3 | 220–255 | 36 | 1.09 (±0.37) | 0 (–) | – (0.003)** | 224, 230, 231+, 233, … |
| nsp3 | 419–457 | 39 | 1.68 (±0.60) | 0 (–) | – (0.005)** | 422, … |
| nsp3 | 511–540 | 30 | 2.60 (±1.19) | 0.61 (±0.60) | 4.27 (0.072) | 511, … |
| nsp3 | 962–1,007 | 46 | 2.55 (±1.65) | 0.39 (±0.38) | 6.56 (0.206) | 966, … |
| nsp3 | 1,156–1,274 | 119 | 3.31 (±1.24) | 0 (–) | – (0.008)** | … |
| nsp3 | 1,433–1,493 | 61 | 1.87 (±0.71) | 0 (–) | – (0.008)** | 1437, … |
| nsp3 | 1,589–1,644 | 56 | 1.17 (±0.41) | 0.23 (±0.23) | 5.03 (0.049)* | 1595, 1597, 1599, 1615, 1617, … |
| nsp3 | 1,733–1,765 | 33 | 1.98 (±1.14) | 0.43 (±0.44) | 4.58 (0.223) | 1738, 1748, … |
| nsp3 | 1,774–1,824 | 51 | 2.05 (±1.06) | 0.47 (±0.46) | 4.40 (0.186) | 1789, 1795+, … |
| nsp4 | 140–173 | 34 | 3.76 (±1.48) | 0.61 (±0.61) | 6.16 (0.059) | 140, … |
| nsp6 | 65–127 | 63 | 5.88 (±2.66) | 0.88 (±0.86) | 6.71 (0.077) | 74, 76, 83, 84, 86, 90, 91, 94, 98, 104, … |
| nsp6 | 169–206 | 38 | 7.83 (±6.23) | 0.86 (±0.84) | 9.08 (0.266) | 189, 190, … |
| nsp8 | 21–190 | 170 | 3.51 (±2.03) | 0.10 (±0.10) | 36.12 (0.097) | … |
| nsp13 | 565–594 | 30 | 17.55 (±11.66) | 0.83 (±0.87) | 21.09 (0.158) | 565, … |
| nsp14 | 248–289 | 42 | 7.09 (±4.54) | 0.52 (±0.47) | 13.71 (0.151) | 255+, 267, 269, 272, 274, 276, 278, 286, … |
| nsp15 | 83–120 | 38 | 0.82 (±0.41) | 0.08 (±0.08) | 10.46 (0.044)* | … |
| nsp15 | 236–337 | 102 | 5.10 (±3.35) | 0 (–) | – (0.130) | 250, 256, 267, 270+, … |
| nsp16 | 85–115 | 31 | 1.76 (±0.84) | 0 (–) | – (0.037)* | 86, 91, … |
| ORF3a | 97–136 | 40 | 5.99 (±3.11) | 1.32 (±1.30) | 4.55 (0.080) | 100, 103, 117, 118, 121, 123, 125, … |
| E | 44–76 | 33 | 4.51 (±2.00) | 1.17 (±0.81) | 3.86 (0.130) | 50, 52, 58, 60, 68, 71, … |
| M | 135–201 | 67 | 1.96 (±0.60) | 0 (–) | – (0.001)** | 154, 158, 160, 161, 163, 164, 167, 187, 189, … |
| ORF7b | 1–38 | 38 | 0.28 (±0.28) | 0 (–) | – (0.307) | … |
Genes are ordered 5′ to 3′ by start site in the genome.
Codons are numbered with respect to mature gene products, that is each nonstructural protein (nsp) is re-numbered starting at 1.
Undefined values are indicated with a horizontal line (–).
P-values refer to Z-tests of the hypothesis that πN = πS, evaluated for the indicated region using 10,000 bootstrap replicates (codon unit). *P < 0.05; **P < 0.01.
The codon with the highest πN (best candidate) for each region is shown with underline and bold; codons with evidence for between-host pervasive and episodic positive selection and an increasing frequency trend (Pond 2020) are shown with a ‘+’.
Note that the hypothesized overlapping gene ORF3d occupies codons 44–102 of ORF3a.
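The P-values reported above come from Z-tests on πN − πS with codons as the bootstrap unit. The following Python sketch shows one way such a test could be set up. It rests on assumptions that are not taken from the source: each codon is represented by a hypothetical tuple `(n_diffs, n_sites, s_diffs, s_sites)`, the standard error is taken as the standard deviation of the bootstrap replicates, and the P-value is two-sided under a normal approximation. The function names are illustrative, and this is not the authors' pipeline.

```python
import math
import random
import statistics


def pi_diff(codons):
    """Return piN - piS for a list of per-codon tuples
    (n_diffs, n_sites, s_diffs, s_sites)."""
    n_diffs = sum(c[0] for c in codons)
    n_sites = sum(c[1] for c in codons)
    s_diffs = sum(c[2] for c in codons)
    s_sites = sum(c[3] for c in codons)
    pi_n = n_diffs / n_sites if n_sites else 0.0
    pi_s = s_diffs / s_sites if s_sites else 0.0
    return pi_n - pi_s


def bootstrap_z_test(codons, replicates=10_000, seed=0):
    """Resample codons with replacement and test H0: piN - piS = 0.

    Returns (observed difference, bootstrap SE, Z, two-sided P).
    Z and P are NaN when the bootstrap SE is zero.
    """
    rng = random.Random(seed)
    observed = pi_diff(codons)
    boots = [
        pi_diff(rng.choices(codons, k=len(codons)))
        for _ in range(replicates)
    ]
    se = statistics.stdev(boots)
    if se == 0:
        return observed, se, float("nan"), float("nan")
    z = observed / se
    # Two-sided P-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return observed, se, z, p
```

Resampling whole codons rather than individual sites keeps the nonsynonymous and synonymous counts of each codon together, so the two diversity estimates stay paired within every replicate.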