Yi Xu, Lu Kang, Zijie Shen, Xufang Li, Weili Wu, Wentai Ma, Chunxiao Fang, Fengxia Yang, Xuan Jiang, Sitang Gong, Li Zhang, Mingkun Li.
Abstract
In response to the current coronavirus disease 2019 (COVID-19) pandemic, it is crucial to understand the origin, transmission, and evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which relies on close surveillance of genomic diversity in clinical samples. Although mutations at the population level have been extensively investigated, how mutations evolve at the individual level is largely unknown. Eighteen time-series fecal samples were collected from nine patients with COVID-19 during the convalescent phase. The nucleic acids of SARS-CoV-2 were enriched by the hybrid capture method. First, we demonstrated the outstanding performance of the hybrid capture method in detecting intra-host variants. We identified 229 intra-host variants at 182 sites in the 18 fecal samples. Among them, nineteen variants showed frequency changes > 0.3 within 1–5 days, reflecting highly dynamic intra-host viral populations. Moreover, the evolution of the viral genome indicated that the virus was probably viable in the gastrointestinal tract during the convalescent period. We also found that the same mutation showed distinct patterns of frequency change in different individuals, indicating strong random drift. In summary, dramatic changes in the SARS-CoV-2 genome were detected in fecal samples during the convalescent period; whether the viral load in feces is sufficient to establish an infection warrants further investigation.
Keywords: Dynamics; Hybrid capture; Intra-host variant; Mutation; SARS-CoV-2
Year: 2020 PMID: 33388272 PMCID: PMC7649052 DOI: 10.1016/j.jgg.2020.10.002
Source DB: PubMed Journal: J Genet Genomics ISSN: 1673-8527 Impact factor: 4.275
Fig. 1 SARS-CoV-2 read counts and genome coverage obtained using direct metatranscriptomic sequencing and hybrid capture-based sequencing. A: The number of SARS-CoV-2 reads in reads per million (RPM). B: Genome coverage with depth ≥ 1. C: Genome coverage with depth ≥ 10. In A–C, the local polynomial regression line and the R and p values of Spearman correlations are shown; Ct values below the detection limit were replaced with 42 for better visualization. D: Heatmap of Ct, duplicate read rate, RPM, and genome coverage after deduplication. The median for each group is shown with a bar plot, and crosses indicate samples that were not included in the E2 data. BDL, below the detection limit. E: Depth distribution of samples and probes along the SARS-CoV-2 genome. The relative depth was calculated by normalizing depth (bin size = 500 bases) to the average depth of each sample or of the probes. Samples with Ct < 34 and Ct > 34 are shown separately; only samples with average depth > 1 are shown. The red curve indicates GC content along the reference genome. SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
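The two normalizations named in the caption (RPM in panel A, relative depth in 500-base bins in panel E) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and taking the mean depth within each bin before dividing by the sample-wide mean is an assumption, since the caption does not say how depth is summarized per bin.

```python
from statistics import mean


def reads_per_million(viral_reads, total_reads):
    """Scale a viral read count to reads per million total reads (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads / total_reads * 1_000_000


def relative_depth(depths, bin_size=500):
    """Return one relative-depth value per genomic bin.

    `depths` is a list of per-base depths along the genome.
    Assumption: each bin is summarized by its mean depth, then divided
    by the mean depth over the whole sample.
    """
    if not depths:
        return []
    overall = mean(depths)
    n_bins = (len(depths) + bin_size - 1) // bin_size
    if overall == 0:
        # No coverage at all: report zeros rather than dividing by zero.
        return [0.0] * n_bins
    return [
        mean(depths[start:start + bin_size]) / overall
        for start in range(0, len(depths), bin_size)
    ]
```

A value of 1.0 in a bin means that the bin is covered at the sample's average depth, which makes samples with very different sequencing yields comparable on one axis.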
Coverage of the SARS-CoV-2 genome obtained using direct metatranscriptomic sequencing and hybrid capture-based sequencing.
| Group | Before deduplication: depth ≥ 1 | Before deduplication: depth ≥ 10 | Before deduplication: depth ≥ 50 | After deduplication: depth ≥ 1 | After deduplication: depth ≥ 10 | After deduplication: depth ≥ 50 |
|---|---|---|---|---|---|---|
| Raw | 0.0–100.0, 57.0 | 0.0–100.0, 11.4 | 0–99.9, 0.9 | 0.0–100.0, 57.0 | 0.0–100.0, 9.2 | 0–99.9, 0.4 |
| E1 | 63.5–100.0, 99.9 | 0.3–100.0, 99.1 | 0.2–100.0, 94.6 | 63.5–100.0, 99.9 | 0.3–100.0, 98.9 | 0.1–100.0, 92.4 |
| E2 | 99.7–100.0, 99.9 | 85.5–100.0, 99.9 | 22.8–100.0, 99.7 | 99.7–100.0, 99.9 | 85.1–100.0, 99.7 | 18.1–100.0, 99.7 |
SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Fig. 2 Evaluation of hybrid capture in mutant allele frequency profiling. A–C: Consistency among alternative allele frequencies (AAFs) in Raw, E1, and E2 data. Mutations were identified as described in the Materials and methods section; if a mutation was identified in only one data set of a comparison pair, the AAF from the other data set was used provided its depth was ≥ 10. D: Consistency between AAF changes from T1 to T2 in Raw and E1 data. Sites with depth ≥ 10, minor allele frequency ≥ 0.01, and minor allele supporting reads ≥ 2 were used. R² and p values of linear regressions are shown.
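The site filter described for panel D (depth ≥ 10, minor allele frequency ≥ 0.01, minor allele supporting reads ≥ 2) can be expressed as a small predicate. The sketch below is illustrative only: `SiteCounts` and the function names are hypothetical, and treating the second most abundant base as the minor allele is an assumption, since the caption does not define it.

```python
from dataclasses import dataclass, field


@dataclass
class SiteCounts:
    """Per-base read counts at one genomic position (hypothetical container)."""
    position: int
    counts: dict = field(default_factory=dict)  # base -> number of reads


def minor_allele(site):
    """Return (base, reads, frequency) of the second most abundant base, or None."""
    depth = sum(site.counts.values())
    ranked = sorted(site.counts.items(), key=lambda kv: kv[1], reverse=True)
    if depth == 0 or len(ranked) < 2:
        return None
    base, reads = ranked[1]
    return base, reads, reads / depth


def passes_filter(site, min_depth=10, min_maf=0.01, min_minor_reads=2):
    """Apply the three thresholds quoted in the caption."""
    if sum(site.counts.values()) < min_depth:
        return False
    minor = minor_allele(site)
    if minor is None:
        return False
    _, reads, freq = minor
    return reads >= min_minor_reads and freq >= min_maf
```

For example, a site with 95 reads of `A` and 5 of `G` passes, while a site with 1000 reads of `A` and 2 of `G` fails on frequency even though it has two supporting reads.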
Fig. 3 iSNV profiles and longitudinal dynamics in fecal samples. A: Alternative allele frequency (AAF) distribution of all iSNVs. The inset shows the number of observed iSNVs in different genes; the expected number of iSNVs was calculated based on the length of each region. B: The number of iSNVs identified in each sample. Genomic coverages with depth > 50 are shown below the plot. C: AAF at iSNV positions shared by multiple individuals. The nucleotide at the position (reference allele/alternative allele), mutation type, genomic region, and amino acid change are shown on the right of the heatmap. Open circles indicate identified iSNVs. The four positions associated with recurrent mutations (van Dorp et al., 2020) are highlighted in red. D: iSNVs with frequency change > 0.30 from T1 to T2. Only changes with adjusted P < 0.05 (Fisher's exact test) are shown. Arrows indicate the direction of changes from T1 to T2, the colors of lines indicate time intervals between T1 and T2, and the colors of triangles/squares/dots indicate the number of nonduplicated reads at each site. E–G: Correlation between the Ct value, iSNV number, and the duration of detection of SARS-CoV-2 RNA in feces since T1/T2. R² and p values of linear regressions are shown. In F and G, only samples having full genomic coverage with depth ≥ 50 were included (n = 9). NS, nonsynonymous; S, synonymous; NC, noncoding; iSNV, intra-host single-nucleotide variation.
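The selection in panel D (frequency change > 0.30 with adjusted P < 0.05, Fisher's exact test) might be reproduced along the following lines. Several points here are assumptions rather than details given in the caption: the 2×2 table is built from alternative versus other read counts at T1 and T2, the test is two-sided, and the Benjamini-Hochberg procedure stands in for the unspecified P value adjustment. All function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(p for x in range(lo, hi + 1) if (p := prob(x)) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def aaf_shift(alt1, depth1, alt2, depth2):
    """Return (AAF change from T1 to T2, unadjusted p value).

    Assumption: rows are time points, columns are alt vs. non-alt reads.
    """
    delta = alt2 / depth2 - alt1 / depth1
    p = fisher_exact_two_sided(alt1, depth1 - alt1, alt2, depth2 - alt2)
    return delta, p


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (assumed adjustment method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted
```

A site would then be reported when `abs(delta) > 0.30` and its adjusted p value is below 0.05, with the adjustment applied across all tested sites.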