Michelle J Lin, Victoria M Rachleff, Hong Xie, Lasata Shrestha, Nicole A P Lieberman, Vikas Peddu, Amin Addetia, Amanda M Casto, Nathan Breit, Patrick C Mathias, Meei-Li Huang, Keith R Jerome, Alexander L Greninger, Pavitra Roychoudhury.
Abstract
Rapid dissemination of SARS-CoV-2 sequencing data to public repositories has enabled widespread study of viral genomes, but studies of longitudinal specimens from infected persons are relatively limited. Analysis of longitudinal specimens enables understanding of how host immune pressures drive viral evolution in vivo. Here we performed sequencing of 49 longitudinal SARS-CoV-2-positive samples from 20 patients in Washington State collected between March and September of 2020. Viral loads declined over time with an average increase in RT-qPCR cycle threshold of 0.87 per day. We found that there was negligible change in SARS-CoV-2 consensus sequences over time, but identified a number of nonsynonymous variants at low frequencies across the genome. We observed enrichment for a relatively small number of these variants, all of which are now seen in consensus genomes across the globe at low prevalence. In one patient, we saw rapid emergence of various low-level deletion variants at the N-terminal domain of the spike glycoprotein, some of which have previously been shown to be associated with reduced neutralization potency from sera. In a subset of samples that were sequenced using metagenomic methods, differential gene expression analysis showed a downregulation of cytoskeletal genes that was consistent with a loss of ciliated epithelium during infection and recovery. We also identified co-occurrence of bacterial species in samples from multiple hospitalized individuals. These results demonstrate that the intrahost genetic composition of SARS-CoV-2 is dynamic during the course of COVID-19, and highlight the need for continued surveillance and deep sequencing of minor variants.
Year: 2022 PMID: 35393464 PMCID: PMC8987511 DOI: 10.1038/s41598-022-09752-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Demographics and clinical characteristics of patients included in study.
| Characteristics | (N = 20) |
|---|---|
| | 70 (18) |
| | 13 (65) |
| White | 12 (60) |
| Asian | 4 (15) |
| Black or African American | 2 (10) |
| American Indian or Alaska Native | 1 (5) |
| Unknown or Unavailable | 1 (5) |
| Hypertension | 10 (50) |
| Diabetes | 7 (35) |
| Obesity | 2 (10) |
| Asthma | 1 (5) |
| Convalescent Plasma | 2 (10) |
| Hydroxychloroquine | 2 (10) |
| Azithromycin | 5 (25) |
| Tocilizumab | 1 (5) |
| ACTT-1 Trial | 2 (10) |
| No Treatment | 8 (40) |
| Unknown | 4 (20) |
| Hospital admission | 14 (70) |
| ICU admission for COVID-19 | 4 (20) |
| Survival to discharge | 18 (90) |
Different categories (in bold) and their subcategories are shown in the first column, with their respective number of patients in the second column. In parentheses, standard deviation is indicated in the first row, and percentages for all other rows.
Figure 1. Viral load dynamics in sequenced samples. Each dot represents a unique sequenced sample. Lines connect samples from a single patient. Same-day samples are not shown (see Supplementary Fig. S1).
Consensus sequence analysis of SARS-CoV-2 in longitudinal specimens.
| Patient | Sample # | Days since symptom onset | Ct value | %Ns | Clade (Nextclade/Pangolin) | Number of nt differences relative to first sample |
|---|---|---|---|---|---|---|
| P001 | 1 | Asymptomatic | 22.6 | 0.0 | 19B/A.1 | – |
| | 2 | 3** | 22.2 | 0.0 | | 4 |
| P003 | 1 | Unknown | 18.0 | 0.0 | 19B/A.1 | – |
| | 2 | 5** | 25.3 | 0.0 | | 0 |
| P005 | 1 | 0* | 19.1 | 0.3 | 19B/A.1 | – |
| | 2 | 12 | 30.9 | 0.0 | | 0 |
| P006 | 1 | Asymptomatic | 25.6 | 0.0 | 19B/A.1 | – |
| | 2 | 0** | 29.7 | 0.0 | | 0 |
| P007 | 1 | 0* | 15.8 | 0.0 | 19B/A.1 | – |
| | 2 | 6 | 26.3 | 0.0 | | 0 |
| P008 | 1 | 0* | 17.7 | 0.0 | 19B/A.1 | – |
| | 2 | 16 | 29.4 | 0.0 | | 0 |
| P009 | 1 | 0* | 25.3 | 0.0 | 20C/B.1.21 | – |
| | 2 | 9 | 21.7 | 0.0 | | 0 |
| P010 | 1 | −7 | 20.7 | 0.0 | 19B/A.1 | – |
| | 2 | 4 | 26.2 | 0.0 | | 0 |
| | 3 | 7 | 31.3 | 0.0 | | 0 |
| | 4 | 15 | 26.1 | 0.0 | | 0 |
| P011 | 1 | 0 | 25.3 | 0.5 | 20C/B.1.21 | – |
| | 2 | 8 | 27.7 | 0.0 | 20C/B.1.21 | 0 |
| | 3 | 11 | 31.6 | 0.1 | | 0 |
| P012 | 1 | 5 | 21.6 | 0.7 | 20C/B.1.21 | – |
| | 2 | 5 | 19.8 | 0.0 | | 0 |
| P014 | 1 | Asymptomatic | 27.6 | 0.0 | 19B/A.1 | – |
| | 2 | 3** | 29.2 | 1.6 | | 0 |
| P015 | 1 | 3 | 18.5 | 0.0 | 19B/A.1 | – |
| | 2 | 11 | 21.1 | 0.0 | | 0 |
| | 3 | 14 | 27.9 | 0.0 | | 0 |
| P016 | 1 | 16 | 23.9 | 0.0 | 20C/B.1 | – |
| | 2 | 19 | 25.3 | 0.0 | | 0 |
| | 3 | 22 | 30.9 | 0.0 | | 0 |
| P017 | 1 | 10 | 24.8 | 0.0 | 19B/A.1 | – |
| | 2 | 13 | 28.6 | 0.0 | | 0 |
| | 3 | 21 | 26.6 | 0.0 | | 0 |
| P018 | 1 | Unknown | 15.3 | 0.0 | 20B/B.1.1.77 | – |
| | 2 | 3** | 17.8 | 0.0 | | 0 |
| P019 | 1 | Unknown | 14.6 | 0.0 | 20A/B.1 | – |
| | 2 | 19** | 31.1 | 0.3 | | 0 |
All patients with less than 2% unknown bases (Ns) are included. The last column indicates nucleotide differences compared to the first sample collected for each respective patient. One asterisk (*) indicates symptoms were present at first time point but exact date of symptom onset is unknown. Two asterisks (**) indicate days since first sample.
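To illustrate how a per-day Ct change can be read off the table, the following minimal sketch divides the difference between a patient's last and first Ct values by the number of days elapsed. It uses three patients from the table whose collection days are given relative to symptom onset. The function name `ct_change_per_day` is hypothetical, and this simple two-point calculation is only an illustration; it is not claimed to be the method behind the reported average of 0.87 per day.

```python
def ct_change_per_day(samples):
    """Return the Ct change per day between the earliest and latest sample.

    samples: list of (day, ct) tuples for one patient.
    """
    ordered = sorted(samples)  # sort by day
    (day_first, ct_first), (day_last, ct_last) = ordered[0], ordered[-1]
    if day_last == day_first:
        raise ValueError("samples were collected on the same day")
    return (ct_last - ct_first) / (day_last - day_first)


# (day since symptom onset, Ct value), copied from the table above
patients = {
    "P005": [(0, 19.1), (12, 30.9)],
    "P007": [(0, 15.8), (6, 26.3)],
    "P008": [(0, 17.7), (16, 29.4)],
}

slopes = {pid: ct_change_per_day(s) for pid, s in patients.items()}
for pid, slope in slopes.items():
    print(f"{pid}: {slope:.2f} Ct per day")
```

A positive value means Ct rose over time, which corresponds to a declining viral load.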
Figure 2. Low-frequency variation is abundant, but only a small number of variants exhibit a significant change in allele frequency over the course of infection. (A) Each dot represents a high-confidence coding change in a single sample relative to the Wuhan-Hu-1 (NC_045512.2) reference genome with variant allele frequency between 5 and 95%, at least 100× coverage at the site, and reproducibility in multiple samples at lower frequencies (< 40%). Color scale represents the change in allele frequency across time points in the same patient, with darker colors representing variants that had greater changes in frequency across samples. Small dark grey marks along the top margin show positions with variant frequencies > 95% (fixed mutations relative to the reference). Size of circles indicates sequencing depth at the site. Marginal histogram shows distribution of variants using bin width of 500 nucleotides. (B) Comparison of allele frequencies of low-frequency variants (< 20%) across replicates of the same sample (n = 10 samples). Each dot represents a variant with ≥ 100 total depth and ≥ 10 allelic depth in each replicate. Line of best fit is shown in purple, and dots in orange represent replicates that were re-sequenced using a different library preparation method (amplicon sequencing vs. shotgun metagenomic sequencing).
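The depth and allele-frequency thresholds described for panel A can be expressed as a simple filter. The sketch below is a minimal illustration, assuming each variant call carries a total depth and an alternate-allele depth. The `Variant` class, the `passes_filter` function, and the example records are all hypothetical, and the reproducibility criterion from the caption is not implemented here.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str        # hypothetical label, not a real variant from the study
    depth: int       # total reads covering the site
    alt_depth: int   # reads supporting the alternate allele

    @property
    def allele_frequency(self):
        return self.alt_depth / self.depth


def passes_filter(v, min_af=0.05, max_af=0.95, min_depth=100):
    """Keep calls with at least 100x coverage and AF between 5% and 95%."""
    return v.depth >= min_depth and min_af <= v.allele_frequency <= max_af


# Made-up example calls for illustration only
calls = [
    Variant("example_1", depth=1200, alt_depth=180),  # AF 0.15, kept
    Variant("example_2", depth=80, alt_depth=20),     # coverage too low
    Variant("example_3", depth=500, alt_depth=495),   # AF 0.99, near-fixed
]

kept = [v.name for v in calls if passes_filter(v)]
print(kept)
```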
Frequent non-synonymous variants observed in ≥ 15 samples (n = 47).
| Variant | AF range | # patients | # samples | ORF: effect |
|---|---|---|---|---|
| C17747T | 0.98–1 | 11 | 24 | ORF1ab: P5828L; helicase: P504L |
| A17858G | 0.98–1 | 11 | 24 | ORF1ab: Y5865C; helicase: Y541C |
| T28144C | 0.02–1 | 12 | 23 | ORF8: L84S |
| A23403G | 0.99–1 | 9 | 19 | S: D614G |
| C14408T | 0.02–1 | 10 | 17 | ORF1ab: P4715L; RdRp: P323L |
| G25563T | 0.03–1 | 8 | 15 | ORF3a: Q57H |
| C1059T | 0.95–1 | 7 | 15 | ORF1a: T265I; nsp2: T85I |
All variants called had at least 10 reads of support for the alternate allele. For the three variants with large ranges in allele frequency (T28144C, C14408T, G25563T), ≤ 3 outlier samples with variant AFs below 0.1 were present. When these samples are excluded, minimum AF increases to ≥ 0.99.
Figure 3. Variants that exhibit ≥ 40% maximum change in allele frequency in the individuals profiled here in summer 2020 show limited ability to predict future GISAID consensus sequences as of April 2021. (A) Relative frequencies of the derived allele found in GISAID consensuses across the genome. Dots represent each unique variant with size indicating the maximum intra-host change in allele frequency found in our study. (B) Number of GISAID consensuses with the derived allele for each variant. Height of vertical bars represents the total number of consensuses with the derived allele collected for each month from March 2020 to March 2021 and bar color represents the number of continents of origin for these consensuses.
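One plausible reading of "maximum change in allele frequency" is the difference between the highest and lowest frequency observed for a variant across a patient's time points. The sketch below computes that quantity and applies the 40% threshold. The definition is an assumption, and the function name, patient label, variant labels, and frequencies are made up for illustration.

```python
from collections import defaultdict


def max_intrahost_change(observations):
    """Map (patient, variant) to max(AF) - min(AF) across time points.

    observations: iterable of (patient, variant, allele_frequency).
    Pairs seen at only one time point are skipped.
    """
    series = defaultdict(list)
    for patient, variant, af in observations:
        series[(patient, variant)].append(af)
    return {
        key: max(afs) - min(afs)
        for key, afs in series.items()
        if len(afs) >= 2
    }


# Made-up observations for a hypothetical patient "PX"
observations = [
    ("PX", "varA", 0.10),
    ("PX", "varA", 0.62),
    ("PX", "varB", 0.30),
    ("PX", "varB", 0.35),
    ("PX", "varC", 0.50),  # single time point, ignored
]

changes = max_intrahost_change(observations)
large_changes = {key for key, delta in changes.items() if delta >= 0.40}
print(large_changes)
```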
Figure 4. Variants that exhibit intra-host evolution in the spike protein across all patients. (A) All non-synonymous variants located in the spike protein with a ≥ 20% change in allele frequency among timepoints for any patient. (B) Enumeration of deletions that arose between residues 138–149 of the spike protein in P016 at ≥ 1% relative frequency reveals a rapidly changing complement of low frequency alleles present over a 6-day period. The reference nucleotide sequence (NC_045512) is located at the top of the sequence alignment. Above the reference is the corresponding amino acid sequence with associated residue numbers. Alleles that match the reference are in gray, and deletions are shown in black. To the right of the sequence alignment is a bar graph showing the square root of the relative frequency of each variant, for visualization purposes, labeled with the allele frequency in percentage and read depth in parentheses.
Figure 5. Differentially expressed genes during SARS-CoV-2 infection. (A) Twenty differentially expressed genes with lowest adjusted p value. Fold changes are of later samples relative to initial samples. Genes highlighted in red have a log2 fold change > 2 and an adjusted p value < 0.1. (B) Gene Ontology analysis reveals that differentially expressed genes are significantly enriched in biological processes related to microtubule-based motility. The twenty biological processes with the lowest adjusted p values are shown. The length of the horizontal bars corresponds to the number of DE genes in each GO category ("Number Enriched"). Bar color corresponds to the adjusted p value for enrichment of DE genes in each pathway.
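The highlighting rule in panel A (log2 fold change > 2 and adjusted p value < 0.1) can be sketched as a small predicate. The caption does not say whether the fold-change cutoff applies to the absolute value, so the `absolute` flag below is an assumption. The function name, gene labels, and values are hypothetical.

```python
def is_highlighted(log2_fc, padj, fc_cutoff=2.0, padj_cutoff=0.1, absolute=True):
    """Return True when a gene meets both the fold-change and p-value cutoffs.

    absolute=True compares |log2 fold change| to the cutoff (an assumption);
    absolute=False applies the cutoff to the signed value as literally stated.
    """
    size = abs(log2_fc) if absolute else log2_fc
    return size > fc_cutoff and padj < padj_cutoff


# Made-up (gene, log2 fold change, adjusted p value) records
genes = [
    ("GENE_A", -3.1, 0.004),  # strongly downregulated, significant
    ("GENE_B", 2.4, 0.25),    # large change, not significant
    ("GENE_C", 1.2, 0.01),    # significant, change too small
]

highlighted = [name for name, fc, p in genes if is_highlighted(fc, p)]
highlighted_signed = [
    name for name, fc, p in genes if is_highlighted(fc, p, absolute=False)
]
print(highlighted, highlighted_signed)
```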