Jasmijn A Baaijens, Alessandro Zulli, Isabel M Ott, Mary E Petrone, Tara Alpert, Joseph R Fauver, Chaney C Kalinich, Chantal B F Vogels, Mallery I Breban, Claire Duvallet, Kyle McElroy, Newsha Ghaeli, Maxim Imakaev, Malaika Mckenzie-Bennett, Keith Robison, Alex Plocik, Rebecca Schilling, Martha Pierson, Rebecca Littlefield, Michelle Spencer, Birgitte B Simen, William P Hanage, Nathan D Grubaugh, Jordan Peccia, Michael Baym.
Abstract
Effectively monitoring the spread of SARS-CoV-2 variants is essential to efforts to counter the ongoing pandemic. Wastewater monitoring of SARS-CoV-2 RNA has proven an effective and efficient technique to approximate COVID-19 case rates in the population. Predicting variant abundances from wastewater, however, is technically challenging. Here we show that by sequencing SARS-CoV-2 RNA in wastewater and applying computational techniques initially used for RNA-Seq quantification, we can estimate the abundance of variants in wastewater samples. We show by sequencing samples from wastewater and clinical isolates in Connecticut, U.S.A., between January and April 2021 that the temporal dynamics of variant strains broadly correspond. We further show that this technique can be used with other wastewater sequencing techniques by expanding to samples taken across the United States in a similar timeframe. We find high variability in signal among individual samples, and limited ability to detect the presence of variants with clinical frequencies <10%; nevertheless, the overall trends match what we observed from sequencing clinical samples. Thus, while clinical sequencing remains a more sensitive technique for population surveillance, wastewater sequencing can be used to monitor trends in variant prevalence in situations where clinical sequencing is unavailable or impractical.
Year: 2021 PMID: 34494031 PMCID: PMC8423229 DOI: 10.1101/2021.08.31.21262938
Source DB: PubMed Journal: medRxiv
Figure 1. Computational approach to variant of concern abundance estimation. a) Computational similarity between RNA transcript quantification and variant abundance estimation. b) Key aspects of the kallisto algorithm in the context of variant abundance estimation. c) Our workflow uses multiple reference sequences per lineage to capture within-lineage variation. Applying kallisto (as in part b) results in abundance estimates per reference sequence. These abundances are filtered using a minimal abundance cutoff and subsequently summed per lineage to obtain abundance estimates per lineage. Finally, variant abundances are reported.
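The filter-then-sum step described in panel c can be illustrated with a minimal sketch. This is not the authors' code: the function name `lineage_abundances`, the input layout (pairs of lineage label and abundance in percent, one per reference sequence), the inclusive `>=` comparison, and the example numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def lineage_abundances(per_sequence, min_abundance=0.1):
    """Sum per-reference-sequence abundances into per-lineage totals.

    per_sequence: iterable of (lineage, abundance_percent) pairs, one pair
        per reference sequence (hypothetical layout, not kallisto's format).
    min_abundance: cutoff in percent; sequences below it are dropped
        before summing (inclusive comparison is an assumption).
    """
    totals = defaultdict(float)
    for lineage, abundance in per_sequence:
        if abundance >= min_abundance:
            totals[lineage] += abundance
    return dict(totals)


# Made-up abundances for five reference sequences from three lineages.
example = [
    ("B.1.1.7", 30.0),
    ("B.1.1.7", 12.5),
    ("B.1.1.7", 0.05),  # below the cutoff, ignored
    ("B.1.526", 7.5),
    ("B.1.429", 0.02),  # below the cutoff, so the lineage is not reported
]
result = lineage_abundances(example, min_abundance=0.1)
```

Filtering before summing means that many low-abundance reference sequences cannot add up to a spurious lineage call, at the cost of slightly underestimating lineages whose signal is spread thinly across references.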
Performance statistics per dataset. Results separated by a forward slash correspond to an abundance threshold of 0.1% and 1%, respectively. FPR = false positive rate; FNR = false negative rate; relative estimation error reflects the average relative frequency estimation error across all true positives.
| Benchmark | FPR | FNR | Precision | Recall | Relative estimation error (%) |
|---|---|---|---|---|---|
| Whole genome 100x | 0.191 / 0.0 | 0.057 / 0.032 | 0.423 / 1.0 | 0.943 / 0.968 | 29.4 / 19.4 |
| Whole genome 1,000x | 0.163 / 0 | 0.007 / 0.042 | 0.470 / 1.0 | 0.993 / 0.958 | 27.1 / 18.5 |
| Spike-only 100x | 0.121 / 0.003 | 0.107 / 0.074 | 0.508 / 0.978 | 0.893 / 0.926 | 26.3 / 15.8 |
| Spike-only 1,000x | 0.041 / 0.003 | 0.043 / 0.042 | 0.753 / 0.978 | 0.957 / 0.958 | 17.3 / 14.0 |
| Spike-only 10,000x | 0.010 / 0 | 0.014 / 0.042 | 0.926 / 1.0 | 0.986 / 0.958 | 15.3 / 13.0 |
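As a rough illustration of how the columns in this table relate to one another, the sketch below computes them for a single simulated sample. The definitions of a positive call (estimate at or above the threshold) and of true presence (true frequency above zero), as well as the function name and the example frequencies, are assumptions; the benchmark itself may define these differently.

```python
def detection_metrics(truth, estimate, threshold):
    """Compute detection statistics for one sample.

    truth, estimate: dicts mapping lineage -> frequency in percent.
    threshold: abundance threshold in percent; a lineage counts as
        "called" when its estimate is at or above this value (assumption).
    """
    tp = fp = fn = tn = 0
    rel_errors = []
    for lineage in set(truth) | set(estimate):
        t = truth.get(lineage, 0.0)
        e = estimate.get(lineage, 0.0)
        present, called = t > 0, e >= threshold
        if present and called:
            tp += 1
            rel_errors.append(100 * abs(e - t) / t)
        elif called:
            fp += 1
        elif present:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "FPR": ratio(fp, fp + tn),
        "FNR": ratio(fn, fn + tp),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        # Mean relative error over true positives only, in percent.
        "rel_error_pct": ratio(sum(rel_errors), len(rel_errors)),
    }


# Made-up frequencies: one true positive, one false negative,
# one false positive, and one true negative at a 1% threshold.
truth = {"B.1.1.7": 10.0, "B.1.526": 5.0, "B.1.429": 0.0, "P.1": 0.0}
estimate = {"B.1.1.7": 12.0, "B.1.526": 0.5, "B.1.429": 2.0, "P.1": 0.0}
metrics = detection_metrics(truth, estimate, threshold=1.0)
```

Under these definitions recall equals 1 minus FNR, which matches the pattern in the table rows (for example, 0.943 and 0.057 for whole genome 100x at the 0.1% threshold).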
Figure 2. Estimated variant abundances and relative prediction errors. Relative prediction errors are defined as the absolute difference between true and estimated frequency, relative to the true frequency.
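Written as a formula, the definition in this caption reads as follows, where the symbols $f_{\text{true}}$ and $f_{\text{est}}$ are notation introduced here for the true and estimated frequency of a variant:

$$
\text{relative prediction error} = \frac{\lvert f_{\text{true}} - f_{\text{est}} \rvert}{f_{\text{true}}}
$$

For instance, a variant with a true frequency of 5% that is estimated at 4% has a relative prediction error of $\lvert 5 - 4 \rvert / 5 = 0.2$, that is, 20%. The quantity is only defined when $f_{\text{true}} > 0$.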
Figure 3. a) RNA levels in wastewater (copies/ml sludge, displayed on left vertical axis) follow the same trend as COVID-19 case rates (cases per 100K people, displayed on right vertical axis). b) Percent genome with >20x coverage versus sludge Ct values. c) Impact of genome coverage on predicted B.1.1.7 abundance for random subsamples of a sludge sample with full genome coverage. The horizontal dotted line indicates the predicted B.1.1.7 abundance for the full sample (99% genome coverage).
Figure 4. Wastewater versus clinical abundance estimates for B.1.1.7 and B.1.526 in New Haven from early January 2021 to late April 2021. Dates of clinical sampling correspond to the date of specimen collection.
Figure 5. Wastewater versus GISAID abundance estimates for B.1.1.7, B.1.427, B.1.429 and B.1.526 at 16 locations across 8 states of the US. Samples were collected between late December 2020 and late January 2021; sampling date and location are indicated on the horizontal axis. Samples are sorted by location, with different locations separated by a dotted line and different states separated by a solid line.