Vic-Fabienne Schumann, Rafael Ricardo de Castro Cuadrat, Emanuel Wyler, Ricardo Wurmus, Aylina Deter, Claudia Quedenau, Jan Dohmen, Miriam Faxel, Tatiana Borodina, Alexander Blume, Jonas Freimuth, Martin Meixner, José Horacio Grau, Karsten Liere, Thomas Hackenbeck, Frederik Zietzschmann, Regina Gnirss, Uta Böckelmann, Bora Uyar, Vedran Franke, Niclas Barke, Janine Altmüller, Nikolaus Rajewsky, Markus Landthaler, Altuna Akalin.
Abstract
RNA sequencing of wastewater samples is a valuable way to estimate infection dynamics and circulating lineages of SARS-CoV-2. This approach is independent of testing individuals and can therefore become the key tool to monitor this and potentially other viruses. However, it is equally important to develop easily accessible and scalable tools that can highlight critical changes in infection rates and dynamics over time across different locations, given sequencing data from wastewater. Here, we provide an analysis of lineage dynamics in Berlin and New York City using wastewater sequencing and present PiGx SARS-CoV-2, a highly reproducible computational analysis pipeline with comprehensive reports. This end-to-end pipeline includes all steps from raw data to shareable reports, additional taxonomic analysis, deconvolution and geospatial time series analyses. Using simulated datasets (in silico generated and spiked-in samples), we could demonstrate the accuracy of our pipeline in calculating proportions of Variants of Concern (VOC) from environmental as well as pre-mixed (spiked-in) samples. By applying our pipeline to a dataset of wastewater samples collected in Berlin between February 2021 and January 2022, we could reconstruct the emergence of B.1.1.7 (alpha) in February/March 2021 and the replacement dynamics from B.1.617.2 (delta) to BA.1 and BA.2 (omicron) during the winter of 2021/2022. Using very short reads generated in an industrial-scale setting, we observed even higher accuracy in our deconvolution. Lastly, using a targeted sequencing dataset from New York City (receptor-binding domain (RBD) only), we could reproduce the results of the original study, recovering the proportions of the so-called cryptic lineages shown there. Overall, our study provides an in-depth analysis reconstructing virus lineage dynamics from wastewater.
By applying our tool to a wide range of datasets (from different types of wastewater sampling locations and sequenced with different methods), we show that PiGx SARS-CoV-2 can be used to identify new mutations and detect emerging lineages in a highly automated and scalable way. Our approach can support efforts to establish continuous monitoring and early-warning projects for detecting SARS-CoV-2 or any other pathogen.
Keywords: COVID-19 surveillance; Environmental monitoring; Public health risk; Sequencing; Sewage sampling
Year: 2022 PMID: 36228784 PMCID: PMC9549760 DOI: 10.1016/j.scitotenv.2022.158931
Source DB: PubMed Journal: Sci Total Environ ISSN: 0048-9697 Impact factor: 10.753
Fig. 1 Flowchart of the PiGx SARS-CoV-2 pipeline showing the required input files, the analysis workflow with the tools used, and the output files.
Feature comparison between different available pipelines and analysis tools.
| | COJAC (+ V-pipe) | Freyja | ARTIC bioinformatics pipeline | PiGx SARS-CoV-2 |
|---|---|---|---|---|
| Deployment | Package available through conda, but execution relies on separate jupyter notebooks | Package available through conda | Package available through conda | Package available through GNU Guix, workflow management using snakemake |
| Lineage prediction strategy | Co-occurrence analysis using Maximum-Likelihood-Estimation | Deconvolution using constrained minimization | None, ends at variant calling step | Deconvolution using robust regression |
| Detection/Identification of emerging new/single mutations | ||||
| End-to-end | ||||
| Variable reference genomes as input | ||||
| Output summary reports with visualization, stats and data | ||||
| Enables geospatial analysis | ||||
| Single Mutation trend analysis directly implemented | ||||
| Can take input from different sequencing strategies and different read lengths | Limited, performance may vary with read length ( | Starts only from the BAM files | ||
| Bit-by-bit reproducibility | ||||
Fig. 2 A) Prediction verification results for the spike-in data simulation per lineage; the dotted line shows the expected trendline. B) Prediction verification results for the spike-in data simulation across all lineages excluding lineage A. C) Prediction verification results for the in silico simulation: single-end simulated 40 bp reads from GISAID, 100 k reads.
Fig. 3 A) Top 10 sequence variants that significantly increase over time in Berlin. The mutations were pooled across four different wastewater treatment plants and across times of day, and sorted by decreasing coefficients from linear models. Statistical significance was evaluated by a t-test using p ≤ 0.05 as cutoff. Only samples passing the sample quality scoring (>90 % reference genome coverage) were used. There was no sampling between June 11 and September 19, 2021. B) Top 10 sequence variants that significantly increase over time in New York City (NYC) (2021). The mutations were pooled across 14 different wastewater treatment plants in NYC and across times of day, and sorted by decreasing coefficients from linear models. Statistical significance was evaluated by a t-test using p ≤ 0.05 as cutoff.
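The ranking described in the Fig. 3 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline's own code: it fits an ordinary least-squares line per mutation, computes the t statistic of the slope, and keeps mutations with a positive slope whose |t| reaches a critical value. The function names, the labels `mut_A` to `mut_C`, and all frequencies are hypothetical. The critical value 2.306 is the two-sided 5 % threshold of a t distribution with 8 degrees of freedom (10 time points minus 2 fitted parameters).

```python
import math

def slope_and_t(days, freqs):
    """Ordinary least-squares slope of freqs ~ days and its t statistic."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(freqs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, freqs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares; the slope's standard error uses n - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(days, freqs))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return slope, (math.inf if slope != 0 else 0.0)
    return slope, slope / se

def increasing_mutations(series, t_crit, top=10):
    """Return (mutation, slope) pairs with a significant positive slope, steepest first."""
    hits = []
    for name, freqs in series["freqs"].items():
        slope, t_stat = slope_and_t(series["days"], freqs)
        if slope > 0 and abs(t_stat) >= t_crit:
            hits.append((name, slope))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:top]

# Hypothetical pooled mutation frequencies at 10 weekly time points.
series = {
    "days": [0, 7, 14, 21, 28, 35, 42, 49, 56, 63],
    "freqs": {
        "mut_A": [0.01, 0.03, 0.02, 0.06, 0.08, 0.10, 0.13, 0.12, 0.17, 0.20],
        "mut_B": [0.10, 0.12, 0.09, 0.11, 0.10, 0.12, 0.09, 0.11, 0.10, 0.11],
        "mut_C": [0.00, 0.05, 0.12, 0.14, 0.22, 0.25, 0.33, 0.35, 0.41, 0.47],
    },
}

ranked = increasing_mutations(series, t_crit=2.306)
for name, slope in ranked:
    print(f"{name}: slope per day = {slope:.5f}")
```

In this toy input, `mut_B` fluctuates around a constant level and is filtered out, while the two rising mutations are returned in order of decreasing slope. Converting the t statistic into an exact p-value would need a t-distribution function from a statistics library, which is left out here to keep the sketch dependency-free.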
Fig. 4 A) Proportion of tracked lineages over time in Berlin wastewater. Only samples passing the sample quality scoring (≥ 90 % reference genome coverage) were considered. The shaded area highlights the non-sampling phase. B) Proportion of tracked lineages over time in New York City wastewater. The proportions were calculated with a deconvolution model based on the signature mutation frequencies. “Others” denotes a set of reference mutations derived from the deconvolution matrix. Sample results were pooled from four different wastewater treatment plants using a weighted mean with read numbers as weights. Where lineages were indistinguishable, the proportion derived for the group was distributed equally among the affected lineages. C, D) Comparison of deconvolution results (dark color) with lineage frequency analysis data from the Robert Koch Institute (RKI) (C) or the NYC Department of Health and Mental Hygiene (NYC) (D) (light color). Deconvolution results were pooled by week using a weighted mean with sample read numbers as weights. For the data from Berlin, only samples passing the sample quality scoring (≥ 90 % reference genome coverage) were used.
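The two pooling rules named in the Fig. 4 caption (a read-weighted mean across samples, and an equal split of a group proportion among indistinguishable lineages) can be illustrated as follows. This is a sketch under assumptions: the function names, the read counts, and the proportions are hypothetical, and a lineage missing from a sample is treated as having proportion 0 in that sample, which the caption does not specify.

```python
from collections import defaultdict

def pool_weighted(samples):
    """Read-weighted mean proportion per lineage.

    samples: list of (read_count, {lineage: proportion}) tuples.
    A lineage absent from a sample counts as proportion 0 (assumption).
    """
    total_reads = sum(reads for reads, _ in samples)
    if total_reads == 0:
        raise ValueError("total read count must be positive")
    weighted = defaultdict(float)
    for reads, proportions in samples:
        for lineage, proportion in proportions.items():
            weighted[lineage] += reads * proportion
    return {lineage: value / total_reads for lineage, value in weighted.items()}

def split_group(group_proportion, lineages):
    """Distribute a group's proportion equally among its member lineages."""
    share = group_proportion / len(lineages)
    return {lineage: share for lineage in lineages}

# Hypothetical results from two treatment plants sampled in the same week.
samples = [
    (1000, {"B.1.1.7": 0.8, "Others": 0.2}),
    (3000, {"B.1.1.7": 0.4, "Others": 0.6}),
]
pooled = pool_weighted(samples)

# Hypothetical group of two lineages that share all signature mutations.
shares = split_group(0.3, ["BA.1", "BA.2"])
```

Weighting by read number lets deeply sequenced samples contribute more to the pooled estimate than shallow ones, so in this toy input the plant with 3000 reads pulls the pooled B.1.1.7 proportion toward its own value.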
Fig. 5 A) Combination of lineage prediction results (deconvolution) for B.1.617.2 and BA.1/BA.2 (dataset-Berlin250). B, C, D) Single key signature mutations M:I82T::T26767C, N:D63G::A28461G, ORF1ab:T3255I::C10029T, ORF1ab:P3395H::C10449A, N:P13L::C28311T, S:H655Y::C23525T and case numbers in Berlin (from RKI).
Fig. 6 A) 7-day average of COVID-19 cases in Berlin, data from the Robert Koch Institute (RKI) (light green, left axis), and proportion of samples testing positive for SARS-CoV-2 RNA by RT-qPCR (dark violet, right axis) from February 2021 to January 2022. B) Correlation between the 7-day average of COVID-19 cases in Berlin and the proportion of samples testing positive for SARS-CoV-2 RNA by RT-qPCR. C) 7-day average of COVID-19 cases in Berlin, data from the RKI (light green, left axis), and proportion of samples testing positive for SARS-CoV-2 RNA by RT-qPCR (dark violet, right axis) from February 2021 to January 2022, with a lag of one time point. D) Correlation between the 7-day average of COVID-19 cases in Berlin and the proportion of samples testing positive for SARS-CoV-2 RNA by RT-qPCR, with a lag of one time point.
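The lagged comparison in panels C and D of Fig. 6 can be illustrated with a short Pearson correlation sketch. The caption does not state the direction of the lag, so the code assumes that the wastewater signal leads reported cases by `lag` sampling points; this direction, the function names, and all numbers are assumptions made for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(wastewater, cases, lag=0):
    """Correlate wastewater[t] with cases[t + lag].

    Assumption: the wastewater signal precedes reported cases.
    """
    if lag < 0 or lag > len(wastewater) - 2:
        raise ValueError("lag out of range")
    shifted_wastewater = wastewater[: len(wastewater) - lag]
    shifted_cases = cases[lag:]
    return pearson(shifted_wastewater, shifted_cases)

# Hypothetical series: fraction of RT-qPCR-positive samples and 7-day case averages.
positive_fraction = [0.1, 0.3, 0.6, 0.9, 0.7, 0.4, 0.2]
case_average = [40, 50, 150, 300, 450, 350, 200]

r_no_lag = lagged_correlation(positive_fraction, case_average, lag=0)
r_one_lag = lagged_correlation(positive_fraction, case_average, lag=1)
```

In this constructed input the case series follows the wastewater series one time point later, so the lagged correlation is higher than the unlagged one. Real data would have to be aligned on common sampling dates before such a comparison.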
SARS-CoV-2 genomes used for in silico simulations.
| WHO Lineage/Pango ID | GISAID accession |
|---|---|
| Gamma/P.1 | >hCoV-19/Brazil/AM-FIOCRUZ-21890579EMP/2021\|EPI_ISL_4520422\|2021-07-14 |
| Alpha/B.1.1.7 | >hCoV-19/Kenya/KEM-CVR-3EL/2021\|EPI_ISL_4506017\|2021-04-21 |
| Lambda/C.37 | >hCoV-19/Denmark/DCGC-151255/2021\|EPI_ISL_3450383\|2021-08-11 |
| Delta/B.1.617.2 | >hCoV-19/Poland/CovSeq215/2021\|EPI_ISL_4551640\|2021-09-08 |
| Mu/B.1.621 | >hCoV-19/Colombia/ATL-UNIANDES-G029686/2021\|EPI_ISL_4566376\|2021-08-20 |
| Omicron/B.1.1.529 | >hCoV-19/Belgium/rega-20174/2021\|EPI_ISL_6794907.2\|2021-11-24 |