Nikolaos Pechlivanis1,2, Maria Tsagiopoulou1, Maria Christina Maniou1, Anastasis Togkousidis1, Evangelia Mouchtaropoulou1, Taxiarchis Chassalevris3, Serafeim C Chaintoutis3, Maria Petala4, Margaritis Kostoglou5, Thodoris Karapantsios5, Stamatia Laidou1,2, Elisavet Vlachonikola1,2, Anastasia Chatzidimitriou1, Agis Papadopoulos6, Nikolaos Papaioannou3, Chrysostomos I Dovas3, Anagnostis Argiriou1,7, Fotis Psomopoulos8.
Abstract
The COVID-19 pandemic represents an unprecedented global crisis necessitating novel approaches for, amongst others, early detection of emerging variants relating to the evolution and spread of the virus. Recently, the detection of SARS-CoV-2 RNA in wastewater has emerged as a useful tool to monitor the prevalence of the virus in the community. Here, we propose a novel methodology, called lineagespot, for the monitoring of mutations and the detection of SARS-CoV-2 lineages in wastewater samples using next-generation sequencing (NGS). Our proposed method was tested and evaluated using NGS data produced by the sequencing of 14 wastewater samples from the municipality of Thessaloniki, Greece, covering a 6-month period. The results showed the presence of SARS-CoV-2 variants in wastewater data. lineagespot was able to record the evolution and rapid domination of the Alpha variant (B.1.1.7) in the community, and allowed the correlation between the mutations evident through our approach and the mutations observed in patients from the same area and time periods. lineagespot is an open-source tool, implemented in R, and is freely available on GitHub and registered on bio.tools.
Year: 2022 PMID: 35177697 PMCID: PMC8854625 DOI: 10.1038/s41598-022-06625-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Abbreviation table containing all acronyms that are used in the text.
| Acronym/phrase | Meaning |
|---|---|
| VoC | Variant of concern |
| NGS | Next-generation sequencing |
| VCF | Variant call format |
| CHROM | The name of the sequence (typically a chromosome) on which the variation is being called |
| POS | The 1-based position of the variation on the given sequence |
| REF | The reference base (or bases in the case of an indel) at the given position on the given reference sequence |
| ALT | The list of alternative alleles at this position |
| DP | Read depth |
| AD | Read depth for each allele |
| AF | Allele frequency for each ALT allele in the same order as listed (use this when estimated from primary data, not called genotypes) |
| HGVS | Corresponding amino acid substitution |
| Nt | Referring to nucleotide mutations |
Each row in the table (used as an internal structure) corresponds to a single lineage.
| Lineage | Total number of lineage’s characteristic mutations | Nt (freebayes) | Nt (mpileup) | Nt (gatk) |
|---|---|---|---|---|
| | 17 | 9 | 7 | 7 |
| | 21 | 10 | 15 | 4 |
| | 19 | 5 | 8 | 8 |
The columns correspond to the different number of SARS-CoV-2 mutations (Nt) that are captured in the sample.
Difference between the number of common mutations across the three variant callers.
| Lineage | \|Nt (freebayes) − Nt (gatk)\| | \|Nt (freebayes) − Nt (mpileup)\| | \|Nt (gatk) − Nt (mpileup)\| |
|---|---|---|---|
| | 2 | 2 | 0 |
| | 6 | 5 | 11 |
| | 3 | 3 | 0 |
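The pairwise differences in this table can be reproduced from the per-caller counts with a few lines of Python. This is only a minimal sketch: the function name `pairwise_abs_diff` and the dictionary layout are illustrative assumptions, not part of lineagespot, and the counts are those of the first row of the lineage table.

```python
from itertools import combinations

# Mutation counts (Nt) per variant caller for one lineage, copied from the
# first row of the lineage table (the lineage label is not shown there).
nt_counts = {"freebayes": 9, "mpileup": 7, "gatk": 7}


def pairwise_abs_diff(counts):
    """Return {(caller_a, caller_b): |Nt_a - Nt_b|} for every caller pair.

    Hypothetical helper for illustration; callers are paired in
    alphabetical order so the keys are predictable.
    """
    return {
        (a, b): abs(counts[a] - counts[b])
        for a, b in combinations(sorted(counts), 2)
    }


diffs = pairwise_abs_diff(nt_counts)
for (a, b), d in diffs.items():
    print(f"|Nt {a} - Nt {b}| = {d}")
```

Applying the same helper to the second row (10, 15 and 4 mutations for freebayes, mpileup and gatk) gives the differences 6, 5 and 11 listed in the table.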
Figure 1. Evolution of mutations across different low frequency parameters. (A) Density plot of the absolute difference values between the number of common mutations of the three variant calling tools used (pairwise comparisons). (B) Number of reads for each replicate and for the common mutations. (C) The corresponding allele frequency for each replicate and for the common mutations.
Summary table of the three output files produced by the freebayes, mpileup and GATK mutect2 variant callers.
| Variant calling comparison | Number of differences | Max absolute Nt difference |
|---|---|---|
| | 3791 | 31 |
| | 3140 | 33 |
| | 1571 | 31 |
The three variant callers are compared in pairs.
Snapshot of the table containing every mutation per sample, along with the corresponding gene and amino acid substitution.
| CHROM | POS | REF | ALT | DP | AD alt | Gene name | HGVS | AF | Sample |
|---|---|---|---|---|---|---|---|---|---|
| NC_045512.2 | 326 | T | A | 7 | 1 | ORF1ab | T21I | 0.143 | Sample A |
| NC_045512.2 | 378 | T | C | 10 | 1 | ORF1ab | V38A | 0.100 | Sample A |
| NC_045512.2 | 408 | A | T | 10 | 1 | ORF1ab | D48V | 0.100 | Sample B |
| NC_045512.2 | 433 | T | C | 10 | 2 | ORF1ab | V56V | 0.200 | Sample C |
| NC_045512.2 | 442 | C | T | 10 | 1 | ORF1ab | G59G | 0.100 | Sample C |
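In the rows shown, the AF column equals the ALT read depth (AD alt) divided by the total read depth (DP). The sketch below checks that relationship on the snapshot rows. It assumes this ratio is how AF is derived, which is consistent with the table but not stated in it, and the helper name `allele_frequency` is hypothetical.

```python
# Rows copied from the snapshot table: (POS, REF, ALT, DP, AD_alt).
rows = [
    (326, "T", "A", 7, 1),
    (378, "T", "C", 10, 1),
    (408, "A", "T", 10, 1),
    (433, "T", "C", 10, 2),
    (442, "C", "T", 10, 1),
]


def allele_frequency(ad_alt, dp):
    """Fraction of reads supporting the ALT allele; 0.0 when depth is zero.

    Assumption: AF = AD_alt / DP, as suggested by the values in the table.
    """
    return ad_alt / dp if dp > 0 else 0.0


# Map each position to its allele frequency, rounded to three decimals
# to match the precision used in the table.
afs = {pos: round(allele_frequency(ad, dp), 3) for pos, _, _, dp, ad in rows}

for pos, af in afs.items():
    print(pos, af)
```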
Figure 2. Unsupervised mutation clustering was performed on a table containing all amino acid substitutions. (A) Hierarchical clustering of the collapsed amino acid substitutions, using the Euclidean distance as the distance metric and ward.D as the clustering method. (B) Hierarchical clustering based on cluster 1 of (A). The heatmap shows the mutation evolution across the different periods. (C) Number of mutations per gene across the different periods. The values of the plot were normalized based on the length of each gene.
Figure 3. Clustering amino acid substitutions for the Alpha (B.1.1.7) and the Beta (B.1.351) variants. The heatmap displays the corresponding allele frequency (AF) of each period per amino acid substitution. (A) Evolution of B.1.1.7-detected mutations. (B) Evolution of B.1.351-detected mutations. Positions with low coverage (less than 20 reads) are depicted in dark gray.
Allele frequency metrics computed for the comparison with the clinical data.
| Time period | Average allele frequency of variants’ mutations | | Average allele frequency of the unique variants’ mutations | | Minimum allele frequency of the present (non-zero) unique variant’s mutations | |
|---|---|---|---|---|---|---|
| 2–14 Dec. 2020 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 5–11 Feb. 2021 | 32.71 | 0.00 | 35.15 | 0.00 | 16.67 | 0.00 |
| 12–18 Feb. 2021 | 51.92 | 9.05 | 49.20 | 0.21 | 39.53 | 1.47 |
| 19–25 Feb. 2021 | 79.78 | 22.15 | 85.84 | 14.20 | 58.06 | 35.48 |
| 26 Feb.–4 Mar. 2021 | 78.18 | 11.11 | 77.51 | 0.00 | 31.88 | 0.00 |
| 5–11 Mar. 2021 | 95.54 | 11.11 | 96.28 | 0.00 | 76.71 | 0.00 |
| 12–18 Mar. 2021 | 94.12 | 11.11 | 92.31 | 0.00 | 100.0 | 0.00 |
| 19–25 Mar. 2021 | 92.58 | 11.41 | 91.14 | 0.56 | 92.75 | 3.92 |
| 26 Mar.–1 Apr. 2021 | 74.09 | 11.51 | 74.27 | 0.63 | 89.49 | 4.44 |
| 2–8 Apr. 2021 | 96.89 | 12.30 | 95.98 | 1.56 | 92.14 | 2.03 |
| 9–15 Apr. 2021 | 86.20 | 11.10 | 81.99 | 0.00 | 77.15 | 0.00 |
| 16–22 Apr. 2021 | 74.32 | 10.97 | 74.50 | 0.00 | 89.22 | 0.00 |
| 23–29 Apr. 2021 | 81.60 | 11.11 | 83.81 | 0.00 | 93.41 | 0.00 |
| 30 Apr.–6 May 2021 | 93.54 | 11.11 | 91.55 | 0.00 | 93.85 | 0.00 |
Three metrics were calculated for each time period to quantify the presence of each lineage: (i) the “Average allele frequency of the mutations”, which is the average allele frequency of all amino acid mutations of a lineage; (ii) the “Average allele frequency of the unique mutations”, which is the average allele frequency of the amino acid mutations of a lineage that are unique (not shared with another lineage); and (iii) the “Minimum allele frequency of the present (non-zero) unique mutations”, which is the minimum allele frequency among the unique mutations of a lineage that have a non-zero frequency.
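The three metrics can be illustrated with a short Python sketch. The function `lineage_metrics`, the mutation labels and the allele frequencies below are hypothetical and chosen only for illustration. The sketch also assumes that a lineage mutation not observed in a sample is recorded with an allele frequency of 0, and that the minimum is reported as 0 when no unique mutation is present.

```python
from statistics import mean


def lineage_metrics(af_by_mutation, unique_mutations):
    """Compute the three presence metrics for one lineage and time period.

    af_by_mutation: {mutation label: allele frequency in percent}, with
        absent mutations stored as 0.0 (assumption).
    unique_mutations: labels of mutations not shared with another lineage.
    """
    all_afs = list(af_by_mutation.values())
    unique_afs = [
        af for mut, af in af_by_mutation.items() if mut in unique_mutations
    ]
    nonzero_unique = [af for af in unique_afs if af > 0]
    return {
        "mean_af": mean(all_afs) if all_afs else 0.0,
        "mean_af_unique": mean(unique_afs) if unique_afs else 0.0,
        "min_nonzero_af_unique": min(nonzero_unique) if nonzero_unique else 0.0,
    }


# Hypothetical allele frequencies (percent) for four mutations of a lineage.
sample = {"mut_a": 80.0, "mut_b": 40.0, "mut_c": 0.0, "mut_d": 60.0}
unique = {"mut_a", "mut_b", "mut_c"}  # mut_d is shared with another lineage

metrics = lineage_metrics(sample, unique)
print(metrics)
```

In this toy input the absent unique mutation (`mut_c`) lowers the average over unique mutations but is excluded from the minimum, which is why the third metric is restricted to non-zero values.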
Figure 4. Comparison between wastewater samples and clinical data. (A) SARS-CoV-2 lineages detected in clinical samples over all time periods. (B) The percentage of presence of the Alpha (B.1.1.7) variant of concern (VoC) in the clinical samples and the estimated minimum level of presence of the same VoC in the wastewater data. (C) Average percentage of the presence of each characteristic mutation of the B.1.1.7 variant of concern. The line corresponds to the average value per time period. Mutations that are not found at a particular time point are detected in the neighbouring ones, which leads to the variations in the average.
Main quality characteristics of wastewater samples.
| Time period | pH | Electrical conductivity (S/cm) | Total suspended solids (mg/L) | BOD5 (mg/L) | COD (mg/L) | Dissolved organic carbon (mg/L) | UV absorption at 254 nm (1/cm) | Total nitrogen (mg/L) | Ammonium nitrogen (mg/L) | Total phosphorus (mg/L) | Copies/μL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2–14 Dec. 2020 | 7.5 | 8.5 | 620 | 385 | 960 | 35 | 0.35 | 62 | 28.5 | 11.5 | 36 |
| 5–11 Feb. 2021 | 7.8 | 9.6 | 930 | 525 | 1250 | 49 | 0.4 | 76 | 33 | 11.5 | 68 |
| 12–18 Feb. 2021 | 7.8 | 4.6 | 1200 | 650 | 1570 | 44 | 0.45 | 95 | 38 | 12 | 53 |
| 19–25 Feb. 2021 | 7.9 | 3.5 | 1225 | 684 | 1610 | 56 | 0.49 | 95 | 38.2 | 15.2 | 82 |
| 26 Feb.–4 Mar. 2021 | 7.8 | 2.9 | 1225 | 535 | 1383 | 53.5 | 0.47 | 78.5 | 36.7 | 12.4 | 179 |
| 5–11 Mar. 2021 | 7.6 | 2.8 | 1017 | 540 | 1373 | 50.2 | 0.47 | 71.7 | 36.8 | 12.1 | 102 |
| 12–18 Mar. 2021 | 7.8 | 4.5 | 852 | 580 | 1285 | 66.3 | 0.48 | 76.4 | 37.4 | 11.7 | 277 |
| 19–25 Mar. 2021 | 7.6 | 4.1 | 926 | 582 | 1467 | 60.8 | 0.48 | 79.3 | 39.6 | 11.6 | 467 |
| 26 Mar.–1 Apr. 2021 | 7.6 | 3.4 | 1095 | 660 | 1708 | 52.1 | 0.44 | 85.9 | 41.4 | 12.3 | 494 |
| 2–8 Apr. 2021 | 7.6 | 4.2 | 1054 | 667 | 1537 | 52 | 0.48 | 88.1 | 40.5 | 13.2 | 498 |
| 9–15 Apr. 2021 | 7.7 | 4.1 | 1025 | 579 | 1464 | 55.6 | 0.5 | 80 | 32.1 | 11.4 | 505 |
Figure 5. Snapshots of the intermediate steps. (A) A summary plot showing the overall process from the sampling to (B) A VCF file produced by the chosen variant caller. (C) SARS-CoV-2 characteristic substitutions, either retrieved from a public source (such as Pangolin or outbreak.info) or user-provided. (D) A tab-delimited file as produced by .