Bastien Paré, Marieke Rozendaal, Sacha Morin, Léa Kaufmann, Shawn M Simpson, Raphaël Poujol, Fatima Mostefai, Jean-Christophe Grenier, Henry Xing, Miguelle Sanchez, Ariane Yechouron, Ronald Racette, Julie G Hussin, Guy Wolf, Ivan Pavlov, Martin A Smith.
Abstract
The first confirmed case of COVID-19 in Quebec, Canada, occurred at Verdun Hospital on February 25, 2020. A month later, a localized outbreak was observed at this hospital. We performed tiled amplicon whole genome nanopore sequencing on all SARS-CoV-2 positive nasopharyngeal swab samples collected from 31 March to 17 April 2020 in two local hospitals to assess viral diversity (unknown at the time in Quebec) and potential associations with clinical outcomes. We report 264 viral genomes from 242 individuals (both staff and patients) with associated clinical features and outcomes, as well as longitudinal samples and technical replicates. Viral lineage assessment identified multiple subclades in both hospitals, with a predominant subclade in the Verdun outbreak, indicative of hospital-acquired transmission. Dimensionality reduction identified two subclades with mutations of clinical interest, namely in the Spike protein, that evaded supervised lineage assignment tools, including Pangolin and NextClade. We also report that certain symptoms (headache, myalgia and sore throat) are significantly associated with favorable patient outcomes. Our findings demonstrate the strength of unsupervised, data-driven analyses whilst suggesting that caution should be used when employing supervised genomic workflows, particularly during the early stages of a pandemic.
Year: 2021 PMID: 34855869 PMCID: PMC8638998 DOI: 10.1371/journal.pone.0260714
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Cohort summary.
| Characteristic | n | % |
|---|---|---|
| Male | 79 | 32.6 |
| Female | 163 | 66.9 |
| Employee | 134 | 55.4 |
| Hospitalized | 87 | 36 |
| **Symptom status** | | |
| Symptomatic | 205 | 84.7 |
| Presymptomatic | 21 | 8.7 |
| Asymptomatic | 16 | 6.6 |
| **Reason for hospitalization (n = 87)** | | |
| COVID-19 | 54 | 62.1 |
| Other | 33 | 37.9 |
| **Outcome** | | |
| Deceased | 23 | 9.5 |
| Survival or unknown outcome | 219 | 90.5 |
| **Age (years)** | | |
| 0–30 | 27 | 11.2 |
| 31–60 | 134 | 55.3 |
| 61–104 | 81 | 33.5 |
| **Number of comorbidities** | | |
| 0 | 187 | 77.3 |
| 1 | 7 | 2.9 |
| 2 | 26 | 10.7 |
| 3 | 16 | 6.6 |
| 4+ | 6 | 2.5 |

* Includes individuals with no reported comorbidities.
Fig 1. Viral genome sequencing of 264 SARS-CoV-2 samples with Oxford Nanopore.
(A) Cumulative distribution of genome completeness using the ARTIC bioinformatics SOP (see methods). The dashed vertical line corresponds to the 80% completeness threshold used for phylogenetic reconstruction. (B) Relationship between CN score at diagnosis, as measured by the Abbott RealTime M2000rt device (higher CN = lower viral load), and genome completeness. (C) Relationship between the number of quality-passed reads, filtered using the ARTIC bioinformatics SOP, and genome completeness.
Fig 2. Genomic features in relation to RNA abundance at diagnosis.
Average variant allele frequency (VAF) as a function of RNA abundance (left). Genome completeness as a function of RNA abundance (right). Outliers are highlighted in orange. CN = Cycle Number.
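As a point of reference for the quantity plotted on the left, the sketch below uses the standard definition of VAF as the fraction of reads at a site that support the variant allele, then averages across sites. The function names and the `(alt_reads, total_reads)` input layout are illustrative assumptions; the caption does not specify how the authors computed the per-sample average.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads at one site that support the variant allele.

    Returns None when the site has no coverage (hypothetical convention).
    """
    if total_reads == 0:
        return None
    return alt_reads / total_reads


def mean_vaf(sites):
    """Average VAF over (alt_reads, total_reads) pairs, skipping uncovered sites."""
    vafs = []
    for alt_reads, total_reads in sites:
        vaf = variant_allele_frequency(alt_reads, total_reads)
        if vaf is not None:
            vafs.append(vaf)
    if not vafs:
        return None
    return sum(vafs) / len(vafs)
```

For example, `mean_vaf([(90, 100), (50, 100)])` averages per-site frequencies of 0.9 and 0.5.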
Fig 3. Genomic and clinical features of 234 SARS-CoV-2 infections.
(Top) Maximum likelihood phylogenetic tree reconstruction of de novo assembled genomes spanning at least 80% of the SARS-CoV-2 reference genome (Wuhan-Hu-1) covered by ≥20 reads and annotated with clinical features of interest. The phylogeny was calculated from a multiple sequence alignment generated with MAFFT [19] using MEGA [20] and visualised with Iroki [21]. Lineage classification was performed with Pangolin 2.1.10 [12] (outer circles) and haplotype assignment was based on the 20 most common variants in GISAID [18] from the first wave of the pandemic (inner circles). (Bottom) Haplotype diversity over time across two local hospitals. mVAF: Median variant allele frequency, CN: Cycle number.
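The inclusion criterion in this caption (at least 80% of the reference genome covered by ≥20 reads) can be sketched as a simple filter over per-position read depths. This is a minimal illustration, not the ARTIC pipeline itself; the function names and the flat list of depths are assumptions made for the example.

```python
def genome_completeness(depths, min_depth=20):
    """Fraction of reference positions covered by at least `min_depth` reads."""
    if not depths:
        return 0.0
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


def passes_completeness_threshold(depths, min_depth=20, min_completeness=0.80):
    """True when the genome meets the completeness cut-off named in the caption."""
    return genome_completeness(depths, min_depth) >= min_completeness
```

Here `depths` would hold one read count per reference position; a genome with 8 of 10 positions at depth 20 or more sits exactly on the 80% threshold and is retained.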
Fig 4. PHATE embeddings of genomic and clinical features.
Two-dimensional PHATE embeddings of the genomes (Top) and of the clinical features (Bottom). Each marker represents one patient and the embedding location of a given patient indicates feature similarity with surrounding samples as well as dissimilarity with distant ones. The embeddings are unsupervised and the labels of interest are used for coloring only. Specifically, mortality and comorbidity likelihoods were computed using MELD [23], a graph signal processing tool used to smooth a binary variable on the patient-patient graph to determine which regions of its underlying data manifold are enriched or depleted in patients with a specific outcome.
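To give an intuition for smoothing a binary variable on a patient-patient graph, the sketch below repeatedly blends each patient's value with the mean of its neighbors. This is a deliberately simple neighbor-averaging filter standing in for the idea; it is not MELD's actual algorithm or API, and the adjacency-list input, `alpha`, and `steps` are hypothetical parameters chosen for illustration.

```python
def smooth_on_graph(adjacency, labels, alpha=0.5, steps=10):
    """Smooth binary labels over a graph by iterative neighbor averaging.

    adjacency: list where adjacency[i] holds the neighbor indices of node i.
    labels:    list of 0/1 outcomes, one per node (e.g. deceased or not).
    alpha:     weight given to the neighbor mean at each step (assumed value).
    steps:     number of smoothing iterations (assumed value).
    """
    values = [float(label) for label in labels]
    for _ in range(steps):
        updated = []
        for node, neighbors in enumerate(adjacency):
            if neighbors:
                neighbor_mean = sum(values[n] for n in neighbors) / len(neighbors)
                updated.append((1 - alpha) * values[node] + alpha * neighbor_mean)
            else:
                # Isolated nodes keep their own value.
                updated.append(values[node])
        values = updated
    return values
```

After smoothing, nodes in a densely connected region where the outcome is common end up with high values even if their own label is 0, which is the sense in which a region of the graph is "enriched" for an outcome.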
Fig 5. Longitudinal sequencing of SARS-CoV-2 positive subjects.
Multiple samples for the same individual at different time points are linked by a black line. The size of the points represents the CN score at diagnosis (with black edges corresponding to scores ≤ 20) and the opacity represents the mean variant allele frequency (mVAF). Individual 8 was sampled twice on the same day. Asterisks indicate consensus genomes with <80% completeness.