Carlos Farkas (1,2), Francisco Fuentes-Villalobos (3), Jose Luis Garrido (4), Jody Haigh (1,2), María Inés Barría (3)
Abstract
Here we aim to describe early mutational events across samples from publicly available SARS-CoV-2 sequences in the Sequence Read Archive and GenBank repositories. As of 27 March 2020, we had downloaded 50 Illumina datasets, mostly from China, the USA (Washington State) and Australia (Victoria). A total of 30 datasets (60%) contain at least one founder mutation, and most of the variants are missense (over 63%). Five point mutations with a clonal (founder) effect were found in the USA next-generation sequencing samples. North American sequences in GenBank (as of 22 April 2020) carry this signature at allele frequencies of up to 39% (n = 1,359). Australian variant signatures were more diverse than those of the USA samples, but clonal events were still found in them. Mutations in the helicase encoded by the ORF1ab gene, among other regions, were predominant, suggesting that these regions are actively evolving. Finally, we firmly urge that primer sets for diagnosis be carefully designed, since rapidly occurring variants would affect the performance of reverse transcription quantitative PCR (RT-qPCR) based viral testing.
Keywords: COVID-19; Coronavirus; Founder effect; Mutations; RT-qPCR; SARS-CoV-2; SNVs; Viral genetic drift
Year: 2020 PMID: 32509472 PMCID: PMC7246029 DOI: 10.7717/peerj.9255
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
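The abstract attributes most mutations to the helicase encoded by ORF1ab. A minimal Python sketch of the first step of such an attribution, mapping a reference position to the annotated gene(s) that cover it, is shown below. It assumes 1-based coordinates on NC_045512.2; the gene intervals are transcribed from our recollection of the RefSeq annotation (an assumption to verify against the GenBank record), and the lookup works at gene level only, so assigning a variant to the helicase itself would also need the ORF1ab mature-peptide coordinates.

```python
# Gene intervals on NC_045512.2 (1-based, inclusive). ASSUMPTION: recalled
# from the RefSeq annotation; check against the GenBank record before use.
GENES = [
    ("ORF1ab", 266, 21555),
    ("S", 21563, 25384),
    ("ORF3a", 25393, 26220),
    ("E", 26245, 26472),
    ("M", 26523, 27191),
    ("ORF6", 27202, 27387),
    ("ORF7a", 27394, 27759),
    ("ORF7b", 27756, 27887),
    ("ORF8", 27894, 28259),
    ("N", 28274, 29533),
    ("ORF10", 29558, 29674),
]


def genes_at(pos):
    """Return every annotated gene whose interval covers a 1-based position."""
    return [name for name, start, end in GENES if start <= pos <= end]


# The five USA-WA founder positions listed in the allele frequency table.
for pos in (8782, 17747, 17858, 18060, 28144):
    print(pos, genes_at(pos) or ["intergenic"])
```

A linear scan is enough for eleven intervals. The function returns a list rather than a single name because some ORFs overlap (ORF7a and ORF7b share a few bases).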
Figure 1. Inspection of variants reveals well-defined signatures with founder effect across sequenced samples.
(A) (Upper) Plot of all merged variants from NGS datasets (n = 50), showing single nucleotide variants (black dots) and indels (black squares) along SARS-CoV-2 nucleotide positions. (Lower) Snapshot of SARS-CoV-2 ORFs and receptor binding domains; the user track shows merged NGS variants across ORFs. (B) Plot of founder variants sorted by country (China = red, USA = blue, Australia = green) and by number of variants, from left to right. Deletions and stop codons are framed with black rectangles. (C) IGV screenshots of coverage from USA samples (n = 13) aligned against the SARS-CoV-2 reference genome. Founder variants are shown as colored lines. (D) Same as (C) for Australian samples (n = 11). (E) Allele frequency (as a percentage) of founder variants collected from GenBank SARS-CoV-2 sequence alignments from Asia (n = 270), Europe (n = 50) and North America (n = 1,359). Asterisks denote founder variants common to the three regions that are part of the USA-WA signature (positions 8,782 and 28,144). (F) (Upper) Variant consequence classification of GenBank founder variants obtained with the variant annotator integrator tool. Missense and synonymous variant consequences per ORF are shown as red and grey bars, respectively. (Lower) Schematic diagram of the general genetic composition of SARS-CoV-2. Colored boxes correspond to main genes and white boxes to smaller ORFs.
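Panel A merges the variants called in each NGS dataset into one set. One simple way to sketch that merge, assuming each dataset's calls have already been reduced to (position, ref, alt) tuples, is to count how many datasets carry each variant. The sample names and calls below are made up, and the `min_samples` cutoff is a hypothetical parameter; this illustrates merging and recurrence counting, not the study's criterion for calling a founder variant.

```python
from collections import Counter

# Hypothetical per-dataset calls as (position, ref, alt); illustrative only.
calls = {
    "sample_A": {(8782, "C", "T"), (28144, "T", "C"), (1200, "G", "A")},
    "sample_B": {(8782, "C", "T"), (28144, "T", "C")},
    "sample_C": {(3000, "A", "G")},
}


def merge_variants(per_sample):
    """Union all calls and count how many datasets carry each variant."""
    counts = Counter()
    for variants in per_sample.values():
        counts.update(variants)  # a set holds each variant once per dataset
    return counts


def recurrent(counts, min_samples=2):
    """Variants seen in at least min_samples datasets, sorted by position."""
    return sorted(v for v, n in counts.items() if n >= min_samples)


merged = merge_variants(calls)
print(sorted(merged))      # merged variant set along the genome
print(recurrent(merged))   # variants shared by two or more datasets
```

Storing each dataset's calls as a set keeps a variant from being counted twice for the same sample, so the count is the number of datasets, not the number of records.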
Allele frequencies of five detected variants in 1,359 GenBank sequences from North America.
| POS | REF | ALT | INFO | AF |
|---|---|---|---|---|
| 8782 | C | T | DP = 302; DP4 = 246,0,161,0 | 39.55774 |
| 17747 | C | T | DP = 302; DP4 = 272,0,135,0 | 33.16953 |
| 17858 | A | G | DP = 302; DP4 = 273,0,134,0 | 32.92383 |
| 18060 | C | T | DP = 302; DP4 = 271,0,136,0 | 33.41523 |
| 28144 | T | C | DP = 303; DP4 = 246,0,161,0 | 39.55773 |
Note:
POS, position in the SARS-CoV-2 reference sequence NC_045512.2; REF, reference allele; ALT, mutant allele; INFO, variant-call annotations; DP, read depth; DP4, number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases; AF, allele frequency (%).
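The AF column is consistent with the alternate allele fraction computed from DP4, that is, (alt forward + alt reverse) / (ref forward + ref reverse + alt forward + alt reverse) × 100; for position 8782, 161 / (246 + 161) × 100 ≈ 39.55774. The short Python sketch below recomputes the column from the DP4 strings in the table. It is a consistency check under that assumption, not a description of the authors' pipeline.

```python
def af_from_dp4(dp4):
    """Alternate allele frequency (%) from 'ref_fwd,ref_rev,alt_fwd,alt_rev'."""
    ref_f, ref_r, alt_f, alt_r = (int(x) for x in dp4.split(","))
    ref, alt = ref_f + ref_r, alt_f + alt_r
    if ref + alt == 0:
        raise ValueError("DP4 has no high-quality bases")
    return 100.0 * alt / (ref + alt)


# DP4 values copied from the allele frequency table.
rows = {
    8782: "246,0,161,0",
    17747: "272,0,135,0",
    17858: "273,0,134,0",
    18060: "271,0,136,0",
    28144: "246,0,161,0",
}
for pos, dp4 in rows.items():
    print(pos, round(af_from_dp4(dp4), 5))
```

Rounded to five decimals, this reproduces every AF value in the table; the 28144 row in the table (39.55773) differs from the recomputed value (39.55774) by one unit in the last decimal place.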
Figure 2. Classification of SARS-CoV-2 variants from next-generation sequencing datasets.
(A) Ratio of synonymous to missense variants across USA samples. Missense variants are shown in black and synonymous variants in grey. N indicates the total number of classified variants. (B) Same as (A) for Australian samples. (C) Classification of missense variants from USA samples across SARS-CoV-2 genes. Each color indicates a different gene, as listed in the legend to the right of the pie charts. (D) Same as (C) for Australian samples.
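The missense/synonymous split in panels A and B comes down to whether a single-nucleotide change alters the encoded amino acid. The sketch below illustrates that rule with the standard genetic code. It is not the annotation tool used in the study; it assumes the reference codon and the variant's offset within it (0, 1 or 2) are already known, so it ignores indels, frame lookup and overlapping reading frames. The example codons are arbitrary.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def classify_snv(codon, offset, alt):
    """Classify a substitution at `offset` (0-2) of a reference codon."""
    codon = codon.upper()
    mutant = codon[:offset] + alt.upper() + codon[offset + 1:]
    before, after = CODE[codon], CODE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "stop_gained"
    if before == "*":
        return "stop_lost"
    return "missense"


# Arbitrary example substitutions: (reference codon, offset, alt base).
examples = [("TTA", 1, "C"), ("AGC", 2, "T"), ("GCT", 0, "T")]
print(Counter(classify_snv(*e) for e in examples))
```

Tallying the labels per sample, as in the last line, gives the kind of synonymous/missense ratio plotted in the pie charts.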
Figure 3. SNPs in SARS-CoV-2 may diminish the efficiency of RT-qPCR testing.
(Top) IGV screenshot of coverage from Australian sample SRR11397719 aligned against the SARS-CoV-2 reference genome. A low allele frequency variant is shown in red (T). Primer tracks are shown at the top of the screenshot, along with SARS-CoV-2 GFF gene models. (Bottom) Same as top for sample SRR11397719, showing a founder variant in blue.
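Figure 3 shows variants falling under primer binding sites. A minimal Python sketch of that screen checks which variant positions fall inside each primer's footprint and reports their distance from the primer's 3' end, since mismatches near the 3' end are generally considered the most disruptive to extension. The primer names and coordinates below are hypothetical placeholders, not a real diagnostic assay; coordinates are assumed to be 1-based and inclusive on the reference.

```python
from dataclasses import dataclass


@dataclass
class Primer:
    name: str
    start: int   # 1-based, inclusive, reference coordinates
    end: int
    strand: str  # "+" for a forward primer, "-" for a reverse primer


def primer_hits(primers, variant_positions):
    """Yield (primer name, position, distance from the primer's 3' end)."""
    for p in primers:
        for pos in variant_positions:
            if p.start <= pos <= p.end:
                # A reverse primer's 3' end is its lowest reference coordinate.
                three_prime = p.end if p.strand == "+" else p.start
                yield p.name, pos, abs(three_prime - pos)


# Hypothetical assay; names and coordinates are made up for illustration.
assay = [Primer("demo_F", 28100, 28120, "+"),
         Primer("demo_R", 28180, 28200, "-")]
for hit in primer_hits(assay, [28115, 28144, 28181]):
    print(hit)
```

Taking strand into account matters here: the same position can sit at the 5' tail of one primer and right at the 3' end of another.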