Sung Yong Park, Gina Faraci, Pamela M Ward, Jane F Emerson, Ha Youn Lee.
Abstract
Global COVID-19 cases had climbed to more than 33 million, with over a million total deaths, as of September 2020. Real-time, large-scale SARS-CoV-2 whole genome sequencing is key to tracking chains of transmission and estimating the origin of disease outbreaks. Yet no method has simultaneously achieved high precision, a simple workflow, and low cost. We developed a high-precision, cost-efficient SARS-CoV-2 whole genome sequencing platform for COVID-19 genomic surveillance, CorvGenSurv (Coronavirus Genomic Surveillance). CorvGenSurv directly amplified viral RNA from COVID-19 patients' nasopharyngeal/oropharyngeal (NP/OP) swab specimens and sequenced the SARS-CoV-2 whole genome in three segments by long-read, high-throughput sequencing. Sequencing the whole genome in three segments significantly reduced sequencing data waste, thereby preventing dropouts in genome coverage. We validated the precision of our pipeline by both control genomic RNA sequencing and Sanger sequencing. We produced near full-length whole genome sequences from individuals who tested positive for COVID-19 between April and June 2020 in Los Angeles County, California, USA. These sequences were highly diverse within the G clade, with nine novel amino acid mutations including NSP12-M755I and ORF8-V117F. With its readily adaptable design, CorvGenSurv grants wide access to genomic surveillance, permitting immediate public health response to sudden threats.
Year: 2021 PMID: 34211026 PMCID: PMC8249533 DOI: 10.1038/s41598-021-93145-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. CorvGenSurv’s workflow and precision. (a) Remnant NP/OP specimens from COVID-19 diagnostic testing were subjected to SARS-CoV-2 RNA extraction. Viral RNA was amplified via three overlapping RT-PCRs (each ~10,000 bases long), and the pooled SARS-CoV-2 amplicons of indexed COVID-19 specimens were then sequenced by long-read, high-throughput single-molecule sequencing. The output fasta file was de-multiplexed and processed to produce the consensus sequence of each segment. Each COVID-19 specimen’s three overlapping segments were assembled into a SARS-CoV-2 whole genome sequence. (b) CorvGenSurv’s precision was tested by comparing the consensus sequence built from a given number of reads with the USA-WA1/2020 control strain (GenBank: MN985325.1). When a consensus sequence was built from three reads, only 67% [52.4–78.9%] of the consensus sequences from 1000 bootstrap runs matched the correct sequence. When the number of reads was greater than or equal to 31, all 1000 bootstrap runs produced the correct sequence.
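The bootstrap test in panel (b) can be illustrated with a minimal sketch. It assumes reads are already aligned to equal length, uses a simple per-position majority vote as the consensus rule, and generates synthetic reads with substitution errors only. The function names, the 5% error rate, and the error model are hypothetical choices made for illustration and are not taken from the CorvGenSurv pipeline.

```python
import random
from collections import Counter


def consensus(reads):
    """Majority-vote consensus of pre-aligned, equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def bootstrap_accuracy(reads, reference, n_reads, n_runs=1000, seed=0):
    """Fraction of bootstrap runs whose consensus equals the reference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        sample = rng.choices(reads, k=n_reads)  # resample with replacement
        if consensus(sample) == reference:
            hits += 1
    return hits / n_runs


def simulate_reads(reference, n, error_rate, seed=1):
    """Synthetic reads with substitution errors only (hypothetical error model)."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n):
        bases = [
            rng.choice([b for b in "ACGT" if b != base])
            if rng.random() < error_rate
            else base
            for base in reference
        ]
        reads.append("".join(bases))
    return reads


if __name__ == "__main__":
    # Synthetic 300-base "control" sequence and a pool of noisy reads.
    ref = "".join(random.Random(42).choices("ACGT", k=300))
    pool = simulate_reads(ref, n=500, error_rate=0.05)
    for k in (3, 11, 31):
        print(k, bootstrap_accuracy(pool, ref, n_reads=k, n_runs=200))
```

Under this toy model the fraction of correct consensus sequences rises with the number of reads per consensus, which is the qualitative behavior the caption reports. The specific percentages in the caption depend on the real read error profile and are not reproduced here.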
Table 1. Amino acid mutations, clade and lineage of 25 whole genome sequences from COVID-19 remnant NP/OP specimens.
| Specimen ID | Collection date | Ct-1 | Ct-2 | Amino acid mutations | GISAID clade (Pango lineage) |
|---|---|---|---|---|---|
| USA/CA-LAC-USC1 | 4/13/20 | 18.58 | 18.72 | NSP12-P323L | GR (B.1.1) |
| USA/CA-LAC-USC2 | 4/13/20 | 24.84 | 25.44 | NSP3-G1011X, | GR (B.1.1) |
| USA/CA-LAC-USC3 | 4/14/20 | 24.18 | 24.96 | NSP2-T85I, NSP12-P323L, NSP14-G44D, NSP16-S33R, S-D614G, S-D1184N, ORF3a-Q57H | GH (B.1) |
| USA/CA-LAC-USC4 | 5/5/20 | 20.06 | 20.46 | NSP2-T85I, NSP12-P323L, S-D614G, ORF3a-Q57H | GH (B.1) |
| USA/CA-LAC-USC5 | 5/6/20 | 24.94 | 25.09 | NSP3-G638S, NSP5-K90R, NSP6-Q208H, NSP12-P323L, NSP14-V287F, | GR (B.1.1.61) |
| USA/CA-LAC-USC6 | 5/9/20 | 28.25 | 28.48 | NSP12-T85I, NSP12-P323L, NSP16-A188S, S-D614G, N-R203K, N-G204R | GR (B.1.1) |
| USA/CA-LAC-USC7 | 5/9/20 | 23.05 | 23.19 | NSP3-D1214N, NSP12-P323L, S-D614G, N-R203K, N-G204R | GR (B.1.1) |
| USA/CA-LAC-USC8 | 5/18/20 | 17.34 | 17.10 | NSP2-T85I, NSP12-P323L, S-D614G, S-P812L, ORF3a-Q57H, M-A68S, N-G34W | GH (B.1) |
| USA/CA-LAC-USC9 | 6/1/20 | 20.26 | 20.46 | NSP2-K110N, NSP2-P191S, NSP12-P323L, S-D614G, ORF7a-H73R, N-S194L | G (B.1.397) |
| USA/CA-LAC-USC10 | 6/2/20 | 23.34 | 23.69 | NSP2-K110N, NSP2-P191S, NSP12-P323L, S-D614G, N-S194L | G (B.1.397) |
| USA/CA-LAC-USC11 | 6/4/20 | 19.21 | 19.04 | NSP12-P323L, S-D614G, ORF8-I47F, N-R203K, N-G204R | GR (B.1.1.172) |
| USA/CA-LAC-USC12 | 6/5/20 | 16.79 | 16.42 | NSP3-Q203H, NSP12-P323L, NSP15-E223G, S-D614G, | GR (B.1.1.228) |
| USA/CA-LAC-USC13 | 6/6/20 | 23.60 | 24.01 | NSP2-T85I, NSP2-A361V, NSP8-L35F, NSP12-P323L, NSP13-K460R, NSP16-M17I, S-D614G, S-K1191N, ORF3a-Q57H, ORF3a-T175I | GH (B.1.166) |
| USA/CA-LAC-USC14 | 6/8/20 | 15.61 | 15.51 | | G (B.1) |
| USA/CA-LAC-USC15 | 6/9/20 | 18.59 | 18.64 | | G (B.1) |
| USA/CA-LAC-USC16 | 6/9/20 | 26.52 | 27.18 | NSP12-P323L, S-T286I, S-A522V, S-D614G, ORF3a-Q57H | GH (B.1.110) |
| USA/CA-LAC-USC17 | 6/9/20 | 29.39 | 30.39 | NSP1-V116M, NSP2-T85I, NSP3-A231V, NSP5-L89F, NSP12-P323L, NSP16-V294F, S-D614G, ORF3a-Q57H, ORF8-S24L | GH (B.1.595) |
| USA/CA-LAC-USC18 | 6/9/20 | 17.07 | 16.83 | NSP2-K110N, NSP2-P191S, NSP12-P323L, NSP16-P236S, S-Y144X, S-D614G, N-S194L | G (B.1.397) |
| USA/CA-LAC-USC19 | 6/10/20 | 26.27 | 26.37 | NSP12-P323L, S-D614G, ORF7a-P34S, N-R203K, N-G204R | GR (B.1.1) |
| USA/CA-LAC-USC20 | 6/10/20 | 27.65 | 28.42 | NSP12-P323L, S-G142C, S-R214C, S-D614G, ORF3a-P159S, N-R203K, N-G204R, N-Q229H | GR (B.1.1.132) |
| USA/CA-LAC-USC21 | 6/11/20 | 24.27 | 24.82 | NSP2-T85I, NSP3-P108L, NSP12-P323L, S-R21I, S-Y28H, S-D614G, ORF3a-Q57H, ORF8-S24L, N-G34W | GH (B.1.336) |
| USA/CA-LAC-USC22 | 6/11/20 | 24.48 | 25.19 | NSP2-T85I, | GH (B.1) |
| USA/CA-LAC-USC23 | 6/22/20 | 29.29 | 29.45 | NSP2-T85I, NSP12-T293I, NSP12-P323L, S-V308L, S-D614G, ORF3a-Q57H, N-S183Y | GH (B.1.369) |
| USA/CA-LAC-USC24 | 6/22/20 | 28.15 | 28.19 | NSP12-P323L, NSP16-D102Y, S-D614G, N-S194L | G (B.1.558) |
| USA/CA-LAC-USC25 | 6/22/20 | 28.59 | 28.63 | NSP2-A360V, | GR (B.1.1) |
Novel amino acid mutations that were not observed among 28,176 global SARS-CoV-2 sequences in GISAID are in bold. Ct-1 targeted the ORF1/a-b non-structural region and Ct-2 a conserved region in the structural protein envelope E-gene. qRT-PCR was performed using the Roche COBAS system.
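Mutation labels in the table follow a `GENE-RefPosAlt` pattern (for example `S-D614G`). As a rough sketch, the labels can be parsed and tallied to obtain per-mutation prevalence across specimens. The regular expression and helper names below are assumptions made for illustration. The three example cells are copied from the rows for USA/CA-LAC-USC4, USC7 and USC11.

```python
import re
from collections import Counter

# GENE-<ref amino acid><position><alt amino acid>, e.g. "NSP12-P323L"
MUTATION = re.compile(r"^(?P<gene>[A-Za-z0-9]+)-(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$")


def parse_mutations(cell):
    """Split one table cell into (gene, ref, position, alt) tuples."""
    parsed = []
    for label in cell.split(","):
        label = label.strip()
        if not label:  # tolerate empty cells and trailing commas
            continue
        match = MUTATION.match(label)
        if match is None:
            raise ValueError(f"unrecognized mutation label: {label!r}")
        parsed.append((match["gene"], match["ref"], int(match["pos"]), match["alt"]))
    return parsed


def prevalence(cells):
    """Fraction of specimens carrying each mutation label."""
    counts = Counter()
    for cell in cells:
        # A set avoids double counting a label repeated within one cell.
        labels = {f"{g}-{r}{p}{a}" for g, r, p, a in parse_mutations(cell)}
        counts.update(labels)
    return {label: n / len(cells) for label, n in counts.items()}


if __name__ == "__main__":
    cells = [
        "NSP2-T85I, NSP12-P323L, S-D614G, ORF3a-Q57H",          # USC4
        "NSP3-D1214N, NSP12-P323L, S-D614G, N-R203K, N-G204R",  # USC7
        "NSP12-P323L, S-D614G, ORF8-I47F, N-R203K, N-G204R",    # USC11
    ]
    for label, freq in sorted(prevalence(cells).items()):
        print(f"{label}\t{freq:.2f}")
```

The same tally over a larger sequence set is one way to arrive at frequency thresholds such as the 2% cutoff used in Figure 2(c), although the authors' actual tooling is not described here.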
Figure 2. Maximum likelihood tree analysis and amino acid mutations of 25 SARS-CoV-2 whole genome sequences obtained by CorvGenSurv. (a) Maximum likelihood tree of 25 SARS-CoV-2 sequences obtained by CorvGenSurv along with sequences collected in California, USA. A total of 1215 SARS-CoV-2 sequences collected in California, USA were downloaded from GISAID[20,21] as of July 27th, 2020. Our sequences were obtained from 25 remnant specimens from COVID-19 testing between April 13th and June 22nd, 2020 in Los Angeles County, California, USA. Specimens collected from April to May 2020 are colored purple and those collected in June are colored blue. Sequences of specimens USA/CA-LAC-USC1 to USA/CA-LAC-USC25 in Table 1 are denoted by 1 to 25 in this tree. All 25 sequences were classified as G clade, with mutations P323L in NSP12 (RdRP) and D614G in the S protein (grey circle). Different ancestral sequences are represented by circles in different colors, with the common mutations of each lineage shown in the box. The unit branch length (one nucleotide base substitution) is denoted as “HD = 1”. (b) The amino acid mutations of each of our 25 sequences relative to Wuhan-Hu-1 (MN908947) were marked using Highlighter (https://www.hiv.lanl.gov/content/sequence/HIGHLIGHT/highlighter_top.html). The regions of NSP2, NSP12 (RdRP), S, E, M, and N are indicated by colored boxes. (c) The prevalence of each amino acid mutation with greater than 2% frequency globally, in the USA, or in California. A total of 28,176 global sequences were downloaded from GISAID[20,21].
Figure 3. SARS-CoV-2 divergence. For each of our 25 Los Angeles sequences, the number of base substitutions from the reference sequence Wuhan-Hu-1 (MN908947) was plotted against its collection time, measured in days from the collection date of the reference sequence, December 31st, 2019. The SARS-CoV-2 evolution rate was estimated by linear regression (solid line) to be 8.62 × 10⁻⁴ substitutions per site per year (95% confidence interval: 7.96 × 10⁻⁴ to 9.24 × 10⁻⁴).
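The unit conversion behind this estimate can be sketched as follows. A line is fitted to substitution counts versus days, and the slope (substitutions per genome per day) is converted to substitutions per site per year. The sketch assumes an ordinary least-squares fit with a free intercept, a 365-day year, and a genome length of 29,903 nucleotides for Wuhan-Hu-1. The source does not state these details, and the data points in the usage example are synthetic.

```python
GENOME_LENGTH = 29903  # assumed length of Wuhan-Hu-1 (MN908947), in nucleotides
DAYS_PER_YEAR = 365.0  # assumed year length


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rate_per_site_per_year(days, substitutions, genome_length=GENOME_LENGTH):
    """Convert the regression slope to substitutions per site per year."""
    slope, _ = fit_line(days, substitutions)
    return slope * DAYS_PER_YEAR / genome_length


def substitutions_per_genome_per_year(rate, genome_length=GENOME_LENGTH):
    """Inverse view: expected substitutions per genome per year at a given rate."""
    return rate * genome_length


if __name__ == "__main__":
    # Synthetic points lying exactly on a line of 0.07 substitutions per day.
    days = [100, 130, 160, 175]
    subs = [0.07 * d for d in days]
    print(rate_per_site_per_year(days, subs))
    print(substitutions_per_genome_per_year(8.62e-4))
```

Under these assumptions, the reported rate of 8.62 × 10⁻⁴ substitutions per site per year corresponds to roughly 26 substitutions per genome per year.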
Figure 4. Influenza A (H1N1) evolution and vaccination. (a) Maximum likelihood tree of 255 H1N1 hemagglutinin (HA) sequences sampled in April 2019 (blue boxes), 1140 H1N1 HA sequences sampled in January 2020 (red boxes), the 2019–2020 H1N1 Northern hemisphere vaccine strain (A/Brisbane/02/2018, purple diamond) and the 2018–2019 vaccine strain (A/Michigan/45/201, grey diamond). All HA nucleotide sequences were downloaded from GISAID[20,21]. The H1N1 HA sequences from January 2020 showed greater tree distances from the 2019–2020 H1N1 vaccine strain than those from April 2019. (b) Two-dimensional map of the 255 sequences collected in April 2019 along with the HA sequence of the 2019–2020 H1N1 vaccine strain (purple diamond) and the HA sequence of the 2018–2019 H1N1 vaccine strain (grey diamond). The nucleotide distances among all pairs of sequences were scaled to Euclidean distances by multidimensional scaling. (c) Two-dimensional map of the 1140 HA sequences collected in January 2020 along with the two vaccine sequences. (d) The HA sequences from January 2020 showed greater nucleotide distances from the 2019–2020 vaccine strain than those from April 2019 (p < 0.001, Wilcoxon rank sum test).
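The comparison in panel (d) can be sketched in two steps: compute each sequence's nucleotide distance to the vaccine strain, then compare the two groups of distances with a Wilcoxon rank-sum test. The sketch below assumes pre-aligned, equal-length sequences, uses the Hamming distance as the nucleotide distance, and approximates the two-sided p-value with a normal distribution (tie-corrected, no continuity correction). These are illustrative assumptions, not the authors' stated procedure, and the short sequences in the usage example are made up.

```python
import math
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (z, p) by normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    w = sum(midranks(combined)[:n1])  # rank sum of the first group
    mean = n1 * (n + 1) / 2
    ties = sum(t**3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance == 0:  # all observations identical
        return 0.0, 1.0
    z = (w - mean) / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    vaccine = "ACGTACGTAC"  # made-up stand-in for a vaccine-strain HA sequence
    early = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTTC"]
    late = ["ACGAACGTTA", "TCGAACGTTA", "TCGAACCTTA"]
    d_early = [hamming(s, vaccine) for s in early]
    d_late = [hamming(s, vaccine) for s in late]
    print(rank_sum_test(d_late, d_early))
```

With sample sizes as large as those in the figure (255 and 1140 sequences), the normal approximation is generally considered adequate. For very small groups an exact test would be preferable.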