Changtai Wang, Zhongping Liu, Zixiang Chen, Xin Huang, Mengyuan Xu, Tengfei He, Zhenhua Zhang.
Abstract
Starting around December 2019, an epidemic of pneumonia, named COVID-19 by the World Health Organization, broke out in Wuhan, China, and is spreading throughout the world. A new coronavirus, named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by the Coronavirus Study Group of the International Committee on Taxonomy of Viruses, was soon identified as the cause. At present, the sensitivity of clinical nucleic acid detection is limited, and it is still unclear whether this limitation is related to genetic variation of the virus. In this study, we retrieved 95 full-length genomic sequences of SARS-CoV-2 strains from the National Center for Biotechnology Information (NCBI) and GISAID databases, established a reference sequence by multiple sequence alignment and phylogenetic analysis, and analyzed sequence variation along the SARS-CoV-2 genome. Homology among all viral strains was high: the median was 99.99% (range, 99.91%-100%) at the nucleotide level and 99.99% (range, 99.79%-100%) at the amino acid level. Although overall variation in the open-reading frame (ORF) regions was low, 13 variation sites were identified in the 1a, 1b, S, 3a, M, 8, and N regions; among them, positions nt28144 in ORF 8 and nt8782 in ORF 1a showed mutation rates of 30.53% (29/95) and 29.47% (28/95), respectively. These findings suggest that there may be selective mutations in SARS-CoV-2 and that certain regions should be avoided when designing primers and probes. Establishing a reference sequence for SARS-CoV-2 could benefit not only biological study of this virus but also the diagnosis, clinical monitoring, and intervention of SARS-CoV-2 infection in the future.
Keywords: SARS-CoV-2; homology; nucleotide; reference sequence; variation
Year: 2020 PMID: 32167180 PMCID: PMC7228400 DOI: 10.1002/jmv.25762
Source DB: PubMed Journal: J Med Virol ISSN: 0146-6615 Impact factor: 20.693
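The abstract describes deriving a reference sequence from aligned genomes and then counting how many strains vary at each site. The exact procedure is not given here, so the Python below is only a minimal sketch under stated assumptions: the genomes are already aligned to equal length, the reference is a simple majority-rule consensus (a common choice, assumed here rather than taken from the paper), and gaps or ambiguity codes such as Y are treated as ordinary characters. All function names are hypothetical.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Majority-rule consensus of equal-length aligned sequences.

    Ties go to the base encountered first in the column, which is how
    Counter.most_common orders equal counts.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal aligned length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def variant_sites(aligned: list[str], reference: str) -> dict[int, int]:
    """Map each 1-based alignment column to the number of strains differing from the reference there."""
    counts: dict[int, int] = {}
    for pos, ref_base in enumerate(reference, start=1):
        n = sum(1 for seq in aligned if seq[pos - 1] != ref_base)
        if n:
            counts[pos] = n
    return counts


def mutation_rate(n_mutant: int, n_total: int) -> float:
    """Share of strains carrying a variant, in percent, rounded to two decimals."""
    return round(100 * n_mutant / n_total, 2)


if __name__ == "__main__":
    # Hypothetical toy alignment of three short strains
    strains = ["ACGTT", "ACGTA", "ACCTT"]
    ref = consensus(strains)
    print(ref)                          # ACGTT
    print(variant_sites(strains, ref))  # {3: 1, 5: 1}
    print(mutation_rate(29, 95))        # 30.53
```

With 95 strains, a site carried by 29 of them gives `mutation_rate(29, 95)` = 30.53, the rate reported for nt28144; 28 of 95 gives 29.47, the rate reported for nt8782.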
Figure 1. Flow chart of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequence data collection
Comparison of homology among open-reading frames of SARS-CoV-2 isolate strains
| Region (ORF) | Nucleotide start-end | Nucleotide length | Nucleotide homology (%) | Amino acid start-end | Amino acid length | Amino acid homology (%) |
|---|---|---|---|---|---|---|
| Full length | 1‐29870 | 29 870 | 99.99 (99.91‐100) | 1‐9744 | 9744 | 99.99 (99.79‐100) |
| 1ab | 266‐21555 | 21 306 | 100 (99.91‐100) | 1‐7096 | 7096 | 100 (99.80‐100) |
| 1a | 266‐13483 | 13 218 | 99.99 (99.88‐100) | 1‐4401 | 4401 | 100 (99.73‐100) |
| 1b | 13468‐21555 | 8088 | 100 (99.93‐100) | 4402‐7096 | 2695 | 100 (99.85‐100) |
| S | 21563‐25384 | 3822 | 100 (99.82‐100) | 7097‐8369 | 1273 | 100 (99.53‐100) |
| 3a | 25393‐26220 | 828 | 100 (99.76‐100) | 8370‐8644 | 275 | 100 (99.27‐100) |
| E | 26245‐26472 | 228 | 100 (100‐100) | 8645‐8719 | 75 | 100 (100‐100) |
| M | 26523‐27191 | 669 | 100 (99.70‐100) | 8720‐8941 | 222 | 100 (99.95‐100) |
| 6 | 27202‐27387 | 186 | 100 (100‐100) | 8942‐9002 | 61 | 100 (100‐100) |
| 7a | 27394‐27759 | 366 | 100 (99.73‐100) | 9003‐9123 | 121 | 100 (99.17‐100) |
| 7b | 27756‐27887 | 132 | 100 (100‐100) | 9124‐9166 | 43 | 100 (100‐100) |
| 8 | 27894‐28259 | 366 | 100 (99.45‐100) | 9167‐9287 | 121 | 100 (98.35‐100) |
| N | 28274‐29533 | 1260 | 100 (99.84‐100) | 9288‐9706 | 419 | 100 (99.76‐100) |
| 10 | 29558‐29674 | 117 | 100 (99.15‐100) | 9707‐9744 | 38 | 100 (97.37‐100) |
Abbreviations: E, Envelope; M, Membrane; N, Nucleoprotein; ORF, open-reading frame; S, Spike; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; 1ab, 1a, 1b, 3a, 6, 7a, 7b, 8, and 10, open-reading frames 1ab, 1a, 1b, 3a, 6, 7a, 7b, 8, and 10, respectively.
Homology values are given as median (min-max).
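The homology table combines simple arithmetic: region lengths from inclusive coordinates, percent identity from the number of differing positions, and a median (min-max) summary across strains. The sketch below reproduces that arithmetic under stated assumptions (ungapped comparison against the reference; helper names are hypothetical). For instance, one differing residue in the 38-residue ORF 10 gives 97.37%, and two in the 121-residue ORF 8 give 98.35%, which is consistent with the minimum values listed. The protein-length helper assumes a single continuous reading frame ending in a stop codon, so it does not apply to ORF 1ab, which is translated via a ribosomal frameshift.

```python
from statistics import median


def percent_identity(length: int, differences: int) -> float:
    """Identity (%) of one strain against the reference over an ungapped region."""
    return round(100 * (length - differences) / length, 2)


def orf_lengths(start: int, end: int) -> tuple[int, int]:
    """Nucleotide length (inclusive coordinates) and protein length, stop codon excluded.

    Assumes one continuous reading frame that ends in a stop codon.
    """
    nt_len = end - start + 1
    if nt_len % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return nt_len, nt_len // 3 - 1


def summarize(identities: list[float]) -> str:
    """Format per-strain identities as 'median (min-max)', the convention of the homology table."""
    return f"{median(identities):g} ({min(identities):g}-{max(identities):g})"


if __name__ == "__main__":
    print(orf_lengths(21563, 25384))          # (3822, 1273), the S row
    print(percent_identity(121, 2))           # 98.35
    print(summarize([100.0, 99.45, 100.0]))   # 100 (99.45-100)
```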
Major loci of nucleotide and amino acid variation in SARS-CoV-2 isolate strains (variants present in ≥3/95 strains)
| Region (ORF) | Nucleotide site | No. of strains | Nucleotide change | Amino acid site | No. of strains | Amino acid change |
|---|---|---|---|---|---|---|
| 1a | 2662 | 3 | C→T | 3606 | 6 | L→F |
|  | 8782 | 28 | C→T/Y |  |  |  |
|  | 11083 | 6 | G→T |  |  |  |
| 1b | 17373 | 3 | C→T |  |  |  |
|  | 18060 | 3 | C→T |  |  |  |
| S | 21707 | 4 | C→T | 49 | 4 | H→Y |
|  | 24034 | 7 | C→T/Y | 860 | 3 | V→Q |
| 3a | 26144 | 6 | G→T | 251 | 6 | G→V |
| M | 26729 | 5 | T→C/Y |  |  |  |
| 8 | 28077 | 5 | G→C/S | 62 | 5 | V→L |
|  | 28144 | 29 | T→C/Y | 84 | 29 | L→S |
| N | 28854 | 6 | C→T/Y | 194 | 6 | S→L |
|  | 29095 | 11 | C→T |  |  |  |
Abbreviations: M, Membrane; N, Nucleoprotein; ORF, open-reading frame; S, Spike; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; 1a, 1b, 3a, and 8, open-reading frames 1a, 1b, 3a, and 8, respectively. In nucleotide changes such as C→T/Y and G→C/S, Y and S are IUPAC ambiguity codes (Y, C or T; S, C or G).
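Each genome position in the variation table can be related to a codon number within its ORF using the start coordinates from the homology table. The sketch below assumes the ORF is read continuously from its start (no frameshift), which is why it lists only a hypothetical subset of single-frame ORFs and leaves out 1a/1b; `ORFS`, `codon_of`, and `locate` are illustrative names, not part of the paper.

```python
from typing import Optional

# Subset of single-frame ORFs, coordinates copied from the homology table
# (hypothetical helper data; 1a/1b, 6, 7a, 7b and 10 are omitted for brevity)
ORFS: dict[str, tuple[int, int]] = {
    "S": (21563, 25384),
    "3a": (25393, 26220),
    "E": (26245, 26472),
    "M": (26523, 27191),
    "8": (27894, 28259),
    "N": (28274, 29533),
}


def codon_of(nt_pos: int, orf_start: int) -> tuple[int, int]:
    """Return (codon number, position within codon), both 1-based.

    Assumes the ORF is read continuously from orf_start with no frameshift.
    """
    if nt_pos < orf_start:
        raise ValueError("position lies upstream of the ORF start")
    offset = nt_pos - orf_start
    return offset // 3 + 1, offset % 3 + 1


def locate(nt_pos: int) -> Optional[tuple[str, int, int]]:
    """Find the first listed ORF containing nt_pos and return (ORF, codon, codon position)."""
    for name, (start, end) in ORFS.items():
        if start <= nt_pos <= end:
            return (name, *codon_of(nt_pos, start))
    return None


if __name__ == "__main__":
    print(locate(28144))  # ('8', 84, 2)
    print(locate(100))    # None: upstream of every listed ORF
```

For example, nt28144 falls at the second position of codon 84 in ORF 8, which is consistent with the L→S change at amino acid 84 listed above; nt21707 maps to codon 49 of S, matching the H→Y entry.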
Figure 2. Distribution of the number of mutant bases or amino acids in each SARS-CoV-2 isolate strain. A, Full-length and partial regions (1ab, 1a, 1b, S, E, M, N), nucleotides. B, Partial regions (5′NCR, 3a, 6, 7a, 7b, 8, 10), nucleotides. C, Partial regions (1ab, 1a, 1b, S, E, M, N), amino acids. D, Partial regions (5′-untranslated region, 3a, 6, 7a, 7b, 8, 10), amino acids. SARS-CoV-2, severe acute respiratory syndrome coronavirus 2
Figure 3. Common sites and frequency of mutation in SARS-CoV-2 isolate strains (≥5/95). A, Nucleotides. B, Amino acids. SARS-CoV-2, severe acute respiratory syndrome coronavirus 2
Differences between published primer/probe sequences and reference or clinical isolates
| Target gene | Direction | Primer (5′-3′) | Location | Reference strain | Clinical isolates | References |
|---|---|---|---|---|---|---|
| ORF1b | Forward |  | 14961-14983 |  | 95 | Lancet |
|  | Reverse | A | 15283-15304 | A | 95 |  |
| S | Forward | CCTACTAAATTAAATGATCTCTGCTTTACT | 22712-22741 | No difference | No difference |  |
|  | Reverse | CAAGCTATAACGCAGCCTGTA | 22849-22869 | No difference | No difference |  |
| ORF1b | Forward | TGGGGYTTTACRGGTAACCT | 18778-18797 | No difference | No difference | Clinical Chemistry |
|  | Reverse | AACRCGCTTAACAAAGCACTC | 18889-18909 | No difference | No difference |  |
|  | Probe | TAGTTGTGATGCWATCATGACTAG | 18849-18872 | No difference | No difference |  |
| N | Forward | TAATCAGACAAGGAACTGATTA | 29145-29166 | No difference | No difference |  |
|  | Reverse | CGAAGGTGTGACTTCCATG | 29236-29254 | No difference | No difference |  |
|  | Probe | GCAAATTG | 29179-29198 | GCAAATTG | 95 |  |
| ORF1b | Forward | GTGARATGGTCATGTGTGGCGG | 15431-15452 | No difference | No difference | Euro Surveill |
|  | Probe2 | CAGGTGGAACCTCATCAGGAGATGC | 15470-15494 | No difference | No difference |  |
|  | Probe1 | CCAGGTGGWACRTCATCMGG | 15469-15494 | CCAGGTGGAACCTCATCAGG | 95 |  |
|  | Reverse | CARATGTTAAASACACTATTAGCATA | 15505-15530 | No difference | No difference |  |
| E | Forward | ACAGGTACGTTAATAGTTAATAGCGT | 26269-26294 | No difference | No difference |  |
|  | Probe | ACACTAGCCATCCTTACTGCGCTTCG | 26332-26357 | No difference | No difference |  |
|  | Reverse | ATATTGCAGCAGTACGCACACA | 26360-26381 | No difference | No difference |  |
| N | Forward | CACATTGGCA | 28706-28724 | No difference | 1 |  |
|  | Probe | ACTTCCTCAAGGAACAACATTGCCA | 28753-28777 | No difference | No difference |  |
|  | Reverse | GAGGAACGAGAAGAGGCTTG | 28814-28833 | No difference | No difference |  |
Note: Italics indicates the location of the difference.
Abbreviations: E, Envelope; N, Nucleoprotein; ORF1b, open-reading frame 1b; S, Spike.
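Several primers and probes in the table contain IUPAC degenerate bases (Y, R, W, M, S). The sketch below checks, position by position, whether each base of an aligned target is covered by the corresponding primer base. It assumes the target is already aligned to the primer, has the same length, and contains only A, C, G, and T. Reverse primers are normally written 5′-3′ on the opposite strand, so under that assumed convention they would be reverse-complemented before comparison with genome coordinates. The helper names are hypothetical.

```python
# IUPAC nucleotide codes and the concrete bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of every code (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a (possibly degenerate) nucleotide sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(primer: str, target: str) -> list[int]:
    """1-based positions where the target base is not covered by the degenerate primer base.

    Assumes primer and target are already aligned, equal in length,
    and that the target contains only A, C, G and T.
    """
    if len(primer) != len(target):
        raise ValueError("primer and target must have the same length")
    return [
        pos
        for pos, (p, t) in enumerate(zip(primer.upper(), target.upper()), start=1)
        if t not in IUPAC[p]
    ]


if __name__ == "__main__":
    # Probe1 and the reference-strain sequence from the table above
    print(mismatches("CCAGGTGGWACRTCATCMGG", "CCAGGTGGAACCTCATCAGG"))  # [12]
    print(reverse_complement("AR"))  # YT
```

Applied to Probe1 and the reference-strain sequence shown in the table, the check flags position 12, where the probe's R (A or G) faces a C; the W and M positions are covered by the reference bases A.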