Haeyoung Jeong1, Siseok Lee2, Junsang Ko2, Minsu Ko2, Hwi Won Seo3.
Abstract
BACKGROUND: Because the rapid evolution of SARS-CoV-2 can lead to false-negative diagnoses, using as much sequence data as possible is key to identifying conserved viral sequences. However, multiple alignment of massive numbers of genome sequences is computationally intensive.
Keywords: Multiple sequence alignment; RT-qPCR; SARS-CoV-2
Year: 2022 PMID: 35653026 PMCID: PMC9160177 DOI: 10.1007/s13258-022-01264-7
Source DB: PubMed Journal: Genes Genomics ISSN: 1976-9571 Impact factor: 2.164
Fig. 1 The workflow for collection and manipulation of SARS-CoV-2 genomes. a Data collection and dereplication process. 32,483 complete viral sequences in the thick-lined boxes (right) were taken as the final dataset. b The Pango lineage distributions of the ‘all’ dataset (comprising the initial ‘NCBI-non-Delta,’ ‘GISAID-S. Korea,’ and ‘NCBI-Delta’ datasets; N = 230,163) and the dereplicated dataset (N = 32,483). Note that singletons were removed from the NCBI-non-Delta dataset after dereplication
List of 17 conserved sequences identified from multiple sequence alignments of 32,483 SARS-CoV-2 genomes
| ID | Position (length in bp) | Sequence | Gene (product) | Identity with consensus sequence |
|---|---|---|---|---|
| CS_1 | 2576–2836 (261) | AGTGAAGCTGTTGAAGCTCCAtTGGTTGGTACACCAGTTTGTATTAACGGGCTTATGTTG CTCGAAATCAAAGACACAGAAAAGTACTGTGCCCTTGCACCTAATATGATGGTAACAAAC AATACCTTCACACTCAAAGGCGGTGCACCAACAAAGGTTACTTTTGGTGATGAcACTGTG ATAGAAGTGCAAGGTTACAAGAGTGTGAATATCACTTTTGAACTTGATGAAAGGATTGAT AAAGTACTTAATGAGAAGTGC | ORF1ab (nsp2..nsp3) | 261/261 (100%) |
| CS_2 | 4508–4670 (163) | GTGGTTGATTATGGTGCTAGATTTTACTTTTACACCAGTAAAACAACTGTAGCGTCACTT ATCAACACACTTAACGATCTAAATGAAACTCTTGTTACAATGCCACTTGGCTATGTAACA CATGGCTTAAATTTGGAAGAAGCTGCTCGGTATATGAGATCTC | ORF1ab (nsp3) | 163/163 (100%) |
| CS_3 | 6830–7332 (503) | AGAATTAAAGCATCTATGCCGACTACTATAGCAAAGAATACTGTTAAGAGTGTCGGTAAA TTTTGTCTAGAGGCTTCATTTAATTATTTGAAGTCACCTAATTTTTCTAAACTGATAAAT ATTAtAATTTGGTTTTTACTATTAAGTGTTTGCCTAGGTTCTTTAATCTACTCAACCGCT GCTTTAGGTGTTTTAATGTCTAATTTAGGCATgCCTTCTTACTGTACTGGTTACAGAGAA GGCTATTTGAACTCTAcTAATGTCACTATTGCAACCTACTGTACTGGTTCTATAcCTTGT AGTGTTTGTCTTAGTGGTTTAGATTCTTTAGACACCTATCCTTCTTTAGAAACTATACAA ATTACCATTTCaTCTTTTAAATGGGATTTAACTGCTTTTGGCTTAGTTGCAGAGTGGTTT TTGGCATATATTCTTTTCACTAGGTTTTTCTATGTACTTGGATTGGCTGcAATCATGCAA TTGTTTTTCAGCTATTTTGCAGT | ORF1ab (nsp3) | 502/503 (99%) |
| CS_4 | 8628–8903 (276) | ATTTAATAACACCTGTTCATGTCATGTCTAAACATACTGACTTTTCAAGTGAAATCATAG GATACAAGGCTATTGATGGTGGTGTCACTCGTGACATAGCATCTACAGATACTTGTTTTG CTAACAAACATGCTGATTTTGACACATGGTTTAGCCAGCGTGGTGGTAGTTATACTAATG AcAAAGCTTGCCCATTGATTGCTGCAGTCATAACAAGAGAAGTGGGTTTTGTCGTGCCTG GTTTGCCTGGCACGATATTACGCACAACTAATGGTG | ORF1ab (nsp4) | 276/276 (100%) |
| CS_5 | 16,509–16,758 (250) | TTTATATAAAAATACATGTGTTGGTAGCGATAATGTTACTGACTTTAATGCAATTGCAAC ATGTGACTGGACAAATGCTGGTGATTACATTTTAGCTAACACCTGTACTGAAAGACTCAA GCTTTTTGCAGCAGAAACGCTcAAAGCTACTGAGGAGACATTTAAACTGTCTTATGGTAT TGCTACTGTACGTGAAGTGCTGTCTGACAGAGAATTACATCTTTCATGGGAAGTTGGTAA ACCTAGACCA | ORF1ab (RdRp..helicase) | 250/250 (100%) |
| CS_6 | 17,716–17,993 (278) | GGCGTGGTAAGAGAATTCCTTACACGTAACCCTGCTTGGAGAAAAGcTGTCTTTATTTCA CCTTATAATTCACAGAATGCTGTAGCCTCAAAGATTTTGGGACTACCAACTCAAACTGTT GATTCATCACAGGGCTCAGAATATGACTATGTCATATTCACTCAAACCACTGAAACAGCT CACTCTTGTAATGTAAACAGATTTAATGTTGCTATTACCAGAGCAAAAGTAGGCATACTT TGCATAATGTCTGATAGAGACCTTTATGACAAGTTGCA | ORF1ab (helicase) | 278/278 (100%) |
| CS_7 | 19,253–19,570 (318) | TGCTATCTAACCTTAACTTGCCTGGTTGTGATGGTGGCAGTttgtatgtaaataaacatg cattccacacaccagcttttgataaaagtgcttttgttaatttaaaacaattaccatttt tctattactctgacagtccatgtgagtctcatggaaaacaagtagtgtcagatatagatt atgtaccactaaagtctgctacgtgtataacacgttgcaatttaggtggtgctgtctgta gacatcatgctaatgagtacagattgtatctcgatgcttataacatgatgatctcagctg gctttagctTGTGGGTTT | ORF1ab (3′-to-5′ exonuclease) | 318/318 (100%) |
| CS_8 | 19,923–20,235 (313) | TTGTTCTATGACTGACATAGCCAAGAAACCAACTGAAACgATTTGTGCACCACTCACTGT CTTTTTTGATGGTAGAGTTGATGGTCAAGTAGACTTATTTAGAAATGCCCGTAATGGTGT TCTTATTACAGAAGGTAGTGTTAAAGGTTTACAACCATCTGTAGGTCCCAAACAAGCTAG TCTTAATGGAGTCACATTAATTGGAGAAGCCGTAAAAACACAGTTCAATTATTATAAGAA AGTTGATGGTGTTGTCCAACAATTACCTGAAACTTACTTTACTCAGAGTAGAAATTTACA AGAATTTAAACCC | ORF1ab (endoRNAse) | 313/313 (100%) |
| CS_9 | 20,238–20,483 (246) | GAGTCAAATGGAAATTGATTTCTTaGAATTaGCTATGgATGAATTCATTGAACGGTATAA ATTAGAAGGCTATGCCTTCGAACATATCGTTTATGGAGATTTTAGTCATAGTCAGTTAGG TGGTTTACATCTACTGATTGGACTAGCTAAACGTTTTAaGGAATCACCTTTTGAATTAGA AGATTTTATTCCTATGGACAGTACAGTTAAAAACTATTTCATAACAGATGCGCAAACAGG TTCATC | ORF1ab (endoRNAse) | 246/246 (100%) |
| CS_10 | 21,169–21,376 (208) | ATAACAGAACATTCTTGGAATGCTGATCTTTATAAGCTCATGGGaCACTTCGCATGGTGG ACAGCCTTTGTTACTAATGTGAATGCgTCATCATCTGAAGCATTTTTAATTGGATGTAAT TATCTTGGCAAACCAcGcGAACAAATAGATGGTTATGTCATGCATGCAAATTACATATTT TGGAGGAATACAAATCCAATTCAGTTGT | ORF1ab (2′- | 208/208 (100%) |
| CS_11 | 21,771–21,990 (220) | TCTCTGGGACCAATGGTACTAaGAGGTTTGaTAACCCTGTcCTACCATTTAATGATGGTG TTTAtTTTGCTTCCAcTGAGAAGTcTAACATAATAAGAGGCTGGATTTTTGGTACTACTT TAGATTCGAAGACCCAGTCCCTACTTATTGTTAATAacgCTACTAAtgttgttattaaag tctgtgaatttcaattttgtaatgatccatttttgggtgt | S (spike glycoprotein) | 220/220 (100%) |
| CS_12 | 22,325–22,542 (218) | TCTTCAGGTTGGACAGCTGGTGCTGCAGCTTATTATGTGGGTTATCTTCAACCTAGGACT TTTCTATTAAAATATAATGAAAATGGAACCATTACAGATGCTGTAGACTGTGCACTTGAC CCTCTCTCAGAAACAAAGTGTACGTTGAAATCCTTCACTGTAGAAAAAGGAATCTATCAA ACTTCTAACTTTAGAGTCCAACCAACAGAATCTATTGT | S (spike glycoprotein) | 218/218 (100%) |
| CS_13 | 22,874–23,144 (271) | TCTAACAAtCTTGATTCTAAGGTTGGTGGTAATTATAATTACCtGTATAGATTGTTTAGG AAGTCTAATCTCAAACCTTTTGAGAGAGATATTTCAACTGAAATCTATCAGGCCGGTAgC AcACCTTGTAATGGTGTTgAAGGTTTTAATTGTTACTTTCCTTTACAAtCATATGGTTTC CAACCCACTaATGGTGTTGGTTACCAACCATACAGAGTAGTAGTACTTTCTTTTGAACTT CTACATGCACCAGCAACTGTTTGTGGACCTA | S (spike glycoprotein) | 270/271 (99%) |
| CS_14 | 25,630–25,797 (168) | GTTTGCAACTTGCTGTTGTTGTTTGTAACAGTTTACTCACACCTTTTGCTCGTTGCTGCT GGCCTTGAAGCCCcTTTTCTCTATCTTTATGCTTTAGTCTACTTCTTGCAGAGTATAAAC TTTGTAAGAATAATAATGAGGCTTTGGCTTTGCTGgAAATGCCGTTCC | ORF3a (ORF3a protein) | 168/168 (100%) |
| CS_15 | 25,974–26,214 (241) | ATCTGGAGTAAAAGACTGTGTTGTATTACACAGTTACTTCACTTCAGACTATTACCAGCT GTACTCAACTCAATTGAGTACAGACACTGGTGTTGAACATGTTACCTTCTTCATCTACAA TAAAATTGTTGATGAGcCTGAAGAACATGTCCAAATTCACACAATCGACGGTTCAtCCGG AGTTGTTAATCCAGTAATGGAACCAATTTATGATGAACCGACGACGACTACTAGCGTGCC T | ORF3a (ORF3a protein) | 241/241 (100%) |
| CS_16 | 27,467–27,808 (342) | GAGGTACAACAGTACTTTTAAAAGAACCTTGCTCTTCTGGAACATACGAGGGCAATTCAC CATTTCATCCTCTAGCTGATAACAAATTTGCACTGACTTGCTTTAGCACTCAATTTGCTT TTGCTTGTCCTGACGGCGTAAAACACGTCTATCAGTTACGTGCCAGATCAGtTTCACCTA AACTGTTCATCAGACAAGAGGAAGTTCAAGAACTTTACTCTCCAATTTTTCTTATTGTTG CGGCAATAGTGTTTATAACACTTTGCTTCACAcTCAAAAGAAAGAcAGAATGATTGAACT TTCATTAATTGACTTCTATTTGTGCTTTTTAGCCTTTCTGCT | ORF7a (ORF7a protein) | 342/342 (100%) |
| CS_17 | 29,357–29,511 (155) | ACATTcCCACCAACAGAGCCTAAAAAGGACAAAAAGAAGAAGGCTgATGAAACTCAAGCC TTACCGCAGAGACAGAAGAAACAGCAAACTGTGACTCTTCTTCCTGCTGCAGATTTGGAT GATTTCTCCAAACAATTGCAACAATCCATGAGCAG | N (nucleocapsid phosphoprotein) | 155/155 (100%) |
All information was derived from RefSeq NC_045512.2. Less conserved positions (conservation < 99%) are shown in lowercase
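The lowercase convention used in the table can be reproduced programmatically. The following is a minimal Python sketch, assuming a per-position conservation value between 0 and 1 is already available for each base. The function names `mask_low_conservation` and `lowercase_positions` are hypothetical and are not taken from the original work.

```python
def mask_low_conservation(sequence, conservation, threshold=0.99):
    """Return the sequence with bases below the conservation threshold in lowercase.

    `conservation` is assumed to hold one value in [0, 1] per base.
    """
    if len(sequence) != len(conservation):
        raise ValueError("sequence and conservation must have the same length")
    return "".join(
        base.lower() if score < threshold else base.upper()
        for base, score in zip(sequence, conservation)
    )


def lowercase_positions(table_sequence):
    """Return 1-based offsets of lowercase bases.

    Whitespace is removed first, because the sequences in the table
    are wrapped into space-separated chunks.
    """
    compact = "".join(table_sequence.split())
    return [i for i, base in enumerate(compact, start=1) if base.islower()]


# Toy input; the conservation values are illustrative, not from the alignment.
masked = mask_low_conservation("ACGTAC", [1.0, 0.995, 0.98, 1.0, 0.99, 0.5])
print(masked)                          # ACgTAc
print(lowercase_positions("ACgT Ac"))  # [3, 6]
```

Offsets returned by `lowercase_positions` are relative to the start of the conserved sequence; adding the start coordinate from the Position column minus one would give a reference coordinate.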
Fig. 2 The locations of the 17 conserved sequences (magenta blocks) on the reference SARS-CoV-2 genome map. The coding region of RdRp (RNA-dependent RNA polymerase) is shown beneath the ORF1ab coding region. The lower plot shows the conservation (red line), gappyness (blue line), and normalized Shannon entropy (plum triangles) of each nucleotide position. The conserved sequences are shown as gray shaded areas in the lower plot
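The three per-position metrics plotted in Fig. 2 can be illustrated for a single alignment column. The sketch below is only one plausible set of definitions, stated as assumptions: conservation is the fraction of all sequences carrying the most common base, gappyness is the fraction of gap characters, and Shannon entropy is computed over non-gap bases and divided by log2(4). The source does not give the exact formulas, and `column_stats` is a hypothetical name.

```python
import math
from collections import Counter


def column_stats(column, gap="-"):
    """Return (conservation, gappyness, normalized_entropy) for one alignment column.

    Assumed definitions (not confirmed by the source):
      conservation       = count of most common base / number of sequences
      gappyness          = count of gaps / number of sequences
      normalized_entropy = Shannon entropy of non-gap bases / log2(4)
    """
    n = len(column)
    bases = [c.upper() for c in column if c != gap]
    gappyness = (n - len(bases)) / n
    if not bases:
        return 0.0, gappyness, 0.0

    counts = Counter(bases)
    conservation = counts.most_common(1)[0][1] / n

    total = len(bases)
    entropy = sum((k / total) * math.log2(total / k) for k in counts.values())
    # Dividing by log2(4) assumes only A, C, G, T occur; ambiguity codes
    # such as N could push the value above 1.
    return conservation, gappyness, entropy / math.log2(4)


# Columns are given as strings, one character per aligned genome.
print(column_stats("AAAA"))
print(column_stats("AACC"))
print(column_stats("AA--"))
```

For a full alignment, the same function would be applied to each column in turn, for example by iterating over `zip(*aligned_sequences)`.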
The in silico amplification coverage of various primer sets against SARS-CoV-2 genome datasets. The percentages in the column headers represent the percent mismatches allowed between the primer and template sequences (‘primersearch -mismatchpercent’)
a Detection probe sequence: 5′-JOE-CGGGCTTATGTTGCTCGAAATCAA-BHQ1-3′
b Detection probe sequence: 5′-Texas Red-TGCTCGTTGCTGCTGGCCTTGAAG-BHQ2-3′
c,d,e Primers with a 1 bp mismatch to the reference sequence (NC_045512.2)
f Available from https://www.who.int/docs/default-source/coronaviruse/uscdcrt-pcr-panel-primer-probes.pdf?sfvrsn=fa29cb4b_2
g Available from https://www.niid.go.jp/niid/en/2013-03-15-04-39455-59/2483-disease-based/ka/corona-virus/2019-ncov/9334-ncov-vir3-2.html
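The idea of in silico amplification coverage at a given mismatch allowance can be sketched in a few lines of Python. This is not the EMBOSS `primersearch` implementation. It is a simplified stand-in that assumes ungapped matching, plain A/C/G/T primers, and a mismatch budget of `int(primer_length * percent / 100)` per primer. All function names are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(primer, template, max_mismatches):
    """Start offsets where primer matches template with at most max_mismatches."""
    primer, template = primer.upper(), template.upper()
    width = len(primer)
    return [
        i
        for i in range(len(template) - width + 1)
        if sum(a != b for a, b in zip(primer, template[i:i + width])) <= max_mismatches
    ]


def amplifies(forward, reverse, template, mismatch_percent):
    """True if a forward site is followed by a reverse-primer site downstream."""
    fwd_budget = int(len(forward) * mismatch_percent / 100)
    rev_budget = int(len(reverse) * mismatch_percent / 100)
    fwd_hits = find_sites(forward, template, fwd_budget)
    # The reverse primer anneals to the opposite strand, so search for
    # its reverse complement on the given strand.
    rev_hits = find_sites(revcomp(reverse), template, rev_budget)
    return any(r >= f + len(forward) for f in fwd_hits for r in rev_hits)


def coverage(forward, reverse, genomes, mismatch_percent):
    """Fraction of genomes predicted to amplify."""
    hits = sum(amplifies(forward, reverse, g, mismatch_percent) for g in genomes)
    return hits / len(genomes)


# Toy sequences, not real primers or genomes.
genomes = ["TTTACGTACGGGGGCCATGATT", "TTTACCTACGGGGGCCATGATT"]
print(coverage("ACGTAC", "TCATGG", genomes, 0))
print(coverage("ACGTAC", "TCATGG", genomes, 20))
```

The scan is quadratic in the worst case and is meant only to make the coverage definition concrete; a dedicated tool is the practical choice for tens of thousands of genomes.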
Fig. 3 Specific detection of nine SARS-CoV-2 variants (Alpha, Beta, Gamma, Delta, Epsilon, Zeta, Eta, Iota, and Kappa) using primer sets NH-CS_1 (blue) and NH-CS_14 (red). Multiplex RT-PCR was performed on serially diluted SARS-CoV-2 RNA templates (10^2 to 10^4 copies) using primers targeting the ORF3a (red) and nsp2 (blue) genes
Comparison of Ct values for RNA extracted from nine SARS-CoV-2 variants
| Target | Copy number | Alpha | Beta | Gamma | Delta | Epsilon | Zeta | Eta | Iota | Kappa |
|---|---|---|---|---|---|---|---|---|---|---|
| nsp2 (JOE) | 10^4 | 23.35 | 23.16 | 23.30 | 23.28 | 23.46 | 23.26 | 23.38 | 23.29 | 23.33 |
| nsp2 (JOE) | 10^3 | 26.75 | 26.74 | 26.34 | 26.42 | 26.83 | 26.87 | 26.42 | 26.39 | 26.31 |
| nsp2 (JOE) | 10^2 | 29.51 | 29.46 | 30.09 | 29.89 | 30.23 | 30.10 | 30.05 | 30.04 | 30.10 |
| ORF3a (Texas Red) | 10^4 | 21.71 | 21.67 | 22.01 | 21.77 | 22.16 | 22.03 | 21.90 | 21.79 | 21.94 |
| ORF3a (Texas Red) | 10^3 | 25.26 | 25.14 | 25.14 | 25.03 | 25.46 | 25.50 | 25.28 | 25.02 | 25.07 |
| ORF3a (Texas Red) | 10^2 | 28.30 | 28.47 | 28.58 | 28.57 | 29.18 | 28.96 | 28.95 | 28.39 | 28.54 |
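A dilution series like the one in this table is often summarized by the slope of Ct against log10 copy number, from which an amplification efficiency can be estimated as 10^(-1/slope) - 1. The sketch below applies that standard-curve calculation to one column of the table. The source does not report this analysis, so it is an illustration only, and a three-point curve gives only a rough estimate. The function names are hypothetical.

```python
def standard_curve_slope(log10_copies, ct_values):
    """Least-squares slope of Ct versus log10(template copies)."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    return sxy / sxx


def amplification_efficiency(slope):
    """Efficiency as a fraction; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Alpha variant, nsp2 (JOE) target, at 10^4, 10^3, and 10^2 copies.
slope = standard_curve_slope([4, 3, 2], [23.35, 26.75, 29.51])
print(round(slope, 2), round(amplification_efficiency(slope), 2))
```

A slope near -3.32 corresponds to an efficiency near 1.0, which is why the roughly three-cycle spacing between tenfold dilutions in the table is the expected pattern for a well-behaved assay.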