
Mutations on COVID-19 diagnostic targets.

Rui Wang1, Yuta Hozumi1, Changchuan Yin2, Guo-Wei Wei3.   

Abstract

Effective, sensitive, and reliable diagnostic reagents are of paramount importance for combating the ongoing coronavirus disease 2019 (COVID-19) pandemic, given that neither a preventive vaccine nor a specific drug is available for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). If currently used diagnostic reagents are undermined, a large number of false-positive and false-negative tests will result. Based on genotyping of 31,421 SARS-CoV-2 genome samples collected up to July 23, 2020, we reveal that essentially all of the current COVID-19 diagnostic targets have undergone mutations. We further show that SARS-CoV-2 has the most mutations on the targets of various nucleocapsid (N) gene primers and probes, which have been widely used around the world to diagnose COVID-19. To understand whether SARS-CoV-2 genes have mutated unevenly, we have computed the mutation ratio and mutation h-index of all SARS-CoV-2 genes, which indicate that the N gene is one of the most non-conservative genes in the SARS-CoV-2 genome. We show that, due to APOBEC mRNA (C > T) editing induced by the human immune response, diagnostic targets should also be selected to avoid cytidines. Our findings may enable the optimal selection of conservative SARS-CoV-2 genes and proteins for the design and development of COVID-19 diagnostic reagents, prophylactic vaccines, and therapeutic medicines. AVAILABILITY: Interactive real-time online Mutation Tracker.
Copyright © 2020 Elsevier Inc. All rights reserved.


Year:  2020        PMID: 32966857      PMCID: PMC7502284          DOI: 10.1016/j.ygeno.2020.09.028

Source DB:  PubMed          Journal:  Genomics        ISSN: 0888-7543            Impact factor:   5.736


Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), first reported in Wuhan in December 2019, is an unsegmented, positive-sense, single-stranded RNA virus that belongs to the genus Betacoronavirus and the family Coronaviridae. Coronaviruses are among the most complex RNA viruses, with genomes ranging from 26 to 32 kilobases in length. The coronavirus disease 2019 (COVID-19) pandemic caused by SARS-CoV-2 has spread to more than 200 countries and territories, with 15,012,731 infection cases and 619,150 fatalities worldwide by July 23, 2020 [1]. Additionally, travel restrictions, quarantines, and social distancing measures have essentially put the global economy on hold. Furthermore, since there is currently neither a specific medication nor a vaccine for COVID-19, reopening the economy depends vitally on effective COVID-19 diagnostic testing, patient isolation, contact tracing, and quarantine. Reliable diagnostic testing kits are therefore critical for combating COVID-19. There are three types of diagnostic tests for COVID-19: polymerase chain reaction (PCR) tests, antibody tests, and antigen tests. PCR tests detect genetic material from the virus. Antibody tests, also called serological tests, detect antibodies produced by the immune response to the viral infection. Antigen tests detect viral antigens, e.g., parts of the viral spike protein. PCR tests are relatively more accurate but take longer to return results. Protein-based antibody and antigen tests can display results in minutes but are relatively insensitive and subject to the limitations of the host immune response. PCR diagnostic reagents were designed based on early clinical specimens containing a full spectrum of SARS-CoV-2 [2], particularly the reference genome collected on January 5, 2020, in Wuhan (SARS-CoV-2, NC_045512) [3].
The US Centers for Disease Control and Prevention (CDC) has issued detailed guidelines for COVID-19 diagnostic testing in the “CDC 2019-Novel Coronavirus (2019-nCoV) Real-Time RT-PCR Diagnostic Panel” (https://www.fda.gov/media/134922/download), which was approved by the United States (US) Food and Drug Administration (FDA). The US CDC has designated two oligonucleotide primer/probe sets from regions of the viral nucleocapsid (N) gene, i.e., N1 and N2, for the specific detection of SARS-CoV-2. The panel also includes an additional primer/probe set targeting the human RNase P gene (RP) as a control. Many other diagnostic primers and probes based on the RNA-dependent RNA polymerase (RdRP), envelope (E), nonstructural protein 14 (NSP14), and nucleocapsid (N) genes have been designed [4] and/or designated by the World Health Organization (WHO); Table S1 of the Supporting Material provides the details of 54 commonly used diagnostic primers and probes [5]. Diagnostic kits are often static over time, whereas SARS-CoV-2 is undergoing fast mutations, and different primers and probes have accordingly been reported to show nonuniform performance [[6], [7], [8]]. In this study, we genotype 31,421 SARS-CoV-2 genome isolates collected worldwide and reveal numerous mutations on the COVID-19 diagnostic targets commonly used around the world, including those designated by the US CDC. We identify and analyze the SARS-CoV-2 mutation positions, frequencies, and encoded proteins in the global setting. These mutations may impact diagnostic sensitivity and specificity; therefore, they should be considered when designing new testing kits as part of the current effort in COVID-19 testing, prevention, and control. We propose diagnostic target selection and optimization based on nucleotide-based and gene-based mutation-frequency analysis.

Results and analysis

Genotyping analysis

We first genotype 31,421 SARS-CoV-2 genome samples collected worldwide as of July 23, 2020. The genotyping results reveal 13,402 single mutations among these virus isolates. On average, a SARS-CoV-2 isolate carries eight co-mutations. Mutations can occur on all of the SARS-CoV-2 genes and may have broad effects on the development of diagnostic kits, vaccines, and drugs. Moreover, we cluster these mutations with the k-means method, which yields at least six distinct subtypes of SARS-CoV-2 genomes globally, from Cluster I to Cluster VI. Table 1 shows the sample counts (SC) and total single mutation counts (MC) of each cluster in 20 countries.
Table 1

The mutation distribution clusters with sample counts (SC) and total single mutation counts (MC).


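Country | Cluster I SC | Cluster I MC | Cluster II SC | Cluster II MC | Cluster III SC | Cluster III MC | Cluster IV SC | Cluster IV MC | Cluster V SC | Cluster V MC | Cluster VI SC | Cluster VI MC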
US325224,846201314,7372863686236627,01256237983042706
CA113835805619106424178452533290
AU1731204587504875101019521271658851321076
DE6950425121558262092714443366
FR100718145522248523744651083
UK2952328192712,777217127,636162316,123189011,835291925,576
IT18810433561243085728324192
RU7522321921975332187119968
CN3222871155232750835326
JP18134243100123272979231391911676
KR005832700000000
IN2921226830452002703399484014184751487
IS6644610359530345108915292459525
ES433163119833337365170110342359
BR32675178100921074263591
BE564118540066783115103123013811411239
SA1611096100141261713317
TR0028339131585047642831273
PE21253610124548958217
CL13912728221285496653220020169

The listed countries are United States (US), Canada (CA), Australia (AU), Germany (DE), France (FR), United Kingdom (UK), Italy (IT), Russia (RU), China (CN), Japan (JP), Korea (KR), India (IN), Iceland (IS), Brazil (BR), Spain (ES), Belgium (BE), Saudi Arabia (SA), Turkey (TR), Peru (PE), and Chile (CL).

All of the listed countries have samples in all six clusters except Korea (KR), Saudi Arabia (SA), and Turkey (TR). China initially had samples only in Cluster II, and its samples spread to other clusters after March 2020. Clusters I, II, and IV dominate in the United States. Germany (DE) and France (FR) samples are mainly in Clusters I, IV, and VI. Italy (IT) samples are mainly in Clusters III, IV, V, and VI. Samples from Turkey (TR) are mainly in Clusters II, III, IV, and VI. Japan (JP) samples are dominated by Clusters II and VI, and Korea (KR) samples belong to Cluster II only. Cluster II is common to all countries. Fig. 1 depicts the distribution of the six clusters worldwide; the base color of each country is that of its dominant cluster. Note that although some countries have many confirmed cases, only a limited number of complete genome sequences from them have been deposited in GISAID, which introduces geographical bias into Table 1.
Fig. 1

Scatter plot of the six distinct clusters worldwide. The light blue, dark blue, green, red, pink, and yellow represent Cluster I, Cluster II, Cluster III, Cluster IV, Cluster V, and Cluster VI, respectively. The base color of each country is determined by the color of its dominant cluster. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Mutations on diagnostic targets

Table 2 lists all mutations on the various primers and probes and their frequencies in each cluster, where SC is the sample count and MC is the mutation count. More detailed mutation information is given in Tables S4–S56 of the Supporting Material. Fig. 2, Fig. 3, Fig. 4, Fig. 5, and Fig. 6 plot the mutation positions and frequencies for the 54 primers and probes studied in this work.
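A minimal sketch of how per-target counts like those in Table 2 could be tallied is given below. It assumes that each sample is stored as a set of (position, reference base, alternate base) mutations in 1-based reference coordinates, that a primer or probe target is an inclusive coordinate interval, and that SC means "samples carrying at least one mutation on the target". The function name and these conventions are illustrative assumptions, not the authors' code.

```python
from collections import Counter

def target_mutation_summary(samples, clusters, start, end):
    """Summarize mutations on one primer/probe target [start, end].

    samples:  dict sample_id -> set of (position, ref_base, alt_base), 1-based
    clusters: dict sample_id -> cluster label (e.g. "I" ... "VI")
    Returns (MC, SC, per-cluster SC): MC counts distinct mutations inside the
    target; SC counts samples carrying at least one of them (assumed meaning).
    """
    distinct = set()
    per_cluster = Counter()
    sc = 0
    for sample_id, mutations in samples.items():
        # Keep only the mutations whose position falls inside the target.
        hits = {m for m in mutations if start <= m[0] <= end}
        if hits:
            distinct |= hits
            sc += 1
            per_cluster[clusters[sample_id]] += 1
    return len(distinct), sc, dict(per_cluster)
```

For example, with one sample carrying a mutation at a hypothetical target position 28290 and a second sample carrying the same mutation plus one at 28300, a target spanning 28287 to 28306 yields MC = 2 and SC = 2.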
Table 2

Summary of mutations on COVID-19 diagnostic primers and probes and their occurrence frequencies in clusters. Here, SC is the sample counts and MC is the mutation counts.

Primer | MC | SC | Cluster I | Cluster II | Cluster III | Cluster IV | Cluster V | Cluster VI
RX7038-N1 primer (Fw)a15795141228146
RX7038-N1 primer (Rv)a17113166149221
RX7038-N2 primer (Fw)a760310242111
RX7038-N2 primer (Rv)a65021761537
RX7038-N3 primer (Fw) [9]1328742241326146
RX7038-N3 primer (Rv) [9]127041073964
N1-U.S.-P [5]1585647822031154
N2-U.S.-P [5]1170104041240
N3-U.S.-P [5]16845271521106
N-Sarbeco-Fb [4]12634201015104
N-Sarbeco-Pb [4]121161193042159
N-Sarbeco-Rb [4]17156372648054
N-China-F [5]2326,2803822610,8731391714,987
N-China-R [5]1721751517157815
N-China-P [5]720146810
N-HK-F [5]514912747164
N-HK-R [5]14841412143545
N-JP-F [5]1066510916260
N-JP-P [5]9320511637
N-TL-F [5]171491841431136
N-TL-R [5]1711529776633
N-TL-P [5]114515135120
E-Sarbeco-F1c5230010922
E-Sarbeco-R2c418065160
E-Sarbeco-P1c9481296930
nCoV-IP2-12669Fwc3500171211010
nCoV-IP2-12759Rvc11739123244771681270
nCoV-IP2-12696bProbe(+)c817241640
nCoV-IP4-14059Fwc39007200
nCoV-IP4-14146Rvc1138779915
nCoV-IP4-14084Probe(+)c114931261954
RdRP-SARSr-F2d58921537440
RdRP-SARSr-R1d [4]34200200
RdRP-SARSr-P2d [4]410062200
ORF1ab-China-F [5]419042652
ORF1ab-China-R [5]00000000
ORF1ab-China-P [5]1461163011310
ORF1b-nsp14-HK-F [5]612216300
ORF1b-nsp14-HK-R [5]98939521465
ORF1b-nsp14-HK-P [5]63721913012
SC2-Fe1188053429137
SC2-Re00000000
NIID_WH-1_F501 [10]132550205251834
NIID_WH-1_R913 [10]1412819491842
NIID_WH-1_F509 [10]1030757632
NIID_WH-1_R854 [10]9261632533117518
NIID_WH-1_Seq F519 [10]19130889171132
NIID_WH-1_Seq R840 [10]126669218319
WuhanCoV-spk1-f [10]14433265221112384
WuhanCoV-spk1-r [10]410023122
NIID_WH-1_F24381 [10]204942753016153137
NIID_WH-1_R24873 [10]515143700
NIID_WH-1_Seq_F24383 [10]2150327530221531310
NIID_WH-1_Seq_R24865 [10]617245600

a https://www.fda.gov/media/136691/download

b https://www.eurosurveillance.org/content/table/10.2807/1560-7917.ES.2020.25.3.2000045.t1?fmt=ahah&fullscreen=true

c https://www.who.int/docs/default-source/coronaviruse/real-time-rt-pcr-assays-for-the-detection-of-sars-cov-2-institut-pasteur-paris.pdf?sfvrsn=3662fcb6_2

d https://www.who.int/docs/default-source/coronaviruse/protocol-v2-1.pdf?sfvrsn=a9ef618c_2

e https://www.cdc.gov/coronavirus/2019-ncov/lab/multiplex-primer-probes.html

Fig. 2

Illustration of mutation positions and frequencies on the primers and/or probes of RX7038-N1 primer (Fw), RX7038-N1 primer (Rv), RX7038-N2 primer (Fw), RX7038-N2 primer (Rv), RX7038-N3 primer (Fw), RX7038-N3 primer (Rv), N1-U.S.-P, N2-U.S.-P, N3-U.S.-P, N-Sarbeco-F.

Fig. 3

Illustration of mutation positions and frequencies on the primers and/or probes of N-Sarbeco-P, N-Sarbeco-R, N-China-F, N-China-R, N-China-P, N-HK-F, N-HK-R, N-JP-F, N-JP-P, N-TL-F.

Fig. 4

Illustration of mutation positions and frequencies on the primers and/or probes of N-TL-R, N-TL-P, E-Sarbeco-F1, E-Sarbeco-R2, E-Sarbeco-P1, nCoV-IP2-12669Fw, nCoV-IP2-12759Rv, nCoV-IP2-12696bProbe(+), nCoV-IP4-14059Fw, nCoV-IP4-14146Rv.

Fig. 5

Illustration of mutation positions and frequencies on the primers and/or probes of nCoV-IP4-14084Probe(+), RdRP-SARSr-F2, RdRP-SARSr-R1, RdRP-SARSr-P2, ORF1ab-China-F, ORF1ab-China-R, ORF1ab-China-P, ORF1b-nsp14-HK-F, ORF1b-nsp14-HK-R, ORF1b-nsp14-HK-P.

Fig. 6

Illustration of mutation positions and frequencies on the primers and/or probes of SC2-F, SC2-R, NIID_WH-1_F501, NIID_WH-1_R913, NIID_WH-1_F509, NIID_WH-1_R854, NIID_WH-1_Seq F519, NIID_WH-1_Seq R840, WuhanCoV-spk1-f, WuhanCoV-spk1-r, NIID_WH-1_F24381, NIID_WH-1_R24873, NIID_WH-1_Seq F24383, NIID_WH-1_Seq R24865.

N-China-F [5] is the most widely used reagent among all primers and probes, yet its target on the SARS-CoV-2 genome has 15 mutations involving thousands of samples, which may account for the low efficacy of certain COVID-19 diagnostic kits in China [11]. Note that primers and probes are typically only about 20 nucleotides long. Currently, most primers and probes used in the US target the N gene [5]. However, Table 2 shows that multiple mutations have been found on all of the targets of the US CDC-designated COVID-19 diagnostic primers. The targets of N gene primers and probes used in Japan, Thailand, and China, including Hong Kong, have also undergone multiple mutations involving many clusters. Therefore, the N gene may not be an optimal target for diagnostic kits, and current test kits targeting the N gene should be updated accordingly to maintain testing accuracy. So far, no mutation has been detected on the targets of ORF1ab-China-R and SC2-R, indicating that they are two relatively reliable diagnostic primers. In contrast, the target of nCoV-IP2-12759Rv, recommended by Institut Pasteur, Paris, has six mutations. Notably, the targets of the four E gene primers and probes have only six mutations. Overall, the targets of envelope (E) and RNA-dependent RNA polymerase (RdRP) based primers and probes have fewer mutations than those of the N gene, which suggests that the N gene is particularly prone to mutations.

Discussion

Mechanisms of mutation and mutation impact on diagnostics

The accumulation of viral mutations is driven by natural selection, polymerase fidelity, the cellular environment, features of recent epidemiology, random genetic drift, host immune responses, gene editing [12], replication mechanisms, and other factors [13,14]. SARS-CoV-2 has higher fidelity in its transcription and replication than other single-stranded RNA viruses because it has a proofreading mechanism regulated by NSP14 [15]. Nevertheless, 13,402 single mutations have been detected in 31,421 SARS-CoV-2 genome isolates. Due to technical constraints, genome sequencing is subject to errors, so some “mutations” might result from sequencing errors rather than actual mutations. Additionally, mRNA editing by the human immune system in defense against viral invasion, such as APOBEC editing [12], can create lethal mutations. Both cases may lead to single-nucleotide polymorphisms (SNPs) without a descendant. We report that, among all 31,421 genome isolates, 13,402 individual mutations have at least one descendant. It is well known that the sensitivity of diagnostic primers and probes depends on where mutations fall within their targets. Specifically, the beginning (5′) part of a primer or probe is not as important as its ending (3′) part; for a primer, the 3′ end is where the polymerase begins extension. A high-frequency mutation near the right (3′) end of a primer or probe target would therefore likely produce more false negatives. It is also important that PCR annealing temperatures are estimated from perfectly matched sequences [16], so they may not hold for primers whose targets carry significant mutations. Annealing temperatures for the primers and probes involving mutations are given in Tables S4–S56 of the Supporting Material.
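The 3′-end argument can be made concrete with a small helper. The sketch assumes a primer target given as an inclusive interval [start, end] of 1-based reference coordinates plus the primer's orientation: a forward primer's 3′ end sits at `end`, while a reverse primer, which anneals to the opposite strand, has its 3′ end at `start`. The function names and the five-nucleotide window are hypothetical choices for illustration, not values from this study.

```python
def distance_from_3prime(pos, start, end, strand):
    """Nucleotides between a mutated reference position and the primer's 3' end.

    The target spans [start, end] (1-based, inclusive) on the reference.
    0 means the mutation sits under the primer's 3'-terminal base.
    """
    if not start <= pos <= end:
        raise ValueError("position lies outside the primer target")
    if strand == "+":
        return end - pos      # forward primer: 3' end at the right
    if strand == "-":
        return pos - start    # reverse primer: 3' end at the left
    raise ValueError("strand must be '+' or '-'")

def flag_3prime_mutations(positions, start, end, strand, window=5):
    """Mutated positions within `window` nt of the 3' end (window is hypothetical)."""
    return sorted(p for p in positions
                  if start <= p <= end
                  and distance_from_3prime(p, start, end, strand) < window)
```

For a hypothetical 20-nt target spanning positions 100 to 119, mutations at 117 and 119 are flagged for a forward primer, whereas only a mutation at 101 would be flagged for a reverse primer over the same interval.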

Nucleotide-based diagnostic target optimization

Table 2 shows that the degree of mutation varies dramatically across diagnostic targets. It is therefore important to know how to select an optimal viral diagnostic target that avoids potential mutations. We discuss such target optimization via both nucleotide-based and gene-based mutation analysis. Fig. 7 illustrates the rates of the 12 different types of single-nucleotide mutations among the SNP variants of the 31,421 genome isolates. Notably, 51.4% of the mutations on SARS-CoV-2 are of the C>T type, owing to strong host-cell mRNA editing by APOBEC cytidine deaminases [12]. Therefore, researchers should avoid cytosine bases as much as possible when designing diagnostic test kits.
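The tally behind a chart like Fig. 7, together with a simple cytidine screen for candidate targets, can be sketched as follows. The input format (one reference/alternate base pair per SNP) and the `cytidine_fraction` screen are assumptions made for illustration; the study does not prescribe a specific cytidine threshold.

```python
from collections import Counter

BASES = "ACGT"
VALID = frozenset(BASES)
# The 12 possible single-nucleotide substitution types, e.g. "C>T".
MUTATION_TYPES = [f"{r}>{a}" for r in BASES for a in BASES if r != a]

def mutation_type_fractions(snvs):
    """Fraction of each substitution type among (ref, alt) base pairs.

    Pairs involving non-ACGT symbols (e.g. N) or identical bases are ignored.
    """
    counts = Counter(f"{r}>{a}" for r, a in snvs
                     if r in VALID and a in VALID and r != a)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid substitutions")
    return {t: counts[t] / total for t in MUTATION_TYPES}

def cytidine_fraction(target):
    """Share of C bases in a candidate target sequence (reference strand)."""
    seq = target.upper()
    return seq.count("C") / len(seq)
```

Ranking candidate target windows by `cytidine_fraction` is one simple way to act on the observation that C>T substitutions dominate.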
Fig. 7

The pie chart of the distribution of 12 different types of mutations.


Gene-based diagnostic target optimization

To further understand how to design the most reliable SARS-CoV-2 diagnostic targets, we carry out a gene-level mutation analysis. Fig. 8 and Table 3 present the mutation ratio, i.e., the number of unique single-nucleotide polymorphisms (SNPs) divided by the corresponding gene length, for each SARS-CoV-2 gene. A smaller mutation ratio indicates a higher degree of conservativeness. The ORF7b gene has the smallest mutation ratio (0.155), while the ORF7a gene has the largest (0.642). The N gene has the fourth-largest mutation ratio (0.558), close to those of the ORF3a gene (0.594, second-largest) and the ORF8 gene (0.559, third-largest). Additionally, the genes at the two ends of the SARS-CoV-2 genome, i.e., NSP1, NSP2, ORF10, the N gene, ORF8, ORF7a, and ORF6, have higher mutation ratios, with the exception of ORF7b. To account for mutation frequency, we introduce the mutation h-index, defined as the maximum value of h such that the given gene section has h single mutations that have each occurred at least h times. Larger genes tend to have a higher h-index. Fig. 8 shows that, despite its moderate length, the N gene has the second-largest h-index (44), close to the largest h-index of 47 for NSP3. Therefore, selecting SARS-CoV-2 N gene primers and probes as diagnostic reagents for combating COVID-19 is not an optimal choice. Moreover, a few primers and probes used in Japan are designed on the spike and NSP2 genes. However, the high mutation ratios and h-indices of the spike and NSP2 genes indicate that these diagnostic reagents may not perform well. Furthermore, we have developed a website, Mutation Tracker, that tracks the single mutations on 26 SARS-CoV-2 proteins and serves as an intuitive tool for identifying regions to avoid in future diagnostic test development.
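Both gene-level statistics are straightforward to compute. The sketch below takes the unique-SNP count and gene length for the mutation ratio, and a list of per-mutation occurrence counts for the mutation h-index; the function names are hypothetical. With the N gene values in Table 3 (701 unique SNPs over a gene length of 1257), the ratio rounds to 0.558.

```python
def mutation_ratio(unique_snps, gene_length):
    """Unique SNPs per nucleotide of gene length (smaller = more conserved)."""
    return unique_snps / gene_length

def mutation_h_index(frequencies):
    """Largest h such that h distinct mutations each occurred at least h times.

    frequencies: occurrence count of every unique single mutation in the gene.
    """
    h = 0
    # Rank mutations from most to least frequent; h grows while the
    # rank-th most frequent mutation still occurred at least `rank` times.
    for rank, freq in enumerate(sorted(frequencies, reverse=True), start=1):
        if freq < rank:
            break
        h = rank
    return h
```

For instance, a gene whose unique mutations occurred 10, 8, 5, 4, and 3 times has an h-index of 4, because four mutations occurred at least four times each but not five mutations at least five times.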
Fig. 8

Illustration of SARS-CoV-2 mutation ratio and mutation h-index on various genes. For each gene, its length is given in the mutation ratio bar, while the number of unique SNPs is given in the h-index bar.

Table 3

Gene-specific statistics of SARS-CoV-2 single mutations on 26 proteins.

Gene type | Gene site | Gene length | Unique SNPs | Mutation ratio | h-index
NSP1 | 266:805 | 540 | 273 | 0.506 | 19
NSP2 | 806:2719 | 1914 | 973 | 0.508 | 36
NSP3 | 2720:8554 | 5835 | 2626 | 0.450 | 47
NSP4 | 8555:10054 | 1500 | 604 | 0.403 | 25
NSP5(3CL) | 10,055:10972 | 918 | 353 | 0.385 | 22
NSP6 | 10,973:11842 | 870 | 348 | 0.400 | 22
NSP7 | 11,843:12091 | 249 | 99 | 0.398 | 12
NSP8 | 12,092:12685 | 594 | 242 | 0.407 | 14
NSP9 | 12,686:13024 | 339 | 135 | 0.398 | 13
NSP10 | 13,025:13441 | 417 | 147 | 0.353 | 11
NSP11 | 13,442:13480 | 39 | 11 | 0.282 | 4
RNA-dependent-polymerase | 13,442:16236 | 2796 | 1030 | 0.368 | 31
Helicase | 16,237:18039 | 1803 | 653 | 0.362 | 29
3′-to-5′ exonuclease | 18,040:19620 | 1581 | 706 | 0.447 | 27
endoRNAse | 19,621:20658 | 1038 | 476 | 0.459 | 19
2′-O-ribose methyltransferase | 20,659:21552 | 894 | 358 | 0.400 | 20
Spike protein | 21,563:25384 | 3819 | 1651 | 0.432 | 42
ORF3a protein | 25,393:26220 | 825 | 490 | 0.594 | 32
Envelope protein | 26,245:26472 | 225 | 95 | 0.422 | 13
Membrane glycoprotein | 26,523:27191 | 666 | 271 | 0.407 | 23
ORF6 protein | 27,202:27387 | 183 | 101 | 0.552 | 12
ORF7a protein | 27,394:27759 | 363 | 233 | 0.642 | 16
ORF7b protein | 27,756:27887 | 129 | 20 | 0.155 | 5
ORF8 protein | 27,894:28259 | 363 | 203 | 0.559 | 18
Nucleocapsid protein | 28,274:29533 | 1257 | 701 | 0.558 | 44
ORF10 protein | 29,558:29674 | 114 | 61 | 0.535 | 12

Conclusion

In summary, the targets of currently used COVID-19 diagnostic tests have numerous mutations that can affect diagnostic test accuracy in combating COVID-19. Continued surveillance of viral evolution and diagnostic test performance is needed, as the emergence of viral variants that are no longer detectable by certain diagnostic tests is a real possibility. A cocktail test kit is needed to mitigate the impact of mutations. We propose nucleotide-based and gene-based diagnostic target optimization to design the most reliable diagnostic targets. We analyze a full list of SNPs for all 31,421 genome isolates, including their positions and mutation types. This information, together with the ranking of the conservativeness of SARS-CoV-2 genes or proteins given in Table 3, enables researchers to avoid non-conservative genes (or their proteins) and mutated nucleotide segments when designing COVID-19 diagnostics, vaccines, and drugs.

Methods and materials

SARS-CoV-2 genome sequences from infected individuals dated between January 5, 2020, and July 23, 2020, are downloaded from the GISAID database [17] (https://www.gisaid.org/). We consider only GISAID records with complete genomes (>29,000 bp) and submission dates. The resulting 31,421 complete genome sequences are aligned to the reference SARS-CoV-2 genome [3] using Clustal Omega multiple sequence alignment with default parameters [18]. Gene variants are recorded as SNPs. The Jaccard distance [19] is employed to compute the dissimilarities among genome samples, and the resulting distance matrix is used in the k-means clustering of all genome samples.
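Once the genomes are aligned, SNPs can be read off column by column. The sketch below assumes that the reference row and one sample row of the alignment are available as equal-length strings with '-' for gaps, reports positions in ungapped reference coordinates, and skips gaps and ambiguous bases such as N. The function name and this filtering policy are assumptions, not documented steps of the authors' pipeline.

```python
VALID = frozenset("ACGT")

def call_snps(aligned_ref, aligned_sample):
    """List (position, ref_base, alt_base) SNPs of a sample vs. the reference.

    Positions are 1-based coordinates on the ungapped reference. Alignment
    columns containing gaps ('-') or non-ACGT symbols are skipped, so only
    single-nucleotide substitutions are reported.
    """
    if len(aligned_ref) != len(aligned_sample):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    ref_pos = 0
    for r, s in zip(aligned_ref.upper(), aligned_sample.upper()):
        if r != "-":
            ref_pos += 1  # coordinates advance only on reference bases
        if r in VALID and s in VALID and r != s:
            snps.append((ref_pos, r, s))
    return snps
```

For example, aligning the reference row "ACG-TAC" with the sample row "ATGGTNC" yields a single SNP, (2, "C", "T"); the insertion and the ambiguous N are ignored.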

Jaccard distance of SNP variants

The Jaccard distance measures the dissimilarity between SNP variants and is widely used in the phylogenetic analysis of human or bacterial genomes. Given two sets A and B (e.g., the SNP sets of two genome samples), we first define the Jaccard similarity coefficient

$$J(A,B)=\frac{|A\cap B|}{|A\cup B|},$$

and the Jaccard distance is defined as the difference between one and the Jaccard similarity coefficient:

$$d_J(A,B)=1-J(A,B)=\frac{|A\cup B|-|A\cap B|}{|A\cup B|}.$$
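The Jaccard similarity J(A, B) = |A ∩ B| / |A ∪ B| and the distance 1 − J(A, B) translate directly into code, with each sample represented as a set of SNPs. Returning a distance of 0 for two empty sets (two genomes identical to the reference) is a convention assumed here, since the ratio is undefined in that case; the function names are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two SNP sets (0.0 when both are empty)."""
    union = a | b
    if not union:
        return 0.0  # assumed convention: two reference-identical genomes
    return 1.0 - len(a & b) / len(union)

def jaccard_matrix(snp_sets):
    """Pairwise Jaccard distance matrix (list of rows) for a list of SNP sets."""
    n = len(snp_sets)
    return [[jaccard_distance(snp_sets[i], snp_sets[j]) for j in range(n)]
            for i in range(n)]
```

Two samples that share two of four distinct SNPs in total, for example, are at a Jaccard distance of 0.5.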

K-means clustering

As an unsupervised classification algorithm, the k-means clustering method partitions a given dataset $X=\{x_1, x_2, \dots, x_n, \dots, x_N\}$, $x_n \in \mathbb{R}^d$, into k different clusters $\{C_1, C_2, \dots, C_k\}$, $k \le N$, such that specific clustering criteria are optimized. The standard k-means procedure aims to obtain the optimal partition for a fixed number of clusters. First, k points are randomly picked as the cluster centers, and each data point is assigned to its nearest cluster. Next, the cluster centers are updated iteratively so as to minimize the within-cluster sum of squares (WCSS), defined as

$$\mathrm{WCSS}=\sum_{j=1}^{k}\sum_{x_n\in C_j}\left\|x_n-\mu_j\right\|_2^2,$$

where $\mu_j$ is the mean of the points located in the j-th cluster $C_j$, and $\|\cdot\|_2$ denotes the $L_2$ norm. Note that the k-means procedure described above finds the optimal partition only for a fixed number of clusters; seeking the best number of clusters for the SNP variants is essential as well. In this work, by varying the number of clusters k, a set of WCSS values can be plotted against the corresponding numbers of clusters, and the location of the elbow in this plot is taken as the optimal number of clusters. This procedure, known as the Elbow method, is frequently applied in k-means clustering problems. Specifically, we apply k-means clustering with the Elbow method to determine the optimal number of subtypes of SARS-CoV-2 SNP variants. The pairwise Jaccard distances between different SNP variants are used as the input features for the k-means clustering method.
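The procedure can be sketched in plain Python as Lloyd's iterations with k randomly chosen data points as initial centers, followed by the within-cluster sum of squares (WCSS) and an elbow pick. Choosing the elbow as the k with the largest second difference of the WCSS curve is an assumed heuristic standing in for visual inspection of the plot, and the function names are hypothetical. In the setting described above, each point would be one row of the pairwise Jaccard distance matrix.

```python
import random

def _sq_dist(p, q):
    """Squared Euclidean (L2) distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def _mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(coord) / len(points) for coord in zip(*points)]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with k randomly chosen data points as initial centers.

    Returns (labels, wcss), where wcss is the within-cluster sum of squares.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = _mean(members)
    wcss = sum(_sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wcss

def elbow_k(wcss_by_k):
    """Pick the k whose WCSS curve bends most (largest second difference).

    wcss_by_k maps consecutive cluster counts (at least three) to WCSS values.
    """
    ks = sorted(wcss_by_k)
    best_k, best_bend = ks[1], float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        bend = wcss_by_k[prev] - 2 * wcss_by_k[k] + wcss_by_k[nxt]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k
```

A typical use is to run `kmeans(rows, k)` for k = 1, 2, ..., collect the WCSS values into a dictionary, and pass it to `elbow_k`. Because Lloyd's iterations only reach a local optimum, repeating each run with several seeds and keeping the lowest WCSS is a common safeguard.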

Note added in proof

During the review of this manuscript, which was posted on arXiv [20], Khan et al. analyzed the presence of mutations/mismatches on 27 diagnostic assays [21]. In this interesting work, the authors showed the geographical distribution of the mismatches for N-China-F, N1-U.S.-P, and RX7038-N1 primer (Fw), revealing that variants from Europe are more likely to have mutations on N-China-F. Moreover, N1-U.S.-P and RX7038-N1 primer (Fw) are not suitable for samples from Asia and Oceania.

Data availability

The nucleotide sequences of the SARS-CoV-2 genomes used in this analysis are available, upon free registration, from the GISAID database (https://www.gisaid.org/). The Supporting Material presents a list of 54 commonly used diagnostic primers and probes and tables of mutation details for these 54 diagnostic primers and probes. The acknowledgments of the SARS-CoV-2 genomes are also given in the Supporting Material.

Declaration of Competing Interest

The authors declare that they have no conflict of interest.