Qinglan Sun1,2, Chang Shu1,3, Wenyu Shi1,2, Yingfeng Luo1,3,4, Guomei Fan1,2, Jingyi Nie1,3,4, Yuhai Bi5, Qihui Wang5, Jianxun Qi5, Jian Lu6, Yuanchun Zhou7, Zhihong Shen7, Zhen Meng7, Xinjiao Zhang1,2, Zhengfei Yu1,2, Shenghan Gao1,3, Linhuan Wu1,2, Juncai Ma1,3,2, Songnian Hu1,3,4.
Abstract
Genomic variations of SARS-CoV-2 continue to emerge and spread worldwide. Some mutant strains show increased transmissibility and virulence, which may reduce the protection provided by vaccines. Thus, it is necessary to continuously monitor and analyze the genomic variations of SARS-CoV-2. We established the SARS-CoV-2 variations evaluation and prewarning system (VarEPS), which covers known and virtual mutations of SARS-CoV-2 genomes, to achieve rapid evaluation of the risks posed by mutant strains. From the perspective of genomics and structural biology, the database comprehensively analyzes the effects of known and virtual variations on physicochemical properties, translation efficiency, secondary structure, and binding capacity to ACE2 and neutralizing antibodies. An AI-based algorithm was used to verify the effectiveness of these genomic and structural biology features for risk prediction. This classifier could be further used to group viral strains by their transmissibility and affinity to neutralizing antibodies. This unique resource makes it possible to quickly evaluate the variation risks of key sites and to guide the research and development of vaccines and drugs. The database is freely accessible at www.nmdc.cn/ncovn.
Year: 2022 PMID: 34634813 PMCID: PMC8728250 DOI: 10.1093/nar/gkab921
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Features of the variations evaluation and prewarning system (VarEPS) portal. We show a global distribution of genome sequences by time frame and geography. The risk level and frequency of characteristic variants of each lineage are listed. Users can submit a sequence for variation analysis directly on the homepage.
Figure 2. Statistics of nucleotide mutation numbers in SARS-CoV-2 genomes. (A) Histogram of the mutation count at all nucleotide positions. Red, orange and green bars refer to the frequency of mutation counts below 10, below 50 and above 2600, respectively. (B) Histogram of the total mutation count per strain. The heatmap shows the distribution of total mutation count in each month. Mutation counts accumulate over time and correlate with lineages.
Variants of SARS-CoV-2 genome and most common variants located on S-RBD.
| No. | Whole-genome high-frequency variant | Count | Frequency | RBD high-frequency variant | Count | Frequency |
|---|---|---|---|---|---|---|
| 1 | S:D614G | 2467291 | 95.85% | S:N501Y | 1200001 | 46.62% |
| 2 | ORF1ab:P314L | 2437459 | 94.69% | S:L452R | 269897 | 10.49% |
| 3 | N:R203K | 1434386 | 55.72% | S:T478K | 206979 | 8.04% |
| 4 | N:G204R | 1432232 | 55.64% | S:E484K | 151017 | 5.87% |
| 5 | S:N501Y | 1200001 | 46.62% | S:S477N | 68895 | 2.68% |
| 6 | S:P681H | 1178130 | 45.77% | S:K417T | 57507 | 2.23% |
| 7 | ORF1ab:T1001I | 1125623 | 43.73% | S:K417N | 33585 | 1.30% |
| 8 | S:D1118H | 1124839 | 43.70% | S:N439K | 33447 | 1.30% |
| 9 | S:A570D | 1122643 | 43.61% | S:S494P | 12880 | 0.50% |
| 10 | S:T716I | 1122555 | 43.61% | S:F490S | 7757 | 0.30% |
| 11 | ORF8:Y73C | 1120251 | 43.52% | S:E484Q | 7179 | 0.28% |
| 12 | ORF1ab:A1708D | 1118801 | 43.46% | S:A520S | 5443 | 0.21% |
| 13 | N:S235F | 1118673 | 43.46% | S:N440K | 4610 | 0.18% |
| 14 | S:S982A | 1116061 | 43.36% | S:A522S | 4436 | 0.17% |
| 15 | ORF8:R52I | 1113847 | 43.27% | S:N501T | 4194 | 0.16% |
| 16 | N:D3L | 1112519 | 43.22% | S:L452Q | 3704 | 0.14% |
| 17 | ORF1ab:I2230T | 1099897 | 42.73% | S:V367F | 2499 | 0.10% |
| 18 | ORF3a:Q57H | 456450 | 17.73% | S:R346K | 2357 | 0.09% |
| 19 | ORF1ab:E265I | 365975 | 14.22% | S:P384L | 2253 | 0.09% |
| 20 | S:L452R | 269897 | 10.49% | S:R346S | 2188 | 0.09% |
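The table reports counts and frequencies but not the total number of genomes behind them. As a rough consistency check, the denominator can be back-calculated from one count/frequency pair and used to recompute the others. This is only a sketch: it assumes every frequency in the table shares the same denominator, which the record does not state, and the function names are illustrative.

```python
# Counts and reported frequencies (percent) copied from the table.
reported = {
    "S:D614G": (2467291, 95.85),
    "S:N501Y": (1200001, 46.62),
    "S:L452R": (269897, 10.49),
}


def implied_total(count: int, percent: float) -> float:
    """Back-calculate the genome total implied by one count/frequency pair."""
    return count / (percent / 100.0)


def frequency(count: int, total: float) -> float:
    """Return a variant frequency as a percentage of all genomes."""
    return 100.0 * count / total


# Assumption: the denominator is estimated from the most common variant only.
total_estimate = implied_total(*reported["S:D614G"])

for name, (count, percent) in reported.items():
    recomputed = frequency(count, total_estimate)
    print(f"{name}: reported {percent:.2f}%, recomputed {recomputed:.2f}%")
```

Because the reported percentages are rounded to two decimals, the recomputed values can differ from them in the last digit.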
Figure 3. Binding stability to ACE2 and antibody affinity risk level for key mutations on S-RBD. Risk levels of reduced antibody affinity for 15 antibodies were calculated. The risk levels of antibody affinity and increased binding stability to ACE2 are ranked 0 to 2. The frequencies of these variants over time are provided.
Figure 4. Binding stability to ACE2 and antibody affinity risk level for key known mutations and virtual mutations on S-RBD. A red dot indicates increased binding stability to ACE2. Overall risk levels of reduced antibody affinity for 15 antibodies are ranked 0 to 3. Both known and virtual mutations were evaluated.
Figure 5. Nucleotide mismatch statistics for primers. Nucleotide mismatches were compared for the 3′ end of primers. The number of strains for each lineage was calculated.
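To illustrate the kind of comparison the Figure 5 caption describes, the sketch below counts mismatches between a primer and its binding site within a window at the 3′ end. The record does not say how VarEPS performs this check, so the window size, the function name, and the sequences are all assumptions; the sequences are toy strings, not real SARS-CoV-2 primers.

```python
def three_prime_mismatches(primer: str, site: str, window: int = 5) -> int:
    """Count mismatches in the last `window` bases of a primer.

    Both strings are read 5'->3' in the same orientation and must already be
    aligned to equal length. The default window of 5 is an assumption.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    primer, site = primer.upper(), site.upper()
    start = max(0, len(primer) - window)
    return sum(1 for i in range(start, len(primer)) if primer[i] != site[i])


# Toy data: one primer and the matching genome region from three strains.
primer = "ACGTTGCAAGCTTAGC"
strain_sites = {
    "strain_a": "ACGTTGCAAGCTTAGC",  # identical to the primer
    "strain_b": "ACGTTGCAAGCTTAGT",  # mismatch at the very last base
    "strain_c": "ACCTTGCAAGCTAAGC",  # one mismatch outside the window, one inside
}

# Strains whose binding site differs from the primer near the 3' end.
affected = [
    name
    for name, site in strain_sites.items()
    if three_prime_mismatches(primer, site) > 0
]
print(affected)
```

Grouping the affected strains by lineage would then give per-lineage counts like those the caption mentions.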
Figure 6. Schematic representation of VarEPS for data processing and online analysis service. SARS-CoV-2 genome sequences were integrated and subjected to metadata curation and quality control. Sequence data were mapped to the reference genome for variation annotation. Each annotated variant was used to calculate effects on translation efficiency, secondary structure, binding capacity to ACE2 and neutralizing antibodies, and efficacy of primers. Our web portal provides multiple query options to display results on both known and virtual mutations. The system also provides an online analysis service for user-submitted sequences.
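The variation annotation step produces labels such as S:D614G: gene, reference amino acid, codon position, and mutant amino acid. The sketch below shows one simple way to derive such labels from two aligned coding sequences using the standard genetic code. It is not the VarEPS pipeline; the function name and the toy sequences are assumptions, and it handles only substitutions in equal-length sequences made of A, C, G and T.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def annotate_variants(gene: str, ref_cds: str, alt_cds: str) -> list[str]:
    """Return amino-acid substitution labels, e.g. 'S:D614G'.

    Compares two aligned coding sequences codon by codon. Synonymous
    changes are skipped; insertions and deletions are not supported.
    """
    ref_cds, alt_cds = ref_cds.upper(), alt_cds.upper()
    if len(ref_cds) != len(alt_cds) or len(ref_cds) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    labels = []
    for start in range(0, len(ref_cds), 3):
        ref_aa = CODON_TABLE[ref_cds[start:start + 3]]
        alt_aa = CODON_TABLE[alt_cds[start:start + 3]]
        if ref_aa != alt_aa:
            labels.append(f"{gene}:{ref_aa}{start // 3 + 1}{alt_aa}")
    return labels


# Toy example: codon 2 changes GAT (D) to GGT (G); codon 3 changes
# TTT to TTC, which is synonymous (both F) and is therefore not reported.
print(annotate_variants("toy", "ATGGATTTT", "ATGGGTTTC"))
```

A real pipeline would first align each genome to the reference and extract the coding region of every gene, which this sketch takes as given.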