Zhaobin Xu, Dongying Yang, Liyan Wang, Jacques Demongeot.
Abstract
The mortality rate of SARS-CoV-2 infection declined significantly in the early stage of the epidemic. We suspect that this sharp drop in viral toxicity is related to deletions in the untranslated regions (UTRs) of the viral genome. A large-scale sequence analysis showed that the SARS-CoV-2 genome was significantly truncated as a result of UTR deletion. Sequence similarity analysis further indicated that short-UTR strains originated from long-UTR ancestors through an irreversible deletion. Genome length correlated well with mortality, demonstrating that deletion of the viral UTR significantly affected the toxicity of the virus. This correlation was further confirmed by a significance analysis of the genetic influence on clinical outcomes. Viral genomes from hospitalized patients were significantly longer than those from asymptomatic patients, and viral genomes from asymptomatic patients were in turn considerably longer than those from ordinary symptomatic patients. A genome-wide mutation scan was performed to systematically evaluate the influence of mutations at each position on virulence. The results indicated that UTR deletion was the primary driving force altering viral virulence during early evolution. Finally, we propose a mathematical model to explain why this UTR deletion did not proceed continuously.
Keywords: SARS-COV-2; deletion of the untranslated region; nucleic acid degradation system; viral toxicity
Year: 2022 PMID: 36217240 PMCID: PMC9553139 DOI: 10.1080/21505594.2022.2132059
Source DB: PubMed Journal: Virulence ISSN: 2150-5594 Impact factor: 5.428
Figure 1a. (A) Boxplot of the SARS-CoV-2 genome length distribution at different time points. The horizontal bar inside each box marks the median, and the box spans the interquartile range (the middle 50% of the data). The whiskers extend from either side of the box and cover the bottom 25% and the top 25% of the data values, excluding outliers. (B) Mean and standard deviation of SARS-CoV-2 genome length at different time points. Red circles mark the mean, and the red ranges show the standard deviation of genome length in each month. The emergence times of Delta and Omicron are also marked.
Figure 1b. (Continued).
Statistical characteristics of the mutation scores of two genome-length groups.
| Pair-wise sequence comparison | Among 29903 nt | Between 29903 nt and 29782 nt | Among 29782 nt |
|---|---|---|---|
| Mean Value | 8.4651 | 10.0457 | 10.9576 |
| Standard Deviation | 4.8509 | 5.5880 | 6.3711 |
| Max Value | 40 | 54 | 52 |
| Min Value | 0 | 0 | 0 |
| Sample Size | 15753 | 19224 | 5778 |
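The three sample sizes in this table are consistent with all pairwise comparisons within and between two groups of genomes. The short check below is only a sketch: the group sizes of 178 and 108 are inferred from the counts and are not stated in the text.

```python
from math import comb

# Hypothetical group sizes, inferred from the sample sizes in the table
# (they are not given in the source text).
n_long = 178   # assumed number of 29903 nt genomes
n_short = 108  # assumed number of 29782 nt genomes

pairs_among_long = comb(n_long, 2)    # unordered pairs within the long group
pairs_between = n_long * n_short      # one long genome paired with one short genome
pairs_among_short = comb(n_short, 2)  # unordered pairs within the short group

print(pairs_among_long, pairs_between, pairs_among_short)
```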
Figure 2. SARS-CoV-2 mortality calculated using two different approaches. The red line shows the mortality calculated directly from death data. The blue band shows the mortality calculated after transformation.
Pearson correlation between genome length and death rate at different threshold sets.
| Threshold set | Pearson correlation coefficient |
|---|---|
| ≥29850 nt | 0.8191 |
| ≥29855 nt | 0.8082 |
| ≥29860 nt | 0.7980 |
| ≥29865 nt | 0.8371 |
| ≥29870 nt | 0.8508 |
| ≥29875 nt | 0.8217 |
| ≥29880 nt | 0.8351 |
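A correlation of this kind can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that each threshold keeps only genomes at least that long, that genome length is averaged per month, and that the monthly mean is correlated with the monthly death rate. The function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_at_threshold(lengths_by_month, death_rate_by_month, threshold):
    """Correlate the monthly mean length of genomes >= threshold with mortality."""
    mean_lengths, rates = [], []
    for month, lengths in lengths_by_month.items():
        kept = [n for n in lengths if n >= threshold]  # assumed filtering rule
        if kept:
            mean_lengths.append(mean(kept))
            rates.append(death_rate_by_month[month])
    return pearson(mean_lengths, rates)

# Made-up toy data, for illustration only.
lengths_by_month = {
    "2020-01": [29903, 29903, 29891, 29840],
    "2020-02": [29903, 29882, 29870, 29782],
    "2020-03": [29882, 29870, 29860, 29782],
    "2020-04": [29870, 29860, 29855, 29782],
}
death_rate_by_month = {
    "2020-01": 0.060, "2020-02": 0.045, "2020-03": 0.030, "2020-04": 0.020,
}

r = correlation_at_threshold(lengths_by_month, death_rate_by_month, 29850)
print(round(r, 4))
```

Repeating the call for each threshold in the table would give one coefficient per row.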
Figure 3. Genome length distribution of SARS-CoV-2 in three types of patients. The red box represents symptomatic patients, the blue box hospitalized patients, and the green box asymptomatic patients.
Heterogeneity test of SARS-CoV-2 genome length among patients with different symptoms.
| P-value | Virus genome length in hospitalized patients | Virus genome length in asymptomatic patients | Virus genome length in symptomatic patients |
|---|---|---|---|
| Virus genome length in hospitalized patients | 0.5311 | 6.2923e-41 | 0 |
| Virus genome length in asymptomatic patients | 6.2923e-41 | 0.5395 | 6.3969e-117 |
| Virus genome length in symptomatic patients | 0 | 6.3969e-117 | 0.8739 |
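The text here does not name the statistical test behind these P-values. As a stand-in, the sketch below uses a two-sided permutation test on the difference of group means, which is one common way to compare two samples of genome lengths. The function name and the toy data are hypothetical, and the resulting numbers are not comparable to the table.

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Made-up genome lengths (nt), for illustration only.
hospitalized = [29903, 29901, 29899, 29903, 29895, 29900, 29902, 29898]
symptomatic = [29860, 29855, 29870, 29782, 29865, 29858, 29850, 29862]

p_separated = permutation_p_value(hospitalized, symptomatic)
p_same = permutation_p_value(hospitalized, hospitalized)
print(p_separated, p_same)
```

With a finite number of permutations the smallest reportable P-value is 1 / (n_permutations + 1), so a permutation test cannot return the exact zeros shown in the table.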
Figure 4a. (A) Conservation frequency of each locus. (B) Pearson correlation between the mutation frequency at each locus and mortality. (C) Significance (P-value) of the ratio between deceased patients and asymptomatic patients, calculated by the chi-square test. In all panels, the position of a locus is (y coordinate − 1) × 1000 + (x coordinate), and the UTRs are marked with red rectangles.
Figure 4b. (Continued).
Figure 4c. (Continued).
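The coordinate rule in the Figure 4 caption can be written out directly. The sketch below is illustrative only: the function names are hypothetical, and the inverse mapping assumes that the x coordinate runs from 1 to 1000, which the caption implies but does not state.

```python
def locus_position(x, y):
    """Genome position from 1-based plot coordinates: (y - 1) * 1000 + x."""
    return (y - 1) * 1000 + x

def plot_coordinates(position):
    """Inverse mapping, assuming x runs from 1 to 1000 (not stated in the caption)."""
    y_minus_1, x_minus_1 = divmod(position - 1, 1000)
    return x_minus_1 + 1, y_minus_1 + 1

# Locus 11285 from the table of selected loci sits at x = 285 in row y = 12.
print(plot_coordinates(11285))
print(locus_position(285, 12))
```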
Loci that meet all three thresholds. The mutation frequency threshold is 0.2, the Pearson correlation threshold is 0.2, and the chi-square significance threshold is 0.01.
| Position | Conservation score | Pearson correlation | P-value (chi-square test between the two extreme symptom groups) | Inside UTR (Y/N) | Location in secondary structure | |
|---|---|---|---|---|---|---|
| 1 | 0.0927 | 0.6907 | 0 | Y | ||
| 2 | 0.1059 | 0.6425 | 0 | Y | ||
| 3 | 0.1216 | 0.596 | 0 | Y | ||
| 4 | 0.1278 | 0.6001 | 0 | Y | ||
| 5 | 0.1488 | 0.5076 | 0.0004 | Y | ||
| 6 | 0.1563 | 0.476 | 0.0029 | Y | ||
| 7 | 0.1662 | 0.4416 | 0.0065 | Y | SL1 | |
| 8 | 0.177 | 0.4307 | 0.0018 | Y | SL1 | |
| 9 | 0.1817 | 0.3974 | 0.0016 | Y | SL1 | |
| 11 | 0.1935 | 0.3672 | 0.0095 | Y | SL1 | |
| 24 | 0.2732 | 0.6458 | 0.0044 | Y | SL1 | |
| 25 | 0.2871 | 0.653 | 0.0001 | Y | SL1 | |
| 26 | 0.3205 | 0.6346 | 0 | Y | SL1 | |
| 27 | 0.3251 | 0.6346 | 0 | Y | SL1 | |
| 28 | 0.3308 | 0.6381 | 0 | Y | SL1 | |
| 29 | 0.3345 | 0.6367 | 0 | Y | SL1 | |
| 30 | 0.343 | 0.6325 | 0.0008 | Y | SL1 | |
| 33 | 0.4557 | 0.4655 | 0 | Y | SL1 | |
| 34 | 0.4702 | 0.4803 | 0 | Y | SL1 | |
| 35 | 0.4784 | 0.4845 | 0 | Y | ||
| 36 | 0.4955 | 0.4704 | 0 | Y | ||
| 37 | 0.5118 | 0.4479 | 0 | Y | ||
| 38 | 0.5261 | 0.4075 | 0 | Y | ||
| 39 | 0.5685 | 0.3208 | 0 | Y | ||
| 40 | 0.5778 | 0.3094 | 0 | Y | ||
| 41 | 0.5804 | 0.2977 | 0 | Y | ||
| 42 | 0.5838 | 0.2875 | 0 | Y | ||
| 43 | 0.5909 | 0.2652 | 0 | Y | ||
| 44 | 0.5922 | 0.2603 | 0 | Y | ||
| 45 | 0.5968 | 0.2476 | 0 | Y | ||
| 46 | 0.5981 | 0.2543 | 0 | Y | ||
| 47 | 0.5968 | 0.2499 | 0 | Y | SL2 | |
| 48 | 0.6038 | 0.2831 | 0 | Y | SL2 | |
| 49 | 0.6055 | 0.2763 | 0 | Y | SL2 | |
| 50 | 0.616 | 0.2743 | 0 | Y | SL2 | |
| 51 | 0.6188 | 0.2557 | 0 | Y | SL2 | |
| 52 | 0.6215 | 0.2546 | 0 | Y | SL2 | |
| 53 | 0.6226 | 0.26 | 0 | Y | SL2 | |
| 54 | 0.6248 | 0.2498 | 0 | Y | SL2 | |
| 11285 | 0.7988 | 0.3803 | 0 | N | ||
| 11287 | 0.7994 | 0.3802 | 0 | N | ||
| 11289 | 0.7974 | 0.3805 | 0 | N | ||
| 11291 | 0.7967 | 0.3811 | 0 | N | ||
| 11293 | 0.7982 | 0.3802 | 0 | N | ||
| 11294 | 0.7964 | 0.3802 | 0 | N | ||
| 11295 | 0.7957 | 0.3798 | 0 | N | ||
| 11296 | 0.7938 | 0.3806 | 0 | N | ||
| 29831 | 0.6291 | 0.2119 | 0 | Y | ||
| 29833 | 0.6499 | 0.2084 | 0 | Y | ||
| 29834 | 0.6521 | 0.2188 | 0 | Y | ||
| 29839 | 0.6047 | 0.3783 | 0 | Y | S2 | |
| 29843 | 0.592 | 0.3198 | 0 | Y | S2 | |
| 29848 | 0.5475 | 0.2309 | 0 | Y | S2 | |
| 29849 | 0.5271 | 0.2467 | 0 | Y | S2 | |
| 29853 | 0.5846 | 0.3191 | 0 | Y | ||
| 29854 | 0.4954 | 0.2589 | 0 | Y | ||
| 29855 | 0.4692 | 0.2707 | 0 | Y | ||
| 29858 | 0.4484 | 0.2083 | 0 | Y | ||
| 29860 | 0.3515 | 0.4457 | 0 | Y | ||
| 29861 | 0.2771 | 0.3218 | 0 | Y | S3-B | |
| 29862 | 0.3262 | 0.4985 | 0 | Y | S3-B | |
| 29863 | 0.3095 | 0.6174 | 0 | Y | S3-B | |
| 29864 | 0.46 | 0.6373 | 0 | Y | S3-B | |
| 29865 | 0.2757 | 0.459 | 0 | Y | S3-B | |
| 29866 | 0.4353 | 0.5005 | 0 | Y | S3-B | |
| 29867 | 0.5573 | 0.4274 | 0 | Y | S3-B | |
| 29868 | 0.4415 | 0.7173 | 0 | Y | S3-B | |
| 29869 | 0.2372 | 0.5257 | 0 | Y | S3-B | |
| 29890 | 0.2015 | 0.2203 | 0.0044 | Y | ||
| 29891 | 0.2017 | 0.2067 | 0.002 | Y | ||
| 29892 | 0.2034 | 0.2253 | 0.0032 | Y | ||
| 29893 | 0.205 | 0.2399 | 0.0022 | Y | ||
| 29894 | 0.2069 | 0.2409 | 0.0013 | Y | ||
| 29895 | 0.2084 | 0.25 | 0.001 | Y | ||
| 29896 | 0.211 | 0.2599 | 0.0003 | Y | ||
| 29897 | 0.2121 | 0.2469 | 0.0002 | Y | ||
| 29898 | 0.2159 | 0.2571 | 0.0001 | Y | ||
| 29899 | 0.2211 | 0.256 | 0 | Y | ||
| 29900 | 0.2504 | 0.3796 | 0 | Y | ||
| 29901 | 0.2585 | 0.3614 | 0 | Y | ||
| 29902 | 0.3304 | 0.2282 | 0 | Y | ||
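The three-threshold selection can be sketched as a simple filter. The text does not say how mutation frequency relates to the tabulated conservation score, so this sketch assumes mutation frequency = 1 − conservation score; every row in the table is consistent with that reading, but it remains an assumption. The `Locus` class, the function name, and the last sample row are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Locus:
    position: int
    conservation: float  # conservation score from the table
    pearson_r: float     # correlation of mutation frequency with mortality
    chi2_p: float        # chi-square P-value between the two extreme groups

def passes_thresholds(locus, min_mutation_freq=0.2, min_pearson=0.2, max_p=0.01):
    # Assumption: mutation frequency is the complement of the conservation score.
    mutation_freq = 1.0 - locus.conservation
    return (
        mutation_freq >= min_mutation_freq
        and locus.pearson_r >= min_pearson
        and locus.chi2_p < max_p
    )

rows = [
    Locus(1, 0.0927, 0.6907, 0.0),          # from the table
    Locus(11296, 0.7938, 0.3806, 0.0),      # from the table
    Locus(29890, 0.2015, 0.2203, 0.0044),   # from the table
    Locus(15000, 0.9500, 0.0500, 0.4000),   # hypothetical locus that fails all three
]

selected = [locus.position for locus in rows if passes_thresholds(locus)]
print(selected)
```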
Figure 5. Three fates of genomic RNA in our mathematical model. First, it may be degraded and eliminated in the host cell if it does not pass the survival threshold. Second, it may be trimmed into a genome with a shorter UTR under the pressure of the human RNA degradation system; the shorter genome is depicted as a short solid green line. Third, it may replicate into two offspring, with the template marked as a solid blue line and the new strand as a solid red line. Replication is triggered once the elapsed time exceeds the replication cycle.
Figure 6a. (A) Distribution of UTR deletion size at different generations under the undifferentiated attenuation model. (B) Distribution of UTR deletion size at different generations when the deletion probability is reduced at certain bottleneck points. In both panels, the 50th, 100th, 200th, 300th and 500th generations were selected to analyse the degree of UTR deletion and are drawn as red, green, blue, black and cyan lines, respectively.
Figure 6b. (Continued).
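The captions of Figures 5 and 6 describe a generational model with degradation, UTR trimming and replication, with and without bottleneck points. The toy simulation below is a rough sketch of that idea and not the authors' model: all parameter values are hypothetical, each deletion event removes exactly one nucleotide, a surviving genome may be trimmed and replicate in the same generation, and the population is capped by random sampling.

```python
import random

def simulate(generations, bottlenecks=(), reduction=1.0, p_delete=0.05,
             p_degrade=0.1, capacity=200, seed=0):
    """Return the UTR deletion sizes (nt) of genomes alive after `generations`."""
    rng = random.Random(seed)
    population = [0] * capacity  # every genome starts with an intact UTR
    for _ in range(generations):
        offspring = []
        for deleted in population:
            # Fate 1: the genome is degraded and leaves no offspring.
            if rng.random() < p_degrade:
                continue
            # Fate 2: one more nucleotide is trimmed; at a bottleneck point
            # the deletion probability is scaled by `reduction`.
            p = p_delete * (reduction if deleted in bottlenecks else 1.0)
            if rng.random() < p:
                deleted += 1
            # Fate 3: the surviving genome replicates into two copies.
            offspring.extend([deleted, deleted])
        # Keep the population bounded by sampling at most `capacity` genomes.
        population = rng.sample(offspring, min(capacity, len(offspring)))
    return population

# Undifferentiated model versus a model with a hard bottleneck at 5 nt.
uniform = simulate(generations=100)
blocked = simulate(generations=100, bottlenecks={5}, reduction=0.0)
print(max(uniform), max(blocked))
```

With `reduction=0.0`, no genome can move past the bottleneck, so deletion sizes pile up at that point instead of growing with every generation. Values of `reduction` between 0 and 1 only slow the passage.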