Snawar Hussain, Pottathil Shinu, Mohammed Monirul Islam, Muhammad Shahzad Chohan, Sahibzada Tasleem Rasool.
Abstract
The Middle East Respiratory Syndrome (MERS) is an emerging disease caused by a recently identified human coronavirus (CoV). Over 2494 laboratory-confirmed cases and 858 MERS-related deaths have been reported from 27 countries. MERS-CoV has been associated with a high case fatality rate, especially in patients with pre-existing conditions. Despite the fatal nature of MERS-CoV infection, a comprehensive study of its evolution and adaptation in different hosts is lacking. We performed codon usage analyses on 4751 MERS-CoV genes and determined the underlying forces that shape codon usage bias in the MERS-CoV genome. The analyses revealed a low but highly conserved, gene-specific codon usage bias in the MERS-CoV genome. This bias is shaped mainly by natural selection, while mutational pressure emerged as a minor factor affecting codon usage in some genes. Other contributing factors included CpG dinucleotide bias, the physical and chemical properties of the encoded proteins, and gene length. The results reported in this study provide considerable insight into the molecular evolution of MERS-CoV and could serve as a theoretical basis for optimizing MERS-CoV gene expression to study the functional relevance of various MERS-CoV proteins. Alternatively, an attenuated vaccine strain containing hundreds of silent mutations could be engineered. Codon de-optimization would not affect the amino acid sequence or antigenicity of a vaccine strain, and the sheer number of mutations would make viral reversion to a virulent phenotype extremely unlikely.
Keywords: Middle East respiratory syndrome coronavirus; codon usage bias; mutational bias; natural selection; virus evolution
Year: 2020 PMID: 32425493 PMCID: PMC7218340 DOI: 10.1177/1176934320918861
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
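The vaccine idea in the abstract rests on synonymous recoding: each codon is replaced by a synonym that the host rarely uses, so the encoded protein stays the same while many nucleotides change. The Python sketch below illustrates only that principle. `deoptimize` and `translate` are hypothetical helpers, the few human RSCU values are copied from the Human column of the RSCU table in this document (written with T instead of U), and every codon missing from that partial table is assumed to have a neutral RSCU of 1.0.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI table 1), DNA alphabet, codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: partial human RSCU values taken from the Human column of the RSCU table.
# Codons not listed here are treated as neutral (RSCU = 1.0).
HUMAN_RSCU = {
    "TTT": 0.93, "TTA": 0.46, "TTG": 0.77, "CTT": 0.79, "CTC": 1.17,
    "CTA": 0.43, "GCT": 1.06, "GCA": 0.91, "GCG": 0.43,
}

# Group codons by the amino acid (or stop signal) they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def translate(cds):
    """Translate an in-frame DNA coding sequence; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def deoptimize(cds, host_rscu):
    """Swap every codon for the synonymous codon with the lowest host RSCU.

    Returns the recoded sequence and the number of silent substitutions.
    Stop codons and single-codon amino acids (Met, Trp) are left unchanged.
    """
    recoded, changes = [], 0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        if aa == "*" or len(SYNONYMS[aa]) == 1:
            new = codon
        else:
            new = min(SYNONYMS[aa], key=lambda c: host_rscu.get(c, 1.0))
        if new != codon:
            changes += 1
        recoded.append(new)
    return "".join(recoded), changes


original = "ATGTTCCTGGCCTAA"
recoded, n_silent = deoptimize(original, HUMAN_RSCU)
print(recoded, n_silent)  # ATGTTTCTAGCGTAA 3
assert translate(recoded) == translate(original)  # protein sequence unchanged
```

Choosing the single rarest synonym at every position is the simplest possible rule; the sketch does not model the other factors the study reports, such as CpG dinucleotide content.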
Figure 1. Compositional analysis of MERS-CoV genes. (A) G+C and A+T contents (mean ± SD) of the 10 MERS-CoV genes. (B) Percent GC at the first, second and third codon positions. MERS-CoV indicates Middle East Respiratory Syndrome Coronavirus; ORF, open reading frames.
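The composition measures in panels (A) and (B) can be sketched as follows, assuming an in-frame coding sequence without ambiguous bases; `gc_by_position` is a hypothetical helper. GC12, the mean of GC1 and GC2, is the quantity plotted against GC3 in the neutrality plot of Figure 3.

```python
def gc_by_position(cds):
    """Return (GC1, GC2, GC3, overall GC), in percent, for an in-frame CDS.

    Assumes an unambiguous A/C/G/T (or U) sequence; a trailing partial codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    n_codons = len(cds) // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    gc_counts = [0, 0, 0]  # G+C counts at codon positions 1, 2 and 3
    for i in range(n_codons * 3):
        if cds[i] in "GC":
            gc_counts[i % 3] += 1
    gc1, gc2, gc3 = (100 * c / n_codons for c in gc_counts)
    overall = 100 * sum(gc_counts) / (3 * n_codons)
    return gc1, gc2, gc3, overall


gc1, gc2, gc3, gc = gc_by_position("ATGGCCTTA")
gc12 = (gc1 + gc2) / 2  # y-axis value in a neutrality plot (GC12 vs GC3)
at = 100 - gc           # A+T content; valid only when no ambiguous bases are present
```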
Relative Synonymous Codon Usage (RSCU) patterns of MERS-CoV genes and of the virus's host species, human and camel.
| Codon (AA) | ORF1ab | S | ORF3 | ORF4a | ORF4b | ORF5 | E | M | N | ORF8b | MERS-CoV | Human | Camel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UUU(F) | | | | 0.675 | | | 0.752 | | | 0.999 | | 0.93 | 0.70 |
| UUC(F) | 0.717 | 0.577 | 0.598 | | 0.184 | 0.943 | | 0.393 | 0.983 | | 0.71 | | |
| UUA(L) | 1.283 | 0.956 | 0.624 | 0.860 | 0.829 | | | 0.859 | 0.013 | 1.009 | 1.13 | 0.46 | 0.30 |
| UUG(L) | 1.530 | 1.146 | 1.191 | | 1.035 | 0.215 | 1.635 | 0.571 | 0.461 | 0.762 | 1.28 | 0.77 | 0.54 |
| CUU(L) | | | | 0.857 | | 0.961 | 0.548 | 1.151 | | 0.758 | | 0.79 | 0.66 |
| CUC(L) | 0.701 | 0.613 | 1.790 | 0.857 | 0.829 | 0.854 | 0.545 | | 0.693 | 0.508 | 0.73 | 1.17 | 1.33 |
| CUA(L) | 0.375 | 0.647 | 0.000 | 0.857 | 0.828 | 1.043 | 0.545 | 1.141 | 0.459 | 1.271 | 0.51 | 0.43 | 0.55 |
| CUG(L) | 0.437 | 0.456 | 0.013 | 0.857 | 1.033 | 0.968 | 0.545 | 0.855 | 0.679 | | 0.53 | | |
| AUU(I) | | | | 0.912 | | | 0.811 | | | | | 1.08 | 0.84 |
| AUC(I) | 0.587 | 0.312 | 0.004 | | 0.923 | 0.548 | 0.000 | 0.626 | 0.913 | 0.000 | 0.57 | | |
| AUA(I) | 0.704 | 0.945 | 0.009 | 0.001 | 0.462 | 0.821 | | 0.315 | 0.232 | 1.200 | 0.71 | 0.51 | 0.33 |
| GUU(V) | | | | 1.995 | | | 0.799 | 1.089 | | 0.000 | | 0.72 | 0.57 |
| GUC(V) | 0.769 | 0.439 | 1.072 | 0.002 | 0.799 | 0.941 | 0.800 | 0.832 | 0.709 | 1.993 | 0.73 | 0.95 | 1.16 |
| GUA(V) | 0.761 | 0.418 | 0.366 | 0.000 | 0.204 | 0.806 | | | 1.160 | | 0.74 | 0.47 | 0.35 |
| GUG(V) | 0.757 | 0.547 | 0.731 | | 0.200 | 0.323 | 0.800 | 0.831 | 0.709 | 0.003 | 0.71 | | |
| UCU(S) | | | 2.214 | 1.559 | | | 0.004 | 1.202 | | 0.663 | | 1.13 | 1.02 |
| UCC(S) | 0.582 | 1.099 | 0.006 | 1.531 | 1.364 | 0.873 | 0.004 | | 0.816 | | 0.80 | 1.31 | |
| UCA(S) | 1.230 | 0.946 | | | 1.374 | 0.235 | | 1.209 | 1.373 | 0.000 | 1.20 | 0.92 | 0.66 |
| UCG(S) | 0.184 | 0.136 | 0.000 | 0.003 | 0.537 | 0.461 | 0.000 | 0.000 | 0.344 | 1.336 | 0.20 | 0.33 | 0.32 |
| AGU(S) | 1.482 | 0.980 | 0.498 | 0.512 | 0.747 | 0.691 | 2.996 | 1.196 | 1.039 | 0.001 | 1.23 | 0.90 | 0.66 |
| AGC(S) | 0.438 | 0.451 | 0.000 | 0.511 | 0.068 | 1.384 | 0.000 | 0.598 | 0.518 | 1.328 | 0.47 | | 1.51 |
| CCU(P) | | | | | | | 1.330 | 1.194 | 1.668 | 0.438 | | 1.14 | 1.06 |
| CCC(P) | 0.629 | 0.506 | 0.772 | 0.657 | 1.005 | 0.686 | | 0.402 | 0.588 | 0.438 | 0.61 | | |
| CCA(P) | 1.208 | 0.903 | 1.447 | 0.145 | 0.993 | 1.298 | 0.667 | | | | 1.28 | 1.10 | 0.98 |
| CCG(P) | 0.168 | 0.194 | 0.000 | 0.627 | 0.006 | 0.330 | 0.000 | 0.401 | 0.001 | 1.396 | 0.16 | 0.45 | 0.51 |
| ACU(T) | | | | | | | | | | 1.497 | | 0.98 | 0.93 |
| ACC(T) | 0.667 | 0.620 | 0.362 | 0.000 | 0.278 | 0.336 | 1.143 | 0.861 | 1.218 | 0.500 | 0.71 | | |
| ACA(T) | 1.205 | 1.214 | 0.739 | 1.714 | 0.836 | 0.673 | 1.143 | 0.286 | 0.670 | | 1.11 | 1.14 | 0.94 |
| ACG(T) | 0.196 | 0.043 | 0.000 | 0.000 | 0.382 | 0.334 | 0.571 | 0.569 | 0.131 | 0.003 | 0.18 | 0.46 | 0.34 |
| GCU(A) | | | | | 1.202 | | | | | 0.004 | | 1.06 | 1.07 |
| GCC(A) | 0.610 | 0.725 | 0.661 | 0.444 | 0.851 | 1.126 | 0.000 | 0.842 | 0.990 | 0.000 | 0.68 | | |
| GCA(A) | 1.009 | 0.815 | 1.335 | 0.454 | | 0.712 | 1.000 | 0.834 | 0.852 | | 0.96 | 0.91 | 0.66 |
| GCG(A) | 0.295 | 0.184 | 0.000 | 0.434 | 0.741 | 0.361 | 0.000 | 0.634 | 0.238 | 1.994 | 0.29 | 0.43 | 0.40 |
| UAU(Y) | | | | | 1.004 | | | 0.805 | 0.230 | 0.000 | | 0.89 | 0.72 |
| UAC(Y) | 0.700 | 0.764 | 0.499 | 0.847 | 0.996 | 0.433 | 0.667 | | | 0.004 | 0.79 | | |
| CAU(H) | | | | 0.668 | | 0.870 | 0.000 | | 1.009 | 0.997 | | 0.84 | 0.64 |
| CAC(H) | 0.651 | 0.900 | 0.000 | | 0.413 | | 0.009 | 0.667 | 0.991 | 1.003 | 0.71 | | |
| CAA(Q) | | | | 0.652 | | 0.683 | | 0.668 | | 0.992 | | 0.53 | 0.55 |
| CAG(Q) | 0.866 | 0.985 | 0.500 | | 0.500 | | 0.999 | | 0.667 | | 0.87 | | |
| AAU(N) | | | | | | | | | | | | 0.94 | 0.68 |
| AAC(N) | 0.580 | 0.570 | 0.008 | 0.801 | 0.600 | 0.857 | 0.666 | 0.386 | 0.937 | 0.687 | 0.63 | | |
| AAA(K) | 0.987 | | | 1.001 | 1.093 | | | | 1.038 | | | 0.87 | 0.84 |
| AAG(K) | | 0.787 | 0.000 | 0.999 | 0.907 | 0.667 | 0.000 | 0.571 | 0.962 | 0.781 | 0.98 | | |
| GAU(D) | | | | | | | | 0.667 | | 0.000 | | 0.93 | 0.74 |
| GAC(D) | 0.732 | 0.518 | 0.779 | 0.684 | 0.398 | 0.858 | 0.996 | | 0.503 | | 0.68 | | |
| GAA(E) | | 0.958 | 1.000 | 0.832 | 0.752 | 0.000 | 1.000 | 0.800 | 0.926 | | | 0.85 | 0.84 |
| GAG(E) | 0.947 | | 1.000 | | | | 1.000 | | | 0.800 | 0.97 | | |
| UGU(C) | | | | | 0.997 | | | 1.000 | 0.000 | 0.000 | | 0.91 | 0.76 |
| UGC(C) | 0.825 | 0.672 | 0.641 | 0.000 | | 0.751 | 0.000 | 1.000 | 0.000 | 0.000 | 0.76 | | |
| CGU(R) | | | 1.197 | 0.000 | 0.672 | | | 1.333 | 1.158 | 0.000 | | 0.48 | 0.40 |
| CGC(R) | 1.032 | 1.200 | 0.000 | 0.000 | 1.981 | 1.283 | 0.000 | 0.000 | 1.380 | | 1.140 | 1.10 | 0.99 |
| CGA(R) | 0.468 | 0.531 | 1.204 | 2.998 | 0.009 | 0.001 | 2.000 | 0.667 | 0.465 | 0.000 | 0.45 | 0.66 | 0.87 |
| CGG(R) | 0.379 | 0.400 | 0.000 | | 0.335 | 0.000 | 0.000 | 1.333 | 0.919 | 0.000 | 0.48 | 1.21 | 1.10 |
| AGA(R) | 1.321 | 0.667 | | 0.000 | | 1.286 | 2.000 | | | 0.000 | 1.38 | | 1.31 |
| AGG(R) | 0.958 | 0.533 | 1.204 | 0.000 | 0.998 | 0.000 | 0.000 | 0.668 | 0.232 | | 0.75 | 1.27 | |
| GGU(G) | | | | 0.800 | 1.003 | 1.600 | 0.012 | 1.620 | 1.160 | 1.091 | | 0.65 | 0.58 |
| GGC(G) | 0.906 | 1.493 | 0.000 | 0.800 | | 1.600 | | 1.290 | 0.740 | 0.364 | 1.02 | | |
| GGA(G) | 0.558 | 0.522 | 0.000 | | 0.992 | 0.799 | | 1.089 | | | 0.75 | 1.00 | 0.85 |
| GGG(G) | 0.295 | 0.175 | 0.000 | 0.000 | 0.000 | 0.000 | | 0.001 | 0.731 | 0.727 | 0.33 | 1.00 | 0.93 |
Abbreviations: MERS-CoV, Middle East Respiratory Syndrome Coronavirus; ORF, open reading frames.
Optimal codon for each amino acid is marked in bold.
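RSCU divides the observed count of a codon by the count expected if all synonymous codons for that amino acid were used equally, so 1 means no preference, values above 1 mean over-use and values below 1 mean under-use. A minimal per-sequence sketch follows (the table reports means over many sequences); `rscu` is a hypothetical helper, and Met, Trp and stop codons are excluded, which leaves the 59 codons listed in the table.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI table 1) in the DNA alphabet.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous codon families, excluding Met, Trp and stop codons (59 codons remain).
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        FAMILIES[aa].append(codon)


def rscu(cds):
    """RSCU of each codon: observed count / count expected under equal synonymous use."""
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence: RSCU undefined
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


values = rscu("TTTTTTTTCGCAGCA")
print(round(values["TTT"], 3), round(values["TTC"], 3), values["GCA"])  # 1.333 0.667 4.0
```

By construction the RSCU values within one family sum to the number of codons in that family (2 for Phe, 4 for Ala in this toy sequence).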
Figure 2. Correspondence analysis (COA) of synonymous codon usage in MERS-CoV genes. (A) Relative and cumulative inertia of the first 40 factors from a COA of the relative synonymous codon usage values (R. Iner., relative inertia; R. Sum, cumulative relative inertia). (B) Positions of each MERS-CoV gene on the first two principal axes. COA indicates correspondence analysis; MERS-CoV, Middle East Respiratory Syndrome Coronavirus; ORF, open reading frames.
The effective number of codons (ENC), Codon Adaptation Index (CAI) and Relative Codon Deoptimization Index (RCDI) of MERS-CoV genes.
| Index | ORF1ab | S | ORF3 | ORF4a | ORF4b | ORF5 | E | M | N | ORF8b | All genes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ENC | 49.259 ± 0.05 | 46.075 ± 0.163 | 41.02 ± 0.53 | 57.061 ± 1.175 | 51.099 ± 0.306 | 51.328 ± 0.325 | 56.007 ± 0.323 | 59.874 ± 0.332 | 49.63 ± 0.291 | 52.483 ± 0.49 | 51.203 ± 5.21 |
| CAI (human) | 0.718 ± 0 | 0.718 ± 0.001 | 0.693 ± 0.003 | 0.766 ± 0.002 | 0.715 ± 0.003 | 0.68 ± 0.003 | 0.688 ± 0.001 | 0.7 ± 0.002 | 0.737 ± 0.001 | 0.667 ± 0.003 | 0.708 ± 0.027 |
| CAI (camel) | 0.579 ± 0 | 0.571 ± 0.001 | 0.537 ± 0.003 | 0.643 ± 0.003 | 0.578 ± 0.003 | 0.548 ± 0.004 | 0.567 ± 0.001 | 0.568 ± 0.002 | 0.609 ± 0.001 | 0.571 ± 0.003 | 0.577 ± 0.028 |
| RCDI (human) | 1.399 ± 0.002 | 1.523 ± 0.006 | 2.071 ± 0.019 | 1.465 ± 0.015 | 1.489 ± 0.01 | 1.655 ± 0.013 | 1.759 ± 0.01 | 1.332 ± 0.011 | 1.386 ± 0.006 | 1.621 ± 0.02 | 1.568 ± 0.206 |
| RCDI (camel) | 1.639 ± 0.002 | 1.776 ± 0.007 | 2.421 ± 0.027 | 1.608 ± 0.018 | 1.723 ± 0.015 | 1.961 ± 0.019 | 2.161 ± 0.014 | 1.497 ± 0.014 | 1.531 ± 0.008 | 1.743 ± 0.019 | 1.805 ± 0.275 |
Abbreviation: MERS-CoV, Middle East Respiratory Syndrome Coronavirus; ORF, open reading frames.
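CAI scores how closely a gene's codons match the preferred codons of a reference (host) codon usage table: each codon's relative adaptiveness w is its reference count divided by that of the most used synonymous codon, and CAI is the geometric mean of w over the gene, ranging from 0 to 1. RCDI compares the within-family codon frequencies of the gene with those of the host; a value of 1 means host-like usage and larger values indicate stronger de-optimization. The sketch below follows one common formulation and is an assumption, not the paper's code: the function names are hypothetical, the reference is a plain `Counter` of host codon counts, Met, Trp and stop codons are excluded, codons never seen in the reference are skipped in CAI (many tools substitute a small pseudo-value instead), and RCDI assumes every codon in the gene also occurs in the host reference.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI table 1), DNA alphabet.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous families; Met, Trp and stop codons are excluded.
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        FAMILIES[aa].append(codon)
SENSE = {c for codons in FAMILIES.values() for c in codons}


def codon_counts(cds):
    """Count the in-frame codons of cds that belong to a synonymous family."""
    frame = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return Counter(c for c in frame if c in SENSE)


def cai(cds, ref_counts):
    """Codon Adaptation Index: geometric mean of relative adaptiveness w."""
    w = {}
    for codons in FAMILIES.values():
        best = max(ref_counts[c] for c in codons)
        for c in codons:
            if ref_counts[c] > 0:  # codons absent from the reference are skipped
                w[c] = ref_counts[c] / best
    used = {c: n for c, n in codon_counts(cds).items() if c in w}
    log_sum = sum(n * math.log(w[c]) for c, n in used.items())
    return math.exp(log_sum / sum(used.values()))


def synonymous_freq(counts):
    """Frequency of each codon within its synonymous family (0.0 if the family is unused)."""
    freq = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            freq[c] = counts[c] / total if total else 0.0
    return freq


def rcdi(cds, host_counts):
    """Relative Codon Deoptimization Index; 1.0 means host-like codon usage."""
    test = codon_counts(cds)
    n = sum(test.values())
    f_test, f_host = synonymous_freq(test), synonymous_freq(host_counts)
    return sum(f_test[c] / f_host[c] * k / n for c, k in test.items())


# Hypothetical host codon counts for Phe and Ala only (illustrative, not real data).
host = Counter({"TTC": 10, "TTT": 5, "GCC": 8, "GCT": 4, "GCA": 2, "GCG": 1})
print(cai("TTCGCC", host))   # 1.0: only the host's most used codons
print(cai("TTTGCT", host))   # about 0.5: w = 0.5 for both codons
print(rcdi("TTTGCT", host))  # about 3.375: codons the host uses less often
```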
Figure 3. Effects of mutational bias and natural selection on synonymous codon usage in MERS-CoV genes. (A) Relationship between GC3 and the effective number of codons (ENC). The ENC value of each gene is plotted against its GC3 content; the standard curve shows the expected codon usage if GC compositional constraints alone accounted for codon usage bias. (B) Neutrality plot of the average GC content at the first and second codon positions (GC12) against the GC content at the third position (GC3). Regression lines: ORF1ab, y = 0.114x + 0.3945 (R² = 0.0621); Spike, y = -0.1403x + 0.4954 (R² = 0.1369); ORF3, y = 0.3652x + 0.3951 (R² = 0.1549); ORF4a, y = -0.1602x + 0.5196 (R² = 0.2798); ORF4b, y = 0.0247x + 0.4309 (R² = 0.0015); ORF5, y = 0.0185x + 0.4161 (R² = 0.0105); Envelope, y = -0.016x + 0.4097 (R² = 0.0101); Matrix, y = 0.0545x + 0.4135 (R² = 0.0214); Nucleocapsid, y = 0.1358x + 0.4646 (R² = 0.1387); ORF8b, y = 0.0502x + 0.4572 (R² = 0.0005). MERS-CoV indicates Middle East Respiratory Syndrome Coronavirus; ORF, open reading frames.
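The standard curve in panel (A) is presumably Wright's expected ENC for a GC3s value s, ENC = 2 + s + 29/(s² + (1 − s)²), and in panel (B) the slope of the GC12-on-GC3 regression is conventionally read as the relative contribution of mutational pressure, with slopes near 0 pointing to selection. Under those assumptions, a minimal sketch of both calculations follows; `expected_enc` and `linear_fit` are hypothetical helpers, and the four (GC3, GC12) points are made-up toy values, not data from the study.

```python
def expected_enc(gc3s):
    """Wright's expected ENC when GC3s composition alone shapes codon usage (gc3s in 0..1)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept, R²)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


# Points for the standard curve of panel (A), GC3s from 0 to 1 in steps of 0.05.
curve = [(s / 100, expected_enc(s / 100)) for s in range(0, 101, 5)]
print(expected_enc(0.5))  # 60.5, the curve's maximum

# Panel (B): regress GC12 (y) on GC3 (x); toy values for illustration only.
slope, intercept, r2 = linear_fit([0.30, 0.35, 0.40, 0.45], [0.43, 0.44, 0.44, 0.45])
print(round(slope, 3), round(intercept, 3), round(r2, 3))  # 0.12 0.395 0.9
```

Read this way, the ORF1ab slope of 0.114 in the caption would attribute roughly 11% of the pattern to mutational pressure and the remainder to natural selection.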
Figure 4. Relative dinucleotide abundance in the MERS-CoV genome. (A) Line graph of the mean observed/expected (O/E) frequency ratios of the 16 dinucleotides. The mean ± standard deviation of the dinucleotide O/E ratios for the MERS-CoV coding sequences is 1.0 ± 0.144, and the dotted box marks the corresponding interval of 0.856 to 1.144; dinucleotides outside the box are under- or overrepresented in the MERS-CoV genome. (B) Odds ratios of CpG and GpC dinucleotides in MERS-CoV genes. (C) Loss of CpG dinucleotides and average gain of TpG and CpA dinucleotides in MERS-CoV genes. MERS-CoV indicates Middle East Respiratory Syndrome Coronavirus.
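The O/E ratio of a dinucleotide XY is its observed frequency divided by the frequency expected from its component nucleotides, f(XY) / (f(X) · f(Y)). A minimal sketch for a single unambiguous sequence follows; `dinucleotide_oe` and `classify` are hypothetical helpers, and the 0.856 and 1.144 cut-offs are the ones quoted in the Figure 4 caption.

```python
import math
from collections import Counter
from itertools import product

LOW, HIGH = 0.856, 1.144  # band quoted in the Figure 4 caption (mean ± SD)


def dinucleotide_oe(seq):
    """Observed/expected ratio f(XY) / (f(X) * f(Y)) for all 16 dinucleotides."""
    seq = seq.upper().replace("U", "T")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    ratios = {}
    for x, y in product("ACGT", repeat=2):
        f_x, f_y = mono[x] / len(seq), mono[y] / len(seq)
        f_xy = di[x + y] / (len(seq) - 1)
        # Undefined (NaN) when either nucleotide is absent from the sequence.
        ratios[x + y] = f_xy / (f_x * f_y) if f_x and f_y else math.nan
    return ratios


def classify(ratio):
    """Label a ratio as under-, over- or normally represented using the caption's band."""
    if math.isnan(ratio):
        return "undefined"
    if ratio < LOW:
        return "under"
    if ratio > HIGH:
        return "over"
    return "normal"


ratios = dinucleotide_oe("CGCG")
print(round(ratios["CG"], 3), classify(ratios["CG"]))  # 2.667 over
```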