The first study on analysis of the codon usage bias and evolutionary analysis of the glycoprotein envelope E2 gene of seven Pestiviruses.

Mohammad Shueb1, Shashanka K Prasad1, Kuralayanapalya Puttahonnappa Suresh2, Uma Bharathi Indrabalan2, Mallikarjun S Beelagi2, Chandan Shivamallu1, Ekaterina Silina3, Victor Stupin3, Natalia Manturova3, Shiva Prasad Kollur4, Bibek Ranjan Shome2, Raghu Ram Achar5, Sharanagouda S Patil2.   

Abstract

Background and Aim: Pestivirus, a genus of the family Flaviviridae, comprises viruses that infect bovines, sheep, and pigs. Infected animals develop hemorrhagic syndromes, abortion, respiratory complications, and fatal mucosal disease, which cause heavy economic losses to farmers. Seven members of the genus were selected for the study: bovine viral diarrhea virus-1, bovine viral diarrhea virus-2, classical swine fever virus, border disease virus, Bungowannah virus, Hobi-like pestivirus, and atypical porcine pestivirus. This study aimed to estimate the codon usage bias and the rate of evolution of the glycoprotein E2 gene, using publicly available E2 nucleotide sequences of all seven Pestiviruses. Such analyses might elucidate the disease epidemiology and facilitate the design of better vaccines.
Materials and Methods: Coding sequences of the E2 gene of Pestiviruses A (n = 89), B (n = 60), C (n = 75), D (n = 10), F (n = 7), H (n = 52), and K (n = 85) were included in this study. They were analyzed using different methods to estimate the codon usage bias and evolution. In addition, maximum likelihood and Bayesian methodologies were employed to analyze a molecular dataset of the seven Pestiviruses using the complete E2 gene region.
Results: The combined codon usage bias and evolutionary rate analyses revealed that Pestiviruses A, B, C, D, F, H, and K have a codon usage bias in which mutation and natural selection have played vital roles. While the effective number of codons values revealed a moderate bias, neutrality plots indicated natural selection in Pestiviruses A, B, F, and H and mutational pressure in Pestiviruses C, D, and K. The correspondence analysis revealed that axis-1 contributes significantly to the synonymous codon usage pattern. The evolutionary rate of Pestiviruses B, H, and K was very high. The most recent common ancestors of Pestiviruses A, B, C, D, F, H, and K were dated to 1997, 1975, 1946, 1990, 2004, 1990, and 1990, respectively. This study confirms that both mutational pressure and natural selection have played a significant role in shaping codon usage bias and evolution.
Conclusion: This study provides insight into the codon usage bias and evolutionary lineages of pestiviruses. It is arguably the first report of such kind. The information provided by the study can be further used to elucidate the respective host adaptation strategies of the viruses. In turn, this information helps study the epidemiology and control methods of pestiviruses. Copyright: © Shueb, et al.

Keywords:  Flaviviridae; India; Pestivirus; codon usage bias; evolutionary analysis; glycoprotein E2

Year:  2022        PMID: 36185504      PMCID: PMC9394142          DOI: 10.14202/vetworld.2022.1857-1868

Source DB:  PubMed          Journal:  Vet World        ISSN: 0972-8988


Introduction

Pestivirus, a genus of the family Flaviviridae comprising 11 species, infects bovines, sheep, and pigs. The virions are spherical, enveloped, and approximately 50 nm in diameter. The genomes are linear and approximately 12 kb in length. The virus enters the host cell when the viral envelope protein E attaches to host receptors, which mediates clathrin-mediated endocytosis. Replication and transcription follow the positive-stranded RNA virus model, translation is virally initiated, and the virus leaves the host cell by budding. Mammals are the natural hosts, and parental transmission pathways exist [1, 2]. Pestiviruses feature a single strand of positive-sense RNA that is approximately 12.5 kb long. Usually, the 3’ end of the genome has no poly-A tract. The genomic RNA encodes both structural and nonstructural proteins [3].

Under the recent classification, the species included in our study are Pestivirus A, Pestivirus B, Pestivirus C, Pestivirus D, Pestivirus F, Pestivirus H, and Pestivirus K. Pestivirus A, also known as bovine viral diarrhea virus 1 (BVDV-1), causes bovine viral diarrhea and mucosal disease; Pestivirus B, also known as bovine viral diarrhea virus 2 (BVDV-2), causes bovine viral diarrhea and mucosal disease; Pestivirus C, also known as classical swine fever virus (CSFV), causes classical swine fever; Pestivirus D, also known as border disease virus (BDV), causes border disease; Pestivirus F is also known as Bungowannah virus; Pestivirus H is also known as Hobi-like pestivirus; and Pestivirus K is also known as atypical porcine pestivirus [4].

Codon usage bias is an important factor in host-virus evolution. Codon bias is defined as the non-random selection of synonymous codons within a gene or genome.
Furthermore, it is organism-specific and can be influenced by GC content, gene length, and gene expression level. To comprehend the molecular process of expression and the impact of long-term evolution on a genome, it is necessary to investigate the diverse patterns of codon usage and their distinct biological consequences. Codon bias analysis is a widely used tool for studying codon usage [5, 6]. Codon usage reflects the aggregate effects of three evolutionary forces: genetic drift, natural selection, and mutational pressure. Mutational pressure shifts the balance of GC and AT (U) pairs and thereby causes nucleotide composition bias. Natural selection favors preferred codons that maximize the efficiency of protein production. Genetic drift arises from the loss of codon changes between generations due to emigration and immigration at the population level [7]. Codon usage patterns can also provide information on the evolutionary process, virus adaptation to the host, genetic drift, selection, and mutation pressure, among several factors. Variations in gene expression and protein synthesis efficiency may be caused by a bias in the codon usage pattern. In addition, the extent to which this bias affects virus-host adaptation influences replication efficiency, virulence, protein synthesis, and virus survival. Several studies have suggested that mutational pressure is the primary driving force behind the formation of a codon usage pattern [8, 9].

Bayesian Evolutionary Analysis by Sampling Trees (BEAST) is a software package (https://beast.community/) that has become a popular platform for evolutionary analysis and for inferring time-scaled phylogenetic trees. It uses Bayesian Markov Chain Monte Carlo (MCMC) [10] sampling for phylogenetic reconstruction, which is a widely used approach.
It also allows several data partitions to be analyzed simultaneously, which is useful for multilocus coalescent analyses. BEAUti is a graphical user interface (GUI) distributed with BEAST that facilitates the creation of the XML input file without any programming. BEAST also offers checkpointing, so that an analysis can be restarted, as well as template-based GUI extensions and an extensible XML format. Tracer, a companion tool of BEAST, enables the user to inspect in graphical form the log file generated by a BEAST run. The TreeAnnotator program is used to summarize the tree file after discarding the burn-in. In addition, the FigTree software (https://beast.community/) was used to visualize the phylogenetic tree and show the time of the most recent common ancestor (tMRCA). BEAST requires a Java runtime, for example on Linux (Ubuntu). To estimate selection and evolutionary pressure on protein-coding regions, the ratio of the substitution rates at non-synonymous and synonymous sites (dN/dS) is the most commonly used measure [9].

The glycoprotein E2 is the major immunodominant glycoprotein on the outer surface of Pestiviruses and induces neutralizing antibodies in the infected host [11, 12]. It plays an important role in virus attachment and entry into the host [13-15] and determines the host specificity of the Pestiviruses [16, 17]. Studies have shown that no drugs are available to treat Pestivirus infections in animals. Therefore, this study aimed to investigate the codon usage bias and evolution of the E2 gene of Pestiviruses affecting animals. This study will provide relevant information to elucidate the processes of gene expression, drug discovery, and epidemiology, which may be useful in designing newer vaccines.

Materials and Methods

Ethical approval

Ethical approval was not necessary as the study materials were collected through the public literature database.

Data collection and sequence editing

The available coding sequences of the E2 gene of Pestiviruses A, B, C, D, F, H, and K were individually downloaded in FASTA format from the GenBank database of the National Center for Biotechnology Information (NCBI). The coding sequences were then edited and aligned in MEGA-X software (https://www.megasoftware.net/) using the codon-based multiple sequence comparison by log-expectation (MUSCLE) algorithm [18] and used for further analysis.

Nucleotide composition analysis

MEGA-X software was used to calculate the overall nucleotide content and the frequency of bases at third codon sites (A3, C3, T3, and G3) of Pestiviruses A, B, C, D, F, H, and K. In addition, SeqinR (Biological Sequences Retrieval and Analysis, an R package; https://cran.r-project.org/package=seqinr) was used to obtain the overall frequency of nucleotide bases, including the overall G+C content (GC), the GC content at the first codon site (GC1), the GC content at the second codon site (GC2), the GC content at the third codon site (GC3), and the average of the GC contents at the first and second codon sites (GC12) [18].
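The position-specific GC measures described here can be illustrated with a minimal Python sketch. This is not the SeqinR or MEGA-X implementation used in the study; the function name `gc_by_position` is hypothetical, and the sketch assumes an in-frame coding sequence containing only unambiguous bases.

```python
def gc_by_position(cds: str) -> dict:
    """Return GC, GC1, GC2, GC3 and GC12 as fractions for an in-frame CDS.

    Assumption: the sequence starts at codon position 1; any trailing
    partial codon is dropped.
    """
    seq = cds.upper().replace("U", "T")
    seq = seq[: len(seq) - len(seq) % 3]  # keep complete codons only
    if not seq:
        raise ValueError("sequence must contain at least one complete codon")

    def gc_fraction(bases: str) -> float:
        return sum(base in "GC" for base in bases) / len(bases)

    # seq[offset::3] picks every base at codon position offset + 1
    gc1, gc2, gc3 = (gc_fraction(seq[offset::3]) for offset in range(3))
    return {
        "GC": gc_fraction(seq),
        "GC1": gc1,
        "GC2": gc2,
        "GC3": gc3,
        "GC12": (gc1 + gc2) / 2,
    }


if __name__ == "__main__":
    print(gc_by_position("ATGGCC"))
```

Multiplying the returned fractions by 100 gives percentages comparable in form to those reported in Table-1.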

Relative dinucleotide abundance analysis

The relative abundance of dinucleotides may play a role in determining codon usage indices. A total of 16 different dinucleotides are possible. The pattern of dinucleotide abundance reflects both mutational and selection pressure. In the present study, the relative dinucleotide abundance in the E2 gene of the seven Pestiviruses was determined using the method established by Karlin and Burge [19]:

$$P_{XY} = \frac{F_{XY}}{F_X F_Y}$$

where $F_X$ and $F_Y$ are the frequencies of the individual nucleotides X and Y, and $F_{XY}$ is the frequency of the dinucleotide XY. $P_{XY}$ > 1.23 is regarded as high relative abundance, whereas $P_{XY}$ < 0.78 is considered low relative abundance. The “seqinR” library was used to determine the dinucleotide frequencies in R (RStudio) [8].
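As a rough illustration of the ratio PXY = FXY/(FX FY), the following Python sketch computes the relative abundance of all 16 dinucleotides for a single sequence. It is a sketch under stated assumptions, not the seqinR routine used in the study: the function names are hypothetical, overlapping dinucleotides are counted, and only the 1.23 and 0.78 thresholds come from the text.

```python
from collections import Counter
from itertools import product


def relative_dinucleotide_abundance(sequence: str) -> dict:
    """Observed/expected ratio P_XY = F_XY / (F_X * F_Y) for all 16 dinucleotides."""
    seq = sequence.upper().replace("U", "T")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")

    base_freq = {base: seq.count(base) / n for base in "ACGT"}
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    total_pairs = n - 1

    ratios = {}
    for x, y in product("ACGT", repeat=2):
        expected = base_freq[x] * base_freq[y]
        observed = pair_counts[x + y] / total_pairs
        ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios


def classify(ratio: float) -> str:
    """Apply the thresholds quoted in the text (>1.23 high, <0.78 low)."""
    if ratio > 1.23:
        return "overrepresented"
    if ratio < 0.78:
        return "underrepresented"
    return "normal"


if __name__ == "__main__":
    for pair, value in relative_dinucleotide_abundance("ACGTACGT").items():
        print(pair, round(value, 4), classify(value))
```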

Relative synonymous codon usage (RSCU) analysis

For a specific amino acid, the RSCU of a codon is defined as the ratio of its observed frequency to the frequency expected if all synonymous codons for that amino acid were used equally. The frequency of amino acids and the sequence length do not affect the RSCU values. Codons with RSCU values above 1.6 are overrepresented, whereas those with values below 0.6 are underrepresented. RSCU values between 0.6 and 1.6 are considered unbiased or randomly used. The following formula was used to calculate the RSCU values [20]:

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} g_{ij}}$$

where $g_{ij}$ is the observed number of the jth codon for the ith amino acid, and $n_i$ is the number of synonymous codons for that amino acid. R (RStudio; https://cran.r-project.org) and the “seqinR” library were used to obtain and visualize RSCU values for all seven Pestiviruses [8].
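The RSCU definition can be made concrete with a short Python sketch. It assumes the standard formulation (observed codon count divided by the mean count of its synonymous family) and the standard genetic code; the function name `rscu` and the convention of returning 0.0 for amino acids absent from the sequence are assumptions of this sketch, not details taken from the study's seqinR workflow.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """Relative synonymous codon usage for the 61 sense codons of an in-frame CDS."""
    seq = cds.upper().replace("U", "T")
    # Count complete codons only; a trailing partial codon is ignored.
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "*":  # skip stop codons
            families[amino_acid].append(codon)

    values = {}
    for codons in families.values():
        family_total = sum(counts[c] for c in codons)
        for codon in codons:
            # observed / (family_total / number of synonymous codons)
            values[codon] = (
                counts[codon] * len(codons) / family_total if family_total else 0.0
            )
    return values


if __name__ == "__main__":
    result = rscu("AAAAAAAAG")  # Lys codons: AAA, AAA, AAG
    print(round(result["AAA"], 4), round(result["AAG"], 4))
```

Within each amino acid family the RSCU values sum to the number of synonymous codons, which is a convenient sanity check when reading a table of RSCU values.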

The effective number of codons (ENC) analysis

The ENC measures how far the codon usage of a gene deviates from random (equal) usage of synonymous codons. Typically, ENC values range from 20 to 60. A value of 20 represents extreme bias, in which only one codon is used for each amino acid, whereas a value of 60 indicates no bias, with all synonymous codons used equally. Codon usage is considered somewhat biased if the ENC value is lower than 45. The ENC value was evaluated using the following formula:

$$\mathrm{ENC} = 2 + \frac{9}{F_2} + \frac{1}{F_3} + \frac{5}{F_4} + \frac{3}{F_6}$$

where $F_i$ (i = 2, 3, 4, and 6) denotes the average F value of the i-fold degenerate amino acid families. The F value for an amino acid is calculated using:

$$F = \frac{n \sum_{j} \left(\frac{n_j}{n}\right)^2 - 1}{n - 1}$$

where $n$ is the total number of observed codons for a given amino acid, and $n_j$ is the observed number of the jth codon for that amino acid. The ENC values for Pestiviruses A, B, C, D, F, H, and K were calculated in R (RStudio) using the “vhica” library (Vertical and Horizontal Inheritance Consistence Analysis, an R package; https://cran.r-project.org/package=vhica). The ENC plot was generated to show the relationship between the ENC and GC3 (the G+C content at the third codon position). This method for estimating absolute synonymous codon usage defines and quantifies codon usage bias in a gene or genome [8, 21]. The expected ENC values of the standard curve are calculated as:

$$\mathrm{ENC}_{\mathrm{expected}} = 2 + s + \frac{29}{s^2 + (1 - s)^2}$$

where $s$ indicates the GC3 content. If the ENC values fall on the standard curve, mutational pressure is the only factor influencing codon usage; values that fall below the standard curve show that the codon bias is constrained by another factor, that is, natural selection [8, 21].
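The ENC quantities discussed in this subsection can be sketched in Python as follows. The sketch assumes the standard formulation of Wright's ENC, namely F = (n Σ(n_j/n)² − 1)/(n − 1) per amino acid, ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, and the expected curve 2 + s + 29/(s² + (1 − s)²) with s = GC3. The function names are hypothetical, and the code is not the vhica implementation used in the study.

```python
def codon_homozygosity(codon_counts):
    """F for one amino acid, from the observed counts of its synonymous codons."""
    n = sum(codon_counts)
    if n < 2:
        raise ValueError("at least two observed codons are needed")
    sum_sq = sum((count / n) ** 2 for count in codon_counts)
    return (n * sum_sq - 1) / (n - 1)


def effective_number_of_codons(f2, f3, f4, f6):
    """ENC from the average F of the 2-, 3-, 4- and 6-fold degenerate families."""
    return 2 + 9 / f2 + 1 / f3 + 5 / f4 + 3 / f6


def expected_enc(gc3):
    """Expected ENC under mutational pressure alone; gc3 is a fraction in [0, 1]."""
    if not 0 <= gc3 <= 1:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2 + gc3 + 29 / (gc3 ** 2 + (1 - gc3) ** 2)


if __name__ == "__main__":
    print(codon_homozygosity([10, 0]))                # one codon only: maximal bias
    print(effective_number_of_codons(1, 1, 1, 1))     # extreme bias
    print(expected_enc(0.5))                          # top of the standard curve
```

A sequence whose observed ENC lies well below `expected_enc(gc3)` for its own GC3 would, under the interpretation given in the text, be read as constrained by factors other than mutational pressure.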

Neutrality plot analysis

The neutrality plot analysis is used to examine the effect of mutational pressure and natural selection on codon usage patterns. A neutrality plot was constructed for each Pestivirus by plotting the GC12 values against the GC3 values and fitting a regression line. If the regression slope is significant and close to one, mutational pressure plays the major role in shaping the codon usage pattern over natural selection. However, if the regression slope is close to zero, natural selection has the dominant impact. The slope of the neutrality plot’s regression line thus represents the contribution of mutational pressure [22, 23].
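The regression behind a neutrality plot can be sketched with an ordinary least-squares fit of GC12 on GC3. This is a minimal illustration with a hypothetical function name and made-up input values; it does not reproduce the study's data or plotting code.

```python
def neutrality_regression(gc3_values, gc12_values):
    """Least-squares fit GC12 = intercept + slope * GC3; returns (slope, intercept, r2)."""
    if len(gc3_values) != len(gc12_values) or len(gc3_values) < 2:
        raise ValueError("need two equally long series with at least two points")

    n = len(gc3_values)
    mean_x = sum(gc3_values) / n
    mean_y = sum(gc12_values) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3_values)
    syy = sum((y - mean_y) ** 2 for y in gc12_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3_values, gc12_values))
    if sxx == 0:
        raise ValueError("GC3 values are all identical; slope is undefined")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Illustrative values only, not taken from the study.
    slope, intercept, r2 = neutrality_regression([0.40, 0.50, 0.60], [0.42, 0.45, 0.48])
    print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f}")
```

Under the interpretation used in the text, the absolute slope multiplied by 100 is read as the percentage of neutrality (mutational pressure), with the remainder attributed to natural selection.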

Parity rule 2 (PR2) plot analysis

In the parity rule 2 (PR2) analysis, the GC bias (G3/[G3+C3]) was plotted on the horizontal axis and the AT bias (A3/[A3+T3]) on the vertical axis. The plot indicates the relative degree of natural selection and mutation pressure based on nucleotide composition. The center of the plot lies at 0.5 on both axes (X = 0.5, Y = 0.5), where A = T and G = C. Points located at the center imply that there is no bias between the effects of natural selection and mutational pressure [22, 24].
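The two PR2 coordinates can be computed directly from the third codon positions, as in this minimal Python sketch. The function name `pr2_point` is hypothetical, the input is assumed to be an in-frame coding sequence, and a coordinate is returned as NaN when its denominator is zero.

```python
def pr2_point(cds: str):
    """Return (G3/(G3+C3), A3/(A3+T3)) for an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    seq = seq[: len(seq) - len(seq) % 3]  # keep complete codons only
    third = seq[2::3]                     # bases at the third codon position

    a3, t3, g3, c3 = (third.count(base) for base in "ATGC")
    gc_bias = g3 / (g3 + c3) if (g3 + c3) else float("nan")
    at_bias = a3 / (a3 + t3) if (a3 + t3) else float("nan")
    return gc_bias, at_bias


if __name__ == "__main__":
    # Four alanine codons ending in A, G, T and C: no bias on either axis.
    print(pr2_point("GCAGCGGCTGCC"))
```

A point away from (0.5, 0.5) indicates unequal use of G versus C or A versus T at the third codon position.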

Correspondence analysis (COA)

The codon usage bias varies from one gene to another. Therefore, the COA was used to compute the relationship and variation in codon usage among Pestiviruses A, B, C, D, F, H, and K, following the approach of Greenacre [25]. The RSCU values of the 59 synonymous codons were plotted along two axes (axis-1 and axis-2). CodonW (https://codonw.sourceforge.net/) software was used to perform the COA, which was further visualized in R [24].

Grand average hydrophobicity (GRAVY) and aromaticity (AROMO)

The GRAVY is calculated by dividing the sum of the hydropathy values of all amino acids in a sequence by the number of residues. GRAVY values typically range between −2.0 and +2.0, where positive and negative values indicate hydrophobicity and hydrophilicity, respectively. AROMO is the frequency of the aromatic amino acids (Trp, Tyr, and Phe) in an amino acid sequence. The GRAVY and AROMO values were calculated using the CodonW tool (https://codonw.sourceforge.net/) [8].
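Both indices are simple averages over the protein sequence, as the following Python sketch shows. It assumes the Kyte-Doolittle hydropathy scale, which is the scale conventionally used for GRAVY; the function names are hypothetical, residues outside the 20 standard amino acids are skipped, and the sketch is not CodonW's implementation.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(protein: str) -> list:
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acid residues found")
    return residues


def gravy(protein: str) -> float:
    """Grand average of hydropathy: sum of hydropathy values / number of residues."""
    residues = _standard_residues(protein)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromaticity(protein: str) -> float:
    """Fraction of aromatic residues (Phe, Trp, Tyr) in the sequence."""
    residues = _standard_residues(protein)
    return sum(aa in "FWY" for aa in residues) / len(residues)


if __name__ == "__main__":
    print(gravy("AI"), aromaticity("FWYA"))
```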

Correlation analysis

Correlation analysis was conducted for each Pestivirus separately using R (RStudio) and the “corrgram” library. This analysis evaluated the nucleotide composition variables (A, T, G, C, A3, T3, G3, C3, GC, GC1, GC2, and GC3) together with other factors such as ENC, CAI, GRAVY, and AROMO [8].

Evolutionary rate analysis

In this study, complete E2 gene sequences of all Pestiviruses were extracted, aligned, and edited using MEGA-X software. The E2 sequences of Pestiviruses A, B, C, D, F, H, and K dated from 1992 to 2020, 1990 to 2018, 1966 to 2019, 1994 to 2019, 2010 to 2014, 2014 to 2017, and 2006 to 2019, respectively. A substitution model was selected based on the Akaike information criterion obtained from the jModelTest2 tool. The BEAUti interface of the BEAST software was used to build the input file. Four molecular clock models (relaxed clock log-normal, relaxed clock exponential, strict clock, and random local clock) were considered with the Coalescent: Bayesian SkyGrid and Coalescent: Extended Bayesian Skyline Plot tree priors [7]. MCMC sampling in BEAST was used to estimate the evolutionary rate and tMRCA. The MCMC chain length was adjusted until the parameters had an effective sample size >200. The BEAST-generated log files were analyzed using the Tracer tool.

Results

Codon usage analysis

Data collection and sequence editing

Coding sequences of Pestiviruses A (n = 89), B (n = 60), C (n = 75), D (n = 10), F (n = 7), H (n = 52), and K (n = 85) were retrieved from the NCBI database. Sequences with 99% similarity were removed from the study. The MUSCLE algorithm in MEGA-X software was used to align and edit all protein-coding sequences.

Analysis of nucleotide composition and relative dinucleotide abundance frequency

In this study, the nucleotide composition was analyzed. The nucleotide composition frequencies are given in Table-1 and Figure-1. The findings showed that the nucleotide composition affects the codon usage bias of the E2 gene for all Pestiviruses. Furthermore, R (RStudio) was used to estimate the relative abundance of the 16 dinucleotides in Pestiviruses A, B, C, D, F, H, and K. Overrepresented dinucleotides have a relative abundance >1.23, whereas underrepresented dinucleotides have a relative abundance <0.78. The relative abundance of dinucleotides in all seven Pestiviruses is given in Table-2 and Figure-2.
Table 1

Nucleotide compositions in E2 gene of Pestiviruses A, B, C, D, F, H, and K.

| Components | A | B | C | D | F | H | K |
|---|---|---|---|---|---|---|---|
| T | 23.21 ± 0.53 | 22.56 ± 0.68 | 23.45 ± 0.28 | 23.35 ± 0.36 | 23.39 ± 0.07 | 22.83 ± 2.14 | 21.41 ± 0.59 |
| C | 20.96 ± 0.61 | 18.83 ± 0.74 | 21.39 ± 0.30 | 20.13 ± 0.66 | 21.42 ± 0.06 | 20.76 ± 0.79 | 20.41 ± 0.51 |
| A | 30.48 ± 0.45 | 33.02 ± 0.89 | 28.28 ± 0.28 | 30.05 ± 0.80 | 30.94 ± 0.11 | 29.54 ± 0.83 | 30.70 ± 0.47 |
| G | 25.33 ± 0.53 | 25.57 ± 0.75 | 26.86 ± 0.17 | 26.45 ± 0.71 | 24.23 ± 0.09 | 26.86 ± 1.48 | 27.45 ± 0.61 |
| GC | 46.30 ± 0.65 | 44.42 ± 1.08 | 48.26 ± 0.36 | 46.58 ± 0.86 | 45.66 ± 0.07 | 47.62 ± 1.62 | 47.87 ± 0.78 |
| GC1 | 48.28 ± 1.91 | 47.56 ± 1.56 | 49.01 ± 1.49 | 46.62 ± 1.34 | 42.72 ± 0.15 | 49.49 ± 1.82 | 48.57 ± 0.84 |
| GC2 | 42.98 ± 3.13 | 39.58 ± 1.50 | 42.88 ± 0.65 | 45.15 ± 3.89 | 46.11 ± 0.10 | 42.38 ± 5.49 | 43.90 ± 0.64 |
| GC3 | 47.61 ± 2.09 | 46.12 ± 2.65 | 52.90 ± 1.27 | 47.98 ± 2.59 | 48.14 ± 0.20 | 51.01 ± 1.93 | 51.15 ± 2.13 |
| T3 | 22.55 ± 1.51 | 18.19 ± 2.59 | 21.22 ± 1.52 | 22.64 ± 1.99 | 21.37 ± 0.13 | 22.25 ± 1.72 | 19.27 ± 1.76 |
| C3 | 24.01 ± 2.76 | 21.66 ± 1.65 | 27.98 ± 0.78 | 22.28 ± 4.18 | 23.42 ± 0.14 | 24.39 ± 3.56 | 22.98 ± 1.57 |
| A3 | 29.76 ± 1.23 | 35.67 ± 2.54 | 25.86 ± 0.54 | 29.37 ± 1.40 | 30.47 ± 0.24 | 26.74 ± 1.79 | 29.56 ± 1.25 |
| G3 | 23.66 ± 2.25 | 24.46 ± 1.82 | 24.92 ± 0.87 | 25.70 ± 2.24 | 24.71 ± 0.22 | 26.60 ± 3.73 | 28.16 ± 1.34 |

Figure-1

Overall nucleotide composition frequencies (mean) of E2 gene of Pestiviruses A, B, C, D, F, H, and K.

Table 2

Dinucleotide composition of E2 gene of Pestiviruses A, B, C, D, F, H and K.

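| Dinucleotides | A | B | C | D | F | H | K |
|---|---|---|---|---|---|---|---|
| AA | 0.9989 | 1.0723 | 0.9465 | 1.0231 | 0.9703 | 0.952 | 1.1916 |
| AC | 1.1499 | 0.8993 | 1.2584 | 1.2254 | 1.0781 | 1.2518 | 0.9553 |
| AG | 0.7692 | 0.9848 | 1.0134 | 0.9918 | 1.0033 | 1.0048 | 0.8698 |
| AT | 1.0514 | 0.9801 | 0.8109 | 0.7982 | 0.9556 | 0.8526 | 0.9481 |
| CA | 1.1128 | 1.1109 | 1.273 | 1.2852 | 1.3975 | 1.1929 | 1.1997 |
| CC | 1.2301 | 1.6909 | 1.0191 | 1.1017 | 1.1694 | 1.3063 | 1.1743 |
| CG | 0.4619 | 0.2018 | 0.5327 | 0.4578 | 0.289 | 0.4367 | 0.4461 |
| CT | 1.3014 | 1.2108 | 1.1721 | 1.1428 | 1.0587 | 1.1288 | 1.2091 |
| GA | 1.0474 | 1.0985 | 0.9896 | 0.958 | 0.7673 | 0.9932 | 0.7221 |
| GC | 0.8083 | 1.009 | 0.799 | 0.8478 | 0.8501 | 0.7558 | 1.1896 |
| GG | 1.3449 | 1.0115 | 1.1234 | 1.1382 | 1.2666 | 1.0932 | 1.1168 |
| GT | 0.7349 | 0.867 | 1.0709 | 1.0372 | 1.1737 | 1.074 | 1.0812 |
| TA | 0.8116 | 0.7244 | 0.8244 | 0.7725 | 0.9188 | 0.9036 | 0.8849 |
| TC | 0.8069 | 0.6054 | 0.888 | 0.7942 | 0.8823 | 0.6847 | 0.6682 |
| TG | 1.4698 | 1.6256 | 1.2734 | 1.3294 | 1.3771 | 1.3642 | 1.5748 |
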
Figure-2

Dinucleotide composition of E2 gene of Pestiviruses A, B, C, D, F, H, and K.

Pestivirus A: Of the 16 dinucleotides, four were overrepresented (>1.23): CC (1.2301), CT (1.3014), GG (1.3449), and TG (1.4698). AG (0.7692), CG (0.4619), and GT (0.7349) were underrepresented (<0.78).

Pestivirus B: Of the 16 dinucleotides, two, CC (1.6909) and TG (1.6256), were overrepresented (>1.23). CG (0.2018), TA (0.7244), and TC (0.6054) were underrepresented (<0.78).

Pestivirus C: Of the 16 dinucleotides, two, AC (1.2582) and TG (1.2734), were overrepresented (>1.23). CG (0.5327) was underrepresented (<0.78).

Pestivirus D: Of the 16 dinucleotides, two, CA (1.2852) and TG (1.3294), were overrepresented (>1.23). CG (0.4578) and TA (0.7725) were underrepresented (<0.78).

Pestivirus F: Of the 16 dinucleotides, three, CA (1.3975), GG (1.2666), and TG (1.3771), were overrepresented (>1.23). CG (0.2890) and GA (0.7673) were underrepresented (<0.78).

Pestivirus H: Of the 16 dinucleotides, three, AC (1.2518), CC (1.3063), and TG (1.3642), were overrepresented (>1.23). CG (0.4367), GC (0.7558), and TC (0.6847) were underrepresented (<0.78).

Pestivirus K: Of the 16 dinucleotides, one, TG (1.5748), was overrepresented (>1.23). CG (0.4461), GA (0.7221), TC (0.6682), and TT (0.7544) were underrepresented (<0.78).

Analysis of RSCU

The RSCU of each Pestivirus was determined and plotted using the R studio programming software. Each synonymous codon’s frequency value is classified based on the RSCU, which ranged from 0.6 to 1.6. Values >1.6 were classified as overrepresented synonymous codons, whereas values <0.6 were classified as underrepresented synonymous codons. The over- and under-represented codons are highlighted in yellow and blue, respectively (Table-3). Codons with a significant frequency value >1.0 are referred to as high frequency or positively biased codons. The frequency <1.0, on the other hand, is referred to as a lower frequency or negatively biased codon (Figure-3).
Table 3

Relative synonymous codons usage of each amino acid in E2 gene of Pestiviruses A, B, C, D, F, H, and K. Over represented codons (>1.6) are highlighted in yellow and underrepresented codons (<0.6) are in blue.

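| Codon | A | B | C | D | F | H | K |
|---|---|---|---|---|---|---|---|
| AAA | 1 | 1.5556 | 0.7 | 1.36 | 1.3077 | 0.8182 | 1.6 |
| AAC | 1.1667 | 0 | 1.0667 | 1.2727 | 1.2308 | 1.5 | 0.6154 |
| AAG | 1 | 0.4444 | 1.3 | 0.64 | 0.6923 | 1.1818 | 0.4 |
| AAT | 0.8333 | 0 | 0.9333 | 0.7273 | 0.7692 | 0.5 | 1.3846 |
| ACA | 1.25 | 2.5 | 1.4444 | 1.4194 | 1.5758 | 1.5 | 1.5385 |
| ACC | 0.5 | 0.5 | 1.2222 | 1.2903 | 0.9697 | 1 | 0.9231 |
| ACG | 1 | 0.5 | 0.4444 | 0.6452 | 0.3636 | 0.375 | 0.3077 |
| ACT | 1.25 | 0.5 | 0.8889 | 0.6452 | 1.0909 | 1.125 | 1.2308 |
| AGA | 2.1 | 4 | 3.1765 | 3.4737 | 1.4118 | 1.6667 | 2.625 |
| AGC | 0.2857 | 1 | 1 | 2 | 0.6667 | 0.8 | 1.7647 |
| AGG | 1.8 | 2 | 2.1176 | 1.2632 | 3.8824 | 3.6667 | 1.875 |
| AGT | 0.8571 | 2 | 1.3333 | 1.3333 | 1.1111 | 2 | 0.3529 |
| ATA | 1.8 | 2.4 | 1.8462 | 1.5 | 1.9091 | 1.7143 | 2.3333 |
| ATC | 0.6 | 0.6 | 0.4615 | 0.9 | 0.6818 | 0.8571 | 0.3333 |
| ATG | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ATT | 0.6 | 0 | 0.6923 | 0.6 | 0.4091 | 0.4286 | 0.3333 |
| CAA | 1.75 | 2 | 1.3333 | 1.1667 | 1.0769 | 1.5714 | 0.8 |
| CAC | 0.4444 | 2 | 1.1429 | 1 | 1.3333 | 0.75 | 0 |
| CAG | 0.25 | 0 | 0.6667 | 0.8333 | 0.9231 | 0.4286 | 1.2 |
| CAT | 1.5556 | 0 | 0.8571 | 1 | 0.6667 | 1.25 | 2 |
| CCA | 2.5263 | 1.3333 | 1.5 | 2 | 1.92 | 1.3333 | 1.6 |
| CCC | 0.2105 | 0.6667 | 1.25 | 1 | 1.12 | 0.8889 | 0.8 |
| CCG | 0.2105 | 0 | 0.5 | 0.25 | 0.16 | 1.1111 | 0.8 |
| CCT | 1.0526 | 2 | 0.75 | 0.75 | 0.8 | 0.6667 | 0.8 |
| CGA | 0.9 | 0 | 0 | 0.3158 | 0 | 0.3333 | 0 |
| CGC | 0.6 | 0 | 0.3529 | 0 | 0 | 0 | 0.375 |
| CGG | 0.3 | 0 | 0.3529 | 0.6316 | 0.3529 | 0 | 0.75 |
| CGT | 0.3 | 0 | 0 | 0.3158 | 0.3529 | 0.3333 | 0.375 |
| CTA | 1.2414 | 1.2 | 0.8571 | 0.7059 | 0.8276 | 1.2727 | 0.9 |
| CTC | 0.4138 | 0.6 | 1.2 | 0.7059 | 0.6207 | 0.7273 | 0.9 |
| CTG | 1.4483 | 0 | 1.3714 | 1.9412 | 1.6552 | 0.9091 | 1.5 |
| CTT | 1.0345 | 0.6 | 0.3429 | 0.3529 | 0.8276 | 1.0909 | 0.3 |
| GAA | 1 | 0.6667 | 0.9565 | 1.0833 | 1.2 | 1.2 | 1.75 |
| GAC | 1.25 | 1 | 1.0476 | 0.8571 | 1.1667 | 0.8571 | 1.7143 |
| GAG | 1 | 1.3333 | 1.0435 | 0.9167 | 0.8 | 0.8 | 0.25 |
| GAT | 0.75 | 1 | 0.9524 | 1.1429 | 0.8333 | 1.1429 | 0.2857 |
| GCA | 2.1333 | 0 | 2.2222 | 1.25 | 1.6 | 1.6 | 1.6667 |
| GCC | 0.8 | 2 | 0.8889 | 0.75 | 1.3333 | 1.6 | 1.6667 |
| GCG | 0.2667 | 0 | 0.2222 | 0.5 | 0.5333 | 0.2667 | 0.3333 |
| GCT | 0.8 | 2 | 0.6667 | 1.5 | 0.5333 | 0.5333 | 0.3333 |
| GGA | 1.3333 | 1.1429 | 0.6452 | 1.1765 | 0.6897 | 1.1034 | 0 |
| GGC | 0.7619 | 1.1429 | 0.6452 | 0.4706 | 0.6897 | 0.5517 | 1.8947 |
| GGG | 0.9524 | 1.7143 | 1.5484 | 1.1765 | 1.5172 | 1.1034 | 0.6316 |
| GGT | 0.9524 | 0 | 1.1613 | 1.1765 | 1.1034 | 1.2414 | 1.4737 |
| GTA | 1.4286 | 1.3333 | 0.9143 | 0.8571 | 1.0435 | 0.9032 | 0.6667 |
| GTC | 0.2857 | 0 | 1.1429 | 1 | 0.8696 | 0.5161 | 0.2222 |
| GTG | 1.7143 | 2.6667 | 1.3714 | 1.5714 | 1.913 | 1.8065 | 2.4444 |
| GTT | 0.5714 | 0 | 0.5714 | 0.5714 | 0.1739 | 0.7742 | 0.6667 |
| TAC | 1.0909 | 1 | 1.3 | 1.3636 | 0.9231 | 1.2727 | 1.25 |
| TAT | 0.9091 | 1 | 0.7 | 0.6364 | 1.0769 | 0.7273 | 0.75 |
| TCA | 1.7143 | 1 | 2.3333 | 1.3333 | 2.4444 | 1.2 | 1.7647 |
| TCC | 1.1429 | 1 | 1 | 0 | 0.4444 | 0.4 | 0.7059 |
| TCG | 1.1429 | 1 | 0 | 0.3333 | 0.4444 | 0.8 | 0.7059 |
| TCT | 0.8571 | 0 | 0.3333 | 1 | 0.8889 | 0.8 | 0.7059 |
| TGC | 1.7143 | 1.2 | 1.2 | 1.4118 | 1.0526 | 1.2941 | 1.5 |
| TGG | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| TGT | 0.2857 | 0.8 | 0.8 | 0.5882 | 0.9474 | 0.7059 | 0.5 |
| TTA | 0.6207 | 2.4 | 0.5143 | 1.0588 | 0.8276 | 0.7273 | 1.2 |
| TTC | 1 | 0.8 | 1.4118 | 0.75 | 1.2727 | 1.125 | 0 |
| TTG | 1.2414 | 1.2 | 1.7143 | 1.2353 | 1.2414 | 1.2727 | 1.2 |
| TTT | 1 | 1.2 | 0.5882 | 1.25 | 0.7273 | 0.875 | 2 |
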
Figure-3

Overall frequencies of relative synonymous codon usage (RSCU) in E2 gene of Pestiviruses A, B, C, D, F, H, and K.

Pestivirus A: Nine codons were found to be over-represented, while thirteen codons were found to be under-represented. In addition, 26 high-frequency codons and 27 low-frequency codons were discovered.

Pestivirus B: Thirteen codons were found to be over-represented, while nineteen codons were found to be under-represented. In addition, 23 high-frequency codons and 28 low-frequency codons were discovered.

Pestivirus C: Six codons were found to be over-represented, while 14 codons were found to be under-represented. In addition, 27 high-frequency codons and 31 low-frequency codons were discovered.

Pestivirus D: Three codons were found to be over-represented, while 11 codons were found to be under-represented. In addition, 27 high-frequency codons and 28 low-frequency codons were discovered.

Pestivirus F: Six codons were found to be over-represented, while 12 codons were found to be under-represented. In addition, 27 high-frequency codons and 34 low-frequency codons were discovered.

Pestivirus H: Five codons were found to be over-represented, while 13 codons were found to be under-represented. In addition, 27 high-frequency codons and 31 low-frequency codons were discovered.

Pestivirus K: Thirteen codons were found to be over-represented, while 17 codons were found to be under-represented. In addition, 25 high-frequency codons and 33 low-frequency codons were discovered.

Analysis of the ENC

The ENC was calculated for all the selected Pestiviruses and used to evaluate the extent of codon usage bias in each. The ENC values for Pestiviruses A, B, C, D, F, H, and K were 48.97–56.59 (standard deviation [SD] ± 1.5848), 37.74–51.50 (SD ± 2.7814), 49.17–55.40 (SD ± 1.3044), 49.78–54.83 (SD ± 1.4489), 51.73–52.20 (SD ± 0.1652), 35.88–61.00 (SD ± 5.6064), and 45.24–52.32 (SD ± 1.8514), respectively. To illustrate and compare the effects of selection and mutational pressure, the ENC values of the E2 gene of all Pestiviruses were plotted against GC3 in a single frame (Figure-4). Each colored point represents a sequence from one of the seven Pestiviruses. Points located below but close to the standard curve indicate a significant role of natural selection and a minor influence of GC3 composition.
Figure-4

The ENC plot representing the relationship of number of codons against GC3. Each point represents the E2 sequences of Pestiviruses A, B, C, D, F, H and K.

Analysis of neutrality plot

GC12 (the mean of GC1 and GC2) was plotted against GC3 in a neutrality plot to determine the relative influence of natural selection and mutational pressure. The slope of the regression line indicates the relative contribution of mutational pressure and natural selection, and the regression coefficient of GC12 on GC3 is regarded as the mutation-selection equilibrium coefficient. The neutrality plots were interpreted as follows.

Pestivirus A: The GC12 and GC3 values were distributed around a regression line with a negative slope, y = 0.613 − 0.329x, R² = 0.16, corresponding to 32.9% neutrality and indicating the predominance of natural selection over mutational pressure (Figure-5).
Figure-5

Neutrality plot showing the relationship between GC12% and GC3% with the slope line indicating natural selection. The points represent the E2 sequences of Pestivirus A.

Pestivirus B: The GC12 and GC3 values were distributed around a regression line with a negative slope, y = 0.444 − 0.0169x, R² < 0.01, corresponding to 1.69% neutrality and indicating the predominance of natural selection over mutational pressure (Figure-6).
Figure-6

Neutrality plot showing the relationship between GC12% and GC3% with the slope line indicating natural selection. The points represent the E2 sequences of Pestivirus B.

Pestivirus C: The GC12 and GC3 values were distributed around a regression line with a negative slope, y = 0.802 − 0.648x, R² = 0.72, corresponding to 64.8% neutrality and indicating the predominance of mutational pressure over natural selection (Figure-7).
Figure-7

Neutrality plot showing the relationship between GC12% and GC3% with the slope line indicating mutational pressure. The points represent the E2 sequences of Pestivirus C.

Pestivirus D: The GC12 and GC3 values were distributed around a regression line with a negative slope, y = 0.771 − 0.65x, R² = 0.65, corresponding to 65% neutrality and indicating the predominance of mutational pressure over natural selection (Figure-8).
Figure-8

Neutrality plot showing the relationship between GC12% and GC3% with the slope line indicating mutational pressure. The points represent the E2 sequences of Pestivirus D.

Pestivirus F: The GC12 and GC3 values were distributed around a regression line with a negative slope, y = 0.506 − 0.129x, R² = 0.10, corresponding to 12.9% neutrality and indicating the predominance of natural selection over mutational pressure (Figure-9).
Figure-9

Neutrality plot showing the relationship between GC12% and GC3% with the slope line indicating natural selection. The points represent the E2 sequences of Pestivirus F.

Pestivirus H: The mean GC12 and GC3 values clustered around the regression line. The regression for the E2 gene was negative, y = 0.668 − 0.387x, with R² = 0.09; neutrality was therefore 38.7%, indicating the role of natural selection over mutational pressure (Figure-10).
Figure-10

Neutrality plot showing the relationship between GC12% and GC3% with the slope line indicating natural selection. The points represent the E2 sequences of Pestivirus H.

Pestivirus K: The mean GC12 and GC3 values clustered around the regression line. The regression for the E2 gene was positive, y = 0.0523 + 0.8x, with R² = 0.86; neutrality was therefore 80%, indicating the role of mutational pressure over natural selection (Figure-11).
Figure-11

Neutrality plot showing the relationship between GC12% and GC3% with the slope line indicating mutational pressure. The points represent the E2 sequences of Pestivirus K.
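The neutrality values reported above read as the absolute slope of the GC12-on-GC3 regression expressed as a percentage (for example, a slope of −0.648 corresponds to 64.8%). The following is a minimal Python sketch of that calculation, not the authors' actual pipeline. The function names `gc12_gc3`, `neutrality_regression`, and `neutrality_percent` are hypothetical, and the input is assumed to be a list of in-frame coding sequences given as plain strings.

```python
def gc_fraction(bases):
    """Fraction of G or C in a string of bases (0.0 for an empty string)."""
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


def gc12_gc3(cds):
    """Return (GC12, GC3) for one in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    first = cds[0:usable:3]                   # first codon positions
    second = cds[1:usable:3]                  # second codon positions
    third = cds[2:usable:3]                   # third codon positions
    gc12 = (gc_fraction(first) + gc_fraction(second)) / 2
    return gc12, gc_fraction(third)


def neutrality_regression(sequences):
    """Ordinary least-squares fit GC12 = intercept + slope * GC3."""
    points = [gc12_gc3(seq) for seq in sequences]
    ys = [p[0] for p in points]               # GC12 on the Y-axis
    xs = [p[1] for p in points]               # GC3 on the X-axis
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("GC3 is constant; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def neutrality_percent(slope):
    """Share attributed to mutational pressure, as |slope| * 100."""
    return abs(slope) * 100
```

Under this reading, a slope near 1 suggests that mutational pressure dominates, while a slope near 0 suggests that natural selection dominates.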


Analysis of PR2 plot

In a PR2 plot, the position of a point relative to the origin represents the direction and degree of the bias. The PR2 bias plot is most informative when the biases are estimated from the AT and GC content at the third codon position. According to Chargaff's second parity rule (PR2), the nucleotide composition of deoxyribonucleic acid satisfies A = T and G = C. Therefore, the origin, where both ratios equal 0.5, is the point at which there is no accumulation of bias. The PR2 plot is created by plotting the G3/(G3+C3) values on the X-axis against the A3/(A3+T3) values on the Y-axis. In this study, the mean values of G3/(G3+C3) and A3/(A3+T3) of each Pestivirus (Figure-12) are as follows:
Figure-12

Parity rule 2 plots AT-bias against GC-bias. Each point represents E2 sequences of Pestiviruses A, B, C, D, F, H, and K.

Pestivirus A: The calculated mean GC and AT bias values were 0.49 and 0.57, respectively, implying AT dominance over GC.
Pestivirus B: The calculated mean GC and AT bias values were 0.53 and 0.66, respectively, revealing AT dominance over GC.
Pestivirus C: The calculated mean GC and AT bias values were 0.47 and 0.55, respectively, indicating AT dominance over GC.
Pestivirus D: The calculated mean GC and AT bias values were 0.54 and 0.56, respectively, revealing AT dominance over GC.
Pestivirus F: The calculated mean GC and AT bias values were 0.51 and 0.58, respectively, indicating AT dominance over GC.
Pestivirus H: The calculated mean GC and AT bias values were 0.55 and 0.56, respectively. Here, purines and pyrimidines both contribute equally.
Pestivirus K: The calculated mean GC and AT bias values were 0.55 and 0.60, respectively, revealing AT dominance over GC.

In this study, none of the genes had an A = T, G = C composition, thereby indicating a bias among the Pestiviruses studied. The A, C, D, and H Pestiviruses had a slightly lower bias than the B, F, and K Pestiviruses because the points of the latter were located farther from the origin. The PR2 plot indicated that the bias occurred at the third position of AT and GC in the studied Pestiviruses, thereby implying that natural selection has a significant influence over mutational pressure.
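The PR2 coordinates described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: all third codon positions are used, sequences are assumed to be in frame, and the names `pr2_coordinates`, `pr2_mean`, and `bias_distance` are hypothetical.

```python
import math


def pr2_coordinates(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) for one in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")       # accept RNA notation as well
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    third = cds[2:usable:3]                   # third codon positions only
    a3, t3, g3, c3 = (third.count(b) for b in "ATGC")
    gc_bias = g3 / (g3 + c3) if g3 + c3 else float("nan")
    at_bias = a3 / (a3 + t3) if a3 + t3 else float("nan")
    return gc_bias, at_bias


def pr2_mean(sequences):
    """Mean PR2 coordinates over a set of sequences (X = GC bias, Y = AT bias)."""
    coords = [pr2_coordinates(seq) for seq in sequences]
    n = len(coords)
    return sum(c[0] for c in coords) / n, sum(c[1] for c in coords) / n


def bias_distance(point):
    """Euclidean distance of a PR2 point from the no-bias origin (0.5, 0.5)."""
    return math.hypot(point[0] - 0.5, point[1] - 0.5)
```

A point at (0.5, 0.5) gives a distance of zero. The larger the distance, the stronger the departure from the A = T, G = C expectation at the third codon position.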

COA

In the COA, axis-1 accounted for 9.5%, 33.1%, 13.4%, 5.4%, 0.15%, 1.37%, and 0.65% of the variation, whereas axis-2 accounted for 0.69%, 4.9%, 3.2%, 0.85%, 0.05%, 1.0%, and 2.8% for A, B, C, D, F, H, and K Pestiviruses, respectively. The contribution of axis-1 was therefore highest for the E2 gene of Pestivirus B (33.1%) and comparatively lower for the E2 gene of the other Pestiviruses (Figure-13). Axis-1 thus contributed most to the synonymous codon usage pattern, and its contribution indicated codon usage variation among the Pestiviruses A, B, C, D, F, H, and K.
Figure-13

Correspondence analysis, showing a greater contribution of Axis-1 in shaping the codon usage pattern in E2 sequences of Pestiviruses A, B, C, D, F, H, and K.


GRAVY and AROMO

Pestiviruses A, B, C, D, F, H, and K were evaluated for the correlation of hydrophobicity (GRAVY) and aromaticity (AROMO) with ENC and GC3. In Pestivirus A, ENC-AROMO and ENC-GRAVY had non-significant negative values of −0.32206 and −0.18685, respectively, whereas GC3-AROMO and GC3-GRAVY had positive but non-significant values of 0.70123 and 0.67793, respectively. In Pestivirus B, ENC-AROMO and ENC-GRAVY had a negative non-significant value of −0.12289 and a negative significant value of −0.00604, respectively, whereas GC3-AROMO and GC3-GRAVY had positive non-significant values of 0.21998 and 0.25379, respectively. In Pestivirus C, ENC-AROMO and ENC-GRAVY had negative non-significant values of −0.62046 and −0.42875, respectively, whereas GC3-AROMO and GC3-GRAVY had positive non-significant values of 0.93125 and 0.87876, respectively. In Pestivirus D, ENC-AROMO and ENC-GRAVY had negative non-significant values of −0.07384 and −0.34519, respectively, whereas GC3-AROMO and GC3-GRAVY had a negative non-significant value of −0.39537 and a positive non-significant value of 0.76064, respectively. In Pestivirus F, ENC-AROMO, ENC-GRAVY, GC3-AROMO, and GC3-GRAVY all had positive non-significant values of 0.22178, 0.64137, 0.76529, and 0.64449, respectively. In Pestivirus H, ENC-AROMO and ENC-GRAVY had positive non-significant values of 0.10125 and 0.15759, respectively, whereas GC3-AROMO and GC3-GRAVY had a negative non-significant value of −0.21726 and a positive non-significant value of 0.07654, respectively. In Pestivirus K, ENC-AROMO and ENC-GRAVY had positive non-significant values of 0.30111 and 0.13769, respectively, whereas GC3-AROMO and GC3-GRAVY had a positive non-significant value of 0.21688 and a negative non-significant value of −0.18782, respectively. Therefore, the correlations of GRAVY (hydrophobicity) and AROMO (aromaticity) with ENC and GC3 were observed to be non-significant, and these properties were shown to be non-contributory factors in shaping the codon usage bias in all seven Pestiviruses.
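As an illustration of the quantities correlated above, the sketch below computes GRAVY as the mean Kyte-Doolittle hydropathy of a protein, AROMO as the fraction of aromatic residues (Phe, Tyr, Trp), and a Pearson correlation coefficient. It is a minimal example under stated assumptions, not the authors' workflow: the input is assumed to be translated E2 protein sequences as strings, the function names are hypothetical, and no significance test (p-value) is computed.

```python
import math

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(protein):
    """Keep only the 20 standard residues (drops X, *, gaps, and so on)."""
    return [aa for aa in protein.upper() if aa in KD]


def gravy(protein):
    """Grand average of hydropathy: mean Kyte-Doolittle score per residue."""
    residues = _standard_residues(protein)
    return sum(KD[aa] for aa in residues) / len(residues)


def aromo(protein):
    """Aromaticity: fraction of residues that are Phe, Tyr, or Trp."""
    residues = _standard_residues(protein)
    return sum(aa in "FYW" for aa in residues) / len(residues)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

For a given Pestivirus, a value such as ENC-GRAVY would then be `pearson(enc_values, gravy_values)`, where both lists hold one entry per sequence. The ENC and GC3 lists are assumed to be computed separately.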

Evolutionary characteristics analysis

MEGA-X software was used to align and edit the sequences of Pestiviruses A (n = 89), B (n = 60), C (n = 75), D (n = 10), F (n = 07), H (n = 52), and K (n = 85). In addition, the Genetic Algorithm for Recombination Detection was applied to investigate any possible recombination in the Pestiviruses; the findings showed no evidence of recombination. Therefore, the dataset comprising the FASTA sequences of the identified genes was used directly to calculate the evolutionary rate and to determine the important changes in that rate over time. Furthermore, by adopting the Bayesian coalescent approach, this study employed complete nucleotide sequences of the E2 gene of Pestiviruses A, B, C, D, F, H, and K to calculate the tMRCA and the substitution rate in substitutions/site/year (s/s/y). The evolutionary rates were calculated as follows:

Pestivirus A: 2.67 × 10⁻⁴ (95% HPD: lowest 1.36 × 10⁻⁷, highest 5.9 × 10⁻⁴)
Pestivirus B: 1.35 × 10⁻³ (95% HPD: lowest 5.8 × 10⁻⁴, highest 2.13 × 10⁻³)
Pestivirus C: 6.01 × 10⁻⁴ (95% HPD: lowest 3.3 × 10⁻⁴, highest 9.1 × 10⁻⁴)
Pestivirus D: 6.53 × 10⁻¹¹ (95% HPD: lowest 3.3 × 10⁻⁹, highest 3.5 × 10⁻¹²)
Pestivirus F: 7.37 × 10⁻⁴ (95% HPD: lowest 6.4 × 10⁻⁶, highest 1.9 × 10⁻³)
Pestivirus H: 1.35 × 10⁻³ (95% HPD: lowest 3.3 × 10⁻⁸, highest 3.5 × 10⁻³)
Pestivirus K: 1.53 × 10⁻³ (95% HPD: lowest 6.7 × 10⁻⁴, highest 2.5 × 10⁻³)

Analyzing the MCC tree of Pestiviruses A, B, C, D, F, H, and K revealed that the tMRCA ages were 1997, 1975, 1946, 1990, 2004, 1990, and 1990, respectively (Supplementary data are available from the corresponding author). Pestivirus C (53 years) evolved at a rapid rate compared with Pestiviruses A (28), B (28), D (25), F (04), H (13), and K (13). Thus, this indicated that Pestivirus C was the first virulent virus noted in the Pestivirus genus.

Discussion

Pestiviruses show large genetic diversity. The envelope glycoprotein E2 of Pestiviruses A, B, C, D, F, H, and K elicits neutralizing antibodies. Therefore, the main aim of our study was to assess the codon usage bias and the evolutionary rate of these viruses. The nucleotide composition of each E2 gene was examined in this study. The findings revealed that nucleotide A was used abundantly in all Pestiviruses, which might be a genomic characteristic of members of the genus. Furthermore, the dinucleotide TG was overrepresented among the Pestiviruses, except for Pestivirus D, whereas CG was underrepresented in all Pestiviruses, thereby indicating that each Pestivirus has a varied abundance of dinucleotides. The variations in nucleotide composition showed that mononucleotides and dinucleotides contribute significantly to shaping the codon usage pattern of the Pestiviruses.

In addition, the magnitude of the relationship between mutational pressure and natural selection was evaluated for each Pestivirus. The PR2 analysis of the E2 gene found that Pestiviruses B, F, H, and K were more biased than Pestiviruses A, C, and D. According to a previous study on codon usage bias [23], sequences located near the origin of the PR2 plot are free of bias. The average ENC values for the Pestiviruses A, B, C, D, F, H, and K were 52.95 (SD ± 1.59), 45.62 (SD ± 2.78), 52.0 (SD ± 1.30), 53.42 (SD ± 1.45), 51.97 (SD ± 0.17), 51.57 (SD ± 5.60), and 48.72 (SD ± 1.85), respectively. The ENC scores in this study thus ranged from 45 to 55, which revealed a moderate level of bias among the sequences, as already demonstrated in previous studies [5, 22]. Among the E2 genes of these Pestiviruses, Pestivirus D had the highest ENC value, whereas Pestivirus B had the lowest. Our findings showed that the overall codon bias of Pestiviruses is moderate, which may facilitate genome replication and transcription.

The neutrality plot was used to confirm the driving forces of the bias and the magnitude of the evolutionary forces. Accordingly, our results showed that natural selection played an important role in Pestiviruses A, B, F, and H, whereas mutational pressure influenced Pestiviruses C, D, and K. In addition, this study assessed the physical features of amino acids, including hydrophobicity and aromaticity (AROMO), and their correlation with codon usage for all seven Pestiviruses. The correlations were non-significant for each of the seven Pestiviruses, showing that hydrophobicity and AROMO did not affect the pattern of codon usage bias. The most recent common ancestors of the lineages were dated to 1997, 1975, 1946, 1990, 2004, 1990, and 1990 for Pestiviruses A, B, C, D, F, H, and K, respectively. Based on our observations, this study confirmed the significant role of mutational pressure and natural selection in shaping codon usage bias and evolution. To our knowledge, this is the first report of such a study.

Conclusion

The glycoprotein E2 of all Pestiviruses is a potent protein that induces high levels of neutralizing antibodies in the host. The E2 gene has been extensively used in phylogeny to subgroup virus isolates, and glycoprotein E2 is a potent subunit vaccine candidate. However, some Pestiviruses, namely, BVDV-1, BVDV-2, CSFV, and to a certain extent BDV, have been studied far more extensively with respect to molecular epidemiology and phylogeny than the other Pestiviruses. Pestivirus-infected animals mount an immune response mainly to the E2 protein, and thus the E2 gene has been evolving over time. Mutational and selection factors under the immune response, in addition to compositional considerations, were found to play a substantial role in shaping the codon usage pattern. Analyzing the codon usage pattern can help in optimizing gene and protein expression. In addition, it facilitates the development of alternative viral vector vaccine options. Conversely, codon usage information can also be used to inhibit viral protein synthesis during replication rather than to enhance protein expression. This study may also provide information on codon usage for other viruses; thus, investigations can be conducted to explore other applications as well.

Authors’ Contributions

MS: Analyzed the data and prepared the results. KPS: Guided and supervised in every step of the work. UBI: Prepared tables and graphs. MSB: Prepared tables and edited the manuscript. NM: Collected the data, conducted the analysis and drafted the manuscript. CS: Analyzed the data. RRA: Drafted the manuscript. ES: Analyzed the data and drafted and edited the manuscript. VS: Extracted the data and edited the manuscript. SPK: Extraction and documentation of data. BRS: Reviewed and edited the manuscript. SKP: Prepared the graphs, and edited the manuscript. SSP: Extracted the data, drafted, and edited the manuscript. All authors have read and approved the manuscript.
  22 in total

1.  The 'effective number of codons' used in a gene.

Authors:  F Wright
Journal:  Gene       Date:  1990-03-01       Impact factor: 3.688

2.  MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0.

Authors:  Koichiro Tamura; Joel Dudley; Masatoshi Nei; Sudhir Kumar
Journal:  Mol Biol Evol       Date:  2007-05-07       Impact factor: 16.240

3.  Characterization of classical swine fever virus entry by using pseudotyped viruses: E1 and E2 are sufficient to mediate viral entry.

Authors:  Zai Wang; Yuchun Nie; Peigang Wang; Mingxiao Ding; Hongkui Deng
Journal:  Virology       Date:  2004-12-05       Impact factor: 3.616

4.  An extensive evaluation of codon usage pattern and bias of structural proteins p30, p54 and, p72 of the African swine fever virus (ASFV).

Authors:  Uma Bharathi Indrabalan; Kuralayanapalya Puttahonnappa Suresh; Chandan Shivamallu; Sharanagouda S Patil
Journal:  Virusdisease       Date:  2021-07-22

5.  Comprehensive analysis of the codon usage patterns of polyprotein of Zika virus.

Authors:  Jun Tao; Huipeng Yao
Journal:  Prog Biophys Mol Biol       Date:  2019-05-02       Impact factor: 3.667

6.  Genetic typing of recent classical swine fever isolates from India.

Authors:  S S Patil; D Hemadri; B P Shankar; A G Raghavendra; H Veeresh; B Sindhoora; S Chandan; K Sreekala; M R Gajendragad; K Prabhudas
Journal:  Vet Microbiol       Date:  2009-09-26       Impact factor: 3.293

7.  Pestivirus glycoprotein which induces neutralizing antibodies forms part of a disulfide-linked heterodimer.

Authors:  E Weiland; R Stark; B Haas; T Rümenapf; G Meyers; H J Thiel
Journal:  J Virol       Date:  1990-08       Impact factor: 5.103

8.  Spatial seroprevalence of classical swine fever in India.

Authors:  Sharanagouda S Patil; Kuralayanapalya Puttahonnappa Suresh; Divakar Hemadri; Jagadish Hiremath; Rajangam Sridevi; Paramanadham Krishnamoorthy; Sandeep Bhatia; Parimal Roy
Journal:  Trop Anim Health Prod       Date:  2021-07-05       Impact factor: 1.559

9.  Dinucleotide relative abundance extremes: a genomic signature. (Review)

Authors:  S Karlin; C Burge
Journal:  Trends Genet       Date:  1995-07       Impact factor: 11.639

10.  Analysis of codon usage and nucleotide composition bias in polioviruses.

Authors:  Jie Zhang; Meng Wang; Wen-qian Liu; Jian-hua Zhou; Hao-tai Chen; Li-na Ma; Yao-zhong Ding; Yuan-xing Gu; Yong-sheng Liu
Journal:  Virol J       Date:  2011-03-30       Impact factor: 4.099
