Translation Elongation Rate Measurement of Epstein-Barr Virus Strain GD1.

Gholamreza Motalleb

Abstract

BACKGROUND: Epstein-Barr virus (EBV) is strongly associated with human malignancies such as gastric carcinoma. Investigating synonymous codon usage in viruses could help in designing vaccines that generate immunity. The Codon Adaptation Index (CAI) is used as a measure of translation elongation rate, based on highly expressed genes. The aim of this study was to use the CAI to measure translation efficiency and so estimate how fast EBV-GD1 can produce its proteins.
METHODS: The complete genomic sequence of human herpes virus 4 strain GD1 (GenBank accession no. AY961628) was retrieved from <http://www.ncbi.nlm.nih.gov/sites/gquery>, and all protein-coding genes were extracted. The sequences were analyzed with DAMBE software.
RESULTS: The CAI values for the EBV-GD1 genes were 0.76356 ± 0.02957. The highest and lowest CAI values were 0.82233 and 0.68321, respectively. Highly expressed genes mostly showed more codon usage bias than lowly expressed genes.
CONCLUSION: The results provide a system and a set of principles for understanding the pathogenesis and evolution of EBV-GD1, and may open a window toward a better product or vaccine against the virus.

Keywords:  Codon usage bias; Epstein-Barr virus; Gene expression

Year:  2013        PMID: 25250137      PMCID: PMC4142932     

Source DB:  PubMed          Journal:  Iran J Cancer Prev        ISSN: 2008-2398


Introduction

The Codon Adaptation Index (CAI) measures synonymous codon usage bias for a DNA or RNA sequence. CAI is known to be an excellent predictor of gene expression in prokaryotes and unicellular eukaryotes. It has been used to evaluate the effect of natural selection on patterns of codon usage and to predict gene expression levels [1, 2], to find highly expressed genes [3, 4], to evaluate the adaptation of viral genes to their hosts [1], to indicate heterologous gene expression [5], to compare codon usage preferences between organisms [1], to find horizontally transferred genes [6, 8], to detect codon usage bias in genomes [9], to study the cell cycle of species [10], to optimize DNA vaccines [11], and in gene therapy [12], vaccine development and recombinant therapeutics [13]. Some studies have reported an influence of codon usage on the viral cycle. Studies of adaptation to host codon usage indicate that viral genes encoding critical proteins tend to use the synonymous codons that are most represented in the host genome [14], although synonymous codons are not used equally within or between genomes [15].
Epstein-Barr virus (EBV) is a ubiquitous double-stranded DNA virus of the human herpes virus family with B-lymphotropism. More than 90% of adults have serologic evidence of infection with this virus. It is acquired during early childhood, and the age at infection is much lower in less developed countries with low socioeconomic conditions [16]. It has been documented that gastric carcinoma, Burkitt’s lymphoma, undifferentiated nasopharyngeal carcinoma (NPC), Hodgkin’s disease, B- and T-cell lymphoma, and B-cell lymphoproliferations in immunocompromised patients can be caused by EBV [17-20]. Iran has a high incidence rate of gastric carcinoma, with an annual incidence of 26.1 per 100,000 for males and 11.1 for females [21].
In biopharmacology, researchers are interested in improving the translation efficiency that underlies protein production. Unfortunately, experiments are tedious and the reality is much more complicated. In the current study, DAMBE software (version 5.3.27) was used to assess CAI and so estimate how fast EBV-GD1 can produce its proteins. These data might provide a system and principles for understanding the pathogenesis and evolution of EBV-GD1.

Materials and Methods

The study started in the winter of 2012. All bioinformatics analyses were performed at the bioinformatics facility of the Faculty of Science, University of Zabol. The genomic sequences of human herpes virus 4 strain GD1 were retrieved from GenBank (accession no. AY961628), and all protein-coding genes were extracted in order to evaluate the effectiveness of CAI as implemented in DAMBE [22].
The CAI of a coding sequence (CDS) is calculated from (1) the codon frequencies of the CDS and (2) the codon frequencies of a set of known highly expressed genes (often referred to as the reference set), which is used to generate a column of w values:

$$ w_{ij} = \frac{f_{ij.\mathrm{ref}}}{\mathrm{Max}f_{i.\mathrm{ref}}} $$

where f_ij.ref is the frequency of codon j in synonymous codon family i of the reference set, and Maxf_i.ref is the maximum codon frequency in synonymous codon family i. The codon whose frequency is Maxf_i.ref is often referred to as the major codon (its w is 1), and the other codons are referred to as minor codons. The major codon is assumed to be the translationally optimal codon. Only sense codons are counted, and for a codon family with a single codon (n_i = 1) the w_ij value is always 1, regardless of the codon usage bias of the gene. The CAI value of a CDS is calculated as:

$$ \mathrm{CAI} = \exp\left( \frac{\sum_{i=1}^{m} \sum_{j=1}^{n_i} f_{ij} \ln w_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n_i} f_{ij}} \right) $$

where m is the number of synonymous codon families, n_i is the number of synonymous codons in codon family i, and f_ij is the frequency of codon j in codon family i. The exponent is simply a weighted average of ln(w). The maximum CAI value is 1 [23].
Relative Synonymous Codon Usage (RSCU) measures codon usage bias for each codon family and is calculated directly from the input sequences. RSCU is a codon-specific index of codon usage, whereas CAI is a gene-specific index of codon usage that is related to gene expression [23]. The general equation for RSCU is:

$$ \mathrm{RSCU}_{ij} = \frac{f_{ij}}{\left( \sum_{j=1}^{n_i} f_{ij} \right) / n_i} $$

where i is the codon family and j is a specific codon within the family [23]. For example, the alanine codon family i comprises GCU, GCC, GCA, and GCG, and j is a specific codon in that family, such as GCU. RSCU equals 1 when there is no codon usage bias, is greater than 1 when the codon is overused, and is less than 1 when it is underused [22].
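The definitions of w, CAI and RSCU given in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DAMBE's implementation: the function names (`count_codons`, `rscu`, `relative_adaptiveness`, `cai`) and the toy sequences are hypothetical, codons absent from the reference set are simply skipped, and the six-codon amino acids are split into four- and two-codon families, as the RSCU values in Tables 2 and 3 suggest.

```python
from collections import Counter
from math import exp, log

# Synonymous codon families (RNA alphabet, sense codons only).
# Assumption: Leu, Arg and Ser are split into 4- and 2-codon families.
FAMILY_TEXT = """
Ala GCU GCC GCA GCG
Cys UGU UGC
Asp GAU GAC
Glu GAA GAG
Phe UUU UUC
Gly GGU GGC GGA GGG
His CAU CAC
Ile AUU AUC AUA
Lys AAA AAG
Leu4 CUU CUC CUA CUG
Leu2 UUA UUG
Met AUG
Asn AAU AAC
Pro CCU CCC CCA CCG
Gln CAA CAG
Arg4 CGU CGC CGA CGG
Arg2 AGA AGG
Ser4 UCU UCC UCA UCG
Ser2 AGU AGC
Thr ACU ACC ACA ACG
Val GUU GUC GUA GUG
Trp UGG
Tyr UAU UAC
"""
FAMILIES = {name: codons for name, *codons in
            (line.split() for line in FAMILY_TEXT.strip().splitlines())}


def count_codons(cds):
    """Count codons of an in-frame coding sequence (DNA or RNA letters)."""
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3          # drop a trailing partial codon
    return Counter(rna[i:i + 3] for i in range(0, usable, 3))


def rscu(counts):
    """RSCU_ij = f_ij / (sum_j f_ij / n_i) for every sense codon."""
    out = {}
    for codons in FAMILIES.values():
        total = sum(counts.get(c, 0) for c in codons)
        for c in codons:
            out[c] = counts.get(c, 0) * len(codons) / total if total else 0.0
    return out


def relative_adaptiveness(ref_counts):
    """w_ij = f_ij.ref / Maxf_i.ref; codons unseen in the reference get no w."""
    w = {}
    for codons in FAMILIES.values():
        top = max(ref_counts.get(c, 0) for c in codons)
        for c in codons:
            if top and ref_counts.get(c, 0):
                w[c] = ref_counts[c] / top
    return w


def cai(cds_counts, w):
    """CAI = exp(sum f_ij ln w_ij / sum f_ij) over codons with a defined w."""
    num = sum(f * log(w[c]) for c, f in cds_counts.items() if c in w)
    den = sum(f for c, f in cds_counts.items() if c in w)
    return exp(num / den) if den else float("nan")


# Toy reference set: GCC x3, GCU x1, AAG x2, AAA x1 (illustrative only).
reference = count_codons("GCCGCCGCCGCTAAGAAGAAA")
w = relative_adaptiveness(reference)
print(cai(count_codons("GCCAAG"), w))             # major codons only -> 1.0
print(round(cai(count_codons("GCTAAA"), w), 3))   # minor codons only -> 0.408
```

A CDS built only from major codons reaches the maximum CAI of 1, while one built only from minor codons scores lower; real analyses would use a reference set of highly expressed genes rather than a toy sequence.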

Results

The genome sequences of human herpes virus 4 strain GD1 were used to evaluate the effectiveness of CAI as implemented in DAMBE. The CAI values for the EBV-GD1 genes were 0.76356 ± 0.02957 (Table 1). The highest and lowest CAI values were 0.82233 and 0.68321, respectively. For the alanine codon family, taken as an example, the high-CAI genes showed more codon usage bias, with the highest RSCU being 2.923 and the lowest only 0.246; for the low-CAI genes, the highest and lowest RSCU values were 2.797 and 0.241 (Tables 2 and 3). Highly expressed genes mostly showed more codon usage bias than lowly expressed genes (Figure 1), but ANOVA of RSCU_H and RSCU_L did not show a significant difference (P>0.05).
Table 1

Output of codon adaptation index (CAI) for EBV-GD1 (Mean: 0.76356; STD: 0.02957)

SeqName                                SeqLen  CAI
unknown|1736                           3954    0.76709
unknown|9710                           510     0.76776
unknown|36258                          1353    0.70990
unknown|46538                          1008    0.76740
unknown|47455                          1773    0.77478
unknown|49154                          528     0.74644
unknown|C49725                         9528    0.76936
unknown|C59248                         3717    0.76630
unknown|62966                          1092    0.76266
unknown|64136                          2478    0.79963
unknown|66629                          906     0.80136
unknown|67628                          1212    0.76329
unknown|68847                          1071    0.77103
unknown|C70473                         1314    0.77520
unknown|C71899                         117     0.76729
unknown|C71987                         2622    0.76897
unknown|74654                          654     0.76750
unknown|C75368                         834     0.79476
unknown|76277                          306     0.73176
unknown|76655                          486     0.71418
unknown|C77160                         2568    0.73715
unknown|C77297                         444     0.72631
unknown|79820                          357     0.68321
EBNA3B (EBNA4A) latent protein|82903   2814    0.72716
EBNA3C latent protein|85921            3027    0.74198
unknown|C89046                         669     0.73457
Z protein|C89811                       735     0.73358
unknown|C90996                         1815    0.79433
unknown|92812                          930     0.77330
unknown|93932                          1611    0.73806
unknown|95580                          1923    0.70636
unknown|97588                          411     0.77062
unknown|97983                          765     0.76167
unknown|98764                          651     0.78868
unknown|C99460                         2427    0.78326
unknown|103578                         834     0.82115
unknown|106768                         1215    0.77914
unknown|C108378                        225     0.75569
unknown|C111572                        996     0.75849
probable DNA packaging protein|112569  2070    0.78928
unknown|112569                         975     0.76398
unknown|C113494                        1008    0.77171
unknown|C114482                        1521    0.75317
unknown|C115975                        675     0.80152
unknown|C117993                        702     0.72556
unknown|C118758                        1260    0.75635
unknown|C120031                        903     0.78173
unknown|C120952                        4143    0.80319
unknown|125621                         1725    0.78192
unknown|C128546                        2118    0.77372
unknown|C130668                        1821    0.75959
unknown|132490                         744     0.71112
unknown|133046                         1710    0.76698
unknown|135557                         1815    0.78338
unknown|C137409                        744     0.75202
unknown|C140486                        2772    0.74451
unknown|C149527                        708     0.71853
unknown|C150198                        1407    0.74320
unknown|150236                         309     0.72173
unknown|C151616                        936     0.78017
unknown|C153152                        3045    0.82233
unknown|C156202                        2571    0.79823
unknown|C160837                        3384    0.80318
unknown|C164308                        660     0.81822
unknown|164957                         663     0.79950
unknown|C166757                        180     0.72802
Table 2

RSCU of genes with low CAI values (RSCU_L) for EBV-GD1

Codon  AA  ObsFreq  RSCU_L    Codon  AA  ObsFreq  RSCU_L
UAG    *   0        0.000     UGA    *   1        1.000
GCU    A   12       0.361     UAA    *   2        2.000
GCC    A   20       0.602     GCG    A   8        0.241
UGU    C   3        0.750     GCA    A   93       2.797
GAU    D   34       1.172     UGC    C   5        1.250
GAG    E   21       0.792     GAC    D   24       0.828
UUU    F   13       1.444     GAA    E   32       1.208
GGU    G   29       0.410     UUC    F   5        0.556
GGC    G   29       0.410     GGG    G   76       1.074
CAC    H   5        0.455     GGA    G   149      2.106
AUU    I   17       1.821     CAU    H   17       1.545
AUC    I   6        0.643     AUA    I   5        0.536
AAG    K   17       1.478     AAA    K   6        0.522
CUC    L   14       1.167     CUA    L   14       1.167
CUU    L   13       1.083     CUG    L   7        0.583
UUG    L   8        1.000     UUA    L   8        1.000
AAC    N   12       1.143     AUG    M   20       1.000
CCA    P   68       1.744     AAU    N   9        0.857
CCU    P   40       1.026     CCC    P   33       0.846
CAA    Q   28       1.217     CCG    P   15       0.385
AGA    R   19       0.844     CAG    Q   18       0.783
CGA    R   10       1.000     AGG    R   26       1.156
CGG    R   12       1.200     CGC    R   9        0.900
AGC    S   11       0.759     CGU    R   9        0.900
UCA    S   24       2.043     AGU    S   18       1.241
UCG    S   4        0.340     UCC    S   11       0.936
ACC    T   14       1.167     UCU    S   8        0.681
ACG    T   7        0.583     ACA    T   17       1.417
GUU    V   10       0.930     ACU    T   10       0.833
GUC    V   12       1.116     GUG    V   11       1.023
UGG    W   12       1.000     GUA    V   10       0.930
UAU    Y   10       1.429     UAC    Y   4        0.571

ObsFreq: observed frequency; AA: amino acid; *: stop codon. RSCU_L: relative synonymous codon usage of the low-CAI genes.

Table 3

RSCU of genes with high CAI values (RSCU_H) for EBV-GD1

Codon  AA  ObsFreq  RSCU_H    Codon  AA  ObsFreq  RSCU_H
UAG    *   1        1.000     UGA    *   0        0.000
GCU    A   8        0.246     UAA    *   2        2.000
GCC    A   95       2.923     GCG    A   18       0.554
UGU    C   8        0.390     GCA    A   9        0.277
GAU    D   22       0.518     UGC    C   33       1.610
GAG    E   70       1.750     GAC    D   63       1.482
UUU    F   33       0.971     GAA    E   10       0.250
GGU    G   3        0.132     UUC    F   35       1.029
GGC    G   41       1.802     GGG    G   39       1.714
CAC    H   31       1.442     GGA    G   8        0.352
AUU    I   15       0.703     CAU    H   12       0.558
AUC    I   40       1.875     AUA    I   9        0.422
AAG    K   63       1.703     AAA    K   11       0.297
CUC    L   58       1.415     CUA    L   9        0.220
CUU    L   4        0.098     CUG    L   93       2.268
UUG    L   10       1.818     UUA    L   1        0.182
AAC    N   37       1.609     AUG    M   29       1.000
CCA    P   15       0.779     AAU    N   9        0.391
CCU    P   14       0.727     CCC    P   36       1.870
CAA    Q   10       0.417     CCG    P   12       0.623
AGA    R   9        0.529     CAG    Q   38       1.583
CGA    R   4        0.219     AGG    R   25       1.471
CGG    R   29       1.589     CGC    R   34       1.863
AGC    S   27       1.636     CGU    R   6        0.329
UCA    S   7        0.444     AGU    S   6        0.364
UCG    S   16       1.016     UCC    S   31       1.968
ACC    T   35       1.892     UCU    S   9        0.571
ACG    T   25       1.351     ACA    T   12       0.649
GUU    V   4        0.143     ACU    T   2        0.108
GUC    V   37       1.321     GUG    V   66       2.357
UGG    W   15       1.000     GUA    V   5        0.179
UAU    Y   11       0.379     UAC    Y   47       1.621

ObsFreq: observed frequency; AA: amino acid; *: stop codon. RSCU_H: relative synonymous codon usage of the high-CAI genes.
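
As a check on Tables 2 and 3, the RSCU values of the alanine family can be recomputed from the observed frequencies. The sketch below assumes that RSCU is the observed count of a codon divided by the mean count in its family; the helper name `family_rscu` is hypothetical and the counts are transcribed from the ObsFreq columns.

```python
def family_rscu(obs):
    """RSCU for one codon family: observed count / mean count in the family."""
    mean = sum(obs.values()) / len(obs)
    return {codon: round(n / mean, 3) for codon, n in obs.items()}


# Observed alanine codon counts (ObsFreq) from Table 2 (low CAI) and Table 3 (high CAI).
ala_low = {"GCU": 12, "GCC": 20, "GCG": 8, "GCA": 93}
ala_high = {"GCU": 8, "GCC": 95, "GCG": 18, "GCA": 9}

print(family_rscu(ala_low))    # GCA is the most used codon (2.797)
print(family_rscu(ala_high))   # GCC is the most used codon (2.923)
```

The recomputed values agree with the RSCU_L and RSCU_H columns for alanine, including the extremes quoted in the Results (2.797 and 0.241; 2.923 and 0.246).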

Figure 1

Relative synonymous codon usage (RSCU) of high-CAI and low-CAI genes (RSCU_H and RSCU_L, respectively) for the 64 codons of EBV-GD1.

Discussion

One of the fundamental questions in molecular biology concerns the genetic code. In microorganisms, the unequal usage of synonymous codons is most commonly attributed to both mutation pressure and natural selection, and it can affect the level of translation. The CAI uses highly expressed genes from a species to evaluate the relative merits of each codon, and it is also used as an indicator of gene expression and translation efficiency [23]. The translation efficiency of an mRNA depends partly on its coding strategy, which is reflected in codon usage bias. Codon usage bias is usually characterized by codon-specific and gene-specific indices. A representative codon-specific index is the relative synonymous codon usage (RSCU) [24], and a representative gene-specific index is the codon adaptation index (CAI). The CAI is an index of translation elongation rate derived from the codon usage of highly expressed genes [25]. Put another way, highly expressed genes are under pressure to use abundant, common, or cheap amino acids; conversely, a protein whose amino acid components are rare or expensive cannot be produced in large quantities. According to previous data, highly expressed genes preferentially use the codons recognized by the most abundant tRNA for each amino acid. For this reason, codon usage is highly biased in highly expressed genes, especially in rapidly replicating organisms [23-28]. By identifying highly and lowly expressed genes in an organism, we might be able to select them as main targets in pharmacology, especially in vaccine production. The CAI is calculated with a reference set of highly expressed genes; its maximum is 1 and its minimum is 0. In general, the higher the CAI value, the more efficiently the mRNA is translated.
Highly expressed human genes typically have CAI values above 0.7, given the human reference set of highly expressed genes [23]. The CAI values for the EBV-GD1 genes were 0.76356 ± 0.02957. Our result agrees with Knipe et al. (2001) that EBV is an extremely efficient virus, which infects a large majority of the adult population and, following primary infection, remains in the infected host as a lifelong asymptomatic infection [26]. Xia (2007) determined that viruses that cause acute disease need to translate their mRNAs efficiently [27]. Figure 1 plots the RSCU of the high-CAI genes (RSCU_H) and the low-CAI genes (RSCU_L) for the 64 codons. It shows that the high-CAI genes (representing highly expressed genes) have RSCU values that deviate much more from 1 than those of the low-CAI genes (representing lowly expressed genes). Highly expressed genes mostly showed more codon usage bias than lowly expressed genes (Figure 1), but ANOVA did not show a significant difference (P>0.05). This might be related to the two forms of existence of EBV, latent and productive: the EBV genes expressed during latency show codon usage markedly different from that of the genes expressed during lytic growth [29]. For example, what can we say about the tRNA carrying alanine? From the results, GCC is the most frequently used alanine codon in the high-CAI genes, and we might predict that tRNAAla/AGG is the most abundant. How could this prediction be tested? Unfortunately, this is an extremely difficult experiment. Nevertheless, these data could be used to highlight the genes with high expression rates, which relate to their importance in EBV-GD1, and might therefore provide a basis for understanding the pathogenesis of EBV-GD1 and open a window toward a better product or vaccine against the virus.
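
The observation that RSCU values of high-CAI genes deviate more from 1 can be illustrated with a simple dispersion measure. The sketch below computes the mean absolute deviation from 1 over five four-codon families (Ala, Gly, Pro, Thr, Val), using RSCU values transcribed from Tables 2 and 3. The measure, the choice of families and the function name `mean_abs_dev_from_one` are assumptions made here for illustration; this is not the ANOVA reported in the study.

```python
def mean_abs_dev_from_one(values):
    """Average of |RSCU - 1| over a list of RSCU values."""
    return sum(abs(v - 1.0) for v in values) / len(values)


# RSCU values transcribed from Table 2 (low-CAI genes).
rscu_low = [0.361, 0.602, 0.241, 2.797,    # Ala: GCU GCC GCG GCA
            0.410, 0.410, 1.074, 2.106,    # Gly: GGU GGC GGG GGA
            1.744, 1.026, 0.846, 0.385,    # Pro: CCA CCU CCC CCG
            1.167, 0.583, 1.417, 0.833,    # Thr: ACC ACG ACA ACU
            0.930, 1.116, 1.023, 0.930]    # Val: GUU GUC GUG GUA

# RSCU values transcribed from Table 3 (high-CAI genes), same codon order.
rscu_high = [0.246, 2.923, 0.554, 0.277,
             0.132, 1.802, 1.714, 0.352,
             0.779, 0.727, 1.870, 0.623,
             1.892, 1.351, 0.649, 0.108,
             0.143, 1.321, 2.357, 0.179]

for label, values in (("RSCU_L", rscu_low), ("RSCU_H", rscu_high)):
    mean = sum(values) / len(values)
    spread = mean_abs_dev_from_one(values)
    print(label, "mean:", round(mean, 3), "mean |RSCU - 1|:", round(spread, 3))
```

For these five families the mean absolute deviation is about 0.72 for the high-CAI genes and about 0.45 for the low-CAI genes, while both sets have a mean RSCU of about 1. Because RSCU averages to 1 within each codon family by construction, a test that compares group means may not separate the two sets even when their spread differs.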

Conclusion

The results might provide a system and its principles for understanding the pathogenesis and evolution of EBV-GD1, opening a window toward a better product or vaccine against the virus. Based on the results, we could identify which genes or sequences are highly expressed, or are under strong natural selection to optimize their codon usage and so maximize translation efficiency and accuracy. In other words, selection should be weak for lowly expressed genes, whose codon usage might depend largely on mutation bias [27].

References (22 in total)

1.  Epstein-Barr virus-associated Hodgkin's disease: epidemiologic characteristics in international data.

Authors:  S L Glaser; R J Lin; S L Stewart; R F Ambinder; R F Jarrett; P Brousset; G Pallesen; M L Gulley; G Khan; J O'Grady; M Hummel; M V Preciado; H Knecht; J K Chan; A Claviez
Journal:  Int J Cancer       Date:  1997-02-07       Impact factor: 7.396

2.  The impact of intragenic CpG content on gene expression.

Authors:  Asli Petra Bauer; Doris Leikam; Simone Krinner; Frank Notka; Christine Ludwig; Gernot Längst; Ralf Wagner
Journal:  Nucleic Acids Res       Date:  2010-03-04       Impact factor: 16.971

3.  The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications.

Authors:  P M Sharp; W H Li
Journal:  Nucleic Acids Res       Date:  1987-02-11       Impact factor: 16.971

4.  A theoretical analysis of codon adaptation index of the Boophilus microplus bm86 gene directed to the optimization of a DNA vaccine.

Authors:  Lina María Ruiz; Gemma Armengol; Edwin Habeych; Sergio Orduz
Journal:  J Theor Biol       Date:  2005-09-19       Impact factor: 2.691

5.  Contrasts in codon usage of latent versus productive genes of Epstein-Barr virus: data and hypotheses.

Authors:  S Karlin; B E Blaisdell; G A Schachtel
Journal:  J Virol       Date:  1990-09       Impact factor: 5.103

6.  Horizontal gene transfer in glycosyl hydrolases inferred from codon usage in Escherichia coli and Bacillus subtilis.

Authors:  S Garcia-Vallvé; J Palau; A Romeu
Journal:  Mol Biol Evol       Date:  1999-09       Impact factor: 16.240

7.  Cancer occurrence in Iran in 2002, an international perspective.

Authors:  Alireza Sadjadi; Mehdi Nouraie; Mohammad Ali Mohagheghi; Alireza Mousavi-Jarrahi; Reza Malekezadeh; Donald Maxwell Parkin
Journal:  Asian Pac J Cancer Prev       Date:  2005 Jul-Sep

8.  Nasal T-cell lymphoma causally associated with Epstein-Barr virus: clinicopathologic, phenotypic, and genotypic studies.

Authors:  Y Harabuchi; S Imai; J Wakashima; M Hirao; A Kataura; T Osato; S Kon
Journal:  Cancer       Date:  1996-05-15       Impact factor: 6.860

9.  Molecular archaeology of the Escherichia coli genome.

Authors:  J G Lawrence; H Ochman
Journal:  Proc Natl Acad Sci U S A       Date:  1998-08-04       Impact factor: 11.205

10.  An environmental signature for 323 microbial genomes based on codon adaptation indices.

Authors:  Hanni Willenbrock; Carsten Friis; Agnieszka S Juncker; David W Ussery
Journal:  Genome Biol       Date:  2006       Impact factor: 13.583
