
Bioinformatic Analysis of Codon Usage and Phylogenetic Relationships in Different Genotypes of the Hepatitis C Virus.

Mojtaba Mortazavi1, Mohammad Zarenezhad2, Seyed Moayed Alavian3, Saeed Gholamzadeh4, Abdorrasoul Malekpour4, Mohammad Ghorbani5, Masoud Torkzadeh Mahani1, Safa Lotfi1, Ali Fakhrzad6.   

Abstract

BACKGROUND: The hepatitis C virus (HCV) has six major genotypes. The purpose of this study was to phylogenetically investigate the differences between the genotypes of HCV, and to determine the types of amino acid codon usage in the structure of the virus in order to discover new methods for treatment regimes.
METHODS: The codon usage of the six genotypes of the HCV nucleotide sequence was investigated through the online application available on the website Gene Infinity. Also, phylogenetic analysis and the evolutionary relationship of HCV genotypes were analyzed with MEGA 7 software.
RESULTS: The six genotypes of HCV were divided into two groups based on their codon usage properties. In the first group, genotypes 1 and 5 (74.02%), and in the second group, genotypes 2 and 6 (72.43%) were shown to have the most similarity in terms of codon usage. Unlike the results with respect to determining the similarity of codon usage, the phylogenetic analysis showed the closest resemblance and correlation between genotypes 1 and 4. The results also showed that HCV has a GC (guanine-cytosine) abundant genome structure and prefers codons with GC for translation.
CONCLUSIONS: Genotypes 1 and 4 demonstrated remarkable similarity in terms of genome sequences and proteins, but surprisingly, in terms of the preferred codons for gene expression, they showed the greatest difference. More studies are therefore needed to confirm the results and select the best approach for treatment of these genotypes based on their codon usage properties.

Keywords:  Bioinformatic Study; Codon Usage; Hepatitis C Virus; Phylogenetic Analysis

Year:  2016        PMID: 27882066      PMCID: PMC5111459          DOI: 10.5812/hepatmon.39196

Source DB:  PubMed          Journal:  Hepat Mon        ISSN: 1735-143X            Impact factor:   0.660


1. Background

Hepatitis can be caused by several factors, including certain drugs, chemicals, and infectious agents (1). Several viruses are involved in the pathogenesis of hepatitis, such as hepatitis viruses A, B, C, D, and E (2). Among these diseases, hepatitis B and C are considered to be more serious and can become chronic (3, 4). Hepatitis C is a viral infection, caused by the hepatitis C virus (HCV), that produces either acute or chronic liver inflammation (5). HCV belongs to the Flaviviridae family and the hepacivirus genus, and has a single-stranded RNA (ribonucleic acid) genome (6). It leads to inflammation of the liver, and is one of the most common causes of liver transplants in the world (7-9). In 70% of cases, the disease becomes chronic; spontaneous recovery may occur in 30% of cases (10). Annually, three to five million people are infected with the virus worldwide, and an estimated 170 million people are currently infected (5). Chronic infection with HCV causes deaths due to decompensated cirrhosis, end-stage liver disease, and hepatocellular carcinoma (11). HCV has high molecular diversity, with six major genotypes (numbered 1 to 6) and over 70 subtypes designated by letters such as a, b, and c (12). Therapeutic programs usually begin with rapid determination of the HCV genotype, because the genotype influences both the duration of treatment and the sustained virological response (SVR) (13). In the genetic code, most amino acids are encoded by multiple (two to six) codons, which generally differ only at the third nucleotide of the codon (14, 15). This understanding has led to the identification of some important facts about the virus, as patterns of codon usage vary among species (16). Although each codon specifies only one amino acid, a single amino acid may be encoded by more than one codon. Codons that encode the same amino acid are known as synonymous codons (e.g., leucine has six synonymous codons).
In total, 18 of the 20 amino acids can be encoded by more than one codon due to variations at the third nucleotide position within a particular codon. Codon usage bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA (17). Studying codon usage can help clarify the evolution of a particular species (14). Recent studies have shown that the synonymous codons of an amino acid are not used with the same frequency, and that codon usage differs between organisms and even between the genes of one organism (18). The high genetic diversity of HCV poses a challenge for the improvement of vaccines and pan-genotypic treatment methods (19). Multiple genotypes and subtypes of HCV have been identified via the analysis of nucleotide sequences (20). Characterization of these genetic properties and the possible differences between these genotypes is likely to contribute to the development of effective prevention and treatment protocols against HCV infection (21). Previously, we were the first to study rare codon clusters (RCCs) and their locations in the structures of HCV proteins (22).
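The degeneracy figures quoted above (61 sense codons, 18 amino acids with more than one codon, six codons for leucine) can be checked with a few lines of Python. This is a minimal sketch that assumes the standard genetic code; the names `CODON_TABLE` and `degeneracy` are illustrative and do not come from the study.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every codon to its one-letter amino acid.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def degeneracy():
    """Count the synonymous codons of each amino acid (stop codons excluded)."""
    return Counter(aa for aa in CODON_TABLE.values() if aa != "*")

deg = degeneracy()
sense_codons = sum(deg.values())                        # 61
multi = sorted(aa for aa, n in deg.items() if n > 1)    # 18 amino acids
single = sorted(aa for aa, n in deg.items() if n == 1)  # ['M', 'W']
print(sense_codons, len(multi), single, deg["L"])       # 61 18 ['M', 'W'] 6
```

Only Met and Trp have a single codon, which is why they carry no information about codon preference.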

2. Objectives

In this project, a bioinformatic study of different HCV genotypes was conducted to examine the phylogenetic differences between these genotypes, as well as the codon usage for each amino acid in the virus. It was hoped that the findings could then be used to choose more precise and effective treatment regimens.

3. Methods

3.1. HCV Genome Sequences

For the bioinformatic analysis, the nucleotide sequences and features of the six HCV genotypes were obtained from the following website: http://www.ncbi.nlm.nih.gov/genome/genomes/10312 (Table 1).
Table 1.

Genetic Properties of HCV Genotypes

| | HCV-G1 | HCV-G2 | HCV-G3 | HCV-G4 | HCV-G5 | HCV-G6 |
|---|---|---|---|---|---|---|
| Locus | NC_004102, 9646 bp ss-RNA linear, VRL 17-JUN-2016 | NC_009823, 9711 bp RNA linear, VRL 26-JUL-2011 | NC_009824, 9456 bp RNA linear, VRL 27-JUL-2011 | NC_009825, 9355 bp RNA linear, VRL 26-JUL-2011 | NC_009826, 9343 bp RNA linear, VRL 26-JUL-2011 | NC_009827, 9628 bp RNA linear, VRL 26-JUL-2011 |
| Accession | NC_004102 | NC_009823 | NC_009824 | NC_009825 | NC_009826 | NC_009827 |
| Version | NC_004102.1, GI:22129792 | NC_009823.1, GI:157781212 | NC_009824.1, GI:157781216 | NC_009825.1, GI:157781208 | NC_009826.1, GI:157781210 | NC_009827.1, GI:157781214 |
| Serotype | 1a | 2a | 3a | 4a | 5a | 6b |
| Db_Xref | Taxon:11103, GeneID:951475 | Taxon:40271, GeneID:11027172 | Taxon:356114, GeneID:11027185 | Taxon:33745, GeneID:11027168 | Taxon:33746, GeneID:11027170 | Taxon:42182, GeneID:11027174 |
| Protein ID | NP_671491.1 | YP_001469630.1 | YP_001469631.1 | YP_001469632.1 | YP_001469633.1 | YP_001469634.1 |
| Db_Xref | GI:22129793, GeneID:951475 | GI:157781213, GeneID:11027172 | GI:157781217, GeneID:11027185 | GI:157781209, GeneID:11027168 | GI:157781211, GeneID:11027170 | GI:157781215, GeneID:11027174 |

3.2. Analysis of Codon Usage

In the next step, the frequency, number, and fraction of each of the 61 codons were evaluated for each amino acid within the HCV proteins, and the preferred codons were extracted using the tool provided on the Gene Infinity website: http://www.geneinfinity.org/sms/sms_codonusage.html (23) (Table 3).
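The number and fraction reported for each codon can be illustrated with a short Python sketch. It is not the Gene Infinity tool itself: it assumes the standard genetic code and an in-frame coding sequence. The function names `codon_usage` and `preferred_codons` and the toy sequence are hypothetical and are not taken from the HCV genomes.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def codon_usage(cds):
    """Return {amino_acid: {codon: (number, fraction)}} for an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    by_aa = defaultdict(dict)
    for codon, aa in CODON_TABLE.items():
        by_aa[aa][codon] = counts.get(codon, 0)
    usage = {}
    for aa, codon_counts in by_aa.items():
        total = sum(codon_counts.values())
        # Fraction = share of this codon among all synonymous codons of the amino acid.
        usage[aa] = {c: (n, n / total if total else 0.0) for c, n in codon_counts.items()}
    return usage

def preferred_codons(usage):
    """Most frequent codon for every amino acid that occurs at least once."""
    return {
        aa: max(codons, key=lambda c: codons[c][0])
        for aa, codons in usage.items()
        if any(n for n, _ in codons.values())
    }

# Toy sequence (hypothetical): ATG GCC GCC GCG GCT TGC TGC TGT TGA
usage = codon_usage("ATGGCCGCCGCGGCTTGCTGCTGTTGA")
preferred = preferred_codons(usage)
print(usage["A"]["GCC"], preferred["A"], preferred["C"])  # (2, 0.5) GCC TGC
```

The fractions of the synonymous codons of one amino acid sum to 1, which is the convention used in the codon table of this article.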
Table 2.

The Nucleotide Compositional Properties of the Six HCV Genotypes

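| | HCV-G1 | HCV-G2 | HCV-G3 | HCV-G4 | HCV-G5 | HCV-G6 |
|---|---|---|---|---|---|---|
| %G1 + C1 | 57.39 | 55.75 | 56.60 | 56.07 | 56.47 | 55.81 |
| %G1 + A1 | 57.62 | 57.96 | 56.47 | 58.06 | 57.60 | 57.80 |
| %G1 + T1 | 51.94 | 53.02 | 52.86 | 52.78 | 51.96 | 52.10 |
| %A1 + T1 | 42.61 | 44.25 | 43.40 | 43.93 | 43.53 | 44.19 |
| %A1 + C1 | 48.06 | 46.98 | 47.14 | 47.22 | 48.04 | 47.90 |
| %C1 + T1 | 42.38 | 42.04 | 43.53 | 41.94 | 42.40 | 42.20 |
| %G2 + C2 | 50.61 | 50.35 | 50.45 | 49.29 | 49.70 | 50.15 |
| %G2 + A2 | 44.54 | 43.52 | 44.65 | 43.80 | 44.72 | 44.05 |
| %G2 + T2 | 49.62 | 48.60 | 48.59 | 48.35 | 48.64 | 48.79 |
| %A2 + T2 | 49.39 | 49.65 | 49.55 | 50.71 | 50.30 | 49.85 |
| %A2 + C2 | 50.38 | 51.40 | 51.41 | 51.65 | 51.36 | 51.21 |
| %C2 + T2 | 55.46 | 56.48 | 55.35 | 56.20 | 55.28 | 55.95 |
| %G3 + C3 | 68.58 | 66.24 | 59.91 | 63.05 | 64.76 | 61.08 |
| %G3 + A3 | 43.08 | 44.21 | 44.36 | 44.60 | 44.16 | 45.28 |
| %G3 + T3 | 47.86 | 47.25 | 49.55 | 47.09 | 48.74 | 48.56 |
| %A3 + T3 | 31.42 | 33.76 | 40.09 | 36.95 | 35.24 | 38.92 |
| %A3 + C3 | 52.14 | 52.75 | 50.45 | 52.91 | 51.26 | 51.44 |
| %C3 + T3 | 56.92 | 55.79 | 55.64 | 55.40 | 55.84 | 54.72 |
| %G3s + C3s | 67.20 | 64.60 | 58.08 | 61.48 | 63.29 | 59.34 |
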
The phylogenetic and evolutionary relationships of the HCV genotypes were evaluated using MEGA 7 software (24). The deduced amino acid sequences from the collected samples and the data obtained from GenBank were analyzed by constructing a maximum parsimony phylogenetic tree in MEGA 7. The frequencies of the used codons were reported as descriptive statistics. Minitab version 16.0 was used for statistical analysis (24).

3.3. Compositional Properties Measures

To examine the compositional properties of the six HCV sequences, GC1s,2s,3s, GA1s,2s,3s, GT1s,2s,3s, AT1s,2s,3s, AC1s,2s,3s, and CT1s,2s,3s (the frequencies of the nucleotides G+C, G+A, G+T, A+T, A+C, and C+T at the first, second, and third codon positions) were calculated within each open reading frame (ORF) using the CAIcal web server (25).
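The position-specific measures can be sketched in Python as follows. This is an illustrative calculation rather than the CAIcal implementation: it computes the plain percentage of codons whose first, second, or third base belongs to a given pair of nucleotides. Measures with an "s" suffix are commonly restricted to synonymous codons, and that restriction is not applied here. The function name `positional_pair_content` and the toy sequence are assumptions made for the example.

```python
def positional_pair_content(cds, pair="GC"):
    """Percentage of codons whose 1st, 2nd and 3rd base is one of the two bases in `pair`."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    percentages = []
    for position in range(3):
        # Step through the sequence three bases at a time, starting at this codon position.
        hits = sum(1 for i in range(position, usable, 3) if cds[i] in pair)
        percentages.append(100.0 * hits / n_codons)
    return tuple(percentages)

# Toy sequence (hypothetical): ATG GCC GCC GCG GCT TGC TGC TGT TGA
TOY_CDS = "ATGGCCGCCGCGGCTTGCTGCTGTTGA"
gc = positional_pair_content(TOY_CDS, "GC")
at = positional_pair_content(TOY_CDS, "AT")
print([round(v, 2) for v in gc])  # [44.44, 88.89, 66.67]
```

Complementary pairs add up to 100% at each position, which is also visible in the compositional table of this article (for example, %G1 + C1 and %A1 + T1 for HCV-G1 are 57.39 and 42.61).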

4. Results

4.1. Cluster Codon Analysis

The cluster codon analysis showed that the preferred codons of all amino acids ended in C or G. For example, the preferred codons of alanine (Ala), glycine (Gly), and valine (Val), which each have four codons, and of tyrosine (Tyr), which has two, all had C or G as the terminal nucleotide. The analysis also showed that the genotypes were divided into two groups with 4% similarity: genotypes 1, 5, and 3 in one group, and genotypes 2, 6, and 4 in the other group. In the first group, genotypes 1 and 5 had the highest similarity of codon usage (74.02%), and in the second group, genotypes 2 and 6 had the highest similarity (72.43%). The greatest difference in codon usage was detected between genotype 1 from the first group and genotype 4 from the second group, with 4% similarity in terms of preferred codons (Figure 1).
Figure 1.

Similarity of Codon Usage Between HCV Genotypes
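To illustrate how two codon usage profiles can be compared, the sketch below scores pairs of genotypes on the Ala codon fractions reported in this article. The similarity measure behind the percentages in Figure 1 is not stated in the text, so Pearson correlation rescaled to 0-100 is an assumption made here purely for illustration. Four values per genotype cannot reproduce the published percentages, which were derived from the full codon set.

```python
from math import sqrt

# Ala codon fractions (GCG, GCA, GCT, GCC) copied from the codon table in this article.
ALA_FRACTIONS = {
    "HCV-G1": [0.23, 0.17, 0.20, 0.40],
    "HCV-G3": [0.19, 0.20, 0.30, 0.32],
    "HCV-G5": [0.21, 0.18, 0.21, 0.40],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def similarity_percent(x, y):
    """Hypothetical similarity score: Pearson r rescaled from [-1, 1] to [0, 100]."""
    return 50.0 * (1.0 + pearson(x, y))

sim_1_5 = similarity_percent(ALA_FRACTIONS["HCV-G1"], ALA_FRACTIONS["HCV-G5"])
sim_1_3 = similarity_percent(ALA_FRACTIONS["HCV-G1"], ALA_FRACTIONS["HCV-G3"])
print(sim_1_5 > sim_1_3)  # True: on this subset, G1 is closer to G5 than to G3
```

On this small subset, genotype 1 resembles genotype 5 more closely than genotype 3, which is consistent with the grouping described above. A full analysis would pass the pairwise scores for all codons to a hierarchical clustering routine.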

Phylogenetic analysis of the genotypes showed that the closest resemblance was between genotypes 1 and 4 (Figure 2). The proximity of genotypes 1 and 4 in the tree diagram reflects a similarity in their gene and protein sequences, yet the codon usage analysis showed that these two genotypes had minimal similarity and maximal distance. The phylogenetic analysis also indicated that genotypes 1 and 2 were separated by the greatest phylogenetic distance (Figure 2).
Figure 2.

Molecular Evolution and Phylogenetic Diagram of HCV Genotypes

4.2. Compositional Properties of the Genomes in HCV Genotypes

The compositional properties calculated with the CAIcal web server showed that the six HCV genotypes have similar contents of GC1s,2s,3s, GA1s,2s,3s, GT1s,2s,3s, AT1s,2s,3s, AC1s,2s,3s, and CT1s,2s,3s (Table 2). The frequency of GC1s,2s,3s was higher than that of the other nucleotide compositions, and the minimum frequency belonged to AT3s. These results showed that HCV is a GC-abundant virus.
Table 3.

The Frequency, Number, and Fraction of Each of the 61 Codons for Each Amino Acid in the Protein Structure of HCV Genotypes

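| Amino Acid | Codon | HCV-G1 Number | HCV-G1 Fraction | HCV-G2 Number | HCV-G2 Fraction | HCV-G3 Number | HCV-G3 Fraction | HCV-G4 Number | HCV-G4 Fraction | HCV-G5 Number | HCV-G5 Fraction | HCV-G6 Number | HCV-G6 Fraction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ala | GCG | 64 | 0.23 | 61 | 0.22 | 52 | 0.19 | 55 | 0.21 | 56 | 0.21 | 50 | 0.19 |
| | GCA | 46 | 0.17 | 42 | 0.15 | 54 | 0.20 | 49 | 0.19 | 50 | 0.18 | 55 | 0.21 |
| | GCT | 55 | 0.20 | 76 | 0.28 | 81 | 0.30 | 69 | 0.26 | 58 | 0.21 | 70 | 0.27 |
| | GCC | 112 | 0.40 | 97 | 0.35 | 87 | 0.32 | 90 | 0.34 | 109 | 0.40 | 89 | 0.34 |
| Cys | TGT | 32 | 0.31 | 21 | 0.24 | 30 | 0.31 | 25 | 0.29 | 37 | 0.37 | 41 | 0.41 |
| | TGC | 71 | 0.69 | 66 | 0.76 | 66 | 0.69 | 61 | 0.71 | 62 | 0.63 | 58 | 0.59 |
| Asp | GAT | 33 | 0.28 | 38 | 0.29 | 55 | 0.42 | 40 | 0.30 | 36 | 0.28 | 44 | 0.34 |
| | GAC | 86 | 0.72 | 91 | 0.71 | 77 | 0.58 | 95 | 0.70 | 93 | 0.72 | 87 | 0.66 |
| Glu | GAG | 84 | 0.72 | 87 | 0.77 | 76 | 0.66 | 79 | 0.70 | 62 | 0.58 | 84 | 0.76 |
| | GAA | 32 | 0.28 | 26 | 0.23 | 40 | 0.34 | 34 | 0.30 | 45 | 0.42 | 27 | 0.24 |
| Phe | TTT | 31 | 0.36 | 39 | 0.43 | 36 | 0.38 | 28 | 0.30 | 27 | 0.29 | 41 | 0.48 |
| | TTC | 56 | 0.64 | 52 | 0.57 | 59 | 0.62 | 66 | 0.70 | 65 | 0.71 | 45 | 0.52 |
| Gly | GGG | 74 | 0.29 | 87 | 0.33 | 24 | 0.30 | 61 | 0.25 | 92 | 0.36 | 65 | 0.26 |
| | GGA | 35 | 0.14 | 44 | 0.17 | 47 | 0.19 | 48 | 0.20 | 34 | 0.13 | 50 | 0.20 |
| | GGT | 42 | 0.16 | 26 | 0.10 | 52 | 0.21 | 44 | 0.18 | 51 | 0.20 | 51 | 0.21 |
| | GGC | 104 | 0.41 | 105 | 0.40 | 73 | 0.30 | 91 | 0.37 | 80 | 0.31 | 80 | 0.33 |
| His | CAT | 20 | 0.43 | 20 | 0.34 | 43 | 0.61 | 28 | 0.38 | 28 | 0.40 | 27 | 0.38 |
| | CAC | 38 | 0.57 | 39 | 0.66 | 27 | 0.39 | 46 | 0.62 | 42 | 0.60 | 45 | 0.62 |
| Ile | ATA | 33 | 0.25 | 30 | 0.22 | 40 | 0.32 | 31 | 0.23 | 33 | 0.25 | 40 | 0.29 |
| | ATT | 24 | 0.18 | 31 | 0.23 | 25 | 0.20 | 27 | 0.20 | 32 | 0.24 | 27 | 0.20 |
| | ATC | 74 | 0.56 | 75 | 0.55 | 61 | 0.48 | 76 | 0.57 | 69 | 0.51 | 71 | 0.51 |
| Lys | AAG | 63 | 0.68 | 60 | 0.59 | 61 | 0.66 | 69 | 0.68 | 84 | 0.72 | 48 | 0.58 |
| | AAA | 30 | 0.32 | 42 | 0.41 | 32 | 0.34 | 33 | 0.32 | 33 | 0.28 | 42 | 0.42 |
| Leu | TTG | 38 | 0.12 | 55 | 0.18 | 54 | 0.18 | 51 | 0.17 | 46 | 0.15 | 54 | 0.18 |
| | TTA | 9 | 0.03 | 23 | 0.08 | 21 | 0.07 | 22 | 0.07 | 24 | 0.08 | 15 | 0.05 |
| | CTG | 98 | 0.32 | 63 | 0.21 | 70 | 0.24 | 70 | 0.24 | 75 | 0.24 | 68 | 0.23 |
| | CTA | 21 | 0.07 | 33 | 0.11 | 34 | 0.11 | 28 | 0.09 | 32 | 0.10 | 37 | 0.12 |
| | CTT | 52 | 0.17 | 40 | 0.13 | 48 | 0.16 | 51 | 0.17 | 56 | 0.18 | 36 | 0.12 |
| | CTC | 87 | 0.29 | 88 | 0.28 | 69 | 0.23 | 75 | 0.25 | 74 | 0.24 | 88 | 0.30 |
| Met | ATG | 56 | 1.00 | 72 | 1.00 | 63 | 1.00 | 55 | 1.00 | 55 | 1.00 | 62 | 1.00 |
| Asn | AAT | 25 | 0.29 | 30 | 0.39 | 26 | 0.33 | 46 | 0.51 | 36 | 0.40 | 31 | 0.51 |
| | AAC | 61 | 0.71 | 46 | 0.61 | 53 | 0.67 | 44 | 0.49 | 53 | 0.60 | 47 | 0.49 |
| Pro | CCG | 34 | 0.16 | 33 | 0.16 | 30 | 0.14 | 42 | 0.21 | 48 | 0.22 | 33 | 0.16 |
| | CCA | 35 | 0.17 | 41 | 0.19 | 57 | 0.27 | 57 | 0.28 | 31 | 0.14 | 46 | 0.22 |
| | CCT | 56 | 0.27 | 46 | 0.22 | 60 | 0.29 | 45 | 0.22 | 47 | 0.22 | 62 | 0.30 |
| | CCC | 82 | 0.40 | 91 | 0.43 | 63 | 0.30 | 60 | 0.29 | 91 | 0.42 | 69 | 0.33 |
| Gln | CAG | 52 | 0.59 | 57 | 0.61 | 55 | 0.59 | 45 | 0.56 | 53 | 0.62 | 58 | 0.46 |
| | CAA | 36 | 0.41 | 36 | 0.39 | 39 | 0.41 | 35 | 0.44 | 33 | 0.38 | 32 | 0.36 |
| Arg | AGG | 53 | 0.30 | 47 | 0.27 | 33 | 0.18 | 34 | 0.20 | 43 | 0.25 | 43 | 0.25 |
| | AGA | 26 | 0.14 | 30 | 0.17 | 30 | 0.16 | 37 | 0.22 | 29 | 0.17 | 36 | 0.21 |
| | CGG | 34 | 0.19 | 33 | 0.19 | 31 | 0.17 | 28 | 0.17 | 36 | 0.21 | 27 | 0.16 |
| | CGA | 13 | 0.07 | 14 | 0.08 | 17 | 0.09 | 14 | 0.08 | 12 | 0.07 | 13 | 0.08 |
| | CGT | 15 | 0.08 | 16 | 0.09 | 26 | 0.14 | 13 | 0.08 | 17 | 0.10 | 20 | 0.12 |
| | CGC | 38 | 0.21 | 32 | 0.19 | 45 | 0.25 | 43 | 0.25 | 32 | 0.19 | 32 | 0.19 |
| Ser | AGT | 15 | 0.07 | 19 | 0.09 | 22 | 0.10 | 13 | 0.06 | 17 | 0.08 | 21 | 0.09 |
| | AGC | 49 | 0.23 | 36 | 0.16 | 45 | 0.20 | 43 | 0.20 | 41 | 0.20 | 45 | 0.20 |
| | TCG | 25 | 0.12 | 27 | 0.12 | 24 | 0.11 | 34 | 0.16 | 21 | 0.10 | 26 | 0.11 |
| | TCA | 27 | 0.13 | 30 | 0.13 | 33 | 0.14 | 41 | 0.19 | 28 | 0.14 | 49 | 0.22 |
| | TCT | 28 | 0.13 | 36 | 0.16 | 44 | 0.19 | 37 | 0.17 | 36 | 0.18 | 40 | 0.18 |
| | TCC | 70 | 0.33 | 75 | 0.34 | 60 | 0.26 | 48 | 0.22 | 58 | 0.29 | 46 | 0.20 |
| Thr | ACG | 50 | 0.23 | 35 | 0.15 | 34 | 0.16 | 34 | 0.15 | 49 | 0.23 | 43 | 0.19 |
| | ACA | 33 | 0.15 | 52 | 0.23 | 51 | 0.23 | 52 | 0.22 | 44 | 0.20 | 58 | 0.25 |
| | ACT | 44 | 0.20 | 52 | 0.23 | 62 | 0.28 | 50 | 0.22 | 44 | 0.20 | 45 | 0.20 |
| | ACC | 89 | 0.41 | 89 | 0.39 | 72 | 0.33 | 96 | 0.41 | 79 | 0.37 | 84 | 0.37 |
| Val | GTG | 98 | 0.41 | 90 | 0.39 | 87 | 0.38 | 98 | 0.39 | 83 | 0.36 | 91 | 0.38 |
| | GTA | 25 | 0.10 | 23 | 0.10 | 32 | 0.14 | 37 | 0.15 | 34 | 0.15 | 38 | 0.16 |
| | GTT | 35 | 0.15 | 30 | 0.13 | 35 | 0.15 | 39 | 0.16 | 43 | 0.19 | 40 | 0.17 |
| | GTC | 83 | 0.34 | 89 | 0.38 | 74 | 0.32 | 77 | 0.31 | 69 | 0.30 | 71 | 0.30 |
| Trp | TGG | 71 | 1.00 | 68 | 1.00 | 69 | 1.00 | 68 | 1.00 | 66 | 1.00 | 67 | 1.00 |
| Tyr | TAT | 29 | 0.30 | 38 | 0.37 | 39 | 0.37 | 38 | 0.38 | 35 | 0.35 | 41 | 0.41 |
| | TAC | 69 | 0.70 | 66 | 0.63 | 66 | 0.63 | 62 | 0.62 | 66 | 0.65 | 58 | 0.59 |
| Terminal Codon | TGA | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.00 | 0.00 | 1.00 | 1.00 | 0.00 | 0.00 |
| | TAG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 0.00 | 0.00 | 1.00 | 1.00 |
| | TAA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |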

4.3. Prevalence of Preferred (Used) Codons

Figure 3 shows the prevalence of the preferred (used) codons in the HCV genotypes, indicating which codon of each amino acid is used more often than its synonymous alternatives. The most preferred codons for the amino acids were as follows: Ala (GCC), Cys (TGC), Asp (GAC), Glu (GAG), Phe (TTC), Gly (GGC), His (CAC), Ile (ATC), Lys (AAG), Leu (CTC), Asn (AAC), Pro (CCC), Gln (CAG), Arg (AGG), Ser (TCC), Thr (ACC), Val (GTG), Tyr (TAC), and the stop codon (TGA or TAG). The least preferred codons were as follows: Ala (GCA), Cys (TGT), Asp (GAT), Glu (GAA), Phe (TTT), Gly (GGA), His (CAT), Ile (ATT), Lys (AAA), Leu (TTA), Asn (AAT), Pro (CCG), Gln (CAA), Arg (CGA), Ser (AGT), Thr (ACG), Val (GTA), Tyr (TAT), and the stop codon (TAA; not used). Met (ATG) and Trp (TGG) each have a single codon. The cluster codon analysis also showed that the least used terminal nucleotides among all amino acids, with the exception of Met, Trp, Thr, and Pro, were A and T.
Figure 3.

Frequency of Used Codons in HCV Genotypes
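The terminal-nucleotide pattern described in this section can be verified directly from the two codon lists given above. The sketch below tallies the third base of the most and least preferred codons. The codon lists are copied from the text, while the dictionary and function names are illustrative.

```python
from collections import Counter

# Most and least preferred codons per amino acid, as listed in Section 4.3
# (Met, Trp, and the stop codons are omitted).
MOST_PREFERRED = {
    "Ala": "GCC", "Cys": "TGC", "Asp": "GAC", "Glu": "GAG", "Phe": "TTC", "Gly": "GGC",
    "His": "CAC", "Ile": "ATC", "Lys": "AAG", "Leu": "CTC", "Asn": "AAC", "Pro": "CCC",
    "Gln": "CAG", "Arg": "AGG", "Ser": "TCC", "Thr": "ACC", "Val": "GTG", "Tyr": "TAC",
}
LEAST_PREFERRED = {
    "Ala": "GCA", "Cys": "TGT", "Asp": "GAT", "Glu": "GAA", "Phe": "TTT", "Gly": "GGA",
    "His": "CAT", "Ile": "ATT", "Lys": "AAA", "Leu": "TTA", "Asn": "AAT", "Pro": "CCG",
    "Gln": "CAA", "Arg": "CGA", "Ser": "AGT", "Thr": "ACG", "Val": "GTA", "Tyr": "TAT",
}

def third_base_tally(codons):
    """Count how often each nucleotide appears at the third codon position."""
    return Counter(codon[2] for codon in codons.values())

most = third_base_tally(MOST_PREFERRED)
least = third_base_tally(LEAST_PREFERRED)
# Amino acids whose least preferred codon still ends in G or C.
exceptions = [aa for aa, codon in LEAST_PREFERRED.items() if codon[2] in "GC"]

print(dict(most))    # {'C': 13, 'G': 5}
print(dict(least))   # {'A': 8, 'T': 8, 'G': 2}
print(exceptions)    # ['Pro', 'Thr']
```

All 18 preferred codons end in C or G, and the only least preferred codons that do not end in A or T belong to Pro and Thr, in line with the exceptions noted in the text.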

5. Discussion

HCV is a leading cause of chronic liver disease (1, 2), and infection can progress to chronic hepatitis and eventually hepatocellular carcinoma (HCC) (26). In addition to the clinical and epidemiological significance of HCV, genotyping has significant prognostic value and can be used to help determine the progress and treatment protocols of the disease (21). The amino acid sequences of proteins are specified by three-nucleotide codons. Living organisms use the standard genetic code, which comprises 61 codons for 20 amino acids, so that some amino acids have more than one codon. Selective pressure on translation leads organisms to prefer (use) some codons over others for effective protein expression (27). Changes in the patterns of codon usage can lead to changes in the response to treatment with nucleotide-like drugs. Genotypes that have the greatest differences in codon usage may differ significantly in their response to, and the required duration of, treatment with the same drug regimen. The reason can be attributed to the pattern of codon usage in the genotypes being compared. In this study, the greatest similarity in codon usage was observed between genotypes 1 and 5, and the greatest difference between genotypes 1 and 4; the results regarding the dosage and treatment protocol for genotypes 1 and 4 would therefore be expected to diverge. Despite the significant differences in codon usage between genotypes 1 and 4, these two genotypes were the closest phylogenetically, indicating more similarities in their genome and protein sequences. The greatest phylogenetic difference was observed between genotypes 1 and 2, which indicated that these two genotypes differ the most in terms of genome and protein sequences. The codon usage analysis showed that some codons, such as Gln (CAG, CAA), Ser (AGC), and Trp (TGG), had very similar frequencies in all of the HCV genotypes.
This result is important, as these residues may have a critical role in determining the final structure of the HCV proteins. However, this conclusion needs to be confirmed with more experimental evidence. As the results of this study showed, the most preferred terminal nucleotides in codon usage for all of the amino acids were C and G; consequently, the least preferred terminal nucleotides were T and A. This is an important finding because, as previously reported, an additional layer of hidden information lies within the codon sequence, beyond the amino acid sequence (28). Studies of such hidden information in codon sequences can reveal the molecular evolution of organisms and provide insights into the functional categories and histories of the genes in the respective genome. Codon usage analysis can also contribute to understanding the interaction between RNA viruses and the immune responses of their hosts (29). These findings indicated that the transfer RNAs (tRNAs) for all of the amino acids had C or G at the first anti-codon nucleotide and, consequently, that codon-anti-codon interaction during messenger RNA (mRNA) translation would be very strong. As a result, the average binding energy of codon-anti-codon interaction for HCV is higher than that for human cell genes, and the mRNA-tRNA interaction during translation is stronger than among similar human cell components (30). Depending on their nucleotide composition, different codons have different affinities for anti-codons, which leads to different translation efficiencies. Codons that contain C and G nucleotides bind anti-codons with more energy. The exact calculation of this energy can help us to better understand the mechanisms of successful HCV replication and pathogenicity.
In this study, we detected a layer of hidden information within the codon sequences of HCV genomes. We report these findings for the first time, and we believe that they are critical for planning new research projects and designing new drugs that influence codon-anti-codon interaction. The findings of such bioinformatic studies can be used for further practical research and clinical trials, and can help establish a better understanding of HCV replication and pathogenesis. A similar analysis of other viral agents of hepatitis could also provide new insights into viral behavior.
References (10 of 29 listed)

1. P Stothard. The sequence manipulation suite: JavaScript programs for analyzing and formatting protein and DNA sequences. Biotechniques. 2000-06.
2. M W Nirenberg; J H Matthaei; O W Jones; R G Martin; S H Barondes. Approximation of genetic code via cell-free protein synthesis directed by template RNA. Fed Proc. 1963 Jan-Feb.
3. Olof Allnér; Lennart Nilsson. Nucleotide modifications and tRNA anticodon-mRNA codon interactions on the ribosome. RNA. 2011-10-25.
4. A Alberti; L Chemello; L Benvegnù. Natural history of hepatitis C (review). J Hepatol. 1999.
5. D Max Parkin; Freddie Bray; J Ferlay; Paola Pisani. Global cancer statistics, 2002. CA Cancer J Clin. 2005 Mar-Apr.
6. N N Zein. Clinical significance of hepatitis C virus genotypes (review). Clin Microbiol Rev. 2000-04.
7. Donald B Smith; Jens Bukh; Carla Kuiken; A Scott Muerhoff; Charles M Rice; Jack T Stapleton; Peter Simmonds. Expanded classification of hepatitis C virus into 7 genotypes and 67 subtypes: updated criteria and genotype assignment web resource. Hepatology. 2014-01.
8. Mohammadreza Fattahi; Abdorrasoul Malekpour; Mojtaba Mortazavi; Alireza Safarpour; Nasrin Naseri. The characteristics of rare codon clusters in the genome and proteins of hepatitis C virus; a bioinformatics look. Middle East J Dig Dis. 2014-10.
9. Ilya S Belalov; Alexander N Lukashev. Causes and implications of codon usage bias in RNA viruses. PLoS One. 2013-02-25.
10. Pere Puigbò; Ignacio G Bravo; Santiago Garcia-Vallve. CAIcal: a combined set of tools to assess codon usage adaptation. Biol Direct. 2008-09-16.