
A comparative analysis of synonymous codon usage bias pattern in human albumin superfamily.

Hoda Mirsafian1, Adiratna Mat Ripen2, Aarti Singh1, Phaik Hwan Teo1, Amir Feisal Merican3, Saharuddin Bin Mohamad3.   

Abstract

Synonymous codon usage bias is an inevitable phenomenon in organismic taxa across the three domains of life. Though the frequency of codon usage is not equal across species or within the genome of the same species, the phenomenon is nonrandom and tissue-specific. Several factors such as GC content, nucleotide distribution, protein hydropathy, protein secondary structure, and translational selection are reported to contribute to codon usage preference. Synonymous codon usage patterns can help reveal the expression pattern of genes as well as the evolutionary relationship between sequences. In this study, synonymous codon usage bias patterns were determined for the evolutionarily close proteins of the albumin superfamily, namely, albumin, α-fetoprotein, afamin, and vitamin D-binding protein. Our study demonstrated that the genes of the four albumin superfamily members have low GC content and high values of the effective number of codons (ENC), suggesting high expressivity of these genes and less bias in codon usage preferences. This study also provided evidence that the albumin superfamily members are not subjected to mutational selection pressure.

Year:  2014        PMID: 24707212      PMCID: PMC3951064          DOI: 10.1155/2014/639682

Source DB:  PubMed          Journal:  ScientificWorldJournal        ISSN: 1537-744X


1. Introduction

Amino acids, the monomeric units of proteins, are encoded by triplets of nucleotides called codons. Most amino acids have alternative codons, which are known as synonymous codons. The frequencies with which these synonymous codons are used are unequal [1], with some codons used preferentially over others. Furthermore, Plotkin et al. [2] reported that codon usage is tissue-specific. The phenomenon of codon usage bias, which can be interpreted as an outcome of either mutational bias or translational selection, is an essential feature of most genomes across all three domains of life [3]. The patterns of codon usage within mammalian genomes are markedly different from those of other taxa. In mammals, codon usage bias is found to be influenced by the variation in isochores (GC content) or by variation in the tRNA pool of the cell [4, 5]. Differences in codon usage or variation in tRNA abundance can elicit varied responses to environmental changes, in terms of regulation of the translation mechanism and cell phenotype [6]. Urrutia and Hurst [7] reported that, in humans, codon usage bias is positively related to gene expression but inversely related to the rate of synonymous substitution. Several factors contribute to synonymous codon usage bias, such as gene expression level, protein hydropathy, protein secondary structure, and translational selection [8-11]. Information on the synonymous codon usage pattern can provide significant insights pertaining to the prediction, classification, and molecular evolution of genes and to the design of highly expressed genes and cloning vectors [12]. It may also aid the understanding of host-pathogen interactions, as synonymous codon usage can reveal host-pathogen coevolution and the adaptation of pathogens to specific hosts [13]. The evolutionarily close proteins of the albumin superfamily comprise albumin (ALB), α-fetoprotein (AFP), vitamin D-binding protein (VDBP), and afamin (AFM). 
In humans, the genes encoding these proteins are mapped to chromosome 4. These proteins are synthesized primarily and predominantly in the liver, but the expression pattern varies temporally. One common functional property amongst all the members of the albumin superfamily is their tendency to serve as transporters of various cellular components, metabolites, and so forth. ALB, an abundant serum protein with a MW of ~66 kDa, binds and transports a variety of ligands such as steroids, fatty acids, bilirubin, lysolecithin, prostaglandins, thyroid hormones, and drugs. In addition, ALB is known to be involved in various cellular functions including scavenging of oxygen free radicals, anticoagulation, and maintenance of the physiological pH and oncotic pressure of the plasma [14]. AFP (MW ~67 kDa), a serum glycoprotein which is expressed at high levels by the fetal liver and visceral yolk sac [15, 16], is critical for female fertility rather than for embryonic development [17]. VDBP or Gc globulin (MW ~58 kDa) is synthesized by various tissues, namely, liver, kidneys, gonads, and fat, and also by neutrophils [18]. Apart from binding and transporting vitamin D sterols, VDBP's physiological functions include scavenging of G-actin [19], macrophage activation [20], and enhancement of the chemotactic activity of C5a and C5a des-Arg molecules [21, 22]. AFM or α-albumin (MW ~87 kDa) is synthesized by liver and brain capillary endothelial cells. It mediates the transport of α-tocopherol across the blood-brain barrier [23]. The members of the albumin superfamily have been found to act as markers in various disease states in humans. AFP in maternal serum is indicative of Down's syndrome and neural tube defects in the fetus [24, 25]. AFP levels are elevated in patients at high risk for hepatocellular carcinoma. In some patients, an increase in AFP levels indicates gastric cancer with liver metastasis, and the condition is termed α-fetoprotein-producing gastric cancer (AFPGC) [26, 27]. 
VDBP may serve as a biomarker for vascular injury as predicted by proteomic identification [28]. AFM may act as a potential adjunct marker to cancer antigen 125 (CA125) for the diagnosis of ovarian cancer [29]. A vast array of research has been done on the members of albumin superfamily; however, so far, studies related to the usage of synonymous codon and the factors influencing the codon usage in this gene family have not been done. In this study, we applied bioinformatics approaches to elucidate the pattern of synonymous codon usage bias and its consequences on the expression level of genes in the albumin superfamily.

2. Materials and Methods

2.1. Sequences

The mRNA reference sequences of human serum albumin (ALB), afamin (AFM), α-fetoprotein (AFP), and vitamin D-binding protein (VDBP) in FASTA format were retrieved from GenBank of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/genbank/). Open Reading Frame (ORF) of the mRNA sequences of human albumin superfamily was obtained by using ExPASy Translate tool (http://web.expasy.org/translate/).
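The ORF extraction step can be illustrated with a minimal Python sketch. It is not the ExPASy Translate implementation; the function name `longest_orf` and the toy sequence are hypothetical, and the sketch assumes that only the three forward reading frames of the mRNA are scanned and that the stop codon is counted as part of the ORF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(mrna: str) -> str:
    """Return the longest ATG...stop ORF in the three forward frames.

    The returned string includes the stop codon (an assumption of this sketch).
    """
    seq = mrna.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # close the ORF and look for the next ATG
    return best


# Toy sequence (not a real transcript): the ORF sits in the third frame.
toy_mrna = "CCATGGCTTGGTAAGG"
print(longest_orf(toy_mrna))
```

Because an ORF is closed at the first in-frame stop codon, any ORF returned by this sketch has no intermediate stop codon by construction.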

2.2. Hydrophobicity Analysis

The grand average of hydrophobicity score (Gravy score) was calculated to quantify the average hydrophobicity of the translated gene products of the albumin superfamily. It was calculated as the arithmetic mean of the hydrophobic indices of all amino acids in the protein:

$$\mathrm{Gravy} = \frac{1}{N}\sum_{i=1}^{N} K_i$$

where N corresponds to the number of amino acids, while K_i represents the hydrophobic index of the ith amino acid. The Gravy score of a protein can be either negative or positive depending on the frequency of amino acids with distinct properties. A negative Gravy score implies that the protein is hydrophilic and is soluble in water. In contrast, a protein with a positive Gravy score is considered hydrophobic and is poorly soluble in water [30].
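The Gravy calculation described above can be sketched in a few lines of Python. The sketch assumes that the hydrophobic indices are the Kyte-Doolittle hydropathy values; the function name `gravy` and the example peptides are illustrative only and are not taken from the study.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Mean hydropathy index over all residues (sum of K_i divided by N)."""
    residues = protein.upper()
    if not residues:
        raise ValueError("empty protein sequence")
    unknown = set(residues) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


# Toy peptides: a charged one scores negative, an aliphatic one positive.
print(round(gravy("DEKR"), 3))
print(round(gravy("IVL"), 3))
```

A negative result corresponds to a hydrophilic sequence and a positive result to a hydrophobic one, which is the reading applied to the Gravy scores in Table 1.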

2.3. Codon Usage Analysis

The nucleotide distribution for the albumin superfamily was analyzed using the ExPASy ProtParam tool (http://web.expasy.org/protparam/). The numbers of the individual nucleotides (A, T, G, and C) were determined and summed to obtain the AT and GC content for each member of the albumin superfamily.
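The AT and GC content calculation can be reproduced with a short Python sketch. The function name `nucleotide_composition` is hypothetical, and the input below is a synthetic sequence built only from the ALB nucleotide counts reported in Table 2 (A 556, T 488, G 421, C 365), not the real ALB sequence.

```python
from collections import Counter


def nucleotide_composition(seq: str) -> dict:
    """Percentage of A, T, G, C plus the combined AT and GC content."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(base for base in seq if base in "ATGC")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/T/G/C bases")
    pct = {base: 100.0 * counts[base] / total for base in "ATGC"}
    pct["AT"] = pct["A"] + pct["T"]
    pct["GC"] = pct["G"] + pct["C"]
    return pct


# Synthetic sequence with the ALB counts from Table 2 (order is irrelevant here).
alb_like = "A" * 556 + "T" * 488 + "G" * 421 + "C" * 365
composition = nucleotide_composition(alb_like)
print(round(composition["AT"], 3), round(composition["GC"], 3))
```

With these counts the sketch yields the AT and GC percentages listed for ALB in Table 2.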

2.4. Rare Codon (RC) Analysis

A rare codon (RC) is a codon that is used at low frequency in the genome, such as a low-usage synonymous codon or a stop codon [31]. The RC analysis was performed using the GenScript web server (http://www.genscript.com/cgi-bin/tools/rare_codon_analysis/) to examine the number of highest-usage and lowest-usage codons in the human albumin superfamily.

2.5. Indices of Codon Usage Deviation

Indices of codon usage deviation were calculated using CodonW (J Peden, version 1.4.2 http://codonw.sourceforge.net/) [32] to measure the deviation between the observed and the expected codon usage. Two internal measures were applied: the GC variation and the third-nucleotide preference in codons [33, 34]. These were obtained by calculating the number of GC nucleotides and the number of G or C nucleotides at the third position of synonymous codons (GC3), excluding the start and termination codons. In addition, the expected effective number of codons (ENC) for each albumin superfamily protein was calculated. The expected ENC is the measure of codon usage affected only by GC3 as a consequence of mutation pressure and genetic drift. It was calculated according to [35]:

$$\mathrm{ENC} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}$$

where s corresponds to the GC3 value expressed as a fraction (0 to 1, that is, 0 to 100%).
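As an illustration of the two quantities above, the following Python sketch computes GC3 from an ORF and the expected ENC from GC3. It is not CodonW code. The function names are hypothetical, and excluding ATG, TGG, and the three stop codons from the GC3 count is an assumption about what "synonymous codon" covers here.

```python
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}  # assumed exclusions


def gc3_fraction(orf: str) -> float:
    """Fraction of counted codons whose third base is G or C."""
    seq = orf.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counted = [c for c in codons if c not in NON_SYNONYMOUS]
    if not counted:
        raise ValueError("no synonymous codons found")
    return sum(c[2] in "GC" for c in counted) / len(counted)


def expected_enc(s: float) -> float:
    """Expected ENC when codon usage is shaped only by GC3 (s as a fraction)."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be a fraction between 0 and 1")
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


# Toy ORF: only GCC, AAA, and TTT are counted, so one codon in three ends in G/C.
print(round(gc3_fraction("ATGGCCAAATTTTAA"), 3))
# Expected ENC for a GC3 of 38%, entered as the fraction 0.38.
print(round(expected_enc(0.38), 2))
```

The curve peaks at s = 0.5 and falls towards 31 at s = 0 or s = 1, so a GC3 given in percent has to be converted to a fraction before it is passed to `expected_enc`.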

2.6. Relative Synonymous Codon Usage (RSCU)

Relative synonymous codon usage (RSCU) was calculated in order to examine the frequency of each synonymous codon encoding the same amino acid without the confounding effect of amino acid composition. The index was calculated as follows [36]:

$$\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}}$$

where X_ij is the number of occurrences of the jth codon for the ith amino acid, which can be encoded by n_i synonymous codons.
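The RSCU definition can be checked with a small Python sketch. The function name `rscu` is hypothetical, and the sketch assumes the standard genetic code with Ser, Leu, and Arg each treated as a single six-codon family. The toy ORF repeats TTT 25 times and TTC 10 times, which are the Phe codon counts reported for ALB in Table 3.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(orf: str) -> dict:
    """RSCU per codon: observed count divided by the family mean count."""
    seq = orf.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if len(codons) < 2 or total == 0:
            continue  # skip Met, Trp, and amino acids absent from the ORF
        mean = total / len(codons)
        for codon in codons:
            values[codon] = counts[codon] / mean
    return values


toy_orf = "TTT" * 25 + "TTC" * 10
result = rscu(toy_orf)
print(round(result["TTT"], 2), round(result["TTC"], 2))
```

An RSCU above 1 marks a codon used more often than expected under equal usage of its synonyms, and a value below 1 marks one used less often.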

3. Results and Discussion

Genomic information on the mRNA sequences of the four members of the human albumin superfamily is shown in Table 1. The mRNA sequences of the albumin superfamily were translated into protein sequences using the ExPASy Translate tool. Only the ORF with no intermediate stop codon was selected for codon usage analysis. The similarity of the nucleotide and amino acid sequences of the albumin superfamily members is summarized in Figure 1. The results showed that ALB and AFP are more closely related to each other than to AFM and VDBP. AFP and VDBP have similar gene lengths of 2032 bp and 2024 bp, respectively. ALB has the longest gene (2264 bp), while AFM has the shortest (1997 bp). Moreover, human ALB and AFP have exactly the same ORF length (1830 bp), and the ORF of AFM (1800 bp) is of similar length. VDBP (1425 bp) has the shortest ORF within the albumin superfamily. The similar ORF lengths of ALB, AFM, and AFP indicate that they may carry out similar biological functions, which is of particular interest for AFM, since its function is not well known.

Table 1

Genomic information of the reference sequences, grand average of hydrophobicity score, ENC, GC content, and GC3 of human albumin superfamily members.

Parameter                                      ALB            AFM            AFP            VDBP
GenBank accession number                       NM_000477.5    NM_001133.2    NM_001134.1    NM_000583.3
Gene length (bp)                               2264           1997           2032           2024
Grand average of hydrophobicity (Gravy score)  −0.354         −0.248         −0.388         −0.336
GC content (%)                                 42.95          42.02          39.28          44.63
Effective number of codons (ENC)               53.91          51.65          54.78          56.62
GC3 (%)                                        38.00          37.10          37.30          42.80

ALB: albumin; AFM: afamin; AFP: alpha-fetoprotein; VDBP: vitamin D-binding protein.

Figure 1

Comparison of percent similarity and identity of nucleotide sequences and amino acid sequences of human albumin superfamily members.

The solubility of the proteins of the albumin superfamily was assessed through the Gravy score (Table 1). All the family members have negative Gravy scores, suggesting that these proteins are water soluble. This is in accordance with the biological role of these proteins as serum transporters. The nucleotide distribution of the albumin superfamily is shown in Table 2. The members of this superfamily exhibit low GC content (≤44.63%). ALB and AFP show a similar nucleotide distribution pattern, implying that they share similarity in their structures and biological functions. There is a close relationship between nucleotide composition and gene function [37]. AFM has the highest AT content, whereas VDBP has the lowest AT content. Although AFM and VDBP are grouped in the same superfamily, they show a different nucleotide composition, suggesting variation in their biological functions compared to the other members of the albumin superfamily.

Table 2

Nucleotide distribution of human albumin superfamily members.

Nucleotide  ALB (%)      AFP (%)      AFM (%)      VDBP (%)
A           30.4 (556)   32.6 (596)   32.8 (591)   29.9 (426)
T           26.7 (488)   25.4 (465)   27.9 (502)   25.5 (363)
G           23.0 (421)   21.7 (397)   20.1 (361)   21.4 (305)
C           19.9 (365)   20.3 (372)   19.2 (346)   23.2 (331)
AT          57.049       57.978       60.722       55.368
GC          42.951       42.022       39.278       44.632

The values in parentheses represent the number of individual nucleotides in the genes of human albumin superfamily members.

Rare codon analysis was carried out using the GenScript web server as described in Materials and Methods. A graph of the codon frequency distribution was plotted to identify the proportion of rare codons present in each albumin superfamily protein (Figure 2). A codon usage frequency of 100 indicates that the codon is the most highly used for a given amino acid. Conversely, a codon with a usage frequency of less than 30 is classed as a low-frequency codon, which is likely to affect expression efficiency. The percentages of low-frequency codons in ALB, AFM, AFP, and VDBP are 4%, 3%, 4%, and 4%, respectively. This result suggests that members of the albumin superfamily contain only a small number of rare codons that may reduce the translational efficiency of the genes.
Figure 2

Codon frequency distribution of human albumin superfamily members.

Indices of codon usage deviation are used to determine the differences between the observed and expected codon usage. The results for the effective number of codons (ENC), GC content, and G or C nucleotides at the third position of synonymous codons are summarized in Table 1. The ENC for each member of the human albumin superfamily was calculated in order to examine the pattern of synonymous codon usage independent of gene length. The ENC value ranges from 20 to 61, where a value of 20 indicates extreme bias toward the usage of one codon per amino acid, while a value of 61 represents equal usage of the synonymous codons [35, 38]. This analysis revealed that the ENC value of the albumin superfamily varies from 51.65 to 56.62, so the ENC of every member is greater than 50. The high ENC values suggest that the synonymous codons of the albumin superfamily are used almost equally and hence display less biased synonymous codon usage. The GC content of the albumin superfamily is given in Table 1. GC content may affect the thermostability, the bendability, and the ability of the DNA helix to undergo the transition from B to Z form. GC content can be related to the ability of a coding region to be in an open chromatin state, leading to active transcription [39]. All the members of the albumin superfamily have low GC content, indicating that these family members are highly expressed. Furthermore, it has been reported that highly transcribed genes may have low mutation rates because they are subjected to DNA repair [40]. However, within the albumin superfamily, VDBP has the highest GC content, indicating that it has the lowest expressivity level. GC content at the third position of codons (GC3) is a putative indicator of the extent of base composition bias. Table 1 shows that the albumin superfamily has low GC3 values ranging from 37.1% to 42.8%. 
The albumin superfamily has low GC3 values because the majority of genes in this superfamily are located in AT-rich regions. Genes in AT-rich regions of the genome tend to use A- or T-ending codons. The low usage of codons ending with G or C signifies less GC codon usage bias in the albumin superfamily. In other words, it indicates the homogeneity of the synonymous codon usage pattern in the albumin superfamily. The synonymous codon usage bias of each albumin superfamily protein was computed and is tabulated in Table 3, in which the most preferentially used codon for a given amino acid is highlighted. Asn of AFP and His, Cys, and Arg of VDBP show equal usage of the synonymous codons. The variation in relative synonymous codon usage (RSCU) values not only indicates the different frequency of occurrence of each codon for a given amino acid in the different albumin superfamily proteins but also reveals the preference for either A + U or G + C codon usage, as listed in Table 3. The results of the RSCU analysis (Table 3) are summarized in Table 4. Preferential codon usage in the albumin superfamily indicates that codons with A or U at the third position are preferred over G- or C-ending codons. Table 4 also shows that the total score of A + U and G + C codon usage in the proteins of the albumin superfamily is not always equal to 20. This is because some amino acid residues are encoded at equal frequencies by both A- or U-ending and G- or C-ending codons and hence are excluded from the analysis. The tendency of the albumin superfamily to use high A + U and low G + C indicates that mutational bias does not play a significant role in synonymous codon usage.

Table 3

Relative synonymous codon usage in human albumin superfamily members. Values marked with an asterisk (bold in the original table) indicate the codons used with high frequency. Each cell gives the RSCU value followed by the number of occurrences of the codon in parentheses.

Amino acid  Codon  ALB          AFP          AFM          VDBP
Phe         UUU    1.43* (25)   1.06* (17)   1.30* (28)   1.16* (11)
            UUC    0.57 (10)    0.94 (15)    0.70 (15)    0.84 (8)

Leu         UUA    0.94 (10)    1.00 (10)    1.20* (11)   0.53 (5)
            UUG    1.13 (12)    1.10 (11)    0.87 (8)     0.63 (6)
            CUU    1.78* (19)   0.90 (9)     1.20* (11)   1.16 (11)
            CUC    0.66 (7)     0.40 (4)     0.98 (9)     0.95 (9)
            CUA    0.38 (4)     0.90 (9)     0.65 (6)     1.05 (10)
            CUG    1.13 (12)    1.70* (17)   1.09 (10)    1.68* (16)

Ile         AUU    1.33* (4)    1.32* (15)   1.07 (10)    1.13* (3)
            AUC    1.33* (4)    0.71 (8)     0.75 (7)     1.13* (3)
            AUA    0.33 (1)     0.97 (11)    1.18* (11)   0.75 (2)

Val         GUU    1.12 (12)    1.47* (11)   1.44* (13)   0.89 (6)
            GUC    0.65 (7)     0.80 (6)     0.67 (6)     1.19* (8)
            GUA    0.74 (8)     0.80 (6)     0.67 (6)     1.04 (7)
            GUG    1.49* (16)   0.93 (7)     1.22 (11)    0.89 (6)

Ser         UCU    0.64 (3)     1.26 (8)     2.06* (12)   1.43 (10)
            UCC    1.50* (7)    0.47 (3)     0.86 (5)     1.29 (9)
            UCA    1.29 (6)     1.58* (10)   1.03 (6)     1.71* (12)
            UCG    0.64 (3)     0.47 (3)     0.00 (0)     0.00 (0)

Pro         CCU    1.67* (10)   1.71* (9)    1.71* (12)   1.38 (9)
            CCC    1.00 (6)     0.76 (4)     0.71 (5)     0.92 (6)
            CCA    1.17 (7)     1.52 (8)     1.57 (11)    1.54* (10)
            CCG    0.17 (1)     0.00 (0)     0.00 (0)     0.15 (1)

Thr         ACU    0.97 (7)     1.78* (16)   1.18 (10)    1.25 (10)
            ACC    1.24 (9)     0.67 (6)     0.82 (7)     1.38* (11)
            ACA    1.52* (11)   1.33 (12)    1.53* (13)   1.13 (9)
            ACG    0.28 (2)     0.22 (2)     0.47 (4)     0.25 (2)

Ala         GCU    1.90* (30)   1.20 (15)    1.57* (11)   1.82* (15)
            GCC    0.89 (14)    0.88 (11)    0.71 (5)     1.09 (9)
            GCA    1.08 (17)    1.68* (21)   1.29 (9)     0.85 (7)
            GCG    0.13 (2)     0.24 (3)     0.43 (3)     0.24 (2)

Tyr         UAU    1.37* (13)   1.06* (9)    1.06* (9)    1.13* (9)
            UAC    0.63 (6)     0.94 (8)     0.94 (8)     0.88 (7)

His         CAU    1.38* (11)   1.63* (13)   1.23* (8)    1.00* (4)
            CAC    0.63 (5)     0.38 (3)     0.77 (5)     1.00* (4)

Gln         CAA    1.10* (11)   1.15* (23)   1.26* (17)   1.33* (8)
            CAG    0.90 (9)     0.85 (17)    0.74 (10)    0.67 (4)

Asn         AAU    1.29* (11)   1.00* (10)   1.03* (17)   1.33* (12)
            AAC    0.71 (6)     1.00* (10)   0.97 (16)    0.67 (6)

Lys         AAA    1.33* (40)   1.29* (33)   1.33* (28)   0.93 (20)
            AAG    0.67 (20)    0.71 (18)    0.67 (14)    1.07* (23)

Asp         GAU    1.39* (25)   1.27* (21)   1.30* (15)   1.23* (16)
            GAC    0.61 (11)    0.73 (12)    0.70 (8)     0.77 (10)

Glu         GAA    1.23* (38)   1.24* (34)   1.36* (40)   1.26* (27)
            GAG    0.77 (24)    0.76 (21)    0.64 (19)    0.74 (16)

Cys         UGU    0.86 (15)    1.06* (18)   0.81 (13)    1.00* (14)
            UGC    1.14* (20)   0.94 (16)    1.19* (19)   1.00* (14)

Arg         CGU    0.67* (3)    0.50* (2)    0.55* (2)    0.00 (0)
            CGC    0.22 (1)     0.25 (1)     0.27 (1)     0.00 (0)
            CGA    0.67* (3)    0.50* (2)    0.55* (2)    0.92* (2)
            CGG    0.44 (2)     0.25 (1)     0.00 (0)     0.46 (1)

Ser         AGU    1.29* (6)    1.54* (9)    0.79 (5)     0.86* (6)
            AGC    0.64 (3)     0.51 (3)     1.42* (9)    0.71 (5)

Arg         AGA    2.89* (13)   2.75* (11)   3.27* (12)   2.31* (5)
            AGG    1.11 (5)     1.75 (7)     1.36 (5)     2.31* (5)

Gly         GGU    0.92 (3)     0.75 (3)     0.62 (4)     0.29 (1)
            GGC    0.92 (3)     0.75 (3)     0.77 (5)     1.43* (5)
            GGA    1.85* (6)    1.50* (6)    2.00* (13)   1.43* (5)
            GGG    0.31 (1)     1.50* (4)    0.62 (4)     0.86 (3)

ALB: albumin; AFP: alpha-fetoprotein; AFM: afamin; VDBP: vitamin D-binding protein (DBP).

Table 4

A + U and G + C preferential codon usage of human albumin superfamily members.

Protein  A + U  G + C
ALB      17     3
AFP      17     1
AFM      18     2
VDBP     11     4
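
The tally in Table 4 can be approximated with a short Python sketch that classifies the preferred codon of each amino acid by its third base. The exact counting rules used for Table 4 are not stated, so this is only one plausible reading: the function name `count_preferred_endings` is hypothetical, and a family is skipped when its top RSCU value is shared by an A/U-ending and a G/C-ending codon. The input is a three-family excerpt of the ALB values from Table 3.

```python
def count_preferred_endings(rscu_by_family: dict) -> tuple:
    """Count families whose preferred codon ends in A/U versus G/C."""
    au = gc = 0
    for codons in rscu_by_family.values():
        top = max(codons.values())
        endings = {
            "AU" if codon[-1] in "AUT" else "GC"
            for codon, value in codons.items()
            if value == top
        }
        if endings == {"AU"}:
            au += 1
        elif endings == {"GC"}:
            gc += 1
        # Mixed ties (A/U and G/C codons equally preferred) are excluded.
    return au, gc


# Excerpt of the ALB column of Table 3 (three amino acids only).
alb_excerpt = {
    "Phe": {"UUU": 1.43, "UUC": 0.57},
    "Val": {"GUU": 1.12, "GUC": 0.65, "GUA": 0.74, "GUG": 1.49},
    "Ile": {"AUU": 1.33, "AUC": 1.33, "AUA": 0.33},
}
print(count_preferred_endings(alb_excerpt))
```

In this excerpt Phe counts towards A + U, Val towards G + C, and Ile is left out because AUU and AUC are equally preferred, which mirrors why the totals in Table 4 can fall short of 20.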

4. Conclusions

The members of the albumin superfamily, namely, ALB, AFP, AFM, and VDBP, exhibit sequence and structural similarities. The proteins possess three homologous folding domains as a result of the conserved pattern of cysteine residues in the members of the albumin superfamily [41, 42]. Our study on codon usage bias in the members of the albumin gene family revealed that they are also similar in terms of their low GC content, low GC3, and high ENC values. In addition, they show little bias in the usage of synonymous codons and are highly expressible genes. Furthermore, the low GC and GC3 values revealed that mutational bias and translational selection do not play a significant role in shaping the codon usage pattern in the albumin superfamily.

References (10 of 40 listed)

1.  Translational selection shapes codon usage in the GC-rich genome of Chlamydomonas reinhardtii.

Authors:  H Naya; H Romero; N Carels; A Zavala; H Musto
Journal:  FEBS Lett       Date:  2001-07-20

2.  The base composition of the genes is correlated with the secondary structures of the encoded proteins.

Authors:  Giuseppe D'Onofrio; Tapash Chandra Ghosh; Giorgio Bernardi
Journal:  Gene       Date:  2002-10-30

3.  Evolution of synonymous codon usage in metazoans.

Authors:  Laurent Duret
Journal:  Curr Opin Genet Dev       Date:  2002-12

4.  Alphafetoprotein and neural tube defects.

Authors:  J H Brock
Journal:  J Clin Pathol Suppl (R Coll Pathol)       Date:  1976

5.  A simple method for displaying the hydropathic character of a protein.

Authors:  J Kyte; R F Doolittle
Journal:  J Mol Biol       Date:  1982-05-05

6.  Proteomic identification of biomarkers of vascular injury.

Authors:  Ngan F Huang; Kyle Kurpinski; Qizhi Fang; Randall J Lee; Song Li
Journal:  Am J Transl Res       Date:  2010-11-21

7.  Alpha-fetoprotein, the major fetal serum protein, is not essential for embryonic development but is required for female fertility.

Authors:  Philippe Gabant; Lesley Forrester; Jennifer Nichols; Thierry Van Reeth; Christelle De Mees; Bernard Pajack; Alistair Watt; Johan Smitz; Henri Alexandre; Claude Szpirer; Josiane Szpirer
Journal:  Proc Natl Acad Sci U S A       Date:  2002-09-24

8.  Tissue-specific codon usage and the expression of human genes.

Authors:  Joshua B Plotkin; Harlan Robins; Arnold J Levine
Journal:  Proc Natl Acad Sci U S A       Date:  2004-08-16

9.  Tissue specificity of alpha-fetoprotein messenger RNA expression during mouse embryogenesis.

Authors:  M A Dziadek; G K Andrews
Journal:  EMBO J       Date:  1983

10.  Characteristic analysis of α-fetoprotein-producing gastric carcinoma in China.

Authors:  Xiao-Dong Li; Chang-Ping Wu; Mei Ji; Jun Wu; Binfeng Lu; Hong-Bing Shi; Jing-Ting Jiang
Journal:  World J Surg Oncol       Date:  2013-10-01
