BACKGROUND: Mitochondrial ND gene, which encodes NADH dehydrogenase, is the first enzyme of the mitochondrial electron transport chain. Leigh syndrome, a neurodegenerative disease caused by mutation in the ND2 gene (T4681C), is associated with bilateral symmetric lesions in basal ganglia and subcortical brain regions. Therefore, it is of interest to analyze mitochondrial DNA to glean information for evolutionary relationship. This study highlights on the analysis of compositional dynamics and selection pressure in shaping the codon usage patterns in the coding sequence of MT-ND2 gene across pisces, aves and mammals by using bioinformatics tools like effective number of codons (ENC), codon adaptation index (CAI), relative synonymous codon usage (RSCU) etc. RESULTS: We observed a low codon usage bias as reflected by high ENC values in MT-ND2 gene among pisces, aves and mammals. The most frequently used codons were ending with A/C at the 3(rd) position of codon and the gene was AT rich in all the three classes. The codons TCA, CTA, CGA and TGA were over represented in all three classes. The F1 correspondence showed significant positive correlation with G, T3 and CAI while the F2 axis showed significant negative correlation with A and T but significant positive correlation with G, C, G3, C3, ENC, GC, GC1, GC2 and GC3. CONCLUSIONS: The codon usage bias in MTND2 gene is not associated with expression level. Mutation pressure and natural selection affect the codon usage pattern in MT-ND 2 gene.
BACKGROUND: Mitochondrial ND gene, which encodes NADH dehydrogenase, is the first enzyme of the mitochondrial electron transport chain. Leigh syndrome, a neurodegenerative disease caused by mutation in the ND2 gene (T4681C), is associated with bilateral symmetric lesions in basal ganglia and subcortical brain regions. Therefore, it is of interest to analyze mitochondrial DNA to glean information for evolutionary relationship. This study highlights on the analysis of compositional dynamics and selection pressure in shaping the codon usage patterns in the coding sequence of MT-ND2 gene across pisces, aves and mammals by using bioinformatics tools like effective number of codons (ENC), codon adaptation index (CAI), relative synonymous codon usage (RSCU) etc. RESULTS: We observed a low codon usage bias as reflected by high ENC values in MT-ND2 gene among pisces, aves and mammals. The most frequently used codons were ending with A/C at the 3(rd) position of codon and the gene was AT rich in all the three classes. The codons TCA, CTA, CGA and TGA were over represented in all three classes. The F1 correspondence showed significant positive correlation with G, T3 and CAI while the F2 axis showed significant negative correlation with A and T but significant positive correlation with G, C, G3, C3, ENC, GC, GC1, GC2 and GC3. CONCLUSIONS: The codon usage bias in MTND2 gene is not associated with expression level. Mutation pressure and natural selection affect the codon usage pattern in MT-ND 2 gene.
The mitochondrial genome is ideal as the molecular marker for
species identification as well as systematic phylogenetic studies
due to its small size. It is easily amplified and mostly conserved
in gene content and characterized by lack of recombination,
maternal inheritance and high evolutionary rate [1]. The
respiratory chains of mitochondrial genome comprise of four
complexes (complex, I-IV) and are encoded by 37 genes
consisting of two ribosomal RNA (rRNA), twenty-two transfer
RNA (tRNA) and thirteen protein coding genes. The complex-I
of mitochondrial respiratory chain includes the first enzyme
NADH dehydrogenase and its seven subunits (ND1-6 & ND4L)
play a pivotal role in diverse pathological processes [2]. The
subunit 2 of NADH dehydrogenase is encoded by ND2 gene
and its function is not yet fully understood. However, literature
suggests that a mutation in the ND2 (T4681C) gene was found
in patients with Leigh syndrome, a neurodegenerative disease
characterized by bilateral symmetric lesions in basal ganglia
and subcortical brain regions [3].Urrutia and Hurst (2003) reported that the codon usage in
human is positively related to gene expression but is inversely
related to the rate of synonymous substitution [4]. Several
genomic factors such as gene expression level, protein
secondary structure, and translational preferences balancing
between the mutational pressure and natural selection
contribute to the synonymous codon usage variation in
different organisms [5,
6]. Therefore, gaining the information on
the synonymous codon usage pattern provides significant
insights pertaining to the prediction, classification, and
evolution of a gene at molecular level and also helps in
designing highly expressed genes. In the present study we have
carried out a comparative analysis of the ND2 gene codon
usage and codon context patterns among the mitochondrial
genomes of three chordate classes (pisces, aves and mammals)
in order to understand the molecular mechanism along with
functional conservation of gene expression during the period of
evolution using several bioinformatics tools.
Methodology
Retrieval of Sequence data:
The coding sequences (cds) of MT-ND2 gene from five species
of pisces, aves and mammals each were retrieved from
National Center for Biotechnology Information, USA (
http://www.ncbi.nlm.nih.gov/) using the following accession
numbers. The accession numbers of different species are
AP006806, AP006813, AP006778, AP006825, AP006858, X52392,
AF090337, AF090341, AF090338, AF090340, U96639, AJ001562,
X14848, Y11832 and AJ001588. A perl programme was used to
analyze the compositional features and codon usage bias
parameters.
Compositional properties:
The overall composition of A, T, G, C bases and its composition
at 3rd position along with GC, GC1, GC2 and GC3 contents
were calculated using the perl script.
Codon adaptation Index (CAI):
Codon adaptation index (CAI) is used to estimate gene
expression level. The CAI is calculated asWhere, ωκ is the relative adaptiveness of the κth codon and L is
the number of synonymous codons in the gene [7].
Effective Number of Codons (ENC):
The effective number of codons (ENC) is the most extensively
used parameter to measure the usage bias of the synonymous
codons [8]. The ENC value ranges from 20 (when only one
codon is used for each amino acid) to 61 (when all codons are
used randomly). It is calculated as:Where Fκ(κ= 2,3,4,6) is the mean of Fκ values for the κ-fold
degenerate amino acids.
Relative Synonymous Codon Usage (RSCU):
Relative synonymous codon usage was calculated as the ratio
of the observed frequency of a codon to its expected frequency
if all the synonymous codons of a particular amino acid are
used equally [9]. The RSCU value is calculated using the
formulawhere, Xij is the frequency of occurrence of the jth codon for ith
amino acid (any Xij with a value of zero is arbitrarily assigned a
value of 0.5) and ni is the number of codons for the ith amino
acid (ith codon family).
GRAVY:
GRAVY (Grand Average of Hydropathicity) values are the sum
of the hydropathy values of all the amino acids in the encoded
protein of the gene divided by the number of residues in the
sequence [10].
Aromaticity (Aromo):
Aromo refers to the frequency of aromatic amino acids (Phe,
Tyr, Trp) in the translated gene product [11].
Correspondence analysis (COA):
Correspondence analysis is a multivariate statistical method
used to study the major trends in synonymous codon usage
variation in coding sequences and distributes the codons in
axis1 and axis2 with these trends [12].
Software used:
Novel software developed by SC (corresponding author) using
Perl script was used to calculate all the codon usage bias
parameters and nucleotide composition. The genetic code of
vertebrate mitochondria having 60 sense codons available in
NCBI database was used for the present analysis. The RSCU
values of each codon from different species were clustered by
hierarchal clustering method using XLSTAT.
Statistical analysis:
Correlation analysis was used to identify the relationship
between overall nucleotide composition and each base at 3rd
codon position. All the statistical analyses were done using the
SPSS software.
Results & Discussion
The overall nucleotide compositions in the coding sequence of
MT-ND2 gene among pisces, aves and mammals were analyzed
Table 1 (see supplementary material). Our results showed that
the nucleobase C was the highest (%) in pisces and aves but the
nucleobase A was the highest in mammals whereas G was the
lowest in pisces, aves and mammals. For the 3rd position of
codon, A3 was the highest in pisces, aves and mammals but G3
the lowest. This clearly indicates that compositional constraint
might influence the codon usage pattern of MT-ND2 gene
[13].The effective number of codon (ENC) values for MT-ND2 gene
among pisces, aves and mammals were estimated Table 2 (see
supplementary material). The effective number of codon
(ENC) is a non directional measure of codon usage bias. Its
value ranges from 20-61. ENC value 20 indicates that for each
amino acid only one codon is used (extreme bias) and 61 means
all codons equally encode the same amino acids (no bias)
[8].
We observed that the ENC value in MT-ND2 gene was
(Mean±SD) 57±2.91, 59±0.44 and 55±1.58 among pisces, aves
and mammals, reflecting a weak codon bias which was similar
to the findings of Jia X et al, 2015 in B.mori
[14]. It was also
found that the overall GC % was less than 50% and the gene
was AT rich. This phenomenon was also reported in AT rich
species such as Plasmodium falciparum [15].We calculated the codon adaptation index (CAI) values for MTND2
gene in order to find out the expression level among
pisces, aves and mammals (Figure 1). In our analysis, the CAI
values were (Mean±SD) 0.7851±0.05, 0.7667±0.05, 0.7635±0.02
in pisces, aves and mammals, respectively. We used unpaired t
test between pisces and aves as well as between pisces and
mammals but the difference was not statistically significant.
Wei et.al 2014, also reported the average value of CAI in
mitochondrial protein coding genes ranged from 0.5-0.7 in
B.mori [16]. In addition, we performed a correlation analysis
between ENC and gene expression level as measured by CAI
and found no significant relationship suggesting that the codon
usage bias in MT-ND2 gene is not associated with expression
level among the three classes.
Figure 1
Distribution of CAI in MT-ND2 gene among different
species
Moreover, we calculated the relative synonymous codon usage
(RSCU) values in the coding sequences of MT-ND2 gene among
pisces, aves and mammals Table 3 (see supplementary
material). In our analysis, the RSCU value > 1 means the codon
is more frequently used, RSCU value <1 as less frequently used
codon. But RSCU value >1.6 means over represented codon
whereas the RSCU value <0.06 as under represented codon.
The RSCU values of 60 codons also indicated that the codon
usage bias of MT-ND2 gene is low. Approximately half of the
codons were frequently used i.e. 30 codons in pisces, 28 in aves
and 21 in mammals. In all three classes the most frequent
codons ended with A or C at the 3rd codon position (A/T
ended: G/C ended = 16:14 in pisces, 13: 15 in aves and 14: 7 in
mammals). The RSCU values were analyzed using heat map
(Figure 2) which represented the more frequently, less
frequently, over-represented and under-represented codons.
The results showed that the most preferred codons were TTC,
CTA and TAC in all the three classes. In addition, the four
codons namely TCA, CTA, CGA and TGA encoding the amino
acid serine, leucine, arginine and tryptophan respectively were
over represented in MT-ND2 gene among pisces, aves and
mammals. This result suggests that compositional features
played an important role in codon usage in MT-ND2 gene
[13].
Figure 2
Hierarchal clustering of RSCU value in different
species of MT-ND2 gene
The overall percentage of GC contents at different codon
positions were calculated (see supplementary material Table
S2). In order to find out the role of mutation pressure and
natural selection, we constructed a neutrality plot of GC12
against GC3 (Figure 3, a-c)
[17]. The linear regression
coefficient of GC12 on GC3 indicated that natural selection
plays a major role while mutation pressure plays a minor role
in shaping the codon usage patterns in MT-ND2 gene. Our
result was similar to the findings of Wei et.al (2014) in the
mitochondrial DNA codon usage analysis of B.mori
[16].
Figure 3
Neutrality plot of GC12 versus GC3 in (a) Pisces (b)
Aves (c) mammals. GC12: average of GC1 and GC2.
We performed correspondence analysis (CoA) based on RSCU
values to analyze the codon usage variation in MT-ND2 gene
among pisces, aves and mammals. In our analysis, the 1st axis
(F1) accounted for 34.50% of the total variation and the 2nd axis
accounted for 12.51% of the total variation (Figure 4). Further,
correlation analysis was done to determine the
interrelationships between the first two principle axes (F1 and
F2), nucleotide constraints and indices of natural selection
(CAI, Gravy, Aromo) on MT-ND2 gene. The F1 axis showed
significant positive correlation with G, T3 and CAI whereas the
F2 showed significant positive correlation with G, C, G3, C3,
ENC, GC and GC1-3 but significant negative correlation with A
and T Table 4 (see supplementary material). These results
suggest both compositional constraint under mutation pressure
and natural selection affect the codon usage pattern in MTND2.
The results were similar to the findings of Butt et.al.
[18].
Figure 4
Correspondence analysis of the synonymous codon
usage in MT-ND2 gene. The analysis was based on the RSCU
value of the 60 synonymous codons.
Conclusion
The codon usage bias in MT-ND2 gene is weak with high
expression level. It is found that natural selection and mutation
pressure affect the codon usage pattern in MT-ND 2 gene.
Authors: Cristina Ugalde; Reetta Hinttala; Sharita Timal; Roel Smeets; Richard J T Rodenburg; Johanna Uusimaa; Lambert P van Heuvel; Leo G J Nijtmans; Kari Majamaa; Jan A M Smeitink Journal: Mol Genet Metab Date: 2006-09-22 Impact factor: 4.797