
Study of completed archaeal genomes and proteomes: hypothesis of strong mutational AT pressure existed in their common predecessor.

Vladislav V Khrustalev, Eugene V Barkovsky.

Abstract

The number of completely sequenced archaeal genomes is now sufficient for a large-scale bioinformatic study. We analyzed each coding region from 36 archaeal genomes using the original CGS algorithm, calculating the total GC content (G+C), the GC content in first, second and third codon positions as well as in fourfold and twofold degenerate sites from third codon positions, the levels of arginine codon usage (Arg2: AGA/G; Arg4: CGX), the levels of amino acid usage and the entropy of amino acid content distribution. In archaeal genomes with strong GC pressure, arginine is coded preferentially by GC-rich Arg4 codons, whereas in most archaeal genomes with G+C<0.6, arginine is coded preferentially by AT-rich Arg2 codons. In the genome of Haloquadratum walsbyi, which is closely related to GC-rich archaea, GC content has decreased mostly in third codon positions, while the Arg4>>Arg2 bias still persists. Proteomes of archaeal species carry characteristic amino acid biases: levels of isoleucine and lysine are elevated, while levels of alanine, histidine, glutamine and cysteine are relatively decreased. The numerous genomic and proteomic biases observed can be explained by the hypothesis that strong mutational AT pressure previously existed in the common predecessor of all archaea. 2010 Beijing Genomics Institute. Published by Elsevier Ltd. All rights reserved.

Year:  2010        PMID: 20451159      PMCID: PMC5054120          DOI: 10.1016/S1672-0229(10)60003-4

Source DB:  PubMed          Journal:  Genomics Proteomics Bioinformatics        ISSN: 1672-0229            Impact factor:   7.691


Introduction

In this work we analyzed the G+C composition of genomes and the amino acid content of proteomes for all archaeal species whose genomes had been completely sequenced and submitted to the Codon Usage Database (www.kazusa.or.jp/codon). We applied the theory of directed mutational pressure [2, 3] to these genomes. Mutational pressure in a double-stranded DNA genome is the situation in which AT to GC substitution rates are not equal to GC to AT substitution rates. Most substitutions in third codon positions are synonymous, so single nucleotide mutations that occur due to mutational pressure may be fixed in these positions by random genetic drift without any selective constraints. When the GC content in third codon positions (3GC) is higher than 0.5 for most coding regions of a prokaryotic genome, one can suspect that the genome is under GC pressure. If 3GC is lower than 0.5 for most coding regions, the genome should be under AT pressure. The strongest evidence for GC pressure is a 3GC that is higher than both 1GC and 2GC (the GC content in first and second codon positions) for most genes. If 3GC is lower than 2GC and 1GC for most coding regions, the genome is under AT pressure. Mutational pressure is caused by an imbalance between mutational processes and repair [3, 5]. The most common and best-studied mutational processes that contribute to mutational pressure are: (i) deamination of cytosine, leading to C to U transitions [3, 5]; (ii) deamination of methylcytosine, leading to 5-methyl-C to T transitions [5, 6]; (iii) oxidation of guanine, leading to G to T transversions [3, 5]; (iv) deamination of adenine, leading to A to G transitions [5, 7]; (v) oxidation of thymine, leading to T to C transitions; (vi) incorporation of 8-oxo-G into the growing DNA strand opposite adenine, followed by replacement of A with C and excision of 8-oxo-G, leading to A to C transversions [3, 5].

Mono- and polyfunctional enzymes involved in the repair of the above-mentioned lesions are found in species from all three superkingdoms of life [5, 6, 7]. When third codon positions become saturated (due to GC pressure) or desaturated (due to AT pressure) with G and C, the probability of substitutions occurring in first and second codon positions increases. This situation has been called strong mutational pressure, and it leads to the simplification of the amino acid content of the proteome. A study of amino acid biases in halobacterial proteomes found an elevated level of aspartic acid, higher than that predicted from the dinucleotide composition of genomic DNA. The authors speculated that the increased level of Ala and the decreased level of Lys in halobacterial proteomes are “dragging effects” caused by the compositional shift of halobacterial DNA, which would have changed principally to increase the fraction of aspartic acid alone. Another study, on haloarchaeal secretomes, showed that the frequencies of Lys, Ile and Leu are much lower, and the frequencies of Arg, Thr, Val and Gly higher, than those in bacterial signal peptides. In our work we found that amino acid biases in archaeal proteomes, as well as the genomic biases characteristic of most archaeal species, can be explained by a single hypothesis: strong mutational AT pressure existed in the common predecessor of all archaea.
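The positional GC measures used above (G+C, 1GC, 2GC, 3GC) and the majority rule for the direction of mutational pressure can be illustrated with a minimal Python sketch. The function names `positional_gc` and `pressure_direction` are hypothetical, the input is assumed to be an in-frame coding sequence, and this is not the CGS implementation.

```python
def positional_gc(cds):
    """Return G+C, 1GC, 2GC and 3GC for an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    seq = seq[: len(seq) - len(seq) % 3]  # ignore a trailing partial codon
    if not seq:
        raise ValueError("sequence has no complete codon")

    def gc(bases):
        return sum(b in "GC" for b in bases) / len(bases)

    return {
        "G+C": gc(seq),
        "1GC": gc(seq[0::3]),  # first codon positions
        "2GC": gc(seq[1::3]),  # second codon positions
        "3GC": gc(seq[2::3]),  # third codon positions
    }


def pressure_direction(gc3_values):
    """Majority rule from the text: 3GC > 0.5 in most coding regions
    suggests GC pressure, 3GC < 0.5 in most suggests AT pressure."""
    half = len(gc3_values) / 2
    if sum(v > 0.5 for v in gc3_values) > half:
        return "GC"
    if sum(v < 0.5 for v in gc3_values) > half:
        return "AT"
    return "undetermined"
```

For example, `positional_gc("ATGGCCAAATAA")` gives 1GC = 0.25, 2GC = 0.25 and 3GC = 0.5 for the four codons ATG, GCC, AAA and TAA.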

Results and Discussion

Estimation of the strength of mutational pressure in archaeal genomes

The estimation of the strength of mutational pressure in archaeal genomes is shown in Figure 1. Mutational pressure is strong if the absolute value of the coefficient of correlation (R) between the GC content in third codon positions (3GC) and the total GC content (G+C), calculated for all coding regions in the genome (excluding those from genomic islands), is lower than 0.5. Strong mutational GC pressure leads to an almost uniformly high saturation of third codon positions with G and C in all coding regions (Figure 1B) [4, 8]. Strong mutational AT pressure leads to an almost uniform desaturation of third codon positions with G and C throughout the coding genome (Figure 1D).
Figure 1

Intragenomic dependences between total GC content (G+C) and GC content in three codon positions reflecting the strength and direction of mutational pressure in certain archaeal coding genomes.

Usually the distribution of GC content across all genes of an archaeal genome can be described as “head and tail” (Figure 1A, B and D), where the head is formed by most of the coding regions, which have similar GC content, and the tail is formed by coding regions from genomic islands. The genome of Methanosarcina acetivorans (Figure 1C) is exceptional: there is no “head”, only a long “tail” that stretches from G+C = 0.24 to G+C = 0.60. Even in the genome of Halobacterium sp. NRC-1, with a total GC content of about 0.68, there is a strong correlation between 3GC and G+C (R=0.864). As one can see in Figure 1B, this correlation is due to the small fraction of coding regions with lower GC content, while most of the genes (the “head”, which forms a kind of “roof”) have 3GC levels of about 0.87. For coding regions with G+C>0.62, however, the correlation is low (R=0.489). This means that there is strong mutational GC pressure in the main part of the Halobacterium sp. genome (which includes 77.7% of coding regions), but no strong mutational GC pressure in its genomic islands. The main parts of the Natronomonas pharaonis and Haloarcula marismortui genomes (genes with G+C>0.60) are also under strong mutational GC pressure. The coefficient of correlation between 3GC and G+C for Sulfolobus tokodaii is 0.716. If we calculate this coefficient only for coding regions with G+C<0.4 (Figure 1D), we find the situation characteristic of strong AT pressure: a low correlation between 3GC and G+C (R=0.486). About 5.9% of S. tokodaii coding regions have G+C higher than 0.4. Strong mutational AT pressure has also been found in the main part of the Sulfolobus solfataricus coding genome (in genes with G+C<0.4) and in the whole genomes of Nanoarchaeum equitans, Methanobrevibacter smithii, Methanococcus aeolicus and Methanosphaera stadtmanae.
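The criterion used in this section (|R| between 3GC and G+C below 0.5, calculated on the main part of the genome) can be sketched as follows. This is an illustration under assumptions: the function names and the `gc_min`/`gc_max` filter are hypothetical, Pearson's coefficient is assumed for R, and the cutoff separating the “head” from genomic islands has to be chosen by the user, as it was chosen per genome in the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def pressure_strength(genes, gc_min=0.0, gc_max=1.0, threshold=0.5):
    """genes: iterable of (total_gc, gc3) pairs, one per coding region.

    Only coding regions with gc_min <= total_gc <= gc_max are used, which
    mimics excluding genomic islands. Returns (R, is_strong), where
    is_strong means |R| < threshold.
    """
    subset = [(g, g3) for g, g3 in genes if gc_min <= g <= gc_max]
    r = pearson_r([g for g, _ in subset], [g3 for _, g3 in subset])
    return r, abs(r) < threshold


# Invented toy data (not values from the paper): four "head" genes
# plus two GC-rich "island" genes.
toy = [(0.30, 0.20), (0.32, 0.25), (0.34, 0.21), (0.36, 0.24),
       (0.55, 0.60), (0.60, 0.70)]
r_all, strong_all = pressure_strength(toy)
r_main, strong_main = pressure_strength(toy, gc_max=0.4)
```

With the toy data, the island genes drive a high overall correlation, while the restricted set gives |R| below 0.5, which mirrors the S. tokodaii example in the text.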

Bias in Arg4 and Arg2 usage correlates with bias in GC4f and GC2f3p: genomic indicators of previously existed strong AT pressure

As expected, GC-rich Arg4 codons (CGX) are strongly preferred for arginine in GC-rich genomes (Figure 2). In genomes with strong mutational AT pressure, the Arg4/Arg2 bias is also as expected: GC-poor Arg2 codons (AGA/G) are strongly preferred (Figure 2).
Figure 2

Dependences between average genomic GC content (G+C) and levels of arginine codon usage (Arg2 and Arg4) in archaeal genomes.

What is strange is that a strong bias in Arg2 and Arg4 codon usage exists even in the absence of strong mutational pressure. In most archaeal genomes with average GC content, Arg2 codons are strongly preferred over Arg4 codons (Figure 2). The level of Arg4 usage increases with GC content, but even in Thermofilum pendens (G+C=0.58) it is more than twofold lower than the level of Arg2 usage. Only in the genome of Methanopyrus kandleri (G+C=0.61) is the bias reversed: Arg4>Arg2. However, the level of Arg2 usage in M. kandleri is still high relative to that in archaeal genomes with strong GC pressure. The high level of AT-rich Arg2 codons and the low level of GC-rich Arg4 codons in archaea with average GC content should be a “trace” of strong AT pressure that previously existed in their common predecessor. As one can see in Figure 2, the bias in Arg4 and Arg2 codon usage still exists in methane-producing archaea with G+C<0.46, but in them the level of Arg2 is somewhat lower and the level of Arg4 somewhat higher than in genomes from the main archaeal group. It is important to highlight that two methane-producing archaea (M. kandleri and Methanothermobacter thermautotrophicus) are classified into the main group: their biases in arginine codon usage and amino acid content differ from those found in their relatives. In Figure 2 one can see outlying points with an inverted Arg4/Arg2 bias that belong to Haloquadratum walsbyi. As shown in Table 1, H. walsbyi has an elevated level of 1GC relative to species with the same total GC content (around 0.5). On the other hand, the level of 3GC in H. walsbyi is too low for “normal” archaea with average G+C.
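The Arg2 and Arg4 usage levels discussed above can be computed from the codon counts of a coding region. The sketch below is an illustration, not the CGS implementation: the function names are hypothetical, usage is expressed per thousand codons on the assumption that this is what the “×0.001” unit in the tables means, and every codon passed in (including a stop codon, if present) is counted in the total.

```python
from collections import Counter

ARG2 = {"AGA", "AGG"}                # AT-rich arginine codons (AGA/G)
ARG4 = {"CGA", "CGC", "CGG", "CGT"}  # GC-rich arginine codons (CGX)


def count_codons(cds):
    """Count codons in an in-frame coding sequence (DNA or RNA letters)."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def arginine_usage(codon_counts):
    """Return (Arg2, Arg4) usage per thousand codons."""
    total = sum(codon_counts.values())
    if total == 0:
        raise ValueError("no codons to analyze")
    arg2 = sum(codon_counts.get(c, 0) for c in ARG2)
    arg4 = sum(codon_counts.get(c, 0) for c in ARG4)
    return 1000 * arg2 / total, 1000 * arg4 / total


arg2, arg4 = arginine_usage(count_codons("ATGAGACGTAGGAAATAA"))
```

In this six-codon toy sequence two codons are Arg2 and one is Arg4, so Arg2 usage is twice Arg4 usage.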
Table 1

GC content and levels of arginine codon usage for genomes from the main group of archaea

| Species | G+C | 1GC | 2GC | 3GC | GC4f | GC2f3p | Arg4 (×0.001) | Arg2 (×0.001) |
|---|---|---|---|---|---|---|---|---|
| Halobacterium sp. NRC-1 | 0.679 | 0.698 | 0.466 | 0.872 | 0.882 | 0.844 | 62.68 | 2.13 |
| Natronomonas pharaonis | 0.638 | 0.694 | 0.444 | 0.775 | 0.791 | 0.742 | 61.07 | 3.33 |
| Haloarcula marismortui | 0.623 | 0.673 | 0.440 | 0.757 | 0.745 | 0.720 | 57.01 | 4.02 |
| Haloquadratum walsbyi | 0.488 | 0.613 | 0.430 | 0.421 | 0.404 | 0.412 | 53.54 | 5.39 |
| Methanopyrus kandleri | 0.608 | 0.656 | 0.420 | 0.749 | 0.734 | 0.761 | 61.28 | 23.92 |
| Thermofilum pendens | 0.578 | 0.596 | 0.412 | 0.724 | 0.678 | 0.803 | 21.39 | 53.94 |
| Pyrobaculum calidifontis | 0.571 | 0.594 | 0.415 | 0.705 | 0.703 | 0.738 | 23.90 | 46.21 |
| Aeropyrum pernix | 0.566 | 0.592 | 0.437 | 0.669 | 0.616 | 0.778 | 15.08 | 66.04 |
| Pyrobaculum arsenaticum | 0.551 | 0.578 | 0.408 | 0.668 | 0.667 | 0.711 | 22.18 | 46.78 |
| Thermococcus kodakarensis | 0.524 | 0.554 | 0.363 | 0.653 | 0.615 | 0.716 | 15.43 | 45.34 |
| Pyrobaculum aerophilum | 0.515 | 0.550 | 0.403 | 0.592 | 0.623 | 0.586 | 21.06 | 46.43 |
| Methanothermobacter thermautotrophicus | 0.499 | 0.546 | 0.391 | 0.559 | 0.509 | 0.629 | 14.13 | 53.70 |
| Pyrobaculum islandicum | 0.489 | 0.552 | 0.397 | 0.520 | 0.519 | 0.529 | 21.28 | 49.22 |
| Archaeoglobus fulgidus | 0.487 | 0.525 | 0.360 | 0.577 | 0.512 | 0.652 | 6.25 | 53.03 |
| Thermoplasma acidophilum | 0.467 | 0.490 | 0.370 | 0.541 | 0.524 | 0.582 | 13.67 | 42.83 |
| Metallosphaera sedula | 0.465 | 0.500 | 0.366 | 0.528 | 0.462 | 0.600 | 4.92 | 53.03 |
| Pyrococcus abyssi | 0.449 | 0.500 | 0.346 | 0.501 | 0.417 | 0.597 | 3.91 | 55.00 |
| Pyrococcus horikoshii | 0.422 | 0.473 | 0.365 | 0.430 | 0.367 | 0.486 | 5.33 | 48.96 |
| Pyrococcus furiosus | 0.409 | 0.490 | 0.342 | 0.394 | 0.321 | 0.455 | 4.60 | 49.73 |
| Thermoplasma volcanium | 0.406 | 0.462 | 0.348 | 0.406 | 0.358 | 0.456 | 10.30 | 37.96 |
| Picrophilus torridus | 0.368 | 0.415 | 0.325 | 0.365 | 0.313 | 0.418 | 5.18 | 40.86 |
| Sulfolobus acidocaldarius | 0.370 | 0.443 | 0.333 | 0.335 | 0.265 | 0.387 | 3.53 | 43.95 |
| Sulfolobus solfataricus | 0.364 | 0.429 | 0.330 | 0.334 | 0.270 | 0.376 | 4.86 | 44.47 |
| Sulfolobus tokodaii | 0.334 | 0.422 | 0.324 | 0.256 | 0.200 | 0.282 | 4.04 | 38.93 |
| Nanoarchaeum equitans | 0.311 | 0.396 | 0.284 | 0.252 | 0.237 | 0.261 | 2.54 | 38.20 |

H. walsbyi is phylogenetically closely related to the GC-rich Halobacterium sp., H. marismortui and N. pharaonis [12, 13, 14], whose genomes are under strong mutational GC pressure. However, the mutational pressure in the H. walsbyi genome can be described as weak AT pressure. We conclude that the direction of mutational pressure in the genome of H. walsbyi changed relatively recently. There was strong GC pressure in the genome of the H. walsbyi predecessor, the level of 1GC was high and the bias between Arg4 and Arg2 (Arg4>>Arg2) was great. The product of the MutY gene is a DNA glycosylase that excises A opposite G, C and 8-oxo-G [3, 5]. This enzyme is involved in the mechanism of A to C and T to G transversions. Probably, some amino acid substitutions altered the function of the H. walsbyi MutY protein or decreased its specificity for 8-oxo-G residues, and the direction of mutational pressure changed as a result.

GC4f is the GC content in fourfold degenerate sites, where all nucleotide substitutions are synonymous. GC2f3p is the GC content in twofold degenerate sites from third codon positions, where only transitions are synonymous. As shown in Table 1, in all archaeal species except those under strong mutational GC pressure and Pyrobaculum aerophilum, GC4f is lower than GC2f3p. In most of them the difference between GC4f and GC2f3p is large enough to state that the rates of GC to AT transversions in archaeal genomes are higher than the rates of AT to GC transversions, while the rates of GC to AT transitions are lower than the rates of AT to GC transitions. The level of Arg4 cannot be elevated in most archaeal species because AT to GC transversions are much less frequent than GC to AT transversions. In P. aerophilum GC4f (0.623) is a little higher than GC2f3p (0.586), but the bias in arginine codons is the same as in most archaea. This may be explained by the suggestion that the bias in arginine codon usage is a much more “retrospective” indicator of previously existing mutational pressure than the bias in GC4f and GC2f3p. In Table 2 one can see that the bias in GC4f and GC2f3p is lower for Methanosarcina species than for the main group of archaea. In the genomes of Methanocorpusculum labreanum and Methanospirillum hungatei the level of GC4f is much higher than the level of GC2f3p. Increased rates of AT to GC transversions in these genomes have already resulted in a change of the arginine codon usage bias (Arg4>Arg2).
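The two indices defined above can be derived from codon counts once the fourfold and twofold degenerate third positions are identified. The following Python sketch assumes the standard genetic code, excludes stop codons, and treats a third position as twofold degenerate when exactly two codons of its codon box encode the same amino acid; how CGS delimits these site classes is not stated in the text, so this is one reasonable reading rather than the authors' procedure.

```python
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def degenerate_codon_sets():
    """Return (fourfold, twofold) sets of sense codons by third-position class."""
    fourfold, twofold = set(), set()
    for a in BASES:
        for b in BASES:
            box = [a + b + c for c in BASES]
            amino_acids = [CODE[codon] for codon in box]
            for codon, aa in zip(box, amino_acids):
                if aa == "*":
                    continue  # stop codons are excluded
                n = amino_acids.count(aa)
                if n == 4:
                    fourfold.add(codon)
                elif n == 2:
                    twofold.add(codon)  # the pair differs by a transition
    return fourfold, twofold


FOURFOLD, TWOFOLD = degenerate_codon_sets()


def gc4f_gc2f3p(codon_counts):
    """Return (GC4f, GC2f3p) from a mapping codon -> count."""
    def third_position_gc(codons):
        total = sum(codon_counts.get(c, 0) for c in codons)
        gc = sum(codon_counts.get(c, 0) for c in codons if c[2] in "GC")
        return gc / total if total else float("nan")

    return third_position_gc(FOURFOLD), third_position_gc(TWOFOLD)


example = Counter({"GCC": 3, "GCA": 1, "AAA": 2, "AAG": 2, "ATG": 5})
gc4f, gc2f3p = gc4f_gc2f3p(example)
```

Under these assumptions there are 32 fourfold and 24 twofold codons; isoleucine (three codons), methionine and tryptophan contribute to neither index.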
Table 2

GC content and levels of arginine codon usage for genomes from the group of methane-producing archaea

| Species | G+C | 1GC | 2GC | 3GC | GC4f | GC2f3p | Arg4 (×0.001) | Arg2 (×0.001) |
|---|---|---|---|---|---|---|---|---|
| Methanocorpusculum labreanum | 0.508 | 0.558 | 0.396 | 0.571 | 0.600 | 0.480 | 33.53 | 14.24 |
| Methanospirillum hungatei | 0.458 | 0.539 | 0.388 | 0.447 | 0.461 | 0.390 | 33.18 | 19.60 |
| Methanosarcina acetivorans | 0.443 | 0.508 | 0.366 | 0.454 | 0.449 | 0.437 | 18.04 | 29.09 |
| Methanosarcina mazei | 0.436 | 0.514 | 0.365 | 0.428 | 0.413 | 0.427 | 15.86 | 31.43 |
| Methanococcoides burtonii | 0.419 | 0.506 | 0.356 | 0.394 | 0.341 | 0.408 | 17.73 | 24.44 |
| Methanosarcina barkeri | 0.417 | 0.501 | 0.361 | 0.391 | 0.368 | 0.389 | 16.39 | 27.70 |
| Methanococcus maripaludis | 0.337 | 0.443 | 0.317 | 0.252 | 0.206 | 0.255 | 6.14 | 25.16 |
| Methanococcus vannielii | 0.321 | 0.427 | 0.311 | 0.224 | 0.195 | 0.223 | 7.59 | 24.24 |
| Methanobrevibacter smithii | 0.318 | 0.432 | 0.317 | 0.206 | 0.150 | 0.210 | 6.48 | 24.95 |
| Methanococcus aeolicus | 0.311 | 0.407 | 0.306 | 0.222 | 0.231 | 0.202 | 3.86 | 26.28 |
| Methanosphaera stadtmanae | 0.290 | 0.423 | 0.312 | 0.134 | 0.074 | 0.140 | 8.00 | 24.13 |

One can speculate that the bias in arginine codon usage is due to an increased (or decreased) number of tRNA genes recognizing Arg4 (or Arg2) codons. However, this alternative hypothesis does not hold up well: different biases in arginine codon usage are observed in species with the same number of tRNA copies, and similar biases are found in species with different numbers of tRNA copies (Table 3).
Table 3

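Number of tRNA genes recognizing arginine codons and levels of usage of these codons for certain archaeal genomes

The original table has one column per codon (Arg2: AGA, AGG; Arg4: CGA, CGU, CGG, CGC). Empty cells were lost, so the tRNA counts are given in the order listed and cannot be assigned to individual codons.

| Species | tRNA counts as listed | Arg2 (×0.001) | Arg4 (×0.001) |
|---|---|---|---|
| Haloarcula marismortui | 1, 1, 1, 1, 1 | 4.02 | 57.01 |
| Haloquadratum walsbyi | 1, 1, 1, 1 | 5.39 | 53.54 |
| Archaeoglobus fulgidus | 1, 1, 1, 1, 1 | 53.03 | 6.25 |
| Pyrococcus abyssi | 1, 1, 1, 1, 1 | 55.00 | 3.91 |
| Sulfolobus tokodaii | 1, 1, 1, 1, 1 | 38.93 | 4.04 |
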
In addition, one should remember that the level of Arg4 codon usage makes a large contribution to the total level of CpG dinucleotide usage.

Mosaic structure of M. acetivorans genome

Figure 3 shows the distribution of 3GC in the coding regions of two genomes. Several genomic islands with higher 3GC are clearly seen in the genome of H. walsbyi (Figure 3A), but the 3GC levels of “normal” coding regions do not vary widely (from approximately 0.33 to 0.43). Figure 3B shows that the genome of M. acetivorans consists of numerous short genomic islands that differ significantly in GC content. In some of those “microislands”, mutational pressure should have the AT to GC direction (3GC higher than 0.5); in others it should have the opposite direction (AT pressure). The wide variation in 3GC levels (from 0.2 to 0.8) in M. acetivorans coding regions is close to that in eukaryotic chromosomes.
Figure 3

Distribution of GC content in third codon positions (3GC) of coding regions along the length of Haloquadratum walsbyi (A) and Methanosarcina acetivorans (B) genomes.

Methanosarcina species are known to have different stages in their life cycles. M. acetivorans can live as separate cells, as a cell lining and as a multicellular conglomerate with differentiated cells. Different genes are expressed in different cell types. If a mutator gene is expressed only at a given stage of the life cycle, it will cause nucleotide substitutions mostly in genes that are also expressed in that differentiated cell. If numerous genes are not translated in the differentiated cell, the mutator gene will rarely cause nucleotide substitutions in them. Coding regions might “jump” from GC-rich “microislands” to GC-poor ones and vice versa in M. acetivorans. This process should result in the absence of a strong bias in arginine codon usage as well as in the growth of the entropy of amino acid content distribution. An analogous mosaic genome structure is also characteristic of the Methanosarcina mazei and M. hungatei genomes. In the Methanococcoides burtonii and Methanosarcina barkeri genomes, variations in 3GC are not as wide, but they are still wider than in genomes from the main group of archaea. In Methanococcus maripaludis, Methanococcus vannielii, Methanobrevibacter smithii, Methanococcus aeolicus and Methanosphaera stadtmanae, the distribution of 3GC is identical to that in the main group of archaea, and there are not even genomic islands in them, but the bias in arginine codon usage and the specific features of amino acid content are similar to those of M. acetivorans (Figure 2 and Table 2). The GC content of the M. hungatei and M. labreanum genomes is higher than that of M. acetivorans; their genomic islands are larger than those of M. acetivorans, while the variations in 3GC are not as wide. Probably, the common predecessor of the group of methane-producing archaea existed for a long time with a genome organization similar to that of M. acetivorans, but then one group of its offspring drifted to AT pressure and another group drifted to GC pressure, losing the mosaic G+C structure that remains in M. acetivorans.

Entropy of amino acid content distribution in archaeal genomes

We calculated the entropy of amino acid content distribution (according to Claude Shannon’s information theory) in all proteins coded by genomes of the main group of archaea and in all proteins from the group of methane-producing archaeal species. Entropy (the quantity of information) is the measure of uncertainty and diversity of any biological system. The lower the level of entropy, the higher the uniformity of amino acid content. In general, Figure 4 shows that both GC and AT pressure lead to a decrease in the entropy of amino acid content distribution. GC pressure causes an increase in the levels of four amino acids coded by GC-rich codons (GARP) and a decrease in the levels of six amino acids coded by AT-rich codons (FYMINK). Conversely, AT pressure causes an increase in the levels of FYMINK and a decrease in GARP usage.
Figure 4

Dependences on GC content of the entropy of amino acid usage distribution (A, B) and the level of 10 a.a. usage (C, D) in proteins from the main group of archaea (A, C) and in proteins from the group of methane-producing archaea (B, D). 10 a.a. is the total level of usage for ten amino acids coded by codons average in GC content.

The entropy of amino acid content distribution is significantly higher in proteins from the group of methane-producing archaeal species (Figure 4B) coded by genes with G+C from 0.3 to 0.6 than in proteins from the main group of archaea coded by genes with the same GC content (Figure 4A). In the main group of archaea, the highest entropy is found in proteins coded by genes with G+C from 0.4 to 0.5 (Figure 4A), while in the group of methane-producing archaeal species it is found in proteins coded by genes with G+C from 0.5 to 0.6 (Figure 4B). Figure 4C shows the cause of the slow (yet statistically significant) decrease in entropy under GC pressure in proteins of the main group of archaea: the total level of 10 a.a. usage (the usage of the 10 amino acids coded by codons with average GC content) increases with GC content up to G+C = 0.7 and begins to decrease only in proteins coded by genes with G+C higher than 0.7. In comparison, Figure 4D shows that in proteins from the group of methane-producing archaeal species, the level of 10 a.a. usage already begins to decrease in proteins coded by genes with G+C higher than 0.5. Entropy falls more steeply in GC-poor genes (G+C<0.3) from the group of methane-producing archaea (Figure 4B) than in the main group of archaea (Figure 4A). This is explained by the absence of a statistically significant difference in the level of 10 a.a. between proteins coded by genes with G+C from 0.2 to 0.3 and those coded by genes with G+C from 0.3 to 0.4 in the main group of archaea.
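The three amino acid groups used in Figure 4 can be tallied with a short Python sketch. The GARP and FYMINK sets come from the text; taking the “10 a.a.” group to be the remaining ten standard amino acids is an inference from the count, and the function name `group_usage` is hypothetical.

```python
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")
GARP = set("GARP")       # coded by GC-rich codons
FYMINK = set("FYMINK")   # coded by AT-rich codons
TEN_AA = STANDARD - GARP - FYMINK  # assumed "10 a.a." group


def group_usage(protein):
    """Return the fraction of residues falling in each group."""
    residues = [r for r in protein.upper() if r in STANDARD]
    if not residues:
        raise ValueError("no standard amino acid residues found")
    n = len(residues)
    return {
        "GARP": sum(r in GARP for r in residues) / n,
        "FYMINK": sum(r in FYMINK for r in residues) / n,
        "10 a.a.": sum(r in TEN_AA for r in residues) / n,
    }


usage = group_usage("GARPFYMINKCDEHLQSTVW")
```

For a sequence containing each of the 20 amino acids once, the three fractions are 0.2, 0.3 and 0.5.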

Amino acid usage in two groups of archaeal species: proteomic indicators of previously existed strong AT pressure

The levels of amino acid usage in proteins from the main group of archaea are shown in Figure 5. The levels of valine, threonine, aspartic acid, histidine and glutamine (Figure 5C) increase with GC content, just as the levels of GARP amino acids do (Figure 5A). Only the level of glutamine begins to decrease in proteins coded by genes with G+C>0.7, while the levels of the other four amino acids keep growing. To find the cause of this unexpected growth, we analyzed the behavior of its possible sources (FYMINK amino acids).
Figure 5

Levels of amino acid usage in proteins from the main group of archaea.

One can see in Figure 5B that the levels of isoleucine and lysine are much higher than the level of any other amino acid from the FYMINK group in proteins coded by genes with G+C<0.6. A large shift in lysine usage can be seen between proteins coded by genes with G+C from 0.5 to 0.6 and proteins coded by genes with G+C from 0.6 to 0.7. The largest shift in aspartic acid usage (Figure 5C) also takes place between these two groups of proteins. We supposed that the level of aspartic acid grew in proteins coded by genes with G+C from 0.6 to 0.7 due to the decrease in lysine. Indeed, aspartic acid is coded by GAT and GAC codons, while lysine is coded by AAA and AAG. The easiest pathway of Lys to Asp substitution is the two-step nucleotide mutation AAA to GAC. This kind of two-step mutation should be frequent when mutational GC pressure is caused by both AT to GC transitions and AT to GC transversions. The increase in glutamine and histidine levels can also be due to the decrease in the lysine level under GC pressure. The decrease in isoleucine should provide a source for the increase in valine and threonine under GC pressure. The level of alanine grows more steeply than the levels of the other three amino acids from the GARP group under strong GC pressure (Figure 5A). Amino acid substitutions leading to the appearance of alanine should be more neutral than substitutions leading to the appearance of glycine, arginine or proline. We hypothesize that isoleucine and lysine levels grow so steeply under AT pressure for the same reason: substitutions leading to the appearance of isoleucine and lysine should be more neutral than substitutions leading to the appearance of phenylalanine, tyrosine, methionine or asparagine. As shown in Figure 5, there are only three amino acids (Ile, Pro and Arg) whose levels differ significantly between proteins coded by genes with G+C from 0.2 to 0.3 and proteins coded by genes with G+C from 0.3 to 0.4. In contrast, in proteins from the group of methane-producing archaea, levels of GARP amino acids decrease, levels of FYMINK amino acids (except methionine) increase and levels of eight out of the ten amino acids coded by codons with average GC content decrease under strong AT pressure (Figure 6).
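The Lys to Asp pathway mentioned above (AAA to GAC) can be checked by enumerating the codon pairs of two amino acids and counting the nucleotide differences between them. The sketch below assumes the standard genetic code; the function names are hypothetical, and each difference is labeled as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
PURINES = set("AG")


def substitution_type(x, y):
    """Classify a single nucleotide change."""
    same_class = (x in PURINES) == (y in PURINES)
    return "transition" if same_class else "transversion"


def shortest_paths(aa_from, aa_to):
    """Return (min_changes, paths) between the codons of two amino acids.

    Each path is (source_codon, target_codon, changes), where changes is a
    list of (codon_position, from_base, to_base, type) tuples.
    """
    sources = [c for c, aa in CODE.items() if aa == aa_from]
    targets = [c for c, aa in CODE.items() if aa == aa_to]
    best, paths = None, []
    for s, t in product(sources, targets):
        changes = [(i + 1, x, y, substitution_type(x, y))
                   for i, (x, y) in enumerate(zip(s, t)) if x != y]
        if best is None or len(changes) < best:
            best, paths = len(changes), [(s, t, changes)]
        elif len(changes) == best:
            paths.append((s, t, changes))
    return best, paths


steps, lys_to_asp = shortest_paths("K", "D")
```

For Lys to Asp the minimum is two changes, and the AAA to GAC path combines an A to G transition in the first position with an A to C transversion in the third, which is consistent with the requirement that both AT to GC transitions and AT to GC transversions be frequent. By comparison, Ile to Val and Ile to Thr each need a single change.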
Figure 6

Levels of amino acid usage in proteins from the group of methane-producing archaea.

The absence of a difference in amino acid content between archaeal proteins coded by genes with G+C from 0.2 to 0.3 and proteins coded by genes with G+C from 0.3 to 0.4 is the strongest proteomic evidence for our hypothesis. This situation is possible only if archaeal species from the main group “came back” from strong AT pressure. Rates of AT to GC transitions that are higher than those in their common predecessor have not led to significant rearrangements in amino acid content, just as happened with H. walsbyi, which “came back” from GC pressure. Relatively elevated levels of histidine, cysteine and glutamine are found in the proteomes of archaeal species from the group of methane-producing archaea. This feature should be the cause of the higher levels of entropy in their proteins. For proteins coded by genes with G+C from 0.4 to 0.5, the level of lysine is much lower than that in the main group of archaea. This may be caused by higher rates of AT to GC transversions in the genomes of these methane-producing archaea: the level of lysine has already declined, providing a source for the growth of histidine and glutamine, and the level of aspartic acid does not grow under GC pressure in proteins coded by genes with G+C from 0.5 to 0.6. Unlike the level of lysine, the level of isoleucine has not decreased in proteins coded by genes with G+C from 0.4 to 0.5, and so valine and threonine levels keep growing in proteins coded by genes with G+C from 0.5 to 0.6. The mosaic genome structure should be the cause of the diversification of amino acid content in the common predecessor of archaea from the group of methane-producing species. However, relatively elevated levels of isoleucine and lysine persist in this group of archaea, providing evidence of strong AT pressure that previously existed in their common predecessor.

Conclusion

All the genomic and proteomic data obtained in our research can be explained by a single hypothesis: there was strong mutational AT pressure in the genome of the common predecessor of all archaea. Then the rates of AT to GC transitions began to increase, while the rates of AT to GC transversions did not. That is why a bias in GC4f and GC2f3p has arisen (GC2f3p>GC4f) while the bias in arginine codon usage has not changed (Arg2>>Arg4) in most of the offspring of the common archaeal predecessor. The amino acid content of archaeal proteomes still carries certain features characteristic of proteomes encoded by genomes under strong mutational AT pressure (levels of isoleucine and lysine are increased, while levels of alanine, histidine, cysteine and glutamine are decreased). Many features of archaeal genes and proteins, as well as the absence of some genes that exist in bacteria and eukarya, can be explained (at least partially) by our hypothesis.

Materials and Methods

Data

As the material for our in silico work we used 36 codon usage lists, each giving codon usage for every coding region (CDS) of one of 36 completely sequenced archaeal genomes (Table 1, Table 2). All these lists were taken from the Codon Usage Database (www.kazusa.or.jp/codon).

Calculation

For the calculation of all indices needed for this work, we used the “Coding Genome Scanner” (CGS), a Microsoft Excel tool containing the original algorithm (www.barkovsky.hotmail.ru). CGS calculates G+C, 1GC, 2GC, 3GC, GC4f, GC2f3p, the frequencies of nucleotide, codon and amino acid usage, and the entropy of amino acid content distribution for each coding region in the genome. Coefficients of intragenomic correlation between 3GC and total GC content (R) were calculated for all species. The coefficient of correlation indicates the strength and direction of a linear relationship between two random variables. The correlation is average or strong if R>0.5 or R<−0.5; the correlation is low or absent if −0.5<R<0.5. The entropy of amino acid content distribution was calculated using Equation 1 from Claude Shannon’s information theory:

$$H = -\sum_{aa} f_{aa} \log_2 f_{aa} \qquad (1)$$

In this equation, f_aa (“faa”) is the frequency of usage of an amino acid residue. The maximum level of uncertainty (H max) for the amino acid content of a protein is 4.322 bit, which corresponds to equal usage of all 20 amino acids (log2 20).

In the second step of our study we pooled all coding regions from the main group of archaea (excluding H. walsbyi) and grouped the genes according to their GC content (G+C from 0.2 to 0.3; 0.3 to 0.4; 0.4 to 0.5; 0.5 to 0.6; 0.6 to 0.7; 0.7 to 0.8). Then we compared the entropy of amino acid content distribution, the average level of 10 a.a. and the levels of usage of every amino acid in proteins coded by genes from these separate groups using parametric statistics. The same calculations were performed on genes and proteins from the group of methane-producing archaea. Finally, we compared the levels of entropy and the amino acid levels between groups of proteins coded by genes with the same GC content from the main group of archaea and from the group of methane-producing archaea.
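The entropy calculation and the grouping of genes by GC content described in this section can be sketched in Python. This is an illustration under assumptions rather than the CGS algorithm: entropy is taken as Shannon entropy in bits over the 20 standard amino acids, the function names are hypothetical, and the choice to include the lower edge of each GC bin and exclude the upper edge is an assumption, since the text gives the bins only as ranges.

```python
from bisect import bisect_right
from collections import Counter
from math import log2

STANDARD = set("ACDEFGHIKLMNPQRSTVWY")
GC_EDGES = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]


def aa_entropy(protein):
    """Shannon entropy (bits) of the amino acid usage of one protein."""
    residues = [r for r in protein.upper() if r in STANDARD]
    if not residues:
        raise ValueError("no standard amino acid residues found")
    n = len(residues)
    counts = Counter(residues)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def gc_bin(gc):
    """Return the (lower, upper) GC bin containing gc, or None if outside.

    Assumption: a bin includes its lower edge and excludes its upper edge.
    """
    i = bisect_right(GC_EDGES, gc) - 1
    if i < 0 or i >= len(GC_EDGES) - 1:
        return None
    return GC_EDGES[i], GC_EDGES[i + 1]


h_max = aa_entropy("ACDEFGHIKLMNPQRSTVWY")  # every amino acid used once
```

A protein that uses all 20 amino acids equally reaches the maximum of log2 20, about 4.322 bit, and a protein made of a single amino acid has zero entropy.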

Authors’ contributions

Both authors collected the datasets, conducted data analyses, and co-wrote the manuscript. Both authors read and approved the final manuscript.

Competing interests

The authors have declared that no competing interests exist.
References (16 in total; 10 listed)

1.  Codon usage tabulated from international DNA sequence databases: status for the year 2000.

Authors:  Y Nakamura; T Gojobori; T Ikemura
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Genomic plasticity in prokaryotes: the case of the square haloarchaeon.

Authors:  Sara Cuadros-Orellana; Ana-Belen Martin-Cuadrado; Boris Legault; Giuseppe D'Auria; Olga Zhaxybayeva; R Thane Papke; Francisco Rodriguez-Valera
Journal:  ISME J       Date:  2007-05-31       Impact factor: 10.302

3.  An in-silico study of alphaherpesviruses ICP0 genes: positive selection or strong mutational GC-pressure?

Authors:  Vladislav Victorovich Khrustalev; Eugene Victorovich Barkovsky
Journal:  IUBMB Life       Date:  2008-07       Impact factor: 3.885

4.  Mutational pressure is a cause of inter- and intragenomic differences in GC-content of simplex and varicello viruses.

Authors:  Vladislav Victorovich Khrustalev; Eugene Victorovich Barkovsky
Journal:  Comput Biol Chem       Date:  2009-06-27       Impact factor: 2.877

5.  Archaea and the prokaryote-to-eukaryote transition. (Review)

Authors:  J R Brown; W F Doolittle
Journal:  Microbiol Mol Biol Rev       Date:  1997-12       Impact factor: 11.056

6.  Unique amino acid composition of proteins in halophilic bacteria.

Authors:  Satoshi Fukuchi; Kazuaki Yoshimune; Mamoru Wakayama; Mitsuaki Moriguchi; Ken Nishikawa
Journal:  J Mol Biol       Date:  2003-03-21       Impact factor: 5.469

7.  The Methanosarcina barkeri genome: comparative analysis with Methanosarcina acetivorans and Methanosarcina mazei reveals extensive rearrangement within methanosarcinal genomes.

Authors:  Dennis L Maeder; Iain Anderson; Thomas S Brettin; David C Bruce; Paul Gilna; Cliff S Han; Alla Lapidus; William W Metcalf; Elizabeth Saunders; Roxanne Tapia; Kevin R Sowers
Journal:  J Bacteriol       Date:  2006-09-15       Impact factor: 3.490

8.  Incision at hypoxanthine residues in DNA by a mammalian homologue of the Escherichia coli antimutator enzyme endonuclease V.

Authors:  Ane Moe; Jeanette Ringvoll; Line M Nordstrand; Lars Eide; Magnar Bjørås; Erling Seeberg; Torbjørn Rognes; Arne Klungland
Journal:  Nucleic Acids Res       Date:  2003-07-15       Impact factor: 16.971

9.  Human thymine DNA glycosylase (TDG) and methyl-CpG-binding protein 4 (MBD4) excise thymine glycol (Tg) from a Tg:G mispair.

Authors:  Jung-Hoon Yoon; Shigenori Iwai; Timothy R O'Connor; Gerd P Pfeifer
Journal:  Nucleic Acids Res       Date:  2003-09-15       Impact factor: 16.971

10.  Metabolism of halophilic archaea. (Review)

Authors:  Michaela Falb; Kerstin Müller; Lisa Königsmaier; Tanja Oberwinkler; Patrick Horn; Susanne von Gronau; Orland Gonzalez; Friedhelm Pfeiffer; Erich Bornberg-Bauer; Dieter Oesterhelt
Journal:  Extremophiles       Date:  2008-02-16       Impact factor: 2.395
