Replication-Associated Mutational Pressure (RMP) Governs Strand-Biased Compositional Asymmetry (SCA) and Gene Organization in Animal Mitochondrial Genomes.

Qiang Lin, Peng Cui, Feng Ding, Songnian Hu, Jun Yu.

Abstract

The nucleotide compositions of the light (L-) and heavy (H-) strands of animal mitochondrial genomes are known to exhibit strand-biased compositional asymmetry (SCA). One possible cause is a replication-associated mutational pressure (RMP) that introduces characteristic nucleotide changes among mitochondrial genomes of different animal lineages. Here, we discuss the influence of RMP on nucleotide and amino acid compositions as well as gene organization. Among animal mitochondrial genomes, RMP may represent the major force driving the evolution of mitochondrial protein-coding genes, coupled with other process-based selective pressures, such as those acting on components of the translation machinery (tRNAs and their anticodons). Through comparative analyses of sequenced mitochondrial genomes from diverse animal lineages and literature reviews, we suggest that the RMP effect is stronger among invertebrate mitochondrial genes than among those of vertebrates, as a result of either positive selection on the invertebrate genes or relaxed selective pressure on the vertebrate mitochondrial genes.

Keywords:  Function-based selection; mitochondrion genome; replication-associated mutational pressure; strand-biased compositional asymmetry.

Year:  2012        PMID: 22942673      PMCID: PMC3269014          DOI: 10.2174/138920212799034811

Source DB:  PubMed          Journal:  Curr Genomics        ISSN: 1389-2029            Impact factor:   2.236


INTRODUCTION

Variation in genome composition, at the level of nucleotide or amino acid sequences, records the evolutionary history of genomes over evolutionary time scales [1, 2]. One of the most outstanding cases is the variation of G+C content, which occurs universally and is found in all forms of genomes, including those of mammals, plants, and bacteria [3-7]. Unbalanced genomic composition was first discovered in bacteria. Genomic GC content variation has been thought to be closely related to environmental pressure and evolutionary state [8-10], and GC content asymmetry between the two DNA strands (positive vs. negative, leading vs. lagging, and Watson vs. Crick) is associated with either replication- or transcription-associated mutations [11, 12]. In turn, dynamic nucleotide composition also shapes codon usage and transcription and translation efficiencies, and therefore affects strand-biased gene organization [13, 14]. Based on this logic, methods have been developed to predict origins of replication [15, 16]. In vertebrates, especially in mammals and birds, long-range mosaic GC content variation spanning hundreds of kilobases has been termed “isochores” [17, 18]. Although the causative mechanism behind isochores remains controversial, such a structure may reflect the clustering of some GC-rich genes rather than a global selection or mutation signature. Among the transcripts of protein-coding genes, there is a transcription-associated DNA mutational spectrum that is transcript-centric and exhibits a negative GC content gradient from 5’ to 3’ along the orientation of transcripts; this feature is more pronounced among warm-blooded vertebrate and grass genes [19, 20]. GC skew is therefore useful in predicting transcription start sites among plants [21].
Nevertheless, strand-biased nucleotide composition changes in diverse genomes are complex, because mutations are always at work and selection acts at each level of functional networks, all the way down to individual genes and their regulatory elements. A relatively straightforward system, other than prokaryotic genomes, is the animal mitochondrial genome, which is highly conserved and small in size [22, 23]. To date, over 1,500 mitochondrial genomes have been sequenced from different taxonomic groups of animals. Such a collection provides a significant resource for large-scale comparative analysis. Since metazoan mitochondrial genomes are not likely to recombine, their gene structures are rather stable and the number of genes remains constant, providing a model for the study of sequence evolution and compositional dynamics [24]. In this review, we focus our discussion on the compositional dynamics of mitochondrial genomes to illustrate the evolutionary relationship between genomic DNA composition variation and its functional consequences.

THE STRAND-BIASED NUCLEOTIDE COMPOSITIONS

The two DNA strands of animal mitochondrial genomes have different buoyant densities and are thus named the heavy strand (H-strand) and the light strand (L-strand). The two strands often have different nucleotide compositions: the H-strand is GT-rich and the L-strand is AC-rich. This difference has been explained by the strand-displacement model of mitochondrial DNA (mtDNA) replication. During mtDNA replication, the parental H-strand remains single-stranded for a longer time before it serves as the template for daughter L-strand synthesis [25-27]. Since spontaneous deamination of both A (adenine) and C (cytosine) [28-30] occurs frequently in single-stranded DNA [31], it leads to strand-biased composition. The deamination of A yields hypoxanthine (the base of inosine, I), which pairs more strongly with C than with T (thymine) and generates an A:T→G:C mutation. The deamination of C yields U (uracil), which pairs with A and generates a C:G→T:A mutation. Once the H-strand bearing the C→U change is used as a template to replicate the daughter L-strand, it leads to a G→A mutation in the L-strand after one round of DNA replication. Therefore, the H-strand, left single-stranded for an extended period during DNA replication, tends to accumulate A→G and C→U mutations and becomes rich in G and T; that is, it shows an excess of G over C and of T over A (GC skew and AT skew), whereas the L-strand becomes rich in A and C, showing an excess of C over G and of A over T (Fig. ).
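The excess of G over C and of T over A described above is commonly quantified as GC skew, (G − C)/(G + C), and AT skew, (A − T)/(A + T). The following is a minimal sketch of that calculation, not the procedure used for the figures in this review; the function names and the toy sequence are illustrative assumptions.

```python
from collections import Counter


def strand_skews(seq: str) -> dict:
    """Return GC skew and AT skew of a single strand read 5'->3'."""
    counts = Counter(seq.upper())
    g, c, a, t = (counts[b] for b in "GCAT")
    # Skews are defined as normalized differences; guard against empty denominators.
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    at_skew = (a - t) / (a + t) if a + t else 0.0
    return {"gc_skew": gc_skew, "at_skew": at_skew}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, also read 5'->3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Toy G/T-rich sequence standing in for an H-strand (hypothetical data).
toy_h_strand = "GGGTTTGTCA"
h_skews = strand_skews(toy_h_strand)
l_skews = strand_skews(reverse_complement(toy_h_strand))
print("H-strand:", h_skews)
print("L-strand:", l_skews)
```

Because the two strands are complementary, the skews of one strand are the negatives of the skews of the other, so a GT-rich H-strand necessarily implies an AC-rich L-strand.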

THE ASYMMETRY OF PROTEIN-CODING SEQUENCES

In protein-coding sequences, the effect of strand-biased compositional asymmetry differs among codon positions [32]. Because of the wobble position (the 3rd codon position, or cp3) and codon redundancy, the cumulative consequence of directional mutation pressure can be observed readily in the nucleotide composition at the cp3 sites of fourfold degenerate amino acids. We collected 870 vertebrate and 342 invertebrate complete mitochondrial genome sequences from different taxonomic groups archived at NCBI and calculated the relative synonymous codon usage (RSCU) as a measure of codon usage bias; RSCU is defined as the observed frequency of a codon divided by the frequency expected under the assumption of equal usage of synonymous codons (Table ). Among the vertebrate mtDNAs, the L-strand genes tend to end their codons with A or C more frequently than with G or T, whereas the H-strand genes prefer codons ending with G or T. Genes on the two strands clearly exhibit distinct, strand-specific codon usage. Arthropod mtDNAs share a similar mutation bias at the sites of four-fold and six-fold degenerate amino acids, but the bias weakens at two-fold degenerate sites, which are very sensitive to transversions (purine to pyrimidine or vice versa) because these always change the encoded amino acid. Among animals, mitochondrial genomes, with a few exceptions, encode 22 tRNA genes, resulting in one tRNA species per amino acid on average (Table ). The exceptions are the leucine and serine tRNAs; leucine and serine are two of the three six-fold degenerate amino acids and are among the most abundant amino acids in protein-coding sequences (the case of the other six-fold degenerate amino acid, Arg, is rather complex in mitochondrial genomes, so we chose not to discuss it in detail).
Leucine and serine usually have two tRNAs for their six codons, whereas every other amino acid has only one (occasionally two). The first two codon positions (cp1 and cp2) are complementary to the tRNA anticodon, and the third one, cp3, involves wobble base pairing. Moreover, the anticodons of all 22 tRNAs are highly conserved and perfectly match codons ending in A or C (Fig. ). The codons most frequently used by the L-strand genes are those that perfectly match their tRNAs, whereas the codons most frequently used by the H-strand genes do not perfectly match their tRNAs. In almost all vertebrate mitochondria, only one gene, nad6, is found on the H-strand. A recent study found that in Antarctic notothenioids, nad6, which is adjacent to tRNA-Glu, has translocated from its canonical location into the control region, releasing it from the H-strand’s mutation pressure [33]. In addition, a study on horses residing at different altitudes provided evidence that their nad6 genes have the lowest genetic diversity and may undergo purifying selection for adaptation to high altitudes; this observation again suggests that nad6 may have become intolerant of additional mutations [34]. Therefore, there may be an advantage for genes to have better-matched codons and anticodons for a “best fit” in the protein translation machinery, and selection for an appropriate position on the right strand can alter the strand distribution of mtDNA genes.
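The RSCU measure defined in this section can be sketched as follows: for each family of synonymous codons, the observed count of a codon is divided by the count expected if all codons of that family were used equally. This is only an illustration under stated assumptions; the two-family `FAMILIES` mapping, the function name, and the toy sequence are hypothetical, and a real analysis would need the full mitochondrial genetic code of the lineage in question.

```python
from collections import Counter

# Hypothetical subset of synonymous codon families (illustration only).
FAMILIES = {
    "Lys": ["AAA", "AAG"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
}


def rscu(cds: str, families: dict = FAMILIES) -> dict:
    """Relative synonymous codon usage for the codons in `families`."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3].upper() for i in range(0, usable, 3))
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(codons)  # equal-usage expectation
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy coding sequence: AAA AAA AAA AAG GCA GCA GCC GCT (hypothetical data).
toy_cds = "AAAAAAAAAAAG" + "GCAGCAGCCGCT"
toy_rscu = rscu(toy_cds)
for codon, value in sorted(toy_rscu.items()):
    print(codon, round(value, 2))
```

An RSCU of 1 means a codon is used exactly as often as expected under equal usage, and the values within one family sum to the number of codons in that family, which is a convenient consistency check on tabulated RSCU values.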

THE ASYMMETRY OF AMINO ACID COMPOSITIONS

In bacteria, asynchronous replication of the leading and lagging strands induces SCA, which shapes codon and amino acid usage and contributes to strand-biased gene distribution (SGD) [13, 14, 35, 36]. In E. coli and Bacillus subtilis, Gly, Val, Glu, and Asp are relatively abundant on the leading strand, whereas Ile, Thr, and His are relatively more abundant on the lagging strand. In addition, hydrophilic amino acids are more plentiful in proteins encoded by regions close to the terminus of chromosome replication, whereas hydrophobic amino acids are more abundant in proteins encoded by regions close to the origin of replication [37]. Such replication-associated amino acid compositional asymmetry is also found among metazoan mitochondrial genomes. Compared with the GC skew and AT skew in flatworms and mammals, an opposite SCA was observed in the amino acids of protein-coding sequences [38]. To summarize the impact of RMP on amino acid compositional dynamics, we used dinucleotide frequencies to show amino acid composition variations between the two strands, with Pyrocoelia rufa, Lampsilis ornata, and Trichosurus vulpecula representing arthropods, mollusks, and vertebrates, respectively (Figs. and ). In the P. rufa and L. ornata mitochondrial genomes, AT, AA, AC, CC, and GC are used more frequently on the L-strand, whereas TT, TG, GT, and GG are used more frequently on the H-strand. As a result, Thr, Ile, Pro, Ala, and Lys are more frequently found among the L-strand genes, whereas Gly, Val, Phe, and Cys are more prevalent among the H-strand genes. This amino acid composition bias is obvious among invertebrate mtDNAs. Although vertebrate mtDNAs harbor only one gene, nad6, on their H-strand, so that the pattern is not statistically significant, we can still see more frequent Val, Gly, and Glu on the H-strand of T. vulpecula mtDNA. In addition, the synonymous sites of four-fold and six-fold degenerate amino acids are more strongly affected than those of other amino acids.
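The link between dinucleotides and amino acids drawn above (for example AC with Thr, CC with Pro, GG with Gly) suggests that the dinucleotides are the first two codon positions. Under that assumption, which the text does not state explicitly, a minimal sketch of the per-strand comparison could look as follows; the function name and the toy gene sequences are hypothetical.

```python
from collections import Counter


def cp12_dinucleotide_freq(cds_list: list) -> dict:
    """Frequency of the cp1-cp2 dinucleotide over all codons of the given genes."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[cds[i:i + 2]] += 1  # first two positions of each codon
    total = sum(counts.values())
    return {dinuc: n / total for dinuc, n in counts.items()} if total else {}


# Toy gene sets standing in for L- and H-strand genes (hypothetical data).
toy_l_genes = ["ACAATTCCAGCA"]  # codons ACA ATT CCA GCA
toy_h_genes = ["GGTGTTTTTTGT"]  # codons GGT GTT TTT TGT
l_freq = cp12_dinucleotide_freq(toy_l_genes)
h_freq = cp12_dinucleotide_freq(toy_h_genes)
print("L-strand:", l_freq)
print("H-strand:", h_freq)
```

Comparing the two dictionaries key by key shows which dinucleotides, and hence which amino acids, are enriched on each strand.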

A STRONGER RMP AMONG INVERTEBRATE MTDNA GENES

We selected several complete mitochondrial genomes from different taxonomic groups of animals to validate the different trends found between vertebrates and invertebrates. Among the vertebrate mtDNA genes, SCA is more pronounced at cp1 and cp3 but weaker at cp2 (Fig. , ). Such codon-position-specific effects reflect the organization of the genetic code [39-42]. Several mechanisms may be at work. First, mutational pressure pushes for a compositional change on a strand, altering DNA sequences. The cp1 and cp3 nucleotides change more easily under such pressure, which allows us to measure its presence. Second, once amino acids are fixed by functional selection, purifying selection confers resistance to further changes, especially at cp3 sites where changes would cause loss-of-function mutations (such as at the two-fold degenerate sites). Since the nucleotides at cp2 largely determine the properties of the encoded amino acids (such as hydrophobicity), changes at this position are subject to negative selection in most cases. Third, and strikingly, in invertebrate mtDNA genes RMP appears to have an equal effect on all three codon positions. In other words, cp1, cp3, and even cp2 all show similar yet stronger SCA (Fig. to ). This phenomenon appears universal among most, if not all, invertebrate mitochondrial genomes and is believed to result from a balance between mutation and selection [43]. Such a balance can be achieved in two ways. First, positive selection and advantageous mutations may be so strong that they gradually become fixed at cp2, while the remaining sites are left to drift back and forth. Second, relaxed selection or drift may occur in invertebrate mtDNAs, so that the codon bias at all positions reaches an equilibrium [44].
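The codon-position-specific comparison described above can be illustrated by computing GC skew and AT skew separately for cp1, cp2, and cp3 of a coding sequence. This is a minimal sketch assuming the usual skew definitions, (G − C)/(G + C) and (A − T)/(A + T); the function name and the toy sequence are hypothetical and the code is not the pipeline behind the figures.

```python
def skew_by_codon_position(cds: str) -> dict:
    """Return GC and AT skew for each of the three codon positions."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    result = {}
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this position
        g, c, a, t = (bases.count(b) for b in "GCAT")
        result[f"cp{pos + 1}"] = {
            "gc_skew": (g - c) / (g + c) if g + c else 0.0,
            "at_skew": (a - t) / (a + t) if a + t else 0.0,
        }
    return result


# Toy coding sequence: GCT GCG ACT GCG (hypothetical data).
toy_position_skews = skew_by_codon_position("GCTGCGACTGCG")
for position, skews in toy_position_skews.items():
    print(position, skews)
```

Running the same function over the L-strand and H-strand genes of a genome would show whether the asymmetry is confined to cp1 and cp3, as described for vertebrates, or extends to cp2, as described for invertebrates.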

GENE ORGANIZATION VARIATION AMONG ANIMAL MITOCHONDRIAL GENOMES

Gene organization among animal mitochondrial genomes is highly conserved and is related to nucleotide composition variation through changes in codon and amino acid usage. Usually, strand-biased mutations lead to SGD because genes, or coding sequences (CDS), favor higher purine content. The reason lies in the organization of the genetic code, in which abundant amino acids are in general encoded by purine-rich codons. For instance, the two abundant acidic amino acids are both encoded by GAN [35, 39]. However, SGD in prokaryotic genomes is complicated by frequent horizontal gene transfer. In Chlamydia genomes, gene strand-switching decreases as nucleotide composition becomes more biased, and most substitution types are asymmetric because of replication-associated mechanisms [45]. In the Bacillaceae group, horizontally transferred genes prefer the leading strand, and conserved genes are more likely than lineage-specific genes to be lost from the leading strand [46]. In vertebrate mitochondrial genomes, recombination is rather rare, but an analysis of co-clustered genes suggests that this is not true of metazoan species on the early animal branches; recombination does occur in some animal species [22]. A recent study on the mitochondrial genome of Antarctic notothenioids showed that the nad6 gene has moved from the H-strand to the L-strand through a duplication-loss model [33]. This provides a solid example of a gene shift occurring under mutational pressure or due to SCA.

CONCLUSION

The mitochondrial genome provides a simple model system for studying RMP and SCA as well as their relationship to codon and amino acid usage and gene organization. Among animal mitochondrial genomes, the SCA of the two differently replicated strands may be introduced by replication-associated mutational pressure [25, 32, 47], and our review suggests that this asymmetry is directly related to mitochondrial protein-coding genes, since genes on the two DNA strands have distinct codon and amino acid usage. Variation in composition and usage may also govern changes in the position of mitochondrial genes on the two DNA strands. These statements are consistent with previous studies on bacterial genomes, in which replication-associated mutational pressure drives codon and amino acid usage changes in protein-coding genes. Natural selection also acts on codon and amino acid usage because some of these changes alter protein function or biological processes [48-50]. Furthermore, natural selection could also favor an appropriate organization of genes on the two DNA strands, which often have distinct nucleotide, codon, and amino acid compositions. Therefore, selection can act to promote SGD directly.
Table 1.

The Average Relative Synonymous Codon Usage (RSCU) of L- and H-Strand Genes Among Vertebrate and Hexapod mtDNAs

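Vertebrates (endothermic animals^b and exothermic animals^c)

| Amino acid (anticodon)^a | Codon | Endothermic RSCU (L-strand) | Endothermic RSCU (H-strand) | Exothermic RSCU (L-strand) | Exothermic RSCU (H-strand) |
|---|---|---|---|---|---|
| Leu (UAG) | TTG | 0.09 | 2.32 | 0.16 | 1.84 |
| | CTT | 0.75 | 0.56 | 1.23 | 0.88 |
| | CTA | 2.74 | 0.27 | 1.93 | 0.41 |
| | CTG | 0.34 | 0.43 | 0.44 | 0.46 |
| | TTA | 0.98 | 2.37 | 1.12 | 2.28 |
| | CTC | 1.1 | 0.05 | 1.12 | 0.12 |
| Lys (UUU) | AAA | 1.82 | 0.67 | 1.73 | 0.73 |
| | AAG | 0.18 | 1.33 | 0.27 | 1.27 |
| Glu (UUC) | GAA | 1.71 | 0.57 | 1.61 | 0.56 |
| | GAG | 0.29 | 1.43 | 0.39 | 1.44 |
| Gln (UUG) | CAG | 0.19 | 1.34 | 0.3 | 1.25 |
| | CAA | 1.81 | 0.66 | 1.7 | 0.75 |
| His (GUG) | CAT | 0.63 | 1.82 | 0.61 | 1.56 |
| | CAC | 1.37 | 0.18 | 1.39 | 0.44 |
| Asn (GUU) | AAC | 1.34 | 0.16 | 1.25 | 0.48 |
| | AAT | 0.66 | 1.84 | 0.75 | 1.52 |
| Ser (GCU/UGA) | TCC | 1.61 | 0.19 | 1.54 | 0.41 |
| | TCA | 2.23 | 0.59 | 1.95 | 0.83 |
| | AGC | 0.87 | 0.23 | 0.97 | 0.31 |
| | TCG | 0.14 | 0.51 | 0.19 | 0.61 |
| | TCT | 0.91 | 2.25 | 1.07 | 2.61 |
| | AGT | 0.25 | 2.23 | 0.28 | 1.23 |
| Trp (UCA) | TGG | 0.17 | 1.12 | 0.28 | 1.07 |
| | TGA | 1.83 | 0.88 | 1.72 | 0.93 |
| Ala (UGC) | GCG | 0.08 | 0.92 | 0.14 | 0.87 |
| | GCT | 0.77 | 2.08 | 0.73 | 1.88 |
| | GCC | 1.71 | 0.28 | 1.76 | 0.41 |
| | GCA | 1.43 | 0.72 | 1.37 | 0.84 |
| Arg (UCG) | CGG | 0.18 | 1.73 | 0.39 | 1.56 |
| | CGA | 2.5 | 0.53 | 2.37 | 0.85 |
| | CGT | 0.41 | 1.66 | 0.46 | 1.44 |
| | CGC | 0.91 | 0.08 | 0.78 | 0.15 |
| Cys (GCA) | TGC | 1.43 | 0.25 | 1.35 | 0.26 |
| | TGT | 0.57 | 1.75 | 0.65 | 1.74 |
| Gly (UCC) | GGC | 1.28 | 0.14 | 1.25 | 0.27 |
| | GGG | 0.36 | 1.68 | 0.62 | 2.07 |
| | GGA | 1.82 | 0.68 | 1.59 | 0.63 |
| | GGT | 0.54 | 1.5 | 0.53 | 1.03 |
| Asp (GUC) | GAT | 0.67 | 1.84 | 0.64 | 1.66 |
| | GAC | 1.33 | 0.16 | 1.36 | 0.34 |
| Phe (GAA) | TTT | 0.75 | 1.87 | 0.98 | 1.75 |
| | TTC | 1.25 | 0.13 | 1.02 | 0.25 |
| Met (CAU) | ATA | 1.67 | 0.67 | 1.43 | 0.74 |
| | ATG | 0.33 | 1.33 | 0.57 | 1.26 |
| Tyr (GUA) | TAC | 1.17 | 0.27 | 1.14 | 0.39 |
| | TAT | 0.83 | 1.73 | 0.86 | 1.61 |
| Val (UAC) | GTT | 0.7 | 1.82 | 1.05 | 1.62 |
| | GTC | 1.02 | 0.14 | 0.97 | 0.21 |
| | GTG | 0.28 | 1.22 | 0.37 | 1.28 |
| | GTA | 2 | 0.82 | 1.61 | 0.89 |
| Thr (UGU) | ACT | 0.73 | 2.31 | 0.74 | 2.07 |
| | ACC | 1.37 | 0.15 | 1.44 | 0.49 |
| | ACA | 1.78 | 0.87 | 1.65 | 0.73 |
| | ACG | 0.12 | 0.67 | 0.16 | 0.71 |
| Pro (UGG) | CCA | 1.69 | 0.33 | 1.59 | 0.6 |
| | CCC | 1.36 | 0.29 | 1.39 | 0.54 |
| | CCT | 0.85 | 3.01 | 0.81 | 2.07 |
| | CCG | 0.1 | 0.37 | 0.2 | 0.79 |
| Ile (GAU) | ATT | 0.91 | 1.87 | 1.17 | 1.76 |
| | ATC | 1.09 | 0.13 | 0.83 | 0.24 |

Hexapods^d

| Amino acid (anticodon)^a | Codon | RSCU (L-strand) | RSCU (H-strand) |
|---|---|---|---|
| Leu (UAG/UAA) | TTG | 0.24 | 0.97 |
| | CTT | 0.75 | 0.44 |
| | CTA | 1 | 0.22 |
| | CTG | 0.09 | 0.07 |
| | TTA | 3.76 | 4.28 |
| | CTC | 0.16 | 0.03 |
| Lys (UUU/CUU) | AAA | 1.75 | 1.37 |
| | AAG | 0.25 | 0.63 |
| Glu (UUC) | GAA | 1.84 | 1.44 |
| | GAG | 0.16 | 0.56 |
| Gln (UUG) | CAG | 0.11 | 0.54 |
| | CAA | 1.89 | 1.46 |
| His (GUG) | CAT | 1.35 | 1.79 |
| | CAC | 0.65 | 0.21 |
| Asn (GUU) | AAC | 0.45 | 0.14 |
| | AAT | 1.55 | 1.86 |
| Ser (GCU/UGA) | TCC | 0.56 | 0.15 |
| | AGA | 1.37 | 1.97 |
| | TCA | 2.87 | 1.32 |
| | AGC | 0.19 | 0.12 |
| | TCG | 0.14 | 0.15 |
| | TCT | 2.09 | 2.91 |
| | AGG | 0.12 | 0.18 |
| | AGT | 0.65 | 1.2 |
| Trp (UCA) | TGG | 0.18 | 0.43 |
| | TGA | 1.82 | 1.57 |
| Ala (UGC) | GCG | 0.07 | 0.21 |
| | GCT | 1.69 | 2.69 |
| | GCC | 0.61 | 0.23 |
| | GCA | 1.62 | 0.87 |
| Arg (UCG) | CGG | 0.26 | 0.55 |
| | CGA | 2.87 | 1.49 |
| | CGT | 0.67 | 1.88 |
| | CGC | 0.2 | 0.07 |
| Cys (GCA) | TGC | 0.52 | 0.12 |
| | TGT | 1.48 | 1.88 |
| Gly (UCC) | GGC | 0.19 | 0.15 |
| | GGG | 0.45 | 0.91 |
| | GGA | 2.52 | 1.3 |
| | GGT | 0.84 | 1.64 |
| Asp (GUC) | GAT | 1.46 | 1.84 |
| | GAC | 0.54 | 0.16 |
| Phe (GAA) | TTT | 1.57 | 1.91 |
| | TTC | 0.43 | 0.09 |
| Met (CAU) | ATA | 1.8 | 1.61 |
| | ATG | 0.2 | 0.39 |
| Tyr (GUA) | TAC | 0.54 | 0.15 |
| | TAT | 1.46 | 1.85 |
| Val (UAC) | GTT | 1.48 | 2.34 |
| | GTC | 0.25 | 0.13 |
| | GTG | 0.23 | 0.4 |
| | GTA | 2.04 | 1.14 |
| Thr (UGU) | ACT | 1.47 | 2.45 |
| | ACC | 0.45 | 0.26 |
| | ACA | 2 | 1.13 |
| | ACG | 0.08 | 0.16 |
| Pro (UGG) | CCA | 1.69 | 0.86 |
| | CCC | 0.52 | 0.27 |
| | CCT | 1.69 | 2.69 |
| | CCG | 0.1 | 0.17 |
| Ile (GAU) | ATT | 1.69 | 1.9 |
| | ATC | 0.31 | 0.1 |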

The bold numbers indicate the most frequently used codons.

^a Anticodons of the corresponding tRNAs are given in parentheses.

^b Endothermic animals whose mitochondrial genome sequences are available.

^c Exothermic animals whose mitochondrial genome sequences are available.

^d Hexapods whose mitochondrial genome sequences are available.

  49 in total

1.  Nucleotide compositional asymmetry between the leading and lagging strands of eubacterial genomes.

Authors:  Hongzhu Qu; Hao Wu; Tongwu Zhang; Zhang Zhang; Songnian Hu; Jun Yu
Journal:  Res Microbiol       Date:  2010-09-22       Impact factor: 3.992

2.  GC content variability of eubacteria is governed by the pol III alpha subunit.

Authors:  Xiaoqian Zhao; Zhang Zhang; Jiangwei Yan; Jun Yu
Journal:  Biochem Biophys Res Commun       Date:  2007-02-28       Impact factor: 3.575

Review 3.  The isochore organization of the human genome.

Authors:  G Bernardi
Journal:  Annu Rev Genet       Date:  1989       Impact factor: 16.830

4.  Ongoing evolution of strand composition in bacterial genomes.

Authors:  E P Rocha; A Danchin
Journal:  Mol Biol Evol       Date:  2001-09       Impact factor: 16.240

5.  A strong effect of AT mutational bias on amino acid usage in Buchnera is mitigated at high-expression genes.

Authors:  Carmen Palacios; Jennifer J Wernegreen
Journal:  Mol Biol Evol       Date:  2002-09       Impact factor: 16.240

6.  Base composition skews, replication orientation, and gene orientation in 12 prokaryote genomes.

Authors:  M J McLean; K H Wolfe; K M Devine
Journal:  J Mol Evol       Date:  1998-12       Impact factor: 2.395

7.  Dependence of mutational asymmetry on gene-expression levels in the human genome.

Authors:  Jacek Majewski
Journal:  Am J Hum Genet       Date:  2003-07-24       Impact factor: 11.025

8.  Recombination drives the evolution of GC-content in the human genome.

Authors:  Julien Meunier; Laurent Duret
Journal:  Mol Biol Evol       Date:  2004-02-12       Impact factor: 16.240

9.  ND6 gene "lost" and found: evolution of mitochondrial gene rearrangement in Antarctic notothenioids.

Authors:  Xuan Zhuang; C-H Christina Cheng
Journal:  Mol Biol Evol       Date:  2010-01-27       Impact factor: 16.240

10.  GC-compositional strand bias around transcription start sites in plants and fungi.

Authors:  Shigeo Fujimori; Takanori Washio; Masaru Tomita
Journal:  BMC Genomics       Date:  2005-02-28       Impact factor: 3.969
