Laura E Williams, Jennifer J Wernegreen.
Abstract
Indel mutations play key roles in genome and protein evolution, yet we lack a comprehensive understanding of how indels impact evolutionary processes. Genome-wide analyses enabled by next-generation sequencing can clarify the context and effect of indels, thereby integrating a more detailed consideration of indels with our knowledge of nucleotide substitutions. To this end, we sequenced Blochmannia chromaiodes, an obligate bacterial endosymbiont of carpenter ants, and compared it with the close relative, B. pennsylvanicus. The genetic distance between these species is small enough for accurate whole genome alignment but large enough to provide a meaningful spectrum of indel mutations. We found that indels are subjected to purifying selection in coding regions and even intergenic regions, which show a reduced rate of indel base pairs per kilobase compared with nonfunctional pseudogenes. Indels occur almost exclusively in repeat regions composed of homopolymers and multimeric simple sequence repeats, demonstrating the importance of sequence context for indel mutations. Despite purifying selection, some indels occur in protein-coding genes. Most are multiples of three, indicating selective pressure to maintain the reading frame. The deleterious effect of frameshift-inducing indels is minimized by either compensation from a nearby indel to restore reading frame or the indel's location near the 3'-end of the gene. We observed amino acid divergence exceeding nucleotide divergence in regions affected by frameshift-inducing indels, suggesting that these indels may either drive adaptive protein evolution or initiate gene degradation. Our results shed light on how indel mutations impact processes of molecular evolution underlying endosymbiont genome evolution.
Year: 2013 PMID: 23475937 PMCID: PMC3622351 DOI: 10.1093/gbe/evt033
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Gene Content in Blochmannia chromaiodes and B. pennsylvanicus
| Species | Size (bp) | GC Content (%) | Total Genes | Protein-Coding Genes | tRNA | rRNA | Other RNA | Pseudogenes | Frameshifted Genes |
|---|---|---|---|---|---|---|---|---|---|
| B. chromaiodes | 791,219 | 29.5 | 658 | 609 | 40 | 3 | 3 | 3 | 4 |
| B. pennsylvanicus | 791,654 | 29.6 | 658 | 609 | 40 | 3 | 3 | 3 | 4 |
Substitutions and Indels between Blochmannia chromaiodes and B. pennsylvanicus
| Sequence Type | Substitutions | Substitutions/kb | Indels | Indels/kb | Indel bp | Indel bp/kb |
|---|---|---|---|---|---|---|
| Protein-coding | 8,166 | 13.5 | 63 | 0.1 | 175 | 0.3 |
| RNA-coding | 45 | 5.2 | 6 | 0.7 | 21 | 2.4 |
| Pseudogene | 63 | 26.8 | 18 | 7.7 | 112 | 47.6 |
| Intergenic | 5,094 | 29.3 | 960 | 5.5 | 2,004 | 11.5 |
| Ambiguous | 21 | NA | 4 | NA | 9 | NA |
| Total | 13,389 | 16.9 | 1,051 | 1.3 | 2,321 | 2.9 |
Note.—NA, not applicable.
(a) To account for differences in the amount of each sequence type in the two genomes, rates were calculated using the B. chromaiodes and B. pennsylvanicus annotations and then averaged.
(b) “Ambiguous” refers to positions annotated differently in the two genomes (e.g., protein-coding in B. chromaiodes and intergenic in B. pennsylvanicus).
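The averaging described in the table footnotes can be illustrated with a small sketch. It is not the authors' script. The function name `averaged_rate_per_kb` is hypothetical, and the example assumes that the denominator for the Total row is simply the full genome size of each species (791,219 bp and 791,654 bp, from the gene content table). Per-category sequence lengths are not reported here, so only the Total row is checked.

```python
def averaged_rate_per_kb(count, length_a_bp, length_b_bp):
    """Mean of the per-kilobase rates computed against two sequence lengths.

    count        -- number of events (substitutions, indels, or indel bp)
    length_a_bp  -- amount of this sequence type in genome A (bp)
    length_b_bp  -- amount of this sequence type in genome B (bp)
    """
    rate_a = count / (length_a_bp / 1000)  # events per kb using annotation A
    rate_b = count / (length_b_bp / 1000)  # events per kb using annotation B
    return (rate_a + rate_b) / 2


# Assumption: the Total row uses whole-genome lengths as denominators.
GENOME_A_BP = 791_219
GENOME_B_BP = 791_654

total_substitutions_per_kb = averaged_rate_per_kb(13_389, GENOME_A_BP, GENOME_B_BP)
total_indels_per_kb = averaged_rate_per_kb(1_051, GENOME_A_BP, GENOME_B_BP)
total_indel_bp_per_kb = averaged_rate_per_kb(2_321, GENOME_A_BP, GENOME_B_BP)
```

Rounded to one decimal place, these three values agree with the Total row of the table (16.9, 1.3, and 2.9), which suggests the assumption about the denominator is at least reasonable for that row.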
Fig. Compensatory indels detected in a whole genome alignment of Blochmannia chromaiodes and B. pennsylvanicus. Regions of the whole genome alignment with translated amino acid sequences are shown. For each, the alternative alignment hypothesis with only nucleotide substitutions is also shown. The region of the whole genome alignment shown for yraP is reverse complemented.
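The reading-frame logic behind compensatory indels can be sketched in a few lines. This is an illustration only, not the method used in the study: the function names are hypothetical, indel lengths are given as signed integers (positive for insertions, negative for deletions), and the sketch ignores how far apart the indels are, which matters biologically.

```python
def net_frame_offset(indel_lengths):
    """Net reading-frame offset (0, 1, or 2) after applying all indels."""
    return sum(indel_lengths) % 3


def classify_indels(indel_lengths):
    """Classify a set of indels within one gene by their effect on the frame.

    'in-frame'    -- every indel is a multiple of three
    'compensated' -- individual indels shift the frame, but together they restore it
    'frameshift'  -- the reading frame stays shifted downstream of the indels
    """
    if all(length % 3 == 0 for length in indel_lengths):
        return "in-frame"
    if net_frame_offset(indel_lengths) == 0:
        return "compensated"
    return "frameshift"
```

For example, a single 3 bp insertion is classified as in-frame, a 1 bp insertion followed by a 1 bp deletion as compensated, and a lone 1 bp insertion as a frameshift. In the compensated case only the codons between the two indels are read in a shifted frame.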
Indels Occurring in the 3'-End of Blochmannia chromaiodes and B. pennsylvanicus Genes
| Gene | Indel Size (bp) | Repeat Context of Indel | Effect of Indel: Difference in Protein Length (aa) | Effect of Indel: Amino Acid Substitutions | Effect of Indel: Nucleotide Substitutions |
|---|---|---|---|---|---|
|  | 1 | 2 G hp | 5 | 1 | 0 |
|  | 1 | 8 A hp | 3 | 1 | 2 |
|  | 1 | 2 A hp | 1 | 3 | 0 |
|  | 2 | 5 AT mSSR | 0 | 0 | 0 |
|  | 4 | 2 TAAG mSSR | 0 | 0 | 0 |
|  | 1 | 4 T hp | 1 | 3 | 1 |
|  | 1 | 7 A hp | 4 | 1 | 0 |
|  | 1 | 8 A hp | 13 | 3 | 0 |
|  | 1 | 6 A hp | 5 | 1 | 0 |
|  | 2 | None | 3 | 3 | 3 |
|  | 1 | 4 C hp | 1 | 6 | 1 |
|  | 1 | 7 A hp | 2 | 1 | 0 |
|  | 1 | 2 T hp | 1 | 0 | 0 |
|  | 1 | 6 A hp | 2 | 3 | 0 |
|  | 1 | 4 A hp | 4 | 2 | 1 |
(a) Considering the region from the indel to the end of the longest gene sequence.
(b) hp refers to homopolymers, and mSSR refers to multimeric simple sequence repeats.
(c) Because this indel comprises a complete repeat unit at the exact end of the gene, there is no detectable effect on the gene or protein sequence.