Kirill V Mikhailov1,2, Boris D Efeykin2,3, Alexander Y Panchin2, Dmitry A Knorre1,4, Maria D Logacheva1,2,5, Aleksey A Penin1,2, Maria S Muntyan1, Mikhail A Nikitin1,2, Olga V Popova1, Olga N Zanegina1, Mikhail Y Vyssokikh1, Sergei E Spiridonov3, Vladimir V Aleoshin1,2, Yuri V Panchin1,2.
Abstract
Inverted repeats are common DNA elements, but they rarely overlap with protein-coding sequences due to the ensuing conflict with the structure and function of the encoded protein. We discovered numerous perfect inverted repeats of considerable length (up to 284 bp) embedded within the protein-coding genes in mitochondrial genomes of four Nematomorpha species. Strikingly, both arms of the inverted repeats encode conserved regions of the amino acid sequence. We confirmed enzymatic activity of the respiratory complex I encoded by inverted repeat-containing genes. The nucleotide composition of inverted repeats suggests strong selection at the amino acid level in these regions. We conclude that the inverted repeat-containing genes are transcribed and translated into functional proteins. The survey of available mitochondrial genomes reveals that several other organisms possess similar albeit shorter embedded repeats. Mitochondrial genomes of Nematomorpha demonstrate an extraordinary evolutionary compromise where protein function and stringent secondary structure elements within the coding regions are preserved simultaneously.
Year: 2019 PMID: 31194871 PMCID: PMC6649704 DOI: 10.1093/nar/gkz517
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Inverted repeats in the coding sequences of nad1 genes. (A) Schematic depiction of an amino acid alignment of nad1 sequences from four species of Nematomorpha; inverted repeats in the underlying nucleotide sequences are highlighted in purple or orange, regions of overlapping repeats are indicated with a striped pattern; a conservation profile of nad1, derived from an amino acid alignment of animal sequences (Materials and Methods section), is depicted below the alignment as an estimate of site rates. (B) A portion of the nad1 amino acid sequence alignment (sites 1–100) showing conservation of amino acids encoded by both arms of the inverted repeat in nematomorph sequences; the major inverted repeat (purple) and the shorter overlapping repeat (orange) are shown schematically above the alignment; the overlapping region of the repeats and the resulting direct repeat are indicated with a striped pattern.
Figure 2. Inverted repeat enrichment experiment with the mitochondrial DNA of Nematomorpha. (A) The DNA digestion procedure used to confirm the inverted repeat sequences: DNA denaturation, rapid renaturation, and treatment with a single-strand-specific mung bean nuclease. (B) Histogram of read depth over the mitochondrial genome of Gordius sp. with reads from a sequencing library of untreated DNA; inverted repeat regions are shaded gray. (C) Mapping of reads from a library of nuclease-treated DNA, showing the enrichment of inverted repeat sequences.
Figure 3. Mapping of reads from an RNA-Seq library of Gordionus alpestris. The histogram shows read depth over the mitochondrial genome of G. alpestris, with the inverted repeat regions shaded gray; the region containing the highly expressed rRNA genes was masked during mapping (indicated with a striped pattern).
Figure 4. Characterization of the three inverted repeat phases in coding sequences. (A) Three possible relative arrangements of codons in the complementary arms of inverted repeats: phase 1 repeats tie together codon positions 1 between the complementary arms of the repeat, phase 2 repeats tie together codon positions 2, and phase 3 repeats tie together codon positions 3. (B) Relative proportions of repeats of each phase as a function of the minimal repeat length in the genomes of Nematomorpha; the total number of detected repeats drops from 35 487 at a minimal length of 5 bp to 68 at 25 bp and to 31 at 50 bp. (C) Distributions of inverted repeat lengths by phase in nematomorph species (repeats over 15 bp), showing individual repeat occurrences.
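The phase of a repeat follows from modular arithmetic on its coordinates. Below is a minimal Python sketch. It assumes 0-based coordinates inside a coding sequence that begins at codon position 1, two equal-length arms on the same strand, and that base i+k of the left arm pairs with base j+length-1-k of the right arm. The function names are hypothetical and are not taken from the study.

```python
def codon_position(x: int) -> int:
    """1-based codon position of 0-based coordinate x (CDS assumed to start in frame)."""
    return x % 3 + 1


def repeat_phase(i: int, j: int, length: int) -> int:
    """Phase (1, 2 or 3) of an inverted repeat with arms starting at i and j.

    Base i+k pairs with base j+length-1-k, so every paired couple of
    coordinates has the same sum s = i + j + length - 1. The 0-based codon
    position c that pairs with itself satisfies 2c = s (mod 3), hence
    c = 2s (mod 3), because 2 is its own inverse modulo 3.
    """
    s = i + j + length - 1
    return (2 * s) % 3 + 1


def paired_positions(i: int, j: int, length: int) -> set[tuple[int, int]]:
    """Codon-position couples (left arm, right arm) tied together by the stem."""
    return {
        (codon_position(i + k), codon_position(j + length - 1 - k))
        for k in range(length)
    }


if __name__ == "__main__":
    # A 6-bp left arm at 0..5 paired with right arms starting at 10, 9 or 11.
    for j in (10, 9, 11):
        print(j, repeat_phase(0, j, 6), sorted(paired_positions(0, j, 6)))
```

Shifting the right arm by one base changes the phase. In phase 1, codon position 1 pairs with itself while positions 2 and 3 pair with each other. In phase 2, position 2 self-pairs and positions 1 and 3 pair. In phase 3, position 3 self-pairs and positions 1 and 2 pair.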
Characteristics of inverted repeat phases
| Characteristic | Phase 1 | Phase 2 | Phase 3 |
|---|---|---|---|
| Mean number of permitted amino acids | 3.84 | 3.21 | 1.59 |
| Mean entropy of amino acid states, bits | 1.20 | 1.11 | 0.50 |
| Mean entropy of amino acid states, bits (equal codon frequencies) | 1.80 | 1.63 | 0.45 |
| Observed number of repeats >15 bp | 47 | 22 | 41 |
| Observed cumulative length (bp) of repeats >15 bp | 2419 | 1207 | 1120 |
| Observed number of repeats >30 bp | 32 | 13 | 10 |
| Observed cumulative length (bp) of repeats >30 bp | 2101 | 1030 | 448 |
The number of permitted amino acids is the number of different amino acids that the reverse strand can accommodate without changing the amino acids encoded by the forward strand, averaged across the length of the sequence. Entropy measures the same quantity while taking into account either the naturally observed or equal codon frequencies (Materials and Methods section).
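Brute force can illustrate why phase 3 permits so few amino acids. Below is a rough Python sketch. It assumes NCBI translation table 5 (the invertebrate mitochondrial code), equal weights for synonymous codons and for amino acid pairs, and that a stop codon on the complementary arm does not count as a permitted amino acid. The codon offsets follow from the phase definitions in Figure 4A. The helper names are hypothetical. The toy averages are not expected to match the tabulated values, which were averaged along real sequences (Materials and Methods section).

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial code), codons in TCAG order.
TABLE5 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), TABLE5)}
AMINO_ACIDS = sorted(set(TABLE5) - {"*"})
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Offset of the complementary-arm codon inside revcomp(codon_prev + codon_next):
# phase 2 reads revcomp(codon_next) as a whole, while phases 1 and 3 read a
# codon that straddles both forward codons.
OFFSET = {1: 2, 2: 0, 3: 1}


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def synonymous(aa: str) -> list[str]:
    return [codon for codon, a in CODE.items() if a == aa]


def permitted(aa_prev: str, aa_next: str, phase: int) -> set[str]:
    """Amino acids the complementary arm can encode opposite the forward
    amino acid pair (aa_prev, aa_next), over all synonymous codon choices.
    Stop codons on the complementary arm are not counted."""
    off = OFFSET[phase]
    allowed = set()
    for c1, c2 in product(synonymous(aa_prev), synonymous(aa_next)):
        aa = CODE[revcomp(c1 + c2)[off:off + 3]]
        if aa != "*":
            allowed.add(aa)
    return allowed


def mean_permitted(phase: int) -> float:
    """Toy average over all ordered amino acid pairs, each weighted equally."""
    pairs = list(product(AMINO_ACIDS, repeat=2))
    return sum(len(permitted(a, b, phase)) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(permitted("G", "W", 3))  # Gly followed by Trp on the forward arm
    for phase in (1, 2, 3):
        print(phase, round(mean_permitted(phase), 2))
```

In phase 3, the first two bases of each complementary codon are the complements of the first two bases of a single forward codon. Those bases are nearly fixed by the encoded amino acid, so only the wobble base of the neighbouring codon is left free. In phases 1 and 2, a forward wobble base lands in a complementary first or second position, and those positions largely determine the amino acid.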
Figure 5. Concerted evolution of the complementary arms of inverted repeats in Nematomorpha. (A) Scatter plot of inverted repeat length versus number of matching sites for repeats discovered by einverted (Materials and Methods section), allowing imperfect matches between the arms; nematomorph repeats (green) cluster on or near the diagonal; the striped region of the plot corresponds to alignment scores below the threshold under the default einverted scoring scheme with a score cutoff of 15. (B) A portion of read alignments with libraries from two individuals of Gordionus alpestris, featuring the central region of a hairpin in nad2; the reference sequence (above) is based on the library of the first individual; the translated sequence is given on top of the alignment; the hairpin arrangement is indicated below the alignment in parenthesis notation (a fuller version of the alignments is given in Supplementary Figure S8).
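Parenthesis (dot-bracket) notation marks each paired base with a matching bracket and each unpaired base with a dot. Below is a small Python sketch for a single perfect stem. It assumes uppercase ACGT input and hypothetical 0-based coordinates (left-arm start, right-arm start, arm length). It does not reproduce einverted, which scores imperfect alignments.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_perfect_stem(seq: str, i: int, j: int, length: int) -> bool:
    """True if seq[j:j+length] is the exact reverse complement of seq[i:i+length]."""
    return all(
        COMPLEMENT[seq[i + k]] == seq[j + length - 1 - k] for k in range(length)
    )


def dot_bracket(seq: str, i: int, j: int, length: int) -> str:
    """Parenthesis notation for one perfect stem; unpaired bases are dots."""
    if i + length > j:
        raise ValueError("the left arm must end before the right arm starts")
    if not is_perfect_stem(seq, i, j, length):
        raise ValueError("arms are not perfectly complementary")
    marks = ["."] * len(seq)
    for k in range(length):
        marks[i + k] = "("               # left-arm base
        marks[j + length - 1 - k] = ")"  # its partner on the right arm
    return "".join(marks)


if __name__ == "__main__":
    seq = "GCATGAAACATGC"
    print(seq)
    print(dot_bracket(seq, 0, 8, 5))
```

For the toy sequence, the 5-bp arms GCATG and CATGC enclose a 3-nt loop. The second printed line is `(((((...)))))`.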
Figure 6. Characteristics of perfect inverted repeats in the protein-coding sequences of mitochondrial genomes. (A) Over-representation of inverted repeats in the mitochondrial coding sequences of eukaryotes other than Nematomorpha (red) relative to sequences with randomly shuffled codons (10 replicates, gray). (B) Plot of perfect inverted repeat length versus repeat AT content in the mitochondrial coding sequences of Nematomorpha (blue) and other eukaryotes (red).
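The comparison in panel (A) and the quantity plotted in panel (B) can be sketched in Python. The sketch assumes a single in-frame coding sequence of uppercase ACGT and uses the minimal arm length k as a seed length. It imposes no minimal loop length and uses a simple seed-and-extend search. All function names are hypothetical, and the authors' search procedure and thresholds are not described here.

```python
import random
from collections import defaultdict

COMP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    return "".join(COMP[b] for b in reversed(seq))


def perfect_inverted_repeats(seq: str, k: int) -> list[tuple[int, int, int]]:
    """Maximal perfect inverted repeats with arms of at least k bp.

    Each k-mer is matched against the reverse complement of downstream k-mers.
    Only the outermost seed of each stem is kept, and the stem is then
    extended inward. Returns (left_start, right_end, arm_length) with 0-based,
    inclusive coordinates.
    """
    n = len(seq)
    positions = defaultdict(list)
    for p in range(n - k + 1):
        positions[seq[p:p + k]].append(p)
    stems = []
    for i in range(n - k + 1):
        for jr in positions.get(revcomp(seq[i:i + k]), []):
            if jr < i + k:  # right arm must lie entirely downstream
                continue
            e = jr + k - 1
            if i > 0 and e + 1 < n and seq[e + 1] == COMP[seq[i - 1]]:
                continue  # not the outermost seed of this stem
            length = k
            while i + length < e - length and seq[e - length] == COMP[seq[i + length]]:
                length += 1
            stems.append((i, e, length))
    return stems


def shuffle_codons(cds: str, rng: random.Random) -> str:
    """Randomly permute whole codons, keeping codon usage but not codon order."""
    codons = [cds[p:p + 3] for p in range(0, len(cds) - 2, 3)]
    rng.shuffle(codons)
    return "".join(codons)


def at_content(seq: str) -> float:
    return (seq.count("A") + seq.count("T")) / len(seq)


def compare_with_shuffled(cds: str, k: int, replicates: int = 10, seed: int = 0):
    """Repeat count in the real CDS and in codon-shuffled replicates."""
    rng = random.Random(seed)
    observed = len(perfect_inverted_repeats(cds, k))
    shuffled = [
        len(perfect_inverted_repeats(shuffle_codons(cds, rng), k))
        for _ in range(replicates)
    ]
    return observed, shuffled


if __name__ == "__main__":
    seq = "GCATGAAACATGC"
    for start, end, length in perfect_inverted_repeats(seq, 4):
        # Both arms are complementary, so either arm gives the same AT content.
        print(start, end, length, at_content(seq[start:start + length]))
```

Shuffling whole codons keeps codon usage and nucleotide composition but breaks codon order. An excess of repeats in the real sequence over the shuffled replicates therefore points to the arrangement of codons rather than to composition alone.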
Figure 7. Schematic depiction of amino acid sequence alignments with mitochondrial genes of Nematomorpha (Ch, Chordodes sp.; Gw, Gordionus wolterstorffii; Ga, Gordionus alpestris; G, Gordius sp.), featuring repeat-containing genes of other invertebrates: Ls, Lepidodermella squamata; Aa, Aleurochiton aceris; He, Hoploplana elisabelloi; Tc, Thaumamermis cosgrovei. The positions of inverted repeats in sequences are marked with colors corresponding to the three inverted repeat phases, and the regions of overlap between repeats of different phases are marked with a striped pattern. Red pins mark midpoint positions of hairpins that are shared by at least three species. Each alignment is supplemented with a conservation profile corresponding to the site rate estimates inferred from a concatenation of amino acid alignments of animal sequences (Materials and Methods section).