David L J Vendrami¹, Toni I Gossmann¹, Nayden Chakarov¹, Anneke J Paijmans¹, Vivienne Litzke¹, Adam Eyre-Walker², Jaume Forcada³, Joseph I Hoffman¹,³.
Abstract
Nuclear copies of mitochondrial genes (numts) are commonplace in vertebrate genomes and have been characterized in many species. However, relatively little attention has been paid to understanding their evolutionary origins and to disentangling alternative sources of insertions. Numts that contain genes with intact mitochondrial reading frames are good candidates for this purpose. The sequences of these genes can be compared with their mitochondrial homologs to estimate the ratio of nonsynonymous to synonymous substitution rates (dN/dS), which sheds light on the selection pressures the genes have experienced. Here, we characterize 25 numts in the Antarctic fur seal (Arctocephalus gazella) genome. Among those containing genes with intact mitochondrial reading frames, three carry multiple substitutions relative to their mitochondrial homologs. Our analyses reveal that one of these is a historic insertion that colonized the Otarioidea in a genomic region enriched in retrotransposons and has been subject to strong purifying selection since then. By contrast, the other two numts appear to be more recent. Their large numbers of substitutions can be attributed to noncanonical insertions, arising either from the integration of heteroplasmic mtDNA or from hybridization. Our study sheds new light on the evolutionary history of pinniped numts and uncovers hidden sources of mitonuclear variation.
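The comparison described above starts by classifying each difference between a numt gene and its mitochondrial homolog. A difference is synonymous if it leaves the encoded amino acid unchanged and nonsynonymous if it alters it. The sketch below is a simplified illustration, not the authors' pipeline. It makes these assumptions:

- the two sequences are already aligned in frame, without gaps, and contain only A, C, G and T;
- each differing site is scored by introducing that single change into the mitochondrial codon, rather than by averaging over mutational pathways as Nei-Gojobori-type methods do;
- translation uses NCBI genetic code table 2 (vertebrate mitochondrial), because the reference is a mitochondrial gene.

The helper name `count_substitutions` is hypothetical.

```python
from itertools import product

# NCBI translation table 2 (vertebrate mitochondrial code). Codons are listed
# in TCAG order with the third position varying fastest: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)
}


def count_substitutions(mt_seq: str, numt_seq: str) -> tuple[int, int]:
    """Return (synonymous, nonsynonymous) site differences between two
    aligned, in-frame, gap-free coding sequences (hypothetical helper)."""
    mt_seq, numt_seq = mt_seq.upper(), numt_seq.upper()
    if len(mt_seq) != len(numt_seq) or len(mt_seq) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(mt_seq), 3):
        ref, alt = mt_seq[i:i + 3], numt_seq[i:i + 3]
        for pos in range(3):
            if ref[pos] == alt[pos]:
                continue
            # Score this site alone: put the numt base into the mitochondrial codon.
            mutant = ref[:pos] + alt[pos] + ref[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[ref]:
                syn += 1
            else:
                nonsyn += 1
    return syn, nonsyn


# AAA->AAG (Lys->Lys) and TGA->TGG (Trp->Trp in vertebrate mtDNA) are synonymous;
# CCC->CTC (Pro->Leu) is nonsynonymous.
print(count_substitutions("ATGAAACCCTGA", "ATGAAGCTCTGG"))  # (2, 1)
```

Under the standard nuclear code, TGA would be read as a stop codon. This is why the choice of genetic code matters when mitochondrial-origin sequences are compared.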
Keywords: Arctocephalus gazella; Antarctic fur seal; genome evolution; mitochondrial DNA; nuclear DNA sequences of mitochondrial origin (numts); pinniped
Year: 2022 PMID: 35809042 PMCID: PMC9338431 DOI: 10.1093/gbe/evac104
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 4.065
Fig. 1. Summary of the 25 numts characterized in A. gazella, including their mitochondrial origins, locations in the nuclear genome, and patterns of phylogenetic conservation. (a) The mitochondrial origin of each numt (outer colored arcs). The innermost circle represents the A. gazella mitochondrial genome. Light grey sectors represent tRNA and rRNA genes, and dark grey sectors represent the protein-coding genes (a: ND1, b: ND2, c: COX1, d: COX2, e: ATP8, f: ATP6, g: COX3, h: ND3, i: ND4L, j: ND4, k: ND5, l: ND6, m: CYTB). The white sector represents the D-loop region. Numts are color-coded according to the nuclear chromosome on which they are located, as shown in (b). Yellow numts, which are absent from (b), were found on unplaced genomic scaffolds. Fragments belonging to the same numt are connected by solid black lines, whereas fragments belonging to the same numt that originate from overlapping mitochondrial regions are connected by dashed lines. (b) The locations of the numts in the A. gazella nuclear genome. Each colored arc connects the relevant region of the mitochondrial genome (grey) to the corresponding nuclear chromosome. Numbers indicate positions along the mitochondrial genome (kb) and along the nuclear chromosomes (Mb). (c) Results of BLAST searches of each numt against the nuclear genomes of A. gazella and other pinniped species (Z. californianus, E. jubatus, C. ursinus, O. rosmarus, H. grypus, P. vitulina, L. weddellii, M. angustirostris, M. leonina, and M. schauinslandi). The phylogenetic relationships of these species are shown by the tree on the left (redrawn from Higdon et al. 2007). Each column represents a single numt. Each colored square shows the results obtained when a given numt was blasted against the nuclear genome of another species. Square size is proportional to query cover, and the color gradient indicates percentage sequence identity (see key).
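Panel (c) summarizes each BLAST search by query cover and percentage identity. The function below is a rough sketch of how such summaries can be derived from hits in BLAST+ tabular format (`-outfmt 6`). That format's default 12 columns are:

- query ID and subject ID;
- pident, length, mismatch, gapopen;
- qstart, qend, sstart, send;
- evalue, bitscore.

The sketch takes query cover as the fraction of the numt covered by the union of its high-scoring segment pairs (HSPs). It averages identity weighted by alignment length. Both definitions are assumptions for illustration; BLAST's own `qcovs` field and the authors' exact procedure may differ. The example hits are invented.

```python
def summarize_hits(tabular_lines, query_length):
    """Return (query cover %, length-weighted identity %) for one query/subject
    pair from BLAST+ -outfmt 6 lines (default 12-column layout assumed)."""
    intervals = []
    weighted_identity = 0.0
    aligned_length = 0
    for line in tabular_lines:
        fields = line.rstrip("\n").split("\t")
        pident, length = float(fields[2]), int(fields[3])
        qstart, qend = int(fields[6]), int(fields[7])
        intervals.append((min(qstart, qend), max(qstart, qend)))
        weighted_identity += pident * length
        aligned_length += length
    if not intervals:
        return 0.0, 0.0
    # Merge overlapping or adjacent query intervals so no base is counted twice.
    intervals.sort()
    covered = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end + 1:
            cur_end = max(cur_end, end)
        else:
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    covered += cur_end - cur_start + 1
    return 100.0 * covered / query_length, weighted_identity / aligned_length


# Two invented HSPs for a hypothetical 200-bp numt; they overlap on query bases 51-100.
hits = [
    "numt_x\tchr_y\t90.00\t100\t10\t0\t1\t100\t500\t599\t1e-30\t150",
    "numt_x\tchr_y\t80.00\t100\t20\t0\t51\t150\t800\t899\t1e-20\t120",
]
print(summarize_hits(hits, query_length=200))  # (75.0, 85.0)
```

Merging intervals before summing matters for multi-fragment numts. Overlapping HSPs would otherwise inflate query cover beyond the bases actually matched.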
The circular plots were drawn using the R package circlize (Gu et al. 2014).
Summary Information for All 25 numts Characterized in the Antarctic Fur Seal Nuclear Genome
| Numt ID | Nuclear Scaffold | Start (bp) | End (bp) | Chromosome | Number of Fragments | Numt Length (bp) | mtDNA Coordinates | Expected PCR Product |
|---|---|---|---|---|---|---|---|---|
| 1 | ScWAj4l_1;HRSCAF = 3 | 127934607 | 127935176 | 9 | 1 | 569 | 9,566–10,139 | Yes |
| 2 | ScWAj4l_2;HRSCAF = 13 | 74612525 | 74621691 | 8 | 4 | 9,166 | 2,107–5,497; 7,736–10,840; 5,454–7,653; 11,628–12,127 | Yes |
| 3 | ScWAj4l_18;HRSCAF = 239 | 84876333 | 84889827 | 14 | 2 | 13,494 | 3,018–13,905; 6,301–8,931 | No |
| 4 | ScWAj4l_37;HRSCAF = 538 | 184440549 | 184440843 | 2 | 1 | 294 | 15,767–16,061 | Yes |
| 5 | ScWAj4l_43;HRSCAF = 587 | 54190362 | 54194849 | 10 | 2 | 4,487 | 9,871–11,748; 11,733–13,078 | Yes |
| 6 | ScWAj4l_44;HRSCAF = 590 | 2316056 | 2316387 | 7 | 1 | 331 | 8,723–9,062 | No |
| 7 | ScWAj4l_44;HRSCAF = 590 | 7718809 | 7728008 | 7 | 2 | 9,199 | 1–2,801; 10,088–16,156 | No |
| 8 | ScWAj4l_44;HRSCAF = 590 | 7977588 | 7991731 | 7 | 4 | 14,143 | 9,436–16,156; 1–1,616; 2,738–6,806; 1,597–2,721 | Yes |
| 9 | ScWAj4l_46;HRSCAF = 602 | 45289204 | 45289714 | 11 | 1 | 510 | 11,396–11,907 | Yes |
| 10 | ScWAj4l_46;HRSCAF = 602 | 85808165 | 85817409 | 11 | 4 | 9,244 | 104–428; 1,757–2,257; 11,770–12,369; 14,387–15,304 | Yes |
| 11 | ScWAj4l_49;HRSCAF = 609 | 50930992 | 50931325 | 3 | 1 | 333 | 1–326 | Yes |
| 12 | ScWAj4l_82;HRSCAF = 715 | 87894123 | 87902946 | 6 | 4 | 8,823 | 3,942–5,516; 2,319–3,843; 10,185–10,984; 9,671–9,955 | Yes |
| 13 | ScWAj4l_82;HRSCAF = 715 | 88838208 | 88839332 | 6 | 1 | 1,124 | 6,301–7,449 | Yes |
| 14 | ScWAj4l_84;HRSCAF = 721 | 138510899 | 138511378 | 1 | 1 | 479 | 14,905–15,381 | Yes |
| 15 | ScWAj4l_133;HRSCAF = 810 | 20791 | 26530 | Unplaced scaffold | 3 | 5,739 | 2,968–5,471; 1,597–2,721; 74–1,616 | Yes |
| 16 | ScWAj4l_2350;HRSCAF = 3197 | 2 | 2125 | Unplaced scaffold | 2 | 2,123 | 2,177–2,819; 3,037–3,535 | No |
| 17 | ScWAj4l_2441;HRSCAF = 3292 | 24002125 | 24007610 | 12 | 3 | 5,485 | 74–1,616; 2,968–4,007; 5,120–5,471 | Yes |
| 18 | ScWAj4l_2441;HRSCAF = 3292 | 24086242 | 24091805 | 12 | 3 | 5,563 | 74–1,616; 2,738–5,471; 1,597–2,721 | Yes |
| 19 | ScWAj4l_2441;HRSCAF = 3292 | 24285509 | 24299708 | 12 | 3 | 14,199 | 74–1,198; 2,738–5,471; 6,202–6,875 | Yes |
| 20 | ScWAj4l_2441;HRSCAF = 3292 | 24425761 | 24432730 | 12 | 3 | 6,969 | 74–1,616; 2,968–6,806; 1,597–2,671 | Yes |
| 21 | ScWAj4l_2441;HRSCAF = 3292 | 24605035 | 24611940 | 12 | 3 | 6,905 | 2,968–6,806; 74–1,198; 1,597–2,671 | Yes |
| 22 | ScWAj4l_2441;HRSCAF = 3292 | 24665417 | 24670984 | 12 | 3 | 5,567 | 79–1,616; 2,740–5,471; 1,597–2,721 | Yes |
| 23 | ScWAj4l_2441;HRSCAF = 3292 | 53287344 | 53289309 | 12 | 2 | 1,965 | 1,917–2,215; 78–432 | No |
| 24 | ScWAj4l_2441;HRSCAF = 3292 | 60968082 | 60971080 | 12 | 2 | 2,998 | 3,905–4,582; 7,807–8,060 | Yes |
| 25 | ScWAj4l_4416;HRSCAF = 5370 | 222 | 1965 | Unplaced scaffold | 1 | 1,743 | 8,175–9,955 | Yes |
Note.— Locations in the A. gazella nuclear genome are reported together with information on the number of numt fragments, total length, mitochondrial location of origin, and whether PCR amplification using nuclear genome-specific primers yielded a product of the expected size.
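In this table, Numt Length equals End minus Start; for numt 1, 127,935,176 − 127,934,607 = 569 bp. The mtDNA Coordinates column lists one en-dash range per fragment, separated by semicolons, so the number of ranges matches the Number of Fragments. The parsing sketch below recovers these values from the table text. It is a convenience for checking the table, not part of the authors' workflow, and both helper names are hypothetical.

```python
def parse_mt_ranges(cell: str) -> list[tuple[int, int]]:
    """Parse an mtDNA Coordinates cell such as '2,107–5,497; 7,736–10,840'
    into (start, end) tuples; ranges use an en dash and thousands separators."""
    ranges = []
    for part in cell.split(";"):
        start, end = part.strip().split("\u2013")  # U+2013 en dash
        ranges.append((int(start.replace(",", "")), int(end.replace(",", ""))))
    return ranges


def numt_length(start: int, end: int) -> int:
    """Numt length as reported in the table: nuclear End minus Start."""
    return end - start


# Numt 2 from the table: four fragments spanning 9,166 bp of chromosome 8.
frags = parse_mt_ranges("2,107–5,497; 7,736–10,840; 5,454–7,653; 11,628–12,127")
print(len(frags), numt_length(74612525, 74621691))  # 4 9166
```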
The Number of Synonymous and Nonsynonymous Substitutions, Together with Relative dN and dS Measures, Between Protein-Coding Genes Residing in the numt Fragments and Their mtDNA Homologs
| Numt | Fragment | Gene | Accession Number | Synonymous Substitutions | Nonsynonymous Substitutions | dN | dS | dN/dS | P-value |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | ND3 | DAC80253.1 | 48 | 14 | 0.0564 | 1.1062 | 0.051 | <0.01 |
| 2 | 1 | ATP8 | DAC80250.1 | 0 | 0 | — | — | — | — |
| 2 | 1 | ATP6 | DAC80251.1 | 0 | 0 | — | — | — | — |
| 2 | 1 | COX3 | DAC80252.1 | 0 | 0 | — | — | — | — |
| 2 | 1 | ND3 | DAC80253.1 | 0 | 0 | — | — | — | — |
| 2 | 1 | ND4L | DAC80254.1 | 1 | 0 | — | 0.0148 | 0 | — |
| 2 | 3 | ND2 | DAC80247.1 | 46 | 16 | 0.0218 | 0.1881 | 0.1160 | <0.01 |
| 3 | 1 | COX1 | DAC80248.1 | 0 | 1 | 0.0009 | — | — | — |
| 3 | 1 | ATP8 | DAC80250.1 | 0 | 0 | — | — | — | — |
| 3 | 1 | COX3 | DAC80252.1 | 13 | 4 | 0.0066 | 0.0835 | 0.0794 | <0.01 |
| 3 | 1 | ND3 | DAC80253.1 | 5 | 2 | 0.0075 | 0.0719 | 0.1042 | <0.01 |
| 3 | 1 | ND4L | DAC80254.1 | 4 | 0 | — | 0.0683 | 0 | — |
| 3 | 1 | ND4 | DAC80255.1 | 24 | 5 | 0.0052 | 0.0615 | 0.085 | <0.01 |
| 3 | 1 | ND5 | DAC80256.1 | 30 | 5 | 0.0038 | 0.0667 | 0.0562 | <0.01 |
| 3 | 2 | COX2 | DAC80249.1 | 7 | 1 | 0.0019 | 0.0472 | 0.0404 | <0.01 |
| 3 | 2 | ATP8 | DAC80250.1 | 0 | 1 | 0.0068 | — | — | — |
| 3 | 2 | ATP6 | DAC80251.1 | 4 | 5 | 0.0103 | 0.022 | 0.4668 | 0.2145 |
| 8 | 3 | ND4L | DAC80254.1 | 4 | 0 | — | 0.0683 | 0 | — |
| 8 | 3 | ND4 | DAC80255.1 | 0 | 0 | — | — | — | — |
| 8 | 3 | ND5 | DAC80256.1 | 0 | 0 | — | — | — | — |
| 8 | 3 | CYTB | DAC80258.1 | 0 | 0 | — | — | — | — |
Note.—Gene names and NCBI accession numbers are provided for each mitochondrial homolog.
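The dN/dS column is dN divided by dS. For ND3 in numt 1, 0.0564 / 1.1062 ≈ 0.051; small discrepancies in other rows reflect rounding of dN and dS. The ratio cannot be computed when dS is zero or missing, and the table then shows a dash. Divergences such as dN and dS are often obtained from the proportions of differing synonymous and nonsynonymous sites by applying the Jukes-Cantor correction d = −(3/4) ln(1 − 4p/3), as in the Nei-Gojobori method. Whether the authors used this correction is not stated in this section, so the sketch below is illustrative only. Treating a missing dN with positive dS as zero is an assumption based on the ND4L rows.

```python
import math
from typing import Optional


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected divergence from a proportion of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("JC correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(dn: Optional[float], ds: Optional[float]) -> Optional[float]:
    """dN/dS ratio; None (shown as a dash in the table) when dS is 0 or missing.
    A missing dN with a positive dS is treated as 0 (no nonsynonymous change)."""
    if ds is None or ds <= 0:
        return None
    return (dn or 0.0) / ds


print(round(dn_ds(0.0564, 1.1062), 3))  # 0.051  (numt 1, ND3)
print(dn_ds(None, 0.0148))               # 0.0    (numt 2, ND4L)
print(dn_ds(0.0009, None))               # None   (numt 3, COX1)
```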
Fig. 2. Phylogenetic reconstruction of numt 1. The phylogenetic tree was built using the nuclear sequences of numt 1 found in A. gazella, Z. californianus, E. jubatus, C. ursinus, and O. rosmarus, together with homologous mtDNA sequences from all pinnipeds with an available mitochondrial reference genome (see Materials and Methods). The branch containing the nuclear sequences is shown in red.
Summary Statistics of the Substitution Rate Ratio Models Implemented to Test for Functional Conservation of the ND3 Gene Located in numt 1
| Model | Site Class | Proportion | dN/dS (mitochondria) | dN/dS (numts) | LogL |
|---|---|---|---|---|---|
| One-ratio | / | / | 0.01613 | 0.01613 | −1828.804375 |
| Two-ratio | / | / | 0.01105 | 0.99424 | −1811.746294 |
| Two-ratio (fixed) | / | / | 0.01105 | 1 | −1811.746326 |
| Clade C | 1 | 0.57666 | 0 | 0 | −1802.187057 |
|  | 2 | 0 | 1 | 1 |  |
|  | 3 | 0.42334 | 0.02947 | 2.7993 |  |
| Clade C (fixed) | 1 | 0.55786 | 0 | 0 | −1803.236605 |
|  | 2 | 0 | 1 | 1 |  |
|  | 3 | 0.44214 | 0.02856 | 1 |  |
| M1A | 1 | 0.97937 | 0.02063 | 0.02063 | −1827.229958 |
|  | 2 | 0.02063 | 1 | 1 |  |
| M22 | 1 | 0.90499 | 0.00806 | 0.00806 | −1815.988374 |
|  | 2 | 0 | 1 | 1 |  |
|  | 3 | 0.09501 | 0.13246 | 0.13246 |  |
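Note.—For each model, we report dN/dS ratios for mitochondrial and nuclear sequences separately, together with the associated log-likelihood value. The one-ratio, M1A, and M22 models do not distinguish between lineages, so their estimates apply equally to mitochondrial and nuclear sequences. In the "(fixed)" models, the nuclear dN/dS ratio is fixed at 1; for Clade C, this applies to site class 3. Models "Clade C," "Clade C (fixed)," "M1A," and "M22" account for heterogeneity of the dN/dS ratio across the protein sequence, so sites are partitioned into classes (two for M1A, three for the other models). Site class 1 corresponds to sites under purifying selection (dN/dS < 1), site class 2 to neutral sites (dN/dS = 1), and site class 3 to sites whose dN/dS ratio is unrestricted.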
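Nested models like these are conventionally compared with a likelihood ratio test. Twice the difference in log likelihood is compared against a χ² distribution whose degrees of freedom equal the difference in free parameters. The sketch below applies this to two pairs from the table, assuming 1 degree of freedom for each:

- the two-ratio model adds one dN/dS parameter to the one-ratio model;
- Clade C (fixed) constrains one parameter of Clade C to 1.

This illustrates the arithmetic and does not reproduce the authors' tests. To stay within the standard library, the χ² upper tail is implemented only for 1 and 2 degrees of freedom.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """2 * (lnL_alt - lnL_null), floored at 0 to absorb optimizer noise."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for df = 1 or 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise NotImplementedError("only df = 1 or 2 supported in this sketch")


# Log-likelihoods from the table: (null model, alternative model, assumed df).
comparisons = {
    "one-ratio vs two-ratio": (-1828.804375, -1811.746294, 1),
    "Clade C (fixed) vs Clade C": (-1803.236605, -1802.187057, 1),
}
for name, (null, alt, df) in comparisons.items():
    stat = lrt_statistic(null, alt)
    print(f"{name}: 2*dlnL = {stat:.4f}, P = {chi2_sf(stat, df):.3g}")
```

For df = 1, the tail probability equals `erfc(sqrt(x/2))` because a χ²₁ variable is the square of a standard normal. This avoids a dependency such as SciPy for this small check.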