Torres Lucas, Bretagnolle Vincent, Pante Eric.
Abstract
Mitochondrial DNA (mtDNA) translocated into the nuclear genome (numt), when co-analysed with genuine mtDNA, could plague phylogeographic studies. To evaluate numt-related biases in population genetics parameters in birds, which are prone to accumulating numts, we targeted the mitochondrial mt-cytb gene. We looked at 13 populations of Audubon's shearwater (Puffinus lherminieri), including five mitochondrial lineages. mt-cytb homologue and paralogue (numt) sequences were determined by Sanger sequencing with and without prior exonuclease digestion of nuclear DNA. Numts formed monophyletic clades corresponding to three of the five mitochondrial lineages tested (the remaining two forming a paraphyletic group). Nineteen percent of numt alleles fell outside of their expected mitochondrial clade, a pattern consistent with multiple translocation events, incomplete lineage sorting (ILS), and/or introgression. When co-analysing mt-cytb paralogues and homologues, excluding individuals with ambiguities underestimates genetic diversity (4%) and differentiation (11%) among least-sampled populations. Removing ambiguous sites drops the proportion of inter-lineage genetic variance by 63%. While co-analysing numts with mitochondrial sequences can lead to severe bias and information loss in bird phylogeographic studies, the separate analysis of genuine mitochondrial loci and their nuclear paralogues can shed light on numt molecular evolution, as well as evolutionary processes such as ILS and introgression.
Keywords: mito-nuclear discordance; mitochondrial DNA; numts; pseudogenes; shearwater; transposable elements
Year: 2022 PMID: 35719890 PMCID: PMC9198517 DOI: 10.1098/rsos.211888
Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 3.653
Figure 1. Distribution of the breeding colonies sampled for this study.
Summary of the presence of ambiguous sequences within each tested lineage and population, among the 201 sequenced shearwater individuals.
| lineage | population | no. sequenced individuals | no. ambiguous sequences | proportion of ambiguous sequences | average number of ambiguous sites per sequence |
|---|---|---|---|---|---|
| | Allencay | 17 | 4 | 0.24 | 14 |
| | Longcay | 17 | 4 | 0.24 | 22 |
| | Martinique | 10 | 5 | 0.50 | 33 |
| | St Barthélemy | 8 | 5 | 0.63 | 27 |
| | Raso | 16 | 12 | 0.75 | 40 |
| | Cima | 18 | 5 | 0.28 | 10 |
| | Mclara | 14 | 7 | 0.50 | 19 |
| | Vila | 17 | 5 | 0.29 | 36 |
| | Selvagem | 4 | 1 | 0.25 | 10 |
| | Funchal | 3 | 0 | 0 | 0 |
| | North Reunion | 24 | 7 | 0.29 | 29 |
| | South Reunion | 25 | 11 | 0.46 | 36 |
| | Seychelles | 28 | 9 | 0.32 | 61 |
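
The per-population columns above can be illustrated with a minimal Python sketch. It assumes that ambiguous sites are recorded as IUPAC ambiguity codes in the Sanger consensus, and that the average number of ambiguous sites is taken over ambiguous sequences only; neither detail is stated in this record, and the function names are hypothetical.

```python
# IUPAC codes that stand for more than one possible base (assumed encoding
# of double peaks in the Sanger consensus; not confirmed by the source).
AMBIGUITY_CODES = set("RYSWKMBDHVN")


def count_ambiguous_sites(seq):
    """Number of positions in one sequence carrying an ambiguity code."""
    return sum(1 for base in seq.upper() if base in AMBIGUITY_CODES)


def summarise_population(seqs):
    """Summarise ambiguity for the sequences of a single population."""
    counts = [count_ambiguous_sites(s) for s in seqs]
    ambiguous = [c for c in counts if c > 0]
    n = len(seqs)
    return {
        "n_sequenced": n,
        "n_ambiguous": len(ambiguous),
        "proportion_ambiguous": len(ambiguous) / n if n else 0.0,
        # Assumption: mean over ambiguous sequences only.
        "mean_ambiguous_sites": sum(ambiguous) / len(ambiguous) if ambiguous else 0.0,
    }


if __name__ == "__main__":
    toy = ["ACGT", "ACRT", "AYRT", "ACGT"]  # toy data, not from the study
    print(summarise_population(toy))
```

As a check on the arithmetic, 4 ambiguous sequences out of 17 individuals gives 4/17 ≈ 0.24, the proportion reported for Allencay and Longcay.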
Differences in codon-position bias and in the transition (Ti) to transversion (Tv) ratio between mitochondrial (clean) and numt sequences.
| lineage | no. sequences | clean: first position | clean: second position | clean: third position | numt: first position | numt: second position | numt: third position | clean: Ti | clean: Tv | clean: Ti/Tv ratio | numt: Ti | numt: Tv | numt: Ti/Tv ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 18 | 5 | 0 | 10 | 20 | 10 | 39 | 13 | 2 | 6.5 | 54 | 15 | 3.6 |
| | 17 | 2 | 0 | 8 | 8 | 0 | 36 | 9 | 1 | 9 | 41 | 5 | 8.2 |
| | 13 | 1 | 0 | 6 | 9 | 1 | 27 | 7 | 0 | Inf | 33 | 4 | 8.3 |
| | 18 | 1 | 0 | 1 | 11 | 3 | 41 | 2 | 0 | Inf | 49 | 6 | 7.1 |
| | 9 | 1 | 0 | 2 | 9 | 3 | 30 | 3 | 0 | Inf | 38 | 6 | 6.3 |
| all lineages | 75 | 13 | 1 | 47 | 31 | 13 | 77 | 57 | 6 | 9.5 | 101 | 25 | 4 |
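
The Ti/Tv columns follow from simple counts, which the Python sketch below illustrates. It classifies substitutions between two aligned sequences as transitions (purine to purine or pyrimidine to pyrimidine) or transversions, then forms the ratio, returning infinity when no transversion is observed, as in the "Inf" entries of the table. The function names are hypothetical, and the pairwise comparison is an assumption; the record does not say how substitutions were counted.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ti_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences."""
    ti = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical site, gap or ambiguity code: skipped
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            ti += 1
        else:
            tv += 1
    return ti, tv


def ti_tv_ratio(ti, tv):
    """Ti/Tv ratio; infinite when transitions exist but no transversion does."""
    if tv == 0:
        return math.inf if ti > 0 else math.nan
    return ti / tv


if __name__ == "__main__":
    print(count_ti_tv("ACGT", "GCTT"))  # toy pair: one transition, one transversion
    print(ti_tv_ratio(13, 2))           # counts from the first clean row
    print(ti_tv_ratio(7, 0))            # no transversion observed
```

With the counts from the first row (13 transitions and 2 transversions in the clean sequences), the ratio is 13/2 = 6.5.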
Figure 2. Phylogenetic relationships among mitochondrial and numt sequences. (a) Pooled mitochondrial and numt sequences. Tree tips are coloured according to lineages (lherminieri in blue, boydi in green, baroli in red, nicolae in black, bailloni in grey and P. pacificus outgroup in orange). At each tip, the sequence origin is provided as a coloured dot (mitochondrial as a blue dot, numts as red dots). Mitochondrial and numt sequences retrieved from a single individual are linked by a curved blue line, the width of which is proportional to the genetic distance between the aforementioned sequences. Posterior probabilities greater than 0.95 are indicated. (b) Separately analysed mitochondrial (left) and numt (right) sequences. Tree tips are coloured according to lineages as above. Segments link sequences from a single individual. Orange lines show discordant mitochondrial/numt phylogenetic positions while concordant sequence pairs are coloured according to their lineage. Only the individuals presenting different mitochondrial and numt sequences (genetic distance ≠ 0) were used. Posterior probabilities greater than 0.95 are shown for (b).
Diversity statistics. S represents the number of parsimony-informative sites, h the haplotype diversity and π the nucleotide diversity. 95% confidence intervals (within brackets) were calculated based on 1000 sample bootstraps. Dataset dimensions are provided in terms of sequences × nucleotide positions.
| dataset | S | h | π | dimensions |
|---|---|---|---|---|
| CLEAN | 59 [57–66] | 0.95 [0.93–0.96] | 0.0236 [0.0229–0.02383] | 201 × 833 |
| SITES-LESS | 11 [9–24] | 0.82 [0.37–0.85] | 0.0018 [0.00157–0.00215] | 201 × 720 |
| INDIVIDUAL-LESS | 54 [54–63] | 0.95 [0.93–0.96] | 0.0238 [0.02307–0.02411] | 140 × 833 |
| AMBIGUOUS | 59 [59–76] | 0.94 [0.93–0.96] | 0.0226 [0.02177–0.02312] | 201 × 833 |
| NUMTS | 59 [58–74] | 0.94 [0.93–0.96] | 0.0226 [0.02186–0.02309] | 201 × 833 |
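
For readers unfamiliar with these statistics, the following Python sketch shows one common way to compute haplotype diversity (h), nucleotide diversity (π) and a percentile bootstrap interval. It is an illustration under stated assumptions, not the authors' pipeline: it assumes equal-length aligned sequences, resamples individuals with replacement, and uses hypothetical function names. The number of parsimony-informative sites (S) is not covered.

```python
import random
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """h = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        return 0.0
    freqs = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in freqs.values()))


def nucleotide_diversity(seqs):
    """pi = mean number of pairwise differences per site."""
    n = len(seqs)
    if n < 2:
        return 0.0
    length = len(seqs[0])
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2))
    n_pairs = n * (n - 1) / 2
    return total / n_pairs / length


def bootstrap_ci(seqs, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling sequences with replacement."""
    rng = random.Random(seed)
    n = len(seqs)
    values = sorted(stat([rng.choice(seqs) for _ in range(n)]) for _ in range(n_boot))
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = n_boot - 1 - lo_idx
    return values[lo_idx], values[hi_idx]


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "AAAA"]  # toy alignment, not study data
    print(haplotype_diversity(toy), nucleotide_diversity(toy))
    print(bootstrap_ci(toy, nucleotide_diversity))
```

The pairwise loop is quadratic in the number of sequences, so a full-size dataset with 1000 bootstrap replicates would be slow in pure Python; a vectorised implementation would be preferable there.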
Figure 3. Haplotype networks for the five datasets and all individuals. (a) CLEAN dataset. (b) SITES-LESS dataset. (c) INDIVIDUAL-LESS dataset. (d) AMBIGUOUS dataset. The scale bars show how the length of a branch translates into sequence divergence. The unit is the number of divergent nucleotides divided by the length of the sequence analysed.
Global differentiation statistics. Global FST was calculated both by the Weir and Cockerham method implemented in the R hierfstat package with 95% CI calculated from 1000 samples bootstrap and by an AMOVA in Arlequin. Va, Vb and Vc represent the percentage of variation among groups, among populations within groups and within populations of the AMOVA, respectively.
| dataset | global FST (Weir and Cockerham) [95% CI] | global FST (AMOVA) | Va (%) | Vb (%) | Vc (%) |
|---|---|---|---|---|---|
| CLEAN | 0.87 [0.88–0.92] | 0.92 | 90.49*** | 1.75*** | 7.76*** |
| SITES-LESS | 0.69 [0.64–0.79] | 0.37 | 27.98*** | 8.27*** | 63.75*** |
| INDIVIDUAL-LESS | 0.89 [0.89–0.9] | 0.92 | 91.02*** | 1.59*** | 7.39*** |
| AMBIGUOUS | 0.77 [0.75–0.83] | 0.92 | 90.19*** | 1.47*** | 8.34*** |
| NUMTS | 0.89 [0.88–0.92] | 0.91 | 89.42*** | 1.72*** | 8.85*** |
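
The link between the AMOVA variance components and the AMOVA-based global FST can be sketched in Python. The sketch assumes the standard hierarchical AMOVA definition, in which FST is the share of total variance not found within populations, (Va + Vb) / (Va + Vb + Vc); the function name is hypothetical, and the values reported in the table may differ in the last digit because the percentages are rounded.

```python
def amova_fst(va, vb, vc):
    """Global FST from AMOVA variance components.

    va: variation among groups
    vb: variation among populations within groups
    vc: variation within populations
    The components may be given as variances or as percentages.
    """
    total = va + vb + vc
    if total == 0:
        raise ValueError("variance components sum to zero")
    return (va + vb) / total


if __name__ == "__main__":
    # Percentages from the CLEAN row of the table.
    print(round(amova_fst(90.49, 1.75, 7.76), 2))
```

For the CLEAN row this gives (90.49 + 1.75) / 100 ≈ 0.92, matching the AMOVA-based value in that row.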