Irene Lobon, Serena Tucci, Marc de Manuel, Silvia Ghirotto, Andrea Benazzo, Javier Prado-Martinez, Belen Lorente-Galdos, Kiwoong Nam, Marc Dabad, Jessica Hernandez-Rodriguez, David Comas, Arcadi Navarro, Mikkel H Schierup, Aida M Andres, Guido Barbujani, Christina Hvilsom, Tomas Marques-Bonet.
Abstract
The genus Pan is the closest genus to our own and includes two species, Pan paniscus (bonobos) and Pan troglodytes (chimpanzees). The latter comprises four subspecies, all highly endangered. The study of the genus Pan has been persistently complicated by the intricate relationships among subspecies and by the statistical limitations imposed by the small number of samples or genomic markers analyzed. Here, we present a new method to reconstruct complete mitochondrial genomes (mitogenomes) from whole genome shotgun (WGS) datasets, mtArchitect, and show that its reconstructions are highly accurate and consistent with long-range PCR mitogenomes. We used this approach to build the mitochondrial genomes of 20 newly sequenced samples which, together with available genomes, allowed us to analyze the most complete Pan mitochondrial genome dataset to date, including 156 chimpanzee and 44 bonobo individuals, with a proportional contribution from all chimpanzee subspecies. We estimated the separation time between chimpanzees and bonobos at around 1.15 million years ago (Mya) [0.81-1.49]. Further, we found that under the most probable genealogical model the two clades of chimpanzees, Western + Nigeria-Cameroon and Central + Eastern, separated at 0.59 Mya [0.41-0.78], with further internal separations at 0.32 Mya [0.22-0.43] and 0.16 Mya [0.17-0.34], respectively. Finally, for a subset of our samples, we compared nuclear and mitochondrial genomes and found that chimpanzee subspecies have different patterns of nuclear and mitochondrial diversity, which could result either from processes affecting the mitochondrial genome, such as hitchhiking or background selection, or from population dynamics.
Keywords: bioinformatics; bonobo; chimpanzee; genome diversity; mtArchitect; next-generation sequencing
Year: 2016 PMID: 27345955 PMCID: PMC4943195 DOI: 10.1093/gbe/evw124
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—(A) Geographical distribution of the genus Pan species and subspecies. (B) Neighbor-joining bootstrap consensus tree of the mitogenomes of the 200 samples. The scale is in changes per base. The bootstrap values for each of the five clusters are all 100. (C) Median-joining network of 11 P. t. verus indicating their geographical origin. Branch lengths are proportional to the number of differences and represent phylogenetic relationships.
Fig. 2.—mtArchitect overview. (1) Whole genome sequencing reads are mapped to a standard mitochondrial reference sequence with low stringency parameters to retrieve mitochondrial reads. (2) and (3) After these reads are mapped with regular parameters to the reference, SNPs are called and incorporated into the reference, creating a new specific sequence. This step is iterative in order that the newly incorporated SNPs favour the mapping of more reads at each iteration. (4) The final modified reference start is shifted 8 kb so that the highly polymorphic D-loop is centred and more reads covering it can be retrieved. (5) All whole genome-sequencing reads are mapped to both modified references. (6) The final set of reads is subsampled up to 150× and a de novo assembly is performed 20 times for each modified reference. (7) The final sequence is constructed from the consensus of the 40 assemblies.
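Three of the steps in this overview are simple enough to sketch directly: shifting the start of a circular reference (step 4), choosing a subsampling fraction for a 150× coverage cap (step 6), and taking a per-position consensus over several assemblies (step 7). The Python below is a minimal illustration only, not mtArchitect's code. The function names are hypothetical, and the consensus assumes the assemblies are already aligned to a common start and have equal length, which the caption does not specify.

```python
from collections import Counter


def rotate_circular(seq: str, shift: int) -> str:
    """Move the start of a circular sequence `shift` bases downstream."""
    shift %= len(seq)
    return seq[shift:] + seq[:shift]


def subsample_fraction(mean_coverage: float, target: float = 150.0) -> float:
    """Fraction of reads to keep so that mean coverage does not exceed `target`."""
    if mean_coverage <= 0:
        raise ValueError("mean_coverage must be positive")
    return min(1.0, target / mean_coverage)


def majority_consensus(assemblies: list[str]) -> str:
    """Per-position majority base over equal-length, pre-aligned assemblies.

    Assumption: all assemblies share the same start coordinate and length.
    Ties are resolved by first occurrence, which is an arbitrary choice.
    """
    if len({len(a) for a in assemblies}) != 1:
        raise ValueError("assemblies must be non-empty and of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*assemblies)
    )
```

For a reference held in `reference`, `rotate_circular(reference, 8000)` would give the shifted copy, and `rotate_circular(assembly, -8000)` would restore the original coordinates of an assembly built on it, provided the assembly has the same length as the reference.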
Summary table of the individuals included in this study. Data were gathered from six published articles. Phase I samples are 5 newly reconstructed, previously unpublished mitochondrial sequences from Prado-Martinez et al. (2013), and Phase II samples are the 20 newly sequenced and reconstructed, previously unpublished samples. NCBI samples are the chimpanzee and bonobo references (NC001643 and NC001644) and Jenny (X93335), a P. t. verus from Arnason et al. (1996).
| Subspecies | | | | | | | Phase I | Phase II | NCBI | TOTAL |
|---|---|---|---|---|---|---|---|---|---|---|
| | — | 7 | — | 20 | 12 | 4 | — | — | 1 | 44 |
| | 1 | — | 3 | 4 | 5 | — | 5 | — | — | 18 |
| | 1 | — | 6 | 17 | 6 | 5 | — | 5 | — | 40 |
| | 1 | — | 12 | 16 | 4 | 21 | — | 5 | — | 59 |
| | 4 | — | 1 | 13 | 5 | 9 | — | 5 | 2 | 39 |
| Total | 7 | 7 | 22 | 70 | 32 | 39 | 5 | 15 | 3 | |
Fig. 3.—(A) Nuclear and mitochondrial divergence times. Nuclear divergence times are represented by the blue tree and the mitochondrial times by the black tree. (B) Inferred historical effective population size (Ne) of each population.
Diversity statistics. For each (sub)species and region (D-loop, coding region and complete mitochondrial genome) we report the number of individuals (N), the number of haplotypes (k), the number of segregating sites (S), Tajima’s D (D) and Fu’s F statistics, mitochondrial diversity (π) with its standard deviation (SD), haplotype diversity (H) and mean number of pairwise differences (MNPD).
| Region | (Sub)species | N | k | S | D | P (D) | Fu’s F | P (Fu’s F) | π (SD) | H (SD) | MNPD (SD) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| D-loop | | 40 | 36 | 101 | −1.88708 | 0.0121 | −22.71477 | 0 | 0.010992 (0.005653) | 0.9949 (0.0069) | 11.508974 (5.328178) |
| | | 59 | 53 | 153 | −0.98407 | 0.1628 | −23.71742 | 0.0002 | 0.022666 (0.011210) | 0.9930 (0.0061) | 23.731151 (10.582534) |
| | | 18 | 15 | 68 | −1.07001 | 0.1318 | −2.30847 | 0.1431 | 0.014027 (0.007368) | 0.9739 (0.0293) | 14.686275 (6.898381) |
| | | 38 | 33 | 110 | 0.63672 | 0.7951 | −6.16683 | 0.0399 | 0.029312 (0.014543) | 0.9929 (0.0077) | 30.689900 (13.703112) |
| | | 43 | 28 | 85 | 0.23429 | 0.6736 | −2.14633 | 0.26 | 0.019982 (0.009988) | 0.9723 (0.0121) | 20.921373 (9.417391) |
| Coding region | | 40 | 38 | 287 | −2.12846 | 0.0028 | −13.45385 | 0.0017 | 0.001889 (0.000937) | 0.9974 (0.0063) | 28.971795 (12.939033) |
| | | 59 | 57 | 550 | −1.80945 | 0.0111 | −16.48661 | 0.0011 | 0.003816 (0.001853) | 0.9988 (0.0034) | 58.515488 (25.629206) |
| | | 18 | 18 | 161 | −1.34843 | 0.0691 | −4.08519 | 0.0321 | 0.002080 (0.001065) | 1.0000 (0.0185) | 31.895425 (14.605480) |
| | | 38 | 34 | 275 | 0.77392 | 0.837 | −1.69194 | 0.2553 | 0.005148 (0.002517) | 0.9900 (0.0105) | 78.944523 (34.744269) |
| | | 43 | 36 | 331 | −0.19018 | 0.4855 | −0.93566 | 0.3832 | 0.004732 (0.002308) | 0.9900 (0.0078) | 72.569214 (31.879795) |
| Complete mitogenome | | 40 | 38 | 388 | −2.08028 | 0.0044 | −9.6645 | 0.007 | 0.002471 (0.001217) | 0.9974 (0.0063) | 40.480769 (17.952297) |
| | | 59 | 57 | 703 | −1.63536 | 0.023 | −11.66109 | 0.006 | 0.005020 (0.002430) | 0.9988 (0.0034) | 82.246639 (35.893197) |
| | | 18 | 18 | 229 | −1.27599 | 0.0887 | −2.88198 | 0.061 | 0.002843 (0.001446) | 1.0000 (0.0185) | 46.581699 (21.180277) |
| | | 38 | 36 | 385 | 0.73975 | 0.8305 | −2.39312 | 0.153 | 0.006692 (0.003264) | 0.9972 (0.0068) | 109.634424 (48.125356) |
| | | 43 | 40 | 416 | −0.1024 | 0.5271 | −3.1576 | 0.122 | 0.005707 (0.002778) | 0.9967 (0.0059) | 93.490587 (40.977571) |
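
Several of the statistics reported in this table follow from textbook formulas, and the sketch below shows how N, k, S, MNPD, π, H and Tajima’s D relate to one another for a set of aligned sequences. It is a simplified illustration under stated assumptions, not the software or settings used in the study: the function names are hypothetical, sequences are assumed to be equal-length and free of gaps and ambiguity codes, and no standard deviations, P values or Fu’s F are computed.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def tajimas_d(n: int, s: int, mnpd: float):
    """Tajima's D from sample size, segregating sites and mean pairwise differences."""
    if n < 4 or s == 0:
        return None  # the variance term is zero or undefined in these cases
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (mnpd - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def diversity_stats(seqs: list[str]) -> dict:
    """Summary statistics for equal-length aligned sequences (no gap handling)."""
    n = len(seqs)
    if n < 2 or len({len(x) for x in seqs}) != 1:
        raise ValueError("need at least two sequences of equal length")
    length = len(seqs[0])
    # S: alignment columns with more than one observed base
    s = sum(len(set(col)) > 1 for col in zip(*seqs))
    # MNPD: mean Hamming distance over all unordered pairs
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2))
    mnpd = total / (n * (n - 1) / 2)
    # H: haplotype diversity, n/(n-1) * (1 - sum of squared haplotype frequencies)
    freqs = [count / n for count in Counter(seqs).values()]
    h = n / (n - 1) * (1.0 - sum(f * f for f in freqs))
    return {
        "N": n,
        "k": len(freqs),
        "S": s,
        "MNPD": mnpd,
        "pi": mnpd / length,  # per-site nucleotide diversity
        "H": h,
        "D": tajimas_d(n, s, mnpd),
    }
```

Under these definitions π is MNPD divided by the alignment length, and H equals 1 whenever every individual carries a distinct haplotype (k = N), as in the rows with N = 18 and k = 18.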
Fig. 4.—(A) Nucleotide diversity of each (sub)species, African human samples and European human samples. Coloured boxes span from the first to the third quartile and the segment inside them is the median. The vertical lines mark the maximum and minimum values (excluding outliers) and the outliers are represented by dots. (B) Correlation of mitochondrial (Mt) diversity (π) with autosomal (Aut) and X chromosome (ChrX) diversity. (C) Heat maps of intraspecific and interspecific pairwise nucleotide diversity at the mitochondrial and nuclear genome calculated for the samples from Prado-Martinez et al. (2013). Note that the scale values of mitogenomes are 10-fold the values of nuclear data.
Inter-(sub)species ϕst values. The ϕst value for each pair of (sub)species is shown.
| | | | | | |
|---|---|---|---|---|---|
| | 0 | — | — | — | — |
| | 0.00328 | 0 | — | — | — |
| | 0.00439** | 0.00126 | 0 | — | — |
| | 0.00374*** | 0.00061 | 0.0018 | 0 | — |
| | 0.0046** | 0.00147 | 0.00264* | 0.002* | 0 |
Significance: *P < 0.05; **P < 0.01; ***P < 0.001.
Mitochondrial mutation rates (substitutions/site/year). The mean and 95% highest posterior density (HPD) intervals of the mitochondrial mutation rates obtained in this study, Endicott and Ho (2008) and Hvilsom et al. (2014) are shown for the mitogenome and its four partitions: the D-loop, concatenated ribosomal RNA (rRNA) sequences, first and second codon positions of protein-coding genes (PC1 + 2) and third codon positions of protein-coding genes (PC3).
| Partition | This study: Mean | This study: 95% HPD | Mean | 95% HPD | Mean | 95% HPD |
|---|---|---|---|---|---|---|
| Whole | 2.49×10⁻⁸ | [1.75×10⁻⁸–3.32×10⁻⁸] | 9.66×10⁻⁸ | [7.35×10⁻⁸–1.16×10⁻⁷] | 2.69×10⁻⁸ | [1.73×10⁻⁸–3.84×10⁻⁸] |
| D-loop | 6.27×10⁻⁸ | [3.83×10⁻⁸–9.14×10⁻⁸] | 3.02×10⁻⁷ | [2.23×10⁻⁷–3.73×10⁻⁷] | 2.09×10⁻⁷ | [1.08×10⁻⁷–3.29×10⁻⁷] |
| rRNA | 4.58×10⁻⁹ | [2.56×10⁻⁹–6.9×10⁻⁹] | 2.21×10⁻⁸ | [1.13×10⁻⁸–3.24×10⁻⁸] | 1.29×10⁻⁸ | [7.04×10⁻⁹–2.05×10⁻⁸] |
| PC1 + 2 | 1.15×10⁻⁸ | [7.71×10⁻⁹–1.56×10⁻⁸] | 1.11×10⁻⁸ | [7.23×10⁻⁹–1.53×10⁻⁸] | 9.38×10⁻⁹ | [5.37×10⁻⁹–1.51×10⁻⁸] |
| PC3 | 5×10⁻⁸ | [3.47×10⁻⁸–6.8×10⁻⁸] | 5.09×10⁻⁸ | [3.44×10⁻⁸–6.80×10⁻⁸] | 4.83×10⁻⁸ | [2.63×10⁻⁸–7.51×10⁻⁸] |
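
To give a sense of scale for these rates, the sketch below converts a per-site rate and a split time into an expected pairwise divergence using the strict-clock relation d = 2μt, where the factor of two accounts for both lineages accumulating substitutions. This is a back-of-the-envelope illustration only: the function name is hypothetical, and it ignores rate heterogeneity, multiple hits and the partitioned Bayesian dating behind the reported estimates.

```python
def expected_pairwise_divergence(rate_per_site_per_year: float,
                                 split_time_years: float) -> float:
    """Expected substitutions per site between two lineages under a strict clock.

    Assumption: both lineages evolve at the same constant rate and no site
    is hit more than once, so d = 2 * mu * t.
    """
    if rate_per_site_per_year < 0 or split_time_years < 0:
        raise ValueError("rate and time must be non-negative")
    return 2.0 * rate_per_site_per_year * split_time_years


# Whole-mitogenome mean rate from this study and the chimpanzee-bonobo
# separation time given in the abstract (1.15 Mya).
d = expected_pairwise_divergence(2.49e-8, 1.15e6)
```

With these two inputs `d` is about 0.057 substitutions per site.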