Zhang Wang, Martin Wu.
Abstract
Overwhelming evidence supports the endosymbiosis theory that mitochondria originated once from the Alphaproteobacteria. However, their exact position in the tree of life remains highly debated. This is because systematic errors, including biased taxonomic sampling, high evolutionary rates and sequence composition bias, have long plagued mitochondrial phylogenetics. In this study, we address this issue by 1) increasing the taxonomic representation of alphaproteobacterial genomes through sequencing 18 phylogenetically novel species, including 5 Rickettsiales and 4 Rhodospirillales, two orders that have previously shown close affiliations with mitochondria; 2) using a set of 29 slowly evolving, mitochondria-derived nuclear genes, which are less biased than mitochondria-encoded genes, as alternative "well-behaved" markers for phylogenetic analysis; and 3) applying site-heterogeneous mixture models that account for sequence composition bias. With this integrated phylogenomic approach, we are able, for the first time, to place mitochondria unequivocally within the order Rickettsiales, as a sister clade to the Rickettsiaceae and Anaplasmataceae families, all subtended by the Holosporaceae family. Our results suggest that mitochondria most likely originated from a Rickettsiales endosymbiont already residing in the host, and not from the distantly related free-living Pelagibacter or Rhodospirillales.
Year: 2015 PMID: 25609566 PMCID: PMC4302308 DOI: 10.1038/srep07949
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Rooted genome trees of Alphaproteobacteria and mitochondria, represented as NeighborNet graphs.
a) Original dataset. b) Original dataset plus the 18 genomes newly sequenced in this study (denoted with asterisks). Conflicting signal appears as reticulation in the network. The trees are rooted using Beta- and Gammaproteobacteria as the outgroup.
Figure 2. Split spectrum of the concatenated alignment of 26 mitochondria-encoded genes for a) the original dataset and b) the original dataset plus the 18 genomes sequenced in this study.
Each bar represents a split, and its height (Y-axis) is the number of alignment sites supporting that split. Splits were ranked by support, and only the top 50 are shown. Each split was classified as compatible or incompatible by comparing it with well-established phylogenetic relationships, such as the monophyly of mitochondria or of Rickettsiales. Compatible splits are shown in green and incompatible splits in red. Asterisks indicate conflicting splits that place a single mitochondrial species within the order Rickettsiales.
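The split spectrum described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes the alignment is a Python dict of equal-length aligned sequences, counts a column as supporting a split only when the column has exactly two character states each shared by at least two taxa, and skips columns containing `-`, `?` or `X` (an assumption suited to protein alignments). Compatibility is checked with the standard four-intersection test for bipartitions. The names `site_splits`, `top_splits` and `compatible` are hypothetical.

```python
from collections import Counter

# Characters treated as gaps/ambiguity (assumption: protein alignment).
SKIP = set("-?X")


def site_splits(alignment):
    """Count alignment columns supporting each split (bipartition of taxa).

    alignment: dict of taxon name -> aligned sequence (all the same length).
    A column supports a split only if it has exactly two character states,
    each shared by at least two taxa; columns with gaps or ambiguity codes
    are skipped.
    """
    taxa = sorted(alignment)
    length = len(alignment[taxa[0]])
    counts = Counter()
    for i in range(length):
        column = {t: alignment[t][i].upper() for t in taxa}
        if SKIP & set(column.values()):
            continue
        if len(set(column.values())) != 2:
            continue
        # Name each split by the side containing the first taxon,
        # so that A|B and B|A are counted as the same split.
        side = frozenset(t for t in taxa if column[t] == column[taxa[0]])
        if len(side) < 2 or len(taxa) - len(side) < 2:
            continue
        counts[side] += 1
    return counts


def top_splits(counts, n=50):
    """Rank splits by number of supporting sites and keep the top n."""
    return counts.most_common(n)


def compatible(split, clade, taxa):
    """Two bipartitions are compatible if at least one of the four
    intersections of their sides is empty."""
    taxa, s, r = frozenset(taxa), frozenset(split), frozenset(clade)
    return any(not part for part in (s & r, s - r, r - s, taxa - s - r))
```

For the toy alignment `{"A": "AAGT", "B": "AAGT", "C": "CCGA", "D": "CCTA"}`, the split {A, B} | {C, D} is supported by 3 of the 4 columns; the third column is not counted because one of its states occurs in only one taxon. Calling `compatible(split, mito_taxa, all_taxa)` would then flag splits that conflict with mitochondrial monophyly. Published split-spectrum tools may treat multistate columns or weighting differently.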
Overview of the 18 alphaproteobacterial genomes sequenced in this study
| Genome | Order | Draft genome size (bp) | No. of contigs | Coverage | GC content (%) | Protein-coding genes | Mitochondrial markers | Nuclear markers | Phylum-level markers |
|---|---|---|---|---|---|---|---|---|---|
|  |  | 4149991 | 272 | 320x | 57.6 | 3970 | 25 | 28 | 198 |
|  |  | 2228395 | 649 | 23x | 58.9 | 2699 | 23 | 15 | 131 |
|  |  | 3464569 | 324 | 209x | 67.1 | 3494 | 24 | 26 | 197 |
|  |  | 3436975 | 3024 | 323x | 69.6 | 4127 | 22 | 20 | 187 |
|  |  | 4067442 | 259 | 150x | 50.1 | 4098 | 24 | 27 | 200 |
|  |  | 3156491 | 3163 | 294x | 68.0 | 4058 | 26 | 23 | 193 |
|  |  | 3328337 | 361 | 99x | 69.2 | 3381 | 25 | 27 | 197 |
|  |  | 6772298 | 4283 | 83x | 69.3 | 8184 | 25 | 24 | 190 |
|  |  | 4170570 | 258 | 117x | 65.9 | 4040 | 25 | 27 | 199 |
|  |  | 3635965 | 8906 | 91x | 67.0 | 6978 | 22 | 20 | 175 |
|  |  | 4353044 | 1038 | 7x | 70.2 | 4337 | 20 | 22 | 145 |
|  |  | 2175773 | 5 | 50x | 37.9 | 2332 | 26 | 26 | 193 |
|  |  | 2454690 | 55 | 67x | 41.0 | 2535 | 26 | 26 | 197 |
|  |  | 2668935 | 299 | 15x | 41.2 | 2967 | 23 | 26 | 195 |
|  |  | 1615277 | 1 | 20x | 34.8 | 1608 | 24 | 26 | 196 |
|  |  | 1115609 | 15 | 927x | 49.8 | 1309 | 23 | 21 | 171 |
|  | unclassified | 5676036 | 1169 | 109x | 68.4 | 5909 | 24 | 27 | 191 |
|  | unclassified | 2481983 | 1 | 60x | 54.7 | 2432 | 26 | 27 | 198 |
Figure 3. A rooted SSU rRNA maximum likelihood tree of alphaproteobacterial representatives, reconstructed with RAxML.
The 18 isolates selected for sequencing in this study are highlighted in red. The tree was rooted using Beta- and Gammaproteobacteria as the outgroup. Bootstrap values (out of 100 replicates) are shown.
Comparison of mitochondria-encoded genes and mitochondria-derived nuclear genes in terms of evolutionary rate and compositional bias
| | Mitochondria-encoded genes | Mitochondria-derived nuclear genes |
|---|---|---|
| Functional category: Energy production and conversion | | |
| Functional category: Translation and posttranslational modification | | |
| Functional category: Others | | |
| Average evolutionary rate (substitutions/site) | 1.713 (stdev 0.225) | 1.273 (stdev 0.088) |
| Average aminoGC content | 0.152 (stdev 0.017) | 0.215 (stdev 0.004) |
| Average compositional chi-square score | 662.4 (stdev 394.3) | 89.6 (stdev 41.4) |
*T-test P < 0.01 ** T-test P < 0.001.
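The compositional chi-square score in the last row of the table can be illustrated with a common form of the test: a taxon-by-amino-acid contingency table in which expected counts come from the row and column totals, so a score of zero means every taxon shares the pooled composition. The source does not say which software or exact variant produced these scores, so the function `composition_chi2` below is a hypothetical sketch under that assumption. Because the statistic grows with the number of residues, scores from genes of different lengths are not directly comparable without accounting for length.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_chi2(alignment, alphabet=AMINO_ACIDS):
    """Chi-square statistic for compositional heterogeneity across taxa.

    alignment: dict of taxon name -> aligned protein sequence.
    Characters outside `alphabet` (gaps, X, etc.) are ignored. The expected
    count for each taxon/residue cell is row_total * column_total / grand_total,
    i.e. the count expected if every taxon had the pooled composition.
    """
    rows = [Counter(c for c in seq.upper() if c in alphabet)
            for seq in alignment.values()]
    pooled = Counter()
    for row in rows:
        pooled.update(row)  # add this taxon's counts to the pooled totals
    grand = sum(pooled.values())
    if grand == 0:
        return 0.0
    chi2 = 0.0
    for row in rows:
        row_total = sum(row.values())
        for residue in alphabet:
            expected = row_total * pooled[residue] / grand
            if expected > 0:
                chi2 += (row[residue] - expected) ** 2 / expected
    return chi2
```

Two taxa with identical composition, such as `{"a": "ACAC", "b": "ACAC"}`, score 0.0, while `{"x": "AAAA", "y": "CCCC"}` scores 8.0.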
Figure 4. Schematic phylogenetic trees based on the mitochondrial, nuclear and phylum-level marker datasets, reconstructed using RAxML and PhyloBayes.
Bootstrap values (RAxML trees) and posterior probabilities (PhyloBayes trees) are shown beside the internal nodes.
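Both kinds of support value are, in essence, clade frequencies: the proportion of bootstrap replicate trees (RAxML) or of trees sampled from the posterior (PhyloBayes) that contain a given clade. The sketch below shows only that counting step. It assumes the trees are rooted and have already been parsed into collections of clades, each clade being a set of taxon names. `clade_support` is a hypothetical helper; both programs compute these values internally.

```python
from collections import Counter


def clade_support(trees):
    """Return the fraction of trees containing each clade.

    trees: list of rooted trees; each tree is an iterable of clades, and each
    clade is a collection of taxon names (e.g. parsed beforehand from Newick).
    """
    counts = Counter()
    for tree in trees:
        # Use a set so each clade is counted at most once per tree.
        counts.update({frozenset(clade) for clade in tree})
    return {clade: k / len(trees) for clade, k in counts.items()}


replicates = [
    [{"A", "B"}, {"A", "B", "C"}],
    [{"A", "B"}],
    [{"B", "C"}],
    [{"A", "B"}],
]
support = clade_support(replicates)
print(support[frozenset({"A", "B"})] * 100)  # 75.0 (as a bootstrap percentage)
```

Multiplying by 100 gives the percentage form used for bootstrap values; posterior probabilities are usually reported as the raw fraction.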
Figure 5. Gene order of a cluster of 12 protein-coding genes in Rickettsiales (red), Holosporaceae (green), the SAR11 group (purple) and the free-living Rhodospirillum rubrum (black).
Each arrow represents a gene in the cluster; arrows drawn with dotted lines represent missing genes. Genome rearrangements are shown as dotted lines between two genes, with the distance between the genes shown above the line. Because some genome assemblies are incomplete, the exact distance between two genes could not always be determined. In such cases, a minimum distance was estimated as the sum of each gene's distance to the end of the contig on which it was located. For the same reason, the orientation of some genes could not be determined (indicated by asterisks below the genes).
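The minimum-distance rule in the Figure 5 legend can be written as a small function. The gene record fields (`contig`, `start`, `end` and `contig_length`, with 1-based inclusive coordinates) and the name `min_gene_distance` are hypothetical. Using the nearer contig end is an assumption: the legend says only "the end of the contig", and the nearer end is the choice that keeps the estimate a lower bound when contig order and orientation are unknown.

```python
def min_gene_distance(gene_a, gene_b):
    """Distance in bp between two genes, or a lower bound if they lie on
    different contigs.

    Each gene is a dict with 'contig', 'start', 'end' (1-based, inclusive)
    and 'contig_length'.
    """
    if gene_a["contig"] == gene_b["contig"]:
        # Same contig: the exact number of bases between the two genes.
        first, second = sorted((gene_a, gene_b), key=lambda g: g["start"])
        return max(0, second["start"] - first["end"] - 1)

    def to_contig_end(gene):
        # Assumption: use the nearer contig end, which gives the smallest
        # possible separation when contig order is unknown.
        return min(gene["start"] - 1, gene["contig_length"] - gene["end"])

    return to_contig_end(gene_a) + to_contig_end(gene_b)
```

For a gene at 101-400 on a 1,000 bp contig and a gene at 501-900 on a separate 950 bp contig, the estimate is 100 + 50 = 150 bp.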
Figure 6. A rooted Bayesian consensus tree built from the nuclear dataset of 72 Alphaproteobacteria and 6 eukaryotes.
Asterisks indicate the 18 genomes sequenced in this study. The tree was rooted using Beta- and Gammaproteobacteria as the outgroup. Posterior probabilities of internal nodes are 1.0 unless otherwise indicated in the tree.