Iker Irisarri, Denis Baurain, Henner Brinkmann, Frédéric Delsuc, Jean-Yves Sire, Alexander Kupfer, Jörn Petersen, Michael Jarek, Axel Meyer, Miguel Vences, Hervé Philippe.
Abstract
Phylogenomics is extremely powerful but introduces new challenges, as no agreement exists on "standards" for data selection, curation and tree inference. We use jawed vertebrates (Gnathostomata) as a model to address these issues. Despite considerable efforts in resolving their evolutionary history and macroevolution, few studies have included the full phylogenetic diversity of gnathostomes, and some relationships remain controversial. We tested a novel bioinformatic pipeline to assemble large and accurate phylogenomic datasets from RNA sequencing and find this phylotranscriptomic approach successful and highly cost-effective. Increased sequencing effort up to ca. 10 Gbp allows recovering more genes, but shallower sequencing (1.5 Gbp) is sufficient to obtain thousands of full-length orthologous transcripts. We reconstruct a robust and strongly supported timetree of jawed vertebrates using 7,189 nuclear genes from 100 taxa, including 23 new transcriptomes from previously unsampled key species. Gene jackknifing of genomic data corroborates the robustness of our tree and allows calculating genome-wide divergence times by overcoming gene sampling bias. Mitochondrial genomes prove insufficient to resolve the deepest relationships because of limited signal and among-lineage rate heterogeneity. Our analyses emphasize the importance of large curated nuclear datasets to increase the accuracy of phylogenomics and provide a reference framework for the evolutionary history of jawed vertebrates.
Keywords: Gnathostomata; RNA-Seq; cross-validation; jackknifing; molecular dating; phylogeny; substitution rates; transcriptome
Year: 2017 PMID: 28890940 PMCID: PMC5584656 DOI: 10.1038/s41559-017-0240-5
Source DB: PubMed Journal: Nat Ecol Evol ISSN: 2397-334X Impact factor: 15.460
Figure 1. Transcriptome sequencing effort and performance in phylogenomic dataset assembly. Histograms represent sequencing effort as the total number of sequenced (clean) Mbp (million bp). Transcriptome completeness is measured as the proportion of recovered core vertebrate genes (233 CVG; Hara et al., ref. 29). Genes effectively usable for phylogenomics are approximated by the proportion of human proteins reconstructed at full (100%) and nearly full (>70%) lengths (in proportion to a total of 22,964 human genes). The completeness of the relevant species in our final phylogenomic dataset is shown as the proportion of amino acids across all 7,189 genes (3,791,500 aligned amino acids in total).
Figure 2. Backbone phylogeny of jawed vertebrates. (a) Bayesian majority-rule consensus tree from 100 independent MCMC chains derived from gene jackknife replicates (~50,000 amino acid positions each) of the NoDP nuclear dataset, estimated by PhyloBayes under the CAT+Γ model. All nodes received full gene jackknife support (100%), except those displaying the actual value. The scale bar corresponds to the expected number of substitutions per site. Asterisks denote new transcriptomic data generated in this study. (b) Effect of alignment length on the recovery of single nodes in the phylogeny assessed by gene jackknife proportions derived from the NoDP dataset.
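The gene jackknife described in this caption can be illustrated with a minimal sketch. It assumes that a replicate is built by drawing whole genes at random, without replacement, until roughly 50,000 aligned positions are reached, and that the support of a node is the percentage of replicate trees containing it. The function names, the gene-length dictionary and the clade representation are hypothetical and are not taken from the authors' pipeline.

```python
import random

def gene_jackknife_replicate(gene_lengths, target_positions=50_000, rng=None):
    """Draw genes without replacement until the target length is reached.

    gene_lengths: dict mapping a gene name to its aligned length (hypothetical input).
    Returns the sampled gene names and the total number of positions.
    """
    if rng is None:
        rng = random.Random()
    order = list(gene_lengths)
    rng.shuffle(order)  # random gene order, so no gene is sampled twice
    chosen, total = [], 0
    for gene in order:
        if total >= target_positions:
            break
        chosen.append(gene)
        total += gene_lengths[gene]
    return chosen, total

def jackknife_support(replicate_clades, clade):
    """Percentage of replicate trees (each given as a set of clades) containing `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)
```

Each replicate would then be analysed independently (here, by PhyloBayes), and the resulting trees summarized with a function such as `jackknife_support`.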
Figure 3. Time-calibrated phylogeny of jawed vertebrates. Divergences have been averaged across 100 timetrees estimated from independent gene jackknife replicates in PhyloBayes, using the subset of most congruent calibrations (C16; marked by arrows) and best-fit evolutionary (CAT-GTR+Γ) and relaxed clock (autocorrelated lognormal) models. Credibility intervals (CrI) are calculated as the absolute maximum and minimum values of 95% confidence intervals across 100 timetrees (only displayed for key nodes; see Supplementary Table 9 for detailed results). The scale is given in million years, and the main geological periods are highlighted.
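The summary rule stated in this caption (mean divergence across timetrees, with the interval taken as the absolute minimum and maximum of the per-tree 95% bounds) can be written as a short sketch. The input layout, one `(age, lower95, upper95)` tuple per timetree for a given node, is an assumption made for illustration.

```python
from statistics import mean

def summarize_node(replicates):
    """Combine per-timetree estimates for a single node.

    replicates: list of (age, lower95, upper95) tuples, one per timetree
    (hypothetical layout). Returns (mean age, overall minimum, overall maximum).
    """
    ages = [age for age, _, _ in replicates]
    lowers = [low for _, low, _ in replicates]
    uppers = [high for _, _, high in replicates]
    # The interval spans the most extreme bounds seen in any replicate
    return mean(ages), min(lowers), max(uppers)
```

Because the bounds are taken from the extremes over all replicates, the resulting interval is at least as wide as any single-replicate interval.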