Jacob D Washburn, James C Schnable, Gavin C Conant, Thomas P Brutnell, Ying Shao, Yang Zhang, Martha Ludwig, Gerrit Davidse, J Chris Pires.
Abstract
The past few years have witnessed a paradigm shift in molecular systematics from phylogenetic methods (using one or a few genes) to those that can be described as phylogenomics (phylogenetic inference with entire genomes). One approach that has recently emerged is phylo-transcriptomics (transcriptome-based phylogenetic inference). As in any phylogenetics experiment, accurate orthology inference is critical to phylo-transcriptomics. To date, most analyses have inferred orthology based either on pure sequence similarity or using gene-tree approaches. The use of conserved genome synteny in orthology detection has been relatively under-employed in phylogenetics, mainly due to the cost of sequencing genomes. While current trends focus on the quantity of genes included in an analysis, the use of synteny is likely to improve the quality of ortholog inference. In this study, we combine de novo transcriptome data and sequenced genomes from an economically important group of grass species, the tribe Paniceae, to make phylogenomic inferences. This method, which we call "genome-guided phylo-transcriptomics", is compared to other recently published orthology inference pipelines, and benchmarked using a set of sequenced genomes from across the grasses. These comparisons provide a framework for future researchers to evaluate the costs and benefits of adding sequenced genomes to transcriptome data sets.
Year: 2017 PMID: 29051622 PMCID: PMC5648822 DOI: 10.1038/s41598-017-13236-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Genome-guided phylo-transcriptomics workflow. Illustration of the workflow followed to produce the genome-guided phylogenies in this study.
Total orthologs found in each method separated by matrix occupancy.
| Method | Unit | Statistic | 8 spp | 90% | Full |
|---|---|---|---|---|---|
| Genome-guided | Genes | Total | 9,757 | 2,211 | 434 |
| Genome-guided | Genes | Min | 5,389 | 1,963 | 434 |
| Genome-guided | Amino Acids | Total | 4,182,364 | 835,229 | 144,503 |
| Genome-guided | Amino Acids | Min | 1,775,925 | 669,215 | 128,896 |
| Agalma | Genes | Total | 11,563 | 2,308 | 555 |
| Agalma | Genes | Min | 5,453 | 2,054 | 555 |
| Agalma | Amino Acids | Total | 4,420,707 | 797,333 | 182,368 |
| Agalma | Amino Acids | Min | 1,568,329 | 613,538 | 168,157 |
| Yang & Smith 1 to 1 | Genes | Total | 7,323 | 1,925 | 898 |
| Yang & Smith 1 to 1 | Genes | Min | 3,685 | 1,781 | 898 |
| Yang & Smith 1 to 1 | Amino Acids | Total | 2,408,802 | 789,203 | 361,901 |
| Yang & Smith 1 to 1 | Amino Acids | Min | 1,129,993 | 628,190 | 310,283 |
| Yang & Smith MO | Genes | Total | 11,568 | 1,966 | 1,076 |
| Yang & Smith MO | Genes | Min | 6,417 | 1,879 | 1,076 |
| Yang & Smith MO | Amino Acids | Total | 4,362,686 | 857,857 | 456,597 |
| Yang & Smith MO | Amino Acids | Min | 2,009,430 | 687,942 | 380,988 |
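The occupancy columns above can be illustrated with a minimal sketch of occupancy-based filtering. It assumes that matrix occupancy means the fraction of sampled taxa in which a gene is present; the function name `filter_by_occupancy`, the toy gene and species names, and the thresholds are hypothetical and are not taken from the study's pipeline.

```python
def filter_by_occupancy(gene_to_taxa, n_taxa, min_fraction):
    """Return genes whose occupancy (fraction of taxa present) meets a threshold.

    gene_to_taxa: dict mapping a gene ID to the taxa in which it was found.
    n_taxa: total number of taxa in the data matrix.
    min_fraction: minimum occupancy, e.g. 0.9 for 90% or 1.0 for a full matrix.
    """
    kept = {}
    for gene, taxa in gene_to_taxa.items():
        occupancy = len(set(taxa)) / n_taxa  # duplicates of a taxon count once
        if occupancy >= min_fraction:
            kept[gene] = occupancy
    return kept


# Hypothetical toy data: three genes sampled across four species.
toy = {
    "geneA": ["sp1", "sp2", "sp3", "sp4"],
    "geneB": ["sp1", "sp2", "sp3"],
    "geneC": ["sp1"],
}

full = filter_by_occupancy(toy, n_taxa=4, min_fraction=1.0)
most = filter_by_occupancy(toy, n_taxa=4, min_fraction=0.75)
```

Raising the threshold shrinks the gene set, which matches the trend in the table: every method retains far fewer genes at full occupancy than at the most permissive setting.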
Approximate run times in hours (hrs) for each orthology inference method based on a 16 CPU system.
| Method | Synteny Step | BLAST Step | Alignment and Tree Building for Pruning | Total |
|---|---|---|---|---|
| Genome-Guided | <1 | 6.7 | N/A | 7.7 |
| Agalma | N/A | 46.4 | 88.6 | 135.0 |
| Yang & Smith | N/A | 366.9 | 412.2 | 779.1 |
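To put the totals on a common scale, the following sketch expresses each method's total run time relative to the genome-guided total. The numbers are copied from the Total column above; the helper name `relative_runtime` is hypothetical, and because the reported times are approximate the ratios are only indicative.

```python
# Approximate total run times in hours, copied from the table above.
total_hours = {
    "Genome-Guided": 7.7,
    "Agalma": 135.0,
    "Yang & Smith": 779.1,
}


def relative_runtime(totals, baseline):
    """Divide every method's total by the baseline method's total."""
    base = totals[baseline]
    return {method: hours / base for method, hours in totals.items()}


ratios = relative_runtime(total_hours, "Genome-Guided")
for method, ratio in ratios.items():
    print(f"{method}: {ratio:.1f}x the genome-guided run time")
```

On these figures, Agalma takes about 17.5 times and Yang & Smith about 101 times as long as the genome-guided approach on the same 16 CPU system.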
Figure 2. Genome-guided concatenation-based phylogeny of the tribe Paniceae. Phylogenetic tree of the tribe Paniceae (Poaceae) built using RAxML based on a concatenated matrix with 90% gene occupancy. Branches are labeled with maximum likelihood bootstrap values; unlabeled branches have values of 100.
Figure 3. (a) Primary nuclear topology found using all methods, (b) Secondary nuclear topology, (c) Chloroplast topology based on Washburn, et al.[80]. (d) An ideogram of the Setaria italica chromosomes[114] with conserved syntenic blocks between S. italica and Sorghum bicolor demarcated. Syntenic blocks are colored based on the phylogenetic patterns from a-c that each block supports. Gray indicates areas of the chromosomes not covered by our blocks. Asterisks below the blocks indicate significance level for pairwise Robinson-Foulds distance tests: ***0.001, **0.01, *0.05.
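The Robinson-Foulds distance named in the caption counts the bipartitions (splits) found in one tree but not in the other. The sketch below computes only that distance for unrooted trees over the same taxa; it does not reproduce the significance test used in the study. Trees are written as nested tuples of taxon names, and the taxa A to D are hypothetical placeholders.

```python
def leaf_set(tree):
    """Collect all leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of a tree.

    Each split is stored as the side that excludes a fixed reference taxon,
    so the result does not depend on where the tree happens to be rooted.
    """
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        side = all_leaves - leaves if ref in leaves else leaves
        # Skip trivial splits: single leaves, their complements, and the root.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return leaves

    walk(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: size of the symmetric difference of splits."""
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical four-taxon topologies.
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
t1_rerooted = ("A", ("B", ("C", "D")))
```

A distance of zero means the two trees imply the same unrooted topology, as for `t1` and `t1_rerooted`; conflicting topologies such as `t1` and `t2` give a positive distance.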
Grass-wide (Poaceae) gene-by-gene comparisons of orthology detection methods to a benchmark set of orthologs derived entirely from syntenic relationships between sequenced genomes.
| Method | Tree set | Statistic | 4 species | 5 species | 6 species |
|---|---|---|---|---|---|
| Genome-Guided | All Trees Included | Trees Agreeing with Benchmark | 4,119 | 2,169 | 413 |
| Genome-Guided | All Trees Included | Total Trees | 6,669 | 3,700 | 896 |
| Genome-Guided | All Trees Included | Percent Trees in Agreement | 61.8% | 58.6% | 46.1% |
| Yang & Smith 1 to 1 | All Trees Included | Trees Agreeing with Benchmark | 1,936 | 1,741 | 1,370 |
| Yang & Smith 1 to 1 | All Trees Included | Total Trees | 7,933 | 6,989 | 5,171 |
| Yang & Smith 1 to 1 | All Trees Included | Percent Trees in Agreement | 24.4% | 24.9% | 26.5% |
| Yang & Smith 1 to 1 | Excluding trees not in benchmark set | Trees Agreeing with Benchmark | 1,936 | 1,741 | 1,370 |
| Yang & Smith 1 to 1 | Excluding trees not in benchmark set | Total Trees | 6,088 | 5,417 | 4,320 |
| Yang & Smith 1 to 1 | Excluding trees not in benchmark set | Percent Trees in Agreement | 31.8% | 32.1% | 31.7% |
| Yang & Smith MO | All Trees Included | Trees Agreeing with Benchmark | 2,000 | 1,795 | 1,404 |
| Yang & Smith MO | All Trees Included | Total Trees | 8,619 | 7,560 | 5,464 |
| Yang & Smith MO | All Trees Included | Percent Trees in Agreement | 23.2% | 23.7% | 25.7% |
| Yang & Smith MO | Excluding trees not in benchmark set | Trees Agreeing with Benchmark | 2,000 | 1,795 | 1,404 |
| Yang & Smith MO | Excluding trees not in benchmark set | Total Trees | 6,503 | 5,757 | 4,516 |
| Yang & Smith MO | Excluding trees not in benchmark set | Percent Trees in Agreement | 30.8% | 31.2% | 31.1% |
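The percentage rows follow directly from the two count rows: trees agreeing with the benchmark divided by total trees. A minimal sketch of that arithmetic, using counts copied from the four-species column of the table, is shown here; the function name `percent_agreement` is hypothetical.

```python
def percent_agreement(agreeing, total):
    """Percent of trees that agree with the benchmark, rounded to one decimal."""
    return round(100.0 * agreeing / total, 1)


# Counts copied from the 4-species column of the table above.
genome_guided = percent_agreement(4119, 6669)
yang_smith_1to1_all = percent_agreement(1936, 7933)
yang_smith_1to1_benchmark_only = percent_agreement(1936, 6088)
```

Excluding trees that are absent from the benchmark set lowers the denominator but leaves the numerator unchanged, which is why the Yang & Smith percentages rise from roughly 24% to roughly 32% without any change in the number of agreeing trees.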
Figure 4. A Venn diagram comparing the Poaceae gene sets derived from whole genomes, the genome-guided approach, and the Yang & Smith MO pipeline (the 1to1 pipeline is not shown because of large overlap with MO). Diagram created using Inkscape and the R package Vennerable[108,109].