Michael R McKain, Haibao Tang, Joel R McNeal, Saravanaraj Ayyampalayam, Jerrold I Davis, Claude W dePamphilis, Thomas J Givnish, J Chris Pires, Dennis Wm Stevenson, James H Leebens-Mack.
Abstract
Comparisons of flowering plant genomes reveal multiple rounds of ancient polyploidy characterized by large intragenomic syntenic blocks. Three such whole-genome duplication (WGD) events, designated as rho (ρ), sigma (σ), and tau (τ), have been identified in the genomes of cereal grasses. Precise dating of these WGD events is necessary to investigate how they have influenced diversification rates, evolutionary innovations, and genomic characteristics such as the GC profile of protein-coding sequences. The timing of these events has remained uncertain due to the paucity of monocot genome sequence data outside the grass family (Poaceae). Phylogenomic analysis of protein-coding genes from sequenced genomes and transcriptome assemblies from 35 species, including representatives of all families within the Poales, has resolved the timing of rho and sigma relative to speciation events and placed tau prior to divergence of Asparagales and the commelinids but after divergence with eudicots. Examination of gene family phylogenies indicates that rho occurred just prior to the diversification of Poaceae and sigma occurred before early diversification of Poales lineages but after the Poales-commelinid split. Additional lineage-specific WGD events were identified on the basis of the transcriptome data. Gene families exhibiting high GC content are underrepresented among those with duplicate genes that persisted following these genome duplications. However, genome duplications had little overall influence on lineage-specific changes in the GC content of coding genes. Improved resolution of the timing of WGD events in monocot history provides evidence for the influence of polyploidization on functional evolution and species diversification.
Keywords: GC content; grasses; monocots; whole-genome duplication
Year: 2016 PMID: 26988252 PMCID: PMC4860692 DOI: 10.1093/gbe/evw060
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Figure. Consensus species tree from concatenated and coalescence-based analysis of 234 single copy orthogroups with results from gene tree querying of putative paralogs. (A) Mapping results of synteny-derived paralogs from the rice and sorghum genomes displayed as total number of unique last common ancestor (LCA) nodes with bootstrap values (BSV) ≥80. Results show placement of rho, sigma, and tau WGD events. (B) Mapping results of Ks plot-derived paralogs with 22 (number for sigma event) or more total unique LCAs and bootstrap values ≥80 for Poales species only. (C) Mapping results of gene tree-derived paralogs with 235 (number for sigma event) or more total unique LCAs and bootstrap values ≥80 for monocot species only. Previously published WGD events are identified and placed on the tree, including a shared Zingiberaceae event (gamma), a palm event, and an Agavoideae event, represented as gold diamonds. If previously named, the Greek character representing the event is also displayed. Higher support for rho, sigma, and tau is identified relative to the synteny-derived paralogs, and other potential WGD events in Juncus, Cyperus, and Restionaceae are also identified.
Figure. Heatmaps depicting general trends in gene GC composition across all sampled taxa. (A) Distribution of total GC content for all genes across all taxa. The varied GC composition of monocots is highlighted by the differences between low GC species (e.g., Juncus) and high GC species (e.g., Aphelia or grasses). The histogram depicts the GC distribution of Sorghum bicolor, showing the heatmap information in a vertical format. (B) Distribution of GC content across genes for all orthogroups and species sampled. A 5′ bias for increased GC percentage is seen.
Statistical Tests for Total GC and GC3 Composition across 13,798 Orthogroups for All Taxa Sampled
| 0.505 | 0.063 | 0.551 | 0.132 | 0.712 | 0.163 | ||
| 0.463 | 0.040 | 0.461 | 0.084 | 0.997 | 0.822 | 0.171 | |
| 0.529 | 0.088 | 0.599 | 0.190 | ||||
| 0.508 | 0.095 | 0.547 | 0.201 | ||||
| 0.563 | 0.089 | 0.665 | 0.185 | ||||
| 0.482 | 0.071 | 0.509 | 0.158 | 0.559 | |||
| 0.492 | 0.061 | 0.530 | 0.147 | 0.437 | |||
| 0.480 | 0.061 | 0.494 | 0.129 | 0.529 | |||
| 0.459 | 0.049 | 0.455 | 0.096 | 0.918 | |||
| 0.458 | 0.048 | 0.450 | 0.092 | 0.949 | |||
| 0.500 | 0.085 | 0.534 | 0.189 | ||||
| 0.521 | 0.086 | 0.582 | 0.184 | 0.957 | 0.263 | ||
| 0.487 | 0.071 | 0.503 | 0.155 | 0.762 | 0.118 | ||
| 0.476 | 0.060 | 0.489 | 0.130 | 0.412 | |||
| 0.491 | 0.071 | 0.513 | 0.155 | 0.977 | 0.151 | ||
| 0.472 | 0.060 | 0.481 | 0.142 | 0.760 | |||
| 0.502 | 0.080 | 0.539 | 0.170 | 0.955 | 0.676 | ||
| 0.448 | 0.060 | 0.459 | 0.115 | 0.776 | 0.106 | ||
| 0.455 | 0.065 | 0.475 | 0.131 | 0.936 | 0.110 | ||
| 0.518 | 0.083 | 0.572 | 0.179 | ||||
| 0.471 | 0.050 | 0.484 | 0.096 | 0.667 | |||
| 0.471 | 0.049 | 0.486 | 0.099 | 0.232 | |||
| 0.498 | 0.069 | 0.538 | 0.146 | 0.300 | |||
| 0.517 | 0.079 | 0.577 | 0.171 | 0.861 | 0.658 | ||
| 0.511 | 0.085 | 0.562 | 0.177 | 0.795 | 0.183 | ||
| 0.542 | 0.096 | 0.618 | 0.200 | ||||
| 0.490 | 0.062 | 0.515 | 0.139 | 1.000 | 0.994 | ||
| 0.553 | 0.093 | 0.642 | 0.194 | ||||
| 0.497 | 0.081 | 0.533 | 0.164 | 0.193 | |||
| 0.506 | 0.083 | 0.552 | 0.183 | 0.830 | 0.113 | ||
| 0.469 | 0.064 | 0.467 | 0.143 | 0.744 | 0.268 | ||
| 0.479 | 0.068 | 0.494 | 0.147 | 0.908 | 0.179 | ||
| 0.462 | 0.038 | 0.456 | 0.085 | 0.985 | 0.891 | ||
| 0.520 | 0.080 | 0.584 | 0.167 | 0.948 | 0.521 | ||
| 0.482 | 0.067 | 0.502 | 0.150 | 0.452 | |||
| 0.468 | 0.072 | 0.470 | 0.156 | 0.426 | 0.056 | 0.229 |
Note.—Bold values represent a significant p-value of < 0.05.
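For readers unfamiliar with the two quantities summarized in the table above, the following is a minimal sketch of how total GC and GC3 (GC content at third codon positions) can be computed for a single coding sequence. The function names and the example sequence are hypothetical illustrations and are not taken from the study's pipeline; the sketch assumes an in-frame sequence and ignores ambiguous bases.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc3_fraction(cds: str) -> float:
    """GC fraction at third codon positions of an in-frame coding sequence.

    Assumption: cds starts at codon position 1; a trailing partial codon
    is ignored.
    """
    usable = len(cds) - len(cds) % 3   # drop any incomplete final codon
    third_positions = cds[2:usable:3]  # every third base, starting at index 2
    return gc_fraction(third_positions)


# Hypothetical toy sequence: ATG GCC GCG TTT TAA
toy_cds = "ATGGCCGCGTTTTAA"
total_gc = gc_fraction(toy_cds)   # 7 of 15 bases are G or C
gc3 = gc3_fraction(toy_cds)       # third positions are G, C, G, T, A
```

Per-taxon means and standard deviations of the kind tabulated above would then be taken over all genes sampled for that taxon.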
Figure. Distributions of percent GC across genes assigned to “high GC” (red) or “low GC” (blue) orthogroups for (A) all taxa, (B) Juncus effusus, and (C) Brachypodium distachyon. (A) A t-test on the distributions of high and low GC orthogroups across all sampled taxa suggests the two sets are distinct (P value = 0.00). (B) The distributions for Juncus effusus show high overlap between high and low GC orthogroups, although a t-test of these data suggests that they are distinct sets (P value = 0.00). Juncus effusus transcripts exhibit the lowest overall GC composition across all sampled taxa, and the GC content of transcripts assigned to otherwise high GC orthogroups is strikingly lower than the overall distribution of these orthogroups. (C) The distributions for Brachypodium distachyon exhibit almost nonoverlapping GC values for high and low GC orthogroups. This difference is supported by a t-test (P value = 0.00). Brachypodium distachyon has the highest GC percentage of all taxa sampled.
K-means Clustering of Taxa Exhibiting Bimodal Total GC Composition Distribution
| 0.472 | 0.633 | |
| 0.460 | 0.651 | |
| 0.500 | 0.659 | |
| 0.460 | 0.632 | |
| 0.466 | 0.616 | |
| 0.483 | 0.661 | |
| 0.491 | 0.659 |
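As an illustration of how a bimodal GC distribution can be split into a low and a high component, the sketch below runs k-means (Lloyd's algorithm) with k = 2 on one-dimensional values. It is a generic implementation under stated assumptions, not the study's code: the function name, the deterministic initialization, and the toy GC values are all hypothetical.

```python
def kmeans_1d(values, k=2, iters=100):
    """Lloyd's algorithm on scalar values; returns the sorted cluster centers."""
    vals = sorted(values)
    # Deterministic start (an assumption of this sketch): pick initial
    # centers spread evenly through the sorted data.
    centers = [vals[(2 * i + 1) * len(vals) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in vals:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster;
        # an empty cluster keeps its previous center.
        updated = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centers:  # converged
            break
        centers = updated
    return sorted(centers)


# Hypothetical per-gene GC fractions with two well-separated modes.
toy_gc = [0.44, 0.46, 0.48, 0.62, 0.64, 0.66]
low_center, high_center = kmeans_1d(toy_gc, k=2)
```

With real data, the two returned centers would summarize the low and high GC modes of a taxon's gene set.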
Counts of Retained and Lost Paralogs in GC-Classed Orthogroups for Sorghum and Rice with Chi-Squared Test Results
| Synteny | Rho retained duplicate | 35 | 279 | 86 | 80.6502 | <0.00001 |
| Synteny | Rho duplicate lost | 3,627 | 6,491 | 2,509 | ||
| Synteny | Sigma retained duplicate | 0 | 24 | 4 | 14.4838 | 0.000716 |
| Synteny | Sigma duplicate lost | 3,662 | 6,746 | 2,591 | ||
| Synteny | Tau retained duplicate | 5 | 53 | 20 | 18.2902 | 0.000107 |
| Synteny | Tau duplicate lost | 3,657 | 6,717 | 2,575 | ||
| Gene trees | Rho retained duplicate | 51 | 385 | 133 | 109.3626 | <0.00001 |
| Gene trees | Rho duplicate lost | 3,611 | 6,385 | 2,462 | ||
| Gene trees | Sigma retained duplicate | 15 | 163 | 48 | 55.9048 | <0.00001 |
| Gene trees | Sigma duplicate lost | 3,647 | 6,607 | 2,547 | ||
| Gene trees | Tau retained duplicate | 75 | 235 | 95 | 19.256 | 0.000066 |
| Gene trees | Tau duplicate lost | 3,587 | 6,535 | 2,500 |
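The chi-squared values in the table above can be checked with a standard test of independence on each pair of rows, treated as a 2 × 3 contingency table (retained or lost, by the three GC-class count columns). The sketch below is a minimal pure-Python version; the 2 × 3 layout is an assumption inferred from the table, and the function names are hypothetical. For 2 degrees of freedom, the upper-tail probability of the chi-squared distribution has the closed form exp(-x/2), which avoids any external dependency.

```python
import math


def chi2_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf_dof2(stat):
    """Upper-tail probability for chi-squared with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Counts copied from the synteny rows of the table above.
rho_synteny = [[35, 279, 86], [3627, 6491, 2509]]
sigma_synteny = [[0, 24, 4], [3662, 6746, 2591]]

rho_stat, rho_dof = chi2_independence(rho_synteny)        # table reports 80.6502
sigma_stat, sigma_dof = chi2_independence(sigma_synteny)  # table reports 14.4838
sigma_p = chi2_sf_dof2(sigma_stat)                        # table reports 0.000716
```

Under this layout the statistic for the rho and sigma synteny rows agrees with the tabulated values to rounding, which suggests the reported tests were run on tables of this shape.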