Hugo V S Rody, Gregory J Baute, Loren H Rieseberg, Luiz O Oliveira.
Abstract
BACKGROUND: All extant seed plants are successful paleopolyploids, whose genomes carry duplicate genes that have survived repeated episodes of diploidization. However, the survival of gene duplicates is biased with respect to gene function and mechanism of duplication. Transcription factors, in particular, are reported to be preferentially retained following whole-genome duplications (WGDs), but disproportionately lost when duplicated by tandem events. An explanation for this pattern is provided by the Gene Balance Hypothesis (GBH), which posits that duplicates of highly connected genes are retained following WGDs to maintain optimal stoichiometry among gene products, whereas duplicates of such connected genes are disfavored following tandem duplications.
Keywords: Biased gene retention; Polyploidy; Transcription factors; Whole-genome duplication
Year: 2017 PMID: 28061859 PMCID: PMC5219802 DOI: 10.1186/s12864-016-3423-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Distribution of paralogous gene pairs for 25 plant species targeted by this study
| Species | Chr | Initial PCG | Duplicates | By type: WGD | By type: Tandem | By type: Undefined | By Ks range: 0 < Ks ≤ 0.5 | 0.5 < Ks ≤ 1 | 1 < Ks ≤ 1.5 | 1.5 < Ks ≤ 2 | Ks > 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 16 | 32670 | 6378 | 3442 | 1816 | 1120 | 2251 | 2216 | 966 | 945 | 228 |
|  | 10 | 33602 | 6194 | 2740 | 1232 | 2222 | 1657 | 2407 | 1183 | 947 | 222 |
|  | 26 | 26460 | 3322 | 15 | 998 | 2309 | 1861 | 427 | 402 | 632 | 137 |
|  | 10 | 26678 | 3573 | 1025 | 1768 | 780 | 981 | 1024 | 835 | 733 | 175 |
|  | 18 | 28072 | 1915 | 24 | 455 | 1436 | 603 | 210 | 402 | 700 | 126 |
|  | 22 | 23438 | 2806 | 385 | 1015 | 1406 | 691 | 435 | 751 | 930 | 227 |
|  | 22 | 36449 | 11120 | 390 | 6424 | 4306 | 8106 | 925 | 1029 | 1060 | 240 |
|  | 14 | 34809 | 3974 | 1021 | 1606 | 1347 | 1500 | 979 | 684 | 811 | 184 |
|  | 40 | 46509 | 15242 | 9721 | 2087 | 3434 | 11697 | 1961 | 790 | 794 | 185 |
|  | 34 | 44144 | 9925 | 57 | 995 | 8873 | 4651 | 2805 | 1551 | 918 | 108 |
|  | 12 | 26818 | 2682 | 184 | 627 | 1871 | 1159 | 774 | 415 | 334 | 51 |
|  | 34 | 63515 | 15551 | 2761 | 1308 | 11482 | 13084 | 1258 | 683 | 526 | 107 |
|  | 36 | 30800 | 7134 | 2530 | 703 | 3901 | 4915 | 837 | 716 | 666 | 110 |
|  | 16 | 57587 | 5098 | 1083 | 2419 | 1596 | 2902 | 1262 | 543 | 391 | 115 |
|  | 24 | 48788 | 8349 | 1957 | 2869 | 3523 | 3361 | 2169 | 1665 | 1154 | 317 |
|  | 24 | 59430 | 5559 | 1482 | 2173 | 1904 | 1928 | 1584 | 1233 | 814 | 183 |
|  | 54 | 36137 | 3769 | 306 | 202 | 3261 | 637 | 1848 | 883 | 401 | 99 |
|  | 36 | 41521 | 9721 | 5609 | 1988 | 2124 | 7572 | 738 | 704 | 707 | 147 |
|  | 20 | 31221 | 2558 | 155 | 614 | 1789 | 628 | 435 | 683 | 812 | 176 |
|  | 20 | 34686 | 4267 | 1048 | 1698 | 1521 | 1468 | 1061 | 993 | 745 | 186 |
|  | 24 | 34432 | 7100 | 1234 | 2561 | 3305 | 3184 | 2287 | 872 | 757 | 209 |
|  | 16–27 | 22285 | 1885 | 351 | 608 | 926 | 1457 | 129 | 102 | 197 | 66 |
|  | 20 | 46269 | 3488 | 722 | 1553 | 1213 | 1199 | 601 | 822 | 866 | 201 |
|  | 38 | 26644 | 4536 | 528 | 1935 | 2073 | 1918 | 852 | 1042 | 724 | 128 |
|  | 20 | 39597 | 6336 | 590 | 1396 | 4350 | 3792 | 1095 | 813 | 636 | 153 |
Chr, number of chromosomes; Initial PCG, initial number of protein-coding gene sequences
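The Ks columns of the table group paralogous pairs into half-open ranges. A minimal sketch of that binning is given here for illustration only; the function names `ks_bin` and `count_by_bin` are hypothetical, the labels use ASCII `<=` in place of ≤, and the code is not taken from the study's pipeline.

```python
from collections import Counter

# Bin edges copied from the table's column headers; each bin is (low, high].
KS_BINS = [
    (0.0, 0.5, "0 < Ks <= 0.5"),
    (0.5, 1.0, "0.5 < Ks <= 1"),
    (1.0, 1.5, "1 < Ks <= 1.5"),
    (1.5, 2.0, "1.5 < Ks <= 2"),
]
OVERFLOW_LABEL = "Ks > 2"


def ks_bin(ks):
    """Return the column label for one Ks value, or None when Ks <= 0."""
    for low, high, label in KS_BINS:
        if low < ks <= high:
            return label
    return OVERFLOW_LABEL if ks > 2.0 else None


def count_by_bin(ks_values):
    """Count paralog pairs per Ks range, skipping non-positive values."""
    counts = Counter()
    for ks in ks_values:
        label = ks_bin(ks)
        if label is not None:
            counts[label] += 1
    return counts
```

Calling `count_by_bin` on the list of Ks estimates for one species would yield one count per range, assuming each estimate corresponds to a single paralogous pair.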
Fig. 1 Ks age distributions (a and b) and SiZer maps (c to e) of five plant species. a Brown bars, all paralogs (background); black bars, WGD-derived paralogs predicted by DAGchainer; yellow bars, paralogs annotated as transcription factor activity (GO:0003700). b Brown bars, background; gray bars, tandem-derived paralogs predicted by DAGchainer. SiZer maps for c all paralogs; d WGD-derived paralogs; e transcription factor paralogs
Fig. 2 Heat maps of GO categories across 25 plant species. a The 10 most frequent GO categories overrepresented among WGD-derived paralogs. b Transcription factor activity category (GO:0003700) enrichment analysis for WGD-derived paralogs. c The 10 most frequent GO categories overrepresented among tandem-derived paralogs. Color gradient represents the corrected P value calculated by the ErmineJ software: brown colors, significant over-representation (P < 0.05); yellow colors, reduced or non-significant enrichment; and gray color, no enrichment
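For readers unfamiliar with over-representation testing, a generic one-sided hypergeometric test can be sketched as follows. This is an illustrative assumption about how such a P value may be obtained, not a description of ErmineJ's implementation; the function name and parameter names are hypothetical, and multiple-testing correction is not shown.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """One-sided hypergeometric P value, P(X >= k).

    N: genes in the background set
    K: background genes annotated to the category (e.g. GO:0003700)
    n: genes in the test set (e.g. WGD-derived paralogs)
    k: test-set genes annotated to the category
    """
    upper = min(n, K)  # the overlap cannot exceed either set size
    total = comb(N, n)  # number of ways to draw the test set
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

A category would be called overrepresented when the P value, after whatever correction is applied, falls below the chosen threshold (P < 0.05 in the caption above).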
Fig. 3 Phylogenetic distribution of transcription factor retention biases among 25 plant species. The phylogenetic tree was adapted from PLAZA 3.0. Symbol code: black circles on the tree branches, all known WGD events we also identified in this study; open circles, suggested ancient WGD events we did not examine; triangles, species with WGD-derived transcription factor paralogs significantly overrepresented; pentagons and stars, species with transcription factor paralogs significantly overrepresented in range 1.5 < Ks ≤ 2 and range 1 < Ks ≤ 2, respectively