Tine Blomme, Klaas Vandepoele, Stefanie De Bodt, Cedric Simillion, Steven Maere, Yves Van de Peer.
Abstract
BACKGROUND: Gene duplication is assumed to have played a crucial role in the evolution of vertebrate organisms. Apart from a continuous mode of duplication, two or three whole genome duplication events have been proposed during the evolution of vertebrates, one or two at the dawn of vertebrate evolution, and an additional one in the fish lineage, not shared with land vertebrates. Here, we have studied gene gain and loss in seven different vertebrate genomes, spanning an evolutionary period of about 600 million years.
Year: 2006 PMID: 16723033 PMCID: PMC1779523 DOI: 10.1186/gb-2006-7-5-r43
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Hypothetical examples of phylogenetic trees with duplication and gene loss events. The phylogenetic trees were inferred from a gene family including members of all genomes used in the current study (human, HS; mouse, MM; rat, RN; chicken, GG; frog, XT; zebrafish, DR; Tetraodon, TN; Ciona, CI). All nodes are assumed to be supported by >70% in bootstrap analysis. Gene duplication can be recognized if at least two gene copies are present for the same species. (a) The duplication event (represented by a pink diamond) was inferred to have occurred early in vertebrate evolution because both land vertebrates and fishes have two copies of the gene. This is the most likely explanation, since the alternative assumption, where all lineages have undergone separate gene duplication events, is much less parsimonious. Subsequently, a gene loss event can be inferred for Tetraodon, since gene2 is missing (dotted line). The general conclusion of this hypothetical tree is thus one gene loss event of a TN duplicate that had first been created in the common ancestor of land vertebrates and fishes. For all other genomes, we count two retained duplicates after this ancient duplication event. (b) This more complex phylogenetic tree contains three duplication events (again indicated by diamonds). The oldest duplication event (pink diamond) is dated early in vertebrate evolution (TP13, similar to the one in (a)). HS, MM, RN and GG lost gene2, which is interpreted as gene loss of a TP13 duplicate in the common ancestor of these organisms (thus at TP7; Figure 2). GG also lost gene1, a gene loss event at TP6 of a duplicate that originated at TP13. The orange diamond indicates a duplication event in the common ancestor of the fishes, not shared with land vertebrates (TP12), resulting in gene1 and gene1' for both DR and TN. Finally, DR gene1 and gene1" are the result of a species-specific duplication event in DR.
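The parsimony reasoning in this legend can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes a fixed species tree (mouse and rat as sister species, then human, chicken, frog, the two fishes, and Ciona as outgroup), and the internal node names (`rodents`, `amniotes`, and so on) and function names are hypothetical labels chosen for this example. A duplication is dated to the last common ancestor of all species that carry either copy, and each loss is placed on the deepest branch whose descendants all lack the copy.

```python
# Assumed species tree as child -> parent. Internal node names are
# hypothetical labels, not the TPx identifiers used in the paper.
PARENT = {
    "MM": "rodents", "RN": "rodents",
    "rodents": "mammals", "HS": "mammals",
    "mammals": "amniotes", "GG": "amniotes",
    "amniotes": "tetrapods", "XT": "tetrapods",
    "DR": "fishes", "TN": "fishes",
    "tetrapods": "vertebrates", "fishes": "vertebrates",
    "vertebrates": "root", "CI": "root",
}


def children(node):
    """Direct children of a node, in the order they were declared."""
    return [c for c, p in PARENT.items() if p == node]


def ancestors(node):
    """Path from a node up to the root, starting with the node itself."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lca(species):
    """Last common ancestor of a set of species in the species tree."""
    species = list(species)
    common = ancestors(species[0])
    for s in species[1:]:
        on_path = set(ancestors(s))
        common = [n for n in common if n in on_path]
    return common[0]


def leaves_under(node):
    """All extant species descending from a node."""
    kids = children(node)
    if not kids:
        return {node}
    found = set()
    for kid in kids:
        found |= leaves_under(kid)
    return found


def loss_events(node, present):
    """Deepest branches below `node` on which a gene copy must have been lost."""
    if not (leaves_under(node) & present):
        return [node]  # the whole clade lacks the copy: count one loss here
    events = []
    for kid in children(node):
        events += loss_events(kid, present)
    return events


def date_duplication(copy1, copy2):
    """Date a duplication and list the inferred losses for each copy."""
    node = lca(copy1 | copy2)
    return node, loss_events(node, copy1), loss_events(node, copy2)


# Panel (a): gene2 is missing only in Tetraodon.
all_vertebrates = {"HS", "MM", "RN", "GG", "XT", "DR", "TN"}
panel_a = date_duplication(all_vertebrates, all_vertebrates - {"TN"})

# Panel (b), oldest duplication only: gene1 is missing in chicken,
# gene2 is missing in human, mouse, rat and chicken.
panel_b = date_duplication(all_vertebrates - {"GG"}, {"XT", "DR", "TN"})
```

Under these assumptions, panel (a) yields a duplication at the vertebrate ancestor with a single loss in TN, and panel (b) yields one loss of gene1 in GG and one loss of gene2 on the branch leading to the common ancestor of HS, MM, RN and GG, which the legend calls TP7.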
Figure 2. Gene duplications and gene losses mapped on the vertebrate tree. The vertebrate tree is shown with branch lengths proportional to time. The divergence times were taken from [35,68,69]. Abbreviations of species names are as in Figure 1. The numbers in colored circles indicate the different time points analyzed, referred to in the text as TPx. The total number of inferred duplications at each time point (TP) is shown in italics. The (negative) bars on the plots (with gray background) show the fraction of genes that were lost again after having been created in a specific duplication event (indicated in colors corresponding to the time points (TP)). The total amount of gene loss for each organism is indicated under the species name.
Figure 3. Origin of duplicates in different vertebrates. The number of duplicates and their origin (in the vertebrate tree) are shown for all organisms analyzed in the current study. (a) The absolute number of duplicated genes; (b) the relative contribution of the origin of duplicates to the total duplicate content of each vertebrate genome. Colors correspond to the duplication events indicated in Figure 2. Pink represents genes of which a major fraction is assumed to have been created during 1R/2R, while orange refers to the fraction of genes of which many are assumed to have been created during the FSGD.
Number of genes in genomes, gene families, phylogenetic trees, and trees with GOslim annotation
| Genome | 22,218 | | 21,952 | | 24,461 | | 17,709 | | 24,405 | | 22,877 | | 28,005 | |
| Gene families | 14,054 | (0.63) | 14,155 | (0.64) | 14,813 | (0.61) | 9,875 | (0.56) | 13,336 | (0.55) | 14,597 | (0.64) | 12,373 | (0.44) |
| Phylogenetic trees | 13,080 | (0.59) | 12,537 | (0.57) | 13,325 | (0.54) | 9,292 | (0.52) | 11,747 | (0.48) | 13,785 | (0.60) | 11,660 | (0.42) |
| Phylogenetic trees with GO annotation | 12,470 | (0.56) | 11,919 | (0.54) | 12,669 | (0.52) | 8,806 | (0.50) | 11,049 | (0.45) | 13,244 | (0.58) | 11,097 | (0.40) |
The fraction of the proteome analyzed at a certain step in the procedure is in parentheses (see Materials and methods for details).
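The parenthesized fractions can be reproduced by dividing each count by the genome size in the same column. The short Python sketch below does this for the three rows of the table; columns are addressed by position because the species headers are not present in this copy, and the variable names are chosen for this example only.

```python
# Gene counts per genome, in the column order of the table above.
genome = [22218, 21952, 24461, 17709, 24405, 22877, 28005]

# Counts retained at each step of the procedure, same column order.
steps = {
    "Gene families": [14054, 14155, 14813, 9875, 13336, 14597, 12373],
    "Phylogenetic trees": [13080, 12537, 13325, 9292, 11747, 13785, 11660],
    "Phylogenetic trees with GO annotation":
        [12470, 11919, 12669, 8806, 11049, 13244, 11097],
}

# Fraction of the proteome analyzed at each step, rounded to two decimals.
fractions = {
    step: [round(count / total, 2) for count, total in zip(counts, genome)]
    for step, counts in steps.items()
}

for step, values in fractions.items():
    print(step, values)
```

The rounded values agree with the fractions given in parentheses in the table.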
Excess of gene retention in parts of the vertebrate tree
| GOslim label, category, description | Organism | TPs showing significant difference | TP with highest number of duplicates | q-value |
| GO:0006118, BP, electron transport | | TP13 vs TP1 | TP1 | 6.73E-05 |
| | | TP12 vs TP10 | TP10 | 2.50E-04 |
| | | TP13 vs TP10 | TP10 | 8.60E-06 |
| GO:0006519, BP, amino acid and derivative metabolism | | TP13 vs TP1 | TP1 | 8.69E-04 |
| | | TP12 vs TP10 | TP10 | 5.37E-04 |
| | | TP13 vs TP10 | TP10 | 1.47E-08 |
| GO:0007165, BP, signal transduction | | TP13 vs TP11 | TP13 | 5.76E-04 |
| | | TP13 vs TP8 | TP13 | 9.18E-04 |
| | | TP13 vs TP2 | TP13 | 1.44E-17 |
| | | TP13 vs TP3 | TP13 | 8.32E-19 |
| | | TP13 vs TP1 | TP13 | 2.43E-02 |
| | | TP12 vs TP10 | TP10 | 3.60E-16 |
| | | TP13 vs TP10 | TP10 | 6.40E-12 |
| GO:0003677, MF, DNA binding | | TP13 vs TP8 | TP13 | 4.38E-02 |
| | | TP13 vs TP2 | TP13 | 1.20E-08 |
| | | TP13 vs TP3 | TP13 | 1.86E-07 |
| | | TP12 vs TP10 | TP10 | 1.51E-10 |
| | | TP13 vs TP10 | TP10 | 3.99E-10 |
| GO:0004872, MF, receptor activity | | TP13 vs TP11 | TP13 | 6.34E-03 |
| | | TP13 vs TP8 | TP13 | 5.83E-03 |
| | | TP13 vs TP2 | TP13 | 2.26E-11 |
| | | TP13 vs TP3 | TP13 | 7.47E-09 |
| | | TP12 vs TP10 | TP10 | 1.01E-08 |
| | | TP13 vs TP10 | TP10 | 7.37E-05 |
| GO:0006464, BP, protein modification | | TP13 vs TP2 | TP13 | 8.16E-12 |
| | | TP13 vs TP3 | TP13 | 7.26E-07 |
| | | TP13 vs TP1 | TP13 | 2.43E-02 |
| | | TP12 vs TP10 | TP10 | 1.00E-09 |
| | | TP13 vs TP10 | TP10 | 2.63E-07 |
| GO:0005515, MF, protein binding | | TP13 vs TP11 | TP13 | 5.76E-04 |
| | | TP13 vs TP8 | TP13 | 1.01E-02 |
| | | TP13 vs TP2 | TP13 | 2.31E-12 |
| | | TP13 vs TP3 | TP13 | 3.72E-08 |
| | | TP12 vs TP10 | TP10 | 7.78E-07 |
| | | TP13 vs TP10 | TP10 | 4.01E-06 |
| GO:0007275, BP, development | | TP13 vs TP11 | TP13 | 3.71E-02 |
| | | TP13 vs TP8 | TP13 | 4.50E-02 |
| | | TP13 vs TP2 | TP13 | 7.64E-08 |
| | | TP13 vs TP3 | TP13 | 1.49E-06 |
| | | TP13 vs TP1 | TP13 | 1.88E-02 |
| | | TP12 vs TP10 | TP10 | 1.18E-06 |
| | | TP13 vs TP10 | TP10 | 1.06E-03 |
| GO:0006350, BP, transcription | | TP13 vs TP11 | TP13 | 1.97E-02 |
| | | TP13 vs TP2 | TP13 | 7.13E-11 |
| | | TP13 vs TP3 | TP13 | 3.82E-11 |
| | | TP12 vs TP10 | TP10 | 1.54E-20 |
| | | TP13 vs TP10 | TP10 | 4.11E-11 |
| GO:0003723, MF, RNA binding | | TP13 vs TP2 | TP2 | 4.71E-02 |
| | | TP13 vs TP1 | TP1 | 9.08E-03 |
| | | TP12 vs TP10 | TP10 | 9.36E-04 |
| | | TP13 vs TP10 | TP10 | 3.42E-05 |
| GO:0006811, BP, ion transport | | TP13 vs TP11 | TP13 | 1.11E-02 |
| | | TP13 vs TP8 | TP13 | 1.79E-02 |
| | | TP13 vs TP2 | TP13 | 1.23E-11 |
| | | TP13 vs TP3 | TP13 | 1.34E-06 |
| | | TP13 vs TP1 | TP13 | 3.84E-02 |
| | | TP12 vs TP10 | TP10 | 8.52E-03 |
| | | TP13 vs TP10 | TP10 | 1.00E-02 |
The GOslim label, its category (MF, molecular function; BP, biological process) and the general description are shown. For each organism (HS, human; MM, mouse; RN, rat; GG, chicken; XT, frog; DR, zebrafish; TN, Tetraodon), the number of species-specific duplicates was compared to the number of duplicates from time points (TPs) coinciding with WGDs (TP12 and TP13). The time points showing a significant difference (q < 0.05) in comparison are shown (TPx vs TPy), followed by the time point with the highest number of duplicates. The last column shows the q-value. Only significant results that are discussed in the text are listed (others can be found in Additional data file 1; Table S2).
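The legend does not say how the q-values were obtained. As an illustration of the q < 0.05 cut-off only, the sketch below applies the Benjamini-Hochberg adjustment, which is one common way to turn a list of p-values into FDR-adjusted values; whether the authors used this procedure is an assumption, and the input p-values and the function name `bh_qvalues` are invented for the example.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four TPx vs TPy comparisons (not from the paper).
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = bh_qvalues(pvals)

# Keep only comparisons passing the threshold used in the table legend.
significant = [i for i, q in enumerate(qvals) if q < 0.05]
```

With these made-up inputs only the first comparison passes the threshold, even though three of the raw p-values are below 0.05, which shows why a corrected cut-off is stricter than an uncorrected one.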
Figure 4. Retention of duplicates in human and zebrafish following WGDs and small-scale duplications for four different functional categories. The retention of duplicates in (a,b) human and (c,d) zebrafish following WGD events (assumed at TP12 and TP13) versus small-scale duplication events for the GOslim ontologies 'biotic stimulus' (BS), 'signal transduction' (SD), 'transcription' (TR), and 'metabolism' (MET). Color coding corresponds to the time points in Figure 2. (a,c) Absolute numbers of retained duplicates. (b,d) Relative numbers of retained duplicates normalized for the total amount of duplicates in the genome.
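The normalization used for panels (b,d) amounts to dividing each absolute count by the total number of duplicates in the genome. A minimal Python sketch follows, with placeholder counts that are not taken from the paper and a hypothetical function name.

```python
def normalize_retention(retained, total_duplicates):
    """Express retained-duplicate counts as fractions of all duplicates."""
    if total_duplicates <= 0:
        raise ValueError("total_duplicates must be positive")
    return {label: count / total_duplicates for label, count in retained.items()}


# Placeholder counts for one functional category in one genome
# (illustrative values only).
absolute = {"WGD": 30, "small-scale": 10}
relative = normalize_retention(absolute, total_duplicates=200)
```

Normalizing in this way makes genomes with different overall duplicate content comparable, which the absolute counts in panels (a,c) are not.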