Jeffery P Demuth, Tijl De Bie, Jason E Stajich, Nello Cristianini, Matthew W Hahn.
Abstract
Gene families are groups of homologous genes that are likely to have highly similar functions. Differences in family size due to lineage-specific gene duplication and gene loss may provide clues to the evolutionary forces that have shaped mammalian genomes. Here we analyze the gene families contained within the whole genomes of human, chimpanzee, mouse, rat, and dog. In total we find that more than half of the 9,990 families present in the mammalian common ancestor have either expanded or contracted along at least one lineage. Additionally, we find that a large number of families are completely lost from one or more mammalian genomes, and a similar number of gene families have arisen subsequent to the mammalian common ancestor. Along the lineage leading to modern humans we infer the gain of 689 genes and the loss of 86 genes since the split from chimpanzees, including changes likely driven by adaptive natural selection. Our results imply that humans and chimpanzees differ by at least 6% (1,418 of 22,000 genes) in their complement of genes, which stands in stark contrast to the oft-cited 1.5% difference between orthologous nucleotide sequences. This genomic "revolving door" of gene gain and loss represents a large number of genetic differences separating humans from our closest relatives.
Year: 2006 PMID: 17183716 PMCID: PMC1762380 DOI: 10.1371/journal.pone.0000085
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Distribution of gene gain and loss among mammalian lineages.
Numbers in parentheses report number of genes gained or lost on each branch.
Pie charts near branches show the proportion of families that expanded (green), contracted (red), or did not change (blue).
The large pie chart shows the proportion of all families that change (orange), or remain constant (blue) across all lineages.
Changes along long branches and on the ingroup branch may be underestimates because of multiple gains and losses within individual families and/or a lack of phylogenetic resolution.
Numbers of genes and gene families in mammals.
| | Chimp | Human | Mouse | Rat | Dog | Primate | Rodent | Ingroup | MRCA | Total # Unique |
|---|---|---|---|---|---|---|---|---|---|---|
| Total families | 9,693 | 10,349 | 11,410 | 9,969 | 9,663 | | | | | 15,389 |
| Total genes | 20,947 | 22,763 | 24,502 | 22,557 | 18,213 | | | | | |
| Annotation artifacts | 94 | 612 | 1,674 | 500 | 234 | | | | | 3,114 |
| Creations | 2 | 20 | 40 | 15 | 6 | 1,714 | 488 | 0 | | 2,285 |
| Families (likelihood analysis) | 9,597 | 9,717 | 9,696 | 9,454 | 9,423 | 9,766 | 9,847 | 9,990 | 9,990 | 9,990 |
| Genes (likelihood analysis) | 18,660 | 19,966 | 21,763 | 21,155 | 17,962 | 19,363 | 20,920 | 19,525 | 19,513 | |
| Genes/Family (likelihood analysis) | 1.94 | 2.05 | 2.24 | 2.24 | 1.91 | 1.98 | 2.12 | 1.95 | 1.95 | |
Data are based on the genome annotations in Ensembl v41. Chimp, human, mouse, rat, and dog values are observed, whereas Primate, Rodent, Ingroup, and MRCA values are inferred from our analysis. Taxon labels correspond to branches in Figure 1. Annotation artifacts: the number of families represented by a single gene and found only in a single species. Creations: gene families that appear to have arisen subsequent to the MRCA. Families, Genes, and Genes/Family indicate the numbers used in the likelihood analysis.
Changes in gene family size along each branch in the phylogenetic tree.
| Branch | Branch Length | Families | Expansions: Families | Expansions: Genes | genes/expansion | Contractions: Families | Contractions: Genes | genes/contraction | Extinctions: Families | Extinctions: Genes | genes/extinction | No Change | Avg. Exp. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | 6 | 9,717 | 414 | 689 | 1.66 | 86 | 86 | 1 | 49 | 49 | 1 | 9,217 | 0.062 |
| Chimp | 6 | 9,597 | 25 | 26 | 1.04 | 546 | 729 | 1.34 | 169 | 172 | 1.02 | 9,026 | −0.073 |
| Mouse | 17 | 9,696 | 714 | 1,405 | 1.97 | 453 | 562 | 1.24 | 151 | 163 | 1.08 | 8,529 | 0.087 |
| Rat | 17 | 9,454 | 673 | 1,355 | 2.01 | 940 | 1,120 | 1.19 | 393 | 403 | 1.03 | 7,841 | 0.025 |
| Primate | 81 | 9,766 | 453 | 870 | 1.92 | 621 | 1,032 | 1.66 | 224 | 240 | 1.07 | 8,692 | −0.017 |
| Rodent | 70 | 9,847 | 514 | 1,773 | 3.45 | 338 | 378 | 1.12 | 143 | 144 | 1.01 | 8,995 | 0.142 |
| Ingroup | 6 | 9,990 | 8 | 16 | 2 | 4 | 4 | 1 | 0 | 0 | 0 | 9,978 | 0.001 |
| Dog | 93 | 9,423 | 395 | 607 | 1.54 | 1,336 | 2,165 | 1.62 | 567 | 607 | 1.07 | 7,692 | −0.165 |

Avg. Exp.: Average Expansion
Branches are labeled according to Figure 1, and branch lengths are in millions of years. Families, genes, and genes per family indicate the total number for each category of change. Extinctions are a subset of contractions. Average expansion = (total genes gained – total genes lost)/n, where n is the number of families on the branch; a negative average expansion indicates a net reduction in gene number.
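As a check on the average expansion formula, the following minimal Python sketch recomputes the value for each branch from the tabulated gene gains and losses. It takes n to be the per-branch family count given in the table, which is an assumption here, although it reproduces the tabulated values to three decimals. The names `BRANCHES` and `average_expansion` are illustrative only and do not come from the original analysis. The last lines note that human gains plus chimp losses sum to 1,418, the figure quoted in the abstract; reading the abstract's number this way is likewise an assumption.

```python
# Values copied from the table above: branch -> (families n, genes gained, genes lost).
# Treating the per-branch family count as n is an assumption of this sketch.
BRANCHES = {
    "Human": (9717, 689, 86),
    "Chimp": (9597, 26, 729),
    "Mouse": (9696, 1405, 562),
    "Rat": (9454, 1355, 1120),
    "Primate": (9766, 870, 1032),
    "Rodent": (9847, 1773, 378),
    "Ingroup": (9990, 16, 4),
    "Dog": (9423, 607, 2165),
}


def average_expansion(families, gained, lost):
    """Net change in gene number per family: (gained - lost) / n."""
    return (gained - lost) / families


for branch, (families, gained, lost) in BRANCHES.items():
    print(f"{branch:8s} {average_expansion(families, gained, lost):+.3f}")

# Human gains plus chimp losses since the human-chimp split (assumed
# interpretation of the 1,418-gene difference quoted in the abstract).
human_chimp_difference = BRANCHES["Human"][1] + BRANCHES["Chimp"][2]
print(human_chimp_difference, f"{human_chimp_difference / 22000:.1%}")
```

A negative value, as for Chimp and Dog, means that more genes were lost than gained along that branch.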
Figure 2. Rapidly evolving gene families in mammals.
Horizontal bars indicate the relative rates of change among taxa for each family (where rate is the change in number of genes per million years).
The top bar indicates the hypothetical case where each lineage has an equal evolutionary rate.
Bars are partitioned by the color codes assigned to the taxon names at the far right.
Boxes immediately right of each bar indicate whether changes in that family are expansions (+) or contractions (−) in each taxon.
Each column of boxes represents a single taxon, color coded in the same order as the bars.
Vertical bars on the right span families that were found to be least likely in the same lineage under the random model of gene gain and loss.
Families with ambiguous or unknown function are left out of the figure to improve legibility.
The complete statistical results are presented in Table S7.