Jonathan E Dickerson, David L Robertson.
Abstract
Over 3,000 human diseases are known to be linked to heritable genetic variation, mapping to over 1,700 unique genes. Dating of the evolutionary age of these disease-associated genes has suggested that they have a tendency to be ancient, specifically coming into existence with early metazoa. The approach taken by past studies, however, assumes that the age of a disease is the same as the age of its common ancestor, ignoring the fundamental contribution of duplication events in the evolution of new genes and function. Here, we date both the common ancestor and the duplication history of known human disease-associated genes. We find that the majority of disease genes (80%) are genes that have been duplicated in their evolutionary history. Periods for which there are more disease-associated genes, for example, at the origins of bony vertebrates, are explained by the emergence of more genes at that time, and the majority of these are duplicates inferred to have arisen by whole-genome duplication. These relationships are similar across different disease types and across the cellular functions of the disease-associated genes. This indicates that the emergence of duplication-associated diseases has been ongoing and approximately constant (relative to the retention of duplicate genes) throughout the evolution of life. This continued until approximately 390 Ma, from which time relatively fewer novel genes came into existence on the human lineage, let alone disease genes. For single-copy genes associated with disease, we find that the number of disease genes decreases with recency. For the majority of duplicates, the disease-associated mutation is associated with just one of the duplicate copies. A universal explanation for heritable disease is, thus, that it is merely a by-product of the evolutionary process; the evolution of new genes (de novo or by duplication) results in the potential for new diseases to emerge.
Year: 2011 PMID: 21705381 PMCID: PMC3709195 DOI: 10.1093/molbev/msr111
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Summary of Disease-Associated, or Not, Gene Counts for Different Taxonomic Levels for MRD Nodes, DCA, and SCA
| Taxonomic Level | MRD Disease | MRD Nondisease | DCA Disease | DCA Nondisease | SCA Disease | SCA Nondisease |
| --- | --- | --- | --- | --- | --- | --- |
| | 63 | 804 | 28 | 206 | 0 | 0 |
| Homininae | 5 | 233 | 1 | 81 | 0 | 29 |
| Hominidae | 10 | 179 | 4 | 62 | 0 | 46 |
| Catarrhini | 13 | 393 | 6 | 116 | 0 | 8 |
| Primates | 2 | 51 | 0 | 17 | 0 | 32 |
| Eutheria | 55 | 941 | 13 | 558 | 3 | 367 |
| Theria | 14 | 366 | 2 | 234 | 4 | 160 |
| Mammalia | 28 | 320 | 6 | 304 | 5 | 93 |
| Amniota | 17 | 293 | 17 | 439 | 9 | 223 |
| Tetrapoda | 31 | 233 | 17 | 250 | 3 | 73 |
| Euteleostomi | 806 | 6,141 | 367 | 3,696 | 51 | 760 |
| Chordata | 73 | 543 | 125 | 838 | 33 | 441 |
| Bilateria | 206 | 1,550 | 734 | 5,232 | 81 | 815 |
| Fungi/Metazoa | 1 | 8 | 4 | 22 | 142 | 1,004 |
| Total | 1,324 | 12,055 | 1,324 | 12,055 | 331 | 4,051 |
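The counts in the table can be checked against the abstract's 80% figure with a few lines of arithmetic. The following is a minimal sketch, not the authors' pipeline. It assumes that MRD and DCA are two datings of the same set of duplicated genes (their totals are identical) and that SCA covers the single-copy genes; the function names and the `(unlabeled)` key for the row whose label is missing are hypothetical.

```python
# Rows copied from the table:
# level -> (MRD disease, MRD nondisease, DCA disease, DCA nondisease,
#           SCA disease, SCA nondisease).
# The first row has no label in the source, so it is keyed "(unlabeled)".
COUNTS = {
    "(unlabeled)": (63, 804, 28, 206, 0, 0),
    "Homininae": (5, 233, 1, 81, 0, 29),
    "Hominidae": (10, 179, 4, 62, 0, 46),
    "Catarrhini": (13, 393, 6, 116, 0, 8),
    "Primates": (2, 51, 0, 17, 0, 32),
    "Eutheria": (55, 941, 13, 558, 3, 367),
    "Theria": (14, 366, 2, 234, 4, 160),
    "Mammalia": (28, 320, 6, 304, 5, 93),
    "Amniota": (17, 293, 17, 439, 9, 223),
    "Tetrapoda": (31, 233, 17, 250, 3, 73),
    "Euteleostomi": (806, 6141, 367, 3696, 51, 760),
    "Chordata": (73, 543, 125, 838, 33, 441),
    "Bilateria": (206, 1550, 734, 5232, 81, 815),
    "Fungi/Metazoa": (1, 8, 4, 22, 142, 1004),
}

# Column offset of the "disease" count for each dating scheme.
SCHEMES = {"MRD": 0, "DCA": 2, "SCA": 4}


def totals(scheme):
    """Return (disease, nondisease) gene totals for one dating scheme."""
    i = SCHEMES[scheme]
    disease = sum(row[i] for row in COUNTS.values())
    nondisease = sum(row[i + 1] for row in COUNTS.values())
    return disease, nondisease


def disease_fraction(scheme, level):
    """Fraction of genes at a level that are disease-associated, or None if empty."""
    i = SCHEMES[scheme]
    disease, nondisease = COUNTS[level][i], COUNTS[level][i + 1]
    total = disease + nondisease
    return disease / total if total else None


def duplicated_share():
    """Share of disease genes that are duplicates (assumes MRD = duplicates, SCA = single-copy)."""
    duplicated, _ = totals("MRD")
    single_copy, _ = totals("SCA")
    return duplicated / (duplicated + single_copy)


if __name__ == "__main__":
    for scheme in SCHEMES:
        print(scheme, totals(scheme))
    print("duplicated share of disease genes:", duplicated_share())
```

Under these assumptions the column sums reproduce the Total row, and 1,324 of 1,655 disease genes are duplicates, which matches the 80% stated in the abstract.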
Figure. The association of disease-associated genes with evolutionary history. Distribution of disease-associated genes for (A) SCA, (C) DCA, and (E) MRD over time. The proportions of duplicates attributed to whole-genome duplication (Makino and McLysaght 2010) are shown (hashed lines) for Euteleostomi only, as these proportions were ≤ 5% for other periods (Supplementary fig. S3, Supplementary Material online). Null distribution: random genes were selected for the distributions of MRD, DCA, and SCA genes, maintaining counts, from their respective nondisease–associated gene lists; this was repeated 10,000 times and the lower and upper quantiles (2.5% and 97.5%, respectively) of these distributions are shown as error bars. Taxonomic levels are indicated on the x axis of panel E and approximate evolutionary time below this. The proportions of disease-associated genes versus nondisease–associated genes for each taxonomic level are also shown for SCA (B), DCA (D), and MRD (F); polynomial regression trend lines (degree = 2) are shown in each case: SCA: R² = 0.93, F statistic = 78.97, P value = 3.0 × 10⁻⁷; DCA: R² = 0.98, F = 287.5, P value = 3.2 × 10⁻¹⁰; and MRD: R² = 0.99, F = 558, P value = 8.8 × 10⁻¹². (G) Ratios of the proportions of disease-associated SCA (red), DCA (green), and MRD (blue) among all SCA, DCA, and MRD, respectively, in each taxonomic level over approximate evolutionary time.
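The null distribution described in this caption can be approximated from the published counts alone. The sketch below is an illustration under stated assumptions, not the authors' code: it draws as many genes as there are disease-associated SCA genes (331) from the nondisease SCA pool, without replacement, and records how many fall in each taxonomic level. The function name `null_intervals`, the default of 1,000 repetitions (the caption reports 10,000), and the simple order-statistic quantiles are all choices made here for brevity.

```python
import random

# Nondisease SCA gene counts per taxonomic level, copied from the table.
# The unlabeled first row has zero SCA genes and is omitted.
NONDISEASE_SCA = {
    "Homininae": 29,
    "Hominidae": 46,
    "Catarrhini": 8,
    "Primates": 32,
    "Eutheria": 367,
    "Theria": 160,
    "Mammalia": 93,
    "Amniota": 223,
    "Tetrapoda": 73,
    "Euteleostomi": 760,
    "Chordata": 441,
    "Bilateria": 815,
    "Fungi/Metazoa": 1004,
}
N_DISEASE_SCA = 331  # total disease-associated SCA genes in the table


def null_intervals(pool_counts, n_draw, n_reps=1000, seed=0):
    """Return {level: (2.5% quantile, 97.5% quantile)} of resampled counts.

    Each repetition samples n_draw genes without replacement from the
    nondisease pool and tallies them by taxonomic level.
    """
    rng = random.Random(seed)
    levels = list(pool_counts)
    # One label per gene in the nondisease pool.
    pool = [level for level in levels for _ in range(pool_counts[level])]
    draws = {level: [] for level in levels}
    for _ in range(n_reps):
        tally = dict.fromkeys(levels, 0)
        for level in rng.sample(pool, n_draw):
            tally[level] += 1
        for level in levels:
            draws[level].append(tally[level])
    intervals = {}
    for level in levels:
        ordered = sorted(draws[level])
        # Simple order-statistic approximation of the quantiles.
        low = ordered[int(0.025 * (n_reps - 1))]
        high = ordered[int(0.975 * (n_reps - 1))]
        intervals[level] = (low, high)
    return intervals


if __name__ == "__main__":
    for level, (low, high) in null_intervals(NONDISEASE_SCA, N_DISEASE_SCA).items():
        print(f"{level}: {low} to {high}")
```

The observed disease-associated counts from the table (for example, 142 SCA genes at Fungi/Metazoa) can then be compared against the interval for the same level. The same function applies to the MRD and DCA columns by substituting their nondisease counts and a draw size of 1,324.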
Figure. The evolution of disease types. Disease class frequencies for disease-associated genes for (A) orthologs and (B) paralogs for each taxonomic level. Disease classes correspond to high-level categories.
Figure. Effect of positive selection on disease-associated genes. Mean dN/dS between Homo sapiens and Pan troglodytes for disease-associated (green triangles) and nondisease–associated (blue circles) orthologs (A) and paralogs (B) in each taxonomic level. Category axis labels correspond to each taxonomic level. Inset bar chart displays the percentage of both disease-associated and nondisease–associated genes in each taxonomic level.
Figure. (A) Frequencies of disease genes associated with different sizes of gene families and (B) frequencies of unique diseases associated with the same genes for SCA and MRD. The proportions of duplicates attributed to whole-genome duplication (Makino and McLysaght 2010) are shown for panel A (hashed lines).