Tomislav Domazet-Loso, Diethard Tautz.
Abstract
Several thousand genes in the human genome have been linked to a heritable genetic disease. The majority of these appear to be nonessential genes (i.e., their inactivation is not embryonically lethal), and one could therefore speculate that they are late additions in the evolutionary lineage toward humans. Contrary to this expectation, we find that they are in fact significantly overrepresented among the genes that emerged during the early evolution of the metazoa. Using a phylostratigraphic approach, we have studied the evolutionary emergence of such genes at 19 phylogenetic levels. The majority of disease genes were already present in the eukaryotic ancestor, and the second-largest number arose around the time multicellularity evolved. Conversely, genes specific to the mammalian lineage are highly underrepresented. Hence, genes involved in genetic diseases are not simply a random subset of all genes in the genome but are biased toward ancient genes.
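A minimal sketch of the phylostratigraphic assignment, assuming (as in phylostratigraphy generally) that each human gene is placed in the oldest phylostratum in which a homolog is detected. Here phylostratum 1 is the oldest level and higher numbers are younger. The input format, the function names, and the toy genes are hypothetical, and the sequence-similarity search and its cut-off (for example a BLAST e-value threshold) are assumed to have been run beforehand.

```python
from typing import Dict, Iterable, List, Optional


def assign_phylostratum(hit_strata: Iterable[int]) -> Optional[int]:
    """Return the oldest phylostratum (lowest index) holding a homolog.

    `hit_strata` lists the phylostratum index of every taxon in which a
    homolog passed the (assumed, externally applied) similarity cut-off.
    Returns None when no hit was recorded.
    """
    strata = list(hit_strata)
    return min(strata) if strata else None


def count_per_stratum(genes: Dict[str, Iterable[int]], n_strata: int) -> List[int]:
    """Count genes per phylostratum; list element 0 corresponds to stratum 1."""
    counts = [0] * n_strata
    for hits in genes.values():
        ps = assign_phylostratum(hits)
        if ps is not None:
            counts[ps - 1] += 1
    return counts


# Hypothetical toy input: gene name -> strata in which homologs were found.
toy_genes = {
    "geneA": [1, 5, 19],  # homolog already in the oldest level -> stratum 1
    "geneB": [12, 19],    # first detectable at level 12
    "geneC": [19],        # found only in the youngest level
}
print(count_per_stratum(toy_genes, 19))
```

Taking the minimum index means a single distant homolog is enough to make a gene "old", which is why the choice of similarity cut-off matters so much in practice.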
Year: 2008 PMID: 18820252 PMCID: PMC2582983 DOI: 10.1093/molbev/msn214
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure 1. Phylogenetic framework used to trace the origins of human genes. Taxa represented in the databases by complete genomes or by a substantial amount of TRACE and EST data are shown in bold. Taxa in italics are represented in the databases only by small numbers of highly conserved genes; excluding them from the analysis does not change the results.
Figure 2. Phylostratigraphy of all human genes and of different classes of disease-causing genes. The blue line with squares shows the total number of human genes (N = 22,845) found in each phylostratum (note the logarithmic scale on the y axis). The red line with circles shows the distribution of all evaluated disease genes (N = 1,760). The subsample of nonessential disease genes (N = 1,305), the stringent nonessential subsample (N = 1,020), and the genes involved in polygenic traits (N = 149) are also shown. Spearman's rank correlation coefficient between gene count and ranked evolutionary time is given at the top.
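Spearman's coefficient, as reported in the caption, is the Pearson correlation of the two rank vectors. The sketch below computes it with average ranks for ties; the function names and the per-stratum counts in the usage lines are invented for illustration and are not the study's data.

```python
import math
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation computed on the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / math.sqrt(sxx * syy)


# Made-up counts per stratum (oldest first), not the study's data.
strata = [1, 2, 3, 4, 5]
counts = [500, 300, 200, 50, 10]
print(spearman_rho(strata, counts))  # -1.0: counts fall monotonically
```

Because only ranks enter the formula, the coefficient is unaffected by the logarithmic y axis of the plot; a negative value means gene counts drop as strata get younger.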
Comparison of Overrepresented GO Terms among the Disease Gene Set with the Rank Order of the Same Term among the Overrepresented GO Terms in the Whole Gene Set
| PS (phylostratum) | GO Term Overrepresented among Disease Genes | Rank among All Genes |
| --- | --- | --- |
| 1 | Metabolic process | 1 |
| | Electron transport | 4 |
| | Protein amino acid phosphorylation | 3 |
| | Carbohydrate metabolic process | 5 |
| | Phosphate transport | 28 |
| | Proteolysis | 3 |
| | Ion transport | 8 |
| | Transport | 15 |
| | Homophilic cell adhesion | 7 |
| | Epidermis development | n.s. |
| 2 | Regulation of transcription, DNA dependent | 1 |
| | Multicellular organismal development | n.s. |
| | Protein amino acid dephosphorylation | 30 |
| | Ubiquitin cycle | 3 |
| | Regulation of Rho protein signal transduction | 7 |
| | Transcription | 2 |
| | Dephosphorylation | 49 |
| | Organ morphogenesis | n.s. |
| | Forebrain development | n.s. |
| 3 | Regulation of transcription, DNA dependent | 1 |
| | Transcription | 2 |
| 4 | Transcription | 3 |
| | Regulation of transcription, DNA dependent | 2 |
| | Positive regulation of transcription from RNA polymerase II promoter | 4 |
| | Multicellular organismal development | 5 |
| | Wnt receptor signaling pathway, calcium modulating pathway | 1 |
| 5 | G-protein–coupled receptor protein signaling pathway | 1 |
| | Signal transduction | 3 |
| | Activation of adenylate cyclase activity | 15 |
| | Protein–chromophore linkage | 16 |
| | Regulation of transcription | 9 |
| | Brown fat cell differentiation | 41 |
| | G-protein signaling, coupled to cyclic nucleotide second messenger | 5 |
| | G-protein signaling, coupled to IP3 second messenger (phospholipase C activating) | 7 |
| | Sensory perception of taste | n.s. |
| | Diet-induced thermogenesis | 34 |
| | G-protein signaling, coupled to cAMP nucleotide second messenger | 14 |
| | Heat generation | 37 |
| | Vasodilation by norepinephrine–epinephrine involved in regulation of systemic arterial blood pressure | 35 |
| | Phototransduction | 20 |
| 6 | Cell communication | 1 |
| | Cell–cell signaling | 3 |
| 7 | Methylation | 1 |
| 8 | Calcium-independent cell–cell adhesion | 1 |
| 9 | Immune response | 2 |
| | Antigen processing and presentation | 1 |
| | Antigen processing and presentation of peptide antigen via MHC class I | 4 |
| | Antigen processing and presentation of peptide or polysaccharide antigen via MHC class II | 3 |
| | Peripheral nervous system development | 23 |
| 10 | Cell surface receptor–linked signal transduction | 4 |
| | Response to virus | 1 |
| | Immune response | 2 |
| | Feeding behavior | 12 |
| | Cell–cell signaling | 3 |
| | Cellular calcium ion homeostasis | 8 |
| | Neutrophil apoptosis | 16 |
| | Inflammatory response | 5 |
| | JAK–STAT cascade | 20 |
| 11 | None | |
| 12 | Immune response | 1 |
| 13 | Lipoprotein metabolic process | 1 |
| | Lipid transport | 4 |
| | Negative regulation of lipid catabolic process | n.s. |
| | Neutrophil activation | n.s. |
| | Regulation of cytokine production | n.s. |
| | Negative regulation of lipoprotein metabolic process | n.s. |
| | Cell surface receptor–linked signal transduction | n.s. |
| | Regulation of cholesterol absorption | n.s. |
| | Positive regulation of interleukin-8 biosynthetic process | n.s. |
| 14 | None | |
| 15 | Keratinization | 4 |
| | Peptide cross-linking | n.s. |
| | Keratinocyte differentiation | n.s. |
| | Protein homooligomerization | n.s. |
| 16 | None | |
| 17 | None | |
Note: “n.s.” means that the term was not significantly overrepresented in the whole gene set.
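The right-hand column of the table maps each disease-gene term to its rank among the significant terms of the whole gene set, or to "n.s." otherwise. The sketch below shows that lookup, assuming ranks are ordered by ascending FDR-adjusted p-value within one phylostratum; the table does not state the ranking criterion, and the function names and q-values here are hypothetical.

```python
from typing import Dict, Union


def rank_significant_terms(term_q: Dict[str, float], alpha: float = 0.05) -> Dict[str, int]:
    """Rank terms with q <= alpha by ascending q (rank 1 = strongest).

    Ties in q are broken alphabetically so the output is deterministic.
    """
    significant = sorted((q, term) for term, q in term_q.items() if q <= alpha)
    return {term: rank for rank, (_, term) in enumerate(significant, start=1)}


def rank_or_ns(term: str, ranks: Dict[str, int]) -> Union[int, str]:
    """Table-style cell: the rank if the term is significant, otherwise "n.s."."""
    return ranks.get(term, "n.s.")


# Invented q-values for three terms in one phylostratum of the whole gene set.
whole_set_q = {"Transcription": 1e-12, "Proteolysis": 3e-4, "Keratinization": 0.2}
ranks = rank_significant_terms(whole_set_q)
for term in ("Transcription", "Proteolysis", "Keratinization"):
    print(term, rank_or_ns(term, ranks))
```

With these made-up values the loop prints rank 1 for Transcription, rank 2 for Proteolysis, and "n.s." for Keratinization.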
Figure 3. Probabilities of over- or underrepresentation of disease-causing genes in the respective phylostrata. Log-odds ratios show how the frequency of disease genes in each phylostratum deviates from the frequency expected from the whole set of genes. Phylostrata are numbered as in figures 1 and 2 (*P < 0.05; **P < 0.01; ***P < 0.001; two-tailed hypergeometric test, corrected for multiple comparisons by FDR at the 0.05 level).
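The caption combines three quantities per phylostratum. The sketch below shows one way to compute them with the Python standard library, under assumptions the caption does not spell out. The log-odds ratio is taken from a 2×2 table (disease vs. other genes, this stratum vs. all other strata) with a 0.5 pseudocount. The two-tailed hypergeometric p-value sums all outcomes no more probable than the observed one, the convention used by Fisher's exact test. The FDR correction is Benjamini-Hochberg. The function names, the pseudocount, and the toy counts are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """P(X = x) when n genes are drawn from N genes, K of which are disease genes."""
    return math.exp(log_comb(K, x) + log_comb(N - K, n - x) - log_comb(N, n))


def hypergeom_two_tailed(k: int, N: int, K: int, n: int) -> float:
    """Two-tailed p: total probability of outcomes no more likely than k."""
    lo, hi = max(0, n - (N - K)), min(K, n)
    p_obs = hypergeom_pmf(k, N, K, n)
    total = 0.0
    for x in range(lo, hi + 1):
        p = hypergeom_pmf(x, N, K, n)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


def log_odds_ratio(k: int, N: int, K: int, n: int, pseudo: float = 0.5) -> float:
    """ln odds ratio of the 2x2 table (disease vs other) x (this stratum vs rest).

    `pseudo` is a Haldane-Anscombe correction (an assumption) that keeps the
    ratio finite when a cell is zero.
    """
    a = k + pseudo                  # disease genes in this stratum
    b = (n - k) + pseudo            # other genes in this stratum
    c = (K - k) + pseudo            # disease genes in other strata
    d = (N - K) - (n - k) + pseudo  # other genes in other strata
    return math.log((a * d) / (b * c))


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p downwards
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def score_strata(all_counts: Sequence[int], disease_counts: Sequence[int],
                 alpha: float = 0.05) -> List[Tuple[float, float, bool]]:
    """Per stratum: (log-odds ratio, FDR-adjusted p, significant at alpha).

    Assumes disease genes are a subset of all genes in each stratum (k <= n).
    """
    N, K = sum(all_counts), sum(disease_counts)
    pairs = list(zip(all_counts, disease_counts))
    lors = [log_odds_ratio(k, N, K, n) for n, k in pairs]
    qvals = benjamini_hochberg([hypergeom_two_tailed(k, N, K, n) for n, k in pairs])
    return [(lor, q, q <= alpha) for lor, q in zip(lors, qvals)]


# Toy example: 20 genes in two strata; all 5 disease genes fall in stratum 1.
for lor, q, sig in score_strata([10, 10], [5, 0]):
    print(f"log-odds {lor:+.2f}  q = {q:.4f}  significant: {sig}")
```

In the toy run, stratum 1 gets a positive log-odds ratio (overrepresentation) and stratum 2 a negative one, each with q of about 0.0325. Working in log space with `math.lgamma` keeps the hypergeometric terms finite even for genome-sized counts such as N = 22,845.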