J T Daub, S Moretti, I I Davydov, L Excoffier, M Robinson-Rechavi.
Abstract
Gene set enrichment approaches have been increasingly successful in finding signals of recent polygenic selection in the human genome. In this study, we aim to detect biological pathways affected by positive selection in more ancient human evolutionary history. Focusing on four branches of the primate tree that lead to modern humans, we tested all available protein-coding gene trees of the Primates clade for signals of adaptation in these branches, using the likelihood-based branch-site test of positive selection. The results of these locus-specific tests were then used as input for a gene set enrichment test, where whole pathways are globally scored for a signal of positive selection, instead of focusing only on outlier "significant" genes. We identified signals of positive selection in several pathways that are mainly involved in immune response, sensory perception, metabolism, and energy production. These pathway-level results are highly significant, even though there is no functional enrichment when only focusing on top-scoring genes. Interestingly, several gene sets are found significant at multiple levels in the phylogeny, but different genes are responsible for the selection signal in the different branches. This suggests that the same function has been optimized in different ways at different times in primate evolution.
Year: 2017 PMID: 28333345 PMCID: PMC5435107 DOI: 10.1093/molbev/msx083
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
The Primates clade with the species used in the branch-site test. The four tested branches (Homininae, Hominidae, Hominoidae, and Catarrhini) are numbered (used to identify branch-specific lists of genes or gene sets, e.g., G1, G2, G3, and G4) and marked in red. (Modified from the Ensembl mammalian species tree: https://github.com/Ensembl/ensembl-compara/blob/release/70/scripts/pipeline/species_tree_blength.nh. For more information about the construction of phylogenetic trees in Ensembl and the calculation of branch lengths, see http://dec2013.archive.ensembl.org/info/genome/compara/index.html)
Number of Gene Sets and Genes Part of Sets in the Four Tested Branches.
| Branch | Branch Name | Leading to | #Sets | #Genes in Sets | #Significant Sets: Before Pruning | #Significant Sets: Without Top Scoring Gene | #Significant Sets: After Pruning |
|---|---|---|---|---|---|---|---|
| 1 | Homininae | African Apes (Hu, Ch, Go) | 1,415 | 7,600 | 8 | 6 | 2 |
| 2 | Hominidae | Great Apes (Hu, Ch, Go, Or) | 1,424 | 7,849 | 34 | 32 | 7 |
| 3 | Hominoidae | Apes (Hu, Ch, Go, Gi) | 1,441 | 8,016 | 43 | 42 | 6 |
| 4 | Catarrhini | Apes & Old World Monkeys (Hu, Ch, Go, Gi, Ma) | 1,441 | 8,058 | 95 | 93 | 9 |
Note.—For each branch the number of significant sets (q < 0.2) in the SUMSTAT gene set enrichment test is reported, both before and after removing overlapping genes (“pruning”), as well as the number of significant sets before pruning that remain significant after removal of their highest scoring gene.
Hu, human; Ch, chimpanzee; Go, gorilla; Or, orangutan; Gi, gibbon; Ma, macaque.
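The two post-processing steps named in the note (removing overlapping genes, and removing a set's highest scoring gene) can be illustrated with a minimal Python sketch. It only shows the gene-removal bookkeeping and assumes the sets are already ranked from most to least significant; the re-testing of each reduced set for significance is not shown. The function names `prune_overlaps` and `drop_top_gene`, the gene labels, and the scores are hypothetical and not taken from the study.

```python
def prune_overlaps(ranked_sets):
    """Remove from each set the genes already claimed by a higher-ranked set.

    ranked_sets: list of (name, genes) pairs, most significant first.
    Assumption: a simple greedy pass; re-testing after each removal is omitted.
    """
    claimed = set()
    pruned = []
    for name, genes in ranked_sets:
        remaining = set(genes) - claimed
        pruned.append((name, remaining))
        claimed |= remaining
    return pruned


def drop_top_gene(genes, scores):
    """Return the set without its single highest scoring gene."""
    if not genes:
        return set()
    top = max(genes, key=lambda g: scores[g])
    return set(genes) - {top}


# Toy data (hypothetical gene labels and scores).
ranked = [
    ("set1", {"A", "B", "C"}),
    ("set2", {"B", "C", "D"}),
    ("set3", {"D", "E"}),
]
pruned = prune_overlaps(ranked)
pruned_sizes = [len(genes) for _, genes in pruned]

toy_scores = {"A": 2.0, "B": 1.0, "C": 0.5}
without_top = drop_top_gene({"A", "B", "C"}, toy_scores)
```

In this toy case, `set2` keeps only gene `D` and `set3` keeps only gene `E` after pruning, which mirrors how set sizes in a post-pruning column can shrink relative to the unpruned sizes.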
Results of the SUMSTAT Gene Set Enrichment Test.
| Branch | Pathway | SUMSTAT (Postpruning) | Size (Postpruning) | q-value |
|---|---|---|---|---|
| Homininae | GPCR downstream signaling § | 125.10 | 645 | 0.0032 |
| Homininae | Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell § | 19.20 (18.61) | 54 (53) | 0.0047 |
| Hominidae | Olfactory Signaling Pathway | 69.34 | 230 | <10⁻⁵ |
| Hominidae | Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell | 22.34 | 57 | <10⁻⁵ |
| Hominidae | Metabolism of xenobiotics by cytochrome P450 | 16.81 | 57 | 0.1422 |
| Hominidae | Oxidative phosphorylation (WikiPathways) | 14.43 | 46 | 0.1422 |
| Hominidae | Intestinal immune network for IgA production | 12.68 (11.68) | 40 (36) | 0.1932 |
| Hominidae | Fatty acid metabolism | 13.84 | 47 | 0.1932 |
| Hominidae | Synthesis of bile acids and bile salts via 7alpha-hydroxycholesterol | 8.08 (7.16) | 20 (17) | 0.1932 |
| Hominoidae | | 23.88 | 44 | <10⁻⁵ |
| Hominoidae | GPCR downstream signaling | 155.59 (150.17) | 687 (680) | <10⁻⁵ |
| Hominoidae | Electron Transport Chain | 30.13 | 84 | 0.0010 |
| Hominoidae | Complement cascade | 16.36 (9.30) | 29 (16) | 0.0319 |
| Hominoidae | Metabolism of xenobiotics by cytochrome P450 | 20.11 | 59 | 0.0319 |
| Hominoidae | Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell | 22.34 (16.92) | 58 (52) | 0.1226 |
| Catarrhini | Hematopoietic cell lineage | 40.12 | 79 | <10⁻⁵ |
| Catarrhini | Non-alcoholic fatty liver disease (NAFLD) | 48.94 (47.02) | 129 (124) | <10⁻⁵ |
| Catarrhini | Cytokine–cytokine receptor interaction | 95.90 (73.29) | 236 (199) | <10⁻⁵ |
| Catarrhini | | 24.11 (20.48) | 43 (37) | 0.0025 |
| Catarrhini | Chemical carcinogenesis | 28.27 (26.87) | 63 (62) | 0.0066 |
| Catarrhini | Defensins | 17.02 (16.47) | 40 (37) | 0.0896 |
| Catarrhini | Pancreatic secretion | 29.93 (28.45) | 85 (83) | 0.1177 |
| Catarrhini | Fatty acid metabolism | 18.41 | 48 | 0.1814 |
| Catarrhini | NF-kB activation through FADD/RIP-1 pathway mediated by caspase-8 and -10 | 7.98 (6.62) | 12 (10) | 0.1898 |
Note.—For each branch, only the pathways that score significant (q < 0.2) both before and after pruning (removal of overlapping genes) are listed. The SUMSTAT scores and gene set sizes that changed after pruning are shown in parentheses. Pathways that score significant on more than one branch are marked with the symbol ‘§’.
Heat map showing ΔlnL4 scores of genes in the Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell pathway for the four tested inner branches of the Primates tree. Branches where the pathway scores significant after pruning are marked with a “*.” The genes are grouped by hierarchical clustering to visualize blocks with similar signals within and among branches. Genes for which ΔlnL4 scores were not available (NA) in a certain branch are depicted in grey. Genes are merged (horizontally) with their paralog(s) into an “ancestral gene” in the branches preceding a duplication, and their scores were included only once in the calculation of the SUMSTAT score for these branches. Genes with (vertically) merged branches represent cases where the sequence of one or more species is missing or excluded, resulting in a single “average” ΔlnL4 score over multiple branches. We use this score when testing each branch separately. The ΔlnL4 score is computed as the fourth root of the log-likelihood ratio in the branch-site test for positive selection.
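As a rough illustration of the scoring described in this caption, the Python sketch below takes a per-gene log-likelihood ratio, converts it to a fourth-root score, and adds the scores over a gene set. Treating SUMSTAT as a plain sum of gene scores, clipping negative ratios to zero, and skipping genes without a score (NA) are assumptions made for the sketch, not details confirmed by the text above. The function names and the toy gene labels are hypothetical.

```python
import math


def fourth_root_score(llr):
    """Fourth root of a log-likelihood ratio; returns None for missing values.

    Assumption: small negative ratios (numerical noise) are clipped to zero.
    """
    if llr is None or math.isnan(llr):
        return None
    return max(llr, 0.0) ** 0.25


def sumstat(gene_set, scores):
    """Sum the available scores of the genes in a set (assumed SUMSTAT form).

    Genes with no score (absent or None) are skipped.
    """
    return sum(scores[g] for g in gene_set if scores.get(g) is not None)


# Toy log-likelihood ratios for one branch (hypothetical values).
llr_by_gene = {"GENE_A": 16.0, "GENE_B": 1.0, "GENE_C": 0.0, "GENE_D": None}
score_by_gene = {g: fourth_root_score(v) for g, v in llr_by_gene.items()}

# GENE_D has no score and GENE_E was never tested; both are skipped.
pathway = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
total = sumstat(pathway, score_by_gene)
```

The fourth root compresses very large likelihood ratios, so a single extreme gene contributes less to the set total than it would on the raw scale.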