Alexandre Gouy, Joséphine T Daub, Laurent Excoffier.
Abstract
Advances in high-throughput sequencing technologies have created a gap between data production and functional data analysis. Phenotypes result from interactions between numerous genes, but traditional methods treat loci independently and so miss the information carried by emergent network-level properties. Detecting selection that acts on multiple genes affecting the evolution of complex traits therefore remains challenging. In this context, gene network analysis provides a powerful framework to study the evolution of adaptive traits and facilitates the interpretation of genome-wide data. We developed a method for analysing gene networks that is suited to detecting polygenic selection. The general idea is to search biological pathways for subnetworks of genes that directly interact with each other and that present unusual evolutionary features. Subnetwork search is a typical combinatorial optimization problem, which we solve using a simulated annealing approach. We have applied our methodology to find signals of adaptation to high altitude in human populations. We show that this adaptation has a clear polygenic basis and is influenced by many genetic components. Our approach, implemented in the R package signet, improves on classical gene-level tests for selection by identifying both new candidate genes and new biological processes involved in adaptation to altitude.
Year: 2017 PMID: 28934485 PMCID: PMC5766194 DOI: 10.1093/nar/gkx626
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
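The subnetwork search described in the abstract can be illustrated with a minimal Python sketch. This is not the signet implementation: the scoring function (sum of gene scores divided by the square root of the subnetwork size), the add/remove move set, the geometric cooling schedule, and all function and parameter names are assumptions chosen for illustration only.

```python
import math
import random


def subnetwork_score(nodes, scores):
    """Aggregate score of a subnetwork (assumed form: sum / sqrt(size))."""
    return sum(scores[g] for g in nodes) / math.sqrt(len(nodes))


def is_connected(nodes, adjacency):
    """Return True if `nodes` induce a connected subgraph (depth-first search)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour in nodes and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes


def anneal_subnetwork(adjacency, scores, iterations=5000,
                      t0=1.0, cooling=0.999, seed=0):
    """Search for a high-scoring connected subnetwork by simulated annealing."""
    rng = random.Random(seed)
    current = {rng.choice(sorted(adjacency))}  # start from a single random gene
    current_score = subnetwork_score(current, scores)
    best, best_score = set(current), current_score
    temperature = t0
    for _ in range(iterations):
        # Geometric cooling, floored to avoid division by zero on long runs.
        temperature = max(temperature * cooling, 1e-12)
        # Genes adjacent to the current subnetwork but not yet in it.
        boundary = sorted({n for g in current for n in adjacency[g]} - current)
        candidate = set(current)
        if boundary and (len(current) == 1 or rng.random() < 0.5):
            candidate.add(rng.choice(boundary))            # grow by one gene
        elif len(current) > 1:
            candidate.remove(rng.choice(sorted(current)))  # shrink by one gene
        else:
            break  # isolated single gene: no move is possible
        if not is_connected(candidate, adjacency):
            continue  # reject moves that split the subnetwork
        candidate_score = subnetwork_score(candidate, scores)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse states with
        # probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = set(current), current_score
    return best, best_score
```

Here `adjacency` is a hypothetical dictionary mapping each gene to the list of genes it directly interacts with, and `scores` maps each gene to its test statistic. Accepting some score-decreasing moves early on, while the temperature is high, is what lets the search escape local optima that a purely greedy expansion would get stuck in.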
Estimates of the effects of the five parameters on the precision and sensitivity obtained under a logistic regression framework. For each parameter, the coefficient, P-value and the percentage of explained total deviance (%TD) are indicated
| Parameter | Precision: estimate | Precision: P-value | Precision: %TD | Sensitivity: estimate | Sensitivity: P-value | Sensitivity: %TD |
|---|---|---|---|---|---|---|
| Network size | -0.025 | <2×10⁻¹⁶ | 17.5 | 4.3×10⁻³ | <2×10⁻¹⁶ | <1 |
| HSS size | 0.14 | <2×10⁻¹⁶ | 21.3 | -0.03 | 1.4×10⁻¹⁵ | <1 |
| HSS mean score | 0.75 | <2×10⁻¹⁶ | 45.8 | 1.29 | <2×10⁻¹⁶ | 66 |
| Network density | -0.029 | 0.29 | <1 | 5×10⁻² | 0.31 | <1 |
| Number of iterations | -1.8×10⁻⁶ | 2×10⁻⁵ | <1 | 1.5×10⁻⁶ | 0.05 | <1 |
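Because the estimates in the table come from a logistic regression, each coefficient is on the log-odds scale, and exponentiating it gives the multiplicative change in the odds for a one-unit increase in the parameter. The short Python sketch below works this out for the precision coefficients. It assumes the table rows follow the order of the footnotes (network size, HSS size, HSS mean score, network density, number of iterations); the units of each predictor are not stated in the source, and the helper name `odds_ratio` is purely illustrative.

```python
import math

# Precision coefficients copied from the table (log-odds scale).
# Row-to-parameter mapping is assumed from the footnote order.
precision_coef = {
    "Network size": -0.025,
    "HSS size": 0.14,
    "HSS mean score": 0.75,
    "Network density": -0.029,
    "Number of iterations": -1.8e-6,
}


def odds_ratio(coefficient, delta=1.0):
    """Multiplicative change in the odds for a `delta` increase in the predictor."""
    return math.exp(coefficient * delta)


for name, coef in precision_coef.items():
    print(f"{name}: odds ratio per unit increase = {odds_ratio(coef):.3f}")
```

Under these assumptions, a one-unit increase in the HSS mean score multiplies the odds by exp(0.75) ≈ 2.12, whereas the coefficients for network density and number of iterations give odds ratios very close to 1, in line with their small share of explained deviance.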
Figure 1. GLM-based estimates of the precision (orange) and sensitivity (blue) of the estimation, as a function of μHSS (A), network size (B) and subnetwork size (C). The horizontal dashed lines indicate a 0.95 threshold.
Figure 2. Most significant subnetwork among the three pathway databases. The HIF-2-α transcription pathway is represented as a graph (A), where each node is a gene and the node size is proportional to the gene score. The highest scoring subnetwork (HSS) of the pathway is shown in red. The density distribution of gene scores in this pathway is shown in (B). The dashed line represents the density of gene scores across the whole KEGG database, the histogram shows the distribution of gene scores within this pathway, and the vertical red lines indicate the scores of the genes belonging to the HSS.
Figure 3. Merged significant subnetworks. For each database, NCI (A) and KEGG (B), we merged the significant subnetworks of genes if they overlapped. The colour intensity and size of the nodes are proportional to the gene score. Red lines delimit the individual significant subnetworks, and the names of the pathways to which they belong are shown next to them.