Yuchen Zhu1, Jiadong Ji2, Weiqiang Lin1, Mingzhuo Li1, Lu Liu1, Huanhuan Zhu3,4, Fuzhong Xue1, Xiujun Li1, Xiang Zhou3,4, Zhongshang Yuan5.
Abstract
BACKGROUND: Genome-wide association studies (GWAS) have successfully identified genetic susceptibility variants for complex diseases. However, the mechanisms underlying these associations remain largely unknown. Most disease-associated genetic variants reside in noncoding regions, leading to the hypothesis that regulation of gene expression may be the primary biological mechanism. Current methods for characterizing how gene expression mediates the effect of a genetic variant on disease often analyze one gene at a time and ignore the network structure. The impact of a genetic variant can propagate to other genes along the links in the network and then to the disease. There can be multiple pathways from the genetic variant to the disease, each with a chain structure, since the first node is one specific single nucleotide polymorphism (SNP) and the last node is the disease outcome. One key but inadequately addressed question is how to measure the between-node connection strength and rank the effects of such chain-type pathways, which can provide statistical evidence for prioritizing pathways for potential drug development in a cost-effective manner.
Keywords: Alzheimer’s disease; Integration method; K shortest paths algorithms; Maximum correlation coefficient; Pathway
Year: 2020 PMID: 32847502 PMCID: PMC7477886 DOI: 10.1186/s12863-020-00899-3
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Number of simulations (out of 500) in which the top 4 pathways were correctly pinpointed, with linear between-node correlation
| Sample size | Criteria | Pearson | Spearman | Distance | MCC | MIC | MI |
|---|---|---|---|---|---|---|---|
| 100 | All-right | 152 | 129 | 125 | 107 | 44 | 1 |
| 100 | Range-right | 408 | 402 | 428 | 378 | 300 | 46 |
| 300 | All-right | 271 | 252 | 248 | 248 | 88 | 0 |
| 300 | Range-right | 492 | 490 | 495 | 486 | 468 | 218 |
| 500 | All-right | 444 | 429 | 415 | 440 | 152 | 254 |
| 500 | Range-right | 500 | 500 | 500 | 500 | 500 | 500 |
Fig. 1. The proportion of 500 simulations in which the top 4 pathways are correctly pinpointed under the two criteria, when the sample size is 500 and the proportion of nonlinear components is 40%. The nonlinear pattern is (a) , (b) () cos () , (c) a mixed nonlinear pattern (6 edges having a cosine and 3 edges having a quadratic relationship) and (d) , respectively
Fig. 2. The proportion of 500 simulations in which the top 4 pathways are correctly pinpointed under the two criteria, when the sample size is 500 and the proportion of nonlinear components is 50%. The nonlinear pattern is (a) , (b) () cos () , (c) a mixed nonlinear pattern (8 edges having a cosine and 4 edges having a quadratic relationship) and (d) , respectively
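The contrast between linear and nonlinear between-node relationships in the simulations can be illustrated with a small sketch. It assumes that the "Distance" column refers to distance correlation, and it uses a noise-free quadratic toy relationship rather than the simulation design of the study. The function names `pearson` and `distance_correlation` are hypothetical helpers written for this illustration.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def _centered_distances(v):
    """Double-centered matrix of pairwise absolute distances."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # equals the column means (d is symmetric)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) for 1-D data."""
    n = len(x)
    A, B = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a * a for r in A for a in r) / n**2
    dvar_y = sum(b * b for r in B for b in r) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

# Toy data (illustrative only): a symmetric grid and two functions of it.
x = [i / 20 for i in range(-20, 21)]
y_quadratic = [v * v for v in x]    # purely nonlinear dependence
y_linear = [2 * v + 1 for v in x]   # purely linear dependence

print("Pearson, quadratic: ", pearson(x, y_quadratic))
print("Distance, quadratic:", distance_correlation(x, y_quadratic))
print("Distance, linear:   ", distance_correlation(x, y_linear))
```

On the quadratic toy data the Pearson coefficient is essentially zero while the distance correlation is clearly positive, which is the kind of difference the nonlinear scenarios are designed to probe.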
The top 5 pathways identified by each method from APOE genotype to AD in the ROSMAP study
| Method | Order 1 | Order 2 | Order 3 | Order 4 | Order 5 |
|---|---|---|---|---|---|
| Pearson-SP | (152.8) | (69.90) | (69.01) | (59.54) | (32.35) |
| Spearman-SP | (246.54) | (154.98) | (125.08) | (96.00) | (66.61) |
| DC-SP | (57.84) | (41.32) | (35.91) | (31.42) | (24.32) |
| MIC-SP | (3063.17) | (1136.26) | (1099.96) | (743.01) | (380.01) |
| MI-SP | (60.95) | (39.72) | (33.01) | (25.26) | (21.32) |
| MCC-SP | (13.80) | (12.32) | (8.87) | (8.84) | (7.42) |
Note: The pathway importance scores (PISs) are shown in parentheses. PISs are comparable only within a single method, not across methods.
: APOE genotype → APOE gene expression → GRIN2A → CAPN2 → MAPT → AD
: APOE genotype → APOE gene expression → GRIN2A → MAPK1 → CASP3 → AD
: APOE genotype → APOE gene expression → GRIN2A → NOS1 → AD
: APOE genotype → APOE gene expression → CACNA1C → CAPN2 → MAPT → AD
: APOE genotype → APOE gene expression → CACNA1C → MAPK1 → CASP3 → AD
: APOE genotype → APOE gene expression → CACNA1C → NOS1 → AD
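As a rough illustration of the chain structure, the six pathways listed above can be recovered by enumerating every route from APOE genotype to AD in the directed graph formed by their edges. The sketch below is not the method of the study: brute-force enumeration stands in for a K shortest paths algorithm, the product-of-strengths score is an assumed stand-in for the PIS, and the constant strength of 0.5 is a placeholder rather than an estimate from data. The names `chain_paths` and `rank_paths` are hypothetical.

```python
import math

# Directed edges read off the six pathways listed above.
EDGES = {
    "APOE genotype": ["APOE gene expression"],
    "APOE gene expression": ["GRIN2A", "CACNA1C"],
    "GRIN2A": ["CAPN2", "MAPK1", "NOS1"],
    "CACNA1C": ["CAPN2", "MAPK1", "NOS1"],
    "CAPN2": ["MAPT"],
    "MAPT": ["AD"],
    "MAPK1": ["CASP3"],
    "CASP3": ["AD"],
    "NOS1": ["AD"],
}

def chain_paths(edges, source, target, prefix=None):
    """Yield every simple path from source to target by depth-first search."""
    prefix = (prefix or []) + [source]
    if source == target:
        yield prefix
        return
    for nxt in edges.get(source, []):
        if nxt not in prefix:  # never revisit a node on the current chain
            yield from chain_paths(edges, nxt, target, prefix)

def rank_paths(paths, strength, k):
    """Return the k paths with the largest product of edge strengths.

    `strength(u, v)` is a caller-supplied between-node measure in (0, 1];
    the product score is an assumption made for this sketch.
    """
    def score(path):
        return math.prod(strength(u, v) for u, v in zip(path, path[1:]))
    return sorted(paths, key=score, reverse=True)[:k]

paths = list(chain_paths(EDGES, "APOE genotype", "AD"))
print(len(paths), "chain-type pathways found")

# Placeholder strength of 0.5 on every edge (not real data): with equal
# strengths, the shorter NOS1 chains receive the highest product score.
for p in rank_paths(paths, lambda u, v: 0.5, k=2):
    print(" → ".join(p))
```

With real data, `strength` would be replaced by whichever between-node measure is being evaluated, and the ranking would then reflect the estimated connection strengths rather than path length alone.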
Fig. 3. The simulated network from genetic variant to disease, constructed from the KEGG insulin signaling pathway. The hypothesis is that the genetic variant can affect the disease through gene expression along multiple pathways. Genes are shown as green boxes and directions as black arrows. The 23 edges included in the 4 effective pathways are highlighted in red
Fig. 4. The whole network from APOE genotype to AD, constructed from the KEGG Alzheimer’s disease pathway. The hypothesis is that the APOE genetic variant can affect AD through gene expression on the Alzheimer’s disease pathway. Multiple pathways with a chain structure can be formulated with APOE genotype as the starting node and AD as the end node. Genes are shown with green frames and directions with black arrows