Natalia Briones, Valentin Dinu.
Abstract
BACKGROUND: Discovering genetic associations is important for understanding human illness and deriving disease pathways. Identifying multiple interacting genetic mutations associated with disease remains a challenge in studying the etiology of complex diseases. Although genome-wide association studies (GWAS) have recently confirmed new single nucleotide polymorphisms (SNPs) at genes implicated in immune response, cholesterol/lipid metabolism, and cell membrane processes as associated with late-onset Alzheimer's disease (LOAD), a portion of AD heritability remains unexplained. We use data mining methods to search for other genetic variants that may influence LOAD risk.
Year: 2012 PMID: 22273362 PMCID: PMC3355044 DOI: 10.1186/1471-2350-13-7
Source DB: PubMed Journal: BMC Med Genet ISSN: 1471-2350 Impact factor: 2.103
Logistic regression top scoring SNPs, approach I
| Gene Symbol / dbSNP RS ID | Distance to Gene | Unadjusted p-value | FDR_BH p-value | OR (95% CI) | Pathway/Disease/Function |
|---|---|---|---|---|---|
| | intron | 7.16E-07 | 4.47E-02 | 2.21 (1.61-3.02) | Interaction with PAK4 for reduction of LIMK1 phosphorylation |
| | upstream 27742 | 8.59E-07 | 4.47E-02 | 2.21 (1.61-3.02) | Endocytosis |
| | intron | 2.25E-06 | 8.78E-02 | 2.13 (1.56-2.92) | Immune response |
| | downstream 40719 | 4.13E-06 | 1.29E-01 | 1.62 (1.32-2.00) | AD, Parkinson's, Huntington's, oxidative phosphorylation |
| | downstream 72544 | 4.93E-06 | 1.39E-01 | 3.14 (1.92-5.12) | Prevention of cell-cell interaction of integrins |
| | upstream 328744 | 5.33E-06 | 1.39E-01 | 2.08 (1.52-2.85) | Shwachman-Diamond syndrome |
| | downstream 187123 | 1.19E-05 | 2.03E-01 | 2.03 (1.48-2.78) | AD (4 studies), adherens junction, immune response |
FDR_BH = Benjamini & Hochberg (1995) step-up False Discovery Rate control.
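As an illustration of the FDR_BH column, the Benjamini & Hochberg step-up adjustment can be sketched in a few lines of Python. This is a minimal sketch, not the software used in the study: the function name `benjamini_hochberg` and the example p-values are hypothetical, and the adjusted values in the table above depend on the total number of SNPs tested, which is not given here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg step-up adjusted p-values in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, for illustration only.
example = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(example))
```

Because of the monotonicity step, two SNPs with different raw p-values can end up with the same adjusted value.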
Figure 1. RF performance assessment, different number of features and number of trees fixed at 100; approach I.
Figure 2. RF tuning, best number of attributes at different number of trees; approach I. F = number of features.
RF modeling, filtered data, approach I
| Data | RF 10-fold CV % error | | | |
|---|---|---|---|---|
| 100 | 33.3 | 33.2 | | |
| 300 | 33.3 | 33.2 | | |
| 600 | 33.4 | 33.5 | | |
| 100 | 11.9 | 12.3 | 12.5 | 12.3 |
| 300 | 11.3 | 11.3 | 12.1 | 11.6 |
| 600 | 10.6 | 11.6 | 11.2 | 11.1 |
| 100 | 10.9 | 9.9 | 10.4 | 10.3 |
| 300 | 8.9 | 9.6 | 9.5 | 10.3 |
| 600 | 9.0 | 9.1 | 10.1 | 9.9 |
| 100 | 9.9 | 10.1 | 9.9 | 11.2 |
| 300 | 8.9 | 9.1 | 9.5 | 10.3 |
| 600 | 9.1 | 9.4 | 9.9 | 10.3 |
F = number of features.
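For readers unfamiliar with the metric, a 10-fold cross-validation percent error of the kind reported above can be computed roughly as follows. This is a sketch under stated assumptions: `k_fold_error`, `fit_majority`, and `predict_majority` are hypothetical names, the data are made up, and a trivial majority-class learner stands in for the random forest so that the example needs only the standard library.

```python
import random


def k_fold_error(X, y, fit, predict, k=10, seed=0):
    """Return the k-fold cross-validated misclassification rate in percent."""
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    wrong = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        # Count errors only on samples the model did not see.
        wrong += sum(predict(model, X[i]) != y[i] for i in held_out)
    return 100.0 * wrong / len(y)


# Hypothetical stand-in learner (NOT a random forest):
# always predicts the most common label in the training data.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


# Made-up data: 20 cases (1) and 10 controls (0), one dummy feature each.
X = [[0]] * 30
y = [1] * 20 + [0] * 10
print(k_fold_error(X, y, fit_majority, predict_majority))
```

Any classifier exposing a fit and a predict step can be passed in place of the stand-in learner.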
CMH top scoring SNPs in LD, approach II.
| Gene (Chr.) / dbSNP RS ID | Physical Position | Distance to Gene | Minor Allele (MAF) | p-value from χ² | OR (95% CI) |
|---|---|---|---|---|---|
| rs4910068 | 8830651 | intron | C (0.25) | 9.47E-05 | 1.57 (1.25-1.98) |
| rs10743089 | 8744744 | intron | A (0.33) | 2.12E-05 | 1.59 (1.28-1.97) |
| rs4259003 | 144006245 | intron | A (0.21) | 6.82E-05 | 2.36 (1.53-3.64) |
| rs9784320 | 144024724 | intron | C (0.25) | 9.68E-04 | 1.87 (1.28-2.71) |
| rs2033912 | 143999057 | intron | T (0.22) | 1.71E-03 | 1.86 (1.26-2.75) |
| rs891159 | 81526843 | intron | C (0.24) | 2.40E-05 | 2.34 (1.56-3.49) |
| rs1485587 | 81362798 | intron | G (0.48) | 8.11E-05 | 1.82 (1.35-2.45) |
| rs4703879 | 81589571 | intron | A (0.24) | 1.01E-03 | 1.86 (1.28-2.71) |
| rs1389421 | 25747721 | upstream 561825 | G (0.45) | 3.39E-06 | 2.07 (1.52-2.83) |
| rs10834774 | 25715397 | upstream 594149 | C (0.20) | 4.01E-03 | 1.86 (1.21-2.85) |
| rs11028909 | 25729021 | upstream 580525 | G (0.20) | 4.13E-03 | 1.84 (1.21-2.81) |
| rs249153 | 93848520 | downstream 40719 | C (0.19) | 3.19E-06 | 1.63 (1.33-2.00) |
| rs249154 | 93848687 | downstream 40552 | C (0.18) | 3.22E-05 | 1.54 (1.25-1.89) |
| rs6784615 | 52481466 | intron | C (0.09) | 6.63E-07 | 2.13 (1.57-2.88) |
| rs9855470 | 52468315 | intron | A (0.06) | 4.86E-05 | 2.11 (1.46-3.05) |
| rs6445486 | 52481531 | intron | A (0.06) | 2.77E-04 | 1.90 (1.34-2.71) |
| rs10865972 | 52466487 | intron | C (0.06) | 3.38E-04 | 1.87 (1.32-2.65) |
| rs4687619 | 52493826 | intron | T (0.06) | 3.54E-04 | 1.93 (1.34-2.79) |
| rs6810027 | 52499614 | intron | C (0.05) | 5.80E-04 | 1.89 (1.31-2.73) |
All p-values are uncorrected.
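The Cochran-Mantel-Haenszel (CMH) statistic behind the p-value and OR columns can be sketched as below for a set of stratified 2x2 allele-count tables. This is an illustrative sketch only: the function name `cmh_test` and the counts are hypothetical, no continuity correction is applied (an assumption, since the source does not say which variant was used), and the confidence interval is omitted.

```python
import math


def cmh_test(strata):
    """CMH test over a list of 2x2 tables, one (a, b, c, d) tuple per stratum.

    a = cases with the allele,    b = cases without it,
    c = controls with the allele, d = controls without it.
    Returns (chi2, p_value, mantel_haenszel_odds_ratio).
    """
    diff = var = num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        # Expected count and variance of a under the null hypothesis.
        expected = (a + b) * (a + c) / n
        diff += a - expected
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        # Pieces of the Mantel-Haenszel pooled odds ratio.
        num += a * d / n
        den += b * c / n
    chi2 = diff * diff / var  # assumption: no continuity correction
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, num / den


# Hypothetical counts for two strata, for illustration only.
chi2, p, odds_ratio = cmh_test([(10, 20, 5, 25), (15, 15, 8, 22)])
print(chi2, p, odds_ratio)
```

Pooling across strata in this way tests for association while holding the stratifying variable fixed.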
RF modeling, filtered data, approach II.
| Data | RF 10-fold cross-validation % error | | |
|---|---|---|---|
| 100 | 33.3 | 33.2 | |
| 150 | 33.4 | 33.5 | |
| 200 | 33.4 | 33.2 | |
| 250 | 33.2 | 33.2 | |
| 300 | 33.3 | 33.2 | |
| 600 | 33.4 | 33.5 | |
| 100 | 26.9 | 27.5 | 27.9 |
| 150 | 26.7 | 27.0 | 27.5 |
| 200 | 26.8 | 27.0 | 27.4 |
| 250 | 27.1 | 27.3 | 27.6 |
| 300 | 27.2 | 27.1 | 27.6 |
| 600 | 27.1 | 27.4 | 27.4 |
| 100 | 20.3 | 20.8 | 20.7 |
| 150 | 19.9 | 20.6 | 20.8 |
| 200 | 20.1 | 20.7 | 20.9 |
| 250 | 20.3 | 20.6 | 20.6 |
| 300 | 20.1 | 20.6 | 20.6 |
| 600 | 19.8 | 20.4 | 20.7 |
| 100 | 17.6 | 17.6 | 18.2 |
| 150 | 17.2 | 17.0 | 17.6 |
| 200 | 17.2 | 17.2 | 18.1 |
| 250 | 17.3 | 17.1 | 17.9 |
| 300 | 17.2 | 17.3 | 17.9 |
| 600 | 17.3 | 16.9 | 17.6 |
F = number of features.