Ari Siitonen1,2, Laura Kytövuori3,4, Mike A Nalls5,6, Raphael Gibbs5, Dena G Hernandez5, Pauli Ylikotila7,8, Markku Peltonen9, Andrew B Singleton5, Kari Majamaa3,4.
Abstract
Variants associated with Parkinson's disease (PD) generally have small effect sizes; therefore, large sample sizes or targeted analyses are required to detect significant associations in a whole exome sequencing (WES) study. Here, we used protein-protein interaction (PPI) information on 36 genes with established or suggested associations with PD to target the analysis of the WES data. We performed an association analysis on WES data from 439 Finnish PD subjects and 855 controls, and included a Finnish population cohort with 60 PD subjects and 8214 controls as the replication dataset. A single variant association (SVA) test in the discovery dataset yielded 11 candidate variants in seven genes, but none of the associations was significant in the replication cohort after correction for multiple testing. However, a polygenic risk score based on variants rs2230288 and rs2291312 was associated with PD, with an odds ratio of 2.7 (95% confidence interval 1.4–5.2; p = 2.56e-03). Furthermore, an analysis of the PPI network revealed enriched clusters of biological processes among established and candidate genes, and these functional networks were visualized. We identified novel candidate variants for PD using gene prioritization based on PPI information, and described how these variants may be involved in the pathogenesis of PD.
Year: 2019 PMID: 31827228 PMCID: PMC6906405 DOI: 10.1038/s41598-019-55479-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
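The abstract reports a polygenic risk score (PRS) built from rs2230288 (GBA) and rs2291312 (TTN), but this record does not state how the two variants were weighted. A common construction is a weighted sum of risk-allele dosages. The Python sketch below assumes that construction and uses the natural log of the discovery-set odds ratios from the single-variant table (rs2230288, OR 2.208, allele T; rs2291312, OR 1.637, allele C) as weights. The weighting scheme and the name `polygenic_risk_score` are illustrative assumptions, not the authors' method.

```python
import math

# Hypothetical weights (assumption): natural log of the discovery-set odds
# ratios for the A1 allele (GBA rs2230288 T allele, TTN rs2291312 C allele).
WEIGHTS = {
    "rs2230288": math.log(2.208),
    "rs2291312": math.log(1.637),
}


def polygenic_risk_score(dosages: dict) -> float:
    """Weighted sum of A1-allele dosages (0, 1 or 2 copies per variant)."""
    score = 0.0
    for snp, weight in WEIGHTS.items():
        dosage = dosages.get(snp, 0)  # assumption: a missing entry counts as 0 copies
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid dosage {dosage!r} for {snp}")
        score += weight * dosage
    return score


# One T allele at rs2230288 and two C alleles at rs2291312
print(round(polygenic_risk_score({"rs2230288": 1, "rs2291312": 2}), 3))
```

In a regression such as the one reported for the replication dataset, a score like this would enter as a single covariate alongside age and principal components.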
The 36 genes with established or suggested associations with Parkinson’s disease (PD36) that were used to build PD2300net.
| # | Uniprot ID | Gene Symbol | # | Uniprot ID | Gene Symbol |
|---|---|---|---|---|---|
| 1 | Q9NQ11 | ATP13A2 | 19 | P49821 | NDUFV1 |
| 2 | Q9Y6H1 | CHCHD2 | 20 | Q99497 | PARK7 |
| 3 | O75165 | DNAJC13 | 21 | O95263 | PDE8B |
| 4 | O75061 | DNAJC6 | 22 | Q9BXM7 | PINK1 |
| 5 | Q04637 | EIF4G1 | 23 | O60733 | PLA2G6 |
| 6 | Q9Y3I1 | FBXO7 | 24 | P54098 | POLG |
| 7 | P04062 | GBA | 25 | Q9UGJ0 | PRKAG2 |
| 8 | Q6Y7W6 | GIGYF2 | 26 | O60260 | PRKN |
| 9 | O43464 | HTRA2 | 27 | P37840 | SNCA |
| 10 | Q5S007 | LRRK2 | 28 | Q9Y6H5 | SNCAIP |
| 11 | P10636 | MAPT | 29 | Q13501 | SQSTM1 |
| 12 | P03886 | MT-ND1 | 30 | O43426 | SYNJ1 |
| 13 | P03897 | MT-ND3 | 31 | Q9BSA9 | TMEM175 |
| 14 | P03915 | MT-ND5 | 32 | Q96A57 | TMEM230 |
| 15 | Q8N183 | NDUFAF2 | 33 | P09936 | UCHL1 |
| 16 | Q5TEU4 | NDUFAF5 | 34 | P55072 | VCP |
| 17 | O43181 | NDUFS4 | 35 | Q709C8 | VPS13C |
| 18 | O75251 | NDUFS7 | 36 | Q96QK1 | VPS35 |
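The table lists the 36 seed genes used to build PD2300net, but this record does not describe the network construction itself. One common approach is to start from seed proteins and add their interaction partners from a PPI database. The sketch below assumes that approach and implements it as a breadth-first expansion over an undirected edge list. The edge list is a made-up toy example (the `PARTNER_*` nodes are placeholders, not real interactors), and the function name `expand_seed_network` is hypothetical.

```python
from collections import defaultdict


def expand_seed_network(seeds, edges, max_depth=1):
    """Return every protein within max_depth interaction steps of any seed."""
    adjacency = defaultdict(set)
    for a, b in edges:  # interactions are treated as undirected
        adjacency[a].add(b)
        adjacency[b].add(a)
    reached = set(seeds)
    frontier = set(seeds)
    for _ in range(max_depth):
        # neighbours of the current frontier that have not been seen yet
        frontier = {n for node in frontier for n in adjacency[node]} - reached
        reached |= frontier
    return reached


# Toy, made-up edge list: PARTNER_* names are placeholders, not real interactors.
toy_edges = [
    ("SNCA", "PARTNER_1"),
    ("PRKN", "PARTNER_1"),
    ("PARTNER_1", "PARTNER_2"),
    ("LRRK2", "PARTNER_3"),
]
seeds = ["SNCA", "PRKN", "LRRK2"]
print(sorted(expand_seed_network(seeds, toy_edges)))
```

Raising `max_depth` grows the network quickly, which is one reason seed-based networks are usually limited to first or second neighbours.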
Figure 1. Whole exome sequencing data analysis workflow.
Single-variant association results in the discovery and replication datasets.
| Gene | SNP | CHR | BP | A1 | OR (discovery) | P (discovery) | C_A (discovery) | C_U (discovery) | F_A (discovery) | F_U (discovery) | OR (replication) | P (replication) | C_A (replication) | C_U (replication) | F_A (replication) | F_U (replication) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UBXN11 | rs117509001 | 1 | 26629342 | A | 5.979 | 0.0004805 | 13 | 6 | 0.0148 | 0.0035 | NA | NA | 0 | 80 | 0 | 0.00487 |
| GBA | rs2230288 | 1 | 155206167 | T | 2.208 | 8.927e-06 | 74 | 81 | 0.0855 | 0.0474 | 2.137 | 0.02379 | 10 | 676 | 0.083 | 0.04115 |
| TTN | rs2627037 | 2 | 179606538 | A | 1.616 | 0.0004346 | 121 | 166 | 0.1378 | 0.0971 | 1.61 | 0.05265 | 20 | 1780 | 0.167 | 0.1084 |
| TTN | rs922984 | 2 | 179615887 | T | 1.637 | 0.0003337 | 119 | 163 | 0.1355 | 0.0953 | 1.64 | 0.04411 | 20 | 1749 | 0.167 | 0.1065 |
| TTN | rs2291310 | 2 | 179623758 | C | 1.637 | 0.0003337 | 119 | 163 | 0.1355 | 0.0953 | 1.642 | 0.04356 | 20 | 1747 | 0.167 | 0.1063 |
| TTN | rs2291311 | 2 | 179629461 | C | 1.637 | 0.0003337 | 119 | 163 | 0.1355 | 0.0953 | 1.641 | 0.04386 | 20 | 1748 | 0.167 | 0.1064 |
| TTN | rs2291312 | 2 | 179631214 | C | 1.637 | 0.0003337 | 119 | 163 | 0.1355 | 0.0953 | 1.64 | 0.0441 | 20 | 1749 | 0.167 | 0.1065 |
| IKBKB | rs140485496 | 8 | 42178280 | T | 2.666 | 0.0001978 | 31 | 34 | 0.0353 | 0.0199 | NA | NA | 0 | 376 | 0 | 0.02289 |
| MIR7705/PABPC1 | rs113574896 | 8 | 101717195 | C | 3.987 | 1.122e-11 | 84 | 46 | 0.0966 | 0.0269 | NA | NA | 0 | 10 | 0 | 0.000609 |
| INA | chr10_105048270_AGAG_A | 10 | 105048270 | A | 5.722 | 5.064e-05 | 14 | 11 | 0.0294 | 0.0064 | 2.782 | 0.01913 | 6 | 302 | 0.05 | 0.01838 |
| KARS/TERF2IP | rs1865493 | 16 | 75681743 | G | 0.548 | 5.161e-05 | 84 | 242 | 0.0957 | 0.1417 | 1.169 | 0.54 | 18 | 2130 | 0.15 | 0.1297 |
Discovery set: cases N = 439; controls N = 855; replication set: cases N = 60; controls N = 8214; Bonferroni cutoff p < 0.0045; CHR = chromosome; BP = base-pair position; A1 = Allele 1, the allele to which OR, counts, and frequencies refer; OR = odds ratio; P = p value; C_A = Allele 1 count among cases; C_U = Allele 1 count among controls; F_A = Allele 1 frequency among cases; F_U = Allele 1 frequency among controls; NA = not available.
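The replication columns and the Bonferroni cutoff can be checked with simple arithmetic. In the replication set, each control allele frequency equals the allele count divided by 2 × 8214, and the cutoff of 0.0045 appears to correspond to 0.05/11 for the 11 candidate variants. The sketch below recomputes F_U and applies the cutoff to the replication p values, confirming that none passes. The same division does not reproduce the discovery frequencies exactly, presumably because some discovery genotypes were missing; the no-missing-data assumption is noted in the code.

```python
REPLICATION_CONTROLS = 8214
BONFERRONI_CUTOFF = 0.05 / 11  # 11 candidate variants, about 0.0045

# SNP: (C_U, replication P) copied from the table; None where P is NA
replication = {
    "rs117509001": (80, None),
    "rs2230288": (676, 0.02379),
    "rs2627037": (1780, 0.05265),
    "rs922984": (1749, 0.04411),
    "rs2291310": (1747, 0.04356),
    "rs2291311": (1748, 0.04386),
    "rs2291312": (1749, 0.0441),
    "rs140485496": (376, None),
    "rs113574896": (10, None),
    "chr10_105048270_AGAG_A": (302, 0.01913),
    "rs1865493": (2130, 0.54),
}


def control_frequency(allele_count: int, n_controls: int = REPLICATION_CONTROLS) -> float:
    """Allele frequency = count / (2 x individuals); assumes no missing genotypes."""
    return allele_count / (2 * n_controls)


significant = []
for snp, (count, p) in replication.items():
    passes = p is not None and p < BONFERRONI_CUTOFF
    if passes:
        significant.append(snp)
    print(f"{snp:24s} F_U={control_frequency(count):.5f} passes={passes}")

print("Significant after Bonferroni:", significant)
```

The smallest replication p value in the table (0.01913, for the INA deletion) is about four times the cutoff.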
Figure 2. Workflow for creating the visualization of the protein-protein interaction network. PPI, protein-protein interaction; WES, whole exome sequencing; GWAS, genome-wide association study; GSEA, gene set enrichment analysis.
Figure 3. PD network 1. Protein-protein interaction network visualizing the interactions between established and suggested PD genes and candidate genes. Interactions (edges) of the seven novel candidate genes are highlighted in red. Abbreviations: PD36, 36 established or suggested PD genes; CANDIDATE, seven novel candidate genes; GWAS NALLS, GWAS hits in the Nalls et al.[7] meta-analysis discovery phase; GWAS CHANG, GWAS hits in the Chang et al.[6] meta-analysis discovery phase; GWAS/WES FIN, significant GWAS hits and selected WES hits in Siitonen et al.[5].
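Figure 2 lists gene set enrichment analysis (GSEA), and the abstract reports enriched clusters of biological processes, but the exact test is not given in this record. To illustrate what "enriched" means, the sketch below implements a hypergeometric over-representation test: the probability of finding at least k genes carrying a biological-process term in a network of a given size purely by chance. This is a different technique from rank-based GSEA, and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, population: int, annotated: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, annotated, draws).

    population: genes in the background
    annotated:  background genes carrying the biological-process term
    draws:      genes in the network (or cluster) being tested
    k:          network genes carrying the term
    """
    upper = min(annotated, draws)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


# Hypothetical example: 8 of 50 network genes carry a term that annotates
# 200 of 20,000 background genes (about 0.5 expected by chance).
print(f"{hypergeom_sf(8, 20000, 200, 50):.2e}")
```

When many terms are tested, these p values would normally be adjusted for multiple testing, as in the FDR and Bonferroni columns used elsewhere in the record.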
Logistic regression results of polygenic risk score in the replication dataset.
| Variable | P | FDR | Bonf | OR | 2.5% | 97.5% | Estimate | std.error | Statistic |
|---|---|---|---|---|---|---|---|---|---|
| PRS | 2.56e-03 | 1.19e-02 | 3.58e-02 | 2.7078 | 1.4175 | 5.1728 | 1.00 | 0.33 | 3.016 |
| AGE | 5.26e-07 | 3.68e-06 | 7.37e-06 | 1.0584 | 1.0352 | 1.0821 | 0.06 | 0.01 | 5.016 |
| PC1 | 4.16e-02 | 1.46e-01 | 5.83e-01 | 0 | 0 | 0.3028 | −31.55 | 15.49 | −2.037 |
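PRS = polygenic risk score; AGE = age at onset/age at sampling; PC1 = principal component 1; P = p value; FDR = false discovery rate-adjusted p value; Bonf = Bonferroni-corrected p value; OR = odds ratio; 2.5% = lower bound of the 95% confidence interval; 97.5% = upper bound of the 95% confidence interval; Estimate = regression coefficient (log odds); std.error = standard error of the estimate; Statistic = Estimate/std.error. The OR and lower bound of 0 for PC1 reflect rounding of very small values.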
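The columns of the logistic regression table appear to follow the usual Wald construction: OR = exp(Estimate), Statistic = Estimate/std.error, and 95% CI = exp(Estimate ± 1.96 × std.error). The sketch below checks this against the PRS row. Because the table rounds Estimate and std.error to two decimals, the example recovers unrounded inputs from the reported OR and CI bounds; the function name `wald_odds_ratio` is hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_odds_ratio(estimate: float, std_error: float) -> dict:
    """OR, 95% CI, z statistic and two-sided p from a log-odds coefficient."""
    z = estimate / std_error
    return {
        "OR": math.exp(estimate),
        "CI": (math.exp(estimate - Z_975 * std_error),
               math.exp(estimate + Z_975 * std_error)),
        "z": z,
        "p": math.erfc(abs(z) / math.sqrt(2)),  # equals 2 * (1 - Phi(|z|))
    }


# PRS row: recover unrounded inputs from the reported OR and CI bounds.
estimate = math.log(2.7078)
std_error = (math.log(5.1728) - math.log(1.4175)) / (2 * Z_975)
result = wald_odds_ratio(estimate, std_error)
print(result)
```

The recovered z statistic (about 3.016) and two-sided p value (about 2.56e-03) agree with the reported Statistic and P, which supports reading Estimate as a log odds ratio.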
Metrics of prediction models in the replication dataset.
| Model | Accuracy | Specificity | Sensitivity | Bal. accuracy | AUC | 95%CI |
|---|---|---|---|---|---|---|
| PRS | 0.73 | 0.73 | 0.38 | 0.56 | 0.57 | 0.499–0.65 |
| Variant | 0.73 | 0.73 | 0.38 | 0.56 | 0.57 | 0.501–0.63 |
PRS = polygenic risk score model; Variant = variant model; Bal. accuracy = balanced accuracy score; AUC = area under the receiver operating characteristic (ROC) curve; 95%CI = 95% confidence interval of the AUC.
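This record does not state how the AUC and its confidence interval were computed. For a binary outcome, the AUC equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half (the Mann–Whitney formulation). The sketch below implements that definition directly on toy scores; the function name is hypothetical. Ties matter here because a score built from two variants can take at most nine distinct values (three genotypes per variant).

```python
def auc_mann_whitney(case_scores, control_scores) -> float:
    """P(random case score > random control score), counting ties as 1/2."""
    wins = 0.0
    for c in case_scores:
        for u in control_scores:
            if c > u:
                wins += 1.0
            elif c == u:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy scores (hypothetical), including one tie at 0.6
cases = [0.9, 0.4, 0.6]
controls = [0.5, 0.3, 0.6]
print(round(auc_mann_whitney(cases, controls), 3))
```

The pairwise loop is quadratic, but for the replication set (60 × 8214, roughly 490,000 pairs) it is still fast; rank-based formulas give the same value more efficiently.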
Confusion matrix of models in the replication dataset.
| | Predicted as cases | Predicted as controls |
|---|---|---|
| Actual Cases | 23 (TP) | 37 (FN) |
| Actual Controls | 2199 (FP) | 6015 (TN) |
TP = True positive; FN = False negative; FP = False positive; TN = True negative.
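The metrics in the prediction-metrics table follow directly from this confusion matrix (23 + 37 = 60 cases, 2199 + 6015 = 8214 controls). The short sketch below recomputes them; rounded to two decimals they match the reported accuracy (0.73), specificity (0.73), sensitivity (0.38), and balanced accuracy (0.56). The function name `classification_metrics` is illustrative.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard metrics from a 2 x 2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # fraction of actual cases predicted as cases
    specificity = tn / (tn + fp)  # fraction of actual controls predicted as controls
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "specificity": specificity,
        "sensitivity": sensitivity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


metrics = classification_metrics(tp=23, fn=37, fp=2199, tn=6015)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because controls outnumber cases by more than 100 to 1, accuracy is dominated by specificity; balanced accuracy weights both classes equally.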