Brooke L Fridley, Ed Iversen, Ya-Yu Tsai, Gregory D Jenkins, Ellen L Goode, Thomas A Sellers.
Abstract
One difficult question facing researchers is how to prioritize SNPs detected in genetic association studies for functional studies. Often a list of the top M SNPs is determined based solely on the p-value from an association analysis, where M is set by financial and time constraints. For many studies of complex diseases, multiple analyses have been completed, and integrating these multiple sets of results may be difficult. One may also wish to incorporate biological knowledge, such as whether the SNP is in the exon of a gene or in a regulatory region, into the selection of markers to follow up. In this manuscript, we propose a Bayesian latent variable model (BLVM) that incorporates "features" of a SNP to estimate a latent "quality score", with SNPs prioritized based on the posterior probability distribution of the rankings of these quality scores. We illustrate the method using data from an ovarian cancer genome-wide association study (GWAS). In addition to applying the BLVM to the ovarian GWAS, we applied it to simulated data that mimic the setting of prioritizing markers across multiple GWAS for related diseases/traits. The top-ranked SNP by BLVM for the ovarian GWAS ranked 2nd and 7th based on p-values from analyses of all invasive and invasive serous cases, respectively. The top SNP based on the serous case analysis p-value (which ranked 197th in the invasive case analysis) was ranked 8th based on the posterior probability of being in the top 5 markers (0.13). In summary, the BLVM allows for the systematic integration of multiple SNP "features" to prioritize loci for fine-mapping or functional studies, taking into account the uncertainty in ranking.
Year: 2011 PMID: 21687685 PMCID: PMC3110798 DOI: 10.1371/journal.pone.0020764
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Diagram of Latent Variable Model.
Figure 2. Plot of SNP ranks (mean of posterior distribution of rank) and the −log10(p-values) from analyses using (A) all invasive cases or (B) only invasive serous cases for each of the five BLVMs.
Figure 3. Plots of the mean rank (lower diagonal of sub-plots) and standard deviation in rank (upper diagonal of sub-plots) in the posterior distributions of the rankings from the five BLVMs.
The two sets of sub-plots are all plotted on the same scale.
Top 40 markers determined from BLVM. The markers are sorted by P(top 5).
| Marker | MAF | Invasive P | Invasive OR | Serous P | Serous OR | Rank based on P (Invasive) | Rank based on P (Serous) | Posterior P(top 5) | Posterior median rank |
| 1 | 0.139 | 2.0E-05 | 0.82 | 2.0E-05 | 0.79 | 2 | 7 | 0.47 | 6 |
| 2 | 0.14 | 2.0E-05 | 0.82 | 3.0E-05 | 0.79 | 1 | 9 | 0.46 | 7 |
| 3 | 0.116 | 3.0E-05 | 1.22 | 1.2E-04 | 1.26 | 3 | 24 | 0.33 | 15 |
| 4 | 0.13 | 4.0E-05 | 0.82 | 4.0E-05 | 0.79 | 4 | 13 | 0.27 | 12 |
| 5 | 0.134 | 4.0E-05 | 0.83 | 6.0E-05 | 0.8 | 7 | 20 | 0.19 | 16 |
| 6 | 0.18 | 4.0E-05 | 0.84 | 8.0E-05 | 0.82 | 6 | 23 | 0.19 | 17 |
| 7 | 0.18 | 4.0E-05 | 0.84 | 8.0E-05 | 0.82 | 5 | 21 | 0.19 | 17 |
| 8 | 0.11 | 2.0E-03 | 1.17 | 1.0E-05 | 1.32 | 197 | 1 | 0.13 | 151 |
| 9 | 0.252 | 4.0E-05 | 0.86 | 3.2E-02 | 0.91 | 8 | 286 | 0.12 | 52 |
| 10 | 0.126 | 7.0E-05 | 1.21 | 8.0E-05 | 1.25 | 9 | 22 | 0.09 | 21 |
| 11 | 0.015 | 1.4E-04 | 0.61 | 5.0E-05 | 0.51 | 41 | 16 | 0.06 | 38 |
| 12 | 0.174 | 1.0E-04 | 0.85 | 1.5E-04 | 0.82 | 20 | 26 | 0.05 | 29 |
| 13 | 0.015 | 1.5E-04 | 1.64 | 6.0E-05 | 1.94 | 43 | 19 | 0.05 | 40 |
| 14 | 0.049 | 8.0E-05 | 0.75 | 1.5E-03 | 0.76 | 10 | 57 | 0.04 | 45 |
| 15 | 0.049 | 8.0E-05 | 1.32 | 1.5E-03 | 1.31 | 11 | 58 | 0.04 | 45 |
| 16 | 0.049 | 8.0E-05 | 0.75 | 1.5E-03 | 0.76 | 13 | 59 | 0.04 | 45 |
| 17 | 0.049 | 8.0E-05 | 1.32 | 1.5E-03 | 1.32 | 12 | 60 | 0.04 | 46 |
| 18 | 0.258 | 9.0E-05 | 0.87 | 3.9E-02 | 0.91 | 15 | 301 | 0.02 | 74 |
| 19 | 0.258 | 9.0E-05 | 1.15 | 3.8E-02 | 1.1 | 16 | 297 | 0.02 | 75 |
| 20 | 0.258 | 9.0E-05 | 0.87 | 3.8E-02 | 0.91 | 17 | 298 | 0.02 | 75 |
| 21 | 0.258 | 9.0E-05 | 0.87 | 3.8E-02 | 0.91 | 18 | 300 | 0.02 | 75 |
| 22 | 0.259 | 9.0E-05 | 0.87 | 4.3E-02 | 0.91 | 14 | 311 | 0.02 | 73 |
| 23 | 0.258 | 1.0E-04 | 1.15 | 4.1E-02 | 1.09 | 19 | 307 | 0.02 | 76 |
| 24 | 0.257 | 1.1E-04 | 1.15 | 3.8E-02 | 1.1 | 24 | 296 | 0.02 | 79 |
| 25 | 0.258 | 1.1E-04 | 1.15 | 4.4E-02 | 1.09 | 21 | 321 | 0.02 | 80 |
| 26 | 0.258 | 1.1E-04 | 1.15 | 4.4E-02 | 1.09 | 22 | 319 | 0.02 | 80 |
| 27 | 0.258 | 1.1E-04 | 0.87 | 4.3E-02 | 0.91 | 25 | 318 | 0.02 | 81 |
| 28 | 0.258 | 1.1E-04 | 1.15 | 4.4E-02 | 1.09 | 23 | 320 | 0.02 | 81 |
| 29 | 0.258 | 1.1E-04 | 0.87 | 4.3E-02 | 0.91 | 26 | 317 | 0.02 | 81 |
| 30 | 0.258 | 1.1E-04 | 0.87 | 4.5E-02 | 0.91 | 28 | 329 | 0.02 | 81 |
| 31 | 0.258 | 1.1E-04 | 1.15 | 4.4E-02 | 1.09 | 30 | 322 | 0.02 | 81 |
| 32 | 0.258 | 1.1E-04 | 0.87 | 4.5E-02 | 0.91 | 27 | 327 | 0.02 | 82 |
| 33 | 0.258 | 1.1E-04 | 1.15 | 4.4E-02 | 1.09 | 31 | 323 | 0.02 | 82 |
| 34 | 0.147 | 4.6E-04 | 1.17 | 8.2E-04 | 1.2 | 67 | 37 | 0.01 | 77 |
| 35 | 0.125 | 5.9E-04 | 0.84 | 8.4E-04 | 0.81 | 81 | 40 | 0.01 | 89 |
| 36 | 0.176 | 1.4E-04 | 0.85 | 1.2E-02 | 0.88 | 40 | 218 | 0.01 | 78 |
| 37 | 0.002 | 2.2E-04 | 0.24 | 4.0E-03 | 0.24 | 53 | 119 | 0.01 | 85 |
| 38 | 0.038 | 7.6E-04 | 1.35 | 5.0E-04 | 1.44 | 90 | 34 | 0.01 | 102 |
| 39 | 0.434 | 3.4E-04 | 0.89 | 6.8E-03 | 0.9 | 57 | 186 | 0.01 | 96 |
| 40 | 0.434 | 3.5E-04 | 0.89 | 6.8E-03 | 0.9 | 59 | 185 | 0.01 | 96 |
Figure 4. Relationship between SNP association p-values, rankings based on p-values and BLVM, and probability of being in the top 5 markers.
I.P and S.P represent the p-values from the analyses involving all invasive cases or invasive serous cases, respectively; I.P.Rank and S.P.Rank represent the rank of the marker based on the p-values from the analysis involving all invasive cases or invasive serous cases, respectively; BLVM.Rank and P.Top5 represent the median rank and the probability of being in the top 5 markers based on the BLVM.
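The two BLVM summaries named here, the median rank and the probability of being in the top 5 markers, can both be read off posterior draws of the quality scores by ranking the SNPs within each draw. The following is a minimal sketch of that step only, not the authors' implementation: the function names and the toy draws are hypothetical, and it assumes a higher quality score means a better (lower-numbered) rank.

```python
from statistics import median


def ranks_within_draw(scores):
    """Rank SNPs within one posterior draw (rank 1 = highest quality score)."""
    # Assumption: a larger quality score means higher priority.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, snp in enumerate(order, start=1):
        ranks[snp] = position
    return ranks


def rank_summaries(score_draws, top_k=5):
    """Summarise the posterior distribution of ranks for each SNP.

    score_draws: list of draws, each a list of quality scores (one per SNP).
    Returns (p_top_k, median_rank), each a list with one entry per SNP.
    """
    rank_draws = [ranks_within_draw(draw) for draw in score_draws]
    n_draws = len(rank_draws)
    n_snps = len(rank_draws[0])
    # Fraction of draws in which each SNP falls in the top k.
    p_top_k = [
        sum(r[j] <= top_k for r in rank_draws) / n_draws for j in range(n_snps)
    ]
    # Median of each SNP's rank across draws.
    median_rank = [median(r[j] for r in rank_draws) for j in range(n_snps)]
    return p_top_k, median_rank


# Toy example (made-up numbers): 3 draws of quality scores for 3 SNPs.
toy_draws = [[3.0, 1.0, 2.0], [3.0, 2.0, 1.0], [1.0, 3.0, 2.0]]
p_top1, med = rank_summaries(toy_draws, top_k=1)
```

Because the ranks are recomputed in every draw, a SNP whose score is uncertain spreads its rank over many positions, which is how a marker can have a poor median rank yet a non-negligible probability of being in the top 5.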
Summary of simulated p-values and results from analysis using BLVM for Scenarios 2, 3 and 4.
| Simulation | Mean Quality Score (Null) | Mean Quality Score (Non-Null) | Median Rank (Null) | Median Rank (Non-Null) | Mean Quality Score (Non-Coding) | Mean Quality Score (Coding) | Median Rank (Non-Coding) | Median Rank (Coding) |
| Scenario 2 | ||||||||
| 1 | −0.029 | 2.46 | 50.69 | 1 | −0.052 | 1.57 | 51.5 | 7.3 |
| 2 | −0.018 | 1.64 | 50.95 | 4 | −0.033 | 1.01 | 51.5 | 17.7 |
| 3 | −0.021 | 1.81 | 50.77 | 3 | −0.038 | 1.12 | 51.4 | 13.7 |
| 4 | −0.023 | 2.1 | 51.36 | 1 | −0.037 | 1.12 | 51.8 | 21 |
| 5 | −0.02 | 1.73 | 50.83 | 4 | −0.041 | 1.24 | 51.6 | 11 |
| 6 | −0.02 | 2.08 | 51.11 | 2 | −0.043 | 1.41 | 51.8 | 14 |
| 7 | −0.016 | 1.33 | 50.77 | 7 | −0.027 | 0.8 | 51.1 | 25 |
| 8 | −0.029 | 2.81 | 50.9 | 1 | −0.064 | 2.05 | 51.8 | 4.7 |
| 9 | −0.022 | 2.21 | 51.06 | 2 | −0.04 | 1.32 | 51.6 | 16.3 |
| 10 | −0.023 | 2.36 | 50.82 | 2 | −0.053 | 1.73 | 51.6 | 10.3 |
| Scenario 3 | ||||||||
| 1 | −0.011 | 0.84 | 50.82 | 20 | −0.022 | 0.64 | 51.3 | 26.3 |
| 2 | −0.005 | 0.19 | 50.91 | 51 | −0.039 | 1.15 | 52.1 | 11.3 |
| 3 | −0.009 | 0.6 | 50.65 | 25 | −0.033 | 0.97 | 51.4 | 16.7 |
| 4 | −0.006 | 0.48 | 50.28 | 38 | −0.037 | 1.16 | 51.3 | 13 |
| 5 | −0.025 | 2.1 | 51.13 | 3 | −0.004 | −0.01 | 50.7 | 48.7 |
| 6 | −0.012 | 0.74 | 50.78 | 26 | −0.034 | 0.95 | 51.4 | 21 |
| 7 | −0.003 | 0.26 | 50.77 | 42 | −0.044 | 1.39 | 52 | 7 |
| 8 | 0.002 | −0.19 | 50.2 | 63 | −0.046 | 1.47 | 51.6 | 7.7 |
| 9 | −0.008 | 0.66 | 50.71 | 23 | −0.031 | 0.96 | 51.4 | 19 |
| 10 | −0.004 | 0.72 | 50.92 | 24 | −0.011 | 0.47 | 51.1 | 36.7 |
| Scenario 4 | ||||||||
| 1 | −0.018 | 1.86 | 50.76 | 3 | −0.009 | 0.33 | 50.8 | 35 |
| 2 | −0.017 | 1.53 | 50.88 | 5 | −0.026 | 0.79 | 51.2 | 24.7 |
| 3 | −0.021 | 1.85 | 51.25 | 2 | −0.024 | 0.7 | 51.4 | 29 |
| 4 | −0.018 | 1.91 | 50.78 | 3 | −0.021 | 0.72 | 51.2 | 21.3 |
| 5 | −0.02 | 1.91 | 51.19 | 3 | −0.014 | 0.43 | 51.1 | 39.7 |
| 6 | −0.017 | 1.54 | 50.65 | 6 | −0.008 | 0.19 | 50.4 | 42.3 |
| 7 | −0.023 | 2.07 | 51.01 | 3 | −0.005 | 0.08 | 50.7 | 45 |
| 8 | −0.02 | 1.75 | 50.97 | 4 | −0.026 | 0.77 | 51.4 | 22.3 |
| 9 | −0.038 | 3.54 | 51.06 | 1 | −0.004 | 0.07 | 50.8 | 44.3 |
| 10 | −0.013 | 1.27 | 50.48 | 9 | −0.032 | 1.04 | 51.1 | 16.7 |
*Scenario 2: SNP 10 (coding SNP) simulated to be associated with phenotypes 1 & 2.
Scenario 3: SNP 60 (non-coding SNP) simulated to be associated with phenotypes 1 & 2.
Scenario 4: SNP 60 (non-coding SNP) simulated to be associated with all phenotypes.
Summary of simulated p-values and results from analysis using BLVM for Scenario 1.
| Simulation (Scenario 1) | Mean Quality Score (Non-Coding) | Mean Quality Score (Coding) | Median Rank (Non-Coding) | Median Rank (Coding) |
| 1 | −0.01 | 0.2 | 50.75 | 46.67 |
| 2 | −0.02 | 0.7 | 51.18 | 27.33 |
| 3 | −0.03 | 1.0 | 51.10 | 23.33 |
| 4 | 0.00 | 0.0 | 50.79 | 54.00 |
| 5 | −0.03 | 0.9 | 51.32 | 19.33 |
| 6 | −0.02 | 0.5 | 51.11 | 35.00 |
| 7 | −0.02 | 0.4 | 50.96 | 34.00 |
| 8 | −0.02 | 0.5 | 50.93 | 33.33 |
| 9 | −0.02 | 0.6 | 51.64 | 30.33 |
| 10 | −0.04 | 1.1 | 51.63 | 13.67 |