Soonil Kwon, Xiaofei Yan, Jinrui Cui, Jie Yao, Kai Yang, Donald Tsiang, Xiaohui Li, Jerome I Rotter, Xiuqing Guo.
Abstract
Genetic association studies usually involve a large number of single-nucleotide polymorphisms (SNPs) (k) and a relatively small sample size (n), so that k is much greater than n. Because conventional statistical approaches cannot handle multiple SNPs simultaneously when k is much greater than n, single-SNP association studies have been used to identify genes involved in a disease's pathophysiology, which creates a multiple testing problem. To evaluate the contribution of multiple SNPs simultaneously to disease traits when k is much greater than n, we developed the Bayesian regression with singular value decomposition (BRSVD) method. The method reduces the dimension of the design matrix from k to n by applying singular value decomposition to the design matrix. We evaluated the model using a Markov chain Monte Carlo simulation with a Gibbs sampler constructed from the posterior densities derived from conjugate prior densities. Permutation was incorporated to generate empirical p-values. We applied the BRSVD method to the sequence data provided by Genetic Analysis Workshop 17 and compared it with the single-SNP association test and the penalized regression method; we found that the BRSVD method is a practical method for analyzing sequence data.
Year: 2011 PMID: 22373181 PMCID: PMC3287895 DOI: 10.1186/1753-6561-5-S9-S57
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
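The abstract states that BRSVD reduces the dimension of the design matrix from k to n by singular value decomposition. The following is a minimal NumPy sketch of one way such a reduction can work, not the authors' implementation: the function name `reduce_design`, the toy sizes, and the random genotype matrix are assumptions for illustration, and the abstract does not say which parameterization the authors use.

```python
import numpy as np

def reduce_design(X):
    """Return (Z, Vt) with Z = U diag(d), so that X = Z @ Vt.

    X is n x k with k >> n. The thin SVD gives U (n x n), d (n,), Vt (n x k),
    so the reduced design matrix Z has only n columns.
    """
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U * d  # scales column j of U by d[j], i.e. U @ np.diag(d)
    return Z, Vt

# Hypothetical toy data: n samples, k SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n, k = 20, 500
X = rng.integers(0, 3, size=(n, k)).astype(float)

Z, Vt = reduce_design(X)

# Any k-vector of SNP effects beta maps to an n-vector gamma = Vt @ beta
# that gives the same linear predictor: X @ beta == Z @ gamma.
beta = rng.normal(size=k)
gamma = Vt @ beta
print(Z.shape, np.allclose(X @ beta, Z @ gamma))
```

Under this sketch, a regression fitted on the n columns of Z can be mapped back to SNP-level effects with `Vt.T @ gamma`, which yields the same fitted values as the original k-column design.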
Figure 1. Single-SNP association analysis from PLINK. x-axis: All SNPs on chromosomes 1–22 are numbered from 1 to 24,487. y-axis: −log10(p-value). The names of the two correctly identified SNPs are given.
Figure 2. Association results from the penalized regression method. x-axis: All SNPs on chromosomes 1–22 are numbered from 1 to 24,487. y-axis: −log10(p-value). The names of the three correctly identified SNPs are given.
Figure 3. Association results from the BRSVD method. x-axis: All SNPs on chromosomes 1–22 are numbered from 1 to 24,487. y-axis: −log10(p-value). The names of the nine correctly identified SNPs are given.
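The figures plot −log10 of p-values, and the abstract says that permutation was used to generate empirical p-values. A generic permutation test can be sketched as below. The test statistic (absolute correlation), the number of permutations, and the names `empirical_p_value` and `abs_corr` are assumptions chosen for illustration; the abstract does not specify the statistic or permutation scheme used in the study.

```python
import numpy as np

def abs_corr(x, y):
    """Hypothetical test statistic: absolute Pearson correlation."""
    return abs(np.corrcoef(x, y)[0, 1])

def empirical_p_value(stat_fn, x, y, n_perm=999, seed=0):
    """Permutation p-value for the association between x and y.

    The trait values y are shuffled to break any association with x.
    The +1 terms count the observed statistic as one of the permutations,
    so the result is never exactly zero.
    """
    rng = np.random.default_rng(seed)
    observed = stat_fn(x, y)
    exceed = 0
    for _ in range(n_perm):
        if stat_fn(x, rng.permutation(y)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Toy example: a trait that depends strongly on the genotype code.
rng = np.random.default_rng(1)
genotype = rng.integers(0, 3, size=100).astype(float)
trait = 2.0 * genotype + rng.normal(size=100)

p = empirical_p_value(abs_corr, genotype, trait)
print(p, -np.log10(p))
```

With 999 permutations the smallest attainable p-value is 1/1000, so the resolution of −log10(p) is bounded by the number of permutations.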
Summary of validation of the three methods
| Actual outcome (rows) vs. empirical outcome (columns) | Single-SNP association | | | PR method | | | BRSVD method | | |
|---|---|---|---|---|---|---|---|---|---|
| | E′ (= 50) | IE′ (= 24,437) | | E′ (= 16) | IE′ (= 24,471) | | E′ (= 45) | IE′ (= 24,442) | |
| E (= 39) | TP = 2 | FN = 37 | Sen = 0.051 | TP = 3 | FN = 36 | Sen = 0.077 | TP = 9 | FN = 30 | Sen = 0.231 |
| IE (= 24,448) | FP = 48 | TN = 24,400 | Spe = 0.998 | FP = 13 | TN = 24,435 | Spe = 0.9995 | FP = 36 | TN = 24,412 | Spe = 0.9985 |
| | PPV = 0.04 | NPV = 0.9984 | | PPV = 0.187 | NPV = 0.9985 | | PPV = 0.2 | NPV = 0.9988 | |
Empirical analysis results for the three methods. E is the number of SNPs that are truly effective, and IE is the number of SNPs that are truly ineffective. E′ is the number of SNPs that are empirically effective, and IE′ is the number of SNPs that are empirically ineffective. TP, true positive; FP, false positive; FN, false negative; TN, true negative; PPV, positive predictive value; NPV, negative predictive value; Sen, sensitivity; Spe, specificity.
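The summary measures in the table follow from the four counts by the standard definitions: Sen = TP / (TP + FN), Spe = TN / (TN + FP), PPV = TP / (TP + FP), and NPV = TN / (TN + FN). The short sketch below recomputes them from the counts reported in the table; the function name `confusion_metrics` is an illustrative choice, not something from the paper.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard summary measures for a 2 x 2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# (TP, FN, FP, TN) as reported in the table for each method.
counts = {
    "Single-SNP association": (2, 37, 48, 24400),
    "PR method": (3, 36, 13, 24435),
    "BRSVD method": (9, 30, 36, 24412),
}

results = {name: confusion_metrics(*c) for name, c in counts.items()}
for name, m in results.items():
    print(name, {key: round(value, 4) for key, value in m.items()})
```

In every method TP + FN = 39 and FP + TN = 24,448, which together account for all 24,487 SNPs. The PR method's PPV is 3/16 = 0.1875, which the table reports as 0.187.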