Multiple testing in genome-wide association studies via hidden Markov models.

Zhi Wei, Wenguang Sun, Kai Wang, Hakon Hakonarson.

Abstract

MOTIVATION: Genome-wide association studies (GWAS) interrogate common genetic variation across the entire human genome in an unbiased manner and hold promise in identifying genetic variants with moderate or weak effect sizes. However, conventional testing procedures, which are mostly P-value based, ignore the dependency and therefore suffer from loss of efficiency. The goal of this article is to exploit the dependency information among adjacent single nucleotide polymorphisms (SNPs) to improve the screening efficiency in GWAS.
RESULTS: We propose to model the linear block dependency in the SNP data using hidden Markov models (HMMs). A compound decision-theoretic framework for testing HMM-dependent hypotheses is developed. We propose a powerful data-driven procedure [pooled local index of significance (PLIS)] that controls the false discovery rate (FDR) at the nominal level. PLIS is shown to be optimal in the sense that it has the smallest false negative rate (FNR) among all valid FDR procedures. By re-ranking significance for all SNPs with dependency considered, PLIS gains higher power than conventional P-value based methods. Simulation results demonstrate that PLIS dominates conventional FDR procedures in detecting disease-associated SNPs. Our method is applied to analysis of the SNP data from a GWAS of type 1 diabetes. Compared with the Benjamini-Hochberg (BH) procedure, PLIS yields more accurate results and has better reproducibility of findings.
CONCLUSION: The genomic rankings based on our procedure are substantially different from the rankings based on the P-values. By integrating information from adjacent locations, the PLIS rankings benefit from the increased signal-to-noise ratio, hence our procedure often has higher statistical power and better reproducibility. It provides a promising direction in large-scale GWAS.
AVAILABILITY: An R package PLIS has been developed to implement the PLIS procedure. Source codes are available upon request and will be available on CRAN (http://cran.r-project.org/).
CONTACT: zhiwei@njit.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
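The abstract does not spell out how the procedure works, so the following Python sketch is only an illustration of the general idea, not the authors' R implementation. It assumes a two-state HMM per chromosome (state 0 = null with N(0, 1) z-scores, state 1 = associated with N(mu1, sigma1) z-scores) whose parameters are already known; in practice they would have to be estimated from the data. It computes each SNP's local index of significance, taken here to be the posterior probability of the null state given all z-scores on that chromosome, pools these values across chromosomes, and rejects the smallest ones as long as their running average stays at or below the nominal FDR level. The step-up rule and all function and parameter names (`lis_two_state`, `step_up_reject`, `plis_sketch`, `pi0`, `trans`, `mu1`, `sigma1`) are assumptions made for this example.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lis_two_state(z, pi0, trans, mu1, sigma1):
    """Posterior P(state_i = 0 | all z) under an assumed two-state HMM.

    pi0 is the initial probability of the null state and trans[a][b]
    is the probability of moving from state a to state b.
    """
    n = len(z)
    emit = [(normal_pdf(x, 0.0, 1.0), normal_pdf(x, mu1, sigma1)) for x in z]
    init = (pi0, 1.0 - pi0)

    # Forward pass, rescaled at every step to avoid underflow.
    fwd = []
    for i in range(n):
        if i == 0:
            a = [init[s] * emit[0][s] for s in (0, 1)]
        else:
            prev = fwd[-1]
            a = [emit[i][s] * sum(prev[r] * trans[r][s] for r in (0, 1))
                 for s in (0, 1)]
        total = a[0] + a[1]
        fwd.append([a[0] / total, a[1] / total])

    # Backward pass with the same kind of rescaling.
    bwd = [None] * n
    bwd[n - 1] = [1.0, 1.0]
    for i in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * emit[i + 1][r] * bwd[i + 1][r] for r in (0, 1))
             for s in (0, 1)]
        total = b[0] + b[1]
        bwd[i] = [b[0] / total, b[1] / total]

    # Combine both passes and normalise per SNP.
    lis = []
    for i in range(n):
        w0 = fwd[i][0] * bwd[i][0]
        w1 = fwd[i][1] * bwd[i][1]
        lis.append(w0 / (w0 + w1))
    return lis


def step_up_reject(lis, alpha):
    """Indices of the k smallest values, with k the largest count whose
    running average is at most alpha."""
    order = sorted(range(len(lis)), key=lambda i: lis[i])
    running, k = 0.0, 0
    for j, idx in enumerate(order, start=1):
        running += lis[idx]
        if running / j <= alpha:
            k = j
    return sorted(order[:k])


def plis_sketch(chromosomes, alpha):
    """Pool per-chromosome values, then apply one common threshold.

    chromosomes maps a name to (z_scores, hmm_parameter_dict).
    """
    pooled, labels = [], []
    for name, (z, params) in chromosomes.items():
        for pos, value in enumerate(lis_two_state(z, **params)):
            pooled.append(value)
            labels.append((name, pos))
    return [labels[i] for i in step_up_reject(pooled, alpha)]
```

Because the ranking uses posterior probabilities that borrow strength from neighbouring SNPs, a moderately large z-score inside a run of large z-scores can rank ahead of an isolated z-score of the same size, which is the re-ranking effect the abstract refers to.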

Year: 2009 PMID: 19654115 DOI: 10.1093/bioinformatics/btp476

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited by: 16 in total

1. False Discovery Control in Large-Scale Spatial Multiple Testing.

Authors: Wenguang Sun; Brian J Reich; T Tony Cai; Michele Guindani; Armin Schwartzman
Journal: J R Stat Soc Series B Stat Methodol Date: 2015-01-01 Impact factor: 4.488

2. Penalized multimarker vs. single-marker regression methods for genome-wide association studies of quantitative traits.

Authors: Hui Yi; Patrick Breheny; Netsanet Imam; Yongmei Liu; Ina Hoeschele
Journal: Genetics Date: 2014-10-28 Impact factor: 4.562

3. Integrating prior knowledge in multiple testing under dependence with applications to detecting differential DNA methylation.

Authors: Pei Fen Kuan; Derek Y Chiang
Journal: Biometrics Date: 2012-01-19 Impact factor: 2.571

4. Multiple testing for neuroimaging via hidden Markov random field.

Authors: Hai Shu; Bin Nan; Robert Koeppe
Journal: Biometrics Date: 2015-05-26 Impact factor: 2.571

5. Ranking causal variants and associated regions in genome-wide association studies by the support vector machine and random forest.

Authors: Usman Roshan; Satish Chikkagoudar; Zhi Wei; Kai Wang; Hakon Hakonarson
Journal: Nucleic Acids Res Date: 2011-02-11 Impact factor: 16.971

10. Finding type 2 diabetes causal single nucleotide polymorphism combinations and functional modules from genome-wide association data.

Authors: Chiyong Kang; Hyeji Yu; Gwan-Su Yi
Journal: BMC Med Inform Decis Mak Date: 2013-04-05 Impact factor: 2.796

6. Bayesian Hidden Markov Models for Dependent Large-Scale Multiple Testing.

7. Evolutionary forces shaping genomic islands of population differentiation in humans.

8. SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data.

9. Gene hunting with hidden Markov model knockoffs.
