Xiaonan Hu (1), Wei Zhang (2), Sanguo Zhang (1), Shuangge Ma (3), Qizhai Li (2)
(1) School of Mathematical Sciences, University of Chinese Academy of Sciences, Key Laboratory of Big Data Mining and Knowledge Management.
(2) Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.
(3) Department of Biostatistics, Yale University, New Haven, CT, USA.
Abstract
MOTIVATION: In large-scale genetic association studies with tens of hundreds of single nucleotide polymorphisms (SNPs) genotyped, the traditional statistical framework of logistic regression, which uses the maximum likelihood estimator (MLE) to infer the odds ratios of SNPs, may not work appropriately. A large number of odds ratios must be estimated, and the MLEs may be unstable when some of the SNPs are in high linkage disequilibrium. In this situation, P-value combination procedures provide good alternatives because they are built on single-marker analysis.
RESULTS: The commonly used P-value combination methods (such as Fisher's combined test, the truncated product method, the truncated tail strength and the adaptive rank truncated product, ARTP) may lose power when the significance level varies across SNPs. To tackle this problem, a group combined P-value method (GCP) is proposed, in which the P-values are divided into multiple groups and then combined at the group level. With this strategy, the significance values are integrated at different levels, and the power is improved. Simulations show that the GCP effectively controls the type I error rate and has additional power over the existing methods; the power gain can exceed 50% in some situations. The proposed GCP method is applied to data from the Genetic Analysis Workshop 16. Among all the methods, only the GCP and ARTP reach significance in identifying a genomic region covering the gene DSC3 as associated with rheumatoid arthritis, and the GCP gives the smaller P-value.
AVAILABILITY AND IMPLEMENTATION: http://www.statsci.amss.ac.cn/yjscy/yjy/lqz/201510/t20151027_313273.html
CONTACT: liqz@amss.ac.cn
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
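To make the combination procedures named in the abstract concrete, the following is a minimal Python sketch of two of them: Fisher's combined test and the truncated product statistic. It also includes a purely illustrative grouping step. The function names, the truncation point `tau` and the grouping `cutoffs` are assumptions made for this example; the abstract does not specify how the GCP forms its groups or evaluates significance, so `group_fisher_statistics` should not be read as the authors' method. The chi-square reference distribution used for Fisher's test assumes independent P-values, which SNPs in linkage disequilibrium generally violate, so in practice a permutation or other resampling scheme would likely be needed.

```python
import math


def fisher_combined(pvalues):
    """Fisher's combined test, assuming independent P-values.

    Returns (statistic, combined P-value). Under the null hypothesis the
    statistic -2 * sum(ln p) is chi-square distributed with 2k degrees of
    freedom, where k is the number of P-values.
    """
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    half = stat / 2.0
    # Closed-form chi-square survival function for even degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total


def truncated_product(pvalues, tau=0.05):
    """Truncated product statistic: product of the P-values that are <= tau.

    Returns 1.0 when no P-value falls at or below tau. `tau` is an
    illustrative default, not a value taken from the abstract.
    """
    w = 1.0
    for p in pvalues:
        if p <= tau:
            w *= p
    return w


def group_fisher_statistics(pvalues, cutoffs=(0.01, 0.1)):
    """Hypothetical grouping: bin P-values by magnitude, one statistic per bin.

    The cutoffs are assumptions for illustration only. Because the bins are
    chosen from the P-values themselves, these statistics are NOT chi-square
    distributed; their significance would have to come from resampling.
    """
    groups = [[] for _ in range(len(cutoffs) + 1)]
    for p in pvalues:
        idx = sum(p > c for c in cutoffs)  # number of cutoffs lying below p
        groups[idx].append(p)
    # An empty group contributes a statistic of zero.
    return [-2.0 * sum(math.log(p) for p in g) for g in groups]


if __name__ == "__main__":
    pvals = [0.001, 0.03, 0.2, 0.5, 0.8]  # made-up P-values for demonstration
    print(fisher_combined(pvals))
    print(truncated_product(pvals, tau=0.05))
    print(group_fisher_statistics(pvals))
```

With a single P-value, `fisher_combined` returns that P-value unchanged, which gives a quick sanity check of the closed-form tail probability.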