MOTIVATION: Quality control (QC) filtering of single nucleotide polymorphisms (SNPs) is an important step in genome-wide association studies to minimize potential false findings. SNP QC commonly uses expert-guided filters based on QC variables [e.g. Hardy-Weinberg equilibrium, missing proportion (MSP) and minor allele frequency (MAF)] to remove SNPs with insufficient genotyping quality. The rationale of the expert filters is sensible and concrete, but their implementation requires arbitrary thresholds and does not jointly consider all QC features.

RESULTS: We propose an algorithm based on principal component analysis and clustering analysis to identify low-quality SNPs. The method minimizes the use of arbitrary cutoff values, allows a collective consideration of the QC features and provides conditional thresholds contingent on other QC variables (e.g. different MSP thresholds for different MAFs). We applied our method to the seven studies from the Wellcome Trust Case Control Consortium and the major depressive disorder study from the Genetic Association Information Network. We compared the performance of our method with that of the expert filters on the following criteria: (i) percentage of SNPs excluded due to low quality; (ii) inflation factor of the test statistics (lambda); (iii) number of false associations found in the filtered dataset; and (iv) number of true associations missed in the filtered dataset. The results suggest that, with the same or fewer SNPs excluded, the proposed algorithm tends to give a similar or lower value of lambda, yields fewer false associations and retains all true associations.

AVAILABILITY: The algorithm is available at http://www4.stat.ncsu.edu/jytzeng/software.php
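
As a rough illustration of evaluation criteria (i) and (ii), the sketch below applies a conventional fixed-threshold expert filter to a handful of made-up SNP records and then computes the percentage excluded and the inflation factor lambda. It does not implement the proposed PCA and clustering algorithm. The cutoffs (MAF >= 0.01, MSP <= 0.05, Hardy-Weinberg p >= 1e-4), the function names and the data are all assumptions chosen for illustration and are not taken from the study. Lambda is computed in the usual way, as the median of the 1-df chi-square test statistics divided by the median of the chi-square distribution with 1 degree of freedom (about 0.4549).

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def passes_expert_filter(maf, msp, hwe_p,
                         maf_min=0.01, msp_max=0.05, hwe_p_min=1e-4):
    """Return True if a SNP passes all three fixed-threshold filters.

    The default cutoffs are illustrative assumptions, not values from the study.
    """
    return maf >= maf_min and msp <= msp_max and hwe_p >= hwe_p_min


def inflation_factor(chi2_stats):
    """Inflation factor lambda: observed median statistic over expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical SNP records:
# (id, MAF, MSP, Hardy-Weinberg p-value, 1-df chi-square association statistic)
snps = [
    ("snp1", 0.25, 0.01, 0.50, 0.10),
    ("snp2", 0.005, 0.00, 0.80, 0.45),  # fails the MAF cutoff
    ("snp3", 0.30, 0.12, 0.30, 6.20),   # fails the MSP cutoff
    ("snp4", 0.15, 0.02, 1e-7, 9.80),   # fails the Hardy-Weinberg cutoff
    ("snp5", 0.40, 0.00, 0.65, 0.90),
]

kept = [s for s in snps if passes_expert_filter(s[1], s[2], s[3])]

# Criterion (i): percentage of SNPs excluded due to low quality.
excluded_pct = 100.0 * (len(snps) - len(kept)) / len(snps)

# Criterion (ii): inflation factor of the test statistics in the filtered set.
lam = inflation_factor([s[4] for s in kept])

print(f"excluded: {excluded_pct:.1f}%  lambda: {lam:.3f}")
```

Each cutoff here is applied independently of the other QC variables, which is the limitation the abstract points to: a SNP is judged on one variable at a time rather than on its joint QC profile.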