MOTIVATION: The question of how to best use information from known associated variants when conducting disease association studies has yet to be answered. Some studies compute a marginal P-value for each single nucleotide polymorphism (SNP) independently, ignoring previously discovered variants. Other studies include known variants as covariates in logistic regression, but a weakness of this standard conditioning strategy is that it does not account for disease prevalence and non-random ascertainment, which can induce a correlation structure between candidate variants and known associated variants even if the variants lie on different chromosomes. Here, we propose a new conditioning approach, which is based in part on the classical technique of liability threshold modeling. Roughly, this method estimates model parameters for each known variant while accounting for the published disease prevalence from the epidemiological literature. RESULTS: We show via simulation and application to empirical datasets that our approach outperforms both the no-conditioning strategy and the standard conditioning strategy, with a properly controlled false-positive rate. Furthermore, in multiple datasets involving diseases of low prevalence, standard conditioning produces a severe drop in test statistics, whereas our approach generally performs as well as or better than no conditioning. Our approach may substantially improve disease gene discovery for diseases with many known risk variants. AVAILABILITY: LTSOFT software is available online at http://www.hsph.harvard.edu/faculty/alkes-price/software/.
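As a rough illustration of how disease prevalence enters a classical liability threshold model, the sketch below assumes a standard-normal liability and treats an individual as a case when liability exceeds a threshold set by the prevalence. It is not the LTSOFT implementation and does not reproduce the parameter estimation for known variants described above; the function names `liability_threshold` and `expected_liability` are hypothetical and chosen only for this example.

```python
from statistics import NormalDist


def liability_threshold(prevalence: float) -> float:
    """Threshold T on a standard-normal liability scale with P(liability > T) = prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - prevalence)


def expected_liability(prevalence: float) -> tuple[float, float]:
    """Mean liability of cases and of controls (truncated standard-normal means).

    Assumption for illustration: liability is N(0, 1) in the population,
    with no covariates or known variants taken into account.
    """
    std_normal = NormalDist()
    threshold = liability_threshold(prevalence)
    density = std_normal.pdf(threshold)
    case_mean = density / prevalence               # E[liability | liability > T]
    control_mean = -density / (1.0 - prevalence)   # E[liability | liability <= T]
    return case_mean, control_mean
```

Under these assumptions, a lower prevalence pushes the threshold higher, so case status carries more information about liability than control status does. This asymmetry is one reason prevalence matters when cases are oversampled.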