Huwenbo Shi (1), Bogdan Pasaniuc (2), Kenneth L Lange (3)
1. Bioinformatics Interdepartmental Program, University of California, Los Angeles.
2. Bioinformatics Interdepartmental Program, University of California, Los Angeles, Department of Pathology and Laboratory Medicine, Department of Human Genetics and.
3. Bioinformatics Interdepartmental Program, University of California, Los Angeles, Department of Human Genetics and Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90024, USA.
Abstract
MOTIVATION: Haplotype models enjoy a wide range of applications in population inference and disease gene discovery. The hidden Markov models traditionally used for haplotypes are hindered by the dubious assumption that dependencies occur only between consecutive pairs of variants. In this article, we apply the multivariate Bernoulli (MVB) distribution to model haplotype data. The MVB distribution relies on interactions among all sets of variants, thus allowing for the detection and exploitation of long-range and higher-order interactions. We discuss penalized estimation and present an efficient algorithm for fitting sparse versions of the MVB distribution to haplotype data. Finally, we showcase the benefits of the MVB model in predicting DNaseI hypersensitivity (DH) status, an epigenetic mark describing chromatin accessibility, from population-scale haplotype data.

RESULTS: We fit the MVB model to real data from 59 individuals on whom both haplotypes and DH status in lymphoblastoid cell lines are publicly available. The model allows prediction of DH status from genetic data (prediction R² = 0.12 in cross-validations). Comparisons of prediction under the MVB model with prediction under linear regression (best linear unbiased prediction) and logistic regression demonstrate that the MVB model achieves about 10% higher prediction R² than the two competing methods in empirical data.

AVAILABILITY AND IMPLEMENTATION: Software implementing the method described can be downloaded at http://bogdan.bioinformatics.ucla.edu/software/.

CONTACT: shihuwenbo@ucla.edu or pasaniuc@ucla.edu.
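To make the idea of modeling interactions among variants concrete, the following is a minimal sketch of an MVB-style log-linear distribution over binary haplotypes. It is not the authors' software: it assumes the model is truncated at pairwise interaction terms, normalizes by brute-force enumeration (feasible only for a handful of variants), and omits the penalized fitting described in the abstract. The function names and parameter values are hypothetical, chosen only for illustration.

```python
import itertools
import math


def mvb_log_unnormalized(x, main, pair):
    """Log of the unnormalized probability of haplotype x.

    x    : tuple of 0/1 alleles, one entry per variant
    main : list of main-effect parameters, one per variant
    pair : dict mapping (i, j) index pairs to pairwise interaction parameters
           (assumption: higher-order terms are dropped in this sketch)
    """
    total = sum(main[i] * x[i] for i in range(len(x)))
    for (i, j), weight in pair.items():
        total += weight * x[i] * x[j]
    return total


def mvb_probabilities(main, pair):
    """Enumerate all 2^m haplotypes and normalize their weights."""
    m = len(main)
    states = list(itertools.product((0, 1), repeat=m))
    weights = [math.exp(mvb_log_unnormalized(x, main, pair)) for x in states]
    z = sum(weights)  # normalizing constant
    return {x: w / z for x, w in zip(states, weights)}


# Illustrative parameters: three variants, with a single interaction between
# the first and third variant (a non-adjacent, "long-range" pair).
main = [0.0, -1.0, 0.5]
pair = {(0, 2): 2.0}
probs = mvb_probabilities(main, pair)
```

In this toy setting the interaction term links two non-adjacent variants directly, which is the kind of dependency a first-order hidden Markov model cannot represent. Setting every interaction to zero reduces the model to independent variants.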