Bayesian haplotype inference via the Dirichlet process.

Eric P Xing, Michael I Jordan, Roded Sharan.

Abstract

The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for the understanding of genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. The problem can be formulated as a mixture model, where the mixture components correspond to the pool of haplotypes in the population. The size of this pool is unknown; indeed, knowing the size of the pool would correspond to knowing something significant about the genome and its history. Thus methods for fitting the genotype mixture must crucially address the problem of estimating a mixture with an unknown number of mixture components. In this paper we present a Bayesian approach to this problem based on a nonparametric prior known as the Dirichlet process. The model also incorporates a likelihood that captures statistical errors in the haplotype/genotype relationship, trading off these errors against the size of the pool of haplotypes. We describe an algorithm based on Markov chain Monte Carlo for posterior inference in our model. The overall result is a flexible Bayesian method, referred to as DP-Haplotyper, that is reminiscent of parsimony methods in its preference for small haplotype pools. We further generalize the model to treat pedigree relationships (e.g., trios) between the population's genotypes. We apply DP-Haplotyper to the analysis of both simulated and real genotype data, and compare it with extant methods.
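The preference of a Dirichlet process prior for small haplotype pools can be made concrete through its Chinese restaurant process (CRP) representation. The sketch below is illustrative only and is not DP-Haplotyper's implementation: it samples component (haplotype) assignments for `n` haplotype draws under a CRP with an assumed concentration parameter `alpha`, and compares the number of distinct components with its exact prior expectation, which grows only logarithmically in `n`. The function names `crp_partition` and `expected_num_components` are hypothetical.

```python
import random


def crp_partition(n, alpha, rng):
    """Sample assignments of n draws to components under a CRP(alpha) prior.

    Hypothetical helper: draw i joins existing component k with probability
    counts[k] / (i + alpha), or opens a new component with probability
    alpha / (i + alpha). Components play the role of haplotypes in the pool.
    """
    counts = []        # counts[k] = number of draws assigned to component k
    assignments = []   # assignments[i] = component index of draw i
    for i in range(n):
        # Total unnormalized weight: i existing draws plus alpha for "new".
        r = rng.random() * (i + alpha)
        chosen = len(counts)  # default: open a new component
        cumulative = 0.0
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(1)
        else:
            counts[chosen] += 1
        assignments.append(chosen)
    return assignments, counts


def expected_num_components(n, alpha):
    """Exact prior mean of the number of distinct components after n draws.

    E[K_n] = sum_{i=0}^{n-1} alpha / (alpha + i), roughly alpha * log(n).
    """
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    n, alpha = 200, 1.0  # assumed values, chosen only for illustration
    _, counts = crp_partition(n, alpha, rng)
    print("sampled distinct components:", len(counts))
    print("prior expectation:", round(expected_num_components(n, alpha), 2))
```

With `alpha = 1.0` the expectation equals the harmonic number H_n, about 5.88 for n = 200, so a few hundred draws are typically explained by a handful of components; a larger `alpha` weakens this preference. In the full model the likelihood (including the error terms mentioned in the abstract) pulls against this prior during MCMC.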

Year:  2007        PMID: 17563311     DOI: 10.1089/cmb.2006.0102

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  4 in total

1.  Modelling transcriptional regulation with a mixture of factor analyzers and variational Bayesian expectation maximization.

Authors:  Kuang Lin; Dirk Husmeier
Journal:  EURASIP J Bioinform Syst Biol       Date:  2009-06-11

2.  CSHAP: efficient haplotype frequency estimation based on sparse representation.

Authors:  Yinsheng Zhou; Han Zhang; Yaning Yang
Journal:  Bioinformatics       Date:  2019-08-15       Impact factor: 6.937

3.  Mixture models with a prior on the number of components.

Authors:  Jeffrey W Miller; Matthew T Harrison
Journal:  J Am Stat Assoc       Date:  2017-11-13       Impact factor: 5.033

4.  Effect on Prediction when Modeling Covariates in Bayesian Nonparametric Models.

Authors:  Alejandro Cruz-Marcelo; Gary L Rosner; Peter Müller; Clinton F Stewart
Journal:  J Stat Theory Pract       Date:  2013-04-01