Christopher Schröder¹, Sven Rahmann¹
¹ Genome Informatics, Institute of Human Genetics, University of Duisburg-Essen, University Hospital Essen, Hufelandstr. 55, 45147 Essen, Germany.
Abstract
BACKGROUND: Mixtures of beta distributions are a flexible tool for modeling data with values on the unit interval, such as methylation levels. However, maximum likelihood parameter estimation with beta distributions is hampered by singularities in the log-likelihood function when some observations take the values 0 or 1.
METHODS: While ad-hoc corrections have been proposed to mitigate this problem, we propose a different approach to parameter estimation for beta mixtures in which such problems do not arise in the first place. Our algorithm combines latent variables with the method of moments instead of maximum likelihood, which has computational advantages over the popular EM algorithm.
RESULTS: As an application, we demonstrate that methylation state classification is more accurate when using adaptive thresholds from beta mixtures than when using non-adaptive thresholds on observed methylation levels. We also demonstrate that we can accurately infer the number of mixture components.
CONCLUSIONS: The hybrid algorithm, which combines likelihood-based component un-mixing with moment-based parameter estimation, is a robust and efficient method for beta mixture estimation. We provide an implementation of the method ("betamix") as open source software under the MIT license.
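The idea of pairing latent-variable weights with moment-based estimates can be illustrated with a minimal sketch. This is not the betamix implementation: the function names, the fixed iteration count, and in particular the clipping constant `eps` used when evaluating densities are assumptions made here for illustration. Each iteration computes component responsibilities for every observation, then re-estimates each component's parameters from its weighted mean and variance, so the log-likelihood is never evaluated at 0 or 1.

```python
import math
import numpy as np


def beta_moments_to_params(mean, var):
    """Method-of-moments estimates (alpha, beta) from a mean and a variance.

    Requires 0 < var < mean * (1 - mean).
    """
    phi = mean * (1.0 - mean) / var - 1.0
    return mean * phi, (1.0 - mean) * phi


def beta_pdf(x, a, b):
    """Beta density for x strictly inside (0, 1), computed via log-gamma."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1.0) * np.log(x) + (b - 1.0) * np.log1p(-x))


def fit_beta_mixture(x, init_params, n_iter=200, eps=1e-6):
    """Hypothetical helper: alternate responsibilities and moment estimates."""
    x = np.asarray(x, dtype=float)
    # Assumption of this sketch: densities are evaluated at slightly clipped
    # values so that observations equal to 0 or 1 receive finite weights.
    # The moment estimates below use the raw, unclipped data.
    x_eval = np.clip(x, eps, 1.0 - eps)
    params = [tuple(p) for p in init_params]
    pi = np.full(len(params), 1.0 / len(params))
    for _ in range(n_iter):
        # Un-mixing step: weighted component densities, normalized per point.
        dens = np.stack(
            [p * beta_pdf(x_eval, a, b) for p, (a, b) in zip(pi, params)]
        )
        resp = dens / dens.sum(axis=0, keepdims=True)
        pi = resp.mean(axis=1)
        # Moment step: weighted mean and variance for each component.
        new_params = []
        for w in resp:
            mean = np.average(x, weights=w)
            var = max(np.average((x - mean) ** 2, weights=w), 1e-12)
            new_params.append(beta_moments_to_params(mean, var))
        params = new_params
    return pi, params
```

A call such as `fit_beta_mixture(levels, [(1.0, 3.0), (3.0, 1.0)])` returns the mixture weights and one `(alpha, beta)` pair per component. The sketch runs a fixed number of iterations for simplicity; a practical version would stop once the parameters change by less than a chosen tolerance.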
Keywords:
Beta distribution; Classification; Differential methylation; EM algorithm; Maximum likelihood; Method of moments; Mixture model
Authors: Giulio Caravagna; Timon Heide; Marc J Williams; Luis Zapata; Daniel Nichol; Ketevan Chkhaidze; William Cross; George D Cresswell; Benjamin Werner; Ahmet Acar; Louis Chesler; Chris P Barnes; Guido Sanguinetti; Trevor A Graham; Andrea Sottoriva Journal: Nat Genet Date: 2020-09-02 Impact factor: 38.330