Petri Kontkanen, Hannes Wettig, Petri Myllymäki.
Abstract
Typical problems in bioinformatics involve large discrete datasets. Therefore, in order to apply statistical methods in such domains, it is important to develop efficient algorithms suitable for discrete data. The minimum description length (MDL) principle is a theoretically well-founded, general framework for performing statistical inference. The mathematical formalization of MDL is based on the normalized maximum likelihood (NML) distribution, which has several desirable theoretical properties. In the case of discrete data, straightforward computation of the NML distribution requires exponential time with respect to the sample size, since the definition involves a sum over all the possible data samples of a fixed size. In this paper, we first review some existing algorithms for efficient NML computation in the case of multinomial and naive Bayes model families. Then we proceed by extending these algorithms to more complex, tree-structured Bayesian networks.
Year: 2007 PMID: 18382603 PMCID: PMC3171356 DOI: 10.1155/2007/90947
Source DB: PubMed Journal: EURASIP J Bioinform Syst Biol ISSN: 1687-4145
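The abstract notes that computing the NML distribution directly requires a sum over all possible data samples of a fixed size. The simplest case is a single multinomial variable with K values. There the sum is the normalizing constant C(K, n), the total of the maximized likelihoods P(x^n | θ̂(x^n)) over all K^n sequences x^n of length n, where θ̂ is the maximum likelihood estimate. The sketch below was written for this page and is not taken from the paper. It compares that brute-force sum with a faster route: a direct O(n) sum for K = 2, followed by the recurrence C(k+2, n) = C(k+1, n) + (n/k) C(k, n). That recurrence is assumed here from the MDL literature on multinomial stochastic complexity. The function names are hypothetical, and the naive Bayes and tree-structured extensions the paper describes are not shown.

```python
import itertools
import math
from collections import Counter


def naive_multinomial_complexity(K: int, n: int) -> float:
    """Brute force: sum the maximized likelihood over all K**n sequences."""
    if n == 0:
        return 1.0  # only the empty sequence, with likelihood 1
    total = 0.0
    for seq in itertools.product(range(K), repeat=n):
        counts = Counter(seq)
        # Maximized likelihood of seq: product over observed values of (h_k / n) ** h_k
        lik = 1.0
        for h in counts.values():
            lik *= (h / n) ** h
        total += lik
    return total


def binary_complexity(n: int) -> float:
    """C(2, n) as a sum over the count h of one value (n + 1 terms)."""
    if n == 0:
        return 1.0
    # 0.0 ** 0 == 1.0 in Python, which handles the h = 0 and h = n terms.
    # Plain floats are used, so this is intended for moderate n only.
    return sum(
        math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )


def multinomial_complexity(K: int, n: int) -> float:
    """C(K, n) via the (assumed) recurrence C(k+2, n) = C(k+1, n) + (n / k) * C(k, n)."""
    if K == 1:
        return 1.0
    c_prev, c_curr = 1.0, binary_complexity(n)  # C(1, n), C(2, n)
    for k in range(1, K - 1):
        # After this step: c_prev = C(k+1, n), c_curr = C(k+2, n)
        c_prev, c_curr = c_curr, c_curr + (n / k) * c_prev
    return c_curr
```

For example, `naive_multinomial_complexity(3, 2)` and `multinomial_complexity(3, 2)` both return 4.5. The brute-force version becomes impractical quickly because it visits K**n sequences. The recurrence needs only O(n + K) arithmetic operations.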