
An efficient algorithm for accurate computation of the Dirichlet-multinomial log-likelihood function.

Peng Yu, Chad A Shaw.

Abstract

The Dirichlet-multinomial (DMN) distribution is a fundamental model for multicategory count data with overdispersion. This distribution has many uses in bioinformatics, including applications to metagenomics data, transcriptomics and alternative splicing. The DMN distribution reduces to the multinomial distribution when the overdispersion parameter ψ is 0. Unfortunately, numerical computation of the DMN log-likelihood function by conventional methods results in instability in the neighborhood of ψ = 0. An alternative formulation circumvents this instability, but it leads to long runtimes that make it impractical for the large count data common in bioinformatics. We have developed a new method for computation of the DMN log-likelihood that solves the instability problem without incurring long runtimes. The new approach is composed of a novel formula and an algorithm to extend its applicability. Our numerical experiments show that this new method improves both the accuracy of log-likelihood evaluation and the runtime by several orders of magnitude, especially in the high-count situations that are common in deep sequencing data. Using real metagenomic data, our method achieves a manyfold runtime improvement. Our method increases the feasibility of using the DMN distribution to model many high-throughput problems in bioinformatics. We have included in our work an R package giving access to this method and a vignette applying this approach to metagenomic data.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
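The trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' new method or their R package. It only contrasts a conventional log-gamma evaluation with a direct sum-of-logs evaluation, assuming the common parametrization α_k = p_k/ψ (an assumption; the abstract does not state the parametrization) and omitting the multinomial coefficient, which does not depend on ψ. The function names are hypothetical.

```python
import math


def dmn_loglik_lgamma(x, p, psi):
    """Conventional form via log-gamma (assumes alpha_k = p_k / psi).

    Undefined at psi == 0, and for very small psi it subtracts huge,
    nearly equal log-gamma values, which loses precision.
    """
    n = sum(x)
    a = 1.0 / psi  # sum of the Dirichlet parameters under the assumption
    ll = math.lgamma(a) - math.lgamma(n + a)
    for xk, pk in zip(x, p):
        ll += math.lgamma(xk + pk * a) - math.lgamma(pk * a)
    return ll


def dmn_loglik_sum(x, p, psi):
    """Sum-of-logs form: well behaved at psi == 0, but the number of
    log evaluations grows linearly with the total count."""
    n = sum(x)
    ll = 0.0
    for xk, pk in zip(x, p):
        for r in range(xk):
            ll += math.log(pk + r * psi)
    for r in range(n):
        ll -= math.log1p(r * psi)
    return ll


def multinomial_loglik(x, p):
    """Multinomial log-likelihood without the multinomial coefficient."""
    return sum(xk * math.log(pk) for xk, pk in zip(x, p) if xk > 0)


if __name__ == "__main__":
    x = [3, 5, 2]
    p = [0.2, 0.5, 0.3]
    # Moderate psi: both forms should agree closely.
    print(dmn_loglik_lgamma(x, p, 0.1), dmn_loglik_sum(x, p, 0.1))
    # psi = 0: the sum form reduces to the multinomial log-likelihood.
    print(dmn_loglik_sum(x, p, 0.0), multinomial_loglik(x, p))
    # Tiny psi: the log-gamma form may drift away from the sum form.
    print(dmn_loglik_lgamma(x, p, 1e-12), dmn_loglik_sum(x, p, 1e-12))
```

The sum-of-logs form needs on the order of 2N logarithm evaluations for total count N, which is the kind of runtime cost the abstract attributes to the stable alternative when counts are large.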

Year:  2014        PMID: 24519380      PMCID: PMC4081639          DOI: 10.1093/bioinformatics/btu079

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


