Compound hierarchical correlated beta mixture with an application to cluster mouse transcription factor DNA binding data.

Hongying Dai1, Richard Charnigo2.   

Abstract

Modeling correlation structures is a challenge in bioinformatics, especially when dealing with high-throughput genomic data. A compound hierarchical correlated beta mixture (CBM) with an exchangeable correlation structure is proposed to cluster genetic vectors into mixture components. The correlation coefficient, [Formula: see text], is homogeneous within a mixture component and heterogeneous between mixture components. A random CBM with [Formula: see text] brings more flexibility in explaining correlation variations among genetic variables. The Expectation-Maximization (EM) and Stochastic Expectation-Maximization (SEM) algorithms are used to estimate the parameters of the CBM. The number of mixture components can be determined using model selection criteria such as AIC, BIC and ICL-BIC. Extensive simulation studies were conducted to compare EM, SEM and the model selection criteria. Simulation results suggest that the CBM outperforms the traditional beta mixture model, with lower estimation bias and higher classification accuracy. The proposed method is applied to cluster transcription factor-DNA binding probabilities in mouse genome data generated by Lähdesmäki and others (2008, Probabilistic inference of transcription factor binding from multiple data sources. PLoS One, 3, e1820). The results reveal distinct clusters of transcription factors when binding to promoter regions of genes in the JAK-STAT, MAPK and two other pathways.
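
As an illustration of the model selection step mentioned in the abstract, the sketch below computes AIC, BIC and ICL-BIC from a fitted mixture's log-likelihood and posterior membership probabilities. It uses the commonly cited textbook forms of these criteria (lower is better); the abstract does not state the exact definitions or the CBM parameter count used by the authors, so the function names, the `n_params` argument and the entropy-penalized form of ICL-BIC are assumptions made here for illustration.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def classification_entropy(posteriors):
    """Entropy of the posterior membership probabilities.

    posteriors[i][j] is the estimated probability that observation i
    belongs to mixture component j. Zero entries contribute nothing.
    """
    return -sum(p * math.log(p) for row in posteriors for p in row if p > 0.0)


def icl_bic(loglik, n_params, posteriors):
    """ICL-BIC (assumed form): BIC plus twice the classification entropy."""
    n_obs = len(posteriors)
    return bic(loglik, n_params, n_obs) + 2.0 * classification_entropy(posteriors)
```

In use, one would fit the mixture for each candidate number of components, evaluate a criterion for each fit, and keep the fit with the smallest value. Because the entropy term is zero when every observation is assigned to one component with certainty, ICL-BIC coincides with BIC for perfectly separated clusters and penalizes overlapping ones.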
© The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Keywords:  Cluster; Compound hierarchical correlated beta mixture; EM and SEM algorithm; Exchangeable correlation structure

Year:  2015        PMID: 25964663      PMCID: PMC4701176          DOI: 10.1093/biostatistics/kxv016

Source DB:  PubMed          Journal:  Biostatistics        ISSN: 1465-4644            Impact factor:   5.899


  15 in total

1.  TRANSFAC: an integrated system for gene expression regulation.

Authors:  E Wingender; X Chen; R Hehl; H Karas; I Liebich; V Matys; T Meinhardt; M Prüss; I Reuter; F Schacherer
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Transcription regulation and animal diversity. (Review)

Authors:  Michael Levine; Robert Tjian
Journal:  Nature       Date:  2003-07-10       Impact factor: 49.962

3.  Applications of beta-mixture models in bioinformatics.

Authors:  Yuan Ji; Chunlei Wu; Ping Liu; Jing Wang; Kevin R Coombes
Journal:  Bioinformatics       Date:  2005-02-15       Impact factor: 6.937

4.  ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation.

Authors:  S B Montgomery; O L Griffith; M C Sleumer; C M Bergman; M Bilenky; E D Pleasance; Y Prychyna; X Zhang; S J M Jones
Journal:  Bioinformatics       Date:  2006-01-05       Impact factor: 6.937

5.  A beta-mixture model for dimensionality reduction, sample classification and analysis.

Authors:  Kirsti Laurila; Bodil Oster; Claus L Andersen; Philippe Lamy; Torben Orntoft; Olli Yli-Harja; Carsten Wiuf
Journal:  BMC Bioinformatics       Date:  2011-05-27       Impact factor: 3.169

6.  A clustering property of highly-degenerate transcription factor binding sites in the mammalian genome.

Authors:  Chaolin Zhang; Zhenyu Xuan; Stefanie Otto; John R Hover; Sean R McCorkle; Gail Mandel; Michael Q Zhang
Journal:  Nucleic Acids Res       Date:  2006-05-02       Impact factor: 16.971

7.  ABS: a database of Annotated regulatory Binding Sites from orthologous promoters.

Authors:  Enrique Blanco; Domènec Farré; M Mar Albà; Xavier Messeguer; Roderic Guigó
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

8.  A joint finite mixture model for clustering genes from independent Gaussian and beta distributed data.

Authors:  Xiaofeng Dai; Timo Erkkilä; Olli Yli-Harja; Harri Lähdesmäki
Journal:  BMC Bioinformatics       Date:  2009-05-29       Impact factor: 3.169

9.  A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data.

Authors:  Andrew E Teschendorff; Francesco Marabita; Matthias Lechner; Thomas Bartlett; Jesper Tegner; David Gomez-Cabrero; Stephan Beck
Journal:  Bioinformatics       Date:  2012-11-21       Impact factor: 6.937

10.  Probabilistic inference of transcription factor binding from multiple data sources.

Authors:  Harri Lähdesmäki; Alistair G Rust; Ilya Shmulevich
Journal:  PLoS One       Date:  2008-03-26       Impact factor: 3.240
