Some Statistical Strategies for DAE-seq Data Analysis: Variable Selection and Modeling Dependencies among Observations.

Naim U Rashid, Wei Sun, Joseph G Ibrahim.

Abstract

In DAE (DNA After Enrichment)-seq experiments, genomic regions associated with certain biological processes are enriched or isolated by an assay and then sequenced on a high-throughput sequencing platform to determine their genomic positions. Statistical analysis of DAE-seq data aims to distinguish genomic regions with significant aggregations of isolated DNA fragments ("enriched regions") from all other regions ("background"). However, many confounding factors may influence DAE-seq signals. In addition, the signals in adjacent genomic regions may exhibit strong correlations, which invalidate the independence assumption employed by many existing methods. To mitigate these issues, we develop a novel Autoregressive Hidden Markov Model (AR-HMM) that accounts for covariate effects and violations of the independence assumption. We demonstrate that our AR-HMM leads to improved performance in identifying enriched regions in both simulated and real datasets, especially in epigenetic datasets with broader regions of DAE-seq signal enrichment. We also introduce a variable selection procedure in the context of the HMM/AR-HMM, where the observations are not independent and the mean of each state-specific emission distribution is modeled as a function of covariates. We study the theoretical properties of this variable selection procedure and demonstrate its efficacy in simulated and real DAE-seq data. In summary, we develop several practical approaches for DAE-seq data analysis that are also applicable to more general problems in statistics.
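As a rough illustration of the model class described in the abstract, the sketch below computes the log-likelihood of a hidden Markov model whose state-specific emission mean depends on a covariate and on the previous observation. It is not the authors' implementation. The Poisson emission, the log link, the autoregressive term log(y[t-1] + 1), and all names (ar_hmm_loglik, beta, phi) are assumptions chosen for brevity; the abstract does not specify these details.

```python
import math

def poisson_logpmf(y, log_mu):
    # Log-probability of count y under a Poisson with mean exp(log_mu).
    return y * log_mu - math.exp(log_mu) - math.lgamma(y + 1)

def logsumexp(values):
    # Numerically stable log(sum(exp(v))) for a list of floats.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def ar_hmm_loglik(counts, covariate, init, trans, beta, phi):
    """Forward-algorithm log-likelihood of a K-state Poisson AR-HMM (assumed form).

    Assumed emission mean for state k at window t:
        log mu[t][k] = beta[k][0] + beta[k][1] * covariate[t]
                       + phi[k] * log(counts[t-1] + 1)   (AR term, dropped at t = 0)
    init[k] and trans[j][k] must be strictly positive probabilities.
    """
    n_states = len(init)

    def log_mu(t, k):
        ar_term = math.log(counts[t - 1] + 1) if t > 0 else 0.0
        return beta[k][0] + beta[k][1] * covariate[t] + phi[k] * ar_term

    # Initialisation: log P(state k at t = 0, first observation).
    log_alpha = [
        math.log(init[k]) + poisson_logpmf(counts[0], log_mu(0, k))
        for k in range(n_states)
    ]
    # Recursion: marginalise over the previous state, then apply the emission.
    for t in range(1, len(counts)):
        log_alpha = [
            logsumexp([log_alpha[j] + math.log(trans[j][k]) for j in range(n_states)])
            + poisson_logpmf(counts[t], log_mu(t, k))
            for k in range(n_states)
        ]
    # Termination: sum over the final state.
    return logsumexp(log_alpha)

# Hypothetical toy input: read counts in four adjacent windows and one covariate.
counts = [3, 0, 5, 2]
covariate = [0.1, 0.2, 0.0, 0.3]
loglik = ar_hmm_loglik(
    counts, covariate,
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.2, 0.8]],
    beta=[[0.2, 0.5], [1.5, 0.5]],   # background state, enriched state
    phi=[0.1, 0.3],
)
```

Setting every phi[k] to zero removes the dependence on the previous count and reduces the sketch to an ordinary HMM with covariate-dependent emission means, which is the comparison the abstract draws between the HMM and the AR-HMM.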

Keywords:  Autoregressive modeling; Hidden Markov Model; High-throughput Sequencing; Mixture Regression; Variable Selection

Year: 2014 | PMID: 24678134 | PMCID: PMC3963211 | DOI: 10.1080/01621459.2013.869222

Source DB: PubMed | Journal: J Am Stat Assoc | ISSN: 0162-1459 | Impact factor: 5.033


