Some Statistical Strategies for DAE-seq Data Analysis: Variable Selection and Modeling Dependencies among Observations.

Naim U Rashid¹, Wei Sun¹, Joseph G Ibrahim¹.

Abstract

In DAE (DNA After Enrichment)-seq experiments, genomic regions related to certain biological processes are enriched or isolated by an assay and then sequenced on a high-throughput sequencing platform to determine their genomic positions. Statistical analysis of DAE-seq data aims to detect genomic regions with significant aggregations of isolated DNA fragments ("enriched regions") versus all other regions ("background"). However, many confounding factors may influence DAE-seq signals. In addition, the signals in adjacent genomic regions may exhibit strong correlations, which invalidate the independence assumption employed by many existing methods. To mitigate these issues, we develop a novel Autoregressive Hidden Markov Model (AR-HMM) that accounts for covariate effects and violations of the independence assumption. We demonstrate that our AR-HMM leads to improved performance in identifying enriched regions in both simulated and real datasets, especially in epigenetic datasets with broader regions of DAE-seq signal enrichment. We also introduce a variable selection procedure in the context of the HMM/AR-HMM, where the observations are not independent and the mean of each state-specific emission distribution is modeled as a function of covariates. We study the theoretical properties of this variable selection procedure and demonstrate its efficacy in simulated and real DAE-seq data. In summary, we develop several practical approaches for DAE-seq data analysis that are also applicable to more general problems in statistics.
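
As a rough illustration of the kind of model the abstract describes, the sketch below decodes a two-state HMM (background vs. enriched) in which each state's emission mean depends on a covariate and on the previous window's count. It is not the authors' implementation: the Poisson emission, the log-linear form of the autoregressive term, the use of a zero count before the first window, and all parameter values and names (`emission_logprob`, `viterbi`, `state_params`) are assumptions made for this example only.

```python
import math

def log_poisson(y, mu):
    """Log Poisson pmf: y*log(mu) - mu - log(y!)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def emission_logprob(y_t, y_prev, x_t, params):
    """Log emission probability of count y_t in one hidden state.

    Assumed mean model (illustrative only):
        log mu = b0 + b1 * x_t + phi * log(y_prev + 1)
    where x_t is a covariate and phi weights the autoregressive term.
    """
    b0, b1, phi = params
    mu = math.exp(b0 + b1 * x_t + phi * math.log(y_prev + 1))
    return log_poisson(y_t, mu)

def viterbi(counts, covariate, state_params, log_init, log_trans):
    """Return the most likely state path (0 = background, 1 = enriched)."""
    n_states = len(state_params)
    # Assumption: the first window has no predecessor, so y_prev = 0.
    score = [
        log_init[k]
        + emission_logprob(counts[0], 0, covariate[0], state_params[k])
        for k in range(n_states)
    ]
    back = []
    for t in range(1, len(counts)):
        new_score, pointers = [], []
        for k in range(n_states):
            # Best predecessor state for landing in state k at window t.
            best_j = max(range(n_states),
                         key=lambda j: score[j] + log_trans[j][k])
            pointers.append(best_j)
            new_score.append(
                score[best_j] + log_trans[best_j][k]
                + emission_logprob(counts[t], counts[t - 1],
                                   covariate[t], state_params[k])
            )
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Toy data: window counts with one enriched stretch, plus a made-up covariate.
counts = [1, 0, 2, 1, 15, 18, 20, 17, 1, 0, 2]
covariate = [0.1, -0.2, 0.0, 0.3, 0.2, -0.1, 0.0, 0.1, -0.3, 0.2, 0.0]
# (b0, b1, phi) per state; the background state has no autoregressive term.
state_params = [(math.log(1.0), 0.2, 0.0), (math.log(8.0), 0.2, 0.3)]
log_init = [math.log(0.9), math.log(0.1)]
log_trans = [[math.log(0.95), math.log(0.05)],
             [math.log(0.10), math.log(0.90)]]

path = viterbi(counts, covariate, state_params, log_init, log_trans)
print(path)
```

On this toy input the decoded path marks the four high-count windows as enriched and the rest as background. In practice the parameters would be estimated from data (for example by an EM-type algorithm) rather than fixed by hand as they are here.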

Keywords: Autoregressive modeling; Hidden Markov Model; High-throughput Sequencing; Mixture Regression; Variable Selection

Year: 2014 PMID: 24678134 PMCID: PMC3963211 DOI: 10.1080/01621459.2013.869222

Source DB: PubMed Journal: J Am Stat Assoc ISSN: 0162-1459 Impact factor: 5.033

4 in total

1. Statistical Methods in Integrative Genomics.

Authors: Sylvia Richardson; George C Tseng; Wei Sun
Journal: Annu Rev Stat Appl Date: 2016-04-18 Impact factor: 5.810

2. Improved detection of epigenomic marks with mixed-effects hidden Markov models.

Authors: Pedro L Baldoni; Naim U Rashid; Joseph G Ibrahim
Journal: Biometrics Date: 2019-10-17 Impact factor: 2.571

3. Bayesian continuous-time hidden Markov models with covariate selection for intensive longitudinal data with measurement error.

Authors: Mingrui Liang; Matthew D Koslovsky; Emily T Hébert; Darla E Kendzor; Michael S Businelle; Marina Vannucci
Journal: Psychol Methods Date: 2021-12-20

4. Modeling Between-Study Heterogeneity for Improved Replicability in Gene Signature Selection and Clinical Prediction.

Authors: Naim U Rashid; Quefeng Li; Jen Jen Yeh; Joseph G Ibrahim
Journal: J Am Stat Assoc Date: 2019-10-29 Impact factor: 5.033
