
A supervised hidden Markov model framework for efficiently segmenting tiling array data in transcriptional and ChIP-chip experiments: systematically incorporating validated biological knowledge.

Jiang Du, Joel S Rozowsky, Jan O Korbel, Zhengdong D Zhang, Thomas E Royce, Martin H Schultz, Michael Snyder, Mark Gerstein.

Abstract

MOTIVATION: Large-scale tiling array experiments are becoming increasingly common in genomics. In particular, the ENCODE project requires the consistent segmentation of many different tiling array datasets into 'active regions' (e.g. finding transfrags from transcriptional data and putative binding sites from ChIP-chip experiments). Previously, such segmentation was done in an unsupervised fashion, based mainly on characteristics of the signal distribution in the tiling array data itself. Here we propose a supervised framework for this task, which has the advantage of explicitly incorporating validated biological knowledge into the model and allowing formal training and testing.
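In a supervised setting, probes inside experimentally validated regions carry known labels, so model parameters can be estimated directly by counting rather than inferred from the signal alone. The following is a minimal sketch, assuming a two-state model (0 = inactive, 1 = active) with Gaussian probe intensities; the function name, the 0/1 encoding, the Gaussian emissions and the pseudocount are illustrative assumptions, not the procedure described in the paper.

```python
import math

STATES = (0, 1)  # hypothetical encoding: 0 = inactive probe, 1 = active region

def train_supervised_hmm(signals, labels, pseudocount=1.0):
    """Estimate a two-state Gaussian HMM from gold-standard probe labels.

    signals: probe intensities ordered by genomic position
    labels:  0/1 labels from validated regions, one per probe
    Returns (trans, emit): trans[(a, b)] = P(next state b | state a),
    emit[s] = (mean, sd) of the Gaussian emission for state s.
    """
    if not signals or len(signals) != len(labels):
        raise ValueError("signals and labels must be non-empty and of equal length")
    # Count transitions between neighboring probes; the pseudocount keeps
    # transitions never seen in training from getting probability 0.
    counts = {(a, b): pseudocount for a in STATES for b in STATES}
    for prev, cur in zip(labels, labels[1:]):
        counts[(prev, cur)] += 1
    trans = {}
    for a in STATES:
        row_total = sum(counts[(a, b)] for b in STATES)
        for b in STATES:
            trans[(a, b)] = counts[(a, b)] / row_total
    # Maximum-likelihood Gaussian parameters for the probes in each state.
    emit = {}
    for s in STATES:
        xs = [x for x, y in zip(signals, labels) if y == s]
        if not xs:
            raise ValueError(f"no training probes labeled {s}")
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        emit[s] = (mean, math.sqrt(max(var, 1e-6)))  # floor avoids a zero sd
    return trans, emit
```

Held-out validated regions can then serve as the test set, which is what makes formal training and testing possible in this framework.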
METHODOLOGY: In particular, we use a hidden Markov model (HMM) framework, which explicitly models the dependency between neighboring probes; its extension, the generalized HMM, additionally allows an explicit description of the state duration density. We introduce a formal definition of the tiling array analysis problem and use it to describe how small genomic regions can be sampled for experimental validation to build up a gold-standard set for training and testing. We then describe various ideal and practical sampling strategies (e.g. maximizing signal entropy within a selected region versus using gene annotations or known promoters as positives for transcriptional or ChIP-chip data, respectively).
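Segmentation with an HMM amounts to finding the most likely hidden state for every probe given the whole signal, which is what lets neighboring probes influence each other. The sketch below uses standard Viterbi decoding for a plain two-state HMM with Gaussian emissions; it does not model state durations the way a generalized HMM does. The function names, the uniform start distribution and the example parameters are hypothetical and chosen only for illustration.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi_segment(signals, trans, emit, start=None):
    """Most likely hidden-state path for a probe series under a Gaussian HMM.

    trans[(a, b)]: transition probability between neighboring probes (all > 0)
    emit[s]: (mean, sd) emission parameters of state s
    """
    if not signals:
        return []
    states = sorted(emit)
    if start is None:
        start = {s: 1.0 / len(states) for s in states}  # assumed uniform start
    # score[s]: best log-probability of any path ending in state s at this probe
    score = {s: math.log(start[s]) + gaussian_logpdf(signals[0], *emit[s]) for s in states}
    backpointers = []
    for x in signals[1:]:
        new_score, pointer = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + math.log(trans[(p, s)]))
            new_score[s] = score[prev] + math.log(trans[(prev, s)]) + gaussian_logpdf(x, *emit[s])
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]

def active_segments(path, active=1):
    """Collapse a state path into inclusive (start, end) probe-index runs of `active`."""
    segments, begin = [], None
    for i, s in enumerate(path):
        if s == active and begin is None:
            begin = i
        elif s != active and begin is not None:
            segments.append((begin, i - 1))
            begin = None
    if begin is not None:
        segments.append((begin, len(path) - 1))
    return segments

# Hypothetical parameters: inactive probes ~ N(0, 1), active probes ~ N(3, 1).
trans = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.9}
emit = {0: (0.0, 1.0), 1: (3.0, 1.0)}
path = viterbi_segment([0.1, -0.2, 3.1, 2.8, 3.3, 0.2, -0.1], trans, emit)
print(path)                   # [0, 0, 1, 1, 1, 0, 0]
print(active_segments(path))  # [(2, 4)]
```

The sticky transition probabilities (0.9 to stay) penalize switching, so an isolated high probe is less likely to be called active on its own than a run of high probes.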
RESULTS: For the practical sampling and training strategies, we show how the size of and noise in the validated training data affect the performance of an HMM applied to the ENCODE transcriptional and ChIP-chip experiments. In particular, we show that the HMM framework processes tiling array data efficiently and performs as well as or better than previous approaches. For the idealized sampling strategies, we show how their performance can be assessed in a simulation framework, and that a maximum entropy approach, which samples sub-regions with very different signal intensities, yields the best-performing gold standard. This latter result has strong implications for how medium-scale validation experiments should be carried out to verify the results of genome-scale tiling array experiments.
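The maximum entropy idea can be illustrated by scoring candidate sub-regions by how varied their signal intensities are and picking the most varied one for validation. This is a minimal sketch, assuming equal-width binning of intensities, fixed-width candidate windows and Shannon entropy of bin occupancy; the abstract does not specify these details, and the function names and parameters are hypothetical.

```python
import math
from collections import Counter

def bin_signal(signals, n_bins):
    """Discretize intensities into n_bins equal-width bins over the observed range."""
    lo, hi = min(signals), max(signals)
    width = (hi - lo) / n_bins or 1.0  # all-equal signal: put everything in bin 0
    return [min(int((x - lo) / width), n_bins - 1) for x in signals]

def shannon_entropy(bins):
    """Shannon entropy (bits) of the empirical distribution of bin labels."""
    n = len(bins)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())

def max_entropy_window(signals, window, n_bins=4):
    """Return (start, entropy) of the fixed-width window whose binned signal is most diverse."""
    if not 0 < window <= len(signals):
        raise ValueError("window must be between 1 and len(signals)")
    bins = bin_signal(signals, n_bins)
    best_start, best_h = 0, -1.0
    for start in range(len(signals) - window + 1):
        h = shannon_entropy(bins[start:start + window])
        if h > best_h:  # keep the leftmost window on ties
            best_start, best_h = start, h
    return best_start, best_h

print(max_entropy_window([0, 0, 0, 0, 1, 2, 3, 0, 0], window=4))  # (3, 2.0)
```

A window spanning low, intermediate and high intensities gives the training data examples across the whole signal range, rather than only clear positives and clear negatives.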

Year:  2006        PMID: 17038339     DOI: 10.1093/bioinformatics/btl515

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  16 in total

1.  A hidden Markov support vector machine framework incorporating profile geometry learning for identifying microbial RNA in tiling array data.

Authors:  Wen-Han Yu; Hedda Høvik; Tsute Chen
Journal:  Bioinformatics       Date:  2010-04-15       Impact factor: 6.937

2.  Annotating non-coding regions of the genome. (Review)

Authors:  Roger P Alexander; Gang Fang; Joel Rozowsky; Michael Snyder; Mark B Gerstein
Journal:  Nat Rev Genet       Date:  2010-07-13       Impact factor: 53.242

3.  Hierarchical hidden Markov model with application to joint analysis of ChIP-chip and ChIP-seq data.

Authors:  Hyungwon Choi; Alexey I Nesvizhskii; Debashis Ghosh; Zhaohui S Qin
Journal:  Bioinformatics       Date:  2009-05-14       Impact factor: 6.937

4.  Transcriptional landscape estimation from tiling array data using a model of signal shift and drift.

Authors:  Pierre Nicolas; Aurélie Leduc; Stéphane Robin; Simon Rasmussen; Hanne Jarmer; Philippe Bessières
Journal:  Bioinformatics       Date:  2009-06-26       Impact factor: 6.937

5.  A varying threshold method for ChIP peak-calling using multiple sources of information.

Authors:  Kuan-Bei Chen; Yu Zhang
Journal:  Bioinformatics       Date:  2010-09-15       Impact factor: 6.937

6.  Location, location, (ChIP-)location! Mapping chromatin landscapes one immunoprecipitation at a time.

Authors:  Benjamin P Berman; Baruch Frenkel; Gerhard A Coetzee
Journal:  J Cell Biochem       Date:  2009-05-01       Impact factor: 4.429

7.  Effect of false positive and false negative rates on inference of binding target conservation across different conditions and species from ChIP-chip data.

Authors:  Debayan Datta; Hongyu Zhao
Journal:  BMC Bioinformatics       Date:  2009-01-19       Impact factor: 3.169

8.  Custom design and analysis of high-density oligonucleotide bacterial tiling microarrays.

Authors:  Gard O S Thomassen; Alexander D Rowe; Karin Lagesen; Jessica M Lindvall; Torbjørn Rognes
Journal:  PLoS One       Date:  2009-06-17       Impact factor: 3.240

9.  BayesPeak: Bayesian analysis of ChIP-seq data.

Authors:  Christiana Spyrou; Rory Stark; Andy G Lynch; Simon Tavaré
Journal:  BMC Bioinformatics       Date:  2009-09-21       Impact factor: 3.169

10.  Wavelet-based detection of transcriptional activity on a novel Staphylococcus aureus tiling microarray.

Authors:  Víctor Segura; Alejandro Toledo-Arana; Maite Uzqueda; Iñigo Lasa; Arrate Muñoz-Barrutia
Journal:  BMC Bioinformatics       Date:  2012-09-05       Impact factor: 3.169

