
A Bayesian method for finding regulatory segments in DNA.

E M Crowley.

Abstract

A goal of the Human Genome Project is to determine the entire sequence of DNA (3 × 10^9 base pairs) found in chromosomes. The massive amounts of data produced by this project require interpretation. A Bayesian model is developed for locating regulatory regions in a DNA sequence. Regulatory regions are areas of DNA to which specific proteins bind and control whether or not a gene is transcribed to produce templates for protein synthesis. Each human cell contains the same DNA sequence. Thus the particular function of different cells is determined by the genes that are transcribed in that cell. A hidden Markov chain is used to model whether a small interval of the DNA is in a regulatory region or not. This can be regarded as a changepoint problem where the changepoints are the starts of regulatory or nonregulatory regions. The data consist of protein-binding elements, which are short subsequences, or "words," in the DNA sequence. Although these words can occur anywhere in the sequence, a larger number are expected in regulatory regions. Therefore, regulatory regions are detected by locating clusters of words. For a particular DNA sequence, the model automatically selects those words that best predict regions of interest. Markov chain Monte Carlo methods are used to explore the posterior distribution of the hidden Markov chain. The model is tested by means of simulations, and applied to several DNA sequences. Copyright 2001 John Wiley & Sons, Inc.
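
The idea of detecting regulatory regions as clusters of words under a two-state hidden Markov chain can be illustrated with a small sketch. This is not the paper's model: it assumes fixed-size windows, Poisson-distributed word counts with hand-picked rates, and a fixed transition probability, and it computes the posterior with the forward-backward algorithm rather than Markov chain Monte Carlo. It also omits the automatic word selection. All function names and parameter values (`count_words`, `posterior_regulatory`, `rate_bg`, `rate_reg`, `p_stay`) are hypothetical.

```python
import math

def count_words(seq, words, window):
    """Count word start positions that fall in each fixed-size window."""
    n_windows = (len(seq) + window - 1) // window
    counts = [0] * n_windows
    for w in words:
        start = seq.find(w)
        while start != -1:
            counts[start // window] += 1
            start = seq.find(w, start + 1)  # allow overlapping matches
    return counts

def poisson_pmf(k, rate):
    """Poisson probability of observing k events at the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def posterior_regulatory(counts, rate_bg=0.2, rate_reg=2.0, p_stay=0.9):
    """Forward-backward posterior P(window is regulatory | all counts).

    State 0 is nonregulatory, state 1 is regulatory. The rates and
    p_stay are illustrative assumptions, not estimates from the paper.
    """
    rates = (rate_bg, rate_reg)
    trans = ((p_stay, 1 - p_stay), (1 - p_stay, p_stay))
    n = len(counts)

    # Forward pass, normalised at each step to avoid underflow.
    fwd = []
    for t, k in enumerate(counts):
        if t == 0:
            prior = (0.5, 0.5)
        else:
            prior = [sum(fwd[-1][r] * trans[r][s] for r in (0, 1))
                     for s in (0, 1)]
        a = [prior[s] * poisson_pmf(k, rates[s]) for s in (0, 1)]
        z = sum(a)
        fwd.append([x / z for x in a])

    # Backward pass, also normalised at each step.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * poisson_pmf(counts[t + 1], rates[r])
                 * bwd[t + 1][r] for r in (0, 1))
             for s in (0, 1)]
        z = sum(b)
        bwd[t] = [x / z for x in b]

    # Combine both passes into the per-window posterior for state 1.
    post = []
    for f, b in zip(fwd, bwd):
        g0, g1 = f[0] * b[0], f[1] * b[1]
        post.append(g1 / (g0 + g1))
    return post

if __name__ == "__main__":
    # Toy sequence: word-free flanks around a stretch dense in two words.
    seq = "A" * 30 + "TATAGGCCAA" * 3 + "A" * 30
    counts = count_words(seq, ["TATA", "GGCC"], window=10)
    for i, p in enumerate(posterior_regulatory(counts)):
        print(i, counts[i], round(p, 3))
```

In this toy run the windows containing the word cluster receive a high posterior probability of being regulatory, and the changepoints correspond to the windows where that probability crosses 0.5. A fuller treatment along the lines of the abstract would place priors on the rates and transition probabilities and sample the hidden chain instead of fixing these values.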

Year:  2001        PMID: 11093115     DOI: 10.1002/1097-0282(200102)58:2<165::AID-BIP50>3.0.CO;2-O

Source DB:  PubMed          Journal:  Biopolymers        ISSN: 0006-3525            Impact factor:   2.505



1.  Evaluation of thresholds for the detection of binding sites for regulatory proteins in Escherichia coli K12 DNA.

Authors:  Esperanza Benítez-Bellón; Gabriel Moreno-Hagelsieb; Julio Collado-Vides
Journal:  Genome Biol       Date:  2002-02-21       Impact factor: 13.583

