Regulatory motif finding by logic regression.

Sündüz Keles, Mark J van der Laan, Chris Vulpe.

Abstract

MOTIVATION: Multiple transcription factors coordinately control transcriptional regulation of genes in eukaryotes. Although many computational methods address the identification of individual transcription factor binding sites (TFBSs), very few focus on the interactions between these sites. We consider finding TFBSs and their context-specific interactions using microarray gene expression data. We devise a hybrid approach, LogicMotif, that combines a TFBS identification method with logic regression, a recently developed regression methodology. LogicMotif has two steps. In the first step, potential binding sites are identified from the transcription control regions of the genes of interest. Various available methods can be used in this step when the genes of interest can be divided into groups, such as up- and down-regulated genes. For this step, we also develop a simple univariate regression and extension method, MFURE, which extracts candidate TFBSs from a large number of genes when microarray gene expression data are available. MFURE provides an alternative for this step when partitioning the genes into disjoint groups is not preferred. The first step thus aims to identify individual sites within gene groups of interest, or sites that are correlated with the gene expression outcome. In the second step, logic regression is used to build a predictive model of the outcome of interest (either gene expression or up- and down-regulation) from these potential sites. This two-step approach creates a rich, diverse set of potential binding sites in the first step and builds regression or classification models in the second step using logic regression, which is particularly good at identifying complex interactions.
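The two steps described above can be illustrated with a toy sketch. This is not the authors' MFURE or the logic regression package: step 1 is reduced to ranking a hypothetical, user-supplied list of candidate motifs by the R² of a simple linear regression of expression on motif presence, and step 2 replaces the logic regression search with an exhaustive scan over pairwise Boolean combinations. The sequences, expression values, motif strings, and all function names are invented for illustration.

```python
from itertools import combinations

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant covariate or outcome carries no information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def presence(seqs, motif):
    """Binary covariate: 1 if the motif occurs in a gene's control region, else 0."""
    return [1 if motif in s else 0 for s in seqs]

def screen_motifs(seqs, y, candidates, top):
    """Step 1 (toy): keep the `top` candidates with the highest univariate R^2."""
    ranked = sorted(candidates,
                    key=lambda m: r_squared(presence(seqs, m), y),
                    reverse=True)
    return ranked[:top]

# Boolean combinations considered in step 2 (an assumption of this sketch).
OPS = {
    "{a} AND {b}": lambda a, b: a & b,
    "{a} OR {b}": lambda a, b: a | b,
    "{a} AND NOT {b}": lambda a, b: a & (1 - b),
    "NOT {a} AND {b}": lambda a, b: (1 - a) & b,
}

def best_boolean_pair(seqs, y, motifs):
    """Step 2 (toy): exhaustive search for the best pairwise Boolean predictor."""
    best = (-1.0, None)
    for m1, m2 in combinations(motifs, 2):
        x1, x2 = presence(seqs, m1), presence(seqs, m2)
        for template, op in OPS.items():
            feature = [op(a, b) for a, b in zip(x1, x2)]
            score = r_squared(feature, y)
            if score > best[0]:
                best = (score, template.format(a=m1, b=m2))
    return best

# Invented data: expression is elevated only when both ACG and TTT are present.
seqs = ["ACGTTT", "ACGGAC", "CATTTG", "CAGGAC", "GATCCA", "TGCAGA"]
expr = [2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
candidates = ["ACG", "TTT", "GGA", "CCA"]

selected = screen_motifs(seqs, expr, candidates, top=2)
score, rule = best_boolean_pair(seqs, expr, selected)
print(selected, rule, round(score, 3))
```

In this toy data neither motif alone explains the outcome well, whereas their conjunction does, which is the kind of interaction the second step is meant to recover. An exhaustive scan is feasible only for a handful of candidates; it stands in here for a proper search over logic trees.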
RESULTS: LogicMotif is applied to two publicly available datasets. A genome-wide gene expression dataset of Saccharomyces cerevisiae is used for validation. The regression models obtained are interpretable, and their biological implications agree with known results. This analysis suggests that LogicMotif provides biologically more reasonable regression models than previous analyses of this dataset with standard linear regression methods. Another S. cerevisiae dataset illustrates the use of LogicMotif in classification questions by building a model that discriminates between up- and down-regulated genes in iron copper deficiency. LogicMotif identifies one inductive motif and two repressor motifs in this dataset. The inductive motif matches the binding site of the transcription factor Aft1p, which has a key role in regulation of the uptake process. One of the novel repressor sites occurs frequently in the transcription control regions of FeS genes. This site could represent a TFBS for an unknown transcription factor involved in repression of genes encoding FeS proteins in iron deficiency. We establish the robustness of the method to the type of outcome variable by considering both continuous and binary outcome variables for this dataset. Our results indicate that logic regression, used in combination with cluster/group-based binding site identification methods or with our proposed method MFURE, is a powerful and flexible alternative to linear regression based motif finding methods.
AVAILABILITY: Source code for logic regression is freely available as a package for the R programming language by Ruczinski et al. (2003) and can be downloaded at http://bear.fhcrc.org/~ingor/logic/download/download.html. An R package for MFURE is available at http://www.stat.berkeley.edu/~sunduz/software.html

Year:  2004        PMID: 15166027     DOI: 10.1093/bioinformatics/bth333

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  18 in total

1.  Testing SNPs and sets of SNPs for importance in association studies.

Authors:  Holger Schwender; Ingo Ruczinski; Katja Ickstadt
Journal:  Biostatistics       Date:  2010-07-02       Impact factor: 5.899

2.  Logic Forest: an ensemble classifier for discovering logical combinations of binary markers.

Authors:  Bethany J Wolf; Elizabeth G Hill; Elizabeth H Slate
Journal:  Bioinformatics       Date:  2010-07-13       Impact factor: 6.937

3.  Bayesian variable selection for gene expression modeling with regulatory motif binding sites in neuroinflammatory events.

Authors:  Kuang-Yu Liu; Xiaobo Zhou; Kinhong Kan; Stephen T C Wong
Journal:  Neuroinformatics       Date:  2006

4.  Quantitative analysis of binding motifs mediating diverse spatial readouts of the Dorsal gradient in the Drosophila embryo.

Authors:  Dmitri Papatsenko; Michael Levine
Journal:  Proc Natl Acad Sci U S A       Date:  2005-03-28       Impact factor: 11.205

5.  Repeated measures semiparametric regression using targeted maximum likelihood methodology with application to transcription factor activity discovery.

Authors:  Catherine Tuglus; Mark J van der Laan
Journal:  Stat Appl Genet Mol Biol       Date:  2011-01-06

6.  Importance measures for epistatic interactions in case-parent trios.

Authors:  Holger Schwender; Katherine Bowers; M Daniele Fallin; Ingo Ruczinski
Journal:  Ann Hum Genet       Date:  2010-11-30       Impact factor: 1.670

7.  Machine learning for regulatory analysis and transcription factor target prediction in yeast.

Authors:  Dustin T Holloway; Mark Kon; Charles Delisi
Journal:  Syst Synth Biol       Date:  2007-03

8.  Predicting tissue specific cis-regulatory modules in the human genome using pairs of co-occurring motifs.

Authors:  Hani Z Girgis; Ivan Ovcharenko
Journal:  BMC Bioinformatics       Date:  2012-02-07       Impact factor: 3.169

9.  c-REDUCE: incorporating sequence conservation to detect motifs that correlate with expression.

Authors:  Katerina Kechris; Hao Li
Journal:  BMC Bioinformatics       Date:  2008-11-28       Impact factor: 3.169

10.  Measuring spatial preferences at fine-scale resolution identifies known and novel cis-regulatory element candidates and functional motif-pair relationships.

Authors:  Ken Daigoro Yokoyama; Uwe Ohler; Gregory A Wray
Journal:  Nucleic Acids Res       Date:  2009-05-29       Impact factor: 16.971
