
Machine learning for regulatory analysis and transcription factor target prediction in yeast.

Dustin T Holloway, Mark Kon, Charles DeLisi.

Abstract

High-throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps: the identity and location of regulatory binding sites within genomes. Still, the identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification, position-specific scoring matrices (PSSMs), for a set of 104 Saccharomyces cerevisiae regulators indicates that our SVM-based target classification is more sensitive (73% vs. 20%) when specificity and positive predictive value are the same. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier over-fitting. For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors, SVM predictions match well with the known biology of control mechanisms, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W, and show that targets of this TF have potentially unique physical properties which distinguish them from other genes. The SVM output automatically provides the means to rank dataset features and so identify important biological elements. We use this property to rank classifying k-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active.
Targets of Fhl1 are differentially expressed under the chosen conditions relative to average genes and negative-set genes. SVM-based classifiers provide a robust framework for analysis of regulatory networks. Processing of classifier outputs can provide high-quality predictions and biological insight into the functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made for only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors that have available binding data.
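
The k-mer ranking idea mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy promoter sequences, the made-up shared site TGACTCA, the k-mer length of 4, the hyperparameters, and the small subgradient-descent trainer (a stand-in for a real SVM solver). It shows only the general technique: represent each promoter as a vector of k-mer counts, fit a linear SVM, and rank k-mers by their learned weights.

```python
from collections import Counter

K = 4  # k-mer length: an illustrative choice, not the paper's setting


def kmer_counts(seq, k=K):
    """Count every overlapping k-mer in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def featurize(seqs, vocab):
    """Turn each sequence into a k-mer count vector ordered by vocab."""
    rows = []
    for s in seqs:
        counts = kmer_counts(s)
        rows.append([counts[kmer] for kmer in vocab])
    return rows


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Invented toy promoters: positives share the made-up site TGACTCA.
positives = ["AAATGACTCAAA", "CCTGACTCACCG", "GGGATGACTCAT"]
negatives = ["AAACCCGGGTTT", "ACGTACGTACGT", "TTTTAAAACCCC"]
seqs = positives + negatives
labels = [1] * len(positives) + [-1] * len(negatives)

# Vocabulary: every k-mer observed in any toy sequence, in sorted order.
vocab = sorted(set().union(*(kmer_counts(s) for s in seqs)))
X = featurize(seqs, vocab)
w, b = train_linear_svm(X, labels)

# Rank k-mers from most positive (target-like) to most negative weight.
ranked = sorted(zip(vocab, w), key=lambda kw: kw[1], reverse=True)
top_kmers = [kmer for kmer, _ in ranked[:4]]
print(top_kmers)
```

On this toy data the four highest-weighted k-mers are the ones that tile the shared site (TGAC, GACT, ACTC, CTCA), which is the sense in which weight ranking can point back to a binding site. A real analysis would use a tested SVM implementation, held-out evaluation, and far more data.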


Year:  2007        PMID: 19003435      PMCID: PMC2533145          DOI: 10.1007/s11693-006-9003-3

Source DB:  PubMed          Journal:  Syst Synth Biol        ISSN: 1872-5325


