Ryan G Christensen, Ankit Gupta, Zheng Zuo, Lawrence A Schriefer, Scot A Wolfe, Gary D Stormo.
Abstract
We examine the use of high-throughput sequencing on binding sites recovered using a bacterial one-hybrid (B1H) system and find that improved models of transcription factor (TF) binding specificity can be obtained compared to the standard method of sequencing a small subset of the selected clones. We can obtain even more accurate binding models using a modified version of the B1H selection method with constrained variation (CV-B1H). However, achieving these improved models using CV-B1H data required the development of a new method of analysis, GRaMS (Growth Rate Modeling of Specificity), which estimates bacterial growth rates as a function of the quality of the recognition sequence. We benchmark these different methods of motif discovery using Zif268, a well-characterized C2H2 zinc-finger TF, on both a 28 bp randomized library for the standard B1H method and a 6 bp randomized library for the CV-B1H method, for which 45 different experimental conditions were tested: five time points crossed with three concentrations each of IPTG and 3-AT. We find that GRaMS analysis is robust to the different experimental parameters, whereas other analysis methods give widely varying results depending on the conditions of the experiment. Finally, we demonstrate that the CV-B1H assay can be performed in liquid media, which produces recognition models that are similar in quality to those obtained from sequences recovered from selection on solid media.
Year: 2011 PMID: 21507886 PMCID: PMC3130293 DOI: 10.1093/nar/gkr239
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Box plot showing the ability of the set of MEME and BioProspector motifs learned from the four 28 bp B1H data sets to predict the SELEX data. For each PWM, R² was calculated to determine the correlation between the predicted and observed SELEX counts. The performance of three PWMs from the literature is also shown. Zhao2009, Berger2006 and Meng2005 were learned from SELEX, PBM and B1H data, respectively. The GCGTGGGCGG consensus sequence PWM was constructed using an optimal mismatch penalty term.
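The R² comparison described in this caption can be illustrated with a minimal sketch. The caption does not state how PWM scores were converted to predicted counts, so the mapping used here (predicted count proportional to the exponential of an additive PWM score) is an assumption, and the PWM weights and counts are made-up toy values, not data from the study.

```python
import math

def pwm_score(pwm, seq):
    """Additive PWM score: sum of the weight of each base at its position."""
    return sum(pwm[i][base] for i, base in enumerate(seq))

def r_squared(predicted, observed):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)

# Hypothetical 2-position PWM (log-scale weights) and made-up observed counts.
toy_pwm = [
    {"A": -2.0, "C": -1.5, "G": 0.0, "T": -2.5},
    {"A": -1.0, "C": 0.0, "G": -2.0, "T": -1.5},
]
toy_counts = {"GC": 950, "GA": 400, "CC": 210, "TT": 15}

# Assumption: predicted count is proportional to exp(score).
predicted = [math.exp(pwm_score(toy_pwm, s)) for s in toy_counts]
observed = list(toy_counts.values())
fit = r_squared(predicted, observed)
```

Because R² is invariant to rescaling, the unknown proportionality constant between exp(score) and counts does not need to be estimated for this comparison.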
Figure 2. Box plot showing the ability of the 45 PWMs produced by each analysis method using each B1H data set as training data to predict the SELEX nnnnnnGCGG data. For each model, R² was calculated to determine the correlation between the predicted and observed SELEX counts. The performance of four individual PWMs is also indicated. Two of these PWMs, Zhao2009 and Meng2005, were obtained from published SELEX and B1H studies, respectively; the first six positions of these PWMs were used. The LT-B1H PWM was learned from 22 sequences obtained from a CV-B1H experiment. The GCGTGG consensus sequence PWM was constructed using an optimal mismatch penalty term.
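A consensus-sequence PWM with a single mismatch penalty can be sketched as follows. This is one plausible reading of the caption, not the authors' stated construction: every non-consensus base at any position receives the same penalty, and the penalty value of 1.0 is purely illustrative. How the optimal penalty was chosen is not given here; scanning candidate values and keeping the one with the best R² would be one option.

```python
def consensus_pwm(consensus, penalty):
    """Score 0 for the consensus base and -penalty for any other base."""
    return [
        {base: (0.0 if base == cons else -penalty) for base in "ACGT"}
        for cons in consensus
    ]

def score(pwm, seq):
    """Additive score; equals -penalty times the number of mismatches."""
    return sum(pwm[i][base] for i, base in enumerate(seq))

# Consensus taken from the caption; the penalty value is a placeholder.
pwm = consensus_pwm("GCGTGG", penalty=1.0)
```

Under this model all sequences with the same number of mismatches receive the same score, which makes it a useful baseline against position-specific PWMs.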
Figure 3. Results of CV-B1H on Zif268 analyzed with GRaMS. (A) Plot of predicted energies versus growth rates per 6-mer. The GRaMS PWM (8 h, 50 μM IPTG, 2 mM 3-AT) was used to predict the energies. The growth rates (shifted so that the median value is zero) are from the 8 h, 50 μM IPTG, 2 mM 3-AT data set used to estimate the GRaMS PWM. (B) Sequence logo for the GRaMS PWM obtained from the same data set. The y-axis indicates the information content of each position in bits. Sequence logos were produced using in-house software, svgSeqLogo, written by RGC.
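For readers unfamiliar with the y-axis of a sequence logo, the sketch below computes the information content of one position in bits using the standard definition for DNA with a uniform background (2 bits minus the Shannon entropy of the column). It is not taken from svgSeqLogo, whose internals are not described here, and it assumes the column is already expressed as base probabilities; any conversion from an energy PWM and any small-sample correction are left out.

```python
import math

def information_content(column):
    """Bits of information for one column of base probabilities (A, C, G, T)."""
    # Zero-probability bases contribute nothing to the entropy.
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy

# Hypothetical columns: one fully conserved, one completely uninformative.
conserved = {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
```

A fully conserved position reaches the maximum of 2 bits, while a position with equal base probabilities carries 0 bits.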