TFBSTools: an R/bioconductor package for transcription factor binding site analysis.

Ge Tan, Boris Lenhard

Abstract

The ability to efficiently investigate transcription factor binding sites (TFBSs) genome-wide is central to computational studies of gene regulation. TFBSTools is an R/Bioconductor package for the analysis and manipulation of TFBSs and their associated transcription factor profile matrices. TFBSTools provides a toolkit for handling TFBS profile matrices, scanning sequences and alignments including whole genomes, and querying the JASPAR database. The functionality of the package can be easily extended to include advanced statistical analysis, data visualization and data integration.
AVAILABILITY AND IMPLEMENTATION: The package is implemented in R and available under the GPL-2 license from the Bioconductor website (http://bioconductor.org/packages/TFBSTools/).
CONTACT: ge.tan09@imperial.ac.uk
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year:  2016        PMID: 26794315      PMCID: PMC4866524          DOI: 10.1093/bioinformatics/btw024

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Transcription factor binding sites (TFBSs) on DNA play a central role in gene regulation via their sequence-specific interaction with transcription factor (TF) proteins (Wasserman and Sandelin, 2004). Most individual TFBSs are 4–30 base-pairs (bp) wide, but are generally located in larger cis-regulatory regions of 50–200 bp. Identifying and analyzing TFBSs is crucial for understanding the mechanisms of gene regulation. At present, the TFBS analysis functionality in R/Bioconductor (Gentleman et al., 2004) is limited and scattered across multiple packages. Here we introduce the R package TFBSTools, which provides a unified and efficiently implemented suite of TFBS analysis tools. The package provides a number of functions for manipulating TFBS profile matrices and for searching DNA sequences and pairwise alignments with them. We have ported all of the functionality of our popular TFBS Perl modules (Lenhard and Wasserman, 2002), retaining the equivalent class structure where possible, and expanded the functionality to provide efficient genome-wide analysis of TFBSs. Our implementation is tightly integrated with the existing Bioconductor core packages, enabling high-performance sequence and interval manipulation. A database interface for JASPAR2014 (Mathelier et al., 2014) and JASPAR2016 (Mathelier et al., 2016) and a wrapper function for de novo motif discovery software are also provided.

2 Methods

2.1 S4 classes defined in TFBSTools

To provide easy data storage, manipulation and exchange, we created several novel S4 classes (Fig. 1), and also defined an aggregate version of each class (e.g. PFMatrixList) to help manipulate sets of the corresponding objects. The design mirrors the classes in the TFBS Perl modules while remaining extensible in an object-oriented manner, adding new functionality and taking advantage of the functional programming capabilities of R.
Fig. 1. A common workflow and classes in TFBSTools. (A) A PFMatrix can be converted into a PWMatrix or an ICMatrix. The ICMatrix produces sequence logos. The PWMatrix scans a single sequence or an alignment to produce a SiteSet object that holds the transcription factor binding sites. (B) TFFM is a virtual class; TFFMFirst and TFFMDetail are derived from it. They can produce the position probabilities and the novel graphical representation of a TFFM.

2.2 Operations with TFBS matrix profiles

To characterize the binding preference of a TF, the aligned sequences bound by the TF are aggregated into a position frequency matrix (PFM). From this matrix, two further matrices can be derived: the position weight matrix (PWM, the most commonly used kind of position-specific scoring matrix) and the information content matrix (ICM). A PWM is a matrix of positional log-likelihoods normally used for scanning sequences and scoring them against the motif, while an ICM is mostly used in motif visualization, e.g. for drawing sequence logos, which can easily be done with the package seqLogo (Fig. 1A). As a novel feature, in addition to matrix profiles, TFBSTools also supports the manipulation of transcription factor flexible model (TFFM) profiles (Mathelier and Wasserman, 2013), which capture dinucleotide dependence (Fig. 1B).
TFBSTools provides methods for converting between the different types of matrices, with a range of options and customizations. The highlights include: (i) a default pseudocount of 0.8 (Nishida et al., 2009) is used to eliminate small or zero counts before log transformation, although a different pseudocount, or pseudocount function, for each column is possible; (ii) Schneider correction for the ICM is available; (iii) unequal background nucleotide frequencies can also be specified.
TFBSTools provides tools for comparing pairs of PFMs, or a PFM with IUPAC strings, using a modified Needleman–Wunsch algorithm (Sandelin et al., 2003). Quantifying the similarity between PFMs is commonly used to compare a newly discovered matrix with existing matrices in a motif database, such as JASPAR, to determine whether the motif is related to known annotated motifs. The similarity between two PWMs can be quantified using several metrics (e.g. normalized Euclidean distance, Pearson correlation coefficient and Kullback–Leibler divergence).
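The conversions from a PFM to a PWM and to an ICM described above can be illustrated with a minimal Python sketch. This is not TFBSTools code: the function names `to_pwm` and `to_icm`, the toy count matrix and the exact pseudocount formula (the pseudocount is spread over the bases in proportion to the background) are assumptions made for illustration. The ICM here assumes a uniform background and omits the Schneider small-sample correction.

```python
import math

BASES = "ACGT"

# Toy count matrix for illustration only (rows: bases, columns: positions).
pfm = {
    "A": [4, 19, 0, 0, 0, 0],
    "C": [16, 0, 20, 0, 0, 0],
    "G": [0, 1, 0, 20, 0, 20],
    "T": [0, 0, 0, 0, 20, 0],
}


def to_pwm(pfm, pseudocount=0.8, bg=None):
    """Convert counts to log2 odds scores against a background.

    Assumed formula: p = (count + bg * pseudocount) / (N + pseudocount),
    score = log2(p / bg). The pseudocount keeps zero counts finite.
    """
    bg = bg or {b: 0.25 for b in BASES}
    width = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for j in range(width):
        n = sum(pfm[b][j] for b in BASES)
        for b in BASES:
            p = (pfm[b][j] + bg[b] * pseudocount) / (n + pseudocount)
            pwm[b].append(math.log2(p / bg[b]))
    return pwm


def to_icm(pfm):
    """Convert counts to information content (bits), uniform background.

    Column information is 2 - H, where H is the Shannon entropy of the
    column; each base receives its share p * (2 - H).
    """
    width = len(pfm["A"])
    icm = {b: [] for b in BASES}
    for j in range(width):
        n = sum(pfm[b][j] for b in BASES)
        probs = {b: pfm[b][j] / n for b in BASES}
        entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
        for b in BASES:
            icm[b].append(probs[b] * (2.0 - entropy))
    return icm


pwm = to_pwm(pfm)
icm = to_icm(pfm)
for b in BASES:
    print(b, [round(x, 2) for x in pwm[b]], [round(x, 2) for x in icm[b]])
```

Passing a different `bg` dictionary to `to_pwm` corresponds to specifying unequal background nucleotide frequencies; a per-column pseudocount would need a small extension of the loop.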
TFBSTools also allows random profile generation by (i) sampling the posterior distribution of Dirichlet multinomial mixture models trained on all available JASPAR matrices, or (ii) permuting the columns of selected PFMs. The availability of random matrices with the same statistical properties as selected profiles is particularly useful for computational/simulation studies, such as matrix-matrix comparison.
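As a rough illustration of column permutation and of a correlation-based similarity, the sketch below shuffles the columns of a toy PFM and compares profiles by the mean column-wise Pearson correlation. The helper names (`columns`, `profile_similarity`, `permute_columns`), the toy matrix and the choice to average per-column correlations over equal-width profiles are all assumptions for illustration; they are not the package's implementation, and the Dirichlet mixture sampling is not shown.

```python
import math
import random

BASES = "ACGT"

# Toy count matrix for illustration only.
pfm = {
    "A": [4, 19, 0, 0, 0, 0],
    "C": [16, 0, 20, 0, 0, 0],
    "G": [0, 1, 0, 20, 0, 20],
    "T": [0, 0, 0, 0, 20, 0],
}


def columns(pfm):
    """Return the matrix as a list of (A, C, G, T) column tuples."""
    return [tuple(pfm[b][j] for b in BASES) for j in range(len(pfm["A"]))]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def profile_similarity(pfm1, pfm2):
    """Mean column-wise Pearson correlation of two equal-width profiles."""
    cols1, cols2 = columns(pfm1), columns(pfm2)
    if len(cols1) != len(cols2):
        raise ValueError("profiles must have the same width")
    return sum(pearson(c1, c2) for c1, c2 in zip(cols1, cols2)) / len(cols1)


def permute_columns(pfm, rng):
    """Return a new profile with the column order shuffled.

    Each column is kept intact, so per-column base composition is preserved.
    """
    cols = columns(pfm)
    rng.shuffle(cols)
    return {b: [col[i] for col in cols] for i, b in enumerate(BASES)}


shuffled = permute_columns(pfm, random.Random(1))
print("self similarity:", round(profile_similarity(pfm, pfm), 3))
print("similarity to permuted profile:",
      round(profile_similarity(pfm, shuffled), 3))
```

Repeating the permutation many times gives a background distribution of similarity scores against which an observed matrix-matrix score can be judged.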

2.3 Sequence/alignment scanning with PWM profiles

TFBSTools includes facilities for screening potential TFBSs present in a DNA sequence (searchSeq), or conserved in a pairwise alignment. When a pairwise alignment is available, it can be used to combine TFBS prediction with phylogenetic footprinting, which can in many cases reduce the false discovery rate whilst retaining a sufficient level of sensitivity (Wasserman and Sandelin, 2004). Alternatively, it can be used in combination with other data (e.g. ChIP-seq) to study the cross-species conservation properties of TF binding. For genome-wide phylogenetic footprinting, TFBSTools can accept either two BSgenome objects and a chain file for liftover from one genome to the other (searchPairBSgenome), or a novel S4 class Axt from our CNEr package (available from the Bioconductor website) for representing axt alignments (searchAln). Running searchAln on a human–mouse pairwise alignment can take up to 50 CPU hours, although the computation can be parallelized, whereas searchSeq or searchPairBSgenome needs only several minutes. The computationally predicted putative TFBSs can be returned in GFF format or as GRanges for downstream analysis.
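The idea behind scanning a single sequence with a PWM can be sketched in a few lines of Python. This is a simplified illustration, not the package's searchSeq implementation: the function name `search_seq`, the toy matrix, the pseudocount formula and the use of a relative score threshold (score rescaled between the minimum and maximum achievable PWM score) are assumptions made here.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Toy count matrix for illustration only; its consensus is CACGTG.
pfm = {
    "A": [4, 19, 0, 0, 0, 0],
    "C": [16, 0, 20, 0, 0, 0],
    "G": [0, 1, 0, 20, 0, 20],
    "T": [0, 0, 0, 0, 20, 0],
}


def to_pwm(pfm, pseudocount=0.8, bg=0.25):
    """Log2 odds scores with a uniform background (assumed formula)."""
    width = len(pfm["A"])
    totals = [sum(pfm[b][j] for b in BASES) for j in range(width)]
    return {
        b: [math.log2((pfm[b][j] + bg * pseudocount)
                      / (totals[j] + pseudocount) / bg)
            for j in range(width)]
        for b in BASES
    }


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def search_seq(pwm, seq, min_rel_score=0.8):
    """Slide the PWM along both strands and report windows above threshold.

    Returns (start, end, strand, score, relative_score) tuples with
    1-based, inclusive coordinates on the forward strand.
    """
    width = len(pwm["A"])
    min_s = sum(min(pwm[b][j] for b in BASES) for j in range(width))
    max_s = sum(max(pwm[b][j] for b in BASES) for j in range(width))
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", revcomp(window))):
            score = sum(pwm[b][j] for j, b in enumerate(site))
            rel = (score - min_s) / (max_s - min_s)
            if rel >= min_rel_score:
                hits.append((start + 1, start + width, strand, score, rel))
    return hits


pwm = to_pwm(pfm)
for hit in search_seq(pwm, "TTCACGTGAA"):
    print(hit)
```

Because the toy consensus is palindromic, the same window is reported on both strands. A phylogenetic footprinting step would additionally require that a hit be found at the aligned position in the second species, which is not shown here.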

2.4 JASPAR database interface

Since the release of JASPAR2014 (Mathelier et al., 2014), we have provided the Bioconductor data packages JASPAR2014 and JASPAR2016, which hold the profile matrices and associated metadata. To accompany the use of these data packages for TFBS analysis, TFBSTools provides functions to enable efficient database querying and manipulation.

2.5 Use of de novo motif discovery software

TFBSTools provides wrapper functions for de novo motif discovery software and seamlessly integrates the results back into R objects. Currently, support for MEME is implemented, and the reported motifs are stored in a MotifSet object.

3 Conclusions and further information

The Bioconductor TFBSTools package provides a full suite of TFBS analysis tools. The package allows the efficient and reproducible identification and analysis of TFBSs. In combination with other functionality in Bioconductor, it provides a powerful way to analyze TF binding motifs on a genome-wide scale. Further development will include an efficient implementation of sequence/alignment scanning with TFFMs. A tutorial and additional use cases are available at the Bioconductor website.

References

1.  TFBS: Computational framework for transcription factor binding site analysis.

Authors:  Boris Lenhard; Wyeth W Wasserman
Journal:  Bioinformatics       Date:  2002-08       Impact factor: 6.937

2.  Applied bioinformatics for the identification of regulatory elements.

Authors:  Wyeth W Wasserman; Albin Sandelin
Journal:  Nat Rev Genet       Date:  2004-04       Impact factor: 53.242

3.  Bioconductor: open software development for computational biology and bioinformatics.

Authors:  Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal:  Genome Biol       Date:  2004-09-15       Impact factor: 13.583

4.  Integrated analysis of yeast regulatory sequences for biologically linked clusters of genes.

Authors:  Albin Sandelin; Annette Höglund; Boris Lenhard; Wyeth W Wasserman
Journal:  Funct Integr Genomics       Date:  2003-06-25       Impact factor: 3.410

5.  The next generation of transcription factor binding site prediction.

Authors:  Anthony Mathelier; Wyeth W Wasserman
Journal:  PLoS Comput Biol       Date:  2013-09-05       Impact factor: 4.475

6.  Pseudocounts for transcription factor binding sites.

Authors:  Keishin Nishida; Martin C Frith; Kenta Nakai
Journal:  Nucleic Acids Res       Date:  2008-12-23       Impact factor: 16.971

7.  JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles.

Authors:  Anthony Mathelier; Xiaobei Zhao; Allen W Zhang; François Parcy; Rebecca Worsley-Hunt; David J Arenillas; Sorana Buchman; Chih-yu Chen; Alice Chou; Hans Ienasescu; Jonathan Lim; Casper Shyr; Ge Tan; Michelle Zhou; Boris Lenhard; Albin Sandelin; Wyeth W Wasserman
Journal:  Nucleic Acids Res       Date:  2013-11-04       Impact factor: 16.971

8.  JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles.

Authors:  Anthony Mathelier; Oriol Fornes; David J Arenillas; Chih-Yu Chen; Grégoire Denay; Jessica Lee; Wenqiang Shi; Casper Shyr; Ge Tan; Rebecca Worsley-Hunt; Allen W Zhang; François Parcy; Boris Lenhard; Albin Sandelin; Wyeth W Wasserman
Journal:  Nucleic Acids Res       Date:  2015-11-03       Impact factor: 16.971
