Michael G Becker, Philip L Walker, Nadège C Pulgar-Vidal, Mark F Belmonte.
Abstract
Transcription factors and their associated DNA binding sites are key regulatory elements of cellular differentiation, development, and environmental response. New tools that predict transcriptional regulation of biological processes are valuable to researchers studying both model and emerging-model plant systems. SeqEnrich predicts transcription factor networks from co-expressed Arabidopsis or Brassica napus gene sets. The networks produced by SeqEnrich are supported by existing literature and predicted transcription factor-DNA interactions that can be functionally validated at the laboratory bench. The program works with gene sets of varying sizes derived from diverse tissues and environmental treatments. SeqEnrich provides a powerful predictive framework for the analysis of Arabidopsis and Brassica napus co-expression data, and is designed so that researchers at all levels can easily access and interpret predicted transcriptional circuits. The program outperformed its ancestral program ChipEnrich, and produced detailed transcription factor networks from Arabidopsis and Brassica napus gene expression data. The SeqEnrich program is ideal for generating new hypotheses and distilling biological information from large-scale expression data.
Year: 2017 PMID: 28575075 PMCID: PMC5456048 DOI: 10.1371/journal.pone.0178256
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Conceptual description of a transcription factor network.
TFs are represented as green rounded squares, DNA motifs as pink diamonds, gene patterns as orange hexagons, and gene ontology (GO) terms as blue circles. Connections between TFs and motifs, and between motifs and patterns/GO terms are represented by a grey connecting line.
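The node and edge types described in this caption can be sketched as a small data structure. This is only an illustrative model of the figure, not SeqEnrich's internal representation; the class name `TFNetwork`, its methods, and all node names in the usage lines are hypothetical.

```python
from collections import defaultdict


class TFNetwork:
    """Toy container for the four node types named in the caption.

    Hypothetical sketch: not taken from the SeqEnrich source code.
    """

    NODE_TYPES = {"tf", "motif", "pattern", "go"}

    def __init__(self):
        self.node_type = {}            # node name -> one of NODE_TYPES
        self.edges = defaultdict(set)  # undirected adjacency sets

    def add_node(self, name, kind):
        if kind not in self.NODE_TYPES:
            raise ValueError(f"unknown node type: {kind}")
        self.node_type[name] = kind

    def add_edge(self, a, b):
        # The caption only describes TF-motif and motif-pattern/GO links,
        # so every edge must join a motif to exactly one non-motif node.
        kinds = {self.node_type[a], self.node_type[b]}
        if "motif" not in kinds or len(kinds) != 2:
            raise ValueError("edges must connect a motif to a TF, pattern, or GO term")
        self.edges[a].add(b)
        self.edges[b].add(a)

    def targets_of(self, tf):
        """Return patterns and GO terms reachable from a TF through its motifs."""
        found = set()
        for motif in self.edges.get(tf, ()):
            for neighbour in self.edges[motif]:
                if self.node_type[neighbour] in {"pattern", "go"}:
                    found.add(neighbour)
        return found


# Usage with made-up node names.
net = TFNetwork()
net.add_node("TF_A", "tf")
net.add_node("MOTIF_1", "motif")
net.add_node("PATTERN_1", "pattern")
net.add_node("GO_transport", "go")
net.add_edge("TF_A", "MOTIF_1")
net.add_edge("MOTIF_1", "PATTERN_1")
net.add_edge("MOTIF_1", "GO_transport")
print(sorted(net.targets_of("TF_A")))
```

Routing every connection through a motif node mirrors the caption, where TFs never link directly to gene patterns or GO terms.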
Fig 2. Design and assembly of the SeqEnrich database.
Information on transcription factors, DNA binding site motifs, and gene functions was collected from publicly available sources and integrated into the SeqEnrich program.
IUPAC codes used to represent nucleotides in motifs, and the corresponding likelihood of each nucleotide at that position.
| IUPAC code | IUPAC identity | Nucleotide probabilities |
|---|---|---|
| A | Adenine | [p( |
| C | Cytosine | [p( |
| G | Guanine | [p( |
| T | Thymine | [p( |
| | | [p( |
| | | [p( |
| | | [p( |
| | | [p( |
| | | [p( |
| | | [p( |
| | | [p( |
| | | [p( |
| | | [p( |
| | | [p( |
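As a rough illustration of how IUPAC codes map to per-position nucleotide probabilities, the sketch below expands each standard IUPAC code into the bases it allows. The uniform split among allowed bases is an assumption made here for illustration; the probability values in the table above did not survive and may differ. The function names and the example motif `CANNTG` are likewise illustrative and not taken from SeqEnrich.

```python
# Standard IUPAC nucleotide codes and the bases each one permits.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def code_probabilities(code):
    """Return p(base) for one IUPAC code.

    ASSUMPTION: every base permitted by the code is equally likely.
    """
    allowed = IUPAC_BASES[code.upper()]
    share = 1.0 / len(allowed)
    return {base: (share if base in allowed else 0.0) for base in "ACGT"}


def motif_matches(motif, site):
    """Check whether a DNA site is compatible with an IUPAC-coded motif."""
    if len(motif) != len(site):
        return False
    return all(
        base in IUPAC_BASES[code]
        for code, base in zip(motif.upper(), site.upper())
    )


print(code_probabilities("R"))
print(motif_matches("CANNTG", "CACGTG"))
```

Under the uniform assumption, a two-base code such as `R` assigns 0.5 to each of A and G, and `N` assigns 0.25 to every base.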
Fig 3. Predicted transcription factor networks from the chalazal endosperm of Arabidopsis.
A) Predicted transcriptional module developed from the ChipEnrich program; B) predicted transcriptional module using the SeqEnrich program; C) subset of the transcriptional module produced from the subanalysis function of the SeqEnrich program. A predicted bZIP, bHLH, MYB, and BES transcriptional module controlling biological processes within the mature endosperm of Arabidopsis.
Fig 4. Predicted transcription factor networks from Brassica napus gene sets.
A) Subset of the transcription factor network identified from the funiculus vasculature dataset. MYB and MADS-box TFs are predicted to regulate genes associated with transport, metal ion homeostasis, and cell wall modification; B) subset of the transcriptional module identified with the SeqEnrich subanalysis function and predicted to be operative in seedlings infected with the fungal pathogen Leptosphaeria maculans; C) transcriptional module indicated by the arrow in (B), showing regulation of genes associated with defense bioprocesses by a family of calmodulin-binding transcriptional activators (CAMTAs).