T Waleev, D Shtokalo, T Konovalova, N Voss, E Cheremushkin, P Stegmaier, O Kel-Margoulis, E Wingender, A Kel.
Abstract
Composite Module Analyst (CMA) is a novel software tool for identifying promoter-enhancer models based on the composition of transcription factor (TF) binding sites and their pairs. CMA is closely interconnected with the TRANSFAC database. In particular, CMA uses the positional weight matrix (PWM) library collected in TRANSFAC and can therefore search for a large variety of TF binding sites. We model the structure of long gene regulatory regions by a Boolean function that joins several local modules, each consisting of co-localized TF binding sites. Given a set of co-regulated genes as input, CMA builds the promoter model and optimizes its parameters automatically by applying a genetic-regression algorithm. The fitness function of the algorithm is multicomponent: it combines several statistical criteria in a weighted linear function. We show examples of successful application of CMA to microarray data on transcription profiling of TNF-alpha stimulated primary human endothelial cells. The CMA web server is freely accessible at http://www.gene-regulation.com/pub/programs/cma/CMA.html. An advanced version of CMA is also part of the commercial system ExPlain™ (www.biobase.de), designed for causal analysis of gene expression data.
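To illustrate what a fitness function built as "a weighted linear function" of several criteria could look like, here is a minimal Python sketch. The criterion names and weights are hypothetical placeholders, since the abstract does not list the actual statistical criteria or their weights, and this is not CMA's implementation.

```python
from typing import Dict


def weighted_fitness(criteria: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine several criterion values into one fitness score.

    Each criterion value is multiplied by its weight and the products are
    summed. Criterion names are illustrative assumptions, not CMA's own.
    """
    missing = set(criteria) - set(weights)
    if missing:
        raise KeyError(f"no weight given for criteria: {sorted(missing)}")
    return sum(weights[name] * value for name, value in criteria.items())


# Hypothetical criteria for one candidate promoter model.
candidate = {"separation": 1.0, "error_rate": 2.0}
candidate_weights = {"separation": 0.5, "error_rate": 0.25}
fitness = weighted_fitness(candidate, candidate_weights)
```

A genetic algorithm would evaluate such a score for every candidate model in a population and keep the better-scoring candidates for the next generation.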
Year: 2006 PMID: 16845066 PMCID: PMC1538785 DOI: 10.1093/nar/gkl342
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A schematic representation of a match of a composite module (CM) in a particular promoter. Several TF sites and pairs of TF sites are found in a sequence window of length w, located in the upstream region of a given gene.
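The idea of matching a module inside a sequence window of length w can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the window score is taken as the sum of the best site score per matrix, site pairs are ignored, and the `Site` tuple layout and the matrix names `A`, `B`, `C` are hypothetical. The scoring actually used by CMA is not specified in this record.

```python
from typing import Dict, List, Set, Tuple

# (position in the upstream region, matrix name, PWM match score); assumed layout
Site = Tuple[int, str, float]


def best_window(sites: List[Site], module: Set[str], w: int) -> Tuple[int, float]:
    """Return (window start, score) of the best-scoring window of length w.

    The score of a window is the sum, over the matrices in `module`, of the
    highest-scoring site of that matrix inside the window. Only windows that
    start at a site position are tried, which is sufficient because shifting
    a window right until it starts at a site never drops a site from it.
    """
    best: Tuple[int, float] = (0, 0.0)
    for start in sorted({pos for pos, _, _ in sites}):
        top: Dict[str, float] = {}
        for pos, name, score in sites:
            if name in module and start <= pos < start + w:
                top[name] = max(top.get(name, 0.0), score)
        total = sum(top.values())
        if total > best[1]:
            best = (start, total)
    return best


# Hypothetical predicted sites in one promoter.
promoter_sites = [(10, "A", 0.9), (40, "B", 0.8), (300, "A", 0.95), (320, "C", 0.7)]
start, score = best_window(promoter_sites, {"A", "B", "C"}, w=100)
```

Taking the maximum over windows gives one score per promoter, which is the kind of per-promoter value that can then be compared between gene sets.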
Figure 2. CMA web interface and results of identification of the CM discriminating promoters of up-regulated genes from promoters of down-regulated genes upon TNF-alpha stimulation of primary human endothelial cells. (a) CMA web interface; (b) representation of the identified CM, which consists of three single weight matrices and three pairs of matrices. The distance limits and orientation of the sites in the pairs are shown schematically. (c) Two histograms of the fuzzy promoter score (fps) in the promoters of up-regulated genes (red) and down-regulated genes (blue); (d) representation of TF sites found in the windows that correspond to the maximal score of the match of the CM in the promoters. Identified NF-AT/EGR2 site pairs (red) and IK3/IRF1 site pairs (blue) are marked.