
D-MATRIX: a web tool for constructing weight matrix of conserved DNA motifs.

Naresh Sen, Manoj Mishra, Feroz Khan, Abha Meena, Ashok Sharma

Abstract

Despite considerable efforts to date, DNA motif prediction in whole genomes remains a challenge for researchers. Current genome-wide motif prediction tools require either a direct pattern sequence (for a single motif) or a weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome-level prediction, there is no tool for weight matrix construction. Considering this, we developed the D-MATRIX tool, which constructs different types of weight matrices based on a user-defined set of aligned motif sequences and a motif width. To retrieve known motif sequences, users can access commonly used databases such as TFD, RegulonDB, DBTBS and TRANSFAC. D-MATRIX uses a simple statistical approach for weight matrix construction, and the resulting matrix can be converted into different file formats according to user requirements. This makes it possible to identify conserved motifs in co-regulated genes or in a whole genome. As an example, we successfully constructed the weight matrix of the LexA transcription factor binding site with the help of known SOS-box cis-regulatory elements in the Deinococcus radiodurans genome. The algorithm is implemented in C# and wrapped in ASP.NET to provide a user-friendly web interface. The D-MATRIX tool is accessible through the CIMAP domain network. AVAILABILITY: http://203.190.147.116/dmatrix/


Keywords:  Weight matrix; file format; motif databases; motif prediction

Year:  2009        PMID: 19759861      PMCID: PMC2737498          DOI: 10.6026/97320630003415

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

An important task in molecular biology is to identify the DNA regulatory elements bound by transcription factors (TFs). These binding sites are short regions called 'motifs'. Despite considerable efforts to date, DNA motif finding in whole genomes remains a challenge for researchers. There are several approaches to identifying conserved motifs, and a more recent one is based on weight matrices. So far, no tool is available for constructing different types of weight matrices from a user-defined sequence set. Earlier tools use promoter sequences of co-regulated genes from a single genome and search for statistically over-represented motifs. Most of these motif-finding tools have been shown to work successfully in yeast and other lower organisms, but they perform significantly worse in higher organisms. Over the past few years, numerous tools have become available for the prediction of TF binding sites [1-3]. Especially popular are tools that use information on known binding sites collected in databases such as TRANSFAC [4], EpoDB [5] and TRANSCompel [6]. More sophisticated approaches consider nucleotide correlations between different positions of the sites, use hidden Markov models (HMMs), or take flanking regions into account, among others [7-14]. However, complex approaches usually require large training sets, which is problematic because only small sets of binding patterns are known for a motif (i.e. up to 10 sites). Current genome-wide motif prediction tools require either a direct pattern sequence (for a single motif) or a weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome-wide prediction, there is no tool for weight matrix construction. Considering this, we have developed the D-MATRIX tool, which constructs different types of weight matrices based on user-defined motif sequences and width. D-MATRIX can use upstream sequences of both orthologous and co-regulated genes as its input data set.
For demonstration, we used the known LexA transcription factor binding site of Deinococcus radiodurans (a radiation-resistant bacterium) to construct a weight matrix similar to the one reported earlier [15]. Prediction performance was promising: on comparing our weight matrix with the known one, we found 90% accuracy for aligned motifs of the same width. D-MATRIX can generate different types of matrices, i.e. alignment, frequency and weight matrices. D-MATRIX also converts the weight matrix into different file formats for user convenience. These converted files can then be used as input by genome-wide motif prediction tools, e.g. PoSSuMsearch [16] and RSAT-Patser [17]. Aligned motif sequences can be retrieved with available motif discovery tools, e.g. SIGNAL SCAN [7], MATRIX SEARCH [8], MatInspector [9], the fuzzy clustering tool [10], FUNSITE [11], the Gibbs sampling tool [13] and AliBaba2 [14]. D-MATRIX differs from existing tools in giving users the freedom to design their own weight matrix model and signature.
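To illustrate what a downstream genome-wide prediction tool does with a weight matrix, the following minimal Python sketch slides a matrix along a sequence and reports windows whose summed weight reaches a cutoff. It is not the algorithm of PoSSuMsearch or RSAT-Patser; the function name `scan`, the toy matrix values, the example sequence and the threshold are all hypothetical and chosen only for illustration.

```python
def scan(sequence, weights, threshold):
    """Return (start, window, score) for every window scoring >= threshold.

    `weights` maps each base to a list of per-position weights.
    """
    width = len(next(iter(weights.values())))
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Skip windows containing symbols the matrix does not cover (e.g. N).
        if any(base not in weights for base in window):
            continue
        score = sum(weights[base][j] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical 3-column matrix that favours the word ACG (illustrative values only).
toy_weights = {
    "A": [1.0, -1.0, -1.0],
    "C": [-1.0, 1.0, -1.0],
    "G": [-1.0, -1.0, 1.0],
    "T": [-1.0, -1.0, -1.0],
}
hits = scan("TTACGTTACG", toy_weights, threshold=2.0)
print(hits)
```

Real tools add refinements such as scanning the reverse strand and deriving the cutoff from score statistics, which this sketch leaves out.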

Methodology

D-MATRIX takes aligned DNA motif sequences (N) and the motif width (w) as input and computes the nucleotide frequency F(ij) at each position. It outputs the consensus patterns/motifs found, ranked by conservation according to F(ij), together with the constructed frequency matrix, alignment matrix and weight matrix, the motif signature, and the degenerate consensus sequence following the IUPAC/IUB convention. The weight matrix was scored using equation 1 (see supplementary material), as described elsewhere [15,18].
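The three matrices described above can be sketched in a few lines of Python. Equation 1 is given only in the supplementary material, so the weight formula used here is an assumption: a commonly used log-odds form with a pseudocount, ln(((n_ij + p_i k) / (N + k)) / p_i), where n_ij is the count of base i in column j, p_i is the background frequency of base i and k is the pseudocount. The function names, the uniform background and the toy input sites are hypothetical, and D-MATRIX itself is implemented in C#, not Python.

```python
import math

BASES = "ACGT"


def count_matrix(sites, width):
    """Alignment (count) matrix: counts[base][j] is how often base occurs in column j."""
    counts = {base: [0] * width for base in BASES}
    for site in sites:
        site = site.upper()
        if len(site) != width or any(ch not in BASES for ch in site):
            raise ValueError(f"site {site!r} is not an A/C/G/T string of length {width}")
        for j, base in enumerate(site):
            counts[base][j] += 1
    return counts


def frequency_matrix(counts, n_sites):
    """Relative frequency of each base in each column."""
    return {base: [c / n_sites for c in row] for base, row in counts.items()}


def weight_matrix(counts, n_sites, background=None, pseudocount=1.0):
    """Assumed log-odds form: ln(((n_ij + p_i * k) / (N + k)) / p_i)."""
    background = background or {base: 0.25 for base in BASES}
    return {
        base: [
            math.log(((c + background[base] * pseudocount) / (n_sites + pseudocount))
                     / background[base])
            for c in counts[base]
        ]
        for base in BASES
    }


# Hypothetical toy sites, not real SOS-box sequences.
sites = ["ACGT", "ACGA", "ACTT", "TCGT"]
counts = count_matrix(sites, width=4)
freqs = frequency_matrix(counts, len(sites))
weights = weight_matrix(counts, len(sites))
for base in BASES:
    print(base, counts[base], [round(w, 2) for w in weights[base]])
```

The pseudocount keeps the logarithm finite for bases that never occur in a column, which matters when only a handful of sites are known.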

Implementation

The D-MATRIX web tool is implemented in C# and wrapped in ASP.NET to provide a user-friendly web interface. The D-MATRIX user interface is shown in the snapshots (Figure 1). It has been designed so that the user has all necessary parameters available on one screen. The top panel is used to paste the input sequences (or aligned known TF binding sites) and to specify the name and width of the motif to be searched. The results panel contains five major sections: consensus pattern/motif sequence, frequency matrix, alignment matrix, weight matrix, and signature sequence in IUPAC code. Alongside these results, the right panel provides a matrix transformation tool, which converts the derived matrix into the input file formats of various genomic motif discovery tools. Because the required input sequence set is experimentally derived, all weight matrices constructed with D-MATRIX can be considered a source of well-supported hypotheses for further experimental verification.
Figure 1: Snapshots of the D-MATRIX tool
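The signature sequence in IUPAC code listed among the results can be illustrated with a short Python sketch. The text does not state how D-MATRIX decides which bases enter the degenerate symbol at each column, so the rule used here (keep every base whose column frequency is at least `min_fraction`) is an assumption, as are the function name and the toy sites. The code table itself follows the standard IUPAC nucleotide codes.

```python
from collections import Counter

# IUPAC nucleotide codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_signature(sites, min_fraction=0.25):
    """Degenerate consensus: one IUPAC symbol per column of the aligned sites.

    A base contributes to a column's symbol when its frequency in that column
    is at least min_fraction (an assumed rule, not taken from the paper).
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all aligned sites must have the same length")
    signature = []
    for column in zip(*sites):
        column_counts = Counter(base.upper() for base in column)
        kept = frozenset(
            base for base, n in column_counts.items() if n / len(sites) >= min_fraction
        )
        # Fall back to N when no base passes the threshold or a symbol is unknown.
        signature.append(IUPAC.get(kept, "N"))
    return "".join(signature)


# Hypothetical toy sites, not real binding-site sequences.
toy_sites = ["ACGT", "ACGA", "ACTT", "TCGT"]
print(iupac_signature(toy_sites))                    # → WCKW
print(iupac_signature(toy_sites, min_fraction=0.5))  # → ACGT
```

Raising the threshold makes the signature stricter: with `min_fraction=0.5` only the majority base survives in each column of this toy set.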

References (10 of 16 listed)

Review 1.  Regulatory elements and expression profiles.

Authors:  P Bucher
Journal:  Curr Opin Struct Biol       Date:  1999-06       Impact factor: 6.809

Review 2.  Discovery and modeling of transcriptional regulatory regions.

Authors:  J W Fickett; W W Wasserman
Journal:  Curr Opin Biotechnol       Date:  2000-02       Impact factor: 9.740

3.  EpoDB: a prototype database for the analysis of genes expressed during vertebrate erythropoiesis.

Authors:  C J Stoeckert; F Salas; B Brunk; G C Overton
Journal:  Nucleic Acids Res       Date:  1999-01-01       Impact factor: 16.971

4.  Transcription regulatory region analysis using signal detection and fuzzy clustering.

Authors:  L Pickert; I Reuter; F Klawonn; E Wingender
Journal:  Bioinformatics       Date:  1998       Impact factor: 6.937

5.  SIGNAL SCAN 4.0: additional databases and sequence formats.

Authors:  D S Prestridge
Journal:  Comput Appl Biosci       Date:  1996-04

6.  Computer tool FUNSITE for analysis of eukaryotic regulatory genomic sequences.

Authors:  A E Kel; Y V Kondrakhin; O V Kel; A G Romashenko; E Wingender; L Milanesi; N A Kolchanov
Journal:  Proc Int Conf Intell Syst Mol Biol       Date:  1995

7.  Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment.

Authors:  C E Lawrence; S F Altschul; M S Boguski; J S Liu; A F Neuwald; J C Wootton
Journal:  Science       Date:  1993-10-08       Impact factor: 47.728

8.  MATRIX SEARCH 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices.

Authors:  Q K Chen; G Z Hertz; G D Stormo
Journal:  Comput Appl Biosci       Date:  1995-10

9.  MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data.

Authors:  K Quandt; K Frech; H Karas; E Wingender; T Werner
Journal:  Nucleic Acids Res       Date:  1995-12-11       Impact factor: 16.971

10.  Fast index based algorithms and software for matching position specific scoring matrices.

Authors:  Michael Beckstette; Robert Homann; Robert Giegerich; Stefan Kurtz
Journal:  BMC Bioinformatics       Date:  2006-08-24       Impact factor: 3.169

