
Melina II: a web tool for comparisons among several predictive algorithms to find potential motifs from promoter regions.

Toshiyuki Okumura, Hiroki Makiguchi, Yuko Makita, Riu Yamashita, Kenta Nakai.

Abstract

We present the second version of Melina, a web-based tool for promoter analysis. Melina II shows potential DNA motifs in promoter regions with a combination of several available programs, Consensus, MEME, Gibbs sampler, MDscan and Weeder, as well as several parameter settings. It allows running a maximum of four programs simultaneously, and comparing their results with graphical representations. In addition, users can build a weight matrix from a predicted motif and apply it to upstream sequences of several typical genomes (human, mouse, S. cerevisiae, E. coli, B. subtilis or A. thaliana) or to public motif databases (JASPAR or DBTBS) in order to find similar motifs. Melina II is a client/server system developed by using Adobe (Macromedia) Flash and is accessible over the web at http://melina.hgc.jp.


Year: 2007; PMID: 17537821; PMCID: PMC1933176; DOI: 10.1093/nar/gkm362

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971

INTRODUCTION

Transcription factor binding sites (TFBSs) play important roles in the regulation of gene expression. Extracting a common TFBS from a set of DNA sequences is therefore a problem of practical importance. Although a number of algorithms have been released to address this problem, none of them seems to be perfect (1–4). Thus, to avoid missing important motifs by relying on only one algorithm, or to check the effect of changing parameter values, it is useful to compare the prediction results obtained from different algorithms and parameter values. To support such comparisons, we previously released a web tool named Melina (5). It has now been updated to its second version, Melina II. In Melina II, some of the integrated algorithms have been replaced with more modern ones and the graphical representation has been extensively improved. Melina II enables users to compare the results of promoter analysis more efficiently and easily.

OVERVIEW

Melina II can run up to four of five external algorithms [Consensus (6), MEME (7), Gibbs sampler (8), MDscan (9) and Weeder (10)] with user-specified parameter values, which reduces the risk of missing important motifs. MDscan and Weeder are newly added in this release. MDscan is a hybrid of two motif search strategies, word enumeration and position-specific weight matrices. Weeder adopts an enumerative pattern discovery algorithm that carries out an almost exhaustive search. Integrating algorithms based on different principles should help detect subtle motifs and reduce the number of false positives. Combining different algorithms and/or parameter values may also help to narrow down motif candidates or to detect alternative motifs. The results of these algorithms are displayed side by side with intuitive graphics (Figure 1).

Figure 1. Basic usage of Melina II.

As shown in Figure 1, three simple steps are sufficient to use Melina II:

Step 1: Input query sequences (Figure 1a). In the Query input panel, multiple input sequences are entered in FASTA format.

Step 2: Select predictive algorithms and their parameters (Figure 1a and b). Although defaults are provided, users can choose the prediction algorithms and their specific parameter values at this step. In some cases, the default parameters were chosen specifically for Melina II to make the search conditions as similar as possible across algorithms. They are: (1) the motif length is around 10 bases (‘6–10’ for MEME and Weeder; otherwise, ‘10’); (2) both strands are searched; and (3) multiple occurrences are allowed for each sequence. The same algorithm can be selected more than once with different parameter values.

Step 3: Submit a query and get results (Figure 1c). After a query is submitted, a job ID is displayed on the screen while the job is running. Users can later access the results by using this job ID. After Melina II finishes the motif detection, the results of each prediction are integrated and displayed graphically (Figure 1c). Detected motif candidates are illustrated with colored arrows in the summarized view (upper-right corner of the result view). If users click a motif candidate in the summarized view, more information is shown in the detailed view (lower-right corner) and the predicted motif is illustrated by a Sequence Logo (11) [the script for its drawing was taken from WebLogo (12)] or a weight matrix. This integrated result helps users find motif candidates and grasp the outline of cis-regulatory modules. With the ‘PDF’ button, the output can be saved as a PDF file, which is useful both for further manipulation and inclusion in publications and for getting the entire view by adjusting the scale. The ‘FIT’ button fits the entire view along its horizontal axis and hides the detailed information in the lower half.
Furthermore, users can build a weight matrix from a predicted motif and apply it to upstream sequences of several typical genomes (human, mouse, A. thaliana, S. cerevisiae, E. coli or B. subtilis) or to public motif databases [JASPAR (13) or DBTBS (14)] in order to find similar motifs. For the former search, we used the HMMER package by Sean Eddy (http://hmmer.janelia.org/). More details are available from the help document.
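
The weight-matrix step described above can be sketched in a few lines. This is only an illustration, not Melina II's implementation (which, as noted, relies on HMMER for the genome search): it builds a log-odds matrix from aligned motif instances and scans a sequence for the best-scoring window. The pseudocount, the uniform background, the function names and the example sites are all assumptions made for the sketch, and sequences are assumed to contain only uppercase A, C, G and T.

```python
import math

BASES = "ACGT"

def build_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length aligned sites.

    pseudocount and the uniform background are assumptions of this sketch.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        scores = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        matrix.append(scores)
    return matrix

def scan(sequence, matrix):
    """Return (best_score, start) over all windows, or None if too short."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best

# Hypothetical aligned motif instances (not taken from the paper)
sites = ["TGACGT", "TGACGA", "TGACGT", "TGTCGT"]
pwm = build_weight_matrix(sites)
score, start = scan("AAACCTGACGTGGA", pwm)
```

In practice one would scan both strands and apply a score threshold rather than keep only the single best window; those refinements are omitted here for brevity.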

EXAMPLES AND DISCUSSION

To illustrate how Melina II works, we give two examples. The first is a set of artificial DNA sequences containing several known motifs. The second consists of upstream sequences of functionally related genes.

Example 1: Embedded motifs in artificial sequences

In this example, the dataset consists of three 250-bp DNA sequences (Figure 2a). Each DNA sequence was randomly generated by the Random Sequence Generator, which is a function of Melina II. Three known consensus motifs were inserted into each sequence (Figure 2b). The motifs were inserted in random order to check the influence of their location. In general, it is difficult for multiple alignment programs to detect all motifs in this kind of dataset.
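
A comparable test dataset can be generated outside Melina II. The sketch below is not the Random Sequence Generator itself, whose internals are not described here; it simply draws uniform random bases and inserts a list of motifs in shuffled, non-overlapping order. The function names and the three motif strings are placeholders chosen for illustration, not the motifs used in Figure 2.

```python
import random

def random_dna(length, rng):
    """Return a uniform random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def embed_motifs(length, motifs, rng):
    """Return (sequence, positions) with each motif inserted once.

    Motifs are placed in random order without overlapping; `positions`
    maps each (distinct) motif to its 0-based start in the sequence.
    """
    order = list(motifs)
    rng.shuffle(order)
    free = length - sum(len(m) for m in order)
    if free < 0:
        raise ValueError("motifs do not fit into the requested length")
    # Split the free (random) bases into len(order) + 1 gaps.
    cuts = sorted(rng.randint(0, free) for _ in order)
    gaps = [b - a for a, b in zip([0] + cuts, cuts + [free])]
    parts, positions, offset = [], {}, 0
    for gap, motif in zip(gaps, order):
        parts.append(random_dna(gap, rng))
        offset += gap
        positions[motif] = offset
        parts.append(motif)
        offset += len(motif)
    parts.append(random_dna(gaps[-1], rng))
    return "".join(parts), positions

rng = random.Random(0)  # fixed seed so the dataset is reproducible
motifs = ["TGACGTCA", "TATAAA", "GGGCGG"]  # placeholder motifs
dataset = [embed_motifs(250, motifs, rng) for _ in range(3)]
```

Because the background is random, a motif may also occur by chance outside its recorded position; `positions` only tracks the deliberately inserted copies.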

Figure 2. Embedded motifs in artificial sequences.

In this case, we used four algorithms (Consensus, MEME, Gibbs sampler and Weeder) with their original default parameters. The result shows that no single algorithm correctly detected all motifs. However, all the inserted motifs can be recovered by taking the motifs detected by at least two algorithms, as illustrated in Figure 3.
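
Melina II supports this "at least two algorithms" comparison visually. As a rough illustration of the same idea in code, the sketch below keeps only those predicted sites that overlap a prediction from at least one other algorithm. The data layout (half-open `(sequence_id, start, end)` tuples per algorithm), the function names and the example coordinates are assumptions, not output from the tool.

```python
def overlaps(a, b):
    """True if two (seq_id, start, end) half-open intervals overlap
    on the same sequence."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def consensus_sites(predictions, min_support=2):
    """Keep sites supported by at least `min_support` distinct algorithms.

    A site counts its own algorithm as one supporter, so min_support=2
    means one additional algorithm must predict an overlapping site.
    """
    kept = []
    for algo, sites in predictions.items():
        for site in sites:
            supporters = {
                other
                for other, other_sites in predictions.items()
                if any(overlaps(site, s) for s in other_sites)
            }
            if len(supporters) >= min_support:
                kept.append((algo, site))
    return kept

# Hypothetical predictions, for illustration only
predictions = {
    "Consensus": [("seq1", 40, 50)],
    "MEME": [("seq1", 42, 52), ("seq2", 100, 108)],
    "Gibbs": [("seq2", 10, 20)],
    "Weeder": [("seq2", 101, 109)],
}
kept = consensus_sites(predictions)
```

Here the Gibbs prediction on `seq2` is dropped because no other algorithm reports an overlapping site, while the remaining four predictions support each other pairwise.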

Figure 3. Result view of example 1.

For the same dataset, we show another result in Figure 4. In this case, we used Consensus with default parameters and Gibbs sampler with three different sets of parameter values. This result clearly shows that parameter values such as motif size and cut-off value can significantly influence motif detection. Because Melina II allows fine-grained specification of parameters, expert users can analyze datasets from multiple angles.

Figure 4. Another result of example 1 using different parameter values.

Example 2: Upstream sequences of functionally related genes

We present here an example of real promoters containing a common motif. This dataset consists of the 300-bp sequences upstream of the translational start sites of five Bacillus subtilis genes known to be regulated by a well-known global regulator, CcpA. As shown in Figure 5, a common motif is identified and, through the search against DBTBS, it is confirmed that this motif corresponds to the CcpA motif (Figure 5b and c).

Figure 5. Real promoters and motif database search.

FUTURE PROSPECTS

One future direction is to give Melina a function that guides users toward favorable parameter values in order to improve detection accuracy. This is not an easy task, because the optimal parameter values for each algorithm may depend on, for example, the length and number of input sequences as well as the nature of the pattern being sought. Nevertheless, it seems possible, at least to some extent, to categorize typical cases and suggest suitable parameter values for each (15).

Implementation

Melina II was developed as a web-based tool by using Adobe (Macromedia) Flash. You may need to install the Flash Plug-in beforehand.

REFERENCES (10 of 13 listed)

Review 1. DNA binding sites: representation and discovery.

Authors: G D Stormo
Journal: Bioinformatics Date: 2000-01 Impact factor: 6.937

2. An algorithm for finding signals of unknown length in DNA sequences.

Authors: G Pavesi; G Mauri; G Pesole
Journal: Bioinformatics Date: 2001 Impact factor: 6.937

3. Melina: motif extraction from promoter regions of potentially co-regulated genes.

Authors: Natalia Poluliakh; Toshihisa Takagi; Kenta Nakai
Journal: Bioinformatics Date: 2003-02-12 Impact factor: 6.937

4. WebLogo: a sequence logo generator.

Authors: Gavin E Crooks; Gary Hon; John-Marc Chandonia; Steven E Brenner
Journal: Genome Res Date: 2004-06 Impact factor: 9.043

5. Sequence logos: a new way to display consensus sequences.

Authors: T D Schneider; R M Stephens
Journal: Nucleic Acids Res Date: 1990-10-25 Impact factor: 16.971

6. Fitting a mixture model by expectation maximization to discover motifs in biopolymers.

Authors: T L Bailey; C Elkan
Journal: Proc Int Conf Intell Syst Mol Biol Date: 1994

7. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment.

Authors: C E Lawrence; S F Altschul; M S Boguski; J S Liu; A F Neuwald; J C Wootton
Journal: Science Date: 1993-10-08 Impact factor: 47.728

8. DBTBS: database of transcriptional regulation in Bacillus subtilis and its contribution to comparative genomics.

Authors: Yuko Makita; Mitsuteru Nakao; Naotake Ogasawara; Kenta Nakai
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

9. An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments.

Authors: X Shirley Liu; Douglas L Brutlag; Jun S Liu
Journal: Nat Biotechnol Date: 2002-07-08 Impact factor: 54.908

Review 10. Computational identification of transcriptional regulatory elements in DNA sequence.

Authors: Debraj GuhaThakurta
Journal: Nucleic Acids Res Date: 2006-07-19 Impact factor: 16.971
