iMotifs: an integrated sequence motif visualization and analysis environment.

Matias Piipari, Thomas A Down, Harpreet Saini, Anton Enright, Tim J P Hubbard.

Abstract

MOTIVATION: Short sequence motifs are an important class of models in molecular biology, used most commonly for describing transcription factor binding site specificity patterns. High-throughput methods have recently been developed for detecting regulatory factor binding sites in vivo and in vitro, and consequently high-quality binding site motif data are becoming available for an increasing number of organisms and regulatory factors. Development of intuitive tools for the study of sequence motifs is therefore important. iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces. The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies. Problems associated with local deployment of software are therefore avoided.

AVAILABILITY: iMotifs is licensed under the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source code are available at http://wiki.github.com/mz2/imotifs and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0 licensed library, libxms, for the Perl, Ruby, R and Objective-C programming languages for input and output of XMS formatted annotated sequence motif set files.

CONTACT: matias.piipari@gmail.com; imotifs@googlegroups.com.

Year:  2010        PMID: 20106815      PMCID: PMC2832821          DOI: 10.1093/bioinformatics/btq026

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Until recently, the systematic study of the sequence specificity of transcription factors was limited to a relatively small number of organisms and transcription factors. High-throughput protein–DNA interaction assays such as protein binding microarrays (Berger et al., 2006), bacterial one-hybrid screens (Meng et al., 2005) and large ChIP-chip studies, together with advances in motif inference algorithms and tools, have, however, caused an expansion of motif databases such as UniPROBE (Newburger and Bulyk, 2009), TRANSFAC (Matys et al., 2006) and JASPAR (Bryne et al., 2008). Sequence motif analysis tools can be hard to deploy and use locally. Many commonly used software packages have therefore been made available as web applications (Mahony and Benos, 2007; Thomas-Chollier et al., 2008). Public servers can, however, limit the CPU time given to users, which can rule out their use for large-scale studies. Data exchange and usability can also be a challenge. Therefore, we have created an OS X-based desktop software package for sequence motif analysis that is easy to install and update. Compared with previously published desktop-based cis-regulatory sequence analysis tools such as TOUCAN (Aerts et al., 2003) or Sockeye (Montgomery et al., 2004), iMotifs is more focused on visualization and computation of sequence motifs, although it also supports visualizing scored motif matches in sequences.

1.1 Features

iMotifs is designed for visualization and analysis of cis-regulatory motifs and sequences. It can be used to retrieve sequences (e.g. for a coregulated group of genes), infer cis-regulatory motifs from them, score sequences with motif models, visualize the motifs and their scored matches, and compare them against other motifs (Fig. 1 shows the core functionality). A tutorial for common tasks is included on the web site (see Availability). Motifs can be manipulated and moved between sets by dragging and dropping, and filtered using keyword searches. Summary statistics such as entropy, column count or distance from the closest pair can also be shown alongside the motifs. Free-form key-value pair metadata such as database identifiers, species or notes can be viewed and edited. PDF export and printing are available. Import and export of TRANSFAC formatted motif files is also possible.
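
As an illustration of the summary statistics mentioned above, the following is a minimal sketch of how per-motif entropy and column count could be computed for a motif stored as a list of columns, each mapping a nucleotide to its probability. The function names and the toy motif are hypothetical and are not taken from iMotifs; the article does not state which entropy definition the application uses, so plain Shannon entropy in bits is assumed here.

```python
import math


def column_entropy(column):
    """Shannon entropy, in bits, of one motif column (symbol -> probability)."""
    # Zero-probability symbols contribute nothing and are skipped to avoid log2(0).
    return sum(-p * math.log2(p) for p in column.values() if p > 0)


def motif_summary(columns):
    """Column count plus total and mean entropy for a motif (hypothetical helper)."""
    entropies = [column_entropy(c) for c in columns]
    return {
        "column_count": len(columns),
        "total_entropy": sum(entropies),
        "mean_entropy": sum(entropies) / len(columns),
    }


# Toy three-column motif, invented for illustration only.
motif = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},      # fully conserved: 0 bits
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative: 2 bits
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},      # two-way split: 1 bit
]

summary = motif_summary(motif)
print(summary)  # {'column_count': 3, 'total_entropy': 3.0, 'mean_entropy': 1.0}
```

Low entropy indicates strongly conserved columns, so a statistic of this kind can help to sort or filter a motif set by information content.
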
Fig. 1. iMotifs can present motif sets and alignments. It integrates with the OS X desktop's previewing functionality and includes a number of analysis tools including an integrated NestedMICA motif inference tool.

iMotifs can be used to retrieve sequences from the Ensembl database (Hubbard et al., 2009). The retrieved sequences can be aligned either to transcription start sites (putative promoter sequence) or ends (e.g. for micro-RNA seed finding), and they can be filtered by gene identifiers. The retrieval tool can fetch specific sequence regions using GFF formatted annotation files, and includes specific support for ranking and retrieving regions of interest based on ChIP-seq ‘peaks’: MACS (Zhang et al., 2008), FindPeaks (Fejes et al., 2008) and SWEMBL formats are supported. Sequences are optionally processed to mask repeats and translated sequence.

iMotifs supports the quick previewing and thumbnailing service native to OS X (QuickLook). Previewing is especially useful for browsing sequence motif sets stored remotely (e.g. on a remote cluster), as no manual transfer or file opening is needed. An automated software update mechanism is included.

Many common motif analysis tasks are supported. These include finding closest matching and reciprocally matching motif pairs between two motif sets with the distance metric and algorithm described in Down et al. (2007). Motif multiple alignments can be visualized and computed with a greedy gapless motif multiple alignment algorithm. Motif inference experiments can be run with the integrated NestedMICA (Down and Hubbard, 2005) tool simply by dragging FASTA formatted sequence files to iMotifs. Downstream analyses such as motif scanning, overrepresentation analysis and motif hit score cutoff assignment, as described in Down et al. (2007), are also possible. Analysis tasks run in parallel and do not block user interaction with the application.
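
The idea of reciprocally matching motif pairs can be sketched as follows. This is an illustration only: it does not implement the distance metric of Down et al. (2007). A simple stand-in is assumed instead, namely the summed squared difference between columns, minimized over gapless offsets, with overhanging columns compared against a uniform background. All names and the toy motifs are hypothetical.

```python
BACKGROUND = (0.25, 0.25, 0.25, 0.25)  # assumed uniform background, order A, C, G, T


def column_distance(c1, c2):
    """Squared Euclidean distance between two probability columns."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2))


def motif_distance(m1, m2):
    """Stand-in distance: best gapless alignment, overhang scored against background."""
    best = float("inf")
    for offset in range(-(len(m2) - 1), len(m1)):
        total = 0.0
        for i in range(min(0, offset), max(len(m1), offset + len(m2))):
            j = i - offset  # position in m2 aligned to position i of m1
            c1 = m1[i] if 0 <= i < len(m1) else BACKGROUND
            c2 = m2[j] if 0 <= j < len(m2) else BACKGROUND
            total += column_distance(c1, c2)
        best = min(best, total)
    return best


def closest(queries, targets):
    """Map each query motif name to the name of its nearest target motif."""
    return {
        name: min(targets, key=lambda t: motif_distance(motif, targets[t]))
        for name, motif in queries.items()
    }


def reciprocal_matches(set_a, set_b):
    """Pairs (a, b) where b is closest to a and a is closest to b."""
    a_to_b = closest(set_a, set_b)
    b_to_a = closest(set_b, set_a)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a[b] == a)


A, C, G, T = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
set_a = {"a1": [A, C], "a2": [G, G], "a3": [A, C, G]}
set_b = {"b1": [A, C, BACKGROUND], "b2": [G, G, T]}

pairs = reciprocal_matches(set_a, set_b)
print(pairs)  # [('a1', 'b1'), ('a2', 'b2')]
```

In this toy example a3 is also closest to b1, but b1 is closer to a1, so a3 has a closest match without having a reciprocal one.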

1.2 Interoperability

Although iMotifs itself works only on computers running Mac OS X, the analysis tools developed for and included in iMotifs are cross-platform (Java based) and depend only on libraries included with the package. Most analysis functions are implemented by stand-alone command-line programs. This makes it possible to rapidly integrate unmodified tools into iMotifs. The included analysis tools can also be run on any UNIX system without iMotifs. We feel that the use of a standard format for exchanging sequence motif data is beneficial for the research community, given the literally hundreds of motif inference tools and databases that are available [reviewed in Das and Dai (2007)]. To encourage the uptake of a standard file format for motifs, we provide a programming interface for the input and output of the annotated motif file format XMS for the Perl, Ruby, R and Objective-C languages. The Perl and R libraries can also be used to visualize sequence logos.
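
To give a feel for what reading an annotated motif set involves, here is a minimal sketch using only the Python standard library. It is not part of libxms, which targets Perl, Ruby, R and Objective-C. The XML layout in the sample (element and attribute names such as `motifset`, `weightmatrix`, `column`, `weight` and `prop`) is an assumption about the XMS format that is not specified in this article and should be checked against real files; the sample also omits any XML namespace a real file may declare, and the motif name, weights and species value are invented.

```python
import xml.etree.ElementTree as ET

# Assumed XMS-like layout; not taken from the article, verify against real files.
SAMPLE_XMS = """<motifset>
  <motif>
    <name>example_motif</name>
    <weightmatrix alphabet="DNA" columns="2">
      <column pos="0">
        <weight symbol="adenine">0.7</weight>
        <weight symbol="cytosine">0.1</weight>
        <weight symbol="guanine">0.1</weight>
        <weight symbol="thymine">0.1</weight>
      </column>
      <column pos="1">
        <weight symbol="adenine">0.25</weight>
        <weight symbol="cytosine">0.25</weight>
        <weight symbol="guanine">0.25</weight>
        <weight symbol="thymine">0.25</weight>
      </column>
    </weightmatrix>
    <prop><key>species</key><value>Drosophila melanogaster</value></prop>
  </motif>
</motifset>"""


def read_motif_set(xml_text):
    """Parse the assumed layout into a list of plain dictionaries."""
    motifs = []
    for motif in ET.fromstring(xml_text).iter("motif"):
        # Order columns by their declared position rather than document order.
        ordered = sorted(motif.iter("column"), key=lambda c: int(c.get("pos")))
        columns = [
            {w.get("symbol"): float(w.text) for w in column.iter("weight")}
            for column in ordered
        ]
        # Free-form key-value annotations attached to the motif.
        props = {p.findtext("key"): p.findtext("value") for p in motif.iter("prop")}
        motifs.append(
            {"name": motif.findtext("name"), "columns": columns, "props": props}
        )
    return motifs


motifs = read_motif_set(SAMPLE_XMS)
print(motifs[0]["name"], len(motifs[0]["columns"]), motifs[0]["props"])
```

Keeping the weight matrix and the key-value annotations together in one record mirrors the article's description of XMS as an annotated motif set format.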

2 CONCLUSION

We have created an integrated desktop application for short sequence motif analysis. It incorporates visualization, inference, alignment and comparison tools. The application widens the user base of sequence motif analysis tools and can improve the productivity of researchers working with sequence motif data. We aim to integrate with more sequence motif analysis tools and web services and to develop further the already included basic protein motif visualization and inference support. We also encourage the introduction of a standard format for exchange of sequence motif data by providing conversion utilities and an API for input and output of XMS motif set files for a number of common bioinformatics programming languages.

REFERENCES (10 of 15 listed)

1.  A bacterial one-hybrid system for determining the DNA-binding specificity of transcription factors.

Authors:  Xiangdong Meng; Michael H Brodsky; Scot A Wolfe
Journal:  Nat Biotechnol       Date:  2005-07-24       Impact factor: 54.908

2.  Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities.

Authors:  Michael F Berger; Anthony A Philippakis; Aaron M Qureshi; Fangxue S He; Preston W Estep; Martha L Bulyk
Journal:  Nat Biotechnol       Date:  2006-09-24       Impact factor: 54.908

3.  Toucan: deciphering the cis-regulatory logic of coregulated genes.

Authors:  Stein Aerts; Gert Thijs; Bert Coessens; Mik Staes; Yves Moreau; Bart De Moor
Journal:  Nucleic Acids Res       Date:  2003-03-15       Impact factor: 16.971

4.  TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes.

Authors:  V Matys; O V Kel-Margoulis; E Fricke; I Liebich; S Land; A Barre-Dirrie; I Reuter; D Chekmenev; M Krull; K Hornischer; N Voss; P Stegmaier; B Lewicki-Potapov; H Saxel; A E Kel; E Wingender
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

5.  Large-scale discovery of promoter motifs in Drosophila melanogaster.

Authors:  Thomas A Down; Casey M Bergman; Jing Su; Tim J P Hubbard
Journal:  PLoS Comput Biol       Date:  2006-12-05       Impact factor: 4.475

6.  NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence.

Authors:  Thomas A Down; Tim J P Hubbard
Journal:  Nucleic Acids Res       Date:  2005-03-10       Impact factor: 16.971

7.  Ensembl 2009.

Authors:  T J P Hubbard; B L Aken; S Ayling; B Ballester; K Beal; E Bragin; S Brent; Y Chen; P Clapham; L Clarke; G Coates; S Fairley; S Fitzgerald; J Fernandez-Banet; L Gordon; S Graf; S Haider; M Hammond; R Holland; K Howe; A Jenkinson; N Johnson; A Kahari; D Keefe; S Keenan; R Kinsella; F Kokocinski; E Kulesha; D Lawson; I Longden; K Megy; P Meidl; B Overduin; A Parker; B Pritchard; D Rios; M Schuster; G Slater; D Smedley; W Spooner; G Spudich; S Trevanion; A Vilella; J Vogel; S White; S Wilder; A Zadissa; E Birney; F Cunningham; V Curwen; R Durbin; X M Fernandez-Suarez; J Herrero; A Kasprzyk; G Proctor; J Smith; S Searle; P Flicek
Journal:  Nucleic Acids Res       Date:  2008-11-25       Impact factor: 16.971

8.  JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update.

Authors:  Jan Christian Bryne; Eivind Valen; Man-Hung Eric Tang; Troels Marstrand; Ole Winther; Isabelle da Piedade; Anders Krogh; Boris Lenhard; Albin Sandelin
Journal:  Nucleic Acids Res       Date:  2007-11-15       Impact factor: 16.971

9.  STAMP: a web tool for exploring DNA-binding motif similarities.

Authors:  Shaun Mahony; Panayiotis V Benos
Journal:  Nucleic Acids Res       Date:  2007-05-03       Impact factor: 16.971

10.  A survey of DNA motif finding algorithms.

Authors:  Modan K Das; Ho-Kwok Dai
Journal:  BMC Bioinformatics       Date:  2007-11-01       Impact factor: 3.169

