
G-IMEx: A comprehensive software tool for detection of microsatellites from genome sequences.

Suresh B Mudunuri, Pankaj Kumar, Allam Appa Rao, S Pallamsetty, H A Nagarajaram.   

Abstract

Microsatellites are ubiquitous short tandem repeats found in all known genomes. They play an important role in DNA fingerprinting, paternity studies, evolutionary studies, and the virulence and adaptation of certain bacteria and viruses. With the sequencing of several genomes and the availability of enormous amounts of sequence data over the past few years, computational studies of microsatellites have become of interest to many researchers. In this context, we developed a software tool called Imperfect Microsatellite Extractor (IMEx) to extract perfect, imperfect and compound microsatellites from genome sequences along with their complete statistics. We recently developed a user-friendly graphical interface for IMEx in Java so that it can be used as stand-alone software, named G-IMEx. G-IMEx takes a nucleotide sequence as input and produces results in both HTML and text formats. The Linux version of G-IMEx can be downloaded for free from http://www.cdfd.org.in/imex.


Keywords:  Bioinformatics Tool; Genomes; Microsatellites; Simple Sequence Repeats; Stand-alone program

Year:  2010        PMID: 21364802      PMCID: PMC3040503          DOI: 10.6026/97320630005221

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Microsatellites, also known as Simple Sequence Repeats (SSRs) or Short Tandem Repeats (STRs), are tandem repetitions of a nucleotide motif of size 1-6 bp. They are distributed in both coding and non-coding regions of all known genomes. Because of their polymorphic nature, they are known to play an important role in gene regulation, pathogenesis, bacterial adaptation and the evolution of genomes [1-5]. They are also applied in fields such as DNA fingerprinting, paternity studies, forensics and evolutionary studies. As new genomes are sequenced at an increasing pace, the microsatellites of many genomes remain unexplored. Analysis of these microsatellites is important for understanding their role in these processes. Computational analysis is a faster and cheaper alternative to traditional wet-lab microsatellite studies, which are time-consuming and expensive. What is needed is a software tool that extracts all types of microsatellites with greater sensitivity and provides flexible options for analyzing the detected repeats. A few tools [6-9] exist in the public domain for extracting microsatellites from genome sequences, but many of them have shortcomings in terms of features and efficiency. In the course of our studies on the evolution of microsatellites in prokaryotic genomes, we developed a novel algorithm [10] to detect imperfect microsatellites in nucleotide sequences. The algorithm has been implemented as stand-alone software with a user-friendly graphical user interface (GUI), called G-IMEx. The present communication describes this software.

Methodology

The algorithmic details of IMEx have been reported elsewhere [10]; for continuity, we summarize the method here. IMEx scans the input sequence for two consecutive exact repeat units, or two alternate exact repeat units, and treats them as a ‘candidate’ microsatellite repeat tract. The candidate tract is then expanded on both sides by allowing a few mismatches in each individual repeat unit (the imperfection limit per repeat unit, ‘k’), provided that the percentage of imperfection of the entire tract does not exceed the threshold set by the user. Expansion also terminates when a repeat unit with more than ‘k’ mismatches is encountered. The program then collates and clusters equivalent microsatellite repeats into families. It also has an option to identify compound microsatellites, which are regions containing more than one microsatellite tract separated by no more than a user-defined distance.
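
The seed-and-extend idea described above can be illustrated with a minimal Python sketch. This is not the IMEx implementation: it handles substitutions only (no insertions or deletions), seeds only on two consecutive exact units (not alternate units), and works with one motif size at a time. The function names (`find_tracts`, `extend_tract`, `is_primitive`) and the default values for `k`, `max_pct` and `min_units` are assumptions made for illustration.

```python
def count_mismatches(unit: str, motif: str) -> int:
    """Count positions where a repeat unit differs from the motif."""
    return sum(a != b for a, b in zip(unit, motif))


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit."""
    return motif not in (motif + motif)[1:-1]


def extend_tract(seq: str, seed_start: int, m: int, k: int, max_pct: float):
    """Extend a seed of two exact units; return (start, end, imperfection %)."""
    motif = seq[seed_start:seed_start + m]
    left, right = seed_start, seed_start + 2 * m
    mismatches = 0

    # Extend to the right, one repeat unit at a time.
    while right + m <= len(seq):
        d = count_mismatches(seq[right:right + m], motif)
        # Stop if the unit exceeds k, or the whole tract would exceed max_pct.
        if d > k or 100.0 * (mismatches + d) / (right + m - left) > max_pct:
            break
        mismatches += d
        right += m

    # Extend to the left under the same two conditions.
    while left - m >= 0:
        d = count_mismatches(seq[left - m:left], motif)
        if d > k or 100.0 * (mismatches + d) / (right - left + m) > max_pct:
            break
        mismatches += d
        left -= m

    return left, right, 100.0 * mismatches / (right - left)


def find_tracts(seq: str, m: int, k: int = 1, max_pct: float = 10.0,
                min_units: int = 3):
    """Report (start, end, motif, imperfection %) for motif size m (end exclusive)."""
    tracts = []
    i = 0
    while i + 2 * m <= len(seq):
        motif = seq[i:i + m]
        # Seed: two consecutive exact copies of a primitive motif.
        if is_primitive(motif) and seq[i + m:i + 2 * m] == motif:
            start, end, pct = extend_tract(seq, i, m, k, max_pct)
            if (end - start) // m >= min_units:
                tracts.append((start, end, motif, pct))
                i = end  # continue scanning after the reported tract
                continue
        i += 1
    return tracts


print(find_tracts("TTGACACACAGACACGTT", 2, k=1, max_pct=15.0))
```

For the example sequence, the sketch reports a single AC tract covering positions 3 to 15 (0-based, end exclusive) that contains one mismatch in 12 bases. Note that with a 10% threshold the extension would stop before the imperfect unit AG, because accepting it would momentarily raise the tract imperfection to 12.5%; how the actual program treats such cases is described in [10].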

Software Requirements

G-IMEx has been developed on the Linux platform and requires a preinstalled C compiler and Java (for the graphical interface). An ideal environment for running G-IMEx is a recent Fedora or other Linux distribution with the gcc compiler (version 3.4 or higher), Java (version 1.6 or higher) and any web browser.

Input options

G-IMEx offers several options for identification, extraction, collation, clustering and reporting of microsatellites from an input DNA sequence in FASTA format. The software handles large sequences such as whole genomes easily and is faster than many other tools. Users can set the limits for repeat size, repeat number, repeat type and imperfection level. In addition, users can set levels (0 to 4) for clustering equivalent microsatellites, and can choose to detect compound microsatellites, i.e., microsatellites that lie close to each other in the sequence. There is also an option to run the core IMEx program in batch mode to scan multiple sequences.
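
As an illustration of the compound-microsatellite option, the following Python sketch groups tracts whose separation does not exceed a user-defined distance. It is a simplified reading of the description above, not the G-IMEx code; the function name `group_compound`, the parameter `max_gap` and the tuple layout `(start, end)` with an exclusive end are assumptions.

```python
def group_compound(tracts, max_gap):
    """Group (start, end) tracts separated by at most max_gap bases.

    Only groups with more than one tract are returned, since a compound
    microsatellite contains at least two tracts.
    """
    groups = []
    current = []
    for tract in sorted(tracts):
        # Gap between this tract's start and the previous tract's end.
        if current and tract[0] - current[-1][1] <= max_gap:
            current.append(tract)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [tract]
    if len(current) > 1:
        groups.append(current)
    return groups


tracts = [(10, 30), (35, 50), (200, 220), (222, 240), (245, 260)]
print(group_compound(tracts, max_gap=10))
```

With `max_gap=10`, the five example tracts form two compound regions, one with two tracts and one with three; lowering `max_gap` to 3 leaves only the pair `(200, 220)` and `(222, 240)`.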

Output options

G-IMEx creates a folder named after the input sequence file and stores the results in two formats, HTML and text. The text output is optional, and separate directories are created for the text and HTML results. The output includes a well-formatted summary table with information such as the repeating motif, repeat number, imperfection %, tract size, nucleotide composition and, if the tract falls in a coding region, protein information. For each extracted microsatellite, an alignment with its perfect repeat counterpart is also produced automatically in a separate alignment file, which facilitates analysis of mutational events in the tract. Figure 1 shows a snapshot of the GUI and the result pages of G-IMEx.
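
To make the idea of aligning a tract with its perfect repeat counterpart concrete, here is a small Python sketch. It assumes substitutions only, so the tract and its perfect counterpart have the same length; the actual alignment file format produced by G-IMEx is not reproduced here, and the function name `align_to_perfect` and the `|` / `*` markers are illustrative choices.

```python
def align_to_perfect(tract: str, motif: str):
    """Return (tract, marker line, perfect counterpart) for a tract.

    The perfect counterpart repeats the motif over the full tract length;
    '|' marks a matching position and '*' marks a substitution.
    """
    repeats = len(tract) // len(motif) + 1
    perfect = (motif * repeats)[:len(tract)]
    marks = "".join("|" if a == b else "*" for a, b in zip(tract, perfect))
    return tract, marks, perfect


for line in align_to_perfect("ACACACAGACAC", "AC"):
    print(line)
```

The printed marker line shows a single `*` at the eighth base, where G replaces the expected C, which is the kind of mutational event such an alignment is meant to expose.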

Future Work

The current version of G-IMEx is available only for Linux. Efforts are underway to develop versions compatible with Windows and Macintosh systems.
  9 in total

1.  IMEx: Imperfect Microsatellite Extractor.

Authors:  Suresh B Mudunuri; Hampapathalu A Nagarajaram
Journal:  Bioinformatics       Date:  2007-03-22       Impact factor: 6.937

2.  SciRoKo: a new tool for whole genome microsatellite search and investigation.

Authors:  Robert Kofler; Christian Schlötterer; Tamas Lelley
Journal:  Bioinformatics       Date:  2007-04-26       Impact factor: 6.937

3.  Short-sequence DNA repeats in prokaryotic genomes. (Review)

Authors:  A van Belkum; S Scherer; L van Alphen; H Verbrugh
Journal:  Microbiol Mol Biol Rev       Date:  1998-06       Impact factor: 11.056

4.  Tandem repeats finder: a program to analyze DNA sequences.

Authors:  G Benson
Journal:  Nucleic Acids Res       Date:  1999-01-15       Impact factor: 16.971

5.  Microsatellite instability regulates transcription factor binding and gene expression.

Authors:  Patricia Martin; Katherine Makepeace; Stuart A Hill; Derek W Hood; E Richard Moxon
Journal:  Proc Natl Acad Sci U S A       Date:  2005-02-22       Impact factor: 11.205

6.  Adaptive evolution of highly mutable loci in pathogenic bacteria. (Review)

Authors:  E R Moxon; P B Rainey; M A Nowak; R E Lenski
Journal:  Curr Biol       Date:  1994-01-01       Impact factor: 10.834

7.  Microsatellite polymorphism across the M. tuberculosis and M. bovis genomes: implications on genome evolution and plasticity.

Authors:  Vattipally B Sreenu; Pankaj Kumar; Javaregowda Nagaraju; Hampapathalu A Nagarajaram
Journal:  BMC Genomics       Date:  2006-04-10       Impact factor: 3.969

8.  Nonrandom distribution and frequencies of genomic and EST-derived microsatellite markers in rice, wheat, and barley.

Authors:  Mauricio La Rota; Ramesh V Kantety; Ju-Kyung Yu; Mark E Sorrells
Journal:  BMC Genomics       Date:  2005-02-18       Impact factor: 3.969

9.  Survey of microsatellite clustering in eight fully sequenced species sheds light on the origin of compound microsatellites.

Authors:  Robert Kofler; Christian Schlötterer; Evita Luschützky; Tamas Lelley
Journal:  BMC Genomics       Date:  2008-12-17       Impact factor: 3.969

  6 in total

1.  ImtRDB: a database and software for mitochondrial imperfect interspersed repeats annotation.

Authors:  Viktor A Shamanskiy; Valeria N Timonina; Konstantin Yu Popadin; Konstantin V Gunbin
Journal:  BMC Genomics       Date:  2019-05-08       Impact factor: 3.969

2.  Use of 6 Nucleotide Length Words to Study the Complexity of Gene Sequences from Different Organisms.

Authors:  Eugene Korotkov; Konstantin Zaytsev; Alexey Fedorov
Journal:  Entropy (Basel)       Date:  2022-04-30       Impact factor: 2.738

3.  MfSAT: Detect simple sequence repeats in viral genomes.

Authors:  Ming Chen; Zhongyang Tan; Guangming Zeng
Journal:  Bioinformation       Date:  2011-05-07

4.  Database of Periodic DNA Regions in Major Genomes.

Authors:  Felix E Frenkel; Maria A Korotkova; Eugene V Korotkov
Journal:  Biomed Res Int       Date:  2017-01-15       Impact factor: 3.411

5.  MICdb3.0: a comprehensive resource of microsatellite repeats from prokaryotic genomes.

Authors:  Suresh B Mudunuri; Sujan Patnana; Hampapathalu A Nagarajaram
Journal:  Database (Oxford)       Date:  2014-02-17       Impact factor: 3.451

6.  Detection of Highly Divergent Tandem Repeats in the Rice Genome.

Authors:  Eugene V Korotkov; Anastasiya M Kamionskya; Maria A Korotkova
Journal:  Genes (Basel)       Date:  2021-03-25       Impact factor: 4.096

