
MISA-web: a web server for microsatellite prediction.

Sebastian Beier, Thomas Thiel, Thomas Münch, Uwe Scholz, Martin Mascher

Abstract

MOTIVATION: Microsatellites are a widely used marker system in plant genetics and forensics. The development of reliable microsatellite markers from resequencing data is challenging.
RESULTS: We extended MISA, a computational tool assisting the development of microsatellite markers, and reimplemented it as a web-based application. We improved compound microsatellite detection and added the possibility to display and export MISA results in GFF3 format for downstream analysis.
AVAILABILITY AND IMPLEMENTATION: MISA-web can be accessed at http://misaweb.ipk-gatersleben.de/. The website provides tutorials, usage notes and download links for the source code. CONTACT: scholz@ipk-gatersleben.de.
© The Author(s) 2017. Published by Oxford University Press.

Year:  2017        PMID: 28398459      PMCID: PMC5870701          DOI: 10.1093/bioinformatics/btx198

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803


1 Introduction

Microsatellites emerged as a genetic marker system about 25 years ago (Tautz and Schlötterer, 1994) and remain commonly used in plant genetics and breeding (Miah ; Matthies ) and in forensics (Butler, 2005), where they are commonly referred to as simple sequence repeats (SSRs) or short tandem repeats (STRs), respectively. The basic building block of a microsatellite is a short sequence motif (usually one to six base pairs long) that is repeated in tandem. These characteristic features can be detected by in silico analysis of nucleotide sequences obtained by traditional Sanger sequencing or high-throughput resequencing. The MISA microsatellite finder (Thiel ) is a tool for finding microsatellites in nucleotide sequences. In addition to perfect microsatellites, MISA also detects perfect compound microsatellites, which are composed of multiple occurrences of more than one simple sequence motif.

MISA has been widely used over the past ten years, during which two major limitations have become evident. First, the current MISA implementation requires computational expertise and access to a UNIX environment to run the PERL script and to process the results for most downstream applications. Second, the MISA output summarizes the identified microsatellites in a proprietary format that cannot easily be parsed for downstream analysis. By contrast, the Generic Feature Format Version 3 (GFF3, https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md) is a commonly used format in genomic data analysis: a tabular format that lists features in nucleotide sequences and classifies them with an ontology. Here, we present MISA-web, an extension of the command-line tool MISA embedded in an easy-to-use web-based graphical user interface, available at http://misaweb.ipk-gatersleben.de/.
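The detection principle can be illustrated with a short sketch. The Python code below is not MISA's PERL implementation: the function names, the greedy left-to-right scan and the rule of keeping the longest repeat found at a given start position are assumptions chosen for illustration. It only shows how a per-unit-size minimum repeat count, as used in MISA's SSR search parameters, turns the definition of a perfect microsatellite into a search.

```python
from typing import Dict, List, NamedTuple


class SSR(NamedTuple):
    motif: str
    repeats: int
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a tandem repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_perfect_ssrs(seq: str, min_repeats: Dict[int, int]) -> List[SSR]:
    """Left-to-right scan for perfect SSRs (illustrative sketch, not MISA.pl).

    min_repeats maps a unit size (motif length in bp) to the minimum
    number of tandem copies required for that unit size.
    """
    seq = seq.upper()
    hits: List[SSR] = []
    i = 0
    while i < len(seq):
        best = None
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            # Skip incomplete motifs, ambiguous bases and non-primitive motifs.
            if len(motif) < k or not set(motif) <= set("ACGT") or not is_primitive(motif):
                continue
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            span = reps * k
            # Keep the longest qualifying repeat that starts at position i.
            if reps >= min_repeats[k] and (best is None or span > best.end - best.start + 1):
                best = SSR(motif, reps, i + 1, i + span)
        if best is None:
            i += 1
        else:
            hits.append(best)
            i = best.end  # 0-based index of the first base after the SSR
    return hits


if __name__ == "__main__":
    example = "GG" + "AT" * 6 + "CC" + "A" * 10 + "G"
    print(find_perfect_ssrs(example, {k: 5 for k in range(1, 7)}))
    # -> [SSR(motif='AT', repeats=6, start=3, end=14), SSR(motif='A', repeats=10, start=17, end=26)]
```

Rejecting non-primitive motifs such as 'ATAT' keeps a run like (AT)6 from also being reported as a tetranucleotide repeat, and restricting motifs to A, C, G and T skips stretches of ambiguous 'N' bases.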

2 Materials and methods

2.1 Workflow and implementation

A microsatellite analysis with the command-line version of MISA requires two input files: (i) a configuration file (‘MISA.ini’) with three input parameters, namely ‘SSR search parameters’, ‘compound SSR search parameter’ and ‘output file type parameter’; and (ii) a FASTA file containing the nucleotide sequences to be mined for microsatellites. MISA-web runs on a standard Linux server and works in conjunction with several helper scripts and programs in addition to the core MISA PERL script. The implemented workflow is outlined as follows. PHP and UNIX shell scripts run periodically to monitor server load and schedule the execution of MISA analysis requests submitted by users of the web site. Entries from the input fields of the web form are compiled into the two input files: the nucleotide sequences are combined into a single FASTA file (.fasta), and the remaining entry fields are written to the MISA.ini file. If the user specifies no parameters, the preset default parameters shown on the web site are used. After the input variables have been converted, the core PERL script MISA.pl is called. Upon its successful termination, the result files are compressed with UNIX gzip and the archive is sent to a user-specified email address. A typical workflow is presented in Figure 1.
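The conversion of web-form entries into the two MISA input files, followed by compression of the results, could look roughly like the Python sketch below. Scheduling and the actual call to MISA.pl are omitted. The first two MISA.ini lines are modeled on the configuration format of the original MISA script as we understand it; the `output_type` key, the default values and the function names `compile_inputs` and `archive_results` are hypothetical. A gzip-compressed tar archive built with Python's `tarfile` stands in for the UNIX gzip step.

```python
import tarfile
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical defaults, modeled on the example MISA.ini of the original script.
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
DEFAULT_MAX_INTERRUPTION = 100


def compile_inputs(
    sequences: Dict[str, str],
    min_repeats: Optional[Dict[int, int]] = None,
    max_interruption: Optional[int] = None,
    output_type: str = "misa",
) -> Tuple[str, str]:
    """Turn web-form entries into the contents of the FASTA and MISA.ini files."""
    min_repeats = min_repeats or DEFAULT_MIN_REPEATS
    if max_interruption is None:  # 0 is a valid user choice, so test for None
        max_interruption = DEFAULT_MAX_INTERRUPTION

    # All submitted sequences go into a single multi-FASTA file (60 bp per line).
    lines = []
    for seq_id, seq in sequences.items():
        seq = "".join(seq.split()).upper()  # drop whitespace pasted into the form
        lines.append(f">{seq_id}")
        lines.extend(seq[i:i + 60] for i in range(0, len(seq), 60))
    fasta = "\n".join(lines) + "\n"

    definition = " ".join(f"{k}-{r}" for k, r in sorted(min_repeats.items()))
    ini = (
        f"definition(unit_size,min_repeats): {definition}\n"
        f"interruptions(max_difference_between_2_SSRs): {max_interruption}\n"
        f"output_type: {output_type}\n"  # hypothetical key for the output format
    )
    return fasta, ini


def archive_results(result_files: Iterable[Path], archive: Path) -> Path:
    """Bundle the result files into one gzip-compressed tar archive."""
    with tarfile.open(archive, "w:gz") as tar:
        for path in result_files:
            tar.add(path, arcname=path.name)
    return archive
```

Checking `max_interruption` against `None` rather than relying on truthiness keeps a user-supplied value of 0 from being silently replaced by the default.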
Fig. 1

MISA-web analysis workflow. MISA-web was updated and set up as a web application on the IPK server. Users may either paste their nucleotide sequences of interest into the input fields of MISA-web or supply accession numbers to have the corresponding sequences fetched from NCBI (1). Once all input fields have been filled in (2), clicking the start button at the bottom of the page starts the analysis. The computation is conducted on a compute server (3) and the result files are sent to a user-specified email address (4). The result files can then be examined (5).

MISA-web can retrieve sequences from the NCBI database when the corresponding accession numbers are entered in the input field. MISA-web then communicates with the NCBI servers using PHP (www.php.net) and jQuery (www.jquery.com), downloads the sequences and displays them in FASTA format in the text box. A comma-separated list of accession numbers can be entered to retrieve multiple sequences at once (up to a maximum sequence length of 2 Mb).
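Accession-based retrieval can be sketched as building a single request for all accession numbers entered in the field. The paper does not specify which NCBI interface MISA-web uses, so the E-utilities `efetch` endpoint below is an assumption, as is applying the 2 Mb limit to the combined length of the returned sequences. The helper names are hypothetical, and Python stands in for the PHP/jQuery code of MISA-web.

```python
from typing import List
from urllib.parse import urlencode

# Assumption: the public NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
MAX_TOTAL_BP = 2_000_000  # 2 Mb, applied here to the combined length (assumption)


def parse_accessions(field: str) -> List[str]:
    """Split a comma-separated accession field, dropping blanks and duplicates."""
    accessions: List[str] = []
    for acc in field.split(","):
        acc = acc.strip()
        if acc and acc not in accessions:
            accessions.append(acc)
    return accessions


def efetch_request_url(accessions: List[str]) -> str:
    """Build one request that returns all accessions as FASTA text."""
    params = {"db": "nuccore", "id": ",".join(accessions),
              "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def check_total_length(fasta_text: str, limit: int = MAX_TOTAL_BP) -> int:
    """Return the number of sequence characters; raise if above the limit."""
    total = sum(len(line.strip()) for line in fasta_text.splitlines()
                if not line.startswith(">"))
    if total > limit:
        raise ValueError(f"{total} bp exceeds the {limit} bp limit")
    return total
```

In practice, the text returned by `urllib.request.urlopen(efetch_request_url(accessions))` would be decoded and passed to `check_total_length` before being shown in the text box.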

2.2 Output formats

MISA-web supports two different output formats: the proprietary MISA output format and generic GFF3.

3 Validation

To compare the performance of MISA-web with that of other tools, we analyzed ten previously published sequence assemblies of barley bacterial artificial chromosomes (BACs) (Muñoz-Amatriaín ). The assemblies (accession numbers: AC256511.1, AC269605.1, AC265197.1, AC263353.1, AC264961.1, AC266636.1, AC261250.1, AC267178.1, AC259365.1, AC257258.1) were retrieved from the NCBI database. A total of 6,022 microsatellites were identified with the following parameters: motif length 1 to 6; minimum of 5 repeats; 0 base pairs between two microsatellites for compound SSR detection. Almost all of these microsatellites (98%) are simple mononucleotide microsatellites, while 0.16% and 0.03% were di- and trinucleotide microsatellites, respectively. Only two tetranucleotide microsatellites were found. We evaluated seven other microsatellite detection tools on the same BAC dataset: GMATo (Wang ), IMEx (Mudunuri and Nagarajaram, 2007), mreps (Kolpakov ), ProGeRF (Lopes ), SciRoKo (Kofler ), TRF (Benson, 1999) and TROLL (Castelo ). IMEx, TRF and ProGeRF are accessible as web applications. For all tools, we disabled compound microsatellite detection and used motif lengths between 1 and 6 with a minimum of 5 repeats for all motif lengths. Where possible, we also turned off imperfect microsatellite detection (Table 1).
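The compound SSR search parameter can be read as the maximum number of bases allowed between two neighboring microsatellites; with the setting of 0 bp used here, only directly adjacent microsatellites are joined. A minimal sketch of that merging step, assuming perfect microsatellites have already been found and are given with 1-based, inclusive coordinates, is shown below. The type labels (`p1` to `p6`, `c`), the lower-case notation for interrupting bases and the function names are modeled on MISA-style output but should be treated as illustrative.

```python
from typing import List, NamedTuple, Tuple


class SSR(NamedTuple):
    motif: str
    repeats: int
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def _describe(seq: str, group: List[SSR]) -> Tuple[str, str, int, int]:
    """Label a run of SSRs; interrupting bases are written in lower case."""
    parts = []
    for prev, cur in zip([None] + group[:-1], group):
        if prev is not None:
            # Bases strictly between the two SSRs (empty if they are adjacent).
            parts.append(seq[prev.end:cur.start - 1].lower())
        parts.append(f"({cur.motif}){cur.repeats}")
    kind = "c" if len(group) > 1 else f"p{len(group[0].motif)}"
    return kind, "".join(parts), group[0].start, group[-1].end


def merge_compound(seq: str, ssrs: List[SSR], max_gap: int = 0) -> List[Tuple[str, str, int, int]]:
    """Join SSRs separated by at most max_gap bases into compound SSRs."""
    merged = []
    group: List[SSR] = []
    for ssr in sorted(ssrs, key=lambda s: s.start):
        if group and ssr.start - group[-1].end - 1 <= max_gap:
            group.append(ssr)
        else:
            if group:
                merged.append(_describe(seq, group))
            group = [ssr]
    if group:
        merged.append(_describe(seq, group))
    return merged
```

Setting `max_gap` to 0 reproduces the strict adjacency rule of the validation run; larger values allow short interruptions, which then appear in lower case between the repeat units.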
Table 1.

Comparison of detected microsatellites and execution time (in seconds) of GMATo, TRF, TROLL, mreps, SciRoKo, ProGeRF and MISA-web

Sequence                         GMATo      TRF    TROLL    mreps  SciRoKo  ProGeRF MISA-web
AC256511.1 (113 kb)                549      580     1506       56      549      560      549
AC257258.1 (124 kb)                938      943     1965       85      938      901      938
AC259365.1 (118 kb)                641      666     1584       76      641      628      641
AC261250.1 (91 kb)                 498      457     1166       60      498      456      498
AC263353.1 (33 kb)                 153      173      413        -      153      142      153
AC264961.1 (126 kb)                654      620     1641        -      654      605      654
AC265197.1 (113 kb)                505      496     1407       44      505      503      505
AC266636.1 (167 kb)                839      865     2174       79      839      811      839
AC267178.1 (121 kb)                517      530     1524       46      516      496      517
AC269605.1 (119 kb)                728      676     1711       76      728      700      728
Sum                               6022     6006    15091      522     6021     5802     6022
Execution time per batch (s)     7.498   30.735    1.042    1.286    0.643   20.994    1.796

Dashes mark BACs for which mreps reported no result.
The tool IMEx failed with errors during execution because of operating system incompatibility, as previously reported (Lopes ). The programs mreps and TROLL required the plain nucleotide sequence without a FASTA header. Apart from TROLL and mreps, all tools found about 6,000 microsatellites in the ten BAC sequences. TROLL detected more than 15,000 microsatellites because it also reports degenerate (imperfect) microsatellites by default. mreps detected the fewest SSRs because a hard-coded minimum output sequence length prevented the identification of short microsatellites. mreps did not report results for BACs AC263353.1 and AC264961.1 because of an excessive number of ‘N’ characters in their sequences. TRF reported spurious microsatellites because it substitutes ‘N’ bases with random nucleotides, which in turn increased the number of reported microsatellites. To obtain comparable results for TRF, the user needs to manually remove every microsatellite that includes at least one ‘N’ character. Among the tools evaluated here, only ProGeRF is able to detect microsatellites in protein sequences. The execution time of MISA-web (1.796 s per batch) is comparable to that of the other tools. SciRoKo and TRF were the fastest and slowest programs, respectively.
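The manual TRF clean-up described above can be scripted. Because TRF replaces ‘N’ bases with random nucleotides, the reported repeat itself may not reveal the problem, so the check has to be made against the original sequence. A minimal sketch, assuming TRF hits have already been parsed into 1-based, inclusive (start, end) pairs; parsing TRF's own output is not shown, and the function name is hypothetical:

```python
from typing import List, Tuple


def drop_hits_with_n(seq: str, hits: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep only repeats whose span in the original sequence contains no 'N'.

    hits are (start, end) pairs, 1-based and inclusive; lower-case 'n' is
    treated the same as 'N'.
    """
    return [(start, end) for start, end in hits
            if "N" not in seq[start - 1:end].upper()]
```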

4 Conclusion

We developed the web application MISA-web as an extension of the microsatellite finder MISA, with a user-friendly GUI and improved output formatting options. The GFF3 output format facilitates the integration of MISA-web search results into downstream analysis pipelines.
References

1.  TROLL--tandem repeat occurrence locator.

Authors:  Adalberto T Castelo; Wellington Martins; Guang R Gao
Journal:  Bioinformatics       Date:  2002-04

2.  mreps: Efficient and flexible detection of tandem repeats in DNA.

Authors:  Roman Kolpakov; Ghizlane Bana; Gregory Kucherov
Journal:  Nucleic Acids Res       Date:  2003-07-01

3.  IMEx: Imperfect Microsatellite Extractor.

Authors:  Suresh B Mudunuri; Hampapathalu A Nagarajaram
Journal:  Bioinformatics       Date:  2007-03-22

4.  SciRoKo: a new tool for whole genome microsatellite search and investigation.

Authors:  Robert Kofler; Christian Schlötterer; Tamas Lelley
Journal:  Bioinformatics       Date:  2007-04-26

5.  Tandem repeats finder: a program to analyze DNA sequences.

Authors:  G Benson
Journal:  Nucleic Acids Res       Date:  1999-01-15

6.  Simple sequences.

Authors:  D Tautz
Journal:  Curr Opin Genet Dev       Date:  1994-12

7.  ProGeRF: proteome and genome repeat finder utilizing a fast parallel hash function.

Authors:  Robson da Silva Lopes; Walas Jhony Lopes Moraes; Thiago de Souza Rodrigues; Daniella Castanheira Bartholomeu
Journal:  Biomed Res Int       Date:  2015-02-25

8.  GMATo: A novel tool for the identification and analysis of microsatellites in large genomes.

Authors:  Xuewen Wang; Peng Lu; Zhaopeng Luo
Journal:  Bioinformation       Date:  2013-06-08

9.  A review of microsatellite markers and their applications in rice breeding programs to improve blast disease resistance.

Authors:  Gous Miah; Mohd Y Rafii; Mohd R Ismail; Adam B Puteh; Harun A Rahim; Kh Nurul Islam; Mohammad Abdul Latif
Journal:  Int J Mol Sci       Date:  2013-11-14

10.  Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome.

Authors:  María Muñoz-Amatriaín; Stefano Lonardi; MingCheng Luo; Kavitha Madishetty; Jan T Svensson; Matthew J Moscou; Steve Wanamaker; Tao Jiang; Andris Kleinhofs; Gary J Muehlbauer; Roger P Wise; Nils Stein; Yaqin Ma; Edmundo Rodriguez; Dave Kudrna; Prasanna R Bhat; Shiaoman Chao; Pascal Condamine; Shane Heinen; Josh Resnik; Rod Wing; Heather N Witt; Matthew Alpert; Marco Beccuti; Serdar Bozdag; Francesca Cordero; Hamid Mirebrahim; Rachid Ounit; Yonghui Wu; Frank You; Jie Zheng; Hana Simková; Jaroslav Dolezel; Jane Grimwood; Jeremy Schmutz; Denisa Duma; Lothar Altschmied; Tom Blake; Phil Bregitzer; Laurel Cooper; Muharrem Dilbirligi; Anders Falk; Leila Feiz; Andreas Graner; Perry Gustafson; Patrick M Hayes; Peggy Lemaux; Jafar Mammadov; Timothy J Close
Journal:  Plant J       Date:  2015-09-21