SSRPrimer and SSR Taxonomy Tree: Biome SSR discovery.

Erica Jewell, Andrew Robinson, David Savage, Tim Erwin, Christopher G Love, Geraldine A C Lim, Xi Li, Jacqueline Batley, German C Spangenberg, David Edwards.

Abstract

Simple sequence repeat (SSR) molecular genetic markers have become important tools for a broad range of applications such as genome mapping and genetic diversity studies. SSRs are readily identified within DNA sequence data, and PCR primers can be designed for their amplification. These PCR primers frequently cross-amplify within related species. We report a web-based tool, SSRPrimer, that integrates SPUTNIK, an SSR repeat finder, with Primer3, a primer design program, within one pipeline. On submission of multiple FASTA-formatted sequences, the script screens each sequence for SSRs using SPUTNIK. Results are then passed to Primer3 for locus-specific primer design. We have applied this tool to the discovery of SSRs within the complete GenBank database and have designed PCR amplification primers for over 13 million SSRs. The SSR Taxonomy Tree server provides web-based searching and browsing of species and taxa for the visualisation and download of these SSR amplification primers. These tools are available at http://bioinformatics.pbcbasc.latrobe.edu.au/ssrdiscovery.html.

Year: 2006 PMID: 16845092 PMCID: PMC1538772 DOI: 10.1093/nar/gkl083

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Simple sequence repeats (SSRs), also known as microsatellites, have been shown to be one of the most powerful genetic markers in biology. They are common, readily identified DNA features consisting of short (1–6 bp), tandemly repeated sequences that are widely distributed throughout eukaryotic genomes (1), and they have been found in all prokaryotic and eukaryotic genomes analysed so far (2). SSRs are highly polymorphic, owing to mutations affecting the number of repeat units. This hypervariability among related organisms makes them informative markers for a wide range of applications including high-density genetic mapping, molecular tagging of genes, genotype identification, analysis of genetic diversity, paternity exclusion, phenotype mapping and marker-assisted selection of crop plants (3,4). SSRs were initially considered to be evolutionarily neutral (5), though recent evidence suggests an important role in genome evolution (6). SSRs are a source of abundant, non-deleterious mutations that provide variation in the face of stabilizing selection, and their recognized role in the process of evolutionary adaptation is predicted to increase as our knowledge of them expands (7). SSR stability may be correlated with overall levels of genomic stability (8), as mutations that affect SSR stability, such as those involved in DNA mismatch repair, can also influence genomic stability. The nature of SSRs gives them a number of advantages over other molecular markers: (i) multiple SSR alleles may be detected at a single locus using a simple PCR-based screen, (ii) SSRs are evenly distributed throughout the genome, (iii) they are co-dominant, (iv) very small quantities of DNA are required for screening, and (v) analysis may be semi-automated.

Furthermore, SSRs demonstrate a high degree of transferability between species, as PCR primers designed to an SSR within one species frequently amplify a corresponding locus in related species, making them excellent markers for comparative genetic and genomic analysis. The potential biological function and evolutionary relevance of SSRs are currently under scrutiny, leading to a greater understanding of genomes and genomics (9). Initial suggestions that the majority of DNA was either ‘junk’ or had no biological function are being challenged by the discovery of new functions for these sequences. Various functional roles have now been attributed to SSRs. For example, SSRs are believed to be involved in gene expression, regulation and function (7,10), and there are numerous lines of evidence suggesting that SSRs in noncoding regions may also be of functional significance (7). Furthermore, SSRs provide hotspots of recombination, a variety of SSRs have been found to bind nuclear proteins, and there is direct evidence that SSRs can function as transcriptional activating elements (11).

A common method for the discovery of SSR loci is to construct genomic DNA libraries enriched for SSR sequences, followed by DNA sequencing (12). The production of enriched libraries is time consuming, and the specific sequencing required is expensive. Where abundant sequence data are already available, it is more economical and efficient to use computational tools to identify SSR loci. Flanking DNA sequences may then be analysed for the presence of suitable forward and reverse PCR primers to assay the SSR loci. Several computational tools are currently available for the identification of SSRs within sequence data, as well as for the design of PCR primers suitable for the amplification of specific loci.

We have integrated two such tools within one package, SSRPrimer, enabling the simultaneous discovery of SSRs within bulk sequence data and the design of specific PCR primers for the amplification of these marker loci (13). An integrated web interface further permits the remote use of this tool. Sequences are initially passed to SPUTNIK (14), which uses a recursive algorithm to search for repeated patterns of nucleotides of length between 2 and 5. The output of SPUTNIK is then passed to Primer3 (15) for PCR primer design. Primers are designed to a defined set of constraints such as oligonucleotide melting temperature (Tm), size, GC content, primer-dimer possibilities, PCR product size and positional constraints around the SSR to identify the optimal forward and reverse primers for the SSR flanking region. The results of applying the package to the complete GenBank database are available through SSR Taxonomy Tree, which can be browsed and searched for SSRs and amplification primers for any species of interest.
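
As an illustration of the SSR screening step, the following Python sketch locates perfect tandem repeats with unit lengths of 2 to 5 bp using a regular expression. It is not SPUTNIK's recursive algorithm and does not tolerate imperfect repeats; the function name `find_ssrs` and the `min_repeats` threshold of 4 are assumptions made for this example only.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=5, min_repeats=4):
    """Return (start, end, unit, repeats) for each perfect tandem repeat.

    Coordinates are 0-based with an exclusive end. This is a simplified
    stand-in for an SSR finder, not a reimplementation of SPUTNIK.
    """
    seq = seq.upper()
    # Lazy unit match so the shortest repeating unit is preferred,
    # followed by at least (min_repeats - 1) further copies of that unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        # Skip homopolymer runs reported as e.g. "AA" repeats, since
        # only units of length 2 to 5 are of interest here.
        if len(set(unit)) == 1:
            continue
        repeats = len(match.group(0)) // len(unit)
        hits.append((match.start(), match.end(), unit, repeats))
    return hits


if __name__ == "__main__":
    example = "TTGAC" + "AG" * 6 + "CCTGA"
    for start, end, unit, repeats in find_ssrs(example):
        print(f"({unit}){repeats} at {start}-{end}")
```

The start and end positions returned here are the kind of SSR location that a pipeline would hand on to the primer design step.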

METHODS

SSRPrimer sequence input and pipeline processing

SSRPrimer is a web-based tool that may also be run on the command line. Access to the web server version requires an internet connection and a standard web browser. The web server version of SSRPrimer acts as a web interface and wrapper for the two programs, SPUTNIK and Primer3, that make up the SSR discovery pipeline (Figure 1). The complete pipeline accepts one or more DNA sequences as input along with PCR primer design options. Each entry sequence is processed in turn using SPUTNIK for the identification of SSRs. If an SSR is identified within a sequence, the sequence along with the SSR location is passed to Primer3 for PCR amplification primer design. Default parameters for PCR primer design are chosen to increase primer specificity. While these and additional options may be modified on the SSRPrimer submission page (Figure 2), the authors suggest maintaining these strict criteria to ensure robust PCR amplification.
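
The control flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SSRPrimer source: `read_fasta` and `run_pipeline` are hypothetical names, and the SSR finder and primer designer are passed in as plain callables standing in for SPUTNIK and Primer3.

```python
def read_fasta(text):
    """Yield (identifier, sequence) pairs from multi-FASTA text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            # Use the first word of the header as the identifier.
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def run_pipeline(fasta_text, find_ssrs, design_primers):
    """Screen each sequence in turn; design primers only where an SSR is found.

    find_ssrs(seq) -> iterable of (start, length) tuples   (SPUTNIK stand-in)
    design_primers(seq, start, length) -> any result       (Primer3 stand-in)
    """
    results = []
    for name, seq in read_fasta(fasta_text):
        for start, length in find_ssrs(seq):
            primers = design_primers(seq, start, length)
            results.append((name, start, length, primers))
    return results


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without external programs.
    def toy_finder(seq):
        pos = seq.find("ACACACAC")
        return [(pos, 8)] if pos >= 0 else []

    def toy_designer(seq, start, length):
        return "primers-not-designed-in-this-sketch"

    fasta = ">s1 example\nTTGG\nACACACACGG\n>s2\nGGGTTT\n"
    print(run_pipeline(fasta, toy_finder, toy_designer))
```

Passing the two stages in as callables keeps the wrapper independent of how the external programs are invoked, which this sketch does not attempt to reproduce.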

Figure 1

An overview of the SSRPrimer pipeline. Following entry of DNA sequences, each sequence is processed using SPUTNIK. If an SSR is identified, the sequence and SSR location are passed to Primer3 for the design of suitable PCR amplification primers.

Figure 2

The SSRPrimer web server. Sequences are pasted into the entry box and PCR primer parameters are specified (A). The identified SSRs are listed along with the designed PCR primers and amplification parameters (B).

SSR Taxonomy Tree

The SSR Taxonomy Tree server provides access to over 13 million SSR primer pairs identified through the application of SSRPrimer to the complete GenBank nucleotide sequence database (Figure 3). Under the default PCR primer design parameters, one primer pair was designed per SSR, with each primer at least 10 bp distant from either side of the identified SSR. The optimum primer size is 21 bases, with a maximum of 23 bases. The optimum Tm is 55°C, with a minimum of 50°C, a maximum of 70°C and a maximal difference in Tm of 20°C. The maximum GC content is 70%. Results include over 9.7 million, 1.8 million and 82 thousand SSR primer pairs designed from mammalian, plant and fungal species, respectively. The server permits the searching of taxa by both Latin and common names using standard MySQL Boolean operators and wild cards. Taxa may also be browsed through a hierarchical tree. Resulting lists of SSRs and PCR primers may be viewed or downloaded as a tab-delimited text file for input into a spreadsheet. Large files (>5 Mb) may be downloaded in compressed format. Comprehensive help pages include details of how to search, view and download SSRs. Additionally, a browser log details recent and planned data updates and server down time.
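
To make the default parameters concrete, the Python sketch below writes them out as a Primer3 Boulder-IO input record. It uses the tag names of current Primer3 2.x releases, which may differ from those of the Primer3 version used in the original pipeline. The helper name `boulder_record` is hypothetical, and treating the SSR plus 10 bp on each side as the `SEQUENCE_TARGET` is one interpretation of the flanking constraint, not a confirmed detail of SSRPrimer.

```python
def boulder_record(seq_id, template, ssr_start, ssr_len, flank=10):
    """Build a Primer3 Boulder-IO record for one SSR (0-based ssr_start).

    The target region is the SSR widened by `flank` bases on each side,
    clamped to the template, so that primers lie outside that region.
    """
    target_start = max(0, ssr_start - flank)
    target_end = min(len(template), ssr_start + ssr_len + flank)
    tags = [
        ("SEQUENCE_ID", seq_id),
        ("SEQUENCE_TEMPLATE", template),
        ("SEQUENCE_TARGET", f"{target_start},{target_end - target_start}"),
        ("PRIMER_FIRST_BASE_INDEX", 0),      # positions above are 0-based
        ("PRIMER_NUM_RETURN", 1),            # one primer pair per SSR
        ("PRIMER_OPT_SIZE", 21),
        ("PRIMER_MAX_SIZE", 23),
        ("PRIMER_OPT_TM", 55.0),
        ("PRIMER_MIN_TM", 50.0),
        ("PRIMER_MAX_TM", 70.0),
        ("PRIMER_PAIR_MAX_DIFF_TM", 20.0),
        ("PRIMER_MAX_GC", 70.0),
    ]
    # Each record is a list of TAG=VALUE lines terminated by a lone "=".
    return "\n".join(f"{tag}={value}" for tag, value in tags) + "\n=\n"


if __name__ == "__main__":
    print(boulder_record("example", "ACGT" * 25, ssr_start=40, ssr_len=12), end="")
```

A record of this form could be piped to `primer3_core` on standard input; parsing its output is outside the scope of this sketch.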

Figure 3

The SSR Taxonomy Tree server. A query (Rosaceae) is entered into the search box (A), identifying two matches (B). Clicking Rosaceae displays the taxonomic branches leading to the Rosaceae sub taxa and the presence of SSRs within sub taxa (C). Sub taxa may be browsed through Rosoideae to Fragaria (D), and the SSR primers identified for Fragaria sub taxa may be viewed and downloaded (E).

13 in total

1. Primer3 on the WWW for general users and for biologist programmers.

Authors: S Rozen; H Skaletsky
Journal: Methods Mol Biol Date: 2000

Review 2. Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review.

Authors: You-Chun Li; Abraham B Korol; Tzion Fahima; Avigdor Beiles; Eviatar Nevo
Journal: Mol Ecol Date: 2002-12 Impact factor: 6.185

3. Amplification of DNA markers from evolutionarily diverse genomes using single primers of simple-sequence repeats.

Authors: M Gupta; Y S Chyi; J Romero-Severson; J L Owen
Journal: Theor Appl Genet Date: 1994-12 Impact factor: 5.699

4. DNA microsatellites: agents of evolution?

Authors: E R Moxon; C Wills
Journal: Sci Am Date: 1999-01 Impact factor: 2.142

5. Microsatellite libraries enriched for several microsatellite sequences in plants.

Authors: K J Edwards; J H Barker; A Daly; C Jones; A Karp
Journal: Biotechniques Date: 1996-05 Impact factor: 1.993

Review 6. Simple sequence repeats as a source of quantitative genetic variation.

Authors: Y Kashi; D King; M Soller
Journal: Trends Genet Date: 1997-02 Impact factor: 11.639

7. Hypervariability of simple sequences as a general source for polymorphic DNA markers.

Authors: D Tautz
Journal: Nucleic Acids Res Date: 1989-08-25 Impact factor: 16.971

8. Differential distribution of simple sequence repeats in eukaryotic genome sequences.

Authors: M V Katti; P K Ranjekar; V S Gupta
Journal: Mol Biol Evol Date: 2001-07 Impact factor: 16.240

9. Rapid divergence of microsatellite abundance among species of Drosophila.

Authors: Charles L Ross; Kelly A Dyer; Tamar Erez; Susan J Miller; John Jaenike; Therese A Markow
Journal: Mol Biol Evol Date: 2003-05-30 Impact factor: 16.240

10. Genome-wide analysis of microsatellite repeats in humans: their abundance and density in specific genomic regions.

Authors: Subbaya Subramanian; Rakesh K Mishra; Lalji Singh
Journal: Genome Biol Date: 2003-01-23 Impact factor: 13.583
