InSatDb: a microsatellite database of fully sequenced insect genomes.

Sunil Archak, Eshwar Meduri, P Sravana Kumar, J Nagaraju.

Abstract

InSatDb presents an interactive interface to query information regarding microsatellite characteristics per se of five fully sequenced insect genomes (fruit fly, honeybee, malarial mosquito, red flour beetle and silkworm). InSatDb allows users to obtain microsatellites annotated with size (in base pairs and repeat units); genomic location (exon, intron, upstream or transposon); nature (perfect or imperfect); and sequence composition (repeat motif and GC%). One can access microsatellite cluster (compound repeat) information and a list of microsatellites with conserved flanking sequences (microsatellite families or paralogs). InSatDb is complete with information on the insects, web links for further details, methodology and a tutorial. A separate 'Analysis' section illustrates the comparative genomic analysis that can be carried out using the output. InSatDb is available at www.cdfd.org.in/insatdb.

Year:  2006        PMID: 17082205      PMCID: PMC1634736          DOI: 10.1093/nar/gkl778

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Microsatellites are simple sequence repeats (SSRs) that exhibit complex patterns in their frequency of occurrence, genomic distribution, mutability, function and evolution. Apart from being a source of informative genetic markers, microsatellites per se have attracted considerable attention with respect to their origin, distribution, expansion, mutation and disintegration (1–7). Questions have also been raised about the functional role and broader biological significance of microsatellites (4,8–12). Genetic studies and whole genome sequence analyses have established non-random distribution, variability and high mutability as characteristics of microsatellites. Evidence is accruing that supports a role for microsatellites in gene regulation, transcription and protein function (13). The existence of qualitative and quantitative differences between microsatellites of different genomes, and their role in adaptive evolution, have also been theorized (2,8). However, drawing functional conclusions from such studies requires information on the type (mono to hexa), motif (GC%), abundance (motif preferences), frequency, distribution (linkage group-wise and chromosomal position), location (exon, intron, regulatory element and transposon), nature (perfect, imperfect and compound) and copy number (existence of paralogs) of microsatellites, not only on a whole genome basis but also as a comparative analysis of multiple phylogenetically related genomes (for instance, fully sequenced primate, fungal or insect genomes). Insects exhibit the greatest genetic diversity on earth, which has long puzzled mankind. Biologists have relied on insects to unravel many fundamental tenets of biology. Whole genome sequences of insects have lived up to this reputation for diversity and have revealed immense variability in the size and organization of their genomes.
Among others, there are five fully sequenced insect genomes: Drosophila melanogaster (as a model organism it provides maximum annotated data), Anopheles gambiae (another Dipteran, but economically highly important as a malarial vector), Tribolium castaneum (Coleoptera, a relatively early insect order), Apis mellifera (Hymenoptera, a relatively recent insect order) and Bombyx mori (economically important as a silk-producing member of Lepidoptera, members of which are crop pests; also significant as a model for insect development). Researchers attempting to understand the biology and evolution of microsatellites are often faced with the following questions: (i) Do microsatellites occur everywhere in the genome? (ii) Does the length of microsatellites have any relationship with their frequency? (iii) Does the flanking sequence composition influence the origin of microsatellites? (iv) Does microsatellite size affect the rate of microsatellite disintegration? (v) Does the GC content of the motif affect the length, number of repeat units or mutation rate of microsatellites? (vi) Do genomes possess hotspots and islands of microsatellites? (vii) Is there any favoured association of microsatellites in compound repeats? (viii) Do microsatellites occur as families with common flanking sequences in the genomes (paralogs)? InSatDb, unlike many other microsatellite databases that cater only to the use of microsatellites as markers, allows users to address the above-mentioned questions by accessing the qualitative and quantitative genome-level microsatellite profile of a single insect, or by carrying out comparative genomic analysis using all five genomes.

METHODS

Drosophila melanogaster, A.mellifera, A.gambiae and T.castaneum sequences were downloaded from GenBank () and Bombyx mori sequences were downloaded from . Repeats were extracted employing Tandem Repeat Finder version 4 (14). To ensure that the extracted repeat sequences were real microsatellites, those with fewer than five repeat units and shorter than 15 bp in length were excluded. Tandem Repeat Finder does not employ a minimal alignment score for detecting microsatellites; rather, it uses a probabilistic model of random repeat sequences specified by per cent identity and frequency of insertions and deletions. This includes calculation of the average per cent identity between the copies (pM) and the average percentage of insertions and deletions (pI). The algorithm has a pair of matching probability and indel probability values (pM = 0.80, pI = 0.10) as default to cover the most divergent copies at every locus. We used two sets of alignment parameters (match, mismatch, gap), (+2, −3, −5) and (+2, −5, −7), to score the matches. All microsatellites with a minimum alignment score of 30 are reported in the database, which means that both perfect and imperfect microsatellites are listed. The genome sequences were also analysed using RepeatMasker (A.F.A. Smit, R. Hubley and P. Green, unpublished data; ) to obtain indices marking the occurrence of simple repeats, tandem repeats, segmental duplications and interspersed repeats including SINEs, DNA transposons, retrotransposons, LINEs, etc. Further, sequences were analysed for the delineation of exons and introns using GENSCAN (15). Flanking sequences of microsatellites were aligned to catalogue paralogous microsatellites, which share an origin and are hence considered to belong to the same microsatellite family. Two or more contiguous microsatellites separated by intervening non-repeat sequence of ≤70 bp were separately categorized as compound repeats.
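
The scoring threshold and the compound-repeat rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used to build InSatDb: `linear_score` is a hypothetical helper that simply adds up per-position scores (Tandem Repeat Finder's real alignment procedure is more involved), and the `Repeat` record, its 1-based inclusive coordinates and `group_compound` are assumptions made for illustration. Under this simplified additive model, a perfect 15 bp repeat scores 15 × 2 = 30, which is exactly the reporting threshold.

```python
from dataclasses import dataclass

# (match, mismatch, gap) parameter sets quoted in the text.
SCHEMES = [(2, -3, -5), (2, -5, -7)]
MIN_SCORE = 30         # minimum alignment score for inclusion in the database
MAX_COMPOUND_GAP = 70  # largest intervening non-repeat stretch, in bp


def linear_score(matches, mismatches, indels, scheme):
    """Additive score for a repeat alignment (simplified, hypothetical helper)."""
    match, mismatch, gap = scheme
    return matches * match + mismatches * mismatch + indels * gap


@dataclass
class Repeat:
    seq_id: str  # identifier of the genomic sequence (assumed field)
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def group_compound(repeats, max_gap=MAX_COMPOUND_GAP):
    """Return clusters of two or more repeats on one sequence whose
    neighbours are separated by at most max_gap non-repeat bases."""
    clusters, current, current_end = [], [], 0
    for rep in sorted(repeats, key=lambda r: (r.seq_id, r.start)):
        same_seq = bool(current) and rep.seq_id == current[0].seq_id
        if same_seq and rep.start - current_end - 1 <= max_gap:
            current.append(rep)
            current_end = max(current_end, rep.end)
        else:
            if len(current) > 1:
                clusters.append(current)
            current, current_end = [rep], rep.end
    if len(current) > 1:
        clusters.append(current)
    return clusters
```

For example, repeats at positions 1–20 and 60–90 of the same sequence are 39 bp apart and would be grouped as one compound locus, whereas a third repeat starting at position 200 would not join them.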

DATABASE ORGANIZATION

InSatDb is developed as a multi-tier relational database (Figure 1). It stores microsatellites from each of the five insect genomes separately, together with complete annotations of these microsatellites. The database also provides basic information on each of the five insects and important links to obtain further knowledge, and contains a tutorial page and a glossary page. Microsatellite data can be accessed in two formats. End users with adequate computational capabilities can batch download the full complement of microsatellites (insect-wise), microsatellite sequences, compound microsatellites and the full list of microsatellite loci existing as families. These data are made available as csv files, which are compatible with spreadsheet programmes such as MS Excel. Alternatively, details of microsatellites with highly specific characteristics may be queried using a multi-option query sheet (Figure 2). The options include insect (one at a time); location (intron, exon, intron/exon boundary, upstream, intergenic, repeat elements, singly or in combination); repeat type (motif size, mono- to hexa-nucleotide) or actual repeat motif (by entering up to five repeat motifs); GC% (fixed value or range); repeat size in either base pairs or number of units (fixed value or range); and repeat kind (perfect or imperfect). Once the insect and location options are selected, the rest of the fields are set to ‘ALL’ by default. The output is primarily a list of microsatellites annotated for all options of the query sheet, and the output table is generated as a hierarchical pre-sorted list. Each microsatellite is given a unique ID that also carries the genomic sequence ID and corresponding indices. If the number of microsatellites selected based on the options of the query sheet exceeds 500, the output is split into sets of 500 microsatellites. In addition, a csv file containing the total output is made available for downloading. 
If the query options do not select any microsatellite, a message indicating zero output is displayed and a back button is provided to refine the options. The table is a ‘one-stop’ output and gives complete information on microsatellites. The SSR motif and 100 bp each of the left and right flanking sequences are given for each microsatellite entry, which allows users to carry out sequence analysis of the microsatellite vis-à-vis its locus. In addition, users can select individual microsatellites to convert them into locus-specific markers. This is facilitated by automatic uploading of the repeat and flanking sequences of the selected microsatellite into the Primer3 query form (16).
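
For readers working with the downloadable csv data offline, the behaviour of the query sheet can be approximated along the following lines. This is a sketch only: the record fields (`location`, `motif`, `gc_percent`, `perfect`) are hypothetical names chosen for illustration and are not the actual InSatDb column headers, and an option left as `None` stands in for the ‘ALL’ default. The page size of 500 is the value stated in the text.

```python
PAGE_SIZE = 500  # the text states that output is split into sets of 500


def matches(record, *, location=None, motif_sizes=None, gc_range=None, perfect=None):
    """Return True if a record satisfies every option that is set.
    An option left as None behaves like 'ALL' on the query sheet."""
    if location is not None and record["location"] not in location:
        return False
    if motif_sizes is not None and len(record["motif"]) not in motif_sizes:
        return False
    if gc_range is not None:
        low, high = gc_range
        if not (low <= record["gc_percent"] <= high):
            return False
    if perfect is not None and record["perfect"] != perfect:
        return False
    return True


def paginate(records, page_size=PAGE_SIZE):
    """Split a list of records into consecutive pages of at most page_size."""
    return [records[i:i + page_size] for i in range(0, len(records), page_size)]


# Hypothetical usage: exonic trinucleotide repeats with 20-40% GC.
# hits = [r for r in records
#         if matches(r, location={"exon"}, motif_sizes={3}, gc_range=(20, 40))]
# pages = paginate(hits)
```

Keeping the filter and the pagination separate mirrors the description above, in which the full result is always available as a single csv file while the on-screen table is shown in sets of 500.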
Figure 1

InSatDb organization and implementation.

Figure 2

Screen shots of (A) InSatDb homepage, (B) multi-option query sheet, (C) output table and (D) analysis page.

DATA ANALYSIS

Insect genomes vary greatly in SSR composition, diversity and distribution. Our analysis showed that the microsatellite content of the five fully sequenced insect genomes is independent of both genome size and GC content (Table 1). The database includes a dedicated section (Analysis) that describes the types of analysis that can be carried out using the data obtained from InSatDb. Some quick observations and inferences from a comparative genomic analysis are given in this section.
Table 1

Microsatellite content of insect genomes

| Insect | Chr (n) | Genome size (Mb) | GC% | Number of repeats | Microsatellite content (% genome) | Number of microsatellites per Mb genome |
|---|---|---|---|---|---|---|
| Bombyx mori | 28 | 397.71 | 37.33 | 111 006 | 0.72 | 280 |
| Drosophila melanogaster | 4 | 118.36 | 42.45 | 63 637 | 1.56 | 538 |
| Anopheles gambiae | 3 | 287.79 | 40.51 | 150 936 | 1.58 | 525 |
| Apis mellifera | 16 | 228.45 | 32.28 | 236 480 | 3.41 | 1035 |
| Tribolium castaneum | 10 | 198.06 | 25.53 | 24 246 | 0.41 | 122 |
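
The density column of Table 1 can be cross-checked from the other two columns, since the number of microsatellites per Mb is simply the number of repeats divided by the genome size. The short Python sketch below does this with the values transcribed from Table 1; the dictionary layout and the function name are illustrative choices. The recomputed values agree with the reported column to within one repeat per Mb, and the small differences may reflect rounding in the published figures.

```python
# (genome size in Mb, number of repeats, reported repeats per Mb), from Table 1
TABLE_1 = {
    "Bombyx mori": (397.71, 111_006, 280),
    "Drosophila melanogaster": (118.36, 63_637, 538),
    "Anopheles gambiae": (287.79, 150_936, 525),
    "Apis mellifera": (228.45, 236_480, 1035),
    "Tribolium castaneum": (198.06, 24_246, 122),
}


def density_per_mb(genome_size_mb, n_repeats):
    """Number of microsatellites per Mb of genome."""
    return n_repeats / genome_size_mb


for insect, (size_mb, n_repeats, reported) in TABLE_1.items():
    computed = density_per_mb(size_mb, n_repeats)
    print(f"{insect:<24} computed {computed:8.2f}   reported {reported}")
```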
A preponderance of di- and tri-nucleotide repeats is observed in Drosophila and Anopheles, whereas tri- and tetra-nucleotide repeats are abundant in Bombyx and Tribolium. On the whole, shorter microsatellites are abundant in the five insect genomes: as the length of the microsatellite increases, their number decreases logarithmically, as typified by Bombyx and Drosophila microsatellites (>90% of the microsatellites are <50 bp); Anopheles and Tribolium, on the other hand, have longer microsatellites at a relatively high frequency. Shorter microsatellites not only predominate in the microsatellite population of the five insect genomes, but also seem to possess a higher number of imperfect repeat units. In contrast, microsatellites spanning >100 bp consisted of perfect, rather than imperfect, repeats. Imperfect repeat units originate through substitutions and indels. Interruptions, if any, occur mainly in the middle region of the repeat sequence, and the ends seem to be selected against decomposition. On the whole, most of the microsatellites occur within the 20% GC bracket. There is no linear correlation between GC content and the average number of repeat units. The average microsatellite length across the GC range is 37 ± 9 bp; between 0 and 5% GC content, microsatellites tend to be longer than 60 bp. Compound microsatellites account for nearly 3.2% in the insect genomes analysed; owing to its high density of microsatellites, Apis has a higher number of compound loci (6.12%). The Anopheles and Apis genomes have as many as 50 and 60% of their total microsatellites in coding regions, respectively. The Bombyx genome has only 10% of its microsatellites in regions spanning exons, introns and their boundaries. More than 70% of the microsatellites present in exons are trinucleotide repeats, except in Apis, where 50% tri- and 25% dinucleotide repeats are present in exonic regions. 
Microsatellites in insects are AT rich (on an average 23.4% GC); however, they exist within regions that are not always AT rich.
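
The motif-level properties discussed above (repeat type, motif GC% and perfect versus imperfect nature) are straightforward to compute for an individual locus. The following Python sketch shows one way to do so; the function names are hypothetical, and the definition of a perfect repeat used here (the motif repeated end to end, allowing a partial final unit) is an assumption for illustration rather than the criterion applied in InSatDb.

```python
REPEAT_TYPES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def gc_percent(motif):
    """Percentage of G and C bases in a repeat motif."""
    motif = motif.upper()
    gc = sum(base in "GC" for base in motif)
    return 100.0 * gc / len(motif)


def is_perfect(sequence, motif):
    """True if sequence is the motif repeated end to end,
    allowing a partial final unit (assumed definition)."""
    sequence, motif = sequence.upper(), motif.upper()
    repeated = motif * (len(sequence) // len(motif) + 1)
    return repeated[:len(sequence)] == sequence


print(REPEAT_TYPES[len("AAG")], round(gc_percent("AAG"), 2))
print(is_perfect("ATATATATATATATAT", "AT"))  # uninterrupted (AT)8
print(is_perfect("ATATACATATAT", "AT"))      # one substitution interrupts the repeat
```

An AT motif has 0% GC and an AAG motif about 33% GC, so AT-rich motifs of this kind fall at the low end of the GC range described in the text.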

DATABASE ACCESS AND FUTURE PERSPECTIVES

InSatDb is freely available through . Incorporation of microsatellite data from additional insects, query facility for better comparative genomic analysis such as gene-based microsatellite extraction and conservation analysis are planned. Additionally, based on users' feedback, supplementary features will be added to make InSatDb a single window system for insect genome analyses using microsatellite tools.

REFERENCES

1.  Mobile elements and the genesis of microsatellites in dipterans.

Authors:  J Wilder; H Hollocher
Journal:  Mol Biol Evol       Date:  2001-03       Impact factor: 16.240

2.  Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes.

Authors:  Michele Morgante; Michael Hanafey; Wayne Powell
Journal:  Nat Genet       Date:  2002-01-22       Impact factor: 38.330

3.  Domain-level differences in microsatellite distribution and content result from different relative rates of insertion and deletion mutations.

Authors:  David Metzgar; Li Liu; Christian Hansen; Kevin Dybvig; Christopher Wills
Journal:  Genome Res       Date:  2002-03       Impact factor: 9.043

4.  Two distinct modes of microsatellite mutation processes: evidence from the complete genomic sequences of nine species.

Authors:  Daniel Dieringer; Christian Schlötterer
Journal:  Genome Res       Date:  2003-10       Impact factor: 9.043

5.  Microsatellites within genes: structure, function, and evolution.

Authors:  You-Chun Li; Abraham B Korol; Tzion Fahima; Eviatar Nevo
Journal:  Mol Biol Evol       Date:  2004-02-12       Impact factor: 16.240

6.  Simple sequence repeats as advantageous mutators in evolution.

Authors:  Yechezkel Kashi; David G King
Journal:  Trends Genet       Date:  2006-03-29       Impact factor: 11.639

7.  Microsatellite spreading in the human genome: evolutionary mechanisms and structural implications.

Authors:  E Nadir; H Margalit; T Gallily; S A Ben-Sasson
Journal:  Proc Natl Acad Sci U S A       Date:  1996-06-25       Impact factor: 11.205

8.  Microsatellites in different eukaryotic genomes: survey and analysis.

Authors:  G Tóth; Z Gáspári; J Jurka
Journal:  Genome Res       Date:  2000-07       Impact factor: 9.043

9.  Differential distribution of simple sequence repeats in eukaryotic genome sequences.

Authors:  M V Katti; P K Ranjekar; V S Gupta
Journal:  Mol Biol Evol       Date:  2001-07       Impact factor: 16.240

10.  Tandem repeats in protein coding regions of primate genes.

Authors:  Branko Borstnik; Danilo Pumpernik
Journal:  Genome Res       Date:  2002-06       Impact factor: 9.043
