
FLYSNPdb: a high-density SNP database of Drosophila melanogaster.

Doris Chen, Jürg Berger, Michaela Fellner, Takashi Suzuki.

Abstract

FLYSNPdb provides high-resolution single nucleotide polymorphism (SNP) data for Drosophila melanogaster. The database currently contains 27,367 polymorphisms, including >3700 indels (insertions/deletions), covering all major chromosomes. These SNPs are clustered into 2238 markers, which are evenly distributed with an average density of one marker every 50.3 kb or 6.6 genes. SNPs were identified automatically, filtered for high quality and partly manually curated. The database provides detailed information on the SNP data including molecular and cytological locations (genome Releases 3-5), alleles of up to five commonly used laboratory stocks, flanking sequences, SNP marker amplification primers, quality scores and genotyping assays. Data specific to a certain region, particular stocks or a certain genome assembly version are easily retrievable through the interface of a publicly accessible website (http://flysnp.imp.ac.at/flysnpdb.php).

Year:  2008        PMID: 18784187      PMCID: PMC2686552          DOI: 10.1093/nar/gkn583

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Drosophila melanogaster is one of the best-studied model organisms due to its short generation time and ease of genetic manipulation. Hence, it continues to provide major insights into biological processes that are conserved in multicellular organisms. Single nucleotide polymorphisms (SNPs) are widely used as genetic markers in mapping experiments, quantitative trait loci (QTL) analyses, and population genetic or evolutionary studies, since they are frequent, mostly phenotypically neutral and molecularly defined. FLYSNPdb contains data from a polymorphism map with an unprecedented resolution of ∼50 kb between SNP markers, which is significantly higher than the density of previous Drosophila SNP maps (1–4). Polymorphisms >1 nt were also counted, including indels, which are particularly useful for genotyping assays based on PCR-product length polymorphisms [PLP; (2)] or denaturing high performance liquid chromatography [DHPLC; (5)], as well as for evolutionary analyses (6,7). The map comprises SNPs from five different D. melanogaster stocks (Supplementary Table 1). Since polymorphisms in Drosophila are generally bi-allelic and randomly distributed among the utilized strains, we anticipate that most of our SNP markers can be used to discriminate almost any other pair of Drosophila stocks. FLYSNPdb is part of the FLYSNP website (http://flysnp.imp.ac.at/), which provides detailed information on the practical aspects of SNP mapping and genotyping in Drosophila (8) as well as a user guide for the database, a glossary and protocols. With this database, we aim to provide a versatile SNP data resource that is easy to use and has a user-friendly web interface.

DATA SOURCE

For SNP identification, we designed primer pairs to amplify fragments which are ∼1 kb long (9), equally distributed along each major chromosome arm (X, 2L, 2R, 3L and 3R), and which preferentially lie in unique, non-protein coding regions (Figure 1). Genomic DNA of up to five standard laboratory stocks per amplicon served as template: besides the wild-type stocks Canton S and Oregon R, we selected for each chromosome arm one strain that carries visible recessive markers, one stock with a Flp recombinase target (FRT) element (10) close to the centromere, and one stock with an enhancer-promoter P- (EP) element at the chromosome tip and a visible white+ marker (11) (see also Supplementary Table 1). The wild-type and FRT stocks are commonly utilized in mutagenesis screens, and the recessive marker as well as EP stocks are useful for identification of recombination events in defined chromosomal regions (2,4). PCR products were sequenced in both orientations, each using one of the amplification primers as sequencing primer. In total, >2.3 Mb (1.7%) of the 117 Mb long euchromatic region of the D. melanogaster genome were resequenced and analysed. After sequencing the PCR fragments, the Phred/Cross_match/PolyBayes software package (12–14) was used for trace quality assessment, alignment to the reference genome (strain y; cn bw sp) (15,16), and automated SNP discovery. In order to obtain high-quality data, SNPs at the first and last 75 bases of an amplicon or below Phred score 20 were omitted. In addition, ∼27% of the alignments were visually inspected [with the help of Consed 11.0 (17)], which was particularly necessary for detection of long indels (>6 bases). If multiple sequence reads from the same stock were available at one site, the allele with the highest Phred score was selected. Moreover, SNPs located at adjacent loci were considered as a single polymorphic site. Of the analysed amplicons, 86.9% contained at least one polymorphism in any of the examined stocks. 
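The quality filters described above (edge trimming, Phred cutoff, best read per stock, merging of adjacent loci) can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which relied on Phred/Cross_match/PolyBayes and custom scripts; the `Call` record and all function names are hypothetical, and positions are assumed to be 1-based within the amplicon.

```python
from dataclasses import dataclass

EDGE_MARGIN = 75  # bases ignored at each amplicon end (value from the text)
MIN_PHRED = 20    # calls below this Phred score are omitted (value from the text)


@dataclass
class Call:
    """One base call at a candidate SNP position (hypothetical record layout)."""
    stock: str   # stock name, e.g. "Canton S"
    pos: int     # assumed 1-based position within the amplicon
    allele: str
    phred: int


def passes_filter(call: Call, amplicon_len: int) -> bool:
    """Keep calls outside the first/last 75 bases with Phred score >= 20."""
    inside = EDGE_MARGIN < call.pos <= amplicon_len - EDGE_MARGIN
    return inside and call.phred >= MIN_PHRED


def best_allele_per_stock(calls):
    """For one site, keep the highest-Phred call of each stock."""
    best = {}
    for c in calls:
        if c.stock not in best or c.phred > best[c.stock].phred:
            best[c.stock] = c
    return best


def merge_adjacent(positions):
    """Group consecutive positions into (start, end) polymorphic sites."""
    sites = []
    for p in sorted(set(positions)):
        if sites and p == sites[-1][1] + 1:
            sites[-1] = (sites[-1][0], p)   # extend the current site
        else:
            sites.append((p, p))            # start a new site
    return sites
```

For a 1000-base amplicon, this sketch keeps positions 76 to 925 and treats positions 10, 11 and 12 as one polymorphic site spanning (10, 12).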
The SNP positions were updated to Release 5 (FB2006_01) of the D. melanogaster genome by aligning 40 bp of the sequences (from Release 3 or 4) flanking each SNP site to the new reference sequence using Blastn (18). Prediction of restriction fragment length polymorphism (RFLP) sites was accomplished with the help of Remap [EMBOSS software suite (19,20)] and the REBASE list of commercially available restriction enzymes with cut sites ≥4 bp (21,22).
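The idea behind RFLP prediction, a SNP that creates or destroys a restriction enzyme recognition site, can be sketched in a few lines of Python. This is a plain string search and only a stand-in for Remap and the REBASE enzyme list used in the actual pipeline; the function names and the three-enzyme table are illustrative assumptions, and a deletion allele is represented as an empty string.

```python
# Small illustrative enzyme table; the real analysis used the REBASE list.
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def site_count(seq: str, recognition: str) -> int:
    """Count (possibly overlapping) occurrences of a recognition sequence."""
    seq, recognition = seq.upper(), recognition.upper()
    last_start = len(seq) - len(recognition) + 1
    return sum(seq.startswith(recognition, i) for i in range(last_start))


def differential_enzymes(left, right, allele_a, allele_b, enzymes=ENZYMES):
    """Return enzymes whose number of sites differs between the two alleles.

    left/right are the flanking sequences; alleles are inserted between them.
    """
    hits = []
    for name, rec in enzymes.items():
        counts = []
        for allele in (allele_a, allele_b):
            seq = left + allele + right
            n = site_count(seq, rec)
            if revcomp(rec) != rec.upper():      # non-palindromic site:
                n += site_count(seq, revcomp(rec))  # also scan the other strand
            counts.append(n)
        if counts[0] != counts[1]:
            hits.append(name)
    return hits


# Allele "A" completes GAATTC (EcoRI); allele "G" does not.
print(differential_enzymes("ACGTGA", "TTCGGA", "A", "G"))
```

In practice the flanking sequences would need to be at least as long as the longest recognition site minus one base, so that sites overlapping the SNP are not missed.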
Figure 1.

Data source pipeline for SNP identification, data retrieval and curation. Software tools are displayed below each task (for references please see text); if not otherwise stated, custom-made scripts were used.

DATABASE CONTENT

The FLYSNPdb data set currently comprises >81 700 SNP alleles at 27 367 sites in 2238 amplicons of about 1 kb length (Table 1). A SNP marker contains on average 12 polymorphisms; the maximum SNP count per marker is 73. The average distance between SNP markers is 50.3 kb, a region which contains on average 6.6 genes [according to the FlyBase Release 5.10, FB2008_07 annotation (23)]. The largest gap between markers is 360 kb long and lies at the tip of chromosome arm 3R, between cytological regions 82A1 and 82C3. Only 169 polymorphic loci (0.6% of all SNPs) are tri-allelic; the rest are bi-allelic. A total of 13.7% (3743) of the SNPs are indels, which are up to 360 bp long, but predominantly (96.4%) <10 bp (46.6% of the indels are 1 nt long). For any given stock pair, the average percentage of SNP markers with a sequence divergence between the two stocks is 76.6%, ranging from 35.3% to 92.0%. Furthermore, the database provides information on the molecular and cytological SNP locations for three genome assembly versions [Release 3–5; (15)], together with the 30 bp flanking sequences as an additional site identification feature. For data quality assessment, PolyBayes probability scores (14), Phred trace quality scores (12), as well as the number of sequence reads per alignment are available, and manually curated SNPs are indicated. Since non-coding regions are more polymorphic, we focused on non-exonic regions. If SNPs lie within an intron or exon (according to FlyBase Release 5.10), the corresponding gene name is also retrievable. In addition, information on SNP marker amplification primers is available for genotyping assays which are based on sequencing.
Polymorphisms that are suitable for RFLP assays (SNPs which result in differential restriction enzyme sites) or for which verified PLP or tag-array mini-sequencing (TAMS) assays (8) are available, are also indicated, including further information like verified primers or suitable restriction enzymes.
Table 1.

Number of SNPs in FLYSNPdb, per chromosome arm and in total

Chromosome arm    X        2L       2R       3L       3R       Total
SNP markers       483      443      407      417      488      2238
SNP sites         4720     5849     5402     5993     5403     27 367
Indels            685      755      748      809      746      3743
Alleles           15 966   16 500   15 714   17 483   16 123   81 786

SNP sites are locations where a differential base has been identified in at least one of the stocks compared to the reference sequence or to another stock. SNP counts include number of Indels. Allele counts reflect called bases in each of the sequenced Drosophila stocks, without the alleles of the reference sequence (which are also available in FLYSNPdb).
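As a quick consistency check, the per-arm counts of Table 1 can be summed and compared with the totals and percentages quoted in the text. The short Python sketch below uses only numbers from the table; the variable names are arbitrary.

```python
ARMS = ["X", "2L", "2R", "3L", "3R"]

# Per-arm counts copied from Table 1, in the order given by ARMS.
TABLE = {
    "SNP markers": [483, 443, 407, 417, 488],
    "SNP sites": [4720, 5849, 5402, 5993, 5403],
    "Indels": [685, 755, 748, 809, 746],
    "Alleles": [15966, 16500, 15714, 17483, 16123],
}

totals = {row: sum(values) for row, values in TABLE.items()}

# Share of indels among all SNP sites, and mean SNP sites per marker.
indel_pct = 100 * totals["Indels"] / totals["SNP sites"]
sites_per_marker = totals["SNP sites"] / totals["SNP markers"]

for row, total in totals.items():
    print(f"{row}: {total}")
print(f"indels: {indel_pct:.1f}% of SNP sites")
print(f"SNP sites per marker: {sites_per_marker:.1f}")
```

The sums reproduce the Total column (2238, 27 367, 3743 and 81 786), the indel share rounds to the 13.7% stated in the text, and the mean of about 12.2 sites per marker matches the reported average of 12 polymorphisms per marker.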


IMPLEMENTATION, USAGE AND ACCESS

All data are organized and stored in a relational database. To speed up web queries, several summarizing tables were precomputed and put into a MySQL database which is accessed through PHP scripts. The form on the first page asks the user to specify the chromosomal region and two stocks for which data on differential polymorphisms will be retrieved (Figure 2). The region can be indicated as molecular coordinates (position 1 – position 2 or position 1 + length) or as a cytological segment (region 1 – region 2). Furthermore, it is possible to select whole chromosome arms by leaving the ‘Location’ field blank, or to retrieve all data by selecting the ‘Browse all’ option. SNP data can be viewed as a list of SNP markers (including SNP count and amplification primer sequences) or as a table of SNP sites (with alleles, flanking sequences, etc.). Additional information concerning quality scores, genotyping assay suitability or coding information (genic, intronic or exonic) can be optionally selected. For users of the previous FLYSNP database version, old identifiers (ids) are retrievable and a link to this version is provided. On each query result page, sub-selections can be made by clicking on the checkboxes at the left side of each row, or by entering search parameters in the fields below each column (Figure 2). The tables are downloadable, e.g. as tab-separated text files which can be easily imported into commonly used databases or Excel spreadsheets, or as track files which can be uploaded to the FlyBase genome viewer [GBrowse; (24)]. As an additional feature, a link to FlyBase GBrowse is provided for the graphical display of the region previously specified by the user.
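A downloaded tab-separated table can be post-processed offline in the same spirit as the web form. The following Python sketch parses the two molecular-coordinate notations and keeps the sites at which two stocks differ. The column names (`position` and one column per stock) are hypothetical and may not match the real download, and interpreting "position 1 + length" as an end coordinate of position 1 plus length is an assumption.

```python
import csv
import io
import re


def parse_region(text: str):
    """Parse 'start-end' or 'start+length' into an inclusive (start, end) tuple."""
    m = re.fullmatch(r"\s*(\d+)\s*([-–+])\s*(\d+)\s*", text)
    if not m:
        raise ValueError(f"unrecognised region: {text!r}")
    start, op, value = int(m.group(1)), m.group(2), int(m.group(3))
    end = start + value if op == "+" else value  # assumption: end = start + length
    if end < start:
        raise ValueError(f"region end before start: {text!r}")
    return start, end


def differential_sites(tsv_text: str, region, stock_a: str, stock_b: str):
    """Rows inside the region where both stocks have a call and the calls differ."""
    start, end = region
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        r for r in rows
        if start <= int(r["position"]) <= end
        and r[stock_a] and r[stock_b]
        and r[stock_a] != r[stock_b]
    ]


# Hypothetical three-row export with made-up positions and alleles.
SAMPLE_TSV = (
    "position\tCanton S\tOregon R\n"
    "1000\tA\tG\n"
    "2000\tC\tC\n"
    "9000\tT\tA\n"
)

hits = differential_sites(SAMPLE_TSV, parse_region("500+2000"), "Canton S", "Oregon R")
print([int(r["position"]) for r in hits])
```

With the sample data, only the site at position 1000 lies inside the region and differs between the two stocks; the site at 2000 is identical in both, and the site at 9000 is outside the region.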
Figure 2.

Screenshots of FLYSNPdb input form and query result. On the first page, the user selects chromosomal region and stocks as well as different view options. On the search result page, further features such as table download or sub-queries are available.


RECENT AND FUTURE DEVELOPMENTS

The FLYSNPdb data were recently submitted to dbSNP (NCBI, Release 129; http://www.ncbi.nlm.nih.gov/projects/SNP/) so that direct linkage to the FlyBase data repository is feasible. Furthermore, sequence traces and alignments will be provided for users who would like to see the raw data for detailed quality assessment. We are open to help users with their individual needs and will implement suggestions of common use.

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online.

FUNDING

European Union Fifth Framework Programme (QLRI-CT-2001-00004); Boehringer Ingelheim GmbH; Japan Society for the Promotion of Science. Funding for open access charge: IMP.
  24 in total

1.  A general approach to single-nucleotide polymorphism discovery.

Authors:  G T Marth; I Korf; M D Yandell; R T Yeh; Z Gu; H Zakeri; N O Stitziel; L Hillier; P Y Kwok; W R Gish
Journal:  Nat Genet       Date:  1999-12       Impact factor: 38.330

2.  EMBOSS: the European Molecular Biology Open Software Suite.

Authors:  P Rice; I Longden; A Bleasby
Journal:  Trends Genet       Date:  2000-06       Impact factor: 11.639

3.  Primer3 on the WWW for general users and for biologist programmers.

Authors:  S Rozen; H Skaletsky
Journal:  Methods Mol Biol       Date:  2000

4.  EMBOSS opens up sequence analysis. European Molecular Biology Open Software Suite.

Authors:  Sue A Olson
Journal:  Brief Bioinform       Date:  2002-03       Impact factor: 11.622

5.  High-resolution SNP mapping by denaturing HPLC.

Authors:  Knud Nairz; Hugo Stocker; Benno Schindelholz; Ernst Hafen
Journal:  Proc Natl Acad Sci U S A       Date:  2002-07-29       Impact factor: 11.205

6.  Genetic mapping with SNP markers in Drosophila.

Authors:  J Berger; T Suzuki; K A Senti; J Stubbs; G Schaffner; B J Dickson
Journal:  Nat Genet       Date:  2001-12       Impact factor: 38.330

7.  Single nucleotide polymorphism markers for genetic mapping in Drosophila melanogaster.

Authors:  R A Hoskins; A C Phan; M Naeemuddin; F A Mapa; D A Ruddy; J J Ryan; L M Young; T Wells; C Kopczynski; M C Ellis
Journal:  Genome Res       Date:  2001-06       Impact factor: 9.043

8.  Haplotype dimorphism in a SNP collection from Drosophila melanogaster.

Authors:  K Teeter; M Naeemuddin; R Gasperini; E Zimmerman; K P White; R Hoskins; G Gibson
Journal:  J Exp Zool       Date:  2000-04-15

9.  A rapid method to map mutations in Drosophila.

Authors:  S G Martin; K C Dobi; D St Johnston
Journal:  Genome Biol       Date:  2001-08-30       Impact factor: 13.583

10.  The genome sequence of Drosophila melanogaster.

Authors:  M D Adams; S E Celniker; R A Holt; C A Evans; J D Gocayne; P G Amanatides; S E Scherer; P W Li; R A Hoskins; R F Galle; R A George; S E Lewis; S Richards; M Ashburner; S N Henderson; G G Sutton; J R Wortman; M D Yandell; Q Zhang; L X Chen; R C Brandon; Y H Rogers; R G Blazej; M Champe; B D Pfeiffer; K H Wan; C Doyle; E G Baxter; G Helt; C R Nelson; G L Gabor; J F Abril; A Agbayani; H J An; C Andrews-Pfannkoch; D Baldwin; R M Ballew; A Basu; J Baxendale; L Bayraktaroglu; E M Beasley; K Y Beeson; P V Benos; B P Berman; D Bhandari; S Bolshakov; D Borkova; M R Botchan; J Bouck; P Brokstein; P Brottier; K C Burtis; D A Busam; H Butler; E Cadieu; A Center; I Chandra; J M Cherry; S Cawley; C Dahlke; L B Davenport; P Davies; B de Pablos; A Delcher; Z Deng; A D Mays; I Dew; S M Dietz; K Dodson; L E Doup; M Downes; S Dugan-Rocha; B C Dunkov; P Dunn; K J Durbin; C C Evangelista; C Ferraz; S Ferriera; W Fleischmann; C Fosler; A E Gabrielian; N S Garg; W M Gelbart; K Glasser; A Glodek; F Gong; J H Gorrell; Z Gu; P Guan; M Harris; N L Harris; D Harvey; T J Heiman; J R Hernandez; J Houck; D Hostin; K A Houston; T J Howland; M H Wei; C Ibegwam; M Jalali; F Kalush; G H Karpen; Z Ke; J A Kennison; K A Ketchum; B E Kimmel; C D Kodira; C Kraft; S Kravitz; D Kulp; Z Lai; P Lasko; Y Lei; A A Levitsky; J Li; Z Li; Y Liang; X Lin; X Liu; B Mattei; T C McIntosh; M P McLeod; D McPherson; G Merkulov; N V Milshina; C Mobarry; J Morris; A Moshrefi; S M Mount; M Moy; B Murphy; L Murphy; D M Muzny; D L Nelson; D R Nelson; K A Nelson; K Nixon; D R Nusskern; J M Pacleb; M Palazzolo; G S Pittman; S Pan; J Pollard; V Puri; M G Reese; K Reinert; K Remington; R D Saunders; F Scheeler; H Shen; B C Shue; I Sidén-Kiamos; M Simpson; M P Skupski; T Smith; E Spier; A C Spradling; M Stapleton; R Strong; E Sun; R Svirskas; C Tector; R Turner; E Venter; A H Wang; X Wang; Z Y Wang; D A Wassarman; G M Weinstock; J Weissenbach; S M Williams; K C Worley; D Wu; S Yang; Q A Yao; J Ye; R F Yeh; J S Zaveri; 
M Zhan; G Zhang; Q Zhao; L Zheng; X H Zheng; F N Zhong; W Zhong; X Zhou; S Zhu; X Zhu; H O Smith; R A Gibbs; E W Myers; G M Rubin; J C Venter
Journal:  Science       Date:  2000-03-24       Impact factor: 47.728
