Scott F Saccone, Jiaxi Quan, Gaurang Mehta, Raphael Bolze, Prasanth Thomas, Ewa Deelman, Jay A Tischfield, John P Rice.
Abstract
Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task we have designed a small set of custom tables that facilitate task-related queries and provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at http://cgsmd.isi.edu/dbsnpq. Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale.
Year: 2010 PMID: 21037260 PMCID: PMC3013662 DOI: 10.1093/nar/gkq1054
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Descriptions of the tasks used in our classification scheme for the tables in the dbSNP database
| Task | Description |
|---|---|
| Submission | Determine the source of the submission such as specific laboratories or researchers, the populations used, any associated publications and how the submissions cluster into ‘reference SNP’ identification numbers (a group of ‘ss’ SNP IDs correspond to a unique ‘rs’ ID via the table |
| Experimental methods | Determine the experimental methods used to produce the data, such as direct DNA sequencing, DNA hybridization and DHPLC (denaturing high pressure liquid chromatography). |
| Validation | Assess the reliability of the information and evaluate whether or not a reported variant is truly a genetic polymorphism or is just an experimental artifact. Methods include determining if there are multiple submissions with at least one non-computational observation and confirmation by observation of positive frequency in a genotyped sample. |
| Classification | Determine how each variant is classified: as a true SNP, an insertion, a deletion and so on. |
| Sample information | Retrieve information on the biological samples used, such as ethnicity and the number of samples used for a submission. |
| Alleles and frequency data | Retrieve the alleles observed for the variant, which DNA strand was used and the frequencies of the alleles and genotypes in various populations. |
| Genome mapping | Retrieve information on how the variants map to various reference genomes, such as the physical mapping coordinates and the quality of the alignments. |
| Genes and function | Retrieve information on relationships between the variants and genes, such as SNP/gene transcript functional properties (missense mutations, frameshifts, UTR regions and so on). |
| Flanking sequence | Retrieve the flanking DNA sequences used to define the variant. This can be useful when conducting custom genotyping experiments for variants not represented by commercial SNP microarrays. |
| Individual genotyping | Retrieve submitted individual genotypes. |
| Summary information | Retrieve summary information for a reference SNP ID, an amalgamation of the tasks above. |
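
The Submission task describes how a group of ‘ss’ IDs corresponds to a unique ‘rs’ ID. The relationship can be sketched in Python as below. This is only an illustration under stated assumptions: the link table `ss_rs_link`, its columns `ss_id` and `rs_id`, the helper `rs_to_ss` and the toy IDs are all hypothetical and are not the dbSNP schema. An in-memory SQLite database stands in for the local MySQL installation so that the sketch runs without a server.

```python
import sqlite3


def rs_to_ss(conn):
    """Group submitted-SNP (ss) IDs under their reference-SNP (rs) ID."""
    rows = conn.execute(
        "SELECT rs_id, ss_id FROM ss_rs_link ORDER BY rs_id, ss_id"
    )
    clusters = {}
    for rs_id, ss_id in rows:
        clusters.setdefault(rs_id, []).append(ss_id)
    return clusters


# Hypothetical link table with toy IDs (not real dbSNP identifiers).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ss_rs_link (ss_id INTEGER PRIMARY KEY, rs_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO ss_rs_link (ss_id, rs_id) VALUES (?, ?)",
    [(101, 1), (102, 1), (205, 2)],
)

clusters = rs_to_ss(conn)
for rs_id, ss_ids in clusters.items():
    print(f"rs{rs_id}: " + ", ".join(f"ss{s}" for s in ss_ids))
```

Each ‘ss’ ID appears once in the link table, while an ‘rs’ ID may appear many times, which is what makes the mapping many-to-one.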
Figure 1. The tasks and corresponding dbSNP tables from our classification scheme. A tree structure is used to partially represent the relationships between the tables. All tables except those listed under ‘Local Tables’ are directly from dbSNP. Tables with asterisks have names in the dbSNP database that are prefixed by the dbSNP build and suffixed by the representative genome build, such as b131_ContigInfo_37_1 in build 131 of the human database.
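
The naming convention in the caption can be expressed as a small helper for scripts that query build-specific tables. This is a minimal sketch: the function `build_table_name` and its parameter names are hypothetical, and the pattern is inferred solely from the single example `b131_ContigInfo_37_1`, so it should be checked against the installed build before use.

```python
def build_table_name(base_name, dbsnp_build, genome_build):
    """Compose a build-specific dbSNP table name.

    Assumed pattern, inferred from the example in the caption:
    'b' + dbSNP build + '_' + base table name + '_' + genome build.
    """
    return f"b{dbsnp_build}_{base_name}_{genome_build}"


# Values taken from the example in the caption (human database, build 131).
name = build_table_name("ContigInfo", 131, "37_1")
print(name)
```

Keeping the build numbers as parameters means that queries written for one dbSNP build can be pointed at another by changing two values rather than editing every table name.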