Lucie N Hutchins, Yueming Ding, Jin P Szatkiewicz, Randy Von Smith, Hyuna Yang, Fernando Pardo-Manuel de Villena, Gary A Churchill, Joel H Graber.
Abstract
The Center for Genome Dynamics Single Nucleotide Polymorphism Database (CGDSNPdb) is an open-source value-added database with more than nine million mouse single nucleotide polymorphisms (SNPs), drawn from multiple sources, with genotypes assigned to multiple inbred strains of laboratory mice. All SNPs are checked for accuracy and annotated for properties specific to the SNP as well as those implied by changes to overlapping protein-coding genes. CGDSNPdb serves as the primary interface to two unique data sets, the 'imputed genotype resource' in which a Hidden Markov Model was used to assess local haplotypes and the most probable base assignment at several million genomic loci in tens of strains of mice, and the Affymetrix Mouse Diversity Genotyping Array, a high density microarray with over 600,000 SNPs and over 900,000 invariant genomic probes. CGDSNPdb is accessible online through either a web-based query tool or a MySQL public login. Database URL: http://cgd.jax.org/cgdsnpdb/
Year: 2010 PMID: 20624716 PMCID: PMC2911843 DOI: 10.1093/database/baq008
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 3. A screen capture of the SNP detail page, showing SNPs within the bounds of a transcript, whether in a UTR, intron or CDS. If multiple transcripts have been annotated for a given gene, the results are grouped by whether the change is within the coding sequence. A detail page for intergenic SNPs is available as Supplementary Figure S4.
A summary of the total SNP data included in CGDSNPdb, version 1.3
| Classification | Count |
|---|---|
| Total | 9 686 537 |
| Transition | 6 607 155 |
| Transversion | 3 079 382 |
| Intergenic | 5 617 609 |
| Genic | 4 068 928 |
| Intronic | 3 850 229 |
| Exonic | 247 920 |
| UTR | 112 067 |
| CDS | 126 078 |
| CDS:nonsynonymous | 85 090 |
| CDS:synonymous | 43 698 |
| Noncoding gene exon | 17 032 |
| Noncoding gene intron | 5910 |
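
The transition and transversion rows follow the standard definition: a transition exchanges two purines (A, G) or two pyrimidines (C, T), and every other single-base substitution is a transversion. The following is a minimal Python sketch of that classification, together with a check that the two classes in the table sum to the total. The function name `classify_substitution` is hypothetical and is not part of CGDSNPdb.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution.

    Hypothetical helper for illustration; not taken from CGDSNPdb.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError(f"expected single bases from ACGT, got {ref!r} and {alt!r}")
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    # Both bases in the same chemical class means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Counts copied from the summary table for CGDSNPdb version 1.3.
TOTAL = 9_686_537
TRANSITIONS = 6_607_155
TRANSVERSIONS = 3_079_382

# The two classes partition the full SNP set.
assert TRANSITIONS + TRANSVERSIONS == TOTAL

# Transition/transversion ratio implied by the table (roughly 2.15).
ts_tv_ratio = TRANSITIONS / TRANSVERSIONS
```

The same arithmetic check holds for the intergenic and genic rows, which also sum to the total.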
A summary of the data for the sources of SNP data in CGDSNPdb, version 1.3
| Source | Available | Loaded | Strains | Genomic mismatch | Duplicate |
|---|---|---|---|---|---|
| Imputed | 7 868 024 | 7 867 856 | 74 | 0 | 0 |
| MusDiv | 584 920 | 548 363 | 72 | 612 | 0 |
| NIEHS | 8 238 764 | 8 230 026 | 16 | 1830 | 22 620 |
| GNF | 156 513 | 155 677 | 76 | 611 | 243 |
| Broad | 138 602 | 138 594 | 48 | 233 | 9 |
| Celera | 2 122 060 | 2 122 059 | 5 | 0 | 0 |
| Paigen | 24 608 | 24 608 | 50 | 0 | 0 |
| Wild Derived | 667 | 667 | 37 | 0 | 0 |
Conflicts in SNP genotypes between different data sources
| | NIEHS | MusDiv | Broad | GNF | Celera | Paigen | Wild derived |
|---|---|---|---|---|---|---|---|
| Imputed | 17 339 | 393 768 | 2793 | 5065 | 19 914 | 5374 | 19 |
| NIEHS | | 37 916 | 5526 | 10 141 | 18 118 | 1048 | 0 |
| MusDiv | | | 21 222 | 46 475 | 1343 | 2404 | 0 |
| Broad | | | | 354 | 886 | 247 | 0 |
| GNF | | | | | 3292 | 6138 | 1 |
| Celera | | | | | | 728 | 1 |
| Paigen | | | | | | | 0 |
Each cell gives the number of SNPs for which the two indicated sources disagree on at least one strain genotype.
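
The conflict criterion described in the note above can be sketched in Python. Everything about the data layout here is an assumption made for illustration: each source is a dict mapping a SNP identifier to a dict of per-strain base calls, `N` stands for a missing call and is ignored, and the function name `count_conflicts`, the SNP identifiers and the genotypes are all hypothetical. None of it reflects the actual CGDSNPdb schema or data.

```python
from typing import Dict

# Assumed layout: snp_id -> {strain name: base call}
Genotypes = Dict[str, Dict[str, str]]


def count_conflicts(source_a: Genotypes, source_b: Genotypes, missing: str = "N") -> int:
    """Count SNPs where two sources disagree on at least one shared strain."""
    conflicts = 0
    # Only SNPs present in both sources can conflict.
    for snp_id in source_a.keys() & source_b.keys():
        calls_a, calls_b = source_a[snp_id], source_b[snp_id]
        # Only strains genotyped by both sources are compared.
        for strain in calls_a.keys() & calls_b.keys():
            a, b = calls_a[strain], calls_b[strain]
            if missing in (a, b):
                continue  # a missing call is not treated as a disagreement
            if a != b:
                conflicts += 1
                break  # one disagreeing strain is enough; count the SNP once
    return conflicts


# Toy data, invented for this example.
source_x: Genotypes = {
    "snp1": {"C57BL/6J": "A", "DBA/2J": "G"},
    "snp2": {"C57BL/6J": "C", "DBA/2J": "C"},
    "snp3": {"C57BL/6J": "T"},
}
source_y: Genotypes = {
    "snp1": {"C57BL/6J": "A", "DBA/2J": "A"},  # disagrees on DBA/2J
    "snp2": {"C57BL/6J": "C", "DBA/2J": "N"},  # missing call, not a conflict
    "snp4": {"C57BL/6J": "G"},                 # not shared, never compared
}

n_conflicts = count_conflicts(source_x, source_y)
```

Counting each SNP once, however many strains disagree, matches the "at least one strain genotype" wording. Whether missing calls should count as disagreements is a design choice that the source does not specify.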
Figure 1. A screen capture of the CGDSNP interface search input form.
Figure 2. A screen capture of the standard summary page for SNP search, specifically showing results for the tumor suppressor P53, using the imputed data resource. Background colors in the genotype table represent the specific nucleotide and the imputed confidence level, with darkest colors representing the highest confidence.