Eli Reuveni, Vasily E Ramensky, Cornelius Gross.
Abstract
BACKGROUND: The mapping of quantitative trait loci in rat and mouse has been extremely successful in identifying chromosomal regions associated with human disease-related phenotypes. However, identifying the specific phenotype-causing DNA sequence variations within a quantitative trait locus has been much more difficult. The recent availability of genomic sequence from several mouse inbred strains (including C57BL/6J, 129X1/SvJ, 129S1/SvImJ, A/J, and DBA/2J) has made it possible to catalog DNA sequence differences within a quantitative trait locus derived from crosses between these strains. However, even for well-defined quantitative trait loci (<10 Mb) the identification of candidate functional DNA sequence changes remains challenging due to the high density of sequence variation between strains.

DESCRIPTION: To help identify functional DNA sequence variations within quantitative trait loci we have used the Ensembl annotated genome sequence to compile a database of mouse single nucleotide polymorphisms (SNPs) that are predicted to cause missense, nonsense, frameshift, or splice site mutations (available at http://bioinfo.embl.it/SnpApplet/). For missense mutations we have used the PolyPhen and PANTHER algorithms to predict whether amino acid changes are likely to disrupt protein function.
Year: 2007 PMID: 17239255 PMCID: PMC1797019 DOI: 10.1186/1471-2164-8-24
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Classification of single nucleotide polymorphisms (SNPs) derived from the comparison of four mouse inbred strains.

| Strain comparison | SNP class | SNPs | Genes |
|---|---|---|---|
| C57BL/6J vs. DBA/2J | non-synonymous | 9,079 (96.6%) | 2,956 (90.3%) |
| | stop-gained | 153 (1.7%) | 153 (4.7%) |
| | stop-lost | 26 (0.3%) | 26 (0.8%) |
| | splice-site | 129 (1.4%) | 129 (4%) |
| | frameshift | 13 (0.2%) | 13 (0.4%) |
| | Total | 9,400 | 3,277 |
| C57BL/6J vs. 129S1/SvImJ | non-synonymous | 4,084 (96.5%) | 1,597 (92.1%) |
| | stop-gained | 73 (1.8%) | 67 (3.9%) |
| | stop-lost | 7 (0.2%) | 7 (0.5%) |
| | splice-site | 68 (1.7%) | 63 (3.7%) |
| | frameshift | 1 (0.1%) | 1 (0.1%) |
| | Total | 4,233 | 1,735 |
| C57BL/6J vs. A/J | non-synonymous | 12,435 (96.2%) | 3,653 (90.7%) |
| | stop-gained | 262 (2.1%) | 191 (4.8%) |
| | stop-lost | 22 (0.2%) | 21 (0.6%) |
| | splice-site | 182 (1.5%) | 154 (3.9%) |
| | frameshift | 27 (0.3%) | 11 (0.3%) |
| | Total | 12,928 | 4,030 |

Total number and frequency of Ensembl-predicted missense (non-synonymous), stop-gained, stop-lost, frameshift, and splice-site SNPs for C57BL/6J vs. DBA/2J, C57BL/6J vs. 129S1/SvImJ, and C57BL/6J vs. A/J.
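The coding consequence classes in this table can be illustrated with a simplified sketch that compares the reference and alternate codon under the standard genetic code. This is not Ensembl's annotation pipeline: the function `classify_substitution` and its arguments are hypothetical, and splice-site and frameshift calls are left out because they depend on exon boundaries and insertions/deletions rather than on a single codon.

```python
from itertools import product

# Standard genetic code, enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(ref_codon: str, pos: int, alt_base: str) -> str:
    """Classify a single-base change inside one codon (hypothetical helper).

    Assumes the codon is read on the coding strand and pos is the 0-based
    position of the SNP within the codon.
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos] + alt_base.upper() + ref_codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":          # a sense codon became a stop codon
        return "stop-gained"
    if ref_aa == "*":          # the original stop codon was lost
        return "stop-lost"
    return "non-synonymous"    # amino acid change (missense)


print(classify_substitution("ATT", 1, "C"))  # Ile (ATT) -> Thr (ACT)
print(classify_substitution("TGG", 2, "A"))  # Trp (TGG) -> stop (TGA)
```

A lookup table keeps the sketch transparent; a production annotator would also need transcript models to locate codons, strand handling, and indel logic.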
Figure 1. Mouse SNP Miner database structure. The core database is a MySQL relational database containing information on predicted functional mouse SNPs from a selected set of mouse inbred strains. A web-based Java applet module allows querying, visualization, and downloading of information from the database. Basic information about SNP sequence, location, functional consequence, and associated transcript is derived from private and public sequencing efforts via the dbSNP mouse polymorphism collection mapped onto the annotated Ensembl genome. Additional SNP information was extracted from the OMIM, SymAtlas, and GO databases. GO clustering by GeneMerge is queried directly by the applet viewer prior to downloading. PolyPhen assessment of missense mutation consequences was based on NRDB orthologous protein alignments, PDB structure information, and protein functional annotation from UniProt. PANTHER assessment of missense mutation consequences was based on a set of HMM protein alignments. Bold font and arrows pointing out of the database indicate direct web links from our database to associated database entries.
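As a rough illustration of the relational core described in Figure 1, the sketch below stores SNP records and algorithm calls in separate tables and joins them to find SNPs that both predictors flag. It uses Python's built-in `sqlite3` module in place of MySQL, and every table name, column name, and toy row is a hypothetical assumption rather than the published Mouse SNP Miner schema.

```python
import sqlite3

# Hypothetical minimal schema; not the actual Mouse SNP Miner tables.
SCHEMA = """
CREATE TABLE snp (
    snp_id      TEXT PRIMARY KEY,  -- e.g. a dbSNP accession
    chromosome  TEXT NOT NULL,
    position    INTEGER NOT NULL,  -- base-pair coordinate
    transcript  TEXT NOT NULL,     -- associated Ensembl transcript
    consequence TEXT NOT NULL      -- non-synonymous, stop-gained, ...
);
CREATE TABLE prediction (
    snp_id    TEXT NOT NULL REFERENCES snp(snp_id),
    algorithm TEXT NOT NULL,       -- 'PolyPhen' or 'PANTHER'
    verdict   TEXT NOT NULL,       -- e.g. 'deleterious', 'benign', 'unknown'
    PRIMARY KEY (snp_id, algorithm)
);
"""


def deleterious_by_both(conn: sqlite3.Connection) -> list[str]:
    """Return ids of SNPs called deleterious by both PolyPhen and PANTHER."""
    rows = conn.execute(
        """
        SELECT s.snp_id
        FROM snp AS s
        JOIN prediction AS pp ON pp.snp_id = s.snp_id AND pp.algorithm = 'PolyPhen'
        JOIN prediction AS pa ON pa.snp_id = s.snp_id AND pa.algorithm = 'PANTHER'
        WHERE pp.verdict = 'deleterious' AND pa.verdict = 'deleterious'
        ORDER BY s.snp_id
        """
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Toy rows, invented for illustration only.
conn.executemany("INSERT INTO snp VALUES (?, ?, ?, ?, ?)", [
    ("snp1", "4", 100, "tx1", "non-synonymous"),
    ("snp2", "4", 200, "tx1", "non-synonymous"),
])
conn.executemany("INSERT INTO prediction VALUES (?, ?, ?)", [
    ("snp1", "PolyPhen", "deleterious"), ("snp1", "PANTHER", "deleterious"),
    ("snp2", "PolyPhen", "deleterious"), ("snp2", "PANTHER", "non-deleterious"),
])
print(deleterious_by_both(conn))
```

Keeping predictions in their own table (one row per SNP and algorithm) lets further predictors be added without altering the SNP table.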
Summary of PolyPhen and PANTHER annotations of missense SNPs.

| Strain comparison | Algorithm | Prediction | SNPs | Genes |
|---|---|---|---|---|
| C57BL/6J vs. DBA/2J | PolyPhen | Deleterious | 1,755 (19.4%) | 954 (26.1%) |
| | | Benign | 6,757 (74.5%) | 2,343 (64%) |
| | | Unknown | 567 (6.3%) | 367 (10.1%) |
| | | Total | 9,079 | 3,664 |
| | PANTHER | Deleterious | 1,285 (14.2%) | 632 (17.1%) |
| | | Non-deleterious | 4,678 (51.6%) | 1,736 (47%) |
| | | Unknown | 3,116 (34.4%) | 1,329 (36%) |
| | | Total | 9,079 | 3,697 |
| C57BL/6J vs. 129S1/SvImJ | PolyPhen | Deleterious | 796 (19.5%) | 480 (24.9%) |
| | | Benign | 2,995 (73.4%) | 1,270 (65.7%) |
| | | Unknown | 293 (7.2%) | 185 (9.6%) |
| | | Total | 4,084 | 1,935 |
| | PANTHER | Deleterious | 568 (14%) | 327 (16.9%) |
| | | Non-deleterious | 2,049 (50.2%) | 913 (47.1%) |
| | | Unknown | 1,467 (36%) | 701 (36.2%) |
| | | Total | 4,084 | 1,941 |
| C57BL/6J vs. A/J | PolyPhen | Deleterious | 2,350 (18.9%) | 1,197 (26.1%) |
| | | Benign | 9,220 (74.2%) | 2,889 (63%) |
| | | Unknown | 865 (7%) | 504 (11%) |
| | | Total | 12,435 | 4,590 |
| | PANTHER | Deleterious | 1,785 (14.4%) | 798 (17.1%) |
| | | Non-deleterious | 6,443 (51.9%) | 2,154 (46.1%) |
| | | Unknown | 4,207 (33.9%) | 1,724 (36.9%) |
| | | Total | 12,435 | 4,676 |
According to PolyPhen, 20% of missense mutations contained in the database are predicted to be deleterious (either 'possibly' or 'probably' damaging) to protein function. According to PANTHER, 14% of missense mutations contained in the database are predicted to be deleterious to protein function.
| PANTHER \ PolyPhen | Deleterious | Benign | Not predicted |
|---|---|---|---|
| Deleterious | 1,647 | 2,092 | 73 |
| Not deleterious | 1,857 | 11,569 | 310 |
| Not predicted | 1,818 | 5,888 | 1,889 |
PolyPhen and PANTHER predictions overlap significantly: 6.1% of missense mutations (1,647 of 27,143) are categorized as deleterious by both algorithms.
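The percentages quoted above can be re-derived from the cross-tabulation, assuming (as the marginal totals suggest) that rows are PANTHER calls and columns are PolyPhen calls. The sketch also quantifies the overlap with an odds ratio and a Pearson chi-square statistic on the 2x2 subtable of SNPs that received a call from both methods; these statistics are an illustrative choice, not necessarily the test used by the authors.

```python
# Counts from the PolyPhen x PANTHER table above.
# Assumption: outer keys are PANTHER calls, inner keys are PolyPhen calls.
COUNTS = {
    "deleterious":     {"deleterious": 1647, "benign": 2092,  "not_predicted": 73},
    "not_deleterious": {"deleterious": 1857, "benign": 11569, "not_predicted": 310},
    "not_predicted":   {"deleterious": 1818, "benign": 5888,  "not_predicted": 1889},
}

total = sum(sum(row.values()) for row in COUNTS.values())
polyphen_del = sum(row["deleterious"] for row in COUNTS.values())  # column sum
panther_del = sum(COUNTS["deleterious"].values())                  # row sum
both_del = COUNTS["deleterious"]["deleterious"]

print(f"missense SNPs: {total}")
print(f"PolyPhen deleterious: {polyphen_del / total:.1%}")
print(f"PANTHER deleterious: {panther_del / total:.1%}")
print(f"deleterious by both: {both_del / total:.1%}")

# 2x2 subtable restricted to SNPs with a call from both algorithms.
a = COUNTS["deleterious"]["deleterious"]      # both deleterious
b = COUNTS["deleterious"]["benign"]           # PANTHER only
c = COUNTS["not_deleterious"]["deleterious"]  # PolyPhen only
d = COUNTS["not_deleterious"]["benign"]       # neither
n = a + b + c + d
odds_ratio = (a * d) / (b * c)
# Pearson chi-square without continuity correction (1 degree of freedom).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(f"odds ratio: {odds_ratio:.2f}, chi-square: {chi2:.0f}")
```

With these counts the marginals reproduce the roughly 20% (PolyPhen) and 14% (PANTHER) figures and the 6.1% joint rate, and the odds ratio is about 4.9, well above the value of 1 expected if the two predictors were independent.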
Summary of OMIM annotations of predicted functional SNPs.
| non-synonymous (damaging SNPs) | 803 | 859 |
| stop-gained | 90 | 99 |
| stop-lost | 14 | 15 |
| splice-site | 114 | 116 |
| frameshift | 36 | 37 |
More than 15% of genes in the database that contain at least one predicted functional SNP (splice-site, frameshift, stop-gained, stop-lost, or deleterious missense according to PolyPhen) have human orthologs found in the OMIM database of disease-associated mutations.
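The gene-level rule above (a gene counts if it carries at least one splice-site, frameshift, stop-gained, stop-lost, or PolyPhen-deleterious missense SNP) can be written as a filter followed by a set intersection. This is a minimal sketch; the record layout, the helper names, the gene names, and the OMIM flags are hypothetical toy data.

```python
# Classes that count as functional regardless of PolyPhen.
FUNCTIONAL_CLASSES = {"splice-site", "frameshift", "stop-gained", "stop-lost"}


def is_functional(consequence, polyphen_call):
    """Apply the 'predicted functional SNP' definition (hypothetical helper)."""
    if consequence in FUNCTIONAL_CLASSES:
        return True
    # Missense SNPs count only if PolyPhen calls them deleterious.
    return consequence == "non-synonymous" and polyphen_call == "deleterious"


def omim_fraction(snps, genes_with_omim_ortholog):
    """Fraction of genes with >=1 functional SNP whose human ortholog is in OMIM."""
    genes = {gene for gene, cons, call in snps if is_functional(cons, call)}
    if not genes:
        return 0.0
    return len(genes & genes_with_omim_ortholog) / len(genes)


# Toy records: (gene, consequence, PolyPhen call or None).
snps = [
    ("geneA", "stop-gained", None),
    ("geneB", "non-synonymous", "benign"),
    ("geneC", "non-synonymous", "deleterious"),
    ("geneD", "splice-site", None),
]
print(omim_fraction(snps, {"geneA", "geneB"}))
```

Here geneB is excluded because its only SNP is benign, so one of the three qualifying genes (geneA) has an OMIM ortholog.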
Figure 2. Web-based access to the Mouse SNP Miner database. The 'SNP Query' mode (not shown) allows selection of a strain or strain group for comparison. Searches can be constrained by chromosomal interval, gene name, SNP accession number, and presence of a human ortholog in the OMIM database. QTL from the MGI database can be searched by name or keyword, and their associated chromosomal intervals imported for convenient screening. The 'SNP View' mode (shown) presents search results in a graphical format for convenient interactive scanning. SNPs in the interval are listed by functional consequence and PolyPhen/PANTHER prediction in the upper left and can be rapidly added to or removed from the display by clicking on the associated box. In the graphical display, boxes indicating transcripts and lines indicating SNPs are color- and symbol-coded by functional consequence. Marks placed above and below the bar indicate transcripts on the forward and reverse strands, respectively. Clicking on a SNP displays detailed SNP information in the 'Details' window above. The 'Associations' window displays GO, OMIM, and PolyPhen/PANTHER information and links for the selected SNP. Buttons at the bottom of the graphical display allow movement along the chromosome and between SNPs. In the example shown, a search has been performed for putative functional SNPs differing between C57BL/6J and all 129 strains (the pattern '129%' matches every 129 strain) for the interval 153,482,802–154,678,264 bp on chromosome 4. A deleterious mutation (Ile706Thr) in the fourth transmembrane domain of Tas1r3 is highlighted.
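The '129%' strain pattern behaves like an SQL `LIKE` wildcard, where `%` matches any run of characters. A minimal sketch of an interval query under that reading follows; it uses `sqlite3` in place of MySQL, the table layout, helper name, and toy rows are hypothetical, and each row is assumed to record a SNP at which the named strain differs from C57BL/6J.

```python
import sqlite3

# Hypothetical table: one row per SNP where the strain differs from C57BL/6J.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE strain_snp (snp_id TEXT, strain TEXT, chromosome TEXT,"
    " position INTEGER, consequence TEXT)"
)
# Toy rows, invented for illustration only.
conn.executemany("INSERT INTO strain_snp VALUES (?, ?, ?, ?, ?)", [
    ("snpA", "129S1/SvImJ", "4", 153_500_000, "non-synonymous"),
    ("snpB", "129X1/SvJ", "4", 154_000_000, "stop-gained"),
    ("snpC", "DBA/2J", "4", 154_000_000, "non-synonymous"),
    ("snpD", "129S1/SvImJ", "4", 160_000_000, "splice-site"),
])


def query_interval(conn, strain_pattern, chromosome, start, end):
    """SNPs whose strain matches a LIKE pattern, within [start, end] bp."""
    rows = conn.execute(
        "SELECT DISTINCT snp_id FROM strain_snp"
        " WHERE strain LIKE ? AND chromosome = ? AND position BETWEEN ? AND ?"
        " ORDER BY snp_id",
        (strain_pattern, chromosome, start, end),
    )
    return [row[0] for row in rows]


# '129%' matches every strain name that starts with '129'.
print(query_interval(conn, "129%", "4", 153_482_802, 154_678_264))
```

Passing the pattern as a bound parameter, rather than formatting it into the SQL string, keeps user-supplied strain names from altering the query.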