Stephen J Goodswen, Cedric Gondro, Nathan S Watson-Haigh, Haja N Kadarmideen.
Abstract
BACKGROUND: Whole genome association studies using highly dense single nucleotide polymorphisms (SNPs) are a set of methods to identify DNA markers associated with variation in a particular complex trait of interest. One of the main outcomes from these studies is a subset of statistically significant SNPs. Finding the potential biological functions of such SNPs can be an important step towards further use in human and agricultural populations (e.g., for identifying genes related to susceptibility to complex diseases or genes playing key roles in development or performance). The current challenge is that the information holding the clues to SNP functions is distributed across many different databases. Efficient bioinformatics tools are therefore needed to seamlessly integrate up-to-date functional information on SNPs. Many web services have arisen to meet the challenge, but most work only within the framework of human medical research. Although we acknowledge the importance of human research, we see a need for SNP annotation tools for other organisms.

DESCRIPTION: We introduce an R package called FunctSNP, which is the user interface to custom-built species-specific databases. The local relational databases contain SNP data together with functional annotations extracted from online resources. FunctSNP provides a unified bioinformatics resource to link SNPs with functional knowledge (e.g., genes, pathways, ontologies). We also introduce dbAutoMaker, a suite of Perl scripts, which can be scheduled to run periodically to automatically create or update the customised SNP databases. We illustrate the use of FunctSNP with a livestock example, but the approach and software tools presented here can also be applied to human and other organisms.
Year: 2010 PMID: 20534127 PMCID: PMC2901372 DOI: 10.1186/1471-2105-11-311
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Existing SNP annotation tools
| Name | Description | Type+ | Species | External Resources Used |
|---|---|---|---|---|
| SNPit | Analyses the potential functional significance of SNPs derived from genome wide association studies. | W | Human | dbSNP, Entrez Gene, UCSC Browser, HGMD, ECR Browser, Haplotter, SIFT |
| SNP Function Portal | Explores the functional | WD | Human | dbSNP, UniSTS, NCBI ideogram, Entrez Gene, NCBI human genome assembly, HapMap II project |
| Pupasuite | SNP prioritization and characterisation | WD | Human | Ensembl, Gene Ontology, OMIM |
| SNPnexus | Provides functional annotation for both novel and public SNPs | WD | Human | NCBI RefSeq, UCSC Known Genes, VEGA, AceView |
| LS-SNP/PDB | Annotates non-synonymous SNPs mapped to Protein Data Bank structures | WD | Human | UniProtKB, Genome Browser, dbSNP, PDB |
| F-SNP | Computationally predicts functional SNPs for disease association studies | WD | Human | PolyPhen, SIFT, SNPeffect, SNPs3D, LS-SNP, ESEfinder, RescueESE, ESRSearch, PESX, Ensembl, TFSearch, Consite, GoldenPath, KinasePhos, OGPET, Sulfinator |
| FANS | Functional analysis of novel SNPs | WD | Human | NCBI, Ensembl, UCSC BLAT, Rescue-ESE |
| SNPer | Facilitates the retrieval and use of Human SNPs for high-throughput research purposes. | WD | Human | dbSNP, GoldenPath |
| SNAP | An integrated SNP annotation platform | W | Human | Ensembl, UCSC, UniProt, Pfam, DAS-CBS, MINT, BIND, KEGG, TreeFam |
| SNPeffect | Predicts the effect of non-synonymous | W | Human | Ensembl human databases |
| SNPs3D | Assigns molecular functional effects of non-synonymous SNPs based on structure and sequence analysis. | W | Human | dbSNP, SWISS-Prot, RefSeq, Bioscience, Gene Ontology, KEGG, BIND, OMIM, HGMD, Entrez Gene |
+Type = Application type.
W = Web interface to external resources; WD = Web interface to custom database.
Figure 1. Schematic to show the process of finding the potential function of significant SNPs. rs2901126 is a supposed significant SNP but is not located on a gene and has no known biological effect (as is the case for most WGAS-derived significant SNPs). Is it a false positive? In this example, rs2901126 is in linkage disequilibrium with an untagged SNP (a SNP not on the genotyping array). The untagged SNP can be linked to gene functions that may indicate it contributes to variation in a complex trait. Therefore, rs2901126 is potentially a reliable DNA marker.
External resources used to create SNP customised database
| Acronym | Name | Resource |
|---|---|---|
| dbSNP | Single Nucleotide Polymorphism database | SNPs |
| GO | Gene Ontology | Genes and gene product attributes |
| KEGG | Kyoto Encyclopaedia of Genes and Genomes | Biological pathways |
| UniProt | Universal Protein Resource | Protein sequences and functional information |
| QTLdb++ | Animal Quantitative Trait Locus database | Quantitative Trait Loci data (QTL) |
| OMIA++ | Online Mendelian Inheritance in Animals | Genes, inherited disorders and traits |
| HomoloGene | | Homolog detection |
++Not applicable to Homo sapiens
Figure 2. Schematic to show the dbAutoMaker database creation process. dbAutoMaker is a suite of Perl scripts developed to automatically repeat the steps of 'download-decompress-convert-import' to create a local database with data extracted from any number of resources.
Figure 3. Database schema for the custom-built SNP database.
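The 'download-decompress-convert-import' cycle described for dbAutoMaker can be illustrated with a minimal Python sketch. This is not the dbAutoMaker code (which is a suite of Perl scripts) and it does not reproduce the schema of Figure 3: the `snp` table, its three columns, the tab-delimited input layout, and every function name below are assumptions made for illustration only. The download step is replaced by an in-memory gzip payload so that the example runs without network access.

```python
import csv
import gzip
import io
import sqlite3


def decompress(raw_bytes):
    """Decompress a gzip payload that stands in for a downloaded resource file."""
    return gzip.decompress(raw_bytes).decode("utf-8")


def convert(text):
    """Convert tab-delimited text into (snp_id, chromosome, position) tuples."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(row[0], row[1], int(row[2])) for row in reader if row]


def import_rows(conn, rows):
    """Import rows into a hypothetical 'snp' table, replacing stale entries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snp ("
        "snp_id TEXT PRIMARY KEY, chromosome TEXT, position INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO snp VALUES (?, ?, ?)", rows)
    conn.commit()


def build_database(raw_bytes, conn):
    """Run decompress -> convert -> import and return the resulting row count."""
    rows = convert(decompress(raw_bytes))
    import_rows(conn, rows)
    return conn.execute("SELECT COUNT(*) FROM snp").fetchone()[0]


# Toy payload standing in for a downloaded dump (identifiers and positions are invented).
payload = gzip.compress(b"rs0001\t1\t1500\nrs0002\t1\t98000\n")
conn = sqlite3.connect(":memory:")
print(build_database(payload, conn))
```

Because the import uses `INSERT OR REPLACE`, re-running the same steps on a newer dump updates existing rows instead of duplicating them, which is one plausible way to support the scheduled create/update behaviour the caption describes.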
FunctSNP R Functions
| Name | Description |
|---|---|
| addSpecies | Adds a new species to the list of species recognized by FunctSNP |
| downloadDB | Download pre-assembled species-specific databases |
| getGeneID | Extract gene ID information using SNP IDs or SNP locations |
| getGenes | Extract gene information using SNP IDs or Gene IDs |
| getGenesByDist | Extract gene ID within a specified distance from a SNP |
| getGO | Extract gene ontology using SNP IDs or Gene IDs |
| getHighScoreSNP | Extract highest scoring SNP using SNP IDs or SNP locations |
| getHomolo | Extract homologous genes using SNP IDs or Gene IDs |
| getKEGG | Extract pathway names using SNP IDs or Gene IDs |
| getNearGenes | Find nearest genes to either SNP IDs or SNP locations |
| getOMIA | Extract OMIA using SNP IDs or Gene IDs |
| getProteins | Extract protein information using SNP IDs or Gene IDs |
| getSNPID | Extract SNP ID using Gene IDs or SNP locations |
| getSNPs | Extract SNP information using SNP IDs or Gene IDs |
| getTraits | Extract traits associated with QTL regions using SNP IDs or Gene IDs |
| installedDBs | Displays the local available databases |
| makeDB | Build a species-specific database from external sources |
| setSpecies | Sets the default species |
| supportedSpecies | Displays the supported species |
| userAddedSpecies | Displays the species added by user |
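To make the distance-based lookups in the table more concrete (for example `getGenesByDist` and `getNearGenes`), the following Python sketch shows one way such a query could work in principle. It is not FunctSNP's implementation and does not use its R interface, whose argument names are not given here; the `Gene` record, both function names, and the toy coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: identifier plus chromosome interval."""
    gene_id: str
    chromosome: str
    start: int
    end: int


def distance_to_gene(chromosome, position, gene):
    """Return the gap in base pairs between a SNP and a gene.

    Returns 0 when the SNP lies inside the gene and None when the
    gene is on a different chromosome.
    """
    if gene.chromosome != chromosome:
        return None
    if position < gene.start:
        return gene.start - position
    if position > gene.end:
        return position - gene.end
    return 0


def genes_by_distance(chromosome, position, genes, max_distance):
    """List (gene_id, distance) pairs within max_distance, nearest first."""
    hits = []
    for gene in genes:
        d = distance_to_gene(chromosome, position, gene)
        if d is not None and d <= max_distance:
            hits.append((gene.gene_id, d))
    return sorted(hits, key=lambda hit: hit[1])


# Toy data (invented coordinates).
genes = [
    Gene("G1", "1", 1000, 2000),
    Gene("G2", "1", 15000, 16000),
    Gene("G3", "2", 1000, 2000),
]
print(genes_by_distance("1", 6000, genes, 10000))
```

The first element of the returned list is the nearest gene, so a nearest-gene lookup can be read off the same result; a SNP located on a gene reports a distance of 0.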
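Summary of output counts after four program test runs: the number of SNPs meeting the input criteria ($$) and the number of entries found in the database for those SNPs (++)
| # | SNPs meeting criteria | WGAS-derived SNPs | SNPs with score ≥ 20 | Genes | Protein products | Pathways | Gene Ontology terms | Homologous human genes |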
|---|---|---|---|---|---|---|---|---|
| 1 | 165 | 165 | 1 | 52 | 23 | 43 | 259 | 47 |
| 2 | 7 | 7 | 1 | 7 | 4 | 19 | 29 | 5 |
| 3 | 15 | 1 | 8 | 15 | 9 | 29 | 114 | 13 |
| 4 | 11 | 0 | 11 | 6 | 10 | 21 | 126 | 6 |
# = test run number
$$ The criteria determine the number of SNPs input into the program:
1: all 165 significant SNPs
2: significant SNPs with score greater than 7
3: SNPs with score greater than 11 within 10,000 base pairs from significant SNPs
4: SNPs with score greater than 24 located on genes within 10,000 base pairs from significant SNPs
++ Example: For run 3, the highest scoring SNPs within 10,000 base pairs from the 165 significant SNPs were found in the database; 15 of the found SNPs met the criterion of score > 11; 1 of the 15 SNPs was a WGAS-derived SNP; 8 of the 15 SNPs have a score ≥ 20. The 15 SNPs are located on 15 genes with 9 protein products. The genes are involved in 29 pathways, have 114 Gene Ontology terms, and are homologous with 13 human genes.
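The selection rule used in run 3 (take the highest scoring SNP within a window around each significant SNP, then keep it only if its score exceeds a threshold) can be sketched in Python as follows. This is an illustrative reading of the criteria, not the code used to produce the table: the function name, the dictionary layout, the inclusive window boundary, and all toy values are assumptions.

```python
def highest_scoring_nearby(significant, candidates, window, min_score):
    """Pick, for each significant SNP, the top-scoring candidate within `window` bp.

    `significant` is a list of (chromosome, position) pairs; `candidates` is a
    list of dicts with keys 'id', 'chrom', 'pos' and 'score'. A candidate is
    kept only if its score is strictly greater than `min_score`. Returns the
    sorted IDs of the selected SNPs, without duplicates.
    """
    selected = set()
    for chrom, pos in significant:
        nearby = [
            c for c in candidates
            if c["chrom"] == chrom and abs(c["pos"] - pos) <= window
        ]
        if not nearby:
            continue
        best = max(nearby, key=lambda c: c["score"])
        if best["score"] > min_score:
            selected.add(best["id"])
    return sorted(selected)


# Toy data (invented identifiers, positions, and scores).
significant = [("1", 5000), ("1", 50000)]
candidates = [
    {"id": "a", "chrom": "1", "pos": 4000, "score": 12},
    {"id": "b", "chrom": "1", "pos": 9000, "score": 25},
    {"id": "c", "chrom": "1", "pos": 52000, "score": 8},
    {"id": "d", "chrom": "2", "pos": 5000, "score": 30},
]
print(highest_scoring_nearby(significant, candidates, window=10000, min_score=11))
```

Raising `min_score` or adding a further filter (such as requiring the SNP to lie on a gene, as in run 4) narrows the selection in the same way the successive test runs do.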