Anthony G Doran, Christopher J Creevey.
Abstract
BACKGROUND: Single nucleotide polymorphisms (SNPs) are the most abundant genetic variant found in vertebrates and invertebrates. SNP discovery has become a highly automated, robust and relatively inexpensive process allowing the identification of many thousands of mutations for model and non-model organisms. Annotating large numbers of SNPs can be a difficult and complex process. Many tools available are optimised for use with organisms densely sampled for SNPs, such as humans. There are currently few tools available that are species non-specific or support non-model organism data.
Year: 2013 PMID: 23390980 PMCID: PMC3574845 DOI: 10.1186/1471-2105-14-45
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The number of SNP annotations (ss#) in dbSNP (build 137) for species with a reference sequence available from Ensembl and at least one SNP annotation in dbSNP
| 60480978 | |
| 15721131 | |
| 10016093 | |
| 9587248 | |
| 5227114 | |
| 3328578 | |
| 3295452 | |
| 3041918 | |
| 1751345 | |
| 1660250 | |
| 1441888 | |
| 1319269 | |
| 1194131 | |
| 1163580 | |
| 903110 | |
| 566003 | |
| 327037 | |
| 331438 | |
| 9256 | |
| 2140 | |
| 1644 | |
| 10 | |
| 5 |
Figure 1. Overview of using SNPdat and the additional scripts available. (A) Retrieval of GTF and FASTA information using GTF_FASTA_finder.pl. (B) Retrieval and processing of data from dbSNP using dbSNP_finder.pl and SNPdat_parse_dbSNP.pl. (C) Command line options used to specify input/output files for SNPdat.
Summary description of the annotations provided by SNPdat
| Column | Description | Example |
| 1 | The queried SNP's chromosome ID | CHR25 |
| 2 | The queried SNP's genomic location | 286966 |
| 3 | Whether or not the SNP was within a feature | Y |
| 4 | Region containing the SNP; either exonic, intronic, or intergenic | Exonic |
| 5 | Distance to nearest feature | NA |
| 6 | Either the closest feature to the SNP or the feature containing the SNP | CDS |
| 7 | The number of different features that the SNP is annotated to | 2 |
| 8 | The number of annotations of the current feature | [1/1] |
| 9 | Start of feature (bp) | 286859 |
| 10 | End of feature (bp) | 287050 |
| 11 | The gene ID for the current feature | ENSBTAG00000016571 |
| 12 | The gene name for the current feature | ITFG3_BOVIN |
| 13 | The transcript ID for the current feature | ENSBTAT00000022045 |
| 14 | The transcript name for the current feature | ITFG3_BOVIN |
| 15 | The exon that contains the current feature and the total number of annotated exons for the gene containing the feature | [3/11] |
| 16 | The strand sense of the feature | + |
| 17 | The annotated reading frame (when contained in GTF) | 2 |
| 18 | The reading frame estimated by SNPdat | NA |
| 19 | The estimated number of stop codons in the estimated reading frame | 0 |
| 20 | The codon containing the SNP, position in the codon and reference base and mutation | C[C/G]T |
| 21 | The amino acid for the reference codon and new amino acid with mutation in place | [P/R] |
| 22 | Whether or not the mutation is synonymous | N |
| 23 | The protein ID for the current feature | ENSBTAP00000022045 |
| 24 | The RS identifier for queries that map to known SNPs | rs134558771 |
| 25 | Error messages, warnings etc. | NA |
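Columns 20 to 22 of the summary above are linked: the amino acid change and the synonymous flag both follow from the codon string. The sketch below is a minimal illustration of that relationship, not SNPdat's own implementation. It assumes the codon is already written in the coding orientation, that the bracket notation is exactly `prefix[ref/alt]suffix` as in the example value, and that the standard genetic code applies; the names `annotate_codon`, `CODON_FIELD` and `CODON_TABLE` are hypothetical.

```python
import re

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Hypothetical pattern for the column 20 notation, e.g. "C[C/G]T".
CODON_FIELD = re.compile(r"^([ACGT]*)\[([ACGT])/([ACGT])\]([ACGT]*)$")


def annotate_codon(field):
    """Derive column 21 and 22 style values from a column 20 style codon string."""
    match = CODON_FIELD.match(field.upper())
    if match is None:
        raise ValueError(f"unrecognised codon field: {field!r}")
    prefix, ref_base, alt_base, suffix = match.groups()
    ref_codon = prefix + ref_base + suffix
    alt_codon = prefix + alt_base + suffix
    if len(ref_codon) != 3:
        raise ValueError(f"expected a 3-base codon, got {ref_codon!r}")
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return {
        "codon_position": len(prefix) + 1,  # 1-based position of the SNP
        "amino_acids": f"[{ref_aa}/{alt_aa}]",
        "synonymous": "Y" if ref_aa == alt_aa else "N",
    }


print(annotate_codon("C[C/G]T"))
```

For the example row, `C[C/G]T` expands to the codons CCT and CGT, which gives `[P/R]` at codon position 2 and a synonymous flag of `N`, in agreement with the example values for columns 21 and 22.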
The number of SNPs annotated to different regions by SNPdat and Annovar
| Region | SNPdat | Annovar |
| Coding | 299 | 299 |
| 3 prime UTR | 108 | 105 |
| 5 prime UTR | 29 | 28 |
| Intronic | 3285 | 3284 |
| Intergenic | 845 | 845 |
| Misc. | 0 | 5 |
| Total | 4566 | 4566 |
Misc features include non-coding RNA and splicing. These features were not included in the GTF version of the ensGene annotation file and so SNPdat was unable to identify them as such.
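The comparison above can be checked with a few lines of arithmetic. The sketch below assumes the first numeric column holds the SNPdat counts and the second the Annovar counts, which is inferred from the note that SNPdat could not identify the miscellaneous features; the variable names are illustrative only.

```python
# Region -> (SNPdat count, Annovar count), copied from the table above.
# The column order is an assumption inferred from the "Misc." note.
counts = {
    "Coding": (299, 299),
    "3 prime UTR": (108, 105),
    "5 prime UTR": (29, 28),
    "Intronic": (3285, 3284),
    "Intergenic": (845, 845),
    "Misc.": (0, 5),
}

snpdat_total = sum(snpdat for snpdat, _ in counts.values())
annovar_total = sum(annovar for _, annovar in counts.values())

# Regions where the two tools disagree, as SNPdat minus Annovar.
differences = {
    region: snpdat - annovar
    for region, (snpdat, annovar) in counts.items()
    if snpdat != annovar
}

print(snpdat_total, annovar_total)
print(differences)
```

Both columns sum to the reported total of 4566. Under the assumed column order, the five SNPs that Annovar places in the miscellaneous category are offset by five additional SNPdat assignments spread across the 3 prime UTR (3), 5 prime UTR (1) and intronic (1) categories.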
Figure 2. Sample of plots obtained using the results of SNPdat. (A) The number of non-synonymous (black) and total number of exonic SNPs (grey) found on each chromosome. (B) Distances of intergenic SNPs, upstream (black) and downstream (grey), to the nearest transcripts. (C) Synonymous versus non-synonymous SNPs: 231 exonic SNPs were non-synonymous, of which 96 (41.56%) were in the first codon position, 103 (44.59%) in the second codon position and 32 (13.85%) in the third codon position. (D) Distances of intronic SNPs to the nearest exon.
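The codon-position percentages in panel (C) follow directly from the three counts. A minimal sketch of that calculation, using only the numbers given in the caption (the variable names are illustrative):

```python
# Non-synonymous exonic SNPs by codon position, from the Figure 2 caption.
position_counts = {1: 96, 2: 103, 3: 32}

total = sum(position_counts.values())

# Share of non-synonymous SNPs at each codon position, to two decimals.
percentages = {
    position: round(100 * count / total, 2)
    for position, count in position_counts.items()
}

print(total)
print(percentages)
```

The three counts sum to the 231 non-synonymous SNPs reported, and the rounded shares reproduce the 41.56%, 44.59% and 13.85% quoted in the caption.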