Konstantinos Karagiannis, Vahan Simonyan, Raja Mazumder.
Abstract
Amino acid changes due to non-synonymous variation are included as annotations for individual proteins in UniProtKB/Swiss-Prot and RefSeq, which present biological data in a protein- or gene-centric fashion. Unfortunately, proteome-wide analysis of non-synonymous single-nucleotide variations (nsSNVs) is not easy to perform because information on nsSNVs and functionally important sites is not well integrated, either within or between databases and their search engines. We have developed SNVDis, a tool that allows evaluation of proteome-wide nsSNV distribution in functional sites, domains and pathways. More specifically, we have integrated human-specific data from major variation databases (UniProtKB, dbSNP and COSMIC), comprehensive sequence feature annotation from UniProtKB, Pfam, RefSeq and the Conserved Domain Database (CDD), and pathway information from Protein ANalysis THrough Evolutionary Relationships (PANTHER), and mapped all of them in a uniform and comprehensive way to the human reference proteome provided by UniProtKB/Swiss-Prot. Integrated information on active sites, binding sites, domains and pathways, extracted from a number of different sources, provides a detailed overview of how nsSNVs are distributed over the human proteome and pathways and how they intersect with functional sites of proteins. Additionally, it is possible to find out whether there is an over- or under-representation of nsSNVs in specific domains, pathways or user-defined protein lists. The underlying datasets are updated once every 3 months. SNVDis is freely available at http://hive.biochemistry.gwu.edu/tool/snvdis.
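The abstract does not state which statistical test SNVDis applies when reporting over- or under-representation. As a rough illustration only, the sketch below assumes a simple binomial null model in which every residue of the proteome is equally likely to carry an nsSNV, so the expected share of nsSNVs in a region equals its share of the total sequence length. The function names, the significance threshold and the numbers in the usage lines are all hypothetical and are not taken from the paper.

```python
from math import exp, lgamma, log


def binomial_tails(k, n, p):
    """Return (P[X <= k], P[X >= k]) for X ~ Binomial(n, p), with 0 < p < 1.

    The probability mass function is evaluated in log space via lgamma,
    so large counts do not overflow.
    """
    def pmf(i):
        log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return exp(log_choose + i * log(p) + (n - i) * log(1 - p))

    lower = sum(pmf(i) for i in range(0, k + 1))
    upper = sum(pmf(i) for i in range(k, n + 1))
    return lower, upper


def classify_representation(snvs_in_region, total_snvs,
                            region_length, proteome_length, alpha=0.05):
    """Label a region 'over', 'under' or 'neither' under the assumed null.

    Assumption (not from the source): nsSNVs fall uniformly along residues,
    so the expected fraction in a region is region_length / proteome_length.
    """
    p = region_length / proteome_length
    expected = total_snvs * p
    lower, upper = binomial_tails(snvs_in_region, total_snvs, p)
    if snvs_in_region > expected and upper < alpha:
        return "over"
    if snvs_in_region < expected and lower < alpha:
        return "under"
    return "neither"


# Hypothetical numbers: a region covering 5% of the residues,
# with 200 nsSNVs observed proteome-wide (10 expected in the region).
print(classify_representation(30, 200, 50, 1000))
print(classify_representation(10, 200, 50, 1000))
print(classify_representation(2, 200, 50, 1000))
```

A real analysis would likely need a more careful null model (for example, one that accounts for uneven mutation rates or sequencing coverage) and correction for testing many domains or pathways at once; the one-sided tails here are only meant to make the idea of over- and under-representation concrete.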
Year: 2012 PMID: 23618375 PMCID: PMC3807806 DOI: 10.1016/j.gpb.2012.10.003
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. SNVDis data integration model. Additional data can be easily integrated into the database if they are mapped to any sequence database accession numbers or identifiers.
Figure 2. Distribution of nsSNVs. A. The total number of nsSNVs from different databases that fall inside active sites as annotated by UniProt or CDD. B. Similarly, the number of nsSNVs that can be found inside binding site regions defined by UniProt or CDD. Notice that in both cases, all databases have a large percentage of unique entries.