SNP@Domain: a web resource of single nucleotide polymorphisms (SNPs) within protein domain structures and sequences.

Areum Han, Hyo Jin Kang, Yoobok Cho, Sunghoon Lee, Young Joo Kim, Sungsam Gong.

Abstract

Single nucleotide polymorphisms (SNPs) in conserved protein regions are thought to be strong candidates for altering protein function. We have therefore developed SNP@Domain, a web resource for identifying SNPs within human protein domains. We annotated SNPs from dbSNP with both structure-based and sequence-based protein domains: (i) structure-based domains from SCOP and (ii) sequence-based domains from Pfam, to avoid conflicts between the two domain assignment methodologies. Users can investigate SNPs within protein domains using 2D and 3D maps. We expect this visual annotation of SNPs within protein domains to help scientists select and interpret SNPs associated with diseases. A web interface for SNP@Domain is freely available at http://snpnavigator.net/ and from http://bioportal.net/.


Year: 2006 PMID: 16845090 PMCID: PMC1538855 DOI: 10.1093/nar/gkl323

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

To facilitate the identification of disease-associated single nucleotide polymorphisms (SNPs) among the large number of known SNPs, it is important to select functionally relevant SNPs (1). There are many SNP annotation servers and databases, such as FESD, PicSNP, SNPper and SNPs3D. These are useful for selecting SNPs without a priori biological knowledge (2–13). They help biologists focus on specific genomic/proteomic regions or gene sets by providing functional annotations and visualization. SNPs in conserved protein regions are thought to be strong candidates for altering protein function (8,11). However, until now there have been no web servers that provide extensive protein domain annotation of SNPs. Currently, Ensembl (14) provides domain annotation of SNPs assigned by Pfam (15), PROSCAN (16) and PFscan (17). However, these are all sequence-based functional domains derived from protein sequence profiles.

Structure-based approaches define domains according to the compactness and conservation of protein structural regions (18), while sequence-based domain databases are constructed from the sequence similarity of proteins, which implies evolutionary relationships (19,20). If a structure-based domain family and a sequence-based domain family are defined at the same location over the same set of protein chains, they should map exactly to each other in a protein. However, they are known to conflict (19,20). SCOP (21) is a representative structure-based classification database for the Protein Data Bank (PDB) (22). It lists all the proteins with known structures and organizes them hierarchically. Pfam (15) is a representative sequence-based domain database that contains hidden Markov model-based profiles of many common protein domains constructed using multiple sequence alignments. Previously, Elofsson's group (19) reported that 70% of SCOP families exist in Pfam, while 57% of Pfam families exist in SCOP. More recent research by Zhang's group (20) shows that 80% of SCOP domains overlap with at least one Pfam family. These SCOP domain families correspond to 99.7% of the Pfam families. Although the overlaps increased (SCOP, from 70 to 80%; Pfam, from 57 to 99.7%), partial mapping between SCOP and Pfam domains can still occur. Zhang's group reported that in only 62% of the one-to-one mappings of a SCOP domain to a Pfam domain did the two domains agree over 90% or more of their coverage (20).

Since a non-synonymous SNP corresponds to an amino acid change, it is necessary to have a good protein domain annotation and visualization server. Here, we introduce the SNP@Domain server, which provides information on SNPs found within protein domains. SNP@Domain contains all the human SNPs from dbSNP (23) that match SCOP and Pfam domain sequences assigned to Ensembl database proteins. A 2D map of Pfam and SCOP domains with SNPs is provided. Additionally, a 3D map of SNPs within domains is provided if protein structures are available.
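
The notion of a SCOP domain and a Pfam domain agreeing over 90% or more of their coverage can be made concrete with a small sketch. This is only an illustration and not the procedure used in (20): it assumes that each domain is a 1-based, inclusive residue range on the same protein, and that "agreement" means the shared region covers at least the threshold fraction of both domains. The function names and example coordinates are hypothetical.

```python
from typing import Tuple

# A domain is modeled as a 1-based, inclusive residue range (start, end).
Interval = Tuple[int, int]


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of residues shared by two inclusive ranges (0 if disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def mutual_coverage(a: Interval, b: Interval) -> Tuple[float, float]:
    """Fraction of each domain that is covered by the shared region."""
    shared = overlap_length(a, b)
    len_a = a[1] - a[0] + 1
    len_b = b[1] - b[0] + 1
    return shared / len_a, shared / len_b


def domains_agree(a: Interval, b: Interval, threshold: float = 0.9) -> bool:
    """Assumed criterion: both domains are covered at or above the threshold."""
    cov_a, cov_b = mutual_coverage(a, b)
    return cov_a >= threshold and cov_b >= threshold


# Hypothetical coordinates, for illustration only.
scop_domain = (10, 110)
pfam_domain = (15, 112)
print(mutual_coverage(scop_domain, pfam_domain))
print(domains_agree(scop_domain, pfam_domain))
```

Under this assumed criterion, a Pfam domain that covers only half of a SCOP domain would be a partial mapping, which is the case where showing both annotations side by side is most informative.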

METHODS AND USAGE

Identifying SNPs within protein domains

We annotated protein domains to human proteins in the Ensembl database and mapped all SNPs from dbSNP (23). Since the Ensembl database already provides Pfam domain annotation, we performed the structure-based domain assignment ourselves, using the PDB-ISL method (24,25) with SCOP version 1.65 and Ensembl human proteins. Domains were classified by keeping BLAST-matched regions with an E-value of 1e-4 or lower. In total, 17 639 SNPs within SCOP domains and 28 238 SNPs within Pfam domains were identified. Furthermore, 4226 (12.39%) human proteins had at least one SNP within SCOP domain regions and 6781 (19.88%) human proteins had at least one SNP within Pfam domain regions. Two useful SNP annotations were parsed with Perl scripts and subsequently imported into a MySQL relational database: (i) the effects of SNPs predicted by the Sorting Intolerant from Tolerant (SIFT) server (11) and (ii) the relationships between SNPs and diseases from the Online Mendelian Inheritance in Man database (26).
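
The two steps described above, keeping BLAST-matched regions at or below the E-value cutoff and then finding the SNPs that fall inside the retained regions, can be sketched as follows. This is a minimal Python illustration and not the authors' Perl pipeline. The record types, field names and identifiers are hypothetical, and SNP positions are assumed to be already expressed as 1-based residue positions on the protein.

```python
from dataclasses import dataclass
from typing import Dict, List

E_VALUE_CUTOFF = 1e-4  # cutoff stated in the text


@dataclass(frozen=True)
class DomainHit:
    """A BLAST-matched domain region on a protein (hypothetical record)."""
    protein_id: str
    domain_id: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    e_value: float


@dataclass(frozen=True)
class ProteinSnp:
    """A SNP placed on a protein residue position (hypothetical record)."""
    rs_id: str
    protein_id: str
    position: int  # 1-based residue position


def keep_significant(hits: List[DomainHit],
                     cutoff: float = E_VALUE_CUTOFF) -> List[DomainHit]:
    """Keep only matched regions with an E-value at or below the cutoff."""
    return [hit for hit in hits if hit.e_value <= cutoff]


def snps_in_domains(snps: List[ProteinSnp],
                    hits: List[DomainHit]) -> Dict[str, List[str]]:
    """Map each SNP id to the domains whose region contains its position."""
    result: Dict[str, List[str]] = {}
    for snp in snps:
        for hit in hits:
            same_protein = hit.protein_id == snp.protein_id
            if same_protein and hit.start <= snp.position <= hit.end:
                result.setdefault(snp.rs_id, []).append(hit.domain_id)
    return result


# Made-up identifiers and coordinates, for illustration only.
hits = [
    DomainHit("ENSP_DEMO_A", "d.1.1", 10, 50, 1e-10),
    DomainHit("ENSP_DEMO_A", "b.2.2", 60, 90, 0.5),  # fails the cutoff
]
snps = [
    ProteinSnp("rs_demo_1", "ENSP_DEMO_A", 20),
    ProteinSnp("rs_demo_2", "ENSP_DEMO_A", 70),
    ProteinSnp("rs_demo_3", "ENSP_DEMO_B", 20),
]
print(snps_in_domains(snps, keep_significant(hits)))
```

The nested loop is adequate for a sketch; at the scale of all dbSNP entries, an interval index per protein would be a more reasonable choice.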

Two- and three-dimensional maps of SNPs within protein domains

SNP@Domain is a web-based tool constructed using Java Server Pages and Perl Common Gateway Interface scripts. SNP@Domain provides three query interfaces, as shown in Figure 1: (i) SNP identifier (rs number), (ii) gene identifier (Ensembl protein ID) and (iii) domain identifier (SCOP concise classification string ID or Pfam ID). SNP@Domain also supports keyword searches with gene and/or domain names. When the user queries an SNP or a gene name, the 2D image map of SNPs within protein domains is displayed, as shown in Figure 2. This 2D image map uses the Generic Genome Browser (Gbrowse), originally developed by Stein's group (27). The 2D map has four kinds of horizontal tracks, corresponding to SCOP domains, Pfam domains, synonymous SNPs and non-synonymous SNPs within a protein. For convenience, synonymous and non-synonymous SNPs are displayed separately. The queried SNPs are highlighted in the map so that they can be easily distinguished. Each SNP in the 2D map links to detailed information on the SNP, such as chromosomal position, class, validation, alleles, effects predicted by the SIFT server and relationships with disease(s), if available. If the structure of the protein is available in the PDB, SNP@Domain provides a 3D view of the protein highlighting the amino acids affected by SNPs. To avoid sequence conflicts between an Ensembl protein sequence and a PDB sequence, SNP@Domain runs BLAST with the Ensembl protein sequence as the query against the protein sequence from the PDB and parses the hits. We use the MDL Chime plugin, which was developed based on RasMol (28), for visualizing 3D structures of proteins.
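
Resolving sequence conflicts between an Ensembl protein and a PDB sequence amounts to translating a residue position on one sequence into the matching position on the other through the parsed alignment. The sketch below shows one way this could be done from a single gapped alignment of the kind BLAST reports. It is an assumption-laden illustration, not the server's code: the function name is hypothetical, gaps are assumed to be written as "-", and the returned positions index the PDB sequence rather than the residue numbers in the structure file.

```python
from typing import Dict


def map_query_to_subject(query_aln: str, subject_aln: str,
                         query_start: int = 1,
                         subject_start: int = 1) -> Dict[int, int]:
    """Map query residue positions to subject positions along one alignment.

    query_aln and subject_aln are the two gapped rows of a pairwise
    alignment (gaps as "-"); the start values are the 1-based positions
    of the first aligned residue in each sequence.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned rows must have the same length")

    mapping: Dict[int, int] = {}
    q_pos, s_pos = query_start, subject_start
    for q_char, s_char in zip(query_aln, subject_aln):
        # Only columns with a residue in both rows give a correspondence.
        if q_char != "-" and s_char != "-":
            mapping[q_pos] = s_pos
        # Advance each counter only when its row holds a residue.
        if q_char != "-":
            q_pos += 1
        if s_char != "-":
            s_pos += 1
    return mapping


# Made-up alignment rows, for illustration only.
positions = map_query_to_subject("MKT-AYI", "MK-LAYI",
                                 query_start=5, subject_start=1)
print(positions.get(8))  # query residue 8 aligned to a subject residue
print(positions.get(7))  # query residue 7 falls opposite a gap
```

A SNP whose residue falls opposite a gap has no counterpart in the structure and would simply not be highlighted in a 3D view built this way.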

Figure 1

Search interface of SNP@Domain. The user is able to search SNP domain annotations with three inputs including (i) SNP identifier (rs number), (ii) Gene identifier (Ensembl protein ID) or name/symbol, and (iii) Domain identifier (SCOP concise classification strings ID or Pfam ID) or name.
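
A search box that accepts these three identifier types plus free-text names needs some way to tell them apart. The following sketch routes a query string by pattern. It is purely illustrative and not taken from the SNP@Domain source: the patterns assume the commonly used identifier shapes (rs followed by digits, ENSP followed by 11 digits, PF followed by 5 digits, and a four-part SCOP classification string such as d.144.1.7), and anything else is treated as a keyword.

```python
import re

# Assumed identifier shapes; the server's real validation rules are unknown.
_PATTERNS = [
    ("snp", re.compile(r"rs\d+", re.IGNORECASE)),  # dbSNP rs number
    ("gene", re.compile(r"ENSP\d{11}")),           # Ensembl protein ID
    ("pfam", re.compile(r"PF\d{5}")),              # Pfam accession
    ("scop", re.compile(r"[a-z]\.\d+\.\d+\.\d+")),  # SCOP classification string
]


def classify_query(text: str) -> str:
    """Return the query type for an input string, or "keyword" as fallback."""
    token = text.strip()
    for kind, pattern in _PATTERNS:
        if pattern.fullmatch(token):
            return kind
    return "keyword"


for example in ["rs3088308", "ENSP00000000001", "PF00069", "d.144.1.7", "kinase"]:
    print(example, "->", classify_query(example))
```

Using `fullmatch` rather than `match` keeps a partial hit, such as a gene name that merely starts with "rs", from being misrouted to the SNP lookup.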

Figure 2

An example of detailed information and image maps for an SNP within protein domains. Following the user's query for the SNP (rs number = ‘rs3088308’), the SNP's detailed information, including chromosomal location, class, validation and alleles, is displayed, together with a summary of the domain mapping results and the corresponding 2D image map. The 2D image map has four tracks: (i) Pfam domain, (ii) SCOP domain, (iii) synonymous SNPs and (iv) non-synonymous SNPs within the protein. The 3D image map of the SNP is also available.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

28 in total

1. topoSNP: a topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association.

Authors: Nathan O Stitziel; T Andrew Binkowski; Yan Yuan Tseng; Simon Kasif; Jie Liang
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

2. The generic genome browser: a building block for a model organism system database.

Authors: Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

3. SNPper: retrieval and analysis of human SNPs.

Authors: A Riva; I S Kohane
Journal: Bioinformatics Date: 2002-12 Impact factor: 6.937

4. A flexible motif search technique based on generalized profiles.

Authors: P Bucher; K Karplus; N Moeri; K Hofmann
Journal: Comput Chem Date: 1996-03

5. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01 Impact factor: 16.971

6. RASMOL: biomolecular graphics for all.

Authors: R A Sayle; E J Milner-White
Journal: Trends Biochem Sci Date: 1995-09 Impact factor: 13.807

7. SCOP: a structural classification of proteins database for the investigation of sequences and structures.

Authors: A G Murzin; S E Brenner; T Hubbard; C Chothia
Journal: J Mol Biol Date: 1995-04-07 Impact factor: 5.469

8. The PROSITE database, its status in 1995.

Authors: A Bairoch; P Bucher; K Hofmann
Journal: Nucleic Acids Res Date: 1996-01-01 Impact factor: 16.971

9. Human non-synonymous SNPs: server and survey.

Authors: Vasily Ramensky; Peer Bork; Shamil Sunyaev
Journal: Nucleic Acids Res Date: 2002-09-01 Impact factor: 16.971

10. Target SNP selection in complex disease association studies.

Authors: Matthias Wjst
Journal: BMC Bioinformatics Date: 2004-07-12 Impact factor: 3.169
