
MutDB services: interactive structural analysis of mutation data.

Jessica Dantzer, Charles Moad, Randy Heiland, Sean Mooney.

Abstract

Non-synonymous single nucleotide polymorphisms (SNPs) and mutations have been associated with human phenotypes and disease. As more and more SNPs are mapped to phenotypes, understanding how these variations affect the function and expression of genes and gene products becomes an important endeavor. We have developed a set of tools to aid in the understanding of how amino acid substitutions affect protein structures. To do this, we have annotated SNPs in dbSNP and amino acid substitutions in Swiss-Prot with protein structural information, if available. We then developed a novel web interface to this data that allows for visualization of the location of these substitutions. We have also developed a web service interface to the dataset and developed interactive plugins for UCSF's Chimera structural modeling tool and PyMOL that integrate our annotations with these sophisticated structural visualization and modeling tools. The web services portal and plugins can be downloaded from http://www.lifescienceweb.org/ and the web interface is at http://www.mutdb.org/.

Year: 2005 PMID: 15980479 PMCID: PMC1160165 DOI: 10.1093/nar/gki404

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Understanding how missense single nucleotide polymorphisms (SNPs) affect the function of proteins is an important research area that is being studied using genetics, biochemistry, evolutionary biology and bioinformatics (1–4). Efficient identification of SNPs would be useful for SNP selection for genetic studies, understanding the molecular basis of disease, and predicting the effects of in vitro and in vivo mutagenesis experiments. Several web resources are available for the prediction or classification of mutation effects. Two notable examples are SIFT, which utilizes evolutionary information from homologous proteins (5), and PolyPhen, which incorporates structural information into classification rules (6). Other resources include SNP3D, SNPeffect, PicSNP and TopoSNP. Additionally, several projects have focused on using machine learning methods to classify deleterious mutations (7–13). One area of particular interest is in understanding how mutations and missense SNPs affect the structure of the proteins in which they are encoded. We have developed MutDB as a resource for scientists to identify the likely underlying molecular effects of a mutation, and to visualize the location of mutations upon protein structures (14). To enable researchers to investigate whether structural information is available, we are providing a website and novel web services for visualizing mutation sites, as well as an infrastructure for annotating these mutations. To do this, we have annotated all missense SNPs in dbSNP (15) and the mutations in the Swiss-Prot (16) database with structural information, if available. We also built a website to access and visualize these annotations. In addition, we developed a web service API, using the SOAP protocol, for accessing our annotations, and extended two applications, Chimera (17) and PyMOL, for interactive visualization of the annotation sets.

METHODS AND USAGE

Database annotations

Structure annotations are determined for the mutation data in two publicly available databases, Swiss-Prot and dbSNP. The flat files containing the data housed in these databases were downloaded to a local machine then parsed using Perl and the BioPerl toolkit (18), when appropriate. Several pieces of information were mined for each SNP: the original source identification number, the associated protein or mRNA sequence, the location of the SNP, the wild-type and mutant amino acids involved in the exchange, and the nucleotides involved in the exchange. For positions found in the Swiss-Prot database, any related PubMed articles were also cataloged. This data was stored in a local MySQL database. Tables in the database were created to hold the most recent updates for the data in MutDB, as well as to hold the annotations of associated PDB (19) structures mapped to each gene. All of these tables are maintained and updated by several Perl scripts. For each gene containing an SNP, it was desirable to find any associated protein structures. Using the protein sequences recorded from the Swiss-Prot and dbSNP databases, a BLAST search was run for both the wild-type and mutant amino acid sequences. Results from these searches were only kept if they were identical to the query sequence. Local copies of PDB files were then used to find the exactly matching positions within the protein sequence, using a pattern-matching algorithm and a pairwise alignment.
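The position-mapping step described above can be illustrated with a minimal Python sketch. It is not the authors' Perl implementation: it replaces the pattern-matching and pairwise-alignment step with a plain exact substring search, and it assumes that the residues of the PDB chain are numbered contiguously from a known starting residue number. The function name and all parameters are hypothetical.

```python
from typing import Optional


def map_position_to_chain(
    protein_seq: str,
    chain_seq: str,
    chain_start_resnum: int,
    mutation_pos: int,
) -> Optional[int]:
    """Map a 1-based position in protein_seq to a residue number in a chain.

    Only exact matches are accepted: chain_seq must occur verbatim inside
    protein_seq, mirroring the identity filter described in the text.
    Returns None if the chain does not match or does not cover the position.
    Assumption: chain residues are numbered contiguously from
    chain_start_resnum (real PDB files may contain gaps or insertion codes).
    """
    # 0-based index at which the chain sequence starts within the protein
    offset = protein_seq.find(chain_seq)
    if offset < 0:
        return None

    # 0-based index of the mutated residue within the chain sequence
    index_in_chain = (mutation_pos - 1) - offset
    if not 0 <= index_in_chain < len(chain_seq):
        return None

    return chain_start_resnum + index_in_chain
```

A caller would run this once per candidate chain and keep only the chains for which a residue number is returned. Structures with numbering gaps would need a lookup against the residue numbers actually recorded in the PDB file rather than the simple offset used here.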

Web interface

MutDB is a web-based tool, built using several Perl CGI scripts. Navigation is possible through browsing the list of genes or searching by several different parameters, including the NCBI protein ID, Swiss-Prot gene ID, gene symbol and Refseq mRNA ID. A keyword search is also available, to find genes as they relate to particular diseases or by their full name. The genes listed are from the UCSC human genome database and are displayed in alphabetic order. Each gene link takes the user to a page listing all available SNP data. A map of all SNPs to the gene in question is shown, as well as a catalog of each SNP. A few other pieces of information that may be useful are also displayed, such as the chromosome on which the gene is found and links to pages in other databases, including the many parts of NCBI's website and Swiss-Prot. The SNPs are divided into three categories: mutations, synonymous mutations and non-coding SNPs. For each SNP in the list, the source ID, wild-type and mutant amino acids, sequence location and any PubMed document ID numbers are displayed. The source ID values also serve as links to separate pages about each SNP. Each SNP page shows the same information as the relevant gene page, plus some additional data. Relevant PDB structures are presented, along with a way to display them via the Jmol visualization tool. The mutations are highlighted in the structure so that they are more easily recognized. Also for each SNP, the amino acid sequence, if it is known, is given, with the actual mutation shown in red. These pages also include links to the original source data from Swiss-Prot or dbSNP.

Web service

Web services are a standard means of application-to-application communication over the Internet. A web service is any service that is available over the Internet, uses messages encoded in XML (eXtensible Markup Language) and is independent of any operating system and programming language. SOAP is the current standard XML-based protocol for communication, and the XML-based Web Services Description Language (WSDL) is the mechanism for describing services. Web services are becoming more prevalent in the bioinformatics community. Some examples include the new (beta) RCSB PDB (19), KEGG (20) and BioMOBY (21). The web services for MutDB represent the contents of our resources and can easily be interfaced with other existing biological databases and applications. Our core services allow access to mutation information from either a protein structure or a gene-based perspective. This helps visualization clients to map mutations of interest onto the protein structure being viewed. Our service interface is published on our distribution site and may be extended as new feature annotations are included. Attributes are declared to represent mutations, irrespective of whether each is an SNP or a Swiss-Prot mutation. These attributes include, but are not limited to, the source identification number, amino acid position, wild-type amino acid, mutant-type amino acid, phenotype and PubMed references if they exist. Structural attributes are encoded as the PDB code, chain and the starting/ending residue index. These attributes allow for a trivial mapping to an object-oriented representation of the mutations. In addition to the MutDB services, several other utility services are offered. These include a PDB to Gene (and vice versa) mapping service that has been developed to make the data easier to navigate. For example, a researcher could use this functionality to map mutations involving their gene of interest to a structure.
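The object-oriented mapping mentioned above might look like the following Python sketch. It is only an illustration of the attribute list in the text: the class names, field names and dictionary keys are hypothetical and are not taken from the published service interface, and the record is assumed to have already been decoded from the SOAP response into a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructureMapping:
    """Structural attributes: PDB code, chain, and start/end residue index."""
    pdb_code: str
    chain: str
    start_residue: int
    end_residue: int


@dataclass
class Mutation:
    """One annotated substitution, whether it came from dbSNP or Swiss-Prot."""
    source_id: str
    position: int
    wild_type: str
    mutant: str
    phenotype: Optional[str] = None
    pubmed_ids: List[str] = field(default_factory=list)
    structures: List[StructureMapping] = field(default_factory=list)

    def label(self) -> str:
        """Short label such as 'D42E' (wild type, position, mutant)."""
        return f"{self.wild_type}{self.position}{self.mutant}"


def mutation_from_record(record: dict) -> Mutation:
    """Build a Mutation from a decoded record (all keys are hypothetical)."""
    structures = [StructureMapping(**s) for s in record.get("structures", [])]
    return Mutation(
        source_id=record["source_id"],
        position=int(record["position"]),
        wild_type=record["wild_type"],
        mutant=record["mutant"],
        phenotype=record.get("phenotype"),
        pubmed_ids=list(record.get("pubmed_ids", [])),
        structures=structures,
    )


def mutations_on_chain(
    mutations: List[Mutation], pdb_code: str, chain: str
) -> List[Mutation]:
    """Keep only the mutations annotated on the given PDB chain."""
    return [
        m for m in mutations
        if any(s.pdb_code == pdb_code and s.chain == chain for s in m.structures)
    ]
```

Keeping the structural attributes in a separate class lets one mutation carry several chain mappings, which is what a structure-based query such as `mutations_on_chain` relies on.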

Visualization clients

To readily take advantage of the MutDB services, we provide clients that are plugins for use in two well-developed molecular graphics tools: PyMOL and UCSF's Chimera. Upon installation, a user has full access to the contents of MutDB, and all structures and annotations are dynamically displayed based on the user's input. The user is not required to type advanced display commands, because the client leverages the power of all the services previously mentioned. The visualization plugins allow for interactive exploration of the data. For example, in Chimera, once the plugin is installed, a new pull-down submenu will appear under ‘Tools’ labeled ‘LSW Services’. Under LSW Services, a menu item will be listed as ‘MutDB’. Clicking on this will bring up the MutDB controller. Several queries are possible here. First, and most easily, the user can enter the gene symbol, such as BRCA1, TP53 or AR. This will query the service and display the PDB IDs that are associated with that gene symbol. Upon selection of a PDB and chain ID, the chain is downloaded by our PDB chain service and the mutation positions are highlighted in red on the resulting structure, as depicted in Figure 1a (Chimera) and Figure 1b (PyMOL). Additionally, a list of the mapped non-synonymous SNPs and mutations is displayed in the mutation list box. When the user selects a position from this box, the mutation is highlighted on the structure, and the source accession number, PubMed ID and comments are displayed in the information box.
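To give a sense of the display commands that the plugins spare the user from typing, the following Python sketch builds PyMOL command strings that highlight a set of residues on one chain. It is an illustration only and not the plugin's code: the function name and the selection name `mutations` are hypothetical, and the residue numbers are assumed to be PDB residue numbers on the chosen chain. The sketch only constructs strings, so it runs without PyMOL installed.

```python
from typing import Iterable, List


def pymol_highlight_commands(
    chain: str, residues: Iterable[int], color: str = "red"
) -> List[str]:
    """Return PyMOL commands that select and color the given residues.

    Residue numbers are de-duplicated and sorted, then joined with '+',
    which is PyMOL's syntax for a list of residue identifiers.
    """
    resi = "+".join(str(r) for r in sorted(set(residues)))
    if not resi:
        return []  # nothing to highlight

    selection = f"chain {chain} and resi {resi}"
    return [
        f"select mutations, {selection}",  # named selection (hypothetical name)
        f"color {color}, mutations",       # color the selected residues
        "show sticks, mutations",          # draw their side chains as sticks
    ]
```

Inside a PyMOL session, each returned string could be typed at the command line or passed to `cmd.do`; a Chimera client would need equivalent commands in Chimera's own command language.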

Figure 1

Visualization of mutations through web service-enabled client applications. (a) UCSF Chimera is shown with the mutation web service controller window showing a mutation in the TP53 gene. (b) PyMOL is shown with the mutation web service controller window showing a mutation in the BRCA1 gene.

DISCUSSION

We have developed a novel tool for the interactive visualization of structurally associated mutation data. We have made the interface intuitive and based on gene symbol, allowing for users not familiar with the PDB to find and visualize structures. Additionally, we have developed an infrastructure for delivering structurally relevant mutation annotations for incorporation into open source visualization tools (Figure 2).

Figure 2

Web interface for structural visualization of mutation data. A mutation in TP53 is highlighted showing an aspartate-to-glutamate substitution.

Currently, we have annotated 2422 genes with 3587 mutations from Swiss-Prot (release 44) with structure annotations. There are 3339 distinct chains from the PDB in this set. The dbSNP (build 122) has a total of 1487 SNPs annotated with protein structures. A total of 733 structures have been solved for mutations in Swiss-Prot and 789 structures have been solved for non-synonymous SNPs in dbSNP. Currently, there are over 17 000 genes and transcript variants in MutDB, though not all of them have relevant SNP data. Our web services are all distributed through the lifescienceweb.org domain and the web browser interface can be found at .

19 in total

1. KEGG: kyoto encyclopedia of genes and genomes.

Authors: M Kanehisa; S Goto
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure-based assessment of amino acid variation.

Authors: D Chasman; R M Adams
Journal: J Mol Biol Date: 2001-03-23 Impact factor: 5.469

Review 3. Integration of genome data and protein structures: prediction of protein folds, protein interactions and "molecular phenotypes" of single nucleotide polymorphisms.

Authors: S Sunyaev; W Lathe; P Bork
Journal: Curr Opin Struct Biol Date: 2001-02 Impact factor: 6.809

4. Prediction of deleterious human alleles.

Authors: S Sunyaev; V Ramensky; I Koch; W Lathe; A S Kondrashov; P Bork
Journal: Hum Mol Genet Date: 2001-03-15 Impact factor: 6.150

5. Predicting deleterious amino acid substitutions.

Authors: P C Ng; S Henikoff
Journal: Genome Res Date: 2001-05 Impact factor: 9.043

6. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003.

Authors: Brigitte Boeckmann; Amos Bairoch; Rolf Apweiler; Marie-Claude Blatter; Anne Estreicher; Elisabeth Gasteiger; Maria J Martin; Karine Michoud; Claire O'Donovan; Isabelle Phan; Sandrine Pilbout; Michel Schneider
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

7. The Bioperl toolkit: Perl modules for the life sciences.

Authors: Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

8. Evaluation of structural and evolutionary contributions to deleterious mutation prediction.

Authors: Christopher T Saunders; David Baker
Journal: J Mol Biol Date: 2002-09-27 Impact factor: 5.469

Review 9. Bioinformatics approaches and resources for single nucleotide polymorphism functional analysis.

Authors: Sean Mooney
Journal: Brief Bioinform Date: 2005-03 Impact factor: 11.622

10. Human non-synonymous SNPs: server and survey.

Authors: Vasily Ramensky; Peer Bork; Shamil Sunyaev
Journal: Nucleic Acids Res Date: 2002-09-01 Impact factor: 16.971
