
PinSnps: structural and functional analysis of SNPs in the context of protein interaction networks.

Hui-Chun Lu, Julián Herrera Braga, Franca Fraternali.

Abstract

We present a practical computational pipeline to readily perform data analyses of protein-protein interaction networks by using genetic and functional information mapped onto protein structures. We provide a 3D representation of the available protein structure and its regions (surface, interface, core and disordered) for the selected genetic variants and/or SNPs, and a prediction of the mutants' impact on the protein as measured by a range of methods. We have mapped in total 2587 genetic disorder-related SNPs from OMIM, 587 873 cancer-related variants from COSMIC, and 1 484 045 SNPs from dbSNP. All result data can be downloaded by the user together with an R-script to compute the enrichment of SNPs/variants in selected structural regions.
AVAILABILITY AND IMPLEMENTATION: PinSnps is available as an open-access service at http://fraternalilab.kcl.ac.uk/PinSnps/
CONTACT: franca.fraternali@kcl.ac.uk
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year:  2016        PMID: 27153707      PMCID: PMC4978923          DOI: 10.1093/bioinformatics/btw153

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

High-throughput experiments are routinely performed to decipher genetic, metabolic and protein-protein interaction networks (PPINs), and bioinformaticians are compelled to develop efficient and accurate tools to assist decision-making based on available data from multiple sources (Chung ; Fernandes ; Lu ). Bioinformatics applications that merge available genomic, interaction and structural data can be broadly classified into exploratory and predictive tools. The former comprise tools that map and visualize the merged data (Kelley ; Lees ; Mosca ; Niknafs ; Pappalardo and Wass, 2014; Ryan ; Vazquez ), while predictive tools are quantitative estimators of the potential impact of SNPs/variants and offer an assessment in terms of scores or pseudo free-energy metrics (Adzhubei ; Betts ; Li ; Ng and Henikoff, 2003; Pires ; Pires ; Pires ; Yates ). In this application, we use 3D interactome networks and their homologs to highlight how human variants and disease-causing mutations may affect protein function and complex stability. Recent studies have used the structural information of PPINs to understand the molecular mechanisms of binding partner selection (Fornili ). These methods consider only the interactions that have a representative 3D structure, or a close homolog with a 3D structure, which adds weight to the existence of the observed protein interactions (or network links) in a given PPIN (Hooda and Kim, 2012; Kim ; Lees ; Meyer ; Mosca ; Wang ). Multiple studies have pointed out that the interfaces of protein complexes harbour mutations associated with diseases (Espinosa ; Gao ; Kamburov ; Nishi ; Studer ; Wang ; Yates and Sternberg, 2013a,b). The evaluation of the impact of genomic variation on coding regions can be enhanced by mapping SNPs to distinct regions of protein structure, i.e. surface, interface or core.
To generate a comprehensive mapping of available SNPs onto PPINs, we developed the automatic pipeline PinSnps (for details see Supplementary Fig. S2), which extracts structure-integrated human PPINs enriched with information from homologous protein domains with sequence identity higher than 30%. Its main strengths and differences from previous approaches lie in (i) the use of homologous structures of human protein sequences in the PPINs to map the studied variants, which more than doubles the available positional 3D information; (ii) the mapping onto predefined protein regions (surface, core, interface), along with the mapping of functional sites and Post-Translational Modifications (PTMs) obtained from UniProt (UniProt Consortium, 2015); and (iii) the option for users to download the query data in various file formats (Fig. 1). The region and site mapping, together with precompiled predictions of the SNP/variant’s impact from multiple predictors, can help users to quantitatively assess and evaluate the functional implications of their studied variants. The annotation of both intra- and inter-domain disordered regions as predicted by DISOPRED2 (Ward ) has also been included in the pipeline, as recent studies imply the importance of these regions in regulating biological functions (Cline and Karchin, 2011; Gibbs and Showalter, 2015; Wright and Dyson, 2015).
Fig. 1. PinSnps user interface overview. The complex between Raf1 (P04049, coloured in cyan) and Braf (P15056, coloured in orange) is shown. The protein sequence annotated profile of the complex shows the sequence alignment of the query protein sequence and the available PDB structure sequences. A more detailed description of the platform interactive output is given in the Supplementary Figure S1.

2 Implementation and features

The PPIN used in this study has been derived as a non-redundant set of protein interactions from the list of human PPIs given in Supplementary Table S1. The current release includes data for 16 603 proteins, of which 4673 have a resolved structure and 4962 have a homologous structure (Supplementary Fig. S3). PinSnps is, to our knowledge, one of the largest collections of variants mapped onto 3D coordinates. SNPs from dbSNP (Sherry ), consisting of common and germ-line disease variants (the latter originally from OMIM (Hamosh )), together with somatic cancer mutations from COSMIC (Forbes ), have been mapped onto cognate 3D structures and, when these are not available, onto their homologous structures. The use of homologous structures significantly expands the number of SNPs/variants mapped onto 3D positions within folded domains. The enrichment of disease-associated variants in specific regions of proteins can be quantified using Formula S1 and the R script provided on the PinSnps ‘Downloads’ webpage (see example in Supplementary Fig. S4). We present a number of case studies and more detailed instructions on the web server’s ‘Help’ page and in the Supplementary Materials.
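As an illustration of what a per-region enrichment calculation can look like, the following Python sketch computes a simple density ratio: the number of variants per residue in a region divided by the number of variants per residue over all regions. Formula S1 is not reproduced in this text, so this ratio is an assumed, commonly used definition and not necessarily the one implemented in the authors' R script. The function name `region_enrichment` and all counts are hypothetical.

```python
from math import log2


def region_enrichment(variants_in_region, variants_total,
                      residues_in_region, residues_total):
    """Assumed enrichment measure (not Formula S1): variant density in one
    region divided by the variant density over all regions."""
    if min(variants_total, residues_in_region, residues_total) <= 0:
        raise ValueError("totals and region size must be positive")
    region_density = variants_in_region / residues_in_region
    overall_density = variants_total / residues_total
    return region_density / overall_density


# Hypothetical counts for one protein set: region -> (variants, residues).
counts = {
    "surface": (30, 500),
    "interface": (40, 200),
    "core": (30, 300),
}
variants_total = sum(v for v, _ in counts.values())
residues_total = sum(r for _, r in counts.values())

enrichment = {
    region: region_enrichment(v, variants_total, r, residues_total)
    for region, (v, r) in counts.items()
}

for region, value in enrichment.items():
    # Values above 1 (log2 above 0) indicate more variants than expected
    # from the size of the region alone.
    print(f"{region}: enrichment={value:.2f}, log2={log2(value):+.2f}")
```

A ratio of this kind only describes effect size; a significance test on the underlying counts would be needed before drawing conclusions from small regions.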

2.1 Protein sequence annotated profiles

Each protein in the PPIN is transformed into a sequence-annotated string (we refer to this as a ‘profile’) that represents the fingerprint of the user-selected information. These profiles were generated based on information obtained from sequence alignments, available structural information, human genetic data (from dbSNP, OMIM and COSMIC) and UniProt protein functional site and PTM annotations. PSI-BLAST (Altschul ) was used to identify resolved and homologous structures of human proteins by searching against sequences of the Protein Data Bank (Berman ). Homologous structures with more than 80% coverage of the human protein domain sequence and with more than 30% sequence identity were selected. Each protein was annotated with domain boundaries according to Pfam (Finn ). Alignments between sequences of query protein domains and available protein structure sequences were performed using T-Coffee (Notredame ). The classification of structural regions, i.e. the definition of surface, interface and core regions, was based on the surface area analysis of POPSCOMP (Kleinjung and Fraternali, 2005).

Conflict of Interest: none declared.
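The selection and classification steps of Section 2.1 can be sketched in Python as follows. The two homolog thresholds (more than 30% identity, more than 80% domain coverage) are taken from the text. Everything else is an assumption made for illustration: the `Hit` record, the hit identifiers, and in particular the region rule and its cutoffs, which only mimic the idea of a surface-area-based classification and are not POPSCOMP's actual procedure or parameters.

```python
from dataclasses import dataclass

MIN_IDENTITY = 30.0  # percent; "more than 30% sequence identity" (from the text)
MIN_COVERAGE = 80.0  # percent; "more than 80% coverage" (from the text)


@dataclass
class Hit:
    """Hypothetical record for one structure hit against a human domain."""
    hit_id: str
    identity: float      # percent sequence identity to the human domain
    aligned_length: int  # domain residues covered by the alignment
    domain_length: int   # length of the human domain

    @property
    def coverage(self) -> float:
        return 100.0 * self.aligned_length / self.domain_length


def select_homologs(hits):
    """Keep hits that exceed both thresholds (strictly greater than)."""
    return [h for h in hits
            if h.identity > MIN_IDENTITY and h.coverage > MIN_COVERAGE]


def classify_region(sasa_isolated, sasa_in_complex, rel_sasa_isolated,
                    buried_cutoff=1.0, core_cutoff=0.05):
    """Assumed rule with hypothetical cutoffs: a residue is 'interface' if
    complex formation buries more than buried_cutoff (in square angstroms),
    'core' if its relative accessibility is below core_cutoff, and
    'surface' otherwise."""
    if sasa_isolated - sasa_in_complex > buried_cutoff:
        return "interface"
    if rel_sasa_isolated < core_cutoff:
        return "core"
    return "surface"


hits = [
    Hit("hitA", 45.0, 95, 100),  # passes both thresholds
    Hit("hitB", 28.0, 99, 100),  # identity too low
    Hit("hitC", 60.0, 70, 100),  # coverage too low
    Hit("hitD", 30.0, 90, 100),  # exactly 30% is not "more than 30%"
]
selected = [h.hit_id for h in select_homologs(hits)]
print(selected)
print(classify_region(50.0, 10.0, 0.40))
```

Treating both thresholds as strict inequalities follows the wording "more than"; whether the pipeline itself includes the boundary values is not stated in the text.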

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  T-Coffee: A novel method for fast and accurate multiple sequence alignment.

Authors:  C Notredame; D G Higgins; J Heringa
Journal:  J Mol Biol       Date:  2000-09-08       Impact factor: 5.469

3.  Relating three-dimensional structures to protein networks provides evolutionary insights.

Authors:  Philip M Kim; Long J Lu; Yu Xia; Mark B Gerstein
Journal:  Science       Date:  2006-12-22       Impact factor: 47.728

4.  LS-SNP/PDB: annotated non-synonymous SNPs mapped to Protein Data Bank structures.

Authors:  Michael Ryan; Mark Diekhans; Stephanie Lien; Yun Liu; Rachel Karchin
Journal:  Bioinformatics       Date:  2009-04-15       Impact factor: 6.937

5.  Proteins and domains vary in their tolerance of non-synonymous single nucleotide polymorphisms (nsSNPs).

Authors:  Christopher M Yates; Michael J E Sternberg
Journal:  J Mol Biol       Date:  2013-01-25       Impact factor: 5.469

6.  Protein networks reveal detection bias and species consistency when analysed by information-theoretic methods.

Authors:  Luis P Fernandes; Alessia Annibale; Jens Kleinjung; Anthony C C Coolen; Franca Fraternali
Journal:  PLoS One       Date:  2010-08-18       Impact factor: 3.240

7.  Deriving a mutation index of carcinogenicity using protein structure and protein interfaces.

Authors:  Octavio Espinosa; Konstantinos Mitsopoulos; Jarle Hakas; Frances Pearl; Marketa Zvelebil
Journal:  PLoS One       Date:  2014-01-15       Impact factor: 3.240

8.  COSMIC: exploring the world's knowledge of somatic mutations in human cancer.

Authors:  Simon A Forbes; David Beare; Prasad Gunasekaran; Kenric Leung; Nidhi Bindal; Harry Boutselakis; Minjie Ding; Sally Bamford; Charlotte Cole; Sari Ward; Chai Yin Kok; Mingming Jia; Tisham De; Jon W Teague; Michael R Stratton; Ultan McDermott; Peter J Campbell
Journal:  Nucleic Acids Res       Date:  2014-10-29       Impact factor: 16.971

9.  The Phyre2 web portal for protein modeling, prediction and analysis.

Authors:  Lawrence A Kelley; Stefans Mezulis; Christopher M Yates; Mark N Wass; Michael J E Sternberg
Journal:  Nat Protoc       Date:  2015-05-07       Impact factor: 13.491

10.  In silico functional dissection of saturation mutagenesis: Interpreting the relationship between phenotypes and changes in protein stability, interactions and activity.

Authors:  Douglas E V Pires; Jing Chen; Tom L Blundell; David B Ascher
Journal:  Sci Rep       Date:  2016-01-22       Impact factor: 4.379

