Structure-PPi: a module for the annotation of cancer-related single-nucleotide variants at protein-protein interfaces.

Miguel Vázquez¹, Alfonso Valencia¹, Tirso Pons¹.

Abstract

MOTIVATION: The interpretation of cancer-related single-nucleotide variants (SNVs) considering the protein features they affect, such as known functional sites, protein-protein interfaces, or relation with already annotated mutations, might complement the annotation of genetic variants in the analysis of NGS data. Current tools that annotate mutations fall short on several aspects, including the ability to use protein structure information or the interpretation of mutations in protein complexes.
RESULTS: We present the Structure-PPi system for the comprehensive analysis of coding SNVs based on 3D protein structures of protein complexes. The 3D repository used, Interactome3D, includes experimental and modeled structures for proteins and protein-protein complexes. Structure-PPi annotates SNVs with features extracted from the UniProt, InterPro, APPRIS, dbNSFP and COSMIC databases. We illustrate the usefulness of Structure-PPi with the interpretation of 1 027 122 non-synonymous SNVs from COSMIC and the 1000G Project, which yields a collection of ∼172 700 SNVs mapped onto the 3D structures of 8726 human proteins (43.2% of the 20 214 SwissProt-curated proteins in UniProtKB release 2014_06) and onto protein-protein interfaces with potential functional implications.
AVAILABILITY AND IMPLEMENTATION: Structure-PPi, along with a user manual and examples, is available at http://structureppi.bioinfo.cnio.es/Structure; the code for local installations is available at https://github.com/Rbbt-Workflows

Substances：
Multiprotein Complexes

Year: 2015 PMID： 25765346 PMCID： PMC4495296 DOI： 10.1093/bioinformatics/btv142

Source DB: PubMed Journal: Bioinformatics ISSN： 1367-4803 Impact factor: 6.937

1 Introduction

Predicting how a single-nucleotide variant (SNV) alters the function of protein products is a topic of growing interest in genomics and bioinformatics (reviewed in Hecht ). One of the key limitations of current computational tools for predicting the impact of SNVs is that protein–protein interactions are poorly considered or completely ignored. This is surprising, since proteins work as part of protein complexes and interaction networks, and a number of databases with high-quality, structurally resolved 3D protein interactome networks are available (Meyer ; Mosca ) and are increasingly used to understand human genetic diseases (Guo ; Nishi ; Vidal ; Wang ; Yates and Sternberg, 2013; for a recent review of this topic, see Das ). In spite of this, methods that allow a systematic analysis of SNVs, considering known functional residues in spatial contact with the mutation and including a full atom-level description of protein–protein interfaces, are not available. Indeed, only a few methods, e.g. PMut (Ferrer-Costa ), SNPeffect (Reumers ), SNPs3D (Yue ), PolyPhen-2 (Adzhubei ), PoPMuSiC (Dehouck ) and MuPIT (Niknafs ), explicitly use a full atom-level description of protein structures, but they do not include detailed information about protein–protein interfaces as part of their algorithms (see Supplementary Table S1). Here, we describe Structure–PPi, which analyzes mutation data precisely in their 3D protein complex context. This module represents a significant improvement on existing tools in terms of: (i) the ability to map mutations onto 3D structures of protein–protein complexes (experimental and homology-based) and (ii) the description of functional residues around the SNVs in protein–protein interfaces. Additionally, Structure–PPi implements a full annotation schema covering post-translational modification sites, catalytic sites, binding-site residues, Pfam domains and predictions of damaging effects from state-of-the-art methods.
In addition, the system selects a single reference sequence for each protein-coding gene (i.e. the principal isoform) and provides information about cancer somatic mutations and their corresponding tumor origin and histology. Since this work was submitted, two papers dealing with the analysis of disease mutations at protein-protein interfaces have appeared (Mosca ; Porta-Pardo ).

2 Implementation and capabilities

2.1 Overview

Structure–PPi offers a system to analyze SNV data in their protein 3D structure context. A genomic variant that leads to a substitution in a particular residue of a protein isoform is linked to the features associated with that amino acid. Those features include secondary structure, post-translational modification sites, catalytic sites, binding-site residues, Pfam domains, signal peptides, trans-membrane regions, predictions of damaging effects from state-of-the-art methods (i.e. SIFT, Polyphen2, LRT, MutationTaster, MutationAssessor, FATHMM, VEST3, CADD) and somatic mutations, extracted from UniProt (UniProt Consortium, 2013), InterPro (Hunter ), APPRIS (Rodriguez ), dbNSFP (Liu ) and COSMIC (Forbes ). Residues in close physical proximity to query SNVs are extracted from the corresponding 3D structures, including the experimental structures and homology-based models available in the Interactome3D (Mosca ) database. The proximity information is used to report annotated features that are not directly affected by the investigated mutations but that could be disrupted by changes in their close vicinity (default: 5 Å). Users may submit batches of tens of thousands of SNVs to retrieve the available functional annotations for the corresponding SNVs and for residues in spatial contact within that protein or the corresponding protein complex (Fig. 1). Figure 1 also shows the study of hotspot position S427 for the ENSP00000419692 protein isoform in bladder cancer. An assessment of Structure–PPi using a validation set (14 pathogenic and 10 neutral variants) in BRCA1 BRCT domains (Lee ) is shown in Supplementary Table S3. Structure–PPi achieves a level of performance similar to that obtained by MetaSVM, a support vector machine algorithm that incorporates results from state-of-the-art methods (i.e. SIFT, Polyphen2, MutationTaster, MutationAssessor, FATHMM and LRT) and the maximum frequency observed in the 1000G project (Liu ).
The results are as follows: MetaSVM (accuracy: 0.83, recall: 1.00, precision: 0.78, MCC: 0.68) and Structure–PPi (accuracy: 0.88, recall: 0.79, precision: 1.00, MCC: 0.78). This assessment shows that Structure–PPi achieves higher precision than MetaSVM, at lower recall, and good agreement between predictions and observations. In addition, Supplementary Table S3 shows the utility of Structure–PPi for providing information complementary to the prediction methods. Indeed, this complementary information facilitates the discrimination of false-positive results and identifies mutations that should be studied in more detail.
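As an illustration of how the four reported measures relate to each other, the following Python sketch recomputes them from 2x2 confusion matrices. The counts used here are not given in the text: they are inferred as the integer counts on a 14 pathogenic / 10 neutral set that reproduce the reported values to two decimals, so they should be read as an assumption and checked against Supplementary Table S3. The function name `binary_metrics` is hypothetical.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, recall, precision and Matthews correlation coefficient
    (MCC) for a 2x2 confusion matrix (pathogenic = positive class)."""
    total = tp + fp + tn + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# ASSUMPTION: confusion matrices inferred from the reported metrics and the
# 14 pathogenic / 10 neutral validation set; they are not stated in the text.
inferred_counts = {
    "MetaSVM": dict(tp=14, fp=4, tn=6, fn=0),
    "Structure-PPi": dict(tp=11, fp=0, tn=10, fn=3),
}

for method, counts in inferred_counts.items():
    metrics = binary_metrics(**counts)
    print(method, {name: f"{value:.2f}" for name, value in metrics.items()})
```

Under these inferred counts, the trade-off is explicit: MetaSVM would recover all 14 pathogenic variants at the cost of 4 false positives, whereas Structure–PPi would produce no false positives but miss 3 pathogenic variants.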

Fig. 1.

Flowchart of steps implemented in the Structure-PPi system (see Supplementary Table S2 for more details). The 3D protein complex interface between ENSP00000419692 and human Liver X nuclear receptor beta is also shown (PDB ID: 4nqa)
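The proximity step described in the Overview (residues within a default 5 Å of the query position) can be sketched in Python as below. This is a minimal illustration and not the Structure–PPi implementation: the text does not say how the distance between residues is measured, so the any-atom-to-any-atom criterion, the function name `residues_within`, and the toy coordinates are all assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def residues_within(structure, query, cutoff=5.0):
    """Return residues with at least one atom within `cutoff` angstroms
    of any atom of the `query` residue.

    `structure` maps (chain, residue_number) -> list of (x, y, z) atoms.
    ASSUMPTION: any-atom distance; the source only states a 5 A default.
    """
    query_atoms = structure[query]
    hits = []
    for residue, atoms in structure.items():
        if residue == query:
            continue
        if any(dist(a, q) <= cutoff for a in atoms for q in query_atoms):
            hits.append(residue)
    return sorted(hits)


# Hypothetical toy complex with two chains (coordinates in angstroms).
toy_complex = {
    ("A", 10): [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],  # mutated residue
    ("A", 11): [(3.8, 0.0, 0.0)],                   # sequence neighbour
    ("A", 50): [(20.0, 0.0, 0.0)],                  # far away, same chain
    ("B", 7): [(1.5, 4.0, 0.0)],                    # partner chain, in contact
}

neighbours = residues_within(toy_complex, ("A", 10))
# Neighbours located on a different chain are candidate interface contacts.
interface_contacts = [res for res in neighbours if res[0] != "A"]
print(neighbours, interface_contacts)
```

Keying residues by chain as well as residue number is what lets the same neighbourhood query serve both purposes mentioned in the text: contacts within one protein and contacts across a protein–protein interface. For real structures, a spatial index would replace the quadratic loop.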

2.2 Coverage

Structure–PPi maps SNVs onto the protein 3D structures of 8726 human proteins (43.2% of the 20 214 SwissProt-curated proteins in UniProtKB release 2014_06). This coverage of 43.2% is well above the 18% reported for MuPIT (Niknafs ).
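The figures quoted in the text are mutually consistent, as a short Python check shows. All numbers below are taken from the abstract and from Sections 2.2 and 2.3; only the variable names are introduced here.

```python
# Structural coverage of the human SwissProt proteome (Section 2.2).
mapped_proteins = 8726
swissprot_human_proteins = 20214
coverage_percent = 100 * mapped_proteins / swissprot_human_proteins

# Pre-computed nsSNV sets (Section 2.3) versus the total in the abstract.
cosmic_v69_nssnvs = 741_276
project_1000g_nssnvs = 285_846
total_nssnvs = cosmic_v69_nssnvs + project_1000g_nssnvs

print(f"coverage: {coverage_percent:.1f}%")
print(f"total nsSNVs analysed: {total_nssnvs}")
```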

2.3 Software implementation and requirements

Structure–PPi is an independent component of the Rbbt framework (“Ruby bioinformatics toolkit”, Rbbt; https://github.com/mikisvaz/rbbt; Vázquez ). Structure–PPi runs on Unix-based systems (including Linux and Mac). It can be accessed programmatically by Ruby developers, in command-line mode by power users, or through an HTML interface in a web browser (http://structureppi.bioinfo.cnio.es/Structure) by standard users. Structure–PPi includes a pair-wise alignment (Smith–Waterman) step to resolve any potential inconsistency between the isoform sequence and the sequence in the 3D structure, or differences between the isoform sequence and the reference UniProt sequence. Depending on the database used, the throughput of a single process is in the range of hundreds to thousands of SNVs per second, with less than 500 MB of memory use. We will continue to develop Structure–PPi, in particular its method for parallelizing file archival and retrieval and its software portability, to further facilitate its inclusion in extended genome annotation workflows. We have pre-computed annotations for all coding nsSNVs in COSMIC v69 (∼741 276) and in the 1000G Project (∼285 846). The results are available through the website, and some summary statistics and a discussion of variants at protein interfaces can be found in Supplementary Table S4. This preliminary analysis might identify disruptions of important interactions and improve our understanding of human diseases. Structure–PPi is currently used in different projects, including the ICGC-CLL analysis.
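To make the role of the Smith–Waterman step concrete, the following Python sketch aligns an isoform sequence to the sequence of a structure and returns a position map, so that a mutated isoform residue can be located in the 3D structure. It is a textbook local alignment with a linear gap penalty, not the Rbbt code; the scoring values, the function name `smith_waterman` and the example sequences are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of sequences `a` and `b`.

    Returns (best_score, mapping), where mapping[i] = j means that
    position i of `a` is aligned to position j of `b` (both 1-based).
    ASSUMPTION: simple match/mismatch scores and a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming matrix; scores never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    mapping = {}
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            mapping[i] = j
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, mapping


# Hypothetical example: the structure covers only a fragment of the isoform,
# so residue numbering differs between the two sequences.
isoform_seq = "MKTAYIAKQRQISFVK"
structure_seq = "AYIAKQRQ"
best_score, iso_to_struct = smith_waterman(isoform_seq, structure_seq)
print(best_score, iso_to_struct.get(8))  # isoform position 8 in the structure
```

In this toy case the structure starts at isoform position 4, so a variant at isoform position 8 maps to structure position 5, while positions outside the aligned fragment (for example position 2) have no entry and cannot be placed in the structure.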

3 Conclusion

We present Structure–PPi, a system to facilitate the comprehensive analysis of cancer-related SNVs, which combines 3D structures of protein complexes with functional annotations from different databases. The system implements the generally accepted idea that mutations with strong indicators of positive selection for tumorigenesis (driver mutations) are located in functional domains/sites or affect amino acid residues that 3D protein structures have shown to be important. Furthermore, the system provides information about known functional residues in close physical proximity to query SNVs. Thus, Structure–PPi can provide both mechanistic and biological insights into the role of SNVs in a given cancer.

23 in total

1. PMUT: a web-based tool for the annotation of pathological mutations on proteins.

Authors: Carles Ferrer-Costa; Josep Lluis Gelpí; Leire Zamakola; Ivan Parraga; Xavier de la Cruz; Modesto Orozco
Journal: Bioinformatics Date: 2005-05-06 Impact factor: 6.937

Review 2. Interactome networks and human disease.

Authors: Marc Vidal; Michael E Cusick; Albert-László Barabási
Journal: Cell Date: 2011-03-18 Impact factor: 41.582

3. Comprehensive analysis of missense variations in the BRCT domain of BRCA1 by structural and functional assays.

Authors: Megan S Lee; Ruth Green; Sylvia M Marsillac; Nicolas Coquelle; R Scott Williams; Telford Yeung; Desmond Foo; D Duong Hau; Ben Hui; Alvaro N A Monteiro; J N Mark Glover
Journal: Cancer Res Date: 2010-06-01 Impact factor: 12.701

4. Three-dimensional reconstruction of protein networks provides insight into human genetic disease.

Authors: Xiujuan Wang; Xiaomu Wei; Bram Thijssen; Jishnu Das; Steven M Lipkin; Haiyuan Yu
Journal: Nat Biotechnol Date: 2012-01-15 Impact factor: 54.908

5. A method and server for predicting damaging missense mutations.

Authors: Ivan A Adzhubei; Steffen Schmidt; Leonid Peshkin; Vasily E Ramensky; Anna Gerasimova; Peer Bork; Alexey S Kondrashov; Shamil R Sunyaev
Journal: Nat Methods Date: 2010-04 Impact factor: 28.547

6. APPRIS: annotation of principal and alternative splice isoforms.

Authors: Jose Manuel Rodriguez; Paolo Maietta; Iakes Ezkurdia; Alessandro Pietrelli; Jan-Jaap Wesselink; Gonzalo Lopez; Alfonso Valencia; Michael L Tress
Journal: Nucleic Acids Res Date: 2012-11-17 Impact factor: 16.971

7. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer.

Authors: Simon A Forbes; Nidhi Bindal; Sally Bamford; Charlotte Cole; Chai Yin Kok; David Beare; Mingming Jia; Rebecca Shepherd; Kenric Leung; Andrew Menzies; Jon W Teague; Peter J Campbell; Michael R Stratton; P Andrew Futreal
Journal: Nucleic Acids Res Date: 2010-10-15 Impact factor: 16.971

8. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality.

Authors: Yves Dehouck; Jean Marc Kwasigroch; Dimitri Gilis; Marianne Rooman
Journal: BMC Bioinformatics Date: 2011-05-13 Impact factor: 3.307

9. SNPeffect: a database mapping molecular phenotypic effects of human non-synonymous coding SNPs.

Authors: Joke Reumers; Joost Schymkowitz; Jesper Ferkinghoff-Borg; Francois Stricher; Luis Serrano; Frederic Rousseau
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

10. SNPs3D: candidate gene and SNP selection for association studies.

Authors: Peng Yue; Eugene Melamud; John Moult
Journal: BMC Bioinformatics Date: 2006-03-22 Impact factor: 3.169
