Literature DB >> 27797766

yMap: an automated method to map yeast variants to protein modifications and functional regions.

Ahmed Arslan1, Vera van Noort1.   

Abstract

Summary: Recent advances in sequence technology result in large datasets of sequence variants. For the human genome, several tools are available to predict the impact of these variants on gene and protein functions. However, for model organisms such as yeast such tools are lacking, specifically to predict the effect of protein sequence altering variants on the protein level. We present a python framework that enables users to map in a fully automated fashion large set of variants to protein functional regions and post-translationally modified residues. Furthermore, we provide the user with the possibility to retrieve predicted functional information on modified residues from other resources for example that are predicted to play a role in protein-protein interactions. The results are complemented by statistical tests to highlight the significance of underlying functions and pathways affected by mutations. We show the application of this package on a yeast dataset derived from a recent evolutionary experiment on adaptation to ethanol. Availability and Implementation: The package is available from https://github.com/CSB-KUL/yMap and is implemented in Python. Contact: vera.vannoort@biw.kuleuven.be. Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Entities:  

Mesh:

Substances:

Year:  2017        PMID: 27797766      PMCID: PMC5408805          DOI: 10.1093/bioinformatics/btw658

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Non-synonymous Single Nucleotide Polymorphisms (nsSNPs) potentially disrupt normal protein functions. Several computational tools exist today that can predict the damaging effects of nsSNPs on protein structure and functions. For example, PolyPhen predicts the result of amino acid substitutions on the structure and function of human proteins (Adzhubei ). The impact of these tools is enormous on clinical interpretation of human nsSNPs but no such tool is available for the model organism yeast although SNPs are largely available and being identified in large-scale experiments. The most helpful tool to interpret nsSNPs in yeast is the SGD Variant Viewer (Cherry, 2015), but it does not predict impact on structure and function. Hence a tool to analyze the effect of nsSNPs on different functional regions of yeast proteins is much needed to accelerate research in yeast systems biology. We chose to develop such a tool and particularly focus on Post-Translational Modications (PTMs) and protein domains affected by nsSNPs. UniProt (The UniProt Consortium, 2014) contains over 200 different types of PTMs. PTMs can modulate biological interactions and broadly impact on protein functions, stability and localization hence phenotypes in response to stimuli. An nsSNP at a PTM residue under normal or experimental conditions could explain a change in phenotype and could thus be important for interpretation of effects of nsSNPs (Reimand ). Yeast has been an excellent model to study systems biology. It helped considerably in understanding different molecular phenomena including protein-protein interaction and network biology. The exponential increase in yeast PTM data has left a gap between the identification and the functional annotation of PTM residues (Beltrao ). At the same time, due to the way they are stored it is not straightforward to map variations to the available annotated PTMs. Previously, a few studies have mapped residue variation or conservation to PTM residues (Minguez , Reimand ). In order to facilitate researchers to map large datasets of yeast mutations onto PTM residues, we developed an automated method. A python module called yeast genotype to phenotype map (yMap) provides users with the possibility to map their variant data to PTM containing residues and protein functional regions (Fig. 1). We include mapping to functional and structural attributes of proteins, including protein domains, active sites, binding sites, protein-nucleotide binding motifs, structural regions and include a function to generate web links for pathways/networks visualization. The yMap package provides a stable and easy to use framework in pure python for functional analyses of yeast variants.
Fig. 1.

An overview of data acquisition and processing in yMap. The input sequence variants are provided by the user. Functional annotation is downoloaded by data() function. Mapping gives rise to individual reports and a final report

An overview of data acquisition and processing in yMap. The input sequence variants are provided by the user. Functional annotation is downoloaded by data() function. Mapping gives rise to individual reports and a final report

2 Features

Before you can initiate a variant analysis, you need to run a sub module of the yMap package to download and filter all the data needed to perform the mapping (detailed manual in Supplement). The major functions of the package can be categorized as following: The data() function downloads UniProt files for the yeast proteome, extracts annotations on residue level and stores these in a tab-separated file. It also downloads a gff file with the yeast genome sequence and annotation from SGD. Annotations of PTMs from PTMfunc (Beltrao ) and PTMcode v2.0 (Minguez ) are provided with the download. A user-provided file containing mutations (Supplementary Fig. S1) is converted to mutation types with the mutations-types-file() function, annotating synonymous, non-synonymous and nonsense mutations, protein residue number and reference and mutated amino acid. Finally, the ymap() function maps mutations to the residue level annotations. Files are generated for each separate category of functional annotations, as well as a summary file. This module returns different data folders, specific for each type of data, and contains files with mutations mapped onto the PTMs that are annotated for PPI, interface regions, present in proteins domains (Minguez ) interacting with other PTMs between different proteins or within the same protein (Beltrao ). Apart from PTMs, mutations are mapped to DNA binding motifs, active and binding sites, secondary structure regions. A separate file returns GO terms overrepresented among genes containing nsSNPs located in functional regions. The domain annotation is from UniProt. The p-values are calculated using Orange Bioinformatics against the background of the complete genome (Demšar ). A file with BioGrid (Tyers ) IDs of mutated proteins is generated by ymap(). This module generates web links of these BioGrid IDs for visualization of pathways, networks and associated information available in the BioGrid database. All the analyses are summarized in a ‘final-report’ with the proteins that have variants in different above mentioned protein functional regions, the type of mutation (synonymous or non- synonymous), feature, feature type and source of data. The report is complemented with enrichments file and a BioGrid ids containing file. We have re-analyzed variants from evolutionary experiments on adaption to high levels of ethanol (Voordeckers ). In all experiments together, 1,207 non-synonymous and non-sense mutations were found in sequenced clones adapted to high ethanol. A subset of 232 mutations overlap with a protein functional region. One of these is a mutation of a phosphoserine (S391) in HAL5, a protein kinase with putative role in cell surface biosynthesis. A phosphoserine (S1358) in CHD1, an ATP-dependent chromatin remodeling factor, is found mutated to an asparagine. We identify 21 mutations in protein kinase domains such as VPS15, CDC7, CDC15 and RIM15. Together, this suggests adaption may be achieved through rewiring of the signaling network. Finally, mutations of PRT1, eukaryotic translation initiation factor 3 subunit B, are found within two structural regions. The K462E mutation in the same protein was found to increase ethanol tolerance in confirming experiments (Voordeckers ). Instead of selecting mutations based on frequency in the population, mutations can now be prioritized based on their expected impact on the protein level.

3 Conclusion

We present an automated method to analyze the impact of nsSNPs on protein level in yeast for the first time. The python package for yeast genotype to phenotype mapping yMap focuses on mutations in functional and structural protein regions. This package can be used to analyze mutations that affect important functional regions including PTMs, protein domains, active and binding sites, DNA-protein binding motifs, PPI, PTMs crosstalk and hotspots and the secondary structural regions. The method can facilitate the interpretation of large-scale variant data and the correlation of genotype with phenotype.

Funding

This work has been supported by the KU Leuven Research Fund. Conflict of Interest: none declared. Click here for additional data file.
  10 in total

1.  A method and server for predicting damaging missense mutations.

Authors:  Ivan A Adzhubei; Steffen Schmidt; Leonid Peshkin; Vasily E Ramensky; Anna Gerasimova; Peer Bork; Alexey S Kondrashov; Shamil R Sunyaev
Journal:  Nat Methods       Date:  2010-04       Impact factor: 28.547

2.  The Saccharomyces Genome Database: A Tool for Discovery.

Authors:  J Michael Cherry
Journal:  Cold Spring Harb Protoc       Date:  2015-12-02

3.  Systematic functional prioritization of protein posttranslational modifications.

Authors:  Pedro Beltrao; Véronique Albanèse; Lillian R Kenner; Danielle L Swaney; Alma Burlingame; Judit Villén; Wendell A Lim; James S Fraser; Judith Frydman; Nevan J Krogan
Journal:  Cell       Date:  2012-07-20       Impact factor: 41.582

4.  BioGRID: a general repository for interaction datasets.

Authors:  Chris Stark; Bobby-Joe Breitkreutz; Teresa Reguly; Lorrie Boucher; Ashton Breitkreutz; Mike Tyers
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

5.  UniProt: a hub for protein information.

Authors: 
Journal:  Nucleic Acids Res       Date:  2014-10-27       Impact factor: 16.971

6.  Evolutionary constraint and disease associations of post-translational modification sites in human genomes.

Authors:  Jüri Reimand; Omar Wagih; Gary D Bader
Journal:  PLoS Genet       Date:  2015-01-22       Impact factor: 5.917

7.  PTMcode v2: a resource for functional associations of post-translational modifications within and between proteins.

Authors:  Pablo Minguez; Ivica Letunic; Luca Parca; Luz Garcia-Alonso; Joaquin Dopazo; Jaime Huerta-Cepas; Peer Bork
Journal:  Nucleic Acids Res       Date:  2014-10-31       Impact factor: 16.971

8.  PTMcode: a database of known and predicted functional associations between post-translational modifications in proteins.

Authors:  Pablo Minguez; Ivica Letunic; Luca Parca; Peer Bork
Journal:  Nucleic Acids Res       Date:  2012-11-28       Impact factor: 16.971

9.  Adaptation to High Ethanol Reveals Complex Evolutionary Pathways.

Authors:  Karin Voordeckers; Jacek Kominek; Anupam Das; Adriana Espinosa-Cantú; Dries De Maeyer; Ahmed Arslan; Michiel Van Pee; Elisa van der Zande; Wim Meert; Yudi Yang; Bo Zhu; Kathleen Marchal; Alexander DeLuna; Vera Van Noort; Rob Jelier; Kevin J Verstrepen
Journal:  PLoS Genet       Date:  2015-11-06       Impact factor: 5.917

Review 10.  Evolution and functional cross-talk of protein post-translational modifications.

Authors:  Pedro Beltrao; Peer Bork; Nevan J Krogan; Vera van Noort
Journal:  Mol Syst Biol       Date:  2013-12-22       Impact factor: 11.429

  10 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.