
SNAP predicts effect of mutations on protein function.

Yana Bromberg, Guy Yachdav, Burkhard Rost.

Abstract

Many non-synonymous single nucleotide polymorphisms (nsSNPs) in humans are suspected to impact protein function. Here, we present a publicly available server implementation of the method SNAP (screening for non-acceptable polymorphisms) that predicts the functional effects of single amino acid substitutions. SNAP identifies over 80% of the non-neutral mutations at 77% accuracy and over 76% of the neutral mutations at 80% accuracy at its default threshold. Each prediction is associated with a reliability index that correlates with accuracy and thereby enables experimentalists to zoom into the most promising predictions.

Year: 2008 PMID: 18757876 PMCID: PMC2562009 DOI: 10.1093/bioinformatics/btn435

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Non-synonymous SNPs (nsSNPs) are associated with disease: Estimates suggest as many as 200 000 nsSNPs in humans (Halushka et al., 1999) and about 24 000–60 000 in an individual (Cargill et al., 1999); this implies about 1–2 mutants per protein. While most of these likely do not alter protein function (Ng and Henikoff, 2006), many non-neutral nsSNPs contribute to individual fitness. Disease studies typically face the challenge of finding a needle (the SNP yielding a particular phenotype) in a haystack (all known SNPs). For example, many of the thousands of mutations associated with cancer do not actually lead to the disease. Evaluating functional effects of known nsSNPs is essential for understanding genotype/phenotype relations and for curing diseases. Computational mutagenesis methods can be useful in this endeavor if they can explain the motivation behind assigning a mutant to the neutral or non-neutral class or if they can provide a measure for the reliability of a particular prediction.

SNAP is accurate and provides a measure of reliability: Here, we present the first web-server implementation of SNAP (screening for non-acceptable polymorphisms), a method that combines many sequence analysis tools in a battery of neural networks to predict the functional effects of nsSNPs (Bromberg and Rost, 2007, 2008). SNAP was developed using annotations extracted from PMD, the Protein Mutant Database (Kawabata et al., 1999; Nishikawa et al., 1994). SNAP needs only sequence as input; it uses sequence-based predictions of solvent accessibility and secondary structure from PROF (Rost, 2000, unpublished data; Rost, 2005; Rost and Sander, 1994), flexibility from PROFbval (Schlessinger et al., 2006), functional effects from SIFT (Ng and Henikoff, 2003), as well as conservation information from PSI-BLAST (Altschul et al., 1997) and PSIC (Sunyaev et al., 1999), and Pfam annotations (Bateman et al., 2004). If available, SNAP can also benefit from SwissProt annotations (Bairoch and Apweiler, 2000). In sustained cross-validation, SNAP correctly identified ∼80% of the non-neutral substitutions at 77% accuracy (often referred to as specificity, i.e. correct non-neutral predictions/all predicted as non-neutral) at its default threshold. When we increase the threshold, accuracy rises at the expense of coverage (fewer of the observed non-neutral nsSNPs are identified). This balance is reflected in a crucial new feature, the reliability index (RI) for each SNAP prediction, which ranges from 0 (low) to 9 (high) and is computed from OUT, the raw value of one of the two SNAP output units.

When given alternative prediction methods, investigators often identify a subset of predictions for which the methods agree. This approach may increase accuracy over any single method at the expense of coverage. Well-calibrated method-internal reliability indices can be much more efficient than a combination of different methods (Rost and Eyrich, 2001). Simply put: ‘A basket of rotten fruit does not make for a good fruit salad’ (Chris Sander, CASP1). The SNAP RI has been carefully calibrated.
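The trade-off between accuracy and coverage described above can be illustrated with a minimal Python sketch. It uses the two definitions given in the text (accuracy = correct non-neutral predictions / all predicted as non-neutral; coverage = correct non-neutral predictions / all observed non-neutral). The function name, the 0-to-1 score scale and the toy data are assumptions made for illustration only; they are not SNAP outputs and not the SNAP implementation.

```python
from typing import List, Tuple


def accuracy_and_coverage(
    scores: List[float], observed_non_neutral: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (accuracy, coverage) for the non-neutral class at one threshold.

    scores: hypothetical per-mutant scores, higher = more likely non-neutral.
    observed_non_neutral: experimental annotation for the same mutants.
    """
    predicted = [s >= threshold for s in scores]
    correct = sum(p and o for p, o in zip(predicted, observed_non_neutral))
    n_predicted = sum(predicted)
    n_observed = sum(observed_non_neutral)
    accuracy = correct / n_predicted if n_predicted else float("nan")
    coverage = correct / n_observed if n_observed else float("nan")
    return accuracy, coverage


# Toy data (made up for illustration, not taken from PMD or from SNAP).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_observed = [True, True, False, True, True, False, False, False]

for threshold in (0.5, 0.75):
    acc, cov = accuracy_and_coverage(toy_scores, toy_observed, threshold)
    print(f"threshold={threshold}: accuracy={acc:.2f}, coverage={cov:.2f}")
```

On this toy set, raising the threshold from 0.5 to 0.75 lifts accuracy while coverage drops, which is the qualitative behaviour the text describes.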

2 INPUT/OUTPUT

Users submit the wild-type sequence along with their mutants. A comma-separated list gives mutants as XiY, where X is the wild-type amino acid, Y is the mutant and i is the number of the residue (i=1 for the N-terminus). X is not required and a star (⋆) can replace either i or Y. Any combination of characters following these rules is acceptable; e.g. X⋆=replace all residues X in all positions by all other amino acids, ⋆Y=replace the residues in all positions by Y.

At this point, SNAP may take more than an hour to return results (processing status can be tracked on the original submission page). Therefore, most requests will be answered by an email containing a link to the results page. It is also highly recommended to check existing mutant evaluations [available immediately under the ‘known variants’ tab; referenced by RefSeq id (Pruitt et al., 2007) and dbSNP id (Sherry et al., 2001)] prior to submitting sequences for processing. In the near future, PredictProtein (Rost et al., 2004), which provides the framework for SNAP, will store sequences and retrieve predictions for additional mutants in real time. Full sequence analysis (e.g. in silico alanine scans; Fig. 1B) is possible for short proteins (≤150 total mutants/protein) via the applicable server query. Analysis of longer sequences and/or local SNAP installation is currently available through the authors.
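The mutant notation can be made concrete with a short Python sketch that expands a comma-separated list into explicit (wild type, position, mutant) triples. This is one possible reading of the rules stated above, not the server's parser: the function name `expand_mutants`, the use of an ASCII `*` for the star, the treatment of a missing Y as a wildcard, and the error on a wild-type mismatch are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed grammar: optional wild type X, then position i or '*',
# then optional mutant Y or '*'.
_SPEC = re.compile(r"^([A-Z])?(\d+|\*)([A-Z*])?$")


def expand_mutants(sequence: str, spec_list: str) -> List[Tuple[str, int, str]]:
    """Expand specs such as 'K2A', '2*', 'V*' or '*A' into explicit mutants."""
    sequence = sequence.upper()
    mutants = []
    for raw in spec_list.split(","):
        spec = raw.strip().upper().replace("⋆", "*")
        match = _SPEC.match(spec)
        if match is None:
            raise ValueError(f"cannot parse mutant spec: {raw!r}")
        wild, pos, mut = match.groups()
        if pos == "*":
            positions = range(1, len(sequence) + 1)
        else:
            i = int(pos)  # 1-based, i=1 is the N-terminus
            if not 1 <= i <= len(sequence):
                raise ValueError(f"position out of range in {raw!r}")
            if wild and sequence[i - 1] != wild:
                raise ValueError(f"wild type does not match sequence in {raw!r}")
            positions = [i]
        # Assumption: a missing Y behaves like '*', i.e. all other amino acids.
        targets = AMINO_ACIDS if mut in (None, "*") else mut
        for i in positions:
            residue = sequence[i - 1]
            if wild and residue != wild:
                continue  # wildcard position: keep only residues equal to X
            for aa in targets:
                if aa != residue:
                    mutants.append((residue, i, aa))
    return mutants


# Toy four-residue sequence, chosen only to keep the output short.
print(expand_mutants("MKVA", "K2A, *A"))
```

With this reading, `*A` on a full sequence amounts to an in silico alanine scan, so the number of mutants grows with sequence length and can be compared against the ≤150 mutants/protein limit before submitting.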

Fig. 1.

Examples of SNAP functionality. (A) SNAP-server predictions for mutations in INS_HUMAN associated with hyperproinsulinemia and diabetes mellitus type II (Chan et al., 1987; Sakura et al., 1986; Shoelson et al., 1983). (B) SNAP predictions for comprehensive in silico mutagenesis (all-to-alanine). The crystal structure [PDB 2omg (Norrman et al., 2007)] shows an insulin NPH hexamer [insulin co-crystallized with zinc (sphere at the center) in presence of protamine/urea (not highlighted); picture produced by GRASP2 (Petrey and Honig, 2003)]. Red represents mutations predicted as non-neutral and blue represents neutral predictions. Residues in wire depiction are the same as in (A): V92, H34, F48 and F49 of INS_HUMAN (A chain V3, B chain H10, F24 and F25). SNAP predicts all of these to impact function when mutated to alanine. (C) More reliably predicted residues are predicted more accurately: for instance, >90% of the predictions with a reliability index=6 are expected to be right.

22 in total

1. dbSNP: the NCBI database of genetic variation.

Authors: S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal: Nucleic Acids Res Date: 2001-01-01 Impact factor: 16.971

2. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis.

Authors: M K Halushka; J B Fan; K Bentley; L Hsie; N Shen; A Weder; R Cooper; R Lipshutz; A Chakravarti
Journal: Nat Genet Date: 1999-07 Impact factor: 38.330

3. Characterization of single-nucleotide polymorphisms in coding regions of human genes.

Authors: M Cargill; D Altshuler; J Ireland; P Sklar; K Ardlie; N Patil; N Shaw; C R Lane; E P Lim; N Kalyanaraman; J Nemesh; L Ziaugra; L Friedland; A Rolfe; J Warrington; R Lipshutz; G Q Daley; E S Lander
Journal: Nat Genet Date: 1999-07 Impact factor: 38.330

4. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000.

Authors: A Bairoch; R Apweiler
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

Review 5. Predicting the effects of amino acid substitutions on protein function.

Authors: Pauline C Ng; Steven Henikoff
Journal: Annu Rev Genomics Hum Genet Date: 2006 Impact factor: 8.929

6. PROFbval: predict flexible and rigid residues in proteins.

Authors: Avner Schlessinger; Guy Yachdav; Burkhard Rost
Journal: Bioinformatics Date: 2006-02-02 Impact factor: 6.937

7. Conservation and prediction of solvent accessibility in protein families.

Authors: B Rost; C Sander
Journal: Proteins Date: 1994-11

8. Structural characterization of insulin NPH formulations.

Authors: Mathias Norrman; Frantisek Hubálek; Gerd Schluckebier
Journal: Eur J Pharm Sci Date: 2007-01-20 Impact factor: 4.384

9. Identification of a mutant human insulin predicted to contain a serine-for-phenylalanine substitution.

Authors: S Shoelson; M Fickova; M Haneda; A Nahum; G Musso; E T Kaiser; A H Rubenstein; H Tager
Journal: Proc Natl Acad Sci U S A Date: 1983-12 Impact factor: 11.205

10. A mutation in the B chain coding region is associated with impaired proinsulin conversion in a family with hyperproinsulinemia.

Authors: S J Chan; S Seino; P A Gruppuso; R Schwartz; D F Steiner
Journal: Proc Natl Acad Sci U S A Date: 1987-04 Impact factor: 11.205
