
A method and server for predicting damaging missense mutations.

Ivan A Adzhubei, Steffen Schmidt, Leonid Peshkin, Vasily E Ramensky, Anna Gerasimova, Peer Bork, Alexey S Kondrashov, Shamil R Sunyaev.   

Year: 2010; PMID: 20354512; PMCID: PMC2855889; DOI: 10.1038/nmeth0410-248
Source DB: PubMed; Journal: Nat Methods; ISSN: 1548-7091; Impact factor: 28.547


To the Editor: Applications of rapidly advancing sequencing technologies exacerbate the need to interpret individual sequence variants. Sequencing of phenotyped clinical subjects will soon become a method of choice in studies of the genetic causes of Mendelian and complex diseases. New exon capture techniques will direct sequencing efforts towards the most informative and easily interpretable protein-coding fraction of the genome. Thus, the demand for computational predictions of the impact of protein sequence variants will continue to grow. Here we present a new method and the corresponding software tool, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), which differs from the earlier tool PolyPhen [1] in the set of predictive features, the alignment pipeline, and the method of classification (Fig. 1a). PolyPhen-2 uses eight sequence-based and three structure-based predictive features (Supplementary Table 1), which were selected automatically by an iterative greedy algorithm (Supplementary Methods). The majority of these features involve comparison of a property of the wild-type (ancestral, normal) allele and the corresponding property of the mutant (derived, disease-causing) allele, which together define an amino acid replacement. The most informative features characterize how well the two human alleles fit into the pattern of amino acid replacements within the multiple sequence alignment of homologous proteins, how distant the protein harboring the first deviation from the human wild-type allele is from the human protein, and whether the mutant allele originated at a hypermutable site [2]. The alignment pipeline selects the set of homologous sequences for the analysis using a clustering algorithm and then constructs and refines their multiple alignment (Supplementary Fig. 1). The functional significance of an allele replacement is predicted from its individual features (Supplementary Figs. 2–4) by a Naïve Bayes classifier (Supplementary Methods).
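The letter states only that the features were chosen by an iterative greedy algorithm and defers the details to the Supplementary Methods. As a rough illustration of what greedy forward selection generally looks like, the sketch below adds one feature at a time, always taking the candidate that most improves a caller-supplied score. The function name, the stopping rule (`min_gain`), the feature names, and the toy score are all assumptions made for this example, not the authors' procedure.

```python
from typing import Callable, List, Sequence


def greedy_forward_selection(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    min_gain: float = 1e-9,
) -> List[str]:
    """Repeatedly add the candidate feature that most improves score(selected)."""
    selected: List[str] = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current feature set.
        trials = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score - best <= min_gain:
            break  # no remaining candidate improves the score enough
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected


# Hypothetical per-feature gains; a real score would be, for example,
# cross-validated classifier accuracy on a training set.
TOY_GAIN = {"psic_diff": 0.30, "cpg_context": 0.10, "noise": 0.01}


def toy_score(features: List[str]) -> float:
    # Sum of gains minus a fixed penalty of 0.05 per selected feature.
    return sum(TOY_GAIN[f] for f in features) - 0.05 * len(features)


chosen = greedy_forward_selection(list(TOY_GAIN), toy_score)
print(chosen)
```

With this toy score the uninformative `noise` feature is never added, because its gain does not cover the per-feature penalty.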
Figure 1

PolyPhen-2 pipeline and prediction accuracy. (a) Overview of the algorithm. (b) Receiver operating characteristic (ROC) curves for predictions made by PolyPhen-2 using five-fold cross-validation on HumDiv (red) and HumVar [3] (light green). UniRef100 (solid lines) and Swiss-Prot (dashed lines) databases were used for the homology search in the sequence analysis pipeline. Also shown are corresponding ROC curves for PolyPhen on HumDiv (orange) and HumVar (dark green) calculated from the difference between PSIC scores [1] of the wild-type and the mutant amino acid residues. (c) ROC curves for PolyPhen-2 trained on HumDiv and tested on a subset of HumVar non-overlapping with HumDiv (green). UniRef100 (solid lines) and Swiss-Prot (dashed lines) databases were used for the homology search. Also shown are ROC curves for SIFT [4] (blue), SNAP [5] (cyan) and SNPs3D [6] (brown) on HumVar. Methods other than PolyPhen-2 and PolyPhen could not easily be applied to HumDiv because using the same sequences for obtaining both multiple alignments and non-damaging replacements must be avoided. SIFT was used in conjunction with the Swiss-Prot database; SNAP and SNPs3D were used with their corresponding default databases. We used SIFT with the Swiss-Prot database for homology search because Swiss-Prot does not contain incomplete sequences, sequences of splice forms, or sequences of human allelic variants, making it possible to guarantee that allelic variants used in testing datasets would not appear in multiple sequence alignments used in computing prediction rules by other methods.

We used two pairs of datasets to train and test PolyPhen-2. We compiled the first pair, HumDiv, from all 3,155 damaging alleles with known effects on the molecular function causing human Mendelian diseases, present in the UniProt database, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging (Supplementary Methods). The second pair, HumVar [3], consists of all 13,032 human disease-causing mutations from UniProt, together with 8,946 human nsSNPs without annotated involvement in disease, which were treated as non-damaging. We found that PolyPhen-2 performance, as presented by its receiver operating characteristic curves, was consistently superior to that of PolyPhen (Fig. 1b), and it also compared favorably with the three other popular prediction tools [4–6] (Fig. 1c). For a false positive rate of 20%, PolyPhen-2 achieves true positive rates of 92% and 73% on HumDiv and HumVar, respectively (Supplementary Table 2). One reason for the lower accuracy of predictions on HumVar is that nsSNPs assumed to be non-damaging in HumVar contain a sizable fraction of mildly deleterious alleles. In contrast, most amino acid replacements assumed non-damaging in HumDiv must be close to selective neutrality. Because alleles that are even mildly but unconditionally deleterious cannot be fixed in the evolving lineage, no method based on comparative sequence analysis is ideal for discriminating between drastically and mildly deleterious mutations, which are assigned to the opposite categories in HumVar. Another reason is that HumDiv uses an extra criterion to avoid possible erroneous annotations of damaging mutations.
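A statement such as "92% true positives at a 20% false positive rate" corresponds to reading one point off a ROC curve. The following sketch shows one simple way to compute that point from classifier scores and known labels. The function name and the toy data are hypothetical and unrelated to the HumDiv or HumVar results; the sketch scans every observed score as a threshold instead of interpolating between ROC points.

```python
from typing import Sequence


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float) -> float:
    """Best true positive rate over thresholds whose false positive rate is <= max_fpr.

    labels: 1 = damaging, 0 = non-damaging; a higher score means more likely damaging.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one damaging and one non-damaging example")
    best_tpr = 0.0
    for threshold in sorted(set(scores)):
        # Call a variant damaging when its score reaches the threshold.
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        if fp / n_neg <= max_fpr:
            best_tpr = max(best_tpr, tp / n_pos)
    return best_tpr


# Toy data: ten variants, five damaging (1) and five non-damaging (0).
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
toy_tpr = tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.20)
print(toy_tpr)
```

On this toy set, a threshold of 0.60 admits one false positive out of five non-damaging variants and recovers four of the five damaging ones.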
For a mutation, PolyPhen-2 calculates the Naïve Bayes posterior probability that this mutation is damaging and reports estimates of the false positive rate (the chance that the mutation is classified as damaging when it is in fact non-damaging) and the true positive rate (the chance that the mutation is classified as damaging when it is indeed damaging). A mutation is also appraised qualitatively, as benign, possibly damaging, or probably damaging (Supplementary Methods). The user can choose between HumDiv- and HumVar-trained PolyPhen-2. Diagnosis of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained PolyPhen-2 should be used for this task. In contrast, HumDiv-trained PolyPhen-2 should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.
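To make the posterior calculation concrete, here is a minimal sketch of a two-class Naïve Bayes posterior followed by a three-way qualitative call. It assumes that per-feature likelihoods have already been estimated from training data. The function names, the example likelihood values, and in particular the cut-offs 0.5 and 0.9 are hypothetical; the actual thresholds used by PolyPhen-2 are defined in the Supplementary Methods and are not reproduced here.

```python
import math
from typing import Sequence


def naive_bayes_posterior(
    prior_damaging: float,
    likelihoods_damaging: Sequence[float],
    likelihoods_benign: Sequence[float],
) -> float:
    """P(damaging | features), treating features as independent given the class.

    likelihoods_damaging[i] = P(feature_i value | damaging), and likewise for benign.
    """
    log_d = math.log(prior_damaging) + sum(math.log(p) for p in likelihoods_damaging)
    log_b = math.log(1.0 - prior_damaging) + sum(math.log(p) for p in likelihoods_benign)
    # Subtract the larger log term before exponentiating to avoid underflow.
    m = max(log_d, log_b)
    d, b = math.exp(log_d - m), math.exp(log_b - m)
    return d / (d + b)


def qualitative_call(posterior: float, possibly: float = 0.5, probably: float = 0.9) -> str:
    """Map a posterior to a label; both cut-offs are hypothetical defaults."""
    if posterior >= probably:
        return "probably damaging"
    if posterior >= possibly:
        return "possibly damaging"
    return "benign"


# Toy example with two features and equal class priors.
toy_posterior = naive_bayes_posterior(0.5, [0.8, 0.6], [0.2, 0.3])
toy_call = qualitative_call(toy_posterior)
print(round(toy_posterior, 4), toy_call)
```

Here the damaging class has joint weight 0.5 × 0.8 × 0.6 = 0.24 and the benign class 0.5 × 0.2 × 0.3 = 0.03, so the posterior is 0.24 / 0.27, which falls between the two illustrative cut-offs.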
References

1.  Vasily Ramensky; Peer Bork; Shamil Sunyaev. Human non-synonymous SNPs: server and survey. Nucleic Acids Res, 2002-09-01.

3.  E Capriotti; R Calabrese; R Casadio. Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics, 2006-08-07.

4.  Pauline C Ng; Steven Henikoff. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res, 2003-07-01.

5.  Yana Bromberg; Guy Yachdav; Burkhard Rost. SNAP predicts effect of mutations on protein function. Bioinformatics, 2008-08-30.

6.  Peng Yue; Eugene Melamud; John Moult. SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics, 2006-03-22.
