Monique Ohanian, Robyn Otway, Diane Fatkin.
Keywords: cardiovascular disease; genetic variants; nonsynonymous; prediction
Year: 2012 PMID: 23316295 PMCID: PMC3541611 DOI: 10.1161/JAHA.112.002642
Source DB: PubMed Journal: J Am Heart Assoc ISSN: 2047-9980 Impact factor: 5.501
Figure 1. Flow chart showing steps for DNA sequence analysis. ESP indicates Exome Sequencing Project; 1000G, 1000 Genomes Project.
Characteristics of 8 Commonly Used Gene Variant Functional Prediction Programs
| Programs | Web Site | Method | Parameters Used | Training Data | Reference |
|---|---|---|---|---|---|
| PANTHER | | Hidden Markov Model | Evolutionary conservation across multiple protein families | Disease-associated mutations from HGMD; presumed neutral variants in dbSNP | [ |
| SIFT | | Conservation of protein homologues | Evolutionary conservation | 1 retroviral + 2 bacterial mutagenesis data sets; 5218 human disease-associated SNPs in Swiss-Prot; 3084 SNPs in dbSNP | [ |
| Align-GVGD | | GV, GD | Evolutionary conservation + biochemical properties (amino acid composition, polarity, volume) | Concurrence of unclassified variants with deleterious mutations in | [ |
| PMut | | Neural network | Evolutionary conservation + structural effects (secondary structure and solvent accessibility) | 9334 human disease-associated mutations in 811 proteins from Swiss-Prot; 11 372 neutral variants from | [ |
| SNPs3D | | Support vector machine | Evolutionary conservation + structural effects (protein folding) | Monogenic disease data from HGMD; 10 263 disease SNPs in 731 genes; 16 682 control SNPs | [ |
| PolyPhen-2 | | Naive Bayes classifier | Evolutionary conservation + structural effects | 2 training models: Hum Div (3155 Mendelian disease-causing variants in UniProt; 6321 presumed nondamaging SNPs) and Hum Var (13 032 human disease-causing mutations from UniProt; 8946 common human nsSNPs with no link to disease) | [ |
| MutPred | | Random forest | Evolutionary conservation | 26 655 disease-associated mutations in HGMD; 23 426 presumed neutral SNPs in Swiss-Prot | [ |
| SNPs&GO | | Support vector machine | Evolutionary conservation + local sequence + gene ontology score | 16 330 disease-associated SNPs from Swiss-Prot; 17 432 presumed neutral SNPs from Swiss-Prot | [ |
GD indicates Grantham deviation; GV, Grantham variation; HGMD, Human Gene Mutation Database[1]; MSA, multiple sequence alignment; nsSNP, nonsynonymous SNP; SNP, single-nucleotide polymorphism.
PolyPhen-2 uses 8 sequence-based and 3 structure-based features, including the position-specific independent count score of the wild-type allele, the difference in this score between the wild-type and variant alleles, the number of residues observed at the position in the MSA, residue side-chain volume change, variant position with respect to a protein domain defined by Pfam, variant allele congruency to the MSA, sequence identity with the closest homologue deviating from the wild-type allele, normalized accessible surface area of the amino acid residue, crystallographic B-factor, and change in accessible surface area propensity for buried residues.
SIFT score, Pfam profile score, and transition frequency (likelihood of observing a given SNP in the UniRef80 database and Protein Data Bank).
Predicted secondary structure, solvent accessibility, transmembrane helices, coiled-coil structure, stability, B-factor, and intrinsic disorder.
Input and Output Characteristics for 8 Common Prediction Algorithms
| Programs | Input | Access to Intermediate Information | Output | Program-Recommended Pathogenicity Criteria |
|---|---|---|---|---|
| PANTHER | WT protein sequence (FASTA or plain format), variant/s of interest; MSA is program generated | MSA (and phylogenetic tree) | subPSEC score: 0 (benign) to −10 (most deleterious); | subPSEC score: <−3 (50% likelihood of deleterious effects); |
| SIFT | WT protein sequence (FASTA format) or Clustal-formatted MSA (WT query sequence must appear first in MSA), variant/s of interest; MSA is program- or user generated | MSA (if single query sequence inputted) | Scaled probability score: 0 (most deleterious) to 1 (benign); no. sequences at position; median sequence conservation | Scaled probability score: <0.05 |
| Align-GVGD | FASTA-formatted MSA | No | Combined GV+GD risk estimate: C0 (lowest risk) to C65 (highest risk); individual GV and GD scores | Incremental risk estimates: 1.0- (C0) to >4.0-fold (C65) |
| PMut | WT protein sequence (FASTA or plain format), or FASTA-formatted MSA (WT query sequence must appear first in MSA), variant/s of interest | PSI-BLAST raw output (protein family analysis), MSA (FASTA format), PHD raw output (secondary structure and accessibility predictions) | Qualitative prediction: neutral or pathogenic; pathogenicity index: 0 (low) to 1.0 (high); reliability: 0 (low) to 9 (high) | Pathogenicity index: >0.5; reliability: >5 |
| SNPs3D | dbSNP, RefSNP or sequence accession number (if variant not present in results list, select protein accession and enter mutation manually); MSA is program generated | MSA | SVM score: positive (nondeleterious) or negative (deleterious) | Negative SVM score |
| PolyPhen-2 | WT protein sequence (FASTA format) or protein identifier, variant position, WT and variant amino acids; MSA is program generated unless downloaded stand-alone version used to input user-generated MSA | MSA, 3D visualization (if protein structure information available) | Qualitative prediction: benign, possibly damaging, probably damaging; Hum Div/Hum Var scores: 0 (benign) to 1.0 (most deleterious); sensitivity: 0 (low) to 1.0 (high); specificity: 0 (low) to 1.0 (high) | Probably damaging prediction; HD/HV scores: closer to 1 |
| MutPred | WT protein sequence in FASTA format, variant/s of interest; MSA is program generated | No | “ | Possibly deleterious ( |
| SNPs&GO | UNIPROT accession number, variant position, WT and variant amino acids; MSA is program generated | No | Qualitative prediction: neutral or disease related; reliability index: 0 (unreliable) to 10 (reliable) | Disease prediction; reliability index: >5 |
General (“g”) score indicates probability that an amino acid substitution is deleterious; MSA, multiple sequence alignment; property (“p”) score, statistical likelihood (P value) that structural and functional properties will be altered; Pdel, deleterious probability; PHD, Profile fed neural network systems from Heidelberg; PSI-BLAST, Position-Specific Iterated Basic Local Alignment Search Tool; subPSEC, substitution position-specific evolutionary conservation score, estimated from the negative logarithm of the probability ratio of wild-type and mutant amino acids at a specific position; WT, wild type.
Except for 7 tumor-related genes in program library.
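The program-recommended criteria in the table above can be expressed as simple predicates. The following is a minimal sketch, not part of any of these programs: the function names are hypothetical, and only the rows whose thresholds are stated in full (PANTHER, SIFT, PMut, SNPs3D, SNPs&GO) are covered. The logistic conversion from subPSEC to a deleterious probability is an assumption not given in the table; it is chosen because it yields 0.5 at subPSEC = −3, consistent with the 50% likelihood quoted there.

```python
import math


def panther_pdeleterious(subpsec: float) -> float:
    """Assumed logistic mapping from subPSEC to a deleterious probability.

    Gives 0.5 at subPSEC = -3 and approaches 1 as subPSEC becomes more negative.
    """
    return 1.0 / (1.0 + math.exp(subpsec + 3.0))


def panther_flags(subpsec: float) -> bool:
    # Table criterion: subPSEC score < -3
    return subpsec < -3


def sift_flags(score: float) -> bool:
    # Table criterion: scaled probability score < 0.05
    return score < 0.05


def pmut_flags(pathogenicity_index: float, reliability: int) -> bool:
    # Table criterion: pathogenicity index > 0.5 and reliability > 5
    return pathogenicity_index > 0.5 and reliability > 5


def snps3d_flags(svm_score: float) -> bool:
    # Table criterion: negative SVM score
    return svm_score < 0


def snps_and_go_flags(disease_related: bool, reliability_index: int) -> bool:
    # Table criterion: disease prediction with reliability index > 5
    return disease_related and reliability_index > 5


if __name__ == "__main__":
    # Illustrative scores only; these are not results for any real variant.
    print(panther_flags(-4.2), round(panther_pdeleterious(-4.2), 3))
    print(sift_flags(0.02), pmut_flags(0.8, 7), snps3d_flags(-1.1))
```

Keeping one predicate per program makes the differing score directions explicit: a low SIFT score and a negative SNPs3D score both indicate a deleterious call, whereas a high PMut index does.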
Nonsynonymous Variants Associated With Cardiac Disorders
| Gene | Protein | Variant | Location | Clinical Association | Genetic Evidence | Functional Evidence | Reference |
|---|---|---|---|---|---|---|---|
| Rare variants | |||||||
| | Lamin A/C | N195K | Coiled-coil rod domain | DCM | Family | Yes | [ |
| | β-Myosin heavy chain | R403Q | Myosin head, interacts with actin | HCM | Family | Yes | [ |
| | β-Myosin heavy chain | S532P | Actin-binding domain | DCM | Family | Yes | [ |
| | Cardiac troponin T | R92Q | α-Tropomyosin-binding domain | HCM | Family | Yes | [ |
| | Cardiac troponin T | R141W | α-Tropomyosin-binding domain | DCM | Family | Yes | [ |
| | α-Tropomyosin | D175N | Troponin T–binding domain | HCM | Family | Yes | [ |
| | KCNQ1 | S140G | S1 transmembrane domain | AF | Family | Yes | [ |
| | KCNQ1 | Y315S | Pore-forming domain | LQTS | Family | Yes | [ |
| | HERG | G628S | Pore-forming domain | LQTS | Sporadic | Yes | [ |
| Common variants | |||||||
| | α-Myosin heavy chain | A1101V | Coiled-coil rod domain | HR, PR | Case–control | No | [ |
| | Angiotensinogen | M235T | Polypeptide chain | HT | Case–control | Yes | [ |
| | Endothelial NO synthase | E298D | NOSIP interaction region | AF, CAD | Case–control | Yes | [ |
| | HERG | K897T | Intracellular C-terminal domain | LQTS, AF | Case–control | Yes | [ |
| | KCNE1 | S38G | Extracellular N-terminal domain | AF | Case–control | Yes | [ |
| | Cardiac sodium channel | H558R | Intracellular repeat I/II linker | AF | Case–control | Yes | [ |
| | β1-adrenergic receptor | S49G | Extracellular N-terminal domain | HR, DCM | Case–control | Yes | [ |
| | β1-adrenergic receptor | G389R | Intracellular C-terminal domain | HF, AF | Case–control | Yes | [ |
| | Cytochrome P450 2C9 | I359L | Substrate recognition site 5 | Warfarin dose | Case–control | Yes | [ |
AF indicates atrial fibrillation; CAD, coronary artery disease; DCM, dilated cardiomyopathy; HCM, hypertrophic cardiomyopathy; HF, heart failure; HR, heart rate; HT, hypertension; LQTS, long QT syndrome; NO, nitric oxide; NOSIP, eNOS interacting protein; PR, PR interval.
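Entries in the Variant column use the compact substitution notation of wild-type residue, position, and variant residue (for example, R403Q replaces arginine with glutamine at position 403). Most of the prediction programs ask for these three parts separately, so a small parser can be useful. The sketch below is illustrative only; the `Substitution` type and `parse_substitution` function are hypothetical names, and it accepts only the 20 standard one-letter amino acid codes.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str  # one-letter code of the reference residue
    position: int   # 1-based residue position in the protein
    variant: str    # one-letter code of the substituted residue


# The 20 standard amino acids in one-letter code.
_AA = "ACDEFGHIKLMNPQRSTVWY"
_PATTERN = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'R403Q' into wild type, position, and variant."""
    match = _PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    wild_type, position, variant = match.groups()
    return Substitution(wild_type, int(position), variant)


if __name__ == "__main__":
    for label in ("N195K", "R403Q", "A1101V"):
        print(parse_substitution(label))
```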
Predicted Effects* of Rare and Common Nonsynonymous Variants
Studies Comparing Performance of Different Prediction Algorithms
| Programs Tested | Variants Evaluated | Sensitivities (True-Positive Rates) | Specificities (True-Negative Rates) | Overall Accuracy | Reference |
|---|---|---|---|---|---|
| SIFT, PolyPhen, Align-GVGD, BLOSUM62 | 254 Missense variants in 5 genes involved in familial cancer syndromes and noncancer genetic disease | SIFT (84%), PolyPhen (83%), BLOSUM62 (75%), Align-GVGD (69%) | BLOSUM62 (85%), Align-GVGD (84%), SIFT (77%), PolyPhen (58%) | SIFT (82%), BLOSUM62 (78%), PolyPhen (76%), Align-GVGD (73%) | [ |
| SIFT, PolyPhen, PMut, SNPs3D, PhD-SNP, nsSNPAnalyzer | 204 Variants in the human cystathionine β synthase gene | SIFT (89%), PolyPhen (87%; if “possibly damaging” variants were grouped as deleterious), SNPs3D (82%), nsSNPAnalyzer (80%), PhD-SNP (70%), PMut (44%) | PMut (79%), PolyPhen (70%; if “possibly damaging” variants grouped as neutral), nsSNPAnalyzer (59%), PhD-SNP (53%), SIFT (52%), SNPs3D (47%) | PolyPhen (71%; if “possibly damaging” variants grouped as neutral), nsSNPAnalyzer (67%), PMut and SIFT (66%), SNPs3D (61%), PhD-SNP (59%) | [ |
| SIFT, Align-GVGD, PolyPhen-2, Xvar | 267 Variants in 4 cancer-susceptibility genes | Median sensitivities: Xvar (98%), PolyPhen-2 (90%), SIFT (85%), Align-GVGD (10%) | Median specificities: Align-GVGD (>95%), SIFT (52%), PolyPhen-2 (40%), Xvar (33%; if | Align-GVGD, PolyPhen-2, and Xvar (79%), SIFT (77%) | [ |
| MutationTaster, PolyPhen, PolyPhen-2, SNAP, PANTHER, PMut | 1000 Disease-associated mutations and 1000 polymorphisms | MutationTaster (86%), PolyPhen and PolyPhen-2 (78%), SNAP (69%), PMut (68%), PANTHER (50%) | MutationTaster (86%), PolyPhen-2 (83%), PolyPhen (74%), SNAP (69%), PMut (63%), PANTHER (52%) | MutationTaster (86%), PolyPhen (76%), PolyPhen-2 (72%), PMut (65%), SNAP (60%), PANTHER (35%) | [ |
| MutPred, nsSNPAnalyzer, PANTHER, PhD-SNP, PolyPhen, PolyPhen-2, SIFT, SNAP, SNPs&GO | More than 40 000 variants from dbSNP, PhenCode, LSDBs, IDbases | SNAP (88%), PolyPhen-2 (86%), MutPred (85%), PANTHER (77%), PolyPhen (74%), SNPs&GO (71%), SIFT (68%), PhD-SNP (63%), nsSNPAnalyzer (61%) | SNPs&GO (92%), PolyPhen (85%), PhD-SNP (79%), MutPred (78%), PANTHER (76%), PolyPhen-2 (70%), SIFT (62%), nsSNPAnalyzer (58%), SNAP (56%) | SNPs&GO (82%), MutPred (81%), PANTHER (76%), SNAP (72%), PhD-SNP and PolyPhen-2 (71%), PolyPhen (70%), SIFT (65%), nsSNPAnalyzer (60%) | [ |
IDbases indicates LSDBs for immunodeficiency-causing mutations; LSDB, locus-specific databases.
Overall accuracy is an estimate based on true positives and true negatives; the formulas used vary somewhat between publications.
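For orientation, the three metrics reported in the table can be computed from confusion-matrix counts as sketched below. This uses the standard textbook definitions and is an assumption about the general approach, since the cited studies did not all use identical formulas. The counts in the usage example are hypothetical and are not taken from any of the studies.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: deleterious variants correctly called deleterious."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: neutral variants correctly called neutral."""
    return tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all variants classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Hypothetical counts for 100 deleterious and 100 neutral variants.
    tp, fn, tn, fp = 86, 14, 70, 30
    print(f"sensitivity={sensitivity(tp, fn):.2f}")
    print(f"specificity={specificity(tn, fp):.2f}")
    print(f"accuracy={accuracy(tp, tn, fp, fn):.2f}")
```

Because accuracy weights sensitivity and specificity by the number of deleterious and neutral variants in the test set, a program's ranking by accuracy can differ between data sets even when its sensitivity and specificity are unchanged.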