Panagiotis Katsonis, Kevin Wilhelm, Amanda Williams, Olivier Lichtarge.
Abstract
Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.
Year: 2022 PMID: 35488922 PMCID: PMC9055222 DOI: 10.1007/s00439-022-02457-6
Source DB: PubMed Journal: Hum Genet ISSN: 0340-6717 Impact factor: 5.881
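Many of the homology-based predictors listed below rest on one idea: a substitution to a residue rarely or never seen at that position across species is more likely to be damaging. As a minimal sketch, assuming a single pre-computed alignment column and a plain additive pseudocount, the code below scores a substitution in a SIFT-like way: the probability of the alternate residue divided by that of the most frequent residue, with scores below 0.05 called deleterious. It is not SIFT itself, which adds sequence weighting and Dirichlet-mixture pseudocounts; the function names and the example column are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def column_probabilities(column: str, pseudocount: float = 1.0) -> dict[str, float]:
    """Residue probabilities for one alignment column.

    Gaps and non-standard characters are skipped; an additive pseudocount
    gives residues never seen in the column a small non-zero probability.
    """
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def sift_like_score(column: str, alt: str, pseudocount: float = 1.0) -> float:
    """P(alternate residue) scaled by P(most frequent residue), in (0, 1]."""
    alt = alt.upper()
    if len(alt) != 1 or alt not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid: {alt!r}")
    probs = column_probabilities(column, pseudocount)
    return probs[alt] / max(probs.values())


def classify(score: float, threshold: float = 0.05) -> str:
    """Call substitutions scoring below the threshold deleterious."""
    return "deleterious" if score < threshold else "tolerated"


if __name__ == "__main__":
    # Hypothetical column: 38 sequences carry Leu and 2 carry Ile at this position.
    column = "L" * 38 + "I" * 2
    for alt in "LIP":
        score = sift_like_score(column, alt)
        print(alt, round(score, 3), classify(score))
```

With this toy column, Ile (seen twice) stays above the cutoff while Pro (never seen) falls below it, which is the behavior the conservation argument predicts.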
Representative predictors of variant fitness effects
| Type | Method name | Input features | Training dataset | Validating datasets | Paper | Citations |
|---|---|---|---|---|---|---|
| Homology | Align-GVGD | Multiple sequence alignment | – | Exp: p53 Clin: IARC p53 | Tavtigian et al. | 655 |
| Homology | DeMaSk | Multiple sequence alignment | Amino acid substitutions from 18 deep mutational scanning studies | Leave-one-out | Munro and Singh | 6 |
| Homology | EFIN | Sequence conservation features | UniProt, HumDiv | SwissProt | Zeng et al. | 25 |
| Homology | Evolutionary Action | Multiple sequence alignment | – | Exp: lacI, lysozyme, HIV protease, RecA, p53 Clin: IARC p53, UniProt, CFTR, GAA Pop: 1000 Genomes | Katsonis and Lichtarge | 105 |
| Homology | FATHMM | Multiple sequence alignment | – | Exp: VariBench Clin: HGMD, UniProt, SwissVar Oth: Hicks et al. | Shihab et al. | 932 |
| Homology | likelihood ratio test (LRT) | Multiple sequence alignment | – | SNPs of three individual exomes; OMIM database | Chun and Fay | 920 |
| Homology | MAPP | Multiple sequence alignment, phylogenetic tree | – | Exp: lacI, lysozyme, HIV protease, HIV RT Clin: pyruvate kinase, G6PD, HBB, IARC p53 | Stone and Sidow | 371 |
| Homology | MutationAssessor | Multiple sequence alignment | – | UniProt, IARC p53, COSMIC | Reva et al. | 1687 |
| Homology | PANTHER | Multiple sequence alignment | – | Clin: HGMD, dbSNP, Pop: SNPs sampled from healthy individuals | Thomas et al. | 2924 |
| Homology | PhD-SNP | Multiple sequence alignment, local sequence environment | Swiss-Prot, HumVar | Newer HumVar variants | Capriotti et al. | 732 |
| Homology | PrimateAI | Sequence alignments and sequence-based predictions of structure features | Path: SNVs absent from ExAC Ben: common SNVs and primate changes | Withheld variants from training | Sundaram et al. | 183 |
| Homology | PROVEAN | Multiple sequence alignment | – | Exp: LacI, TP53, ABCA1 Clin: UniProtKB/Swiss-Prot | Choi et al. | 2445 |
| Homology | SIFT/SIFT4G | Multiple sequence alignment | – | Exp: lacI, lysozyme, HIV protease Clin: HumDiv, HumVar | Ng and Henikoff | 5242 |
| | | | | | Ng and Henikoff | 2543 |
| | | | | | Vaser et al. | 691 |
| Multiple features | DeepSAV | 16 conservation, 28 structure and 1 dynamics score | ClinVar and UniProt | ClinVar and UniProt | Pei et al. | 5 |
| Multiple features | ENTPRISE | Alignment and structure related features | Multiple sources, including: HGMD, UniProt, SwissProt | Clin: COSMIC, TCGA, COBR Pop: VariSNP, 1000 Genomes | Zhou et al. | 22 |
| Multiple features | Envision | homology, structure, amino acid properties, and more | large-scale experimental mutagenesis data | Withheld variants from training | Gray et al. | 107 |
| Multiple features | FATHMM-MKL | 10 feature groups (conservation and function annotations) | Path: HGMD Ben: 1000 Genomes | ClinVar, newer HGMD variants | Shihab et al. | 451 |
| Multiple features | FATHMM-XF | 31 feature groups (conservation, ENCODE, Epigenomics, and more) | Path: HGMD Ben: 1000 Genomes | ClinVar | Rogers et al. | 174 |
| Multiple features | MutationTaster/MutationTaster2 | Conservation, sequence, and protein features | Path: HGMD, OMIM, and more Ben: 1000 Genomes, HapMap, and more | Withheld variants from training | Schwarz et al. | 2675 |
| | | | | | Schwarz et al. | 2778 |
| Multiple features | MutPred/MutPred2 | Structure, function annotation, and evolutionary features | Path: HGMD, SwissProt, Cancer drivers Ben: SwissProt | Withheld variants from training/ClinVar, SwissVar, experimental data | Li et al. | 773 |
| | | | | | Pejaver et al. | 190 |
| Multiple features | PolyPhen/PolyPhen-2 | Sequence and structure based features | HGVbase / Path: UniProt (HumDiv, HumVar) Ben: Differences between orthologs | HumVar | Ramensky et al. | 2538 |
| | | | | | Adzhubei et al. | 11,660 |
| Multiple features | PON-P2 | Conservation, functional ontology, and structural features | VariBench | Withheld variants from training | Niroula et al. | 161 |
| Multiple features | SNAP/SNAP2 | Conservation, structure prediction, and more | Path: Protein Mutant Database Ben: Swiss-Prot | Exp: lacI, lysozyme, HIV protease, Melanocortin-4 | Bromberg and Rost | 807 |
| | | | | | Hecht et al. | 339 |
| Multiple features | SNPs&GO | 50 features of mutation, sequence profile, PANTHER output, and GO classification | Swiss-Prot | Disease classification from Goh et al. 2007 | Calabrese et al. | 600 |
| Multiple features | SNPs3D | Structure related features/alignment related features | Path: HGMD Ben: Differences of homologous proteins | Clin: HGMD Pop: dbSNP | Yue et al. | 452 |
| | | | | | Yue and Moult | 283 |
| Multiple features | VEST | 86 features of SNVBox | Path: HGMD Ben: ESP | Pop: SwissProt, 1000 Genomes | Carter et al. | 342 |
| Multiple features | VIPUR | 20 protein sequence and structure-based features | UniProt | Withheld variants from training | Baugh et al. | 50 |
| Functional genomic | fitCons | DNase-seq data, RNA-seq data, ChIP-seq data | – | Transcription factor binding sites, expression quantitative trait loci (eQTL), enhancers based on characteristic chromatin marks | Gulko et al. | 215 |
| Ensemble + | CADD | 63 features, including: predictors, conservation scores, and function annotation | Human-chimp changes; simulated de novo variants | Exp: ALDOB, ECR11, HBB Clin: HGMD, MML2, ClinVar, IARC p53 Pop: ESP, GVS, 11 individuals | Kircher et al. | 4681 |
| Ensemble + | CAPICE | 63 features, including: predictors, conservation scores, and function annotation | Path: ClinVar, VKGL Ben: ClinVar, VKGL, ExAC | ClinVar, VKGL and ExAC | Li et al. | 9 |
| Ensemble + | ClinPred | 16 predictors and allele frequencies of gnomAD | ClinVar | Exp: BRCA1 Clin: Mutagenetix database, DoCM | Alirezaie et al. | 79 |
| Ensemble | Condel | 5 predictors (Logre, MAPP, MutationAssessor, PolyPhen-2, and SIFT) | HumVar, HumDiv, COSMIC, p53 | – | Gonzalez-Perez and Lopez-Bigas | 788 |
| Ensemble + | DANN | 949 features, including: predictors, conservation scores, and function annotation | Human-chimp changes; simulated de novo variants | Clin: ClinVar Pop: ESP | Quang et al. | 689 |
| Ensemble + | DEOGEN / DEOGEN2 | PROVEAN scores, alignment, network, pathway, gene essentiality and more features | UniProt | Clin: UniProt, p53, F8, BRCA1 | Raimondi et al. | 26 |
| | | | | | Raimondi et al. | 61 |
| Ensemble + | Eigen | Predictors, conservation scores and allele frequencies of the 1000 Genomes | ClinVar | De novo variants in several studies | Ionita-Laza et al. | 408 |
| Ensemble | InMeRF | 28 predictors and 9 conservation scores | Path: HGMD Ben: variants with MAF > 0.1% | VariBench, PredictSNP, SwissVar | Takeda et al. | 3 |
| Ensemble + | M-CAP | 9 predictors, 7 conservation scores, and 298 alignment-based features | Path: HGMD (AF < 1%) Ben: ExAC (AF < 1%) | Withheld variants from training | Jagadeesh et al. | 531 |
| Ensemble | MetaLR/MetaSVM | 15 predictors and 3 conservation scores | UniProt | VariBench, CHARGE database, and publications | Dong et al. | 769 |
| Ensemble | Meta-SNP | 4 predictors: PANTHER, PhD-SNP, SIFT, and SNAP | SwissVar | Newer SwissVar variants | Capriotti et al. | 176 |
| Ensemble + | MISTIC | 7 predictors, 8 conservation scores, MAF, and genetic and protein function | Path: ClinVar Ben: gnomAD | Clin: new ClinVar Pop: SweGen, UK10K, and more | Chennen et al. | 12 |
| Ensemble + | MPC | PolyPhen-2 and other deleteriousness metrics | Path: ClinVar Ben: common ExAC variants | 5620 neurodevelopmental disorder cases and 2078 controls | Samocha et al. | 146 |
| Ensemble + | MutScore | 5 predictors (SIFT, SIFT4G, LRT, PROVEAN, GERP++ RS) and 9 conservation scores | ClinVar | ClinVar | Quinodoz et al. | 0 |
| Ensemble + | MVP | 15 predictors, 6 conservation scores, structure, interactions, gene intolerance, and more | Path: HGMD, UniProt, ClinVar Ben: UniProt, and more | Path: VariBench, Cancer hotspots Ben: DiscovEHR | Qi et al. | 25 |
| Ensemble | PON-P | 5 predictors: PhD-SNP, SIFT, PolyPhen-2, SNAP, I-Mutant | Path: PhenCode, IDbases, and more Ben: dbSNP with AF > 0.1 | Protein Mutant Database | Olatubosun et al. | 108 |
| Ensemble | PredictSNP/PredictSNP2 | 8 predictors / 6 predictors | UniProt and training datasets of: SNPs&GO, MutPred, and PON-P / ClinVar, GWAS catalog, COSMIC, VariSNP | Protein Mutant Database and experimental studies / Mendelian disease and cancer driver variants | Bendl et al. | 485 |
| | | | | | Bendl et al. | 119 |
| Ensemble | REVEL | 10 predictors and 8 conservation scores | Path: HGMD Ben: ESP, ARIC, 1000 Genomes | ClinVar and SwissVar | Ioannidis et al. | 853 |
| Ensemble + | Rhapsody | 4 sequence, 1 structure, and 4 dynamic scores | HumVar, ExoVar, PredictSNP, VariBench, and SwissVar | HumVar, ExoVar, PredictSNP, VariBench, and SwissVar | Ponzoni et al. | 32 |
Methods citations (Adzhubei et al. 2010; Alirezaie et al. 2018; Baugh et al. 2016; Bendl et al. 2014, 2016; Bromberg and Rost 2007; Calabrese et al. 2009; Capriotti et al. 2006, 2013; Carter et al. 2013; Chennen et al. 2020; Choi et al. 2012; Chun and Fay 2009; Dong et al. 2015; Gonzalez-Perez and Lopez-Bigas 2011; Gray et al. 2018; Gulko et al. 2015; Hecht et al. 2015; Ioannidis et al. 2016; Ionita-Laza et al. 2016; Jagadeesh et al. 2016; Katsonis and Lichtarge 2014; Kircher et al. 2014; Li et al. 2009, 2020; Munro and Singh 2020; Ng and Henikoff 2001, 2003; Niroula et al. 2015; Olatubosun et al. 2012; Pei et al. 2020; Pejaver et al. 2020; Ponzoni et al. 2020; Qi et al. 2021; Quang et al. 2015; Quinodoz et al. 2022; Raimondi et al. 2016, 2017; Ramensky et al. 2002; Reva et al. 2011; Rogers et al. 2018; Samocha et al. 2017; Schwarz et al. 2010; Shihab et al. 2013, 2014, 2015; Stone and Sidow 2005; Sundaram et al. 2018; Takeda et al. 2020; Tavtigian et al. 2006; Thomas et al. 2003; Vaser et al. 2016; Yue et al. 2005; Yue and Moult 2006; Zeng et al. 2014; Zhou et al. 2016)
*Method inclusion criteria. Each method must:
(i) be applicable to missense mutations;
(ii) provide a single value for the impact of a mutation;
(iii) have that value represent the overall effect on protein function.
Path pathogenic, Ben benign, Exp experimental associations, Clin clinical associations, Pop population data
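Several rows above are ensemble methods, which merge the outputs of existing predictors into one impact value per mutation. A minimal sketch of the simplest such merge, assuming each component predictor's raw score range and direction are known: rescale every score to [0, 1] so that 1 means most damaging, then take a weighted mean over the predictors that returned a value. Published ensembles such as Condel, REVEL, or MetaSVM use their own normalizations, weights, or trained classifiers, so this is not any of them; the `Predictor` class, the weights, and the score ranges are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predictor:
    """Hypothetical description of one component predictor."""
    name: str
    low: float                 # lowest raw score the tool can output
    high: float                # highest raw score the tool can output
    higher_is_damaging: bool   # False for tools such as SIFT, where low scores are damaging
    weight: float = 1.0        # illustrative weight, not taken from any published ensemble


def normalize(raw: float, p: Predictor) -> float:
    """Rescale a raw score to [0, 1] so that 1 always means most damaging."""
    x = (raw - p.low) / (p.high - p.low)
    x = min(max(x, 0.0), 1.0)  # clip values outside the declared range
    return x if p.higher_is_damaging else 1.0 - x


def ensemble_score(raw_scores: dict[str, float], predictors: list[Predictor]) -> float:
    """Weighted mean of normalized scores, skipping predictors with no output."""
    total = weight_sum = 0.0
    for p in predictors:
        if p.name in raw_scores:
            total += p.weight * normalize(raw_scores[p.name], p)
            weight_sum += p.weight
    if weight_sum == 0.0:
        raise ValueError("none of the predictors returned a score")
    return total / weight_sum


if __name__ == "__main__":
    tools = [
        Predictor("SIFT", 0.0, 1.0, higher_is_damaging=False),
        Predictor("PolyPhen-2", 0.0, 1.0, higher_is_damaging=True),
    ]
    # Both tools lean damaging here, so the combined value is close to 1.
    print(ensemble_score({"SIFT": 0.02, "PolyPhen-2": 0.98}, tools))
```

Skipping predictors that return no score, rather than imputing one, keeps the output defined for variants that only some tools can score, at the cost of mixing different subsets of evidence across variants.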
Predictors of residue importance
| Type | Method name | Input features | Validating datasets | Paper |
|---|---|---|---|---|
| Homology | ConSurf | Multiple sequence alignment | SH2 and PTB signaling domains/Bcl-XL/Bak peptide complex | Armon et al.; Glaser et al. |
| Homology | Evolutionary Trace | Multiple sequence alignment | Ligand binding sites (SH2, SH3 domains, DNA binding)/PDB structures with ligands bound | Lichtarge et al.; Mihalek et al. |
| Homology | GERP++ | Multiple sequence alignment | PolII binding regions (ENCODE) | Davydov et al. |
| Homology | PhastCons | Multiple sequence alignment | – | Siepel et al. |
| Homology | PSIC | Multiple sequence alignment | – | Sunyaev et al. |
| Homology | Rate4Site | Multiple sequence alignment | Src SH2 domain | Pupko et al. |
| Homology | SiPhy | Multiple sequence alignment | ENCODE regions | Garber et al. |
| Ensemble | PhyloP | 4 conservation scores: LRT, SCORE, SPH, GERP | ENCODE | Pollard et al. |
Methods citations (Armon et al. 2001; Davydov et al. 2010; Garber et al. 2009; Glaser et al. 2003; Lichtarge et al. 1996; Mihalek et al. 2004; Pollard et al. 2010; Pupko et al. 2002; Siepel et al. 2005; Sunyaev et al. 1999)
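The residue-importance scores above all summarize how variable each alignment position is. As a rough stand-in, assuming only raw amino acid counts per column, the sketch below ranks positions by normalized Shannon entropy (1 for an invariant column, near 0 for a uniform one). ConSurf, Rate4Site, Evolutionary Trace, and the nucleotide-level GERP++, PhastCons, and PhyloP scores are phylogeny-aware, so this is not any of those methods; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def column_conservation(column: str) -> float:
    """1 - H/ln(20), where H is the Shannon entropy of the column's residues."""
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]  # drop gaps
    if not residues:
        raise ValueError("column contains no amino acids")
    n = len(residues)
    entropy = -sum((c / n) * math.log(c / n) for c in Counter(residues).values())
    return 1.0 - entropy / math.log(len(AMINO_ACIDS))


def rank_positions(alignment: list[str]) -> list[int]:
    """Alignment positions (0-based), ordered from most to least conserved."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    scores = [column_conservation("".join(col)) for col in zip(*alignment)]
    # sorted() is stable, so tied positions keep their alignment order.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    toy_alignment = ["MKLV", "MKIA", "MRLG", "MKLS"]
    print(rank_positions(toy_alignment))
```

In the toy alignment the invariant first column ranks first and the fully variable last column ranks last.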
Fig. 1 Number of citations to the primary paper of each variant prediction method as a function of the year the paper was published. Citation counts were obtained from a Google Scholar search on 7 March 2022. When a method could be matched to multiple primary papers, or newer versions had been introduced, the paper with the most citations was used. Methods are classified as (i) analytical models not trained on available variant annotations (red), (ii) machine learning approaches trained on variant annotations (blue), (iii) ensemble models that integrate scores from available predictors (purple), and (iv) models that combine scores from available predictors with additional features (black).
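The figure's rule for methods with several primary papers (keep the most cited one) can be made concrete with the citation counts listed in the predictor table. A small sketch, assuming counts are stored as (paper, count) pairs; the `PAPERS` dictionary covers only a few rows and is an illustrative layout, not how the figure was produced.

```python
# Citation counts for some methods with more than one primary paper,
# transcribed from the predictor table (Google Scholar, 7 March 2022).
PAPERS: dict[str, list[tuple[str, int]]] = {
    "SIFT/SIFT4G": [("Ng and Henikoff", 5242), ("Ng and Henikoff", 2543), ("Vaser et al.", 691)],
    "MutationTaster/MutationTaster2": [("Schwarz et al.", 2675), ("Schwarz et al.", 2778)],
    "MutPred/MutPred2": [("Li et al.", 773), ("Pejaver et al.", 190)],
    "PolyPhen/PolyPhen-2": [("Ramensky et al.", 2538), ("Adzhubei et al.", 11660)],
}


def most_cited(papers: dict[str, list[tuple[str, int]]]) -> dict[str, tuple[str, int]]:
    """Keep, for each method, the paper with the highest citation count."""
    return {method: max(entries, key=lambda e: e[1]) for method, entries in papers.items()}


if __name__ == "__main__":
    for method, (paper, cites) in most_cited(PAPERS).items():
        print(f"{method}: {paper} ({cites} citations)")
```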