Daniele Raimondi, Ibrahim Tanyalcin, Julien Ferté, Andrea Gazzo, Gabriele Orlando, Tom Lenaerts, Marianne Rooman, Wim Vranken.
Abstract
High-throughput sequencing methods are generating enormous amounts of genomic data, giving unprecedented insights into human genetic variation and its relation to disease. An individual human genome contains millions of Single Nucleotide Variants: to discriminate the deleterious from the benign ones, a variety of methods have been developed that predict whether a protein-coding variant is likely to affect the carrier individual's health. We present such a method, DEOGEN2, which incorporates heterogeneous information about the molecular effects of the variants, the domains involved, the relevance of the gene and the interactions in which it participates. This extensive contextual information is non-linearly mapped into a single deleteriousness score for each variant. Because non-expert users can still find it difficult to assess what this score means, how it relates to the encoded protein, and where it originates, we developed an interactive online framework (http://deogen2.mutaframe.com/) that presents the DEOGEN2 deleteriousness predictions of all possible variants in all human proteins. The prediction is visualized so that both expert and non-expert users can gain insights into the meaning, protein context and origins of each prediction.
Year: 2017 PMID: 28498993 PMCID: PMC5570203 DOI: 10.1093/nar/gkx390
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Summary of the features used in DEOGEN2
| Feature | Status | Short name | Code |
|---|---|---|---|
| PROVEAN score | From version 1 | PROV | PR |
| Conservation Index | Improved | CI | CI |
| Mutant/wildtype log-odd ratio | Improved | LOR | LO |
| Early Folding predictions | New | EF | EF |
| PFAM log-odd score | New | PF | PF |
| Interaction patches annotation | New | INT | IN |
| RVIS | New | RVIS | RV |
| GDI | New | GDI | GD |
| Recessiveness index | From version 1 | REC | RE |
| Gene essentiality | From version 1 | ESS | ES |
| Pathway log-odd score | Extended data | PATH | PA |
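Comparison of DEOGEN2 cross-validated performance with state-of-the-art predictors on the Humsavar16 dataset (Sen: sensitivity; Spe: specificity; Bac: balanced accuracy; Pre: precision; MCC: Matthews correlation coefficient; all values multiplied by 100)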
| Method | Sen | Spe | Bac | Pre | MCC |
|---|---|---|---|---|---|
| PolyPhen2 | 85 | 73 | 79 | 71 | 57 |
| LRT | 84 | 70 | 77 | 70 | 54 |
| MutationTaster | | 70 | 82 | 70 | 63 |
| MutationAssessor | 81 | 71 | 76 | 69 | 52 |
| fatHMM | 78 | 85 | 82 | 80 | 63 |
| PROVEAN | 82 | 75 | 79 | 72 | 57 |
| metaSVM | 83 | | 88 | | |
| fatHMM-MKL | | 54 | 74 | 61 | 51 |
| SIFT | 85 | 68 | 77 | 67 | 53 |
| PON-P2 | 86 | 83 | 84 | 80 | 69 |
| VEST3 | 88 | 87 | 87 | 82 | 74 |
| DEOGEN | 77 | 92 | 84 | 85 | 71 |
| DEOGEN2 | 89 | 88 | | 84 | |
DEOGEN and DEOGEN2 scores were computed in-house with a stratified 10-fold cross-validation. DEOGEN2 uses the MCC-optimal deleteriousness threshold (score >0.45). PON-P2 predictions were obtained from its web server. All other scores were extracted from version 3.2 of dbNSFP (34).
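To illustrate what an "MCC-optimal" threshold means, the sketch below scans a grid of candidate cut-offs and keeps the one that gives the highest Matthews correlation coefficient. It is only a minimal illustration, not the DEOGEN2 procedure: the function names, the toy scores and labels, and the 0.05-step grid are all assumptions made for the example.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def confusion(scores, labels, threshold):
    """Count outcomes when score > threshold is called deleterious (label 1)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_deleterious = score > threshold
        if predicted_deleterious and label == 1:
            tp += 1
        elif predicted_deleterious and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def mcc_optimal_threshold(scores, labels, candidates):
    """Return (threshold, MCC) with the highest MCC; ties keep the earliest candidate."""
    best = None
    for threshold in candidates:
        value = mcc(*confusion(scores, labels, threshold))
        if best is None or value > best[1]:
            best = (threshold, value)
    return best


# Hypothetical toy data: 1 = deleterious, 0 = benign.
scores = [0.05, 0.20, 0.40, 0.50, 0.60, 0.90]
labels = [0, 0, 0, 1, 1, 1]
grid = [i / 20 for i in range(20)]  # 0.00, 0.05, ..., 0.95

best_threshold, best_mcc = mcc_optimal_threshold(scores, labels, grid)
print(best_threshold, best_mcc)
```

In a cross-validated setting, the scan would typically be run on held-out predictions only, so that the selected threshold is not tuned on the data used for training.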
Predictor performances on the Blind dataset
| Method | Sen | Spe | Bac | Pre | MCC |
|---|---|---|---|---|---|
| SIFT | 68 | 75 | 72 | 79 | 43 |
| PolyPhen2 (HVAR) | 88 | 67 | 78 | 78 | 57 |
| LRT | 88 | 66 | 77 | 78 | 57 |
| Mutation Taster | | 74 | 84 | 83 | 70 |
| Mutation Assessor | 70 | 80 | 75 | 83 | 49 |
| FatHMM | 55 | 91 | 73 | 90 | 48 |
| GERP++ | 77 | 72 | 75 | 80 | 49 |
| PhyloP | 76 | 73 | 75 | 80 | 49 |
| SNAP | 53 | 70 | 62 | 63 | 23 |
| SNP&GO | 55 | 94 | 75 | 89 | 53 |
| MutPred | 74 | 81 | 78 | 79 | 55 |
| CONDEL | 71 | 73 | 72 | 72 | 44 |
| CADD phred | 79 | 74 | 77 | 81 | 53 |
| M-CAP | 93 | 68 | 81 | 74 | 63 |
| DEOGEN | 46 | | 71 | | 48 |
| DEOGEN2 | 87 | 86 | | 87 | |
DEOGEN2 scores were computed in-house using a deleteriousness threshold of >0.5. M-CAP (35) scores were downloaded and interpreted using a pathogenicity threshold of >0.025. All other scores were taken from (8).
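The column values in the performance tables follow from a confusion matrix once a score threshold is fixed. The sketch below shows one way to compute them, assuming the abbreviations stand for sensitivity, specificity, balanced accuracy, precision and Matthews correlation coefficient, all reported as rounded percentages. The `evaluate` function and the toy data are hypothetical and are not taken from the paper.

```python
import math


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(scores, labels, threshold):
    """Binary metrics (x100, rounded) when score > threshold means deleterious."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s > threshold and y == 1)
    fp = sum(1 for s, y in pairs if s > threshold and y == 0)
    tn = sum(1 for s, y in pairs if s <= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s <= threshold and y == 1)

    sen = safe_div(tp, tp + fn)   # sensitivity (recall on deleterious variants)
    spe = safe_div(tn, tn + fp)   # specificity (recall on benign variants)
    pre = safe_div(tp, tp + fp)   # precision
    bac = (sen + spe) / 2         # balanced accuracy
    mcc = safe_div(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    metrics = {"Sen": sen, "Spe": spe, "Bac": bac, "Pre": pre, "MCC": mcc}
    return {name: round(100 * value) for name, value in metrics.items()}


# Hypothetical toy data: 1 = deleterious, 0 = benign.
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

result = evaluate(scores, labels, threshold=0.5)
print(result)  # {'Sen': 75, 'Spe': 75, 'Bac': 75, 'Pre': 75, 'MCC': 50}
```

The same function can be called with a different cut-off, for example `threshold=0.025`, to mirror how each predictor is interpreted with its own threshold.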
Figure 1. Overview of the DEOGEN2 web server visualization. (1) The user starts to enter a UniProt ID or sequence, which activates a dropdown list from which a human protein is selected. After pressing the play button, the user can navigate the sequence (2) to create and submit a variant for this sequence. After pressing ‘Submit sequence’, the variant is visualized in the page report (3), which contains two sections. The General section displays the change between the wild-type and variant amino acid, with the chemical structures of both shown, and the difference between the amino acids expressed on (A) the dashboard as a percentage; clicking on the percentage bar shows the breakdown of these components. The DEOGEN2 section shows the DEOGEN2 score with (B) a breakdown of the contribution of each machine learning feature, which tells the user which contextual information was most important in reaching the final score, and an overview of the raw feature scores used as input for the machine learning. Section (C) shows the distribution of all the variant scores in this protein, including in a heat map format (not shown). Information on data points is obtained by hovering over them; the visualization can be changed by clicking on the buttons or the graph icon in the top right corner.