Priscilla Machado do Nascimento, Inácio Gomes Medeiros, Raul Maia Falcão, Beatriz Stransky, Jorge Estefano Santana de Souza.
Abstract
BACKGROUND: A variant of unknown significance (VUS) is a gene variant that has been identified through genetic testing but whose significance to the function of the organism is not known. A current challenge in precision medicine is to identify precisely which mutations detected by sequencing play a relevant role in the treatment or diagnosis of a disease. The average accuracy of pathogenicity predictors is 85%. However, the predictors disagree substantially with one another on mutational impact and pathogenicity. Manual verification is therefore necessary to confirm the real effect of a mutation in each case.
Keywords: Decision tree; Mutation; Pathogenicity; Precision medicine; Predictor; VOUS
Year: 2020 PMID: 32151256 PMCID: PMC7063785 DOI: 10.1186/s12911-020-1060-0
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Accuracy of proposed model and predictors trained with full ClinVar version 2017-05-30, according to ClinVar version 2019-09-23
| Classifier | Accuracy | Predictor = N, | Predictor = P, | Predictor = N, | Predictor = P, |
|---|---|---|---|---|---|
| Extreme Gradient Boosting | 93 (0.3) | 92 (0.5) | 8 (0.5) | 7 (0.3) | 93 (0.3) |
| | 92 (0.3) | 91 (0.5) | 9 (0.5) | 8 (0.3) | 92 (0.3) |
| Random Forest | 92 (0.3) | 91 (0.5) | 9 (0.5) | 8 (0.3) | 92 (0.3) |
| Bagging | 92 (0.3) | 90 (0.5) | 10 (0.5) | 8 (0.3) | 92 (0.3) |
| K Nearest Neighbors | 92 (0.3) | 89 (0.5) | 11 (0.5) | 6 (0.3) | 94 (0.3) |
| Ada Boost | 92 (0.3) | 93 (0.5) | 7 (0.5) | 8 (0.3) | 92 (0.3) |
| Extra Trees | 91 (0.3) | 90 (0.5) | 10 (0.5) | 8 (0.3) | 92 (0.3) |
| Extra Tree | 91 (0.3) | 90 (0.5) | 10 (0.5) | 8 (0.3) | 92 (0.3) |
| Linear Discriminant Analysis | 91 (0.3) | 88 (0.6) | 12 (0.6) | 8 (0.3) | 92 (0.3) |
| Support Vector Machines (Linear kernel) | 91 (0.3) | 86 (0.6) | 14 (0.6) | 6 (0.3) | 94 (0.3) |
| SKLearn Decision Tree | 91 (0.3) | 90 (0.5) | 10 (0.5) | 8 (0.3) | 92 (0.3) |
| Multilayer Perceptron | 91 (0.3) | 85 (0.6) | 15 (0.6) | 6 (0.3) | 94 (0.3) |
| Quadratic Discriminant Analysis | 91 (0.3) | 88 (0.5) | 12 (0.5) | 8 (0.3) | 92 (0.3) |
| Bernoulli Naive Bayes | 91 (0.3) | 86 (0.6) | 14 (0.6) | 7 (0.3) | 93 (0.3) |
| Support Vector Machines (RBF Kernel) | 91 (0.3) | 86 (0.6) | 14 (0.6) | 7 (0.3) | 93 (0.3) |
| Logistic Regression | 91 (0.3) | 86 (0.6) | 14 (0.6) | 7 (0.3) | 93 (0.3) |
| Gaussian Naive Bayes | 90 (0.3) | 84 (0.6) | 16 (0.6) | 6 (0.3) | 94 (0.3) |
| Nu-Support Vector Machines | 87 (0.4) | 82 (0.6) | 18 (0.6) | 11 (0.3) | 89 (0.3) |
| PROVEAN | 83 (0.4) | 75 (0.7) | 25 (0.7) | 13 (0.4) | 87 (0.4) |
| MetaSVM | 81 (0.4) | 69 (0.6) | 31 (0.6) | 10 (0.4) | 90 (0.4) |
| Polyphen | 80 (0.4) | 82 (0.8) | 18 (0.8) | 20 (0.3) | 80 (0.3) |
| SIFT | 80 (0.4) | 77 (0.8) | 23 (0.8) | 18 (0.4) | 82 (0.4) |
*Mean and standard deviation were calculated from 1000 random samples, each containing 30% of ClinVar version 2019-09-23
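The resampling described in the footnote can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `(predicted, truth)` pair representation, and the choice to draw each sample without replacement are assumptions, since the source only states the number of samples and the sample fraction.

```python
import random
import statistics


def accuracy(pairs):
    """Percentage of (predicted, truth) pairs whose labels agree."""
    return 100.0 * sum(pred == truth for pred, truth in pairs) / len(pairs)


def resampled_accuracy(pairs, n_samples=1000, fraction=0.30, seed=0):
    """Mean and standard deviation of accuracy over random subsamples.

    Assumption: each subsample is drawn without replacement and holds
    `fraction` of the labelled variants.
    """
    rng = random.Random(seed)
    k = max(1, round(fraction * len(pairs)))
    scores = [accuracy(rng.sample(pairs, k)) for _ in range(n_samples)]
    return statistics.mean(scores), statistics.stdev(scores)
```

Here `pairs` would be a list such as `[("P", "P"), ("N", "P"), ...]`, with one entry per ClinVar variant that has a known label.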
Fig. 1 Decision tree obtained after integration and discretization of variables. Level 1: the root node receives the input variant and classifies it as Neutral if all three predictors classify it so (in which case the variant is not evaluated in later steps), or as Pathogenic if at least one of the three predictors classifies it so. Level 2: mutations classified as Pathogenic in the previous step are reevaluated according to their allele frequency in the ExAC database; they are reclassified as Neutral if the allele frequency is higher than 0.0001 (and are not evaluated in later steps), or kept as Pathogenic if it is lower than 0.0001. Level 3: mutations still classified as Pathogenic are reevaluated according to the number of predictors that agree. They are reclassified as Neutral if fewer than five predictors identify them as pathogenic (and are not evaluated in later steps), or kept as Pathogenic if five to nine predictors do. Level 4: mutations still classified as Pathogenic are reevaluated according to the COMMON variable from 1000 Genomes; they are reclassified as Neutral if the allele frequency is higher than 0.0001, or kept as Pathogenic if it is lower than 0.0001
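The four levels in the caption amount to a chain of early exits, which can be sketched as below. This is an illustration under stated assumptions rather than the published implementation: the `Variant` class and all of its field names are hypothetical, and because the caption does not say what happens when an allele frequency equals 0.0001 exactly, the sketch arbitrarily keeps such variants as Pathogenic.

```python
from dataclasses import dataclass
from typing import Tuple

AF_THRESHOLD = 0.0001  # allele-frequency cutoff quoted in the Fig. 1 caption


@dataclass
class Variant:
    # All field names are hypothetical, chosen for this sketch only.
    level1_calls: Tuple[str, str, str]  # "P" or "N" from each of the three Level 1 predictors
    exac_af: float                      # allele frequency in ExAC
    n_pathogenic_calls: int             # how many of the nine predictors call it pathogenic
    common_af: float                    # allele frequency behind the 1000 Genomes COMMON variable


def classify(v: Variant) -> Tuple[str, int]:
    """Return (label, level at which the label was decided)."""
    # Level 1: Neutral only if none of the three predictors calls it pathogenic.
    if not any(call == "P" for call in v.level1_calls):
        return "Neutral", 1
    # Level 2: variants that are frequent in ExAC are reclassified as Neutral.
    if v.exac_af > AF_THRESHOLD:
        return "Neutral", 2
    # Level 3: fewer than five pathogenic calls means Neutral.
    if v.n_pathogenic_calls < 5:
        return "Neutral", 3
    # Level 4: variants that are frequent in 1000 Genomes are reclassified as Neutral.
    if v.common_af > AF_THRESHOLD:
        return "Neutral", 4
    return "Pathogenic", 4
```

Returning the deciding level alongside the label makes it easy to see where in the tree a given variant stopped being evaluated.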
Accuracy at each level of the tree topology variants constructed from our proposed model, according to ClinVar (version 2017-05-30)
| Topology | Accuracy Lv. 1 (%) | Accuracy Lv. 2 (%) | Accuracy Lv. 3 (%) | Accuracy Lv. 4 (%) | FPR Lv. 1 (%) | FPR Lv. 2 (%) | FPR Lv. 3 (%) | FPR Lv. 4 (%) | FNR Lv. 1 (%) | FNR Lv. 2 (%) | FNR Lv. 3 (%) | FNR Lv. 4 (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 74 | 91 | 90 | 91 | 30 | 8 | 11 | 9 | 10 | 11 | 9 | 9 |
| SPP-ExAC-COMMON-NDamage | 74 | 91 | 87 | 91 | 30 | 8 | 16 | 9 | 10 | 11 | 8 | 9 |
| SPP-NDamage-ExAC-COMMON | 74 | 78 | 79 | 79 | 30 | 7 | 2 | 4 | 10 | 32 | 33 | 32 |
| SPP-NDamage-COMMON-ExAC | 74 | 78 | 79 | 79 | 30 | 7 | 4 | 2 | 10 | 32 | 32 | 33 |
| SPP-COMMON-ExAC-NDamage | 74 | 87 | 91 | 91 | 30 | 15 | 8 | 9 | 10 | 8 | 11 | 9 |
| SPP-COMMON-NDamage-ExAC | 74 | 87 | 79 | 79 | 30 | 15 | 4 | 2 | 10 | 8 | 32 | 33 |
| NDamage-SPP-ExAC-COMMON | 78 | 78 | 79 | 79 | 7 | 7 | 2 | 4 | 32 | 32 | 33 | 32 |
| NDamage-SPP-COMMON-ExAC | 78 | 78 | 79 | 79 | 7 | 7 | 4 | 2 | 32 | 32 | 32 | 33 |
| NDamage-ExAC-SPP-COMMON | 78 | 79 | 78 | 79 | 7 | 2 | 7 | 4 | 32 | 33 | 32 | 32 |
| NDamage-ExAC-COMMON-SPP | 78 | 79 | 79 | 79 | 7 | 2 | 4 | 4 | 32 | 33 | 32 | 32 |
| NDamage-COMMON-SPP-ExAC | 78 | 79 | 79 | 79 | 7 | 4 | 4 | 2 | 32 | 32 | 32 | 33 |
| NDamage-COMMON-ExAC-SPP | 78 | 79 | 79 | 79 | 7 | 4 | 2 | 4 | 32 | 32 | 33 | 32 |
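A per-level evaluation of one topology could look like the sketch below, where a topology name such as `SPP-ExAC-COMMON-NDamage` is read as the order in which the four filters are applied. Several things here are assumptions and not taken from the source: the dictionary keys, the filter predicates (modelled on the Fig. 1 caption), and the standard definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP), which the source does not spell out.

```python
AF_THRESHOLD = 0.0001

# Hypothetical field names. Each filter returns True while the variant
# is still called Pathogenic at that level.
FILTERS = {
    "SPP": lambda v: v["level1_pathogenic_calls"] >= 1,
    "ExAC": lambda v: v["exac_af"] <= AF_THRESHOLD,
    "NDamage": lambda v: v["n_pathogenic_calls"] >= 5,
    "COMMON": lambda v: v["common_af"] <= AF_THRESHOLD,
}


def rates_per_level(topology, variants):
    """Return [(accuracy, FPR, FNR), ...] in percent, one tuple per level."""
    levels = topology.split("-")
    results = []
    for depth in range(1, len(levels) + 1):
        tp = fp = tn = fn = 0
        for v in variants:
            # A variant stays Pathogenic only if it passes every filter so far.
            predicted_p = all(FILTERS[name](v) for name in levels[:depth])
            actual_p = v["clinvar"] == "P"
            if predicted_p and actual_p:
                tp += 1
            elif predicted_p:
                fp += 1
            elif actual_p:
                fn += 1
            else:
                tn += 1
        acc = 100.0 * (tp + tn) / len(variants)
        fpr = 100.0 * fp / (fp + tn) if fp + tn else 0.0
        fnr = 100.0 * fn / (fn + tp) if fn + tp else 0.0
        results.append((acc, fpr, fnr))
    return results
```

Calling `rates_per_level("SPP-ExAC-COMMON-NDamage", variants)` on a list of labelled variant dictionaries yields four tuples, one for each of the Lv. 1 to Lv. 4 column groups.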