Helber Gonzales Almeida Palheta, Wanderson Gonçalves Gonçalves, Leonardo Miranda Brito, Arthur Ribeiro Dos Santos, Marlon Dos Reis Matsumoto, Ândrea Ribeiro-Dos-Santos, Gilderlanio Santana de Araújo.
Abstract
ClinVar is a web platform that stores ∼789,000 genetic associations with complex diseases. A subset of these cataloged genetic associations has challenged clinicians and geneticists, often leading to conflicting interpretations or uncertain clinical significance. In this study, we addressed the (re)classification of genetic variants with AmazonForest, a random-forest-based pathogenicity metaprediction model that combines functional impact data from eight prediction tools. We also evaluated representation learning algorithms, such as autoencoders, in search of a better strategy. All metaprediction models were trained with ClinVar data, and genetic variants were annotated with eight functional impact predictors cataloged with SnpEff/SnpSift. AmazonForest implements the best random forest model with a one-hot data-encoding strategy, which shows an Area Under the ROC Curve of ≥0.93. AmazonForest was employed for pathogenicity prediction of a set of ∼101,000 genetic variants of uncertain significance or with conflict of interpretation. Our findings revealed ∼24,000 variants with high pathogenic probability (RFprob≥0.9). In addition, we show results for Alzheimer's Disease as a demonstration of its application in the clinical interpretation of genetic variants in complex diseases. Lastly, AmazonForest is available as a web tool and as an R object that can be loaded to perform pathogenicity predictions.
Keywords: clinical impact; encoding data; functional impact; genetic variants; metaprediction; random forest; representation learning
Year: 2022 PMID: 35453737 PMCID: PMC9024711 DOI: 10.3390/biology11040538
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
Distribution of genetic variants by functional impact in the original ClinVar dataset. The training and test datasets are composed of variants with biological annotations for the eight functional predictors described in Section 2.2.
| Clinical significance | ClinVar (original) | Training and test | Prediction |
|---|---|---|---|
| Benign | 266,145 | 18,891 | - |
| Pathogenic | 130,739 | 16,471 | - |
| With conflict of interpretation | 42,609 | - | 7,193 |
| With uncertain significance | 349,926 | - | 93,612 |
Figure 1. Distribution of variants by functional impact prediction for the eight predictors described in Section 2.2. Each functional predictor provides its own classification scheme: deleterious (D) and tolerated (T) for FATHMM; neutral (N) or unknown (U) for LRT; high (H), medium (M), low (L), or neutral for MutationAssessor; disease-causing automatic prediction (A), disease-causing (D), probably harmless automatic prediction (N), and known to be harmless (P) for MutationTaster; deleterious (D) and neutral (N) for PROVEAN; probably damaging (D), possibly damaging (P), and benign for Polyphen; and, finally, deleterious (D) and tolerated (T) for SIFT.
Figure 2. Fine-tuning analysis of Random Forest models. The Random Forest models were trained on label-encoded and one-hot-encoded data, and on representations learned by multiple correspondence analysis and by neural-network autoencoders. (A) Random Forest shows high AUC values when the data are one-hot encoded; (B) AUC results for Random Forest models trained on representations learned by multiple correspondence analysis; (C) AUC results for Random Forest models trained on autoencoded data.
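As a rough illustration of the two direct encodings compared in Figure 2, the sketch below turns the categorical outputs of three predictors into label-encoded and one-hot-encoded feature vectors. It is a minimal sketch, not the authors' pipeline: the category codes are taken from the Figure 1 caption, while the function names, the choice of three predictors, and the example variant are assumptions made for illustration.

```python
from typing import Dict, List

# Category codes as listed in the Figure 1 caption (three of the predictors).
CATEGORIES: Dict[str, List[str]] = {
    "SIFT": ["D", "T"],
    "PROVEAN": ["D", "N"],
    "MutationTaster": ["A", "D", "N", "P"],
}


def label_encode(variant: Dict[str, str]) -> List[int]:
    """Map each predictor's category to its integer index (one feature per predictor)."""
    return [CATEGORIES[p].index(variant[p]) for p in CATEGORIES]


def one_hot_encode(variant: Dict[str, str]) -> List[int]:
    """Expand each predictor into one 0/1 indicator per possible category."""
    vector: List[int] = []
    for predictor, categories in CATEGORIES.items():
        vector.extend(1 if variant[predictor] == c else 0 for c in categories)
    return vector


# Hypothetical variant annotation, for illustration only.
variant = {"SIFT": "D", "PROVEAN": "N", "MutationTaster": "A"}
label_vector = label_encode(variant)
one_hot_vector = one_hot_encode(variant)
```

Label encoding yields one integer per predictor and so imposes an arbitrary order on the categories; one-hot encoding avoids that order at the cost of a wider feature vector (here 2 + 2 + 4 = 8 columns).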
Figure 3. On the left, distribution of variants with conflict of interpretation (CI) and of uncertain significance (VUS) classified as benign or pathogenic, with the prediction probability given by AmazonForest. On the right, distribution of variants for Alzheimer’s Disease-related genes.
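The high-confidence subset mentioned in the abstract (RFprob≥0.9) amounts to a simple filter on the predicted pathogenic probability. The following is a minimal sketch under stated assumptions: the variant identifiers and probabilities are made-up placeholders, and the function name is hypothetical; only the 0.9 cut-off comes from the source.

```python
from typing import List, Tuple

# Cut-off quoted in the abstract for "high pathogenic probability".
HIGH_CONFIDENCE = 0.9


def high_confidence_pathogenic(
    predictions: List[Tuple[str, float]],
    threshold: float = HIGH_CONFIDENCE,
) -> List[str]:
    """Return the identifiers whose pathogenic probability meets the threshold."""
    return [variant_id for variant_id, prob in predictions if prob >= threshold]


# Placeholder (identifier, pathogenic probability) pairs, not real results.
example_predictions = [
    ("var_a", 0.97),
    ("var_b", 0.90),
    ("var_c", 0.62),
    ("var_d", 0.08),
]
selected = high_confidence_pathogenic(example_predictions)
```

The comparison is inclusive, so a variant scored at exactly 0.9 is kept, matching the "≥" in the abstract.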
AmazonForest prediction results for reclassification of genetic variants in genes associated with Alzheimer’s disease.
| Chr | Position | Gene | Transcript | Variant | dbSNP ID | ClinVar | AmazonForest |
|---|---|---|---|---|---|---|---|
| 21 | 26000066 | APP | NM_000484.4 | c.982C>T (p.Arg328Trp) | - | VUS | Pathogenic |
| 21 | 26090000 | APP | NM_000484.4 | c.298C>T (p.Arg100Trp) | rs200347552 | VUS | Pathogenic |
| 17 | 58278000 | MPO | NM_000250.2 | c.1031G>A (p.Gly344Asp) | - | VUS | Pathogenic |
| 14 | 73173702 | PSEN1 | NM_000021.4 | c.475T>C (p.Tyr159His) | - | VUS | Pathogenic |
| 15 | 58665141 | ADAM10 | NM_001110.4 | c.541A>G (p.Arg181Gly) | rs145518263 | VUS | Benign |
| 15 | 58665172 | ADAM10 | NM_001110.4 | c.510G>C (p.Gln170His) | rs61751103 | VUS | Benign |
| 21 | 25997360 | APP | NM_000484.4 | c.1090C>T (p.Leu364Phe) | rs749453173 | VUS | Benign |
| 21 | 25997413 | APP | NM_000484.4 | c.1037C>A (p.Ser346Tyr) | - | VUS | Benign |
| 21 | 26000018 | APP | NM_000484.4 | c.1030G>A (p.Ala344Thr) | rs201045185 | VUS | Benign |
| 21 | 26000167 | APP | NM_000484.4 | c.881A>G (p.Gln294Arg) | - | VUS | Benign |
| 21 | 26021902 | APP | NM_000484.4 | c.803G>A (p.Arg268Lys) | rs1601237753 | VUS | Benign |
| 21 | 26021954 | APP | NM_000484.4 | c.751G>A (p.Gly251Ser) | - | VUS | Benign |
| 21 | 26021978 | APP | NM_000484.4 | c.727G>A (p.Asp243Asn) | - | VUS | Benign |
| 21 | 26022001 | APP | NM_000484.4 | c.704C>T (p.Ala235Val) | - | CI | Benign |
| 21 | 26022031 | APP | NM_000484.4 | c.674T>C (p.Val225Ala) | rs746313873 | VUS | Benign |
| 21 | 26051060 | APP | NM_000484.4 | c.602C>T (p.Ala201Val) | rs149995579 | VUS | Benign |
| 21 | 26051088 | APP | NM_000484.4 | c.574G>A (p.Glu192Lys) | - | VUS | Benign |
| 21 | 26170574 | APP | NM_000484.4 | c.47G>A (p.Arg16Gln) | - | VUS | Benign |