Mattia Bosio, Oliver Drechsel, Rubayte Rahman, Francesc Muyas, Raquel Rabionet, Daniela Bezdan, Laura Domenech Salgado, Hyun Hor, Jean-Jacques Schott, Francina Munell, Roger Colobran, Alfons Macaya, Xavier Estivill, Stephan Ossowski.
Abstract
Mendelian diseases have proven to be an efficient model for connecting genotypes to phenotypes and for elucidating the function of genes. Whole-exome sequencing (WES) accelerated the study of rare Mendelian diseases in families, allowing rare causal mutations in genic regions to be pinpointed directly, without the need for linkage analysis. However, the low diagnostic rates of 20-30% reported for multiple WES disease studies point to the need for improved variant pathogenicity classification and causal variant prioritization methods. Here, we present the exome Disease Variant Analysis (eDiVA; http://ediva.crg.eu), an automated computational framework for identification of causal genetic variants (coding/splicing single-nucleotide variants and small insertions and deletions) for rare diseases using WES of families or parent-child trios. eDiVA combines next-generation sequencing data analysis, comprehensive functional annotation, and causal variant prioritization optimized for familial genetic disease studies. eDiVA features a machine learning-based variant pathogenicity predictor combining various genomic and evolutionary signatures. Clinical information, such as disease phenotype or mode of inheritance, is incorporated to improve the precision of the prioritization algorithm. Benchmarking against state-of-the-art competitors demonstrates that eDiVA consistently performs as well as or better than existing approaches in terms of detection rate and precision. Moreover, we applied eDiVA to several familial disease cases to demonstrate its clinical applicability.
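To illustrate how mode of inheritance can narrow candidates in a parent-child trio, the following is a minimal sketch and not eDiVA's actual implementation. It assumes genotypes are encoded as alternate-allele counts (0, 1, or 2) and uses textbook segregation rules; the names TrioVariant, is_recessive_homozygous, is_dominant_de_novo, and compound_heterozygous, as well as the variant and gene identifiers, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class TrioVariant(NamedTuple):
    """One variant with alternate-allele counts (0, 1, 2) per trio member."""
    variant_id: str  # hypothetical identifier, for illustration only
    gene: str
    child: int
    mother: int
    father: int


def is_recessive_homozygous(v: TrioVariant) -> bool:
    # Affected child is homozygous; both unaffected parents are carriers.
    return v.child == 2 and v.mother == 1 and v.father == 1


def is_dominant_de_novo(v: TrioVariant) -> bool:
    # Child is heterozygous; neither parent carries the allele.
    return v.child == 1 and v.mother == 0 and v.father == 0


def compound_heterozygous(variants: List[TrioVariant]) -> Dict[str, List[str]]:
    """Return genes in which the child carries at least one heterozygous
    variant inherited only from the mother and one only from the father."""
    by_gene = defaultdict(lambda: {"maternal": [], "paternal": []})
    for v in variants:
        if v.child != 1:
            continue
        if v.mother == 1 and v.father == 0:
            by_gene[v.gene]["maternal"].append(v.variant_id)
        elif v.father == 1 and v.mother == 0:
            by_gene[v.gene]["paternal"].append(v.variant_id)
    return {
        gene: hits["maternal"] + hits["paternal"]
        for gene, hits in by_gene.items()
        if hits["maternal"] and hits["paternal"]
    }


trio = [
    TrioVariant("v1", "GENE_A", child=2, mother=1, father=1),
    TrioVariant("v2", "GENE_B", child=1, mother=0, father=0),
    TrioVariant("v3", "GENE_C", child=1, mother=1, father=0),
    TrioVariant("v4", "GENE_C", child=1, mother=0, father=1),
    TrioVariant("v5", "GENE_D", child=1, mother=1, father=0),
]

recessive = [v.variant_id for v in trio if is_recessive_homozygous(v)]
de_novo = [v.variant_id for v in trio if is_dominant_de_novo(v)]
comp_het = compound_heterozygous(trio)
```

In this toy trio, GENE_D is not reported as compound heterozygous because only a maternally inherited variant is present. A real pipeline would also need to handle genotype quality, missing calls, and affected parents, none of which is modelled here.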
Keywords: NGS diagnostics; disease variant prioritization; machine learning; rare genetic disease; whole-exome sequencing
Year: 2019 PMID: 31026367 PMCID: PMC6767450 DOI: 10.1002/humu.23772
Source DB: PubMed Journal: Hum Mutat ISSN: 1059-7794 Impact factor: 4.878
Figure 1. eDiVA‐Score random forest model. (a) Estimated importance of features used in the model (extracted with the varImp command). (b) Distribution of values for the top nine features used in the model, comparing ClinVar pathogenic against ClinVar nonpathogenic variants. AF: allele frequency; eDiVA: exome Disease Variant Analysis
Figure 2. Benchmarking of the pathogenicity classifiers eDiVA‐Score, CADD, Eigen, Revel, and M‐CAP using ROC for (a) set of 10,494 ClinVar pathogenic variants (TP) and 3,887 ClinVar “benign” variants (TN); (b) set of 16,694 ClinVar pathogenic variants (TP) and 19,888 ClinVar “benign” variants (TN), setting missing values to benign; (c) subset of rare variants (AF < 1%) from set b; (d) set of 63,712 variants from HGMD (TP) and 100,000 from GnomAD (TN) for which values from all tools are available; (e) set of 96,569 variants from HGMD (TP) and 100,000 from GnomAD (TN), setting missing values to benign; (f) subset of rare variants (AF < 1%) from set e; (g) set of 63,712 HGMD variants (“DM” and “DM?”) as TP, and 1,892 HGMD variants (other categories) as TN for which values from all tools are available; (h) set of 96,569 variants from HGMD (“DM” and “DM?”) as TP, and 7,376 HGMD variants (other categories) as TN, setting missing values to benign; and (i) subset of rare variants (AF < 1%) from set h. AF: allele frequency; eDiVA: exome Disease Variant Analysis; M‐CAP: Mendelian clinically applicable pathogenicity; ROC: receiver operating characteristic; TN: true negative; TP: true positive
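The two evaluation modes named in this caption (restricting to variants scored by all tools versus "setting missing values to benign") can be sketched as follows. This is an illustrative example on made-up scores, not the authors' benchmarking code. It assumes that "benign" imputation means assigning the lowest possible score (0.0 here) and computes the area under the ROC curve through its rank interpretation; the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def impute_missing_as_benign(
    scores: Sequence[Optional[float]], benign_value: float = 0.0
) -> List[float]:
    """Replace missing predictions with a score meaning 'benign' (assumed 0.0)."""
    return [benign_value if s is None else s for s in scores]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a random pathogenic
    variant (label 1) outscores a random benign one (label 0); ties count 0.5."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = pathogenic (TP set), 0 = benign (TN set); None = no prediction.
labels = [1, 1, 1, 0, 0, 0]
raw_scores = [0.9, None, 0.7, 0.2, None, 0.4]

# Mode 1: keep only variants for which a score is available.
complete = [(l, s) for l, s in zip(labels, raw_scores) if s is not None]
auc_complete = roc_auc([l for l, _ in complete], [s for _, s in complete])

# Mode 2: keep all variants, treating missing predictions as benign.
auc_imputed = roc_auc(labels, impute_missing_as_benign(raw_scores))
```

On this toy input the imputed AUC is lower than the complete-case AUC, because a pathogenic variant without a prediction is counted as a miss. The pairwise loop is quadratic and only suitable for small examples.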
Figure 3. Receiver operating characteristic curves comparing pathogenicity classifiers on five independent data sets (and the combined set) composed of pathogenic and neutral variants. Revel, M‐CAP, and eDiVA show a similarly strong performance, with the exception of the PredictSNP and Varibench sets, on which Revel and M‐CAP outperform eDiVA‐Score. eDiVA: exome Disease Variant Analysis; M‐CAP: Mendelian clinically applicable pathogenicity; SNP: single‐nucleotide polymorphism
Figure 4. Benchmark of the causal variant prioritization tools eDiVA, Exomiser, Phen-Gen, and PhenoDB. (a) Violin plots showing the rank of disease‐causing variants within the reported candidate lists for the three tested inheritance types: “recessive homozygous”, “compound heterozygous”, and “dominant de novo”; (b) Recall values for 6,811 semisynthetic trio cases, representing the fraction of identified causal variants (i.e., “solved cases”); (c) Average number of false positives reported per case as a proxy for precision. eDiVA was tested in two configurations: with HPO‐based gene prioritization (eDiVA_HPO) and with the default configuration, which does not use HPO terms (eDiVA). Adding HPO filtering reduces false positives at the cost of slightly reduced recall. HPO: Human Phenotype Ontology
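The three quantities in this caption (rank of the causal variant, recall, and false positives per case) can be computed from ranked candidate lists as sketched below. This is a minimal illustration on invented cases, not the authors' evaluation code. It assumes each case has exactly one known causal variant and that every reported candidate other than that variant counts as a false positive; the Case class and the variant identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    """One semisynthetic case: the spiked-in causal variant and a tool's
    ranked candidate list (best candidate first)."""
    causal: str
    candidates: List[str]


def causal_rank(case: Case) -> Optional[int]:
    """1-based rank of the causal variant, or None if it was not reported."""
    try:
        return case.candidates.index(case.causal) + 1
    except ValueError:
        return None


def recall(cases: List[Case]) -> float:
    """Fraction of cases whose causal variant appears in the candidate list."""
    solved = sum(1 for c in cases if causal_rank(c) is not None)
    return solved / len(cases)


def mean_false_positives(cases: List[Case]) -> float:
    """Average number of reported candidates that are not the causal variant."""
    total = sum(sum(1 for v in c.candidates if v != c.causal) for c in cases)
    return total / len(cases)


cases = [
    Case("var_a", ["var_a", "var_x"]),           # solved, rank 1, 1 FP
    Case("var_b", ["var_y", "var_b", "var_z"]),  # solved, rank 2, 2 FPs
    Case("var_c", ["var_w"]),                    # unsolved, 1 FP
    Case("var_d", []),                           # unsolved, nothing reported
]

ranks = [causal_rank(c) for c in cases]
case_recall = recall(cases)
avg_fp = mean_false_positives(cases)
```

A stricter filter, such as the HPO-based configuration described in the caption, would be expected to shorten the candidate lists, which lowers the false-positive average but can remove the causal variant from some lists and so reduce recall.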