Imane Boudellioua, Rozaimi B Mahamad Razali, Maxat Kulmanov, Yasmeen Hashish, Vladimir B Bajic, Eva Goncalves-Serra, Nadia Schoenmakers, Georgios V Gkoutos, Paul N Schofield, Robert Hoehndorf.
Abstract
Discriminating the causative disease variant(s) for individuals with inherited or de novo mutations presents one of the main challenges faced by the clinical genetics community today. Computational approaches for variant prioritization include machine learning methods utilizing a large number of features, including molecular information, interaction networks, or phenotypes. Here, we demonstrate the PhenomeNET Variant Predictor (PVP) system that exploits semantic technologies and automated reasoning over genotype-phenotype relations to filter and prioritize variants in whole exome and whole genome sequencing datasets. We demonstrate the performance of PVP in identifying causative variants on a large number of synthetic whole exome and whole genome sequences, covering a wide range of diseases and syndromes. In a retrospective study, we further illustrate the application of PVP for the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism. We find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.
Year: 2017 PMID: 28414800 PMCID: PMC5411092 DOI: 10.1371/journal.pcbi.1005500
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Overview of how many of the 8,746 exonic causative variants were recovered at rank 1 and within the top 10 ranks by PVP and PVP-Human, and comparison to CADD, DANN, GWAVA, Exomiser, eXtasy, and Phevor.
Analysis was performed on WES data. If a tool did not provide a score for a causative variant, we excluded the variant from this table; consequently, the total number of samples analyzed differs between the methods and the percentages reported are based on the number of samples for which the causative variant was ranked.
| Tool | Top hit (exonic) | Top 10 (exonic) | Total (exonic) | Median (exonic) |
|---|---|---|---|---|
| CADD | 1,095 (15.15%) | 2,317 (32.05%) | 7,229 | 49 |
| DANN | 406 (6.06%) | 1,789 (26.69%) | 6,704 | 108 |
| GWAVA | 102 (1.41%) | 458 (6.32%) | 7,244 | 339 |
| eXtasy | 553 (14.85%) | 1,601 (42.99%) | 3,724 | 19 |
| Exomiser | 2,156 (24.65%) | 5,122 (58.56%) | 8,746 | 5 |
| Phevor | 1,679 (28.25%) | 3,845 (64.70%) | 5,943 | 4 |
| PVP-Model | 4,007 (45.82%) | 6,353 (72.64%) | 8,746 | 2 |
| PVP-Human | 6,928 (79.21%) | 7,691 (87.94%) | 8,746 | 1 |
| PVP | 6,892 (78.80%) | 7,828 (89.50%) | 8,746 | 1 |
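The counting rule described in the table note (unscored causative variants are excluded, and percentages use the number of ranked samples as the denominator) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `summarize_ranks`, the use of `None` for an unscored variant, and the reading of the "Median" column as the median rank of the causative variant are all assumptions, and the input ranks are made-up toy values.

```python
from statistics import median
from typing import Optional, Sequence


def summarize_ranks(ranks: Sequence[Optional[int]], k: int = 10) -> dict:
    """Summarize per-sample ranks of the causative variant.

    `None` marks a sample for which the tool returned no score (assumed
    encoding); such samples are dropped before any percentage is computed.
    """
    scored = [r for r in ranks if r is not None]
    if not scored:
        raise ValueError("no scored samples")
    total = len(scored)
    top1 = sum(1 for r in scored if r == 1)
    topk = sum(1 for r in scored if r <= k)
    return {
        "total": total,
        "top1": top1,
        "top1_pct": round(100 * top1 / total, 2),
        "topk": topk,
        "topk_pct": round(100 * topk / total, 2),
        "median_rank": median(scored),
    }


# Hypothetical toy input: eight samples, two of them unscored.
example = summarize_ranks([1, 3, None, 12, 1, 7, None, 250])
```

Because the denominator is the number of scored samples, a tool that scores only a subset of variants is evaluated on that subset alone, which is why the "Total" column differs between methods.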
Overview of the performance of PVP, CADD, DANN, GWAVA and Exomiser in prioritizing causative variants in WGS data.
We prioritize all variants in a VCF file resulting from WGS using the same models. Analysis is separated reflecting the performance of the various tools identifying exonic and non-exonic variants. For CADD, DANN, and GWAVA, we report only analysis results for which a prediction score is returned; consequently, total numbers are less than the total of 11,251 causative variants.
| Tool | Variant class | # top 1 hits | % top 1 hits | # top 10 hits | % top 10 hits | Total |
|---|---|---|---|---|---|---|
| PVP | Exonic | 6,500 | 74.32% | 7,595 | 86.84% | 8,746 |
| PVP | Non-exonic | 2,104 | 83.99% | 2,374 | 94.77% | 2,505 |
| PVP | Total | 8,604 | 76.47% | 9,969 | 88.61% | 11,251 |
| PVP-Model | Exonic | 1,012 | 11.57% | 1,992 | 22.78% | 8,746 |
| PVP-Model | Non-exonic | 435 | 17.37% | 703 | 28.06% | 2,505 |
| PVP-Model | Total | 1,447 | 12.86% | 2,695 | 23.95% | 11,251 |
| PVP-Human | Exonic | 6,611 | 75.59% | 7,620 | 87.13% | 8,746 |
| PVP-Human | Non-exonic | 2,156 | 86.07% | 2,368 | 94.53% | 2,505 |
| PVP-Human | Total | 8,767 | 77.92% | 9,988 | 88.77% | 11,251 |
| CADD | Exonic | 441 | 6.1% | 1,759 | 24.33% | 7,229 |
| CADD | Non-exonic | 118 | 4.77% | 599 | 24.2% | 2,475 |
| CADD | Total | 559 | 5.76% | 2,358 | 24.3% | 9,704 |
| DANN | Exonic | 325 | 4.85% | 1,287 | 19.2% | 6,704 |
| DANN | Non-exonic | 101 | 5.32% | 347 | 18.27% | 1,899 |
| DANN | Total | 426 | 4.95% | 1,634 | 18.99% | 8,603 |
| GWAVA | Exonic | 34 | 0.47% | 44 | 0.61% | 7,244 |
| GWAVA | Non-exonic | 9 | 0.42% | 22 | 1.04% | 2,121 |
| GWAVA | Total | 43 | 0.46% | 66 | 0.7% | 9,365 |
| Exomiser (Genomiser) | Exonic | 2,747 | 31.41% | 6,879 | 78.65% | 8,746 |
| Exomiser (Genomiser) | Non-exonic | 780 | 31.14% | 1,895 | 75.65% | 2,505 |
| Exomiser (Genomiser) | Total | 3,527 | 31.35% | 8,774 | 77.98% | 11,251 |
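The "Total" rows above pool the exonic and non-exonic counts rather than averaging the two percentages. A minimal sketch of that arithmetic, using the counts from the first block of rows in the table; the helper name `pooled_pct` and the dictionary keys are illustrative assumptions:

```python
def pooled_pct(hits_by_class: dict, totals_by_class: dict) -> tuple:
    """Pool hit counts and sample totals across variant classes,
    then compute one percentage over the pooled total."""
    hits = sum(hits_by_class.values())
    total = sum(totals_by_class.values())
    return hits, total, round(100 * hits / total, 2)


# Top-1 counts taken from the first block of rows in the table above.
hits, total, pct = pooled_pct(
    {"exonic": 6500, "non_exonic": 2104},
    {"exonic": 8746, "non_exonic": 2505},
)
```

Since the exonic set is about 3.5 times larger than the non-exonic set, the pooled percentage sits closer to the exonic value than a simple mean of the two percentages would.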
Fig 1. Performance of PVP in retrieving causative variants in whole exome sequences.
Results are compared against CADD, DANN, and GWAVA, and the phenotype-based tools Exomiser, Phevor and eXtasy.
Fig 2. Performance of PVP in identifying causative variants in whole genome sequences using human phenotypes (PVP-Human), model organism phenotypes (PVP-Model), and combined phenotypes (PVP), and comparison of PVP to CADD, DANN, GWAVA, and Genomiser.
Performance of PVP in variant prioritization in WGS data, separated by mode of inheritance of the disease.
| Tool | Coding: Dominant | Coding: Recessive | Coding: Others/Unknown | Noncoding: Dominant | Noncoding: Recessive | Noncoding: Others/Unknown |
|---|---|---|---|---|---|---|
| PVP | 4,006 (77.61%) | 2,005 (93.26%) | 881 (61.44%) | 1,178 (83.66%) | 684 (97.3%) | 310 (78.68%) |
| PVP-Model | 2,100 (40.68%) | 1,535 (71.40%) | 372 (25.94%) | 754 (53.55%) | 587 (83.50%) | 179 (45.43%) |
| PVP-Human | 4,027 (78.01%) | 1,993 (92.7%) | 908 (63.32%) | 1,197 (85.01%) | 686 (97.58%) | 321 (81.47%) |