Denise Anderson, Timo Lassmann.
Abstract
Next-generation sequencing is a standard tool used in clinical diagnostics. In Mendelian diseases, the challenge is to discover the single etiological variant among thousands of benign or functionally unrelated variants. After calling variants from aligned sequencing reads, variant prioritisation tools are used to examine the conservation or potential functional consequences of variants. We hypothesised that the performance of variant prioritisation tools may vary by disease phenotype. To test this, we created benchmark data sets for variants associated with different disease phenotypes. We found that the performance of the 24 tested tools is highly variable and differs by disease phenotype. The task of identifying a causative variant amongst a large number of benign variants is challenging for all tools, highlighting the need for further development in the field. Based on our observations, we recommend use of the five top performers found in this study (FATHMM, M-CAP, MetaLR, MetaSVM and VEST3). In addition, we provide tables indicating which analytical approach works best in which disease context. Variant prioritisation tools are best suited to investigating variants associated with well-studied genetic diseases, as these variants are more readily available during algorithm development than variants associated with rare diseases. We anticipate that further development of disease-focussed tools will lead to significant improvements.
Year: 2018 PMID: 29423277 PMCID: PMC5799157 DOI: 10.1038/s41525-018-0044-9
Source DB: PubMed Journal: NPJ Genom Med ISSN: 2056-7944 Impact factor: 8.617
Distribution of the number of genes returned by Phenolyzer
| Phenolyzer gene list type | 1–10 | 11–50 | 51–250 | 251–500 | 501–1000 | >1000 |
|---|---|---|---|---|---|---|
| Gene panels threshold = 0 | 3637 | 1450 | 1041 | 259 | 152 | 88 |
| Gene panels threshold = 0.25 | 4807 | 1551 | 268 | 1 | 0 | 0 |
| Gene panels threshold = 0.5 | 6108 | 515 | 4 | 0 | 0 | 0 |
| Extended gene panels threshold = 0 | 214 | 63 | 204 | 182 | 237 | 5727 |
| Extended gene panels threshold = 0.25 | 4696 | 1344 | 388 | 73 | 68 | 58 |
| Extended gene panels threshold = 0.5 | 5874 | 683 | 70 | 0 | 0 | 0 |
Distribution of the number of ClinVar pathogenic variants returned by dbNSFP for the 11,722 HPO ‘Phenotypic abnormality’ terms
| Phenolyzer gene list type | 0 | 1–10 | 11–50 | 51–250 | 251–500 | 501–1000 | >1000 |
|---|---|---|---|---|---|---|---|
| Gene panels threshold = 0 | 5657 | 1219 | 1611 | 1680 | 617 | 445 | 493 |
| Gene panels threshold = 0.25 | 5848 | 1556 | 1974 | 1746 | 413 | 148 | 37 |
| Gene panels threshold = 0.5 | 6015 | 1957 | 2305 | 1354 | 85 | 6 | 0 |
| Extended gene panels threshold = 0 | 5217 | 100 | 105 | 101 | 107 | 194 | 5898 |
| Extended gene panels threshold = 0.25 | 5838 | 1536 | 1913 | 1637 | 385 | 189 | 224 |
| Extended gene panels threshold = 0.5 | 6006 | 1898 | 2227 | 1370 | 156 | 62 | 3 |
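The binning used in the two tables above can be illustrated with a short counting routine. This is a minimal sketch, not the authors' pipeline: the function names `bin_label` and `distribution`, the input layout (a mapping from term identifier to a count) and the example identifiers are all assumptions made for illustration. Only the bin boundaries are taken from the table headers.

```python
from collections import Counter

# Bin boundaries copied from the table column headers (inclusive ranges).
BINS = [
    (0, 0, "0"),
    (1, 10, "1–10"),
    (11, 50, "11–50"),
    (51, 250, "51–250"),
    (251, 500, "251–500"),
    (501, 1000, "501–1000"),
]
OVERFLOW = ">1000"


def bin_label(count):
    """Return the table column that a single count falls into."""
    if count < 0:
        raise ValueError("counts must be non-negative")
    for low, high, label in BINS:
        if low <= count <= high:
            return label
    return OVERFLOW


def distribution(counts_per_term):
    """Tally how many terms fall into each bin.

    counts_per_term: hypothetical mapping of term identifier -> number of
    genes (or variants) returned for that term.
    """
    tally = Counter(bin_label(c) for c in counts_per_term.values())
    labels = [label for _, _, label in BINS] + [OVERFLOW]
    # Report every bin, including empty ones, in table order.
    return {label: tally.get(label, 0) for label in labels}


if __name__ == "__main__":
    # Made-up counts for made-up term identifiers.
    example = {"term_a": 0, "term_b": 7, "term_c": 320, "term_d": 4500}
    for label, n_terms in distribution(example).items():
        print(label, n_terms)
```

Each row of the tables corresponds to one such tally, computed for a single gene list type and score threshold.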
Fig. 1 Heatmaps showing auROC (a) and auPRC (b) values for the 4026 HPO ‘Phenotypic abnormality’ terms when using Phenolyzer gene panels with no score threshold. Right-hand plots show the top level ontology (HP:0000118 ‘Phenotypic abnormality’) and broad child terms of ‘Phenotypic abnormality’. Left-hand plots show the remaining HPO terms not plotted in the right-hand plots. Colour coding of columns represents the score type for each variant prioritisation tool where black = conservation scores, red = ensemble scores, blue = functional prediction scores and yellow = general prediction scores. The heatmap colour scale of the auROC (a) values has been adjusted to highlight moderate to strong performance by only colour coding auROC values greater than or equal to 0.7.
Fig. 2 Boxplots showing the auPRC values across the top performing variant prioritisation tools for selected HPO ‘Phenotypic abnormality’ terms. The vertical red line indicates a strong performance value of 0.8.
Fig. 3 Heatmap showing auPRC for HPO ‘Phenotypic abnormality’ terms where top performing variant prioritisation tools differ by greater than 0.5. Colour coding of rows is by the parent HPO term. Row annotation includes term and [Number of ClinVar pathogenic variants (number of genes returned by Phenolyzer)].
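The two metrics reported in the figures, auROC and auPRC, can be computed for a single tool from a list of labelled variants. The sketch below is illustrative only and is not the authors' code. It assumes that labels are 1 for pathogenic and 0 for benign, and that a higher score means a more damaging prediction (scores that run the other way would need to be negated first). The area under the precision-recall curve is estimated here by average precision, which is one common estimator among several. The function names are hypothetical.

```python
def au_roc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen pathogenic variant scores
    higher than a randomly chosen benign one, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, used here as an estimate of auPRC.

    Variants are ranked by decreasing score; precision is recorded at the
    rank of each pathogenic variant and averaged. Tied scores keep their
    input order, which is a simplification.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one pathogenic variant")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


if __name__ == "__main__":
    # Made-up labels and scores for five variants.
    labels = [1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.1]
    print(au_roc(labels, scores), average_precision(labels, scores))
```

When pathogenic variants are heavily outnumbered by benign ones, as in the prioritisation task described in the abstract, auPRC is generally the more sensitive of the two metrics, because precision falls with every benign variant ranked above a pathogenic one.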