Denise Anderson, Gareth Baynam, Jenefer M Blackwell, Timo Lassmann.
Abstract
Whole genome and exome sequencing is a standard tool for the diagnosis of patients suffering from rare and other genetic disorders. The interpretation of the tens of thousands of variants returned from such tests remains a major challenge. Here we focus on the problem of prioritising variants with respect to the observed disease phenotype. We hypothesise that linking patterns of gene expression across multiple tissues to the phenotypes will aid in discovering disease-causing variants. To test this, we construct classifiers that learn associations between tissue-specific gene expression and disease phenotypes. We find that using Genotype-Tissue Expression project (GTEx) expression data in conjunction with disease-agnostic variant prioritisation methods (CADD or MetaSVM) results in consistent improvements in classification accuracy. Our method represents a previously overlooked avenue for utilising existing expression data in clinical diagnostics, and also opens the door to the use of other functional genomic data sets in the same manner.
Year: 2019 PMID: 31754101 PMCID: PMC6872807 DOI: 10.1038/s41467-019-13345-5
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Rationale for use of tissue and cell-specific gene expression for prioritisation of variants associated with a brain disease phenotype. Pathogenic variants are coloured magenta and benign variants are coloured blue (first column). The second column shows in silico predictions of variant pathogenicity, where increasing magenta intensity indicates stronger probability of pathogenicity and increasing blue intensity indicates stronger probability of being benign. The remaining columns in order represent heart tissue, kidney tissue, lung tissue, brain tissue, red blood cells, neurons and T cells. The green colour scale represents gene expression, where increasing colour intensity indicates higher expression values.
Performance of VARPP versus use of variant prioritisation tools alone.
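| VARPP model | auPRC: terms improved, n (%)ᵃ | auPRC: mean difference (95% CI) | auPRC: t (df) | auPRC: P value | PP100: terms improved, n (%)ᵇ | PP100: mean difference (95% CI) | PP100: t (df) | PP100: P value |
|---|---|---|---|---|---|---|---|---|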
| GTEx expression + CADD | 1314 (70%) | 0.042 (0.038 to 0.046) | 21.6 (1878) | <0.001 | 1030 (55%) | 0.051 (0.044 to 0.057) | 15.0 (1878) | <0.001 |
| GTEx expression + MetaSVM | 721 (38%) | −0.005 (−0.001 to −0.009) | −2.7 (1878) | 0.006 | 1474 (78%) | 0.062 (0.058 to 0.067) | 25.7 (1878) | <0.001 |
| GTEx specificity + CADD | 1306 (70%) | 0.042 (0.038 to 0.046) | 21.4 (1878) | <0.001 | 1057 (56%) | 0.051 (0.045 to 0.057) | 16.7 (1878) | <0.001 |
| GTEx specificity + MetaSVM | 1251 (67%) | 0.031 (0.027 to 0.036) | 14.6 (1878) | <0.001 | 1645 (88%) | 0.096 (0.092 to 0.100) | 43.7 (1878) | <0.001 |
| FANTOM5 expression + CADD | 705 (38%) | −0.013 (−0.009 to −0.016) | −6.7 (1878) | <0.001 | 797 (42%) | 0.011 (0.005 to 0.017) | 3.7 (1878) | 0.0002 |
| FANTOM5 expression + MetaSVM | 189 (10%) | −0.071 (−0.067 to −0.075) | −34.8 (1878) | <0.001 | 1127 (60%) | 0.015 (0.010 to 0.020) | 6.4 (1878) | <0.001 |
| FANTOM5 specificity + CADD | 581 (31%) | −0.026 (−0.022 to −0.030) | −13.6 (1878) | <0.001 | 637 (34%) | −0.015 (−0.009 to −0.021) | −4.7 (1878) | <0.001 |
| FANTOM5 specificity + MetaSVM | 264 (14%) | −0.058 (−0.053 to −0.063) | −24.1 (1878) | <0.001 | 1286 (68%) | 0.027 (0.021 to 0.032) | 9.7 (1878) | <0.001 |
a Number and percentage of HPO terms where VARPP performed better for the auPRC than the variant prioritisation tool alone. auPRC mean difference, 95% confidence intervals (CI), t statistic, degrees of freedom (df) and P value are from a Student’s paired t test.
b Number and percentage of HPO terms where VARPP performed better for the PP100 than the variant prioritisation tool alone. PP100 mean difference, 95% confidence intervals (CI), t statistic, degrees of freedom (df) and P value are from a Student’s paired t test.
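The quantities reported in each table row (count and percentage of terms improved, mean paired difference, t statistic and degrees of freedom) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `paired_summary` and its inputs are hypothetical, and the confidence interval uses a normal approximation to the t critical value, which is an assumption that is reasonable only for large df (such as the 1878 reported here).

```python
import math
from statistics import NormalDist, mean, stdev


def paired_summary(varpp_scores, baseline_scores, confidence=0.95):
    """Summarise paired per-term scores (hypothetical helper, not from the paper).

    Each position holds the score (e.g. auPRC or PP100) for the same HPO term
    under VARPP and under the variant prioritisation tool alone.
    """
    if len(varpp_scores) != len(baseline_scores):
        raise ValueError("inputs must be paired and of equal length")
    n = len(varpp_scores)
    if n < 2:
        raise ValueError("need at least two pairs")

    # Per-term difference: positive means VARPP scored higher for that term.
    diffs = [v - b for v, b in zip(varpp_scores, baseline_scores)]
    n_better = sum(d > 0 for d in diffs)

    mean_diff = mean(diffs)
    std_err = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    if std_err == 0:
        raise ValueError("all differences are identical; t is undefined")

    # Paired t statistic with n - 1 degrees of freedom.
    t_stat = mean_diff / std_err
    df = n - 1

    # ASSUMPTION: normal approximation to the t critical value (large df only).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (mean_diff - z * std_err, mean_diff + z * std_err)

    return {
        "n_better": n_better,
        "pct_better": 100.0 * n_better / n,
        "mean_diff": mean_diff,
        "ci": ci,
        "t": t_stat,
        "df": df,
    }


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    varpp = [0.30, 0.25, 0.40, 0.10]
    baseline = [0.20, 0.25, 0.35, 0.15]
    print(paired_summary(varpp, baseline))
```

With 1879 paired terms, as in the table, the degrees of freedom would be 1878, matching the value shown in the t (df) columns.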
Fig. 2 Performance of VARPP classifiers across 1879 HPO Phenotypic Abnormality terms. a Agreement scatter plot comparing the PP100 for VARPP including CADD + GTEx specificity (y axis) versus the PP100 for CADD scores alone (x axis). b Agreement scatter plot comparing the PP100 for VARPP including MetaSVM + GTEx specificity (y axis) versus the PP100 for MetaSVM scores alone (x axis). The red line is the line of identity.
Fig. 3 Performance of VARPP classifiers by disease group. a Agreement scatter plots comparing the PP100 for VARPP including CADD + GTEx specificity (y axis) versus the PP100 for CADD scores alone (x axis). b Agreement scatter plots comparing the PP100 for VARPP including MetaSVM + GTEx specificity (y axis) versus the PP100 for MetaSVM scores alone (x axis). The red line is the line of identity.
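The figures compare PP100 values, but this record does not define the metric. As a hedged illustration, the sketch below assumes PP100 denotes the proportion of pathogenic variants among the 100 highest-scoring predictions. The function name `proportion_pathogenic_at_k` and the toy inputs are hypothetical and are not taken from the paper.

```python
def proportion_pathogenic_at_k(scores, labels, k=100):
    """Proportion of pathogenic variants among the k top-scoring variants.

    ASSUMPTION: this is one plausible reading of "PP100" (k = 100); the
    source record does not spell out the definition.

    scores: predicted pathogenicity scores, higher meaning more likely pathogenic.
    labels: truthy for pathogenic, falsy for benign, in the same order as scores.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if k <= 0:
        raise ValueError("k must be positive")
    if not scores:
        raise ValueError("no variants supplied")

    # Rank variants from highest to lowest score and keep the top k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]

    # If fewer than k variants exist, the denominator is the number available.
    return sum(1 for _, is_pathogenic in top if is_pathogenic) / len(top)


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    toy_scores = [0.9, 0.8, 0.7, 0.1]
    toy_labels = [1, 0, 1, 1]
    print(proportion_pathogenic_at_k(toy_scores, toy_labels, k=2))
```

Under this reading, a point above the line of identity in the scatter plots would correspond to an HPO term for which the combined classifier places more pathogenic variants in its top 100 than the variant prioritisation tool alone.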