Guido Zampieri, Dinh Van Tran, Michele Donini, Nicolò Navarin, Fabio Aiolli, Alessandro Sperduti, Giorgio Valle.
Abstract
BACKGROUND: The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability.
Keywords: Gene prioritization; Genetic disease; Kernel methods; Semi-supervised learning
Year: 2018 PMID: 29370760 PMCID: PMC5785908 DOI: 10.1186/s12859-018-2025-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The performance of different techniques in the experimental setting of Chen et al. [18] expressed in terms of AUC
| Method | AUC | p-value |
|---|---|---|
| Scuba | 0.876 | - |
| F3PC | 0.830 | 1.39 × 10⁻⁴ * |
| MRF | 0.731 | < 10⁻⁶ * |
| DIR | 0.716 | < 10⁻⁶ * |
| GeneWanderer | 0.711 | < 10⁻⁶ * |
Except for our proposed method Scuba, these results were taken from that work. The p-values indicate significance of the pairwise AUC differences with respect to Scuba AUC [36]. Asterisks indicate significance of the test (p-value < 0.05)
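As a reading aid for the AUC column, the following minimal sketch computes an AUC as the fraction of (positive, negative) score pairs in which the positive gene scores higher, counting ties as one half. The function name and the scores are hypothetical, and the sketch reproduces neither the evaluation protocol of Chen et al. [18] nor the significance test of [36].

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical prioritization scores (higher means more likely disease-related).
pos = [0.9, 0.8, 0.4]       # known disease genes
neg = [0.7, 0.3, 0.2, 0.1]  # other candidates
auc = auc_from_scores(pos, neg)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of genes; for genome-wide rankings a rank-based formulation or an established statistics library would be the more practical choice.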
Performances of Scuba, MKL1class and ProDiGe in the unbiased setting of Börnigen et al. [20]
| Tool/Method | Rank median | Rank average | TPR in top 5% (%) | TPR in top 10% (%) | TPR in top 30% (%) | AUC | Rank difference p-value |
|---|---|---|---|---|---|---|---|
| Genome-wide prioritization methods | | | | | | | |
| Scuba | | | | | | | - |
| MKL1class | 13.30 | 23.42 ± 23.23 | 21.4 | | 69.0 | 0.77 | 2.5 × 10⁻² * |
| ProDiGe | 11.73 | 24.45 ± 27.33 | 31.0 | 45.2 | 71.4 | 0.76 | 3.0 × 10⁻⁷ * |
| Candidate set-based prioritization methods | | | | | | | |
| Scuba | | | | | | | - |
| MKL1class | 15.07 | 25.63 ± 24.73 | 23.8 | 40.5 | 61.9 | 0.76 | 9.7 × 10⁻² |
| ProDiGe | 14.41 | 26.39 ± 29.09 | 26.2 | 40.5 | 71.4 | 0.75 | 2.7 × 10⁻³ * |
Values refer to predictions on all 42 gene-disease associations. Rank difference p-values were obtained using Wilcoxon signed-rank tests comparing the rank differences of Scuba/MKL1class and Scuba/ProDiGe separately. Asterisks indicate significance of the tests at a threshold of 0.05.
Italics indicates the top ranking score of each column
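The rank-based columns of the table can be illustrated with a minimal sketch that summarizes a list of normalized ranks (in percent, lower is better). The function name, the input values, the inclusive cutoff and the use of the sample standard deviation for the "±" term are all assumptions made for illustration, not details confirmed by the source.

```python
import statistics


def summarize_ranks(norm_ranks, cutoffs=(5, 10, 30)):
    """Summarize normalized ranks given in percent (lower is better)."""
    n = len(norm_ranks)
    summary = {
        "median": statistics.median(norm_ranks),
        "mean": statistics.mean(norm_ranks),
        "std": statistics.stdev(norm_ranks),  # sample standard deviation (assumed)
    }
    for c in cutoffs:
        # TPR in top c%: share of test genes ranked within the top c percent.
        hits = sum(1 for r in norm_ranks if r <= c)
        summary[f"tpr_top_{c}"] = 100.0 * hits / n
    return summary


# Hypothetical normalized ranks of eight held-out disease genes.
ranks = [1.0, 4.0, 8.0, 12.0, 25.0, 40.0, 60.0, 90.0]
summary = summarize_ranks(ranks)
for key, value in summary.items():
    print(key, round(value, 2))
```

Each test gene contributes exactly one normalized rank, so with 42 associations every TPR value is a multiple of 1/42, which is consistent with entries such as 21.4 and 45.2 in the table.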
Fig. 1 Comparison of normalized ranks predicted by Scuba and competing kernel methods. Distributions of normalized ranks predicted by Scuba, MKL1class and ProDiGe for test genes in (a) genome-wide prioritizations in the unbiased evaluation of Table 2; (b) candidate set-based prioritizations in the unbiased evaluation of Table 2; (c) genome-wide prioritizations in the expanded unbiased evaluation of Table 4. In all cases, each point represents a test gene and lower values on the axes indicate better predictions. Genes lying on the diagonal have the same rank according to both methods compared in a plot. The further a gene lies above (below) the diagonal, the better it was ranked by Scuba (MKL1class/ProDiGe) compared to MKL1class/ProDiGe (Scuba). Each plot shows the Pearson correlation coefficient r between the rank distributions of the test genes and its associated p-value.
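The quantities described in the caption can be sketched as follows: a Pearson correlation coefficient between two lists of normalized ranks, and a count of how many genes each method ranks better (the off-diagonal points). The rank values and function names are hypothetical, and the p-value attached to r in the figure is not computed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def compare_ranks(ranks_a, ranks_b):
    """Count genes ranked better (lower) by method A, by method B, and ties."""
    a_better = sum(1 for a, b in zip(ranks_a, ranks_b) if a < b)
    b_better = sum(1 for a, b in zip(ranks_a, ranks_b) if b < a)
    ties = len(ranks_a) - a_better - b_better  # points on the diagonal
    return a_better, b_better, ties


# Hypothetical normalized ranks (percent) of the same five test genes.
ranks_a = [2.0, 10.0, 15.0, 40.0, 70.0]
ranks_b = [5.0, 10.0, 30.0, 35.0, 90.0]
r = pearson_r(ranks_a, ranks_b)
counts = compare_ranks(ranks_a, ranks_b)
print(round(r, 3), counts)
```

A high r with many off-diagonal points would suggest that two methods agree on which genes are hard to rank while still differing in how well they rank individual genes.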
Performances of Scuba, MKL1class and ProDiGe in the expanded unbiased setting involving seven multifactorial diseases
| Method | Rank median | Rank average | TPR in top 1% (%) | TPR in top 5% (%) | TPR in top 10% (%) | TPR in top 30% (%) | AUC | Rank difference p-value |
|---|---|---|---|---|---|---|---|---|
| Genome-wide prioritizations | | | | | | | | |
| Scuba | 8.13 | | | 41.7 | | | | - |
| MKL1class | 14.28 | 25.79 ± 26.96 | 2.1 | 27.1 | 45.8 | 66.7 | 0.74 | 1.2 × 10⁻⁵ * |
| ProDiGe | | 18.40 ± 23.77 | | | 54.2 | | 0.82 | 9.5 × 10⁻² |
Values refer to predictions on 48 gene-disease associations. Rank difference p-values were obtained using Wilcoxon signed-rank tests comparing the rank differences of Scuba/MKL1class and Scuba/ProDiGe separately. Asterisks indicate significance of the tests at a threshold of 0.05.
Italics indicates the top ranking score of each column
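The rank difference p-values come from Wilcoxon signed-rank tests on paired ranks. The sketch below shows one way such a test could be computed with the standard library only, using the large-sample normal approximation without tie or continuity corrections. The variant used by the authors is not stated in this record, so the function and the example ranks are illustrative assumptions; for small samples an exact test from a statistics package would be preferable.

```python
import math


def wilcoxon_signed_rank(ranks_a, ranks_b):
    """Two-sided Wilcoxon signed-rank test via the normal approximation."""
    # Paired differences; zero differences are discarded.
    diffs = [a - b for a, b in zip(ranks_a, ranks_b) if a != b]
    n = len(diffs)
    # Rank the absolute differences, assigning average ranks to ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, w_minus, p


# Hypothetical normalized ranks of the same eight genes under two methods.
method_a = [3, 10, 8, 20, 15, 30, 12, 5]
method_b = [7, 18, 9, 35, 21, 50, 25, 3]
w_plus, w_minus, p = wilcoxon_signed_rank(method_a, method_b)
print(w_plus, w_minus, round(p, 4))
```

Because the test is paired, it uses the per-gene differences between two methods rather than their separate rank distributions, which is why two methods with similar median ranks can still differ significantly.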
Performances of Scuba and of some gene prioritization web tools in the unbiased setting of Börnigen et al. [20]
| Tool/Method | Response rate (%) | Rank median | Rank average | TPR in top 5% (%) | TPR in top 10% (%) | TPR in top 30% (%) | AUC |
|---|---|---|---|---|---|---|---|
| Genome-wide prioritization methods | | | | | | | |
| Scuba | 100 | | | | | | |
| Candid | 100 | 18.10 | 27.35 ± 24.62 | 21.4 | 33.3 | 64.3 | 0.73 |
| Endeavour | 100 | 15.49 | 21.47 ± 22.37 | 28.6 | 38.1 | 71.4 | 0.79 |
| Pinta | 100 | 19.03 | 23.52 ± 23.58 | 26.2 | 31.0 | 71.4 | 0.77 |
| Candidate set-based prioritization methods | | | | | | | |
| Scuba | 100 | 12.95 | 23.32 ± 25.46 | 28.6 | 45.2 | 73.8 | 0.78 |
| Suspects | 88.9ᵃ | 12.77ᵃ | 24.64 ± 26.42ᵃ | 33.3ᵃ | 33.3ᵃ | 63.0ᵃ | 0.76ᵃ |
| ToppGene | 97.6 | 16.80 | 34.53 ± 35.31 | | 42.9 | 52.4 | 0.66 |
| GeneWanderer-RW | 88.1 | 22.10 | 29.55 ± 26.28 | 16.7 | 26.2 | 61.9 | 0.71 |
| Posmed-KS | 47.6 | 31.44 | 42.07 ± 30.98 | 4.7 | 7.1 | 23.8 | 0.58 |
| GeneDistiller | 97.6 | | | 26.2 | | 78.6 | |
| Endeavour | 100 | 11.16 | 18.41 ± 21.39 | 26.2 | 42.9 | | 0.82 |
| Pinta | 100 | 18.87 | 25.23 ± 24.72 | 28.6 | 31.0 | 71.4 | 0.75 |
Response rate is the percentage of gene-disease associations considered by each tool. Values for Suspects (marked a) were computed on the first 27 associations only.
Italics indicates the top ranking score of each column