Li Jiang, Stefan M Edwards, Bo Thomsen, Christopher T Workman, Bernt Guldbrandtsen, Peter Sørensen.
Abstract
BACKGROUND: Prioritizing genetic variants is challenging because disease susceptibility loci are often located in genes of unknown function, or their relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases and complex traits is increasingly recognized. Thus, a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization.
Year: 2014 PMID: 25253562 PMCID: PMC4181406 DOI: 10.1186/1471-2105-15-315
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Performance of the approach using different protein-protein interaction (PPI) confidence score thresholds. The influence of different PPI thresholds on the precision (red) and recall (black) is shown. The precision and recall (both on the y-axis) were determined for each PPI threshold (x-axis) at the maximal Matthews correlation coefficient (MCC).
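As an illustration of how precision and recall can be reported at the point of maximal MCC, the following is a minimal sketch and not the authors' implementation. The function names and the toy scores and labels are hypothetical; it assumes a candidate is predicted positive when its score is at or above a cutoff.

```python
import math

def confusion(scores, labels, cutoff):
    """Count TP, FP, TN, FN when predicting positive for score >= cutoff."""
    tp = fp = tn = fn = 0
    for score, is_causal in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and is_causal:
            tp += 1
        elif predicted and not is_causal:
            fp += 1
        elif not predicted and is_causal:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def precision_recall_mcc(tp, fp, tn, fn):
    """Return (precision, recall, MCC); undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc

def best_by_mcc(scores, labels):
    """Scan every observed score as a cutoff and keep the one with the highest MCC."""
    best = None
    for cutoff in sorted(set(scores)):
        p, r, m = precision_recall_mcc(*confusion(scores, labels, cutoff))
        if best is None or m > best[3]:
            best = (cutoff, p, r, m)
    return best

# Hypothetical toy data: prioritization scores and causal (1) / non-causal (0) labels.
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 0, 0, 0]
cutoff, precision, recall, mcc = best_by_mcc(toy_scores, toy_labels)
```

Repeating this selection once per PPI confidence threshold would give one precision and one recall value per threshold, which is the kind of curve the figure describes.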
Figure 2. Influence of protein-protein interaction (PPI) thresholds on the prioritization of causal genes in the test sets. The proportion (y-axis) of prioritized test sets where causal genes were ranked within the top five (black) or top one (red) is shown according to different PPI confidence score thresholds (x-axis).
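The top-one and top-five proportions can be sketched as follows. This is only an illustration under stated assumptions: each test set is represented as a causal gene plus a dictionary of candidate scores, higher scores rank first, and the gene names (`G1`, `G2`, `G3`) and scores are hypothetical.

```python
def rank_of(causal_gene, gene_scores):
    """1-based rank of the causal gene when candidates are sorted by descending score."""
    ordered = sorted(gene_scores, key=gene_scores.get, reverse=True)
    return ordered.index(causal_gene) + 1

def top_k_proportion(test_sets, k):
    """Fraction of test sets whose causal gene is ranked within the top k."""
    hits = sum(1 for causal, scores in test_sets if rank_of(causal, scores) <= k)
    return hits / len(test_sets)

# Hypothetical test sets: (causal gene, {candidate gene: prioritization score}).
candidates = {"G1": 0.9, "G2": 0.5, "G3": 0.1}
toy_test_sets = [("G1", candidates), ("G2", candidates), ("G3", candidates)]

top1 = top_k_proportion(toy_test_sets, 1)  # only the set with causal gene G1 counts
top2 = top_k_proportion(toy_test_sets, 2)  # the sets with causal genes G1 and G2 count
```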
Figure 3. Receiver operating characteristic (ROC) curves of prioritizations using different phenotype sources and vocabulary filters. Each ROC curve represents the prioritization performance when combining a specific gene-associated phenotype with a vocabulary filter. The phenotype sources were OMIM (brown), PubMed (green), and GeneRIF (purple). The vocabulary filters were STY, MeSH, ICD9CM, and GO (colored from dark to light accordingly).
Comparison of AUC (area under the curve), precision, and recall using different sources for the gene-associated phenotypes and phenotype vocabulary filters
| OMIM AUC | OMIM Precision | OMIM Recall | PubMed AUC | PubMed Precision | PubMed Recall | GeneRIF AUC | GeneRIF Precision | GeneRIF Recall |
|---|---|---|---|---|---|---|---|---|
| 0.90 | 0.52 | 0.48 | 0.87 | 0.40 | 0.43 | 0.82 | 0.40 | 0.32 |
| 0.89 | 0.48 | 0.49 | 0.85 | 0.36 | 0.38 | 0.83 | 0.36 | 0.37 |
| 0.81 | 0.31 | 0.38 | 0.75 | 0.31 | 0.28 | 0.71 | 0.35 | 0.19 |
| 0.82 | 0.38 | 0.30 | 0.73 | 0.32 | 0.18 | 0.71 | 0.23 | 0.19 |
AUC, area under the curve.
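For readers unfamiliar with the AUC, one common way to compute it is as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The sketch below uses that pairwise formulation; it is an assumption for illustration, not necessarily how the values in the table were obtained, and the toy scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical toy data: prioritization scores and causal (1) / non-causal (0) labels.
toy_auc = auc([0.9, 0.8, 0.7, 0.4, 0.3, 0.1], [1, 1, 0, 1, 0, 0])
```

An AUC of 1.0 means every causal gene outranks every non-causal gene, and 0.5 corresponds to random ordering.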
Comparison with other network-based approaches
| | | *PPI evidence incl. co-mention | PPI evidence excl. co-mention | |
|---|---|---|---|---|
| Bayesian | Recall | 0.55 | 0.43 | 0.21 |
| | Precision | 0.58 | 0.56 | 0.45 |
| Regression | Recall | 0.57 | 0.43 | 0.30–0.55 |
| | Precision | 0.52 | 0.47 | 0.73–0.55 |
*PPI, protein-protein interaction.