Agne Antanaviciute, Catherine Daly, Laura A Crinnion, Alexander F Markham, Christopher M Watson, David T Bonthron, Ian M Carr.
Abstract
MOTIVATION: In attempts to determine the genetic causes of human disease, researchers are often faced with a large number of candidate genes. Linkage studies can point to a genomic region containing hundreds of genes, while the high-throughput sequencing approach will often identify a great number of non-synonymous genetic variants. Since systematic experimental verification of each such candidate gene is not feasible, a method is needed to decide which genes are worth investigating further. Computational gene prioritization presents itself as a solution to this problem, systematically analyzing and sorting each gene from the most to least likely to be the disease-causing gene, in a fraction of the time it would take a researcher to perform such queries manually.
Year: 2015 PMID: 25861967 PMCID: PMC4528628 DOI: 10.1093/bioinformatics/btv196
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of GeneTIER implementation. The web-based interface allows the user to supply candidate disease genes to prioritize and to select affected tissues. Top prioritization results are returned in a tabular form and are available to visualize and compare using an interactive chart. Full results are available for download.
Fig. 2. ROC curve showing classifier performance on inputs of different sizes, generated using disease genes from the benchmarking dataset (see Section 2).
AUC scores for classifier performance when assessed using 1000 known disease genes
| Random gene sample size | Area under the ROC curve |
|---|---|
| 50 | 0.83 |
| 100 | 0.80 |
| 200 | 0.81 |
| 500 | 0.78 |
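The AUC values above can be read as the probability that a true disease gene is scored above a random gene. The following is a minimal sketch of that calculation, not GeneTIER's own code: it computes the area under the ROC curve with the pairwise (Mann-Whitney) formulation, assuming known disease genes are the positives and randomly sampled genes are the negatives. The function name `roc_auc` and the toy scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: prioritization scores, higher means more likely disease-causing.
    labels: 1 for a known disease gene, 0 for a random gene (assumed setup).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    # Count positive/negative pairs where the positive outranks the negative;
    # ties contribute half a win.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: three disease genes mixed with three random genes.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of genes, which is acceptable for inputs of the sizes in the table; a sort-based rank-sum would be the usual choice for larger sets.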
Fig. 3. Expression profiles of the PRPF3, PRPF31, PRPF6, ROM1 and RP1 genes, associated with retinitis pigmentosa (OMIM:610282), across a selection of tissues (RNA sequencing data).
Mean ranks and standard deviations of five case-study genes shown in Fig. 3
| Input size | RP1 mean rank | RP1 SD | ROM1 mean rank | ROM1 SD | PRPF6 mean rank | PRPF6 SD | PRPF31 mean rank | PRPF31 SD | PRPF3 mean rank | PRPF3 SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 34.7 | 11.03 | 8.16 | 11.7 | 8.8 | 4.2 | 2.9 | 1.78 | 5.4 | 3.01 |
| 100 | 66.03 | 8.9 | 17.07 | 6.38 | 22.7 | 5.53 | 4.13 | 2.21 | 7.1 | 2.54 |
| 200 | 172.07 | 5.75 | 53.87 | 6.47 | 28.33 | 3.20 | 20.3 | 3.91 | 28.0 | 4.08 |
| 500 | 288.11 | 13.33 | 67.4 | 10.03 | 140.65 | 15.07 | 39.24 | 4.45 | 41.65 | 6.51 |
Each gene was ranked 30 times against sets of 50, 100, 200 and 500 randomly generated genes.
Mean reciprocal ranks of five case-study genes assessed against a set with 50, 100, 200 and 500 randomly generated genes; 30 replicates
| Input size | RP1 | ROM1 | PRPF6 | PRPF31 | PRPF3 |
|---|---|---|---|---|---|
| 50 | 0.78 | 0.09 | 0.09 | 0.08 | 0.08 |
| 100 | 0.66 | 0.17 | 0.23 | 0.04 | 0.07 |
| 200 | 0.86 | 0.27 | 0.14 | 0.10 | 0.14 |
| 500 | 0.58 | 0.13 | 0.28 | 0.08 | 0.08 |
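The per-gene summaries reported for the case-study genes (mean rank, standard deviation and mean reciprocal rank over replicate rankings) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function name `summarize_ranks` and the replicate ranks are hypothetical, and the sample standard deviation is assumed because the source does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize_ranks(ranks):
    """Summarize the ranks one gene received across replicate runs.

    ranks: 1-based rank of the gene in each replicate (1 = top of the list).
    """
    if len(ranks) < 2:
        raise ValueError("need at least two replicates for a standard deviation")
    return {
        "mean_rank": mean(ranks),
        # Sample standard deviation (n - 1 denominator); an assumption here.
        "sd": stdev(ranks),
        # Mean reciprocal rank: average of 1/rank, so a value near 1 means
        # the gene was usually ranked first.
        "mrr": mean([1.0 / r for r in ranks]),
    }


# Hypothetical ranks for one gene over four replicates.
summary = summarize_ranks([1, 2, 4, 5])
print(summary)
```

Because reciprocal rank weights top positions heavily, a gene can have a modest mean rank yet a high mean reciprocal rank if it is placed first in a few replicates.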