Léon-Charles Tranchevent, Amin Ardeshirdavani, Sarah ElShal, Daniel Alcaide, Jan Aerts, Didier Auboeuf, Yves Moreau.
Abstract
Genomic studies and high-throughput experiments often produce large lists of candidate genes, of which only a small fraction are truly relevant to the disease, phenotype or biological process of interest. Gene prioritization tackles this problem by profiling candidate genes across multiple genomic data sources and integrating this heterogeneous information into a global ranking. We describe an extended version of our gene prioritization method, Endeavour, now available for six species and integrating 75 data sources. The performance (Area Under the Curve, AUC) of Endeavour on cross-validation benchmarks using 'gold standard' gene sets varies from 88% (for human phenotypes) to 95% (for worm gene function). We have also validated our approach using a time-stamped benchmark derived from the Human Phenotype Ontology, which provides a setting close to prospective validation. With this benchmark, using 3854 novel gene-phenotype associations, we observe a performance of 82%. Altogether, our results indicate that this extended version of Endeavour efficiently prioritizes candidate genes. The Endeavour web server is freely available at https://endeavour.esat.kuleuven.be/.
Year: 2016 PMID: 27131783 PMCID: PMC4987917 DOI: 10.1093/nar/gkw365
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The Endeavour algorithm. Users can start a prioritization by (1) selecting the species of interest, (2) defining which genes are known to be associated with the process of interest (the seed genes), (3) selecting the data sources to be used in the process and (4) providing the candidate genes to prioritize. Endeavour then (A) uses the seed genes to build a model of the process of interest, (B) scores the candidate genes with this model to produce several rankings and (C) integrates these rankings into one global ranking, which (5) is returned to the user through the web server.
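Step (C) of the figure, the fusion of several per-source rankings into one global ranking, can be illustrated with a minimal Python sketch. The fusion rule used here (averaging rank ratios) is an assumption chosen for simplicity and is not claimed to be the rule Endeavour applies; the source names, gene names and scores are hypothetical toy values.

```python
from statistics import mean


def rank_ratios(scores):
    """Map each gene to rank / number of genes (the best gene gets the smallest ratio)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}


def integrate_rankings(per_source_scores):
    """Fuse per-source rankings into one global ranking.

    ASSUMPTION: the mean rank ratio is used as a simple stand-in fusion rule.
    """
    per_source_ratios = [rank_ratios(s) for s in per_source_scores.values()]
    genes = set().union(*(r.keys() for r in per_source_ratios))
    fused = {
        g: mean(r[g] for r in per_source_ratios if g in r)
        for g in genes
    }
    # Smaller fused ratio = better; gene name breaks ties deterministically.
    return sorted(fused, key=lambda g: (fused[g], g))


# Hypothetical candidate scores from three data sources (higher = closer to the seed model).
toy_scores = {
    "expression": {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1},
    "interactions": {"GENE_A": 0.7, "GENE_B": 0.8, "GENE_C": 0.2},
    "annotation": {"GENE_A": 0.6, "GENE_B": 0.5, "GENE_C": 0.3},
}
global_ranking = integrate_rankings(toy_scores)
print(global_ranking)
```

Working on rank ratios rather than raw scores keeps sources with different score scales comparable, which is the practical reason to rank within each source before combining.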
Results of the leave-one-out cross-validation on ‘gold standard’ gene sets
| Species | Source | No. of sets | No. of genes | AUC | Control AUC |
|---|---|---|---|---|---|
| | HPO | 1553 | 19 386 | 88.34% | 49.79% |
| | OMIM | 29 | 611 | 93.41% | 48.43% |
| | GAD | 966 | 10 921 | 88.96% | 50.04% |
| | GO | 4526 | 55 930 | 92.26% | 49.93% |
| | RGD-RDO | 672 | 8413 | 88.61% | 49.59% |
| | GO | 4379 | 53 105 | 90.46% | 49.75% |
| | RGD-RDO | 652 | 7997 | 90.55% | 49.21% |
| | GO | 4140 | 49 895 | 88.68% | 49.24% |
| | FlyBase-Pheno | 1612 | 17 395 | 91.38% | 50.12% |
| | GO | 2371 | 28 834 | 89.88% | 49.93% |
| | WormBase-Func | 225 | 2304 | 94.93% | 50.87% |
| | GO | 1400 | 17 060 | 92.05% | 49.95% |
| | Zfin-Pato | 135 | 1662 | 88.70% | 49.37% |
| | GO | 1856 | 22 476 | 90.76% | 49.16% |
For each benchmark (row), the columns contain the number of gene sets, the number of genes, the AUC and the control AUC, respectively.
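To make the AUC figures concrete, the following Python sketch shows one way a leave-one-out AUC can be computed for a single gene set. It is a simplified illustration under stated assumptions, not the benchmark code behind the table: the decoy genes, the one-dimensional `features` values and the `toy_score` function are all hypothetical, and each held-out gene is ranked against the same fixed decoys.

```python
def held_out_auc(rank, n_candidates):
    """AUC for a single positive ranked `rank` (1 = best) among n_candidates genes."""
    return (n_candidates - rank) / (n_candidates - 1)


def leave_one_out_auc(gene_set, decoys, score):
    """Average AUC over all leave-one-out runs of one gene set.

    `score(seeds, gene)` is a hypothetical scoring function (higher = more seed-like).
    """
    aucs = []
    for held_out in gene_set:
        # Train on every gene of the set except the held-out one.
        seeds = [g for g in gene_set if g != held_out]
        target = score(seeds, held_out)
        # Rank = 1 + number of decoys scoring strictly higher than the held-out gene.
        rank = 1 + sum(score(seeds, d) > target for d in decoys)
        aucs.append(held_out_auc(rank, len(decoys) + 1))
    return sum(aucs) / len(aucs)


# Hypothetical one-dimensional gene features, for illustration only.
features = {"G1": 1.0, "G2": 2.0, "G3": 6.0, "D1": 1.5, "D2": 10.0, "D3": 20.0}


def toy_score(seeds, gene):
    """Negative distance between a gene and the mean feature value of the seeds."""
    centre = sum(features[s] for s in seeds) / len(seeds)
    return -abs(features[gene] - centre)


auc = leave_one_out_auc(["G1", "G2", "G3"], ["D1", "D2", "D3"], toy_score)
print(f"{auc:.2%}")
```

With one held-out gene per run, the AUC reduces to the fraction of decoys ranked below that gene, so a scorer that ranks at random averages about 50%, the level the Control AUC column sits at.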