Léon-Charles Tranchevent, Roland Barriot, Shi Yu, Steven Van Vooren, Peter Van Loo, Bert Coessens, Bart De Moor, Stein Aerts, Yves Moreau.
Abstract
Endeavour (http://www.esat.kuleuven.be/endeavourweb; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes. Using a training set of genes known to be involved in a biological process of interest, our approach consists of (i) inferring several models (based on various genomic data sources), (ii) applying each model to the candidate genes to rank those candidates against the profile of the known genes and (iii) merging the individual rankings into a global ranking of the candidate genes. In the present article, we describe the latest developments of Endeavour. First, we provide a web-based user interface, in addition to our Java client, to make Endeavour more universally accessible. Second, we support multiple species: in addition to Homo sapiens, we now provide gene prioritization for three major model organisms: Mus musculus, Rattus norvegicus and Caenorhabditis elegans. Third, Endeavour makes use of additional data sources and now includes numerous databases: ontologies and annotations, protein-protein interactions, cis-regulatory information, gene expression data sets, sequence information and text-mining data. We tested the new version of Endeavour on 32 recent disease gene associations from the literature. Additionally, we describe a number of recent independent studies that used Endeavour to prioritize candidate genes for obesity and Type II diabetes, cleft lip and cleft palate, and pulmonary fibrosis.
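Step (iii) merges one ranking per data source into a global ranking. The abstract does not spell out the fusion rule; the original Endeavour method is based on order statistics, and the Python sketch below illustrates that idea under simplifying assumptions. Each candidate's per-source ranks are converted into rank ratios (rank divided by the number of candidates) and combined into a single Q statistic, where a smaller Q means a more consistently high-ranked gene. The function names `order_statistic_q` and `fuse_rankings` are hypothetical, and ranking directly by raw Q, without any further calibration such as conversion to a p-value, is a simplification of this sketch.

```python
from math import factorial


def order_statistic_q(rank_ratios):
    """Q statistic for N rank ratios in (0, 1].

    Q is the probability that N independent uniform random ratios, once sorted,
    are each at most the corresponding sorted observed ratio. It is computed with
    the recursion V_0 = 1,
        V_k = sum_{i=1..k} (-1)^(i-1) * V_{k-i} / i! * r_{N-k+1}^i,
    and Q = N! * V_N, where r_1 <= ... <= r_N are the sorted ratios (1-based).
    """
    r = sorted(rank_ratios)
    n = len(r)
    v = [1.0] + [0.0] * n
    for k in range(1, n + 1):
        r_k = r[n - k]  # r_{N-k+1} in 1-based notation
        v[k] = sum((-1) ** (i - 1) * v[k - i] / factorial(i) * r_k ** i
                   for i in range(1, k + 1))
    return factorial(n) * v[n]


def fuse_rankings(per_source_ranks, n_candidates):
    """Merge per-source rankings (dicts of gene -> 1-based rank) into one ranking.

    A gene missing from a source is scored only on the sources that rank it.
    Returns the genes ordered from best (smallest Q) to worst, plus the Q values.
    """
    genes = set().union(*per_source_ranks)
    q = {}
    for gene in genes:
        ratios = [ranks[gene] / n_candidates
                  for ranks in per_source_ranks if gene in ranks]
        q[gene] = order_statistic_q(ratios)
    return sorted(genes, key=lambda g: (q[g], g)), q


if __name__ == "__main__":
    # Toy example: three data sources, 100 candidates; gene names are made up.
    sources = [
        {"GENE_A": 1, "GENE_B": 50, "GENE_C": 90},
        {"GENE_A": 3, "GENE_B": 40},
        {"GENE_A": 2, "GENE_B": 60, "GENE_C": 95},
    ]
    order, q = fuse_rankings(sources, n_candidates=100)
    print(order)  # ['GENE_A', 'GENE_B', 'GENE_C']
```

For a single source Q reduces to the rank ratio itself, and when all N ratios equal a value a, Q equals a to the power N, which is a quick sanity check on the recursion.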
Year: 2008 PMID: 18508807 PMCID: PMC2447805 DOI: 10.1093/nar/gkn325
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Endeavour: the algorithm behind the wizard. Once the organism of interest is chosen (Step 1), the user can specify the training genes (Step 2). Step 3 lets the user select the data sources that will be used to build the models. The models summarize the training gene information. The candidate genes specified by the user in Step 4 are then scored against the models. This produces one ranking per data source plus one global ranking obtained by fusing the per-source rankings. The global ranking, together with the per-source rankings, is returned to the application and can be viewed in the ‘Results’ panel.
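Figure 1 describes building one model per data source from the training genes and scoring the candidates against it. For a data source that describes each gene as a numeric vector (for example, an expression profile), one simple way to do this is sketched below: the model is the average training profile, and candidates are ranked by Pearson correlation with it. This is an illustrative assumption, not Endeavour's exact model; annotation-type sources such as Gene Ontology would need a different kind of model, and the helper names (`build_model`, `rank_candidates`) and toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (0.0 if one is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def build_model(training_profiles):
    """Hypothetical vector-based model: the element-wise mean of the training profiles."""
    n = len(training_profiles)
    return [sum(column) / n for column in zip(*training_profiles)]


def rank_candidates(model, candidate_profiles):
    """Rank candidates (dict gene -> profile) by correlation with the model, best first.

    Returns a dict gene -> 1-based rank; ties are broken alphabetically.
    """
    scores = {gene: pearson(model, profile)
              for gene, profile in candidate_profiles.items()}
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return {gene: position for position, gene in enumerate(ordered, start=1)}


if __name__ == "__main__":
    # Toy profiles over four conditions; all values and gene names are made up.
    training = [[1.0, 2.0, 3.0, 4.0], [1.2, 2.1, 2.9, 4.2]]
    candidates = {
        "CAND_1": [0.9, 2.2, 3.1, 3.8],  # follows the training trend
        "CAND_2": [4.0, 3.0, 2.0, 1.0],  # opposite trend
        "CAND_3": [2.0, 2.0, 2.1, 2.0],  # nearly flat
    }
    model = build_model(training)
    print(rank_candidates(model, candidates))
    # {'CAND_1': 1, 'CAND_3': 2, 'CAND_2': 3}
```

The per-source rank dictionaries produced this way are exactly the kind of input that a rank-fusion step would then combine into the global ranking.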
Figure 2. Results of the leave-one-out cross-validation. For each organism, leave-one-out cross-validation was performed on three pathway sets from Gene Ontology (23) and, as a control, on five sets of 20 randomly selected genes. The ROC curves of the random validation (dotted green) and of the pathway validations (solid red and dashed blue) are plotted for (a) H. sapiens, (b) M. musculus, (c) R. norvegicus and (d) C. elegans. For the fair validation (dashed blue), Gene Ontology, KEGG, Text and STRING were excluded, whereas all data sources were used for the complete validation (solid red). The AUCs of the control validations are 48, 39, 45 and 51%, respectively, indicating random performance. In contrast, the AUCs of the pathway validations are 88, 92, 90 and 86% for the fair validation and 99, 99, 99 and 98% for the complete validation, supporting the validity of our approach.
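The AUC values in Figure 2 summarize how highly each left-out gene is ranked. As a sketch, assume each leave-one-out run ranks the left-out gene among `n_candidates` genes and treats all the others as negatives (this setup is an assumption here; the caption does not give the candidate-set size). A single positive at rank r then outranks n - r of the n - 1 negatives, and averaging that fraction over runs gives an AUC that is 1.0 when every left-out gene is ranked first and about 0.5 for random ranks, as in the control curves. The function name `loo_auc` is hypothetical.

```python
import random


def loo_auc(left_out_ranks, n_candidates):
    """Average AUC over leave-one-out runs with one positive gene per run.

    In each run the left-out gene (the positive) receives a 1-based rank r among
    n_candidates genes; it outranks n_candidates - r of the n_candidates - 1
    negatives, so that run's AUC is (n_candidates - r) / (n_candidates - 1).
    """
    if n_candidates < 2:
        raise ValueError("each run needs at least one negative candidate")
    if not left_out_ranks:
        raise ValueError("no leave-one-out runs given")
    per_run = [(n_candidates - r) / (n_candidates - 1) for r in left_out_ranks]
    return sum(per_run) / len(per_run)


if __name__ == "__main__":
    # Left-out genes ranked near the top: AUC close to 1.
    print(loo_auc([1, 2, 1, 5], n_candidates=100))
    # Random ranks, mimicking the control sets: AUC expected to be near 0.5.
    rng = random.Random(0)
    random_ranks = [rng.randint(1, 100) for _ in range(1000)]
    print(loo_auc(random_ranks, n_candidates=100))
```

Averaging per-run AUCs is one reasonable summary; a pooled ROC curve built across runs can give a slightly different number.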
Results of the 32 genetic disorder prioritizations
| Gene | Disorder | Reference | Endeavour rank |
|---|---|---|---|
| | Systemic lupus erythematosus | Kozyrev | 1 |
| | Systemic lupus erythematosus | Nath | 3 |
| | Systemic lupus erythematosus | Graham | 16 |
| | Amyotrophic lateral sclerosis | van Es | 15 |
| | Chronic pancreatitis | Rosendahl | 1 |
| | Impaired glycosylation | Kornak | 5 |
| | Cutis laxa | Kornak | 5 |
| | LDL/HDL cholesterol | Willer | 13 |
| | LDL/HDL cholesterol | Willer | 1 |
| | LDL/HDL cholesterol | Willer | 12 |
| | Human height | Sanna | 2 |
| | Human height | Sanna | 41 |
| | Prostate cancer | Eeles | 18 |
| | Prostate cancer | Thomas | 14 |
| | Prostate cancer | Thomas | 4 |
| | Prostate cancer | Eeles | 4 |
| | Prostate cancer | Eeles | 9 |
| | Prostate cancer | Thomas | 42 |
| | Prostate cancer | Thomas | 9 |
| | Prostate cancer | Thomas | 40 |
| | Prostate cancer | Gudmundsson | 19 |
| | Celiac disease | Hunt | 12 |
| | Celiac disease | Hunt | 2 |
| | Celiac disease | Hunt | 30 |
| | Celiac disease | Hunt | 3 |
| | Celiac disease | Hunt | 2 |
| | Celiac disease | Hunt | 18 |
| | Celiac disease | Hunt | 20 |
| | Celiac disease | Hunt | 3 |
| | Celiac disease | Hunt | 4 |
| | Celiac disease | Hunt | 10 |
| | Celiac disease | Hunt | 14 |
| Mean (all genes) | | | 12.25 |
| Mean (GWAS excluded) | | | 6.57 |
Associations reported by genome-wide association studies (GWAS).
The gene-disease associations were reported in Nature Genetics after 1 January 2008, so that explicit evidence for them was not yet present in our data sources. The training sets were built with OMIM and Gene Ontology, and each candidate region contains the novel gene and its 99 nearest neighbours (100 candidates). All 20 human data sources were used to perform the prioritizations. Endeavour ranked all but four of the novel genes within the top 20%, and half of them within the top 9%.
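The summary figures reported here can be re-derived from the 32 ranks in the table. The short Python check below recomputes the overall mean, the genes ranked outside the top 20% and the number ranked within the top 9%, taking 100 candidates per region as stated above. The GWAS-excluded mean is not recomputed, because the table does not mark which rows come from GWAS.

```python
from statistics import mean

# Endeavour ranks of the 32 novel genes, in table order.
ranks = [1, 3, 16, 15, 1, 5, 5, 13, 1, 12, 2, 41, 18, 14, 4, 4, 9, 42, 9, 40,
         19, 12, 2, 30, 3, 2, 18, 20, 3, 4, 10, 14]
n_candidates = 100  # the novel gene plus its 99 nearest neighbours

top20_cutoff = n_candidates * 20 // 100  # top 20% = ranks 1..20
top9_cutoff = n_candidates * 9 // 100    # top 9% = ranks 1..9

outside_top20 = [r for r in ranks if r > top20_cutoff]
within_top9 = sum(1 for r in ranks if r <= top9_cutoff)

print(len(ranks))       # 32
print(mean(ranks))      # 12.25
print(outside_top20)    # [41, 42, 40, 30]
print(within_top9)      # 16
```

Integer cutoffs are used instead of comparing floating-point ratios, so a rank of exactly 20 is counted inside the top 20% without rounding surprises.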